Identification of Differentially ExpressedRBPs
The flowchart was shown in Figure 1. Atotal of 502 LUSC samples and 44adjacent tumor sampleswere considered eligible in the study. Differentially expressedRBPs were screened by the “limma” package of R software. In total, 249 differentially expressedRBPs were identified, including 104 downregulated RPBs and 145 up-regulated RPBs (Supplementary Table 1). The heatmap and volcano plot was exhibited in Figure 2A,B.
GO and KEGG Analysis
To uncover the potential function and molecular mechanism of differentially expressed RBPs, they were divided into two groups (upregulated and downregulated group) according to their expression levels. Functional enrichment analysis was carried out via thehypergeometric test using the ClusterProfiler R package. Upregulated differentially expressed RBPs werenotably enriched in biological processes, including metabolic process, ncRNA processing, ribonucleoprotein granule, and spliceosomal complex.The downregulated differentiallyexpressed RBPs were enriched in the mRNA processing, RNA splicing, ribosome, and translation repressor activity (Table 1, Figure 3A,B).The results of KEGG analysis for upregulated RPBs revealed that a total of nine KEGG terms was enriched, including Spliceosome, RNA transport and Ribosome biogenesis in eukaryotes, whereas thedownregulated RBPs were significantly enriched in RNA transport, mRNA surveillance pathway and RNA degradation (Table 1, Figure 3C,D).
Construction of PPI Network and identification of KeyModules
The PPI network was established using these differentially expressedRBPs in the STRING database, then a protein-protein co-expression network was visualized using Cytoscape.The PPI network for RBPs was exhibited in Figure 4A, which included 223 nodes and 1,094 edges. The two hub networks were identified via the plug-in MODE in Cytoscape. Module I included 22 genes, while Module II included 33 genes (Figure 4B,C,D). The GO and KEGG analysis in Module 1 was mainly enriched in RNA splicing, ncRNA processing, ribosome biogenesis, RNA polymerase, and NOD-like receptor signaling pathway, while the GO and KEGG analysis in Module 2were mainly enriched inmitochondrial translation, snoRNA metabolic process, nuclear RNA surveillance, Ribosome, and RNA degradation.
Selection of Prognostic-Related RBPs
Among 249 differentially expressedRBPs, 24 RPBs were regarded as prognostic-related RPBs using the univariate Cox regressionanalysis (Figure 5). Then, We used the LASSO method to build a prediction model, selected non-zero regression coefficients to identify the optimal RPBs gene set, and used the selected RPBs to build RPBs-based predictive model to predict prognosis. In LASSO logistic regression method, FifteenRPBs as optimal features were ultimately recognized, includingDNMT1, MRPL33, EZH2, PCF11, RBM24, TRMT112, DZIP1, EIF5A2, MKRN3, DARS2, PSMA6, AZGP1, LENG9, IGF2BP2,andCIRBP.The RPBs-based a predictive model was calculated as the following formula: The risk score of eachHNSC patient was computed according to the following formula:DNMT1*(-0.21475) +MRPL33*0.177058+ EZH2*(-0.12256)+ PCF11*(-0.23465)+ RBM24*0.08662+ TRMT112*0.091039+ DZIP1*0.025207+ EIF5A2*0.147603+MKRN3*0.495024+DARS2*0.259781+PSMA6*0.161435+AZGP1*(-0.0518)+LENG9*(-0.14885)+ IGF2BP2*(-0.027825)+ CIRBP*0.00951.
To assess the predictive ability of this model, the patients were divided into high- and low-risk groups for survivalanalysis according to the median risk score. Patients in thehigh-risk group exhibited better survival rates than thosein the low-risk group (Figure 6A). The time-dependent ROC analysis was used to evaluate the prognostic performance of the 15-RBP gene signature. The AUC of theROC curve for OS was 0.67(0.62-72) at 1 years, 0.68 (0.63-0.73) at 3 years and 0.67 (0.60-0.74) at 5 years (Figure 6B). The result indicated that the model had moderate diagnostic strengthen. The expression heat map and survival status ofpatients with the 15-RBP gene biomarkers in the low- and high-risk subgroups are shown in Figure 6C-E.
Univariate and multivariate Cox regression analysis suggested that the prediction model could become an independent predictor, including age, gender, histologicalgrade,stage, TNM stage, and risk score(Figure 7, Table 3).
Validation of candidate RPBs in the HPA dataset
To further determine the expression of these hub genes in HNSC,we used immunohistochemistry results from the Human ProteinAtlas (HPA) database to show that AZGP1,DNMT1, DZIP1, IGF2BP2, LENG9, MRPL33, PSMA6, RBM24, and TRMT112were significantly increased in head and neck cancer compared withnormal tissue (Figure 8).
Validation of established prognostic RPB model
The established prognostic RPB model based on the 15 RPBs was validated in another GEO (GSE84437) dataset using the same risk assessment formula, and the result indicated that the model exhibited better predictive ability. Patients in the high-risk group suffered a lower survival probability (Figure9A). The area under the curve of the ROC curve for 1,3,5 yearreached 0.58, 0.55, and 0.58, respectively (Figure 9B). The expression heat map and survival status ofpatients with the 15-RBP gene biomarkers in the low- and high-risk subgroups are shown in Figure 9C-E.Univariate and multivariate Cox regression analysis suggested that the IRGPM could become an independent predictor, including age, gender, histologicalgrade,stage, TNM stage, and risk score(Figure 10A,B).