Mapping of RNA binding proteins and consensus clustering
In this study, we performed a systematic analysis of RNA binding proteins in ccRCC patients. The flow chart of the study design is shown in Fig. 1. A total of 522 ccRCC patients met inclusion criteria and were included for subsequent analysis. We firstly mapped these RBPs genes on TCGA dataset and performed consensus clustering using the ConsensusClusterPlus package[13]. Based on the expression similarity of RBPs genes, k = 3 seemed to be a more stable value from k = 2 to 10 in the TCGA dataset (Figs. 2B,C). In addition, we also performed PCA analysis to evaluated the reliability of the consensus clustering. After we divided these patients into 3 clusters, PCA analysis showed that the cluster 1 and cluster 3 had high similarity and gathered together (Figs. 2E). Therefore, we divided patients into 2 clusters (Figs. 2A). PCA analysis showed that cluster classifications showed different distribution between two clusters. Patients in cluster 1 gathered together and cluster 2 gathered together (Figs. 2D). These results determined the reliability of two different clusters.
2. Cluster Classification Was Correlated With The Malignancy Of ccRCC
To evaluate the association between two clusters and clinical factors, we performed survival analysis and evaluated the distribution characteristics for the two clusters. Kaplan–Meier curve showed that patients in cluster 2 had poorer prognosis compared with cluster 1 (Figs. 2F). In addition, we found that distributions of clinical factors between two clusters, including gender, stage, grade, pathological T staging and pathological M staging were significantly different (Table 1). Patients in cluster 2 had relatively higher proportions of advanced stage (Fig. 3A), high grade (Fig. 3B), high pathological T staging (Fig. 3C), lymph node metastasis (Fig. 3D) and distant metastasis (Fig. 3E) than cluster 1. These results showed patients in cluster 2 were closely correlated with the malignancy of ccRCC. GSEA analysis revealed that cluster 2 significantly enriched several oncogenic pathways (Fig. 3F), including epithelial mesenchymal transition (normalized enrichment score (NES) = 1.578, size = 194), G2M checkpoint (NES = 1.532, size = 184), KRAS signaling ( NES = 1.539, size = 188) and IL6 JAK STAT3 signaling (NES = 1.360, size = 87). All these results indicated that cluster classifications identified by consensus clustering are correlated with the malignancy of ccRCC.
3. Screening Of Differentially Expressed RBPs In ccRCC
A total of 522 ccRCC and 72 normal tissue samples were included to obtain differentially expressed genes using “limma” package, and intersected with RBPs. A total of 115 genes, comprising 71 up-regulated and 44 down-regulated RBPs, were obtained. The results of differentially expressed analysis were showed in Table S1. Then, we performed functional enrichment analysis of these differentially expressed RBPs, and the results meeting the criterions were showed in Table S2. For up-regulated RBPs, the most significantly enriched terms were “SRP-dependent cotranslational protein targeting to membrane” for biological processes, “ribosome” for cellular component, “RNA binding” for molecular function and “Ribosome” for KEGG pathway. For down-regulated RBPs, the most significantly enriched terms were “mRNA processing” for biological processes and “nucleotide binding” for molecular function.
4. PPI Network Construction And Key Module Screening
To further evaluate the roles of these differentially expressed RBPs in ccRCC, we submitted these genes to STRING database and constructed a PPI network (Fig. 4A). We obtained 113 PPI nodes, 398 PPI edges and a PPI enrichment p-value < 1.0e-16. Meanwhile, the functional enrichments for the PPI network were obtained. For biological processes, the mainly enriched terms were mRNA metabolic process, nucleobase-containing compound catabolic process, cellular nitrogen compound catabolic process, RNA catabolic process and cellular nitrogen compound metabolic process. For molecular function, the terms including RNA binding, nucleic acid binding, heterocyclic compound binding, organic cyclic compound binding and structural constituent of ribosome were enriched. For cellular component, the PPI network was located in ribonucleoprotein complex. They also located in cytosolic ribosome, ribosome, ribosomal subunit and ribonucleoprotein granule. For KEGG pathway,
The mainly enriched results were nonsense mediated decay independent of the exon junction complex, eukaryotic translation termination, selenocysteine synthesis, selenoamino acid metabolism and viral mRNA translation. Then, we screen hub genes using cytoHubba plug-in with MNC algorithm, and ten hub genes were selected: RPLP0, RPS14, OASL, RPS20, RPL35, RPS2, RPL10, RPL30, RPS15 and RPL18. We further selected import subnetworks with MCODE plug-in (Fig. 4B, C). The ClueGO was used to explore the functional mechanism. The module 1 was mainly enriched in cytosolic large ribosomal subunit, cytosolic ribosome, ribosome assembly and ribosome biogenesis (Fig. 4B), whereas the module 2 was mainly enriched in 2'-5'-oligoadenylate synthetase activity, negative regulation of viral genome replication and type I interferon signaling pathway (Fig. 4C).
5. Evaluation Of Hub Genes
These 10 hub genes from PPI network were further evaluated in HPA database. The results showed that these genes (RPLP0, RPS14, OASL, RPS20, RPL35, RPS2, RPL10, RPL30, RPS15 and RPL18) were highly expressed in ccRCC samples compared with normal samples (Fig. 5). In addition, we used ROC curves to evaluate the efficiency of these genes to differentiate tumor samples from normal samples using “pROC” package[15]. These genes showed good performance of diagnostic accuracy for ccRCC (Figure S1). Moreover, the genetic changes (mutation and copy number alteration for TCGA) of these genes were evaluated using the cBioPortal online tool, and 77 samples out of 448 ccRCC samples were found altered (Figure S1K). The mutation frequencies of these 10 genes were low except RPS14 gene. The amplification of RPS14 gene was the most frequent copy-number alteration.
6. Identification Of RBPs With Prognostic Value
The prognostic values of these differentially expressed RBPs were evaluated in ccRCC patients. We performed univariate Cox regression analysis to select survival-related RBPs in TCGA dataset. The results showed that 77 out of 115 differentially expressed RBPs were correlated with patients’ overall survival (Table S3). Then, we performed LASSO Cox regression analysis based on survival-related RBPs in TCGA dataset to select the most prognostic genes, and 12 genes were selected, including PABPC1L, IGF2BP3, PAIP2B, RBM47, RPL22L1, IGF2BP2, RNASE2, CSDA, CNP, DDX25, RALYL and NYNRIN (Figure S2A,B). To further determine the prognostic values of these twelve genes, Kaplan–Meier curves and log-rank test for overall survival were used, and ten RBP genes showed good prognostic values between high and low expressions, including CNP (Figure S2C), CSDA (Figure S2D), DDX25 (Figure S2E), IGF2BP2 (Figure S2F), IGF2BP3 (Figure S2G), PABPC1L (Figure S2H), PAIP2B (Figure S2I), RBM47 (Figure S2J), RNASE2 (Figure S2K) and RPL22L1 (Figure S2L). While, NYNRIN (Figure S2M) and RALYL (Figure S2N) showed no significant difference in survival.
7. Construction And Validation Of A Risk Score Model
These ten survival-related RBP genes were performed multiple Cox regression analysis to construct a risk score model. The risk score for each ccRCC patient was calculated using the genes’ expressions multiplies the coefficients obtained from multiple cox regression analysis. The detailed formula was showed as follow: Risk score = (0.22403 * PABPC1L) + (0.08177 * IGF2BP3) + (-0.01128 * PAIP2B) + (-0.28659 * RBM47) + (0.06504 * RPL22L1) + (0.04107 * IGF2BP2) + (0.07886 * RNASE2) + (0.09797 * CSDA) + (0.32254 * CNP) + (-0.13138 * DDX25). To evaluate the prognostic value of the risk score model, patients were divided into high-risk and low-risk groups based on the median risk score. PCA analysis showed different distribution patterns between high-risk and low-risk patients (Fig. 6A), of which high-risk patients gathered together and low-risk patients gathered together. Kaplan–Meier curves and log-rank test were used and showed that high-risk patients had poor prognosis compared with low-risk patients (Fig. 6B). The prognostic performance of risk score model was evaluated using ROC curves (Fig. 6C). The AUCs of the curves were 0.756 at 1 year, 0.717 at 2 years, 0.738 at 3 year and 0.761 at 5 year, showing the risk score model had moderate sensitivity and specificity. The risk score model was further validated in E-MTAB-3267 set. The risk score for each patient was calculate using the same formula and patients were divided into high- and low-risk based on median risk score. PCA analysis also showed different distribution patterns (Fig. 6D). Kaplan–Meier curve showed high-risk patients had poor prognosis despite the P value larger than 0.05 (Fig. 6E). The ROC curves also showed the good performance of the risk score model (Fig. 6F). These results showed the reliability and stability of the risk score model.
8. Risk score model showed strong associations with clinicopathological factors in ccRCC
The heat map showed the expression levels of ten genes for each patient in TCGA dataset (Fig. 7A). The clinicopathological factors between high-risk and low-risk patients were compared. We found that high-risk patients had high proportion of advanced stage (P < 0.0001), high grade (P < 0.0001), high pathological T staging (P < 0.0001), male (P < 0.05) and death (P < 0.0001). We performed univariate Cox regression analysis for these clinical factors and risk score model, and we found the risk score model was risk factor for overall survival of ccRCC patients (Fig. 7B). We also performed multiple Cox regression analysis and found the risk score model was still an independent risk factor for overall survival after integrating with age, gender, stage, grade, pathological T staging, pathological N staging and pathological M staging (Fig. 7C). These results showed the risk score model was closely correlated with clinicopathological factors in ccRCC.