Screening of differentially expressed RBPs in HCC patients
RNA expression profiling of 424 samples (374 neoplasms and 50 normal tissues) for hepatocellular carcinoma patients were obtained from TCGA database. As shown in the heat map and volcano map (Additional file 1: Figure S1 A, B), 4657 differentially expressed mRNAs (3611 up-regulated and 1046 down-regulated) could be located in HCC tissues versus normal tissues. We then took the intersection of 1542 RBPs and 4657 differentially expressed mRNAs and got 133 differentially expressed RBPs, including 111 differentially expressed RBPs that were upregulated and 22 differentially expressed RBPS that were downregulated (Additional file 1: Figure S1 Figure1 C, D).
Functional enrichment analysis and protein-protein interaction (PPI) network establishment of differentially expressed RBPs
In order to reveal possible biological functions of differentially expressed RBPs, we performed GO term and KEGG pathway analysis. There were 247 pathways considering to be significantly enriched, including 152 biological process (BP) terms, 49 cellular component (CC) terms, 46 molecular function (MF) terms. The most significant BP, CC, MF were regulation of mRNA metabolic process, cytoplasmic ribonucleoprotein granule, catalytic activity and acting on RNA, respectively (Figure 1 A). Meanwhile, KEGG pathway analysis identified six significantly enriched pathways: mRNA surveillance pathway, RNA degradation, Spliceosome, RNA transport, Ribosome and Ribosome biogenesis in eukaryotes (Figure 1 B). To further investigate the potential interactions between differentially expressed RBPS, we developed a PPI network based on data from the STRING databank utilizing software Cytoscape (Figure 1 C). Figure1 D showed the top ten hub genes of the PPI network. The top ten hub genes were PABPC1, PIWIL1, ELAVL2, GSPT2, LIN28A, SNRPE, BOP1, DDX39A, DDX39B, DDX4 (Additional file 3: Table S1).
Identify the prognostic RBPs included in the risk model in the training cohort
Patients with an overall survival time of under 1 month and incomplete clinical data in the entire TCGA cohort were excluded, resulting in the inclusion of 319 patients in model construction. The entire TCGA cohort was split into a testing cohort (n = 159) and a training cohort (n = 160) randomly. Considering the validation in GSE14520, we included 81 differentially expressed RBPS in both training cohort and GSE14520 into the study. For the purpose of identifying the prognostic relevance of RBPs, approach of univariate Cox regression was conducted to evaluate the expression of these RBPs in the training cohort, yielding 23 RBPs that were associated with prognosis (Figure 2 A). Lasso regression was employed to delete prognostic-associated RBPs that correlated highly with one another and identified 10 candidate prognostic-associated RBPs (Figure 2 B). Subsequently, all the candidate RBPs were measured by the approach of multivariate Cox regression assay. In the end, five RBPs were identified to establish the prognostic risk model. The five RBPs were ribosomal protein L10-like (RPL10L), enhancer of zeste homolog 2 (EZH2), peroxisome proliferator-activated receptor gamma, coactivator 1 alpha (PPARGC1A), zinc finger protein 239 (ZNF239) and interferon-induced protein with tetratricopeptide repeats 1 (IFIT1; Figure 2 C).
Establishment of the model for the prognostic risk in the training cohort
With the purpose of investigating the relevance of these five prospective RBPs in predicting the outcomes of patients with HCC, these five RBPs have been adopted to develop a prognostic model of risk. The risk score was determined by the following method: Risk score = (0.1400 × RPL10L expression) + (0.4536 ×EZH2 expression) + (-0.1195 × PPARGC1A expression) + (0.1537 ×ZNF239 expression) + (-0.2530 × IFIT1 expression). The optimal cut-off value of -0.142 for the risk score was determined utilizing the Survminer R package. Patients in the training group were divided to high-risk (n = 41) and low-risk (n = 119) groups in accordance with cut-off values. In accordance of Kaplan-Meier analysis, the high-risk group had remarkable poorer OS than the low-risk group (p<0.001; Figure 3 A). The prognostic value of the five RBPs characteristics was further assessed with time-dependent receiver operating characteristic curve (ROC) analysis. For 1-year survival, 3-year survival, and 5-year survival, the area under the ROC curve (AUC) was 0.763, 0.763 and 0.731, respectively. (Figure 3 B). The risk score analysis for the prognostic risk model in the training cohort was depicted in figure 3 C.
Validation of the prognostic risk model
With the aim of evaluating on whether the five RBPs prognostic risk models had parallel predictive value in cohorts of other HCC patients, the same formula was used to determine risk score for the testing cohort, the TCGA full cohort and the GSE14520 cohort separately. Individuals in the testing and the entire TCGA cohort were sorted to high- and low-risk groups in accordance with the optimal cut-off value for the training cohort. The best cutoff value of 1.111 in the GSE14520 cohort was obtained through the Survminer R package as the training cohort. Our finding was that in the testing cohort, TCGA cohort as well as GSE14520 cohort, patients in the high-risk group were in more adverse OS in comparison to the low-risk group. The 1-, 3-, and 5-year AUC of the testing cohort, TCGA cohort and GSE14520 cohort were 0.74, 0.731, 0.75 and 0.751, 0.747, 0.742 and 0.668, 0.679, 0.637, respectively (Figure 4). The distribution of risk scores, the scatter plot of survival status and the expression heat map for the three validation cohorts were shown in additional file 2: Figure S2. Prognosis for the high risk group has been shown to be unfavorable in comparison to the low risk group, which parallels the results of the training set. There was higher level of EZH2, RPL10L, ZNF239 expression in the high-risk category compared to the low-risk category, while PPARGC1A, IFIT1 level was reduced compared to the low-risk category. Taking all into consideration, these findings suggested that our model of prognostic risk could reliably predict the OS of HCC patients.
Independent prognostic role of the prognostic risk model in the whole TCGA cohort
We evaluated the prognostic value of different clinical variables in the TCGA cohort of HCC patients by uni- and multi-variate Cox regression assay. Both uni- and multivariate Cox regression assays indicated that risk score and pathological stage were independent prognostic role. Among the parameters of age, gender, histological grade and pathological stage and risk score, 1-, 3-, and 5-year AUC of risk score were the largest and the hazard rate of the risk score was also the largest (Figure 5), which indicated that risk score predicted OS at 1,3 and 5 years more accurately than other clinical characteristics.
Building a predictive nomogram in the whole TCGA cohort
A nomogram created using two independent prognostic factors including pathological stage and risk score was used to predict OS at 1, 3 and 5 years in the whole TCGA cohort (Figure 6 A). ROC curve was used to evaluate the prediction accuracy of nomogram. The area under the ROC curve for 1-,3-, and 5-year was 0.792, 0.786, 0.763 (Figure 6 B), which manifested that the nomogram had very good prediction accuracy. Calibration curve analysis demonstrated that the 1-, 3-, and 5-year survival rates that were predicted by the nomogram corresponded well with the observed survival rates. (Figure 6 C, D, E).
Clinical utility of the prognostic risk model in the whole TCGA cohort
For the purposes of assessing the clinical utility of the predictive model, the correlation between risk factors (risk score and risk genes) derived from the model and the clinical properties of the entire TCGA cohort was assessed. As shown in Figure 7, compared with low histological grades, high histological grades had higher values of EZH2, ZNF239, and risk score, and lower values of IFIT1 and PPARGC1A (all p<0.05). Compared with low pathological stage, high pathological stage had higher values of EZH2 and risk score, and lower values of IFIT1 and PPARGC1A (all p<0.05). The value of IFIT1 for male was higher than that for female, and the value of ZNF239 was lower than female. The expression of RPL10L in elderly patients was greater than that of young patients. These findings demonstrated that these RBP genes were intimately associated with the development of HCC.
Tumor-infiltrating immune cells played a key function in the genesis and progress of tumors. Therefore, the potential association between risk score and level of immune infiltration in the whole TCGA set was analyzed. As shown in Figure 8, risk score had a positive correlation with neutrophil, B cell, CD4/CD8+ T cells, dendritic, macrophage and (all p<0.05). These findings revealed that the score obtained by our model were highly correlated with immune cell infiltration.