Identification of molecular subtypes based on IMRGs
Univariate cox analysis was performed on preprocessed 739 IMRGs obtained from TCGA dataset. In total, 324 HCC-related IMRGs were identified and used for HCC classification. The cophenetic coefficients, which indicate the stability of classified cluster was used to calculate the optimal k value. We performed comprehensive analysis on the cophenetic, RSS, and silhouette index, from which we selected the k = 2 as the optimal value. Consequently, 2 molecular subtypes (cluster 1 and cluster 2) were identified based on IMRGs (Supplementary Fig. 2). Notably, the matrix heat map exhibited clear boundaries based on k value of 2, suggesting that the molecular subtypes classification was stable (Fig. 1A). The gene cluster heatmap of 324 HCC-related IMRGs revealed marked differences between cluster 1 (C1) and cluster 2 (C2). Specifically, the expression level of HCC-related IMRGs in C2 was significantly higher than C1. In addition, distribution of clinical features between C1 and C2 exhibited significant differences (Fig. 1B). KM analysis revealed that C2 had significant shorter OS than C1 (P = 0.0099) (Fig. 1C). The accuracy of molecular subtypes classification based on IMRGs was determined by comparing the association between the 2 clusters and clinical features using Chi-square test. Results showed that the pathological classification of tumor (T) (P = 0.0002), stage (P = 0.0478), and grade (P = 0.0391) were significantly different between the 2 clusters (Supplementary Table 3). Further comparison of the immune scores between the 2 clusters was performed using TIMER algorithm. Except CD8 cells, the immune scores of B cell (P = 0.045), CD4 T cell (P = 0.015), neutrophil (P = 0.015), macrophage (p = 0.001), and dendritic (p = 0.001) in clusters 2 were higher than that of cluster 1 (Fig. 1C). Collectively, these results revealed that the IMRGs signature could classify HCC into distinct molecular subtypes and was associated with clinical characteristic.
Construction And Validation Of An IMRGs Signature
First, we screened for differentially expressed IMRGs between cluster 1 and clusters 2. A total of 400 IMRGs were found to be significantly differentially expressed. A volcano and clustering map revealed a distinct distribution of up-regulated and down-regulated IMRGs between the 2 clusters (Supplementary Fig. 3A, B). Results of GO analysis showed that the differentially expressed IMRGs were primarily enriched in metabolic process, such as glutamate and lactate metabolism, as well as in tumorigenesis-related process, including cell-cell adhesion and cell migration (Supplementary Fig. 3C). In the KEGG analysis, IMRGs were predominantly enriched in metabolic pathways, such as glucagon signaling, metabolism of xenobiotics by cytochrome P450, and retinol metabolism (Supplementary Fig. 3D). Functionally, the differentially expressed IMRGs were involved in tumorigenesis and metabolism related pathways.
Univariate Cox regression and LASSO (Fig. 2A, B) analyses were conducted to select suitable genes from the 400 differentially expressed IMRGs. 20 significant genes were revealed through above these two analyses were further subjected to a multivariate Cox regression analysis with the mimic AIC value = 466.72. Finally, 6 IMRGs (Supplementary Table 4) were used to construct an IMRGs signature (termed as 6-IS) using the risk score formula. Next, we explored the association between 6 IMRGs and HCC survival. Unlike FMO3 which correlated with good prognosis in high-risk group, SLC11A1, RNF10, KCNH2, ME1, and ZIC2 correlated with shorter survival time in the high-risk group than in the low-risk group (Fig. 2C). Moreover, the 6 IMRGs signature predicted significant differences in survival outcomes between C1 and C2. We then analyzed the expression profile of the 6 IMRGs in the two clusters. Except FMO3, the expression of SLC11A1, RNF10, KCNH2, ME1, and ZIC2 in C2 were significantly higher than in C1 (Fig. 2D). Thus, SLC11A1, RNF10, KCNH2, ME1, and ZIC2 were considered as the hazard indexes and FMO3 as the protective index for construction of an independent prognostic IMRGs signature.
According to the calculation of the risk scores of 6-IS in each sample, we depicted the risk score plot, survival status, and expression profiles of the 6 IMRGs in patients from training set. We found that HCC patients with high-risk scores had higher mortality rates than those with low risk scores, and the changes in expression of 6 IMRGs with increased risk score revealed that SLC11A1, RNF10, KCNH2, ME1, and ZIC2 as hazard index and FMO3 as protective index (Fig. 3A). In the ROC analysis, area under ROC curve (AUC) for the 6-IS signature was 0.80 for 1 year, 0.82 for 3 years, and 0.84 for 5 years, indicating a high prognostic prediction accuracy of the 6-IS signature (Fig. 3B). In the training cohort, patients were divided into high and low risk groups. KM analysis based on 6-IS showed that the OS of low risk groups was significantly better than that of high risk group (Fig. 3C). These results showed that the 6-IS could serve as an independent signature predicting the survival outcomes of HCC patients in the validation set (Supplementary Fig. 4A), GSE15654 (Supplementary Fig. 4B), and HCCDB18 sets (Supplementary Fig. 4C). Taken together, these findings show that the 6-IS could effectively predict the prognosis of HCC patients.
Association of 6-IS signature with clinical features and molecular characteristics of HCC
KM analysis showed that the clinical features, including alpha-fetoprotein (AFP) (p = 0.02871), stage (p = 3e − 05), T (p = 2e − 05), N (Lymph node) (p = 0.02519), and M (Metastasis) (p = 0.00223) could divide the HCC patients of training set based on OS analysis (Supplementary Fig. 5). Next, we predicted the OS of HCC patients using the 6-IS signature according to the above clinical features (AFP > 20, AFP < = 20, T, N, M, Stage I + II, and Stage III + IV). Consistently, we found that 6-IS signature could distinguish the low-risk group from high-risk group. This analysis also revealed that HCC patients in the high-risk groups had significantly shorter survival time than those in low-risk group (Fig. 4). Univariate Cox and multivariable Cox analyses were performed to verify the prognostic value of 6-IS in HCC patients from TCGA and HCCDB18 databases. Thus, we concluded that 6-IS is an independent prognostic marker associated with survival when treated as continuous variable both in TCGA (P = 2.97E-09 and P = 0.0049) and HCCDB18 (P = 0.032 and P = 0.0453) sets (Table 1). In subsequent analyses, we calculated immune, stromal and estimate scores for each sample from TCGA. Except stromal score, immune and estimate scores of high-risk group were significantly higher than those of the low-risk group (Supplementary Fig. 6A). Similar results were obtained in the HCCDB18 dataset (Supplementary Fig. 6B). We then compared expression of IMRGs in samples from TCGA, HCCDB18, and GSE15654 sets using GSEA analysis. An ssGSEA value was obtained which was used to infer the association between 6-IS risk score and biological function. Most of the biological functions were negatively associated with 6-IS risk score, such as glyoxylate and dicarboxylate metabolism, drug metabolism cytochrome p450, and beta alanine metabolism. In contrast, biological functions related to tumorigenesis, including glycerophospholipid metabolism, fatty acid metabolism, cell cycle, and RNA degradation were positively linked to the 6-IS risk score (Supplementary Fig. 7A). The clustering heatmap based on ssGSEA values significantly revealed the biological pathways positively or negatively correlated with 6-IS risk scores (Supplementary Fig. 7B).
Table 1
Univariate and multivariable Cox analyses to identify prognostic-related clinical factors.
Variables | Univariate analysis | Multivariable analysis |
HR | 95%CI of HR | P value | HR | 95%CI of HR | P value |
Entire TCGA cohort |
Risk score (High/Low) | 3.025 | 2.099–4.361 | 2.97E-09 | 1.997 | 1.234–3.233 | 0.0049 |
Age | 1.008 | 0.994–1.022 | 0.231 | 1.021 | 1.001–1.041 | 0.0434 |
Gender (Male/Female) | 0.8 | 0.556–1.150 | 0.229 | 0.881 | 0.542–1.43 | 0.6072 |
AFP | 1.748 | 1.105–2.765 | 0.017 | 2.455 | 1.392–4.333 | 0.0019 |
T3/T4 vs T1/T2 | 2.838 | 1.981–4.067 | 1.32E-08 | 1.285 | 0.775–2.13 | 0.3318 |
N1/N2 vs N0 | 1.617 | 1.112–2.349 | 0.012 | 1.030 | 0.554–1.916 | 0.9248 |
M1/MX vs M0 | 1.795 | 1.235–2.608 | 0.002 | 2.972 | 1.611–5.482 | 0.0005 |
Stage Ⅲ/Ⅳ vs Stage Ⅰ/Ⅱ | 2.767 | 1.893–4.046 | 1.51E-07 | 0.851 | 0.585–1.239 | 0.4010 |
G3/G4 vs G1/G2 | 1.069 | 0.736–1.553 | 0.724 | 1.737 | 1.054–2.864 | 0.0304 |
ICGA cohort |
Risk score (High/Low) | 2.088 | 1.066–4.089 | 0.032 | 1.955 | 0.989–3.859 | 0.0453 |
Age | 1.015 | 0.979–1.052 | 0.406 | 1.004 | 0.968–1.041 | 0.8422 |
Gender (Male/Female) | 0.516 | 0.256–1.039 | 0.064 | 0.360 | 0.166–0.782 | 0.0098 |
Stage Ⅲ/Ⅳ vs Stage Ⅰ/Ⅱ | 2.737 | 1.415–5.295 | 0.0028 | 3.462 | 1.711–7.003 | 0.0006 |
Comparison Of 6-is Signature With External Models
Six genes signature [14], eight genes signature [15], six genes-based prognostic signature [16], and four genes signature [17] were used as the external data set for validation tests. These signatures were established to calculate risk score and assess the OS of patients in TCGA using similar methods as in our study. In line with 6-IS, KM analysis showed that all the four models could divide HCC patients into high and low-risk groups, and the high-risk groups had significantly shorter survival time than low-risk groups (Fig. 5A-D). However, except that eight-genes signature (0.813) which showed similar results with our study, ROC analysis revealed that the average AUC value of 1, 3, 5-year from six genes signature (0.613), six genes-based prognostic signature (0.770), and four genes signature (0.686) were lower than that of 6-IS (0.82) (Fig. 5A-D). In the c-index analysis, the 6-IS showed better prognostic ability than other four models (Fig. 5E). Besides, RMST analysis revealed that 6-IS performed better than other four models in the prognostic prediction of HCC patients (Fig. 5F). These results confirmed that the 6-IS is a robust prognostic prediction signature.