3.1 Study Process and Patients' Information
The entire work of construction process of prediction model for HCC patients was presented in Fig. 1A, and the clinical information from training cohort, internal testing cohort and external testing cohort was summarized in Table 1.
Table 1
Summary of patient demographics and clinical characteristics in training cohort and testing cohort.
Characteristic
|
Training cohort
|
Internal testing cohort
|
External testing cohort
|
Gender
|
|
|
|
male
|
187(67%)
|
89(63%)
|
182(75%)
|
female
|
89(33%)
|
55(37%)
|
61(25%)
|
Age
|
|
|
|
< 60
|
129(46%)
|
59(40%)
|
45(19%)
|
>=60
BMI
< 25
>=25
Vascular tumor cell
Yes
No
Grade
1 + 2
3 + 4
|
147(56%)
144(53%)
132(47%)
94(34%)
182(66%)
161(64%)
165(36%)
|
85(60%)
69(49%)
75(51%)
44(30%)
100(70%)
82(59%)
62(41%)
|
198(81%)
-
-
-
-
-
-
|
Stage
|
|
|
|
stage I-II
|
223(74%)
|
120(75%)
|
146(60%)
|
stage III-IV
|
53(26%)
|
24(25%)
|
97(40%)
|
Vital
|
|
|
|
living
|
244(68%)
|
125(68%)
|
199(82%)
|
Dead
|
126(32%)
|
59(32%)
|
44(18%)
|
Risk group
|
|
|
low risk
|
185(50%)
|
96(52%)
|
123(51%)
|
high risk
|
185(50%)
|
88(48%)
|
120(49%)
|
3.2 Establishment of the four-gene-based prognostic gene signature
510 iron metabolism-related genes were identified from the 13 gene sets, and 481 overlapping iron metabolism-related genes were obtained between training cohort and external cohort. A total of 37 DEGs were obtained when compared patients who survived < 1 year and more than 3 years using training set in the TCGA database (Fig. 1B). 370 patients with complete survival time from training cohort were included for subsequent univariate Cox proportional hazards regression analysis, 25 genes were significantly associated with OS(Fig. 2C,D). Then the LASSO analysis was performed to narrow the gene range. 13 genes were identified and subsequently used to construct a gene signature. Risk-formula: Risk score = expression of CYP3A5*-0.00144471228429986 + expression of CCNB1*0.00932548817179557 + expression of ABCB6*0.115364054791514 + expression of FLVCR1*0.00981797872412028 + expression of OSBP2*0.0960318647350786 + expression of G6PD*0.00325860881257371 + expression of RAP1GAP*0.00568586514806877 + expression of SLC7A11*0.0217165375535805 + expression of PPAT*0.196619205112227 + expression of CYP2C9*-0.000726941457117346 + expression of PLOD2*0.010247507631253 + expression of IGSF3*0.000979749106599136 + expression of RRM2*0.00176828844166581. The interaction network among these genes was presented in Fig. 2E and the correlation between these genes is presented in Fig. 1F. We then calculated the risk score for each patient in training cohort and median risk score was set as cut-off for risk score, 370 patients in training cohort were divided into high risk (185)and low risk (185)group, 184 patients from internal testing cohort were divided into high risk (86)and low risk (98)group, 243 patients from external testing cohort were divided into high risk (185) and low risk (58)group.
3.3 Diagnostic Values of iron metabolism-related signature
In training cohort, Kaplan–Meier curve, time-dependent ROC and PCA analysis were performed to assess the integrated effects of iron metabolism-related signature on the prognosis(Fig. 2). Patients in high-risk group shown significantly poorer OS than patients in the low-risk group(Fig. 2B, P < 0.001), the expression level of 11 prognostic genes increased with risk score increased, including OSBP2, SLC7A11, IGSF3, ABCB6, FLVCR1, PPAT, RAP1GAP, PLOD2, G6PD, CCNB1 and RRM2. While CYP3A5 and CYP2C9 increased with risk score decreasing(Fig. 2A). The area under the ROC curve (AUC) for 1-year, 2-year, and 3-year OS were 0.828, 0.760, 0.739(Fig. 2C). PCA analysis indicated the patients in different risk groups were distributed in two and three directions(Fig. 2D,E), Therefore, the signature have demonstrated a good performance.
3.4 Further Validation of the iron metabolism-related signature using Independent internal testing cohort and external testing cohort
In order to verify the reliability of the 13-Gene Signature, similar procedures were performed in internal testing cohort(Fig. 3), external testing cohort(Fig. 4). Patients in the high-risk group indicated significantly poorer OS than patients in the low-risk group in internal testing cohort (Fig. 3B)and external testing cohort (Fig. 4B)(all P < 0.001). The AUC for 1-year, 2-year, and 3-year OS were 0.807, 0.817, 0.774 in internal testing cohort(Fig. 3C); 0.734, 0.743, 0.743 in external testing cohort(Fig. 4C). PCA analysis indicated displayed a different distribution pattern of high and low risk according to iron metabolism-related signature in internal testing cohort(Fig. 3D,E) and external testing cohort(Fig. 4D,E).
3.5 Independent prognostic role of the gene signature from the Other Clinicopathological Features
Iron metabolism-related signature could be considered as an independent factor in predicting OS. Univariate and Multivariate Cox regression analysis were conducted to assess independent predictive values. In training cohort, Univariate Cox regression analysis indicated that BMI, TNM stage and signature had a prognostic value(Fig. 2F), Multivariate Cox regression analysis revealed that BMI, TNM stage and signature were independent prognostic factor correlation with worse OS(Fig. 2G). In internal testing cohort, Univariate Cox analysis suggested that BMI, TNM stage and signature were associated with OS(Fig. 3F), Multivariate Cox regression analysis showed that BMI, TNM stage and signature were independent prognostic factor associated with OS(Fig. 3G). In external testing cohort, Univariate Cox analysis suggested that gender, TNM stage and signature had a prognostic value(Fig. 4F), Multivariate Cox regression analysis indicated that gender, TNM stage and signature were independent prognostic factor associated with OS(Fig. 4G). These results confirmed that gene signature is an independent predictor of prognosis in HCC patients.
3.6 Validation of expression level of the four genes and risk score in different subgroups
As showed in Fig. 5A-M, the expression level of ABCB6, CCNB1, FLVCR1, G6PD, IGSF3, OSBP2, PLOD2, PPAT, RAP1GAP, RRM2 and SLC7A11 was significantly higher in high risk group compared with low risk group in training cohort, The expression level of CYP3A5 and CYP2C9 was significantly lower in high risk group compared with high risk group in training cohort. Moreover, the risk score in tumor was obviously higher in than normal liver(Fig. 5N), the AUC was 0.932 (95%CI: 0.919–0.945)(Fig. 5O). In addition, the heatmap showed the expression of the nine genes in high- and low-risk patients in the TCGA dataset. It showed that high- and low-risk group associated with age, sex, survival status, grade and stage(Fig. 6A).
3.7 Subgroup Analysis of OS by clinical variables
We performed survival analysis stratified by TNM stage in training cohort(Fig. 6C,D), internal testing cohort(Fig. 6F,G) and external testing cohort(Fig. 6I,J). Risk score grouped based on 13-gene signature, the log-rank test indicated that HCC patients in high risk group still had obviously worser OS than patients in low risk group in different subgroups in stage I-II and stage III-IV. Furthermore, as the TNM stage and tumor grade increased, the risk score increased in training cohort(Fig. 6B), internal testing cohort (Fig. 6E) and external testing cohort(Fig. 6H).
3.8 Comparing the performance of the gene signature with other gene signatures
In recent years, constructing prognostic gene signature for survival prediction has been noted in many studies. Thus, we further compared the gene signature with other gene signatures, the AUC of a seven gene signature(KIF18B, CEP55, CIT, MCM7, CDC45, EZH2, and MCM5) is 0.70, 0.69, 0.67 at 1, 3 and 5-year,[15] the AUC of a three gene signature(UPB1, SOCS2 and RTN3) is 0.72, 0.70, 0.60 at 1, 3 and 5‐year,[16] the AUC of a four gene signature(PBK, CBX2, CLSPN, and CPEB3) is 0.71 at 1-year, 0.66 at 3 ‐year,[17] the AUC of a three eight signature(DCAF13, FAM163A, GPR18, LRP10, PVRIG, S100A9, SGCB, and TNNI3K) is 0.71, 0.69 at 3 and 5‐yea.[18] Peng‐Fei Chen et al[19] have revealed a four-gene signature(KPNA2, CDC20, SPP1, and TOP2A), however, the study lacks of systematic analysis of the four-gene signature, including Univariate and Multivariate Cox regression analysis, time-ROC analysis. Gao‐Min Liu et al[20] have reported a four-gene signature(ACAT1, GOT2, PTDSS2, and UCK2), the AUC of gene signature is 0.70, 0.71, 0.68 at 1-year, 3-year, 5-year, and the AUC of Nomogram is 0.75, 0.75, 0.68 at 1-year, 3-year, 5-year. Our gene signature revealed that the AUC of gene signature is 0.78, 0.68, 0.67 at 1,3,5-year, and the AUC of our Nomogram is 0.78, 0.75, 0.73 at 1,3,5-year. Thus, it indicates that our model has a high sensitivity and specificity for predict the survival, with strong prediction ability of short-term.
3.9 Functional Annotation for Genes of Interest
HCC patients were divided into high- and low-risk groups according to median risk score. We identified 936 up-upregulated DEGs and 162 down-regulated DEGs(corrected P-value < 0.05 and absolute log fold change (FC) > 1), Go enrichment analysis revealed that DEGs were mainly involved in mitotic nuclear division, nuclear division, chromosome segregation, nuclear chromosome segregation(Fig. 7A). KEGG enrichment analysis revealed that DEGs were mainly involved in Cell cycle, ECM − receptor interaction, Glycolysis / Gluconeogenesis and so on(Fig. 7B).
Moreover, we conducted GSEA between high and low risk group compared with the median level of risk score to identify the significant pathways (NOM P-value < 0.05), the genes in high risk group were mainly enriched in cell cycle related pathways, including KEGG_CELL_CYCLE, KEGG_OOCYTE_MEIOSIS, KEGG_PURINE_METABOLISM, KEGG_PYRIMIDINE_METABOLISM, KEGG_RNA_DEGRADATION. As to low risk group, the genes were enriched in metabolic pathways, such as KEGG_DRUG_METABOLISM_CYTOCHROME_P450,KEGG_FATTY_ACID_METABOLISM, KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM, KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS,KEGG_TRYPTOPHAN_METABOLISM(Fig. 7C).
To furtherly investigate the potential mechanisms of Risk score in HCC, WGCNA method was performed to cluster genes that highly correlated with the high risk and low risk group. 9 was selected as the soft thresholding power to produce a weighted network(Fig. 8A). We identified 9 modules (Fig. 8B,C) in the study(excluding a gray module). As showed in Fig. 8D, we found genes in blue module was highly positively associated with Risk score. Go enrichment analysis was performed to explore the potential mechanisms of genes in blue module using online-based web tool “Metascape” (http://metascape.org/), as showed in Fig. 8E and Fig. 8F, genes in the blue module were significantly enriched in metabolic process and cell cycle-related process.
3.10 Predictive immunotherapy response of identified HCC subtypes
We employed CIBERSORT algorithm to identify tumor infiltrating immune subsets proportions in HCC, and 21 kinds of immune cell profiles in HCC samples were constructed and showed in Fig. 9A,B. The difference analysis revealed that almost all types of immune cells (excluding B. cells. naive )were significantly different between high- and low-risk group(Fig. 9C). To further explore the correlation between the risk score and immune status, the enrichment scores of ssGSEA associated with functions or pathways were quantified, Cytolytic_activity, MHC_class_I, Type_I_IFN_Reponse and Type_II_IFN_Reponse were significantly different between the low risk and high risk group(Fig. 9D). Finally, ImmuneCellAI ( http://bioinfo.life.hust.edu.cn/web/ImmuCellAI/) was performed to predict immunotherapy response, Fig. 9E and Fig. 9F suggested that patients in the high-risk group might present with a better response for immunotherapy.
3.11 Difference of immune checkpoints expression between 2 HCC subtypes
We performed a correlation analysis between risk score and immune checkpoints, the result revealed risk score positively correlated with immune checkpoints, including PD-L1(CD274)(Fig. 10A), CTLA-4 (Fig. 10B) and TIM-3 (HAVCR2) (Fig. 10C) and TIGIT(Fig. 10D). Expression of PD-L1(Fig. 10E), CTLA-4(Fig. 10F), TIM-3(HAVCR2)(Fig. 10G) and TIGIT(Fig. 10H) were significantly higher in high-risk group versus low-risk group in TCGA.
3.12 Building and validating a Nomogram to Predict OS
Univariate and Multivariate Cox regression analysis have revealed that TNM stage and gene signature were independent prognostic factor in training cohort and testing cohorts. Thus TNM stage and signature were used to build a nomogram to predict 1-year, 3-year, and 5-year OS in the TCGA cohort(Fig. 11A). The calibration curve analysis confirmed the reliability of the nomogram, which demonstrated that 1-, 3-, and 5-year survival prediction probabilities were closely related to the observed survival probability(Fig. 11B,C,D), and the C-index for the nomogram is 0.746342(0.708523–0.784161), which was more large than Signature(0.726819(0.680894–0.772744)) and Stage(0.664469(0.589597–0.73934))(Table 2). Time- ROC curve analysis was applied to evaluate the prediction value of nomogram, all of AUCs of the nomogram was presented in Fig. 11E in training cohort(AUC of 1-year, 3-year, 5-year), in Fig. 11F in internal testing cohort (AUC of 1-year, 3-year, 5-year is 0.808, 0.813, 0.858), in Fig. 11G in external testing cohort(AUC of 1-year, 3-year, 5-year is 0.844, 0.781, 0.7830) .
Table 2
The C-index of Signature, Stage and Combined model.
Biomarker
|
C-index
|
95%CI Lower
|
95%CI Higher
|
P-value
|
Signature
|
0.726819
|
0.680894
|
0.772744
|
3.67E-22
|
Stage
|
0.664469
|
0.589597
|
0.73934
|
1.67E-05
|
Combined model
|
0.746342
|
0.708523
|
0.784161
|
4.65E-12
|
3.13 Expression level of genes of risk score in clinical samples
Finally, we compared expression level of genes in risk score between normal and cancer tissues using gene expression profile in TCGA and ICGC, Six genes were significantly different between normal and cancer tissues in TCGA (Fig. 12A-F) and ICGC(Fig. 12G-L). Mrna expression level of G6PD, CCNB1, FLVCR1, ABCB6, RRM2 were significant higher in cancer tissues than normal tissues. However, Mrna expression level of CYP2C9 was significant lower in cancer tissue than normal tissues. In addition, The six genes were significant associated with overall survival (OS), disease-free survival (DFS), disease specific survival (DSS) and progression free survival (PFS) by performing Kaplan-Meier(https://kmplot.com/analysis/). Furthermore, we performed qRT-PCR to detect the expression level in HCC tissues and adjacent samples(Fig. 12M-R). It revealed the same result, Mrna expression level of G6PD, CCNB1, FLVCR1, ABCB6, RRM2 were significant higher in cancer tissue than normal tissue, and Mrna expression level of CYP2C9 was significant lower in cancer tissue than normal tissues.