Construction of a prognostic gene signature in the TCGA cohort
The TCGA-LIHC dataset contains 374 HCC patients and the ICGC-LIRI-JP dataset contains 260 HCC patients; the clinical characteristics of these patients are displayed in Table S3. As illustrated in Fig. 1a, b, 2107 DEGs were identified in the TCGA cohort, among which 55 genes were OSRGs (Fig. 1c). Thirteen OSRGs were then found to be significantly associated with shorter OS (Fig. 1d). After LASSO Cox regression analysis, an eight-gene prognostic signature was constructed (Fig. 1e, f). The risk score of each patient was calculated using the following formula: risk score = 0.069 × expression level of G6PD + 0.177 × expression level of MT3 + 0.206 × expression level of CBX2 + 0.063 × expression level of CDKN2B + 0.078 × expression level of CCNA2 + 0.164 × expression level of MAPT + 0.248 × expression level of EZH2 + 0.213 × expression level of SLC7A11 (Fig. S1). All eight genes were expressed at higher levels in HCC tissue than in normal liver tissue in the TCGA cohort (Fig. S2). We further validated the expression pattern of the proteins encoded by these genes using clinical specimens in the HPA database. All proteins except SLC7A11 were found to be elevated in HCC tissues; no conclusion could be drawn about SLC7A11 due to a lack of data. G6PD, MT3, and MAPT were located exclusively in the cytoplasm, CDKN2B was located in both the cytoplasm and nucleus, and CBX2, CCNA2, and EZH2 were located in the nucleus (Fig. 1g).
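The risk score formula can be illustrated with a minimal Python sketch. The coefficients are those reported in the text; the function name, the input layout (a mapping from gene symbol to expression level), and the example patient are assumptions made for illustration only.

```python
# Coefficients of the eight-gene signature as reported in the text.
COEFFICIENTS = {
    "G6PD": 0.069,
    "MT3": 0.177,
    "CBX2": 0.206,
    "CDKN2B": 0.063,
    "CCNA2": 0.078,
    "MAPT": 0.164,
    "EZH2": 0.248,
    "SLC7A11": 0.213,
}


def risk_score(expression):
    """Return the coefficient-weighted sum of the eight expression levels.

    `expression` maps gene symbol -> expression level (hypothetical layout).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical patient with every gene at expression level 1.0, so the
# score reduces to the sum of the coefficients.
example_patient = {gene: 1.0 for gene in COEFFICIENTS}
example_score = risk_score(example_patient)
```

Because every coefficient is positive, higher expression of any of the eight genes raises the score in this sketch.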
Validation of the prognostic gene signature
We then classified the patients into high-risk and low-risk subgroups according to the median cut-off value in the TCGA and ICGC cohorts (Fig. 2a). Compared with the low-risk group, patients in the high-risk group had a higher mortality rate (Fig. 2c). In line with this, the K-M curve revealed a significantly shorter OS in the high-risk group (Fig. 2e). The expression heatmap of the eight genes is displayed in Fig. 2g. Time-dependent ROC curves were used to assess the predictive accuracy of the prognostic gene signature. The area under the ROC curve (AUC) at one, two, and three years was 0.79, 0.75, and 0.73, respectively, which confirmed the effectiveness of this prognostic model (Fig. 2i). We then verified the robustness of the prognostic gene signature using the independent ICGC cohort. The results of patient distribution, K-M curve analysis, and AUC analysis were in line with those of the TCGA cohort, further supporting the robustness of the prognostic gene signature (Fig. 2b, d, f, h, j).
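The median-based stratification and the K-M estimate described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and toy data are hypothetical, and the handling of scores exactly equal to the median (assigned here to the low-risk group) is an assumption, since the text does not specify it.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up durations; events: 1 if death was observed, 0 if censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy data for illustration only.
groups = split_by_median({"p1": 0.5, "p2": 1.2, "p3": 0.9, "p4": 2.0})
curve = kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])
```

In practice, one curve would be estimated per risk group and the two compared with a log-rank test, which this sketch omits.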
Evaluation of the independent prognostic value of risk score and nomogram establishment
We entered the clinicopathological characteristics and risk scores into univariate and multivariate Cox regression analyses to identify independent prognostic predictors. Univariate Cox analysis of the TCGA cohort revealed that stage, T, M, and risk score were significantly correlated with OS (Fig. 3a). In the ICGC cohort, univariate Cox analysis suggested that female gender, stage, and risk score were strongly associated with OS (Fig. 3c). After multivariate Cox analysis, risk score emerged as the only independent prognostic factor in both cohorts (Fig. 3b, d). We then constructed nomograms combining clinical variables and risk scores to calculate the total score of each patient with HCC, which could be used to predict one-, two-, and three-year OS (Fig. 3e, f). The calibration curves showed excellent predictive accuracy of the nomograms (Fig. 3g, h).
Immune infiltration and ICI therapy prediction
Next, we employed the CIBERSORT algorithm to assess immune cell infiltration in each HCC sample (Fig. S1). As illustrated in Fig. 4a, memory B cells, activated memory CD4+ T cells, follicular helper T cells, and M0 macrophages were more abundant in the high-risk group than in the low-risk group. However, naïve B cells, CD8+ T cells, resting memory CD4+ T cells, monocytes, and eosinophils were reduced in the low-risk group. In addition, there was a positive correlation between the expression levels of immune checkpoints and risk scores (Fig. 4b). TIDE analysis revealed that the risk scores and TIDE scores were negatively correlated (Fig. 4c). Consistently, the response rate to ICI therapy was expected to be higher in HCC patients with high risk scores (62% vs. 37%) (Fig. 4d).
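A correlation such as the one between risk scores and TIDE scores can be quantified with a rank correlation coefficient. The sketch below computes a Spearman coefficient in plain Python; the choice of Spearman is an assumption, since the text does not state which coefficient was used, and the function names and toy values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: the second variable falls as the first rises.
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 8.0, 5.0, 1.0])
```

A coefficient near -1 indicates a strong negative monotonic relationship, the direction reported for risk scores and TIDE scores.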
Associations of TMB with risk score in HCC patients
Next, we investigated the associations of TMB with the risk score. As mentioned before, we stratified HCC patients in the TCGA cohort into high- and low-risk groups. The comprehensive mutation data for each group were represented using a waterfall plot (Fig. 5a). The five most frequently mutated genes were TP53 (44%), CTNNB1 (24%), TTN (23%), MUC16 (18%), and APOB (11%) in the high-risk group, while CTNNB1 (27%), TTN (23%), ALB (13%), TP53 (12%), and MUC16 (11%) were the top five in the low-risk group. In addition, the overall mutation rate was 86.71% in the high-risk group and 82.49% in the low-risk group. As displayed in Fig. 5b, the TMB scores were positively correlated with the risk scores. The K-M curve analysis revealed that patients in the high-TMB group had shorter OS than patients in the low-TMB group (Fig. 5c).
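The per-gene mutation frequencies summarized in a waterfall plot can be derived by counting, for each gene, the share of samples in a group that carry at least one mutation in it. The following is a minimal sketch under that assumption; the input layout (sample ID mapped to a set of mutated genes), the function name, and the toy samples are hypothetical.

```python
from collections import Counter


def top_mutated_genes(sample_mutations, n=5):
    """Return the n most frequently mutated genes as [(gene, percent of samples)].

    sample_mutations maps sample ID -> set of mutated gene symbols.
    Ties are broken alphabetically by gene symbol.
    """
    total = len(sample_mutations)
    counts = Counter(
        gene for genes in sample_mutations.values() for gene in set(genes)
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, 100 * count / total) for gene, count in ranked[:n]]


# Hypothetical toy group of four samples, one of them without any mutation.
toy_group = {
    "s1": {"TP53", "TTN"},
    "s2": {"TP53"},
    "s3": {"CTNNB1"},
    "s4": set(),
}
top_two = top_mutated_genes(toy_group, n=2)
```

Running the same function separately on the high- and low-risk groups would give the two ranked lists that the text compares.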
Analysis of the correlation between risk score and chemotherapy sensitivity
First, we estimated the chemotherapeutic response using the drug activity and transcriptomic data of the CGP cell lines. HCC patients in the high-risk group were sensitive to 45 chemotherapy and targeted therapy drugs and resistant to 44 other drugs, as indicated by the variation in IC50 (Fig. 6a, b). For each prognostic gene in the gene signature, the corresponding expression level was analyzed in the transcriptomic data of the NCI-60 cell lines. The top two FDA-approved drugs with the strongest positive or negative association with each gene are shown in Fig. 7. Elevated expression of G6PD, MT3, CBX2, CDKN2B, CCNA2, MAPT, and EZH2 was associated with increased resistance to drugs including mitomycin, teniposide, 6-thioguanine, and fulvestrant. By contrast, elevated CBX2 and SLC7A11 expression was associated with increased sensitivity to dasatinib, ixazomib citrate, and arsenic trioxide.
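The way IC50 values translate into sensitivity or resistance calls can be sketched as below. A lower IC50 indicates greater sensitivity, so a drug is labeled "sensitive" for the high-risk group when its median estimated IC50 is lower there than in the low-risk group. This is a deliberately simplified illustration: the comparison of medians without a statistical test, the function name, the input layout, and the drug names are all assumptions and do not reproduce the analysis in the study.

```python
from statistics import median


def classify_drugs(ic50_high, ic50_low):
    """Label each drug from the perspective of the high-risk group.

    ic50_high / ic50_low map drug name -> list of estimated IC50 values
    for patients in the high- and low-risk groups, respectively.
    Only drugs present in both mappings are classified.
    """
    labels = {}
    for drug in sorted(set(ic50_high) & set(ic50_low)):
        high_median = median(ic50_high[drug])
        low_median = median(ic50_low[drug])
        if high_median < low_median:
            labels[drug] = "sensitive"   # lower IC50 in the high-risk group
        elif high_median > low_median:
            labels[drug] = "resistant"   # higher IC50 in the high-risk group
        else:
            labels[drug] = "no difference"
    return labels


# Hypothetical drugs and IC50 estimates for illustration only.
labels = classify_drugs(
    {"drug_A": [1.0, 1.5, 2.0], "drug_B": [5.0, 6.0, 7.0]},
    {"drug_A": [3.0, 3.5, 4.0], "drug_B": [2.0, 2.5, 3.0]},
)
```

A fuller analysis would test each drug for a significant difference between groups and correct for multiple comparisons before counting sensitive and resistant drugs.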