1.Identification of differentially expressed PRGs.
First, the transcriptional profiles of 90 PRGs were investigated in the TCGA mRNA-seq data for 50 normal tissues and 374 HCC tissues. A total of 65 differentially expressed PRGs were screened out (All p<0.05). Among them, 58 genes were enriched in HCC tissues while the 7 other genes were downregulated compared to normal tissues. The mRNA expression levels of these genes are presented in the form of volcano map and heatmap (Figure 1A, B). The correlation network of these differentially expressed PRGs was shown in the Figure 1C.
2.Classification of HCC based on differentially expressed PRGs
A consistent cluster analysis using the data of 374 HCC cases from TCGA database was preformed to explore the association between the transcriptional profiles of 65 PRGs and HCC subtypes. We increased the clustering variable (k) from 2 to 9 and found that the intra-group correlations were highest and the inter-group correlations were low when k=2, suggesting that 374 HCC patients could be divided into two clusters according to the expression of 65 PRGs (Figure 2A). Then we preformed PCA analysis to verify the discrete distribution of the two clusters (Supplementary Figure 1). Kaplan–Meier curves showed that OS of cluster 1 was significantly better than that of cluster 2 (Figure 2B p=0.004872). Heatmap was drawn to show the relationship between the expression profiles of 65 PRGs and clinical features, such as tumor size (T1-4), lymph node metastasis(N0-1), distant organ metastasis(M0-1), clinical stage (stage I-IV), histologic grading (G1-G4), age (≤60 or >60 years), gender (female or male) and survival status (alive or dead) (Figure 2C). It could be observed that there were significant differences in tumor size and clinical stage between the two clusters (Figure 2D, Figure 2E, both p<0.001).
3.Construction of prognostic risk model according to PRGs
HCC samples with incomplete clinical information were removed and survival-related PRGs were screened by univariate COX regression analysis. The results revealed that a total of 11 PRGs (CASP8, CHMP3, HSP90AA1, BAK1, HSP90AB1, GSDMC, GSDME, STK4, CGAS, NOD2) were identified correlated with OS (p<0.01) and retained for further analysis (Figure 3A). LASSO-Cox regression analysis was preformed to construct an 8 genes prognostic model (HSP90AA1, CHMP4B, BAK1, HSP90AB1, GSDMC, GSDME, STK4, NOD2) based on the optimal value of penalty parameter (λ) (Figure 3B, C), the expression of which were upregulated in tumor tissue and correlated with poor prognosis according to Gene Expression Profiling Interactive Analysis (GEPIA) (Supplementary Figure 2). The risk score was calculated as follows: Risk Score = (0.309*HSP90AA1exp.) + (0.131*CHMP4B exp.) + (0.051*BAK1 exp.) + (0.006*HSP90AB1 exp.) + (0.173*GSDMC exp.) + (0.236*GSDME exp.) + (0.066*STK4 exp.) + (0.107*NOD2 exp.). Based on the median value of risk score calculated by the formula, HCC patients in TCGA cohort were divided into low-risk and high-risk subgroups (Figure 3D). The t-SNE analysis and PCA showed that patients with different risks were well distributed into two directions (Figure 3E). In addition, the scatter plot shows that high-risk patients had more deaths and shorter survival time than low-risk patients (Figure 3F). Consistently, the Kaplan-Meier curve showed that high-risk patients had significantly poorer OS than low-risk patients (Figure 3G). Time-dependent ROC curves were generated for evaluation of the sensitivity and specificity of the prognostic model and the area under the curve (AUC) reached 0.717 at 1 year, 0.665 at 2 years, and 0.658 at 3 years (Figure 3H).
4.External Validation of the 8-Gene risk Signature
As the external validation set for the prognostic model, patients in the ICGC cohort were also divided into high-risk or low-risk groups based on the median risk score calculated by the same equation (Figure 4A). PCA and t-SNE analyses confirmed a satisfactory separation of patients between the two subgroups (Figure 4B, C). As shown in Figure 4D to 4F, the OS and ROC analyses of these two subgroups showed similar results to the TCGA cohort.
5. Prognostic Value of the Prognostic Risk Model
Univariate and multivariate Cox regression analyses were preformed to assess independent predictive value of the risk scores derived from 8-genes model for OS of HCC patients. Univariate Cox regression analysis revealed that risk score was negatively correlated with prognosis, with a hazard ratio 3.303 (Figure 5A, p<0.001). Further, multivariate analysis showed that risk score was one of independent factors for predicting the prognosis in HCC (Figure 5B, p<0.001). Data from the ICGC validation set also confirm this result (Supplementary Figure 3). In addition, heatmap of PRGs expression and clinical characteristics based on the TCGA data showed significant differences in age, survival status, tumor size, and clinical stage between the low-risk and high-
risk subgroups (Figure 5C). Correspondingly, the risk scores in histologic grading 3-4 and clinical stage III-IV were significantly elevated compared to histologic grading 1-2 and clinical I-II (Figure 5D, p=0.00015, p=0.013). The area under the ROC curve showed good predictive accuracy of the PRGs model for prognosis in HCC, and could provide more accurate predictive value when combined with tumor stage (Figure 5E, AUC=0.733).
6. Expression and Mutation profiles of Prognostic Genes Between HCC Tissues and Para-carcinoma Tissues.
Immunohistochemistry from the Human Protein Atlas (HPA) database for the prognostic PRGs showed that the expression in cancer tissues were mostly higher than that of normal tissues (Figure 6A). Furthermore, we analyzed the expression and mutation profiles of the 8 PRGs based on the HPA database and cBioProtal database. Data form the 353 samples with mutation and Copy Number Alterations (CNA) showed that 67 patients (19%) had heterogeneous mutations (Figure 6B). Thereinto, 0.57% patients had deep deletion, 0.85% patients had missense mutation and 0.28% patients had truncating mutation in HSP90AA1; 0.85% patients had amplification while 0.57% patients had structural variant in CHMP4B; 1.42% patients had amplification BAK1; 3.12% patients had amplification, 0.28% patients had truncating mutation and 0.57% patients had missense mutation in HSP90AB1; 10.2% patients had amplification, 1.13% patients had missense mutation and 0.28% patients had deep deletion in GSDMC; 0.28% patients had amplification and truncating mutation respectively, while 0.57% patients had missense mutation in GSDME; 0.28% patients had missense mutation STK4; 0.28% patients had amplification and deep deletion respectively, while 1.42% patients had missense mutation in NOD2.
7.Analysises of Biological Function and Pathway.
In order to further explore the biological processes and pathways through which the 8-gene signature affect the prognosis in HCC, GSEA were performed to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses between the high- and low-risk groups. 20 GO functions and 20 KEGG pathways were respectively enriched in the high-risk group with the FDR value q and normalized value p<0.05. The results turned out that some functions and pathway involved in regulation of cell cycle phase transition were enriched in the high-risk group. And in addition to this, several tumor-associated pathways were significantly enriched in high-risk group, including apoptosis, cell cycle, WNT, TGF-β, VEGF and MAPK signaling pathway. Interestingly, the high-risk group was significantly correlated with some pathways related with immune response, including FC-γ receptor mediated phagocytosis, nod like receptor signaling pathway, T cell receptor signaling pathway, B cell receptor signaling pathway, toll like receptor signaling pathway, natural killer cell mediated cytotoxicity, mTor signaling pathway (Figure 7A and 7B).
8. Comparison of Immune status between subgroups
Since the relevance between the risk score and immune response, we further compared the immune infiltration score and immune-related function between the high- and low-risk groups in both the TCGA and ICGC cohorts by using ssGSEA. The results showed that the high-risk group was associated with higher levels of immune cell infiltration, including aDCs, DCs, iDCs, pDCs, Macrophages, CD8+ T cells, Th1 cells, Th2 cells, TIL, Treg and Tfh in the TCGA cohort (Figure 8A) as well Treg, aDCs, DCs, iDCs, pDCs, Macrophages, Neutrophils, CD8+ T cells, T helper cells, TIL, Treg, Th1 cells and Th2 cells in the ICGC cohort (Figure 8B). In addition, data from the TCGA cohort suggested that the high-risk group had higher scores in terms of immune checkpoint, inflammation-promoting, APC co-inhibition, APC co-stimulation, T cell co-stimulation, T cell co-inhibition, HLA and MHC class I. Similar results were obtained for data from ICGC cohort. The correlation analysis suggested that risk score was also significantly positively correlated with immune score and stromal score in ICGC cohort (p<0.001), while correlated with immune score in TCGA cohort (p= 0.0011).
We further analyzed the correlation between the risk score and the expression level of immune checkpoints in TCGA cohort. The results suggested that risk score was significantly positively correlated with the expression levels of B7-H3, B7H4, TIM-3, LAG3, CTLA-4, BTLA, PD-L1, PD- L2 and PD-1 (Figure 9), which combined with TIDE scores implied that patients in high-risk group might be more sensitive to immune checkpoint blockade therapies (Supplementary Figure 4, p<0.0001).