Identification of DE-PCGs and -lncRNAs in HCC
With the aim of identifying PCGs and lncRNAs potentially involved in the HCC pathogenesis, in the training set of 295 (260 HCC and 35 healthy subjects), transcriptional profiles of HCC patients and healthy subjects were compared. After a robust filtering procedure (P ≤ 0.05 and |logFC| ≥ 1), 1916 PCGs were significantly modulated (Fig. 1A; Supplementary Table S2). More than one hundred lncRNAs were detectable, of which 80.6% were up-regulated and 19.4% were down-regulated (Fig. 1B; Supplementary Table S3). The DE-PCGs and -lncRNAs with similar expression levels were clustered using the systematic cluster analysis (Figs. 1C and D). As displayed in Table 1, the most up-regulated PCG was AKR1B10 with 3.84-logFC, and the most down-regulated PCG was HAMP with − 5.90-logFC. Besides, the most up- and down-regulated lncRNA were ST8SIA6-AS1 (2.60-logFC) and LINC01093 (-3.60-logFC), respectively. The expression patterns of the top 10 up-regulated and down-regulated PCGs and lncRNAs were illustrated in Figs. 1E and F (Supplementary Tables S4 and S5).
Statistical Analysis of All of the Differentially Expressed PCGs and lncRNAs
No. of Upregulated
No. of Downregulated
The Most Upregulated (logFC)
The Most Downregulated (logFC)
|PCG: protein-coding gene;DE RNAs: Differentially expressed RNAs|
Identification of prognostic m6A methyltransferase‑related lncRNAs
Transcription data of 260 patients with full clinical information were retrieved from the TCGA-HCC training cohort. The 45 prognostic-related lncRNAs were screened out through K–M survival analyses (Fig. 2A; Table 2; Supplementary File 1). Furthermore, the m6A methyltransferase scores in each sample were calculated via ssGSEA according to the reference of the DE-m6A methyltransferase (METTL3, VIRMA, IGF2BP1, and IGF2BP2) gene sets derived from the intersection analysis of 20 m6A methyltransferase genes and DE-PCGs (Fig. 2B). After measuring with Spearman correlation analyses, 21 lncRNAs were recognized as m6A methyltransferase-related lncRNAs (|R| > 0.3, P < 0.05; Fig. 2C). At last, we have got 8 candidate m6A methyltransferase-related lncRNAs (LINC01093, LINC02362, SNHG20, SNHG17, ZFAS1, SNHG6, SNHG7, and GAS5) by the intersection analysis for further research (Fig. 2D).
The 45 prognostic-related lncRNAs
The univariate cox regression analysis of eight m6A methyltransferase-related lncRNAs
HR (95% CI for HR)
Establishment of m6A methyltransferase-related lncRNAs signature
The univariate Cox regression analysis demonstrated that LINC01093 (P = 0.055) and LINC02362 (P = 0.014) were protective genes with hazard ratio (HR) less than 1, whereas SNHG20 (P = 0.0033), SNHG17 (P = 0.019), ZFAS1 (P = 0.0093), SNHG6 (P = 0.0084), SNHG7 (P = 0.034), and GAS5 (P = 0.046) were risky genes with HR greater than 1 (Fig. 3A). Following this, eight significant m6A methyltransferase‑related lncRNAs (P < 0.2) were subjected to LASSO modeling (Figs. 3B and C). Then, three m6A methyltransferase-related lncRNAs were selected based on lambda.min values, and the HR values of the three candidate lncRNAs were calculated (Fig. 3D; Supplementary Table S6).
We stratified patients into low-risk or high-risk groups by median cut-off based on the risk score. K-M survival curve indicated that the low-risk patients lived longer than high-risk patients in the training cohort (P = 0.00064) (Figs. 4A and B). Therefore, time-dependent ROC analysis showed an appropriate accuracy of the risk score in predicting OS in training cohort and area under the ROC curve (AUC) was 0.633 at 1 year, 0.636 at 2 years, 0.651 at 3 years, 0.663 at 4 years, and 0.638 at 5 years (Fig. 4C). Moreover, these prognostic factors were further validated in the testing cohort to assess the robust prediction value of the risk score. We found that results in the testing cohort were consistent with the outcome of the training cohort, indicating all the high-risk patients were associated with poorer prognosis (Fig. 4D and E). In the testing cohort, the significant prognostic value was P = 0.031 and AUC with 1-, 2-, 3-, 4-, and 5 years were 0.708, 0.674, 0.635, 0.603, and 0.611, respectively (Fig. 4F). Also, the PCA analysis indicated that there was a significant disparity in patients between high- and low-risk groups (Supplementary Fig. 3).
Prognostic risk score displayed strong correlations with clinicopathological features in HCC patients
We used the median of the risk score as the cutoff to define the groups of HCC patients with high and low scores. The heatmap showed the expression of the three m6A methyltransferase‑related lncRNAs in high- and low-risk patients of the training cohort. Significant differences between the high-risk and low-risk groups of patients concerning prior malignancy (P < 0.05), T stage (P < 0.05), and pathologic stage (P < 0.05) were identified (Fig. 5A). We then observed that the risk scores were significantly different between patients categorized by prior malignancy (Fig. 5B), T stage (Fig. 5C), and pathologic stage (Fig. 5D). These results revealed that risk scores were positively associated with the T stage and pathologic stage, and the 'no prior malignancy' subgroup exhibiting the worst prognosis had the highest risk score. Stratification survival analyses were utilized to see whether the risk score could apply in different clinicopathological characteristics. Thus, we found that the risk score could efficiently predict the OS in most subgroups from different clinical features (Supplementary Fig. 4). Similar procedures were performed in the testing set, and the results were almost consistent with the training set (Supplementary Figs. 5 and 6).
The risk score was an independent prognostic factor in HCC
Based on the univariate and multivariate Cox proportional hazards regression analyses, only one independent predictor (risk score) was identified in the HCC. Univariate Cox proportional hazards regression analysis demonstrated that age (P = 0.44), gender (P = 0.27), N stage (P = 0.31), prior malignancy (P = 0.33), treatment type (P = 0.21), and BMI (P = 0.21) had no impact upon OS (Fig. 6A; Supplementary Table S7). Multivariate Cox proportional hazards regression analysis demonstrated that the risk score (HR = 2, P = 0.0037) was associated significantly with OS in patients with HCC (Fig. 6B; Supplementary Table S8). Then, the predictor was incorporated to build the nomogram.
According to the nomogram, every patient will get a total point from the prognostic parameter to estimate the 3- and 5-year survival rates. The higher point indicated the higher mortality of the patients (Fig. 6C). We concluded that the risk score played an important role in patient prognosis. Furthermore, calibration curves demonstrated that the prediction accuracy of our model was similar to the ideal model (Figs. 6D and E). The DCA revealed that the nomogram had an advantage of the risk score alone and displaced a high potential for clinical utility (Figs. 6F and G).
Functional enrichment analyses of the DEGs between the high- and low-risk groups
To explore the underlying molecular mechanisms of the signature, we conducted GSEA to compare 454 DEGs between high- and low-risk groups (risk DEGs) in 371 TCGA-HCC patients of the whole set. We enriched the functions of these mRNA (P < 0.05). As shown in Fig. 7A, ‘negative regulation of transcription by RNA polymerase II’ and ‘microtubule cytoskeleton organization’ were significantly enriched for biological processes (Supplementary Table S9),while for the cellular component was ‘Golgi membrane’, for molecular functions were ‘transcription regulatory region sequence-specific DNA binding’ and ‘RNA polymerase II regulatory region sequence-specific DNA binding’ (Figs. 7B and C; Supplementary Tables S10 and S11). KEGG pathway analysis revealed that the DEGs participated in ‘Herpes simplex virus 1 infection’, ‘Amyotrophic lateral sclerosis’, ‘Human papillomavirus infection’, and ‘MicroRNAs in cancer’ (Fig. 7D; Supplementary Table S12). Furthermore, the DEGs participated in the ‘Axon guidance’ pathway indicating their involvement in the neurological invasion of HCC (Fig. 7E; Supplementary Table S13). Moreover, the GSEA also revealed that immune response and immune system process terms/pathways were highly enriched in the risk DEGs (Supplementary Table S14). All these demonstrated the risk score could be related to the tumor immune microenvironment, which was so important for HCC.
The landscape of TME immune cells infiltration in the two risk subgroups
The tumor immune microenvironment is considered the ‘seventh largest representative characteristic’ of tumor and is composed of innate immune cells, adaptive immune cells, cytokines, and cell surface molecules. These immune microenvironment components constitute a complex regulatory network and play a pivotal role in tumorigenesis and development (35, 36). Considering that the risk score also had a close connection with immune reactions, thus we investigated the difference in immune signatures between the two risk subgroups.
Based on the ImmuneScore, StromalScore, and ESTIMATEScore generated by the ESTIMATE algorithm, as shown in Fig. 8A, the stromal score and ESTIMATE score were significantly negatively correlated with the risk score (P < 0.01). ssGSEA analysis revealed that the ratio of pro-tumor signatures (CD56bright natural killer cell, central memory CD8 T cell, effector memory CD8 T cell, natural killer cell, type 1 T helper cell) and anti-tumor signatures (CD56dim natural killer cell, macrophage, neutrophil) was significantly elevated in the low-risk subgroup, indicating that the low-risk subgroup was interrelated with increased immune/inflammation activity. While the high-risk subgroup had relatively higher activated CD4 T cells (Figs. 8B and C).
Low-risk patients were likely to be more sensitive to immunotherapy
Pioneering investigations revealed that immunotherapy targeting immune-checkpoints and the human leukocyte antigen (HLA) offered great hope for the clinical treatment of human cancers(37). We investigated the correlation between the risk score and the expression levels of several well-known immune-checkpoints and HLA in the TCGA database. Then we found that the expression of almost all immune-checkpoints (CTLA4, HAVCR2, ICOS, PDCD1, and TIGIT) and HLA (HLA-DPB2, HLA-DQA2, HLA-DQB2, HLA-DRB6, and HLA-K) was upregulated in the high-risk patients, which indicated the poor prognosis might be owed to the inducing of the immune checkpoints (Fig. 8D and E). Here, patients in the high-risk group may be more suitable for immunotherapy. Then, we used the TIDE algorithm to predict the likelihood of the risk model for immunotherapy. Interestingly, the low-risk group was more likely to respond to immunotherapy than the high-risk group (Fig. 8F). These data further supported that the low-risk group had a better prognosis and may have a better treatment prospect for the immunotherapy.