The overall workflow of the process that we used to develop and validate the risk signature to predict prognostic outcomes was depicted in Fig. 1. Based on the expression similarity of m7G genes, k = 2 seemed to be an adequate selection with clustering stability increasing from k = 2–9 in the TCGA datasets, which we named as cluster1 and cluster2 (Figures. 2A–2C). A significant difference of overall survival (OS) was noticed between these two clusters (p = 0.03) (Fig. 2D), and 273 m7G related differential expression genes (DEGs) (|log2FC| ≥ 1 and p < 0.05) were shown to be linked with these two clusters (Figure.2E). These results are believed to reveal a potential correlation of m7G related DEGs and patients with HCC, which led to the following exploration.
Firstly, 140 DEGs with prognostic value for patients with HCC were selected by univariate regression analysis in supplementary materials of Table 1 (p < 0.05). We selected 40 genes with the smallest p-values to analyze their mutation rates and types (Fig. 3A). The left panel shows the genes ordered by their mutation frequencies and the right panel presents different mutation types. Then, Lasso-Cox regression was performed to choose 12 genes, which were then included in multivariate Cox analysis (Fig. 3B and 3C). Finally, 3 genes (GAS2L3, HTR1D, and NEIL3) were identified as independent prognostic factors and selected for the prognostic model construction (Fig. 3D–F). The formula of the risk score for each patient was as follows: risk score =(1.18908*ExpHTR1D) + (2.31081*ExpGAS2L3) + (1.65707*ExpNEIL3). A follow-up analysis was conducted to assess the value of this prognostic model. All patients with HCC from TCGA databases were divided into two groups (low-risk and high-risk groups) based on the median risk score. The Kaplan–Meier curve indicated that high-risk patients had a substantially shorter OS than low-risk patients (p < 0.001, Fig. 4A). Time-dependent ROC curves were constructed to evaluate the prognostic model accuracy. As shown in Fig. 4B, the AUC values for 1, 3, and 5 years were 0.758, 0.698, and 0.644, respectively. From the scatter chart, we found that individuals at high-risk died sooner than those at low-risk (Fig. 4C and 4D). The PCA and t-SNE results showed that patients with varying risks were divided into two groups (Fig. 4E and 4F).
Verification of the m7G-related genes prognostic signature
The entire TCGA cohort (n = 370) was randomly divided into the training cohort (n = 186) and internal validation cohort (n = 184). In the training group (P = 007) (Figure.5A) and in validation group (P < 0.001) (Figure.5E), the ROC and Risk Score distribution point plots were also similar to the plots for the combined group, with significantly more patients dying in the high-risk group (Figure.5B–D, 5F–H). All these results suggested that these 3 m7G-related genes would be a valuable HCC prognostic model.
Independent prognostic value of the prognostic model
Risk score and clinicopathological information, such as age, gender, grade, and T, N, and M stages, were included for univariate and multifactorial analyses to assess the independent prognostic value of this prognostic model. The univariate cox analysis revealed that N stage (p < 0.001), T stage (p < 0.001), M stage (p < 0.021), and risk score (p < 0.001) were significantly associated with OS of patients with HCC (Fig. 4A). The multivariate cox regression revealed a risk score (p < 0.001) that was independent risk factors in the OS of patients with HCC (Fig. 4B).
Subgroup OS analysis with different clinicopathological features for all patients with HCC in TCGA cohort
Patients were stratified into subgroups based on distinct clinicopathological features, including age (age > 65 years and age ≤ 65 years), gender (female and male), grade (G1–G2 and G3–G4), clinical stage (stages I–II and III–IV) and T stage (T1–T2, T3–T4), to further assess the effectiveness of the prognostic model to predict the prognosis of patients with HCC in different clinicopathological conditions. Patients in each group were divided into high-risk and low-risk subdivisions based on the median risk score, and the groups were subjected to KM survival analysis. Results showed that the OS rate of the subgroups, including age > 65 (p = 0.002) and age ≤ 65 (p = 0.002) years, female (p = 0.039) and male (p < 0.001), G1–G2 (p < 0.001), G3–G4 (p = 0.041), stages I–II (p = 0.009), stages III–IV (p = 0.005), T1–T2 (0.008), and T3–T4 (0.004), were significantly different in the low- or high-risk subdivisions (Fig. 7).
Developing a nomogram that incorporates clinical features
Nomograms are always used to combine multiple factors to diagnose or predict disease onset or progression. In our study, age, gender, grade, and T, N, and M stages, clinical stage, and risk score were integrated to develop a nomogram to predict the survival probability of patients with HCC at 1, 3, and 5 years (Fig. 8A). The ROC curve showed that the AUC values corresponding to nomogram, risk score, age, gender, grade, clinical stage, and T, M, and N stages were 0.756, 0.645, 0.55, 0.444, 0.532, 0.707, 0.696, 0.521, and 0.514, respectively (Fig. 8B). The calibration plot demostrated that the nomogram performed well compared to an ideal model (Fig. 8C–E). The above results indicated that the nomogram has satisfactory accuracy in predicting the prognosis of patients with HCC.
Functional analyses based on the risk model
We further analyzed the functional enrichment analysis of DEGs (adjusted p-value < 0.05 and | log2(fold change) | ≥1) between high and low-risk groups. GO enrichment analysis revealed that these DEGs are mainly enriched in nuclear division, organelle fission, chromosome segregatio, mitotic nuclear division, and nuclear chromosome segregation. The KEGG top-ranked pathways include cell cycle, metabolism of xenobiotics by cytochrome P450, metabolism of xenobiotics by cytochrome P450, drug metabolism-cytochrome P450, and chemical carcinogenesis-DNA adducts. We found that the lower risk group with better prognosis was enriched in metabolism-related pathways, including drug metabolism-cytochrome P450, drug metabolism other enzymes, fatty acid metabolism, metabolism of xenobiotics of cytochrome P450, and primary bile acid biosynthesis based on GSEA analysis.
Assessment of differences in immune characteristics and immunotherapy between the high- and low-risk groups
Considering that TME is important for the progression and treatment of HCC, we analyzed the differences in immune characteristics between the high- and low-risk groups. The results showed in Figs. 10A and 10B hint that B cells, NK cells, neutrophils, and pDCs were highly expressed in the low-risk group, while aDCs, mast cells, macrophages, and Tfh were upregulated in the high-risk group. Additionally, immune-related functions, including cytolitic activity, MHC class, and type I and II interferon (IFN) responses, were statistically different between the two groups. Immune evasion is crucial for immunotherapy, and the greater the potential for immune evasion in patients with cancer, the less effective they are in receiving immunotherapy. We found significant differences between the high- and low-risk groups in TIDE scores, microsatellite instability (MSI) scores, immune dysfunction scores, and immune exclusion scores by analyzing the data from published immune checkpoint blockade trials on the web platform TIDE (Fig. 10C–F). The results of these scores indicate greater immune evasion potential in the high-risk group, suggesting that immunotherapy is more effective. Therefore, the risk model can predict the survival prognosis of patients with cancer and guide their clinical treatment decisions.
Pan-cancer analyses of GAS2L3 and HTR1D
Considering the scarce studies on the prognostic value of GAS2L3 and HTR1D in different cancers, we made a preliminary pan-cancer analysis in 33 cancers using the TCGA database, which we hope will be helpful to other investigators. The results showed that GAS2L3 expression was significantly increased in the following cancers compared to the adjacent paraneoplastic tissues: BCLA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KIRC, KIRP, LIHC, PCPG, SARC, STAD, THCA, and UCEC (Fig. 11A). Compared to other tumor tissues, GAS2L3 expression was highest in KIRC and SKCM, and lowest in LGG and PRAD (Fig. 11B). The Kaplan–Meier analysis was used to assess the relationship between GAS2L3 expression and survival time in various cancers, which revealed an increased GAS2L3 expression, predicted worse prognosis in ACC, LUAD, LGG, and MESO, and was associated with good prognosis in THYM (Fig. 11C). We further analyzed the relationship between GAS2L3 and clinical staging and age, and the results suggested that GAS2L3 was associated with clinical staging in ACC, KICH, and KIRP, and with age factors in BRCA, KIRP, LGG, PRAD, READ, STAD, THYM, and UCEC (Fig. 11D and 11E). The same approach was used for pan-cancer analysis of HTR1D, which was significantly increased in the following cancers compared to adjacent paraneoplastic tissues: BRCA, CHOL, COAD, ESCA, HNSC, KIRP, LIHC, LUAD, PAAD, READ, STAD, THCA, and UCEC, whereas downregulated in KICH and PRAD (Fig. 12A). Compared with other tumor tissues, HTR1D expression was highest in COAD and READ, and lowest in THYM and DLBC (Fig. 12B). HTR1D was associated with SARC, HMSC, LGG, LUAD, UVM, and THYM (Fig. 12C). Additionally, HTR1D was associated with clinical staging of BLCA, KIRC, LUAD, MESO, and THCA; and with age factor of ACC, BRCA, ESCA, OV, SARC, TGCT, THCA, THYM, and UCES (Fig. 12D and 12F).