Multi-omics analysis of m7G-related genes in HCC
The waterfall chart in the TCGA-LIHC somatic mutation dataset showed that m7G-related genes had lower mutation frequency, only mutating in 19 out of 364 samples (Fig. 1(a)). In the TCGA-LIHC somatic copy number dataset, copy number variants (CNV) were prevalent in m7G-related genes, with increased CNV in NCBP2 and LARP1 and decreased CNV in EIF4G3, EIF4E, DCPS and EIF4A1 (Fig. 1(b) and 1(c)). When comparing the gene expression in normal and tumor tissues, significant differences were found in the expression of 18 m7G-related genes, indicating that m7G-related gene expression plays a vital role in HCC (Fig. 1(d)). In addition, 15 m7G-related genes had significantly different survival curves(Fig. 2). It could be found that NCBP2 expression was significantly increased in HCC samples compared with normal samples, suggesting that the increased frequency of NCBP2 CNV may be one of the mechanisms of its overexpression, but it was not absolute. EIF4G3 CNV showed low frequency, but its expression was still higher in tumor samples than normal tissues. Therefore, the variation of CNV frequency is not the only factor affecting tumor gene expression.25,26
Identification of m7G-related gene subtypes in HCC
We integrated the TCGA-LIHC with the GSE14520 (GPL571) to examine the subtypes of HCC based on the expression of m7G-related genes. Then, Two m7G-related gene subtypes (subtype A and subtype B) were identified by unsupervised clustering (Fig. 3(a)). Principal component analysis (PCA) revealed significant differences in the transcriptional profiles of m7G-related genes between the two subtypes (Fig. 3(b)). In terms of survival analysis, Kaplan-Meier survival curves showed that subtype A had a significantly worse survival rate than subtype B (log-rank test, p = 0.002) (Fig. 3(c)). In terms of clinical clinicopathological features, the clinical TNM stages showed significant differences, and subtype A had a higher ratio of TNM stage III-IV than subtype B (Fig. 3(d)).
TME characteristics in m7G-related gene subtypes
To further explore the pathway enrichment characteristics of different subtypes of m7G-related genes, GSVA revealed that subtype A was significantly enriched in ubiquitin-mediated proteolysis and cell cycle(Fig. 4(a)). Ubiquitination modifications also play a vital role in the development of the human immune system and in the various stages of the immune response, such as the initiation, development and conclusion of the immune response. Moreover, many ubiquitination modifications are able to suppress the immune response by degrading immunomodulatory proteins. To investigate the role of m7G-realted genes in TME of HCC, correlations between the two subtypes and 23 human immune cell subpopulations were assessed using the CIBERSORT algorithm, and the results suggested that the proportion of activated CD4 T infiltration was significantly higher in the A subtype than in the B subtype(Fig. 4(b)).
Identification of DEGs in m7G-related gene subtypes
To further explore the heterogeneity of m7G-related gene subtypes, we identified 1581 DEGs. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses showed that these DEGs were significantly enriched in DNA replication biological processes and significantly enriched in cell cycle pathways(Fig. 5(a) and 5(b)). The prognostic value of 1134 subtype-associated genes was determined by univariate Cox regression analysis. A consensus clustering algorithm was used to validate this regulatory mechanism further, and classify patients into three gene subtypes (gene subtype A, gene subtype B and gene subtype C) based on 1134 subtype-associated genes. Kaplan-Meier curves showed that gene subtype C had the worst overall survival, while gene subtype B had a favorable overall survival(Fig. 5(c)). Furthermore, the three gene subtypes differed significantly in the expression of m7G-related genes(Fig. 5(d)).
Construction and validation of m7G_score
The best prognostic features were identified by lasso regression analysis and multivariate Cox regression analysis of 1134 subtype-associated genes, and eventually, five candidate genes (UCK2, PRKCD, UGT2B15, ADM, NQO1) were finally screened. The m7G_score formula was as follows:
m7G_score = (0.4466*expression of UCK2) +(0.2772* expression of PRKCD)+( -0.0928* expression of UGT2B15)+( 0.2021* expression of ADM)+( 0.0960* expression of NQO1)
Sankey diagram illustrated the distribution of m7G-related gene subtypes, gene subtypes and m7G_score groups(Fig. 6(a)). In comparing the m7G_score of m7G-related gene subtypes, it was found that the m7G_score of subtype A was significantly higher than that of subtype B(Fig. 6(b)). When comparing m7G_scores between gene subtypes, gene subtype C had the highest m7G_scores(Fig. 6(c)). When the m7G_scores were divided into two groups of high and low risk by median, the risk distribution graph showed that the survival time decreased significantly with increasing scores(Fig. 6(d)). In addition, Kaplan-Meier survival curves showed that the survival rate in the high-risk group was significantly lower than that in the low-risk group. In addition, the 1-year, 3-year, and 5-year survival rates for the m7G_score were expressed as area under the curve (AUC) of 0.754, 0.742, and 0.730, respectively(Fig. 6(d)).
Validation of Transcriptional level and expression level of five subtype-related prognostic genes in m7G_score
Five pairs of liver tissues (tumor tissues vs. adjacent normal tissues) were examined by qRT-PCR, and the results showed that ADM, UCK2, PRKCD and NOQ1 were transcribed at higher levels in liver cancer tissues than in adjacent normal tissues, while UGT2B15 was lower than in normal tissues( Fig. 7(c)). The expression levels of ADM, UCK2, PRKCD, NOQ1 and UGT2B15 in four pairs of liver tissues (tumor tissues vs. adjacent normal tissues) were detected by western blot, and the results were consistent with the transcriptional levels of each gene( Fig. 7(a) and 7(b)). In addition, we further explored the prognostic value of these genes by Kaplan-Meier Plotter website, which showed that high expression of ADM, UCK2, PRKCD and NOQ1 was associated with shorter OS (P = 0.0081, P = 1.3e-09, P = 0.00036 and P = 0.001, respectively), while low expression of UGT2B15 was associated with worse prognosis (P = 0.0022)(Fig. 7(d)).27
Relationship between m7G_score and TME
The relationship between m7G_score and immune cell abundance was positively correlated with T cell regulatory and negatively correlated with T cells CD4 memory by the CIBERSORT algorithm(Fig. 8(a)). We evaluated the relationship between five candidate genes and immune cell abundance, and the heat map showed that these genes were significantly associated with multiple immune cells(Fig. 8(b)). In addition, immune checkpoint expression was significantly higher in the high-risk group than in the low-risk group(Fig. 8(c)).
Constructing nomograms to predict survival
We combined m7G_score and TNM stage as predictors to construct a nomogram, through which we could better show the survival rate of patients at 1, 3 and 5 years(Fig. 9(a)). The calibration curves showed that the predicted survival at 1, 3 and 5 years was very similar to the actual survival, reflecting an excellent predictive efficacy(Fig. 9(b)).