3.1 Differential expression and interaction of m7G regulatory genes
Initially, the expression of 22 m7G methylation-regulated genes was analyzed in GC and healthy samples from the TCGA database. A remarkable variation was observed in the m7G regulatory genes between GC and healthy tissues. Particularly, the expression of AGO2, DCP2, DCPS, EIF3D, EIF4A1, EIF4E, EIF4E2, EIF4G3, GEMIN5, IFIT5, LARP1, METTL1, NCBP1, NCBP2, NCBP3, NUDT3, NUDT4, and WDR4 was remarkably increased in GC compared to healthy tissues (p < 0.001). EIF4E3 and NUDT10 expression was significantly decreased in GC compared to healthy tissues. However, no difference in the expression of NUDT11 and NUDT16 was found between GC and normal tissues (Fig. 1(a)). In the correlation analysis of the 22 regulatory genes, EIF4E expression was strongly associated with NUDT10 expression (Fig. 1(b)). In the next step, the STRING database helped in developing a PPI network to determine the relationship between the identified regulatory genes. A close relationship was found among all regulatory genes except IFIT5 (Fig. 1(c)). The node count diagram revealed EIF4E's relation to 13 other genes, suggesting that EIF4E may be key in the PPI network (Fig. 1(d)). It is evident from the above-mentioned findings that m7G methylation-regulated gene expression varied remarkably across GC and healthy tissues, suggesting its involvement in GC onset and advancement.
3.2 Identification of m7G-related lncRNA prognostic signature (m7G-LPS)
Initially, based on the annotation files downloaded from the ‘GENCODE’ website, the lncRNAs expression matrix was identified in the TCGA database, followed by extraction of the expression matrix of 22 m7G regulatory genes from the TCGA database. Finally, 446 m7G-associated lncRNAs (Fig. 2(a)) whose expression was associated with one or more of the m7G genes (|Pearson R|>0.3 and p < 0.05) were identified. Subsequently, univariate COX analysis COX regression analysis (p < 0.05, Fig. 2(b)) and correlation analysis (Fig. 2(c)) helped identify 25 lncRNAs having good predictive efficiency. Then, genes with p < 0.01 were screened for Lasso regression analysis, and finally, seven m7G-related lncRNAs, namely, AL161785.1, LINC01094, CHROMR, AP001528.1, AC245041.1, AL355574.1, and AC005586.1, were identified (Fig. 3(a)). Among these genes, AL355574.1 and AC005586.1 were recognized as protective effects (HR < 1, p < 0:05). In contrast, AL161785.1, LINC01094, CHROMR, AP001528.1, and AC245041.1 were considered as risk effects (hazard ratio, HR > 1, p < 0:05).
The formula used for calculating the GC sample risk score is given below: Riskscore = 0.0410201815783705×AL161785.1 + 0.232602359496281×LINC01094 + 0.0744239113701401×CHROMR + 0.1873048414553×AP001528.1 + 0.0295140650717325×AC245041.1-0.226949073687661×L355574.1-0.151514242977824×AC005586.1. As per the median value of the risk score, the samples were stratified into two groups, one of high risk and the other of low risk. The Kaplan–Meier survival curve depicted a remarkably shorter overall survival (OS) of the high-risk group in comparison to the other group (p < 0.001, Fig. 3(c)). As given in the risk value curve and the survival status scatter plot, the survival time and survival status of the high-risk subjects were worse than those in the low-risk category (Fig. 3(d)). The established prognostic model's survival prediction capability was evaluated with the help of the ROC curve for GC patients over one, three, and five years, and the AUC values were 0.68, 0.70, and 0.72, respectively (Fig. 3(b)). These findings indicate that the established m7G-LPS has accurate OS prediction ability.
3.3 Validation of the Prediction Model Constructed using the m7G-LPS and Construction of Nomograms
To test whether the risk score was an independent risk factor, the survival time, survival status, age, sex, tumor pathological stage, TNM stage, tumor grade, and risk score were integrated and analyzed using univariate and multivariate COX regression. The findings of these analyses (p < 0.05) revealed HR = 1.667546, 95% Cl = 1.186220734–2.344176354 and HR = 0.621721461, 95% CI = 0.426695209–0.905886843, respectively, for risk score, from which it can be concluded that risk score can serve as an independent risk factor for GC (Fig. 4(a), (b)). A nomogram was also constructed based on the findings of the Cox regression analysis (Fig. 4(c)). For the purpose of assessing the prognostic efficiency of the constructed model, the AUC values of the time-dependent ROC curve of the risk score were evaluated, and the values over one, three, and five years were 0.68, 0.69, and 0.71, respectively. Moreover, the risk score AUC in the clinical ROC appeared to be remarkably increased compared to other clinical indicators (Fig. 4(d), (e)). The calibration curve of the nomogram is shown in Fig. 4(f). These findings are indicative of the m7G-LPS-based prediction model’s enhanced sensitivity as well as specificity in predicting the prognosis of patients with GC.
3.4 Principal Component Analysis (PCA) of m7G-LPS
PCA was utilized for the purpose of analyzing the variations between both risk groups in terms of genome-wide expression profiles, m7G-related gene expression profiles, prognosis-related m7GlncRNA expression profiles, and seven prognostic m7G-related lncRNAs expression profiles. The findings of this analysis showed clearer differences across the two groups in the seven prognostic m7G-related lncRNAs expression profiles than in the other three expression profiles (Fig. 5). Therefore, seven prognostic m7G-related lncRNAs expression profiles were greatly distinct and could be used to differentiate effectively across the two GC populations.
3.5 Gene Set Enrichment Analysis (GSEA) of m7G-LPS
To clarify the differences in the potential pathways activated in the two risk groups, GSEA was conducted. The top ten signaling pathways (Fig. 6) in the two groups were visualized based on this analysis. Gene Ontology (GO) analysis revealed high enrichment of external encapsulating structure, negative regulation of T cell migration and T helper 1 type immune response, positive regulation of T helper 1 cell differentiation, leukotriene signaling pathway, epithelial-mesenchymal signaling, and granulocyte colony-stimulating factor production in the high-risk group and positive regulation of establishment of protein localization to telomere, DNA endoreduplication, positive regulation of meiotic cell cycle phase transition, regulation of mitochondrial mRNA stability, transcription initiation from RNA polymerase III promoter, endoribonuclease activity, ligase activity, RNA polymerase activity, endonuclease activity, catalytic activity acting on RNA, endonuclease activity active with either RNA or DNA and producing 5 phosphomonoesters, nucleotidyltransferase activity, nuclease activity, catalytic activity acting on DNA, RNA methyltransferase activity, nuclear chromosome, mitochondrial matrix, nucleolus, spliceosomal complex, preribosome, RNA polymerase complex, U2 type spliceosomal complex, small nuclear ribonucleoprotein complex, receptor complex and transcription factor IID complex in the low-risk group. The Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed a high enrichment of toll-like receptor signaling pathway, Jak-STAT signaling pathway, chemokine signaling pathway, leukocyte transendothelial migration, cytokine-cytokine receptor interaction in the subjects with a high risk score, while a high enrichment of mismatch repair, DNA replication, RNA polymerase, homologous recombination, and spliceosome in the subjects with a low risk score. These findings indicated the possibility of m7G-LPS’ influence on the course of GC by regulating transcription, translation, and immune infiltration.
3.6 Correlation of the m7G-LPS with Clinicopathological Features in Patients with GC
The association of the risk score with the clinicopathological features of the two risk groups was investigated. The results suggest a significant difference in T-stage, N-stage, pathological stage, and age between the two groups (Fig. 7(a)). In particular, GC subjects with T3 and T4 presented an elevated risk score than those with T1 (p < 0.05). The subjects with N1 and N3 showed elevated risk scores than those with N0 (p < 0.05); and risk scores were remarkably increased in those with STAGE II, STAGE III and STAGE IV than in those with STAGE I (p < 0.05, Fig. 7(b), (c), (d)). In addition, patients aged ≤ 65 years presented with a remarkably greater risk score compared to those aged > 65 years group (Fig. 7(e)). Therefore, high-risk subjects tended to have advanced clinicopathological features.
3.7 Correlation of the m7G-LPS with the Immune Characteristics of GC Patients
To explore the value of the m7G-LPS in the tumor immune microenvironment, the CiberSort algorithm was utilized for analysis of the differences in the distribution of 22 tumor immune cells in the two risk groups. The heat plot and violin plot show that the immune cell distribution differed between them (Fig. 8(a), (b)). The subjects with the high risk presented with higher infiltration abundance of memory CD4 T cells resting, monocytes, M2macrophages, dendritic cells (DCs) resting, mast cells resting, and neutrophils, while low-risk subjects had higher abundance of infiltration M0 macrophages and follicular helper T cells (p < 0.05). The immune cell composition of the samples is shown in Fig. 8(c). This was followed by an investigation of the association of risk score with immune cells. A positive correlation of risk score was found with the abundance of CD4 T cell (cor = 0.172, p < 0.01), CD8 T cell (cor = 0.330, p < 0.001), DCs (cor = 0.460, p < 0.001), macrophages (cor = 0.550, p < 0.001), and neutrophils (cor = 0.423, p < 0.001). These findings were indicative of the involvement of immune cells in risk score grading (Fig. 9).
In addition, the correlation between m7G-LPS and immune cell and immune function concentration scores was also evaluated. The results showed higher enrichment scores of aDCs, B cells, CD8 + T cells, DCs, iDCs, macrophages, mast cells, neutrophils, natural killer (NK) cells, pDCs, T helper cells, and Tfh, TIL, Treg in the subjects with high-risk score (p < 0.05, Fig. 10(a)). Also, greater enrichment scores were observed in the high-risk group for multiple immune functions, such as antigen-presenting cell (APC) co-inhibition, APC co-stimulation, CCR, checkpoint, cytolytic activity, human leukocyte antigen, inflammation-promotion, MHC class I, parainflammation, T cell co-inhibition, T cell co-stimulation, Type I IFN response, and Type II IFN response (p < 0.05, Fig. 10(b)). These findings were indicative of the involvement of seven prognostic m7G-related lncRNAs in immune function regulation.
3.8 Value of m7G-LPS in Immunotherapy and Chemotherapy
The value of m7G-LPS in guiding treatment decision-making was also studied. The expression of immune checkpoint genes across the two groups was also assessed. Elevated expression of ADORA2A, BTLA, CD160, CD200, CD200R1, CD244, CD27, CD274, CD28, CD40, CD40LG, CD44, CD48, CD80, CD86, HAVCR2, ICOS, IDO2, LAG3, LAIR1, NRP1 PDCD1LG2, TIGIT, TMIGD2, TNFRSF4, TNFRSF8, TNFRSF9, TNFSF14, TNFSF18, and TNFSF4 was found in the high-risk subjects (p < 0.05) in comparison to the low-risk subjects. In contrast, TNFRSF25 expression was higher in the low-risk subjects (p < 0.05). The findings were indicative of the significance of m7G-LPS in predicting the efficiency of immune checkpoint inhibitor therapy (Fig. 11). Moreover, the association of risk score with half maximal inhibitory concentration (IC50) of common chemotherapeutic agents was assessed, and the results revealed a negative association of the IC50 of Cisplatin with the risk score (cor=-0.125, p = 0.022), and it was remarkably decreased in the high-risk compared to the other group (p < 0.05, Fig. 12(b), (d)). However, although the IC50 of Docetaxel was also negatively correlated with risks core (cor=-0.151, p < 0.05), no remarkable variations appeared in the IC50 across the two groups (Fig. 12(a), (c)). These findings are suggestive of the increased chemotherapy sensitivity of low-risk patients as well as better prognoses and clinical outcomes.
3.9 Expression of m7G-related lncRNAs in Tissues
The expression of m7G-related lncRNAs was assessed in four cell lines, namely, GSE-1, MKN-45, AGS, and HGC-27, by q-PCR. Their expression varied remarkably between the cancerous and healthy cell lines (Fig. 13). Among them, CHROMR, LNC01094, AC245041.1, and AL355574.1 had significantly higher expression levels in tumor cell lines, whereas AC005586.1, AL16178.5, and AP001528.1 had the opposite, which is consistent with the results of our analysis. This result further validated the accuracy of the developed risk model.