The Expression and Relationship of m6A Related Genes
We first compared the expression of 19 m6A regulator genes between 375 primary tumor samples and 32 solid tissue normal samples. Most of these genes, including HNRNPA2B1, HNRNPC, IGF2BP1, IGF2BP2, KIAA1429, METTL3, RBM15, YTHDF1, YTHDF2 and ZC3H13, were expressed at higher levels in tumor samples than in normal samples, whereas ALKBH5 showed the opposite pattern (Fig. 2. A, C). The lower expression of ALKBH5 in tumors was further verified in GSE29998 (Figure S1). Pearson correlation coefficients were then calculated among the 19 regulators: KIAA1429 and YTHDF3 had the strongest correlation (0.63), and the three genes most correlated with ALKBH5 were RBM15B (0.30), FTO (0.30) and METTL14 (0.25) (Fig. 2. B). After excluding the normal samples, we plotted the expression landscape of the regulators in the tumor cohort to examine their relationships further. The expression distributions of these genes differed across clinicopathological subgroups defined by gender, age, stage, grade, T, N and M (Fig. 2. D).
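As a minimal sketch of the correlation step, assuming each regulator's expression is stored as a list of per-sample values in a common sample order, pairwise Pearson coefficients can be computed and ranked in plain Python. The function names and the toy values below are hypothetical and do not reproduce the tool or the TCGA data used in the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_summary(expr, target):
    """Summarize pairwise correlations among regulator genes.

    expr: dict mapping gene symbol -> expression values, one per sample,
          in the same sample order for every gene.
    Returns the gene pair with the largest absolute coefficient and the
    other genes ranked by their correlation with `target` (highest first).
    """
    pairs = {
        (g1, g2): pearson(expr[g1], expr[g2])
        for g1, g2 in combinations(sorted(expr), 2)
    }
    strongest = max(pairs.items(), key=lambda kv: abs(kv[1]))
    partners = sorted(
        ((g, pearson(expr[target], expr[g])) for g in expr if g != target),
        key=lambda kv: kv[1],
        reverse=True,
    )
    return strongest, partners


# Hypothetical toy values for four samples (not data from this study).
toy_expr = {
    "ALKBH5": [1.0, 2.0, 3.0, 4.0],
    "FTO": [2.0, 4.1, 5.9, 8.0],
    "METTL3": [4.0, 3.0, 2.5, 1.0],
}
strongest, partners = correlation_summary(toy_expr, "ALKBH5")
```

With the real data, `expr` would hold the 19 regulators and `partners[:3]` would give the top three genes correlated with ALKBH5.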
ALKBH5 Is a Protective Gene in GC, and a Risk Model Based on ALKBH5-Related Genes Was Constructed
To investigate the relationship between the regulators and the overall survival (OS) of GC patients, we excluded 58 tumor cases with incomplete clinical information and performed univariate and multivariate Cox regression on the remaining 317 samples. Both models indicated that ALKBH5 may be an independent protective factor for GC patients. RBM15 and YTHDF2 showed hazard ratios similar to that of ALKBH5, but only ALKBH5 was significant in both the univariate and the multivariate Cox regression (Figure S2. A, B). We then divided the patients into ALKBH5-low and ALKBH5-high subgroups, using the median expression as the cutoff, and compared their survival curves. The ALKBH5-low subgroup had significantly shorter OS than the ALKBH5-high subgroup (Fig. 3. A).
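The median-split survival comparison can be sketched as follows, assuming that follow-up times, death indicators (1 = died, 0 = censored) and ALKBH5 expression values are available for each patient. This is an illustrative Kaplan-Meier estimator, not the code used in the study; assigning samples exactly at the median to the low group is our assumption, and the toy cohort is hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if the patient died at that time, 0 if censored
    Returns (time, survival probability) pairs at each observed death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Pool every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censorings at t both leave the risk set afterwards.
        at_risk -= leaving
    return curve


def median_split(values):
    """Label each sample 'high' if above the median, otherwise 'low'.

    Ties at the median go to 'low' (an assumption; the study only states
    that the median was used as the cutoff).
    """
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


# Hypothetical toy cohort: follow-up in months, death flag, ALKBH5 level.
months = [5, 8, 12, 20, 30, 41]
died = [1, 1, 0, 1, 0, 0]
alkbh5 = [0.4, 0.9, 0.7, 2.1, 1.8, 2.5]
groups = median_split(alkbh5)
curves = {
    g: kaplan_meier(
        [m for m, lab in zip(months, groups) if lab == g],
        [d for d, lab in zip(died, groups) if lab == g],
    )
    for g in ("low", "high")
}
```

The two resulting step functions are what a survival plot such as Fig. 3. A draws; a formal comparison would add a log-rank test.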
To further decode the roles of ALKBH5 in GC, the remaining 317 samples were randomly assigned to a training dataset (n = 159) and a test dataset (n = 158); the baseline characteristics of the two datasets are shown in Table 1. In the training dataset, the “edgeR” package was used to identify DEGs between the ALKBH5-low and ALKBH5-high subgroups with the thresholds log fold change > 2 and adjusted P value < 0.05 (Fig. 3. B, Table S1). Univariate Cox regression was then used to assess the association between each DEG and OS (Table S2), and the significant DEGs (P < 0.05) were entered into LASSO-penalized Cox regression (Fig. 3. C, D). Using the prognostic formula described in the Methods section, the risk model was constructed as follows, where each gene symbol denotes the expression level of that gene:
$$\text{Risk score} = 0.006188505 \times \mathrm{CA10} + 0.050823131 \times \mathrm{SLC7A2} - 0.008562421 \times \mathrm{LINC02303} + 0.050382245 \times \mathrm{CGB3} + 0.042501742 \times \mathrm{C1QL2} + 0.00927148 \times \mathrm{CGB8}$$
The risk score of every sample in both datasets was calculated with this formula, and the samples in each dataset were then divided into low-risk and high-risk subgroups at the median risk score.
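The six-gene formula is a linear combination of expression values. The following minimal Python sketch scores samples, randomly splits 317 samples into sets of 159 and 158, and dichotomizes one dataset at its own median score. The random seed and helper names are hypothetical. Expression values are assumed to be on the same normalized scale the model was fitted on. Using each dataset's own median follows our reading of the text; some studies instead apply the training-set median to the test set.

```python
import random
from statistics import median

# Coefficients taken from the risk score formula reported in the text.
COEFFICIENTS = {
    "CA10": 0.006188505,
    "SLC7A2": 0.050823131,
    "LINC02303": -0.008562421,
    "CGB3": 0.050382245,
    "C1QL2": 0.042501742,
    "CGB8": 0.00927148,
}


def risk_score(expression):
    """Sum of coefficient * expression over the six model genes.

    expression: dict mapping gene symbol -> expression value, assumed to be
    on the same normalized scale the model was fitted on.
    """
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_train_test(sample_ids, n_train, seed=0):
    """Randomly assign samples to a training and a test set.

    The seed is a hypothetical placeholder; the study's seed is not reported.
    """
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def assign_risk_groups(scores):
    """Dichotomize one dataset at its own median risk score.

    scores: dict mapping sample ID -> risk score. Scores equal to the
    median are placed in the low-risk group (an assumption).
    """
    cut = median(scores.values())
    return {sid: ("high" if s > cut else "low") for sid, s in scores.items()}


# Usage with hypothetical sample IDs: 317 samples -> 159 training, 158 test.
train_ids, test_ids = split_train_test(range(317), n_train=159)
```

Because LINC02303 carries the only negative coefficient, higher LINC02303 expression lowers the score, while the other five genes raise it.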
Table 1
Baseline characteristics of patients in the training and test datasets
| Variable | Training dataset (n = 159) | Test dataset (n = 158) | P value |
|---|---|---|---|
| Age, n (%) | | | 0.1956 |
| ≤ 65 | 64 (40.25%) | 76 (48.10%) | |
| > 65 | 95 (59.75%) | 82 (51.90%) | |
| Gender, n (%) | | | 0.6118 |
| Male | 102 (64.15%) | 96 (60.76%) | |
| Female | 57 (35.85%) | 62 (39.24%) | |
| Grade, n (%) | | | 0.7827 |
| G1 & G2 | 57 (35.85%) | 60 (38.00%) | |
| G3 | 102 (64.15%) | 98 (62.00%) | |
| T, n (%) | | | 0.1457 |
| T1-2 | 34 (21.38%) | 46 (29.11%) | |
| T3-4 | 125 (78.62%) | 112 (70.89%) | |
| N, n (%) | | | 0.8737 |
| N0 | 49 (30.82%) | 51 (32.28%) | |
| N+ | 110 (69.18%) | 107 (67.72%) | |
| M, n (%) | | | 0.4975 |
| M0 | 150 (94.34%) | 145 (91.77%) | |
| M1 | 9 (5.66%) | 13 (8.23%) | |
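The text does not state which test produced the P values in Table 1. One plausible choice is Pearson's chi-square test with Yates' continuity correction, which is the default for 2×2 tables in R's `chisq.test`. A minimal Python version is sketched below. Run on the Age and Gender rows, it gives P values of about 0.1956 and 0.6118, in line with the table. This suggests, but does not confirm, that this was the test used.

```python
import math


def chi_square_yates(table):
    """Pearson chi-square test with Yates' continuity correction (2x2, 1 df).

    table: [[a, b], [c, d]], rows = datasets, columns = the two categories.
    Returns (statistic, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |O - E| by 0.5, but never below zero.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    # With 1 df, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Counts from Table 1: [training, test] x [first, second category].
age = [[64, 95], [76, 82]]      # <= 65, > 65
gender = [[102, 57], [96, 62]]  # male, female
_, p_age = chi_square_yates(age)
_, p_gender = chi_square_yates(gender)
```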
In addition, SRAMP (39) predicted a considerable number of m6A sites with very high or high confidence in the model genes, especially in CA10, LINC02303 and C1QL2 (Figure S3. A-F).
The Risk Model Is Strongly Associated with Clinical Prognosis in Gastric Cancer
To test whether the risk model could predict the prognosis of GC patients, we plotted survival curves, examined the distribution of patients' vital status and survival time, and performed univariate and multivariate Cox regression in both the training and test datasets. In both datasets, the high-risk subgroup had shorter OS than the low-risk subgroup (Fig. 4. A, B). The distribution of vital status and survival time ordered by risk score (Fig. 4. C, D) showed that patients in the low-risk subgroup survived longer and that deaths were more frequent in the high-risk subgroup. The risk score was a significant risk factor in univariate Cox regression and an independent risk factor in multivariate Cox regression, in both the training and test datasets (Fig. 4. E, F). Moreover, ROC curves showed that the risk model had a promising ability to predict the OS of GC patients: the areas under the curve (AUC) for 3-year OS were 0.633 and 0.668 in the training and test datasets (Figure S4. A, B), and the AUCs for 5-year OS were 0.562 and 0.607, respectively (Figure S4. C, D).
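The 3-year and 5-year AUCs are time-dependent ROC summaries. A simplified cumulative/dynamic AUC at a fixed horizon is sketched below, assuming that survival times are in the same units as the horizon (for example, years). This is not the estimator the study necessarily used. Unlike inverse-probability-of-censoring-weighted (IPCW) estimators, it simply drops patients censored before the horizon, so its values will generally differ from those reported above.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Simplified time-dependent (cumulative/dynamic) AUC at `horizon`.

    Cases:    patients who died at or before the horizon.
    Controls: patients whose follow-up extends beyond the horizon.
    Patients censored at or before the horizon are dropped; IPCW estimators
    would reweight them instead. Ties in score count as 1/2.
    Returns the estimated P(score of a case > score of a control).
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

In this setting, `scores` would be the risk scores and `horizon` would be 3 or 5. An AUC of 0.5 corresponds to random ranking of cases and controls.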
To further validate the model, samples in the two datasets were stratified by clinicopathological features, including age, gender, stage, grade, T and N, and OS was compared between the low-risk and high-risk subgroups within each stratum (Fig. 5). In most strata, the low-risk subgroup had longer OS than the high-risk subgroup. However, the difference was not statistically significant for stage 1–2 in the test dataset, or for grade 1–2, T1-2 and N0 in either dataset.
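The text does not name the test behind the subgroup comparisons in Fig. 5. The log-rank test is the usual choice for comparing Kaplan-Meier curves, and a minimal two-group version applied within each clinical stratum might look like the sketch below. The group labels "high"/"low", the helper names, and the `nan` result for strata that contain only one risk group are our assumptions.

```python
import math


def log_rank_test(times, events, groups):
    """Two-group log-rank test comparing 'high' vs 'low' risk patients.

    times:  follow-up time of each patient
    events: 1 if the patient died, 0 if censored
    groups: 'high' or 'low' for each patient
    Returns (chi-square statistic, p-value) with 1 degree of freedom, or
    (nan, nan) when the variance is zero (e.g. only one group present).
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        at_risk = [(tt, e, g) for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for _, _, g in at_risk if g == "high")
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_high = sum(1 for tt, e, g in at_risk if tt == t and e and g == "high")
        # Observed minus expected deaths in the high-risk group at time t.
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            # Hypergeometric variance of the high-risk death count at time t.
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return float("nan"), float("nan")
    stat = observed_minus_expected ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2))


def stratified_log_rank(times, events, groups, strata):
    """Run the log-rank test separately within each clinical stratum."""
    results = {}
    for level in sorted(set(strata)):
        idx = [i for i, s in enumerate(strata) if s == level]
        results[level] = log_rank_test(
            [times[i] for i in idx],
            [events[i] for i in idx],
            [groups[i] for i in idx],
        )
    return results
```

Small strata such as T1-2 or N0 contain few deaths, which lowers the power of this test and may contribute to the non-significant results noted above.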
Construction and Validation of the Nomogram Model
To obtain a quantitative tool for predicting the OS of GC patients, we built a nomogram from age, gender, stage, grade and risk score in the training dataset and verified it in the test dataset (Fig. 6). In the training cohort, the calibration curves showed strong agreement between observed and predicted 3-year OS and acceptable agreement for 5-year OS (Figures S5 A, B). Decision curve analysis (DCA) indicated that, for threshold probabilities of 0.16–0.39 for 3-year OS and 0.1–0.44 for 5-year OS, the nomogram offered a higher net benefit than the strategies of treating all patients or no patients as high risk (Figures S5, C, D). These validation results suggest that the nomogram can predict the OS of GC patients with good accuracy.
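Net benefit, the quantity plotted in decision curve analysis, weighs true positives against false positives by the odds of the threshold probability. The sketch below assumes a binary outcome (for example, death within 3 years) and ignores censoring, which survival-based DCA handles differently. The function names are hypothetical, and in practice the predicted risks would come from the nomogram.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    predicted_risk: predicted probability of the event (e.g. death by 3 years)
    outcome:        1 if the event occurred, 0 otherwise
    Net benefit = TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(outcome)
    flagged = [y for r, y in zip(predicted_risk, outcome) if r >= threshold]
    tp = sum(flagged)
    fp = len(flagged) - tp
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(outcome, threshold):
    """Net benefit of treating every patient as high risk ('treat none' is 0)."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def beneficial_thresholds(predicted_risk, outcome, thresholds):
    """Thresholds at which the model beats both treat-all and treat-none."""
    return [
        t for t in thresholds
        if net_benefit(predicted_risk, outcome, t)
        > max(treat_all_net_benefit(outcome, t), 0.0)
    ]
```

Scanning a grid of thresholds with `beneficial_thresholds` yields the kind of threshold range reported above (0.16–0.39 for 3-year OS).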
Relationship of the Risk Model and Immune Cell Infiltration
The DEGs may participate in various pathways to exert their functions. Gene Ontology (GO) enrichment analysis, a method used to perform enrichment analysis on gene sets (38), was carried out to explore the biological processes in which these DEGs are involved. Some DEGs were enriched in processes related to the epidermis and the regulation of peptidase activity, and some were involved in immune activities such as the defense response (Figure S6 A, B).
We then combined the training and test datasets and re-evaluated the risk model in the whole dataset using the Kaplan-Meier method; the high-risk subgroup still had poorer OS (Figure S4. E). Next, the CIBERSORT algorithm (an analytical tool from the Alizadeh Lab, developed by Newman et al., that estimates the abundances of member cell types in a mixed cell population from gene expression data) was used to calculate the infiltration proportions of 22 immune cell types in each sample of the whole dataset (Fig. 7. A). The ALKBH5-high subgroup showed greater infiltration of naïve B cells, neutrophils, plasma cells and follicular helper T cells (Fig. 7. B). The high-risk subgroup had greater infiltration of naïve B cells and resting CD4 T cells, whereas the low-risk subgroup had greater infiltration of activated memory CD4 T cells, CD8 T cells, M1 macrophages and follicular helper T cells (Fig. 7. C). These results suggest that ALKBH5 expression is associated with the immune status of tumor samples and that, compared with the high-risk subgroup, the low-risk subgroup has a more favorable immune microenvironment.
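The GO enrichment step described above does not specify its statistical test. Over-representation analysis commonly applies a one-sided hypergeometric test to each GO term, followed by multiple-testing correction (for example, Benjamini-Hochberg), which is not shown here. A minimal sketch with hypothetical argument names:

```python
from math import comb


def hypergeometric_enrichment_p(n_universe, n_term, n_query, n_overlap):
    """One-sided over-representation P value for a single GO term.

    n_universe: genes in the background
    n_term:     background genes annotated to the term
    n_query:    genes in the DEG list
    n_overlap:  DEGs annotated to the term
    Returns P(X >= n_overlap) for X ~ Hypergeometric(n_universe, n_term, n_query).
    """
    total = comb(n_universe, n_query)
    upper = min(n_term, n_query)
    # comb() returns 0 when the lower argument exceeds the upper one, so
    # impossible overlaps contribute nothing to the tail sum.
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for a background of roughly 20,000 genes; only the final division is rounded.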