Identification and enrichment analysis of MDEGs in GC
The flowchart for this study is shown in Figure 1. With cut-off criteria of FDR < 0.05, 10400 and 3238 DEGs were identified from GSE13911 and GSE79973 respectively. A total of 2669 DEGs, including 1376 up-regulated genes and 1293 down-regulated genes were concordance. The concordance score was 99.8% (binomial test, P<0.0001). Similarly, we identified 11235 and 4414 DMGs from GSE30601 and GSE25869 respectively. A total of 3741 DMGs, including 2327 hyper-methylated genes and 1414 hypo-methylated genes were concordance. The concordance score was 97.7% (binomial test, P<0.0001). By correlating the level of RNA expression with the degree of DNA methylation, we totally identified 255 MDEGs consisted of 192 hypermethylation-low expression genes and 63 hypomethylation-high expression genes (Figure 2A, Supplementary Table S2). To confirm that the FDR value is logical using a different test, a representative volcano plot was constructed for GSE13911 and GSE79973, respectively (Figure 2B).
Enrichment analysis with the Database for DAVID was used to elucidate biological function of the MDEGs. The top significant terms emerging from the gene oncology (GO) enrichment analysis are shown in Figure 2C. MDEGs were enriched in “biological processes of positive regulation of cell proliferation,” “positive regulation MAPK cascade,” “positive regulation ERK1 and ERK2 cascade,” “positive regulation MAP kinase activity,” and “activation of adenylate cyclase acivity.” Regarding molecular function, MDEGs showed enrichment in “cytokine binding” and “protein binding.” Enrichment of cell components was mostly “integral component of plasma membrane,” which suggests MDEGs may play an important role in transcription in GC. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis suggested that MDEGs were significantly enriched in pathways in “cAMP signaling pathway,” “histidine metabolism,” and “chemical carcinogenesis” (Figure 2D).
Construction of the eight-MDEGs prognostic signature for GC
Using the univariate Cox regression analysis, we identified MDEGs with potential prognostic value in training cohort. Details of the clinical characteristics are presented in Supplementary Table S1. A total of 83 MDEGs including 35 hypomethylation‐high expression genes and 48 hypermethylation‐low expression genes were associated with the overall survival. Based on those prognostic MDEGs, we used the R package “glmnet” to perform Lasso regression analysis. The degree of Lasso regression complexity is controlled by the parameter λ (0<λ<1). We obtained the optimal value of the parameter λ with the number of variables equal to eight through multiple cross-validation. Therefore, combining the regression coefficients under the optimal λ value, we constructed an eight-MDEGs signature to guide the prognosis of GC patients. The risk-score formula was created as follows: Risk score = (0.185*expression level of TREM2) + (0.045*expression level of RAI14) + (0.078*expression level of NRP1) + (0.043*expression level of YAP1) + (0.012*expression level of MATN3) + (0.063*expression level of PCSK5) + (0.210*expression level of INHBA) + (0.154*expression level of MICAL2). We then calculated the risk score for each patient and ranked them based on increasing score, after which patients were classified into a high-risk (n = 96) or a low-risk (n = 96) group based on the median risk score. We observed the overall survival between two risk groups with significantly different survival rate (log-rank P<0.0001; Figure 3A). Patients with high risk score had significantly shorter OS than patients with low risk score. OS rates among patients were 34.4% in the high-risk group, as compared to 66.7% in the low-risk group (Figure 3B). The risk score distribution, survival status, and expression profile of the eight prognostic MDEGs are shown in Figure 3C. Taking into the patients’ clinical features, including age, gender and stage, the Multivariate Cox regression analysis showed that the eight-MDEGs signature risk score also had statistical significance as an independent prognostic factor in the training cohorts (HR=2.28, 95%CI= 1.47-3.53, P= 2.22E-04) (Table 2).
Table 2 Multivariate analysis of prognostic factors by Cox proportional hazard model
Variables
|
GSE15459
|
TCGA-GC
|
GSE84437
|
HR
(95%CI)
|
P value
|
HR
(95%CI)
|
P value
|
HR
(95%CI)
|
P value
|
Age
|
1.09
(0.71-1.68)
|
0.709
|
1.96
(1.09-3.51)
|
0.025
|
1.83
(1.38-2.43)
|
<0.001
|
Eight-MDEGs signature
|
|
High-risk vs. low-risk
|
2.28
(1.47-3.53)
|
<0.001
|
1.84
(1.13-2.99)
|
0.015
|
1.43
(1.09-1.88)
|
0.011
|
Gender
|
|
Male vs. Female
|
1.07
(0.68-1.68)
|
0.779
|
1.25
(0.71-2.22)
|
0.449
|
1.31
(0.96-1.77)
|
0.085
|
TNM stage
|
|
III+IV vs. I+II
|
5.99
(3.27-10.99)
|
<0.001
|
2.24
(1.29-3.91)
|
0.004
|
-
|
-
|
Prognostic validation of the eight-MDEGs signature
We validated the prognosis performance of the eight-MDEGs signature in two validation datasets, TCGA-GC and GSE84437 with 368 and 433 patients respectively. Similar to the training cohort findings, patients in high risk group had a shorter survival time than low risk group either in TCGA-GC (HR=1.85, 95% CI= 1.16-2.94, P= 0.0084) or GSE84437 datasets (HR=1.35, 95% CI= 1.03-1.78, P= 0.0302) (Figure 4A and Figure 4B). The risk score distribution, survival status, and expression profile of the four prognostic MDEGs are shown in Figure 4C and Figure 4D. As missing stage information of partial patients in TCGA, twenty-three patients were excluded when performed multivariate Cox regression analysis. In accordance with the result of the training set, the multivariate Cox regression analysis showed that the eight-MDEGs signature risk score also had statistical significance as an independent variable in the TCGA-GC (HR=1.84, 95%CI= 1.13-2.99, P= 0.015) and GSE84437 (HR=1.43, 95%CI= 1.09-1.88, P= 0.011) respectively (Table 2).