3.1 Identification of DEGs
In the sequencing data, 11481 DEGs were screened from the tumor and normal cases (Figure 1A).In TCGA database, 5062 DEGs were screened from the tumor and normal cases (Figure 1B). Finally, 58 DE-IMRGs were obtained from 5062 DEGs of TCGA, 11481 DEGs of sequencing data and 428 IMRGs (Figure 1C).
3.2 The linkage between DEGs and methylation of CpG site
10 DEGs were obtained whose expression were associated with CpG sites to explore the potential regulation of DNA methylation on gene expression. These 10 DEGs were SLC11A1 (Exp-logFC = 1.603420664, cor = 0.5011, p = 0.0368268), FANCG (Exp-logFC = 0.669374024, cor = -0.3843059, p = 2.87E-13), CCND1 (Exp-logFC = 0.578168838, cor = -0.3466985, p = 6.35E-11), ALKBH2 (Exp-logFC = 0.705188653, cor = -0.3018947, p = 1.65E-08), CDK5RAP1 (Exp-logFC = 0.588938787, cor = -0.2978591, p = 2.60E-08), CDO1 (Exp-logFC = -1.393788957, cor = -0.2903042, p = 6.00E-08), HMBS (Exp-logFC = 0.889997626, cor = -0.2693556, p = 5.39E-07), NTHL1 (Exp-logFC = 0.755057612, cor = -0.2686109, p = 5.81E-07), FXN (Exp-logFC = 0.698112433, cor = -0.2547962, p = 2.23E-06) and KIF23 (Exp-logFC = 1.595989666, cor = -0.2520228, p = 2.90E-06) (Table 1).
Table 1. DEG-CpGs
|
Symbol
|
logFC-exp
|
cor
|
p-value
|
SLC11A1
|
1.603420664
|
0.5011
|
0.036827
|
FANCG
|
0.669374024
|
-0.3843059
|
2.87E-13
|
CCND1
|
0.578168838
|
-0.3466985
|
6.35E-11
|
ALKBH2
|
0.705188653
|
-0.3018947
|
1.65E-08
|
CDK5RAP1
|
0.588938787
|
-0.2978591
|
2.6E-08
|
CDO1
|
-1.39378896
|
-0.2903042
|
6E-08
|
HMBS
|
0.889997626
|
-0.2693556
|
5.39E-07
|
NTHL1
|
0.755057612
|
-0.2686109
|
5.81E-07
|
FXN
|
0.698112433
|
-0.2547962
|
2.23E-06
|
KIF23
|
1.595989666
|
-0.2520228
|
2.9E-06
|
PLOD3
|
0.856492524
|
-0.2372911
|
1.1E-05
|
GLRX3
|
0.51508111
|
-0.2336994
|
1.51E-05
|
RAD51
|
1.522265757
|
-0.2295502
|
2.15E-05
|
ATP6V1C1
|
0.516632482
|
-0.2265069
|
2.78E-05
|
PUS1
|
0.939137431
|
-0.2249831
|
3.15E-05
|
FANCI
|
1.382012951
|
-0.2147057
|
7.27E-05
|
XRCC2
|
1.635300181
|
-0.2004757
|
0.000217
|
P4HA3
|
1.973571416
|
0.1896872
|
0.000473
|
ATP6V1C2
|
1.244956964
|
-0.183886
|
0.000707
|
CYP27B1
|
1.336683143
|
-0.183343
|
0.000733
|
UBE2T
|
1.532447267
|
-0.1635036
|
0.002646
|
PRIM2
|
0.761406132
|
-0.1614566
|
0.002997
|
PALB2
|
0.654905467
|
-0.1583963
|
0.003602
|
HYAL2
|
0.650940956
|
-0.1528398
|
0.00499
|
POLA1
|
0.703782535
|
-0.1504755
|
0.005714
|
ALOX12B
|
1.390150295
|
0.1495621
|
0.006019
|
CYP4B1
|
-1.92512197
|
-0.1439937
|
0.008208
|
SCARA5
|
-1.92784436
|
0.1277368
|
0.01916
|
CCNB1
|
1.735808613
|
-0.1267312
|
0.02014
|
REP15
|
-2.2991481
|
-0.1202464
|
0.02753
|
RTEL1
|
0.967633929
|
-0.1037375
|
0.05749
|
ABCE1
|
0.893800372
|
0.1032949
|
0.05857
|
POLD1
|
0.879078201
|
-0.0991999
|
0.06936
|
TMPRSS6
|
-1.56762532
|
0.09883517
|
0.0704
|
RFWD3
|
0.760886638
|
-0.09304779
|
0.08858
|
FANCB
|
1.564025495
|
-0.09022061
|
0.09874
|
FANCA
|
1.248125352
|
-0.08374725
|
0.1255
|
PLOD1
|
0.844171575
|
-0.07410696
|
0.1754
|
COL7A1
|
1.812713075
|
-0.06719322
|
0.2193
|
DNA2
|
0.987347716
|
0.06629113
|
0.2255
|
FANCM
|
0.640403824
|
0.06592408
|
0.2281
|
PPEF1
|
1.314169478
|
0.06155626
|
0.2605
|
DOHH
|
0.680727916
|
-0.05825401
|
0.287
|
MMP1
|
1.558222571
|
-0.05223964
|
0.3398
|
CYP26B1
|
1.312969294
|
-0.04155381
|
0.4477
|
PPAT
|
1.01160099
|
0.03998513
|
0.4651
|
BRIP1
|
0.876330825
|
-0.03070913
|
0.5748
|
FANCE
|
0.816669612
|
-0.02599784
|
0.6349
|
YARS2
|
0.514504526
|
-0.00627551
|
0.9088
|
BRCA1
|
1.050006084
|
0.000138788
|
0.998
|
3.3 Construction and evaluation of a IMRGs signature
The 58 DE-IMRGs of the TCGA database were imported to univariate COX regression and 5 prognosis related genes (P4HA3, DOHH, POLD1, MMP1 and FANCE) were obtained (Figure 2A). Afterward, these 5 signature genes were evaluated as signature genes for multivariate Cox regression analysis. Finally, DOHH (coef = -0.37939, HR = 0.6843, p = 0.05429), P4HA3 (coef = 0.30383, HR = 1.3550, p = 0.08584) and MMP1 (coef = 0.07982, HR = 1.0831, p = 0.10920) were selected as signature genes (Figure 2B).
The patients in training cohort were classified into low- and high-risk groups (Figure 3A). In addition, the signature genes of P4HA3 and MMP1 were positively linked to risk-score, but the DOHH was negatively associated with risk-score (Figure 3B). The Kaplan-Meier curve showed that patients with higher risk had a poorer prognosis with p = 0.0028 (Figure 3C). In details, the AUCs for OS (1-, 2-, 3-, 4- and 5-years) were 0.617, 0.604, 0.629, 0.653 and 0.711 (Figure 3D).
Likewise, the GC patients of validation cohort were segmented into low- and high-risk groups (Figure 4A). The correlation of risk-score and signature genes (Figure 4B), and the result of Kaplan-Meier curve were similar with training cohort (Figure 4C). Moreover, the AUCs for 1-year was 0.675; 2-years was 0.649; 3-years was 0.609; 4-years was 0.602 and 5-years was 0.602 (Figure 4D).
3.4 Independent prognostic analysis of IMRGs signature
The 345 patients of TCGA with complete information including treatment type, risk-score, ajcc pathologic stage, gender, ajcc pathologic m, age at index and ajcc pathologic t were included for univariate Cox regression analysis. As shown in Figure 5A, treatment type and risk-score were related with risk model. Moreover, treatment type and risk-score were selected as independent prognostic factors by multivariate Cox regression analysis (Figure 5B). Next, treatment type and risk-score were included to establish a nomogram and the C-index of nomogram model was 0.596096 (Figure 5C). Furthermore, the calibration effect of 1- and 3-years in calibration curve were performed well (Figure 5D). Decision curve analysis (DCA) shows that the nomogram model achieves better net benefit than 1-year OS rate (Figure 5E).
3.5 Correlation analysis between the IMRGs signature and clinical characteristics
The expression of the P4HA3 and MMP1 were high in GC patients of high-risk group. However, the abundance of the DOHH was high in GC of low-risk group (Figure 6A). Moreover, the expression of MMP1 was higher in GC patients at age >= 60 (Figure 6B).
3.6 Immune infiltration analysis
For studying the imparity of immune infiltration in patients with differing risks, the spearman correlations for 24 immune cells were calculated. The 11 immune cells were different between the two risk GC subgroups (Figure 7A). The relationship between the signature genes and 24 immune cells and signature genes shown that P4HA3 was significantly associated with 17 immune cells; MMP1 was significantly associated with 12 immune cells; DOHH was significantly related to 18 immune cells (Figure 7B).
3.7 Collaborative gene analysis of signature genes
To investigate genes that act synergistically with signature genes, the pearson correlations were calculated between pairwise genes (Figure 8A). The P4HA3 were associated with 49 genes, such as ADAMTS2, ANTXR1 and APCDD1L (Figure 8A). The DOHH were associated with 33 genes, such as ADAMTS2, ANTXR1 and APCDD1L (Figure 8A). The MMP1 were associated with 2 genes including IL24 and MMP3 (Figure 8A). The GO and KEGG results of 84 collaborative genes were shown in the Figure 8B and Figure 8C. The top 8 GO terms were collagen-containing extracellular matrix and extracellular matrix organization etc. The KEGG pathways of enrichment were‘Protein digestion and absorption’ as well as ‘AGE-RAGE signaling pathway in diabetic complications’.
3.8 qRT-PCR
The expression of DOHH, P4HA3 and MMP1 were detected via qRT-PCR. The abundance of these genes were higher in tumor cases than normal cases (Figure 9).