3.1Demography
After excluding those patients without methylation or clinical information, 294COAD and 97 READ patients were included in the study. The clinical and pathological information of these patients displayed in Table 1-2. In COAD cohort, 3.40% of patients were less than 30–39 years old, 11.22% were 40–49 years old, 18.71% were 50–59 years old, 26.19% were 60–69 years old, 25.85% were 70–79 years old, and 14.63% were above 80 years old. There were, respectively, 44 patients with pathologic TNM stage I, 114 patients with pathologic TNM stage II, 85 patients with pathologic TNM stage III, 41 patients with pathologic TNM stage IV, and 10 patients with an unknown TNM stage in our study (table 1).
In READ cohort, 3.09% of patients were less than 30–39 years old, 14.43% were 40–49 years old, 22.68% were 50–59 years old, 23.71% were 60–69 years old, 31.96% were 70–79 years old, and 4.12% were above 80 years old. There were, respectively, 11 patients with pathologic TNM stage I, 29 patients with pathologic TNM stage II, 35 patients with pathologic TNM stage III, 13 patients with pathologic TNM stage IV, and 9 patients with an unknown TNM stage in our study (table 2).
Table 1
COAD clinical characteristics
Colon Clinical variables
|
Clinical values (N=294)
|
Sex (Male/Female)
|
158/136
|
Age (Mean + SEM)
|
64.94 ± 0.7738
|
Race (Asian/Black/White/NA)
|
11/58/206/19
|
Pathologic stage (I/II/III/IV/V/NA)
|
44/114/85/41/10
|
Pathologist T stage (T1/T2/T3/T4/NA)
|
7/43/203/10/1
|
Pathologic N stage (N0/N1+2/NA)
|
171/74/49
|
Pathologic M stage (M0/M1/NA)
|
198/41/55
|
Table 2
READ clinical characteristics
Rectum Clinical variables
|
Clinical values (N=97)
|
Sex (Male/Female)
|
53/44
|
Age (Mean + SEM)
|
62.86 ± 1.251
|
Race (Asian/Black/White/NA)
|
1/5/76/15
|
Pathologic stage (I/II/III/IV/V/NA)
|
11/29/35/13/9
|
Pathologist T stage (T1/T2/T3/T4/NA)
|
4/12/59/11/1
|
Pathologic N stage (N0/N1+2/ NA)
|
42/30/25
|
Pathologic M stage (M0/M1/NA)
|
69/12/16
|
3.2 Differential methylationand expression analysis
We downloaded 334 COAD (38 control vs 296 tumor) and 105 READ (7 control vs 98 tumor) methylation data from the TCGA database and set thresholds for the relevant parameters (padj< 0.05 and |logFC| ≥ 0.2) to identify the differentially methylation probes (DMPs). There were 41847 DMPs (21079 hypermthylaed, 20768 hypomethylated) and 42841 DMPs (27250 hypermthylaed, 15591hypomethylated)which had been identified in COAD and READ respectively (Fig 1a-b).Combined all methylation samples from COAD and READ, 40936 DMPs (21998 hypermthylaed, 18938 hypomethylated)had been identified (Fig 1c). By considering the CpG content and the neighboring context, the hypomethylaion rate of island was shown to be the highest (71.87%, 67.63%, 69.09%) while the hypermethylaion rate of opensea was shown to be the highest (70.31%, 65.31%, 69.65%) in COAD, READ, and CRC respectively (Fig 1d-f). Examining the sites surrounding genes revealed that the hypomethylaion rate of body was shown to be the highest (26.31%, 27.65%, 26.48%) while the hypermethylaion rate of IGR was shown to be the highest (39.71%, 38.44%, 39.58%) in COAD, READ, and CRC respectively (Fig g-i).
Similarly, we also downloaded 497 COAD (41 control vs456 tumor) and 176 READ (10 control and 166 tumors) mRNA sequencing data from TCGA database and set thresholds for the relevant parameters (padj<0.05, |logFC|≥0.5, and basemean≥50) to identify the differentially expression genes (DEGs) by DEseq2 analysis standard workflow. After analysis, 7550 DEGs (4089up-regulated,3461 down-regulated) had been identified in COAD, and 6954 DEGs (3791 up-regulated,3163 down-regulated) had been identified in READ (Fig 2a-b).When we combined all collected samples in COAD and READ, 7616 DEGs (4111 up-regulated, 3505down-regulated) had been identified in CRC (Fig 2c).
By cross analysis of DEGs and DMGs, we found that there were 2080, 2064 and 2118 overlap genes in COAD, READ and CRC respectively, and 9221,7647 and 8821 corresponding DMPs (Fig 2d-f).And by cross analysis of DEGs and DMGs located in TSS1500, TSS200 and 5′UTR, we found that there were 1131, 1078 and 1130 overlap genes in COAD, READ and CRC respectively which corresponds with 3549, 2685, and 3392 DMPs (Fig 2g-i).
3.3 Correlation Analysis
Previous studies indicated that the relationship between methylation and genes expression is negative correlation. To further narrow-down target genes which potentially regulated by methylation in CRC, we introduced correlation analysisand found that there were 348, 252 and 316 overlap genes in COAD, READ and CRC respectively between DMGs and DEGs under specific criteria for DMGs (pvalue<0.05 and r<-0.3). The methylation degree distributions of top three hypermethylated and hypomethylated genes for COAD, READ and CRC patients were shown in figure 3.And the top 3 negative correlations of those genes for COAD, READ and CRC patients were also shownin figure 4.
To furtherknown the biological effects of those overlap genes of DMGs and DEGs with negative correlation, we performed GO analysis.The result of GO analysis for COAD, READ and CRC revealed that biological process (BP), cellular component (CC) and molecular functions (MF) were enriched and displayed in supplementary table 1. We setp value < 0.05 and FDR cutoff < 0.05, and found 3 BP, 1 CC and 2 MF were enriched in COAD involved 257 DEGs; 1 MF was enriched in READ involved 33 DEGs; 3 BP and 1 MF were enriched in CRC involved 166 DEGs (Fig 5a-d).
3.4 Survival Analysis
In order to evaluate the effect of methylation regulated genes on COAD, READ and CRC patient’s prognosis, we conducted the Kaplan–Meier survival analysis and univariate Cox regression analysis. We found that 24 out of 348 genes were associated with the patient’s overall survival (OS). Patients with low expression of 11 genes (APBB1, FBXO17, FES, GSTM2, HSPA1A, NPTX2, OSBPL3, SLC43A3, SPEG, TCF7L1, and TME) and high expression of 13 genes (CBFA2T3, CDC42SE2, EDNRB, EIF4E3, GNPNAT1, HHIP, NRG1, REP15, RORC, SALL1, TSPAN11, UNC5C, and ZNF132) exhibited better OS in COAD. And 16 out of 252 genes were associated with the READ patient’s overall survival (OS). Patients with low expression of 11 high expression genes (ABCC2, AQP1, CLSTN3, EPHX1, HOXA3, HOXB3, HYAL2, PGC, S100A11, ST3GAL2, and ZNF415) and 5 high expression genes (C2CD4A, EPHX4, EREG, FCHO1, and NOL12) exhibitedbetter OS in READ.And 28 out of 316 differential genes were associated with the all patient’s overall survival (OS). Patients with low expression of these 15 genes (ADAMTSL3, ANXA9, APBB1, AQP1, CBFA2T3, CLIP3, DNAJC15, GNG4, HSPA1A, LAYN, NDRG4, NPTX2, SMAD9, SYT1, TMEM106A, ZNF132) and high expression of these 13 genes (C2CD4A, DPP10, EIF4E3, EPHX4, FAM160A1, HHIP, HLX, NR3C2, RELN, REP15, TSPAN11, UNC5C, and VSIG2) exhibited better OS in CRC.
Table 3
Survival analyses of COAD, READ and CRC
COAD
|
Symbol
|
LogRank
|
HR
|
HRlower
|
HRupper
|
P
|
APBB1
|
0.027
|
1.58
|
1.05
|
2.39
|
0.029
|
CBFA2T3
|
0.013
|
0.59
|
0.39
|
0.90
|
0.014
|
CDC42SE2
|
0.009
|
0.58
|
0.38
|
0.88
|
0.010
|
EDNRB
|
0.045
|
0.65
|
0.43
|
0.99
|
0.047
|
EIF4E3
|
0.030
|
0.63
|
0.41
|
0.96
|
0.031
|
FBXO17
|
0.047
|
1.51
|
1.00
|
2.27
|
0.048
|
FES
|
0.015
|
1.66
|
1.10
|
2.51
|
0.017
|
GNPNAT1
|
0.018
|
0.61
|
0.41
|
0.92
|
0.019
|
GSTM2
|
0.017
|
1.64
|
1.09
|
2.47
|
0.018
|
HHIP
|
0.030
|
0.64
|
0.42
|
0.96
|
0.032
|
HSPA1A
|
0.000
|
2.18
|
1.42
|
3.35
|
0.000
|
NPTX2
|
0.029
|
1.57
|
1.04
|
2.37
|
0.030
|
NRG1
|
0.013
|
0.59
|
0.39
|
0.90
|
0.014
|
OSBPL3
|
0.017
|
1.65
|
1.09
|
2.48
|
0.018
|
REP15
|
0.024
|
0.63
|
0.41
|
0.94
|
0.026
|
RORC
|
0.035
|
0.65
|
0.43
|
0.97
|
0.037
|
SALL1
|
0.028
|
0.63
|
0.41
|
0.95
|
0.029
|
SLC43A3
|
0.021
|
1.62
|
1.07
|
2.43
|
0.022
|
SPEG
|
0.020
|
1.61
|
1.07
|
2.43
|
0.022
|
TCF7L1
|
0.035
|
1.55
|
1.03
|
2.33
|
0.037
|
TMEM106A
|
0.003
|
1.85
|
1.22
|
2.79
|
0.004
|
TSPAN11
|
0.018
|
0.60
|
0.39
|
0.92
|
0.019
|
UNC5C
|
0.011
|
0.58
|
0.38
|
0.89
|
0.012
|
ZNF132
|
0.034
|
0.64
|
0.43
|
0.97
|
0.036
|
READ
|
Symbol
|
LogRank
|
HR
|
HRlower
|
HRupper
|
P
|
ABCC2
|
0.041
|
2.33
|
1.01
|
5.35
|
0.047
|
AQP1
|
0.043
|
2.30
|
1.00
|
5.26
|
0.049
|
C2CD4A
|
0.015
|
0.37
|
0.16
|
0.85
|
0.020
|
CLSTN3
|
0.006
|
3.19
|
1.34
|
7.56
|
0.009
|
EPHX1
|
0.027
|
2.40
|
1.08
|
5.35
|
0.032
|
EPHX4
|
0.004
|
0.31
|
0.14
|
0.73
|
0.007
|
EREG
|
0.014
|
0.37
|
0.16
|
0.84
|
0.017
|
FCHO1
|
0.037
|
0.43
|
0.19
|
0.98
|
0.043
|
HOXA3
|
0.027
|
2.48
|
1.08
|
5.67
|
0.032
|
HOXB3
|
0.020
|
2.58
|
1.13
|
5.89
|
0.025
|
HYAL2
|
0.026
|
2.43
|
1.09
|
5.42
|
0.031
|
NOL12
|
0.016
|
0.37
|
0.16
|
0.86
|
0.020
|
PGC
|
0.020
|
2.53
|
1.13
|
5.68
|
0.024
|
S100A11
|
0.023
|
2.58
|
1.11
|
6.00
|
0.028
|
ST3GAL2
|
0.009
|
2.92
|
1.26
|
6.77
|
0.012
|
ZNF415
|
0.042
|
2.31
|
1.01
|
5.27
|
0.048
|
CRC
|
Symbol
|
LogRank
|
HR
|
HRlower
|
HRupper
|
P
|
ADAMTSL3
|
0.031
|
1.48
|
1.03
|
2.13
|
0.032
|
ANXA9
|
0.010
|
1.61
|
1.12
|
2.31
|
0.011
|
APBB1
|
0.035
|
1.47
|
1.03
|
2.11
|
0.036
|
AQP1
|
0.028
|
1.49
|
1.04
|
2.14
|
0.029
|
C2CD4A
|
0.011
|
0.63
|
0.44
|
0.90
|
0.012
|
CBFA2T3
|
0.005
|
0.59
|
0.41
|
0.86
|
0.006
|
CLIP3
|
0.043
|
1.45
|
1.01
|
2.07
|
0.044
|
DNAJC15
|
0.035
|
0.68
|
0.48
|
0.98
|
0.036
|
DPP10
|
0.034
|
0.68
|
0.47
|
0.97
|
0.035
|
EIF4E3
|
0.027
|
0.66
|
0.46
|
0.96
|
0.028
|
EPHX4
|
0.017
|
0.65
|
0.45
|
0.93
|
0.018
|
FAM160A1
|
0.006
|
0.60
|
0.41
|
0.87
|
0.006
|
GNG4
|
0.043
|
1.45
|
1.01
|
2.07
|
0.045
|
HHIP
|
0.015
|
0.64
|
0.45
|
0.92
|
0.016
|
HLX
|
0.015
|
1.56
|
1.09
|
2.24
|
0.015
|
HSPA1A
|
0.000
|
2.01
|
1.39
|
2.92
|
0.000
|
LAYN
|
0.012
|
1.58
|
1.10
|
2.27
|
0.013
|
NDRG4
|
0.049
|
1.43
|
1.00
|
2.05
|
0.050
|
NPTX2
|
0.042
|
1.45
|
1.01
|
2.08
|
0.043
|
NR3C2
|
0.033
|
0.68
|
0.47
|
0.97
|
0.034
|
RELN
|
0.026
|
0.66
|
0.46
|
0.95
|
0.027
|
REP15
|
0.029
|
0.67
|
0.47
|
0.96
|
0.030
|
SMAD9
|
0.038
|
0.68
|
0.48
|
0.98
|
0.039
|
SYT1
|
0.009
|
1.62
|
1.12
|
2.33
|
0.010
|
TMEM106A
|
0.004
|
1.70
|
1.18
|
2.44
|
0.004
|
TSPAN11
|
0.003
|
0.57
|
0.39
|
0.83
|
0.003
|
UNC5C
|
0.027
|
0.66
|
0.46
|
0.96
|
0.028
|
VSIG2
|
0.022
|
0.66
|
0.46
|
0.94
|
0.023
|
ZNF132
|
0.017
|
0.65
|
0.45
|
0.93
|
0.018
|
As is well-known, the TNM staging were correlated with the OS. We also evaluated their relationship of TNM staging with the OS, and found that the TNM staging were actually associated with the overall survival in COAD, READ and CRC, but not for the T staging in READ (Fig 6a-i). Then, we analyzed the relationship of potential prognosisgenes with TNM staging. The results showed that 3genes (HSPA1A, SALL1, and TCF7L1) were associated with T staging, 9 genes (APBB1, CDC42SE2, FBXO17, FES, GNPNAT1, OSBPL3, RORC, SPEG, and TCF7L1) were associated with N staging and 5 genes (APBB1, EIF4E3, GNPNAT1, HSPA1A, and TCF7L1) were associated with M staging in COAD (Fig 6j). Similarly, AQP1 and HOXA3 were associated with N staging;AQP1 was associated M staging in READ (Fig 6k). Combined analysis, 3 genes (EIF4E3, FAM160A1, HSPA1A) were associated with T staging, 15 genes (ADAMTSL3, ANXA9, APBB1, AQP1, C2CD4A, CLIP3, DNAJC15, EIF4E3, FAM160A1, GNG4, HLX, HSPA1A, LAYN, NR3C2, and SYT1) were associated with N staging and 10 genes (ANXA9, APBB1, AQP1, C2CD4A, EIF4E3, FAM160A1, GNG4, HSPA1A, NR3C2, and SYT1) were associated the M staging in CRC(Fig 6l).
By retrospective examination, we found that those 12 prognosis genes in COAD were negative correlated with 27 DMPs, 2 genes in READ were negative correlated with 8 DMPs, and 15 genes in CRC were negative correlated with 29 DMPs as shown in Fig 7a. Survival analysis also indicated that the patients with low methylation of cg18149207 (hypomethylated gene: RORC) in COAD, cg04408595 (hypomethylated gene: EIF4E3) in CRC and cg19413725 (hypomethylated gene: FAM160A1) in CRC exhibited better OS (Fig 7b, e-f); the patients with high methylation of cg02000808 (hypomethylated gene: HOXA3) and cg21134232 (hypomethylated gene: HOXA3) in READ exhibited better OS (Fig 7c-d).