Identification of differentially methylated markers
The workflow for this analysis and the demographic characteristics of participants are presented in Fig. 1 and Table 1, respectively. We analysed the microarray methylation profiles of 166 CRC subjects (87 males and 79 females) and 424 healthy normal subjects (84 males and 340 females). The mean age was 55 years for CRC subjects and 53 years for healthy normal subjects. The average time-to-diagnosis for cases was 6.2 years (range = 0-14.3). The Illumina Human Methylation 450 BeadChip measures the DNA methylation status of 485,512 CpG sites. After pre-processing and quality control, in which poorly performing probes were filtered out, 399,934 CpG sites remained (Additional file 1: Figure S1), and their methylation data were used for further analysis. A total of 49,299 CpGs (corresponding to 11,786 unique genes) were differentially methylated (FDR < 0.05) between the CRC and healthy normal subjects.
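The FDR threshold above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which the text does not name, and uses toy p-values for five hypothetical CpG sites rather than study data; the function name `benjamini_hochberg` is illustrative only.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for five hypothetical CpG sites (assumption, not study data).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(toy_p)

# Sites that would be called differentially methylated at FDR < 0.05.
significant = [i for i, qi in enumerate(q) if qi < 0.05]
```

In this toy input, two raw p-values below 0.05 no longer pass the threshold after adjustment, which is the behaviour an FDR cut-off is meant to provide when several hundred thousand sites are tested.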
Table 1
Characteristics of Training and Testing Dataset of Nested Case Control Study Based on EPIC-Italy Cohort
| Entire Dataset | | | Training Dataset | | | Testing Dataset | |
Characteristics | Cases | Control | | Cases | Control | | Cases | Control |
Total | 166 | 424 | | 117 | 297 | | 49 | 127 |
Age, Mean (SD) | 55.07 (6.73) | 53.23 (7.19) | | 55.94 (6.73) | 53.08 (7.20) | | 55.25 (6.62) | 53.56 (7.20) |
< 60 | 128 (26.9) | 348 (73.1) | | 89 (26.4) | 245 (73.4) | | 39 (27.5) | 103 (72.5) |
≥ 60 | 38 (33.3) | 76 (66.7) | | 28 (35.0) | 52 (65.0) | | 10 (29.4) | 24 (70.6) |
Gender | | | | | | | | |
Male | 87 (52.4) | 84 (19.8) | | 55 (47.4) | 61 (52.6) | | 32 (58.2) | 23 (41.8) |
Female | 79 (47.6) | 340 (80.2) | | 62 (20.8) | 236 (79.2) | | 17 (14.0) | 104 (86.2) |
Time-to-diagnosis (years) Mean (SD) | | | | | | | | |
< 6 | 80 (48.2) | NA | | 56 (47.9) | NA | | 24 (49.0) | NA |
≥ 6 | 86 (51.8) | NA | | 61 (52.1) | NA | | 25 (51.0) | NA |
Abbreviations: NA, not applicable; SD, standard deviation |
Gene Ontology (GO) and KEGG pathway enrichment analyses were performed for the genes associated with the 49,299 differentially methylated CpGs. The GO analysis identified the molecular functions, cellular components and biological processes enriched among the differentially methylated genes under the criterion FDR < 0.05 (Additional file 2: Table S1). In the KEGG analysis, the genes were enriched in metabolic pathways (FDR = 1.19e-03), cancer pathways (FDR = 6.58e-03), human papillomavirus infection (FDR = 1.61e-02), the Rap1 signaling pathway (FDR = 4.36e-04) and axon guidance (FDR = 2.12e-03) (Additional file 3: Table S2).
Of the 49,299 differentially methylated CpGs, 48 CpGs (corresponding to 29 unique genes) with an absolute mean β-value difference |∆β| ≥ 0.05 were selected and denoted DMPs (Additional file 4: Table S3). Among the DMPs, 15 CpGs (corresponding to 8 unique genes) were hypermethylated and 33 CpGs (corresponding to 21 unique genes) were hypomethylated. Hierarchical clustering was performed to determine whether the identified DMPs could distinguish CRC from healthy normal subjects. The results show that CRC and healthy normal subjects exhibited distinctive DNA methylation patterns (Fig. 2). Moreover, the DMPs separated the samples into two distinct clusters.
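The DMP selection rule described above (FDR < 0.05 combined with |∆β| ≥ 0.05) can be sketched as follows. The function `select_dmps`, the CpG labels and all values are hypothetical and serve only to illustrate the filter; ∆β is taken here as mean β in cases minus mean β in controls, which is an assumption about the direction of the difference.

```python
from statistics import mean
from typing import Dict, List


def select_dmps(
    case_beta: Dict[str, List[float]],
    control_beta: Dict[str, List[float]],
    fdr: Dict[str, float],
    fdr_cutoff: float = 0.05,
    delta_cutoff: float = 0.05,
) -> Dict[str, str]:
    """Keep CpGs passing both thresholds and label their direction."""
    selected = {}
    for cpg, q in fdr.items():
        # Mean beta difference, cases minus controls (assumed direction).
        delta = mean(case_beta[cpg]) - mean(control_beta[cpg])
        if q < fdr_cutoff and abs(delta) >= delta_cutoff:
            selected[cpg] = "hyper" if delta > 0 else "hypo"
    return selected


# Toy beta values for four hypothetical CpGs (not study data).
case_beta = {
    "cpg_A": [0.70, 0.72, 0.74],
    "cpg_B": [0.30, 0.32],
    "cpg_C": [0.50, 0.52],
    "cpg_D": [0.80, 0.82],
}
control_beta = {
    "cpg_A": [0.60, 0.62, 0.64],
    "cpg_B": [0.40, 0.42],
    "cpg_C": [0.49, 0.51],
    "cpg_D": [0.60, 0.62],
}
fdr = {"cpg_A": 0.01, "cpg_B": 0.02, "cpg_C": 0.001, "cpg_D": 0.20}

dmps = select_dmps(case_beta, control_beta, fdr)
```

In the toy data, `cpg_C` is statistically significant but has a negligible effect size, and `cpg_D` has a large difference but fails the FDR criterion, so neither is retained.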
Methylation risk score construction
The entire sample of 590 subjects was randomly split into a training set (117 CRC subjects and 297 healthy normal subjects) and a testing set (49 CRC subjects and 127 healthy normal subjects) (Table 1). Differentially methylated markers associated with CRC risk were screened in the training dataset using LASSO selection and stepwise logistic regression analysis. The sixteen markers selected by both methods, which mapped to nine genes (LGR6, PTPN12, PPFIA3, LOC399959, PCDHGA1, RNF39, ESYT3, MRGPRG and ATHL1), were retained (Additional file 5: Figure S2). The associations of the sixteen individual markers with CRC in univariate and multivariate logistic regression analyses are presented in Additional file 6: Table S4 and Table 2, respectively.
Furthermore, using the sixteen-CpG panel, we calculated a methylation risk score (MRS) for each subject in the training dataset using the formula:
MRS = (-0.4100 × cg06551493) + (0.4332 × cg01419670) + (0.2895 × cg16530981) + (-0.5172 × cg18022036) + (-0.3915 × cg12691488) + (-0.3246 × cg17292758) + (-0.2886 × cg16170495) + (0.2451 × cg11240062) + (-0.5651 × cg21585512) + (0.3615 × cg24702253) + (-0.2445 × cg17187762) + (-0.3951 × cg05983326) + (-0.5089 × cg06825163) + (-0.2504 × cg11885357) + (-0.2357 × cg08829299) + (-0.3607 × cg07044115). Five CpG sites were hypermethylated and 11 CpG sites were hypomethylated.
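The MRS formula above is a weighted sum, which can be sketched in a few lines. The coefficients are copied from the formula; the function name `methylation_risk_score`, the input format (a mapping from CpG ID to a methylation value) and the scale of those values are assumptions, since the text does not state how the methylation values were scaled before weighting.

```python
from typing import Dict

# Coefficients taken from the MRS formula in the text.
MRS_WEIGHTS: Dict[str, float] = {
    "cg06551493": -0.4100,
    "cg01419670": 0.4332,
    "cg16530981": 0.2895,
    "cg18022036": -0.5172,
    "cg12691488": -0.3915,
    "cg17292758": -0.3246,
    "cg16170495": -0.2886,
    "cg11240062": 0.2451,
    "cg21585512": -0.5651,
    "cg24702253": 0.3615,
    "cg17187762": -0.2445,
    "cg05983326": -0.3951,
    "cg06825163": -0.5089,
    "cg11885357": -0.2504,
    "cg08829299": -0.2357,
    "cg07044115": -0.3607,
}


def methylation_risk_score(values: Dict[str, float]) -> float:
    """Weighted sum of the sixteen CpG values for one subject.

    `values` maps each CpG ID to that subject's methylation value
    (scale assumed, not specified in the text).
    """
    missing = set(MRS_WEIGHTS) - set(values)
    if missing:
        raise KeyError(f"missing CpG values: {sorted(missing)}")
    return sum(weight * values[cpg] for cpg, weight in MRS_WEIGHTS.items())


# Hypothetical subject with every CpG value set to 1.0 (illustration only).
all_ones = {cpg: 1.0 for cpg in MRS_WEIGHTS}
score_all_ones = methylation_risk_score(all_ones)

# Hypothetical subject with a single non-zero CpG value.
single = {cpg: 0.0 for cpg in MRS_WEIGHTS}
single["cg01419670"] = 1.0
score_single = methylation_risk_score(single)
```

Raising an error on missing CpGs, rather than silently treating them as zero, avoids producing a score that looks valid but omits part of the panel.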
In the training dataset, the MRS (range, -5.59 to 4.35) was significantly higher in CRC subjects than in healthy normal subjects (P < 0.001), with a median MRS of 1.68 (IQR, 1.43) in CRC subjects and -0.43 (IQR, 2.89) in healthy normal subjects (Additional file 7: Figure S3a). The MRS was associated with a 2.68-fold increased risk of CRC (OR = 2.68, 95% CI: 2.13, 3.38, P < 0.0001) (Table 2). The MRS showed good predictive ability for discriminating between CRC and healthy normal subjects (AUC, 0.85; 95% CI: 0.81, 0.89) (Fig. 3a).
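As an illustration of what the AUC measures, the sketch below computes it as the proportion of case-control pairs in which the case has the higher score, with ties counted as one half. This is the standard rank-based interpretation of the AUC and not necessarily the software routine used in the study; the function name and the scores are hypothetical.

```python
from typing import Sequence


def auc_from_scores(
    case_scores: Sequence[float], control_scores: Sequence[float]
) -> float:
    """AUC as the fraction of case-control pairs ranked correctly."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))


# Toy MRS values for three cases and four controls (not study data).
toy_cases = [1.7, 0.9, 2.4]
toy_controls = [-0.4, 1.2, -1.1, 0.3]
toy_auc = auc_from_scores(toy_cases, toy_controls)
```

Under this reading, an AUC of 0.85 means that a randomly chosen CRC subject has a higher MRS than a randomly chosen healthy subject in about 85% of pairs.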
Validation of the sixteen-CpG panel MRS for CRC prediction in the testing dataset
To validate the performance of the sixteen-CpG panel MRS for predicting CRC risk, the predictive model was applied to the testing dataset. The MRS (range, -5.73 to 3.89) was again significantly higher in CRC subjects than in healthy normal subjects (P < 0.0001), with a median MRS of 1.83 (IQR, 1.80) in CRC subjects and -0.45 (IQR, 2.64) in healthy normal subjects (Additional file 7: Figure S3b). Consistent with the training dataset, the MRS was associated with a 2.02-fold increased risk of CRC (OR = 2.02, 95% CI: 1.48, 2.74, P < 0.0001) (Table 2). As in the training dataset, the MRS showed good predictive ability for discriminating between CRC and healthy normal subjects (AUC, 0.82; 95% CI: 0.76, 0.88) (Fig. 3b).
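The reported odds ratio and confidence interval can be related to the underlying logistic regression coefficient with a short worked example. It assumes a Wald-type 95% interval (coefficient ± 1.96 standard errors on the log-odds scale), which the text does not state; the function names are illustrative.

```python
import math
from typing import Tuple


def log_odds_summary(
    odds_ratio: float, lower: float, upper: float, z: float = 1.96
) -> Tuple[float, float]:
    """Recover the log-odds coefficient and its standard error.

    Assumes the interval was built as exp(beta +/- z * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def odds_ratio_with_ci(
    beta: float, se: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient back to an OR with its interval."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Testing-dataset values reported in the text: OR = 2.02 (1.48, 2.74).
beta, se = log_odds_summary(2.02, 1.48, 2.74)
or_back, lower_back, upper_back = odds_ratio_with_ci(beta, se)
```

The round trip reproduces the reported interval to within rounding, which is consistent with, but does not prove, a Wald-type interval for the MRS coefficient.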
Subgroup analysis for the association between MRS and CRC risk
When the study subjects were stratified according to gender, age and time-to-diagnosis, the MRS remained associated with an increased risk of CRC among both male and female subjects, among younger (< 60 years) and older (≥ 60 years) subjects, and among subjects with short and long time-to-diagnosis, in both the training and testing datasets (Table 3). In addition, the case-only analysis showed no correlation between methylation levels and time-to-diagnosis (Additional file 8: Table S5).
Table 2
Multivariate Analysis on the Associations of DNA Methylation Markers, MRS and Risk of CRC of Nested Case Control Study Based on EPIC-Italy Cohort
| | | Entire Dataset | | | | Training Dataset | | | | Testing Dataset | | |
CpG ID | Gene Name | | OR | 95% CI | P-value | | OR | 95% CI | P-value | | OR | 95% CI | P-value |
cg06551493 | PTPN12 | | 0.62 | 0.49, 0.78 | 6.58e-05 | | 0.71 | 0.54, 0.91 | 0.009 | | 0.42 | 0.25, 0.68 | 7.18e-04 |
cg01419670 | NA | | 2.12 | 1.62, 2.85 | 1.47e-07 | | 2.36 | 1.71, 3.361 | 6.16e-07 | | 1.62 | 1.00, 2.82 | 0.06 |
cg16530981 | NA | | 1.96 | 1.52, 2.57 | 5.01e-07 | | 2.15 | 1.60, 2.98 | 1.29e-06 | | 1.53 | 0.99, 2.57 | 0.08 |
cg18022036 | NA | | 0.56 | 0.45, 0.70 | 5.47e-07 | | 0.54 | 0.41, 0.69 | 3.97e-06 | | 0.67 | 0.43, 1.03 | 0.07 |
cg12691488 | NA | | 0.73 | 0.47, 1.10 | 0.14 | | 0.67 | 0.40, 1.08 | 0.11 | | 0.84 | 0.34, 1.95 | 0.70 |
cg17292758 | PPFIA3 | | 0.79 | 0.64, 0.98 | 0.04 | | 0.79 | 0.61, 1.02 | 0.07 | | 0.79 | 0.51, 1.20 | 0.27 |
cg16170495 | RNF39 | | 0.66 | 0.54, 0.80 | 3.17e-05 | | 0.68 | 0.54, 0.85 | 0.001 | | 0.62 | 0.41, 0.90 | 0.01 |
cg11240062 | NA | | 1.25 | 1.00, 1.57 | 0.06 | | 1.30 | 1.00, 1.69 | 0.05 | | 1.11 | 0.70, 1.78 | 0.67 |
cg21585512 | LOC399959 | | 0.68 | 0.55, 0.83 | 1.61e-04 | | 0.58 | 0.45, 0.74 | 1.64e-05 | | 0.94 | 0.64, 1.38 | 0.75 |
cg24702253 | MRGPRG | | 1.74 | 1.28, 2.58 | 0.002 | | 1.78 | 1.24, 2.81 | 0.005 | | 0.74 | 1.01, 4.58 | 0.11 |
cg17187762 | NA | | 0.76 | 0.63, 0.93 | 0.006 | | 0.78 | 0.62, 0.97 | 0.03 | | 0.66 | 0.44, 0.98 | 0.04 |
cg05983326 | PCDHGA1 | | 0.69 | 0.57, 0.84 | 2.68e-04 | | 0.73 | 0.57, 0.91 | 0.007 | | 0.57 | 0.38, 0.84 | 0.006 |
cg06825163 | LGR6 | | 0.70 | 0.57, 0.86 | 5.47e-04 | | 0.67 | 0.52, 0.84 | 8.11e-04 | | 0.82 | 0.56, 1.21 | 0.34 |
cg11885357 | ESYT3 | | 0.89 | 0.73, 1.08 | 0.23 | | 0.83 | 0.65, 1.04 | 0.11 | | 1.07 | 0.72, 1.59 | 0.74 |
cg08829299 | ATHL1 | | 0.86 | 0.70, 1.04 | 0.13 | | 0.84 | 0.66, 1.05 | 0.13 | | 0.93 | 0.63, 1.37 | 0.69 |
cg07044115 | NA | | 0.77 | 0.63, 0.93 | 0.008 | | 0.82 | 0.66, 1.03 | 0.07 | | 0.60 | 0.40, 0.89 | 0.01 |
MRS | | | 2.41 | 2.02, 2.90 | 0.02 | | 2.68 | 2.13, 3.38 | < 0.0001 | | 2.02 | 1.48, 2.74 | < 0.0001 |
Abbreviations: CI, confidence interval; CRC, colorectal cancer; MRS, methylation risk score; ORs adjusted for age and gender; P values < 0.05 are in bold |
Table 3
Associations of MRS and Risk of CRC According to Age, Gender and Time-To-Diagnosis of Nested Case Control Study Based on EPIC-Italy Cohort
| Entire Dataset | | | | Training Dataset | | | | Testing Dataset | | |
Characteristics | OR | 95% CI | P-value | | OR | 95% CI | P-value | | OR | 95% CI | P-value |
Age | | | | | | | | | | | |
< 60 | 2.35 | 1.95, 2.90 | < 0.0001 | | 2.62 | 2.06, 3.44 | < 0.0001 | | 1.97 | 1.44, 2.84 | < 0.0001 |
≥ 60 | 2.60 | 1.77, 4.17 | < 0.0001 | | 2.87 | 1.77, 5.35 | 0.0002 | | 2.28 | 1.20, 5.51 | 0.03 |
Gender | | | | | | | | | | | |
Male | 1.97 | 1.47, 2.71 | < 0.0001 | | 2.10 | 1.44, 3.21 | 0.0002 | | 1.91 | 1.20, 3.37 | 0.02 |
Female | 2.64 | 2.12, 3.37 | < 0.0001 | | 2.96 | 2.26, 4.08 | < 0.0001 | | 2.08 | 1.46, 3.20 | 0.0002 |
Time-to-diagnosis | | | | | | | | | | | |
< 6 years | 2.21 | 1.80, 2.77 | < 0.0001 | | 2.40 | 1.86, 3.20 | < 0.0001 | | 1.99 | 1.40, 3.02 | 0.0004 |
≥ 6 years | 2.51 | 2.01, 3.23 | < 0.0001 | | 2.88 | 2.17, 4.01 | < 0.0001 | | 1.97 | 1.36, 3.05 | 0.0008 |
Abbreviations: CI, confidence interval; CRC, colorectal cancer; MRS, methylation risk score; OR, odds ratio; ORs adjusted for age and gender |