Identication of a Methylation-Driven Gene Panel for Survival Prediction in Colon Cancer

Background: Prediction and improvement of prognosis is important for effective clinical management of colon cancer patients. Accumulation of a variety of genetic as well as epigenetic changes in colon epithelial cells has been identied as one of the fundamental processes that drive the initiation and progression of colon cancer. This study aimed to explore functional genes regulated by DNA methylation and the potential of these DNA methylation changes to become biomarkers predictive of colon cancer prognosis. Methods: Methylation-driven genes (MDGs) were explored by applying an integrative analysis tool (MethylMix) to The Cancer Genome Atlas (TCGA) colon cancer project. TCGA colon cancer patients with available survival information (n=281) were randomly divided into training dataset (50%) for model construction and testing dataset (50%) for model validation. The prognostic MDG panel was identied in the training dataset by combining the Cox regression model with the least absolute shrinkage and selection operator regularization, a widely used approach to penalize the effect of multicollineatity. GSEA was employed to determine functional pathways associated with the prognostic 6-MDG panel. CD40 expression and methylation in colon cancer samples were also examined in datasets (expression prole [GSE8671] and methylation prole [GSE42752]) from Gene Expression Omnibus. Experimental conrmation of DNA methylation in colon cancer cell lines was performed using methylation specic PCR and bisulte sequencing. Results: We identied and internal validated a prognostic methylation-driven gene panel consisting of six gene members (TMEM88, HOXB2, FGD1, TOGARAM1, ARHGDIB and CD40). High risk phenotype classied by the 6-MDGs panel was associated with cancer-related biological processes, including invasion and metastasis, angiogenesis and tumor immune microenvironment, among others. The prognostic value of the 6-MDGs panel was independent of TNM stage, and its combination with TNM stage and age could help improve survival prediction of colon cancer patients. Additionally, we validated that the expression of CD40 was regulated by promoter region methylation in colon cancer samples and

relapse or metachronous metastase occurs in a subset of these patients, leading to high mortality [6]. Therefore, robust diagnostic, prognostic and predictive biomarkers are clearly and urgently needed.
Currently, TNM staging is the only well-recognized strati cation system used in clinical practice to guide therapy decision and predict CRC patients' prognosis [7,8]. However, the fact that the survival time in patients with the same TNM stage of CRC is often heterogeneous highlights the need for more accurate strategies [9]. Genetic changes, such as gene mutations have long been known to contribute to cancer formation and used to predicts CRC patients' outcome [10,11]. Recently, there is consensus that epigenetic alterations, including aberrant DNA methylation, abnormal histone modi cations, and altered expression of non-coding RNAs occur early and manifest more frequently than genetic changes in CRC [12]. In addition, advances in genomic technologies and bioinformatics have led to the identi cation of speci c epigenetic alterations as potential clinical biomarkers for CRC patients [12,13]. For example, with the availability of genomic platforms capable of broadly surveying gene expression and DNA methylation, as evidenced by The Cancer Genome Atlas (TCGA) project, we can now identify genomic subtypes of CRCs [14,15], and the CpG island methylator phenotype (CIMP) has undoubtedly been one of the most promising epigenetic biomarkers for prognosticating CRC patients [12,16].
By applying an integrative analysis tool (MethylMix) to colon cancer samples from TCGA project, we sought to explore functional genes regulated by DNA methylation and the potential of these DNA methylation changes to become biomarkers predictive of colon cancer prognosis. We identi ed a prognostic methylation-driven gene (MDG) panel consisting of six gene members (TMEM88, HOXB2, FGD1, TOGARAM1, ARHGDIB and CD40). High risk phenotype classi ed by the 6-MDGs panel was associated with cancer-related biological processes, including invasion and metastasis, angiogenesis and tumor immune microenvironment, among others. We also con rmed expression and methylation of CD40, a member of the 6-MDG panel, in colon cancer samples and cell lines.

Methods
Data acquisition and preprocessing Level 3 DNA methylation data of colon adenocarcinoma (COAD) samples measured by the Illumina Human Methylation 450 Beadchip (450K array) were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov/) using the TCGA-Assembler [17]. These data are preprocessed via TCGA pipelines and presented in the form of b value, a ratio between methylated probe intensities and total probe intensities and probe-level data are condensed to a summary beta value by calculating the average methylation value for all CpG sites associated with a gene [18]. Totally, 353 samples of DNA methylation, including 315 COAD samples and 38 tumor adjacent samples were obtained. Methylation data were normalized using limma R package. Level 3 RNA-seq data and clinical information were also retrieved from the TCGA data portal. Among 521 cases of transcriptome pro les, 41 cases were obtained from tumor adjacent tissues, while the remaining 480 cases were COAD tissues. The transcriptome data were normalized and log2 transformed with the functions of DEGList and calcNormFactors in edgeR package [19]. The clinical data were preprocessed by exclusion of samples without survival status and patients whose survival time was less than 30 days were also removed [20]. Two additional datasets of colon cancer (expression pro le [GSE8671] and methylation pro le [GSE42752]) downloaded from Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) were used to examine the expression and methylation of CD40, respectively. GSE8671 dataset contains transcriptional data of 32 COAD patients with adjacent normal mucosa from the same individuals evaluated by Affymetrix Human Genome U133 Plus 2.0 Array [21]. GSE42752 dataset includes genome-wide DNA methylation pro le obtained from 22 COAD samples with corresponding adjacent normal colon mucosa and 20 cancer-unrelated healthy colon mucosa using 450K array [22]. Above data are available with no restrictions for research, and this study was performed in accordance with the guidelines of TCGA and GEO.

Identi cation of MDGs
To identify MDGs, the MethylMix R package was applied to perform an analysis integrating gene expression and DNA methylation data. In the MethylMix algorithm, the methylation state of a gene is established by a b mixture model and hypo-or hyper-methylated genes are determined by comparing their differential methylation state in cancer versus normal tissues (false discovery rate [FDR] < 0.05) [23,24].
MDGs should also ful ll the criteria of having a signi cant predictive effect on gene expression, implying that their methylation is inversely associated with transcription (Pearson coe cient < -0.3, P < 0.05) and thus functionally relevant [23,24].
Construction of the prognostic model for survival prediction Survival analysis was performed on 281 COAD patients for whom both methylation and survival information (overall survival > 30 days) were available. We rst randomly selected 50% of COAD patients as the training set and the remaining 50% of COAD patients as the testing set. Data matrixes were generated by combining methylation levels of the identi ed MDGs with matched follow-up data of COAD patients in the training set or the testing set. Then, univariate Cox regression analysis was performed to screen MDGs signi cantly associated with overall survival (P < 0.05) based on their methylation b value in the training set. Least absolute shrinkage and selection operator (LASSO) estimation, a well suited approach when there are a large number of correlated covariates in the patient cohort for model construction [25], was then performed to penalize the effect of multicollinearity using the glmnet R package [26]. MDGs survived from the LASSO estimation were subsequently subjected to multivariate Cox regression to construct a best tting prognostic model with the Akaike information criterion (AIC) indicating model tness [27]. Survival R package was used to execute steps in the univariate and multivariate Cox regression.

Risk score calculation
The risk score was calculated by a linear combination of the methylation b value of the selected MDGs weighted by their estimated regression coe cient in the multivariable Cox regression analysis as discussed previously [28]. COAD patients were classi ed into high or low risk group using the median risk score of the training set as the cutoff value.
Gene set enrichment analysis (GSEA) GSEA [29] was employed to determine whether the members of a given gene set were generally associated with the risk score derived from the prognostic 6-MDG panel. In the whole process, the risk score (high or low) was designated as the phenotype and the analysis was conducted on the matched gene expression pro le. Random sample permutations and the signi cant threshold were set at 1000 times and FDR < 0.01, respectively. GSEA was performed by the JAVA program (http://software.broadinstitute.org/gsea/index.jsp) using MSigDB C2 CP: KEGG gene set collection. The enriched KEGG pathways were ranked by normalized enrichment score (NES), and if a gene set have a positive NES, high expression level of the majority of its members is positively related to high risk score phenotype.

Experimental validation in colon cancer cell lines
A panel of six colon cancer cell lines (RKO, SW480, SW620, HCT116, DLD1 and LoVo) were included in the present study. All these cell lines were preserved in our institute (The First Medical Centre, Chinese PLA General Hospital, Beijing, China) and were cultured in RPMI 1640 supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin. Semi-quantitative RT-PCR to evaluate mRNA expression of CD40 in colon cancer cell lines with or without 5-aza-2'-deoxycytidine (5-Aza, Sigma) treatment (2 mM for 96h) were carried out as previously described [30]. Genomic DNA was prepared by the proteinase K method. Bisul te treatment, methylation speci c PCR (MSP) and bisul te sequencing (BSSQ) were performed as previously described [31]. Genomic sequences around the transcriptional start site (TSS) were used as the template for CpG island prediction and design of MSP and BSSQ primers using Methyl Primer Express software v1.0 (Thermo Fisher Scienti c). The primers for RT-PCR, MSP and BSSQ were listed in Table S1.
Total protein of CD40 in these colon cancer cell lines was measured by western blot, which was performed as previously described [30]. b-actin was used as a loading control. The antibodies used in western blot were purchased from the Proteintech company (Wuhan, China). We also examined membrane expression of CD40 using ow cytometry analysis. Cells were harvested using trypsin and washed with phosphate-buffered solution before incubation with or without the presence of phycoerythrin (PE)-tagged mouse monoclonal antibody to human CD40 (Sino Biological) at 4 °C for 30min. Then the cells were washed twice to remove unbound antibody before they were measured on a FACSCalibur ow cytometer (BD BioSciences).

Statistical analysis
The Mann-Whitney test and Wilcoxon matched-pairs signed rank test were used to analyze the differences of DNA methylation, gene expression and risk score in non-paired and paired samples, respectively. The relativity between risk score and clinicopathological characteristics was analyzed using the Chi-square test or Fisher's exact test. Survival difference between the high-risk and low-risk group was evaluated by the Kaplan-Meier analysis, and log-rank test was used as a statistical method. Multivariate Cox regression and data strati cation analysis were performed to test whether the risk score derived from the prognostic MDGs panel was independent of COAD patients' clinicopathological features. Receiver operating characteristic (ROC) curve was employed and the area under ROC curve (AUC) was calculated to compare the sensitivity and speci city of survival prediction based on age, TNM stage, the risk score derived from the prognostic 6-MDG panel and their combination. Statistical tests were conducted by GraphPad Prism8 (GraphPad Software) or R 3.6.0 using the corresponding R package mentioned above.

Screening MDGs in COAD
We rst prepared relevant expression and methylation data for the same patients and three data matrixes were acquired: a gene expression pro le of 308 tumor tissues and two methylation pro les of 38 adjacent and 308 tumor tissues, respectively. These pro les were used as input data for MethylMix R package where methylation differential analysis and correlation analysis between DNA methylation and gene expression were conducted. According to the screen criteria, a total of 299 methylation-driven genes were identi ed (Table S2). The methylation pro le of the most signi cant 30 hypo-and hyper-methylated MDGs (ranked by b value difference between tumor and adjacent tissues) was shown in Figure 1A. The correlations between DNA methylation and gene expression and methylation mixture models of the top 3 MDGs were shown in Figure 1B & 1C, respectively.
Identi cation of a prognostic 6-MDG panel in the training set A total of 281 COAD patients with both methylation and adequate follow up (survival time > 30 days) data were included in the survival analysis after preprocessing of the methylation and clinical data. The clinical information of these 281 COAD patients were summarized in Table S3. They were randomly split into the training set (n = 141) and the testing set (n = 140). To identify certain prognostic MDGs, univariate Cox regression analysis was performed in the training set and 12 prognosis related MDGs (P < 0.05; Table S4) were chosen for subsequent LASSO estimation. Ten MDGs survived the LASSO regularization ( Figure 2A) after penalization of the multicollinearity effect and were further subjected to multivariable Cox regression analysis to construct a best tting prognostic model. AIC was used to indicate model tness. Finally, a prognostic DNA methylation gene panel consisting of six MDGs (TMEM88, HOXB2, FGD1, TOGARAM1, ARHGDIB and CD40) was identi ed. The detailed information of the six MDGs was summarized in Table 1. The methylation pro le, correlations between gene expression and DNA methylation and the methylation mixture models of the six MDGs were shown in Figure S1. The prognostic 6-MDG panel included one gene member (ARHGDIB) with statistically non-signi cant P value (P = 0.071; Table 1), but the 6-MDG panel acquired the lowest AIC representing the best model tness and the overall effect was signi cant (AIC = 202.86, global P [Log-rank] < 0.001).
Next, a risk score model for overall survival prediction was created based on the methylation b values of these six MDGs, as follows: risk score = (-6.150* methylation b value of TMEM88) + (-3.593* methylation b value of HOXB2) + (-7.287* methylation b value of FGD1) + (-7.861* methylation b value of TOGARAM1) + (-3.622* methylation b value of ARHGDIB) + (-4.288* methylation b value of CD40). Then we calculated the risk score for each COAD patient, and classi ed them into high or low risk subgroup using the median risk score of the patients in the training set as the cutoff value.
Kaplan-Meier survival curve analysis of the training set showed that COAD patients in the high-risk group had signi cantly shorter median OS than those in the low-risk group (Log-rank P < 0.001; Figure 2B). We also pro led the distribution of risk score, survival status and methylation b value in the training set ( Figure 2C-E). The risk scores of the patients in the training set ranged from -17.883 to -9.677 with the median risk score of -13.807 ( Figure 2C). Moreover, there were more patients alive in the low-risk group than those in the high-risk group (c 2 = 13.45, P = 0.0002; Figure 2D). Interestingly, methylation levels of all the six MDGs were higher in low-risk patients than those in the high-risk patients ( Figure 2E), indicating that hypermethylation of the 6-MDG panel is a favorable prognostic factor of COAD patients.
The 6-MDG panel is predictive of survival in the testing and entire set To further test the signi cance of the prognostic 6-MDG panel in COAD patients, the testing and entire set were used as validation groups. Using the same cutoff value of risk score obtained from the training set, COAD patients in the testing set were divided into high-risk group (n = 75) and low-risk group (n = 65). The result of Kaplan-Meier analysis demonstrated that COAD patients in the high-risk group showed worse overall survival than those in the low-risk group (Log-rank P = 0.0137; Figure 3A), and there were more patients alive in the low-risk group than those in the high-risk group (c 2 = 4.514, P = 0.0336; Figure 3B). We also performed the same analysis on the entire set (training set plus testing set, n = 281) and the results were consistent with those in the training and testing set ( Figure 3C & 3D). Above results suggested that the 6-MDG panel can predict survival in both training and entire set.
The prognostic value of the 6-MDG panel is independent of TNM stage TNM classi cation is widely used, clinically useful and is highly associated with 5-year overall survival in colon cancer [32]. Thus, we set to clarify whether the prognostic value of the 6-MDG panel is independent of TNM stage. For this, we performed multivariable Cox regression and strati cation analysis in the entire set. The multivariable Cox regression analysis was conducted on 271 patients, with age, gender, TNM stage and risk score as covariates. Ten cases of patients were excluded because of missing information on TNM stage. The results showed that age, TNM stage and the risk score remained independent prognostic factors in the multivariable Cox regression analysis ( Figure 4A). Data strati cation analysis was then performed where these patients were strati ed into four subgroups (Stage I, II, III, and IV). The results of strati cation analysis showed that the prognostic 6-MDG panel could identify patients with different overall survival in TNM stage II (Log-rank P = 0.0450) and IV (Log-rank P = 0.0160) subgroups ( Figure 4B), while was insu cient to clarify the patients in TNM stage I (Log-rank P = 0.0750) and TNM stage III (Log-rank P = 0.0975) with signi cantly disparate survival ( Figure 4B). This might be attributed to the small sample size or some truncated data. Thus, we combined low TNM stage (Stage I plus II) and high TNM stage (Stage III plus IV), and the risk score could signi cantly identify patients with different prognoses in these two subgroups (Log-rank P = 0.0083 and 0.0006, respectively; Figure 4C). Above results suggested that prognostic value of the 6-MDG panel is independent of TNM stage.
Moreover, we performed ROC analysis to compare the sensitivity and speci city of overall survival prediction between the prognostic factors including age, TNM stage, the risk score derived from the 6-MDG panel and combination of these three factors. As shown in Figure 4D, there was no signi cant difference when the AUCs of the three prognostic factors (age, TNM stage and the risk score) alone were pairwise compared (all P > 0.05). However, when these three prognostic were combined, the AUC was signi cantly greater than that of each prognostic factor alone (all P < 0.05). These results indicated that the combination of the three prognostic factors (age, TNM stage and the risk score) may help improve survival prediction in patients with COAD.

Assessment of biological pathways associated with the 6-MDG panel
We performed GSEA to identify relevant pathways the 6-MDG might be involved in using the risk score for phenotype classi cation. Gene sets signi cantly enriched (FDR < 0.01) in the high-risk phenotype were shown in Figure 5A. The high risk score was positively associated with up-regulation of several cancerrelated pathways, among others such as invasion and metastasis, angiogenesis and tumor immune microenvironment. For instance, vascular endothelial growth factor (VEGF), a key regulator of the growth and maintenance of blood vessels, can directly modulate the vascular wall by loosening cell-cell contacts and increasing the leakiness of blood vessels which favors tumor cell dissemination [33].
Next, we analyzed the relativity between the clinicopathological features and the risk score derived from the 6-MDG panel in COAD patients (Table 2). Consistent with the pathway analysis, the results showed that COAD patients in high-risk group were more likely to have remote metastasis (c 2 = 6.465, P = 0.011;  Figure 5B). We also evaluated the risk score as a continuous variable and patients with metastasis tended to have higher risk score than those without metastasis (P = 0.0036; Figure 5C). Collectively, above results suggested that the 6-MDG panel is associated with cancer-related signaling pathways and acts as an indicator of tumor metastasis.
CD40 is universally hypermethylated in colon cancer tissues CD40 is a member of the tumor necrosis factor family and a new immunemodulating target with great potential in cancer treatment [34]. Regulation of CD40 expression by DNA methylation has not been reported in current literatures, and thus deserves further investigation. We rst inquired the expression of CD40 in colon cancer patients from the TCGA and GSE8671 dataset. The transcriptional expression of CD40 was signi cantly downregulated in colon cancer tissues compared with adjacent colon mucosa in both datasets ( Figure 6A). Next, we analyzed the overall methylation level of CD40 in TCGA and GSE42725 dataset. The results showed that CD40 was hypermethylated in colon cancer tissues compared with adjacent or/and healthy colon mucosa in these two datasets ( Figure 6B). We also observed a negative correlation between the mRNA expression and the overall DNA methylation level in COAD patients in TCGA (Pearson r = -0.511, P < 0.001; Figure S1B).
Additionally, we analyzed the CpG site-speci c methylation status of all the 15 CpG sites of CD40 assessed by 450K array. The CpG sites located in or near the CpG island (Island, N shore and S shore) covering TSS of CD40 (12 CpG sites) were signi cantly hypermethylated in colon cancer tissues compared with the adjacent mucosa ( Figure 6C), and their methylation levels were negatively correlated with CD40 expression, except for cg24575067 ( Figure 6D). Interestingly, we observed a similar CpG sitespeci c methylation pattern of CD40 in the GSE42725 dataset ( Figure 6E). Above results suggested that CD40 is universally hypermethylated in colon cancer tissues which may contribute to its transcriptional silence.

The expression of CD40 is regulated by promoter methylation in colon cancer cell lines
To better understand the regulation of CD40 expression in colon cancer, the levels of CD40 expression were detected in a panel of colon cancer cell lines. CD40 mRNA expression was silenced in 3 of 6 colon cancer cell lines ( Figure 7A). We con rmed the expression of CD40 using western blot and ow cytometry analysis on the total and membrane protein level in these six cell lines ( Figure 7B & 7C). Next, MSP and BSSQ were employed to interrogate the methylation status of CD40 promoter region in these cell lines.
The CpG island situated in CD40 gene promoter region and the designed MSP and BSSQ primers were shown in Figure 7D. MSP analysis revealed CD40 promoter methylation in the three cell lines (SW480, SW620 and DLD1) with silenced CD40 expression ( Figure 7E). BSSQ analysis of 19 CpG sites around the TSS showed dense methylation in the CD40 silenced cell lines examined (SW480 and DLD1), but not in the CD40 expressing HCT116 cells ( Figure 7F). To test whether promoter methylation directly contributes to transcriptional silencing of CD40, these six colon cancer cell lines were treated with 5-Aza, a demethylation reagent. Restoration of CD40 expression was induced by 5-Aza in the three colon cancer cell lines with silenced CD40 expression ( Figure 7G). These results indicated that CD40 is silenced in colon cancer cell lines by promoter region hypermethylation.

Discussion
Aberrant epigenetic changes are crucial for carcinogenesis and subsequent tumor progression [35]. Of the various epigenetic modi cations, DNA methylation acts as the key element and is classically responsible for transcriptional silence via hypermethylation of CpG islands located in promoter regions of tumor suppressor genes [36]. Additionally, DNA hypomethylation has been implicated in the regulation of genome rearrangement and chromosomal instability which may also contribute to carcinogenesis [36]. A plethora of gene-speci c studies have demonstrated that hyper or hypomethylation of a gene can be utilized as epigenetic biomarker to predict behavior and prognosis of CRC [37]. There is also evidence of an association between aberrant methylation of multiple genes and increased CRC aggressiveness [38].
For instance, Weisenberger and colleagues later introduced the prevailing method used to identify CIMP in CRC, which is based on the methylation status of ve genes, CACNA1G, IGF2, NEUROG1, RUNX3, and SOCS126 [36]. CIMP-positive tumors exhibit unique clinicopathological and molecular features, correlating with an overall unfavorable prognosis [39].
The advance and prevalence of high through-put DNA methylation arrays have con rmed prior identi ed epigenetics changes, and on the other hand have uncovered a plenty of new alterations, creating an opportunity to discover novel cancer-related epigenetic biomarkers. By applying an integrative analysis tool to TCGA project, we sought to explore key genes regulated by DNA methylation and the potential of them to become prognostic biomarkers of colon cancer. A model-based algorithm (MethylMix) was employed to identify MDGs, based on which we developed a prognostic MDG panel consisting of six gene members (TMEM88, HOXB2, FGD1, TOGARAM1, ARHGDIB and CD40) in the training set (50% of TCGA cohort). The 6-MDG panel exhibited favorable performance on survival prediction, which was validated in the testing (the remaining 50% of TCGA cohort) and the entire set. Multivariate Cox regression and data strati cation analysis demonstrated that the prognostic value of the risk score derived from the 6-MDG panel is independent of TNM stage. Furthermore, it was fascinating to nd that the combination of age, TNM stage and the 6-MDG panel, three independently prognostic factors revealed by the multivariate Cox regression analysis might help improve prognosis prediction in the ROC curve analysis.
These six prognostic MDGs are differential methylated between tumor and adjacent tissues and their levels of DNA methylation and mRNA expression are inversely correlated. Such differentiation and relativity signify their potential roles in colon cancer. Our pathway analysis by GSEA provided evidence that the six MDGs are involved in cancer-related biological processes, including invasion and metastasis, angiogenesis and tumor immune microenvironment, among others. Up-regulation of HOXB2 was found to be an adverse prognostic indicator for stage I lung adenocarcinomas, promoting invasion by transcriptional regulation of metastasis-related genes [40,41]. In our study, expression of HOXB2 is negatively correlated with DNA methylation in colon cancer and hypermethylation of HOXB2 is associated with prolonged overall survival. However, Marsit et al. revealed a distinct role of HOXB2 in bladder cancer that increased promoter methylation of HOXB2 is signi cantly and independently associated with higher degree of cancer aggressiveness [42]. Thus, further studies are greatly needed to clarify the functional role of HOXB2 in cancer. ARHGDIB has been identi ed as a regulator of tumor metastasis but its role in cancer remains controversial [43]. ARHGDIB can function as a positive (in ovary [44], breast [45], colorectal [43] and gastric cancer [46]) and negative (in Hodgkin's lymphoma [47], bladder [48,49] and lung cancer [50]) regulator of cancer progression. In our study, hypermethylation of ARHGDIB is a favorable prognostic factor, in agreement with previous ndings in colon cancer. TMEM88 is a transmembrane protein and functions as an inhibitor of Wnt signaling [51]. TMEM88 promoter hypomethylation is associated with platinum resistance in ovarian cancer [52]. Our study demonstrated that TMEM88 is hypomethylated in the high-risk group with shorter overall survival in colon cancer. So we hypothesis that TMEM88 may modulate the prognosis of colon cancer via altering sensitivity of cancer cell to chemodrug mediated by promoter methylation, and further investigations are needed to con rm that. Ayala et al. revealed a central role for FGD1 in regulating focal degradation of the extracellular matrix at invadopodia [53]. They also demonstrated that FGD1 is highly expressed in prostate and breast cancer, which might lead to aberrant growth, invasiveness, and/or metastatic potential [53]. TOGARAM1 encodes a TOG domain array-containing protein that regulates cilia microtubule structure [54]. Regulation of TOGARAM1 expression by DNA methylation and its role in cancer has not been reported. CD40 belongs to the family of TNF receptors and is crucial in mediating a variety of immune and in ammatory responses [55]. CD40 ligation provides essential activation signals for immune cells [55], while CD40 possesses controversial functions in promoting or inhibiting tumorigenesis and progression via regulation of TNFα-induced apoptosis [56], angiogenesis [57], tumor cell migration and invasion [58] and chemoresistance [59]. Agonist CD40 antibodies have been developed and tested in clinical trials where impressive results have been noted, especially in pancreatic cancers [60]. We con rmed that the expression of CD40 is regulated by promoter region hypermethylation in colon cancer tissues and cell lines, which may bring new sight into the combination of epigenetic therapy and CD40-stimulating immunotherapy. Further investigations are warranted to clarify the underlying mechanisms that potentiate above methylation-driven genes as DNA methylation biomarkers for colon cancer.
Several limitations should be acknowledged for our study. First, no external validation was performed. We attempted to search for colon cancer cohorts with both methylation and follow-up data in multiple cancer databases, including GEO and the International Cancer Genome Consortium (ICGC) project, among others, but no available dataset was found. However, considering the su cient number of patients included in the process of model construction and internal validation, the identi ed prognostic signature is unlikely to be a random noise of the methylome. Second, limited experimental information regarding to the regulatory mechanisms of all the six prognostic MDGs in the methylation signature was presented. Third, the speci c functional role of these prognostic MDGs in colon cancer was left unexplored.

Conclusion
In summary, we identi ed a MDG-related signature which acts as an independent prognostic factor in colon cancer and its combination with clinical characters including age and TNM stage could help improve prognosis prediction. We also con rmed that CD40, a member of the prognostic 6-MDG panel, is regulated by DNA methylation in colon cancer samples and cell lines. More efforts are necessary to have a complete picture of the regulatory mechanisms and functional roles of all the six MDGs in colon cancer. Also clinical investigations in additional colon cancer patient cohorts are warranted to validate our ndings and to elaborate its potential utility.       The