Screening MDGs in COAD
We first prepared relevant expression and methylation data for the same patients and three data matrixes were acquired: a gene expression profile of 308 tumor tissues and two methylation profiles of 38 adjacent and 308 tumor tissues, respectively. These profiles were used as input data for MethylMix R package where methylation differential analysis and correlation analysis between DNA methylation and gene expression were conducted. According to the screen criteria, a total of 299 methylation-driven genes were identified (Table S2). The methylation profile of the most significant 30 hypo- and hyper-methylated MDGs (ranked by b value difference between tumor and adjacent tissues) was shown in Figure 1A. The correlations between DNA methylation and gene expression and methylation mixture models of the top 3 MDGs were shown in Figure 1B & 1C, respectively.
Identification of a prognostic 6-MDG panel in the training set
A total of 281 COAD patients with both methylation and adequate follow up (survival time > 30 days) data were included in the survival analysis after preprocessing of the methylation and clinical data. The clinical information of these 281 COAD patients were summarized in Table S3. They were randomly split into the training set (n = 141) and the testing set (n = 140). To identify certain prognostic MDGs, univariate Cox regression analysis was performed in the training set and 12 prognosis related MDGs (P < 0.05; Table S4) were chosen for subsequent LASSO estimation. Ten MDGs survived the LASSO regularization (Figure 2A) after penalization of the multicollinearity effect and were further subjected to multivariable Cox regression analysis to construct a best fitting prognostic model. AIC was used to indicate model fitness. Finally, a prognostic DNA methylation gene panel consisting of six MDGs (TMEM88, HOXB2, FGD1, TOGARAM1, ARHGDIB and CD40) was identified. The detailed information of the six MDGs was summarized in Table 1. The methylation profile, correlations between gene expression and DNA methylation and the methylation mixture models of the six MDGs were shown in Figure S1. The prognostic 6-MDG panel included one gene member (ARHGDIB) with statistically non-significant P value (P = 0.071; Table 1), but the 6-MDG panel acquired the lowest AIC representing the best model fitness and the overall effect was significant (AIC = 202.86, global P [Log-rank] < 0.001).
Next, a risk score model for overall survival prediction was created based on the methylation b values of these six MDGs, as follows: risk score = (-6.150* methylation b value of TMEM88) + (-3.593* methylation b value of HOXB2) + (-7.287* methylation b value of FGD1) + (-7.861* methylation b value of TOGARAM1) + (-3.622* methylation b value of ARHGDIB) + (-4.288* methylation b value of CD40). Then we calculated the risk score for each COAD patient, and classified them into high or low risk subgroup using the median risk score of the patients in the training set as the cutoff value.
Kaplan-Meier survival curve analysis of the training set showed that COAD patients in the high-risk group had significantly shorter median OS than those in the low-risk group (Log-rank P < 0.001; Figure 2B). We also profiled the distribution of risk score, survival status and methylation b value in the training set (Figure 2C-E). The risk scores of the patients in the training set ranged from -17.883 to -9.677 with the median risk score of -13.807 (Figure 2C). Moreover, there were more patients alive in the low-risk group than those in the high-risk group (c2 = 13.45, P = 0.0002; Figure 2D). Interestingly, methylation levels of all the six MDGs were higher in low-risk patients than those in the high-risk patients (Figure 2E), indicating that hypermethylation of the 6-MDG panel is a favorable prognostic factor of COAD patients.
The 6-MDG panel is predictive of survival in the testing and entire set
To further test the significance of the prognostic 6-MDG panel in COAD patients, the testing and entire set were used as validation groups. Using the same cutoff value of risk score obtained from the training set, COAD patients in the testing set were divided into high-risk group (n = 75) and low-risk group (n = 65). The result of Kaplan-Meier analysis demonstrated that COAD patients in the high-risk group showed worse overall survival than those in the low-risk group (Log-rank P = 0.0137; Figure 3A), and there were more patients alive in the low-risk group than those in the high-risk group (c2 = 4.514, P = 0.0336; Figure 3B). We also performed the same analysis on the entire set (training set plus testing set, n = 281) and the results were consistent with those in the training and testing set (Figure 3C & 3D). Above results suggested that the 6-MDG panel can predict survival in both training and entire set.
The prognostic value of the 6-MDG panel is independent of TNM stage
TNM classification is widely used, clinically useful and is highly associated with 5-year overall survival in colon cancer[32]. Thus, we set to clarify whether the prognostic value of the 6-MDG panel is independent of TNM stage. For this, we performed multivariable Cox regression and stratification analysis in the entire set. The multivariable Cox regression analysis was conducted on 271 patients, with age, gender, TNM stage and risk score as covariates. Ten cases of patients were excluded because of missing information on TNM stage. The results showed that age, TNM stage and the risk score remained independent prognostic factors in the multivariable Cox regression analysis (Figure 4A). Data stratification analysis was then performed where these patients were stratified into four subgroups (Stage I, II, III, and IV). The results of stratification analysis showed that the prognostic 6-MDG panel could identify patients with different overall survival in TNM stage II (Log-rank P = 0.0450) and IV (Log-rank P = 0.0160) subgroups (Figure 4B), while was insufficient to clarify the patients in TNM stage I (Log-rank P = 0.0750) and TNM stage III (Log-rank P = 0.0975) with significantly disparate survival (Figure 4B). This might be attributed to the small sample size or some truncated data. Thus, we combined low TNM stage (Stage I plus II) and high TNM stage (Stage III plus IV), and the risk score could significantly identify patients with different prognoses in these two subgroups (Log-rank P = 0.0083 and 0.0006, respectively; Figure 4C). Above results suggested that prognostic value of the 6-MDG panel is independent of TNM stage.
Moreover, we performed ROC analysis to compare the sensitivity and specificity of overall survival prediction between the prognostic factors including age, TNM stage, the risk score derived from the 6-MDG panel and combination of these three factors. As shown in Figure 4D, there was no significant difference when the AUCs of the three prognostic factors (age, TNM stage and the risk score) alone were pairwise compared (all P > 0.05). However, when these three prognostic were combined, the AUC was significantly greater than that of each prognostic factor alone (all P < 0.05). These results indicated that the combination of the three prognostic factors (age, TNM stage and the risk score) may help improve survival prediction in patients with COAD.
Assessment of biological pathways associated with the 6-MDG panel
We performed GSEA to identify relevant pathways the 6-MDG might be involved in using the risk score for phenotype classification. Gene sets significantly enriched (FDR < 0.01) in the high-risk phenotype were shown in Figure 5A. The high risk score was positively associated with up-regulation of several cancer-related pathways, among others such as invasion and metastasis, angiogenesis and tumor immune microenvironment. For instance, vascular endothelial growth factor (VEGF), a key regulator of the growth and maintenance of blood vessels, can directly modulate the vascular wall by loosening cell-cell contacts and increasing the leakiness of blood vessels which favors tumor cell dissemination[33].
Next, we analyzed the relativity between the clinicopathological features and the risk score derived from the 6-MDG panel in COAD patients (Table 2). Consistent with the pathway analysis, the results showed that COAD patients in high-risk group were more likely to have remote metastasis (c2 = 6.465, P = 0.011; Table 2 & Figure 5B). We also evaluated the risk score as a continuous variable and patients with metastasis tended to have higher risk score than those without metastasis (P = 0.0036; Figure 5C). Collectively, above results suggested that the 6-MDG panel is associated with cancer-related signaling pathways and acts as an indicator of tumor metastasis.
CD40 is universally hypermethylated in colon cancer tissues
CD40 is a member of the tumor necrosis factor family and a new immunemodulating target with great potential in cancer treatment[34]. Regulation of CD40 expression by DNA methylation has not been reported in current literatures, and thus deserves further investigation. We first inquired the expression of CD40 in colon cancer patients from the TCGA and GSE8671 dataset. The transcriptional expression of CD40 was significantly downregulated in colon cancer tissues compared with adjacent colon mucosa in both datasets (Figure 6A). Next, we analyzed the overall methylation level of CD40 in TCGA and GSE42725 dataset. The results showed that CD40 was hypermethylated in colon cancer tissues compared with adjacent or/and healthy colon mucosa in these two datasets (Figure 6B). We also observed a negative correlation between the mRNA expression and the overall DNA methylation level in COAD patients in TCGA (Pearson r = -0.511, P < 0.001; Figure S1B).
Additionally, we analyzed the CpG site-specific methylation status of all the 15 CpG sites of CD40 assessed by 450K array. The CpG sites located in or near the CpG island (Island, N shore and S shore) covering TSS of CD40 (12 CpG sites) were significantly hypermethylated in colon cancer tissues compared with the adjacent mucosa (Figure 6C), and their methylation levels were negatively correlated with CD40 expression, except for cg24575067 (Figure 6D). Interestingly, we observed a similar CpG site-specific methylation pattern of CD40 in the GSE42725 dataset (Figure 6E). Above results suggested that CD40 is universally hypermethylated in colon cancer tissues which may contribute to its transcriptional silence.
The expression of CD40 is regulated by promoter methylation in colon cancer cell lines
To better understand the regulation of CD40 expression in colon cancer, the levels of CD40 expression were detected in a panel of colon cancer cell lines. CD40 mRNA expression was silenced in 3 of 6 colon cancer cell lines (Figure 7A). We confirmed the expression of CD40 using western blot and flow cytometry analysis on the total and membrane protein level in these six cell lines (Figure 7B & 7C). Next, MSP and BSSQ were employed to interrogate the methylation status of CD40 promoter region in these cell lines. The CpG island situated in CD40 gene promoter region and the designed MSP and BSSQ primers were shown in Figure 7D. MSP analysis revealed CD40 promoter methylation in the three cell lines (SW480, SW620 and DLD1) with silenced CD40 expression (Figure 7E). BSSQ analysis of 19 CpG sites around the TSS showed dense methylation in the CD40 silenced cell lines examined (SW480 and DLD1), but not in the CD40 expressing HCT116 cells (Figure 7F). To test whether promoter methylation directly contributes to transcriptional silencing of CD40, these six colon cancer cell lines were treated with 5-Aza, a demethylation reagent. Restoration of CD40 expression was induced by 5-Aza in the three colon cancer cell lines with silenced CD40 expression (Figure 7G). These results indicated that CD40 is silenced in colon cancer cell lines by promoter region hypermethylation.