Combined Analysis of the Aberrant Epigenetic Alteration of Colorectal Cancer


Abstract

Background
Colorectal cancer (CRC) is the third most common cancer and is mainly classified as colon adenocarcinoma (COAD) or rectum adenocarcinoma (READ). Accumulating evidence indicates that aberrant methylation is involved in multiple tumors.

Methods
To investigate the effect of abnormal methylation in COAD, READ and CRC, we downloaded methylation and mRNA data of COAD and READ from The Cancer Genome Atlas (TCGA) database. We then used DESeq2, ChAMP, DAVID 6.8, Cytoscape 3.7.2 and correlation analysis to identify potential biomarkers.

Results
We obtained 12 potential biomarkers (APBB1, CDC42SE2, EIF4E3, FBXO17, FES, GNPNAT1, HSPA1A, OSBPL3, RORC, SALL1, SPEG, and TCF7L1) directly associated with the pathologic TNM of COAD that may be regulated by 28 differentially methylated sites; 2 potential biomarkers (AQP1 and HOXA3) directly associated with the pathologic NM of READ that may be regulated by 8 differentially methylated sites; and 15 potential biomarkers (ADAMTSL3, ANXA9, APBB1, AQP1, C2CD4A, CLIP3, DNAJC15, EIF4E3, FAM160A1, GNG4, HLX, HSPA1A, LAYN, NR3C2, and SYT1) directly associated with the pathologic TNM of CRC that may be regulated by 29 differentially methylated sites. Furthermore, we constructed the network of differential methylation and differentially expressed genes. In addition, we identified 1 methylation site in COAD (cg18149207), 2 in READ (cg02000808 and cg21134232) and 2 in CRC (cg04408595 and cg19413725) associated with overall survival.

Conclusion
Our results provide an analysis based on theoretical knowledge and clinical outcomes, but more research is needed to confirm our findings.



Background
Colorectal cancer (CRC) is the second most common cancer diagnosed in females and the third in males, and accounts for approximately 10% of all annually diagnosed cancers and cancer-related deaths [1]. It causes approximately 900,000 deaths annually [1]. Over the past decades, the mortality associated with CRC has declined progressively, which can be attributed to cancer screening programs, improved surgical techniques, and the availability of more effective systemic therapies for early-stage and advanced-stage disease [2,3]. Biomarkers, which mostly refer to DNA, mRNA, microRNA (miRNA), epigenetic changes or antibodies, are playing an increasingly important role in the screening and treatment of CRC [4].
Epigenetics was first described by Conrad H. Waddington in 1942 [5]. It is now clear that multiple processes of CRC stem cells are the result of the progressive accumulation of genetic and epigenetic alterations, which can inactivate tumor-suppressor genes and activate oncogenes, leading to tumorigenesis [6][7][8]. DNA methylation is a major epigenetic modification. Abnormal methylation can affect the functions of crucial genes by altering their expression during tumorigenesis. Several studies have demonstrated that aberrant DNA methylation is an early event, and new efforts are focused on finding biomarkers for early disease detection, prognostication, and treatment selection [9]. In this study, we used a systematic analysis to identify a group of candidate prognostic genes that may be regulated by DNA methylation. Our study may lay the groundwork for further elucidation of the CRC mechanism and for screening of diagnostic biomarkers.

2.1 Data source and data processing
In this study, mRNA expression data, DNA methylation data and the corresponding clinical information for COAD and READ were obtained from the TCGA database (https://portal.gdc.cancer.gov/). For COAD, 497 samples (41 control vs 456 tumor) were included in the gene expression profiles, while 334 samples (38 control vs 296 tumor) were included in the methylation analysis. For READ, 176 samples (10 control vs 166 tumor) were included in the gene expression profiles, while 105 samples (7 control vs 98 tumor) were included in the methylation analysis. The mRNA expression profiles were analyzed with the DESeq2 package in R [10], using padj < 0.05, |logFC| ≥ 0.5 and baseMean > 50 as the selection criteria for differentially expressed genes (DEGs). Methylation levels were analyzed with the ChAMP package in R [11], using padj < 0.05 and |logFC| ≥ 0.2 as the cutoff for selecting differentially methylated probes (DMPs).
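The three expression cutoffs above act as a joint filter on each gene's test result. A minimal Python sketch of that filter follows; it does not call DESeq2 itself, and the `GeneResult` record, the `is_differential` helper and the gene names are hypothetical, used only to illustrate how padj, |logFC| and baseMean combine.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    base_mean: float  # mean normalized count across samples
    log_fc: float     # log fold change, tumor vs control
    padj: float       # multiple-testing adjusted p value


def is_differential(r, padj_cut=0.05, lfc_cut=0.5, base_mean_cut=50.0):
    """Apply the three cutoffs stated in the text jointly."""
    return (
        r.padj < padj_cut
        and abs(r.log_fc) >= lfc_cut
        and r.base_mean > base_mean_cut
    )


# Made-up example rows; none of these values come from the study.
results = [
    GeneResult("GENE_A", 820.0, 1.4, 0.001),   # passes all three cutoffs
    GeneResult("GENE_B", 35.0, 2.1, 0.0004),   # fails baseMean > 50
    GeneResult("GENE_C", 410.0, -0.3, 0.01),   # fails |logFC| >= 0.5
    GeneResult("GENE_D", 150.0, -0.9, 0.2),    # fails padj < 0.05
    GeneResult("GENE_E", 95.0, -0.5, 0.049),   # passes, on the logFC boundary
]

degs = [r.gene for r in results if is_differential(r)]
print(degs)
```

The same pattern would apply to the methylation probes with the 0.2 cutoff and no baseMean condition.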

2.2 Correlation analysis
In this study, all genes overlapping between the differentially expressed genes and the differentially methylated genes (DMGs) were used for the correlation analysis. The criteria for the correlation analysis were set as p < 0.05 and r < -0.3.
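As an illustration of this kind of screen, the sketch below keeps a probe-gene pair only when methylation and expression are negatively correlated beyond the stated thresholds. The text does not say which correlation coefficient or significance test was used, so the Pearson coefficient and the permutation p value here are assumptions, and all data values are made up.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for the correlation (an assumed test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def is_negatively_correlated(beta, expr, r_cut=-0.3, p_cut=0.05):
    """Keep a probe-gene pair only if r < r_cut and p < p_cut."""
    r = pearson_r(beta, expr)
    p = permutation_p(beta, expr)
    return r < r_cut and p < p_cut


# Made-up methylation beta values and expression values per sample.
beta_1 = [0.10, 0.15, 0.22, 0.30, 0.41, 0.48, 0.55, 0.63, 0.70, 0.82]
expr_1 = [9.1, 8.7, 8.9, 7.8, 7.1, 6.9, 6.0, 5.6, 5.2, 4.1]
beta_2 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
expr_2 = [5, 7, 5, 7, 5, 7, 5, 7]

keep_1 = is_negatively_correlated(beta_1, expr_1)  # strong negative trend
keep_2 = is_negatively_correlated(beta_2, expr_2)  # weak positive trend
print(keep_1, keep_2)
```

A parametric test on the correlation coefficient would be the more common choice in practice; the permutation version is used here only to keep the example free of external dependencies.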

2.3 Survival Analysis
Genes whose expression changed significantly in tumor samples were used for survival analysis with the RegParallel and survival packages via Cox proportional hazards regression. All differentially expressed mRNAs were divided into low-expression and high-expression groups according to the median value. Genes significantly correlated with survival (p value < 0.05) were selected and used for the next analysis.
For pathologic TNM, the samples were divided into two groups according to tumor pathology. The survival analyses were then performed with IBM SPSS Statistics 22, with log-rank P < 0.05 indicating a significant correlation with overall survival outcomes.
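The median split and the two-group log-rank comparison described above can be sketched in a few lines of Python. This is not the RegParallel/Cox or SPSS workflow used in the study: it shows only a median split followed by a standard two-group log-rank test, and the expression values, follow-up times and event flags are made up.

```python
from math import erfc, sqrt
from statistics import median


def median_split(values):
    """Label each sample 'high' or 'low' relative to the median."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


def logrank_test(times, events, groups, group_a="high"):
    """Two-group log-rank test; returns (chi-square, p value)."""
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Samples still at risk just before time t.
        n = sum(1 for ti in times if ti >= t)
        n_a = sum(1 for ti, g in zip(times, groups) if ti >= t and g == group_a)
        # Events occurring exactly at time t.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d_a = sum(
            1 for ti, e, g in zip(times, events, groups)
            if ti == t and e and g == group_a
        )
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Made-up data: expression, follow-up time in months, event flag (1 = death).
expression = [2.1, 9.5, 3.3, 8.7, 1.8, 7.9, 2.9, 9.1, 3.8, 8.2]
times = [60, 5, 55, 8, 70, 12, 48, 3, 65, 10]
events = [0, 1, 1, 1, 0, 1, 0, 1, 0, 1]

groups = median_split(expression)
chi2, p_value = logrank_test(times, events, groups)
print(groups.count("high"), round(chi2, 2), p_value < 0.05)
```

In this toy data set the high-expression group has the early deaths, so the test rejects equal survival; with real data the direction and strength of the effect would of course vary by gene.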

2.4 Statistical analysis
A repeated-measures ANOVA followed by Bonferroni post hoc tests or an unpaired two-tailed Student's t test was used as indicated. All statistical analyses were performed using Prism 6.01 (GraphPad Software, San Diego, CA).

3.1 Demography
After excluding patients without methylation or clinical information, 294 COAD and 97 READ patients were included in the study. The clinical and pathological information of these patients is displayed in Tables 1 and 2. In the COAD cohort, 3.40% of patients were 30-39 years old, 11.22% were 40-49 years old, 18.71% were 50-59 years old, 26.19% were 60-69 years old, 25.85% were 70-79 years old, and 14.63% were above 80 years old. There were 44 patients with pathologic TNM stage I, 114 with stage II, 85 with stage III, 41 with stage IV, and 10 with an unknown TNM stage (Table 1).
In the READ cohort, 3.09% of patients were 30-39 years old, 14.43% were 40-49 years old, 22.68% were 50-59 years old, 23.71% were 60-69 years old, 31.96% were 70-79 years old, and 4.12% were above 80 years old. There were 11 patients with pathologic TNM stage I, 29 with stage II, 35 with stage III, 13 with stage IV, and 9 with an unknown TNM stage (Table 2).
(Fig 1a-b). Combining all methylation samples from COAD and READ, 40936 DMPs (21998 hypermethylated, 18938 hypomethylated) were identified (Fig 1c). Considering the CpG content and the neighboring context, the hypomethylation rate was highest in islands (71.87%, 67.63%, 69.09%), while the hypermethylation rate was highest in open sea regions (70.31%, 65.31%, 69.65%) in COAD, READ, and CRC respectively (Fig 1d-f). Examining the sites surrounding genes revealed that the hypomethylation rate was highest in gene bodies (26.31%, 27.65%, 26.48%), while the hypermethylation rate was highest in intergenic regions (IGR) (39.71%, 38.44%, 39.58%) in COAD, READ, and CRC respectively (Fig 1g-i).
By cross analysis of DEGs and DMGs, we found 2080, 2064 and 2118 overlapping genes in COAD, READ and CRC respectively, with 9221, 7647 and 8821 corresponding DMPs (Fig 2d-f). By cross analysis of DEGs and DMGs located in TSS1500, TSS200 and 5′UTR, we found that there were 1131, and 3392 DMPs (Fig 2g-i).
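The cross analysis amounts to a set intersection between DEG symbols and the genes annotated to DMPs, optionally restricted to promoter-associated features. The Python sketch below illustrates the idea under assumed inputs: the record layout, probe names and gene names are hypothetical, and only the three feature labels come from the text.

```python
# Promoter-associated features named in the text.
PROMOTER_FEATURES = {"TSS1500", "TSS200", "5'UTR"}

# Hypothetical inputs: DEG symbols and probe-level DMP annotations.
degs = {"GENE_A", "GENE_B", "GENE_C"}
dmps = [
    {"probe": "probe_1", "gene": "GENE_A", "feature": "TSS200"},
    {"probe": "probe_2", "gene": "GENE_A", "feature": "Body"},
    {"probe": "probe_3", "gene": "GENE_B", "feature": "5'UTR"},
    {"probe": "probe_4", "gene": "GENE_D", "feature": "TSS1500"},
    {"probe": "probe_5", "gene": None, "feature": "IGR"},  # intergenic probe
]

# Differentially methylated genes: every gene with at least one DMP.
dmgs = {d["gene"] for d in dmps if d["gene"] is not None}

# Genes that are both differentially expressed and differentially methylated.
overlap_genes = degs & dmgs

# DMPs that map to those overlapping genes.
overlap_dmps = [d["probe"] for d in dmps if d["gene"] in overlap_genes]

# The same DMPs, restricted to promoter-associated features.
promoter_dmps = [
    d["probe"]
    for d in dmps
    if d["gene"] in overlap_genes and d["feature"] in PROMOTER_FEATURES
]

print(sorted(overlap_genes), overlap_dmps, promoter_dmps)
```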

Correlation Analysis
Previous studies indicated that methylation and gene expression are negatively correlated. To further narrow down the target genes that are potentially regulated by methylation in CRC, we  (Fig 5a-d).
By retrospective examination, we found that the 12 prognostic genes in COAD were negatively correlated with 27 DMPs, the 2 genes in READ were negatively correlated with 8 DMPs, and the 15 genes in CRC were negatively correlated with 29 DMPs, as shown in Fig 7a. Survival analysis also indicated that patients with low methylation of cg18149207 (hypomethylated gene: RORC) in COAD, cg04408595 (hypomethylated gene: EIF4E3) in CRC and cg19413725 (hypomethylated gene: FAM160A1) in CRC exhibited better OS (Fig 7b, e-f), whereas patients with high methylation of cg02000808 (hypomethylated gene: HOXA3) and cg21134232 (hypomethylated gene: HOXA3) in READ exhibited better OS (Fig 7c-d).

Discussion
Colorectal cancer is the third most common cancer and the second leading cause of cancer mortality [1]. Epidemiological studies have consistently shown strong associations of gender and aging with disease incidence [1]. Most CRC cases arise from a polyp, which begins as a neoplastic precursor lesion [12,13]. It is now clear that CRC tumorigenesis is the consequence of the progressive accumulation of genetic and epigenetic alterations, which causes dysregulation of homeostatic functions and leads to neoplastic transformation [6][7][8]. Until now, the main treatment for CRC has been surgery, guided mainly by CT colonography and histological diagnosis [14]. It is therefore very important to find tumor-based markers for screening strategies and for the development of more effective treatments for CRC.
Epigenetic alteration plays a vital role in carcinogenesis and tumor progression. Abnormal methylation can affect the functions of crucial genes by altering their expression. A growing number of studies have demonstrated that DNA methylation is an early phenomenon, and new efforts are focused on finding tumor biomarkers for early disease detection, prognostication, and treatment [15][16][17].
In this work, we integrated DNA methylation and gene expression data to screen for DNA methylation-driven tumor genes, and survival analysis was then used to determine which of these prognostic genes are associated with TNM staging. Survival analysis indicated that 24, 16 and 29 genes may be prognostic genes for COAD, READ and CRC respectively. There were 11 overlapping genes (HSPA1A, TMEM106A, UNC5C, CBFA2T3, TSPAN11, REP15, APBB1, NPTX2, EIF4E3, HHIP, and ZNF132) between COAD and CRC. Similarly, there were 3 overlapping genes (EPHX4, C2CD4A, AQP1) between READ and CRC.
Previous studies have reported that HSPA1A [18,19], UNC5C [20,21], CBFA2T3 [22,23], NPTX2 [24,25], EPHX4 [26], and AQP1 [27][28][29] are associated with CRC, which supports the feasibility of our present results. Although the other genes have not been reported to be associated with COAD, READ or CRC, their associations with other cancers have been reported, such as APBB1, HHIP, and HOXA3 with lung cancer [30][31][32][33]; TCF7L1, HOXB3, and ABCC2 with pancreatic cancer [34][35][36][37]; SALL1 with neck cancer [38,39]; and C2CD4A with breast cancer [40]. These results suggest both their relevance and their role as prognostic genes for the corresponding cancers. Based on the results of our study and the previously reported relationships between these genes and other kinds of cancer, we suggest that the genes mentioned above may be used as prognostic genes for colorectal cancer.
TNM staging is closely associated with overall survival. We also evaluated the relationship between the prognostic genes and TNM staging. There were 3 overlapping genes (HSPA1A, APBB1, and EIF4E3) between COAD and CRC, and 1 overlapping gene (AQP1) between READ and CRC. This result suggests that these genes may serve not only as prognostic genes but also as therapeutic targets for preventing tumor metastasis.

Conclusions
Integrated analysis of abnormal methylation alterations in CRC indicated that DMGs may be involved in the occurrence of CRC. Moreover, the present study could help clinicians to better understand the function of DMGs in CRC. Our study may serve as foundational work for further elucidation of the mechanisms of CRC and for identification of its prognostic genes. However, it is worth emphasizing that the methylation-mRNA regulatory network is particularly complex, and that the number of case and control samples used in the study was not sufficient. We provide only a direction for analysis based on theoretical knowledge and clinical outcomes; more scientific research is needed to confirm our findings.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The data that support the findings of this study are openly available in TCGA at https://portal.gdc.cancer.gov/

40. Cui Q, Tang J, Zhang D, Kong D, Liao X, Ren J, Gong Y, Xie C, Wu G: A prognostic eight-gene expression signature for patients with breast cancer receiving adjuvant chemotherapy. J Cell Biochem 2019.

Correlation analyses between gene expression and hypermethylated/hypomethylated sites in COAD, READ, and CRC. **** p < 0.0001.

Figure 6
Integrative analysis of prognostic genes with TNM staging in COAD, READ, and CRC. a-i, overall survival analysis of TNM staging in COAD (a-c), READ (d-f), and CRC (g-i). Associated analysis of prognostic