Dissecting drug and prognostic lncRNA signature mediated by DNA methylation and transcription factor in colon cancer


Background: Colon cancer remains one of the most commonly diagnosed malignancies and a leading cause of cancer death worldwide. Apart from lifestyle, genetic and epigenetic changes are key factors that influence the risk of colon cancer. However, the impact of epigenetic alterations in non-coding RNAs and their consequences for colon cancer have not been fully characterized.
Methods: We detected differentially methylated sites (DMSs) in lncRNA promoters and identified lncRNA expression quantitative trait methylations (lncQTMs) by association testing. To investigate TF binding affected by DNA methylation, we characterized the occurrence of known TF motifs, collected from the MEME Suite, among the DMSs. We then combined methylome and transcriptome data to construct TF-methylation-lncRNA relationships. To study the role of lncRNAs in drug response, we used pharmacological and lncRNA profiles from CCLE and predicted drug response from lncRNA expression levels. We also used the TF-methylation-lncRNA relationships to stratify patient survival with a risk model.
Results: DNA methylation displays a global hyper-methylation character in lncRNA promoters, and methylation levels tend to be negatively correlated with expression of the corresponding lncRNAs. Negative lncQTMs located near the TSS show more significant and stronger correlations with the corresponding lncRNAs. Some lncRNAs mediated by the interplay between DNA methylation and TFs are established markers for colon cancer. Notably, the lncRNAs CAHM, RP11-834C11.4 and LINC00460 are good predictors of response to five drugs (17-AAG, Sorafenib, TKI258, RAF265, Topotecan) in colon cancer. We also found that the HES1_cg24685006_RP4-728D4.2 and SREBF1_cg05372727_LINC00460 relationships are prognostic signatures for colon cancer.
Conclusions: These findings suggest that lncRNAs mediated by the interplay between DNA methylation and TFs are promising predictors of drug response, and that combined TF-methylation-lncRNA relationships can serve as prognostic signatures for colon cancer.

from normal mucosa [3]. Early screening of colonic lesions benefits colon cancer diagnosis and can reduce mortality [4]. DNA methylation, a stable epigenetic mark, has been shown to influence a wide range of biological mechanisms [5]. Studies have emphasized the important role of DNA methylation in the early detection of colon cancer [6,7]. However, DNA methylation in the non-coding genome and its downstream effects in colon cancer have not been fully characterized.
Human genomes are pervasively transcribed into non-coding RNAs [8]. Long non-coding RNAs (lncRNAs) are capped transcripts longer than 200 nucleotides [9]. The emergence of lncRNAs has broadened our understanding of many biological and disease processes [10,11].
LncRNAs can affect cancer progression in multiple ways, including drug sensitivity, patient prognosis and cancer cell inhibition [12][13][14]. Existing studies have found that lncRNAs can be regulated by epigenetic alterations and thereby influence cancer development [15]. Therefore, systematically identifying the DNA methylation character of lncRNAs and inferring its consequences in colon cancer will be helpful for cancer prevention and treatment.
Transcription factors (TFs) are cell fate controllers and are frequently involved in many diseases, including cancer [16]. A single TF can bind to thousands of sites throughout the genome, typically by recognizing specific DNA sequences, to guide gene transcription [17]. Recent studies revealed that TF binding can be affected by altered cytosine methylation patterns [18]. For instance, DNA methylation at CpG islands of gene promoters can block TF binding and silence gene expression [19], whereas some cytosine methylation can promote TF binding [20]. On this basis, we set out to study the interaction between DNA methylation and TFs and their role in lncRNA regulation in colon cancer.
Here, we systematically analyzed the DNA methylation character of lncRNA promoters and inferred lncRNAs that are mediated by the interplay between DNA methylation and TFs. The lncRNAs identified in our study can serve as predictors of drug response in colon cancer. In addition, some TF-methylation-lncRNA relationships can be used to stratify patient prognosis in colon cancer. We hope our results can provide guidance for future clinical research on colon cancer.

Materials And Methods
Methylation And Expression Datasets Collection
We obtained the DNA methylation data and the expression profile processed from RNA-seq in colon cancer (COAD, colon adenocarcinoma) from the UCSC Xena archive (https://xena.ucsc.edu/). The expression values had been log2 transformed by UCSC Xena. To explore the relationship between DNA methylation and lncRNA expression, we kept only colon samples that appeared in both the DNA methylation and the expression datasets. This yielded 306 colon cancer and 19 adjacent mucosa tissue samples, which were used for the downstream analysis.

Identification Of Differentially Methylated Sites In LncRNAs
First, we omitted methylation probes with missing values in more than 80% of samples and imputed the remaining missing values with the impute package in R. We then performed a t test between colon cancer and adjacent mucosa tissue samples and calculated the methylation difference as the mean methylation level in colon cancer samples minus the mean methylation level in normal samples. Probes with p value < 0.01 and absolute methylation difference > 0.2 were regarded as differentially methylated sites (DMSs). A Circos plot of the DMSs was drawn with the circlize package in R [23].
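The DMS criteria above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration and not the authors' R code: the text does not say which t test variant was used, so Welch's unequal-variance test is assumed here, and the helper names (`welch_t_test`, `find_dms`), the probe IDs and the beta values are all hypothetical. The two-sided p value is obtained from the regularized incomplete beta function; in practice a library routine such as `scipy.stats.ttest_ind` would normally replace the hand-written helpers.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p value of Student's t distribution with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x, y):
    """Welch's t test (assumed variant); returns (t, two-sided p)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    if vx + vy == 0.0:
        return float("nan"), float("nan")  # constant data: test undefined
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, t_two_sided_p(t, df)


def find_dms(tumor, normal, p_cut=0.01, diff_cut=0.2):
    """tumor/normal map probe ID -> list of beta values; thresholds from the text."""
    dms = {}
    for probe, tumor_vals in tumor.items():
        diff = mean(tumor_vals) - mean(normal[probe])
        _, p = welch_t_test(tumor_vals, normal[probe])
        if p < p_cut and abs(diff) > diff_cut:
            dms[probe] = ("hyper" if diff > 0 else "hypo", diff, p)
    return dms


# Hypothetical beta values for two probes
tumor = {"cg_hyper": [0.80, 0.75, 0.82, 0.78, 0.85, 0.79],
         "cg_flat": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50]}
normal = {"cg_hyper": [0.30, 0.35, 0.28, 0.33, 0.31],
          "cg_flat": [0.49, 0.51, 0.50, 0.52, 0.48]}
dms = find_dms(tumor, normal)
print(dms)
```

Requiring both a p value cutoff and a minimum mean difference keeps probes whose change is statistically detectable but too small to matter biologically out of the DMS set.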

LncQTM Identification
For the resulting differentially methylated sites, the association between each methylation site and its corresponding lncRNA was assessed by the Pearson correlation test (p < 0.01). In the correlated methylation-lncRNA pairs, the methylation sites are potential epigenetic regulators of the lncRNAs; we call these methylation-lncRNA pairs lncRNA expression quantitative trait methylations (lncQTMs).
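As a rough illustration of calling one lncQTM, the sketch below computes the Pearson coefficient for a methylation site and its lncRNA and labels the pair negative or positive. To stay within the Python standard library it swaps the parametric Pearson test for a permutation p value, which is an assumption of this sketch and not the test described above; the function names and the data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for |r| (stand-in for the parametric test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def call_lncqtm(meth, expr, p_cut=0.01):
    """Return (direction, r, p) if the pair passes the cutoff, else None."""
    r = pearson_r(meth, expr)
    p = permutation_p(meth, expr)
    if p >= p_cut:
        return None
    return ("negative" if r < 0 else "positive", r, p)


# Hypothetical beta values and log2 expression across 12 samples
meth = [0.10, 0.15, 0.22, 0.30, 0.38, 0.45, 0.52, 0.60, 0.68, 0.75, 0.83, 0.90]
expr = [9.1, 8.7, 8.9, 7.8, 7.5, 7.9, 6.4, 6.0, 6.2, 5.1, 4.8, 4.5]
result = call_lncqtm(meth, expr)
print(result)
```

The sign of the coefficient gives the direction used later in the Results (negative versus positive lncQTMs), while the p value decides whether the pair is kept at all.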

Identification Of TF Binding Motifs Around LncQTMs
To explore the interaction between DNA methylation and TFs, we collected comprehensive human TF motif position weight matrices (PWMs) from the MEME Suite [24] and scanned for motif occurrences within +/-100 bp of the methylation site of each lncQTM using FIMO with default parameters [25]. Among the significant motif hits (p < 0.0001), we discarded those that did not cover the methylation site. In addition, we required motifs to be more enriched around lncQTMs than around the background methylation sites (all methylation sites within lncRNA promoters): for each motif, we computed the odds ratio (OR) and its 95% confidence interval, and retained motifs whose lower confidence bound was greater than 1.1 for further study [26].
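The enrichment filter can be illustrated with a 2x2 table per motif. The sketch below assumes the common Woolf (log-scale) approximation for the 95% confidence interval, which the text does not specify, and adds a Haldane-Anscombe correction for empty cells as a further assumption. The motif names and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf (log-scale) 95% CI for a 2x2 table.

    a: lncQTM sites covered by the motif      b: lncQTM sites not covered
    c: background sites covered by the motif  d: background sites not covered
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction avoids division by zero (an assumption).
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def enriched_motifs(tables, min_lower=1.1):
    """Keep motifs whose lower 95% confidence bound exceeds min_lower."""
    kept = {}
    for motif, counts in tables.items():
        odds_ratio, lower, upper = odds_ratio_ci(*counts)
        if lower > min_lower:
            kept[motif] = (odds_ratio, lower, upper)
    return kept


# Hypothetical counts: (a, b, c, d) per motif
tables = {
    "motif_enriched": (30, 170, 300, 9700),
    "motif_background": (6, 194, 300, 9700),
}
kept = enriched_motifs(tables)
print(kept)
```

Filtering on the lower confidence bound, and not on the point estimate, guards against motifs that look enriched only because they have few hits.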

Establishing TF-methylation-lncRNA Network
First, for the selected motifs, we tested the association between the corresponding TF and the methylation site and kept significantly correlated TF-methylation relationships (Pearson test, p < 0.01). Moreover, because we expect a TF to regulate lncRNA expression by binding at the methylation site, we also tested the relationship between the TF and the target lncRNA (Pearson test) and retained correlated TF-lncRNA relationships (p < 0.01). On this basis, we constructed TF-methylation-lncRNA relationships in colon cancer.
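The two correlation filters combine into a simple join, sketched below under the assumption that the p values for TF-methylation and TF-lncRNA pairs have already been computed. The triplet names are taken from the relationships reported in this study, but the p values, the `TF_X` entry and the function name `build_triplets` are hypothetical.

```python
def build_triplets(candidates, tf_meth_p, tf_lnc_p, p_cut=0.01):
    """Keep (TF, CpG, lncRNA) candidates passing both correlation filters.

    candidates: (tf, cpg, lncrna) tuples from the motif scan around lncQTMs
    tf_meth_p:  dict (tf, cpg) -> Pearson p value
    tf_lnc_p:   dict (tf, lncrna) -> Pearson p value
    """
    triplets = []
    for tf, cpg, lnc in candidates:
        meth_ok = tf_meth_p.get((tf, cpg), 1.0) < p_cut
        lnc_ok = tf_lnc_p.get((tf, lnc), 1.0) < p_cut
        if meth_ok and lnc_ok:
            triplets.append((tf, cpg, lnc))
    return triplets


# Candidate names follow the paper; all p values are made up for illustration.
candidates = [
    ("HES1", "cg24685006", "RP4-728D4.2"),
    ("SREBF1", "cg05372727", "LINC00460"),
    ("TF_X", "cg_x", "lnc_x"),  # hypothetical candidate that fails one filter
]
tf_meth_p = {("HES1", "cg24685006"): 1e-4,
             ("SREBF1", "cg05372727"): 5e-3,
             ("TF_X", "cg_x"): 2e-3}
tf_lnc_p = {("HES1", "RP4-728D4.2"): 3e-5,
            ("SREBF1", "LINC00460"): 8e-3,
            ("TF_X", "lnc_x"): 0.2}
triplets = build_triplets(candidates, tf_meth_p, tf_lnc_p)
print(triplets)
```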

Collection Of Drug Response Data In Colon Cancer
The CCLE database provides comprehensive pharmacological profiles across hundreds of cell lines [27]; it also provides RNA-seq data for these cell lines, which enables the quantification of lncRNAs.
We downloaded the drug response data from CCLE and selected the colon cancer cell lines (COAD cell lines) consistent with the current study, which yielded 15 cell lines. We used the activity area to evaluate drug sensitivity and kept lncRNAs that were expressed in at least 50% of the cell lines.
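A small sketch of the expression filter follows, together with the median split by lncRNA expression that the Results apply to the cell lines. It assumes "expressed" means an expression value above zero, which the text does not state; the cell line names, lncRNA names and values are hypothetical, and no particular statistical test for the group comparison is implied.

```python
from statistics import mean, median


def expressed_lncrnas(expr, min_fraction=0.5, min_value=0.0):
    """Keep lncRNAs with expression > min_value in at least min_fraction of lines.

    expr: dict lncRNA -> dict cell_line -> expression value
    """
    kept = []
    for lnc, by_line in expr.items():
        n_expressed = sum(1 for v in by_line.values() if v > min_value)
        if n_expressed / len(by_line) >= min_fraction:
            kept.append(lnc)
    return kept


def split_by_median(by_line):
    """Split cell lines into low/high groups at the median expression."""
    cut = median(by_line.values())
    low = sorted(cl for cl, v in by_line.items() if v <= cut)
    high = sorted(cl for cl, v in by_line.items() if v > cut)
    return low, high


# Hypothetical expression values and activity areas for four cell lines
expr = {
    "lncA": {"CL1": 5.0, "CL2": 0.0, "CL3": 2.0, "CL4": 7.0},
    "lncB": {"CL1": 0.0, "CL2": 0.0, "CL3": 0.0, "CL4": 1.0},
}
activity_area = {"CL1": 3.2, "CL2": 1.1, "CL3": 1.4, "CL4": 2.9}

kept = expressed_lncrnas(expr)
low, high = split_by_median(expr["lncA"])
diff = mean(activity_area[c] for c in high) - mean(activity_area[c] for c in low)
print(kept, low, high, round(diff, 2))
```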

Survival Analysis By TF-methylation-lncRNA Relationship
We downloaded the matched TCGA survival records with the R package TCGAbiolinks [30]. For each TF-methylation-lncRNA relationship, we applied a multivariate Cox proportional hazards regression model to assess the association between the TF, the methylation site, the lncRNA and patient overall survival (OS) among colon cancer samples. We then designed a risk score model based on the coefficients from the Cox regression model, so that each colon cancer sample receives a risk score. According to these risk scores, we divided colon cancer samples into two risk groups at the median risk score and performed survival analysis with the Kaplan-Meier method; the difference in survival time between the two groups was assessed by the log-rank test.
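The risk model can be illustrated as follows. The sketch assumes the usual linear form, in which a sample's score is the sum of its TF expression, methylation level and lncRNA expression, each weighted by its Cox coefficient; the exact formula is not given in the text, so this form is an assumption. The Cox coefficients are taken as already fitted, and the coefficients, survival times and function names are hypothetical. The log-rank test is written out for two groups using the one-degree-of-freedom chi-square approximation.

```python
import math
from statistics import median


def risk_score(sample, coefs):
    """Linear risk score: sum of Cox coefficient times covariate value (assumed form)."""
    return sum(coefs[name] * sample[name] for name in coefs)


def split_by_median_risk(scores):
    """Label each sample 'high' or 'low' relative to the median risk score."""
    cut = median(scores.values())
    return {s: ("high" if v > cut else "low") for s, v in scores.items()}


def logrank_test(times, events, groups):
    """Two-group log-rank test; events use 1 = death, 0 = censored.

    Returns (chi-square statistic, p value with 1 degree of freedom).
    """
    obs_minus_exp, var = 0.0, 0.0
    rows = list(zip(times, events, groups))
    for t in sorted({ti for ti, e, _ in rows if e}):
        at_risk = [(ti, e, g) for ti, e, g in rows if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for _, _, g in at_risk if g == "high")
        d = sum(1 for ti, e, _ in at_risk if ti == t and e)
        d_high = sum(1 for ti, e, g in at_risk if ti == t and e and g == "high")
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            var += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 0.0, 1.0  # no information to separate the groups
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical Cox coefficients and one sample's covariates
coefs = {"tf": 0.4, "meth": -1.2, "lnc": 0.3}
score = risk_score({"tf": 2.0, "meth": 0.5, "lnc": 1.0}, coefs)

# Hypothetical risk scores for ten samples, then the median split
scores = {f"S{i}": s for i, s in enumerate(
    [2.1, 1.9, 1.7, 1.6, 1.5, 0.4, 0.3, 0.2, 0.1, 0.0], start=1)}
labels = split_by_median_risk(scores)

# Hypothetical follow-up: high-risk samples die earlier, no censoring
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = [labels[f"S{i}"] for i in range(1, 11)]
chi2, p = logrank_test(times, events, groups)
print(round(score, 3), round(chi2, 3), p)
```

Splitting at the median gives two groups of equal size, which keeps the log-rank comparison balanced but discards the magnitude of the score within each group.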

Results
LncRNA promoters present a global hyper-methylation character in colon cancer
Some studies have pointed out that DNA methylation has a potential role in early screening of colon cancer [31]. Here, we investigated the DNA methylation character of lncRNA promoters in colon cancer. We mapped 51,544 methylation probes to lncRNA promoter regions (Fig. 1A), which is sufficient for studying DNA methylation in lncRNAs. Next, we identified differentially methylated sites (DMSs) in lncRNAs and obtained 1,809 DMSs (p < 0.01, absolute methylation difference > 0.2, Fig. 1B). Most DMSs displayed higher methylation values in cancer samples than in normal samples (Fig. 1B). We therefore classified DMSs as hyper-methylation (methylation difference > 0) or hypo-methylation (methylation difference < 0) sites. Hyper-DMSs outnumbered hypo-DMSs on all chromosomes, and chromosome 19 had only hyper-DMSs in lncRNA promoters (Fig. 1C). In addition, hyper-DMSs far outnumbered hypo-DMSs in the antisense, lincRNA and processed transcript classes (9.8-, 4.2- and 7.7-fold, respectively, Fig. 1C). These hyper- and hypo-DMSs are potential regulators of their host lncRNAs and may participate in the pathogenic processes of colon cancer.

Dissecting the relationship between promoter methylation and lncRNA in colon cancer
To investigate the role of the identified DMSs in lncRNA transcriptional regulation, we systematically identified correlated DNA methylation site-lncRNA pairs in colon cancer (Fig. 2A) and named them lncQTMs. On most chromosomes, promoter methylation sites tended to be negatively associated with lncRNA expression, and on some chromosomes (chromosomes 1, 9, 14, 17, 18, 21 and 22) there were no positive lncQTM pairs (Fig. 2B). In total, there were 392 negative lncQTMs (correlation coefficient < 0, 87.9%) and 54 positive lncQTMs (correlation coefficient > 0, 12.1%) (Fig. 2C). This observation indicates that promoter methylation sites are more likely to inhibit lncRNA expression in colon cancer, which is consistent with previous findings on DNA methylation in the regulation of protein-coding genes [32]. In addition, we classified positive and negative lncQTMs separately according to the distance between the methylation site and the TSS. For negative lncQTMs, methylation sites closer to the lncRNA TSS showed higher significance (Fig. 2D); for positive lncQTMs, however, some methylation sites located 1500 bp to 2000 bp from the lncRNA TSS showed higher significance than the methylation sites within 200 bp of the TSS (Fig. 2E). Furthermore, we analyzed the correlation strength of lncQTMs in the different distance classes. Among negative lncQTMs, DNA methylation sites located within 500 bp of the TSS showed a strong correlation with the lncRNA (Mann-Whitney U test, p < 0.01), and there was no significant difference as the distance increased further (Fig. 2F). For positive lncQTMs, the correlation coefficient increased significantly when the distance between the methylation site and the lncRNA TSS reached the 1500-2000 bp range (Mann-Whitney U test, p < 0.05, Fig. 2G). These results suggest that lncRNA promoter methylation sites are likely to inhibit expression and that, among negative lncQTMs, sites closer to the TSS display a more significant and stronger correlation with the corresponding lncRNAs.

Identification Of Acting TFs Around LncQTMs In Colon Cancer
DNA methylation can modulate gene expression in multiple ways [33]; specifically, it can shape TF binding events across human tissues [34,35]. We therefore speculated that human TFs might be sensitive to changes in DNA methylation at lncQTMs. On this basis, we scanned for TF motif occurrences around lncQTMs using the FIMO tool (Fig. 3A). Among the significant motifs (p < 0.0001, with the methylation site located within the motif), we required TF motifs to be more enriched around lncQTMs than around all lncRNA promoter methylation sites (lower bound of the 95% confidence interval of the OR > 1.1). After filtering, we obtained 155 motifs around lncQTMs (Fig. 3B). The TFs associated with these motifs are potential regulators of the corresponding lncRNAs. To confirm the relationships between TFs, methylation sites and lncRNAs, we identified significantly correlated TF-methylation site and TF-lncRNA pairs by the Pearson method (p < 0.01). Finally, we obtained 16 TF-methylation-lncRNA relationships in colon cancer, comprising 13 TFs, 15 methylation sites and 15 lncRNAs (Fig. 3C).
Among the 15 lncRNAs regulated by the interplay between TFs and methylation, 8 lncRNAs displayed a differential expression pattern (t test, p < 0.01, Fig. 4D) between cancer and normal samples. Of these lncRNAs, HAND2-AS1 has been shown to sponge miR-1275 and suppress colon cancer development [36], and hypo-methylation of LINC00460 can promote colon cancer and serve as a potential biomarker [37]. These TF-methylation-lncRNA events may therefore modulate colon cancer progression, and altering the interactions among them may be beneficial for therapy and prognosis.

LncRNAs can serve as predictors for drug response in colon cancer
Recently, lncRNAs have shed light on drug response in human cancer [38]. Here, we also investigated the association between lncRNAs and anticancer drug response. We downloaded pharmacological and lncRNA profiles for colon cancer from the CCLE database. In total, we obtained 24 drugs in colon cancer and explored the effect of lncRNAs (derived from the TF-methylation-lncRNA network) on drug sensitivity. We divided cancer samples into two groups by the median lncRNA expression value and investigated the difference in drug response (activity area) between the two groups. During this process, we

Mining TF-methylation-lncRNA Prognostic Signature In Colon Cancer
The TF-methylation-lncRNA regulatory events detected in our study might affect patient prognosis in colon cancer. To take a more comprehensive view, we considered the TF, the methylation site and the lncRNA together and performed multivariate Cox proportional hazards regression of patient OS time on these three variables. Based on the regression result, we designed a risk model to evaluate patient prognosis across colon cancer samples (see Methods). Each patient received a risk score, and we stratified patients into low- and high-risk groups at the median risk score before performing survival analysis with the Kaplan-Meier method. Among the identified TF-methylation-lncRNA relationships, two (HES1_cg24685006_RP4-728D4.2 and SREBF1_cg05372727_LINC00460) were significantly associated with survival outcome in colon cancer (log-rank p < 0.05, Fig. 5). Interestingly, LINC00460 was also identified as a topotecan response predictor in our study, which emphasizes the important role of LINC00460 in drug sensitivity prediction and prognosis stratification in colon cancer.

Discussion
Colon cancer remains one of the most commonly diagnosed cancers worldwide. The accumulation of genetic and epigenetic alterations can drive colon cancer progression [39]. DNA methylation is a stable epigenetic mark, and its alteration has been shown to influence a range of diseases, including cancer [40]. Some studies have reported the vital role of DNA methylation in coding genes [6,7,41], whereas its impact in the non-coding genome is not yet widely understood.
Here, we systematically characterized DMSs among lncRNAs and found that most DMSs in lncRNA promoters display a hyper-methylation pattern. These DMSs might modulate lncRNA transcription and, in turn, affect the downstream biological responses triggered by the corresponding lncRNAs.
To gain insight into the relationship between DMSs and lncRNAs, we identified lncQTMs by association analysis in colon cancer. Of the 446 lncQTMs, 392 (87.9%) show a negative direction of regulation. This observation suggests that DNA methylation in lncRNA promoters is likely to repress lncRNA expression in colon cancer. In addition, negative lncQTMs closer to the lncRNA TSS have a stronger and more significant association with the expression level. DNA methylation alterations can influence chromatin status and affect TF binding events. Sequence-specific TFs recognize characteristic DNA sequences (motifs) in regulatory elements and regulate the corresponding genes [42]. We therefore made use of known TF motifs and investigated motif occurrence around lncQTMs. We then also considered the correlations between TFs, methylation sites and lncRNAs to establish TF-methylation-lncRNA relationships. Of the lncRNAs regulated by the interplay between DNA methylation and TFs, HAND2-AS1 and LINC00460 have been shown to take part in colon cancer development.
The regulation of lncRNA transcription caused by abnormal DNA methylation sites therefore has a potential role in colon cancer progression.
LncRNAs exhibit multiple functions in cancer drug sensitivity and patient prognosis [43][44][45]. We therefore asked whether the lncRNAs identified in our study as mediated by DNA methylation and transcription factors have an impact on drug response and prognosis in colon cancer. Based on the pharmacological and lncRNA profiles in the CCLE database, we observed that colon cancer cell lines show significant differences in drug response when stratified by the expression of 3 lncRNAs. This implies that these 3 lncRNAs might participate in the corresponding drug response processes. We divided samples into sensitive and resistant groups and found, through a logistic regression model, that the 3 lncRNAs can serve as predictors of response to 5 drugs in colon cancer. In addition, we performed survival analysis by combining the components of each TF-methylation-lncRNA relationship and found that the HES1_cg24685006_RP4-728D4.2 and SREBF1_cg05372727_LINC00460 relationships can categorize colon cancer patients into low- and high-risk groups. In particular, LINC00460 has been reported to have a role in colon cancer progression, and our results further indicate that LINC00460 is a drug response and survival biomarker in colon cancer.

Conclusions
In summary, our research provides a comprehensive view of the DNA methylation character of lncRNA promoters and predicts lncRNA transcription processes mediated by the interplay between DNA methylation and transcription factors. In addition, we highlight the role of lncRNAs in drug response and patient prognosis. Further studies are required to investigate the downstream biological mechanisms regulated by the candidate lncRNAs identified in our study in colon cancer. All data used to support the findings of this study have been presented in this manuscript.