Comprehensive Analysis of LncRNAs Role in Tumorigenesis of Colon Adenocarcinoma

Background Colon adenocarcinoma (COAD) is one of the most common gastrointestinal cancers globally. Molecular aberrations of tumor suppressors and/or oncogenes are the main contributors to its tumorigenesis. However, the exact underlying mechanisms of COAD pathogenesis are not clear yet. Thus, there is an urgent need to indicate promising potential diagnostic and prognostic biomarkers in COAD patients. In the current study, level 3 RNA-seq and miR-seq data and corresponding clinical data of colon adenocarcinoma (COAD) were retrieved from the TCGA database.The “limma” package in R software was utilized to indicate the differentially expressed genes. For in silico functional analysis, GO and KEGG signaling pathways were conducted. PPI network was constructed based on the STRING online database by Cytoscape 3.7.2. A ceRNA network was also constructed by “GDCRNATools” package in R software. Kaplan-Meier survival analysis (log-rank test) was used to indicate the relation between RNA expressions with patient’s survival time. Results The differential expression data demonstrated that 1512 mRNAs, 188 lncRNAs, and 329 miRNAs were differentially expressed in COAD. The GO and KEGG pathway analysis indicated that the differentially expressed mRNAs were primarily enriched in canonical processes in cancer. The PPI network showed that the PPARGC1A, ADAMTS5, PTGS2, FGFR2, TBX1, TWIST1, KIT, BDNF and MET proteins were the critical hubs. The Kaplan-Meier analysis revealed that 128 mRNAs, 15 lncRNAs, and 39 miRNAs were associated with overall survival time in the patients. The ceRNA network data demonstrated that three lncRNA including MAGI2-AS3, NEAT1, and SNHG3 genes were involved in the development of COAD. Conclusions Altogether, we integrated differentially expressed mRNAs, lncRNAs, and miRNAs in COAD bioinformatically. Our data suggested promising lncRNAs in the diagnosis and prognosis of patients with COAD.


Introduction
Colon adenocarcinoma (COAD) is one of the most common gastrointestinal (GI) cancers and is the second leading cause of cancer related death globally [1,2]. It is demonstrated that COAD occurs in approximately 5% of overall population at any given time in the world [3]. Despite current screenings and therapies such as endoscopic resection and radical surgery, near half of the patients are diagnosed as advanced cases of COAD, experiencing tumor recurrence and relapse [4]. The tumorigenesis of this cancer is a complicated multi-step process composed of epithelial cell proliferation, differentiation, apoptosis, survival, and invasion mechanisms [4]. Molecular aberrations of tumor suppressors and/or oncogenes are one of the main contributors in tumorigenesis [5].
However, the exact underlying pathogenesis of COAD are not clear yet. Thus, there is an urgent need to indicate promising diagnostic and prognostic biomarkers for COAD. Recent investigations have highlighted the role of non-coding RNAs in tumorigenesis in a variety of malignancies. Among different kinds of non-coding RNAs, long non-coding RNA (lncRNA) is a putative class of non-coding RNA with more than 200 nucleotides in length, without any open-reading-frame (ORF) to encode proteins [5]. A large body of evidence indicated that lncRNAs plays critical roles in a variety of biological processes including cell proliferation, cellular development and differentiation, carcinogenesis, and metastasis through modulating gene expression at chromatin remodeling, transcriptional and posttranscriptional levels [6,7]. Aberrant expression of lncRNAs has been welldocumented in different sorts of cancers [8]. Dysregulation of lncRNA HOTAIR, H19, MALAT1, SNHG7, GAS8-AS, and NEAT1 were extensively well-studied and have been demonstrated to contribute in tumorigenesis and poor prognosis [5,[8][9][10][11][12]. A mounting of investigations have shown that lncRNAs exert their function through competing for endogenous RNA (ceRNA) crosstalk [13]. For instance, it has been shown that lncRNA SCARNA2 was overexpressed in COAD tissues and it remarkably correlated to chemoresistance via targeting miR-342-3p to upregulate EGFR and BCL2 in COAD cells [13]. Furthermore, overexpression of lncRNA SNHG1 has been shown to modify epithelialmesenchymal transition (EMT) by binding to miR-497/miR-195-5p in COAD cells [14]. Moreover, lncRNA BDNF-AS was downregulated in COAD patients and served as a tumor suppressor gene.
Besides, ectopic expression of BDNF-AS suppressed the cell proliferation and migration via epigenetically downregulating GSK-3β expression through EZH2 [15]. Based on these and similar studies, many researchers have suggested the potency of lncRNAs as bimarkers in blood and serum.
LncRNAs have been discovered in microvesicles and exosomes which have been protected and stabilized in circulation [16]. In the current study, we comprehensively investigate lncRNAs, miRNAs, and mRNAs expressions from public database the Cancer Genome Atlas (TCGA) and we constructed a ceRNA network in COAD. Besides, we proposed novel potential biomarkers for COAD.

Sample and data collection
Clinical data of COAD were retrieved from the TCGA database (https://portal.gdc.cancer.gov/repository). The inclusion criteria were: (1) the histopathological diagnosis was COAD; (2)

RNA-seq and miR-seq data analysis
RNA-seq and miR-seq Level 3 data were collected from the TCGA database. The raw count of reads of RNA-seq and miR-seq data were normalized by Voom and TMM normalization methods. All the analyses were conducted in R software. The "limma" package in R software was utilized to indicate the differentially expressed mRNAs (DEmRNAs), lncRNAs (DElncRNAs), and miRNAs (DEmiRNAs) between normal solid tissues and primary tumors. The concluded data were filtered based on the |log2 fold change (FC)|>1 for DEmRNA, DElncRNA, and DEmiRNA. P-value <0.05 and false discovery rate (FDR) <0.05 were considered as significant thresholds.

Functional enrichment analysis and protein-protein interaction (PPI) network
For in silico functional enrichment analysis, gene ontology (GO) in three domains including biological processes, cellular components, and molecular functions, besides to Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathways were conducted. The GO and KEGG outputs were visualized by R software (ggplot2 package). The PPI network was constructed based on the STRING online database by Cytoscape 3.7.2. Molecular Complex Detection (MCODE) was used to analyze and predict the interactions (score value >0.4).

Statistical Analysis
All the differentially expressed data were analyzed by using R software (3.5.2) through the "GDCRNATools" package. Kaplan-Meier survival analysis (log-rank test) was used to indicate the relation between over or downregulation of RNA, based on median expression with patient's survival time. P-value <0.05 was considered as a significant threshold.

Differentially Expressed Genes
Our data demonstrated that 1512 mRNAs (433 up-regulated and 1079 down-regulated) were differentially expressed in COAD. Moreover, 188 lncRNAs (139 up-regulated and 49 down-regulated) were identified that were deferentially expressed in patients. Three hundred and twenty nine miRNAs (169 up-regulated and 160 down-regulated) have been found that were differentially expressed in the COAD samples. The data are presented in Figures 1 and 2 as well as Tables 2 and 3.

GO enrichment and KEGG pathway analysis
GO enrichment analysis demonstrated that the differentially expressed mRNAs were almost enriched in biological processes such as neurogenesis, neuron differentiation, locomotion, biological adhesion and regulation of cell population proliferation. Furthermore, GO analysis in cellular component revealed that the differentially expressed mRNAs predominantly contributed to intrinsic component of plasma membrane, plasma membrane region, extracellular matrix, neuron part, and cell junction. GO molecular function domain indicated that the genes were mostly enriched in calcium ion binding, signaling receptor binding, molecular function regulator, protein dimerization activity, and identical protein binding. GO outputs are presented in Figure 3. In addition, KEGG pathway analysis showed that the differentially expressed genes in the COAD patients were remarkably participated in pathways in cancer, ECM receptor interaction, cell adhesion molecules (CAMs), calcium signaling pathway, WNT signaling pathway, and PPAR signaling pathway ( Figure 4).

PPI network construction
The PPI network was constructed based on the STRING database to better understand the roles of the differentially expressed mRNAs. The data demonstrated that PPARGC1A, ADAMTS5, PTGS2, FGFR2, TBX1, TWIST1, KIT, BDNF and MET were the protein-protein interaction (PPI) critical hubs ( Figure 5).

Kaplan-Meier survival analysis of differentially expressed genes
Kaplan-Meier survival analysis was used to indicate the association of differentially expressed mRNAs, lncRNAs, miRNA, and prognosis of COAD patients. The data showed that 128 mRNAs, 15 lncRNAs, and 39 miRNAs were associated with overall survival time in the patients. Top 10 the each group are presented in Table 4.

LncRNA-miRNA-mRNA ceRNA network construction
According to competing endogenous RNA (ceRNA) hypothesis, which implicates that lncRNAs Moreover, the up-regulation of lncRNA colorectal cancer-associated lncRNA (CCAL) promotes CRC progression through suppressing the activator protein 2α (AP-2α) to ignite Wnt/β-catenin signaling pathway in the cells [28]. In the present study, the KEGG analysis indicated peroxisome proliferator-activated receptor (PPAR) pathway contribute with Wnt signaling. It has been discovered that the PPAR signaling pathway reduces cell proliferation and inhibits tumorigenesis in different types of cancer. Down-regulation of PPAR-α has been correlated to poor clinicopathological features of CRC that was remarkably higher in well to moderately differentiated adenocarcinoma than in mucinous adenocarcinoma [29]. In addition, lncRNA TINCR modulate PPAR signaling pathway through binding to According to the aforementioned data, the proposed lncRNAs can be considered as potential promising biomarkers that dive tumorigenesis through hijacking canonical signaling pathways in COAD.

Conclusions
Altogether, in our study, we demonstrated the differentially expressed mRNAs, lncRNAs, and miRNAs in COAD. GO and KEGG pathway analysis demonstrated canonical biological processes and signaling pathways, which have been well described previously in tumorogenesis. Moreover, our Kaplan-Meier analysis results suggested several prominent prognostic differentially expressed genes associated with poor overall survival in the patients. Last but not least, our ceRNA network presented important lncRNAs-miRNAs-mRNAs links between the differentially expressed genes. The data provided might pave the way for further investigation and clinical application in COPD patients.

Acknowledgments
This study was part of a Ph.D. dissertation (AP).

Author's contributions
AP, MRA, NN and MAK were all participated in study design, data analysis, and preparation of the drafted manuscript. All authors read and approved the final manuscript.

Funding
This study was supported financially by Mashhad University of Medical Sciences.

Availability of data and materials
The authors declare that the datasets on which the conclusions of this manuscript rely are deposited in publicly available repositories.

Ethics approval and consent to participate
The authors declare that there is no conflict of interest.       Table 4. Top 10 mRNAs, lncRNAs, and miRNAs that were associated with overall survival.    Protein-protein interaction (PPI) network of the differentially mRNAs in COAD (score > 0.4).