RNA Language in Colorectal Cancer Using an Integrative Bioinformatics Approach

Background: Identification of competing endogenous RNAs (ceRNAs), especially circRNAs, have become new hotspots in cancer researches. Although, their roles and underlying mechanisms in colorectal cancer (CRC) development remain mostly unknown. The aim of this study was to integrate both coding and non-coding available microarray data in development of CRC coupled with bioinformatics analyses to understand a more inclusive pathobiologic map regarding their molecular interactions and functions. Methods: The microarray data were retrieved from the Gene Expression Omnibus (GEO) database and analyzed. Several bioinformatics tools and databases including CircInteractome, CSCD, miRTarbBase, TargetScan, miRmap, GEPIA, STRING, Enrichr, DAVID, and MCODE were applied for further elucidation. Principal component analysis (PCA) has seperatly run for four datasets. The dysregulated circRNA-miRNA-mRNA network in CRC was constructed by Cytoscape. In addition, co-expression and protein-protein interaction (PPI) networks were established based on differentially expressed (DE) protein coding genes in CRC. Results: PCA disclose colorectal tumor and normal tissuses could be distinguished not only by mRNAs expression profile, but also by both circRNAs and miRNAs expression profiles. We identified 14 DE mRNAs (commonly between two datasets), 85 DE miRNAs and 36 DE circRNAs in CRC tissues compared with normal tissues. Taking their potential interactions into account, a circRNA-miRNA-mRNA network was constructed. Then, according to ceRNA hypothesis, the axes with expression in the desired direction were extracted. Our results disclosed some DE circRNAs with potential oncogenic (circ_0014879) or tumor suppressive (circ_0001666 and circ_0000977) effects. Finally, PPI network suggests pivotal roles for DOCK2 and PTPRC dysregulation in progression of CRC, possibly by facilitating of tumor escape from immune surveillance. Conclusion: Current study proposes a novel regulatory network consisting of circRNAs at the upstream of oncotranscriptomic cascade in CRC development, suggesting their potentiality to be utilized as both prognostic and therapeutic biomarker.


Background
Colorectal cancer (CRC), regardless of gender, is classified as the third most prevalent and second most lethal malignancy in 2018 [1]. However, it could be well treated and managed with suitable and early diagnosis. Thus, a better understanding of molecular signatures, potential biomarkers and therapeutic targets in CRC is necessary to gradually improve the diagnosis and treatment efficacy. Nowadays, high throughput technologies have helped discover and investigate both coding and non-coding dysregulated transcripts in tumor cells. The pivotal roles of non-coding RNAs in gene expression regulation and consequently cancer initiation and progression have been determined. Recently, circular RNAs (circRNAs) as a newfound category of non-coding RNAs have been highlighted in cancer researches. circRNAs due to high stability and tissue/stage specificity show potential to be diagnostic and prognostic biomarkers [2]. In 2011, ceRNA hypothesis was proposed by Salmena et al [3]. They demonstrated that RNA transcripts communicate with each other by microRNA response elements (MREs). According to this hypothesis, the circRNA function in microRNAs sponging and consequently dysregulation of protein-coding genes expression has been established in different malignancies as long as circRNA-miRNA-mRNA axis is considered [4,5]. However, various circRNAs have been reported as participating in the CRC pathogenesis, their molecular interactions with other transcripts are still unknown [6]. These interactions could improve current knowledge about molecular mechanisms underlying CRC and potential targeted therapies. Integrating experimental data with bioinformatics could present a powerful method to disclose molecular interactions among different transcripts in tumor cells as well as their potential functions. The aim of this study is integration of available microarray data concerning circRNA, miRNA and mRNA in CRC with bioinformatics analyses by consensus strategy to gain a more accurate comprehension of their molecular interactions and functions. We have constructed a dysregulated circRNA-miRNA-mRNA network in colorectal tumors as well as co-expression and proteinprotein interaction (PPI) networks. Our findings propose novel regulatory network underlying CRC progression and uncover molecular interactions between dysregulated transcripts in colorectal tumors.

Differentially Expressed Genes from GEO
To investigate and integrate differentially expressed (DE) RNAs in colorectal cancer, including both coding and non-coding transcripts, four microarray datasets have been selected from GEO database and all of them were analyzed between colorectal tumors and non-tumor tissues. Information about the four used datasets is summarized in Table 1. All raw expression data were normalized and log2-transformed (Supplementary Figure 1). Limma, a Bioconductor package for differential analysis of microarray data was run to determine DE RNAs in each dataset with the following criterias: Two datasets, GSE41657 (platform GPL6480) and GSE128435 (GPL4133 platform), concerning mRNA expression with cutoff criteria of adj.p.value < 0.05 and log fold change ≥|2| were analyzed. The subseries 128435 is part of superseries GSE128449. Finally, the common genes between the both datasets that show expression in the same direction (up-or downexpressed) have been selected for further analysis. GSE128449 (platform GPL14767) which is part of superseries GSE128449 was analyzed to determine DE miRNAs. In this step, DE miRNAs with adj.p.value < 0.05 and log fold change ≥|3| were selected. In order to assess DE circRNA in CRC, GSE126095 (GPL19978 platform) was used and transcripts with adj.p.value < 0.05 and log fold change ≥|3| were retrieved. The flowchart of data and bioinformatics analyses is visualized in Figure 1. In order to investigate similarities and dissimilarities between samples, Clustered heatmaps for top 100 DE RNAs for four datasets have been analyzed.

Principal component analysis (PCA)
The PCA plot evaluates differences and similarities among samples and determine whether samples can be grouped. Hence, it could separate normal and tumor tissues based on their gene expression profiles. Scatterplots of PCA from four microarray data have been separately generated using the ggfortify package in R.

circRNA-miRNA-mRNA network
First, CircInteractome online database [7] was used to bioinformatically find potential miRNAs that have a complement seed region on sequences of DE circRNAs. For each DE circRNA, the entire predicted miRNAs were obtained. Second, the overlapping miRNAs between the predicted miRNAs and the DE miRNAs were obtained. Third, common targets of DE miRNAs with DE mRNAs that were retrieved from both GSE41657 and GSE128435 were assessed. To this end, miRTarBase [8] was used for detection of experimentally validated miRNA-mRNA interactions and two databases, TagetScan [9] and miRmap [10], were run for bioinformatically predicted miRNA-mRNA interactions. It should be noted that we have selected interactions that are commonly predicted in both databases. Finally, a dysregulated circRNA-miRNA-mRNA network in colorectal cancer was constructed by Cytoscape version 3.6.1. Characteristics of 16 DE circRNAs which are involved in this network are shown in Table 2. Then, the axes that show expression status according to ceRNA hypothesis were extracted. In this step, we have presumed that circRNAs show opposite expression direction to their downstream miRNAs and same direction to downstream mRNAs in a circRNA-miRNA-mRNA axis. Cancer-Specific CircRNA (CSCD) database [11] was used to visualize basic structural patterns of the circRNAs involved in circRNA-miRNA-gene axis.

Co-expression Network Construction
The top 40 co-expressed genes in colorectal cancer for each DE mRNA were retrieved from GEPIA database [12] according to the Cancer Genome Atlas (TCGA) data. The co-expression network was visualized by Cytoscape and hub nodes were identified according to the degree and betweenness centrality. Finally, the functional analysis of hub genes in this network was carried out by DAVID database [13].

Protein-Protein Interaction (PPI) Network Construction
First, PPIs were identified by String database [14] and PPI network visualized by Cytoscape. To identify hub nodes, the degree and betweenness centrality were considered. Second, MCODE, a Cytoscape app, was run to find the most significant protein module. Third, Enrichr database [15] was used to understand biological function of hub genes in PPI network.
Oncogenomic and Oncotranscriptomic Analysis cBioPortal v. 3 [16] was run to identify somatic alterations of hub genes in 526 colorectal tumor samples according to TCGA, PanCancer atlas. In addition, GEPIA was used to assess expression of hub genes in CRC tissues compared with normal tissues based on TCGA data.

Differentially Expressed mRNAs, miRNAs and circRNAs
In order to obtain differentially expressed (DE) protein coding genes in CRC, two GEO datasets (GSE 41657 and GSE 128435) were analyzed. In this step genes with adj.p.value < 0.05 and log fold change ≥| 2 | were selected. Subsequently, 14 common DE genes were found in both datasets that show same dysregulation direction (Supplementary Table 1). Two of the genes are upexpressed and 12 are down-expressed. Details of the analysis are shown in Figure 1. GEO dataset concerning miRNAs expression in CRC (GSE128449) was analyzed. 85 DE miRNAs with cutoff criteria of adj.p.value < 0.05 and log fold change ≥| 3 | were obtained (Supplementary Table 2). As mentioned, GSE126095 for finding DE circRNAs in CRC was checked out. Thirty six DE circRNAs with adj.p.value < 0.05 and log fold change ≥| 3 | were obtained (Supplementary Table  3). The clustered heatmaps for all analyzed datasets are shown in Figure 3 and Supplementary Figures 2-4. In addition the volcano plots of DE RNAs in four datasets have been drawn (Figure 2A). The gene expression PCA plot provides insights into the association between samples. As it is shown in Figure 2B, colorectal tumor and normal tissuses could be distinguished not only by mRNAs expression profile, but also by both circRNAs and miRNAs expression profiles.

Potential DE circRNA-DE miRNA and DE miRNA-mRNA interactions
According to circInteractome database, 54 DE circRNA-DE miRNA interactions are bioinformatically predicted. Three databases were run to investigate DE gene-DE miRNA interactions. Six experimentally validated interactions by mirTarbase and 50 bioinformatically predicted interactions by TargetScan and miRmap were consistently obtained (Supplementary Table 4).

CircRNA-miRNA-mRNA network
The dysregulated circRNA-miRNA-mRNA network in colorectal cancer including 16 circRNAs, 14 miRNAs and 11 mRNAs was constructed ( Figure 4A). The extracted axes according to ceRNA hypothesis have been visualized in Figure 4B including five circRNAs, four miRNAs and eight mRNAs. The involved circRNAs in these axes are available in Cancer-Specific CircRNA (CSCD) database ( Figure 5).

Co-expression network construction
As mentioned, 11 dysregulated genes related to DE miRNAs were detected. Their top 40 coexpressed genes in CRC were used to construct co-expression network ( Figure 6). In this network, GAS7, C10ORF54, RELL1, and TSPAN1 are considered as hub genes because of having higher degree and betweenness centrality. In addition, functional analysis of hub genes indicates their roles in cancer related processes such as cell proliferation and migration (Table 3).

Protein-Protein interaction (PPI) network
The PPI network of 11 DE genes and their co-expressed genes has been established after removing unconnected nodes based on String output ( Figure 7A). This network shows PTPRC, SPI1, C3AR1, LCP2 and DOCK2 genes as hubnodes. We run MCODE to detect a significant module ( Figure 7B). It should be noted that all hubnodes are involved in the detected module. Functional annotation according to Enrichr web server revealed the roles of hub genes in immune system ( Figure 7C).

Oncotranscriptomic and Oncogenomic Analysis
Expression analysis for five hub nodes of PPI network using GEPIA revealed that while all of them are down-expressed in tumors compared with normal tissues, only DOCK2 and PTPRC dysregulation are statistically significant ( Figure 8A). Investigation of DOCK2 and PTPRC somatic mutations in colorectal cancer reveals a fairly high frequency (10% and 8% respectively) of genetic alterations in tumors. As it is shown in Figure 8B, the main proportion of these alterations is composed of missense and truncating mutations.

Conclusion
Investigation of non-coding RNAs as well as identification of competing endogenous RNAs (ceRNAs) have become new hotspots in cancer research. Among them, circRNAs are more prominent due to their specific characterizations such as high stability, time-and tissue-specificity. circRNAs can play an important role in gene expression regulation at both transcriptional or posttranscriptional levels [17]. Various studies have demonstrated that dysregulated circRNAs are involved in cancer initiation and progression. For example, up-regulation of hsa_circ_0000069 and its role in cell proliferation, migration, and invasion were observed in CRC [18]. Recent evidences have disclosed that circRNAs could sponge miRNA and suppress their functions. Although, there are rare studies concerning circRNA molecular interactions in CRC. miRNAs are a class of small noncoding RNAs which have pivotal roles in regulating gene expression at post-transcriptional level through suppressing target mRNA. It is well known that dysregulated miRNAs are involved in several oncopathways and result in cancer development. In this study, 36 DE circRNAs and 85 DE miRNA with log fold change ≥ |3| were identified in CRC tissues compared with non-tumor tissues as well as 14 DE mRNAs, which consistently show dysregulation in both GSE41657 and GSE128435 GEO datasets. These DE transcripts were integrated using several databases and bioinformatics tools in order to construct a dysregulated circRNA-miRNA-mRNA network in CRC. In addition, mRNA, miRNA and circRNA datasets have been seperately analyzed to assess their ability in separating colorectal tumor and normal samples. The results revealed that each of them could alone distinguish colorectal tumor and normal samples. According to ceRNA hypothesis, we have presumed that circRNAs, as miRNA sponges, show opposite expression direction to their downstream miRNAs and same direction to downstream mRNAs in a circRNA-miRNA-mRNA axis. Since, 11 axes with this expression status were extracted from circRNA-miRNA-mRNA network, including five circRNAs, four miRNAs, and eight mRNAs. Previous cancer researches concerning these transcripts are summarized in Table  4. However, most of these circRNAs and miRNAs have not been previously reported in CRC. It should be noted that six of seven detected circRNAs are available in Cancer-Specific CircRNA Database (CSCD) and the location of their microRNA response elements (MRE) are shown in Fig.  4.
We have established potential dysregulated circRNA-miRNA-mRNA axes that could play pivotal roles in CRC oncotranscriptomic. Our analyses reveal that circRNAs could show both oncogenic (oncoCirc) and tumor suppressive functions in CRC. For example, circ_0014879, an up-expressed exonic circRNA, could be an oncoCirc through circ_0014879/miR-885-5p/ARG2 axis. miR-885-5p is bioinformatically predicted to be sponged by circ_0014879. Interestingly, miR-885-5p is a tumor suppressor miRNA and down-expressed in different cancer (Table 4) as well as in our results. ARG2 is an enzyme that plays a part in immunosuppressive tumor microenvironment and tumorigenesis. ARG2 is predicted to be miR-885-5p target. This protein shows overexpression and association with poor prognosis in different cancers. In 2019, Youjun Wu et al. demonstrated that expression of ARG2 is significantly higher in CRC samples than normal tissues [19]. In addition, they revealed downregulation of ARG2 results in inhibiting colorectal cancer cells growth. According to our investigation, ARG2 is significantly up-expressed in both GSE41657 and GSE128435 datasets with log fold change 2.15 and 2.05, respectively. Since, it is suggested that overexpression of circ_0014879 as an oncoCirc could result in upregulation of ARG2 oncogene through inhibition of miR-885-5p function in CRC.
On the other hand, circ_0001666, a down-expressed exonic circRNA, could be a tumor suppressive circRNA in CRC through sponging miR-1276 in circ_0001666/miR-1276/RASSF2 axis. It is revealed that RASSF2 inhibits tumor cell growth and suggested that it is a tumorsuppressor gene in CRC [20]. Moreover, our results disclose potential tumor suppressive roles of circ_0000977 in CRC through sponging miR-1208. Interestingly, this miRNA might posttranscriptionally inhibit two tumor suppressor genes in CRC including IL10RA and AKR1B10. In 2018, Zadka et al. reported that IL10RA is down-expressed in CRC tissues and shows negative correlation with the clinical stage and proliferation [21]. It is shown that AKR1B10 expression is significantly decreased in CRC samples and its down-expression correlates with decreased survival and poor prognosis of patients [22]. In addition, AKR1B10 knock down resulted in inhibition of apoptosis in colorectal cancer cells. It should be noted that expression of RASSF2, IL10RA and AKR1B10 was down-regulated in both GSE41657 and GSE128435 datasets with log fold change < -2. Collectively, it can be proposed that under-expression of both circ_0001666 and circ_0000977, as tumor suppressive circRNAs, contributes to down-regulation of some tumor suppressive genes in CRC including RASSF2, IL10RA and AKR1B10. These finding should be more validated by functional studies in future. In the next step, we constructed a dysregulated co-expression network in CRC, including 11 DE genes and their co-expressed genes. This network shows GAS7, C10ORF54, TSPAN1, RELL1 genes are hub nodes and could play important roles in progression of CRC tumors as far as expression profile is concerned. Functional annotation based on DAVID database revealed their roles in cell proliferation, cell-cell adhesion, cell migration and cell differentiation (Table 3).
Consistently, previous studies have demonstrated their dysregulation in CRC. Remarkable downregulation of GAS7 due to promoter hypermethylation was reported in CRC [23,24]. It is suggested that the downregulation of GAS7 results in high metastatic ability of tumor cells. RELL1 is a family member of tumor necrosis factor receptor (TNFR) that induces apoptosis [25,26]. Here we proposed the important role of RELL1 in CRC oncotranscriptome for the first time; according to our results, RELL1 shows significant expression correlation with two down-regulated DE genes including RAB3B and ACER3 in CRC related TCGA data. C10ORF54 (VISTA) is a protein-coding gene, which codes immunoregulatory receptor. This protein inhibits the T-cell response and acts during adaptive immune responses [27]. Since, its expression can be a predictive biomarker as long as cancer immunotherapies are concerned [28]. Interestingly, our analyses based on TCGA data (Supplementary Figure 5) shows that GAS7 and C10ORF54 are significantly downregulated in CRC tissues. However, in 2018, Shan Xie et al. have reported that C10ORF54 expression was significantly high in CRC tissues compared with normal tissues [29]. It seems that this controversy should be noticed in CRC patients who are supposed to receive immunotherapies. Another hub gene in co-expression network is TSPAN1, which functions in cell mitosis and differentiation. TCGA data show significant up-expression of TSPAN1 in CRC tumors than normal tissues (Supplementary Figure 5). It is reported that overexpression of TSPAN1 is correlated to the poor prognostic factors of CRC patients [30]. In 2010, Chen et al. disclosed that suppression of TSPAN1 results in reducing proliferation and invasion of colon cancer cells [31]. To assess the protein interactions between DE mRNAs and their co-expressed genes, we established a PPI network by STRING search tool. This network revealed two hub nodes DOCK2 and PTPRC which significantly downregulate in CRC tumors according to TCGA data (Fig. 7A). Oncogenomic analysis shows that a notable proportion of their downregulation is probably attributed to truncating mutations (Fig. 7B). Although it seems that the transcriptional and posttranslational regulatory mechanisms are involved in down regulation of DOCK2 and PTPRC2 in CRC. Hematopoietic cells express DOCK2, which modulates activation and migration of immune cells. DOCK2 was shown to be involved in inflammatory diseases such as malignancies [32]. Yu et al. have demonstrated that DOCK2 mutation has a high frequency (7.7%) in colon tumors and low expression of DOCK2 is associated with poor prognosis of patients [33]. Another investigation has documented that DOCK2 is a prominent hypermethylated gene in CRC tissues [34]. PTPRC protein is necessary for T-cell activation and its mutation is associated with severe combined immunodeficiency. Down-regulation of PTPRC is reported in several cancer types [35,36]. Here we show that DOCK2 and PTPRC are not only hub nodes of dysregulated PPI in CRC based on degree and betweenness centrality but also involved in identified module with MCODE algorithm (Fig. 6B). Functional enrichment analysis of module members indicates their roles in immune system (Fig. 6C). Taken together, it is suggested that down regulation of DOCK2 and PTPRC in CRC contributes to the escape of tumor from immune surveillance; accordingly, their dysregulation could be involved in resistance to immunotherapy. In conclusion, in this study, combination of the both coding and non-coding microarray data as well as bioinformatics tools were applied to construct the dysregulated circRNA-miRNA-mRNA network and detect ceRNA axes in CRC. We identified potential oncoCirc (circ_0014879) and tumor suppressive circRNAs (circ_0001666 and circ_0000977) in CRC that could become both prognostic and therapeutic biomarkers. In addition, PPI network implies pivotal roles of immune system related proteins such as DOCK2 and PTPRC in CRC progression, which could be important in immunotherapies.       Table 4. Summary of previous researches about dysregulation of circRNAs and miRNAs involved in circRNAs/miRNAs/mRNAs axes (Fig. 3) in different cancer types. The red and black transcripts respectively present down-and up-expression in CRC based on our analyses.