Analysis of differential gene expression between right-sided and left-sided colon cancer by bioinformatics analysis

Colon cancer is a common tumor of the digestive tract worldwide. Recent research has revealed that colon cancer exhibits distinct differences in clinical and biological characteristics depending on the location of the tumor. However, the genetic and molecular mechanisms underlying the differences between right-sided colon cancer (RCC) and left-sided colon cancer (LCC) are not fully understood. This study aimed to identify potential molecular biomarkers and therapeutic targets for the precise treatment of right-sided and left-sided colon cancer using bioinformatics analysis.


Background
Colon cancer is a common malignant tumor of the digestive system globally, with increasing incidence and mortality [1]. Colon cancer can be broadly divided into right-sided colon cancer (RCC) and left-sided colon cancer (LCC) according to the anatomical site of the tumor. To date, a growing body of research has demonstrated that LCC and RCC differ distinctly in epidemiology, clinical features, drug sensitivity, treatment strategy, and prognosis, and some researchers have argued that RCC and LCC should even be regarded as two distinct diseases [2,3]. Recent studies have also shown underlying molecular differences between RCC and LCC. High microsatellite instability (MSI-H) or mismatch repair deficient (dMMR) tumors are more commonly located in the right colon, which might play a pivotal part in the chemotherapy and outcome of RCC [4], whereas chromosomal instability (CIN) and p53 mutations are characteristic of LCC [5]. However, the deeper molecular and genetic mechanisms of RCC and LCC are not fully understood. Thus, in-depth studies of colon cancers with different tumor locations are warranted to improve the precise treatment of colon cancer.
As a high-throughput technology, the microarray has been widely used to obtain cancer gene expression profiles in life science [6]. Nowadays, the amount of high-throughput data has been growing at an unprecedented rate worldwide. Unfortunately, vast amounts of these data are simply stored in various online databases after project completion. Hence, re-analyzing these data might constitute a cheap and useful way to understand tumorigenesis and cancer development.
In this study, we aimed to reveal the molecular genetic mechanisms responsible for the biological phenotypic differences between RCC and LCC by bioinformatics analysis. First, we downloaded the gene microarray profile (GSE44076) and used the Limma R package to identify the differentially expressed genes (DEGs) between the RCC and LCC sample groups. After analyzing the DEGs with a series of bioinformatics methods, we identified several hub genes and pathways from the protein-protein interaction (PPI) networks of RCC and LCC. Our research provides some new insights into colon cancer at different anatomic sites.

Microarray data
The gene microarray profile (GSE44076) was downloaded from the GEO database of the National Center for Biotechnology Information (NCBI) [7]. The dataset contains 98 colon cancer samples, including 60 LCC and 38 RCC samples. All samples were from patients with early-stage (IIA-B) colon cancer diagnosed on pathologic evidence. The clinical data of these patients in the microarray profile were also collected and evaluated in this study.

Data processing and screening for DEGs
R 3.6.0 was used to normalize the dataset with the normalizeBetweenArrays function of the Limma R package [8], which can eliminate batch effects among the samples. The gene expression data of all samples were also log2-transformed. Each probe ID was then converted to the corresponding official gene symbol using the annotation of the GPL570 platform; if multiple probe IDs mapped to the same gene symbol, the maximum expression value was chosen to represent that gene's expression. To obtain a sufficient number of DEGs, we used a relaxed cutoff criterion (a corrected P < 0.05 with no threshold on the absolute fold change) for the statistical analyses. Finally, the Limma R package was applied to identify DEGs by comparing expression values between LCC and RCC.
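The probe-to-gene collapsing rule and the corrected P-value cutoff described above can be illustrated with a minimal Python sketch. This is not the Limma pipeline used in the study: the function names and the toy probe IDs are hypothetical, and the Benjamini-Hochberg procedure is assumed as the correction method only for illustration, since the text does not state which correction was applied.

```python
def collapse_probes_by_max(probe_values, probe_to_symbol):
    """Collapse probe-level values to gene level for one sample.

    When several probes map to the same gene symbol, the largest value
    is kept. Probes without an annotated symbol are dropped.
    """
    gene_values = {}
    for probe_id, value in probe_values.items():
        symbol = probe_to_symbol.get(probe_id)
        if symbol is None:
            continue  # unannotated probe
        if symbol not in gene_values or value > gene_values[symbol]:
            gene_values[symbol] = value
    return gene_values


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that the adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data: hypothetical probe IDs and gene symbols.
    values = {"probe_1": 5.0, "probe_2": 7.5, "probe_3": 3.2}
    annotation = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
    print(collapse_probes_by_max(values, annotation))

    raw_p = [0.01, 0.04, 0.03, 0.20]
    adj_p = benjamini_hochberg(raw_p)
    print([i for i, p in enumerate(adj_p) if p < 0.05])
```

In this sketch the collapsing is applied to a single sample; whether the maximum is taken per sample or across samples is not specified in the text, so the per-sample choice is also an assumption.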

GO and KEGG analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID version 6.8, https://david.ncifcrf.gov/) is a publicly available database for gene functional annotation and pathway enrichment analysis. In this study, we submitted the DEGs to DAVID to carry out the KEGG pathway and GO analyses. An adjusted P-value less than 0.05 was considered statistically significant. The results of the GO and KEGG analyses were then visualized in R.
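The idea behind such an enrichment analysis can be sketched as a one-sided hypergeometric test. DAVID itself reports a modified Fisher exact P-value (the EASE score), so the plain hypergeometric tail below is a simplified stand-in rather than DAVID's calculation, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_list, n_overlap):
    """One-sided over-representation P-value.

    n_background: genes in the background set
    n_term:       background genes annotated to the term or pathway
    n_list:       genes in the submitted list (e.g. the DEGs)
    n_overlap:    submitted genes annotated to the term
    Returns P(X >= n_overlap) under the hypergeometric distribution.
    """
    upper = min(n_term, n_list)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 40 of 500 listed genes fall in a term that
    # covers 300 of 20000 background genes.
    print(hypergeom_enrichment_p(20000, 300, 500, 40))
```

The resulting P-values for all tested terms would still need multiple-testing correction before applying the adjusted P < 0.05 threshold.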

PPI network and module analysis
STRING (https://string-db.org/) is an online database that can analyze the interaction relationships between the DEGs. Based on STRING, PPI networks were constructed for LCC and RCC separately and then visualized with Cytoscape 3.6.1. The top 5 genes ranked by node degree were selected as the hub genes of RCC and LCC. In addition, modules in the PPI networks were explored using the Cytoscape plug-in Molecular Complex Detection (MCODE). KEGG pathway enrichment analysis of the DEGs in the modules was also carried out using DAVID.
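Selecting hub genes by node degree can be sketched in a few lines of Python. The edge list below is a toy example with hypothetical node names, not the STRING output of this study, and the function name is an assumption made for illustration.

```python
def top_hub_genes(edges, top_n=5):
    """Rank genes by node degree in an undirected interaction network.

    edges: iterable of (gene_a, gene_b) pairs; duplicate pairs and
           self-interactions are ignored.
    Returns a list of (gene, degree) tuples, highest degree first;
    ties are broken alphabetically by gene name.
    """
    neighbors = {}
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # skip self-loops
        neighbors.setdefault(gene_a, set()).add(gene_b)
        neighbors.setdefault(gene_b, set()).add(gene_a)
    degree = {gene: len(partners) for gene, partners in neighbors.items()}
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    toy_edges = [
        ("NODE_A", "NODE_B"),
        ("NODE_A", "NODE_C"),
        ("NODE_A", "NODE_D"),
        ("NODE_B", "NODE_C"),
        ("NODE_B", "NODE_A"),  # duplicate of the first edge
    ]
    print(top_hub_genes(toy_edges, top_n=2))
```

Degree is only one of several centrality measures; it is used here because the text ranks hub genes by degree.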

Validation of the hub genes
To verify that these hub genes did not merely reflect potential baseline differences between right and left colon tissues, we compared the mRNA expression levels of all the hub genes between tumor tissues and non-tumor tissues.

Survival analysis for hub genes
The Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn) is an online service that provides overall survival (OS) or disease-free survival (DFS, also called relapse-free survival, RFS) analysis of single or multiple genes. The survival analysis for the hub genes was performed using this tool. Quartile cutoffs of gene expression were used to split the patients into high-expression and low-expression cohorts. The 95% confidence intervals were calculated and are presented as dotted lines on the plots. Log-rank P-values less than 0.05 were considered significant.
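The quartile split and the log-rank comparison can be sketched as follows. This is a minimal illustration and not GEPIA's implementation: the function names are hypothetical, the split is assumed to compare the top expression quartile with the bottom quartile, and the P-value is taken from a chi-square distribution with one degree of freedom.

```python
import math
import statistics


def split_by_quartile(expression):
    """Return (low_idx, high_idx): indices of samples in the bottom and
    top expression quartiles (assumed cohort definition)."""
    q1, _, q3 = statistics.quantiles(expression, n=4)
    low_idx = [i for i, x in enumerate(expression) if x <= q1]
    high_idx = [i for i, x in enumerate(expression) if x >= q3]
    return low_idx, high_idx


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times
    events_*: 1 if the event (death or relapse) was observed, 0 if censored
    Returns (chi_square, p_value).
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [s for s in samples if s[0] >= t]
        n = len(at_risk)
        n_a = sum(1 for s in at_risk if s[2] == 0)
        d = sum(1 for s in at_risk if s[0] == t and s[1])
        d_a = sum(1 for s in at_risk if s[0] == t and s[1] and s[2] == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Toy data only; not taken from the study.
    expression = [1, 2, 3, 4, 5, 6, 7, 8]
    low, high = split_by_quartile(expression)
    print(low, high)
    print(logrank_test([1, 2], [1, 1], [3, 4], [1, 1]))
```

Samples between the two cutoffs are left out of both cohorts in this sketch, which is one consequence of a quartile-based split.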

Baseline characteristics
In this study, we collected clinical data for the 38 RCC and 60 LCC samples in GSE44076. The baseline characteristics of the samples are shown in Table 1. There were no significant differences in any of the baseline characteristics between the two groups, which supports good comparability.

Data processing and identification of DEGs
First, the colon cancer gene microarray data were normalized with the normalizeBetweenArrays function (Fig. 1). The DEGs between RCC and LCC were identified using the Limma R package. A total of 2259 DEGs were obtained from the GSE44076 dataset, 1300 of which were upregulated in RCC and 945 of which were upregulated in LCC. The top 100 DEGs in RCC and LCC are presented in Table 2.

GO and KEGG analysis
GO analysis is a commonly used method for the functional annotation of genes and gene products. Based on the DAVID database, GO analysis covers three categories to describe the biological functions of DEGs: biological process (BP), molecular function (MF), and cellular component (CC). The DEGs upregulated in RCC were mainly enriched in immune response, external side of plasma membrane, and transferase activity (Fig. 2a), whereas the DEGs upregulated in LCC were mainly involved in protein ubiquitination, cytosol, and protein binding (Fig. 2b).
The KEGG database is one of the most commonly used bioinformatics databases. In this study, we used KEGG analysis to explore the cellular and molecular functions of the DEGs at a deeper level. The DEGs upregulated in RCC were mainly enriched in antigen processing and presentation, T cell receptor signaling pathway, and cytokine-cytokine receptor interaction (Fig. 3a), while the DEGs upregulated in LCC were mainly involved in endocytosis and TGF-beta signaling pathway (Fig. 3b).

PPI network construction and module analysis
The STRING online database was used to construct PPI networks for RCC and LCC separately, and Fig. 4 illustrates the node degree distribution of the PPI networks. The top 5 hub genes in each PPI network were then selected by ranking node degrees. The hub genes identified from the RCC network were CTLA4, IL10, IL2RB, IFNG, and NCAM1, whereas the hub genes in LCC were EGFR, MYC, SRC, CUL3, and NCBP2, as shown in Table 3. Moreover, the top modules of the PPI networks for RCC and LCC were selected using MCODE, and KEGG analysis of the genes in the two modules showed that these genes were mainly enriched in T cell receptor signaling pathway, measles, and ubiquitin-mediated proteolysis (Table 4).

The mRNA expression levels of the hub genes
All of these hub genes except CUL3 showed significantly different mRNA expression levels between the tumor group and the normal group in the GSE44076 data. Fig. 5 shows that the expression levels of CTLA4, IL10, IFNG, MYC, SRC, and NCBP2 were upregulated in colon cancer samples compared with normal tissues, whereas those of IL2RB, NCAM1, and EGFR were decreased in cancer patients.

Survival analysis
We used the GEPIA tool to perform OS and DFS analyses for these hub genes. Overexpression of EGFR and IFNG was associated with poor DFS in colon cancer patients, and lower expression of NCBP2 was associated with poor OS (Fig. 6).

Discussion
Given the advanced chemotherapy and targeted therapy protocols now available, the treatment of colon tumors should be decided not only by the clinical stage and pathological grade of the tumor but also by the molecular mechanisms of tumorigenesis, some of which are associated with the location of the tumor. Tumor location could contribute to differences in the pathological type, clinical manifestations, prognosis, and chemoresistance of colon cancer. It is therefore essential to regard tumor location as an important factor in the management of colon cancer, which might enable personalized care and accurate treatment for cancer patients.
Several studies have revealed that colon cancers at different tumor locations differ in their clinical features, but the underlying molecular genetic mechanism has not yet been elucidated [9-12]. In this paper, we explored potential gene expression differences between RCC and LCC using bioinformatics analysis methods.
The GSE44076 microarray dataset we selected was composed of a homogeneous series of early-stage (IIA-B) colon cancer samples. A total of 2259 DEGs, 1300 of which were upregulated in RCC and 945 of which were upregulated in LCC, were obtained. To characterize the functional heterogeneity of the DEGs between RCC and LCC, we then performed GO annotation and KEGG pathway analysis on these DEGs to explore higher-level functional differences. DEGs upregulated in RCC were mainly enriched in immune response, external side of plasma membrane, and transferase activity, whereas DEGs upregulated in LCC were mainly involved in protein ubiquitination, cytosol, and protein binding. In the KEGG analysis, the DEGs upregulated in RCC were mainly enriched in antigen processing and presentation, T cell receptor signaling pathway, and cytokine-cytokine receptor interaction, while the DEGs upregulated in LCC were mainly involved in endocytosis and Hippo signaling pathway. Among these pathways, the Hippo signaling pathway has been widely studied in the tumor field and might affect the metastasis, drug resistance, and recurrence of colorectal cancer [13]. Moroishi et al. found that loss of the Hippo pathway kinases LATS1/2 (large tumor suppressor 1 and 2) in tumor cells inhibits tumor growth by enhancing anti-tumor immune responses, suggesting that the pathway might be a potential therapeutic target for LCC [14]. In addition, we noticed that the pathways enriched among the DEGs upregulated in RCC were involved in immune responses and cytokine interaction, indicating that immune responses are more active in RCC. Overall, GO and KEGG analysis revealed that the two groups of DEGs differ in cellular functions and signaling pathways, which could help us to better understand the differences between colon cancers at different anatomic sites.
Then, we constructed PPI networks from the two groups of DEGs and screened the following 10 hub genes: CTLA4, IL10, IL2RB, IFNG, NCAM1, EGFR, MYC, SRC, CUL3, and NCBP2. We found significant expression differences in these hub genes (except CUL3) between colon cancer tissues and normal tissues, which provides some evidence that the hub genes are specific to the malignant tissues and do not stem from potential baseline differences between right and left colon tissues. We also performed a prognosis analysis of the hub genes using the GEPIA online tool. Survival analysis indicated that three of the hub genes, EGFR, NCBP2, and IFNG, were correlated with the OS or DFS of colon cancer patients. To a certain extent, these findings might explain the survival difference between RCC and LCC.
CTLA4 is a receptor that negatively regulates T-cell activation by suppressing glucose metabolism and Akt activity [15]. Based on the concept of immune checkpoints, CTLA4 has become one of the most popular therapeutic targets for tumor immunotherapy in recent years, and the CTLA4 inhibitor ipilimumab has already been approved by the FDA for the treatment of MSI-H/dMMR metastatic colorectal cancer [16]. Considering that MSI-H/dMMR tumors are commonly located in the right colon and that CTLA4 was identified as a hub gene of RCC, we speculate that CTLA4 inhibitors might be more beneficial for the treatment of RCC. IL10 is a cytokine produced mainly by macrophages, regulatory T cells, and epithelial cells [17]. In the past, IL10 was considered to have immunosuppressive and tumor-promoting potential [18]. However, it was recently reported that IL-10 is able to promote tumor immunity, and it has been developed as an anti-tumor drug for the treatment of cancer patients [19]. Naing et al. found that PEGylated IL-10 can induce CD8+ T cell immunity in cancer patients and promote the expansion of underrepresented T cell clones, which would improve the therapeutic efficacy of anti-PD-1 [20]. The IL2RB gene encodes the IL2 receptor beta chain, a component of the IL2 receptor, which binds IL2 to regulate T cell activation and proliferation [21]. Kagoya et al. constructed a novel chimeric antigen receptor T cell (CAR-T) using a cytoplasmic domain of IL2 receptor β, called 28-ΔIL2RB-z (YXXQ) CAR-T cells, which showed superior persistence and antitumor effects in both liquid and solid tumor models compared with traditional CAR-T cells [22]. In addition, according to the KEGG pathway analysis, the DEGs of RCC in the selected module, including CTLA4, IL10, and IFNG, were enriched in the T cell receptor signaling pathway, suggesting that this pathway might be a therapeutic target to suppress colon cancer. Overall, these findings may be important for the treatment of colon cancer, supporting the view that tumor immunotherapy, such as anti-CTLA4, anti-PD-1, and CAR-T therapy, could act as a potential treatment strategy for RCC.
EGFR, an important regulatory factor in tumor cells, has been shown to facilitate tumor progression by activating the Ras/MAPK and PI3K/Akt signaling pathways [23]. To date, anti-EGFR therapy has become an important part of anti-tumor therapy for colon cancer. Recent studies have shown that patients with distal colon cancers have a better prognosis with anti-EGFR therapy [24,25]. Moreover, our results showed a significantly higher level of EGFR expression in LCC samples, indicating that the better prognosis with anti-EGFR therapy in LCC patients may be related to the higher expression of EGFR in LCC.
SRC, the first proto-oncogene discovered, encodes a cytoplasmic tyrosine kinase that belongs to the Src family kinases, which play a key role in the growth, differentiation, proliferation, invasion, and metastasis of tumor cells [26]. A cohort study demonstrated that higher mRNA expression of SRC was related to poor prognosis in early-stage colon cancer and had a stronger negative impact on outcome in LCC and stage II disease [27]. Several studies have found more favorable survival, adjusted for tumor stage, in LCC [28][29][30], whereas Weiss et al. indicated that RCC in stage -II disease demonstrated better overall survival than LCC [31], which might be related to the higher expression of SRC in early-stage LCC. In addition, it was reported that SRC overexpression was associated with colon cancer resistance to oxaliplatin and that SRC inhibitors can restore sensitivity to oxaliplatin in tumors with high levels of phospho-Src [32]. These findings suggest that SRC might be a novel biological marker for the prognosis and chemotherapy response of colon cancer patients, particularly those with LCC.

Conclusion
In summary, this study aimed to identify DEGs between RCC and LCC and to construct the corresponding functional networks and signaling pathways through integrated bioinformatics analysis, in order to provide some new insights into the molecular mechanisms of colon cancers with different tumor locations. In addition, we identified some hub genes and signaling pathways that could be potential biomarkers or therapeutic targets for LCC and RCC. However, the results of this study require verification by further molecular biological experiments.

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The data that support the findings of this study are available from the GEO database.

Competing interests
The authors declare that they have no competing interests.

Funding
This work was supported by the Wenzhou Science and Technology Bureau project of China (y20180085).

Authors' contributions
QW conceived the study, analysed the data, and wrote the manuscript. ST designed the study and performed the research. SZ performed part of the research and acquired the data. All authors read and approved the final manuscript.