Association Between Obesity and Incident Colorectal Cancer: An Analysis Based on Colorectal Cancer Database in the Cancer Genome Atlas

Background To investigate gene factors of colorectal cancer (CRC) in obesity and potential molecular markers. Methods Clinical data and mRNA expression data from The Cancer Genome Atlas (TCGA) was collected and divided into obese group and non-obese group according to BMI. The differential expressed genes (DEGs) were screened out by “Limma” package of R software based on (|log 2 (fold change)|>2 and p < 0.05). The functions of DEGs were revealed with Gene Ontology and Kyoto Encyclopedia Genes and Genomes pathway enrichment analysis using the DAVID database. Then STRING database and Cytoscape were used to construct a protein-protein interaction (PPI) network and identify hub genes. Kaplan-Meier analysis was used to assess the potential prognostic genes for CRC patients.


Introduction
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide based on the newly released global cancer statistics in 2018 [1]. There will be an increased by 60% under the global burden of colorectal cancer by 2030, with > 2.2 million new cases and 1.1 million deaths [2]. It is widely believed that obesity is involved in the onset of cancer. Obesity-induced abnormal lipid metabolism, adipokines and hormones, chronic in ammation, gut microbiota dysbiosis, and disrupted bile acid homeostasis play signi cant roles in the complex metabolic regulation of CRC tumorigenesis [3]. Adipocytes in tumor micro-environment are an energy source for CRC growth. In adipose tissue, secreted adipokines and in ammation can elevate levels of insulin, IGFs, leptin, and in ammatory cytokines (e.g., IL-6, TNF-α, C-C motif ligand 2, and Plasminogen Activator Inhibitor 1), and decrease levels of adiponectin in obese, which alone or together contribute to the formation and development of CRC. Moreover, obesity-induced gut microbiota dysbiosis increases harmful microbiota and metabolites and decreases bene cial ones. Additionally, CRC is promoted by bile acids, especially Deoxycholic acid and tauro-β-muricholic acid which are raised in obesity. The carcinogenesis of CRC is promoted by the bile acid-dependent inhibition of arnesoid X receptor, which is a target for anti-CRC [3]. However, there are few studies in gene terms concerning obesity with colorectal cancer, and the characteristic genes of colon cancer in obese people are not clear. Therefore, the main purpose of this paper is to identify the gene characteristics between obesity and colorectal cancer by bioinformatics based on TCGA database and to screen out potential molecular markers.

Data collection and grouping
The mRNA expression pro les and clinical data of colorectal patients were obtained from TCGA database (https://portal.gdc.cancer.gov). Approval by the Ethics Committee was not necessary because all data were collected from public available database (TCGA). Patients are divided into obese group and nonobese group according to BMI.

DEGs screening
Datasets were screened according to the following criteria: (1) samples with both mRNA sequencing data and clinical information about height and weight; (2) samples with prognosis information. Obesity is de ned as BMI ≥ 30kg/m 2 according to WHO. Finally, a total of 260 samples were enrolled in this study, including 244 with CRC and 16 without CRC. There were 78 with CRC and 3 without CRC in obese group, and 166 with CRC and 13 without CRC in non-obese group. The detailed clinical characteristics and differentially expressed mRNAs were list in Table 1. The Limma and edgeR packages in R software 3.5 were used to identify DEGs between CRC and non-CRC samples in each group. The threshold was set at (|log 2 (fold change)|>2 and p < 0.05) and adjusted P-value < 0.05. Volcano map was made later.

Functional enrichment analysis
To further analyze the potential biological processes (BP), cellular components (CC), molecular functions (MF), and pathways of the overlapping DEGs, the online software the Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/) was used to perform Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. P < 0.05 and counts > 2 were set as the threshold values.

PPI networks and the top 10 hub genes
A protein-protein interaction (PPI) network of common DEGs was constructed using the online software Search Tool for the Retrieval of Interacting Gene (STRING, https://string-db.org/). Con dence score ≥ 0.7 was considered as signi cant. Cytoscape software 3.7.2 was utilized to construct a protein interaction relationship network. Then, we select the top 10 hub genes from the PPI network through cytoHubba.

Statistic analysis
SPSS 25 was used to collate the data and compare clinical informations in each group by unpaired t test.
Differences between groups in continuous variables were assessed with analysis of variance (ANOVA) and t test. Categorical variables were compared using X 2 test. The data were presented as mean ± standard deviation (SD) for continuous variable. The patients with CRC were divided into low-expression and high-expression groups, based on the mRNA expression of the ten hub genes in different groups using the X-tile software [4]. Kaplan-Meier (KM) survival curves were served to examine the prognostic performance of the CRC-related hub genes using SPSS 25. P-values less than 0.05 were considered statistically signi cant.

Clinicopathological features of total group, obese group and non-obese group
We divided total group into obese group(n = 69) whose BMI ≥ 30 kg/m 2 and non-obese group (n = 159) whose BMI < 30 kg/m 2 . As shown in Table 1, the average age of obese group was smaller than that of non-obese group, and more importantly, the survival time of obese group was shorter, but there was no signi cant difference. Though survival rate of obese group was higher than that of non-obese group, no signi cant difference was found between groups. The proportion of intermediate-and advanced-stage tumors in the two groups was comparable and the ratio of men and women was close to equal.

Identi cation of DEGs
In total group, there were 9046 DEmRNAs with 3124 upregulated and 5922 downregulated. There were 7615 DEmRNAs in non-obese group with 1747 upregulated and 5868 downregulated, and 2055 DEmRNAs in obese group with 6 upregulated and 2049 downregulated ( Fig. 1a

Functional enrichment analysis of mRNAs
In non-obese group, the DEGs were mainly enriched in leukocyte migration, cellular metal ion homeostasis, divalent inorganic cation homeostasis and calcium ion homeostasis in BP term (Fig. 2b) and involved in adherence junction, plasma membrane protein complex, neuronal cell body and actin cytoskeleton in CC term (Fig. 2c). In MF term, the DEGs were associated with cation transmembrane transporter activity, inorganic cation transmembrane transporter activity, antigen binding and metal ion cation transmembrane transporter activity (Fig. 2d). KEGG pathway analysis indicated that the DEGs were mainly related to cAMP signaling pathway, calcium signaling pathway, cGMP-PKG signaling pathway and adrenergic signaling in cardiomyocytes (Fig. 2a).
In obese group, the DEGs were markedly involved in BP term, which included leukocyte migration, lipid catabotic process, cellular lipid catabotic process and sodium ion transport (Fig. 3b). In addition, genes in CC term were enriched in apical part of cell, apical plasma membrane, cell projection membrane and basolateral plasma membrane (Fig. 3c). In MF term, genes were connective with cation transmembrane transporter activity, inorganic cation transmembrane transporter activity, antigen binding and metal anion cation transmembrane transporter activity (Fig. 3d). In the KEGG pathway, genes mainly included neuroactive ligand-receptor interaction, cAMP signaling pathway, chemical carcinogenesis, drug metabolism-cytochrome P450 and mineral absortion (Fig. 3a). GO and KEGG pathway analysis of the DEGs in total group can be seen in Fig. 4 in detailed.

Construction of the PPI network and hub genes identi cation
We constructed a PPI network to further explore the interaction between the common DEGs by using STRING database and Cytoscape. As displayed in Supplementary Material 1, there were 1042 nodes and 4073 edges in the network in obese group, 1936 nodes and 9328 edges in non-obese group, 1940 nodes and 11029 edges in total group. Top 10 hub genes in each group were identi ed by cytoHubba (Fig. 5). All of top 10 hub genes in obese group were downregulated, while top 10 hub genes except SRSF5 in total group and top 10 hub genes in non-obese group were upregulated.

Identi ed prognostic mRNAs in patients
244 samples in total group from TCGA database, containing 167 in non-obese group and 77 in obese group, were used for KM survival analysis to screen hub genes related to prognosis of CRC patients. The results showed that in non-obese group, the survival rate with low expression of PDCD11 (p < 0.05) was higher than that of high expression (Fig. 6a). High expression of SRSF5, SRSF11 and HNRNPA1 had a higher survival rate than low expression of those in total group (Fig. 6b, c, d).

Discussion
The incidence and mortality of CRC still go up in many countries [5]. Identi cation of genes and pathways related to CRC would be helpful for the diagnosis and treatment of the disease [1]. Ga Eun Nam, et al.' s study [6] suggested that abdominal obesity may be a risk factor for CRC in this East Asian population. Also, a longitudinal analysis study indicated non-alcoholic fatty liver disease (NAFLD) with obesity posed a risk factor for colorectal cancer in apparently healthy Japanese individuals [7]. In addition, Abbinaya Elangovan, et al. conducted a large retrospective, cross-sectional database study which showed the association between obesity and colorectal cancer [8]. Obesity is regarded as one of the key environmental risk factors for the pathogenesis of CRC.
This is the rst article to explore the relationship between obesity and colorectal cancer at the gene level of a large sample. We speculated that obesity may be an independent risk factor of CRC. In order to explore the characteristic genes of CRC in obese group and to identi ed potential molecular markers for early diagnosis, targeted therapy and recognition of prognosis of special CRC population, these characteristic genes were screened from TCGA database. We found that the most signi cant downregulated genes of non-obese group were mostly similar with those of total group, such as VSTM2A, BMP3, SCARA5, CA1, PKIB and SLC4A4. As for VSTM2A, consistent with our ndings, Dong, Y., et al.
identi ed that it was one of the top downregulated secreted protein in CRC and maybe a novel prognostic biomarker for CRC patients, for VSTM2A might suppress colorectal tumorigenesis by directly binding to Wnt signaling co-receptor LDL receptor related protein 6 [9]. In addition, Secco, B., et al. [10] reported VSTM2A as a factor modulating adipogenic commitment, which can maintain and amplify the adipogenic capacity of adipose precursor cells. Cancer tissue in non-obese people has lower VSTM2A expression than normal tissue beside cancer, which seems to contradict energy theory.
In our study, MS4A12, TMIGD1, CA2, GBA3, SLC51B are only sharply downregulated in obese group and SST, PYY, GNG12, CCL13, MCHR2, CCL28, ADCY9, SSTR1, CXCL12, ADRA2A were identi ed as top ten hub genes in the network in obese group. Among above genes, MS4A12 is related to proliferation and motility of colon cancer cells, participating in cell membrane composition, cell differentiation, proliferation, and cell cycle regulation [11]. Current studies supported that MS4A12 is a colon-speci c gene regulated by intestinal differentiation transcription factor CDX2 [12,13] and in adenomatous polyp, MS4A12 transcription is downregulated while CDX2 gene and protein expression upregulated [13,14]. Survival analysis of primary colorectal cancer patients showed that patients with low expression of MS4A12 had a worse survival rate than those with high expression of MS4A12, suggesting that MS4A12 was involved in the occurrence and development of primary colorectal cancer and could inhibit the progression of cancer, which may be a potential target for diagnosis and treatment [15]. What's more, MS4A12 can predict prognosis in early stage colon cancer patient and mainly relate to malignancy of non-metastasis tumor while the prognosis value of metastasis colon tumor is low [13]. However, few evidences con rm the relationship between this gene and obesity.
TMIGD1 is also a downregulated gene in obese group. It not only is a novel adhesion molecule, which is highly conserved in humans and other species, but also a novel tumor suppressor highly expressed in normal coloretal epithelial cells and downregulated in CRC [16]. Kyle Oliver, et al.' s study showed that TMIGD1 inhibits cell migration and metastasis in colon cancer as a tumor suppressor by arresting cell cycle at G2/M, and down-regulation of TMIGD1 correlates with poor survival [17]. However, few studies have been able to draw on any systematic research into the association between this gene and CRC with obesity.
SLC51B, combined with solute transporter-alpha (SLC51A) is the encode gene of the organic solute transporters alpha and beta (OSTa-OSTb), which is located at the basolateral membrane, involving in the metabolism of bile acids [18]. Some studies indicated a defect about intestinal bile acid and conjugated steroid absorption in SLC51A-de cient mice, causing a decrease in the levels of bile acid, serum triglyceride, cholesterol, and glucose [19,20]. Wang DF, et al. con rmed that expression of SLC51B which plays a role in bile acid transport, was decreased in the ulcerative colitis. Diminished uptake of bile acids into ileocytes is most likely responsible for the decreased expression of SlC51B in the ileum. In our study, the down-regulation of this gene in the obese group may be associated with metabolic abnormalities in obesity. Notably, many studies have suggested that ulcerative colitis and CRC have the same metabolic pathway, which seems to support abnormal expression of this gene in CRC.
Carbonic anhydrases, a kind of critical enzymes in regulating pH homeostasis in cells, catalyze the interconversion between carbon dioxide and carbonic acid [21,22]. Norihiro Nakada, et al.'s study showed that CA2, an acytoplasmic enzyme widely expressed in normal organs including colonic mucosa, was downregulated in ulcerative colitis-associated colorectal cancer. However, the mechanism of CA2 downregulation in CRC is not clear [23].
GBA3 is an enzyme belonging to the glycosid e hydrolase family 1 that is able to hydrolyse a wide variety of substrates with a b-D-glucose moiety linked to a hydrophobic group, mostly present in the kidney, liver, spleen, intestine and lymphocytes of mammals [24]. Very little is known about the role of this enzyme in cancer. In our study, GBA3 was downregulated in CRC tissue in obese group. Gene expression pro ling of intestinal metaplasic lesions from patients, which are precursors of gastric cancers, identi ed GBA3 among the top 25 signi cantly upregulated genes [25]. GBA3 might play an important role in the detoxifcation and/or biotransformation of dietary xenobiotic plant β-glycosides in roots and tubers, which are the main food in African populations but not in Asian and European populations. And it can be observed an overlap in Eurasia between the highest frequency of GBA3 loss and a main dietary source of meat or sh, while in Africa the lowest frequency of GBA3 inactive alleles meets the general trend in diets higher in glycoside-rich foods and poorer in meat or sh [26].
In the enrichment analysis of the three groups, DEGs in MF term are mainly focused on ion channel activity, and on in ammation in BP term, but in the obesity group, DEGs also focused on metabolism. In terms of CC, cell membrane covered the most in the obese group, while the non-obese group and total group preferred cell connection structure. The KGGG enrichment analysis of the three groups mainly focused on the common energy metabolism signal pathway.
Though KM survival analysis, PDCD11 may be potential molecular markers for non-obese patients with CRC. PDCD11 is a NF-κB-binding protein that colocalizes with U3 RNA in the nucleolus and is required for rRNA maturation and generation of 18S rRNA. PDCD11 is necessary for Fas ligand (FasL) expression, and PDCD11 overexpression is known to induce transcription of FasL, leading to the induction of apoptosis through Fas/FasL/caspase death pathway [27][28][29][30]. Current studies on PDCD11 mainly focus on the nervous system, such as transient ischemic attack and schizophrenia [31,32]. Importantly, our study argued that PDCD11 upregulated in CRC tissue, belonged to low expression category in non-obese group with longer survival time (Supplement txt. 2; Fig. 6a). There might be other pathways affecting CRC tumorogenesis, which is not clear now.
It is found that SRSF5, SRSF11 and HNRNPA1 had proven to be associated with the prognosis of CRC in total group of our study. SRSF5, previously called SRp40, is a member of the SR protein family that has early been identi ed as a splicing regulator [33,34]. Currently, research about SRSF5 is mainly focused on lung cancer, and it is hyperacetylated and upregulated in human lung cancer. Little is known about the relationship between SRSF5 and colorectal cancer. In our analysis, SRSF5 was downregulated in colorectal tissue and belonged to low expression category in total group (Supplement txt. 2; Fig. 6c). The survival time of low expression SRSF5 group was less than that of the high expression group in total group (P = 0.025). Yuhan Chen, et al.' s study [35] indicated that upon glucose intake, the splicing factor SRSF5 is speci cally induced and promotes the alternative splicing of cell cycle and apoptosis regulator 1 to produce special proteins, which promote tumor growth by enhancing glucose consumption and acetyl-CoA production. SRSF5 seems to be a tumor suppressor gene in colorectal cancer and responds to high glucose to promote cancer development.
HNRNPA1 is also a well-known splicing regulator with effects antagonistic to SR proteins [36]. Upregulated expression and aberrant cytoplasmic localization of HNRNPA1, as determined by immunohistochemical staining, were noted in CRC [37]. HNRNPA1 not only contributed to the promotion of cell growth through the regulation of energy metabolism to use glycolysis e ciently, but upregulated the promoter activity of TRA2B which was considered as an oncogene [38,39]. Sun Y, et.al argued that the phosphorylation of Ser6 of hnRNPA1 was a predictor of poor prognosis for patients with CRC [40]. HNRNPA1 was upregulated in CRC tissue in total group of our study, too. However, it belonged to low expression category with shorter survival time (Supplement txt. 2; Fig. 6d). Interestingly, an animal study showed that Hepatic HNRNPA1 overexpression activated the CaM/Akt pathway and repressed the mTOR/SREBP-1C pathway to ameliorate hyperglycemia and steatosis in obese mice [41]. Therefore, we surmise that this gene might upregulated in obese people with CRC and lead to a poor prognosis of CRC.
At present, there are few researches about SRSF11 on CRC. Ji Hoon Lee, et.al suggested that SRSF11 is a novel the telomerase RNA component binding protein that localizes to nuclear speckles with subnuclear structures that are enriched in premessenger RNA splicing factors. It acts as a nuclear speckle-targeting factor that is essential for telomerase association with telomeres and provides a potential target for modulating telomerase activity in cancer [42]. We found that SRSF11 still belonged to low expression category in total group with shorter survival time (Supplement txt. 2; Fig. 6b). It seemed to con rm that SRSF11 is a tumor suppressor gene in CRC.
There are some limitations in this study. Firstly, patients were categorized by BMI. However, some studies suggested that measures of abdominal adiposity, such as waist circumference (WC) and waist-to-hip ratio (WHR), might be more strongly associated with colorectal neoplasia risk than BMI [6,43]. Secondly, our study was performed only by pure bioinformatics analysis. Finally, our article only found the relevant colorectal cancer genes in the non-obese population, but not in obese population. Therefore, further experiments are needed to validate the results based on tumor samples and clinical data.

Conclusion
Brie y, our study reveals obesity plays an important role in pathogenesis of colorectal cancer in molecular term. Hub gene PDCD11 may serve as novel independent prognostic biomarkers that could be used to predict the clinical outcomes of non-obese patients with CRC. This signature may serve as a possible candidate biomarker and therapeutic target for these special colorectal cancer patients. It is necessary to further pre-clinical studies followed by clinical trials to validate our ndings in the future.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets analysed during the current study are available in the The Cancer Genome Atlas repository.

Competing interests
The authors declare that they have no competing interests.

Funding
This research did not receive any speci c grant from funding agencies in the public, commercial, or notfor-pro t sectors.
Authors' contributions SYX completed the experimental assumption and determined the method, participated in the collection and analysis of the data, and was a major contributor in writing the manuscript. SYX and CTH prepared gure 1-6 and Table 1. CTH participated in the collection and analysis of the data, and manuscript writing. All authors read and approved the nal manuscript.