Identification of Genes in Patients for Predicting Ulcerative Colitis-Associated Colorectal Cancer

Background Ulcerative colitis (UC) has been considered as a risk factor for colorectal cancer (CRC). However, effective biomarkers for predicting UC-associated CRC are lacking. Therefore, it is necessary to screen biomarkers associated with UC-related CRC, which could be used to evaluate UC-associated CRC early, and provide possible mechanisms involved in UC-associated CRC. Efficient bioinformatics analysis could help us to explore potential biomarkers. Methods Two public datasets, including 44 UC without CRC samples and 17 UC-associated CRC samples were chosen from Gene Expression Omnibus (GEO) database. Sva package was used to remove batch effects, and then we screened out differentially expressed genes (DEGs) with limma package. STRING and Cytoscape were used to achieve protein-protein interaction (PPI) network analysis. The survival curves between high and low gene expression were performed by log rank test based on the cancer genome atlas (TCGA) program. The expression of three identified hub genes was validated based on Oncomine. To validate the expression of three hub genes, we compared the expression of three hub genes between normal and colorectal cancer based on Oncomine. were

(ECCO) (1). However, the clinical symptoms of IBD-associated CRC are different from sporadic CRC, including diagnosis at younger age, location. In IBD-related CRC, the location is dominated in proximal colon, and the patients have mucinous or signet ring histopathologic character and poor prognosis (2,3). A recent meta-analysis showed that patients with ulcerative colitis (UC) have similar risk rate of CRC in Asia compared with Europe and North America, and no regional variation was found in Asia (4).
Recent study showed that IBD related CRC have some common causing factors with sporadic CRC, like environmental changes, microbiology changes. And IBD related CRC is more prone to be found in patients who are sedentary, consume imbalanced diet with low-fiber diet/higher red meat intake, and have lower vitamin D level, and excess emulsifier intake (5). An investigation to risk factors for UC-CRC patients in China showed that common risk factors between sporadic CRC and UC-CRC, such as higher onset age and long disease course was found in UC-CRC (6).
Endoscopy surveillance is still the main tool to assess UC associated CRC (7), however, there are still many limits, like invasive, the need of bowel preparation, long term examination, patient intolerance, and specialist training for chromoendoscopy (8). Furthermore, most of the patients with UC refused to endoscopically resect colitis associated dysplasia treatment (9), although it is an effective method.
However, effective biomarkers for predicting UC-associated CRC are lacking. Therefore, it is necessary to screen IBD-CRC associated biomarkers, which could be used to early evaluate IBD-CRC, and provide possible mechanisms. In addition, the mechanisms involved in IBD associated CRC have not been clearly clarified yet, onset age and long term inflammatory status may be involved in IBD-CRC(10).
Inflammation has been regarded as a key factor contributing to IBD-CRC, and might promote cancer development via inducing DNA methylation mutation (11), and antibiotics could prevent colon epithelial cell proliferation by inhibiting DNA methylation and mutation (12). Runt-related transcription factor (RUNX) 3 was reported to loss activity in UC associated cancer (13), which might could be used for predict UC associated CRC. Otherwise, different miRNAs were found in tissues from UC, UC without neoplasia, UC patients with neoplasia. And miR193a was found to participate in the suppression of UC related CRC via regulating IL17RD/EGFR (14). IL23 was found highly expressed in UC and UC-associated CRC (15). Na+/H + exchanger isoform 8 (NHE8) absence was reported to promote colitis associated CRC, which might be related to the increase of Wnt/β catenin activation and increase of Lgr5 expression in mice (16). In addition, intestinal microbiota were found to be involved in the development of IBD associated CRC in mouse model, like Bacteroides fragilis (17), histamineproducing Lactobacillus reuteri (18).
The specific mechanisms are still not clear, therefore, the prevention of IBD-associated CRC was based on anti-inflammation drugs, such as sulfasalazine, and 5-Aminosalicylic acid. Obviously, longterm use of drugs become a heavy burden for our society. Furthermore, most of the studies about the mechanisms involved in IBD associated CRC were based on animal model, which might be different from human being. Therefore, we screened different expressed molecules between sporadic CRC patients and UC-associated CRC patients based on public database gene expression omnibus (GEO), and aimed to find out biomarkers indicating the development of UC-associated CRC in patients with UC.

Methods
Microarray data. Two raw gene expression datasets [GSE37283, GSE3629] were downloaded from GEO public database. Two datasets were performed based on GPL13158 platform, Affymetrix Human Genome U 133 Plus PM Array Plate and GPL570, Affymetrix Human Genome U 133 Plus 2.0 Array, respectively. Microarray assay were conducted on RNA samples isolated from colon mucosa of the involved patients. 43 UC without cancer, and 6 UC-associated cancer were obtained from GSE3629 dataset, and 4 quiescent UC and 11 UC with neoplasia patients were from GSE37283. All the raw array data from the two above datasets were transferred to the gene expression matrix of the probe and merged via perl programming language, and then removed batch effects by performing background correction, quality control, and standardization through R package sva and limma.
Data processing and Recognition of DEGs. The DEGs of colon mucosa between UC without CRC patients and UC associated patients were screened out by R package limma. |log fold change| (|log FC|) > 4 and P < 0.001 were regarded having statistical significance. A heat map was built based on log2FC of the screened DEGs by R package pheatmap.
DEGs KEGG and GO enrichment analysis. To clarify the function of these differentially expressed genes, R package clusterProfiler is used to obtain Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional annotation information of these genes. P < 0.05 was chosen as cut-off value.
Protein-protein interact (PPI) network building. Search Tool for the Retrieval of Interacting Genes (STRING) (version10.5) database was used to construct PPI network. Figuring out the protein-protein functional interactions would help us to find possible mechanisms underlying the development of diseases, as described in previous study (29). In the current study, STRING database was used to construct PPI network of DEGs, and the combined score was set as more than 0.9. Then, we used Cytoscape (version 3.6.1) to construct protein-protein interaction networks. In addition, Molecular Complex Detection (MCODE) (version 1.5.1), as a plug-in of Cytoscape, was used to furthermore cluster dense areas in PPI network. MCODE scores was set as more than 5, the degree cut-off value was 2, the node score cut-off was set as 0.2, the Max depth was 100, and k-score was set as 2.
Hub genes identification, survival analysis and expression validation in Oncomine. The genes with degrees ≥ 25 were considered as hub genes. The biological process analysis of hub genes was carried out with Biological Networks Gene Oncology tool (BiNGO) (version 3.0.3). The survival curves between high and low gene expression were performed by log rank test based on TCGA program. The expression of three identified hub genes was validated based on Oncomine.

Statistical analysis
Quantitative data are shown as mean ± SEM. For quantitative data, t-test was used for statistical analysis. The analyses were performed with graphpad prism 8. Log rank test was used for survival analysis. P < 0.05 was considered statistically significant.

Results
Identification of DEGs. In total, 405 differentially expressed genes, including 149 up-regulated and 256 down-regulated, were identified, as shown in the volcano map (Fig. 1A). Furthermore, a heatmap was also used to visualize the top 120 DEGs according to |logFC| (Fig. 1B).
KEGG and GO enrichment analysis of DEGs. To further understand the function of identified DEGs, functional and pathway enrichment analysis was carried out via R package clusterProfiler. For GO enrichment analysis, the DEGs were primarily involved in 'structural constituent of ribosome', 'cell adhesion molecule binding', 'cadherin binding', 'amide binding', 'peptide binding', '5S rRNA binding', 'cell adhesion mediator activity', 'cell-cell adhesion mediator activity', and 'enzyme inhibitor activity' (Table 1; Fig. 2A). Moreover, for the KEGG pathway enrichment analysis, the DEGs were primarily enriched in 'antigen processing and presentation', 'neuroactive ligand-receptor interaction', 'ribosome', 'cell adhesion molecules', and 'N-Glycan biosynthesis' (Table 2; Fig. 2B).   (Fig. 3A). The most significant module was gained from PPI network with 57 nodes and 729 edges (Fig. 3B). A top of 16 hub genes, all of which are up-regulated genes in patients with UC associated cancer according to the mcode score, were selected from the PPI network (Fig. 3C), ribosomal protein S12 (RPS12), ribosomal protein S18 (RPS18), ribosomal protein S27α (RPS27A), ribosomal protein S5 (RPS5). Additionally, these hub genes were involved in 'macromolecule biosynthetic process', 'macromolecule metabolic process', 'cellular metabolic process', 'protein metabolic process', 'gene expression', and so on (Fig. 3D). To further evaluate the effect of those hub genes on survival of CRC patients, survival analysis was performed in those genes. Among them, RPL6, RPL7, and RPL35 were related to poor prognosis of patients. Patients with higher expression of RPL6, RPL7, and RPL35 had significantly poor overall survival rate compared to patients with lower expression level (P = 0.0127, 0.00876, and 0.00549, respectively) ( Fig. 4A-C). In addition, we validate the expression of the above three genes based on Oncomine database. RPL6 expression was significantly higher in colorectal carcinoma tissues than normal colon mucosa (p < 0.001) (Fig. 4D), and similar results were found in RPL35 and RPL7 expression based on Oncomine database (p < 0.001) (Fig. 4E-F).

Discussion
As we all known, UC is regarded as an independent factor of CRC. Hence, it's necessary to clarify the molecular and pathological mechanisms participated in UC-associated CRC. And effective bioinformatics analysis could help us to reach this purpose, as well as avoiding wasting of resources.
However, it is prone to higher false positive or negative rate based on one-single dataset analysis, and multi-center analysis could at some extent decrease this phenomenon. As a result, analysis based on two series from GEO datasets (GSE3629 and GSE37283) was performed in the current study to find out biomarkers indicating the process of UC-associated CRC. A total of 405 DEGs were identified, including 149 up-regulated and 256 down-regulated genes. To figure out the function of these DEGs, we performed KEGG and GO analysis. Functional analysis showed these DEGs were mainly involved in cell adhesion molecule binding. In addition, most of them were structural constituents of ribosome. Cell adhesion participates in a majority of physiological processes, for example, blood vessel endothelial cell-cell adhesion plays a key role in vascular integrity and barrier function, and controls the process of vasculogenesis, and angiogenesis (19), which is critical for the development of CRC (20).
Furthermore, 16 hub genes were screened via Cytoscape, including RPL10A, RPL18, ribosomal RPL19, RPL23A, RPL24, RPL27, RPL30, RPL34, RPL35, RPL6, RPL7, RPL9, RPS12, RPS18, RPS27A, and RPS5. All these hub genes are belonging to ribosomes, which are related to the development of various tumor types. The ribosomal protein gene 5 (RPL5) was reported to be significantly mutated in glioblastoma, melanoma and breast cancer samples as a haploinsufficient tumor suppressor (21). In addition, the expression of mitochondrial ribosome protein L35 (MRPL35) was found higher in colorectal cancer, and was associated with poor prognosis (22). Autophagy is considered as an important part in the development of CRC (23), and decreased expression of ribosomal protein S27 like (RPS27L) was found to induce autophagy in breast cancer via inactivating mTORC1 (24). Interestingly, RPS27A was screened in the present study, indicating it might participate in the development of UC-CRC via autophagy. Besides, we also analyzed the correlation between these genes and prognosis.
Overexpression of RPL6, RPL7 and RPL35 was found to be related to poor prognosis in patients with CRC, indicating the potential role of these three genes in the development of UC-CRC. To clarify the expression of these three genes, we analyzed the expression level of these hub genes in normal and colorectal cancer tissues base on Oncomine, and RPL6, RPL7 RPL35 were significantly high-expressed in CRC. In a recent study, RPL6 was identified via proteomic profiling, and was found significantly increased expression in human colorectal cancer tissue compared with adjacent normal tissue (25), which strongly supports our data. RPL35 of Bacillus subtilis was reported to play an essential role in cell proliferation and differentiation (26), suggesting its role in cancer cell proliferation. Interestingly, RPL35 was identified to predict pelvic lymph node metastasis in cervical carcinoma (27). Additionally, another study showed RPL7 was detected in the secretory granules of enterochromaffin cells, and was markedly increased in colorectal cancer cells (28), similar to the current finding. We demonstrated that the three genes may serve as potential markers for evaluating the development of UC-associated CRC.
Although we chose two datasets to perform analysis and aimed to avoid high false positive rate related to one single dataset analysis, there are still some limits in the current study. First, subgroup analysis based on disease degree, age and race is needed. Second, it is necessary to verify the effect of these genes on UC-CRC in large amounts of clinical sample. Last but not the least, imbalanced number of patients might affect the reliability of our results. Therefore, we verified our result in another database. Further studies are still needed to investigate the biological functions and understand the underlying molecular mechanism by which ribosomal proteins participate in UCassociated CRC.

Conclusions
Ribosomal proteins may play a crucial role during ulcerative colitis-related colorectal cancer. Among them, overexpressed RPL6, RPL7, and RPL 35 may be potential tumor oncogenes and have a strong relationship with overall survival in CRC, and may act as a prognostic factor in clinical diagnosis and treatment. In the current study, three genes were identified in UC-associated CRC that may help clinicians in predicting and planning therapy for patients with UC-associated CRC.

Declarations
Ethics approval and consent to participate: Not applicable.

Consent for publication:
Not applicable.
Availability of data and materials: The datasets analyzed during the current study are available in the GEO and Oncomine repository.