3.1 Batch Correction for Crohn's Disease and Cervical Cancer Data
We downloaded the platform and matrix files for Crohn's disease (GSE95095 and GSE186582) and cervical cancer (GSE63514 and GSE63678) from the NCBI GEO website. Batch correction was performed separately for each disease. Before correction, samples in the Crohn's disease datasets (Fig. 1A, B) and the cervical cancer datasets (Fig. 1C, D) clustered by dataset of origin rather than mixing across datasets, indicating a batch effect. After correction, the batch effect was effectively eliminated, and the corrected data were used for all subsequent analyses.
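The idea behind batch correction can be illustrated with a minimal sketch. The correction method is not named above, so the example uses simple per-gene mean centering as a stand-in; it is not ComBat or any other specific published procedure. The function name `center_batches`, the batch labels, and the expression values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_batches(values, batches):
    """Shift each batch so that its mean equals the overall mean (one gene)."""
    overall = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    # Remove the batch-specific offset, then restore the overall level.
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Hypothetical log2 expression of a single gene measured in two batches.
values = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
batches = ["batch_1"] * 3 + ["batch_2"] * 3
corrected = center_batches(values, batches)

# After centering, both batches share the same mean for this gene.
print(mean(corrected[:3]), mean(corrected[3:]))
```

Mean centering removes only additive shifts between batches; methods used in practice typically also adjust variance and protect the biological covariate of interest.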
3.2 Identification and Visualization of DEGs in Crohn's Disease and Cervical Cancer
We searched for significant DEGs between the disease and normal groups in the Crohn's disease and cervical cancer datasets. DEGs were filtered using cutoffs of |logFC| = 1 (corresponding to a two-fold difference) and P = 0.05, and the results were visualized as volcano plots (Fig. 2A, B). The analysis revealed 861 DEGs in Crohn's disease and 467 DEGs in cervical cancer. To identify DEGs shared by the two diseases, we created a Venn diagram (Fig. 2C), which showed that 60 genes were differentially expressed in both conditions. These genes were considered the core genes shared between Crohn's disease and cervical cancer.
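The filtering and intersection steps can be sketched as follows. This is an illustration only: the gene names and statistics are hypothetical placeholders, the function name `select_degs` is our own, and strict inequalities at the cutoffs are an assumption, since the text does not state whether the thresholds are inclusive.

```python
def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes passing the |logFC| and P value cutoffs.

    `results` maps gene name -> (logFC, P value). Strict inequalities
    are an assumption of this sketch.
    """
    return {
        gene
        for gene, (logfc, p_value) in results.items()
        if abs(logfc) > lfc_cutoff and p_value < p_cutoff
    }


# Hypothetical differential expression results for each disease.
crohn_results = {
    "GENE_A": (1.8, 0.001),
    "GENE_B": (-1.4, 0.020),
    "GENE_C": (0.6, 0.0004),  # significant, but the change is too small
    "GENE_D": (2.1, 0.300),   # large change, but not significant
}
cervical_results = {
    "GENE_A": (1.2, 0.010),
    "GENE_B": (0.3, 0.010),
    "GENE_E": (-2.0, 0.002),
}

crohn_degs = select_degs(crohn_results)
cervical_degs = select_degs(cervical_results)

# The Venn diagram overlap corresponds to a set intersection.
shared_degs = crohn_degs & cervical_degs
print(sorted(shared_degs))
```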
3.3 Functional Characterization of Core Genes
To analyze the biological functions and pathways of the core genes, the 60 shared DEGs were subjected to GO and KEGG pathway enrichment analysis. GO analysis showed that these genes were mainly enriched in the hormone metabolic process, regulation of angiogenesis, and regulation of vasculature development (Fig. 3A, B). For KEGG, the three significantly enriched pathways were sulfur metabolism, selenocompound metabolism, and nitrogen metabolism (Fig. 3D, E). Chord diagrams were also generated for the GO and KEGG enrichment results (Fig. 3C, F); in each diagram, the left semicircle represents the genes and the right semicircle represents the key terms or pathways in which those genes are enriched. Enrichment was most significant for the steroid hormone biosynthetic process and the hormone metabolic process (Fig. 3C), and the genes were most strongly associated with the sulfur metabolism pathway (Fig. 3F). These results suggest that these pathways are jointly involved in the development and progression of both Crohn's disease and cervical cancer.
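Over-representation analyses of this kind are commonly based on a hypergeometric test. The sketch below shows that calculation under stated assumptions: the enrichment tool and its settings are not given in the text, the background size, term size, and overlap are hypothetical numbers, and the function name `enrichment_pvalue` is our own. Only the list size of 60 is taken from the analysis above.

```python
from math import comb


def enrichment_pvalue(overlap, list_size, term_size, background):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of genes from a term of size `term_size` that land in
    a list of `list_size` genes drawn from `background` genes.
    """
    total = comb(background, list_size)
    upper = min(list_size, term_size)
    favourable = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical example: 3 of the 60 core genes fall in a 10-gene pathway,
# against an assumed background of 20000 annotated genes.
p_value = enrichment_pvalue(overlap=3, list_size=60, term_size=10, background=20000)
print(p_value)
```

In practice the resulting P values are adjusted for the number of terms tested before a term is called significant.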
3.4 Identification and Visualization of Modular Genes in Crohn's Disease and Cervical Cancer
To pinpoint key genes in Crohn's disease and cervical cancer, we performed WGCNA. Module-trait relationships were visualized as correlation heatmaps, and the relationship between each module and the disease was assessed using Spearman's correlation coefficient and the P value of the correlation test; a smaller P value indicated a stronger association between the genes in the module and the disease. In Crohn's disease, the ME-yellow module showed the strongest disease correlation (r = -0.15, P = 4e-04) (Fig. 4A-C) and was designated the key module. Similarly, in cervical cancer, the ME-yellow module showed the strongest disease correlation (r = -0.61, P = 9e-17) (Fig. 4D-F) and was identified as the key module. We then intersected the genes from the two key modules and displayed the overlap in a Venn diagram (Fig. 4G). This analysis revealed 11 genes common to the key modules of both diseases, which were selected for subsequent analysis.
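The module-trait statistic can be illustrated with a short, dependency-free sketch of Spearman's correlation, computed as the Pearson correlation of average ranks. The eigengene values and the 0/1 disease labels are hypothetical, and the helper names `average_ranks` and `spearman` are our own; this is not the WGCNA implementation.

```python
from statistics import mean


def average_ranks(xs):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical module eigengene per sample, and disease status (1 = disease).
eigengene = [0.30, 0.25, 0.10, -0.05, -0.20, -0.40]
disease = [0, 0, 0, 1, 1, 1]
rho = spearman(eigengene, disease)
print(round(rho, 3))
```

A negative coefficient, as reported for the ME-yellow modules, means the module eigengene tends to be lower in disease samples than in normal samples.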
3.5 Machine Learning Based Screening of Potential Biomarkers Co-Expressed in Crohn's Disease and Cervical Cancer
To further identify potential biomarkers shared by Crohn's disease and cervical cancer, we employed two machine learning algorithms: LASSO regression and Random Forest (RF). First, we applied LASSO regression to screen the module genes while limiting overfitting. This analysis identified 11 genes (CA12, CLEC2B, CLEC5A, CXCR4, FAM49A, HAS2, MMP3, NCF2, NPL, PLAU, and QPCT) as potential diagnostic biomarkers (Fig. 5A, B). Next, we used the RF algorithm to select 19 genes with importance scores >2 from the DEGs (CXCR4, ODF3L2, COL17A1, PRRG2, HOXC6, PTK6, CPA6, SPINK5, DUSP2, ZNF844, VSIG2, PLA2G4F, PROM2, PIM2, SHROOM3, CEACAM7, GJB2, HIST1H1C, and CA4) (Fig. 5C). Finally, we used a Venn diagram to visualize the overlap between the two gene sets, which consisted of a single gene, CXCR4 (Fig. 5D). CXCR4 was therefore considered a potential biomarker shared by Crohn's disease and cervical cancer.
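The final intersection step can be reproduced directly from the two gene lists reported above. The sketch covers only the overlap; it does not refit the LASSO or RF models, and the variable names are our own.

```python
# Genes retained by LASSO regression, as listed in the text.
lasso_genes = {
    "CA12", "CLEC2B", "CLEC5A", "CXCR4", "FAM49A", "HAS2",
    "MMP3", "NCF2", "NPL", "PLAU", "QPCT",
}

# Genes with RF importance scores > 2, as listed in the text.
rf_genes = {
    "CXCR4", "ODF3L2", "COL17A1", "PRRG2", "HOXC6", "PTK6", "CPA6",
    "SPINK5", "DUSP2", "ZNF844", "VSIG2", "PLA2G4F", "PROM2", "PIM2",
    "SHROOM3", "CEACAM7", "GJB2", "HIST1H1C", "CA4",
}

# The Venn diagram overlap corresponds to a set intersection.
shared_biomarkers = lasso_genes & rf_genes
print(len(lasso_genes), len(rf_genes), sorted(shared_biomarkers))
```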
3.6 Validating the Expression Levels of Core Genes
To validate the expression levels of CXCR4 in Crohn's disease and cervical cancer, we used the "limma" and "ggpubr" packages in R and plotted the results as violin plots with the "ggviolin" function. For Crohn's disease, blue represents normal samples and red represents Crohn's disease samples (Fig. 6A). For cervical cancer, samples were divided into three groups: blue for normal cervical epithelium, red for cervical intraepithelial neoplasia, and green for cervical squamous cell carcinoma (Fig. 6B). The analysis revealed significant up-regulation of CXCR4 expression in both Crohn's disease and cervical cancer compared with normal samples. This suggests that CXCR4 is a high-risk gene in both diseases and warrants further attention.
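A two-group comparison of this kind can be sketched with an exact permutation test on the difference in group means. This is an assumption for illustration: the statistical test behind the violin plots is not stated in the text, the expression values are hypothetical, and the function name `permutation_pvalue` is our own.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_total = len(group_a), len(pooled)
    total_sum = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    hits = 0
    count = 0
    # Enumerate every way of relabelling n_a samples as "group A".
    for idx in combinations(range(n_total), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total_sum - a_sum) / (n_total - n_a))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return hits / count


# Hypothetical log2 CXCR4 expression in normal and disease samples.
normal = [4.1, 4.3, 3.9, 4.0, 4.2]
disease = [5.6, 5.9, 5.4, 6.1, 5.8]
p_value = permutation_pvalue(normal, disease)
print(p_value)
```

Exhaustive enumeration is only practical for small groups; for cohorts of realistic size, a random subset of relabellings or a rank-based test would be used instead.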
3.7 Prediction of Transcription Factors (TF)
To predict the transcription factors that regulate the core gene CXCR4, we queried the TRRUST transcription factor enrichment analysis website (http://www.grnpedia.org/trrust/). The analysis identified MYC and VHL as potential regulators of CXCR4 (Fig. 7). MYC, in particular, is a broadly acting transcription factor with the ability to regulate CXCR4 and warrants further exploration.
3.8 Expression of CXCR4 in Cervical Cancer and Adjacent Para-Cancerous Tissues
The combined bioinformatics analysis showed that CXCR4 was abnormally highly expressed in cervical cancer. To validate this finding, we performed immunohistochemical staining, and images were captured using an AE2000 microscope at 10×200 magnification (Fig. 8A). Quantification was performed using ImageJ software, and the data were analyzed with GraphPad Prism software. CXCR4 expression was significantly higher in cervical cancer tissues than in adjacent para-cancerous tissues (P < 0.05) (Fig. 8B). These results were consistent with the bioinformatics analysis and provided further evidence supporting CXCR4 as a potential tumor marker for cervical cancer associated with Crohn's disease.
3.9 Validation of CXCR4 Expression in Cervical Cancer Cells (HeLa) Versus Human Vaginal Epithelial Cells (HESC) Using Western Blot Assay
To verify CXCR4 expression at the cellular level, HeLa and HESC cells were cultured, and their proteins were extracted and analyzed by Western blot. The grayscale values of the bands were quantified using ImageJ software, and the experimental data were organized using GraphPad Prism software. CXCR4 was highly expressed in HeLa cells and expressed at a lower level in HESC cells (P < 0.01) (Fig. 9A, B). These results validated the earlier findings at the cellular level and indicate that CXCR4 should be closely monitored as a risk protein for cervical cancer associated with Crohn's disease.
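Band quantification of this kind is often summarized as a ratio to a loading control. The sketch below shows that arithmetic under explicit assumptions: normalization to a loading control is not described in the text, all grayscale values are hypothetical, and the function name `relative_expression` is our own.

```python
from statistics import mean


def relative_expression(target_bands, control_bands):
    """Divide each target band intensity by its lane's loading control."""
    return [t / c for t, c in zip(target_bands, control_bands)]


# Hypothetical grayscale values from three replicate lanes per cell line.
hela_cxcr4 = [1200.0, 1100.0, 1300.0]
hela_control = [1000.0, 1000.0, 1000.0]
hesc_cxcr4 = [400.0, 500.0, 600.0]
hesc_control = [1000.0, 1000.0, 1000.0]

hela_ratio = relative_expression(hela_cxcr4, hela_control)
hesc_ratio = relative_expression(hesc_cxcr4, hesc_control)

# Fold difference of mean normalized CXCR4 signal, HeLa relative to HESC.
fold_difference = mean(hela_ratio) / mean(hesc_ratio)
print(round(fold_difference, 2))
```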