Identification of DEGs in colorectal cancer
Via GEO2R online tools, DEGs in three datasets (1357 DEGs in GSE35279, 1493 DEGs in GSE21815, and 1884 DEGs in GSE71187, respectively) were extracted after gene expression profile data processing and standardization with the cutoff standard of P value < 0.05 and |logFC|>2. The volcano plot and venn diagram were drawn with the “ggplot2” package in R 4.0.4. The overlapping DEGs among these three datasets contained 413 genes as shown in the Venn diagram (Fig. 2).
Enrichment analysis for DEGs
To elucidate the biological functions of the overlapping DEGs, we performed functional annotation and pathway enrichment analysis via WebGestalt online tool. Results indicated that the overlapping DEGs in biological process of GO enrichment were markedly associated with tissue development, tube morphogenesis, tissue morphogenesis, tube development, skeletal system development, anatomical structure formation involved in morphogenesis, animal organ morphogenesis, circulatory system development, cell proliferation and epithelium development (Fig. 3A). As for molecular function of GO enrichment, DEGs were remarkably related to receptor regulator activity, receptor ligand activity, RNA polymerase II regulatory region sequence-specific DNA binding, RNA polymerase II regulatory region DNA binding, extracellular matrix structural constituent, transcription regulatory region sequence-specific DNA binding, sequence-specific double-stranded DNA binding, extracellular matrix structural constituent conferring tensile strength, hormone activity, transcription regulatory region DNA binding(Fig. 3B). In addition to cellular component, collagen-containing, extracellular matrix, basement membrane, extracellular matrix component, complex of collagen trimers, basolateral plasma membrane, plasma membrane region, endoplasmic reticulum lumen supramolecular polymer, supramolecular complex(Fig. 3C). Besides, signaling pathway analysis of KEGG demonstrated that those DEGs played pivotal roles in Wnt signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Central carbon metabolism in cancer, Biosynthesis of amino acids, Breast cancer, Malaria, Gastric cancer, ECM-receptor interaction, TGF-beta signaling pathway, MicroRNAs in cancer (Fig. 3D).
PPI network construction and significant module identification
To evaluate the interrelationships among these DEGs, we first drew the network of DEGs in STRING, we set the maximum number of interacting bodies to 0 and used a confidence score of 0.7 as the cut-off criterion, the PPI network included 388 nodes and 334 edges(Fig. 4A). Additionally, the Molecular Complex Detection (MCODE) app was also employed to select modules of the PPI network in Cytoscape according to node score cut-off = 0.2, degree cut-off = 2, max.depth = 100, and k − core = 2. The top three functional clusters of modules were selected (module 1, MCODE score = 12.000; module 2, MCODE score = 6.333; module 3, MCODE score = 6.167) (Fig. 4B-4D ).
KEGG pathway analysis of each module was performed by Enrichr (https://maayanlab.cloud/Enrichr/) online tool. The KEGG pathway analysis of module 1 indicated that these genes were involved in the Neuroactive ligand-receptor interaction, Chemokine signaling pathway, and Cytokine-cytokine receptor interaction; genes in module 2 enriched in Progesterone-mediated oocyte maturation and cell cycle; and genes in module 3 related to the Protein digestion and absorption, ECM-receptor interaction, and carbon metabolism.
Hub genes selection and validation
Based on querying STRING protein information from the public database, we constructed a PPI network of the top 20 hub genes according to the maximal clique centrality (MCC). The top 20 hub genes with maximal clique centrality were as follows: SST, PPBP, SPP1, GCG, INSL5, CXCL8, EDN3, CXCL11, PYY, P2RY1, HTR1D, GPR4, SAA1, IL6, PLCB1, APLN, GRP, ADRA2C, GNGT1, GAL(Fig. 5). Then, mRNA expressions of hub genes in 20 types of cancers were measured and compared to normal tissues by ONCOMINE database, and 15 hub genes were found to be over-expression in colorectal cancer (Fig. 6).
We used GEPIA2 online tool to identify the survival data of 20 hub genes and found that the expression levels of three hub genes, namely SPP1, GRP and GNGT1, were remarkably related to the survival of colon adenocarcinoma(COAD) patients(Fig. 7,8). Furthermore, we validated the expression level of these three hub genes in COAD and normal tissues via TCGA-COAD dataset, and explored their diagnostic value as well as clinical significance. The results showed that expression levels of the three genes were significantly increased in the COAD samples and the over-expression of these three genes was correlated with the TMN stage of COAD patients (Fig. 9–10).