1. Microarray data normalization and identification of DEGs
The microarray expression datasets GSE44904 and GSE43338 were normalized, and the results are shown in Fig. 1. DEGs were screened with the limma R package (adjusted p < 0.05 and |log fold change (FC)| > 2). First, the different groups in GSE44904 were compared, and Venn analysis of these comparisons yielded a total of 1063 DEGs, including 503 up-regulated and 560 down-regulated genes (Fig. 2g). A total of 905 DEGs, including 496 up-regulated and 409 down-regulated genes, were screened from GSE43338. Volcano plots of the comparisons are shown in Fig. 2a, 2b, 2c and 2d, and heat maps of the top 100 DEGs are shown in Fig. 2e and 2f. A second Venn analysis of the DEGs from the two datasets identified 275 overlapping genes, including 103 up-regulated and 172 down-regulated genes (Fig. 2h).
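The screening rule above (adjusted p < 0.05 and |logFC| > 2, then overlap between datasets) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it starts from p-values that limma would already have computed, assumes Benjamini-Hochberg adjustment (limma's `topTable` default) and a log2 fold change, and assumes that an "overlapping" DEG must change in the same direction in both datasets. Gene names and values are hypothetical toy data.

```python
"""Sketch of DEG screening and cross-dataset overlap (hypothetical toy data)."""


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(genes, log_fc, pvalues, p_cut=0.05, fc_cut=2.0):
    """Split genes into up- and down-regulated sets by adjusted p and |logFC|."""
    adj = bh_adjust(pvalues)
    up, down = set(), set()
    for gene, lfc, q in zip(genes, log_fc, adj):
        if q < p_cut and abs(lfc) > fc_cut:
            (up if lfc > 0 else down).add(gene)
    return up, down


def overlap(degs_a, degs_b):
    """Genes called in both datasets with the same direction (an assumption)."""
    up_a, down_a = degs_a
    up_b, down_b = degs_b
    return up_a & up_b, down_a & down_b


# Hypothetical example: four genes measured in two datasets.
genes = ["G1", "G2", "G3", "G4"]
set_a = classify_degs(genes, [3.1, -2.5, 0.5, 4.0], [0.001, 0.002, 0.0001, 0.2])
set_b = classify_degs(genes, [2.8, -3.0, 2.2, -2.1], [0.01, 0.03, 0.5, 0.001])
shared_up, shared_down = overlap(set_a, set_b)
print(sorted(shared_up), sorted(shared_down))
```

In the toy data, G3 fails the fold-change cut in dataset A and G4 fails the adjusted-p cut there, so only G1 (up) and G2 (down) survive the intersection.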
2. GO functional and KEGG pathway enrichment analysis
To annotate the functions of the selected DEGs, GO enrichment analysis was performed in DAVID for three categories: biological process (BP), molecular function (MF) and cellular component (CC). Terms with P < 0.05 were considered statistically significant, and the results for the three categories are shown in Fig. 3c. The enriched BP terms mainly included positive regulation of transcription from RNA polymerase II promoter; oxidation-reduction process; negative regulation of transcription from RNA polymerase II promoter; negative regulation of cell proliferation; positive regulation of transcription, DNA-templated; cell proliferation; transport; inflammatory response; negative regulation of transcription, DNA-templated; and cell adhesion. The enriched CC terms mainly included extracellular space, plasma membrane, extracellular exosome, extracellular region, integral component of plasma membrane, endoplasmic reticulum membrane, Golgi apparatus and endoplasmic reticulum. The enriched MF terms mainly included hormone activity, transporter activity, calcium ion binding, receptor binding, heparin binding and oxidoreductase activity. To further characterize the pathways associated with the DEGs, we then performed KEGG pathway enrichment analysis. As shown in Fig. 3e, the DEGs were mainly enriched in ovarian steroidogenesis, fat digestion and absorption, metabolic pathways, vitamin digestion and absorption, signaling pathways regulating pluripotency of stem cells, arachidonic acid metabolism, FoxO signaling pathway, aldosterone-regulated sodium reabsorption, bile secretion, PI3K-Akt signaling pathway, pathways in cancer and ether lipid metabolism.
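GO and KEGG enrichment of this kind is an over-representation test: for each term, it asks whether more DEGs carry the annotation than expected by chance given the background. The sketch below uses a plain one-sided hypergeometric test with the P < 0.05 cutoff from the text. It is an assumption-labeled simplification: DAVID itself reports the EASE score, a modified (more conservative) Fisher exact p-value, so numbers from this sketch will not match DAVID output exactly. The term counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: DEGs annotated to the term; n: DEGs in the list;
    K: background genes annotated to the term; N: background size.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes in the list.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def significant_terms(terms, n, N, alpha=0.05):
    """Return (term, p) pairs with p < alpha, sorted by p-value.

    terms maps a term name to (k, K) as defined in enrichment_p.
    """
    results = [(name, enrichment_p(k, n, K, N)) for name, (k, K) in terms.items()]
    return sorted([r for r in results if r[1] < alpha], key=lambda r: r[1])


# Hypothetical counts: 5 DEGs drawn from a background of 10 genes.
toy_terms = {"term A": (5, 5), "term B": (0, 5)}
print(significant_terms(toy_terms, n=5, N=10))
```

In real use, N would be the number of annotated genes on the array (or genome) and n the 275 overlapping DEGs; multiple-testing correction is left out of this sketch.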
3. PPI network and modularization analysis of DEGs
The 275 overlapping DEGs were analyzed with the STRING online database to construct the PPI network (Fig. 3a), which was then analyzed in Cytoscape. The degree of each DEG was calculated, and the 11 genes with the highest degree were selected as hub genes (Fig. 4a): IGF1, BMP4, SPP1, APOB, CCND1, CD44, PTGS2, CFTR, BMP2, KLF4 and TLR2. Detailed information on the hub genes, including gene symbol, degree, full name and gene function, is given in Table 1. Next, modular analysis was performed with the MCODE plugin in Cytoscape, and a high-scoring submodule (score = 9) was selected. Its genes were SPP1, TGOLN2, APOB, FSTL1, LAMB1, LAMC1, CHGB, BMP4 and CYR61 (Fig. 3b). GO function and KEGG pathway enrichment analyses of all genes in the module were performed with the DAVID database, and the GO results for the submodule genes are shown in Fig. 3d. The BP terms mainly included extracellular matrix organization, cell adhesion, positive regulation of epithelial cell proliferation and positive regulation of cell migration. The CC terms mainly included extracellular region, extracellular space and extracellular exosome. The MF terms mainly included heparin binding and extracellular matrix binding. KEGG pathway analysis showed that the module genes were mainly enriched in ECM-receptor interaction, focal adhesion, PI3K-Akt signaling pathway, pathways in cancer and small cell lung cancer (Fig. 3f).
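Degree-based hub selection, as described above, counts how many distinct interaction partners each protein has in the PPI network and keeps the top-ranked genes. The sketch assumes the network is available as an undirected edge list (for example, exported from STRING); the edges below are hypothetical and do not reproduce the real network. The text does not say how ties at the cutoff were handled, so the alphabetical tie-break here is an assumption; it only matters when genes with equal degree straddle the cutoff.

```python
from collections import defaultdict


def node_degrees(edges):
    """Degree of each node in an undirected edge list (duplicate edges ignored)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # skip self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def top_hubs(edges, k):
    """Top-k nodes by degree; ties broken alphabetically (an assumption)."""
    degrees = node_degrees(edges)
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy network; the reversed BMP4-IGF1 edge is a duplicate.
toy_edges = [
    ("IGF1", "BMP4"),
    ("IGF1", "SPP1"),
    ("IGF1", "CCND1"),
    ("BMP4", "BMP2"),
    ("SPP1", "CD44"),
    ("BMP4", "IGF1"),
]
print(top_hubs(toy_edges, 3))
```

Using neighbor sets rather than raw edge counts keeps a reciprocal or repeated interaction from inflating a gene's degree.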
Table 1
Hub genes identified from the PPI network. The PPI network was analyzed in Cytoscape, the degree of each DEG was calculated, and the 11 genes with the highest degree were selected as hub genes: IGF1, BMP4, SPP1, APOB, CCND1, CD44, PTGS2, CFTR, BMP2, KLF4 and TLR2. The gene symbol, degree, full name and gene function of each hub gene are listed.
| Gene symbol | Degree | Full name | Gene function |
|---|---|---|---|
| IGF1 | 24 | Insulin Like Growth Factor 1 | The protein is a member of a family of proteins involved in mediating growth and development |
| BMP4 | 23 | Bone Morphogenetic Protein 4 | The encoded protein may be involved in the pathology of multiple cardiovascular diseases and human cancers |
| SPP1 | 22 | Secreted Phosphoprotein 1 | This protein is a cytokine that upregulates expression of interferon-gamma and interleukin-12 |
| APOB | 22 | Apolipoprotein B | The protein is associated with diseases affecting plasma cholesterol and apolipoprotein B levels |
| CCND1 | 20 | Cyclin D1 | This protein regulates cell cycle progression; alterations of this gene are observed frequently in a variety of human cancers |
| CD44 | 18 | CD44 Molecule | This protein participates in a wide variety of cellular functions, including lymphocyte activation, recirculation and homing, hematopoiesis, and tumor metastasis |
| PTGS2 | 18 | Prostaglandin-Endoperoxide Synthase 2 | This protein is responsible for the prostanoid biosynthesis involved in inflammation and mitogenesis |
| CFTR | 16 | CF Transmembrane Conductance Regulator | The encoded protein acts as a chloride channel and controls ion and water secretion and absorption in epithelial tissues |
| BMP2 | 16 | Bone Morphogenetic Protein 2 | This protein plays a role in bone and cartilage development |
| KLF4 | 14 | Kruppel Like Factor 4 | This protein controls the G1-to-S transition of the cell cycle following DNA damage by mediating the tumor suppressor gene p53 |
| TLR2 | 14 | Toll Like Receptor 2 | The protein regulates host inflammation and promotes apoptosis in response to bacterial lipoproteins |
4. KEGG pathway analysis of hub genes
To identify the pathways enriched among the hub genes, KEGG pathway analysis of the hub genes was performed with DAVID. As shown in Fig. 4b, the hub genes were mainly enriched in signaling pathways regulating pluripotency of stem cells, pathways in cancer, proteoglycans in cancer, AMPK signaling pathway, PI3K-Akt signaling pathway, Hippo signaling pathway and focal adhesion. The Sankey diagram (Fig. 4c) shows the distribution of hub genes across these pathways: signaling pathways regulating pluripotency of stem cells (enriched genes: IGF1, BMP4, BMP2, KLF4; p = 0.0015), pathways in cancer (enriched genes: BMP4, BMP2, CCND1, IGF1, PTGS2; p = 0.0035), proteoglycans in cancer (enriched genes: CCND1, IGF1, CD44, TLR2; p = 0.0043), AMPK signaling pathway (enriched genes: CCND1, IGF1, CFTR; p = 0.0186), PI3K-Akt signaling pathway (enriched genes: CCND1, SPP1, IGF1, TLR2; p = 0.0196), Hippo signaling pathway (enriched genes: BMP4, BMP2, CCND1; p = 0.0273) and focal adhesion (enriched genes: CCND1, SPP1, IGF1; p = 0.0483).
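A Sankey diagram of this kind is drawn from a list of source-target-weight links. The sketch below builds those links from the gene lists reported for each pathway and counts how many enriched pathways each hub gene falls into. The plotting tool used for Fig. 4c is not stated, and the weight of 1 per gene-pathway membership is an assumption; the resulting link list can be passed to any Sankey plotting library after mapping names to node indices.

```python
from collections import Counter

# Gene sets per KEGG pathway, as reported in the text.
PATHWAYS = {
    "Signaling pathways regulating pluripotency of stem cells": ["IGF1", "BMP4", "BMP2", "KLF4"],
    "Pathways in cancer": ["BMP4", "BMP2", "CCND1", "IGF1", "PTGS2"],
    "Proteoglycans in cancer": ["CCND1", "IGF1", "CD44", "TLR2"],
    "AMPK signaling pathway": ["CCND1", "IGF1", "CFTR"],
    "PI3K-Akt signaling pathway": ["CCND1", "SPP1", "IGF1", "TLR2"],
    "Hippo signaling pathway": ["BMP4", "BMP2", "CCND1"],
    "Focal adhesion": ["CCND1", "SPP1", "IGF1"],
}


def sankey_links(pathways):
    """(gene, pathway, weight) links; each membership gets weight 1 (an assumption)."""
    return [(gene, pathway, 1) for pathway, genes in pathways.items() for gene in genes]


def pathway_counts(pathways):
    """Number of enriched pathways each hub gene participates in."""
    return dict(Counter(gene for genes in pathways.values() for gene in genes))


links = sankey_links(PATHWAYS)
counts = pathway_counts(PATHWAYS)
print(len(links), counts["CCND1"], counts["IGF1"])
```

Counting memberships this way makes it easy to see which hub genes (here CCND1 and IGF1, each in six of the seven pathways) connect most of the enriched pathways, and which (such as APOB) appear in none.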
5. Analysis of the transcription factor regulatory network of hub genes
To identify the transcriptional regulators of the hub genes and evaluate the effect of TFs on hub gene expression, a TF-gene regulatory network was constructed from the JASPAR database on the NetworkAnalyst platform. Fig. 4d shows the TFs that can regulate two or more hub genes. Besides the hub genes, the regulatory network contains 46 TFs and 86 TF-gene interaction pairs. Among the predicted TFs, FOXC1 was considered the core TF because it can regulate multiple hub genes: SPP1, IGF1, BMP4, TLR2, CD44, KLF4 and CFTR. The correlation analysis is shown in Fig. 5: the expression of the up-regulated genes SPP1, IGF1, BMP4, TLR2 and CD44 was positively correlated with FOXC1 expression, whereas the expression of the down-regulated genes KLF4 and CFTR was negatively correlated with FOXC1 expression.
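The two steps described above, keeping TFs that regulate at least two hub genes and checking the sign of the TF-target expression correlation, can be sketched as follows. The text does not state which correlation coefficient was used for Fig. 5; Pearson correlation is an assumption here (Spearman is an equally common choice). `TF_X` and all expression values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def tfs_with_min_targets(pairs, min_targets=2):
    """TFs regulating at least min_targets distinct genes, from (tf, gene) pairs."""
    targets = defaultdict(set)
    for tf, gene in pairs:
        targets[tf].add(gene)
    return {tf: genes for tf, genes in targets.items() if len(genes) >= min_targets}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical TF-gene pairs; TF_X regulates only one gene and is dropped.
toy_pairs = [("FOXC1", "SPP1"), ("FOXC1", "KLF4"), ("TF_X", "CD44")]
kept = tfs_with_min_targets(toy_pairs)

# Hypothetical expression across four samples.
foxc1 = [1.0, 2.0, 3.0, 4.0]
spp1 = [2.0, 4.0, 6.0, 8.0]   # rises with FOXC1
klf4 = [4.0, 3.0, 2.0, 1.0]   # falls as FOXC1 rises
print(kept, pearson(foxc1, spp1), pearson(foxc1, klf4))
```

In practice the correlation would be computed across samples of the expression dataset and reported with a p-value; this sketch only shows the coefficient whose sign distinguishes the positive and negative relationships described above.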
To further investigate some of the up-regulated genes, we constructed a gene-miRNA interaction network with TarBase v8.0 in NetworkAnalyst 3.0. The network comprised 5 genes, 259 miRNAs and 322 gene-miRNA pairs, and the miRNAs that regulate more than three genes are shown in Fig. 4e. Some miRNAs were found to play an important role in regulating the hub genes: hsa-miR-16-5p and hsa-miR-27a-3p were predicted to regulate CCND1, CD44, IGF1 and SPP1, and hsa-miR-129-2-3p was predicted to regulate CCND1, CD44, SPP1 and TLR2.
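Summarizing a gene-miRNA network and picking out miRNAs that target more than three genes can be sketched as below. The pairs for the three named miRNAs are taken from the text; `hsa-miR-example` is a hypothetical entry added only to show the filter dropping a miRNA with fewer targets. The sketch counts distinct pairs, genes and miRNAs in the same way as the network size reported above, but it does not reproduce the full TarBase network.

```python
from collections import defaultdict

# (gene, miRNA) pairs: the first three miRNAs are from the text,
# "hsa-miR-example" is hypothetical.
PAIRS = (
    [(g, "hsa-miR-16-5p") for g in ("CCND1", "CD44", "IGF1", "SPP1")]
    + [(g, "hsa-miR-27a-3p") for g in ("CCND1", "CD44", "IGF1", "SPP1")]
    + [(g, "hsa-miR-129-2-3p") for g in ("CCND1", "CD44", "SPP1", "TLR2")]
    + [(g, "hsa-miR-example") for g in ("CCND1", "TLR2")]
)


def network_summary(pairs):
    """(number of genes, number of miRNAs, number of distinct gene-miRNA pairs)."""
    unique = set(pairs)
    return len({g for g, _ in unique}), len({m for _, m in unique}), len(unique)


def hub_mirnas(pairs, more_than=3):
    """miRNAs targeting more than `more_than` distinct genes, with sorted targets."""
    targets = defaultdict(set)
    for gene, mirna in pairs:
        targets[mirna].add(gene)
    return {m: sorted(genes) for m, genes in targets.items() if len(genes) > more_than}


print(network_summary(PAIRS))
print(hub_mirnas(PAIRS))
```

"More than three genes" is implemented as a strict inequality, so each retained miRNA here has four targets.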
6. Survival analysis of hub genes in colon cancer
Because CAC is an etiological subtype of colon cancer, we used colon cancer data from the TCGA database to analyze the association between hub gene expression and patient survival; the results are shown in Fig. 6. For each hub gene, patients were divided into high- and low-expression groups, and survival was compared between the two groups. Among the 11 hub genes, three were associated with the survival of colon cancer patients: SPP1 (p = 0.019), CFTR (p = 0.031) and KLF4 (p = 0.048).
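A high- versus low-expression survival comparison is commonly done by splitting patients at a cutoff and applying a log-rank test. The text states neither the cutoff nor the test, so the median split and the two-group log-rank test below are assumptions, sketched in pure Python on hypothetical follow-up data. The p-value uses the chi-square distribution with 1 degree of freedom, for which P(X > x) = erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt
from statistics import median


def split_by_median(expression):
    """1 = high expression (above the median), 0 = low; the cutoff is an assumption."""
    cut = median(expression)
    return [1 if value > cut else 0 for value in expression]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: 0/1 labels, e.g. from split_by_median().
    """
    observed1 = expected1 = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n  # expected deaths in group 1 under H0
        if n > 1:  # hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; test undefined")
    chi2 = (observed1 - expected1) ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical cohort: the high-expression group survives longer.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = split_by_median([0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5])
print(logrank_test(times, events, groups))
```

Applied to real TCGA data, the same function would be called once per hub gene, giving one p-value per gene, as with the values reported for SPP1, CFTR and KLF4.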