Identification of DEGs
Two microarray datasets with 170 colorectal cancer tissues and 17 normal tissues, and TCGA dataset with 482 CC tissue and 42 normal tissues were included in this study. According to our screen criteria, 651, 3047, and 13048 DEGs were pick out from GSE5206, GSE9348, and TCGA datasets, respectively. Subsequently, we applied the VennDiagram package of R to acquire an intersection for the DEGs. As a result, 501 DEGs were identified, including 139 up-regulated and 362 down-regulated genes (Figure 1).
Functional enrichment analysis of DEGs
All the DEGs were imported to the online tool DAVID, and GO categories and KEGG pathways were analyzed to identify potential gene functional annotation and pathway enrichment. The top 20 GO terms and pathways were present as bubble graphics using R package of ggplot2. The KEGG pathways were shown in Figure 2A, some signaling transduction and cancer developing related signaling pathways were involved, including cytokine-cytokine receptor interaction, ECM-receptor interaction, chemical carcinogenesis and PI3K-Akt signaling pathways (P<0.05). GO terms were shown in Figure 2B, the results showed that DEGs were mainly enriched in collagen catabolic process, extracellular matrix organization and negative regulation of growth in biological processes (BP); extracellular space, proteinaceous extracellular matrix and extracellular region in cell component (CC); chemokine activity, extracellular matrix structural constituent, and extracellular matrix binding in molecular function (MF).
PPIN construction and hub gene identification.
We utilized the STRING database to explore the interactions among the 501 DEGs. The interaction score ≥0.4 was set as the threshold for the network. The results showed that 494 nodes (genes) and 1586 edges (interactions) were included in the constructed PPI network, with an average node degree 6.42, enrichment p-value< 1.0E-16. (Figure 3). Network analysis using cytoscape indicated that VEGFA has an interaction degree of 53, the highest of all. Besides, we enrolled the genes with connectivity degree > 15 as hub genes. As a result, a total of 50 genes were included. The detailed data about these hub genes were list in Table 1. Furthermore, top 3 significant functional modules with score >6 were screened using MOCDE (Figure 4A- Figure 4C). KEGG analysis of the three modules revealed that the significantly enriched pathways were mainly correlated with ECM-receptor interaction, protein digestion and absorption, chemokine signaling pathway, bladder cancer, focal adhesion, PI3K-Akt signaling pathway, tight junction, leukocyte transendothelial migration, hepatitis C (Figure 4D- Figure 4F).
Table 1
Description of the 50 hub genes
Gene
|
Connective degree
|
Expression
|
Full name
|
VEGFA
|
53
|
Up
|
Vascular endothelial growth factor A
|
MYC
|
47
|
Up
|
Myc proto-oncogene
|
CXCL8
|
44
|
Up
|
Chemokine (C-X-C motif) ligand 8
|
COL1A1
|
43
|
Up
|
Collagen alpha-1
|
CXCL12
|
41
|
Down
|
Chemokine (C-X-C motif) ligand 12
|
CCND1
|
39
|
Up
|
G1/S-specific cyclin-D1
|
TIMP1
|
37
|
Up
|
Metalloproteinase inhibitor 1
|
MMP2
|
35
|
Down
|
Matrix metalloproteinase-2
|
MMP3
|
34
|
Up
|
Matrix metalloproteinase-3
|
SPP1
|
34
|
Up
|
Secreted phosphoprotein 1
|
SOX9
|
33
|
Up
|
SRY (sex determining region Y)-box 9
|
COL1A2
|
33
|
Up
|
Collagen alpha-2
|
BGN
|
30
|
Up
|
Biglycan
|
THY1
|
29
|
Up
|
Thy-1 cell surface antigen
|
BMP2
|
29
|
Down
|
Bone morphogenetic protein 2
|
MMP1
|
29
|
Up
|
Matrix metalloproteinase-1
|
KIT
|
26
|
Down
|
Mast/stem cell growth factor receptor Kit
|
CXCL1
|
26
|
Up
|
Chemokine (C-X-C motif) ligand 1
|
SPARC
|
26
|
Up
|
Secreted protein, acidic, cysteine-rich (osteonectin)
|
CD19
|
24
|
Down
|
CD19 molecule
|
VCAN
|
24
|
Up
|
Versican
|
THBS2
|
24
|
Up
|
Thrombospondin 2
|
HGF
|
23
|
Down
|
Hepatocyte growth factor
|
SLC26A3
|
21
|
Down
|
Solute carrier family 26, member 3
|
MMP7
|
21
|
Up
|
Matrix metalloproteinase-7
|
HPGDS
|
20
|
Down
|
Hematopoietic prostaglandin D synthase
|
COL11A1
|
20
|
Up
|
Collagen, type XI, alpha 1
|
COL5A2
|
20
|
Up
|
Collagen, type V, alpha 2
|
FABP1
|
19
|
Down
|
Fatty acid binding protein 1, liver
|
GNAI1
|
19
|
Down
|
Guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 1
|
CDH11
|
19
|
Up
|
Cadherin 11, type 2, OB-cadherin (osteoblast)
|
COL4A1
|
19
|
Up
|
Collagen, type IV, alpha 1
|
COL5A1
|
18
|
Up
|
Collagen, type V, alpha 1
|
CXCL13
|
18
|
Down
|
Chemokine (C-X-C motif) ligand 13
|
CXCL5
|
18
|
Up
|
Chemokine (C-X-C motif) ligand 5
|
CCL20
|
18
|
Up
|
Chemokine (C-C motif) ligand 20
|
NR1H4
|
17
|
Down
|
Nuclear receptor subfamily 1, group H, member 4
|
ABCG2
|
17
|
Down
|
ATP-binding cassette, sub-family G (WHITE), member 2
|
KLF4
|
17
|
Down
|
Kruppel-like factor 4
|
UGT2A3
|
17
|
Down
|
UDP glucuronosyltransferase 2 family, polypeptide A3
|
ANPEP
|
17
|
Down
|
Alanyl (membrane) aminopeptidase
|
GUCA2A
|
17
|
Down
|
Guanylate cyclase activator 2A
|
COL12A1
|
17
|
Up
|
Collagen, type XII, alpha 1
|
GCG
|
17
|
Down
|
Glucagon
|
MET
|
16
|
Up
|
MET proto-oncogene, receptor tyrosine kinase
|
CD79A
|
16
|
Down
|
CD79a molecule, immunoglobulin-associated alpha
|
CLCA1
|
16
|
Down
|
Chloride channel accessory 1
|
TGFBI
|
16
|
Up
|
Transforming growth factor, beta-induced
|
TWIST1
|
16
|
Up
|
Twist family bhlh transcription factor 1
|
PLAU
|
16
|
Up
|
Plasminogen activator, urokinase
|
Survival analysis of the hub genes
We applied Kaplan Meier survival analysis to assess the prognostic value of these hub genes. The results showed that 6 genes including TIMP1, VEGFA, KLF4, GCG, CXCL8, CLCA1, were significantly associated with prognosis in CC. Among the up-regulated DEGs, high expression of TIMP1 were correlated with worse OS (Figure 5A, P=0.042) and PFS (Figure 5B, P=0.046) remarkably. High expression of VEGFA was significantly associated with worse PFS (Figure 5D, P=0.019), but not with OS (Figure 5C, P=0.054). However, high expression of CXCL8 had a positive correlation with OS (Figure 5E, P=0.0058), no significant correlation was detected with PFS (Figure 5F, P=0.11). Among the down-regulated DEGs, low expression of CLCA1 was associated with poor OS (Figure 5K, P=0.004) and PFS (Figure 5L, P=0.0037). Low expression of KLF4 was associated with poor PFS (Figure 5J, P=0.022), but not with OS (Figure 5I, P=0.69). Low expression of GCG was associated with poor OS (Figure 5G, P=0.026), but not with PFS (Figure 5H, P=0.77).
The diagnostic value of the prognostic genes in CC
The ROC curve analysis was shown in Figure 6, all the 6 prognostic genes presented high diagnostic value for CC identification. The diagnostic parameters were AUC 95.143, 95%CI(91.872-98.414),cutoff value 246.396, sensitivity 90.87 and specificity 90.244 with TIMP1; AUC 97.292, 95%CI(95.601-98.982), cutoff value18.943,sensitivity 90.658 and specificity 95.122 with VEGFA, AUC 97.561, 95%CI(95.055-100),cutoff value 162.207, sensitivity 95.117 and specificity 95.122 with KLF4; AUC 98.674, 95%CI(97.696-99.652),cutoff value 6.046, sensitivity 94.904 and specificity 97.561 with GCG;AUC 91, 95%CI(85.809-96.191),cutoff value 8.388, sensitivity 92.357 and specificity 78.049 with CXCL8;and AUC 93.931, 95%CI(91.235-96.627),cutoff value 429.4, sensitivity 84.926 and specificity 95.122 with CLCA1,respectively(Table 2).
Table 2
Parameters of receiver operating characteristic analysis for the prognostic genes.
Gene
|
AUC
|
95% CI
|
Cutoff value
|
Sensitivity
|
Specificity
|
TIMP1
|
95.143
|
91.872-98.414
|
246.396
|
90.87
|
90.244
|
VEGFA
|
97.292
|
95.601-98.982
|
18.943
|
90.658
|
95.122
|
KLF4
|
97.561
|
95.055-100
|
162.207
|
95.117
|
95.122
|
GCG
|
98.674
|
97.696-99.652
|
6.046
|
94.904
|
97.561
|
CXCL8
|
91
|
85.809-96.191
|
8.388
|
92.357
|
78.049
|
CLCA1
|
93.931
|
91.235-96.627
|
429.4
|
84.926
|
95.122
|
AUC: area under the curve; CI: confidence interval. |
Correlation between prognostic genes and Immune cell infiltration
The top correlation between the prognostic genes and infiltrated immune cells were shown in Figure 7. TIMP1 presented a moderate positive correlation with macrophages M0; and a low negative correlation with B cells memory, CD4 memory resting, CD4 memory activated, eosinophils, mast cell resting and plasma cells. VEGFA presented a moderate negative correlation with mast cell resting; a low negative correlation with macrophages M2 and monocytes; and a low positive correlation with macrophages M0, and NK cell resting. CXCL8 presented a moderate positive correlation with neutrophils; a low positive correlation with mast cell activated; and a low negative correlation with B cells memory, mast cell resting, monocyte, and Tregs.KLF4 exhibited a low positive correlation with macrophages M2, mast cell resting, and plasma cells; a low negative correlation with macrophages M0. GCG exhibited a moderate positive correlation with mast cell resting; a low positive correlation with B cell memory, macrophages M2 and plasma cells; and a low negative correlation with macrophages M0 and NK cell resting. CLCA1 exhibited a low positive correlation with CD4 memory resting, macrophages M2, mast cell resting and plasma cells; and a low negative correlation with macrophages M0.
Expression of the 6 prognostic genes in normal and colon cancer tissue using HPA database
In HPA database, TIMP1 was negative in normal tissues, but positively stained in some of the cancerous tissues(Figure 8A). VEGFA was positively expressed in both normal and tumor tissues, but much stronger in some of the cancerous tissues(Figure 8B). KLF4 was highly expressed in normal tissues, but negative or low expressed in cancerous tissues(Figure 8C). GCG and CXCL8 were negative or low expressed in both normal and tumor tissues(Figure 8D, Figure 8E). CLCA1 was greatly expressed in normal tissues, but negative in cancer tissues(Figure 8F). Among the above 6 genes, the expression of TIMP1, KLF4, and CLCA1 in HPA database were mostly corresponding with gene expression profile.