3.1 The detailed flowchart is shown in Fig. 1
Figure 1 The flowchart of this study
TCGA, the cancer genome atlas. COADREAD, colon adenocarcinoma/rectum adenocarcinoma. PARGs, Pyroptosis- and Aging- related genes. GO, Gene Ontology. KEGG, Kyoto Encyclopedia of Genes and Genomes. GSEA, Gene Set Enrichment Analysis. GSVA, Gene Set Variation Analysis. LASSO, least absolute shrinkage and selection operator. PPI, Protein-protein interaction. IHC, immunohistochemical analysis.
3.2 Differential expression analysis
In the differential analysis of TCGA-COADREAD dataset, 55537 DEGs were identified, of which 14599 had a |logFC| greater than 1 and Padj of less than 0.05. There were 4266 and 10,333 differentially expressed genes with a negative logFC, which indicated higher expression in the colorectal cancer low-risk group and lower expression in the colorectal cancer high-risk group below the threshold. We visualized the volcano plot (Fig. 2A) and differential ranking plot (Fig. 2B) to demonstrate the outcomes of the differential assessment of this dataset. We combined PRGs and ARGs (Fig. 2C) to generate PARGs. Finally, we compared the upregulated differentially expressed genes with PARGs and identified four genes (CTSG, VDR, CITED2, and ELANE), which are represented in a Venn diagram (Fig. 2D). For the downregulated differentially expressed genes, intersection with ARGs yielded 14 genes (TREM2, FGF21, UCP1, IL17A, VEGFA, IL1A, TREM1, IL1B, DRD2, NOD2, PCSK9, CXCL8, CEBPB, and MMP1), and a Venn diagram (Fig. 2E) was used to illustrate the results. In Table 1, we provide the gene names and expression data for the 18 PARDEGs. In Dataset GSE74602, sixteen distinct PARDEGs (CTSG, VDR, CITED2, TREM2, FGF21, UCP1, IL17A, VEGFA, IL1A, TREM1, IL1B, DRD2, NOD2, PCSK9, CEBPB, MMP1) were identified. In dataset GSE87211, seventeen genes were discovered, namely, CTSG, VDR, CITED2, ELANE, TREM2, FGF21, UCP1, IL17A, VEGFA, IL1A, TREM1, IL1B, DRD2, NOD2, PCSK9, CEBPB, and MMP1. We compared the expression levels of PARDEGs in TCGA-COADREAD, GSE 74602, and GSE87211 using grouped comparison plots (Fig. 2F-H), and determined whether the differences between them were statistically significant (P < 0.05). The GSE74602 and GSE87211 datasets demonstrated 12 genes that matched TCGA-COADREAD validation results, including CTSG, VDR, CITED2, ELANE, VEGFA, IL1A, TREM1, IL1B, NOD2, PCSK9, CEBPB, and MMP1, as shown in Figure (Fig. 2F-H).
Table 1
List of description and expression difference of PARGs of differential expression analysis.
Gene Symbol
|
Description
|
logFC
|
P.Value
|
adj.P
|
CTSG
|
cathepsin G
|
3.34413154
|
1.6521E-30
|
2.9674E-29
|
VDR
|
vitamin D receptor
|
1.110243456
|
2.71598E-35
|
6.32369E-34
|
CITED2
|
Cbp/p300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 2
|
1.756692626
|
2.82857E-80
|
5.79674E-78
|
ELANE
|
elastase, neutrophil expressed
|
3.578797496
|
7.08607E-58
|
5.3411E-56
|
TREM2
|
triggering receptor expressed on myeloid cells 2
|
-1.145762154
|
4.92095E-08
|
1.82018E-07
|
FGF21
|
fibroblast growth factor 21
|
-2.46493427
|
2.54703E-08
|
9.72682E-08
|
UCP1
|
uncoupling protein 1
|
-2.112722822
|
1.82337E-05
|
5.0268E-05
|
IL17A
|
interleukin 17A
|
-2.640506888
|
1.59317E-15
|
1.11504E-14
|
VEGFA
|
vascular endothelial growth factor A
|
-1.522614701
|
2.7472E-54
|
1.73538E-52
|
IL1A
|
interleukin 1 alpha
|
-2.635870505
|
3.32149E-25
|
4.43041E-24
|
TREM1
|
triggering receptor expressed on myeloid cells 1
|
-2.638664774
|
6.77E-59
|
4.99E-57
|
IL1B
|
interleukin 1 beta
|
-2.635870505
|
3.32149E-25
|
4.43041E-24
|
DRD2
|
dopamine receptor D2
|
-3.249755965
|
2.637E-22
|
2.9465E-21
|
NOD2
|
nucleotide binding oligomerization domain containing 2
|
-1.465023815
|
7.23191E-23
|
8.34918E-22
|
PCSK9
|
proprotein convertase subtilisin/kexin type 9
|
-2.715151156
|
1.23554E-46
|
5.26298E-45
|
CXCL8
|
C-X-C motif chemokine ligand 8
|
-3.457782428
|
1.47291E-36
|
3.67163E-35
|
CEBPB
|
CCAAT enhancer binding protein beta
|
-1.578090596
|
1.89071E-38
|
5.2257E-37
|
MMP1
|
matrix metallopeptidase 1
|
-4.574778525
|
6.12923E-66
|
6.56802E-64
|
PARGs, Pyroptosis- and Ageing- related genes. |
Figure 2 Analysis of differentially expressed genes (DEGs). (A)Volcano plot of the distributions of all differentially expressed genes between cancer group (grouping: COADREAD) and normal group (grouping: Normal) for dataset TCGA-COADREAD. (B) Differential expression analysis differential ranking plot between cancer group (grouping: COADREAD) and normal group (grouping: Normal) for dataset TCGA-COADREAD. (C) Venn diagram showing the intersection of PRGs and ARGs. (D) Venn diagram display of the intersection of up-regulated differentially expressed genes with PARGs obtained from dataset TCGA-COADREAD. (E) Venn diagram display of the intersection of down-regulated differentially expressed genes with PARGs obtained from dataset TCGA-COADREAD. (F-H) ARGs in dataset TCGA-COADREAD, GSE 39582 and GSE113513 grouped comparison plots are shown. “ns” indicates lack of statistical significance (P > 0.05); asterisks denote statistically significant P values: * statistically significant(p ≤ 0.05), ** highly statistically significant(p ≤ 0.01), *** highly statistically significant(p ≤ 0.001). TCGA, the cancer genome atlas. COADREAD, colon adenocarcinoma/rectum adenocarcinoma esophageal carcinoma. PRGs, Pyroptosis related genes. ARGs, Aging related genes. PARGs, Pyroptosis- and Aging- related genes.
3.3 GO-KEGG enrichment analysis
We constructed a bar chart (Fig. 3A) to illustrate the expression of the 18 PARDEGs in TCGA-COADREAD dataset. We conducted GO-KEGG enrichment analysis on the 18 PARDEGs (Table 2) to identify the biological processes (BP), molecular functions (MF), and cellular components (CC). We used GO-KEGG enrichment analysis to generate a network graph of BP, CC, MF, and KEGG for TCGA-COADREAD dataset, with joint logFC for PARDEGs, as shown in Fig. 3B. The logFC of the molecule was used to calculate the z-score for each entry, which is shown graphically using a chord plot (Fig. 3C) and a circle plot (Fig. 3D). The study revealed that PARDEGs were largely associated with temperature maintenance (GO:0001659), control of cytokine production (GO:0001819), sustaining multicellular organisms (GO:0048871), the inside of secretory granules (GO:0034774), the inside of azurophil granules (GO:0035578), the inside of cytoplasmic vesicles (GO:0060205), cytokine action (GO:0005125), glycosaminoglycan adhesion (GO:0005539), receptor ligand action (GO:0048018), Rheumatoid Arthritis (hsa05323), IL-17 pathway (hsa05321), and Tuberculosis (hsa05152).
Table 2
Results of GO and KEGG Enrichment Analysis
ONTOLOGY
|
ID
|
Description
|
GeneRatio
|
BgRatio
|
pvalue
|
p.adjust
|
qvalue
|
BP
|
GO:0001659
|
temperature homeostasis
|
7/18
|
173/18670
|
1.52e-10
|
2.52e-07
|
1.06e-07
|
BP
|
GO:0001819
|
positive regulation of cytokine production
|
8/18
|
464/18670
|
4.82e-09
|
3.23e-06
|
1.36e-06
|
BP
|
GO:0048871
|
multicellular organismal homeostasis
|
8/18
|
485/18670
|
6.82e-09
|
3.23e-06
|
1.36e-06
|
BP
|
GO:0050900
|
leukocyte migration
|
8/18
|
499/18670
|
8.52e-09
|
3.23e-06
|
1.36e-06
|
BP
|
GO:0032103
|
positive regulation of response to external stimulus
|
7/18
|
323/18670
|
1.17e-08
|
3.23e-06
|
1.36e-06
|
CC
|
GO:0034774
|
secretory granule lumen
|
3/18
|
321/19717
|
0.003
|
0.076
|
0.065
|
CC
|
GO:0035578
|
azurophil granule lumen
|
2/18
|
91/19717
|
0.003
|
0.076
|
0.065
|
CC
|
GO:0060205
|
cytoplasmic vesicle lumen
|
3/18
|
338/19717
|
0.003
|
0.076
|
0.065
|
CC
|
GO:0031983
|
vesicle lumen
|
3/18
|
339/19717
|
0.003
|
0.076
|
0.065
|
MF
|
GO:0005125
|
cytokine activity
|
5/18
|
220/17697
|
2.13e-06
|
1.24e-04
|
4.37e-05
|
MF
|
GO:0005539
|
glycosaminoglycan binding
|
5/18
|
229/17697
|
2.59e-06
|
1.24e-04
|
4.37e-05
|
MF
|
GO:0048018
|
receptor ligand activity
|
6/18
|
482/17697
|
5.56e-06
|
1.78e-04
|
6.24e-05
|
MF
|
GO:0070851
|
growth factor receptor binding
|
4/18
|
134/17697
|
8.86e-06
|
2.13e-04
|
7.46e-05
|
MF
|
GO:0004252
|
serine-type endopeptidase activity
|
4/18
|
160/17697
|
1.78e-05
|
3.43e-04
|
1.20e-04
|
KEGG
|
hsa05323
|
Rheumatoid arthritis
|
6/17
|
93/8076
|
2.21e-08
|
1.93e-06
|
1.17e-06
|
KEGG
|
hsa04657
|
IL-17 signaling pathway
|
5/17
|
94/8076
|
1.06e-06
|
4.63e-05
|
2.80e-05
|
KEGG
|
hsa05321
|
Inflammatory bowel disease
|
4/17
|
65/8076
|
8.41e-06
|
2.44e-04
|
1.48e-04
|
KEGG
|
hsa05152
|
Tuberculosis
|
5/17
|
180/8076
|
2.59e-05
|
5.63e-04
|
3.41e-04
|
KEGG
|
hsa04933
|
AGE-RAGE signaling pathway in diabetic complications
|
4/17
|
100/8076
|
4.65e-05
|
8.10e-04
|
4.90e-04
|
GO, Gene Ontology. BP, Biological Process. CC, Cellular Component. MF, Molecular Function. KEGG, Kyoto Encyclopedia of Genes and Genomes. |
Figure 3 GO-KEGG enrichment analysis of the dataset TCGA-COADREAD. (A) The barplot displays GO-KEGG enrichment analysis results for PARDEGs (BP, CC, MF, KEGG). (B) Divergence network diagram showing the results of GO-KEGG enrichment analysis of PARDEGs. (C) Chord plot showing the results of GO-KEGG enrichment analysis of the dataset TCGA-COADREAD for PARDEGs of the joint logFC. The green dots represent specific genes and the purple circles represent specific pathways. (D) Circle plot showing the results of GO-KEGG enrichment analysis of the dataset TCGA-COADREAD for PARDEGs of the joint logFC. The purple dots represent up-regulated genes (logFC > 1) and the green dots represent down-regulated genes (logFC < -1).The screening criteria for GO-KEGG enrichment entries were P < 0.05 and FDR value (q.value) < 0.2. GO, Gene Ontology. BP, biological process. CC, cellular component. PARDEGs, Pyroptosis- and Aging- related Differentially Expressed Genes.
3.4 GSEA and GSVA
GSEA was used to analyze the associations between the expression of all genes and related biological processes, affected cellular components, and executed molecular functions in TCGA-COADREAD dataset to assess the influence of gene expression levels on the differences between the COADREAD cancer and normal groups. P-values of less than 0.05 and FDR values (q.value) of less than 0.25 were considered indicative of a significant enrichment screening. Analysis of TCGA-COADREAD dataset revealed that multiple pathways, including resistin as an inflammation regulator (Fig. 4B), autophagy (Fig. 4C), glycolysis, and gluconeogenesis (Fig. 4D), and the role of the HSP70 pathway in modulating apoptosis (Fig. 4E), were significantly enriched in genes (Table 3). We present our findings in TCGA-COADREAD dataset by creating visual representations of mountains (Fig. 4A) and pathways (Fig. 4B-E).
Table 3
Results of Combined Datasets GSEA
ID
|
setSize
|
enrichmentScore
|
NES
|
pvalue
|
p.adjust
|
qvalues
|
WP_RESISTIN_AS_A_REGULATOR_OF_INFLAMMATION
|
33
|
0.43547616
|
1.54621674
|
0.030837
|
0.09090979
|
0.05740866
|
WP_AUTOPHAGY
|
30
|
0.42571938
|
1.50380627
|
0.04365079
|
0.11439754
|
0.07224095
|
WP_GLYCOLYSIS_AND_GLUCONEOGENESIS
|
44
|
0.33117666
|
1.25108706
|
0.15533981
|
0.28425152
|
0.1795021
|
WP_APOPTOSIS_MODULATION_BY_HSP70
|
19
|
0.60073238
|
1.85915885
|
0.00363636
|
0.03591448
|
0.02267965
|
REACTOME_KERATINIZATION
|
215
|
-0.54275257
|
-2.10764977
|
0.00106952
|
0.03591448
|
0.02267965
|
REACTOME_G2_M_CHECKPOINTS
|
168
|
-0.4375401
|
-1.65067525
|
0.00110254
|
0.03591448
|
0.02267965
|
REACTOME_HATS_ACETYLATE_HISTONES
|
139
|
-0.49548866
|
-1.84359764
|
0.00111111
|
0.03591448
|
0.02267965
|
REACTOME_REPRODUCTION
|
143
|
-0.49336299
|
-1.83777081
|
0.00111607
|
0.03591448
|
0.02267965
|
GROSS_HYPOXIA_VIA_ELK3_DN
|
69
|
0.62999349
|
2.11091357
|
0.00146843
|
0.03953922
|
0.03159959
|
MENSE_HYPOXIA_UP
|
42
|
0.67727411
|
2.06922046
|
0.00159744
|
0.03953922
|
0.03159959
|
BROCKE_APOPTOSIS_REVERSED_BY_IL6
|
71
|
0.52335209
|
1.7571483
|
0.00738552
|
0.07736577
|
0.06183042
|
WP_SENESCENCE_AND_AUTOPHAGY_IN_CANCER
|
66
|
0.44080702
|
1.46191796
|
0.04654655
|
0.19930266
|
0.15928191
|
GSEA, Gene Set Enrichment Analysis |
We compared the expression of all genes in TCGA-COADREAD dataset using GSVA to evaluate variations in the hallmark gene set between the COADREAD cancer and normal groups. The data obtained from TCGA-COADREAD dataset indicated that several pathways, such as IL2 STAT5 signaling, apoptosis, hypoxia, and glycolysis, had significantly enriched gene profiles. After assessing the 44 hallmark gene sets, noteworthy disparities between the COADREAD cancer and normal groups were observed (p-value < 0.05, Fig. 4F, Table 4). We compared the representative hallmark gene sets and generated a graph to show the differences between groups (Fig. 4G).
Table 4
GSVA analysis of dataset TCGA-COADREAD.
ontology
|
logFC
|
AveExpr
|
t
|
P.Value
|
adj.P
|
HALLMARK_ADIPOGENESIS
|
-0.527959232
|
-0.072147691
|
-14.78264416
|
1.96E-43
|
9.82E-42
|
HALLMARK_MYC_TARGETS_V2
|
0.854583441
|
0.017369086
|
14.57065456
|
2.19E-42
|
5.48E-41
|
HALLMARK_MYC_TARGETS_V1
|
0.744340126
|
0.007710798
|
13.56275337
|
1.60E-37
|
2.67E-36
|
HALLMARK_HEME_METABOLISM
|
-0.466111592
|
-0.065907264
|
-13.42973383
|
6.79E-37
|
8.48E-36
|
HALLMARK_PANCREAS_BETA_CELLS
|
-0.550855044
|
-0.045909656
|
-12.76822305
|
7.84E-34
|
7.84E-33
|
HALLMARK_E2F_TARGETS
|
0.736162689
|
0.00318997
|
12.47619596
|
1.64E-32
|
1.37E-31
|
HALLMARK_UNFOLDED_PROTEIN_RESPONSE
|
0.529483031
|
-0.017977879
|
12.0995204
|
7.78E-31
|
5.56E-30
|
HALLMARK_BILE_ACID_METABOLISM
|
-0.434861256
|
-0.053540572
|
-11.48376507
|
3.60E-28
|
2.25E-27
|
HALLMARK_G2M_CHECKPOINT
|
0.624078712
|
-0.011162383
|
11.34810694
|
1.35E-27
|
7.52E-27
|
HALLMARK_DNA_REPAIR
|
0.461868068
|
-0.022384784
|
10.9964442
|
3.98E-26
|
1.99E-25
|
HALLMARK_KRAS_SIGNALING_DN
|
-0.377246204
|
-0.044020506
|
-10.54647825
|
2.69E-24
|
1.22E-23
|
HALLMARK_MTORC1_SIGNALING
|
0.477122507
|
-0.028312327
|
10.39823332
|
1.05E-23
|
4.38E-23
|
HALLMARK_XENOBIOTIC_METABOLISM
|
-0.368499235
|
-0.046846429
|
-10.15315597
|
9.66E-23
|
3.72E-22
|
HALLMARK_FATTY_ACID_METABOLISM
|
-0.394084367
|
-0.028967167
|
-9.717404433
|
4.54E-21
|
1.62E-20
|
HALLMARK_MYOGENESIS
|
-0.440182258
|
-0.040943765
|
-9.437514093
|
5.03E-20
|
1.68E-19
|
HALLMARK_COMPLEMENT
|
-0.405575635
|
-0.032080661
|
-8.68853704
|
2.41E-17
|
7.53E-17
|
HALLMARK_PEROXISOME
|
-0.327826326
|
-0.048675523
|
-8.522187976
|
8.99E-17
|
2.64E-16
|
HALLMARK_GLYCOLYSIS
|
0.302574608
|
-0.052239936
|
7.956315926
|
6.79E-15
|
1.89E-14
|
HALLMARK_KRAS_SIGNALING_UP
|
-0.355809045
|
-0.033726964
|
-7.872425251
|
1.26E-14
|
3.32E-14
|
HALLMARK_OXIDATIVE_PHOSPHORYLATION
|
-0.405994538
|
-0.021073592
|
-7.832246739
|
1.70E-14
|
4.24E-14
|
HALLMARK_UV_RESPONSE_DN
|
-0.377341443
|
-0.039477708
|
-7.521452052
|
1.60E-13
|
3.81E-13
|
HALLMARK_WNT_BETA_CATENIN_SIGNALING
|
0.36427061
|
-0.043742995
|
7.382145161
|
4.27E-13
|
9.70E-13
|
HALLMARK_ANDROGEN_RESPONSE
|
-0.343766179
|
-0.020921674
|
-7.268837254
|
9.37E-13
|
2.04E-12
|
HALLMARK_APICAL_SURFACE
|
-0.312194556
|
-0.042647052
|
-6.873898277
|
1.34E-11
|
2.80E-11
|
HALLMARK_ESTROGEN_RESPONSE_LATE
|
-0.243178736
|
-0.06054344
|
-6.559469349
|
1.02E-10
|
2.05E-10
|
HALLMARK_IL6_JAK_STAT3_SIGNALING
|
-0.332179313
|
-0.024721835
|
-6.022871535
|
2.72E-09
|
5.23E-09
|
HALLMARK_PI3K_AKT_MTOR_SIGNALING
|
-0.236320666
|
-0.06136432
|
-6.009484035
|
2.94E-09
|
5.45E-09
|
HALLMARK_ESTROGEN_RESPONSE_EARLY
|
-0.229168303
|
-0.048207404
|
-5.728349787
|
1.48E-08
|
2.65E-08
|
HALLMARK_IL2_STAT5_SIGNALING
|
-0.255964223
|
-0.036780853
|
-5.654659983
|
2.24E-08
|
3.87E-08
|
HALLMARK_APICAL_JUNCTION
|
-0.261500256
|
-0.031185057
|
-5.624723208
|
2.65E-08
|
4.39E-08
|
HALLMARK_UV_RESPONSE_UP
|
0.205914642
|
-0.055631231
|
5.615077898
|
2.80E-08
|
4.39E-08
|
HALLMARK_ALLOGRAFT_REJECTION
|
-0.317195761
|
-0.015436134
|
-5.614116799
|
2.81E-08
|
4.39E-08
|
HALLMARK_INFLAMMATORY_RESPONSE
|
-0.278946708
|
-0.017744467
|
-5.163710446
|
3.13E-07
|
4.74E-07
|
HALLMARK_TGF_BETA_SIGNALING
|
-0.249532394
|
-0.048248395
|
-4.961354905
|
8.72E-07
|
1.28E-06
|
HALLMARK_P53_PATHWAY
|
0.18388094
|
-0.065792794
|
4.869155925
|
1.38E-06
|
1.97E-06
|
HALLMARK_PROTEIN_SECRETION
|
-0.216157519
|
-0.008472292
|
-4.294036713
|
1.99E-05
|
2.77E-05
|
HALLMARK_INTERFERON_GAMMA_RESPONSE
|
-0.244894364
|
-0.02256085
|
-4.186345519
|
3.18E-05
|
4.30E-05
|
HALLMARK_ANGIOGENESIS
|
0.212539323
|
-0.031614118
|
3.653541343
|
0.000277225
|
0.00036477
|
HALLMARK_HEDGEHOG_SIGNALING
|
-0.193756103
|
-0.03111402
|
-3.613325302
|
0.000323101
|
0.000414232
|
HALLMARK_APOPTOSIS
|
-0.153182883
|
-0.040807553
|
-3.51141456
|
0.000473221
|
0.000591526
|
HALLMARK_HYPOXIA
|
-0.145493452
|
-0.046050565
|
-3.313011799
|
0.000968621
|
0.001181245
|
HALLMARK_MITOTIC_SPINDLE
|
0.159273108
|
-0.03210883
|
3.279855768
|
0.00108806
|
0.001295309
|
HALLMARK_CHOLESTEROL_HOMEOSTASIS
|
0.129793123
|
-0.051712431
|
2.841518136
|
0.004615531
|
0.005366897
|
HALLMARK_COAGULATION
|
-0.09512025
|
-0.026709223
|
-2.071228737
|
0.038689392
|
0.043965218
|
GSVA, Gene Set Variation Analysis. TCGA, the cancer genome atlas. COADREAD, colon adenocarcinoma/rectum adenocarcinoma esophageal carcinoma. |
Figure 4 GSEA and GSVA. (A) GSEA of dataset TCGA-COADREAD for the main 4 biological features. (B-E) Dataset TCGA-COADREAD in which genes are significantly enriched in the WP RESISTIN AS A REGULATOR OF INFLAMMATION (Fig. 4B), WP AUTOPHAGY (Fig. 4C), WP GLYCOLYSIS AND GLUCONEOGENESIS (Fig. 4D), WP APOPTOSIS MODULATION BY HSP70 (Fig. 4E), etc. (F.) Heat map display of functional scores in GSVA for dataset TCGA-COADREAD; (G) Heat map display of functional scores in GSVA for dataset TCGA-COADREAD. The significant enrichment screening criteria for GSEA enrichment analysis were P < 0.05 and FDR value (q.value) < 0.25. Asterisks denote statistically significant P values: * statistically significant(p ≤ 0.05), ** highly statistically significant(p ≤ 0.01), *** highly statistically significant(p ≤ 0.001). TCGA, the cancer genome atlas. COADREAD, colon adenocarcinoma/rectum adenocarcinoma esophageal carcinoma. GSEA, Gene Set Enrichment Analysis. GSVA, Gene Set Variation Analysis.
3.5 LASSO model
To evaluate the potential of the 18 PARDEGs (CTSG, VDR, CITED2, ELANE, TREM2, FGF21, UCP1, IL17A, VEGFA, IL1A, TREM1, IL1B, DRD2, NOD2, PCSK9, CXCL8, CEBPB, MMP1) in TCGA-COADREAD dataset to be used for prediction, we employed LASSO regression analysis to generate a prognostic model (Fig. 5A,5C) with 7 key genes (NOD2, VEGFA, CEBPB, ELANE, FGF21, MMP1, TREM2). The LASSO model's Risk Score median was used to divide the cancer groups into two categories, with those deemed to be of high risk placed in the high-risk group, and those of low risk in the low-risk group. The LASSO risk factor map (Fig. 5B) was designed to show the results of the LASSO regression analysis through a visual representation of the high- and low-risk groups. We used statistical analysis to corroborate the LASSO prognostic model by examining the clinical data of patients with COADREAD drawn from TCGA-COADREAD dataset (Table 5). Moreover, the Risk Score for the cancer group samples in the GSE17536 dataset was established by employing the coefficients of the variables in the LASSO model, and the cancer group was split into high-risk (grouping: high) and low-risk (grouping: low) groups by taking the median of the Risk Score as the dividing line.
Table 5
Patient Characteristics of COADREAD patients in the TCGA datasets.
Characteristic
|
levels
|
Overall
|
n
|
|
644
|
T stage, n (%)
|
T1
|
20 (3.1%)
|
|
T2
|
111 (17.3%)
|
|
T3
|
436 (68%)
|
|
T4
|
74 (11.5%)
|
N stage, n (%)
|
N0
|
368 (57.5%)
|
|
N1
|
153 (23.9%)
|
|
N2
|
119 (18.6%)
|
M stage, n (%)
|
M0
|
475 (84.2%)
|
|
M1
|
89 (15.8%)
|
Pathologic stage, n (%)
|
Stage I
|
111 (17.8%)
|
|
Stage II
|
238 (38.2%)
|
|
Stage III
|
184 (29.5%)
|
|
Stage IV
|
90 (14.4%)
|
Gender, n (%)
|
Female
|
301 (46.7%)
|
|
Male
|
343 (53.3%)
|
Age, n (%)
|
<=65
|
276 (42.9%)
|
|
> 65
|
368 (57.1%)
|
OS event, n (%)
|
Alive
|
515 (80%)
|
|
Dead
|
129 (20%)
|
DSS event, n (%)
|
Alive
|
544 (87.5%)
|
|
Dead
|
78 (12.5%)
|
PFI event, n (%)
|
Alive
|
479 (74.4%)
|
|
Dead
|
165 (25.6%)
|
Age, median (IQR)
|
|
68 (58, 76)
|
COADREAD, colon adenocarcinoma/rectum adenocarcinoma esophageal carcinoma. TCGA, the cancer genome atlas. OS, overall survival. DSS, disease specific survival. PFI, progression free interval. IQR, interquartile range. |
We analyzed the levels of expression for NOD2, VEGFA, CEBPB, ELANE, FGF21, MMP1, and TREM2 in the datasets TCGA-COADREAD (Fig. 5D) and GSE161158(Fig. 5E) for high- and low-risk groups and compared the results to show whether there were any statistically significant differences (P < 0.05). The Figure(Fig. 5D,5E) demonstrates that The GSE161158 dataset contains two genes that match TCGA-COADREAD validation results: VEGFA and MMP1. The model formula is as follows:
Figure 5 Construct prognostic model for PARDEGs and differential genetic analysis of high- and low-risk groups in COADREAD. (A) Lasso regression model of PARDEGs. The vertical coordinates of the LASSO regression diagnostic model represent the likelihood deviation values of the LASSO regression. The log(λ) value of the x-axis at the bottom of the plot represents by default the case after taking log of the lambda coefficient of the penalty term in the LASSO regression. the number of the x-axis at the top represents the number of variables with non-zero coefficients corresponding to each lambda. (B) The risk factor plot. The green points in the scatter plot portion represent deceased patients and the purple points represent surviving patients. (C) the trajectory plot of variables in the LASSO regression diagnostic model. (D-E) Comparative graphical presentation of the grouping of hub genes in the high- and low-risk groups in the dataset TCGA-COADREAD (D), GSE161158 (E). “ns” indicates lack of statistical significance (P > 0.05). asterisks denote statistically significant P values: * statistically significant(p ≤ 0.05), ** highly statistically significant(p ≤ 0.01), *** highly statistically significant(p ≤ 0.001). TCGA, the cancer genome atlas. COADREAD, colon adenocarcinoma/rectum adenocarcinoma esophageal carcinoma. LASSO, least absolute shrinkage and selection operator.
3.6 PPI
We conducted a protein-protein interaction analysis of seven hub genes (NOD2, VEGFA, CEBPB, ELANE, FGF21, MMP1, and TREM2) using the STRING database, setting the biological species as humans and constructing a PPI network with the criterion of a minimum interrelationship coefficient of 0.400 or higher. Using Cytoscape software, we were able to graph the interactions (Fig. 6A) and observed that when the interrelationship coefficient was not less than 0.400, each hub gene was connected to at least one hub gene, with the exception of NOD2 and TREM2. VEGFA had the most connections with other genes in the group, with four core genes (VEGFA, CEBPB, ELANE, FGF21, and MMP1) as its primary linkages. We employed the MCC algorithm to determine the scores of the hub genes in the PPI network, which were linked to other PPI network nodes, and arranged the hub genes from the highest to lowest score using a gradient from red to yellow. The MCC algorithm yielded VEGFA as the first gene, as depicted in Fig. 6B. Furthermore, we used the GeneMANIA website to forecast and form a connection network (Fig. 6C) of functionally analogous genes among these seven key genes to analyze their co-expression, co-localization, and gene interactions. Finally, these seven core genes are displayed as their relevant proteins (Fig. 6D), including TREM2 and VEGFA on chromosome 6, MMP1 on chromosome 11, NOD2 on chromosome 16, ELANE, FGF21 on chromosome 19, and CEBPB on chromosome 20.
Figure 6 PPI. (A) PPI network of hub genes. (B) The PPI network of hub genes in MCC algorithm, the color of the rectangle in the figure from yellow to red represents the gradual increase of the score. The interconfiguration network of A-B was collected in the STRING database and constructed in Cytoscape software with a minimum interaction score of 0.400. (C)The GeneMANIA website of hub genes predicts the interaction network of functionally similar genes of hub genes. The interconfiguration network was collected and derived from the GeneMANIA website, where black circles with white slashes represent input hub genes, other black circles without white slashes represent predicted functionally similar genes, purple lines represent co-expression, yellow lines represent Predicted, and red lines represent Physical interactions between genes. (D) The hub gene chromosome location diagram. PPI, protein-protein interaction. MCC, maximal clique centrality.
3.7 Clinical correlations and KM curve
We examined the correlation between the seven hub genes and several clinical elements, such as T-stage, N-stage, M-stage, pathological stage, age, and survival indicators, including overall survival (OS), progression-free interval (PFI), and disease-specific survival (DSS), and the findings are illustrated in Figs. 7A-G. The results of the data indicated that NOD2 expression had a statistically significant correlation with the discrepancy between T1 and normal T-staging (P < 0.05); VEGFA expression was associated with the difference between survival and death in cases of PFI (P < 0.05); CEBPB expression was linked to the divergence between N0 and N1 in the clinical N-stage (P < 0.05); ELANE expression was related to the contrast between M0 and normal M-staging (P < 0.05); FGF21 expression was connected to the distinction between less than or equal to 65 years of age and normal (P < 0.05); MMP1 expression correlated with the contrast between survival and death in clinical OS cases (P < 0.05); and TREM2 expression was associated with the discrepancy between N0 and normal for clinical N staging (P < 0.05). We produced Kaplan-Meier graphs for the seven hub genes in the LASSO model considering the clinical subtype variables. Although the number of low FGF21 subgroups was less than three, it was not feasible to conduct high and low subgroup analyses. After examining the data from Figs. 7H-M, six hub genes (NOD2, VEGFA, CEBPB, ELANE, MMP1, and TREM2) were identified, and a P value of 0.05 was utilized to evaluate the importance of the associations between pertinent gene-subtype clinical parameters.
Figure 7 Clinical correlations and KM curve. (A-H) Violin plots of correlation analysis among clinical subgroups for genes NOD2 (A), VEGFA (B), CEBPB (C), ELANE (D), FGF21 (E), MMP1 (F), and TREM2 (G). (H) KM plots for clinical correlation analysis of gene NOD2 in OS events. (I)KM plots for clinical correlation analysis of gene VEGFA in OS events. (J) KM plots for clinical correlation analysis of gene CEBPB in OS events. (K) KM plots for clinical correlation analysis of gene ELANE in OS events. (L) KM plots for clinical correlation analysis of gene MMP1 in OS events. (M) KM plots for clinical correlation analysis of gene TREM2 in OS events. Asterisks denote statistically significant P values: * statistically significant(p ≤ 0.05), ** highly statistically significant(p ≤ 0.01), *** highly statistically significant(p ≤ 0.001). TCGA, the cancer genome atlas. COADREAD, colon adenocarcinoma/rectum adenocarcinoma esophageal carcinoma. PARDEGs, Pyroptosis- and Aging- Related Differentially Expressed Genes. ROC, receiver operating characteristic curve. OS, overall survival.
3.8 Cox model
We employed multivariate Cox regression analysis on TCGA-COADREAD dataset to corroborate our previously established LASSO regression prognostic model. The purpose of this analysis was to examine the connection between optimistic and pessimistic clinical manifestations of seven major genes (NOD2, VEGFA, CEBPB, ELANE, FGF21, MMP1, and TREM2). At first, we incorporated the expression levels of the seven key genes into a multivariate Cox regression model (shown in Table 6) and generated a forest plot (Fig. 8A) by using the multivariate Cox regression analysis. Additionally, we performed a nomogram analysis of the genes within the multivariate Cox regression model to evaluate their predictive strength, resulting in the formation of a nomogram (Fig. 8B). Moreover, we evaluated the accuracy of the nomograms based on the multivariate Cox regression model for one year (Fig. 8C), three years (Fig. 8D), and five years (Fig. 8E) by performing a prognostic calibration analysis and displaying the calibration curves. Examining the data, it is clear that the purple line denoting the three-year time span closely follows the gray ideal case line, suggesting that the model is more precise in forecasting results for a three-year duration than for one- or five-year durations. Subsequently, we used DCA to assess the practical value of our generated LASSO-Cox regression model for prognostication over the course of one year (Fig. 8F), three years (Fig. 8G), and five years (Fig. 8H). Examination of the data shows that the purple line representing the model always surpasses the x-values of the green line, suggesting overall positive results. The greatest variation is observed in five-year forecasts, whereas shorter-term forecasts, such as those for one or three years, are less variable. The results indicate that the model's accuracy in predicting outcomes is greater over a five-year period than over either a one-year or three-year period.
Table 6
Cox regression to identify hub genes and clinical features associated with OS.
Characteristics
|
Total(N)
|
Univariate analysis
|
Multivariate analysis
|
Hazard ratio (95% CI)
|
P value
|
Hazard ratio (95% CI)
|
P value
|
NOD2
|
643
|
|
|
|
|
Low
|
321
|
Reference
|
|
|
|
High
|
322
|
0.867 (0.611–1.232)
|
0.426
|
|
|
VEGFC
|
643
|
|
|
|
|
Low
|
321
|
Reference
|
|
|
|
High
|
322
|
1.466 (1.032–2.080)
|
0.032
|
1.666 (1.159–2.395)
|
0.006
|
CEBPB
|
643
|
|
|
|
|
Low
|
322
|
Reference
|
|
|
|
High
|
321
|
1.373 (0.969–1.945)
|
0.075
|
|
|
ELANE
|
643
|
|
|
|
|
Low
|
322
|
Reference
|
|
|
|
High
|
321
|
0.700 (0.493–0.995)
|
0.047
|
0.613 (0.426–0.881)
|
0.008
|
MMP1
|
643
|
|
|
|
|
Low
|
322
|
Reference
|
|
|
|
High
|
321
|
0.727 (0.512–1.030)
|
0.073
|
|
|
TENM2
|
643
|
|
|
|
|
Low
|
322
|
Reference
|
|
|
|
High
|
321
|
0.790 (0.558–1.118)
|
0.183
|
|
|
OS, overall survival. |
Figure 8 construct Cox model. (A-B) Forest plot (A)and nomogram (B) of multivariable Cox regression analysis of hub genes. (C-E) Calibration curve of the 1-year (C), 3-year (D), and 5-year (E) for the multifactor Cox regression model nomogram analysis. (F-G) DCA plots for the 1-year (F), 3-year (G), and 5-year (H) LASSO-Cox regression prognostic models. The probability threshold or Threshold Probability is represented by the x-axis in the DCA plots, while the net return is represented by the y-axis. DCA, decision curve analysis. LASSO, least absolute shrinkage and selection operator.
3.9 Immunohistochemical analysis
Immunohistochemical techniques were utilized to study the expression of 5 genes (NOD2, CEBPB, ELANE, FGF21, TREM2) in COADREAD tumor tissues and normal colorectal tissues, taking into consideration 7 hub genes, with the help of the HPA database. The staining process required the use of DAB (3,3-diaminobenzidine) and counterstaining was performed with hematoxylin. Analysis of NOD2 expression indicated that the gene concentration was significantly higher in the COADREAD tumor tissue (COADREAD tissue, Fig. 9B) than in the normal colorectal tissue (normal colorectal tissues, Fig. 9A). CEBPB expression was significantly more elevated in COADREAD tissue (Fig. 9D) when compared to normal colorectal tissue (Fig. 9C). ELANE expression was higher in normal colorectal tissue than in COADREAD tumor tissue in both normal colorectal tissue (Fig. 9E) and COADREAD tissue (Fig. 9F). FGF21 expression was more pronounced in COADREAD tumor tissues (COADREAD tissue, Fig. 9H) in regular colorectal tissues (normal colon tissue, Fig. 9G). TREM2 expression was more pronounced in COADREAD tissue (Fig. 9J) when compared to normal colorectal tissue (Fig. 9I).
Figure 9 Immunohistochemical analysis. (A-B) gene NOD2 immunohistochemistry analysis in COADREAD tissue(B)and Normal colorectal tissues (A). (C-D) gene CEBPB immunohistochemistry analysis in COADREAD tissue(D)and Normal colorectal tissues (C). (E-F) gene ELANE immunohistochemistry analysis in COADREAD tissue(F)and Normal colorectal tissues (E). (G-H) gene FGF21 immunohistochemistry analysis in COADREAD tissue(H)and normal Normal colorectal tissues (G). (I-J) gene TREM2 immunohistochemistry analysis in COADREAD tissue(J)and Normal colorectal tissues (I). All the data were obtained from the HPA database. COADREAD, colon adenocarcinoma/rectum adenocarcinoma. HPA, human protein atlas.