To determine the impact of gene expression levels in the normal sample group (NG) and the sepsis patient sample group (SG) on sepsis, we analyzed datasets GSE131761 and GSE54514 by gene set enrichment analysis (GSEA) (Fig. 7A and 7F, Tables S8 and S9). The analysis examined the link between gene expression and the biological processes involved, the cellular components affected, and the molecular functions exerted. In GSE131761, the differentially expressed genes were significantly enriched in pathways including the Hedgehog pathway (Fig. 7B), MAPK pathway (Fig. 7C), TP53 pathway (Fig. 7D), and Glycolysis pathway (Fig. 7E). In GSE54514, the differentially expressed genes were significantly enriched in the Hedgehog pathway (Fig. 7G), Autophagy pathway (Fig. 7H), TGF BETA pathway (Fig. 7I), Wnt pathway (Fig. 7J), and other pathways.
To explore the differences between the sepsis samples (group: Sepsis) and the corresponding normal samples (group: Normal) in datasets GSE131761 and GSE54514, we performed gene set variation analysis (GSVA) on GSE131761 (Fig. 8A, Table S10) and GSE54514 (Fig. 8B, Table S11) to calculate the differences in functional enrichment. Both datasets showed differences between the sepsis samples and the corresponding normal samples. The differential genes in GSE131761 were significantly enriched in the Glycolysis pathway, PI3K AKT pathway, Wnt BETA pathway, IL6 JAK pathway, and other pathways. The differential genes in GSE54514 were significantly enriched in the Notch pathway, Oxidative pathway, PI3K AKT pathway, TGF BETA pathway, and other pathways.
Construction of CRDEGs diagnostic model and its prognostic performance
To determine the diagnostic value of the 202 CRDEGs in the merged dataset, we used Lasso regression analysis to construct a CRDEGs diagnostic model (Fig. 9A). Lasso regression is based on linear regression. By adding a penalty term (lambda × the absolute value of each coefficient), it reduces overfitting and improves the generalization ability of the model. We also visualized the results of the Lasso regression as a variable trajectory plot (Fig. 9C). The plot shows how the gene coefficients change with the logarithm of lambda in the Lasso penalty term; the number of genes retained in the model gradually increased as lambda decreased. We then visualized the expression of the CRDEGs in the diagnostic model across groups with a forest plot (Fig. 9B). As shown in Fig. 9C, the final CRDEGs diagnostic model contained 32 CRDEGs (Table S3).
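For reference, the penalized objective described above can be written as follows. This is a sketch of the linear-regression form that the text describes. The symbols ($y_i$ for the outcome of sample $i$, $x_{ij}$ for the expression of gene $j$ in sample $i$, $\beta_j$ for the gene coefficients, $n$ samples, $p$ genes) and the $1/(2n)$ scaling are notational assumptions and are not taken from the original analysis.

```latex
(\hat{\beta}_0, \hat{\beta}) =
  \underset{\beta_0,\,\beta}{\arg\min}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\Bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of $\lambda$ shrink more coefficients exactly to zero, which is consistent with the trajectory plot: fewer genes remain in the model as $\lambda$ increases, and more enter as it decreases.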
Using the coefficients of the 32 genes obtained from the LASSO analysis, we multiplied the expression of each gene by its corresponding coefficient and summed the products to obtain the final prediction score of the CRDEGs diagnostic model for each sample. We then used the Wilcoxon signed rank test to compare the prediction scores between groups (Sepsis/Normal) in the merged dataset and displayed the results in a group comparison chart (Fig. 10A). The prediction score of the CRDEGs diagnostic model differed significantly between groups in the merged dataset (P < 0.001). Next, we divided the samples in the disease group of the merged dataset into high- and low-risk groups (High/Low group) according to their final prediction scores and drew a heat map of the 32 CRDEGs (Fig. 10B). We then used the Wilcoxon signed rank test to analyze the differences in expression of the 32 CRDEGs between the high- and low-risk groups in the disease group of the merged dataset (Fig. 10C). The results showed that 20 CRDEGs (AK1, BLVRB, CACNA1E, CD82, GBA, HBG1, HBQ1, IL1RN, LGALS3, LTB4R, MCOLN1, MTF1, PIK3R2, RPS3A, RUNDC3A, SGSH, TMEM145, VSTM1, VTI1B, XPO7) had extremely statistically significant differences (P < 0.001) in expression between the high- and low-risk groups in the disease group of the merged dataset.
Among them, CACNA1E, CD82, GBA, IL1RN, LTB4R, MTF1, SGSH, and VSTM1 were significantly up-regulated, and AK1, BLVRB, HBG1, LGALS3, MCOLN1, PIK3R2, RPS3A, RUNDC3A, TMEM145, VTI1B, XPO7, and HBQ1 were significantly down-regulated. The expression levels of FLOT2 and FOXO4 differed between the high- and low-risk groups in the disease group of the merged dataset with high statistical significance (P < 0.01), and the expression level of MAF1 differed with statistical significance (P < 0.05). Among these three genes, FLOT2 was significantly up-regulated, whereas FOXO4 and MAF1 were significantly down-regulated.
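The prediction score and the High/Low split described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the gene names, coefficients, and expression values are hypothetical, and the use of the median score as the cutoff is an assumption, since the text does not state how the threshold was chosen.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of (coefficient x expression) over the genes in the model."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def split_by_median(scores):
    """Label each sample High or Low relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {s: ("High" if v > cutoff else "Low") for s, v in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
samples = {
    "S1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "S2": {"GENE_A": 4.0, "GENE_B": 2.0},
    "S3": {"GENE_A": 6.0, "GENE_B": 0.0},
    "S4": {"GENE_A": 0.0, "GENE_B": 4.0},
}

scores = {name: risk_score(expr, coefficients) for name, expr in samples.items()}
groups = split_by_median(scores)
print(scores)
print(groups)
```

A gene with a positive coefficient raises the score as its expression increases, and a gene with a negative coefficient lowers it.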
Finally, we validated the differences in expression of the 32 CRDEGs between groups (Sepsis/Normal) in GSE25504 (Fig. 11A). The results showed that 23 CRDEGs (BLVRB, CACNA1E, CD82, CYP1B1, FLOT2, FOXO4, GBA, GRN, HBQ1, HSPA1A, IL1RN, LTB4R, MAF1, MCOLN1, MTF1, RNF10, RPL35, RPS3A, RUNDC3A, SGSH, SLC1A5, TMEM158, VTI1B) had extremely statistically significant differences in expression between groups (P < 0.001), three CRDEGs (HBG1, XPO7, SPRYD3) had highly statistically significant differences (P < 0.01), and three CRDEGs (LGALS3, TUBB2A, VSTM1) had statistically significant differences (P < 0.05). We also validated the differences in expression of the CRDEGs between groups (Sepsis/Normal) in GSE26378 (Fig. 11B). The results showed that 12 CRDEGs (CACNA1E, CD82, CYP1B1, FLOT2, GRN, IL1RN, LTB4R, MCOLN1, MTF1, RNF10, SGSH, VSTM1) had extremely statistically significant differences in expression between groups (P < 0.001). Among them, CACNA1E, CD82, CYP1B1, FLOT2, GRN, IL1RN, LTB4R, and MTF1 were significantly up-regulated, and MCOLN1 and RNF10 were significantly down-regulated. Three CRDEGs (AK1, FOXO4, LGALS3) had highly statistically significant differences in expression between groups (P < 0.01). Among them, AK1 was significantly up-regulated, whereas FOXO4 and LGALS3 were significantly down-regulated.
We then plotted receiver operating characteristic (ROC) curves for the 23 CRDEGs (see Table S4) whose expression levels differed significantly between the high- and low-risk groups in the disease group of the merged dataset. An ROC curve was drawn for each of the 23 cuproptosis-related differentially expressed genes (Fig. 12). The ROC curves in Fig. 12 show that AK1 (AUC = 0.657, Fig. 12A), BLVRB (AUC = 0.727, Fig. 12B), CACNA1E (AUC = 0.686, Fig. 12C), CD82 (AUC = 0.664, Fig. 12D), FLOT2 (AUC = 0.614, Fig. 12E), FOXO4 (AUC = 0.620, Fig. 12F), GBA (AUC = 0.704, Fig. 12G), HBG1 (AUC = 0.637, Fig. 12H), HBQ1 (AUC = 0.635, Fig. 12I), IL1RN (AUC = 0.646, Fig. 12J), LGALS3 (AUC = 0.668, Fig. 12K), LTB4R (AUC = 0.731, Fig. 12L), MAF1 (AUC = 0.600, Fig. 12M), MCOLN1 (AUC = 0.645, Fig. 12N), MTF1 (AUC = 0.713, Fig. 12O), PIK3R2 (AUC = 0.656, Fig. 12P), RPS3A (AUC = 0.696, Fig. 12Q), RUNDC3A (AUC = 0.652, Fig. 12R), SGSH (AUC = 0.676, Fig. 12S), TMEM145 (AUC = 0.643, Fig. 12T), VSTM1 (AUC = 0.651, Fig. 12U), VTI1B (AUC = 0.657, Fig. 12V), and XPO7 (AUC = 0.652, Fig. 12W) were significantly associated with high and low risk of sepsis.
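To make the AUC values above easier to interpret, the sketch below computes the area under the ROC curve for a single gene as the probability that a randomly chosen high-risk sample has a higher expression value than a randomly chosen low-risk sample, with ties counted as one half. This pairwise formulation is equivalent to the area under the empirical ROC curve. The expression values are hypothetical, and the software used for the original analysis is not specified in the text.

```python
def auc(high_values, low_values):
    """Area under the ROC curve from all (high, low) pairs; ties count as 0.5."""
    wins = 0.0
    for h in high_values:
        for l in low_values:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high_values) * len(low_values))


# Hypothetical expression values of one gene, for illustration only.
high_risk = [3.1, 2.4, 2.9, 1.8]
low_risk = [1.2, 2.5, 1.9, 1.1]

print(auc(high_risk, low_risk))  # 13 of 16 pairs favor the high-risk sample: 0.8125
```

An AUC of 0.5 corresponds to no discrimination. For a gene that is down-regulated in the high-risk group, this function returns a value below 0.5, and the discriminative ability is then 1 minus that value.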
We then performed univariate and multivariate Cox regression analysis on the expression levels of the 23 CRDEGs and constructed a Cox regression model. To judge the prognostic ability of the model, we carried out a nomogram analysis and drew a nomogram (Fig. 13A). In addition, we performed a prognostic calibration analysis on the nomogram of the univariate and multivariate Cox regression models and drew a calibration curve (Fig. 13B). Finally, we used decision curve analysis (DCA) to evaluate the clinical utility of the constructed Cox regression model (Fig. 13C). The x-axis of the DCA chart represents the threshold probability, and the y-axis represents the net benefit. The model is judged by the range of x-values over which the model line stays consistently above both the All positive line and the All negative line. The wider this range, the better the model performs.
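The net benefit plotted on the y-axis of a decision curve can be sketched as follows, using the standard definition: the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The predicted probabilities and labels are hypothetical, and the function names are illustrative; the text does not state how the original DCA was implemented.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating samples whose predicted probability >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_all_positive(labels, threshold):
    """Net benefit of the 'All positive' strategy (every sample is treated)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted probabilities and outcomes, for illustration only.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

threshold = 0.5
model_nb = net_benefit(probs, labels, threshold)
all_nb = net_benefit_all_positive(labels, threshold)
none_nb = 0.0  # the 'All negative' strategy treats no one, so its net benefit is 0

print(model_nb, all_nb, none_nb)
```

Repeating the calculation over a grid of thresholds gives the three lines of the decision curve. The model is useful at the thresholds where its net benefit exceeds both reference strategies.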