Identification of DEGs in the GSE180081 cohort
The study procedures are outlined in a flowchart in Fig. 1. The GSE180081 dataset obtained from the GEO database comprised 96 samples (48 from patients with CAD and 48 from those without). Of the 44,265 genes represented in the GSE180081 dataset, 943 were differentially expressed (CAD-DEG), of which 645 were upregulated and 298 were downregulated (LogFC > 1; P < 0.05) (Fig. 2a, b).
For the CAD group, we clustered the training set based on its relationship with the cuproptosis gene dataset (from k = 2 to k = 6). Based on the results of the cluster analysis, we divided the samples into two groups (Cu_group1, Cu_group2), which comprised 21 and 27 samples, respectively. (Fig. 2c). Of the total of 44,265 genes represented in the two groups, 1,108 were differentially expressed (Cu-DEG), of which 793 were upregulated and 369 were downregulated (LogFC > 1; P < 0.05) (Fig. 2d, e). The gene set at the intersection of CAD-DEG with Cu-DEG, referred to as Co-DEG, comprised 818 differentially expressed genes, of which 231 were upregulated and 587 were downregulated (Fig. 2f).
Functional analyses of the Co-DEGs
To elucidate the biological functions and potential pathways of the Co-DEGs, GO enrichment and KEGG pathway analyses were performed. As shown in the scatter graph of the GO analysis (Fig. 3a), the differential genes were principally derived from the positive regulation of cell activation, positive regulation of leukocyte activation, actin filament organization, leukocyte cell-cell adhesion, and positive regulation of lymphocyte activation pathways, which implies that leukocyte activation and interactions are important in this interaction. The KEGG analysis showed that the Co-DEGs were principally associated with infectious diseases, including COVID-19 (Fig. 3b). Thus, when the results of the two analyses are combined, the differentially expressed genes appear to be principally associated with immune function. GSEA analysis showed that these genes were principally derived from the Apical junction, Complement, and Hypoxia pathways (Fig. 3c). To better visualize the key differentially expressed genes in the Co-DEG set, we used a chordal diagram to show the genes that are involved in the pathways identified in the GO analysis (Fig. 3d).
Co-DEGs identified to be diagnostic for CAD
LASSO regression screening was used to identify five genes that were associated with the diagnostic features of CAD: HIST1H4E, IL6ST, RN7SKP45, LST1, and SNORD50B (Fig. 4a, b). In addition, these five genes could be used to distinguish the CAD and control groups in the LASSO regression model (Fig. 4c).
We next used these five genes and the R package glm to construct a logistic regression model and analyzed this using ROC analysis. This showed that the model was capable of satisfactorily identifying CAD in the test set (AUC 0.83, Fig. 4d). To further verify the accuracy of this model, we used ROC curve analysis of the GSE180082 dataset, which yielded an AUC of 0.62 (Fig. 4d).
Immune landscape analysis
We combined the risk groups identified using the proposed model with the expression matrix to compare the immune cell infiltration of the groups using three methods: ssGSEA, estimate, and Cibersort. The ssGSEA analysis showed differences in the numbers of 14 of the 28 types of immune cell, and all of these except the CD4+ T cells were present in larger numbers in the high-risk group (Fig. 5a, b). Each of the five identified genes is closely associated with the function of multiple immune cells, and in particular, the LST1 gene is associated with 20 types of immune cell, and therefore may be a key candidate for future basic research (Fig. 5c).
The immunization scores obtained using the estimate package showed differences between the high-risk and low-risk groups, with a higher overall immunization score in the high-risk group, consistent with the ssGSEA findings (Fig. 5d). Cibersort analysis also showed differences in immunity between the two groups, with the high-risk group showing a lower number of memory CD4+ T cells (Fig. 5e). Thus, although the exact immune statuses of the two groups remain to be verified in laboratory experiments, we have shown that there are differences between the subgroups of the diagnostic model constructed using the cuproptosis gene set.
The identified genes are closely associated with CAD-related pathways
To ensure that multicollinearity during the logistic regression analysis did not interfere with the findings, we performed correlation analysis on the five genes used to construct the model (Fig. 6a, b), which suggested that there were no close correlations between them, and implies that the model was robust. To further explore the function of these five genes (Table 2), we performed GSEA analysis of each of them (Fig. 6c, d, e, f, g), and found that they are associated with pathways involved in cardiovascular diseases and the regulation of inflammation, including Leukocyte transendothelial migration, Type Ⅰ diabetes mellitus, Viral myocarditis, Arrhythmogenic right ventricular cardiomyopathy, and Glycine, serine, and threonine metabolism. Therefore, the roles of these five genes in CAD should be further explored.
Table 2
Results of the limma analysis, demonstrating low expression of five genes
Gene | logFC | AveExpr | t | P-value | adj. P-value | B | Change |
HIST1H4E | −1.28 | 6.82 | −8.44 | 3.65E − 13 | 1.66E − 11 | 19.16 | DOWN |
IL6ST | −1.26 | 6.70 | −7.78 | 8.85E − 12 | 1.74E − 10 | 16.23 | DOWN |
RN7SKP45 | −1.39 | 6.89 | −7.7 | 1.28E − 11 | 2.38E − 10 | 15.75 | DOWN |
LST1 | −1.42 | 7.37 | −7.66 | 1.62E − 11 | 2.90E − 10 | 15.41 | DOWN |
SNORD50B | −1.21 | 6.70 | −6.93 | 5.17E − 10 | 6.40E − 09 | 12.48 | DOWN |
Associations of the identified genes with immunosuppressant, immune activator, and MHC-related genes
We next analyzed the relationships of well-established immunosuppressant genes (Fig. 7a), immune activator genes (Fig. 7b), and MHC genes (Fig. 7c) with the five identified genes. We found that the identified genes closely correlated with BTLA and VTCN1 among the immunosuppressant genes, CD276 and TMEM173 among the immune activator genes, and HLA-DQA1 and TABPBP among the MHC-related genes. Subsequently, each group of correlations was examined, HIST1H4E was found to be closely associated with BTLA and TMEM173, and LST1 was found to be closely associated with TMEM173 and TAPBP (Fig. 7d). Therefore, the expression of HIST1H4E and LST1 may be regulated by these genes, providing a basis for further studies.
Creation of a ceRNA network based on the identified genes and the prediction of drugs that would target these genes
We constructed a ceRNA network based on the three genes using information from the MicroCosm and starBase databases and the multiMiR package. The network contained 59 nodes (3 genes, 47 miRNAs, 9 lncRNAs) and 59 edges (Fig. 8b). We identified many miRNA regulatory networks targeting LST1, with 27 of the miRNAs being involved in regulation, and the principal associated lncRNAs including DANCR, NEAT1, and SNHG22. The regulatory network of IL6ST was also found to be diverse, with 15 miRNAs being found to be involved in regulation and five associated lncRNAs, including let-7b-5p and miR-628-5p. Six miRNAs were found to be involved in the regulation of HIST1H4E, among which miR-148a-3p was found to have two related lncRNAs: NUTM2A-AS1 and AL049840.4.
We also attempted to identify drugs that may target the identified genes using the DGIdb database, and the results obtained using the Cytoscape software are shown in Fig. 8a. We identified three drugs that target IL6ST and one that targets LST1. These findings provided a basis for further study of these genes in the future.
RT-qPCR analysis of the expression of the identified genes in mononuclear cells obtained from the peripheral blood of patients with CAD and healthy individuals
We collected fresh blood from 12 patients with CAD and 12 healthy people who were undergoing a physical examination. RNA was extracted from peripheral monocytes and RT-qPCR was performed to measure the expression of LST1, IL6ST, and HIST1H4E. We found that the expression of LST1 and HIST1H4E was significantly lower in the CAD group (Fig. 9), which is consistent with the results of the bioinformatic analysis. In contrast, IL6ST was expressed at a significantly higher level in the CAD group (Fig. 9), suggesting that further research regarding this gene is merited. Thus, we have confirmed that LST1, HIST1H4E, and IL6ST might represent biomarkers of CAD and provide a basis for further study of these genes in the future.