Data Sources
The Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) database, which is freely accessible to the public, was used to derive all data used in our study. The atherosclerosis whole genome-wide expression profile was retrospectively derived from the GEO database using the R package ‘GEOquery’. GSE28829 comprises 16 samples from patients with atherosclerosis and 13 normal control samples. GSE43292 comprises 32 atherosclerosis samples and 32 control samples. The R package “sva” in ComBat method was applied to correct batch effects of non-biotechnical bias (11). The degree of correction was examined using principal component analysis (PCA). The present study honored the data access policies of each database.
DEGs Associated with Atherosclerosis
The “limma (version 3.50.0)” (14) package in R was adopted to screen DEGs in atherosclerosis (n = 48) and normal (n = 45) samples, with thresholds of p-adjusted < 0.05 and |log2FC| >0.5 for inclusion in subsequent evaluations. Next, the R package “pheatmap” was used to generate heatmaps by means of Complete Linkage Clustering and Euclidean Distance.
WGCNA and Significant Module Recognition
The WGCNA algorithm implemented in the WGCNA_1.70-3 package was adopted to build the co-expression networks (15). Gene expression profile similarity was assessed using Pearson’s correlation coefficient. A scale-free network was obtained by weighing the correlation coefficients between genes via a power function. Using the R package ‘PickSoftThreshold,’ the similarity of co-expression was increased to power β = 24, thus building a weighted adjacency matrix. A gene module refers to a set of genes that are tightly interconnected in a co-expression network. In WGCNA, gene modules were identified using hierarchical clustering, and the modules are indicated in color. The different modules were screened using dynamic tree cut. During the module selection process, the adjacency matrix (a measure of topological similarity) was shifted to a topology overlay matrix (TOM), and cluster analysis was performed to detect modules. We implemented Pearson’s correlation analysis to calculate the correlation between ME (referring to the entire level of gene expression in each module) and GLN. Modules that were markedly related to GLN were acquired. Heatmap plots of topological overlap in the gene network were used to visualize the structure of the co-expression module. Hierarchical clustering dendrograms of MEs and heatmap plots of corresponding eigengene networks summarized the associations among modules. The GLN-associated DEGs were obtained from the intersections of DEGs and GLN-associated module genes.
GSEA
As a computing approach, GSEA (13) is used to examine if two biological states exhibit statistically significant and consistent differences in the a priori-defined set of genes. Using the R package “clusterProfiler (version 4.2.2),” GSEA was implemented on an ordered list of entire genes based on their log2FC values. Each analysis was conducted for 1,000 times of gene set permutations. We used c2.cp.kegg.v7.5.1.symbols as the reference gene collection from MSigDB (12, 16). Gene sets with P-adjusted < 0.05 were deemed as significantly enriched.
GSVA
GSVA refers to an unsupervised and non-parametric approach that allows the utilization of gene expression profiles to evaluate connections between biological pathways and gene characteristics. To investigate the differences in biological function between normal and disease groups, GSVA was performed on “c2.cp.kegg.v7.5.1.symbols” using the R package “GSVA (version 1.42.0).” For result visualization, the R package “pheatmap (version 1.0.12)” was used. MSigDB (http://software.broadinstitute.org/gsea/msigdb) was employed to download 50 hallmark gene sets as reference. GSVA scores for the gene sets in various samples were computed using the ssGSEA function in the GSVA package. Both groups were compared in terms of GSVA score differences in various gene sets using the Limma package.
GO/KEGG Pathway Enrichment Analyses
GO (17) divides gene functions into three parts: CC, BP, and MF. KEGG [PMID: 10592173] refers to an integrated bioinformatics platform for the systematic analysis of markedly altered metabolic pathways enriched in gene lists. The R package “clusterProfiler (version 4.2.2)” (18) was employed for GO/KEGG enrichment analyses (p < 0.05) on the GLN-associated DEGs.
PPI Network Establishment
We used the online database Search Tool for the Retrieval of Interacting Genes (STRING) (19) to build PPI networks, setting 0.7 as the cut-off value of the interaction score. PPI network visualization was implemented using the Cytoscape (20) software. In the PPI network, hub genes were detected via the Cytoscape plugin cytoHubba (21). CytoHubba ranking methods were employed to screen the top 100 genes; hub genes were identified based on intersections from 12 approaches (Betweenness, BottleNeck, ClusteringCoefficient, density of maximum neighborhood component, edge percolated component, Eccentricity, maximum neighborhood component, maximal clique centrality, node connect closeness, node connect degree, Radiality, and Stress). The 12 methods in the CytoHubba plugin were used sequentially.
GeneMANIA
The GeneMANIA website (http://genemania.org) (22) was used to establish the PPI networks of hub genes. The website allows predictions of associations of functionally similar genes with hub genes, comprising protein–protein and protein–DNA interactions, pathways, biochemical reactions, and co-localization and co-expression networks. Based on the R package “clusterProfiler,” we obtained gene functional annotations of GO terms and KEGG pathway analyses.
ROC Curve
The ROC curve assesses diagnostic test performance. The curve is plotted with sensitivity as the ordinate and 1 − specificity (false positive rate) as the abscissa. AUC is a metric frequently acquired from the ROC plot of sensitivity against 1 − specificity. After using the R package “pROC” [PMID: 21414208] to plot ROC curves and determine AUC, the signature genes were screened and diagnostic values were assessed. AUC values ranged between 0.5 (completely random classifier) and 1 (perfect classifier). Generally, an AUC value of 0.5 refers to no predictive value, 0.6–0.8 to acceptable accuracy, 0.8–0.9 to excellent accuracy, and a value over 0.9 denotes outstanding accuracy.
Immune Infiltration Analysis
SsGSEA (23) refers to a GSEA extension. It computes the enrichment score for each sample paired with a gene set. The score indicates the absolute gene set enrichment within a specified dataset. SsGSEA differs from group-based (e.g., disease vs. normal) GSEA. Instead, each sample can be scored for the corresponding set of genes in ssGSEA.
We adopted the Tumor and Immune System Interactions Database (http://cis.hku.hk/TISIDB/index.php) (24) to derive 28 immune cells: central memory CD4 and CD8 T cells, effector memory CD4 and CD8 T cells, activated CD4 and CD8 T cells, T follicular helper cells, gamma delta T cells, T helper cells (Th1, Th2, and Th17), regulatory T cells, activated B cells, immature B cells, memory B cells, NK cells, CD56bright and CD56dim NK cells, myeloid derived suppressor cells, NK T cells, activated dendritic cells, plasmacytoid dendritic cells, eosinophils, immature dendritic cells, mast cells, macrophages, monocytes, and neutrophils. The relative enrichment score of each immunocyte was quantitatively determined from the gene expression profile of each sample. Variations in the ICI levels among samples in atherosclerosis and control groups were illustrated using the R package ggplot2 (version 3.3.6) (25).
Statistical Methods
The R software v4.1.2 was used to carry out the statistical analyses. The correlation between two parameters was inferred using Spearman’s correlation analysis. Inter-group differences were compared using Wilcoxon tests. Multi-group (three or more groups) differences were compared using the Kruskal–Wallis test. Statistical significance was set at two-sided p < 0.05.