Identification of key candidate biomarkers and pathways in atherosclerosis

Atherosclerosis is the leading cause of cardiovascular disease worldwide for which lacks effective prevention and therapeutic strategy. Therefore, clinical indicators for early diagnosis and screening are in great need. The present study aimed to elucidate the key genetic signatures and pathways identifying the key candidate biomarker in atherosclerosis by integrative bioinformatics analysis combining with experimental assay. The gene expression profiles (GSE30169, GSE6584) were achieved from the Gene Expression Omnibus database. Functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed to examine the biological functions of identified differentially expressed genes (DEGs). A protein-protein interaction (PPI) network was mapped using Cytoscape software. 91 DEGs were identified, including 68 up-regulated genes and 23 down-regulated genes. Functional enrichment analysis indicated that DEGs genes were significantly enriched in ferroptosis, TNF signaling pathway, IL-17 signaling pathway. 12 nodes with the highest degrees were selected as hub genes. CCL2, CXCL1, IL6, and DUSP1 can serve as a sensitive diagnostic indicator for the early stage of atherosclerosis; CEBPB and HMOX1 can serve as a diagnostic indicator for diabetic atherosclerosis; TRIB3 is a sensitive marker indicating atherosclerosis risks in diabetic women group. In conclusion, we have identified key candidate genes that indicate the diagnosis of patients with atherosclerosis, and these genes may serve as potential therapeutic or drug development targets for atherosclerosis.


Introduction
Cardiovascular diseases (CVD) are considered as the leading causes of morbidity and mortality in the world [1]. The exact cause of atherosclerosis isn't known. Certain traits, genetic and environmental factors raise the risks for developing atherosclerosis. Significant risk factors for atherosclerosis included diet, smoking, exercise, and infectious factors, and the candidate gene studies had shown that some of these factors interacted with genetic factors [2,3].
As the primary pathological basis of CVD, atherosclerosis is a progressive inflammatory disease of the arteries caused by endothelial cell injury. Over the past three decades, it has become clear that oxidized low-density lipoproteins (oxLDLs) or oxidized lipids trapped in the vessel wall induce the overlying endothelial cells (ECs) to express adhesion molecules and cytokines that promote the recruitment of monocytes and lymphocytes to the vessel wall and brought inflammatory disorder [4,5]. However, the mechanism by which oxidized lipids trigger a wide range of responses was still poorly understood. In addition, endothelial cell is the front line to defend all the stress in the vessel, and it is the basis for studying the progress of atherosclerosis. Therefore, in order to explore atherosclerosis, endothelial cell is crucial for seeking the early sensitive diagnostic marker and prevention strategies.
The development of gene microarray technology (identification of the changes in mRNA expression profiles), has been widely applied in the past decades [6]. It provided a unique opportunity to assess atherosclerosis's occurrence and development with lower expenses, more straightforward operation, high-throughput volume [7]. In recent years, numerous studies on the gene expression profile of atherosclerosis had revealed hundreds of differentially expressed genes (DEGs), which provided a large amount of practical information for gene regulatory network analysis [7,8]. However, comparative DEGs analyses reported by independent research appear to have little substantial overlap, and the results are limited or inconsistent due to tissue or sample heterogeneity. Thus, there are no reliable biomarkers to distinguish atherosclerotic from non-atherosclerotic samples. The integrated bioinformatics approaches combining with expression profiling analysis will be innovative and may solve the disadvantages.
In this study, we analyzed in-depth on two microarray datasets, GSE30169 (Romanoski et al., 2011) and GSE6584 (Gong et al., 2007) from the Gene Expression Omnibus database (GEO), in which the samples were collected from oxidized 1-palmitoyl-2-arachidonoyl-snglycero-3-phosphatidylcholine (Ox-PAPC, an component in atherosclerotic lesion) treated human aortic endothelial cell (HAECs) and human microvascular endothelial cells (HMECs) [5]. Subsequently, extracted DEGs and integration of DEGs Protein-protein interaction (PPI) network and modular analysis were applied to identify hub genes in atherosclerosis. Identifying sensitive markers, DEGs and enriching their biological functions and key pathways will provide more accurate, practically reliable biomarkers for early diagnosis and individualized prevention and therapy as well as drug targets discovery.

Data source and screening of DEGs
The GSE30169 and GSE6584 gene expression profiles and its relevant platform annotation files were obtained from GEO database. The GSE30169 dataset was submitted by Professor Romanoski CE and

Data preprocessing and screening of DEGs
The DEGs were screened by using GEO2R to identify the different expression profiles between the HAECs and HMECs treated by 40ug/ml Ox-PAPC with control samples. In this study, DEGs were defined with cut-off criteria of p < .05 and |log2 fold-change (FC)| > .6.

Functional annotation and pathway enrichment
Gene ontology analysis (GO) is a standard useful method for annotating genes and gene products to identify natural biological phenomena for high-throughput genomic or transcriptome data [10,11]. To describe gene product attributes, GO provides three categories of defined terms, including molecular function (MF), cellular component (CC), and biological process (BP) [12]. Furthermore, the Kyoto Encyclopedia of Genes and Genomes (KEGG) is an integrated database resource with a powerful feature in the systematic analysis of gene functions, linking genomic information with higher-order functional information [13]. The above two analysis were obtainable in the Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/; date of access, 08/05/2020), which is also a bioinformatics data resource composed of an integrated biology knowledge base and analysis tools to extract biological meanings from large quantities of genes and protein collections through a novel agglomeration algorithm [14]. In the present study, GO term and KEGG pathway analysis was performed using the DAVID online tool and P < .05 was set as the cut-off criterion. Meanwhile, another available database REACTOME (available online: http://www.reactome.org) was carried out as a supplement of KEGG.

PPI network construction and analysis of modules
Search Tool for the Retrieval of Interacting Genes (STRING) database (http://string-db.org/) is an online software designed to evaluate protein-protein interaction (PPI) relationships between DEGs, including direct (physical) and indirect (functional) associations [15]. In the present study, STRING was used to map DEGs, and experimentally validated interactions with a combined score > .4 were selected as significant. Then, the PPI network was visually analyzed by Cytoscape 3.6.0 [16]. To screen the hub genes, a node degree of ≧10 was selected as the threshold. Furthermore, the crucial modules' hub genes were further mapped to GO terms and KEGG pathways for functional analysis.

ROC curve analysis
GEO profiles were searched to analyze further the relationship between the hub genes and type 2 diabetes-related diseases. ROC (receiver operating characteristic) curves were constructed by the value of every sample of GDS to compare the diagnostic value of the hub genes in the related diseases.
The diagnostic accuracy of the ROC curve was represented by the area under the curve (AUC). AUC values closer to 1 indicated that the diagnostic accuracy was reliable.

Type 2 diabetic atherosclerosis mice model and Real-time quantitative PCR
All experiments of this study were performed in adherence to the NIH Guidelines on the Use of

Laboratory Animals and approved by the Thomas Jefferson University and Shanxi Medical University
Committee on Animal Care. Apoe knockout mice (male Apoe-/-mice, 8-10 weeks old) were utilized in this study. The mice were either fed the normal diet (ND), or high-fat diet (HFD) (60% kcal fat, 20% kcal protein, 20% kcal carbohydrate, Cat #D12492; Research Diets) for 20 weeks to induce atherosclerosis according to the previous study [17]. Total RNA was extracted from Apoe -/mice's artery using Trizol reagent (Invitrogen, Carlsbad, CA). Reverse transcription of 1μg total RNA samples was carried out by using RTScript cDNA synthesis kit (Takara) according to manufacturer's instructions. Real-time PCR reactions were prepared using SYBR-Green Master mix (Thermo Fisher Scientific) in a QuantStudio5 Real-Time PCR Detection System (Applied Biosystems). The relative amount of mRNA transcripts was quantified using the △Ct method. The average Ct obtained in non-treated cells was used as a calibrator and 18s gene was used as the reference for normalization.
Sequences of the forward (For) and reverse (Rev) primers were purchased from Integrated DNA Technologies (Table 5).

Statistical analysis
Quantitative results are expressed as mean±SEM. Comparisons between two groups were analyzed by t-test, P values less than 0.05 were considered statistically significant. All statistical analyses were performed via GraphPad Prism 8.

Ox-PAPC
Gene expression profiles of GSE30169 and GSE6584 were obtained from NCBI-GEO free database.  Table 1. Therefore, the above results indicated that most of the DEGs were significantly enriched in cytoplasm mitochondrial, binding, and positive regulation of RNA polymerase II involved transcription.

Signaling pathway enrichment analysis
KEGG PATHWAY, Reactome, Panther, and Gene Ontology signaling pathway analysis tool were utilized to conduct DEGs functional and signaling pathway enrichment assay. As illustrated in Figure   3 and Table 2, the results indicated that the DEGs were significantly enriched in cytokine-cytokine receptor interaction, TNF signaling pathway, Influenza A, legionellosis, transcriptional dysregulation in cancer, protein processing in the endoplasmic reticulum, NOD-like receptor signaling pathway, Salmonella infection, mineral absorption signaling pathways. Furthermore, to clarify the relationship among signaling pathways, we analyzed the correlation among the pathways through deepen analyzing KEGG process in ClueGO online database (apps.cytoscape.org/apps/cluego). The results demonstrated that all DEGs were primarily associated with the ferroptosis, TNF signaling pathway, IL-17 signaling pathway, mineral absorption, prion diseases ( Figure 3B, 3C).

Key candidate genes and pathways identified through the construction of DEGs PPI network and modular analysis
Using STRING online database (http://string-db.org) and Cytoscape online analyses tool, PPI network with 65 nodes and 168 edges was visually constructed. Total of 91 DEGs (68 up-regulated and 23 down-regulated genes) commonly altered DEGs was filtered into the DEGs PPI network complex with a PPI score of > 0.4 ( Figure. 4). In the PPI network, the most significant 12 node degree genes with a degree of ≥ 10 were regarded as key genes ( Figure. 4 and Table 3 (Table 4), in the BP category. In the CC category, the 12 genes were mainly enriched in CHOP-C/EBP complex, extracellular space, and caveola (Table 4). In the MF category, the analysis revealed that the genes were mainly related to chemokine activity, protein homodimerization activity, enzyme binding, receptor binding, and protein binding (Table 4).
Pathway enrichment revealed that the 12 genes were mainly enriched in the TNF signaling pathway, Legionellosis, NOD-like receptor signaling pathway, and transcriptional misregulation pathway ( Table 4).

Validation of the DEGs in atherosclerosis mice model
To confirm the reliability of the identified DEGs from the two databases, we verified the RNA expression level of hub genes in the aorta isolated from the normal diet (ND) or high-fat diet (HFD) feed Apoe -/mice (atherosclerosis). As shown in Figure 5B, the expression of IL6, CXCL1, CCL2, TRIB3 DUPS1, DDIT3, CEBPB and HMOX1 were ranked on the top up-regulated in Apoe -/-HFD compared to Apoe -/-ND. The total consistency of the up-and down-regulated genes was 90.7%, suggesting that our results are reliable.
Taken together, this data suggests CCL2, CXCL1, IL6 and DUSP1 can serve as a diagnostic indicator for obesity, CEBPB and HMOX1 can serve as a diagnostic indicator for type 2 diabetes, TRIB3 is a sensitive marker for indicating risks of diabetic women group.

Discussion
Numerous studies have been conducted to reveal the cause and underlying mechanisms of atherosclerosis formation and progression. However, the incidence and mortality of atherosclerosis remain high worldwide, because tissue differentiation and most studies focus on a single genetic event or the results are spawned from a single cohort study [19]. Our study integrated two expression profile datasets from different groups on vascular cells, the basis of diabetic macroangiopathy, utilized bioinformatics methods to analyze these datasets deeply, and identified 91 changed DEGs. Then DEGs PPI network complex was developed, and 65 nodes were identified with 168 edges. Among them, we further identified CCL2, CXCL1, IL6 and DUSP1 can serve as a diagnostic indicator for obesity, CEBPB and HMOX1 can serve as a diagnostic indicator for type 2 diabetes, TRIB3 is a sensitive marker for indicating risks of the diabetic women group, which all the high-risk factors for atherosclerosis.
The changed DEGs were classified into three groups: molecular functions (MF), biological progress (BP) and cellular component (CC) groups, by GO term. Concerning BPs, the DEGs of atherosclerosis were mainly included in positive regulation of transcription from RNA polymerase II promoter, signal transduction, and immune response. It is well known that inflammation is an essential driver of atherosclerosis, and complex immune reactions drive the inflammation in the vascular wall in response to an atherosclerotic microenvironment [20]. There was compelling evidence emerging that the immune system and diabetes play a key role in human atherosclerotic CVD [21,22] [23,24] [25].
The present study, for the first time, revealed that the genes IL6 and CXCL1 involved in the crosstalk between endothelial cells and immune cells, such as driving the progression of atherosclerosis.
Next, KEGG pathway analysis was conducted by DAVID, and confirmed in ClueGO (plug-in of Cytoscape). The results showed that the 91 up-and down-regulated genes were enriched in the ferroptosis, TNF signaling pathway, IL-17 signaling pathway, mineral absorption, prion diseases.
Ferroptosis is a recently discovered regulated form of necrosis pathway characterized by iron dependence and aggravation of lipid peroxidation [26][27][28][29]. Bruni reported that the utility of inhibitors of ferroptosis was capable of reducing intracellular lipid peroxidation [30]. This study found that GCLM, PRNP, SLC3A2, HMOX1, and FTH1 were up-regulated, indicating ferroptosis involvement.
The TNF signaling pathway plays a crucial role in the inflammation. In our study, we found that the regulated cytokines, such as IL6, CCL2, CXCL1, and CXCL3 were involved in TNF signaling pathway. Of note, Interleukin-6 (IL-6) gene was first shown to regulate the acute phase response associated with inflammation [31] and considered as a mediator of inflammation, immune response, and hematopoiesis [32]. Various studies have demonstrated that interleukin-6 (Il-6) is an upstream inflammatory cytokine that plays a central role in the downstream inflammatory response responsible for atherosclerosis [33,34]. In gene knockout models, only the IL-6 knockout animals showed impaired acute-phase response [35,36]. Indeed, increased Il-6 concentrations found in downstream acute phase reactants were independent risk factors to predict overall mortality and cardiovascular mortality at 5 years in atherosclerosis [37]. Taking these results into account, IL-6 was an upstream regulator that played a central role in propagating the downstream inflammatory response responsible for atherosclerosis. Furthermore, by constructing the PPI network, a number of key genes were identified that might be useful in future therapeutic studies on atherosclerosis. Notably, key nodes in the PPI network and genes in the significant modules, including CXCL8, CXCL1, CCL2, may have specific contributions to the occurrence and development of atherosclerosis. CXCL8, also known as IL8, is an inflammatory CXCL: C-X-C motif chemokine ligand 1 CCL2: C-C motif chemokine ligand 2 factor, known to play an important role in the development of atherosclerosis [38]. High levels of IL-8 are associated with an increased risk of coronary artery disease [39,40]. CXCL1, a ligand of CXCR2, plays a vital role in atherogenic monocyte recruitment and other chemokines [41]. In the early stage of atherosclerosis, CXCL1 in the vascular wall promotes macrophages' aggregation and induces monocytes to arrest, thereby enhancing atherosclerosis [42]. The up-regulation of immobilized chemokine CCL2 released by endothelial or myeloid cells on arterial vessels can trigger cell arrest and promote leukocyte migration to atherosclerotic lesions [43]. Winter reported that Circulating myeloid cells deposit CCL2 rhythmically on the arterial endothelium and rhythmic arterial myeloid cell adhesion is triggered by the CCL2-CCR2 axis [44].
Next, to explore the difference between humans and animals, we analyze the RNA expression of hub genes from the artery of Apoe -/mice fed ND and HFD. The expression of IL6, CXCL1, CCL2, TRIB3, DUSP1, DDIT3, CEBPB and HMOX1 were up-regulated in HFD Apoe -/compared to ND mice. However, for CXCL15 (mouse homolog of human IL-8), no CT value was detected; this could be the difference between human and mouse. Finally, we searched the GEO profiles and found that some of the hub genes were related to obesity and type 2 diabetes. The area under the curve of CCL2, CEBPB and HMOX1 was 0.930, 0.877 and 0.846, respectively. This suggested that these genes could be the diagnostic indicator for diabetic atherosclerosis. However, further biological experimental evidence is required to determine the identified gene's causative function in atherosclerosis.
In conclusion, we conducted in-depth analysis of the DEGs identified in the human endothelial cells with Ox-PAPC compared with the matched without Ox-PAPC, which are involved in the development of atherosclerosis. These DEGs and key nodes identified in the PPI network constructed by genes involved in important modules may play a vital role in atherosclerosis occurrence and development. Furthermore, these results may provide valuable clues for investigating the pathogenesis and drug targets of atherosclerosis.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declared no conflict of interest.    Results were expressed as mean ± SD. n = 3. **p < 0.01 vs. Apoe -/-ND. C-F, ROC curve analysis revealed that CCL2, CXCL1, IL6 and DUSP1 distinguished patients with and without obesity. G-H, ROC curve analysis revealed that HMOX1 and CEBPB distinguishes patients with and without diabetes. I, ROC curve analysis revealed that TRIB3 distinguishes women patients with and without obese diabetes. J, AUC of hub genes. PPI, protein-protein interaction network. Apoe -/-, Apoe knockout; ND, normal diet; AUC, the area under of curve.