Bioinformatics analysis of common differential genes of coronary artery disease and carotid atherosclerosis

Coronary artery disease (CAD) is one of the most fatal diseases in the world and seriously threatens human health. Studies have demonstrated that the presence of carotid plaque is related to the risk of CAD, but the common differential genes and mechanisms shared by these two conditions remain unclear. Our study identified the common differential genes between carotid atherosclerotic tissues and blood samples of CAD patients, aiming to find promising biomarkers for CAD prediction and diagnosis. We obtained the GSE100927 and GSE56885 datasets from the Gene Expression Omnibus (GEO) database. After screening their mutual differentially expressed genes (DEGs), we performed Gene Ontology (GO) analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, and protein-protein interaction (PPI) analysis to identify hub genes shared by the two conditions. We found that both CAD blood samples and carotid atherosclerotic plaque tissues were related to immune response, inflammatory response and cell chemotaxis. Following PPI network construction, MCODE analysis extracted one subnetwork, comprising CCR5, CCR2, CXCR4 and C5AR1, which were regarded as the hub genes of the two datasets. This indicates that CCR5, CCR2, CXCR4 and C5AR1 may be potential candidate biomarkers for CAD prediction in patients with carotid plaques.


Introduction
Clinical studies have shown that carotid atherosclerotic plaque is closely related to the risk of CAD, and that carotid ultrasound findings can serve as an independent predictor of CAD 1,2 . To improve the prediction of CAD, Vijay Namb et al 3 found that adding common carotid artery intima-media thickness (CCA-IMT) and plaque information to traditional risk factors could improve risk prediction. With the increasing use of bioinformatics methods in the medical field, a large number of researchers have used genetic information to explore genetic differences in patients with carotid atherosclerosis or CAD and to screen for disease-related markers. A genome-wide association study (GWAS) found that CCA-IMT and plaques are also genetically related to CAD and stroke 4 . However, most studies have focused on screening differential genes in carotid atherosclerosis or CAD alone, ignoring the relationship between these two conditions. Therefore, we aimed to jointly explore the common differential genes between carotid atherosclerotic tissues and blood samples of CAD patients using two microarray datasets downloaded from the GEO database, in order to investigate the molecular mechanisms underlying the co-occurrence and development of the two diseases and to provide candidate biomarkers for CAD prediction in patients with carotid plaques.

Datasets validation
The main purpose of standardizing gene expression profiles is to eliminate deviations in expression caused by experimental technology. Based on the box plots of the two datasets (Figure 1A and Figure 1B), we considered the data in GSE100927 and GSE56885 to be comparable and able to reflect genuine differences in biological function. Principal component analysis (PCA) condenses the information in the data into new variables (PC1, PC2, and so on). In the PCA cluster plot of PC1 and PC2 for GSE100927 (Figure 1C), the case group (carotid atherosclerotic tissues) and the control group (control carotid artery tissues) were clearly separated. In the PCA cluster plot of GSE56885 (Figure 1D), the control group (healthy control) samples were closely correlated with one another, and the samples in the case group (CAD patients) showed a strong correlation as well.

DEGs in GSE100927, GSE56885, and their common DEGs
Using NetworkAnalyst 3.0, we screened the DEGs in GSE100927 and GSE56885 and drew volcano plots to visualize their distribution (Figure 2A and Figure 2B). There were 733 DEGs between the carotid atherosclerotic tissues and the control carotid artery tissues in the GSE100927 dataset. In the GSE56885 dataset, which contained blood samples of healthy controls and CAD patients, a total of 183 DEGs were identified. The Venn diagram in Figure 2C shows that 23 genes were differentially expressed in both the GSE100927 and GSE56885 datasets.

GO enrichment analysis and KEGG pathway analysis
We applied DAVID to perform GO functional annotation and KEGG analysis on the GSE100927 DEGs, the GSE56885 DEGs and their common DEGs. Table 1 lists, for GSE100927, the top five functional enrichment results for BP, CC, MF and KEGG analysis, together with the P value and the DEGs corresponding to each term (the complete gene list is in Supplementary Table 1). The biological process differences between the carotid atherosclerotic tissues and the control carotid artery tissues were mainly related to immunity and inflammation, including "immune response", "inflammatory response" and "innate immune response". The main cellular component differences were "plasma membrane" and "integral component of plasma membrane". In terms of molecular function, the main differences between the two groups were "MHC class II receptor activity" and "structural constituent of muscle". In KEGG analysis, DEGs were mainly enriched in the "Staphylococcus aureus infection" and "Leishmaniasis" pathways. The results for GSE100927 are displayed in a circle diagram (Figure 3A) and a network diagram (Figure 3D).

Table 2 shows the GO enrichment and KEGG analysis results for GSE56885 (the complete gene list is in Supplementary Table 2). In Biological Process, the enrichment results for the blood samples of healthy volunteers and CAD patients were also concentrated on immunity and inflammation. In addition, "chemotaxis" and "chemokine-mediated signaling pathway" were prominent differences between the two groups. In Cellular Component, "cytoplasm" and "nucleus" were the top two enriched terms in the dataset. In Molecular Function, the differences between the two groups were mainly concentrated on "protein binding" and "transcription factor binding". In KEGG analysis, the DEGs were mainly enriched in the "NOD-like receptor signaling pathway" and the "TNF signaling pathway". The results for GSE56885 are shown in a circle diagram (Figure 3B) and a network diagram (Figure 3E).
Table 3 displays the GO enrichment and KEGG analysis results for the common DEGs of GSE100927 and GSE56885. The Biological Process terms enriched in the common DEGs involved immune response, inflammatory response and cell chemotaxis. In Cellular Component, the top two enriched terms were "integral component of plasma membrane" and "external side of plasma membrane". "C-C chemokine receptor activity", "chemokine receptor activity" and "coreceptor activity" were the major molecular functions enriched in the common DEGs. KEGG analysis revealed that "Cytokine-cytokine receptor interaction" and "Chemokine signaling pathway" were the main pathways enriched in the common DEGs.
Table 1: GO and KEGG pathway analysis of GSE100927

Similarly, we obtained the hub genes CCR5, CCR2, CXCR4 and C5AR1 by mining the PPI network with MCODE. Their functional enrichment also focused on cell chemotaxis, cell migration and peptide ligand-binding receptors, all of which are related to inflammation. The immune response was closely involved as well.
According to the bioinformatics analysis results described above, inflammation and immune response are the primary processes shared by CAD patients and carotid atherosclerosis patients. Atherosclerosis is a chronic inflammatory disease. Multiple cell types are present in plaques, including monocytes, macrophages, neutrophils, dendritic cells, T cells and B cells. Chemokines are cytokines that induce chemotaxis of their target cells. In atherosclerosis, chemokines regulate the chemotaxis of leukocytes by binding to G protein-coupled receptors (GPCRs), and thus participate in the development of atherosclerosis and cardiovascular diseases 5,6 .
CXC-motif chemokine ligand 12 (CXCL12, also known as stromal cell-derived factor 1, SDF-1), a chemokine for T cells and monocytes, activates platelets and is involved in the progression of atherosclerosis after binding to CXCR4 7 . Compared with healthy arteries, CXCR4 and CXCL12 were up-regulated at both the mRNA and protein levels, indicating increased expression in both stable and unstable carotid atherosclerotic plaques, especially in macrophages 8 . An investigation of a CXCR4 mimic revealed that it could block macrophage migration-inhibitory factor (MIF), an atypical chemokine that promotes atherosclerosis through CXCR4, and inhibit the inflammatory process in atherosclerosis 9 . Using iPS-CRISPR/Cas9 technology to explore the role of the platelet chemokine CXCL14 in atherosclerosis, researchers found that CXCL14 and its receptor CXCR4 interact directly in thrombus formation and platelet migration. This result may provide a new treatment strategy for controlling thrombus formation in atherosclerotic diseases 10 .
Targeting of CXCR4 is also used in atherosclerotic plaque analysis and disease diagnosis. Screening for DEGs is a promising and meaningful method for disease diagnosis and treatment.
However, some researchers have found that it is the epicardial adipose tissue (EAT), not CAD itself, that causes the expression of inflammatory DEGs in blood samples of CAD patients 29 . There are therefore still controversies and challenges in assessing and identifying CAD from the perspective of differential genes. Even so, the role of DEGs cannot be easily dismissed.
Biomarkers play a significant role in disease diagnosis, and deviations in biomarker levels can be used to assess or monitor treatment response. In clinical practice, coronary angiography is the gold standard for the diagnosis of CAD. However, for patients in whom carotid ultrasound has detected carotid plaques, coronary angiography may not be the first choice for CAD detection because of contrast media allergy or the patient's anxiety about the examination. Thus, combining blood biomarker testing with clinical detection methods may be a quicker and easier way to assess and predict the disease. Likewise, blood biomarker screening in patients with carotid plaques may be a promising way to predict the risk of CAD, after which clinicians can decide whether to use coronary angiography or coronary CT for the diagnosis of CAD.
Our analysis found that the common DEGs between carotid plaques and CAD are mainly related to inflammation and immune processes, which involve chemokine functions. CCR5, CCR2, CXCR4 and C5AR1 may provide targets for CAD prediction and treatment in patients with carotid plaques.

Data Acquisition
The Affymetrix Human Gene Expression Array) contained 2 healthy controls and 4 CAD patients. The probes were transformed into the corresponding gene symbol according to the annotation information in the platform.

Data quality control and normalization
NetworkAnalyst 3.0 (https://www.networkanalyst.ca/NetworkAnalyst/home.xhtml) is a comprehensive network visual analytics platform for gene expression analysis 30 . We used NetworkAnalyst 3.0 to draw box plots of the two datasets in order to observe the data distribution and determine whether the datasets contained abnormal values. Principal component analysis (PCA) is a method that reveals the internal structure of the data and helps explain its variables. PCA is also a common method for sample clustering and can assess the reproducibility of the samples in a dataset. In this study, PCA was also performed with NetworkAnalyst 3.0. We uploaded the gene expression data under "Gene Expression Table", specified the organism as "human", the data type as "Microarray data", the ID type as "official gene symbol" and the gene-level summarization as "mean", and then clicked "submit" to start the PCA analysis. According to the data type of the original dataset, we selected "None" for the normalization option in the data filtering & normalization step when standardizing the data.
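The box plot check described above can be illustrated with a minimal sketch. It does not reproduce NetworkAnalyst's own plotting; it assumes the common box plot convention of flagging values more than 1.5 interquartile ranges beyond the quartiles, and the function name, sample names and expression values are hypothetical toy inputs rather than data from GSE100927 or GSE56885.

```python
from statistics import quantiles
from typing import Dict, List


def box_summary(values: List[float]) -> Dict[str, object]:
    """Quartiles of one sample plus the values a box plot would draw as outliers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # assumed 1.5 * IQR rule
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Toy log2 expression values per sample (illustration only).
samples = {
    "sample_1": [6.8, 7.0, 7.1, 7.2, 7.4, 12.9],
    "sample_2": [6.9, 7.0, 7.1, 7.3, 7.4, 7.5],
}

summaries = {name: box_summary(vals) for name, vals in samples.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Samples whose medians and quartiles line up, with few flagged values, would be consistent with the comparability that the box plots are used to judge.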

Identification of DEGs and common DEGs
We used the NetworkAnalyst 3.0 online software to screen the GSE100927 and GSE56885 datasets separately for DEGs. We chose FDR<0.05 and |log2 fold change (FC)| > 1 as the criteria for significant DEGs. Shengxinren software (https://shengxin.ren) was used to visualize the DEGs as volcano plots. We then used TBtools 31 to visualize the common DEGs of the two datasets.
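The screening criteria and the search for common DEGs can be sketched as follows. This is an illustrative sketch, not the NetworkAnalyst or TBtools implementation: only the two thresholds come from the text, while the function name, the gene symbols and the statistics are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple

# Thresholds stated in the text: FDR < 0.05 and |log2 FC| > 1.
FDR_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0


def select_degs(stats: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return the gene symbols that pass both thresholds.

    `stats` maps gene symbol -> (log2 fold change, FDR-adjusted P value).
    """
    return {
        gene
        for gene, (log2fc, fdr) in stats.items()
        if fdr < FDR_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF
    }


# Toy values for illustration only; not taken from GSE100927 or GSE56885.
tissue_stats = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.6, 0.020),
    "GENE_C": (0.4, 0.001),   # fails the fold-change cutoff
    "GENE_D": (1.8, 0.200),   # fails the FDR cutoff
}
blood_stats = {
    "GENE_A": (1.2, 0.010),
    "GENE_B": (-0.9, 0.010),  # fails the fold-change cutoff
    "GENE_E": (-2.1, 0.003),
}

tissue_degs = select_degs(tissue_stats)
blood_degs = select_degs(blood_stats)
common_degs = tissue_degs & blood_degs  # the overlap a Venn diagram displays
print(sorted(common_degs))
```

The set intersection ignores the direction of change, so a gene up-regulated in one dataset and down-regulated in the other would still count as common; whether that is desirable depends on the question being asked.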

Functional and pathway enrichment analysis
DAVID Bioinformatics Resources 6.8 (https://david.ncifcrf.gov/home.jsp) 32,33 , an online analysis tool suite for integrated discovery and annotation, was used for GO enrichment and KEGG analysis. In the DAVID interface, we first entered the differential gene list. Then, under "Select Identifier", we chose "OFFICIAL_GENE_SYMBOL" and selected "Homo sapiens" as the background. Finally, we submitted the list. P<0.05 was considered statistically significant. OmicShare tools (http://www.omicshare.com/tools) is an online data visualization platform.
We used OmicShare tools to visualize the selected biological processes (BP), cellular components (CC), molecular functions (MF) and KEGG pathways. Metascape (http://metascape.org/gp/index.html#/main/step1) 34 was used to identify all statistically enriched terms and to group them into clusters. The size of each point in a cluster was proportional to the number of genes in the corresponding term. Nodes of the same color represented the same type of function. The network was visualized with Cytoscape v3.1.2.
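The idea behind a term enrichment P value can be sketched with a plain hypergeometric tail probability. This is an assumption-laden illustration rather than DAVID's exact procedure: DAVID reports a modified Fisher exact P value (the EASE score), which is more conservative than the unmodified test shown here. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background: int, term_genes: int,
                           list_size: int, overlap: int) -> float:
    """P(X >= overlap) for a gene list drawn without replacement.

    background : number of genes in the background set
    term_genes : background genes annotated to the term
    list_size  : number of genes in the submitted list (e.g. the DEGs)
    overlap    : submitted genes annotated to the term
    """
    max_overlap = min(term_genes, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_genes, k) * comb(background - term_genes, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts for illustration only; not results from this study.
p_value = hypergeom_enrichment_p(background=20000, term_genes=200,
                                 list_size=23, overlap=5)
print(f"enrichment P = {p_value:.3g}")
```

A term would be called enriched when this probability falls below the chosen threshold (P<0.05 in the text), ideally after correcting for the number of terms tested.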

PPI network construction and module analysis
Protein-protein interaction (PPI) analysis studies the network of interactions among proteins and can help to identify core regulatory genes. STRING (https://string-db.org/) 35 is a powerful online tool that integrates protein interaction, genome and proteome research. We selected "multiple proteins", uploaded the gene list, selected "Homo sapiens" as the organism, and clicked "search" to construct the PPI network. MCODE is one of the most commonly used algorithms for mining protein complexes and can find closely connected regions in a PPI network. Metascape was used to perform the MCODE analysis. Finally, GO enrichment and KEGG analysis were applied to each MCODE network (the top three clusters by P value).
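What "closely connected regions" means can be illustrated with a simplified stand-in. The sketch below is not the MCODE algorithm, which weights vertices by local density and then grows complexes from seed nodes; it only extracts a k-core, the subgraph in which every node keeps at least k neighbours, a notion that MCODE's weighting builds on. The function names, node names and edges are hypothetical and do not represent STRING interactions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map, ignoring self-loops."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def k_core(graph: Dict[str, Set[str]], k: int) -> Set[str]:
    """Repeatedly drop nodes with fewer than k remaining neighbours."""
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(graph[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive


# Toy network: P1-P4 are fully interconnected, P5 and P6 hang off the side.
toy_edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P2", "P4"), ("P3", "P4"),
    ("P4", "P5"), ("P5", "P6"),
]
ppi = build_graph(toy_edges)
dense_module = k_core(ppi, k=3)
print(sorted(dense_module))
```

In this toy network the peripheral nodes are peeled away and only the fully interconnected group remains, which is the kind of module that would then be passed to enrichment analysis.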

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.