Correlation Network Analyses Based on Metagenomics and Multi-type Metabolomic Data Identied Biomarkers of Coronary Artery Disease

Background: Coronary artery disease (CAD) is a complex, multifactorial disease and the underlying pathogenesis is unclear. It is essential to improve our understanding of the aetiology and pathogenesis of CAD for developing effective methods of early diagnosis and treatment. Results: We recruited 190 participants including normal coronary artery (n = 49), stable coronary artery disease (n = 93) and acute myocardial infarction patients (n = 48). We combined metagenomics (16S rRNA sequencing and gut microbiome whole-genome sequencing) and multi-type metabolomics (serum, faeces and urine) analyses to determine the correlations among metabolites, the microbiota and clinical indexes to identify biomarkers of CAD. Compared with the faecal metabolites, serum and urine metabolites exhibited strong correlation with clinical indexes. Comparing the three types of metabolome, we discovered that the faecal and urine metabolome was more suitable than the serum metabolome for correlation analysis with the microbiota. Furthermore, we found that the serum and urine metabolome was more suitable than the faecal metabolome for correlation analysis with the clinical indexes. We constructed, for the rst time, the relationship networks among the microbiota, metabolites and pathways. Through the relationship network analysis, we identied some important differential metabolites and delineated how these metabolites are related to the differentially abundant microbes. Conclusions: In our research, we used metagenome and multi-type metabolome proling based on hundreds of samples to gure out the correlations among metabolites, microbiota and clinical indexes. In addition, we rstly constructed the relationship networks of microbiota, metabolites and pathways. We believe our ndings can help the researchers to further understand the pathogenesis of CAD.


Background
Coronary artery disease (CAD) is a heart disease caused by the accumulation of atherosclerotic plaques in blood vessels [1]. Based on clinical symptoms, the extent of arterial blockage and the degree of myocardial injury, CAD is divided into different categories, including stable coronary artery disease (sCAD) and acute myocardial infarction (AMI) [2]. Stable coronary artery disease (sCAD) refers to the syndrome of recurrent, transient episodes of chest pain re ecting demand-supply mismatch, that is, angina pectoris [3]. AMI is characterized by acute myocardial ischaemia and the elevated cardiac troponin values (cTn) with at least one value above the 99th percentile upper reference limit (URL) [4].
Epidemiological studies of CAD have shown that some clinical indexes such as age, male gender and smoking can increase the risk of myocardial infarction. Simultaneously, low-density lipoprotein cholesterol (LDL-C), triglyceride-rich lipoproteins, or high-density lipoprotein cholesterol (HDL-C) are associated with CAD [1]. Although Invasive coronary angiography (CA) is the gold standard for diagnosis of coronary artery disease (CAD), it is so invasive that it can cause various complications [5]. Improving our understanding of CAD aetiology and pathogenesis is essential to developing effective methods for early diagnosis and treatment.
Recent studies have shown an association between the gut microbiome and cardiovascular diseases [6].
Several metagenome-wide association studies have revealed that the composition of the human intestinal microbiome can shape human health, and that speci c gut microbes are associated with many diseases [7][8][9]. Many studies have proven that the composition of the gut microbiome is in uenced by the diet, lifestyle and genetic susceptibility of the host [10][11][12]. Researches show that the disordered human gut microbiome can lead to diseases, such as cardiovascular disease, metabolic syndrome and type 2 diabetes [7,10,13].
In terms of the contribution of the microbiome to cardiovascular diseases, several breakthroughs have been made by performing 16rRNA-seq or metagenome sequencing. For example, Li et al. performed comprehensive metagenomic and metabolomic analyses, described a novel causal role of aberrant gut microbiota in hypertension pathogenesis and also veri ed the results in mouse models [14]. Liu et al. identi ed 29 metabolite modules that were correlated with CAD phenotypes and constructed a classi er that could discriminate cases from controls [6]. Zhou et al. provided the rst evidence that cardiovascular outcomes following myocardial infarction (MI) are driven by the intestinal microbiota [15].
Despite these advances, no one has compared the differential correlations between the gut microbiota/clinical indexes and three key metabolome pro les: serum, faeces and urine metabolomes. In addition, few studies have described the relationships among metabolites, the gut microbiota and pathways. To better understand the pathogenesis of CAD, we analyzed metagenome (16S rRNA sequencing and gut microbiome whole-genome sequencing) and multi-type metabolome (serum, faeces and urine) data to determine the correlations among metabolites, the microbiota and clinical indexes, and to identify biomarkers of CAD in 190 participants. Additionally, we systematically compared the results for three types of metabolome (serum, faeces and urine). We hope our ndings will help the researchers to understand the pathogenesis of CAD much better.

Clinical characteristics of CAD groups
We collected 190 participants including 49 individuals with a normal coronary artery (NCA) pro le, 93 patients with stable coronary artery disease (sCAD) and 48 patients with acute myocardial infarction (AMI) from Fuwai Hospital (Beijing, China) ( Table 1). In terms of patient demographics, we found that sex was signi cantly associated with the three groups (p <0.05), while smoking and drinking were only signi cantly different in NCA vs. sCAD and NCA vs. AMI (p <0.05). Only hypertension was signi cantly different in NCA vs. sCAD and sCAD vs. AMI. In laboratory parameters, C-reactive protein (CRP) levels was signi cantly different among the three groups (p <0.05), LDL cholesterol (LDL-C) and total cholesterol (TC) levels were signi cantly different in NCA vs. sCAD and NCA vs. AMI (p <0.05). In terms of the Gini coe cients of all clinical indexes ( Figure S2), lactic acid dehydrogenase (LDH), aspartate transaminase (AST), CRP, LDL-C, TC and creatine kinase-MB (CK-MB) were the top ranked, demonstrating that these indexes have auxiliary signi cance for the clinical diagnosis of CAD. 2.2 Identi cation of differential microbiotas in the gut microbiome between the CAD subgroups We rst aimed to explore the changes in intestinal microbial community composition in the three CAD subgroups. We used QIIME2 and LEfSe to analyze the 16S rRNA sequencing data from the 190 participants [20]. The alpha diversity plot ( Figure S3) showed that the sequencing data from all the samples was saturated, and the richness was diverse. We then used the cPCoA method to reveal the beta diversity of microbial composition among the CAD subgroups ( Figure S4). In addition, the taxonomic classi cation at the phylum level of the CAD subgroups is shown in Figure S5. Here, we found that the 190 samples were mainly composed of the phyla Bacteroidetes and Firmicutes.
We focused next on the microbes with signi cant differential abundances among the three CAD subgroups (LDA > 2) ( Figure 1). We identi ed 25 different taxa between the NCA and sCAD groups ( Figure   1A). For example, Bacteroides and Escherichia were found at a higher abundance in the NCA group (LDA > 3), while Desulfovibrio and Parabacteroides were found at a higher abundance in the sCAD group (LDA > 2). We also identi ed 34 signi cantly different taxa between the NCA and AMI groups ( Figure 1B). Here, Desulfovibrio, Streptococcus, and Lacobacillus had higher abundances in the AMI group (LDA > 2), while Clostridium had a higher abundance in the NCA group (LDA > 3). Finally, we found 69 signi cantly different taxa between the sCAD and AMI groups ( Figure 1C). Notably, Streptococcus, Alistipes, Olsenella, Actinomyces, and Escherichia had higher abundances in the AMI group (LDA > 2), and Prevotella, Clostridium, and Lactobacillus had higher abundances in the sCAD group (LDA > 3). Together, we have identi ed taxa with signi cantly different abundances in different CAD subgroups.

Identi cation of differential metabolites in multi-type metabolomes between the CAD subgroups
We next used NMR (Nuclear Magnetic Resonance) to detect metabolites in the three types of metabolome (faeces, serum and urine) in samples from NCA, sCAD and AMI participants. We detected total of 29, 33 and 31 metabolites in the faeses, serum and urine, respectively. In NCA vs. sCAD, we identi ed 3, 18 and 18 metabolites that were differential detectable in the faeces, serum and urine, respectively ( Table 2) (p <0.1). The methanol level was also signi cantly different in the three types of metabolome, and the levels of other four metabolites, including creatinine, were signi cantly different in the serum and urine metabolome. In sCAD vs. AMI, we detected 7, 21 and 18 differentially detectable metabolites in the faeces, serum and urine, respectively (Table 2) (p < 0.1). Similar to NCA vs. sCAD, the levels of ve metabolites, including creatinine, were signi cantly different in the serum and urine metabolomes. Finally, in NCA vs. AMI, we detected 6, 18 and 14 differentially detectable metabolites in the faeces, serum and urine, respectively (Table 2) (p <0.1). Again, the levels of creatinine and alanine were signi cantly different in the serum and urine metabolome. Overall, we found that citrate and creatinine levels in the serum and urine metabolomes were signi cantly different among the CAD subgroups and thus, we infer that citrate and creatinine are associated with the pathogenesis of CAD.

Integrated analyses reveal the relationships between the gut microbiota and differential faecal metabolites in the CAD subgroups
To identify the relationships among the microbiota, metabolites and metabolic pathways, we performed an integrated analysis based on 16S rRNA gene, metagenomic, and metabolomics (faecal) sequencing data. 20 samples of each CAD subgroups were randomly selected for metagenomic sequencing. We identi ed the relationships between metabolites and metabolic pathways in the MetaCyc database [25] and obtained the relative abundances of taxa and metabolic pathways by analyzing metagenomic data. Then, we constructed relationship networks based on the metabolic pathways, differential metabolites and differentially abundant genera.
In the NCA vs. sCAD network (Figure 2), the differential abundant metabolite alanine and differentially abundant genus Escherichia/Streptococcus were linked through the metabolic pathways: L-alanine biosynthesis. In the sCAD vs. AMI network, we observed 6 differential metabolites, 13 metabolic pathways, and 12 differentially abundant genera. According to the previous studies, methionine and acetate are associated with the development of CAD [27][28][29][30]. And in our research, methionine and acetate were connected to differentially abundant genera, such as Escherichia, Streptococcus and Coprococcus through the L-methionine cycle I, pyruvate fermentation to acetate and lactate II metabolic pathways. Notably, we found that Escherichia was present in both the NCA vs. sCAD and sCAD vs. AMI networks. Finally, the NCA vs. AMI network contained 3 differential metabolites, 11 metabolic pathways and 6 genera. Interestingly, methionine and acetate were present in both the sCAD vs. AMI and NCA vs. AMI networks. With the exception of Desulfovibrio, the ve other differentially abundant genera also existed in the sCAD vs. AMI network. These metabolites and genera might, therefore be associated with AMI occurrence.

Correlation analysis between the gut microbiota and CAD clinical indexes
To uncover the relationships between the gut microbiota and clinical indexes, we performed a spearman correlation analysis of the relative abundance of the gut microbiota with the clinical indexes of all samples ( Figure 3). As the gure shows, a lots of signi cant correlations were found, for example, AST was signi cantly positively correlated with S.agalactiae, Streptococcus, and Streptococcaceae (rho > 0.25 and p <0.05); FDP was signi cantly negatively correlated with Prevotella and Prevotellaceae (rho < -0.25 and p <0.05); LDL-C was signi cantly negatively correlated with Desulfovibrio (rho < -0.25 and P < 0.05); type 2 diabetes was signi cantly positively correlated with Lactobacillus and L. hamsteri (rho > 0.25 and p <0.05), and signi cantly negatively correlated with Erysipelotrichaceae (rho < -0.25 and p <0.05); and HbA1c was signi cantly positively correlated with L. hamsteri (rho > 0.25 and p <0.05). Twelve taxa such as Streptococcus, Prevotella and Bacteroides were signi cantly different in NCA vs. sCAD, sCAD vs. AMI and NCA vs. AMI. In addition, according to the Gini coe cients of clinical indexes, we found that AST and LDL-C were highly signi cant for CAD grouping ( Figure S2). Together, these data show that some differentially abundant taxa are signi cantly related to some key clinical indexes such as LDL-C, AST and HbA1c, which provided a good theoretical basis for these differential taxa as biomarkers in future, and pending replication in later researches.

Correlation analysis between clinical indexes and three types of metabolites
To explore and compare the relationships between clinical indexes and the metabolites in serum, urine and faeces, we again performed a spearman correlation analysis between the three types of metabolites and clinical indexes (Figure 4). In the analysis of the correlation between faecal metabolites and clinical indexes, we found that FT3 was signi cantly positively correlated with hypoxanthine (rho > 0.25 and p <0.05). In the correlation analysis of serum metabolites and clinical indexes, CRP was signi cantly negatively correlated with alanine and citrate (rho < -0.25 and p <0.05); AST was signi cantly positively correlated with lactate (rho > 0.25 and p <0.05), and negatively correlated with glycine (rho < -0.25 and p <0.05). Type 2 diabetes, blood glucose and HbA1c was signi cantly negatively correlated with most metabolites (rho < -0.25 and p <0.05) ( Figure 4A).
When comparing the results of the correlation analysis between the three types of metabolites and clinical indexes, we found that more metabolites in serum and urine were correlated with more clinical indexes than the faecal metabolome, this may indicate the serum and urine metabolome were more closely linked to the pathogenesis of CAD. It is worth noting that both type 2 diabetes vs. methanol and blood glucose vs. methanol showed signi cant negative correlation not only in the correlation analysis between serum metabolome and clinical indexes, but also in the correlation analysis between urine metabolome and clinical indexes. This may imply methanol may be correlated with both CAD and type 2 diabetes.

Correlation analysis between gut microbiota and multitype metabolites
To reveal and compare the relationships between the gut microbiota and the three CAD subgroups of faecal, serum and urine metabolites. In our spearman correlation analysis ( Figure 5), we found that in the correlation analysis between the faecal metabolome and gut microbiota ( Figure 5A), Desulfovibrio was signi cantly positively correlated with sarcosine, dimethylamine, isovalerate (rho > 0.3 and p <0.05); Consistent with a previous study [31], Faecalibacterium were signi cantly positively related to butyrate (rho > 0.3 and p <0.05). We also found that Prevotella and Prevotellaceae were signi cantly positively correlated with hypoxanthine and xanthine (rho > 0.3 and p <0.05); A. negoldii, Alistipes and Rikenellaceae were signi cantly positively correlated with p-cresol (rho > 0.3 and p <0.05), which was consistent with the previous report [31].
When analyzing the correlation between the serum metabolome and gut microbiota ( Figure 5B In the analysis of the correlation between urine metabolome and gut microbiota ( Figure 5C), Prevotella and Prevotellaceae were signi cantly positively related to hippurate (rho > 0.3 and p <0.05); M.fermentans and S.termitidis were signi cantly positively related to hypoxanthine and N-phenylacetylglycine, respectively (rho > 0.3 and p <0.05); S.variabile was signi cantly positively correlated with galactose and urea (rho > 0.3 and p < 0.05).
Finally, when analyzing the correlation between the three types of metabolome and the gut microbiota, we found that 70% of the taxa in which correlated with the urine metabolites (rho > 0.3 and p <0.05) were also correlated with the faecal metabolites. With the exception of Prevotella and Prevotellaceae, the gut microbes associated with the serum metabolome were different from those associated with the other two metabolomes. In addition, Prevotella and Prevotellaceae which were related to the three metabolomes were differentially abundant in sCAD vs. AMI.

Discussion
The study has explored the relationships among the gut microbiota, three types of metabolome and clinical indexes in an attempt to nd potential CAD biomarkers; the biological roles of our identi ed markers (such as the genus Streptococcus and Prevotella, the metabolite citrate and creatinine) now warrant further investigation. In support of these ndings, Liu et al. found that Prevotella was a potential therapeutic target for patients with cardiovascular disease [32]. It has been proved that the citrate cycle enhances cardioprotection [33]. And creatinine has been proposed to be a suitable biomarker for diabetes, CAD and renal function [34]. Furthermore, we constructed relationship networks for the microbiota, metabolic pathways, and metabolites to try to explore CAD pathogenesis. We used the faecal metabolites to build the network because the metabolites detected in the faeces are more likely to come from intestinal microorganisms. Through network analysis, we delineated how these metabolites are related to the differentially abundant microbes. For example, we speculated that Escherichia releases alanine through the L-alanine biosynthesis pathway.
In the correlation analysis, serum/urine metabolites correlated more with clinical indexes than faecal; this nding constitutes a good option for researchers who want to further study the relationships between metabolites and clinical indexes. Compared with the serum metabolome, more correlations have been found between the faecal/urine metabolome and gut microbiota. By comparing the performance of the three metabolomes, we found that faeces/urine metabolites have stronger correlations with gut microbiota than serum metabolites.
190 faecal samples were sequenced by 16S rRNA sequencing, 60 faecal samples of which were also sequenced by metagenome sequencing. Compared with metagenomics sequencing, 16S rRNA sequencing is not expensive and suitable for large cohort analysis. But it is di cult to detect the abundance of microbiotas in species level of 16S data for technical reason. A similar situation occurs in the metabolome using NMR methods. Although metabolome sequencing based on NMR is highly speci c, fewer metabolites can be detected. With the development of next generation sequencing (NGS) sequencing technology, metagenomic shotgun sequencing will become the mainstream, which will promote the further study of CAD in the future. Simultaneously, a stable untargeted metabolomics sequencing technology with high speci city, sensitivity and resolution need to be developed.

Conclusions
In our research, we used metagenome and multi-type metabolome pro ling based on hundreds of samples to gure out the correlations among metabolites, microbiota and clinical indexes. We found serum and urine metabolites have stronger correlations with clinical indexes than faecal metabolites.
Faecal and urine metabolites have stronger correlations with gut microbiota than serum metabolites. In addition, we rstly constructed the relationship networks of microbiota, metabolites and pathways. We believe our ndings can help the researchers to further understand the pathogenesis of CAD. level. The coronary angiography was performed on all patients. Plaques or stenosis was not found in age-and sex-matched control subjects. All enrolled participants in the NCA, sCAD and AMI group who were suspected of CAD underwent CAG and had no history of unstable angina, myocardial infarction, stroke, cancers, or coronary revascularization. The angiographic data were con rmed independently by two observers in this study.

Nuclear Magnetic Resonance (NMR) Sample collection and preparation
Serum before the coronary angiography surgery and urine early morning urinary samples were collected and centrifuged at 278 K at 3,000g for 10 min, the supernatants of samples were stored at -80 °C for metabolic pro le establishment and statistical analysis. Faeces samples were stored at -80 °C after homogenate with phosphate buffer (0.2 M NaH2PO4/K2HPO4, pH 7.4). Samples were prepared using the previously reported method [16].

NMR Spectra Acquisition and Processing
All NMR spectra were recorded at 298 K using a Bruker Avance 500 MHz spectrometer (1H frequency: 500.13 MHz; Bruker, Germany). For quantitative metabolomics pro ling of ltered serum, urine, and faeces, spectra were processed with the Chenomx NMR Suite 7.5 software (Chenomx Inc., Edmonton, Canada) using the "targeted pro ling" approach [17]. Open database sources, including the KEGG, MetaboAnalyst, Human Metabolome Database, and METLIN, were used to identify metabolic pathways [18,19].

NMR Multivariate Data Analysis
Output data were processed with the SIMCA-P+ 14.0 software (Umetrics, Sweden) to elucidate patterns in metabolite concentration shifts. Statistical analysis was also conducted with SPSS19.0 (IBM; USA) using the two-tailed Student's t-test. P-value of less than 0.05 was considered to be statistically signi cant between two groups.

Human faecal sample collection and DNA extraction
Fresh faeces samples were collected from 190 subjects, and then delivered from Fuwai Hospital to the laboratory in an ice bag using insulating polystyrene foam containers. DNA was extracted using an EZNA™ stool DNA isolation kit (Omega Bio-Tek, VWR, Herlev, Denmark). The DNA was then eluted in 50 µL of elution buffer and stored at -80°C. 5.7 Sequencing data analysis QIIME 2 was used to process 16S rRNA sequencing data ( Figure S1). Sequence quality control, feature table construction and lter chimeric sequences were performed by DADA2 plugins [20]. Features were created by clustering sequences with 100% similarity. Representative sequences for each feature were used to construct a rooted phylogenetic tree by q2-phylogeny plugin. The script will randomly subsample the counts from each sample to the 7327 sequences. Alpha and beta diversity were generated by q2diversity plugin. Shannon's index, the observed OTUs, and evenness were evaluated. A normalized feature abundance table was used for the constrained principal coordinate analysis (cPCoA). Taxonomic analysis was performed by q2-feature-classi er plugin. Differential abundance taxa were generated by LEfSe (LDA>2) [21].
Quality control for the metagenomics shotgun sequencing data was conducted using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low quality reads and adapter sequences were removed by Trimmomatic [22]. Taxonomic pro les were generated using MetaPhlAn v2.662 [23] and pathways enrichment was done by HUMAnN2 [24]. The interactions of pathways and metabolites were integrated using MetaCyc and the relationship networks contained microbiota, metabolites and pathways were visualized by Cytoscape [25,26].

Statistical analyses
The Gini coe cients of clinical indexes were generated using R scripts. The wilcox and Fisher's test were used to analyze the differential clinical indexes for continuous and categorical variables, respectively. Spearman correlations between microbiota, metabolites and clinical indexes were calculated using R scripts. The visual presentation of multiple omics correlations was performed using the pheatmap package in R.

Data availability
The datasets are available in the repository of the Genome Sequence Archive Sequence Database of National Genomics Data Center (https://bigd.big.ac.cn/) under the accession number CRA002142.    Heatmap of correlation analysis between clinical indexes and microbiota Spearman correlation analysis was performed to assess the correlation of clinical indexes with gut microbiota in the NCA, sCAD and AMI groups. * p < 0.05 && rho>0.25