Demographics and Clinical Data
Forty-six PD patients and their paired healthy spouses were recruited in this study (Table 1). There was no significant difference in sex ratio or age between the PD and Spouse groups. PD patients had a mean age at onset of 60.0 ± 6.5 years, a disease duration of 3.0 (1.0–5.0) years, a Hoehn and Yahr (H&Y) stage of 1.8 (1.0–2.5) and a Unified Parkinson’s Disease Rating Scale (UPDRS) total score of 30.5 (20.8–43.0). According to clinical phenotype, patients were divided into 3 subgroups: tremor dominant (TD, n = 20), postural instability and gait difficulty (PIGD, n = 20) and indeterminate (n = 6). Hereditary factors play a role in PD, and 4 (9.7%) patients in our study had a family history of the disease. Pesticide exposure is a risk factor for PD, and 4 (9.7%) patients reported a history of pesticide exposure (Additional file 1). In addition, the prevalence of constipation differed significantly between patients and spouses (63.0% vs 10.7%, P < 0.001).
Sequencing data and taxonomic composition of all samples
A total of 5,812,686,124 raw reads were obtained from the 92 samples. The mean (± SD) number of raw reads per sample was 62,387,159 ± 17,744,069 in the PD group and 63,975,582 ± 24,281,624 in the Spouse group. After filtering, the number of clean reads per sample was 59,484,557 ± 16,536,993 and 62,907,844 ± 24,719,821 in the PD and Spouse groups, respectively (Additional file 2). We used GraPhlAn to construct a taxonomic tree (Fig. 1); the gut microbiota were annotated to four kingdoms: Archaea, Bacteria, Eukaryota and Viruses. The mean relative abundances of these kingdoms were 0.108%, 99.762%, 0.005% in PD patients, and 0.036%, 99.184%, 0.002%, 0.778% in spouses, respectively. The relative abundance of Viruses was significantly lower in patients than in spouses (P = 0.002). Bacteroidetes, Firmicutes, Proteobacteria and Actinobacteria were the dominant bacterial phyla, together accounting for more than 98% of the total relative abundance. Among the remaining phyla, each with a relative abundance below 1%, Viruses_noname (Viruses) accounted for 0.45% and Euryarchaeota (Archaea) for 0.07%. At the species level, Prevotella_copri, Bacteroides_stercoris, Faecalibacterium_prausnitzii, Escherichia_coli and Bacteroides_uniformis each had a relative abundance above 3% in both groups.
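The kingdom-level percentages above are group means of per-sample relative abundances. The sketch below shows that arithmetic under a simplifying assumption: a taxon's relative abundance is taken as its share of classified reads in each sample, whereas marker-based profilers may estimate abundance differently. The helper names `relative_abundance` and `mean_sd` and all counts are hypothetical.

```python
import numpy as np

def relative_abundance(counts):
    """Convert a samples x taxa count matrix into per-sample percentages."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # total classified reads per sample
    return 100.0 * counts / totals

def mean_sd(values):
    """Mean and sample standard deviation (ddof=1), as in a 'mean ± SD' summary."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1)

# Hypothetical toy counts: rows are samples, columns are kingdoms.
kingdoms = ["Archaea", "Bacteria", "Eukaryota", "Viruses"]
toy_counts = np.array([
    [10, 9980, 0, 10],
    [5, 9900, 5, 90],
])

rel = relative_abundance(toy_counts)  # each row sums to 100
group_mean = rel.mean(axis=0)         # group mean per kingdom
for name, value in zip(kingdoms, group_mean):
    print(f"{name}: {value:.3f}%")

# Hypothetical raw-read totals for three samples.
reads_mean, reads_sd = mean_sd([60_000_000, 65_000_000, 58_000_000])
```

Averaging per-sample percentages (rather than pooling all reads) keeps deeply sequenced samples from dominating the group mean.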
Composition of gut microbiota between PD and Spouse groups
We compared the relative abundance of the gut microbiota at the phylum, family and species levels. The top 5 phyla did not differ significantly between the two groups. Seventy families were annotated; Bacteroidaceae was significantly increased in PD, whereas Prevotellaceae was significantly decreased. Among the 10 most abundant of all 484 species, Prevotella_copri was significantly decreased in patients, whereas Bacteroides_stercoris, Escherichia_coli and Bacteroides_fragilis were significantly increased. The heatmap (Fig. 2a) shows the composition of the top 50 species in all samples, and the boxplots (Fig. 2b) compare the abundance of the top 15 species between the two groups. We also compared each patient with his or her spouse to further demonstrate the differential abundance of Prevotellaceae and Prevotella_copri. In total, 23 species differed significantly between the two groups. Principal coordinates analysis (PCoA) based on the Bray-Curtis distance matrix (Fig. 2c) showed a small but significant difference in beta diversity between the two groups (analysis of similarities [ANOSIM]: R = 0.035, P = 0.036).
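ANOSIM compares the ranks of between-group and within-group distances: R = (r_B - r_W) / (M / 2), where r_B and r_W are the mean ranks of between-group and within-group distances and M = n(n - 1)/2 is the number of sample pairs. R near 0 (as with R = 0.035 here) indicates heavily overlapping groups, and R = 1 indicates complete separation. The minimal NumPy/SciPy sketch below computes R on Bray-Curtis distances with a label-permutation P value; it is not the implementation used in this study (which the text does not name), and the function `anosim` and the toy data are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def anosim(X, groups, permutations=999, seed=0):
    """ANOSIM R on Bray-Curtis distances, with a one-sided permutation P value."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n = X.shape[0]
    ranks = rankdata(pdist(X, metric="braycurtis"))  # ties receive average ranks
    i, j = np.triu_indices(n, k=1)  # same pair order as pdist's condensed output
    half_m = n * (n - 1) / 4        # M / 2, where M = n(n - 1) / 2 pairs

    def r_stat(labels):
        within = labels[i] == labels[j]
        return (ranks[~within].mean() - ranks[within].mean()) / half_m

    r_obs = r_stat(groups)
    rng = np.random.default_rng(seed)
    null = np.array([r_stat(rng.permutation(groups)) for _ in range(permutations)])
    p_value = (np.sum(null >= r_obs) + 1) / (permutations + 1)
    return r_obs, p_value

# Hypothetical data: 6 "PD" and 6 "Spouse" samples over 4 taxa.
rng = np.random.default_rng(42)
pd_like = rng.poisson(lam=[50, 5, 20, 10], size=(6, 4))
spouse_like = rng.poisson(lam=[5, 50, 20, 10], size=(6, 4))
X = np.vstack([pd_like, spouse_like])
labels = np.array(["PD"] * 6 + ["Spouse"] * 6)

R, p = anosim(X, labels)
print(f"R = {R:.3f}, P = {p:.3f}")
```

Because R is computed on ranks, a significant P value can coexist with a very small R when the sample size is large, which is why both numbers are worth reporting.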
Correlation between gut microbiota and clinical features of PD
We also explored correlations between the gut microbiota and the clinical features of PD. All participants were divided into three age subgroups (< 60, 60–70 and > 70 years) to further analyze differences in Prevotella_copri (Fig. 3a). Prevotella_copri was significantly decreased in patients in the 60–70 and > 70 subgroups compared with their paired spouses. Moreover, the relative abundance of Prevotella_copri in patients decreased significantly with increasing age, a trend not observed in the spouse subgroups.
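The age-stratified comparison can be sketched as follows, under assumptions the text does not state: each couple is assigned to an age band by the patient's age (with 60 and 70 falling in the middle band), the paired patient-versus-spouse comparison uses a Wilcoxon signed-rank test, and the age trend in patients is assessed with Spearman's correlation. The helper names and abundances are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon

def age_band(age):
    """Assign an age subgroup; placing exactly 60 and 70 in the middle band is an assumption."""
    if age < 60:
        return "<60"
    if age <= 70:
        return "60-70"
    return ">70"

def paired_tests_by_band(patient_age, patient_abund, spouse_abund):
    """Wilcoxon signed-rank test of each patient vs. their own spouse within each age band."""
    bands = np.array([age_band(a) for a in patient_age])
    results = {}
    for band in ("<60", "60-70", ">70"):
        mask = bands == band
        if mask.sum() < 2:
            continue  # too few couples to test
        res = wilcoxon(patient_abund[mask], spouse_abund[mask])
        results[band] = (int(mask.sum()), float(res.pvalue))
    return results

# Hypothetical data for 12 couples (relative abundance of Prevotella_copri, %).
patient_age = np.array([52, 55, 58, 59, 62, 64, 66, 69, 72, 74, 77, 80])
patient_pc = np.array([20.0, 18.0, 17.0, 15.0, 12.0, 10.0, 9.0, 7.0, 5.0, 4.0, 2.0, 1.0])
spouse_pc = np.array([19.0, 21.0, 15.5, 18.4, 20.0, 17.0, 19.0, 18.0, 16.0, 20.0, 17.0, 19.0])

by_band = paired_tests_by_band(patient_age, patient_pc, spouse_pc)
rho, p_trend = spearmanr(patient_age, patient_pc)  # age trend within patients only
print(by_band, round(float(rho), 3))
```

With only a handful of couples per band, the exact signed-rank test has a coarse P-value distribution, so small subgroups can fail to reach significance even when every difference has the same sign.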
Analysis of the gut microbiota of patients across these subgroups (Fig. 3b) revealed significantly different compositions among the age subgroups. We therefore selected 22 species that were significantly correlated with grouping factors (Spearman's test, P < 0.05) and used a generalized linear model (GLM) to calculate correlation coefficients between these species and clinical features of the disease, including age, disease duration and severity (H&Y stage, UPDRS total score and UPDRS III score). Most of the identified species were negatively correlated with clinical features (Table 2). Among the 7 species with a mean relative abundance above 0.1%, Prevotella_copri, Alistipes_onderdonkii and Escherichia_coli were significantly negatively correlated with patient age; Bacteroides_stercoris was significantly negatively correlated with disease duration; and Prevotella_copri, Parabacteroides_merdae and Alistipes_onderdonkii were significantly negatively correlated with H&Y stage. Prevotella_copri was also negatively correlated with both the UPDRS total score and the UPDRS III score. In contrast, neither Bacteroides_fragilis nor Butyrivibrio_crossotus was significantly correlated with any clinical feature of PD.
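A sketch of this two-step screen, under stated assumptions: "grouping factors" is read as PD versus spouse status coded 0/1, and the GLM is taken to be a Gaussian model with identity link (that is, simple linear regression, here via `scipy.stats.linregress`) of log10-transformed abundance on each standardized clinical feature. The pseudocount, helper names and toy data are hypothetical.

```python
import numpy as np
from scipy.stats import linregress, spearmanr

def screen_species(abund, group, alpha=0.05):
    """Indices of species whose abundance is Spearman-correlated with group (0 = spouse, 1 = PD)."""
    keep = []
    for k in range(abund.shape[1]):
        _, p = spearmanr(group, abund[:, k])
        if p < alpha:
            keep.append(k)
    return keep

def clinical_associations(abund_pd, clinical, species_idx, pseudocount=1e-4):
    """Gaussian GLM with identity link (simple linear regression) of log10 abundance
    on each standardized clinical feature; returns {(species, feature): (slope, P)}."""
    out = {}
    for k in species_idx:
        y = np.log10(abund_pd[:, k] + pseudocount)
        for name, x in clinical.items():
            z = (x - x.mean()) / x.std(ddof=1)
            fit = linregress(z, y)
            out[(k, name)] = (fit.slope, fit.pvalue)
    return out

# Hypothetical data: 20 spouses then 20 patients, 3 species (% abundance).
rng = np.random.default_rng(1)
n = 20
group = np.r_[np.zeros(n), np.ones(n)]
age_pd = rng.uniform(50, 80, size=n)
patients0 = np.clip(30 - 0.35 * age_pd + rng.normal(0, 0.5, size=n), 0.1, None)  # falls with age
species0 = np.r_[rng.uniform(10, 20, size=n), patients0]
noise = rng.uniform(1, 5, size=(2 * n, 2))  # two uninformative species
abund = np.column_stack([species0, noise])

kept = screen_species(abund, group)
assoc = clinical_associations(abund[group == 1], {"age": age_pd}, kept)
print(kept, assoc.get((0, "age")))
```

Standardizing each clinical feature puts slopes for age, disease duration and UPDRS scores on a common per-SD scale, which makes them easier to compare in a table.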
We further studied the gut microbiota of patients with a family history of PD, dysosmia, constipation, pesticide exposure or sleep disorders. The altered microbiota were significantly correlated with these non-motor symptoms and risk factors (Additional file 3).
Prediction models for PD based on gut microbiota biomarkers
To evaluate the predictive value of the gut microbiota for PD, we first identified 23 species whose abundance differed significantly between the PD and Spouse groups (Wilcoxon rank-sum test). We then selected 6 important species with the Boruta algorithm (Fig. 4a) and used them to construct a random forest (RF) classification model. The relative abundances of Prevotella_copri, Parabacteroides_merdae, Alistipes_onderdonkii, Bacteroides_fragilis, Lachaceae__3_1_57 and Providencia_rettgeri served as predictors of disease status. The RF model achieved an area under the receiver operating characteristic curve (AUC) of 0.772 (95% CI: 0.559–0.985; sensitivity, 0.875; specificity, 0.500) (Fig. 4b).
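The pipeline above (rank-sum screen, feature selection, RF classifier, ROC analysis) can be approximated with SciPy and scikit-learn. Two substitutions are assumptions: `shadow_feature_filter` is a simplified version of Boruta's shadow-feature idea (it lacks Boruta's iterative statistical test), and performance is estimated on a stratified 30% hold-out split because the text does not say how the AUC was validated. Helper names and data are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

def rank_sum_screen(X, y, alpha=0.05):
    """Wilcoxon rank-sum (Mann-Whitney U) screen: columns differing between y == 1 and y == 0."""
    return [k for k in range(X.shape[1])
            if mannwhitneyu(X[y == 1, k], X[y == 0, k]).pvalue < alpha]

def shadow_feature_filter(X, y, n_rounds=10, seed=0):
    """Simplified Boruta-style filter (not full Boruta): keep features whose RF importance
    beats the best column-shuffled 'shadow' feature in a majority of rounds."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    hits = np.zeros(n_feat)
    for r in range(n_rounds):
        shadows = rng.permuted(X, axis=0)  # shuffle each column independently
        rf = RandomForestClassifier(n_estimators=100, random_state=r)
        rf.fit(np.hstack([X, shadows]), y)
        imp = rf.feature_importances_
        hits += imp[:n_feat] > imp[n_feat:].max()
    return [k for k in range(n_feat) if hits[k] > n_rounds / 2]

def evaluate_rf(X, y, features, seed=0):
    """Hold-out AUC, sensitivity and specificity (threshold 0.5) of an RF classifier."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, features], y, test_size=0.3, stratify=y, random_state=seed)
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X_tr, y_tr)
    prob = rf.predict_proba(X_te)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_te, (prob >= 0.5).astype(int), labels=[0, 1]).ravel()
    return roc_auc_score(y_te, prob), tp / (tp + fn), tn / (tn + fp)

# Hypothetical data: 46 spouses (y = 0) and 46 patients (y = 1), 10 species.
rng = np.random.default_rng(7)
y = np.r_[np.zeros(46, dtype=int), np.ones(46, dtype=int)]
X = rng.lognormal(mean=0.0, sigma=1.0, size=(92, 10))
X[y == 1, 0] *= 3.0  # species 0 enriched in the hypothetical PD group
X[y == 1, 1] *= 0.3  # species 1 depleted in the hypothetical PD group

screened = rank_sum_screen(X, y)
kept_local = shadow_feature_filter(X[:, screened], y)
selected = [screened[k] for k in kept_local] or screened  # fall back if nothing passes
auc, sens, spec = evaluate_rf(X, y, selected)
print(f"AUC = {auc:.3f}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

For brevity, screening and selection here use all samples before the split, which leaks information into the test set; a careful analysis would nest both steps inside the validation loop. Sensitivity and specificity also depend on the 0.5 probability threshold, which is itself an assumption.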
Altered functional pathways of gut microbiota in PD patients
To investigate altered functional pathways of the gut microbiota, we used the Statistical Analysis of Metagenomic Profiles software (STAMP) and LDA Effect Size (LEfSe) to compare microbial functions between the PD and Spouse groups, and then focused on the pathways identified by both methods. After mapping to the MetaCyc database, STAMP and LEfSe identified 15 and 42 pathways, respectively, that changed significantly between the PD and Spouse groups (Fig. 5a, b; Additional file 4). Pathways associated with aromatic amino acid degradation and chorismate metabolism were significantly increased in patients, whereas biosynthesis-related pathways were significantly decreased. Patients also showed significant increases in γ-aminobutyric acid (GABA) degradation, carbohydrate metabolism and methylphosphonate degradation pathways. Regarding vitamin metabolism, vitamin B1 synthesis pathways were enriched in both groups, but in patients they were contributed mainly by Escherichia_coli. In addition, the vitamin B6 synthesis pathway was significantly increased in patients. Because metagenomic sequencing allows functional annotation to be resolved to the species level, we examined the functional pathways of Prevotella_copri, given its significant change, and found 7 related pathways. Of these, UMP biosynthesis I, S-adenosyl-L-methionine cycle I and guanosine ribonucleotides de novo biosynthesis differed significantly between the two groups.
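Focusing on "overlapped" pathways amounts to intersecting the sets of features flagged by two different tests. The sketch below uses Welch's t-test and the Kruskal-Wallis test as simplified stand-ins for STAMP and LEfSe (LEfSe's LDA effect-size step and any multiple-testing correction are omitted), so it illustrates the consensus idea rather than reproducing either tool. Pathway names, helper names and abundances are hypothetical.

```python
import numpy as np
from scipy.stats import kruskal, ttest_ind

def welch_hits(table, names, y, alpha=0.05):
    """Features with Welch's t-test P < alpha (stand-in for the STAMP comparison)."""
    return {name for name, col in zip(names, table.T)
            if ttest_ind(col[y == 1], col[y == 0], equal_var=False).pvalue < alpha}

def kruskal_hits(table, names, y, alpha=0.05):
    """Features with Kruskal-Wallis P < alpha (LEfSe's first step only; no LDA step)."""
    return {name for name, col in zip(names, table.T)
            if kruskal(col[y == 1], col[y == 0]).pvalue < alpha}

# Hypothetical pathway abundance table: 92 samples x 5 pathways.
rng = np.random.default_rng(3)
names = [f"pathway_{k}" for k in range(5)]
y = np.r_[np.zeros(46, dtype=int), np.ones(46, dtype=int)]
table = rng.normal(loc=10.0, scale=1.0, size=(92, 5))
table[y == 1, 0] += 1.5  # pathway_0 shifted upward in the hypothetical PD group

consensus = welch_hits(table, names, y) & kruskal_hits(table, names, y)
print(sorted(consensus))
```

Requiring agreement between a parametric and a rank-based test trades some sensitivity for robustness to outliers and skewed abundance distributions.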
We also mapped the clean data to the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology database; the 50 KEGG Orthology groups (KOs) with the highest relative abundance are shown in a heatmap (Fig. 5c). STAMP and LEfSe identified 61 and 200 differential KOs, respectively; the 30 KOs identified by both methods were summarized and annotated (Additional file 5). Likewise, 163 Clusters of Orthologous Groups of proteins (COGs) and 32 Gene Ontology (GO) terms overlapped between the two methods (Additional files 6 and 7). Bray-Curtis distance matrices based on these genes showed that gene composition differed between the two groups (Fig. 5d, e).
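Ordinations of Bray-Curtis distances, such as the PCoA used for Fig. 2c, can be computed by principal coordinates analysis (classical multidimensional scaling). A minimal NumPy sketch: double-center the squared distance matrix, B = -1/2 J D² J with J = I - 11ᵀ/n, then scale the leading eigenvectors by the square roots of their eigenvalues. Because Bray-Curtis is not a Euclidean metric, some eigenvalues can be negative; here the proportion explained is computed relative to the positive ones, which is one common convention. The function name and input profiles are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def pcoa(X, n_axes=2):
    """Principal coordinates analysis (classical MDS) on Bray-Curtis distances."""
    D = squareform(pdist(np.asarray(X, dtype=float), metric="braycurtis"))
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered squared distances
    eigval, eigvec = np.linalg.eigh(B)   # B is symmetric
    order = np.argsort(eigval)[::-1]     # largest eigenvalues first
    eigval, eigvec = eigval[order], eigvec[:, order]
    coords = eigvec[:, :n_axes] * np.sqrt(np.clip(eigval[:n_axes], 0, None))
    explained = eigval[:n_axes] / eigval[eigval > 0].sum()  # share of positive eigenvalues
    return coords, explained

# Hypothetical gene-abundance profiles for two groups of 6 samples.
rng = np.random.default_rng(5)
group_a = rng.poisson(lam=[40, 5, 20, 10, 15], size=(6, 5))
group_b = rng.poisson(lam=[5, 40, 20, 10, 15], size=(6, 5))
coords, explained = pcoa(np.vstack([group_a, group_b]))
print(coords.round(3), explained.round(3))
```

The sign of each axis is arbitrary (eigenvectors are defined up to sign), so only relative positions of samples along an axis are meaningful.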