Graves’ Disease (GD) alters the gut microbiota and its functions
In the present study, 162 subjects were divided into three groups: the healthy control group (Healthy, n = 62), mild Graves’ patient group (GD I, n = 36) and severe Graves’ patient group (GD II, n = 64) (Fig. 1A). Eleven clinical indexes characterizing the conditions of the thyroid, liver and immune system were determined for each subject. Significant differences (p ≤ 0.001, Wilcoxon rank-sum test, two-tailed) were found in all clinical indexes between Healthy and GD II, and in liver and thyroid indexes between Healthy and GD I (Fig. 1B). Shotgun metagenomic sequencing was performed on feces from each subject to assess the gut microbiome composition. Metagenomes were assembled and functional genes were annotated, and the putative metabolic capacities of the microbiomes were estimated by MelonnPan (model-based genomically informed high-dimensional predictor of microbial community metabolic profiles) pipeline.
We performed a combined analysis of metagenome-derived taxonomic and functional profiles, as well as clinical indexes. Mantel tests across these data types indicated tight coupling between the intestinal bacterial profile and the metabolic pathways, metabolites and clinical indexes (Fig. 1C). Adonis test of major metadata variables versus individual data matrices confirmed the dominant impact of the disease status (Healthy, GD I and GD II) on the clinical indexes (R2 = 64.08). Meanwhile, it also revealed strong associations between GD status and intestinal bacteria (R2 = 6.08), and their metabolic potentials (R2 = 17.14). In comparison, other demographic, behavioral and clinical variables have no significant association with any metagenome-derived metrics (Fig. 1D).
Alteration of intestinal microbiome and predicted metabolites in GD patients
We constructed PCoA ordinations based on Aitchison distance (Fig. 2A) and Bray-Curtis distance (Fig. S1A) among the taxonomic profiles. Surprisingly, intestinal microbiota of subjects in the Healthy and GD I groups were similar but obviously separated from the patients in the GD II group. To quantify these differences, we calculated the p-values (Wilcoxon rank-sum tests, two-tailed) of PC1 between the Healthy and GD II groups (p < 0.001), which indicated a serious disorder in the intestinal microbiota of severe GD patients. The results were confirmed in the microbial alpha diversity aspect, in which we observed a sharp decrease in microbial alpha diversity of GD II patients (Fig. S1C).
Accordingly, we identified species with significant differences among the three groups in trend changes as the potential biomarkers (Fig. S1B). Specifically, Faecalibacterium prausnitzii, Butyricimonas faecalis, Bifidobacterium adolescentis and Akkermansia muciniphila decreased in the GD II group, whereas Eggerthella lenta, Streptococcus parasanguinis, Veillonella parvula, Fusobacterium mortiferum and Streptococcus salivarius were enriched.
We further assembled the metagenomic reads into contigs and constructed metagenomic assembled genomes (MAGs) in each subject. Phylogenomic analysis of the MAGs suggested overall consistency with taxonomic annotation (Fig. 2D). Then, we identified MAGs with significantly differential abundance among the three groups and constructed a heatmap with represented MAGs (Fig. 2E and Fig. S3A). Furthermore, we reassembled the significantly different MAGs for specific different genes identification and annotation (Fig. S3B).
After demonstrating the disorder in the intestinal microbiota, we further explored the changes in microbial metabolic pathways of GD patients. Annotated by the UniRef protein database, we obtained profiles of microbial gene families and metabolic pathways. The Aitchison distances based on the functional features suggested an obvious shift in the intestinal microbial functional capacity of GD patients (PC1 between the Healthy and GD II groups, p < 0.05, Fig. 2B). Among the differentially abundant metabolic pathways between Healthy and GD II groups, mevalonate and isoprene biosynthesis, formaldehyde assimilation and allantoin degradation significantly increased in relative abundance in the severe GD patients, whereas the microbial metabolic abilities of fatty acid biosynthesis, creatinine degradation, pyruvate fermentation to hexanol, anaerobic energy metabolism and gluconeogenesis decreased significantly in relative abundance in the patients (Fig. S2B).
By performing the MelonnPan pipeline based on the gene family profile inferred by the HUMAnN2 pipeline (UniRef 90 database annotation), we predicted metabolomics profiles including > 80 metabolites, including the SCFAs determined by GC-MS. Similarly, PLS-DA of metabolic profiles revealed obvious differences between GD II patients and healthy subjects (Fig. 2C). The metabolites with significant differential abundance, including acetic acid, propionic acid, cholate and chenodeoxycholate among the three groups were identified as potential biomarkers (Fig. 2F and Fig. S4).
To better characterize the GD-associated intestinal microbial ecology, we constructed a network incorporating MAGs, metabolites and clinical indexes based on microbe-metabolite co-occurrence analysis (Fig. 2G). The network reveals a bipartite pattern separating healthy and GD-associated features. It characterizes the increase of pathogenic bacteria and opportunistic pathogens with detrimental metabolites as well as the lack of mutual microbes and organic acids were the common intestinal microbial ecology characteristic of GD patients.
Differential genetic variations of GD-associated microbes
Beyond the taxonomic and functional features, we further explored the evolutionary changes at the genetic level in intestinal microbial species. We aligned the metagenomic data against the reference genomes of species with relative abundance higher than 0.5% in the present cohort and reconstructed a profile of SNPs. A total of nine common intestinal species were annotated, with the number of SNPs ranging from 46 to 7,603 (Fig. 3 and Fig. S5). Among them, 776 SNPs were annotated in the species of Faecalibacterium prausnitzii, 5,974 in Bacteroides vulgatus and 7,603 in Eubacterium rectale.
A larger number of SNPs indicative of higher evolutionary diversity was observed in the genome of B. vulgatus in GD patients, whereas the opposite was found in F. prausnitzii and E. rectale genomes (Fig. 3B, E and H). Further analysis revealed the consistency between the relative abundance and the number of SNPs of these species, indicating that the correlation between evolutionary elasticity and bacteria fitness (Fig. 3A, D and G).
We further identified SNPs with significantly different frequencies between the healthy and GD groups in B. vulgatus (n = 90), F. prausnitzii (n = 119) and E. rectale (n = 66). These SNPs are mainly enriched in genes under the categories of xylanase activity, mannonate dehydratase activity, beta-lactamase activity, transporter activity and beta-galactosidase activity (Fig. 3C, F, I).
Combined microbial marker types predictive of GD status
Using the Random Forest method, we trained a supervised classification model for the disease status based on the combination of intestinal species, MAGs, MAG-related genes and SNPs. The resulting predictive model with the highest accuracy (area under the receiver operating characteristic curve, or AUC = 84.50%) encompassed 32 biomarkers, including four species, 19 MAGs, six related genes and three SNPs (Fig. 4B). We ranked the 32 biomarkers according to their contribution to the predictive model (Fig. 4A). The predictive model was applied to three test sets and exhibited high accuracy (Fig. 4C). It effectively distinguished severe (AUC = 98.08%) and mild (AUC = 78.11%) Graves’ patients from healthy subjects, and determined the disease status from all three subject groups (AUC = 88.21%).
Meanwhile, we constructed three separate predictive models, each of which was based on one type of feature (MAGs, genes or SNPs). We found that 16 MAGs or 16 genes were needed to accurately classify the samples in the training set (AUC = 74.60% and 79.85%, respectively) (Fig. S6A, B), but as many as 64 intestinal microbial SNPs or 64 selected species could reach similar accuracy (AUC = 79.53%) (Fig. S6C). The results could be confirmed by the confusion matrix of the test group (Fig. S6D). However, the performance of the combined biomarkers from multiple feature types (see above) outperformed that of the biomarkers of any single type (AUC diff. 4.65%–9.90%).
Specificity of GD biomarkers against other metabolic diseases
To test the performance of the 32 combined GD biomarkers developed above in a broader background, we performed a meta-analysis across another healthy cohort and six metabolic diseases cohorts including ankylosing spondylitis, liver cirrhosis, colorectal cancer, Parkinson’s disease (PD), rheumatoid arthritis and type 2 diabetes. The specificity of the combined biomarkers was calculated. Among the 32 biomarkers, we highlight the importance of five MAG markers under the family Erysipelotrichaceae and the genera Coprobacillus, Streptococcus and Rothia that are enriched in the GD group (Fig. 4D, red), which exhibited unique specificity among all cohorts. The discrimination was quantified and confirmed by the violin plot constructed with the CLR-transformed abundance of the five MAGs across all cohorts (Fig. 4E).
Most notably, we observed that the five GD-enriched MAG markers also exhibited excellent discrimination in PD (Fig. 4D, upper panel in red box), a disease often involving thyroid dysfunction and is difficult to distinguish from GD in diagnosis 14. To confirm this observation, we further validated the specificity of the five markers in two separate PD cohorts. The abundance of the five MAG markers was significantly higher in GD patients compared with that in PD patients (Fig. 4E). The predictive model constructed based on these five markers also exhibited high accuracy (AUC = 97.31%) in discriminating GD and PD subjects (Fig. S7).