Baseline characteristics of study population
Demographic characteristics of 201 GDM cases and 201 controls are shown in Table 1. The mean age of women in this study was 28.0 (standard deviation [SD], 4.2) years old, with gestational age of 11.6 (SD, 2.0) weeks. No significant differences were found between cases and controls in smoking, family history of GDM, and intakes of meat, vegetables, and fruits at enrollment in early pregnancy. Compared with non-GDM controls, women with GDM had significantly higher BMI, and were more likely to be current or former alcohol drinkers. A total of 13 GDM cases had a history of GDM, while none of the controls had a history of GDM.
Sequence processing and quality control
We employed a community 16S rRNA gene amplicon based sequencing approach to compare the gut microbiota of GDM cases and controls. For each sample, community 16S rRNA gene were amplified and sequenced yielding 21,914,351 high quality joined reads after quality filtering and chimera check (54,513 reads per sample on average). We clustered all quality sequences into 9,852 features. Rarefaction curves demonstrated that the sequencing depth of each sample at 13,000 was deep enough to represent its microbial diversity (Additional file 2: Figure S1).
Gut microbiota differences between GDM cases and controls
The overall gut microbial α-diversity indexes including observed ASVs, Shannon diversities, evenness, and Simpson diversity based on randomly subsampled 13,000 sequences per fecal sample showed no significant difference between GDM cases and controls (Additional file 2: Figure S2). Results on β-diversities by PCoA analyses suggested that the overall gut microbial community structures in GDM cases were significantly different (P<0.05, PERMANOVA) from controls during early pregnancy (P>0.05, PERMDISP) (Fig. 2). Then we followed the reference protocol of QIIME2 to identify the differentiated ASVs by Ancom analysis from the remaining 208 ASVs after filtered from 9852 ASVs. However, only 2 genera (Coprococcus and Anaerostipes) were identified to successfully differentiate GDM cases from controls during early pregnancy (Additional file 2: Figure S3), and both belonged to the Lachnospiraceae family of the Clostridiales order and were significantly reduced in the GDM group (Additional file 3: Table S1).
To further investigate the taxonomic differences of gut microbiota, we compared the relative abundances of bacterial groups at multiple taxonomic levels (Additional file 3: Table S2). At phylum level, Actinobacteria (case: 4.5%, control: 6.9%), Cyanobacteria (case: 0.19%, control: 0.21%) and TM7 (case: 0.39%, control: 0.55%) were significantly under-represented, while Proteobacteria that contained multiple opportunistic pathogens (case: 4.5%, control: 3.2%) was significantly over-represented in GDM cases (Fig. 3). We further compared Actinobacteria at lower taxonomic levels and found SCFA producing Actinobacteria members including Rothia, Actinomyces, Bifidobacterium, Adlercreutzia, and Coriobacteriaceae were all significantly reduced in GDM cases versus controls, while inflammation related members of Collinsella were not significantly different (Additional file 3: Table S2, Fig. 3). There were no significant differences in relative abundances on other dominant phyla of Firmicutes, Bacteroidetes, or their ratios (Additional file 2: Figure S4).
At family level, 14 abundant and prevalent families (relative abundance >1%, prevalence >50%) that added up to >95% of the gut microbiota were identified (Fig. 3, Additional file 3: Table S2). Among these abundant families, 4 were found significantly different between GDM cases and controls. Veillonellaceae in the order of Clostridia was significantly enriched in GDM cases versus controls (9.4% versus 8.4%), while potential beneficial bacterial group Bifidobacteriaceae (2.9% versus 4.3%) and Coriobacteriaceae (1.4% versus 2.4%) belonging to Actinobacteria were reduced. Streptococcaceae (3.8% versus 5.3%) in the order of Lactobacillales were also reduced in GDM cases. Enterobacteriaceae belonging to Proteobacteria that contains multiple pathogenic microorganisms was slightly higher in GDM cases (3.4% versus 2.2%), despite no difference of statistical significance (Additional file 3: Table S2).
Biomarker identification through multiple approaches
1) Genera differences between GDM cases and controls identified by Random Forestclassifier and LEfSe analysis
Random Forest classifier selected 96 from the total 274 genera included in the model (Fig. 4). The model has a relatively high robust diagnostic accuracy (areas under the ROC curve [AUC]=0.77) for detecting GDM. In the ranks of importance for the model, we found several taxa showed high impact constructing the model (Additional file 3: Table S3). The genus 227 of Enterobacteriaceae, which has multiple members of opportunistic pathogens, was significantly elevated in GDM cases (Additional file 3: Table S3). Beneficial Actinobacteria members showed high importance in the model. Those include Actinomyces and Coriobacteriaceae spp. (genus 26 and genus 27). Moreover, Gemellaceae that were significantly under-represented in GDM cases had high importance for GDM model construction. Other microorganisms included Ruminococcus, Clostridium and other several taxa in Lachnospiraceae (genus 103, Anaerostipes and Blautia).
LEfSe analysis identified 17 biomarkers at genus level that differentiate GDM cases from controls (Fig. 5). Some overlapped with the ones of high importance identified by random forest analysis as Gemellaceae and multiple beneficial Actinobacteria members including Adelercreutzia, genus26 of Coriobacteriaceae, Bifidobacterium, Rothia, and Actinomyces were found significantly reduced, while the genus67 of Cyanobacteria was significantly over-represented in GDM cases. Interestingly, there were 4 other genera enriched in GDM cases. These genera included Lachnospira, SCFA producing bacteria Phascolarctobacterium, Cardiobacterium that associated with inflammation, and opportunistic pathogen Paraprevotella (Fig. 5).
2) Gut microbiota correlated to GDM and glucose levels in the DR analysis
We applied 396 features at species level to the DR analysis, and screened from 197 features after regression to identify which taxa changed the most relative to each other in GDM cases versus controls. The top 10 most differential taxa were identified to discriminate GDM cases from controls, and added information compared with relative abundance-based analysis (Fig. 6). Interestingly, the SCFA producer Ruminococcus was on the top list of shifting towards GDM. The top 5 most reduced taxa in GDM cases included opportunistic pathogen Erysipelatoclostridium, beneficial SCFA producing Lactobacillus and Weissella.
We also highlighted several well-known gut microorganisms that tend to have high or low ranks and were considered associated with disease or healthy status. We found in that Clostridium, Roseburia, and Coriobacteriaceae tended to have high ranks, and Staphylococcus tended to have low ranks. The difference in ranks suggested similar findings in relative abundances that Clostridium, Roseburia, and Coriobacteriaceae were more abundant relative to other taxa in controls, while Staphylococcus was more abundant relative to other taxa in the GDM group. When we investigated the log-ratio of the relative abundance on the above two taxa between GDM cases and controls, we observed the increase of Staphylococcus relative to Clostridium, Roseburia, and Coriobacteriaceae in GDM cases. We also investigated the correlation of log ratios between the above selected taxa and OGTT values measured during the middle of pregnancy (Table 1). The log ratios between Staphylococcus and the other taxa were significantly different between GDM cases and controls, and correlated with OGTT values (Fig. 6).
3) GDM associated microbial interactions in co-occurrence network analyses
The microbial association network analysis was first performed based on all 274 features identified at genus level (Additional file 2: Figure S5). GDM cases had much few microorganisms (136 nodes for GDM cases, 155 nodes for controls) and significant correlations (891 edges for GDM cases, 1073 edges for controls) in the network compared with the control group (Additional file 3: Table S4). Each of the genera selected for both networks was calculated for the degree of differences between GDM cases and controls based on its importance (with how many nodes significantly correlated) in the network. Correlation analysis identified 177 genera that were significantly different in GDM cases compared with controls (Additional file 3: Table S5).
Besides microorganisms identified in other analyses, the roles of additional members were identified as changing dramatically in the network between GDM cases and controls. The top changing genera included Proteobacteria members living in extreme environments (Psychrobacter), pathogenic microorganisms of Fusobacterium, short chain fatty acid producers of Anaerostipes, Coprococcus, Blautia, Faecalibacterium, Roseburia, and Sutterella, beneficial bacteria Akkermansia of Verrucomicrobia, Atopobium and Eggerthella of Actinobacteria, and Paraprevotella of Bacteroidetes (Additional file 3: Table S6). To examine how the roles of discriminatory taxon differentiate between GDM cases and controls, we selected 27 genera of interest and conducted co-occurrence network analysis on a smaller scale. The resulting networks (P<0.05, |r|>0.2) showed that the correlations among these microbes were different between GDM cases and controls (Fig. 7). Consistent with what we found in the overall network analysis, few nodes (25 nodes for GDM cases, 26 nodes for controls) and edges (69 edges for GDM cases, 98 edges for controls) were noted for GDM cases compared with controls (Additional file 3: Table S7). Compared with controls, several taxa including genus 261 of TM7, SCFA producers Actinomyces and Roseburia, and the genus 127 of the GDM-related group Ruminococcaceae lost the most significant correlations with other taxa in the GDM network (Additional file 3: Table S8).
Gene prediction suggested little functional differences
The PICRUSt program was used to predict functional capacities of the gut microbiota from community 16S rRNA gene amplicons based on 9852 features, and 328 predicted functional categories were identified. Although we found the overall gut microbial community structures were significantly different between GDM cases and controls in earlier analyses, we did not find significant differences in the overall predicted functional structures indicated by NMDS (Additional file 2: Figure S6). Nevertheless, we further investigated the possibility of biomarkers in functional genes by going through individual biomarker identification analysis pipeline and implemented LEfSe to identify predicted functional categories at KEGG level 1, 2 and 3 that could discriminate GDM cases from controls. From the categories that made up KEGG level 1, genetic information processing showed significantly lower abundance in GDM cases versus controls. At KEGG level 2, we detected 39 functional categories in total. Of them, 3 were significantly over-represented in GDM cases, including functional categories of metabolism of cofactors and vitamins, cellular processes and signaling, and metabolism of other amino acids. Meanwhile, 5 were found enriched in controls, including functional categories of replication and repair, translation, nucleotide metabolism, metabolism of terpenoids and polyketides, and signaling molecules and interaction (Additional file 3: Table S9, Additional file 2: Figure S7). Of them, 3 were over-represented in the GDM group including pores ion channels, other ion-coupled transporters and membrane and intracellular structural molecules. The rest 5 predicted functional categories enriched in controls mainly included pyrimidine metabolism, purine metabolism, ribosome, aminoacyl-tRNA biosynthesis and DNA repair and recombination proteins. Collectively, we did not find any strong evidence or clear trends in predicted functional gene profiles to discriminate GDM cases from controls.
Prospective associations between microbial taxa and risk of GDM
We compared relative abundances of 41 gut microbial taxa of interest (identified through earlier procedures) between GDM cases and controls on all taxonomic levels. Higher relative abundance of 4 microbial taxa showed statistical associations with lower risk of GDM (Table 2). We found that greater abundance of Actinobacteria was associated with decreased risk of GDM (multivariable-adjusted OR per percent increase of relative abundance, 0.94 and 95% CI [0.90 to 0.98]). The family of Coriobacteriaceae was also associated with decreased risk of GDM (OR, 0.79; 95% CI, 0.68 to 0.91). In addition, genus 26 (OR, 0.47; 95%, 0.29 to 0.77) of Coriobacteriaceae and an unknown specie of the Coprococcus (OR, 49%; 95%, 0.30, 0.81) was respectively 53% and 51% less likely as to confer GDM risk for each 1 percent increase of relative abundance.