We enrolled 111 pregnant women: 44 with gestational diabetes mellitus (GDM+) and 67 healthy controls (GDM-). Because all volunteers were Chinese women, sex and ethnicity did not act as confounding factors. In total, 105 saliva and 51 dental plaque samples were acquired, of which 45 saliva and dental plaque samples were paired, with each pair collected from the same person (Additional file 1A). For each sample, the V3-V4 regions of the 16S rRNA gene were sequenced. Sequencing yielded 16 million paired-end (PE) reads (2×250 bp), with ~108,305 reads per sample (Additional file 1B). Each pair of PE reads was merged into one sequence based on the overlap between the reads. Most of the merged sequences were 400-450 bp long (Additional file 1C). According to the rarefaction curves (Additional file 2A) and Good's coverage (Additional file 2B), the sequencing depth was sufficient to represent the microbial diversity of each community.
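For readers unfamiliar with Good's coverage, the following minimal sketch shows the standard estimator, 1 - (number of singleton OTUs / total reads), for one sample. The function name and the count vector are illustrative assumptions and are not taken from the study's pipeline or data.

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (singleton OTUs / total reads) for one sample."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # A singleton is an OTU observed exactly once in this sample.
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / total


# Hypothetical OTU count vector (NOT data from this study): 1000 reads, 3 singletons.
example_counts = [500, 300, 150, 40, 7, 1, 1, 1]
coverage = goods_coverage(example_counts)
```

Values close to 1 mean that few reads belong to OTUs seen only once, which is the sense in which the sequencing depth is said to represent the community.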
Changes of oral microbiota in patients with GDM
To investigate whether hyperglycaemia that develops during pregnancy is accompanied by extensive changes in the oral microbiota, we examined the microbial shifts in the saliva and dental plaque of pregnant women diagnosed with GDM. Both saliva and dental plaque samples from GDM+ women clustered separately from those of GDM- women (Figure 1A), although there was no significant difference in α-diversity (Additional file 3A-D). Additionally, we calculated Bray-Curtis (BC) distances using normalized OTU abundance. In saliva, the BC distances between samples within the GDM+ group were significantly smaller than those within the GDM- group or those between the GDM+ and GDM- groups (P < 0.001, Mann-Whitney test). In dental plaque, the differences in BC distances were less pronounced than in saliva (Figure 1B). These results suggest that pregnant women with GDM have an oral microbial community distinct from that of healthy pregnant women. The microbial shift in the oral cavity of GDM+ women was sample-type specific and was more pronounced in saliva than in plaque.
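The distance comparison described above can be sketched as follows: compute pairwise Bray-Curtis distances, split them into within-group and between-group sets, and compare two sets with a Mann-Whitney U test. This is an illustration under stated assumptions, not the authors' code: the function names and abundance profiles are hypothetical, and the p-value uses a normal approximation without tie correction.

```python
import math
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def group_distances(profiles, labels):
    """Split all pairwise distances into within-group and between-group lists."""
    within, between = {}, []
    for i, j in combinations(range(len(profiles)), 2):
        d = bray_curtis(profiles[i], profiles[j])
        if labels[i] == labels[j]:
            within.setdefault(labels[i], []).append(d)
        else:
            between.append(d)
    return within, between


def mann_whitney_u(x, y):
    """U statistic of x versus y and a two-sided p-value.

    Assumption: normal approximation, no tie or continuity correction.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical relative-abundance profiles (NOT data from this study).
profiles = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.3, 0.1, 0.6]]
labels = ["GDM+", "GDM+", "GDM-", "GDM-"]
within, between = group_distances(profiles, labels)
```

With real data, `mann_whitney_u(within["GDM+"], between)` would correspond to the intra-group versus inter-group comparison reported in the text.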
Oral microbial variations between GDM and major oral diseases
To explore the relationship between GDM and periodontitis, and to determine whether oral microbial variations in major oral diseases such as dental caries could compromise the accuracy of GDM classification based on bacterial biomarkers, we compared the oral microbial shifts among GDM, periodontitis, and dental caries.
The number of bacteria shared with pregnant women did not differ significantly between the oral microbiota of periodontally healthy individuals (PH) and that of periodontitis patients (PD), regardless of whether the pregnant women had GDM (Figure 2A-B). The Bray-Curtis distances between either GDM+ or GDM- and PH were significantly smaller than the distances to PD (P < 0.0001, Mann-Whitney test), in both saliva and dental plaque (Figure 2C-D). These results indicate that the oral microbiota of pregnant women was more similar to that of periodontally healthy individuals than to that of periodontitis patients; thus, the microbial variations in the oral cavities of pregnant women with GDM are not equivalent to those of periodontitis.
Likewise, there was no significant difference in the number of shared bacteria in the oral cavity when pregnant women with or without GDM were compared with the caries-free (NC), mild caries (LC), moderate caries (MC), and severe caries (HC) groups, respectively (Additional file 4A-D). The saliva and dental plaque microbiota of both GDM+ and GDM- showed a larger Bray-Curtis distance to the dental caries groups than to NC (Additional file 4E), which indicates that the oral microbial shifts in GDM have little relationship with those in dental caries.
SVM classification model of GDM
To identify specific microbial biomarkers that can be used to discriminate GDM, we investigated the genera that differed between pregnant women with and without GDM. First, we compared the two groups using LEfSe with an LDA score threshold of 3.0 (Additional file 5A-B). In saliva, Leptotrichiaceae, Lautropia, Neisseria, Neisseriales and 4 other bacterial taxa were significantly enriched in GDM+, while Selenomonas, Leptotrichia, F16 and 3 other taxa were depleted (Figure 3A). In plaque, Lautropia, Neisseria and Neisseriales were significantly enriched in GDM+, while bacteria such as Streptococcus and Veillonella were depleted (Figure 3B). Lautropia and Neisseria were thus characteristic of GDM+ in both saliva and dental plaque.
To broaden the search for potential microbial markers, we performed an odds ratio analysis. Four genera, Lautropia, Neisseria, Streptococcus, and Veillonella, differed significantly between the GDM+ and GDM- groups in both saliva and dental plaque samples (Additional file 5C-D), suggesting that these four genera may serve as microbial biomarkers to distinguish GDM+ from GDM-. Notably, Streptococcus and Veillonella were also depleted in patients with periodontitis (Additional file 6A), indicating that any relationship between GDM and periodontitis may be related to the decreased abundance of these two genera. None of the four genera varied significantly in dental caries (Additional file 6B-D), indicating that the changes in the microbial community in GDM have little relationship with those in dental caries.
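The text does not state how the odds ratios were computed. One common approach, shown here purely as an assumption, is to dichotomize each genus at an abundance cutoff, build a 2×2 table of group by high/low abundance, and report the odds ratio with a Woolf (log-based) 95% confidence interval. The cutoff, function names, and counts below are hypothetical.

```python
import math


def two_by_two(abundances, is_case, cutoff):
    """Count (case high, case low, control high, control low) at a given cutoff."""
    a = b = c = d = 0
    for x, case in zip(abundances, is_case):
        high = x > cutoff
        if case and high:
            a += 1
        elif case:
            b += 1
        elif high:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf confidence interval.

    Assumption: add 0.5 to every cell (Haldane-Anscombe) if any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts for one genus (NOT results from this study):
# 30 of 44 GDM+ and 20 of 67 GDM- samples above the cutoff.
estimate, lower, upper = odds_ratio(30, 14, 20, 47)
```

Under this reading, a genus would be called enriched in GDM+ when the whole interval lies above 1 and depleted when it lies below 1.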
To optimize the efficiency of identifying GDM, the bacteria that were characteristic in both saliva and dental plaque were used to construct classification models. Because Lautropia, Neisseria, Streptococcus, and Veillonella differed significantly in both sample types, they were used to construct classification models based on the support vector machine (SVM) algorithm. First, to find the optimal combination of microbial biomarkers, we performed an orthogonal experiment using the paired saliva and dental plaque samples collected from the same person (Figure 4A). The optimal combination, which used the relative abundance of Lautropia and Neisseria in dental plaque and of Veillonella in saliva, reached an AUC of 0.84 (95% CI: 0.81-0.87). Using Streptococcus and Veillonella from both sample types gave an AUC of 0.78 (95% CI: 0.75-0.81), while using only Streptococcus from both sample types gave 0.75 (95% CI: 0.71-0.78). Subsequently, when the ROC curve was drawn over 1000 iterations (Figure 4B), the AUC of the combination of dental plaque Lautropia and Neisseria and saliva Veillonella was 0.83 (95% CI: 0.82-0.84). Even with Streptococcus alone from the two sample types, the AUC reached 0.74 (95% CI: 0.73-0.75).
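To make the reported AUC values and confidence intervals concrete, the sketch below computes a rank-based AUC from classifier scores and a percentile bootstrap interval over 1000 resamples. The resampling scheme behind the "1000 iterations" is not described in the text, so the bootstrap here is an assumption. The scores stand in for the output of any classifier (for example an SVM decision function) and are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed scheme, not the authors')."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_iter)]
    upper = stats[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper


# Hypothetical classifier scores (NOT results from this study); 1 = GDM+, 0 = GDM-.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
point_estimate = auc(scores, labels)
ci_lower, ci_upper = bootstrap_auc_ci(scores, labels)
```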
Because saliva sampling is simple, convenient, and non-invasive, we explored how well GDM could be distinguished using saliva samples alone. We combined another 59 saliva samples with the original 46 saliva samples to construct an SVM classification model (Additional file 1A). Based on the orthogonal experiment with the four genera in saliva (Additional file 7), the AUC was 0.78 (95% CI: 0.77-0.78) using Streptococcus and Veillonella, and 0.72 (95% CI: 0.71-0.72) using Veillonella alone (Figure 4C). These results imply that several genera, or even a single genus, in saliva can distinguish GDM, providing a tool for classifying GDM with oral microbial markers.
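The search over marker combinations can be illustrated by enumerating every non-empty subset of the four saliva genera (15 subsets) and keeping the one with the highest score. Exhaustive enumeration is an assumption made here for illustration; the design of the orthogonal experiment itself is not specified in the text. The scoring function below is a placeholder for "train an SVM on this subset and return its AUC" and carries no real results.

```python
from itertools import combinations

GENERA = ["Lautropia", "Neisseria", "Streptococcus", "Veillonella"]


def feature_subsets(features):
    """Yield every non-empty subset of the features, smallest subsets first."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            yield subset


def best_subset(features, score_fn):
    """Return the subset with the highest score under a user-supplied scorer."""
    return max(feature_subsets(features), key=score_fn)


def toy_score(subset):
    """Placeholder scorer with arbitrary values; NOT derived from the study.

    In practice this would fit a classifier on the subset and return its AUC.
    """
    informative = {"Streptococcus", "Veillonella"}
    return len(informative & set(subset)) - 0.1 * len(subset)


all_subsets = list(feature_subsets(GENERA))
chosen = best_subset(GENERA, toy_score)
```

Passing the scorer as a function keeps the enumeration independent of the classifier, so the same loop could serve an SVM, a random forest, or any other model.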
Random forest classification model of GDM
In addition, to provide an alternative classifier, we constructed a model based on the random forest algorithm to discriminate GDM, using the 45 paired dental plaque and saliva samples. Recursive feature elimination was used to rank the importance of all features; the top ten features and their abundance information are shown in Figure 5A and Additional file 8. We then selected different numbers of features and calculated the AUC of each resulting model. The model performed best when built with five genera, p_Streptococcus, s_Leptotrichia, p_Eikenella, s_Kingella, and p_Anoxybacillus (Figure 5B), with an AUC of 0.89 (95% CI: 0.81-0.97) (Figure 5C). Furthermore, a model built with only p_Streptococcus and s_Leptotrichia still reached an AUC of 0.77 (95% CI: 0.67-0.87) (Figure 5D).
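The recursive feature elimination step can be outlined as a loop that refits the model on the remaining features, drops the least important one, and repeats until a full ranking is obtained. The sketch below shows only that loop. The importance function is a hypothetical stand-in for refitting a random forest and reading its feature importances, and the feature names and weights are arbitrary placeholders rather than values from the study.

```python
def recursive_feature_elimination(features, importance_fn):
    """Rank features from most to least important by repeated elimination.

    importance_fn(remaining) must return a dict mapping each remaining feature
    to its importance; it is called again after every elimination.
    """
    remaining = list(features)
    eliminated = []
    while len(remaining) > 1:
        importances = importance_fn(remaining)
        worst = min(remaining, key=lambda f: importances[f])
        remaining.remove(worst)
        eliminated.append(worst)
    eliminated.extend(remaining)  # the last survivor is the most important
    return eliminated[::-1]


# Arbitrary placeholder weights (NOT importances from this study).
TOY_WEIGHTS = {"feature_a": 3.0, "feature_b": 1.0, "feature_c": 2.0}


def toy_importance(remaining):
    """Stand-in for a refit: normalize fixed weights over the remaining features."""
    total = sum(TOY_WEIGHTS[f] for f in remaining)
    return {f: TOY_WEIGHTS[f] / total for f in remaining}


ranking = recursive_feature_elimination(list(TOY_WEIGHTS), toy_importance)
```

A model can then be evaluated on the top k features of the ranking for increasing k, which matches the way the text reports the AUC for different numbers of selected features.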