Caries-associated Salivary Microbiota of Children at Mixed Dentition from Different Geographic Locations

The microbial composition of dental caries may depend on age, diet, and geography, yet the effect of geography on these microbiomes is largely underexplored. Here, we profiled and compared saliva microbiota from 130 individuals aged 6 to 8 years, representing both healthy children (H) and children with severe caries (C) from two geographical regions of China: Qingdao and Guangzhou. First, the saliva microbiota exhibited profound differences in diversity and composition between the C and H groups. The caries microbiota featured a lower alpha diversity and more variable community structure than the healthy microbiota. Furthermore, the relative abundance of several genera (e.g., Lactobacillus, Gemella and Cryptobacterium) was significantly higher in the C group than in the H group. Next, geography dominated over disease status in shaping salivary microbiota, and a wide array of salivary bacteria was highly predictive of the individuals’ city of origin. Finally, we built a universal diagnostic model based on 14 bacterial species, which can diagnose caries with 87% and 85% accuracy within each city and 83% accuracy across cities. These findings demonstrated that despite the large effect size of geography, a universal model based on salivary microbiota has the potential to diagnose caries across human populations.

Our previous studies and others have verified the diagnostic and predictive efficacy of oral microbiota using random forest classification in deciduous and permanent dentition 6,8,44,45. Together, these results suggested that caries diagnosis models were using saliva microbial


Dental caries is one of the most prevalent chronic infectious diseases, affecting approximately half of children worldwide 1,2. Once started, the damage to teeth is irreversible 3. Severe caries, an aggressive form of dental caries, can lead to acute pain, sepsis, and potential tooth loss and can even interfere with children's quality of life, nutrition, and school participation 4. Therefore, preventive measures against caries, as well as improved tools for prognosis and early diagnosis, are of particular clinical significance.

Human oral microbiome dysbiosis is increasingly implicated in various local and systemic human diseases, such as dental caries 5, gingivitis 6, and obesity 7. The oral microbial composition depends on many factors, including age, diet, and geography. Accumulating evidence supports that changes in oral microbiota continue throughout human life 8-11, especially across the three dentitions (i.e., deciduous/primary, mixed, and permanent dentition) 12,13. Wim et al. found that Prevotella increased from deciduous to mixed to permanent dentition in healthy individuals, and that the proportion of Proteobacteria was higher in deciduous dentition than in mixed and permanent dentition 13. Another study showed that Lactobacillus spp. and Propionibacterium FMA5 were enriched in primary teeth from caries samples, while Atopobium genomospecies C1 was enriched in permanent teeth 12. The mixed dentition stage is a crucial transitional period during which deciduous teeth exfoliate successively and new permanent teeth erupt 14. It is not only the main growth and development period of children's maxillofacial and dental arches but also a period of tremendous changes in host hormones and the immune system 15, which may promote maturation of oral microbiota 16. Notably, most previous microbial studies focused on early childhood or adult caries 5,8,17,18, and reports on the association of the oral microbiome with health and caries in mixed dentition are rare 14,19,20.

Regarding geographical factors, earlier studies reported that adult populations from different continental regions or even countries had microbial variations in saliva 21,22, and supragingival microbiota differed among ethnic groups (i.e., African American, Burmese, Caucasian, and Hispanic) in children from the same geographic location (i.e., Burma) 23. Early microbiota development has a significant impact on oral health and diseases of adulthood 24. Understanding the oral microbiota differences in children in different geographic locations will shed light on the factors that might drive oral health disparities. However, the influence of geographic factors, such as city-scale differences, on the oral microbiome of healthy and diseased children is largely underexplored.

In this study, we address three general questions: (i) During the mixed dentition period, do oral communities assemble differently at different host states (i.e., healthy and caries)? (ii) How is bacterial diversity partitioned across biogeography, host states and biological gender? (iii) Should the geographic factor be taken into account when building classifiers to distinguish children with caries from healthy controls? Here, we conducted a comparison of the saliva microbiome from severe caries and healthy child cohorts between 6 and 8 years old from two cities in China (Qingdao and Guangzhou) by 16S rRNA gene sequencing (Figure 1). Ecological modeling techniques were further employed to dissect the role of saliva microbiota in caries and geography and probe the predictive value of the microbiome for diagnosing caries by identifying both biogeography- and disease-associated taxa.

To investigate whether and how caries affects oral microbiota in the mixed dentition stage, we first compared beta diversity within and between disease status (i.e., health and severe caries) and gender based on the Jensen-Shannon distances. We found that disease status (p<0.01, F=3.20), rather than gender (p>0.05; Figure 2A), had a significant effect on salivary microbiota. Furthermore, the C group exhibited significant variability, while the H group was relatively conserved in microbial community structure (p<0.05; Figure 2B). Next, we assessed the impact of disease status on alpha diversity, represented by the Shannon, Simpson, and Pielou's evenness indices. The alpha diversity was significantly lower in the C group than in the H group (all p<0.01; Figure 2C). Finally, we quantitatively profiled the bacterial taxa from the phylum to species level to characterize the mixed-dentition microbial composition (Figure S1A) and then tested whether there were any caries-enriched and caries-depleted taxa. All sequences were distributed in 13 bacterial phyla that included six predominant phyla (accounting for >99% of the microbial diversity; Figure S1A), namely, Firmicutes (78.0%), Actinobacteria (11.9%), Bacteroidetes (5.0%), TM7 (2.0%), Proteobacteria (1.6%) and Fusobacteria (1.4%). At the genus level, a total of 124 genera were identified, among which the four most abundant genera, each representing at least 5% of the average relative abundance, were Streptococcus (51.4%), Gemella (11.2%), Actinomyces (8.7%) and Granulicatella (5.8%; Figure S1A). Moreover, no 'caries-specific' taxon (present in one status but absent in the other) was detected in either group. At the genus level, Lactobacillus, Gemella, Cryptobacterium and Mitsuokella were found to have significantly higher relative abundances in the C group, while Leptotrichia, Porphyromonas, Peptococcus, TM7, and Tannerella were higher in the H group (all p<0.05, Figure S1B
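The diversity measures named in this section (Shannon, Simpson, and Pielou's evenness indices, and the Jensen-Shannon distance) can be illustrated with a minimal Python sketch. This is not the analysis code of the study, which used customized R scripts. The function names are hypothetical, the Simpson index is assumed to be in its 1 - sum(p^2) form, and base-2 logarithms are assumed for the Jensen-Shannon distance so that it ranges from 0 to 1.

```python
import math

def relative_abundance(counts):
    """Convert raw taxon counts to proportions, dropping taxa with zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts if c > 0]

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in relative_abundance(counts))

def simpson(counts):
    """Simpson index, assumed here to be the 1 - sum(p_i^2) form."""
    return 1.0 - sum(p * p for p in relative_abundance(counts))

def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), with S the number of observed taxa."""
    observed = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(observed) if observed > 1 else 0.0

def jensen_shannon_distance(counts_a, counts_b):
    """Square root of the Jensen-Shannon divergence between two samples."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must list the same taxa in the same order")
    total_a, total_b = sum(counts_a), sum(counts_b)
    p = [c / total_a for c in counts_a]
    q = [c / total_b for c in counts_b]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Kullback-Leibler divergence in bits; terms with u_i = 0 contribute 0
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))
```

Under these assumptions, a sample dominated by one taxon (e.g., counts of 97, 1, 1, 1) gives lower Shannon, Simpson, and Pielou values than an even sample (25, 25, 25, 25), which is the direction of the difference reported between the C and H groups.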

To identify geography-specific markers that contribute to predicting city of origin, we first built classification models via the random forest (RF) machine learning algorithm using healthy samples as the training set. The city of origin was predicted from healthy samples with 78.88% accuracy (area under the receiver operating characteristic curve [AUC]: 97.30%; CI: 93.80%-100.00%, Figure 4A).

The predicted probability of Guangzhou origin was significantly higher in Guangzhou samples than in Qingdao samples from the H group (Wilcoxon test, p<0.05, Figure 4B). Next, the RF model ranked the contribution of each predictor based on the variable importance, from which we can identify the most discriminatory bacteria between the two cities. Performance improvement was minimal once the top eight most discriminatory species were included (Figure 4C). Eight geography-specific marker [...] significantly increased in Qingdao city samples (Wilcoxon test, adjusted p<0.05, Figure S2A). Moreover, these taxa were shared in caries samples, representing 12.88% and 13.13% abundance for healthy and caries samples, respectively. Finally, application of the eight-marker-based model on the caries samples resulted in 92.31% accuracy (AUC: 95.00%; Figure S2B), and the predicted probability of Guangzhou origin was significantly higher in the Guangzhou samples than in the Qingdao samples from the C group (Wilcoxon test, p<0.05; Figure S2C). Thus, geography-specific differences in the salivary microbiome were consistent, irrespective of health status.

A universal disease diagnosis model for all samples across geographic locations

Consistent with the results for the Qingdao city samples, a reduction in alpha diversity was associated with caries (p<0.05; Shannon index; Figure S3A) in all samples from both cities, and the beta diversity was distinct between caries and healthy microbiota (p<0.05, F=1.00; Figure S3B). These results suggested the feasibility of caries diagnosis based on oral microbiota in different geographic locations.

We used three strategies to construct and optimize the caries diagnosis model. First, to test the effect of taxonomic level on the discriminatory power of the RF model, the models were constructed based on taxa at the phylum, genus, and species levels to discriminate between healthy and caries samples using the two city datasets. We found that the use of species-level taxa maximized performance (AUC: 88.56%, CI: 83.56%-94.61%) compared with the phylum (AUC: 64.11%, CI: 54.54%-73.67%) and genus (AUC: 77.61%, CI: 69.48%-85.74%) levels (Figure S4). Second, to test whether differences in oral microbiota in caries were consistent by city, we built RF models in each city (i.e., Qingdao and Guangzhou) and achieved diagnosis accuracies of 84.38% and 76.47%, respectively. Furthermore, training a diagnosis model in one dataset and applying it to another led to lower yet still decent and meaningful performance (Figure S5). Specifically, application of the Qingdao model (i.e., the Qingdao cohort as training data) on the Guangzhou dataset led to a reduction in the AUC from 91.10% to 83.00%, and similarly, application of the Guangzhou model (i.e., the Guangzhou cohort as training data) on the Qingdao dataset led to a reduction in the AUC from 85.81% to 80.00% (Figure S5). Third, we built RF models using all caries and healthy samples from the two geographic locations. Unexpectedly, excluding the eight geography-specific signatures from the species profile barely affected the classification performance, with the AUC changing from 88.56% to 88.99% (Figure 5A). Moreover, intriguingly, these most discriminatory taxa associated with caries state did not show correlation with geography in the healthy samples (Figure 5C [...] Figure 6A) and across two cities (AUC: 92.17%; CI: 87.45%-96.88%; Figure 6B).

Notably, Streptococcus mutans (S. mutans), which had the top importance score in the model (Wilcoxon test, adjusted p<0.05, Figure 5C and Figure S1C), has previously been documented to play a critical role in caries pathogenesis. Using only S. mutans as a predictor, the simplified random forest model led to a lower yet decent performance (AUC=81.62%, CI: 74.40%-88.84%, Figure S7). However, S. mutans was not detected in all of the samples (occurrence rate of 78.5% in the caries samples and 30.8% in the healthy samples), and neither were the other taxa (Figure S8), suggesting that dental caries is not associated with a single taxon but rather with a complex community.

It has been well documented that in dental caries, environmental perturbation alters the balance of the oral microbiota and eventually leads to a predominance of cariogenic bacteria, resulting in sustained demineralization of tooth hard tissue 25. Evidence has recently emerged that the oral microbiome may depend on age, oral dentition, diet and geography 8,13,26-

Moreover, children with caries also had higher Jensen-Shannon distances than healthy children. This is likely because caries microbiomes have higher intra-group variation and are more personalized than healthy microbiomes, which are more similar to each other 5. Moreover, our data substantiate existing evidence that organisms other than Streptococcus mutans and lactobacilli play a role in the development and progression of dental caries. At the genus level, the caries microbiome harbored a higher abundance of Lactobacillus, Gemella and Cryptobacterium than healthy controls, which is in line with previous studies 32-34. At the species level, the increase in non-mutans streptococci (i.e., S. anginosus and S. sobrinus) and Actinomyces gerencseriae in the C group was not surprising. These are recognized as acidogenic and aciduric bacteria, which have been reported to produce weaker acids that contribute to caries initiation and to thrive during caries progression under low-pH conditions (e.g., pH=5.0) 19,35,36. Similarly, according to our study and others 5, Prevotella denticola, which potentially has proteolytic/amino acid-degrading activities, was significantly enriched in caries and was identified as the main predictor of caries. Propionibacterium FMA5 was implicated in dental caries from young permanent teeth 18 and root caries from elderly individuals 37. In addition, S. mutans was identified in relatively low abundance with a relatively low detection rate, and a model using it alone reached an AUC of 81.62%. Consistently, previous studies found that despite a significant enrichment of S. mutans with caries development, several bacteria were far more abundant in the carious lesions 38. Our findings illustrated that dental caries in the mixed dentition resulted from widespread shifts in the oral microbial community from healthy to diseased status rather than from any particular taxon, supporting the "ecological plaque

Raw sequencing data were processed by Beijing Auwigene Tech, Ltd. (Beijing, China) using the pipeline tools MOTHUR 46 and QIIME 47, and pyrosequencing data were analyzed using customized R scripts. Noise reduction was carried out using MOTHUR. The sequences were binned into operational taxonomic units (OTUs) with 97% similarity.

OTUs are groups of sequences that are clustered based on similarity, allowing taxonomic assignment.
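The idea of binning sequences into OTUs at a 97% similarity threshold can be illustrated with a simple greedy scheme. The sketch below is an assumption for illustration only and is not the clustering algorithm implemented in MOTHUR or QIIME. It assumes sequences are already aligned to equal length, and the function names identity and greedy_otu_clusters are hypothetical.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)

def greedy_otu_clusters(sequences, threshold=0.97):
    """Assign each sequence to the first OTU whose seed it matches at >= threshold.

    Returns a list of OTUs, each a list of indices into `sequences`.
    """
    seeds = []      # one representative (seed) sequence per OTU
    clusters = []   # member indices, parallel to `seeds`
    for index, seq in enumerate(sequences):
        for otu, seed in enumerate(seeds):
            if identity(seq, seed) >= threshold:
                clusters[otu].append(index)
                break
        else:
            # No existing seed is similar enough, so this sequence starts a new OTU
            seeds.append(seq)
            clusters.append([index])
    return clusters
```

In a greedy scheme like this one, the result depends on the order in which sequences are processed, so real pipelines typically fix that order (for example by abundance) before clustering.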

Random forest (RF) was applied to identify features that are differentially abundant (i.e., present in different abundances) across sample groups and to build diagnosis models. The N top-ranking caries-discriminatory taxa and geography-discriminatory taxa that led to a reasonably good fit were identified based on the 'rfcv' function in the randomForest package (https://cran.r-project.org/web/packages/randomForest/index.html). RF models were trained to identify disease status in the training set, which included samples from the healthy and severe caries groups, using the taxonomy profiles. Model performance was evaluated with a 10-fold cross-validation approach and receiver operating characteristic (ROC) curves. Using the species profiles, the original samples were randomly partitioned into 10 groups with a similar distribution of healthy and caries samples. In each cross-validation iteration, nine groups of samples were used as training data and the samples in the remaining group were used for testing. The process was repeated 10 times so that each group served once as the test set, and the prediction reported for each sample was the one obtained when that sample was in the test fold. Based on the optimization step that selects the taxonomic level that maximizes model performance, the final RF models were based on the taxonomic profiles at the species level. ROC analysis was then used to evaluate the diagnostic performance of the RF models (https://cran.r-project.org/web/packages/pROC/index.html). In the ROC plots, the x axis represents the false-positive rate (FPR, or 1 - specificity), and the y axis represents the true-positive rate (TPR, or sensitivity). The area under the ROC curve (AUC) was calculated to quantify the performance of the RF model.
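Two steps of the evaluation procedure described above, the class-balanced partition into 10 folds and the computation of the AUC from per-sample predictions, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the study's R code based on randomForest and pROC. The function names are hypothetical, class labels are assumed to be coded as 1 (caries) and 0 (healthy), and the random forest itself is omitted.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal members round-robin, continuing where the previous class stopped
        for position, index in enumerate(members):
            folds[(offset + position) % k].append(index)
        offset += len(members)
    return folds

def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from (0, 0) to (1, 1), one per distinct score."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last sample sharing this score (handles ties)
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

In a full workflow, each fold would be held out in turn, a model would be trained on the other nine folds, and the held-out scores would be pooled before calling roc_points and auc.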