Evaluating the ecological hypothesis: Early life salivary microbiome assembly predicts dental caries in a longitudinal case-control study

doi:10.21203/rs.3.rs-848589/v2

Research Article

This work is licensed under a CC BY 4.0 License

Journal publication: published 26 Dec, 2022 in Microbiome

Background

Early childhood caries (ECC), i.e., dental caries (cavities) occurring in primary teeth up to 6 years of age, is a prevalent childhood oral disease with a microbial etiology. Streptococcus mutans was previously considered a primary cause, but recent research promotes the ecological hypothesis, in which a dysbiosis in the oral microbial community leads to caries. In this incidence-density sampled case-control study of 189 children followed from 2-months to 5-years, we use the salivary bacteriome to 1) prospectively test the ecological hypothesis of ECC in salivary bacteriome communities and 2) identify co-occurring salivary bacterial communities predicting future ECC.

Results

Supervised classification using bacteriome-wide data from saliva samples collected at 12-months of age predicts future ECC case status (AUC-ROC 0.78, 95% CI: 0.71–0.85) before S. mutans can be detected. Dirichlet multinomial community state typing and co-occurrence network analysis identified similar robust and replicable groups of co-occurring taxa. Mean relative abundance of a Haemophilus parainfluenzae/Neisseria/Fusobacterium periodonticum group was lower in future ECC cases (0.14) than controls (0.23, P value < 0.001) in pre-incident visits, positively correlated with saliva pH (Pearson rho = 0.31, P value < 0.01) and reduced in individuals who had acquired S. mutans by the next study visit (0.12) versus those who did not (0.19, P value < 0.01). In a subset of whole genome shotgun sequenced samples, case plaque had higher abundances of antibiotic production and resistance gene orthologs, including a major facilitator superfamily multidrug resistance transporter (MFS DHA2 family, P_BH value = 1.9 × 10^−28), lantibiotic transport system permease protein (P_BH value = 6.0 × 10^−6) and bacitracin synthase I (P_BH value = 5.6 × 10^−6). The oxidative phosphorylation KEGG pathway was enriched in case plaque (P_BH value = 1.2 × 10^−8), while the ABC transporter pathway was depleted (P_BH value = 3.6 × 10^−3).

Conclusions

Early-life bacterial interactions predisposed children to ECC, supporting a time-dependent interpretation of the ecological hypothesis. Bacterial communities which assemble before 12-months of age can promote or inhibit an ecological succession to S. mutans dominance and cariogenesis. Intragenera competitions and intergenera cooperation between oral taxa may shape the emergence of these communities, providing points for preventive interventions.

Oral microbiome

early childhood

ecological hypothesis

early childhood caries

16S rRNA gene

whole genome shotgun metagenomics

In 2015–2016, 21% of US children aged 2–5 years showed evidence of early childhood caries (ECC), i.e., at least one primary tooth with one or more decayed, missing or filled tooth surfaces (1, 2). ECC can be painful, may negatively impact self-esteem, and is a strong predictor of future oral health problems (3, 4). Microbial digestion of carbohydrates to acids, which demineralize tooth enamel, is the proximate cause (5–7). Acid-producing bacteria, particularly Streptococcus mutans (S. mutans), are frequently associated with ECC (5, 8). No single bacterial species, however, has been conclusively identified as a necessary and sufficient cause of ECC across human populations (5, 8, 9). Recent research emphasizes the ecologic hypothesis, which posits that overall shifts in the composition, structure, and functional potential of the oral microbial community lead to dental decay (5, 10). The oral microbiome assembles rapidly over the first two years of life (11). However, few studies of ECC have prospectively tested the ecologic hypothesis during this early life period of assembly.

To assess the bacterial community in saliva and plaque samples, 16S rRNA gene amplicon sequencing simultaneously measures the bacterial taxa present (11–15). However, common methods for analyzing 16S rRNA gene data fail to capture the spirit of the ecological hypothesis. Estimating the effect of each identified taxon as an independent predictor ignores how bacteria interact to affect risk, which is a key component of the ecological hypothesis (5, 16). Diversity metrics, such as alpha and beta diversity, conveniently and efficiently summarize information across all measured taxa, but findings on associations between diversity metrics and cariogenesis are mixed (17–21). The lack of consistency may be attributed to differences in study design, conduct and analysis, but may also reflect the inherent limitations of diversity metrics, which ignore taxonomic, ecologic, and functional differences between bacteria that can impact disease processes such as cariogenesis (22).

Microbial communities and ecologies are dynamic, and early childhood is a susceptible life-period for short- and long-term oral microbial community assembly. The oral microbiome is acquired after birth and influenced by environmental factors (11, 12, 23). Very few studies have prospectively tested the effect of oral microbial community assembly on ECC risk. A 2019 Australian study of 134 children followed for 5 years noted a shift in salivary microbiome composition at 39 and 48.6 months of age associated with future ECC (14). Microbial taxa, including Streptococcus sobrinus and Scardovia wiggsiae, were identified as potential biomarkers of ECC onset. The percentage of S. mutans in saliva was the best prospective predictor of future ECC (13, 14). The authors concluded, however, that the magnitude of change in the salivary bacteriome was inadequate to differentiate between health and disease at clinical levels. A smaller 2020 study of 56 children aged 1–3 years followed for 2-years demonstrated that the early life salivary bacteriome could prospectively classify future ECC onset (area under the receiver operating characteristic curve = 0.71) and identified several taxa that may serve as biomarkers of ECC (15). These studies prospectively link community-wide shifts in the early-life salivary microbiome to ECC. However, neither these studies nor most other longitudinal cohorts have explicitly evaluated how co-occurring groups of oral bacteria or functional interactions between taxa influence ECC risk.

To understand the influence of oral microbial community assembly on future oral health, explicit tests of the ecological hypothesis and identification of influential microbial communities are required. We used a longitudinal cohort of children to: 1) prospectively test the ecological hypothesis of ECC in salivary bacteriome communities and 2) identify co-occurring salivary bacterial communities influencing the risk of future ECC. We performed 16S rRNA gene amplicon sequencing on 855 longitudinal saliva samples from 99 children with ECC and 90 incidence-density sampled control children followed from 2-months to 5-years of age. We show that bacteriome-wide taxonomic information at 12-months of age better classifies future ECC status than S. mutans amplicon abundance alone. We identify robust and replicable communities of co-occurring bacteria using unsupervised clustering techniques, including a protective community of Neisseria/Haemophilus parainfluenzae/Fusobacterium periodonticum which was less abundant in future ECC cases. Finally, we comment on ecological and functional interactions that may shape the assembly of these communities using clinical data and functional potential measurements from a subcohort with shotgun metagenomic sequencing data.

Description of cohort

We selected an incidence-density matched case-control subset from the Center for Oral Health Research in Appalachia 2 (COHRA2) cohort. In the entire COHRA2 cohort, 47% of children were female, 79% were White and 71% were delivered vaginally. At 2-months of age, 58% of children were breastfed; this decreased to 32% by 12-months of age and 6% by 24-months of age. By 24-months of age, 3.8% of children in COHRA2 had a carious lesion or white spot. We analyzed a nested case-control sample of 99 children who developed a carious lesion or white spot at or before 60-months of age and 90 control children who were free of dental lesions at the age of case diagnosis. Of the 189 children, 169 were White and 20 were bi- or multi-racial, 100 were from West Virginia and 89 from Pennsylvania, and 97 were male and 92 female. None of these characteristics differed between cases and controls (Table 1). The mothers of controls were more likely to be educated beyond high school (63%) than the mothers of cases (33%, P < 0.001). Cases and controls were similar in the distribution of delivery mode, recent antibiotic exposure, breastfeeding, and count of erupted primary teeth (Table 1). Sampled controls were representative of the underlying disease-free cohort, although the proportion of bi- and multi-racial children was lower in the nested case-control sample (Additional file 1). Among the 99 ECC case children, the youngest age of diagnosis was 12-months, with a mean age of diagnosis of 38 months. We sequenced the V4 16S rRNA gene region in saliva samples from the visit corresponding to ECC diagnosis (incident visit) and all preceding visits (non-incident visits) for case and control children (Fig. 1, Figure S1-2 in Additional File 2, Additional File 3). From the 855 saliva samples across all incident and non-incident visits, we identified 3194 amplicon sequence variants (ASVs). We labeled ASVs that did not classify to the species level with ASV numbers.
Alpha diversity of the salivary microbiome increased as children aged. Alpha diversity was inconsistently associated with future ECC diagnosis across visits (Table 1).
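For readers less familiar with the two alpha diversity measures reported in Table 1, the following is a minimal sketch of how Shannon diversity and Chao1 richness can be computed from the ASV counts of a single sample. The function names and the use of the natural logarithm are assumptions made for illustration; the software and settings actually used in this study are not described in this section.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Chao1 richness: observed taxa plus a correction based on rare taxa."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons > 0:
        # Classic estimator: S_obs + F1^2 / (2 * F2).
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used here when no doubletons are present.
    return observed + singletons * (singletons - 1) / 2


# Hypothetical ASV counts for one saliva sample (illustrative only).
example_counts = [120, 40, 8, 2, 1, 1]
print(round(shannon_index(example_counts), 3), chao1(example_counts))
```

Shannon diversity rises with both the number of taxa and the evenness of their abundances, whereas Chao1 estimates richness only, which is one reason the two measures need not agree across visits.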

Table 1

Associations between future early childhood caries, salivary microbiome measures, and other characteristics among non-incident children from Appalachia
Characteristic	~ 2-month visit^a			~ 12-month visit^a			~ 24-month visit^a
Characteristic	Case, N = 99¹	Control, N = 91¹	p-value²	Case, N = 89¹	Control, N = 81¹	p-value²	Case, N = 74¹	Control, N = 69¹	p-value²
Shannon	2.1 (0.5)	1.9 (0.5)	0.01	2.9 (0.4)	3.0 (0.4)	0.05	3.5 (0.3)	3.5 (0.4)	0.5
Missing	7	6		5	5		4	4
Chao1	31.8 (12.3)	27.8 (10.0)	0.03	57.4 (16.8)	62.7 (14.6)	0.05	86.6 (20.1)	88.1 (19.2)	0.6
Missing	7	6		5	5		4	4
S. mutans abundance	0.0 (0.0)	0.0 (0.0)	> 0.9	0.0 (0.0)	0.0 (0.0)	0.07	0.0 (0.0)	0.0 (0.0)	< 0.001
Missing	7	6		5	5		4	4
S. mutans ASV detected			> 0.9			0.12			< 0.001
No	90 (98%)	83 (98%)		78 (93%)	75 (99%)		51 (73%)	64 (98%)
Yes	2 (2.2%)	2 (2.4%)		6 (7.1%)	1 (1.3%)		19 (27%)	1 (1.5%)
Missing	7	6		5	5		4	4
S. wiggsiae abundance	0.0 (0.0)	0.0 (0.0)	0.7	0.0 (0.0)	0.0 (0.0)		0.0 (0.0)	0.0 (0.0)	0.3
Missing	7	6		5	5		4	4
S. wiggsiae ASV detected			0.7						> 0.9
No	86 (93%)	78 (92%)		84 (100%)	76 (100%)		69 (99%)	65 (100%)
Yes	6 (6.5%)	7 (8.2%)					1 (1.4%)	0 (0%)
Missing	7	6		5	5		4	4
Child's race			0.8			0.8			0.5
Bi- or Multi-racial	10 (10%)	10 (11%)		8 (9.0%)	8 (9.9%)		6 (8.1%)	8 (12%)
White	89 (90%)	81 (89%)		81 (91%)	73 (90%)		68 (92%)	61 (88%)
Child's sex			> 0.9			> 0.9			0.7
Female	48 (48%)	44 (48%)		45 (51%)	41 (51%)		37 (50%)	32 (46%)
Male	51 (52%)	47 (52%)		44 (49%)	40 (49%)		37 (50%)	37 (54%)
Site			0.6			0.7			0.7
PA	45 (45%)	45 (49%)		37 (42%)	36 (44%)		37 (50%)	37 (54%)
WV	54 (55%)	46 (51%)		52 (58%)	45 (56%)		37 (50%)	32 (46%)
Currently breastfed			0.07			0.2			> 0.9
Currently breastfed	49 (49%)	57 (63%)		21 (24%)	27 (33%)		4 (5.4%)	4 (5.8%)
Not currently breastfed	50 (51%)	34 (37%)		68 (76%)	54 (67%)		70 (95%)	65 (94%)
Maternal report of child antibiotics within 3 mos prior to visit	8 (8.1%)	8 (8.8%)	0.9	24 (27%)	18 (22%)	0.5	20 (27%)	23 (33%)	0.4
Count of primary teeth erupted	0.0 (0.3)	0.0 (0.0)	0.2	6.1 (3.0)	6.1 (2.8)	0.9	16.4 (2.0)	16.0 (1.8)	0.12
Delivery			0.5			0.7			0.4
C-section	35 (36%)	28 (31%)		33 (38%)	28 (35%)		28 (38%)	22 (32%)
Vaginal	63 (64%)	63 (69%)		55 (62%)	53 (65%)		45 (62%)	47 (68%)
Not reported	1	0		1	0		1	0
Maternal education reported at prenatal visit			< 0.001			< 0.001			< 0.001
Associates degree or higher	33 (33%)	57 (63%)		28 (31%)	51 (63%)		30 (41%)	48 (70%)
High school degree or less	66 (67%)	34 (37%)		61 (69%)	30 (37%)		44 (59%)	21 (30%)
¹ Mean (SD); n (%)
² Wilcoxon rank sum test; Fisher's exact test; Pearson's Chi-squared test
^a Includes duplicate records for 1 child selected as a control at 36 months and a case at 60 months, and 1 child selected as a control for both 36 and 60 month risk sets. Excludes samples from children diagnosed as a case at that visit and their corresponding risk-set controls (N = 6 at 12-months, N = 37 at 24-months).

S. mutans did not associate with future ECC diagnosis before 24-months of age, but was elevated in cases at the visit of first ECC diagnosis

A single ASV was identified as S. mutans. We validated the identity of this ASV using BLAST and shotgun metagenomic sequencing data (Additional File 4; Figure S3 Additional File 2). At the 2- and 12-month visits, S. mutans was rare and not associated with future ECC diagnosis (Table 1). By the 24-month visit, S. mutans was more prevalent in future cases (Table 1; P value < 0.001). S. mutans prevalence and abundance were elevated in cases at the visit of ECC diagnosis: 13 of 19 ECC cases diagnosed at 24-months had S. mutans at the 24-month visit vs 2 of 18 matched controls (Additional File 5; P value = 0.001). Similarly, Scardovia wiggsiae was elevated at the visit of ECC diagnosis but not in visits preceding diagnosis (Table 1, Additional File 5).

At 12- and 24-months of age, supervised random forest using the salivary bacteriome can predict ECC status before S. mutans detection

We investigated whether future ECC status could be predicted from a random forest classifier using the 273 most abundant and prevalent ASVs sequenced from saliva samples. Separate classifiers were built using samples from the 12- and 24-month visits. Only pre-incident samples were used, i.e., we predicted if a child would go on to be diagnosed with white spots or cavities at any of the 24-, 36-, 48- or 60-month visits using their 12-month saliva sample. Children who were diagnosed with white spots or cavities at the 12-month visit and their incidence-density matched controls were excluded from the classifier. Similarly, we predicted if a child would go on to be diagnosed at any of the 36-, 48- or 60-month visits using their 24-month saliva sample, excluding saliva samples from children diagnosed at 12- or 24-months. Thus, each random forest classifier predicted future ECC diagnosis using saliva samples from before disease was clinically apparent and diagnosed.

The random forest using 273 ASVs showed good classification of future ECC status at the 12-month (AUC (95% CI): 0.78 (0.71–0.85)) and 24-month visits (AUC (95% CI): 0.72 (0.63–0.81)) (Fig. 2A). The mean decrease in the Gini coefficient provides a measure of how important a feature is for classification, with a larger decrease corresponding to a greater importance. In Fig. 2B, we show the 10 ASVs with the largest decrease in Gini coefficient from the 12- and 24-month supervised random forest classifiers. The Gini coefficient for Streptococcus mutans is included for comparison. In Fig. 2C, the distribution of the square root of ASV abundance is shown for cases (black) and controls (grey) for the ASVs with the largest decreases in Gini coefficient at the 12- and 24-month visits. Several of the important features from the random forest classifiers were more abundant in controls than in ECC cases (protective ASVs). Protective ASVs Fusobacterium periodonticum and Neisseria ASV9 were among the top 10 most important features in both the 12- and 24-month classifiers. Haemophilus parainfluenzae and Porphyromonas ASV42 were among the top 10 most important features in only the 12-month classifier, while Lachnoanaerobaculum umeaense and Porphyromonas ASV120 were among the top 10 most important features in only the 24-month classifier. Other important features were more abundant in ECC cases (cariogenic ASVs). Of these, only Prevotella histicola was among the top 10 most important features in both the 12- and 24-month classifiers. Two Streptococcus ASVs were among the top 10 most important features in the 12-month classifier, but neither was identified as S. mutans. Streptococcus ASV8 was likely Streptococcus salivarius. Streptococcus ASV14 was closely related to Streptococcus lactarius/peroris (Additional file 4; Figure S3 in additional file 2).
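The two quantities used to evaluate the classifiers can be illustrated with a short sketch. The first function computes the AUC as the fraction of case/control pairs in which the case receives the higher predicted score; the second computes the decrease in Gini impurity produced by one split, which is the quantity that random forest implementations typically average over trees to rank features. Both functions, their names, and the example scores are illustrative assumptions and are not the code used to produce Fig. 2.

```python
from typing import Sequence, Tuple


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def gini_impurity(n_cases: int, n_controls: int) -> float:
    """Gini impurity 2p(1-p) of a node containing cases and controls."""
    total = n_cases + n_controls
    if total == 0:
        return 0.0
    p = n_cases / total
    return 2 * p * (1 - p)


def gini_decrease(left: Tuple[int, int], right: Tuple[int, int]) -> float:
    """Impurity of the parent node minus the size-weighted impurity of its children.

    Each argument is a (cases, controls) tuple for one child node.
    """
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini_impurity(left[0] + right[0], left[1] + right[1])
    children = (n_left / n) * gini_impurity(*left) + (n_right / n) * gini_impurity(*right)
    return parent - children


# Hypothetical predicted probabilities for two cases (1) and two controls (0).
print(auc_roc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported values of 0.78 and 0.72 indicate that a randomly chosen future case outscored a randomly chosen control in roughly three of four pairs.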

Unsupervised clustering techniques identify similar groups of co-occurring taxa, which associate with ECC

Next, we attempted to identify ecologically meaningful groups of co-occurring taxa. To do so, we used two different unsupervised clustering techniques. One technique, Dirichlet multinomial community state typing, groups together samples with similar distributions of taxa into discrete clusters or community state types (CSTs). Thus, each sample is assigned to a single CST. The other technique, weighted co-occurrence network analysis, groups together taxa which co-occur across samples using graphs. ASVs are network nodes joined by edges weighted by the frequency at which two nodes co-occur across samples. Clusters of co-occurring ASVs, or network modules, are identified from the graph.
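To make the network representation concrete, the sketch below builds the weighted edge list of a very simple co-occurrence graph, in which each pair of ASVs is weighted by the fraction of samples where both are present. This presence/absence weighting is a simplifying assumption chosen for illustration; the network method used in this study works from normalized abundances and is described in the Methods. The function name and the example samples are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooccurrence_edges(samples: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Weight each ASV pair by the fraction of samples in which both are present."""
    pair_counts: Dict[Tuple[str, str], int] = {}
    for present in samples:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(present), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    return {pair: n / len(samples) for pair, n in pair_counts.items()}


# Hypothetical presence sets for four saliva samples.
samples = [
    {"Neisseria_ASV9", "H_parainfluenzae"},
    {"Neisseria_ASV9", "H_parainfluenzae", "Veillonella_ASV5"},
    {"Veillonella_ASV5"},
    {"Neisseria_ASV9"},
]
for pair, weight in sorted(cooccurrence_edges(samples).items()):
    print(pair, weight)
```

Modules would then be found by applying a community detection or hierarchical clustering step to this weighted graph, which is where the approach differs from community state typing: taxa, rather than samples, are the units being grouped.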

Using Dirichlet multinomial community state typing, we identified 6 community state types (CSTs) (Fig. 3, Figure S4-5, additional file 2). We named CSTs after the ASVs defining their separation. CSTs corresponded to child age and ECC status. At the 2-month visit, most children’s samples belonged to one of two Streptococcus-dominated CSTs. Similar proportions of case and control samples were assigned to these two CSTs. At the 12-month visit, most control samples belonged to a more diverse H. parainfluenzae - Neisseria ASV9 - Gemella ASV2 CST, while most case samples belonged to a Streptococcus ASV8 - Neisseria ASV12 CST (Fig. 3). By the 24-month visit, most control samples transitioned to a second Haemophilus parainfluenzae and Neisseria ASV9 CST, while most case samples transitioned to a Neisseria ASV12 - Veillonella ASV5 CST (Fig. 3, additional file 5–6). The odds of future ECC diagnosis were 8 (95% CI: 3, 22) times higher for children assigned to the Streptococcus ASV8 - Neisseria ASV12 CST as compared to children assigned to the H. parainfluenzae - Neisseria ASV9 - Gemella ASV2 CST at 12-months after controlling for maternal education, count of emerged primary teeth, mode of birth delivery, breastfeeding, antibiotic exposure within 3 months and visit of case diagnosis (P value = 0.002, Table 2). Similarly, the odds of future ECC diagnosis were 5 (95% CI: 2, 12) times higher for children assigned to the Neisseria ASV12 - Veillonella ASV5 CST as compared to those assigned to the H. parainfluenzae - Neisseria ASV9 CST at 24-months, after controlling for maternal education, count of emerged primary teeth, mode of birth delivery, breastfeeding, antibiotic exposure within 3 months and visit of case diagnosis (P value = 0.001, Table 2).

Table 2

Odds ratios and 95% confidence intervals (CIs) for community state types and future early childhood caries case status from logistic regressions stratified by visit (12- or 24-month visit) among 189 children in a nested case-control study selected from the Center for Oral Health Research in Appalachia 2 cohort study
Characteristic	12-month visit				24-month visit
Characteristic	N	OR¹	95% CI¹	p-value	N	OR¹	95% CI¹	p-value
Community state type²				< 0.001				< 0.001
Gemella ASV2 - H. parainfluenzae-Neisseria ASV9	81	—	—		6	NA²	NA²
H. parainfluenzae - Neisseria ASV9	9	NA²	NA²		67	—	—
Streptococcus ASV1 dominated w/ G. elegans	18	4.39	1.36, 15.7		0	NA²	NA²
Streptococcus ASV8 - Neisseria ASV12	42	7.73	3.00, 21.9		20	3.48	1.14, 11.7
Neisseria ASV12 - Veillonella ASV5	2	NA²	NA²		41	4.77	1.94, 12.5
Streptococcus ASV1 dominated w/ Gemella ASV2	4	NA²	NA²		0	NA²	NA²
Breastfeeding status				0.57				0.88
Currently breastfeeding	45	—	—		7	—	—
Not currently breastfeeding	111	1.30	0.52, 3.28		127	0.87	0.14, 5.02
Child received antibiotics within 3 mos prior to visit				0.74				0.75
No	116	—	—		93	—	—
Yes	40	1.16	0.47, 2.91		41	1.16	0.47, 2.87
Count of primary teeth emerged	156	1.01	0.88, 1.17	0.84	134	1.11	0.90, 1.38	0.33
Birth delivery mode				0.51				0.93
C-section	57	—	—		46	—	—
Vaginal	99	0.76	0.33, 1.72		88	0.96	0.40, 2.35
Maternal education at prenatal visit				0.003				0.030
Associates degree or higher	74	—	—		74	—	—
High school degree or less	82	3.32	1.49, 7.67		60	2.49	1.09, 5.81
Visit of case diagnosis/control matching³				0.48				0.86
36 mos	68	—	—		73	—	—
24 mos	33	0.45	0.15, 1.27
48 mos	37	0.96	0.36, 2.56		43	0.98	0.40, 2.35
60 mos	18	0.86	0.25, 2.89		18	0.71	0.21, 2.39
¹ OR = Odds Ratio, CI = Confidence Interval
² Because community state type assignment correlated with sample age, some community state types had very small cell counts at the 12- and 24-month visits. If a cell count for a community state type was < 10, we do not report the odds ratio or 95% CI since these estimates are likely unstable. We use the CST with the largest cell count as the reference category in each visit stratum.
³ To ensure the salivary bacteriome is prospectively predicting future early childhood caries diagnosis, cases and matched controls which were diagnosed at 12-months (N = 6) were not included in the regression model performed using 12-month salivary bacteriome characteristics as a predictor. Similarly, cases and matched controls diagnosed at 12- or 24-months (N = 37 at 24-months) were not included in the regression model performed using the 24-month salivary bacteriome characteristics as a predictor. Includes duplicate records for 1 child selected as a control at 36 months and a case at 60 months, and 1 child selected as a control for both 36- and 60-month risk sets. Since children were matched only on visit of diagnosis, controlling for visit of diagnosis controls for all matching variables.

Using weighted co-occurrence network analysis, we identified five network modules of co-occurring ASVs. Network modules were named after the top two most abundant ASVs in the network and the most highly connected or central ASV in the module (Fig. 4A&B; Figures S6-7 additional file 2; additional file 7). We created a single summary measure for each network module by summing the relative abundance of all taxa assigned to the module. A Haemophilus parainfluenzae and Neisseria ASV9 network module with Fusobacterium periodonticum as the most central taxon was more abundant in controls at 12- and 24-months. For every 1 percentage point increase in relative abundance of this network module at 12-months, the odds of ECC at a future visit were multiplied by 0.94 (95% CI: 0.91, 0.97), i.e., were about 6% lower, after controlling for maternal education, count of emerged primary teeth, breastfeeding, antibiotic exposure within 3 months and visit of case diagnosis (P value < 0.0001, Table 3). Conversely, a Veillonella ASV5 and Streptococcus ASV8 network module with a central taxon of Lachnoanaerobaculum orale was more abundant in cases (Fig. 4B). For every 1 percentage point increase in relative abundance of this network module at 12-months, the odds of ECC at a future visit were 1.04 (95% CI: 1.02, 1.07) times as high, after controlling for maternal education, count of emerged primary teeth, breastfeeding, antibiotic exposure within 3 months and visit of case diagnosis (P value = 0.001, Table 3). Three other network modules were not consistently associated with dental decay (Figure S6-7, additional file 2). S. mutans was a member of one of these networks, which had Streptococcus ASV1 and Neisseria ASV12 as the most abundant ASVs and Actinomyces ASV41 as the most central.
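Because a logistic regression is linear on the log-odds scale, a per-percentage-point odds ratio compounds multiplicatively over larger differences in module abundance. The sketch below shows the module summary measure and this rescaling; the function names and the example ASV abundances are hypothetical, and the calculation ignores uncertainty in the estimate.

```python
from typing import Dict, Set


def summed_module_abundance(relative_abundance: Dict[str, float], module: Set[str]) -> float:
    """Sum the relative abundances of all ASVs in a module, expressed in percent (0 to 100)."""
    return 100 * sum(relative_abundance.get(asv, 0.0) for asv in module)


def odds_ratio_for_increase(or_per_point: float, points: float) -> float:
    """Odds ratio implied by a logistic model for an increase of `points` percentage points."""
    return or_per_point ** points


# Hypothetical relative abundances (proportions) for one sample.
sample = {"H_parainfluenzae": 0.10, "Neisseria_ASV9": 0.05, "Streptococcus_ASV8": 0.20}
module = {"H_parainfluenzae", "Neisseria_ASV9", "F_periodonticum"}
print(round(summed_module_abundance(sample, module), 1))

# Point estimates from the text, rescaled to a 10 percentage point difference.
print(round(odds_ratio_for_increase(0.94, 10), 2))
print(round(odds_ratio_for_increase(1.04, 10), 2))
```

Under this reading, a 10 percentage point higher abundance of the H. parainfluenzae and Neisseria ASV9 module at 12-months corresponds to roughly half the odds of future ECC (about 0.54), while the same difference in the Veillonella ASV5 and Streptococcus ASV8 module corresponds to about 1.48 times the odds.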

Table 3

Odds ratios and 95% confidence intervals (CIs) for summed network module abundance and future early childhood caries case status from logistic regressions stratified by visit (12- or 24-month visit) among 189 children in a nested case-control study selected from the Center for Oral Health Research in Appalachia 2 cohort study
Characteristic	Veillonella ASV5 & Streptococcus ASV8 network 12-month visit				Veillonella ASV5 & Streptococcus ASV8 network 24-month visit				Haemophilus parainfluenzae & Neisseria ASV9 network 12-month visit				Haemophilus parainfluenzae & Neisseria ASV9 network 24-month visit
Characteristic	N	OR¹	95% CI¹	p-value	N	OR¹	95% CI¹	p-value	N	OR¹	95% CI¹	p-value	N	OR¹	95% CI¹	p-value
Summed network module relative abundance²	159	1.04	1.02, 1.07	< 0.001	134	1.05	1.01, 1.09	0.010	159	0.94	0.91, 0.97	< 0.001	134	0.96	0.91, 1.00	0.031
Breastfeeding status				0.85				0.43				0.74				0.69
Currently breastfeeding	45	—	—		7	—	—		45	—	—		7	—	—
Not currently breastfeeding	114	1.08	0.47, 2.50		127	0.52	0.09, 2.71		114	1.15	0.49, 2.69		127	0.71	0.12, 3.78
Child received antibiotics within 3 mos prior to visit				0.92				0.35				0.43				0.94
No	119	—	—		93	—	—		119	—	—		93	—	—
Yes	40	0.96	0.42, 2.21		41	0.67	0.29, 1.54		40	1.40	0.61, 3.28		41	1.03	0.45, 2.38
Count of primary teeth emerged	159	1.03	0.91, 1.17	0.65	134	1.17	0.95, 1.45	0.14	159	1.01	0.88, 1.15	0.91	134	1.11	0.91, 1.37	0.31
Birth delivery mode				0.98				> 0.99				0.73				> 0.99
C-section	59	—	—		46	—	—		59	—	—		46	—	—
Vaginal	100	1.01	0.48, 2.14		88	1.00	0.43, 2.32		100	0.87	0.40, 1.90		88	1.00	0.44, 2.29
Maternal education at prenatal visit				< 0.001				0.004				< 0.001				0.001
Associates degree or higher	75	—	—		74	—	—		75	—	—		74	—	—
High school degree or less	84	4.50	2.14, 9.89		60	3.09	1.44, 6.84		84	3.64	1.69, 8.09		60	3.46	1.62, 7.66
Visit of case diagnosis/control matching³				0.75				0.76				0.73				0.96
36 mos	69	—	—		73	—	—		69	—	—		73	—	—
24 mos	33	0.61	0.24, 1.56						33	0.60	0.22, 1.62
48 mos	38	0.71	0.28, 1.80		43	0.79	0.34, 1.81		38	0.98	0.39, 2.46		43	0.89	0.39, 2.05
60 mos	19	0.75	0.23, 2.34		18	0.69	0.21, 2.19		19	1.12	0.34, 3.74		18	0.89	0.27, 2.83
¹ OR = Odds Ratio, CI = Confidence Interval
² Summed network module abundance obtained by summing the relative abundance of all amplicon sequence variants belonging to a network module. Summed network module abundance scaled to range from 0 to 100 such that the odds ratio can be interpreted as the ratio in odds of future ECC diagnosis for a one percentage point increase in relative abundance of the network.
³ To ensure the salivary bacteriome is prospectively predicting future early childhood caries diagnosis, cases and matched controls which were diagnosed at 12-months (N = 6) were not included in the regression model performed using 12-month salivary bacteriome characteristics as a predictor. Similarly, cases and matched controls diagnosed at 12- or 24-months (N = 37 at 24-months) were not included in the regression model performed using the 24-month salivary bacteriome characteristics as a predictor. Includes duplicate records for 1 child selected as a control at 36 months and a case at 60 months, and 1 child selected as a control for both 36- and 60-month risk sets. Since children were matched only on visit of diagnosis, controlling for visit of diagnosis controls for all matching variables.

Although these two unsupervised methods cluster differently (one clustering samples and the other clustering taxa), they identified similar clinically relevant patterns in bacterial compositional data. Both identified a pattern of H. parainfluenzae and Neisseria co-occurrence elevated in controls, and a pattern of Streptococcus and Veillonella elevated in cases. Nine of the ten ASVs used to name the networks (top two most abundant ASVs in each of five network modules) were also in the top ten most important ASVs for defining the separation of CSTs (Additional file 2 Figure S8).

The ECC-associated communities identified through unsupervised clustering were robust to varying hyperparameters. In the CST analysis, we varied the number of k CSTs (k = 4 vs 5 vs 6, additional file 8). In the network analysis, we varied the normalization transform function (Hellinger vs center-log, Figure S9-10).
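The two normalization transforms compared in the network sensitivity analysis can be sketched as follows. We assume here that "center-log" refers to the centered log-ratio (CLR) transform; the pseudocount of 1 added before taking logarithms is also an assumption made for illustration, since zero counts cannot be log-transformed directly.

```python
import math
from typing import List, Sequence


def hellinger_transform(counts: Sequence[float]) -> List[float]:
    """Square root of each taxon's relative abundance within a sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [math.sqrt(c / total) for c in counts]


def clr_transform(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio: log counts minus the mean log count of the sample."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical ASV counts for one sample.
counts = [90, 9, 1, 0]
print([round(v, 3) for v in hellinger_transform(counts)])
print([round(v, 3) for v in clr_transform(counts)])
```

The Hellinger transform dampens the influence of dominant taxa while keeping values between 0 and 1, whereas the CLR transform expresses each taxon relative to the geometric mean of the sample, so agreement between the two suggests the modules are not an artifact of one scaling choice.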

Communities identified through unsupervised clustering are reproducible in an external cohort

To examine the reproducibility of these bacterial community networks, we performed the same analytic pipeline (see Methods) on publicly available 16S rRNA gene sequencing data from longitudinal saliva samples of similarly aged children with a 10% prevalence of early childhood caries (Holgerson et al.; PRJEB35824; (12)). We were unable to obtain access to metadata for these samples.

A Haemophilus parainfluenzae and Neisseria perflava network module with central taxon Fusobacterium periodonticum was also identified in the Holgerson et al. sample (Fig. 4C additional file 2 Figures S11-12). The Neisseria ASV9 amplicon from our cohort was closely related to the Neisseria perflava amplicon from the Holgerson et al. cohort (additional file 2 Figure S13A).

A similar Veillonella dispar/Streptococcus/Prevotella network module was also identified in the Holgerson et al. sample (Fig. 4C). The Veillonella ASV5 amplicon from our cohort was closely related to the Veillonella dispar amplicon from the Holgerson et al. cohort (additional file 2 Figure S13B).

Early-life bacterial communities are associated with concurrent salivary pH, future S. mutans prevalence, and primary teeth count

We tested whether bacterial communities from our unsupervised clustering were associated with etiologically relevant variables in our cohort. Although salivary pH did not differ between cases and controls at the 12- and 24-month visits (Fig. 5A), abundance of the H. parainfluenzae/Neisseria ASV9 network module was positively correlated with salivary pH (12-month rho = 0.31, P value = 0.002; 24-month rho = 0.28, P value < 0.001; Fig. 5B). Mean salivary pH was also higher in samples in CSTs characterized by H. parainfluenzae and Neisseria ASV9 (12-month mean: 6.78; 24-month mean: 6.71) than in those characterized by Streptococcus ASV8, Neisseria ASV12 and Veillonella ASV5 (12-month: 6.54, Wilcoxon P value = 0.05; 24-month: 6.63, Wilcoxon P value = 0.03; Fig. 5D).
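As a reminder of what the reported correlation coefficient measures, the sketch below computes a Pearson correlation between two paired lists, such as summed module abundance and salivary pH. The function name and the example values are hypothetical and do not come from the study data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance of x and y scaled by both standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical module abundances (percent) and salivary pH for five samples.
abundance = [5.0, 12.0, 20.0, 28.0, 35.0]
ph = [6.4, 6.7, 6.5, 6.9, 6.8]
print(round(pearson_r(abundance, ph), 2))
```

A coefficient near 0.3, as reported here, indicates a modest positive linear association, so module abundance explains only a small share of the variation in salivary pH.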

Children assigned to CSTs characterized by Streptococcus ASV8, Neisseria ASV12, and Veillonella ASV5 at the 12- and 24-month visits were more likely to have S. mutans detected at their next visit than children with communities characterized by Haemophilus parainfluenzae and Neisseria ASV9 (percent with S. mutans at 24-months: 34% vs 9%, Fisher’s exact P value = 0.03; at 36-months: 46% vs 21%, P value < 0.01). Children who acquired S. mutans by their next visit had lower abundances of the Haemophilus parainfluenzae-Neisseria ASV9 network and higher abundances of the Veillonella ASV5-Streptococcus ASV8 network than children who did not go on to have S. mutans (Fig. 5C).
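The comparison of S. mutans acquisition between CST groups relies on Fisher's exact test for a 2 × 2 table. A minimal sketch of the two-sided test is given below, using the common definition in which the P value sums the probabilities of all tables that are no more likely than the observed one, given fixed margins. The function name and the example counts are hypothetical; the underlying cell counts for the percentages above are not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalized hypergeometric weights for every table with the same margins.
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lowest, highest + 1)}
    observed = weights[a]
    # Integer comparison avoids floating-point ties between equally likely tables.
    as_extreme = sum(w for w in weights.values() if w <= observed)
    return as_extreme / comb(n, col1)


# Hypothetical counts: rows are two CST groups, columns are S. mutans detected / not detected.
print(round(fisher_exact_two_sided(12, 23, 4, 41), 4))
```

The exact test is preferred over a chi-squared approximation when some cells are small, as is likely for S. mutans detection at early visits.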

The average number of primary teeth present was higher in children assigned to CSTs from later ages. The relative abundance of the Streptococcus ASV1-Neisseria ASV12 network, which included both S. mutans and Streptococcus sanguinis, correlated with the number of primary teeth present at the 12- and 24-month visits. This was not true for the protective H. parainfluenzae-Neisseria ASV9 network (Additional file 2 Figure S14). For children from Pennsylvania, the approximate age at first tooth emergence was available but was not associated with either CSTs or network modules.

Whole-genome shotgun metagenomics of 15 incident case samples and matched controls revealed significant differences in taxa and KEGG ortholog abundances between incident case- and control-samples

We tested for differences in the community composition and functional potential of cases and controls using saliva and plaque samples from the visit of case ECC diagnosis for 15 cases and 15 matched controls. Among others, Scardovia wiggsiae, Prevotella histicola, Veillonella dispar, Streptococcus mutans and Streptococcus salivarius were more abundant in case than matched control saliva and plaque samples at the time of diagnosis (Fig. 6; Additional file 9). Prevotella salivae was more abundant in case than matched control saliva but not plaque samples (Benjamini-Hochberg P_BH value < 0.05). The fungal genus Candida was only present in case plaque samples.

Cases and controls differed in the abundance of gene orthologs (Fig. 7). Associations with case status were stronger in plaque than saliva. Gene orthologs related to antibiotic production and resistance were more abundant in case plaque, including a major facilitator superfamily multidrug resistance transporter (P_BH value = 1.9*10^−28) and lantibiotic transport system permease protein (P_BH value = 6.0*10^−6) (Additional file 11). The oxidative phosphorylation KEGG pathway was enriched in case plaque (P_BH value = 1.2*10^−8), while the ABC transporter pathway was depleted (P_BH value = 3.6*10^−3, Additional file 11). All the case-associated gene orthologs annotating to oxidative phosphorylation were found only in Candida (Additional file 12 and Additional file 2 Figure S15).

Results of our analysis of 99 ECC cases and 90 incidence density matched children support the ecologic hypothesis for ECC. We showed that bacteriome-wide information classified future ECC status before reliable detection of salivary S. mutans. We expanded on previous work by identifying replicable groups of co-occurring bacteria, which may represent true ecological interactions. We showed that these groups associate with concurrent salivary pH, future S. mutans acquisition and future ECC diagnosis, suggesting an ecological succession to cariogenesis. By incorporating shotgun metagenomic sequencing data, we identified functional mechanisms for ecological interactions between bacteria, including pathways related to antibiotic production and resistance. Together, these observations suggest early-life bacterial interactions during a susceptible life period can predispose individuals to ECC.

Our findings on salivary bacteriome assembly and association with ECC fit within the previous literature. We observed a well-documented succession from Streptococcus-dominated, low-diversity communities to more diverse communities by 24-months of age with stabilization thereafter (11, 12, 24, 25). As in cross-sectional dental research, we found an association between S. mutans and ECC at the time of ECC diagnosis (8). Like previous prospective studies of ECC, we found evidence for an association between early life salivary bacteriome composition and future ECC (14, 15). We were able to distinguish ECC cases from controls more accurately and at an earlier age than reported by Dashper et al., while the AUC-ROC for our 12-month random forest (0.78) is close to that of Grier et al. (0.71) (14, 15). While S. mutans, S. sobrinus, and Scardovia wiggsiae were elevated in cases at diagnosis, we found that the salivary bacteriome could prospectively predict ECC as early as 12-months, before reliable detection of these risk taxa. This supports a time-dependent interpretation of the ecological hypothesis, in which dysbiosis in the oral microbial community precedes salivary S. mutans detection, a marker of late-stage cariogenesis. Our findings highlight the first two years of life as a susceptible period for assembly of a cariogenic oral microbial community.

Unlike most previous work, we identified specific and reproducible ECC-associated bacterial communities using unsupervised clustering techniques. These unsupervised techniques better encapsulate the ecological hypothesis than diversity metrics, which may be too coarse to summarize finer level differences in communities (22). In our cohort, alpha diversity was weakly and inconsistently associated with future ECC status, echoing previous mixed findings (17–21). In contrast, groups of taxa from unsupervised clustering techniques were strongly and prospectively associated with ECC: a Haemophilus parainfluenzae, Neisseria, and Fusobacterium periodonticum community was depleted in cases while a Prevotella, Streptococcus and Veillonella community was more abundant. These bacterial communities were consistent across clustering methods, reproducible in an external cohort (12), and in line with previous work. A 2014 study of longitudinal tongue samples found strong correlations between abundances of Haemophilus parainfluenzae and Neisseria subflava (26). A cross-sectional 2017 analysis of young adult saliva found similar Haemophilus parainfluenzae/Neisseria subflava and Veillonella/Prevotella clusters (27). A 2021 co-occurrence analysis of cross-sectional adult saliva samples found a similar network module of Prevotella salivae, Veillonella atypica and Streptococcus salivarius (28). Our unsupervised clusters were also separated by genetically distinct sequence variants of Neisseria, Veillonella and Fusobacterium. Intragenera competition and strain-specific impacts on biofilm formation have been observed for Neisseria (29, 30), Veillonella (31, 32) and Fusobacterium species in previous work (33–35). Groups of ECC-associated taxa identified from our unsupervised clustering may reflect ecological roles and interactions, including intergenera cooperation and intragenera competition.

We also tested how these bacterial communities were associated with etiologically relevant variables. The Streptococcus ASV1-Neisseria ASV12 network was correlated with the number of primary teeth present. Members of this network, including S. mutans and Streptococcus sanguinis, are known to have a preference for hard oral surfaces, increasing in abundance after tooth emergence (11). The protective Haemophilus parainfluenzae, Neisseria, and Fusobacterium periodonticum network was correlated with salivary pH and inversely associated with future S. mutans detection. In a recent in vitro study, Neisseria was positively correlated with salivary pH (36). The associations of our identified bacterial communities with etiologically relevant variables provide further evidence that the communities are biologically meaningful. Moreover, these associations link the early-life oral environment to an ecological succession towards cariogenesis.

Importantly, we identified differences in the functional potential of microbial communities at the time of ECC case diagnosis using shotgun metagenomic sequencing. ECC cases and controls differed in the abundance of gene orthologs annotating to KEGG pathways related to antibiotic production and resistance. This finding is supported by previous analyses of shotgun metagenomic sequencing and dental caries (37, 38). In our study, case-enriched gene orthologs included those involving competition between related species, such as bacteriocin exporters (39) and lantibiotic production (40). These functions may represent mechanisms for the co-occurrences and potential interactions we observed between oral taxa.

The tissue of measurement for assessment of the oral microbiome is an important consideration. We performed 16S rRNA gene sequencing on longitudinal saliva samples, and shotgun metagenomic sequencing on a subsample of cross-sectional plaque and saliva samples. Plaque, not saliva, is the most proximate tissue in cariogenesis. However, plaque is difficult to collect from edentulous children, has a low biomass, and is unlikely to be used as a prognostic marker in a clinical setting. Thus, the predictive power of the early-life salivary microbiome is of practical, clinical interest. Saliva is also a composite tissue and washes over many oral surfaces with different microbial communities (7, 41–43). Therefore, differences in saliva bacteriome composition between cases and controls could reflect differences in the bacterial abundance of oral surfaces rather than changes in only the salivary bacteriome composition. Consequently, the co-occurrence patterns we identified may reflect niche-sharing of oral surfaces rather than cooperation between taxa. Although ECC-associated communities did not associate with proxies for soft-to-hard tissue ratio (tooth number; age at first tooth emergence) we cannot conclusively rule out this explanation. While having both saliva and plaque samples in the shotgun metagenomic sequencing subsample is a strength, our analysis is limited by not including longitudinal plaque samples.

Sequencing methods influence the inferences that are possible from the data. The V4 region of the 16S rRNA gene is limited in its ability to resolve fine-level taxonomic differences. This could affect the identification and measurement of Streptococcus amplicons in our dataset. We validated the identity of Streptococcus amplicons using BLAST and shotgun metagenomic data, but nondifferential exposure misclassification of S. mutans prevalence is possible. The 16S rRNA gene also does not measure viruses, eukaryotes, or interspecies functional variation. Without longitudinal shotgun metagenomic data, we cannot comment on virome or mycobiome assembly, or longitudinal changes in functional potential. Both 16S rRNA gene and shotgun metagenomic sequencing data are inherently compositional. We instituted transformations to address compositionality but did not have absolute abundance data. While our use of both 16S rRNA gene and shotgun metagenomic sequencing is a strength, our measurement of the oral microbiome is limited by these characteristics of sequencing methods.

Our study design is observational, so causality cannot be conclusively proved. However, our exposure measurements precede our outcome, fulfilling a key causal requirement. Our study population was primarily children of European descent from northern and north central Appalachia. Although some of the unsupervised clusters from our cohort were replicable in the Swedish Holgerson et al. cohort, microbial communities can differ by geography, race, and ethnicity. Thus, the generalizability of our findings may be limited. Further studies in additional populations, incorporating shotgun metagenomic sequencing, quantification of absolute bacterial load, and site-specific measures of oral bacterial communities are warranted.

We found that the early-life salivary microbiome was associated with risk of ECC before S. mutans could be detected, supporting a time-dependent interpretation of the ecological hypothesis. Our analysis is strengthened by a longitudinal design, balanced case-control ratios, incorporation of both amplicon and shotgun metagenomic sequencing, and replication analyses. Our observations on the suitability of diversity measures vs other clustering techniques to detect fine scale differences are applicable in other microbial contexts. Our findings on ecological succession and bacterial interactions in early life may also be generalizable to other systems of microbiome development. Overall, our analyses support a developmental interpretation of the ecological hypothesis, and raise the possibility that intergenera cooperation, intragenera competition, and ecological successions in early life can predispose children to ECC.

Study cohort

We used data from the Center for Oral Health Research in Appalachia 2 study (COHRA2) (44). COHRA2 recruited White, pregnant women between 2011 and 2015 from Pennsylvania and West Virginia. Healthy women who were in the 12th to 29th week of pregnancy, of European descent, over 18 years old, fluent in English, and with a singleton pregnancy were eligible for inclusion. Women and their babies were followed longitudinally through the early years of the baby’s life. Women were excluded if they had tuberculosis, were immunocompromised, thought they might soon leave the general regions of West Virginia or southwestern Pennsylvania, or did not have a reliable telephone contact. Mother-child pairs were also excluded from the study if the child was delivered before the 35th week of pregnancy or if the mother or child developed a serious medical condition.

Participants completed in-person visits when the child was 2-months and 12-months old, then yearly thereafter. Mother-child pairs from the Pennsylvania site had additional in-person visits at birth and when the child’s first primary tooth erupted. At in-person visits mothers and children underwent a comprehensive dental assessment by trained and calibrated dental professionals (training and calibration described in detail in Neiswanger et al. (44)); participants were asked not to eat or drink for 2 hours prior to the examination. The examination included caries assessment via the PhenX Toolkit Dental Caries Experience Prevalence Protocol (http://www.phenxtoolkit.org/, protocol number 080300), which allows for the decayed, missing, and filled tooth count to be calculated either including or excluding white spots. The dental examination also included collection of microbial samples from saliva, plaque and gingival swabs using OMNIgene Discover kits (OM-501 or 505, DNA Genotek); only saliva and plaque samples were used in this analysis. Saliva was collected via swabs for children too young to spit into a collection tube and via spitting otherwise. Pooled plaque samples were taken with a Stimudent or curette from three intact tooth surfaces (in UNS/FDI notation: 8-buccal/51-buccal, 24-buccal/71-buccal, 31-occlusal/84-occlusal, or nearby surfaces if these were not intact). Plaque was also taken from tooth surfaces with untreated dental lesions. Salivary pH was also measured at visits where the child was old enough to spit (most by 12-months, all by 24-months).

A 30–45-minute telephone interview was administered to the mothers at approximately 6-month intervals to capture sociodemographic and behavioral data.

Sampling & case definition

For this analysis we selected 99 children who had any dental lesions, including white spots (d1mft), at or prior to the 60-month visit as early childhood caries (ECC) cases. The visit in which a child was first identified as having a dental lesion was the incident-visit for that child. We then selected a similar number of children who were free of dental lesions at the same visit as the cases to serve as incidence-density sampled controls (n = 90). Incidence density sampling does not preclude the reselection of a control as a case at later time points; controls can also be selected as controls for multiple cases (Fig. 1) (45). In this analysis, one control was later selected as a case and one control was selected as a control twice (n = 92 control records). Duplicate records of the case/control and control/control children were not used in the supervised random forest: the case/control was only included as a case and the control/control was only included as a control once. In both unsupervised clustering techniques, we did not include duplicate records from these individuals when performing initial clustering but did include them when graphing and testing associations between identified clusters and variables of interest (i.e., in Table 1, Figs. 1–3). The number of total unique individuals in the analysis was 189, with 191 unique person-records (Additional file 3).
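
The risk-set logic behind incidence density sampling can be illustrated with a minimal Python sketch. The data layout (a mapping from child ID to the visit month of first lesion, or None if lesion-free through follow-up), the function name, the toy cohort, and the one-control-per-case ratio are assumptions made for illustration and are not the authors' selection code; visit attendance and sample availability are ignored here.

```python
import random

def incidence_density_sample(first_lesion_visit, seed=0):
    """Pick one control per case from children still lesion-free at the case's incident visit.

    first_lesion_visit: dict child_id -> visit month of first lesion, or None if never observed.
    Returns a list of (case_id, control_id, incident_visit) records.
    """
    rng = random.Random(seed)
    records = []
    cases = sorted((v, c) for c, v in first_lesion_visit.items() if v is not None)
    for visit, case_id in cases:
        # Risk set: children with no lesion at or before this visit.
        # A future case is eligible, and a child may be drawn more than once.
        risk_set = sorted(
            c for c, v in first_lesion_visit.items()
            if c != case_id and (v is None or v > visit)
        )
        if risk_set:
            records.append((case_id, rng.choice(risk_set), visit))
    return records

# Hypothetical toy cohort: three eventual cases and two children who stay lesion-free.
cohort = {"A": 24, "B": None, "C": 36, "D": None, "E": 12}
pairs = incidence_density_sample(cohort)
```

Because the risk set is rebuilt at every incident visit, a child can serve as a control before becoming a case, or serve as a control more than once, which is how the number of control records can exceed the number of unique control children.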

All available saliva samples from cases and controls, up to and including the incident-visit saliva sample, were pulled for 16S rRNA amplicon sequencing (Fig. 1). Note that selected individuals occasionally missed visits, did not have a saliva sample available, or had a saliva sample which failed 16S amplicon quality control (Additional file 3). Additionally, we randomly selected a subcohort of 15 cases presenting with enamel lesions at or after the 36-month visit and 15 corresponding controls. Plaque and saliva samples from the visit of case diagnosis for these 30 individuals were submitted for shotgun metagenomic sequencing.

Laboratory and bioinformatics pipeline for 16S rRNA amplicon metagenomic sequencing

Bacterial DNA was extracted from aliquots of saliva. Library preparation and sequencing of the 16S rRNA V4 amplicon were performed by the Michigan Microbial Systems Molecular Biology Laboratory using previously validated protocols (46). DNA extraction was performed using the Eppendorf EpMotion liquid handling system following the Qiagen MagAttract PowerMicrobiome kit protocol. The V4 variable region was amplified from extracted DNA using barcoded dual-index primers and sequenced on the Illumina MiSeq platform using the MiSeq Reagent Kit V2 500 cycles. Each plate of samples was submitted with a positive mock community control, a DNA extraction kit control, and a negative water control (Additional file 2 Figures S3-4). Reads were processed to amplicon sequence variants (ASVs) using DADA2 (version 1.14.1) (47) and the Human Oral Microbiome Database (HOMD) version 15.2 (48). To identify contaminants, we used the R package decontam (version 1.8.0) (49). We filtered out samples with fewer than 1000 reads (n = 6 samples lost). Diversity metrics were calculated using the estimate_richness function from the R package phyloseq on all ASVs. However, to limit the number of features used in supervised and unsupervised learning, we instituted a prevalence-abundance ASV filter. ASVs which were present in less than 5% of all samples and which represented less than 5% of all sequences in the samples in which they were present were excluded from the analytic subset for supervised random forest and unsupervised clustering techniques (m = 273 ASVs in analytic subset). ASVs were not collapsed at the genus or species level.
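
One reading of the prevalence-abundance filter is sketched below in Python. The function name, the list-of-lists count layout, and the toy counts are assumptions for illustration; the sketch takes "5% of all sequences in the samples in which they were present" to mean the ASV's pooled share of reads across those samples, which is an interpretation rather than a confirmed detail of the pipeline.

```python
def prevalence_abundance_filter(counts, min_prevalence=0.05, min_abundance=0.05):
    """Return the column indices of ASVs that are kept.

    counts: list of samples, each a list of read counts per ASV (same ASV order).
    An ASV is dropped only if it fails BOTH thresholds: it is present in fewer
    than min_prevalence of samples AND makes up less than min_abundance of the
    reads in the samples where it is present.
    """
    n_samples = len(counts)
    n_asvs = len(counts[0])
    totals = [sum(sample) for sample in counts]
    kept = []
    for j in range(n_asvs):
        present = [i for i in range(n_samples) if counts[i][j] > 0]
        if not present:
            continue  # never observed, nothing to keep
        prevalence = len(present) / n_samples
        asv_reads = sum(counts[i][j] for i in present)
        all_reads = sum(totals[i] for i in present)
        abundance = asv_reads / all_reads
        if prevalence >= min_prevalence or abundance >= min_abundance:
            kept.append(j)
    return kept

# Hypothetical toy table: 40 samples, 3 ASVs.
counts = [[1000, 0, 0] for _ in range(40)]
counts[0] = [1000, 1, 0]    # ASV index 1: rare and low abundance
counts[1] = [500, 0, 500]   # ASV index 2: rare but abundant where present
kept = prevalence_abundance_filter(counts)
```

In this toy table the second ASV fails both thresholds and is removed, while the third is retained because it dominates the one sample in which it appears.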

Random forest

We used the 12- and 24-month visits as inputs for the random forest as these visits preserved a large subset of pre-incident samples. Only non-incident cases and matched controls were used in the random forest: individuals with incident-visit saliva samples were excluded (6 individuals with available samples who were identified as cases or controls at the 12-month visit and 37 individuals identified at the 24-month visit were excluded, for total sample sizes of n = 158 and n = 133, respectively). Hellinger transformed ASV counts from the 273 ASVs in our analysis subset were used in the random forest. Using the train function in the R package caret (50), we ran 5 repeats of 10-fold cross validated random forest machine learning algorithms with 500 trees. We allowed the mtry parameter (number of variables randomly sampled as candidates at each tree split) to be tuned from a choice of 2, 136, or 271 using the receiver operating characteristic curve; for both the 12-month and 24-month all-taxa random forests an mtry parameter of 2 was selected. Area under the receiver operating characteristic curve and other evaluation statistics were calculated using the R package MLeval (51).
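
The Hellinger transformation applied to the ASV counts is the square root of each sample's relative abundances. A minimal Python sketch follows; the function name and the example counts are illustrative assumptions, not values from the study.

```python
import math

def hellinger_transform(counts):
    """Square root of relative abundance for one sample's ASV counts."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.sqrt(c / total) for c in counts]

# Hypothetical read counts for four ASVs in one sample.
sample = [50, 30, 20, 0]
transformed = hellinger_transform(sample)
```

The squared transformed values of a non-empty sample sum to one, so the transformation down-weights dominant taxa without discarding zeros.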

Dirichlet multinomial community state typing

We used the R package DirichletMultinomial to cluster samples into community state types (CSTs) using Dirichlet multinomial mixture models (52). We fit ten Dirichlet multinomial models, using as input the count matrix of the 855 samples by 273 ASVs in the analytic subset and varying the number of Dirichlet components (i.e., CSTs) from 1 to 10. We calculated the Laplace measure of fit for each model and plotted it against k, identifying k = 6 as the best model. We varied k={4, 5} as a sensitivity analysis. Samples were assigned to the single CST for which they had the highest posterior probability of membership; if no CST had a posterior probability > 80% for a sample, the sample was not assigned to any CST.
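
The assignment rule in the last sentence can be written as a short Python sketch. The function name and the example posterior vectors are hypothetical; in practice the posteriors would come from the fitted mixture model.

```python
def assign_cst(posteriors, threshold=0.8):
    """Assign a sample to its most probable CST, or to none.

    posteriors: posterior membership probabilities, one per CST.
    Returns the 1-based CST index, or None when the best posterior
    does not exceed the threshold.
    """
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return best + 1 if posteriors[best] > threshold else None

# Hypothetical posteriors for a six-component model.
confident = assign_cst([0.02, 0.91, 0.03, 0.02, 0.01, 0.01])
ambiguous = assign_cst([0.45, 0.40, 0.05, 0.05, 0.03, 0.02])
```

The first sample is assigned to the second CST, while the second sample is left unassigned because no component reaches the 80% cutoff.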

Weighted co-occurrence networks

We used the R package WGCNA to build a signed weighted network of ASVs using the Hellinger-transformed count matrix of 855 samples and 273 ASVs (53). As a sensitivity analysis, we used the centered log-ratio transformed count matrix instead. The soft thresholding power of the signed network was selected to maximize the R^2 of the model fit while preserving the mean connectivity of the network using the pickSoftThreshold function in WGCNA. We used a dynamic tree cut and the cutreeDynamic function in WGCNA to identify network modules or clusters using a minimum module size of 5 and a deep split value of 4, with the aim of producing more fine-grained clusters. Intramodular connectivity statistics were calculated for each ASV using the intramodularConnectivity function. Finally, per-sample module relative abundances were calculated by summing the relative abundances of all ASVs belonging to the same module.
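
Two steps of this procedure lend themselves to a small Python sketch: the signed-network edge weight, which in the standard signed formulation is ((1 + correlation) / 2) raised to the soft-thresholding power, and the per-sample module abundance. The function names, the power of 6, the ASV labels, and the module labels are assumptions for illustration; the selected power is not stated in the text.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(x, y, power):
    """Signed-network edge weight: ((1 + cor) / 2) ** power."""
    return ((1 + pearson(x, y)) / 2) ** power

def module_abundance(rel_abund, modules):
    """Sum one sample's ASV relative abundances within each module."""
    totals = {}
    for asv, value in rel_abund.items():
        module = modules[asv]
        totals[module] = totals.get(module, 0.0) + value
    return totals

# Hypothetical abundance profiles across four samples.
co_occurring = signed_adjacency([1, 2, 3, 4], [2, 4, 6, 8], power=6)
excluding = signed_adjacency([1, 2, 3, 4], [8, 6, 4, 2], power=6)

# Hypothetical module membership and one sample's relative abundances.
membership = {"ASV_a": "module_1", "ASV_b": "module_1", "ASV_c": "module_2"}
per_module = module_abundance({"ASV_a": 0.10, "ASV_b": 0.05, "ASV_c": 0.20}, membership)
```

In a signed network, strongly negatively correlated ASVs receive weights near zero rather than being treated as connected, so modules group taxa that rise and fall together.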

Replication cohort

We applied the same bioinformatics and analytic pipeline used for the COHRA2 samples to publicly available V3-V4 16S rRNA gene data from the Holgerson cohort (PRJEB35824) (12). This cohort was also composed of sequential salivary samples from similarly aged children; the prevalence of ECC was 10% by 60 months of age. The laboratory methods for these samples are described in Holgerson et al. (12). All the bioinformatics parameters and steps were the same as described above, with the exception that decontam was not used to identify potential contaminants as the publicly available data did not include DNA quantitation data. Since we could not obtain access to any metadata characteristics of these samples, including ECC status, the random forest models could not be run. For visualization purposes, the two matching networks shown in Main Fig. 4C were filtered to only edges with a weight > 0.03. The full, unfiltered network images are shown in Additional File 2. To compare the relatedness of the amplicon sequence variants assigned to various network modules across the COHRA2 and Holgerson cohorts, we performed multiple sequence alignment of the amplicons using the R package msa, using the ClustalW algorithm (54). We computed pairwise distances from the DNA sequences using the R function dist.ml from the R package phangorn (55), using the JC69 model. We created a neighbor joining tree using the phangorn function NJ, then fit a generalized time-reversible with gamma rate maximum likelihood tree using the neighbor joining tree as a starting point. We obtained 100 bootstrap values for the tree using bootstrap.pml, plotted the tree using ggtree (56), and collapsed branches present in < 50 of the bootstrapped trees.
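
For a pair of aligned sequences, the JC69 model has a well-known closed-form distance, d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The Python sketch below illustrates that correction only; the function name, the gap handling, and the example sequences are assumptions, and the package's own implementation may treat gaps and ambiguity codes differently.

```python
import math

def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance for two aligned, equal-length DNA sequences.

    Columns containing a gap character in either sequence are skipped.
    Returns infinity when the observed difference reaches saturation.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical aligned fragments differing at one of ten sites.
d = jc69_distance("ACGTACGTAC", "ACGTACGTAT")
```

The corrected distance is slightly larger than the raw proportion of differences because the model accounts for repeated substitutions at the same site.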

Statistical analyses

We used logistic regression to test for associations between summary metrics from unsupervised clustering and ECC separately at the 12- and 24-month visits while controlling for potential confounders identified from literature review and a directed acyclic graph. For each time point only pre-incident cases and controls were included in the regression, i.e., cases diagnosed at 12-months of age and age-matched controls were not included in the 12-month regression models. This ensures that the regression models test for prospective associations between current salivary bacteriome and future ECC diagnosis. To test for associations between CST and future ECC diagnosis, ECC status was used as the outcome and CST assignment was included as a categorical predictor. To test for associations between network modules and future ECC diagnosis, ECC status was used as the outcome and relative abundance of the network module was included as a continuous predictor ranging from 0 to 100. This allows the exponentiated coefficient to be interpreted as the odds ratio for a 1 percentage point increase in network module abundance. In adjusted models we included the following covariates: binary indicator for child being currently breastfed at the visit, binary indicator for maternal report of child antibiotic use within 3-months of visit, count of emerged primary teeth, binary indicator for birth delivery mode, binary indicator for maternal education greater than high school, and categorical variable for visit of case diagnosis/control matching.
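
The interpretation of the exponentiated coefficient can be made concrete with a short Python sketch. The coefficient value used here is purely hypothetical and is not an estimate from this study; the function name is likewise an assumption.

```python
import math

def odds_ratio(beta, delta=1.0):
    """Odds ratio for a `delta`-unit increase in a predictor with log-odds coefficient beta."""
    return math.exp(beta * delta)

# Hypothetical log-odds coefficient per 1 percentage point of module abundance.
beta = -0.05
or_per_point = odds_ratio(beta)          # 1 percentage point increase
or_per_ten_points = odds_ratio(beta, 10) # 10 percentage point increase
```

Because the predictor is scaled from 0 to 100, odds ratios for larger differences in module abundance follow by multiplying the coefficient before exponentiating, not by multiplying the odds ratio itself.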

Laboratory and bioinformatics pipeline for shotgun metagenomic sequencing

DNA was extracted from plaque and saliva samples using the ZymoBIOMICS miniprep kit according to the manufacturer’s instructions. Isolated DNA was quantified by Qubit. DNA libraries were prepared using the Illumina Nextera XT library preparation kit according to the manufacturer’s protocol. Library quantity and quality were assessed with Qubit (ThermoFisher) and Tapestation (Agilent Technologies, CA, USA). Libraries were then sequenced on the Illumina HiSeq platform (2x150 bp). Quality filtering and adapter trimming were performed using Trimmomatic and the Nextera PE adapters. Host DNA was removed using Bowtie2 and the GRCh38 index. Trimmed, cleaned and decontaminated reads were processed through both the HUMAnN3 short-read profiling pipeline (57) and the SqueezeMeta assembly-based pipeline (version 1.4.0) (58). Plaque and saliva samples were run separately through the assembly pipeline. Briefly, assembly was done using Megahit, ORFs were predicted using Prodigal, and similarity searches against GenBank, eggNOG and KEGG were conducted using Diamond. Read mapping against contigs was performed using Bowtie2. Binning was done using MaxBin2 and Metabat2 and bins were combined using DAS Tool. To test for differential abundance of KEGG orthologs and taxa abundance estimated from contigs, we used DESeq2, first filtering out KEGG orthologs or taxa with fewer than 500 reads from the testing subset. We tested for enrichment in KEGG pathways using gene set enrichment analysis and the R package fgsea separately on plaque and saliva samples. We used the package SQMTools to extract functional and taxonomic subsets of interest, such as the KEGG orthologs which annotated to oxidative phosphorylation. To test correlations between 16S rRNA gene amplicon sequence variants and abundances of taxa from whole genome sequencing, we used a partial Spearman correlation while controlling for incident visit and case status.
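
The differential abundance results are reported as Benjamini-Hochberg adjusted P values (P_BH). A minimal Python sketch of that adjustment is given below; the function name and the raw P values are hypothetical, and the sketch is not the code path used by the differential abundance software.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values for four features.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.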

ECC: early childhood caries

ASV: amplicon sequence variant

16S rRNA: 16S ribosomal RNA

CST: community state type

Ethics approval and consent to participate

The study has IRB approval from the University of Pittsburgh and West Virginia University. All potential participants have the study explained to them in detail and are sent copies of the consent forms before their initial appointments. At the first visit, the study is explained again, questions are answered, and the women sign consent forms prior to any research assessments.

Consent for publication

Not applicable

Availability of data and materials

The 16S rRNA gene amplicon sequencing data and shotgun metagenomic sequencing data from the COHRA2 study are publicly available at the PRJNA752888 repository. The 16S rRNA gene amplicon sequencing data from the Holgerson et al. replication cohort are publicly available at the PRJEB35824 repository. Phenotype data for the COHRA2 study are available at dbGaP phs001591.v1.p1 upon application. All of the code to reproduce the analyses in this paper is available at https://github.com/blostein/ECCPaper1.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was funded by the National Institutes of Health, National Institute of Dental and Craniofacial Research grant R01 DE014899. FB was funded by the National Institutes of Health, National Institute of Dental and Craniofacial Research grant F31 DE029992.

Authors' contributions

FB analyzed sequencing data and wrote the initial draft of the paper. DB provided code review. MM, BF, and DM were responsible for the design and collection of the cohort data. ES performed the laboratory preparation for sequencing. FB, BF, KB, ED, KS, and MD contributed to the conceptualization of the analysis and analytic plan. All authors read and approved the final manuscript.

Acknowledgments

Not applicable

Fleming E, Afful J. Prevalence of Total and Untreated Dental Caries Among Youth: United States, 2015–2016. NCHS Data Brief. 2018 Apr;(307):1–8.
Statement on Early Childhood Caries [Internet]. American Dental Association. 2000 [cited 2021 Jun 8]. Available from: https://www.ada.org/en/about-the-ada/ada-positions-policies-and-statements/statement-on-early-childhood-caries
Heilmann A, Tsakos G, Watt RG. Oral Health Over the Life Course. In: Burton-Jeangros C, Cullati S, Sacker A, Blane D, editors. A Life Course Perspective on Health Trajectories and Transitions. Cham: Springer International Publishing; 2015. p. 39–59. Available from: https://doi.org/10.1007/978-3-319-20484-0_3
Martins-Júnior PA, Vieira-Andrade RG, Corrêa-Faria P, et al. Impact of Early Childhood Caries on the Oral Health-Related Quality of Life of Preschool Children and Their Parents. Caries Res [Internet]. 2013;47(3):211–8. Available from: https://www.karger.com/DOI/10.1159/000345534
Pitts NB, Zero DT, Marsh PD, et al. Dental caries. Nat Rev Dis Prim. 2017;3:17030.
Gomez A, Nelson KE. The Oral Microbiome of Children: Development, Disease, and Implications Beyond Oral Health. Microb Ecol [Internet]. 2016/09/14. 2017 Feb;73(2):492–503. Available from: https://www.ncbi.nlm.nih.gov/pubmed/27628595
Mark Welch JL, Dewhirst FE, Borisy GG. Biogeography of the Oral Microbiome: The Site-Specialist Hypothesis. Annu Rev Microbiol [Internet]. 2019 Sep 8;73(1):335–58. Available from: https://doi.org/10.1146/annurev-micro-090817-062503
Bhaumik D, Manikandan D, Foxman B. Cariogenic and oral health taxa in the oral cavity among children and adults: A scoping review. Arch Oral Biol. 2021 Jun;129:105204.
Fakhruddin KS, Ngo HC, Samaranayake LP. Cariogenic microbiome and microbiota of the early primary dentition: A contemporary overview. Oral Dis [Internet]. 2018 Jul 3;0(0). Available from: https://doi.org/10.1111/odi.12932
Marsh PD, Zaura E. Dental biofilm: ecological interactions in health and disease. 2017;44:12–22.
Dzidic M, Collado MC, Abrahamsson T, et al. Oral microbiome development during childhood: an ecological succession influenced by postnatal factors and associated with tooth decay. ISME J [Internet]. 2018;12(9):2292–306. Available from: https://doi.org/10.1038/s41396-018-0204-z
Lif Holgerson P, Esberg A, Sjödin A, West CE, Johansson I. A longitudinal study of the development of the saliva microbiome in infants 2 days to 5 years compared to the microbiome in adolescents. Sci Rep [Internet]. 2020;10(1):9629. Available from: https://doi.org/10.1038/s41598-020-66658-7
Gussy M, Mnatzaganian G, Dashper S, et al. Identifying predictors of early childhood caries among Australian children using sequential modelling: Findings from the VicGen birth cohort study. J Dent [Internet]. 2020;93:103276. Available from: https://www.sciencedirect.com/science/article/pii/S0300571220300105
Dashper SG, Mitchell HL, Lê Cao K-A, et al. Temporal development of the oral microbiome and prediction of early childhood caries. Sci Rep. 2019 Dec;9(1):19732.
Grier A, Myers JA, O’Connor TG, et al. Oral Microbiota Composition Predicts Early Childhood Caries Onset. J Dent Res. 2021 Jun;100(6):599–607.
Nyvad B, Takahashi N. Integrated hypothesis of dental caries and periodontal diseases. J Oral Microbiol [Internet]. 2020 Jan 7;12(1):1710953. Available from: https://pubmed.ncbi.nlm.nih.gov/32002131
Hurley E, Barrett MPJ, Kinirons M, et al. Comparison of the salivary and dentinal microbiome of children with severe-early childhood caries to the salivary microbiome of caries-free children. BMC Oral Health. 2019 Jan;19(1):13.
Manzoor M, Lommi S, Furuholm J, et al. High abundance of sugar metabolisers in saliva of children with caries. Sci Rep [Internet]. 2021;11(1):4424. Available from: https://doi.org/10.1038/s41598-021-83846-1
Jiang S, Gao X, Jin L, Lo ECM. Salivary microbiome diversity in caries-free and caries-affected children. Int J Mol Sci. 2016;17(12).
Kim B-S, Han D-H, Lee H, Oh B. Association of Salivary Microbiota with Dental Caries Incidence with Dentine Involvement after 4 Years. J Microbiol Biotechnol. 2018 Mar;28(3):454–64.
Belstrøm D, Fiehn N-E, Nielsen CH, et al. Altered bacterial profiles in saliva from adults with caries lesions: a case-cohort study. Caries Res. 2014;48(5):368–75.
Shade A. Diversity is the question, not the answer. ISME J [Internet]. 2017;11(1):1–6. Available from: https://doi.org/10.1038/ismej.2016.118
Mukherjee C, Moyer CO, Steinkamp HM, et al. Acquisition of oral microbiota is driven by environment, not host genetics. Microbiome [Internet]. 2021;9(1):54. Available from: https://doi.org/10.1186/s40168-020-00986-8
Sulyanto RM, Thompson ZA, Beall CJ, Leys EJ, Griffen AL. The Predominant Oral Microbiota Is Acquired Early in an Organized Pattern. Sci Rep [Internet]. 2019;9(1):10550. Available from: https://doi.org/10.1038/s41598-019-46923-0
Ramadugu K, Bhaumik D, Luo T, et al. Maternal Oral Health Influences Infant Salivary Microbiome. J Dent Res. 2021 Jan;100(1):58–65.
Mark Welch JL, Utter DR, Rossetti BJ, et al. Dynamics of tongue microbial communities with single-nucleotide resolution using oligotyping. Front Microbiol [Internet]. 2014;5:568. Available from: https://www.frontiersin.org/article/10.3389/fmicb.2014.00568
Zaura E, Brandt BW, Prodan A, et al. On the ecosystemic network of saliva in healthy young adults. ISME J [Internet]. 2017;11(5):1218–31. Available from: https://doi.org/10.1038/ismej.2016.199
Relvas M, Regueira-Iglesias A, Balsa-Castro C, et al. Relationship between dental and periodontal health status and the salivary microbiome: bacterial diversity, co-occurrence networks and predictive models. Sci Rep [Internet]. 2021;11(1):929. Available from: https://doi.org/10.1038/s41598-020-79875-x
Custodio R, Johnson E, Liu G, Tang CM, Exley RM. Commensal Neisseria cinerea impairs Neisseria meningitidis microcolony development and reduces pathogen colonisation of epithelial cells. PLOS Pathog [Internet]. 2020;16(3):1–21. Available from: https://doi.org/10.1371/journal.ppat.1008372
Kim WJ, Higashi D, Goytia M, et al. Commensal Neisseria Kill Neisseria gonorrhoeae through a DNA-Dependent Mechanism. Cell Host Microbe. 2019 Aug;26(2):228–239.e8.
Mashima I, Nakazawa F. The influence of oral Veillonella species on biofilms formed by Streptococcus species. Anaerobe. 2014 Aug;28:54–61.
Liu J, Wu C, Huang I-H, Merritt J, Qi F. Differential response of Streptococcus mutans towards friend and foe in mixed-species cultures. Microbiology [Internet]. 2011 Sep;157(Pt 9):2433–44. Available from: https://pubmed.ncbi.nlm.nih.gov/21565931
Guo L, Shokeen B, He X, Shi W, Lux R. Streptococcus mutans SpaP binds to RadD of Fusobacterium nucleatum ssp. polymorphum. Mol Oral Microbiol [Internet]. 2017;32(5):355–64. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1111/omi.12177
Thurnheer T, Karygianni L, Flury M, Belibasakis GN. Fusobacterium Species and Subspecies Differentially Affect the Composition and Architecture of Supra- and Subgingival Biofilms Models. Front Microbiol [Internet]. 2019 Jul 30;10:1716. Available from: https://www.ncbi.nlm.nih.gov/pubmed/31417514
Biyikoğlu B, Ricker A, Diaz PI. Strain-specific colonization patterns and serum modulation of multi-species oral biofilm development. Anaerobe. 2012 Aug;18(4):459–70.
Rosier BT, Buetas E, Moya-Gonzalvez EM, Artacho A, Mira A. Nitrate as a potential prebiotic for the oral microbiome. Sci Rep [Internet]. 2020;10(1):12895. Available from: https://doi.org/10.1038/s41598-020-69931-x
Belda-Ferre P, Alcaraz LD, Cabrera-Rubio R, et al. The oral metagenome in health and disease. ISME J. 2012 Jan;6(1):46–56.
Edlund A, Yang Y, Yooseph S, et al. Uncovering complex microbiome activities via metatranscriptomics during 24 hours of oral biofilm assembly and maturation. Microbiome [Internet]. 2018;6(1):217. Available from: https://doi.org/10.1186/s40168-018-0591-4
Son MR, Shchepetov M, Adrian PV, et al. Conserved mutations in the pneumococcal bacteriocin transporter gene, blpA, result in a complex population consisting of producers and cheaters. mBio. 2011;2(5).
Qi F, Chen P, Caufield PW. The group I strain of Streptococcus mutans, UA140, produces both the lantibiotic mutacin I and a nonlantibiotic bacteriocin, mutacin IV. Appl Environ Microbiol. 2001 Jan;67(1):15–21.
Eren AM, Borisy GG, Huse SM, Mark Welch JL. Oligotyping analysis of the human oral microbiome. Proc Natl Acad Sci. 2014;111(28):E2875–84.
Mark Welch JL, Rossetti BJ, Rieken CW, Dewhirst FE, Borisy GG. Biogeography of a human oral microbiome at the micron scale. Proc Natl Acad Sci [Internet]. 2016 Feb 9;113(6):E791–E800. Available from: http://www.pnas.org/content/113/6/E791.abstract
Mager DL, Ximenez-Fyvie LA, Haffajee AD, Socransky SS. Distribution of selected bacterial species on intraoral surfaces. J Clin Periodontol. 2003 Jul;30(7):644–54.
Neiswanger K, McNeil DW, Foxman B, et al. Oral Health in a Sample of Pregnant Women from Northern Appalachia (2011–2015). Int J Dent [Internet]. 2015;2015:469312–76. Available from: http://umich.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwrV1LSwMxEA5qD3oRX2i1lhw86GFtm0d391i1SxHB4gO8hWSTSKFWWdu7_8F_6C9xJrstRRAvXnLYBJKdSTLzTeZBCPtxHxjMIz2aRp3z2YsdQ1MZ0SIM0xQR-lmhYfuG9TJ5nfH-UuEv9BArkwWXZGwZzUQuU8-5tCK32sTh5YqDlHbecLiLa4yDSF7CVuFK5
Robins JM, Gail MH, Lubin JH. More on “Biased selection of controls for case-control analyses of cohort studies”. Biometrics. 1986 Jun;42(2):293–9.
Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl Environ Microbiol [Internet]. 2013 Sep;79(17):5112–20. Available from: https://www.ncbi.nlm.nih.gov/pubmed/23793624
Callahan BJ, McMurdie PJ, Rosen MJ, et al. DADA2: High resolution sample inference from Illumina amplicon data. Nat Methods [Internet]. 2016;13(7):581–3. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4927377/
Escapa IF, Huang Y, Chen T, et al. Construction of habitat-specific training sets to achieve species-level assignment in 16S rRNA gene datasets. Microbiome [Internet]. 2020;8(1):65. Available from: https://doi.org/10.1186/s40168-020-00841-w
Davis NM, Proctor DM, Holmes SP, Relman DA, Callahan BJ. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. Microbiome [Internet]. 2018;6(1):226. Available from: https://doi.org/10.1186/s40168-018-0605-2
Kuhn M. caret: Classification and Regression Training [Internet]. 2021. Available from: https://cran.r-project.org/package=caret
John CR. MLeval: Machine Learning Model Evaluation [Internet]. 2020. Available from: https://cran.r-project.org/package=MLeval
Holmes I, Harris K, Quince C. Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS One [Internet]. 2012 Feb 3;7(2):e30126. Available from: https://doi.org/10.1371/journal.pone.0030126
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics [Internet]. 2008;9(1):559. Available from: https://doi.org/10.1186/1471-2105-9-559
Bodenhofer U, Bonatesta E, Horejs-Kainrath C, Hochreiter S. msa: an R package for multiple sequence alignment. Bioinformatics. 2015;31(24):3997–9.
Schliep K, Potts AJ, Morrison DA, Grimm GW. Intertwining phylogenetic trees and networks. Methods Ecol Evol. 2017;8(10):1212–20.
Yu G, Smith D, Zhu H, Guan Y, Lam TT-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol [Internet]. 2017;8(1):28–36. Available from: http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12628/abstract
Beghini F, McIver LJ, Blanco-Míguez A, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. bioRxiv [Internet]. 2020; Available from: https://www.biorxiv.org/content/early/2020/11/21/2020.11.19.388223
Tamames J, Puente-Sánchez F. SqueezeMeta, A Highly Portable, Fully Automatic Metagenomic Analysis Pipeline. Front Microbiol [Internet]. 2019 Jan 24;9:3349. Available from: https://pubmed.ncbi.nlm.nih.gov/30733714

No competing interests reported.

AdditionalFile1.xlsx
Additional file 1: Supplementary table: Distribution of key characteristics in the entire Center for Oral Health Research in Appalachia 2 cohort and among the children sampled into the nested case-control analysis set
AdditionalFile10.xlsx
Additional file 10: Supplementary table: Results from DeSeq analysis testing for differential KEGG ortholog abundances between incident visit case and control plaque and saliva samples. All tested KEGG orthologs included, sorted by adjusted P value, includes path annotation.
AdditionalFile11.xlsx
Additional file 11: Supplementary table: Results of gene set enrichment analysis performed using the R package fgsea and the results from DeSeq performed on KEGG orthologs.
AdditionalFile12.xlsx
Additional file 12: Supplementary table: Taxonomic annotation and log2 Fold Change values for all KEGG orthologs annotating to the oxidative phosphorylation pathway
AdditionalFile2.pdf
Additional file 2: Supplementary methods (text) and supplementary figures
AdditionalFile3.xlsx
Additional file 3: Supplementary table: Detailed sample loss by visit due to missing metadata, missing saliva samples, and quality filtering of 16S rRNA amplicon sequencing data
AdditionalFile4.xlsx
Additional file 4: Supplementary tables: Top 100 best matches from BLAST search of the 16S rRNA amplicon V4 region of Streptococcus ASVs of interest in our sample (ASV14, ASV8, and ASV82)
AdditionalFile5.xlsx
Additional file 5: Supplementary table: Distribution of key microbial features (alpha diversity metrics, weighted co-occurrence networks, community state types, and prevalence/abundance of a priori taxa) by case status from saliva samples of visit of case diagnosis and control matching
AdditionalFile6.xlsx
Additional file 6: Supplementary table: Distribution of key microbial features (alpha diversity metrics, weighted co-occurrence networks, community state types, and prevalence/abundance of a priori taxa) by case status from saliva samples of visits preceding case diagnosis/control matching
AdditionalFile7.xlsx
Additional file 7: Supplementary tables: ASV membership in the five weighted co-occurrence network modules presented in the main analysis. Each network module is a tab in the file, and each row is an ASV belonging to that network module. Columns give the ASV number, scientific name, module label (the two most abundant taxa in the module plus the most central taxon in the module), and measures of total (kTotal), within-module (kWithin), and between-module (kOut) connectivity/degree, as calculated by the intramodularConnectivity function in the WGCNA package, as well as the difference between within- and between-module connectivity (kDiff) for each ASV.
AdditionalFile8.xlsx
Additional file 8: Rand index, adjusted Rand index, and adjusted mutual information index for alternate clustering parameters in the community state typing (k = 4, 5, or 6 clusters) and weighted co-occurrence networks (Hellinger vs. centered log-ratio transformation).
AdditionalFile9.xlsx
Additional file 9: Supplementary table: Results from DeSeq analysis testing for differential taxa abundances between incident visit case and control plaque and saliva samples. All tested taxa included, sorted by adjusted P value

Editorial decision: Major revision
08 Nov, 2022
Reviews received at journal
13 Aug, 2022
Reviewers agreed at journal
11 Aug, 2022
Reviewers agreed at journal
29 Jul, 2022
Reviewers invited by journal
26 Jul, 2022
Editor assigned by journal
27 Apr, 2022
Submission checks completed at journal
22 Apr, 2022
First submitted to journal
21 Apr, 2022
