Study Design, Setting and Participants
We conducted a cross-sectional, multi-cohort study of dental caries experience among children aged 3–5 years from 4 community-based cohorts in the US, Sweden, and Australia (Supplemental Table S4). A well-characterized sample of 6,404 preschool-age children enrolled in the ZOE 2.0 study (2016–2019) in North Carolina, US, served as the ‘discovery’ and main analysis cohort35. Participating children attended public preschools (Head Start) in a state-wide sample in NC. Information on caries experience for all 88 primary tooth surfaces was collected by trained and calibrated clinical examiners in community locations using modified International Caries Detection and Assessment System (ICDAS) criteria36. Using the established caries lesion detection threshold (ICDAS ≥ 3), 54% of participating children had ECC. A detailed description of the study protocols has been previously reported37–39.
The first replication cohort was the US-based National Health and Nutrition Examination Survey (NHANES) oral health component40. We combined 7 cycles of the NHANES, covering the years 1999–2004 and 2011–2018, comprising a total of 3,958 3-5-year-old children, 33% of whom had ECC. The second replication cohort comprised 208,112 Swedish 3-5-year-old children (15% with ECC) participating in the Swedish Quality Registry for Caries and Periodontal Diseases (SKaPa, http://www.skapareg.se/). SKaPa included data from the Swedish dental public health system extracted from electronic health records of public and private dental clinics beginning in 200841. The Australian cohort comprised 7,997 3-5-year-old preschool-age children who participated in a community-based outreach program, “Wide Smiles”, between 2013 and 2019 in southwest Victoria. The program included on-site oral health screenings at preschool centers conducted by the local public dental service. Trained and calibrated clinical examiners documented dental caries experience at the tooth surface level using ICDAS criteria, as previously reported42.
While ZOE 2.0 represents a low-income, high caries risk population (i.e., families are Medicaid-eligible and must meet additional socioeconomic vulnerability criteria to qualify for Head Start participation), NHANES served as the U.S. community-dwelling representative sample (although no survey weights were used prior to carrying out LCA). The Swedish population data served as another independent sample from a generally low caries-risk and high access-to-care population. The Australian data served as the fourth independent sample from an area of moderate caries risk and low access to care.
This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline for cross-sectional studies. Ethics approvals were obtained in all participating studies.
Data and Variables
Tooth surface-level information on caries experience for all 88 primary tooth surfaces (each coded as a binary variable) was available in all participating cohorts. ZOE 2.0, SKaPa and Wide Smiles recorded caries lesions using two different thresholds: one corresponding to cavitation (ICDAS ≥ 3, “established” or “advanced” caries lesions) and one representing the first visual changes or “white spots” in tooth enamel (ICDAS ≥ 1, all caries lesions including “early stage” ones). The clinical protocol of NHANES did not employ ICDAS criteria; however, it recorded caries lesions using a definition resembling ICDAS ≥ 3. Additional information available for all cohorts included participants’ age (measured in months), sex (boy/girl), and evidence of restorative or surgical dental procedures (i.e., fillings, crowns, extractions) (Supplemental Table S4).
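The two diagnostic thresholds can be illustrated with a minimal sketch. It assumes, purely for illustration, that each child is stored as a list of 88 integer ICDAS scores (0–6), one per primary tooth surface; the function names are hypothetical, and restorations and extractions, which also contribute to caries experience, are not handled here.

```python
# Minimal sketch (assumed data layout): one list of 88 integer ICDAS
# scores per child, one score per primary tooth surface.

def binarize_surfaces(icdas_scores, threshold):
    """Code each surface as 1 (lesion at or above threshold) or 0."""
    return [1 if score >= threshold else 0 for score in icdas_scores]

def is_case(icdas_scores, threshold=3):
    """A child counts as a case if any surface meets the threshold."""
    return any(score >= threshold for score in icdas_scores)

# Hypothetical child: one early-stage lesion and one cavitated lesion.
child = [0] * 86 + [1, 3]
established = binarize_surfaces(child, threshold=3)   # cavitation threshold
all_lesions = binarize_surfaces(child, threshold=1)   # includes early-stage
```

Under this coding, the same child can contribute a different number of affected surfaces depending on the threshold, which is why the two definitions are analyzed separately.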
Additional demographic and oral health-related information was collected in ZOE 2.0 via questionnaires administered to children’s guardians in the language of their preference (English or Spanish)43. For the purposes of this study, we considered information on self-reported race/ethnicity (categorized as African American/non-Hispanic Black, Hispanic, non-Hispanic white, American Indian/Alaskan Native, more than 1 race, and other); guardian’s educational attainment (categorized as less than high school, high school or general educational development, and some technical or community college or more); tooth brushing frequency (twice a day or more versus less than twice a day); frequency of consuming between-meal sugar-containing snacks and beverages (categorized as 2 or more versus less than 2 per day); history of child having been put to bed with a bottle containing anything other than water (yes versus no); reason for visiting the dentist among those with a previous dental visit (for check-ups and/or problems versus only when problems arise); and history of the child having dental pain not from teething (yes versus no). Information about domestic water source fluoride content was measured directly in a subset of participants (n = 1,530) and was subsequently imputed to the remainder of the sample as optimal (i.e., ≥ 0.60ppm F) versus sub-optimal (i.e., < 0.60ppm F) using a machine learning-based imputation method leveraging geographic information systems44.
Relatedness in ZOE 2.0 was determined from genome-wide genotyping data available for 96% (n = 6,144) of participants. Genotyping was performed using DNA from saliva samples at the Center for Inherited Disease Research (CIDR, Johns Hopkins University) using the Infinium Global Diversity Array-8 v1.0, offering 1,905,000 genotyped single nucleotide polymorphisms (SNPs). For this application, 183,287 autosomal SNPs were used to estimate identity-by-descent (IBD) probabilities using the KING-robust procedure45. This subset of “high-quality” SNPs was selected by linkage disequilibrium (LD) pruning from an initial pool consisting of all autosomal SNPs with missing call rate < 2% and minor allele frequency > 5%, with all pairs of SNPs having r² < 0.1 in a sliding 10 Mb window. Subsequently, relatedness was estimated using the GENESIS package46. Relationships between participant pairs were determined based on clustering of kinship coefficients around expected values, i.e., monozygotic twins (MZ): 0.50, full siblings (FS): 0.25, half siblings (HS): 0.125, and first cousins (FC): 0.0625.
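As an illustration of assigning pairs to relationship types from kinship estimates, the sketch below labels a pair by the nearest expected value listed above. The relative tolerance window and the function name are assumptions made for this example only; the study describes inspecting how coefficients cluster around the expected values rather than applying fixed cutoffs.

```python
# Expected kinship coefficients as listed in the text.
EXPECTED_KINSHIP = {"MZ": 0.50, "FS": 0.25, "HS": 0.125, "FC": 0.0625}

def classify_pair(kinship, tolerance=0.25):
    """Label a pair by the nearest expected kinship coefficient.

    `tolerance` is a hypothetical relative window around each expected
    value; pairs falling outside every window are left unclassified.
    """
    label, expected = min(
        EXPECTED_KINSHIP.items(), key=lambda item: abs(item[1] - kinship)
    )
    if abs(kinship - expected) <= tolerance * expected:
        return label
    return "unclassified"
```

With a 25% relative window the four categories do not overlap, so each estimate maps to at most one relationship type.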
Microbiome data in ZOE 2.0 were available for a subset of participants (n = 300). Specifically, supragingival biofilm samples from the first 150 ECC cases (using the ICDAS ≥ 3 threshold) and 150 non-cases using the same threshold were carried forward to whole genome shotgun sequencing (WGS, metagenomics-MTG) and RNA-seq (metatranscriptomics-MTX)38. The resulting paired-end reads were trimmed of adapter sequences using Trim Galore (Babraham Institute, Cambridge, UK) and classified with Kraken247 and Bracken 2.548 using a custom database including human, fungal, bacterial, and the expanded Human Oral Microbiome Database (eHOMD) genomes49 to produce an initial taxonomic composition profile. After eliminating reads identified as ‘host’, paired-end reads were joined with vsearch 1.10.2.50 and again trimmed of adapter sequences using Trim Galore. We used microbiome MTG data to examine differences in measures of community diversity (i.e., Bray-Curtis estimates of beta diversity, a measure of similarity or dissimilarity) between clinical subtypes of ECC. Additionally, we examined differences in the relative abundance of 4 species, namely Streptococcus mutans, Selenomonas sputigena, Leptotrichia wadei, and Prevotella salivae, that our group recently identified as strongly associated with ECC experience51.
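For readers unfamiliar with the Bray-Curtis measure, the following minimal sketch computes the dissimilarity between two samples from their taxon abundance vectors. The function name and the toy abundances are illustrative assumptions and do not reproduce the study pipeline.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Returns 0.0 for identical compositions and 1.0 when the two samples
    share no taxa. Both vectors must list the same taxa in the same order.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("abundance vectors must have equal length")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return difference / total

# Toy example with three taxa: (5 + 5 + 0) / (15 + 15) = 1/3
d = bray_curtis([10, 0, 5], [5, 5, 5])
```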
Latent Class Analysis
Discovery latent class analyses (LCA) included children with frank caries experience (i.e., only cases, defined at the established/severe caries lesion or cavitation ICDAS ≥ 3 threshold) in the ZOE 2.0 study (n = 3,465). After identifying an optimal LCA solution in that sample, we carried out two within-study sensitivity analyses and subsequently sought to replicate or generalize clinical subtypes in the 3 independent cohorts of similarly aged children. The first sensitivity analysis consisted of adding caries-free participants to the ZOE 2.0 LCA (i.e., analyzing the entire study population, n = 6,404) to determine whether the solution was robust to the inclusion of non-cases. The second sensitivity analysis consisted of changing the diagnostic threshold so that all caries lesions (i.e., including early-stage, non-cavitated lesions) were included in the case-only analysis (n = 5,882), to determine whether the solution was robust to a different caries lesion diagnostic threshold.
LCA models were fitted using an expectation-maximization algorithm (EM) and robust maximum likelihood (MLR) considering each of the 88 primary tooth surfaces as a binary latent class indicator in a stepwise manner as in a previously reported application43,52. Models were identified by sequentially increasing the number of classes (k) beginning with k = 1 until model non-identification was concluded. To identify the model with the optimal number of classes (i.e., ECC clinical subtypes), we assessed and compared absolute and relative measures of model fit of the two models with the closest fit (i.e., competing models). Measures of classification accuracy included overall entropy (i.e., a measure for each model) and average posterior class probabilities (i.e., a measure for each class). Additionally, we obtained frequencies and percentages of study participants assigned to each class or subtype (i.e., class proportions or ɣ-probabilities) and class-item-specific response probabilities (i.e., IRP, surface-specific caries prevalence).
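The estimation logic can be sketched with a plain EM algorithm for a latent class model with binary indicators. This is a simplified illustration under stated assumptions: it uses ordinary maximum likelihood rather than MLR, a single random start, and a fixed number of iterations, and the function names are hypothetical. It is not the Mplus or poLCA implementation used in the study.

```python
import math
import random

def fit_lca(data, k, n_iter=200, seed=0):
    """Fit a k-class latent class model to rows of 0/1 indicators via EM.

    Returns (class_proportions, item_response_probs, posteriors).
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    props = [1.0 / k] * k
    probs = [[rng.uniform(0.25, 0.75) for _ in range(m)] for _ in range(k)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every participant.
        post = []
        for row in data:
            logw = []
            for c in range(k):
                ll = math.log(props[c])
                for j, y in enumerate(row):
                    ll += math.log(probs[c][j] if y else 1.0 - probs[c][j])
                logw.append(ll)
            top = max(logw)
            w = [math.exp(v - top) for v in logw]
            total = sum(w)
            post.append([v / total for v in w])
        # M-step: update class proportions and item response probabilities.
        for c in range(k):
            weight = max(sum(r[c] for r in post), 1e-12)
            props[c] = max(weight / n, 1e-12)
            for j in range(m):
                num = sum(r[c] * row[j] for r, row in zip(post, data))
                probs[c][j] = min(max(num / weight, 1e-6), 1.0 - 1e-6)
    return props, probs, post

def relative_entropy(post):
    """Relative entropy in [0, 1]; values near 1 mean clear classification."""
    n, k = len(post), len(post[0])
    if k == 1:
        return 1.0
    h = -sum(p * math.log(p) for r in post for p in r if p > 0)
    return 1.0 - h / (n * math.log(k))
```

In practice one would refit with many random starts for each k and compare fit indices across k; the sketch shows only how class proportions, item response probabilities, posterior probabilities, and entropy relate to one another.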
Description and correlates of ECC clinical subtypes
We used three methods to describe differences in the clinical presentation of ECC between its clinical subtypes. First, we compared measures of caries experience in the different subtypes (e.g., number of caries-affected tooth surfaces). Next, to visualize differences in the severity and distribution of caries experience, we used a custom-built visualization pipeline (SculptorHD)53 to produce annotated 3D representations of the primary dentition that are accessible via https://eccsubtypesdemo.web.app/ (demo site provided for peer-review purposes). Finally, because measures of total caries experience can mask differences in the pattern of disease distribution, we used spatially-informed measures of caries severity (termed ‘caries clusters’). These were derived using a hierarchical clustering analysis (HCA) of the 88 tooth surfaces using Ward’s minimum variance method54. Based on inspection of the resulting dendrogram and on clinical interpretability, we identified 7 right-left symmetric clusters of tooth surfaces, a grouping that parallels previous reports in the literature (e.g., separating pits-and-fissures versus smooth surfaces, as well as maxillary versus mandibular teeth) (Supplemental Figure S4).
Subsequently, ECC clinical subtype memberships were used to investigate differences in terms of demographics (e.g., age, race/ethnicity), reported oral health-related behaviors (e.g., history of nighttime bottle feeding, history of tooth pain), and fluoride exposure (i.e., optimal versus sub-optimal). Categorical variables were compared using chi-square tests, and quantitative measures (i.e., dmfs or age) were compared using ANOVA with a Bonferroni correction for multiple testing.
Dental caries experience information at age 12–13 was available for 60% (45,998 out of 77,274) of SKaPa participants with ECC clinical subtype information at age 5. At ages 12–13, one would expect all primary teeth to have exfoliated and all permanent teeth except 3rd molars (i.e., wisdom teeth) to have emerged. This prospective clinical information was summarized by the number of caries-affected permanent tooth surfaces (i.e., the DMFS index) at ages 12–13 and was then compared between groups of the 5-class ECC classification that included all children, i.e., wherein class I includes caries-free and mild cases. For this purpose, we estimated mean ratios with 95% confidence intervals based on Fieller’s theorem55, then carried out multiple testing-adjusted pairwise comparisons using the Games-Howell test56.
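A Fieller-type interval for a ratio of two means can be sketched as follows. The example assumes independent groups (zero covariance between the two means) and a normal critical value of 1.96; the function name and inputs are hypothetical, and the exact implementation used in the study is not specified in the text.

```python
import math

def fieller_ci(mean_num, se_num, mean_den, se_den, crit=1.96):
    """Fieller confidence interval for mean_num / mean_den.

    Solves (mean_num - theta * mean_den)^2
           = crit^2 * (se_num^2 + theta^2 * se_den^2)
    for theta, assuming the two means are independent.
    """
    a = mean_den ** 2 - (crit * se_den) ** 2
    b = -2.0 * mean_num * mean_den
    c = mean_num ** 2 - (crit * se_num) ** 2
    disc = b * b - 4.0 * a * c
    if a <= 0 or disc < 0:
        # The denominator mean is not clearly different from zero,
        # so no bounded interval exists.
        raise ValueError("confidence interval is unbounded")
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))

# Hypothetical group means (DMFS) and standard errors.
lower, upper = fieller_ci(mean_num=2.0, se_num=0.1, mean_den=1.0, se_den=0.05)
```

Unlike a delta-method interval, the Fieller interval is generally asymmetric around the point estimate of the ratio, which matters when the denominator mean is small relative to its standard error.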
Familial concordance and microbiome differences
Estimates of ECC subtype concordance among related individuals in ZOE 2.0 were obtained via Cohen’s kappa (κ) with corresponding 95% confidence intervals and were compared between monozygotic twins (MZ), full siblings/1st degree relatives (FS), half siblings/2nd degree relatives (HS), and first cousins/3rd degree relatives (FC). We compared the concordance estimates for ECC subtypes with those from two simulated control conditions of matched-size participant groups: a ‘positive’ control, wherein ZOE 2.0 participants were assigned to a similar number of groups according to their ordered disease severity (i.e., caries burden, as measured by the dmfs index), and a ‘negative’ control, wherein ZOE 2.0 participants were randomly assigned to groups. We carried out these tests to determine to what degree ECC subtypes are associated with genetic variation. For instance, we expected concordance estimates to be larger among MZ twins than among FS, and larger among FS than among HS/FC. Additionally, we hypothesized that concordance estimates would be larger for ECC clinical subtypes (i.e., along more than 1 axis of variation) than those obtained from allocation of participants to caries severity rank-ordered groups (i.e., per the dmfs index, along 1 axis of variation, the ‘positive’ control) or from random participant allocation (the ‘negative’ control).
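The concordance statistic itself can be illustrated with a minimal sketch of Cohen’s kappa computed over relative pairs. It assumes each pair contributes one subtype label per member, supplied as two parallel lists; the function name and labels are hypothetical, and the confidence interval calculation is omitted.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two parallel lists of categorical labels.

    labels_a[i] and labels_b[i] are the subtype assignments of the two
    members of relative pair i.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # a single shared category: agreement is trivially perfect
    return (observed - expected) / (1.0 - expected)
```

Kappa equals 1 when both members of every pair share a subtype and is close to 0 when agreement is no better than expected from the marginal subtype frequencies, which is the behavior the ‘negative’ control is meant to approximate.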
Additional between-subtype comparisons were completed among the subset of 300 ZOE 2.0 participants with supragingival microbiome information. We hypothesized that microbiome biomarkers would be similar within each clinical disease subtype but would diverge between different subtypes.
Assessment of the feasibility of ECC subtype screening using index teeth
Given the tendency for clusters of teeth to share similar caries status, we hypothesized that examination of a small number of tooth surfaces might be sufficient to assign children to an ECC subtype. To determine whether information from examining small numbers of index teeth could help accurately categorize children in ECC subtypes, we employed machine learning using 3 sets of easily accessible tooth surfaces as inputs. The first set comprised 10 surfaces: the facial surfaces of the 6 maxillary anterior teeth (primary canines and incisors) and the occlusal surfaces of the 4 mandibular primary molars. The second set included 16 surfaces (i.e., the first set, plus the 6 proximal surfaces between the 4 maxillary primary incisors) and the third set included 20 surfaces (i.e., the second set, plus the 4 surfaces in the interproximal areas between maxillary lateral incisors [i.e., distal surfaces] and maxillary canines [i.e., mesial surfaces]). We screened 8 candidate models (i.e., boosted neural network, generalized regression Lasso, nominal logistic, bootstrap forest, decision tree, support vector machine, k-nearest neighbors, and linear discriminant analysis), and selected the best performing model based on R2, mean root average squared error (RASE), and mean area under the receiver operating characteristic curve (ROC AUC) upon 50-fold validation using a random 60% of the data for model training, 15% for model validation, and 25% for model testing. For the selected model, a neural network (including 5-fold cross-validation and a total of 100 boosted models with a learning rate of 0.1), we reported AUC values for correctly identifying each ECC clinical subtype in training (80%) and validation (20%) data in the ZOE 2.0 study.
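To make the per-subtype AUC concrete, the sketch below computes a one-versus-rest ROC AUC from predicted subtype probabilities using the rank-based (Mann-Whitney) formulation. The function names and input layout are assumptions for illustration; the study’s models and AUC values were produced in JMP Pro, not with this code.

```python
def roc_auc(scores, is_positive):
    """ROC AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(predicted_probs, true_subtypes, subtype):
    """AUC for identifying one subtype against all others.

    predicted_probs: list of dicts mapping subtype label to probability.
    true_subtypes:   list of true subtype labels, in the same order.
    """
    scores = [p[subtype] for p in predicted_probs]
    targets = [t == subtype for t in true_subtypes]
    return roc_auc(scores, targets)
```

Reporting one AUC per subtype, rather than a single pooled value, shows whether a reduced set of index surfaces discriminates some subtypes better than others.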
Software
For LCA, we used Mplus v.8.8 (Muthén & Muthén, Los Angeles, CA, US) and the poLCA package (v.1.4.1) in R v.4.1.2 (The R Foundation for Statistical Computing, Vienna, Austria). HCA and machine learning applications were implemented with JMP® Pro 17.0 (SAS Institute Inc., Cary, NC, US). Stata v.17.0 (Stata Corp LLC, College Station, TX, US) was used for all additional analyses.