Microbial characteristics of the DMFT index in a high HIV prevalence setting

Background : Oral disease pathogenesis is primarily driven by microbial dysbiosis although diagnosis is routinely macroscopic. To improve early detection especially in HIV patients who are disproportionately affected, there is need to reconcile macroscopic and microscopic characteristics of disease. This study aimed to use amplicon sequencing data to characterize oral microbiota changes along the decayed, missing, filled teeth (DMFT) index. Methods : Amplicon sequencing of the V6-V8 region of the 16S rRNA gene was done on DNA recovered from whole unstimulated saliva of 59 HIV positive and 29 HIV negative individuals, respectively. The microbial structure, composition and co-occurrence networks were characterized using QIIME-2, Phyloseq, Microbiome-1.9.2 and Metacoder in R. Results : From a total of 2.6 million high quality sequence reads obtained, we characterized the oral microbiota into 2,093 operational taxonomic units (OTUs), 21 phyla and 239 genera. While oral microbiota did not cluster participants into distinct groups that track with the DMFT index, we observed the following: a) A steady increase in accessory microbiota while the core size (~50% of richness) remained stable. b) The abundance of core genera such as Stomatobaculum , Peptostreptococcus and Campylobacter increased at onset of pathology (low DMFT), c) A general increase in oral microbial biomass with a typical log difference between gingivitis and periodontitis. d) The onset of pathology (low DMFT) was associated with massive reduction in oral microbial entropy. Conclusions: Although oral microbial shifts along the DMFT index are not distinct, we have demonstrated the potential utility of microbiota dynamics to inform oral disease characteristics. We therefore propose a framework to inform future clinical oral metagenomic studies especially among HIV positive persons in resource limited settings.


Introduction
Oral diseases such as dental caries (tooth decay) and periodontal disease (gum disease) affect a third of the world's population 1 costing in excess of half a trillion dollars 2 . HIV/AIDS predisposes patients to a higher risk of developing dental caries and periodontal disease compared to the general population 3,4 . In Uganda for example, 1.3 million people are infected with HIV, approximately 7.3% of 5 years and provides HIV related services to approximately 16,000 patients 80% of who are on antiretroviral treatment (ART). The clinicians treat on average 300 people every day, which makes it one of few areas to examine HIV related co-morbidities like dental disease.

Sample size determination
It is noteworthy that there are very few studies on oral clinical metagenomics in LMICs, therefore information on powering of such studies is still limited 14 . Since this was study was exploratory in nature, a sample size that maximizes the total number of genera detected (richness) was critical.
Previous studies conducted elsewhere have used 50-65 samples to detect up to 80% of genera 15,16 .
Here we conveniently sampled 88 persons of the 168 who participated in this study between January and May 2018.

Sample collection
Saliva samples were collected between 9:00 AM and 12:00 noon to minimize the circadian effect on flow rate according to a published protocol 17 . For measurement of flow rate, saliva was collected for 5 minutes without any stimulation. Participants were asked not to swallow, but to expel the accumulated saliva into a calibrated plastic centrifuge tube at intervals over a period of 5 minutes.
The volume of saliva estimated from the tube calibration (in ml) was divided by 5 to give the flow rate (ml/min). Saliva samples were collected on ice, transported on ice and stored at -80°C prior to component analysis. Oral examinations were performed by trained clinicians using dental probes and a mirror under suitable artificial light after saliva collection during the same visit.

Clinical characterization of periodontal disease (gingivitis and periodontitis)
Periodontal health was assessed using the Basic Periodontal Examination (BPE) 19 . This index integrates gingival inflammation, presence of calculus or overhanging margins and pocket depth to determine a particular score for a given sextant. All teeth present in a given sextant excluding the third molars were probed using a WHO BPE probe, which has a ball end 0.5mm in diameter and a black band from 3.5 to 5.5 mm, and the deepest pocket noted. Factoring in presence or absence of bleeding, calculus or over hangs and pocket depth, a score of zero to four was recorded for each sextant. The scores ranged between 1 and 4, which were categorized as gingivitis (BPE 1 and 2) and periodontitis (BPE of 3 and 4).

Saliva processing and DNA extraction
Saliva samples were processed as follows prior to DNA extraction using GenoLyse method as published by Bruker (USA). While in biosafety cabinet, 3 milliliters of saliva were transferred to 15 ml sterile centrifuge tubes and an equal volume of 1% w/v NaCl was added. The samples were vortexed vigorously for 1 minute until the specimen was fully liquefied. This was done in order to digest saliva to release the bacteria. Phosphate buffered saline (PBS) pH 6.8 was then added up to the 15ml mark.
The specimens were then centrifuged at 3000 x g for 15 minutes at 4°C. The supernatant was then discarded, the resulting pellet was suspended into 1 ml of 1XPBS and then transferred to a 1.5 ml nuclease free centrifuge tube. These were then centrifuged at 10000 x g for 15 minutes prior to extraction of DNA using the GenoLyse method, following manufacturer's recommendations (Bruker, USA). Briefly, the supernatant was discarded and the pellet resuspended in 100mls of GenoLyse lysis buffer (A-LYS) prior to gentle vortexing to lyse the cells of the microorganisms including bacteria.
Additional cell lysis was achieved by incubating the tubes at 95°C for 5 minutes prior to adding 100mls of GenoLyse neutralizing buffer (A-NB) to stop the action of A-LYS. The mixture was vortexed again for about 2 seconds prior to centrifugation at 13000rpm for 5 minutes. DNA was collected from the supernatant and stored prior to use in subsequent analyses.

16S rRNA gene sequencing
Aliquots of 30 µl were then shipped under controlled ambient condition to Dalhousie University Integrated Microbiome Resource (IMR, Canada) 20 . Following purification of the amplicon pools using AMPure beads, sequencing of the V6-V8 16S rRNA variable loops was performed on the Illumina MiSeq platform (San Diego, CA, USA) using the 400 paired-end MiSeq run according to an established protocol 21 . Following sequencing, demultiplexed samples were returned.

Sequence analysis
A total of 88 paired end sequences in fastq format were received via an html file transfer link from the Dalhousie University Integrated Microbiome Resource (IMR) 20 . Demultiplexed reads from the sequencing facility were then imported into the Qiime 2 pipeline 22 for analysis. As part of quality control the paired-ended reads were trimmed and merged into single ended reads. A QIIME artifact was then generated from these sequences and the metadata file. After dereplication, chimeraremoval and denoising using DADA2 22 and OTUs dataset were generated. We filtered OTUs with a sequence depth of at least 3000, retaining 90% of the samples and the corresponding metadata 23 .

Microbial community structure
Microbial diversity analysis was done using QIIME 2 and R-based packages; Phyloseq and Microbiome analyzer. The OTU-data output was then used to estimate the Alpha and Beta diversity indices. We considered observed Shannon for the Alpha diversity analysis 24 while on the other hand, Beta diversity was estimated with Bray-Curtis, weighted and unweighted Unifrac distances 22 . To examine the association of clinical variables on the oral microbiota structure, we used a permutational multivariate analysis of variances (PERMANOVA) with the Adonis function (9,999 permutations) in phyloseq 24 using the estimated Beta diversity indices as the outcome variable. The results were converted into a bar plot ranking the effect size R 2 of each clinical variable and its statistical significance. To examine if patients can be clustered using beta diversity indices, we run a constrained distance-based redundancy using Bray-Curtis index as the beta diversity index. The analysis was constrained by DMFT and HIV status or periodontal disease type (gingivitis or periodontitis). We also run a principal coordinate analysis (PCoA) of weighted unifrac distance, which was used to estimate the total variance explained by first five components, henceforth referred to as TVE 25 in Phyloseq. The impact of genera on TVE in the subsequent section was used to evaluate the most influential microbes in each DMFT category.

Parameters for characterizing DMFT
In order to map and track microbial changes along the DMFT index, we characterized changes in the alpha and beta diversity indices, abundance and prevalence, and co-occurrence dynamics.

Microbial composition
To investigate changes in abundance, we used Metacoder as described by Foster 26 which combines phylogenetics and abundance. This allowed us to track changes of differentially abundant genera along the DMFT index. Briefly, this analysis was done using a Metacoder object generated from QIIME 2 taxonomic classification of OTUs generated before a naïve Bayes classifier trained on the most recent SILVA database at 97% similarity 27 . First, a training dataset was extracted using the primers used for sequencing our samples. The resultant database subset was then used to train the classifier for taxonomically assigning the OTUs. The heat tree highlights branches based on abundance. To determine the influential genera in DMFT category, we ranked the fifty most abundant genera, and then examined their impact on TVE by sequentially removing them from the data set representing each DMFT category.

Core, and accessory microbiota
To understand core and accessory changes as a proportion of the total richness along the DMFT index, first we defined the core, as the OTUs present in 85% of the samples at each DMFT category.
Secondly, the pan-microbiota as the total number of unique OTUs in each of these categories, and the accessory as the difference between the pan and core microbiota in each category. We then analyzed the data to detect these microbial components in our samples using microbiome package 1.9.2 28 , the output was summarized and plotted using Tidyrverse 1.2.1 and ggplot2 2.3.1 in R, respectively. To track the core as a proportion of microbial richness, we selected genera shared by all participants in all the DMFT categories then divided that by the richness at each stage. To examine the influence of genera, we first ranked nodes (genera) in this co-occurrence network by their centrality degree after which, we selected the top 50 genera (see co-occurrence network) and then examined their impact on TVE. This was done by sequentially dropping one genus from the select genera and compute the change TVE. The change in variance was visualized using bar plot in ggplot colored by the oxygen utilization capacity of each genus. It should be noted that we used the term invaders interchangeably with accessory microbiota, as they represented transient genera.
We also used published literature elsewhere to create two other categories i.e. oral-disease associated genera and normal flora 29,30 . (see Additional file 6: Supplementary DB variance analysis)

Microbial co-occurrence networks
We used the mean abundance correlation matrix of the genera in each DMFT category to map the changes in the co-occurrence network 31,32 . The "associate" function within the microbiome package version 1.9.2 in R (v3.5.1) was used to generate a genus-level spearman correlation matrix, here we set the FDR adjusted p-value at 0.05 33 and pruned the matrix using a correlation coefficient of >0.5 and <-0.5. The resultant matrix was then converted into a directed-network object from which communities were extracted and visualized in igraph package version 1.2.2. The edges were colored based on statistical significance.

A quasi-Poisson logistic regression for the DMFT
To determine the factors associated with changes in biomass along the DMFT we develop a Poisson regression model in GLM where the outcome variable was mean taxonomic abundance (microbial biomass) using the lme4 package in R version (v3.5.1). Data was split into five subsets each representing a DMFT category, we then run five separate models with same explanatory variables. It is these models we compared to examine the difference in estimates, the differences here represented the changes at each stage accounting for gender, HIV status and microbes at family level. Model comparison is done using sjtools in in R (v3.5.1).

Participants descriptive summary
This study involved 38 and 50 male and female participants (N=88) with an average age of 39.5 years. The proportion of HIV positive participants was 67% (59/88) with a median CD4 T cell count of on ART. The mean salivary flow rate was 0.9 ml/min with no discernable difference between HIV positive and HIV negative participants. We observed a difference in the mean DMFT among HIV negative and positive participants of 5.9 and 4.9, respectively (Table 1).

Oral microbial community structure
A total of 2,601,254 high quality 16S rRNA sequences were recovered from the 88 participants.
Apparently-healthy individuals on average generated 35,527 sequences while those with detectable oral pathology (DMFT>0) generated 32,515 sequences. Samples with the highest and lowest sequence count came from HIV negative and HIV positive participants, respectively (Additional file 1: S1). When the sequences were filtered to a depth of 3000, we retained 80 participants (Additional file 1: Fig. S1) from whom 2093 OTUs were generated with a median frequency of 14,351. There was no statistically significant difference in DMFT categories and HIV status regarding alpha diversity indices (Fig. 1A) but a statistically significant association (p>0.05) was observed between DMFT and beta diversity indices such as the unweighted Unifrac and Bray-Curtis distances (Fig. 1B, Additional file 2: S2). There was some clustering along the DMFT index specifically separating low and medium categories with the ability to cluster the participants by HIV status but not periodontal disease (Fig.   1C, D).

Oral microbial composition
The above association was further explored at a taxonomic level i.e. core and accessory microbiota prevalence and abundance ( Fig. 2A, B). The 2093 OTUs clustered into 21 phyla and 239 genera.

Apparently healthy individuals (DMFT=0)
Among the eleven apparently healthy individuals the core and pan microbiota size was 65 and 86 genera respectively, i.e. an accessory microbiota of 21 genera. The most abundant core genera included; Ruminococcus, Mogibacterium, Megasphaera, Campylobacter, Atopobium and Actinomyces (Fig. 2B). The core, normal flora and genera associated with oral pathology accounted for 49%, 23% and 27% respectively while the accessory microbiota accounted for 1% (Fig. 3). Of the 65 genera observed among apparently healthy individuals, 49 were shared across the different DMFT categories, hence forth referred to as the DMFT core.

Low DMFT
The transition from apparently healthy to low DMFT category was characterized by a slight increase in the proportion of the accessory microbiota but the proportion of the core at this stage remained relatively unchanged. This was associated with an increase in the abundance of Stomatobaculum, Peptostreptococcus and a reduction in Atopobium and Actinomyces (Fig. 2B). The DMFT microbial model suggests a significant difference in microbial biomass between participants with concurrent gingivitis and periodontitis. Here the families with significant changes in abundance include Weeksellacae, Veillonellaceae, Streptococaccae, Ruminoccocaea among others (Fig. 3). We also noted that clusters 2 & 5 (Fig. 4A, B), were exclusively composed of HIV positive participants, the former was characterized by a low abundance of genera that are associated with oral disease and the latter by those that are part of the core.

Medium and High DMFT
The medium DMFT category was distinctly separated by microbial clustering (Fig. 1C, D) and also characterized by a slight increase in the proportion of the accessory microbiota. The core, normal flora and oral disease associated genera accounted for oral microbial richness of 48%, 22% and 27% respectively (Fig. 3). This change was associated with an enrichment of Ruminococcus, Peptospteptococcus and Lautropia (Fig. 2B). Participants in the high DMFT category with concurrent periodontitis carried significantly more microbial biomass than those with concurrent gingivitis (Additional file 5: Fig. S5). Indeed, at this stage the accessory microbes accounted for 9% of the microbiota in the oral cavity (Fig. 4C). However, families whose abundance significantly changed primarily belong to the core; Carnobacteriaceae, Neisseriaceae, Micrococcaceae and Desulfoplanes, Desulfosporosinus, Sphaerochaeta, Arenimonas, and Microbacterium among others.
Notably these belonged to the families with statistically significant shifts in abundance at this stage.
The medium and high DMFT categories were also characterized by changes in variance of as high as 10% and we observed that the composition of influential genera changed from aerobic to anaerobic.
Unlike the low DMFT category, here 98% of the influential genera belonged to the core (see Additional file 6: Supplementary DB variance analysis). The extremely high DMFT category was characterized by limited change in variance, and here too almost all the influential genera are members of the core.

Discussion
The pathogenesis and progression of dental caries has for long been linked to a microbial precursor 8 but its routine diagnosis in developing countries is almost exclusively macroscopic i.e. based on the DMFT index 7 . This is mainly because microbial culture-based testing is laborious and complicated 11 , which makes comprehensive and timely delivery of clinical results prohibitively protracted. However, with the advent of culture-independent NGS approaches like 16S rRNA amplicon sequencing 34 , we can narrow this gap in the developing countries where the HIV epidemic may be disproportionately driving the incidence of oral disease. In this study, we used amplicon sequencing data to characterize changes in oral microbiota along the routinely used diagnostic DMFT index.
To the best of our knowledge, this is the first study in this setting that has characterized oral microbiota along the DMFT index. On average, the oral microbiota was characterized using 32,515 sequences, generating 2093 OTUs that clustered into 21 phyla and 239 genera. We observed no significant differences in alpha diversity along the DMFT index. Since structure and composition of the oral microbiota can be influenced by parameters such as saliva flow rate 35 , we accounted for this by establishing the flow rate (mean=0.9 ml/min) among the participants and compared it to the microbial structure. We observed clustering by HIV status with or without considering other factors such as gender and age at this microbial structural level.
Microbiota is integral to the oral cavity and plays a critical role in maintaining its integrity 8 . In apparently-healthy individuals this is expected to manifest as stability in structure and composition 11,36 . In this study, we observed a relatively stable core size along the DMFT index. The proportion of the core relative to the rest of the microbiota was incredibly stable with 49 genera present across the DMFT categories. Indeed, the notion of the microbiome core is commonly investigated as a proxy for resident microbes of a given microbiome 37 . In addition to the 49 core membership, we characterised a core for each DMFT category. Among apparently healthy individuals, we observed genera such as Streptococcus, Staphylococcus, Corynebacterium, Veillonella, Granulicatella and Gemella, most of which have been previously reported as normal flora of the oral cavity 38 . The proportions of genera characterised as normal flora, core and that which are associated with oral pathology were relatively low. The proportion of accessory microbiota, also referred to as invaders was at its lowest among the apparently healthy individuals. This observation supports the notion that resident genera dominate the oral cavity by inhibiting invaders through the production of substances such as fatty acids, peroxides and bacteriocins 39 . This concerted activity has also been characterised among functional communities 40 , which in this study are analogous to co-occurrence communities. Indeed, among healthy individuals we observed that the co-occurrence communities are dominated by members of the core such as Johnsonella, Rikenella, Porphyromonas, Alloprevotella Tannerella, Fusobacterium and others. This further supports the notion that resident bacteria maintain the integrity of the oral cavity among healthy individuals 39 . The organisms observed were predominantly anaerobic 29 probably due to microbial succession that occurs during the formation of dental plaque 41 .
Onset of pathology is widely associated with oral microbiome dysbiosis 8 . While there is ambiguity on what defines dysbiosis, there is common agreement on the nature of forces that modulate the onset of pathology 11 . The forces are similar to selection forces i.e. changes in fitness, growth and reproduction of microbes manifested as fluxes in prevalence and abundance 8 . In the low DMFT category we observed that amidst such forces, the core size still remains stable with a slight increase in the proportion of accessory microbes. The increase in the proportion of invaders was associated with presence of anaerobic lactate fermenters like Sharpea, Lawsonella and Olsenella, previously implicated in the onset of endodontic infections and acute apical abscesses 42,43 . While the size of the core does not change, there was considerable increase in abundance of genera such as Stomatobaculum, Peptostreptococus, Campylobacter and Mogibacterium which have been associated with onset of subgingival pathology 44 . In this category, we noted that individuals with concurrent gingivitis carried significantly more microbes than those with concurrent periodontitis. In this regard, families such as Weeksellacae, Veillonellaceae, Streptococcaceae and Ruminococcacaea accounted for most of the change in abundance. This is most probably due changes in the local environment and microbial interactions favoring the growth of these organisms that have been associated with dental caries 29,45 and labial abscess 46 . We also noted the absence of genera such as Roseburia, which is associated with health elsewhere 47 , its absence here probably supports the onset of pathology. The onset of pathology can be viewed as an onset of a constraint to the normal microbial community interaction. In this study, we showed that this constraint is detectable as a change in entropy, in this case 99% of the pairwise association were statistically significant. In other words, the probability that the associations observed are occurring randomly is very low, which suggests introduction of an overwhelming constraint. This change in entropy is also associated with an increase (up to 10%) in variation attributable to genera such as Lachnospira, Treponema, Aggregatibacter, Corynebacteria and Bifidobacteria whose roles in pathogenesis of oral diseases are well documented [48][49][50][51] . However, some of the influential genera such as Ferruginibacter and Sphaerochaeta are part of the accessory microbiota.
The proportion of the accessory microbiota in the medium and high DMFT categories increased compared to the apparently healthy and low DMFT categories. Here too the proportion of core remained stable and comparatively similar to normal flora and oral disease associated genera. In literature, the onset of pathology is associated with interaction of different microorganisms in a dynamic and concerted polymicrobial synergy to form a cariogenic biofilm within which the community changes as caries progress from early onset (initial demineralization) to deeper lesions with dentin exposure 52 . In particular there is an increase in abundance of core members, Rothia and Mogibacterium, which are associated with dental caries 53 and periodontal disease 53,54 , and Lautropia, which has been isolated from oral cavities of HIV infected children 55 but is not associated with clinical oral disease 55,56 . However, as severity of dental caries progresses, the resident microbes appear to be overwhelmed, seen here as a surge in the accessory microbiota. Microbiota at this point appears to have gone through significant remodeling because most of the invading population in the medium category then become established as members of the core. Moreover, it is in these two categories and at low DMFT that we observed the highest proportion of the least biologically characterized organisms.
Although the extremely high DMFT category is characterized based on six individuals, nonetheless, we observed a generalized increase in abundance of the members although the core and accessory size is smaller. The core was dominated by genera such as Peptostreptoccus, Mogibacterium and Atopobium which have been implicated in chronic oral pathology 57 . We also observed a massive reduction in variation and genera such as Solobacterium, Ruminococcus, Neisseria, Campylobacter, Atopobium and Abitrophia have the largest change in abundance.
From these findings, we propose a microbial model of the DMFT index as a foundation for clinical metagenomics in LMIC settings. In addition to presence and abundance of genera, we used microbial structure and taxonomic characteristics. We specifically characterized genus level influence based on their impact on variance and most importantly shift in entropy. To this effect, the findings here suggest the following; a) Microbial structure; alpha diversity index does not discriminate along the DMFT index but beta diversity does. b) Taxonomic composition; the DMFT core genera approximate normal flora in most categories of the DMFT index other than extremely high DMFT and the proportion of accessory microbiota (invading bacteria) increases with dental caries. In addition, at the onset of pathology/low DMFT, the number of genera whose abundance significantly changes belong to the core, this gradually changes to the accessory in the medium and high DMFT categories. c) Entropy; the onset of pathology creates a microbiome wide constraint detectable as a change in entropy.
Almost all genera-pairwise correlations are statistically significant at this stage interpreted as a reduction in microbial entropy. d) Concurrent gingivitis versus periodontitis; we noted a significant difference in microbial abundance i.e. the low DMFT category best represents gingivitis while the high DMFT category represents most cases of periodontitis. Most importantly we observed that when we account for clinical and microbiota characteristics, there is a significant difference between HIV positive and negative participants. With this information, a testable hypothesis about pathobiology, probiotic supplementation and treatment can be generated and tested to improve dental clinical practice in the management of people living with HIV.
It is important to note that in this study, we did not have equal numbers of patients in each group i.e. 11, 32, 21,16 and 8 for healthy, low, medium, high and extremely high DMFT categories. This has the potential of increasing the level of uncertainty for the estimates made for groups with fewer individuals. However, we observed a considerable level of consistency in microbes within groups and therefore the impact of sample size per group is likely to be limited.

Conclusions
In this study, we characterized oral microbiota dynamics along the DMFT index and although the microbial characteristics did not offer a categorical output of the DMFT index, they provided the following insights; a) the onset of pathology/low DMFT category appears to be driven by significant changes in core and accessory genera abundance, b) the medium and high DMFT categories were associated with a steady increase in invading/accessory microbiota likely responsible for biomass distinction between gingivitis and periodontitis, c) the influential genera in apparently healthy patients are predominantly core members but this changes to accessory in medium and high DMFT categories, d) onset of pathology was associated with a massive reduction in oral microbial entropy. Therefore, using this information we propose a microbial framework for characterizing the DMFT index as a foundation for improving diagnostics and management of oral disease in HIV positive participants in resource limited settings.   Figure 1 Attributes used to assess the oral microbial structure. Panel A shows the alpha diversity along the DMFT index, panel B shows the beta diversity analysis using weighted unifrac along the DMFT index; here the red and yellow rings show the clustering of healthy and extremely-high DMFT categories, panels C and D show beta diversity using Bray Curtis distances for periodontal disease and HIV status with clustering of the low and medium DMFT categories evident. The pan(A) and core(B) oral microbiota tracked along the DMFT index in order of severity.

Ethical considerations
Panel A and B represent genera abundance and prevalence respectively.   show a comparison of models generated from datasets that represents each stage of the DMFT. Note that for panel A we compare four models because the model for extremely high did not converge.

Figure 6
The oral microbial co-occurrence network characteristics. Panel A shows the changes in microbial community entropy, B shows the co-occurrence network remodeling and C shows variance changes and the genera to which the variance is attributable.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. AdditionalFiles.zip