The adolescent supragingival biofilm microbiota (amplicon sequencing)
The supragingival bacterial communities in 112 supragingival biofilm samples were determined by next-generation amplicon sequencing, in which a short, variable region of the 16S rRNA marker gene was amplified by PCR before sequencing. The study included 36 samples from 9 adolescent patients diagnosed with induced gingivitis, 40 samples from 10 adolescent patients diagnosed with spontaneous gingivitis, and 36 samples from 9 age-matched healthy individuals without symptoms of gingivitis.
In all cases, rarefaction curves reached their asymptotes (data not shown), indicating that the sequencing depth was sufficient to cover all genera in the bacterial communities. In other words, enough sequences were obtained from each sample to capture the taxonomic composition of its microbial community. Alignment of the quality-filtered reads with sequences in the NCBI RefSeq database identified 172 operational taxonomic units (OTUs) in the samples analyzed (Supplementary table 2).
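Rarefaction repeatedly subsamples a library to increasing depths and records the number of distinct OTUs observed; the curve levels off once additional reads reveal few new taxa. The minimal Python sketch below illustrates the idea on toy counts. It is not the pipeline used in this study: the function names, the 20 subsampling iterations and the plateau tolerance are illustrative assumptions.

```python
import random

def rarefaction_curve(otu_counts, depths, iterations=20, seed=0):
    """Mean number of distinct OTUs observed when `depth` reads are drawn without replacement."""
    rng = random.Random(seed)
    # Expand the count table into one OTU label per read, e.g. {"A": 2} -> ["A", "A"]
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in sorted(depths):
        if depth > len(reads):
            break  # cannot subsample deeper than the library size
        observed = [len(set(rng.sample(reads, depth))) for _ in range(iterations)]
        curve.append((depth, sum(observed) / iterations))
    return curve

def reached_plateau(curve, tolerance=0.5):
    """Heuristic: the last depth step adds fewer than `tolerance` new OTUs on average."""
    if len(curve) < 2:
        return False
    return curve[-1][1] - curve[-2][1] < tolerance

# Toy read counts for one hypothetical sample (not data from this study)
sample = {"OTU_1": 600, "OTU_2": 250, "OTU_3": 100, "OTU_4": 45, "OTU_5": 5}
curve = rarefaction_curve(sample, [10, 100, 500, 1000])
print(curve, reached_plateau(curve))
```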
The core microbial community
The richness or alpha diversity of OTUs was lower in the supragingival bacterial communities of induced and spontaneous gingivitis patients relative to healthy controls, but the difference was not significant (data not shown). Comparison of the microbial communities using PCA (principal component analysis) also showed a general overlap between supragingival plaque samples from gingivitis patients and healthy controls (data not shown).
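PCA summarizes the sample-by-taxon abundance matrix by projecting each sample onto the directions of largest variance; overlapping groups along the first components indicate no clear separation. A dependency-free sketch follows. It assumes untransformed relative abundances as input (whether any transformation was applied before PCA is not stated here), and the function name and defaults are hypothetical; in practice a numerical library would normally be used.

```python
import math
import random

def pca_scores(matrix, n_components=2, iterations=500, seed=0):
    """Sample scores on the leading principal components (power iteration with deflation).

    `matrix` has one row per sample and one column per taxon (relative abundances).
    """
    n, p = len(matrix), len(matrix[0])
    means = [sum(col) / n for col in zip(*matrix)]
    centered = [[x - m for x, m in zip(row, means)] for row in matrix]
    # Sample covariance matrix of the taxa (p x p)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Random start: an all-ones start would be orthogonal to every non-zero-variance
        # component of compositional data, because centred relative abundances sum to zero.
        v = [rng.random() - 0.5 for _ in range(p)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                v, eigenvalue = [0.0] * p, 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm  # |Cv| approaches the eigenvalue once v is a unit eigenvector
        components.append(v)
        # Deflation: remove the variance explained by this component before finding the next one
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centered]

# Three toy two-taxon profiles that vary along a single axis
print(pca_scores([[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]))
```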
Slight differences were noted depending on the reference database employed. No distinct differences were evident between the study groups at higher taxonomic levels, i.e. genus and above. Characteristic alterations occurred among the low-abundance members of the community; however, only very few sequence reads could be assigned to those taxa, which made statistical comparisons uncertain and could have led to erroneous conclusions. Therefore, low-abundance species and genera (i.e. relative abundance <0.01%) were excluded from subsequent analyses. Annotation of the genera using the RDP (Ribosomal Database Project) revealed 23 abundant genera (relative abundance >1%) and 17 rare genera (relative abundance <1%), while with the RefSeq database 20 abundant species (relative abundance >1%) and 21 rare species (relative abundance <1%) were identified (Supplementary table 3).
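The abundance filtering described above can be sketched as follows. The thresholds (0.01% for exclusion, 1% for the abundant/rare split) come from the text; applying them to pooled counts, treating a taxon at exactly 1% as rare, and the function names are assumptions.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into fractions of the total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def filter_and_classify(counts, min_rel=0.0001, abundant_rel=0.01):
    """Drop taxa below 0.01% and split the remainder into abundant (>1%) and rare taxa."""
    rel = relative_abundance(counts)
    kept = {t: r for t, r in rel.items() if r >= min_rel}
    abundant = sorted(t for t, r in kept.items() if r > abundant_rel)
    rare = sorted(t for t, r in kept.items() if r <= abundant_rel)
    return abundant, rare

# Toy pooled counts (hypothetical, not study data)
counts = {"Veillonella": 90000, "Prevotella": 9000, "Rothia": 995, "Kingella": 5}
print(filter_and_classify(counts))  # -> (['Prevotella', 'Veillonella'], ['Rothia'])
```

With these toy counts, Rothia (about 0.995%) is classified as rare, and Kingella (0.005%) falls below the exclusion cut-off.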
The general similarity of the OTUs in the three study groups allowed all data to be pooled to determine the global microbiota of adolescent supragingival plaque (Figure 2A). Veillonella parvula was the most abundant bacterial species across all samples. At the genus level, Prevotella, Veillonella, Actinomyces, Capnocytophaga and Streptococcus predominated in the supragingival plaques (Figure 2A).
The most abundant bacterial species in the supragingival biofilms of the distinct study groups
The 10 most abundant bacterial species identified were confirmed by both the RDP and RefSeq databases and represented 75% of the whole community. The following bacterial species were the most prevalent ones in our study (Figure 2, Supplementary table 3): Veillonella parvula (Phylum: Firmicutes; Genus: Veillonella), Fusobacterium nucleatum (Phylum: Fusobacteria; Genus: Fusobacterium), Rothia dentocariosa (Phylum: Actinobacteria; Genus: Rothia), Haemophilus parainfluenzae (Phylum: Proteobacteria; Genus: Haemophilus), Campylobacter gracilis (Phylum: Proteobacteria; Genus: Campylobacter), Streptococcus sanguinis (Phylum: Firmicutes; Genus: Streptococcus), Campylobacter concisus (Phylum: Proteobacteria; Genus: Campylobacter), Veillonella dispar (Phylum: Firmicutes; Genus: Veillonella), Prevotella oris (Phylum: Bacteroidetes; Genus: Prevotella), Prevotella intermedia (Phylum: Bacteroidetes; Genus: Prevotella).
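The share of the community covered by the top-ranked taxa (such as the 75% represented by the 10 most abundant species) can be computed from a pooled profile. In this sketch, pooling averages per-sample relative abundances with equal weight per sample; that weighting and the function names are assumptions rather than the documented procedure of the study.

```python
def pooled_profile(samples):
    """Average relative abundances across samples, each sample weighted equally."""
    taxa = set().union(*samples)
    n = len(samples)
    return {t: sum(s.get(t, 0.0) for s in samples) / n for t in taxa}

def top_n_share(profile, n=10):
    """Return the n most abundant taxa and their combined relative abundance."""
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n], sum(v for _, v in ranked[:n])

def taxa_needed(profile, fraction=0.95):
    """Smallest number of top-ranked taxa whose combined abundance exceeds `fraction`."""
    total = 0.0
    for k, v in enumerate(sorted(profile.values(), reverse=True), start=1):
        total += v
        if total > fraction:
            return k
    return len(profile)

# Two toy samples with hypothetical taxa a-d
profile = pooled_profile([{"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 0.7, "b": 0.1, "d": 0.2}])
top, share = top_n_share(profile, n=2)
print(top, share, taxa_needed(profile, 0.75))
```

The same `taxa_needed` helper expresses statements of the form "k genera accounted for >95% of the taxa".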
Not surprisingly, the most ubiquitous species belonged to the predominant phyla of the human oral microbiota, i.e. Firmicutes, Fusobacteria, Actinobacteria, Proteobacteria and Bacteroidetes.
Pair-wise comparison of study groups
Although the composition of the microbial communities in the supragingival biofilm of patients diagnosed with the two types of gingivitis and of healthy controls was similar, the ranking order of the predominant bacterial species differed between the groups (Figure 2B). V. parvula dominated the microbiota in induced gingivitis patients, with a relative abundance higher than 57%, followed by C. gracilis and, at lower abundance, S. sanguinis, H. parainfluenzae, C. concisus and F. nucleatum. In contrast, in the supragingival biofilm of patients with spontaneous gingivitis, the predominant V. parvula was followed by F. nucleatum (relative abundance higher than 21%), P. intermedia and C. gracilis. In healthy controls, the dominant V. parvula was accompanied by R. dentocariosa and H. parainfluenzae and by the moderately abundant species F. nucleatum and S. sanguinis (Figure 2B).
To uncover the characteristic and significant differences between the three subject groups, the data sets were compared pairwise (Figure 3). Perhaps the most pronounced differences were those between the gingivitis patients and the healthy controls. We observed an increased relative abundance of the genera Fusobacterium, Akkermansia, Treponema and Campylobacter in supragingival plaques of gingivitis patients versus controls. In contrast, the genera Lautropia, Kingella, Neisseria, Actinomyces and Rothia were substantially more abundant in controls than in either group of gingivitis patients (Figure 3A). The genus Megasphaera showed notable changes in relative abundance between the control subjects and the induced and spontaneous gingivitis patients. In addition, differences in relative abundance between the two groups of gingivitis patients were also noticeable, which may warrant further studies in larger cohorts.
At the species level, a significantly higher abundance of C. concisus was apparent in both gingivitis groups versus the controls (Figure 3B). We also observed that the relative abundances of Candidatus Saccharibacteria oral taxon TM7x, R. dentocariosa, R. mucilaginosa, Lautropia mirabilis and H. parainfluenzae were lower in the supragingival biofilms of both gingivitis groups than in healthy controls. Interestingly, Fusobacterium periodonticum was detected in healthy samples only. Comparison of bacterial species in the two groups of gingivitis patients showed that the relative abundances of Candidatus Saccharibacteria oral taxon TM7x, R. dentocariosa and H. parainfluenzae were significantly higher in patients with induced than with spontaneous gingivitis. Other species, including P. intermedia, F. nucleatum, Parvimonas micra, Dialister pneumosintes, C. concisus, C. curvus and Aggregatibacter segnis, were less abundant in the induced gingivitis group than in the spontaneous gingivitis group (Figure 3B). Of the 13 significantly different species, seven were confirmed by both the RDP (at the genus level) and RefSeq (at the species level) databases; these are marked with stars in Figure 3B. These patterns may be useful for diagnosis and for future targeted therapy.
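The statistical test behind these pairwise comparisons is not stated in this section. Purely as an illustration of one common approach, the sketch below applies a two-sided permutation test to the difference in mean relative abundance of a single taxon between two groups, followed by a Benjamini-Hochberg false discovery rate adjustment across taxa. Both choices, the function names and the permutation count are assumptions.

```python
import random

def permutation_pvalue(group_a, group_b, n_perm=2000, seed=1):
    """Two-sided permutation test on the difference in mean relative abundance."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples under the null hypothesis
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed - 1e-12:  # small tolerance absorbs floating-point noise
            hits += 1
    # The +1 terms keep the estimate away from an impossible p = 0
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy relative abundances of one taxon in two hypothetical groups
print(permutation_pvalue([0.30, 0.32, 0.31, 0.29, 0.33], [0.05, 0.04, 0.06, 0.05, 0.03]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```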
Pathogens in the gingivitis and healthy study groups
As with principal component analysis and alpha diversity, clustering of the supragingival microbiota using UPGMA (unweighted pair group method with arithmetic mean) did not clearly separate the three study groups. Samples from induced gingivitis patients (A), spontaneous gingivitis patients (B) and controls (C) appeared intermingled on the UPGMA tree (see the letters preceding the tooth position numbers in the innermost circle in Figure 4). Nor did the samples separate into distinct clusters according to the Modified Gingival Index (MGI), which reflects the severity of gingivitis.
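UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted mean of the previous distances. The distance measure used for Figure 4 is not stated here; the minimal sketch below assumes Bray-Curtis dissimilarity between relative-abundance profiles, and its function names and toy sample labels are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance dicts (0 = identical, 1 = no shared taxa)."""
    taxa = set(x) | set(y)
    shared = sum(min(x.get(t, 0.0), y.get(t, 0.0)) for t in taxa)
    total = sum(x.values()) + sum(y.values())
    return 1.0 - 2.0 * shared / total if total else 0.0

def upgma(samples):
    """Average-linkage (UPGMA) clustering; returns a nested-tuple tree of sample names."""
    clusters = {name: (name, 1) for name in samples}  # cluster label -> (subtree, size)
    names = list(samples)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[frozenset((a, b))] = bray_curtis(samples[a], samples[b])
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = sorted(pair)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new = f"({a},{b})"
        # Size-weighted mean of the two old distances to each remaining cluster
        for c in clusters:
            d_a = dist.pop(frozenset((a, c)))
            d_b = dist.pop(frozenset((b, c)))
            dist[frozenset((new, c))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        del dist[pair]
        clusters[new] = ((tree_a, tree_b), size_a + size_b)
    return next(iter(clusters.values()))[0]

# Toy profiles: two plaque-like and two control-like hypothetical samples
samples = {
    "A1": {"V": 0.9, "F": 0.1},
    "A2": {"V": 0.85, "F": 0.15},
    "C1": {"R": 0.7, "H": 0.3},
    "C2": {"R": 0.6, "H": 0.4},
}
print(upgma(samples))  # -> (('A1', 'A2'), ('C1', 'C2'))
```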
Nevertheless, a different clustering was apparent according to the pathogenic nature of the microbial taxa. Oral microbes have been classified on the basis of their roles in pathogenesis and arranged into color-coded segments of the “Socransky pyramid” [66,67] (Supplementary figure 1). Samples dominated by species of the purple Socransky complex (mainly V. parvula) are indicated by purple ID numbers, i.e. study group letter and tooth position number (Figure 4, innermost circle). This sample cluster comprised 52 of the 112 supragingival microbiota included in this study. F. nucleatum (25 out of 36 microbiota) predominated in the cluster highlighted in orange, together with other members of the Socransky orange complex. The third cluster, highlighted in yellow, showed a balanced distribution of prevailing species, such as R. dentocariosa, H. parainfluenzae and V. dispar. Twelve out of 25 samples were assigned to this cluster (Figure 4).
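One simple way to label a sample by pathogenic complex is to sum the relative abundances of its species per Socransky complex and take the largest total. This is a hypothetical simplification, not the clustering procedure behind Figure 4, and the species-to-complex table below is an illustrative, incomplete subset; species missing from it, such as R. dentocariosa, are ignored.

```python
# Illustrative, incomplete subset of Socransky complex membership (hypothetical lookup table)
COMPLEX_OF = {
    "Veillonella parvula": "purple",
    "Actinomyces odontolyticus": "purple",
    "Fusobacterium nucleatum": "orange",
    "Prevotella intermedia": "orange",
    "Parvimonas micra": "orange",
    "Streptococcus sanguinis": "yellow",
    "Streptococcus mitis": "yellow",
}

def dominant_complex(profile, complex_of=COMPLEX_OF):
    """Sum relative abundance per complex and return the complex with the largest total."""
    totals = {}
    for species, abundance in profile.items():
        cx = complex_of.get(species)
        if cx is not None:  # species outside the lookup table are skipped
            totals[cx] = totals.get(cx, 0.0) + abundance
    if not totals:
        return "unassigned"
    return max(totals, key=totals.get)

# Toy sample profile (hypothetical values)
print(dominant_complex({"Veillonella parvula": 0.5, "Fusobacterium nucleatum": 0.3,
                        "Rothia dentocariosa": 0.2}))  # -> purple
```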
It is important to note that no correlation could be recognized between the pathogenicity of the bacterial members of the supragingival plaques and the health status of the patients (gingivitis or health). We also investigated the potential relationship between oral health status and the patients' gender, age and duration of orthodontic appliance wear. Although there was no significant correlation between oral health status and any of these parameters, a tendency toward better status was noted in female than in male patients, and with shorter rather than longer duration of wearing braces. These findings suggest that there are no substantial differences among the study groups in their general oral health status, which is consistent with gingivitis (induced or spontaneous) being reversible with proper oral hygiene.
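The correlation measure used for these comparisons is not named. A rank-based coefficient such as Spearman's rho is a common choice for ordinal scores like the MGI; the sketch below computes it as the Pearson correlation of average ranks, which handles ties. The choice of Spearman's rho, the function names and the toy data are assumptions, and significance testing is not included (the coefficient is undefined when either variable is constant).

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical MGI scores and months of appliance wear for five patients
mgi = [1, 2, 2, 3, 1]
months = [6, 12, 18, 24, 9]
print(spearman(mgi, months))
```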
Salivary microbiota of adolescent subjects (whole metagenome sequencing)
In addition to the dental plaques, saliva samples were also collected from the same study participants. In these experiments we wanted to examine the importance of the sampling site, i.e. tooth biofilm-associated versus planktonic microbiota, in gingivitis patients relative to healthy controls of the same age group. Saliva samples contained enough DNA to allow whole metagenome sequencing, which avoided the potential systematic bias inherent in amplicon sequencing (e.g. PCR amplification bias). The analysis and evaluation of sequence data followed a workflow (Figure 1B) similar to the one applied to the amplicon sequencing, although in this case both read-based and genome-based analyses became possible, which extended the information content considerably.
Read-based analysis of the whole metagenome data
The most abundant bacteria in the saliva of the study groups
The richness and evenness of the salivary microbial communities did not differ significantly between either gingivitis group and the healthy controls according to the Shannon indices (data not shown). Principal component analysis (PCA) of the salivary microbiota did not reveal distinct clusters either (data not shown). In this respect, the salivary microbiota exhibited the same pattern as the dental plaque biofilms analyzed by amplicon sequencing. Therefore, we could combine all saliva microbiota to generate a global picture of the adolescent salivary microbial landscape (Figure 5A).
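The Shannon index combines richness and evenness as H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The sketch below computes it with the natural logarithm (an assumption, since the log base is not stated), together with Pielou's evenness J' = H'/ln S for S observed taxa; the function names are hypothetical.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Evenness J' = H' / ln(S), where S is the number of observed taxa."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

# Toy read counts per taxon for one hypothetical saliva sample
counts = [120, 80, 40, 10]
print(shannon(counts), pielou_evenness(counts))
```

A perfectly even community reaches J' = 1, so differences in H' between groups can be traced to richness (S) or to evenness (J').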
The genera Prevotella, Rothia, Streptococcus and Veillonella predominated in the salivary microbiota. Notably, Veillonella was not as dominant in saliva as in the supragingival biofilm samples (see Figure 2A and 5A).
The relative distribution of the 10 most abundant bacterial genera was determined next. In induced gingivitis, the relative abundance of Prevotella was higher than in the two other groups. Streptococcus was the most abundant genus in spontaneous gingivitis (Figure 5B). In control saliva samples, the relative abundance of Rothia was comparable with that of Prevotella. In each study group, 14 genera accounted for >95% of the taxa identified at the genus level.
Genome-based evaluation of the saliva sequencing data (binning)
In addition to the read-based analysis, the saliva sequencing data were analyzed by genome-centric binning (Figure 1B). In this approach, the filtered sequences were first assembled into contigs, which were then distributed into virtual bins based on their inherent sequence features. Inspection of the genetic content of the individual bins supplied detailed taxonomic information from a viewpoint distinct from that of the read-based approach and, in many cases, information about the genes coding for possible specific metabolic pathways.
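The sequence features used for binning are not specified here. Tetranucleotide frequency, usually combined with contig coverage, is a widely used compositional signal in metagenomic binning; the sketch below computes a canonical 4-mer profile for one contig, collapsing each 4-mer with its reverse complement. It is an illustrative assumption, the names are hypothetical, and the subsequent clustering of profiles into bins is not shown.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def revcomp(seq):
    """Reverse complement of an upper-case DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Collapse each tetramer with its reverse complement, since a contig may come from either strand
CANONICAL = sorted({min(k, revcomp(k)) for k in TETRAMERS})

def tetranucleotide_profile(contig):
    """Normalised canonical 4-mer frequencies; k-mers containing non-ACGT bases are skipped."""
    contig = contig.upper()
    counts = dict.fromkeys(CANONICAL, 0)
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL]

# Toy contig fragment (hypothetical sequence)
profile = tetranucleotide_profile("ACGTACGTNNACGGT")
print(len(CANONICAL), sum(profile))
```

Contigs from the same genome tend to have similar profiles, so these vectors (with coverage) can be clustered into bins.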
Genomic fragments belonging to the most abundant taxa of the healthy salivary microbiome could be detected in the saliva of each study group. In line with the results of read-based metagenomics, most of the putative genomes identified by binning belonged to Prevotella species, which comprised 8 separate bins, whereas the putative genomes of species in the genera Veillonella and Streptococcus comprised 4 separate bins each. The genomes of Actinomyces and Rothia species were detected in 2 distinct bins each. The genomes of 9 other species, as well as the family Porphyromonadaceae and the taxon Candidatus Saccharibacteria TM7x, each occupied a single distinct bin (Figure 6). All these genera were present in both the supragingival biofilm and the planktonic saliva samples.
The read-based and genome-based pictures of the microbiota were broadly similar and thus validated each other: starting from the same saliva sequencing data, the two distinct bioinformatics approaches gave comparable pictures of the microbial communities.