In this study, our findings showed that the optimal conditions for analysing oral microbiota using 16S rRNA gene amplicon sequencing are a combination of 300 bp PE, the primer targeting the V1–V2 regions, stepwise trimming method, and the HOMD database (Table 1, Fig. 2, Additional file 5: Fig. S1, Additional file 4: Table S15–S18). The requirements for the proper merging of paired-end reads by DADA2 are the following criteria: 1) the length of the amplified region is shorter than the sequencing read length, 2) the trimming site for the forward read is a length whose 25-percentile sequencing quality is close to 20, 3) that the length of the reverse read is adjusted so that the minimum overlap length is at least 16 bp [21]. In the case of the V1–V2 primer, the length of the amplified region is 311 bp, and the length required for the overlap is 16 bp; therefore, the minimum length required should be at least 343 bp. The minimum required lengths for other primers are as follows: 343 bp for V1–V2, 523 bp for V1–V3, 209 bp for V3, 497 bp for V3–V4, 598 bp for V3–V5, 323 bp for V4, 424 bp for V4–V5, 348 bp for V5–V6, and 465 bp for V6–V8. Since the maximum read length for the 250 bp PE is 500 bp, the sequenced reads of V1–V3 for 523 bp and V3–V5 for 598 bp could not be merged. On the contrary, the maximum read length for 300 bp PE is 600 bp; therefore, it is theoretically possible to merge paired-end reads for all nine sets of primers. However, the primers requiring most of the read length, such as the V3–V5 for 598 bp, cannot obtain sufficient overlapped regions by being trimmed to low-quality regions, which reduces the non-chimeric survival rate of the paired-end reads. From that perspective, the survival rate of non-chimeric reads seems to be related to the length of the amplified region. Shorter regions, such as V3 and V4, showed higher rates of non-chimeric reads, whereas longer regions, such as V1–V3, showed lower rates. On the other hand, the accuracy of the analysis did not necessarily improve in proportion to the higher survival rate of the non-chimeric reads in this study. This suggested that, to improve the accuracy of phylogenetic analysis, it was necessary to select primers that were appropriate for the sample type. Nevertheless, the appropriate trimming length, which maximizes the survival rate of the non-chimeric reads, should be investigated to maintain enough information for analyzing the samples from obtained sequence reads in case an appropriate primer is selected. Unfortunately, the current versions of QIIME2 and DADA2 cannot automatically search for trimming lengths that maximize the survival of non-chimeric reads [25]. Hence, the results of this study suggested that it is essential to perform the stepwise trimming method to search for the appropriate length.
The optimal conditions of sequencing length, primer type, and database in 16S rRNA gene amplicon analysis differed between mock1 and mock2 communities. In mock1 analysis, the conditions of 300 bp PE, the primer targeting the V3 region and the SILVA database showed the most accurate result. However, the reliability of the analysis was limited to the genus level (Table 2a–2d). In the mock2 analysis, the conditions of 300 bp PE, the primer targeting V3–V4, V4, and V5–V6 regions and the HOMD database showed the most accurate result at the genus level. Furthermore, the primer targeting V1–V2 regions showed reliable results at species level (Table 2e and 2f). Our results suggest that the 300 bp PE is superior to the 250 bp PE for the analysis of bacterial composition in the oral microbiome. A previous study has reported that data generated using the 300 bp PE protocol have a longer overlap region, rendering the approach superior for generating highly reliable paired-end assemblies compared with the 250 bp PE [31]. Focusing on the appropriate database, all bacterial genera in mock1 were registered in SILVA and Greengenes databases, and SILVA showed more accurate results than did Greengenes.
Conversely, all bacterial species in mock1 were registered in only SILVA; however, not all primers showed accurate results. For mock2 analysis, all bacterial taxa were registered in SILVA and HOMD databases at both genus and species levels, and HOMD showed more accurate results than SILVA. SILVA could be an optimal database for the genus analysis of specimens obtained from various environments such as mock1, while HOMD could be for genus and species analysis of specimens from oral microbiomes. Furthermore, the results’ error compared with the theoretical values was larger at the species than genus levels for mock1 and mock2. Analyses using partial 16S rRNA gene sequences suggested that short reads are less suitable for studying accurate richness estimation and assigning taxa compared with full-length reads [32] [33]. Long-read sequencing provides good resolution for bacterial identification [34]. In this study, Spearman correlation coefficient value showed that even the highest coefficient value seemed low, which was 0.115, at the species level analysis of mock1 with 300 bp PE. However, at the species level analysis of mock2 with 300 bp PE, the coefficient value was high, which was obtained using the primer targeting V1–V2 regions and HOMD, with 0.893 to the theoretical value. A possible explanation is that the number of bacteria in mock2 was smaller than that of mock1. Another possible explanation is that since HOMD is composed of oral-specific bacteria, it may be more suitable for mock2. Therefore, when partial 16S rRNA gene amplicon analysis is performed for oral microbiota at the species level, the conditions of 300 bp PE, the primer targeting V1–V2 regions, the stepwise trimming method, and HOMD are recommended.
In addition, the selection of primer type influenced the accuracy of not only the whole bacterial community but also individual bacterial species. Focusing on the individual bacteria in the analysis of mock1 with 300 bp PE, the data using V3–V4, V4, and V6–V8 primers showed a relatively higher analysis accuracy for Streptococcus at the genus level and S. mutans at the species level (Fig. 5c and 6c). However, the data obtained using these primers did not necessarily result in more accurate outcomes for other bacterial species at both genus and species levels compared to other primers. In fact, the combination of 300 bp PE, V3 primer, and SILVA yielded better results than those obtained for other data at the genus-level analysis of mock1 with 300 bp PE. A similar tendency was observed in the analysis of mock2—the data using the V1–V3 primer and the HOMD database showed a higher analysis accuracy for Aggregatibacter and Fusobacterium at the genus level, and A. actinomycetemcomitans and F. nucleatum at the species level (Fig. 7c and 8c). However, the conditions of the V3–V4, V4, or V5–V6 primer and HOMD at the genus level and the V1–V2 primer and HOMD at the species level showed relatively higher analysis accuracy for other bacterial species. For comprehensive bacterial analysis, the use of V3–V4 or V1–V2 primers may be recommended, considering that the analysis result of V3–V4 is more accurate at genus level and has more data stored in the database because it is widely used in general, and V1–V2 is most accurate at the species level. In addition, these results implied that the accuracies of analysis for individual bacterial species were different depending on the primer, and it influenced the analysis accuracy of comprehensive bacterial microbiota. It is then recommended that primers are selected depending on the targeted bacterial species for each analysis and that attention is paid to the error of the relative abundance obtained from the 16S rRNA gene amplicon sequencing analysis.
To our knowledge, this study is the first to investigate the characteristics of bacterial findings in the supragingival dental calculus in Japanese population. The supragingival calculus in Japanese individuals with the history of periodontitis consisted of the bacterial species detected in the supra and subgingival plaque and have potential virulence, despite having done regular maintenance. The Shannon index value and the number of bacterial species detected with V1–V2 and HOMD at the species-level analysis were the highest among all the primers and databases, and these results were consistent with the results of mock2 at the species-level. The L. mirabilis and R. aeria were predominant in the supragingival calculus samples [35]. According to the species-level analysis of the Japanese subgingival plaque using the 300 bp PE, the V1–V2 primer, and HOMD database, L. mirabilis, S. sanguinis, and R. aeria were more abundant in healthy sites than in periodontitis site, while A. naeslundii was more abundant in periodontitis sites than in the healthy site, and C. matruchotii was commonly detected at both sites [36]. The relative abundances of these bacterial species were also predominant in this study. The L. mirabilis and R. aeria played a central role in healthy areas according to a metatranscriptomic and co-occurrence network analyses of Japanese subgingival plaques [37]. Therefore, the bacterial species was predominant in this study shared many species with the Japanese supragingival and subgingival plaques in previous reports, especially those present in the plaques obtained from the healthy gingival site. The potential functional gene analysis using PICRUSt2 showed that the iron complex transport system ATP-binding protein was predominant in this study. This protein is involved in the iron metabolism that plays an important role in the putative virulence factors in the red complex, which is strongly associated with the development of periodontitis [38]. Various genes related to ABC were also predominant in this study and were involved in the availability of hemin, the invasion of host gingival epithelial cells for the induction of infection of P. gingivalis, and the lipid uptake in macrophages [39]. However, further studies and information are needed about the relationship between these genes and not only P. gingivalis but also other pathogenic bacterial species and actual oral microbiota. Similarly, the details of the functions of RNA polymerase sigma-70 factor ECF subfamily, iron complex outer membrane receptor protein, sucrose-6-phosphatase, and iron complex transport system permease protein are still unknown their roles in the pathogenesis of periodontitis and their function in disease-associated bacteria such as the red complex. It has been reported that the non-mineralized biofilm gets mineralized to form dental calculus, which contains enormous amounts of oral bacteria and their DNA [40]. It could then be suggested that mineralized dental calculus could reference the genome information of the bacterial compositions in non-mineralized dental plaque, even ancient samples [41].
A limitation of this study is that the number of bacteria that constitute the mock2 is much lower than that of actual oral microbiota. However, it is difficult to imitate the oral microbiota because more than 700 species of bacteria can reside in the oral cavity, and half of these bacteria are difficult to culture. At present, the results of the mock analysis might help develop a more accurate analytic method. In the future, the development of mathematical models of oral microbiota can make possible the simulation of the natural microbiota on a computer.