Optimal 16S rRNA gene amplicon sequencing analysis for oral microbiota to avoid the potential bias introduced by trimming length, primer, and database

doi:10.21203/rs.3.rs-3139837/v1


Research Article


https://doi.org/10.21203/rs.3.rs-3139837/v1

This work is licensed under a CC BY 4.0 License

Version 1


Background. 16S rRNA gene amplicon sequencing analysis is widely used to investigate the diversity and complexity of bacterial communities in the environment. However, the bacterial composition estimated from the experimental data can differ from the original composition. Such a bias arises at several methodological stages, including the trimming length, the selected amplification region, and the reference database. The optimal conditions for minimizing this bias in oral bacterial analysis remain unknown. Therefore, this study aimed to evaluate the possible bias in 16S rRNA gene amplicon analysis using three types of bacterial DNA samples, namely the mock1 community, which comprised 15 bacteria from various environments, the mock2 community, which comprised 6 major oral bacteria, and dental calculus obtained from 5 patients, along with different trimming lengths, three databases, and nine primers targeting different hypervariable regions.

Results. Mock1 community analysis results at the genus level showed the highest similarity between the data using 300 bp paired-end (PE), primer targeting V3 region, and SILVA ribosomal RNA database (SILVA) and the theoretical value obtained from the bacterial species. Mock2 community analysis with 300 bp PE showed one of the highest similarities between the theoretical value and data using the V3–V4 region with the Human Oral Microbiome Database (HOMD) at the genus level and data using the V1–V2 region with HOMD at the species level. In the species analysis of the dental calculus samples with 300 bp PE, the Shannon index value was higher in the V1–V2 region with HOMD than that in other combinations of primers and databases. The composition of the relative bacterial abundance was more markedly influenced by the inter-individual variability in the samples than the selected amplified region and/or database.

Conclusion. The optimal conditions for analyzing oral microbiota with the least bias were determined to be a combination of 300 bp PE, the primer targeting the V1–V2 region, and the HOMD database. Notably, this is the first report of such analyses of modern Japanese dental calculus. Furthermore, the methods of this study can serve as a guide for setting appropriate sequence analysis conditions for each environment.

16S rRNA sequencing

oral microbiota

hypervariable regions

Human Oral Microbiome Database

trimming length

mock communities

primers

databases

dental calculus.

The oral cavity harbors more than 700 bacterial species [1]. Dysbiosis of this complex ecosystem, particularly in dental plaque, contributes to periodontal diseases [2–5], which affect nearly half of adults aged 30 or above in the United States [6]. Periodontitis caused by oral dysbiosis reportedly exacerbates type 2 diabetes, cardiovascular disease, preterm low birth weight, non-alcoholic fatty liver disease, obesity, and gut inflammation in vivo [7–14]. Oral bacteria affect the pathophysiology of systemic diseases through the inflammatory response caused by the oral infection or through the ectopic accumulation of oral microorganisms and/or their components in other organs of the human body. Therefore, accurately elucidating the bacterial composition is important for understanding the pathogenic factors in diseases caused by dysbiosis, such as periodontitis. 16S rRNA gene amplicon sequencing analysis is a powerful analytical method widely used for clarifying the diversity and complexity of bacterial communities [15]. It enables cost-effective and easy-to-use analysis of bacterial composition [16]. However, 16S rRNA gene amplicon analysis may introduce a bias between the true composition and the composition estimated from the data obtained for bacterial communities, and this bias is influenced by each methodological step [17].

Bias in 16S rRNA gene amplicon sequencing analysis can arise at several steps: sampling, DNA extraction method, primer selection, sequencing method, software for analysis, and database selection [16]. In a study using a mock oral community, the primers targeting the V3–V4 and V4–V5 regions reportedly yielded higher reproducibility [18]. However, in a study using mock communities and saliva samples, the primer targeting the V1–V3 region was determined to be the most accurate for the compositional evaluation of oral microbiota [19]. A previous study using Quantitative Insights Into Microbial Ecology version 2 (QIIME2) and a primer targeting the V3–V4 region demonstrated that paired-end reads can be merged when low-quality regions of the sequenced reads are trimmed and the trimming length is optimised to obtain appropriate overlapping regions; this improves the accuracy of the analysis [20, 21]. However, the recommended trimming length of the sequenced reads in QIIME2 has not been investigated for primers other than the one targeting the V3–V4 region. Abellan-Schneyder et al. (2021) [16] also reported that, among several databases, the SILVA ribosomal RNA database (SILVA) and the Ribosomal Database Project provided more accurate taxonomic classification of gut microbiota in human stool samples than Greengenes, the genomic-based 16S rRNA Database, and the All-Species Living Tree database. The Human Oral Microbiome Database (HOMD; www.homd.org), which mainly consists of data from 16S rRNA gene amplicon sequencing analysis of oral bacteria, is frequently utilized for analyzing oral microbiota [22]. The combination of the HOMD database and the V1–V2 primer was reported to be useful for analyzing oral samples in comparison with the V4 and V3–V4 primers because the V1–V2 primer better identified most streptococci at the species level [15].
Although streptococci are predominant in the oral cavity, the high diversity of oral microbiota generates controversy about the conditions most suitable for analyzing these bacteria using 16S rRNA gene sequencing. Furthermore, ensuring the accuracy of 16S rRNA gene amplicon analysis at fine taxonomic levels, including species and strains, has seemed difficult compared with shotgun metagenome or long-read sequencing analyses [23]. However, in the case of oral microbiota, the reliability of species-level results in 16S rRNA gene amplicon analysis is yet to be confirmed. Therefore, determining the optimal conditions for 16S rRNA gene amplicon sequencing analysis in the oral cavity is necessary for understanding its microbiota.

In this study, we comprehensively investigated each methodological step to determine the optimal conditions for analyzing oral microbiota by 16S rRNA gene amplicon sequencing, using two mock communities and dental calculus samples, along with stepwise trimming of the paired-end sequenced reads, nine types of primers targeting different hypervariable regions, and three reference databases (Fig. 1).

Preparation of mock communities

The overview of this experiment is displayed in Fig. 1. Two types of mock communities were used in this study. Mock1 community was DNA-Mock-001 (NBRC, Tokyo, Japan): a mixture of equivalent amounts of DNA from 15 species of bacteria obtained from various environments (https://www.nite.go.jp/data/000107974.pdf). Mock2 community was prepared in our laboratory and consisted of six bacterial species representatively associated with oral diseases (Aggregatibacter actinomycetemcomitans ATCC 43718, Fusobacterium nucleatum ATCC 25586, Parvimonas micra ATCC 33270, Porphyromonas gingivalis ATCC 33277, Prevotella intermedia ATCC 25611, and Streptococcus sanguinis ATCC 10556; Additional file 1: Table S1). The following media were used to culture the bacteria: brain heart infusion broth (Becton, Dickinson and Company, Franklin Lakes, NJ, USA) for A. actinomycetemcomitans and S. sanguinis, brain heart infusion broth supplemented with hemin (5 mg/L) and vitamin K1 (1 mg/L) for P. gingivalis and P. intermedia, and fluid universal medium [24] for F. nucleatum and P. micra. Each species was cultured anaerobically at 37°C for 16–48 h. After the growth was confirmed, the bacterial DNA was extracted using the DNeasy PowerBiofilm Kit (Qiagen, Venlo, Netherlands) following the manufacturer’s instructions. DNA purity was determined using Nanodrop One/One Microvolume UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and the concentration was determined using Quantus Fluorometer (Promega, Madison, WI, USA). The concentration of each sample was measured thrice, and an average value was calculated. Each bacterial DNA was diluted to 3 ng/µL and mixed into a single tube. The theoretical value of the bacterial abundance in the two mock communities was calculated using the formula: 16S rRNA gene copy number = [total genomic DNA (g) × unit conversion constant (bp/g) / genome size (bp)] × 16S rRNA gene copy number per genome [16]. 
The genome size and 16S rRNA gene copy number were taken from the values listed in the American Type Culture Collection or from the National Center for Biotechnology Information and European Nucleotide Archive databases. To check for contaminating DNA from bacterial species other than the cultured species in the mock2 community, the DNA of each of the six bacterial species was sequenced using the V3–V4 primer (341F/806R: CCTACGGGNGGCWGCAG/GGACTACHVGGGTWTCTAAT), and the 300 bp paired-end (PE) reads obtained on the MiSeq platform (Illumina, San Diego, CA, USA) were analyzed using QIIME2 and the HOMD database (data not shown).
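The copy-number formula given for the theoretical abundances can be illustrated with a short Python sketch. This is not the authors' script: the function names are hypothetical, the unit conversion constant is assumed to be Avogadro's number divided by an average mass of 650 g/mol per base pair (the text does not state the value used), and the two example organisms are placeholders rather than members of either mock community. Because every member is multiplied by the same constant, the resulting relative abundances do not depend on its exact value.

```python
AVOGADRO = 6.022e23            # molecules per mole
MEAN_BP_MASS = 650.0           # g/mol per base pair (assumed average)
BP_PER_GRAM = AVOGADRO / MEAN_BP_MASS  # assumed unit conversion constant (bp/g)


def copies_16s(dna_g, genome_size_bp, copies_per_genome, bp_per_gram=BP_PER_GRAM):
    """16S copies = [DNA (g) x constant (bp/g) / genome size (bp)] x copies per genome."""
    genomes = dna_g * bp_per_gram / genome_size_bp
    return genomes * copies_per_genome


def theoretical_fractions(members):
    """Convert per-member 16S copy numbers into relative abundances summing to 1."""
    copies = {name: copies_16s(*values) for name, values in members.items()}
    total = sum(copies.values())
    return {name: value / total for name, value in copies.items()}


# Hypothetical members: (DNA mass in g, genome size in bp, 16S copies per genome).
example = {
    "species_A": (3e-9, 2_000_000, 4),
    "species_B": (3e-9, 4_000_000, 4),
}
fractions = theoretical_fractions(example)
print(fractions)  # species_A contributes twice as many copies as species_B
```

With equal DNA mass and equal copies per genome, the organism with the smaller genome contributes proportionally more 16S copies, which is why equal DNA input does not imply equal theoretical read proportions.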

DNA extraction from clinical samples

This study was performed in accordance with the Ethical Guidelines for Clinical Studies (2008 Notification Number 415 of the Ministry of Health, Labor, and Welfare) and was approved by the Ethics Committee of Tokyo Medical and Dental University (D2020-031). Supragingival dental calculus samples were collected from the mandibular anterior teeth of five Japanese study participants with a history of periodontitis, before regular maintenance, using a sterile Gracey curette (Hu-Friedy, Chicago, IL, USA). Samples were stored at -20°C, and the DNA was extracted using the DNeasy PowerBiofilm Kit. DNA purity and concentration were determined in the same manner as for the mock samples.

Library preparation and Illumina sequencing of mock communities and clinical samples

The extracted DNA samples of mock1, mock2, and dental calculus were divided equally into nine aliquots for the independent amplification of nine regions of the 16S rRNA gene. The library preparations were performed for each aliquot using Ex Taq Hot Start Version (Takara, Tokyo, Japan) for the 1st and 2nd PCR reactions, AMPure XP (Beckman Coulter Inc., Brea, CA, USA) for the purification of PCR amplicons, Qubit 3.0 Fluorometer (Thermo Fisher Scientific) for the measurement of DNA quantity, and capillary electrophoresis with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) for the evaluation of PCR amplicon length. The 1st PCR conditions were 94°C for 2 min, followed by 20 cycles of 94°C for 30 sec, 50°C for 30 sec, and 72°C for 30 sec, and finally 72°C for 5 min. The 2nd PCR conditions were 94°C for 2 min, followed by 8 cycles of 94°C for 30 sec, 60°C for 30 sec, and 72°C for 30 sec, and finally 72°C for 5 min. The nine universal primers used were V1–V2 (27F/338R: AGAGTTTGATCMTGGCTCAG/GCTGCCTCCCGTAGGAGT), V1–V3 (27F/518R: AGAGTTTGATCMTGGCTCAG/ATTACCGCGGCTGCTGGG), V3 (341F/518R: CCTACGGGNGGCWGCAG/ATTACCGCGGCTGCTGG), V3–V4 (341F/806R: CCTACGGGNGGCWGCAG/GGACTACHVGGGTWTCTAAT), V3–V5 (341F/907R: CCTACGGGGAGGCAGCAG/CCGTCAATTCMTTTRAGTTT), V4 (515F/806R: GTGCCAGCMGCCGCGGTAA/GGACTACHVGGGTWTCTAAT), V4–V5 (515F/907R: GTGCCAGCMGCCGCGGTAA/CCGTCAATTCMTTTRAGTTT), V5–V6 (799F/1115R: AACMGGGATTAGATACCCKG/AGGGTTGCGCTCGTTG), and V6–V8 (968F/1401R: AACGCGAAGAACCTTAC/CGGTGTGTACAAGACCC). The targeted hypervariable regions and primer sequences were searched through Google Scholar, and those with more than 400 hits were selected. Each library was sequenced on the MiSeq platform with 250 bp and 300 bp paired ends for the mock1 community and with 300 bp paired ends for both the mock2 community and the dental calculus samples.

Analysis of the sequenced data

Sequenced data were analyzed through QIIME2 (version 2022.2) [25], and the number and quality of raw reads were summarized using QIIME2 plug-in demux. The 1st PCR primer sequences for the library were removed using QIIME2 plug-in cutadapt, and the information from the trimmed reads was confirmed using demux. The DADA2 process was also performed to cut and merge the reads at appropriate locations according to the length and quality. The appropriate combinations of forward and reverse read lengths that left the highest number of reads after non-chimeric sequence removal were investigated. In the DADA2 process, the 250 bp PE reads of mock1 were cut at 15 locations between 90 and 230 bp for a total of 2025 combinations, and the 300 bp PE reads of mock2 and dental calculus samples were cut at 20 locations between 90 and 280 bp for a total of 3600 combinations. The parameter with the highest number of non-chimeric reads was used in the subsequent analysis. For the calculation of the length of the overlapped region, the formula used was:

length of the amplified region (bp) = total length of the forward and reverse read after trimming (bp) – length of the overlap region (bp)
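The overlap formula and the grid of trimming positions can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, the 10 bp step is inferred from "15 locations between 90 and 230 bp" and "20 locations between 90 and 280 bp", and the amplified-region lengths used in the example (272 bp for V1–V2 and 453 bp for V1–V3, after primer removal) are back-calculated from the forward, reverse, and overlap values in Table 1 rather than stated in the text. The grid yields 225 and 400 forward/reverse pairs per primer; the reported totals of 2025 and 3600 equal these counts multiplied by nine, which would be consistent with counting all nine primers, although the text does not say so.

```python
from itertools import product


def overlap_length(forward_len, reverse_len, amplicon_len):
    """Rearranged formula: overlap = forward + reverse - amplified region length."""
    return forward_len + reverse_len - amplicon_len


def truncation_grid(start, stop, step=10):
    """All (forward, reverse) truncation-length pairs on an inclusive grid."""
    lengths = range(start, stop + 1, step)
    return list(product(lengths, lengths))


grid_250 = truncation_grid(90, 230)   # 15 positions per read
grid_300 = truncation_grid(90, 280)   # 20 positions per read
print(len(grid_250), len(grid_300))   # 225 400

# V1-V2, mock1, 250 bp PE (Table 1a): 180 + 170 - 272 gives the tabulated 78 bp overlap.
print(overlap_length(180, 170, 272))
# V1-V3, mock1, 250 bp PE (Table 1a): a negative value corresponds to "Less than 0".
print(overlap_length(200, 220, 453))
```

A negative result means the trimmed reads cannot span the amplified region, which matches the near-zero merged percentages reported for those rows of Table 1.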

The taxonomic classification of amplicon sequence variants (ASVs) was performed using qiime feature-classifier classify-sklearn with the SILVA, Greengenes, and HOMD databases. Silva 138 SSU Ref NR99 full-length sequences and Greengenes 13_8 99% OTUs full-length sequences were downloaded from the QIIME2 documentation (https://docs.qiime2.org/2022.2/data-resources/). The HOMD reference database used in this study was manually generated from “HOMD_16S_rRNA_RefSeq_V15.22.fasta” and “HOMD_16S_rRNA_RefSeq_V15.22.qiime.taxonomy” from the HOMD website (https://www.homd.org/download#refseq). The names of the bacteria used as ideal data were manually harmonized to attenuate the effect of naming differences among the databases, and bacteria not registered in a database, as well as inappropriate names such as “metagenome”, were treated as “unassigned” (Additional file 2: Table S2–S13). To obtain the theoretical read numbers of ASVs in the mock communities, the mean, maximum, and minimum numbers of reads were calculated for each sample, and each number was multiplied by the relative abundance of each bacterial species in the mock community. The ASV abundance table was used to calculate the Shannon index for alpha diversity analysis. For the bar graphs, the ASV abundance table was first converted to relative abundance. Next, for downstream analysis, the ASV abundance table was converted into centred log-ratio (clr) values in R version 4.2.1 (2022-06-23) with the packages compositions and zCompositions according to the CoDa_microbiome_tutorial (https://github.com/ggloor/CoDa_microbiome_tutorial). In this process, ASVs with an average of fewer than 1 read were removed from all the samples [26]. PCA was performed on the clr values, and beta diversity was visualized using the R packages “tidyverse” [27] and “ggplot2” [28].
The normality of the clr-transformed data was assessed using the Shapiro–Wilk test, the Kolmogorov–Smirnov test, and Q-Q plots with “shapiro.test”, “ks.test”, and “qqplot” in the R package “car”. Spearman’s correlation coefficient and its p-value were calculated from the clr values to analyze data similarity in each sample using the “corr.test” function in the R package “psych” [29]. Potential functional genes were analyzed using the q2-picrust plug-in (version 2021.11) based on the table and taxonomy data obtained from QIIME2 and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Additional file 3: Table S14) [30]. The relative abundances of the potential functional genes were converted to clr values and analyzed using scatter dot plots and PCA.
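For readers unfamiliar with the two transformations used throughout the results, the Shannon index and the clr transform can be sketched in plain Python. This is a simplified stand-in, not the analysis code of this study: the function names are hypothetical, the logarithm base of 2 for the Shannon index is an assumption, and zeros are handled here with a simple pseudocount, whereas the study used the zCompositions package in R for zero replacement.

```python
import math


def shannon_index(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def clr(counts, pseudocount=0.5):
    """Centred log-ratio: log of each count minus the mean log (log geometric mean).

    The pseudocount is a simplification so that zero counts can be logged.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# A perfectly even community of 15 taxa reaches the maximum index, log2(15).
even_15 = [10] * 15
print(round(shannon_index(even_15), 3))

# clr values of one sample always sum to zero (up to rounding).
sample = [5, 0, 20, 75]
print([round(v, 3) for v in clr(sample)])
```

Under the base-2 assumption, an even 15-member community gives about 3.9 and an even 6-member community about 2.6, which is the scale against which observed index values can be read.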

Analysis of optimal trimming and merging lengths for QIIME2

The mean survival rate of the sequenced reads that were filtered, merged, and denoised in QIIME2 varied depending on the sample, primer type, and trimming length of the sequenced reads (Fig. 2, Additional file 4: Table S15–S18, Additional file 5: Fig. S1). The survival rate was maximized at specific trimming lengths of the forward and reverse reads, which differed for each primer even with the same sample and sequencing length. Conversely, the survival rate decreased when the reads were trimmed less or more than these optimal lengths. The highest survival rates were 95.83% for V4 in the 250 bp PE of mock1, 94.64% for V3 in the 300 bp PE of mock1, 79.23% for V3 in the 300 bp PE of mock2, and 90.36% for V4 in the 300 bp PE of the dental calculus samples (Table 1, Additional file 4: Table S15–S18, Additional file 5: Fig. S1). The top 2 primers with the highest survival rates were V3 and V4, and the bottom 2 were V1–V3 and V3–V5 in all samples. The data from V1–V3 and V3–V5 in the 250 bp PE analysis of mock1 and V3–V5 in the 300 bp PE analysis of the dental calculus samples were excluded from further analysis because the non-chimeric read survival rates were less than 10%. Rarefaction curves confirmed that the remaining data retained sufficient reads for subsequent analyses (Additional file 6: Fig. S2). In addition, the length of the overlapped regions varied depending on the sample, primer type, and sequencing length: 68–78 bp for V1–V2, 0–57 bp for V1–V3, 38–88 bp for V3, 23 bp for V3–V4, 0–32 bp for V3–V5, 19 bp for V4, 38 bp for V4–V5, 40–60 bp for V5–V6, and 22–32 bp for V6–V8 (Table 1).

Table 1

Summary of the trimmed length for maximizing the survival rate of the non-chimeric read.
a Mock1–250 bp PE
Primer	Forward (bp)	Reverse (bp)	Overlap (bp)	Filtered (%)	Merged (%)	Non-chimeric (%)
V1-V2	180	170	78	96.17	95.08	93.09
V1-V3	200	220	Less than 0	89.33	0.03	0.03
V3	120	110	88	98.52	95.23	95.05
V3-V4	230	220	23	84.89	83.69	83.23
V3-V5	200	210	Less than 0	67.49	0.02	0.02
V4	140	130	19	98.71	97.21	95.83
V4-V5	230	160	38	86.77	84.21	81.08
V5-V6	170	150	40	98.60	96.37	93.87
V6-V8	230	190	22	94.05	91.28	86.70
b Mock1–300 bp PE
Primer	Forward (bp)	Reverse (bp)	Overlap (bp)	Filtered (%)	Merged (%)	Non-chimeric (%)
V1-V2	200	150	78	94.59	93.17	90.63
V1-V3	270	240	57	58.75	48.39	47.00
V3	90	90	38	97.91	95.11	94.64
V3-V4	260	190	23	87.30	86.05	85.10
V3-V5	280	280	32	15.86	7.43	7.21
V4	130	140	19	97.84	96.87	93.72
V4-V5	230	160	38	94.09	91.70	86.25
V5-V6	210	110	40	96.09	94.41	90.74
V6-V8	260	170	32	85.46	82.01	64.74
c Mock2–300 bp PE
Primer	Forward (bp)	Reverse (bp)	Overlap (bp)	Filtered (%)	Merged (%)	Non-chimeric (%)
V1-V2	220	120	68	84.13	80.16	71.86
V1-V3	280	230	57	30.23	28.00	25.29
V3	100	90	48	88.42	84.85	79.23
V3-V4	250	200	23	69.45	66.77	64.08
V3-V5	280	260	12	7.94	3.36	3.27
V4	160	110	19	79.54	78.22	76.07
V4-V5	240	150	38	62.86	60.42	56.45
V5-V6	200	140	60	84.39	80.72	68.91
V6-V8	260	170	32	67.63	61.56	50.24
d Dental calculus samples − 300 bp PE
Primer	Forward (bp)	Reverse (bp)	Overlap (bp)	Filtered (%)	Merged (%)	Non-chimeric (%)
V1-V2	200	150	78	86.90	83.90	78.93
V1-V3	280	210	37	13.32	4.02	3.65
V3	90	90	38	93.85	92.04	88.58
V3-V4	260	190	23	52.22	51.26	48.92
V3-V5	90	130	Less than 0	77.42	0.08	0.08
V4	140	130	19	94.87	93.92	90.36
V4-V5	220	170	38	78.40	75.64	70.51
V5-V6	200	130	50	90.67	86.88	79.04
V6-V8	250	170	22	57.82	53.43	46.04
The results are shown for the 250 bp PE of mock1 (a), the 300 bp PE of mock1 (b), the 300 bp PE of mock2 (c), and the 300 bp PE of the dental calculus sample (d).
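The ranking and the 10% exclusion rule applied to the non-chimeric survival rates can be reproduced from panel (a) of Table 1 with a few lines of Python. The sketch below is illustrative only: the function name and the dictionary layout are hypothetical, and only the mock1 250 bp PE values are included.

```python
# Non-chimeric survival rates (%) copied from Table 1a (mock1, 250 bp PE).
NON_CHIMERIC_MOCK1_250 = {
    "V1-V2": 93.09, "V1-V3": 0.03, "V3": 95.05,
    "V3-V4": 83.23, "V3-V5": 0.02, "V4": 95.83,
    "V4-V5": 81.08, "V5-V6": 93.87, "V6-V8": 86.70,
}


def split_by_survival(rates, threshold=10.0):
    """Separate primers that keep at least `threshold` percent of reads from the rest."""
    kept = {primer: rate for primer, rate in rates.items() if rate >= threshold}
    excluded = sorted(primer for primer in rates if primer not in kept)
    return kept, excluded


kept, excluded = split_by_survival(NON_CHIMERIC_MOCK1_250)
ranked = sorted(kept, key=kept.get, reverse=True)

print(ranked[:2])   # ['V4', 'V3']
print(excluded)     # ['V1-V3', 'V3-V5']
```

The two excluded primers are the ones whose trimmed reads could not overlap at 250 bp PE, so almost no reads survived merging.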

Table 2. Results of the similarity analysis using Spearman’s correlation coefficient.
a Mock1–250 bp PE - Genus							b Mock1–250 bp PE - Species
Database	SILVA		Greengenes		HOMD		Database	SILVA
Number of registered bacteria	15/15		15/15		13/15		Number of registered bacteria	15/15
Value	r	p	r	p	r	p	Value	r	p
ideal	1.000	**	1.000	**	1.000	**	ideal	1.000	**
V1-V2	0.881	**	0.497	-	0.558	**	V1-V2	-0.188	-
V3	0.880	**	0.362	-	0.792	**	V3	-0.162	-
V3-V4	0.744	**	0.463	-	0.723	**	V3-V4	0.146	-
V4	0.580	*	0.282	-	0.702	**	V4	-0.024	-
V4-V5	0.585	*	0.278	-	0.766	**	V4-V5	-0.024	-
V5-V6	0.491	-	0.268	-	0.595	**	V5-V6	-0.264	-
V6-V8	0.552	*	0.444	-	0.404	-	V6-V8	-0.008	-
c Mock1–300 bp PE - Genus							d Mock1–300 bp PE - Species
Database	SILVA		Greengenes		HOMD		Database	SILVA
Number of registered bacteria	15/15		15/15		13/15		Number of registered bacteria	15/15
Value	r	p	r	p	r	p	Value	r	p
ideal	1.000	**	1.000	**	1.000	**	ideal	1.000	**
V1-V2	0.881	**	0.497	-	0.602	**	V1-V2	0.083	-
V1-V3	0.708	**	0.332	-	0.754	**	V1-V3	0.179	-
V3	0.888	**	0.383	-	0.802	**	V3	0.094	-
V3-V4	0.862	**	0.607	*	0.805	**	V3-V4	0.221	-
V4	0.612	*	0.329	-	0.690	**	V4	0.063	-
V4-V5	0.609	*	0.304	-	0.800	**	V4-V5	0.240	-
V5-V6	0.482	-	0.268	-	0.617	**	V5-V6	-0.028	-
V6-V8	0.576	*	0.472	-	0.459	*	V6-V8	0.233	-
e Mock2–300 bp PE - Genus							f Mock2–300 bp PE - Species
Database	SILVA		Greengenes		HOMD		Database	SILVA		HOMD
Number of registered bacteria	6/6		6/6		6/6		Number of registered bacteria	6/6		6/6
Value	r	p	r	p	r	p	Value	r	p	r	p
ideal	1.000	**	1.000	**	1.000	**	ideal	1.000	**	1.000	**
V1-V2	0.835	**	0.922	**	0.893	**	V1-V2	0.488	-	0.810	*
V1-V3	0.809	**	0.833	*	0.750	-	V1-V3	0.333	-	0.619	-
V3	0.782	**	0.263	-	0.893	**	V3	0.056	-	-0.411	-
V3-V4	0.822	**	> 0.999	**	> 0.999	**	V3-V4	-0.103	-	-0.195	-
V4	0.911	**	> 0.999	**	> 0.999	**	V4	-0.103	-	-0.195	-
V4-V5	0.835	**	0.929	**	0.893	**	V4-V5	0.056	-	-0.195	-
V5-V6	0.931	**	0.335	-	> 0.999	**	V5-V6	0.056	-	-0.195	-
V6-V8	0.828	**	0.976	**	0.964	**	V6-V8	0.056	-	-0.195	-
The correlation coefficients and p-values are shown for the 250 bp PE of mock1 at the genus level (a) and the species level (b), the 300 bp PE of mock1 at the genus level (c) and the species level (d), and the 300 bp PE of mock2 at the genus level (e) and the species level (f).

Diversity, phylogenetic, and similarity analysis of the 250 bp PE of mock1

First, the genus-level results for mock1 with the 250 bp PE sequencing length were evaluated (Additional file 2: Table S2–S4). All 15 genera in mock1 were registered in SILVA and Greengenes, and 13 of these 15 genera were included in HOMD. The Shannon index values of the samples ranged from 3.6 to 3.8 for SILVA, 3.4 to 3.6 for Greengenes, and 3.5 to 3.8 for HOMD; the values based on SILVA were slightly higher than those based on Greengenes and HOMD (Fig. 3a). In addition, the relative abundance ratios differed for each primer set and database (Fig. 3b). The relative abundance of Streptococcus, which is generally predominant in the oral microbiota, obtained using the V4 primer was relatively accurate for each database, within ± 5% error of the ideal value. However, the Streptococcus data obtained using the V4–V5 primer were relatively inaccurate for each database, with over ± 25% error (Fig. 3c). In addition, the results for Lactobacillus from the V1–V2 and V4 primers were also within the ± 5% error range for each database (Fig. 3c). The principal component analysis (PCA) showed that the variation between theoretical and sample values differed among primer sets and databases (Fig. 3d). The bacterial composition obtained using the combination of the V1–V2 primer and SILVA database closely approached the theoretical value (Spearman’s r 0.881, p < 0.01; Table 2a). Among the 15 species, 15, 10, and 7 species were registered in SILVA, Greengenes, and HOMD, respectively. Therefore, the species-level analysis was performed using only SILVA. The Shannon index values at the species level ranged from 0.8 to 2.1 for SILVA; these values were lower than the theoretical value (Fig. 4a). The proportion of data classified as “unassigned” was larger at the species level than at the genus level and accounted for a large percentage in almost all the analyzed data (Fig. 4b).
The relative abundance of Streptococcus mutans, which is mainly related to dental caries, obtained using the V4 primer was relatively accurate, within ± 5% error of the ideal value. Conversely, the data obtained using the V4–V5 primer for each database were relatively inaccurate, with over ± 25% error (Fig. 4c). PCA showed that the variation between theoretical and observed values at the species level exceeded that at the genus level for each primer set (Fig. 4d and Table 2a and 2b). The data obtained using the combination of the V3–V4 primer and SILVA database came closest to the theoretical value among the combinations, even though the correlation coefficient was close to 0 (Spearman’s r 0.146, p > 0.05; Table 2b).
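The similarity scores in Table 2 are Spearman rank correlations between theoretical and observed clr values. The study computed them with the “corr.test” function of the R package “psych”; the dependency-free Python sketch below only illustrates the statistic itself (the correlation of average ranks) and does not compute p-values. The function names and the example vectors are hypothetical and are not data from this study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical clr values for four taxa (illustration only).
theoretical = [0.8, 0.1, -0.3, -0.6]
observed = [0.5, -0.1, 0.2, -0.6]
rho = spearman(theoretical, observed)
print(round(rho, 3))  # 0.8
```

Because the statistic depends only on rank order, two taxa swapping places lowers the score regardless of how large the abundance difference is, which is worth keeping in mind when reading coefficients for communities with few members.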

Diversity, phylogenetic, and similarity analysis of the 300 bp PE of mock1

The genus-level analysis of mock1 using the 300 bp PE sequencing length yielded Shannon index values of approximately 3.5–3.8 for SILVA, 3.3–3.6 for Greengenes, and 3.3–3.8 for HOMD, with relatively higher values for SILVA and HOMD than for Greengenes (Fig. 5a). A discrepancy in the bacterial composition was observed between the theoretical values and those obtained with each primer (Fig. 5b, Additional file 2: Table S5–S7). The relative abundance of Streptococcus obtained using the V3–V4, V4, and V6–V8 primers was within ± 5% error of the ideal value for each database. However, the data obtained using the V1–V3 and V4–V5 primers were relatively inaccurate for each database, with over ± 25% error (Fig. 5c). The relative abundance of Parabacteroides, which is mainly related to the gut microbiome, obtained using the V3–V4, V4, V4–V5, V5–V6, and V6–V8 primers with the SILVA and Greengenes databases was relatively accurate, within ± 5% error of the ideal value. However, the results obtained using the V1–V2 and V1–V3 primers with the SILVA and Greengenes databases were relatively inaccurate, with over ± 25% error (Fig. 5c). The PCA plot showed variability depending on the database (Fig. 5d). However, even within the same database, the results varied among the primers. The data obtained using the combination of the V3 primer and SILVA database closely approached the theoretical value (Spearman’s r 0.888, p < 0.01; Table 2c). At the species level, the Shannon index values of each primer ranged from 0.8 to 2.2 for SILVA, which were relatively lower than the theoretical value (Fig. 6a). The bar plot at the species level showed a more marked deviation from the theoretical values than that observed at the genus level (Fig. 6b). Additionally, the proportion of “unassigned” data increased at the species level compared to the genus level.
The relative abundance of S. mutans obtained using the V3–V4, V4, and V6–V8 primers was relatively accurate for each database, within ± 5% error of the ideal value. In contrast, the data obtained using the V1–V3 primer for each database and the V4–V5 primer for HOMD were relatively inaccurate, with over ± 25% error (Fig. 6c). The distances between the theoretical and observed values were greater at the species level than at the genus level (Fig. 6d and Table 2c and 2d). The data obtained using the combination of the V6–V8 primer and SILVA database came closest to the theoretical value among the combinations, even though the correlation coefficient was low (Spearman’s r 0.233, p > 0.05; Table 2d).

Diversity, phylogenetic, and similarity analysis of the 300 bp PE of mock2

The genus-level results for mock2 with the 300 bp PE sequencing length were analyzed (Additional file 2: Table S8–S10). Mock2 was composed of six bacteria associated with the oral microbiome, and all six were registered in SILVA, Greengenes, and HOMD. The Shannon index values ranged from 2.4 to 2.6 for SILVA, Greengenes, and HOMD alike (Fig. 7a). Although all six species were detected in all samples, their relative abundance ratios differed from the theoretical composition at the genus level (Fig. 7b). The relative abundances of Parvimonas, Porphyromonas, Prevotella, and Streptococcus obtained using the V1–V2 primer were relatively accurate for each database, within ± 5% error of the ideal values. For Aggregatibacter and Fusobacterium, the results obtained using the V1–V3 primer were relatively accurate for each database, within ± 5% error (Fig. 7c). Even within the same database, the variation between theoretical and observed values differed depending on the primers used (Fig. 7d). The data obtained using the V3–V4 and V4 primers with Greengenes and the V3–V4, V4, and V5–V6 primers with HOMD approached the theoretical value (Spearman’s r > 0.999, p < 0.01; Table 2e). All six species were registered at the species level in SILVA and HOMD. Therefore, the species-level analyses were conducted with SILVA and HOMD; Greengenes was excluded because not all of the bacterial species could be assigned. The Shannon index values ranged from 1.5 to 2.7 for SILVA and from 1.7 to 2.6 for HOMD, with HOMD showing relatively similar values (Fig. 8a). All six species were detected only with V1–V3 for SILVA and with V1–V2 and V1–V3 for HOMD; only some of the species were detected under the other conditions (Fig. 8b). The relative abundances of P. micra, P. gingivalis, P. intermedia, and S. sanguinis obtained using the V1–V2 primer and HOMD were relatively accurate, within ± 5% error of the ideal values.
The data for Aggregatibacter actinomycetemcomitans and F. nucleatum obtained using the V1–V3 primer and HOMD were relatively accurate, within ± 5% error (Fig. 8c). The distances between the theoretical and observed values were greater for all data at the species level than at the genus level (Fig. 8d and Table 2e and 2f). The data obtained using the combination of the V1–V2 primer and HOMD database closely approached the theoretical value (Spearman’s r 0.810, p < 0.05; Table 2f).

Diversity and phylogenetics of the 300 bp PE of the dental calculus samples

Supragingival dental calculus was collected from patients undergoing maintenance and analyzed using 300 bp PE sequencing (Additional file 2: Table S11–S13). At the genus level, the Shannon index values of the samples were approximately 4.2–4.3 for SILVA, 3.7–4.1 for Greengenes, and 4.4–4.5 for HOMD (Fig. 9a). The bar plots indicated a relatively similar number of bacterial genera across conditions; however, the combination of database and primer influenced the relative bacterial composition (Fig. 9b). The top 10 bacterial genera based on their centered log-ratio (clr) values in the dental calculus samples using 300 bp PE and HOMD were common between the primers targeting the V1–V2 and V3–V4 regions, although their order differed (Fig. 9c and 9d). The top 10 genera based on their clr values, using the V1–V2 primer and the HOMD database, were Streptococcus (6.24), Actinomyces (5.51), Capnocytophaga (5.35), Prevotella (5.31), Lautropia (5.19), Neisseria (5.16), Ottowia (5.11), Fusobacterium (5.05), Leptotrichia (4.82), and Corynebacterium (4.69) (Fig. 9c). The variation among the PCA points of each sample differed across databases, and the characteristics of the sample itself had a greater influence on how tightly the points clustered than did the primers (Fig. 9e). In the species-level analysis, the Shannon index values ranged from 1.7 to 4.0 for SILVA and from 4.0 to 5.3 for HOMD, with relatively higher values for HOMD (Fig. 10a). Across both databases, the maximum Shannon index was obtained with the combination of the V1–V2 primer and HOMD. Some of the top 10 bacterial species based on their clr values in the dental calculus samples using 300 bp PE and HOMD, such as Lautropia mirabilis, were common between the primers targeting the V1–V2 and V3–V4 regions, although their order differed (Fig. 10b and 10c). The top 10 species based on their clr values, using the V1–V2 primer and the HOMD database, were L. mirabilis (7.09), S. sanguinis (7.08), Ottowia sp. HMT 894 (7.01), Rothia aeria (5.84), Actinomyces naeslundii (5.66), Capnocytophaga leadbetteri (5.63), Corynebacterium durum (5.49), Bacteroidales [G-2] bacterium HMT 274 (5.47), Arachnia propionica (5.45), and Corynebacterium matruchotii (5.40) (Fig. 10b). The attributes of the sample itself had a greater influence on how tightly the PCA points clustered than did the primers (Fig. 10d). In contrast, in the PCA based on the potential functional genes, the influence of the sample itself was smaller than in the phylogenetic analysis, and the points formed a single cluster except for the data obtained using the V3 primer (Fig. 10e). The top 10 potential pathways based on clr values, using the V1–V2 primer and the HOMD database, were ATP-binding cassette (ABC)-2 type transport system permease protein (6.17), ABC-2 type transport system ATP-binding protein (6.17), ABC, subfamily B bacterial (6.14), putative ABC transport system permease protein (5.95), RNA polymerase sigma-70 factor, ECF subfamily (5.88), putative ABC transport system ATP-binding protein (5.86), iron complex outer membrane receptor protein (5.84), sucrose-6-phosphatase [EC:3.1.3.24] (5.82), iron complex transport system permease protein (5.66), and iron complex transport system ATP-binding protein [EC:7.2.2.-] (5.5) (Fig. 10f, Additional file 3: Table S14).
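
For readers less familiar with the two summary statistics reported above, the following minimal Python sketch shows how a Shannon index and clr values can be computed from a vector of read counts. It is an illustration only, not the pipeline used in this study: the base-2 logarithm, the pseudocount of 0.5 used to handle zero counts, and the function names are assumptions made for this example.

```python
import math


def shannon_index(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with nonzero counts.

    The logarithm base is an assumption of this sketch (base 2 here).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p) / math.log(base)
    return h


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean of all logs.

    Subtracting the mean log is the same as dividing by the geometric mean.
    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    toy_counts = [120, 80, 40, 0, 10]  # made-up read counts for five taxa
    print(round(shannon_index(toy_counts), 3))
    print([round(v, 2) for v in clr(toy_counts)])
```

Because clr values are centered on the geometric mean of a sample, they sum to zero within each sample, and a larger value indicates a taxon that is more abundant relative to the other taxa in that sample.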

Discussion

In this study, our findings showed that the optimal conditions for analyzing oral microbiota using 16S rRNA gene amplicon sequencing are a combination of 300 bp PE, the primer targeting the V1–V2 regions, the stepwise trimming method, and the HOMD database (Table 1, Fig. 2, Additional file 5: Fig. S1, Additional file 4: Table S15–S18). Proper merging of paired-end reads by DADA2 requires that 1) the length of the amplified region is shorter than the combined length of the paired-end reads, 2) the forward read is trimmed at the position where the 25th percentile of the sequencing quality is close to 20, and 3) the length of the reverse read is adjusted so that the minimum overlap length is at least 16 bp [21]. In the case of the V1–V2 primer, the length of the amplified region is 311 bp, and the length required for the overlap is 16 bp; therefore, the minimum length required should be at least 343 bp. The minimum required lengths for the nine primers are as follows: 343 bp for V1–V2, 523 bp for V1–V3, 209 bp for V3, 497 bp for V3–V4, 598 bp for V3–V5, 323 bp for V4, 424 bp for V4–V5, 348 bp for V5–V6, and 465 bp for V6–V8. Since the maximum combined read length for 250 bp PE is 500 bp, the reads of V1–V3 (523 bp) and V3–V5 (598 bp) could not be merged. In contrast, the maximum combined read length for 300 bp PE is 600 bp; therefore, it is theoretically possible to merge paired-end reads for all nine sets of primers. However, primers requiring most of the read length, such as V3–V5 (598 bp), cannot retain a sufficient overlap once the low-quality regions are trimmed, which reduces the non-chimeric survival rate of the paired-end reads. From that perspective, the survival rate of non-chimeric reads seems to be related to the length of the amplified region. Shorter regions, such as V3 and V4, showed higher rates of non-chimeric reads, whereas longer regions, such as V1–V3, showed lower rates. On the other hand, the accuracy of the analysis did not necessarily improve in proportion to the survival rate of the non-chimeric reads in this study. This suggests that, to improve the accuracy of phylogenetic analysis, it is necessary to select primers that are appropriate for the sample type. Nevertheless, once an appropriate primer is selected, the trimming length that maximizes the survival rate of the non-chimeric reads should be determined so that the obtained sequence reads retain enough information for analyzing the samples. Unfortunately, the current versions of QIIME2 and DADA2 cannot automatically search for trimming lengths that maximize the survival of non-chimeric reads [25]. Hence, the results of this study suggest that it is essential to perform the stepwise trimming method to search for the appropriate length.
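
The length argument above can be checked with a short Python sketch. It takes the minimum required lengths exactly as listed in the text and compares them with the combined length of the forward and reverse reads that remain after trimming. The function name, the ASCII primer labels, and the trimmed lengths in the usage example (280 bp and 260 bp) are hypothetical and serve only as an illustration.

```python
# Minimum combined read length (bp) needed to merge each amplicon,
# copied from the values stated in the text.
REQUIRED_BP = {
    "V1-V2": 343,
    "V1-V3": 523,
    "V3": 209,
    "V3-V4": 497,
    "V3-V5": 598,
    "V4": 323,
    "V4-V5": 424,
    "V5-V6": 348,
    "V6-V8": 465,
}


def unmergeable_primers(forward_len, reverse_len, required=REQUIRED_BP):
    """Return primers whose required length exceeds the combined read length.

    forward_len and reverse_len are the read lengths kept after trimming.
    """
    combined = forward_len + reverse_len
    return [primer for primer, need in required.items() if need > combined]


if __name__ == "__main__":
    print(unmergeable_primers(250, 250))  # untrimmed 250 bp PE
    print(unmergeable_primers(300, 300))  # untrimmed 300 bp PE
    print(unmergeable_primers(280, 260))  # hypothetical trimmed 300 bp PE reads
```

With untrimmed 250 bp PE reads the sketch reports V1–V3 and V3–V5 as unmergeable, and with untrimmed 300 bp PE reads it reports none, in line with the text. The hypothetical trimmed case shows why V3–V5 is fragile: removing only a few dozen low-quality bases already drops the combined length below 598 bp.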

The optimal conditions of sequencing length, primer type, and database in 16S rRNA gene amplicon analysis differed between the mock1 and mock2 communities. In the mock1 analysis, the combination of 300 bp PE, the primer targeting the V3 region, and the SILVA database gave the most accurate results. However, the reliability of the analysis was limited to the genus level (Table 2a–2d). In the mock2 analysis, the combination of 300 bp PE, the primers targeting the V3–V4, V4, and V5–V6 regions, and the HOMD database gave the most accurate results at the genus level. Furthermore, the primer targeting the V1–V2 regions showed reliable results at the species level (Table 2e and 2f). Our results suggest that 300 bp PE is superior to 250 bp PE for the analysis of bacterial composition in the oral microbiome. A previous study reported that data generated using the 300 bp PE protocol have a longer overlap region, rendering the approach superior to 250 bp PE for generating highly reliable paired-end assemblies [31]. Regarding the database, all bacterial genera in mock1 were registered in the SILVA and Greengenes databases, and SILVA showed more accurate results than Greengenes.

Conversely, all bacterial species in mock1 were registered only in SILVA; however, not all primers showed accurate results. For the mock2 analysis, all bacterial taxa were registered in the SILVA and HOMD databases at both the genus and species levels, and HOMD showed more accurate results than SILVA. SILVA could be an optimal database for the genus-level analysis of specimens obtained from various environments, such as mock1, while HOMD could be optimal for the genus- and species-level analysis of specimens from oral microbiomes. Furthermore, for both mock1 and mock2, the deviation of the results from the theoretical values was larger at the species level than at the genus level. Analyses using partial 16S rRNA gene sequences have suggested that short reads are less suitable than full-length reads for accurate richness estimation and taxonomic assignment [32, 33]. Long-read sequencing provides good resolution for bacterial identification [34]. In this study, in the species-level analysis of mock1 with 300 bp PE, even the highest Spearman correlation coefficient was low (0.115). However, in the species-level analysis of mock2 with 300 bp PE, the coefficient relative to the theoretical value was high (0.893) when the primer targeting the V1–V2 regions and HOMD were used. A possible explanation is that the number of bacterial species in mock2 was smaller than that in mock1. Another possible explanation is that, since HOMD is composed of oral-specific bacteria, it may be more suitable for mock2. Therefore, when partial 16S rRNA gene amplicon analysis is performed for oral microbiota at the species level, the conditions of 300 bp PE, the primer targeting the V1–V2 regions, the stepwise trimming method, and HOMD are recommended.
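
As an illustration of how similarity to a theoretical composition can be scored, the sketch below computes a Spearman rank correlation coefficient in plain Python, using average ranks for ties. It is not the implementation used in this study; the function names and the abundance values in the usage example are made up for this example.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one vector is constant")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    theoretical = [0.40, 0.30, 0.20, 0.10]  # made-up expected proportions
    observed = [0.35, 0.15, 0.30, 0.20]     # made-up measured proportions
    print(round(spearman(theoretical, observed), 3))
```

Because the coefficient depends only on rank order, it is sensitive to the number of taxa being ranked and is undefined when one of the two vectors is constant, which is why the sketch raises an error in that case.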

In addition, the selection of primer type influenced the accuracy not only for the whole bacterial community but also for individual bacterial species. Focusing on individual bacteria in the analysis of mock1 with 300 bp PE, the data obtained using the V3–V4, V4, and V6–V8 primers showed relatively higher accuracy for Streptococcus at the genus level and S. mutans at the species level (Fig. 5c and 6c). However, compared with other primers, these primers did not necessarily yield more accurate results for other bacteria at either the genus or species level. In fact, the combination of 300 bp PE, the V3 primer, and SILVA yielded better results than the other conditions in the genus-level analysis of mock1 with 300 bp PE. A similar tendency was observed in the analysis of mock2: the data obtained using the V1–V3 primer and the HOMD database showed higher accuracy for Aggregatibacter and Fusobacterium at the genus level, and for A. actinomycetemcomitans and F. nucleatum at the species level (Fig. 7c and 8c). However, the V3–V4, V4, or V5–V6 primer with HOMD at the genus level and the V1–V2 primer with HOMD at the species level showed relatively higher accuracy for other bacterial species. For comprehensive bacterial analysis, the V3–V4 or V1–V2 primers may be recommended: V3–V4 gives more accurate results at the genus level and, because it is widely used, has more data stored in the database, whereas V1–V2 is the most accurate at the species level. In addition, these results imply that the accuracy for individual bacterial species differs depending on the primer, which in turn influences the accuracy of the analysis of the whole bacterial microbiota. It is therefore recommended that primers be selected according to the bacterial species targeted in each analysis and that attention be paid to the error in the relative abundance obtained from 16S rRNA gene amplicon sequencing analysis.

To our knowledge, this study is the first to investigate the characteristics of the bacteria found in supragingival dental calculus in a Japanese population. The supragingival calculus of Japanese individuals with a history of periodontitis consisted of bacterial species that are detected in supragingival and subgingival plaque and that have potential virulence, even though the individuals had undergone regular maintenance. The Shannon index value and the number of bacterial species detected with V1–V2 and HOMD in the species-level analysis were the highest among all the primers and databases, and these results were consistent with the species-level results of mock2. L. mirabilis and R. aeria were predominant in the supragingival calculus samples [35]. According to a species-level analysis of Japanese subgingival plaque using 300 bp PE, the V1–V2 primer, and the HOMD database, L. mirabilis, S. sanguinis, and R. aeria were more abundant in healthy sites than in periodontitis sites, while A. naeslundii was more abundant in periodontitis sites than in healthy sites, and C. matruchotii was commonly detected at both sites [36]. These bacterial species were also predominant in this study. L. mirabilis and R. aeria played a central role in healthy sites according to metatranscriptomic and co-occurrence network analyses of Japanese subgingival plaques [37]. Therefore, the bacterial species that were predominant in this study overlapped considerably with those reported in Japanese supragingival and subgingival plaques, especially those in plaques obtained from healthy gingival sites. The potential functional gene analysis using PICRUSt2 showed that the iron complex transport system ATP-binding protein was predominant in this study. This protein is involved in iron metabolism, which plays an important role in the putative virulence factors of the red complex, which is strongly associated with the development of periodontitis [38]. Various genes related to ABC were also predominant in this study; such genes are involved in the availability of hemin, the invasion of host gingival epithelial cells during P. gingivalis infection, and lipid uptake in macrophages [39]. However, further studies are needed on the relationship between these genes and not only P. gingivalis but also other pathogenic bacterial species and the actual oral microbiota. Similarly, the roles of RNA polymerase sigma-70 factor ECF subfamily, iron complex outer membrane receptor protein, sucrose-6-phosphatase, and iron complex transport system permease protein in the pathogenesis of periodontitis, and their functions in disease-associated bacteria such as the red complex, remain unknown. It has been reported that non-mineralized biofilm becomes mineralized to form dental calculus, which contains enormous amounts of oral bacteria and their DNA [40]. This suggests that mineralized dental calculus could provide genome information reflecting the bacterial composition of non-mineralized dental plaque, even in ancient samples [41].

A limitation of this study is that the number of bacterial species constituting mock2 is much lower than that of the actual oral microbiota. However, it is difficult to imitate the oral microbiota because more than 700 species of bacteria can reside in the oral cavity, and half of these bacteria are difficult to culture. At present, the results of the mock analysis might help develop a more accurate analytic method. In the future, the development of mathematical models of the oral microbiota may make it possible to simulate the natural microbiota on a computer.

Conclusion

The results of this study suggest that the stepwise trimming method, 300 bp PE, the V1–V2 primer, and the HOMD database should be combined when analyzing oral microbiota by 16S rRNA gene amplicon analysis. Additionally, this is the first comprehensive bacterial analysis of supragingival dental calculus in Japanese subjects. The methods of this study will serve as a guideline for setting appropriate sequencing conditions not only for oral microbiome analysis but also for analyses of other environments.

Abbreviations

ASVs, Amplicon sequence variants; HOMD, Human Oral Microbiome Database; PCA, Principal component analysis; PE, Paired-end.

Ethics approval and Consent to participate

Consent for publication

Not applicable.

Availability of data and materials

The data sets generated for this study can be found in DDBJ under the following accession numbers for DNA sequencing: DRA016570 and DRA016571.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by the Japan Society for the Promotion of Science (grant numbers 22K21053 to T. Nemoto, 21K16987 to T. S., and 20K09934 to Y. T.) and the Japan Science and Technology Agency, Support for Pioneering Research Initiated by the Next Generation (grant number JPMJSP2120 to T. Nagai).

Authors’ contributions

T. Nagai, T. S., and Y. T. performed the experiments, processed the sequence data, and wrote the first draft of the manuscript. K. K., T. W., T. Nemoto, S. Maekawa, R. K., S. Matsumura, Y. O., and S. K. assisted with the experiments and reviewed the manuscript. T. S., Y. T., and T. I. supervised the analyses and wrote the manuscript. All authors read and approved the final version of the manuscript.

Acknowledgments

We thank Hugo Song from HUGO LS (https://www.hugols.com) and Data Science Center for contracting Supercomputing resources provided by the Human Genome Center at the Institute of Medical Science (University of Tokyo; http://sc.hgc.jp/shirokane.html).

Authors’ information

Corresponding authors

Correspondence to Takahiko Shiba or Yasuo Takeuchi.

Kilian M, Chapple IL, Hannig M, Marsh PD, Meuric V, Pedersen AM, Tonetti MS, Wade WG, Zaura E: The oral microbiome - an update for oral healthcare professionals. Br Dent J 2016, 221(10):657-666.
Bakaletz LO: Developing animal models for polymicrobial diseases. Nat Rev Microbiol 2004, 2(7):552-568.
Costalonga M, Herzberg MC: The oral microbiome and the immunobiology of periodontal disease and caries. Immunol Lett 2014, 162(2 Pt A):22-38.
Shiba T, Watanabe T, Kachi H, Koyanagi T, Maruyama N, Murase K, Takeuchi Y, Maruyama F, Izumi Y, Nakagawa I: Distinct interacting core taxa in co-occurrence networks enable discrimination of polymicrobial oral diseases with similar symptoms. Sci Rep 2016, 6:30997.
Komatsu K, Shiba T, Takeuchi Y, Watanabe T, Koyanagi T, Nemoto T, Shimogishi M, Shibasaki M, Katagiri S, Kasugai S et al: Discriminating Microbial Community Structure Between Peri-Implantitis and Periodontitis With Integrated Metagenomic, Metatranscriptomic, and Network Analysis. Front Cell Infect Microbiol 2020, 10:596490.
Eke PI, Dye BA, Wei L, Slade GD, Thornton-Evans GO, Borgnakke WS, Taylor GW, Page RC, Beck JD, Genco RJ: Update on Prevalence of Periodontitis in Adults in the United States: NHANES 2009 to 2012. J Periodontol 2015, 86(5):611-622.
Komazaki R, Katagiri S, Takahashi H, Maekawa S, Shiba T, Takeuchi Y, Kitajima Y, Ohtsu A, Udagawa S, Sasaki N et al: Periodontal pathogenic bacteria, Aggregatibacter actinomycetemcomitans affect non-alcoholic fatty liver disease by altering gut microbiota and glucose metabolism. Sci Rep 2017, 7(1):13950.
Figuero E, Han YW, Furuichi Y: Periodontal diseases and adverse pregnancy outcomes: Mechanisms. Periodontol 2000 2020, 83(1):175-188.
Genco RJ, Borgnakke WS: Diabetes as a potential risk for periodontitis: association studies. Periodontol 2000 2020, 83(1):40-45.
Orlandi M, Graziani F, D'Aiuto F: Periodontal therapy and cardiovascular risk. Periodontol 2000 2020, 83(1):107-124.
Polak D, Sanui T, Nishimura F, Shapira L: Diabetes as a risk factor for periodontal disease-plausible mechanisms. Periodontol 2000 2020, 83(1):46-58.
Schenkein HA, Papapanou PN, Genco R, Sanz M: Mechanisms underlying the association between periodontitis and atherosclerotic disease. Periodontol 2000 2020, 83(1):90-106.
Hatasa M, Ohsugi Y, Katagiri S, Yoshida S, Niimi H, Morita K, Tsuchiya Y, Shimohira T, Sasaki N, Maekawa S et al: Endotoxemia by Porphyromonas gingivalis Alters Endocrine Functions in Brown Adipose Tissue. Front Cell Infect Microbiol 2020, 10:580577.
Kitamoto S, Nagao-Kitamoto H, Jiao Y, Gillilland MG, 3rd, Hayashi A, Imai J, Sugihara K, Miyoshi M, Brazil JC, Kuffa P et al: The Intermucosal Connection between the Mouth and Gut in Commensal Pathobiont-Driven Colitis. Cell 2020, 182(2):447-462 e414.
Wade WG, Prosdocimi EM: Profiling of Oral Bacterial Communities. J Dent Res 2020, 99(6):621-629.
Abellan-Schneyder I, Matchado MS, Reitmeier S, Sommer A, Sewald Z, Baumbach J, List M, Neuhaus K: Primer, Pipelines, Parameters: Issues in 16S rRNA Gene Sequencing. mSphere 2021, 6(1).
Pollock J, Glendinning L, Wisedchanwet T, Watson M: The Madness of Microbiome: Attempting To Find Consensus "Best Practice" for 16S Microbiome Studies. Appl Environ Microbiol 2018, 84(7).
Teng F, Darveekaran Nair SS, Zhu P, Li S, Huang S, Li X, Xu J, Yang F: Impact of DNA extraction method and targeted 16S-rRNA hypervariable region on oral microbiota profiling. Sci Rep 2018, 8(1):16321.
Soriano-Lerma A, Perez-Carrasco V, Sanchez-Maranon M, Ortiz-Gonzalez M, Sanchez-Martin V, Gijon J, Navarro-Mari JM, Garcia-Salcedo JA, Soriano M: Influence of 16S rRNA target region on the outcome of microbiome studies in soil and saliva samples. Sci Rep 2020, 10(1):13637.
Mohsen A, Park J, Chen YA, Kawashima H, Mizuguchi K: Impact of quality trimming on the efficiency of reads joining and diversity analysis of Illumina paired-end reads in the context of QIIME1 and QIIME2 microbiome analysis frameworks. BMC Bioinformatics 2019, 20(1):581.
Seo-Young L, Yeuni Y, Jin C, Hee Sam N: Trimming conditions for DADA2 analysis in QIIME2 platform. International Journal of Oral Biology 2021, 46(3):146-153.
Dewhirst FE, Chen T, Izard J, Paster BJ, Tanner AC, Yu WH, Lakshmanan A, Wade WG: The human oral microbiome. J Bacteriol 2010, 192(19):5002-5017.
Johnson JS, Spakowicz DJ, Hong BY, Petersen LM, Demkowicz P, Chen L, Leopold SR, Hanson BM, Agresta HO, Gerstein M et al: Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat Commun 2019, 10(1):5029.
Gmur R, Guggenheim B: Antigenic heterogeneity of Bacteroides intermedius as recognized by monoclonal antibodies. Infect Immun 1983, 42(2):459-470.
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F et al: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 2019, 37(8):852-857.
Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ: Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol 2017, 8:2224.
Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J et al: Welcome to the Tidyverse. Journal of Open Source Software 2019, 4(43).
Wickham H: Programming with ggplot2. In: ggplot2: Elegant Graphics for Data Analysis. Edited by Wickham H. Cham: Springer International Publishing; 2016: 241-253.
Revelle W: psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois 2017.
Douglas GM, Maffei VJ, Zaneveld JR, Yurgel SN, Brown JR, Taylor CM, Huttenhower C, Langille MGI: PICRUSt2 for prediction of metagenome functions. Nat Biotechnol 2020, 38(6):685-688.
Fadrosh DW, Ma B, Gajer P, Sengamalay N, Ott S, Brotman RM, Ravel J: An improved dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina MiSeq platform. Microbiome 2014, 2(1):6.
Yarza P, Yilmaz P, Pruesse E, Glockner FO, Ludwig W, Schleifer KH, Whitman WB, Euzeby J, Amann R, Rossello-Mora R: Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 2014, 12(9):635-645.
Jeong J, Yun K, Mun S, Chung WH, Choi SY, Nam YD, Lim MY, Hong CP, Park C, Ahn YJ et al: The effect of taxonomic classification by full-length 16S rRNA sequencing with a synthetic long-read technology. Sci Rep 2021, 11(1):1727.
Matsuo Y, Komiya S, Yasumizu Y, Yasuoka Y, Mizushima K, Takagi T, Kryukov K, Fukuda A, Morimoto Y, Naito Y et al: Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION nanopore sequencing confers species-level resolution. BMC Microbiol 2021, 21(1):35.
Kado I, Hisatsune J, Tsuruda K, Tanimoto K, Sugai M: The impact of fixed orthodontic appliances on oral microbiome dynamics in Japanese patients. Sci Rep 2020, 10(1):21989.
Ikeda E, Shiba T, Ikeda Y, Suda W, Nakasato A, Takeuchi Y, Azuma M, Hattori M, Izumi Y: Japanese subgingival microbiota in health vs disease and their roles in predicted functions associated with periodontitis. Odontology 2020, 108(2):280-291.
Nemoto T, Shiba T, Komatsu K, Watanabe T, Shimogishi M, Shibasaki M, Koyanagi T, Nagai T, Katagiri S, Takeuchi Y et al: Discrimination of Bacterial Community Structures among Healthy, Gingivitis, and Periodontitis Statuses through Integrated Metatranscriptomic and Network Analyses. mSystems 2021, 6(6):e0088621.
Duran-Pinedo AE, Chen T, Teles R, Starr JR, Wang X, Krishnan K, Frias-Lopez J: Community-wide transcriptome of the oral microbiome in subjects with and without periodontitis. ISME J 2014, 8(8):1659-1672.
Slakeski N, Dashper SG, Cook P, Poon C, Moore C, Reynolds EC: A Porphyromonas gingivalis genetic locus encoding a heme transport system. Oral Microbiol Immunol 2000, 15(6):388-392.
Akcali A, Lang NP: Dental calculus: the calcified biofilm and its role in disease development. Periodontol 2000 2018, 76(1):109-115.
Shiba T, Komatsu K, Sudo T, Sawafuji R, Saso A, Ueda S, Watanabe T, Nemoto T, Kano C, Nagai T et al: Comparison of Periodontal Bacteria of Edo and Modern Periods Using Novel Diagnostic Approach for Periodontitis With Micro-CT. Front Cell Infect Microbiol 2021, 11:723821.

Additionalfile1.xlsx
Additional file 1 Table S1 List of bacterial species in the mock communities.
Additionalfile2.xlsx
Additional file 2 Table S2 Raw abundance data of sequence reads obtained from mock1 with 250 bp PE using SILVA. Table S3 Raw abundance data of sequence reads obtained from mock1 with 250 bp PE using Greengenes. Table S4 Raw abundance data of sequence reads obtained from mock1 with 250 bp PE using HOMD. Table S5 Raw abundance data of sequence reads obtained from mock1 with 300 bp PE using SILVA. Table S6 Raw abundance data of sequence reads obtained from mock1 with 300 bp PE using Greengenes. Table S7 Raw abundance data of sequence reads obtained from mock1 with 300 bp PE using HOMD. Table S8 Raw abundance data of sequence reads obtained from mock2 with 300 bp PE using SILVA. Table S9 Raw abundance data of sequence reads obtained from mock2 with 300 bp PE using Greengenes. Table S10 Raw abundance data of sequence reads obtained from mock2 with 300 bp PE using HOMD. Table S11 Raw abundance data of sequence reads obtained from dental calculus samples with 300 bp PE using SILVA. Table S12 Raw abundance data of sequence reads obtained from dental calculus samples with 300 bp PE using Greengenes. Table S13 Raw abundance data of sequence reads obtained from dental calculus samples with 300 bp PE using HOMD.
Additionalfile3.xlsx
Additional file 3 Table S14 Abundance of potential functional genes predicted from the sequence data of dental calculus samples using 300 bp PE, V1–V2 primer, HOMD database and PICRUSt2.
Additionalfile4.xlsx
Additional file 4 Table S15 Summary of the read processing and trimming length maximising the survival rate of non-chimeric reads for each primer and sample in mock1 with 250 bp PE. Table S16 Summary of the read processing and trimming length maximising the survival rate of non-chimeric reads for each primer and sample in mock1 with 300 bp PE. Table S17 Summary of the read processing and trimming length maximising the survival rate of non-chimeric reads for each primer and sample in mock2 with 300 bp PE. Table S18 Summary of the read processing and trimming length maximising the survival rate of non-chimeric reads for each primer and sample in dental calculus sample with 300 bp PE.
Additionalfile5.xlsx
Additional file 5 Fig. S1 Survival rate of non-chimeric reads for each primer and sample. The horizontal axis represents the trimming length of forward and reverse reads for each primer, while the vertical axis represents the percentage of the survival rate of the non-chimeric reads.
Additionalfile6.pdf
Additional file 6 Fig. S2 Rarefaction curves of data using each primer and its replicates in mock1 with 250 bp and 300 bp PE and mock2 and the dental calculus samples with 300 bp PE.
