Design of high specificity and sensitive sequencing primers to detect the bifidobacterial population in biological samples

Background : Bifidobacteria were poorly represented or even totally absent in many of the currently published microbiota/metagenomics datasets. This work designed Bifidobacterium -targeting primers that were suitable for use in high-throughput sequencing to characterized the feature of Bifidobacterium . Results : After multiple steps of in silico prediction, phylogenetic analysis, polymerase chain reaction and single-molecule real-time sequencing (SMRT) experiments, the Bif-6 primer pair was confirmed to be optimal for such purpose. Up to 99.42% and 99.32% of the detected sequences in human stool and breast milk samples represented up to 6 and 8 different Bifidobacterium species, respectively. Additionally, the Bif-6 primer pair detected the species Bifidobacterium catenulatum and Bifidobacterium kashiwanohense , which have not previously been reported in human breast milk to our knowledge. The high specificity of Bif-6 could be explained by the lower average nucleotide identity of 78.42% (versus 93.91% for the universal primers 27f/1492r). Moreover, to optimize the process of microbiota annotation, here a Bifidobacterium -specific database was constructed that largely reduced the time required for annotation compared with using the NCBI nucleotide (nt) database. The two databases generated highly consistent profiles of bifidobacteria when tested on the same samples. Conclusion work designed a highly reliable primer and database specific for detecting the which will facilitate future group bacteria.


Abstract
Background : Bifidobacteria were poorly represented or even totally absent in many of the currently published microbiota/metagenomics datasets. This work designed Bifidobacterium -targeting primers that were suitable for use in high-throughput sequencing to characterized the feature of Bifidobacterium .
Results : After multiple steps of in silico prediction, phylogenetic analysis, polymerase chain reaction and single-molecule real-time sequencing (SMRT) experiments, the Bif-6 primer pair was confirmed to be optimal for such purpose. Up to 99.42% and 99.32% of the detected sequences in human stool and breast milk samples represented up to 6 and 8 different Bifidobacterium species, respectively. Additionally, the Bif-6 primer pair detected the species Bifidobacterium catenulatum and Bifidobacterium kashiwanohense , which have not previously been reported in human breast milk to our knowledge. The high specificity of Bif-6 could be explained by the lower average nucleotide identity of 78.42% (versus 93.91% for the universal primers 27f/1492r). Moreover, to optimize the process of microbiota annotation, here a Bifidobacterium -specific database was constructed that largely reduced the time required for annotation compared with using the NCBI nucleotide (nt) database. The two databases generated highly consistent profiles of bifidobacteria when tested on the same samples.
Conclusion : this work designed a highly reliable primer and database specific for detecting the bifidobacterial population in complex biological samples, which will facilitate future studies of this group of bacteria.

Background
Bifidobacterium is a Gram-positive, nonmotile, non-spore-forming, catalase-and oxidasenegative, and often branched anaerobic bacterial genus [1]. Members of this genus are part of the endogenous gut microbiota and are generally considered to be healthpromoting [2]. They play important roles in modulating gut mucosal physiological responses and immunity [3][4][5], as well as guarding against infection, inhibiting pathogenic bacteria, and alleviating inflammatory diseases [6,7]. Therefore, Bifidobacterium can be regarded as gut health indicators.
The advent of sequencing technology has made it possible to comprehensively describe the microbial community in various ecosystems, but the detection of low abundant taxa remains challenging [8]. The bifidobacterial population is often undetected or underestimated due to its intrinsic low abundance and limitation in detection metthodology [9][10][11]. Particularly, the choice of polymerase chain reaction (PCR) primers would affect the sensitivity and specificity of detection, as well as taxonomic coverage. An inadequate primer design might result in underestimation of a specific microbial population like bifidobacteria [12,13]. Most previous studies applied universal 16S rRNA gene-specific PCR primers to detect thebifidobacterial biodiversity in the ecosystem [14,15], although several other studies attemped to design PCR primers based on rRNA internal transcribed spacers sequences (ITS) [16,17], some housekeeping genes and functional genes such as HSP60, tuf, and recA [18][19][20]. However, the majority of these primers failed to distinguish between closely related bifidobacterial species [16,21]; and most of these genes when used for primer design suffer from limitations like multicopy, non-uniform product size, and low taxonomic resolution [22,23].
Thus, this study aimed to develop new bifidobacteria-targeting primers that were suitable for use in high-throughput sequencing. The primers were designed based on core genes universial to all known bifidobacterial (sub)species. Our work has established costeffective tools for in-depth study of bifidobacteria in complex environments.

Consensus Bifidobacterium sequences and deduced primers
Sixty-nine Bifidobacterium genome sequences were downloaded (Supplementary material 1), and seven consensus sequences were identified from these Bifidobacterium genomes.
Four of these genes were Bifidobacterium-specific based on search against the NCBI nt database, from which 30 Bifidobacterium-specific primer pairs were deduced. Then, 17 pairs of primers were rejected due to mismatching and non-specificity after querying against the NCBI nt database. The predicted amplicon size of the remaining 13 primer pairs was 400-700 bp.
Primer specificity to Bifidobacterium culture The amplification specificity of the 13 primer pairs was tested in PCR on 24 bacterial strains ( Table 1). Four of the primer pairs (namely Bif-4, Bif-5, Bif-6, and Bif-11) exhibited high specificity to Bifidobacterium, evidenced by positive PCR on two bifidobacterial strains while absent of non-specific amplication when non-bifidobacterial strains were tested (Table S1 and Figure S1). The sequences and the optimum annealing temperature (Tm) of these four primer pairs are listed in Table 2.
Primer specificity to Bifidobacterium diversity in biological samples (A) Initial screening To select for the primer pair that was most suitable for use in targeting the bifidobacteria population in biological samples, Bif-4, Bif-5, Bif-6, and Bif-11 were tested on samples 1-3 together with the control 27f/1492r primers by SMRT sequencing. Although the number of observed OTUs did not saturate, the Shannon diversity index reached a stable value in all cases, suggesting a representative microbial diversity had already been captured ( Figure S2).
For comparison between samples, the number of sequence per sample was randomized and standardized to 1,610 sequences. The proportion of bifdobacterial sequences in each sample was calculated ( Table 3). The four currently designed primer pairs showed high effectiveness in amplifying bifidobacterial sequences, representing 96.01% to 100% of total sequences (<1% using 27f/1492r primers) ( Table 3).The Bifidobacterium-specific primers could detect between four to seven different species per sample, while the 27f/1492r primers could only detect atmost two bifidobacterial species per sample ( Figure   1). The distribution of the eight bifidobacterial species detected by different primer pairs is listed in Table S2. Bif-6 showed the highest numerical means in both Shannon diversity index and number of observed OTUs, though no significant differences were indeed observed in both parameters among the four tested primer pairs (p > 0.05) ( Figure 2). Thus, Bif-6 was selected for the next round of verification.
(B) Further SMRT verification Ten faecal and breast milk samples were sequenced with the primer pairs Bif-6 and 27f/1492r using SMRT. For comparison between samples, the number of sequence per sample was standardized to 2,024 and 3,473 sequences for the faecal and breast milk samples, respectively. Similarly, the Shannon diversity index but not the number of observed OTUs reached a stable value as sequencing depth increased, indicating most microbial diversity of each sample had already been captured although new taxa might still be found ( Figure S3).
For the breast milk samples, the average proportion of Bifidobacterium detected by the primer pairs 27f/1492r and Bif-6 were 1.94±0.09% and 99.32±0.26%, respectively ( Figure   4; Table S4). The 27f/1492r primers failed to detect any bifidobacteria in four of the samples, while only one species was detected in each of the other four samples. On the other hand, the bifidobacterial sequences detected by the Bif-6 primer pair comprised over 99% of the total sequences in all cases (Table S4) Bifidobacterium longum subsp. suis (prevalence: 37%, proportion: 0.01%) were only occasionally deteced by the Bif-6 primer pair (Table S4).

Currently constructed Bifidobacterium database and its reliability
To identify the taxonomic resolution of Bif-6, here the anticipated amplicon sequences were predicted from all available Bifidobacterium genomes to construct a bif-6-sequence database, from which a phylogenetic tree was created. Different bifidobacterial species were distinctive in the Bif-6-sequence phylogenetic tree ( Figure S4), whereas the bifidobacterial species formed a cluster in the 16S rRNA sequence-based phylogenetic tree and were not distinguishable ( Figure S5). A distance of 100% identity indicated no distinction between two sequences. By pairwise comparison ofsequences in the bif-6 database, 100% identity occurred only in two pairs of species, namely Bifidobacterium gallinarum vs Bifidobacterium saeculare and Bifidobacterium kashiwanohense vs Bifidobacterium catenulatum (contrasting to six pairs of species using the 16S rRNA gene) (Supplementary Material 2). Such result indicated a higher taxonomic distinction could be achieved using the bif-6 amplicon than full-length 16S rRNA sequences. Additionally, the mean ANI value of bif-6 amplicon sequences was 78.42%, contrasting to 93.91% for the 27f/1492r database (Supplementary Material 3). According to Milani et al., the cutoff level of ANI value required for distinguishing between different Bifidobacterium species was 98.5% [16]. Based on this ANI cutoff level, six and ten Bifidobacterium species could not be distinguished by using Bif-6 and 27f/1492r primer pairs, respectively (Supplementary Material 3), suggesting Bif-6 offered a higher taxonomic resolution of Bifidodbacterium.
Besides, the ten faecal and breast milk microbiota profiles generated by the Bif-6 database and the nt library were largely consistent, except for the taxonomic assignment of two species, Bifidobacterium adolescentis and Bifidobacterium stercoris ( Figure 5B).

Discussion
Bifidobacterium is a bacterial genus that exists naturally in most ecosystems; however, many reports have underestimated the prevalence and biodiversity of the bifidobacterial population due to its generally low relative abundance, particularly in biological samples that contained complex microbiota [24][25][26]. In order to improve the detection of bifidobacteria in natural habitats, this worked aimed to design PCR primers that targeted the Bifidobacterium genus in complex samples using high-throughput technology.
By multiple steps of in silico prediction, analysis, and sequencing experiments, the Bif-6 primer pair was superior over the other three tested primers pairs, as demonstrated by the apparently higher number of observed OTUs in the dataset. By using the Bif-6 primer pair, the average relative abundance of bifidobacterial sequences comprised up to 99.42% (for human faecal samples) and 99.32% (for breast milk samples) of the total sequences, contrasting to 0.15% and 1.94% using 27f/1492r that targeted to the full-length 16S rRNA gene, suggesting tremendous improvement in the effectiveness in detecting the bifidobacterial populations in these two types of samples. Some published studies designed primers that were specific to the order Bifidobacteriales and reported a proportion between 24.1% to 94.4% of specific sequences [10,12,14]. The high specificity toward bifidobacteria of Bif-6 could be explained by its target gene, rpsK, which is a single copy core gene specific to the Bifidobacterium genus. Another reason for the high specificity and distinction of Bif-6 toward bifidobacteria could be the large difference in the mean ANI values between amplicons of Bif-6 and 27f/1492r primers, which were 78.42% and 93.91%, respectively, as determined by phylogenetic analysis.
Apart from the specificity and relative abundance, the Bif-6 primer pair was able to detect a much wider spectrum of bifidobacterial species both in human stool and breast milk samples compared with the 27f/1492r primer pair. The Bif-6 amplicons of the faecal samples comprised predominantly with the species Bifidobacterium adolescentis, Bifidobacterium catenulatum, and Bifidobacterium longum, which agreed with the results published by other reports describing the biodiversity of human bifidobacteria [6,[27][28][29].
Human breast milk is another natural source of Bifidobacterium [30]; thus, this study tested the Bif-6 primer pair on human breast milk samples by SMRT sequencing. By using the Bif-6 primer pair, eleven different bifidobacterial species/subspecies were detected in the eight breast milk samples; nine of them were commone to all tested samples, and six of them were previously reported in breast milk [31,32]. The most prevalent bifidobacterial species detected in the current breast milk samples were Bifidobacterium catenulatum (48.88%), Bifidobacterium longum subsp. longum (28.46%), and Bifidobacterium stercoris (16.47%), while each of the other species comprised over >0.1% of the total sequences. Chen et.al reported that Bifidobacterium longum (prevalence: 62.4%, abundance: 0.31%) was a predominant bifidobacterial species in human breast milk, whereas several reports found a low relative abundance of Bifidobacterium in human milk (<0.01%) [31][32][33][34], which could be due to inefficient detection methods used and difficulty in cultivation.
Additionally, to our knowledge, this is the first report that observed Bifidobacterium catenulatum and Bifidobacterium kashiwanohense in human breast milk; both species were often reported in healthy infant feces [35][36][37]. Our data support that a rich array of bifidobacteria exists in human breast milk.
Finally, the bif-6-sequence database constructed here facilitates rapid taxonomic assignment of bifidobacteria. However, since there are over 70 species and 10 subspecies of bifidobacteria [38], the current database (contained only 69 taxa) would need to be updated regularly subject to the availality of the rpsK gene sequences.

Conclusions
By combining in silico prediction, phylogenetic analysis, and empirical verification, we demonstrated that theBif-6 primer pair, targeting to the single copy rpsK gene that is unique to the Bifidobacterium genus, could effectively detect the bifidobacterial diversity in complex microbiota communities using the high-throughput sequencing approach. Our data support that a wide bifidobacterial diversity exists in human breast milk. The newly designed primer pair and bifidobacteria-specific database will serve as valuable tools for future in-depth studies of bifidobacteria in natural ecological niches.

Determination of primer specificity
The deduced primers were firstly tested by PCR using genomic DNA of the four bifidobacterial strains as templates ( Table 1). The size and specificity of generated PCR products were assessed by agarose gel electrophoresis. Primers that could amplify clean single DNA fragments of predicted sizes for all four strains were selected for a second round of PCR verification. In the second round, primer pairs were selected based on positive reactions for two selected bifidobacterial strains (Bifidobacterium animalis subsp. animalis and Bifidobacterium longum subsp. infantis) and negative reactions for the twenty non-bifidobacterial strains ( Table 1). The PCR was performed in a final volume of 50 µl containing 2 μl extracted DNA (20 ng/µL) on a thermocycler (LeikangHengtai Trading Co., Ltd., Beijing). The PCR cycling conditions were as follows: 95°C for 10 min, 30 cycles of 95°C for 30 s, annealing temperature (Tm) for 1 min, 72°C for 1 min, and 72°C for 7 min.
In the third round, the genomic DNA of three faecal samples (namely Samples 1, 2, and 3) were used as PCR templates. These three samples were known to contain a rich consortium of bifidobacteria as determined previously (data not shown). Based on the obtained results, four PCR primer pairs were further tested in single-molecule real-time (SMRT) sequencing (Table 2). Additionally, the universal primer pair that could amplify the full-length 16S rRNA gene, 27f/1492r (5'-AGAGTTTGATCCTGGCTCAG-3'/5'-CTACGGCTACCTTGTTACGA-3') [39], were included as control.
The PCR for SMRT sequencing was performed in a final volume of 50 µl containing 1 µl of extracted DNA (20 ng/µl), 25 ul 2 × KAPA HiFi HotStart ReadyMix, 1.2 uL forward primer (10 uM), 1.2 ul reverse primer (10 uM), ddH 2 O to a total volume of 50 ml [40]. The PCR primers were barcoded to distinguish between samples during SMRT sequencing. The

Bioinformatic analyses
Raw data were generated using the protocol RS_ReadsOfinsert.1, followed by the extraction of high-quality sequences using the QIIME package. The minimum full passes, minimum predicted accuracy, minimum read lengths of inserts, and maximum read lengths were set to 5, 90, 1,400, and 1,800, respectively [43]. The barcode and primer sequences were removed prior to operational taxonomic units (OTUs) assignment using UCLUST [44,45] with a 100% pairwise identity threshold. The taxonomic assignment of sequences generated by the 27f/1492r primer pair was based on the Greengenes (version 13.5) [45], RDP (release 11.5) [46], and Silva (version 128) [47] databases.

Statistical analyses
The relative abundances of bifidobacterial species were calculated and represented by heatmaps and pie charts. The graphic presentations were generated using the R software

Conflict Of Interest
The authors declare that they have no conflicts of interest.   Bifidobacterial composition of ten samples from Hainan province. Composition of microbiota at the genus level detected by the primer pairs 27f/1492r (A) and Bif-6 (B); heatmap representing the relative abundance of different Bifidobacterium species (C). Samples were encoded as HF1-HF10. The colour scale represents the relative abundance; a larger number represents a higher relative abundance.

Figure 4
Bifidobacterial composition of eight breast milk samples. Composition of microbiota at the genus level detected by the primer pairs 27f/1492r (A) and Bif-6 (B); heatmap representing the relative abundance of different Bifidobacterium species (C). 'BM' represents breast milk. The colour scale represents the relative abundance; a large number represents a high relative abundance.