The gyrA gene of the Bacillus genus shows higher variation rates than 16S rRNA
The housekeeping gene gyrA is considered to be more variable than 16S rRNA and has been used as a molecular tool for classification and identification of Bacillus subtilis species[13, 27]. To determine the specific variation divergence of 16S rRNA and gyrA gene sequences in the genus Bacillus, two alignments with sequences of the entire 16S rRNA and gyrA genes from 20 Bacillus species (361 genomes) (Table S1) were used to analyze nucleotide diversity. The nucleotide diversity (Pi) of 16S rRNA and gyrA gene calculated by the DnaSP software was 0,039 and 0.491, respectively (Fig. 1A,B blue line). The results showed that the degree of interspecies variation was significantly higher for the gyrA gene than for the 16S rRNA gene.
Intraspecific variation analysis of the two genes was performed using the same approach but focusing on three Bacillus species: Bacillus amyloliquefaciens (88 genomes), Bacillus licheniformis (71 genomes) and Bacillus pumilus (97 genomes) (Table S1). The intraspecific nucleotide diversity (Pi) of 16S rRNA was again lower than the intraspecific nucleotide diversity of gyrA gene sequences in all three species: B. amyloliquefaciens (Pi16S = 0.0014; PigyrA=0.0244), B. licheniformis (Pi16S = 0.00024; PigyrA=0.0021) and B. pumilus (Pi16S = 0.00136; PigyrA=0.0344) (Fig. 1A,B nonblue lines). In B. amyloliquefaciens, B. licheniformis and B. pumilus, the degree of variation of the gyrA gene was significantly higher than that of 16S rRNA. In conclusion, the Bacillus gyrA gene shows higher variation rates than 16S rRNA, hence we propose that gyrA represents a promising molecular marker for analyses of Bacillus community diversity analyses and the diversity of Bacillus isolates.
First comparative tests of three primer pairs for the detection of Bacillus species
As indicated above Bacillus isolates have been already analyzed by primers targeting gyrA, however the specificity of these primers has not been investigated broadly[28, 29]. In order to satisfy the amplicon sequencing requirements, we designed a new primer pair (gyrA3) (Fig. 2A), and compared its amplification potential in colony PCR and virtual PCR with the previously designed primers, referred to here as gyrA1[17] and gyrA2[30].
First, we selected seven strains of different Bacillus species: Lysinibacillus fusiformis, Paenibacillus polymyxa, Bacillus pumilus, Bacillus velezensis, Bacillus megaterium, Bacillus cereus and Bacillus subtilis (Fig. 2B) to perform PCR amplification with primer pairs gyrA1, gyrA2 and gyrA3. The PCR amplification results showed that gyrA1 detected only B. subtilis; gyrA2 detected B. subtilis, B. velezensis and L. fusiformis; whereas gyrA3 performed much better and detected all Bacillus species included in the analysis (Fig. 2B and Fig. S1).
For in-silico PCR analysis, we constructed a gyrA gene library containing 5062 full-length gyrA gene sequences from 226 Bacillus species obtained from the National Center for Biotechnology Information (NCBI) (Table S2). Our criteria for positive in-silico gyrA PCR amplification were annealing of the primer pair with at least 18 bases match (forward primer and reverse primer). The results showed that only 8 Bacillus species were amplified in-silico by gyrA1, 9 Bacillus species were amplified by gyrA2 (Fig. S2), while 55 Bacillus species were amplified by gyrA3 (Fig. 2C). The majority of sequences amplified by gyrA1 and gyrA2 belonged to B. subtilis, whereas gyrA3 demonstrated broader diversity as evidenced by the amplification of seven species and in-silico PCR (Fig. 2B,C and Table S3).
Specificity range of the gyrA3 primer pair by using PCR and in silico PCR
Because the gyrA3 primer pair performed better than the previously reported primers, we next combined analysis of the in-silico amplified gyrA genes with PCR analysis of Bacillus isolates from our laboratory culture collection. Virtual gyrA3 PCR amplicons from 55 different Bacillus species from the NCBI database were evenly distributed among the branches of the phylogenetic tree (Fig. 3 orange and green).
Next, we used 127 Bacillus strains and related genera (Paenibacillus, Lysinibacillus, Aneurinibacillus, Virgibacillus, Brevibacillus, Halobacillus, Fictibacillus) from our laboratory culture collection to amplify their gyrA genes with a gyrA3 primer pair (Table S4). The results showed that 28 Bacillus species and 16 Bacillus-related species could be amplified by the gyrA3 primers (Fig. 3 blue and green). These Bacillus species, marked in green in the phylogenetic tree, 7 species were detectable by both methods (Fig. 3 green). In summary, the primer pair gyrA3 can potentially detect 76 Bacillus species and as many as 16 species from related genera (Fig. 3).
The gyrA gene provides better intraspecific phylogenetic resolution than the 16S rRNA gene among certain species
Compared to 16S rRNA, the molecular evolution rate of gyrA gene sequences is faster[1], so we hypothesized that gyrA might provide better phylogenetic resolution at the subspecies level. To analyze the intraspecific variability of the 16S rRNA and the gyrA gene, we downloaded the genomes of four Bacillus species, each of which had more than 100 genomes available in the NCBI database, including B. amyloliquefaciens (116 genomes), B. pumilus (140 genomes), B. megaterium (117 genomes) and B. anthracis (226 genomes) (Table S5). We did not include analysis the of B. subtilis genomes because templates for gyrA primers have already been developed and applied for analyses of this species[17, 28–30].
In B. amyloliquefaciens, the 480 bp long 16S rRNA region (V3-V4 region) contained 12 variable base sites (Fig. 4A and Fig. S3A), which were detected in only 4 of 116 genomes of this species (Fig. 4A), and the SNP frequency at variable sites within the 4 genomes ranged from 0.21%-1.46% (Fig. S3C red column). In contrast, the 500-nucleotide gyrA region (positions 350–850) contained 59 variable sites (Fig. 4B and Fig. S3B). The variable sites were detected in 109 of 116 genomes (Fig. 4B), and the frequency of SNPs at the variable sites ranged from 0.4–6.4% (Fig. S3C blue column).
In B. pumilus, alignment of the 16S rRNA V3-V4 region revealed 21 variable base sites (Fig. 4C and Fig. S4A), but again only in 11 out of 140 genomes (Fig. 4C). The frequency of SNPs at variable sites ranged from 0.21–3.54% (Fig. S4C red column). In contrast, in the gyrA gene 130 of the base sites were variable (Fig. 4D and Fig. S4B) and these were found in 87 of 140 genomes, with SNP frequencies at variable sites ranging from 0.2–15.8% (Fig. S4C blue column). However, in B. pumilus nearly 50% of the genomes examined had 100% identity in the gyrA gene, indicating high degree of relatedness between genomes that may require sequencing of additional marker genes for clonality verification and strain typing.
We also observed that the variation of the gyrA gene is quite different in different Bacillus species, which could be a bias of the NCBI database or a property of certain species. For example, B. megaterium, showed lower diversity with 3 bases of variation in the 16S rRNA alignment region (V3-V4 region) (Fig. S5A and Fig. S6A). Although 48 of 117 genomes showed polymorphism, the maximum SNP frequency at variable sites of B. megaterium genomes was only 0.42% (Fig. S6C red column). The gyrA gene was again more polymorphic with 58 bases of variation (Fig. S5B and Fig. S6B) occurring in 99 of 117 genomes, with SNP’s frequencies at variable sites ranging from 0.2–2.8% (Fig. S6C blue column).
The variation divergence between the two genes of Bacillus anthracis was much lower than in the three Bacillus species described above. Although we identified 13 variable base sites in the 16S rRNA V3-V4 region and 65 variable base sites in the gyrA gene region (Fig. S5C,D and Fig. S7A,B), these single nucleotide polymorphisms (SNPs) occurred in only 7 and 18 of 226 genomes, respectively. Moreover, the SNP frequencies at variable sites in 7 and 18 of B. anthracis genomes were also low: 0.21%-1.04% and 0.2%-7.4%, respectively (Fig. S7C).
Overall, our results showed that within Bacillus species the frequency of SNPs in the gyrA gene was consistently much higher than in the 16S rRNA (Fig. 4 and Fig. S5). We therefore suggest that the gyrA gene provides better resolution than 16S rRNA for identification and typing of Bacillus isolates at the subspecies level. This is particularly true for B. amyloliquefaciens, B. pumilus and B. megaterium but less so for B. anthracis (Table 1).
Table 1
Polymorphisms of the 16S rRNA and gyrA gene
Species
|
Gene
|
Gene length (bp)
|
Number of variable sites
|
Total number of genomes
|
Genomes number with variable bases
|
Range of genomes variable sites
|
Range of genomes variable sites (%)
|
Bacillus amyloliquefaciens
|
16S
|
480
|
12
|
116
|
4
|
1–7
|
0.21–1.46
|
gyrA
|
500
|
59
|
109
|
2–32
|
0.4–6.4
|
Bacillus pumilus
|
16S
|
480
|
21
|
140
|
11
|
1–17
|
0.21–3.54
|
gyrA
|
500
|
130
|
87
|
1–79
|
0.2–15.8
|
Bacillus megaterium
|
16S
|
480
|
3
|
117
|
48
|
1–2
|
0.21–0.42
|
gyrA
|
500
|
58
|
99
|
1–14
|
0.2–2.8
|
Bacillus anthracis
|
16S
|
480
|
13
|
226
|
7
|
1–5
|
0.21–1.04
|
gyrA
|
500
|
65
|
18
|
1–37
|
0.2–7.4
|
The resolution power of Bacillus mock community gyrA amplicon sequencing
Our results above suggest that the primer pair gyrA3 could be used as a molecular marker for diversity analysis of Bacillus. Next, we aimed to design a mock community to test the efficacy of the gyrA3 primer and used the general 16S rRNA primers (V3-V4) as a positive control. For better comparison, we designed two additional primer pairs, gyrA4 and gyrA5, which are very close to the position of the gyrA3 in the gyrA gene (Fig. S8A). We selected eight strains belonging to four species (B. altitudinis, B. licheniformis, B. velezensis and Lysinbacillus pakistanensis) and successfully amplified their gyrA gene by gyrA3, gyrA4 and gyrA5 primer pairs in a routine PCR for selected strains (Fig. 5A). Next, we artificially mixed eight indicated strains in equal proportions to obtain a mock community and then retested the resolution power of the three gyrA primer pairs and the 16S rRNA-specific primers. Our goal was to test whether the primer pair is suitable for determining the diversity of mock community (Fig. 5). Sequencing of the 16S rRNA amplicons showed that 16S rRNA primers could distinguish only five units, as strains LY1 and LY18, LY37 and LY43, and LY39 and LY48 had identical V3-V4 nucleotide sequence (Fig. 5B). The gyrA primer pairs were capable of resolving six units, but gyrA4 and gyrA5 produced amplicons of variable abundance and preferentially amplified LY35 and LY2, respectively(Fig. 5C-E). In comparison, primers gyrA3 also amplified six fragments but the relative abundance of these amplicons was more uniform (Table S6) with the exception of LY2 strain (Fig. 5C). Our data suggest that gyrA3 has potential for Illumina amplicon sequencing of more complex Bacillus communities.