Species assignment of the strain YA215
DNA-DNA hybridization (DDH), 16S rRNA gene analysis, and average nucleotide identity (ANI) are commonly used to assign strains to taxonomic units (Richter et al. 2009). However, DDH analysis is complex and time-consuming, and 16S rRNA identification has limitations (e.g., some species have low DDH values), so ANI analysis is gaining popularity among researchers. ANI analysis assesses relatedness at the whole-genome level, so that strains can be directly recognized at the species, subspecies, or even subpopulation level, and an ANI value of 95% is regarded as the standard for defining species (Zeigler 2003). Strain YA215 was re-identified by ANI and OrthoANI analysis based on complete genome sequencing data. The ANI between strain YA215 and most B. velezensis strains was more than 97%, and the highest value, 99.67%, was obtained with B. velezensis JJ-D34 (Table 1). OrthoANI values were further calculated between strain YA215 and B. velezensis FZB42, B. amyloliquefaciens DSM 7, B. siamensis KCTC 13613, and B. subtilis 168. Strain YA215 showed the highest OrthoANI value with B. velezensis FZB42 (97.75%), followed by B. siamensis KCTC 13613 (95.22%), B. amyloliquefaciens DSM 7 (94.01%), and B. subtilis 168 (76.96%) (Fig. 1). Taken together, strain YA215 was most closely related to B. velezensis and was therefore designated B. velezensis YA215. The whole genome sequence of B. velezensis YA215 has been deposited in the NCBI GenBank database under accession number CP121465.1.
Table 1
ANI values between strain YA215 and other bacterial strains.
Strain | ANI value (%) |
B. velezensis JJ-D34 | 99.67 |
B. velezensis NJN-6 | 99.26 |
B. velezensis CAU B946 | 99.24 |
B. velezensis L-S60 | 99.21 |
B. velezensis L-H15 | 99.21 |
B. velezensis FZB42 | 97.59 |
B. velezensis AS43.3 | 97.55 |
B. subtilis BS34A | 76.25 |
B. subtilis ATCC 19217 | 97.49 |
B. subtilis ATCC 13952 | 93.11 |
B. siamensis SRCM100169 | 93.97 |
B. siamensis KCTC 13613 | 93.98 |
B. licheniformis ATCC 14580 | 72.04 |
B. amyloliquefaciens XH7 | 93.19 |
B. amyloliquefaciens UMAF6639 | 97.49 |
B. amyloliquefaciens TA208 | 93.2 |
B. amyloliquefaciens KHG19 | 97.58 |
B. amyloliquefaciens DSM 7 | 93.27 |
B. amyloliquefaciens CC178 | 97.6 |
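The 95% species criterion cited above can be applied to the Table 1 values with a short script. The following is a minimal sketch rather than the procedure used in this study: the ANI values are copied from Table 1, while the names `ANI_PERCENT`, `SPECIES_CUTOFF`, and `split_by_cutoff` are illustrative assumptions. Reference strains are listed under the names given in Table 1, and several entries labelled B. subtilis or B. amyloliquefaciens also exceed the cutoff; the sketch reports them as listed without reinterpreting their labels.

```python
# ANI values (%) between strain YA215 and each reference genome, copied from Table 1.
ANI_PERCENT = {
    "B. velezensis JJ-D34": 99.67,
    "B. velezensis NJN-6": 99.26,
    "B. velezensis CAU B946": 99.24,
    "B. velezensis L-S60": 99.21,
    "B. velezensis L-H15": 99.21,
    "B. velezensis FZB42": 97.59,
    "B. velezensis AS43.3": 97.55,
    "B. subtilis BS34A": 76.25,
    "B. subtilis ATCC 19217": 97.49,
    "B. subtilis ATCC 13952": 93.11,
    "B. siamensis SRCM100169": 93.97,
    "B. siamensis KCTC 13613": 93.98,
    "B. licheniformis ATCC 14580": 72.04,
    "B. amyloliquefaciens XH7": 93.19,
    "B. amyloliquefaciens UMAF6639": 97.49,
    "B. amyloliquefaciens TA208": 93.20,
    "B. amyloliquefaciens KHG19": 97.58,
    "B. amyloliquefaciens DSM 7": 93.27,
    "B. amyloliquefaciens CC178": 97.60,
}

# Species boundary quoted in the text (Zeigler 2003); the constant name is illustrative.
SPECIES_CUTOFF = 95.0


def split_by_cutoff(ani, cutoff=SPECIES_CUTOFF):
    """Rank references by ANI (highest first) and split them at the cutoff."""
    ranked = sorted(ani.items(), key=lambda item: item[1], reverse=True)
    above = [(name, value) for name, value in ranked if value >= cutoff]
    below = [(name, value) for name, value in ranked if value < cutoff]
    return above, below


above, below = split_by_cutoff(ANI_PERCENT)

if __name__ == "__main__":
    print(f"References with ANI >= {SPECIES_CUTOFF}%:")
    for name, value in above:
        print(f"  {name:32s} {value:6.2f}")
    print(f"References with ANI < {SPECIES_CUTOFF}%:")
    for name, value in below:
        print(f"  {name:32s} {value:6.2f}")
```

All seven B. velezensis references fall above the cutoff, with B. velezensis JJ-D34 ranked first.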
Features of B. velezensis YA215 genome
The B. velezensis YA215 genome had a total length of 3976514 bp and a GC content of 46.56% (Fig. 2 and Table 2), and no plasmid was detected. A total of 3809 protein-coding genes were predicted, with a combined length of 3519777 bp, accounting for 88.51% of the genome. In addition, 86 tRNA genes, 27 rRNA genes, and 79 sRNA genes were predicted.
Table 2
Genome statistics of B. velezensis YA215.
Genomic Feature | Value |
Size of the genome assembly (bp) | 3976514 |
GC content (%) | 46.56 |
Protein-coding genes | 3809 |
Protein-coding regions (bp) | 3519777 |
rRNA genes | 27 |
tRNA genes | 86 |
sRNA genes | 79 |
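The coding fraction quoted for the genome follows directly from the values in Table 2. The sketch below reproduces that arithmetic and also derives two quantities not reported in the text (mean coding-sequence length and gene density), which are included only as illustrations. All variable names are assumptions made for this example.

```python
# Values copied from Table 2.
GENOME_SIZE_BP = 3_976_514
CODING_REGION_BP = 3_519_777
PROTEIN_CODING_GENES = 3809


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the assembly covered by protein-coding regions.
coding_density = percent(CODING_REGION_BP, GENOME_SIZE_BP)

# Derived illustrations, not reported in the text.
mean_cds_length_bp = CODING_REGION_BP / PROTEIN_CODING_GENES
genes_per_kb = PROTEIN_CODING_GENES / (GENOME_SIZE_BP / 1000)

if __name__ == "__main__":
    print(f"Coding density:        {coding_density:.2f}%")
    print(f"Mean CDS length:       {mean_cds_length_bp:.0f} bp")
    print(f"Gene density:          {genes_per_kb:.2f} genes per kb")
```

The computed coding density rounds to 88.51%, in agreement with the value stated above.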
Functional Annotation
COG Functional Annotation
The Clusters of Orthologous Groups of proteins (COG) database is a common tool for functional annotation of microbial genomes (Galperin et al. 2019). COG annotation compares the target sequences with clusters of sequences of known function, so that functions can be inferred for unknown genes. The COG annotation of the B. velezensis YA215 genome is shown in Fig. 3. A total of 3090 genes were annotated, representing 81.12% of all genes. Among the 23 functional categories, amino acid transport and metabolism contained the most genes (304, accounting for 8.76% of the total number of genes), followed by transcription and carbohydrate transport and metabolism, with 294 and 269 genes, respectively. Four further categories each contained more than 200 genes and more than 5% of the total number of genes: general function prediction only (246), translation, ribosomal structure and biogenesis (233), cell wall/membrane/envelope biogenesis (210), and coenzyme transport and metabolism (202). The remaining COG categories contained between 3 and 194 genes each.
GO Functional Annotation
The Gene Ontology (GO) database annotates protein functions in three major categories: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). The annotation results are shown in Fig. 4. A total of 2654 genes were annotated in the GO database, accounting for 69.68% of the total genes. Among the three categories, Cellular Component contained the most genes, followed by Molecular Function, and Biological Process contained the fewest. Within Biological Process, the three most abundant terms were regulation of transcription, DNA-templated; translation; and transmembrane transport. Within Cellular Component, the three most abundant terms were integral component of membrane, plasma membrane, and cytoplasm. Within Molecular Function, the three most abundant terms were ATP binding, DNA binding, and metal ion binding. These results indicate that many functions of B. velezensis YA215 are related to biofilm and that many metabolites may be transported across the membrane, which lays the foundation for metabolite production by B. velezensis YA215.
KEGG Metabolic Pathway Analysis
KEGG is a large knowledge base that links genomic and functional information and allows systematic analysis of gene function. A total of 2486 genes of B. velezensis YA215 were annotated in the KEGG database, accounting for 65.26% of the total genes. The annotated pathways were classified into six categories, which were further divided into 41 subcategories (Fig. 5). A total of 1742 genes were annotated to metabolism-related pathways, accounting for 70.07% of the annotated genes, which indicates that B. velezensis YA215 has a strong metabolic capacity. The global and overview maps contained the most genes, including 655 genes in metabolic pathways, 316 in biosynthesis of secondary metabolites, 169 in microbial metabolism in diverse environments, and 116 in biosynthesis of amino acids. Genes involved in the biosynthesis of amino acids and of secondary metabolites are closely related to the synthesis of various metabolites (He et al. 2021). In environmental information processing, the signal transduction pathway contained the most genes, which were concentrated in two-component systems. The membrane transport pathway also contained a high number of genes, mainly ABC transporters, which are key proteins in bacteriocin processing and release (Qiu et al. 2011).
In genetic information processing, the translation pathway contained the most genes, which were mainly concentrated in the ribosome, where most peptide metabolites are synthesized. In cellular processes, the cellular community - prokaryotes pathway contained the most genes, which were mainly involved in quorum sensing.
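The annotation coverage reported for the COG, GO, and KEGG databases is the number of annotated genes divided by the 3809 predicted protein-coding genes. The following sketch recomputes these ratios, together with the share of KEGG-annotated genes assigned to metabolism, from the counts given in the text. The dictionary and function names are assumptions introduced for illustration.

```python
# Counts taken from the text of the functional annotation section.
TOTAL_GENES = 3809
ANNOTATED_GENES = {"COG": 3090, "GO": 2654, "KEGG": 2486}
KEGG_METABOLISM_GENES = 1742


def coverage_percent(annotated, total=TOTAL_GENES):
    """Percentage of predicted genes that received an annotation."""
    return 100.0 * annotated / total


coverage = {db: coverage_percent(n) for db, n in ANNOTATED_GENES.items()}

# Share of KEGG-annotated genes that map to metabolism-related pathways.
kegg_metabolism_share = 100.0 * KEGG_METABOLISM_GENES / ANNOTATED_GENES["KEGG"]

if __name__ == "__main__":
    for db, value in coverage.items():
        print(f"{db:5s} coverage: {value:.2f}%")
    print(f"KEGG metabolism share: {kegg_metabolism_share:.2f}%")
```

The COG and GO ratios round to 81.12% and 69.68%, and the metabolism share to 70.07%. The KEGG ratio is approximately 65.2665%, which rounds to 65.27% and truncates to 65.26%.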
CAZymes analysis
A total of 127 CAZyme-encoding genes were predicted in the B. velezensis YA215 genome, including 43 glycosyl transferases, 39 glycoside hydrolases, 32 carbohydrate esterases, 9 auxiliary activities, 3 polysaccharide lyases, and 1 carbohydrate-binding module (Fig. 6). Glycosyl transferases, glycoside hydrolases, and carbohydrate esterases thus accounted for the majority. These enzymes have been reported to be essential for the synthesis of secondary metabolites via the non-ribosomal pathway (Chen et al. 2022). CAZymes have also been reported to have antimicrobial properties: they act on their corresponding substrates in the fungal cell wall and also lyse bacteria, thereby limiting pathogen expansion and multiplication (Geiser et al. 2016).
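The statement that three CAZyme classes account for the majority can be checked from the class counts given above. The sketch below sums the counts and computes the share of each class; the per-class percentages are derived here for illustration and are not reported in the text, and the variable names are assumptions.

```python
# CAZyme class counts taken from the text (Fig. 6).
CAZYME_COUNTS = {
    "glycosyl transferases": 43,
    "glycoside hydrolases": 39,
    "carbohydrate esterases": 32,
    "auxiliary activities": 9,
    "polysaccharide lyases": 3,
    "carbohydrate-binding modules": 1,
}

total_cazymes = sum(CAZYME_COUNTS.values())

# Percentage share of each class among all predicted CAZyme genes.
class_share = {
    name: 100.0 * count / total_cazymes for name, count in CAZYME_COUNTS.items()
}

# Combined share of the three largest classes, computed from integer counts.
top_three = sorted(CAZYME_COUNTS.values(), reverse=True)[:3]
top_three_share = 100.0 * sum(top_three) / total_cazymes

if __name__ == "__main__":
    print(f"Total CAZyme genes: {total_cazymes}")
    for name, share in class_share.items():
        print(f"  {name:30s} {share:5.2f}%")
    print(f"Three largest classes combined: {top_three_share:.2f}%")
```

The counts sum to the 127 genes reported, and the three largest classes together make up roughly 90% of them.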
Predictive analysis of secondary metabolite synthesis gene clusters
The genome of B. velezensis YA215 was predicted by antiSMASH to contain 13 secondary metabolite synthesis gene clusters, which were compared with the existing secondary metabolite gene clusters in NCBI (Table 3). The comparison revealed 9 types of secondary metabolite gene clusters in B. velezensis YA215: three clusters of the NRPS class, two clusters each of the transAT-PKS and terpene classes, and one cluster of each of the other types. NRPS clusters were thus the most numerous and also contained the most genes, accounting for 29.15% of the total number of predicted secondary metabolite genes. NRPS are multi-enzyme complexes that bind malonyl derivatives and amino acids in a sequential manner (Ding et al. 2023), and they use many different building blocks to produce a wide range of secondary metabolites with potential bioactivity (Butcher et al. 2007). Six gene clusters showed 100% similarity to reported clusters: gene cluster #6 (macrolactin H), #7 (bacillaene), #8 (fengycin), #11 (difficidin), #12 (bacillibactin), and #13 (bacilysin). Gene cluster #1 (surfactin) showed 78% similarity and gene cluster #3 (plantazolicin) showed 91% similarity, whereas gene clusters #2 and #4 showed less than 8% similarity to their closest matches. Notably, gene clusters #5, #9, and #10 did not match any known cluster, which may provide a basis for the discovery of novel antimicrobial compounds.
Previous studies have assessed secondary metabolite potential using a sequence similarity threshold of 70% (Gontang et al. 2010), and a threshold of 85% has also been reported for identifying production of the same secondary metabolite (Reddy et al. 2012). Whether the type and diversity of secondary metabolite production are assessed at 70% or at 85%, 61.53% of the gene clusters in B. velezensis YA215 showed greater than 70% similarity to known gene clusters. These results demonstrate that the genome of B. velezensis YA215 contains gene clusters for the synthesis of various metabolites, including fengycin, surfactin, macrolactin H, bacillaene, difficidin, bacillibactin, bacilysin, and plantazolicin. In addition, three gene clusters with the potential to synthesize novel compounds were predicted. In recent years, an increasing number of secondary metabolite gene clusters have been identified in B. velezensis genomes. B. velezensis HAB-2 contained 13 gene clusters related to antimicrobial secondary metabolites (Xu et al. 2020), B. velezensis B-4 contained 12 gene clusters involved in the synthesis of antimicrobial metabolites (Zhu et al. 2020), and B. velezensis GUAL210 also harbored 12 gene clusters related to antimicrobial secondary metabolites (Zhou et al. 2023).
Table 3
Predicted secondary metabolic gene clusters in the B. velezensis YA215 genome
Cluster ID | Type | MIBiG accession | Most similar known cluster | Similarity (%) | No. of genes |
cluster1 | NRPS | BGC0000433 | surfactin | 78 | 41 |
cluster2 | thiopeptide | BGC0000082 | kijanimicin | 4 | 23 |
cluster3 | LAP | BGC0000569 | plantazolicin | 91 | 18 |
cluster4 | PKS-like | BGC0000693 | butirosin A / butirosin B | 7 | 41 |
cluster5 | terpene | - | - | - | 22 |
cluster6 | transAT-PKS | BGC0000181 | macrolactin H | 100 | 44 |
cluster7 | transAT-PKS | BGC0001089 | bacillaene | 100 | 52 |
cluster8 | NRPS | BGC0001095 | fengycin | 100 | 64 |
cluster9 | terpene | - | - | - | 22 |
cluster10 | T3PKS | - | - | - | 43 |
cluster11 | transAT-PKS-like | BGC0000176 | difficidin | 100 | 59 |
cluster12 | NRPS | BGC0000309 | bacillibactin | 100 | 46 |
cluster13 | other | BGC0001184 | bacilysin | 100 | 43 |
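The summary figures quoted for the gene clusters (number of cluster types, the gene share of the NRPS class, and the number of clusters above a similarity threshold) can be recomputed from Table 3. The sketch below is an illustration under stated assumptions: the rows are copied from Table 3, clusters without a match are encoded as `None`, and the `Cluster` type and helper names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Cluster(NamedTuple):
    cluster_id: int
    bgc_type: str
    best_hit: Optional[str]      # None where Table 3 shows "-"
    similarity: Optional[int]    # percent; None where Table 3 shows "-"
    n_genes: int


# Rows copied from Table 3 (MIBiG accessions omitted for brevity).
CLUSTERS = [
    Cluster(1, "NRPS", "surfactin", 78, 41),
    Cluster(2, "thiopeptide", "kijanimicin", 4, 23),
    Cluster(3, "LAP", "plantazolicin", 91, 18),
    Cluster(4, "PKS-like", "butirosin A / butirosin B", 7, 41),
    Cluster(5, "terpene", None, None, 22),
    Cluster(6, "transAT-PKS", "macrolactin H", 100, 44),
    Cluster(7, "transAT-PKS", "bacillaene", 100, 52),
    Cluster(8, "NRPS", "fengycin", 100, 64),
    Cluster(9, "terpene", None, None, 22),
    Cluster(10, "T3PKS", None, None, 43),
    Cluster(11, "transAT-PKS-like", "difficidin", 100, 59),
    Cluster(12, "NRPS", "bacillibactin", 100, 46),
    Cluster(13, "other", "bacilysin", 100, 43),
]

# Number of clusters per type, and the NRPS share of all cluster genes.
type_counts = Counter(c.bgc_type for c in CLUSTERS)
total_genes = sum(c.n_genes for c in CLUSTERS)
nrps_genes = sum(c.n_genes for c in CLUSTERS if c.bgc_type == "NRPS")
nrps_gene_share = 100.0 * nrps_genes / total_genes


def clusters_above(threshold):
    """IDs of clusters whose similarity to a known cluster exceeds threshold (%)."""
    return [
        c.cluster_id
        for c in CLUSTERS
        if c.similarity is not None and c.similarity > threshold
    ]


unmatched = [c.cluster_id for c in CLUSTERS if c.similarity is None]

if __name__ == "__main__":
    print(f"Cluster types: {len(type_counts)} -> {dict(type_counts)}")
    print(f"NRPS gene share: {nrps_gene_share:.2f}% ({nrps_genes}/{total_genes})")
    for threshold in (70, 85):
        ids = clusters_above(threshold)
        print(f"Similarity > {threshold}%: {len(ids)} of {len(CLUSTERS)} clusters {ids}")
    print(f"Clusters without a known match: {unmatched}")
```

With these inputs, 8 of the 13 clusters exceed 70% similarity and 7 exceed 85%, the difference being the surfactin cluster at 78%.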