Isolation of Bacteria and In Vitro Antagonistic Assay
A total of 181 Bacillus-like strains were isolated from healthy tomato rhizosphere soil and tomato plant tissue collected in either the Netherlands or Spain. Among them, 28 endophytic strains were isolated from healthy tomato plant tissues collected in Spain. 74 and 79 rhizosphere bacteria strains were isolated from tomato plants collected in Spain and the Netherlands, respectively. In order to identify potential PGPR strains, all the Bacillus-like strains were preliminarily screened by in vitro antagonistic activity against six major tomato plant pathogens, i.e. Erwinia carotovora [17], Pseudomonas syringae [18], Rhizoctonia solani [19], Botrytis cinerea [20], Verticillium dahliae [21], and Phytophthora infestans [22]. The results revealed that 34 Bacillus-like strains could inhibit different bacterial, fungal and oomycetal plant pathogens growth on plates (Figure 1 and 2).
We found that 34 Bacillus-like strains were distributed between three large major clusters of the neighbor-joining tree based on 16S rRNA genes. In the first cluster, strain EDO6 was clustered within the genus of Paenibacillus, which was related to type strain Paenibacillus xylanexedens B22a, with the percent identity value above 99.6% on 16S rRNA gene sequence. 10 isolated strains were grouped in the second cluster and were closely related to the type strains Bacillus endophyticus 2DT and MF126, Bacillus firmus IAM 12464, Bacillus megaterium NBRC 15308, or Bacillus aryabhattai B8W22. The third cluster consisted of 23 isolated strains. They were tightly releated to reference strains Bacillus cereus ATCC 14579, Bacillus velezensis FZB42 or SQR9, Bacillus subtilis BSn5 and NCD-2.
Of all strains, seven strains (TH16, FH17, EH6, DH15, BH4, BH5 and BH6) showed inhibition on all pathogens. Among them, four strains (TH16, FH17, BH5 and BH6), showing the biggest inhibition halo on all pathogens, were selected to sequence their genomes for further research. In addition, three strains (EH11, EDO6 and FH5), showing the highest inhibition activity (inhibition halo size >9.75mm against P. infestans), were also selected for genome sequencing. Besides, two strains (EH2 and EH5), showing the highest inhibition activity (inhibition halo size > 8.5 mm) against B. cinerea, were selected for genome sequencing as well. Strain DH12 was also selected for genome sequencing, because of the large inhibition halo size (>3.38 mm) measured on the plates against Pseudomonas syringae. In summary, a total of 10 strains (BH5, BH6, DH12, EH2, EH5, EH11, FH5, FH17, TH16 and EDO6) were genome sequenced for further research.
Genome Sequencing and Phylogenetic analysis
The genomes of 10 isolated strains were sequenced, assembled and annotated as described in a previous study [23]. Based on whole genome phylogenetic analyses, the 10 Bacillus strains were clustered into five clades as presented in Figure 3. All of them were tightly clustered together with reported PGPR strains from the Bacillus class, such as B. subtilis Bsn5, B. velezensis FZB42 and P. polymyxa E681. This suggests that they probably can promote plant growth as well, which needs to be further investigated. Moreover, to classify strains at the species level, Average Nucleotide Identity (ANI) and digital DNA-DNA Hybridization (dDDH) values were determined[24] (Additional file 1 Table S1). Strains DH12, EH2, EH5, and EH11 were exhibiting ≥ 98.21% ANI and ≥ 86.70% dDDH compared with reference genome B. subtilis Bsn5, therefore they were identified as B. subtilis species. Strains FH17 and TH16 were identified as B. velezensis, based on ≥ 98.14% ANI and ≥ 85.30% dDDH compared with reference genome of B. velezensis FZB42. Strains BH5 and BH6 were classified into B. cabrialesii species because of exhibiting 96.60% ANI and 73.40% dDDH compared with the reference genome of B. cabrialesii TE3. Strain FH5 was identified as B. endophyticus based on 96.35% ANI and 73.40% dDDH compared with reference genome B. endophyticus KCTC 13922. Strain EDO6 could not be classificated at the species level due to the low ANI and dDDH values (93.88% ANI and 57.60% dDDH), even compared with the closest species P. xylanexedens PAMC 22703, so we will name it Paenibacillus sp. EDO6.
Biosynthesis Gene Cluster (BGC) Mining
A total of 120 BGCs were found, averaging 12 clusters per genome. All the BGCs were designated as those encoding NRPSs, PKSs, terpenes, hybrid NRPS/PKSs, bacteriocins, RiPPs and others (Table 1A and Additional file 1 Table S2). The BGCs encoding surfactin [25], fengycin [25], bacillibactin [26], subtilosin A [27], bacillaene [28], macrolactin [29], difficidin [30], and subtilin [31] were discovered in the genomes. Besides, some BGCs encoding unknown compounds, were also identified (Table 1B). Most of the unknown BGCs (76.47%) are PKSs BGCs, which cannot be assigned to any known compounds. 73.07% bacteriocins BGCs encodes potential novel peptides. 27.78% and 27.27% of NRPSs and Hybrids BGCs are still unknown. These findings provide a great opportunity of new bioactive compounds discovery.
Novel NRPs and PKs BGCs identified from the 10 strains
The majority of BGCs could be assigned to known compounds, whereas 5 clusters represented probably novel NRPs and PKs-NRPs hybrid BGCs for which no or low similarity BGCs could be identified in the MIBiG [32] database (Figure 4).
Two novel gene clusters were identified from B. endophyticus FH5. One NRPs (Figure 4a) BGC consists of three genes and has a total size of 25 kb. Three genes are encoding 24 domains, which includes 7 condensation (C) domains, 7 adenylation (A) domian, 7 thiolation (T) domain, 2 epimerization (E) domain and 1 thioesterase (TE) domain. All the domains are essential components in this BGC and catalyze primary formation of a lipopeptide product. This BGC is showing no similarity to any known BGCs reported. The other one (Figure 4b) is a Type I PKs-NRPs hybrid BGC with a size of approximately 30 kb. The PKs module consists of a ketosynthase (KS) domain, a acyltransferase (AT) domain, an acyl carrier protein (ACP) domain and a terminal reductase (TD) domain. It likely incorporates the polyketide moiety of malonyl-CoA, while the NRPs modules incorporate six amino acid residues. Based on antiSMASH analysis, only 28% genes show similarity to the known paenilamicin BGC. Paenilamicin [33], synthesized by pam BGC from Paenibacillus larvae DSM25430, has antibacterial and antifungal activity. The pam cluster consists of five NRPs genes, two Type I PKs genes, and two Type I PKs-NRPs hybrid genes, and has a size of ∼60 kb. In contrast, the Type I PKs-NRPs hybrid BGC identified in B. endophyticus FH5 consists of only three NRPS genes and one Type I PKS gene. All of them differ from the pam cluster of Paenibacillus larvae DSM25430.
In the genome of Paenibacillus sp. EDO6, two novel trans-AT PKs-NRPs hybrid gene clusters (cluster 13 and cluster 12) were discovered, which have the sizes of almost 35 kb and 28 kb, respectively (Figures 4c and 4d). The order and domain of the genes of both hybrid clusters differ from each other. Specifically, Cluster 13 has an additional dehydratase domain variant (DHt) playing an important role during polyketide biosynthesis through the dehydration of the nascent polyketide intermediate to provide olefins [34], which cannot be found in cluster 12. In addition to the differences observed at the domain level of core biosynthetic genes, regulator and transporter genes are also different. Moreover, only 33% and 21% of the genes of cluster 13 and cluster 12 exhibit similarity to known pellasoren and xenocoumacin BGCs respectively. Pellasoren [35] was isolated from myxobacterium, which has shown to possess potential anti-cancer activity. The known pellasoren BGC, is a Type I PKs-NRPs hybrid cluster identified from Sorangium cellulosum So ce38 and consists of six genes of Type I PKs and one single gene of NRPs as compared to the trans-AT PKs-NRPs hybrid gene (cluster 13) of Paenibacillus sp. EDO6, which in turn consists of four trans-AT PKs genes and one trans-AT PKs-NRPs hybrid gene. Xenocoumacin [36] is the main anti-bacterial and anti-fungal compound produced by Xenorhabdus nematophila. The known xenocoumacin BGC, also being a Type I PKs-NRPs hybrid cluster, which was identified from Xenorhabdus nematophila ATCC 19061, consists of four genes of Type I PKs and two genes of NRPs whereas cluster 12 from Paenibacillus sp. EDO6 consists of one single trans-AT domain gene, one gene of trans-AT PKs and one gene of trans-AT PKs-NRPs hybrid.
One novel NRPs BGC was discovered both in B. velezensis FH17 and TH16 (Figure 4e). This BGC contains seven genes with a size of approximately 33 kb. Whereas seven modules are only encoded by two core biosynthetic genes, seven amino acids are incorporated into the final product. This BGC shows no similarity to any known clusters. Furthermore, a single heterocyclization (Cy) domain in the first module is found.
Novel Ribosomally synthesized and Post-translationally modified Peptides (RiPPs) identified in the 10 strains
A total of nine novel bacteriocin BGCs were identified from the 10 strains (Figure 5). All of them are belong to RiPPs (less than 10 kDa). These peptides are ribosomally synthesized, and undergo posttranslational modifications (PTMs), resulting in different structures and properties, mainly showing anti-bacterial activity against closely related producer strains [37].
Two novel gene clusters were identified as class I lanthipeptide BGCs. One lanthipeptide BGC was identified from both B. subtilis DH12 and EH11 with a size of ∼6 kb (Figure 5a). This BGC consists of four genes. The precursor peptide contains 59 amino acids, which shows no similarity to any known bacteriocins. Another one lanthipeptide BGC (Figure 5b) was identified from Paenibacillus sp. EDO6 with a size of ∼9 kb. This BGC contains seven genes. The precursor peptide encoded by the core biosynthetic gene contains 59 amino acids, which also shows no similarity to any known bacteriocins.
Three novel BGCs were identified as class II lanthipeptide BGCs. All of them belong to two-component lanthipeptides consisting of two peptides. The individual peptides of two-component lanthipeptides only have little or no antimicrobial activity, but the two peptides act in synergy to exhibit significantly higher activity in equimolar concentrations [38]. Both B. cabrialesii BH5 and BH6 harbor the same two-component lanthipeptide BGC (Figure 5c). It consists of six genes with a size of ∼9 kb. This BGC has 70% of genes showing similarity to staphylococcin C55 α/β BGC [39]. The presursors of two core biosynthetic genes (α and β) of this BGC identified contain 65 and 67 amino acids respectively. The C terminus (from C36 to K65) of the α precursor is belonging to the plantaricin C family of lantibiotics with a identity of 83.33% to the known peptide staphylococcin C55 α. Whereas the C terminus (from I38 to C67) of the β precursor shows 62.07% identity to lacticin 3147 A2 [40]. The second novel class II lanthipeptide BGC was discovered from B. subtilis EH5 (Figure 5d). This BGC has six genes with a length of ∼9 kb. The presursors of two core peptide genes (α and β) contain 65 and 67 amino acids respectively. It is also showing 70% gene sequence similarity to staphylococcin C55 α/β BGC. The C terminus (from C36 to C64) of the α precursor has a similarity of 79.31% to the known peptide staphylococcin C55 α and the C terminus (from W38 to C63) of the β precursor is showing 72% identity to lacticin 3147 A2. The third BGC was identified from B. endophyticus FH5 (Figure 5e). It is comprised of nine genes with a size of ∼10 kb. Its precursors of two peptides (α and β) contain 58 and 54 amino acids respectively. There is no similarity found to any known BGCs. The C terminal region (from A28 to C58) of the α precursor has a similarity of 53.33% to the known peptide plantaricin W α [41] and the C terminus (from A23 to D54) of the β precursor is showing 56.25% identity to haloduracin β [42]. Furthermore, the precursor β in this potential novel BGC found in B. endophyticus FH5 has four replicates, indicating potential high amount production of β peptide.
Two novel gene clusters were identified as class III lanthipeptide BGCs. This Class contains RiPPs that are modified by the mutifunctional enzymes LanKC. LanKC firstly phosphorylates the Ser/Thr residuses in the substrate peptide and then similarly catalytizes modification of the substrate to form the final product, as the class II lanthipeptide LanM enzyme [43]. The one identified from B. subtilis EH2 contains ten genes with a size of ∼8 kb (Figure 5f). No similarity was found to any known BGCs. The full precursor contains 58 amino acids. The predicted cleaveage site by antiSMASH is between T27 and G28. The C terminus (from G28 to N58) of the precursor has no identity to any known RiPPs. The other class III lanthipeptide BGC is harbored by B. velezensis TH16 (Figure 5g). This one contains five genes with a length of ∼5 kb. The core biosynthetic gene encodes a 45-amino acid precursor peptide. 35% genes of this BGC show similarity to locillomycin [44], which is a cyclic lipopeptide (NRPs) discovered from B. subtilis 916. The predicted cleaveage site is between V21 and D22 by antiSMASH and the C terminus (from D22 to C45) of the precursor has no identity to any known RiPPs.
Two novel lasso peptide BGCs were identified from the genomes of Paenibacillus sp. EDO6 and B. endophyticus FH5. The one from Paenibacillus sp. EDO6 contains eight genes with a size of ∼8 kb (Figure 5h). It shows that gene sequences are 60% similar to that of the paeninodin BGC [45]. The precursor peptide contains 45 amino acids. The predicted cleaveage site is between M22and A23. The core peptide (from A23 to S45) shows 33.3% identity to the paeninodin[45] from P. dendritiformis C454. Another novel lasso peptide BGC was mined from B. endophyticus FH5 (Figure 5i). This BGC comprised of six genes. It is showing 80% genes similarity to paeninodin. Its precursor peptide contains 45 amino acids. The cleaveage site is between M20 and A21. The core peptide (from A21 to S45) has 76% identity to the paeninodin.
Large-scale genome-based analysis of the bioactive potential of Bacillus
Lipopetides produced by the Bacillus genus are involved in the biocontrol mechanisms of plant pathogens [46]. To gain a general overview of BGCs distributed in the genomes of Bacillus genus, the diversity of BGCs in the genomes of Bacillus isolated was investigated. A total of 9459 BGCs were predicted and identified, which included NRPs (2377 BGCs), RiPPs (1564 BGCs), Type I PKs (517 BGCs), PKs-NRPs hybrids (309 BGCs), PKs (including Trans AT-PKs and Type III PKs) (1369 BGCs), Terpene (970 BGCs), Saccharide (62 BGCs) and Others (2291 BGCs).
The similarity network of predicted BGCs revealed that a large number of BGCs are present in Bacillus strains, and are distributed throughout different kinds of secondary metabolites (Figure 6). Based on our investigation, some of the NRPs BGCs were conserved among the BGCs identified in the Bacillus species. 259 out of 2377 (10.85%) NRPs BGCs were encoding surfactin, 330 (13.88%) BGCs were encoding bacillibactin, 110 (4.63%) NRPs BGCs were encoding fengycin, 158 (6.65%) NRPs BGCs were encoding petrobactin [47]. And 38 (1.60%) NRPs BGCs were encoding lichenysin [48]. Thus, a total of ∼38% of the NRPs BGCs are correlated to already reported compounds. Additionally, most of PKs-NRPs hybrid BGCs (67.64%) were ecoding bacillaene. Unlike the well-described NRPs and PKs-NRPs hybrid BGCs, the PKs BGCs were mostly attributed to unknown products with the exception of macrolactin [49] and difficidin [30]. Notably, 1357 out of 1564 (87.76%) RiPPs BGCs were also unknown. Overall, the distribution of known and unknown BGCs vary dramatically across the different kinds of metabolites in Bacillus species, in which the NRPs BGCs are the most abundant ones, comprising 2377 BGCs. Many of them are conserved and already characterized, but still a large number of unknown NRPs BGCs are identified for further study.