Isolation of Bacteria and In Vitro Antagonistic Assay
A total of 351 bacterial strains were isolated from healthy tomato rhizosphere soil and tomato plant tissue collected in either the Netherlands or Spain. Of these, 181 were considered Bacillus-like strains based on colony morphology and on survival of the preceding 80 °C treatment, which selects for spore formers. To identify potential PGPR strains, all Bacillus-like strains were preliminarily screened in vitro for antagonistic activity against six major tomato pathogens, i.e. Erwinia carotovora, Pseudomonas syringae, Rhizoctonia solani, Botrytis cinerea, Verticillium dahliae, and Phytophthora infestans. The results revealed that 34 Bacillus-like strains inhibited the growth of different bacterial, fungal and oomycete plant pathogens on plates (Fig. 1). A phylogenetic analysis based on 16S rRNA revealed that these strains belong to the species Bacillus subtilis (18 strains), Bacillus velezensis (3 strains), Bacillus endophyticus (3 strains), Bacillus megaterium (4 strains), Bacillus aryabhattai (2 strains), Bacillus cereus (2 strains), Bacillus firmus (1 strain) and Paenibacillus xylanexedens (1 strain). In total, 14 strains inhibited bacterial, fungal, and oomycete pathogens. Among them, 12 strains belong to the Bacillus subtilis group, while the remaining 2 strains belong to the Bacillus velezensis group. The other strains showed antagonistic activity only against fungi and oomycetes. A total of 10 strains (BH5, BH6, DH12, EH2, EH5, EH11, FH5, FH17, TH16 and edo6), all showing high antagonistic activity, were genome sequenced for further research.
Genome Sequencing and Biosynthesis Gene Cluster Mining
The genomes of the 10 isolated strains were sequenced, assembled and annotated as described in a previous study [17]. Based on whole-genome phylogenetic analyses, the 10 strains clustered into four clades, as presented in Fig. 2. All of them clustered tightly with reported PGPR strains of the class Bacilli, such as B. subtilis Bsn5, B. velezensis FZB42 and P. polymyxa E681. This suggests that they may promote plant growth as well, which needs to be further investigated. Strains BH5, BH6, DH12, EH2, EH5, and EH11 fall into the B. subtilis group, FH17 and TH16 were identified as B. velezensis, while FH5 and edo6 belong to B. endophyticus and P. xylanexedens, respectively. The ten strains were selected from the rhizosphere soil and plant tissues because of their activity against tomato phytopathogens, which suggested the presence of important antimicrobial gene clusters. Using antiSMASH 5.0 [18] and BAGEL4 [19], a total of 120 BGCs were found, averaging 12 clusters per genome. The BGCs were classified as encoding NRPSs, PKSs, terpenes, hybrid NRPS/PKSs, bacteriocins, RiPPs and others (Table 1A). BGCs encoding surfactin [20], fengycin [20], bacillibactin [21], subtilosin A [22], bacillaene [23], macrolactin [24], difficidin [25], and subtilin [26] were discovered in the genomes. In addition, some BGCs encoding unknown compounds were identified (Table 1B). Most of the PKS BGCs (76.47%) could not be assigned to any known compound. Of the bacteriocin BGCs, 73.07% encode potentially novel peptides, and 27.78% of the NRPS BGCs and 27.27% of the hybrid BGCs likewise remain unassigned. These findings provide a great opportunity for the discovery of new bioactive compounds.
A.
Strains | Predicted BGCs | NRPS | PKS | Hybrid NRPS/PKS | Terpene | Bacteriocin | Other |
Bacillus subtilis BH5 | 12 | 4 | 1 | 1 | 2 | 3 | 1 |
Bacillus subtilis BH6 | 12 | 4 | 1 | 1 | 2 | 3 | 1 |
Bacillus subtilis DH12 | 12 | 4 | 1 | 1 | 2 | 3 | 1 |
Bacillus subtilis EH2 | 10 | 3 | 1 | 1 | 2 | 2 | 1 |
Bacillus subtilis EH5 | 11 | 3 | 1 | 1 | 2 | 3 | 1 |
Bacillus subtilis EH11 | 12 | 4 | 1 | 1 | 2 | 3 | 1 |
Bacillus endophyticus FH5 | 10 | 2 | 1 | 1 | 2 | 3 | 1 |
Bacillus velezensis FH17 | 15 | 5 | 4 | 1 | 2 | 1 | 2 |
Bacillus velezensis TH16 | 12 | 4 | 4 | 1 | 1 | 1 | 1 |
Paenibacillus xylanexedens edo6 | 14 | 3 | 2 | 2 | 1 | 4 | 1 |
B.
BGC Types | Total BGCs | % Unknown | Known compounds |
NRPSs | 36 | 27.78 | surfactin (8 BGCs), fengycin (8 BGCs), bacillibactin (10 BGCs) |
PKSs | 17 | 76.47 | macrolactin (2 BGCs), difficidin (2 BGCs) |
Hybrids | 11 | 27.27 | bacillaene (8 BGCs) |
Bacteriocin | 26 | 73.07 | subtilin (2 BGCs), subtilosin A (6 BGCs) |
Table 1. Distribution of BGCs across the 10 isolated strains (A) and percentages of BGCs encoding unknown compounds identified in the genome sequences (B).
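The per-class totals in Table 1B can be cross-checked against the per-strain counts in Table 1A. The following is a minimal Python sketch of that tally, using the counts transcribed from Table 1A. The names `CLASSES`, `TABLE_1A`, `PREDICTED` and `class_totals` are illustrative and are not part of the original analysis.

```python
# Column order used for every row of TABLE_1A below.
CLASSES = ("NRPS", "PKS", "Hybrid", "Terpene", "Bacteriocin", "Other")

# Per-strain BGC counts transcribed from Table 1A.
TABLE_1A = {
    "BH5":  (4, 1, 1, 2, 3, 1),
    "BH6":  (4, 1, 1, 2, 3, 1),
    "DH12": (4, 1, 1, 2, 3, 1),
    "EH2":  (3, 1, 1, 2, 2, 1),
    "EH5":  (3, 1, 1, 2, 3, 1),
    "EH11": (4, 1, 1, 2, 3, 1),
    "FH5":  (2, 1, 1, 2, 3, 1),
    "FH17": (5, 4, 1, 2, 1, 2),
    "TH16": (4, 4, 1, 1, 1, 1),
    "edo6": (3, 2, 2, 1, 4, 1),
}

# "Predicted BGCs" column of Table 1A.
PREDICTED = {
    "BH5": 12, "BH6": 12, "DH12": 12, "EH2": 10, "EH5": 11,
    "EH11": 12, "FH5": 10, "FH17": 15, "TH16": 12, "edo6": 14,
}


def class_totals(table):
    """Sum each BGC class over all strains."""
    totals = dict.fromkeys(CLASSES, 0)
    for counts in table.values():
        for name, n in zip(CLASSES, counts):
            totals[name] += n
    return totals


totals = class_totals(TABLE_1A)
mean_per_genome = sum(PREDICTED.values()) / len(PREDICTED)

print(totals["NRPS"], totals["PKS"], totals["Hybrid"], totals["Bacteriocin"])
# prints: 36 17 11 26
print(mean_per_genome)
# prints: 12.0
```

The four class totals printed here correspond to the "Total BGCs" column of Table 1B, and the mean reproduces the average of 12 clusters per genome.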
Novel NRPs and PKs BGCs identified from the 10 strains
The majority of BGCs could be assigned to known compounds, whereas 5 clusters probably represent novel NRPs and NRPs/PKs hybrid BGCs, for which either no similar BGCs or only BGCs of low similarity could be identified in the MIBiG [27] database (Fig. 3).
Two novel gene clusters were identified from B. endophyticus FH5. The first is an NRPs BGC (Fig. 3a) that consists of three genes with a total size of 25 kb. The three genes encode 24 domains: 7 condensation (C) domains, 7 adenylation (A) domains, 7 thiolation (T) domains, 2 epimerization (E) domains and 1 thioesterase (TE) domain. All the domains are essential components of this BGC and catalyze the primary formation of a lipopeptide product. This BGC shows no similarity to any known BGC. The second (Fig. 3b) is a Type I PKs-NRPs hybrid BGC with a size of approximately 30 kb. The PKs module consists of a ketosynthase (KS) domain, an acyltransferase (AT) domain, an acyl carrier protein (ACP) domain and a terminal reductase (TD) domain. It likely incorporates a polyketide moiety derived from malonyl-CoA, while the NRPs modules incorporate six amino acid residues. Based on antiSMASH analysis, only 28% of the genes show similarity to the known paenilamicin BGC. Paenilamicin [28], synthesized by the pam BGC of Paenibacillus larvae DSM25430, has antibacterial and antifungal activity. The pam cluster consists of five NRPs genes, two Type I PKs genes, and two Type I PKs-NRPs hybrid genes, and has a size of ∼60 kb. In contrast, the Type I PKs-NRPs hybrid BGC identified in B. endophyticus FH5 consists of only three NRPs genes and one Type I PKs gene, all of which differ from those of the pam cluster of Paenibacillus larvae DSM25430.
In the genome of P. xylanexedens edo6, two novel trans-AT PKs-NRPs hybrid gene clusters (cluster 13 and cluster 12) were discovered, with sizes of almost 35 kb and 28 kb, respectively (Fig. 3c and 3d). The two hybrid clusters differ from each other in gene order and domain composition. Specifically, cluster 13 has an additional dehydratase domain variant (DHt), which is absent from cluster 12; this domain plays an important role in polyketide biosynthesis by dehydrating the nascent polyketide intermediate to provide olefins [29]. In addition to the differences at the domain level of the core biosynthetic genes, the regulator and transporter genes also differ. Moreover, only 33% and 21% of the genes of cluster 13 and cluster 12 exhibit similarity to the known pellasoren and xenocoumacin BGCs, respectively. Pellasoren [30] was isolated from a myxobacterium and has been shown to possess potential anti-cancer activity. The known pellasoren BGC is a Type I PKs-NRPs hybrid cluster identified from Sorangium cellulosum So ce38 and consists of six Type I PKs genes and a single NRPs gene, whereas the trans-AT PKs-NRPs hybrid cluster 13 of P. xylanexedens edo6 consists of four trans-AT PKs genes and one trans-AT PKs-NRPs hybrid gene. Xenocoumacin [31] is the main anti-bacterial and anti-fungal compound produced by Xenorhabdus nematophila. The known xenocoumacin BGC, also a Type I PKs-NRPs hybrid cluster, was identified from Xenorhabdus nematophila ATCC 19061 and consists of four Type I PKs genes and two NRPs genes, whereas cluster 12 from P. xylanexedens edo6 consists of a single trans-AT domain gene, one trans-AT PKs gene and one trans-AT PKs-NRPs hybrid gene.
One novel NRPs BGC was discovered in both B. velezensis FH17 and TH16 (Fig. 3e). This BGC contains seven genes and has a size of approximately 33 kb. Its two core biosynthetic genes encode seven modules, so seven amino acids are predicted to be incorporated into the final product. This BGC shows no similarity to any known cluster. Furthermore, a single heterocyclization (Cy) domain is found in the first module. This domain first catalyzes amide bond formation, after which intramolecular cyclodehydration between the side chain of the first amino acid (Cys) and the backbone carbonyl carbon forms a thiazoline ring [32]. This ring is important for the structure and function of the lipopeptide product. Many well-known anti-microbial and anti-cancer drugs contain thiazoline rings [33], such as Sulfathiazole (anti-microbial drug), Ritonavir (anti-retroviral drug), Tiazofurin (anti-neoplastic drug) and Abafungin (anti-fungal drug) [34]. These findings suggest potential anti-microbial activity of the compounds produced by this BGC in B. velezensis FH17 and TH16.
Novel Ribosomally synthesized and Post-translationally modified Peptides (RiPPs) identified in the 10 strains
A total of nine novel bacteriocin BGCs were identified from the 10 strains (Fig. 4). All of them belong to the RiPPs (less than 10 kDa). These peptides are ribosomally synthesized and undergo posttranslational modifications (PTMs), resulting in different structures and properties, and mainly show anti-bacterial activity against strains closely related to the producer [35].
Two novel gene clusters were identified as class I lanthipeptide BGCs. One lanthipeptide BGC, with a size of ∼6 kb, was identified in both B. subtilis DH12 and EH11 (Fig. 4a). This BGC consists of four genes. The precursor peptide contains 59 amino acids and shows no similarity to any known bacteriocin. The other lanthipeptide BGC (Fig. 4b), with a size of ∼9 kb, was identified in P. xylanexedens edo6. This BGC contains seven genes. The precursor peptide encoded by the core biosynthetic gene contains 59 amino acids and also shows no similarity to any known bacteriocin.
Three novel BGCs were identified as class II lanthipeptide BGCs. All of them encode two-component lanthipeptides, which consist of two peptides. The individual peptides of two-component lanthipeptides have little or no antimicrobial activity on their own, but the two peptides act in synergy and exhibit significantly higher activity at equimolar concentrations [36]. Both B. subtilis BH5 and BH6 harbor the same two-component lanthipeptide BGC (Fig. 4c). It consists of six genes with a size of ∼9 kb, and 70% of its genes show similarity to the staphylococcin C55 α/β BGC [37]. The precursors encoded by the two core biosynthetic genes (α and β) of this BGC contain 65 and 67 amino acids, respectively. The C terminus (from C36 to K65) of the α precursor belongs to the plantaricin C family of lantibiotics, with an identity of 83.33% to the known peptide staphylococcin C55 α, whereas the C terminus (from I38 to C67) of the β precursor shows 62.07% identity to lacticin 3147 A2 [38]. The second novel class II lanthipeptide BGC was discovered in B. subtilis EH5 (Fig. 4d). This BGC has six genes with a length of ∼9 kb. The precursors encoded by the two core peptide genes (α and β) contain 65 and 67 amino acids, respectively. It also shows 70% gene similarity to the staphylococcin C55 α/β BGC. The C terminus (from C36 to C64) of the α precursor has a similarity of 79.31% to the known peptide staphylococcin C55 α, and the C terminus (from W38 to C63) of the β precursor shows 72% identity to lacticin 3147 A2. The third BGC was identified in B. endophyticus FH5 (Fig. 4e). It comprises nine genes with a size of ∼10 kb. The precursors of its two peptides (α and β) contain 58 and 54 amino acids, respectively. No similarity was found to any known BGC. The C-terminal region (from A28 to C58) of the α precursor has a similarity of 53.33% to the known peptide plantaricin W α [39], and the C terminus (from A23 to D54) of the β precursor shows 56.25% identity to haloduracin β [40]. Furthermore, the β precursor gene in this potentially novel BGC of B. endophyticus FH5 is present in four copies, which may indicate high-level production of the β peptide.
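The identity values quoted above are percentages of matching positions between an aligned region of a precursor and a known peptide. As an illustration only, the sketch below computes percent identity over the columns of a pairwise alignment. The function name `percent_identity`, the choice of dividing by the full alignment length, and the toy sequences are assumptions for this example; the text does not state which alignment tool or identity definition was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps written as `gap`).
    Columns containing a gap never count as matches, but they do count
    towards the alignment length used as the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment with hypothetical sequences (not the peptides discussed here):
# 9 of the 12 columns are identical.
query = "ACDEFGHIKLMN"
subject = "ACDEFGHIKAAA"
print(percent_identity(query, subject))
# prints: 75.0
```

Under this definition, a value of 83.33% would be consistent with, for instance, 25 identical positions in 30 aligned columns. Other conventions (for example, excluding gap columns from the denominator) give slightly different numbers for the same alignment.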
Two novel gene clusters were identified as class III lanthipeptide BGCs. This class contains RiPPs that are modified by the multifunctional enzyme LanKC. LanKC first phosphorylates the Ser/Thr residues in the substrate peptide and then, similarly to the class II lanthipeptide enzyme LanM, catalyzes modification of the substrate to form the final product [41]. The cluster identified in B. subtilis EH2 contains ten genes with a size of ∼8 kb (Fig. 4f). No similarity was found to any known BGC. The full precursor contains 58 amino acids. The cleavage site predicted by antiSMASH lies between T27 and G28. The C terminus (from G28 to N58) of the precursor has no identity to any known RiPP. The other class III lanthipeptide BGC is harbored by B. velezensis TH16 (Fig. 4g). It contains five genes with a length of ∼5 kb. The core biosynthetic gene encodes a 45-amino-acid precursor peptide. Of the genes in this BGC, 35% show similarity to the BGC of locillomycin [42], a cyclic lipopeptide (NRPs) discovered in B. subtilis 916. The cleavage site predicted by antiSMASH lies between V21 and D22, and the C terminus (from D22 to C45) of the precursor has no identity to any known RiPP.
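A predicted cleavage site divides each precursor into an N-terminal leader and a C-terminal core peptide. The following minimal sketch shows how the residue ranges quoted above follow from the cleavage position. The function `split_precursor` and the placeholder sequence are hypothetical; the real precursor sequences are not reproduced here.

```python
def split_precursor(precursor, last_leader_pos):
    """Split a precursor peptide into (leader, core).

    `last_leader_pos` is the 1-based position of the final leader residue,
    so a cleavage site "between T27 and G28" corresponds to 27.
    """
    if not 0 < last_leader_pos < len(precursor):
        raise ValueError("cleavage site must lie inside the precursor")
    return precursor[:last_leader_pos], precursor[last_leader_pos:]


# Placeholder 58-residue precursor (hypothetical residues): T at position 27,
# G at position 28 and N at position 58, mirroring the EH2 description.
placeholder = "M" + "A" * 25 + "T" + "G" + "S" * 29 + "N"

leader, core = split_precursor(placeholder, 27)
print(len(leader), len(core))
# prints: 27 31
print(leader[-1], core[0], core[-1])
# prints: T G N
```

By the same arithmetic, the 45-residue precursor of B. velezensis TH16 with cleavage between V21 and D22 yields a 24-residue core (D22 to C45).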
Two novel lasso peptide BGCs were identified in the genomes of P. xylanexedens edo6 and B. endophyticus FH5. The one from P. xylanexedens edo6 contains eight genes with a size of ∼8 kb (Fig. 4h). Of its genes, 60% show similarity to those of the paeninodin BGC [43]. The precursor peptide contains 45 amino acids. The predicted cleavage site lies between M22 and A23. The core peptide (from A23 to S45) shows 33.3% identity to paeninodin [43] from P. dendritiformis C454. The other novel lasso peptide BGC was mined from B. endophyticus FH5 (Fig. 4i). This BGC comprises six genes, 80% of which show similarity to the paeninodin BGC. Its precursor peptide contains 45 amino acids. The cleavage site lies between M20 and A21. The core peptide (from A21 to S45) has 76% identity to paeninodin.
Large-scale genome-based analysis of the bioactive potential of Bacillus
Lipopeptides produced by the Bacillus genus are involved in the biocontrol of plant pathogens [44]. To gain a general overview of the BGCs distributed across the genus, the diversity of BGCs in sequenced Bacillus genomes was investigated. The complete genomes of 555 Bacillus strains from 60 species of Bacillales were downloaded from GenBank and analyzed with antiSMASH 5.0 [18]. A total of 9459 BGCs were predicted and identified, comprising NRPs (2377 BGCs), RiPPs (1564 BGCs), Type I PKs (517 BGCs), PKs-NRPs hybrids (309 BGCs), PKs (including trans-AT PKs and Type III PKs) (1369 BGCs), Terpene (970 BGCs), Saccharide (62 BGCs) and Others (2291 BGCs). The BGCs were then analyzed using BiG-SCAPE [45], a program that constructs sequence similarity networks of BGCs and groups them into Gene Cluster Families (GCFs). For visualization, the distance matrix between BGCs generated by BiG-SCAPE was loaded into Cytoscape [46]. The similarity network of predicted BGCs revealed that a large number of BGCs are present in Bacillus strains and that they are distributed across different kinds of secondary metabolites (Fig. 5). Based on our investigation, some of the NRPs BGCs were conserved among the Bacillus species. Of the 2377 NRPs BGCs, 259 (10.85%) encoded surfactin, 330 (13.88%) encoded bacillibactin, 110 (4.63%) encoded fengycin, 158 (6.65%) encoded petrobactin [47], and 38 (1.60%) encoded lichenysin [48]. Thus, a total of ∼38% of the NRPs BGCs correspond to already reported compounds. Additionally, most of the PKs-NRPs hybrid BGCs (67.64%) encoded bacillaene. Unlike the well-described NRPs and PKs-NRPs hybrid BGCs, the PKs BGCs were mostly attributed to unknown products, with the exception of macrolactin [49] and difficidin [25]. Notably, 1357 out of 1564 (87.76%) RiPPs BGCs were also unknown.
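To illustrate how a pairwise distance matrix can be turned into families of related BGCs, the sketch below groups BGCs as connected components of a graph in which two BGCs are linked when their distance is at or below a cutoff. This is a simplified stand-in and not BiG-SCAPE's own clustering procedure; the function name `group_into_families`, the cutoff of 0.3 and the toy distances are assumptions chosen for the example.

```python
from collections import defaultdict


def group_into_families(bgc_ids, distances, cutoff=0.3):
    """Group BGCs into families by single-linkage at a distance cutoff.

    `distances` maps (id_a, id_b) pairs to a distance in [0, 1]. Two BGCs end
    up in the same family if a chain of pairs with distance <= cutoff
    connects them (connected components, found with union-find).
    """
    parent = {b: b for b in bgc_ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = defaultdict(list)
    for b in bgc_ids:
        families[find(b)].append(b)
    return sorted(sorted(members) for members in families.values())


# Toy input with hypothetical BGC identifiers and distances.
ids = ["bgc1", "bgc2", "bgc3", "bgc4", "bgc5"]
toy_distances = {
    ("bgc1", "bgc2"): 0.10,
    ("bgc2", "bgc3"): 0.25,
    ("bgc3", "bgc4"): 0.80,
    ("bgc4", "bgc5"): 0.20,
    ("bgc1", "bgc5"): 0.90,
}
print(group_into_families(ids, toy_distances))
# prints: [['bgc1', 'bgc2', 'bgc3'], ['bgc4', 'bgc5']]
```

In a network view such as the one drawn in Cytoscape, each returned group would appear as one connected cluster of nodes, and BGCs with no close neighbour would remain as singletons.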
Overall, the distribution of known and unknown BGCs varies dramatically across the different kinds of metabolites in Bacillus species. The NRPs BGCs are the most abundant, comprising 2377 BGCs. Many of them are conserved and already characterized, but a large number of unknown NRPs BGCs remain for further study.
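The overall BGC total and the share of NRPs BGCs assigned to known compounds follow directly from the counts reported above. A minimal Python sketch of that arithmetic is given below; the dictionary names are illustrative, and the label "Other PKs" is shorthand used here for the trans-AT and Type III PKs category.

```python
# BGC counts per class, as reported for the 555 analyzed genomes.
BGC_COUNTS = {
    "NRPs": 2377,
    "RiPPs": 1564,
    "Type I PKs": 517,
    "PKs-NRPs hybrids": 309,
    "Other PKs": 1369,  # trans-AT PKs and Type III PKs
    "Terpene": 970,
    "Saccharide": 62,
    "Others": 2291,
}

# NRPs BGCs assigned to known compounds, as reported in the text.
KNOWN_NRPS = {
    "surfactin": 259,
    "bacillibactin": 330,
    "fengycin": 110,
    "petrobactin": 158,
    "lichenysin": 38,
}

total_bgcs = sum(BGC_COUNTS.values())
known_nrps = sum(KNOWN_NRPS.values())
known_share = 100.0 * known_nrps / BGC_COUNTS["NRPs"]

print(total_bgcs)
# prints: 9459
print(known_nrps, round(known_share, 2))
# prints: 895 37.65
```

The computed share of 37.65% rounds to the ∼38% of NRPs BGCs attributed to already reported compounds.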