Summary of the sequencing and assembly of strain D30202. The whole genome of strain D30202 was sequenced using a combination of third-generation PacBio and second-generation Illumina technologies. The assembled genome was 3,777,154 bp with an average GC content of 41.32%, and contained 3809 coding genes, 3 prophages, 82 tRNAs and 24 rRNAs. In addition, 63 tandem repeats and 29 interspersed repeats were identified (Fig. S1), with total lengths of 7803 bp and 1957 bp, respectively, and sequence ratios of 0.05% each. The whole-genome size and average GC content of strain D30202 (Table 1) fall within the reported ranges for Bacillus species (genome size 1.13 to 9.5 Mb; GC content 32.8% to 70.7%).
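The range check described above can be sketched in Python. This is an illustration only: the function names are hypothetical, `gc_content` assumes that ambiguous bases such as N are excluded from the denominator, and the bounds are the Bacillus ranges quoted in the paragraph.

```python
# Reported ranges for Bacillus genomes, as quoted in the paragraph above.
GENOME_SIZE_RANGE_MB = (1.13, 9.5)
GC_RANGE_PERCENT = (32.8, 70.7)


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (ambiguous bases such as N are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def within_bacillus_range(genome_size_bp: int, gc_percent: float) -> bool:
    """True if both genome size and GC content fall inside the quoted Bacillus ranges."""
    size_mb = genome_size_bp / 1_000_000
    size_lo, size_hi = GENOME_SIZE_RANGE_MB
    gc_lo, gc_hi = GC_RANGE_PERCENT
    return size_lo <= size_mb <= size_hi and gc_lo <= gc_percent <= gc_hi


print(gc_content("ATGCGCAT"))                   # 50.0 (toy sequence, 4 of 8 bases are G/C)
print(within_bacillus_range(3_777_154, 41.32))  # True (values reported for D30202)
```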
Table 1 Assembly and component information for the B. altitudinis D30202 genome
| Category | Feature | B. altitudinis D30202 |
|---|---|---|
| Assembly information | Genome size (bp) | 3,777,154 |
| | GC content (%) | 41.32 |
| Component analysis information | CDS | 3809 |
| | GC content (%) | 41.82 |
| | Prophages | 3 |
| | tRNAs | 82 |
| | 16S rRNAs | 8 |
| | 5S rRNAs | 8 |
| | 23S rRNAs | 8 |
| | Tandem repeats | 63 |
| | Interspersed repeats | 29 |
Annotation analysis of basic functions of the strain D30202 genome. The protein sequences of the predicted genes were aligned against the Nr, Swiss-Prot, KEGG and COG functional annotation databases using BLAST and DIAMOND, covering a total of 3809 genes (Fig. S2). Of these, 3805, 3125, 2774 and 2280 genes were annotated by Nr, Swiss-Prot, COG and KEGG, respectively; 4 genes remained unannotated, and 2064 genes were annotated by all four databases. Based on the assembled genome sequence, together with the coding gene predictions and COG annotation, a circular genome map (Fig. S3) and a COG functional classification map (Fig. 1) were drawn. Among the 3809 predicted genes, 3354 were assigned to COG functional categories, including 596 genes in the information storage and processing category and 549 in the cellular processes and signaling category, of which 54 were related to defense mechanisms and 171 to signal transduction mechanisms. In addition, 1493 genes belonged to the metabolism category, 425 to the general function prediction only category, and 291 had unknown functions.
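The overlap counts in this paragraph (genes annotated per database, by all four databases, or by none) are set operations. The sketch below shows that bookkeeping on hypothetical toy gene IDs; it does not reproduce the study's data or its alignment step.

```python
from functools import reduce


def annotation_summary(all_genes: set, annotations: dict):
    """Summarise multi-database annotation.

    annotations maps a database name to the set of gene IDs it annotated.
    Returns (hits per database, genes annotated by every database, unannotated genes).
    """
    hit_sets = [genes & all_genes for genes in annotations.values()]
    per_db = {db: len(genes & all_genes) for db, genes in annotations.items()}
    shared_by_all = reduce(set.intersection, hit_sets) if hit_sets else set()
    unannotated = all_genes - set().union(*hit_sets)
    return per_db, shared_by_all, unannotated


# Hypothetical toy data: six genes, four databases.
genes = {f"g{i}" for i in range(1, 7)}
toy = {
    "Nr": {"g1", "g2", "g3", "g4", "g5"},
    "Swiss-Prot": {"g1", "g2", "g3"},
    "COG": {"g1", "g2", "g4"},
    "KEGG": {"g1", "g2"},
}
per_db, shared, missing = annotation_summary(genes, toy)
print(per_db)           # {'Nr': 5, 'Swiss-Prot': 3, 'COG': 3, 'KEGG': 2}
print(sorted(shared))   # ['g1', 'g2']
print(sorted(missing))  # ['g6']
```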
KEGG metabolic pathway prediction. Analysis of the 134 metabolic pathways annotated by KEGG (Fig. 2) showed that strain D30202 possesses a complete phenylalanine metabolic pathway leading to the herbicidal active substance 4-hydroxy-3-methoxycinnamic acid (ferulic acid), and a complete tryptophan metabolic pathway leading to herbicidal indole derivatives (Fig. 3, Fig. S4). Comparison further showed that strain D30202 has multiple pathways for producing indole acetic acid, a downstream product of indole derivatives. A detailed search did not identify a pathway producing 2-(hydroxymethyl)phenol, but a search of related pathways identified its upstream precursor, 2-hydroxyphenylacetic acid (Fig. S5). The enzymes and marker genes of the 4-hydroxy-3-methoxycinnamic acid biosynthesis pathway were identified by aligning the predicted proteins against the KEGG database with DIAMOND. The main enzymes involved are as follows. Phenylalanine ammonia-lyase ([EC:4.3.1.24], [EC:4.3.1.25]) catalyzes the direct elimination of ammonia from L-phenylalanine to produce trans-cinnamic acid. Trans-cinnamate 4-monooxygenase ([EC:1.14.14.91]) uses O2 to oxidize NADPH to NADP+ and water while hydroxylating the 4-position of the benzene ring of trans-cinnamic acid to form 4-hydroxycinnamate (4-coumarate). p-Coumarate 3-hydroxylase ([EC:1.14.13.-]) catalyzes hydroxylation at C-3 of 4-coumarate to generate caffeate (3,4-dihydroxy-trans-cinnamate). Finally, caffeic acid 3-O-methyltransferase ([EC:2.1.1.68]) catalyzes the conversion of caffeate to 4-hydroxy-3-methoxycinnamic acid.
The marker gene D30202_01823 encodes a lyase, D30202_03503 encodes an oxidoreductase, D30202_00958 and D30202_01056 encode hydroxylases, and D30202_00280, D30202_01060, D30202_01249, D30202_03642, D30202_03556, D30202_03413, D30202_02231 and D30202_02018 encode methyltransferases. The enzymes encoded by these marker genes are predicted to contribute to the production of herbicidal active substances.
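Whether a pathway is "complete" amounts to checking that every step has at least one annotated enzyme. The sketch below encodes the four steps described above by their EC numbers and treats `-` as a wildcard field, so the partial number 1.14.13.- matches any EC in that sub-subclass. The annotated EC set is hypothetical, and this is a simplified check, not the KEGG mapping procedure used in the study.

```python
# Route from L-phenylalanine to ferulic acid as described in the text. Each step
# lists the EC numbers that can catalyse it; any one of them is sufficient.
FERULATE_ROUTE = [
    ("L-phenylalanine -> trans-cinnamate", ["4.3.1.24", "4.3.1.25"]),
    ("trans-cinnamate -> 4-coumarate", ["1.14.14.91"]),
    ("4-coumarate -> caffeate", ["1.14.13.-"]),
    ("caffeate -> ferulate", ["2.1.1.68"]),
]


def ec_matches(pattern: str, ec: str) -> bool:
    """Compare two four-field EC numbers; '-' in the pattern matches any field."""
    p_fields, e_fields = pattern.split("."), ec.split(".")
    if len(p_fields) != 4 or len(e_fields) != 4:
        return False
    return all(p == "-" or p == e for p, e in zip(p_fields, e_fields))


def missing_steps(route, annotated_ecs):
    """Names of route steps with no matching EC number among the annotated ones."""
    return [
        step
        for step, patterns in route
        if not any(ec_matches(p, ec) for p in patterns for ec in annotated_ecs)
    ]


# Hypothetical annotation output (EC numbers only).
annotated = {"4.3.1.24", "1.14.14.91", "1.14.13.-", "2.1.1.68"}
print(missing_steps(FERULATE_ROUTE, annotated))                 # []
print(missing_steps(FERULATE_ROUTE, annotated - {"2.1.1.68"}))  # ['caffeate -> ferulate']
```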
Advanced functional analysis of the strain D30202 genome
CAZy annotation and classification. CAZy is a database of carbohydrate-active enzymes, which synthesize or degrade complex carbohydrates and glycoconjugates. The annotation results showed that the genome of strain D30202 contains 567 genes encoding carbohydrate-active enzyme families (Fig. S6, Table S1): 4 genes from 4 auxiliary activity (AA) families, 88 genes from 18 carbohydrate-binding module (CBM) families, 42 genes from 10 carbohydrate esterase (CE) families, 192 genes from 51 glycoside hydrolase (GH) families, 238 genes from 22 glycosyltransferase (GT) families, and 3 genes from 3 polysaccharide lyase (PL) families. Wild oat, an annual oat species, is rich in protein, carbohydrates and crude fiber. Strain D30202 encodes a glycoside hydrolase family 18 protein (YaaH), endoglucanase ([EC: 3.2.1.4]), type III pantothenate kinase ([EC: 2.7.1.33]), cellulase (EglA) and other enzymes that may allow it to degrade these components, which warrants further study of the biological control mechanism of strain D30202.
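The class totals above come from grouping family labels (such as GH18 or CBM50) by their class prefix. The sketch below shows that grouping on hypothetical toy assignments and checks that the per-class gene counts reported for D30202 add up to 567. It assumes that a gene carrying several modules (for example a GH domain plus a CBM) is counted once per module, which is a common convention but not confirmed by the text.

```python
import re
from collections import Counter

FAMILY_RE = re.compile(r"(GH|GT|PL|CE|AA|CBM)(\d+)")


def cazy_class(family: str) -> str:
    """Map a CAZy family label such as 'GH18' or 'CBM50' to its class prefix."""
    m = FAMILY_RE.fullmatch(family)
    if m is None:
        raise ValueError(f"unrecognised CAZy family label: {family!r}")
    return m.group(1)


def summarise(assignments):
    """assignments: (gene_id, family_label) pairs; a gene with several modules
    appears once per module. Returns per-class counts of assignments and of
    distinct families."""
    per_class = Counter(cazy_class(fam) for _, fam in assignments)
    distinct = {fam for _, fam in assignments}
    families_per_class = Counter(cazy_class(fam) for fam in distinct)
    return per_class, families_per_class


# Hypothetical toy input.
toy = [("g1", "GH18"), ("g2", "GH18"), ("g3", "GT2"), ("g1", "CBM50")]
per_class, families = summarise(toy)
print(dict(per_class), dict(families))

# Gene counts per class as reported for strain D30202.
REPORTED = {"AA": 4, "CBM": 88, "CE": 42, "GH": 192, "GT": 238, "PL": 3}
print(sum(REPORTED.values()))  # 567
```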
Virulence factor analysis. The genome of strain D30202 was compared with the VFDB database, and a total of 287 virulence factor genes, assigned to 12 functional categories, were identified (Table S2). The longest gene was 907 bp and the shortest was 29 bp.
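Database comparisons of this kind are often run with BLAST or DIAMOND in tabular format; the text does not name the tool or cut-offs, so both are assumptions here. The sketch keeps the best-scoring hit per query from standard 12-column tabular output (`-outfmt 6` in BLAST, `--outfmt 6` in DIAMOND) after identity and e-value filters. The thresholds and the subject IDs in the toy input are hypothetical.

```python
import csv
import io

# Default column order of BLAST (-outfmt 6) and DIAMOND (--outfmt 6) tabular output.
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=80.0, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query that passes both cut-offs.

    The default thresholds are illustrative assumptions, not the study's settings.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(OUTFMT6, row))
        if float(hit["pident"]) < min_identity or float(hit["evalue"]) > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


# Hypothetical tabular output: query genes against placeholder subject IDs.
TOY = (
    "gene1\tVF_A\t92.5\t300\t22\t0\t1\t300\t1\t300\t1e-50\t500\n"
    "gene1\tVF_B\t85.0\t280\t40\t1\t1\t280\t5\t284\t1e-40\t420\n"
    "gene2\tVF_C\t60.0\t150\t60\t2\t1\t150\t1\t150\t1e-10\t120\n"
)
hits = best_hits(io.StringIO(TOY))
print({q: h["sseqid"] for q, h in hits.items()})  # {'gene1': 'VF_A'}
```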
Resistance gene analysis. The predicted gene sequences were compared against the Comprehensive Antibiotic Resistance Database (CARD) using the Resistance Gene Identifier (RGI), and the resistance mechanisms, toxin classifications, types of resistant antibiotics and resistance-related genes were analyzed. Strain D30202 encodes 321 genes (44 distinct genes) annotated with resistance mechanisms, toxin classifications, antibiotic resistance types or resistance-related functions, accounting for 8.4% of all coding genes. These mainly include genes related to efflux pumps, transporters and enzymes, and genes associated with resistance to fluoroquinolones, pyrazinamide, tetracyclines, aminocoumarins, cyclic lipopeptides and isoniazid; approximately 20 of them are special genes (Table S3). The gyrA, gyrB and parE genes are involved in fluoroquinolone resistance; pncA and rpsA in pyrazinamide resistance; EF-Tu in elfamycin resistance; fusA, kasA, fabI, fabG, ndh and gidB in resistance to other antibiotics; and pgsA, cls, liaS, mprF, liaR and gshF in daptomycin resistance. katG is an isoniazid resistance gene, and murA encodes a transferase. All of these genes confer resistance through alteration of the antibiotic target.
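The gene-to-drug-class assignments listed above can be regrouped by drug class, and the reported share of coding genes checked from the two counts in the text. The sketch below does both; the mapping contains only the assignments stated in the paragraph, and the function names are hypothetical.

```python
from collections import defaultdict

# Gene -> drug class assignments quoted in the text above.
GENE_TO_CLASS = {
    "gyrA": "fluoroquinolone", "gyrB": "fluoroquinolone", "parE": "fluoroquinolone",
    "pncA": "pyrazinamide", "rpsA": "pyrazinamide",
    "EF-Tu": "elfamycin",
    "pgsA": "daptomycin", "cls": "daptomycin", "liaS": "daptomycin",
    "mprF": "daptomycin", "liaR": "daptomycin", "gshF": "daptomycin",
    "katG": "isoniazid",
}


def genes_by_class(mapping):
    """Invert a gene -> class mapping into class -> sorted list of genes."""
    grouped = defaultdict(list)
    for gene, drug_class in mapping.items():
        grouped[drug_class].append(gene)
    return {drug_class: sorted(genes) for drug_class, genes in grouped.items()}


def percent_of_cds(n_genes, n_cds):
    """Share of all coding genes, rounded to one decimal place."""
    return round(100.0 * n_genes / n_cds, 1)


print(genes_by_class(GENE_TO_CLASS)["fluoroquinolone"])  # ['gyrA', 'gyrB', 'parE']
print(percent_of_cds(321, 3809))                         # 8.4
```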
Comparative genomics analysis. The whole-genome information of strain D30202 was compared with that of three other sequenced Bacillus strains (Table 2). The genome of strain D30202 was smaller than those of the other three strains, and its GC content and number of tRNAs were lower than those of strains FZB42 and S3-1. Its number of CDSs was close to that of strain 6ww6 and higher than those of strains FZB42 and S3-1. Its number of rRNAs was the same as that of strain 6ww6 and lower than those of the other two strains.
Table 2 Comparison of genome characteristics of strain D30202 with other strains
| Items | B. altitudinis D30202 | B. altitudinis 6ww6 | B. amyloliquefaciens FZB42 | B. velezensis S3-1 |
|---|---|---|---|---|
| Genome size (bp) | 3,777,154 | 3,804,206 | 3,918,596 | 3,929,772 |
| GC content (%) | 41.32 | 41.30 | 46.50 | 46.50 |
| CDS | 3809 | 3834 | 3734 | 3748 |
| Number of tRNAs | 82 | 81 | 88 | 85 |
| Number of rRNAs | 24 | 24 | 29 | 27 |
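The comparisons in the preceding paragraph can be read directly off Table 2 by subtracting the D30202 value from each other strain's value. The sketch below does this for every row; the variable and function names are hypothetical, and the numbers are those in Table 2.

```python
STRAINS = ["D30202", "6ww6", "FZB42", "S3-1"]

# Rows of Table 2, in the same strain order as STRAINS.
TABLE2 = {
    "genome_size_bp": [3777154, 3804206, 3918596, 3929772],
    "gc_percent": [41.32, 41.30, 46.50, 46.50],
    "cds": [3809, 3834, 3734, 3748],
    "trna": [82, 81, 88, 85],
    "rrna": [24, 24, 29, 27],
}


def differences_from(reference, table, strains):
    """For each feature, every other strain's value minus the reference strain's value."""
    ref = strains.index(reference)
    return {
        feature: {s: v - values[ref] for s, v in zip(strains, values) if s != reference}
        for feature, values in table.items()
    }


diffs = differences_from("D30202", TABLE2, STRAINS)
print(diffs["cds"])  # {'6ww6': 25, 'FZB42': -75, 'S3-1': -61}
```

A positive value means the other strain has more of that feature than D30202; for example, every genome size difference is positive, matching the statement that D30202 has the smallest genome of the four.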
Core-pan analysis of the whole-genome sequences of the four strains identified 2564 core genes shared by strain D30202 and the three model strains, and a total of 518 pan genes (Fig. 4A, Fig. 4B). The numbers of strain-specific genes in strains FZB42, S3-1, 6ww6 and D30202 were 203, 148, 212 and 216, respectively, the highest being in strain D30202. Gene family analysis showed that strains FZB42, S3-1, 6ww6 and D30202 had 3704, 3637, 3806 and 3808 gene families, respectively; strain D30202 had the most gene families, and 2564 gene families were shared by all four strains. The average nucleotide identity (ANI) heat map (Fig. 5A), the homologous gene family Venn diagram (Fig. 5B) and the phylogenetic tree (Fig. 5C) are shown. The genome of strain D30202 showed the highest homology with the model strain 6ww6; the two clustered on the same branch, with a genetic distance of approximately 0.01 relative to the other two model strains.
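Core, pan and strain-specific gene sets are defined by set operations on each strain's gene families: the core is the intersection, the pan-genome is the union, and a strain's specific genes are those absent from every other strain. The sketch below applies these definitions to hypothetical toy data; it assumes gene families have already been clustered across strains, a step not shown here.

```python
def core_pan(presence):
    """presence maps strain -> set of gene-family IDs.

    Returns (core, pan, specific): families in every strain, families in any strain,
    and for each strain the families found in no other strain.
    """
    family_sets = list(presence.values())
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    specific = {}
    for strain, families in presence.items():
        others = set().union(*(f for s, f in presence.items() if s != strain))
        specific[strain] = families - others
    return core, pan, specific


# Hypothetical toy presence/absence data for three strains.
toy = {
    "strain_1": {"f1", "f2", "f3"},
    "strain_2": {"f1", "f2", "f4"},
    "strain_3": {"f1", "f5"},
}
core, pan, specific = core_pan(toy)
print(sorted(core), len(pan))  # ['f1'] 5
print(specific["strain_1"])    # {'f3'}
```

Because the core is contained in every strain's set and every strain's set is contained in the pan-genome, the core can never be larger than the pan-genome.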
Prediction and analysis of secondary metabolite gene clusters. antiSMASH online prediction and NCBI BLAST comparison showed that strain D30202 encodes 10 secondary metabolite biosynthetic gene clusters (Fig. S7, Table 3). Two of these matched known gene clusters with high similarity, 4 matched known gene clusters with low similarity, and 4 had no match to any known gene cluster. As Table 3 shows, the secondary metabolites predicted for strain D30202 include lichenysin, fengycin and bacillibactin from the nonribosomal peptide (NRP) pathway, carotenoids from the terpene biosynthesis pathway, and bacillysin from other pathways, as well as the RiPP bottromycin A2 from the bottromycin pathway. Compared with known gene clusters, the lichenysin and bacillysin clusters showed 85% similarity, the carotenoid cluster 50%, and the fengycin and bacillibactin clusters 53%; the bottromycin A2 cluster showed the lowest similarity, at 6%.
Table 3 Secondary metabolite gene clusters of B. altitudinis D30202
| Type | Region from | Region to | Most similar known cluster | Pathway | Similarity | Source |
|---|---|---|---|---|---|---|
| NRPS | 362336 | 444093 | lichenysin | NRP | 85% | Bacillus licheniformis DSM 13 |
| RRE-containing | 897947 | 918852 | − | − | − | |
| terpene, siderophore | 1067858 | 1096576 | carotenoid | terpene | 50% | Halobacillus halophilus DSM 2266 |
| betalactone | 1822796 | 1850340 | fengycin | NRP | 53% | Bacillus velezensis FZB42 |
| terpene | 1919077 | 1940954 | − | − | − | |
| T3PKS | 1980564 | 2021661 | − | − | − | |
| betalactone | 2479683 | 2512087 | Bottromycin A2 | RiPP:Bottromycin | 6% | Streptomyces sp. BC16019 |
| other | 3419395 | 3460816 | bacillysin | other | 85% | Bacillus velezensis FZB42 |
| RiPP-like | 3631941 | 3642285 | − | − | − | |
| NRPS | 3710873 | 3760583 | bacillibactin | NRP | 53% | Bacillus subtilis subsp. subtilis str. 168 |

Note: “−” represents none.
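The split of the 10 clusters into high-similarity, low-similarity and unmatched groups can be reproduced from Table 3. The text does not state the cut-off separating high from low similarity, so the 80% threshold below is a hypothetical choice; any value between 54% and 85% yields the same 2/4/4 split for these data.

```python
# (region type, most similar known cluster, similarity %) from Table 3; None = no match.
REGIONS = [
    ("NRPS", "lichenysin", 85), ("RRE-containing", None, None),
    ("terpene,siderophore", "carotenoid", 50), ("betalactone", "fengycin", 53),
    ("terpene", None, None), ("T3PKS", None, None),
    ("betalactone", "Bottromycin A2", 6), ("other", "bacillysin", 85),
    ("RiPP-like", None, None), ("NRPS", "bacillibactin", 53),
]


def classify(regions, high_cutoff=80):
    """Count regions with a high-similarity match, a low-similarity match, or no match.

    high_cutoff (in %) is an assumed threshold, not one stated in the text.
    """
    counts = {"high": 0, "low": 0, "none": 0}
    for _region_type, cluster, similarity in regions:
        if cluster is None:
            counts["none"] += 1
        elif similarity >= high_cutoff:
            counts["high"] += 1
        else:
            counts["low"] += 1
    return counts


print(classify(REGIONS))  # {'high': 2, 'low': 4, 'none': 4}
```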