Genome features of B. clausii ENTPro
De novo assembly of PacBio sequencing reads of B. clausii ENTPro gDNA resulted in two contigs: one long circular contig of 4,264,866 base pairs (bp) and one short circular contig of 31,475 bp. The long contig represents the composite circular chromosome of B. clausii ENTPro, with an average GC content of 44.75% (Fig. 1), and the smaller one (GC content: 39.9%) is likely a plasmid. In addition, Illumina sequencing-based assembly resulted in a 4.3 Mbp genome of 36 contigs with an N50 of 344,696 bp, which overlaps completely with the genome assembled from PacBio reads. The composite genome obtained from PacBio sequencing reads was submitted to GenBank [NC_006582.1] and used for all comparisons in this study. The B. clausii ENTPro genome is 99.8% similar to that of another probiotic strain, B. clausii B106 [NFZO01] (Fig. S1A), and 94.3% similar to that of B. clausii KSM-K16 [NC_006582.1] (Fig. S1B), whereas other members of the same species are 50-94% similar. This suggests that the members of this species are quite diverse, as characterized by their GGDC values (Table S1). Our analysis suggests that the probiotic strains within B. clausii, such as ENTPro, B106, and UBBC-07, are more similar to each other than to other strains.
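The assembly statistics quoted above (GC content and N50) follow simple definitions. The Python sketch below shows one common way to compute them; it is an illustration only, not the pipeline used in this study, and the function names are hypothetical.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # The two PacBio contigs reported in the text (chromosome and plasmid).
    print(n50([4264866, 31475]))
    # Toy sequence, not real data.
    print(round(gc_content("ATGCGC"), 2))
```

For a two-contig assembly dominated by the chromosome, the N50 is simply the chromosome length, which is why N50 is mainly informative for the fragmented Illumina assembly.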
The plasmid sequence is novel and does not have any close similarity with other plasmids in the NCBI nucleotide database (NT). Most of the proteins encoded by the plasmid sequence are hypothetical and are not functionally characterized. We mapped Illumina reads against the plasmid database downloaded from NCBI to identify if we could obtain hits to any previously known plasmids. Very few reads mapped on to known plasmids and no full plasmid could be retrieved using the Illumina reads. Therefore, we concluded that the identified plasmid sequence harbored by B. clausii ENTPro is novel.
Annotation of the B. clausii ENTPro genome revealed 4,384 protein-coding sequences, which constitute 86.73% of the genome and have an average length of 843 bp (ranging from 113 to 9,509 bp) (Table 1). A total of 1,215 coding DNA sequences (CDS) were annotated as hypothetical proteins, accounting for 27.72% of the total proteins. The ENTPro genome encodes all three proteins, R (restriction), M (modification), and S (specificity), that belong to the Type I RM system. m6A methylation was observed in >96% of the occurrences of the motifs GAGNNNNNNRTGC and GCAYNNNNNNCTC in the genome, at the 2nd and 3rd positions, respectively. There are 75 tRNA genes and seven complete rRNA operons (>99% identity) in the B. clausii ENTPro genome. The 16S rRNA sequences obtained from the de novo assembly of the B. clausii ENTPro genome show 99.8% similarity with those of B. clausii Enterogermina strains O/C, T, N/R, and SIN. This is in line with previously known variations in 16S rRNA genes in bacterial genomes. Most of the varying sites were present in the V1 region of the 16S rRNA sequences, even in B. clausii KSM-K16 and B. clausii DSM 8716 (Fig. S2).
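The two methylated motifs are reverse complements of each other under the IUPAC nucleotide code (R = A/G, Y = C/T, N = any base), which is the expected pattern for a bipartite Type I recognition site read on both strands. A minimal Python sketch, assuming standard IUPAC semantics and using hypothetical helper names, shows this relationship and how such a motif can be located in a sequence.

```python
import re
from typing import List

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Reverse-complement a sequence or IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile("(?=(" + body + "))")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    motif = "GAGNNNNNNRTGC"
    partner = reverse_complement(motif)
    print(partner)
    # Toy sequence (not real genome data) containing one motif occurrence.
    print(find_motif("TTGAGACGTACATGCAA", motif))
    # The modified adenines sit at the 2nd and 3rd positions, respectively.
    print(motif[1], partner[2])
```

Scanning only the forward strand for both motifs is equivalent to scanning both strands for one of them, which is why the two motifs are usually reported as a pair.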
Of the total proteome, ~75% (3,311) of the proteins could be assigned to Clusters of Orthologous Groups (COG) functional categories. Among these mapped proteins, ~35% belonged to the metabolism category, ~14% to cellular processes and signaling, and ~16% to information storage and processing. According to the COG mapping data, 152 proteins are involved in signal transduction mechanisms (COG: T) and 44 proteins function in secondary metabolite biosynthesis, transport, and catabolism (COG: Q). COG assignments for the proteomes of B. clausii members revealed that all the organisms have a similar number of proteins assigned to the various COG categories (Fig. 2).
Phylogenetic position of B. clausii as inferred from housekeeping proteins-based phylogeny
Phylogenetically, B. clausii clustered in a separate clade, with further grouping within this clade (Fig. 3). The phylogenetic tree reveals that the ENTPro strain is closest to the B106 strain of B. clausii. These two probiotic strains are in turn closest to another probiotic strain, B. clausii UBBC-07. All these probiotic strains share a common ancestor with the industrial strain B. clausii KSM-K16. This phylogenetic placement of the B. clausii probiotic strains is concordant with the whole-genome similarity matrix obtained with the genome-genome distance calculator (GGDC). Other B. clausii “Heroin” strains form several different groups within the B. clausii clade. Interestingly, the B. clausii proteome matches the proteomes of other Bacillus species at <70% identity. This suggests genomic heterogeneity of B. clausii relative to other Bacillus species. We also included all Bacillus probiotic genomes in the phylogenetic analysis to investigate their phylogenetic position. Bacillus probiotics shared clades with members of their own species. Interestingly, probiotic strains cluster together within their species, e.g. in B. clausii, B. coagulans and B. subtilis.
B. clausii ENTPro as a derived strain from four different strains
B. clausii Enterogermina® is a mixture of four different strains, each of which is supposed to confer resistance against specific antibiotics, namely novobiocin and rifampicin (strain N/R), chloramphenicol (strain O/C), streptomycin and neomycin (strain SIN) and tetracycline (strain T). The specific genes conferring resistance could not be traced in the literature, so different in silico strategies were employed to identify genes that could help impart resistance to these antibiotics in B. clausii ENTPro (Tables S2 and S3).
Rifampicin: Rifampicin resistance is acquired through specific mutations at positions 516, 526 and 531 of the rpoB gene in Escherichia coli. These mutations map to the center of the rpoB gene in three regions: cluster I covering amino acids (AA) 507–533, cluster II covering AA 563–572, and cluster III with an AA change at position 687, which together are referred to as the RIF resistance determining region (RRDR). To check for the RRDR in the RpoB protein of ENTPro, the RpoB protein sequences from all Bacillus spp. were retrieved and aligned with the E. coli RpoB protein sequence [Accession Number: NP_418414.1]. A P524->L AA change (corresponding to AA position 567 in the E. coli RpoB protein sequence) was observed in the B. clausii ENTPro strain and was not observed in other Bacillus spp. (Fig. S3).
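Relating a residue in one protein to the corresponding residue in a reference (here, ENTPro position 524 to E. coli position 567) amounts to walking along the columns of a pairwise alignment while counting ungapped residues in each sequence. The Python sketch below illustrates that bookkeeping on toy aligned strings; the function name and the example sequences are hypothetical and are not the RpoB alignment from this study.

```python
from typing import Optional


def map_position(aligned_query: str, aligned_ref: str, query_pos: int) -> Optional[int]:
    """Map a 1-based residue position in the query to the 1-based position of
    the reference residue in the same alignment column.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Returns None when the reference has a gap in that column.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    q_count = 0  # residues seen so far in the query
    r_count = 0  # residues seen so far in the reference
    for q_char, r_char in zip(aligned_query, aligned_ref):
        if q_char != "-":
            q_count += 1
        if r_char != "-":
            r_count += 1
        if q_char != "-" and q_count == query_pos:
            return r_count if r_char != "-" else None
    raise ValueError("query_pos is outside the query sequence")


if __name__ == "__main__":
    # Toy alignment: the query lacks two residues present in the reference.
    query = "MK--LPQ"
    ref = "MKAVLSQ"
    print(map_position(query, ref, 4))  # query residue P aligns with reference S
```

In the toy alignment the offset between the two numbering schemes comes entirely from the two-residue gap; in a real alignment the offset changes along the sequence, so it has to be computed per column rather than assumed constant.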
Chloramphenicol: Chloramphenicol acetyltransferase, involved in conferring resistance against chloramphenicol, was identified in the proteome of B. clausii ENTPro [Accession Number: WP_035203840.1].
Streptomycin: Pfam domains known to impart resistance against streptomycin were identified in B. clausii ENTPro. Nine proteins in B. clausii had Pfam domains from the set PF02522, PF01636, PF01909, PF04439, PF04655, PF07091, PF07827, and PF10706, all of which are aminoglycoside-related domains. Two proteins had the streptomycin adenylyltransferase domain [PF04439], six proteins had the aminoglycoside phosphotransferase domain [PF01909] and one protein had the kanamycin nucleotidyltransferase domain [PF07827] (Table S2). This suggests the presence of domains that are involved in imparting resistance to streptomycin. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the organism reveals the presence of the complete KEGG pathway for streptomycin biosynthesis in B. clausii ENTPro (Fig. S4).
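The domain counts above can be reproduced from any table of (protein, Pfam accession) hits by filtering on the accession list and counting distinct proteins per domain. The following Python sketch illustrates this tally; the protein identifiers are hypothetical placeholders constructed only to match the counts reported in the text, and the function name is an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple

# Aminoglycoside-related Pfam accessions listed in the text.
AMINOGLYCOSIDE_PFAMS = {
    "PF02522", "PF01636", "PF01909", "PF04439",
    "PF04655", "PF07091", "PF07827", "PF10706",
}


def tally_resistance_domains(hits: Iterable[Tuple[str, str]]) -> Counter:
    """Count distinct proteins per Pfam accession, keeping only accessions
    from AMINOGLYCOSIDE_PFAMS. `hits` yields (protein_id, pfam_accession)."""
    seen = set()
    counts = Counter()
    for protein_id, pfam in hits:
        key = (protein_id, pfam)
        if pfam in AMINOGLYCOSIDE_PFAMS and key not in seen:
            seen.add(key)
            counts[pfam] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical protein IDs arranged to mirror the reported counts.
    toy_hits = (
        [("prot_%d" % i, "PF04439") for i in (1, 2)]
        + [("prot_%d" % i, "PF01909") for i in range(3, 9)]
        + [("prot_9", "PF07827")]
        + [("prot_10", "PF00005")]  # unrelated domain, ignored by the filter
    )
    counts = tally_resistance_domains(toy_hits)
    print(dict(counts), sum(counts.values()))
```

Counting distinct (protein, domain) pairs rather than raw hits avoids inflating the totals when a domain search reports the same domain more than once for one protein.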
Tetracycline: The domains conferring tetracycline resistance [RF0133, RF0134, RF0135, and RF0127] were present in B. clausii ENTPro (Fig. 4). The presence of these genes in the composite genome of B. clausii ENTPro was further confirmed by mapping the Illumina reads to these genes.
Probiotic Properties in B. clausii ENTPro
Probiotics are beneficial components of the microbiota that modulate immunological, respiratory and gastrointestinal functions. To exert these functions, probiotics adhere to the mucosal membrane to interact with the host, and they carry stress adaptation proteins and proteins for resistance to acidic, alkaline and oxidative stress. Probiotics are believed to have good adherence capacity, which promotes gut residence time, pathogen elimination, adhesion to the epithelial layer of host cells and immune modulation.
Pfam analysis reveals the presence of three proteins involved in adhesion, namely a mucus-binding protein with the ‘Gram_pos_anchor’ Pfam domain [PF00746] at the C-terminus, a collagen-binding protein with an LPXTG motif at the C-terminus and a fibronectin-binding protein (Table S2). These adhesion proteins may help the probiotic bacterium bind to and interact directly with the intestinal mucosal layer.
Probiotic B. clausii encounters various harsh environmental conditions during transit through the GIT, such as the acidic environment of the stomach, bile in the small intestine, oxidative stress, and osmotic stress. When a bacterium faces an acidic environment, H+ homeostasis is maintained by the F0F1 ATP synthase pump, which works by hydrolyzing ATP to pump protons (H+) from the cytoplasm [1, 31]. We found that this synthase complex is present in the ENTPro genome as a full operon [DB29_02342--DB29_02349].
The bacteria also have to face the toxicity of bile salts, which induce intracellular acidification and act as detergents that disrupt biological membranes. Five proteins involved in bile tolerance mechanisms were identified; two belong to the ornithine decarboxylase family and three to the sodium bile acid symporter family [34, 35] (Table S2).
B. clausii ENTPro also harbors general stress adaptation proteins. The universal stress protein UspA [PF00582] is important for survival during cellular growth arrest and reprograms the cell towards defense and escape during cellular stress [36, 37]. Molecular chaperones that may impart resistance against environmental stress were identified through annotation and Pfam domain search; these include the chaperonins GroES [PF00166] and GroEL [38, 39], one heat shock protein 33 [PF01430], two copies of the cold shock protein CspA [PF00313], three Clp proteases [PF00574], and HtpX and HrcA-like heat shock proteins. These proteins play an important role in basic cellular functions that include growth and the stability of DNA and RNA, and they also prevent the formation of inclusion bodies [40–42].
For hyperosmotic stress and heat resistance, B. clausii ENTPro harbors one copy each of the chaperone protein DnaJ [PF00226] and the nucleotide exchange factor GrpE [PF01025]. In addition, two copies of methionine sulfoxide reductase A [PF01625], which provides resistance to oxidative stress, were present in B. clausii ENTPro (Table S2). This suggests that B. clausii ENTPro has proteins that improve adhesion and help it handle stress and harsh conditions in the human gut.
Antibiotic Resistance in Bacillus Probiotics
Antibiotic resistance is a common phenomenon in Gram-positive bacteria [44–46]. It arises from genes acquired horizontally through plasmids or recombination of foreign DNA, or from mutations at different chromosomal loci in the bacterial genome. It is preferred that probiotic strains carry as few antibiotic resistance genes as possible, so that they are not a potential source for transferring these genes to other gut bacteria, including pathogens. On the other hand, since some of these probiotics are administered alongside antibiotics, some resistance to commonly administered antibiotics is desirable.
Presence of a novel plasmid sequence in B. clausii ENTPro could be a possible source of antibiotic-resistance gene transfer but we could not identify any potential antibiotic-resistance domain(s) in the plasmid. We also searched for the presence of antibiotic resistance genes and efflux pumps in the genomes with multiple methods to avoid false positives.
Chloramphenicol acetyltransferase, which confers resistance against chloramphenicol, is absent in B. amyloliquefaciens and B. coagulans, whereas a chloramphenicol efflux pump was present in B. amyloliquefaciens (Fig. 4). This would imply the presence of chloramphenicol resistance in all the Bacillus probiotics except B. coagulans. Different classes of beta-lactamase were present in one or another of the Bacillus probiotics, which suggests resistance against penicillin in all the Bacillus probiotics. Multidrug resistance protein, a universal stress protein, EmrB and its efflux pump, tetracycline resistance protein, and penicillin-binding protein are present in all the Bacillus probiotics. This suggests that most of the Bacillus probiotics are resistant to common antibiotics.
Erythromycin resistance was investigated by subjecting the erm(34) gene sequence (GenBank Identifier: AY234334) of B. clausii DSM8716 to a BLASTn search against all Bacillus genomes. This gene was identified in B. clausii ENTPro, where it is annotated as “SSU rRNA (adenine(1518)-N(6)/adenine(1519)-N(6))-dimethyltransferase” (GenBank Identifier: ALA53582). The gene was also identified in all the B. clausii genomes. The gene sequence shared 61% identity with the rRNA adenine methyltransferase of B. halodurans and 57% identity with the rRNA adenine methyltransferases of B. licheniformis, B. anthracis, B. sonorensis and B. fordii. The rRNA adenine methyltransferase genes from other Bacillus spp. shared 20-50% identity with the erm(34) gene. These results indicate that the erm(34) gene is unique to B. clausii and is not present in other members of the Bacillus genus.
Vancomycin resistance, as observed from KEGG pathway analysis (Fig. S5), was identified only in B. toyonensis and was absent in the other Bacillus probiotics. The accessory proteins of the vancomycin resistance operon were present in some of the Bacillus probiotics, but the resistance-conferring genes were completely absent.
We add an advisory note: previous studies have shown that an organism may exhibit intrinsic resistance to a few antibiotics that cannot be related to its genotype. Although we have endeavored to relate the genome-level occurrence of antibiotic resistance proteins or domains to their probable phenotypes, we have not performed any phenotypic studies to substantiate these analyses or to check for intrinsic resistance. Further, the current situation may constitute a safety concern because of the possibility of antibiotic resistance gene transfer to the gut flora.
Bacteriocins in Bacillus probiotics
Bacteriocins are proteinaceous toxins produced by bacteria that act as narrow-spectrum antibiotics to inhibit the growth of similar or closely related bacterial strains [48, 49]. They can help probiotics compete with invading bacteria by inhibiting their growth, and hence can have beneficial effects on the host. The bacteriocins identified in all the probiotics are represented as a presence-absence binary matrix in Fig. 5. Several of these bacteriocins are already well utilized in therapeutics and their spectrum against pathogens is well established [9, 51–53].
Gallidermin, identified via in silico analysis in B. clausii genomes, is known to efficiently prevent biofilm formation in the pathogens S. aureus and S. epidermidis. This bacteriocin has also been reported to be effective in skin disorders including acne, eczema, folliculitis, and impetigo, where the target organisms are Propionibacteria, Staphylococci, and Streptococci.
Lacticin 3147 A2 and leucocyclicin Q, identified in B. amyloliquefaciens, are broad-spectrum bacteriocins. Lacticin has been used effectively in the treatment of bacterial mastitis and of Staphylococcal and Enterococcal infections, including those caused by vancomycin-resistant Enterococci, and is effective against Listeria infections. Similarly, leucocyclicin Q exhibits bactericidal or bacteriostatic effects on Gram-positive bacteria, including food-borne pathogens, such as Lactococcus, Weissella paramesenteroides, Pediococcus dextrinicus, Enterococcus, Streptococcus, and Leuconostoc. Plantazolicin, identified in B. amyloliquefaciens and B. pumilus, has nematicidal activity. Circularin A, produced by B. coagulans, has been reported to be the most effective bacteriocin against C. tyrobutyricum NIZOB570, a known cheese-spoilage bacterium, and is also active against Lactococci, Enterococci, and some Lactobacillus strains. Lichenicidin VK21 A2, identified in B. paralicheniformis, is considered a self-immunity bacteriocin that exhibits antimicrobial activity against several strains of Listeria monocytogenes, methicillin-resistant S. aureus, and vancomycin-resistant Enterococcus. Zoocin A in B. toyonensis shows antimicrobial activity against several other Streptococci by cleaving the peptidoglycan cross-links of the target cell wall.
Subtilosin A, produced by B. subtilis, is also a broad-range bacteriocin that is effective against Listeria monocytogenes and strains of E. faecalis, P. gingivalis, K. rhizophila, Enterobacter aerogenes, Streptococcus pyogenes, and Shigella sonnei. The sporulation-killing factor SkfA, produced by B. subtilis, induces the lysis of other B. subtilis cells that have not entered the sporulation pathway. This cannibalistic behavior provides a source of nutrients to support those cells that have entered sporulation [59, 60]. At high concentrations, it can also inhibit the growth of other bacteria. The presence of well-characterized bacteriocins in the Bacillus probiotics suggests that they play an important role in fighting pathogens in the gut.
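A presence-absence binary matrix of the kind shown in Fig. 5 can be built directly from a mapping of bacteriocins to the species in which they were identified. The Python sketch below uses only the bacteriocin-species pairs named in the text above; it is a simplified illustration, not a reproduction of Fig. 5, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Bacteriocin -> species in which it was identified, as stated in the text.
BACTERIOCINS: Dict[str, Set[str]] = {
    "Gallidermin": {"B. clausii"},
    "Lacticin 3147 A2": {"B. amyloliquefaciens"},
    "Leucocyclicin Q": {"B. amyloliquefaciens"},
    "Plantazolicin": {"B. amyloliquefaciens", "B. pumilus"},
    "Circularin A": {"B. coagulans"},
    "Lichenicidin VK21 A2": {"B. paralicheniformis"},
    "Zoocin A": {"B. toyonensis"},
    "Subtilosin A": {"B. subtilis"},
    "SkfA": {"B. subtilis"},
}


def presence_absence(table: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (sorted species list, {bacteriocin: row of 0/1 flags}).

    Each row has one entry per species, in the order of the species list.
    """
    species = sorted({s for hosts in table.values() for s in hosts})
    matrix = {
        name: [1 if s in hosts else 0 for s in species]
        for name, hosts in table.items()
    }
    return species, matrix


if __name__ == "__main__":
    species, matrix = presence_absence(BACTERIOCINS)
    print("\t".join(["bacteriocin"] + species))
    for name, row in matrix.items():
        print("\t".join([name] + [str(v) for v in row]))
```

Keeping the species order in a separate list makes each row directly comparable across bacteriocins and easy to pass to a heatmap routine.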
Folate Biosynthesis Pathways in Bacillus Probiotics
The gut microbiota aids the host by playing a crucial role in nutrient digestion and energy recovery. Because of its potential applications, the capacity to produce folate has been investigated in various probiotic strains. Previously, the presence of folate biosynthesis pathways was reported in Lactobacillus and Bifidobacterium probiotics but was not explored in Bacillus probiotics except B. subtilis. We identified the key components of the folate production pathways in Bacillus probiotics using the KEGG Pathway database. Analysis of the genome sequences of Bacillus probiotics revealed the complete operon for de novo synthesis of para-aminobenzoic acid (PABA) only in B. subtilis probiotics (Fig. 6). On the other hand, the enzymes necessary for the conversion of chorismate into PABA are present in almost all the Bacillus probiotics. Moreover, the shikimate pathway for chorismate production is complete only in B. subtilis, B. pumilus and B. toyonensis, and is partial in all the other Bacillus probiotics. Bacillus probiotic strains also contain the genes of the DHPPP de novo biosynthetic pathway and the gene encoding dihydropteroate synthase (EC 2.5.1.15). Therefore, it is expected that these strains are not auxotrophic for folates or DHP but can produce folate in the presence of PABA supplementation. The presence or absence of the components of the folate biosynthesis pathway is reported on the basis of KEGG pathway analysis. Previous studies have revealed that the B. subtilis genome harbors all the pathway components and that B. subtilis has been engineered for folate production [63, 65, 66].