Comparative Genomic Analysis of Bidobacterium Catenulatum Reveal the Genetic Divergence of its Two Subspecies from Infant and Adult Gut

Background The two subspecies of Bidobacterium catenulatum, B. catenulatum subsp. kashiwanohense and B. catenulatum subsp. catenulatum, are usually from the infant and adult gut, respectively. However, the genomic analysis of their functional difference and genetic divergence has been rare. Here, 16 B. catenulatum strains, including 2 newly sequenced strains, were analysed through comparative genomics. A phylogenetic tree based on 785 core genes indicated that the two subspecies were signicantly divergent. A phylogenetic tree based on 785 core genes indicated that the two subspecies of B. catenulatum were signicantly separated. The comparison of genomic characteristics revealed that the two subspecies had signicantly different genomic sizes (p<0.05) but similar GC contents. The functional comparison revealed the most signicant difference in carbohydrate utilisation. Carbohydrate-active enzymes (CAZyme) present two clustering patterns in B. catenulatum. The B. catenulatum subsp. kashiwanohense specially including the glycoside hydrolases 95 (GH95) and carbohydrate-binding modules 51 (CBM51) families involved in the metabolism of human milk oligosaccharides (HMO) common in infants, also, the corresponding fucosylated HMO gene clusters were detected. Meanwhile, B. catenulatum subsp. catenulatum rich in GH3 may metabolise more plant-derived glycan in the adult intestine. These ndings provide genomic evidence of carbohydrate utilisation bias, which may be a key cause of the genetic divergence of two B. catenulatum subspecies. horizontal gene transfer; FOS: fructooligosaccharides.


Introduction
Bi dobacterium is a genus of gram-positive, anaerobic microorganisms found in the human gut [1]. Some strains of Bi dobacterium have attracted signi cant attention due to their probiotic function in regulating microbiota and immune metabolism [2][3]. Bi dobacterium catenulatum (B. catenulatum) is an important member of the genus; some of its strains demonstrate favourable probiotic characteristics, such as the preclinical treatment of acute liver injury [4], in vitro inhibition of pathogenic bacteria as well as the ability to stay alive in yoghurt for a long period [5]. These potential probiotic properties suggest that B. catenulatum may be a candidate for probiotics in food or medicine.
Bi dobacterium has long been considered an important intestinal symbiotic bacterium co-evolving with its hosts. Bi dobacterium has signi cant species bias in the intestinal environment of adults and infants [6]. For example, B. bi dum and B. breve are commonly found in the gut of infants, while B. adolescentis usually appears in the intestinal tract of adults [7][8]. Meanwhile, B. catenulatum has the typical characteristic of divergence in the human gut [9]. According to the latest taxonomy [10], B. catenulatum contains two subspecies, B. catenulatum subsp. kashiwanohense and B. catenulatum subsp. catenulatum. B. catenulatum subsp. kashiwanohense lives speci cally in the gut of infants, while B. catenulatum subsp. catenulatum is more suitable for the adult gut [7][8]11]. Current research suggests that B. catenulatum's adaptation to different hosts is partially due to the functional preference of different subspecies, such as carbohydrate metabolism [11]. However, there is no genomic evidence corresponding to the different functional preferences of the two subspecies. Therefore, it is necessary to ll the gap in the genomic knowledge of the genetic divergence and functional differentiation of the two subspecies; the additional information will be useful for supplementing the existing knowledge on the bacterium and providing scienti c support for their purported health bene ts.
In-species comparative genomics analysis allows for a deeper understanding of the individual characteristics between genomes [12]. However, because the Bi dobacterium genus is strictly anaerobic, it is di cult to culture, thus limiting the number of B. catenulatum genomes sequenced [13]. Recently, newly developed sequencing technologies have begun to uncover the B. catenulatum genomes. [14]. While there have been genomic analyses of this species, most of the genomic information of B. catenulatum remains unexplored.
In the current study, a total of 19 genomes of B. catenulatum species were analysed, including 12 B. catenulatum subsp. catenulatum and 5 B. catenulatum subsp. kashiwanohense from the Refseq database, and 2 newly sequenced (IMAUFB085 and IMAUFB087) strains. The study dissected the genetic background of and functional genomic information in B. catenulatum using comparative genomic approaches. This work not only provides general insights into the genomic differences between two subspecies of B. catenulatum but also reveal the key factors leading to their divergence.

Taxonomic status of B. catenulatum strains
The sequence similarity and taxonomic status among the strains used in this study were con rmed by calculating the pairwise ANI (Fig. 1A) and TNI (Fig. 1B) values of all 20 genome assemblies. Strains with an ANI value of over 95% are generally considered the same species [15]. The ANI and TNI analyses produced similar clustering results, displaying distinct subspecies branches. IMAUFB085 and IMAUFB087 were grouped with most of the B. catenulatum subsp. catenulatum strains; their ANI values compared to that of B. catenulatum subsp. catenulatum JCM1194 T were 98.41% and 98.42%, and TNI values were 87.45% and 84.48%, respectively. These results con rmed the classi cation of IMAUFB085 and IMAUFB087 as B. catenulatum subsp. catenulatum. ANI analysis revealed that 3 B. catenulatum subsp. catenulatum strains, JGBg468, BCJG468 and MC1, signi cantly differed from the other B. catenulatum subsp. catenulatum strains; their ANI values compared to JCM1194 T were 93.83%, 93.88% and 93.86%, respectively, less than the threshold value of 95%. Therefore, these strains were subsequently excluded. In addition, cluster analysis distinguished two subspecies. The ANI value was greater than 95% between the 2 subspecies groups, and greater than 98% within the subspecies, indicating that these strains belonged to the same species.

Comparison of general genomic features between two subspecies
The general information of the strains shows that all B. catenulatum subsp. kashiwanohense strains are derived from infants, while only two strains of B. catenulatum subsp. catenulatum are known to be infantile isolates (Table S1). The genomic features of 17 B. catenulatum strains and two novel strains, IMAUFB085 and IMAUFB087, are summarised ( Table 1). The genomic characteristics within the B. catenulatum species exhibited different degrees of difference. The genome size and GC content of B. catenulatum isolates were 2.16 ± 0.13 Mb and 56.21 ± 0.11%, respectively. A comparison of the basic genomic characteristics of the two subspecies ( Fig. S1) indicated that the genome size of B. catenulatum subsp. kashiwanohense (2.36 ± 0.05 Mb) was signi cantly larger than that of B. catenulatum subsp. catenulatum (2.09 ± 0.07 Mb) (p=0.0021), while there were no signi cant differences in GC content (p > 0.05). The substantial genomic differences re ected the speciation boundaries of the two subspecies, while the similarity in GC content represented a close relationship between them [16][17]. In addition, B. catenulatum subsp. kashiwanohense contained more coding genes (CDSs) than B. catenulatum subsp. catenulatum (p=0.0046) and there were no statistical differences in the number of tRNAs (p > 0.05).
The overall genomic differences between the two subspecies were further explored using the BLAST Ring Image Generator (BRIG) to graphically compare B. catenulatum strains with B. catenulatum subsp.
kashiwanohense strain JCM15439 T as the reference (Fig. S2). Overall, most of the sequences in JCM15439 T were also in all other strains, and the genomes were more than 90% identical. However, two large genomic gaps (GGs) existed separately in the two newly sequenced strains, IMAUFB085 and IMAUFB087, which had less than 70% of the matched degree compared to JCM15439 T . In general, the GG sequences represent hypothetical CDSs, genomic islands or prophages [18]. These data indicate that these two strains have many unknown functions to be explored.

Phylogenetic divergence of two subspecies of B. catenulatum
Classi cation of species and establishment of intraspeci c relationships are frequently based on phylogenetic analysis. This study reconstructed the phylogenetic structure of B. catenulatum species while specifying B. pseudocatenulatum JCM1200 T as an outgroup. An NJtree based on 785 core genes was constructed and visualised with the bootstrap support of 1000 replications ( Fig. 2A). This phylogenetic tree con rmed the subspecies divergence of B. catenulatum. For example, 16 B. catenulatum strains were clearly divided into two subgroups, B. catenulatum subsp. kashiwanohense and B. catenulatum subsp. catenulatum subgroup, indicating the diacritical differences between the two subspecies at the gene level. Interestingly, the annotation of the source of the isolates suggested a signi cant association between the B. catenulatum strains and their isolated source. Infant isolates, including all B. catenulatum subsp. kashiwanohense strains and 2 B. catenulatum subsp. catenulatum strains, exhibited intraspeci c genetic similarity, while the rest were adult isolates in another cluster, indicating close phylogenetic relationships. These data suggest that the trend of divergence of the B. catenulatum strains may be dependent on their hosts. B. catenulatum may adapt its functions to infant and adult intestines respectively, thus gradually differentiating into different subspecies.
Constructing the pan-core genome of B. catenulatum The gene pool of a population contains all the genetic material and functions of a species. Roary was used to calculate the pan-core genome of the 16 B. catenulatum strains; a total of 4608 pan genes were searched. The above phenomenon is that the two subspecies of B. catenulatum shared 998 core genes (21.66%) (Fig. 2B), indicating a common ancestor of them. The genetic distribution of B. catenulatum showed that there were unique core gene sets in the 2 subspecies, with 87 core genes in B. catenulatum subsp. kashiwanohense and 63 in B. catenulatum subsp. catenulatum (Table S2). The unique core gene sets can provide an internal basis for the differentiation of Bi dobacterium species [19]. Additionally, there were different numbers of strain-speci c genes in the B. catenulatum subspecies; their numbers ranged from 20 to 578, suggesting the potential genetic diversity among B. catenulatum species.
Subsequently, the pan-core gene curves for the genomes of the B. catenulatum species were established (Fig. S3A). With the augmentation by the new genomes, the number of pan genes increased, indicating the existence of an open pan-genome within the species of B. catenulatum. In contrast, the number of core genes was not expected to be signi cantly reduced by the addition of the new genomes since the exponential trendline reached the number of 1000. Notably, B. catenulatum subsp. catenulatum has a fairly open pan-core genome (Fig. S3B), while B. catenulatum subsp. kashiwanohense's genome tends to be closed (Fig. S3C). These results indicate that B. catenulatum subsp. catenulatum may have a stronger ability to adapt to the environment, while B. catenulatum subsp. kashiwanohense exists in a more speci c and conserved habitat. This nding is consistent with the information on the sources of the B. catenulatum strains (Table S1); B. catenulatum subsp. kashiwanohense only exists in the intestinal tract of infants and is relatively rare [11], while B. catenulatum subsp. catenulatum can exist in the intestinal tract of both infants and adults and has a large number. Therefore, more novel strains of B. catenulatum subsp. catenulatum may be discovered than B. catenulatum subsp. kashiwanohense in the future.

Comparison of the main functions between two subspecies
The above results have uncovered the genetic differences between the two subspecies at the general genomics level, which are usually associated with functional differentiation [19]. Therefore, it is necessary to conduct further functional genomic comparisons between the two subspecies of B. catenulatum. Their functional genomic differences were obtained by annotating all the strains through the RAST website. The functional annotations of 16 B. catenulatum genomes were examined in 23 functional categories (Table S3). These results suggest that the function of amino acid derivatives (21.06%) is the most highly represented category within B. catenulatum followed by protein metabolism (21.00%), carbohydrate metabolism (15.73%) and cofactors, vitamins, prosthetic groups and pigments (9.76%). These data indicate that a strong ability to utilise substrates by B. catenulatum. The comparison of the main functional differences between the two subspecies showed the subspecies differ signi cantly in their metabolism of carbohydrates (p=0.01), amino acids (p=0.011) and proteins (p=0.012) (Fig. 3A, 3B, 3C). In contrast, there was no statistical difference in the cofactors, vitamins, prosthetic groups and pigments-related categories (p>0.05) (Fig. 3D). Because of the most signi cant difference between the two subspecies was in carbohydrate function, the B. catenulatum genes involved in carbohydrate utilisation were analysed.
3.6 Different carbohydrate utilisation patterns in two subspecies of B. catenulatum The carbohydrate utilisation abilities of B. catenulatum subspecies at the genomic level were compared by analysing the functional genes of carbohydrate active enzymes of 16 B. catenulatum strains. As shown in Fig. 4A, 16B. catenulatum strains were distributed in all six carbohydrate-active enzyme families, indicating that they had rich carbohydrate functions. Notably, the clustering results of CAZyme were roughly consistent with those of the phylogenetic trees in that the two subspecies were distinct. This nding not only suggests that the two subspecies have different metabolic patterns in terms of carbohydrate utilisation, but also indicates that CAZyme-related genes are closely associated with the divergence of B. catenulatum subspecies.
Among the identi ed GH families in B. catenulatum species, the most dominant ones were GH3, GH13 and GH43; meanwhile, GT2 and GT4 were the main carbohydrate enzyme families within B. catenulatum species. Comparing the main carbohydrate hydrolase families in the subspecies revealed the number of GH3 family members was signi cantly higher in B. catenulatum subsp. catenulatum than those in B. catenulatum subsp. kashiwanohense (p=0.0038, Fig. 4B). GH3 is mainly involved in the metabolism of plant-derived glycan common in the adult diet, such as β-glucosidase and xylosidase. [20]. However, there was no statistically signi cant difference in the function of GH13, GH43, GT2 and GT4 between the two subspecies (p>0.05) (Fig. 4C, 4D, 4E, 4F). Therefore, GH3 may be a key factor in the divergence of carbohydrate functional genes between the two subspecies of B. catenulatum.
Analysis of the speci c CAZymes of B. catenulatum subsp. kashiwanohense revealed ve families that only existed in the subspecies, including GH18, CBM5, GH95, CBM51 and CBM66 (Fig. 4G). The CBM family is primarily responsible for banding carbohydrates. In addition, the GH18 family often combines with CBM5 to participate in the function of chitinase, and CBM66 mainly assists in the degradation of fructose [21]. In particular, the GH95 family is speci cally involved in the production of α-L-fucosidase, the most abundant substance in HMO and closely related to the function of infant-speci c species [22]. Additionally, the CBM51 family helps GH95 enzymes pick up fucose to metabolise HMO [23]. These CAZyme families CBM51 and GH95 may be conducive to the colonisation of B. catenulatum subsp. kashiwanohense in the intestines of infants, especially the utilization of HMO, in contrast to the abundance of plant-derived glycan of B. catenulatum subsp. catenulatum, further suggesting the bias of the two subspecies in carbohydrate utilisation. In addition, GH29 enzymes often interact with GH95 enzymes to utilise HMO [24], and the study found that GH29 is only in B. catenulatum subsp. kashiwanohense except for PV20-2.
Identi cation of HMO gene clusters in B. catenulatum genomes Considering the speci c utilisation of fucosylated HMO (FHMO) by GH29 and GH95 enzymes, the FHMO gene cluster in B. catenulatum were subsequently examined. Two Bi dobacterium strains (B. longum subsp. longum SC596 and B. pseudocatenulatum JCM1200 T ) with typically structural FHMO gene clusters were selected as the reference [25] for the search for the homologous FHMO gene cluster in all of the B. catenulatum genomes. The homologous alignment showed an integrated FHMO gene cluster in all B. catenulatum subsp. kashiwanohense genomes but not in B. catenulatum subsp. catenulatum (Fig. 5), further con rming the unique ability to utilise HMO by B. catenulatum subsp. kashiwanohense. In the study, two different structures of FHMO gene clusters, named type I and type II, were found in B. catenulatum subsp. kashiwanohense (Table S4). Type shared 89.6% homology with B. longum subsp. longum SC596. The size of type was about 13.0 kb, including 11 open reading frames (ORF), manifested as GH95, GH29, fucU, dihydrodipicolinate synthase family protein (DHP), amidohydrolase family protein, SDR family oxidoreductase, fuconate dehydratase, three ABC transporters and lacI. Meanwhile, type II shared 97.8% homology with B. pseudocatenulatum JCM1200 T ; it was only found I PV20-2 and lacked GH29 and fucU genes, consistent with the results of CAZyme.
Notably, the GC content of the FHMO gene clusters in B. catenulatum subsp. kashiwanohense was signi cantly lower than the entire subspecies (Fig. S4), suggesting that its FHMO gene clusters might be obtained through horizontal gene transfer (HGT) [26]. The identi cation of the FHMO gene clusters in B. catenulatum subsp. kashiwanohense further con rmed its advantage of HMO utilisation, thus providing genomic evidence for its adaptability in the infant intestine.

Discussion
As a typical intestinal symbiotic bacteria, Bi dobacterium has experienced a long and extensive evolutionary process in human hosts [1]. For example, B. catenulatum has evolved into two subspecies, B. catenulatum subsp. kashiwanohense and B. catenulatum subsp. catenulatum. Previous studies have revealed that B. catenulatum subsp. kashiwanohense and B. catenulatum subsp. catenulatum have an close phylogenetic relationship [39]. Here, phylogenetic reconstruction has revealed genomic differences between the two subspecies. The genome size and the number of the CDSs of B. catenulatum subsp. catenulatum were signi cantly lower than that of B. catenulatum subsp. kashiwanohense. Also, both subspecies have a unique core gene set, such results represent a marker of genetic divergence [17]. In addition, there was obvious host differentiation in Bi dobacterium. For example, B. catenulatum subsp. kashiwanohense, B. longum subsp. infantis and B. breve are more common in infants while B. catenulatum subsp. catenulatum, B. adolescentis are more present in adult intestines [7-8, 11, 27]. Thus, the possible association between subspecies divergence and the host was further explored through functional genomic comparisons to explain the divergence of B. catenulatum at the genomic level.
Bi dobacterium is a genus of saccharolytic microorganisms whose ability to utilise indigestible carbohydrates is essential for their establishment in the gastrointestinal tract [28]. In this study, functional genomics revealed signi cant differences in the carbohydrates consumed by the subspecies of B. catenulatum. Notably, the CAZyme cluster results are consistent with the phylogenetic tree analysis, suggesting that the functional differences in carbohydrates may be related to the genetic divergence of B. catenulatum. This study found that the GH3 content of B. catenulatum subsp. catenulatum was signi cantly higher than that of B. catenulatum subsp. kashiwanohense. Previous studies have shown that GH3 is a key family in the evolution of Bi dobacterium and is involved in the degradation of plant polysaccharides [29]. The results here indicate that GH3 is also a key factor for the divergence of B. catenulatum in carbohydrate function. Studies have shown that the gut environment in adults is more complex than in infants because adults typically consume more di cult-to-digest carbon sources, such as plant-based dietary bre [7][8]. Kim et al. found that B. catenulatum strains can degrade fructooligosaccharides (FOS) in nutritionally restricted environments [30]. Previous studies have shown that a low-bre diet in adults can cause a signi cant increase in the abundance of B. catenulatum [31]. Here, the results demonstrate that B. catenulatum subsp. catenulatum has more GH3 that utilises plantderived glycans; therefore, the subspecies is conducive to the decomposition of di cult-to-use plantderived glycans in the adult gut.
On the other hand, infants, especially those who are breastfed, have many HMOs in their intestines. HMO is a prebiotic unique to breast milk and is especially enriched in human breast milk [32]. The ability of infant-speci c Bi dobacterium to metabolise HMO has been recognised as a speci c marker of its adaptive colonisation and bene cial for strengthening the immune system in infants [33]. For B. catenulatum subsp. kashiwanohense, which is characterised by infant adaptation [11], its two speci c CAZymes, namely GH95 and CBM51, which are notable. GH95 mainly utilises fucosyllactose, a major component of HMO [34]. On the other hand, CBM51 is bene cial to GH95 and helps it pick up FHMO [23]. Thus, this study suggests that GH95 and CBM51 act synergistically in the utilisation of FHMO by B. catenulatum subsp. kashiwanohense. Particularly, GH29 is often identi ed with GH95 as the family of metabolic HMO [35]. In B. catenulatum subsp. kashiwanohense, all strains except PV20-2 contain GH29. Therefore, the study suggests that these three families (GH29, GH95 and CBM51) play an important role in the colonisation of B. catenulatum subsp. kashiwanohense in the infant intestine.
Based on the ndings related to the HMO-related families, this study further con rms the existence of relatively conserved HMO gene clusters in B. catenulatum subsp. kashiwanohense while not in B. catenulatum subsp. catenulatum. These HMO gene clusters are highly homologous to those in other typical infantile adapted Bi dobacterium that are connected to the GH95 and GH29 families. Only the PV20-2 strain lacks GH29 and fucU, while the genome of PV20-2 shares high homology with the HMO gene cluster of B. pseudocatenulatum JCM1200 T , which can grow in puri ed FHMO [36], the lack of these two genes appears to have little effect on the overall ability to use FHMO. Given that the reference genomes in HMO gene clusters are all from infants, their clusters have been demonstrated to be conducive to their utilisation of HMO [28,36]. This study suggests that B. catenulatum subsp. kashiwanohense may have a similar utilisation mechanism of FHMO for adaptive survival in the infant intestine [28, [34][35]. James et al. [11] has con rmed that B. catenulatum subsp. kashiwanohense APCKJ1 expresses HMO genes related to the consumption of FHMO, its sole carbohydrate source. Given the high similarity of the HMO gene clusters in B. catenulatum subsp. kashiwanohense, this characteristic may be extrapolated to other strains; however, this hypothesis requires further veri cation. Notably, B. catenulatum subsp. catenulatum 1899B and IMAUFB085 belong to infant isolates, but no HMO genes were found in them, further con rming that possession of HMO genes is a genetic trait of B. catenulatum subsp. kashiwanohense.
The evolution of Bi dobacterium involves a large number of gene acquisition events [37]. Garrido et al. [28] propose that the HMO gene clusters have transferred from B. longum subsp. infantis to B. longum subsp. longum during evolution. Notably, the HMO gene cluster in B. catenulatum subsp. kashiwanohense in this study showed a signi cant decrease in GC content, likely caused by HGT events, that were important in the genomic evolution of species [27]. At present, these types of HMO gene clusters have been found in typical infant-derived strains, such as B. breve, B. longum and B. pseudocatenulatum species, and they have high homology with each other [24,28,36]. This study proposes that B. catenulatum subsp. kashiwanohense acquired HMO gene clusters through HGT from other proximal species (such as B. longum), the acquisition of HGT contributed to the speci c function of genome divergence and HMO utilisation.
Although the two subspecies of B. catenulatum are closely phylogenetically related and share a common ancestor [39], previous studies have con rmed that they showed different tendencies adapted in infants and adult intestines [7,8,11]. Taken together, given that the carbohydrate genetic pattern of the two subspecies was consistent with the phylogenetic relationship, we speculated that the B. catenulatum species evolved to retain the competitive carbohydrate function genes to adapt to the intestinal environment of infants and adults, respectively, driving the emergence of two subspecies. Our results are similar to the divergence of B. longum, for the infantis subspecies of it has speci c genes related to the metabolism of HMO and is more suitable for breast-feeding infant intestines, while the longum subspecies is present in both infant and adult hosts but has more genes for the utilization of plantderived sugars and is more suitable for adult diets [28]. The example of this divergence of species in different hosts seems to suggest a potential pattern of genetic divergence of Bi dobacterium, in which infant and adult wealthy species have more HMO genes and plant-derived glycan genes respectively in the human gut in order to adapt to their respective hosts.

Conclusions
In summary, this study proposes that the B. catenulatum species evolved to retain the competitive carbohydrate function genes to adapt to the respective intestinal environment in infants and adults, driving the emergence of two subspecies. This study has provided genomic evidence for the potential host adaptation phenomenon of B. catenulatum in infant and adult intestines. However, the number of B. catenulatum strains is limited; more strains will need to be sequenced in the future to dissect further the mechanism underlying their genetic divergence.

Methods
Bacterial strains, DNA extraction and publicly available assemblies The genomes of two strains of B. catenulatum, IMAUFB085 and IMAUFB087, were provided by the Lactic Acid Bacteria Collection Center (LABCC). Moreover, IMAUFB085 was isolated from infant faeces and IMAUFB087 from adult faeces in Tibet, China [38].
The two strains were cultured under anaerobic conditions in the Man Rogosa and Sharpe (MRS) broth with L-cysteine hydrochloride at 37°C for 24 h. DNA extraction was performed using the TIANamp Bacteria DNA Kit. Genomic DNA was quanti ed using a TBS-380 uorometer. High-quality DNA samples were obtained to construct fragment libraries.
In addition, publicly available assemblies of all B. catenulatum strains were obtained from the National Coalition Building Institute (NCBI, https://www.ncbi.nlm.nih.gov/) on 4 February 2021, including that of type strains, namely B. catenulatum subsp. catenulatum (JCM1194 T ) and B. catenulatum subsp.
kashiwanohense (JCM15439 T ) (Table S1). Additionally, the B. pseudocatenulatum strain (JCM1200 T ) in the B. adolescentis group, most closely related to B. catenulatum according to the phylogenetic relationship of Bi dobacterium genus in a previous study [39], were downloaded to infer phylogenetic relationships across species within it.

Genome sequencing and assembly
Genome sequencing was performed using the Illumina HiSeq platform to generate 150-bp paired-end reads for each sample. Then, the sequences were ltered through the Illumina HiSeq system. The highquality sequences were assembled using SOAPdenovo2 [40] on a 64-bit Linux system. High-quality data corresponding to a sequencing depth of about 387-fold, was generated for each strain. In addition, local inner gaps were lled, and single-base errors were corrected using GapCloser (http://sourceforge.net/projects/soapdenovo2/ les/GapCloser/).

Genome annotation
In this study, all the general genomic information of B. catenulatum genomes was generated using selfmade Perl scripts. The functional gene information of this bacterium was obtained by performing the gene prediction and preliminary annotation of all B. catenulatum genomes through the Rapid Annotation using Subsystems Technology (RAST) server (https://rast.nmpdr.org/rast.cgi). In addition, tRNA genes were identi ed using tRNAscan-SE (http://trna.ucsc.edu/tRNAscan-SE/).

ANI (Average nucleotide identity) and TNI (Total nucleotide identity)
The genetic relatedness between the two B. catenulatum subspecies was evaluated, and the taxonomic status of the strains in this study was con rmed by analysing the ANI and TNI values of all the strains. B. pseudocatenulatum JCM1200 T , the type strain most phylogenetically related to B. catenulatum [39], was included in the comparison. All pairwise ANI values were calculated according to the method proposed by Goris et al. [41]. TNI values were calculated according to the method proposed by Chen et al. [42]. Finally, the clustering heat map was drawn using TBtools [43].
Construction of pan-core genome and strain-speci c genes The annotated genomes of B. catenulatum were obtained using Prokka v1.12 [44] and processed using Roary [45] to identify the pan genes, core genes and speci c genes using the default parameters. The intersection groups, representing the unique sets of genes identi ed only between the intersected genomes, were visualised using the UpSet diagram in TBtools [43].

Phylogenetic analysis
The core gene alignment from Roary was used in TreeBeST [46] with 1,000 bootstrap iterations to build a phylogenetic NJtree through Neighbor-Joining (NJ) [47]. The phylogenetic trees were then visualised and annotated using iTOL (https://itol.embl.de/). BRIG (BLAST Ring Image Generator) BRIG v0.95 [48] was adapted to compare the genomes of B. catenulatum strains based on a JAVA language environment. The image of the circular genomes was also generated through BRIG.
CAZyme (Carbohydrate-active enzymes) online annotation Given the carbohydrate metabolic activity of Bi dobacterium, CAZyme online annotation of B. catenulatum strains were accomplished using three annotation tools, HMMER, DIAMOND and Hotpep searches [49]. Carbohydrate-active enzymes were annotated in the genome sequence to predict potential families of glycosyltransferases (GTs), glycoside hydrolases (GHs), carbohydrate esterases (CEs), polysaccharide lyases (PLs), auxiliary activity (AA) and carbohydrate-binding modules (CBMs). The identi cation of CAZymes across the B. catenulatum genomes was carried out using the dbCAN2 meta server (http://bcb.unl.edu/dbCAN2/). According to the annotation results, the detailed information on the active carbohydrate enzyme family was checked on the CAZyme website (http: //www.aczy.org/).

Detection of the HMO gene clusters
Taking B. longum subsp.longum SC596 and B. pseudocatenulatum JCM1200 T as the reference, which possess typical HMO gene clusters. In addition, the genome of SC596 was obtained from IMG database [50]. The corresponding protein-encoding sequences were extracted from the strains and compared using NCBI BLASTp with default parameters. The recognised HMO gene clusters were visualised using the genoplotR package.

Statistical analysis
The data were presented as means ± SEM. The Wilcoxon signed-rank test was used to verify the signi cance of the difference between the groups, and visualisation was performed using the ggpubr packages in R (4.0.3). Lastly, signi cance was set at a p-value of less than 0.05.

Data availability
The assembly and Sequence Read Archive (SRA) data of the two newly isolated sequences in this work were submitted as a Whole Genome project (BioProject No. PRJNA751426) at GenBank under the accessions JAIEWL000000000 (IMAUFB087) and JAIEWM000000000 (IMAUFB085) (

Availability of data and materials
The assembly and Sequence Read Archive (SRA) data of the two newly isolated sequences in this work were submitted as a Whole Genomeproject (BioProject No. PRJNA751426) at GenBank underthe accessions JAIEWL000000000 (IMAUFB087) and JAIEWM000000000 (IMAUFB085) (available at Figure 2 Phylogenetic tree of two subspecies of B. catenulatum Phylogenetic NJtree of B. catenulatum species and taken B. pseudocatenulatum JCM1200 T as the outgroup. Bootstrap was set as 1000. All the B. catenulatum strains were annotated to isolate location and source. The scale bars represent 0.01 substitutions per site (A). UpSet diagram showing shared and unique core genes distribution among B. catenulatum strains. The horizontal bars represent the total number of genes identi ed of individual strains. The vertical bars or intersections represent the number of genes that were regulated by one or more strains. The orange dots represent unique genes and the yellow dots represent core genes. The green items represent information about B. catenulatum subsp. catenulatum, and the blue items represent information about B. catenulatum subsp. kashiwanohense. Groups with fewer than 10 genes were ltered (B).