The metagenome sequencing described community composition of microbiota in different gut regions of weanling piglets
A total of 42 samples, including 15 ileum contents, 18 colon contents and 9 faeces, were collected from 42 weanling piglets from Guangdong Academy of Agricultural Sciences. Every 3 samples from one line were combined into one sample which ultimately resulted in 14 samples. To investigate the microbiota composition in each sample, we first performed deep metagenomic sequencing for these samples and generated 20 Gb of data for each sample on average. Notably, over 12% of reads mapped to bacteria that could not be classified at the species level, which represent novel species in the gut of weanling piglets (Supplementary Fig. 1). 99.8% of the bacterial reads were assigned into 10 phyla. The phylum Firmicutes dominated the gut microbiota in the ileum, colon, and faeces, accounting for 98.9%, 67%, and 76.8% of the bacterial sequences, respectively (Figs. 1a, b). We also discovered Lactobacillus were common present in 3 regions and accounted for the highest proportion in samples from faeces and ileum and some samples from colon (Supplementary Fig. 2a, b). The higher microbial diversity, including the Shannon index, was observed in the colon as compared with the faeces and ileum (Fig. 1c), consistent with previous studies13, 14. The principal coordinates analysis showed that the microbiota composition of the faeces and colon were similar and were different from that of the ileum (Fig. 1d). These results indicated the regional similarities and differences in microbiota composition and the beneficial bacteria, like Lactobacillus dominated the gut microbiota of weanling piglets, were valuable resource for the furtherly cultured-based study.
Bacteria culturing and genome sequencing of gut of weanling piglets
To advance our understanding of the diversity of gut microbiota and acquire bacterial isolates from the weanling pig gut, we used a variety of culture methods, including nutrient-rich medium (Peptone Yeast Extract Glucose Medium, Modified, abbreviated as MPYG, with or without sheep blood), oligotrophic medium (R2A), Columbia Agar, spore medium, etc. In total, 25 culture conditions were used (Supplementary method). Most of the bacteria were cultured in sheep blood-MPYG, MPYG, and sheep blood-spore medium. A total of 1,476 strains were obtained and identified using 16S rRNA gene sequence, the detail information is provided in Supplementary Table 2. These isolates belonged to 5 different phyla, including Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, and Fusobacteria (Supplementary Fig. 3). Limosilactobacillus reuteri was the most abundant, which represented 18% of the taxa isolated and half of these 1,476 bacteria were Lactobacillus (recently divided as Lactobacillus, Ligilactobacillus, and Limosilactobacillus), representing the beneficial bacteria in gut of piglets (Supplementary Fig. 4). Most of these bacteria were the first time isolated from the pig gut microbiota, which expanded the bacteria diversity of the gut microbiota of pig. Notably, the isolated bacteria from the ileum, colon, and faeces were much different with highest species diversity of bacteria from faeces sample, that consistent with the analysis of metagenome (Supplementary Fig. 4). All bacteria have been deposited in the China National Genebank (CNGB) Shenzhen for public accessibility.
Comparison of cultured genomes and MAGs
We selected 266 representative strains that covered the bacterial diversity of the 1,476 isolates for genome sequencing. After assembly, 266 high-quality genomes with completeness more than 90% and contamination less than 5% were obtained (Supplementary Table 3a), of which 195 genomes (73.31%) were more than 99% completeness (Supplementary Fig. 5a), indicating that the great majority of the genomes were relatively complete. Based on the 95% average nucleotide identity (ANI), we clustered the 266 genomes into 59 species-level clusters. The vast majority of clusters were Firmicutes (242 genomes, 52 clusters), followed by Proteobacteria (18 genomes, 3 clusters), Actinobacteriota (4 genomes, 2 clusters), Bacteroidota (1 genome, 1 cluster) and Fusobacteriota (1 genome, 1 cluster) according to the taxonomic annotation of these genomes (Fig. 2a). Notably, 10 out of 59 clusters (23 genomes) lacked a species-level match with the GTDB-Tk (The Genome Taxonomy Database Toolkit) database (Fig. 2a and Supplementary Table 3a).
By Assembling and binning metagenomic sequence data from 14 samples, we reconstructed 482 non-redundant MAGs with completeness more than 70% and contamination less than 5% (Supplementary Table 3b). Only 223 (46.27%) MAGs were high-quality genomes (completeness more than 90%), 11 of which with completeness more than 99% (Supplementary Fig. 5b), which indicated that the quality of the genomes generated by culture-based methods was generally higher than metagenomic assembly. To evaluate the species bias caused by the two methods, we clustered 489 high-quality genomes. The results showed that MAGs and isolated genomes had 74.63% and 25.42% of unique species-clusters, respectively (Supplementary Fig. 6a, b), illustrating that the combination of two methods is needed to more comprehensively represent the species of the pig gut microbiota. We thus clustered the 748 genomes into 428 species-clusters to integrate the reference genomes. According to the GTDB classification annotations, 5 clusters (5 MAGs) were Archaea, and the remaining clusters were bacteria from 10 phyla (Fig. 2a and Supplementary Table 4). It is worth noting that 77.80% of clusters (333 clusters) could not be matched with any existing species, which represent novel species, and 96.70% (322 clusters) of the novel species without isolate genome representative in this study (Fig. 2a).
Previous studies have used metagenomics or culture-based method to study the intestinal microbes of pigs. Wang et al.9 generated 360 high-quality assembled genomes for pig fecal microbiome. And Wylensek et al.12 established the pig intestinal bacterial collection (PiBAC), a resource of cultured bacteria from the pig intestine. We evaluated the novelty of our genomes by mapping 748 genomes against these two reference datasets. The result showed that our MAGs and isolated genomes had 65.89% and 50.85% unique clusters (novel species), respectively (Fig. 2b, c and Supplementary Fig. 7a, b), which contribute new resources for the research of pig gut microbiota. In addition, most of the clusters were only detected in the MAG dataset, but not in any culture studies, reflecting the lack of culture-based studies of pig intestines.
The assembled genes contributed to the present gene catalog
We next predicted genes from the 14 metagenomic samples and generated a non-redundant gene catalog with a number of 5,283,405, which covered 22.01% of the reference gene catalog established by Xiao et al.5 (Fig. 3a, b). The samples from faeces contained the most genes, followed by the colon and ileum. Each sample contributed more than 50% of novel genes for the gene catalog. Among them, samples from ileum have the largest proportion of novel genes, followed by colon and faeces (Fig. 3a). Considering the samples of the reference gene catalog by Xiao et al. were derived from pig faeces, we thought that construct a more complete gene catalog of pig intestinal needs to include samples from multiple parts. Altogether, we expanded the present gene catalog to 10.91 Mb (Fig. 3b). In addition, the coverage of genes obtained by MAGs and isolated genomes were 12.97% and 1.29%, respectively, whereas 62.97% of the genes in the isolated genomes could not be detected by metagenomics (Fig. 3c). This indicates that the missed genes from metagenomic analyses can be detected by culture-dependent methods.
Functional insight of gut microbial in weanling piglets
The comprehensive gene catalog of pig intestines enables a higher-resolution functional analysis to better understand the interaction between gut microbes and pigs. We subsequently performed a KEGG pathway annotation of total protein sequences and found that 213 pathways were annotated in at least one genome. The most general pathway was Metabolic pathways, followed by Biosynthesis of secondary metabolites, Biosynthesis of antibiotics, Microbial metabolism in diverse environments, and Biosynthesis of amino acids, which were extensively annotated in all genomes (Supplementary Fig. 8 and Supplementary Table 5). In addition, we predicted 229 carbohydrate-active enzymes (CAZymes) in all genomes to explore the ability to metabolize carbohydrates (Fig. 4a and Supplementary Table 5). The result showed that the CAZymes of bacteria are mainly glycoside hydrolases (GHs), whereas archaea are glycosyl transferases (GTs) (Fig. 4a). This indicates that bacteria obtained energy for growth by metabolizing carbohydrates, while archaea obtain energy through other pathways and synthesize sugars substance.
Antibiotics are widely used for the treatment and prevention of infection during the suckling period. Microbes may accumulate antibiotic resistance genes (ARGs) under the exposure of antibiotics, and can eventually evolve into drug resistance. We have predicted a total of 2,113 ARGs that can be classified into 38 drug classes from metagenomics, isolate genomes, and MAGs (Fig. 4b). Comparison of the ARGs predicted by the three approaches showed that metagenomics can detect the largest number and types of ARGs, followed by culture-dependent, while a large number of ARGs will be lost in the process of reconstructed genomes from metagenomic bins (Fig. 4b). For analysis of virulence factors (VFs), we used the predicted genes for blast to VFDB45. A total of 1,866 VFs were predicted in 748 genomes (Supplementary Fig. 9). Similarly, a large amount of VFs cannot be detected in MAG. We note that Escherichia flexneri not only had most various ARGs, but also contained a large number of VFs.
Discovery of novel SMBGs in the gut microbiome of weanling piglets
Microbes produce a series of secondary metabolites that are not necessary for life activities but have biological activity, which usually mediate important interactions between microbe and microbe-hosts. We explored secondary metabolite biosynthetic gene clusters (SMBGs) in the 748 genomes, and identified 445 SMBGs that could be classified into 19 types from 292 genomes (Supplementary Table 6). Most of these SMBGs are responsible for the synthesis of bacteriocin, followed by sactipetide, arylpolyene, and lantipeptide (Fig. 4c). These SMBGs are predicted from the genomes from 6 phyla, among which Firmicutes contained the largest number and widest variety of SMBGs (Fig. 4c).
For clustering the 445 SMBGs, we generated a total of 231 families, which were distributed in 5 classes of ribosomally synthesized and post-translationally modified peptides (RiPPs), terpene, non-ribosomal peptide synthetases (NRPs), polyketide synthases (PKS) and Others. RiPPs make up the largest class, but only 4 families of RiPPs that are included the reference with known functions in the MIBiG database. Salivaricin CRL1328 is a bacteriocin produced by Ligilactobacillus salivarius15, 16, which has activity against pathogenic bacteria such as Enterococcus faecalis and Enterococcus faecium. We have predicted SMBGs related to the biosynthesis of Salivicin CRL1328 in 22 isolated L. salivarius genomes (Fig. 4d). We speculating that these strains have the potential to inhibit infections. Gassericin, derived from Lactobacillus gasseri, is another important bacteriocin. Studies have shown that Gassericin A can confer diarrhea resistance in pigs17. We discovered SMBGs that synthesize Gassericin E and Gassericin T in isolated genomes of L. johnsonii (Fig. 4d).
The pangenome analysis of representative species
To extend the phylogenetic analysis of representative species in the gut of weanling piglets at a genome-wide level, we conducted a pangenome analysis of 8 species with the genome number more than 10, including Escherichia flexneri (n=16), Enterococcus faecalis (n=22), Enterococcus faecium (n=21), which represent opportunistic pathogens, and Lactobacillus amylovorus (n=16), Ligilactobacillus salivarius (n=43), Limosilactobacillus mucosae (n=13), Limosilactobacillus reuteri (n=21), Lactobacillus johnsonii (n=17), which represent probiotics. As expected, the resulting accumulation curves showed that the gene repertoires of pan-genome of all the representatives were increased on addition of a new genome and the core genome decreased in contrast (Fig 5a and Supplementary Fig. 10). However, the number of gene families did not show a rapid increase in the pan genome. E. flexneri contained the largest pan genome size of 8,143 genes, 3,111 of which formed the core genome (Supplementary Fig. 11 and Supplementary Table 7). Over half of core genes present in the genomes of E. faecalis and E. faecium, indicating that gene loss and acquisition happened less frequently in these two species. The size of the species-specific pan genomes of the 5 lactobacilli varied from 3,194 to 4,187 genes, respectively (Supplementary Fig. 11 and Supplementary Table 7).
We next analyzed the function details of the core genomes and pan genomes, including the Clusters of Orthologous Groups (COGs) and Kyoto Encyclopedia of Genes and Genomes (KEGG) classification, biosynthesis of bacteriocin, and ARGs. It is obvious that different COG functional classes were enriched in both the core and pan genomes between the opportunistic pathogens and probiotics. COG0438 (Glycosyltransferase involved in cell wall biosynthesis), COG4690 (Dipeptidase), and COG1307 (Fatty acid-binding protein DegV) were enriched in the core genomes compared to their respective accessory and unique genomes of the 5 lactobacilli (Supplementary Table 8), indicating that these COG categorizations represented housekeeping functions involved in implementing the basic growth and metabolism of these probiotics. However, in contrast to lactobacilli, E. flexneri, E. faecalis, and E. faecium possessed more abundant COG1609 (DNA-binding transcriptional regulator, LacI/PurR family) and COG1132 (ABC-type multidrug transport system, ATPase and permease component) in their core genomes (Supplementary Table 8).
For determining the ARGs distribution in the core and pan genome of the 8 species, we discovered tetracycline and macrolide were distributed in the core genome of E. flexneri, E. faecalis, and E. faecium, respectively (Fig. 5b), indicating a wide prevalence of tetracycline and macrolide resistance among these species. Their presence may have resulted from the overuse of antibiotics in farmed pigs for disease prevention in past years. Nevertheless, rare ARGs present both in the core and pan genome of other lactobacilli (Fig. 5b and Supplementary Table 9), suggests the safety of these species for using as probiotics in feeding of weanling piglet.
Reuterin, produced by some strains of L. reuteri, has antimicrobial properties18. We investigated the prevalence of biosynthesis genes (pdu-cbi-cob-hem) related to reuterin in the core and dispensable genomes and detected the production of reuterin experimentally. In total 41 genes related to the synthesis of reuterin were predicted in the pan genome of L. reuteri, including cbiA~cbiT (15 genes), cobB~cobU (10 genes), hemA~hemL (7 genes), and pduA~pduV (9 genes) (Fig. 5c and Supplementary Table 10). Notably, only cobB, cobC_1, and cobC_2 were found in the core genomes for all the 21 genomes. The rest were present in the accessory or unique genomes. 9 of 21 genomes encoded completed biosynthesis gene clusters (pdu-cbi-cob-hem), indicating these strains were most likely to be reuterin producers. We next detected the production of Reuterin in the fermentation supernatant of the culture of 19 isolated L. reuteri. As expected, 4 strains with completed biosynthesis gene clusters (pdu-cbi-cob-hem) yield reuterin ranged from 0.6 to 2.54 nm/L (Supplementary Table 10).