Experimental animals and sampling
Two experimental pig cohorts were used in this study. The discovery cohort comprised 550 Duroc pigs from Shahu (280 pigs) and Jiangying (270 pigs) farms in southern China. Another 148 Duroc pigs from the Jiangying farm were used as the validation cohort. All experimental pigs were raised under similar feeding and management conditions. The commercial formula feeds provided to experimental pigs of each farm contained 60% corn, 15% soybeans, 10% wheat bran, and 8% rice polishings. The main nutrient components of the diets are listed in Additional file 2:Table S8. Diet and water were offered ad libitum. Backfat thickness and transection area of the longissimus dorsi muscle were measured in the middle of the last 3rd and 4th ribs using a B-model ultrasound instrument (Pie-Medical, Netherlands) when the body weight of experimental pigs achieved 120 ± 10 kg, around the age of 160 ± 10 days. The GPS software was used to adjust the backfat thickness and transection area of the longissimus dorsi. Lean meat percentage (LMP) was calculated by the model: adjusted PPL = [80.95-(16.44*adj.bf)+(4.693*adj. LMA)]*0.54[49], where PPL represents lean meat percentage, and adj.bf and adj. LMA represent adjusted backfat thickness and transection area of longissimus dorsi, respectively. The fecal samples were collected from all experimental pigs at the age of 160 days, conserved in sterilized tubes, and immediately immersed in liquid nitrogen for transportation and then stored at −80°C until use. In the validation cohort, we chose 16 fecal samples with extreme phenotype values for metagenomic sequencing, including eight samples with high LMP values (57.83 ± 0.54, mean ± SD) and eight samples with low LMP values (54.57 ± 0.59). To investigate the abundance of P. copri isolated in this study in the gut of pigs fed diets with different fiber contents, metagenomic sequencing data of six fecal samples from natural free-living wild boars (high-fiber diets) and 13 fecal samples from semi-grazing Tibetan pigs (supplemented with potato and highland barley, high-fiber diets) were also used in this study. All experimental pigs were healthy and had not received any antibiotics, probiotics, or prebiotics within at least two months before sample collection.
DNA extraction and 16S rRNA gene sequencing
Fecal microbial DNA was extracted using the QIAamp DNA Stool Mini Kit (QIAGEN, Germany) following the manufacturer's guidelines. DNA concentration was measured with a Nanodrop-1000 (Thermo Scientific, USA), and the quality was assessed by agarose gel electrophoresis. The barcoded fusion forward primer 515F (5’-GTGCCAGCMGCCGCGGTAA-3’) and the reverse primer 806R (5’-GGACTACHVGGGTWTCTAAT-3’) were used to amplify the V4 hypervariable region of the 16S rRNA gene in the discovery cohort. The primers 338F (5’- ACTCCTACGGGAGGCAGCA -3’) and 806R (5’- GGACTACHVGGGTWTCTAAT -3’) were used to amplify the V3-V4 hypervariable region of the 16S rRNA gene in the validation cohort. The PCR amplification conditions were as follows: initial 95°C denaturation step for 10 min, 35 cycles of 95°C for 25 s, 55°C for 20 s, and 72°C for 5 min followed by a final extension for 10 min at 72°C. All amplicons were sequenced using the paired-end method on a MiSeq platform (Illumina, USA) following the standard protocols.
The raw 16S rRNA gene sequencing data were filtered for the primer sequences, the barcodes, and the low-quality reads according to Illumina’s quality control procedure. High-quality paired-end clean reads were assembled using FLASH (v1.2.11) [50]. The USEARCH (v7.0.1090) quality filter pipeline was used to filter the putative chimeras and to choose Operational Taxonomic Units (OTUs) at 97% sequence identity [51]. Only those OTUs that had relative abundance > 0.05% and were present in more than 1% of the experimental pigs were included for further analysis. Taxonomies were assigned for the aligned sequences using Quantitative Insights Into Microbial Ecology (QIIME, v1.80) with a Ribosomal Database Project (RDP) classifier [52].
Construction of enterotype-like clustering
Enterotype-like clustering was performed according to the method described previously [53]. In brief, Jensen-Shannon divergence (JSD) distances were calculated based on the relative abundances of bacterial taxa at the genus level using the Partitioning Around Medoids (PAM) method. The optimal number of clusters and the groups’ robustness were evaluated with the Calinski-Harabasz (CH) index and silhouette value. Sparse Correlations for Compositional data (SparCC) was applied to determine co-abundance (positive) and co-exclusion (negative) relationships between genera based on their relative abundances [54]. Significant correlations between bacterial genera were identified using the partial correlation and information theory (PCIT) algorithm [55]. The absolute correlations were transformed into links between two genera in the genus network, and the networks were visualized in Cytoscape (v3.4.0). The comparison of the LMP values between enterotypes was performed by Wilcoxon rank sum tests in the R package (v3.5.1).
Association analysis between OTUs and pig LMP
The residuals of phenotypic values of the LMP corrected for the effects of sex and sampling batch (three and two sampling batches for discovery and validation cohort, respectively) were used for further association analysis between the LMP values and the relative abundances of OTUs. Because the relative abundances of OTUs exhibited a non-normal distribution pattern, the association analysis was performed using a two-part model as reported previously [56]. In brief, the two-part model accounts for both binary and quantitative characteristics of gut microbial abundance. The binary model (adj_p = β1b + e, adj_p represents LMP values adjusted for the effects of sex and batch, β1 is the estimated binary effect, b is a binary feature, and e refers to the residuals) describes a binomial analysis that tests for association of detecting a microbe with the LMP. The binary feature of a microbe under investigation was coded as 0 for undetected or 1 for detected in each sample. The quantitative model (adj_p = β2q + e, where β2 is the estimated quantitative effect, and q is a quantitative feature) evaluates the association between the abundances of the detected microbes and the LMP values. A meta-analysis was performed to assess the effects of both binary and quantitative models by using an unweighted Z method (Z = ∑ki=1 zi/ ~N(0,1); zi = -1(Pi)). The final association P value was set as the minimum of P values of binary, quantitative, and meta-analyses. In total, 1000 permutation tests were performed to correct for false positives, and a false discovery rate (FDR) < 0.01 was set as the significance threshold.
Co-abundance group (CAG) analysis of OTUs
The OTUs having relative abundance > 0.1% were used to construct CAGs. We first calculated the correlation coefficients among OTUs using the Sparse Correlations for Compositional data (SparCC) algorithm in both test and validation cohorts[54]. Then, CAGs were defined by a heat plot using the SparCC correlation coefficient matrix and Ward’s linkage hierarchical clustering through the Made4 (v3.40) package [57]. PERMANOVA was performed to assess the accuracy of clustering with 1000 permutations at P < 0.01 [58]. The network plot highlighting the SparCC correlations among CAGs was constructed in Cytoscape (v3.6.0) [59]. Spearman’s correlation analysis was performed to test the correlations between CAGs and the LMP values in both test and validation cohorts.
Metagenomic sequencing analysis
A pair-end (PE) library with insertion size of 350 bp was constructed for each of 16 samples according to the manufacturer’s instructions (Illumina, USA). Sequencing was performed on a Novaseq 6000 platform (Illumina, USA). High-quality reads were obtained by filtering out adaptors, low quality reads, and host genomic DNA contamination from the raw data.
We assembled the high-quality reads into contigs using the SOAPdenovo assembler (v.2.21) [60]. The USEARCH (v.7.0.1090) program was used to exclude the redundant contigs [51]. The contigs more than 300 bp in length were used to predict open reading frames (ORFs) by applying MetaGeneMark (v2.10) [61]. A non-redundant gene set containing 2,799,188 genes was constructed by excluding the redundant genes from all predicted ORFs using Cd-hit software (v4.6.1) [62]. A gene abundance profile was generated by mapping the high-quality reads from each sample to the non-redundant gene set using the screen function in MOCAT (v2.0) [63]. To assess gene richness in the high and low LMP pigs, we calculated the total gene number in each sample using the pair-oriented counting method [16]. The α-diversity (Shannon index) was calculated using the gene abundance profiles using the vegan R package (v3.5.1). Comparisons of gene counts and the α-diversity between high and low LMP pigs were performed using the Wilcoxon rank sum test. Taxonomic assignments of the predicted genes were performed using the BLAST + Lowest Common Ancestor (BLAST + LCA) algorithm based on the sequence similarity to the reference genomes in the non-redundant (NR) database [64]. Functional annotations were performed by aligning the putative amino acid sequences that were translated from the predicted genes against CAZy and KEGG databases using BLASTP [65]. Linear discriminate analysis effect size (LEfSe) was used to identify the bacterial species and function capacities of gut microbiome having significantly different abundances between high and low LMP pigs. Correlations between the LMP-associated bacterial species and the LMP-associated function capacities of gut microbiome were evaluated in the 16 samples with metagenomic sequencing data using Permutational analysis of variance (PERMANOVA) based on 9,999 permutations using the vegan package in R (v3.5.1) [12]. The significance threshold was set at FDR < 0.05. The correlation coefficient was calculated as Spearman’s rank correlation. The heatmap was plotted using the gplots package in R (v3.5.1) [66].
The metagenomic sequencing data of another 20 fecal samples from the discovery cohort were obtained in our previous study via the same method [17] and were also used in this study. The association of bacterial species with the LMP in the integrated 36 metagenomic sequencing data was analyzed by a two-part model as described above. The comparison of the abundance of P. copri among pigs fed diets with different fiber contents was performed with the metagenomic sequencing data of fecal samples from six wild boars, 13 Tibetan pigs, and 36 Duroc pigs as described above. Clean reads of each sample were aligned to the gene catalog using BWA MEM (v0.7.17-r1188) [67], and then the number of successfully assigned reads was computed using FeatureCounts (v2.0.1) [68]. The abundances of genes were normalized to fragments per kilobase of gene sequence per million reads mapped (FPKM). The abundance of P.copri was calculated by summing the abundances of all the members belonging to this species. The comparison of gut P. copri abundances among wild boars, Tibetan, and Duroc pigs was performed by a Wilcoxon test and visualized using the ggpubr package in R (v3.6.2).
Isolation and culture of the bacterial strain of P. copri from pig fecal samples
The fecal samples from 22 experimental Duroc pigs with both extreme phenotypic values of fat accumulation (low LMP) and high abundance of P. copri were collected and used for the P. copri isolation experiment. One-gram fecal samples were suspended in phosphate buffered saline (PBS) buffer and serially diluted to 10-8. Eighty-microliter diluted samples were plated anaerobically on Bacteroides mineral salt agar to isolate P. copri [69] . The plates were incubated at 37oC for 2–7 days in an anaerobic workstation (ELECTROTEK AW500SG,UK) filled with 80% N2–10% CO2–10% H2 gases [70]. A single colony from plates was selected according to the main characteristics of the strain that we were looking for based on the previous description [71], i.e., white, circular, convex and gram-negative rods, and purified by streaking the single bacterial colony on modified PYG agar supplemented with 5% (v/v) sterile defibrinated sheep blood with a sterile probe [71]. The plates were maintained under the culture conditions mentioned above for two days. The 16S rRNA gene of the single strain was amplified using two universal primers 27F (5'-AGAGTTTGATCCTGGCTCAG-3') and 1492R (5'-GGTTACCTTGTTACGACTT-3') and sequenced by the Sanger method. The 16S rRNA gene sequences were then aligned to the NCBI nucleotide sequence database to determine P. copri strains. In addition, we blasted the 16S rRNA gene sequences of the isolated strains with the V3–V4 sequence of the OTU1905 (P. copri) that was most significantly associated with the LMP in this study. The isolated strain with > 99% sequence identity was used for gavage in germ-free mice. The P. copri strain isolated above was cultivated in modified PYG medium for 36 h under anaerobic conditions, harvested in log phase, centrifuged at 1000 rpm for 10 min, and then washed twice with PBS. The precipitate was re-suspended with 5% sterile non-fat milk prepared by PBS and stored at −80°C until use.
Whole-genome sequencing of P. copri
The isolated P. copri strain was recovered and grown on PYG liquid medium at 37℃ with 80%-N2-10%CO2-10%H2 for 72 h. Ten milliliters of cultured PYG fluid was centrifuged at 5000 rpm for 10 min. P. copri cells were washed twice using sterilized PBS solution and collected for DNA extraction. Genomic DNA of P. copri was extracted using QIAamp DNA Mini Kit according to the manufacturer’s instructions. The quantity and quality of extracted DNA were evaluated by agarose gel electrophoresis, NanoDrop-2000 (ThermoFisher, USA) and Qubit (ThermoFisher, USA).
A Nanopore sequencing library was prepared according to Oxford Nanopore’s “1D gDNA selection for long reads” protocol (Oxford Nanopore Technologies, UK). In brief, 2 μg of genomic DNA of P. copri was sheared using a g-Tube (Covaris, USA) with 150 μl of nuclease-free water at 5000 rpm for 2 min. Long DNA fragments were enriched using the Blue Pippin selection system (Sage Science, USA). Subsequent purification of the DNA fragments was performed using AMPure beads. Nanopore 1D adapters were ligated to the end-repaired and adenylated DNA fragments using NEB Blunt/TA Master Mix (NEB, UK). The libraries were sequenced on a GridION X5 (Oxford Nanopore Technologies, UK). To improve the sequence quality, a library for second generation sequencing was prepared according to the standard protocol and sequenced on an Illumina Hiseq-2500 platform using a paired-end strategy.
Base calling of Nanopore raw data was performed with cloud-based Metrichor workflow [72]. Nanopore reads were processed using Poretools to convert fast5 files to fasta format [73]. High-quality reads were selected for further genome assembly. Canu (v1.7.11) was used for genome assembly of the Nanopore sequencing data [74]. Illumina paired-end reads were aligned and used for correcting base errors, fixing mis-assemblies, and filling gaps by Pilon (v1.22) [75]. After removing redundant sequences, the automated assembly of the P. copri genome was performed by Circlator (v1.5.5) [76]. To estimate the sequence contiguity and coverage, Nanopore reads after quality control were mapped to the assembled genome using Minimap2 (v2.11-r797) and Samtools (v1.9) [77, 78]. In addition, plasmid sequences were identified by blasting the genome against the plasmid database [79].
Protein-coding genes of P. copri genome were predicted using Prodigal (v2.6.3) [80]. The predicted protein-coding genes were further annotated with InterProScan using Blast2GO against Pfam (release 31.0), TIGRFAMs (release 15.0) and SMART (v8.0) databases [81, 82]. Functional annotation of protein-coding genes was also performed by Blast2GO with the KEGG database. To compare the abundances of those interesting genes identified on the P. copri genome and participating in arachidonic acid metabolism, BCAA biosynthesis, AAA biosynthesis and metabolism, insulin resistance, and other glycan degradation between high and low LMP pigs, the sequences from the metagenomic sequencing data were mapped to the obtained P. copri genome. The relative abundances of these genes were determined and compared using Wilcoxon tests. FDR < 0.05 was set as the significance threshold.
Construction of phylogenetic tree and analysis polysaccharide utilization loci of P. copri isolates
To construct the phylogenetic tree of P. copri isolates from humans and pigs, we downloaded 111 P. copri genomes from westernized and non-westernized human gut microbiome [19]. Gff file of each genome was generated using prokka (v1.11) [83] and used to produce the alignment of core genes by Roary (v3.11) [84]. The phylogenetic tree was constructed based on the alignments of core genes using neighbor-joining approach in Megan 7 and visualized by iTOL [85]. Polysaccharide utilization loci of P. copri isolates were predicted by using deCAN-PUL (http://bcb.unl.edu/dbCAN_PUL) with identity > 75% and E-value < 1e-50.
Mouse intervention study
Twenty-one germ-free mice having similar body weight and size (Kunming; 12 males and nine females, each six weeks of age) used in this study were housed in cages under sterile conditions. Male and female mice were kept separately. Feed and water were available ad libitum. After two weeks of acclimatization to the new environment and the standard chow diet, mice were randomly divided into three groups (four males and three females per group). One group received a chow diet with P. copri administered by gavage. A high-fat diet group (60% fat, Research Diet, D12492) was administered with P. copri by gavage, and a chow diet group without gavage was used as a control. For the two colonization groups, mice were given 100 μl of P. copri suspension (1 × 107 CFUs/μl) three times a week for four weeks. To further investigate the effect of diet on P. copri’s role in host fat accumulation, we used another 18 germ-free mice with the similar body weights and sizes (C57BL6; nine males and nine females) to perform gavage experiments using P. copri. These germ-free mice were managed and administered the bacteria using the same gavage methods and procedures described above. The 18 mice were randomly divided into three groups (three females and three males for each group) comprising a standard chow diet group, a high-fat diet (60% fat, Research Diet, D12492) group, and a high-fiber diet (35% fiber) group. The feeding experiment lasted four weeks, and mice were administered P. copri by gavage three times per week as described above.
Fecal samples were collected at the end of the gavage experiment, dipped into liquid nitrogen immediately, and stored at −80°C until use. All mice in each group had lean mass measured and body fat percentage calculated by a whole-body composition analyzer (Niumag, China) following the manufacturer’s instructions. After the body weight measurements, all mice were sacrificed by cervical dislocation. Epididymal fat was isolated and weighted for all mice. The epididymal fat percentage (EMP) was calculated. Tissue samples of colon, epididymal white adipose, and muscle were sampled from each experimental mouse for further RNA-seq analysis. Venous blood was taken from the inner canthus of each mouse for serum metabolomic analysis. The concentrations of lipopolysaccharide, intestinal barrier permeability plasma biomarkers, and pro-inflammatory cytokines were also determined in serum samples of phenotyped mice by ELISA using the method described above.
Quantifying the abundances of P. copri in treated mice
Mouse fecal bacterial DNA was extracted using the QIAamp fast DNA stool mini kit (Qiagen, Germany) as described above. The quantitative PCR was performed using a 7500-Fast Real-Time PCR System (ABI, USA) and SYBR® Premix Ex Taq™ II (TaKaRa, Japan). The two-step Real-Time PCR conditions were described as follows: an initial denaturation for 10 s at 95°C, 40 cycles of denaturation at 95°C for 5 s, and annealing at 60°C for 25 s. The RQ value of P. copri was determined by normalization to the 16S rRNA gene using the 2-ΔΔCt method [15]. Primer sequences are listed in Additional file 2: Table S9 [86].
Determination of metabolome profiling of serum samples
Metabolome profiles of serum samples were determined for 38 pigs randomly selected from the validation cohort, and for seven mice from each of control, P. copri gavage, and P. copri gavage + HFD feeding groups (a total of 21 samples). Blood samples were collected from the anterior vein. After being placed into serum separator tubes, all samples were centrifuged at 2500 × g for 15 min at room temperature to isolate the serum. Serum samples were immediately stored at −80°C until use. A 100-μL aliquot of serum sample was used for the extraction of metabolites using 3 ml of pre-cooled methanol (chromatographically pure) (Merck Corp., Germany). After vortexing for 1 min and incubation at −20°C in a refrigerator for 3 h, the mixture was centrifuged at 15,000 rpm for 15 min at 4°C to precipitate the protein. Then, 200 μl of the supernatant was processed in a Speedvac overnight. The concentrated product was resuspended by the addition of 150 μl of water/methanol (85:15, v/v) and then placed into a sampling vial pending ultraperformance liquid chromatography-quadrupole time-of-flight mass spectrometry (UPLC-QTOFMS) (Waters Corp., USA). The quality control was performed via a pooled QC sample by mixing equal volumes (15 μl) of each serum sample.
Chromatographic separations were performed on a UHPLC BEH C18 column (2.1 mm × 100 mm, 1.7 μm) (Waters Corp., USA) maintained at 40°C. The injection volume was 0.4 μl for each sample, and the samples for blank-QC-tests were run alternately. The column was eluted with a linear gradient of 1–20% B at 0–3 min, 20–50% B at 3–5 min, 50–70% B at 5–10 min, 70–85% B at 10–15 min, and 85–100% B at 15–17 min followed by a re-equilibration step of 5 min. For electrospray positive ion mode (ES+) analysis, the mobile phase was water with 0.1% formic acid (A) and acetonitrile with 0.1% formic acid (B). For Negative ion mode (ES−) analysis, eluents A with water and B with acetonitrile were used. The flow rate was set at 0.3 mL/min. All the samples were kept at 8°C during the analysis.
The mass spectrometric data in both positive and negative modes were collected using an electrospray ionization source. The source parameters were set as follows: capillary voltage: 3 kV; drying gas flow: 11 L/min, and gas temperature: 350°C. Centroid data were collected from 50 to 1200 m/z with a scan time of 0.3 s and interscan delay of 0.02 s over a 20-min analysis time. MassLynx software (Waters, USA) was used for system controlling and data acquisition. Data normalization was performed by QC samples using MetNormalizer in R (v 3.5.1) that generated a data matrix containing retention time, m/z value, and normalized abundance [87]. To obtain metabolite names and the molecular formulas, we aligned the molecular mass data (m/z) of ions to the metabolites in the HMDB database (http://www.hmdb.ca) with a mass error of 10 ppm or less [88].
Associations between serum metabolites and porcine LMP phenotypic values were tested by Spearman rank correlation in the 38 experimental pigs. The analysis was performed using both 16S rRNA gene sequencing and metabolome analyses at FDR < 0.05. The correlations between the LMP-associated serum metabolites and the LMP-associated OTUs were assessed by Spearman correlation coefficients (FDR < 0.05). To further evaluate the correlation between the LMP-associated bacterial species and the LMP-associated serum metabolites in the 16 tested samples from the metagenomic sequencing, the metabolites differing in normalized abundance between high (n = 8) and low LMP pigs (n = 8) were identified by LEfSe. The online MetaboAnalyst program (http://www.metaboanalyst.ca) was used to assign the differential metabolites to KEGG pathways [89]. PERMANOVA and Spearman’s correlation analysis were performed to assess the correlations between the LMP-associated bacterial species and the LMP-associated serum metabolites as described above. The serum metabolites with different abundances between controls and gavage-treated mice were identified by LEfSe.
Quantifying serum concentrations of lipopolysaccharide, intestinal barrier permeability biomarkers, and pro-inflammatory cytokines
We quantified the concentrations of serum lipopolysaccharide (LPS), fatty acid-binding protein 2 (FABP2), zonulin, IL-1, IL-6, IFN-γ, and TNF-α using the enzyme linked immunosorbent assay (ELISA) method with commercial ELISA kits (Keshun, China) following the manufacturer’s instructions. Briefly, except for the blank control wells, 50 μl of standard samples or appropriately diluted serum samples were added into the 96-well microtiter plates coated with the primary antibodies, and then 100 μl of HRP-conjugated secondary antibodies was added to the microtiter plates and incubated for 60 min at 37°C. Microtiter plates were washed four times with washing buffer, and 50 μl of substrates A and B were added to each well of microtiter plates, mixed gently, and incubated for 15 min at 37°C under light shading conditions. Finally, 50 μl of enzymatic reaction termination solution was added to each well to stop the reaction. The O.D. value for each sample at 450 nm was measured and recorded using a microtiter plate reader (Tecan Infinite 200 pro, Switzerland). A standard curve was plotted according to the O.D. values and the concentrations of standard samples. The serum concentrations of LPS, intestinal barrier permeability plasma biomarkers, and pro-inflammatory cytokines in each test sample were determined using the standard curve. Each standard and tested serum sample was measured in triplicate. The Wilcoxon rank sum test was used to compare the serum concentrations of LPS, FABP2, zonulin, and pro-inflammatory cytokines between high and low LMP pigs at an FDR < 0.05. The multiple group comparisons of these data among experimental mice were performed by the Kruskal-Wallis test34. All these analyses were carried out using the R software (v3.5.1).
RNA extraction, sequencing, and data analysis
The mice used for confirming the causality of P. copri were further used for RNA sequencing analysis. Six mice from the group administrated with P. copri and fed standard chow diet, and the other six mice from the control group were randomly chosen. Total RNA was extracted from colon, epididymal white adipose, and muscle tissues using Trizol (ThermoFisher, USA) according to the manufacturer’s manuals. The RNA concentration and integrity were assessed using a Nanodrop-1000 spectrophotometer (ThermoFisher, USA) and a bioanalyzer-2100 (Agilent, USA). The cDNA libraries were prepared using the Illumina Truseq Stranded mRNA preparation kit (Illumina, USA) according to the manufacturer’s guidelines. The libraries were sequenced on an Illumina HiSeq 2500 platform (Illumina, USA). Raw data were trimmed for adapter sequences, and low-quality reads were filtered out to generate clean data. After that, the HISAT, StringTie and Ballgown pipelines were used to explore differentially expressed genes (DEGs) between controls and colonized mice as described previously [90]. Briefly, Hisat2 (v2.1.0) was employed to build a reference genome index, and then high-quality read sequences were aligned to the mouse reference genome assembly (GRCm38) to generate SAM files. Samtools (v1.8.0) was used for SAM file transformation and read sorting to generate sorted bam files [77]. Transcript assembly and quantification were performed using StringTie (v1.3.4). The outputs of StringTie, including gene annotation and gene abundance files, were processed by Ballgown (v3.5) to identify DEGs based on FPKM values with FDR < 0.05. DEGs were aligned to the GO database to perform gene ontology analysis [91].
Statistical analysis
Shapiro-Wilk and Levene’s tests were performed to evaluate the distribution and equality of variances of the LMP values in the tested pig populations. The significance levels of differential LMP values between two groups of pigs selected for metagenomic sequencing (8 vs. 8), and between two groups of pigs used for determining serum metabolome profiles (17 vs. 21) were determined by t-tests. All analyses were performed using R (v3.5.1). The bacterial species, function capacities of gut microbiome, and serum metabolite features showing differential abundance between high and low LMP pigs were identified using LEfse using the online version of Galaxy at a significance threshold criteria of LDA score > 2.5 and alpha value < 0.01 [92]. The associations between the relative abundances of bacterial species and serum metabolite features were analyzed with the PERMANOVA method. The correlations between the LMP-associated bacterial species and the LMP-associated functional capacities of gut microbiome, and between the LMP-associated bacterial species and the LMP-associated serum metabolites, were evaluated using Spearman’s correlation at FDR < 0.05.
For the colonization experiments using mice, we first tested the distributions of the phenotypic data in each group by the Shapiro-Wilk and Levene’s tests. The multiple group comparisons of the phenotypic values of body fat percentage and epididymal fat percentage among controls, P. copri gavage mice, and HFD + P. copri gavage mice or among P. copri gavage, P. copri gavage + HFD and P. copri gavage + high fiber diet groups were performed by TukeyHSD tests at FDR < 0.05. Differential serum metabolites between groups were identified by LEfSe at LDA score > 3.5 and alpha value < 0.01.