Experimental design and sample collection
Animal care and experimental procedures were carried out following national and institutional guidelines for the Good Experimental Practices and were approved by the IRTA Ethical Committee. A total of 648 healthy Duroc pigs from the same commercial genetic line, at two different ages, and allocated in two distinct farms were employed (Figure 1). Faecal samples were collected at 60 ± 8 days of age from 400 weaned piglets (201 males and 199 females) distributed across six batches on commercial farm conditions. Refer to (Ballester et al. 2020) and (Ramayo et al. 2021) for more information about animal management and data collection.
The remaining 248 pigs (119 castrated males and 129 females) were reared during 2017 and 2019 at IRTA experimental farm in Monells (Girona, Spain). Faecal samples were taken at 190 ± 10 days of age when pigs were fed a finishing standard diet. For 134 out of the 248 pigs, BW (kg) and average daily feed intake (kg) were recorded using both a weighing scale and electronic feeding stations during the experimental period (INSENTEC). These comprehensive datasets were employed to calculate Average Daily Gain (ADG), Feed Conversion Ratio (FCR), and Residual Feed Intake (RFI), computed from a model after adjusted by the metabolic weight, ADG, and the backfat thickness at the end of the period. Moreover, 96 out of the 248 growing pigs, comprising 48 castrated males and 48 females, were allocated into eight pens of 12 animals (six castrated males and six females per pen). Five out of the 96 pigs were excluded, while the remaining 91 were distributed across two treatments.: T1 – control group (N=44), involving pigs not mixed during the growing-finishing period; and T2 – stress group (N=47), inducing social stress by mixing the pigs three times during the growing-finishing period. Cortisol (CORT) levels in hair samples of these pigs were analysed at the end of the experiment as follow: 150 mg of hair was weighted from each hair sample and placed into a 50-ml conical tube. After three washes with 3 ml of isopropanol, all samples were left to dry in an extractor hood during 12 h. Dried hair samples were cut into 2–3 mm pieces using scissors, and 50 mg were transferred into 2 ml eppendorf. One ml of methanol was added to each sample and incubated 18 h at 37 °C under moderate shaking (100 rpm). After incubation, extracted samples were centrifuged at 7000g for 2 min and 700 µl of supernatant was transferred to a new 1.5 ml tube. The supernatant was placed into a speed vac for 2 h to evaporate the methanol. The dried extracts were stored at −20 °C until analysis. Total concentrations of CORT were measured by ELISA kit (Cusabio Technology LLC., Bionova, Spain) with dried samples reconstituted with 210 µl of sample diluent. Samples were quantified by reference to standard curves constructed with known concentrations of pig cortisol dilutions of the Standard. Absorbance was read at 450 nm using an ELISA plate reader (Bio-Rad) and analysed using the Microplate manager 5.2.1 software (Bio-Rad).
Microbial DNA extraction, sequencing, and bioinformatics analysis
DNA was extracted from the fecal samples with the DNeasy PowerSoil Kit (QIAGEN, Hilden, Germany) following manufacturer’s instructions. Extracted DNA was sent to the Keck Center at the University of Illinois for Fluidigm sample preparation and paired-end (2 × 250 nt) sequencing on an Illumina NovaSeq (Illumina, San Diego, CA, USA). The 16S rRNA gene fragment was amplified using the primers V3_F357_N: 5'-CCTACGGGNGGCWGCAG-3' and V4_R805: 5'-GACTACHVGGGTATCTAATCC-3'. Sequences were analysed with Qiime2 (Bolyen et al. 2019). Barcode sequences, primers and low-quality reads (Phred scores of < 30) were removed. The quality control also trimmed sequences based on the expected amplicon length and removed chimaeras. Afterwards, sequences were processed into Amplicon Sequences Variants (ASVs) at 99% of identity. ASVs present in less than two samples and representing less than 0.0001% of the total counts were filtered out. Samples with less than 10,000 reads were also excluded. ASVs were classified to the lowest possible taxonomic level based on a primer-specific trained version of GreenGenes2 database (released October 2022) (McDonald et al. 2023).
Enterosignatures identification
Following Frioux et al (2023), to evaluate the existence of enterosignatures in pigs, a non-negative matrix factorization (NMF) approach was applied to the normalized genera relative abundance table. NMF decomposed the initial scaled table to two new ones: W, defined as the weight of the genera in ESs, and the second one, H, defined as the existence of ESs across samples (Supplementary Figure 1). Particularly, W represents the contribution of the genera (n in rows) to the resulted ESs (k in columns) when normalized by the columns. The optimal number of ESs was evaluated by performing a nine-fold bicross validation (Owen and Perry 2009) as it was suggested in Frioux et al (2023). For each of the 9-folds, the model uses 8/9 of the folds as the training set, while the performance was validated on the remaining data set (1/9). The process was repeated 200 times for a k number of ESs ranging from 2 to 30, each time the matrix was randomly shuffled. The optimal k of pig ES was determined by tracking the derived evaluation metrics corresponding to each ES, such as reconstruction error, explained variance, and cosine similarity. More specifically, the best k was considered once the gain of the previous metrics used to assess the quality of the decomposition, does not change. Pigs ES computation was performed by applying NMF with Python 3.9, SciPy v1.7.3. and Scikit-Learn v0.24.1 for 100 runs, a maximum number of 2000, random initialization, the multiplicative update solver and Kullback-Leibler divergence as a beta-loss function. Additionally, an L1 regularization parameter equal to 1.0 was selected and applied to both H and W matrices.
Enterosignature analyses
After reducing the putative number of clusters, a second step of the ESs evaluation was done using the enterosignatures_nmf.py script (https://gitlab.inria.fr/cfrioux/enterosignature-paper) for each of the two data set (400 and 248 samples). The script uses the previously estimated k optimal ESs as hyperparameter. In addition to this evaluation, we run the Python script for a range of k [2-10], keeping the output evaluation scores to define the optimal k. During each run, the NMF algorithm implemented the decomposition of the initial relative abundance of genera matrix, resulting in the two new complementary matrices (H and W). The driven genus in each k-ESs was identified based on the higher contribution value across all the genera, while the assignment probability of genera to ESs is retrieved by normalizing the W matrix row-wise. Thus, the ESs are assigned to each sample based on the higher relative abundance value obtained by the normalized H matrix. The same H matrix is used to define the number and set of ESs needed to explain most of variance in relation to host covariates or phenotypes (e.g. age, sex, BW, ADG, RFI). The accumulated relative abundance is calculated for a k of ESs in each sample. Then, we applied the same cut-off of 90% used in Frioux et al. (2023) to filter out k of ESs below this threshold.
Determinants of pig ES and association with host performance
We aim to explore the determinants influencing pig ES assembly, considering both host-specific and environmental factors, and examine associations between pig ES and host productive traits. Our dataset comprises 648 samples gathered at various ages (N=400 at 60 days vs N=248 at 190 days of life), from different farms (N=400 commercial vs N=248 experimental facilities), and from diverse experimental setups. Consequently, metadata information varies between the samples. To address this, we conducted separate analyses for the 400 and 248 samples based on the available information as outlined below:
1. The composition of ESs in the 400 piglets was first studied in two discrete traits: batch and sex effects. For the batch category, there were six different groups with number of samples 57, 68, 73, 71, 72, and 59. The sex category contains 201 male piglets and 199 females. For this type of traits, pie plots were generated to demonstrate the minimum number of ESs needed to explain most of the variance within each category, as well as the ES distribution across each category. Specifically, for each sample, the variance explained by the ES was considered as the corresponding abundance value. Then, the cumulative sum of the explained variance was calculated gradually, keeping the minimum number of ESs that explain 100% of the variance. The frequency of specific ES that appeared in the corresponding selected number was also calculated. Furthermore, the dynamics and association of ES abundance with age and BW were also investigated across the 400 samples using ggplot (Wickham, 2016) and geom_smooth() option with method “lm” to add a regression line, where “lm” stands for a classical linear model.
2. On the other hand, concerning the 248 piglets raised under experimental conditions, our initial focus was on examining the diversity of ESs across sex, encompassing 119 males and 129 females. Subsequently, we conducted two independent analyses to explore the progression of pig ESs based on quantitative traits:
2.1 The dataset comprising 134 samples was utilized to investigate the relationship between pig ES with ADG and feed efficiency traits (FCR, RFI).
2.2 The impact of stress challenge on pig ES assignment was evaluated based on the dataset consisting of 91 samples (47 stress vs. 44 control). Additionally, this dataset was employed to assess the association between ES abundance and levels of hair cortisol.
Genotyping, and estimation of genetic parameters
After evaluating host-specific and environmental factors affecting pig ES, the potential genetic impact of the host was investigated using the 400 samples collected at 60 ± 8 days of age. The utilization of this dataset with a larger sample size than the growing pigs dataset is expected to improve the power and the accuracy of estimated genetic parameters, including heritability (h2) and the genetic correlation between pig ES and BW at the time of sampling (60 days). The Porcine 70 k GGP Porcine HD Array (Illumina, San Diego, CA) was used to genotype the 400 animals. The quality control excluded SNPs with minor allele frequencies < 5%, rates of missing genotypes above 10%, and SNPs that did not map to the porcine reference genome (Sscrofa11.1 assembly). The h2 of ES was estimated by applying Reproducing Kernel Hilbert Space (RKHS, Gianola et al. 2006) from the BGLR package (Pérez and de Los Campos, 2014) separately for each of them using the following model: