Mouse strains and diets
Wildtype (WT) C57BL/6J mice (from Animal Resources Centre, WA, Australia) were housed at the specific pathogen-free animal facility at QIMR Berghofer Medical Research Institute, and experiments were approved by the animal ethics committees of QIMR Berghofer Medical Research Institute, or University of Queensland, Australia.
Breeding-age females were fed either a high-fiber diet (HFD) or low-fiber diet (LFD) from three weeks prior to timed mating and until the end of the study. HFD and LFD chow, which were identical except for the carbohydrate makeup (Table S1), was purchased from Specialty feeds (Western Australia, Australia). Briefly, HFD carbohydrates were a mix of crude fiber, acid detergent fiber, cellulose, and wheat starch, while LFD carbohydrates were a calorically equivalent quantity of glucose monohydrate.
Milk collection and enrichment
Milk was collected from lactating HFD- or LFD-fed mothers at post-partum day seven. Briefly, dams were separated from their litter then injected i.p. with oxytocin (2 IU/kg). After induction of anaesthesia via injection (i.p. route) of a mixture of xylazine (10 mg/kg) and ketamine (100 mg/kg), the fur was removed and the teats and surrounding area cleaned with ethanol swabs to reduce contamination. Gentle pressure was applied to the nipple to release the milk (for microbiome samples, the areola was first punctured with a needle), which was collected into sealed serum bottles containing 30% anaerobic glycerol in PBS (for later enrichment) or without glycerol (for later metabolite and microbiome analysis) and stored at -80°C prior to further processing.
Anaerobic yeast-casitone-fatty-acids broth was prepared as described [52]. The broth was inoculated separately with milk collected from HFD- or LFD-fed mothers and incubated anaerobically at 37°C for 48 hours. Enriched cultures were centrifuged for 5 min at 7,500 g to obtain bacterial pellets, which were then washed twice with anaerobic Ringer’s solution. Final pellets were resuspended with a sufficient volume of sterile 30% (v/v) glycerol in PBS to produce an optical density of 1.0 at 600 nm and were stored at -80°C for future experiments.
Metabolite extraction and gas chromatography-mass spectrometry (GC-MS) analysis
20 µL of mouse milk was added into a 1.5 mL Eppendorf tube on ice. Metabolites were extracted by addition of 100 µl of methyl-tert-butyl-ether:methanol (1:1 v/v) containing internal standard U-13C-sorbitol (16.6 µM). Samples were then vortexed for 1 min and centrifuged for 10 min at 16,100 g, 7°C. Supernatant was then transferred to fresh 1.5 mL Eppendorf tube, and a 20 µL aliquot of the supernatant was dried under vacuum (Eppendorf Concentrator Plus) in glass pulled point inserts. A 10 µL aliquot from each sample was also pooled to generate three pooled biological quality control samples (PBQCs), each containing 20 µL of the vacuum dried mixture. Samples and PBQCs were derivatized by methoxyamination by the addition of 25 µL methoxyamine (30 mg/mL in pyridine, 2 h, 37°C, 900 rpm, Eppendorf ThermoMixer C), followed by trimethylsilylation with 25 µL BSTFA + 1% TMCS (45 min, 37°C, 900 rpm, Eppendorf ThermoMixer C).
GC-MS analysis was performed on a Shimadzu GC/MS-TQ8050 NX system. 1 µL of derivatized sample was injected into the GC inlet set at 280°C in 1:10 split mode, and chromatographically separated using an Agilent DB-5 ms capillary column (30 m × 0.25 mm × 1 µm) with helium at 1 mL/min. A 100°C starting temperature was held for 4 min, then ramped at 10°C/min to 320°C and held for 11 min. Compounds were fragmented by electron impact ionization and analyzed in MRM mode using the Shimadzu Smart Metabolites Database containing 475 MRM metabolite targets (https://www.shimadzu.com/an/gcms/metabolites/index.html). A high-quality matrix was manually curated using the Shimadzu LabSolutions Insight GCMS program (v.3.7 SP3), where metabolites not present in all samples were removed from the dataset. Compound abundances were normalized within each sample by first scaling by the percent difference of internal standard abundances between the sample and the pool average, and then subtracting the average blank abundance of each compound.
DNA extraction and sequencing
After thawing on ice, total DNA from was extracted from 100 µL of either raw or enriched milk using an adaptation of a previously described method [53]. Briefly, the thawed sample was transferred into a sterile 2 mL screw cap tube containing 0.4 g of sterile zirconia beads. After adding 600 µL of lysis buffer (50 mM EDTA, 50 mM Tris-HCl, pH 8.0, 500 mM NaCl, and 4% SDS), tubes were homogenized in a Precellys 24 Tissue Homogenizer at 4°C and 5000 rpm for 3 x 60 s with 30 s rest. The lysates were incubated for 15 min at 70°C, with mixing by inversion once every 5 min. The lysate was then centrifuged for 5 min at 13,200 g and 4°C, and the supernatant was mixed with 30 µL Proteinase K (Maxwell 16 LEV Blood DNA Kit, Promega) in a new 1.5 mL Eppendorf tube, and vortexed for 30 s. The mixture was then incubated at 56°C for 20 min, and the remaining extraction steps completed following the Maxwell 16 LEV Blood DNA cartridge protocol. After extraction, the remaining paramagnetic particles were removed by centrifugation for 2 min at 10,000 g at 4°C, and RNA was removed by incubation at 37°C for 15 min with RNase A (10 mg/mL final concentration; PureLink). DNA concentration was measured using the Qubit dsDNA BR Assay Kit (ThermoFisher Scientific), and quality was checked visually by agarose gel electrophoresis.
The V6-V8 hypervariable regions of microbial 16S rRNA genes were amplified using primers 926F (5’-AAA CTY AAA KGA ATT GRC GG-3’) and 1392wR (5’-ACG GGC GGT GWG TRC-3’) [54]. Both the reverse and forward primers were modified to include an Illumina overhang adapter at their 5’ ends to be compatible with Nextera XT indices i5 (Index 2) and i7 (Index 1), respectively, while the forward primers also contain an 8 bp unique molecular identifier (MID). PCR was performed using 2.5 µL template DNA in 1x AmpliTaq Gold 360 master mix (Applied Biosystems) with 200 µM of each primer, made up to a total volume of 20 µl with ultrapure water. PCRs which performed on a SimpliAmp 96-well Thermocycler (Applied Biosystems) using the following thermocycling conditions: 95°C for 8 min; then 35 cycles of 95°C for 20 s, 56°C for 30 s, 72°C for 45 sec min; followed by 72°C for 7 min. Amplification products were visualized by agarose gel electrophoresis, then purified using an 18% suspension of Sera-Mag Speed-beads Carboxyl Magnetic Beads (GE Healthcare), added in a ratio of 1.8 beads : 1 vol PCR product and resuspended in 25 µl water. PicoGreen dsDNA Quantification Kit (Invitrogen) was used to quantify the purified MID-barcoded amplicons, which were pooled in equimolar quantities and then subjected to dual indexing using the Nextera XT Index Kit (Illumina) according to the manufacturer’s guidelines, before purification as described above. The concentrations of the indexed amplicons were normalized and pooled. Finally, 16S rRNA gene amplicon libraries were sequenced using a MiSeq Reagent Kit v3 (Illumina- 600 cycle and Illumina- 30% PhiX Control v3) at the Queensland Brain Institute (QBI), University of Queensland.
Whole genomic DNA (gDNA) was sequenced to enable genome recovery, species identification, and metabolic reconstruction. Libraries from gDNA were prepared and indexed using IDT® for Illumina Nextera DNA Unique Dual Indexes (Illumina, Australia). Libraries were purified using an 18% suspension of Sera-Mag Speed-beads Carboxyl Magnetic Beads (GE Healthcare) as described above and sequenced by the Australian Centre for Ecogenomics (ACE) Sequencing service.
Processing of 16S rRNA gene amplicon data
The sequencing data was processed via a modified UPARSE approach [55]. Briefly: i) forward reads were quality filtered and clustered to produce an OTU table using default parameters using USEARCH (v10.0.240) [56]; ii) SILVA SSU r138 [57] taxonomy was assigned using BLASTN (v2.3.0+) [58] within QIIME2 (v2017.9) [59] and; iii) the OTU table was filtered for non-bacterial sequences using BIOM [60]. Finally, the OTU table was rarefied to 1000 sequences for each sample, and the number of observed OTUs and Shannon’s diversity index were calculated using QIIME2.
Metagenome-assembled genome (MAG) assembly
Raw reads were quality-trimmed to ≥ 10 trailing bases and an average of ≥ Q15 across a four-base window with Trimmomatic (v0.36) [61]. Trimmed reads of < 75 bases were then discarded. Cleaned reads were then assembled using four different methods in parallel (all using default parameters): individual sample assembly with metaSPAdes (v3.11.0) [62]; individual sample assembly with MEGAHIT (v1.1.2) [63]; co-assembly of replicate samples with metaSPAdes; and co-assembly of replicate samples with MEGAHIT. Metagenome-assembled genomes (MAGs) were then produced from each assembly using differential coverage binning, whereby all reads from all samples were mapped to an assembly using BWA mem (v0.7.17-r1188) [64], mapped reads were sorted using Samtools (v1.9) [65] sort, and finally the assembly’s contigs were binned based on sample-to-sample differential coverage with MetaBAT (v2.12.1) [66].
Quality statistics were evaluated for all 568 resulting MAGs using the CheckM (v1.0.7) [67] lineage workflow, and then a final set of high quality (Completeness – 4·Contamination ≥ 80%) and partial (Completeness – 4·Contamination < 80% and Contamination < 10%) MAGs were dereplicated at 99% average nucleotide identity with dRep (v2.6.2) [68], whereby final MAGs were selected from secondary clusters based on the following priority: 1) single-assembled raw milk MAGs; 2) co-assembled raw milk MAGs; 3) single-assembled enriched milk MAGs; 4) co-assembled enriched milk MAGs. Finally, GTDB r202 taxonomy [69] was assigned using the GTDB-Tk (v1.7.0) [70] de-novo workflow.
To check whether dereplicated MAGs represented the same strains in both raw and enriched communities, consensus strain variants for each MAG were called and compared between samples. Trimmed reads from each sample were mapped to a concatenated file containing all final MAGs, ensuring that all read-MAG pairings were unique. Reads were mapped with BWA mem, with off-diagonal X-dropoff of 120 and end clipping penalty of 7, and the resulting sam files were then separated by MAG. Next, genotype likelihoods were assessed with Samtools sort followed by mpileup. These were then used to call variant genomes with BCFtools call, followed by vcfutils.pl vcf2fq, and finally seqtk (v1.2-r94) seq to convert to fasta format. To compare the raw and enriched variants, sample-by-sample fastANI (v1.33) [71] distance matrices were generated for each MAG and compared using PERMANOVA.
Host contamination and community profiling
Community abundance profiles were calculated from the distribution of raw reads both between kingdoms, and within bacteria. For the kingdom-level profile, host reads were first identified and removed from each sample’s raw reads by mapping to the Mus musculus C57BL/6J NCBI reference genome (GCF_000001635.27) using Bowtie2 (v2.3.4.1) [72] with ‘sensitive-local’ settings, and then using Samtools to sort (sort -n), filter (view -f 12 -F 256), and export (fastq) unmapped reads. Raw and filtered reads were counted with grep -c “@”, and the number of host reads was calculated as the difference between raw and filtered counts.
Host-filtered reads were then used to assess MAG-associated reads using CoverM (v0.6.0) (genome -m trimmed_mean --bam-file-cache-directory), which scales MAG coverage by MAG size after excluding the 5th and 95th percentile of positional coverages for each MAG. The exported mapping files were used to filter and count MAG-associated reads from each sample using the same workflow used for host reads.
The distribution of remaining reads between remaining kingdoms was then assessed using Kraken2 (v2.1.1) [73] with ‘report’ output, where the reference database was built from NCBI nr on 28 July 2021. It was assumed that kingdom-distribution of uncharacterized reads was proportional to that of the characterized reads, and so kingdom-characterized read counts were scaled to count of total input reads. The bacterial reads measured by Kraken were counted as “unbinned” reads. These were summed with the MAG-associated reads to get total bacteria reads. No additional animalia reads were identified, and so only M. musculus is represented in the results.
Finally, the kingdom-level reads distribution profiles were calculated as the percent distribution of read counts in each sample. Similarly, the MAG relative abundance profiles were calculated as the percent distribution of trimmed mean coverage within each sample, scaled by the relative proportions of MAG-associated and unbinned reads.
MAG Richness
Assembled community richness was also used to estimate the proportion of raw milk lineages assembled into MAGs. A set of 24 single-copy bacterial marker genes were identified and clustered into OTUs from both the host-filtered reads and the MAGs using SingleM (v0.13.2) pipe. The percent of total lineages assembled into MAGs was then estimated by comparing the reads and MAGs OTUs using SingleM appraise.
MAG functional annotation
High quality, near complete MAGs were functionally annotated using the EnrichM (v0.6.3) annotate workflow. Briefly, open reading frames were identified and translated to protein sequences using Prodigal (v2.6.3) [74] in meta-mode. Protein sequences were then functionally annotated for either carbohydrate degradation, protein degradation, or general metabolic function. Carbohydrate degradation functions were identified using HMMER (v3.1b) [75] hmmsearch against the CAZy database [76]. Protein degradation and general metabolic functions were identified using Diamond (v0.9.36) [77] blastp search against the MEROPS [78] and KEGG [79] databases, respectively. The KEGG database was constructed by KO-annotation of the UniRef100 database [80]. All database alignments applied cutoffs of 1e-5 E-value, 30% identity match, 70% query alignment, and 70% reference alignment. CAZy, MEROPS, and KO gene counts from each MAG were then parsed into MAG-by-CAZy/MEROPS/KO (respectively) gene count matrices.
Key immunoregulatory metabolic pathways were defined based on the KEGG module definition structure using KO accession numbers, whereby genes that perform analogous reactions are separated by commas, and individual reactions are separated by spaces (Table S3). EnrichM classify was then used to compare the MAG-by-KO matrix to the pathway definitions, and a MAG-by-pathway presence/absence matrix was built whereby a pathway was present if the MAG could perform at least 80% of the reactions while missing no more than two. This allowed for the possibility of unassembled genes given the 80% quality score cutoff for “complete” MAGs.
Cholesterol degradation to coprostanol, associated with the ismA gene, has no KEGG representative and contains a highly conserved active site. As such, a graftM (v0.13.1) [81] package was constructed from the IsmA confirmed, probable, and negative protein sequences reported by Kenny et al. [21] using Clustal Omega (v1.2.4) [82], HMMER hmmbuild, and graftM create. GraftM graft then hmmsearched, aligned, and grafted MAG protein sequences to the IsmA tree, filtering to ≥ 30 aligned residues and 1e-5 E-value.
Statistics and Data Visualization
Metabolite Log2 Fold Change (L2FC) was calculated as log2 of the average HFD divided by the average LFD metabolite abundances. P-values were generated via 2-way ANOVA with R (v4.1.2) stats using anova. Significant (p < 0.05) metabolites with L2FC ≥ 0.5 were then plotted with ggplot2 (v3.3.6) [83] using geom_bar, coord_flip, and theme_ipsum.
The 16S rRNA gene amplicon and MAG community relative abundance profiles were Hellinger-transformed, and then visualized as heatmaps in R with pheatmap (v1.0.12) [84]. Gene count and trait presence/absence profiles were also visualized with pheatmap using the row and column cluster features. MAG dendrograms were retained, while genes and traits were re-grouped by classification.
Reads distribution and alpha diversity metrics were plotted with ggplot2 using geom_bar, geom_errorbar, and theme_ipsum. Error bars represent 95% confidence intervals, calculated as the lower 2.5% probability tail t-stat (qt from stats) multiplied by the standard error (se in sciplot (v1.2-0) [85]). Post-hoc group labelling was generated by applying Tukey’s HSD adjustment with aov from stats.
Community beta diversities and functional distributions were assessed with vegan (v2.6-2) [86] using rda and cca after confirming significant group variances via PERMANOVA with adonis2 using Euclidian distances of either Hellinger transformed (MAGs and traits) or z-score standardized (genes) abundances. The primary and secondary axis site and species scores were then plotted with ggplot2 using geom_point and stat_ellipse. The Mantel test was used to determine whether two sample sets were significantly similar using mantel from vegan on the Euclidian distance matrices of each sample set, generated with vegdist. Significant indicator features were identified by fitting the normalized abundance matrices of those features to the given ordination using envfit from vegan. Z-score standardized abundances were used for indicator metabolites. The features were then added to the ordination plots with ggplot2 with geom_point.