Efficiency of methods for improving the proportion of microbial reads
We evaluated different approaches to optimise shotgun metagenomic sequencing-based analysis of the microbiomes of milk. Seven approaches were provisionally tested to assess their suitability for extracting DNA from low volumes of material (Additional file 1: Figure S1). From these, only the DNeasy PowerSoil Pro (PS) and MolYsis complete5 (ML) kits yielded sufficient DNA concentrations. Therefore, the three methods presented in this study that were used for further evaluation with improved starting volumes were the (i) PS kit, which is not specifically designed to enrich microbial DNA or deplete host DNA, (ii) the PS approach but used in combination with the NEBNext Microbiome Enrichment kit (NEB) and (iii) ML kit, designed to decrease the amount of host DNA present.
Microbial DNA was extracted from each of six bovine and six human milk samples using the three different methods (Fig. 1). The extracted DNA was subjected to shotgun metagenomic sequencing, generating an average of 3,972,773 (± 383,371) high quality paired end reads per sample (Additional file 2: Table S1), Although the DNA extraction approach taken did not significantly impact on the total number of reads per sample, the proportions of host to microbial reads differed considerably (Fig. 2a). More specifically, the ML kit yielded a significantly higher (P < 0.05) % of microbial reads (Fig. 2b), i.e., 38.31% average (range 2.01 – 93.12%), relative to the other two methods, i.e., NEB kit 12.45% average, (range 1.03 – 41.63%) and PS kit 8.54% average (range 1.22 – 30.28%).
Differential performance of taxonomic classification algorithms on spiked samples
Bovine and human milk samples (BMS12 and HMS12) were spiked with a mock community of ten strains, present at equal proportions, to investigate the potential for the introduction of bias due to differences in extraction efficiencies from different bacteria and to inform the choice of bioinformatic classification tools. Negative extraction controls were also included to investigate the potential for kit contamination.
After sequencing and quality control of the sequences, taxonomic classification was performed using three different tools, Kaiju, Kraken2, and MetaPhlAn2. Overall, the top 20 species detected accounted for 48.7% of the species identified by Kaiju, 98.3% of the species identified by Kraken2, and 61.3% of the species identified by MetaPhlAn2 (Fig. 3a). The mock community accounted for an average relative abundance of 29.2% of the species identified by Kaiju, 96.6% of the species classified by Kraken2 and 58.1% of the species identified by MetaPhlAn2. All classifiers detected the 10 mock community species, although at varying relative abundances. In general, all three tools showed an under-representation of Bifidobacterium adolescentis (0.03 – 0.49%) and Clostridium beijerinckii (0.08 – 1.04%) in samples and an over-representation of Deinococcus radiodurans (12.29 – 38.16%) (Fig. 3a, 3b). Both Kraken2 and MetaPhlAn2 produced similar mock community composition profiles, albeit with Kraken2 assigning a much higher % of reads. Both Kaiju and Kraken2 assigned reads within the negative control samples, while MetaPhlAn2 only assigned reads from one of the three negative control samples. The taxa identified were distinct from those detected in milk samples (without spiked controls) (Additional file 3: Table S2).
When the abundances of the 10 mock community strains were investigated in greater detail, Kraken2 and MetaPhlAn2 were found to assign the strains at relative abundances closer to that expected for the mock community than Kaiju. Kraken2 identified Escherichia coli, Enterococcus faecalis, D. radiodurans, and Bacillus cereus at a higher abundance than expected, while MetaPhlAn2 appeared to over-represent both E. faecalis and D. radiodurans (Fig 3b). When the Bray-Curtis dissimilarity of all mock community samples were plotted in individual PCoA plots by taxonomic rank, the overall dissimilarity increased further down the taxonomic ranks, but communities classified with Kraken2 consistently clustered closer to the mock community from phylum to species (Fig. 3c). At the highest rank (phylum), Kraken2 clustered closest to the mock community and significant differences were observed between the Kraken2 and MetaPhlAn2 communities (P = 0.012). Ultimately, Kraken2 was selected for further downstream analysis as it performed best in terms of overall correct assignment of mock community DNA.
The spiked mock community samples were additionally compared using Kraken2-classified metagenomic reads with 16S rRNA amplicon reads. The determined community composition, using these two different sequencing approaches, was similar, with small differences in relative abundances (Fig. 3d). Additional genera, including Acinetobacter and Lactococcus were consistently detected in bovine milk samples regardless of sequencing method.
Analysis of bovine and human milk samples – community structure and taxonomic and functional profiles
The species-level composition of bovine and human milk microbiome samples as assigned by Kraken2 was compared across the different extraction methods used (Fig. 4a). In bovine milk, aliquots of the same samples that were extracted using different methods clustered closely together and, in general, generated similar profiles, albeit with slightly varying relative abundances. However, ML extraction from BM7 led to the additional assignment of reads to Lactococcus lactis, Pseudomonas alkylphenolica and Acinetobacter baumannii at low abundances (<0.1%). For sample BM8, L. lactis was detected at higher relative abundances in the sample extracted with the ML kit (compared to the PS and NEB kit).
With regard to the human milk samples (Fig. 4b), with the exception of HM7, aliquots of the same sample that were extracted by different methods did not cluster together. HM8_ML was found to be more similar in species composition to HM10_NEB and HM11_ML, reflecting the detection of Staphylococcus epidermidis in these samples. Similarly, HM9_PS clustered with HM8_NEB, HM8_PS, HM11_NEB and HM11_PS reflecting the high relative abundance of Yersinia enterocolitica detected in these samples.
Despite the taxonomic compositional differences, no significant dissimilarity was observed between the methods for either bovine (PERMANOVA: P=0.997, R2=0.02362) or human (PERMANOVA: P=0.749, R2=0.08218) samples (Fig. 4c). The overall community diversity was not impacted by any method used, implying that there was no community-level bias introduced by any of the methods.
In terms of functional profiling, the predicted microbiome function in bovine or human milk samples using the three sub-domains of gene ontology (cellular components, biological processes and molecular functions) was not significantly impacted by extraction method used (Additional file 4: Figure S2).
Characterization of the milk microbiome
Extraction of DNA from milk samples using the ML kit resulted in greater apparent species richness from an overall higher detection of unique observed species. This increase was significant (P < 0.05) among bovine milk samples (Fig. 5a). A total of 84 metagenome-assembled genomes (MAGs) were recovered across all samples, of which 19 were considered high quality, using the cut-off of > 90% completeness and < 5% contamination (Fig. 5b, Table 1). From among these 19, 11 were assembled with sequence data derived from DNA extracted using the ML kit, while only 4 high-quality MAGs were recovered from DNA extracted from each of the other two methods. The majority of the high-quality MAGs recovered were from milk samples with the spiked mock community but Corynebacterium pyruviciproducens and E. faecalis MAGs were also recovered from the HM10 sample extracted using the ML kit.
Table 1
High quality metagenome-assembled genomes from shotgun sequencing analysis.
Sample
|
Kit
|
Taxonomy
|
Completeness (%)a
|
Contamination (%)b
|
HM10_ML
|
ML
|
Corynebacterium pyruviciproducens
|
99.76
|
0.1
|
BMS12_ML
|
ML
|
Streptococcus mutans
|
99.63
|
0
|
HMS12_ML
|
ML
|
Enterococcus faecalis
|
99.63
|
0
|
HMS12_ML
|
ML
|
Deinococcus radiodurans
|
99.15
|
0.21
|
HMS12_ML
|
ML
|
Streptococcus mutans
|
98.88
|
0
|
BMS12_ML
|
ML
|
Deinococcus radiodurans
|
98.72
|
0.21
|
HM10_ML
|
ML
|
Enterococcus faecalis
|
98.5
|
0
|
BMS12_ML
|
ML
|
Bacillus cereus
|
98.3
|
0.33
|
BMS12_ML
|
ML
|
Rhodobacter sphaeroides
|
97.58
|
0.15
|
HMS12_ML
|
ML
|
Bacillus cereus
|
96.18
|
0.42
|
HMS12_ML
|
ML
|
Rhodobacter sphaeroides
|
92.05
|
0.25
|
HMS12_NEB
|
NEB
|
Deinococcus radiodurans
|
96.17
|
0.21
|
HMS12_NEB
|
NEB
|
Enterococcus faecalis
|
94.88
|
0.94
|
HMS12_NEB
|
NEB
|
Bacillus cereus
|
94.32
|
0.33
|
BMS12_NEB
|
NEB
|
Deinococcus radiodurans
|
90.92
|
0.21
|
HMS12_PS
|
PS
|
Bacillus cereus
|
97.99
|
0.33
|
HMS12_PS
|
PS
|
Streptococcus mutans
|
96.38
|
1.06
|
HMS12_PS
|
PS
|
Deinococcus radiodurans
|
95.96
|
0
|
BMS12_PS
|
PS
|
Deinococcus radiodurans
|
91.9
|
0.64
|
a Completeness: ratio of observed single-copy marker genes to total single-copy marker genes in the chosen marker gene set [46]
b Contamination: ratio of observed single-copy marker genes in ≥2 copies to total single-copy marker genes in the chosen marker gene set [46]