Stool samples.
Fecal samples from nine healthy volunteers and nine patients with
Clostridium difficile infection (CDI) were provided by a certified testing laboratory in France and tested for
Clostridium difficile toxins. Upon receipt, each fecal sample was freshly aliquoted into 24 tubes (8 protocols × 3 replicates) and frozen at -80 °C until extraction; storage at -80 °C is known to keep the microbial community stable over long periods [
68].
Microbial mock community.
The microbial mock community was prepared by mixing nine bacteria (
Table 2), including four easy-to-lyse Gram-negative bacteria (
Pseudomonas aeruginosa,
Escherichia coli,
Salmonella enterica and Rhizobium radiobacter) and five harder-to-lyse Gram-positive bacteria (
Lactobacillus fermentum,
Enterococcus faecalis,
Staphylococcus aureus,
Listeria innocua and
Bacillus subtilis). Bacterial cells were obtained from ATCC and cultivated according to ATCC’s recommendations. The number of viable cells was estimated by plate counting. The mock community was prepared by mixing between 2.7 × 10⁷ and 3.6 × 10⁸ cells of nine bacteria and stored at -80 °C until extraction.
DNA extraction.
Four commercial protocols were compared in this study, according to the manufacturers’ recommendations: the NucleoSpin Soil kit (#740780.50, protocol May 2016/Rev. 06, Macherey-Nagel), the DNeasy PowerLyzer PowerSoil Kit (#12855-100, protocol 07272016, QIAGEN), the QIAamp Fast DNA Stool kit (#51604, QIAGEN, protocol modified from [
36]) and the ZymoBIOMICS DNA Mini kit (#D4300, protocol 1.1.0, ZymoResearch). These protocols were also tested in combination with a stool preprocessing device (SPD, #421061, bioMérieux, [
51]). This device was designed to facilitate and standardize fecal sample preparation before nucleic acid extraction. It includes a spoon for a 200 mg calibrated sample and a vial containing a buffer for sample resuspension, glass beads for homogenization and two filters for retaining fecal debris. After 5 minutes of hands-on time, the filtrate is ready to use for downstream DNA extraction. The extraction protocols and the SPD procedure are detailed in
Supplementary Methods. DNA was extracted in triplicate from the fecal samples and from the mock community. The A260/A280 ratio was assessed using the DropSense 96 system (Trinean). Genomic DNA size was assessed using the Genomic DNA ScreenTape (#5067–5364, Agilent) on the 2200 TapeStation system (Agilent). DNA concentrations were estimated using the QuantiFluor One dsDNA kit (#E4870, Promega) with the GloMax system (Promega).
16S rRNA gene library preparation and sequencing.
16S rRNA gene libraries were prepared according to Illumina’s protocol (# 15044223 RevB, [
69]). To minimize the risk of cross-contamination and pipetting errors, the workflow was automated using a high-throughput liquid handler, the Freedom EVO NGS workstation (TECAN) [
Briefly, the V3-V4 hypervariable regions were first amplified from 12.5 ng of genomic DNA with 2x KAPA HiFi HotStart ReadyMix (Kapa Biosystems) and the following primers: (i) forward primer, TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGAGGCAGCAG; (ii) reverse primer, GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTWTCTAAT. PCR cycle conditions were 95 °C for 3 minutes, 25 cycles of (95 °C for 30 seconds, 55 °C for 30 seconds, 72 °C for 30 seconds), then a final extension at 72 °C for 5 minutes. The libraries were purified using AMPure XP beads (Beckman Coulter). Dual indexes and sequencing adapters from the Illumina Nextera XT index kits (Illumina) were added in a second PCR using 2x KAPA HiFi HotStart ReadyMix (Kapa Biosystems). Cycle conditions were 95 °C for 3 minutes, 8 cycles of (95 °C for 30 seconds, 55 °C for 30 seconds, 72 °C for 30 seconds), then a final extension at 72 °C for 5 minutes. Ready-to-sequence libraries were purified using AMPure XP beads (Beckman Coulter) and quantified by fluorescence using the QuantiFluor One dsDNA kit (# E4870, Promega) with the GloMax system (Promega). Quality control was performed using a 2200 TapeStation system with the DNA 1000 ScreenTape (# 5067–5582, Agilent). The library pool was quantified by qPCR with the KAPA Library Quantification Kit for Illumina platforms (Kapa Biosystems). Sequencing was performed on a MiSeq system (Illumina) with the MiSeq Reagent v3 kit (600 cycles) in 2 × 300 bp mode.
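To make the structure of the two amplification primers explicit, the following Python sketch splits each primer into its Illumina overhang adapter and its locus-specific part, and enumerates the concrete sequences encoded by the degenerate bases (H, V, W) of the reverse primer. It is an illustration only and was not part of the workflow; the function names are hypothetical, and the overhang sequences are assumed to be the standard ones given in Illumina's 16S library preparation protocol.

```python
from itertools import product

# IUPAC codes for the bases that occur in the two primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "W": "AT",   # weak pair
}

# Assumed Illumina overhang adapters (shared 5' prefix of each primer).
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Full primers as reported in the text.
FWD_PRIMER = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGAGGCAGCAG"
REV_PRIMER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTWTCTAAT"


def locus_specific(primer, overhang):
    """Return the 16S-targeting part of a primer, i.e. the primer minus its overhang."""
    if not primer.startswith(overhang):
        raise ValueError("primer does not start with the expected overhang")
    return primer[len(overhang):]


def expand_degenerate(seq):
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


fwd_locus = locus_specific(FWD_PRIMER, FWD_OVERHANG)
rev_locus = locus_specific(REV_PRIMER, REV_OVERHANG)
rev_variants = expand_degenerate(rev_locus)

print("forward locus-specific part:", fwd_locus)
print("reverse locus-specific part:", rev_locus)
print("number of reverse primer variants:", len(rev_variants))
```

The forward locus-specific part contains no degenerate base, whereas the reverse part encodes 3 × 3 × 2 = 18 distinct sequences.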
Shotgun metagenomic library preparation and sequencing.
Shotgun metagenomic sequencing (SMS) libraries were prepared using the Nextera XT DNA Library Preparation Kit (# FC-131-1096, Illumina), following Illumina’s instructions (protocol # 15031942 v03 February 2018). Briefly, 1 ng of genomic DNA was used for the tagmentation reaction in a total volume of 20 µl. After 5 minutes at 55 °C, the reaction was stopped by adding 5 µl of the Neutralize Tagment (NT) Buffer. A limited-cycle PCR amplification was then performed to amplify the tagmented DNA (addition of 15 µl of Nextera PCR Master Mix (NPM)) and to add Illumina sequencing adapters (addition of 5 µl each of the Index 1 and Index 2 primers from the Nextera XT index kit, Illumina) in a total volume of 50 µl. The following PCR cycle program was used: 72 °C for 3 minutes, 95 °C for 30 seconds, 12 cycles of (95 °C for 10 seconds, 55 °C for 30 seconds, 72 °C for 30 seconds), 72 °C for 5 minutes. SMS libraries were quantified using the QuantiFluor One dsDNA kit (# E4870, Promega) with the GloMax system (Promega). The quality of the libraries was assessed using the High Sensitivity DNA kit on the Agilent 2100 Bioanalyzer. Sequencing was performed on a NextSeq500 system (Illumina) with the NextSeq 500/550 High Output v2 kit (300 cycles) in 2 × 150 bp mode.
16S rRNA gene profiling.
After quality control with FastQC (v0.11.3), overlapping paired-end reads were merged with PEAR (v0.9.10). Quality trimming and filtering of amplicons were performed with Trimmomatic (v0.36) and SGA (v0.9.9). The following parameters were used: a maximum of 20 low-quality base calls in the whole sequence, no ambiguous bases (N), a minimum Phred quality score of 15 over a 4 bp sliding window, a minimum average quality score of 25, and a minimum length of 100 bp after trimming. PCR amplification can generate artefactual chimeric sequences derived from multiple parent sequences; these were removed using the UCHIME
de novo algorithm, which is integrated in the usearch6.1 QIIME pipeline. Sequences were grouped into operational taxonomic units (OTUs) based on a 97% identity threshold using usearch6.1 through the “pick_open_reference_otus.py” QIIME script. OTUs recruiting less than 0.005% of the total number of sequences were filtered out, as recommended by [
71]. Taxonomic annotation was performed in QIIME using the RDP classifier trained on the SILVA rRNA reference database (v123).
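The low-abundance OTU filter described above can be sketched as follows. This is a minimal Python illustration, not the QIIME implementation used in the study; the function name and the example counts are hypothetical, and the threshold is assumed to apply to each OTU's total count across all samples. Exact rational arithmetic is used so that an OTU sitting exactly at 0.005% is handled deterministically.

```python
from fractions import Fraction

# 0.005% expressed as an exact fraction (5 reads per 100,000).
MIN_FRACTION = Fraction(5, 100_000)


def filter_low_abundance_otus(otu_totals, min_fraction=MIN_FRACTION):
    """Keep OTUs whose total count is at least min_fraction of all sequences.

    otu_totals maps an OTU identifier to its read count summed over samples.
    OTUs recruiting less than the threshold are dropped.
    """
    grand_total = sum(otu_totals.values())
    threshold = min_fraction * grand_total
    return {otu: n for otu, n in otu_totals.items() if n >= threshold}


# Hypothetical table with 1,000,000 reads in total: the threshold is 50 reads.
example = {"OTU_1": 999_901, "OTU_2": 50, "OTU_3": 49}
kept = filter_low_abundance_otus(example)
print(sorted(kept))
```

In this example OTU_3 falls below the threshold and is removed, while OTU_2, which sits exactly at 0.005%, is retained.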
Shotgun metagenomic profiling.
Reads were trimmed and filtered based on the sequence quality and length, exactly as for the 16S rRNA gene data analysis. Reads were annotated using the Centrifuge software [
72] against the NCBI RefSeq genome database (v.2018, complete genomes and scaffolds [
73]).
Statistical analysis.
Annotated tables were normalized by a “total count” method (at the OTU level for 16S rRNA gene sequencing, at the species level for SMS). All subsequent analyses were performed in R (version 3.3.1). Repeatability was assessed by calculating, for every patient, the coefficient of variation of each bacterium present in all replicates of a condition.
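As an illustration of the normalization and repeatability steps, the following Python sketch converts raw counts to relative abundances and computes a coefficient of variation per taxon across replicates. The analyses themselves were run in R; the function names and example counts here are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def total_count_normalize(counts):
    """Divide each taxon count by the sample's total count."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def repeatability_cv(replicates):
    """Coefficient of variation (SD / mean) of each taxon across replicates.

    replicates is a list of {taxon: raw count} dicts for one condition and
    one patient. Only taxa detected (count > 0) in every replicate are kept.
    """
    normalized = [total_count_normalize(r) for r in replicates]
    detected = [{t for t, v in n.items() if v > 0} for n in normalized]
    shared = set.intersection(*detected)
    return {
        taxon: stdev([n[taxon] for n in normalized]) / mean([n[taxon] for n in normalized])
        for taxon in sorted(shared)
    }


# Hypothetical triplicate: taxon C is missing or zero in some replicates.
triplicate = [
    {"A": 10, "B": 90},
    {"A": 20, "B": 80},
    {"A": 30, "B": 70, "C": 0},
]
cv = repeatability_cv(triplicate)
for taxon, value in cv.items():
    print(taxon, round(value, 3))
```

Taxon C is excluded because it is not detected in all three replicates, which mirrors the restriction to bacteria present in all replicates of a condition.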
Alpha-diversity (Shannon indices) was calculated for each sample using the vegan package. The accuracy of the protocols was evaluated on the mock community sample by calculating the Euclidean distance between expected and predicted abundances (log2), using the “philentropy” R package. Differentially abundant bacteria between protocols with or without the SPD were identified using the DESeq2 package. For each criterion (except for
alpha-diversity), the statistical significance of the differences between protocols was computed with a pairwise Wilcoxon rank test. For multiple comparisons,
P-values were corrected using the Benjamini-Yekutieli procedure, and adjusted
P-values below 0.05 were considered statistically significant. The
alpha-diversity values varied greatly from one patient to another, so the patient effect was controlled in a linear model using the “limma” package, and statistics were computed with the empirical Bayes method.
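Three of the quantities used above can be written out explicitly. The Python sketch below is illustrative only (the study used the vegan and philentropy R packages, and function names here are hypothetical). It assumes the natural logarithm for the Shannon index, strictly positive abundances for the log2 transformation (no pseudocount), and the step-up form of the Benjamini-Yekutieli adjustment.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def log2_euclidean_distance(expected, observed):
    """Euclidean distance between log2-transformed abundance vectors.

    Assumes every abundance is strictly positive.
    """
    return math.sqrt(
        sum((math.log2(e) - math.log2(o)) ** 2 for e, o in zip(expected, observed))
    )


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted P-values, returned in the input order."""
    m = len(pvalues)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic correction factor
    # Walk from the largest P-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_yekutieli([0.01, 0.04, 0.03])
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

The adjustment multiplies each ranked P-value by m·c(m)/rank, with c(m) the m-th harmonic number, which makes it more conservative than the Benjamini-Hochberg procedure and valid under arbitrary dependence between tests.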