5.1. Study cohort, drug, and specimen
Subjects were recruited from 17 centers in the United States and Canada from 10 December 2014 through 13 November 2015. Subjects were adults with recurrent CDI who either i) had at least two recurrences after a primary episode (three CDI episodes in total) and had completed at least two rounds of oral antibiotic therapy or ii) had at least two episodes of severe CDI resulting in hospitalization. They were randomly assigned to one of three treatment groups: placebo, a single dose of RBX2660, or two doses of RBX2660. All treatments were blinded and delivered by enema. The second dose was administered approximately 7 days after the first dose. For patients who received two RBX2660 doses, donor selection was random and not constrained to provide a single representative donor per patient.
The selection and screening of donors for RBX2660 were performed as previously described [27,28]. The placebo was composed of normal saline and formulation solution, including cryoprotectant, in the same proportions used for the RBX2660 preparation. RBX2660 and placebo were stored frozen after preparation until administration. They were thawed for 24 hours in a refrigerator and administered within 48 hours of thawing. AROs were isolated from patient stools and RBX2660 products by culture at 35°C in air on selective agar plates: chromID VRE (bioMerieux, Marcy-l’Etoile, France), MacConkey with Cefotaxime (Hardy Diagnostics, Santa Maria, CA), MacConkey with Ciprofloxacin (Hardy Diagnostics), and HardyCHROM™ ESBL (Hardy Diagnostics). The remaining stools were stored frozen at −80°C until metagenomic DNA extraction. Isolate colonies were sub-cultured to trypticase soy agar with 5% sheep blood (Becton Dickinson, Franklin Lakes, NJ) and identified using the VITEK MS matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) system [60,61]. Each isolate was frozen in tryptic soy broth with glycerol at −80°C.
5.2. Antibiotic susceptibility testing
Antibiotic susceptibility testing was performed by Kirby-Bauer disk diffusion, and the resulting zone sizes were interpreted according to the M100 document from the Clinical and Laboratory Standards Institute.
5.3. DNA extraction and sequencing
Metagenomic DNA was extracted from approximately 100 mg of stool using the DNeasy PowerSoil Kit (Qiagen) following the manufacturer’s protocol, except for the stool lysis step: stool samples were lysed by 2 rounds of bead beating for 2 min each (4 min in total) at 2,500 oscillations/min using a Mini-Beadbeater-24 (Biospec Products). Samples were chilled on ice for 2 min between the two bead beating rounds. Extracted DNA was quantified using a Qubit fluorometer dsDNA HS Assay (Invitrogen) and stored at −20°C until library preparation. Metagenomic DNA was diluted to 0.5 ng/μL before preparing the sequencing library. Libraries were prepared using the Nextera DNA Library Prep Kit (Illumina) as previously described. The libraries were then purified with the Agencourt AMPure XP system (Beckman Coulter) and quantified with the Quant-iT PicoGreen dsDNA Assay Kit (Invitrogen) before sequencing. Approximately 70 libraries were pooled in an equimolar manner at a final concentration of 5 nM for each sequencing lane. Prepared pools were submitted for 2 × 150 bp paired-end sequencing on an Illumina NextSeq High-Output platform at the Center for Genome Sciences and Systems Biology at Washington University in St. Louis with a target sequencing depth of approximately 5.5 million reads per sample.
Isolate genomic DNA was extracted using the QIAamp BiOstic Bacteremia DNA Kit (Qiagen). Libraries for whole genome sequencing of isolates were prepared from diluted genomic DNA (0.5 ng/μL) as described above. About 180 libraries were pooled in an equimolar manner at a final concentration of 5 nM for each sequencing lane. Prepared pools were submitted for 2 × 150 bp paired-end sequencing on an Illumina NextSeq High-Output platform at the Center for Genome Sciences and Systems Biology at Washington University in St. Louis with a target sequencing depth of approximately 2 million reads per sample.
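The dilution and pooling figures above reduce to simple arithmetic, sketched below in Python. This is an illustration only: the function names, the example stock concentration (12.5 ng/μL), and the 50 μL working volume are assumptions and do not come from the protocol. Only the 0.5 ng/μL target, the library counts per lane, and the target depths are taken from the text.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (DNA volume, diluent volume) in microliters using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


def reads_per_lane(n_libraries, target_reads_per_sample):
    """Total reads a lane must deliver for every library to reach its target depth."""
    return n_libraries * target_reads_per_sample


# Hypothetical sample: Qubit reading of 12.5 ng/uL, diluted to 0.5 ng/uL in 50 uL.
dna_ul, diluent_ul = dilution_volumes(12.5, 0.5, 50.0)

# Lane budgets implied by the pooling scheme described in the text.
metagenome_lane_reads = reads_per_lane(70, 5.5e6)   # ~70 stool libraries per lane
isolate_lane_reads = reads_per_lane(180, 2.0e6)     # ~180 isolate libraries per lane
```

Under these numbers, a metagenomic lane needs roughly 385 million reads and an isolate lane roughly 360 million reads.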
5.4. Data processing and genome assembly
Sequence reads were binned by index sequence. Adapter and index sequences were trimmed using Trimmomatic v.0.38 with the following parameters: java -Xms2048m -Xmx2048m -jar trimmomatic-0.38.jar PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:1:true SLIDINGWINDOW:4:15 LEADING:10 TRAILING:10 MINLEN:60. Human sequence contamination was removed using DeconSeq, and the quality of the resulting reads was verified with FastQC (https://github.com/s-andrews/FastQC).
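The trimming parameters above omit the input and output files that Trimmomatic's paired-end mode expects (two inputs followed by paired and unpaired outputs for each mate). A minimal sketch of how the full command could be assembled per sample is shown below; the function name and all file names are hypothetical, and only the Java options and trimming steps are taken from the text.

```python
from typing import List


def trimmomatic_pe_command(r1: str, r2: str, out_prefix: str,
                           jar: str = "trimmomatic-0.38.jar",
                           adapters: str = "NexteraPE-PE.fa") -> List[str]:
    """Build the argument list for one paired-end Trimmomatic run."""
    return [
        "java", "-Xms2048m", "-Xmx2048m", "-jar", jar, "PE", "-phred33",
        r1, r2,
        # Output naming is an assumption made for this example.
        f"{out_prefix}_R1.paired.fastq.gz", f"{out_prefix}_R1.unpaired.fastq.gz",
        f"{out_prefix}_R2.paired.fastq.gz", f"{out_prefix}_R2.unpaired.fastq.gz",
        # Trimming steps in the order given in the text; Trimmomatic applies them in order.
        f"ILLUMINACLIP:{adapters}:2:30:10:1:true",
        "SLIDINGWINDOW:4:15", "LEADING:10", "TRAILING:10", "MINLEN:60",
    ]


# Hypothetical sample files.
cmd = trimmomatic_pe_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java and the Trimmomatic jar are available.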
Isolate genomes were assembled, assessed, and annotated using SPAdes, QUAST, and Prokka, respectively. Average nucleotide identity between E. coli and VRE isolate pairs was calculated using dnadiff. Within-species pan genomes and core genome alignments were obtained with Roary using default parameters, with 24 and 4 NCBI reference strains (Supplementary Table 5) for E. coli and VRE, respectively, and with Escherichia fergusonii and Enterococcus faecalis added as the respective outgroups. Phylogenetic trees were built from the alignments with FastTree and visualized in iTOL v4.
5.5. Microbiome composition and comparison
Microbiome taxonomic composition was predicted by MetaPhlAn v2.0 and expressed as relative abundance. Genus-level composition plots were obtained by grouping genera present in less than 50% of samples as “Other.” The DS00 pseudo-donor microbiome was obtained by averaging the species-level taxonomic profiles of all donor microbiomes. Bray-Curtis distances were calculated using the vegan package and visualized as PCoA plots via the ape package in R 3.5.3. LEfSe was used to identify baseline taxonomic and metabolic features distinguishing transplanted and non-transplanted patients (alpha value for the factorial Kruskal-Wallis test = 0.05, threshold on the logarithmic LDA score = 2). HUMAnN2 was employed for metabolic pathway prediction.
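To make the pseudo-donor averaging and the Bray-Curtis calculation concrete, the following is a minimal pure-Python sketch. It is not the vegan implementation used in the study; the function names and the toy profiles are invented for illustration, and taxa missing from a profile are assumed to have zero abundance.

```python
from typing import Dict, List

Profile = Dict[str, float]  # taxon name -> relative abundance


def mean_profile(profiles: List[Profile]) -> Profile:
    """Average several profiles taxon by taxon (absent taxa count as 0)."""
    if not profiles:
        raise ValueError("at least one profile is required")
    taxa = set().union(*profiles)
    return {t: sum(p.get(t, 0.0) for p in profiles) / len(profiles)
            for t in sorted(taxa)}


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Toy data, not study data: two donor profiles and one patient profile.
donors = [
    {"Bacteroides": 0.6, "Faecalibacterium": 0.4},
    {"Bacteroides": 0.2, "Escherichia": 0.8},
]
patient = {"Escherichia": 1.0}

pseudo_donor = mean_profile(donors)           # analogous to the DS00 profile
distance = bray_curtis(pseudo_donor, patient)
```

A distance of 0 means two profiles are identical, and 1 means they share no taxa.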
5.6. Resistome identification and random forest classifier
ARGs in the microbiome were identified using ShortBRED with CARD. Isolate ARGs were identified with RGI and CARD [54,78]. The resulting genes were manually curated into more general ARG families (n = 64). A subset of 70% of the available resistomes was then used to train a random forest classifier distinguishing patient baseline and donor stool resistomes (training set n = 103), which was then tested on the remaining samples (test set n = 45). The random forest classifier was built with the scikit-learn package (https://scikit-learn.org/stable/index.html) in Python 3.7.3, with trees averaging 12 nodes and a maximum depth of 4.
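A classifier of this shape might be set up roughly as follows with scikit-learn. This sketch runs on random synthetic data, so its accuracy is meaningless; the class balance (100 patient baseline, 48 donor), the number of trees, and the random seeds are assumptions made for the example. Only the 64 features, the 103/45 split, and the maximum depth of 4 come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the resistome table: 148 samples x 64 ARG families.
X = rng.random((148, 64))
# Hypothetical labels: 1 = patient baseline, 0 = donor (class sizes are assumed).
y = np.array([1] * 100 + [0] * 48)

# Hold out 45 samples (about 30%), keeping class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=45, stratify=y, random_state=0
)

# n_estimators is an assumption; max_depth=4 follows the text.
clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
mean_nodes = np.mean([tree.tree_.node_count for tree in clf.estimators_])
max_depth_seen = max(tree.get_depth() for tree in clf.estimators_)
```

On real resistome data, `clf.feature_importances_` would indicate which ARG families contribute most to separating the two groups.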
5.7. ARO tracking and SNP calling
SNPs were called using Bowtie2, SAMtools, and BCFtools, with the first isolate from the patient or donor used as the reference genome. Reads from subsequent isolates of the same species were aligned against the reference with Bowtie2 (-X 2000 --no-mixed --very-sensitive --n-ceil 0,0.01). BAM files were obtained and sorted with SAMtools (view and sort) and then converted to pileup files (mpileup). BCFtools view generated VCF files, and variants were called with the following criteria: minimum coverage of 10 reads per SNP, major allele frequency above 95%, and FQ-score of -85 or less. Indels were excluded. VCF files for each patient were compiled with BCFtools merge, after which SNPs were parsed and counted using custom Python and R scripts.
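The filtering criteria above could be applied to individual VCF records along the following lines. This is not the authors' custom script. It assumes that each record carries DP, DP4, and FQ entries and an INDEL flag in its INFO column, that coverage is read from DP, and that the major allele frequency is the fraction of DP4 reads supporting the alternate allele. The function names and example records are invented.

```python
from typing import Dict

MIN_COVERAGE = 10             # minimum reads covering the SNP
MIN_MAJOR_ALLELE_FREQ = 0.95  # frequency must be strictly above this value
MAX_FQ = -85.0                # FQ-score must be at or below this value


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a dict; flags such as INDEL map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_snp_filter(vcf_line: str) -> bool:
    """Return True if a single VCF record satisfies all SNP criteria."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])
    if "INDEL" in info:
        return False  # indels are excluded
    if not all(key in info for key in ("DP", "DP4", "FQ")):
        return False
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    if total == 0:
        return False
    alt_freq = (alt_fwd + alt_rev) / total
    return (int(info["DP"]) >= MIN_COVERAGE
            and alt_freq > MIN_MAJOR_ALLELE_FREQ
            and float(info["FQ"]) <= MAX_FQ)


# Invented example records (tab-separated VCF columns 1-8).
good = "contig_1\t1024\t.\tA\tG\t222\t.\tDP=40;DP4=0,1,20,19;FQ=-120"
shallow = "contig_1\t2048\t.\tC\tT\t90\t.\tDP=8;DP4=0,0,4,4;FQ=-100"
mixed = "contig_1\t4096\t.\tG\tA\t150\t.\tDP=40;DP4=5,5,15,15;FQ=-90"
indel = "contig_1\t512\t.\tAT\tA\t200\t.\tINDEL;DP=40;DP4=0,0,20,20;FQ=-120"

kept = [line for line in (good, shallow, mixed, indel) if passes_snp_filter(line)]
```

Counting the records that pass for each isolate then gives a per-isolate SNP distance from the reference.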