WWTP Processing and Sample Collection
The NESTP is the largest wastewater facility in the Province of Manitoba (49°57'08.1"N 97°06'11.4"W). This WWTP serves roughly 70% of the population in the city of Winnipeg treating an average of 200 million liters per day. Treatment begins with the removal of large solids followed by smaller solids and oils in primary clarification. Wastewater is then moved to high purity oxygen bioreactors for biological treatment where it is inoculated with activated sludge returned from secondary clarification to remove nutrients, biosolids, and other organic compounds. This is followed by secondary clarification which removes the remaining small solids through sedimentation before the wastewater is subjected to UV disinfection. The total hydraulic retention time of wastewater in the NESTP is 12 hours. The removed biosolid waste is subjected to anaerobic sludge digestion for 25 days followed by dewatering for disposal. Effluent water quality parameters are summarized in Table 1. Influent and effluent water quality datasets are included in Additional file 2. Raw sewage or untreated influent (RS), returned activated sludge (RAS), and UV-treated final effluent (EFF) were collected on the following dates from the NESTP: October 22nd, 2019 (T1), November 28th, 2019 (T2), December 18th, 2019 (T3), and February 6th, 2020 (T4). Dewatered sludge (SC) was also collected during the T3 and T4 sampling events. One liter of each sample was collected in sterile containers and transported on ice to the laboratory where they were stored at 4°C. Within 24 hours of collection, aqueous treatment samples were filtered through a sterile cheesecloth to remove large solids and 100 to 200 mL of the filtrate was filtered through 0.2-µm 47-mm Supor-200 membrane filters (Pall Corporation, Ann Harbor, MI) to capture bacterial cells for nucleic acid extraction. A filtration control consisting of 200 mL of Milli-Q water was also prepared for each sampling event. Filters were collected and stored at -20°C for further processing.
Bacterial Fraction DNA Extraction
For the aqueous treatment samples (RS, RAS, and EFF), the 0.2-µm filters were washed with 15 mL of 1X PBS-Tween20 solution and homogenized at 2500 rpm for 15 minutes. Supernatants were then transferred to a fresh tube and centrifuged at 3300 x g for 15 minutes to pellet down cells. Pellet was resuspended and transferred to PowerBead tubes for nucleic acid extraction using the DNeasy PowerLyzer PowerSoil Kit (Qiagen, Hilden, Germany) as per the manufacturer’s instructions. For the dewatered sludge samples, ~5 g of solids was collected into 30 mL of 1X PBS-Tween20 solution, vortexed until homogeneous (2500 rpm for 15 minutes) and centrifuged at 4°C and 4500 x g for 20 minutes. The resulting pellet (0.25 g) was then transferred to a PowerBead tube and extracted using the DNeasy PowerLyzer PowerSoil kit as per the manufacturer’s instructions, while the supernatant was kept for further filtration and DNA extraction of viral particles.
Viral Fraction DNA Extraction
The 0.2-µm filter flow-through from the aqueous treatment samples was collected and stored at 4°C. Dewatered sludge viral DNA extraction followed the bacterial DNA extraction protocol scaled up to 30 g followed by supernatant filtration through 0.2-µm filter repeated twice. For both sets of samples, 140 mL of filtrate was transferred into Centricon Plus-70 filter units (Millipore Corporation, Billerica, MA). Filtrates were concentrated to ~250 µL according to the manufacturer’s instructions with additional modifications described here. Each sample was centrifuged at 3000 x g for 30 minutes at 20°C in 70 mL increments. The supernatant was discarded after each run. The Centricon Plus-70 filter units were then inverted, and the concentrated viral fraction was transferred to sterile tubes and centrifuged at 800 x g for 2 minutes at 20°C. The ultrafiltrate was pre-treated with 2U of Turbo DNase and 60 µg of RNAse A (ThermoFisher Scientific, Waltham, MA, USA). Nucleic acid was extracted using the QIAamp MinElute Virus Spin kit (Qiagen Sciences, Maryland, MD) following the manufacturer’s instructions including the Qiagen Protease and omitting the carrier RNA step. Nucleic acid was eluted with 50 µL of AVE elution buffer.
Mock Community
Positive sequencing controls were created for each bacterial and viral sequencing batch. Bacterial mock community consisted of equal amounts of Escherichia coli (ATCC 25922), Salmonella enterica (ATCC 13076), Pseudomonas aeruginosa (ATCC 10145), Staphylococcus aureus (ATCC 25923), Legionella pneumophila (ATCC 33152), L. longbeachae, clinical isolates of Campylobacter lari and C. upsaliensis, and environmental isolates of C. jejuni and C. coli with a total concentration of 11.2 ng/µL. The viral mock community consisted of equal amounts of Adenovirus and Myophages M2 and M3 with a total concentration of 0.4 ng/µL. Nucleic acid was extracted according to their respective extraction protocol described above.
DNA Precipitation and Sequencing
All extracted bacterial DNA samples were precipitated using 0.1 volumes of 3 M sodium acetate (pH 4.6), two volumes of 100% ethanol, and 4 µL of 5 mg/mL linear acrylamide, which was stored at -80°C overnight. Samples were centrifuged at 16,000 x g for 30 minutes at 4°C. Supernatants were discarded and pellets were washed with 1 mL of ice-cold 70% ethanol before repeating centrifugation. Resulting pellets were air-dried and resuspended in 50 µL of 10 mM Tris solution. DNA concentration was measured using the Qubit 4.0 fluorometer and Qubit dsDNA High Sensitivity Assay Kit (Invitrogen, Carlsbad, CA). Metagenomic shotgun sequencing was performed on the Illumina NextSeq platform (Illumina, Inc., San Diego, CA) using the Illumina Nextera Flex kit with 300 bp paired-end outputs at the Integrated Microbiome Resource (IMR, Halifax, NS). Raw sequencing read adaptor sequences were trimmed prior to receiving sequencing reads. Raw metagenomic sequencing reads are available in the NCBI Sequence Read Archive under BioProject ID: 768945.
Quantification of Gene Copy Numbers and Statistical Analysis
Real-time PCR TaqMan assays were carried out for each sample (RS, RAS, EFF, and the negative filter control) in triplicate. Mobile integron-integrase genes (classes 1-3) served as biomarkers for measuring variation in ARG dissemination between treatments [8, 22]. Coliform beta-glucuronidase gene uidA was used as a culture-independent method of E. coli and coliform quantification [23] in addition to plate counts reported in Table 1. The 16S rRNA gene was quantified to assess total bacterial abundance. Sequence and product size of primers and probes are listed in Table 2. Real-time PCR data for the dewatered sludge was unavailable for the present study.
Each 10 µL TaqMan real-time PCR mixture consisted of 5 µL of TaqMan Environmental Master Mix 2.0 (Applied Biosystems, Foster City, CA), 400 nM of each primer, 100 nM of each probe, and 2 µL of 10 ng/µL template DNA. Real-time PCR was performed using an ABI QuantStudio 5 Real-Time PCR System (Applied Biosystems, Foster City, CA). Thermal cycling conditions were the same for all targeted genes: 2 minutes incubation at 50°C, denaturation and activation of Taq polymerase for 10 minutes at 95°C, followed by 40 cycles of 15 s at 95°C and 60 s at 60°C. Primers and probes are listed in Table 2. Primers were used in the Primer-BLAST tool [24] to extract target regions. Then, these regions were uploaded to Geneious Prime R9 [25] to corroborate primer and probe sequences of gene fragments. Combined integron-integrase gene primer sequences and uidA primer sequences were used to construct gBlock Gene Fragments (IDT, Coralville, IA). These gBlock constructs were used to generate standard curves for the quantification of environmental gene copy numbers (GCN).
The 16S rRNA gene standard curve was generated with DNA from Salmonella enterica (ATCC 13076). GCN of all target genes were calculated and normalized per milliliter of sample filtered and per nanogram of DNA using equations described by Ritalahti et al. [26] and Lee et al. [27]. GCN were log10-transformed for generalized linear model (GLM) analysis using Statistical Analysis System (SAS University Edition for Windows). Tukey-Kramer tests were run to determine statistical differences between GCN using GLM results for treatment conditions and across sampling events (T1 through T4) [28]. Differences between conditions were considered statistically significant at a p-value ≤ 0.05.
Metagenomic Data Processing and Analysis
Paired-end metagenomic shotgun sequences were merged and sequence lengths below 151 bp were removed.
Taxonomical characterisation. MG-RAST was used with the merged paired-end sequencing reads to characterize the community composition through sequence similarity searches using default parameters [29]. Metagenomic sequences are available from MG-RAST with sequence IDs listed in Table S1. Principal coordinate analysis (PCoA) with Bray-Curtis distance matrix was performed using abundance data at the species taxonomic level using the phyloseq R package [30]. Alpha-diversity indices (Shannon diversity, Simpson diversity, and Chao1) and rarefaction curves were generated using vegan R package [31]. Data visualization was performed using Tableau [32] and ggplot2 [33] in RStudio [34]. Alpha-diversity values were log10-transformed for generalized linear model (GLM) analysis and Tukey-Kramer test using SAS 9.4M6 to determine statistical differences between treatments [28].
ARG analysis. Contigs were generated using the Geneious (version 9.0.5) de novo assembler tool [25]. They were then compared against the comprehensive antibiotic resistance database (CARD) using the resistance gene identifier tool (RGI) to identify ARGs for the characterization of the resistome [35]. The ARGs identified by CARD RGI were grouped into classes according to the type of antibiotic they confer resistance against and quantified by absolute abundance as well as relative abundance (number of ARG class divided by total ARG) and normalized abundance (number of ARG class divided by the total number of bacterial and viral reads) reported in percentage.
Network analysis and visualization. Network analysis was performed to investigate co-occurrences between ARGs and bacterial and phage families in the different wastewater sample types. Microsoft Excel was used to organize data into a suitable format for R [36]. RStudio [34], was used to generate Spearman’s correlation matrices and the preliminary edge and node lists needed for network visualization. The R packages used are listed in Supplementary Materials (Table S2). The edge and node lists were finalized using Microsoft Excel. Gephi [37] was then utilized for network visualization. Gephi’s output files were aesthetically enhanced in Inkscape [38].