Characterization of a bacterial community standard using the RAD004 rapid sequencing kit. As a first step in evaluating the rapid sequencing kit (RAD004) for fast taxonomic classification of the microbial composition of a sample, we tested the performance of our proposed workflow (nanopore sequencing of a RAD004 library followed by classification with the WIMP tool) on a known microbial standard (ZymoBIOMICS Microbial Community DNA Standard, Zymo Research). Successful characterization of the same microbial standard on the same instrument, but with a different DNA library preparation (the ligation sequencing kit LSK109) and a different analysis pipeline, has been demonstrated previously (38). The two library preparation methods differ in several fundamental ways: 1) DNA is more fragmented with the rapid kit, 2) DNA may be lost during the several cleaning steps of the ligation kit, and 3) the ligation kit involves several more steps than the rapid sequencing kit. Each of these differences could affect the observed microbial profile of a sample, the speed of sequencing, and the suitability of the nanopore sequencing device for in-field use. As shown in Figure 1 and Table 1, preparing the DNA library with the RAD004 kit followed by WIMP analysis correctly classified the composition of the mock community across different sequencing intervals (https://epi2me.nanoporetech.com/shared-report-226486?tokenv2=a812d53e-7d47-4294-975c-3550fd037336). This experiment showed that the RAD004 kit could be used as effectively as the ligation kit (LSK109) to determine the microbial composition of a sample, albeit with lower output.
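One simple way to express how closely a classified mock community matches its expected composition is the per-taxon difference between observed and expected relative abundance. The sketch below is illustrative only and is not the comparison performed by WIMP or in this study; the function names `abundance_deviation` and `max_abs_deviation` and the toy counts are hypothetical, and real expected fractions would have to come from the manufacturer's stated composition of the standard.

```python
def abundance_deviation(observed_counts, expected_fractions):
    """Per-taxon difference between observed and expected relative abundance.

    observed_counts: dict taxon -> number of classified reads
    expected_fractions: dict taxon -> expected fraction (should sum to 1)
    Reads assigned to taxa outside the expected profile are ignored, so the
    observed fractions are normalized over the expected taxa only.
    """
    total = sum(observed_counts.get(taxon, 0) for taxon in expected_fractions)
    if total == 0:
        raise ValueError("no reads were assigned to any expected taxon")
    return {
        taxon: observed_counts.get(taxon, 0) / total - expected
        for taxon, expected in expected_fractions.items()
    }


def max_abs_deviation(deviations):
    """Largest absolute per-taxon deviation; 0 means a perfect match."""
    return max(abs(d) for d in deviations.values())


# Toy example (hypothetical counts, not data from this study).
expected = {"TaxonA": 0.5, "TaxonB": 0.5}
observed = {"TaxonA": 60, "TaxonB": 40, "OffTarget": 5}
dev = abundance_deviation(observed, expected)
print(dev, max_abs_deviation(dev))
```

Ignoring off-target reads keeps the comparison focused on the expected taxa; a stricter variant would count them against the match instead.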
Testing metagenomic characterization of agricultural water using nanopore rapid sequencing kits. Our original goal for this project was to test 1) the fast, on-site characterization of the bacterial composition and 2) the detection of STECs in culture-independent, concentrated agricultural water samples (Fig. 2) using the Oxford Nanopore hand-held MinION and two DNA library preparation kits (the RAD004 rapid sequencing kit and the LRK001 field sequencing kit). DNA extraction of each sample (10 ml) yielded 2.5 µg of total DNA per sample. A DNA library for sample 26 was prepared using the RAD004 kit and resulted in a non-productive sequencing run with low output (388,130 reads and 0.4 Gb yield) (Additional File 1). Pore availability at the start of the run was modest (70%), with only ~40% of pores actively sequencing, and it declined steadily over the first 24 hours. Almost 60% of the total reads did not pass the quality filter (Additional File 1). Of the reads that passed the quality filter, a large majority (103,235 reads) were shorter than 1,000 bp, and WIMP assigned a taxonomy to about 45% of them (https://epi2me.nanoporetech.com/shared-report-243787?tokenv2=934deea4-2201-4e0d-bf82-d42c3a03078b). Because the rapid sequencing kit does not include a DNA cleaning step, it requires DNA of the highest quality for optimal performance. The poor performance of sample 26 therefore suggested that the sample contained an inhibitor or another substance that interfered with sequencing.
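The run metrics above (total reads, yield, fraction passing the quality filter, and the number of passing reads shorter than 1,000 bp) can all be derived from per-read length and mean quality. The sketch below is a minimal illustration under stated assumptions, not the MinKNOW or EPI2ME implementation: the `Read` class, the `summarize_run` function, and the default pass threshold of Q7 are hypothetical choices for the example.

```python
from dataclasses import dataclass


@dataclass
class Read:
    length: int         # read length in bases
    mean_qscore: float  # mean per-read Phred quality score


def summarize_run(reads, min_qscore=7.0, short_cutoff=1000):
    """Summarize output, pass rate, and short-read burden for one run.

    min_qscore is an assumed pass threshold; set it to match the basecaller
    configuration actually used.
    """
    total = len(reads)
    passed = [r for r in reads if r.mean_qscore >= min_qscore]
    short_passed = sum(1 for r in passed if r.length < short_cutoff)
    return {
        "total_reads": total,
        "yield_gb": sum(r.length for r in reads) / 1e9,
        "pass_fraction": len(passed) / total if total else 0.0,
        "short_passed_reads": short_passed,
        "short_fraction_of_passed": short_passed / len(passed) if passed else 0.0,
    }


# Toy reads (hypothetical values, not data from this study).
toy = [Read(500, 9.0), Read(800, 8.0), Read(5000, 10.0), Read(2000, 5.0)]
print(summarize_run(toy))
```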
To reduce or eliminate this inhibition, samples 17 and 26 were further cleaned with Agencourt magnetic beads (as described in Methods) and prepared for sequencing with the LRK001 or RAD004 kit. The cleaning step resulted in the loss of 40% of the total DNA, and sequencing continued to show inhibition, although it was less pronounced and the results improved (Fig. 3). The LRK001 run for sample 26 showed a rapid decay of the sequencing pores within 24 hours, produced a low read output (322,000 reads and 0.4 Gb yield) (Fig. 3A), and only 50% of its reads passed the quality filter (Fig. 3B). The RAD004 kit gave a similar result for sample 26, but with a higher read output (1,700,000 reads and 1.8 Gb yield) (Fig. 3C) and a slight increase in the number of reads passing the quality filter (Fig. 3D). Nevertheless, the majority of reads were shorter than 1,000 base pairs (~560,000 reads), which left more than 86% of the reads unclassified (https://epi2me.nanoporetech.com/shared-report-214408?tokenv2=9ed5fce3-da1c-434c-a388-5d98953a7e1c). The DNA was highly sheared as a result of both the Agencourt cleaning and the use of the rapid sequencing kit. Sequencing of sample 17 with the RAD004 kit showed similar results: the total output was 1,680,000 reads, 67% of which passed the quality filter, and 23% of those passing reads were classified by WIMP. In addition, more than 430,000 reads were shorter than 1,000 base pairs (https://epi2me.nanoporetech.com/shared-report-214241?tokenv2=e55ce2ae-6d63-4dd4-a1a5-dd3c4f0c567a).
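The percentages in the paragraph above chain together multiplicatively. The sketch below works through two of them, assuming that the 40% loss applies to the full 2.5 µg extract and that WIMP classification applies only to reads passing the quality filter; the helper names are hypothetical.

```python
def remaining_dna_ug(initial_ug, loss_fraction):
    """DNA mass left after a cleanup step that loses `loss_fraction` of the input."""
    return initial_ug * (1.0 - loss_fraction)


def classified_read_count(total_reads, pass_fraction, classified_fraction):
    """Reads classified when classification is applied to passing reads only."""
    return total_reads * pass_fraction * classified_fraction


# Figures reported above: 2.5 ug per extract, 40% lost during bead cleaning.
print(round(remaining_dna_ug(2.5, 0.40), 2))  # 1.5

# Sample 17, RAD004: 1,680,000 reads, 67% passing, 23% of passing reads classified.
print(round(classified_read_count(1_680_000, 0.67, 0.23)))  # 258888
```

Under these assumptions, roughly 1.5 µg of DNA remained after cleaning, and about 259,000 of the 1,680,000 reads from sample 17 would have received a WIMP classification.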
Use of the LSK109 ligation kit for eliminating agricultural water inhibitors. Because the field sequencing kit and the rapid sequencing kit did not produce satisfactory results for our field application, we decided to test the ligation kit (LSK109). The ligation kit has several advantages over the rapid sequencing kits: higher output, no enzymatic shearing, and several cleaning steps that should reduce inhibitors to levels that do not interfere with the sequencing reaction. Its drawbacks are that sample preparation takes almost 90 minutes, compared with the 15 minutes required for the rapid or field sequencing kits, and that it includes more steps at which sample can be lost. Testing produced promising results, with 2,210,000 reads, an 8.45 Gb yield, and more than 60% of pores sequencing over the 24-hour run. Furthermore, over 85% of the reads passed the quality filter (Fig. 4). We therefore processed three additional samples and compared their taxonomic composition to conduct a baseline metagenomic survey of samples collected across 3.7 contiguous miles in the Southwestern US (Fig. 2).
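A quick way to compare fragmentation across the runs described so far is the mean read length implied by total yield divided by read count. The sketch below applies this to the run totals reported in this section; because the yields are rounded and failed reads are included, the results are approximate, and the helper name is hypothetical.

```python
def mean_read_length_bp(yield_gb, n_reads):
    """Approximate mean read length from total yield (Gb) and read count."""
    return yield_gb * 1e9 / n_reads


# Run totals as reported in this section (yields are rounded).
runs = {
    "RAD004, sample 26, uncleaned": (0.4, 388_130),
    "LRK001, sample 26, bead-cleaned": (0.4, 322_000),
    "RAD004, sample 26, bead-cleaned": (1.8, 1_700_000),
    "LSK109 test run": (8.45, 2_210_000),
}
for name, (gb, reads) in runs.items():
    print(f"{name}: ~{mean_read_length_bp(gb, reads):,.0f} bp")
```

The rapid and field kit runs work out to roughly 1.0 to 1.2 kb per read, whereas the LSK109 run averages about 3.8 kb.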
Agricultural water metagenomic taxonomic characterization. Each run produced an average of 2,200,000 reads with an average total yield of 8.5 Gb (Additional File 2). The base-called reads were passed through a quality filter, and reads longer than 5,000 base pairs were analyzed with the EPI2ME WIMP workflow. The agricultural water samples had a diverse composition. The reads were predominantly bacterial (89-92%), with the remainder of eukaryotic (6-9%), viral (1-2%), and archaeal (<1%) origin (Additional File 3), and could be assigned to approximately 50 phyla, 90 classes, and 1,500 genera (Table 2).
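The filtering and tallying described above can be sketched as: keep reads that pass quality control and exceed 5,000 bp, drop unclassified reads, and report each domain's share of what remains. This is a hypothetical illustration, not the EPI2ME WIMP code; the tuple layout, the domain labels, and the choice to normalize over classified reads only are assumptions.

```python
from collections import Counter


def domain_proportions(reads, min_length=5000):
    """Share of classified reads per domain after quality and length filtering.

    reads: iterable of (length_bp, passed_qc, domain) tuples, where domain is a
    label such as "Bacteria", or None for reads left unclassified.
    Only passing reads strictly longer than min_length are counted.
    """
    counts = Counter(
        domain
        for length, passed, domain in reads
        if passed and length > min_length and domain is not None
    )
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()} if total else {}


# Toy reads (hypothetical values, not data from this study).
toy = [
    (6000, True, "Bacteria"),
    (7000, True, "Bacteria"),
    (8000, True, "Eukaryota"),
    (5500, True, "Bacteria"),
    (9000, False, "Bacteria"),  # fails QC
    (4000, True, "Viruses"),    # too short
    (12000, True, None),        # unclassified
]
print(domain_proportions(toy))  # {'Bacteria': 0.75, 'Eukaryota': 0.25}
```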
Bacterial composition in agricultural water. We identified the bacterial genera with an abundance greater than 1% in at least one sample (Fig. 5). The 11 genera identified were Synechococcus, Cyanobium, Pseudomonas, Streptomyces, Flavobacterium, Candidatus Fonsibacter, Limnohabitans, Hydrogenophaga, Acidovorax, Variovorax, and Rubrivivax. A large portion of reads (~40%) were assigned to a variety of other taxa, each with a genus-level abundance below 1%. Sites 9, 10, and 11 displayed very similar compositions, with 30-40% Synechococcus, 4% Cyanobium, and 1-2% each of Pseudomonas, Streptomyces, Flavobacterium, Candidatus Fonsibacter, and Limnohabitans. Site 12, located in a saltwater drainage canal approximately 6.9 miles from site 11, had a similar abundance of Streptomyces (1.3%) and Limnohabitans (1.3%). However, site 12 had almost no Synechococcus (0.3%), Cyanobium (0.1%), or Candidatus Fonsibacter (0.1%), whereas Pseudomonas (3.1%), Flavobacterium (5.9%), Hydrogenophaga (3.1%), Acidovorax (1.9%), Variovorax (1.3%), and Rubrivivax (1.2%) were each present at approximately 2 to 6 times their abundance at sites 9-11.
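The genus selection rule above (relative abundance greater than 1% in at least one sample) and the between-site comparison can be sketched as follows. The function names and toy counts are hypothetical, and the sketch assumes relative abundance is computed over all genus-level counts supplied for a sample, which may differ from the normalization used by WIMP.

```python
def relative_abundance(counts):
    """Convert read counts per genus into fractions of that sample's total."""
    total = sum(counts.values())
    return {genus: n / total for genus, n in counts.items()} if total else {}


def genera_above_threshold(samples, threshold=0.01):
    """Genera whose relative abundance exceeds `threshold` in at least one sample.

    samples: dict sample_id -> dict genus -> read count
    """
    selected = set()
    for counts in samples.values():
        for genus, fraction in relative_abundance(counts).items():
            if fraction > threshold:
                selected.add(genus)
    return sorted(selected)


def fold_change(abundance, reference):
    """Ratio of one relative abundance to a reference abundance."""
    return abundance / reference


# Toy counts (hypothetical values, not data from this study).
samples = {
    "siteX": {"GenusA": 50, "GenusB": 45, "GenusC": 5},
    "siteY": {"GenusA": 990, "GenusB": 5, "GenusD": 5},
}
print(genera_above_threshold(samples))  # ['GenusA', 'GenusB', 'GenusC']

# Flavobacterium at site 12 (5.9%) relative to the 1-2% reported for sites 9-11.
print(round(fold_change(5.9, 2.0), 1), round(fold_change(5.9, 1.0), 1))  # 3.0 5.9
```

GenusB is retained because it clears the threshold in one sample even though it falls below 1% in the other, which is the "at least one sample" rule.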
Eukaryotic composition in agricultural water. Strikingly, approximately 6-9% of the total reads were identified as belonging to the genus Homo. Eukaryotic DNA was represented by 19,645 to 56,173 reads per sample (Additional File 3). Approximately 98% of the eukaryotic reads were identified as Homo sapiens, and the remaining 2% were largely of fungal origin. Thus, the fungal component of the agricultural water was minimal.
Detection of STECs. Each sample site was analyzed for the presence of Shiga toxin-producing E. coli using the methods of FDA BAM Chapter 4A. Sites 9, 10, and 12 were confirmed STEC-positive after enrichment. In contrast, WIMP analysis of the nanopore sequencing output from the unenriched agricultural water identified only 46 to 152 reads per sample as E. coli. Strain-level classification further assigned one read from site 11 and two reads from site 12 to the O157 serotype. An NCBI BLAST search of these individual reads showed that only the read from site 11 matched the O157:H7 genome. Because of the limited coverage, strain-level identification could not be obtained. Therefore, the concentration of E. coli in the unenriched agricultural water samples was not sufficient for the detection of STECs or E. coli O157:H7 by direct nanopore sequencing.
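The limited coverage can be made concrete with the Lander-Waterman approximation: mean fold-coverage is C = N·L/G, and the expected fraction of the genome covered by at least one read is 1 - e^(-C). The sketch below assumes an E. coli O157:H7 genome of roughly 5.5 Mb and a read length of 5,000 bp (a lower bound, since only reads longer than 5,000 bp were analyzed); both values and the helper names are assumptions for illustration.

```python
import math


def expected_coverage(n_reads, mean_read_length_bp, genome_size_bp):
    """Mean fold-coverage C = N * L / G (Lander-Waterman)."""
    return n_reads * mean_read_length_bp / genome_size_bp


def fraction_covered(coverage):
    """Expected fraction of the genome covered by at least one read, 1 - exp(-C)."""
    return 1.0 - math.exp(-coverage)


GENOME_BP = 5_500_000  # assumed approximate E. coli O157:H7 genome size
READ_BP = 5_000        # assumed per-read length (lower bound for reads >5,000 bp)

for n in (46, 152):  # range of E. coli reads per sample reported above
    c = expected_coverage(n, READ_BP, GENOME_BP)
    print(f"{n} reads: coverage ~{c:.2f}x, ~{fraction_covered(c):.0%} of genome covered")
```

Under these assumptions, 46 to 152 reads correspond to roughly 0.04x to 0.14x coverage, so any particular locus would be expected to be covered by a read in only about 4% to 13% of cases.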