Simultaneous detection of various pathogenic Escherichia coli in water by sequencing multiplex PCR amplicons

Waterborne diseases due to pathogen contamination in water are a serious problem all over the world. Accurate and simultaneous detection of pathogens in water is important to protect public health. In this study, we developed a method to simultaneously detect various pathogenic Escherichia coli by sequencing the amplicons of multiplex PCR. Our newly designed multiplex PCR amplified five genes for pathogenic E. coli (uidA, stx1, stx2, STh gene, and LT gene). Two additional PCR assays (for aggR and eae) were also designed and included in the amplicon sequencing analysis. The same assays were also used for digital PCR (dPCR). Strong positive correlations were observed between the sequence read count and the dPCR results for most of the genes targeted, suggesting that our multiplex PCR-amplicon sequencing approach could provide quantitative information. The method was also successfully applied to monitor the level of pathogenic E. coli in river water and wastewater samples. The approach shown here could be expanded by targeting genes for other pathogens.


Introduction
Waterborne infectious diseases caused by pathogenic microorganisms in the aquatic environment are among the most important problems in the world (WHO, 2019;Sunderland et al., 2007). The World Health Organization (WHO) estimates that such infectious diseases account for 3.6% of all illnesses worldwide, killing about 1.5 million people each year (WHO, 2019). To call attention to waterborne infectious diseases, WHO lists the major pathogenic bacteria that cause waterborne infections (WHO, 2001). Escherichia coli and enterococci are currently used as fecal indicator bacteria to assess the presence of pathogens and analyze the potential risk (WHO, 2001;United States Environmental Protection Agency, 1996). One of the main reasons for using fecal indicator bacteria in water quality monitoring is the difficulty of simultaneously detecting and quantifying various pathogens. However, poor correlations have been reported between the levels of fecal indicator bacteria and the occurrence of pathogens in water environments (Bradshaw et al., 2016;Goh et al., 2019;McQuaig et al., 2012). Therefore, there is a strong need to obtain accurate information on multiple pathogens in water environments simultaneously.
Various molecular biology tools have been developed and applied to detect various pathogens. Polymerase chain reaction (PCR) has been widely used to detect the DNA or RNA of the target pathogens in various environmental samples (Toze, 1999). Quantitative PCR and digital PCR (dPCR) can provide quantitative information on target genes/pathogens; however, they are limited in quantifying multiple genes. While multiplex PCR is available to simultaneously detect multiple genes, conventional multiplex PCR cannot provide quantitative information. Multiplex qPCR is possible, but only for up to 3-4 genes. High-throughput microfluidic qPCR can also offer simultaneous quantification of multiple genes (Ishii et al., 2013), but it is less sensitive and requires pre-amplification prior to qPCR.
Recent advances in sequencing technologies enable us to directly detect genes of pathogens in various environments (Cheng et al., 2021;Martínez-Porchas & Vargas-Albores, 2017). Multiple PCR amplicons can also be sequenced and analyzed (Oshiki et al., 2018). However, when multiple amplicons are present in the sequencing library, shorter fragments are preferentially sequenced (Harismendy et al., 2009). It is also unclear whether and how sequence read abundance reflects the gene quantities in the original samples.
The objectives of this study were to (i) develop a method to simultaneously detect various pathogens by sequencing multiplex PCR amplicons, (ii) compare the results with those obtained by dPCR, and (iii) apply the method to detect pathogens in river water and sewage samples. Among a wide variety of pathogens of interest, we targeted pathogenic E. coli (diarrheagenic E. coli) because they are one of the most important pathogens causing diarrheal diseases (Boisen et al., 2015;Huang et al., 2012) and there is a pressing need to understand the presence of pathogenic E. coli in water and other environments. Pathogenic E. coli can be classified into five categories based on pathogenic mechanisms, clinical manifestations, and immunological differences: enteropathogenic E. coli (EPEC), enteroinvasive E. coli (EIEC, which also includes Shigella spp.), enterotoxigenic E. coli (ETEC), enterohemorrhagic E. coli (EHEC), and enteroaggregative E. coli (EAEC) (Nataro & Kaper, 1998;Kubomura et al., 2017;Farajzadeh-Sheikh et al., 2020;Huang et al., 2012). In this study, we tried to detect each of these pathogens by our multiplex PCR-amplicon sequencing approach.

Primer design
We selected seven genes to target pathogenic E. coli: eae for EPEC, ipaH for EIEC including Shigella spp., stx1 and stx2 for EHEC, STh and LT genes for ETEC, and aggR for EAEC. We also used uidA as a gene to detect general E. coli. As a positive control for each gene, DNA extracted from pathogenic E. coli strains was used (Table S1).
For each target gene, five or more gene sequences were retrieved from the GenBank database. Multiple alignment was performed using ClustalW (DDBJ, https://www.ddbj.nig.ac.jp/services/clustalw.html). After alignment, five primer candidates were designed for each gene using Primer3Plus, with a primer melting temperature (Tm) of 59 ± 3 °C and an amplicon size of 300 to 400 bp (except for STh). The primers most likely to amplify the target genes specifically were selected based on the results of a BLAST homology search. For the amplification of eae and uidA, previously developed primers were used (Kong et al., 1999;Silkie et al., 2008;Walk et al., 2009). Probes were prepared by Thermo Fisher Scientific (Waltham, MA, USA) using Primer Express Software v3.0.1 (Thermo Fisher Scientific) based on the designed primers, and TaqMan probes were used for all target genes. The primer and probe sequences used in this study are shown in Table S2.
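The Tm and amplicon-size criteria stated above can be expressed as a simple filter. The Python sketch below is illustrative only: the candidate names, Tm values, and amplicon sizes are hypothetical placeholders rather than the primers in Table S2, and the sketch does not reproduce Primer3Plus, the BLAST specificity check, or the size exception made for STh.

```python
from dataclasses import dataclass

TM_TARGET = 59.0     # °C, target primer melting temperature from the text
TM_TOLERANCE = 3.0   # °C, allowed deviation (59 ± 3 °C)
AMPLICON_MIN = 300   # bp, lower bound of the amplicon size window
AMPLICON_MAX = 400   # bp, upper bound of the amplicon size window


@dataclass
class PrimerPair:
    name: str          # hypothetical label, not a real primer name
    tm_forward: float  # °C
    tm_reverse: float  # °C
    amplicon_bp: int   # expected amplicon length


def meets_design_criteria(pair: PrimerPair) -> bool:
    """Return True if both primer Tm values and the amplicon size are in range."""
    tm_ok = all(
        abs(tm - TM_TARGET) <= TM_TOLERANCE
        for tm in (pair.tm_forward, pair.tm_reverse)
    )
    size_ok = AMPLICON_MIN <= pair.amplicon_bp <= AMPLICON_MAX
    return tm_ok and size_ok


# Hypothetical candidates, invented for illustration.
candidates = [
    PrimerPair("candidate_A", 58.2, 60.1, 345),  # within both windows
    PrimerPair("candidate_B", 63.0, 59.0, 320),  # forward Tm too high
    PrimerPair("candidate_C", 57.5, 61.9, 410),  # amplicon too long
    PrimerPair("candidate_D", 56.0, 62.0, 300),  # exactly on the boundaries
]

passing = [p.name for p in candidates if meets_design_criteria(p)]
print(passing)
```

Candidates that pass such a filter would still need the specificity screening described above before being selected.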

PCR
Simplex PCR was done to amplify each of the eight genes listed above. The PCR reaction mixture (25 µL) contained 5 µL of 5 × KAPA Taq EXtra Buffer, 1.5 µL of 25 mM MgCl2, 0.5 µL of 10 mM dNTP Mix, 0.12 µL of KAPA Taq EXtra DNA Polymerase (5 U/µL) (NIPPON Genetics, Tokyo, Japan), 0.25 µM each primer (NIPPON GENE, Tokyo, Japan), and 1 µL of template DNA. The PCR reaction was performed with an Applied Biosystems SimpliAmp Thermal Cycler (Thermo Fisher Scientific, Waltham, MA, USA) with the following thermal conditions: initial denaturation at 95 °C for 1 min; 40 cycles of 95 °C for 15 s, 58 °C for 15 s, and 72 °C for 30 s; and final elongation at 72 °C for 7 min. The amplification product was separated by electrophoresis on a 2% agarose gel and visualized by staining with ethidium bromide for 10 min.
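As a worked check of the reaction composition above, the following Python sketch computes the final concentrations implied by the stated stock volumes in a 25 µL reaction. The primer stock concentration (10 µM) is an assumption introduced only for illustration, since the text gives the final primer concentration but not the stock; the water volume therefore depends on that assumption.

```python
TOTAL_UL = 25.0  # total reaction volume stated in the text


def final_conc(stock_conc: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """Dilution: final = stock * (volume added / total volume); units follow the stock."""
    return stock_conc * volume_ul / total_ul


def primer_volume_ul(final_um: float, stock_um: float, total_ul: float = TOTAL_UL) -> float:
    """Volume of primer stock needed to reach the desired final concentration."""
    return final_um * total_ul / stock_um


# Values taken from the text
mgcl2_mm = final_conc(25.0, 1.5)   # 25 mM stock, 1.5 µL added
dntp_mm = final_conc(10.0, 0.5)    # 10 mM stock, 0.5 µL added
polymerase_units = 0.12 * 5.0      # 0.12 µL of a 5 U/µL enzyme

# ASSUMPTION: 10 µM primer working stock (not stated in the text)
PRIMER_STOCK_UM = 10.0
vol_per_primer = primer_volume_ul(0.25, PRIMER_STOCK_UM)


def water_volume_ul(n_primers: int) -> float:
    """Water needed to reach 25 µL for a given number of individual primers."""
    fixed = 5.0 + 1.5 + 0.5 + 0.12 + 1.0  # buffer, MgCl2, dNTP, enzyme, template
    return TOTAL_UL - fixed - n_primers * vol_per_primer


print(f"MgCl2 added: {mgcl2_mm:.2f} mM, dNTP: {dntp_mm:.2f} mM, enzyme: {polymerase_units:.2f} U")
print(f"Simplex (2 primers): {water_volume_ul(2):.2f} µL water")
```

Under the same assumed stock, a reaction carrying many primer pairs leaves correspondingly less room for water, which is one practical constraint when assays are combined in a single tube.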
The genes for pathogenic E. coli were simultaneously detected by the multiplex PCR with the same primers and reaction conditions described above. The sizes of the amplified products after the PCR reaction were mostly the same; therefore, we used the amplicon sequencing approach, instead of gel electrophoresis, to verify the correct amplification of the target gene (see below). To analyze the quantitative relationship between the number of sequence reads and the amount of amplification product, a serial dilution of gBlock DNA fragments (1.53 × 10⁻³ to 6.20 × 10² ng/µL) was prepared to generate standard curves.
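Mass concentrations of double-stranded DNA standards such as those above are commonly converted to copy numbers before being compared with dPCR results. The Python sketch below shows that standard conversion; the fragment length (350 bp) and the average mass of 650 g/mol per base pair are assumptions for illustration, as the text does not state the gBlock lengths or the conversion actually used.

```python
AVOGADRO = 6.022e23      # molecules per mole
AVG_BP_MASS = 650.0      # g/mol per base pair, a common approximation (assumption)


def dna_copies_per_ul(conc_ng_per_ul: float, fragment_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to copies/µL."""
    if fragment_bp <= 0:
        raise ValueError("fragment_bp must be positive")
    grams_per_ul = conc_ng_per_ul * 1e-9                    # ng -> g
    moles_per_ul = grams_per_ul / (fragment_bp * AVG_BP_MASS)
    return moles_per_ul * AVOGADRO


# ASSUMPTION: a hypothetical 350 bp fragment; real gBlock lengths are not given here.
HYPOTHETICAL_FRAGMENT_BP = 350

low = dna_copies_per_ul(1.53e-3, HYPOTHETICAL_FRAGMENT_BP)  # lowest standard in the text
high = dna_copies_per_ul(6.20e2, HYPOTHETICAL_FRAGMENT_BP)  # highest standard in the text
print(f"{low:.3e} to {high:.3e} copies/µL")
```

Because the conversion is linear in concentration, the ratio between the highest and lowest standards is the same in copies/µL as in ng/µL, regardless of the assumed fragment length.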

Amplicon sequencing analysis
Library preparation and sequencing were performed using the DNBSEQ G400 platform (MGI Tech, Shenzhen, China) at the Bioengineering Lab, Japan. In brief, the PCR product was purified with DNA Clean Beads 1.0x (MGI Tech) and quantified using the QuantiFluor dsDNA System (Promega, Madison, WI, USA). The sequence library was prepared according to the MGI Easy FS DNA Library Prep Set (MGI Tech). To add a sequence adapter to the PCR product, ligation was performed at 23 °C for 30 min. The adapter sequences were 5′-AAG TCG GAG GCC AAG CGG TCT TAG GAA GAC AA-3′ on the read 1 (R1) side and 5′-AAG TCG GAT CGT AGC CAT GTC GTT CTG TGA GCC AAG GAG TTG-3′ on the read 2 (R2) side. After the ligation reaction, the PCR reaction for sequencing analysis was performed under the following conditions: initial denaturation at 95 °C for 3 min; 8 cycles of 98 °C for 20 s, 60 °C for 15 s, and 72 °C for 30 s; and final elongation at 72 °C for 10 min. The PCR product was purified using DNA Clean Beads (MGI Tech) and quantified using the Qubit dsDNA HS assay kit (Thermo Fisher Scientific). The resulting sequencing library was used to create the DNA nanoballs (DNB) by using the DNBSEQ G400RS High throughput Sequencing Set (MGI Tech). Sequencing analysis was performed using DNBSEQ G400 under the condition of 2 × 200 bp. The read number for each gene was determined by counting the number of reads mapped against the reference gene sequences.
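The read-counting step described above can be sketched as a tally of alignments per reference gene. The Python example below assumes the mapping output is available in SAM format and counts primary alignments of mapped reads per reference name; the mapping software is not named in the text, and the SAM records shown are hypothetical.

```python
from collections import Counter

# SAM FLAG bits defined by the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def count_mapped_reads(sam_lines):
    """Count primary alignments of mapped reads per reference sequence name."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference sequence name
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # ignore unmapped reads and non-primary alignments
        counts[rname] += 1
    return counts


# Hypothetical SAM records, invented for illustration.
example_sam = [
    "@SQ\tSN:stx2\tLN:350",
    "read1\t0\tstx2\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read2\t16\tstx2\t5\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read3\t0\tuidA\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read5\t256\tuidA\t9\t0\t10M\t*\t0\t0\t*\t*",
]

counts = count_mapped_reads(example_sam)
print(dict(counts))
```

With paired-end data this tally counts each mate separately; whether mates or pairs were counted in the study is not specified, so that choice is left open here.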

Digital PCR
The concentrations of the pathogenic E. coli genes were quantified using dPCR with the primers and probes designed as described above (Table S2) and the QuantStudio 3D Digital PCR System (Thermo Fisher Scientific). The dPCR reaction mixture (14.5 µL) contained 1 × QuantStudio 3D Digital PCR Master Mix (Applied Biosystems, Norwalk, CT, USA), 0.9 µM each primer, 0.25 µM probe, and 2 µL template DNA. The dPCR reaction mixture was dispensed onto a dPCR chip using the QuantStudio 3D Digital PCR 20K Chip Kit v2 and the 3D Digital PCR Chip Loader (Thermo Fisher Scientific). The amplification was done using a ProFlex Thermal Cycler (Thermo Fisher Scientific) with the following thermal conditions: initial denaturation at 95 °C for 30 s followed by 40 cycles of 95 °C for 15 s and 58 °C for 1 min. The default setting of the QuantStudio 3D Analysis Suite (Thermo Fisher Scientific) was adopted to determine the threshold fluorescence intensity that distinguishes between positive and negative amplifications. Chips with ≥ 10,000 wells measured were used for analysis. The raw sequence reads of environmental samples have been deposited in the DDBJ Sequence Read Archive under DRA015257.
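Chip-based dPCR converts the fraction of positive wells into a concentration through Poisson statistics. The Python sketch below shows that general calculation together with the ≥ 10,000-well acceptance criterion stated above. It is not the vendor software's algorithm; the per-well volume of 0.755 nL is an assumed nominal value, and the well counts are hypothetical.

```python
import math

MIN_WELLS = 10_000        # acceptance criterion stated in the text
WELL_VOLUME_NL = 0.755    # ASSUMPTION: nominal per-well volume in nL


def dpcr_copies_per_ul(positive_wells: int, total_wells: int,
                       well_volume_nl: float = WELL_VOLUME_NL) -> float:
    """Estimate target copies per µL of reaction mixture from well counts."""
    if total_wells < MIN_WELLS:
        raise ValueError("chip rejected: fewer than 10,000 wells measured")
    if not 0 <= positive_wells < total_wells:
        raise ValueError("positive_wells must be in [0, total_wells)")
    fraction_positive = positive_wells / total_wells
    # Poisson correction: mean copies per well, lambda = -ln(1 - p)
    mean_copies_per_well = -math.log(1.0 - fraction_positive)
    well_volume_ul = well_volume_nl * 1e-3  # nL -> µL
    return mean_copies_per_well / well_volume_ul


# Hypothetical chip: 150 positive wells out of 17,000 measured wells.
estimate = dpcr_copies_per_ul(150, 17_000)
print(f"{estimate:.2f} copies/µL of reaction")
```

The Poisson term matters most at high occupancy, where many wells contain more than one template; at low positive fractions the estimate is close to the simple ratio of positive wells to total measured volume.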

Environmental water samples
Surface water samples (top 30 cm) were collected from the Oyodo, Yae, and Kiyotake Rivers in Miyazaki, Japan, on 17 December 2018, by using a surface-disinfected polyethylene container as described previously (Jikumaru et al., 2020). The detailed sampling locations and water quality information are summarized in Tables S3 and S4, respectively. Raw and treated sewage (before disinfection) were also collected from a sewage treatment plant in Miyazaki, Japan.
Bacterial cells were concentrated from the water samples (5 L of river water, 1 L of raw sewage, and 2 L of treated sewage) by using the coagulation and foam concentration method as described previously (Jikumaru et al., 2020). The resulting bacterial cell pellets were resuspended in 2 mL sterile physiological saline (0.85% NaCl) and filtered through a 0.45-μm-pore membrane filter (ADVANTEC, Tokyo, Japan). DNA was extracted from the membrane filters by using a DNeasy PowerWater DNA Kit (Qiagen, Venlo, Netherlands) according to the manufacturer's instructions. A final DNA elution volume of 100 µL was used. The DNA concentrations were measured using a Quantus fluorometer (Promega).

Confirmation of target gene amplification by PCR analysis
Six new PCR assays designed in this study (for stx1, stx2, ipaH, aggR, STh gene, and LT gene) produced amplicons of expected sizes by simplex PCR (Fig. 1). When primers for these assays were all combined along with previously designed eae and uidA assays (Kong et al., 1999;Silkie et al., 2008;Walk et al., 2009) (i.e., eight-plex PCR), amplicons of expected sizes were also obtained for uidA, stx1, stx2, ipaH, STh gene, and LT gene, although aggR and eae were not amplified in the multiplex PCR format, probably due to resource competition between assays and/or secondary structure formation between primers (Fig. 2). Similarly, uidA was not amplified by the multiplex PCR when ipaH was present in the sample. Other pathogen-targeted genes such as stx1, stx2, ipaH, STh, and LT genes were amplified simultaneously. We therefore amplified aggR and eae separately from the other genes for sequencing analysis as follows: (1) amplify five target genes (stx1, stx2, ipaH, STh, and LT) simultaneously by multiplex PCR; (2) amplify aggR and eae by simplex PCR; (3) mix amplification products (multiplex, 15 µL; aggR, 15 µL; eae, 15 µL; total, 45 µL) to prepare a sample for the amplicon sequencing analysis.

Quantification of pathogenic genes in DNA standard by digital PCR method
The same target genes were also quantified by dPCR. Accurate and sensitive quantification of pathogens is important to assess potential health risks associated with water (Tang et al., 2021). Compared to traditional quantitative PCR (qPCR), dPCR is more sensitive (i.e., can detect genes at low concentrations) and accurate (i.e., tolerant to PCR inhibitors and independent of amplification efficiency) (Ishii, 2020). Based on our standard curve analysis (Fig. 3), the detection limits of our assays were 4, 3, 16, 12, 6, 6, 13, and 6 copies/reaction for aggR, stx1, stx2, ipaH, eae, STh gene, LT gene, and uidA, respectively. The lowest quantified values of our assays were 4-16 copies/reaction depending on the target gene, corresponding to 6-24 copies/L of original water sample. These concentrations are at the levels naturally seen in environmental water samples (Cui et al., 2019;Huang et al., 2016); therefore, our dPCR assays have strong potential for application in environmental monitoring.

Fig. 1 Gene amplification by single PCR with the primers designed in this study
Fig. 2 Gene amplification by multiplex PCR with the primers designed in this study

Sequencing multiplex PCR amplicons
Serially diluted standard DNA solutions (i.e., the mixture of DNA originating from pathogenic E. coli strains) were amplified with the multiplex PCR and sequenced to generate 3,377,510 to 3,881,717 reads per sample. All target genes except for the STh gene were detected by sequencing. The most abundantly detected gene was eae, with the number of reads ranging from 726,944 to 1,798,269 reads/sample (Table 1). Interestingly, the read number of eae was larger in more diluted samples. In contrast, the read numbers of aggR, stx1, stx2, ipaH, LT gene, and uidA decreased in more diluted samples. Even when the amplified products were not visible on the gel, they were detected by sequencing, suggesting that the sequencing approach is more sensitive in detecting multiplex PCR amplicons, consistent with a previous study (Li et al., 2019).

Comparison of dPCR and amplicon sequencing data
The relationship between the copy number determined by dPCR and the read count obtained by the amplicon sequencing was analyzed for each target gene (Fig. 4). Strong positive correlations, with correlation coefficients ranging from 0.970 to 0.998, were found between the copy numbers and read counts for stx1, stx2, ipaH, LT gene, aggR, and uidA. In contrast, a negative correlation was observed for eae. The eae amplicon may have been sequenced preferentially over the others. The correlation was not analyzed for the STh gene because STh sequences were not detected by the amplicon sequencing analysis.
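A comparison of this kind can be sketched as a Pearson correlation between dPCR copy numbers and read counts across a dilution series. The Python example below is a minimal illustration: the data are hypothetical, and applying a log10 transform before correlating is an assumption, since the text does not state whether the values were transformed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical dilution series for one gene, invented for illustration.
dpcr_copies = [10, 100, 1_000, 10_000]      # copies/reaction by dPCR
read_counts = [120, 1_500, 9_000, 110_000]  # mapped reads for the same gene

# ASSUMPTION: correlate log10-transformed values, as is common for dilution series.
log_copies = [math.log10(v) for v in dpcr_copies]
log_reads = [math.log10(v) for v in read_counts]

r = pearson_r(log_copies, log_reads)
print(f"r = {r:.3f}")
```

A coefficient near +1 indicates that read counts track copy numbers across the series, whereas a negative coefficient, as reported for eae, indicates that read counts rose as the template became more dilute.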

Number of target gene reads in the actual sample
The method developed here was applied to detect pathogenic E. coli in river water and sewage samples. Among the reads obtained (6,216,055 to 6,428,846 reads/sample), eae was most abundantly detected (1,982,246 to 2,664,351 reads/sample); however, this does not necessarily indicate that EPEC (which harbor eae) are most abundant in the environment because the read count of eae did not necessarily correlate with the absolute abundance in the samples as discussed above (Table 2). Nonetheless, other studies report eae as one of the most frequently detected virulence genes of E. coli in water environments (Ahmed  Huang et al., 2016;Ishii et al., 2014;Zhang et al., 2016). For more accurate quantitative information, qPCR or dPCR is needed. The aggR gene was also relatively frequently detected in our water samples, with the read count ranging from 23 to 274,919 reads/sample. The read number of aggR in the Oka River, which receives effluents from septic tanks, was 20 times larger than those in sewage, suggesting that there are non-sewage sources of EAEC (which carry aggR) in the river environment. Similar to aggR, ipaH was also more abundantly detected in river water than in sewage. Since humans are the primary host of EAEC and EIEC (Lääveri et al., 2014;Pasqua et al., 2017), the rivers studied likely have been polluted with EAEC and EIEC due to anthropogenic loading (Canizalez-Roman et al., 2019;Hsu et al., 2010). The genes stx1 and stx2 represent STEC. We detected more stx2 than stx1 in the samples, similar to previous studies (Rooks et al., 2012;Duris et al., 2013;Ahmed et al., 2015). We previously quantified stx1, stx2, and ipaH in the same river water and sewage samples by using dPCR (Jikumaru et al., 2020). While only stx2 was detected in the Oyodo River and the influent sewage by dPCR, stx1, stx2, and ipaH were all detected by sequencing analysis (Table S5), suggesting that amplicon sequencing is more sensitive than dPCR.
Therefore, our approach based on multiplex PCR followed by amplicon sequencing is useful to simultaneously detect various pathogen genes. Digital PCR may complement the sequencing approach as it can provide more quantitative information.

Conclusions
In this study, we developed a method to detect genes for pathogenic E. coli by sequencing the multiplex PCR amplicons. Competition may occur among target genes during PCR amplification (e.g., for enzymes, dNTPs); however, strong positive correlations between the sequence read counts and dPCR results for some genes (aggR, stx1, stx2, ipaH, LT gene, and uidA) suggest that the sequencing approach could provide information with higher sensitivity and specificity than conventional multiplex PCR. This method is applicable to high-throughput monitoring of the occurrence of pathogenic E. coli in surface water and wastewater samples. The method can be expanded by targeting genes of other pathogens such as Salmonella and Campylobacter, although dPCR or qPCR may be needed if more quantitative information is desired.
The main focus of this study was to simultaneously detect multiple pathogen genes. However, it may also be possible to identify potential sources of pathogens by analyzing the sequence variations (Kobayashi et al., 2022;Zhang et al., 2018). Our results suggest potential non-human sources for EAEC and EIEC (and Shigella spp.) in the river environment. By sequencing and analyzing the aggR and ipaH amplicons, we may be able to identify the potential sources of these genes in the future.

Conflicts of interest
The authors declare no competing interests.