Characterization of urban wastewater treatment plant euent from Tokyo using metagenomics and β-lactam-resistant Enterobacteriaceae isolates

Background: The World Health Organization has endorsed a global action plan on antimicrobial resistance (AMR), which calls upon all nations to adopt mitigation strategies based on the one-health approach. In particular, environmental factors to evaluate their potential risk for promoting the evolution of AMR have not been suciently characterized, yet. Instead of the limited nosocomial AMR surveillance, sewage AMR surveillance could highlight a broader picture of the global burden of AMR, including local-specic features, such as urban-, suburban-, industrial-, or agricultural-based, based on the One-Health approach. Methods: We characterized environmental AMR burden in euents of eight wastewater treatment plants (WWTPs) around the Tama River and Tokyo Bay by metagenomic analysis of AMR genes (ARGs) and heavy metal resistance genes and whole-genome sequencing of AMR bacteria (ARB) isolates. Results: We found that seasonal features of ARGs eluted from WWTPs and sul1 and qacE (or qacE ∆ 1), which are the main contributors of class 1 integron, were rich genetic factors as baseline ARG pollutants. In addition, aadA6 and msr(E) were predominant ARGs against aminoglycoside and macrolide resistance, respectively. Among the detected ARB isolates, complete genome sequencing demonstrated that E. coli O16:H5-ST131-mH41 producing CTX-M-27 exhibited marked genome clonality with clinical isolates from several countries. Complete plasmid sequences highlighted that ARG distribution was involved in plasmid replicon and its bacterial hosts. Conclusions: Here, we provide a baseline for investigating environmental AMR dissemination and the possibility of using metagenomics for comprehensive AMR surveillance in sewage. This study demonstrated that AMR monitoring could uncover the actual status of the euents from WWTP and would suggest how we should conduct further studies to reduce the related AMR burden to prevent it from becoming a potential health risk. ≥ 90% nucleotide sequence identity and ≤ 1E-100 e-value against β-lactamase gene-positive complete plasmids revealed in this study; (ii) sharing ≥ 10 ORFs ( ≥ 99% identity) against these plasmids; (iii) possession of the ARGs; and (iv) existence of metadata related to isolation source. Pan-genome analysis using the collected plasmids was performed using Roary version 3.12.0 [69] with a parameter of 99% BLASTP percentage identity cutoff, followed by construction of a distance matrix using R package “proxy” with edge weights of plasmids sharing ≥ 10 ORFs. NMDS and community structure analysis were performed by “vegan” and “igraph” of R package, respectively. Plasmidome data were visualized using Cytoscape version 3.7.2.


Introduction
The World Health Organization has endorsed a global action plan on antimicrobial resistance (AMR), which calls upon all nations to adopt mitigation strategies within two years [1]. However, there is still a need to fully understand the ecology and evolution of AMR based on the onehealth approach. In particular, the properties of microbial resistome in ecosystems dominated by humans and how to monitor such environmental factors to evaluate their potential risk for promoting the evolution of AMR have to be su ciently characterized. Instead of the a speci c vector and carrier as the most suspected risk factor. The risks of ARB by colonization or infection to humans from the environment must be assessed based on ARB clone genetics rather than by the prevalence of ARGs [7].
The purpose of this study was to characterize the environmental AMR burden from the e uent of WWTPs around the Tama River and Tokyo Bay area based on metagenomic analysis of ARGs and whole-genome sequencing of ARB isolates. This study demonstrated that AMR monitoring could uncover the actual status of the e uents from WWTP and would suggest how we should conduct further studies to reduce the related AMR burden to prevent it from becoming a potential health risk.

Results
Bacterial proportion in WWTP e uents Surface water from the sewage e uent samples was collected from eight WWTP sites along the Tama River and around Tokyo Bay (Fig. 1) and the recreational beach area (BEC1) (Fig. 1). Basic information on the WWTPs is summarized in Table 1. The sampling was conducted in summer and winter seasons for two years between August 2017 and February 2019 (Table S1). The collected water samples were ltered through a 0.2-µm pore membrane, and DNA was extracted from the membrane, followed by metagenomic DNA-seq short-read sequencing (Fig.  S1).
Taxonomic classi cation of domain rank (Fig. S2A) suggested that the e uents of fresh water area (WP1-8) were mostly composed of bacterial sequences, whereas those of brackish water (WP9 and BEC1) were rich in eukaryote organisms, such as mussels (Mytilus edulis) and diatoms (Chaetoceros spp.), found in seawater. Further taxonomic classi cation of family rank (Fig. 1B) suggested that fresh water area (WP1-8) e uents predominantly contained Comamonadaceae, which are aerobic gram-negative β-proteobacteria as reported in sewage active sludge digesters [8], and Mycobacteriaceae, which are rich organisms in sewage active sludge [9]. The e uents of brackish water (WP9 and BEC1) were rich in Rhodobacteriaceae, which mostly habitats aquatic environments.
Non-metric multidimensional scaling (NMDS) showed that plots of the e uents of fresh water area (WP1-8) were closely scattered, but the plots were markedly divided in a season-dependent manner ( Fig. 2A). Linear discriminant analysis effect size (LEfSe) analysis revealed that multiple aquatic habitat bacteria were mostly observed in winter, whereas some speci c members of the human gut ora (Streptococcus, Escherichia, and Anaerococcus), Cyanobacteria (Dolichospermum), and purple sulfur bacteria (Rheinheimera) increased in summer (Fig. 2B), demonstrating typical seasonal features.

ARG proportion in WWTP e uents
Abovementioned metagenomic DNA-seq short reads were analyzed by Blastn homology search in the ResFinder ARGs database under normalization using reads per kilobase of exon per million mapped reads (RPKM). Prior to the evaluation of ARGs, we assessed whether all metagenomic DNA-Seq short reads were properly obtained to characterize the bacterial population size using RPKM normalization for 16S rRNA genes and rpoB RNA polymerase β-subunit orthologs (Fig. S2B). All samples, except for BEC1, in winter 2019 showed no signi cant difference with an average of four copies of the 16S rRNA gene per one copy of rpoB, suggesting that RPKM estimation is an effective approach to characterize the relative abundance of ARGs. Resistance genes to quaternary ammonium compounds (QAC), sulfonamide (SA), aminoglycoside (AMG), β-lactam, and macrolide (MAC) were markedly detected in e uents from all WWTPs, except for WP9 and BEC1 (Fig.   3A). Unlike the bacterial proportion shown in Fig. 2A, a seasonal difference was not observed in ARG variations (Fig. 3B), but it appeared to be dependent on sampling year and season.
We speculated that possible occasional trends of ARB dissemination may affect ARG variations at every sampling and WP site. The detected ARGs in each AMR category were further classi ed into orthologous genes (or gene families) ( Fig. 4A and Table S2). The most detected ARG was the SA resistance gene sul1 at 440.5 RPKM (Table S2); sul2 was detected at rather low levels, but sul3 was not detected. The second most detected ARG was the QAC resistance gene qacE at 203.8 RPKM (Table S2), suggesting that basic gene sets (sul1 and qacE) in the class 1 integron [10] were the most predominant in the detected ARGs.
Regarding clinically important ARGs, multiple AMG-and MAC-resistant genes were identi ed (Fig. 4A). The seven most predominant AMG resistance genes belonged to the aadA and aph gene families (Table S2), which contribute to streptomycin resistance, implying that streptomycin could be the most potent selective agent among aminoglycosides. The two most predominant MAC resistance genes were msr(E) and mph(E), which are primarily located on the chromosomes of Acinetobacter and Proteus species (The Comprehensive Antibiotic Resistance Database, CARD https://card.mcmaster.ca/ontology/39685).
Regarding β-lactamase gene type, bla OXA , bla GES , and bla IMP were the most markedly detected (Fig. 4A), whereas the bla CTX-M cluster, which is a popular extended-spectrum β-lactamase (ESBL) gene in clinical settings, was in observed in minute abundance at 0.82 RPKM (Table S2), indicating that other β-lactamase type genes, such as bla OXA and bla GES , may be present in baseline levels were derived from environmental bacteria but not pathogenic bacteria.
In addition to the distribution of ARG, metagenomic analysis suggested a unique season-dependent ARG detection. For instance, qnrS2, qnrS6, and aac(6')-Ib increased markedly in summer (Fig. 4B), whereas aadA13 was detected in winter (Fig. 4C). These season-dependent ARG detection patterns suggested that, to some extent, ARGs could be affected by seasonal elements in a particular temperature-dependent manner (above 30°C in summer and below 10°C in winter; Table S1) similar to that observed in the growth of ARB in activated sludge of the anaerobic-anoxic-oxic (A 2 O) water treatment system. However, another possibility remains that seasonal (or temporal) infectious trends may in uence the differences in antimicrobial use in the WP-dependent treated area, leading to differential human gut ora as excreta. Thus, possible WP-dependent ARG differences were elucidated using a fold change (log 2 ratio) of RPKM (Fig. S3A). In fact, season-dependent ARGs as well as steadily increasing ARGs were observed in WPspeci c features (Fig. S3B). WP2 e uents showed that the RPKM of major ARGs [AMG, aadA2 and aadA4; MAC, msr(E) and mph(E); SA, sul1; and QAC, qacF] increased steadily (Fig. S3B), suggesting that the population of major ARB species may increase predominantly. Likewise, the WP-speci c increase in ARGs was identi ed at other WPs, and metagenomic RPKM value could be a suitable index as one of the evaluation standards to monitor such increasing trend of ARB in WWTP-speci c areas.

Heavy-metal resistance genes (HMRGs) in WWTP e uents
Resistance genes to mercury (Hg), copper (Cu), and arsenate (As) were markedly detected in e uents from all WWTPs ( Fig. S4A and Table   S3). Unlike the bacterial proportion shown in Fig. 2A, seasonal difference was not observed in heavy-metal resistance (Fig. S4B), but it appeared to be dependent on sampling year and season as well as on ARGs (Fig. S5).
Genome analysis of ESBL/carbapenemase-producing organisms (EPO and CPO) isolated from WWTP e uents and environment The metagenomic approach is good for revealing overall ARGs, but it is insu cient to characterize potentially pathogenic ARB. Therefore, we isolated EPO and CPO from fresh 50-mL samples of each e uent (Fig. S6). CHROMagar ESBL selection demonstrated that potential clinically pathogenic Enterobacteriaceae (E. coli, Klebsiella, and Enterobacter spp.) were observed markedly at approximately 500 colonies from the e uents of WP4, WP5, and WP8 at all tested sampling times (Fig. S6). More than hundred typical chromogenic colonies, which would be su cient to characterize environmental pollution due to EPO and CPO, were isolated at each sampling time ( Table 2). These isolates were subjected to whole-genome sequencing ( Fig. S6 and Table S4). CTX-M-positive organisms were isolated as follows: 134 (31%) ( Table 2) Table S4). The EPO and CPO isolates on CHROMagar ESBL and its sequence type (ST) are summarized in Table S5. Eighteen CPO isolates (seven bla IMP -positive, eight bla KPC -positive, and three bla NDM -positive) were identi ed ( Table 2).
Among the 71 E. coli isolates, nine isolates of predominant ST131 (11 isolates) possessed bla CTX-M-27 gene (Table S4). Core-genome single nucleotide variation (SNV) phylogeny (see detailed procedures in Fig. S7) of the ST131 isolates were compared with publicly available E. coli ST131 draft or complete genome sequences (a total of 315 strains; see Table S6), suggesting that primarily four subclonal types of ST131 were identi ed from the WWTP e uents, two isolates of ST131-mH41 with bla CTX-M-27 , seven isolates of ST131-mH30 with bla CTX-M-27 , one isolate of ST131-mH30 with bla CTX-M-3 , and one isolate of ST131-mH30 with bla CTX-M-15 (Fig. S8).
Among the hundred isolates of Aeromonas species (Table 2), 10 isolates of A. hydrophila, 10 isolates of A. caviae, one isolate of A. media, and four isolates of A. veronii were identi ed as CTX-M producers (Table S4). Although four isolates of CTX-M-14 producing A. caviae were detected in different WWTP e uents (WP2, 4, and 7), core-genome SNV phylogeny revealed that they were clonal in 21 SNVs with same recombination and pangenome regions (Fig. S9A), suggesting that the A. caviae clone may be successfully predominating in the general WWTP environment in Tokyo but not in WWTPs at speci c locations.
Among the 51 isolates of Klebsiella species (Table 2), eight isolates of K. pneumoniae, seven isolates of K. quasipneumoniae, two isolates of K. variicola, and seven isolates of other Klebsiella sp. were identi ed as CTX-M producers (Table S4). Core-genome SNV phylogeny revealed that four isolates of CTX-M-3-producing K. quasipneumoniae ST668 exhibited clonality in 14 SNVs and shared similar recombination and pangenome regions (Fig. S9B), although these isolates were obtained from the same place (WP5) but at different sampling times (summer 2017, summer 2018, and winter 2019), suggesting that the ST668 clone may remain in active sludge at WP5 for at least one and a half years.
Plasmidome analysis of EPO and CPO isolated from WWTP e uents and environment ESBL and carbapenemase β-lactamase genes are acquired by conjugative plasmid transfer among variable strains in Proteobacteria. Genetic network analysis among the AMR plasmids could be useful to uncover the mode of relational transfer and to trace dissemination. Of the potential 433 EPO/CPO isolates (Table 2), 85 strains were identi ed as complete genome sequences in this study (Table S4). Of the 304 complete plasmid sequences, 73 β-lactamase-positive plasmids (Table S7) Table S8) were subjected to pangenome analysis using Roary and NMDS (vegan in R package) and visualized; the results revealed that 14 communities were classi ed and clustered (Fig. 7).
Most network communities (Co) showed notable incompatibility (Inc) replicon-dependent distribution, suggesting that the network analysis was well-performed based on the genetic features of each Inc replicon. The major Co comprised Co10, Co11, and Co12 of the shared 423 plasmids among all tested 831 plasmids. Co10 was primarily IncFIA and IncFIB (AP001918) replicon plasmids harboring bla CTX-M gene in E.
coli from human sources, whereas Co12 was primarily IncFIA(HI1) and Inc FIB(K) replicon plasmids harboring bla CTX-M , bla KPC , and bla GES genes in Klebsiella from human sources (Fig. 7). This network analysis highlighted the potential mutual role of Co11 plasmids between Co10 and Co12 plasmids because Co11 plasmids are placed around Co10 (rich in E. coli) and next to Co12 (rich in Klebsiella). Co11 is primarily rich in IncFII, Inc FII(pHN7A8), IncX1, and IncN replicon plasmids harboring bla CTX-M , bla KPC , and bla NDM genes and mainly detected in E. coli from human and animal sources. Therefore, Co11 plasmids may contribute to the horizontal ARG transfer between E. coli and Klebsiella. Regarding other notable Co-exhibiting speci c replicon and host bacteria, Co3 comprised IncHI2 and IncHI2A replicon plasmids harboring bla CTX-M and bla IMP genes and was mainly detected in Enterobacter from human sources. Co4 comprised IncB/O/K/Z and IncI1-gamma replicon plasmids harboring bla CTX-M gene and were primarily detected in E. coli from human, environmental, and animal sources. From the viewpoint of carbapenemase genes, gene-speci c distribution was found as follows: bla IMP in Co1 and Co3; bla KPC in Co6, Co8, and Co12; bla NDM in Co5, Co11, and Co13. Such an ARG distribution could be involved in plasmid replicon and its bacterial hosts.

Discussion
We have demonstrated that urban WWTP e uent could be a potential source of AMR burden by the detection of ARGs using a metagenomic approach and whole gnome analysis of the ARB isolates. Furthermore, there is concern that hospital and community e uents include a considerable proportion of ARGs in WWTPs [12]. Hospitals are not the only sources of antibiotic resistance in the environment. According to the European Centre for Disease Prevention and Control, a signi cant proportion of antibiotics consumed by humans are in the community rather than in healthcare settings [13], suggesting that outpatient therapy could be the most effective factor in increasing the proportion of antimicrobials, selected ARGs, and ARB in WWTP. Therefore, a nation (or regional)-wide survey of ARGs and ARB in WWTP e uents could provide a rapid and e cient method for assessing environmental AMR burden from urban populations. As suggested in this study, ARG pro les in WWTP could then re ect the structure and diversity of resistant bacteria in the gastrointestinal tracts of urban residents within the WWTP catchment area (Figs. 3 and 4). This may be particularly true when the WWTP primarily treats domestic wastewater without signi cant contributions from agricultural and industrial sources [14]. Through the two-year investigation, we found some bacterial hosts and ARGs ( Fig. 4B and C) that increased seasonally (Fig. 2).
The risk assessment of transmission of AMR vectors from the environment to humans cannot be adapted from the model for pathogens because most AMR vectors are supposed to be composed of non-or low-pathogenic bacterial species. Therefore, they can colonize humans without notable symptoms, resulting in a healthy carrier [7]. As colonization may be asymptomatic in most humans, this may cause further dissemination under frequent abuse of antimicrobials in clinical use, leading to an underestimation of the extent of transmission of ARB from the environment to humans as well from humans to humans in the community. Therefore, such reservoir ARBs should be considered for the overall AMR risk assessment. A major reservoir of antibiotic-resistant gram-negative bacteria is the gut microbiota of humans and animals, particularly apparent after exposure to antibiotics. The most active potential bacteria that play a pivotal role as carriers and vectors appear to be members of the classes γ-proteobacteria and β-proteobacteria and of the phyla Actinobacteria and Firmicutes [15]. Members of the family Enterobacteriaceae and of genera such as Aeromonas, Acinetobacter, Pseudomonas, Enterococcus, and Staphylococcus have been frequently documented as carriers in wastewater samples, and some of them may contribute to the role of vector for HGT. Although nonpathogenic carrier bacteria are not able to colonize and infect humans, their proliferation and subsequent dissemination in the environment would increase the abundance and diversity of ARGs in vectors. Hence, it may increase the risk of transmission of pathogenic ARB, such as hypervirulent Klebsiella, to humans.
Thus far, the healthy carriage rate of EPO has been rising worldwide. In Japan, the detection rate of EPO was reported to be 12.2% in healthy adult volunteers [16] and 15.6% in healthy food handlers [17]. Because the number of ESBL-positive heathy carriers has been increasing around the world, Japan is no exception to the issue of increasing AMR healthy carriers. Indeed, according to the Japan Nosocomial Infections Surveillance (JANIS), clinical reports of antibiotic-resistant gram-negative bacteria are increasing (https://janis.mhlw.go.jp/english/index.asp).
In addition to nosocomial AMR surveillance, the WWTP metagenomics approach will prevent the dissemination of the bacterial ora of predominantly healthy individuals carrying notable ARG/Bs. Among potential EPO healthy carriers, globally disseminated E. coli ST131 clones should be rst investigated. This study suggested that multiple ST131 clones in the WWTP e uents harbor the currently prevalent CTX-M variants (Fig. S8). E. coli O16:H5-ST131-mH41 with bla CTX−M−27 is a marked clone (Fig. 5). Indeed, the increase in this clonal group harboring bla CTX−M−27 has been reported in China [18], EU [19], and Oceania countries [20], which are one of the marked clonal disseminations associated with ST131 CTX-M-27 producers in Japan [21] because this increase so far has been speculated by the observation that CTX-M-27 producers have a higher MIC of ceftazidime compared with that of CTX-M-14 producers [22]. In contrast to ST131 with bla CTX−M−27 , other E. coli STs still harbor bla CTX−M−14 , which is the predominant variant since the early 20th century in Japan [21], suggesting that these other STs have not been shifted to acquire more potential CTX-M-27 thus far.
Plasmidome network analysis (Fig. 7) exhibited a good approach to illuminate bacterial communication through plasmid-based HGT. This study highlighted bacterial species-dependent plasmid tropism and its compatibility among distinct species. The bla CTX−M variants were identi ed in wide variable plasmids, whereas carbapenemase genes (bla NDM , bla KPC , and bla IMP ) relatively showed a typical plasmid replicon with speci c distribution, suggesting that the regional-speci c distribution of carbapenemase [23] may be associated with its locality [24]. However, we have reported NDM-5-and CTX-M-55-coproducing E. coli GSH8M-2 isolated from WWTP e uent (WP8) in Tokyo Bay [25]. Such co-producers will be the next clonal dissemination with powerful activity against multiple β-lactams. Therefore, such network analysis will be useful for the prediction of future clonal expansion in local to global dissemination.
The diverse bacterial population from household, hospital, and industrial wastewaters reach the WWTP where conditions for HGT are optimal due to the formation of bio lms in activated sludge and stress caused by various pharmaceutical and heavy metal contamination [26]. It is now well-demonstrated that even low concentrations of antibiotics can result in the selection of ARGs [27]. Furthermore, the uoroquinolone antibiotic cipro oxacin markedly induces reactive oxygen species (ROS), leading to a mutant-generating cell subpopulation and homologous recombination [28]. Such an SOS response in bacteria increases the mutation rate by increasing the expression of error-prone DNA polymerases [29]. Thus, it is very di cult to standardize the upper limit of an antibiotic concentration in wastewater. When the antibiotic concentration is low by dilution and active processing in WWTPs, non-corresponding contaminants (e.g., heavy metals, organic pollutants, and physical and chemical factors) may share or substitute the role of antibiotics and contribute to the propagation of ARGs in the environment.
The main route of ARG dissemination may switch from active transmission, which is dominated by antibiotics, to passive transmission, which is motivated and affected by non-corresponding contaminants [30]. In fact, Azuma et al. reported the concentration of pharmaceutical compounds (PhCs) in the Japanese environment and demonstrated 48 PhCs in the range of 1 ng/L (losartan carboxylic acid) to 228 µg/L (acetaminophen sulfate) in hospital e uent, and the contribution of the pollution load derived from hospital e uent to WWTP in uent was estimated as 0.1-15% [31].
Heavy metals are also one of the most important contaminants as selective pressure to ARB [32,33]. Unlike antibiotics, metals are not subjected to degradation and can subsequently represent prolonged selective pressure. Overall, heavy metals may be one of the dominant factors in estuaries and marine environments, and they can also play an important role in the maintenance and proliferation of ARGs when antibiotic-selective pressure is weak. In particular, heavy metals are used as feed supplements, followed by accumulation in manures, thus, indicating the potential for co-selection of ARGs [34]. Signi cant correlations exist between ARGs and heavy metals (Cu, Zn, and As), suggesting that the contamination of ARGs is related to the chemical pollution of the sediment [32]. The presence of heavy metals provides another co-selection pressure for ARG [34,35]. Such co-selection would generate a co-resistant phenotype located together on the same genetic determinant responsible for resistance to antibiotics and other contaminants. In particular, even short-term copper stress signi cantly enhances the abundance of ARGs and mobile genetic elements (MGEs) along with increasing Cu concentrations and can signi cantly change the potential of soil ARGs [36].
Many societies still face water scarcity due to increasing population and land use, and conventional water resources are insu cient in many regions to meet the needs of the growing population. Any water treatment or distribution system should focus on decreasing the discharge of ARB because transmission of ARGs is evidenced by the ndings that non-potable wastewater harbors a higher number of ARGs [37] and that these ARGs are distributed across a broad range of microorganisms rather than any speci c group [38]. In addition to conventional standard treatment (biological treatment, followed by A 2 O and chlorination for disinfection), advanced treatment technologies, such as photocatalysis, membrane ltration, activated carbon adsorption, and advanced oxidation processes (AOPs), should be implemented to remove emerging contaminants from wastewater [39].

Conclusions
Urban WWTP e uent, even with proper treatment, may cause AMR burden with a high frequency of acquired ARGs in the environment. The dissemination of ARB/Gs in the environment can increase the risk of infectious diseases [40], but there is scarce direct evidence about their epidemiological effects. This study revealed the presence of ARB/Gs in WWTP e uents, suggesting that every urban community, including hospitals, healthy carriers, and travelers, can be a potential source of ARB/Gs. WWTP is speculated to be a hotspot where various bacteria can acquire ARGs. Monitoring of ARB/Gs in WWTP e uent is expected to identify the presence of undetected nosocomial infections and healthy carriers at the earliest, leading to the detection of potential on-going dissemination of ARB/Gs in the overall community.

Sample collection
Sewage e uent samples were collected from eight WWTPs (WP1-WP9, and WP6 was treated as missing number due to incorrect sampling locations) along the Tama River and around Tokyo Bay (Table 1 and Fig. 1). As an environmental control sample, surface water from a recreational beach (BEC1) was collected (Fig. 1). Four sampling months during the two years were August 2017, February 2018, August 2018, and February 2019 (Table S1). The water samples in this study were collected on ve consecutive days without recent rainfall to exclude the effect of weather. All samples were collected in 500-mL sterilized containers (Corning® Easy-Grip round, plastic storage bottles of 500 mL).
Detailed information on these samples is summarized in Table S1. A metagenomic DNA-seq library was prepared using the QIAseq FX DNA library kit (QIAGEN, Hilden, Germany), followed by short-read sequencing using a NexSeq 500 platform (2 × 150-mer paired-end) (Illumina, San Diego, CA, USA). Adapter and low-quality sequences in shortread data were trimmed using Sickle version 1.33 (https://github.com/najoshi/sickle) with the following parameters: average quality threshold, "-q 20" and minimum length threshold, "-l 40". The metagenomic analysis was performed using KrakenUniq version 0.5.8 [41] 1,670,542). Subsequently, the database with these sequences was built using the KrakenUniq build program with a K-mer length of 31 bp. All metagenomic data were summarized with Pavian version 1.0.0 [42]; rare genera with one sequence count in one sample were excluded. Alpha diversity indices with InvSimpson diversity were calculated with the R package "vegan" version 2.5.6 (https://CRAN.R-project.org/package=vegan). NMDS ordination of the relative abundance was generated at the genus level, and NMDS plots were visualized using method metaMDS from the "vegan" with Bray-Curtis distances. In an attempt to identify metagenomic biomarkers, LEfSe analysis was performed with relative abundance at the genus level in bacteria [43]. In this study, p < 0.05 was considered signi cant for both statistical methods. ARGs with markedly differential reads per kilobase per million mapped reads (RPKM) numbers were de ned as those with a linear discriminant analysis (LDA) score (log 10 ) of > ± 2.0.
The detected genes were summarized for each OTU of ARGs and HMRGs. For normalization, RPKM was calculated using the following formula: RPKM = number of detected reads against OTUs/(average gene length of detected OTUs (bp) × total number of trimmed reads) × 10 9 . The NMDS plot for resistome analysis was also visualized as described in the previous section. Statistical analysis was performed by exact Wilcoxon rank-sum test using the R package "exactRankTests" version 0.8.30, followed by multiple comparisons by the Benjamini-Hochberg false discovery rate (FDR) procedure (q-value) using the R package "FSA" version 0.8.30. Differences were considered signi cant with a pvalue of < 0.05 and q-value of < 0.4.

Whole-genome sequencing of bacterial isolates
The experimental procedures are shown in Fig. S2. Freshly collected 50 mL of water sample was centrifuged at 7,000 × g for 3 min, and the cell pellets were suspended with residual water (~ 500 µL) and spread on CHROMagar ESBL plates (CHROMagar, Paris, France), followed by incubation at 36 °C for 24 h. Marked suspected colonies of E. coli, Klebsiella, Enterobacter, Aeromonas, and Pseudomonas (Fig. S2) were isolated as single clones, followed by whole-genome sequencing, similar to the metagenomic DNA-seq described in the above section. In addition to gram-negative ESBL-producing bacteria, potential CPO were isolated as described previously [50]. Brie y, 500 mL of e uent was ltered, and the membrane was incubated in 20 mL of Luria-Bertani (LB) broth supplemented with 1 mg/L meropenem at 37 °C for 14 h, after which the culture was spread on CHROMagar ESBL plates (CHROMagar, Paris, France).

SNV phylogenetic analysis and pan-genome analysis
The owchart for SNV phylogenetic analysis is summarized in Fig. S7. Assembly data and/or Illumina raw reads of four species (E. coli, A. baumannii, A. caviae, and K. quasipneumoniae) were retrieved from the NCBI database as of April 2019. If assembly data were not deposited in the NCBI assembly database, raw sequence data downloaded from the SRA database were assembled using SKESA version 2.3.0 [59].
MinHash sketch for each genome sequence of this study (n = 53) and NCBI data (n = 76,449) were constructed using the sourmash program version 2.0.0a4 [60] with the following parameters: compute -k 21 -scaled 4000. The MinHash sketch database was built using the sourmash program (parameter: index -k 21) with the NCBI dataset, followed by MinHash search against 53 genome sequences of this study. The collected dataset was analyzed for core genome SNV and pan-genome analyses for each species or ST type. In SNV analysis, the longest chromosomal sequences from this study were selected as reference sequences. Repeat and prophage regions of reference sequences were analyzed using NUCmer (MUMmer 3.0) [61] and prophet [62] programs, respectively. If available data were only contig sequences from the NCBI database, the SimSeq program (https://github.com/jstjohn/SimSeq) was used to construct the simulated 150-mer paired-end reads of 200 bp insert length. Read mapping was performed using BWA-MEM [63] with default parameters against the reference chromosomal sequences, followed by variant calling using VarScan version 2.3.4 [64]. SNVs on repeat and predicted prophage regions were removed, and recombination regions were predicted using the Gubbins software [65], followed by ltering SNVs on recombination regions. Core genome SNV phylogenetic analysis was performed by the approximate maximum likelihood phylogenetic method using FastTree v2.1.10 [66], followed by visualization using fandango version 1.3.0 [67] and Interactive Tree Of Life (iTOL) version 3 [68]. SNV network analysis was performed using the median joining network method and TCS network method of PopART (http://popart.otago.ac.nz). Pan-genome analysis of predicted open reading frames (ORFs) was performed using Roary version 3.12.0 with default parameters [69].

Plasmidome analysis
The owchart for plasmidome phylogenetic analysis is summarized in Fig. S10. Complete plasmid sequences were retrieved from the NCBI database as of November 2019, followed by re-annotation and ARG detection using DFAST with ResFinder and BARRG databases. The plasmid database of these nucleotide and protein sequences was constructed using BLAST. For plasmidome analysis, complete plasmids in public database were selected according to the following criteria: (i) ≥ 90% nucleotide sequence identity and ≤ 1E-100 e-value against βlactamase gene-positive complete plasmids revealed in this study; (ii) sharing ≥ 10 ORFs (≥ 99% identity) against these plasmids; (iii) possession of the ARGs; and (iv) existence of metadata related to isolation source. Pan-genome analysis using the collected plasmids was performed using Roary version 3.12.0 [69] with a parameter of 99% BLASTP percentage identity cutoff, followed by construction of a distance matrix using R package "proxy" with edge weights of plasmids sharing ≥ 10 ORFs. NMDS and community structure analysis were performed by "vegan" and "igraph" of R package, respectively. Plasmidome data were visualized using Cytoscape version 3.7.2.

Declarations
Ethics approval and consent to participate Not applicable Consent for publication Not applicable Availability of data and materials All complete sequences are available from the DDBJ/EMBL/GenBank database (accession numbers AP021908-AP022304, see Table S4). All raw read sequence les are available from the DRA/SRA database [accession numbers DRR198489-DRR198524 (metagenomic data, see Table S1) and DRR199157-DRR199560 (whole-genome data, see Table S4)].

Competing interests
The authors declare that they have no competing interests.  See the information of whole-genome sequence for all isolates in Table S4.
EPO, extended-spectrum β-lactamase-producing organisms CPO, carbapenemase-producing organisms    S4. A) HMRGs in e uents from WWTPs (WP) and beach (BEC1). Metagenomic DNA-seq short reads were analyzed by homology search in BacMet2 database (bioinformatics resource of antibacterial biocide-and metal-resistance genes. http://bacmet.biomedicine.gu.se/), followed by normalization with RPKM. See Table S3 for the detail detection of ARGs list. B) NDMS plots of metagenomic sequencing reads classi ed for heavy-metal resistance in WWTP e uents (WP1-9) and recreational beach (BEC1). Fresh water area in WWTPs (WP1-8, see Fig.   1) were closely scattered (shaded with gray square), but the plots were markedly divided in a sampling time-dependent manner with partial WWTP dependency.     Fig. S7) of E. coli ST131 isolates using publicly available ST131 draft or complete genome sequences (a total of 315 strains; see Table S6). Genome sequence of A) A. caviae and B) K. quasipneumoniae isolates were subjected to SNV network, genome recombination, and pangenome analysis, and the results suggested a marked clonality among isolates, although the isolates were obtained at different WP sites or time points.  Table S1. Metagenomic DNA-seq information for e uent of WWTP and environment water samples in this study.  Table S4. Whole-genome sequence information for the strain growing on CHROMagar ESBL. Table S5. Summary of isolates on CHROMagar ESBL and its sequence type in this study. Table S6. Strain information of E. coli used in core-genome SNV phylogenetic and pan-genome analyses. Table S7. Complete plasmid sequence information of the strain growing on CHROMagar ESBL. Table S8. Complete plasmid sequence list for plasmidome analysis. Table S9. Summary of absolute count of plasmids against each category in plasmidome analysis. Figure 1 Outline map illustration of water sampling sites along the Tama River and around Tokyo Bay area. E uents of WWTPs and surface water from recreational beach (BEC) were obtained for further AMR investigation in this study.

Figure 2
Metagenomic analysis for bacteria detection of WWTP e uents. A) NDMS plots of metagenomic sequencing reads classi ed at the bacterial genus level for WWTP (WP1-9) e uents and recreational beach sample (BEC1). Fresh water areas in WWTPs (WP1-8, see Fig. 1) were closely scattered (shaded with gray square). Plots were markedly divided in a season-dependent manner but were not WWTP dependent. B) The seasonal feature was observed in multiple bacterial genera by LEfSe analysis (score threshold: 2.0).

Figure 5
Characterization of E. coli ST131 isolates from WWTP e uents. A) Among 11 E. coli ST131 isolates, nine isolates predominantly possessed blaCTX-M-27 genes (Table S4). Core genome SNVs phylogeny (see detail procedures in Fig. S7) suggested that primarily four subclonal types of ST131 were identi ed from WWTP e uents (Fig. S8). E. coli WP5-S18-ESBL-09 is genotyped as O16:H5-ST131-mH41 with blaCTX-M-27 and exhibited a marked clonality showing <12 SNVs with clinical isolates in several countries (Southeastern Asia, EU, and Oceania). B) SNV network, genome recombination, and pan-genome analysis suggested that the genome structure of these ST131 strains were very stable, except for some ARGs that are transferred with conjugative plasmid.