Microbiome divergence across four major Indian riverine water ecosystems impacted by anthropogenic contamination: A comparative metagenomic analysis

Indian rivers are a major source of livelihood as river water is used for drinking, agriculture, and religious purposes to a large population. In this study, we report comparative microbial structures and functional potential of four major rivers of India, namely Ganga, Narmada, Cauvery, and Gomti. Comparative microbiome study of these geographically distinct rivers was performed using the samples collected from the source to the downstream sites of each river. We employed metagenomic approach to comprehensively determine the taxonomic and functional potential of river microbiome. In this study, we report the pollution inuences on microbial composition and functional potential of four distantly located rivers. Results revealed signicant microbial diversity in contaminated locations as compared to the upstream samples. A total number of 37 bacterial phyla were detected out of which Proteobacteria, Actinobacteria, Bacteroidetes, Planctomycetes, and Verrucomicrobia were abundant. Microbial diversity in respect to anthropogenic activities revealed the prevalence of Acidobacteria, Actinobacteria, Verrucomicrobia, Firmicutes, and Nitrospirae phyla, whereas a decline in Proteobacteria and Bacteroides. Virulent and temperate bacteriophages were found high in Ganga when compared to others. Interestingly, the abundance of bacteriophage decreased with increasing pollution load in the river Ganga, unlike in other rivers. The carbon utilization studies indicated a correlation with functional genes occurred in metal contaminated sites. Ganga water has relatively higher trace elements at pristine-upstream than in the Narmada and Cauvery, indicating its origin from Himalayan rocky mountains and also both Ganga and Cauvery rivers found to harbour a large number of metal resistance genes. rivers Metal

2.2. Comparative GC content characteristics of river metagenomic datasets GC content of a microbiome was reported to exhibit a dependency on the physicochemical characteristics of respective habitat [23,24]. It was found that upstream locations of river samples G1, C1, N1, and C3 sample showed low GC content (~53% GC) and similarity was observed in GC distribution pattern of the sequences. In the diversity analysis, it was observed that these four samples showed similar bacterial diversity by forming a separate clade in operational taxonomic unit (OTU) cluster analysis (Fig 7). Whereas downstream samples G6, G7, C4 and N4 showed similar GC content distribution patterns and also formed a separate clade in OTU cluster analysis. Ganga river samples G6 and G7 showed similar GC distribution patterns, whereas G1, G3, G4 and G5 showed similar GC distribution patterns, interestingly, G2 sample showed an entirely different GC pattern with respect to other samples of Ganga. Cauvery river sample C1 was closer to C3, and C2 was found to be closer to C4 in terms of GC content distribution pattern. In the Narmada river, N1 exhibited very different GC content (~53%) distribution as compared to the other four samples N2, N3, N4 and N5, which had 58 -62%. All three samples of Gomti river showed high GC content of 58 -62%. It was also observed that the GC content distribution pattern was highly similar to the occurrence of bacterial diversity clustered at OTUs level. Thus, GC distribution pattern correlates very well with the OTUs cluster analysis, as shown in additional le 1: Fig. S3 and Fig 7. 2.3. Microbial diversity structure Assessment of alpha diversity indices revealed signi cant differences between the pristine and polluted sites of all the rivers. The estimates of the observed OTUs richness and evenness were found to increase at the polluted site in comparison to the pristine sites in all the rivers. Furthermore, the statistical signi cance of the grouping based on experimental factors was also estimated using either parametric or nonparametric tests. Alpha diversity measurement across all the samples for the given diversity index is provided in Fig.2. The boxplot analysis of alpha diversity richness and evenness results indicated that Gomti river has the highest bacterial diversity, followed by Narmada, Ganga, and Cauvery ( Fig. 1) in that order. Chao-1 index showed that Lko2_Bhatpur have the highest richness (chao1-2229) followed by N4_Hoshangabad (chao1-2222), G6_Bithoor (chao1-2175) and C4_Erode (chao1-2162) representing the Gomti, Narmada, Ganga, and Cauvery rivers respectively. Shannon index showed that Lko3_Gaughat have highest evenness (Shannon -23.24) followed by C4_Erode (Shannon-22.76), N4_Hoshangabad (Shannon- 16.71), and G6_Bithoor (Shannon-11.65) representing the Gomti, Cauvery, Narmada, and Ganga, rivers respectively (Table 1). When multiple samples were compared, a clear separate bracket in PCoA and NMDS plot was formed based on the pollution load using the phyloseq package3 in MicrobiomeAnalyst (Fig. 3). This bracket has downstream samples of all the rivers that include G6 and G7 of Ganga, C4 of Cauvery, N4 and N5 of Narmada and all three samples of Gomti.

Bacterial community composition
The bacterial community structure showed a signi cant increase in bacterial diversity with increasing anthropogenic activities at genus, order, class, and other higher levels of taxonomy. A total of 37 bacterial phyla were detected out of which the Proteobacteria, Actinobacteria, Bacteroidetes, Planctomycetes, and Verrucomicrobia were found to be prevalent in all samples (Fig. 4A). The increasing anthropogenic activity showed the dominance of gram-positive phyla, namely Acidobacteria, Actinobacteria, Verrucomicrobia, Firmicutes and Nitrospirae and gram-negative phyla such as Proteobacteria and Bacteroides. At class level Alphaproteobacteria and Betaproteobacteria were low and Deltaproteobacteria, Gammaproteobacteria, and Planctomycetia were found to be more abundant (Fig. 4B). For easy presentation and understanding, a graphical representation using the krona chart was prepared. These krona charts show bacterial diversity from the phylum level to species level with their relative abundance (Additional le 3-6).

Core microbiome analysis
Core bacterial orders identi ed by using MicrobiomeAnalyst tool indicated the unchanged microbiome composition across the samples from different river stretches. Core microbiome analysis at order level showed Burkholderiales, Rhizobiales, Planctomycetales, Rhodospirillales, Sphingomonadales Verrucomicrobiales, Rhodobacterales, Cytophagales, Acidimicrobiales, and Nevskiales prevailed in all the rivers (Additional le 1: Fig. S4) (Additional le 2: Table S1). Core microbiome analysis is adopted from the core function in R package microbiome [25]. The result showing comparative core microbiome occurrence is shown in a heatmap (Additional le 1: Fig. S4).
2.6. Bacteriophages pro ling pathogenic bacteria in river water systems. These bacteria are Staphylococcus aureus, Enterococcus faecalis, Pseudomonas aeruginosa, Salmonella typhimurium, Escherichia coli and Vibrio cholera. For this, dsDNA of phage that is incorporated into the bacteria (prophage) were analyzed from the total DNA. Both virulent and temperate phages are key drivers of bacterial community composition in natural aquatic systems. However, prophages were found to play a vital role in horizontal gene transfer [26][27][28][29]. Prophage is believed to provide several advantages to the host bacteria such as stress tolerance, bio lm formation, antibiotic resistance etc.
Analysis of virulent phage against six bacteria showed that river Ganga exhibits high diversity as compared to other rivers (Table 2). Interestingly, it was observed that the presence of S. aureus phage in G1_Bhagirathi and G3_Devprayag, the upstream samples of river Ganga. However, it was not observed in downstream samples which were more in uenced by anthropogenic activities. Intriguingly, river Ganga also showed an abundance of phage in pristine samples. Whereas, Cauvery and Narmada showed an abundance of phages in samples where anthropogenic interferences were more. In the case of Gomti river, as it receives a large volume of wastewater, it showed the highest concentration of phages for the hosts E. coli and E. faecalis (Table 2).
High throughput metagenomic sequencing revealed that the prophage related genes observed in this study are associated with 218 bacterial species.
Alaknanda sample of river Ganga showed the highest abundance of phage speci c genes, followed by Bhatpur sample of river Gomti. In comparison to other rivers, Ganga showed a much higher abundance of prophage speci c genes, namely phage prohead/capsid assembly, phase terminase large subunit, phase portal protein, endolysin activity. In Ganga river, prophages are mainly contributed by G2_Alaknanda river which exhibits the highest number of prophage genes among all samples. These results suggested that like virulent phage, there is also an abundance of temperate phage presence in river Ganga. The temperate phage abundance pattern in various rivers indicates that increasing anthropogenic activities has led to a decrease in the population of prophage ( Fig. 5A). Analysis of the host bacteria harbouring the prophage showed a preponderance of phylum Proteobacteria and Terrabacteria. Among the top 15 bacteria that were harbouring maximum phage genes, 13 belonged to Proteobacteria and two others were found to be associated with Terrabacteria. At the class level, the dominant bacteria harbouring bacteriophages were from Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Firmicutes and Chloro exi (Fig. 5B).
The prophage speci c genes were analysed using the ACLAME database. Our analysis showed that for most of the identi ed genes, ACLAME functional annotation was not available. However, the functionally annotated genes were found to code for prohead/capsid assembly followed by phage terminase large subunit, phage portal protein and others were present ( Fig S5). River Ganga showed the highest abundance of prophage speci c genes. Interestingly, the samples from industrially polluted sites of rivers Ganga, Gomti and Cauvery showed a signi cant dip in the abundance of prophage speci c genes.

Microbial community carbon utilization pattern analysis using Biolog
Functional metabolic diversity of microbial communities at different sites of rivers is re ected in terms of Average Well Colour Development (AWCD, Additional le 1: Fig. S6 A-D). Among the four rivers, the maximum AWCD range was observed in Ganga (0.701-1.946; Additional le 1: Fig. S6 B), whereas, the lowest AWCD range was found in Narmada (0.523-0.98; Additional le 1: Fig. S6 D). Moreover, higher AWCD was documented in the downstream sites of all the rivers in comparison to the upstream sites. The maximum AWCD was found at G6_Bithoor (1.94) and G7_ Jajmau (1.75) sites of Ganga and lowest in Narmada N1_Udgam (0.522; upstream sampling site). The diversity richness and evenness indices show signi cant differences between upstream and downstream sampling sites (Additional le 2: Table S2). The microbial communities based on carbon substrate utilization pattern showed two distinct clusters, for instance, river Ganga and Narmada clustered together, whereas Gomti and Cauvery formed a distinct cluster (Fig. 6 A). Notably, an analysis of carbon utilization pattern of samples individually showed an interesting trend, as the water samples collected from downstream sites clustered separately, except Gomti water samples (Additional le 1: Fig. S7 A). Cauvery showed the presence of similar carbon utilization pattern at C1_Talacauvery and C3_KRS Dam regions. However, the microbial communities of C2_Bhagamandla and C4_Erode showed a distinct pattern of substrate usage (Additional le 1: Fig. S7 B). In Ganga, G3_Devprayag, G2_Alaknanda and G4_Rishikesh, and G5_Haridwar were clustered close to each other. Also, the microbial communities from G7_Jajmau showed a distant pattern of carbon substrate usage and cluster in an entirely distinct way (Additional le 1: Fig. S7 C). A similar pattern was also noted for Narmada, which showed a close clustering within the microbial communities of N1_Udgam and N2_Kapildhara followed by N3_Bhedaghat and N4_Hoshangabad. The microbial communities of N5_Omkereshwar utilized different carbons as compared to the other four samples (N1 to N4) and hence clustered separately (Additional le 1: Fig. S7 D). The higher utilization pattern of carbon substrates in MT plates was recorded by Ganga followed by Narmada, Gomti and Cauvery (Fig. 6 B). In amino acids, higher utilization of Serine and L-Proline was noticed in Gomti, which was followed by Serine in Cauvery. Ganga and Narmada showed a maximum preference for Glutamine ( Fig. 6 B). In carbohydrates and miscellaneous, casein was utilized maximum in Ganga. Furthermore, chitin was the most preferred polymer in all rivers, and Gomti and Ganga utilized dextrin (Fig. 6 B). In carboxylic acids group, microbial communities of Gomti showed a preference for succinic acid, whereas Narmada, Ganga and Cauvery favoured the usage of citric acid as primary carbon source ( Fig. 6 B). Persistent pesticides were also found to be utilized by different bacterial communities present in the rivers, for examples the Gomti river sample showed a preference for anthracene, Cauvery for endosulfan, Ganga and Narmada for most of the pollutants studied ( Fig. 6 B). Moreover, site-speci c carbon substrate utilization pattern revealed that G6_Bithoor and G7_Jajmau utilize a maximum site-speci c carbon source (Additional le 1: Fig. S8). Microbial community of G4_Rishikesh sample utilized maximum carbohydrates. Furthermore, N5_Omkareshwar also showed higher use of different substrates (Additional le 1: Fig. S8).

Cluster analysis of microbial community in respect to carbon utilization pattern
Microbial community metabolic potential assay was performed using Biolog® Eco and MT plates. The AWCD data revealed the metabolic activity of microbial community which re ect the functional potential of actively available microbes. We used the AWCD data generated by growing bacterial communities on different carbon sources and correlated the above data with the microbial communities at species level. Data analysis was performed by using the Microbiome Analyst tool. It was observed that sample G1, C1, N1 and C3 (mostly upstream sites); sample G2, G3, G4, and G5 (Ganga middle stretch); samples N2, N3, N5, Lko1, Lko2, Lko3 and C2 (mostly the middle stretch of Narmada, Cauvery and all three samples of Gomti), and samples G6, G7, N4 and C4 (downstream of rivers) formed separate clusters (Fig.6). Based on these carbon sources utilization pattern and their correlation with microbial diversity at the species level it may be concluded that anthropogenic activities in uence the structure, occurrence and adaptation of the microbial communities in these perturbed ecosystems.

Impact of physicochemical parameters on microbial diversity
In an ecosystem, physicochemical parameters play a crucial role for the growth of organisms and therefore, the microbial diversity results were correlated with the physicochemical parameters obtained in the study (Additional le 2: Table S3). A signi cant correlation between microbiome and major physicochemical parameters of the samples analyzed through PCoA plot is provided as an Additional le 1: Fig. S9. Identi cation of statistically signi cant and enriched OTUs with respect to various parameters was performed through ANOVA and Parametric Student T-test. The number of key OTUs that are enriched within each metagenome given in Additional le 2: Table S4. We were able to identify OTUs that are uniquely enriched in one or more parameters using Gene Matrix algorithm. Nitrate and pH have the highest number of OTUs that are uniquely enriched in the respective comparisons. With respect to high pH (>8) genus Bordetella,Sphingomonas, Achromobacter, Phenylobacterium, Phenylobacterium, Sphingobium, Erythrobacter and Erythrobacter was found enriched whereas nitrate has led to the enrichment of Haliscomenobacter, Malonomonas, Hansschlegelia, Hyphomicrobium, Chelativorans, Peptostreptococcus,Psychrobacter, Psychrobacter, Psychrobacter and Psychrobacter.

Metal contamination and its correlation with microbial community
High levels of metals were detected in large number of samples that may be generated and introduced owing to the industrial discharges. High concentrations 70 and 100 µg/L of trace elements such as Mn, Fe, Co, Cu were recorded in Bhagirathi and Alaknanda (the upstream of Ganga) respectively as compared to 60 µg/L at Narmada Udgam. Their concentration further gradually increased in the downstream samples of Ganga (over 2500µg/L at Jajmau), however there was no signi cant change in Narmada. Cauvery has a relatively lower concentration of trace elements, except Cu (with <15µg/L total trace elements). In the river Gomti, the level of both trace and toxic metals were relatively high at all the sampling sites, i.e. the total level of trace and toxic elements up to 208 µg/L and 150µg/L respectively. The samples of Ganga from Bithoor (G6) and Jajmau (G7) showed the highest concentration of all the elements except Cd, which was highest in Gomti river (Fig 8). In Narmada and Cauvery, chromium (Cr) and arsenic (As) were mostly present at downstream while trace of lead (Pb) and cadmium (Cd) were present at all sites. Interestingly, the level of As was higher at all sites of Ganga in comparison to other rivers (Fig 8). Consequently, metal and antibiotic resistant bacteria identi ed as Acinetobacter baumannii species from the Bhagirathi river and Alcaligenes faecalis species from Cauvery river were isolated (data not shown) and characterized.
It is signi cant to note that high levels of metal resistant genes (MRGs) were detected in pristine water samples as compared to downstream polluted samples. The highest MRGs were detected in C1 sample of Cauvery, followed by G2 in Ganga and N2 of Narmada river (Fig. 9A), which are upstream locations of the rivers. Maximum MRGs was detected for copper followed by chromium (Cr), arsenic, zinc and iron. Most of the MRGs were located on bacterial chromosome (91.5%), whereas rest (8.5%) were found to be located in plasmids (Fig S10). The highest number of plasmids located MRGs were recorded in Cauvery C1 (22.32%) and Ganga G2 (14.32%) samples, whereas all other samples have low percentage, ranging from 4 to 10% (Fig. 9D). Besides MRGs, biocides and other chemical resistance genes for ethidium bromide, rhodamine 6G, acri avine and triclosan were also found in abundance in the pristine sample as compared to their respective downstream samples (Fig. 9B). The PCoA plot of bacterial data at species level with respect to the metal concentration using the phyloseq package3 in MicrobiomeAnalyst depicted a signi cant relation of trace metals with respective microbial communities ( Fig.  10) revealing their direct correlation with the presence of trace metals.

Discussion
Rivers have been lifeline as water source however in many parts of the world massive human settlements surrounding the rivers have become a critical issue along with huge volumes of untreated e uents and waste sewage discharged into the rivers. As per the 2018 report of Central Pollution Control Board (CPCB), New Delhi, India, an estimated 6.07 billion litres per day of industrial and sewage wastewater was discharged in the Ganga river [30]. Severe contamination in these waters is expected to pose alterations in the native microbiota of the river water ecosystem. Worldwide, around 4.0% of deaths and 5.7% of the global disease burden was caused by waterborne diseases [31]. Whereas in India, an estimated 37.7 million people were affected by the waterborne diseases annually, and an estimated 1.5 million children die every year due to diarrhoea alone [32]. The rivers chosen for the study are geographically very distantly located, having varying climatic conditions, almost covering North, Middle and Southern part of India. The analysis of the metagenome based microbial community data of these four river samples has revealed certain common and yet unique patterns. Data obtained also provides a complete picture on the alterations on microbial community structure and acquisition of pathogenesis because of anthropogenic particularly industrial in uences causing a greater risk to public health.
All the four rivers studied showed the prevalence of phylum Proteobacteria, Actinobacteria, Bacteroidetes, Planctomycetes, and Verrucomicrobia suggesting that these are the most exible phyla which may get enriched even under constant variation of climate, human interference, soil types and physicochemical water quality parameters commonly observed in any river ecosystem. We have also observed different community structure with increasing anthropogenic activities for example, a decline in phyla Proteobacteria, and Bacteroides whereas phyla Acidobacteria, Actinobacteria, Verrucomicrobia, Firmicutes and Nitrospirae were speci cally found enriched. The phyla Acidobacteria, Actinobacteria, and Firmicutes were reported to be associated with wastewater and fecal contaminants [22]. Similarly, while moving downstream along the rivers, the classes Deltaproteobacteria, Gammaproteobacteria and Planctomycetia were found enriched. Interestingly, many pathogenic genera such as Acinetobacter, Aeromonas, Escherichia, and Pseudomonas belonged to class Gammaproteobacteria. This is an indication of possible human interference, enhancing the load of pathogenic microbial communities.
Furthermore, it was observed by alpha diversity analysis that Gomti River showed the highest richness and evenness thus exhibiting highest bacterial diversity.
This can be correlated to the high human in uence in the Gomti River as also evident from previous studies [33,34]. Furthermore, Gomti river owing to its origin from plains and with human interferences exhibits high contamination in origin itself as compared to the upstream samples of other rivers. This leads to increased diversi cation and richness in the Gomti river sample. Similarly, as evident from Fig. 2 within the river samples, the increase in contamination leads to increased richness and evenness in the samples in all the four rivers. However, C1_ Talacauvery and C3_KRS dam sample of Cauvery River showed less richness and evenness, and the reason to this may be attributed to relatively static water conditions at these locations. Previous studies have also suggested the distinct variations in sediment bacterial community in dam-affected sites [35]. The studies indicate that dam construction can cause changes in microbial diversities. More detailed studies are required to study, the effect of dam constructions on ecosystems and its overall impact, on the environment. At order level classi cation in core microbiome analysis, the most abundant ve orders were Burkholderiales, Rhizobiales, Actinobacteria, Bacteroidetes, and Planctomycetia. It was previously reported that these orders were common to freshwater, lakes and rivers [36][37][38]. They are oligotrophic in nature and are able to adapt to diverse variations in their niches such as metal, pH, pesticides, and other toxicants or low availability of metabolic substrate [39,40]. Thus, the core bacterial orders such as Burkholderiales, Rhizobiales, Planctomycetales, Rhodospirillales, Sphingomonadales, Rhodobacterales and Acidimicrobiales play a crucial role in maintaining the ecosystem [41].
Bacteriophage analysis suggests that out of the four rivers analysed, Ganga has the highest abundance and diversity of both virulent and temperate phages. In contrast to other rivers studied, the pristine samples of river Ganga also showed a prevalence of several phages including S. aureus phage. Since infections of S. aureus are often associated with wounds causing pain and pus formation, therefore the presence of S. aureus phage further strengthens the belief that Ganga water has healing properties. It is expected that the host speci c phage would control the population of S. aureus within the wounds causing relief from pain when water is applied to the wounds. The uniqueness of Ganga was further emphasized by the highest abundance of prophages and higher abundance of phages in comparison to other rivers. However, the increasing anthropogenic activity has led to a signi cant decline in the phage population. From these indications it is evident that bacteriophages play an important role in shaping the diversity and abundance of microbial communities in natural river water systems. Therefore, shift in the abundance and types of bacteriophage with anthropogenic activities would also cause a change in bacterial communities.
Metal contamination has been reported to affect the river micro ora and leads to the acquisition of metal tolerance in the microorganisms [42]. The higher levels of trace elements in Ganga upstream (G1-G5) samples in comparison to other locations are attributed to its hilly origin. While the increased level of toxic elements in downstream samples, e.g. G6 and G7 is attributed to the increase in human bathing, domestic discharges, industrial wastes and wide cattle grazing activities [8,43,44]. Gomti river samples showed high level of elements throughout the sampling sites, suggesting the origin of the river through surface runoff. The occurrence of the high load of E.coli phages and E. faecalis phages in Gomti river also indicates the high sewage and fecal load presence in the river.
Carbon substrate utilization pro ling has great potential as a technique to evaluate the quality of different sources of water bodies [45]. This technique has previously been employed for analyzing the metabolic ngerprints of microbial communities [2,[46][47][48][49]. We have observed a different carbon utilization pattern for each water samples. This may be due to the difference in the water quality, which nurtures diverse structure, the composition of microbial communities based on the nutrients [50,51] and other physic-chemical characteristics. The microbial communities based on carbon substrate utilization showed interesting results for instance, in Ganga, the samples from Alaknanda to Haridwar (G2 to G5) clustered together demonstrating that river Ganga gets most of its attributes from the source stream Alaknanda. In addition, the dissolved solids and mineral load in Alaknanda are signi cantly higher than Bhagirathi [52]. This difference probably results in enriching different microbial communities in Bhagirathi than Alaknanda as observed by the results of carbon source utilization pattern. Similarly, highly contaminated sites of Bithoor (G6) and Jajmau (G7) also distinctly positioned based on principal component analysis for carbon utilization pattern. Our results suggest that downstream samples showed a higher carbon utilization pattern than upstream sampling sites which were more polluted because of sewage and industrial e uents discharges [53]. Also, the organic carbon processing is a characteristic property of heterotrophic microorganisms whose number may increase because of the increasing contamination [54]. Therefore, it is strictly recommended that the industrial e uents should be fully treated and diluted before its release into the rivers [55]. The physiological pro ling of the microbial communities demonstrates that the amino acids, carbohydrates and carboxylic acids were most utilized in the upstream sites compare to downstream locations. It was documented that several heavy metals such as Cr, Cd, Hg, Pb, As and other pollutants are regularly disposed into the river bodies [43,44,56,57]. Therefore, the availability of these substrates becomes reduced to the bacterial communities to assimilate them once high concentrations of pollutants reach rivers.
Moreover, the higher utilization of pollutants found in Ganga may be due to the reason that it covers most of the land of Uttar-Pradesh, Uttarakhand, Bihar and West Bengal (almost 1,000,000 km 2 ) which are highly industrialized and under intense agricultural practices. Thus, pollutants like heavy metals, pharmaceuticals and pesticides reach routinely in Ganga river owing to industrial e uents and surface runoffs [8,53,[58][59][60]. The present ndings highlight that microbial communities differed in all the river water samples in terms of their composition quantitatively from upstream to downstream. The microbial analysis results were signi cantly correlated with carbon utilization pro ling of all the water samples which were in relative proportion to their community members that were subjected to physicochemical changes in the environment (water body) as well as by the physiological and metabolic perturbations of the organisms.

Conclusions
The metagenomic strategy employed in this study allowed us to identify common microbiome as well as few unique signatures of microbial communities occurring in the four major geographically distantly located rivers namely Ganga, Gomti, Narmada and Cauvery. Understandably, microbial activities shape the biogeochemistry, water quality parameters and e ciency in maintaining the health of river ecosystems. The results showed that the studied rivers have speci c microbial community in upstream samples based on their geography and environment of origin. River Ganga showed several unique characteristics in terms of higher concentration of trace elements, abundant microbial communities, unique community structure, and abundance of bacteriophages. However, increasing anthropogenic activities and the con uence of different waste streams in the rivers alter their microbial community structure which leads to enrichment of speci c phyla such as Acidobacteria, Actinobacteria, Verrucomicrobia, Firmicutes and Nitrospirae. The metabolic variations in river Ganga re ected in the pristine and contaminated stretches vary differently. A similar pattern in the increase in heterotrophic microorganism due to increased human interferences in downstream of the rivers was observed through carbon spectrum utilization studies. The insights obtained through this comprehensive study are in terms of microbial diversity, contaminations, genome plasticity, metal resistance and the overall metabolic diversity that will help in making strategy for river cleanup and protection of human and environmental health. The holy rivers Bhagirathi and Alaknanda con uence at Devprayag (Uttrakhand) thereafter called river Ganga. Comparative microbiome analysis of Ganga revealed that, the Alaknanda river has the foremost contribution in Ganga with respect to bacteriophage, microbial diversity, physiochemical parameters, trace elements, and heavy metals. To the best of our knowledge this is the rst study that reveals the rapid and dynamic response of major river water microbiome in response to human activities and pollution load.

Sampling sites
The sampling sites of each river was decided on the basis of covering both upper region and downstream stretches which are usually anthropogenically in uenced. To cover the aformentioned stretches, seven sites of Ganga, ve of Narmada, four of Cauvery and three of Gomti rivers were selected along a gradient of pollution, from the pristine source of the river to the heavily polluted downstream site located in the densely populated urban areas (Fig. 1). The details of sampling sites and their GPS coordinates are provided in Table 1.

Sample collection and processing
The water sample from each location was collected aseptically by using airtight 10 L sterile container during the season of December 2014 to March 2015.
The samples were stored at 4°C for metagenomic studies and bacterial analysis, while a portion of the sample was subjected to physiochemical analysis.

Physicochemical characterization of water samples
Physicochemical properties of water sample such as pH, temperature, dissolved oxygen, turbidity, and conductivity were measured on-site using a multiparameter water quality meters (Horiba U-50, Kyoto, Japan). Other parameters, viz. total dissolved solids, nitrate, phosphate, chloride, uorides, sulphates, total alkalinity, total hardness, calcium hardness, magnesium hardness, and silica were tested in water testing division of our institute as per APHA, standard protocols [61].

Metagenomic DNA extraction
For metagenomic DNA extraction, water samples were ltered using the Millipore ltration apparatus (Merck Millipore, USA) through a 0.45 μm lter membrane to trap su cient microorganism for total DNA isolation. DNA was extracted using a Meta-G-Nome DNA isolation kit (Epicenter Biotechnologies, Madison, WI, USA) following the manufacturer's instruction. The DNA quality was monitored on 1% agarose gel on electrophoresis. The DNA purity and concentration was assessed by measuring a ratio of absorbance at 280/230 nm and 260/280 nm using a NanoDrop 1000 (Thermo Scienti c Inc. Wilmington, DE, U.S.A.). Using the extracted DNA High-Throughput Sequencing (HTS) was performed on Illumina HiSeq 2500 platform. Rest of the extracted DNA samples was stored at -80 °C until further analysis.

Library preparation and sequencing
TruSeqDNAPCR-Free Library Preparation kit (Illumina Inc., San Diego, California, USA) was used to prepare the Illumina shotgun library as per the manufacturer instructions. In brief, approximately 200 nucleotide DNA fragments were generated from an equal concentration of DNA collected from each sample, followed by end repairing using a T4 DNA polymerase, Klenow fragment, and T4 Polynucleotide Kinase, sequentially. Furthermore, the adapters were ligated, and the DNA library was prepared at the time of PCR. The quali ed libraries were further taken for sequencing using the Illumina HiSeq 2500 platform (Illumina Inc., San Diego, California, USA). Minimum 25 million reads were generated to ensuring high depth. The sequencing was outsourced to Bionivid Technology Pvt Ltd, Bangalore, India [62]. 5.6. Sequencing data quality control Estimation and elimination of high base call error reads were done using NGS QC Toolkit v2.3.3 with default parameters [63]. During quality control, the reads containing adapter sequences were discarded. Furthermore, the reads were merged to obtain a single fragment, and dereplication was performed using Vsearch [64]. 5.7. Whole environmental microbiome analysis [65]. The output was then imported to MEGAN6 [66]. Homology binning (includes quanti cation) was done in a combination of Megan's inclusive LCA assignments and conservative NCBI blastx assignments. Sequences were assigned to a bin in the case of passing both the above methods at all taxonomic levels (kingdom, phylum, class, order, family, genus and species).

Bacteriophage isolation
The double agar overlay method was employed for the isolation of bacteriophage from water samples. Filtered water (1 mL) was added to top agar (0.7%) which contain 100 µL of overnight grown targeted bacteria. The mixture was then poured on to the agar plates and the plates were analysed for the formation of lytic zones. The plaque forming units (pfu) were counted for enumeration. Samples in which no lytic zones appeared were further analysed using the method of [67] with slight modi cations. A hundred millilitres of the water sample was added to 100 mL of double strength TSB. The medium was then inoculated with overnight grown bacteria of interest and incubated at 30°C. After 12 h of incubation, the cell free supernatant was ltered with 0.45µm lters (Merck Millipore, Billerica, MA), followed by spotting on plates containing a lawn of bacteria of interest. The plates were then incubated at 37°C for 24 h for the development of lytic zones.

Biosystem enrichment and depletion analysis of environments
The functional analysis of aligned sequences was performed using MEGAN6. Gene function, subsystem classi cation, and identi cation were performed using SEED Genome Databases analysis, KEGG (Kyoto Encyclopedia of Genes and Genomes) [68] was used for pathway elucidation with EC numbers of enriched enzymes. A Classi cation of Mobile Genetic Elements (ACLAME) [69] and Antibacterial Biocide and Metal Resistance Genes Database (BacMet) were used to identify mobile genetic elements and metal resistance gene, respectively [70].

Analysis of microbial diversity using carbon source utilization pattern
To explore the functional diversity of microbial communities, the carbon utilization pattern was checked by using BIOLOG. The carbon source utilization pattern of microorganisms in all the river water samples was determined by using MT and Biolog Eco plates (Biolog, Inc., Hayward, CA, USA). From the water samples diluted with saline (1:100), 150 µL was inoculated in each well of MT and Biolog Eco plates followed by incubation at 28±2 °C. A redox indicator dye, tetrazolium changes color from colorless to purple, which indicate the substrate utilization. Average Well Colour Development (AWCD) was determined as an indicator of microbial activity by measuring at 590 nm for 7 days [71]. Evenness and diversity indexes were calculated as per [2], and Principal component analysis (PCA) was performed as described by [72].

Quanti cation of trace and toxic elements
The toxic elements (viz., Cd, Pb, Cr, and As) and trace mineral nutrients (viz., Mn, Zn, Fe, Co, Cu, Se) were analysed using Inductively Coupled Plasma Mass spectrometer (ICP-MS 7500ex, Agilent Technologies, Japan). Internal standardization was maintained as described previously by Dwivedi et al. [73]. The instrument was calibrated using multi-element calibration standard 3 (8500-6948) and 2A (8500-6940) from Agilent Technologies, USA. The analytical accuracy and precision of the ICP-MS was maintained as per the requirements of the National Accreditation Board for Testing and Calibration Laboratories (NABL) accredited lab (certi cate no. T-1381). Repeated analysis of spiked river water samples (n=5) were performed for each analytical batch for calibration and quality assurance. More than 98% recovery of As, Mn, Fe, Cr, Co, Zn, Cu, Cd, Pb, and Se was recorded with a detection limit of 1 μg/L for each sample.

Statistical analysis
Statistical analysis of microbiome data was performed using Microbiome Analyst, an online tool for comprehensive statistical, visual, and meta-analysis of microbiome data at different taxonomic level [25,74]. To determine the effect of a physicochemical factor on the bacterial communities, physicochemical data were statistically correlated with diversity and validated by performing the principal component analysis (PCA) and regression with respect to each environmental factor to the bacterial community through MicrobiomeAnalyst. Alpha diversity (observed, chao1, and Shannon) and Beta diversity analysis by calculating Bray-Curtis distances method show at operational taxonomic unit (OTU) level with nonmetric multidimensional scaling (NMDS) and PCA plots perform to show differences between groups based on the bacterial community composition.

Additional Files
Additional le 1: Supplementary gure S1 to S10. Additional le 2: Table S1 to       Level of trace and toxic elements at selected sites of Ganga, Cauvery, Narmada and Gomti rivers.

Figure 9
Stacked bar plot showing the distribution of metal and biocides resistance genes in all the water samples from four rivers. (A) Metal resistance genes, (B) Biocides resistance genes, (C) Abundance of MRGs located in chromosomes, and (D) Abundance of MRGs located in plasmid. Note, in this representation some of the genes are represented in more than one group due to broad gene speci city.