Distinct Composition and Assembly Processes of Bacterial Communities in a River from the Arid Area: Ecotypes or Habitat Types?

The composition, function, and assembly mechanisms of bacterial communities are central topics in microbial ecology. Unsupervised machine learning may characterise bacterial metacommunities better than empirically defined habitat types. In this study, the composition, potential function, and assembly mechanisms of the bacterial communities in a river in an arid area were analysed. The Dirichlet multinomial mixture method recognised four ecotypes across the three habitats (biofilm, water, and sediment). The bacterial communities in water were more sensitive to human activities than those in sediment and biofilm. Bacterial diversity and richness in water decreased as the intensity of human activities increased from the region of water II to that of water I. Significant differences in the composition and potential function profiles of bacterial communities between the water ecotypes were also observed, such as a higher relative abundance of Firmicutes and of the potential function of plastic degradation in water I than in water II. Habitat filtering may play a more critical role in the assembly of bacterial communities in the river biofilm, while stochastic processes dominated the assembly of bacterial communities in water and sediment. In water I, salinity and mean annual precipitation were the main drivers shaping the biogeography of taxonomic structure, while mean annual temperature, total organic carbon, and ammonium nitrogen were the main environmental factors influencing the taxonomic structure in water II. These results provide a conceptual framework for choosing between habitat types and ecotypes when studying microbial communities among different niches in the aquatic environment.


Introduction
Bacterial communities play an important role in river ecosystems, for example in element cycling and the degradation of organic matter. Perturbations to river bacterial communities can affect the entire system, leading to algal blooms [1,2] and black-odour water bodies [3,4]. Hence, the composition, function, and assembly processes of bacterial communities in rivers have gained increasing attention over the past decade [5-7].
Many factors, such as landscape and environmental conditions, influence the composition, function, and assembly processes of aquatic microbial communities. In the Yangtze River of China, landscape type (mountain, foothill, basin, foothill-mountain, and plain) was a vital driver shaping bacterial communities in both water and sediment [8]. NH4+-N has been found to be one of the key factors affecting the composition of river bacterial communities [3,9]. The relative importance of deterministic and stochastic processes has also been analysed to elucidate the assembly mechanism of bacterial communities in the aquatic environment [8,10,11]. Dispersal limitation, a stochastic process, was found to shape the spatial patterns of anammox bacterial communities in river sediment [10]. Stochastic processes also drove the assembly of water microeukaryotic communities in a subtropical river [12]. However, deterministic environmental filtering played a more critical role in shaping benthic bacterial communities, while stochastic processes played a more significant role in the water column [13].
In river catchments, most researchers have investigated the composition, function, and assembly processes of bacterial communities based on habitat types (e.g. water, biofilm, or sediment) or on landscapes within them [8,10,13,14]. In the River Thames, the assembly mechanism of bacterial communities was consistent within each habitat type (free-living, particle-associated, biofilm, and sediment) [13]. A meta-analysis also confirmed that bacterial phyla showed similar patterns in the same habitat among different streams [15]. In the Bahe River, by contrast, the bacterial communities in residential and industrial areas exhibited different assembly mechanisms [14]. In the Yangtze River, four ecotypes were found within sediment habitats, demonstrating the effect of dams on benthic bacterial communities [16]. In most previous studies, therefore, habitat type was considered first, followed by landscape or other environmental conditions within the same habitat. However, is this ordering justified? This study applied unsupervised partitioning of microbial metacommunities across different habitat types rather than within a single compartment. We hypothesised that more ecotypes (or metacommunity types) than habitat types would be found, which would challenge the empirical approach based on habitat types. We addressed this issue by testing a range of interacting habitats (water, sediment, and biofilm) at sites across a river catchment.
In this study, the bacterial communities of three major niches (water, sediment, and biofilm) in the Ili River were first partitioned via the unsupervised Dirichlet multinomial mixture (DMM) method, and the underlying assembly mechanisms of the bacterial communities were then determined. Water, sediment, and biofilm samples from the mainstream of the Ili River and its three tributaries were collected to study the responses of bacterial communities in a wide range of habitats under different environmental conditions. These results provide a conceptual framework for choosing between habitat types and ecotypes when studying microbial communities among different niches in the aquatic environment.

Study Site and Sample Collection
The Ili River, which originates in the Tianshan Mountains, is an important water resource in arid Central Asia. The climate in this region is temperate continental and alpine, with an average annual precipitation of 200 to 800 mm [17]. The Ili River valley (80°09′E-84°56′E, 42°14′N-44°50′N) lies in the hinterland of the Eurasian continent, with an altitude ranging from 538 to 5859 m [18]. The Kunes River, Tekes River, and Kash River are the major tributaries of the Ili River. In this study, six, seven, six, and nine sampling sites were set in the Kunes River, Tekes River, Kash River, and the mainstream of the Ili River, respectively (Supplementary materials, Fig. S1). Each sample was a mixture of at least three subsamples that were randomly collected within an area of 5 m². For water samples, 1 L of surface water was collected in an autoclaved polypropylene bottle and filtered through a 0.22-μm membrane within 12 h. The filter membranes, biofilm samples, and sediment samples were sealed in sterilised tubes and stored at −20 °C for DNA extraction. The mean annual temperature (MAT) and mean annual precipitation (MAP) of the sampling sites were obtained from the Resource and Environment Science and Data Center (https://www.resdc.cn/). The pH, salinity (salt), total organic carbon (TOC), total nitrogen (TN), available nitrogen (AN), available phosphorus (AP), nitrate nitrogen (NN), and ammonium nitrogen (ANM) in water or sediment were determined according to the environmental monitoring method standards of the Ministry of Ecology and Environment of the People's Republic of China.

Full-length 16S rRNA Sequencing and Data Processing
DNA was extracted from the filter membrane, biofilm, and sediment samples using the PowerSoil DNA Isolation Kit (MO BIO) according to the manufacturer's protocols. The composition of bacterial communities in water, sediment, and biofilm was determined by full-length 16S rRNA sequencing using PacBio single-molecule, real-time (SMRT) technology. Raw paired-end reads were merged using FLASH v1.2.11 [19] and filtered by Trimmomatic v0.33 [20] and UCHIME v8.1 [21]. Then, the obtained clean tags were clustered into OTUs by USEARCH version 10.0 at a 97% similarity level [22]. OTUs with a relative abundance of less than 0.005% were removed [23]. After that, the taxonomy was annotated using RDP Classifier against the Silva database (Release 132, http://www.arb-silva.de) with a confidence threshold of

Partitioning of Bacterial Communities
Dirichlet multinomial mixture (DMM) modelling is a robust method for partitioning bacterial communities [16]. In this study, the online R code for DMM (https://microbiome.github.io/tutorials/DMM.html) was used to partition the bacterial metacommunities in water, sediment, and biofilm, with the rarity threshold of OTUs varied from 0 to 0.5%. Principal coordinate analysis (PCoA) and the difference in Bray-Curtis distance of bacterial metacommunities, both based on OTUs within the water, sediment, and biofilm samples, were applied to verify the reliability of the DMM result.
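The within-group Bray-Curtis comparison used to check the DMM clusters can be illustrated with a minimal Python sketch. This is not the authors' R workflow: the function names, the toy OTU counts, and the group labels below are hypothetical, and the treatment of two empty samples is a convention assumed here for illustration.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        # Assumed convention: two empty samples are treated as identical.
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def mean_within_group(samples, labels):
    """Mean pairwise Bray-Curtis dissimilarity among samples sharing a label."""
    sums, counts = {}, {}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if labels[i] == labels[j]:
                d = bray_curtis(samples[i], samples[j])
                sums[labels[i]] = sums.get(labels[i], 0.0) + d
                counts[labels[i]] = counts.get(labels[i], 0) + 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical OTU counts (rows = samples, columns = OTUs); not data from this study.
otu_counts = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 12, 3],
    [1, 10, 4],
]
ecotype = ["A", "A", "B", "B"]
within = mean_within_group(otu_counts, ecotype)
print(within)
```

Low within-group dissimilarity relative to between-group dissimilarity is the kind of pattern that would support a given partition.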

Diversity and Model Fitting of Bacterial Metacommunities
The Chao1 and Shannon indices of each sample were calculated to assess the richness and diversity of the bacterial communities. The Kruskal-Wallis test was applied to compare the Chao1 and Shannon indices among sample groups. These analyses were carried out using the "microeco" package of R software [26]. Differences in the taxonomic and potential function composition of bacterial communities were detected via the Tukey test. The correlations between environmental parameters and the taxonomic (OTU level) and potential function composition of bacterial communities across different habitats or ecotypes were analysed by the Mantel test in R software. A neutral community model (NCM) was applied to estimate the contribution of stochastic processes to bacterial community assembly by analysing the relationship between the occurrence frequency of taxa and their relative abundance [25,27,28]. In the model, R² and "m" represent the goodness of fit and the migration rate, respectively [14,28].
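As a rough illustration of the two alpha-diversity indices named above, the following Python sketch computes them from a vector of OTU counts for one sample. It assumes the natural-log form of the Shannon index and the bias-corrected form of Chao1; the study itself used the "microeco" R package, whose defaults may differ, and the function names and example counts here are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs observed exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU counts for a single sample; not data from this study.
sample = [10, 1, 1, 2, 0]
print(shannon_index(sample), chao1(sample))
```

Chao1 exceeds the observed richness only when singletons are present, which is why it is sensitive to sequencing depth and to how rare OTUs are filtered.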

Statistics of Sequencing Results and Partition of Bacterial Metacommunities
The 69 water, sediment, and biofilm samples were sequenced on the PacBio platform and identified by barcode. A total of 882,211 circular consensus sequencing (CCS) sequences were obtained. Each sample yielded at least 10,129 CCS sequences, with an average of 12,786 CCS sequences. After rarefaction, the sequences were clustered into 1434 OTUs, among which 1350, 1285, and 1078 OTUs were identified from water, sediment, and biofilm, respectively.
The bacterial metacommunities were partitioned at the OTU level with different rarity thresholds by the Dirichlet multinomial mixture (DMM) method (Fig. 1a). Four optimal clusters were obtained, indicating that the samples from the three habitats (water, sediment, and biofilm) could be automatically classified into four ecotypes of bacterial communities acclimated to the environment of the Ili River. Cluster I (sediment) included 26 sediment samples and two water samples. Cluster II (biofilm) included all biofilm samples and one sediment sample. Cluster III (water I) was composed of most water samples from the Kunes River and the mainstream of the Ili River, plus one sediment sample. All water samples from the Tekes River and Kash River and one sample from the Kunes River constituted cluster IV (water II). To keep the sample habitat consistent within each cluster, the two water samples in cluster I and the single sediment samples in clusters II and III were removed from the subsequent analyses. The PCoA (Fig. 1b) and the differences in Bray-Curtis distance within groups (Fig. 1d) also confirmed the reliability of the clusters. These results also indicated that the water samples fell mainly into two clusters. The average population density and gross domestic product (GDP) in the main region of water II (the Tekes River and Kash River basins) were 21.3 people per km² and 472,500 RMB per km², which were lower than those in the main region of water I (48 people per km² and 1,025,000 RMB per km²). Grassland and woodland were the dominant landscapes in the main region of water II. The main towns in the Ili River Valley (Xinyuan County, Gongliu County, and Yining City) are located along the Kunes River and the mainstream of the Ili River (water I). Therefore, bacterial communities in water are more sensitive to human activities than those in sediment and biofilm under the same climatic and geographic background.

Species Diversity and Abundance Distribution at Four Optimal Clusters
According to the Kruskal-Wallis test, there were no statistical differences in Chao1 richness and Shannon index between the metacommunities from biofilm and water I (Fig. 2a). Sediment had the highest Chao1 richness (601 ± 116) and Shannon index (4.87 ± 0.45), which were higher than those in water II (471 ± 115 and 4.11 ± 0.31), biofilm (305 ± 126 and 3.05 ± 0.96), and water I (341 ± 135 and 2.92 ± 0.82) (Fig. 2a and b). These results indicated that bacterial diversity and richness in water decreased as the intensity of human activities increased. However, no difference in Chao1 and Shannon index was found between the bacterial communities in water and sediment in the conventional analysis based on habitat types (Fig. 2c and d). The significant difference within water samples was also masked when samples were grouped by habitat type.
Fig. 1 The optimal clusters of bacterial communities in the Ili River were obtained by Dirichlet multinomial mixture (a). The principal coordinate analysis (PCoA, b), composition, and difference in Bray-Curtis distance of bacterial communities across the four optimal ecotypes
At the phylum level, the most abundant and common taxon across all four clusters was Proteobacteria (Fig. 1c). Figure 3 shows the differences in the relative abundance of the top 12 phyla in water I, water II, biofilm, and sediment samples. The average proportions of Proteobacteria in water II (68.4%) and biofilm (65.8%) were significantly higher than those in sediment (51.0%) and water I (38.8%). Firmicutes had the highest percentage in water I (44.2%), followed by water II (8.94%), sediment (5.93%), and biofilm (0.42%). Significantly higher proportions of Bacteroidota, Acidobacteriota, Planctomycetota, Desulfobacterota, Gemmatimonadota, and unclassified bacteria were found in sediment. Biofilm had the highest percentage of Cyanobacteria (18.4%), which was higher than those in water I (5.52%), water II (2.95%), and sediment (0.20%). Actinobacteriota was observed in water II with a relative abundance of 9.3%, which was higher than those in water I (2.97%), sediment (2.06%), and biofilm (0.26%). In other words, Proteobacteria and Actinobacteriota had significantly higher relative abundances in the Tekes River and Kash River, with less human activity, than in the Kunes River and the mainstream of the Ili River, whereas the opposite was true for Firmicutes. When samples were grouped by habitat type, the significant differences within water samples were neglected, leading to different results for the relative abundance of bacterial taxa. For example, no difference in the relative abundance of Proteobacteria between water and biofilm was found, and the significant difference in the relative abundance of Firmicutes within water samples was also overlooked (Fig. S2).
Therefore, these results further confirmed that ecotypes might be a better way

The Potential Function Profile of Bacterial Communities Across the Four Optimal Ecotypes
Distinct potential function profiles of bacterial communities, obtained by FAPROTAX, were found across the different habitats or ecotypes in the Ili River (Fig. S3 and Fig. 4). Chemoheterotrophy and aerobic chemoheterotrophy were the main potential functions. To further analyse the differences, the Tukey test was applied to the potential function composition across the four ecotypes (Fig. 4). The potential functions of manganese oxidation, human pathogens nosocomia, human pathogens pneumonia, and plastic degradation had significantly higher relative abundances in water I than in water II, sediment, and biofilm. Sediment had significantly higher relative abundances of chemoheterotrophy and aerobic chemoheterotrophy than water I, water II, and biofilm. Biofilm had a higher relative abundance of potential phototrophy than the other three ecotypes. Biofilm also had higher relative abundances of photoautotrophy, cyanobacteria, and oxygenic photoautotrophy than water II and sediment but showed no significant difference compared with water I. When samples were grouped by habitat type (Fig. S3), the significant differences in potential functions within the water samples, such as manganese oxidation, human pathogens nosocomia, human pathogens pneumonia, and plastic degradation, were also neglected.

Bacterial Communities Across Different Ecotypes Fitted to NCM
The internal assembly mechanism of the bacterial community in each of the four ecotypes was analysed via the NCM and is shown in Fig. 5. Water (including water I and water II) and sediment showed a higher goodness of fit to the NCM (R² > 0.5) than biofilm (R² = 0.154). The migration rates of the four ecotypes were in the following order: biofilm (0.063) < sediment (0.153) < water I (0.251) < water II (0.342). These results indicated that an immigrant from the metacommunity was less likely to randomly colonise the saturated local communities in sediment and biofilm than those in water in the Ili River. Meanwhile, water I, under higher anthropogenic impact, also had a lower migration rate than water II, with less human activity.
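The goodness-of-fit values reported for the NCM can be read as a coefficient of determination between observed and model-predicted occurrence frequencies. The Python sketch below shows only that final step, assuming the common definition R² = 1 − SS_res / SS_tot; fitting the neutral model itself (estimating m) is not shown, and the function name and the frequency values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical occurrence frequencies of five taxa; not data from this study.
observed_freq = [0.10, 0.35, 0.60, 0.85, 1.00]
predicted_freq = [0.15, 0.30, 0.65, 0.80, 0.95]
print(r_squared(observed_freq, predicted_freq))
```

Under this definition, R² can become negative when the model predicts the observed frequencies worse than their mean does, so a low or negative value signals a poor fit to neutral expectations.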

Environmental Factors Influencing the Composition and Potential Function of Bacterial Communities
The relationship between environmental factors and community structure was important to understand the bacterial assembly process from the perspective of deterministic selection. Mantel's test (Fig. 6) showed that three, two, two, and three environmental factors could influence the bacterial community structure at the OTU level for sediment, water, water I, and water II, respectively. In contrast, no tested environmental factors showed influence on the taxonomic structure of the biofilm. For bacterial communities in sediment, TOC, TN, and AP (available phosphorus) were the main drivers shaping the biogeography of taxonomic structure. In water I, with higher intensity of human activities, salinity and mean annual precipitation (MAP) were the main drivers, while mean annual temperature (MAT), TOC, and ammonium nitrogen (ANM) were the environmental factors influencing the taxonomic structure in water II with less human activities. If water was considered as a habitat, it was found that MAT and salinity were the main drivers influencing the water bacterial communities in the Ili River.
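The logic of a Mantel test can be sketched in plain Python as follows. This is an illustrative permutation test assuming Pearson correlation between the upper triangles of two distance matrices and a one-sided p-value; the study ran the test in R, where the correlation method and number of permutations may differ. All names and the toy matrices are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def upper_triangle(matrix, order):
    """Upper-triangle entries of a square matrix after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): observed correlation and one-sided permutation p-value."""
    identity = list(range(len(dist_a)))
    b_flat = upper_triangle(dist_b, identity)
    r_obs = pearson(upper_triangle(dist_a, identity), b_flat)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute sample labels of one matrix only
        if pearson(upper_triangle(dist_a, order), b_flat) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical example: community and environmental distances derived from the
# same one-dimensional site positions, so the two matrices correlate perfectly.
positions = [0.0, 1.0, 3.0, 7.0, 12.0]
community_dist = [[abs(a - b) for b in positions] for a in positions]
environment_dist = [[2 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(community_dist, environment_dist)
print(r, p)
```

Permuting rows and columns together preserves the structure of each distance matrix, which is why the test is preferred over an ordinary correlation test on the non-independent pairwise distances.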

Discussion
In this study, an unsupervised machine learning method, rather than empirical experience, was applied to identify the ecotypes of bacterial communities across the Ili River. Four ecotypes were identified among samples from water, sediment, and biofilm. Water was more susceptible to human activities and was divided into two ecotypes. Differences in the taxonomic composition, potential function composition, and assembly processes of bacterial communities were found across the four ecotypes. Our study provides an alternative conceptual framework for choosing between habitat types and ecotypes when studying microbial communities among different niches in the aquatic environment.

Distinct Bacterial Communities Across the Four Ecotypes
The division of bacterial communities from the same habitat into different ecotypes was first reported for the human gut microbiome [29]. DMM, as a robust method, has been widely applied to pattern recognition of microbiomes within the same habitat, such as aerobic composting sludge [30], soil [31], and sediment [16]. Because bacterial phyla show similar patterns in the same habitat among different streams [15], few studies have used the DMM method to partition microbiomes across different habitat types in river ecosystems. However, this study divided the microbiome across three habitats (sediment, biofilm, and water) into four ecotypes via the unsupervised DMM. The microbiome of three niches (workplaces, high-touch areas, and environments) in healthcare-associated institutes in Taiwan was also divided into four ecotypes via DMM [32]. Therefore, DMM could be an option for identifying the ecotypes of bacterial communities in the environment, rather than relying on habitat types alone. However, DMM results should be interpreted with caution, because the method may affect the reliability of field investigations to some extent [33]. To guard against this, PCoA and the difference in Bray-Curtis distance of bacterial metacommunities were used in this study to confirm the ecotypes obtained from DMM.
Human activities played a much more critical role in shaping bacterial communities in water than in sediment and biofilm. The diversity and richness of bacterial communities in water II, with less human activity, were higher than those in water I, with intense human disturbance. In temperate southern Africa, the Chao1 richness and diversity of bacterial communities in the Kowie River, with relatively higher urban land use efficiency, were also lower than those observed in the Kariega River, with less human activity [34]. The PCoA (Fig. 1b) and the differences in Bray-Curtis distance within groups also indicated that distinct bacterial communities formed in water I and water II. In the Songhua River, the water bacterial communities in areas with high population density also differed in composition from those at sites with low population density [35].
Firmicutes was the dominant phylum (44.17%) in water I, with intense human interference, followed by Proteobacteria (38.80%). In water II, with less human activity, Proteobacteria (68.4%) and Actinobacteriota (9.3%) were the top two phyla. In the surface water along the Three Gorges Reservoir, Firmicutes was also the most dominant phylum and correlated with the concentration of chemical oxygen demand [36]. Firmicutes play an important role in the degradation of organic matter, such as sugar cane [37] and potato starch processing wastewater [38]. More organic matter could be discharged into water I than into water II owing to the higher human population and more intense agricultural activities, which may lead to the dominance of Firmicutes. The potential functions of the bacterial communities also confirmed that water I experienced more human disturbance than water II. For example, water I had a higher relative abundance of bacterial communities with the potential functions of human pathogens nosocomia, human pathogens pneumonia, and plastic degradation. Significantly higher proportions of Bacteroidota, Acidobacteriota, Planctomycetota, Desulfobacterota, Gemmatimonadota, and unclassified bacteria were found in sediment. Bacteroidota showed a high potential for the degradation of lignocellulosic polymers [39]. Desulfobacterota are sulphate-reducing bacteria and showed a preference for freshwater environments [40]. These results indicated that sediment played a more important role in the biogeochemical cycles of elements such as carbon and sulphur. Cyanobacteria were the main bacteria in biofilm from the Ili River, as was also reported for biofilms from French and New Zealand rivers [41,42].

Different Assembly Mechanisms of Bacterial Communities Across the Four Ecotypes
NCM has been widely used to assess the assembly mechanisms of bacterial communities in the aquatic environment, such as the Yangtze River [8] and Tingjiang River [12]. In this study, we found that the bacterial communities on the biofilms were not well fitted to the NCM, indicating that stochastic processes were not the key assembly mechanism of bacterial communities in this habitat. Meanwhile, the environmental factors tested in water also did not correlate with the taxonomic composition of bacterial communities on biofilm. These results indicated that habitat filtering might play a more important role in the assembly of bacterial communities in the river biofilm. For Phormidium (cyanobacteria) biofilms in French and New Zealand rivers, it was also observed that dispersal capacities of bacteria were not the key factor driving the bacterial community [41].
The composition of bacterial communities in water I, water II, and sediment fitted the NCM with R² values between 0.516 and 0.563, indicating a strong effect of dispersal limitation on the assembly of bacterial communities in water and sediment. Stochastic processes, including dispersal limitation, were the main ecological processes shaping the bacterial communities in sediment from subtropical mangroves [27] and microeukaryotic community assembly in water from a subtropical river [12]. Two or three environmental factors were found to influence the taxonomic composition of bacterial communities in water I, water II, and sediment. These results indicated that deterministic processes also played a partial role in shaping the bacterial communities in water and sediment. Among the identifiable deterministic factors, TOC showed a significant influence on the bacterial community structure of both sediment and water II. In previous studies, TOC was also found to be an important driver of bacterial communities in the Danube floodplain [43] and temperate lakes [44].
Salinity was the most important factor driving the bacterial community in water I, with intense human activities. Salinity has also been found to be the main driver shaping bacterial community structure in inland freshwater lakes [45] and wetlands [46]. The water I region is the main irrigation area of the Ili River. Salinity is a threat to agricultural productivity and sustainability in this irrigation area [47,48], among which the input of available phosphorus, alkaline nitrogen, and available potassium was an important factor [48]. In the water II region, with less human activity and covered mainly by forest and grassland, MAT was the main driver shaping the bacterial communities. Water temperature was found to be the primary factor driving the bacterial community structure in the Yangtze River [8]. MAT was also found to mediate the assembly of rare sub-communities in agricultural ecosystems of China from south to north [49]. If all water samples were treated as a single habitat type, the factors influencing bacterial community assembly were MAT and salinity, and the difference in assembly mechanisms within water would be overlooked.

Conclusions
A comprehensive understanding of the composition and assembly mechanisms of microbial communities among different habitats is critical for understanding their adaptation to environmental changes or human activities. In this study, we found that the unsupervised machine learning Dirichlet multinomial mixture (DMM) method was effective in identifying four ecotypes of microbial communities among three habitats. The water habitat was clustered into two ecotypes with different compositions, potential functions, and driving factors of bacterial communities. Our study may provide an alternative framework for research on microbial communities among different niches in the aquatic environment. More studies are still needed to evaluate the effectiveness of unsupervised machine learning in partitioning microbial communities compared with empirical habitat types.
Author Contribution W.S. and R.Q. conceived and designed the study; W.S., N.X., and R.Q. collected the samples and data; Y.Y. and W.S. performed the analysis; and W.S., X.Z., and L.Z. wrote and revised the manuscript.
Funding This work was supported by the National Natural Science Foundation of China (grant number 41673127) and grants from the Youth Innovation Promotion Association of the Chinese Academy of Sciences (grant number Y201976 and 2017478).
All the raw data sets are publicly available in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under project accession no. PRJNA733708.