Taxonomic and Functional Metagenomic Profiling of Tuwa and Unnai Hot Springs Microbial Communities

Hot springs are of great importance because of their unique physicochemical properties. The unique selection pressures in this habitat support diverse microbial communities, which can be analyzed by high-throughput sequencing. The present study focuses on metagenomic sequencing of two hot springs in Gujarat, India, namely Tuwa and Unnai, using both culture-dependent (enrichment) and culture-independent approaches. Sequence analysis of both water reservoirs showed high species richness and diversity based on various diversity indices. The microbial community structures at the two hot springs were distinct and depended on physicochemical factors such as temperature, pH and mineral content. Enrichment by cultivation before metagenome sequencing revealed an abundance of Firmicutes (up to 96%), representing the cultivable organisms in the hot springs. The bacterial phyla Firmicutes, Proteobacteria, Bacteroidetes, Thermotogae, Deinococcus-Thermus and Chloroflexi dominate the thermoalkaline springs at Unnai and Tuwa in different proportions. Economically important microorganisms belonging to the genera Thermus, Brevibacillus, Anoxybacillus, Bacillus, Pseudomonas and Geobacillus were prevalent in the hot springs. Functional analysis with KEGG revealed pathways for the metabolism of carbohydrates, amino acids, vitamins, cofactors and xenobiotics. Annotation with the Carbohydrate-Active enZYmes (CAZy) database revealed four major enzyme classes: glycosyl transferases, glycoside hydrolases, polysaccharide lyases and carbohydrate-binding modules. The study provides insight into the microbial community structure and the untapped functional potential of both sites for various biotechnological and environmental applications.


Introduction
Extremophiles are organisms known to survive in environments with extremes of temperature, salinity, pH, pressure and radiation. Diverse thermophilic microorganisms specifically inhabit extreme environments such as hot springs, deep-sea hydrothermal vents and volcanic island sediments [1]. Thermophilic microorganisms survive in hot environments with the help of thermostable enzymes that remain catalytically active at high temperatures.
Studying thermophiles provides closer insight into the origin of life and evolutionary mechanisms, as life is thought to have originated at high temperatures on the early Earth [2]. Metagenomic analyses of the microbial communities of several hot springs, and of their potential to produce useful metabolites, have revealed industrial and biotechnological applications of thermophiles [3]. Geothermal hot springs are well documented for their therapeutic significance, and their chemical composition depends on the physicochemical properties of the surrounding rock. The microbial community structure of hot springs is shaped mainly by the geochemical characteristics of the water reservoirs, such as pH, temperature, redox potential, dissolved minerals and other chemicals, as well as by ecological interactions [4].
Advances in sequencing technology for metagenomic profiling have served three major purposes: profiling microbial community structure, harnessing the functional potential of the existing microbiota, and discovering novel natural biomolecules without culturing the organisms. Several culture-independent studies have explored the microbial diversity of hot springs and its potential to produce various beneficial metabolites. Hot springs have been reported to harbor Gammaproteobacteria and Alphaproteobacteria, and the major bacterial genera reported include Geobacillus, Anoxybacillus [5], Bacillus [6], Hydrogenobacter, Thermus [7], Clostridium [8], Opitutus, Rhodococcus, Cellvibrio [9], Exiguobacterium, Pseudomonas, Paenisporosarcina, Acinetobacter, Stenotrophomonas, Thermanaerothrix and Thermoanaerobaculum. Many studies have also reported genes encoding catalysts of biotechnological potential for the metabolism of carbohydrates, nitrogen, sulfur, methane and xenobiotics and for antibiotic biosynthesis, as well as resistance genes [10][11][12]. Microbiota residing in the sediments of various habitats harbor genes of interest for biodegradation, bioremediation and stress response [13][14][15]. Fewer studies have reported metagenomic analyses of the soils and sediments of hot water reservoirs, which revealed novel genes with biotechnological potential [16][17]. Hence, it is important to explore the microbial communities of still unexplored hot springs and their functional capabilities.
In the present work, we analyzed whole-metagenome sequences derived from the water and soil of the Tuwa and Unnai hot springs. This investigation aimed to study the microbial diversity of the hot springs and to profile their functions in order to build a metagenomic resource of novel genes encoding enzymes of industrial potential. The study also aimed to detect important cultivable organisms and to compare community structure between soil and water from Tuwa, as well as between the Tuwa and Unnai hot springs.

Materials And Methods
Site description and sampling
Two hot springs in Gujarat were selected for the current study, namely Tuwa (22.799444°N, 73.4602778°E) and Unnai (20.8533333°N, 73.3344444°E). Both hot springs are little explored and are situated in the Panchmahal and Navsari districts, respectively, of Gujarat, India. To characterize the environment shaping the microbial community, physicochemical analysis of water samples from both hot springs was carried out. Water and soil samples were collected in sterile 50 mL Falcon tubes following standard protocols to avoid direct exposure to the environment. Before sampling, the pH was recorded at the sampling site with a handheld pH meter (Whatman PHA 260) calibrated with warm buffer. Additionally, Tuwa water and soil samples were enriched before metagenomic analysis by culturing at 63°C for 24 h in a medium containing 1 g L⁻¹ yeast extract, 1.3 g L⁻¹ (NH₄)₂SO₄, 0.25 g L⁻¹ KH₂PO₄, 0.25 g L⁻¹ MgSO₄·7H₂O and 0.08 g L⁻¹ CaCl₂·2H₂O.

Isolation of genomic DNA and NGS sequencing
A total of four samples were sequenced: ES, enriched soil from Tuwa; EW, enriched water from Tuwa; Tuwa_S, soil from Tuwa; and Unnai_S, soil from Unnai. Total DNA from the soil, water and enriched samples was isolated using either the DNeasy PowerWater or the DNeasy PowerSoil Kit (QIAGEN).
Libraries were prepared using the Ion Plus Fragment Library Kit (Thermo Fisher Scientific), and sequencing was performed on an Ion 520 chip with 400-bp chemistry on the Ion S5 Plus system.

Bioinformatics analysis of metagenomics data
Raw read quality was assessed with FastQC (Galaxy version 0.72), and reads were filtered with PRINSEQ-lite version 0.20.4. All reads shorter than 100 bp, with a mean quality ≤20, or containing any ambiguous base or an incorrect primer sequence were excluded from the dataset. Taxonomic profiling of the filtered reads was performed against the SILVA database with an e-value < 1e-5 and a minimum identity of 90%, down to the genus level. The SILVA output files were further analyzed using STAMP to visualize the taxonomic composition of each sample from phylum to genus level. Rarefaction curves and diversity indices were computed using PAleontological STatistics (PAST) v4.03. Principal coordinates analysis (PCoA) was carried out to complement the output of the cluster analyses.
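The length, quality and ambiguity criteria above can be expressed as a simple per-read filter. The following is a minimal Python sketch, not the PRINSEQ implementation; it assumes four-line FASTQ records with Phred+33 quality encoding, and the function names `read_fastq`, `mean_phred` and `passes_filter` are hypothetical. The primer check is omitted because no primer sequences are given.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_filter(seq: str, qual: str, min_len: int = 100,
                  min_mean_q: float = 20.0) -> bool:
    """Apply the three criteria: length >= 100 bp, no ambiguous bases, mean Q > 20."""
    if len(seq) < min_len:
        return False
    if set(seq.upper()) - set("ACGT"):
        return False  # any N or other IUPAC ambiguity code rejects the read
    return mean_phred(qual) > min_mean_q  # reads with mean quality <= 20 are dropped
```

The threshold is written as a strict `>` so that a read with a mean quality of exactly 20 is excluded, matching the "≤20" wording in the text.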
Metabolic potential of hot spring samples
The SqueezeMeta pipeline [18] was employed for assembly and functional analysis of the reads. The pipeline was run in co-assembly mode with MEGAHIT, in which reads from all samples were pooled before assembly. The functional diversity of the samples from the Tuwa and Unnai hot springs was analyzed by annotation against the Kyoto Encyclopedia of Genes and Genomes (KEGG) with an e-value < 1e-5. Functional classification of all samples was performed up to level 2 of the KEGG hierarchy. The protein sequences of the predicted open reading frames (ORFs) were aligned against the Carbohydrate-Active enZYmes (CAZy) database using DIAMOND BLASTp [19]. Contigs were mapped and hits were screened at an e-value threshold of 1e-5, and gene predictions were further strengthened by retaining only hits with ≥90% identity.
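The CAZy screening step (e-value < 1e-5, identity ≥90%) amounts to filtering an alignment table and keeping the best hit per ORF. The sketch below is illustrative only: it assumes DIAMOND output in the standard 12-column tabular format (`--outfmt 6`) and a hypothetical subject-ID convention in which the CAZy family follows the last `|`. Real CAZy reference headers may be formatted differently, and SqueezeMeta performs this step internally with its own logic.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Standard 12-column BLAST/DIAMOND tabular layout (outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_cazy_hits(rows: Iterable[str], max_evalue: float = 1e-5,
                   min_identity: float = 90.0) -> Dict[str, dict]:
    """Keep, for each ORF (qseqid), the highest-bitscore hit passing both thresholds."""
    best: Dict[str, dict] = {}
    for rec in csv.reader(rows, delimiter="\t"):
        if len(rec) < len(FIELDS):
            continue  # skip blank or malformed lines
        hit = dict(zip(FIELDS, rec))
        evalue = float(hit["evalue"])
        pident = float(hit["pident"])
        bitscore = float(hit["bitscore"])
        if evalue >= max_evalue or pident < min_identity:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[hit["qseqid"]] = {"sseqid": hit["sseqid"], "pident": pident,
                                   "evalue": evalue, "bitscore": bitscore}
    return best


def family_counts(best: Dict[str, dict]) -> Counter:
    """Count CAZy families, assuming a hypothetical 'accession|FAMILY' subject ID."""
    return Counter(hit["sseqid"].rsplit("|", 1)[-1] for hit in best.values())
```

Keeping only the best-scoring hit per ORF avoids counting one gene several times when it aligns to multiple reference sequences.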

Results And Discussion
In the present work, the microbial diversity of two hot springs in Gujarat was compared to assess variation between locations. Each site has distinct physicochemical factors, which have shaped the existing microbial community. The temperature of the Unnai hot spring usually ranges from 54 to 56°C and that of Tuwa from 63 to 64°C. The pH ranges from 7.9 to 8.4 at Unnai and from 8.2 to 9 at Tuwa. The water is slightly alkaline, which influences biological and chemical reactions. The rise in pH may have resulted from contamination of the water reservoirs by organic matter, as inferred from the high BOD value in a previous report [20]. pH is one of the factors that affect microbial diversity [21], and the abundance of Gammaproteobacteria and Betaproteobacteria is directly correlated with the pH of the environment [22]. The high temperature of the water reservoirs also critically influences the existence, abundance and metabolic activities of microbial communities, and different phyla dominate habitats across different temperature ranges. In this study, Proteobacteria, Deinococcus-Thermus, Firmicutes and Bacteroidetes were the dominant phyla in both hot springs. Similar observations have been reported earlier in metagenomic analyses of various Indian hot springs with temperatures ranging from 43 to 98°C [10,17,[23][24][25].

Sequence analysis
To characterize the microbial community structure of the environmental samples, DNA was extracted from the two hot springs. DNA extracted from both the enriched and the unenriched samples was used for downstream processing. A total of 22,517,835 reads were generated by high-throughput shotgun metagenomic sequencing of the four samples.
The highest number of reads was obtained from the Tuwa soil sample and the lowest from the enriched water sample from Tuwa. After trimming and filtering, the average read length of each sample was about 276–296 bp, and approximately 98% of the reads were of high quality (≥Q20). The high-quality reads were assembled into 491,980 contigs with an N50 of 1,065 bp and a mean contig length of 836 bp. The sequence assembly statistics are shown in Table 1. NGS is a powerful tool for identifying potential genes or enzymes in DNA obtained from the environment [26]. Metagenomic sequencing of the hot spring samples, with and without an enrichment step, generated a large volume of sequence data, whose analysis with different tools provides information on community structure and functional potential. Such studies improve our understanding of the roles of both culturable and uncultured microbes in the environment.
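N50 is the contig length L such that contigs of length ≥ L together cover at least half of the total assembled length. The short Python sketch below computes it from a list of contig lengths; the function name and the toy lengths are illustrative and are not the study's contigs.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # first contig at which half the assembly is covered
            return length
    raise ValueError("no contig lengths given")


toy = [100, 200, 300, 400, 500]  # total 1,500 bp, so half is 750 bp
print(n50(toy), sum(toy) / len(toy))  # 500 + 400 = 900 >= 750, so N50 = 400; mean = 300.0
```

Because N50 weights long contigs more heavily, it is typically larger than the mean contig length, as with the 1,065 bp N50 and 836 bp mean reported here.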

Diversity and species richness analysis
Rarefaction analysis and diversity indices such as Chao1, Simpson and Shannon were used to assess species diversity.
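For reference, the indices named above can be computed from a vector of per-taxon read counts. This Python sketch uses the standard formulas (Shannon H = -Σ pᵢ ln pᵢ, Gini-Simpson 1 - Σ pᵢ², inverse Simpson 1/Σ pᵢ², bias-corrected Chao1, and analytical rarefaction of expected richness); PAST may use slightly different variants, and the function names are hypothetical.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def inverse_simpson(counts: Sequence[int]) -> float:
    """Inverse Simpson index 1 / sum(p^2)."""
    total = sum(counts)
    return 1 / sum((c / total) ** 2 for c in counts)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def _log_comb(n: int, k: int) -> float:
    """log of the binomial coefficient C(n, k), via lgamma to handle large read counts."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def rarefied_richness(counts: Sequence[int], depth: int) -> float:
    """Expected number of taxa in a random subsample of `depth` reads (no replacement)."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    expected = 0.0
    for c in counts:
        if c <= 0:
            continue
        if total - c < depth:
            expected += 1.0  # this taxon must appear in any subsample of this size
        else:
            expected += 1.0 - math.exp(_log_comb(total - c, depth) - _log_comb(total, depth))
    return expected
```

Evaluating `rarefied_richness` over increasing depths traces a rarefaction curve; a curve that is still rising at the full depth indicates that more sequencing would reveal additional taxa.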
An overview of the alpha diversity of the communities in both hot springs is given in Table 2. The rarefaction curves (Fig. 1a) indicate that the number of reads generated was insufficient to capture the full microbial diversity, as the curves did not reach a plateau. This upward trend in the soil samples without enrichment indicated higher bacterial diversity. In contrast, the curves for the enriched samples reached a plateau, as expected, because only a few organisms are likely to have grown under the enrichment conditions. The Shannon index at the phylum level was 0.25–0.35 for the enriched samples and 2.06–2.6 for the soil samples. At the genus level, the Shannon index ranged from 2.1 to 3.74 across all samples. The Simpson index showed the same pattern of evenness at the phylum and genus levels. The Shannon and inverse Simpson indices at the phylum and genus levels indicated higher diversity, in terms of both richness and evenness, in the soil samples without enrichment. PCoA was performed in STAMP to examine the clustering of the bacterial communities (Fig. 1b). The first two axes accounted for 81.1% and 14.4% of the variation, together explaining 95.5% of the variation among the four samples.
PCoA revealed distinct, non-overlapping community structures for each metagenome, with high genus-level variation between the two hot springs.
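Classical PCoA converts a matrix of pairwise sample dissimilarities into ordination axes, and the percentage of variation on each axis (such as the 81.1% and 14.4% above) is its eigenvalue divided by the sum of eigenvalues. The NumPy sketch below assumes Bray-Curtis dissimilarity on abundance profiles and reports percentages relative to the positive eigenvalues only; STAMP's distance measure and handling of negative eigenvalues may differ, so this is an illustration of the technique rather than a reproduction of Fig. 1b.

```python
import numpy as np


def bray_curtis(x, y) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.abs(x - y).sum() / (x + y).sum())


def pcoa(profiles):
    """Classical PCoA: Gower double-centring of -0.5 * D**2, then eigendecomposition."""
    n = len(profiles)
    d = np.zeros((n, n))
    for i in range(n):
        for k in range(i + 1, n):
            d[i, k] = d[k, i] = bray_curtis(profiles[i], profiles[k])
    centring = np.eye(n) - np.ones((n, n)) / n
    b = centring @ (-0.5 * d ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)  # symmetric matrix: real eigenvalues, ascending
    order = np.argsort(eigvals)[::-1]     # largest axis first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-12                # drop zero/negative axes (Bray-Curtis is not Euclidean)
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = 100.0 * eigvals[keep] / eigvals[keep].sum()
    return d, coords, explained


# Hypothetical genus-level profiles for four samples (not the study's data).
profiles = [[50, 30, 20, 0], [45, 35, 15, 5], [5, 5, 10, 80], [0, 10, 5, 85]]
d, coords, explained = pcoa(profiles)
print(np.round(explained, 1))  # percentage of variation per axis
```

Each row of `coords` gives a sample's position on the ordination axes; the first two columns are what a two-dimensional PCoA plot displays.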
The taxonomic profiles of both sites identified microbes belonging to a total of 67 phyla, 158 classes, 326 families and 491 genera. The metagenomic samples were dominated by six bacterial phyla: Firmicutes, Bacteroidetes, Proteobacteria, Thermotogae, Deinococcus-Thermus and Chloroflexi (Fig. 2). Phyla of lower abundance were grouped as "others" and are listed in Supplementary file 1. Proteobacteria showed the highest relative abundance (23%) in the soil samples from Tuwa. A similar abundance of Proteobacteria has been reported in sediment and water from the Soldhar and Polok hot springs in India, respectively [24,27]. The Soldhar and Polok hot springs have a similar alkaline pH range but higher temperatures (76 and 95°C, respectively) than the Tuwa hot spring. Deinococcus-Thermus was the most abundant phylum (45%) in the Unnai soil samples. This finding resembles a hot spring in Spain with similar alkalinity but a temperature >76°C [28]. The microbial community structure suggests that alkaline pH and high temperature are the main selective factors favoring the abundance of Proteobacteria in Tuwa soil and of Deinococcus-Thermus in Unnai soil. In the enriched soil and water samples from Tuwa, Firmicutes (94–96%) was the dominant phylum, comprising a wide diversity of bacterial strains. The heatmap shows that Firmicutes were enriched, whereas Deinococcus-Thermus, Proteobacteria, Bacteroidetes, Thermotogae and Chloroflexi, although abundant in the unenriched samples, were not. This observation suggests that the enrichment medium may be suboptimal for the growth of these other groups. The marked enrichment of Firmicutes indicates the presence of cultivable organisms in the Tuwa hot spring. Metagenomic analyses of water samples from both sites have been reported earlier [29,30].
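Relative abundances such as those above are each taxon's share of the classified reads in a sample, with low-abundance taxa pooled into an "others" group. The Python sketch below illustrates this; the 1% cut-off is an assumed default (the text does not state the threshold used for phyla), and the example counts are hypothetical.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int], min_percent: float = 1.0) -> Dict[str, float]:
    """Convert read counts per taxon to percentages, pooling taxa below min_percent as 'Others'."""
    total = sum(counts.values())
    result: Dict[str, float] = {}
    others = 0.0
    # Process taxa from most to least abundant so the output is ordered for plotting.
    for taxon, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * n / total
        if pct >= min_percent:
            result[taxon] = pct
        else:
            others += pct
    if others > 0:
        result["Others"] = others
    return result


# Hypothetical phylum counts for an enriched sample (not the study's data).
print(relative_abundance({"Firmicutes": 95, "Proteobacteria": 4, "Chloroflexi": 1}, min_percent=2.0))
```

Comparing these percentages between the enriched and unenriched samples of the same site is what reveals the selective enrichment of Firmicutes.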
The diversity at these sites agrees with earlier studies; nonetheless, the proportions of abundant phyla and genera varied in the soil samples. Some phyla, such as Deinococcus-Thermus, Bacteroidetes, Chloroflexi and Thermotogae, were detected in the soil and water samples from both sites but were not reported in earlier water metagenomes from these sites (Supplementary file 2). This observation supports the view that each hot spring harbors a unique microbial community owing to its physicochemical properties and its gradients of temperature and light. The habitats of thermoalkaline springs also differ, taking the form of sediments, thermal fluids or microbial mats depending on the geological profile of the surrounding rock. All these factors play a crucial role in shaping the existing microflora [31]. The dominant phyla found in this study are also consistent with earlier reports for hot springs in India and other parts of the world [10,17,[23][24][25][32][33][34]. Genes for cellulose-, chitin-, hemicellulose- and starch-degrading enzymes have been reported in metagenomes dominated by Firmicutes, Proteobacteria, Deinococcus-Thermus, Bacteroidetes and related phyla [35]. This is the first report of metagenomic analysis of enriched soil and water samples from Tuwa, and of soil from both hot springs. The enrichment approach reveals the wide variety and abundance of culturable organisms present in the hot spring environment.

KEGG analysis
To reveal the functional potential of the bacteria in the hot springs, the metagenomic data were analyzed using KEGG, and a total of 20,321 genes/hits were annotated with KO IDs. KOs were assigned to pathways for metabolism (>4,000), genetic information processing, cellular processes, environmental information processing, organismal systems and human diseases (Fig. 5a). Further analysis of the metabolism pathways at level 2 showed 18 sub-pathways (Fig. 5b). Pathways for carbohydrate metabolism, energy metabolism, metabolism of cofactors and vitamins, amino acid metabolism, xenobiotic biodegradation and metabolism, and membrane transport were dominant in the four metagenomes. The abundance of genes related to metabolic pathways may contribute to the survival of the organisms in the hot spring environment [10]. The soil samples from both hot springs showed a higher proportion of reads assigned to the different KO categories than the enriched samples. This is expected because the soil samples represent the total microbiome, whereas in the enriched samples only the cultivable groups dominate as a result of selective enrichment. Other KEGG subsystems were associated with glycan biosynthesis, the biosynthesis of other secondary metabolites, and the metabolism of terpenoids and polyketides.
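Summarizing KO annotations at KEGG level 2 amounts to looking up each KO in the KEGG Orthology hierarchy and tallying hits per category. The Python sketch below uses a deliberately tiny, hypothetical lookup table with two entries; a real analysis would load the full KEGG BRITE hierarchy (ko00001), in which one KO can belong to several categories, and SqueezeMeta produces its own functional tables.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Category = Tuple[str, str]  # (KEGG level 1, KEGG level 2)

# Toy, abbreviated mapping for illustration only (see KEGG BRITE ko00001 for the real one).
KO_HIERARCHY: Dict[str, List[Category]] = {
    "K00844": [("Metabolism", "Carbohydrate metabolism")],            # hexokinase
    "K03040": [("Genetic Information Processing", "Transcription")],  # RNA polymerase subunit alpha
}


def level2_profile(ko_hits: Iterable[str],
                   hierarchy: Dict[str, List[Category]]) -> Dict[Category, float]:
    """Percentage of category assignments per level-2 category; unmapped KOs are ignored."""
    counts: Counter = Counter()
    for ko in ko_hits:
        for category in hierarchy.get(ko, []):  # a KO may map to several categories
            counts[category] += 1
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}


# Hypothetical annotated hits from one sample; "K99999" is not in the toy table.
print(level2_profile(["K00844", "K00844", "K03040", "K99999"], KO_HIERARCHY))
```

Because a KO can be counted under more than one category, level-2 percentages describe category assignments rather than distinct genes, which should be kept in mind when comparing them with gene totals such as the 20,321 annotated hits.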

Mining for carbohydrate active enzymes
To gain insight into the carbohydrate-related enzymes in the metagenomes, the predicted ORFs obtained from the SqueezeMeta pipeline were matched against the CAZy database. The metagenome sequences were distributed among four major enzyme classes of the CAZy database: glycosyl transferases (GTs), glycoside hydrolases (GHs), polysaccharide lyases (PLs) and carbohydrate-binding modules (CBMs) [25]. GHs made up the largest proportion (>80%), followed by GTs, PLs and CBMs (Fig. 6a). GH13 was the most abundant GH family (33%), while GH2, GH3, GH57 and other families were comparatively less abundant (Fig. 6b). CAZymes are reported to be involved in the catabolism of polysaccharides, and GH and GT enzymes play important roles in energy metabolism, communication and structural support in organisms [40]. GHs hydrolyse O-glycosidic bonds and efficiently degrade cellulose, chitin and other carbohydrates abundant in the habitat. Among the GH enzymes, complete gene sequences were found for debranching enzymes (GH13, GH16, GH23, GH51, GH67, GH94, GH97), hemicellulose-degrading enzymes (GH2, GH3, GH4, GH5, GH10, GH26, GH31, GH35, GH38, GH43, GH53) and cellulases (GH3, GH5, GH9) (Fig. 6c). GTs are responsible for the glycosylation of amino acid residues during cellular processes [41].

Conclusion
The present study provides insight into the microbial composition of soil and enriched samples from the hot springs at Tuwa and Unnai, Gujarat. Metagenomic taxonomic profiling revealed high microbial diversity, in terms of both alpha and beta diversity, at the two hot water reservoirs. Owing to the distinct environmental factors (pH and temperature) at the two hot springs, the metagenomes show distinct community structures. Deinococcus-Thermus (45%) and Proteobacteria (23%) were the most abundant phyla in soil at Unnai and Tuwa, respectively.
The enrichment-based approach revealed the presence and abundance of cultivable organisms of the phylum Firmicutes (≥94%) in the soil and water samples from Tuwa. In this study, some cultivable and non-cultivable organisms from Firmicutes, Bacteroidetes, Proteobacteria, Deinococcus-Thermus, Thermotogae and Chloroflexi were reported for the first time compared with earlier reports of water metagenomes from both sites. Functional profiling was performed by comparison with the public KEGG and CAZy databases. More than 20,000 genes were annotated with KEGG orthologs and assigned to pathways ranging from metabolism and genetic information processing to human diseases. Organisms present in both hot springs show potential for producing important enzymes involved in complex carbohydrate metabolism. GHs were highly abundant in the metagenomes and were contributed by various genera reported to survive at high temperatures and other extremes. These genera may be important sources of heat-tolerant enzymes and secondary metabolites. This study provides insight into the unexplored metabolic potential of two hot springs in Gujarat that could be utilized for industrial applications.

Heat map representing the microbial community profiling of all four samples at class level. Only dominant bacterial families present at relative abundance >1% are shown.