In the present work, the microbial diversity of two hot springs in Gujarat was compared to examine variation between locations. Each site has distinctive physicochemical characteristics that have shaped its microbial community. The water temperature typically ranges from 54 to 56°C at Unnai and from 63 to 64°C at Tuwa, and the pH ranges from 7.9 to 8.4 at Unnai and from 8.2 to 9 at Tuwa. This slightly alkaline pH influences biological and chemical reactions in the water. The elevated pH may have resulted from contamination of the water reservoirs by organic matter, as inferred from the high BOD value in a previous report [20]. pH is one of the factors that affect microbial diversity [21]; for example, the abundance of Gammaproteobacteria and Betaproteobacteria is directly correlated with environmental pH [22]. The high water temperature also critically influences the existence, abundance and metabolic activity of microbial communities, and different phyla dominate habitats across a range of temperatures. In this study, Proteobacteria, Deinococcus-Thermus, Firmicutes and Bacteroidetes were the principal phyla in both hot springs. Similar observations have been reported in earlier metagenomic analyses of Indian hot springs with temperatures ranging from 43 to 98°C [10, 17, 23-25].
Sequence analysis
To characterize the microbial community structure of the environmental samples, DNA was extracted from two different hot springs. DNA extracted from both the enriched and the raw samples was used for downstream processing. High-throughput shotgun whole-genome sequencing (WGS) of the four samples generated a total of 22,517,835 reads. The Tuwa soil sample yielded the most reads and the enriched water sample from Tuwa the fewest. After trimming and filtering, the average read length of each sample was about 276-296 bp, and approximately 98% of the reads were of high quality (≥Q20). The high-quality reads were assembled into 491,980 contigs with an N50 of 1,065 bp and a mean contig length of 836 bp. The sequence assembly statistics are shown in Table 1. NGS is a powerful tool for identifying potential genes or enzymes in DNA obtained from the environment [26]. Metagenomic sequencing of both hot spring samples, with and without an enrichment step, provided abundant sequence data, and analysis of these data with different tools provides information on community structure and functional potential. Such studies improve our understanding of the roles of both culturable and uncultured microbes in the environment.
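The N50 quoted above summarizes assembly contiguity: it is the largest length L such that contigs of length L or longer together contain at least half of all assembled bases. The sketch below, using only the Python standard library, shows how N50 and mean contig length can be computed from a list of contig lengths. The contig lengths are hypothetical toy values rather than data from this study, and an assembler's own report may differ slightly in how it handles ties.

```python
def n50(lengths):
    """N50: the largest length L such that contigs of length >= L
    together hold at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # half of the assembly reached
            return length

def assembly_stats(lengths):
    """Basic assembly summary: contig count, total bases, mean length, N50."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }

# Hypothetical contig lengths in bp (not data from this study)
contigs = [2000, 1500, 1000, 800, 500, 200]
print(assembly_stats(contigs))
```

For the toy input the total is 6,000 bp; the two longest contigs (2,000 + 1,500 bp) already exceed half of that, so the N50 is 1,500 bp while the mean length is 1,000 bp. As in the assembly above, N50 exceeds the mean because many short contigs pull the mean down.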
Diversity and species richness analysis
Rarefaction analysis and diversity indices such as Chao, Simpson and Shannon were used to assess species diversity. An overview of the alpha diversity of the communities in both hot springs is given in Table 2. The rarefaction curves of the soil samples without enrichment (Fig. 1a) did not reach a plateau, indicating that the generated reads were insufficient to capture the full microbial diversity; this upward trend also indicates higher bacterial diversity in these samples. For the enriched samples, however, the curves reached a plateau, which is expected because only a few organisms are likely to have grown under the enrichment conditions. The Shannon index at the phylum level was 0.25-0.35 for the enriched samples and 2.06-2.6 for the soil samples, and at the genus level it ranged from 2.1 to 3.74 across all samples. The Simpson index showed the same pattern of evenness at the phylum and genus levels. Together, the Shannon and inverse Simpson indices at the phylum and genus levels indicated higher diversity, in terms of both richness and evenness, in the soil samples without enrichment. Principal coordinate analysis (PCoA) was performed in STAMP to examine the clustering of the bacterial communities (Fig. 1b). The first two axes accounted for 81.1% and 14.4% of the variation, so PCoA1 and PCoA2 jointly explain 95.5% of the variation among the four samples. The PCoA revealed distinct, non-overlapping community structures, with high genus-level variation between the two hot springs.
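The indices discussed above have standard definitions: the Shannon index H' = -Σ p_i ln p_i, Simpson's dominance D = Σ p_i² (with inverse Simpson = 1/D), and rarefaction, which gives the expected number of taxa in a random subsample of n reads. The sketch below implements these from per-taxon read counts using the Python standard library. The example counts are hypothetical; the natural-log base for Shannon is an assumption (tools differ in log base and bias corrections), and the analytical rarefaction formula shown is one common approach, not necessarily the one used to draw Fig. 1a.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), natural log, zero counts ignored."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_dominance(counts):
    """Simpson's D = sum(p_i^2); a higher D means lower evenness."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def inverse_simpson(counts):
    """Inverse Simpson index 1/D: the effective number of equally abundant taxa."""
    return 1.0 / simpson_dominance(counts)

def rarefy(counts, n):
    """Expected number of taxa observed in a random subsample of n reads
    drawn without replacement (analytical rarefaction)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    all_draws = math.comb(total, n)
    # A taxon is missed only if all n reads come from the other (total - c) reads.
    return sum(1 - math.comb(total - c, n) / all_draws for c in counts if c > 0)

# Hypothetical per-taxon read counts (not data from this study)
even = [25, 25, 25, 25]
skewed = [85, 5, 5, 5]

for name, counts in [("even", even), ("skewed", skewed)]:
    print(name, round(shannon(counts), 3), round(inverse_simpson(counts), 3))

# Points on a rarefaction curve for the even community
curve = [(n, round(rarefy(even, n), 2)) for n in (1, 5, 10, 50, 100)]
print(curve)
```

The even community gives H' = ln 4 (about 1.386) and an inverse Simpson value of 4, whereas the skewed community scores lower on both, analogous to how dominance by a single enriched phylum lowers the phylum-level indices. For read counts in the millions, exact binomial coefficients become slow, so a log-gamma formulation or an established ecology package would be preferable.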
Taxonomic profiling of both sites identified microbes belonging to a total of 67 phyla, 158 classes, 326 families and 491 genera. The metagenomes were dominated by six bacterial phyla: Firmicutes, Bacteroidetes, Proteobacteria, Thermotogae, Deinococcus-Thermus and Chloroflexi (Fig. 2). Less abundant phyla were grouped as "others" and are listed in Supplementary file 1. Proteobacteria showed the highest relative abundance (23%) in the soil sample from Tuwa. A similar abundance of Proteobacteria has been reported previously in sediment from the Soldhar hot spring and in water from the Polok hot spring in India [24, 27]. These hot springs have a similar alkaline pH range but higher temperatures (76 and 95°C, respectively) than the Tuwa hot spring. Deinococcus-Thermus was the most abundant phylum (45%) in the Unnai soil sample. This finding resembles that reported for a hot spring in Spain with similar alkalinity but a temperature above 76°C [28]. The community structure suggests that alkaline pH and high temperature are the prime selective factors favoring the abundance of Proteobacteria in Tuwa soil and of Deinococcus-Thermus in Unnai soil. In the enriched soil and water samples from Tuwa, Firmicutes (94-96%) was the largest phylum and comprised a wide diversity of bacterial strains. The heatmap shows that Firmicutes was enriched, whereas Deinococcus-Thermus, Proteobacteria, Bacteroidetes, Thermotogae and Chloroflexi, although abundant in the original samples, were not. This implies that the enrichment medium may have been suboptimal for the growth of these organisms. The significant enrichment of Firmicutes indicates the presence of cultivable organisms in the Tuwa hot spring. Metagenomic analyses of water samples from both sites have been reported earlier [29, 30].
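Relative abundances like those plotted in Fig. 2 are per-taxon read counts divided by the sample total, with low-abundance taxa pooled into an "others" category. The sketch below illustrates that step in standard-library Python. The 1% cut-off, the placeholder taxon names and the counts are all hypothetical; the threshold actually used to define "others" in this study is not stated here.

```python
def relative_abundance(counts, min_fraction=0.01, other_label="Others"):
    """Return percentages per taxon, pooling taxa below min_fraction
    of the total into a single `other_label` bin (hypothetical 1% default)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    profile = {}
    pooled = 0.0
    # Iterate from most to least abundant so the output is ordered for plotting.
    for taxon, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        percent = 100.0 * count / total
        if count / total >= min_fraction:
            profile[taxon] = percent
        else:
            pooled += percent
    if pooled > 0:
        profile[other_label] = pooled
    return profile

# Hypothetical phylum-level read counts for one sample (not data from this study)
sample = {"phylum_A": 600, "phylum_B": 250, "phylum_C": 140, "phylum_D": 6, "phylum_E": 4}
for taxon, pct in relative_abundance(sample).items():
    print(f"{taxon}: {pct:.1f}%")
```

Here phylum_D and phylum_E each fall below 1% and are merged into a single 1% "Others" bin, so the reported percentages still sum to 100%.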
The diversity at these sites is consistent with earlier studies; nonetheless, the proportions of the abundant phyla and genera varied in the soil samples. Some distinct phyla, such as Deinococcus-Thermus, Bacteroidetes, Chloroflexi and Thermotogae, were detected in the soil and water samples from both sites, in contrast to earlier reports on water samples from these sites (Supplementary file 2). This observation supports the view that each hot spring harbors a unique microbial community owing to its physicochemical properties and its gradients of temperature and light. Thermoalkaline spring habitats also differ in form (sediments, thermal fluids or microbial mats) depending on the geology of the surrounding rock, and all these factors play a crucial role in shaping the resident microflora [31]. The dominant phyla identified in this study are also consistent with earlier reports from hot springs in India and other parts of the world [10, 17, 23-25, 32-34]. Bacterial genes for degrading cellulose, chitin, hemicellulose and starch have been reported in metagenomes dominated by phyla such as Firmicutes, Proteobacteria, Deinococcus-Thermus and Bacteroidetes [35]. This is the first report of metagenomic analysis of enriched soil and water samples from Tuwa, and of soil from both hot springs. The enrichment approach reveals the wide variety and abundance of culturable organisms in the hot spring environment.
At the class level, Bacilli (78-82%), Negativicutes (7-9%) and Clostridia (5-8%) were abundant in the enriched water and soil samples from Tuwa, and their dominance supports the enrichment of culturable groups. Flavobacteriia, Alphaproteobacteria, Gammaproteobacteria, Deinococci and Actinobacteria each accounted for less than 2% (Fig. 3). In the soil samples from Tuwa and Unnai (without enrichment), the most abundant classes were Deinococci, Gammaproteobacteria, Bacilli, Clostridia, Flavobacteriia, Deltaproteobacteria, Alphaproteobacteria, Negativicutes, Thermotogae, Anaerolineae and Ignavibacteria. The metagenomes contained more diverse bacterial genera than earlier reports on water samples from both hot springs [29, 30]. A total of 491 genera were detected across the four samples (Fig. 4a, 4b). The dominant genera in the thermoalkaline springs included autotrophs, heterotrophs, chemoorganotrophs and chemolithotrophs; thermophilic, neutrophilic and alkaliphilic organisms; and aerobes, microaerophiles, facultative anaerobes and obligate anaerobes. The most abundant chemoorganotrophs and chemolithotrophs belonged to Brevibacillus (26-42.5%), Staphylococcus (13-15%), Anaerosinus (8-9%), Anoxybacillus (7-9%), Streptococcus (7-8%) and Bacillus (5-11%), which are carbohydrate-decomposing cultivable genera significantly enriched in the water and soil samples from Tuwa [36]. The proportions of all genera differed between samples with and without enrichment. Thermus and Brevibacillus were significantly more abundant (up to 44%) in the Unnai soil and the enriched soil from Tuwa, respectively. Other major genera in the soil samples were Myroides (4-6%), Staphylococcus (4-6%), Fervidobacterium (2-4%) and Vibrio (3-5%). Brevibacillus was not detected in the Tuwa soil but was enriched in soil and water from Tuwa. In the enriched soil samples, Brevibacillus, Anaerosinus and Streptococcus were abundant, whereas in the enriched water samples, Staphylococcus, Anoxybacillus and Bacillus were abundant. Genera that include human pathogens, such as Staphylococcus and Streptococcus, were also present in the hot spring environment. Important genera such as Bacillus, Brevibacillus, Anoxybacillus, Thermus, Geobacillus, Aeribacillus and Pseudomonas are recognized as potential culturable sources of industrially important thermozymes for degrading polysaccharides, proteins, lipids, xenobiotics and other compounds [37-39].
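Observations such as a genus being undetected in raw soil yet enriched in cultured samples amount to set comparisons of the genera detected in each sample, the kind of comparison often summarized as a Venn diagram. The sketch below computes the genera shared by all samples and those unique to each sample from per-genus read counts. The sample names, genus labels, counts and the minimum-read detection threshold are hypothetical illustrations, not values from this study.

```python
def detected(counts, min_reads=1):
    """Genera with at least `min_reads` assigned reads (hypothetical threshold)."""
    return {genus for genus, n in counts.items() if n >= min_reads}

def overlap(samples, min_reads=1):
    """Return (genera shared by all samples, {sample: genera found only there})."""
    sets = {name: detected(c, min_reads) for name, c in samples.items()}
    shared = set.intersection(*sets.values())
    unique = {}
    for name, genera in sets.items():
        # Union of the genera detected in every other sample
        elsewhere = set().union(*(g for other, g in sets.items() if other != name))
        unique[name] = genera - elsewhere
    return shared, unique

# Hypothetical genus-level read counts (not data from this study)
samples = {
    "soil_1":     {"genus_1": 120, "genus_2": 40, "genus_3": 8},
    "soil_2":     {"genus_1": 90, "genus_4": 15, "genus_2": 0},
    "enriched_1": {"genus_1": 300, "genus_2": 55, "genus_5": 210},
}
shared, unique = overlap(samples, min_reads=5)
print("shared by all:", sorted(shared))
for name, genera in unique.items():
    print(name, "only:", sorted(genera))
```

The detection threshold matters: with `min_reads=5`, genus_2 counts as absent from soil_2, so it is neither shared by all samples nor unique to any single one.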
KEGG analysis
To reveal the functional potential of the bacteria in the hot springs, the metagenomic data were analyzed using KEGG, and a total of 20,321 genes/hits were annotated with KO IDs. The KOs were assigned to pathways for metabolism (>4,000), genetic information processing, cellular processes, environmental information processing, organismal systems and human diseases (Fig. 5a). At KEGG level 2, the metabolism category comprised 18 sub-pathways (Fig. 5b). Pathways for carbohydrate metabolism, energy metabolism, metabolism of cofactors and vitamins, amino acid metabolism, and xenobiotics biodegradation and metabolism, as well as membrane transport, were dominant in the four metagenomes. The abundance of genes related to metabolic pathways may be responsible for the survival of the organisms in the hot spring environment [10]. The soil samples from both hot springs had a higher proportion of reads assigned to the different KO categories than the enriched samples. This is likely because the soil samples represent the total microbiome, whereas in the enriched samples only cultivable groups dominate owing to selective enrichment. Interestingly, other KEGG categories were associated with the biosynthesis of glycans and other secondary metabolites and with the metabolism of terpenoids and polyketides.
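Counting KO hits per KEGG category, as in Fig. 5a and 5b, means mapping each KO ID through the KEGG Orthology hierarchy (BRITE ko00001) to its level-1 and level-2 categories. Because one KO can belong to several pathways, it should be counted at most once per category to avoid double counting. The sketch below shows this aggregation; the KO identifiers and the small lookup table are hypothetical placeholders, whereas a real analysis would load the full hierarchy from KEGG or from the annotation pipeline's output.

```python
from collections import Counter

# Hypothetical KO -> [(level 1, level 2), ...] lookup; placeholder IDs, not real KOs.
HIERARCHY = {
    "KO_A": [("Metabolism", "Carbohydrate metabolism")],
    "KO_B": [("Metabolism", "Carbohydrate metabolism"),
             ("Metabolism", "Energy metabolism")],
    "KO_C": [("Environmental Information Processing", "Membrane transport")],
}

def summarize_kos(ko_hits, hierarchy):
    """Aggregate per-KO hit counts into level-1 and level-2 category totals.
    A KO mapped to several pathways is counted once per distinct category."""
    level1, level2 = Counter(), Counter()
    unassigned = 0
    for ko, hits in ko_hits.items():
        categories = hierarchy.get(ko)
        if not categories:
            unassigned += hits  # KO absent from the lookup table
            continue
        for l1 in {l1 for l1, _ in categories}:
            level1[l1] += hits
        for pair in set(categories):
            level2[pair] += hits
    return level1, level2, unassigned

# Hypothetical hit counts for one metagenome
hits = {"KO_A": 10, "KO_B": 5, "KO_C": 3, "KO_UNKNOWN": 2}
l1, l2, missing = summarize_kos(hits, HIERARCHY)
print(dict(l1))
print({f"{a} / {b}": n for (a, b), n in l2.items()})
print("unassigned:", missing)
```

Level-2 totals within one level-1 category (here 15 + 5 = 20) can exceed the level-1 total (15) because KO_B sits in two sub-pathways, so level-2 percentages should not be assumed to sum to the level-1 share.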
Mining for carbohydrate active enzymes
To gain insight into the carbohydrate-related enzymes in the metagenomes, ORFs predicted by the SqueezeMeta pipeline were searched against the CAZy database. The metagenome sequences were distributed among four major CAZy classes: glycosyltransferases (GTs), glycoside hydrolases (GHs), polysaccharide lyases (PLs) and carbohydrate-binding modules (CBMs) [25]. The GH class had the highest proportion (>80%), followed by GT, PL and CBM (Fig. 6a). GH13 was the most abundant family (33%) within the GH class, and GH2, GH3, GH57 and other families were less abundant (Fig. 6b). CAZymes are reported to be involved in the catabolism of polysaccharides, and GHs and GTs play important roles in energy metabolism, communication and structural support in organisms [40]. GHs catalyze the hydrolysis of O-glycosidic bonds and efficiently degrade cellulose, chitin and other carbohydrates abundant in the habitat. Among the GHs, complete gene sequences were present for debranching enzymes (GH13, GH16, GH23, GH51, GH67, GH94, GH97), hemicellulose-degrading enzymes (GH2, GH3, GH4, GH5, GH10, GH26, GH31, GH35, GH38, GH43, GH53) and cellulases (GH3, GH5, GH9) (Fig. 6c). GTs are responsible for the glycosylation of amino acids during cellular processes [41].
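Class and family proportions like those in Fig. 6a and 6b can be derived by parsing the CAZy family label of each annotated ORF (for example GH13 or CBM48) into its class prefix and family number. The sketch below does this with a regular expression, reports each class as a percentage of all CAZy hits, and gives a family's share within its class. The input labels are hypothetical; the regex also accepts the CE and AA classes defined by CAZy, and its tolerance of subfamily suffixes (GH13_20) and stray spaces ("GH 94") reflects assumptions about how annotation labels may be formatted.

```python
import re
from collections import Counter

# Class prefix followed by a family number; subfamily suffixes (e.g. "_20") are ignored.
CAZY_LABEL = re.compile(r"^(GH|GT|PL|CE|AA|CBM)\s*(\d+)")

def cazy_profile(labels):
    """Return (class percentages, family counts) from CAZy family labels."""
    classes, families = Counter(), Counter()
    for label in labels:
        match = CAZY_LABEL.match(label.strip())
        if match is None:
            continue  # skip labels that are not CAZy families
        cls, number = match.group(1), int(match.group(2))
        classes[cls] += 1
        families[f"{cls}{number}"] += 1
    total = sum(classes.values())
    class_pct = {cls: 100.0 * n / total for cls, n in classes.items()} if total else {}
    return class_pct, families

def share_within_class(families, family):
    """Percentage of a class's hits that belong to one family, e.g. GH13 within GH."""
    cls = CAZY_LABEL.match(family).group(1)
    class_total = sum(n for f, n in families.items() if CAZY_LABEL.match(f).group(1) == cls)
    return 100.0 * families[family] / class_total

# Hypothetical ORF annotations (not data from this study)
labels = ["GH13", "GH13_20", "GH2", "GH 94", "GT4", "CBM48", "PL1", "unknown"]
class_pct, families = cazy_profile(labels)
print({cls: round(p, 1) for cls, p in class_pct.items()})
print("GH13 share of GH:", share_within_class(families, "GH13"))
```

Collapsing subfamilies into their parent family keeps family counts comparable with family-level figures; a subfamily-level analysis would instead keep the full label.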