Exploring the bacterial community composition of soil from a tropical dry evergreen forest in Tamil Nadu, India


Metagenomics is a cutting-edge omics technology that has been employed in various fields, including novel product discovery, diagnostics, and pollutant monitoring. 16S amplicon sequencing is used to characterize microbial diversity in a wide range of environments. Forest ecosystems are known sources of novel bacteria, including bacteria that produce novel compounds of pharmaceutical and industrial relevance. In this study we describe the bacterial community structure of soil obtained from a tropical dry evergreen forest in India. We used 16S metagenomic sequencing followed by analyses such as alpha diversity analysis to identify the dominant bacterial taxa in these soils. Actinobacteria was the most abundant bacterial phylum, followed by Proteobacteria, Firmicutes, Chloroflexi, Acidobacteria, Verrucomicrobia, Bacteroidetes, Gemmatimonadetes, Nitrospirae and other unclassified organisms. Further studies can explore the discovery of novel compounds from these bacteria.


Introduction
Omics-based technologies have become very important for understanding life, its various functions and its marked differences. Metagenomics is an emerging omics field that has been applied in many areas, including the discovery of novel enzymes, the diagnosis of pathogenic microbes, environmental pollutant monitoring, and the discovery of novel therapeutic compounds. It involves extracting the total DNA from a sample and then sequencing it [1]. To understand the species diversity in a given environment, 16S metagenomic sequencing is performed [2].
The tropical dry evergreen forests are home to many endemic plant and animal species. These forests harbour a wide diversity of life, much of which remains unexplored, and their flora includes a number of important medicinal plants. Much of the forest land has been cleared to make way for human habitation. The remaining forest land should be well conserved, and it is therefore very important to improve our understanding of these forests [3]. The dry evergreen forests lie at an altitude of about 200-1500 metres and receive about 2500-5000 mm of rainfall. The nutrient composition of the soil of the tropical dry evergreen forests has been studied by other research groups. The soil is expected to be rich in nutrients because of the continuous supply of fallen leaves that decompose into it. It was found to contain 273 kg ha⁻¹ of nitrogen, 21 kg ha⁻¹ of phosphorus and 143 kg ha⁻¹ of potassium, which are quite high [4]. One aspect that has not been explored is the microbial diversity of these soils. Knowledge of the microbial diversity can help us better understand the soil composition and the interactions between the bacteria and the plants.
In this study we analyze the diversity of bacteria present in soil taken from a tropical dry evergreen forest in Tamil Nadu by 16S metagenomic sequencing.

1. Sample collection
Soil samples were collected from a tropical dry evergreen forest (Vallam Reserve Forest) in Chengalpattu, in the state of Tamil Nadu, India.

2. Sequencing preparation
Total genomic DNA was extracted from the samples using the CTAB/SDS method. DNA concentration and purity were monitored on 1% agarose gels. According to the concentration, DNA was diluted to 1 ng/µL using sterile water. Distinct regions of the 16S rRNA gene (16S V3-V4) were amplified using specific primers carrying barcodes. All PCR reactions were carried out with Phusion® High-Fidelity PCR Master Mix (New England Biolabs). PCR products were mixed with an equal volume of 1X loading buffer (containing SYBR green) and run on a 2% agarose gel for detection. Samples with a bright main band between 400 and 450 bp were chosen for further experiments. PCR products were mixed in equidensity ratios, and the pooled products were purified with the Qiagen Gel Extraction Kit (Qiagen, Germany). Libraries were generated with the NEBNext® Ultra™ DNA Library Prep Kit for Illumina, quantified via Qubit and Q-PCR, and sequenced on an Illumina platform.

Sequencing data processing and analysis
Paired-end reads were assigned to samples based on their unique barcodes and truncated by cutting off the barcode and primer sequences. Paired-end reads were merged using FLASH (V1.2.7, http://ccb.jhu.edu/software/FLASH/) [5], a fast and accurate tool designed to merge paired-end reads when at least some of the reads overlap the read generated from the opposite end of the same DNA fragment; the merged sequences were called raw tags. Quality filtering of the raw tags was performed under specific filtering conditions to obtain high-quality clean tags [6], following the QIIME (V1.7.0, http://qiime.org/scripts/split_libraries_fastq.html) [7] quality control process. The tags were compared with a reference database (Gold database, http://drive5.com/uchime/uchime_download.html) using the UCHIME algorithm (http://www.drive5.com/usearch/manual/uchime_algo.html) [8] to detect chimeric sequences (http://www.drive5.com/usearch/manual/chimera_formation.html), which were then removed [9] to obtain the effective tags. Sequence analysis was performed with UPARSE (v7.0.1001, http://drive5.com/uparse/) [10] using all the effective tags. Sequences with ≥ 97% similarity were assigned to the same OTU. A representative sequence for each OTU was selected for further annotation. Each representative sequence was annotated with Mothur against the SSU rRNA database of SILVA (http://www.arb-silva.de/) [11] at each taxonomic rank (threshold: 0.8 ~ 1) [12] (kingdom, phylum, class, order, family, genus, species). To obtain the phylogenetic relationships among the OTU representative sequences, MUSCLE [13] (Version 3.8.31, http://www.drive5.com/muscle/) was used for rapid multiple sequence alignment. OTU abundance information was normalized to the sequence count of the sample with the fewest sequences.
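The normalization step at the end of this paragraph can be illustrated with a minimal sketch. It assumes that normalization means random subsampling without replacement (rarefying) of each sample down to the read count of the smallest sample; the pipeline's exact procedure is not stated in the text. The function names, sample names and counts below are hypothetical and serve only as an illustration.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one OTU label per read, then draw `depth` reads.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = {otu: 0 for otu in counts}
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


def normalize_to_min_depth(table, seed=0):
    """Rarefy every sample to the read count of the smallest sample."""
    depth = min(sum(counts.values()) for counts in table.values())
    return {name: rarefy(counts, depth, seed) for name, counts in table.items()}


# Hypothetical OTU table: sample name -> {OTU id -> read count}.
example_table = {
    "sample_A": {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20},
    "sample_B": {"OTU_1": 10, "OTU_2": 25, "OTU_3": 5},
}
normalized = normalize_to_min_depth(example_table)
```

After this step every sample holds the same number of reads (40 in the toy table), so richness and diversity values are compared at equal sequencing depth.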
Subsequent alpha diversity analysis was performed on these normalized data. Alpha diversity describes the complexity of species diversity within a sample and was assessed through six indices: observed species, Chao1, Shannon, Simpson, ACE and Good's coverage. All indices were calculated with QIIME (Version 1.7.0) and displayed with R software (Version 2.15.3).
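As an illustration of what these indices measure, the sketch below computes five of them from a vector of OTU read counts using common textbook definitions: the bias-corrected Chao1 estimator, Shannon entropy in base 2, Simpson's index as one minus the sum of squared proportions, and Good's coverage as one minus the fraction of reads belonging to singleton OTUs. These formulas are assumptions about the conventions used and are not taken from the study's pipeline; ACE is omitted for brevity. The function name and the input counts are hypothetical.

```python
import math


def alpha_diversity(otu_counts):
    """Return several alpha diversity indices for one sample's OTU counts."""
    counts = [c for c in otu_counts if c > 0]
    if not counts:
        raise ValueError("sample contains no reads")
    n_reads = sum(counts)
    observed = len(counts)                        # observed species (OTUs)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    proportions = [c / n_reads for c in counts]
    return {
        "observed_species": observed,
        # Bias-corrected Chao1 richness estimate.
        "chao1": observed + singletons * (singletons - 1) / (2 * (doubletons + 1)),
        # Shannon entropy in bits (log base 2).
        "shannon": -sum(p * math.log2(p) for p in proportions),
        # Simpson's index expressed as 1 - sum(p_i^2).
        "simpson": 1 - sum(p * p for p in proportions),
        # Good's coverage: share of reads not belonging to singleton OTUs.
        "goods_coverage": 1 - singletons / n_reads,
    }


# Hypothetical sample with six OTUs and 20 reads.
indices = alpha_diversity([10, 5, 2, 1, 1, 1])
```

For four equally abundant OTUs, for example `alpha_diversity([4, 4, 4, 4])`, the Shannon index is 2 bits and the Simpson index is 0.75, which reflects a perfectly even community.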

Results and Discussion
The amplicons were sequenced using Illumina paired-end chemistry on an Illumina platform to generate 250 bp paired-end raw reads (Raw PE), which were then assembled and pretreated to obtain Clean Tags. The chimeric sequences in the Clean Tags were detected and removed to obtain the final Effective Tags. To analyze the species diversity in each sample, all Effective Tags were grouped into OTUs (Operational Taxonomic Units) at 97% DNA sequence similarity. The heatmap gives a view of species composition and abundance among samples. Figure 1a shows the OTU table heatmap for the sample SN1. Figure 1b shows the SN1 OTU annotation tree constructed using GraPhlAn [14]. Specific species (by default, the top 10 genera by relative abundance) were selected to build the taxonomy tree; the taxonomy tree for sample SN1 is shown in Fig. 2a. KRONA [15] visually displays the results of species annotation. Circles from inside to outside stand for different taxonomic ranks, and the area of each sector represents the proportion of the corresponding OTU annotation result. The Krona graph for SN1 is shown in Fig. 2b.
The soil of the tropical evergreen forest was found to be composed mainly of Actinobacteria, followed by Proteobacteria, Firmicutes, Chloroflexi, Acidobacteria and Bacteroidetes. Figure 3a shows the relative abundance of the top ten phyla and Fig. 3b shows the relative abundance of the top ten genera. The top ten phyla in descending order of abundance were Actinobacteria, Proteobacteria, Firmicutes, Chloroflexi, Acidobacteria, Verrucomicrobia, Bacteroidetes, Gemmatimonadetes, Nitrospirae and other unclassified organisms. The top ten genera by relative abundance were Bacillus, Acidothermus, Streptomyces, Solirubrobacter, an unidentified genus, Sphingomonas, Pseudonocardia, Geodermatophilus, Microvirga and Bradyrhizobium. A majority of the genera are unclassified. Figure 4 shows the evolutionary tree of the various bacterial genera.
Alpha diversity, which reflects the complexity of species diversity within a sample, was assessed through six indices: observed species, Chao1, Shannon, Simpson, ACE and Good's coverage. Table 1 shows the alpha diversity index values for the sample. Rarefaction curves and rank abundance curves are widely used to indicate the biodiversity of samples. A rarefaction curve is created by randomly selecting a certain amount of sequencing data from a sample and counting the number of species it represents. If the curve is steep, many species remain to be discovered. If the curve flattens, a credible amount of sequencing data has been sampled, meaning that only rare species remain to be detected. A rank abundance curve displays relative species abundance and can also be used to visualize species richness and evenness. It overcomes the shortcoming of biodiversity indices, which cannot show the role that the individual variables play in their assessment [16]. Figure 5a shows the rarefaction curve for the sample SN1. Figure 5b shows the rank abundance curve for the sample SN1. The rarefaction curve indicates that much of the diversity in the sample has been captured.
Actinobacteria have previously been isolated from soil samples taken from various areas, including forest soil [17]. Many novel actinobacteria have been isolated from forest soil [18]. Actinobacteria are generally known to produce various bioactive compounds, such as antibiotics, that are of both pharmaceutical and industrial relevance. Many of the bacteria isolated from different forest environments have been found to produce such compounds [19,20]. Hence it is important to further study the bacteria found in the tropical evergreen forests of India.
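The construction of the rarefaction and rank abundance curves described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions: reads are subsampled without replacement at increasing depths, and the number of observed OTUs is averaged over a few random draws. It is not the plotting code used for Fig. 5, and the function names, parameters and counts are hypothetical.

```python
import random


def rarefaction_curve(counts, step, repeats=10, seed=0):
    """Return (depth, mean observed OTUs) pairs for increasing read depths."""
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    depths = list(range(step, len(reads) + 1, step))
    if not depths or depths[-1] != len(reads):
        depths.append(len(reads))  # always include the full sample
    curve = []
    for depth in depths:
        # Count distinct OTUs in each random subsample of `depth` reads.
        observed = [len(set(rng.sample(reads, depth))) for _ in range(repeats)]
        curve.append((depth, sum(observed) / repeats))
    return curve


def rank_abundance(counts):
    """Return relative abundances sorted from most to least abundant OTU."""
    values = [n for n in counts.values() if n > 0]
    total = sum(values)
    if total == 0:
        return []
    return sorted((n / total for n in values), reverse=True)


# Hypothetical sample: OTU id -> read count.
example_counts = {"OTU_1": 6, "OTU_2": 3, "OTU_3": 1}
curve = rarefaction_curve(example_counts, step=4, repeats=5, seed=1)
ranks = rank_abundance(example_counts)
```

Plotting the `curve` pairs gives the rarefaction curve, whose flattening indicates that additional reads add few new OTUs. Plotting `ranks` against rank position (usually on a logarithmic abundance axis) gives the rank abundance curve, where a longer and flatter curve indicates higher richness and evenness.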

Conclusion
In this study we characterized the bacterial community composition of soil taken from a tropical evergreen forest in Tamil Nadu. Actinobacteria was the most abundant bacterial phylum in this soil. These results can provide valuable insights into soil composition, soil nutrient status and plant-microbe interactions, and can also inform the discovery of novel compounds.

Figure 1. (a) OTU table heatmap for the sample SN1. (b) SN1 OTU annotation tree constructed using GraPhlAn.
Figure 4. Evolutionary tree of the various bacterial genera.
Figure 5. (a) Rarefaction curve for the sample SN1. (b) Rank abundance curve for the sample SN1.