Field of study, sample collection and soil testing
We have chosen two popular tea gardens from the Darjeeling hill region- Makaibari (26.8716° N, 88.2678° E) and Casselton (26.8659° N, 88.2777° E). The distance between these two tea gardens is only 12 min (4.0 km) via NH110 and they were on the same side of the hill. Makaibari is solely an organic tea using organic fertilizers and manure whereas; Casselton uses chemical pesticides and chemical fertilizers. Soil samples were randomly collected from the rhizosphere region of tea plants. Debris from the samples like roots, pebbles, etc. was removed by hand. Soil texture was assessed by the field method. The moisture percentage of soil samples was determined from the difference in weight of freshly collected and oven-dried soil samples. The clean air-dried samples were passed through a sieve and crushed with mortar and pestle. Soil pH, Electrical conductivity, and Loss of ignition were estimated following the protocol of Baruah and Barthakur, 1997). Other important parameters like organic carbon (Walkey and Black, 1974), total soil Nitrogen (Jackson, 1973) phosphorus as phosphate (Baruah and Barthakur, 1997; Jackson, 1973; Bray and Kurtz, 1945) and potassium (Chapman and Pratt, 1961), sulphur were determined during soil analysis. The level of micronutrients was qualitatively assessed by micronutrient kit. Information regarding the health of the tea garden workers were collected from a survey based approach. The persons directly associated with the tea workers health of both organic and inorganic manure based tea gardens were interviewed for collecting information regarding the present health scenario of those gardens.
DNA isolation
Soil DNA was isolated using rhizosphere soil of Makaibari and Casselton tea garden. Before initiating the isolation, 2 g of respective soil (three replicates per sample) was mixed with 4 ml of 1X Tris-EDTA buffer followed by proper vortexing (4-5 min) in 50 ml Oakridge tube. Before cell lysis, 150 µl of lysozyme (50 mg/ml), 100 µl of Proteinase K (20 mg/ml) and 600 µl of freshly prepared 10% SDS were mixed with the soil samples and the sample incubated at 37ºC for 90 min with gentle shaking every 15 min interval. After incubation, 1 ml 5M sodium chloride, 1.6 ml CTAB/NaCl was mixed with respective solution and further incubated at 65ºC for 30 min with occasional mixing in between to release the DNA from microbial cells. The supernatant containing microbial DNA was extracted with chloroform–isoamyl alcohol (24:1, v/v) and collected in a new tube after centrifugation at 6000 rpm for 15 min at room temperature. The aqueous phase containing DNA was precipitated with 0.6 volumes of cold isopropanol and 0.1 volumes of 3M sodium acetate followed by 2 h incubation at -20ºC. The DNA pellets were obtained by centrifugation at 10,000 rpm for 30 min at 4ºC, washed with cold 70% ethanol, and dissolved in 100 µl of 1X Tris-EDTA buffer. To evaluate the purity of the extracted DNA, absorbance ratios at 260 nm/280 nm (DNA / protein) were determined and the DNA was sent for 16s Metagenomics amplicon sequencing (V3-V4) to Genotypic Technology Pvt. Ltd.3. 16s Metagenomics analysis
This analysis has been performed using the Parallel-META pipeline (version 3.5; Jing, et al., 2017).
Taxonomic and functional profiling
Parallel-MEA 3 initially constructs Hidden Markov Models (HMM) using all bacterial 16S rRNA sequences of SILVA (version 123; Pruesse et al. 2007) and forecasts the 16S rRNA gene fragments in metagenomic shotgun samples from both the forward sequences and reversed complementary sequences by HMMER (version 3.1, e-value < 1e-5). Then Parallel-META 3 extracts all the 16S rRNA fragments from metagenomic shotgun sequences for profiling. All 16S rRNA gene sequences (either extracted from shotgun sequences or 16S rRNA amplicon reads) are aligned to the Parallel-META 3 reference database by Bowtie2 (Langmead et al., 2012) for OTU picking, taxonomical annotation and phylogeny construction. The reference 16S rRNA sequences are from a customized database that integrates GreenGenes (version 13-8, sequence similarity on 97% level; DeSantis et al., 2006) with RDP and SILVA consensus taxonomy annotation (assigned by BLASTN with e-value < 1e-30 and similarity >97%), which raised the proportion of annotated sequences at the genus level from 35.8% to 81.5%. The phylogenetic architecture of all reference sequences is built by FastTree (Price et al., 2010). Since the 16S rRNA gene copy number varies greatly among different bacterial species, Parallel-META 3 also calculates the precise relative abundance of each organism by 16S rRNA copy number calibration using IMG database (Markowitz et al., 2012). Besides, considering that the uneven sequencing depth (number of sequences) among multiple samples may introduce bias in detecting diversity patterns (Koren et al., 2013), an optional sequence rarefaction for sequencing depth normalization at the OTU level is provided after the taxonomic profiling.
For prediction and annotation of the functional profile, Parallel-META 3 re-implements the PICRUSt (Langille et al., 2013) algorithm using the KEGG database (Kanehisa et al., 2012) to estimate all the functional genes harbored in a microbiota using 16S rRNA gene OTUs. The functional genes are annotated by KO (KEGG Ontology) and KEGG pathway. Parallel-META 3 also measures the prediction accuracy by the NSTI (Nearest Sequenced Taxon Index) value (Langille et al., 2013), which is calculated by the sum of distances between OTUs and their nearest individually sequenced relatives in the phylogenetic architecture. (Jing, et al., 2017)
After taxonomical and functional profiling, Parallel-META 3calculatedthe sequence counts and relative abundances (normalized into 0–100%) for all OTUs, and estimates the same information for annotated taxa from the phylum level to the genus level, as well as the genes and the pathways. Such data is structured into tables compatible for further analysis in Parallel-META 3 and also suitable for manual examination by users. (Jing, et al., 2017)
The α diversity evaluation and statistics
Parallel-META 3 estimates α diversity describing the inner complexity of each microbiota sample. This process generates rarefaction curves of α diversity based on observed OTU number and Shannon index to determine the adequacy of the sequencing depth. The rarefaction performs a series of random sequence selection on different sequencing depth with bootstrap (default = 20), and α diversity in the curves is calculated by the mean sequence count of each OTU among the bootstrapping procedures. Then the influence of each environmental factor on α diversity is quantitatively evaluated by multivariate statistical analysis with Shannon index, Simpson index and CHAO1 index. (Jing, et al., 2017)
Parallel-META 3 examines β diversity of multiple microbiota samples based on their pair-wise similarity matrix to discover the patterns of organism/gene sharing and variation among samples. The quantitative similarity between each sample pair is computed by Meta-Storms (Su et al., 2012) algorithm, which considers both the relative abundance of OTUs existent in two samples and the distances among OTUs in the phylogenetic architecture. The β diversity evaluation includes unsupervised hierarchical clustering, supervised clustering using PCA (Principal Component Analysis), PCoA (Principal Co-ordinate Analysis) and multivariate statistical analysis that quantitatively evaluates the correlation between environmental factors and the sample similarities. (Jing, et al., 2017)
Biomarker discovery
Parallel-META 3 can categorize key organisms or functional genes that are highly correlated with the variations of the habitats or other types of metadata. Organisms or genes with significant differences among microbial community samples were firstly chosen using Kruskal-Wallis or Wilcoxon rank-sum test as candidate makers, and these candidate makers are then ranked based on their contribution to the differentiation among samples using the Random Forest algorithm. (Jing, et al., 2017)
Construction of microbial interaction network
The microbial interaction network is constructed to explore co-occurrence and co-exclusion patterns of organisms or functional genes across microbial community samples. In the interaction network, each node represents a single organism (or gene), and nodes are connected by links that represent their correlation coefficient of abundance variation among multiple samples (Faust, et al., 2012). Subsequently, Parallel-META 3 exhibits the global pattern among numerous samples by network’s topological characters such as number of nodes, number of isolated islands, diameter, density, centralization and radius. (Jing, et al., 2017)
16S rRNA sequencing
The 16S rRNA sequencing was done using Illumina™ Nextseq platform. The raw reads qualities were checked using FastQC. Mean quality score for each base, per sequence quality score, per sequence GC contents and per base, N contents are calculated. Adapter contamination was present in less than 0.1% reads; therefore, adapter trimming was not performed.
Downstream analysis of the metagenomic sequencing
MG-RAST server was used to analyze the high quality reads from each sample for taxonomic profiling and metagenomic data. We intended to do an in-silico ecological study based on reverse ecology analysis. Hence we selected the type strain from each of the genus identified in both the samples and downloaded their metabolic information from the KEGG database. RevEcoR, an R-based package (Cao et al. 2016) was used to assess the reverse ecological interactions among each set of microbes in terms of competition and complementation.