Insights into endophytic bacterial diversity of rice grown across the different agro-ecological regions of West Bengal, India

Endophytes have recently garnered importance worldwide and multiple studies are being conducted to understand their important role and mechanism of interaction inside plants. But before we indulge in their functions it is necessary to dig into the microbiome. This will help to get a complete picture of the microbes intrinsic to their host and understand changes in community composition with respect to their habitats. To fulfil this requirement in our study we have attempted to dissect the endophytic diversity in roots of rice plant grown across the various agro-ecological zones of West Bengal by undergoing amplicon analysis of their 16S rRNA gene. Based on the measured environmental parameters agro-ecological zones can be divided into two groups: nutrient dense groups, representing zones like Gangetic, Northern hill and Terai-Teesta zone characterised by soil with higher levels of nitrogen (N) and total organic carbon and nutrient low groups representing Coastal saline, Red-laterite and Vindhyan zone mainly characterised by high electroconductivity and pH. Gammaproteobacteria, Alphaproteobacteria, Bacilli and Bacteroidetes were mostly abundant in nutrient dense sites whereas Clostridia and Planctomycetes were concentrated in nutrient low sites. Few genera (Aeromonas, Sulfurospirillum, Uliginosibacterium and Acidaminococcus) are present in samples cultivated in all the zones representing the core microbiome of rice in West Bengal, while some other genera like Lactococcus, Dickeya, Azonexus and Pectobacterium are unique to specific zone. Hence it can be concluded that this study has provided some insight in to the endophytic status of rice grown across the state of West Bengal.


Introduction
Rice is the most important staple food for more than half the world's population. India is one of the leading producers and consumer of rice and the state of West Bengal is one of the highest producers of rice in India (Kunda et al. 2018). In spite of being the leading producer, rice production is impeded by several factors. In West Bengal, which can be broadly divided into six agro-ecological zones, rice is cultivated in almost all the zones but the production of rice varies from one zone to another (Adhikari et al. 2011).
The difference can be attributed due to geographical variation. The regions comprising the old and new alluvial soil i.e. Gangetic Alluvial zone and Vindhyan Alluvial zone are the main rice producing areas (Ghosh et al. 2005) whereas the western part comprising the Red and Laterite zone are drought prone regions and the extreme southern part has Coastal Saline zone which is frequently washed by brackish water from the Bay of Bengal making it saline thus affecting rice production (Adhikari et al. 2011). Production of rice is also affected since many zones fall under flood-prone ecosystem (Adhikari et al. 2011).
In recent years, the use of endophytes for improving rice productivity has received considerable attention (Edwards et al. 2015;Mashiane et al. 2017). Role of bacterial endophytes of rice in stimulating plant growth and contributing to sustainable rice production are well documented (Hardoim et al. 2011;del Castillo et al. 2015). Uses of indigenous microbes to improve rice plant growth have also been established (Gholamalizadeh et al. 2017;Ji et al. 2014;Shabanamol et al. 2018).
For a better understanding of the associations between plants and microbes, it is imperative to identify the plant endophytic microbiomes (Kunda et al. 2018). Hence, studying endophytic diversity becomes the foremost requisite in understanding the association of bacteria with plants. Since bacteria have established a close bilateral interaction with the surrounding environment, studying their diversity could play a crucial role in understanding the biodiversity-ecosystem functioning and various plant-related processes (Khare et al. 2018;Kumar et al. 2018).
Studies on total microbial diversity using metagenomic methods have provided knowledge on rice endosphere composition and function as well as their dynamic changes (Sessitsch et al. 2012). In India, research on the diversity of bacteria associated with rice are either based on diversity of rice seed endophytes (Chaudhry et al. 2017;Verma et al. 2001) or from rice plants grown under aerobic condition (Vishwakarma and Dubey 2020) or from any single variety (Banik et al. 2017.;Sengupta et al. 2017). These works can be characterised as one dimensional, focusing on a single site with a limited sample size. Even though endophytic microbes of many plants are being studied but our knowledge of rice endophytic bacterial diversity is limited. Abundance and diversity of bacterial communities in rice fields that affect crop growth are not well documented (Ahn et al. 2016). In view of the above, the aim of our study was to provide information regarding the rice endophytic community across different agro-ecological zones, of West Bengal. Since agro-ecological zones are diverse, this study will provide a baseline on various endophytes that colonize rice plants grown throughout West Bengal. As our sampling covers different ecological zones, we hypothesized that different endophytic communities will colonize rice plants due to the influence of the environmental variables that characterize these ecological zones. Hence, the need arises for search of potential endophytes that comprise the core microbiomes of different regions so that formulating endophyte-based stress management becomes easy.
Therefore the objectives of our study were: (i) to characterize the microbial community among the six different agro-ecological regions based on 16S rRNA gene sequencing; (ii) to identify differences in the community composition among the zones, i.e. characterizing unique and core microbiomes of the different zones; and (iii) to examine the relationship of endophytic bacterial communities with the environmental covariates.

Site description and sample collection
Rice (Oryza sativa) plants were sampled at the vegetative stage in the year 2016-2017 during Spring (March) and Autumn (September) from the six different agro ecological regions of West Bengal (Fig. 1), namely, Coastal saline zone (CSZ), Gangetic alluvial zone (GAZ), Northern Hill zone (NHZ), Red and Laterite zone (RLZ), Terai-Teesta alluvial zone (TTAZ) and Vindhyan alluvial zone (VAZ). From each zone, three rice fields were sampled and three plants were randomly collected from each field along with their respective rhizospheric soil. The rice varieties that were primarily cultivated in those particular sampling sites were sampled. For example, the local name of the cultivar from CSZ is Lal Miniket, GAZ is Swarnamasuri, NHZ is Kalo Nunia, RLZ is MTU1010, TTAZ is Khitish and VAZ is Tualipanji. The plants were dug out carefully to prevent any damage to the roots. Immediately after collection the samples were placed in autoclaved plastic bags (Himedia), kept on ice and brought back to the laboratory for further processing within 24 h. Rice roots of plants from the same field were pooled together for DNA extraction.

Analysis of soil physicochemical properties
The physical and chemical properties including pH, electrical conductivity (EC), total organic carbon (TOC), available nitrogen (N), available phosphorus (P) and available potassium (K) of the soil samples were also measured. pH and electrical conductivity of the soil samples was determined using Systronic (India) pH meter model No. 335 and Systronic (India) conductivity -TDS meter 307, respectively. The organic carbon (OC) of soil was estimated by titration following the method of Walkley and Black 1934. The quantity of nitrogen was measured using Kjeldahl's method. The quantity of available phosphorous of soils was determined by following Olsen's method by spectrophotometric analysis using spectrophotometer (Systronics -117, India) (Olsen et al. 1954). Available potassium was determined from the neutral normal ammonium acetate (NH 4 OAc) extracts (soil: NH 4 OAc, ratio of 1:10) by Flame photometer (Rathje 1959).

Metagenome extraction and amplicon metagenomic sequencing
Surface sterilization of roots was done following the protocol by Sessitsch et al. (2012), with few modifications which has been established in our previous paper (Kunda et al. 2018). Briefly, soil particles were removed by washing and then roots were separated from the shoot portion, rinsed with sterile distilled water followed by 0.1% tween 20 solutions and surface sterilized. The surface sterilized roots were then frozen with liquid nitrogen and grounded to a fine powder using sterile mortar and pestle. DNA was extracted in duplicates using Power Plant Pro DNA isolation kit (Mo Bio) following manufacturer's instructions. The replicated DNA samples were pooled together and sequenced by Eurofins. Sequencing was performed on Illumina Miseq platform in a 2 × 300 bp paired-end run. PCR amplification of the hypervariable V3-V4 regions of bacterial 16S rRNA gene was done with universal primers 341F and 806R and multiplexing index sequences as well as common adapters required for cluster generation (P5 and P7) as per the standard Illumina protocol. The amplicon libraries were prepared using Nextera XT Index kit (Illuminainc.) and Nextera XT DNA Library Prep Kit (Part # 15044223 Rev. B). The amplicon libraries were purified by 1× AMpure XP beads and quantified using Qubit fluorometer. The raw paired-end primer trimmed sequences were provided by Eurofins, Germany.

Sequence data processing
For all the samples the raw FastQ dataset (R1-forward read & R2-reverse read) were processed following (Hassenrück et al. 2016) and  protocol. Sequences were trimmed based on minimum quality score of 15 and window size of 4 bases using trimmomatic v0.32. The trimmed sequences were then merged using PEAR v0.9.5 and OTU Fig. 1 Map of sampling sites. Map of West Bengal showing the sampling sites with different colour codes. Each site belonged to a particular agroecological zone. The map was created with the help of ArcGIS (operational taxonomic unit) clustering was performed using swarm v2.0 with default parameters. The quality filtered OTUs were taxonomically assigned using SINA (SILVA Incremental Aligner; v1.2.11; Silva reference database release 132) with a minimum similarity alignment of 0.9. OTUs assigned as mitochondrial /and chloroplast were excluded from further studies using well standardized R scripts (Hassenrück et al. 2016).

Statistical analysis
A principal component analysis (PCA) was performed on the environmental parameters to evaluate the effects of these parameters i.e. pH, EC, N, P, K and TOC on sampling sites. Manova (multivariate analysis of anova) was done to check differences in the parameters among the sampling sites.
Alpha diversity indices were calculated to evaluate species richness and evenness of bacterial community composition in all the samples. The α-Diversity indices were measured using repeated random sub-sampling of the amplicon sequence datasets. Species richness and evenness were represented by OTU number, Chao 1 estimator, Shannon diversity index, inverse Simpson diversity index, percentage of absolute (occurs only once in the complete data set) and relative singletons (occurs only once in one sample in the complete data set) as well as absolute doubletons (occurs only twice in the complete data set). Differences in bacterial communities as indicated by alpha diversity among the agro-ecological zones were tested with a paired Wilcoxon test. P-values of pair wise comparisons based on Wilcoxon tests were adjusted using FDR correction.
Beta diversity as indicated by the differences in composition of the bacterial communities among the samples were visualized by cluster analysis and non-metric multidimensional scaling (NMDS) using a Bray-Curtis dissimilarity matrix calculated separately from the OTU data of the different agro-ecological zone samples. In case of cluster analysis Bray-Curtis dissimilarities were calculated based on relative sequence abundances of OTUs. Analysis of similarity (ANOSIM) was tested to assess the separation of bacterial communities among the sampling sites based on similar environmental parameters. Redundancy analysis (RDA) assessed the ability of environmental parameters to explain the variation in bacterial community composition. Prior to RDA, the data set was reduced by removing OTUs with low sample coverage and rare OTUs i.e. OTUs that did not occur in at least 2 replicates in each location per sample type and those that were not present in less than 10% of the samples were removed from the dataset. Although this removal affected 83% of the OTUs we confirmed that this process did not affect the trends in beta diversity as is given by mantel test. Furthermore, the sequence counts were clr-transformed with the aldex.clr function of the R package ALDEx2, using the median of 128 Monte Carlo Dirichlet instances. Prior to significance testing parameters were excluded using forward model selection and best fitting RDA models were selected based on maximum adjusted R 2 and minimum AIC value (Akaike Information Criterion, which estimates the quality of statistical models based on given datasets). Variance inflation factors of the explanatory variables in the best-fitting models were below 10. The differentially abundant OTUs among the zones were reflected in Dotplot prior to this test. The unique and common genera among the agro-ecological zones were identified by using Venn diagram (Bardou et al. 2014). Those genera that had abundance less than 0.5% were not included in the Venn diagram.
All statistical analysis and figure visualization was performed in R software package, version 3.6.2 using the R core distribution (R Core Team 2019) along with additional packages vegan (Oksanen et al. 2016) and ALDE × 2 (Fernandes et al. 2014).

Nucleotide accession number
The raw sequence data reported in this paper were submitted to NCBI with Bioproject accession numbers for 16S rRNA gene sequences as following: PRJNA471586, PRJNA471587, PRJNA471590, PRJNA471599, PRJNA471617 and PRJNA394071.

Environmental parameters
Studying the environmental parameters of the different agro-ecological zones revealed that all the six parameters tested, i.e., pH, electroconductivity (EC), total organic carbon (TOC), available phosphorus (P), potassium (K) and nitrogen (N), differs significantly among the zones (Supplementary Fig. 1) as revealed by Manova test (Manova, Pillai = 4.16, df = (5,12), p-value = 0.0001). PCA analysis showed that the first two axes, PC1 and PC2, could explain 73.2% variation in the samples based on environmental parameters. pH, EC, available P, K and N were the major contributors to PC1 which accounted for 50.5% of the total variation. Similarly, 22.7% variation in PC2 is mainly attributed to TOC (Fig. 2). Based on the PCA patterns the sampling sites that clustered together belonged to a particular agro-ecological zone. CSZ was distinctly separated from rest of the zones based on all the aforementioned environmental parameters with emphasis on EC and N. It recorded the highest EC value of 5.61 dS/m and the lowest N value of 21.99 ppm in average. Samples from NHZ and TTAZ (both belonging to the northern part of West Bengal) were different from the other zones with respect to their TOC and N content. These two zones recorded highest values of TOC (1.45% and 1.65%) and N (70 ppm and 73 ppm) respectively. Likewise, the separation of VAZ was mainly driven by pH as it has recorded the highest pH among all the sampling sites which is 6.9 (close to neutral) on average. In rest of the zones pH values are in the range of 5, indicating slightly acidic soil. PCA pattern corroborated the separation of sampling sites was based on the studied environmental parameters. Zones like GAZ, NHZ and TTAZ can be categorised as nutrient dense group based on its rich available nutrient profile as well as low EC and pH whereas the other three zones CSZ, RLZ and VAZ labelled by lower nutrients profile and/ either high pH and EC can be regarded as nutrient low groups.

Bacterial diversity and taxonomic composition
A total of 115,73,768 paired end reads were generated by amplification of V3-V4 region of the 16S rRNA gene with average reads of 321,494 (ranging from 181,085 to 507,482) per sample. After trimming and merging the paired end reads high quality reads were clustered using > 97% sequence identity which generated a total of 386,785 OTUs. To avoid rare biosphere and PCR artifacts low abundance OTUs as well as those affiliated to chloroplast and mitochondria were removed which resulted in taxonomically classified denoised unique sequences clustered into 21,608 OTUs (Supplementary Table 1). The OTUs were again pruned and finally 3491 OTUs were obtained. Mantel test was performed using Bray-Curtis dissimilarities method (Mantel test, R = 0.99, p = 0.001) and Jaccard dissimilarity method (Mantel test, R = 0.98, p = 0.001) which proved that the trends in beta diversity was not altered after data pruning.
The Alpha diversity, i.e. within sample diversity, was indicated as rarefied average OTUs per sites. nOTUs from all the samples ranged from 408 to 3197 where GAZ recorded the highest number and CSZ the lowest. This significant difference in species richness among the different agro-ecological zones was given by Kruskal-Wallis test (χ 2 = 12.9, df = 5, p-value = 0.02). CSZ was significantly different from GAZ (p-value = 0.001), RLZ (p-value = 0.02) and TTAZ (p-value = 0.04), whereas GAZ differed significantly from NHZ (p-value = 0.01). Species richness and evenness as indicated by abundance based coverage estimator (invS) and Shannon index also followed the same pattern which was highest for GAZ (21.6 and 4.35) and lowest for CSZ (1.14 and 0.41) respectively ( Supplementary Fig. 2).
At phylum level, the bacterial communities of the agroecological zones could be explained with only 4 abundant phyla affiliated to Firmicutes (22-98% among all samples), Proteobacteria (1-57%), Epsilonbactereota (0.14-34%) and Bacteroidetes (0.03-20%) which represented almost 97% of the total sequences. Zones like GAZ, NHZ and TTAZ showed predominance of Proteobacteria (32-57%) followed by Firmicutes (16-50%), whereas in the other three zones, viz., RLZ, VAZ and CSZ Firmicutes (42-98%) was the dominant phylum succeeded by Proteobacteria (1-39%). The other phyla that represented GAZ are Bacteroidetes and Spirochaetes. Rest of the zones except CSZ were governed by Epsilonbactereota and Bacteroidetes whereas almost all the samples of CSZ are represented by Firmicutes. The phylum, Nitrospirae, was present among samples from all the zones except in CSZ. Relative abundances of phyla Fig. 2 Principal component analysis (PCA) of environmental parameters measured at different agro-ecological zones. EC electroconductivity, N available nitrogen, P available phosphorus, K exchangeable potassium, TOC total organic carbon. Convex hulls were introduced to mark the three sampling sites that belonged to the same zone such as Actinobacteria, Bacteroidetes, Epsilonbactereota, Patescibacteria, Planctomycetes and Proteobacteria were significantly different across the different zones (Anova, p-value < 0.01) (Fig. 3A).
The same pattern was also accompanied at the class level. The top 7 dominant classes obtained in this study represented almost about 95% of the sequences. GAZ, NHZ and TTAZ showed abundance of Gammaproteobacteria succeeded by Clostridia which were totally opposite as in case of the other three zones. The next abundant class for GAZ was Bacteroidia whereas for NHZ, TTAZ and RLZ it is Campylobacteria. VAZ on the other hand showed abundance of Negativicutes. Noteworthy to mention, Alphaproteobacteria mostly dominated zones like NHZ and Bacilli were abundant only in GAZ whereas in all other zones it did not contribute significantly in community composition. For CSZ 95% of the sequences were occupied by Clostridia. The class Planctomycetes was found in only 3 zones, NHZ, TTAZ and RLZ. Based on relative abundance, classes such as Actinobacteria, Bacilli, Bacteroidia, Campylobacteria, Clostridia, Gammaproteobacteria, Negativicutes and Planctomycetes differ significantly across the agro-ecological zones (Anova, p-value < 0.01) (Fig. 3B).
In lower taxonomy level, the top 14 families explained the maximum variation in bacterial community composition among the different zones and they almost accounted for 80% of the sequences. Except for TTAZ in all the other zones, viz. GAZ, NHZ, RLZ, VAZ and CSZ family Clostridiaceae 1 was dominant. In case of TTAZ, Sulfurospirillaceae was the predominant family. The next abundant family was different for all the zones. For GAZ its Enterobacteriaceae, both NHZ and VAZ showed abundance of Rhodocyclaceae, RLZ was dominated by Sulfurospirillaceae and TTAZ by Clostridiaceae 1. Other families which showed dominance among the zones are Aeromonadaceae among GAZ and NHZ, Burkholderiaceae among NHZ and RLZ and Veillonellaceae in VAZ (Fig. 3C).
The unique and core genera distributed along the agroecological zones were identified by Venn diagram. There are 15 genera which are common in all the sampling sites; among them Clostridium sensu stricto 1 was the most dominant with an abundance of about 33.7% (average of all the samples) and genus Enterobacteriaceae Incertae sedis was the least abundant with average of 0.5%. However, two zones have few genera distinctive to them. GAZ and TTAZ have 3 unique genera with abundance not less than 0.5%. Moreover, the abundance of these isolated genera in all the samples was not greater 1.1%, indicating they represent the rare microbiome ( Supplementary Fig. 3).

Variation in microbial community composition
In OTU level, there is a significant variation in bacterial community composition among all the zones as given by ANOSIM (ANOSIM, R = 0.549, p = 0.001). The OTUs that are responsible for the variations in bacterial community among the six agro-ecological zones were identified by dot plot (Fig. 4). In total, there are 17 differentially abundant OTUs among all sampling sites. These OTUs were mainly affiliated to Gammaproteobacteria (8 OTUs), Negativicutes (4 OTUs), Clostridia (2 OTUs), Alphaproteobacteria (1 OTU), Campylobacteria (1 OTU), and Spirochaetia (1 OTU). Among class Clostridia, OTU 1 belonging to Clostridium sensu stricto was abundant in all the sampling chosen. For taxa that were unclassified on the respective level of resolution, the next higher level classified taxonomic rank is shown sites whereas OTU 11 also affiliated with Clostridium sensu stricto was found to be abundant in RLZ, TTAZ and VAZ only and was not present in the other 3 sites. Genera such as Uliginosibacterium (OTU3) and Acidaminococcaceae Incertae sedis (OTU7), of the classes Gammaproteobacteria and Negativicutes, respectively, were also found to be enriched in all the zones. The genus, Veillonellaceae Incertae sedis (OTU 6 and OTU14) was abundant among NHZ, RLZ, TTAZ and also VAZ but was absent in CSZ. It was also observed that genera like Enterobacter (OTU5) and Burkholderiaceae Incertae sedis (OTU68) both belonging to the class Gammaproteobacteria were found to be dominant only in 3 zones, viz. TTAZ, RLZ and VAZ and show decreasing trend in the other 3 zones. Another genus, Comamonas (OTU70) of the class Gammaproteobacteria, is predominant only in VAZ. Except GAZ and CSZ, the genera Aquitalea (OTU22) and Aeromonas (OTU4 and OTU10), belonging to class Gammaproteobacteria, and genus Sulfurospirillum (OTU2), belonging to class Campylobacteria, are present distinctly in all the zones. The class Alphaproteobacteria, represented by the genus Pleomorphomonas (OTU30), is enriched among the sampling sites of NHZ and in some sites of VAZ. In a similar fashion, the genus Treponema 2 (OTU51), belonging to class Spirochaetia, was abundant in sites of GAZ and RLZ only.
To represent the original position of communities in multidimensional space sampling sites were placed in an ordination space as well as their associated environmental parameters on NMDS plot (Fig. 5). Based on Bray-Curtis dissimilarities the six agro-ecological zones tend to cluster apart from each other. CSZ was separated from the rest of the zones based on EC; NHZ was separated due to TOC and separation of RLZ and VAZ were mainly associated with N and P. Envfit that indicates which environmental parameters . Asterisks indicate OTUs detected as differentially abundant among the sampling sites are strongly correlated with the data shows that patterns in bacterial community composition has strong correlation with EC, K, N and TOC. To assess the role of environmental factors in explaining the variation in bacterial community composition among the agro-ecological zones redundancy analysis (RDA) was performed (Table 1). RDA revealed that P and TOC could explain about 19% variation in the bacterial communities. Although the explanatory power is not considerably high but still it is statistically significant. P alone (pure effects) contributed significantly to explain 12% variation in bacterial communities (AIC -180.8) while the effect of TOC along with P (total effects) was significant in explaining the variation (AIC -180.38). Other parameters like pH, EC, K and N were not selected in the best fitting model apparently because they did not contribute essentially in explaining the variation. Together NMDS and RDA supported each other's result that TOC is the most determinant variable which dominantly explained the variation in bacterial community composition across these zones.  Table 1 Contribution of agro-ecological zone or soil parameters like P and pH, P and K, P and TOC in explaining variation in bacterial community composition based on redundancy analysis (RDA) The numbers which are given in bold are statistically significant. p value < 0.05 The proportion of explained community variation is expressed as R 2 values. Significances of the respective F-ratios were tested by performing 1000 Monte Carlo permutation tests and are indicated by **very significant (P ≤ 0.01), ***highly significant (P ≤ 0.001), and ns when not significant (P > 0.05

Discussion
Endophytes are ubiquitous in nature. Factors such as geographic locations, soil source, host genotype and cultivation practice influence microbial communities in soil which in turn play a critical role in establishing the endospheric microbes (Edwards et al. 2015).
The six agro-ecological zones of West Bengal differ from each other in almost all the environmental parameters tested, which were expected as the zones vary in their soil, landform and climate characteristics (Banerjee et al. 2019). The differences could also be attributed to the difference in cultivation practices across the zones (Kunda et al. 2020). In our result, the southernmost part of West Bengal represented by CSZ reported the highest EC, which is quite obvious given the fact that saline zones are characterised based on high EC (Sen and Maji 1994). PCA results also corroborated that separation of CSZ from the rest of the zones was mainly due to its high values of EC. The northern part of West Bengal represented by the zones-NHZ and TTAZ was seen to be rich in TOC which was in line with reports that indicated these regions have experienced low biological activity and lower decomposition of biomass. Soil organic carbon is one of the most important soil quality indicators which indicates fertility and productivity of soil (Sahoo et al. 2019). The farmers from these regions mostly cultivate high-yielding varieties of rice (Adhikari et al. 2011), which justifies the fact that there is no requirement of additional fertilizers and the soil also remains less disturbed thus experiencing high organic carbon levels. These two zones also reported containing the highest available N, one of the most important nutrients for plant growth. Reports are there which indicated that the soil of these regions is generally high in nitrogen content (Devi and Sherpa 2019), making it nutrient rich. As per PCA, soil pH in VAZ was highest. The pH of VAZ was nearly neutral but the soil of the remaining zones had an acidic pH. Low pH could be accounted for greater rainfall in these regions that washes the basic cations rendering the soil acidic.
Alpha diversity indicated that there was a significant difference in species richness among the zones. Available reports show that soil fertility has a positive correlation with microbial diversity (Furtak et al. 2019). The high microbial species richness of rice plants in GAZ is reflected by the fact that this soil is most fertile (Banerjee et al. 2019), whereas low species richness of CSZ could be due to its high salinity, since excess salinity decreases species diversity and alters the community composition in plants . The difference between GAZ and NHZ is also due to the fact that NHZ is not as fertile as GAZ and it faces difficulty in cultivation (Banerjee et al. 2019). Hence NHZ is low in endophytic microbial diversity because although the soil is nutrient dense, it is not productive, resulting in lower plant growth (Adhikari et al. 2011).
The members of Gammaproteobacteria, found universally in rice (Kunda et al. 2018) and abundantly in rice endorhizosphere (Moronta-Barrios et al. 2018), was found to be the most dominant taxa in fertilized (Fierer et al. 2011), intensively cultivated (Hamamoto et al. 2018) and agricultural soil (Kuramae et al. 2012) or soil rich in N (Wang et al. 2018, which was in line with our results that indicates its high abundance in GAZ, NHZ and TTAZ. These zones are characterised by soil rich in nutrients like organic carbon and nitrogen. As endophytes are a part of rhizospheric soil, therefore, we can say that enrichment of Gammaproteobacteria as root endophyte is seen in nutrient rich environment. Another class, Bacilli, is also present abundantly in permanent grasslands and arable land (Mendes et al. 2013), which justifies their highest occurrence in GAZ. Bacteroidetes being abundant in nutrient rich soil (He et al. 2017) explains the high proportion of class Bacteroidia as rice root endophyte in GAZ. Phylum Firmicutes was abundant in zones like CSZ, RLZ and TTAZ, which were categorised as nutrient low groups. Higher abundance of this phylum was reported from the rhizosphere of plants grown in extreme environments (Mukhtar et al. 2019), for example, high saline conditions . Hence, its representative class Clostridia was dominant in CSZ. This class has also been reported as an abundant phylum in rice seed (Raj et al. 2019) as well as in rice soil (Hayat et al. 2010). Alphaproteobacteria was known to be associated with N and TOC enriched soils (Fierer et al. 2011;Kim et al. 2014). Alphaproteobacteria was mostly dominant in NHZ which was characterised by high TOC and N. Abundance of this class in NHZ can also be related to its plant growth promoting properties. This class occur in rice roots endophytes largely irrespective of genotypes because of their universal adaption in rice. They confer beneficial functions which might be the driving force for their selection (Hardoim et al. 2011). It has been reported that Planctomycetes have more stable and resistant life-strategy (De León-Lorenzana et al. 2018) and hence were found in zones like RLZ, NHZ and TTAZ. Planctomycetes is also reported to be abundant under drought conditions (Dai et al. 2019) which is a characteristic of RLZ.
The most common genera found as rice root endophytes among all the sampled zones are Clostridium sensu stricto (33.7%), Sulfurospirillum (8.3%), Uliginosibacterium (7.7%), Aeromonas (5.2%), Veillonellaceae Incertae sedis (2.8%), Acidaminococcaceae Incertae sedis (2.4%), Lachnospiraceae Incertae sedis (1.6%), Bacillus (1.1%), Burkholderiaceae Incertae sedis (1.1%), Shewanella (1%) and Massilia (1%). These genera are already reported as rice endophytes (Hardoim et al. 2011;Kunda et al. 2018;Walitang et al. 2017) and their presence in all the sampling sites was presumably because they represent the core endophytic microbiomes of rice in West Bengal. Most of these genera are known diazotrophs and are reported to have plant growth promoting properties. They being present inside plants may help their host by promoting growth, tolerating stressful conditions or producing allelopathic substances to compete with other species. Genus like Massilia reported as nitrate reducers have an important role in the nitrogen cycle and thus act as a plant growth promoting bacteria (Wemheuer et al. 2017). This genus also induces the production of napthoquinones like alkannin and shikowin in root cultures of a medicinal plant, Alkanna tinctoria Tausch (Boraginaceae), and thus possesses anti-microbial properties (Rat et al. 2021). Another genus, Shewanella, was reported to alleviate salt stress (Paul and Lade 2014). Moreover, as revealed in our analysis, two zones have some specific genera that are unique to them. The unique genera for GAZ were Dickeya, Lactococcus and Prevotellaceae Incertae sedis while for TTAZ they were Azonexus, Pectobacterium, and Diplorickettsia. Dickeya and Lactococcus which were known rice endophytes (Kunda et al. 2018;Marag and Suman 2018) and Prevotellaceae, previously reported as an endophyte of fruit Pitaya (Ren et al. 2018) with no known functions in plants. Among the unique genera of TTAZ, Azonexus was reported as rice endophytes (Kunda et al. 2018) but the other two genera were not reported as rice endophytes so far. Diplorickettsia was reported as an insect endosymbiont (Mathew et al. 2012) and Pectobacterium as plant pathogen (Davidsson et al. 2013). Pectobacterium possess a large number of plant cell-wall degrading enzymes (Davidsson et al. 2013) and thus may have colonised rice roots. Maybe these endophytes are signatorial bacterial genera of the particular zones whose functions are yet to be discovered.
According to the dot plot, out of the 17 differentially significant OTUs, many were reported as plant growth promoting bacteria (PGPB). Genera such as Clostridium, Bacillus, Comamonas, Aeromonas, Aquitalea, Burkholderia and Enterobacter can promote plant growth by fixing nitrogen, solubilising phosphorus, potassium, zinc, producing phytohormones like IAA as well as can protect plants from pathogen attack by producing HCN and siderophore Ishizawa et al. 2017;Mendes et al. 2013;Nath Yadav et al. 2017;Radhakrishnan et al. 2017;Saxena et al. 2020;Wang et al. 2020B). Clostridium apart from being a plant growth promoter (Doni et al. 2014;Emami et al. 2019) was also reported to tolerate and mitigate soil salinity (Rahman et al. 2017). Interaction of these bacteria with rice roots can contribute to the growth of the plants and can also help plants to grow under normal as well as stressful conditions.
It is worth mentioning that due to logistical challenges, we were unable to complete sampling at once. We have done sampling in 2 years in two different seasons. This could also contribute to any differences in microbial community composition. Since all the samples were not collected in the same season any direct relation of seasonal variation with endophytic composition could not be drawn.

Conclusions
1. From our study, we have seen that the sampling zones could be broadly divided into two groups-nutrient dense soil group represented by GAZ, NHZ, TTAZ and nutrient low soil group represented by CSZ, RLZ and VAZ. The diversity was enriched in nutrient dense zones than in nutrient low groups. It was also found that classes like Gammaproteobacteria, Bacteroidetes and Bacilli were abundant in nutrient dense zones while Clostridia, Planctomycetes were dominant in nutrient low zones. 2. Also, the bacterial communities of different habitats differed in bacterial diversity and composition. Some endophytes like Aeromonas, Acidaminococcus, Bacillus, Clostridium, Sulfurospirillum, Uliginosibacterium are associated ubiquitously with rice across all zones. They may comprise the core microbiome of rice in West Bengal. Other genera like Prevotellaceae Incertae sedis, Lactococcus, Dickeya, Azonexus, Diplorickettsia and Pectobacterium are unique to particular zones and are not distributed uniformly in rice. 3. This diversity study has helped us to visualise the endophytic status of rice grown throughout the state of West Bengal which has provided some insight into which endophytes are inhabiting rice and what may be their probable function in that particular zone. 4. Our future plan of work will involve culture dependent characterization of endophytes from these regions and studying their role in promoting plant growth under different conditions.