SSR marker-based genetic diversity analysis and SNP haplotyping of genes associated with abiotic and biotic stress tolerance, growth and development, and yield across 93 rice landraces

As rice is the staple food for more than half of the world’s population, enhancing grain yield under variable climatic conditions is indispensable. Many traditionally cultivated rice landraces are well adapted to severe environmental conditions and have high genetic diversity that could play an important role in crop improvement. The present study revealed a high level of genetic diversity among the unexploited rice landraces cultivated by the farmers of Kerala. Twelve polymorphic markers detected a total of seventy-seven alleles, with an average of 6.416 alleles per locus. The Polymorphic Information Content (PIC) value ranged from 0.459 to 0.809, and RM242, with the highest PIC value of 0.809, was the most appropriate marker for differentiating the rice genotypes. The current study indicated that the rice landraces are highly diverse, with high values of the effective number of alleles, PIC, and Shannon information index. Utilizing these informative SSR markers for future molecular characterization and population genetic studies in rice landraces is advisable. Haplotypes are sets of genomic regions within a chromosome that are inherited together, and haplotype-based breeding is a promising strategy for designing next-generation rice varieties. Here, haplotype analysis identified 270 haplotype blocks and 775 haplotypes across all the chromosomes of the landraces under study. The number of SNPs in each haplotype block ranged from two to 28. Haplotypes of genes related to biotic and abiotic stress tolerance, yield enhancement, and growth and development in rice landraces were also elucidated. The present investigation revealed the genetic diversity of rice landraces, and the haplotype analysis will open the way for genome-wide association studies, QTL identification, and marker-assisted selection in the unexplored rice landraces collected from Kerala.


Introduction
Growing population, unpredictable environmental conditions, and decreasing agricultural resources profoundly impact global crop production. The current global yield increase rate (0.9-1.3% per year) of major crops such as rice, wheat, and maize is inadequate to meet the food demand of the estimated nine billion people in 2050 [1]. Rice yield needs to increase by almost 40% by 2030 to satisfy the growing demand without adversely affecting the resource base [2]. As rice is one of the staple food crops, the most practical way to increase yield within the available agricultural land is genetic improvement of rice.
Landraces are farmers' traditional cultivars that are adapted to varying environments and evolve new genetic traits, with copious diversity, to meet the challenge of biotic and abiotic stresses [3]. Rice landraces are crucial for food security, but the germplasm of the rice gene pool is poorly understood [4]. Molecular markers are a powerful means of determining genotypes and investigating genetic diversity. Among the available molecular marker systems, microsatellite loci, or simple sequence repeats (SSRs), and single nucleotide polymorphisms (SNPs) have become important DNA markers because they display high allelic variation between organisms. At present, genetic information on rice landraces is essential for improving existing crop varieties with recent molecular techniques and for averting food scarcity through sustainable agricultural development.
Haplotypes are sets of genomic regions within a chromosome that tend to be inherited together because of the low probability of recombination among them in a given population [5]. Haplotype blocks consist of a combination of adjacent allelic markers. The number of polymorphisms in the haplotype blocks formed by combinations of SNP alleles represents the genetic diversity in rice [6]. Determining SNPs in regions linked with a particular trait increases the probability of detecting variants located on the same haplotype and improves the accuracy of identifying candidate genes in association mapping studies [7]. The principles of genomic selection (GS) have been used considerably in genetic improvement programs for complex traits [8]. Recent advances in DNA sequencing and genotyping techniques help the plant breeding community to achieve high yields even in unfavorable environments. GS can be applied in plant improvement programs using individual SNPs and haplotypes for complex traits.
Plants are sessile and struggle to cope with environmental stresses, which recurrently limit overall plant growth. Haplotype analysis has been conducted on several important genes from diverse rice genomic sequences, including genes for quality and yield traits [9], disease resistance [10], and flowering time [9]. As haplotypes are genomic sequence variants that co-occur on a chromosome, inherited genetic variations have created diversity among these haplotypes [11]. Haplotype identification and analysis will therefore explain genetic patterns with higher accuracy and produce knowledge for effective utilization of the existing rice germplasm.
Generally, excessive inbreeding has decreased genetic diversity and created a more uniform genome pattern in rice. To overcome this, genome or haplotype shuffling needs to be carefully considered in future rice breeding. Our objectives were to explore the genetic diversity and to characterize the novel haplotypes associated with stress tolerance, growth and development, and yield-enhancing properties in the rice landraces collected from Kerala state.

Plant material and DNA extraction
For the current study, ninety-three rice landraces were collected from the northern region of Kerala (Supplementary Table S1). Genomic DNA was isolated from leaf samples of two-week-old plants using the CTAB plant DNA extraction protocol [12]. DNA quality was checked on a Nanodrop by determining the A260/280 ratio, followed by visualization by agarose gel electrophoresis. Twelve SSR markers with high polymorphism information content, distributed across the chromosomes, were selected for the genetic diversity analysis (Supplementary Table S2). PCR amplification was carried out in a thermal cycler in a 10 µl reaction mixture containing 2.5 µl Origin Taq PCR mixture (2x), 0.5 µl forward primer (10 pmol/µl), 0.5 µl reverse primer (10 pmol/µl), 1 µl template DNA (< 200 ng), and 5.5 µl distilled water. The following temperature profile was used: 94 °C for 5 min; 2 × (94 °C for 1 min, 65 °C for 1 min and 72 °C for 2 min); 2 × (94 °C for 1 min, 62 °C for 1 min and 72 °C for 2 min); 4 × (94 °C for 1 min, 59 °C for 1 min and 72 °C for 2 min); 25 × (94 °C for 1 min, 57 °C for 1 min and 72 °C for 2 min; 72 °C for 2 min); 72 °C for 5 min. Amplification products and a 100 bp DNA ladder were resolved on a 1% agarose gel using a horizontal gel electrophoresis unit, and the DNA fragments were visualized and documented using a gel documentation system.
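As a quick check of the reaction setup, the per-reaction volumes listed above sum to 10 µl. The sketch below scales the shared components into a master mix for one marker across all 93 samples. The function name `master_mix` and the 10% pipetting overage are assumptions for illustration; the protocol does not mention a master mix or an overage.

```python
# Per-reaction volumes in microlitres, as listed in the protocol above.
PER_REACTION_UL = {
    "Origin Taq PCR mixture (2x)": 2.5,
    "forward primer (10 pmol/ul)": 0.5,
    "reverse primer (10 pmol/ul)": 0.5,
    "template DNA": 1.0,
    "distilled water": 5.5,
}


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components for n_reactions tubes of one marker.

    Template DNA is excluded because it differs per sample and is added
    to each tube individually. The overage fraction is an assumption,
    not a value from the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "template DNA"
    }


total_per_reaction = sum(PER_REACTION_UL.values())  # total reaction volume
mix_93 = master_mix(93)  # shared components for 93 landraces

print(f"Total per reaction: {total_per_reaction} ul")
for name, volume in mix_93.items():
    print(f"{name}: {volume} ul")
```

Because the primers are marker-specific, one such mix would be prepared per SSR marker.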

SSR data analysis
The sizes of the amplified fragments were estimated using a 100 bp ladder, and only unambiguous bands of SSR markers were scored. Polymorphic information content (PIC) values were calculated for each SSR primer pair based on the presence or absence of bands, producing a binary data matrix of 1 or 0 for each marker. Genetic diversity analysis using SSR markers, including analysis of allelic variability, major allele frequency, and PIC, was carried out using PowerMarker V3.0 [13]. Genetic diversity parameters such as effective allele number (Ne), Shannon's information index (I), expected heterozygosity (He), and fixation index (F) were computed. Cluster analysis based on the genetic distance matrix was performed using the Neighbor-Joining (NJ) algorithm implemented in DARwin v6.0.018 [14]. Principal Coordinate Analysis (PCoA) was done using GenAlEx 6.5 [15].
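The per-locus statistics named above follow standard population-genetic formulas. The sketch below is a minimal illustration of those textbook definitions, not the code of PowerMarker or GenAlEx. It assumes that the allele frequencies at a locus are already known; the function name `locus_diversity` and the example frequencies are hypothetical.

```python
import math
from itertools import combinations


def locus_diversity(freqs, observed_het=None):
    """Return standard diversity statistics for a single locus.

    freqs: allele frequencies at the locus (must sum to 1).
    observed_het: observed heterozygosity, needed only for F.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    he = 1.0 - homozygosity            # expected heterozygosity
    ne = 1.0 / homozygosity            # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)
    # PIC after Botstein et al.: He minus the sum over allele pairs
    # of 2 * p_i^2 * p_j^2.
    pic = he - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    stats = {
        "Na": sum(1 for p in freqs if p > 0),
        "Ne": ne,
        "I": shannon,
        "He": he,
        "PIC": pic,
    }
    if observed_het is not None and he > 0:
        stats["F"] = 1.0 - observed_het / he  # fixation index
    return stats


# Hypothetical locus with four equally frequent alleles and no
# heterozygous individuals, as in a fully inbred collection.
example = locus_diversity([0.25, 0.25, 0.25, 0.25], observed_het=0.0)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With equal frequencies, Ne equals the number of alleles; unequal frequencies pull Ne below Na.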

Genotyping and haplotype block construction
In our previous study, genotyping by sequencing (GBS) was performed on the Illumina NextSeq 500 platform with 2 × 150 bp v2 chemistry [16]. Monomorphic SNPs, SNPs with a call rate < 90%, and SNPs with a minor allele frequency (MAF) < 0.05 were excluded from the analysis.
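The three exclusion criteria can be expressed as a single per-SNP check. The sketch below is an illustration only and is not the pipeline used in the study. It assumes biallelic SNPs coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the function name `passes_qc` and the example genotype vectors are hypothetical.

```python
def passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return True if a biallelic SNP survives the filters.

    genotypes: per-sample alternate-allele counts (0, 1 or 2),
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    call_rate = len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    # A monomorphic SNP has maf == 0, so the MAF threshold removes it too.
    return call_rate >= min_call_rate and maf >= min_maf


# Hypothetical genotype vectors for ten samples.
common_snp = [0, 1, 2, 0, 1, 0, 0, 2, 1, 0]          # MAF 0.35, all called
monomorphic_snp = [0] * 10                           # no variation
low_call_snp = [0, 1, None, None, 2, 0, 1, 0, 0, 0]  # call rate 0.8
rare_snp = [0] * 19 + [1]                            # MAF 0.025

for label, snp in [("common", common_snp), ("monomorphic", monomorphic_snp),
                   ("low call rate", low_call_snp), ("rare", rare_snp)]:
    print(label, passes_qc(snp))
```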
The software Haploview v4.2 [17] was used to estimate the haplotype block structure in the 93 rice genotypes with the command: java -jar Haploview.jar -memory 4096.
The input data file (.ped file) and the locus information file (.info file) were created using PLINK v1.90b6.21. The haplotype blocks were constructed with the confidence interval algorithm [18] using the default parameters: 'ignore pairwise comparisons of markers > 500 kb apart, exclude individuals with > 50% of missing genotypes'.
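Block construction of this kind rests on pairwise linkage disequilibrium between SNPs, with the confidence interval approach classifying marker pairs by confidence bounds on D′. The sketch below computes only the point estimates of D′ and r² for one SNP pair and is a simplified illustration, not the confidence interval algorithm or Haploview's implementation. It assumes phased haplotypes with alleles coded 0/1; the function name `pairwise_ld` and the example data are hypothetical.

```python
def pairwise_ld(haplotypes):
    """Return (D', r^2) for two biallelic SNPs.

    haplotypes: list of (a, b) tuples giving the allele (0 or 1)
    carried at SNP A and SNP B on each phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # Normalise D by its maximum attainable value given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Two SNPs that always travel together versus two independent SNPs.
linked = pairwise_ld([(1, 1)] * 4 + [(0, 0)] * 4)
unlinked = pairwise_ld([(1, 1), (1, 0), (0, 1), (0, 0)])
print("linked:", linked)
print("unlinked:", unlinked)
```

SNP pairs with D′ near 1 show little evidence of historical recombination, which is why runs of such pairs are merged into one block.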

Decoding of haplotype linked rice functional genes
To analyze the variants and predict their functional consequences, the chromosome number and marker position in the tagger output data of each chromosome were uploaded to the Variant Effect Predictor (VEP) tool in the Ensembl Plants dataset (http://plants.ensembl.org). Ensembl results were compared with the Gramene database (http://www.gramene.org) to obtain curated and integrative information about the genes, which are listed in Supplementary Table S3. To characterize the selected genes, the functions of the haplotype-linked genes were decoded using the Rice Annotation Project Database (RAP-DB) (http://rapdb.dna.affrc.go.jp/) and the funRiceGenes database (https://funricegenes.github.io/), which will promote functional genomics studies in rice landraces.

Polymorphism and marker efficiency
The current study evaluated the genetic diversity of 93 rice landraces. The 12 SSR markers screened to assess genetic variability and relatedness produced reproducible and polymorphic bands. Gel images of the fragments amplified with the primers for the selected SSR markers are shown in Supplementary Figure S1. A total of 77 alleles were detected among the rice genotypes, with an average of 6.416 alleles per locus. The number of alleles per locus ranged from 4 (RM232 and RM248) to 9 (RM206 and RM242), which indicates the richness of the population (Table 1). The PIC value of the SSR markers ranged from 0.459 (RM232) to 0.809 (RM242), with an average of 0.658. Markers RM242, RM26, RM228, and RM206 were the most informative primers, with the highest PIC values of 0.809, 0.764, 0.763, and 0.688, respectively. Among the 12 SSR markers used in this study, eleven had PIC values exceeding 0.5 (Table 1).
Across all the microsatellite loci analyzed in the study, the mean values of the fixation index (F), Shannon's information index (I), number of effective alleles (Ne), and number of different alleles (Na) were 0.98, 1.23, 3.02, and 5.5, respectively. The heterozygosity value ranged from 0.41 (RM349) to 0.8 (RM242) (Table 1). The Shannon information index ranged from 0.8 to 1.7, and the highest 'I' value was exhibited by RM242, which also had the maximum

Haplotype block construction
In our recent study, genotyping by sequencing (GBS) of the germplasm used for the current study yielded an average of 20 million reads per sample, with an average depth on the reference genome (without Ns) in the range of 5.37X to 13.01X [16]. Haplotype blocks were constructed using SNPs obtained from GBS of the 93 rice landraces. A total of 270 haplotype blocks and 775 haplotypes were identified across all the chromosomes (Table 2). Out of 79,953 filtered SNPs, 1402 tag SNPs were grouped into haplotype blocks. The largest number of haplotype blocks was formed by combinations of SNPs located on chromosome 8 (n = 31), and the smallest by SNPs located on chromosome 10 (n = 16). In total, the blocks yielded 775 haplotypes, with about three haplotype variants per block on average. The number of SNPs in each haplotype block ranged from two to 28. The longest haplotypes, obtained from chromosome 12, spanned 499 kb.
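The averages implied by these counts can be checked with simple arithmetic. The short sketch below uses only the totals reported above; the variable names are illustrative.

```python
# Totals reported for the 93 landraces.
n_blocks = 270
n_haplotypes = 775
n_tag_snps = 1402
n_filtered_snps = 79953

haplotypes_per_block = n_haplotypes / n_blocks
tag_snps_per_block = n_tag_snps / n_blocks
percent_snps_in_blocks = 100 * n_tag_snps / n_filtered_snps

print(f"Haplotypes per block: {haplotypes_per_block:.2f}")
print(f"Tag SNPs per block: {tag_snps_per_block:.2f}")
print(f"Filtered SNPs placed in blocks: {percent_snps_in_blocks:.2f}%")
```

The first ratio is about 2.87, which is consistent with roughly three haplotype variants per block.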

Analysis of haplotype linked genes
Several genes determining various traits have been reported in rice. However, haplotypes for developing superior varieties from the landraces collected from Kerala remain uncharacterized. To elucidate the relationship between haplotypes and traits, selected SNPs related to different genes and their corresponding haplotypes are shown in Fig. 2. The function of each gene was explored using the Rice Annotation Project Database and the funRiceGenes database.

Haplotypes of abiotic stress tolerance genes
Plants, being sessile organisms, cannot escape from extremes in environmental factors that result in stressful conditions. These ill effects can inhibit plant growth and development and reduce crop productivity by up to 70% [9,10]. Abiotic stress tolerance is a multigenic trait, and we identified the genes OsRZFP34, OsNADP-ME2, OsDHSRP1, OsSIRH2-14, OsGRAS23, CK2alpha2, OsSWI3C/OsCHB705, and OsAIR2, together with the corresponding haplotypes responsible for tolerance against different abiotic stresses (Table 3, Fig. 3a).

Fig. 2
Genes and their haplotypes related to abiotic and biotic stress tolerance, growth and development, and grain yield in rice landraces. Numbers in blue within brackets specify the number of haplotypes for each gene

Haplotypes of genes related to rice growth and development
Plant growth and development comprise a continuous process that is closely attuned to environmental conditions. Several phytohormones control plant growth by regulating cell proliferation and expansion, and these phytohormones regulate meristem function and organogenesis, which entails controlling cell division. Some of the genes involved in plant growth and development, RAD51C, HAZ1/HOX1a, TIG1, WEG1, OspTAC2, OsABA2, and OsCCC2, and their haplotypes are listed in Table 3 and Fig. 3c.

Haplotypes of yield enhancing genes
The ideal way to increase yield within the existing agricultural land is to improve yield-related traits in rice [9]. Significant traits such as grain length, grain width, number of well-filled grains per panicle, panicle number per plant, and 1000-grain weight are directly associated with rice grain productivity [46]. Some of the genes on which the yield potential of rice landraces strongly depends, OsSTAP1/OsLG3, OsMFT1, and OsABCG3, and their haplotypes are illustrated in Table 3 and Fig. 3d.

Polymorphism of microsatellite markers
Allelic variation among the SSR markers was high enough to categorize the rice landraces [23]. The number of alleles per locus observed in the current study (6.416) was high compared with earlier reports of 2.274 [19] and 5.79 [20], and was consistent with another study that showed 6.13 alleles per locus [21]. The current result also showed greater diversity than a previous study that found 3-9 alleles, with an average of 4.53 alleles per locus [22]. The variability in the number of alleles per locus might be related to the diversity of the germplasm and the specificity of the markers. PIC measures the value of a marker for distinguishing polymorphism within a population, based on the number of detectable alleles and the major allele frequency. The higher the PIC value of a locus, the higher the number of alleles detected, which indicates a higher diversity of germplasm. Thus, PIC values reflect allelic frequency, and the diversity among rice germplasm can vary from one SSR locus to another [23]. The mean PIC value (0.658) obtained in the current study is higher than the PIC value reported in previous work [24]. Markers with PIC values greater than 0.5 are considered highly polymorphic [24]. Thus, the highly polymorphic markers identified in the current study are highly informative for genetic studies; they reveal more alleles at a specific locus, and their association with those alleles could be utilized in marker-assisted selection programs.
Gene diversity is often designated as expected heterozygosity, and the higher the heterozygosity values, the broader the genetic diversity. The average 'I' value (1.23) was higher than the previously described value of 0.58 [25]. Higher values (> 0.5) of genetic diversity indices such as Shannon's information index and expected heterozygosity among the markers indicated the heterozygous nature of the population [25]. The high actual and effective numbers of alleles at such loci also contribute to their significant levels of expected heterozygosity (here, 0.64).

Fig. 3
Haplotypes of genes. A Genes related to abiotic stress tolerance, B Genes related to biotic stress tolerance, C Genes related to rice growth and development, D Genes related to yield

Genetic relatedness using cluster analysis and principal coordinate analysis
Cluster analysis based on microsatellite allelic diversity separated the rice landraces into different groups according to the genetic distance and dissimilarity matrix. The clustering patterns in the current study showed that the selected markers were consistent, as they can detect high levels of variation among closely related germplasm. This highlights the precision of SSR markers in tracking the phylogeny of rice landraces. The cumulative variation obtained from PCoA was higher than that in an earlier report, which stated 13.33% of variance using 36 SSR markers [26]. The results indicated that the genotypes placed far away from the centroid were more genetically diverse, while the genotypes placed near the centroid had more or less similar genetic backgrounds [22]. Thus, the rice genotypes used for the current study were highly diverse, with relatively high values of the effective number of alleles, PIC, and Shannon information index. Moreover, utilizing these informative SSR markers for future molecular characterization and population genetic studies in rice landraces is advisable.
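Both the NJ tree and the PCoA start from a pairwise dissimilarity matrix between landraces. The paper does not state which dissimilarity index was used, so the sketch below assumes a simple-matching dissimilarity on diploid SSR genotypes purely for illustration. The function names, landrace labels, and allele sizes are hypothetical.

```python
def simple_matching_dissimilarity(g1, g2, ploidy=2):
    """Dissimilarity between two multi-locus SSR genotypes.

    g1, g2: lists of per-locus allele tuples such as (150, 160),
    with None marking a locus that failed to amplify.
    Returns 1 minus the mean proportion of matching alleles per locus.
    """
    total = 0.0
    loci = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip loci missing in either genotype
        remaining = list(b)
        matches = 0
        for allele in a:
            if allele in remaining:
                remaining.remove(allele)  # each allele matches only once
                matches += 1
        total += matches / ploidy
        loci += 1
    if loci == 0:
        raise ValueError("no shared loci to compare")
    return 1.0 - total / loci


def dissimilarity_matrix(genotypes):
    """Return a nested dict of pairwise dissimilarities."""
    return {
        x: {y: simple_matching_dissimilarity(gx, gy)
            for y, gy in genotypes.items()}
        for x, gx in genotypes.items()
    }


# Hypothetical homozygous landraces scored at two SSR loci (sizes in bp).
landraces = {
    "LR1": [(150, 150), (200, 200)],
    "LR2": [(150, 150), (210, 210)],
    "LR3": [(160, 160), (210, 210)],
}
matrix = dissimilarity_matrix(landraces)
for name, row in matrix.items():
    print(name, row)
```

A matrix of this form is the input that a tree-building or ordination tool would then consume.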

Haplotype block construction
Several studies have explored the use of haplotypes in different GS models, and the primary advantage of using haplotypes in GS is the capability to detect and include mutations as genomic information. In a study of 177 rice accessions, 3211 haplotype blocks were identified from 3259 SNP combinations in the 12 rice chromosomes [6], whereas 270 haplotype blocks and 775 haplotypes were identified from the chromosomes of the 93 landraces used in the current study. Eighteen consensus haplotype blocks were identified in Koshihikari, Hitomebore, Akitakomachi, and Hinohikari, which are the major cultivars of Japan [27]. These haplotype blocks could be allocated to genomic regions of the ancestral landraces and are crucial for identifying recent Japanese rice cultivars. However, functional annotations were not developed for these haplotype blocks. Establishing the association between particular haplotypes and phenotypic differences is necessary, so in the current study we identified functional genes associated with the detected haplotypes.

Haplotypes of abiotic stress tolerance genes
The rice RZFP gene, OsRZFP34 (OS01G0719100), had three haplotypes in the 20th hap-block on chromosome 1. Heat stress due to elevated temperature is one of the major environmental limitations of plant growth; it can denature cellular proteins and reduce photosynthetic electron transport. Plants have evolved molecular responses, such as the synthesis of heat shock proteins associated with thermotolerance, to acclimatize to stressful high temperatures. After exposure to a higher temperature, plant leaves can respond physiologically by regulating stomatal opening, which results in heat dissipation from the leaves. Higher transpiration cooling lowers leaf temperature. In rice leaves exposed to elevated temperature and drought, OsRZFP34 mediates stomatal aperture for leaf cooling [28].
NADP malic enzyme (NADP-ME2) [29], which plays a major role in adapting to carbonate and salt stress, had three haplotypes clustered in the 20th hap-block on chromosome 1. OsDHSRP1, involved in regulating the abiotic stress response, had three haplotypes clustered in the first hap-block on chromosome 2. Ubiquitination is one of the mechanisms that plants have developed to survive abiotic stresses. Transcripts of the RING finger E3 ligase gene Oryza sativa Drought, Heat and Salt-induced RING finger protein 1 (OsDHSRP1) are highly expressed under abiotic stresses such as NaCl, drought, and heat and in response to the phytohormone abscisic acid (ABA), and the protein regulates the stress response through the Ub/26S proteasome system [30].
OsSIRH2-14, which plays a vital role in salinity tolerance through ubiquitin/26S proteasome-mediated degradation of salt-related proteins [31], had three haplotypes clustered in the 22nd hap-block on chromosome 4. OsGRAS23, involved in the drought stress response [32], had three haplotypes grouped in the 23rd hap-block on chromosome 4. CK2alpha2, which regulates the response to phosphate level [33], had five haplotypes in the 1st hap-block on chromosome 7. OsSWI3C, which plays a major role in drought tolerance [34], had three haplotypes grouped in the 3rd hap-block on chromosome 11. OsAIR2, associated with arsenic stress tolerance [35], had four haplotypes grouped in the 17th hap-block on chromosome 11.

Haplotypes of biotic stress tolerance genes
OsPSKR1, an immunity-regulating signal against the pathogen Xanthomonas oryzae pv. oryzicola (Xoc), had three haplotypes grouped in hap-block 21 of chromosome 2. Bacterial leaf streak, caused by the gram-negative bacterium Xanthomonas oryzae pv. oryzicola, is an important disease affecting rice production in the lowlands of Asia, and OsPSKR1 functions as a phytosulfokine receptor and regulates resistance to Xoc [36]. OsLOX10, involved in blast resistance [37], had four haplotypes clustered in the 17th hap-block on chromosome 11. Pi-ta, involved in blast resistance [38], had four haplotypes grouped in the 6th hap-block on chromosome 12.

Haplotypes of genes related to rice growth and development
RAD51C, required for double-strand break repair in rice [39], had five haplotypes clustered in the 15th hap-block on chromosome 1. OspTAC2, which regulates rice chloroplast development [40], had two haplotypes grouped in the 20th hap-block on chromosome 3. HAZ1, involved in the gibberellin response [41], had four haplotypes clustered in the 4th hap-block on chromosome 6. TIG1, which promotes cell elongation [42], had four haplotypes grouped in the 24th hap-block on chromosome 8. WEG1, involved in root elongation [43], had four haplotypes clustered in the 16th hap-block on chromosome 9.

Haplotypes of yield enhancing genes
OsLG3, which plays an essential role in grain yield and length [44], had three haplotypes clustered in the 3rd hap-block on chromosome 3. OsTIF1, involved in caryopsis development [45], had three haplotypes grouped in the 2nd hap-block on chromosome 4. OsMFT1, which regulates rice heading and panicle architecture [46], had three haplotypes grouped in the 12th hap-block on chromosome 6. OsABCG3, an ATP-binding cassette transporter vital for pollen development [47], had three haplotypes grouped in the 25th hap-block on chromosome 1. OsPHS8, involved in seed development and germination in rice [48], had three haplotypes grouped in the 29th hap-block on chromosome 8.
Many regions with low haplotype diversity were not associated with QTLs or genes. This finding may be due to undiscovered genes affecting important traits in the selected landrace germplasm. Because a haplotype is a set of nearby SNPs in strong linkage disequilibrium, GWAS and QTL identification based on haplotype blocks are more efficient than those based on individual markers. Moreover, the use of haplotypes reduces the degrees of freedom in genomic association models, increasing the accuracy of QTL identification.
Global climatic changes and an increasing population draw attention to the demand for novel allelic variants of rice stress tolerance genes. To increase rice productivity, new yield-enhancing genes or superior alleles should be identified, and gene-specific markers need to be developed. Overall, this study revealed the genetic diversity of rice landraces using different criteria such as allele number (Na), polymorphism information content (PIC), and genetic diversity indices like the Shannon information index (I) and expected heterozygosity (He), which provide adequate information for the identification of rice genotypes. As far as we know, this is the first study to explore haplotypes in rice landraces from Kerala. These results will therefore pave the way for evaluating genetic diversity and genomic composition in the unexplored rice landraces collected from Kerala, and for marker development and genome-wide association studies for next-generation breeding in rice.