Molecular mapping of drought-responsive QTLs during the reproductive stage of rice using a GBS (genotyping-by-sequencing) based SNP linkage map

In rice, drought stress at reproductive stage drastically reduces yield, which in turn hampers farmer’s efforts towards crop production. The majority of the rice varieties have resistance genes against several abiotic and biotic stresses. Therefore, the traditional landraces were studied to identify QTLs/candidate genes associated with drought tolerance. A high-density SNP-based genetic map was constructed using a Genotyping-by-sequencing (GBS) approach. The recombinant inbred lines (RILs) derived from crossing ‘Banglami × Ranjit’ were used for QTL analysis. A total map length of 1306.424 cM was constructed, which had an average inter-marker distance of 0.281 cM. The phenotypic evaluation of F6 and F7 RILs were performed under drought stress and control conditions. A total of 42 QTLs were identified under drought stress and control conditions for yield component traits explaining 1.95–13.36% of the total phenotypic variance (PVE). Among these, 19 QTLs were identified under drought stress conditions, whereas 23 QTLs were located under control conditions. A total of 4 QTLs explained a PVE ≥ 10% which are considered as the major QTLs. Moreover, bioinformatics analysis revealed the presence of 6 candidate genes, which showed differential expression under drought and control conditions. These QTLs/genes may be deployed for marker-assisted pyramiding to improve drought tolerance in the existing rice varieties.


Introduction
Rice (Oryza sativa L.) holds the second position in maintaining food security for mankind, just next to wheat, and is consumed by almost half of the world's population, mostly in Asia [1]. The South East Asian region is home to almost 1 3 61 million under nourished people, and it also produces around 30% of the world's rice harvest. But the water footprint for the production of rice is very high [2]. The existing rice varieties yield an average of 4.5 t/ha under irrigated conditions. There is a need for at least 8-10 million metric tons more rice to be produced each year, with an annual increase of 1.2-1.5% (0.6 t ha −1 ) in the coming decade. The pressure on rice-growing lands is increasing globally day by day because of climate change, urbanization, over-exploitation of groundwater resources, and competition from high-value agriculture crops in developing countries [3]. From early 2000 onwards, the South Asian regions, including the Indian sub-continent, have been among the perennially droughtprone regions in the world. Drought produces devastating effects in rice at its critical stage i.e. panicle initiation stage (PI stage) [4]. In India, the eastern region has the highest rice cultivation intensity [5]. This region is characterized by heavy rainfall, hot and humid climate, which is good for rice production. Therefore, rice is cultivated as a main crop in these regions. However, due to erratic rainfall, eastern India has a drought frequency of one in 5 years, often resulting in up to a 40% loss in total rice production in India [6,7]. In order to ensure food sufficiency, it is important to improve rice productivity in eastern India amidst drought incidences, where most of the rainfed rice is grown by small-scale farmers for sustenance [8].
The North-East region of India is native to diverse and unique rice accessions. In Assam, rice is a major crop and is cultivated on around 26.52 lakh hectare with an average productivity of 1.48 tonne per hectares. It is grown mostly during three consecutive seasons, viz., Ahu (March/April to June/July), Sali (June/July to November/December), and Boro (October/November to May/June). Among these seasons, Sali rice is a major cereal crop in Assam. The transplanted Sali rice is severally affected by the flash floods, which cause a huge loss in total yield. Whereas, in recent days, drastic changes have been observed in the climatic conditions, causing more frequent and severe drought conditions. The field crops during the Sali season are suffered from flash flood followed by intermitted drought stress in the same cropping season. In Assam, the lowest productivity of rice is recorded during the Ahu season as compared to the other rice-growing seasons. It is mainly due to the limited quantity of rainfall recorded during the Ahu season. The crop suffers from drought stress, which results in significant decline in total yield. Therefore, development of droughttolerant improved rice varieties may be considered as effective approach to overcome the yield loss in rice production.
QTL mapping is a prospective tool to detect genomic regions associated with the inheritance of polygenic traits [9]. In this regard, many QTLs have been identified for various droughtresponsive traits in rice using high-density molecular markers [10]. Among the various molecular markers, Single Nucleotide Polymorphisms (SNPs) are the most valuable and informative genetic markers [11,12], as these are uniformly distributed across the genome [13,14]. In the recent days, plenty of SNPs have been identified in rice using various next-generation sequencing platforms (NGS) such as Illumina, Ion Torrent, Complete genomics technology, Single molecule real time sequencing and Oxford nanopore technologies [15,16]. These sequencing platforms allows fast and low-cost genotyping of short or long nucleotide sequences. The genotyping-bysequencing (GBS) approach proposed by Elshire et al. [17] is a high throughput technique for genome-wide SNP discovery by multiplexing samples. It follows simple library preparation procedures for the detection of high-density SNPs, which may be used in QTL mapping within narrow confidence intervals, and also avoids the cost of in-depth whole genome sequencing [18]. Therefore, the present research programme was proposed with an objective to identify QTLs/candidate genes associated with yield component traits under drought stress using traditional rice varieties of NE India. The identified QTLs may be deployed in crop improvement programmes for the development of drought-tolerant rice varieties, for cultivation in Assam and other states of NE India.

Plant materials
In the present research programme, a traditional drought-tolerant landrace, 'Banglami' was hybridized with drought-susceptible rice variety 'Ranjit'. The tolerant parent, Banglami is characterized in tall height, medium in duration (120-130 days) with bold in grain type. While, Ranjit is a semi-dwarf, weekly photosensitivity, long duration rice variety (150-155 days) it has medium slender grains. The true F 1s were identified using molecular markers and used for development of F 2 plants. The single seed from each F 2 plants was bulked for development of recombinant inbred lines (RILs). The high density genotyping and phenotyping was performed using advanced RIL population grown under drought and control conditions.

Phenotyping
A set of 210 F 6 and F 7 RILs derived from the cross between 'Banglami × Ranjit' were evaluated under drought stress and control conditions during two consecutive years (2018 & 2019). The rice varieties, SahbhagiDhan, Nagina 22 and IR-64-drought were used as tolerant checks while, a short duration variety 'Luit' was used as a susceptible check for phenotyping. The field experiments were laid out in Augmented-Randomized Block Design (ARBD). The drought stress was imposed during the reproductive stage by withholding the irrigation water till the soil moisture content was reduced to 10% at 30 cm of the depth. Whereas, control (non stress) trial was conducted in the adjacent field following the standard agronomical practices. The whole trial layout was divided into 4 blocks with 2 replications of checks and parents in each block, for a total of 258 experimental units that were sown. The observations were recorded at different growth stages for sixteen morphological traits viz., days to 1st heading (DH), days to 50 percent flowering (DTF), number of tillers per plant (NOT), number of effective booting tillers per plant (EBT), plant height at maturity (PH), panicle length (PL), flag leaf length (FLL), flag leaf width (FLW), number of filled grains per panicle (NOG), number of chaffs per panicle (NOC), spikelet fertility (SF), grain yield per plant (GYP), total panicle weight per plant (TPWP), panicle harvest index (PHI), and test seed weight (TSW) (Supplementary Table 1). The statistical analysis of phenotypic data was performed using the software programmes R-Studio and SPSS.

Genotyping-by-sequencing
Genomic DNA was isolated from young leaves of both parents and population using standard protocol of Qiagen Plant DNeasy Mini Kit (QIAGEN, USA), and ApeKI restriction enzyme was used for GBS multiplex library preparation following the standard protocol [17]. DNA fragments were generated, and ligated with Illumina barcoded adaptor sequences. The fragments were then amplified using the KAPA HiFi Hotstart Ready Mix PCR Kit (KAPA Biosystems) to form a 192-plex library. The final PCR products were purified using Beckman Coulter AMPure XP to remove unincorporated dNTPs, unused primers, and primer dimers. The enriched plex library was quantified and sequenced in pair-end mode on a single lane on an Illumina HiSeq-TM X10 platform (Illumina® Inc., San Diego, CA, USA).

GBS data analysis
The raw sequence reads were processed using the Genome Analysis Toolkit (GATK)-GBS pipeline to identify highquality informative SNPs [19]. Illumina adapters and other unwanted low-quality sequences were trimmed using Trim Galore [16]. Next, reference-based read mapping was done by alignment of paired-end reads against the Oryza sativa genome assembly ensemble 48 using the Burrows-Wheeler Alignment programme (BWA), followed by variant (SNP) calling. The identified variants were filtered using VCFtools [16], based on a minor allele frequency of 0.05, minimum read depth of 5, and Phred quality score of 30 [20]. Finally, the informative SNPs were screened and only the polymorphic SNPs were brought forward for the construction of the genetic linkage map.

Identification of QTLs/candidate genes
The QTL IciMapping software [21] was used to remove the distorted SNPs through the χ 2 test to avoid unwanted noise in map construction [22]. Also, it filtered SNPs that are redundant in nature and positions with the most missing data to maintain the map quality. After stringent filtration, a genetic linkage map was constructed by JoinMap where SNPs were grouped into 12 linkage groups (LGs). The QTL analysis was performed using phenotypic data recorded during two consecutive seasons using the Inclusive Composite Interval Mapping of Additive Function (ICIM-ADD) programme [23]. The QTL nomenclature was done as per the procedure standardized by McCouch [24,25]. The QTLs identified under drought stress conditions were selected for candidate gene mining. The gene survey was restricted to a 1 mega base pair (MB) region, covering 500 kb regions towards both sides of the (upstream and downstream) QTL peak, considered as QTL bins. The sequence of flanking markers to the QTLs were extracted and mapped on the rice genome using the BLASTn tool. In order to survey genes, the physical map constituting the whole genome sequence of Oryza sativa belonging to subspecies Indica was used as a reference. The nucleotide sequence and genes in the QTL bins were retrieved from the Ensembl database version 67. The candidate genes within the QTL bins were analysed using gene expression atlas (https:// www. ebi. ac. uk/ gxa/ home).

Phenotypic analysis
The phenotypic evaluation of advanced RILs (F 6 and F 7 stage) was performed during the two consecutive years in 2018 and 2019, respectively. The mean trait data on yield component traits were recorded under drought and control conditions are presented in Table 1. Among the yield attributes, days to 50% flowering (DTF) is the most critical factor associated with drought tolerance in rice [9]. Under drought stress, the average number of 114.87 days was observed for DTF among the genotypes as compared to 124.43 days recorded under control conditions. The tested entries have flowered early under drought stress. In rice, early flowering is a mechanism to escape drought [26]. Moreover, significant positive co-relation was recorded between DTF and yield component traits, PH, PL, NOC, NOG and SF under control conditions whereas, negative correlation was observed between DTF with NOT, EBT and GY under drought conditions. In rice, spikelet fertility significantly contributes to grain yield under drought by regulating the number of filled grains [27]. Therefore, a significant decline in the SF (32.14%) was recorded among the genotypes under drought as it was observed as 89.81% under control condition. Suji et al. [28], report a significant positive correlation between grain yield with spikelet fertility under drought stress. As a consequence, significant decline in grain yield to 8.29 g/ plant was also recorded during the drought. While, the genotypes were recorded for an average grain yield of 16.26 g/ plant under control conditions.

Genotypic analysis
High-quality DNA was extracted from young and tender leaves and subjected to the GBS assay on an Illumina Hi-SeqTM X10 platform as this is the most preferred sequencing system with a low error rate and has significantly contributed to many crop genome sequencing projects [29]. In comparison to SSR markers, GBS takes just a week for library preparation and sequencing and provides excellent results [17,30]. A total of 103.74 GB of raw sequencing data was generated and submitted to short read archive (SRA) database of NCBI and assigned to a Bio-Project number, PRJNA770474. The average base content per sample was 246.55 million and the average guanine-cytosine percentage (GC%) per sample was 58.91%. After all the clean-up processes, a total of 81.74 GB of data was used for the alignment process. Among these, approximately 97.34% of reads were aligned properly to the reference genome. A total of 3,224,967 SNPs were identified. Among them, only 30,122 SNPs were left after filtering by several parameters, which were screened for significant segregation distortion. Only 5758 polymorphic informative SNPs were assigned to 12 Linkage Groups (LGs). A total of 90 segregation distortion regions were identified on all the 12 chromosomes, out of which the maximum number of distorted regions (15) was found on chromosome 1. A total of 1112 SNPs were discarded at the time of final ordering and rippling as they were unlinked in any of the LGs. Finally, 4646 polymorphic informative SNPs were grouped into 12 LGs by using the Kosambi mapping function. The genetic map showing the distribution of 4646 markers can be seen in Fig. 1. Several linkage maps have already been established for rice, which have proven functional in deciphering the drought tolerance mechanism of rice [31,32].
Finally, a total map length of 1306.424 cM was constructed, whereas the size of LG varied from 64.72 to 144.21 cM for chromosome 6 and chromosome 2, respectively ( Table 2). The average length of a chromosome was observed as 108.87 cM consisting of 3.75 SNPs per cM with 0.32 cM of inter-marker distance. The present genetic map represents 96.74% of the total Oryza sativa L. genome. While, LG1 had a maximum coverage of 99.4% and LG10 had the lowest coverage of only 86.28%. The maximum number of 580 SNPs were identified on chromosome 4 whereas, only 158 SNPs were detected on chromosome 11 with an average marker density of 387 SNPs per chromosome.  (Table 4). Therefore, the tolerant genotypes have the tendency to flower early in order to escape from drought stress. It is an important mechanism in rice to produce grain despite limited water availability [34,35]. In rice the production of tillers is a highly complex process affected by environmental conditions. This is an important trait, directly governing the yield potential of genotypes [36]. However, in the present study only 2 QTLs, qNOT_10.1 and qNOT_12.1 were identified for NOT under control conditions. These QTLs were located on chromosomes 10 and 12, explaining 2.86% and 8.48% of PVE, respectively. Under control conditions, 3 QTLs, qPH_5.2, qPH_8.1 and qPH_10.1, have been identified for PH on chromosomes 5, 8 and 10, respectively. Among these, qPH_5.2 was located at 27.32 Mb on chromosome 5, explaining 6.27% of PVE, while the remaining QTLs showed only 3.10%-5.25% contribution to the phenotype. A total of seven QTLs were mapped for the PL which were distributed across the 6 different chromosomes 1, 2, 4, 6, 7 and 12. Among these, 5 QTLs, qPL_1.1, qPL_2.1, qPL_6.1, qPL_7.1 and qPL_12.1 were identified under drought conditions whereas, the remaining 2 QTLs, qPL_4.2 and qPL_12.2 were detected under control conditions. It is a yield-attributing trait significantly associated with grain yield under drought-stress conditions. Whereas, a consistent QTL for PL, qPL_2.1 was identified at 23.42 Mb position on chromosome 2 under both hydrological conditions i.e., drought stress and control conditions. It was also   [37]. Several large effect and consistent QTLs have previously been identified [7] and are actively being introgressed into the varieties [10]. In the present study, a total of 3 QTLs were identified for the GYP, out of which only qGYP_3.1 was detected under drought stress while, qGYP_3.2 and qGYP_4.1 were identified under control conditions. Among these, qGYP_3.1 was located at 20.46 Mb on chromosome 3 explaining 6.89% of PVE, the favourable allele of which was contributed by the drought-tolerant parent Banglami. In the past, several researchers have already reported grain yield, a highly complex quantitative trait associated with the various component traits in rice. The mechanism for providing drought tolerance among plants is very complex process. It may result due to the interaction among the multiple quantitative traits including physiological and biochemical factors during vegetative and reproductive stages of the crop [38]. In rice, relative leaf water content (RLWC) is an important physiological trait associated with drought tolerance. It is an important indicator of water stress in leaves and associated to the moisture content of soil [31]. Under drought stress, tissue has a very low water status, and this becomes an important limiting factor and causes reduction in grain yield [39]. We have identified 4 QTLs, qRLWC_2.1, qRLWC_4.1, qRLWC_6.1 and qRLWC_12.1, for RLWC located on chromosomes 2, 4, 6 and 12 respectively. These QTLs contributes 3.78-8.17% of the total PVE. The favourable allele among all these QTLs was contributed by the tolerant parent. The QTLs identified during the present research programme were further investigated to identify the major candidate genes associated with drought tolerance.

Mining of candidate genes from the identified QTLs
The QTLs identified for various drought-responsive traits were studied using bioinformatics tools to locate candidate genes. A total of 17,390 genes were identified within the QTLs using the online available resources of IRGSP-1.0 of  After gene ontology studies, all annotated genes were classified into 6978 gene ontology (GO) terms, which were then assigned to three major GO terms: biological process (3049 GO terms), molecular function (2027 GO terms), and cellular component (1902 GO terms). Among these, six genes were identified within the important QTL regions which were differentially expressed (p < 0.05) under drought stress conditions. The list of candidate genes presents within the QTL regions, which are differentially expressed ispresented in Table 5. Among these, 5 genes are upregulated and 1 gene is downregulated under stress. The genes regulating calciumtransporting ATPase 9 (LOC_Os02g08018.1), nuclear-pore anchor (LOC_Os02g50799.1) and phosphoinositide binding protein (LOC_Os02g27130.1) were found to be up-regulated under drought stress condition within qSF_2.1. This is similarly reported in the study of Faraji et al. [40]. The regulated activity of calcium-transporting ATPases is needed for Ca 2+ flux inside and outside of cells (homeostasis) during stress and triggers a number of signalling cascades in plant cells.
The nuclear pore complex (NPC) is an important gateway for the trafficking of molecules between the nucleus and cytoplasm. The NPC along with its different karyopherins (nucleoporins, importins and exportins) have been found to be involved in the plant responses to different stress conditions [41]. The phosphoinositide binding proteins are important signalling molecules involved in the phosphatidylinositol signalling pathway in plants and also get involved in the ABA pathway, Ca 2+ release, and ROS generation during abiotic stress [42]. Within the same QTL interval, another gene was found to be downregulated under control conditions i.e. Cytochrome P450 (LOC_Os02g29960.1). When plants are exposed to drought stress, ABA comes into the highlight and constantly fluctuates during dehydration and rehydration which is catalyzed by ABA 8-hydroxylase, an enzyme that belongs to the Cytochrome P450 family [43]. The gene regulating histone demethylase JARID1C (LOC_Os03g05680.1) was found to be upregulated within the genomic region of qGYP_3.1. JARID, a histone demethylase enzyme that is involved in histone demethylation. Emerging evidence suggests the role of histone modifications in regulating plant responses to abiotic stresses [44,45]. The gene regulating OsWAK3-OsWAK receptor-like cytoplasmic kinase (LOC_Os01g20880.1) was found to be up-regulated within the genomic region of qPL_1.1. Receptor-like cytoplasmic kinases (RLCKs) have been reported to play important role in mediating the cellular response to various environmental stress perception [46]. Mining of these important candidate genomic regions allowed the narrowing down of different trait QTLs mapped in this study. The study of candidate  genes and their differential expression has much significance in dissecting the drought tolerance strategies of rice. These candidate genes can be validated in further gene expression studies and can be exploited for rice breeding to understand the adaptability of rice to drought stress.

Conclusion
Drought stress is a complex, abiotic stress. It is a serious threat to the rice-growing ecosystem of South-east Asia. Considering its future impact on the cultivation of rice, breeding for drought tolerance should be a prioritized objective for the researchers. The various traits studied here under both stress and non-stress conditions provide a major highlight of the QTLs and their associated genes, which can be further dissected and validated in different populations. Further, the contribution of either of the parents in contributing the favorable allele for different QTLs for different quality traits strengthened the usefulness of this population in further generations for QTL analysis and genetic interaction analysis between the alleles. The six genes identified within the drought stress QTLs can be considered as potential candidates for future investigation. All these QTLs and possible candidates will serve as an enriched raw material for molecular breeders to start another molecular breeding programme.