Morpho-molecular characterization of ethnic Bora rice for conservation and breeding

Glutinous Bora rice plays an important role in the socio-economic and cultural life of the Boro tribes of Assam. However, no systematic survey has been carried out, and no databank is available on the detailed morpho-molecular characterization that a breeding programme requires. The present study set out to prepare a morphological database with molecular signatures using 22 Bora rice lines collected from the upper Brahmaputra basin of Assam. The lines were phenotyped for 27 widely used agro-morphological traits, followed by marker-based genotyping, a sequence diversity study and expression profiling of selected genetic loci associated with low amylose content. A long duration to flowering and a low number of panicle-bearing tillers are the main undesirable traits of Bora rice, and these showed a negative association with culm length. Among the 8 phenotypic markers established, anthocyanin pigmentation in different parts of the plant and the presence of awns may be used as identifying criteria for selecting hybrid lines introgressed with the "Komal" trait. In marker-based genotyping, all the studied marker loci were polymorphic, with a variable number of alleles, and RM241 showed the highest PIC value. In the expression study, developing grains of a selected Komal genotype (Aghani Bora) sampled 5 days after anthesis showed 90.9-fold down-regulation of GBSSI (the locus associated with amylose deposition in rice grain) relative to the commonly grown popular rice (IR36). A phylogenetic study with 5 different rice genomes (O. rufipogon, O. glaberrima, O. sativa japonica group, O. sativa indica group and IR36), based on the sequence diversity of six major genetic loci associated with starch deposition in the endosperm, showed distinct relationships. Selected morpho-molecular markers (polymorphic between Bora and improved rice) were validated on a hybrid population developed from a selected Bora rice (Vogali Bora) and an improved rice (IR36), and showed a promising response for use in marker-assisted breeding programmes.

Introduction
Rice landraces and wild rice relatives are unique and valuable genetic resources (Odjo et al. 2017), and studying the structure of rice populations restricted to a particular area is essential for their use. However, since the onset of the green revolution, many indigenous rice landraces have been replaced by high-yielding, sd-1-carrying improved dwarf rice lines in most rice-growing regions of South and South East Asia (Nelson et al. 2019). India, one of the centers of origin and diversity of rice, once harbored many indigenous rice lines; at present, however, the majority of them are either extinct or on the verge of extinction (Deb 2019; Ranteallo et al. 2020). Five North Eastern states of India are very rich in ethnic floral diversity owing to their specific geographical location and agro-climatic conditions (Roy et al. 2016). The Bora rice of Assam is one such group of traditional landraces; it has low amylose content and a soak-n-eat property, and is used to prepare "Komal rice", one of the valuable folk rice preparations popular among the Boro tribes of Assam (Rajak and Bhuyan 2013). Bora rice is also eaten at daily breakfast as puffed rice and is used for bamboo rice, rice beer and various other preparations (Panja et al. 2022a). For these reasons local farmers continue to grow this kind of rice in small pockets of rural Assam despite the availability of high-yielding improved varieties (Shaptadvipa and Sarma 2009). Parboiled rice of selected Bora lines (Aghani Bora, Vogali Bora etc.), commonly known as Komal or soft rice because it is edible after soaking in normal water for 50-60 min, is not only important to the ethnic Boro tribes of Assam but is also being investigated by environmental biologists as a possible environment-friendly food for a future in which natural energy sources or fossil fuels become limited (Samaddar and Samaddar 2010; Panja et al. 2022b).
Thus, the preservation, conservation and popularization of this group of lines are of utmost importance for their utilization and for the introgression of this unique trait into commonly cultivated rice through breeding programmes. The initial step is the unequivocal identification of all the available promising Bora rice lines and the preparation of a databank for them. Although limited information on their common morpho-molecular features is available (Pragnya et al. 2018; Rathi and Sarma 2012; Shaptadvipa and Sarma 2009), it comes from a small number of marginal farmers and is scattered; a well-documented databank is essential for the proper use of these lines and their introgression in breeding programmes. Available reports (Bomit et al. 2018; Islam et al. 2018; Umarani et al. 2017; Roy et al. 2014) show that morpho-molecular characterization of rice landraces is a powerful tool for conservation and for utilization in breeding. With this in mind, in the present study 22 Bora rice landraces were characterized for important agromorphological traits, genotyped with molecular markers, and functionally validated for the low amylose trait through RT-PCR-based expression profiling of selected genetic loci. To study the phylogenetic relationship of this group with other rice lines and rice relatives, a number of genetic loci associated with starch synthesis were sequenced and analyzed bioinformatically. Finally, the identified morpho-molecular markers were validated for their usefulness in the breeding of Bora rice; the outcome of this validation should help breeders during marker-assisted selection of hybrid Bora rice.

Agromorphological characterization
For agromorphological characterization, 22 glutinous rice lines, popularly known as Bora rice and collected from the upper Brahmaputra basin of Assam (Fig. 1), were included in this study along with a popularly grown improved high-yielding variety (IR36). Images of the grains and kernels of all the experimental lines, with their accession numbers, are presented in Fig. 2. The agromorphological traits were studied on 6 plants randomly selected from each rice line, following the standard guidelines of IRRI and a modified protocol of Pachauri et al. (2013).

Statistical analysis of quantitative agromorphological traits
The association between the different grain and kernel dimensions was studied statistically, and a pairs panel was constructed using R, an open-source statistical program. Correlation among the major agromorphological traits that growers consider desirable was examined through Pearson correlation matrix analysis. All 19 traits and their distribution among the studied lines were presented in a heat map with clustering, in which the degree of each phenotypic variant carried by a genotype is shown through color gradation. Principal Component Analysis (PCA) was used to study how the rice lines are associated and distributed according to their contribution to the variation of the total population. The statistical tools used in this study are listed in supplementary files S 1.
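The Pearson correlation matrix described above can be sketched in a few lines. The study itself used R; the Python version below is only an illustration, and the trait values in `example` are hypothetical numbers, not measurements from the studied lines.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant trait")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(traits):
    """Return {(trait_a, trait_b): r} for every pair of traits."""
    names = list(traits)
    return {(a, b): pearson_r(traits[a], traits[b]) for a in names for b in names}


# Hypothetical values for four lines (illustration only, not study data).
example = {
    "GS": [2.3, 3.1, 3.8, 4.4],         # grain size
    "KS": [1.8, 2.6, 3.3, 4.1],         # kernel size
    "PH": [95.0, 120.0, 150.0, 175.0],  # plant height
    "NT": [14, 12, 10, 8],              # number of tillers
}
matrix = correlation_matrix(example)
```

With these made-up values, `matrix[("GS", "KS")]` is strongly positive and `matrix[("PH", "NT")]` is strongly negative, which is the pattern a correlogram would display through color.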

Molecular characterization
Genomic DNA of all the studied Bora rice lines and of the improved rice line (IR36) was isolated and genotyped with 4 trait-linked molecular markers (supplementary files, S 2) according to a standard protocol used earlier (Karmakar et al. 2012). PCR products were resolved in 3% agarose gel and their molecular weights were determined to construct a marker-based microsatellite panel. A dendrogram was constructed from a 1/0 matrix scoring the presence or absence of each allele of the studied markers. The polymorphism information content (PIC) of each marker was calculated using the most widely used formula (Hwang et al. 2009). An RT-PCR-based expression study of GBSSI, the key locus responsible for amylose synthesis in developing grains, was carried out in a selected Bora rice (Aghani Bora) and in the improved rice (IR36) using 5-day-old developing grains. Primer sequences for GBSSI and the reference gene (actin) are given in supplementary files S 3. Total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen), followed by cDNA synthesis with the Thermo Scientific RevertAid First-Strand cDNA Synthesis Kit. RT-PCR was carried out in a Bio-Rad real-time PCR system, and reactions were set up according to the manufacturer's protocol for the SYBR Green master mix (Promega).
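As an illustration of the PIC calculation mentioned above, the sketch below computes PIC from the allele calls at one marker. The text does not reproduce the formula taken from Hwang et al. (2009), so two commonly used variants are shown as assumptions: the simplified form 1 - Σp_i² and the form of Botstein et al., which subtracts an additional cross term. The function names and the allele calls in the example are hypothetical.

```python
from collections import Counter


def allele_frequencies(calls):
    """Relative frequency of each allele among the scored lines."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic_simple(freqs):
    """Simplified PIC (gene diversity form): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """PIC after Botstein et al.: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    freqs = list(freqs)
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return pic_simple(freqs) - cross


# Hypothetical allele calls for one marker across four lines.
calls = ["A", "A", "B", "C"]
freqs = allele_frequencies(calls).values()
print(pic_simple(freqs), pic_botstein(freqs))
```

Both variants increase with the number of alleles and with how evenly the alleles are distributed, so a marker with more, evenly spread alleles would be expected to score a higher PIC under either definition.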
Given the low amylose content of Bora rice, genomic DNA of a selected line (Aghani Bora, a typical representative of Bora rice with the 'soak-and-eat' trait) and of a high-amylose improved rice without the Komal trait (IR36) was amplified at 6 genetic loci (supplementary files S 4) associated with starch synthesis in the developing endosperm. The amplified products were sequenced, and the resulting FASTA sequences were searched against the genomic sequences of 2 wild rice species (O. rufipogon, the progenitor of modern rice, and O. glaberrima, the progenitor of African rice), one japonica rice (O. sativa var. Nipponbare) and one indica rice (O. sativa cv. 93-11) using the BLAST tool available in the Ensembl Plants database (https://plants.ensembl.org/Multi/Tools/Blast).
The aligned sequences were retrieved and subjected to phylogenetic analysis with the MEGA X tool (Kumar et al. 2018) using the Neighbor-Joining method, and evolutionary distances were determined. The twelve derived sequences (6 from Aghani Bora and 6 from IR36) were then deposited in NCBI GenBank, and the generated IDs are given in the constructed phylogenetic trees.
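Neighbor-Joining works from a matrix of pairwise evolutionary distances between the aligned sequences. The text does not state which distance model was selected in MEGA X, so the sketch below uses the simplest one, the p-distance (proportion of differing sites), purely as an assumption. The sequence names and sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of {name: aligned sequence}."""
    names = list(aligned)
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical aligned fragments (illustration only, not deposited sequences).
aligned = {
    "line_1": "ACGTACGTAC",
    "line_2": "ACGTACGTAA",
    "line_3": "ACG-ACCTAA",
}
distances = distance_matrix(aligned)
```

A tree-building method such as Neighbor-Joining would take a matrix like `distances` as its input; the tree construction itself is left to dedicated software.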

Agromorphological characterization
Grain and kernel dimensions, which varied among the rice lines, are presented graphically in Fig. 3a-c. The maximum grain size (4.37 cm²) was noted in VB548 (Vogali Bora) and the maximum kernel size (4.15 cm²) in VB550 (Poita Bora), whereas the minimum grain size (2.32 cm²) and minimum kernel size (1.8 cm²) were observed in VB568 (Titaphulia Bora). Flag leaves were very large in Bora rice compared with the improved rice (IR36) and showed the characteristics of typical landraces (Fig. 3d-f). The largest flag leaf size was found in VB562 (Pangari Bora) and the largest flag leaf area in VB552 (Kola Bora), at 45.28 cm² and 68.80 cm², respectively (Fig. 4a). The Bora rice lines were taller than the improved rice line (Fig. 4b). The greatest culm length (175.31 cm) was recorded in VB567 (Saudang Bora) and the smallest (94.86 cm) in VB547 (Aghani Bora). For yield- and improvement-related traits, the Bora rice lines performed less well than the improved rice. Among the studied Bora rice lines, the maximum number of tillers (> 14) and panicles (> 12) was recorded in Aghani Bora, whereas in the improved rice (IR36) the maximum number of tillers and panicles was more than 20 and 18, respectively (Fig. 4c). The longest panicle (27.26 cm) was recorded in VB556 (Ruphohi Bora), which is considerably longer than in the improved rice, but the same line had only 9 panicles (Fig. 4d). The improved rice took only 69 days to 50% flowering and 100 days to harvest, whereas the Bora rice lines took much longer; the minimum days to flowering and to harvest were 98 and 130, respectively, in Soraibhanu Bora (Fig. 4e). The highest number of grains per panicle (> 220) was observed in Huki Bora (VB561), although this line had fewer than 9 panicles (Fig. 4f). Yield, in terms of 100-grain weight, ranged from 2.2 to 3.95 g among the studied rice lines (Fig. 5a).
Based on the variable amount of anthocyanin deposited in different parts of the shoot, five types of immature panicle (Fig. 5b), two types of auricle (Fig. 5c), two types of basal leaf sheath (Fig. 5d) and two types of leaf margin (Fig. 5e) were recorded in the studied lines. In addition, two types of grain morphology were noted based on the presence or absence of awns (Fig. 5f).
Grouping of the studied rice lines (Table 1) by 6 quantitative traits, following the standard scale of IRRI (supplementary files S 5), showed that most of the rice lines have medium-sized grains, slender kernels and tall plants with a medium number of tillers. Grouping of the rice lines (Table 2) by 8 qualitative traits (the identified markers) showed that the majority of the plants possess awned grains and anthocyanin pigmentation in different parts of the plant.

Statistical analysis of quantitative agromorphological traits
The association between grain and kernel dimensions is presented in a pairs panel (Fig. 6a), and the correlation among all the quantitative traits is presented in a correlogram (Fig. 6b); the calculated correlation matrix is given in supplementary files S 6. Correlation analysis showed that GS and KS are positively correlated, whereas PH and NT are negatively correlated. Cluster analysis based on the studied traits generated two major clusters. The 1st cluster included most of the Bora rice lines, whereas the 2nd cluster included the remaining 5 Bora rice lines [Aghani Bora (VB547), Vogali Bora (VB548), Bora 1 (VB549), Poita Bora (VB550) and Narkul Bora (VB551)] together with the improved rice line (IR36), which was placed in a separate satellite sub-cluster, as shown in the clustered heat map (Fig. 6c). The heat map clearly showed that the improved rice line is characterized by the highest number of panicles, short height and early flowering, whereas most of the Bora rice lines showed the opposite pattern, except for the 5 lines in the 2nd cluster close to the improved rice line (IR36). PCA gave a result similar to that of the cluster analysis: 30.6% of the total variance was explained by PCA 1 and 16.8% by PCA 2 (Fig. 6d), with the improved rice (IR36) placed on PCA 2 together with a few Bora rice lines, whereas most Bora rice lines were placed on PCA 1.

Molecular characterization
Genotyping with 4 polymorphic, trait-linked rice microsatellite markers was used to construct a microsatellite panel (Fig. 7a) in which all the alleles and their corresponding allelic ranges are presented. The maximum number of alleles (6) was observed for RM241 and the minimum (3) for RM332 across the studied rice lines. Unique alleles were found for RM207, RM241 and RM266, which are monomorphic. The PIC values of RM207, RM332, RM241 and RM266 were 0.711, 0.553, 0.779 and 0.679, respectively. The highest PIC values were found for RM207 and RM241 (linked with GL and GB, respectively), which serve as molecular signatures of the wide range of GL and GB in the studied population. About 86% of the studied Bora lines had a grain weight below 3 g per 100 grains. Genotyping with RM266 (linked with grain weight) showed that most of the lines carried either the A or the C allele. Genotyping with RM332 (linked with panicle length), on the other hand, showed that the improved rice with short panicles possesses the B allele, whereas most of the Bora rice lines possess the A allele and have longer panicles than the improved rice. The constructed dendrogram (Fig. 7b) had two major clusters, which included 14 and 9 rice genotypes, respectively.

Table 2 (partial; numbers are VB accession numbers; cells marked [missing] are absent from the extracted text)
Trait | State | Rice lines
[missing] | [missing] | 548, 549, 550, 551, 552, 554, 555, 556, 557, 559, 561, 565
[missing] | Absent (White) | 09, 547, 553, 558, 560, 562, 563, 564, 566, 567, 568
PA | Present (Purple) | 548, 549, 551, 552, 553, 558, 559, 561, 563, 565, 566, 567
PA | Absent (White) | 09, 547, 550, 554, 555, 556, 557, 560, 562, 564, 568
PLM | Present (Purple) | 548, 549, 550, 551, 552, 554, 556, 557, 559, 561, 565
PLM | Absent (White) | 09, 547, 553, 555, 558, 560, 562, 563, 564, 566, 567, 568
CIP | Whitish green | 09, 547, 553, 554, 560
CIP | Greenish yellow | 548, 559
CIP | Blackish | 549, 550, 551, 556, 557, 561, 562, 564, 565, 566, 567
CIP | Golden brown | 552, 555, 558, 563, 568
PLP | Whole | 548, 549, 550, 551, 552, 555, 556, 557, 558, 559, 561, 562, 563, 564, 565, 566, 567, 568
PLP | Only tip | 553, 554
[missing] | Absent | 09, 547, 548, 549, 550, 551, 554, 555, 556, 557, 559, 560, 561, 565, 567
[missing] | Absent | 09, 552, 553, 558, 562, 563, 564, 566, 568
Expression profiling of the GBSSI transcript in VB547 showed significant down-regulation (90.9-fold) relative to the improved rice (VB09), as presented graphically in Fig. 7c and in the clustergram (Fig. 7d). The phylogenetic tree for the AGPSIIb locus (Fig. 8a) revealed two separate clusters: the first included both Komal and improved rice, whereas in the second cluster the Japonica group joined O. rufipogon and O. glaberrima, with the Indica group, to form a sub-cluster. In the AGPLII-derived phylogenetic analysis (Fig. 8b), O. glaberrima was distant from the rest of the lines, whereas the studied Komal rice and improved rice formed an isolated cluster to which the remaining lines joined. The GBSSI-derived phylogenetic tree (Fig. 8c) showed that O. rufipogon separated from the rest of the lines, which formed a cluster in which the Indica group, along with African rice (O. glaberrima), and the Japonica group grouped with Komal rice. In the SSIIa-derived phylogenetic analysis (Fig. 8d), the improved rice (IR36) separated distinctly from all other rice lines, while Komal rice grouped with the Japonica and Indica groups to form a small sub-cluster. In the phylogenetic tree for SBEIIb (Fig. 8e), IR36 and Komal rice lay in a single cluster, whereas the rest of the lines formed another cluster. In the ISA1 sequence-based phylogenetic analysis (Fig. 8f), Aghani Bora was completely separated from the rest of the lines, among which African rice and O. rufipogon formed a small sub-cluster that the Japonica and Indica groups joined.
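The fold change reported above for GBSSI is a relative expression value normalized to the reference gene (actin). The text does not state how the value was computed; the sketch below assumes the widely used 2^(-ΔΔCt) method and perfect amplification efficiency. The function names and all Ct values are hypothetical and chosen only so that the result falls near a 90-fold reduction.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression of the target gene by the 2^(-ddCt) method.

    Each target Ct is first normalized to the reference gene in the same
    sample (dCt); the control dCt is then subtracted from the sample dCt.
    """
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)


def fold_down_regulation(relative):
    """Express a relative expression below 1 as an n-fold reduction."""
    if relative <= 0:
        raise ValueError("relative expression must be positive")
    return 1.0 / relative


# Hypothetical Ct values: target and reference gene in the Komal line
# (sample) and in the improved line (control).
rel = relative_expression(
    ct_target_sample=28.5, ct_ref_sample=18.0,
    ct_target_control=22.0, ct_ref_control=18.0,
)
print(round(fold_down_regulation(rel), 1))  # ddCt = 6.5, so 2**6.5 = 90.5
```

Under this assumption, a 90.9-fold down-regulation corresponds to a ΔΔCt of about 6.5 cycles, since log2(90.9) ≈ 6.51.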

Utilization of identified morpho-molecular markers in breeding of Bora rice
Of the identified morphological markers, two (anthocyanin pigmentation in immature grains and the presence of awns) were validated together with a promising molecular marker (RM332) that gave reproducible bands. These markers were screened for polymorphism between Bora rice (Vogali Bora) and improved rice (IR36) and for their heterozygous state in the F1 hybrid plant generated from the two. Anthocyanin deposition in immature panicles and the presence of awns were recorded in the true F1 hybrid plant; both traits are inherited from the donor plant (Vogali Bora), which has them, whereas the recipient plant (IR36) lacks them (Fig. 9a). Marker-assisted selection of the hybrid population with RM332 showed the heterozygous condition, which also confirmed the true hybrid nature of the plants (Fig. 9b).
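The logic of confirming a true hybrid with a co-dominant marker such as RM332 can be written down as a simple rule: an F1 plant should show both the donor and the recipient allele. The sketch below is a minimal illustration of that rule; the band sizes, the tolerance and the function names are hypothetical and are not taken from the study.

```python
def has_band(bands, size, tolerance):
    """True if any scored band lies within `tolerance` bp of `size`."""
    return any(abs(band - size) <= tolerance for band in bands)


def classify_genotype(bands, donor_allele, recipient_allele, tolerance=2):
    """Classify a plant from its scored band sizes (bp) at one marker."""
    donor = has_band(bands, donor_allele, tolerance)
    recipient = has_band(bands, recipient_allele, tolerance)
    if donor and recipient:
        return "heterozygous"   # both parental alleles present: true hybrid
    if donor:
        return "donor-type"
    if recipient:
        return "recipient-type"
    return "unscorable"


# Hypothetical allele sizes for the donor and recipient parents.
DONOR_BP, RECIPIENT_BP = 175, 160
print(classify_genotype([160, 175], DONOR_BP, RECIPIENT_BP))  # heterozygous
print(classify_genotype([160], DONOR_BP, RECIPIENT_BP))       # recipient-type
```

A plant classified as recipient-type in such a screen would be a likely product of self-pollination rather than a true hybrid.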

Discussion
Agromorphological analysis showed that most of the Bora rice lines have medium grains and kernels, as also reported by Pragnya et al. (2018). Large flag leaves were recorded in most of the Bora rice lines, a trait that has a strong positive correlation with yield-related traits (Zaman et al. 2005); however, almost 86% of the Bora rice lines fall in the tall category, which has a negative effect on yield traits because less sugar accumulates in the reproductive organs (Chakraborty and Chakraborty 2010). In addition, 86% of the Bora rice lines belong to the medium tillering group, whereas only the improved rice (IR36) belongs to the good tillering group. Almost all the Bora rice lines showed late flowering and late harvesting, which is a key characteristic of landraces (Pachuri et al. 2017). Anthocyanin pigmentation in different parts of the plant, especially the reproductive parts, and the presence of awns on the grains are the common qualitative traits of Bora rice lines, a feature noticed earlier by Pragnya et al. (2018). The apparent variation in the amount of coloration in the lemma and palea of the developing panicle, which is common in rice landraces, could cast doubt on the genetic purity of the lines, but the presence of a white stigma in all the studied Bora lines confirmed their genetic purity. For further confirmation we included a molecular marker-based study, which is not influenced by environmental factors, can unequivocally establish the genetic purity of the studied rice lines, and may be used for the identification and conservation of genetic resources. We found strong positive correlations between GS and KS, between NT and NP/P, and between DF and DH (r = 0.8 to 1), whereas negative correlations were noted between NT and PH, DF, DH and grain weight (r = -0.8 to -1). This result again indicates that plant height is negatively associated with yield traits, as already reported by Dhakal et al. (2020).
In the cluster analysis, most of the lines fall in cluster I, and a few rice lines with better yield and improved traits (VB547, VB548, VB549, VB550 and VB551) fall in cluster II together with VB09 (IR36). All the studied rice lines are assigned to the glutinous (Bora) group because of their similarly low amylose content rather than their morphological characteristics, although they come from the same geographical region. On the basis of the 19 quantitative agromorphological traits, however, they form two clusters: most of them show similar morphological traits and lie in cluster I, but a few show some similarity with IR36 and form another cluster (cluster II). For qualitative traits such as anthocyanin deposition during panicle development, most of the Bora rice lines (except VB547 and VB560) showed anthocyanin deposition, which is the distinguishing agronomic trait of Bora rice lines. VB547 and VB560 nevertheless possess awns, like the rest of the Bora rice lines. It can therefore be inferred that, after evolving in Majuli for the Komal trait, these lines spread to different parts of Assam (Fig. 1), and that environmental factors produced some diversity in quantitative traits, as reflected in the results, whereas the qualitative traits did not change because they are not influenced by the environment. As all the studied lines except VB09 (IR36) are glutinous Bora rice lines, the population as a whole is placed on PCA 1, and again the four above-mentioned lines lie close to PCA 2 with VB09. Both the cluster and the PCA results clearly indicate that lines of the same group remain in the same cluster with the same population structure, as also reported by Roy et al. (2016). The molecular marker-based dendrogram also gave two clusters, with 60% of the population in cluster I and 40% in cluster II. Cluster II was more uniform, and its rice lines represent a group with more improved yield traits.
Available reports (Sharma et al. 1971; Rajak and Bhuyan 2013) suggest that, a long time ago, Bora rice lines were introduced from Thailand or Burma to the Majuli river island of Assam through natural migration. There they evolved and spread across other parts of Assam, where natural or environmental effects changed some of their agronomic traits in the course of evolution. As a result, a few lines acquired some distinct quantitative agromorphological characteristics resembling those of improved rice, while their qualitative traits remained the same as in other Bora rice. In the molecular study, 3 to 6 alleles were detected per marker locus, and two markers (RM207, RM241) showed high polymorphism information content, which is an important indicator of genetic variation in the studied population (Roy et al. 2016) and is indirectly supported by the variation observed in the morphological traits linked to those markers (GL, GB). Variance in grain weight was very limited in the studied lines, which parallels the limited allelic diversity of the marker locus linked with grain weight (RM266). The same pattern was recorded for another marker locus (RM332), which is linked with panicle length. All these findings indicate the effectiveness of the four trait-linked markers studied, each associated with a particular agronomic trait. Expression of GBSSI was much lower in the low-amylose Aghani Bora than in the high-amylose IR36; the molecular study as a whole points to a unique molecular mechanism for the low amylose trait in Bora rice and also reconfirms that Bora rice lines are Japonica rice with low amylose (Vyas 2020). A similar observation was made by Inukai (2017) for low-amylose Japonica rice. The phylogenetic study revealed that two important genetic loci of Komal rice (Aghani Bora) associated with amylose and amylopectin synthesis (GBSSI, SSIIa) are close to those of the Japonica group, which has low amylose and high amylopectin; this may be due to their Japonica ancestry (Vyas 2020).
For the other four genetic loci, however, the picture differs. At the loci responsible for the addition of glucose molecules during starch synthesis (AGPSIIb, AGPLII) and for the branching of amylopectin (SBEIIb), Komal rice was close to the high-amylose IR36, whereas at another locus (ISA1), responsible for the debranching of amylopectin (Ohdan et al. 2005), Komal rice was completely separated from the other genotypes. As neither the high-amylose Indica rice (IR36) nor the very low-amylose Japonica rice (Nipponbare) possesses the Komal trait, it may be concluded that the major starch synthesis-related loci of Bora rice are distinct from those of both low- and high-amylose rice and evolved specifically for the Komal trait under the special environmental and geographical conditions of Majuli Island.
In the present study, with the objective of conserving Bora rice lines, we characterized these lines for 19 quantitative traits. As breeders are always looking for markers (Rawte and Saxena 2018), we also focused on marker identification through thorough characterization of 8 qualitative traits and through genotypic profiles developed for 4 trait-linked markers. Qualitative traits are the most important morphological markers for identification (Singh et al. 2014), as they are not influenced by the environment and are purely genetically controlled (Kalyan et al. 2017); we identified at least 8 morphological markers (listed in Table 2) suitable for Bora rice breeding. Among these, anthocyanin deposition in the flower and the presence of awns were successfully validated for use in breeding programmes, and other markers such as auricle color and basal leaf sheath color can also be used in breeding between Bora rice and improved rice. As these lines are disappearing day by day, the databank prepared here for the Bora rice group, characterized with 27 widely used agronomic traits, is essential for proper morphological identification during the conservation of individual genotypes. At present, polymorphic markers are widely used in molecular breeding programmes (Junjian et al. 2002) for the identification and selection of recombinant inbred lines (RILs) showing desirable traits. In this work, the identified polymorphic marker (RM332) is now being used to screen the hybrid population generated from a cross between Bora rice with the Komal trait and improved rice without it. In addition, we established some positive associations among yield-related quantitative traits, which will help future breeding programmes to introgress the unique Komal trait of Bora rice into a higher-yielding background.
Our study describes the morpho-molecular characteristics of the Bora rice group with the 'soak-and-eat' (Komal) trait, which may help to conserve this special group of landraces; the specific markers used in this study can also be utilized in future breeding programmes involving Bora rice.