Phenotyping of the population for physiological traits in rice
The mean values of the 250 genotypes for 15 physiological traits related to seed vigour, viz. EC, RI, G, GI, RL, SL, SDW, SVI-I, SVI-II, RRG, RSG, RPLE, RGR, AGR and MGR, were estimated during the wet seasons of 2019 and 2020 (Supplementary Table 1). The germplasm lines differed significantly for all 15 traits. Based on their frequency distributions, the 250 germplasm lines were broadly classified into four groups for each of the 15 physiological parameters (Fig. 1A-C), and these were treated as phenotypic groups or subpopulations (Fig. 1). A representative panel of 96 genotypes was developed from the original population by shortlisting germplasm lines from all the phenotypic groups of each parameter (Table 1; Fig. 2). The mean values of the 15 physiological traits in this panel also varied significantly among the genotypes for each trait (Table 1). High to very high estimates of seed vigour index I were observed in the germplasm lines MNP-AC9005, MNP-AC9006, MNP-AC9021, MNP-AC9038, MNP-AC9043, JBS-AC20282, JBS-AC20328, JBS-AC20362, JBS-AC20371, JBS-AC20389, JBS-AC20770, JBS-AC20920, Kantakapura, Kantakaamala, Kapanthi, Champaeisiali, Gondiachampeisiali, Kaniar, Adira-2PallakadR3, Pk-21 and AC-10187. High seed vigour index II was estimated in the lines MNP-AC9030, MNP-AC9038, MNP-AC9043, MNP-AC9044A, MNP-AC9058, MNP-AC9063, MNP-AC9065, MNP-AC9090, MNP-AC9093, JBS-AC20282, JBS-AC20328, JBS-AC20362, JBS-AC20371 and JBS-AC20389 (Table 1). The germplasm lines MNP-AC9021, JBS-AC20317, JBS-AC20328, JBS-AC20362, JBS-AC20371, JBS-AC20389, Adira-2PallakadR3, Pk-21, Shayam, Basumati-B, Sugandha-2 and Chatuimuchi showed a high germination index. High values of seedling dry weight were recorded in the germplasm lines MNP-AC9028, MNP-AC9030, MNP-AC9035, MNP-AC9043, MNP-AC9058, MNP-AC9063, MNP-AC9065, MNP-AC9090 and MNP-AC9093.
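SVI-I and SVI-II are derived traits, but the formulas are not given here. The sketch below assumes the definitions most commonly used in seed testing (germination percentage multiplied by seedling length for SVI-I, and by seedling dry weight for SVI-II); the function names, units and input values are illustrative assumptions, not the authors' calculation.

```python
# Minimal sketch of the two seed vigour indices, assuming the common definitions:
#   SVI-I  = germination (%) x seedling length (root length + shoot length, cm)
#   SVI-II = germination (%) x seedling dry weight (units as measured, e.g. mg)
# Function names and units are illustrative assumptions, not the authors' code.


def seedling_length(root_cm: float, shoot_cm: float) -> float:
    """Seedling length as the sum of root length (RL) and shoot length (SL)."""
    return root_cm + shoot_cm


def svi_1(germination_pct: float, root_cm: float, shoot_cm: float) -> float:
    """Seed vigour index I: germination percentage times seedling length."""
    return germination_pct * seedling_length(root_cm, shoot_cm)


def svi_2(germination_pct: float, dry_weight: float) -> float:
    """Seed vigour index II: germination percentage times seedling dry weight."""
    return germination_pct * dry_weight


# Made-up values for a single genotype
print(svi_1(95.0, 8.0, 6.5))  # 95 * 14.5 = 1377.5
print(svi_2(95.0, 12.0))      # 95 * 12.0 = 1140.0
```

Because both indices are products, their numeric scale is much larger than that of the component traits, which matters when they are analysed together with unscaled traits.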
Very high RSG estimates were noticed in MNP-AC9030, MNP-AC9035, MNP-AC9038, Kapanthi and Pk-21, while MNP-AC9044A, JBS-AC20423, JBS-AC20920, Champaeisiali, Magra, Balisaralaktimachi-k, AC-6023, AC-6183, AC-10162 and AC-10187 exhibited a high rate of root growth. Very high relative growth rate was recorded in the genotypes MNP-AC9030, MNP-AC9038, MNP-AC9043, MNP-AC9044A, Kundadhan, Latamahu, Laxmibilash, Kanakchampa, Magura-s, PK-18Ezhoml-2, PK-24Jyothi, PK-25Jaya, PK-33D1, AC-5993, AC-6027, AC-6172, AC-7134 and AC-7269s. The germplasm lines Kantakapura, Champaeisiali and Latamahu showed very high estimates for rate of plumule elongation. Very high seedling shoot length was recorded in MNP-AC9030, MNP-AC9035, MNP-AC9038, MNP-AC9043, MNP-AC9044A, JBS-AC20282 and JBS-AC20770. Very high electrical conductivity was detected in the genotypes MNP-AC9030, JBS-AC20282, JBS-AC20845, JBS-AC20907 and AC-7008. The germplasm lines MNP-AC9038, MNP-AC9043, JBS-AC20282 and JBS-AC20845 were categorized as having a very high rate of imbibition. In addition, very high seedling root length was recorded in the germplasm lines MNP-AC9005, MNP-AC9076A, Kapanthi and Champaeisiali. The germplasm lines MNP-AC9005, MNP-AC9006, MNP-AC9038, MNP-AC9043, MNP-AC9044A, JBS-AC20282 and JBS-AC20328 possessed high values for ten or more physiological traits (Table 1).
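The text does not state how lines were placed in the "poor" to "very high" classes. As a hypothetical illustration only, the sketch below uses cut-offs at the trait mean and one standard deviation either side of it, then counts, per genotype, the traits falling in the two upper classes, which is the kind of tally behind a "ten or more traits" statement. Trait names, genotypes and values are made up.

```python
from statistics import mean, stdev


def classify(value, mu, sd):
    """Place a value in one of four classes using mean +/- 1 SD cut-offs (assumed scheme)."""
    if value < mu - sd:
        return "poor"
    if value < mu:
        return "moderate"
    if value < mu + sd:
        return "high"
    return "very high"


def count_high_traits(trait_table):
    """trait_table: {trait: {genotype: mean value}}.

    Returns {genotype: number of traits in the 'high' or 'very high' class}.
    """
    counts = {}
    for values in trait_table.values():
        data = list(values.values())
        mu, sd = mean(data), stdev(data)
        for genotype, v in values.items():
            counts.setdefault(genotype, 0)
            if classify(v, mu, sd) in ("high", "very high"):
                counts[genotype] += 1
    return counts


# Hypothetical two-trait table for four genotypes
table = {
    "trait_A": {"g1": 10, "g2": 20, "g3": 30, "g4": 40},
    "trait_B": {"g1": 1, "g2": 2, "g3": 3, "g4": 4},
}
counts = count_high_traits(table)
print(counts)                                  # {'g1': 0, 'g2': 0, 'g3': 2, 'g4': 2}
print([g for g, c in counts.items() if c >= 2])  # ['g3', 'g4']
```

For a trait where a lower value indicates better vigour (for example, electrical conductivity of seed leachate), the direction of the classes would need to be reversed before counting.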
Relatedness among germplasm lines for physiological traits through genotype-by-trait biplot analysis
A scatter diagram of the first two principal components was used to generate a genotype-by-trait biplot for the 15 physiological traits estimated in the 96 genotypes of the panel (Fig. 3). The first and second principal components explained 99.651% and 0.2204% of the total variability, with eigenvalues of 102641 and 227.016, respectively (Supplementary Fig. 1). Based on the principal component analysis, SVI-I contributed the most to the diversity among the 15 physiological parameters in the panel, followed by GI and SVI-II (Fig. 3). The scattering pattern of the genotypes across the four quadrants showed that genotypes with high estimates of the parameters were placed mainly towards quadrants I and II; these genotypes are encircled in the figure (Fig. 3). The top-right (1st) and bottom-right (2nd) quadrants accommodated the majority of the genotypes with high estimates of the physiological parameters. The bottom-left (3rd) quadrant held mostly genotypes with moderate values, while the top-left (4th) quadrant accommodated the majority of genotypes with poor values for the physiological parameters (Fig. 3).
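The percentage of variance explained by a principal component is its eigenvalue divided by the sum of all eigenvalues; the reported figures are consistent with a total of roughly 1.03 × 10^5 (102641 is 99.651% of it and 227.016 is 0.2204%). The NumPy sketch below, on made-up data, also illustrates one possible reason a single trait can dominate PC1: in a covariance-matrix PCA of unscaled traits, a trait on a large numeric scale (as SVI-I typically is) carries most of the variance, whereas standardizing the traits spreads it out. Whether the biplot used scaled or unscaled traits is not stated, so this is an assumption.

```python
import numpy as np


def pca_variance(X, standardize=False):
    """Eigenvalues (largest first) and % of total variance for each PC.

    X: 2-D array, rows = genotypes, columns = traits.
    standardize=False -> covariance-matrix PCA, so trait scales matter;
    standardize=True  -> correlation-matrix PCA (traits scaled to unit variance).
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    if standardize:
        Z = Z / X.std(axis=0, ddof=1)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    return eigvals, 100.0 * eigvals / eigvals.sum()


# Made-up data: one large-scale trait (SVI-I-like) and one small-scale trait
X = np.array([[1000, 1], [2000, 2], [3000, 1], [4000, 2]])
_, pct_raw = pca_variance(X)
_, pct_std = pca_variance(X, standardize=True)
print(pct_raw.round(2), pct_std.round(2))
```

With the raw traits PC1 takes more than 99% of the variance; after standardization it takes about 72%.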
Nature of association among seed vigour related traits
The association among the 15 physiological traits revealed strong positive correlations (r ≥ 0.7) of SVI-I with GI and GP, SVI-II with SDW, GP with GI, and AGR with RRG and RSG (Fig. 4). Moderate positive correlations (0.5 ≤ r < 0.7) were observed for SVI-I with RL and SL; SVI-II with GI, SVI-I and GR; RI with EC; SL with RI and RL; RRG with RL; RSG with SL; AGR with RL and SL; and MGR with GI, SDW and SVI-II. Weak positive correlations (r < 0.5) were noticed for SVI-I with RI, RRG, RSG, AGR and MGR; SVI-II with RI, GP, RL and SL; RRG with SL, RSG and AGR; RSG with GI, RL, SDW and RRG; RPE with RL and SL; AGR with RI and SDW; and MGR with Germ, RL, SL and RSG. However, weak negative correlations were estimated for RGR with GI, GP and SVI-I (Fig. 4).
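The correlation classes above can be reproduced from Pearson's r using the stated cut-offs. A minimal pure-Python sketch, assuming the same cut-offs are applied to |r| for negative correlations; the trait values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength(r):
    """Label r with the cut-offs used in the text (>= 0.7 strong, 0.5-0.7 moderate, < 0.5 weak), applied to |r|."""
    sign = "positive" if r >= 0 else "negative"
    a = abs(r)
    if a >= 0.7:
        return f"strong {sign}"
    if a >= 0.5:
        return f"moderate {sign}"
    return f"weak {sign}"


# Made-up trait means for five genotypes
gi = [12.1, 14.3, 9.8, 16.0, 11.2]
svi1 = [1210, 1405, 1010, 1580, 1150]
r = pearson(gi, svi1)
print(round(r, 3), strength(r))
```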
Genetic diversity parameters analysis
The panel of 96 genotypes constituted from the original population, which exhibited wide variation for the physiological parameters, was genotyped using 109 molecular markers. Gene diversity, the loci used and other diversity-related parameters are presented in Table 2. A total of 404 marker alleles were obtained, with an average of 3.07 alleles per locus. The number of alleles per locus ranged from 2 to 7, with the highest number of alleles detected by RM220, RM448 and RM493. The major allele frequency of the polymorphic markers averaged 0.578 and ranged from 0.292 (RM488 and RM493) to 0.958 (RM22034) (Table 2). The PIC value ranged from 0.141 (RM315 and RM6054) to 0.771 (RM493), with a mean of 0.477. The average observed heterozygosity (Ho) in the population was 0.117 and ranged from 0.00 to 0.958. The gene diversity (He) in the panel ranged from 0.061 (RM556) to 0.799 (RM493), with a mean of 0.533.
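The per-locus statistics in Table 2 follow standard formulas. Given allele frequencies p_i at a locus, gene diversity is He = 1 − Σ p_i², Botstein's polymorphism information content is PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², and the major allele frequency is the largest p_i. The sketch shows only these uncorrected forms; some software also applies a sample-size correction to He. The locus data are hypothetical, treating each inbred line as one homozygous allele call.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Frequencies of each allele observed at one locus (list of allele calls)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def major_allele_frequency(freqs):
    return max(freqs)


def gene_diversity(freqs):
    """Uncorrected expected heterozygosity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Botstein's PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    homo = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homo - cross


def mean_alleles_per_locus(total_alleles, n_loci):
    return total_alleles / n_loci


# Hypothetical locus scored in 96 lines, one allele call per line
calls = ["a"] * 48 + ["b"] * 24 + ["c"] * 24
f = allele_frequencies(calls)  # [0.5, 0.25, 0.25]
print(major_allele_frequency(f), gene_diversity(f), pic(f))  # 0.5 0.625 0.5546875
```

For the totals reported above, `mean_alleles_per_locus(404, 109)` returns about 3.71.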
Population genetic structure analysis
The genetic structure of the panel genotypes, which varied for the studied physiological parameters, was evaluated in STRUCTURE 2.3.6 by testing a range of probable numbers of sub-populations (K) and selecting the K with the highest delta K (∆K) value. ∆K is based on the rate of change in the log probability of the data between successive K values. The highest ∆K peak (264.2) among the assumed K values occurred at K = 2, which divided the genotypes into two sub-populations (Fig. 3A; Fig. 3B; Fig. 5). The proportions of genotypes in the inferred clusters were 0.875 and 0.125 for subpopulation 1 and subpopulation 2, respectively. However, these two subpopulations did not correspond well with the studied physiological parameters. Hence, the next ∆K peak, at K = 6, was considered, dividing the population into 6 subpopulations. The proportions of genotypes in the inferred clusters were 0.179, 0.211, 0.258, 0.081, 0.181 and 0.091 for sub-populations 1 to 6, respectively. The fixation index (Fst) values were 0.278, 0.254, 0.201, 0.332, 0.206 and 0.507, and the average distances (expected heterozygosity) between individuals in the same cluster were 0.342, 0.348, 0.366, 0.390, 0.373 and 0.331, for sub-populations 1 to 6, respectively. Genotypes with an ancestry value of ≥80% for a subpopulation were assigned to that subpopulation (Table 3; Fig. 5).
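Evanno's ∆K divides the absolute second-order change of the log probability L(K) across successive K values by the standard deviation of L(K) over replicate runs. The sketch below uses a common simplification that takes the second difference of the mean L(K) at each K, and then applies the ≥80% ancestry rule from the text to one genotype's Q values. The replicate values are made up, and "admixed" is a hypothetical label for genotypes below the threshold.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style Delta K from replicate STRUCTURE runs.

    lnp_by_k: {K: [LnP(D) of each replicate run]}.
    Uses the mean LnP(D) per K for the second-order difference, divided by the
    standard deviation of LnP(D) across replicates at that K.
    Only K values with both neighbours (K-1, K+1) present are returned.
    """
    out = {}
    for k, runs in lnp_by_k.items():
        if k - 1 in lnp_by_k and k + 1 in lnp_by_k and len(runs) > 1:
            second_diff = abs(
                mean(lnp_by_k[k + 1]) - 2 * mean(runs) + mean(lnp_by_k[k - 1])
            )
            sd = stdev(runs)
            out[k] = second_diff / sd if sd > 0 else float("inf")
    return out


def assign(q_row, threshold=0.8):
    """Assign a genotype to the subpopulation with ancestry >= threshold.

    q_row: ancestry proportions (Q values) of one genotype, one per cluster.
    Returns a 1-based subpopulation number, or the hypothetical label "admixed".
    """
    best = max(range(len(q_row)), key=lambda i: q_row[i])
    return best + 1 if q_row[best] >= threshold else "admixed"


# Made-up LnP(D) values from two replicate runs per K
lnp = {
    1: [-5000.0, -5002.0],
    2: [-4000.0, -4002.0],
    3: [-3950.0, -3960.0],
    4: [-3940.0, -3950.0],
}
dk = delta_k(lnp)
print(max(dk, key=dk.get))                                   # 2
print(assign([0.85, 0.10, 0.05]), assign([0.50, 0.45, 0.05]))  # 1 admixed
```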
At K = 6, the structure subpopulations showed fairly good correspondence with the physiological parameters of the panel. Most of the germplasm lines with moderate to high seed vigour were present in subpopulations SP2 and SP6, while lines with poor vigour were in SP4 and SP5. The structure analysis at K = 6 also gave a low alpha value (alpha = 0.0591) for the panel. Positively skewed, leptokurtic distributions were observed for the mean alpha value, Fst3, Fst4 and Fst5, whereas mesokurtic distributions were detected for Fst1, Fst2 and Fst6, showing distinct variation in the distributions of the Fst values (Supplementary Fig. 2).
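Leptokurtic and mesokurtic refer to positive and near-zero excess kurtosis, and the sign of the skewness gives the direction of asymmetry. A minimal sketch of how a set of sampled alpha or Fst values could be described from its moments; the tolerance separating "near zero" from positive or negative is an arbitrary, hypothetical choice, not a value from the study.

```python
def describe_shape(xs, tol=0.5):
    """Label skewness and kurtosis of a sample from its moments.

    tol is a hypothetical cut-off for treating skewness or excess kurtosis as
    different from zero; it is not a value from the study.
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    if skew > tol:
        skew_label = "positively skewed"
    elif skew < -tol:
        skew_label = "negatively skewed"
    else:
        skew_label = "symmetric"
    if excess_kurtosis > tol:
        kurt_label = "leptokurtic"
    elif excess_kurtosis < -tol:
        kurt_label = "platykurtic"
    else:
        kurt_label = "mesokurtic"
    return f"{skew_label}, {kurt_label}"


print(describe_shape([0.0] * 8 + [1.0, 10.0]))   # positively skewed, leptokurtic
print(describe_shape([1.0, 2.0, 3.0, 4.0, 5.0]))  # symmetric, platykurtic
```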
Molecular variance (AMOVA) and LD decay plot analysis
Closely related plants in a population cluster into isolated groups and form subpopulations. Genetic variation among and within the sub-populations at K = 6 was partitioned by analysis of molecular variance (AMOVA) (Table 4). At K = 6, 12% of the variation was among populations, 67% among individuals and 21% within individuals of the panel. Deviation from Hardy-Weinberg expectations was assessed using Wright's F-statistics. The inbreeding coefficients of individuals relative to their subpopulation (FIS) and relative to the total population (FIT) were estimated to describe population differentiation. Based on 109 loci, FIT and FIS were 0.791 and 0.763, respectively, whereas FST among the subpopulations was 0.118. FST measures the differentiation of the subpopulations within the total population. The Fst values of the individual sub-populations and their distribution patterns showed clear differentiation among the six sub-populations (Supplementary Fig. 2).
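In a hierarchical AMOVA (populations, individuals within populations, within individuals), the F-statistics are ratios of the variance components and are linked by Wright's identity (1 − FIT) = (1 − FIS)(1 − FST). A short check using the rounded percentages from this paragraph gives values close to the reported FST, FIS and FIT. The exact estimator in the software used may differ slightly from these textbook ratios.

```python
def f_statistics(sigma_among_pops, sigma_among_ind, sigma_within_ind):
    """F-statistics from hierarchical AMOVA variance components.

    F_ST = among-population share of the total variance
    F_IS = among-individual share of the variance within populations
    F_IT = share of the total variance not found within individuals
    """
    total = sigma_among_pops + sigma_among_ind + sigma_within_ind
    f_st = sigma_among_pops / total
    f_is = sigma_among_ind / (sigma_among_ind + sigma_within_ind)
    f_it = (sigma_among_pops + sigma_among_ind) / total
    return f_st, f_is, f_it


def f_it_from(f_is, f_st):
    """Wright's identity: (1 - F_IT) = (1 - F_IS)(1 - F_ST)."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


# Rounded AMOVA percentages from the text (12%, 67%, 21%) used as components
f_st, f_is, f_it = f_statistics(12, 67, 21)
print(round(f_st, 3), round(f_is, 3), round(f_it, 3))  # 0.12 0.761 0.79
print(round(f_it_from(0.763, 0.118), 3))               # 0.791
```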
Linkage disequilibrium (LD), the nonrandom association of alleles at different loci, is the basis of marker-trait association studies. The rate of LD decay is an important factor in detecting marker-trait associations: it determines how reliably markers associated with the physiological parameters can be identified and helps in discovering new genes or allelic variants controlling these traits. The r² values of syntenic marker pairs were plotted against physical distance in megabase pairs (Mb) to show LD decay in the population (Fig. 6). Tightly linked markers have the highest r², and average r² decreases rapidly as the distance between markers increases. LD declined sharply within 1-2 Mb, after which it decayed slowly and gradually. Overall, clear LD decay was observed in the panel.
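For biallelic markers coded as 0/1/2 copies of one allele, a common approximation of r² for unphased data is the squared Pearson correlation between the genotype codes of two loci. An LD-decay curve is then the mean r² of syntenic marker pairs (pairs on the same chromosome) in successive distance bins. The sketch assumes biallelic coding, to which multi-allelic SSRs would first have to be converted (for example, one presence/absence variable per allele); it does not reproduce the estimator of any particular software.

```python
from collections import defaultdict


def r_squared(g1, g2):
    """Squared Pearson correlation between genotype codes (0/1/2) at two loci."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic locus is treated as uninformative (r^2 = 0) here
    return cov * cov / (v1 * v2)


def ld_decay(pairs, bin_mb=1.0):
    """Mean r^2 per distance bin.

    pairs: iterable of (distance_mb, r2) for syntenic marker pairs.
    Returns {bin start in Mb: mean r^2}, ordered by distance.
    """
    bins = defaultdict(list)
    for dist_mb, r2 in pairs:
        bins[(dist_mb // bin_mb) * bin_mb].append(r2)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}


# Made-up genotype codes for two loci and made-up (distance, r^2) pairs
print(r_squared([0, 0, 2, 2, 2], [0, 0, 2, 2, 0]))
print(ld_decay([(0.5, 0.8), (0.7, 0.6), (3.2, 0.1)]))
```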
Genetic relatedness among genotypes by principal coordinates and cluster analyses
A two-dimensional principal coordinate analysis (PCoA) plot based on the 109 markers grouped the genotypes according to their genetic relatedness (Fig. 7). Component 1 accounted for 11.04% and component 2 for 6.71% of the total inertia. The panel genotypes were scattered across the four quadrants and formed three major groups (Fig. 7). A total of 30, 46, 11 and 9 germplasm lines were distributed in the 1st, 2nd, 3rd and 4th quadrants, respectively. Genotypes belonging to the six sub-populations were grouped in different quadrants. The 1st-quadrant genotypes were divided into three groups, whereas the 2nd-quadrant genotypes were divided into two groups, one closer to axis 1 and the other closer to axis 2. The genotypes of the second group, closer to axis 2, were admixed types, depicted in black (Fig. 7).
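PCoA (classical multidimensional scaling) places genotypes in a low-dimensional space from a matrix of pairwise genetic distances; the share of inertia on each axis is that axis's eigenvalue divided by the sum of the positive eigenvalues. Below is a NumPy sketch of the classical (Gower) algorithm. The distance measure and the handling of negative eigenvalues are assumptions, and the software used in the study may compute or report the axes differently.

```python
import numpy as np


def pcoa(D):
    """Classical (Gower) principal coordinate analysis.

    D: symmetric n x n matrix of pairwise genetic distances.
    Returns (coordinates on axes with positive eigenvalues, % inertia per axis).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centring matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                 # drop zero and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    pct = 100.0 * eigvals[keep] / eigvals[keep].sum()
    return coords, pct


# Toy example: three genotypes whose distances fit exactly on one line
x = np.array([0.0, 1.0, 3.0])
D = np.abs(x[:, None] - x[None, :])
coords, pct = pcoa(D)
print(pct.round(2))
```

In real marker data the inertia is spread over many axes, which is why the first two components here explain only 11.04% and 6.71%.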
The majority of the germplasm lines with high to very high mean values of physiological traits were placed in the 1st (top right) and 2nd (bottom right) quadrants of the PCoA. The PCoA distributed the genotypes across the four quadrants, forming 7 clusters including the admixed subpopulation. The subpopulations clustered by PCoA are encircled in the figure and corresponded with the population structure (Fig. 7). Germplasm lines with high to very high mean values of physiological traits placed in quadrant II were ARS-AC-6221, MP-Joha, KE-Adira-2-Pallakad-R3, KE-PK-18-Ezhoml-2, KE-PK-14-Vachaw and KE-PK-24-Jyothi. The genotypes OD-Landi, OD-Balisaralaktimachi, OD-Kaniar, OD-Kanakchampa, ARS-AC-6023, ARS-AC-6172, MNP-AC-9030, KE-PK-19-Cheruvirippu, JBS-AC-20614, MNP-AC-9005, JBS-AC-20371 and JBS-AC-20423 were observed in quadrant I. Quadrant III consisted mostly of genotypes from subpopulation SP4, and the majority of the germplasm lines in quadrant IV were from SP6. Most of the admixed genotypes were found in quadrants I and II.
Six subgroups were observed in the dendrogram based on the mean values of the studied physiological parameters (Fig. 8A). A total of 15, 4, 11, 21, 19 and 26 genotypes were distributed in clusters I to VI, respectively. Cluster VI was the biggest, accommodating 26 germplasm lines, while cluster II was the smallest, with only 4 genotypes. The germplasm lines of sub-population 2 of the genetic structure were observed in cluster VI of the dendrogram. Similarly, four genotypes of structure subpopulation 1 were in cluster IV. Admixed genotypes identified by the structure analysis were found in all clusters of this phenotypic dendrogram except cluster IV.
Cluster analysis based on genotyping with the 109 SSR markers discriminated the germplasm lines and placed them into clusters that corresponded with the studied physiological parameters. The unweighted neighbour-joining tree differentiated the genotypes into six clusters (Fig. 8). Cluster SP1 was differentiated from SP3 by high estimates of root length and rate of plumule elongation, whereas seedling dry weight and seed vigour index II were high in SP3. SP2 and SP6 accommodated the majority of the germplasm lines with high values for the studied parameters, except for seedling dry weight in SP2 and germination % in SP6. SP4 was distinguished from the others by the absence of germplasm lines with high estimates for germination rate and seed vigour index II, while lines with high root length and seedling dry weight were absent from SP5.
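Neighbour-joining builds a tree by repeatedly joining the pair of nodes that minimises Q(a, b) = (n − 2)·d(a, b) − Σ_k d(a, k) − Σ_k d(b, k), then replacing them with a new node whose distance to each remaining node c is (d(a, c) + d(b, c) − d(a, b)) / 2. The sketch below is the standard (Saitou and Nei) algorithm returning a topology-only Newick string; the unweighted variant named in the text may use a different distance-update rule, and the input matrix (for example, a dissimilarity computed from the SSR allele data) is assumed.

```python
def neighbor_joining(dist, names):
    """Standard neighbour-joining; returns a topology-only Newick string.

    dist: symmetric matrix (list of lists) of pairwise distances.
    names: labels in the same order as the rows of dist.
    Ties are broken by taking the first pair found.
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other active nodes
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        new = f"({a},{b})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to each remaining node
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return f"({nodes[0]},{nodes[1]});"


# Made-up additive distances for four genotypes
D = [[0, 2, 6, 7], [2, 0, 6, 7], [6, 6, 0, 5], [7, 7, 5, 0]]
print(neighbor_joining(D, ["A", "B", "C", "D"]))  # ((A,B),(C,D));
```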
Association of marker alleles with physiological parameters in rice
Associations of the molecular markers with the 15 physiological parameters were computed using the Mixed Linear Model (MLM; K+Q model) and the General Linear Model (GLM) in TASSEL 5. Marker-trait associations were filtered at p < 0.01. Twelve parameters showed significant associations with markers under both models at p < 0.01. A total of 112 and 93 significant marker-trait associations were detected by GLM and MLM, respectively, at p < 0.01. The marker R² values computed by the GLM approach ranged from 0.565 to 22.7, while the range was 0.7009 to 0.175 by MLM (Supplementary Table 3). Significant marker-trait associations were detected by both the GLM and MLM models at p < 0.01 for GI with 5 markers; SVI-II, RSG and MGR with 3 markers each; GP, RGR and RP with 2 markers each; and RI, RL, SVI, AGR and RRG with 1 marker each. Applying thresholds of R² > 0.10 and p < 0.01, four markers showed six associations with four physiological parameters: GI with RM225 and RM502; GP with RM225 and RM502; SVI with RM5638; and RP with RM220 (Table 5; Supplementary Table 4). The Q-Q plot also confirmed the association of these markers with the associated physiological traits in rice (Fig…).
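In a single-marker general linear model the marker genotype is a categorical factor, and the marker R² is the share of phenotypic variance it explains (between-class sum of squares over total sum of squares). A pure-Python sketch with made-up data follows. The F statistic could be converted to a p-value with an F distribution (for example `scipy.stats.f.sf`); TASSEL's MLM additionally fits population structure (Q) and kinship (K), which this sketch does not attempt.

```python
from collections import defaultdict


def single_marker_glm(marker_calls, phenotype):
    """One-way ANOVA of a phenotype on marker genotype classes.

    Returns (R2, F, df_between, df_within).
    """
    groups = defaultdict(list)
    for call, y in zip(marker_calls, phenotype):
        groups[call].append(y)
    n, k = len(phenotype), len(groups)
    if k < 2:
        raise ValueError("marker is monomorphic in this sample")
    grand = sum(phenotype) / n
    ss_total = sum((y - grand) ** 2 for y in phenotype)
    ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
    ss_within = ss_total - ss_between
    df_b, df_w = k - 1, n - k
    f_stat = (ss_between / df_b) / (ss_within / df_w) if ss_within > 0 else float("inf")
    return ss_between / ss_total, f_stat, df_b, df_w


# Made-up marker calls and phenotype values for six genotypes
calls = ["A", "A", "A", "B", "B", "B"]
gi = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
r2, f_stat, df_b, df_w = single_marker_glm(calls, gi)
print(round(r2, 3), f_stat, df_b, df_w)  # 0.771 13.5 1 4
```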
Four markers showed significant associations with GI under both the GLM and MLM models at p < 0.01. Genomic regions controlling GI were detected on chromosomes 1, 8, 11 and 12, associated with markers RM5638, RM502, RM229 and RM20A, respectively. Among the four markers, RM502 showed the highest marker R² value, 0.227 by GLM and 0.175 by MLM. Three markers, RM5638, RM14723 and RM7003, located at 204, 86 and 132 cM on chromosomes 1, 3 and 12, respectively, were associated with SVI-II. RSG was associated with RM6547, RM3701 and RM7003 on chromosomes 1, 11 and 12, respectively. MGR was controlled by QTLs on chromosomes 1, 8 and 9, which showed associations with markers RM220, RM502 and RM201, respectively. QTLs for germination % showed significant associations with RM225 on chromosome 6 and RM502 on chromosome 8. RPE was located on chromosome 1, showing associations with RM220 and RM403. Relative growth rate was associated with RM468 and RM3701. Marker RM256 showed significant associations with RRG and AGR. In addition, RI, RL and SVI showed significant associations with RM248, RM229 and RM5638, respectively, under both models (Table 5). The Q-Q plot also confirmed the associations of these markers with the estimated physiological parameters in rice (Fig. 9).
Some markers were associated with more than one physiological parameter. Under both models at p < 0.01, a few markers showed significant associations with two parameters: RM220 with RP and AGR; RM225 with GI and GP; RM229 with GI and RL; RM256 with AGR and RRG; RM3701 with RSG and RGR; and RM7003 with SVI-II and RSG. Marker RM502 was significantly associated with three traits, namely GI, GP and MGR, and RM5638 with three parameters, namely GI, SVI and SVI-II, under both models at p < 0.01 (Table 5).
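Finding markers shared by several traits is a grouping step over the list of significant marker-trait pairs. A minimal sketch using a subset of the associations named in this section as input; the function name is illustrative.

```python
from collections import defaultdict


def shared_markers(associations, min_traits=2):
    """Return {marker: sorted traits} for markers linked to >= min_traits traits."""
    traits_by_marker = defaultdict(set)
    for marker, trait in associations:
        traits_by_marker[marker].add(trait)
    return {
        marker: sorted(traits)
        for marker, traits in traits_by_marker.items()
        if len(traits) >= min_traits
    }


# Subset of marker-trait pairs named in this section
pairs = [
    ("RM225", "GI"), ("RM225", "GP"),
    ("RM502", "GI"), ("RM502", "GP"), ("RM502", "MGR"),
    ("RM5638", "GI"), ("RM5638", "SVI"), ("RM5638", "SVI-II"),
    ("RM248", "RI"),
]
print(shared_markers(pairs))
print(shared_markers(pairs, min_traits=3))
```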