Uncovering Genomic Regions Regulating the Superoxide Dismutase, Flavonoids, Anthocyanins, Carotenoids, γ-Oryzanol and Antioxidant Activity in Rice Through Association Mapping


 Consumption of antioxidants rich rice has impressive health benefits. Five antioxidant traits viz., superoxide dismutase, flavonoids, anthocyanins, γ-oryzanol and ABTS were mapped in a panel population using 136 SSR markers through association mapping. A panel population containing 120 germplasm lines by including genotypes from all the phenotypic groups of the six antioxidant traits from the original shortlisted 270 rice genotypes. Donor lines rich in multiple antioxidant compounds were identified from the population. The population was classified into 4 genetic groups and showed fair degree of correspondence with the antioxidants content. A total of 14 significant marker-trait associations for antioxidants were detected of which 3 QTLs namely qANC3, qPAC12-2 for anthocyanin content and qAC12 for ABTS activity were validated in the population. Eleven putative QTLs such as qTAC1.1 and qTAC5.1 for anthocyanin content; qSOD1.1, qSOD5.1 and qSOD10.1 for SOD; qTFC6.1, qTFC11.1 and qTFC12.1 for TFC; qOZ8.1 and qOZ11.1 for γ-oryzanol and qAC11.1 for ABTS were detected to be novel loci. Co-localization of the QTLs detected for OZ11.1, TFC11.1 and AC11.1 regulating γ-oryzanol, flavonoid and anthocyanin content, respectively while PAC12.2 for anthocyanin content remained closer to TFC12.1 for flavonoid content. These QTLs will be useful in the antioxidant improvement programs in rice.

Enhancement of anthocyanin content has been attempted through genetic engineering of anthocyanin biosynthesis transgenes that has generated 'Purple Endosperm Rice' 44 . Attempts have been made to develop few pigmented rice varieties like'Rubi' and 'Onix' of Brazil 45 and 'Rice berry" of Thailand 46 . Also, some advanced breeding lines are available 33,36,47,48,49 .Though several studies reported about genes for these antioxidant traits of which some genes have been ne mapped in certain cases (Rc, Rd,Pb) and each adopted a different gene coding system which is still confusing for which more rm genes/loci should be identi ed to explain these complex traits 19 . There by development of trait-speci c markers will accelerate the efforts to breed high yielding pigmented rice varieties.
The genetic diversity and structure of the population will be helpful for detecting marker-trait association which could be useful for trait enhancement in molecular breeding programs. In order to avoid spurious association of marker-phenotype in a population, population structure (Q) with relative kinship (K) analyses is essential to check the correctness of the panel population composition for linkage disequilibrium (LD) mapping analyses 50,51,52 . Thus, association estimate based on both the models of Generalized linear model (GLM) and Mixed linear model (MLM) is considered appropriate for mapping complex traits that have shown to perform better than other model analyses.
Association mapping based on linkage disequilibrium has emerged as a powerful alternative strategy for identifying genes or quantitative trait loci (QTL) for various complex traits in plants in a natural variable population by examining the marker-trait associations. In the present study, we have identi ed the marker loci/QTLs underlying the naturally occurring variations of antioxidant traits and antioxidant activity by association mapping. We have used a panel population of 120 genotypes (67 white and 53 red grain) shortlisted from 270 diverse genotypes (121 white and 149 red grain) of India through phenotyping of six physio-biochemical traits (carotenoids, SOD, total anthocyanins, γ-oryzanols, TFC, and ABTS). Rice microsatellite markers were used to study the population genetic structure, diversity and association of molecular markers with these phytochemical traits to improve nutritional quality of rice.

Seed material
The study material includes 270 genotypes (landraces and cultivars) of 121 white and 149 colored rice grains. The initial population was shortlisted on the basis of maturity duration (upto 135 days) and kernel color (red, black, purple and white) from about 1000 germplasm lines. Seeds of these germplasm were collected from Gene bank, ICAR-NRRI, Cuttack and were grown in the experimental plot of the Institute during wet season, 2018. The genotypes were grown in randomized complete block design in three rows each following spacing of 20 x 15 cm in two replications following recommended package and practices. Each replication is divided into 5 blocks accommodating 54 germplasm lines in each block. Panicles of middle row plants of each replication were harvested, sun dried for 4-5 days to reduce the moisture content to 11-12 %, stored for three months to remove dormancy and then used for estimation of superoxide dismutase, avonoids, anthocyanins, carotenoids, γ-oryzanol and antioxidant activity. A representative panel population containing 120 germplasm lines was prepared from the original 270 germplasm lines (120 genotypes consisting of 67 white and 53 red grain rice). The panel population was raised during wet seasons, 2019 and 2020 and the antioxidants were estimated. The panel population (120) is used for genotyping for association mapping of antioxidant traits ( Table 1). The plant experiments were performed in accordance with relevant guidelines and regulations.

Phenotyping for antioxidant traits
The seed samples were dehulled by the Satake rice huller, Japan and were ground into our by a grinding machine (Glenmini grinder) and sieved through 100 mesh size and then stored at 4 0 C for the experiment. Analysis of all the traits were based on dry matter basis except for carotenoid content which were estimated based on fresh weight basis. Leaf samples from 10 days old seedling grown on petriplate at 30 0 C were used for estimation of carotenoids (mg g -1 ) by following the protocol of Davis 68 . Seed enzymatic antioxidant, super oxide dismutase (SOD: unit g -1 ) was estimated as per the procedure of Madamanchi et al. 69 . Non-enzymatic antioxidant, total anthocyanin content (TAC: mg 100 g -1 ) was estimated by the procedure of Fuleki and Francis 70 . Estimation of γ-Oryzanol (GO:mg 100 g -1 ) was done according to Bucci et al. 71 with minor modi cations. Total avonoids content (TFC) was estimated as per the procedure of Eberhardt et al. 72 and expressed as catechine equivalent (mg CEt 100 g -1 ). Antioxidant activity, ABTS (2,2'-azino-bis (3-ethylbenzothiazoline-6-sulfonic acid) radical scavenging was assayed by modi ed protocol of Serpen et al. 73 and expressed as % inhibition.
Statistical analysis Cropstat software7.0. was used for analysis of variance (ANOVA) for each trait including the estimation of mean, range, and coe cient of variation (CV %).
Pearson's correlation coe cients were analyzed to nd out the relationship among the various antioxidant traits, based on the mean values of the 120 genotypes and presented in correlation matrix heatmap by using PAST software. The germplasm lines were classi ed into six groups as very high, high, medium, low and very low category based on mean values antioxidant traits.
Genomic DNA isolation, PCR analysis and selection of SSR markers The genomic DNA was isolated from 15 days-old seedlings of the germplasm lines by adopting CTAB method 74 (Murray and Thompson 1980). The 136 SSR (simple sequence repeat) markers were taken from the database (http://gramene.org/) available in the public domain (Supplementary Table 5). The DNA fragments were resolved in gel electrophoresis for quanti cation of isolated DNA. PCR analysis was done using the markers selected based on positions covering all the chromosomes to illustrate the diversity and to identify the polymorphic loci among the 120 rice germplasm lines ( Table 1). Conditions of reaction was set to initial denaturation step (2 min, 95 0 C), followed by 35 cycles of denaturation (30 s, 95 0 C) and annealing/extension (30 min, 55 0 C), extension (2 min, 72 0 C), nal extension (5 min, 72 0 C) and store at 4 0 C (in nity). The PCR products were electrophoresed using 3 % agarose gel containing 0.80 g ml -1 ethidium bromide and. 50 bp DNA ladder was used to determine the size of amplicons. The gel was run for 4 h at 2.5 V cm -1 and photographed using a Gel Documentation System (SynGene). Earlier publications of 75,76,77 were followed for DNA isolation, electrophoresis and imaging techniques.

Molecular data analysis
Presence or absence of ampli ed products obtained on the basis genotype-primer combination was used to score the data. A binary data matrix was used as discrete variables for the entry of our result data. The parameters namely polymorphic information content (PIC), observed heterozygosity (H), number of alleles (N), major allele frequency (A) and gene diversity (GD) for each SSR locus 78 was analysed by using, 'Power Marker Ver3.25' software. A Bayesian model based clustering approach STRUCTURE 2.3.6 software was used to analysis genetic data and obtain population structure 79 . STRUCTURE software was run with K varying from 1 to 10, with 10 iterations for each K value to derive the ideal number of groups. A high throughput parameter set of burn-in period of the 150,000 followed by 150,000 Markov Chain Monte Carlo (MCMC) replications was adapted during the running period. Highest value of ΔK was pick up from Evanno table used to detect the subpopulation groups from the panel of populations in the next step. The maximal value of L (K) was identi ed using the exact number of sub-populations. The model choice criterion to detect the most probable value of K was ΔK, an ad-hoc quantity related to the second-order change of the log probability of data with respect to the number of clusters inferred by STRUCTURE 80 . For estimation of the ΔK-value as function of K showing a clear peak as the optimal K-value Structure Harvester was used. The principal coordinate analysis of all the genotypes and unweighted neighbor joining unrooted tree for NEI coe cient dissimilarity index 82 with bootstrap value of 1,000 were obtained by using DARwin5 software 83 . Analysis of molecular variance (AMOVA) using GenAlEx 6.5 software was used to estimate the presence of molecular variance across the whole population, within a population and between the sub-population structures (FIT, FIS, FST) calculated by the deviation from Hardy-Weinberg expectation. The procedures followed in earlier publications were adopted for molecular data analysis 51,67,84 .
To analyze the marker-trait association for mapping study of the seed antioxidant traits in rice, the software "TASSEL 5.0" was used. General linear model and Mixed linear model in TASSEL 5.0 were used to perform the genetic association between the phenotypic traits and molecular markers 85 . By considering the signi cant p-value and r 2 value convincing associated markers were identi ed. The associations of markers were further con rmed by the Q-Q plot generated by the software. Linkage disequilibrium plot was obtained using LD measured r 2 , between pair of markers is plotted against the distance between the pair. Also, the accuracy of the marker-trait association by estimating the FDR adjusted p-values (q-values) using R software as described in the earlier publications 6,61,67 .

Results
Phenotyping of the population for antioxidant content in rice A total of six antioxidant compounds including one antioxidant enzyme namely superoxide dismutase, avonoids, anthocyanins, carotenoids, γ-oryzanol and ABTS were estimated from 270 genotypes during wet seasons of 2019 (Supplementary Table 1). Wide variation exists among the germplasm lines for 6 antioxidant compounds. The germplasm lines were classi ed into 5 groups based on the phenotying results of each compounds (Fig. 1). The frequency distribution of germplasm lines showed various groups or populations for each compound and enzyme (Fig. 1). A panel population was preparedby selecting120 genotypes representing each group and trait from the original population of 270 germplasm lines (Table 1; Fig. 2). The mean estimates of 6 antioxidants obtained from the representative panel population showed wide variation among the genotypes for each trait ( Relatedness among germplasm lines for antioxidant compounds through Genotype-by-trait biplot analysis The scatter diagram was plotted taking the rst two principal components to generate genotype-by-trait biplot graph for the 6 antioxidants estimated from the 120 genotypes present in the panel (Fig. 3). The rst and second principal components showed 68.3 and 19.8 % of the total variability with eigen values of 8064 and 2342, respectively (Supplementary Fig. 1). γ-oryzanol contributed maximum diversity among the 6 antioxidant compounds in the genotypes present in the panel followed by TFC and ABTS (Fig. 3). The scattering pattern of genotypes in the 4 quadrants indicated that genotypes containing high estimates of

Nature of association among antioxidant traits
The association among 6 antioxidant traits revealed a strong positive correlation (r≥0.7) of TAC with TFC and TFC with ABTS. Moderate positive correlation (r: 0.5-0.7) of TAC with ABTS and weak positive correlation (r < 0.5) was observed for carotenoid with GO (Fig. 4). These antioxidant traits positively or negatively correlated may be controlled by the closely linked genes or because they might be structurally related. Therefore, a variety that accumulates high concentrations of one antioxidant may contain higher quantity of other correlated antioxidants.
Genetic diversity parameters analysis The panel population containing 120 germplasm lines which exhibited wide variation for the antioxidants was genotyped using 136 SSR markers. The genetic diversity parameters estimated from the panel population are presented in the Table 2. Genotyping results showed a total of 508 markers alleles from the population exhibiting mean alleles of 3.74 per locus. The number of alleles per locus ranged from 2 to 7 per marker. The highest numbers of alleles were produced by the marker RM493 in the studied panel for antioxidants content in rice. The measure for the variation by a marker in the population was analysed by the of availability major allele frequency parameter. The average major allele frequency linked to the polymorphic markers was computed to be 0.561 which showed a range of 0.279 (RM8044) and 0.925 (RM6054) ( The diverse population for antioxidants was genotyped for genetic structure and analyzed by adopting probable sub-populations (K) and selecting higher delta K-value by applying the STRUCTURE 2.3.6 software. The rate of change in the log probability of data between successive K values is the delta K value used in the analysis. The panel population was categorized into two sub-populations with a high ∆K peak value of 264.2 at K = 2 among the assumed K (Supplementary Table 2; Supplementary Fig. 2). The two subpopulations were in the proportion of 0.277 and 0.723 for population 1 and population 2, respectively. But, the subpopulations showed poor correspondence with antioxidants containing genotypes present in the studied population. Therefore, next ∆K peak at K=3 was compared in which the population was classi ed into 3 subpopulations.  Table 2; Supplementary Fig. 3). The ancestry value of ≥80% obtained in a genotype grouped the genotype into the particular subpopulation.
The assumed subpopulations at K=3 differentiated the germplasm lines for antioxidant compounds but did not clearly separate for the SP2 and SP3 subpopulations. Hence, next ∆K peak at K=4 was considered for the subpopulations in which the population was classi ed into 4 genetic groups. The The association of alleles by different loci in a nonrandom manner is utilized in the marker-trait association analysis. Existence of marker-trait association is dependent on the LD decay rate in a population over a time period. The LD decay rate will indicate the possibility of new genes or allelic variants controlling the antioxidant compounds associated with molecular markers for these traits. Syntenic r 2 value was used to plot the linkage disequilibrium decay of the population versus the physical distance in million base pair (Fig. 6A). Tightly linked markers had higher r 2 value and the average r 2 values rapidly decreases for increase in linkage distance. In the LD plot it is observed that the LD decay in the beginning was delayed in the studied panel populations. However, a decline of LD decay was noticed in the curve for the associated markers at about 1-2 mega base pair and thereafter a gradual and very slow decay was noticed from the graph. The graph clearly indicated the continuance of linkage disequilibrium decay in the population for the studied antioxidant compounds in the rice population. The limitation for LD decay depends on non-random mating, mutation, selection, migration or admixture, and genetic drift will in uence the estimates of LD. This LD decay plot also provided clue for creation of genetic admixture groups for various antioxidants compounds in the normal population. A similar trend was also noticed in the marker 'P' versus marker 'F' and marker R 2 (Fig. 6B) curve. The detected markers from this study indicated the strength of the markers for the studied antioxidant compounds.
Principal coordinates and cluster analyses for genetic relatedness among the germplasm lines The two dimensional plot for principal coordinate analysis (PCoA) was constructed based on the genotyping results of 136 SSR markers which classi ed the 120 germplasm lines as per the genetic relatedness among the lines (Fig. 7). The inertia showed by component 1 was 11.73 % while 7.49 % exhibited by the component 2. The germplasm lines were allotted different spots in the four quadrants forming 3 major groups (Fig. 7). The biggest group accommodated all the germplasm lines of the subpopulation 2 and 3 together and clustered in the 2 nd (bottom right) quadrant. The genotypes in the 1 st quadrant are divided into 2 groups of which one group on the top of the 1 st quadrant forms the sp3 subpopulation with low to very low in antioxidant content in the seeds. The other group near to the axis1 is all admix type of germplasm lines. Few germplasm lines of quadrant 2 and closer to the axis 1 are also admix genotypes. The admix genotypes present on both sides of axis 1 are depicted in black colour (Fig. 7).
The germplasm lines containing high to very high mean values of antioxidants are grouped together forming the subpopulation 4. This subpopulation is present on the quadrant 3 (top left) and 4 (bottom left) and encircled in red color. The germplasm lines rich in antioxidants are placed both side of the axis 1 on the quadrant 3 and 4 (Fig. 7). The PCoA distributed all the germplasm lines into the four quadrants classifying into 4 clusters and a separate admixture group. The subpopulations clustered by PCoA showed correspondence with the population structure ( Fig. 7 The Wards's clustering broadly grouped all the genotypes into two major groups. The largest cluster, cluster 1 accommodated 111 germplasm lines in which all poor to average containing antioxidants were present. The cluster II had 9 germplasm lines only. The dendrogram placed all the germplasm lines in this cluster II which were rich in antioxidant content. This cluster again subdivided into 2 sub groups which further divided into 6 sub-sub clusters. The cluster I was divided into two main sub clusters which nally divided into 32 small groups. All the clusters and small groups accommodated in the Ward's clustering approach were based on the antioxidant compounds content in the germplasm lines (Fig. 8A).
The cluster analysis discriminated the germplasm lines on the basis of genotyping of 136 SSR markers and placed the genotypes into different clusters which corresponded with the studied antioxidant traits. The unweighted-neighbour joining tree differentiated the genotypes into 4 different clusters (Fig. 8B ). The Cluster for Subpopulation 4 was differentiated from SP 2 by the presence of germplasm lines containing high antioxidants in it while moderate to high containing genotypes were in the subpopulation 2. The green coloured portion of the tree is designated as SP4 while blue for SP2. The very poor antioxidants containing germplasm lines were in the subpopulation 3 depicted in red colour in the tree. Majority of the germplasm present in the subpopulation 1 were poor to medium in antioxidant content and shown in pink colour. The germplasm lines with admix type of population are depicted in black color in the neighbour joining tree neighbour joining tree neighbour joining tree (Fig. 8B).

Marker-trait association for antioxidant compounds in rice
Marker-trait associations was computed for 6 antioxidant compounds by using Generalized Linear Model (GLM) and Mixed Linear Model (MLM/ K+Q model)) in the TASSEL 5 software. The marker-trait association values were compared at less than 1% error i.e. 99% con dence (p<0.01). A total of 57 and 23 signi cant marker-trait associations were detected for 5 antioxidant compounds by GLM and MLM, respectively at p<0.01. The range for marker R 2 values was from 0.0477 to 0.159 by GLM while 0.0607 to 0.1169 detected by Mixed Linear Model (Supplementary Table 3; Supplementary Table 4). A total of 15 signi cant marker-trait associations were detected by both the models for 5 antioxidant compounds present in the seed at p<0.01 (Fig. 9A). Signi cant association of 5 SSR markers with TAC; 3 with SOD, TFC; 2 with GO, and ABTS were detected. Five antioxidant compounds present in the studied germplasm lines showed higher marker R 2 (>0.1) with low p (<0.01) values in the associations study includes SOD with RM405 and GO with RM3701 (Table 5; Fig. 9A). The Q-Q plot also con rmed the association of these markers with the associated antioxidant traits in rice (Fig. 9B).  Fig. 9A). The Q-Q plot also con rmed the associations of these markers with the estimated antioxidant compounds in rice (Fig. 9).
Association mapping study for antioxidant compounds in rice seeds identi ed co localization of QTLs controlling antioxidant compounds in rice. It is observed that same marker showed signi cant associations with different antioxidant compounds in rice by both the models (Table…). Signi cant associations of marker RM3701 with antioxidant compounds GO, TFC, and ABTS present in the germplasm lines were detected. In addition, it was also detected association of RM235 with compounds TAC, TFC, and ABTS by both the models at <1% error and p<0.01 (Table 5). While considering marker association analyzing by GLM, the marker RM494 showed association with both carotenoids and TFC. In addition, RM494 was associated with both SOD and TFC content analyzed by MLM.

Discussion
The genotypes shortlisted for antioxidants mapping showed variation from each other with respect to 6 antioxidant compounds (Supplementary Table 1;  Table 1). Also, few compounds showed higher correlation values among themselves. This provides enough clues for improvement of the antioxidant compounds through the correlated compounds based on the coe cients of the correlations and the availability of higher genetic variations for the antioxidants in the population (Table 1; Table 4). Earlier reports of high variations for antioxidants were also published by few researchers 17,43,53,54,55 . The available diversity in the population based on 136 markers data and 6 antioxidants showed clear cut groups in the studied population (Table 2). A moderate to high PIC value coupled with better informative markers in the studied population will be useful for improvement of antioxidants breeding program. The Jayapur tract of Odisha, known for secondary centre of origin of rice, germplasms from this tract has also been included in this study. Also, the shortlisted germplasm lines used as materials in this study were collected from the states known for their rich rice genetic diversity 43 .
The genotypes rich in multiple antioxidant compounds were identi ed in Ac. 44646, Ac. 44594 and Ac. 44595; 7 in Ac. 43660, and Ac. 20282 and 6 in Ac. 43738, Ac. 43660, and Ac. 44592. These germplasm lines will be good source materials in the antioxidant improvement programs (Table 1; Supplementary   Table 2). Therefore, it is expected that breeding program with inclusion of parental lines from this population will be effective for antioxidants improvement. Five antioxidant compounds were found to be associated with 12 SSR markers analyzed by both GLM and MLM approaches ( Table 5). The markers association detected by both the models at p<0.01 and low 'p'value are considered to be very robust and useful markers for improvement program. The strongly associated SSR markers namely RM440, RM235, RM5638, RM253 and RM5626 for TAC; RM582 and RM467 for SOD; RM 3701, RM235 and RM494 for TFC; RM3701 and RM235 for ABTS; RM3701 and RM502 for GO will be useful markers for selection of antioxidant compounds ( Table 5). The Q-Q plot also con rmed the associations of these markers with the antioxidant compounds in rice (Fig. 9B).
The QTLs for anthocyanin and proanthocyanin content in rice was reported by earlier researchers 23,35,39,62  Two markers were observed to be associated with more than one antioxidant traits analyzed by both the models at <1% error and p<0.01. Marker RM3701 showed associations with GO, TFC, and ABTS present in the germplasm lines. Also, RM235 was associated with traits, TAC, TFC, and ABTS by both the models (Table 5). These observations indicated the close location of the candidate genes and simultaneous inheritance of these QTLs. Hence, simultaneous improvement of both these antioxidant traits will be effective. In recent publications also suggested easy improvement of the co-localized genes controlling various traits in rice 6,51,61,67 . Results of the present investigation showed that association mapping is an effective method to find more potential loci for antioxidant compounds and antioxidant capacity in rice. The detected loci will further be fine mapped for application in maker-assisted breeding for improvement of antioxidants in rice. These 3 QTLs are useful in the marker-assisted breeding programs. Eleven putative QTLs such as qTAC1.1 and qTAC5.1 for anthocyanin content; qSOD1.1, qSOD5.1 and qSOD10.1 for SOD; qTFC6.1, qTFC11.1 and qTFC12.1 for TFC; qOZ8.1 and qOZ11.1 for γ-oryzanol and qAC11.1 for ABTS were detected to be novel loci. Co-localization of the QTLs detected for OZ11.1, TFC11.1 and AC11.1 regulating γ-oryzanol, avonoid and anthocyanin content, respectively while PAC12.2 for anthocyanin content remained closer to TFC12.1 for avonoid content. These strongly associated QTLs will be useful in the antioxidants improvement programs in rice.

Declarations
Ethics approval and consent to participate: The authors declare that this study complies with the current laws of the country in which the experiments were performed.

Consent for publication: Not applicable
Availability of data and material: The data generated or analyzed in this study are included in this article.