Genome-wide Association Studies Reveal the Genetic Basis of Natural Ionomic Variation in Rice Grain Oryza Sativa Subsp. Indica

This study presents a comprehensive study of the genetic bases controlling variation in the rice ionome employing genome-wide association studies (GWAS) with a diverse panel of indica accessions, each genotyped with 5.2 million markers. GWAS was performed for twelve elements including B, Ca, Co, Cu, Fe, K, Mg, Mn, Mo, Na, P, and Zn and four agronomic traits including days to 50% owering, grain yield, plant height and thousand grain weight (TGW). GWAS identied 128 loci associated with the grain elements and 57 associated with the agronomic traits. There were sixteen co-localization regions containing QTL for two or more traits. Fourteen grain element quantitative trait loci were stable across growing environments, which can be strong candidates to be used in marker-assisted selection to improve the concentrations of nutritive elements in rice grain. Potential candidate genes were revealed including OsNAS3 controlling the variation of Zn and Co concentrations. The effects of starch synthesis and grain lling on TGW and multiple grain elements were elucidated through the involvement of OsSUS1 and OsGSSB1 genes. Overall, our study provides crucial insights into the genetic basis of ionomic variations in rice and will facilitate improvement in breeding for trace mineral content.


Introduction
Rice (Oryza sativa L.) is a major staple food for over half of the world's population and is a major source of nutrition, although in the form that most consumers eat (white, polished rice), it contains only small amounts of micronutrients 1 . A reliance on rice in the diet, coupled with limited diversity of nutrient-rich foods can lead to malnutrition 2 , with an estimated two billion people suffering from Fe de ciency 3 and 1.5 billion from Zn de ciency 4 . Fe and Zn are responsible for 2.4% and 1.9%, respectively, of the total global burden of disease 5 .
To combat these de ciencies, various interventions have been used by the nutrition and public health community including supplementation, forti cation and in more recent years, bioforti cation 1 . Both supplementation and forti cation can be expensive as they require suitable infrastructure and networks to deliver the nutrient rich product. Bioforti cation relies on the delivery of Fe-and Zn-dense crops via various strategies, including plant breeding and fertilizer applications 6-8 and is considered a longer-term sustainable approach where farmers can keep back nutrient-dense seed for subsequent plantings.
Worldwide, the bioforti cation strategy has led to 300 bioforti ed varieties being approved for release, in over 40 developing countries 9 and this includes Zn bioforti ed rice with moderate levels of Zn, indicating the need for further improvement. The development of Fe-dense rice has not been a target for conventional plant breeding due to insu cient variation for Fe within germplasm, which would not allow for su cient genetic improvement to have a signi cant biological effect in humans. The strategy for breeding Fe-dense rice is through a transgenic route, with the overexpression of nicotianamine synthase genes showing signi cant promise in delivering Fe to de cient communities 9,10 .
Conventional breeding for Zn-dense rice is challenging. While su cient variation for Zn exists in the germplasm and this allows for a breeding strategy to be undertaken, there is moderate heritability 11 and therefore stability of the Zn-dense trait across environments is a major challenge. Understanding of the various environmental factors and genes impacting on the scavenging of Zn from the root rhizosphere and the short/long distance transport routes to the developing caryopsis has come a long way 12-14 but there remain major gaps in our understanding and this confounds selection of stable parents. The introduction of stable genetic markers would be advantageous to accelerate development of Zn-dense rice.
GWAS is a powerful tool to study the molecular basis for phenotypic diversity in rice. Compared with conventional biparental population linkage mapping, GWAS has two outstanding advantages: (i) the rice varieties/accessions used in GWAS panels often contain much more genetic diversity and (ii) GWAS can take full advantage of numerous ancient recombination events resulting in higher mapping resolution 15 .
Over the last decades, studies using GWAS platforms have successfully dissected the genetic bases of several complex traits in major crops, such as owering time and yield-related traits [16][17][18] . There have also been several studies investigating the genetics controlling the element accumulation in rice grain resulting in the identi cation of signi cantly associated loci and putative casual genes 11,19−21 . Examples of the identi ed genes include the OsHMA3 transporter gene controlling the translocation of Cd from the roots to the shoots 22 and the molybdate transporter OsMOT1 gene controlling molybdenum concentration in both straw and grain 23 .
Many factors can affect the e cacy of GWAS such as population structure, sample size and marker density. The rice diversity panel used in this study consisted of 233 Oryza sativa subsp. indica genotypes. This panel was developed at the International Rice Research Institute (IRRI), Philippines for the Phenomics of Rice Adaptation and Yield potential (PRAY) project as a part of the Global Rice Phenotyping Network (http://ricephenonetwork.IR.org/diversity-panels/pray-diversity-panel). The panel represented the diversity within the indica sub-species covering improved and traditional varieties across subtropical and tropical regions around the world 24 . Previously, this panel was used in GWAS studies for various traits including grain quality, panicle architecture, root plasticity, grain yield and yield-related traits [24][25][26][27][28] . This panel was initially genotyped with 700,000 SNPs 29 and in the latest restatement, 5.2 million biallelic SNPs have been imputed on this panel by comparing the 700,000 SNPs with wholegenome sequence data of the 3000 sequenced rice cultivars 30 . The imputed high-density SNP set aimed to increase the signal strength of the marker-trait associations (MTA) and improve the mapping resolution.
In this study, we investigated the performance of twelve elements in brown rice grain of the diversity panel grown in four environments. GWAS were carried out to identify signi cant association loci that were stably expressed in the multiple environments. Subsequently, multiple potential causal candidate genes were identi ed and a genetic mechanism underlying the correlations among different trace minerals were proposed. Favourable alleles and candidate genes for improved micronutrient nutrition, especially for zinc, were identi ed that could be used in rice bioforti cation programs.

Genotypic markers and population structure
The number of independent markers in the indica accessions was estimated to be 6,591. Using this gure in a Bonferroni correction gave a corrected signi cance threshold of p < 7.59 x 10 − 6 (= 0.05/6591) or alog 10 (p) > 5.12 for use in declaration of signi cant MTA.
Population structure within the panel was examined using principal component analysis. The rst PC (PC1) was su cient to discriminate indica from aus and japonica accessions (Fig. 1). PC2 and PC3 further separated indica accessions into three distinct groups. Wang et al. 30 30 . Non-indica accessions were removed from this study. The rst 2 PCs were used as covariates for association analyses due to their representation of real indica sub-populations.

Phenotypic variation and trait heritability
The concentrations of 12 element (B, Ca, Co, Cu, Fe, K, Mg, Mn, Mo, Na, P and Zn) of the brown rice grain from the panel and their broad-sense heritability values are presented in Table 1. The scale of ionomic variation depends on both the element and the growing environment (Table 1, Fig. 2). Amongst the 12 elements, the lowest variation in concentrations were found with the three macronutrients: K, Mg and P (coe cient variation (CV) ranging from 6-10%), followed by Ca, Fe, Mn and Zn (CV ranging from 12-19%). The largest variations were found with the ve elements: B, Co, Cu, Mo and Na (CV ranging from 19-51%). The agronomic traits including days to 50% owering (DF), grain yield (GY), plant height (PH) and TGW were also included in Table 1. Amongst these traits, GY (CV ranging from 25-50%) varied considerably more than the other three traits: PH, DF and TGW (CV ranging from 12-22%). were observed for the nine grain elements (Ca, Co, Fe, K, Mg, Mn, Mo, P and Zn) and the two agronomic traits (DF and PH). The three grain elements (B, Cu and Na) and GY were estimated to have low to moderate heritabilities (26-55%).

Correlations between traits and environments
Signi cant correlations between all four environments were observed for 10 elements (all except for B and Cu) and PH (P < 0.01, r: 0.25-0.93) (Supp.  Signi cant correlations were also observed between element concentrations and the other traits within each environment (Supp . Table S3). Speci cally, GY was negatively correlated with the grain Zn, Fe, K, Mg and P concentrations in all environments, with the Cu, Mn, Mo and Na concentrations in three and with B, Ca and Co in two environments. PH had positive correlations with the Ca, Co, K, Mg, Na, P and Zn concentrations in two or more environments and negative correlations to Cu and Mo concentrations in three environments. DF had negative correlations six elements: namely Ca, Co, Cu, Fe, Mg, P in two or more environments.
The correlations between developmental and agronomic traits differed markedly between the growing environments. GY was positively correlated to DF and PH in two seasons (PR12 & PR13) and negative correlated with PH in one (IR13). PH and DF had only one signi cant correlation in the wet growing season IR12 (r:0.58).
Detection of stable and environment-speci c QTL GWAS was performed separately for each environment and identi ed 128 QTL for grain element concentrations and 57 QTL for agronomic traits (Supp . Tables S3 and S4).

QTL associated with grain elements
The QTL identi ed for grain elements were distributed on all chromosomes (Supp. Table S3). The highest number of QTL was detected for Mo (22 QTL), followed by B, Co, Fe, K, Mn, Na and Zn (10-19 QTL, each) and Ca, Cu, Mg and P (3-5 QTL, each). Environment-wise, the highest numbers of QTL were detected for the two dry growing seasons IR13 and PR13 (45 and 43 QTL, respectively), followed by PR12 (35 QTL) and IR12 (25 QTL).
Of the grain element QTL, 14 were consistently identi ed in two or more environments (Table 2). There was one QTL that was common in all four environments; namely qZn7.2, three QTL stable in three environments (qCo7.1, qK6.1 and qZn7.2) and ten QTL common in two environments (qB7. Haplotypes for the four environment Zn QTL (qZn7.2) were further examined (Fig. 3). Ten haplotypes were identi ed in the panel (Fig. 3). The H40 haplotype was associated with high Zn (30.5 mg.kg − 1 ).
Co-localisation of QTL Co-localization among QTL for different traits were detected chromosomes 1, 2, 3, 6, 7, 8, 9, 10, 11 and 12 ( Table 3). As expected, some highly correlated traits showed QTL that were co-located or in close proximity (within 100kb). For examples, Zn had one common QTL with Cu on chr 1 and one with Mg on chr 3. Na and K shared a common QTL on chr 2, P and K shared one on chr 1. All agronomic traits had co-located QTL with those of the grain elements (Table 3). DF shared a common QTL with K and Mn (chr 2). GY had co-located QTL with Mo (chr 10). TGW shared a common QTL with B, Ca, Co and Mo (chr 3). The QTL for PH were co-localized with those for Co, Cu, K, Mg, Na, P and Zn on chr 1 and 8.
The two genomic regions that harboured the highest number of QTL were on chrs 3 and 8 (

Candidate genes
The physical positions of trait-associated markers were used to locate genes that were either linked to or in close proximity (within 300 kb) of the most signi cant SNPs. Supp. Table S6 lists the potential candidate genes linked to the stable QTL (over three or more environments) and the major QTL clusters (on chrs 3 and 8). There were four main groups: (i) genes involved in metal transportation processes such as Zinc transporter (ZIP2), High-a nity potassium transporters (HKT1, HKT9), Nicotianamine synthase 3 (NAS3), Heavy-metal transport/detoxi cation, heavy-metal ATPase; (ii) genes controlling grain development such as Sucrose synthase (SUS1), Granule-bound starch synthase 1 (GBSS1), grain size (GS3); (iii) genes controlling plant phenology such as Flowering-time locus (FLT7), Heading response regulator, senescence protein, No apical meristem protein (NAC factor) and (iv) genes involved the synthesis or processing of storage proteins such as Serine-type carboxypeptidase family.

Discussion
The control of macro-and micro-nutrient homeostasis in plants have been extensively studied, however the loci that control natural ionomic variations in the grain are still largely undetermined in rice 31 , 32,33 .
GWAS has been a powerful to dissect complex traits in plants 31 , however there are several factors that can limit the success of using GWAS to study the rice ionomes. These limiting factors include the relatively little variation in plant elemental concentrations and the often-substantial environmental effect 34,35 . In plant the the transport and homeostasis of essential mineral nutrients are highly regulated processes as they require adequate levels of these essential nutrients for their growth and reproduction, while at the same time excess accumulation can also be detrimental to cell growth 36 .
The rice panel used in this study represents an excellent resource for genetic diversity covering a wide geographical and ecological variation in rice germplasm 25,26,30 . This diversity promising a large number of haplotypes is advantageous, however the effects of population structure need to be accounted for, in this case through a mixed model approach. The high density of SNPs (∼17 SNPs per kb on average) in our GWAS panel also facilitates high-resolution mapping with the loci were generally obtained within ∼100 kb, much higher than those obtained using linkage mapping approach 37 . This high resolution makes it possible to identify plausible candidate genes for a number of loci identi ed by GWAS using the information about the functional gene annotation or their orthologous genes in other plant species 38,39 .
The group of elements that showed relatively low levels of variation for the grain element concentrations consisted of four essential macronutrients (K, Mg, P, Ca) and four essential micronutrients (Cu, Fe, Mn and Zn) (Table 1, FigS1). These results indicate that the homeostasis of these elements is under relatively tight regulation. Previous research has shown that plants have evolved regulatory mechanisms to control the internal uctuation of the essential nutrients to maintain their concentrations within narrow ranges for optimal growth, development and seed production 40,41 . Signi cantly larger variations were found for the concentration of the second group consisting of the three essential micronutrients B, Co and Mo and Na (Na is a functional but nonessential element) 42 . It is likely that the elements in the second group were under less pressure to regulate their concentrations (unless they approach toxicity levels), thus having more relaxed control mechanisms. The differences in these control mechanisms exist not only among genotypes, they can also vary temporally and spatially within a given plant. Because this regulatory variability exists, it would appear that enhancing the micronutrient density of edible plant components through the manipulation of physiological processes is an achievable goal. The high heritability values of the nine grain elements also indicate that the contribution of genotypic variance to the total phenotypic variance was signi cant for these traits. Similar results were reported in previous studies 11,43,44 . Thus, direct selection for these elements may be a practical approach for trait improvement.
All twelve elements concentrations in the grain were negatively correlated with grain yield in at least two environments (Supp . Table S2). Six elements including Fe, K, Mg, Na, P and Zn had negative correlations with grain yield in all four environments and the highest correlation coe cients were found with Fe, Mg and P (r: -0.39 to -0.52). The negative correlations between grain yield and grain element concentrations are not uncommon in rice and have been reported in past studies for K, Mg, Mn, P, and Zn 11,20 . This likely re ects the dilution effects of increasing grain mass on the elemental concentrations. Minimizing the effect of grain yield for genetic mapping may be a required corrective measure in determining the genetics controlling this element accumulation in the grain, which would bene t breeding for rice lines with high nutrient concentrations. In our study, despite having strong negative correlations with all elements, grain yield had only one co-located QTL with Mo concentration on chr 10 (16.8Mb) in one environment (PR13). This suggests that selection to enhance these grain elements at the identi ed loci is not likely to incur a yield penalty.
PH had consistent positive correlations with Co, Ca K and Zn; negative correlations with Cu and Mo and no correlations with Fe or Mn in three or more environments. As expected, PH QTL were located with those of Co, Ca, Na and Zn on chr 1 and 8. In theory, a taller plant will have more biomass and hence is able to accumulate higher levels of Zn during vegetative growth, which then becomes a larger Zn source during seed lling. Lower harvest index of taller plants may also mean that internal competition among grains for the elements, if any, will be weaker compared to the shorter counterparts. Plants can remobilize and move nutrients from vegetative source organs into seeds during the lling of grains 45 . Hence, the amount of both Fe and Zn in the cereal grain is dependent on the former physiological processes: rstly, their acquisition from the soil by roots, and secondly, transportation to the shoots and further remobilization of stored minerals from leaves when they senesce at grain lling 46,47 .
Strong positive correlations between Co, Cu, Fe, K, Mg, Mn, and P were observed in three or more environments. This could be explained by an overlap in mechanisms to uptake and transport these elements within the plant. There have been several studies that reported correlations between different trace minerals 21,44,48 and between essential minerals and toxic elements 21 . Genetic mapping has also been attempted to elucidate the genetic basis underlying these correlations in rice and other cereals [49][50][51][52] . Previous studies suggested that gene pleiotropy and QTL co-localization played a role in the correlations among trace minerals 21,44,49 . Similarly, several correlated traits were associated with the same QTLs either in the same or in a different environment in this study ( Table 3). The results con rm that there is a highly complex genetic network controlling grain nutrition levels at multiple loci 19,53,54 . The colocalisations of Cu-Zn (chr 1), Co-Zn (chr 7), K-P (chr 1), K-Na (chr 2), K-Mn (chr 2) QTL support the possibility of simultaneous improvement of these elements in rice grain. Fe and Zn have been targeted for bioforti cation for decades 55,56 and it is bene cial to explore the possibility to expand to other essential nutrients.
The lack of co-localisation of GY QTL and all grain elements (except for Mo), indicate that selection for higher grain element concentrations at those loci is unlikely to incur yield penalty.
Despite of their strong correlations, P did not share any common QTL with either Zn, Fe or Mg. Thus, the selection for increasing the element concentrations of those at the loci is not likely to increase P concentration, which is desirable in relations to Zn and/or Fe bioavailability. In mature grain, P is mainly stored as phytate (myo-inositol-1,2,3,4,5,6-hexakisphosphate, InsP6), which has the ability to complex Zn and Fe forming insoluble complexes that cannot be digested or absorbed by humans 57 .
Two genomic regions contained the most QTL for element concentration on chrs 3 and 8 (  20,34,51,58,59 . This is the rst time that a QTL for the grain Co and Na concentrations is being reported in this region. Overall, the two genomic regions on chr 3 and 8 that were associated with multiple elements could lead to the possibility for improvement of multiple nutrients simultaneously in rice breeding. However, grain yield and other developmental traits have also been mapped to the three regions in previous studies, suggesting that selection for higher grain nutrition may incur yield penalty and should be taken into consideration.
For QTLs to be highly effective within breeding programs, they must explain a signi cant proportion of the variation and be stable across environments and populations 31 50 and heading date 58 which may have to be taken into account for breeding purposes.
The three-environment QTL qK6.1 was located on the top of chr 6. This genomic region also harboured QTL for K, Cu and Zn concentrations and heading date in previous studies 11,20,60 . There has not been any QTL for grain yield reported in either of the genomic regions on chr 6 and 7 indicating that they are promising targets for improving Zn, Co and/or K concentration without yield penalty.
There was no stable QTL detected for grain yield or Cu, Mg, Mn and P concentrations. This is likely attributed to the large effect of the environmental conditions on the traits. Temperature, daylength, rainfall, soil nutrition could affect plant growth, owering time, grain development, grain yield and grain elements 34,61 . Although consensus QTL can generally be considered as more favourable for markerassisted selection, some QTL detected in one environment may lead to important discoveries.
Calmodulin 7 belongs to a Ca 2+ ion binding protein family present in the rice phloem sap. This family is considered to play an important role in signal transduction in the sieve tubes of rice plant 69 . The fact that both NAS3 and CAM7 genes were linked to the stable Zn and Co QTL implies the important roles of the phloem transport processes for Zn and possibly also for Co from vegetative tissues into the grain. In this genomic region, there were also a gene coding for Phytochelatin synthase 12 (Os07g0690800). The enzyme Phytochelatin synthase catalyses the nal step in the biosynthesis of phytochelatins, which are shown to be essential for Cd detoxi cation and Zn tolerance in Arabidopsis thaliana 70,71 . The roles of this in regulating grain Zn and/or Co concentrations has not been reported to date.
On chr 3, there were three genes playing key roles in starch synthesis and grain lling including SUS1 (Os03g0401300), GS3 (Os03g0407400) and GS5 (Os03g0393300) ( Table 4 & Table S5). SUS1 (linked to qCo3.1) encodes a sucrose synthase (Sucrose-UDP glucosyltransferase) responsible for the biosynthesis of starch within the endosperm. Overexpression of this gene in transgenic rice lines led to increased grain yield (per plant) and TGW 72 . GS3 (linked to qCa3.2 and qTGW3.1) and GS5 (linked to qMo3.2) encode a transmembrane protein and a putative serine carboxypeptidase, respectively. Natural variations in either of these genes were found to play important roles in regulating grain lling and nal grain size and weight [73][74][75][76] . On chr 6, there was one gene coding for granule-bound starch synthase GBSS1 (Os06g0133000) located within the three-environment QTL qK6.1. This enzyme is involved in starch synthesis during grain lling, speci cally being responsible for the synthesis of amylose and building the nal structure of amylopectin. These results suggest that those genes played signi cant roles in controlling grain lling and TGW, which in turn affected grain nutrient concentrations in our study, particularly for Ca, Co, K and Mo. The transfer route of micronutrients (such as Fe and Zn) into the grain is thought to be similar to that of sucrose 77,78 . In transgenic wheat lines overexpressing a sucrose transporter gene, there was an increase in grain yield as well as a 20-40% increase in grain Fe and Zn concentrations 78 . The functionality of those genes in relation to controlling grain nutrient elements such as Ca, Co, K and Mo in rice, will require further studies to elucidate.
Underlying the QTL clusters, there were also several genes that play important roles in regulating owering time and whole-plant senescence in rice including Terminal ower 1-like protein, FT7-Flowering Locus and NAC factor ( Table 4, Table S5). Flowering, grain lling and whole-plant senescence are processes that are highly important in determining grain weight, yield and quality parameters such as grain protein content (GPC) and grain micronutrient including Fe, Mn and Zn levels in cereals 79,80 .
There are several transporter genes linked to the stable QTL for K concentration qK6.1 and the QTL clusters on chrs 3 and 8. On chr 3, two metal transporters; i.e OsZIP2 (Os03g0411800) and a heavy metal transport/detoxi cation (Os03g0412300) were located within the QTL for Ca concentration and TGW. On chr 6, a high-a nity potassium transporter genes (OsHKT9) were linked to qK6.1. On chr 8, a Cation e ux protein MTP12 (Metal tolerance protein) was linked to qPH8.3. Amongst these genes, the functionality of the ZIP transporter families has been well characterised. There are 17 ZIP transporters in rice and most of them show broad substrate transport activity (transporting Zn, Fe, Mn, Cd and Co). The genes OsZIP1, 2, 4, 5, 7, and 8 are highly induced by Zn de ciency 81-83 . It has been shown that mineral uptake and transportation in rice is a complex process that involved the combined actions of several transporter genes 45,84 . The genes being identi ed in this study would be potential candidates for further studies to improve essential nutrient element concentration in the rice grain.
In conclusion, the rice diversity panel used in this study proved to be a useful resource for association mapping of rice grain nutrition with signi cant variation observed and QTL detected for all traits. Colocalizations of QTL for multiple grain element concentrations was found, and particularly those on chrs 3 and 8 open the opportunity for enhancing multi elements simultaneously. Consistent QTL across environments were identi ed, particularly the four-environment QTL for Zn (qZn7.2). This QTL had been reported previously, indicating its stability in different genetic backgrounds, and is a strong candidate for being used in breeding for higher Zn concentration. Multiple candidate genes were identi ed, which play various roles in controlling mineral accumulations in rice grain including NAS3 and SUS1. Further gene functionality studies would be helpful to validate the signi cance of the candidate genes in breeding for higher micronutrient content in rice grains.

Materials And Methods
Plant material and eld trial  Table S6.

Phenotyping
At maturity, plants of from the middle two rows (excluding the border rows) were harvested to assess yield (14% moisture) and thousand grain weight (TGW) following standard procedure 85 . Days to owering (DF) was assessed as the interval between the date of sowing and the date when panicles of 50% of plants per plot were fully exerted. Plant height (PH) was measured from the base of the rootshoot junction to the tip of the ag leaf, which was manually straightened to be aligned with the culm. Genotypic data, population structure and linkage disequilibrium Genotypic data describing 5.2 million biallelic SNPs in a rice reference panel covers 233 genotypes from the PRAY Indica panel 30 and was used in this current study. The number of independent markers in the genotypic data was estimated according to the autocorrelation method described by Sobota et al. 89 . Brie y, genotypic data was split by chromosome and recoded to represent the number of minor alleles at each locus for each individual. An autoregressive model was t to each individual to estimate the number of independent markers. This number was averaged for each chromosome and the nal number of independent markers derived by summing all chromosomes.
Principal component analysis was conducted using PLINK 1.9 90 to identify population substructure and identify non-Indica individuals.
A kinship matrix was constructed using the IBS model in emmax 91 to describe cryptic relatedness in the population.

GWAS
Normality of phenotypic distribution was assessed using the Shapiro-Wilk test implemented in R (R Core Team 2018) using a signi cance threshold of p < 0.05. Where possible, phenotypes found not to be normally distributed were transformed to normality using the following transformations: square root, cube root, natural log, inverse cube root, inverse square root, inverse. GWAS were performed for all transformations up to and including the rst to be normally distributed.
GWAS were performed utilising a mixed linear model (MLM) in emmax, incorporating kinship plus up to two principal components to account for population structure.
In the mixed model, principal components and family kinship were included Y = X α + Q β +K µ + e where Y represents the vector of phenotype, X represents the vector of SNPs, Q is the PCA matrix and K is the relative kinship matrix. X α and Q β are the xed effects, and K µ and e represent random effects. The Q and K matrices help to reduce the spurious false positive associations. Correction for population structure (Q) substantially reduces the false positives but it sometimes eliminates true positive associations due to overcorrection. Therefore, the optimal number of principal components was estimated for each trait before incorporating them for MLM tests, based on the forward model selection method using the Bayesian information criterion. This method helps to control both false-positive and -negative associations more effectively although it cannot eliminate both completely.
The lambda genomic in ation factor was determined for each association analysis using the regression method of the estlambda function from the GenABEL R package 92 . In comparing multiple association models applied to a single trait, an in ation factor closest to one signi ed the best analysis.
A signi cance threshold α = 0.05 was used for association mapping, but was adjusted using the Bonferroni approach considering the estimated number of independent markers: α adj = α / n where α is the unadjusted signi cance threshold and n is the number of independent markers in the population.
Quantitative trait loci (QTLs), regions containing SNPs associated with phenotypes, were de ned as described by McCouch et al. 29 . A QTL was de ned as any region containing one SNP with -log 10 (p) >log 10 (α adj ) anked by markers with -log 10 (p) > 4 on each side and within 100 kb.
Candidate gene and haplotype analysis The physical locations of SNPs were identi ed based on the Rice Annotation version of 7.0 of variety Nipponbare from Michigan State University. Considering that the LD decay distance in XI accessions is about 100 kb 93 , signi cant SNPs located to a region of less than 100 kb were treated as one locus. The annotations of genes located within LD blocks were obtained from the Os-Nipponbare-Reference-IRGSP-1.0 rice genome database (https://rapdb.dna.affrc.go.jp/download/irgsp1.html).
Haplotype analysis of candidate genes plus 1.5 kb upstream was performed in R using the imputed SNP dataset.

Trait heritability and G×E analysis
Trait heritability and genotype × environment interactions were investigated using additive main effects and multiplicative interactions (AMMI) model in GenStat 94 . The computed variance components were used to estimate broad-sense heritability across the four environments tested using the formula described by Velu et al. 95 : H 2 = σg 2 / (σg 2 + σge 2 + σe 2 / rl) where σg 2 is the genotypic variance, σge 2 is the genotype × environment variance, and σe 2 is the residual error variance for r replicates and l locations.