The genetic basis of grain protein content in rice by genome-wide association analysis

The grain protein content (GPC) of rice is an important factor that determines its nutritional, cooking, and eating qualities. Although a number of genes affecting GPC have been identified in rice, most of them have been cloned using mutants, and only a few have been cloned from natural populations. In this study, 135 significant loci were detected in a genome-wide association study (GWAS), many of which could be repeatedly detected across different years and populations. Four minor quantitative trait loci affecting rice GPC at four significant association loci, qPC2.1, qPC7.1, qPC7.2, and qPC1.1, were further identified and validated in near-isogenic line F2 populations (NIL-F2), explaining 9.82, 43.4, 29.2, and 13.6% of the phenotypic variation, respectively. The role of the associated flo5 was evaluated with knockout mutants, which exhibited both increased grain chalkiness rate and GPC. Three candidate genes in a significant association locus region were analyzed using haplotype and expression profiles. The findings of this study will help elucidate the genetic regulatory network of protein synthesis and accumulation in rice through cloning of GPC genes and provide new insights on dominant alleles for marker-assisted selection in the genetic improvement of rice grain quality.


Introduction
Rice (Oryza sativa L.) is one of the world's most important food crops and a staple food for more than half of the world's population (Kim et al. 2013; Tian et al. 2009). Owing to improved living standards, consumers now place high demands on rice quality. Rice grain quality comprises a complex set of characteristics, including grain appearance, milling, cooking and eating, and nutritional qualities (Yang et al. 2019). The protein content of rice is an important factor, second only to starch content, affecting rice's nutritional and taste quality (Sun et al. 2011). During the past decades, studies have reported the influence of starch on the eating quality of rice. However, the genetic basis and molecular mechanisms underlying the impact of protein content on rice nutrition and eating quality have not been elucidated.
Approximately 90% of the rice grain consists of starch and protein; protein accounts for approximately 8-10% of the grain weight (Tian et al. 2009; Wang et al. 2009). A large proportion of rice grain proteins are seed storage proteins (SSPs). There are four kinds of rice SSPs with different solubility-related physical properties: albumin, globulin, prolamin, and glutelin (Chen et al. 2018). Glutelin is rich in essential amino acids and is the most abundant SSP, accounting for approximately 80% of total SSPs (Chen et al. 2018). Rice SSPs possess good nutritional quality and are highly digestible by humans (Hamaker and Griffin 1993; Yang et al. 2019). Several studies have demonstrated that GPC affects the cooking properties of rice (Hamaker and Griffin 1993; Martin and Fitzgerald 2002; Wang et al. 2007; Yang et al. 2019; Zhou et al. 2017). Most of them have reported a significant negative correlation between rice GPC and cooking and eating qualities. Furthermore, low GPC is associated with a better taste value, while cooking and eating qualities tend to decrease when GPC exceeds 7% (Furukawa et al. 2006; Wakamatsu et al. 2008). In contrast, other studies have reported that rice GPC is not necessarily negatively correlated with eating quality (Chrastil 1992; Furukawa et al. 2006; Sun et al. 2011; Wakamatsu et al. 2008).
Analysis of rice GPC-related genes is essential for investigating the mechanisms through which GPC affects rice quality and for accelerating the genetic improvement of rice quality. Rice GPC is a quantitative trait that has low heritability, and it is significantly affected by the environment (Yang et al. 2019). Quantitative trait loci (QTLs) for GPC have been mapped on all 12 rice chromosomes, and some significant QTLs have been detected repeatedly (Cheng et al. 2013; Lou et al. 2009; Wang et al. 2007; Yang et al. 2015; Ye et al. 2010; Zheng et al. 2011). At present, only a few cloned genes have been associated with rice GPC in the natural population (Yang et al. 2019). Rice QTL qPC1, identified as gene OsAAP6, encodes an amino acid transporter, which positively regulates GPC, including the content of the four SSPs, and increases amylose content, thus affecting the cooking, eating, and nutritional qualities of rice. Furthermore, two stable QTLs, qGPC-10 and qGPC-1, have been identified. The gene OsGluA2, which encodes a glutelin type-A2 precursor and positively regulates GPC and glutelin content in rice, was verified as a candidate gene for qGPC-10 via map-based cloning (Yang et al. 2019). Many genes encoding traits related to the four SSPs have been cloned in rice mutants. Genes related to glutelin synthesis and accumulation have been identified, including Lgc-1, OsSar1a, OsRab5a, gpa3, and gpa4 (Kusaba et al. 2003; Ren et al. 2014; Tian et al. 2013; Wang et al. 2010, 2016). The proteins RPBF and RISBZ1 directly interact with and identify the GCN4 motifs of the GluA, GluB-1, and GluD-1 promoters to regulate rice glutelin gene expression (Gupta et al. 2015).
Exploring GPC-related genes in natural populations, rather than only in mutants, and clarifying their genetic basis and molecular regulatory mechanisms will aid the genetic improvement of rice grain quality. Genome-wide association studies (GWAS) are based on ancient recombination events and have been widely used for high-throughput detection of genetic loci for complex traits using a diverse germplasm collection. Although GWAS is widely utilized in genetic analysis of rice grain quality traits, including amylose content, milling quality, grain size, and gelatinization temperature (Borba et al. 2010; Huang et al. 2010; Li et al. 2014), genetic studies on rice GPC are still scarce (Chen et al. 2018; Tang et al. 2019; Xu et al. 2016; Zhong et al. 2021).
Therefore, the aim of this study was to perform a GWAS using 529 accessions to detect lead single nucleotide polymorphisms (SNPs) that control rice GPC. Among them, four loci and the flo5 gene were identified in the genetic population as affecting the rice GPC. These results provide useful insights for analyzing the regulatory network of GPC synthesis and accumulation and improving rice grain quality.

Plant materials and growth conditions
A total of 529 O. sativa accessions collected from 87 countries, including 327 from the World Core Collection and 202 from the China Core Collection, were evaluated using GWAS (Data S1). Near-isogenic lines (NILs) of QTLs were formed by the successive backcrossing and crossing of the high-protein-content variety Zhenshan97B (ZS97B), a low-protein-content variety Delong208 (DL208), and a low-protein-content variety Nanyangzhan (NYZ), respectively. NIL-F2 populations were constructed using ZS97B as the recurrent parent, which was backcrossed four times and then self-crossed. For each QTL, the BC4F2 populations were genotyped using flanking simple sequence repeat (SSR) markers (Data S2). The 529 accessions were planted in the experimental fields of Huazhong Agricultural University in three environments: Wuhan (China) during the summer seasons of 2015 and 2016, and Ezhou (China) in 2014. NIL-F2 populations were planted at the experimental field of the Guangdong Academy of Agricultural Sciences, Guangzhou, China, during the 2020 growing season. Three mutants (flo5-1, flo5-2, and flo5-3) and the wild-type (WT) Zhonghua11 (ZH11) were planted in an experimental farm under natural conditions at the Guangdong Academy of Agricultural Sciences, Guangzhou, China, during the 2020-2022 growing seasons. The field experiments followed a randomized complete block design with two replicates per year. Seedlings (approximately 25 days old) of each accession were planted in the experimental farm, with a spacing of 16.5 × 26.4 cm within and between rows. Field management involved nitrogen fertilizer application of 48.75, 86.25, and 27.6 kg per hectare at the basal, tillering, and booting stages, respectively. Other fertilizers were used according to standard farming practice.

Trait measurements
Ten plants from the middle of each line were harvested at maturity. The harvested seeds were threshed, air-dried, and kept for 3 months at room temperature before storing at 4 °C. Approximately 50 g of seeds was de-hulled into brown rice by using a TR 200 dehuller (Kett, Tokyo, Japan). Quantitative analyses of GPC were performed using a TECAN Infinite M200. Visual inspection was performed to assess grain chalkiness, including white belly and white core phenotypes. Approximately 200 de-hulled grains of rice from each plant, including broken grains, were randomly selected and placed on a visualization device.

Genome-wide association study
In total, 529 rice accessions were used to perform association analysis. Population structure and SNP data for the rice accessions were obtained from RiceVarMap (http://ricevarmap.ncpgr.cn/). The physical locations of SNPs were determined using the annotated version 6.1 of the variety Nipponbare from Michigan State University as a reference. Association analyses included only SNPs with a minor allele frequency (MAF) > 5% and a missing (deletion) rate < 15%. The factored spectrally transformed linear mixed model (FaST-LMM) program was used to perform associations by the linear mixed model (LMM) method, with a total of 3,916,415, 2,767,159, and 1,857,845 SNPs in the whole population and the indica and japonica subpopulations, respectively. A P-value of 5.0 × 10⁻⁶ was the threshold to detect significant association loci. To identify independent association loci, multiple SNPs exceeding the threshold within a 5 Mb region were clustered with a linkage disequilibrium (LD) r² ≥ 0.25, and within each cluster the SNP with the minimum P-value was taken as the lead SNP.

Candidate genes and haplotype analysis
SNP variation data for Flo5 (LOC_Os08g09230), LOC_Os07g11120, LOC_Os07g11150, and LOC_Os07g11250 in the 529 rice accessions were obtained from RiceVarMap. Haplotypes of each candidate gene were analyzed on the basis of all SNPs with MAF > 0.05 in the 2 kb upstream region and the intragenic region, excluding those in introns.
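The SNP filter and lead-SNP selection described for the association analysis can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the `"N"` missing-call code, the tuple layout, and the greedy clustering order are all assumptions, and the significance threshold is passed in by the caller rather than fixed.

```python
from collections import Counter

MISSING = "N"  # assumed code for a missing genotype call


def passes_qc(calls, min_maf=0.05, max_missing=0.15):
    """Return True if one SNP passes the MAF and missing-rate filters.

    `calls` holds one allele per accession (inbred lines are treated as
    homozygous); MISSING marks a failed call. The minor allele is taken
    as the second most common allele, which is exact for biallelic SNPs.
    """
    observed = [c for c in calls if c != MISSING]
    if not observed:
        return False
    missing_rate = 1 - len(observed) / len(calls)
    counts = Counter(observed).most_common()
    if len(counts) < 2:
        return False  # monomorphic site
    maf = counts[1][1] / len(observed)
    return maf > min_maf and missing_rate < max_missing


def pick_lead_snps(snps, r2, p_threshold, window_bp=5_000_000, min_r2=0.25):
    """Greedy clustering of significant SNPs; returns the lead SNP ids.

    `snps` is a list of (snp_id, chrom, pos, p_value) tuples and
    `r2(a, b)` is a caller-supplied function giving the LD r² between
    two SNP ids (hypothetical interface).
    """
    significant = sorted((s for s in snps if s[3] < p_threshold),
                         key=lambda s: s[3])
    leads, assigned = [], set()
    for snp_id, chrom, pos, _ in significant:
        if snp_id in assigned:
            continue
        # The most significant unassigned SNP becomes the lead of a new cluster.
        leads.append(snp_id)
        assigned.add(snp_id)
        for other_id, o_chrom, o_pos, _ in significant:
            if (other_id not in assigned and o_chrom == chrom
                    and abs(o_pos - pos) <= window_bp
                    and r2(snp_id, other_id) >= min_r2):
                assigned.add(other_id)
    return leads
```

Because the clustering is greedy, each cluster is seeded by its smallest P-value, so the seed is the lead SNP by construction.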

Vector construction and genetic transformation
The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) protein 9 system was used to generate three Flo5 knockout mutants, and a plasmid was constructed according to previous reports (Xie et al. 2015). The three target sgRNAs for Flo5 were Flo5-gRNA1 (5′-GCC TAG CAC ATA TAG ACA AG-3′), Flo5-gRNA2 (5′-CCT TCT AAT ACG GTG CTG AA-3′), and Flo5-gRNA3 (5′-TTA GGT GGT CCT TTA ACC GA-3′). Primers containing gene-specific sequences were designed using previously described standards (Xie et al. 2015) and assembled according to the instructions using the NEB Golden Gate Assembly Kit and pRGEB32 plasmid (New England Biolabs, Ipswich, MA, USA). The correct recombinant vector was transformed into Agrobacterium tumefaciens EHA105 and subsequently into the rice variety ZH11. The T0 transgenic-positive plants were detected by a pair of specific primers HPT-F/HPT-R. The primer pairs F5-1-F/F5-1-R and F5-2-F/F5-2-R were used in T0 to amplify the target sites to verify the mutation by sequencing, and positive transformation was further validated in the T1 generation. The primers used in this study are listed in Data S2.

RNA extraction and qRT-PCR
TRIzol reagent (Invitrogen) was used to extract total RNA. According to the instructions, SuperScript III reverse transcriptase (Invitrogen) was used to synthesize first-strand cDNAs with 3 µg of total RNA in 20 µL of the reaction mixture. SYBR Premix Ex Taq (TaKaRa Bio) was used to perform qRT-PCR on an ABI 7500 PCR instrument (Applied Biosystems). The PCR procedure was carried out as follows: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 40 s. The internal control was the rice Actin gene. Each experiment was performed using at least two biological samples with three replicates for each sample. The primers (Flo5-F/Flo5-R) for qRT-PCR analysis are listed in Data S2.
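The text does not state how relative expression was calculated from the qRT-PCR data. A common choice with a single internal control such as Actin is the 2^-ΔΔCt calculation, and the sketch below assumes that method. The function name, argument layout, and Ct values are invented for illustration only.

```python
from statistics import mean


def relative_expression(target_cts, actin_cts, calib_target_cts, calib_actin_cts):
    """Fold change of a target gene relative to a calibrator sample (2^-ddCt).

    Each argument is a list of technical-replicate Ct values. The formula
    assumes roughly 100% amplification efficiency for both primer pairs.
    """
    d_ct_sample = mean(target_cts) - mean(actin_cts)
    d_ct_calib = mean(calib_target_cts) - mean(calib_actin_cts)
    return 2 ** -(d_ct_sample - d_ct_calib)


# Invented Ct values, three technical replicates per measurement.
fold = relative_expression([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                           [25.0, 25.0, 25.0], [18.0, 18.0, 18.0])
```

In this toy case the target crosses the threshold one cycle earlier in the sample than in the calibrator at equal Actin levels, which corresponds to a two-fold higher relative expression.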

Statistical analysis
Significant differences between individual means were determined using a two-tailed Student's t-test in Microsoft Office Excel 2010. The IBM SPSS Statistics software (version 22.0) was used for one-way analysis of variance (ANOVA) and Duncan's multiple comparisons.

Phenotypic variation of GPC in the association panel
The GPC was investigated using 529 rice accessions from a worldwide collection (Data S1). Structural analysis of these accessions revealed a unique structure of the population, which was divided into several distinct subpopulations, including indI (95 accessions

Identification of loci associated with GPC by GWAS
In this study, 3,916,415 SNPs in the whole rice genome were selected for GWAS, which was performed separately for different populations. The results of GWAS from the full population and the indica and japonica subpopulations over the 3-year study were analyzed (Fig. 2; Data S5). To avoid structural noise, 299 indica and 156 japonica accessions were used for GWAS. Manhattan plots for the association analysis of protein content and quantile-quantile plots of protein content over 3 years for the full population are illustrated in Fig. 2. Pairs of lead SNPs of around or less than 100 kb were considered to be associated with a common gene and treated as one association locus. Associations differed between populations and years (Fig. 2; Data S5).
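The rule that lead SNPs lying within about 100 kb of each other count as one association locus can be illustrated with a short sketch. It makes two assumptions: SNP identifiers of the form `sf` + two-digit chromosome + eight-digit position encode their physical location (inferred from the identifiers and chromosome assignments quoted in this article), and neighbouring SNPs are chained, so a run of closely spaced SNPs can form a locus wider than 100 kb. Function names are hypothetical.

```python
def parse_snp_id(snp_id):
    """Split an id such as 'sf0719598121' into (chromosome, position).

    Assumes the layout 'sf' + 2-digit chromosome + 8-digit position.
    """
    return int(snp_id[2:4]), int(snp_id[4:])


def merge_lead_snps(snp_ids, max_gap_bp=100_000):
    """Group lead SNPs into loci: adjacent SNPs on the same chromosome
    no more than `max_gap_bp` apart fall into the same locus."""
    loci = []
    previous = None  # (chrom, pos) of the last SNP visited
    for snp_id in sorted(snp_ids, key=parse_snp_id):
        chrom, pos = parse_snp_id(snp_id)
        if previous and previous[0] == chrom and pos - previous[1] <= max_gap_bp:
            loci[-1].append(snp_id)
        else:
            loci.append([snp_id])
        previous = (chrom, pos)
    return loci


# Four chromosome 7 lead SNPs mentioned in this article.
loci = merge_lead_snps(["sf0719625273", "sf0707834002",
                        "sf0719598121", "sf0708340365"])
```

Under these assumptions, sf0719598121 and sf0719625273 (about 27 kb apart) collapse into one locus, whereas sf0707834002 and sf0708340365 (about 506 kb apart) remain separate.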
Details of significant association signals with a threshold value at 5.0E-05, among the numerous lead SNPs detected, are listed in Data S5. In the above list, information on lead SNPs regarding population, chromosomes, physical positions, the proportion of phenotypic variance explained, and P-values calculated using LMM, MAF, and neighboring known genes are described. To further analyze the association results, GWAS was performed using the full association panel, indica, and japonica subpopulations over the 3 years (Table 1; Data S5). Based on their physical positions in the rice genome, the detected association loci were widely distributed on 12 chromosomes, with the highest number of associations located in chromosomes 1, 4, 7, and 8. Among them, 56, 36, and 43 loci were detected in 2014, 2015, and 2016, respectively. The significance levels of the associations, excluding those located close to known genes, ranged from P = 4.8E-05 to P = 3.6E-08 according to the LMM for protein content. Lead SNP sf0102043947, located on chromosome 1, had the most significant effect on GPC. The proportion of phenotypic variance explained by each locus ranged from 0.11 to 75.88%. Furthermore, 67 associations were detected in different environments or populations, explaining more than 10% of the phenotypic variation (Table 1; Data S5).
To better understand the comprehensive association results, the number of loci detected was counted (Table 1). Significant genetic heterogeneity was observed among the 12 chromosomes, different groups, and the 3 years of experimentation. A comparison of GWAS results over the 3 years revealed that most of the loci were detected in only 1 year, indicating that environmental variation can greatly affect the performance of GWAS. However, five lead SNPs, sf0102342328, sf0400950425, sf0618816229, sf0709202668, and sf0817958573, were detected in all 3 years and in different groups, while sf0401819925 was detected in all 3 years but only in the full population. In addition, 16 lead SNPs were detected in 2 of the 3 years. Three lead SNPs for protein content were identified in both the full population and the two subpopulations. Specifically, loci sf0102103441, sf0400950425, and sf0709202668 were detected in different populations. Multiple significant GWAS signals were detected in a hot spot on chromosome 1 at 2.04-2.17 Mb, explaining 8.18-58.3% of the variation. SNP sf0102103441 was detected eight times in different populations and environments, indicating that this locus plays an important role in the protein content phenotype. Similarly, the lead SNPs sf0102342328, sf0401819925, and sf0618816229 were each detected five times (Table 1).

Co-localization of associated sites with previously reported QTLs and genes related to grain quality and agronomic traits
In the past decade, dozens of genes related to grain quality have been reported and cloned in rice. To evaluate significant GWAS signals, the localization of associated sites was compared with QTLs previously detected in cultivated rice and genes related to grain quality reported in previous studies (Chen et al. 2018; Lou et al. 2009; Ryoo et al. 2007; Wang et al. 2007, 2015). The analysis identified overlaps between associated sites detected via GWAS and previously reported QTLs or intervals related to grain quality genes. In this study, 120 associated sites were co-localized with nine QTLs adopted from the literature (Table 1). Most of the co-localized QTLs were detected on chromosomes 2 and 7. The lead SNPs sf0707834002 and sf0708340365 overlapped with the reported QTLs qPC-7 (Lou et al. 2009) and 7-4 (Wang et al. 2007). The lead SNP sf0719598121 was detected four times in or overlapped with the reported QTL 7-8(9) (Wang et al. 2007) and lead SNP sf0719625273 (Chen et al. 2018) in both 2014 and 2016 in different populations (Table 1).
In the present study, nine association loci were found in regions of genes involved in grain quality (Table 1). Two genes, Susy2 (LOC_Os03g28330) and Flo5 (LOC_Os08g09230), were less than 50 kb away from the lead SNPs and have been demonstrated to play vital roles in the synthesis of rice grain starch (Ryoo et al. 2007; Wang et al. 2015). Our study detected lead SNPs located near six genes: Sar1a (LOC_Os01g23620) (Tian et al. 2013), GluB6 (LOC_Os02g15070) (Uemura et al. 2003), OsTudor-SN (LOC_Os02g32350) (Chou et al. 2017), and Glb1 (LOC_Os05g41970) (Morita et al. 2009), which have been previously confirmed to participate in the biosynthesis and accumulation of storage protein. The above protein-related genes were less than 150 kb apart from lead SNPs (Table 1), likely because of the strong associations among these SNPs. In addition to the ones described above, the detected genes OsAGPL4 (LOC_Os07g13980) and GBSSII (LOC_Os07g22930) have been reported to be involved in the starch synthesis pathway (Maung et al. 2021; Ryoo et al. 2007; Wang et al. 2015). Environmental factors and agronomic traits of rice have a great influence on rice protein content. We analyzed cloned rice genes controlling agronomic traits (growth duration, yield, plant type, etc.) within 100 kb upstream and 100 kb downstream of the loci detected in this study. Genes OsMADS50, Hd1, Ghd7, and RCN1 controlling the growth duration of rice and genes PLA2, OsCDPK13, D11, and TAC1 controlling other agronomic traits of rice were found (Data S5). The effects of these genes related to agronomic traits on rice protein content still need to be verified by further experiments.

Validation of GWAS signals with QTL mapping
Notably, we found that many loci were co-located with amino acid content-related QTLs (Table 1). Two F9 recombinant inbred line populations were used to validate the authenticity of significant GWAS signals based on previous studies on amino acid content QTLs (Wang et al. 2007). One F9 recombinant inbred line population was derived from a cross between ZS97B and DL208 and named ZS97B/DL208, while the other was derived from ZS97B and NYZ and named ZS97B/NYZ. To assess the feasibility and efficiency of the GWAS loci and evaluate their genetic effects, NIL-F2 populations were constructed in the genetic background of ZS97B (Table 2).
The QTL (2-5) co-localized with lead SNP sf0206873340 was renamed qPC2.1; QTL (7-4) co-localized with lead SNP sf0706126055 was renamed qPC7.1; QTL (7-4) co-localized with lead SNP sf0709202668 was renamed qPC7.2; and the QTL co-localized with lead SNP sf0113186602 was renamed qPC1.1 (Table 2). The QTL qPC7.1, detected between the markers RM125 and RM214 on chromosome 7, explained the highest proportion of phenotypic variation (43.4%) and had a logarithm of odds (LOD) score of 13.04. The QTL qPC7.2, detected between the markers RM1186 and RM5499 on chromosome 7, explained 29.2% of the phenotypic variation and had the highest LOD score (22.8). This QTL was previously reported and validated (Chen et al. 2018). The QTL qPC2.1, flanked by RM555 and RM492 on chromosome 2, had a LOD score of 7.9 and explained 9.82% of the phenotypic variation. The QTL qPC1.1, which was only detected in ZS97B/DL208, explained 13.6% of the phenotypic variation and had a LOD score of 6.7. The QTLs qPC7.1 and qPC7.2 carried a dominant allele from NYZ and showed a positive additive effect on protein content, suggesting that the allele of ZS97B increased protein content; in contrast, the other two QTLs, carrying a dominant allele from ZS97B, showed a negative additive effect on protein content, suggesting that this allele decreased the protein content (Table 2). Therefore, all four QTLs controlling protein content were stable, validating the reliability of GWAS signals.

Phenotypic characterization of the flo5 mutants
Gene Flo5 was identified in the total population in 2014 based on the GWAS results (Table 1; Fig. 3a). We performed haplotype analysis of SNPs in the promoter (2 kb) and gene regions of Flo5, excluding the introns, and analyzed the GPC of different haplotypes. Ten Flo5 haplotypes (Hap1-Hap10) with differences in GPC values between the 3 years were identified (Fig. 3b). The GPC of Hap2 in 2014 and Hap5 in 2015 was significantly higher than that of the other haplotypes in the same years. In 2016, the GPCs of Hap2 and Hap5 were significantly higher than those of Hap3 and Hap7 (Fig. 3b). These results suggested that Flo5 might affect the GPC in rice. Gene Flo5 has been cloned and shown to affect the quality of rice grain (Ryoo et al. 2007), although its effect on the GPC of rice remains unknown. In this study, a CRISPR/Cas9 knockout construct expressing three guide RNAs targeting two exons was introduced into ZH11 (Fig. 3c, d), and three homozygous mutants (flo5-1, flo5-2, and flo5-3) were generated (Fig. 3d, e). Compared to the wild-type ZH11, flo5-1 had a 3 bp and a 4 bp deletion at the first and third targets, respectively; flo5-2 had a one-base C insertion and a 2 bp deletion at the first and third targets, respectively; and flo5-3 had a 6 bp deletion at the first target and a large 43 bp deletion between the second and third target sites. These changes led to frameshift mutations in all three mutants (Fig. 3d). The chalkiness rates of the flo5-1, flo5-2, and flo5-3 mutants were significantly higher than that of ZH11, with flo5-1 showing the highest chalkiness rate (nearly 80%) (Fig. 3e, f). The GPC of the flo5-2 mutant was the highest, while the GPC levels of mutants flo5-1 and flo5-3 were also significantly higher than that of ZH11 (Fig. 3g). The GPC of Hap5 was significantly higher than that of Hap3 in the 3 years (Fig. 3b).
We detected relative expression levels of Flo5 with Hap3 and Hap5 in 13 and 11 accessions randomly chosen from 529 accessions, respectively. The relative expression levels of Flo5 in Hap3 accessions were higher than those in Hap5 accessions (P < 0.05) (Fig. 3h; Data S6). These results indicate that flo5 could increase rice GPC.
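The haplotype comparisons reported for Flo5 rest on grouping accessions by their alleles at the selected SNPs and summarizing GPC within each group. A minimal sketch of that grouping step follows. The data layout and function name are assumptions, the example values are invented, and the Duncan comparisons used in the study are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def haplotype_means(genotypes, gpc):
    """Summarize GPC per haplotype.

    `genotypes` maps accession -> sequence of alleles at the selected SNPs;
    `gpc` maps accession -> measured grain protein content.
    Returns {haplotype: (number of accessions, mean GPC)}.
    Accessions without a GPC value are skipped.
    """
    groups = defaultdict(list)
    for accession, alleles in genotypes.items():
        if accession in gpc:
            groups[tuple(alleles)].append(gpc[accession])
    return {hap: (len(values), mean(values)) for hap, values in groups.items()}


# Invented toy data: four accessions typed at two SNPs.
genotypes = {"acc1": ("A", "G"), "acc2": ("A", "G"),
             "acc3": ("T", "G"), "acc4": ("T", "C")}
gpc = {"acc1": 8.0, "acc2": 9.0, "acc3": 10.0}
summary = haplotype_means(genotypes, gpc)
```

Each distinct allele combination is treated as one haplotype, so any filtering by MAF or region has to happen before the genotypes are passed in.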

Analysis of candidate genes
GWAS analysis detected many SNPs in different years and populations (Table 1; Fig. 4a, b). Candidate gene analysis of repeatedly detected loci can provide a basis for cloning rice GPC genes. Therefore, we analyzed the lead SNP sf0706126055, located on chromosome 7. This SNP, verified in the genetic population, was repeatedly detected in 2015 and 2016 and in all indica populations (Tables 1, 2). By analyzing candidate genes within 100 kb upstream and downstream of the lead SNP sf0706126055, multiple haplotypes of three genes, LOC_Os07g11120, LOC_Os07g11150, and LOC_Os07g11250, were found in the full population. The GPC of haplotypes differed between 2015 and 2016, with Hap7 showing a significantly higher GPC than Hap4, Hap5, and Hap6 in the 2 years. All three candidate genes were expressed in the endosperm of Minghui 63 (MH63) and ZS97B (Fig. 4c-k). Among them, the candidate gene LOC_Os07g11120 encodes a hydrolase belonging to the NUDIX family; it was highly expressed in the endosperms of MH63 and ZS97B (Fig. 4c-e). The candidate gene LOC_Os07g11150 encodes an expressed protein in the endosperm of MH63 and ZS97 (Fig. 4f-h). The GPC of Hap3 of LOC_Os07g11150 was significantly higher than that of Hap4, Hap7, and Hap10 in 2015, and the GPC of Hap7 was significantly lower than that of other haplotypes in 2016. The candidate gene LOC_Os07g11250 also encodes an expressed protein that was highly expressed in the endosperm of MH63 and ZS97B. The GPC of Hap4 was significantly higher than that of Hap1 and Hap2 in the 2 years (Fig. 4i-k). Genetic variations of the three candidate genes with different haplotypes were further analyzed (Fig. 5). Most of the SNPs in LOC_Os07g11120 and LOC_Os07g11250 were distributed in the promoter, and the intragenic regions of candidate genes contained 11 and 12 SNPs, respectively (Fig. 5a, b). Gene LOC_Os07g11150 had fewer SNPs than the above genes, with only two SNPs in the intragenic region (Fig. 5b). Subsequently, the haplotype distributions in different subpopulations were analyzed (Fig. 5). Haplotype Hap4 of LOC_Os07g11120 with significant phenotypic differences was only found in the japonica subpopulation, while Hap6 and Hap7 were mostly found in the indica subpopulation (Fig. 5a). Haplotype Hap7 of LOC_Os07g11150 with significant phenotypic differences was also only distributed in the japonica subpopulation, while Hap2, Hap3, Hap9, and Hap10 of LOC_Os07g11150 were mostly found in the indica subpopulation (Fig. 5b). Haplotype Hap2 of LOC_Os07g11250 with significant phenotypic differences only existed in the japonica subpopulation. While Hap4 and Hap5 were contained within the indica subpopulation, Hap6 and Hap7 were mainly found in the Aus subpopulation (Fig. 5c).
Fig. 3 (partial caption) … and flo5 mutants. f, g The chalkiness rate and protein content of WT and flo5 mutants. Phenotypic statistics are presented as the means ± SD, n = 10. ** indicates significant differences at P < 0.01 between mutant and WT phenotypes. h Relative expression levels of Flo5 in the endosperm at 10 days after pollination in 24 accessions. Relative expression levels are presented as mean ± standard error of mean (s.e.m.); * indicates significant differences at P < 0.05 between Hap3 and Hap5.
The main component of rice protein is glutelin, which is concentrated in the endosperm. The GPC of brown rice is significantly positively correlated with glutelin content and the total SSP content of milled rice. We used the same genetic populations to measure the GPC of brown rice by near-infrared reflectance spectroscopy (NIRS) and SSPs of milled rice flour by solvent extraction method and then performed GWAS on all of them. The GWAS results of glutelin content and total SSP content showed that few lead SNPs were detected (Chen et al. 2018).
Meanwhile, a comparison with the GWAS results for GPC measured by NIRS in this study showed that only two lead SNPs on chromosome 7 were detected in both sets of GWAS results. In this study, they were sf0709348920 and sf0712867464 (Table 1; Data S5), corresponding to sf0709447538 and sf0712842943 of the GWAS results of glutelin content and total SSP content (Chen et al. 2018), respectively. The non-destructive determination of GPC was based on the NIRS model developed by combining near-infrared spectra and nitrogen content measured by the Kjeldahl method of brown rice samples (Xie et al. 2014). It can be concluded that GPC measured by NIRS reflects the total nitrogen content, including the four kinds of SSPs, free amino acids, and non-storage proteins, and that many factors influence GPC measured by NIRS (Hayes 2020; Xie et al. 2014). The glutelin and total SSP of milled rice were extracted according to the solubility of the different SSPs (Kumamaru et al. 1988; Yang et al. 2019), and the corresponding SSP content was measured with Coomassie Brilliant Blue G-250 (Chen et al. 2018; Bradford 1976). The SSP content obtained by this method is the extracted protein content, but this method is easily affected by the extraction solvent and procedure (Udaka et al. 2000; Kumamaru et al. 1988). Protein content measured with Coomassie Brilliant Blue is also affected by factors such as interfering substances (Chial and Splittgerber 1993; Hayes 2020). In conclusion, protein contents obtained by NIRS and by solvent extraction are different, and the lead SNPs detected by GWAS of the two kinds of protein content are also different. To better detect the lead SNPs controlling GPC, a suitable method for GPC measurement is needed. According to the FAO, amino acid analysis provides the most accurate measurement of protein content in foods and should be used where possible (Hayes 2020).

Effects of Flo5 on rice GPC
Gene Flo5/OsSSIIIa encodes starch synthase III, the second most important key enzyme involved in rice starch synthesis. Gene Flo5 affects the physicochemical properties of starch, amylose content, and amylopectin structure of rice grains (Ryoo et al. 2007). This was demonstrated by double inhibition lines of OsSSIIa/OsSSIIIa, which exhibited high chalkiness, amylose content, gelatinization temperature, medium-long amylopectin chain content, and low viscosity and short and long amylopectin chain content. Starch and protein are the two main components of rice. Starch content mainly affects rice cooking and eating quality, whereas GPC mainly affects the nutritional quality (Chen et al. 2018; Peng et al. 2014; Sun et al. 2011; Yang et al. 2019). Many genes involved in rice grain quality including Chalk5, OsAAP6, RAG2, RISBZ1, RPBF, and OsMADS6 have been reported to affect both grain starch and protein content (Gupta et al. 2015; Li et al. 2014; Peng et al. 2014; Yu et al. 2020; Zhou et al. 2017). In the present study, we found that the GPC of Hap2 and Hap5 of Flo5 was high, which can be exploited by using these two haplotypes to improve rice grain quality (Fig. 3). Furthermore, flo5 can increase rice GPC, indicating that it affects the content of starch and protein in rice, providing the means for studies elucidating the genetic basis and molecular mechanism underlying starch and protein impact on rice grain quality.
Figure caption (partial): The region comprised the 2 kb upstream and intragenic regions of the candidate genes (except those in introns). Yellow SNPs refer to those that are not the same as Nipponbare. Ind, jap, aus, and total were indica, japonica, Aus, and the full populations, respectively. GPC phenotypic statistics are presented as the means. Letters in parentheses on the right of the average GPC phenotype values are ranked by the Duncan test at P < 0.05; different letters indicate significant difference.
… reported QTL or cloned genes related to rice quality (Table 1). Four significant loci were verified by genetic population (Table 2), and the presence of genes affecting rice GPC at these four loci was demonstrated. These results confirm the feasibility of GWAS for preliminary investigations of GPC genes in rice.

Screening of candidate genes
Lead SNPs that were repeatedly detected in different years and populations were considered more reliable. In this study, candidate genes upstream and downstream of the lead SNP sf0706126055 were analyzed, and LOC_Os07g11120, LOC_Os07g11150, and LOC_Os07g11250 encoding different proteins were identified. Among them, the expression profiles of MH63 and ZS97B showed that LOC_Os07g11120 was highly expressed specifically in the endosperm during the three grain-filling stages. Gene LOC_Os07g11120 encodes a Nudix hydrolase, belonging to a class of enzymes that can catalyze the hydrolysis of various nucleoside diphosphate derivatives and have biological functions, such as maintaining the stability of genetic material and stress responses (Karačić et al. 2017). The GPC differed among different haplotypes of LOC_Os07g11120, and it was highly expressed specifically in the endosperm. The effect of LOC_Os07g11120 on GPC was further verified using a transgene experiment. Significant loci were detected repeatedly, including sf0102103441 and sf0102342328 on chromosome 1, sf0206873340 on chromosome 2, and sf0401819925 and sf0403146346 on chromosome 4 (Table 1). Among them, sf0618816229 on chromosome 6 and sf0719598121 on chromosome 7 were detected more than four times (Table 1). For lead SNPs detected many times, gene cloning can be carried out by constructing a genetic population. Four lead SNPs in this study were verified to have an impact on rice GPC by constructing a genetic population (Table 2), among which sf0206873340 and sf0706126055 were repeatedly detected on chromosomes 2 and 7, respectively (Table 1). The influence of these loci on rice GPC was verified in the genetic population, and the genes controlling rice GPC can be cloned in the future. This study also provides a theoretical basis for further exploration of the GPC genes in rice.