The genetic basis of grain protein content in rice by genome-wide association analysis

doi:10.21203/rs.3.rs-2206021/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 26 Dec, 2022

The published version is available in Molecular Breeding.

The grain protein content (GPC) of rice is an important factor that determines its nutritional, cooking, and eating qualities. To date, some genes affecting GPC have been identified in rice, most of which were cloned using mutants; only a few genes controlling rice GPC have been cloned from natural populations. Here, 135 significant association loci were detected in a genome-wide association study (GWAS), and many loci were repeatedly detected across different years and populations. Four minor quantitative trait loci affecting rice GPC at four significant association loci, qPC1.1, qPC1.2, qPC1.3, and qPC1.4, were further validated in near-isogenic line F₂ (NIL-F₂) populations, and explained 9.82, 43.4, 29.2, and 13.6% of the phenotypic variation, respectively. The associated flo5 knockout mutation simultaneously increased the grain chalkiness rate and GPC. Three candidate genes in a significant association locus region were analyzed using haplotype and expression profiles. The findings of this study will contribute to the cloning of rice GPC genes to elucidate the genetic regulatory network of protein synthesis and accumulation in rice, and provide new dominant alleles for marker-assisted selection in the genetic improvement of rice grain quality.

Oryza sativa

Grain protein content

GWAS

flo5

Grain quality

CRISPR/Cas9

Rice (Oryza sativa L.) is one of the most important global food crops and is a staple food for more than half of the world's population (Kim et al. 2013; Tian et al. 2009). With improvements in living standards, consumers now have high requirements for rice quality. Rice grain quality, which includes grain appearance, milling, cooking and eating, and nutritional qualities, is a complex characteristic (Peng et al. 2014; Yang et al. 2019). Protein content in rice is second only to starch content and is an important factor that affects rice nutritional and taste quality (Sun et al. 2011). In recent decades, studies have reported the influence of starch on the eating quality of rice. However, the genetic basis and molecular mechanism by which protein content affects rice nutrition and eating quality are not well understood.

Approximately 90% of the rice grain consists of starch and protein storage materials; protein, the second major nutrient in rice, accounts for approximately 8–10% (Tian et al. 2009; Wang et al. 2009). Seed storage proteins (SSPs) account for a large proportion of rice grain protein. Based on their solubility-linked physical properties, there are four kinds of rice SSPs: albumin, globulin, prolamin, and glutelin (Chen et al. 2018). Glutelin is rich in essential amino acids and is the most abundant, accounting for approximately 80% of total SSPs (Chen et al. 2018). Therefore, SSPs possess good nutritional quality and have high digestibility for humans (Hamaker and Griffin 1993; Yang et al. 2019). Some studies have shown that GPC affects the cooking properties of rice (Hamaker and Griffin 1993; Martin and Fitzgerald 2002; Wang et al. 2007; Yang et al. 2019; Zhou et al. 2017). Most studies have reported a significant negative correlation between rice GPC and cooking and eating quality. Low GPC is associated with a higher taste value, and when GPC exceeds 7%, cooking and eating quality tend to worsen (Furukawa et al. 2006; Wakamatsu et al. 2008). Some studies have also reported that rice GPC is not necessarily negatively correlated with eating quality (Chrastil 1992; Furukawa et al. 2006; Sun et al. 2011; Wakamatsu et al. 2008).

Exploring genes related to rice GPC will help to elucidate the mechanism through which GPC affects rice quality and to accelerate the genetic improvement of rice quality. Rice GPC is a quantitative trait that has low heritability and is readily affected by the environment (Yang et al. 2019). Quantitative trait loci (QTLs) for GPC have been mapped on all twelve rice chromosomes, but few QTLs with large effects have been detected repeatedly (Cheng et al. 2013; Lou et al. 2009; Wang et al. 2007; Yang et al. 2015; Ye et al. 2010; Zheng et al. 2011). At present, only a few genes related to rice GPC have been cloned from natural populations (Peng et al. 2014; Yang et al. 2019). The QTL qPC1 was identified as the gene OsAAP6, which encodes an amino acid transporter that positively regulates GPC, including the content of the four SSPs. In addition, OsAAP6 increases amylose content, thus affecting the cooking, eating, and nutritional qualities of rice (Peng et al. 2014). Two stable QTLs, qGPC-10 and qGPC-1, have been identified, and OsGluA2, which encodes a glutelin type-A2 precursor, was verified as the candidate gene for qGPC-10 by map-based cloning. OsGluA2 positively regulates GPC by affecting glutelin content in rice (Yang et al. 2019). Many genes encoding the four SSPs have been cloned from rice mutants. Genes related to glutelin synthesis and accumulation have been reported, including Lgc-1, OsSar1a, OsRab5a, gpa3, and gpa4 (Kusaba et al. 2003; Ren et al. 2014; Tian et al. 2013; Wang et al. 2010; Wang et al. 2016). The proteins RPBF and RISBZ1 interact directly and recognize the GCN4 motifs of the GluA, GluB-1, and GluD-1 promoters to regulate rice glutelin gene expression (Gupta et al. 2015).

Exploring genes related to rice GPC in natural populations, together with their genetic basis and molecular regulatory mechanisms, is helpful for the genetic improvement of rice grain quality. Genome-wide association studies (GWAS) take advantage of ancient recombination events and have been widely used to detect genetic loci for complex traits at high throughput using diverse germplasm collections (Chen et al. 2014). Although GWAS has been widely used in the genetic analysis of rice grain quality traits, including amylose content, milling quality, grain size, and gelatinization temperature (Borba et al. 2010; Huang et al. 2010; Li et al. 2014), genetic studies on rice grain protein content are still rare (Chen et al. 2018; Tang et al. 2019; Xu et al. 2016; Zhong et al. 2021).

In this study, we performed a GWAS of rice GPC using 529 accessions and detected 135 lead SNPs that control this trait. Among them, four loci were verified in the genetic population. Flo5 was found to affect the rice GPC. These results may be useful for analyzing the regulatory network of GPC synthesis and accumulation, and for improving rice grain quality.

Plant materials and growth conditions

A total of 529 Oryza sativa accessions, collected from 87 countries and including 327 from the World Core Collection and 202 from the China Core Collection, were used for GWAS (Data S1). Near-isogenic lines (NILs) for the QTLs were developed by successive backcrossing from crosses of the high-protein-content variety Zhenshan97B (ZS97B) with the low-protein-content varieties Delong208 (DL208) and Nanyangzhan (NYZ). NIL-F₂ populations were constructed using ZS97B as the recurrent parent, with four rounds of backcrossing followed by selfing. For each QTL, the BC₄F₂ populations were genotyped using flanking simple sequence repeat (SSR) markers (Table 2; Data S2). The 529 accessions were planted at the experimental field of Huazhong Agricultural University in three environments: the summer seasons of 2015 and 2016 in Wuhan (China) and 2014 in Ezhou (China). NIL-F₂ populations were planted at the experimental field of Guangdong Academy of Agricultural Sciences, Guangzhou, China, during the 2020 growing season. Three mutants (flo5-1, flo5-2, and flo5-3) and wild-type (WT) Zhonghua11 (ZH11) were planted under natural conditions at the experimental farm of the Guangdong Academy of Agricultural Sciences, Guangzhou, China, during the 2020–2022 growing seasons. The field experiment followed a randomized complete block design with two replicates per year. Seedlings (about 25 days old) of each accession were planted in the experimental farm, with a spacing of 16.5 × 26.4 cm within and between rows. Field management involved nitrogen fertilizer application (per hectare) of 48.75, 86.25, and 27.6 kg as the basal, tillering-stage, and booting-stage fertilizer, respectively. Other fertilizers were managed following normal field practice.

Table 1 Associated lead SNPs detected by LMM method in the different populations and environments

Chr.	Lead SNP	Num.	Population ^a	Year	Known genes/QTLs (kb) ^b
1	sf0102103441	8	All, Ind_All, Jap_All, TrJ	2014, 2016
1	sf0102342328	5	All, Jap_All, TeJ	2014, 2015, 2016
1	sf0103093920	2	All	2014, 2016
1	sf0113186602	2	IndI	2015, 2016	Sar1a (-98.48)
1	sf0139892657	2	TeJ, Aus	2015	1-19
2	sf0206873340	4	All, Ind_All	2014, 2015	2-4(5)
2	sf0208274542	1	IndI	2015	GluB6 (-128.21), 2-7
2	sf0212141249	2	All, Jap_All	2014	2-9(10)
2	sf0219198685	1	All	2015	OsTudor-SN (93.15)
3	sf0311636192	3	All, Jap_All	2014
3	sf0316272964	1	IndII	2015	Susy2 (-33.18)
3	sf0319366452	2	All, Ind_All	2014
4	sf0400950425	3	All, Ind_All, Jap_All	2014, 2015, 2016
4	sf0401250213	3	All, Jap_All	2015, 2016
4	sf0401819925	5	All	2014, 2015, 2016
4	sf0403146346	4	All, Jap_All	2014, 2016
4	sf0404599105	2	All, TeJ	2014
4	sf0423552592	3	All, Ind_All	2015, 2016
5	sf0508556305	2	All, Jap_All	2014
5	sf0516166384	2	All, IndII	2015
5	sf0524443891	1	Ind_All	2014	Glb1 (-133.85)
6	sf0609401331	3	All, TeJ	2016
6	sf0618816229	5	All, Jap_All	2014, 2015, 2016
7	sf0706087386	1	All	2016	Rc (24.5), 7-4(5)
7	sf0706126055	4	All, Ind_All	2015, 2016
7	sf0707834002	1	Ind_All	2016	OsAGPL4 (-155.28)
7	sf0708340365	2	All, Jap_All	2014	qPC-7, 7-4(5)
7	sf0709202668	3	All, Ind_All, Jap_All	2014, 2015, 2016	7-4(5)
7	sf0710047261	3	All	2014, 2016	7-4(5)
7	sf0712761453	2	All, IndI	2014, 2016	7-4(5)
7	sf0712867464	1	IndI	2014	GBSSII (-53.72), 7-4(5)
7	sf0714417157	2	IndII	2015, 2016
7	sf0714859990	2	IndI	2014, 2015
7	sf0716523661	2	Jap_All, TeJ	2014
7	sf0719598121	4	All, TeJ	2014, 2016	7-8(9)
7	sf0729064931	2	All	2014, 2016
8	sf0800465605	3	All, Ind_All	2015, 2016
8	sf0805366362	1	All	2014	Flo5 (14.26)
8	sf0817958573	3	IndII, IndI	2014, 2015, 2016	8-9(10)
8	sf0827747654	2	Ind_All, IndII	2014, 2016
8	sf0828049587	2	All, Jap_All	2015
9	sf0911061640	2	Ind_All, IndII	2014
11	sf1102396805	2	All, Jap_All	2014

Chr., chromosome; Num., number of times the lead SNP was detected. ^a All, lead SNPs were detected in the full population; Ind_All and Jap_All, lead SNPs were detected in the indica and japonica subpopulations, respectively; TrJ, TeJ, IndI, IndII, and Aus, lead SNPs were detected in the TrJ, TeJ, IndI, IndII, and Aus accessions, respectively. ^b A negative value means that the known gene or QTL is upstream of the lead SNP site

Table 2

Validation of four QTLs for GPC in NIL-F₂ populations derived from crosses among ZS97B, NYZ, and DL208
Pop.	Genotype	Num.	Mean ± SD	QTL	Chr.	Interval	LOD	Add	Dom	Var (%)	Lead SNP
ZS97B/NYZ	ZS97B	45	105.32 ± 2.10(A)	qPC1.1	2	RM555- RM492	7.9	-0.99	-1.17	9.82	sf0206873340
	Het.	79	105.19 ± 1.91(A)
	NYZ	41	107.28 ± 2.00(B)
	ZS97B	43	114.03 ± 12.07(A)	qPC1.2	7	RM125- RM214	13.04	2.13	-2.52	43.4	sf0706126055
	Het.	86	109.38 ± 9.31(B)
	NYZ	30	109.83 ± 5.16(B)
	ZS97B	48	123.98 ± 5.00(A)	qPC1.3	7	RM1186- RM5499	22.8	4.07	-2.45	29.2	sf0709202668
	Het.	113	117.06 ± 3.40(B)
	NYZ	64	115.56 ± 4.06(B)
ZS97B/DL208	ZS97B	57	120.74 ± 2.32(A)	qPC1.4	1	RM493- RM562	6.7	-1.44	-0.18	13.6	sf0113186602
	Het.	99	121.83 ± 2.74(A)
	DL208	56	123.43 ± 2.63(B)
Pop., population; Num., number; Chr., chromosome; Add, additive effect, positive value means ZS97B allele increasing GPC values, Dom, dominance effect, Var, variance explained by the QTL. Phenotypic statistics are presented as the means ± standard deviation (SD). Different capital letters after phenotype values represent significant differences as determined by Duncan’s test at P < 0.01

Trait Measurements

Ten plants from the middle of each line were harvested at maturity. Harvested seeds were threshed and air-dried, stored for three months at room temperature, and then stored at 4°C. Approximately 50 g of seeds was de-hulled into brown rice using a TR 200 dehuller (Kett, Tokyo, Japan). Quantitative analyses of GPC were performed using a TECAN Infinite M200 (Peng et al. 2014). Visual inspection was performed to assess grain chalkiness, including white belly and white core phenotypes. Approximately 200 de-hulled grains of rice from each plant, including broken grains, were randomly selected and placed on a visualization device (Li et al. 2014).

Genome-wide Association Study

In total, 529 rice accessions were used for the association analysis. Structural analyses and single nucleotide polymorphisms (SNPs) of these accessions can be obtained from RiceVarMap (http://ricevarmap.ncpgr.cn/) (Chen et al. 2014). The physical locations of SNPs were determined by reference to the Michigan State University (MSU) annotation version 6.1 of the variety Nipponbare. Only SNPs with a minor allele frequency (MAF) > 5% and a deletion rate < 15% were used in the association analyses. The factored spectrally transformed linear mixed model (FaST-LMM) program was used to perform associations by the linear mixed model (LMM) method, with a total of 3,916,415, 2,767,159, and 1,857,845 SNPs in the whole population and the indica and japonica subpopulations, respectively (Chen et al. 2014). A P-value of 5.0 × 10⁻⁶ was used as the threshold to detect significant association loci. To identify independent association loci, multiple SNPs exceeding the threshold within a 5 Mb region were clustered with an r² of linkage disequilibrium (LD) ≥ 0.25, and the SNP with the minimum P-value in each cluster was taken as the lead SNP (Chen et al. 2014).
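The SNP filtering criteria stated above (MAF > 5% and deletion rate < 15%) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the genotype coding (0, 1, or 2 copies of the alternate allele, `None` for a missing call), the function names, and the toy SNPs are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions without a genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF of a biallelic SNP, computed over non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filter(calls: Sequence[Genotype],
                  min_maf: float = 0.05,
                  max_missing: float = 0.15) -> bool:
    """Keep a SNP only if MAF > min_maf and missing rate < max_missing."""
    return (missing_rate(calls) < max_missing
            and minor_allele_frequency(calls) > min_maf)


# Hypothetical SNPs scored in ten accessions.
toy_snps: Dict[str, List[Genotype]] = {
    "snp_common": [0, 0, 2, 2, 0, 2, 0, 2, 0, 0],        # MAF 0.4, no missing data
    "snp_monomorphic": [0] * 9 + [None],                   # MAF 0
    "snp_sparse": [0, 2, None, None, 2, 0, 0, 2, 0, 2],  # 20% missing
}
kept = [name for name, calls in toy_snps.items() if passes_filter(calls)]
print(kept)
```

In practice such filters are usually applied by the genotype-handling software itself; the sketch only makes the two thresholds explicit.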

Candidate Genes And Haplotype Analysis

SNP variation data for Flo5 (LOC_Os08g09230), LOC_Os07g11120, LOC_Os07g11150, and LOC_Os07g11250 in the 529 rice accessions are available at RiceVarMap. The haplotypes of a candidate gene were analyzed on the basis of all SNPs with a MAF > 0.05 (except those in introns), including their 2 kb upstream and intragenic region.
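A haplotype comparison of this kind can be sketched as follows, assuming that a haplotype is simply the combination of alleles an accession carries at the retained SNPs of a gene. The accession names, alleles, GPC values, and the `haplotype_means` helper are hypothetical and are not taken from RiceVarMap or from the study's data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Tuple


def haplotype_means(alleles: Dict[str, Tuple[str, ...]],
                    phenotype: Dict[str, float],
                    min_size: int = 1) -> Dict[str, float]:
    """Group accessions by their allele combination and average the trait.

    alleles   : accession -> alleles at the retained SNPs (same SNP order for all)
    phenotype : accession -> trait value (for example GPC in mg/g)
    min_size  : smallest group size that is still reported
    """
    groups = defaultdict(list)
    for accession, calls in alleles.items():
        if accession in phenotype:
            groups["".join(calls)].append(phenotype[accession])
    return {hap: mean(values) for hap, values in groups.items()
            if len(values) >= min_size}


# Hypothetical accessions scored at three SNPs of one candidate gene.
toy_alleles = {
    "acc1": ("A", "G", "T"),
    "acc2": ("A", "G", "T"),
    "acc3": ("C", "G", "T"),
    "acc4": ("C", "A", "T"),
}
toy_gpc = {"acc1": 100.0, "acc2": 104.0, "acc3": 110.0, "acc4": 98.0}
print(haplotype_means(toy_alleles, toy_gpc))
```

Raising `min_size` drops rare haplotypes before any statistical comparison, which is a common precaution when group sizes are very unequal.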

Vector Construction And Genetic Transformation

The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) protein 9 system was used to generate three Flo5 knockout mutants, and the plasmid was constructed as described previously (Xie et al. 2015). The three target sgRNAs for Flo5 were Flo5-gRNA1 (5′-GCCTAGCACATATAGACAAG-3′), Flo5-gRNA2 (5′-CCTTCTAATACGGTGCTGAA-3′), and Flo5-gRNA3 (5′-TTAGGTGGTCCTTTAACCGA-3′). Primers containing gene-specific sequences were designed using previously described standards (Xie et al. 2015). The fragments were assembled with the pRGEB32 plasmid using the NEB Golden Gate Assembly Kit (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's instructions. The correct recombinant vector was transformed into Agrobacterium tumefaciens EHA105 and then introduced into rice variety ZH11. Positive T₀ transgenic plants were detected using the specific primer pair HPT-F/HPT-R. The primer pairs F5-1-F/F5-1-R and F5-2-F/F5-2-R were used to amplify the target sites, the mutations were verified by sequencing in T₀, and positive plants were further validated in the T₁ generation. The primers used in this study are listed in Data S2.
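As a simple sanity check on the three guide sequences listed above, their length, GC fraction, and reverse complement can be computed with a few lines of code. The sequences are copied from the text; the helper functions are illustrative, and checking GC content is a general design heuristic that the source does not describe.

```python
# Guide sequences as given in the text (5' to 3').
GUIDES = {
    "Flo5-gRNA1": "GCCTAGCACATATAGACAAG",
    "Flo5-gRNA2": "CCTTCTAATACGGTGCTGAA",
    "Flo5-gRNA3": "TTAGGTGGTCCTTTAACCGA",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement; raises KeyError on non-ACGT characters."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


for name, seq in GUIDES.items():
    print(name, len(seq), f"GC={gc_fraction(seq):.2f}", reverse_complement(seq))
```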

Statistical analysis

Significant differences between individual means were determined using a two-tailed Student’s t-test in Microsoft Office Excel 2010, and IBM SPSS statistics software (version 22.0) was used for one-way ANOVA and Duncan multiple comparisons.

Phenotypic variation of GPC in the association panel

GPC was investigated in 529 rice accessions collected worldwide (Data S1). Structural analysis of these accessions revealed that the population had a distinct structure and was divided into several subpopulations, including IndI (95 accessions), IndII (74 accessions), IndIII (13 accessions), indica intermediate (117 accessions), TeJ (93 accessions), TrJ (43 accessions), japonica intermediate (20 accessions), Aus (46 accessions), and a mixture (Chen et al. 2014). Rice GPC was normally distributed in the full population and varied widely over the 3 years (Fig. 1a, Data S3). The protein content ranged from 95.92 to 122.96 mg/g in 2014; from 92.11 to 121.13 mg/g in 2015; and from 92.92 to 122.10 mg/g in 2016 in the full population. Large variations were found in the indica accessions, and japonica accessions presented a similar distribution (Fig. 1b, c, Data S3). The highest mean values for GPC were observed in 2014 and the lowest in 2015, in the full population and in the two subpopulations (Fig. 1, Data S3). The correlation coefficients for GPC were significant among the 3 years, being 0.57 between 2014 and 2015, and 0.77 between 2014 and 2016. The correlation coefficient of GPC between 2015 and 2016 was the highest, at 0.93 (Data S4).
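Between-year correlation coefficients such as those reported above are typically Pearson coefficients computed on the accessions phenotyped in both years. A minimal sketch follows; the five GPC values per year are hypothetical placeholders and are not taken from Data S3 or Data S4, and the source does not state which correlation estimator was used.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either sample is constant


# Hypothetical GPC values (mg/g) for the same five accessions in two years.
gpc_year_a = [98.0, 104.5, 110.2, 101.3, 117.8]
gpc_year_b = [96.1, 103.0, 111.5, 99.4, 115.2]
r = pearson_r(gpc_year_a, gpc_year_b)
print(round(r, 2))
```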

Identification Of Loci Associated With GPC By GWAS

In this study, 3,916,415 SNPs across the whole rice genome were selected for GWAS, which was performed separately for different populations. In the following analysis, we examined GWAS results from the full population and from the indica and japonica subpopulations over the 3 study years (Fig. 2, Data S5). To reduce noise from population structure, 313 indica and 156 japonica accessions were used for the subpopulation GWAS. Manhattan plots and quantile-quantile plots for the association analysis of protein content over the 3 years in the full population are shown in Fig. 2. Any two lead SNPs located within approximately 200 kb of each other were considered to be caused by a common gene and were treated as one association locus. Many associations were found in different populations, and some associations were identified in different years (Fig. 2, Data S5).
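The rule of treating lead SNPs within about 200 kb of each other as one association locus can be sketched as a simple merge over sorted positions. The chaining behaviour (each SNP is compared with the previous SNP in the current locus), the function name, and the example positions are assumptions; the source does not specify how chains of neighbouring SNPs were handled.

```python
from typing import List, Tuple

LeadSNP = Tuple[int, int]  # (chromosome, physical position in bp)


def merge_lead_snps(snps: List[LeadSNP],
                    window_bp: int = 200_000) -> List[List[LeadSNP]]:
    """Group lead SNPs into loci.

    SNPs are sorted by chromosome and position; a SNP joins the current locus
    when it lies on the same chromosome and within window_bp of the previous
    SNP in that locus, otherwise it starts a new locus.
    """
    loci: List[List[LeadSNP]] = []
    for chrom, pos in sorted(snps):
        if loci:
            last_chrom, last_pos = loci[-1][-1]
            if last_chrom == chrom and pos - last_pos <= window_bp:
                loci[-1].append((chrom, pos))
                continue
        loci.append([(chrom, pos)])
    return loci


# Hypothetical lead SNP positions.
example = [(1, 1_000_000), (1, 1_150_000), (1, 1_340_000),
           (1, 5_000_000), (2, 1_100_000)]
loci = merge_lead_snps(example)
for locus in loci:
    print(locus)
```

With this chaining rule the first three SNPs form one locus even though the first and third are 340 kb apart; a stricter variant would compare each SNP with the first SNP of the locus instead.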

Many lead SNPs were detected; details of these significant association signals with a threshold value at 5.0E-05 are listed in Data S5. More detailed information on lead SNPs, such as population, chromosome, physical position, proportion of phenotypic variance explained, P-value calculated using LMM, MAF, and neighboring known genes, is provided in the list (Data S5). To further analyze the association results, GWAS were performed using the full association panel and the indica and japonica subpopulations over the 3 years (Table 1, Data S5). The detected association loci were widely distributed on the 12 chromosomes based on their physical positions in the rice genome, with chromosomes 1, 4, 7, and 8 exhibiting the highest numbers of associations. Of these, 56, 36, and 43 loci were detected in 2014, 2015, and 2016, respectively. The significance levels of the associations (except those located close to known genes) ranged from P = 4.8E-05 to P = 3.6E-08 in the LMM for protein content. Lead SNP sf0102043947, located on chromosome 1, presented the most significant effect. The proportion of phenotypic variance explained by each locus ranged from 0.11 to 75.88%. In addition, 67 associations were detected in different environments or populations, explaining more than 10% of the phenotypic variation (Table 1, Data S5).

To better understand the association results, the number of loci detected was counted (Table 1). Significant genetic heterogeneity was observed among the 12 chromosomes, the different groups, and the three years. Across the three years, most of the loci were detected in only one year, demonstrating that environmental variation can greatly affect GWAS results. However, five lead SNPs, sf0102342328, sf0400950425, sf0618816229, sf0709202668, and sf0817958573, were detected in all three years and in different groups, and sf0401819925 was detected in all three years but only in the full population. In addition, 16 lead SNPs were detected in two years. Three lead SNPs, sf0102103441, sf0400950425, and sf0709202668, were simultaneously identified in the full population and the two subpopulations, and were merged for protein content. A significant GWAS signal on chromosome 1 with lead SNP sf0102103441, located in a hot spot at 2.04–2.17 Mb and explaining 8.18–58.3% of the phenotypic variation, was detected eight times across the different populations and environments, indicating that this locus plays an important role in the protein content phenotype. Similarly, lead SNPs sf0102342328, sf0401819925, and sf0618816229 were each detected five times (Table 1).

Co-localization Of Associated Sites With Previously Reported QTLs And Genes Related To Grain Quality

In the past decade, dozens of genes related to grain quality have been reported and cloned in rice. To evaluate the significant GWAS signals, the locations of the associated sites were compared with QTLs previously detected in cultivated rice and with genes related to grain quality reported in previous studies (Chen et al. 2018; Lou et al. 2009; Ryoo et al. 2007; Wang et al. 2007; Wang et al. 2015). Overlaps were identified between associated sites detected by GWAS and the reported QTLs or intervals related to grain quality genes. In this study, 120 associated sites were co-localized with nine reported QTLs from the corresponding references (Table 1). Most of the associated sites that co-localized with reported QTLs were detected on chromosomes 2 and 7. Lead SNPs sf0707834002 and sf0708340365 overlapped with the reported QTLs qPC-7 (Lou et al. 2009) and 7-4(5) (Wang et al. 2007). The lead SNP sf0719598121 was detected four times, in 2014 and 2016 and in different populations, and overlapped the reported QTL 7-8(9) (Wang et al. 2007) and the lead SNP sf0719625273 (Chen et al. 2018) (Table 1).

In the present study, nine association loci were found in regions of genes involved in grain quality (Table 1). Two genes, Susy2 (LOC_Os03g28330) and Flo5 (LOC_Os08g09230), were less than 50 kb away from the lead SNPs and have been shown to play vital roles in the synthesis of rice grain starch (Ryoo et al. 2007; Wang et al. 2015). In our study, lead SNPs were detected near six genes: Sar1a (LOC_Os01g23620) (Tian et al. 2013), GluB6 (LOC_Os02g15070) (Uemura et al. 2003), OsTudor-SN (LOC_Os02g32350) (Chou et al. 2017), and Glb1 (LOC_Os05g41970) (Morita et al. 2009), which were confirmed to participate in the biosynthesis and accumulation of storage protein. The six storage protein-related genes were less than 150 kb from lead SNPs (Table 1), likely because of the strong correlations among these SNPs. In addition to the genes described above, OsAGPL4 (LOC_Os07g13980) and GBSSII (LOC_Os07g22930) are involved in the starch synthesis pathway (Maung et al. 2021; Ryoo et al. 2007; Wang et al. 2015).

Validation Of GWAS Signals With QTL Mapping

Interestingly, we found that many loci were co-located with amino acid content QTLs (Table 1). Following previous studies on amino acid content QTLs, two F₉ recombinant inbred line populations were used to validate the significant GWAS signals (Wang et al. 2007). One F₉ recombinant inbred line population was derived from a cross between ZS97B and DL208 and named ZS97B/DL208, while the other was derived from a cross between ZS97B and NYZ and named ZS97B/NYZ. To confirm the loci from the GWAS results and to evaluate their genetic effects, NIL-F₂ populations were constructed in the genetic background of ZS97B (Table 2).

QTL (2-4(5)) co-localized with lead SNP sf0206873340 was renamed qPC1.1; QTL (2-4(5)) co-localized with lead SNP sf0706126055 was renamed qPC1.2; QTL (7-4(5)) co-localized with lead SNP sf0709202668 was renamed qPC1.3; and QTL (1-12) co-localized with lead SNP sf0113186602 was renamed qPC1.4 (Table 2). The QTL qPC1.2, detected between the markers RM125 and RM214 on chromosome 7, explained the highest proportion of phenotypic variation (43.4%) and had a logarithm of odds (LOD) score of 13.04. The QTL qPC1.3, detected between the markers RM1186 and RM5499 on chromosome 7, explained 29.2% of the phenotypic variation and had the highest LOD score (22.8). This QTL was previously reported and validated, and was found to be reliable (Chen et al. 2018). The QTL qPC1.1, flanked by RM555 and RM492 on chromosome 2, had a LOD score of 7.9 and explained 9.82% of the phenotypic variation. The QTL qPC1.4 was detected only in ZS97B/DL208, explained 13.6% of the phenotypic variation, and had a LOD score of 6.7. The QTLs qPC1.2 and qPC1.3, with a dominant allele from NYZ, showed a positive additive effect on protein content, indicating that the ZS97B allele increased protein content; in contrast, the other two QTLs, with a dominant allele from ZS97B, showed a negative additive effect, indicating that the ZS97B allele decreased protein content (Table 2). Thus, all four QTLs controlling protein content were stable, indicating that the GWAS signals were reliable.
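The sign convention for additive and dominance effects can be illustrated with the textbook single-locus formulas applied to the genotype-class means of qPC1.1 in Table 2. This is only a rough sketch: the estimates reported in Table 2 (Add = -0.99, Dom = -1.17) come from QTL mapping and need not coincide with values computed directly from class means, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class GenotypeMeans(NamedTuple):
    recurrent: float  # mean GPC of plants homozygous for the ZS97B allele
    het: float        # mean GPC of heterozygous plants
    donor: float      # mean GPC of plants homozygous for the donor allele


def additive_effect(m: GenotypeMeans) -> float:
    """Half the difference between homozygotes; positive when ZS97B increases GPC."""
    return (m.recurrent - m.donor) / 2


def dominance_effect(m: GenotypeMeans) -> float:
    """Deviation of the heterozygote from the mid-parent value."""
    return m.het - (m.recurrent + m.donor) / 2


# Genotype-class means for qPC1.1 (ZS97B/NYZ) as listed in Table 2.
qpc1_1 = GenotypeMeans(recurrent=105.32, het=105.19, donor=107.28)
a = additive_effect(qpc1_1)
d = dominance_effect(qpc1_1)
print(f"a = {a:.2f}, d = {d:.2f}")
```

A negative additive value here matches the sign reported for qPC1.1, where the NYZ allele is associated with the higher GPC.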

Phenotypic characterization of the flo5 mutants

The GWAS results identified Flo5 in the total population in 2014 (Table 1; Fig. 3a). We performed haplotype analysis of SNPs in the promoter (2 kb) and gene regions, except for the intron of Flo5, and analyzed the GPC of different haplotypes. Ten haplotypes of Flo5 were identified, and the GPC of the haplotypes differed over the 3 years (Fig. 3b). The GPC of Hap2 in 2014 and Hap5 in 2015 was significantly higher than that of the other haplotypes in the same year. In 2016, the GPC of Hap2 and Hap5 was significantly higher than that of Hap3 and Hap7 (Fig. 3b). The results suggested that Flo5 may affect the GPC in rice. Flo5 has been cloned and affects the quality of rice grain (Ryoo et al. 2007). However, the effect of Flo5 on the GPC of rice remains unknown. In this study, a CRISPR/Cas9-knockout construct expressing three guide RNAs targeting two exons was introduced into ZH11 (Fig. 3c, d), and three homozygous mutants, flo5-1, flo5-2, and flo5-3, were generated (Fig. 3d, e). Compared to wild-type ZH11, flo5-1 had a 3 and 4 bp deletion at the first and third targets, respectively. flo5-2 had a one-base C insertion and a 2 bp deletion at the first and third targets, respectively. flo5-3 had a 6 bp deletion at the first target, and a large 43 bp deletion between the second and third target sites. These results led to the production of frameshift mutations in all three mutants (Fig. 3d). The chalkiness rates of flo5-1, flo5-2, and flo5-3 mutants were significantly higher than that of ZH11, and the chalkiness rate of flo5-1 was the highest (nearly 80%) (Fig. 3e, f). The rice GPCs of mutant flo5-1, flo5-2, and flo5-3 were significantly higher than that of ZH11, with that of flo5-2 being the highest (Fig. 3g), indicating that flo5 could increase rice GPC.

Analysis Of Candidate Genes

GWAS detected many SNPs in different years and populations (Table 1; Fig. 4a, b). Candidate gene analysis of repeatedly detected loci can provide a basis for cloning rice GPC genes. We analyzed the lead SNP sf0706126055, located on chromosome 7, which was repeatedly detected in 2015 and 2016 in the full and indica populations and was verified in the genetic population (Tables 1, 2). By analyzing the candidate genes within 100 kb upstream and downstream of sf0706126055, multiple haplotypes of three genes, LOC_Os07g11120, LOC_Os07g11150, and LOC_Os07g11250, were found in the full population. The GPC of different haplotypes differed significantly in 2015 and 2016, and all three genes were expressed in the endosperm of Minghui 63 (MH63) and ZS97B (Fig. 4c-k). The candidate gene LOC_Os07g11120 encodes a hydrolase belonging to the NUDIX family, and Hap7 showed a significantly higher GPC than Hap4, Hap5, and Hap6 in both years. LOC_Os07g11120 was highly expressed in the endosperm of MH63 and ZS97B (Fig. 4c-e). The candidate gene LOC_Os07g11150, which encodes an expressed protein, was expressed in the endosperm of MH63 and ZS97B (Fig. 4f-h). The GPC of Hap3 of LOC_Os07g11150 was significantly higher than that of Hap4, Hap7, and Hap10 in 2015, and the GPC of Hap7 was significantly lower than that of the other haplotypes in 2016. The candidate gene LOC_Os07g11250 also encodes an expressed protein and was highly expressed in the endosperm of MH63 and ZS97B. The GPC of Hap4 was significantly higher than that of Hap1 and Hap2 in both years (Fig. 4i-k). Genetic variations of the three candidate genes with different haplotypes were further analyzed (Fig. 5). LOC_Os07g11120 and LOC_Os07g11250 had more SNPs than LOC_Os07g11150, but most of the SNPs were distributed in the promoter, and the intragenic regions of these two genes contained eleven and twelve SNPs, respectively (Fig. 5a, b). LOC_Os07g11150 had few SNPs, with only two SNPs in the intragenic region (Fig. 5b). The haplotype distributions in different subpopulations were also analyzed (Fig. 5). Hap4 of LOC_Os07g11120, which showed significant phenotypic differences, was found only in the japonica subpopulation, while Hap6 and Hap7 were mostly found in the indica subpopulation (Fig. 5a). Hap7 of LOC_Os07g11150, which showed significant phenotypic differences, was also found only in the japonica subpopulation, while Hap2, Hap3, Hap9, and Hap10 of LOC_Os07g11150 were mostly found in the indica subpopulation (Fig. 5b). Hap2 of LOC_Os07g11250, which showed significant phenotypic differences, was found only in the japonica subpopulation, whereas Hap4 and Hap5 were found mainly in the indica subpopulation, and Hap6 and Hap7 were found mainly in the Aus subpopulation (Fig. 5c).

GWAS for rice GPC

GWAS has been used widely to detect genes in different natural populations. In this study, 135 lead SNPs were detected by GWAS (Data S5). A large number of loci were repeatedly detected in different years and populations. In addition, many loci were co-localized with reported QTLs or cloned genes related to rice quality (Table 1). We verified four significant loci in genetic populations (Table 2) and demonstrated the presence of genes affecting rice GPC at these four loci. These results confirm the feasibility of GWAS for the preliminary exploration of GPC genes in rice.

Effects of Flo5 on rice GPC

Flo5/OsSSIIIa encodes starch synthase III, the second key enzyme involved in rice starch synthesis. Flo5 affects the physicochemical properties of starch, amylose content, and amylopectin structure of rice grains (Ryoo et al. 2007). The double inhibition lines of OsSSIIa/OsSSIIIa presented higher chalkiness, amylose content, gelatinization temperature, and medium-long amylopectin chain content, and lower viscosity and short and long amylopectin chain content. Starch and protein are the two main components of rice. Starch content mainly affects rice cooking and eating quality, whereas GPC mainly affects nutritional quality (Chen et al. 2018; Peng et al. 2014; Sun et al. 2011; Yang et al. 2019). Many genes involved in rice grain quality have been reported to affect both grain starch and protein content. Such genes include Chalk5, OsAAP6, RAG2, RISBZ1, RPBF, and OsMADS6 (Gupta et al. 2015; Li et al. 2014; Peng et al. 2014; Yu et al. 2020; Zhou et al. 2017). In the present study, we found that the GPC of Hap2 and Hap5 of Flo5 was high, so these two haplotypes may be useful for improving rice grain quality (Fig. 3). In addition, the flo5 mutation increased rice GPC, indicating that Flo5 affects the content of both starch and protein in rice, which provides a theoretical basis for studying the genetic basis and molecular mechanism of starch and protein in rice grain quality.

Screening Of Candidate Genes

Lead SNPs that were repeatedly detected in different years and populations were more reliable. In this study, candidate genes upstream and downstream of the lead SNP sf0706126055 were analyzed, and LOC_Os07g11120, LOC_Os07g11150, and LOC_Os07g11250 were found to encode different proteins. Among them, the expression profiles of MH63 and ZS97B showed that LOC_Os07g11120 was specifically and highly expressed in the endosperm at the three grain filling stages. LOC_Os07g11120 encodes a Nudix hydrolase. Nudix hydrolases are a class of enzymes that catalyze the hydrolysis of various nucleoside diphosphate derivatives and have biological functions such as maintaining the stability of genetic material and responding to stress (Karačić et al. 2017). GPC differed among the haplotypes of LOC_Os07g11120, and the gene was specifically and highly expressed in the endosperm. The effect of LOC_Os07g11120 on GPC was further verified using a transgene experiment. In this study, significant loci were repeatedly detected, including sf0102103441 and sf0102342328 on chromosome 1, sf0206873340 on chromosome 2, and sf0401819925 and sf0403146346 on chromosome 4 (Table 1). sf0618816229 on chromosome 6 and sf0719598121 on chromosome 7 were detected more than four times (Table 1). For lead SNPs detected many times, gene cloning can be carried out by constructing a genetic population. Four lead SNPs in this study were verified in genetic populations to have an impact on rice GPC (Table 2), among which sf0206873340 on chromosome 2 and sf0706126055 on chromosome 7 were repeatedly detected (Table 1). The influence of these two loci on rice GPC was verified in genetic populations, and the genes controlling rice GPC at these loci could be cloned in the future. This study provides a theoretical basis for exploring the GPC genes in rice.

Supplementary Information

The supplementary material is available at Supplementary1.

Ethics approval and consent to participate

Not applicable

Consent for publication

Acknowledgements and Funding

This work was supported by grants from the National Natural Science Foundation of China (31901533), Natural Science Foundation of Guangdong Province (2020A1515011390), Guangdong Key Laboratory of New Technology in Rice Breeding (2020B1212060047), National Natural Science Foundation (32001529), Dean's Fund of Guangdong Academy of Agricultural Sciences (202006), and the earmarked fund for CARS-01 of China.

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Authors’ information and contributions

Pingli Chen performed the experiments and data analysis and wrote the manuscript. Guangming Lou, Yufu Wang, Junxiao Chen, Wengfeng Chen, Zhilan Fan, Qing Liu, Bingrui Sun, Xingxue Mao, Hang Yu, Liqun Jiang, Jing Zhang, Shuwei LV, Junlian Xing, and Dajian Pan participated in part of the phenotyping, genotyping, and biochemical experiments. Pingli Chen, Yuqing He, and Chen Li designed the experiments. All authors discussed and commented on the manuscript.

Data Availability

The variation information for the 529 rice accessions can be obtained from the RiceVarMap website (http://ricevarmap.ncpgr.cn/). The phenotypic data for the 529 rice accessions and other data can be obtained from the references cited in the main text or from the Supplementary Data of this study. For materials, please contact the corresponding author.

References

Akihiro T, Mizuno K, Fujimura T (2005) Gene expression of ADP-glucose pyrophosphorylase and starch contents in rice cultured cells are cooperatively regulated by sucrose and ABA. Plant Cell Physiol 46(6):937-946. https://doi.org/10.1093/pcp/pci101
Borba T, Brondani R, Breseghello F, Coelho A, Mendonça J, Rangel P, Brondani C (2010) Association mapping for yield and grain quality traits in rice (Oryza sativa L.). Genet Mol Biol 33(3):515-524. https://doi.org/10.1590/S1415-47572010005000065
Chen P, Shen Z, Ming L, Li Y, Dan W, Lou G, Peng B, Wu B, Li Y, Zhao D, Gao G, Zhang Q, Xiao J, Li X, Wang G, He Y (2018) Genetic basis of variation in rice seed storage protein (albumin, globulin, prolamin, and glutelin) content revealed by genome-wide association analysis. Front Plant Sci 9:612. https://doi.org/10.3389/fpls.2018.00612
Chen W, Gao Y, Xie W, Gong L, Lu K, Wang W, Li Y, Liu X, Zhang H, Dong H, Zhang W, Zhang L, Yu S, Wang G, Lian X, Luo J (2014) Genome-wide association analyses provide genetic and biochemical insights into natural variation in rice metabolism. Nat Genet 46(7):714-721. https://doi.org/10.1038/ng.3007
Cheng L, Xu Q, Zheng T, Ye G, Luo C, Xu J, Li Z (2013) Identification of stably expressed quantitative trait loci for grain yield and protein content using recombinant inbred line and reciprocal introgression line populations in rice. Crop Sci 53:1437-1446. https://doi.org/10.2135/cropsci2013.02.0075
Chou H, Tian L, Kumamaru T, Hamada S, Okita T (2017) Multifunctional RNA binding protein OsTudor-SN in storage protein mRNA transport and localization. Plant Physiol 175(4):1608-1623. https://doi.org/10.1104/pp.17.01388
Chrastil J (1992) Correlations between the physicochemical and functional properties of rice. J Agric Food Chem 40:1683-1686
Furukawa S, Tanaka K, Masumura T, Ogihara Y, Kiyokawa Y, Wakai Y (2006) Influence of rice proteins on eating quality of cooked rice and on aroma and flavor of sake. Cereal Chem 83(4):439-446. https://doi.org/10.1094/CC-83-0439
Gupta S, Malviya N, Kushwaha H, Nasim J, Bisht N, Singh V, Yadav D (2015) Insights into structural and functional diversity of Dof (DNA binding with one finger) transcription factor. Planta 241(3):549-562. https://doi.org/10.1007/s00425-014-2239-3
Hamaker B, Griffin V (1993) Effect of disulfide bond-containing protein on rice starch gelatinization and pasting. Cereal Chem 70:377-380. https://doi.org/10.1021/bp00022a011
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, Li M, Fan D, Guo Y, Wang A, Wang L, Deng L, Li W, Lu Y, Weng W, Liu K, Huang T, Zhou T, Jing Y, Li W, Lin Z, Buckler E, Qian Q, Zhang Q, Li J, Han B (2010) Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42(11):961-967. https://doi.org/10.1038/ng.695
Karačić Z, Vukelić B, Ho G, Jozić I, Sučec I, Salopek-Sondi B, Kozlović M, Brenner S, Ludwig-Müller J, Abramić M (2017) A novel plant enzyme with dual activity: an atypical Nudix hydrolase and a dipeptidyl peptidase III. Biol Chem 398(1):101-112. https://doi.org/10.1515/hsz-2016-0141
Kim J, Kim B, Lee J, Lee D, Rehman S, Yun S (2013) Protein content and composition of Waxy rice grains. Pak J Bot 45:151-156
Kusaba M, Miyahara K, Iida S, Fukuoka H, Takano T, Sassa H, Nishimura M, Nishio T (2003) Low glutelin content1: a dominant mutant that suppress the glutelin multigene family via RNA silencing in rice. Plant Cell 15(6):1455-1467. https://doi.org/10.1105/tpc.011452
Li Y, Fan C, Xing Y, Yun P, Luo L, Yan B, Peng B, Xie W, Wang G, Li X, Xiao J, Xu C, He Y (2014) Chalk5 encodes a vacuolar H⁺-translocating pyrophosphatase influencing grain chalkiness in rice. Nat Genet 46(4):398-404. https://doi.org/10.1038/ng.2923
Lou J, Chen L, Yue G, Lou Q, Mei H, Xiong L, Luo L (2009) QTL mapping of grain quality traits in rice. J Cereal Sci 50(2):145-151. https://doi.org/10.1016/j.jcs.2009.04.005
Martin M, Fitzgerald M (2002) Proteins in rice grain influence cooking properties! J Cereal Sci 36:285-294. https://doi.org/10.1006/jcrs.2001.0465
Maung T, Chu S, Park Y (2021) Functional haplotypes and evolutionary insight into the granule-bound starch synthase II (GBSSII) gene in Korean rice accessions (KRICE_CORE). Foods 10(10):2359. https://doi.org/10.3390/foods10102359
Morita R, Kusaba M, Iida S, Nishio T, Nishimura M (2009) Development of PCR markers to detect the glb1 and Lgc1 mutations for the production of low easy-to-digest protein rice varieties. Theor Appl Genet 119(1):125-130. https://doi.org/10.1007/s00122-009-1022-5
Peng B, Kong H, Li Y, Wang L, Zhong M, Sun L, Gao G, Zhang Q, Luo L, Wang G, Xie W, Chen J, Yao W, Peng Y, Lei L, Lian X, Xiao J, Xu C, Li X, He Y (2014) OsAAP6 functions as an important regulator of grain protein content and nutritional quality in rice. Nat Commun 5:4847. https://doi.org/10.1038/ncomms5847
Raubenheimer D, Simpson S (2016) Nutritional ecology and human health. Annu Rev Nutr 36:603-626. https://doi.org/10.1146/annurev-nutr-071715-051118
Ren Y, Wang Y, Liu F, Zhou K, Ding Y, Zhou F, Wang Y, Liu K, Gan L, Ma W, Han X, Zhang X, Guo X, Wu F, Cheng Z, Wang J, Lei C, Lin Q, Jiang L, Wu C, Bao Y, Wang H, Wan J (2014) GLUTELIN PRECURSOR ACCUMULATION3 encodes a regulator of post-Golgi vesicular traffic essential for vacuolar protein sorting in rice endosperm. Plant Cell 26(1):410-425. https://doi.org/10.1105/tpc.113.121376
Ryoo N, Yu C, Park C, Baik M, Park I, Cho M, Bhoo S, An G, Hahn T, Jeon J (2007) Knockout of a starch synthase gene OsSSIIIa/Flo5 causes white-core floury endosperm in rice (Oryza sativa L.). Plant Cell Rep 26(7):1083-1095. https://doi.org/10.1007/s00299-007-0309-8
Sun M, Abdula S, Lee H, Cho Y, Han L, Koh H, Cho Y (2011) Molecular aspect of good eating quality formation in Japonica rice. PLoS One 6(4):e18385. https://doi.org/10.1371/journal.pone.0018385
Sweeney M, Thomson M, Pfeil B, McCouch S (2006) Caught red-handed: Rc encodes a basic helix-loop-helix protein conditioning red pericarp in rice. Plant Cell 18(2):283-294. https://doi.org/10.1105/tpc.105.038430
Tang L, Zhang F, Liu A, Sun J, Mei S, Wang X, Liu Z, Liu W, Lu Q, Chen S (2019) Genome-wide association analysis dissects the genetic basis of the grain carbon and nitrogen contents in milled rice. Rice (N Y) 12(1):101. https://doi.org/10.1186/s12284-019-0362-2
Tian L, Dai L, Yin Z, Fukuda M, Kumamaru T, Dong X, Xu X, Qu L (2013) Small GTPase Sar1 is crucial for proglutelin and α-globulin export from the endoplasmic reticulum in rice endosperm. J Exp Bot 64(10):2831-2845. https://doi.org/10.1093/jxb/ert128
Tian Z, Qian Q, Liu Q, Yan M, Liu X, Yan C, Liu G, Gao Z, Tang S, Zeng D, Wang Y, Yu J, Gu M, Li J (2009) Allelic diversities in rice starch biosynthesis lead to a diverse array of rice eating and cooking qualities. Proc Natl Acad Sci U S A 106(51):21760-21765. https://doi.org/10.1073/pnas.0912396106
Uemura Y, Kumamaru T, Ogawa M, Satoh H (2003) Glu6 gene encodes a glutelin polypeptide containing α-2 acidic subunit with pI6.55. Rice Genet Newsl 20(0):47-48
Wakamatsu K, Sasaki O, Uezone I, Tanaka A (2008) Effect of the amount of nitrogen application on occurrence of white-back kernels during ripening of rice under high-temperature conditions. Japan J Crop Sci 77(4):424-433. https://doi.org/10.1626/jcs.77.424
Wang L, Lu Q, Wen X, Lu C (2015) Enhanced sucrose loading improves rice yield by increasing grain size. Plant Physiol 169(4):2848-2862. https://doi.org/10.1104/pp.15.01170
Wang L, Zhong M, Li X, Yuan D, Xu Y, Liu H, He Y, Luo L, Zhang Q (2007) The QTL controlling amino acid content in grains of rice (Oryza sativa) are co-localized with the regions involved in the amino acid metabolism pathway. Mol Breeding 21:127-137. https://doi.org/10.1007/s11032-007-9141-7
Wang Y, Liu F, Ren Y, Wang Y, Liu X, Long W, Wang D, Zhu J, Zhu X, Jing R, Wu M, Hao Y, Jiang L, Wang C, Wang H, Bao Y, Wan J (2016) GOLGI TRANSPORT 1B regulates protein export from endoplasmic reticulum in rice endosperm cells. Plant Cell 28(11):2850-2865. https://doi.org/10.1105/tpc.16.00717
Wang Y, Ren Y, Liu X, Jiang L, Chen L, Han X, Jin M, Liu S, Liu F, Lv J, Zhou K, Su N, Bao Y, Wan J (2010) OsRab5a regulates endomembrane organization and storage protein trafficking in rice endosperm cells. Plant J 64(5):812-824. https://doi.org/10.1111/j.1365-313X.2010.04370.x
Wang Y, Zhu S, Liu S, Jiang L, Chen L, Ren Y, Han X, Liu F, Ji S, Liu X, Wan J (2009) The vacuolar processing enzyme OsVPE1 is required for efficient glutelin processing in rice. Plant J 58(4):606-617. https://doi.org/10.1111/j.1365-313X.2009.03801.x
Xie K, Minkenberg B, Yang Y (2015) Boosting CRISPR/Cas9 multiplex editing capability with the endogenous tRNA-processing system. Proc Natl Acad Sci U S A 112(11):3570-3575. https://doi.org/10.1073/pnas.1420294112
Xu F, Bao J, He Q, Park Y (2016) Genome-wide association study of eating and cooking qualities in different subpopulations of rice (Oryza sativa L.). BMC Genomics 17:663. https://doi.org/10.1186/s12864-016-3000-z
Yang Y, Guo M, Li R, Shen L, Wang W, Liu M, Zhu Q, Hu Z, He Q, Xue Y, Tang S, Gu M, Yan C (2015) Identification of quantitative trait loci responsible for rice grain protein content using chromosome segment substitution lines and fine mapping of qPC-1 in rice (Oryza sativa L.). Mol Breeding 35:130. https://doi.org/10.1007/s11032-015-0328-z
Yang Y, Guo M, Sun S, Zou Y, Yin S, Liu Y, Tang S, Gu M, Yang Z, Yan C (2019) Natural variation of OsGluA2 is involved in grain protein content regulation in rice. Nat Commun 10(1):1949. https://doi.org/10.1038/s41467-019-09919-y
Ye G, Liang S, Wan J (2010) QTL mapping of protein content in rice using single chromosome segment substitution lines. Theor Appl Genet 121(4):741-750. https://doi.org/10.1007/s00122-010-1345-2
Yu X, Xia S, Xu Q, Cui Y, Gong M, Zeng D, Zhang Q, Shen L, Jiao G, Gao Z, Hu J, Zhang G, Zhu L, Guo L, Ren D, Qian Q (2020) ABNORMAL FLOWER AND GRAIN 1 encodes OsMADS6 and determines palea identity and affects rice grain yield and quality. Science China Life Sci 63(2):228-238. https://doi.org/10.1007/s11427-019-1593-0
Zheng L, Zhang W, Chen X, Ma J, Chen W, Zhao Z, Zhai H, Wan J (2011) Dynamic QTL Analysis of rice protein content and protein index using recombinant inbred lines. J Plant Biol 54:321-328. https://doi.org/10.1007/s12374-011-9170-y
Zhong H, Liu S, Zhao G, Zhang C, Peng Z, Wang Z, Yang J, Li Y (2021) Genetic diversity relationship between grain quality and appearance in rice. Front Plant Sci 12:708996. https://doi.org/10.3389/fpls.2021.708996
Zhou W, Wang X, Zhou D, Ouyang Y, Yao J (2017) Overexpression of the 16-kDa α-amylase/trypsin inhibitor RAG2 improves grain yield and quality of rice. Plant Biotechnol J 15(5):568-580. https://doi.org/10.1111/pbi.12654

Supplementary1.xlsx

Reviewers agreed at journal: 27 Oct, 2022
Reviewers invited by journal: 27 Oct, 2022
Editor assigned by journal: 27 Oct, 2022
First submitted to journal: 26 Oct, 2022