Genetic Basis of Relationship Between Yield and its Component Traits in Rice Revealed by Genome-Wide Association and Mendelian Randomization Study

Background Rice yield has a complex genetic architecture and is mainly determined by three component traits: the number of grains per panicle (GPP), kilo-grain weight (KGW) and tillers per plant (TP). Ideotype breeding based on selection for these genetically less complex component traits is an alternative route for further improving rice production. It is therefore important to study the genetic basis of the relationship between rice yield and its component traits and to clarify the effect of each component trait on yield. In this study, we carried out meta-analyses of genome-wide association studies (meta-GWAS) for yield and its three component traits using two populations (575 + 1495 F1 hybrid lines) grown in different environments. In total, 3589 significant loci were detected for the three component traits, whereas only 3 significant loci were detected for yield. This indicates that rice yield is mainly controlled by minor-effect loci that are hard to identify, so selecting quantitative trait loci (QTLs)/genes that affect the component traits is recommended for further enhancing yield. A Mendelian randomization (MR) design was adopted to estimate the causal relationship between rice yield and its component traits. Both GPP (Beta=0.086, 95% CI: 0.030~0.141, P =0.003) and TP (Beta=1.865, 95% CI: 1.035~2.694, P <0.0001) had a positive causal relationship with yield, but no significant relationship between KGW and yield (Beta=0.456, 95% CI: -0.119~1.031, P =0.120) was observed. In addition, TP (Beta=1.865) had a greater effect on yield than GPP (Beta=0.086). Four significant loci for TP and GPP with indirect effects on yield were identified, and pyramiding the superior alleles of these four loci improved yield. By studying the nature and strength of the relationship between yield and its components, this work provides genetic insights for further improving rice yield potential. Our results suggest that rice yield can be improved by pyramiding the superior alleles of genes regulating GPP and TP, and that a combination of direct and indirect effects may better contribute to the yield potential of rice in breeding practice. These findings will provide theoretical guidelines for the rational design of rice by MAS breeding.


Background
Rice is a staple food crop for about half of the world's population. Improving rice productivity has been the main goal of rice breeding research because of population growth and the loss of arable land. However, rice yield has a complex genetic architecture, which is determined by various physiological processes that change over the growing period. These processes often correspond to yield component traits that are genetically less complex than yield itself (Kadam et al. 2018). Selecting for the component traits of yield has therefore been proposed as a complementary route for further improving rice production, and it has been emphasized by national and international rice breeding programs (Huang et al. 2009). Studying the causal relationship between rice yield and its component traits, and clarifying the effect of each component trait on yield, will provide new clues for enhancing rice yield potential.
Rice yield is a very complex agronomic trait mainly determined by three component traits: the number of grains per panicle (GPP), kilo-grain weight (KGW) and tillers per plant (TP). These are typical quantitative traits that are affected by multiple genes and the environment and have a low heritability (Xing et al. 2010). With the development of high-throughput technology, a large number of genes/quantitative trait loci (QTLs) for the three component traits have been identified using QTL mapping and genome-wide association study (GWAS) methods (Le et al. 2019; Wang et al. 2019). By the end of 2019, 209, 223 and 239 genes/QTLs had been identified for GPP (TO: 0000445), KGW (TO: 0000382) and TP (TO: 0000152), respectively (http://www.gramene.org/), and they are densely distributed across the 12 chromosomes. Some of them have been applied in the rational design of super rice by marker-assisted selection (MAS) breeding, in which multiple defined genes with superior alleles are pyramided to increase rice yield (Qian et al. 2016). Liu et al. (2012) introduced the DEP1 and Gn1 genes into the restorer line 93-11, and the yield of the resulting DEP1/Gn1-9311 line was significantly improved owing to improved resource allocation. Wang et al. (2020) compared transgenic lines carrying GNP1 or NAL1 with transgenic lines carrying both genes. They found that the latter had a significantly higher yield, which indicated that combining the two genes may enhance the source-sink relationship. In these studies, only a small number of genes were combined for super rice breeding; if more genes are selected for pyramiding, the trade-offs between different traits need to be considered carefully (Zeng et al. 2017). Therefore, understanding the nature and strength of the relationship between yield and its components will be helpful for efficient gene selection in MAS breeding (Li et al. 2019).
The relationship between rice yield and its components has been investigated by various researchers with different materials and methods, but the results were inconsistent. Huang et al. (2015) found that the superior alleles for grain number generally had a positive effect on yield, while the superior alleles for grain weight generally had a negative effect on yield. Path analyses of rice yield and its component traits performed by Oladosu et al. (2018) revealed that all three component traits had a positive effect on yield. Xu et al. (2015) conducted a correlation analysis between yield and its components in 300 rice germplasms. Their results indicated that yield was significantly correlated with GPP and KGW, but not with TP. One possible explanation for the conflicting results is bias caused by small sample sizes and the lack of proper control for potential unmeasured confounders. Because it allows the results of different studies to be synthesized into a common summary effect, meta-analysis is recognized as an appropriate method to achieve adequate sample sizes and optimal power (Panagiotou et al. 2013). Zhao et al. (2019) reported a meta-analysis of GWAS from three GWAS panels that discovered 305 significant loci associated with tomato flavor and demonstrated the benefits of meta-analysis.
Recently, the Mendelian randomization (MR) approach has become a popular technique in epidemiology for assessing causal relationships between diseases and environmental risk factors within a meta-analysis framework (Bowden et al. 2019). For example, the MR method was used to investigate the role of ATP citrate lyase inhibitors in cardiovascular disease (Ference et al. 2019), where the observed association could be well protected from potential unmeasured confounders. In the MR approach, genetic variants are used as instrumental variables to avoid confounding, because genetic variants are randomly allocated at meiosis (Mokry et al. 2015). Thus, combining meta-analysis and MR for complex traits will help researchers draw more reliable conclusions about their genetic relationships.
GWAS has proved to be an effective strategy for explaining the genetic basis of complex traits, with the advantage of improving the efficiency of detecting natural variation (Li et al. 2018). Most GWAS have focused on dissecting the genetic basis of single yield traits (Ta et al. 2018; Jiang et al. 2019), but few studies have clarified the genetic basis of the relationship between yield and its component traits in rice. Here we carried out meta-analyses of GWAS results from two populations (575 + 1495 F1 hybrid lines) grown in different environments, and adopted an MR design to further estimate the causal relationship between rice yield and its component traits. Our aims were to detect significant single-nucleotide polymorphisms (SNPs) associated with yield or its component traits, to analyze the genetic basis of the relationship between them, and to investigate how selection for the component traits of yield could be used in breeding practice to improve rice production. The study will provide theoretical guidelines for enhancing rice yield potential.

Materials and phenotyping
Two populations of hybrid rice varieties were used in our study. The first population consists of 575 F1 hybrid rice lines, which were produced by crossing 115 varieties (restorer lines of 29 three-line wild-deficient hybrid rice and 86 accessions of micro-core germplasm) as male parents with 5 sterile lines (4 two-line sterile lines and 1 three-line sterile line) as female parents. The 575 hybrid lines were grown at both Huazhong Agricultural University and Wuhan University in 2012. The second population consists of 1495 F1 hybrid rice lines, of which 1,170 were bred with the three-line system and 325 were generated with the two-line system. The 1495 hybrid lines were grown in Hangzhou and in Sanya. This population was obtained from the National Center for Gene Research of the Chinese Academy of Sciences (Huang et al. 2015). Four agronomic traits, GPP, KGW, TP and yield (YD), were recorded in both populations. Each trait was measured on at least three samples of each accession, and the average measurement was taken as the phenotypic value for GWAS analysis.

Resequencing And Genotyping
The population of 575 hybrid rice lines was sequenced on the Illumina HiSeq2500 platform at an average genome coverage of 11×. After quality control, we obtained 1,894,012 high-quality SNPs with minor allele frequency (MAF) > 5% and missing rate < 20% across the 575 accessions. The high-density SNP maps of the 1,495 hybrid rice varieties are publicly available (http://www.ncgr.ac.cn/RiceHap4) (Huang et al. 2015). The genomes of the 1,495 hybrid lines were sequenced on the Illumina HiSeq2000 at twofold genome coverage, and 1,531,463 SNPs passed quality control (MAF threshold of 1%).
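The MAF and missing-rate filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the 0/1/2 genotype coding, the use of `None` for missing calls, and the function names `maf_and_missing_rate` and `passes_qc` are all assumptions made for the example. The default thresholds are the ones reported for the 575-line population.

```python
from typing import Optional, Sequence, Tuple


def maf_and_missing_rate(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing rate) for one SNP.

    Assumed coding: 0/1/2 = copies of the alternate allele, None = missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if not called:
        # No usable calls: report MAF 0 so the SNP fails any MAF threshold.
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq), missing_rate


def passes_qc(genotypes: Sequence[Optional[int]],
              min_maf: float = 0.05,
              max_missing: float = 0.20) -> bool:
    """Keep a SNP only if MAF > min_maf and missing rate < max_missing."""
    maf, missing_rate = maf_and_missing_rate(genotypes)
    return maf > min_maf and missing_rate < max_missing
```

In practice this kind of filter is normally applied with dedicated genotype tools on VCF or PLINK files; the sketch only makes the two criteria explicit.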

Genotype Imputation And GWAS Analysis
The 3000 Rice Genomes Project (https://snp-seek.irri.org/download.zul) was used as the reference panel for SNP imputation of the genotype data of the 575 and 1495 hybrid rice lines with Beagle software (version 5.0) (Browning et al. 2018), and all imputed SNPs with MAF < 1% were filtered out. Separate GWAS were then conducted using the mixed-linear-model association (MLMA) approach in GCTA software (Yang et al. 2014), and the summary statistics were collected to run a meta-GWAS. Finally, a total of 1,838,525 SNPs from the four GWAS datasets were used for the meta-analysis.

Meta-GWAS Analyses
We used the fixed-effect model in METAL as the primary approach for the meta-analyses (Willer et al. 2010), and Cochran's Q-test was used to test for heterogeneity (Cochran et al. 1954). For SNPs showing heterogeneity (I² ≥ 50%), the random-effect model in METASOFT was adopted (Han et al. 2011). The genome-wide significance threshold for the meta-GWAS was set at P < 1E-06 (-log10 P = 6).
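For one SNP, a fixed-effect meta-analysis with a heterogeneity check can be sketched as below. The sketch assumes inverse-variance weighting of per-dataset effect sizes and standard errors, which is one of the schemes a fixed-effect analysis may use; the text does not state which weighting was applied, and the function name `fixed_effect_meta` is hypothetical. It is not a reimplementation of METAL or METASOFT.

```python
import math
from typing import Dict, Sequence


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Inverse-variance fixed-effect meta-analysis of one SNP across datasets.

    Returns the pooled effect, its standard error, a two-sided normal P-value,
    Cochran's Q and the I^2 heterogeneity statistic (in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 = max(0, (Q - df) / Q), expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return {"beta": beta, "se": se, "p": p_value, "q": q, "i2": i2}
```

Under the rule stated above, a SNP whose `i2` is at least 50 would be passed to a random-effect model, and a SNP with `p` below 1E-06 would be declared significant.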

MR Analysis
For the causal effect of each component trait on rice yield to be consistently estimated, the genetic variants were selected according to the three assumptions of MR analysis (Burgess et al.). The inverse-variance weighting (IVW) method, which summarizes the effects of multiple independent SNPs, was used in the MR analysis to assess the effect of the component traits on yield (Burgess et al. 2013). In sensitivity analyses, the weighted median method (Bowden et al. 2016) and the MR-Egger method (Burgess et al. 2017) were used, because they are more robust when pleiotropic or invalid instruments are involved.
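A minimal sketch of an IVW estimate from summary statistics is given below. It assumes the common fixed-effect formulation in which each SNP's ratio estimate (outcome effect divided by exposure effect) is weighted by the inverse of its approximate variance, and it uses a normal approximation for the confidence interval and P-value. The function name `ivw_estimate` and the input layout are illustrative assumptions, not the software used in the study.

```python
import math
from typing import Dict, Sequence


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Dict[str, float]:
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the component trait (e.g. TP).
    beta_outcome:  SNP effects on yield.
    se_outcome:    standard errors of the SNP effects on yield.
    """
    # Weight of each ratio estimate: bx^2 / se_y^2 (first-order approximation).
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_w = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se = math.sqrt(1.0 / total_w)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return {
        "beta": beta,
        "se": se,
        "ci_low": beta - 1.96 * se,
        "ci_high": beta + 1.96 * se,
        "p": p_value,
    }
```

With only a few instruments per trait, as in this study, the estimate is sensitive to any single invalid instrument, which is why the sensitivity analyses matter.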

Analysis Of Superior Alleles Of Significantly Associated Loci
We calculated the average phenotypic measurement for each genotype of every significant SNP and used the least significant difference method for multiple comparisons. Following the method of Huang et al. (2015), the genotype with the highest level of yield or of the component trait was defined as the superior allele (for example, the allele corresponding to the largest number of grains per panicle was defined as the superior allele). We then counted the number of superior alleles in each hybrid rice line and recorded the corresponding average yield. Superior-allele counts represented by fewer than 3 hybrid lines were omitted.
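The counting procedure can be illustrated with the short sketch below. It is a simplified illustration under stated assumptions: genotypes are given as strings per line and SNP, the function names `superior_genotype` and `mean_yield_by_superior_count` are hypothetical, and the least significant difference test used for multiple comparison is not reproduced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence


def superior_genotype(genotypes: Sequence[str], trait_values: Sequence[float]) -> str:
    """Return the genotype of one SNP with the highest mean trait value."""
    by_genotype = defaultdict(list)
    for genotype, value in zip(genotypes, trait_values):
        by_genotype[genotype].append(value)
    return max(by_genotype, key=lambda g: mean(by_genotype[g]))


def mean_yield_by_superior_count(genotype_matrix: Sequence[Sequence[str]],
                                 superior: Sequence[str],
                                 yields: Sequence[float],
                                 min_lines: int = 3) -> Dict[int, float]:
    """Average yield of lines grouped by their number of superior alleles.

    genotype_matrix[i][j] is the genotype of line i at SNP j, and superior[j]
    is the superior genotype at SNP j. Groups with fewer than min_lines lines
    are omitted.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for row, line_yield in zip(genotype_matrix, yields):
        count = sum(1 for g, s in zip(row, superior) if g == s)
        groups[count].append(line_yield)
    return {k: mean(v) for k, v in sorted(groups.items()) if len(v) >= min_lines}
```

For a locus detected for a component trait, `trait_values` would be that trait's measurements, while the grouping step always averages yield.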

Meta-GWAS analyses
Meta-analyses of GWAS were performed on the GWAS results of four datasets (two locations for each population; Additional file 1: Figures S1-S4). Manhattan plots and quantile-quantile plots of the meta-GWAS are shown in Fig. 1. A total of 3592 significant loci were identified (Additional file 2: Table S1), of which 2450, 1116, 23 and 3 were detected for GPP, KGW, TP and YD, respectively; they were distributed on all rice chromosomes except chromosome 10. Based on the information in RAP-DB (http://rapdb.dna.affrc.go.jp/), candidate genes were searched within a 200-kb genomic region around the associated SNPs (Additional file 1: Table S2). We found 7, 7 and 3 cloned genes associated with GPP, KGW and TP, respectively. Three candidate genes were associated with more than one trait, among which OsBZR1 (Zhu et al. 2015) and OsSPL14 (Jiao et al. 2010) have been reported previously and Os02g0106966 was newly discovered. Both OsBZR1 and OsSPL14 were detected for KGW and GPP, and Os02g0106966 was detected for KGW and TP. In this study, only 3 significant loci were detected for YD, whereas 3589 significant loci were detected for the component traits. This may be because rice yield has a low heritability and is mainly affected by many minor-effect genes; the low heritability of rice yield was also shown in our previous study (Xu et al. 2018). These results suggest that selecting for the component traits of yield is a recommended complementary route for improving rice production.

Causal Relationship Between Yield And Its Components
MR analyses were performed to estimate the causal relationship between rice yield and its component traits. The results of the MR analyses are shown in Table 1. The IVW results showed that both GPP and TP have a positive causal relationship with YD (P < 0.05), whereas no significant causal relationship was found between KGW and YD. In sensitivity analyses, the weighted median method confirmed the IVW results. The intercept term from the MR-Egger regression indicated no evidence of directional pleiotropy (P > 0.05). The MR analyses provide some evidence that rice yield can probably be enhanced by pyramiding the superior alleles of genes controlling GPP or TP, and that the superior alleles of genes controlling TP should be given priority in pyramiding, because TP has a greater effect on yield (Beta = 1.865) than KGW (Beta = 0.456) and GPP (Beta = 0.086).
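The MR-Egger intercept mentioned above comes from a weighted regression of SNP-outcome effects on SNP-exposure effects that, unlike IVW, is not forced through the origin. The sketch below computes only the point estimates of the intercept and slope using closed-form weighted least squares; the standard errors and the P-value of the intercept test are omitted, and the function name `mr_egger` is an assumption for the example.

```python
from typing import Sequence, Tuple


def mr_egger(beta_exposure: Sequence[float],
             beta_outcome: Sequence[float],
             se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the MR-Egger weighted regression.

    SNPs are first oriented so that every exposure effect is positive, then
    outcome effects are regressed on exposure effects with weights 1 / se_y^2.
    """
    xs, ys = [], []
    for bx, by in zip(beta_exposure, beta_outcome):
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
    ws = [1.0 / se ** 2 for se in se_outcome]

    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))

    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope
```

An intercept close to zero is read as no evidence of directional pleiotropy, and the slope is then the pleiotropy-adjusted causal estimate. With very few instruments the regression has little power, so the intercept should be interpreted cautiously.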

Causal Relationship Between GPP And YD
A total of 2450 SNPs reached genome-wide significance for GPP (P < 1E-06) in the meta-analyses of GWAS. As required for MR analysis, six of these SNPs were selected as instrumental variables to estimate the causal relationship between GPP and YD (Table 2). The six SNPs were not associated with KGW or TP (P > 0.05), and there was no evidence of LD between them (all pairwise r² < 0.01). In the MR analysis, a positive causal relationship between GPP and YD was observed with the IVW method (Table 1; Beta = 0.086, 95% CI: 0.030~0.141, P = 0.003).

Causal Relationship Between KGW And YD
A total of 1116 SNPs reached genome-wide significance for KGW (P < 1E-06) in the meta-analyses of GWAS. As required for MR analysis, eleven of these SNPs were selected as instrumental variables to estimate the causal relationship between KGW and YD (Table 3). These SNPs were not associated with GPP or TP (P > 0.05), and there was no evidence of LD between them (all pairwise r² < 0.01). In the MR analysis, no significant association between KGW and YD was observed with the IVW method (Beta = 0.456, 95% CI: -0.119~1.031, P = 0.120).

Causal Relationship Between TP And YD
A total of 23 SNPs reached genome-wide significance for TP (P < 1E-06) in the meta-analyses of GWAS. As required for MR analysis, three of these SNPs were selected as instrumental variables to estimate the causal relationship between TP and YD (Table 4). These SNPs were not associated with KGW or GPP (P > 0.05), and there was no evidence of LD between them (all pairwise r² < 0.01). In the MR analysis, a positive causal relationship between TP and YD was observed with the IVW method (Table 1, Fig. 2c): a genetically predicted 1-SD increase in TP was associated with a 1.865-SD higher YD (Beta = 1.865, 95% CI: 1.035 ~ 2.694, P < 0.0001). In sensitivity analyses, Cochran's Q-test

Loci for component traits had an indirect effect on yield
We identified four significant loci that had an indirect effect on yield by MR analyses (Fig. 2, Additional file 1: Table S3). Among them, the SNP chr05_7226049 for GPP (Fig. 2a) had an indirect effect on yield and was located near the cloned gene OsPYL11. Kim et al. (2014) reported that, compared with control plants, transgenic plants overexpressing OsPYL11 showed no significant difference in tiller number, but their yield was severely reduced. Our study indicates that this severe yield reduction may have been caused by a decrease in the number of grains. The SNP chr06_1578700 for TP (Fig. 2c) was close to D62, a gene regulating tillering. Li et al. (2010) found that the D62 mutant had fewer tillers than the wild type. The SNPs chr02_21604477 and chr11_26492375 for TP also had indirect effects on yield (Fig. 2c, Table S3) and were first detected in our research. These findings provide new information for further improving rice yield potential.

Pyramiding Superior Alleles Of Significant Loci
The average yield performance of F1 lines carrying different numbers of superior alleles at significant loci with direct effects, indirect effects, and direct plus indirect effects is shown in Table 5 and Fig. 3. Three loci with a direct effect on yield were detected in the meta-GWAS on YD (Additional file 1: Table S3). For these loci, the average yield of lines without superior alleles was 41.29 g, and that of lines with one superior allele was 44.26 g (Fig. 3a, Table 5). The superior alleles of the four loci with an indirect effect on yield were also pyramided. The average yields of F1 lines with 0, 1, 2, 3 and 4 superior alleles were 42.75 g, 42.52 g, 43.32 g, 45.34 g and 52.60 g, respectively. In general, pyramiding the superior alleles of these loci enhanced yield (Fig. 3b, Table 5). A similar pattern was found when pyramiding the direct plus indirect loci, and the yield was probably further increased (Fig. 3c, Table 5). Previous research reported that phenotypic performance was improved by pyramiding the superior alleles of loci associated with agronomic traits (Huang et al. 2015); our results suggest that yield can also be enhanced by pyramiding the superior alleles of loci that have an indirect effect on yield. A combination of direct and indirect effects may better contribute to the yield potential of rice.

Discussion
In this study, a total of 3592 significant SNPs were detected in the meta-GWAS on yield and its component traits, which provides more information for breeding rice agronomic traits. It is worth noting that only 3 loci were detected in the meta-GWAS on yield, which may result from the low heritability of rice yield making its loci hard to detect. For a low-heritability trait (such as yield), highly correlated auxiliary traits (such as GPP) can help improve selection because they reflect a shared biological basis (Wang et al. 2017). An MR model was then used to explore the causal relationship between yield and its component traits. The results showed that both GPP and TP have a positive causal relationship with YD, which is consistent with the improvement of rice production achieved by increasing GPP and TP in a previous study by Yan et al. (2011). However, no causal relationship between KGW and YD was observed in this study. Some genes that regulate KGW have been reported to have the potential to enhance rice yield, such as GW2 (Song et al. 2007) and GW5 (Liu et al. 2017), and we also detected a locus for KGW that had an indirect positive effect on yield, but the summary effect of multiple loci showed no significant causal relationship between KGW and YD (Fig. 2b). This is probably because KGW, as an agronomic trait, is closely related to appearance quality traits (Zhu et al. 2019), while quality traits are often negatively correlated with yield (Zeng et al. 2017). The MR results provide a rationale for improving yield by ideotype breeding based on selection for GPP and TP. Finally, four loci with indirect effects on yield were identified by MR analysis, providing new information for enhancing the yield potential of rice. A previous study indicated that pyramiding the superior alleles of significantly associated loci increased yield (Huang et al. 2015). Our results suggest that yield can also be improved by pyramiding the superior alleles of loci with indirect effects on yield, and that a combination of direct and indirect effects may better contribute to the yield potential of rice.
The strengths of the study are as follows: (i) the meta-analysis of GWAS data from multiple populations and environments estimated a summary effect with greater statistical power (Panagiotou et al. 2013); (ii) the MR approach is less prone to confounding because genetic variants were used as instrumental variables (Mokry et al. 2015); (iii) the MR method was used to analyze the causal relationship between quantitative traits by weighting the effects of multiple independent SNPs into a summary effect, which suits quantitative traits because most of them are affected by multiple genes or gene interactions, while an individual SNP explains only a small fraction of the variation. However, the MR analysis may be biased by invalid instrumental variables. Because the instrumental variables were derived from the meta-analysis of GWAS in this study, it is difficult to completely exclude type I error and the potential influence of pleiotropy. We therefore used the weighted median method and the MR-Egger method for sensitivity analysis. Compared with the IVW method, the weighted median method has better finite-sample type I error rates, and its estimator is consistent even if up to 50% of the information comes from invalid instrumental variables (Bowden et al. 2016). The results of MR-Egger and the heterogeneity test indicated that, to some extent, the genetic variants had no pleiotropic effects on yield (Burgess et al. 2017). These results strengthened our confidence in the validity of the assumptions.
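The robustness property of the weighted median can be seen in a small sketch. It follows the interpolation form of the weighted median estimator commonly used in MR, applied to per-SNP ratio estimates with user-supplied weights (for example bx² / se_y²). The function name `weighted_median` is an assumption for the example, and the bootstrap standard error normally reported with this estimator is omitted.

```python
from typing import Sequence


def weighted_median(ratios: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted median of per-SNP ratio estimates.

    Ratio estimates are sorted, each is placed at the midpoint of its share of
    the cumulative standardized weight, and the value at the 50% point is
    obtained by linear interpolation between the two neighbouring estimates.
    """
    if len(ratios) < 2:
        raise ValueError("at least two instruments are required")
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    sorted_ratios = [ratios[i] for i in order]
    sorted_weights = [weights[i] for i in order]

    total = sum(sorted_weights)
    positions = []
    cumulative = 0.0
    for w in sorted_weights:
        cumulative += w
        positions.append((cumulative - 0.5 * w) / total)

    # Last estimate whose position lies strictly below the 50% point.
    below = max(i for i, s in enumerate(positions) if s < 0.5)
    fraction = (0.5 - positions[below]) / (positions[below + 1] - positions[below])
    return sorted_ratios[below] + (sorted_ratios[below + 1] - sorted_ratios[below]) * fraction
```

For example, with equally weighted ratio estimates of 2.0, 2.0, 2.0 and 10.0, the weighted median stays at 2.0, whereas an inverse-variance weighted mean would be pulled towards the outlying instrument.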
In conclusion, we analyzed the genetic basis of the relationship between yield and its component traits by GWAS and MR methods, providing genetic insights for further improving rice yield potential. Our results suggest that rice production can be improved by pyramiding the superior alleles of genes regulating GPP and TP, and that a combination of direct and indirect effects may better contribute to the yield potential of rice in breeding practice. These findings will provide theoretical guidelines for the rational design of rice by MAS breeding.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
The datasets supporting the conclusions of this article are provided within the article and its additional files.