Multiple-trait model through Bayesian inference applied to flood-irrigated rice (Oryza sativa L)

The objectives of this study were to use a bayesian multi-trait model, estimate genetic parameters, and select flood-irrigated rice genotypes with better genetic potentials in different evaluation environments. For this, twenty-five rice genotypes and six traits belonging to the flood-irrigated rice improvement program were evaluated. The experimental design used in all experiments was a randomized block design with three replications. The Monte Carlo Markov Chain algorithm estimated genetic parameters and genetic values. The grain thickness trait was considered highly heritable, with a credibility interval ranging from: h2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${h}^{2}$$\end{document}: 0.9480; 0.9440; 0.8610, in environments 1, 2, and 3, respectively. The grain yields showed a weak correlation estimate between grain thickness and 100-grain weight, in all environments, with a credibility interval ranging from (ρ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho$$\end{document}= 0.5477; 0.5762; 0.5618 and 0.5973; 0.5247; 0.5632, grain thickness and 100-grain weight, in environments 1, 2, and 3, respectively). The Bayesian multi-trait model proved to be an adequate strategy for the genetic improvement of flood-irrigated.

variation in the data (Malosetti et al. 2008). For that, Bayesian inference has become a useful statistical tool for dealing with complex models (Torres et al. 2018).
Bayesian inference has surpassed traditional analyses by providing different results from the classical approach, such as credibility intervals, genetic parameter estimates, and genetic values with greater precision (Peixoto et al. 2021;Silva Júnior et al. 2022). The Bayesian inference is a flexible methodology that allows the estimation of precise genetic values and variance components, even from small samples (Jarquín et al. 2016;Peixoto et al. 2021;Schoot et al. 2021;Silva Júnior et al. 2022).
Bayesian multi-trait models (MTM) have become a proper statistical method for genetic evaluations of plants and animals (Junqueira et al. 2016;Volpato et al. 2019). In addition, this model allows the estimation of variance components and breeding values for each trait (Peixoto et al. 2021;Silva Júnior et al. 2022), jointly modeling multiple traits compared to the analysis of each trait separately. The inference process adequately explains the correlation between the traits, which helps to increase prediction accuracy, statistical power, parameter estimation accuracy, and reduce the character selection bias (Henderson and Quaas 1976;Pollak et al. 1984;Schaeffer, 1984). The common use of multiple traits benefits from the genetic correlation between traits and significantly improves prediction accuracy compared to single-trait methods, specifically for low heritability traits that are genetically correlated with a high heritability trait (Jia and Jannink, 2012;Guo et al. 2014;Jiang et al. 2015;Montesinos-Lopez et al. 2016).
Some studies have shown the potential of the Bayesian approach for genetic evaluation in plant breeding, considering models with multiple traits or multiple environments (Cané-Retamales et al. 2011;Arriagada et al. 2012;Mora et al. 2014;Junqueira et al. 2016;Torres et al. 2018;Volpato et al. 2019;Silva Júnior et al. 2022). However, few studies combine multi-trait models under a Bayesian view for flood-irrigated rice cultivation. Therefore, the objectives of this study were to use a Bayesian multi-trait model, estimate genetic parameters, and select floodirrigated rice genotypes with better genetic potentials (desirable agronomic traits) in different evaluation environments.
Twenty-five rice (Oryza sativa L) genotypes belonging to the flood-irrigated rice breeding program of the state of Minas Gerais were evaluated, and five of these genotypes were used as experimental controls (Rubelita, Seleta, Ourominas, Predileta, and Rio Grande). These genotypes were evaluated in comparative trials after multiple generations of selection, and in addition, they are known for their high yield, uniform growth rate and plant growth, resistance to major diseases, and for their excellent grain quality. The traits evaluated were grain yields (GY, Kg ha −1 ), grain length (GL, mm), grain width (GW, mm) and grain thickness (GT, mm), grain length, and grain width and weight of 100 grains (GWH, g) in the agricultural year 2016/2017.
The design used in all experiments was a randomized complete block design with three replications. The experiments were conducted in floodplain soils with continuous flood irrigation. Management practices were carried out according to recommendations for flood-irrigated rice in the relevant regions (Soares et al. 2005).
The useful plot area consisted of 4 m central of three internal rows (4 m × 0.9 m, 3.60 m 2 total). The soil preparation was carried out by plowing and harrowing around 30 days before sowing and harrowing on the eve of the installation of the tests. For planting fertilization, a mixture of 100 kg ha −1 of ammonium sulfate, 300 kg ha −1 of simple superphosphate, and 100 kg ha −1 of potassium chloride was used, applied to the plot, and incorporated into the soil before planting. The fertilization in the top dressing was carried out approximately 60 days after the installation of the experiments, with 200 kg ha −1 of ammonium sulfate. The weeds were controlled with the use of herbicides and manual weeding. Sowing was carried out in the planting line with a density of 300 seeds m −2 . The irrigation started around 10-15 days after the Page 3 of 12 124 Vol.: (0123456789) implantation of the experiments, and the water was only removed close to the maturation of the material later. The harvest was carried out when the grains reached a humidity of 20-22%. Grain production data were obtained by weighing all grains harvested in the useful plot, after cleaning and uniform drying in the sun, until they reached a humidity of 13%.

Biometric analysis
The measured traits were analyzed using the univariate model and the multi-trait model through the Bayesian approach of Markov Chain Monte Carlo (MCMC).
The multi-trait model was given by: where y is the vector of phenotypic data, and the conditional distribution was given by: y| , g, G, R ~ N (X + Zg, R ⊗ I), G is the matrix of genotypic covariance, R is the matrix of residual covariance. I is an identity matrix, is vector of systematic effects (genotypes mean and replication effects), assumed as ~ N ( , Σ ⊗I). g is the vector of genotype effects, assumed as g|G, ~ N (0, G ⊗ I). is the vector of residuals, assumed as | R, ~ N (0, R ⊗ I). The uppercase bold letters X and Z refer to the incidence matrices for the effects and g, respectively.
We assume that G and R follow an inverted Wishart distribution WI (v, V), with hyperparameters v and V (Sorensen and Gianola 2002). Hyperparameters for all prior distributions have been selected to provide non-informative or flat prior distributions. For the systematic effect (β), a uniform distribution was assigned. In addition, the parameters β, g, G, R were estimated following the set posterior distribution: P(β, g, G, R |y) α P(y | β, g, G, R) × P(β, g, G, R).
For the model, the package was used "MCM-Cglmm" (Hadfield et al. 2010) of the R software (R Development Core Team 2020). A total of 10,000,000 samples were generated and assumed a burn-in period and thin range of 500,000 and 10 iterations, respectively, resulting in a final total of 950,000 samples. The convergence of the MCMC was verified by the criterion of Geweke et al. (1992), carried out in two R software packages: "boa" (Smith et al. 2007) and "CODA" (Convergence Diagnosis and Output Analysis) (Plummer et al. 2006).
The model was compared using the deviation information criterion (DIC) proposed by Spiegelhalter et al. (2002): where D is a point estimate of the deviance obtained by replacing the parameters with their posterior means estimates in the likelihood function and p D is the effective number of model parameters. Models with a lower DIC should be preferred over models with a higher DIC. The components of variance, broad-sense heritability, and genotypic correlation coefficients between traits and breeding values were calculated from the posterior distribution. The package "boa" (Smith et al. 2007) R software was used to calculate the intervals of higher posterior density (HPD) for all parameters. A posteriori estimates for broadsense heritability ( h 2 ) of the six traits for each iteration were calculated from the later samples of the variance components obtained by the multivariate model, using the expression: where ̂ 2(i) g ,̂ 2(i) r , and h ̂ 2(i) are the genetic, replication, and residual variations for each iteration, respectively.
For the multi-trait model, the genetic correlation coefficients between the pairs of traits in each environment were obtained, as suggested by Piepho et al. (2018), using the expression below for all models: where ̂ 2 gl represents the genetic variance of the evaluated trait and ̂ gl(1,2) represents the genetic covariance between pairs of traits.

Genetic selection based on selection index
The multi-trait index based on factor analysis and genotype-ideotype distance (FAI-BLUP) (Rocha (2) 124 Page 4 of 12 Vol:. (1234567890) et al. 2018) was used to identify superior genotypes to be selected in the flood-irrigated rice breeding program.
where P ij : probability of the ith genotype (i = 1, 2, …, n) to be similar to the jth ideotype (j = 1, 2, …, m);d ij : genotype-ideotype distance from ith genotype to jth ideotype-based on standardized mean distance. Selection gains (SG) were obtained directly from the FAI-BLUP result considering four different selection intensities: 12, 20, 40, and 60%, which referred to the selection of 3, 5, 10, and 15 genotypes, respectively, as follows: where X s is the overall mean of the estimated breeding values of the selected genotypes, and X 0 is the general population average.

Results
Geweke's convergence criterion indicates convergence for all dispersion parameters by generating 10,000,000 MCMC iterations, 500,000 samples for burn-in, and a sampling interval of 10, totaling 950,000 effective samples used for estimating variance components (Fig. 1). However, all chains [(co) variance components] reached convergence by this criterion. Similar posterior mean, median and modal estimates were obtained for variance components, suggesting normal-appearing density. According to the deviation information criteria (DIC), there was evidence that the full model for multi-trait (DIC = 348.84, 466.01 and 671.70, environment 1, 2, and 3, respectively) is the one that best fits the data, which reveals the significance of genotypic effects (DIC = 675.40, 675.52 and 898.84, environments 1, 2 and 3, respectively) (Table 1). Therefore, the DIC values were lower when using the full model (considering the effects of genotype x environment interaction), in which the difference between the genotype model was greater than 1.30 (Table 1). Hence according to Spiegelhalter et al. (2002), suffices to suggest that the use of the full model can lead to greater precision in the estimation of parameters (Table 1). Since the DIC values are higher, it is possible to indicate the superiority of the full model over the restricted models.
On the other hand, as this component of the model is important, the "best" genotypes measured in different environments may not be the same.
The posterior mean estimates for the variance components suggested density with chi-square, and normal distributions (Fig. 1). The GW to GWH traits showed a chi-square distribution (of which the Wishart distribution is a generalization) and only GY shows a normal distribution appearance. Table 2 corresponds to the result of the estimate of heritability in the broad sense and the credibility interval with 95% probability for the multitrait model. The estimates were different for mode, mean, median, and higher posterior density range (HPD). The highest estimates of heritability in the broad sense were for the GL, GW, and GWH traits, in all environments observed. On the other hand, the lowest estimates consisted of the grain yield and length-width ratio traits. The GW trait was considered highly heritable, with a credibility interval (95% probability) ranging from: h 2 : 0.7890-0.9480; 0.7640-0.9440; 0.5640-0.8610, in environments 1, 2 and 3, respectively ( Table 2).
The posterior inferences for mean and higher posterior density range (HPD) of the correlation between six characteristics of flood-irrigated rice, in three environments, considering the MTM model, are described in Table 3. GY showed a low correlation estimate between GT and GWH, in all environments, with a credibility interval (95% probability) ranging from ( = -0.5444-5477; -0.5806-0.5762; -0.5574-0.5618 e -0.5980-0.5973; -0.5287-0.5247; -0.5561-0.5632, GT and GWH, in environments 1, 2 and 3, respectively). About GY, the traits that presented the highest correlation estimates were GL and GLW.

Variance estimate
Estimates of genotypic variances, residuals, and genotype x environment interaction in the multitrait models were very different between environments ( Table 4). The GY trait had a higher estimate of genotypic variance compared to the other traits. On the other hand, all traits showed similarities in the estimation of genotypic variance in each environment. Smaller interaction variance estimates were observed for GW and GT traits.
The selection gains obtained by the FAI-BLUP index considering four different selection intensities: 12, 20, 40, and 60%, which referred to the selection of 3, 5, 10, and 15 genotypes, for six traces of flood-irrigated rice in three environments it represents in Table 5. The FAI-BLUP index indicated discrepant selection gains between environments for the same trait. Selection gains decreased with increasing selection intensity for the GL and GT traits in all environments (Table 5). The selection gain for the grain yield trait was approximately zero in environments 1 and 3. On the other hand, in environment 2, the greatest genetic gain was observed for the selection intensity of 10 genotypes. In this environment, the evaluated genotypes showed greater genetic variation compared to other environments. This environment showed greater genetic variance than other environments (Table 4). Regarding the GWH trait, it was the one with the greatest genetic gain in all environments (Table 5). Figure 2 shows the ranking of the 25 genotypes according to the FAI-BLUP index and their associated spatial probability, and the complete ranking was presented in (Table S). The results allowed for a unique and easy genotype selection process. We observed that genotypes 2 and 15 were similar in the three environments, they should be selected as high-performance multi-trait genotypes.

Discussion
The successful evaluation of a breeding program is related to the accuracy of the prediction of genotypic values, which is closely linked to the adoption of adequate models. The implementation of bayesian multi-trait models is straightforward and, currently, it has been widely used due to the possibility of considering a priori knowledge in the modeling.
One of the main limitations of using multi-trait models is the correlations between traits that are in practice undesirable for breeders. Guo et al. (2020), demonstrated the ability of multi-trait models for four traits when compared to the single trait in wheat in response to selection and prediction accuracy. These authors showed a genetic gain of 22% compared to the single trait model across the environment reflected by the response to selection. Silva Junior et al. (2022) argues that multi-trait multienvironment models show better consistency for estimates of genetic parameters in flood-irrigated rice.
Chain convergence ensures that the most likely estimate for each co(variance) component is reached. The significance of genetic effects indicates genetic variability among the twenty-five flood-irrigated rice genotypes, which allows for genetic selection. The deviation information criterion is widely applied as a criterion to assess the best fit of models in Bayesian inference. HPD ranges also indicate the significance of genetic effects (genotypes). An advantage of Bayesian inference over frequentist inference is the possibility to obtain HPD intervals (Peixoto et al. 2021).
The HPD intervals are more accurate when compared to frequentist inference confidence intervals, which increases the reliability of variance components and genetic parameters estimated by Bayesian inference. Silva Junior et al. (2022) in flood-irrigated rice reports this range for genetic parameter estimates using the multi-trait, multi-environment Bayesian model to estimate genetic parameters. In maize, estimates of genetic parameters for nitrogen uptake and use efficiency under contrasting soil nitrogen levels were demonstrated using multi-trait models (Torres et al. 2018). In soybean, estimates of genetic parameters for genetic selection of segregating progenies have been reported using multi-trait models (Volpato et al. 2019). The difference between mean, mode, and median of the broad-sense heritability estimates and the correlations between traits (Tables 2 and 3) reflect some lack of symmetry in the posterior distribution estimates. The lack of symmetry between mean, mode, and median heritability estimates in posterior distribution estimates was reported by Moura et al. (2014) and Torres et al. (2018). The low broad-sense heritability observed in the traits does not depend on the number of samples evaluated, since the Bayesian structure used is essentially recommended for situations involving small samples. On the other hand, quantitative characters are traits of agronomic interest, determined by several genes, showing low expression, and significantly influenced by the environment (Falconer 1996), reflected in the grain yield trait. Magalhães Junior et al. (2020), state that in rice breeding programs, productivity is identified as the main objective, however, the grain quality attributes, for example, long and fine grains, can directly reflect the market value and acceptance of the product by the consumer, making rice dependent on increased productivity and grain quality.
Low genetic gain for the grain yield trait using the FAI-BLUP index in environments 1 and 3 (Table 5). One possible justification is that genotypes belong to advanced comparative experiments after going through several generations of selection. The GWH trait showed the greatest genetic gain in all environments (Table 5). This high gain proves recent efforts by breeders in the search for advances in grain quality. Grain weight is crucial in determining rice yield; Table 2 Posterior inferences for mode, mean, median, and higher posterior density range (HPD) of heritability in the broad sense ( h 2 ), in three environments (E1, E2, and E3), considering the multi-trait model GY: grain yields (Kg ha −1 ); GL: grain length (mm); GW: grain width (mm); GT: grain thickness (mm); GLW: grain length and width ratio; GWH: 100 grain weight (g); E: environments E1, E2, and E3, respectively. therefore, it is a trait that breeders have been directed toward these traits. The grain size, in addition to being important for yield, is also an important suggestion of intrinsic quality (Custodio et al. 2019). The GWH trait is determined by grain size and fill rate, which is characterized by grain length, width, and height (Huang et al. 2013;Xie et al. 2015). Grain weight is related to genetic factors, and grain filling rate is affected by environmental conditions (Li et al. 2019). This trait negatively correlates with GLW (Song et al. 2015;Li et al. 2019). However, grain length and width are important factors influencing Table 3 Posterior inferences for mean and higher posterior density range (HPD) of the correlation between six traits, in three environments, considering the multi-trait model GY: grain yields (Kg ha −1 ); GL: grain length (mm); GW: grain width (mm); GT: grain thickness (mm); GLW: grain length and width ratio; GWH: 100 grain weight (g); E: environments E1, E2, and E3, respectively  Table 4 Genetic parameters of six traits of flood-irrigated rice, in three environments, using multi-trait models GY: grain yields (Kg ha −1 ); GL: grain length (mm); GW: grain width (mm); GT: grain thickness (mm); GLW: grain length and width ratio; GWH: 100 grain weight (g); E: environments 1 , 2 , and 3 , respectively. 2 g , 2 r , and 2 are the genetic, replication, and residual variations for each iteration, respectively grain yield in rice (Zhang et al. 2012;Si et al. 2016). Compared to other traits, GWH was consistent in different environments and may help in the genetic gain of grain yield (Yang and Zhang, 2010). The GL, GW, and GT traits show moderate heritability (Table 2) and are difficult to estimate due to grain size. Therefore, the use of multi-trait models helps the breeder in genetic progress for the selection of various traits, environments, and genotypes. Guo et al. (2020) argue that joint prediction of multiple traits benefits from the genetic correlation between traits and indirect selection of a target trait with relatively low heritability that is genetically correlated with other traits of high heritability. The joint multi-trait model obtained greater predictive accuracy than the single trait methods, especially for a trait with low heritability (Guo et al. 2020).
This result is in agreement with previous studies that reported that multi-trait models could be implemented to increase selection response for low heritability traits correlated with high heritability traits (Jia and Jannink 2012;Schulthess and Tal 2016;Rutkoski et al. 2016;Montesinos-López et al. 2018;Ward et al. 2019;Guo et al. 2020). Jia and Jannink (2012), also indicated that a multi-trait model is more effective when the genetic correlation between these traits is moderate. Traits with lower heritability, such as grain yield, showed more benefits compared to high heritability traits such as GLW using the MTM model. Guo et al. (2020) reported that the traits with lower heritability performed better than traits with high heritability through the MTM model, as it contemplates the two-way interaction (Traits x Genotypes), and provides a better correlation estimate between dashes. Table 5 Selection gains obtained by the FAI-BLUP index considering four different selection intensities: 12%, 20%, 40%, and 60%, which referred to the selection of 3, 5, 10, and 15 genotypes, for six traits of flood irrigated rice in three environments GY: grain yields (Kg ha −1 ); GL: grain length (mm); GW: grain width (mm); GT: grain thickness (mm); GLW: grain length and width ratio; GWH: 100 grain weight (g); E: environments E1, E2, and E3, respectively  It has been reported in the literature that multi-trait analyzes improve parameter estimates (Schulthess et al. 2017;Montesinos-López et al. 2018). These authors also showed that the performance of multi-trait analysis depends considerably on whether only a few traits are missing in just some individuals or all individuals. Precise estimates of genetic parameters bring new perspectives on the application of bayesian methods to solve modeling problems in the genetic improvement of flood-irrigated rice.
One of the great contributions of biometrics is the evaluation of the indirect response by selecting a certain trait. However, the problem related to the indirect response is when the traits present unfavorable correlations causing undesirable changes in others. When the selection is for the grain yield trait, other components of these traits are indirectly selected, such as 100-grain weight, grain length, width and thickness, and the grain length and grain width ratio, which is associated with grain yield (Li et al. 2019).
The results of the present study can potentially be applied in plant breeding to achieve more selection cycles per unit of time for multiple traits, to accurately assess genotype performance due to the low number of test environments or due to lack of replication, and to predict the performance of genotypes for different environments that present low heritability.

Conclusion
The Bayesian multi-trait model proved to be an adequate strategy for the genetic improvement of flooded rice. Furthermore, the bayesian multi-trait model has the potential for genetic evaluation of other crops.
The genotypes 2 and 15 were similar in the three environments, they should be selected as high-performance multi-trait genotypes.