A Multivariate Heterogeneous Variance Components Model for Multi-Environment Studies with Locational Genetic Effects

In this paper, a multivariate heterogeneous variance components model is developed that allows location-specific variance components to be determined in the analysis of multiple related traits. In addition to spatial heterogeneity, genetic similarities are taken into account by assigning genetic variance components. The performance of the developed model is evaluated through an extensive simulation study, and the models are compared through their heritability estimates. The simulation study shows that the developed method controls well for locational heterogeneity and that, under the developed model, the heritability estimates are close to the desired proportions. A real plant breeding data set is used for illustration.


Introduction
In recent decades, there has been growing interest in multivariate analysis owing to the availability of extensive, large-scale data. Especially in interdisciplinary fields such as statistical genetics, a multivariate approach allows the analysis of multiple traits affected by the interplay of genes and environmental influences. Moreover, these traits are usually related to each other in several ways, so the correlation between them should also be taken into account, which a multivariate approach makes possible. The linear mixed model (LMM) has been widely used to analyse the relationship between a trait and genetic random factors, and the multivariate linear mixed model (mvLMM) extends the LMM to model multiple traits simultaneously.
In the LMM approach, particular sources of variation are assessed through random-effect variables and the variance-covariance parameters of these variables, referred to as variance components. Because they help explain the total variation, estimating variance components has become an essential part of the modelling process in most applied fields. In animal and plant breeding studies in particular, familial and environmental relationships necessitate the use of LMMs because of the naturally clustered structure of the population.
Moreover, this clustered structure may lead to genetic diversity among members of a species owing to natural selection or geographic conditions, and genetic variance components help in understanding the genetic etiology of the characteristics of interest.
Although variance components models have a long history, advanced computational strategies are still being developed to estimate variance parameters accurately (Henderson 1953; Smith and Graser 1986; Searle et al. 1992; Lynch and Walsh 1998). In the mvLMM especially, the large dimensions of the matrices mean that most extensions aim to overcome computational difficulties. For instance, Gilmour et al. (1995) developed a computationally convenient algorithm based on the average information matrix for estimating variance parameters, and Lee and Van der Werf (2006) discuss the efficiency of using the variance-covariance matrix directly with a general complex pedigree. Recently, an extension of the mvLMM based on genomic information was developed by combining the direct average information algorithm with an eigendecomposition of the genomic relationship matrix (Lee and Van der Werf 2006).
In addition to computational solutions, multivariate variance components models need to be extended to account for several different sources of relatedness and heterogeneity. In genetic analysis especially, most complex traits are affected by the interplay of genetic and environmental factors, and this interplay may require additional variance components. For instance, in multi-environment population studies, because of the interplay of genes and environment, it is misleading to assume that the genetic background is common across different locations (Covarrubias-Pazaran 2016). In such study designs, specifying a separate genetic variance component for each environment is one way to model the heterogeneity arising from the interaction of genetic and environmental factors.
This paper focuses on estimating heterogeneous variance components of the mvLMM for the analysis of multiple related traits across multiple locations. In addition to spatial heterogeneity, genetic similarities are taken into account by assigning genetic variance components. Because of the genetic background and the location-clustered structure of the intended design, a complex multiple-phenotype simulation based on genotype simulations is conducted. A multivariate heterogeneous variance components model is proposed that takes location-specific variance components into account.
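In matrix notation, the mvLMM for $K$ traits measured on $n$ individuals is
$$y = X\beta + Zu + e, \qquad X = \operatorname{diag}(X_1, \dots, X_K), \quad Z = \operatorname{diag}(Z_1, \dots, Z_K), \tag{2}$$
where $X \in \mathbb{R}^{nK \times pK}$ is the design matrix for fixed effects and $Z \in \mathbb{R}^{nK \times qK}$ is the design matrix for random effects. The diagonal blocks of $X$ and $Z$ are identical, $X_1 = \cdots = X_K = X^* \in \mathbb{R}^{n \times p}$ and $Z_1 = \cdots = Z_K = Z^* \in \mathbb{R}^{n \times q}$, and the off-diagonal blocks are null matrices, so that $X = I_K \otimes X^*$ and $Z = I_K \otimes Z^*$. The random effects are $u = (u_1', u_2', \dots, u_K')'$ and the residuals are $e = (e_1', e_2', \dots, e_K')'$. The vector $u$ is assumed to be normally distributed with zero mean and variance-covariance matrix
$$\operatorname{Var}(u) = G = \begin{pmatrix} G_{1} & G_{12} & \cdots & G_{1K} \\ G_{21} & G_{2} & \cdots & G_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ G_{K1} & G_{K2} & \cdots & G_{K} \end{pmatrix}.$$
For multiple traits, the multivariate distribution can be assumed as
$$y \sim N(X\beta, V), \qquad V = ZGZ' + R,$$
where $V$ is the variance-covariance matrix of all observations and $R = \operatorname{Var}(e)$.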
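As a small numerical illustration, the block-diagonal designs with identical diagonal blocks can be built as Kronecker products $I_K \otimes X^*$ and $I_K \otimes Z^*$. This is a sketch, not part of the original method: the helper `stack_design`, the toy sizes and the covariate are hypothetical, and $Z^* = I_n$ assumes one record per individual per trait.

```python
import numpy as np

def stack_design(M_star: np.ndarray, K: int) -> np.ndarray:
    """Block-diagonal design for K traits sharing one per-trait design: I_K kron M*."""
    return np.kron(np.eye(K), M_star)

# Toy dimensions (hypothetical): n individuals, p fixed effects, K traits.
n, p, K = 5, 2, 3
rng = np.random.default_rng(0)
X_star = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
Z_star = np.eye(n)                                          # one record per individual

X = stack_design(X_star, K)  # (n*K) x (p*K); diagonal blocks equal X*, others are zero
Z = stack_design(Z_star, K)  # (n*K) x (n*K)
print(X.shape, Z.shape)      # (15, 6) (15, 15)
```

The Kronecker form makes the "identical diagonal blocks, null off-diagonal blocks" structure explicit and avoids assembling the blocks by hand.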
Because of the multivariate structure, the random-effects covariance and the residual covariance between traits $i$ and $j$ are denoted by $G_{ij}$ and $R_{ij} = \sigma_{e_{ij}} I_n$, respectively ($i, j = 1, 2, \dots, K$; $i \neq j$), where $\sigma_{e_{ij}}$ is the residual covariance between traits $i$ and $j$.
In the mixed-modelling approach, the random part of the model may have several components reflecting different grouping effects, such as a treatment effect, a common environmental effect or a serial effect for repeated measurements. In genetic studies especially, a random term is usually defined to account for genetic similarities. When genetic background effects are included, the random vector $u_i$ for trait $i$ can be split into two parts, genetic and non-genetic random effects. For simplicity, consider a single component for the random genetic effects over all individuals, the vector of total genetic values $g_i = (g_{i1}, g_{i2}, \dots, g_{in})'$. Then $u_i = (g_i', u_{i1}', \dots, u_{i(r-1)}')'$ and the variance-covariance matrix for trait $i$ can be written as
$$G_i = \begin{pmatrix} G_{g_i} & 0 & \cdots & 0 \\ 0 & G_{i1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & G_{i(r-1)} \end{pmatrix},$$
where $G_{g_i}$ is the genetic block, built from the genetic relatedness matrix $A$, and $G_{i1}, \dots, G_{i(r-1)}$ are the variance-covariance matrices of the $r - 1$ random-effect components other than the genetic background.
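Here, $G_{g_i} = \sigma^2_{g_i} A$, where $\sigma^2_{g_i}$ denotes the genetic variance component of trait $i$ and $A \in \mathbb{R}^{n \times n}$ is the genomic relationship or kinship matrix. Similarly, the genetic component of $G_{ij}$ is $G_{g_{ij}} = \sigma_{g_{ij}} A$, where $\sigma_{g_{ij}}$ is the genetic covariance between traits $i$ and $j$.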
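Under the simplifying assumptions that $Z^* = I_n$ and that the genetic background is the only random effect ($r = 1$), the covariance of the trait-major stacked observations reduces to $V = \Sigma_g \otimes A + \Sigma_e \otimes I_n$, where $\Sigma_g$ holds the genetic variances $\sigma^2_{g_i}$ and covariances $\sigma_{g_{ij}}$, and $\Sigma_e$ holds the residual variances and covariances $\sigma_{e_{ij}}$. The sketch below builds this matrix; the function name `mv_covariance` and all numbers are illustrative.

```python
import numpy as np

def mv_covariance(Sigma_g, Sigma_e, A):
    """V = Sigma_g kron A + Sigma_e kron I_n for trait-major stacked y.

    Assumes Z* = I_n and a genetic random effect only, so block (i, j) of V
    equals sigma_g_ij * A + sigma_e_ij * I_n.
    """
    n = A.shape[0]
    return np.kron(Sigma_g, A) + np.kron(Sigma_e, np.eye(n))

A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.25],
              [0.0, 0.25, 1.0]])                 # toy relationship matrix
Sigma_g = np.array([[0.3, 0.1], [0.1, 0.4]])     # genetic (co)variances
Sigma_e = np.array([[0.7, 0.2], [0.2, 0.6]])     # residual (co)variances
V = mv_covariance(Sigma_g, Sigma_e, A)           # 6 x 6 for K = 2 traits, n = 3
```

For example, `V[0, 3]` (trait 1 and trait 2 on the same individual) is $\sigma_{g_{12}} A_{11} + \sigma_{e_{12}} = 0.1 + 0.2$.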
In single-environment studies, $\sigma^2_{g_i}$ and $\sigma_{g_{ij}}$ are each assumed to be a single parameter shared by all observations.
To illustrate location-specific variance components, a multiple-trait design over four locations is considered ($L = 4$). To account for locational heterogeneity, the genetic component of the mvLMM is assumed to have a location-specific variance $\sigma^2_{g_l}$, the genetic variance component for environment $l$ ($l = 1, 2, \dots, L$).
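One possible way to encode a location-specific genetic variance, shown here for a single trait, is to scale the relationship matrix separately within each location, $G = \sum_l \sigma^2_{g_l} D_l A D_l$, where $D_l$ is a 0/1 diagonal matrix selecting the records from location $l$. This block structure, which sets genetic covariances between records from different locations to zero, is an assumption of the sketch rather than a formulation stated in the paper, and `location_genetic_cov` is a hypothetical helper.

```python
import numpy as np

def location_genetic_cov(A, loc, sigma2_g_loc):
    """Hypothetical genetic covariance with one variance component per location.

    G = sum_l sigma2_g_loc[l] * D_l A D_l, where D_l is the 0/1 diagonal matrix
    selecting records from location l. Records from different locations get zero
    genetic covariance here; this is an assumption of the sketch.
    """
    G = np.zeros_like(A, dtype=float)
    for l, s2 in enumerate(sigma2_g_loc):
        d = (loc == l).astype(float)             # diagonal of D_l
        G += s2 * (d[:, None] * A * d[None, :])  # D_l A D_l scaled by sigma^2_{g_l}
    return G

# Toy example with L = 4 locations and two records per location.
A = 0.5 * np.eye(8) + 0.5          # 1 on the diagonal, 0.5 elsewhere
loc = np.repeat(np.arange(4), 2)   # [0, 0, 1, 1, 2, 2, 3, 3]
G = location_genetic_cov(A, loc, [0.1, 0.2, 0.3, 0.4])
```

Replacing the four values by a single common value recovers the homogeneous structure $\sigma^2_g A$ restricted to within-location blocks.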
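The restricted (REML) log likelihood function of the mvLMM is, up to an additive constant,
$$\mathcal{L} = -\frac{1}{2}\left(\log|V| + \log|X'V^{-1}X| + y'Py\right),$$
where $P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}$.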
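The restricted log likelihood can be evaluated directly from $V$. The sketch below assumes the standard REML form $-\tfrac12(\log|V| + \log|X'V^{-1}X| + y'Py)$ up to a constant, with $P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}$. It uses an explicit inverse for readability; an implementation for large $V$ would typically rely on Cholesky factorisations instead.

```python
import numpy as np

def reml_loglik(y, X, V):
    """REML log likelihood, up to an additive constant:
    -0.5 * (log|V| + log|X'V^{-1}X| + y'Py),
    with P = V^{-1} - V^{-1} X (X'V^{-1}X)^{-1} X'V^{-1}.
    """
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    P = Vinv - Vinv @ X @ np.linalg.solve(XtVinvX, X.T @ Vinv)
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(XtVinvX)
    return -0.5 * (logdet_V + logdet_XVX + y @ P @ y)

# Check: with V = I and an intercept-only X, y'Py is the sum of squared deviations.
y = np.array([1.0, 2.0, 3.0])
X = np.ones((3, 1))
print(reml_loglik(y, X, np.eye(3)))  # equals -0.5 * (log 3 + 2)
```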

Estimation of Heterogeneous Variance Components
In this study, a multivariate Newton-Raphson iterative algorithm is used to obtain restricted (residual) maximum likelihood (REML) estimates of the variance components.

REML estimates are often obtained by updating the variance components with either the Hessian matrix, which consists of the second derivatives of the log likelihood function, or Fisher's information matrix, the negative expectation of the Hessian (Searle et al. 1992; Lynch and Walsh 1998). For more efficient computation, REML is implemented here via the average of the observed and expected information matrices (Gilmour et al. 1995; Lee and Van der Werf 2006).
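In the average information Newton-Raphson (AI-NR) algorithm, the REML estimates are updated as
$$\Theta^{(t+1)} = \Theta^{(t)} + \mathrm{AI}^{-1} \left.\frac{\partial \mathcal{L}}{\partial \Theta}\right|_{\Theta^{(t)}},$$
where $\Theta$ is the vector of variance components decomposed as in Equation (7) and AI is the average information matrix, the average of the observed and expected information matrices of the log likelihood function $\mathcal{L}$.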
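For the mvLMM, the AI matrix is derived directly from $V$ as
$$\mathrm{AI}_{kl} = \frac{1}{2}\, y'P \frac{\partial V}{\partial \theta_k} P \frac{\partial V}{\partial \theta_l} P y,$$
where $\theta_k$ and $\theta_l$ are elements of $\Theta$. In a multiple-trait design with observations collected in $L$ environments, the genetic part of $V$ is partitioned by environment, so that each environment $l$ contributes its own genetic variance component $\sigma^2_{g_l}$ to $V$.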
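A compact sketch of the average information iteration, assuming the variance structure is linear in the parameters, $V = \sum_k \theta_k V_k$ (for example $V = \sigma^2_g A + \sigma^2_e I$), so that $\partial V / \partial \theta_k = V_k$. The score is the standard REML first derivative $-\tfrac12[\operatorname{tr}(PV_k) - y'PV_kPy]$ and the AI entries are $\tfrac12 y'PV_kPV_lPy$. The function `ai_reml`, the starting values and the clipping of negative variances at a small positive value are illustrative choices, not the paper's implementation.

```python
import numpy as np

def ai_reml(y, X, V_list, theta0, n_iter=50, tol=1e-8):
    """Average-information REML for V = sum_k theta_k * V_list[k] (sketch).

    score_k = -0.5 * (tr(P V_k) - y'P V_k P y)
    AI_kl   =  0.5 * y'P V_k P V_l P y
    theta  <-  theta + AI^{-1} score
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        V = sum(t * Vk for t, Vk in zip(theta, V_list))
        Vinv = np.linalg.inv(V)
        P = Vinv - Vinv @ X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
        Py = P @ y
        w = [Vk @ Py for Vk in V_list]  # V_k P y for each component
        score = np.array([-0.5 * (np.trace(P @ Vk) - Py @ wk)
                          for Vk, wk in zip(V_list, w)])
        AI = 0.5 * np.array([[wk @ P @ wl for wl in w] for wk in w])
        step = np.linalg.solve(AI, score)
        theta = np.maximum(theta + step, 1e-6)  # crude guard against negative variances
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Usage: sigma_g^2 A + sigma_e^2 I with A from simulated, standardised genotypes.
rng = np.random.default_rng(1)
n, m = 300, 1000
W = rng.binomial(2, 0.3, size=(n, m)).astype(float)
W = (W - W.mean(axis=0)) / W.std(axis=0)
A = W @ W.T / m
y = W @ rng.normal(0.0, np.sqrt(0.3 / m), size=m) + rng.normal(0.0, np.sqrt(0.7), size=n)
X = np.ones((n, 1))
print(ai_reml(y, X, [A, np.eye(n)], theta0=[0.5, 0.5]))  # estimates of (sigma_g^2, sigma_e^2)
```

For the single-component model $V = \sigma^2 I$ with an intercept, the iteration converges to the sample variance with divisor $n - 1$, which is the REML estimate; this gives a simple sanity check of the score and AI expressions.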

Data Simulation
In the simulation design, the number of traits is set to three and the number of samples to n = 1000. The traits are simulated as the sum of genetic background effects, locational effects and noise effects, as described in Meyer and Birney (2018). In the modelling stage of the simulation study, heritability was estimated under the mvLMM and under the heterogeneous mvLMM with location-specific variance components.
Estimation of heritability, $h^2$, relies on partitioning the observed variation into unobserved genetic and environmental components (Wray and Visscher 2008). For the mvLMM with a general variance component, heritability is estimated as the proportion of genetic variance in the total variance, $h^2 = \sigma^2_g / (\sigma^2_g + \sigma^2_e)$. In a multi-environment design with different locations, however, heritability depends on the total genetic variance over locations, that is, on the proportion of the total variance attributable to the location-specific genetic variance components. The results in Table 1 show how close the heritability estimates are to the true proportion of genetic variance (30%).
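The simulation and heritability computations can be sketched as follows. The generator is a hypothetical NumPy stand-in for the procedure of Meyer and Birney (2018) and simulates a single trait only; genetic effects are drawn independently in each of four locations, so records in location $l$ have genetic variance of roughly $\sigma^2_{g_l}$. For the multi-environment heritability, the sketch assumes one possible definition, which averages the location-specific genetic variances with weights equal to the share of records per location; with equal weights and the values below, the true proportion is 30%.

```python
import numpy as np

def h2_general(s2_g, s2_e):
    """h^2 = sigma_g^2 / (sigma_g^2 + sigma_e^2) for one general genetic component."""
    return s2_g / (s2_g + s2_e)

def h2_location_weighted(s2_g_loc, weights, s2_e):
    """Assumed multi-environment version: location-specific genetic variances averaged
    with weights (share of records per location), relative to that plus residual."""
    s2_g = float(np.dot(weights, s2_g_loc))
    return s2_g / (s2_g + s2_e)

def simulate_trait(n=1000, m=2000, s2_g_loc=(0.2, 0.4, 0.2, 0.4),
                   sd_loc=0.5, s2_e=0.7, seed=0):
    """Toy trait = location-specific genetic effect + location effect + noise.

    Hypothetical generator (not the PhenotypeSimulator package): independent
    Binomial(2, 0.3) SNPs, genetic effects drawn independently per location.
    """
    rng = np.random.default_rng(seed)
    W = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    W = (W - W.mean(axis=0)) / W.std(axis=0)   # standardised genotypes
    L = len(s2_g_loc)
    loc = rng.integers(0, L, size=n)           # location label per record
    g = np.zeros(n)
    for l, s2 in enumerate(s2_g_loc):
        b = rng.normal(0.0, np.sqrt(s2 / m), size=m)  # SNP effects used in location l
        g += (loc == l) * (W @ b)                      # Var(g_a) = s2 * A_aa for a in l
    loc_effect = rng.normal(0.0, sd_loc, size=L)[loc]  # shared shift per location
    e = rng.normal(0.0, np.sqrt(s2_e), size=n)
    return g + loc_effect + e, g, loc

y, g, loc = simulate_trait()
print(h2_location_weighted([0.2, 0.4, 0.2, 0.4], [0.25] * 4, 0.7))  # 0.3 up to rounding
print(np.var(g) / (np.var(g) + 0.7))  # empirical genetic share of (g + e)
```

The location effect is excluded from the denominator here, treating it as a fixed location mean; including it would be a different, equally possible definition.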

Application to Real Data
To examine the performance of the developed model, the two variance components models were fitted to safflower (Carthamus tinctorius L.) data collected from […] Table 2.

Discussion
In this paper, a multivariate heterogeneous variance components model is developed that allows location-specific variance components to be estimated. The simulation results show that, as the locational heterogeneity increases, the heritability estimates under the developed mvLMM with location-specific genetic variance components are closer to the desired proportions. Based on the real-data results, the developed heterogeneous mvLMM with location-specific variance components fits the data better in multi-environment designs. Thus, unlike an mvLMM with a single general variance component, the proposed model can account for locational heterogeneity.