A Kronecker-based covariance specification for spatially continuous multivariate data

We propose a covariance specification for modeling spatially continuous multivariate data. The model is based on a reformulation of the Kronecker product of covariance matrices for Gaussian random fields. The structure holds for different choices of covariance functions, with parameters varying in their usual domains. For comparison with classical models from the literature, we use the Matérn correlation function to specify the marginal covariances. We also assess the reparametrized Generalized Wendland model as an option for efficient calculation of the Cholesky decomposition, improving the model's ability to deal with large data sets. The reduced computational time and flexible generalization to an increasing number of variables make it an attractive alternative for modeling spatially continuous data. The proposed model is fitted to a soil chemistry properties dataset, and adequacy measures, forecast errors and estimation times are compared with those obtained from classical models. In addition, the model is fitted to a North African temperature dataset to illustrate its flexibility in dealing with large data. A simulation study considering different parametric scenarios evaluates the properties of the maximum likelihood estimators. The simple structure and reduced estimation time make the proposed model a candidate approach for multivariate analysis of spatial data.


Introduction
Multivariate random fields have been of interest from the very early days of the geostatistical literature, with an increasing number of proposed approaches as data sets became richer and computational power grew. The specification of the covariance structure is central to the estimation and prediction process. Recent contributions include asymmetric models (Qadir et al. 2021; Alegría et al. 2018), modeling on spheres (Alegría et al. 2019), disease mapping (Martinez-Beneito 2020; MacNab 2018, 2016) and multinary problems (Teichmann et al. 2021), to name a few.
We are interested in the analysis of multivariate random fields in the specific context of spatially continuous data. Possible applications cover a wide range of disciplines, such as climatology, meteorology and geophysics, where spatially referenced data are usually of interest. We consider two illustrative examples, one on chemical soil properties relevant for agriculture and another on North African temperatures, important for climatology and environmental sciences.
Let $Y(s) = \{Y_1(s), \dots, Y_p(s)\}^\top$, on $\mathbb{R}^d$, $d \ge 1$, be a $p$-dimensional multivariate Gaussian random field with mean vector $\mu(s) = \mathrm{E}[Y(s)]$ and matrix-valued covariance function
$$R(h) = \mathrm{cov}\{Y(s_1), Y(s_2)\} = [R_{ij}(h)]_{i,j=1}^{p}, \qquad (1)$$
where $h = s_1 - s_2 \in \mathbb{R}^d$ is the spatial separation vector. We consider a stationary and isotropic process (Chilès and Delfiner 2012; Diggle and Ribeiro Jr 2007; Gneiting 1999). When $i = j$, the functions $R_{ii}(h)$ in Eq. (1) describe the spatial variability of the $i$th process $Y_i(s)$, for $i = 1, \dots, p$, and are referred to as the direct- or marginal-covariance functions (Genton and Kleiber 2015); when $i \ne j$, the functions $R_{ij}(h)$ in Eq. (1) describe the spatial variability between the processes $Y_i(s)$ and $Y_j(s)$ and are called cross-covariance functions. An important condition on (1) is positive definiteness, that is, $a^\top R a > 0$ for any vector $a \ne 0$.
The main goal is to propose a valid covariance specification for (1) in the case of spatially continuous multivariate data. The model, presented in Sect. 2, is based on Kronecker products and is flexible enough to handle two or more variables. Furthermore, we present the conditions for positive definiteness of the proposed model, perform comparisons with classical models in terms of computational times and adequacy measures, and consider compactly supported covariance functions as an efficient approach to computing the Cholesky decomposition.
The literature on covariance functions for multivariate random fields is extensive. A careful review of the main works in the area can be found in Genton and Kleiber (2015) and Salvaña and Genton (2020). An intuitive early proposal, and possibly the most traditional model, is the linear model of coregionalization (LMC) (Goulard and Voltz 1992; Bourgault and Marcotte 1991; Wackernagel 2003). The key idea of the LMC is the superposition of latent spatial processes in order to induce a multivariate field. This approach is widely explored, including under the Bayesian approach to inference and prediction. Finley et al. (2015), Banerjee et al. (2003), Gelfand et al. (2004), Schmidt and Gelfand (2003) and Cecconi et al. (2016) are examples where the LMC structure underlies the models.
Another popular structure considers the class of Matérn correlation functions (Matérn 1986; Guttorp and Gneiting 2006). For the univariate case, the Matérn covariance model is defined as $\sigma^2 M(h \mid \nu, \phi)$, where
$$M(h \mid \nu, \phi) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(|h|/\phi\right)^{\nu} K_{\nu}\left(|h|/\phi\right)$$
is the Matérn spatial correlation at distance $|h|$, $K_\nu$ is the modified Bessel function, and $\sigma^2, \nu, \phi > 0$ are the variance, smoothness and scale parameters, respectively. When $\nu = 0.5$, the Matérn model reduces to the exponential covariance function. Gneiting et al. (2010) elegantly extended this class to the multivariate case, considering the Matérn family for both the marginal and cross-covariance functions. The authors present conditions on the parameters that lead to a valid covariance structure, the full bivariate Matérn model. For more than two variables the authors presented the parsimonious multivariate Matérn model, which assumes a common scale and constrained smoothness parameters. Another important specification is the class of separable models, which assume that the components of the multivariate random field share the same correlation structure (Bevilacqua et al. 2016a; Vallejos et al. 2020); separable models appear as a parsimonious alternative because they simplify more complex structures. Several of these models are implemented in the GeoModels package (Bevilacqua and Morales-Oñate 2018).
The aforementioned models are widely assessed in the geostatistical literature. Cressie (1993), Gneiting et al. (2010), Goovaerts et al. (1997), Porcu et al. (2013) and Bevilacqua et al. (2016b) noticed some difficulties in handling the LMC due, for example, to its lack of flexibility and difficulty in recovering the smoothness of latent processes. Separable models are not capable of capturing different scales and smoothness for the variables under study (Bevilacqua et al. 2016a, 2015). The bivariate Matérn model presents some restrictions on the parametric space: Vallejos et al. (2020) note that the variation of the colocated correlation parameter is constrained by the values of the scale and smoothness parameters, resulting in difficulties for the estimation process and parameter interpretation.
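As a quick reference for the Matérn class above, the correlation can be evaluated directly; the following is a minimal NumPy/SciPy sketch (illustrative only — the analyses in this paper use the matern function from the R package geoR):

```python
import numpy as np
from scipy.special import kv, gamma  # kv: modified Bessel function of the second kind

def matern(h, nu, phi):
    """Matern correlation M(h | nu, phi); equals 1 at distance zero."""
    u = np.asarray(h, dtype=float) / phi
    safe = np.where(u == 0.0, 1.0, u)  # placeholder avoids 0 * inf at the origin
    val = 2.0 ** (1.0 - nu) / gamma(nu) * safe ** nu * kv(nu, safe)
    return np.where(u == 0.0, 1.0, val)
```

With $\nu = 0.5$ this reduces to $\exp(-|h|/\phi)$, the exponential special case mentioned above.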
The covariance specification presented here emerges as an additional modeling alternative for multivariate random fields, one that is flexible in dealing with two or more variables because it allows the model parameters to vary freely in their usual parametric domains. With a simple construction, the computational implementation poses no major difficulties for any number of variables and sampling locations, with parsimonious estimation time. Furthermore, unlike the separable models, our proposal allows different marginal correlation structures, making it able to capture the structure of each variable.
The article is organized as follows. In Sect. 2 we present our covariance specification for multivariate spatial data and discuss some results. Section 3 provides an efficient approach for calculating the Cholesky decomposition using compactly supported covariance functions. The dataset analyses are presented in Sect. 4. In Sect. 5, through a simulation study, we evaluate the properties of the proposed model estimators. Finally, the main conclusions are summarized in Sect. 6. The model implementation and reported analysis are performed using the computational statistical software R (R Core Team 2021).

Model specification
This section presents our proposed covariance specification for multivariate Gaussian random fields, which is based upon Martinez-Beneito (2013). We present the proof of its validity and discuss how to obtain the maximum likelihood estimates of the model parameters. We also present estimation-time results comparing it with classical approaches.
In Martinez-Beneito's proposal, the results are presented for multivariate disease-mapping problems based on Gaussian Markov random fields (GMRF), which are discretely indexed and follow a multivariate Gaussian distribution with the additional restriction of conditional independence (Rue and Held 2005).
Our proposal extends Martinez-Beneito's approach to construct a covariance function for Gaussian random fields that are continuously indexed, with several applications in geostatistical problems.
We present a simple construction that generalizes easily to larger dimensions. The idea is to write the cross-covariance matrix as a product of matrices that induce variability within and between processes, built upon a Kronecker-product reformulation of covariance matrices. The resulting construction is always positive definite for any parameter values in their usual domains.
Consider a symmetric correlation matrix $R_b$, of dimension $p \times p$, that induces correlation between the spatial processes, while the marginal-covariance functions $R_{ii}$, for $i = 1, \dots, p$, model the variability within each process. We specify the covariance matrix for the $Y$ process using the generalized Kronecker product presented in Martinez-Beneito (2013). Thus, for continuously indexed Gaussian random fields, the matrix-valued covariance function is defined by
$$R = \mathrm{Bdiag}(\tilde{R}_{11}, \tilde{R}_{22}, \dots, \tilde{R}_{pp})\,(R_b \otimes I)\,\mathrm{Bdiag}(\tilde{R}_{11}, \tilde{R}_{22}, \dots, \tilde{R}_{pp})^\top, \qquad (2)$$
where $\tilde{R}_{ii}$ is the lower triangular matrix of the Cholesky decomposition of the matrix $R_{ii}$, Bdiag denotes the block-diagonal matrix built from $\tilde{R}_{11}, \tilde{R}_{22}, \dots, \tilde{R}_{pp}$, and $I$ is the identity matrix. The structure defined in (2) is very flexible, allowing different marginal-covariance functions $R_{ii}$ and different correlation structures $R_b$. Without loss of generality, we consider that the correlation between the processes is induced by the matrix
$$R_b = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{12} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1p} & \rho_{2p} & \cdots & 1 \end{pmatrix}, \qquad (3)$$
where $\rho_{ij}$, $1 \le i < j \le p$, is the correlation parameter between variables $i$ and $j$.
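The construction in (2) maps directly onto dense linear algebra; below is a minimal NumPy sketch of the generalized Kronecker product (an illustration with our own function names, not the paper's R implementation):

```python
import numpy as np
from scipy.linalg import block_diag

def simpler_cov(R_list, R_b):
    """Covariance (2): B^T (R_b kron I_n) B with B = Bdiag(L_1^T, ..., L_p^T),
    where L_i is the lower Cholesky factor of the i-th marginal covariance."""
    n = R_list[0].shape[0]
    Ls = [np.linalg.cholesky(R) for R in R_list]   # lower triangular factors
    B = block_diag(*[L.T for L in Ls])             # Bdiag of the transposed factors
    A = np.kron(R_b, np.eye(n))                    # R_b kron I_n
    return B.T @ A @ B
```

The diagonal blocks of the result recover the marginal covariances $R_{ii}$, and the $(i,j)$ block equals $\rho_{ij} L_i L_j^\top$, for any admissible choice of marginals and $R_b$.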
To quantify the variability within each process, different marginal-covariance structures can be used in (2). In general, we can write
$$R_{ii}(h) = \sigma_i^2\, R(h \mid \Psi_i), \qquad (4)$$
where $R(h \mid \Psi_i)$ is a valid correlation function, with $\Psi_i$ denoting the vector of parameters that model the spatial dependence structure of the $i$-th component, for $i = 1, \dots, p$. For simplicity and without loss of generality, we can take $R(h \mid \Psi_i)$ to be the Matérn correlation function, so that the marginal covariance function takes the form
$$R_{ii}(h) = \sigma_i^2\, M(h \mid \nu_i, \phi_i). \qquad (5)$$
The structure specified by (2), (3) and (5) defines the simpler multivariate Matérn (MatSimpler) model, with different smoothness and scale parameters for each variable. The simpler multivariate exponential (ExpSimpler) model is the particular case obtained when the exponential correlation function is used. Ribeiro et al. (2021) illustrate the ExpSimpler model in a bivariate analysis of meteorological data. In Theorem 2.1, below, we prove the validity of our covariance specification for modeling multivariate spatial data.
Theorem 2.1 Let $R_{ii}$, for $i = 1, \dots, p$, be the marginal covariance functions of dimension $n \times n$, $R_b$ a valid spatial correlation matrix of dimension $p \times p$, and $I$ the identity matrix of dimension $n \times n$. Then the covariance function defined in (2) is a valid, full rank $np$ specification for multivariate spatial data modeling.
Proof Since the marginal covariance functions $R_{ii}$, for $i = 1, \dots, p$, are symmetric positive definite matrices, the matrices $\tilde{R}_{ii}$ resulting from the Cholesky decomposition are lower triangular with positive diagonal elements and therefore full rank (Banerjee and Roy 2014). Thus, $\mathrm{rank}(\tilde{R}_{ii}) = n$ for all $i$, and since the rank of a block-diagonal matrix is the sum of the ranks of its diagonal blocks (Banerjee and Roy 2014),
$$\mathrm{rank}\{\mathrm{Bdiag}(\tilde{R}_{11}^\top, \tilde{R}_{22}^\top, \dots, \tilde{R}_{pp}^\top)\} = \sum_{i=1}^{p} \mathrm{rank}(\tilde{R}_{ii}^\top) = np;$$
therefore $\mathrm{Bdiag}(\tilde{R}_{11}^\top, \tilde{R}_{22}^\top, \dots, \tilde{R}_{pp}^\top)$ is a full rank $np$ matrix. On the other hand, since $R_b$ and $I$ are positive definite matrices, it follows from the properties of the Kronecker product that $R_b \otimes I$ is also a positive definite matrix (Hardy and Steeb 2019). With $R_b$ of dimension $p$ and $I$ of dimension $n$, the resulting Kronecker product is a positive definite matrix of dimension $np$. For simplicity of notation, denote by $A$ and $B$ the matrices $R_b \otimes I$ and $\mathrm{Bdiag}(\tilde{R}_{11}^\top, \tilde{R}_{22}^\top, \dots, \tilde{R}_{pp}^\top)$, respectively. Since $A$ is a positive definite matrix and $B$ is a full rank matrix, $B^\top A B$ preserves not only the rank but also the positive definiteness (Gentle 2017; Petersen et al. 2008).
To see this, let $a \ne 0$ be any vector of dimension $np$ and let $z = Ba$, where $z \ne 0$ because $B$ is a full rank matrix. By the definition of positive definiteness, $a^\top B^\top A B a = z^\top A z > 0$. $\square$
The result holds if the variables are observed at different numbers of sample locations, since the incomplete data can be treated as missing information; this does not imply any additional complexity and the proof of Theorem 2.1 remains valid.
The dimensions of the resulting covariance matrix depend on the number of sample locations for each variable considered in the analysis. If we consider marginal covariance functions of dimension $n_i$, for $i = 1, \dots, p$, the resulting covariance matrix has dimension $N = \sum_{i=1}^{p} n_i$. Since $R$ is a valid covariance specification for any valid choice of marginal-covariance and correlation functions, the proposed model allows its parameters to vary in their usual domains, favoring the inferential process and allowing the model parameters to be more easily interpreted.
In our covariance specification, considering the correlation structure defined in (3), the matrix-valued covariance function $R(h)$ can be written in a more compact form, in terms of the cross-covariance matrices between the processes. For the $ij$-th component, with $1 \le i \ne j \le p$, the cross-covariance function takes the form
$$R_{ij}(h) = \rho_{ij}\, \tilde{R}_{ii}(h)\, \tilde{R}_{jj}(h)^\top. \qquad (6)$$
Clearly, when $i = j$, $\rho_{ii} = 1$ and we recover $\rho_{ii} \tilde{R}_{ii}(h) \tilde{R}_{ii}(h)^\top = R_{ii}(h)$, the spatial covariance matrix of the $i$-th component.
Theorem 2.2 shows that the separable model (Vallejos et al. 2020; Bevilacqua et al. 2016a) is a particular case of the simpler covariance model class.
Theorem 2.2 Let $Y_1(s), Y_2(s), \dots, Y_p(s)$ be a $p$-dimensional random field with the same spatial dependence structure for all $Y_i(s)$, $i = 1, \dots, p$. Then the simpler covariance model specified by (2), (3) and (4) reduces to the class of separable models.
Proof By (6) and the general structure of the marginal covariance matrices defined in (4), in the particular case where the process components share the same spatial dependence structure, that is, $R(h \mid \Psi_i) = R(h \mid \Psi)$ for all $i = 1, \dots, p$, the simpler covariance specification reduces to the class of separable models, i.e.,
$$R_{ij}(h) = \rho_{ij}\, \sigma_i \sigma_j\, \tilde{R}(h \mid \Psi)\, \tilde{R}(h \mid \Psi)^\top = \rho_{ij}\, \sigma_i \sigma_j\, R(h \mid \Psi), \quad i, j = 1, \dots, p, \ \text{with } \rho_{ii} = 1.$$
Here, $\tilde{R}(h \mid \Psi)$ is the lower triangular Cholesky factor of the correlation matrix $R(h \mid \Psi)$. $\square$
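Theorem 2.2 can be checked numerically: when all components share one correlation matrix, the construction collapses to a Kronecker (separable) form. A small sketch with made-up example values:

```python
import numpy as np
from scipy.linalg import block_diag

s = np.linspace(0.0, 1.0, 6)
R = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.3)  # shared correlation matrix
sig = np.array([0.5, 2.0])                          # marginal standard deviations
R_b = np.array([[1.0, 0.4], [0.4, 1.0]])

# Simpler construction with R_ii = sigma_i^2 R
Ls = [np.linalg.cholesky(sig[i] ** 2 * R) for i in range(2)]
B = block_diag(*[L.T for L in Ls])
Sigma = B.T @ np.kron(R_b, np.eye(len(s))) @ B

# separable form: (D R_b D) kron R, with D = diag(sigma)
D = np.diag(sig)
Sigma_sep = np.kron(D @ R_b @ D, R)
print(np.allclose(Sigma, Sigma_sep))  # True
```

Each $(i,j)$ block of both constructions equals $\rho_{ij}\sigma_i\sigma_j R$, which is exactly the separable form.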

Estimation and inference
For the estimation process, let $N = np$ and let $Y = \{Y_1^\top, \dots, Y_p^\top\}^\top$ be the $N \times 1$ stacked vector of response variables, with $N \times 1$ mean vector $\mu = \{\mu_1^\top, \dots, \mu_p^\top\}^\top$, where $\mu_i = X_i \beta_i$ denotes the $n \times 1$ vector of expected values for the response variable $Y_i$, $i = 1, \dots, p$, with $X_i$ an $n \times k_i$ design matrix composed of $k_i$ covariates and, consequently, $\beta_i$ a $k_i \times 1$ regression parameter vector. We suppress the spatial indexes for convenience.
We denote the set of parameters to be estimated by $\theta = (\beta^\top, \lambda^\top)^\top$, where $\beta = (\beta_1^\top, \dots, \beta_p^\top)^\top$ denotes the regression parameter vector and $\lambda = (\rho_1, \dots, \rho_{p(p-1)/2}, \sigma_1, \dots, \sigma_p, \nu_1, \dots, \nu_p, \phi_1, \dots, \phi_p)^\top$ is the covariance specification parameter vector. Letting $y$ denote the stacked vector of observed values, the log-likelihood function for $\theta$ is given by
$$\ell(\theta) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log|R(\lambda)| - \frac{1}{2}\{y - \mu(\beta)\}^\top R(\lambda)^{-1}\{y - \mu(\beta)\}. \qquad (7)$$
The covariance matrix proposed in (2) involves block-diagonal matrices and a Kronecker product, so its determinant and inverse can be expressed in terms of these smaller pieces. It is worth mentioning that the calculation of the inverse of the covariance matrix in the log-likelihood function does not involve the complete matrix, but only the Cholesky decompositions of the marginal covariance matrices, which yields computational advantages and a reduction of the estimation time. In particular, $\log|R(\lambda)| = n \log|R_b| + 2\sum_{i=1}^{p}\sum_{j=1}^{n} \log \lambda^{(i)}_{jj}$, where $\lambda^{(i)}_{jj}$ is the $j$th diagonal element of $\tilde{R}_{ii}$, so the log-likelihood in (7) can be rewritten as
$$\ell(\theta) = -\frac{N}{2}\log(2\pi) - \frac{n}{2}\log|R_b| - \sum_{i=1}^{p}\sum_{j=1}^{n}\log\lambda^{(i)}_{jj} - \frac{1}{2}\{y - \mu(\beta)\}^\top R(\lambda)^{-1}\{y - \mu(\beta)\}. \qquad (8)$$
The vector of expected values $\mu(\beta)$ depends on the regression parameters, while the covariance specification $R(\lambda)$ depends on the covariance parameter vector $\lambda$. We obtain the maximum likelihood estimators by maximizing (8) with respect to the parameter vector $\theta$.
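The determinant decomposition used above is easy to verify numerically; a sketch with arbitrary example marginals:

```python
import numpy as np
from scipy.linalg import block_diag

s = np.linspace(0.0, 1.0, 8)
Dmat = np.abs(s[:, None] - s[None, :])
R_list = [0.3 * np.exp(-Dmat / 0.2), 0.7 * np.exp(-Dmat / 0.5)]
R_b = np.array([[1.0, -0.6], [-0.6, 1.0]])
n = len(s)

Ls = [np.linalg.cholesky(R) for R in R_list]
B = block_diag(*[L.T for L in Ls])
Sigma = B.T @ np.kron(R_b, np.eye(n)) @ B

# log|Sigma| from the pieces: n log|R_b| + 2 sum_i sum_j log(lambda_jj^(i))
logdet_pieces = n * np.linalg.slogdet(R_b)[1] \
    + 2.0 * sum(np.sum(np.log(np.diag(L))) for L in Ls)
logdet_full = np.linalg.slogdet(Sigma)[1]
print(np.isclose(logdet_pieces, logdet_full))  # True
```

The identity follows from $|B^\top A B| = |B|^2 |A|$ and $|R_b \otimes I_n| = |R_b|^n$, so only the Cholesky factors of the small marginal matrices are needed.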
The proposed covariance specification, in addition to its flexibility and the interpretability of its parameters, also reduces the number of parameters to be estimated when compared to the multivariate Matérn model (Gneiting et al. 2010), which is quite useful from the point of view of the estimation process. Considering the Matérn correlation function, the number of parameters involved in the MatSimpler model is $p(p+1)/2 + 2p$, where $p$ is the number of variables. In contrast, the number of parameters for the multivariate Matérn model is $p(p+1)/2 + 2\{p(p+1)/2\}$. Table 1 summarizes how the proposed covariance specification reduces the number of parameters as $p$ increases. For $p > 7$, the number of parameters for the multivariate Matérn model is more than twice the number of parameters for the MatSimpler model.
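The two parameter counts are straightforward to tabulate; a trivial sketch of the formulas above:

```python
def n_params_simpler(p):
    # p(p+1)/2 correlations/variances plus nu_i and phi_i for each variable
    return p * (p + 1) // 2 + 2 * p

def n_params_full_matern(p):
    # every marginal/cross component carries its own smoothness and scale
    return p * (p + 1) // 2 + 2 * (p * (p + 1) // 2)

for p in (2, 5, 7, 8):
    print(p, n_params_simpler(p), n_params_full_matern(p))
```

At $p = 7$ the ratio is exactly two (42 versus 84 parameters); beyond that the multivariate Matérn count is more than twice the MatSimpler count.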
In Sect. 5, through a simulation study, we evaluate some properties of the maximum likelihood estimators. In Appendix A we describe the score function and the Newton scoring iterative algorithm, and derive the Fisher information matrix associated with the proposed model.

Prediction
For multivariate spatial data, spatial prediction is a generalization of the univariate case that consists of predicting $Y$ at some unobserved location $s_0$, based on the sampled information at $s_i$, for $i = 1, \dots, n$ (Ver Hoef and Cressie 1993; Bivand et al. 2008; Pebesma 2004).
Let $R_{Y_1}$ be the covariance matrix of $Y_1 = Y(s_i)$, for $i = 1, \dots, n$, $R_{Y_0}$ the covariance matrix of $Y_0 = Y(s_0)$, $R_{Y_1 Y_0}$ the covariance matrix between $Y_1$ and $Y_0$, and $d_0 = \mathrm{Bdiag}(x_1(s_0), \dots, x_p(s_0))$. Then the best linear unbiased predictor for $Y_0$ is
$$\hat{Y}_0 = d_0 \hat{\beta} + R_{Y_1 Y_0}^\top R_{Y_1}^{-1} (Y_1 - \hat{\mu}_1),$$
with prediction covariance matrix
$$\mathrm{var}(\hat{Y}_0) = R_{Y_0} - R_{Y_1 Y_0}^\top R_{Y_1}^{-1} R_{Y_1 Y_0}.$$
We can replace the unknown parameters of the model by their respective maximum likelihood estimators (Martins et al. 2016) and thereby obtain the stacked prediction vector for the $p$ variables at the unobserved locations $s_0$.
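The predictor is the usual Gaussian conditional mean; the sketch below shows the generic computation (function and variable names are ours, not the paper's):

```python
import numpy as np

def cokrige(Sigma11, Sigma01, Sigma00, y1, mu1, mu0):
    """Linear predictor of Y0 given Y1 = y1, plus its prediction covariance.
    Sigma01 = cov(Y0, Y1); mu0, mu1 are the (estimated) mean vectors."""
    w = np.linalg.solve(Sigma11, Sigma01.T)   # Sigma11^{-1} Sigma10
    pred = mu0 + w.T @ (y1 - mu1)
    pred_cov = Sigma00 - Sigma01 @ w
    return pred, pred_cov
```

In practice the covariance blocks are built from the fitted Simpler model, with the maximum likelihood estimates plugged in for the unknown parameters.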

Computational resources
In our computational implementation, we used the R statistical software (R Core Team 2021) for the estimation, simulation and prediction procedures. For the implementation of the proposed covariance matrix specification, we used the kronecker function from base R, which computes the Kronecker product between the matrices $R_b$ and $I$. We use the Matrix package (Bates and Maechler 2021) to perform matrix operations more efficiently, such as the Cholesky decomposition of the marginal-covariance matrices, through the chol function, and the construction of the block-diagonal matrix $\mathrm{Bdiag}(\tilde{R}_{11}, \tilde{R}_{22}, \dots, \tilde{R}_{pp})$, through the bdiag function. Furthermore, the crossprod and tcrossprod functions allow more efficient calculation of matrix products. When working with the Matérn correlation function we use the matern function from the geoR package (Ribeiro Jr et al. 2020). For the efficient calculation of the Cholesky factor when working with sparse correlation functions, we use functions from the spam package (Furrer and Sain 2010).
For simulations we use the rmvn function from the mvnfast package (Fasiolo 2016), which provides computationally efficient methods for the multivariate normal distribution. For the log-likelihood optimization, we use the optim function. R codes are available in the online supplementary material.

Computational results
As already mentioned, the Simpler covariance model can be extended to more than two variables with relative ease. The estimation time depends on the number of variables and sample locations considered in the analysis. To illustrate the computational time spent on estimation, we implemented the generic model for $p$ variables and $n$ sample locations and simulated scenarios of the MatSimpler model considering different sample sizes and numbers of variables. To make the simulation process easier, we set $\phi_i = 0.2$, $\nu_i = 0.5$, $\sigma_i = 0.3$, for all $i = 1, \dots, p$. The correlation parameters were chosen between $-0.7$ and $0.7$ such that the resulting $R_b$ was a valid structure. We consider the number of variables $p$ ranging from 2 to 6 and the number of sample locations $n = (100, 225, 400, 625, 900)$, taken on a unit square grid.
The results, presented in Appendix B (Fig. 7), show that the estimation time increases with the number of variables and sample locations, which is due to the Cholesky decompositions of the marginal-covariance matrices. In Sect. 3 we present an approach to make the Cholesky decomposition more efficient, especially for large data.
We also compare the estimation times for the bivariate case of the MatSimpler model with three other models from the literature: the bivariate Matérn model with constraints (MatConstr), the bivariate separable Matérn model (MatSep) and the LMC model. The data were simulated from the MatConstr model. We set $\phi_i = 0.2$, $\nu_i = 0.5$, $\sigma_i = 0.3$, for $i = 1, 2$, and $\rho_{12} = 0.8$. The MatConstr, MatSep and LMC models were estimated with the GeoModels package, using the standard likelihood function. For all models we used the Nelder-Mead optimizer, and convergence was successful in all scenarios.
Table 8 in Appendix B presents the results for all models with respect to the elapsed estimation time (Elps.Time), the number of iterations required (N.Iter) and the average time per iteration (Time.by.Iter); Fig. 8 displays these results graphically.

An efficient way to calculate the Cholesky factor
The Matérn covariance model is a globally supported model that has been widely used in spatial statistics due to its flexibility and its well-discussed theoretical justifications. It has interesting particular cases, such as the exponential and Gaussian models. From a computational point of view, however, this model generates dense matrices, and the calculation of the Cholesky decomposition then becomes impractical as the sample size $n$ increases. Recent approaches (Bevilacqua et al. 2019, 2022) suggest that working with compactly supported covariance matrices has a computational advantage over globally supported covariance models, such as the Matérn, since they favor the use of algorithms for sparse matrices, reducing the computational complexity and consequently the estimation time.
As previously described, our Simpler model specification has the flexibility to accommodate different marginal covariance structures, unlike the multivariate Matérn model and its simplifications, which consider only the Matérn correlation function in their marginal specifications. In this section, we therefore show how compactly supported covariance functions can be used to reduce the computational complexity, and consequently the estimation time, for large samples. In particular, we work with the Generalized Wendland model.
The Generalized Wendland family is a class of isotropic correlation functions defined, for scaled distances $r = |h|/\phi < 1$ (and zero otherwise), by
$$GW(h \mid \nu, \mu, \phi) = \frac{1}{B(2\nu, \mu + 1)} \int_{r}^{1} u\,(u^2 - r^2)^{\nu - 1} (1 - u)^{\mu}\, du, \qquad (9)$$
where $B(\cdot, \cdot)$ denotes the beta function. The reparametrized version
$$\psi_{\nu,\mu,\phi}(h) = GW(h \mid \nu, \mu, \tilde{\phi}_{\nu,\mu,\phi}), \quad \text{where } \tilde{\phi}_{\nu,\mu,\phi} = \phi \left\{\frac{\Gamma(\mu + 2\nu + 1)}{\Gamma(\mu)}\right\}^{\frac{1}{1+2\nu}}, \qquad (10)$$
is a positive definite reparametrization of the Generalized Wendland family for $\mu \ge (d+1)/2 + \nu$, $\nu \ge 0$ and $\phi > 0$, whose compact support is jointly specified by the parameters $\nu$, $\mu$ and $\phi$. This reparametrization allows correlation functions of compact and global support to be considered from the same point of view. In particular, the Matérn model $M(h \mid \nu + \frac{1}{2}, \phi)$ is a special limit case of the model $\psi_{\nu,\mu,\phi}(h)$ (when $\mu \to \infty$).
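The Generalized Wendland correlation can be evaluated by direct numerical integration; the sketch below follows our reading of (9) and (10) and is valid for $\nu > 0$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta, gamma

def gen_wendland(r, nu, mu):
    """Generalized Wendland correlation at scaled distance r; zero for r >= 1."""
    if r >= 1.0:
        return 0.0
    val, _ = quad(lambda u: u * (u ** 2 - r ** 2) ** (nu - 1) * (1 - u) ** mu, r, 1.0)
    return val / beta(2 * nu, mu + 1)

def rgw(h, nu, mu, phi):
    """Reparametrized GW: the support phi_tilde depends jointly on nu, mu, phi."""
    phi_t = phi * (gamma(mu + 2 * nu + 1) / gamma(mu)) ** (1.0 / (1.0 + 2.0 * nu))
    return gen_wendland(abs(h) / phi_t, nu, mu)
```

Distances beyond $\tilde{\phi}_{\nu,\mu,\phi}$ give exact zeros, which is what makes the resulting covariance matrices sparse.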
To illustrate the role of the reparametrized, compactly supported Wendland model in reducing computational complexity and estimation time, we simulate a zero-mean Gaussian random process with the MatSimpler covariance model. We set $\sigma_1 = \sigma_2 = 0.3$ and $\phi_1 = \phi_2 = 0.1$, so the practical range is approximately 0.3. We estimate the parameters considering the MatSimpler and the RGW-Simpler models, where the notation RGW-Simpler denotes the model specified by (2), (3) and (4), considering the reparametrized Generalized Wendland correlation in (10) for $R(h \mid \Psi_i)$.
Regarding the RGW-Simpler model estimation, we set $\mu = 1.5$, resulting in a sparse covariance matrix with around 85% null entries. The vector of parameters to be estimated is $\theta = (\phi_1, \phi_2, \sigma_1, \sigma_2, \rho_{12})^\top$.
Table 2 presents the Elps.Time, N.Iter and Time.by.Iter (in seconds), together with the log-likelihood value and parameter estimates.
Although the log-likelihood values were smaller for the RGW-Simpler model, the reduction in estimation time is notable for large sample sizes. This shows that compactly supported marginal-covariance structures can further reduce the computational complexity of the model.

Data analysis
This section illustrates the application of the proposed model to two datasets. The first, the soil dataset, aims to model the relationship between hydrogen content and the cation exchange capability; the second, a large North African temperature dataset, illustrates the model's ability to handle large data.

Example 1: Soil data
The soil250 dataset from the geoR package (Ribeiro Jr et al. 2020) contains soil chemistry properties measured on a regular $10 \times 25$ grid with points spaced 5 meters apart. We study the relation between the hydrogen content and the cation exchange capability (CTC). These data illustrate a scenario with strong correlation between the variables under study. Figure 1 shows circle plots of the hydrogen (left panel) and CTC (right panel) data separately. The coordinates are divided by a constant to ease visualization. Figure 2 shows the histograms of each variable. Scatterplots of each variable against each spatial coordinate in Fig. 3 are used to check for spatial trends. We proceed with the residuals of a linear regression with constant trend as a realization of a zero-mean bivariate isotropic stationary Gaussian random field. The sample correlation between the variables was 0.68. Standard deviations are 0.60 for the hydrogen content and 0.77 for the CTC.
The models considered in the estimation step are the MatSimpler, LMC, MatConstr, MatInd and MatSep. For MatSimpler and MatConstr the vector of parameters to be estimated is $\lambda = (\phi_1, \phi_2, \nu_1, \nu_2, \sigma_1, \sigma_2, \rho_{12})$; for the separable model, MatSep, we have $\phi_1 = \phi_2$ and $\nu_1 = \nu_2$, and for the independent model, MatInd, we have $\rho_{12} = 0$. The LMC model is parameterized by $\lambda = (a_{11}, a_{12}, a_{22}, a_{21}, \phi_1, \phi_2)$, where $a_{11}, a_{12}, a_{22}$ and $a_{21}$ are elements of the coregionalization matrix and $\phi_1$ and $\phi_2$ are parameters of the exponential correlation model. Table 3 presents the parameter estimates, the maximized log-likelihood (LL), the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) values. The standard errors for all models (in parentheses) were calculated using the Hessian matrix approximation. The MatConstr, MatInd, LMC and MatSep models were estimated using the GeoFit function from the GeoModels package (Bevilacqua and Morales-Oñate 2018), for which the standard likelihood function and the Euclidean distance were considered. The Nelder-Mead optimizer was the algorithm of choice for all model fits. Table 4 presents the Elps.Time, N.Iter and Time.by.Iter (in seconds) for each model. The R codes are available in the supplementary material.
The predictive behavior was assessed by randomly selecting a training set of 200 locations (80% of the data), from which we estimate the models under study and compute the mean absolute error (MAE), the root mean square error (RMSE) and the normalized mean square error (NMSE) for each model, using the cokriging predictor at the 50 remaining locations (20% of the data). These measures are based on the prediction errors at the $K = 50$ held-out locations; for example,
$$\mathrm{MAE}_i = \frac{1}{K}\sum_{k=1}^{K} |Y_i(s_k) - \hat{Y}_i(s_k)|,$$
where $\hat{Y}_i(s_k)$ is the cokriging predictor of $Y_i(s_k)$, with $i = \mathrm{H}, \mathrm{CTC}$ representing the hydrogen content and the CTC, respectively. We repeated the same process 150 times, computing the values of $\mathrm{MAE}_i$, $\mathrm{RMSE}_i$ and $\mathrm{NMSE}_i$ for each variable each time. Table 5 presents the summary of these measures. In general, all models presented equivalent results in terms of predictive capacity. The MatSimpler model showed a better fit than the other models in terms of log-likelihood, AIC and BIC values. The Elps.Time and N.Iter for the MatSimpler model were much lower than for the other models. Thus, the proposed MatSimpler model is competitive with the classical models in the literature.
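For reference, the three error measures can be computed as below (a sketch; the paper does not spell out its NMSE normalizer, so the version here, normalization by the sample variance of the held-out observations, is an assumption on our part):

```python
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def nmse(y, yhat):
    # assumed convention: mean squared error divided by the sample variance of y
    return np.mean((y - yhat) ** 2) / np.var(y)
```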

Example 2: North Africa's minimum and maximum temperature
In this section we illustrate the application of the proposed model to a large dataset. We consider minimum and maximum average temperatures for the years 1970-2000, for a region of North Africa in the month of September, observed at 3061 locations. Data were obtained from WorldClim (www.worldclim.org), a high-spatial-resolution climate database (Fick and Hijmans 2017). Figure 4 shows the spatial locations of the temperatures across the region under study. We detrend the data to remove patterns along the coordinates and consider the resulting residuals as a realization of a bivariate zero-mean Gaussian random field. We transform the coordinate system to UTM using the spTransform function from the sp package (Pebesma and Bivand 2005) and divide the coordinates by a constant to facilitate the estimation process. We consider the Euclidean distance. The empirical variograms of the residuals are presented in Fig. 5.
In the estimation step, aiming to demonstrate the computational efficiency of our proposal for large data sets, we consider the RGW-Simpler model and the classical MatConstr model. To make the estimation process more agile, we fix $\nu_1 = \nu_2$ at 0.5 for the Matérn model and at 0 for the RGW model. For the RGW model we also set $\mu = 1.5$ in order to obtain sparse matrices. The parameter vector to be estimated is $\theta = (\phi_1, \phi_2, \sigma_1, \sigma_2, \rho_{12})^\top$, where the indexes 1 and 2 represent the minimum and maximum temperature parameters, respectively. The N.Iter, Time.by.Iter (in seconds), log-likelihood value and parameter estimates are presented in Table 6. For the MatConstr model, convergence was not successful, so the standard errors of the estimators could not be computed. In contrast, for the RGW model, convergence was successful and the standard errors are shown in parentheses in Table 6. The number of iterations required for estimation and the average time per iteration were also notably lower with our covariance specification. The RGW-Simpler model proved to be more advantageous in terms of estimation time and computational complexity when compared to the MatConstr model.

Simulation study
This section presents a simulation study to evaluate the behavior of the maximum likelihood estimators. We consider the MatSimpler model and explore a bivariate random field under three different scenarios, illustrating situations that could occur in practice and exemplifying the flexibility of the proposed model in each case. Table 7 summarizes the simulated scenarios.
Scenario 1 considers variables with smaller variability, with relatively small values for the variance parameters: $\sigma_1^2 = 0.25$ and $\sigma_2^2 = 1.0$. This scenario also considers lower smoothness for the variables, with $\nu_1 = 0.3$ and $\nu_2 = 0.4$. When $\nu = 0.5$ the Matérn correlation function reduces to the exponential correlation function, so this scenario represents marginal behavior that is less smooth than under the exponential correlation model. Scenario 2 keeps $\nu_1 = 0.3$ and $\nu_2 = 0.4$ and considers greater variability, with variances $\sigma_1^2 = 2.25$ and $\sigma_2^2 = 4.00$. Scenario 3 considers a situation of smaller variability, with variances fixed at $\sigma_1^2 = 0.25$ and $\sigma_2^2 = 1.0$, and higher smoothness values for the variables: $\nu_1 = 0.7$ and $\nu_2 = 1.0$. In this scenario, the marginal behaviors are smoother than under the exponential correlation model.
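The smoothness interpretation above hinges on the Matérn family. A minimal implementation (one common parameterization, with scale $\phi$ and smoothness $\nu$; the function name is ours) makes the special case explicit: at $\nu = 0.5$ it coincides with the exponential correlation $\exp(-h/\phi)$.

```python
import numpy as np
from scipy.special import gamma, kv  # kv: modified Bessel function of the second kind

def matern(h, phi, nu):
    """Matern correlation rho(h) = 2^(1-nu)/Gamma(nu) * (h/phi)^nu * K_nu(h/phi).
    Reduces to the exponential correlation exp(-h/phi) when nu = 0.5."""
    h = np.asarray(h, dtype=float)
    out = np.ones_like(h)          # correlation is 1 at distance zero
    nz = h > 0
    u = h[nz] / phi
    out[nz] = (2.0 ** (1.0 - nu) / gamma(nu)) * (u ** nu) * kv(nu, u)
    return out

# Scenario-1-style smoothness values versus the exponential special case
h = np.linspace(0.0, 0.5, 100)
rho_rough = matern(h, phi=0.05, nu=0.3)   # rougher than exponential
rho_exp = matern(h, phi=0.05, nu=0.5)     # equals exp(-h/0.05)
```

Smaller $\nu$ gives sample paths that are less smooth; $\nu$ beyond 0.5, as in Scenario 3, gives smoother paths.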
In all scenarios we set the scale parameters to $\phi_1 = 0.05$ and $\phi_2 = 0.1$. For the correlation parameter between the variables we consider the values $\rho_{12} \in \{-0.7, -0.4, 0.0, 0.4, 0.7\}$, aiming to illustrate different correlation structures that could occur in practice, ranging from a strong negative correlation ($\rho_{12} = -0.7$) to a strong positive correlation ($\rho_{12} = 0.7$) and including the uncorrelated case $\rho_{12} = 0$.
For each scenario, 500 samples of a bivariate stationary isotropic Gaussian random field, of sizes 100, 225, 400 and 625, were simulated on a regular grid over the unit square. For each scenario we report the expected bias plus and minus the expected standard error of the estimators of the model.
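One replicate of such a simulation can be sketched as follows. For brevity this sketch uses a fully separable Kronecker structure $\Sigma = B \otimes R$ (a simplification of the proposed specification, with $B$ holding $\sigma_1^2$, $\sigma_2^2$ and $\rho_{12}$) and the exponential correlation ($\nu = 0.5$) instead of the scenario smoothness values; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(7)

# 10 x 10 regular grid on the unit square (n = 100, the smallest sample size)
g = np.linspace(0.0, 1.0, 10)
xx, yy = np.meshgrid(g, g)
s = np.column_stack([xx.ravel(), yy.ravel()])
h = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

# Scenario-1-style variances with a strong positive cross-correlation
sig2 = np.array([0.25, 1.0])
rho12 = 0.7
cross = rho12 * np.sqrt(sig2[0] * sig2[1])
B = np.array([[sig2[0], cross],
              [cross, sig2[1]]])
R = np.exp(-h / 0.05)              # exponential spatial correlation, phi_1 = 0.05

# Draw one bivariate replicate via the Cholesky factor of the full covariance
Sigma = np.kron(B, R)
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(Sigma.shape[0]))
z = L @ rng.standard_normal(Sigma.shape[0])
y1, y2 = z[:100], z[100:]          # the two component fields
```

Repeating the draw 500 times and maximizing the likelihood for each replicate yields the sampling distributions summarized in the study.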
To facilitate visualization we follow Bonat and Jørgensen (2016) and Petterle et al. (2019), considering, for each parameter, scales standardized with respect to the standard error at sample size 100; that is, for each parameter, the expected bias and the limits of the confidence intervals are divided by the standard error obtained for the sample of size 100. Standard errors and biases get closer to zero as the sample size increases, as expected for consistent estimators.

Discussion
We presented a covariance specification for multivariate Gaussian random fields, given by a product of matrices, for continuously indexed data. The model is simple yet flexible, allowing different correlation structures and marginal covariance functions. Its structure facilitates specification, estimation, computation and generalization to more than two variables. Since the structure accommodates different marginal covariance specifications, we considered the Matérn correlation function, which is flexible, widely discussed in the geostatistical literature and closely related to the traditionally discussed models. We also considered the reparametrized Generalized Wendland model to illustrate the flexibility of our specification in allowing compactly supported covariance structures, which bring computational advantages for large datasets.
We illustrated the growth of computational time for the MatSimpler model for up to six response variables and 900 sample locations. In this setting, the estimation time was approximately two hours (Fig. 7); precise times are hardware dependent. Nevertheless, compared with some models from the literature in the bivariate case (Fig. 8), our proposal was competitive, presenting a lower average time per iteration and fewer iterations, especially when compared to the MatConstr model, which has the same number of parameters.
The analysis of two datasets illustrates the ability to deal with different covariance structures. The use of compactly supported covariance functions made it possible to handle large data sets thanks to the efficient calculation of Cholesky factors for sparse matrices, showing that the proposed model strikes a good balance between flexibility and computational complexity, with reduced estimation times compared to competing classical approaches.
Future directions include the organization and construction of an R package implementing the proposed approach. Furthermore, the present proposal opens options for future research on non-Gaussian modeling and asymmetric data in multivariate spatial problems.
We obtain the maximum likelihood estimator of $\beta$ by solving $U_\beta = 0$, which results in
$$\hat{\beta} = \left(X^\top \Sigma^{-1} X\right)^{-1} X^\top \Sigma^{-1} y.$$
By similar calculations, the Fisher information matrix for $\beta$ is given by
$$F_\beta = X^\top \Sigma^{-1} X.$$
For $\lambda$, the $(i,j)$th entry of the Fisher information matrix is given by
$$F_{\lambda_{ij}} = \frac{1}{2}\,\mathrm{tr}\!\left(W_{\lambda_i} \Sigma W_{\lambda_j} \Sigma\right),$$
where $W_{\lambda_i} = -\partial \Sigma(\lambda)^{-1}/\partial \lambda_i$. Considering $\hat{\theta} = (\hat{\beta}^\top, \hat{\lambda}^\top)^\top$ the maximum likelihood estimator of $\theta$, the asymptotic distribution of $\hat{\theta}$ is
$$\hat{\theta} \sim \mathrm{N}\!\left(\theta, F_\theta^{-1}\right),$$
where $F_\theta$ denotes the Fisher information matrix of $\theta$. This result is compatible with the increasing domain regime (Cressie 1993). The maximum likelihood estimates of $\lambda$ can be found through Newton's scoring iterative algorithm (Bonat et al. 2020):
$$\lambda^{(i+1)} = \lambda^{(i)} + \alpha\, F_\lambda^{-1} U_\lambda\!\left(\lambda^{(i)}\right),$$
where $\theta = (\beta^\top, \lambda^\top)^\top$ and $\alpha$ controls the step length. Now, let $\rho_r$, for $r = 1, \ldots, p(p-1)/2$, denote the correlation parameters of $\Sigma_b$, and let $\sigma_i^2$, $\phi_i$ and $\nu_i$ denote the variance, scale and smoothness parameters of the marginal covariance matrix $\Sigma_{ii}$, for $i = 1, \ldots, p$.
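A minimal numerical sketch of the scoring update for a covariance parameter, in the simplest possible case $\Sigma(\lambda) = \lambda I$ (chosen here, as an assumption of ours, because the closed-form MLE, the mean of the squared residuals, is available to check against):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
r = rng.normal(0.0, 1.5, n)   # residuals y - X beta
lam = 1.0                      # starting value for the variance parameter
alpha = 1.0                    # step-length control

for _ in range(50):
    # For Sigma(lam) = lam * I: W = -d Sigma^{-1}/d lam = I / lam^2, so
    # score:  U = -1/2 tr(W Sigma) + 1/2 r' W r
    # Fisher: F =  1/2 tr(W Sigma W Sigma)
    U = -0.5 * n / lam + 0.5 * (r @ r) / lam ** 2
    F = 0.5 * n / lam ** 2
    step = alpha * U / F
    lam = lam + step
    if abs(step) < 1e-12:
        break
```

Because $\Sigma$ is linear in $\lambda$ here, the scoring update lands on the closed-form MLE $\hat{\lambda} = \sum_i r_i^2 / n$ in a single step; for the spatial covariance parameters of the paper the same update is iterated to convergence.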
The partial derivative of the matrix-valued covariance function $\Sigma$ with respect to each correlation parameter $\rho_r$ is given in Eq. (12). An analogous procedure can be used to obtain the derivatives with respect to $\phi_i$ and $\nu_i$. Thus, to obtain the derivatives of $\Sigma$ with respect to each parameter, we must compute the partial derivatives in (12). Using the result on partial derivatives of the Cholesky factorization (Särkkä 2013; Bonat and Jørgensen 2016), for $\Sigma = L L^\top$ it follows that
$$\frac{\partial L}{\partial \theta_j} = L\, \Phi\!\left(L^{-1} \frac{\partial \Sigma}{\partial \theta_j} L^{-\top}\right),$$
where $\Phi(\cdot)$ returns the strictly lower triangular part of its argument and half of its diagonal.
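The Cholesky-derivative identity is easy to verify numerically. The sketch below (function names are ours) implements $\partial L/\partial\theta = L\,\Phi(L^{-1}\,\partial\Sigma/\partial\theta\,L^{-\top})$ and checks it against a central finite difference on a random positive definite matrix.

```python
import numpy as np

def phi_op(M):
    """Strictly lower triangular part of M plus half of its diagonal."""
    return np.tril(M, -1) + 0.5 * np.diag(np.diag(M))

def chol_derivative(Sigma, dSigma):
    """dL/dtheta = L * Phi(L^{-1} dSigma L^{-T}) for Sigma = L L^T."""
    L = np.linalg.cholesky(Sigma)
    Linv = np.linalg.inv(L)
    return L @ phi_op(Linv @ dSigma @ Linv.T)

# Check against a central finite difference along Sigma(t) = Sigma + t * D
rng = np.random.default_rng(0)
A0 = rng.normal(size=(5, 5))
Sigma = A0 @ A0.T + 5.0 * np.eye(5)       # well-conditioned SPD matrix
D = rng.normal(size=(5, 5))
D = D + D.T                                # symmetric perturbation direction

eps = 1e-6
num = (np.linalg.cholesky(Sigma + eps * D)
       - np.linalg.cholesky(Sigma - eps * D)) / (2.0 * eps)
ana = chol_derivative(Sigma, D)
```

The identity follows from differentiating $\Sigma = LL^\top$: writing $M = L^{-1}\,\partial L/\partial\theta$ (lower triangular), one gets $L^{-1}\,\partial\Sigma/\partial\theta\,L^{-\top} = M + M^\top$, and $\Phi$ recovers $M$ from that symmetric sum.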
Appendix: Computational results

Author Contributions All authors contributed equally to the design and preparation of the material, with comments and suggestions for improvements. All authors read and approved the final manuscript.
Funding The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.