3.1. Average and variability of soybean and maize departmental yields
As Fig. 7 shows, the departmental average yields (2000/2018) of soybean in the study area present highest values in those departments located towards the southeast of the province (MJ and UN with mean yields of 3463 and 3187 kg ha− 1, respectively), which are also the ones with the lowest standard deviation values (469.2 and 507.6 kg ha− 1, respectively). On the other hand, the lowest average yields are located in RC, JC and GSM (2384, 2455 and 2601 kg ha− 1, respectively). The departments that presented the greatest variation in soybean yields were GR and RS (735.3 and 673.4 kg ha− 1, respectively).
The maize crop also presents highest yields in those departments located in the southeastern of Córdoba province, with average yield values of 9346 and 8827 kg ha− 1 for MJ and UN, respectively (Fig. 8). The lowest yields are located also in the departments of RC and JC, with the addition of RP; their average yields were 5936, 6007 and 6581 kg ha− 1, respectively. Regarding the yield variability, Fig. 8 shows an increase in the standard deviation values from the southern-southeast portion of the Córdoba province, toward north-northwest, with the departments of PRSP and MJ showing the lowest values (732.6 and 938.8 kg ha− 1, respectively). The highest standard deviation values correspond to RS, RP and SJ departments with values of 1484.4, 1428.6 and 1412.9 kg ha− 1, respectively.
The spatial and temporal variations observed in maize and soybean yields are mainly due to the distribution of rainfall, which is more abundant and widespread towards the southeast of the province (Rolla et al., 2018), although they are also affected by other factors such as the type of soil (more sandy towards the southwest, (Jarsún et al., 2006)) and the contribution of water tables in the south of the province (Videla Mensegue et al., 2015).
3.2. Comparison of remote sensing data
The relationships between different remote sensing data used to monitor soybean and maize yields are shown in Fig. 9. The NDVI relationships, both with LST and TVDI, were inverse in both cases, denoting in general terms that the stress condition expressed by the increase of LST or TVDI determines a decrease in the production of plant biomass.
3.2.1. Relationship between NDVI and LST
Figure 9a shows very significant results (p < 0.001) for the 7 months considered as the summer crop season in the region; the correlation analysis between NDVI and LST shows very high negative correlation values, particularly during the intermediate stages of the growing cycle, when soil coverage tends to be complete. For October, the NDVI and LST correlation is a little weaker (-0.648). This behavior may be explained because the plots during the starting of growing season present a heterogeneous mosaic in the region, from those that have not been sown yet, to those that are in the first vegetative stages. A potential limitation in the use of thermal spectral information occurs at the initial stages of growth when the crop coverage is still low, since the soil surface is the dominant component of the scene (Akuraju et al., 2021).
During the intermediate stages, on the other hand, the correlation between NDVI and LST reaches the highest coefficients, when the crop coverage is complete and the greenness is more generalized in each plot. The greatest correlation value (-0.887) is reached in January. For maize, the highest value in the NDVI curve has been associated with the tasseling stage (Viña et al., 2004; Wang et al., 2020). Probably, the highest correlation is the product of the lower proportion of exposed soil at that moment, reducing the number of mixed pixels of vegetation and bare soil. Therefore, when the crop coverage is greater in relation to the highest NDVI value (de la Casa et al., 2018b) and the plots show less exposed soil, the lower (high) surface temperature can be interpreted unequivocally as a consequence of a more (less) intense transpiration rate.
After reaching the highest values in the middle season, correlation values decrease when the crops during March and April go through the last reproductive phases. In the senescence, the field condition again becomes complex in spectral terms (Viña et al., 2004; Martin et al., 2007). Due to the different sowing dates and also to the gradual senescence process, each plot presents a variable mixture of still active plant tissue and another that has lost chlorophyll in different proportions. It is why the lowest correlation values are manifested, being − 0.637 in March and − 0.509 in April for the entire study region.
3.2.2. Relationship between NDVI and TVDI
As Fig. 9b shows, there is also an inverse relationship between NDVI and TVDI. Although with relatively lower correlation values compared to LST, between − 0.146 (April) and − 0.767 (January), they are also statistically significant (p < 0.05). This inverse relationship is consistent to the extent that TVDI is an indicator designed to express the intensity of water stress and, consequently, is associated with a more restricted plant growth. Since TVDI was developed to reduce the influence of exposed soil on water stress, the lower correlation can be interpreted as a result of greater independence of the vegetation biomass.
Various authors (Sandholt et al., 2002; Patel et al., 2019; Wan et al., 2021), show the inverse nature of the relationship between TVDI and the soil water content. As the soils lose moisture and the TVDI values become higher, the stress condition translates into losses of agricultural productivity (Holzman and Rivas, 2016).
3.2.3. Relationship between LST and TVDI
Figure 9c presents the correlation between LST and TVDI monthly data throughout the crop cycle. All correlations are positive and greater than 0.7. This highlights the similarity of both indicators. The close relationship between the indicators used to monitor the crop condition on a regional scale, suggests that their performance to estimate yield should not be very different either.
3.3 Analysis of the relationship between indicators of water stress and maize and soybean yield.
An evaluation of the relationship between the spectral indicators of water stress and the maize and soybean yield in each department was carried out using a linear analysis. The coefficient of determination magnitude is an expression of the degree of variability that each spectral indicator can explain. As shown in Fig. 10, for General Roca department as example, the vertical bars indicate the determination coefficient obtained for the relationship between each spectral index (NDVI, LST and TVDI) with respect to maize and soybean yield, in the different months during the growing season. Superimposed on the R2 values, the curve of the monthly mean values of NDVI represents the seasonal dynamics of vegetation.
In this case, while the maximum R2 values (R2x) for soybean are 0.56, 0.37 and 0.34, for NDVI, LST and TVDI, respectively; the values for maize are 0.49, 0.55 and 0.44, respectively. The occurrence of R2x is always in the month of January. Soybean has 4, 3 and 1 significant (n = 19 and p < 0.05) months for NDVI, LST and TVDI, respectively; being in the case of maize 3, 2 and 1, respectively. A clearly feature observed is the lag between the month in which R2x occurs (R2xm) and when NDVIx occurs (NDVIxm), as shown in Fig. 10, in such a way that the greater relationship between productivity and stress indicators tends to appear a little before the NDVI curve reaches its maximum value for both crops.
3.3.1. Maximum coefficient of determination (R2x)
Figure 12 shows the predictive behavior of the set of spectral indicators in relation to the productivity of both crops. The relationships tend to be non-significant between October and December, during the sowing stage and the beginning of vegetative activity, when the NDVI curve presents minimum values. Next, the R2 presents a peak that generally results in the highest value (R2x) and, later, the correlation tends to decrease, when the NDVI curve reaches the maximum (February) and decreases during the reproductive phases of the cycle. The decline of the R2 values is more gradual during the reproductive stages, particularly for soybean. This behavior is similar to that reported by Liu et al. (2020) for wheat, barley and rapeseed crops in Canada, where the maximum correlation (using NDVI and EVI2) in all crops occurs in the mid-season, although the opportunity of the maximum varies between crops and regions.
According to Fig. 11 evidence, no indicator can be judged superior to another in predictive terms, since NDVI, LST and TVDI in some departments perform better than the rest. To reinforce this concept, a frequency evaluation considering both crops, as well as the departments set, shows that the R2x values from the NDVI present a proportion that is only slightly lower (27.3%) than both LST (36.4%) and TVDI (36.4%). While in the case of soybean exclusively, the proportion changes to 36%, 36% and 27%, for NDVI, LST and TVDI, respectively; for maize the values are 18%, 36% and 45%, respectively. In accordance with this, TVDI emerges as a slightly more generalized indicator to assess maize productivity.
It is worth noting that the predictive level of the univariate linear models, although generalized, is only moderate, since although in all departments there is some indicator that reaches a significant correlation value, in no case does R2x exceed 0.73 for maize (in JC and RS for the relationship with TVDI), nor 0.77 for soybean (in RS and TA for the relationship with NDVI).
The predictive behavior is heterogeneous in the region, to the extent that no particular pattern of R2x values is evidenced in both crops (Fig. 11). The yield estimation in JC, RS and UN is more favorable for maize since the three indices here have higher R2x values. On the other hand, for soybean yield estimation the higher R2x values in GSM, RS and TA correspond to the relationship between yield and NDVI.
3.3.2. Occurrence of R2 maximum (R2xm)
When R2x occurrence values (R2xm) were analyzed, a significant difference (p < 0.1) between soybean and maize crops is recognized for the TVDI, being later in the case of soybean. On the other hand, the NDVI in soybean tends to present the maximum value of R2 significantly (p < 0.05) later than TVDI (Fig. 11).
Liu et al. (2020) analyzed the relationship between crop yields (wheat, rapeseed and barley) and the seasonal patterns of MODIS vegetation indices in Canada. They found higher correlation values when the crop growth peak happens (at the end of July (January in the Southern Hemisphere) and early August (February)). Sakamoto (2020) determined from MODIS WDRVI that the highest correlation with county-scale yields occurs 13 days before the maize silking stage and 6 days before the soybean pod setting stage. Similar results were obtained in this study for maize and soybean, although a coarser temporal resolution was used here. Johnson (2014) determined that maize and soybean yields in the US production region were positively correlated with the mid-summer NDVI and negatively with the LST at the same period, which is consistent with the results obtained in this study.
3.4. Regression models with dummy variables
The statistical models to estimate maize and soybean yields were developed separately, by taking in account the different physiological nature of both crops. In addition, as the spectral indicators perform in a particular way for each region, probably due to the influence of territorial differences related to the edaphic variability and technological conditions, a specific relationship for each region had to be established.
Several multiple regression models were assessed by including different combinations of the spectral explanatory variables under study (NDVI, LST and TVDI). For both, maize and soybean, Table 1 shows the five selected models for each crop (maize and soybean) according to their good explanatory performance and parsimony condition. The adjusted R2, AIC, RSE, and degrees of freedom were calculated for selection and control of each proposed model. A technological component (TC) represented by the planting year (long-term yield variation) was also considered. For maize, the term TC is not always included in the models because, depending on the explanatory variables incorporated, the model may be significant (p < 0.05) or not. Instead, the technological term is always significant for soybean, showing that the effect of temporal tendency on yield is particularly relevant for this crop.
Models SM1 and SM2 for soybean and MM1 and MM2 for maize show a similar explanatory capacity of yield. These models have the particularity of incorporating as an independent variable the spectral indicator that presents the highest adjusted R2 value (LST from January for SM1/MM1 and NDVI from February for SM2/MM2).
The TVDI of January was incorporated only in MM5 and, in general, TVDI does not show a predictive capacity much better than the indices from which it was calculated (NDVI and LST). In this sense, given that the calculation procedure to obtain it is more complex, it seems more labor-saving to use directly the information of the NDVI and LST products.
Table 1. Regression models with dummy variables preselected for their explanatory capacity, and its equations. Also Adjusted R-squared (Adj. R2), Akaike Information Criterion (AIC), Residual Standard Error (RSE) and Degrees of Freedom (DF) values are presented.
Model
|
Equation
|
Adj. R2
|
AIC
|
RSE
|
DF
|
SM1
|
SY= int.+β1 LSTJ+β2 TC+ EC+RE
|
0.635
|
3108.1
|
396.2
|
196
|
SM2
|
SY= int.+β1 NDVIF+β2 TC+ EC+RE
|
0.626
|
3112.8
|
400.7
|
196
|
SM3
|
SY= int.+β1 J.LSTJ+β2 LSTF+β3 TC+ EC+RE
|
0.713
|
3058.8
|
351.3
|
195
|
SM4
|
SY= int.+β1 LSTJ+β2 NDVIF+β3 TC+ EC+RE
|
0.707
|
3062.8
|
354.7
|
195
|
SM5
|
SY= int.+β1 LSTJ+β2 LSTF+β3 NDVIF+β4TC+ EC+RE
|
0.722
|
3053.2
|
345.9
|
194
|
MM1
|
MY= int.+β1 LSTJ+β2 TC+ EC+RE
|
0.701
|
3424.7
|
845.0
|
196
|
MM2
|
MY= int.+β1 NDVIF+EC+RE
|
0.651
|
3456.1
|
912.9
|
197
|
MM3
|
MY= int.+β1 LSTJ +β2 LSTF+β3 TC+ EC+RE
|
0.777
|
3364.5
|
729.9
|
195
|
MM4
|
MY= int.+β1 LSTJ+β2 NDVIF+EC+RE
|
0.739
|
3395.8
|
788.6
|
196
|
MM5
|
MY= int.+β1 TDVIJ+β2 TC+ EC+[dept. TDVIJ]+RE
|
0.726
|
3415.2
|
808.2
|
186
|
References: SY and MY: soybean and maize yield. int.: intercept. EC: environmental component represented by the departments. TC: technological component. RE: residual error. β1…βn: regression coefficients. LSTx, NDVIx and TDVIx: are the LST, NDVI and TDVI of x month.
Although the models that achieve the greatest predictive capacity in absolute terms are MM3 and SM5, an interesting option are the SM4/MM4 models, because they not only present an appropriate performance (Table 1 and Figure 12), but also allow explaining yield by using January LST, as well as February NDVI. The use of models with a similar structure allows a more direct comparison between both crops, as well as to analyze a drought impact beyond the crop in question. Therefore the predictive behavior of these models (SM4 and MM4) is analyzed in more detail.
These models' conformation differ only in the TC variable, which the soybean model includes due to its significant character (p <0.05), and allows to contrast the effect of the explanatory variables (LST and NDVI, for January and February, respectively) on crop yield. In this sense, negative coefficients for LST (-89.6 and -215.2) and positive ones for NDVI (5067.1 and 9256.7) were obtained, for soybean and maize, respectively. The coefficient for TC is positive for soybean and its magnitude indicates that for each year the yield increases 27.5 kg ha-1, which suggests a positive impact of the technological contribution on soybean yields at regional level. However, this positive influence is not manifested in all sectors, as Figure 12 shows for RC and JC.
The good performance of the NDVI in February to explain crop yield variability may be associated with a lag effect of water stress impact during the previous month (with LST in January, associated to the beginning of the reproductive stage), and which manifests later in the growing season by the reduction of biomass, as well as by the reduction of the number and size of reproductive organs. Similar results were obtained by Johnson (2014) using NDVI and LST from MODIS, who reported a maximum correlation value between NDVI and maize yield in mid-summer and a similar but inverse LST response.
Because the region has an heterogeneous productive capacity, it is necessary to incorporate the departments into the model as a dummy variable to represent their particular productive potential for both crops. Both selected models show the department variable as significant (p <0.05), which reinforces this interpretation. In addition, as the interactions between the dummy variable and the spectral and TC indicators are not significant (p >0.05), they were not included. This behavior is supported by Figure 12, where the similarity of the slopes for each covariate with crop yields is observed in most of the departments.
The productive behavior of each year responds to a particular meteorological condition. For this reason, it is important to assess and validate the general model (fitted models with all the data, period 2000-2018) by analyzing year by year the error parameters changes. The consistency of the selected models was evaluated through the adaptation of Leave One Out Cross-Validation (LOOCV) test, for which the RSE was calculated by using the validation data of each year. The RSE obtained each year was, mostly, lower than that obtained for the general model. The mean RSE obtained for the set of years was 279.4 and 579.4 kg ha-1 for soybean and maize, respectively, which are below those obtained for the general models (354.7 and 788.6 kg ha-1, respectively).
The relationships between observed and estimated yields by the multiple regression analysis for maize and soybean crops at a regional scale are presented in Figure 13. As Figure 13 shows, the general adjustment of the models is moderate and similar for both crops, reaching an R2 value of 0.60 for maize and 0.58 for soybean, in correspondence to the also moderate individual performance of the spectral variables for each crop and department (Figure 10). Although the degree of explanation of the variability is moderate, Liu et al. (2020) also present R2 values between 0.53 and 0.7 for crop yield estimation models in Canada.
In summary, the regression models proposed with dummy variables, exhibit a stable and relatively accurate performance for estimating soybean and maize yields at regional level, using NDVI and LST of mid-season as model inputs.