Evaluating spectral indices from MODIS to predict maize and soybean regional yields

doi:10.21203/rs.3.rs-3224403/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

A regression model with spectral information and dummy variables was developed and evaluated for predicting regional maize and soybean yield in the rain-fed agricultural region of Córdoba, Argentina. The study area comprises eleven departments that currently harvest more than 80% of the provincial production of maize and soybean. The monthly Normalized Difference Vegetation Index (NDVI) product (MOD13C2) and the daytime Land Surface Temperature (LST) product (MOD11C3), derived from the MODIS sensor on board the TERRA satellite, were used as model inputs. The Temperature Vegetation Dryness Index (TVDI) was also calculated from these data and assessed. In total, 19 summer crop seasons were analyzed, from 2000/2001 to 2018/2019. NDVI shows a close, negative relationship with both LST and TVDI. The best regression models with dummy variables selected to estimate yield variation on a regional scale combine spectral information (LST from January and NDVI from February) with factors linked to the edaphic and management differences of each department and, in the soybean model, to technological improvement. Model accuracy was verified using an adaptation of the Leave One Out Cross-Validation technique (LOOCV_ad). The Residual Standard Error (RSE) obtained for each year was mostly lower than that obtained for the entire record (general models). The mean RSE for the set of years was 279.4 and 579.4 kg ha^-1 for soybean and maize, respectively, below the values obtained from the general models (354.7 and 788.6 kg ha^-1, respectively).

NDVI

LST

TVDI

Water stress

Accurate crop production predictions before harvest are important for countries where agriculture contributes substantially to the economy. This information is also relevant, particularly for small nation states and developing countries, for decisions about logistics, commercial activities and food security, among other uses. Agricultural yield data are conventionally obtained by survey methods that are not only time consuming but also fail to guarantee regional-scale timeliness (Kingra et al., 2021; Zhu et al., 2021).

Different methods and data sources are commonly used to forecast crop yield: field studies, statistical regressions between historical yield records and seasonal variables (agrometeorological or remote sensing data), crop simulation models, or the integration of statistical models with process-based crop simulation (Basso and Liu, 2019). With the fast development of sensors, remote sensing has gained attention for crop monitoring, from the field scale to large regional areas (Zhu et al., 2021).

Drought stands out as the predominant cause of low yields worldwide (Leng and Hall, 2019; Wang et al., 2020), and it constitutes the main climatic restriction faced by rain-fed agriculture in the central region of Argentina. In a warm-temperate environment such as that of the province of Córdoba, Argentina, during the summer season, the insufficient or untimely supply of water is the main factor that limits the productive capacity of crops and generates yield gaps in both maize (de la Casa et al., 2019) and soybean (de la Casa et al., 2018a).

Vegetation indices (VI), such as the Normalized Difference Vegetation Index (NDVI), are simple and effective algorithms conceived from spectral data to remotely evaluate the type of vegetation cover, its vigor and growth dynamics (Xue and Su, 2017; Nolasco et al., 2021). The negative anomaly of VI would be indicative of a lower amount of biomass produced with respect to a reference value in a region and would highlight a possible drought scenario or other restrictive growth contingency. In addition, since VI are directly related to the amount of vegetation but not necessarily with harvested organs, variability of VI does not always effectively represent the variability of yield caused by environmental stress (Sakamoto, 2020).

The yield reduction due to water stress depends not only on the severity of drought but also on the sensitivity of the different phenological stages to episodes of water shortage. Bolton and Friedl (2013) employed a two-band variant of the Enhanced Vegetation Index (EVI2) and the Normalized Difference Water Index (NDWI) to predict U.S. crop yields. They found that the best dates to predict maize and soybean yields were 65–75 days and 80 days after the MODIS-derived green-up stage, respectively. Liu et al. (2020) used multiple linear regression to estimate wheat, barley and rapeseed yield, using alternatively NDVI, EVI2, Gross Primary Production (GPP) and Net Primary Production (NPP) MODIS data. Their general model consisted only of a satellite-based indicator and the year, the latter to account for the increase in yield over time due to the incorporation of technology.

In a scenario where soil water availability is insufficient to meet atmospheric demand, the transpiration rate is necessarily reduced. For this reason, the plant must dissipate more energy as sensible heat, which implies an increase in leaf and canopy temperature. Al Faisal et al. (2021) evaluated seasonal Land Surface Temperature (LST) changes from 2000 to 2020 with Landsat data at 10-year intervals in Bangladesh. They found that the average LST increase in this region was associated with water availability and crop yield, as well as with an increase in drought vulnerability and extreme weather events in the study area. Along the same lines, Johnson (2014) developed yield estimation models for maize and soybean using only NDVI and daytime LST data.

Another procedure developed to evaluate drought is the Temperature Vegetation Dryness Index (TVDI), proposed by Sandholt et al. (2002), that is determined directly from the relationship between LST and NDVI. The TVDI proved to have an acceptable relationship with surface soil moisture, both in a temporal and regional context. Wan et al. (2021) identified the sensitive period of maize water demand by linear regression analysis of the measured maize yield and TVDI values for eight-day intervals. Holzman and Rivas (2016) developed predictive models of maize yield on a regional scale based on TVDI.

The objective of this work was to develop and evaluate predictive models to estimate regional maize and soybean yield using NDVI, LST and their combination in the TVDI as water stress indicators, considering different time windows during the crop growing season. With a view to establishing predictive yield models for both crops for use in an early warning system, two complementary objectives were proposed: to evaluate the differences between crops that arise from the different productive response of each species to drought, and to identify the moment in the growing season when this spectral information reaches its greatest predictive potential for yield.

2.1. Study area.

The study area includes eleven departments located in the south east of Córdoba province (Fig. 1). Currently, more than 80% of the provincial maize and soybean production is harvested in these departments (MAGyP, 2020).

In this region the soils are classified as Entic and Typic Haplustolls. They present a slightly undulating flat relief developed on loessic material of silt loam texture, with a small slope to the east. In the south and southwest of the study area, the soils have a higher percentage of sand, which gives rise to three subregions: sandy pampa, floodable sandy pampa and medanous pampa (Jarsún et al., 2006).

The climate in the study area is classified as dry sub-humid (Mather, 1965). The average annual rainfall is approximately 800 mm, is concentrated in summer and shows a decreasing gradient from east to west (Rolla et al., 2018). In this area, agricultural production is mainly rainfed and the two predominant summer crops are soybean and maize (Sayago et al., 2017). In this work, the growing season for summer crops is considered to run from October to April of the following year. Plots larger than 50 ha cover more than 90% of the agricultural area (Ghida Daza and Sánchez, 2009). In the study area, agricultural production predominates, or is associated with livestock, to a greater extent than in the rest of the province (Guida Daza et al., 2019).

2.2. Evolution of maize and soybean sowing area and grain yield

The Ministerio de Agricultura, Ganadería y Pesca of Argentina provided departmental maize and soybean yield data for the 11 major rainfed agriculture departments in Córdoba province (MAGyP, 2020).

A main aspect to highlight is the significant variation in the sowing area that both crops experienced in the region between 2000 and 2018. To smooth out short-term variability caused by changes in markets, weather conditions and similar factors, the average planted area was calculated for the periods 2000–2004, 2005–2009, 2010–2014 and 2015–2018.

According to Fig. 2, which shows the fraction of each department's total area sown with each crop, maize occupied less than 10% of the territory at the beginning of the series and the fraction has since doubled to about 20%. A similar but larger increase is observed for soybean: the sown fraction ranged between 10 and 20% at the beginning and now reaches between 40 and 60%, depending on the department, although with a decreasing trend in recent years.
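The period averaging described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `period_means` and the sown-area values are made up for the example, and only the four period boundaries come from the text.

```python
from statistics import mean

# Period boundaries as stated in the text (inclusive years).
PERIODS = [(2000, 2004), (2005, 2009), (2010, 2014), (2015, 2018)]

def period_means(area_by_year):
    """Average a {year: sown_area} mapping over each period.

    Years missing from the mapping are skipped, so a period with
    incomplete records is averaged over the years that are available.
    """
    result = {}
    for start, end in PERIODS:
        values = [area_by_year[y] for y in range(start, end + 1) if y in area_by_year]
        if values:
            result[f"{start}-{end}"] = mean(values)
    return result

# Hypothetical sown area (ha) for one department, growing 50 ha per year.
example_area = {year: 1000.0 + 50.0 * (year - 2000) for year in range(2000, 2019)}
averages = period_means(example_area)
for period, value in averages.items():
    print(period, value)
```

Dividing each period mean by the department's total area would give the sown fraction that Fig. 2 reports.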

Figure 2 also shows that this behavior has been very uniform throughout the region and denotes a sustained increase in agricultural activity in Córdoba province in response to the persistence of favorable climatic conditions (de la Casa and Ovando, 2014).

This information also shows that although the spectral response in the entire study area corresponds to a large extent to the production of maize and soybean, the signal originates from a mixture of both crops with a particular evolution of planting area in each department during the analysis period. For this reason, it is considered a priori that this analysis developed on a regional (departmental) scale has particularly more potential to represent spatial and temporal variations of the agricultural behavior in a broad sense.

2.3. NDVI and LST Data

In this study, the monthly NDVI product (MOD13C2) and the monthly LST product (MOD11C3), derived from the MODIS sensor on board the TERRA satellite, were used.

The MOD13C2 Version 6 product provides NDVI values on a per-pixel basis in a 0.05 degree latitude/longitude Climate Modeling Grid (CMG). A CMG granule is a geographic grid with 7200 columns and 3600 rows representing the entire globe. More details on MODIS NDVI products can be found in Huete et al. (2002).

The MOD11C3 Version 6 product provides monthly daytime LST values, also at the 0.05 degree latitude/longitude CMG resolution. For more details about the MOD11C3 product, refer to Wan (2014).

According to the availability of yield and remote sensing data, 19 summer crop seasons, each running from October to April of the following year, were used (from 2000/2001 to 2018/2019). A spatial subset of the MOD13C2 and MOD11C3 products was used, defined by a bounding box (from 29°S 66°W to 35°S 61°W) that contains the entire Córdoba province.
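To make the grid geometry concrete, the sketch below converts the bounding box into row and column ranges of the 0.05 degree CMG. It is only an illustration: the study obtained its subsets through Giovanni, and the sketch assumes the usual CMG layout in which the grid origin is the upper-left corner at 90°N, 180°W with rows running north to south. The function names are hypothetical.

```python
import math

CELLS_PER_DEGREE = 20          # 0.05 degree cells -> 20 cells per degree
N_ROWS, N_COLS = 3600, 7200    # global CMG dimensions given in the text

def cmg_index(lat, lon):
    """Return (row, col) of the CMG cell whose upper-left corner is at or
    north-west of (lat, lon). Assumes origin at 90N, 180W."""
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)

def bbox_window(north, west, south, east):
    """Return half-open (row_start, row_stop), (col_start, col_stop) ranges."""
    row0, col0 = cmg_index(north, west)
    row1 = min(math.ceil((90.0 - south) * CELLS_PER_DEGREE), N_ROWS)
    col1 = min(math.ceil((east + 180.0) * CELLS_PER_DEGREE), N_COLS)
    return (row0, row1), (col0, col1)

# Bounding box from the text: 29S 66W (upper left) to 35S 61W (lower right).
rows, cols = bbox_window(north=-29.0, west=-66.0, south=-35.0, east=-61.0)
n_rows = rows[1] - rows[0]
n_cols = cols[1] - cols[0]
print(rows, cols, n_rows, n_cols)
```

Under these assumptions the box spans 6 degrees of latitude and 5 degrees of longitude, that is, 120 rows by 100 columns per monthly image.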

Both products were obtained from the Giovanni online data system developed and maintained by NASA GES DISC (Acker and Leptoukh, 2007).

2.4. Temperature vegetation dryness index (TVDI)

In each pair of NDVI-LST images, the NDVI range between 0.15 (bare soil) and the maximum NDVI value was considered. That range was divided into 10 bins and within each of them the 0.1 and 99.9 percentiles of LST were calculated.

Linear regression was performed between the 99.9 LST percentiles of each bin and the central NDVI of each interval, thus the equation of the dry line was obtained. In a similar way, the 0.05 LST percentiles were used to obtain the equation of the wet line. As an example, Fig. 3 shows the LST-NDVI space and the determination of the dry and wet lines for the LST and NDVI images corresponding to November 2001.

Once the dry and wet lines were available, a TVDI image was generated by calculating the TVDI value of each pixel from its NDVI and LST values (Eq. 1):

\({TVDI}_{i}=\frac{{LST}_{i}-{LST}_{w}}{{LST}_{d}-{LST}_{w}}\) (Eq. 1)

where LST_i is the value of LST in a given pixel, and LST_w and LST_d are the values obtained from NDVI_i with the equations of the wet and dry lines, respectively. If the resulting TVDI_i was greater than 1 it was set to 1, and if it was less than 0 it was set to 0.
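The procedure of this section can be sketched in a few functions. This is a hedged illustration rather than the authors' implementation: the function names, the linear-interpolation percentile and the synthetic pixel values are assumptions. The percentile used for the wet line is left as a parameter because the text cites both 0.1 and 0.05; the default of 0.1 is a choice made here.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_edges(ndvi, lst, ndvi_min=0.15, n_bins=10, q_wet=0.1, q_dry=99.9):
    """Fit the wet and dry lines from per-bin LST percentiles."""
    width = (max(ndvi) - ndvi_min) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v, t in zip(ndvi, lst):
        if v < ndvi_min:
            continue  # below the bare-soil threshold
        k = min(int((v - ndvi_min) / width), n_bins - 1)
        bins[k].append(t)
    centres, wet, dry = [], [], []
    for k, temps in enumerate(bins):
        if temps:  # skip empty bins
            centres.append(ndvi_min + (k + 0.5) * width)
            wet.append(percentile(temps, q_wet))
            dry.append(percentile(temps, q_dry))
    return fit_line(centres, wet), fit_line(centres, dry)

def tvdi(ndvi_i, lst_i, wet_line, dry_line):
    """Eq. 1 with the result limited to the range [0, 1]."""
    lst_w = wet_line[0] + wet_line[1] * ndvi_i
    lst_d = dry_line[0] + dry_line[1] * ndvi_i
    value = (lst_i - lst_w) / (lst_d - lst_w)
    return min(1.0, max(0.0, value))

# Synthetic pixels: LST falls with NDVI, with an alternating +/-3 K spread.
ndvi = [0.15 + 0.007 * i for i in range(100)]
lst = [320.0 - 20.0 * v + (3.0 if i % 2 else -3.0) for i, v in enumerate(ndvi)]
wet_line, dry_line = fit_edges(ndvi, lst)
print(tvdi(0.5, 310.0, wet_line, dry_line))
```

The sketch does not guard against a pixel where the two lines cross (LST_d equal to LST_w), which would need separate handling in real images.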

For each set of TVDI, NDVI and LST images, the average of all the pixels within each department was extracted, and these averages were correlated with the departmental soybean and maize yield values. The flowchart is shown in Fig. 4.
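The departmental averaging can be illustrated with a small zonal-mean sketch. It assumes, for illustration only, that each image is a list of rows and that a second grid of the same shape holds the department code of each pixel (None outside the study area); the function name and the values are hypothetical.

```python
def departmental_means(values, labels):
    """Mean of all valid pixels per department.

    values: 2-D list of pixel values (None marks a missing pixel).
    labels: 2-D list of department codes (None marks pixels outside
            every department).
    """
    sums, counts = {}, {}
    for value_row, label_row in zip(values, labels):
        for value, dept in zip(value_row, label_row):
            if dept is None or value is None:
                continue
            sums[dept] = sums.get(dept, 0.0) + value
            counts[dept] = counts.get(dept, 0) + 1
    return {dept: sums[dept] / counts[dept] for dept in sums}

# Tiny made-up NDVI image with two departments.
ndvi_image = [[0.2, 0.4, None],
              [0.6, 0.8, 0.5]]
dept_grid = [["MJ", "MJ", "MJ"],
             ["UN", None, "UN"]]
means = departmental_means(ndvi_image, dept_grid)
print(means)
```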

2.5. Regression models with dummy variables

The models comprised three components (Eq. 2 and Fig. 5): a spectral component (SI), consisting of NDVI, LST and/or TVDI for the months that best explain yield variability; an environmental component (EC), represented by the departments, which accounts for the heterogeneous productive capacity of the region; and a technological component (TC), which uses the harvest year as a proxy. The EC and TC were considered through the use of dummy variables. The best five models for maize and the best five for soybean were selected for further analysis.

\(\text{Crop yield}=SI+EC+TC+\text{residual error}\) (Eq. 2)

To evaluate the explanatory performance of the models, the Adjusted R-Squared (Adj. R²), the Akaike Information Criterion (AIC) and the Residual Standard Error (RSE) were used. In addition, the models were selected on the basis of parsimony and of the reasonableness of the sign and size of the coefficients that relate the explanatory variables to crop yield.
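A minimal sketch of such a model, fitted by ordinary least squares with one dummy column per department (the first department acting as baseline), is given below together with the three statistics. It is an illustration under stated assumptions, not the authors' code: the data are made up, the year is entered as an offset from 2000 for numerical stability, and the AIC uses the Gaussian log-likelihood form with the error variance counted as a parameter, which may differ from the software used in the study.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def design_row(spectral, year, dept, depts):
    """Intercept, SI covariates, TC (years since 2000) and EC dummies."""
    dummies = [1.0 if dept == d else 0.0 for d in depts[1:]]
    return [1.0] + list(spectral) + [float(year - 2000)] + dummies

def fit_ols(X, y):
    """Least squares via the normal equations, plus fit statistics."""
    n, p = len(y), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    df = n - p
    return {
        "beta": beta,
        "df": df,
        "rse": math.sqrt(rss / df),
        "adj_r2": 1.0 - (rss / df) / (tss / (n - 1)),
        "aic": n * (math.log(2 * math.pi * rss / n) + 1) + 2 * (p + 1),
    }

# Made-up records: (department, year, January LST, February NDVI, yield).
DEPTS = ["A", "B"]
ROWS = [("A", 2000, 31.0, 0.70, 2900.0), ("A", 2001, 33.5, 0.62, 2500.0),
        ("A", 2002, 30.2, 0.74, 3100.0), ("A", 2003, 32.1, 0.66, 2800.0),
        ("B", 2000, 30.5, 0.72, 3300.0), ("B", 2001, 34.0, 0.60, 2700.0),
        ("B", 2002, 29.8, 0.76, 3500.0), ("B", 2003, 31.6, 0.69, 3150.0)]
X = [design_row((lst, ndvi), year, dept, DEPTS) for dept, year, lst, ndvi, _ in ROWS]
y = [row[4] for row in ROWS]
model = fit_ols(X, y)
print(model["df"], round(model["rse"], 1), round(model["adj_r2"], 3))
```

With 11 departments, the same layout gives 10 dummy columns, which is consistent with the degrees of freedom that a 209-record dataset (11 departments by 19 seasons) would leave.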

2.6. Adaptation of the Leave One Out Cross-Validation (LOOCV_ad)

The adapted Leave One Out Cross-Validation (LOOCV_ad) test was used to validate the two selected models (one for each crop). The procedure takes the estimated and observed yield data of one year at a time as the validation set and the remaining years as the training set (Fig. 6). The procedure was therefore carried out 19 times for each model (once for each year from 2000 to 2018). To analyze the stability of the model, the RSE was calculated for each year and each selected model.
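The leave-one-year-out loop can be sketched as follows. Two simplifications are assumed here and are not taken from the source: a one-covariate linear model stands in for the selected multiple regression models, and the per-year error is computed as the root mean squared prediction error over the held-out records, since the text does not spell out how the yearly RSE was calculated. All names and data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def leave_one_year_out(records):
    """records: list of (year, covariate, observed_yield).

    Each year is held out in turn, the model is fitted on the other
    years, and the prediction error is computed on the held-out year.
    Returns {year: error}.
    """
    errors = {}
    for held_out in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        a, b = fit_line([r[1] for r in train], [r[2] for r in train])
        squared = [(obs - (a + b * x)) ** 2 for _, x, obs in test]
        errors[held_out] = math.sqrt(sum(squared) / len(squared))
    return errors

# Made-up records: two departments per year, covariate = February NDVI.
records = [(2000, 0.60, 2500.0), (2000, 0.70, 3050.0),
           (2001, 0.55, 2300.0), (2001, 0.65, 2700.0),
           (2002, 0.75, 3300.0), (2002, 0.68, 2950.0)]
yearly_error = leave_one_year_out(records)
mean_error = sum(yearly_error.values()) / len(yearly_error)
print(yearly_error, mean_error)
```

The mean over the held-out years plays the role of the mean RSE that is compared with the RSE of the model fitted to the whole record.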

3.1. Average and variability of soybean and maize departmental yields

As Fig. 7 shows, the departmental average soybean yields (2000–2018) in the study area are highest in the departments located towards the southeast of the province (MJ and UN, with mean yields of 3463 and 3187 kg ha^-1, respectively), which are also the ones with the lowest standard deviation values (469.2 and 507.6 kg ha^-1, respectively). The lowest average yields, on the other hand, occur in RC, JC and GSM (2384, 2455 and 2601 kg ha^-1, respectively). The departments with the greatest variation in soybean yields were GR and RS (735.3 and 673.4 kg ha^-1, respectively).

Maize also presents its highest yields in the departments located in the southeast of Córdoba province, with average yields of 9346 and 8827 kg ha^-1 for MJ and UN, respectively (Fig. 8). The lowest yields also occur in the departments of RC and JC, with the addition of RP; their average yields were 5936, 6007 and 6581 kg ha^-1, respectively. Regarding yield variability, Fig. 8 shows an increase in the standard deviation from the south-southeast portion of Córdoba province toward the north-northwest, with the departments of PRSP and MJ showing the lowest values (732.6 and 938.8 kg ha^-1, respectively). The highest standard deviation values correspond to the RS, RP and SJ departments, with 1484.4, 1428.6 and 1412.9 kg ha^-1, respectively.

The spatial and temporal variations observed in maize and soybean yields are mainly due to the distribution of rainfall, which is more abundant and widespread towards the southeast of the province (Rolla et al., 2018), although they are also affected by other factors such as the type of soil, which is sandier towards the southwest (Jarsún et al., 2006), and the contribution of water tables in the south of the province (Videla Mensegue et al., 2015).

3.2. Comparison of remote sensing data

The relationships between different remote sensing data used to monitor soybean and maize yields are shown in Fig. 9. The NDVI relationships, both with LST and TVDI, were inverse in both cases, denoting in general terms that the stress condition expressed by the increase of LST or TVDI determines a decrease in the production of plant biomass.

3.2.1. Relationship between NDVI and LST

Figure 9a shows highly significant results (p < 0.001) for the 7 months considered as the summer crop season in the region; the correlation analysis between NDVI and LST shows very high negative correlation values, particularly during the intermediate stages of the growing cycle, when soil coverage tends to be complete. For October, the correlation between NDVI and LST is somewhat weaker (-0.648). This may be explained by the fact that, at the start of the growing season, the plots form a heterogeneous mosaic in the region, from those that have not yet been sown to those that are in the first vegetative stages. A potential limitation in the use of thermal spectral information occurs at the initial stages of growth, when crop coverage is still low and the soil surface is the dominant component of the scene (Akuraju et al., 2021).

During the intermediate stages, on the other hand, the correlation between NDVI and LST reaches its highest coefficients, when crop coverage is complete and greenness is more generalized in each plot. The strongest correlation (-0.887) is reached in January. For maize, the highest value in the NDVI curve has been associated with the tasseling stage (Viña et al., 2004; Wang et al., 2020). The highest correlation is probably the product of the lower proportion of exposed soil at that moment, which reduces the number of mixed pixels of vegetation and bare soil. Therefore, when crop coverage is greatest, in correspondence with the highest NDVI value (de la Casa et al., 2018b), and the plots show less exposed soil, a lower (higher) surface temperature can be interpreted unequivocally as the consequence of a more (less) intense transpiration rate.

After reaching their highest values in mid-season, the correlation values decrease in March and April, when the crops go through the last reproductive phases. During senescence, the field condition again becomes complex in spectral terms (Viña et al., 2004; Martin et al., 2007). Due to the different sowing dates and to the gradual senescence process, each plot presents a variable mixture of still-active plant tissue and tissue that has lost chlorophyll in different proportions. This is why the weakest correlations appear at this time: -0.637 in March and -0.509 in April for the entire study region.

3.2.2. Relationship between NDVI and TVDI

As Fig. 9b shows, there is also an inverse relationship between NDVI and TVDI. Although the correlation values are weaker than those obtained with LST, ranging between -0.146 (April) and -0.767 (January), they are also statistically significant (p < 0.05). This inverse relationship is consistent with the fact that TVDI is an indicator designed to express the intensity of water stress and, consequently, is associated with more restricted plant growth. Since TVDI was developed to reduce the influence of exposed soil on the estimation of water stress, the weaker correlation can be interpreted as a result of its greater independence from vegetation biomass.

Various authors (Sandholt et al., 2002; Patel et al., 2019; Wan et al., 2021) show the inverse nature of the relationship between TVDI and soil water content. As the soils lose moisture and the TVDI values become higher, the stress condition translates into losses of agricultural productivity (Holzman and Rivas, 2016).

3.2.3. Relationship between LST and TVDI

Figure 9c presents the correlation between monthly LST and TVDI data throughout the crop cycle. All correlations are positive and greater than 0.7, which highlights the similarity of both indicators. The close relationship between the indicators used to monitor crop condition on a regional scale suggests that their performance in estimating yield should not differ greatly either.

3.3. Analysis of the relationship between indicators of water stress and maize and soybean yield.

The relationship between the spectral indicators of water stress and maize and soybean yield in each department was evaluated using linear analysis. The magnitude of the coefficient of determination expresses the share of yield variability that each spectral indicator can explain. Fig. 10 takes the General Roca department as an example: the vertical bars indicate the coefficient of determination obtained for the relationship between each spectral index (NDVI, LST and TVDI) and maize and soybean yield in the different months of the growing season. Superimposed on the R² values, the curve of the monthly mean NDVI represents the seasonal dynamics of vegetation.

In this case, the maximum R² values (R²x) for soybean are 0.56, 0.37 and 0.34 for NDVI, LST and TVDI, respectively, while the values for maize are 0.49, 0.55 and 0.44. R²x always occurs in January. Soybean has 4, 3 and 1 significant months (n = 19, p < 0.05) for NDVI, LST and TVDI, respectively; maize has 3, 2 and 1. A clear feature is the lag between the month in which R²x occurs (R²xm) and the month in which the maximum NDVI (NDVIx) occurs (NDVIxm), as shown in Fig. 10: the strongest relationship between productivity and the stress indicators tends to appear shortly before the NDVI curve reaches its maximum, for both crops.
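The month-by-month analysis can be illustrated with a short sketch that computes R² between one index and yield for each month and then reports the maximum (R²x) and the month in which it occurs (R²xm). The data and function names are hypothetical; with a single predictor, R² is simply the squared Pearson correlation, which is what the sketch computes.

```python
MONTHS = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr"]

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

def best_month(index_by_month, yields):
    """Return (R2xm, R2x, all monthly R2) for one index and one crop."""
    r2 = {m: r_squared(index_by_month[m], yields)
          for m in MONTHS if m in index_by_month}
    month = max(r2, key=r2.get)
    return month, r2[month], r2

# Made-up departmental series: four seasons, three months of NDVI.
yields = [2400.0, 3100.0, 2700.0, 3400.0]
ndvi_by_month = {
    "Dec": [0.41, 0.40, 0.44, 0.42],
    "Jan": [0.58, 0.71, 0.63, 0.76],
    "Feb": [0.70, 0.74, 0.69, 0.78],
}
r2xm, r2x, monthly_r2 = best_month(ndvi_by_month, yields)
print(r2xm, round(r2x, 2))
```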

3.3.1. Maximum coefficient of determination (R²x)

Figure 12 shows the predictive behavior of the set of spectral indicators in relation to the productivity of both crops. The relationships tend to be non-significant between October and December, during the sowing stage and the beginning of vegetative activity, when the NDVI curve presents minimum values. The R² then presents a peak that generally corresponds to the highest value (R²x); later, the correlation tends to decrease as the NDVI curve reaches its maximum (February) and declines during the reproductive phases of the cycle. The decline of the R² values is more gradual during the reproductive stages, particularly for soybean. This behavior is similar to that reported by Liu et al. (2020) for wheat, barley and rapeseed crops in Canada, where the maximum correlation (using NDVI and EVI2) occurs in mid-season in all crops, although the timing of the maximum varies between crops and regions.

According to Fig. 11, no indicator can be judged superior to another in predictive terms, since each of NDVI, LST and TVDI performs better than the others in some departments. To reinforce this concept, a frequency evaluation considering both crops and the whole set of departments shows that NDVI yields the R²x value in a proportion of cases (27.3%) that is only slightly lower than that of LST (36.4%) and TVDI (36.4%). For soybean alone, the proportions are 36%, 36% and 27% for NDVI, LST and TVDI, respectively; for maize they are 18%, 36% and 45%. Accordingly, TVDI emerges as a slightly more generalized indicator for assessing maize productivity.

It is worth noting that the predictive level of the univariate linear models, although generalized, is only moderate, since although in all departments there is some indicator that reaches a significant correlation value, in no case does R²x exceed 0.73 for maize (in JC and RS for the relationship with TVDI), nor 0.77 for soybean (in RS and TA for the relationship with NDVI).

The predictive behavior is heterogeneous in the region, to the extent that no particular pattern of R²x values is evidenced in both crops (Fig. 11). The yield estimation in JC, RS and UN is more favorable for maize since the three indices here have higher R²x values. On the other hand, for soybean yield estimation the higher R²x values in GSM, RS and TA correspond to the relationship between yield and NDVI.

3.3.2. Occurrence of R² maximum (R²xm)

When the months of occurrence of R²x (R²xm) were analyzed, a significant difference (p < 0.1) between soybean and maize was found for the TVDI, with the maximum occurring later in soybean. In addition, for soybean, the NDVI tends to reach its maximum R² significantly (p < 0.05) later than the TVDI (Fig. 11).

Liu et al. (2020) analyzed the relationship between crop yields (wheat, rapeseed and barley) and the seasonal patterns of MODIS vegetation indices in Canada. They found higher correlation values when the crop growth peak happens (at the end of July (January in the Southern Hemisphere) and early August (February)). Sakamoto (2020) determined from MODIS WDRVI that the highest correlation with county-scale yields occurs 13 days before the maize silking stage and 6 days before the soybean pod setting stage. Similar results were obtained in this study for maize and soybean, although a coarser temporal resolution was used here. Johnson (2014) determined that maize and soybean yields in the US production region were positively correlated with the mid-summer NDVI and negatively with the LST at the same period, which is consistent with the results obtained in this study.

3.4. Regression models with dummy variables

The statistical models to estimate maize and soybean yields were developed separately, taking into account the different physiological nature of the two crops. In addition, as the spectral indicators perform in a particular way in each region, probably due to territorial differences related to edaphic variability and technological conditions, a specific relationship had to be established for each region.

Several multiple regression models were assessed by including different combinations of the spectral explanatory variables under study (NDVI, LST and TVDI). Table 1 shows the five models selected for each crop (maize and soybean) on the basis of their explanatory performance and parsimony. The adjusted R², AIC, RSE and degrees of freedom were calculated to select and check each proposed model. A technological component (TC), represented by the planting year (long-term yield variation), was also considered. For maize, TC is not included in every model because, depending on the explanatory variables incorporated, the term may or may not be significant (p < 0.05). For soybean, in contrast, the technological term is always significant, showing that the temporal trend in yield is particularly relevant for this crop.

Models SM1 and SM2 for soybean and MM1 and MM2 for maize show a similar explanatory capacity of yield. These models have the particularity of incorporating as an independent variable the spectral indicator that presents the highest adjusted R² value (LST from January for SM1/MM1 and NDVI from February for SM2/MM2).

The TVDI of January was incorporated only in MM5 and, in general, TVDI does not show a predictive capacity much better than the indices from which it was calculated (NDVI and LST). In this sense, given that the calculation procedure to obtain it is more complex, it seems more labor-saving to use directly the information of the NDVI and LST products.

Table 1. Regression models with dummy variables preselected for their explanatory capacity, and their equations. The Adjusted R-squared (Adj. R²), Akaike Information Criterion (AIC), Residual Standard Error (RSE, kg ha^-1) and Degrees of Freedom (DF) values are also presented.

Model	Equation	Adj. R²	AIC	RSE	DF
SM1	SY= int.+β₁ LST_J+β₂ TC+ EC+RE	0.635	3108.1	396.2	196
SM2	SY= int.+β₁ NDVI_F+β₂ TC+ EC+RE	0.626	3112.8	400.7	196
SM3	SY= int.+β₁ LST_J+β₂ LST_F+β₃ TC+ EC+RE	0.713	3058.8	351.3	195
SM4	SY= int.+β₁ LST_J+β₂ NDVI_F+β₃ TC+ EC+RE	0.707	3062.8	354.7	195
SM5	SY= int.+β₁ LST_J+β₂ LST_F+β₃ NDVI_F+β₄ TC+ EC+RE	0.722	3053.2	345.9	194
MM1	MY= int.+β₁ LST_J+β₂ TC+ EC+RE	0.701	3424.7	845.0	196
MM2	MY= int.+β₁ NDVI_F+EC+RE	0.651	3456.1	912.9	197
MM3	MY= int.+β₁ LST_J+β₂ LST_F+β₃ TC+ EC+RE	0.777	3364.5	729.9	195
MM4	MY= int.+β₁ LST_J+β₂ NDVI_F+EC+RE	0.739	3395.8	788.6	196
MM5	MY= int.+β₁ TVDI_J+β₂ TC+ EC+[dept. TVDI_J]+RE	0.726	3415.2	808.2	186

References: SY and MY: soybean and maize yield. int.: intercept. EC: environmental component represented by the departments. TC: technological component. RE: residual error. β₁…β_n: regression coefficients. LST_x, NDVI_x and TVDI_x: LST, NDVI and TVDI of month x (J: January, F: February).

Although the models that achieve the greatest predictive capacity in absolute terms are MM3 and SM5, the SM4/MM4 models are an interesting option because they not only perform adequately (Table 1 and Figure 12) but also explain yield using both January LST and February NDVI. The use of models with a similar structure allows a more direct comparison between the two crops and an analysis of drought impact beyond the crop in question. The predictive behavior of these models (SM4 and MM4) is therefore analyzed in more detail.

The structure of these models differs only in the TC variable, which the soybean model includes because it is significant (p < 0.05); this allows the effect of the explanatory variables (January LST and February NDVI) on crop yield to be contrasted between crops. Negative coefficients for LST (-89.6 and -215.2) and positive ones for NDVI (5067.1 and 9256.7) were obtained for soybean and maize, respectively. The coefficient for TC is positive for soybean and its magnitude indicates that yield increases by 27.5 kg ha^-1 each year, which suggests a positive impact of the technological contribution on soybean yields at the regional level. However, this positive influence is not evident in all sectors, as Figure 12 shows for RC and JC.
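As a worked reading of these coefficients, consider a hypothetical season in which, with department and year held fixed, January LST is 2 degrees higher and February NDVI is 0.05 lower than in a reference season. The scenario is illustrative only, and it assumes that the LST coefficient is expressed per degree of LST and the NDVI coefficient per unit of NDVI, which the text does not state explicitly:

\(\Delta SY=-89.6\,\Delta {LST}_{J}+5067.1\,\Delta {NDVI}_{F}=-89.6(2)+5067.1(-0.05)\approx -432.6\) kg ha^-1

\(\Delta MY=-215.2\,\Delta {LST}_{J}+9256.7\,\Delta {NDVI}_{F}=-215.2(2)+9256.7(-0.05)\approx -893.2\) kg ha^-1

Under these assumptions the same stress signal roughly doubles the absolute yield loss in maize compared with soybean, in line with the larger magnitude of both maize coefficients.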

The good performance of the NDVI in February to explain crop yield variability may be associated with a lag effect of water stress impact during the previous month (with LST in January, associated to the beginning of the reproductive stage), and which manifests later in the growing season by the reduction of biomass, as well as by the reduction of the number and size of reproductive organs. Similar results were obtained by Johnson (2014) using NDVI and LST from MODIS, who reported a maximum correlation value between NDVI and maize yield in mid-summer and a similar but inverse LST response.

Because the region has an heterogeneous productive capacity, it is necessary to incorporate the departments into the model as a dummy variable to represent their particular productive potential for both crops. Both selected models show the department variable as significant (p <0.05), which reinforces this interpretation. In addition, as the interactions between the dummy variable and the spectral and TC indicators are not significant (p >0.05), they were not included. This behavior is supported by Figure 12, where the similarity of the slopes for each covariate with crop yields is observed in most of the departments.

Yield in each year responds to particular meteorological conditions. It is therefore important to assess and validate the general model (fitted with all the data, period 2000-2018) by analyzing how the error statistics change from year to year. The consistency of the selected models was evaluated with an adaptation of the Leave One Out Cross-Validation test (LOOCVad), in which the RSE was calculated using the validation data of each year. In most years the RSE was lower than that of the general model. The mean RSE over the set of years was 279.4 and 579.4 kg ha^-1 for soybean and maize, respectively, below the values obtained for the general models (354.7 and 788.6 kg ha^-1, respectively).
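One plausible reading of this year-by-year validation is a leave-one-year-out scheme: all departments of a given season are withheld, the model is refitted on the remaining seasons, and the error is computed on the withheld season. The sketch below follows that reading and should not be taken as the exact LOOCVad procedure, whose details are given in the methods section. In particular, the error here is a root mean squared error over the withheld observations, which may differ from the RSE definition used in the study, and the data are synthetic.

```python
import numpy as np


def leave_one_year_out(X, y, years):
    """Refit OLS without each year and return the held-out error per year.

    The error is the root mean squared residual of the withheld year
    (an assumption; the study's RSE may use a different denominator).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    errors = {}
    for yr in np.unique(years):
        test = years == yr
        beta, *_ = np.linalg.lstsq(X[~test], y[~test], rcond=None)
        resid = y[test] - X[test] @ beta
        errors[int(yr)] = float(np.sqrt(np.mean(resid ** 2)))
    return errors


# Synthetic illustration: 19 seasons x 11 departments, one covariate.
rng = np.random.default_rng(1)
years = np.repeat(np.arange(2001, 2020), 11)  # seasons labeled by harvest year
x = rng.uniform(0.5, 0.9, years.size)
X = np.column_stack([np.ones(years.size), x])
y = 1000.0 + 5000.0 * x + rng.normal(0.0, 300.0, years.size)

errors = leave_one_year_out(X, y, years)
mean_error = float(np.mean(list(errors.values())))
print(len(errors), round(mean_error, 1))
```

Averaging the per-year errors, as in the last lines, mirrors the way the mean RSE over the set of years is summarized in the text.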

Figure 13 presents the relationships between observed and estimated yields from the multiple regression analysis for maize and soybean at the regional scale. The overall fit of the models is moderate and similar for both crops, with R² values of 0.60 for maize and 0.58 for soybean, consistent with the also moderate individual performance of the spectral variables for each crop and department (Figure 10). Although the explained variability is moderate, it is comparable to the R² values between 0.53 and 0.70 reported by Liu et al. (2020) for crop yield estimation models in Canada.

In summary, the proposed regression models with dummy variables exhibit a stable and relatively accurate performance for estimating soybean and maize yields at the regional level, using mid-season NDVI and LST as model inputs.

The results of this work confirm that, on a monthly scale, NDVI has a close and negative relationship with both land surface temperature (LST) and the Temperature Vegetation Dryness Index (TVDI). This relationship is particularly tight in January, when crops completely cover the soil and senescence has not yet begun. A high correlation between LST and TVDI was also found, so the three spectral indicators assessed are considered to have equivalent potential for estimating maize and soybean yields at a regional scale.

The individual analysis of the relationship between the spectral indicators and crop yield revealed that no single indicator explains productivity on its own; rather, the interannual yield variation of maize and soybean in each departmental sector is best represented by a different spectral indicator. The work also confirms that the indicators obtained in mid-season reach the greatest predictive capacity and allow an accurate projection of maize and soybean yields well in advance of harvest.

After analyzing different alternatives, one model was selected for each crop to estimate yield variation on a regional scale, integrating spectral information with factors linked to technological advance and to the edaphic or management differences of each department. Although the predictive capacity of these models is only moderate, which leaves ample room for improvement, especially by increasing the spatial detail to the intra-departmental level, the verification analysis using a modified LOOCV technique demonstrated the robustness of these tools on both a temporal and a territorial scale.

Funding

This work was supported by the Secretaría de Ciencia y Técnica de la Universidad Nacional de Córdoba, Argentina (SeCyT-UNC) [Grant Number 33620180100318 CB].

Conflict of interest

The authors declare no competing interests.

Data availability

Data will be made available from the corresponding author on reasonable request.

Code availability

Not applicable.

Authors' contributions

G. G.: Conceptualization, Data curation, Performing geospatial computations, Funding acquisition, Formal analysis, Visualization, Writing – original draft.

A. C.: Conceptualization, Data curation, Funding acquisition, Formal analysis, Project administration, Resources, Visualization, Writing – original draft.

G. D.: Data curation, Formal analysis, Validation, Visualization, Writing – original draft.

F. S.: Data curation, Investigation, Writing – original draft.

P. D.: Data curation, Investigation, Writing – original draft.

J.P.C.: Data curation, Investigation, Writing – original draft.

Acker, J. G., & Leptoukh, G., 2007. Online analysis enhances use of NASA earth science data. Eos, Transactions American Geophysical Union, 88(2), 14-17. https://doi.org/10.1029/2007EO020003
Akuraju, V.R., Ryu, D., & George, B., 2021. Estimation of root-zone soil moisture using crop water stress index (CWSI) in agricultural fields. GIScience & Remote Sensing, 58(3), 340-353. https://doi.org/10.1080/15481603.2021.1877009
Al Faisal, A., Kafy, A. A., Rahman, A. F., Al Rakib, A., Akter, K. S., Raikwar, V., Jahir, D. M. A., Ferdousi J., Kona, M. A., 2021. Assessment and prediction of seasonal land surface temperature change using multi-temporal Landsat images and their impacts on agricultural yields in Rajshahi, Bangladesh. Environmental Challenges, 4, 100147. https://doi.org/10.1016/j.envc.2021.100147
Basso, B., Liu, L., 2019. Seasonal crop yield forecast: Methods, applications, and accuracies. Advances in Agronomy. https://doi.org/10.1016/bs.agron.2018.11.002
Bolton D.K., Friedl, M.A., 2013. Forecasting crop yield using remotely sensed vegetation indices and crop phenology metrics, Agricultural and Forest Meteorology. 173, 74–84. https://doi.org/10.1016/j.agrformet.2013.01.007
de la Casa, A., Ovando, G., 2014. Climate change and its impact on agricultural potential in the central region of Argentina between 1941 and 2010. Agricultural and Forest Meteorology 195–196, 1-11. https://doi.org/10.1016/j.agrformet.2014.04.005
de la Casa, A., Ovando, G., Díaz, G., Bressanini, L., Miranda, C., 2018a. Brecha de rendimiento del cultivo de soja estimada con el modelo AquaCrop en la región central de Córdoba, Argentina. Revista Argentina de Agrometeorología X, 1-19.
de la Casa, A., Ovando, G., Bressanini, L., Martínez, J., Díaz, G., Miranda, C., 2018b. Soybean crop coverage estimation from NDVI images with different spatial resolution to evaluate yield variability in a plot. ISPRS Journal of Photogrammetry and Remote Sensing 146, 531-547. https://doi.org/10.1016/j.isprsjprs.2018.10.018
de la Casa, A., Ovando, G., Bressanini, L., Díaz, G., Díaz, P., Miranda, C., 2019. Evaluación de la brecha de rendimiento para maíz tardío con distintas densidades de siembra en la región central de Córdoba, Argentina. Agriscientia 36 (2), 1-17.
Ghida Daza, C., E. M., Sánchez, C., 2009. Zonas agroeconómicas homogéneas: Córdoba (No. E16/121). Instituto Nacional de Tecnología Agropecuaria, Buenos Aires (Argentina). Proyecto Específico Economía de los Sistemas de Producción: caracterización y prospectivas (PE AEES 1731). https://www.produccion-animal.com.ar/regiones_ganaderas/23-zonas_agroeconomicas_cba.pdf. Retrieved March 20, 2022.
Ghida Daza, C.A., Issaly, C., Pizarro, L., Sanchez, C., Freire, V., Gigena Parker, G., Reynoso, D., Salminis, J., Urquiza, O.B., Vigliocco, M., 2019. Monitoreo económico de los sistemas productivos predominantes del sector agropecuario de Córdoba: resultados campaña 2016-17; coordinación general de Carlos Ghida Daza. 1a ed. mejorada. Córdoba: Ediciones INTA, 2019. ISBN 978-987-521-973-1. https://inta.gob.ar/sites/default/files/inta_monitoreoeconomico_cba_xiii_2018.pdf. Retrieved March 20, 2022.
Holzman, M. E., Rivas, R. E., 2016. Early maize yield forecasting from remotely sensed temperature/vegetation index measurements. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(1), 507-519. https://doi.org/10.1109/JSTARS.2015.2504262
Huete, A., Didan, K., Miura, T., Rodriguez, E. P., Gao, X., Ferreira, L. G., 2002. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment, 83(1-2), 195-213. https://doi.org/10.1016/S0034-4257(02)00096-2
Jarsún, B., Gorgas, J. A., Zamora, E., Bosnero, H., Lovera, E., Ravelo, A., Tassile, J. L., 2006. Los suelos. Agencia Córdoba Ambiente Córdoba Argentina.
Johnson, D.M., 2014. An assessment of pre- and within-season remotely sensed variables for forecasting corn and soybean yields in the United States. Remote Sensing of Environment, 141, 116–128. https://doi.org/10.1016/j.rse.2013.10.027
Kingra, P. K., Setia, R., Kaur, J., Pal, R. K., Singh, S. P., 2021. Role of Geospatial Technology in Crop Growth Monitoring and Yield Estimation. In Singh, R. (Ed.) Re-envisioning Remote Sensing Applications. CRC Press, Boca Raton. pp. 273-290.
Leng, G., Hall, J., 2019. Crop yield sensitivity of global major agricultural countries to droughts and the projected changes in the future. Science of the Total Environment, 654, 811-821. https://doi.org/10.1016/j.scitotenv.2018.10.434
Liu, J., Huffman, T., Qian, B., Shang, J., Li, Q., Dong, T., Davidson, A., Jing, Q., 2020. Crop yield estimation in the Canadian Prairies using Terra/MODIS-derived crop metrics. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 2685-2697. https://doi.org/10.1109/JSTARS.2020.2984158
MAGyP, 2020. Ministerio de Agricultura, Ganadería y Pesca de la República Argentina. Estadísticas agrícolas: Series históricas https://www.argentina.gob.ar/agricultura.
Martin, K.L., Girma, K., Freeman, K.W., Teal, R.K., Tubaña, B., Arnall, D.B., Chung, B., Walsh, O., Solie, J.B., Stone, M.L., Raun, W.R., 2007. Expression of variability in corn as influenced by growth stage using optical sensor measurements. Agronomy Journal 99, 384–389. https://doi.org/10.2134/agronj2005.0268
Mather, J. R., 1965. Average climatic water balance data of the continents. Part VIII. South America. Publications in Climatology, 18(2), 297-433.
Nolasco, M., Ovando, G., Sayago, S., Magario, I., Bocco, M., 2021. Estimating soybean yield using time series of anomalies in vegetation indices from MODIS. International Journal of Remote Sensing, 42(2), 405-421. https://doi.org/10.1080/01431161.2020.1809736
Patel, N. R., Mukund, A., Parida, B. R., 2019. Satellite-derived vegetation temperature condition index to infer root zone soil moisture in semi-arid province of Rajasthan, India. Geocarto International, 37(1), 179-195. https://doi.org/10.1080/10106049.2019.1704074
Rolla, A. L., Nuñez, M. N., Guevara, E. R., Meira, S. G., Rodriguez, G. R., de Zárate, M. I. O., 2018. Climate impacts on crop yields in Central Argentina. Adaptation strategies. Agricultural Systems, 160, 44-59. https://doi.org/10.1016/j.agsy.2017.08.007
Sakamoto, T., 2020. Incorporating environmental variables into a MODIS-based crop yield estimation method for United States corn and soybeans through the use of a random forest regression algorithm. ISPRS Journal of Photogrammetry and Remote Sensing 160, 208-228. https://doi.org/10.1016/j.isprsjprs.2019.12.012
Sandholt, I., Rasmussen, K., Andersen, J., 2002. A simple interpretation of the surface temperature/vegetation index space for assessment of surface moisture status. Remote Sensing of Environment, 79(2-3), 213-224. https://doi.org/10.1016/S0034-4257(01)00274-7
Sayago, S., Ovando, G., Bocco, M., 2017. Landsat images and crop model for evaluating water stress of rainfed soybean. Remote Sensing of Environment, 198, 30-39. https://doi.org/10.1016/j.rse.2017.05.008
Videla Mensegue, H. R., Degioanni, A. J., Cisneros, J. M., 2015. Estimating shallow water table contribution to soybean water use in Argentina. European Scientific Journal, 11(14), 23-40.
Viña, A., Gitelson, A.A., Rundquist, D.C., Keydan, G., Leavitt, B., Schepers, J., 2004. Monitoring maize (Zea mays L.) phenology with remote sensing. Agronomy Journal 96, 1139–1147. https://doi.org/10.2134/agronj2004.1139
Wan, Z., 2014. New refinements and validation of the collection-6 MODIS land-surface temperature/emissivity product. Remote Sensing of Environment, 140, 36-45. https://doi.org/10.1016/j.rse.2013.08.027
Wan, W., Liu, Z., Li, K., Wang, G., Wu, H., Wang, Q., 2021. Drought monitoring of the maize planting areas in Northeast and North China Plain. Agricultural Water Management, 245, 106636. https://doi.org/10.1016/j.agwat.2020.106636
Wang, X., Zhang, S., Feng, L., Zhang, J., Deng, F., 2020. Mapping maize cultivated area combining MODIS EVI time series and the spatial variations of phenology over Huanghuaihai Plain. Applied Sciences, 10(8), 2667. https://doi.org/10.3390/app10082667
Xue, J., Su, B., 2017. Significant remote sensing vegetation indices: A review of developments and applications. Journal of Sensors, 2017. https://doi.org/10.1155/2017/1353691
Zhu, B., Chen, S., Cao, Y., Xu, Z., Yu, Y., Han, C., 2021. A regional maize yield hierarchical linear model combining Landsat 8 vegetative indices and meteorological data: Case study in Jilin province. Remote Sensing, 13(3), 356. https://doi.org/10.3390/rs13030356
