Maize Yield Forecast Using GIS and Remote Sensing: The Case of Kaffa Zone, South Western Ethiopia


Background: Policy makers, government planners, and agriculturalists in Ethiopia require accurate and timely information about maize yield and production. Kaffa zone is by far the most important maize producing zone in the country. The manual collection of field data and data processing for crop forecasting by the Central Statistical Agency (CSA) requires a significant amount of time before official reports are released. Several studies have shown that maize yield can be effectively forecast using satellite remote sensing data. The objective of this study was to develop a maize yield forecast model for Kaffa Zone from time series of eMODIS NDVI, actual and potential evapotranspiration, and CHIRPS rainfall for the years 2008-2017. Official grain yield data from the Central Statistical Agency of Ethiopia were used to validate the strength of the indices in explaining the yield. A cropland mask was applied and refined using the agro-ecological zones suitable for the crop of interest. Correlation analyses were used to determine associations among crop yield, spectral indices, and agro-meteorological variables for maize in the long rainy season (kiremt). Indices with high correlation with maize yield were identified.
Results: Average Normalized Difference Vegetation Index (NDVI) and rainfall are highly correlated with maize yield, with correlation coefficients of 0.84 and 0.89, respectively; both variables are strongly and positively related to maize yield. The generated spectro-agro-meteorological yield model was successfully tested against the Central Statistical Agency's expected zone-level yields (R² = 0.89, RMSE = 1.54 q·ha⁻¹, and 16.7% coefficient of variation).
Conclusions: Remote sensing and geographic information system based maize yield forecasting improved the quality and timeliness of the data, distinguished yield production levels and areas, and made intervention easier for decision makers, thereby demonstrating the clear potential of spectro-agro-meteorological factors for maize yield forecasting, particularly in Ethiopia.


Background
Crop yield prediction is critical for planning and making various policy decisions. Many countries rely on traditional data gathering techniques such as ground-based visits and reports for crop monitoring and yield forecasting. These reporting processes are subjective, costly, time-consuming, and prone to large errors due to insufficient ground observation, which results in inaccurate crop production assessments and delays in reporting and in taking necessary action (Greatrex, 2012). Before the advent of remote sensing methods such as NDVI, crop-weather models were used for crop monitoring and yield forecasts (Rojas, 2007). In the Kaffa zone, crop data have been collected on the ground, which is a challenging task that requires human resources, money, and time. Remote sensing is more effective than ground surveys at reducing these problems. It can give precise and timely data for crop production estimation, and most studies have found a link between NDVI, agro-meteorological data, green biomass, and yield (Rojas, 2007).
Many studies in Ethiopia have used these techniques for crop yield forecasting at the zonal level. Abiy (2014) conducted a study in the south Tigray Zone for maize crop forecasting using time series data from SPOT VEGETATION, actual and potential evapotranspiration, and rainfall estimate satellite data for the years 2003-2012, and Akililu (2015) conducted a study in the Arsi zone for wheat crop forecasting using time series data from SPOT VEGETATION. However, both studies used SPOT VEGETATION NDVI and RFE 2.0, which cover large areas at low resolution (1 km and 10 km, respectively). Neither used eMODIS NDVI, a relatively better data set for crop monitoring because of the length of its time series (since 2000), its spatial resolution (250 m), and the fact that it is freely available and easily accessible, or CHIRPS rainfall, whose dekadal data are available from 1981. This study therefore set out to fill this research gap by developing a model that uses eMODIS NDVI and CHIRPS satellite rainfall to forecast maize yield for the year 2018.

Description of study area
This research was carried out in the Kaffa Zone, which is located in the south-western part of the Southern Nations, Nationalities, and Peoples' Region, between 6°24' and 8°13' north latitude and 35°30' and 36°46' east longitude. The Zone covers a total area of 10,602.7 km², accounting for 7.06 percent of the region's total area. Kaffa Zone is divided into twelve administrative districts and contains three traditional climate zones based on differences in altitude and temperature: highland (2500-3000 m), midland (1500-2500 m), and lowland (500-1500 m). Highland, midland, and lowland occupy 11.6 percent, 59.5 percent, and 28.9 percent of the Zone's total area, respectively. According to the National Meteorology Agency, the average yearly temperature in the area is between 10.1 and 27.5 degrees Celsius. February, March, and April are the hottest months, while July and August are the coolest. The yearly rainfall ranges from 1,001 to 2,200 mm. The Kaffa Zone is located in Ethiopia's south-west, the area that receives the most rainfall. This is due to the evergreen forest cover together with the Zone's position on the windward side of the moist monsoon winds.
Source of data and software used
It is critical to determine the sources and types of data in order to meet the study's objectives. The information for this study was gathered from both primary and secondary sources. The primary data consist of information derived from satellite imagery and ground observations. The secondary sources include books, topographic and thematic layers, periodicals, Meteorological Agency and Central Statistical Agency reports, and other publications and scholarly works. Different software packages were used to analyze these data sets.

Data processing and analysis

Classification
The pan-sharpened SPOT 6 image was processed for supervised classification in ArcGIS software. According to Yan et al. (2006), supervised classification requires the user to specify the pixel values or spectral signatures that should be associated with each class. This is done by identifying training sites or areas, which are representative sample sites of known cover types. To construct the thematic land cover map and to identify the land use/land cover classes of the research area, the maximum likelihood classifier (MLC) was used to classify land cover into two classes, agriculture and non-agriculture (Figure 2).
The accuracy of a map created from remote sensing data must be assessed. The most common way to report the accuracy of classification results is an error matrix. The error matrices were used to calculate overall accuracy, user's and producer's accuracies, and the Kappa statistic. The Kappa statistic incorporates the off-diagonal elements of the error matrix and represents agreement after removing the proportion of agreement that could be expected to occur by chance. The two classes (agriculture and non-agriculture) were represented evenly. To test attribute accuracy, there must be a sufficient number of samples that represent the thematic classes and are well distributed across the map. As a rule of thumb, Congalton et al. (2008) recommend at least 50 samples per class. If the area exceeds 500 km² or the number of categories is more than 12, at least 75-100 samples should be taken per class. These recommendations coincide with those of Fenstermaker (1991). The number of samples for each category may be adjusted based on the relative importance of that category for a particular application. Furthermore, sampling can be allocated based on the degree of variation within each category (Congalton et al., 2008). Accordingly, the accuracy assessment sample size was set at 200, with 100 sample points for each class. These points were generated at random for each class, and their coordinates were loaded onto a GPS receiver for field accuracy testing (Figure 3).
These points were verified in two ways: points that were visible and accessible were checked in the field, and the remainder were checked against Google Earth as a reference. The resulting error matrix for the 200 sample points is presented in Table 3. The classification accuracy was evaluated using overall accuracy and kappa analysis: the overall accuracy is 90.0 percent, with a kappa coefficient of 0.80, so the classification can be accepted as reliable for further analysis.
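The way overall accuracy and the kappa coefficient follow from an error matrix can be illustrated with a short Python sketch. The 2 x 2 matrix below is hypothetical: it was chosen only because it has 100 reference points per class and reproduces an overall accuracy of 0.90 and a kappa of 0.80, and it should not be read as the content of Table 3. The function name is likewise an assumption made for illustration.

```python
import numpy as np

def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square error matrix.

    Rows are assumed to hold the classified (map) labels and columns the
    reference labels; both statistics are unchanged if the two are swapped.
    """
    m = np.asarray(matrix, dtype=float)
    n = m.sum()
    observed = np.trace(m) / n  # proportion of points on the diagonal
    # Agreement expected by chance, from the row and column totals.
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical counts (agriculture, non-agriculture) for 200 points.
# Illustrative only; these are NOT the values of Table 3.
example = [[90, 10],
           [10, 90]]
overall, kappa = accuracy_and_kappa(example)
print(f"overall accuracy = {overall:.2f}, kappa = {kappa:.2f}")
```

With two evenly sampled classes the chance agreement is 0.5, so an overall accuracy of 0.90 corresponds to a kappa of (0.90 - 0.50) / (1 - 0.50) = 0.80.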

Maize Crop Mask Data Derivation
Crop agro-ecology in the research area is another input for masking crop data. According to Gorfu and Ahmed (2012), maize is primarily grown at elevations between 1,500 and 2,200 m a.s.l. Figure 4 shows the crop mask data for maize.
Preparing Independent Variables Using Mask Data of Maize.
To determine the predictive capability of the independent variables, all variables were extracted with the crop mask data for further correlation analysis and to identify those highly correlated with maize yield. The NDVI time series (120 dekadal images) was preprocessed in a single pass and was then ready for monthly maximum value compositing (MVC). In ArcGIS, this was done with the 'Cell Statistics' tool in the Spatial Analyst toolbox: the input rasters, in this case the June to September eMODIS NDVI dekads, were added and the 'maximum' option was selected, producing 40 monthly composited NDVI images. These monthly NDVI images were then extracted using the crop mask data to focus only on the crop of interest, and the average NDVI value for every year was computed. The computed values are stored raster values, which range from 0 to 255 and need to be converted to NDVI. The formula eMODIS NDVI = Float(Smoothed eMODIS NDVI - 100) / 100 (Gidey et al., 2018) was therefore applied, and the results were ready for correlation with maize yield (Table 4). The dekadal CHIRPS time series was also composited at the monthly level using MVC and extracted with the crop mask data, and the yearly average was computed from the extracted results for further analysis (Table 4). The WRSI model is the ratio of seasonal actual crop evapotranspiration (ETa) to the seasonal crop water requirement, which is the same as the potential crop evapotranspiration (PETc). Here, the sorghum crop coefficient from the LEAP software was adopted for the phenological stages from planting to flowering (initial 0.3, vegetative 1.15, flowering 1.15, ripening 0.55) (Figure 5).
Multiple Linear Regression Analysis.
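The inputs to the regression come from the compositing and rescaling steps described above. As a minimal sketch of those steps outside ArcGIS, assuming the dekadal rasters are already loaded as NumPy arrays, the following Python example takes the per-pixel maximum of the dekads in a month, applies the (value - 100) / 100 conversion, and averages over the crop mask. The function names, the tiny 2 x 2 arrays, and the mask are hypothetical and are not the study's data.

```python
import numpy as np

def monthly_mvc(dekads):
    """Maximum value composite: per-pixel maximum over the dekads of a month."""
    return np.max(np.stack(dekads, axis=0), axis=0)

def scale_ndvi(raw):
    """Convert stored eMODIS values to NDVI using (raw - 100) / 100."""
    return (np.asarray(raw, dtype=float) - 100.0) / 100.0

def masked_mean(ndvi, crop_mask):
    """Mean NDVI over the pixels flagged True in the crop mask."""
    return float(ndvi[crop_mask].mean())

# Hypothetical 2 x 2 rasters for the three dekads of one month.
d1 = np.array([[150, 160], [120, 110]])
d2 = np.array([[155, 158], [125, 105]])
d3 = np.array([[148, 170], [130, 108]])
mask = np.array([[True, True], [False, False]])  # hypothetical maize mask

composite = monthly_mvc([d1, d2, d3])  # per-pixel maxima of the three dekads
ndvi = scale_ndvi(composite)           # stored values converted to NDVI
print(masked_mean(ndvi, mask))         # mean NDVI of the two masked pixels
```

In this toy case the two masked pixels have composite values of 155 and 170, that is, NDVI values of 0.55 and 0.70. Repeating the same operation for each month and averaging the monthly results would give one NDVI value per year.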
The multiple linear regression was run on the data in Table 4. The analysis rested on the following assumptions: (a) the basic assumption of the regression approach is that sufficiently long and consistent time series of both remote sensing data and agricultural statistics are available; the latter are normally aggregated at the level of national or sub-national administrative units, for which average NDVI values are extracted; (b) the criterion variable is a random variable; (c) the relationship is statistical (estimating the average value) rather than functional (calculating an exact value); and (d) the relationship between the dependent variable and each independent variable is linear. The model takes the general form y = β0 + β1x1 + β2x2 + ... + βnxn + ε, where ε is the error term. The β's are the regression coefficients, representing the amount by which the dependent variable y changes when the corresponding independent variable changes by 1 unit. β0 is the constant, where the regression line intercepts the y axis, representing the value of the dependent variable y when all the independent variables are 0. The standardized versions of the β coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables (Yan and Su, 2009). The developed model predicts the average value of one variable (Y) from the value of another variable (X). The X variable is also called a predictor. Generally, this model is called a regression model.
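To make the role of β0 and the β coefficients concrete, the following Python sketch fits a multiple linear regression by ordinary least squares with NumPy. It is an illustration only: the study used SPSS, the function names are hypothetical, and the data are synthetic values generated from known coefficients (2.0, 10.0, and 0.01) so that the fit can be checked. They are not the values of Table 4.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bn*xn.

    X has one row per observation and one column per predictor.
    Returns the coefficient vector [b0, b1, ..., bn].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to new predictor values."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic predictors: column 0 plays the role of average NDVI and
# column 1 the role of seasonal rainfall. Not the study's data.
x = np.array([[0.50, 900.0],
              [0.55, 1000.0],
              [0.60, 950.0],
              [0.65, 1100.0],
              [0.70, 1050.0]])
y = 2.0 + 10.0 * x[:, 0] + 0.01 * x[:, 1]  # noise-free by construction

coef = fit_mlr(x, y)
print(np.round(coef, 4))  # intercept first, then one slope per predictor
```

Because the synthetic response contains no noise, the fitted coefficients reproduce the generating values. With real yield data the residuals would be non-zero and the fit would need to be validated.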

Developing Multiple Linear Regression Model Equation for Maize Yield Forecasting in the Study Area.
The average Normalized Difference Vegetation Index (NDVIa), which is the mean of the monthly maximum value composites (MVC) of NDVI from the planting date to the end of the crop cycle, gives a correlation coefficient of 0.84 with a significant P value of 0.002 at the 95% confidence level. The most highly correlated independent variable was rainfall, with a correlation coefficient of 0.89 and a significant P value of 0.0001 at the 95% confidence level. The other variables were rejected from model development because their P values are beyond the acceptable range at the 95% confidence level: ETa (r = 0.024, P = 0.942), ETa total (r = 0.22, P = 0.537), and WRSI (r = 0.258, P = 0.472). Hence the two variables most correlated with the dependent variable (yield), NDVIa and rainfall, were selected to create a multiple linear regression model.
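The link between a correlation coefficient and its significance can be sketched with the usual t statistic for a Pearson correlation, t = r·sqrt(n - 2) / sqrt(1 - r²). The Python example below assumes n = 10 paired observations (one per year, 2008 to 2017), which is an assumption based on the study period rather than a figure stated with the correlations, and the function name is hypothetical.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for testing a Pearson correlation r from n paired values.

    Under the null hypothesis of zero correlation the statistic follows a
    t distribution with n - 2 degrees of freedom.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

N_YEARS = 10  # assumption: one observation per year, 2008 to 2017

for name, r in [("NDVIa", 0.84), ("rainfall", 0.89), ("WRSI", 0.258)]:
    print(f"{name}: r = {r}, t = {correlation_t_stat(r, N_YEARS):.2f}")
```

Under that assumption, the two-tailed critical value at the 0.05 level with 8 degrees of freedom is about 2.306. The statistics for NDVIa and rainfall lie well above it and the one for WRSI lies below it, which is consistent with the selection of variables reported above.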
Many studies on crop forecasting state that linear regression modeling is the most common method for producing yield predictions from remote sensing derived indicators together with bio-climatic information. Maize yield data and data derived from the different indices were prepared for multiple linear regression analysis. The Statistical Package for Social Science (SPSS) software was used to build a multiple linear regression model using the two most correlated variables (NDVIa and rainfall). This model was validated based on its coefficient of determination (R²), root mean square error (RMSE), and coefficient of variation (CV), as shown in Figure 7. The overall fit of the model can be judged from the plot of actual against predicted yield per hectare, in which most points lie fairly close to the 45° line (the exact prediction line). The R² value of the model is 0.89 and the adjusted R² is 0.88, with a root mean square error of 1.54 quintal per hectare. The P value of the model is 0.0001 at the 95% confidence level. This P value alone does not show which independent variable is a good predictor and which is a poor one. The analysis of variance in Table 5 shows that the maize yield forecast model has an observed significance probability (Prob > F) of 0.0001, which is significant at the 0.05 level. Since p < 0.0001, we conclude that yield is related to NDVIa and/or CHIRPS. From Table 6, the Variance Inflation Factor (VIF) of NDVIa and CHIRPS is 1.992. This shows that there is no multicollinearity problem between these two variables, since the VIF is less than 10. Therefore, NDVIa and CHIRPS rainfall were selected for model development in this research. Table 7 shows the parameter estimates of the model for NDVIa and CHIRPS rainfall.
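The validation statistics named above can be computed as in the following Python sketch. Several points are assumptions made for illustration: the coefficient of variation is taken as the RMSE expressed as a percentage of the mean observed yield, which is one common definition and may differ from the one used in the study; the VIF shortcut 1 / (1 - r²) holds only for a model with exactly two predictors; and the yields and predictor values are hypothetical numbers, not the study's data.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    diff = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def cv_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    return 100.0 * rmse(observed, predicted) / float(np.mean(observed))

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r12^2)."""
    r12 = np.corrcoef(x1, x2)[0, 1]
    return float(1.0 / (1.0 - r12 ** 2))

# Hypothetical observed and predicted yields in q/ha (not the study's data).
obs = [18.0, 20.0, 22.0, 24.0]
pred = [19.0, 19.0, 23.0, 23.0]
print(r_squared(obs, pred), rmse(obs, pred), cv_percent(obs, pred))

# Hypothetical predictor values, used only to show the VIF calculation.
print(vif_two_predictors([1, 2, 3, 4], [1, 3, 2, 4]))
```

If the model contains only NDVIa and CHIRPS rainfall, a VIF of 1.992 would correspond to a correlation of roughly 0.71 between the two predictors, since 1 - 1/1.992 is about 0.498.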
Therefore, from the result of Table 7
Degree of freedom (df): the number of values in the final calculation of a statistic (estimated value) that are free to vary.
Comparing the accuracy level of maize crop yield forecast using the model and Central Statistical Agency data at the ground level in the study area.
In terms of subjectivity, the remote sensing yield forecast compares favourably with the conventional one. According to the CSA report, the forecast produced by the conventional approach has a coefficient of variation of 17.7%, and the approach is subjective. The remote sensing-based model, by contrast, has a coefficient of variation of 16.7% at the 95% confidence level, with a highly significant P value. Furthermore, because September is the flowering stage of the maize crop, the forecast from the remote sensing enabled methodology can be delivered in early October, whereas the conventional data are generally released in December and cover all cereal crops. Although this research did not consider all the cereals covered by the CSA, this suggests that the timeliness issue can be addressed better by the remote sensing aided approach than by the conventional approach.
Another advantage of the remote sensing-based approach is that it provides location information: once the forecast is created, it can be checked by taking GPS readings and navigating to the locations. This method therefore gives a precise, tangible indication of which areas have high and low yields, which traditional methods cannot provide. Maize yield forecasting using remote sensing and GIS thus improves data quality and timeliness while also reducing subjectivity. This research and prior related studies have shown that a remote sensing enabled approach can identify places (lower administrative areas) with comparatively high, medium, and low production, making intervention much easier for decision makers. Figure 8 compares the traditional yield estimates with those of the remote sensing assisted technique.
Based on the developed model, the 2018 maize crop forecast was made. Accordingly, the highest maize yield for 2018 is expected to be 25 q·ha⁻¹ and the lowest 15 q·ha⁻¹, with a mean of 20 q·ha⁻¹. The prediction also indicates that maize yield will be 10-15 q·ha⁻¹ in 6.1% of the study area, 15-19 q·ha⁻¹ in 50.3% of the area, and 20-25 q·ha⁻¹ in the remaining 43.6% (Table 4.4). The spatial distribution of production levels in Kaffa Zone reveals that certain pockets in the north-western, north-eastern, northern, and eastern parts of the study area, such as Gesha, Sayilem, Gimbo, Gewata, and Menjwo districts, are the most productive, with yields of 20-25 q·ha⁻¹, while the western, south-eastern, and central parts of the Zone (Bita, Cheta, Talo, and Bonga town zuria weredas) are intermediately productive, with 15-19 q·ha⁻¹. The rest of the study area hosts the least productive pockets, yielding only 10-15 q·ha⁻¹ of grain. Hence, the north-western, north-eastern, northern, and eastern parts of the zone are more productive than the other parts of the study area (Figure 9).
Another point of consistency concerns the variables derived from remote sensing data: in this study, the Water Requirement Satisfaction Index (WRSI) and actual evapotranspiration (ETa) were not related to yield, and the same was true in the studies of Abiy (2014) and Akililu (2015). In this study, NDVI and rainfall were selected for the final model based on the statistical results, as in Abiy's paper, whereas rainfall was omitted from the model in Akililu's (2015) paper based on the Variance Inflation Factor (VIF) result. The findings of this study, the third such analysis after Abiy's (2014) maize yield forecast research and Akililu's (2015) wheat yield forecast research, show that agro-meteorological factors have a definite potential for maize yield forecasting in the Kaffa zone.

Declarations
The data and materials generated and analysed are available in this research work.
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interest
On behalf of all authors, the corresponding author states that there is no conflict of interest among the authors in this research work.

Location map of Kaffa Zone.

Figure 2
Maps of land use/land cover of the study area.

Figure 3
Random points generated for accuracy assessment.

Figure 4
Crop mask data for maize.

Methodological flow chart.

Figure 7
Comparison between the maize yields estimated by the agro-meteorological model and the observed yields for the study area.

Figure 8
Comparison between maize yield (quintal/ha) estimated by the model and the observed yield.

Figure 9
Maize yield forecast map of 2018.