Description of study area
This research was carried out in the Kaffa Zone, located in the south-western part of the Southern Nations, Nationalities and Peoples' Region (SNNPR), between 6°24' and 8°13' N latitude and 35°30' and 36°46' E longitude. The Zone covers a total area of 10,602.7 km², accounting for 7.06 percent of the region's total area. Kaffa Zone is divided into twelve administrative districts and contains three traditional climate zones based on altitude and temperature: highland (2500-3000 m), midland (1500-2500 m) and lowland (500-1500 m), which occupy 11.6, 59.5 and 28.9 percent of the Zone's total area, respectively. According to the National Meteorology Agency, the average annual temperature in the area ranges from 10.1 to 27.5 °C. February, March and April are the hottest months, while July and August are the coolest. Annual rainfall ranges from 1001 to 2200 mm. The Kaffa Zone lies in south-western Ethiopia, the part of the country that receives the most rainfall owing to its evergreen forest cover and its windward location relative to the moist monsoon winds.
Sources of data and software used
Identifying the sources and types of data is essential to meeting the study's objectives. Data for this study were gathered from both primary and secondary sources. Primary data consist of satellite imagery and field observations. Secondary data sources include books, topographic and thematic layers, periodicals, reports of the National Meteorology Agency and the Central Statistical Agency, and other publications and scholarly works. Several software packages were used to analyze these datasets.
Table 1
Source, description and purpose of the data used in the study
| No | Type of data | Source | Description | Purpose |
|---|---|---|---|---|
| 1 | Study area boundary | CSA, 2007 | Shapefile | Used to demarcate the study area boundary |
| 2 | DEM | USGS | ASTER, 30 m resolution | Used to generate the elevation zones for the maize crop mask |
| 3 | Pan-sharpened SPOT 6 image | EGIA | Landsat 8: spatial resolution of 30 m (visible, NIR, SWIR), 100 m (thermal) and 15 m (panchromatic); 12-bit radiometric resolution; 16-day temporal resolution; path/row 170/55. SPOT 6: spatial resolution of 1.5 m (panchromatic) and 6 m (multispectral); 12-bit radiometric resolution; 1-day temporal resolution; path/row 170/55 | For supervised LULC classification |
| 4 | CHIRPS rainfall | NMA | Spatial resolution: 0.05° × 0.05°; frequency: dekadal; archive: 2008-2017; format: netCDF | Used to calculate the yearly average rainfall for correlation with maize yield |
| 5 | Potential evapotranspiration (PET) | NMA | Spatial resolution: 1 km × 1 km; frequency: dekadal; archive: 2008-201; format: netCDF | Used to calculate WRSI for correlation with maize yield |
| 6 | eMODIS NDVI | Downloaded from https://earlywarning.usgs.gov/fews/datadownloads/East%20Africa/eMODIS%20NDVI%20C6 | Spatial resolution: 250 m × 250 m; 12-bit radiometric resolution; 1-2 day temporal resolution; frequency: dekadal; archive: 2008-2017 | Used to calculate the yearly average NDVI for correlation with maize yield |
| 7 | ETa (actual evapotranspiration) | Downloaded free of charge from FEWS NET, http://earlywarning.usgs.gov/fews/downloads/index.php? | Spatial resolution: 1 km × 1 km; 8-bit radiometric resolution; 16-day temporal resolution; frequency: dekadal; archive: 2008-2017 | Used to calculate the yearly average ETa, total ETa and WRSI for correlation with maize yield |
| 8 | Maize yield (qt/ha) | CSA annual agricultural report, 2018 | Archive data, 2008-2017 | To calibrate the developed model with historical crop yield statistics |
| 9 | Ground truth and accuracy assessment points | Bonga University | Random coordinates for each land use class, collected with a handheld Garmin 62 GPS | For accuracy assessment of the supervised classification |
Table 2
Summary of equipment and materials used for data collection and analysis.
| Equipment/software | Purpose |
|---|---|
| GPS (Global Positioning System), provided by Bonga University | Locating the ground control points generated at random over the study area in ArcGIS 10.3, mainly for accuracy assessment and area measurement |
| ERDAS IMAGINE 2015, ArcMap 10.3, LEAP 2.7.1, SPSS | GIS and statistical software for image and vector processing and data analysis |
| Google Earth | Supplementary reference for checking and correcting areas of doubt in the classification |
| CDT (Climate Data Tool) | To calculate potential evapotranspiration |
Data processing and analysis
Classification
The pan-sharpened SPOT 6 image was processed for supervised classification in ArcGIS. According to Yan et al. (2006), supervised classification requires the user to specify the pixel values, or spectral signatures, that should be associated with each class. This is done by identifying training sites (areas), which are representative sample sites of known cover types. To construct the thematic land use/land cover (LULC) map of the research area, the maximum likelihood classifier (MLC) was used to classify land cover into two classes, agricultural and non-agricultural (Figure 2).
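The maximum likelihood rule can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: it assumes each class follows a multivariate normal distribution estimated from training-site pixels, assumes equal prior probabilities, and uses hypothetical class codes (1 = agricultural, 2 = non-agricultural) and synthetic band values. Raster pixels would first be reshaped to a (pixels × bands) array.

```python
import numpy as np


def train_mlc(pixels, labels):
    """Estimate the mean vector and covariance matrix of each class.

    pixels: (n_samples, n_bands) training-site pixel values
    labels: (n_samples,) class codes of those pixels
    """
    stats = {}
    for c in np.unique(labels):
        x = pixels[labels == c]
        stats[c] = (x.mean(axis=0), np.cov(x, rowvar=False))
    return stats


def classify_mlc(pixels, stats):
    """Assign each pixel to the class with the highest Gaussian log-likelihood.

    Equal prior probabilities are assumed for all classes.
    """
    classes = list(stats)
    scores = np.empty((pixels.shape[0], len(classes)))
    for j, c in enumerate(classes):
        mean, cov = stats[c]
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        d = pixels - mean
        # squared Mahalanobis distance of every pixel to the class mean
        maha = np.einsum("ij,jk,ik->i", d, inv, d)
        scores[:, j] = -0.5 * (logdet + maha)
    return np.array(classes)[scores.argmax(axis=1)]


# Usage with synthetic 4-band training pixels (hypothetical codes and values)
rng = np.random.default_rng(0)
agri = rng.normal([0.10, 0.20, 0.30, 0.40], 0.02, size=(100, 4))
non_agri = rng.normal([0.40, 0.30, 0.20, 0.10], 0.02, size=(100, 4))
train_x = np.vstack([agri, non_agri])
train_y = np.array([1] * 100 + [2] * 100)
predicted = classify_mlc(train_x, train_mlc(train_x, train_y))
```

In practice the fitted statistics would be applied to every pixel of the image, not only to the training pixels as in this toy check.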
The accuracy of a map created from remote sensing data must be assessed. The most common way to report the accuracy of classification results is an error matrix, from which the overall accuracy, user's and producer's accuracies, and the Kappa statistic were calculated. The Kappa statistic incorporates the off-diagonal elements of the error matrix and expresses the agreement that remains after removing the proportion of agreement expected by chance. Testing attribute accuracy requires a sufficient number of samples that represent the thematic classes and are well distributed across the map. As a rule of thumb, Congalton et al. (2008) recommend at least 50 samples per class; if the area exceeds 500 km² or there are more than 12 categories, at least 75-100 samples per class should be taken. These recommendations agree with those of Fenstermaker (1991). The number of samples for each category may be adjusted according to the relative importance of that category for a particular application, and sampling can also be allocated according to the variability within each category (Congalton et al., 2008). Because the study area exceeds 500 km², the accuracy assessment sample size was set to 200, with 100 sample points for each of the two classes, so that both classes (agricultural and non-agricultural) were represented evenly.
For each class, these points were generated at random, and their coordinates were loaded onto a GPS receiver for field accuracy testing (Figure 3).
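The stratified random draw of sample points can be sketched as follows, assuming the classified raster is available as a NumPy array and that a simple north-up geotransform (hypothetical here) converts pixel indices to map coordinates. The 100 points per class follow the sample size given above; the toy raster and coordinates are illustrative only.

```python
import numpy as np


def stratified_random_points(class_raster, n_per_class, transform, seed=None):
    """Draw n_per_class random pixels from every class of a classified raster.

    transform = (x_origin, pixel_width, y_origin, pixel_height) of the upper-left
    corner, with a negative pixel_height for north-up rasters (a simplified,
    hypothetical geotransform). Returns (class, x, y) tuples at pixel centres.
    """
    rng = np.random.default_rng(seed)
    x0, dx, y0, dy = transform
    points = []
    for c in np.unique(class_raster):
        rows, cols = np.nonzero(class_raster == c)
        # sample without replacement so no pixel is selected twice
        pick = rng.choice(rows.size, size=n_per_class, replace=False)
        for r, k in zip(rows[pick], cols[pick]):
            points.append((int(c), x0 + (k + 0.5) * dx, y0 + (r + 0.5) * dy))
    return points


# Toy 50 x 50 map: left half agricultural (1), right half non-agricultural (2)
demo = np.ones((50, 50), dtype=int)
demo[:, 25:] = 2
pts = stratified_random_points(demo, 100, (800000.0, 1.5, 800000.0, -1.5), seed=42)
```

The resulting coordinates could then be exported (for example as a waypoint file) for upload to the handheld receiver.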
The points were verified in two ways: those that were visible and accessible were checked in the field, and the others were verified using Google Earth as a reference. The resulting error matrix for the 200 sample points is presented in Table 3. The classification accuracy was evaluated using overall accuracy and Kappa analysis: the overall accuracy is 90.0 percent with a Kappa coefficient of 0.80, so the classification was considered sufficiently accurate for further analysis.
Table 3
Error matrix of the classification accuracy assessment
| Map data \ Ground truth data | Agricultural | Non-agricultural | Total | User's accuracy (%) |
|---|---|---|---|---|
| Agricultural | 88 | 8 | 96 | 91.7 |
| Non-agricultural | 12 | 92 | 104 | 88.5 |
| Total | 100 | 100 | 200 | |
| Producer's accuracy (%) | 88 | 92 | | |
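The accuracy figures reported above can be reproduced from the counts in Table 3. The sketch below assumes rows are map classes and columns are ground truth, as in the table; Kappa is computed as (po - pe) / (1 - pe), with the chance agreement pe taken from the products of the row and column totals.

```python
import numpy as np


def accuracy_report(matrix):
    """Overall, user's and producer's accuracy and Cohen's Kappa of an error matrix.

    Rows are map (classified) classes, columns are ground-truth classes.
    """
    m = np.asarray(matrix, dtype=float)
    n = m.sum()
    diag = np.diag(m)
    overall = diag.sum() / n
    users = diag / m.sum(axis=1)       # correct / row (map) totals
    producers = diag / m.sum(axis=0)   # correct / column (ground truth) totals
    chance = (m.sum(axis=1) * m.sum(axis=0)).sum() / n**2
    kappa = (overall - chance) / (1 - chance)
    return overall, users, producers, kappa


table3 = [[88, 8], [12, 92]]  # counts from Table 3
overall, users, producers, kappa = accuracy_report(table3)
print(f"overall={overall:.2f} kappa={kappa:.2f}")  # overall=0.90 kappa=0.80
print(np.round(users * 100, 1), np.round(producers * 100, 1))
```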
Maize Crop Mask Data Derivation
Besides the agricultural land identified by the classification, crop agro-ecology in the research area is another input for masking the crop data. According to Gorfu and Ahmed (2012), maize is primarily grown at elevations between 1500 and 2200 m a.s.l. Figure 4 shows the resulting crop mask for maize.
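Assuming the maize mask combines the agricultural class of the LULC map with the 1500-2200 m elevation band, and that the DEM has already been resampled to the same grid as the classified image (both are assumptions, not stated procedures), the masking step reduces to a boolean AND. The class code below is hypothetical.

```python
import numpy as np

AGRICULTURAL = 1  # hypothetical code of the agricultural class in the LULC raster


def maize_mask(lulc, dem, low=1500.0, high=2200.0):
    """Boolean mask of agricultural pixels lying inside the maize elevation band.

    lulc and dem are assumed to be co-registered arrays on the same grid.
    """
    lulc = np.asarray(lulc)
    dem = np.asarray(dem, dtype=float)
    if lulc.shape != dem.shape:
        raise ValueError("LULC and DEM rasters must share the same grid")
    return (lulc == AGRICULTURAL) & (dem >= low) & (dem <= high)


mask = maize_mask([[1, 1, 2], [1, 2, 1]],
                  [[1400, 1600, 1800], [2100, 1900, 2300]])
```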
Preparing Independent Variables Using the Maize Crop Mask.
To determine the predictive capability of the independent variables, all variables were extracted with the maize crop mask for correlation analysis, in order to identify those most highly correlated with maize yield. The NDVI time series (120 dekadal images) was preprocessed in a single batch and then composited to monthly values using maximum value compositing (MVC). This was done with the Cell Statistics tool in the ArcGIS Spatial Analyst toolbox: the dekadal eMODIS NDVI rasters for June to September were added as inputs and the Maximum statistic was selected, producing 40 monthly NDVI composites. These monthly NDVI images were then extracted with the crop mask so that only the crop of interest was retained, and the average NDVI value for each year was computed. The extracted values are scaled raster values ranging from 0 to 255, which had to be converted to NDVI using the formula eMODIS NDVI = Float(Smoothed eMODIS NDVI - 100) / 100 (Gidey et al., 2018); the results were then used for correlation with maize yield (Table 4). The dekadal CHIRPS time series was likewise composited to monthly level using MVC, extracted with the crop mask, and averaged per year for further analysis (Table 4). The WRSI is the ratio of seasonal actual crop evapotranspiration (ETa) to the seasonal crop water requirement, which equals the potential crop evapotranspiration (PETc). Here, the sorghum crop coefficients from the LEAP software were adopted for the phenological stages (initial 0.3, vegetative 1.15, flowering 1.15, ripening 0.55) (Figure 5).
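The compositing, masking, rescaling and WRSI steps above can be sketched with NumPy arrays instead of ArcGIS tools. This is an illustration under stated assumptions: the dekadal rasters are assumed to be stacked in time order with three dekads per month and four months (June to September) per season, the crop mask is assumed to be a boolean array on the same grid, and WRSI is computed as the plain seasonal ratio described in the text; it is not meant to reproduce the water-balance model used inside LEAP. The Kc vector passed to `wrsi()` would hold one coefficient per dekad derived from the stage values above.

```python
import numpy as np


def monthly_mvc(dekads):
    """Maximum value composite: (3 * n_months, rows, cols) -> (n_months, rows, cols).

    Equivalent to running Cell Statistics with the MAXIMUM option once per month.
    """
    d = np.asarray(dekads, dtype=float)
    if d.shape[0] % 3:
        raise ValueError("expected three dekads per month")
    return d.reshape(d.shape[0] // 3, 3, *d.shape[1:]).max(axis=1)


def yearly_mean_ndvi(monthly_dn, crop_mask, months_per_year=4):
    """Average the masked monthly composites of each year and rescale to NDVI.

    monthly_dn: (n_years * months_per_year, rows, cols) scaled eMODIS values
    crop_mask:  (rows, cols) maize crop mask, True where maize is grown
    """
    m = np.asarray(monthly_dn, dtype=float)
    mask = np.asarray(crop_mask, dtype=bool)
    per_year = m.reshape(-1, months_per_year, *m.shape[1:])
    # keep only crop-mask pixels, then average over months and pixels
    dn_mean = per_year[:, :, mask].mean(axis=(1, 2))
    return (dn_mean - 100.0) / 100.0  # NDVI = (DN - 100) / 100


def wrsi(eta, pet, kc):
    """WRSI (%) = 100 * seasonal ETa / seasonal water requirement (sum of Kc * PET)."""
    eta, pet, kc = (np.asarray(a, dtype=float) for a in (eta, pet, kc))
    return 100.0 * eta.sum() / (kc * pet).sum()
```

Because the DN-to-NDVI conversion is linear, rescaling before or after averaging gives the same yearly value.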
Table 4
Observed maize yield and independent variables
| No | Year (meher season) | Yield (qt/ha) | NDVIa | ETa | ETa total | WRSI | CHIRPS |
|---|---|---|---|---|---|---|---|
| 1 | 2008 | 17.5 | 0.78 | 38.35 | 135.94 | 136.77 | 49.35 |
| 2 | 2009 | 20.2 | 0.84 | 39.59 | 135.07 | 138.43 | 50.99 |
| 3 | 2010 | 21.76 | 0.84 | 39.57 | 136.47 | 157.01 | 59.86 |
| 4 | 2011 | 25.26 | 0.94 | 38.36 | 130.34 | 144.19 | 60.89 |
| 5 | 2012 | 21.84 | 0.85 | 37.99 | 132.64 | 155.15 | 64.87 |
| 6 | 2013 | 25.66 | 0.95 | 39.17 | 133.72 | 152.43 | 63.48 |
| 7 | 2014 | 28.49 | 0.95 | 39.13 | 136.26 | 151.09 | 65.40 |
| 8 | 2015 | 29.93 | 0.97 | 39.80 | 137.52 | 154.37 | 70.28 |
| 9 | 2016 | 29.51 | 0.95 | 37.69 | 128.50 | 144.94 | 76.18 |
| 10 | 2017 | 29.3 | 0.87 | 38.42 | 133.7 | 140.51 | 69.97 |
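The correlation screening described earlier, used to identify the variables most strongly related to maize yield, can be sketched with Pearson's r on the Table 4 values. The choice of Pearson correlation is an assumption; the section does not state which coefficient was computed in SPSS.

```python
import numpy as np

# Values transcribed from Table 4 (2008-2017)
table4 = {
    "Yield": [17.5, 20.2, 21.76, 25.26, 21.84, 25.66, 28.49, 29.93, 29.51, 29.3],
    "NDVIa": [0.78, 0.84, 0.84, 0.94, 0.85, 0.95, 0.95, 0.97, 0.95, 0.87],
    "ETa": [38.35, 39.59, 39.57, 38.36, 37.99, 39.17, 39.13, 39.80, 37.69, 38.42],
    "ETa total": [135.94, 135.07, 136.47, 130.34, 132.64, 133.72, 136.26, 137.52, 128.50, 133.7],
    "WRSI": [136.77, 138.43, 157.01, 144.19, 155.15, 152.43, 151.09, 154.37, 144.94, 140.51],
    "CHIRPS": [49.35, 50.99, 59.86, 60.89, 64.87, 63.48, 65.40, 70.28, 76.18, 69.97],
}


def correlations_with_yield(data):
    """Pearson correlation of each independent variable with maize yield."""
    y = np.asarray(data["Yield"], dtype=float)
    return {
        name: float(np.corrcoef(y, np.asarray(values, dtype=float))[0, 1])
        for name, values in data.items()
        if name != "Yield"
    }


for name, r in correlations_with_yield(table4).items():
    print(f"{name:10s} r = {r:+.2f}")
```

With only ten seasons, the significance of each coefficient should be checked before a variable is retained.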
Multiple Linear Regression Analysis.
The multiple linear regression was run on the data in Table 4. The analysis rests on the following assumptions: (a) the basic assumption of the regression approach is that sufficiently long and consistent time series of both remote sensing data and agricultural statistics are available; the latter are normally aggregated at the level of national or sub-national administrative units, for which average NDVI values can be extracted; (b) the criterion (dependent) variable is a random variable; (c) the relationship sought is statistical (estimating the average value) rather than functional (calculating an exact value); and (d) the relationship between the dependent variable and each independent variable is linear, an assumption that can be tested with scatter plots (Osborne & Waters, 2002). Multiple regression analysis provides the predictive equation:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$$
where Y = criterion (dependent) variable, here maize yield;
β0 = constant (intercept);
β1, β2, …, βn = partial regression coefficients of the predictors (reflecting their relative impact on the criterion variable);
x1, x2, …, xn = scores (values) of the predictors;
ε = random error term.
The β's are the regression coefficients: each represents the change in the dependent variable Y when the corresponding independent variable changes by one unit, with the other predictors held constant. β0 is the constant, where the regression line intercepts the Y axis, representing the value of Y when all the independent variables are 0. The standardized versions of the β coefficients are the beta weights, and the ratio of the beta weights is the ratio of the relative predictive power of the independent variables (Yan and Su, 2009). The developed model predicts the average value of one variable (Y) from the values of the predictor variables (X); such a model is generally called a regression model.
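A minimal ordinary least squares sketch of the equation above is given below. It returns the intercept β0, the coefficients β1 to βn, the standardized beta weights (b_j × s_xj / s_y) and R². It is not the authors' SPSS model: the two predictors used in the usage lines (NDVIa and CHIRPS from Table 4) are chosen purely for illustration, and with only ten seasons any fitted model should be interpreted cautiously.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bn*xn + e.

    Returns the intercept b0, the unstandardized coefficients b1..bn,
    the standardized beta weights b_j * sd(x_j) / sd(y), and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # column of ones estimates b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    b0, b = coef[0], coef[1:]
    betas = b * X.std(axis=0, ddof=1) / y.std(ddof=1)
    residuals = y - A @ coef
    r2 = 1.0 - (residuals**2).sum() / ((y - y.mean()) ** 2).sum()
    return b0, b, betas, r2


# Illustrative fit on two Table 4 predictors (an arbitrary choice of predictors)
yield_qt_ha = [17.5, 20.2, 21.76, 25.26, 21.84, 25.66, 28.49, 29.93, 29.51, 29.3]
ndvi = [0.78, 0.84, 0.84, 0.94, 0.85, 0.95, 0.95, 0.97, 0.95, 0.87]
chirps = [49.35, 50.99, 59.86, 60.89, 64.87, 63.48, 65.40, 70.28, 76.18, 69.97]
b0, b, betas, r2 = fit_mlr(np.column_stack([ndvi, chirps]), yield_qt_ha)
print(b0, b, betas, r2)
```

Comparing the entries of `betas`, rather than of `b`, is what allows predictors measured in different units to be ranked by relative predictive power.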