3.2. Remote sensing and terrain data processing
The variance inflation factor (VIF) was used to assess collinearity (a linear relationship between two or more predictor variables) among the Landsat 8 OLI spectral bands, the derived indices and the terrain data. Collinearity increases the uncertainty of regression coefficient estimates. VIF quantifies the degree to which each independent variable is explained by the other independent variables, and it is used for linear and generalized linear models [52].
Multicollinearity is less of a concern for RF, because the RF method is relatively insensitive to noise in the training sample; consequently, RF models can be more accurate than other machine learning methods and conventional statistical regressions [53]. Therefore, only the variables in Table 4, which were selected by the VIF method from all parameters in Table 1 and Table 2, were used as input parameters for the SMLR model.
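The VIF screening described above can be sketched as follows. This is a minimal NumPy sketch, assuming the candidate covariates are stored column-wise in one array; the iterative "drop the covariate with the largest VIF" strategy and the cutoff of 10 are common conventions assumed here, since the paper does not state which threshold was applied. For each column j, VIF is computed as 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing column j on all other columns.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X (n_samples, n_covariates)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on the remaining columns, with an intercept term.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_collinear(X: np.ndarray, names, threshold: float = 10.0):
    """Iteratively drop the covariate with the largest VIF until all VIFs <= threshold.

    The threshold of 10 is an assumed, commonly used cutoff.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        keep.pop(worst)  # remove the most collinear covariate and recompute
    return [names[i] for i in keep]
```

Recomputing VIF after each removal matters because dropping one covariate usually lowers the VIFs of the covariates it was collinear with.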
3.3 Stepwise multiple linear regression
First, the correlations between the measured ECe and SAR and the RS data and terrain attributes retained by the VIF method were investigated using Pearson correlation coefficients (Table 4). In Table 4, covariates marked ** were significantly correlated at P < 0.01, and covariates marked * were significantly correlated at P < 0.05 (a weaker level of significance). The strongest positive correlations were between B10 and EC (r = 0.514) and between B11 and SAR (r = 0.299), both significant at P < 0.01.
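The correlation screening behind Table 4 can be sketched with SciPy's `pearsonr`, which returns the coefficient and a two-sided p-value. This is a minimal sketch, assuming each covariate and the target are equal-length sequences of sample values; the dictionary layout and the helper names are hypothetical, and only the star convention (** for P < 0.01, * for P < 0.05) is taken from the table.

```python
from typing import Dict, Sequence, Tuple

from scipy.stats import pearsonr


def significance_stars(p_value: float) -> str:
    """Mark significance as in Table 4: ** for P < 0.01, * for P < 0.05."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def correlation_table(
    covariates: Dict[str, Sequence[float]], target: Sequence[float]
) -> Dict[str, Tuple[float, str]]:
    """Pearson r (rounded to 3 decimals) and significance marker for each covariate."""
    table = {}
    for name, values in covariates.items():
        r, p = pearsonr(values, target)  # two-sided test
        table[name] = (round(float(r), 3), significance_stars(float(p)))
    return table
```

Covariates without a marker would then be candidates for exclusion from the regression step.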
Regression models were then constructed between the measured ECe and SAR values and the RS data and terrain attributes that were significantly correlated with them (Table 4). Table 5 presents the evaluation of the best equations for ECe and SAR. As Eqs. (1)–(6) and Table 5 show, the most important covariates for ECe were band 10 (b10), band 11 (b11), relative slope position (RSP), the generalized difference vegetation index (GDVI) and the enhanced vegetation index (EVI), with R²val = 0.62, 0.52 and 0.54 and R²adj,val = 0.58, 0.47 and 0.52 for Eqs. (1), (2) and (3), respectively. The best equations for SAR were obtained using b10, b11, the third principal component (PC3), the ratio vegetation index (RVI) and GDVI, with R²val = 0.49, 0.46 and 0.45 and R²adj,val = 0.46, 0.45 and 0.43 for Eqs. (4), (5) and (6), respectively. The best regression equations for EC and SAR based on the auxiliary data were as follows:
$$\text{EC}=80.79+2.60\,(\text{b10})-2.31\,(\text{b11})-0.259\,(\text{RSP}) \tag{1}$$
$$\text{EC}=53.94+2.78\,(\text{b10})-2.42\,(\text{b11})+0.5\,(\text{GDVI}) \tag{2}$$
$$\text{EC}=102.08+2.90\,(\text{b10})-2.60\,(\text{b11})+0.034\,(\text{EVI}) \tag{3}$$
$$\text{SAR}=231.08+2.97\,(\text{b10})-2.67\,(\text{b11})-0.043\,(\text{PC3}) \tag{4}$$
$$\text{SAR}=27.64+2.81\,(\text{b10})-2.39\,(\text{b11})+0.168\,(\text{RVI}) \tag{5}$$
$$\text{SAR}=27.64+2.81\,(\text{b10})-2.39\,(\text{b11})+0.168\,(\text{GDVI}) \tag{6}$$
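The stepwise procedure that produces equations of this form can be illustrated with a forward-selection sketch. This is an illustration under stated assumptions, not the authors' procedure: classical SMLR usually adds and removes terms using F-test p-values, whereas this NumPy sketch uses improvement in adjusted R² as the entry criterion and never removes a term once added. The covariate names and the `max_terms` limit are hypothetical.

```python
import numpy as np


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n samples and p predictors (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fit_r2(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with intercept; returns (R^2, coefficients)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot, coef


def forward_stepwise(X: np.ndarray, y: np.ndarray, names, max_terms: int = 3):
    """Greedily add the covariate that most improves adjusted R^2 (assumed criterion)."""
    n = len(y)
    selected = []
    best_adj = -np.inf
    while len(selected) < max_terms:
        best_candidate = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            r2, _ = fit_r2(X[:, cols], y)
            adj = adjusted_r2(r2, n, len(cols))
            if adj > best_adj:  # must beat the current model and earlier candidates
                best_adj, best_candidate = adj, j
        if best_candidate is None:  # no candidate improves adjusted R^2
            break
        selected.append(best_candidate)
    return [names[j] for j in selected], best_adj


# Usage with synthetic data (hypothetical covariate names).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)
print(forward_stepwise(X_demo, y_demo, ["b10", "b11", "RSP", "GDVI"], max_terms=2))
```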
Table 4
Pearson correlation coefficients between auxiliary data and measured ECe and SAR

| Covariate | EC (dS/m), 0–15 cm | Covariate | SAR |
|---|---|---|---|
| RSP | -0.237** | RSP | -0.067 |
| S | -0.064 | S | -0.085 |
| TWI | 0.070 | TWI | 0.052 |
| VD | 0.093 | VD | 0.027 |
| CNBL | 0.207* | CNBL | 0.157 |
| VDCN | -0.152* | VDCN | -0.106 |
| AH | -0.19 | AH | 0.101 |
| AS | -0.097 | AS | -0.209* |
| CA | 0.067 | CA | 0.026 |
| LSF | -0.127 | CI | 0.107 |
| MRRTF | 0.204* | LSF | 0.038 |
| FA | 0.105 | MRRTF | 0.167 |
| S5 | 0.210* | FA | 0.059 |
| S7 | -0.190* | S5 | 0.005 |
| S9 | 0.249** | S7 | -0.202* |
| SI | 0.285** | S9 | 0.088 |
| SI1 | 0.290** | SI | 0.141 |
| CRSI | 0.093 | SI1 | 0.141 |
| GDVI | -0.327** | CRSI | -0.242* |
| B2 | 0.267** | GDVI | -0.293** |
| B9 | 0.107 | b2 | 0.138 |
| B10 | 0.514** | b5 | -0.135 |
| B11 | 0.483** | b9 | 0.059 |
| PC3 | 0.161 | b10 | 0.297** |
| EVI | 0.330** | b11 | 0.299** |
| RVI | -0.327** | PC3 | 0.280** |
| NDVI | -0.327** | EVI | -0.145 |
| BSI | 0.283** | RVI | -0.293** |
| | | NDVI | -0.256* |
| | | BSI | 0.106 |

** significant at P < 0.01; * significant at P < 0.05.
Table 5
Evaluation data of the models generated by stepwise multiple linear regression

| Parameter | Equation | R²cal | R²adj,cal | R²val | R²adj,val |
|---|---|---|---|---|---|
| EC (dS/m) | 1 | 0.69 | 0.66 | 0.62 | 0.58 |
| | 2 | 0.59 | 0.55 | 0.52 | 0.47 |
| | 3 | 0.57 | 0.54 | 0.54 | 0.52 |
| SAR | 4 | 0.51 | 0.48 | 0.49 | 0.46 |
| | 5 | 0.48 | 0.46 | 0.46 | 0.45 |
| | 6 | 0.46 | 0.45 | 0.45 | 0.43 |
|
According to these results, the SMLR equations show moderate to weak predictive ability using terrain and RS data as well as spectral indices derived from Landsat 8 OLI data. An et al. [54] used SMLR in combination with field-measured soil spectra, satellite remote-sensing images and laboratory measurements of soil sample salinity. Their best model for predicting soil salinity from the RS data achieved an R² of 0.896, a verification R² of 0.867 and an RMSE of 0.264. Rahmati and Hamzehpour [55] reported that the constructed regression relations could robustly predict soil salinity, with adjusted R² values up to 0.875, and that the best equation was obtained for the data set with NDVI values above 0.35. Hihi et al. [56] showed that a linear regression model combining Sentinel-2 SWIR bands and the salinity index explained 48% of the spatial variation of soil salinity in their study area.
3.4 Random forest regression
RF was used to model the relationship between ECe, SAR and all auxiliary data. The results demonstrated that the RF model captured the relationship between the auxiliary data and soil salinity and alkalinity well (Table 6). The best results were R²val = 0.82 and RMSEval = 7.35 dS/m for EC, and R²val = 0.76 and RMSEval = 11.20 for SAR. Figure 2 shows scatter plots of measured versus predicted ECe and SAR for the calibration and validation data sets. The spatial distribution maps of soil salinity and alkalinity, together with the most important parameters at each depth, are shown in Fig. 3. Two variable importance (VI) indicators were calculated from the RF model: the percentage increase in mean squared error (%IncMSE) and the cumulative increase in node purity (IncNodePurity) [57].
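The idea behind %IncMSE can be shown with a dependency-free permutation-importance sketch: shuffle one covariate, re-predict, and report how much the MSE grows relative to the unpermuted baseline. This is a simplified analogue, not the R randomForest computation (which works tree by tree on out-of-bag samples); it assumes a fitted model exposed as a hypothetical `predict(row)` callable, and `n_repeats` and `seed` are illustrative parameters.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_inc_mse(
    predict: Callable[[Row], float],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean percentage increase in MSE when each covariate column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    if baseline == 0:
        raise ValueError("baseline MSE is zero; percentage increase is undefined")
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between covariate j and the target
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            permuted_mse = mse(y, [predict(row) for row in permuted])
            total += 100.0 * (permuted_mse - baseline) / baseline
        importances.append(total / n_repeats)
    return importances
```

A covariate the model ignores scores zero, while one the predictions depend on scores a large positive value; this is the same reading applied to %IncMSE rankings.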
Table 6
Evaluation data of the models generated by random forest regression

| Parameter | R²cal | RMSEcal | R²val | RMSEval |
|---|---|---|---|---|
| EC (dS/m) | 0.80 | 7.73 | 0.82 | 7.40 |
| SAR | 0.88 | 4.63 | 0.83 | 3.37 |
|
These results show that Landsat 8 OLI images and terrain data can estimate soil salinity and alkalinity with acceptable accuracy. The evaluation in Fig. 2 indicated the strong predictive ability of the RF model. In addition, band 10 values (thermal infrared band, 10.60–11.19 µm) were found to be highly correlated with ECe and SAR (Fig. 3).
The temperature of the soil surface is affected by both internal and external factors. Thermal conductivity and heat capacity are internal factors. Thermal conductivity measures the rate at which heat passes through a substance, and the soil's thermal conductivity depends on its physical characteristics, including soil particles, air, moisture and porosity. The external factors that affect surface temperature are meteorological conditions such as solar radiation, air temperature, relative humidity, wind speed and cloudiness. Thermal infrared bands, especially B10, have been widely used in studies of soil salinity and soil water [58]. The B10 value describes the surface temperature, with high values associated with high surface temperatures. Land surface temperature is mainly affected by soil moisture, and zones with low soil moisture content coincide with areas of high surface soil salinity [59]. Alavipanah and Goossens [60] evaluated the capacity of the Landsat TM thermal band for monitoring soil salinity; their results revealed that thermal band information contains useful information that could play an important role in soil salinity and alkalinity studies.
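The link between B10 values and surface temperature can be made concrete with the standard Landsat 8 conversion from digital numbers (DN) to top-of-atmosphere radiance and then to at-sensor brightness temperature. This sketch assumes b10 is available as raw DN, which the paper does not state; the four constants are values commonly found in Landsat 8 MTL metadata files and are assumptions here, so the actual values should be read from each scene's own metadata. The result is brightness temperature, not emissivity-corrected land surface temperature.

```python
import math

# Assumed Landsat 8 TIRS Band 10 constants (typical MTL values; read per scene).
RADIANCE_MULT_BAND_10 = 3.342e-4
RADIANCE_ADD_BAND_10 = 0.1
K1_CONSTANT_BAND_10 = 774.8853
K2_CONSTANT_BAND_10 = 1321.0789


def b10_brightness_temperature_c(
    dn: float,
    ml: float = RADIANCE_MULT_BAND_10,
    al: float = RADIANCE_ADD_BAND_10,
    k1: float = K1_CONSTANT_BAND_10,
    k2: float = K2_CONSTANT_BAND_10,
) -> float:
    """Convert a Band 10 DN to at-sensor brightness temperature in degrees Celsius."""
    radiance = ml * dn + al  # top-of-atmosphere spectral radiance
    kelvin = k2 / math.log(k1 / radiance + 1.0)  # inverse Planck relation
    return kelvin - 273.15
```

Because the conversion is monotonic, higher B10 values always map to higher brightness temperatures, which is why B10 can act as a proxy for the dry, hot, salt-affected surfaces described above.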
The flow accumulation (FA) map contains values of cumulative hydrologic flow, i.e., the number of upslope pixels that contribute water to each pixel. FA was used to understand the drainage pattern of the terrain. According to Elmahdy and Mohamed [61], there is a good relationship between FA, groundwater salinity, topographic features and salt-affected soil under irrigated agriculture in arid regions. Comparing Fig. 3 with Fig. 4 shows that the B10 and B11 maps had their maximum values where salinity and alkalinity were highest. Furthermore, B10 and PC3 had their lowest values in irrigated agricultural lands (Fig. 1 and Fig. 4).

According to Fig. 3, PC3 was another important parameter that played a key role in salinity modeling. PCA is computed from the eigenvectors and eigenvalues of the covariance (or correlation) matrix of the spectral bands. Csillag et al. [62] stated that principal component analysis can separate saline from non-saline soils using the stable brightness of PC1 and the stable greenness of PC2, while the differential brightness in PC3 and the differential greenness in PC4 are used to understand changes occurring in salinity. In RS images, high PC2 values were distributed mostly in salt spots, whereas farmland or wetland mainly had low PC2 values [63].

Figure 4 shows the spatial distribution of TWI: small values are generally associated with the plateau, intermediate values with parts of the piedmont alluvial plain, and larger values with the river alluvial plain and flood plain. These areas showed a high potential for accumulation of soluble salts such as sodium, calcium and magnesium salts, which resulted in higher alkalinity there. Moore et al. [64] showed a strong relationship between soil salinity and TWI. TWI, which reflects the degree of landscape wetness and hydrology, has also previously been used to classify areas with saline soils.
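How a component such as PC3 is obtained from the eigenvectors and eigenvalues can be sketched with NumPy. This is a minimal sketch, assuming the image bands have been reshaped to a (pixels × bands) array, for example with `stack.reshape(n_bands, -1).T` for a (bands, rows, cols) stack, and assuming covariance-based PCA; the paper does not say whether it used the covariance or the correlation matrix. Eigenvector signs are arbitrary, so the sign of a component image may be flipped relative to other software.

```python
import numpy as np


def principal_components(bands: np.ndarray):
    """PCA of a (n_pixels, n_bands) array via eigendecomposition of the covariance matrix.

    Returns eigenvalues (descending), eigenvectors (as columns) and component scores.
    """
    centered = bands - bands.mean(axis=0)
    cov = np.cov(centered, rowvar=False)  # band-by-band covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]  # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs  # column k holds PC(k+1) for every pixel
    return eigvals, eigvecs, scores


# Usage with synthetic data: PC3 is the third column of the scores.
rng = np.random.default_rng(1)
demo_bands = rng.normal(size=(500, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
vals, vecs, pcs = principal_components(demo_bands)
pc3 = pcs[:, 2]
```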
BSI is composed of the blue, red, near-infrared (NIR) and shortwave-infrared (SWIR) spectral bands. The SWIR and red bands are used to capture soil mineral composition, whereas the blue and NIR bands are used to enhance the presence of vegetation [65]. Several spectral indices, including BSI, the Normalized Difference Salinity Index (NDSI) and the Salinity Index (SI), have been proposed to recognize and map salt-affected soils [66]. In a study by Noroozi et al. [67] on 288 soil samples, the mid-infrared band (TM Band 7), a visible band (TM Band 1), Tasseled Cap 3, the wetness index and PCA2 had strong correlations with the observed EC values of the surface soil. RS has shown satisfactory results in predicting soil salinity. However, the spatial distribution of soil salinity appears to be correlated with one or more variables depending on the properties of the study region; therefore, no universal spectral index gives the best results under all environmental conditions [8].

The results showed that the most important terrain data used in RF modeling were VDCN, AH, FA and TWI (Fig. 3 and Fig. 4). Since the most common land use in the study area was bare land, AH was identified as an important terrain covariate that could distinguish land without vegetation cover. Analytical hill-shading images are helpful not only for displaying landforms but also for recognizing lineaments, because shaded-relief images show bare land surfaces that are not covered by vegetation [68]. Allbed et al. [9] and Taghizadeh-Mehrjardi et al. [31] used Landsat and terrain data to predict surface soil salinity and reported R² values of around 0.65 and 0.87 using genetic programming. According to Pal [69] and Wu et al. [25], SVM and RF can both achieve equally good land cover mapping, with very high accuracy of about 95.7–96.8% at local sites, although they require much longer processing time than maximum likelihood classification.

Several studies have been carried out on surface soil; in this research, however, RS data were used along with terrain data for different soil depths in addition to the topsoil, and the results showed good accuracy in modeling EC with these covariates down to 60 cm. As can be seen in Fig. 1 and Fig. 3, the soils with the highest salinity and alkalinity were located along the river, in bare land according to the land use map and in the flood plain and river alluvial plain according to the physiographic map. In contrast, the soils with the lowest salinity and alkalinity were found in irrigated agriculture and the piedmont alluvial plain, based on the land use and physiographic maps, respectively. Since the mean EC and SAR of the Doviraj River water were 4220 µS/cm and 3.8, respectively, the river water was classified as C4S1, which indicates very high salinity and slight sodicity [70]. Other causes of salinity in the study area were strong evaporation and low precipitation, which hinder the leaching of salts; the presence of gypsum and calcium carbonate in the soils; and a water table two to three meters below the soil surface near the river. In addition, the soils in this area have been classified into four suborders: Cambids, Calcids, Gypsids and Salids [26].
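The BSI discussed in this section combines the four bands as a normalized difference between the "soil" pair (SWIR + red) and the "vegetation" pair (NIR + blue). The sketch below uses the commonly cited formulation BSI = ((SWIR + Red) − (NIR + Blue)) / ((SWIR + Red) + (NIR + Blue)); the paper does not give its exact formula, and the Landsat 8 OLI band mapping (Blue = B2, Red = B4, NIR = B5, SWIR1 = B6) and the example reflectances are assumptions for illustration.

```python
def bare_soil_index(blue, red, nir, swir1):
    """Bare Soil Index from surface reflectances.

    Works on floats or element-wise on NumPy arrays. Assumed Landsat 8 OLI mapping:
    blue=B2, red=B4, nir=B5, swir1=B6. A zero denominator raises ZeroDivisionError
    for floats and yields inf/nan for arrays.
    """
    soil = swir1 + red  # bands sensitive to soil mineral composition
    vegetation = nir + blue  # bands that emphasize vegetation
    return (soil - vegetation) / (soil + vegetation)


# Hypothetical reflectances: bare soil tends toward positive BSI, vegetation negative.
print(bare_soil_index(blue=0.10, red=0.20, nir=0.25, swir1=0.35))
print(bare_soil_index(blue=0.03, red=0.04, nir=0.45, swir1=0.20))
```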