A New Approach in Comparison and Evaluation of the Overall Accuracy of Six Soil-Water Retention Models Using Statistical Benchmarks and Fuzzy Method

In this research, six samples of valid and widely soil-water retention curve (SWRC) estimation models, including van Genuchten, Brooks and Corey, Fredlund and Xing, Durner, Kosugi, and Seki were studied. To realize this approach, in the first step, the accuracy of each model was calculated using ten statistical benchmarks, and then the numbers obtained from the fitting accuracy were standardized based on each benchmark by the standard fuzzy method so that all of them had a similar scale. Finally, with the sum of the fuzzy standardized values, an index for each model was obtained as a new index called the Best-Fit Model (BFM) index, which was the basis for comparing and evaluating the overall accuracy of the models in each soil texture. Accordingly, if the BFM index is more extensive and closer to 10, the model is more fitted and gets a higher rating. The superiority of this method to other similar studies is that here, as in the multi-criteria evaluation methods, it can simultaneously assess in terms of different statistical benchmarks and ranking the models according to the diversity in the reaction of each to various benchmarks provided. The results showed that Brooks and Corey model with the lowest and the Durner model with the highest BFM and rank among other models in most soil textures are considered as the weakest and as the most suitable model from the overall accuracy viewpoint of fitting based on the approach used throughout this study, respectively.


INTRODUCTION
The unsaturated soil area controls the porosity of the environment, the transfer of water and materials (pollutants), and is vital in soil studies [23]. Essential characteristics in the unsaturated soil part are the moisture curve and unsaturated hydraulic conductivity. The SWRC is one of the essential physical properties of soil and shows the relationship between the suction of the matrix (h) and soil moisture (θ_h) and in soil and water issues such as irrigation and drainage, soil conservation, movement of pollutants in soil and water flow modeling it is crucial in soil. However, there are many direct methods for measuring soil hydraulic properties [12], but most of them are timeconsuming and costly [1]. Besides, due to the spatial and temporal variability of these properties, a large number of samples are needed to describe these characteristics accurately under field conditions [25].
For these reasons, indirect methods for estimating the SWRC are widely used. Pedotransfer functions are indirect methods for estimating the soil moisture content characteristics of soil that can save time and costs in addition to proper precision. Considering the importance of the SWRC in soil unsaturated area studies, the assessment of the accuracy of the Pedotransfer functions is critical in estimating the SWRC. Therefore, many studies in Iran and other parts of the world to evaluate the accuracy models of SWRC fitting have been carried out, including the following: Manyame et al. [33] compared the van Genuchten [53] and Campbell [10] models in the sandy soils of Nigeria and found that the accuracy of the Campbell [10] model was higher than the van Genuchten [53] model for soil samples with higher sandy percent. Furthermore, the estimation of the van Genuchten [53] model in the dry area of the SWRC (high suction) was higher than the real value and showed excellent performance in the wet area (low suction).
Ghanbarian-Alavijeh and Liaghat [20] by comparing different Pedotransfer functions in 72 soils collected from three data banks showed that Saxton et al. [44] pedotransfer function compared to other functions, such as Rawls and Brakensiek [41] and Campbell [10], showed more accurately estimate the SWRC.

SOIL PHYSICS
Patil et al. [40] showed that the van Genuchten [53] model is superior to the Brooks and Corey [6] model by studying Indian Vertisol soils. Nakhaei and Šimůnek [37] used the estimation of feature parameters classifying soil hydraulic from transient water flow to do parameter estimation of soil hydraulic and thermal property functions for unsaturated porous media. Their results showed that whenever the number of unknown soil thermal parameters is three and soil hydraulic parameters are more than two, measurements of cumulative infiltration and temperature data would not necessarily result in unique and reliable estimates of hydraulic parameters.
Nakhaei and Amiri [38] used the inverse solution by HYDRUS-1D to assess soil hydraulic characteristics. By solving a one-dimensional, the soil-water retention data were estimated, and the inverse water redistribution problem was solved using HYDRUS-1D. Their result showed that the performance of the redistribution experiment is logical for inverse parameter estimation.
Bahmani and Ramazani [5] performance of ten SWRC models such as Simmons et al. [48], Libardi et al. [31], Campbell [10], Farrell and Larson [15], van Genuchten [53], Brooks and Corey [6], power forms of the Driessen [13], exponential form of the Bruce and Luxmoore [7], and Rogowski [42] to evaluated fitting quality of the models by the root mean square error (RMSE), the Akaike index (AIC), and the coefficient of determination (R 2 ). Based on their research results, the Simmons et al. [48] model with the least error of RMSE and the lowest AIC and maximum R 2 were the best, and the Bruce-Luxmoore model showed the worst fit with the measured results in Lahijan region of Iran.
Other researchers in this area include Amanabadi et al. [3], Sun et al. [50], Liu et al. [32], Shahnazari et al. [47], and Kuang and Jiao [28]. Review previous research that has been done thus far [5,24,43,45,49,51,52,55,56] showed that no research had been conducted to compare the accuracy of the models in all soil texture categories and simultaneously in two categories field's data and laboratory data. However, according to the research records, in all the researchers conducted so far, the statistical benchmarks have been limited and separately used to check the validity of the models, and no research work has been carried out on the estimation of the results of all the benchmarks for determining the amount accuracy of the models has not been used.
The purpose of this research is to address the two issues mentioned in the review of past research. Accordingly, in this research, for the first time, to evaluate the accuracy of the soil moisture content estimation models, all soil texture categories are evaluated and compared in two categories of field and laboratory data. However, unlike previous studies in this category, the combination of estimating the results of all statistical benchmarks based on fuzzy standardization is used to determine the accuracy of models based on the production of a new composite metric, called BFM. In addition to the ranking, this index can determine the numerical value of each model's quality.

MATERIALS AND METHODS
In this research, data from 20 samples of ten soil texture categories (including both laboratory and field data sections) from the UNSODA Soil Database (Table 1) were used to draw up six valid and widely used models of SWCR. These models include van Genuchten [53], Brooks and Corey [6], Fredlund and Xing [19], Durner [14], Kosugi [27], Seki [46]. In Table 2, the formulas and parameters used for each of the models used during this study are shown. The Solver toolbox in Microsoft Excel 2016 and the SWRC Fit software [46] were used to estimate parameters of SWRC.
In the second step, to evaluate the efficiency of each model, standard statistical benchmarks used in various sciences to evaluate the accuracies were used. Each of these statistical benchmarks expresses the relationship between the measured and estimated data of each model. For example, one of these benchmarks is the correlation coefficient (R), which indicates the correlation between the estimated results of the model and the actual data. Obviously, as near as possible to one, represents the most closeness of estimated values to real values and more precision of the model.
The coefficient of determination (R 2 ) shows how the independent variable explains many percent of the changes in the dependent variable. In other words, R 2 indicates how much the dependent variable changes are affected by the independent variable, and the other changes in the dependent variable are related to other factors. The R 2 is between 0 and 100%.
The root means square error (RMSE) represents the number of deviations of estimated values from the observed values. In other words, it shows the disper- sion of the data, and as the smaller number as possible and closer to zero, it expresses the excellent performance of the model. Akaike Information Criterion (AIC) is another statistic benchmark for measuring the accuracy of estimation that is based on the concept of entropy and suggests how much the use of a statistical model leads to the loss of information. In other words, this benchmark establishes a balance between the model's accuracy and its complexity. The smaller AIC (the more negative) shows, the higher the precision of the model [2].
Another benchmark is the mean squared error (MSE), which is defined as the average of the second power of the deviation of an estimator from its real value. This benchmark is of particular utility among statisticians [29].
The median absolute deviation (MAD) benchmark is a robust measure of overlapping data. This benchmark is more resistant to the standard deviation in the field of overload data [22].
The geometric mean error ratio (GMER) is one of the other statistical benchmarks for the accuracy of estimation. If the GMER value is equal to one, the measured and estimated values are matched. If the GMER value is less than one, the estimated values are lower than the measured values, and if the GMER is higher than one, the estimated values are higher than the measured values. Therefore, the most appropriate condition is that the GMER value is close to one [54].
The Nash-Sutcliffe model efficiency coefficient (NS) has been used to quantify how well a model simulation can predict the outcome variable. The variation of this statistical benchmark is between 0 and 1.
When the values are closed to 1, the model is considered favorable that [39].
One of the best alternatives to Mean Absolute Percent Error (MAPE) is Symmetric Mean Absolute Percent Error (SMAPE) [17] when there are zero or nearzero demand for items. SMAPE self-limits to an error rate of 200%, reducing the influence of these low volume items. As low volume items could have infinitely high error rates that skew the overall error rate, they are problematic.
Chavez et al. [11] has developed the optimum index factor (OIF) as a method to determine the three-band combination that maximizes the variability in a partic- h is the suction head; θ is the volumetric water content; S e is the effective water content (Effective saturation) defined by S e = . Therefore, is a normalized form of the cumulative normal distribution function; Please note that Q(x) is different from error function; In BC model, λ is the index of pore size distribution. In VG and DB models, a is reverse suction air entry to the soil, m is model asymmetric parameter and n is in connection with the soil pore size distribution. In the FX model, e is Napier's constant; the FX model is implemented in SWRC Fit version 3.0 and higher. In the web interface, correction function, C(h) = 1. In the offline version, the correction function can be changed; other parameters are soil hydraulic parameters to be estimated.

Model Function Parameters
Brooks and Corey [6] (BC) , van Genuchten [53] (VG) , , Fredlund and Xing [19] (FX) , , Durner [14] (DB) , , , Seki [46] (BL) , , , , ular multispectral scene. The index mentioned above is based on the total variance and correlation between all possible band combinations in the dataset. The concept and methodology of OIF apply to any multilayer dataset even though its method was developed for Landsat TM data [26]. The high value of OIF demonstrates that the bands encompass much information (e.g., high standard deviation) with little duplication (e.g., the low correlation between the bands) [9]. Table 3 shows the abbreviation and calculation methods for each of these statistical benchmarks, and Fig. 1 shows the flowchart of the research process.
In the third step, a fuzzy method is used to provide a standardized index for evaluating and ranking the overall accuracy of models using all statistical benchmarks. To this end, all the calculated statistical benchmarks are standardized for each model using the "Linear" fuzzy membership function. This function creates a linear relationship between the minimum and maximum user-defined values [35,57]. The values below the minimum value tend to be linear and fixed to zero and values greater than the maximum values tend to be oriented toward one [35,34]. In this function, the changes are linear and used when the changes No. 5 2021 MOHAMMAD NAKHAEI et al. in the classes are fixed [57,34]. The formula 1 shows the "Linear" fuzzy membership function: (1) In this formula, min and max are entered by the user.
The only benchmark that does not fuzzify by the fuzzy linear membership function is GMER. In this statistic benchmark, the midpoint is the optimal point of 1, and more and less than 1, represents the distance from the optimal state. Therefore, the fuzzy "Gaussian" membership function is used for fuzzification. This function is based on changing the initial data in rows concerning regular operation. Membership in other values is reduced due to the movement of the midpoint in the positive and negative direction. The Gaussian function can be useful when membership is close to a specific value [57,30]. The formula 2 shows the fuzzy "Gaussian" membership function: , 1 x f x f In this formula, f 1 is the user-entered spread, and f 2 is the midpoint. Spread is the size of the membership function charts.
In the last step, to integrate all standardized statistical metrics, algebraic sums of the numbers are used. So that for each model, a number is entered as the accuracy of the model. This method represents the overall accuracy of the best and worst model for the SWRC of the soil. Accordingly, the BFM as the index of the sum of fuzzy operators was developed. This index indicates the validity of the results obtained from each model in a range of 0 to 10. As the number is larger and close to 10, the accuracy of the model results is higher based on the measured results.

RESULTS AND DISCUSSION
As described, in this research, six valid and widely used models for soil-moisture content curves were used. Figure 2 shows the SWRC for all textures in the models evaluated during this study. By observing these figures, somewhat from a visual viewpoint, we can see the degree of conformance of each model with real data; however, statistical benchmarks were used for accurate examination. Based on the results obtained from the evaluation of the accuracy of the model stage, the highest amount of R 2 among the Fuzzy standardization field's data to the three models FX, BC, and VG in the silty clay loam texture with a value of 1 and the lowest value is 0.702 in the BC model was observed in the sandy loam texture. On the same basis, the highest amount of R 2 was found among the lab's data of the FX model in clay with a value of 0.999 and the lowest value on the BC model in sandy clay loam texture with a value of 0.922.
In terms of AIC in the field's data, BL model in clay texture is the worst (with a value of -59.776), and LN model in silty clay loam texture is the best (with a value of -325.78) models, and among the lab's data BL model in silty clay loam texture is the worst (with a value of -38.381), and VG model in sand texture is the best (with a value of -2.51) models.
According to the results obtained from the MAD benchmark, the FX model in silty clay loam texture and BL model in the sand texture with a value of 2.718 × 10 -11 and 0.299 known as the best and the worst model among field's data, respectively. Accordingly, the FX model in the clay texture and the BC model in the silt texture with the value of 0.0002 and 0.0218 was identified as the best and the worst model among the lab's data, respectively.
Investigating RMSE variations among the field's data showed that the maximum value of this benchmark belongs to the BL model in the sand texture (with a value of 0.036), which indicates the poor performance of this model in estimating the soil moisture curve in terms of this statistical benchmark. The lowest RMSE value on the FX model in the silty clay loam texture is 6.53 × 10 -11 , which indicates the superiority of this model compared to other models in terms of its accuracy. Among the lab's data, the highest value of this benchmark belongs to the BC model in the silt texture (with a value of 0.025), and the lowest RMSE on the FX model in clay texture is 0.0002. The result of this section on the low performance of the BC model in the RMSE benchmark corresponded to the results of Patil et al. [40].
A comparison of different values of MSE shows that among the field's data, the FX model in silty clay loam with a value of 4.426 × 10 -21 is the most suitable and BL model in sand texture with 0.001 is the most unsuitable models for SWRC. Similarly, by comparing the values of this benchmark for the lab's data, the FX model in clay texture has the best (whit value of 8.299 × 10 -8 ), and the BC model in the silt texture has the worst accuracy (whit value of 6.420 × 10 -4 ).
The results of the GMER benchmark show that the DB model in the loam texture and BL model in the sand texture with a value of 1.00001 and 1.935 known as the best and the worst model among field's data, respectively. Accordingly, the LN model in the silty clay loam texture and the BC model in the silt loam texture with the value of 1.000008 and 1.211 was identified as the best and the worst model among the lab's data, respectively.
The best NS values in the field's data (similar to the R 2 ) belong to the three models of FX, BC and VG in the silty clay loam texture with a value of 1, and the lowest value of this benchmark was 0.691 in the BL model in the silty clay loam texture was observed. Accordingly, similar to the results of the R 2 , the FX model in the clay data with the value of 0.999 best and the BC model in the sandy clay loam texture with a value of 0.920 showed the worst accuracy among the lab's data.
In the SMAPE benchmark among the field's data, the FX model in silty clay loam with a value of 8.237 × 10 -11 is most suitable, and the BL model in the sand texture of 0.225 is the most unsuitable model for SWRC. Similarly, by comparing the values of this benchmark for the lab's data, the FX model in the clay texture (with a value of 7.47 × 10 -4 ) has the best, and the BC model in the silt loam texture (with a value of 0.410) has the worst match with measurement data.
According to the results obtained from the OIF benchmark in the field's data, the DB model in the silty clay loam with a value of 0.010 is the best, and the LN model in the loamy sand texture with 0.197 is the worst match with the measurement data. In the lab's data, the LN model in clay texture (with the value of 0.197) has the best, and the LN model in the loamy sand texture (with the value of 0.277) has the worst conformance to the measurement data.
It is necessary to note that the process of computing the values of statistical benchmarks and fuzzy standardization for each texture has been implemented in all models, in which only the best and worst models are presented.
It seems that the best model in all benchmarks always is expected to have the best matching results, and vice versa about the worst model seems that have the worst matching results. However, in practice, in terms of some of them did not happen. For example, according to the results of calculating the BFM index, the DB model in the clay loam texture is presented as the best and most consistent model with the measurement data. While in terms of compliance with the statistical benchmarks (R 2 , AIC, MSE, RMSE, R, SMAPE) has an excellent quality, it has a good quality in three statistical benchmarks (MAD, NS, OIF) and has inferior quality in one of them (GMER). These results highlight the importance of simultaneously evaluating models in terms of different statistical benchmarks. Thus, simultaneous evaluation of different statistical benchmarks, such as those that occur in multi-criteria evaluation methods, allows the classification of models to vary according to the difference in response to each other relative to different statistical benchmarks. In the following, Table 4 shows the fitting accuracy of SWRC estimation models in each texture based on the numerical values of the BFM index. As can be seen, among the lab's data, the highest value of the BFM index belonging to the clay texture in the FX model, and EURASIAN SOIL SCIENCE Vol. 54 No. 5 2021 MOHAMMAD NAKHAEI et al.  the lowest of this index was observed on sandy loam in BC models. Furthermore, among the field's data, the highest BFM index belonged to clay loam texture in the BL model and the lowest in clay and BL models.
According to the classification given in Table 4, it is observed that among the lab's data, the BC model with 70% and the highest frequency of rank 6 (lowest rating) and DB model with 40% and the most abun- Silt loam (field data) Silt loam (lab data) Sandy loam (field data) Sandy loam (lab data) Sandy clay loam (field data) Sandy clay loam (lab data) Clay loam (field data) Clay loam (lab data) Silty clay loam (field data) Silty clay loam (lab data)  dance of rank 1 (top rank) are introduced as the worst and the best models, respectively. Moreover, among the field's data, the BC model with 50% and the highest frequency of rank 6 (lowest rank) and models of FX and DB each with 30% and the most abundant of rank 1 (highest rank), are introduced as the worst and the best models, respectively. The results of this section, in terms of the poor performance of the BC model, agree with the results of Giménez et al. [21], Nabizadeh and Beigi Harchegani [36], Ferreira et al. [16], and Bahmani and Ramazani [5]. It is worth mentioning that despite the introduction of the BC model among all lab and field data with the highest frequency of rank six as the most improper model, it cannot be concluded that this model does not have the proper performance in all texture because, as in Table 4 are found, in some textures have ranked 1, 2 or 3. Thus, considering the low rating of this model in most of the textures, it should be doubted about the application and accuracy of its results compared to other models. Since the BC model showed a slight accuracy in fitting, it can be stated that any size of soil pores is more uniform, by increasing the amount of λ, which indicates the pore size distribution, the slope of the SWRC increases in the non-saturated soil [5]. This model offers acceptable results for coarse-textured soils with even pores, but its results are unsuitable near the saturation point, especially in heavy textured soils (clay, clay loam, silty clay loam) [21]. According to the DB model, which among other models, with seven parameters in its formula, considers the maximum value of the parameters, it can be argued that using more parameters and considering more soil properties (but not always) can dramatically increase the conformance of the model results with the measured results. However, the ease of using the model is reduced [8].
CONCLUSIONS In this study, a new approach was proposed to evaluate and compare the accuracy of the fitting of six SWRC estimation models by integrating the results of statistic benchmarks for each model and texture based on the fuzzy method. This method requires fuzzy standardization of the results of different statistical benchmark for each soil texture in each model for both scaling and preparation for integration. It seems that this integration of the different benchmarks can be acceptable results like the multi-criteria evaluation methods with simultaneous assessment of different statistical benchmarks and ranking of models concerning the difference in the reaction of each to different statistical benchmarks. This argument is proved by comparing the results of this research with similar research.
Twenty soil samples were selected from the UNSODA database to implement the mentioned approach. In total, ten soils from field data and ten soils with their corresponding texture from lab data were selected. Then, the fitting accuracy of each model for soil texture was calculated and evaluated using statistical benchmarks of AIC, GMER, MAD, MSE, NS, OIF, R, R 2 , RMSE, SMAPE. Finally, with  • Among the field's data, the LN model in silty clay loam texture in terms of AIC, DB model in loam texture in terms of GMER, FX model in silty clay loam texture in terms of MAD, FX model in silty clay loam texture in terms of MSE, three models of FX, BC and VG in silty clay loam in terms of NS, DB model in silty clay loam texture in terms of OIF, three models of FX, BC and VG in silty clay loam texture in terms of R, three models FX, BC and VG in silty clay loam texture In terms of R 2 , the FX model in silty clay loam texture in terms of RMSE and the FX model in silty clay loam texture in terms of SMAPE were introduced as the best models for SWRC curve fitting. Accordingly, BL model in clay texture in terms of AIC, BL model in sand texture in terms of GMER, BL model in sand texture in terms of MAD, BL model in sand texture in terms of MSE, BL model in silty clay loam texture in terms of NS, LN model in loamy sand texture in terms of OIF, BC model in sandy loam texture in terms of R, BC model in sandy loam texture in terms of R 2 , BL model in sand texture in terms of RMSE and BL model in sand texture in terms of SMAPE were considered as the worst models for SWRC curve fit.
• Based on the results of calculating the BFM index among the lab's data, the DB model in clay loam texture, FX model in sandy loam texture, BL model in silt texture, DB model in silt loam texture, DB model in sand texture, LN model in silty clay loam texture, FX model in clay texture, BL model in sandy clay loam texture, DB model in loam texture and BL model in loamy sand texture were introduced as the best models. Accordingly, the BC model in clay loam texture, BC model in sandy loam texture, BC model in silt texture, BC model in silt loam texture, LN model in sand texture, BC model in silty clay loam, VG model in clay texture, BC model in sandy clay loam texture, LN model in loam texture and BC model in loamy sand texture were introduced as the worst model.
• Based on the results of calculating the BFM index among the field's data, the BL in clay loam texture, DB model in sandy loam texture, LN model in silt texture, BC model in silt loam texture, BC model in sand texture, FX model in silty clay loam texture, FX model in clay texture, DB model in sandy clay loam texture, DB model in loam texture and FX model in loamy sand texture were introduced as the best models. Accordingly, the BC model in clay loam texture, BC model in sandy loam texture, BC model in silt texture, LN model in silt loam texture, BL in sand texture, BL in silty clay loam texture, BL in clay texture, VG model in sandy clay loam texture, BC model in loam texture, BC model in loamy sand texture was introduced as the worst model.
• In overall result, due to the lowest BFM index and the lowest rank among other models in most soil textures in both field and lab data, BC model is considered as the weakest and DB model with the highest BFM and highest ratings in most textures were introduced as the most suitable model for SWRC fitness in terms of overall accuracy.

ACKNOWLEDGMENTS
The authors are thankful to Kharazmi University for providing the necessary facilities to carry out this work. Fig. S1. An example of AIC criteria's calculations. In this figure, At is actual data, Ft is forecast data, n is the number of samples, and k is the number of estimated parameters in model. Table S1. The best and worst model in each texture based on lab's data Table S2. Example of fuzzy standardization and calculation the BFM index in each texture based on lab's data Table S3. The best and worst model in each texture based on field's data Table S4. Example of fuzzy standardization and calculation the BFM index in each texture based on field's data