Estimating soil–water characteristic curve (SWCC) using machine learning and soil micro-porosity analysis

This study explores soil water characteristic curve (SWCC) prediction through informatics and machine learning. Utilizing these techniques, SWCC prediction was significantly simplified, enabled by the Orange.3 data mining software's integration of diverse soil properties. This integration eliminated the need for extensive programming, establishing a link between scientific insights and engineering applications. Limitations emerged in models relying solely on matric suction for SWCC prediction, evident through a Mean Absolute Error exceeding 0.08 and an R-squared value below 40% in the test dataset. To enhance accuracy, a comprehensive approach encompassing various soil properties, such as bulk density, organic carbon content, and micro-porosity characteristics, was employed. The Gradient Boosting algorithm excelled, yielding near-perfect SWCC estimations with RMSE and Pi values of 0.016 and 0.03, respectively. Likewise, AB, Random Forest, and Tree models displayed highly accurate predictions with RMSE and Pi values below 0.03 and 0.04, respectively. However, Neural Network, SVM, kNN, and Linear Regression models showed no improvements, even with added soil properties. Feature importance analysis highlighted matric suction's critical role in select models and soil micro-porosity characteristics' contribution to lowering RMSE by up to 0.04. These findings are pivotal in understanding errors in SWCC prediction, especially in cases of matric suctions surpassing the SWCC inflection point, with these errors, though present, minimally impacting model efficacy due to diminishing variations at high matric suctions.


Introduction
The soil water characteristic curve (SWCC) is closely associated with soil physical properties and plays a crucial role in soil and water management (Shwetha and Varija, 2015). The SWCC provides valuable direct and indirect information about the behavior of water in unsaturated soils (Zhai and Rahardjo, 2012; van Genuchten et al., 2015). The SWCC of any given soil therefore needs to be determined reliably, using a combination of measurement and prediction techniques. However, field, laboratory, and computer vision-based measurements of the SWCC are expensive, tedious, time-consuming, and sometimes impossible because of issues related to scaling, spatial variability, and study-site inaccessibility (Achieng, 2019); modeling procedures are therefore a very common approach to predicting the SWCC (Dobarco et al., 2019).
Multiple linear regression (MLR), artificial neural networks (ANN), and support vector regression (SVR) have been commonly used in the development of pedo-transfer functions (PTFs) (Rani et al., 2022). In addition, there has been a significant increase in the application of machine learning (ML) algorithms such as linear regression (LR), ANNs, support vector machines (SVMs), classification and regression trees (CART), and random forest (RF) in soil moisture research. These ML algorithms are preferred for their non-parametric nature and ability to capture complex and non-linear relationships (Padarian et al., 2020).
Machine learning techniques for estimating the SWCC fall under the category of supervised learning, in which a labeled training dataset is provided with known output values. The model is trained by applying an algorithm to the input dataset to predict the desired output, and training continues until the model achieves the desired accuracy on the training dataset. Supervised learning is commonly used for classification and regression tasks (Rani et al., 2022). Achieng (2019) conducted a comparative study of several ML algorithms for modeling the SWCC in loamy sand soil and found that the RBF-based support vector regression (SVR) outperformed SVR with linear and polynomial kernels, single-layer ANN, and deep neural network (DNN) models. In another study, Araya and Ghezzehei (2019) demonstrated the superior performance of the Boosted Regression Tree (BRT) model compared to other algorithms, such as KNN, SVR, and RF, for predicting saturated hydraulic conductivity, although the RF model closely followed the BRT model in terms of performance. These findings highlight the satisfactory performance of various ML algorithms in predicting environmental events. For instance, Hong and Pai (2007) and Hu et al. (2013) observed the effective use of techniques such as ANN, SVM, and KNN for forecasting soil water evaporation. Furthermore, Baydaroglu and Kocak (2014) observed the valuable performance of these algorithms in predicting free water surfaces, while Valipour et al. (2012, 2013) utilized these algorithms to predict water reservoir inflows. As a result of their high flexibility, accurate predictive performance, and consistent results, data mining techniques have become a preferred choice for many researchers seeking to enhance their understanding of unsaturated soil hydrological properties (Botula et al., 2013).
The capability of machine learning methods to accurately fit the SWCC is directly influenced by the availability of measured soil water content data at various soil matric potentials (Hastie et al., 2009; Lamorski et al., 2013). Toth et al. (2014) analyzed the SWCC using the RF model at four matric suctions (0.1, 33, 1500 kPa, and 150 MPa). The results demonstrated that the significance of soil properties in predicting soil water content varies across soil types and matric suctions. In another study, Gunarathna et al. (2019) evaluated ML algorithms, including ANN and KNN, to estimate the volumetric water content at matric suctions of 10, 33, and 1500 kPa. Pekel (2020) applied decision tree regression, specifically the CART algorithm, to estimate soil moisture from air temperature, time, relative humidity, and soil temperature. Cai et al. (2019) proposed the use of a Deep Learning Regression Network (DLRN) with big data fitting capability for constructing soil moisture prediction models. Numerical models like HYDRUS-2D often require a large amount of input data for simulating time series of soil moisture. However, if limited input data are available, ML algorithms such as SVM and Adaptive Neuro-Fuzzy Inference Systems (ANFIS) can efficiently handle the task (Karandish & Simunek, 2016). While the accuracy of ML algorithms may be lower than that of numerical models, they can serve as a better alternative under limited and missing data conditions.
Although machine learning techniques have been explored in various soil moisture-related studies, the use of boosting techniques for this purpose is relatively rare (Araya and Ghezzehei, 2019). Boosting methods iteratively combine weak learners to create a strong learner that can provide more accurate predictions. One popular boosting technique is gradient boosting, which sequentially adds predictors to an ensemble, with each predictor correcting the errors made by its predecessor. Unlike AdaBoost (AB), which adjusts the weights of data points, gradient boosting trains each new predictor on the residual errors of the previous one. In this study, gradient boosting and AB were selected as the most popular boosting-based algorithms for estimating the SWCC using ML.
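The residual-fitting idea behind gradient boosting can be illustrated with a minimal sketch. It is not the implementation used in this study, which relied on the Gradient Boosting widget in Orange. The sketch assumes squared-error loss, one-split regression stumps as weak learners, and a single input feature; the function names, the learning rate, and the demonstration values (log10 matric suction against water content) are hypothetical.

```python
# Minimal gradient boosting for squared-error regression with one-split stumps.
# Illustrative only: names, settings, and data below are assumptions.

def fit_stump(x, residuals):
    """Pick the threshold on x that minimises squared error of a two-leaf tree."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def rmse(y_true, y_pred):
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# Hypothetical demonstration data: log10(suction, cm) vs. water content (g/g).
log_suction = [0.0, 1.0, 1.3, 1.6, 1.85, 2.0, 2.5, 2.7, 3.0, 3.5, 4.0]
water_content = [0.42, 0.40, 0.38, 0.34, 0.29, 0.25, 0.17, 0.14, 0.11, 0.08, 0.06]

model = fit_gradient_boosting(log_suction, water_content)
fitted = [predict(model, xi) for xi in log_suction]
mean_wc = sum(water_content) / len(water_content)
baseline_rmse = rmse(water_content, [mean_wc] * len(water_content))
boosted_rmse = rmse(water_content, fitted)
```

Each round fits a stump to what the ensemble still gets wrong, so the training error of the boosted model cannot exceed that of the constant-mean baseline. AdaBoost would instead reweight the data points before fitting the next learner.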
Vereecken et al. (2010) concluded that incorporating soil structure information as one of the predictors in PTFs is likely to enhance their performance. Nguyen et al. (2014) found that including categorical soil structure information in point PTFs developed using the MLR technique improved the accuracy of SWCC estimation for tropical paddy soils. They also suggested further investigation to explore whether these improvements would hold when using different data mining techniques and for other types of PTFs. Passoni et al. (2014) used ImageJ software to characterize the porosity of Oxisols in the southeastern region of Brazil on the basis of shape factors, or pore form. ImageJ offers a convenient built-in option for analyzing soil porosity. This feature provides valuable output parameters, including porosity surface area (total area of porous regions, cm²), volume (total number of porous voxels × voxel volume, cm³), elongation (major axis length/minor axis length, dimensionless), flatness (average length of major plane/average length of minor plane), sphericity (4π × area/perimeter², dimensionless), and compactness (volume of the porous region/surface area of the porous region, dimensionless). Building upon these findings, we used detailed soil structural properties derived from image analysis as inputs to the machine learning technique employed in this study. The aim was to assess whether incorporating such soil structure information would contribute to improved SWCC estimation.
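As a small worked example of the shape descriptors defined above, the following sketch evaluates sphericity (4π × area/perimeter²) and elongation (major axis length/minor axis length) for two idealized pore outlines. The function names and the test shapes are hypothetical and are not ImageJ outputs.

```python
import math

def sphericity_2d(area, perimeter):
    """Sphericity as defined in the text: 4*pi*area / perimeter**2 (1 for a circle)."""
    return 4.0 * math.pi * area / perimeter ** 2

def elongation(major_axis_length, minor_axis_length):
    """Elongation as defined in the text: major axis length / minor axis length."""
    return major_axis_length / minor_axis_length

# Idealised circular pore of radius 1 (arbitrary length unit).
radius = 1.0
circle_sphericity = sphericity_2d(math.pi * radius ** 2, 2.0 * math.pi * radius)

# Idealised elongated pore: a 4 x 1 rectangle (area 4, perimeter 10).
rect_sphericity = sphericity_2d(4.0 * 1.0, 2.0 * (4.0 + 1.0))
rect_elongation = elongation(4.0, 1.0)
```

The circular outline gives a sphericity of 1, while the elongated rectangle gives about 0.5, which is the sense in which lower sphericity indicates channel-like pores.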
Data mining techniques have shown superiority over traditional MLR techniques in modeling the interactions of the complex soil-water system. However, these techniques also have some drawbacks, including susceptibility to overfitting, high data demand, and the need for expert knowledge. In this study, machine learning methods were employed to analyze soil structure using selected soil properties. Our objective was to predict the SWCC in soil samples with different properties using data mining algorithms. The prediction process was conducted under two conditions: 1) using matric suction as the only predefined input, and 2) using a range of input parameters obtained from laboratory and image analysis methods.

Soil sampling
Soil porosity characteristics and the SWCC were studied in two soil samples with textural classes of loamy sand and silty clay, taken from Arenosols (35° 54′ N, 50° 32′ E) and Vertisols (36° 22′ N, 49° 35′ E) of Central Iran. These soil samples were subjected to various treatments, including amendments such as CaCO3, Fe2O3, vermicompost, their combined treatment, a cation treatment, and a structural degradation treatment. The aim was to examine the effects of these treatments on soil porosity and the SWCC. The topsoil (0-10 cm) was sampled, dried, passed through a 2 mm sieve, and analyzed to determine typical soil characteristics. The basic soil properties that have been widely used for SWCC estimation (Wassar et al., 2016), including soil organic carbon (SOC) (Walkley & Black, 1934), particle size distribution (PSD) (Gee & Or, 2002), cation exchange capacity (CEC) (Rhoades, 1983), electrical conductivity (EC) (Rhoades, 1996), pH (Thomas, 1996), and the parameters of soil porosity (α, n, Θs, Θr, hi) (Dexter et al., 2008), were determined using standard methods. The bulk density of the soil samples was determined using the core method (Grossman & Reinsch, 2002) on undisturbed samples collected with Kopecky rings (5 cm in height and 5 cm in diameter). Machine learning (ML) models, known for their ability to predict target properties without limitations (Bell, 2022), were employed to model the data obtained from the studied soils and to analyze the effects of the different treatments.

Preparation of the treatments
The soil samples were subjected to various treatments, including four levels of CaCO3 (0%, 1.5%, 3%, 5%), Fe2O3·7H2O (0%, 0.5%, 1%, 2%), and vermicompost (0%, 0.6%, 1%, 2%), with the composition of the vermicompost (OC, N, etc.) taken into account, and a combination of these treatments (including the blank; 1st level: CaCO3 = 1.5%, Fe2O3·7H2O = 0.5%, vermicompost = 1%; as well as the 2nd and 3rd levels combined). Furthermore, to assess the effects of cations on soil structure, the soil pots were irrigated with solutions containing CaCl2 and NaCl at concentrations of 0, 5, 10, and 20 meq L⁻¹ during the incubation period, as explained subsequently. In addition to these amendment treatments, three replicates of degraded treatments were included in the study. These degraded treatments were prepared through a consolidation process specific to each soil texture. The treated samples were weighed, filled into 1 kg pots (without drainage), and incubated at room temperature (24 to 26 °C). To investigate the effect of shrink-swell on soil structure development, all treated samples (including disturbed and structureless soils) were subjected to a series of periodic shrink-swell and wetting-drying cycles, repeated 20 times. The soaked pots were weighed daily to track minimal variations. Rewetting to field capacity was achieved by carefully adding water to a sponge cover placed on top of the columns to avoid disturbing the soil. Detailed information about the studied treatments is given in Table 1. The total number of samples studied was 128, consisting of 5×4×3×2 (amendment treatments) + 4×2 (degraded treatments). One of the replicates was used to prepare intact samples for measuring the soil water characteristic curve using pressure plate and pressure membrane methods, while the other two replicates were impregnated with a mixture of polyester resin, catalyst, hardener, and fluorescent dye for UV photography and subsequent image analysis.

Determination of the soil water characteristic curve
The soil water characteristic curve (SWCC) was constructed by combining the water contents measured at low matric suctions (0, 10, 20, 40, and 70 cm) using a sandbox apparatus with those measured at higher matric suctions (100, 300, 500, 1000, 3000, 5000, 9000, and 15000 cm) using a pressure plate/pressure membrane apparatus. Undisturbed samples were used for the lower matric suctions ranging from 0 to 1000 cm, while disturbed samples were used for matric suctions ranging from 3000 to 15000 cm.

Preparation of samples for imaging after SWCC construction

Impregnation with fluorescent dye resin
A total of 128 pre-treated samples were impregnated with a mixture of polyester resin and styrene in a 5:1 ratio, along with an appropriate amount of hardener and catalyst. To enhance the visibility of soil pores for digital imaging, 2 g L⁻¹ of fluorescent dye was added as a brightener (Ringrose-Voase, 1996). The mixture was added up to the middle height of the samples, which were placed in plastic containers within a vacuum desiccator. The desiccator was connected to a compressor and evacuated to 8 psi for 2 h to ensure resin filling and air removal from the pores. Subsequently, the samples were taken out of the desiccator, refilled with the same mixture up to 1 cm below their upper surface, and subjected to a vacuum for an additional 2 h. Following the vacuum process, the samples were refilled to their upper level and sealed to prevent rapid volatilization of styrene (Wei et al., 2019; Liu et al., 2016). After one week, the sealed samples were opened to let the styrene volatilize at room temperature. Approximately 75 days later, the polyester resin reached the desired hardness.

Cutting, polishing, and imaging
The hardened samples were cut and polished to facilitate imaging and visualization of soil pores and potential structure development in the treated soils (Wei et al., 2019). Two horizontal and two vertical cuts were made on each sample, exposing the four nearest horizontal and vertical surfaces (2 sides × 2 cuts) for digital imaging. Imaging was conducted in a dark room equipped with two UV lamps to maximize the brightness of the fluorescent dye impregnated in the pores, enabling visibility of even the smallest pores. The images were captured using a digital camera with a resolution of 12 MP and an f/1.8 aperture.
Subsequently, the captured images were imported into the ImageJ software for preprocessing and analysis of soil pores.

Image preprocessing
The color images were imported into the ImageJ software and then scaled, calibrated, and converted into grayscale using the image conversion module.The grayscale images were further processed by applying a thresholding method to convert them into binary images, where pores were represented as white pixels and solids as black pixels.The resulting binary images were stacked together to obtain four 3D volumes (two horizontal and two vertical) for each sample.
Key parameters of the pores, including 3D porosity (defined as the total volume of pore voxels divided by the number of voxels), pore sphericity (ranging from 0 for elongated pores to 1 for spherical pores), aspect ratio (calculated as the ratio of the short axis to the long axis), and object orientation, were determined using 3D and 2D plugins in ImageJ. Object orientation refers to two angles: φ (ranging from 0° to 90°), which represents the angle between the horizontal plane and the particle's long axis (pore channel), and θ (ranging from 0° to 180°), which represents the direction of the long axis (pore channel) projected onto the horizontal plane. Some of the 2D and 3D properties, such as pore space surface area and sphericity, were determined directly using ImageJ software.
Porosity was calculated as the fraction of image volume occupied by the pore space.
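The thresholding and porosity steps described above can be sketched as follows. This is a simplified stand-in for the ImageJ workflow, not a reproduction of it: the threshold value of 128, the function names, and the tiny 2 × 3 pixel slices are hypothetical, and bright pixels are assumed to be pores because the fluorescent resin fills the pore space.

```python
def binarize(gray_slice, threshold):
    """Label bright (dye-filled) pixels as pore (1) and the rest as solid (0)."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray_slice]

def porosity(binary_stack):
    """Fraction of the stacked image volume occupied by pore voxels."""
    pore_voxels = sum(v for sl in binary_stack for row in sl for v in row)
    total_voxels = sum(len(row) for sl in binary_stack for row in sl)
    return pore_voxels / total_voxels

# Two hypothetical grayscale slices (0-255), stacked into a small volume.
gray_stack = [
    [[200, 30, 40], [220, 210, 10]],
    [[15, 180, 20], [25, 30, 35]],
]
binary_stack = [binarize(sl, threshold=128) for sl in gray_stack]
image_porosity = porosity(binary_stack)  # 4 pore voxels out of 12
```

In practice the threshold would be chosen per image set, since it directly controls how many faint, small pores are counted.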

Machine learning procedure
The Orange.3 data mining software, a visual programming environment built on the Python programming language, was used for preprocessing the raw data collected from laboratory and image analysis methods. Ahangar-Asr et al. (2012) emphasized that the simplicity of a procedure and its capability to apply multiple models simultaneously are key factors in determining the priority of a method for estimating the SWCC. In line with this, we used the Orange.3 software, which offers a user-friendly and efficient machine learning process. This software provides comprehensive data mining, modeling, and evaluation tools without requiring complex and time-consuming coding. The procedure enables a quick comparison of various fitted models, including Gradient Boosting, AdaBoost, Decision Tree, Random Forest, Neural Network, Support Vector Machine, k-Nearest Neighbors, and Linear Regression, to identify the most effective machine learning algorithm for predicting the SWCC from the raw input dataset. Furthermore, the Feature Importance widget was used to determine the relative importance of input features in predicting the SWCC with a minimal dataset. This widget assesses the contribution of each feature to the prediction by measuring the increase in prediction error when the values of that feature are permuted. Additionally, the Orange.3 software was used to analyze the impact of different values of each feature on the model output. This analysis helps identify critical values of important features in modeling the SWCC, providing researchers with insights into the most influential range of feature values. The statistical analysis included the calculation of R-squared (R²) and root mean square error (RMSE) to assess the predictive capabilities of the machine learning algorithms for SWCC prediction using features obtained from laboratory and image analysis methods.
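The permutation-based importance measure described above can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not the code behind the Orange widget: the function names, the number of repeats, the toy model, and the two-feature dataset (log10 suction and bulk density) are all hypothetical.

```python
import random

def rmse(y_true, y_pred):
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(y, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical rows: [log10(suction, cm), bulk density (g/cm3)].
rows = [[0.0, 1.45], [1.0, 1.50], [2.0, 1.40], [2.5, 1.55], [3.0, 1.48], [4.0, 1.52]]

def toy_model(row):
    # Assumed toy predictor that uses only the first feature.
    return 0.45 - 0.08 * row[0]

y = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, y)
```

Because the toy model ignores bulk density, shuffling that column leaves the error unchanged and its importance is zero, whereas shuffling suction raises the RMSE. A feature whose permutation barely changes the error contributes little to that particular model.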

The properties of initial soil samples
Table 1 presents the routine properties and the parameters obtained from the SWCCs of the soil samples prior to any treatment. These two samples were selected deliberately to ensure a wide range of variation in physical, chemical, and hydraulic properties, allowing for a comprehensive evaluation. The loamy sand sample has a high sand content of 82.6% with approximately 12% clay, while the silty clay sample has a clay content exceeding 40% and a lower sand content of around 10%. Both samples are non-saline and slightly alkaline, but they differ significantly in organic carbon (OC) content (0.12% vs. 0.42%) and cation exchange capacity (CEC) (4.8 cmol+ kg⁻¹ vs. 24.1 cmol+ kg⁻¹). The matric suction at the inflection point (hi) of the SWCC varies from 300 cm in the loamy sand sample to 70 cm in the silty clay sample. The shape factor (n) of the SWCC in van Genuchten's (1980) model ranges from 2.07 in the loamy sand sample to 1.0 in the silty clay sample. The bulk density of the samples did not differ significantly. However, there were significant differences between the two samples in the alpha coefficient, which corresponds to the inverse of the air-entry suction (α, cm⁻¹), as well as in the saturated water content (Θs, g g⁻¹) and residual water content (Θr, g g⁻¹).
Table 2 presents the changes in the physical and chemical properties of the treated samples after the incubation period, compared to the blank samples. Additionally, Table 2 provides the results of image analyses of the soil pores developed as a result of the treatments. In the loamy sand samples, all measured properties (BD, CEC, EC, and OC, but not pH) increased in all treatments, except for the Fe2O3 treatment, compared to the blank sample. In the silty clay samples, bulk density decreased in most treatments compared to the blank, except for the treatments involving cations and the degraded samples. In the loamy sand samples, CEC showed a slight increase in the CaCO3 and OC treatments, while it decreased in the other treatments. In the silty clay samples, EC, OC, and pH values increased in all treatments compared to the blank sample. The image analyses revealed that the properties examined, including porosity surface area, volume, elongation, compactness, sphericity, and flatness, did not vary consistently across treatments. In the loamy sand sample, the porosity surface area increased in the CaCO3, Fe2O3, and combined treatments, while decreasing in the other treatments. In the silty clay sample, there was a slight increase in the OC treatment, but a decrease was observed in all other treatments compared to the blank. The Orange.3 software applied the features listed in Tables 1 and 2 in eight algorithms to predict the soil water content at different matric suction levels. Soil matric suction was used as a predefined input feature, while the other features were applied separately in all evaluated models. The most important features were determined based on their effects on the model output, as shown in Fig. 1.
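To make the role of the fitted parameters (α, n, Θs, Θr) concrete, the sketch below evaluates the van Genuchten (1980) retention function at the matric suctions used in this study. The closed form with m = 1 − 1/n is the commonly used version of the model and is assumed here, since the text does not write the equation out. Only n = 2.07 is taken from the text; the values of α, Θs, and Θr are hypothetical placeholders, not the measured values in Table 1.

```python
def van_genuchten(h, theta_r, theta_s, alpha, n):
    """Water content at matric suction h (cm), assuming m = 1 - 1/n and n > 1."""
    if h <= 0:
        return theta_s  # saturated at zero suction
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * h) ** n) ** m

# Suction levels (cm) at which the SWCC was measured in this study.
suctions = [0, 10, 20, 40, 70, 100, 300, 500, 1000, 3000, 5000, 9000, 15000]

# n is the loamy sand value reported in the text; the other three are placeholders.
params = {"theta_r": 0.03, "theta_s": 0.40, "alpha": 0.02, "n": 2.07}

curve = [van_genuchten(h, **params) for h in suctions]
```

The resulting curve starts at Θs, decreases monotonically, and flattens toward Θr at high suction. Note that n = 1.0, the value reported for the silty clay sample, gives m = 0 under this constraint, so the sketch is only meaningful for n > 1.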

Impacts and relative importance of the input parameters on the models
Researchers have used various soil properties, including the percentages of clay, silt, and sand, as well as void ratio and water content at saturation, along with soil matric suction related to gravimetric water content, for the estimation of the SWCC (Ahangar-Asr et al., 2012). Identifying the most significant features in SWCC estimation can greatly reduce time and energy consumption while increasing accuracy. Fig. 1 (a to h) illustrates the effects of the input features on the model outputs and their relative importance in terms of model accuracy (RMSE) in the eight machine learning models. Matric suction was identified as the most important parameter in the GB (Fig. 1.a), AB (Fig. 1.b), RF (Fig. 1.d), and SVM (Fig. 1.f) models. On the other hand, organic carbon percentage, soil texture, porosity surface area, and electrical conductivity emerged as the most significant parameters in the DT (Fig. 1.c), ANN (Fig. 1.e), KNN (Fig. 1.g), and LR (Fig. 1.h) models, respectively. Matric suction was among the three most influential parameters affecting the model outputs in all models except the ANN model (Fig. 1.e). Lower matric suction values resulted in higher prediction accuracy, while higher matric suction values led to a decrease in accuracy. The results indicated that, except for the ANN model, three to five of the input characteristics were identified as the most influential parameters for prediction accuracy in the different models.
After matric suction, soil pore characteristics emerged as the next most important parameters for accurate prediction, except in the case of the ANN model. At least one or two pore characteristics played a role in predicting the SWCC, with structural flatness and porosity surface area being particularly influential compared to other pore characteristics. Ahangar-Asr et al. (2012) attempted to incorporate soil void ratio as an input parameter in a model aimed at predicting the SWCC and soil porosity characteristics. However, they did not specifically investigate the influence of these properties on the model's results.

The output of the models when all parameters are used
When comparing the SWCCs generated by the models using all the studied parameters, it was found that the GB, AB, RF, and DT models produced the most accurate results, with lower RMSE (< 0.028) and MAE (< 0.018) and higher d1 (> 0.93) and R² (> 0.968), as shown in Table 3. This means that the mean difference between the predicted and measured water contents was less than 0.02 g g⁻¹ for all matric suctions used to plot the SWCCs. Table 4 presents the Pearson correlation (r) between the measured water content (θMeasured) and the evaluated models, along with the identified important features. Previous studies have reported correlation coefficients greater than 0.9 between estimated and measured SWCC or soil moisture content using the random forest algorithm (Im et al., 2016; Bai et al., 2019; Long et al., 2019; Zappa et al., 2019). However, the ability of the same algorithm to estimate soil moisture content may vary depending on the input features used in the modeling procedure. For example, the aforementioned studies used different sets of input features, including satellite-derived data, soil texture (Zappa et al., 2019), and leaf area index (Im et al., 2016). These variations in input features can result in different levels of correlation with the target values. As illustrated in Fig. 1 and further supported by Table 4, certain features exhibit a stronger correlation with the measured soil moisture content. Notably, matric suction shows a strong negative correlation with θMeasured, indicating its influence on soil moisture dynamics.
As anticipated, soil bulk density and sand percentage exhibit a negative correlation with soil water content. Additionally, a negative correlation was observed between water content and structural flatness, indicating that increased soil pore compaction leads to a decrease in water content at varying matric suction levels. Notably, based on Pearson correlation coefficients, structural flatness (r = -0.625) has a more pronounced effect on the decrease of soil water content than soil bulk density (r = -0.469).
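The evaluation statistics reported in this section can be computed as in the sketch below. The text does not give the formulas, so the definitions are assumptions: R² is taken as the coefficient of determination (1 − SSres/SStot) and d1 as Willmott's modified index of agreement. All numbers in the example are hypothetical and do not come from Tables 3 or 4.

```python
import math

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def willmott_d1(obs, pred):
    """Modified index of agreement d1 (assumed to be the d1 used in the text)."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(p - o) for o, p in zip(obs, pred))
    den = sum(abs(p - mean_obs) + abs(o - mean_obs) for o, p in zip(obs, pred))
    return 1.0 - num / den

def pearson_r(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical measured and predicted water contents (g/g) at six suctions.
log_suction = [0.0, 1.0, 2.0, 2.5, 3.0, 4.0]
measured = [0.40, 0.35, 0.28, 0.20, 0.12, 0.08]
predicted = [0.41, 0.34, 0.27, 0.21, 0.13, 0.07]

stats = {
    "RMSE": rmse(measured, predicted),
    "MAE": mae(measured, predicted),
    "R2": r_squared(measured, predicted),
    "d1": willmott_d1(measured, predicted),
    "r_suction": pearson_r(log_suction, measured),
}
```

With every prediction off by 0.01 g g⁻¹, both RMSE and MAE are 0.01 and R² stays above 0.99. The Pearson coefficient between suction and measured water content is negative, as in Table 4.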

Applying only soil matric suction as the model input feature
To assess the necessity of incorporating additional input features for improving the model outputs, we conducted an evaluation using only matric suction as the input. While soil matric suction has a significant impact on model learning and prediction accuracy, the results presented in Table 5 demonstrate that models trained solely on matric suction and the related water content data did not achieve acceptable precision. The models exhibited high error rates and low R² values on the test dataset. These findings indicate the need for additional input features to improve the accuracy and reliability of the models.
Despite the negative correlation observed between soil water content and matric suction in the evaluated models (Table 6), the calculated RMSE values revealed relatively high errors in the model outputs. The mean absolute errors further indicated significant inaccuracies in the prediction of soil water content at different matric suction levels, with values ranging from 0.08 to 0.09. Such errors are far from acceptable in this context. Moreover, the considerably low R² values highlight the inconsistency between the predicted SWCC patterns and the observed data. Using soil matric suction as the sole input feature in the eight evaluated models significantly reduces the correlation between the model outputs and the measured water content (θMeasured). As a result, the correlation of the linear regression model output with θMeasured is lower than the correlation between matric suction itself and θMeasured (Table 6). In other words, the raw matric suction values are more strongly related to the measured water content than the output of the linear regression model trained on matric suction alone.
This observation suggests that, in this case, the modeling process was not effective and did not produce useful outcomes.
For the Loamy Sand soil sample, the Gradient Boosting, AdaBoost, Tree, and Random Forest models (Fig. 2, a-d) exhibited almost perfect predictions of the SWCC. While the high prediction accuracy is consistent in the Silty Clay soil samples, it is worth noting that for soil matric suctions higher than 1000 cm, the error of these models shows a relatively decreasing trend. Previous studies have highlighted the flexibility and reliability of machine learning algorithms such as ANN, KNN, and SVM in providing accurate estimations, as they do not rely on stringent assumptions about the underlying data and can adapt to various situations (Nguyen et al., 2017; Hastie et al., 2009). However, in the present study, the Neural Network, SVM, KNN, and Linear Regression models yielded unacceptable errors in predicting the SWCC for both the Loamy Sand and Silty Clay soil samples. Specific details regarding the nature and magnitude of these errors would provide further insights into the limitations of these models in the context of the study. These errors resulted in deviations between the predicted and measured SWCC patterns across the entire range of matric suctions (Figs. 2 and 3, e-h). Specifically, the models showed underestimation at low matric suction and overestimation at high matric suction for all studied soil samples. The SVM and KNN models failed to exhibit the expected decreasing trend with respect to matric suction in the Loamy Sand sample, rendering them unable to adequately describe the SWCC. Similarly, the KNN model yielded inaccurate outputs for the Silty Clay soil sample. We observed visual evidence of increasing model errors with higher soil matric suction in Figs. 2 and 3, as well as in Table 7. To further support this observation, we compared the error percentages of the evaluated models at matric suctions below and above the matric suction at the SWCC inflection point (hi).

Evaluating model uncertainty
For the Loamy Sand soil samples, hi was calculated to be 70 cm, while for the Silty Clay soil samples, hi was calculated to be 300 cm. Figure 4 presents the results of this comparison. In both studied soil textures, the error percentages of all evaluated models were considerably higher at matric suctions greater than hi than at matric suctions less than hi. Among the models, the DT model exhibited the maximum difference between the measured and predicted SWCCs on the two sides of the inflection point. Moreover, the prediction error percentages at matric suctions greater than hi were found to be ten times higher than those at matric suctions less than hi. Additionally, the SVM and KNN models exhibited minimal changes in prediction error with respect to matric suction. Consequently, the difference between the prediction errors on the two sides of the inflection point is smallest for these models. On this basis, the best-performing models are those with lower error percentages and a minimal difference in prediction errors on the two sides of the inflection point. Models such as Gradient Boosting, AB, and Random Forest exhibit these characteristics.
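The comparison of errors on the two sides of the inflection point can be sketched as follows. The text does not define "error percentage", so the sketch assumes a mean absolute error relative to the measured water content, expressed in percent, and assigns the point at h = hi to the lower group. The function names and all data values are hypothetical.

```python
def mean_abs_pct_error(pairs):
    """Mean of |predicted - measured| / measured, in percent (assumed definition)."""
    if not pairs:
        raise ValueError("no observations in this suction range")
    return 100.0 * sum(abs(p - o) / o for o, p in pairs) / len(pairs)

def error_by_inflection(suctions, measured, predicted, h_i):
    """Split the curve at h_i and return (error below or at h_i, error above h_i)."""
    below = [(o, p) for h, o, p in zip(suctions, measured, predicted) if h <= h_i]
    above = [(o, p) for h, o, p in zip(suctions, measured, predicted) if h > h_i]
    return mean_abs_pct_error(below), mean_abs_pct_error(above)

# Hypothetical SWCC points: suction (cm), measured and predicted water content (g/g).
suctions = [10, 40, 70, 300, 1000, 15000]
measured = [0.40, 0.36, 0.30, 0.15, 0.08, 0.04]
predicted = [0.40, 0.355, 0.30, 0.16, 0.09, 0.05]

err_below, err_above = error_by_inflection(suctions, measured, predicted, h_i=70)
```

In this example the absolute error above hi is only 0.01 g g⁻¹, yet the relative error is much larger than below hi because the measured water content itself is small. This is one reason a high error percentage in the dry range can coexist with a low RMSE.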

Residual contents of predicted SWCCs
To quantify the absolute differences between predicted and measured SWCCs, we have presented the difference curves for both Loamy Sand and Silty Clay soil samples.Figure 5

Conclusion
The Orange.3 data mining software facilitated a simple and efficient modeling procedure for predicting the soil water characteristic curve (SWCC) from variations in soil properties. The software enables the seamless integration of a diverse range of measured physical soil properties into the model without requiring extensive programming knowledge. In this approach, machine learning models were trained, tested, and evaluated to predict the SWCC. Models relying solely on soil matric suction as a predefined feature were unable to predict the SWCC accurately: in this form, the Mean Absolute Error (MAE) exceeded 0.08 and the R² value was below 40%. Therefore, this study examined the effect of using all available soil properties as model features to enhance performance. Omitting any of these features resulted in a significant decrease in model prediction accuracy. Our findings indicate that a more accurate estimation of the SWCC requires not only soil matric suction as a predefined feature but also important soil properties such as structural (bulk density), physicochemical (organic carbon), and morphological (structural flatness or porosity surface area) properties. These properties were included as measured features in the model, which demonstrated their significance in achieving a more precise estimation of the SWCC. Because SWCC prediction using ML requires soil properties as inputs in addition to matric suction, we also determined which properties have the most significant impact on the models' outputs. A negligible error was identified in the models mentioned above at matric suctions greater than the matric suction at the SWCC inflection point. Because soil water content at a specified matric suction is only weakly sensitive to soil properties in this part of the SWCC, we can disregard this error and assume perfect prediction in the mentioned models.

Input (2019) conducted research using machine learning techniques, including ANN, DNN, and SVM models, to estimate the SWCC. In most cases of the drying SWCC, the models achieved an RMSE of less than 0.01, with R² and d1 values exceeding 0.99 and 0.94, respectively. The study demonstrated high accuracy in the estimation of the SWCC in the studied Loamy Sand soil sample. Lamorski et al. (2017) employed various SVM models trained with physical soil properties, including the SWCC drying branch, BD, Sand%, Silt%, Clay%, OC, and soil specific surface, as input variables. The resulting models successfully estimated SWCC wetting branches with an R² greater than 0.98 and an RMSE less than 0.02. Srivastava et al. (2013) utilized the SVM algorithm, which yielded an RMSE of 0.013 and an R² of 0.69. In contrast, the performance of the random forest algorithm varied across studies: Long et al. (2019) and lm et al. (2016) reported RMSE values greater than 0.04 m³ m⁻³, while Bai et al. (2019) achieved accurate results with an RMSE less than 0.02 m³ m⁻³. However, in this study, the ANN, SVM, KNN, and LR models showed a significant decrease in accuracy (higher RMSE and MAE, and lower d1 and R²) relative to the acceptable limits of accuracy. Consequently, these models were unable to generate SWCCs that met the required level of accuracy. Similar to the findings of Hastie et al. (2009), which demonstrated that regression-based methods may yield inaccurate results in pedo-transfer function methods, the LR algorithm in this study produced an R² of 0.66 and an RMSE of 0.69 when applied in the machine learning method, categorizing it as an inaccurate model. Nguyen et al. highlighted the benefits of the KNN model, including its flexibility, simplicity, accuracy under limited data availability, and the ability to incorporate new observations into training datasets without the need to redevelop the PTF models. However, Guevara and Vargas (2019) examined the performance of the KNN algorithm for predicting soil moisture content based on DEM data and found that the prediction RMSE exceeded 0.05 m³ m⁻³. In another study, Liu et al. (2018) observed an RMSE greater than 0.07 m³ m⁻³ in the prediction of moisture content using the KNN algorithm with satellite-derived inputs.
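The accuracy statistics compared in this discussion (RMSE, MAE, R², and d1) can be computed from paired observed and predicted water contents. The following is a minimal sketch, not the procedure used in this study: it assumes that d1 denotes Willmott's modified index of agreement, that R² is the coefficient of determination, and the water content values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def d1(obs, pred):
    """Modified index of agreement (assumed definition, after Willmott)."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(o - p) for o, p in zip(obs, pred))
    den = sum(abs(p - mean_obs) + abs(o - mean_obs) for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical volumetric water contents (m3 m-3), for illustration only.
observed = [0.40, 0.35, 0.25, 0.15, 0.10]
predicted = [0.39, 0.36, 0.24, 0.16, 0.10]

for name, fn in (("RMSE", rmse), ("MAE", mae), ("R2", r2), ("d1", d1)):
    print(f"{name}: {fn(observed, predicted):.4f}")
```

Lower RMSE and MAE and higher R² and d1 indicate closer agreement, which is the sense in which the models are ranked above.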

Figures 2 and 3 show the soil water characteristic curves for the Loamy Sand and Silty Clay soil samples, respectively. As mentioned earlier, the evaluated models can be categorized into two classes based on their prediction accuracy: high and low. Figs. 2 and 3 explicitly demonstrate these differences.
Figure 5 depicts the difference curves for Loamy Sand samples, while Fig. 6 displays the difference curves for Silty Clay samples. Each figure includes multiple subfigures (a-h), one for each evaluated model. Consistent with the high capability of the Gradient Boosting, AB, Tree, and Random Forest models discussed earlier, Figs. 5 and 6 (subfigures a-d) show that the differences for these models fluctuate minimally around zero. By contrast, the other studied models, namely Neural Network, SVM, KNN, and Linear Regression, demonstrate significant underestimation at low matric suction and overestimation at higher matric suctions, as depicted in Figs. 5 and 6 (subfigures e-h). Similar to the results of this study, Achieng (2019) observed residual SWCC values of about −0.1 to 0.1 g g⁻¹, but did not find a specific pattern for changes in errors with increasing matric suction. However, as illustrated in Figs. 5-f and 6-f for both the studied Loamy Sand and Silty Clay soil textures, the highest estimation errors are observed at the two ends of the SWCC. In other words, the SVM model shows the highest error in the estimation of the structural-based and textural-based sections of the SWCC, and around the inflection point the estimation error of the SVM model diminishes to about zero.
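The under- and overestimation pattern described above can be checked numerically by averaging the signed difference (predicted minus measured) on each side of the inflection point. The sketch below is illustrative only: the function name, the 33 kPa inflection suction, and all data values are hypothetical and are not taken from this study.

```python
def mean_residuals_by_side(suction, measured, predicted, inflection_suction):
    """Return the mean signed residual (predicted - measured) for points at or
    below the inflection suction and for points above it."""
    low, high = [], []
    for h, m, p in zip(suction, measured, predicted):
        (low if h <= inflection_suction else high).append(p - m)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(low), mean(high)


# Hypothetical data: matric suction (kPa) and water content (m3 m-3).
suction = [1, 10, 33, 100, 500, 1500]
measured = [0.42, 0.36, 0.28, 0.18, 0.12, 0.09]
predicted = [0.36, 0.32, 0.27, 0.20, 0.16, 0.13]

low_mean, high_mean = mean_residuals_by_side(
    suction, measured, predicted, inflection_suction=33  # assumed value
)
print(f"mean residual at or below inflection: {low_mean:+.3f}")
print(f"mean residual above inflection:       {high_mean:+.3f}")
```

A negative mean on the low-suction side together with a positive mean on the high-suction side corresponds to the underestimation and overestimation pattern reported for the less accurate models.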

Figure 2 Comparison of the predicted and measured SWCCs by different models in Loamy Sand soil sample

Figure 3 Comparison of the predicted and measured SWCCs by different models in Silty Clay soil samples

Figure 4 Comparison of the evaluated models' error percentages on the two sides of the SWCC inflection point in a) Loamy Sand and b) Silty Clay soil samples

Figure 5 Absolute difference between the predicted and measured SWCCs for the evaluated models in Loamy Sand soil samples

Figure 6 Absolute difference between the predicted and measured SWCCs for the evaluated models in Silty Clay soil samples

Table 2 Properties of treatments at the end of incubation and the results obtained from image analysis

Similar to Table 2 and Table 1, a dataset of individual treatments was prepared, which was automatically divided into model training and test datasets.

Table 5 Statistics of models in the case where matric suction was used as the only input parameter

Wang et al. (2021) percentages to quantify the mean differences between the observed and predicted SWCCs in both the Loamy Sand and Silty Clay soil samples. These error percentages provide insights into the uncertainty associated with each evaluated model. Although Wang et al. (2021) demonstrated high accuracy in determining the SWCC for soils with a high clay fraction, this study found that the average error of the eight models for Loamy Sand soil samples was considerably higher, at 35%, than for Silty Clay soil samples. However, the four well-performing models, namely Gradient Boosting, AB, Tree, and Random Forest, exhibited an equal average error percentage of approximately 5% in both Loamy Sand and Silty Clay soil samples, and no significant difference in the estimation of the SWCC was observed between the two studied soil textures. The Gradient Boosting model demonstrated superior prediction capability in both studied soil textures, and it exhibited the lowest error percentage in Loamy Sand soil samples, with an average uncertainty of 2.7%. The other evaluated models, namely Neural Network, SVM, KNN, and Linear Regression, produced unreliable outputs with error percentages exceeding 20%. In particular, the SVM model performed poorly in Loamy Sand soil samples, reaching errors of approximately 90%. Interestingly, these models showed comparatively better prediction performance in Silty Clay than in Loamy Sand soil samples.
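As an illustration of how such an error percentage might be computed, the sketch below uses the mean absolute relative error between observed and predicted water contents, expressed in percent. This definition is an assumption, since the exact formula is not given in this section, and the function name and data are hypothetical.

```python
def error_percentage(observed, predicted):
    """Mean absolute relative error (percent) between observed and predicted
    water contents. Assumed definition, for illustration only."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    relative_errors = [abs(o - p) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical water contents (m3 m-3) at three matric suctions.
observed = [0.40, 0.20, 0.10]
predicted = [0.38, 0.21, 0.12]

print(f"error percentage: {error_percentage(observed, predicted):.1f}%")
```

Under this definition, the same absolute error produces a larger percentage where the observed water content is small, so points at the dry end of the curve can dominate the average.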

Table 7 Uncertainties of the evaluated models (error percentage between observed and predicted results) in the prediction of soil moisture content at different soil matric suctions in Loamy Sand and Silty Clay soil samples

3.6. Prediction errors at two sides of the inflection point
Some have observed that their models underestimated the water content of the SWCC at relatively high suction heads (Nguyen et al., 2017; Hwang and Powers, 2003a; Meskini-Vishkaee et al., 2014; Mohammadi and Meskini-Vishkaee, 2012; Tuller and Or, 2001; Tuller et al., 1999). Nguyen et al. (2017) attributed the underestimation of the SWCC to the lack of measurements of input features at high matric suctions. Other studies have shown the existence of corner water, lens water, and film water in soils, which may be one of the main causes of the underestimation phenomenon (Mohammadi and Meskini-Vishkaee, 2012; Or and Tuller, 1999; Shahraeeni and Or, 2010; Tuller and Or, 2005; Tuller et al., 1999). However, Wang et al. (2021) claim that their improved prediction model can effectively predict soil-water characteristic curves, especially for soils at high matric suctions. In contrast, in this study, we