According to the Köppen classification, Konya has a cold semi-arid climate (BSk), while the Trewartha classification places it in the temperate continental (Dc) class. The mean summer temperature is approximately 30°C, with cool nights, whereas the winter season averages -4.2°C. Precipitation is low, averaging only 325 mm annually, and falls mainly in winter and spring.
This study focuses on the Great Konya Basin, an area of great agricultural importance within Turkey. The region is renowned for its diverse fruit production, particularly sweet cherry, apple, and peach. Where irrigation water is available, the area also supports cash crops such as corn and clover. Regular application of commercial fertilizers, supplemented by limited use of organic fertilizers, has sustained orchard yields. However, because fresh water is scarce in the region, most of the agricultural land is managed under a low-input, rain-fed system, and drought-resistant cereals such as wheat and barley are therefore the most common field crops.
The Great Konya Basin lies at an elevation of about 1000 m. The basin has been shaped by the inflow of several rivers, and lacustrine carbonate formation and diagenesis play a major role in the physicochemical and mineralogical characteristics of its soils, as noted in previous studies (de Meester, 1971; Ozaytekin et al., 2012). The soils developed primarily on Quaternary sediments (de Meester, 1970a, b) deposited by a shallow lake during the Late Pleistocene, which left behind several sandy beach ridges and sand plains. These diverse sediments gave rise to different physiographic units, including uplands, colluvial slopes, piedmont plains, bajadas, terraces, alluvial plains, and lacustrine plains, overlying the soft calcareous lake bottom (de Meester, 1970a, b). Flat Neogene limestone terraces occur along the periphery of the Konya Basin; the terrain slopes gradually toward the central part of the basin and is locally dissected by erosion gullies. The southern part of the basin is characterized by alluvial plains and fans composed of river-deposited sediments ranging from coarse sand to heavy clay. The basin itself is dominated by flat, calcareous lacustrine plains deposited under water, surrounded by sandy beach ridges and shorelines formed by the wave action of the former Pleistocene lake, dated to between 23,000 and 17,000 years BP (Roberts et al., 1979; Roberts, 1983).
Descriptive soil properties
Soil samples were collected from the 0–20 cm depth interval, the layer to which fertilizer is applied and which is homogenized by routine ploughing (Kacar, 2013). A total of 538 surface soil samples were collected across the study area. The samples were air-dried and passed through a 2 mm sieve before determination of standard soil characteristics. Physical and chemical attributes were evaluated with conventional techniques applicable to calcareous soils (Kacar, 2009; Sparks, 1996). The properties determined included particle-size distribution (clay, silt, and sand fractions) by the hydrometer method; electrical conductivity (EC) and pH of the saturation paste, measured with an EC probe and a combined pH electrode, respectively; organic matter content by the Walkley–Black dichromate oxidation method; and available phosphorus by the Olsen et al. (1954) method. For total elemental analysis, the samples were digested in aqua regia (3:1 v/v HCl:HNO3), and the P and Cd concentrations in the digests were determined by ICP-OES (Perkin-Elmer Optima 2100).
Statistical procedures
Descriptive statistics of all soil properties across the observation points are reported as the mean, standard error of the mean, median, standard deviation, variance, skewness, kurtosis, range, minimum, and maximum (Table 1). The relationships among the measured soil parameters were then examined with conventional Spearman correlation. The data were first split into a training set (70% of the observations) and a testing set (30%). The training set was further partitioned at random into learning (70%) and validation (30%) subsets, which were used to train a variety of machine learning and regression models. With Cd as the response variable, the optimal parameters of each model were selected through a grid search over the parameter space, and the final model was chosen as the one with the lowest RMSE. All analyses were carried out in R version 4.3.0 (R Core Team, 2023) within the RStudio environment.
Table 1
Descriptive statistics for measured soil properties
| Statistic | EC (dS/cm) | pH | P2O5 (mg/kg) | OM (%) | CaCO3 (%) | TCd (mg/kg) | P (mg/kg) | CEC (mmol/kg) | Sand (%) | Silt (%) | Clay (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.054 | 7.84 | 2.99 | 1.68 | 29.01 | 0.421 | 272.6 | 29.0 | 31.66 | 27.57 | 40.78 |
| Std. Err. of Mean | 0.003 | 0.01 | 0.07 | 0.04 | 0.88 | 0.015 | 4.6 | 0.4 | 0.75 | 0.37 | 0.69 |
| Std. Deviation | 0.063 | 0.34 | 1.58 | 0.85 | 20.50 | 0.337 | 106.4 | 8.9 | 17.29 | 8.61 | 16.08 |
| Variance | 0.004 | 0.11 | 2.48 | 0.72 | 420.10 | 0.114 | 11327 | 78.4 | 298.8 | 74.13 | 258.5 |
| Skewness | 5.442 | -0.53 | 1.29 | 1.73 | 0.71 | 2.142 | 0.5 | 0.5 | 0.59 | 0.54 | 0.11 |
| Kurtosis | 40.74 | 6.83 | 2.51 | 4.31 | -0.38 | 4.959 | 0.0 | 0.1 | 0.02 | 2.04 | -0.73 |
| Range | 0.686 | 3.59 | 10.2 | 5.36 | 95.24 | 1.970 | 578.2 | 58.2 | 90.54 | 65.00 | 75.37 |
| Minimum | 0.003 | 5.83 | 0.46 | 0.34 | 0.00 | 0.000 | 79.4 | 7.5 | 1.27 | 1.09 | 5.57 |
| Maximum | 0.689 | 9.42 | 10.68 | 5.70 | 95.24 | 1.970 | 657.6 | 65.7 | 91.81 | 66.09 | 80.94 |

Std. Error of Skewness = 0.105; Std. Error of Kurtosis = 0.21 (N = 538)
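
The data partitioning and grid-search tuning described above can be sketched in R with the caret package. The paper does not name a tuning library, so this package choice, the data frame `soil`, the response column `TCd`, and the random forest tuning grid are illustrative assumptions rather than the authors' code:

```r
# Minimal sketch, assuming a data frame 'soil' with response 'TCd' (total Cd)
# and the measured soil properties as predictors (names are hypothetical)
library(caret)

set.seed(123)
idx      <- createDataPartition(soil$TCd, p = 0.70, list = FALSE)  # 70/30 split
train_df <- soil[idx, ]
test_df  <- soil[-idx, ]

# Repeated random 70/30 learning/validation splits within the training set
ctrl <- trainControl(method = "LGOCV", p = 0.70, number = 10)

# Grid search over a hypothetical hyperparameter grid; the candidate with the
# lowest validation RMSE is retained (random forest shown as one example model)
fit <- train(TCd ~ ., data = train_df, method = "rf",
             metric = "RMSE", trControl = ctrl,
             tuneGrid = expand.grid(mtry = c(2, 4, 6)))

pred <- predict(fit, newdata = test_df)
RMSE(pred, test_df$TCd)   # test-set error of the tuned model
```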
Machine learning algorithms
Multivariate Adaptive Regression Splines (MARS), introduced by Friedman (1991), is a nonparametric machine learning technique for pattern recognition in both classification and regression tasks, and it is particularly suited to data with nonmonotonic or nonlinear behavior. The MARS model builds a set of piecewise linear basis functions whose combination is used to predict the continuous response variable.
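
As an illustration only, a MARS model of this kind could be fitted with the earth package in R; the package choice and the object names carried over from the sketch above are assumptions, not the authors' implementation:

```r
# Minimal MARS sketch with the 'earth' package (assumed 'train_df'/'test_df'
# with response 'TCd'; tuning values are illustrative)
library(earth)

mars_fit <- earth(TCd ~ ., data = train_df,
                  degree = 2,    # allow two-way interactions between hinge functions
                  nprune = 15)   # maximum number of terms kept after pruning

summary(mars_fit)                      # selected piecewise linear basis functions
mars_pred <- predict(mars_fit, newdata = test_df)
```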
Decision trees are tree-based algorithms used to predict a quantitative response (Ali et al., 2015). Breiman et al. (1984) developed the Classification and Regression Trees (CART) procedure for this purpose. CART is a binary tree model in which each node is recursively split into two child nodes. The algorithm iteratively derives a set of homogeneous terminal nodes from the learning dataset, with the aim of minimizing the error variance in both the training and test sets.
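
A CART regression tree of this form could be grown and pruned with the rpart package; again, the package, control settings, and object names are illustrative assumptions:

```r
# Minimal CART sketch: binary recursive partitioning for a continuous response
library(rpart)

cart_fit <- rpart(TCd ~ ., data = train_df, method = "anova",
                  control = rpart.control(minsplit = 20, cp = 0.01))

# Prune back to the complexity parameter with the lowest cross-validated error
best_cp   <- cart_fit$cptable[which.min(cart_fit$cptable[, "xerror"]), "CP"]
cart_fit  <- prune(cart_fit, cp = best_cp)
cart_pred <- predict(cart_fit, newdata = test_df)
```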
The Random Forest approach, first introduced by Breiman (2001), is a nonparametric algorithm capable of handling both classification and regression tasks. It combines many regression trees into an ensemble: each tree is grown on a bootstrap sample of the observations, and a random subset of the predictors is considered at each split, so every tree in the forest assigns a different set of predictors to its root, internal, and leaf nodes. The prediction of the dependent variable is obtained as the mean of the individual tree outputs at the terminal (leaf) nodes (Svetnik et al., 2003).
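
A random forest of this type could be fitted with the randomForest package; the settings below (number of trees, predictors tried per split) are illustrative defaults, not values taken from the study:

```r
# Minimal random forest sketch: bootstrap samples of observations, with a
# random subset of 'mtry' predictors tried at each split
library(randomForest)

set.seed(123)
rf_fit <- randomForest(TCd ~ ., data = train_df,
                       ntree = 500,                             # number of trees
                       mtry  = floor((ncol(train_df) - 1) / 3), # predictors per split
                       importance = TRUE)

rf_pred <- predict(rf_fit, newdata = test_df)
importance(rf_fit)    # variable importance averaged across the ensemble
```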
XGBoost is an efficient and scalable implementation of the gradient boosting framework introduced by Friedman (2001). The method is a tree-based regression algorithm built on gradient-boosted trees and uses the same decision rules as the decision tree algorithm. Yu et al. (2020) note that XGBoost combines a collection of classification and regression trees into an expression that closely fits the training dataset. XGBoost can also exploit sparsity and counter overfitting by incorporating shrinkage and regularization (Gertz et al., 2020). During training, XGBoost uses decision trees to separate groups and to identify the variables that improve model performance, and it typically favors computational efficiency over the inclusion of superfluous variables (Gertz et al., 2020). The overall objective of the procedure is to construct an ensemble of decision trees whose combined predictions achieve low bias.
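
A gradient-boosted tree model of this kind could be trained with the xgboost package; the hyperparameter values shown are illustrative assumptions, not those used in the study:

```r
# Minimal XGBoost sketch: gradient-boosted regression trees with shrinkage (eta)
# and subsampling-based regularization
library(xgboost)

X_train <- as.matrix(train_df[, setdiff(names(train_df), "TCd")])
X_test  <- as.matrix(test_df[,  setdiff(names(test_df),  "TCd")])

xgb_fit <- xgboost(data = X_train, label = train_df$TCd,
                   objective = "reg:squarederror",
                   nrounds = 300, max_depth = 4,
                   eta = 0.05,                         # learning rate (shrinkage)
                   subsample = 0.8, colsample_bytree = 0.8,
                   verbose = 0)

xgb_pred <- predict(xgb_fit, X_test)
```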
Model comparison criteria
The metrics most frequently used for model comparison are the root mean squared error (RMSE), the standard deviation ratio (SDratio), the coefficient of determination (R2), and the coefficient of variation (Table 2). The goodness-of-fit criteria were computed with the ehaGoF package (Eyduran, 2019).
Table 2
Assessing the Effectiveness of Models
Error metric | Equation |
Root mean square deviation | \(RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}}\) |
Standard deviation ratio (SDR) | \({SD}_{ratio}=\sqrt{\frac{\frac{1}{n-1}\sum _{i=1}^{n}{\left({\epsilon }_{i}-\bar{\epsilon }\right)}^{2}}{\frac{1}{n-1}\sum _{i=1}^{n}{\left({Y}_{i}-\bar{Y}\right)}^{2}}}\) |
Coefficient of determination | \({R}_{sq}=\left[\sqrt{1-\frac{\sum _{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}}{\sum _{i=1}^{n}{\left({Y}_{i}-\bar{Y}\right)}^{2}}}\right]\times 100\) |
Coefficient of variation | \(CV\left(\%\right)=\frac{\sqrt{\frac{1}{n-1}\sum _{i=1}^{n}{\left({\epsilon }_{i}-\bar{\epsilon }\right)}^{2}}}{\bar{Y}}\times 100\) |
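
For reference, the four criteria in Table 2 can be written out directly in base R as a minimal sketch; the ehaGoF package reports the same statistics, and `obs`/`pred` are assumed numeric vectors of observed and predicted values:

```r
# Goodness-of-fit criteria as defined in Table 2 (errors epsilon = obs - pred)
gof <- function(obs, pred) {
  err      <- obs - pred
  rmse     <- sqrt(mean(err^2))                       # root mean square deviation
  sd_ratio <- sd(err) / sd(obs)                       # SD of errors over SD of observations
  r_sq     <- sqrt(1 - sum(err^2) /
                     sum((obs - mean(obs))^2)) * 100  # as expressed in Table 2
  cv       <- sd(err) / mean(obs) * 100               # coefficient of variation (%)
  c(RMSE = rmse, SDratio = sd_ratio, Rsq = r_sq, CV = cv)
}

gof(test_df$TCd, pred)   # e.g., applied to the tuned model's test-set predictions
```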