According to the Köppen classification, Konya has a cold semi-arid climate (BSk), while the Trewartha classification places it in the temperate continental (Dc) class. The mean summer temperature is approximately 30°C, with cool nights, whereas the winter season averages -4.2°C. Precipitation is low, averaging only 325 mm annually, and falls mainly in winter and spring.
This study focuses on the Great Konya Basin, an area of great agricultural importance within Turkey. The region is renowned for its diverse fruit production, particularly sweet cherry, apple, and peach. Where irrigation water is available, the area also supports cash crops such as corn and clover. Regular application of commercial fertilizers, supplemented by limited use of organic fertilizers, has sustained orchard yields. However, because fresh water is scarce in the region, most of the agricultural land is managed under a low-input, rain-fed system, and drought-resistant cereals such as wheat and barley are therefore the most common field crops.
The Great Konya Basin lies at an elevation of about 1000 m. The basin has been shaped by the inflow of several rivers, and lacustrine carbonate formation and diagenesis play a major role in the physicochemical and mineralogical characteristics of its soils, as noted in previous studies (de Meester, 1971; Ozaytekin et al., 2012). The soils developed primarily on Quaternary sediments (de Meester, 1970a, b) deposited by a shallow lake during the Late Pleistocene, which left behind several sandy beach ridges and sand plains. These diverse sediments gave rise to different physiographic units, including uplands, colluvial slopes, piedmont plains, bajadas, terraces, alluvial plains, and lacustrine plains, overlying the soft calcareous lake bottom (de Meester, 1970a, b). Flat Neogene limestone terraces occur along the periphery of the Konya Basin; the terrain slopes gradually toward the central part of the basin and is locally dissected by erosion gullies. The southern part of the basin is characterized by alluvial plains and fans composed of river-deposited sediments ranging from coarse sand to heavy clay. The basin itself is dominated by flat, calcareous lacustrine plains deposited under water, surrounded by sandy beach ridges and shorelines formed by the wave action of the former Pleistocene lake, dated to between 23,000 and 17,000 years BP (Roberts et al., 1979; Roberts, 1983).
Descriptive soil properties
Soil samples were collected from the 0–20 cm depth interval, the layer to which fertilizer is applied and which is homogenized by routine ploughing (Kacar, 2013). A total of 538 surface soil samples were collected across the study area. The samples were air-dried and passed through a 2 mm sieve before determination of standard soil characteristics. Physical and chemical attributes were evaluated with conventional techniques applicable to calcareous soils (Kacar, 2009; Sparks, 1996). The properties determined included particle-size distribution (clay, silt, and sand fractions) by the hydrometer method; electrical conductivity (EC) and pH of the saturation paste, measured with an EC probe and a combined pH electrode, respectively; organic matter content by the Walkley–Black dichromate oxidation method; and available phosphorus by the Olsen et al. (1954) method. For total elemental analysis, the samples were digested in aqua regia (3:1 v/v HCl:HNO3), and the P and Cd concentrations in the digests were determined by ICP-OES (Perkin-Elmer Optima 2100).
Statistical procedures
Descriptive statistics of all soil properties across the observation points are reported as the mean, standard error of the mean, median, standard deviation, variance, skewness, kurtosis, range, minimum, and maximum (Table 1). The relationships among the measured soil parameters were then examined with conventional Spearman correlation. The data were first split into a training set (70% of the observations) and a testing set (30%). The training set was further partitioned at random into learning (70%) and validation (30%) subsets, which were used to train a variety of machine learning and regression models. With Cd as the response variable, the optimal parameters of each model were selected through a grid search over the parameter space, and the final model was chosen as the one with the lowest RMSE. All analyses were carried out in R version 4.3.0 (R Core Team, 2023) within the RStudio environment.
Table 1
Descriptive statistics for measured soil properties
| Statistic | EC (dS/cm) | pH | P2O5 (mg/kg) | OM (%) | CaCO3 (%) | TCd (mg/kg) | P (mg/kg) | CEC (mmol/kg) | Sand (%) | Silt (%) | Clay (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.054 | 7.84 | 2.99 | 1.68 | 29.01 | 0.421 | 272.6 | 29.0 | 31.66 | 27.57 | 40.78 |
| Std. Err. of Mean | 0.003 | 0.01 | 0.07 | 0.04 | 0.88 | 0.015 | 4.6 | 0.4 | 0.75 | 0.37 | 0.69 |
| Std. Deviation | 0.063 | 0.34 | 1.58 | 0.85 | 20.50 | 0.337 | 106.4 | 8.9 | 17.29 | 8.61 | 16.08 |
| Variance | 0.004 | 0.11 | 2.48 | 0.72 | 420.10 | 0.114 | 11327 | 78.4 | 298.8 | 74.13 | 258.5 |
| Skewness | 5.442 | -0.53 | 1.29 | 1.73 | 0.71 | 2.142 | 0.5 | 0.5 | 0.59 | 0.54 | 0.11 |
| Kurtosis | 40.74 | 6.83 | 2.51 | 4.31 | -0.38 | 4.959 | 0.0 | 0.1 | 0.02 | 2.04 | -0.73 |
| Range | 0.686 | 3.59 | 10.2 | 5.36 | 95.24 | 1.970 | 578.2 | 58.2 | 90.54 | 65.00 | 75.37 |
| Minimum | 0.003 | 5.83 | 0.46 | 0.34 | 0.00 | 0.000 | 79.4 | 7.5 | 1.27 | 1.09 | 5.57 |
| Maximum | 0.689 | 9.42 | 10.68 | 5.70 | 95.24 | 1.970 | 657.6 | 65.7 | 91.81 | 66.09 | 80.94 |

Std. Error of Skewness = 0.105; Std. Error of Kurtosis = 0.21 (N = 538)
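
The data partitioning and grid-search tuning described above can be sketched in R with the caret package. The paper does not name a tuning library, so this package choice, the data frame `soil`, the response column `TCd`, and the random forest tuning grid are illustrative assumptions rather than the authors' code:

```r
# Minimal sketch, assuming a data frame 'soil' with response 'TCd' (total Cd)
# and the measured soil properties as predictors (names are hypothetical)
library(caret)

set.seed(123)
idx      <- createDataPartition(soil$TCd, p = 0.70, list = FALSE)  # 70/30 split
train_df <- soil[idx, ]
test_df  <- soil[-idx, ]

# Repeated random 70/30 learning/validation splits within the training set
ctrl <- trainControl(method = "LGOCV", p = 0.70, number = 10)

# Grid search over a hypothetical hyperparameter grid; the candidate with the
# lowest validation RMSE is retained (random forest shown as one example model)
fit <- train(TCd ~ ., data = train_df, method = "rf",
             metric = "RMSE", trControl = ctrl,
             tuneGrid = expand.grid(mtry = c(2, 4, 6)))

pred <- predict(fit, newdata = test_df)
RMSE(pred, test_df$TCd)   # test-set error of the tuned model
```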
Machine learning algorithms
Multivariate Adaptive Regression Splines (MARS), introduced by Friedman (1991), is a nonparametric machine learning technique for pattern recognition in both classification and regression tasks, and it is particularly suited to data with nonmonotonic or nonlinear behavior. The MARS model builds a set of piecewise linear basis functions whose combination is used to predict the continuous response variable.
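
As an illustration only, a MARS model of this kind could be fitted with the earth package in R; the package choice and the object names carried over from the sketch above are assumptions, not the authors' implementation:

```r
# Minimal MARS sketch with the 'earth' package (assumed 'train_df'/'test_df'
# with response 'TCd'; tuning values are illustrative)
library(earth)

mars_fit <- earth(TCd ~ ., data = train_df,
                  degree = 2,    # allow two-way interactions between hinge functions
                  nprune = 15)   # maximum number of terms kept after pruning

summary(mars_fit)                      # selected piecewise linear basis functions
mars_pred <- predict(mars_fit, newdata = test_df)
```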
Decision trees are tree-based algorithms used to predict a quantitative response (Ali et al., 2015). Breiman et al. (1984) developed the Classification and Regression Trees (CART) procedure for this purpose. CART is a binary tree model in which each node is recursively split into two child nodes. The algorithm iteratively derives a set of homogeneous terminal nodes from the learning dataset, with the aim of minimizing the error variance in both the training and test sets.
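
A CART regression tree of this form could be grown and pruned with the rpart package; again, the package, control settings, and object names are illustrative assumptions:

```r
# Minimal CART sketch: binary recursive partitioning for a continuous response
library(rpart)

cart_fit <- rpart(TCd ~ ., data = train_df, method = "anova",
                  control = rpart.control(minsplit = 20, cp = 0.01))

# Prune back to the complexity parameter with the lowest cross-validated error
best_cp   <- cart_fit$cptable[which.min(cart_fit$cptable[, "xerror"]), "CP"]
cart_fit  <- prune(cart_fit, cp = best_cp)
cart_pred <- predict(cart_fit, newdata = test_df)
```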
The Random Forest approach, first introduced by Breiman (2001), is a nonparametric algorithm capable of handling both classification and regression tasks. It combines many regression trees into an ensemble: each tree is grown on a bootstrap sample of the observations, and a random subset of the predictors is considered at each split, so every tree in the forest assigns a different set of predictors to its root, internal, and leaf nodes. The prediction of the dependent variable is obtained as the mean of the individual tree outputs at the terminal (leaf) nodes (Svetnik et al., 2003).
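
A random forest of this type could be fitted with the randomForest package; the settings below (number of trees, predictors tried per split) are illustrative defaults, not values taken from the study:

```r
# Minimal random forest sketch: bootstrap samples of observations, with a
# random subset of 'mtry' predictors tried at each split
library(randomForest)

set.seed(123)
rf_fit <- randomForest(TCd ~ ., data = train_df,
                       ntree = 500,                             # number of trees
                       mtry  = floor((ncol(train_df) - 1) / 3), # predictors per split
                       importance = TRUE)

rf_pred <- predict(rf_fit, newdata = test_df)
importance(rf_fit)    # variable importance averaged across the ensemble
```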
XGBoost is an efficient and scalable implementation of the gradient boosting framework introduced by Friedman (2001). The method is a tree-based regression algorithm built on gradient-boosted trees and uses the same decision rules as the decision tree algorithm. Yu et al. (2020) note that XGBoost combines a collection of classification and regression trees into an expression that closely fits the training dataset. XGBoost can also exploit sparsity and counter overfitting by incorporating shrinkage and regularization (Gertz et al., 2020). During training, XGBoost uses decision trees to separate groups and to identify the variables that improve model performance, and it typically favors computational efficiency over the inclusion of superfluous variables (Gertz et al., 2020). The overall objective of the procedure is to construct an ensemble of decision trees whose combined predictions achieve low bias.
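
A gradient-boosted tree model of this kind could be trained with the xgboost package; the hyperparameter values shown are illustrative assumptions, not those used in the study:

```r
# Minimal XGBoost sketch: gradient-boosted regression trees with shrinkage (eta)
# and subsampling-based regularization
library(xgboost)

X_train <- as.matrix(train_df[, setdiff(names(train_df), "TCd")])
X_test  <- as.matrix(test_df[,  setdiff(names(test_df),  "TCd")])

xgb_fit <- xgboost(data = X_train, label = train_df$TCd,
                   objective = "reg:squarederror",
                   nrounds = 300, max_depth = 4,
                   eta = 0.05,                         # learning rate (shrinkage)
                   subsample = 0.8, colsample_bytree = 0.8,
                   verbose = 0)

xgb_pred <- predict(xgb_fit, X_test)
```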
Model comparison criteria
The metrics most frequently used for model comparison are the root mean squared error (RMSE), the standard deviation ratio (SDratio), the coefficient of determination (R2), and the coefficient of variation (Table 2). The goodness-of-fit criteria were computed with the ehaGoF package (Eyduran, 2019).
Table 2
Assessing the Effectiveness of Models
Error metric | Equation |
Root mean square deviation | \(RMSE=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}}\) |
Standard deviation ratio (SDR) | \({SD}_{ratio}=\sqrt{\frac{\frac{1}{n-1}\sum _{i=1}^{n}{\left({\epsilon }_{i}-\bar{\epsilon }\right)}^{2}}{\frac{1}{n-1}\sum _{i=1}^{n}{\left({Y}_{i}-\bar{Y}\right)}^{2}}}\) |
Coefficient of determination | \({R}_{sq}=\left[\sqrt{1-\frac{\sum _{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}}{\sum _{i=1}^{n}{\left({Y}_{i}-\bar{Y}\right)}^{2}}}\right]\times 100\) |
Coefficient of variation | \(CV\left(\%\right)=\frac{\sqrt{\frac{1}{n-1}\sum _{i=1}^{n}{\left({\epsilon }_{i}-\bar{\epsilon }\right)}^{2}}}{\bar{Y}}\times 100\) |
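
For reference, the four criteria in Table 2 can be written out directly in base R as a minimal sketch; the ehaGoF package reports the same statistics, and `obs`/`pred` are assumed numeric vectors of observed and predicted values:

```r
# Goodness-of-fit criteria as defined in Table 2 (errors epsilon = obs - pred)
gof <- function(obs, pred) {
  err      <- obs - pred
  rmse     <- sqrt(mean(err^2))                       # root mean square deviation
  sd_ratio <- sd(err) / sd(obs)                       # SD of errors over SD of observations
  r_sq     <- sqrt(1 - sum(err^2) /
                     sum((obs - mean(obs))^2)) * 100  # as expressed in Table 2
  cv       <- sd(err) / mean(obs) * 100               # coefficient of variation (%)
  c(RMSE = rmse, SDratio = sd_ratio, Rsq = r_sq, CV = cv)
}

gof(test_df$TCd, pred)   # e.g., applied to the tuned model's test-set predictions
```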