Soil temperature (ST) plays a key role in ecosystems by affecting processes such as the hydrological response of the soil, the accumulation and degradation of organic matter, plant growth, nutrient mineralization, carbon emissions, the optimal sowing time and microbial activity (Brar et al., 1992; Peng et al., 2009; Hu et al., 2016; Citakoglu, 2017). Variations in ST can alter soil properties and thus have considerable environmental consequences through changes in the carbon balance (Qian et al., 2011). ST is also an important parameter in ecological, climatological and hydrological modeling (Tabari et al., 2015), so information about ST is valuable for decision-making. In addition, ST varies with depth: its fluctuations are much smaller in deeper layers than near the soil surface, so accurate ST assessments have to be made at several depths. For instance, ST in the topsoil (upper 5 cm) affects seed sowing, while ST at a depth of 10–15 cm is required for tree grafting. Previous studies have sought relationships between ST at various depths and the main factors that control it (Citakoglu, 2017).
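The damping of ST fluctuations with depth can be illustrated with the classical analytical solution of one-dimensional heat conduction in a homogeneous soil whose surface temperature oscillates sinusoidally. This is a textbook idealization given for orientation only, not a model used in this study; the symbols (mean temperature $\bar{T}$, surface amplitude $A_0$, thermal diffusivity $\kappa$, angular frequency $\omega$, damping depth $d$) are introduced here for illustration.

```latex
T(z,t) = \bar{T} + A_0\, e^{-z/d} \sin\!\left(\omega t - \frac{z}{d}\right),
\qquad d = \sqrt{\frac{2\kappa}{\omega}}
```

For the daily cycle ($\omega = 2\pi / 86\,400\ \mathrm{s^{-1}}$) and an assumed $\kappa = 5 \times 10^{-7}\ \mathrm{m^2\,s^{-1}}$, $d \approx 0.12$ m, so the diurnal amplitude falls to roughly 65% of its surface value at 5 cm and to about 1.4% at 50 cm, with the peak arriving later as depth increases (phase lag $z/d$).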
Unfortunately, spatially distributed ST data are unavailable in many regions of the world (Hu et al., 2016), because accurate measurements are expensive and time-consuming (Napagoda and Tilakaratne, 2012). Moreover, ST is influenced by many environmental factors, including meteorological variables (e.g. solar radiation, air humidity, pressure and temperature, precipitation, sunshine hours and wind speed), topographic conditions and soil factors such as soil water content, texture and surface cover (Paul et al., 2004; Samadianfard, 2018; Zeynoddin, 2019). Despite technological advances in sensors and devices, direct ST measurement remains expensive because it requires continuous monitoring at several soil depths (Plauborg, 2002). To overcome the challenge of quantifying ST over large areas, researchers have focused on modelling and predicting ST with various techniques (Sándor and Fodor, 2012). Accurate ST modelling reduces the time, cost and instrument maintenance that measurement requires (Maryanaji et al., 2017). Because ST is strongly correlated with meteorological variables, several models have been built on these relationships: linear models (Kang et al., 2000; Bond-Lamberty et al., 2005), analytical models based on conductive heat transfer (Ozgener et al., 2013; Cleall et al., 2015; Badache et al., 2016), and numerical models that account for complex heat and mass transport in soil (Liu et al., 2005; Belghit and Benyaich, 2014; Gao et al., 2016). Each of these model types has limitations: linear equations have limited predictive power because of their simple, linear structure, while analytical and numerical models are difficult to use because of their complexity and high data demand (Xing et al., 2018).
Over recent decades, machine learning (ML) models, a class of computational artificial intelligence (AI) methods, have attracted researchers' attention across many disciplines, especially the geosciences. ML tools can process large datasets efficiently, and non-linear models such as AI models are highly capable of simulating complex processes because of their non-linear and complex structure (Khosravi et al., 2018, 2019).
Previous studies have applied ML models successfully to ST modeling. For example, Mihalakakou (2002) and Ozturk et al. (2011) used artificial neural network (ANN) models with geographical and meteorological variables and concluded that ANN predicts monthly mean ST with good accuracy. Araghi et al. (2017) showed that the wavelet artificial neural network (WANN) was an accurate approach for forecasting ST 1–7 days ahead at depths of 5–30 cm. Citakoglu (2017) compared ANN, adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression (MLR) models for estimating ST and found that ANFIS outperformed both ANN and MLR. Kisi et al. (2017) implemented ANN, ANFIS and gene expression programming (GEP) to predict ST at depths of 10, 50 and 100 cm from climatic data. They concluded that GEP outperformed the other algorithms when climatic inputs were used, whereas ANN performed better than ANFIS and GEP in models developed without climatic data. Sanikhani et al. (2018) used non-tuned data-intelligent models, including the extreme learning machine (ELM), ANN and the M5 Model Tree (M5 Tree), to predict ST from monthly meteorological inputs and found that ELM is a suitable tool for ST estimation at multiple soil depths. Samadianfard et al. (2018) developed two data-intelligent models, WANN and GEP, for short-term estimation of ST at different depths and found that WANN performed best at all depths considered. Xing et al. (2018) applied a support vector machine (SVM) to predict daily ST in different climates of the USA and showed that SVM performs well in predicting ST. Zeynoddin et al. (2019) found that the multilayer perceptron neural network (MLP) performed well for daily ST at two weather stations in northwestern Iran.
Feng et al. (2019) assessed the ability of ELM, generalized regression neural networks (GRNN), backpropagation neural networks (BPNN) and random forests (RF) to model half-hourly ST and found that all models performed acceptably at different depths, with ELM slightly better than the others. Zeynoddin et al. (2020) predicted ST using stochastic linear modeling and a Holt-Winters AExpo-SARIMA model at the Bandar Abbas and Kerman synoptic stations, Iran, at depths of 5, 10, 20, 30, 50 and 100 cm, and reported that the proposed method had reasonable predictive power.
Although the above-mentioned traditional models (e.g. ANN, ANFIS, SVM and ELM) have been applied successfully to ST modeling, they may have drawbacks, such as i) the low generalization performance of ANN (Melesse et al., 2011), ii) the difficulty of accurately determining the weights of the ANFIS membership functions (Bui et al., 2016), iii) the need for large training datasets in ELM, and iv) the high sensitivity of SVM to hyperparameter selection (Waseem Ahmad et al., 2018). We hypothesize that newer AI-based models and data mining algorithms can address these drawbacks.
Data mining algorithms have proved useful in flood susceptibility mapping with Logistic Model Trees (LMT), Kernel Logistic Regression (KLR), the Radial Basis Function Classifier (RBFC) and Multinomial Naïve Bayes (NBM) (Pham et al., 2020), in landslide susceptibility assessment (Luo et al., 2019) and in groundwater potential mapping with LMT, logistic regression (LR) and RF (Razavi-Termeh et al., 2019). Recently, these algorithms have also been applied to time-series prediction, including bed-load transport rate prediction with M5P, random tree (RT), RF and reduced error pruning tree (REPT), together with four hybrid algorithms built with bagging (BA): BA-M5P, BA-RF, BA-RT and BA-REPT (Khosravi et al., 2020a), and spring discharge prediction with M5P, RF and support vector regression (SVR) (Granata et al., 2018). They have also been employed for other environmental processes, such as suspended sediment load (Khosravi et al., 2018; Salih et al., 2019), fluoride concentration in groundwater (Khosravi et al., 2020b), erosion (Pourghasemi et al., 2017, 2020) and soil respiration (Ebrahimi et al., 2019).
Most of the above-mentioned studies demonstrated that newer standalone data mining algorithms (e.g. M5P, RT) are promising alternatives to traditional ML methods (e.g. ANN, ANFIS, SVR), and that hybrid algorithms can further improve predictive performance over standalone algorithms by increasing model flexibility (Khosravi et al., 2018; Pham et al., 2019).
In this study, we propose six novel hybrid resampling algorithms that combine BA and dagging (DA) with three lazy (instance-based) learners, IBK (k-nearest neighbours), KStar and locally weighted learning (LWL), namely: i) BA-IBK, ii) BA-KStar, iii) BA-LWL, iv) DA-IBK, v) DA-KStar and vi) DA-LWL, to forecast daily ST 3, 6 and 9 days ahead from nearby meteorological data at two soil depths (5 and 50 cm). To the best of our knowledge, these hybrid algorithms have not previously been applied in soil science or, more broadly, in the geosciences, and this is the first attempt to forecast ST with them. We tested these approaches in a largely arid and semi-arid agricultural area where soil and water resources are scarce and accurate ST estimates are needed to promote long-term sustainable agriculture. If they perform well, these algorithms could offer an easy and accurate way to predict ST in other regions as well, reducing the time and resources needed to measure ST.
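The two resampling schemes differ in how each base learner's training set is drawn. Bagging gives every learner a bootstrap sample of the full data (drawn with replacement). Dagging splits the data into disjoint subsets and trains one learner per subset. Both then combine the members' predictions. The minimal sketch below illustrates this under stated assumptions: a plain Euclidean k-nearest-neighbour regressor stands in for IBK, KStar and LWL are not implemented, member predictions are simply averaged, and the function names (`make_supervised`, `knn_predict`, `bagging_predict`, `dagging_predict`) and default parameters are hypothetical, not the settings used in this study.

```python
import random
from statistics import mean

# Hypothetical helpers illustrating BA-IBK and DA-IBK style ensembles for regression.
# Each training row is a (feature_tuple, target) pair.


def make_supervised(weather, soil_temp, horizon):
    """Pair the weather features of day t with the soil temperature of day t + horizon."""
    return [(weather[t], soil_temp[t + horizon]) for t in range(len(weather) - horizon)]


def knn_predict(train, x, k=3):
    """IBK-like k-nearest-neighbour regression: mean target of the k rows closest to x."""
    # Squared Euclidean distance gives the same ranking as Euclidean distance.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(features, x)), target)
        for features, target in train
    )
    return mean(target for _, target in ranked[:k])


def bagging_predict(train, x, n_models=10, k=3, seed=0):
    """Bagging: every base learner sees a bootstrap sample (with replacement) of all rows."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in range(len(train))]
        predictions.append(knn_predict(sample, x, k))
    return mean(predictions)  # assumption: members are combined by averaging


def dagging_predict(train, x, n_folds=4, k=3, seed=0):
    """Dagging: shuffle once, split into disjoint folds, train one base learner per fold."""
    if not 1 <= n_folds <= len(train):
        raise ValueError("n_folds must be between 1 and the number of training rows")
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]  # non-overlapping subsets
    return mean(knn_predict(fold, x, k) for fold in folds)
```

For the three horizons considered here, one would build a separate supervised set per horizon, e.g. `make_supervised(weather, soil_temp, h)` for `h` in `(3, 6, 9)`, and fit the ensembles at each depth separately. Swapping `knn_predict` for a KStar- or LWL-style learner would give the other combinations.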