Predicting the dust events frequency around a degraded ecosystem and determining the contribution of their controlling factors using gradient boosting-based approaches and game theory

This study aimed to evaluate the performance of the gradient boosting machine (GBM) and extreme gradient boosting (XGB) models with linear, tree, and DART boosters in predicting monthly dust events frequency (MDEF) around a degraded wetland in southwestern Iran. The required monthly data for the long-term period from 1988 to 2018 were obtained from ground stations and satellite imagery. The best predictors were selected from eighteen climatic, terrestrial, and hydrological variables based on the multicollinearity (MC) test and the Boruta algorithm. The models’ performance was evaluated using the Taylor diagram. Game theory (i.e., SHAP values: SHV) was used to determine the contribution of the factors controlling MDEF in different seasons. Mean wind speed, maximum wind speed, rainfall, standardized precipitation evapotranspiration index (SPEI), soil moisture, erosive winds frequency, vapor pressure, vegetation area, water body area, and dried bed area of the wetland were confirmed as the best variables for predicting the MDEF around the studied wetland. The XGB-linear and XGB-tree models showed a higher capability in predicting the MDEF variations in the summer and spring seasons. However, the XGB-Dart model performed better than the XGB-linear and XGB-tree models in predicting the MDEF during the autumn and winter seasons. Rainfall (SHV = 1.6), surface water discharge (SHV = 2.4), mean wind speed (SHV = 10.1), and erosive winds frequency (SHV = 1.6) made the highest contribution to predicting the target variable in winter, spring, summer, and autumn, respectively. These findings demonstrate the effectiveness of gradient boosting-based approaches and game theory in determining the factors affecting MDEF around a destroyed international wetland in southwestern Iran, and they may be used to diminish the impacts of dust events on residents of this region.


Introduction
Wetlands are considered the transitional area between aquatic and terrestrial ecosystems (Gokce 2018). In general, these valuable ecosystems are divided into two categories, natural and man-made wetlands, and provide a wide range of services to humanity, including water supply, climate regulation, and carbon sequestration (Jia et al. 2020; Martins et al. 2020). Assuming a 1% discount rate, the present value of the damages caused by not protecting West Asian wetlands would reach $7.2 billion over the next 30 years (Eppink et al. 2014). In recent years, the water levels of these valuable ecosystems have declined due to climate change, meteorological droughts, and overuse of surface and groundwater resources (Cao et al. 2012; Ebrahimi-Khusfi et al. 2020a). According to a report by Davidson (2014), more than two-thirds of the world's wetlands have been destroyed since 1900. Wetland degradation, especially in arid areas, has led to the development of desertification and increased the occurrence of wind erosion and the activity of dust storms (Poornazari et al. 2020). In addition to climatic conditions, the frequency of dust events around degraded wetlands is a function of changes in the moisture content of wetlands and the flow discharge variations in the rivers leading to wetlands (Khusfi et al. 2017). Using spectral indices derived from remotely sensed data, it has been demonstrated that the water level of many wetlands has decreased in many countries, including China (Jiang et al. 2017; Song et al. 2014), Nigeria (Ayanlade and Proske 2016), Ethiopia (Gebresllassie et al. 2014), India (Chatterjee et al. 2015), and Iran (Ebrahimi-Khusfi et al. 2020a).
The most important wetlands in Iran are Hamoun-e-Pouzak, Hamoun-e-Sabereen, Hamoun-e-Hirmand, Jazmourian, Gavkhooni, and Hour-al-Azim (Rashki et al. 2017; Salmabai and Saeedi 2018; Vali et al. 2016). The Shadegan Wetland is also a destroyed wetland in the southwest of Iran, and numerous studies have addressed various aspects of it, such as the ecological risk assessment of heavy metals in its sediments and water (Ashayeri and Keshavarzi 2019), its economic valuation (Shamsudin et al. 2011), the identification of atrazine sources (Almasi et al. 2020), the impact of dust on its vegetation variations (Bayat et al. 2016), and the prediction of the factors influencing the conservation behavior of rural operators (Ghanian et al. 2015). However, no attempt has been made to predict the monthly dust events frequency (MDEF) and to identify its controlling factors around the Shadegan Wetland. The dried beds of destroyed wetlands have become areas prone to soil erosion, and the resulting dust production has negatively affected the quality of life in surrounding cities (Dahmardeh 2016; Rashki et al. 2021; Shahraki et al. 2021).
Dust storms have greatly threatened human health in the Middle East, particularly in Iran (Goudarzi et al. 2017; Khaniabadi et al. 2017a; Khaniabadi et al. 2017b). Ahvaz metropolis, one of the dustiest cities in Iran (Geravandi et al. 2017), is located near Shadegan Wetland and has suffered greatly from the increase in air pollution caused by particulate matter, wind erosion, and dust events in previous years (Khaefi et al. 2017; Salmabadi et al. 2020). In recent decades, the concentration of heavy metals in the indoor dust of Ahvaz has had a considerable potential to cause cancer in children (Neisi et al. 2016). Recognizing the major agents influencing dust event frequency in different seasons is an appropriate strategy to deal with the dangers of dust storms in various regions (Ebrahimi-Khusfi et al. 2020c). Although the impact of various climatic and terrestrial factors on dust events has been addressed in some previous studies, no attempt has been made to determine the agents affecting MDEF in degraded wetlands using advanced modeling techniques. Accordingly, in this study, we analyzed this issue for the destroyed Shadegan Wetland in the southwest of Iran using the best algorithm selected among the stochastic gradient boosting machine (GBM) and the extreme gradient boosting (XGB) algorithm with different boosters. Reproducibility, the ability to quantitatively analyze the contribution of factors affecting natural hazards, and the potential for continuous updating are among the capabilities of machine learning models (Youssef and Pourghasemi 2021). The good performance of various machine learning (ML) models in the spatial prediction of dust-sensitive areas and their relatively good performance in the temporal prediction of the dust storm index (DSI) have been demonstrated in some previous works (Chen et al. 2019; Ebrahimi-Khusfi et al. 2020b; Gholami et al. 2020).
Although gradient boosting-based ML models have demonstrated successful performance in various regression applications (Konstantinov and Utkin 2021; Natekin and Knoll 2013), the efficiency of four different gradient boosting-based ML approaches, namely the XGB model with different boosters and the GBM model, has not been evaluated and compared in the prediction of MDEF in different seasons around degraded wetlands. Thus, this evaluation was performed in the current study for an important international wetland, the Shadegan International Wetland in southwestern Iran. Given that strong and problematic correlations between some influencing variables may lead to deviations in the results, it is necessary to identify the most important variables with feature selection techniques before the modeling procedure. This issue has received little attention in dust studies and is addressed explicitly in the present study. In dust studies, the multicollinearity (MC) test has often been used for choosing the least correlated variables. However, feature selection algorithms have not been used for the prediction of MDEF in different seasons in previous dust studies. Boruta is a well-known algorithm that has been utilized in research related to other natural hazards such as floods, landslides, forest fires, gully erosion, and soil salinity (Amiri et al. 2019; Pourghasemi et al. 2020; Xu et al. 2020). Therefore, in this paper, in addition to the MC test, we used this algorithm to select the most relevant input data for predicting the variability in MDEF around the Shadegan Wetland.
Furthermore, determining the importance of the agents influencing the target variable, improving the interpretability of the predictive models, and explaining the contribution of predictor variables are important issues in the modeling process (Gilpin et al. 2018). One robust approach to improving the interpretability of data-driven predictive models and explaining the contribution of their predictor variables is the use of Shapley values from cooperative game theory (Gu et al. 2021; Nafarzadegan et al. 2021). Up to now, this approach has not been utilized in modeling MDEF variations across various geographical regions. Thus, SHapley Additive exPlanations (SHAP) values (Lundberg and Lee 2017) are computed for interpreting the best-performing model of the current study to increase our knowledge about the contribution and importance of influential variables. One of the advantages of this approach for explaining black-box ML models is that it is independent of the structure and parameters of the corresponding predictive model. Another advantage is its ability to fairly and efficiently allocate the contribution to each prediction among the predictor values (Rodríguez-Pérez and Bajorath 2020).
In total, the specific objectives of this study were (i) to choose the optimal input data combination for predicting MDEF in different seasons using the MC test and the Boruta algorithm, (ii) to evaluate the GBM and XGB models with different boosters for predicting the MDEF variations around the Shadegan International Wetland, and (iii) to determine the contribution of the factors influencing the MDEF in different seasons over the study area using the concept of cooperative game theory (i.e., SHAP values).

Study area
Shadegan Wetland is located in southwestern Iran, near the northern shores of the Persian Gulf (Fig. 1a). The wetland, with an area of 5400 km², is one of the largest wetlands in Iran, and out of 1201 international wetlands, it is registered as the 34th wetland of international importance under the Ramsar Convention (http://www.ramsar.org). Three rivers, the Maleh, the Gupal, and the Jarrahi, flow into this wetland (Fig. 1c), and the wetland is mainly fed by the Jarrahi River (Ghorbani et al. 2016). According to the discharge data recorded at hydrometric stations near Shadegan Wetland for a long-term period, the average discharge in the rivers leading to the lagoon was about 7.3 (Fig. 1b). The disruption of the wetland's hydrological system is a result of the construction of the Maroon Reservoir Dam on the Jarrahi River, the development of irrigation networks upstream in the catchment, and the addition of agricultural wastewater to the wetland in recent years (Ashayeri and Keshavarzi 2019; Sima and Tajrishy 2006). The prevailing wind direction in the northern and southern regions of the study area is mainly from the west and northwest, respectively (Fig. 1c). Based on the long-term average of monthly data from 1988 to 2018, mean annual rainfall, temperature, and wind speed in this area varied between 47.2 and 298.7 mm, 24.3 and 27.8 °C, and 4 and 5 m/s, respectively.

Material and methods
The framework used to perform the present research is shown in Fig. 2, and more details are provided in the following sections.

Data collection
The data used in this study are divided into two general categories. The first category is the data related to the target variable, the MDEF. To calculate it, 3-h observations of dust events recorded at synoptic stations under weather codes 6 to 9, 30 to 35, and 98 were used. These data were obtained from the Islamic Republic of Iran Meteorological Organization (IRIMO) for the nearest synoptic stations located around the Shadegan Wetland. The second category is the influential variables, which are divided into three main groups: (i) meteorological data, (ii) terrestrial data, and (iii) hydrological data. All required data were collected for all months from 1988 to 2018.
Fig. 2 The methodology used for predicting the monthly dust events frequency and determining its controlling factors around the Shadegan wetland
Some meteorological data such as the mean monthly values of wind speed, precipitation, relative humidity, air temperature, maximum values of wind speed, temperature, relative humidity, and minimum values of temperature and relative humidity were acquired from IRIMO for the synoptic stations around the wetland for the years 1988 to 2018.
Some other meteorological data, such as vapor pressure, vapor pressure deficit, and downward surface shortwave radiation, were obtained from the TerraClimate data set, which is provided by the University of Idaho on a monthly time scale at 4 km resolution (Abatzoglou et al. 2018). Furthermore, hourly wind velocity data were utilized to compute the frequency of erosive winds (> 6 m/s) in all months of the study period. Drought and potential evapotranspiration (PET) are other climatic drivers that affect dust events (Al-Khalidi et al. 2021; Moghanlo et al. 2021). In this work, the standardized precipitation evapotranspiration index (SPEI) and PET were respectively calculated using the equations proposed by Vicente-Serrano et al. (2010) and Thornthwaite (Thornthwaite and Mather 1957) in MATLAB software for all months from 1988 to 2018.
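As an illustration of the erosive winds frequency described above, the following minimal sketch counts, for each month, the hourly observations whose speed exceeds 6 m/s. The record layout (year, month, speed) and the sample values are hypothetical and are not the IRIMO data format.

```python
from collections import Counter

EROSIVE_THRESHOLD_MS = 6.0  # threshold stated in the text (> 6 m/s)


def erosive_wind_frequency(hourly_records, threshold=EROSIVE_THRESHOLD_MS):
    """Count, per (year, month), the hourly observations above the threshold.

    hourly_records: iterable of (year, month, wind_speed_m_per_s) tuples.
    This layout is an assumption made for the example.
    """
    counts = Counter()
    for year, month, speed in hourly_records:
        # A bool adds as 0 or 1, so months without erosive winds still get a key.
        counts[(year, month)] += speed > threshold
    return dict(counts)


# Synthetic sample: three hourly readings in each of two months.
sample = [
    (1988, 1, 3.2), (1988, 1, 6.0), (1988, 1, 7.5),
    (1988, 2, 8.1), (1988, 2, 9.4), (1988, 2, 2.0),
]
frequency = erosive_wind_frequency(sample)
print(frequency)  # {(1988, 1): 1, (1988, 2): 2}
```

Note that a reading of exactly 6.0 m/s is not counted, because the text specifies a strict inequality.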
Terrestrial data used in this work include the areas of the vegetation cover, water body, and dried bed of the wetland. These data were obtained from the multitemporal Landsat images for the study region (path: 165 and row: 39) from 1988 to 2018 via the Google Earth Engine (GEE) platform. To preprocess the Landsat images, radiometric and atmospheric corrections were performed based on the methods applied by Ebrahimi-Khusfi et al. (2020a). In the image processing stage, the support vector machine (SVM) algorithm was used to separate the classes of vegetation cover, water body, and dried bed of the wetland. The training samples used in this classification were selected based on Google Earth images and field observations. Lastly, the areas of the three classes (water body, vegetation cover, and dried bed) were calculated for all study months during 1988-2018 in GEE.
Soil moisture is another ground-based variable used in this study, which was obtained via GEE from data provided by the University of Idaho (Abatzoglou et al. 2018). Due to the large number of images prepared on a monthly scale for the study period, only the annual changes in the water body, vegetation cover, and dried bed of the Shadegan Wetland are shown in Fig. 3. Of note, the modeling process used the monthly data for these parameters along with the hydro-climatic data; their monthly changes over the study period are presented in Fig. 4.
The third group of data, the mean monthly values of surface water discharge to Shadegan Wetland, was also obtained from Iran Water and Power Resources Development Company (IWPCO).
Temporal variations of all variables used in this study from 1988 to 2018 are indicated in Fig. 4. It is worth noting that monthly data for each season were used to predict seasonal changes. In other words, for each season, data related to 93 months during the study period have been used.

Multi-collinearity test (MCT)
Selecting non-correlated attributes that affect the target attribute is one of the main steps before the modeling procedure and helps to reduce the prediction error (Adeboye et al. 2014). The tolerance coefficient (TC) is one of the most common techniques used to determine strongly correlated variables and to select non-correlated variables (Amare et al. 2021), and it is calculated using Eq. (1). A TC > 0.1 indicates a weak correlation between a variable and the other predictor variables. The variable reduction process is iterative: the TC is calculated, the variable with the lowest TC is removed, and the TC is then recalculated. This iterative process is terminated once all variables have TC > 0.1, which indicates that the remaining variables have a minimal linear correlation. For the current work, the MCT was performed in SPSS v.26 software.
$$TC = 1 - r_i^2 \tag{1}$$
where $r_i^2$ represents the R-square obtained from regressing the ith variable against the other predictor variables.
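The iterative tolerance screening described in this section can be sketched as follows. This is a minimal pure-Python illustration, not the SPSS procedure used in the study: the ordinary least squares solver, the variable names, and the synthetic values are all assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def tolerance(columns, target):
    """TC = 1 - R^2 from regressing `target` on all other columns (with intercept)."""
    y = columns[target]
    others = [v for k, v in columns.items() if k != target]
    design = [[1.0] + [col[i] for col in others] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_res / ss_tot  # identical to 1 - R^2


def screen(columns, threshold=0.1):
    """Drop the lowest-TC variable until every remaining TC exceeds the threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        tcs = {name: tolerance(kept, name) for name in kept}
        worst = min(tcs, key=tcs.get)
        if tcs[worst] > threshold:
            break
        del kept[worst]
    return sorted(kept)


# Synthetic data: wind_max is almost exactly 2 x wind_mean, rain is unrelated.
data = {
    "wind_mean": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "wind_max": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1],
    "rain": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 7.0, 2.0],
}
selected = screen(data)
print(selected)
```

In this example one of the two nearly collinear wind variables is removed in the first pass, after which both remaining tolerances are well above 0.1 and the loop stops. Exactly collinear inputs would make the normal equations singular, so a production version would need a more robust solver.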

Boruta algorithm
The Boruta algorithm is a random forest (RF)-based algorithm and a good technique for selecting features. In this algorithm, the status of each study feature is determined as rejected, confirmed, or tentatively important (Hassanien et al. 2012). This algorithm is a useful RF-based tool for extracting all relevant attributes before the predictive modeling process (Kursa and Rudnicki 2010). It has been frequently used in previous studies (Amiri et al. 2019; Keskin et al. 2019; Prasad et al. 2019; Shaheen and Iqbal 2018) and includes the following steps:
i. Add a copy of all attributes to expand the information system with at least five shadow attributes.
ii. Shuffle the added attributes to remove their correlation with the response variable.
iii. Run an RF classification on the expanded information system and collect the calculated Z-scores.
iv. Find the maximum Z-score among the shadow attributes and give a hit to any attribute that scores better than it.
v. Carry out a two-sided test of equality for the attributes whose importance has not been determined in the previous step.
vi. Eliminate the variables with very low importance as well as the shadow attributes.
vii. Repeat this process until the importance of all the attributes is determined, or the algorithm reaches the predetermined limit of RF runs.
Generally, by adding randomness to the system and collecting the outcomes from the ensemble of randomized samples, the misleading effect of random fluctuations and correlations is reduced (Pourghasemi et al. 2020). As a result, we get a clearer insight into the features that are really important and influential for the target variable. In the present work, the R package Boruta was utilized for performing this algorithm.
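The shadow-attribute idea behind the steps above can be illustrated with a deliberately simplified sketch. It is not the Boruta package: the absolute Pearson correlation is swapped in for the random forest Z-score, the statistical test on the hit counts is omitted, and the feature names and data are synthetic assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shadow_hits(features, target, n_iter=20, seed=0):
    """Count how often each feature beats the best shuffled (shadow) copy.

    Importance here is |Pearson r| with the target, a stand-in for the
    random forest importance used by the real algorithm.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        shadow_scores = []
        for values in features.values():
            shadow = values[:]
            rng.shuffle(shadow)  # shuffling breaks any link with the target
            shadow_scores.append(abs(pearson(shadow, target)))
        best_shadow = max(shadow_scores)
        for name, values in features.items():
            if abs(pearson(values, target)) > best_shadow:
                hits[name] += 1
    return hits


# Synthetic example: "signal" tracks the target, "noise" is an alternating
# pattern that is almost uncorrelated with it.
rng = random.Random(42)
target = [float(i) for i in range(92)]
features = {
    "signal": [t + rng.gauss(0, 1) for t in target],
    "noise": [(-1.0) ** i for i in range(92)],
}
hits = shadow_hits(features, target)
print(hits)
```

A feature that collects hits in nearly every run would be a candidate for the confirmed status, and one that rarely beats the shadows would be a candidate for rejection; the real algorithm makes that decision with a formal test.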

Stochastic gradient boosting machine (GBM)
The GBM model builds a set of decision trees sequentially, and its main goal is to reduce the difference between the measured and forecasted values remaining after all its previous steps (Natekin and Knoll 2013). Using this algorithm, decision trees are successively trained on the residuals of all their previous trees to make the best predictions. Generally, the GBM algorithm is trained by Eq. (2) (Friedman 2001; Kong et al. 2020). Here, $G_n(z)$ refers to the output model as n reaches N, λ is computed in the procedure of reducing the loss function, and $G_{n-1}(z)$ denotes the current state of the GBM model, which has n − 1 decision trees. Hyperparameters of the GBM model include the number of boosting iterations (n.trees), the maximum tree depth (interaction.depth), shrinkage, and minimum terminal node size (n.minobsinnode). In this study, they were tuned by fivefold cross-validation via the R packages caret and gbm.
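The sequential fitting on residuals can be sketched in a few lines. The example below assumes the common additive update, in which the new model equals the previous model plus a step size times a newly fitted tree, with a squared-error loss, one-split trees (stumps), and a fixed shrinkage in place of a computed λ. These choices, the function names, and the data are illustrative assumptions and not the configuration tuned in the study.

```python
def fit_stump(x, residuals):
    """Least-squares regression stump (a single split) on one predictor."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_trees=50, shrinkage=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)          # initial model: the mean of the target
    prediction = [base] * len(y)
    stumps, history = [], []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        prediction = [pi + shrinkage * stump(xi) for xi, pi in zip(x, prediction)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, prediction)) / len(y))

    def model(xi):
        return base + shrinkage * sum(s(xi) for s in stumps)

    return model, history


# Synthetic data: a dust count that rises with wind speed.
wind = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dust = [0.0, 0.0, 1.0, 1.0, 3.0, 5.0, 8.0, 9.0]
model, history = boost(wind, dust)
print(round(history[0], 3), round(history[-1], 3))
```

The training error in `history` falls as trees are added, which is why the number of trees, the tree depth, and the shrinkage need to be tuned against held-out data rather than the training set.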

Extreme gradient boosting (XGB) model
The extreme gradient boosting (XGB) algorithm was first introduced by Chen and Guestrin (2016). This algorithm uses a gradient boosting structure that has the benefits of regularization and parallel tree boosting. This can help increase modeling performance by overcoming the overfitting problem and decreasing the computation time through parallel processing (Bansal and Kaur 2018). This algorithm can also work well for data sets that have missing values (Chen et al. 2019). In this study, three boosters (linear, tree, and Dart) were used to predict the MDEF in our study area. The linear booster explores linear relationships between the influencing agents, whereas the other two boosters explore more complex relationships between them (Naghibi et al. 2020). For the XGB model, the main parameters controlling the tuning process (hyper-parameters) include the learning rate, column sample rate, number of trees, and maximum tree depth. Further details on the hyperparameters are provided in Table 1.
In total, the XGBoost model is expressed by Eq. (3) (Ebrahimi-Khusfi et al. 2020b), in which K denotes the number of decision trees. Equation (4) is used to evaluate the XGBoost model for reducing the overfitting problem. In Eq. (4), m is the loss function, k is the number of data, and r refers to the regularization factor, which is defined by Eq. (5), where vs is the vector of scores, L indicates the number of leaves in a decision tree, and $p_1$ and $p_2$ refer to the penalty factors.
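For orientation, the general XGBoost formulation of Chen and Guestrin (2016) can be written with the symbols defined above as shown below. The index i, the tree functions $f_k$, and the assignment of $p_1$ to the leaf-count penalty and $p_2$ to the score penalty are assumptions; the notation of Eqs. (3) to (5) in the original article may differ.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)
\qquad
\mathrm{Obj} = \sum_{i} m\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} r\left(f_k\right)
\qquad
r(f) = p_1 L + \tfrac{1}{2}\, p_2 \,\lVert vs \rVert^{2}
```

The first expression sums the outputs of the K trees, the second adds the regularization term to the training loss, and the third penalizes both the number of leaves and the magnitude of the leaf scores.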
The R packages caret and xgboost were applied to tune and execute the XGB-based models with different boosters. About 80% of the data set was used for training and for hyper-parameter tuning of the models by fivefold cross-validation, and the remaining 20% (holdout data) was used for the test step.
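The data partitioning described above can be sketched as follows for one season with 93 monthly records. This is an illustrative stand-in for what caret does internally: the random shuffling, the seed, and the way the folds are assigned are assumptions, since the text does not state how the split was drawn.

```python
import random


def holdout_and_folds(n_samples, test_fraction=0.2, n_folds=5, seed=1):
    """Split sample indices into a holdout set and cross-validation folds.

    Returns (training, holdout, folds), where the folds partition `training`.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    holdout, training = indices[:n_test], indices[n_test:]
    # Deal the training indices round-robin into n_folds nearly equal folds.
    folds = [training[i::n_folds] for i in range(n_folds)]
    return training, holdout, folds


training, holdout, folds = holdout_and_folds(93)
print(len(training), len(holdout), [len(f) for f in folds])  # 74 19 [15, 15, 15, 15, 14]

# Each fold serves once as the validation set while the other four are used for fitting.
for k, validation in enumerate(folds):
    fitting = [i for j, fold in enumerate(folds) if j != k for i in fold]
    assert len(fitting) + len(validation) == len(training)
```

The holdout indices are never touched during tuning, which is what allows the test-step metrics to indicate generalization.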
It is worth noting that in all study models, the input and output parameters are the explanatory variables (including the meteorological, terrestrial, and hydrological variables) and target variable (MDEF), respectively.
To predict the MDEF in different seasons, monthly data related to all predictor variables in different seasons of the whole study period  were used (93 months for each season). For example, records corresponding to each

Performance evaluation of predictive models
In this stage of the present study, the performance of the predictive models was assessed through a comparison between the observed and predicted values of DEF in different seasons using the metrics provided in the Taylor diagram (Eqs. 6 and 7) (Shabani et al. 2021). Here, r and RMSE are the correlation coefficient and the root mean square error in a study time series, respectively. ADEF and PDEF indicate the actual and predicted values of DEF for the ith observation, respectively. Also, $\overline{ADEF}$ refers to the mean value of the actual DEF data.
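The two metrics can be computed as in the following minimal sketch, which uses the standard definitions of the Pearson correlation coefficient and the root mean square error. The exact form of Eqs. (6) and (7) in the original article is not reproduced here, and the sample values are synthetic.

```python
from math import sqrt


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Synthetic example: every prediction is one event too high.
adef = [2.0, 4.0, 6.0, 8.0]   # actual dust events frequency
pdef = [3.0, 5.0, 7.0, 9.0]   # predicted dust events frequency
print(pearson_r(adef, pdef), rmse(adef, pdef))  # 1.0 1.0
```

The example shows why both metrics are reported together: a constant bias leaves r at 1 while the RMSE still registers the error.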

Interpretability and importance of controlling factors using game theory
Improving the interpretability of machine learning models is one of the most important topics in studies associated with modeling (Kaur et al. 2020; Rice et al. 2020). Game theory, one of the most important theories for investigating the strategic relationships between decision-makers, has attracted the attention of some researchers as a way to enhance the interpretability of data mining models (Kaur et al. 2020; Wu et al. 2017). SHAP, derived from the Shapley value of the cooperative game concept, measures the effect of attributes by considering their interaction with other attributes. SHAP values (SHV) are used to determine the share of the influencing agents in the target variable (Lundberg and Lee 2017) and are calculated by Eq. (8). In Eq. (8), $\phi_k$ refers to the Shapley value of the kth predictor, n denotes the total number of predictors, and f(g) is the output of the ML model explained by a set g of attributes. In the present study, SHAP values were computed using the R package SHAPforxgboost.
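To make the Shapley value concrete, the sketch below evaluates the classical definition exactly for a toy model with three predictors: each predictor receives the weighted average of its marginal contribution f(g ∪ {k}) − f(g) over all subsets g of the other predictors, with weight |g|!(n − |g| − 1)!/n!. The value function, the predictor names, and the numbers are hypothetical; packages such as SHAPforxgboost rely on efficient tree-specific algorithms instead of this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, players):
    """Exact Shapley values by enumerating every subset of the other players."""
    n = len(players)
    phi = {}
    for k in players:
        others = [p for p in players if p != k]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                g = frozenset(subset)
                total += weight * (value(g | {k}) - value(g))
        phi[k] = total
    return phi


def toy_model(g):
    """Hypothetical model output when only the predictors in g are known."""
    out = 0.0
    if "wind" in g:
        out += 4.0                      # wind raises the predicted dust frequency
    if "rain" in g:
        out -= 2.0                      # rain lowers it
    if "wind" in g and "dry_bed" in g:
        out += 1.0                      # interaction: wind over a dried bed
    return out


predictors = ["wind", "rain", "dry_bed"]
phi = shapley_values(toy_model, predictors)
print({name: round(v, 6) for name, v in phi.items()})
```

The interaction term is split equally between the two predictors involved, and the values add up to the difference between the full-model output and the empty-model output, which is the fairness property referred to in the Introduction.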

Effective factors selected by multi-collinearity test and Boruta algorithm
The outcomes obtained from the MC test to select the non-correlated variables for modeling MDEF in the study area are presented in Table 2. According to the MC test, the In addition to these factors, autumn rainfall (Ra), winter Vap, and summer Vap had a TC < 0.1, highlighting a problematic correlation between these parameters and the other predictors in the studied seasons. Therefore, these parameters were excluded, and the non-correlated variables with TC > 0.1 were selected at this stage of the present study. They include DbA, WA, Wsmean, Wsmax, WS6, Ra (except for autumn months), SPEI, SM, Q, VegA, and Vap (except for winter and summer months). These variables were introduced to the Boruta algorithm for choosing the relevant predictive variables. The results are summarized in Fig. 5. According to the relevance score (Z-score) obtained from the Boruta algorithm (Fig. 5), Wsmean (12.5), Ra (8.2), Wsmax (6.2), and Q (3.8) were detected as the most influential variables for the occurrence of dust events during winter months in the study area. The highest scores for the occurrence of dust events during spring months were assigned to WS6 (7.5), Wsmean (7.1), Q (6.8), and SM (4.9). Furthermore, the highest Z-scores for the occurrence of dust events around the Shadegan Wetland in the summer months were achieved for WS6 (12.1), Wsmean (9.8), WA (5.2), and Wsmax (4.8). Six variables, WS6 (8.2), Ra (7.8), Wsmean (4.2), WA (4.3), Wsmax (3.8), and Vap (2.8), were confirmed as the relevant drivers affecting dust events during autumn months in the study area. The Boruta algorithm also detected some variables as tentative. WA, VegA, DbA, and SPEI for dust events in winter months and WA for dust events in spring months had a tentative status (Z-score: 2.1-2.5). A similar status was observed for SM, VegA, and DbA in summer and for SM and SPEI in the autumn months. These tentative features were also used for modeling MDEF.
However, other agents were rejected and have not been used for modeling MDEF around the Shadegan Wetland.

Hyper-parameters tuning
The optimal values of tuned parameters for different seasons are summarized in Tables 3 and 4 and Figs. S1-S8 (Supplementary information). The best-tuned parameters of

Models' performance for predicting MDEF
The performance of GBM and XGB with different boosters for predicting MDEF in different seasons using the Taylor diagram is shown in Fig. 6. The r and RMSE are important performance evaluation indicators included in this diagram.
In modeling MDEF in the summer season, the GBM model resulted in r = 0.78 and RMSE = 6.7. Furthermore, values of 0.99 and 0.0006, 0.88 and 5.2, and 0.74 and 7.4 were obtained based on the XGB-L, XGB-T, and XGB-D models for two accuracy metrics of r and RMSE, respectively (Fig. 7). In the MDEF prediction of autumn, the r values of 0.75, 0.99, 0.86, and 0.88 were, respectively, estimated using the GBM, XGB-L, XGB-T, and XGB-D models. However, the RMSE was estimated at 3.4, 0.02, 2.7, and 2.6, respectively, according to the above-mentioned models (Fig. 7). Afterward, the trained models were employed to predict the MDEF using the holdout data sets. According to the obtained results, it was found that for estimating MDEF in summer months, the XGB-L and XGB-T resulted in r = 0.86 and 0.62 and RMSE = 9 and 11.8, respectively.  Also, values of 0.63 and 0.72 as well as 12.5 and 11.7 were obtained after performing the XGB-D and GBM models on the holdout data sets of summer months (Fig. 8c). For predicting MDEF in spring season, the (r; RMSE) had, respectively, values of (0.76; 2.6), (0.67; 5.2), (0.78; 2.6), and (0.82; 2.1) based on the GBM, XGB-L, XGB-T, and XGB-D models (Fig. 8b).

Discussion
In this study, the best combination of variables affecting MDEF in different seasons was selected from climatic, terrestrial, and hydrological variables using the MC test and the Boruta algorithm. These techniques provide quantitative results and have the important advantage of permitting the comparison of research in various parts of the world (Ebrahimi-Khusfi et al. 2020c; Pourghasemi et al. 2020). Based on these techniques, the higher relevance and influence of some climatic parameters, such as wind speed, erosive winds frequency, and rainfall, as well as some ground-based parameters, such as the dried bed of the wetland, were confirmed for predicting MDEF in all seasons (Fig. 3).
The influence of monthly changes in the surface water discharge to Shadegan Wetland on the occurrence of dust events was confirmed in the winter and spring seasons. In the study conducted by Ebrahimi-Khusfi et al. (2020b), the most important parameters used to predict the seasonal dust storm index in arid regions of Iran were wind speed, air temperature, rainfall, evapotranspiration, and vegetation cover. Of note, these factors were only chosen by the MC test, while the factors selected in this work were selected after performing two techniques, the MC test and the Boruta algorithm. Ebrahimi-Khusfi et al. (2020c) stated that in addition to these factors, the meteorological drought index was also important to predict temporal changes in dust concentrations over semi-arid regions of Iran.
On the spatial scale, some important confirmed factors for predicting dust-prone areas were soil texture, soil bulk density, DEM, vegetation, precipitation, organic matter, land use, and wind speed (Gholami et al. 2020). These findings indicate the importance of some variables such as rainfall and wind speed on dust events at both temporal and spatial scales.
According to the results of the performance evaluation on the testing data sets (Fig. 8), the XGB-linear model increased the prediction accuracy of the MDEF only in summer, by at least about 15% compared to the other study models (Fig. 8c). In the winter months, the lowest performance was observed for the XGB-tree model, while higher and almost similar performance was observed for the other three models, especially the XGB-Dart (Fig. 8a). On the contrary, the highest performance of MDEF prediction was observed for the XGB-tree model in the spring months. The prediction performance of monthly changes in the target variable across the autumn was improved using the XGB-Dart by 4%, 7%, and 16% compared to the XGB-tree, GBM, and XGB-linear models, respectively (Fig. 8d).
In total, these results indicate that the linear booster in the XGB model causes weak generalization capability in the winter, spring, and autumn months. Although the XGB model has performed poorly in predicting dust storms in arid regions of Iran (Ebrahimi-Khusfi et al. 2020b), its high efficiency has been proven in predicting PM2.5 and other natural hazards, including gully erosion (Chen et al. 2021) and landslides (Sahin 2020). Differences between the performance of the XGB model in this work and in past dust studies may be due to differences in the criteria considered for measuring dust occurrences and in the studied years. By comparing the outcomes of this study with the findings of Ebrahimi-Khusfi et al. (2020b), it may also be concluded that the XGB model has a greater ability to predict the frequency of dust events, while it has less ability to predict the dust storm index in desert areas.
According to SHAP values, variations in MDEF were mostly affected by changes in rainfall and the dried surface of the wetland, followed by changes in wind speed and surface water discharge to Shadegan Wetland during the winter months of the study period (1988 to 2018). The higher contribution of inflow discharge, wind speed, and soil moisture parameters was observed for controlling the MDEF variations in the spring season.
Meteorological drought is another important climatic parameter, the impact of which is undeniable, especially in the winter and autumn months. Vegetation cover also plays an important role in intensifying or weakening wind erosion events; in our study area, its effect was observed mostly on summer events. In addition, vapor pressure was identified as the sixth most influential factor in autumn dust events frequency around the study area.
Generally, the results indicate that the priority of the climatic, terrestrial, and hydrological factors influencing the frequency of dust events differs between seasons. According to Baltaci (2021), erosive wind speed (> 6.5 m/s) is one of the main parameters for the occurrence of dust storms. However, Ebrahimi-Khusfi et al. (2020b) reported that dust events in different seasons are influenced by different factors. For example, wind speed, vegetation cover, and maximum wind speed have been, respectively, identified as the most important agents for the activity of dust events in winter, spring, summer, and autumn over arid regions of Iran. It has been reported that wind speed and vegetation are two driving forces that have a great impact on Middle East dust events.
As mentioned earlier, changes in the dried bed of Shadegan Wetland have also affected dust events around the wetland. Other researchers have likewise concluded that the drying up of Iranian wetlands, especially Hamoun, Meighan, and Jazmourian, had a great impact on intensifying the spread of dust in these areas (Arjmand et al. 2018; Ebrahimi-Khusfi et al. 2020a; Gholami et al. 2020). In addition, the reduction in inflow discharge to the Gavkhooni wetland in central Iran has led to the drying up of the wetland and to a degradation in air quality due to increased dust particle emissions (Khusfi et al. 2017), as we observed for our study area.
Meteorological drought occurs due to a long-term decrease in rainfall in an area, which in turn can make the soil more prone to wind erosion (Teng et al. 2021). The impact of this phenomenon on wind erosion events in different parts of the world has been demonstrated (Arcusa et al. 2020; Kandakji et al. 2021; Knapp et al. 2020), which supports the outcomes of the present research.
Soil moisture was recognized as the third, sixth, and seventh most important dust-controlling factor in the summer, autumn, and winter months, respectively. According to Han et al. (2021), this important terrestrial agent can shift the activity rate of dust events by affecting the soil erosion threshold. Some researchers have also concluded that reduced soil moisture content had remarkable impacts on dust emissions across arid and semi-arid areas (Meng et al. 2018; Wang et al. 2014).
In general, the results of this study showed the combined effect of some hydro-climatic factors, including precipitation, wind speed, erosive winds velocity, and discharge, and of some terrestrial drivers, especially the dried bed of the wetland and soil moisture, on the occurrence of dust events around the Shadegan Wetland. In other words, these controlling factors contributed greatly to predicting the MDEF around the destroyed wetland of Shadegan in southwestern Iran.
Although gradient boosting-based approaches and game theory are promising tools for predicting MDEF and determining their controlling factors around the destroyed Shadegan Wetland, future studies should evaluate their effectiveness for wetlands with different environmental conditions to verify their generalizability to other destroyed wetlands. Therefore, it is suggested that such topics be addressed in future research. The results of such research can help managers and decision-makers to better manage these valuable ecosystems and reduce dust pollution hazards in affected areas.

Fig. 7 Performance of study models for predicting monthly dust events frequency in a winter, b spring, c summer, and d autumn around the Shadegan wetland according to the training data sets

Conclusion
Predicting the MDEF and determining the contribution of their controlling factors are appropriate measures to combat the dangers and threats posed by dust events around degraded wetlands. In the current study, gradient boosting-based approaches were used to predict the MDEF around the Shadegan Wetland in southwestern Iran. The Taylor diagram confirmed the higher performance of the XGB-Dart model for predicting the variability of MDEF in cold months and of the XGB-linear and XGB-tree models for predicting MDEF in warm months. Using the SHAP values, it was found that precipitation, followed by wind speed, inflow discharge, dried bed area, vegetation, and drought, had the highest contribution in predicting MDEF in the winter months. However, water discharge, wind speed, soil moisture, wetted area, and frequency of erosive winds were the major causes of MDEF variations around the Shadegan Wetland in the spring months during the study period. Surface wind velocity and the frequency of strong winds contributed greatly to predicting MDEF in the summer and autumn seasons. In contrast, monthly changes in the water area of the wetland in summer and meteorological drought in autumn had the smallest share in forecasting MDEF around the studied wetland. Researchers and stakeholders can use these results to mitigate the undesirable impacts of dust storm activities in areas around degraded wetlands, which can lead to a reduction in desertification and help move toward sustainable development in this ecosystem, which is susceptible to dust events.

Fig. 8 Performance of study models for predicting monthly dust events frequency in a winter, b spring, c summer, and d autumn around the Shadegan wetland according to the testing data sets