Development of novel hybrid machine learning models for monthly thunderstorm frequency prediction over Bangladesh

Thunderstorm frequency (TSF) prediction with higher accuracy is of great significance under climate extremes for reducing potential damages. However, TSF prediction has received little attention because a thunderstorm event is a combination of intricate and unique weather scenarios with high instability, making it difficult to predict. To close this gap, we proposed two novel hybrid machine learning models through hybridization of data pre-processing ensemble empirical mode decomposition (EEMD) with two state-of-arts models, namely artificial neural network (ANN), support vector machine for TSF prediction at three categories over Bangladesh. We have demarcated the yearly TSF datasets into three categories for the period 1981–2016 recorded at 28 sites; high (March–June), moderate (July–October), and low (November–February) TSF months. The performance of the proposed EEMD-ANN and EEMD-SVM hybrid models was compared with classical ANN, SVM, and autoregressive integrated moving average. EEMD-ANN and EEMD-SVM hybrid models showed 8.02–22.48% higher performance precision in terms of root mean square error compared to other models at high-, moderate-, and low-frequency categories. Eleven out of 21 input parameters were selected based on the random forest variable importance analysis. The sensitivity analysis results showed that each input parameter was positively contributed to building the best model of each category, and thunderstorm days are the most contributing parameters influencing TSF prediction. The proposed hybrid models outperformed the conventional models where EEMD-ANN is the most skillful for high TSF prediction, and EEMD-SVM is for moderate and low TSF prediction. The findings indicate the potential of hybridization of EEMD with the conventional models for improving prediction precision. The hybrid models developed in this work can be adopted for TSF prediction in Bangladesh as well as different parts of the world.


Introduction
Thunderstorms are spectacular mesoscale phenomena that affect the environment and pose a severe threat to life, economy, agriculture, and infrastructures. A thunderstorm event results from a turbulent convective activity, which may bring about heavy rainfall, lightning, hail, tornadoes, and thunder . Thunderstorms occur in almost every region of the world because of meteorological instability and strong moisture convergence, which causes serious convections. It usually exists for less than an hour and typically has varying sizes ranging from a few kilometers to a few hundred kilometers (Saha and Quadir 2016). It is now a well-acknowledged fact that the climate system is getting warmer, which has implications for thunderstorm occurrences (Allen et al. 2014;Trenberth et al. 2007). Severe thunderstorms frequency is likely to increase in the twenty-first century due to the increasing convective instability (Rädler et al. 2019). Therefore, it is essential to predict the number of thunderstorm events that occur in a particular period under changing meteorological conditions in a given location. Predicting the number of thunderstorm phenomena could provide insights about future thunderstorm incidents under the climate change scenario.
Thunderstorm frequency (TSF) can be defined as the number of thunderstorm occurrences in a given location over a day, month, season, or annum. It is estimated that daily TSF is nearly 45,000 and annually 16 million worldwide (Siddiqui and Rashid 2008). Many parts of South Asia experience higher TSF during the summer months (March-May) when high temperatures that prevail at lower levels create a volatile atmosphere. Each year Bangladesh and its surroundings witness high TSF, especially during the pre-monsoon and early months of the monsoon season; however, thunderstorms occur in all seasons. Spatially, TSF is highest in the northeastern part and less in the southeastern and northwestern parts of Bangladesh (Mannan et al. 2016). Before 1981, the country endured thunderstorm strikes in about 9 days in May, which later rose to 12 days. Besides, thunderstorms-associated disasters cause severe damage to agricultural yields and infrastructures and lives on the ground and in aviation. Due to the exorbitant impact of thunderstorms on human life and the economy, the Government of Bangladesh declared it a natural disaster on May 17, 2016 (Wahiduzzaman et al. 2020). In contrast, thunderstorms bring crucial rainfall during the dry season, which benefits the country's crop production and cleans the air from dust, haze, and pollutants. A TSF prediction model can help prepare and design a more useful crop calendar adaptive to thunderstorm events. Besides, a TSF prediction model is essential for policymakers to adopt a mitigation plan for reducing the potential damages of thunderstorm casualties.
Thunderstorm prediction is a challenging task due to its small spatiotemporal extension, and the event is a combination of very complex and unique weather scenarios, which are highly unstable. Despite the challenges, many researchers have attempted to predict thunderstorms worldwide, e.g., Jacovides and Yonetani (1990) in Cyprus; Mills and Colquhoun (1998) in Australia; Haklander and Delden (2003) in Netherland; Manzato (2007) in Italy; Zhen-hui et al. (2013) in China; Ali et al. (2011) in Malaysia; Litta et al. (2013) and Meher et al. (2019) in India; Collins and Tissot (2015) in the USA; Dowdy (2016) in the temperate and tropical regions; Osuri et al. (2017) in Indian monsoon region; Rädler et al. (2019) in Europe; Chen et al. (2020) in Taiwan; Kulikov et al. (2020) in Russia; Bouttier and Marchal (2020) in Western Europe; Islam et al. (2020) in Bangladesh. A variety of approaches have been taken in those studies. For example, Collins and Tissot (2015) used and compared an ANN and MLR model for thunderstorms prediction within 400 km 2 of South Texas; Rädler et al. (2019) used an ensemble of 14 regional climate models such as AR-CHaMo models, EURO-CORDEX model to assess the changes in the frequency of thunderstorm. Most of the studies have focused on numerical weather prediction (NWP) modeling or forecasting of a single thunderstorm event on an hourly basis based on the convective indices. However, studies focused on predicting monthly TSF based on the convective indices and other thunderstorm-related parameters are still scarce in the literature ). In the present study, we have employed machine learning models including artificial neural network (ANN), support vector machine (SVM), incorporated with ensemble empirical mode decomposition (EEMD), and auto-regressive integrated moving average (ARIMA) modeling to predict the monthly TSF over Bangladesh.
Among the machine learning models, ANN is a powerful model that can identify complex inherent nonlinear relationships between responses and predictors. Therefore, ANN models have drawn attention in the thunderstorm forecasting community (Manzato. 2007;Collins and Tissot 2015;Litta et al. 2013). SVM is also a useful prediction technique that was used before in thunderstorm prediction (Qiu et al. 2010;Zhen-hui et al. 2013). The time series model like ARIMA is widely used because it can characterize nonlinear data; this model was also applied previously in thunderstorm prediction ). However, these models are not always efficient enough to predict a target dataset accurately. Due to this reason, many researchers have developed techniques that adjoin several types of methods to obtain more accuracy in their prediction (Chen and Letchford. 2007;Gao and Stensrud. 2014;Solari et al. 2017;Suparta and Putro. 2018;Bouttier and Marchal 2020;Salam and Islam 2020;Kamangir et al. 2020;Taludar et al. 2021;Islam et al. 2021;Mallick et al. 2021). The hybrid EEMD integrated machine learning models have successfully applied in different fields of studies, e.g., runoff (Tan et al. 2018); streamflow forecasting (Zhang et al. 2015); rainfall forecasting (Johny et al. 2020), wind speed forecasting (Yu 2020); groundwater level (Gong et al. 2018). However, TSF prediction has received little attention in the existing literature due to its complicated nature and unique weather feature with high instability, making it difficult to predict. Our work fills this research gap in the literature. Therefore, hybrid EEMD-ANN and EEMD-SVM models, the combination of an ensemble empirical mode decomposition (EEMD) with an ANN and SVM model, are proposed as effective methods to predict monthly TSF. In this study, widely used convective indices and thunderstorm-related variables were used as input parameters. The EEMD-ANN and EEMD-SVM prediction results were compared with three conventional prediction methods, e.g., ANN, SVM, and ARIMA, based on five performance evaluation metrics, i.e., coefficient of determination (R 2 ), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), index of agreement (IA) along with the Taylor diagram. Even though machine learning models can solve prediction problems with reasonable accuracy, their predictive capability relies significantly on the input data quality. In such a case, sensitivity analysis can help identify which input parameter is remarkable in building an ideal model. So, sensitivity analysis is employed in this study to improve the performance of the models.
This study's primary objective is to predict and evaluate the performance of the monthly TSF based on the convective and thunderstorm-related indices. Compared to other earlier studies, our work has two novel aspects. First, we develop hybrid machine learning models to predict monthly TSF at three frequency metrics over Bangladesh for the first time in the literature. Second, this study identifies the most contributing parameters influencing TSF prediction and selects optimal input parameters using the random forest model. It is hoped that the novel hybrid model proposed in this work would address the challenge of the complicated nature of thunderstorm events due to its high instability and randomness.

3 2 Study area and data sources
The site selected for this research is Bangladesh, a part of Southeast Asia, geographically located between 20° 34′-26° 38′ North latitude and 88° 01′-92° 42′ East longitude. Bangladesh is the biggest deltaic country in the world, occupying 147,570 sq. km area (Ghose et al. 2021). The three vigorous rivers, Padma, Jamuna, and Meghna, and their tributaries encompass 80% of Bangladesh's floodplains, leaving out the hilly parts. The geographical features of this narrow flat lowland are very well suited for convection, as the moisture conveyed by the monsoon winds from the highly elevated regions and the Bay of Bengal causes the development of convection (Ahmed et al. 2017). Here, the monsoon is probably the controlling feature of climatic variability , portrayed by pelterbearing breezes, humbly warm temperatures, and high moisture in the air. As an outcome, thunderstorms, floods, and tidal floods are regular incidents in this country. There are three prominent seasons observed in Bangladesh, which are pre-monsoon, monsoon, and post-monsoon.
In this study, monthly TSF and TSD data were collected from 28 stations ( Fig. 1) of the Bangladesh Meteorological Department (BMD) ranging from 1981 to 2016. There are more meteorological stations in Bangladesh, but those stations do not have longterm records of thunderstorms, and few stations have excessive missing data. Therefore, we have excluded those stations. Although some of the selected stations have few missing data, we have filled them by obtaining the nearest station's value. Table S1 contains the missing data information. Thunderstorm frequencies are recorded eight times per day in each station of BMD with a 3-h interval according to the World Meteorological Organization (WMO) standard hour. The number of TSF observations recorded per day is regarded as the daily TSF. From the daily observations, monthly TSF is computed for each station. Description of the 28 meteorological stations of BMD and an overview of the annual TSF data are given in Table 1. The number of days with thunderstorm observations in a month is regarded as monthly TSD. Daily precipitation data were also Fig. 1 Geographical location of the study area and red delta signs represent the selected meteorological stations of BMD 1 3 collected from the same 28 stations of the BMD from 1981 to 2016. We then converted the daily precipitation into monthly average data and used them as predictors for the model building.
Single point data of dew/frost point at 2 m (DP), relative humidity at 2 m (RH), wind speed range at 50 m (WS50), and Earth skin temperature (ST) data were used as a predictor in building TSF prediction model. These parameters were obtained for the specific latitude and longitude of the selected 28 stations from the NASA Langley Research Center Atmospheric Science Data Center Surface Meteorological and Solar Energy (SSE) web portal supported by the NASA LaRC POWER Project (https:// power. larc. nasa. gov/ data-accessviewer/), which has a 0.5° × 0.5° gridded global dataset. Moreover, the monthly averaged CAPE, convective rain rate (CRR), convective precipitation (CPRCP), K index (KI), and total totals (TT) were obtained from Climate Data Store (CDS) of Copernicus Climate Change Service (https:// cds-dev. coper nicus-clima te. eu/ cdsapp# !/ datas et/ reana lysis-era5single-levels-month ly-means? tab= overv iew). We have calculated the average of 28 stations for all the data and then used them as country average because this approach helps the 1 3 future assumption of thunderstorm frequency within a large area. Table 2 corresponds to the descriptions of the datasets used in this study. For convenience in predicting monthly TSF accurately, we have classified all the data in three categories of time-series, e.g., HTSF (high thunderstorm frequency; containing high-frequency months of March, April, May, June), MTSF (moderate thunderstorm frequency; containing moderate-frequency months of July, August, September, October), and LTSF months (low thunderstorm frequency; having low-frequency months of November, December, January, February). This classification helps differentiate the months with high, moderate, and low TSF, and thus, it reduces abrupt fluctuations in the time-series, which increases the prediction accuracy.

Parameter selection
The input parameters were selected based on a random forest (RF) relative importance technique performed in Salford Predictive Modeler 8.2. The RF algorithm can handle complex relationships and datasets with missing cases (Breiman 2001). It is a popular and highly flexible supervised artificial intelligence applied to measure the importance of various contributing factors and is extensively used as a feature-selection tool ). However, much time for training, computational power, and resources are required for this model to build numerous decision trees to combine their outputs. The RF method details can be found in the literature (Rahman and Islam 2019). We have initially considered 21 input parameters and tested different combinations among them, but some of the parameters were not suitable enough to efficiently predict TSF (Fig. S1). We have excluded 10 of the initially considered parameters affecting the model performances and finalize 11 input parameters (Table 2) for model building. The excluded parameters were lifted index (LI), maximum temperature (MaxT), precipitable water (PRW), diurnal temperature (DT), specific humidity (SH), wind speed range at 10 m (WS10), minimum temperature (MinT), V component of wind (VCW), U component of wind (UCW), and surface pressure (SP). CAPE, CPRCP, CRR, DP, KI, PRCP,

Artificial neural network (ANN)
Artificial neural network (ANN) is one of the most employed techniques for modeling accurate predictions to solve complex and nonlinear problems (Phuong et al. 2017;Alizadeh et al. 2018;Pham et al. 2019). ANNs are data processing systems that exploit learning algorithms to imitate knowledge and save this knowledge in weighted connections, similar to a human brain (Pradhan and Lee 2010;Boateng et al. 2019). An ANN has numerous processing components called neurons (Boateng et al. 2019). The data are processed by the neurons and then feed-forwarded to the subsequent layer. Corresponding links between layers connect these neurons. On each connecting link, there is a numeric weight. An ANN structure consists of three main layers, e.g., input layers, hidden layers, and output layers. The input layer contains the variables used for model construction; the hidden layers analyze the interconnection between the input and the output parameters based on algorithms, and the output layer represents the predicted variables. Unlike statistical models, ANNs can automatically synthesize their weights to elevate their attitude. (Boussabaine 1996). ANN is like a "black box" which lacks self-explanation. As a result, both ANNs and statistical approaches can be ensembled into a robust and potent methodological platform despite their differences (Karlaftis and Vlahogianni 2011). It can work with incomplete information, and also, a fault of one or multiple cells does not prevent generating output. However, the problem with ANN is that it does not hint about how it delivers when it produces a solution, and there is no specific rule for an efficient ANN structure. At first, an ANN model has to be trained with an acquainted dataset called the "training" dataset. The model will then "learn" by synthesizing the neurons' numerical weights regarding the errors between the predicted output values and the target output values through the training process. After the training period, the neural network delivers a model that can predict a target value from a specificity input value. This research has used a backpropagation-based neural network regression approach to predict thunderstorm frequency. Here, we have used 11 variables as input parameters and two layers in the hidden unit-the 1st hidden layer composed of 4 neurons and the 2nd layer consisting of 2 neurons (Fig. 2). We have set the learning rate to 0.1 while using the sigmoid function as the activation function. A neural network model can be expressed in mathematical form as Eq.
where x j (p) = input variable in discrete-time t, y(x) = predicted thunderstorm frequency, n = hidden neuron by trial, w j (p) = weight that connects the ith neuron in the input layer, c = neuronal bias, K(.) = hidden transfer function.

Support vector machine (SVM)
SVM, one of the most successful forecasting methods in recent years, was first proposed by Vapnik (1995). It is remarkably capable of handling small-sized datasets and nonlinear problems (Liu and Wang 2016;Ghimire et al. 2019). So, it has been widely applied in regression modeling. It is one of the most effective predicting tools often used as an alternative approach to ANN. The SVM approximates structural risk minimization based on statistical learning theory (Meng et al. 2019) rather than empirical risk minimization (Huang et al. 2014). Despite these advantages, the difficulty with SVM is that choosing a suitable kernel and on a large data set, SVM can consume more time in training than other models. The SVM description is avoided in this paper because many documents and books have described SVM theory in detail (Vapnik 1998;Carrier et al. 2013;Ch et al. 2013;Chiogna et al. 2018;Meng et al. 2019). SVM's basic idea is using the maximum margin algorithm (Pham et al. 2019), which searches for a hyperplane with the largest separating margin between the observed data. SVM can simplify an intricate problem by mapping the complicated nonlinear problem input factors into high-dimension space with kernel functions, transforming the complicated nonlinear problem into a linear problem. This process can find the optical function to fit the observations while avoiding overfitting to maintain the model generality. The useful and popular SVM kernel functions are linear, polynomial, sigmoid, radial basis function (RBF), and so on. This research employs SVM as a regression technique that uses RBF: where represents the Gaussian noise level of standard deviation. ( Architecture of the artificial neural network (ANN) model with an input layer, two hidden layers, and an output layer used for predicting monthly thunderstorm frequency

Autoregressive integrated moving average (ARIMA)
ARIMA models are widely employed statistical prediction techniques because of their ability to handle non-stationary series efficiently. ARIMA modeling's basic idea is, here, the examined time series is linear and follows a particular normal distribution (Box and Jenkins 1970). It is an effective modeling process and can be easily integrated with statistical software, like SAS, SPSS, and Stata. The model can be employed when the time series is stationary without having missing data. ARIMA model's disadvantage is that it can only extract linear relationships within the time series data (Zhang et al. 2013). In a traditional ARIMA (p, d, q) model, p is autoregressive (AR), d is the number of differences from the actual time-series data to make it stationary, and q is moving average (MA). The standard equation for ARIMA models is as follows: where d t is the observed value at time t, f i is the ith number of autoregressive parameter, j is the jth number of moving average parameter, and t is the error at time t. In this study, the Box-Jenkins methodology is used to formulate the ARIMA (1,1,1) (1,0,0) models for fitting TSF. This methodology comprises model identification, parameter estimation, and testing residual and forecast. A detailed description of the ARIMA model can be found in the literature (Contreras et al. 2003;Shadab et al. 2020).

Ensemble empirical mode decomposition (EEMD)
Based on Hilbert-Huang transform (HHT), Huang et al. (1998) first proposed empirical mode decomposition (EMD), which has been employed effectively throughout the decades. This is because of the following advantages: 1. EMD is a highly efficient and adaptive method for nonlinear and non-stable signals (Chen et al. 2021). 2. HHT is fully adaptive by initially introducing the intrinsic mode functions (IMFs), which is unlike the wavelet transform or Fourier transform that needs a pre-determined basis. The brief mathematical process of the EMD can be found in the literature (Zhou et al. 2014;Wang et al. 2015a, b;Fan et al. 2020). However, few of these IMFs may contain dramatic oscillations of different scales called "mode mixing" (Chen et al. 2021). This inconvenience can make these IMFs lose their physical signification and make the EMD algorithm less robust (Chen et al. 2021). The ensemble empirical mode decomposition (EEMD) was subsequently proposed by Wu and Huang (2009) to vanquish these shortcomings, adding a Gaussian white noise into the raw data series. It enables EEMD to automatically attribute signals with different time scales to the precise reference scales. In consequence, the correlation between the resultant IMFs and the raw series significantly improved. The challenge of employing EEMD is to better identify the amplitude of added noise and ensemble trials, which affect the performance. A considerable mode mixing betterment cannot be gained if the amplitude of the added noise is too less compared to the main series. In contrast, high added noise may produce redundant IMFs, leading to misinterpretation of the result (Tabrizi et al. 2015). The processes of EEMD are as follows: Add the normally distributed Gaussian white noise (t) to the target series f (t) to get a new signal F(t)∶ F(t) = f (t) + (t); 2. Decompose F(t) using EMD method. Obtain IMFs C i (t) and the residual r(t): 3. Adding different white noise sequence to the same raw series and repeat the above steps; 4. Since the mean value of Gaussian white noise is equal to zero, the IMFs obtained are integrated and averaged as the final result: where C j,m represents the jth IMFs from the mth time, and N denotes the number of the added white noise sequences. Resolved by the above process, we have obtained six IMFs in total from the raw data series.

Hybrid model building
In weather prediction, predicting thunderstorms is one of the most challenging tasks because of the implicit nonlinearity of thunderstorms' physics and dynamics (Litta et al. 2013). A problem with ANN, SVM, and other linear and nonlinear prediction models is that they cannot accurately handle non-stationary data. To solve the non-stationary problem, we have built an EEMD-ANN and an EEMD-SVM hybrid model. In our cases, EEMD decomposes the monthly TSF data of HTSF, MTSF, and LTSF into six IMFs and one residual, presented in Fig. 3a-c. In each category's case, we have selected the first three IMFs as predictors of the hybrid models with the other meteorological predictors. The value of IMF1, IMF2, and IMF3 is more relevant to the original series, and they are the most nonlinear components of their respective series. On the contrary, the value of IMF4, IMF5, and IMF6 is minimal. The residuals' value is not entirely relevant so that their contributions are lesser to fit the model, indicating difficulty in predicting the TSF more accurately. Besides, the correlation analysis (Table 3) suggested that the first three IMFs are the essential variables in predicting monthly TSF. Therefore, using these subseries in building the models might enhance the performances by giving useful information on several resolution levels. Figure 4 demonstrates the methodological procedures of the EEMD-ANN and EEMD SVM prediction models. As seen in Fig. 4, the main steps of the presented EEMD-ANN or EEMD-SVM prediction can be summarized as follows: Step 1 Decompose the original time series into a finite set of IMFs and a residue using EEMD.
Step 2 Eliminate the irrelevant or redundant IMFs and residue and select the IMFs with the highest frequency bands and a more significant correlation with the original series.
Step 3 Combine the selected IMF with other input parameters and then apply the ANN or SVM model to construct a prediction model for predicting TSF.
Step 4 Obtain the predicted output by the models.
There are some advantages to the proposed hybrid models. First, the basic principle of the EEMD is elementary, can still give a thorough understanding of the monthly TSF time series dataset. Second, it is suitable and adequate to couple the EEMD with ANN, SVM to predict the non-stationary and nonlinear TSF. Third, the EEMD-ANN and EEMD-SVM models' prediction outcomes can be more precise when applying the TSF time series decomposition. Fourth, the developed hybrid models do not need complex policy-making about each specific case study's obvious form. Thus, developing these hybrid prediction models by integrating EEMD may lead to more robust and better prediction outcomes.

Fig. 3
The decomposed sub-series of the original TSF data for high-frequency months (a), moderate-frequency months (b), and low-frequency months series (c) using EEMD

Model performance evaluation
The performance of each prediction model is evaluated using the following metrics. By letting t represent the reference values, ̂t represents the predicted values at time t , and denotes the mean of the reference values.

Coefficient of determination ( R 2 )
R 2 is the Coefficient of determination, which is a number between 0 and 1. It measures the degree of alignment between two parameters; in our case, the reference data ( t ) and the predicted data ( ̂t ). It quantifies how well future outcomes are likely to be predicted by the model. The coefficient of determination is calculated according to the formula as follows: where ̄ represents the average of the reference values.

Root mean square error (RMSE)
The second statistical metric RMSE is defined as follows: where N is the number of points in the test dataset. (5)

Mean absolute error (MAE)
The third statistical metric, namely the MAE, can be defined as follows: The MAE is the average of the absolute error between i and ̂i (i = 1, 2, … , n).

Mean absolute percentage error (MAPE)
The fourth criterion is MAPE , which is used to compute the relative error between | i −̂i| and | i | (i = 1, 2, … , n) , which is defined as

Index of agreement (IA)
The IA is used in this study and is defined as follows: The IA is a dimensionless metric used in comparing different models. The outcome value of IA is always between 0 and 1. For a "perfect" model, the R 2 and IA are equivalent to 1, and the MAE , MAPE , and RMSE are equal to 0. The three commonly used metrics, i.e., MAE , MAPE , and RMSE , all quantify the differences between the predicted and observed concentrations. However, the RMSE is more sensitive to extreme values, and the MAPE is sensitive to small values because of the power term. IA summarizes the similarity between the predicted and observed propensities.

Taylor diagram
Another model performance evaluation technique, "Taylor diagram," is used in this study. This technique is widely used to compare model data and tracking changes in model performances. The mathematical theory to construct the diagram can be found in the literature (Taylor 2001;Pakalidou and Karacosta 2018). The diagram provides the degree of similarity between the observed point and the test point. The closest the test point to the observed point, the highest the accuracy of the model.

Sensitivity analysis
The sensitivity analysis of the input parameters was measured for the best performing model of each category using the following equation: where R 2 is the square of the correlation coefficient of the best prediction model and R 2 is the square of the correlation coefficient of the predicted model when a specific parameter is excluded from the model.

Comparative analysis
In this section, the prediction results are presented along with a comparison of model performance metrics among the predicted outputs from different models. Figure 5a depicts the prediction results of HTSF using ANN, SVM, EEMD-ANN, EEMD-SVM, and ARIMA models of the testing dataset. In general, the hydrograph illustrates that all the models except for ARIMA have an excellent performance for simulating the monthly TSF. Moreover, Fig. 5b shows the scatter plots of prediction by these models, which indicates that the EEMD-ANN and EEMD-SVM models have the best performance for predicting the monthly TSF. Here, for EEMD-ANN and EEMD-SVM, the least square fitting line is slightly closest to the best possible 45° fitting line than ANN, SVM, and ARIMA. From Fig 5a and b, it is not clear which model performs better between EEMD-ANN and EEMD-SVM, as both are very close.
It is evident from Fig 5c that EEMD-ANN generated the highest value of CC and lowered centered RMS difference, while the other hybrid model EEMD-SVM was approximately equal in this regard. Comparing the models' standard deviation suggests that the EEMD-ANN and EEMD-SVM models were more in agreement and closer to observed values than the conventional ANN and SVM. It can also be seen in Fig. 5c that the EEMD-ANN has the standard deviation relative to the observed, but the traditional ANN has a standard deviation less than the observed. It shows that the hybrid EEMD-ANN outperforms classic ANN. Here also, the ARIMA model has the furthest distance from the reference data.
It can be observed from  1.478, and 1.67, 1.562 and 1.777, 1.099 and 1.446, 6.049 and 9.481, 0.992 and 0.99, respectively. ARIMA model has the worst performance compared with the other models, as observed in Table 4. The EEMD-ANN model has acquired the best score in all the validation metrics, gaining the best R 2 , training, and testing RMSE, MAE, MAPE, and IA values of 0.982, 1.177, 1.241, 0.917, 5.738, and 0.995, respectively. It increases the R 2 and IA by 1% and 0.3% and reduces the training RMSE, testing RMSE, MAE, and MAPE by 20.34%, 20.58%, 16.52%, and 0.31%, respectively, compared to the conventional ANN.
The hydrograph of the MTSF prediction results of the testing dataset is presented in Fig. 6a. Again, the hydrograph demonstrates that all the approaches except ARIMA have a decent performance in predicting the monthly MTSF. The scatterplot of the predicted outputs suggests that the EEMD-ANN and EEMD-SVM approaches have the closest leastsquare fitting line to the 45° line (Fig. 6b).
The Taylor diagram (Fig. 6c) suggests that the EEMD-ANN and EEMD-SVM have performed better than conventional ANN, SVM, and ARIMA. For the hybrid models, the correlation coefficient is > 0.95, while for the conventional models, it is < 0.95. The lower centered RMS difference also confirms that the hybrid models are more in agreement than the conventional models. It is immensely challenging to find out the better model between the hybrid models as both the models have performed almost equally.
The validation metrics from   2.68%, testing RMSE by 22.482%, MAE by 23.428%, and MAPE by 2.352% compared to the conventional ANN. The EEMD-SVM has also acquired a significant R 2 , training RMSE, testing RMSE, MAE, MAPE, and IA of 0.924, 1.412, 1.089, 0.909, 6.323, and 0.973, respectively, which improves the R 2 and IA by 4.881% and 0.62% and minimizes the training RMSE by 14.112%, testing RMSE by 8.024%, MAE by 2.258%, and MAPE by 0.529% while comparing with the conventional SVM. Figure 7a exhibits the predicted outcomes of the testing dataset of the LTSF using ANN, SVM, EEMD-ANN, EEMD-SVM, and ARIMA models. The hydrograph suggests that these models' predicted values, except for the ARIMA model, agree well with the observations of LTSF. However, several points exhibit a clear difference between the predicted and observed values. These differences often occur during the month of abrupt changes of TSF (e.g., 2012TSF (e.g., , 2014TSF (e.g., -2016. The scatterplot (Fig. 7b) indicates that the least-square fit for the EEMD-SVM prediction is closest to the perfect 45° fit, closely followed by the EEMD-ANN, SVM, and ANN approaches. However, there is a massive difference between the perfect fit and the least-square fit for the ARIMA model prediction.
EEMD-SVM approach has performed better than all the models (Table 4) in almost every validation metrics, which is coherent with the scatterplot (Fig. 7b) and the Taylor diagram (Fig. 7c) results. It is the best among the LTSF prediction models used in this study, gaining an impressive R 2 of 0. 0.886, training RMSE of 0.291, testing RMSE of Fig. 6 Observed and predicted output of the testing dataset of the moderate TSF series (a); scatterplot fitting of the prediction models (b); and Taylor diagram of prediction by EEMD-ANN, EEMD-SVM, ANN, SVM, and ARIMA models (c). The deep cyan contours represent the Pearson correlation coefficient, green contours represent centered RMS error in the simulated field, and violet contours represent the standard deviation of the simulated pattern 1 3 0.398, MAE of 0.325, MAPE of 17.925, and IA of 0.966. However, the EEMD-ANN has a better performance in acquiring the lower training RMSE (0.23) than EEMD-SVM, but it was not consistent in gaining better R 2 value (0.864), testing RMSE (0.439), MAE (0.352), MAPE (21.929), and IA (0.956). The conventional ANN, SVM, and ARIMA models were remarkably inferior to the hybrid approaches predicting the LTSF. Compared to the traditional SVM, the EEMD-SVM raises the R 2 , IA and lessens the training RMSE, testing RMSE, MAE, MAPE by 4.03%, 1.90% and 10.39%, 14.83%, 11.79%, 32.07%, which is remarkable.

Sensitivity assessment
The sensitivity analysis result of the best HTSF prediction model, EEMD-ANN, is shown in Fig. 8a. The rankings of the three most sensitive parameters are TSD, IMF1, and IMF2. Apart from these, all the parameters positively contribute to achieving better performance during the TSF prediction of the high-frequency months. TT, CPRCP, CAPE, and PRCP also played a pivotal role in constructing the EEMD-ANN model, followed by CRR, IMF3, WS50, RH, ST, DP, and KI (Fig. 8a). A similar result is observed in the three most sensitive parameters (Fig. 8b) for the EEMD-SVM model while predicting MTSF. In building this EEMD-SVM model, TT, CAPE, and IMF3 are the next ranked sensitive parameters, Fig. 7 Observed and predicted output of the testing dataset of the low TSF series (a); scatterplot fitting of the prediction models (b); and Taylor diagram of prediction by EEMD-ANN, EEMD-SVM, ANN, SVM, and ARIMA models (c). The deep cyan contours represent the Pearson correlation coefficient, green contours represent centered RMS error in the simulated field, and violet contours represent the standard deviation of the simulated pattern respectively, which are followed by CPRCP, CRR, WS50, DP, PRCP, RH, KI, and ST. Figure 8c depicts the sensitivity analysis of the best LTSF prediction model, which is EEMD-SVM. TSD, IMF1, and IMF3 are the top three sensitive parameters for predicting LTSF using the EEMD-SVM model. IMF2, CAPE, CPRCP, CRR, TT, and PRCP also play a vital role in constructing the model for predicting LTSF. Although the parameters like DP, KI, RH, ST, and WS50 have low sensitivity values, they help achieve better prediction accuracy.

Discussion
Due to increasing computational abilities, machine learning algorithms in modeling severe weather events are becoming progressively popular in current atmospheric studies. It is because of robust prospects from its use in operational prediction Czernecki et al. 2016;Kamangir et al. 2020) and the generating of severe weather events that add forthcoming changes in their frequencies (Allen et al. 2015;Lee et al. 2020). It is a fact that most current machine learning models have superiority over conventional statistical models . Furthermore, some machine learning models like random forest permit studying variable importance, making it likely to obtain a better insight into the factors influencing physics behind such studied processes. In this research, we assessed the use of machine learning algorithms in modeling high-, moderate-, and low-frequencies thunderstorm events on a monthly scale. This analysis was based on the observed TSF dataset, and convective parameters come from ERA5 reanalysis datasets. In the case of HTSF, the hybrid EEMD-ANN outperformed other models based on the evaluation criteria. For MTSF and LTSF, the hybrid EEMD-SVM model has superior performance than other standalone models. Theoretically, the three sub-series with various thunderstorm frequencies were used instead of thunderstorm events because it has physical meaning. TSF analysis can be used operationally to help in policy-making by lessening the cognitive associated with thunderstorm event identification.
Uncertainty increases in low TSF months (winter) because of the low SST and northeast wind flow from the BoB and lowers vapor flux availability. We anticipate that due to low surface temperature and soil moisture, the winter season (November to February) is the least favorable for forming TSF. However, this outcome is not surprising, and it can be underlined by analyzing the TSF pattern of three categories. This work proposed a prediction strategy for TSF prediction circumventing the probable precision reduction triggering from calibrating the decomposition method during implementing and accepting the application of operational research reported in several previous works (Napolitano et al. 2011;Zhang et al. 2015;Johny et al. 2020).
The outcomes obtained in this research indicate that EEMD can efficiently increase prediction accuracy, and the proposed EEMD-ANN model can achieve notable improvement over the conventional ANN method in the high, medium, and low TSF monthly timeseries predictions. The EEMD-ANN is more successful in capturing the HTSF monthly, showing remarkable precision than the SVM-EEMD model. Our finding is similar to the other hydrological time-series studies (Wang et al. 2015a, b). One probable reason for the improved performance of EEMD-ANN can be the method's capability to solve complex and nonlinear problems (Phuong et al. 2017).
In predicting MTSF, the Taylor diagram suggests EEMD-ANN is slightly more accurate than the EEMD-SVM. But the validation metrics suggest otherwise. We have selected the EEMD-SVM as the best model for MTSF prediction because the difference between the models is very narrow in the Taylor diagram, and it has performed better in most of the validation metrics. Besides, the EEMD-SVM has gained a substantial improvement in testing RMSE than EEMD-ANN, which indicates better model fitting for EEMD-SVM. The difference in the Taylor diagram and validation metrics results is probably due to the diagram's algorithm based on root mean square difference, standard deviation, and correlation coefficient. The standard deviation of the observed and predicted datasets might create this result difference. Also, the EEMD-SVM is more accurate in predicting the LTSF monthly. This is because SVM has been incredibly robust and efficient in nonlinear noise mixed data (Devak et al. 2015). Furthermore, the potential of decomposition might be more prominent in predicting the TSF dataset in the EEMD-ANN or EEMD-SVM model than the standalone model because the hybrid model can overcome the shortcomings of the standalone model to produce a synergetic impact on prediction. The hybrid EEMD-SVM can help avoid the overfitting or underfitting problem of the SVM model caused by the input parameters' improper determination. This also implies that the EEMD tool is applicable for decomposing monthly TSF time series and the idea of "decomposition and ensemble" is suitable. The findings in this work agree well with those obtained from the studies (Hawinkel et al. 2015;Wang et al. 2015a, b;Czernecki et al. 2016). Previous studies have found that hybrid EEMD-ANN and EEMD-SVM models outperformed the classical models, which apply original datasets in other fields of studies, e.g., runoff (Tan et al. 2018); streamcflow forecasting (Zhang et al 2015); rainfall forecasting (Johny et al. 2020) wind 1 3 speed forecasting (Yu 2020); groundwater level (Gong et al. 2018). The hybrid model is robust, theoretically justified, and more realistic compared to other standalone models. It can be said that the proposed methodology can not only predict the complicated thunderstorm frequency over Bangladesh rationally well, but it can also attain extreme climatic events.
Topographical differences, wind regimes, and the inland distance far from the coastal and hilly regions may differ sensitivity results in these categories. Based on the sensitivity assessment, TSD, IMF1, and IMF2 generated the highest score, similar to other thunderstorm-associated parameters in India by Umakanth et al. (2020). TSD is very high in sensitivity analysis due to its close association with TSF calculation and an enhanced number of TSF causing moist air circulated from the Bay of Bengal (BoB). Generally, the high TSD along with high ST and RH in May and June is observed in the northeast region, close to Cherrapunji, where the cloud formation is high, and hill ranges generate a tremendous amount of water vapor flux and precipitation. In recent times, more vigorous and more continuous moist is derived from the BoB because of elevated sea surface temperature. The high ST and sea surface temperature triggered a rise in CAPE in most parts of Bangladesh (Wahiduzzaman et al. 2020;Sahu et al. 2020). These findings support the outcomes of Glazer et al. (2020), who assessed the variations in TSD in Bangladesh due to global climate change and revealed an increase in TSD in many regions of the country. An increase in CAPE and higher moisture content in the BoB may play a vital role in enhancing TSD. Thus, the sensitivity assessment gives a physical means to capture the non-overlapping TSD that would otherwise trigger the concern of multicollinearity. This is obvious from the improvement in the model performance for high TSF identification reported in earlier works (Siddiqui and Rashid 2008;Gagne et al. 2017). Generally, the problem of overfitting is a familiar matter for predicting a severe extreme event that can be lessened using current machine learning methods (Czernecki et al. 2019), particularly if various data sources are coupled. All the other convective parameters, e.g., TT, CPRCP, CRR, KI, and the meteorological parameters, e.g., PRCP, RH, ST, WS50, have positively contributed to the best model building. Similar to our findings, Czernecki et al. (2019) found that the most unstable and downward CAPE, RH, temperature, and dew point in different pressure levels contribute to large hail prediction. Kulikov et al. (2020) identified that KI, TT, and CAPE were efficient parameters to predict thunderstorms, which conforms with our study. Meher and Das (2019) concluded that either KI or TT or both could be used to predict thunderstorms in the Gangetic West Bengal, where TT had a better influence than KI, similar to our sensitivity results.
The role of the input parameters on TSF can be understood with the help of Pearson's correlation analysis among the input and output parameters. From Fig. 9, it is evident that most of the parameters have a positive correlation among them except for the correlation between CAPE and PRCP. The reason behind the positive correlation is the parameters are very closely associated with each other, and their inherent relationship is such that the change of a parameter triggers the other. TSD, CPRCP, CRR, KI, DP, RH, PRCP, and TT have a very high correlation (above 0.5) with thunderstorm occurrences, while CAPE, WS50, and ST have a moderately strong correlation (between 0.35 and 0.5). Due to this very high positive association, thunderstorm occurrences and their variabilities in Bangladesh largely depend on these parameters. This result supports the sensitivity analysis findings where all the input parameters are positively contributed to predicting the TSF of each category.
It is worth mentioning that the limitation of this research lies in two perspectives. First, although monthly TSF data were taken from 28 stations, future studies using datasets from different regions may be needed to strengthen these valid conclusions because the performance of data-driven models is data-based and case reports explicit. Second, a wider variety of datasets can be considered to examine their influence on TSF prediction. Input parameters like TSD might not be available for all regions. Future research should search for an alternative to TSD as input parameters and focus on improving the models' accuracy in TSF prediction. Our future work includes the case-study concept of generating a seasonal TSF forecasting on a continental scale that can provide deep insight into the severe weather event's current knowledge. Also, testing our hybrid model is to forecast thunderstorm frequency for other similar climatic regions globally. The ERA5 based parameters can be used in the RF model that is yet more reliable than any sole parameters used in operational models ). In addition to this, good tuning of generated machine learning is feasible if it is more robustly fitted on a considerable number of datasets or adding new parameters from satellite datasets. Hence, with computational interests in modeling tools, machine learning models play a pivotal role in examining thunderstorm events' climatological perspective and improving operational prediction. The application of machine learning algorithms in thunderstorm prediction brings with a new promise for forthcoming studies concerning both operational predictors and meteorological research that intend to examine observed and future variations in frequencies of severe extreme events Yasen et al. 2017;Taszarek et al. 2019).

Conclusion
In the current study, we have taken monthly TSF data from 28 meteorological stations of BMD from 1981 to 2016 and proposed two novel hybrid machine learning models along with conventional ANN, SVM, and ARIMA for TSF prediction at three different categories over Bangladesh. CAPE, CPRCP, CRR, DP, KI, PRCP, RH, ST, TSD, TT, and WS50 were used as input parameters to construct the models. The performances of the adopted models were evaluated through model performance metrics like Coefficient of determination, RMSE, MAE, MAPE, IA, and Taylor diagram. Finally, a coefficient of determinationbased sensitivity analysis was applied to examine each input parameter's sensitivity in constructing the best models for each category. The major outcomes drawn from this study are as follows: 1. In building the hybrid models, we found that the first three IMF, such as IMF1, IMF2, and IMF3, significantly correlate with the main series, while the other IMFs and residuals have an insignificant correlation with the main TSF series. Results suggest that using the first three IMF as input parameters significantly improves the model performances. 2. 11 out of 21 input parameters were selected based on the random forest (RF) variable importance analysis. The redundant variables were eliminated to enhance the capability of the models. 3. The proposed EEMD-ANN and EEMD-SVM hybrid models showed 8.02-22.48% better precision in terms of root mean square error (RMSE) than conventional ANN, SVM, and ARIMA models at different categories. Among the models, EEMD-ANN has the highest accuracy in predicting high TSF, while EEMD-SVM has the best performance in predicting moderate and low TSF. The findings indicate that the hybridization of EEMD with the conventional models has implications for better prediction. 4. The coefficient of determination-based sensitivity analysis results showed that each input parameter was positively contributed to building the best model of each category, and thunderstorm days are the most contributing parameters influencing TSF prediction. The IMFs obtained through EEMD coupled with ERA5 reanalysis and NASA LaRC POWER Project-derived indices positively contribute to building the best models of each category. 5. The findings of this study can support the policymakers in developing an effective and efficient early warning system with a view to mitigating the adverse effects of thunderstorm hazards. Besides, this study has implications for estimating the precious thunderstorm-induced rainfall in agricultural production to maximize rainwater use. Also, the thunderstorm prediction models developed in this study can be adopted in different parts of the world.