2.1 Hybrid model based on DWT-SVR-Prophet
We propose a DWT-SVR-Prophet (DSP) model for rainfall time series prediction; the framework is shown in Fig. 1.
(i) Data preparation and preprocessing: We construct a dataset with daily rainfall data from the National Meteorological Center. The validity and superiority of the model introduced in this paper are verified using this dataset. We preprocess the measured rainfall data to ensure the fitting effect of the applied machine learning model. This process is described in detail in Section 2.2.
(ii) DWT processing: The rainfall time series are decomposed using DWT to obtain high-frequency subsequences with high randomness and volatility and low-frequency subsequences with high periodicity (see Section 2.3.1). This approach allows us to choose forecasting models based on the characteristics of each subseries.
(iii) Hyperparameter optimization: The hyperparameters of the three methods in the coupled model are optimized to obtain the optimal prediction effect. Notably, the parameters of the DWT method are determined by referring to previous literature, and a grid search method is used to set the hyperparameters of the SVR and Prophet models. The specific process of parameter selection is described in detail in Section 2.4.
(iv) Rainfall prediction: The optimized SVR model and Prophet model are used to predict high-frequency subseries and low-frequency subseries, respectively. The prediction results for each subseries are summed to obtain the final prediction results.
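Under the assumption that the DWT step uses the PyWavelets package, the decompose-predict-sum workflow of steps (ii) and (iv) can be sketched as follows; `predict_high` and `predict_low` are hypothetical stand-ins for the tuned SVR and Prophet models described later:

```python
import numpy as np
import pywt

def dsp_forecast_sketch(series, predict_high, predict_low, wavelet='db7', level=3):
    """Decompose a series with DWT, forecast each subseries, and sum the results."""
    coeffs = pywt.wavedec(series, wavelet, level=level)  # [A_level, D_level, ..., D1]
    # Rebuild each subseries at full length by zeroing all other coefficient bands
    subseries = []
    for i in range(len(coeffs)):
        kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        subseries.append(pywt.waverec(kept, wavelet)[:len(series)])
    approx, details = subseries[0], subseries[1:]
    # Low-frequency approximation -> Prophet-style model; detail bands -> SVR-style model
    return predict_low(approx) + sum(predict_high(d) for d in details)

# Sanity check with identity "predictors": the summed subseries recover the input
x = np.random.default_rng(1).gamma(0.5, 4.0, 512)
out = dsp_forecast_sketch(x, predict_high=lambda s: s, predict_low=lambda s: s)
print(np.allclose(out, x))  # True
```

The sanity check exploits the linearity of the inverse transform: the per-band reconstructions sum exactly to the full reconstruction, which is why per-subseries forecasts can simply be added in step (iv).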
2.2 Data preprocessing
The data used in this paper were obtained from the National Weather Science Data Center (http://data.cma.cn). To construct the dataset, we selected daily rainfall data (from 1 January 2014 to 30 June 2016) from five stations in different climate zones. The rainfall statistics are shown in Table 1. Notably, station 59855, station 58345 and station 57348 are associated with high average annual rainfall and a high rainfall frequency. Station 54823 is characterized by moderate annual rainfall and the lowest rainfall frequency. Moreover, station 56018 displays the smallest annual average rainfall but a high rainfall frequency.
Table 1
Station (ID) | Rainfall per year (mm) | Rainfall per month (mm) | Daily maximum rainfall (mm) | Rainfall frequency (%) | Climatic type |
Hainan (59855) | 1972.2 | 182.6 | 253.1 | 41.4 | Northern tropics |
Jiangsu (58345) | 1362.7 | 126.2 | 154.8 | 36.0 | Northern subtropics |
Chongqing (57348) | 881.9 | 81.7 | 113.6 | 35.8 | Mid-subtropics |
Shandong (54823) | 642.5 | 59.5 | 127.1 | 20.7 | Southern temperate |
Qinghai (56018) | 466.3 | 43.2 | 31.3 | 40.4 | Plateau climate |
One of the prerequisites for a data-driven model is ensuring that the data quality meets the relevant modeling requirements (Luo et al. 2018). Therefore, before performing data decomposition, the measured rainfall data need to be preprocessed, e.g., by filling missing values and normalizing the data. There were no missing values in the dataset used in this paper. To avoid the influence of extreme rainfall values and improve the accuracy of the machine learning algorithm, the min-max scaling method (Ponnoprat 2021) was adopted to normalize daily rainfall data to between 0 and 1. The specific formula is as follows:
$${P_{norm}}=\frac{{{P_i} - {P_{\min }}}}{{{P_{\max }} - {P_{\min }}}}$$
1
where \(P_{norm}\), \(P_i\), \(P_{min}\) and \(P_{max}\) are the normalized, measured, minimum and maximum values of rainfall, respectively.
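Eq. (1) amounts to a one-line transformation; a minimal sketch (assuming a non-constant series, so the denominator is nonzero):

```python
import numpy as np

def min_max_normalize(p):
    """Eq. (1): (P_i - P_min) / (P_max - P_min), scaling rainfall to [0, 1]."""
    p = np.asarray(p, dtype=float)
    return (p - p.min()) / (p.max() - p.min())

# Illustrative daily rainfall values (mm)
norm = min_max_normalize([0.0, 12.4, 3.1, 41.4, 0.5])
print(norm.min(), norm.max())  # 0.0 1.0
```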
2.3 Methods used in the DSP model
2.3.1 Discrete wavelet transform
Wavelet transform is a time-frequency analysis method with multiresolution characteristics. The characteristics of an original sequence in different frequency bands are obtained by changing the corresponding scale (Ma et al. 2022). The main wavelet transform methods include continuous wavelet transform (CWT) and discrete wavelet transform (DWT). DWT is based on simple processing steps, avoids the redundancy problem of CWT, and is more suitable for processing time series data (Liu et al. 2014). Therefore, DWT is chosen for rainfall time series decomposition, and the corresponding expression is:
$$W\left( {p,q} \right)={2^{ - \left( {\frac{p}{2}} \right)}}\sum\limits_{{t=0}}^{T} {\psi \left( {\frac{{t - q \cdot {2^p}}}{{{2^p}}}} \right) \cdot x\left( t \right)}$$
2
where t is the time parameter, T is the signal length, p is the scale parameter, q is the offset parameter, \(\psi(\cdot)\) is the wavelet basis function, and \(x(t)\) is the original time series.
The specific decomposition process is shown in Fig. 2. The rainfall time series is decomposed into m subseries of different frequencies (D1, D2,…, Dm, and Am), which are used as the inputs of the coupled model. The frequency of the subsequences decreases from D1 to Am in the above order.
2.3.2 Support vector regression model
SVR is an extension of the support vector machine (SVM) concept to regression problems (Essam et al. 2022). The benefits of this approach include structural risk minimization and suitability for small sample sizes. Its regression process is as follows.
The basic form of linear regression is:
$$f\left( x \right)={\omega ^T} \cdot x+b$$
3
For a given sample set \(\left\{ {\left( {{x_i},{y_i}} \right),i=1,2,\ldots,N} \right\}\), \({x_i} \in {R^n}\) is the input quantity and \({y_i} \in R\) is the output quantity. Consider a mapping \(\varphi \left( x \right)\), the feature vector obtained by mapping x to a higher-dimensional space, yielding the following linear regression function:
$$f(x)={\omega ^T} \cdot \varphi (x)+b$$
4
where \(\omega\) is the weight coefficient and \(b \in R\) is the bias. Once \(\omega\) and b are learned, the model is established.
According to the structural risk minimization criterion, the problem is transformed into an objective function R minimization problem, which can be expressed as:
$$\mathop {\min }\limits_{\left( {\omega ,e} \right)} R\left( {\omega ,e} \right)=\frac{1}{2}{\omega ^T}\omega +C\sum\limits_{{i=1}}^{n} {e_i^{2}} \quad s.t.\quad {y_i}={\omega ^T}\varphi \left( {{x_i}} \right)+b+{e_i} \quad (5)$$
where the equation after s.t. denotes the constraint that must be satisfied when the objective function R is minimized; \({e_i} \in R\) is the error variable, determined during model training; and C is the penalty coefficient, which is greater than 0.
To solve the above minimization problem for the objective function R, the Lagrangian function L is constructed.
$$\begin{gathered} L\left( {\omega ,b,{e_i},{\alpha _i},{{\tilde {\alpha }}_i}} \right)=R\left( {\omega ,e} \right) - \sum\limits_{{i=1}}^{n} {{\alpha _i}\left[ {{\omega ^T}\varphi \left( {{x_i}} \right)+b+{e_i} - {y_i}} \right]} \hfill \\ {\text{ }}+\sum\limits_{{i=1}}^{n} {{{\tilde {\alpha }}_i}} \left[ {{\omega ^T}\varphi \left( {{x_i}} \right)+b+{e_i} - {y_i}} \right] \hfill \\ \end{gathered}$$
6
Setting the partial derivative of L with respect to \(\omega\) to zero yields the following expression:
$$\omega =\sum\limits_{{i=1}}^{n} {\left( {{{\tilde {\alpha }}_i} - {\alpha _i}} \right)} \cdot {x_i}$$
7
where \({\tilde {\alpha }_i},{\alpha _i}\) are the Lagrange multipliers; the samples for which \({\tilde {\alpha }_i} - {\alpha _i} \ne 0\) are the support vectors of the SVR.
Finally, by satisfying the Karush-Kuhn-Tucker (KKT) conditions of the SVR dual problem and substituting the resulting formula into Eq. (3), the SVR solution can be obtained as follows:
$$f\left( x \right)=\sum\limits_{{i=1}}^{n} {\left( {{{\tilde {\alpha }}_i} - {\alpha _i}} \right) \cdot {x_i}^{T}x+b}$$
8
where the bias term is \(b={y_i} - \sum\nolimits_{{i=1}}^{n} {\left( {{{\tilde {\alpha }}_i} - {\alpha _i}} \right){x_i}^{T}x}\). The SVR solution when feature mapping is considered is:
$$f\left( x \right)=\sum\limits_{{i=1}}^{n} {\left( {{{\tilde {\alpha }}_i} - {\alpha _i}} \right)} \cdot k\left( {{x_i},x} \right)+b$$
9
where \(k\left( {{x_i},x} \right)\) is the kernel function. Commonly used kernels are the linear kernel function and the radial basis kernel function.
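As an illustration of the RBF-kernel regression of Eq. (9), the following toy fit uses scikit-learn's `SVR`; the data and parameter values are illustrative, not the tuned DSP settings:

```python
import numpy as np
from sklearn.svm import SVR

# Noisy samples of y = sin(x) as a stand-in for a high-frequency subseries
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.05, 200)

# kernel='rbf' with penalty C and kernel coefficient gamma, as in Eqs. (5) and (9)
model = SVR(kernel='rbf', C=10.0, gamma=1.0)
model.fit(X, y)
print(round(model.score(X, y), 2))  # training R^2, close to 1 for this smooth target
```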
2.3.3 Prophet model
The Prophet model is an open-source decomposable time series prediction model that was developed by Facebook (Taylor et al. 2018). It is based on time series decomposition and machine learning fitting, which are used to predict the future trends of time series. The model consists of four main components:
$$y(t)=g(t)+s(t)+h(t)+\varepsilon (t)$$
10
where t is the current time; \(y(t)\) is the current value; \(g(t)\) is the trend term, which represents a nonperiodic variation of the time series; \(s(t)\) is the period term, which reflects the cyclical or seasonal variation in the time series; \(h(t)\) is the holiday-event term, which can be interpreted as an additional influence term; and \(\varepsilon (t)\) is the error term, which obeys a normal distribution.
The trend term \(g(t)\) can be based on either a logistic growth function or a piecewise linear function. The piecewise linear modelling equation is as follows:
$$g\left( t \right)=\left( {k+a{{\left( t \right)}^T}\delta } \right)t+\left( {m+a{{\left( t \right)}^T}\gamma } \right)$$
11
where k denotes the growth rate of the model; δ is the vector of changes in k at the changepoints; m is the offset; t is the timestamp; a(t) is the indicator vector, and \(a(t)^T\) is its transpose; and γ is the smoothing offset, which serves to keep the trend continuous across segments. The logistic regression model is:
$$g(t)=\frac{c}{{1+\exp ( - k(t - m))}}$$
12
where c denotes the carrying capacity of the model; the definitions of k and m are the same as those above.
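The continuity role of γ in Eq. (11) can be checked numerically: choosing \(\gamma_j = -s_j \delta_j\) for a changepoint at \(s_j\) joins the segments. The sketch below uses illustrative values:

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, delta):
    """Eq. (11): g(t) = (k + a(t)^T delta) * t + (m + a(t)^T gamma)."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    delta = np.asarray(delta, dtype=float)
    A = (t[:, None] >= s[None, :]).astype(float)  # indicator a(t)
    gamma = -s * delta                            # offsets that keep g continuous
    return (k + A @ delta) * t + (m + A @ gamma)

# One changepoint at t = 5: slope 1 before, slope 1 + 2 = 3 after, no jump at t = 5
g = piecewise_linear_trend(np.arange(10), k=1.0, m=0.0, changepoints=[5.0], delta=[2.0])
print(g[4], g[5], g[6])  # 4.0 5.0 8.0
```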
The periodic term \(s(t)\), which models the periodicity of the time series using a Fourier series, is as follows:
$$s\left( t \right)=\sum\limits_{{n=1}}^{N} {\left[ {{a_n}\cos \left( {\frac{{2\pi nt}}{p}} \right)+{b_n}\sin \left( {\frac{{2\pi nt}}{p}} \right)} \right]}$$
13
The parameters p and N can be adjusted according to the length of the selected period. For example, p = 365.25 and N = 10 when the period is annual, and p = 7 and N = 3 when the period is weekly.
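Eq. (13) can be evaluated directly; the sketch below (with made-up coefficients \(a_n, b_n\)) confirms that the resulting term repeats with period p:

```python
import numpy as np

def fourier_seasonality(t, p, a, b):
    """Eq. (13): s(t) = sum_n [a_n cos(2*pi*n*t/p) + b_n sin(2*pi*n*t/p)]."""
    t = np.asarray(t, dtype=float)
    s = np.zeros_like(t)
    for n, (a_n, b_n) in enumerate(zip(a, b), start=1):
        s += a_n * np.cos(2 * np.pi * n * t / p) + b_n * np.sin(2 * np.pi * n * t / p)
    return s

# Weekly seasonality: p = 7, N = 3 terms, illustrative coefficients
t = np.arange(14)
s = fourier_seasonality(t, p=7, a=[0.5, 0.2, 0.1], b=[0.3, 0.1, 0.0])
print(np.allclose(s[:7], s[7:]))  # True: the term is periodic with period 7
```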
The holiday-event model \(h(t)\) is:
$$h\left( t \right)=Z\left( t \right)k=\sum\limits_{{i=1}}^{L} {{k_i} \cdot {1_{\left\{ {t \in {D_i}} \right\}}}}$$
14
where \(Z\left( t \right)=\left( {{1_{\left\{ {t \in {D_1}} \right\}}},\ldots,{1_{\left\{ {t \in {D_L}} \right\}}}} \right)\) and \(k={\left( {{k_1},\ldots,{k_L}} \right)^T}\); L denotes the number of holiday events; \({D_i}\) represents the time range of the i-th holiday event; each indicator \({1_{\left\{ {t \in {D_i}} \right\}}}\) equals 1 when time t falls within the corresponding holiday window and 0 otherwise; and \({k_i}\), which obeys a normal distribution, indicates the influence of the i-th holiday event on the time series prediction.
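The indicator structure of Eq. (14) reduces to a masked sum; a minimal sketch with hypothetical holiday windows:

```python
import numpy as np

def holiday_term(t, windows, k):
    """Eq. (14): h(t) = sum_i k_i * 1{t in D_i}, with each D_i given as (start, end)."""
    t = np.asarray(t)
    h = np.zeros(len(t), dtype=float)
    for (start, end), k_i in zip(windows, k):
        h += k_i * ((t >= start) & (t <= end))
    return h

# Two illustrative holiday windows (day indices) with effects k = (2.0, -1.0)
h = holiday_term(np.arange(10), windows=[(1, 2), (6, 6)], k=[2.0, -1.0])
print(h.tolist())  # [0.0, 2.0, 2.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
```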
2.4 Hyperparameter optimization
The prediction effect of the coupled model mainly depends on the selection of the model parameters, and the main parameters of each model are shown in Table 2. To obtain the optimal prediction effect, the three model parameters are optimized, and the detailed methods are as follows.
2.4.1 DWT
The main parameters of DWT include the wavelet function and decomposition level. Among the standard wavelet functions, Daubechies (db) family wavelets are commonly used in hydrometeorology (Quilty et al. 2021; Wu et al. 2021). Nalley et al. (2012) used db5-db10 wavelets for DWT-based analyses of rainfall time series. Altunkaynak et al. (2015) performed a 3-level decomposition of rainfall time series and used the result as the input to a prediction model. Referring to the existing studies, we select db7 and 3-level decomposition for DWT data processing.
2.4.2 SVR
We use a grid search method for parameter search optimization in the SVR model. Then, a 10-fold cross-validation technique is applied to the training set to prevent overfitting.
- The radial basis function (rbf) is chosen as the kernel function of the SVR model. First, the value ranges and search steps are set for the main parameters C and γ, and all parameter combinations within the given ranges are obtained: \(\left( {{\gamma _x},{C_y}} \right),\left( {x=1,2,...,M;y=1,2,...,N} \right)\).
- All parameter combinations are applied to rainfall prediction, and the best combination \(\left( {{\gamma _j},{C_k}} \right)\) is selected based on the evaluation results.
- To ensure the stability of the search result, the interval adjacent to the optimal combination is selected as the new search range, \(\gamma \in \left( {{\gamma _{j - 1}},{\gamma _{j+1}}} \right),C \in \left( {{C_{k - 1}},{C_{k+1}}} \right)\). The search step size is then reduced by a factor of 2 (or another multiple), and the optimal combination is obtained again. This process is repeated until the result is stable, i.e., until the optimal combination of parameters is found.
In this paper, \(C \in \left[ {{2^{ - 10}},{2^{10}}} \right]\), \(\gamma \in \left[ {{2^{ - 10}},{2^{10}}} \right]\), and the step size is 2.
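The coarse pass of this search maps directly onto scikit-learn's `GridSearchCV` with `cv=10`; the snippet below uses a toy dataset and a narrow slice of the \([2^{-10},2^{10}]\) range so that it runs quickly:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Toy regression data standing in for a decomposed rainfall subseries
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=100)

# Log2-spaced candidates for C and gamma (step factor 2), scored by 10-fold CV
param_grid = {'C': [2.0 ** i for i in range(-2, 3)],
              'gamma': [2.0 ** i for i in range(-2, 3)]}
search = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=10)
search.fit(X, y)
print(search.best_params_)  # the winning (C, gamma) pair from the coarse grid
```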
2.4.3 Prophet model
A grid search method is used to search for the optimal parameters of the Prophet model, and the basic process is the same as that used for the SVR model. The following initial range settings are used for the parameters of the Prophet model.
- Both the linear and logistic options for the "Growth" parameter, and both the additive and multiplicative options for the "Seasonality mode" parameter, are considered.
- A monthly period term is added with the "Add_seasonality" function in the Prophet model, with "period" = 30.5. Then, the initial ranges of "Year_seasonality" and "Seasonality_prior_scale" are set to [1, 100] with a step size of 5.
- The "Changepoint_prior_scale" parameter has a range of [0.01, 20] with a corresponding step size of 0.5.
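Enumerating this initial space shows its size before any refinement; the parameter names below follow the lowercase Prophet Python API, and the grid encodes only the coarse first pass:

```python
from itertools import product

# Ranges and step sizes from Section 2.4.3
grid = {
    'growth': ['linear', 'logistic'],
    'seasonality_mode': ['additive', 'multiplicative'],
    'yearly_seasonality': list(range(1, 101, 5)),       # [1, 100], step 5
    'seasonality_prior_scale': list(range(1, 101, 5)),  # [1, 100], step 5
    'changepoint_prior_scale': [round(0.01 + 0.5 * i, 2) for i in range(40)],  # [0.01, 20], step 0.5
}
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))  # 64000 candidate combinations in the coarse pass
```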
Table 2
Hyperparameters of the DWT, SVR and Prophet models
Model | Parameter | Parameter description | Default value | Optimization method |
DWT | Wavelet name | Wavelet basis function | ─ | From previous research |
DWT | Level | Wavelet decomposition level | ─ | From previous research |
SVR | Kernel | Kernel function | rbf | Grid search |
SVR | C | Penalty coefficient | 1 | Grid search |
SVR | γ | Kernel function coefficient | auto | Grid search |
Prophet | Growth | Function in the trend model | linear | Grid search |
Prophet | Changepoint_prior_scale | Trend flexibility | 0.05 | Grid search |
Prophet | Year_seasonality | Yearly seasonality flexibility | 10 | Grid search |
Prophet | Seasonality_prior_scale | Seasonality flexibility | 10 | Grid search |
Prophet | Seasonality mode | Model learning style | additive | Grid search |