Forecasting Short-term Water Demands with an Ensemble Deep Learning Model for a Water Supply System

Short-term water demand forecasting is crucial for constructing an intelligent water supply system. Many useful models have been built to address this issue. However, several challenging problems remain: model accuracies are not high enough, model complexity hinders wide adoption in practice, and the ability of models to capture peaks still has much room for improvement. To address these problems, we propose an ensemble deep learning model named STL-Ada-LSTM for daily water demand forecasting, which combines the STL method with an AdaBoost-LSTM model. After data preprocessing, the smoothed series is decomposed by STL to obtain three input series. Then, several LSTM models are integrated by the AdaBoost algorithm to construct the ensemble deep learning model for water demand forecasting. Finally, the superiority of the proposed model is demonstrated by comparing it with other state-of-the-art models. The proposed method is applied to water demand forecasting using daily datasets from two representative water plants located in Yiwu, East China. All models are assessed by the mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE), coefficient of determination (R2) and Akaike information criterion (AIC). The results show that the proposed model not only improves forecast accuracy, but also enhances stability and conciseness. It proves to be a practical model with good accuracy and can be further applied to daily water demand forecasting in other regions.


Introduction
Water scarcity has become a serious threat to humanity in recent decades (Salloom et al. 2021). An intelligent water management system is an efficient response to this threat (Bajany et al. 2021). Moreover, driven by continuous and rapid urbanization, an urban intelligent water supply system is becoming increasingly indispensable (Du et al. 2021). Since short-term water demand forecasting (SWDF), which means forecasting water demand over time horizons ranging from 1 day to 1 month, is an important component of an intelligent water supply system, it is crucial to forecast the demand effectively and accurately. However, forecasting urban water demand is a challenging task, especially considering its complicated influencing variables, non-stationarity and stochasticity (Guo et al. 2020).
In recent years, a variety of SWDF models have been built to enhance forecast accuracy and efficiency. These models can be classified from the perspectives of input, model construction technique and output. Based on SWDF inputs, the models can be categorized into single-variable and multiple-variable models. The input of a single-variable model is the historically observed water demand data (Huang et al. 2021). In the other category, historical water demand and weather data are considered as inputs for multiple-variable models (Antunes et al. 2018). Some methods such as principal component regression (Haque et al. 2017) and random forest have been used to select reasonable multiple inputs and reduce the dimensionality of the original data set. Although water demand is influenced by many factors, in practice it is hard for water utilities to collect daily meteorological, hydrological and other data in a short time for SWDF (Chen et al. 2022). Using multiple variables as inputs would also increase forecast uncertainty due to errors in the variables themselves. Therefore, this study uses the historical water demand data as the single input for SWDF.
In terms of model construction techniques, SWDF models can be classified into statistical and artificial intelligence-based models. To represent temporal dynamics, a classic family of statistical models was applied in earlier years, such as the Auto-Regressive model (Maidment and Miaou 1986), the Auto-Regressive Integrated Moving Average (ARIMA) model (Jowitt and Xu 1992) and seasonal ARIMA (Caiado 2010). Because these models are relatively simple to understand and implement, they have been widely applied in practice. However, they are usually inadequate to fit the complex features of urban water demands and may not produce sufficient forecast accuracy (Voitcu and Wong 2006). Although some studies explored more accurate statistical methods such as the independent component regression technique (Haque et al. 2017) and the double-seasonal time series model (Chen and Boccelli 2018) for water demand forecasting, the mean absolute percentage errors (MAPE) are often more than 3%. To solve this problem, many artificial intelligence (AI) models such as support vector machines (Shabani et al. 2017), random forests and artificial neural networks (Antunes et al. 2018; Zubaidi et al. 2018) have been widely used, building on the rapid development of artificial intelligence in recent years. However, the layer architectures of these classical machine learning models are too shallow to mine the implicit features of large datasets (Chen et al. 2020b).
Deep learning is one pragmatic solution to this situation (Wang et al. 2019) owing to its strong feature mining ability, and it can improve water demand forecast performance compared with classical artificial intelligence models (Xu et al. 2018). Since deep learning was introduced to SWDF, Recurrent Neural Networks (RNNs) have become a good option because they often deliver reliable forecasts that outperform the benchmarks (Heddam et al. 2016). The variants of RNN, including the gated recurrent unit network (GRU) and the long short-term memory network (LSTM), were found to increase prediction accuracy. Guo et al. (2018) developed a gated recurrent unit network (GRUN) model for SWDF and obtained more accurate and stable results than a conventional ANN model, but they found that the errors at extreme points were hard to improve. To tackle the problem, Salloom et al. (2021) inserted virtual data between the actual data to alleviate the nonlinearity at the extreme points and reduce the error. However, they also admitted that this expansion has a negative effect on the computational load because of the increase in input size. Compared to GRU, although LSTM has more parameters, it has a stronger ability to capture extreme points in long time series (Hewamalage et al. 2021). In 2020, LSTM was applied to SWDF for the first time, and the study proved that the LSTM-based model offers more accurate forecasts than other classical models (ARIMA, RF, SVM) when dealing with data exhibiting abrupt changes and relatively high uncertainty (Mu et al. 2020).
To further improve the performance of a single LSTM model for water demand forecasting, some studies started constructing hybrid models. Du et al. (2021) proposed a hybrid LSTM model combining discrete wavelet transform (DWT) with principal component analysis (PCA) pre-processing techniques for water demand forecasting. To further address uninformative and unreliable forecasts when the uncertainty level of the data increases, a hybrid model (KDE-PSO-LSTM), which combines LSTM with kernel density estimation (KDE) optimized by the particle swarm optimization (PSO) algorithm, was proposed. To some degree, such mechanically stitched hybrid models increase the difficulty of model application and the uncertainty of the forecast, because the components of the hybrid model are usually relatively independent models without interconnected internal mechanisms. For example, DWT and PCA are two independent methods that need to be tuned separately. An ensemble model has the potential to solve these problems, since its components are deeply integrated into a holistic model, which means managers do not need to run the component models separately: a single run yields the final forecast results, while the complexity and time consumption of forecasting decrease dramatically. Nevertheless, to our knowledge, ensemble methods based on LSTM for water demand forecasting have not yet been studied sufficiently. Zanfei et al. (2022) used the simple average method to ensemble different models, including LSTM, for drinking water consumption. This simple ensemble method ignores the different contributions of the individual models. The AdaBoost algorithm, one of the most successful ensemble methods, has several advantages, including simple computation, high precision and resistance to overfitting (He et al. 2020). Therefore, in this study, the AdaBoost algorithm is adopted to improve LSTM performance through iterative computation to achieve a stronger SWDF model.
Considering SWDF output, the models can be categorized into those producing deterministic results and those producing results with quantified uncertainty. In this study, we aim to construct a SWDF model that offers deterministic results for the convenience of water supply system managers.
To summarize, constrained by the non-linearity, non-stationarity and uncertainty of SWDF, many studies have to combine multiple models to achieve higher accuracy. Every single combined model must be operated accurately to guarantee the final accuracy. Due to their complexity and high technical barriers, these hybrid models might be difficult to use in a real water supply system, because operators working in grassroots positions might not have such professional knowledge and modeling experience.
To solve the above-mentioned problem, it is necessary to develop a more practical forecast model with high accuracy and low technological barriers. This is the main goal of this study. The proposed model is expected to offer a practical SWDF model for application in water-scarce cities that need to establish their own intelligent water supply systems. The STL-Ada-LSTM model developed in this study reduces the complexity of previous hybrid SWDF models by integrating all independent component models through an ensemble algorithm. The innovative points of this study include: 1) a time series decomposition method is adopted to extract seasonal features and enhance the accuracy of the proposed model; 2) the accuracy at peaks is guaranteed by the strengths of the LSTM models; 3) the number of parameters of the proposed model is kept as small as possible to ease the tuning process; 4) the STL-Ada-LSTM model is validated and compared with state-of-the-art SWDF models using Yiwu City in East China as a case study.
The remainder of this paper is organized as follows. Section 2 describes the main methodology used in this study, including the STL-Ada-LSTM model. Section 3 introduces a practical case study. Sections 4 and 5 describe the main results and discussion. Finally, in Section 6, the conclusions are given.

Methodology
A schematic diagram of the proposed research framework is presented in Fig. 1. Firstly, the outliers of the original water demand time series are identified using the 3σ criterion. The moving average method is applied to smooth the identified outliers. Then, according to the seasonal characteristics of the data series, the Seasonal and Trend decomposition using Loess (STL) method (Antunes et al. 2018) is adopted to extract the seasonal, trend and residual features of the smoothed series. Thus, the time series is decomposed into three components: trend series T_t, seasonal series S_t and residual series R_t, as shown in Fig. 1. Considering the strong forecasting capability of LSTM and the independent error distributions of the AdaBoost model, a forecast method that combines the two (AdaBoost-LSTM) is developed so that they compensate for each other's deficiencies. The decomposed series are used as the inputs of three AdaBoost-LSTM models, respectively. The first AdaBoost-LSTM model, displayed on the left of Fig. 1, is designed to forecast the trend; the middle one is designed to extract and predict seasonal features; the right one is proposed to enhance the ability to capture the peaks. Finally, the outputs of the three AdaBoost-LSTM deep learning models are summed to obtain the final water demand forecasts. The whole model described above is named the STL-Ada-LSTM model.

Identification and Processing of Outliers
Outliers in time series, depending on their nature, may have a moderate to significant impact on model forecasts. To guarantee the reliability of the data, the 3σ criterion is used to identify the outliers of the original water demand series X_t. Under the 3σ criterion, X_t is restricted to a 99.73% confidence interval (Du et al. 2021), and the outliers are smoothed back into this band by the weighted average method in Formula (1):

E_t = Σ_k ω_{t−k} x_{t−k}    (1)

where ω_{t−k} and x_{t−k} represent the weights and the historical data near the outliers, respectively; k is a positive integer and E_t is the smoothed value. Finally, all data fall within the band [μ − 3σ, μ + 3σ], where μ and σ represent the mean and standard deviation of the original water demand series, respectively (Alvarado-Barrios et al. 2020).
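As a hedged illustration, the outlier treatment above can be sketched as follows. The window size k and the equal weighting are assumptions for demonstration only, since the paper does not specify the weights ω:

```python
# Sketch of the 3-sigma outlier detection and weighted-average smoothing
# described above. Equal weights over the k preceding values are an assumed
# simplification of Formula (1).
def smooth_outliers(series, k=3):
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    lo, hi = mean - 3 * std, mean + 3 * std   # the 99.73% band
    smoothed = list(series)
    for t, x in enumerate(series):
        if x < lo or x > hi:                  # flagged by the 3-sigma criterion
            window = [series[t - j] for j in range(1, k + 1) if t - j >= 0]
            if window:
                # equal weights stand in for the unspecified omega values
                smoothed[t] = sum(window) / len(window)
    return smoothed
```

Note that the band is computed from the original series, so a single extreme value inflates σ; robust variants (e.g. based on the median) are common in practice.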

Seasonal and Trend Decomposition Using Loess
Seasonal and Trend decomposition using Loess (STL) is a time series decomposition method based on locally weighted scatterplot smoothing (loess) (Chen et al. 2020a). The time series is decomposed into three additive components, seasonal S_t, trend T_t, and remainder R_t, such that X_t = S_t + T_t + R_t. Compared with other traditional seasonal decomposition techniques, such as X-12-ARIMA and the ratio-to-moving-average method, STL provides more robust results (Xiong et al. 2018). Because the short-term water demand time series is characterized by seasonality and instability (Antunes et al. 2018), STL is well suited to it. Moreover, this method is not as complicated as the discrete wavelet transform, which needs to be tuned. STL is an iterative method consisting of two recursive procedures, an inner and an outer loop. The detailed steps of this method are described in the study by Xiong et al. (2018).
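To make the additive decomposition X_t = S_t + T_t + R_t concrete, the following is a simplified classical decomposition, not full STL (which adds iterative loess smoothing and robustness weights); it assumes an odd seasonal period such as a weekly cycle in daily data:

```python
# Classical additive decomposition as a simplified stand-in for STL.
def decompose_additive(x, period):
    n = len(x)
    half = period // 2
    # Trend: centred moving average over one full (odd) period.
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]
        trend[t] = sum(window) / len(window)
    # Seasonal: average detrended value at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(x[t] - trend[t])
    cycle = [sum(b) / len(b) if b else 0.0 for b in buckets]
    mean_s = sum(cycle) / period
    cycle = [s - mean_s for s in cycle]       # centre so the cycle sums to ~0
    seasonal = [cycle[t % period] for t in range(n)]
    # Remainder: whatever trend and seasonality do not explain.
    residual = [x[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                for t in range(n)]
    return trend, seasonal, residual
```

In practice a library implementation (e.g. `statsmodels.tsa.seasonal.STL`) would be used; this sketch only shows how the three components relate.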

Deep Learning Using Long Short-term Memory
Long Short-Term Memory (LSTM) is a variation of the RNN architecture that avoids the long-term dependency problem and also made a tremendous breakthrough in dealing with the vanishing and exploding gradient problems (Nguyen et al. 2020). The advantages of LSTM, such as memorability and shared parameters, allow it to effectively learn the long-term dependencies in time series, making it popular in time series forecasting (Zanfei et al. 2022; Du et al. 2021). LSTM is controlled by three gates, i.e., an input gate, an output gate and a forget gate, forming a self-looping update. The forget gate (f) decides how much information will be kept and passed to the next stage, the input gate (i) decides how much new information will be added, and the output gate (o) updates the system state using the information and cell state (c) from the previous two gates. The update process of LSTM is briefly illustrated as follows (Han et al. 2019):

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
c_t = f_t × c_{t−1} + i_t × c̃_t
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t × tanh(c_t)

where W_f, W_i, W_c, W_o denote the weight matrices of the forget gate, input gate, cell state and output gate, respectively; b_f, b_i, b_c, b_o are the corresponding bias terms; h represents the hidden state, x the input and t the time step; σ denotes the sigmoid function, and "×" denotes point-wise multiplication.
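The gate equations above can be traced with a toy scalar LSTM step (real layers use weight matrices over vector inputs and states; the dictionary-of-weights layout here is purely illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update with scalar input/state, mirroring the gate equations.
    W maps gate name -> (input weight, recurrent weight); b maps gate -> bias."""
    f = sigmoid(W["f"][0] * x_t + W["f"][1] * h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"][0] * x_t + W["i"][1] * h_prev + b["i"])   # input gate
    c_tilde = math.tanh(W["c"][0] * x_t + W["c"][1] * h_prev + b["c"])
    c = f * c_prev + i * c_tilde                                 # new cell state
    o = sigmoid(W["o"][0] * x_t + W["o"][1] * h_prev + b["o"])   # output gate
    h = o * math.tanh(c)                                         # hidden state
    return h, c
```

With all weights and biases at zero, each gate opens halfway (σ(0) = 0.5), so the cell state simply decays by half at each step: a small sanity check on the update rule.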
The Back Propagation Through Time (BPTT) algorithm is used to train the LSTM model (Tepper et al. 2016). First, the outputs of each memory cell in the LSTM are computed in a forward pass, and then the error for each cell is computed in a backward pass. Afterwards, the gradients of each weight matrix and bias are derived from these errors. Finally, the gradients are fed into the Adam optimization algorithm to train every single LSTM of the ensemble model.
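For reference, a single Adam parameter update looks like the following scalar sketch; the learning rate 0.02 echoes the initial rate used later in this study, while the other constants are the usual Adam defaults, not values stated by the paper:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single parameter theta at iteration t (t >= 1)."""
    m = b1 * m + (1 - b1) * grad          # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step with gradient 1, the bias corrections make the update almost exactly one learning-rate unit, which is why Adam's early steps have a predictable magnitude.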

Adaptive Boosting
As an efficient ensemble learning model, Adaptive Boosting (AdaBoost) performs well in time series forecasting (Bai et al. 2021) by combining a set of weak learners into a more powerful learner to improve model performance. Weak learners are trained on the training set, and each subsequent weak learner is trained with a different set of sample weights. The weights are determined by the errors, and boosting reduces both bias and variance in supervised learning on the principle that learners grow sequentially. In the training process, the AdaBoost algorithm repeatedly selects the key features, trains the component weak learners step by step, selects the best weak learners with appropriate thresholds, and finally combines the best weak learners selected in each iteration into a strong learner (Xiao et al. 2019). The construction of AdaBoost is introduced as follows:

Step 1: Input the training dataset (X, Y) = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where X is the historical water demand data, Y is the current observed data, and N is the length of the data. Each x_i is a column vector with d entries, x_i ∈ χ ⊆ R^d.
Step 2: Initialize the weights, D_1 = (w_11, w_12, …, w_1N), with w_1i = 1/N for i = 1, 2, …, N.

Step 3: Repeat the following for m = 1, 2, …, M to obtain M base learners (the base learners are LSTM models in this study):

1. The m-th base learner G_m(x) is obtained by training the data according to the sample weights D_m.

2. Calculate the classification error rate of G_m(x) on the weighted training dataset:

e_m = Σ_{i=1}^{N} w_{mi} I(G_m(x_i) ≠ y_i)

where I(⋅) is the indicator function.
3. Calculate the coefficient of G_m(x) (i.e. the weight of the base learner in the final combination):

α_m = (1/2) ln((1 − e_m)/e_m)

4. Update the weights of the training samples:

w_{m+1,i} = (w_{mi}/Z_m) exp(−α_m y_i G_m(x_i)),   i = 1, 2, …, N

where Z_m is the normalization factor that makes all the elements of D_{m+1} sum to one.

Step 4: Construct the final linear combination of the base learners:

f(x) = Σ_{m=1}^{M} α_m G_m(x)

The final regression model is obtained as G(x) = f(x). According to Formula (9), when the error rate e_m ≤ 0.5, α_m ≥ 0, and α_m increases as e_m decreases; that is, the smaller the classification error rate, the larger the proportion of the base learner in the final combination. This means that AdaBoost adapts to the training error rate of each weak learner.
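One boosting round of the thresholded variant used in this paper can be sketched as follows; treating a sample as "misclassified" when its forecast error exceeds a threshold is the adaptation described in the next section, and the threshold value itself is an assumption, since the paper leaves it unspecified:

```python
import math

def adaboost_round(weights, errors, threshold):
    """One round of thresholded AdaBoost: a sample counts as 'wrong' when its
    forecast error exceeds the threshold. Returns the learner coefficient
    alpha_m and the normalized weights for the next round."""
    wrong = [e > threshold for e in errors]
    e_m = sum(w for w, bad in zip(weights, wrong) if bad)   # weighted error
    e_m = min(max(e_m, 1e-10), 1 - 1e-10)                   # keep log finite
    alpha = 0.5 * math.log((1 - e_m) / e_m)                 # learner weight
    # Misclassified samples are up-weighted, correct ones down-weighted.
    new_w = [w * math.exp(alpha if bad else -alpha)
             for w, bad in zip(weights, wrong)]
    z = sum(new_w)                                          # normalizer Z_m
    return alpha, [w / z for w in new_w]
```

A convenient property of this update: after normalization, the misclassified samples carry exactly half the total weight, so the next learner must attend to them.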

Ensemble AdaBoost-LSTM Deep Learning Model
The AdaBoost algorithm was originally designed for classification. Consequently, to use it for water demand forecasting, it needs to be modified appropriately. In this study, we develop the AdaBoost-LSTM (Ada-LSTM) model by adjusting the sample weights depending on whether a specified error threshold is exceeded, and LSTM deep learning models are used as the weak learners in the Ada-LSTM model. Figure 2 shows the structure of the proposed LSTM-AdaBoost model for SWDF.
Firstly, the decomposed data series T_t and R_t are each divided into two sets (80% training set and 20% testing set). Secondly, the optimal structure of a single LSTM model is selected by analyzing the sensitivities of the main hyper-parameters, including the number of hidden layers, the number of neurons in the hidden layers and the training epochs. In the experiments, the BPTT algorithm is used to optimize the training process of the LSTM networks. Additionally, the value ranges of the hyper-parameters follow existing studies (Bai et al. 2021; Busari and Lim 2021). To reduce the number of parameters of the proposed model and ease the tuning procedure, the values of each hyper-parameter are the same in every LSTM base learner, and only four hyper-parameters need to be adjusted within the reference intervals. The loss function of the LSTM model is the mean squared error. After training the first LSTM model, the results of the first base learners, G_1^T and G_1^S, are automatically imported into the Trend and Seasonal AdaBoost models, respectively. Then the weights of the first LSTM model are updated and passed to the second LSTM model based on the error rate of the previous one. The weights of the samples that were not accurately predicted in the last round increase in the next round to improve the performance on these samples. The forecast results of the second LSTM are likewise automatically imported into the AdaBoost model within the whole ensemble. Following this rule, the training datasets are trained by the LSTM models one by one to continually improve the forecasts of the samples. After multiple iterations, the results of the LSTM models are weighted and combined to obtain the final strong learner according to Formulas (9)-(12).
Different from the forecast procedure of the other two decomposed series, the residual series, owing to its large fluctuations, is ranked and all of its peaks are identified by the hypothesis test on differences of two consecutive slopes introduced by Bramante et al. (2019), while the other historical data are put into the training dataset before forecasting. Finally, the sum of the outputs of the Trend AdaBoost-LSTM, Seasonal AdaBoost-LSTM and Residual AdaBoost-LSTM models is the final forecast.
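The recombination step above is a plain element-wise sum, mirroring the additive decomposition X_t = T_t + S_t + R_t; a minimal sketch:

```python
def combine_forecasts(trend_pred, seasonal_pred, residual_pred):
    """Final STL-Ada-LSTM forecast: element-wise sum of the three
    AdaBoost-LSTM component forecasts."""
    return [t + s + r
            for t, s, r in zip(trend_pred, seasonal_pred, residual_pred)]
```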

Study Area and Data Description
The study area is Yiwu City, located between 119°49′E–120°17′E and 29°02′13″N–29°33′40″N and covering an area of 1,105 km². Yiwu City is the first national water-saving society construction demonstration area in Zhejiang Province, and the largest small commodity distribution center in the world. The annual average water resources in Yiwu amount to 820 million cubic meters. According to the permanent resident population, the annual per-capita water resource occupancy is 1,76 cubic meters. The per-capita water resource occupancy is about 500 cubic meters, which is only 22.7% of the national level. With the rapid development of the city and population growth, this percentage will decline further. In the foreseeable future, water resources in Yiwu will become increasingly tense and scarce. Accurate water demand forecasting is helpful for constructing an intelligent and efficient water supply system for the sustainable development of a water-short industrial city such as Yiwu (Fig. 3).
Daily water demand data were obtained from two water plants in Yiwu City. There are eight water plants in Yiwu City, two of which are selected as representatives in this study. The first water plant (P1), called Chengbei, is located in the center of the city and serves six streets in the main urban area with a population of 640,114. The second water plant (P2), called Fotang, is in the southeast of Yiwu. The observed daily water demand series of P1 and P2 range from March 1st, 2010 to February 28th, 2021 and from March 1st, 2014 to February 28th, 2021, respectively. Table 1 presents the characteristics of the data for the two plants.

Evaluation Criteria
Six criteria, i.e., the mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE), coefficient of determination (R2) and Akaike information criterion (AIC), are calculated to evaluate the forecasting models from different angles. MAE and MAPE are used to evaluate forecasting accuracy. R2 is calculated to evaluate model performance and indicates the agreement between observed and predicted values (Xenochristou et al. 2021). MSE and RMSE magnify values with large deviations, which means that both are sensitive to outliers; therefore, they are generally suitable for evaluating stability (Huang et al. 2021). AIC, a reliable tool for selecting the best among several competing models depending on the number of variables and the output error when the same data set is applied, is calculated to evaluate the complexity of the models. The model with the lowest AIC is deemed the simplest (Salloom et al. 2021). Their formulas are as follows:

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|
MAPE = (100%/N) Σ_{i=1}^{N} |y_i − ŷ_i| / y_i
MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²
RMSE = √MSE
R2 = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)² / Σ_{i=1}^{N} (y_i − ȳ)²
AIC = 2k + N ln(RSS/N)

where y_i represents the i-th observed value; ŷ_i denotes the i-th forecast value; N is the total number of predicted data points; k is the number of the model's variables, which are usually the weights and biases of a neural network model; and RSS represents the residual sum of squares of the model output. When 1 < N/k < 40, AIC needs a bias adjustment.
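The six criteria can be computed directly from the observed and forecast series; the following sketch uses the RSS-based AIC form above (other AIC formulations differ by an additive constant):

```python
import math

def metrics(y, y_hat, k):
    """Compute MAE, MAPE (%), MSE, RMSE, R2 and AIC for observed values y,
    forecasts y_hat and k trainable parameters."""
    n = len(y)
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / n
    mape = 100.0 * sum(abs(a - b) / a for a, b in zip(y, y_hat)) / n
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    rmse = math.sqrt(mse)
    y_mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # RSS
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    aic = 2 * k + n * math.log(ss_res / n)                 # RSS-based AIC
    return mae, mape, mse, rmse, r2, aic
```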

Development of Forecast Models
To simplify the model structure, the base learner has one hidden layer, the time step of the input variables is one for daily forecasting, the gradient threshold is one and the initial learning rate is 0.02 (Bai et al. 2021). Adaptive Moment Estimation (Adam) is used as the optimizer. In the training process, the training data are shuffled at the beginning of every training epoch. Four core hyper-parameters, including the number of nodes in the hidden layer, the batch size, the number of epochs and the number of base learners (M), are selected by sensitivity analysis. In the experiments, the ranges of the four hyper-parameters are 1-20, 1-10, 50-500 and 1-10, respectively. Figure 4 shows the sensitivity analysis results for the MAPE and AIC of the proposed model when varying the four core hyper-parameters to find the best setting. As shown in Fig. 4a, MAPE first rises, then fluctuates and finally decreases dramatically as the number of nodes in the hidden layer varies. MAPE is lowest, with a value of 0.03, when the number of nodes in the hidden layer is 17, and the AIC there is smaller than at other points with similar MAPE. If we give priority to the accuracy of the

Data Preprocessing
According to the 3σ criterion, the outliers of the daily water demand values of the P1 water plant were identified, shown as the red points in Fig. 5a. Then, the outliers were smoothed to fit within the band by the weighted average method based on Eq. (1). A heat map was built to illustrate the monthly and yearly behaviors of the water demand values.

Fig. 4 Sensitivity analysis results of four parameters
According to Fig. 5b, monthly water demand is higher in June, July and August and lower in December, January and February. It is apparent that water demand is higher in summer than in winter. It can also be observed from Fig. 5c that the water demand values are more concentrated in summer and more dispersed in autumn than in the other seasons. Thus, the seasonal characteristics are obvious. The observed data series of the P2 water plant was analyzed by the same procedure. Since the conclusions for P2 are similar to those for P1, the results of P2 are omitted.

Decomposing the Data by STL
The smoothed water demand time series of the P1 and P2 water plants were decomposed by STL, as shown in Fig. 6. The seasonal periodicity of the data series is obvious in both plants, as shown in the seasonal subgraphs. For P1, the seasonal series gradually decreases from summer to winter. This is consistent with the main function of the plant, because residential and industrial water use increases for cooling purposes in the warmer season. As for P2, the water demand fluctuates within a certain range from spring to autumn and drops rapidly in winter, mainly because this plant supplies irrigation water and most crops need less water in winter. The change in the series can be recognized from the trend component shown in Fig. 6. The trend component of P1 decreased at first and then rose back to its former level during 2010 to 2020. In contrast, the trend component of P2 increased continuously in the first two years and then remained stable above 40,000 m³. Last but not least, the residual series of both plants present an irregular distribution.

Forecast Results of the STL-Ada-LSTM Model
By applying the STL-Ada-LSTM model to water demand forecasting, the results for the forecasted decomposed components (trend, seasonal and residual) of the P1 and P2 water plants are shown in Fig. 7. It can be observed that the predicted trend and seasonal components of the two water plants fit almost perfectly owing to their simple change features. The model achieves MAPE values of 0.001%, 0.247% and 0.176% in P1 for the trend, seasonal and residual series, respectively, and 0.002%, 0.520% and 0.347% in P2. The residual components are predicted accurately, and the forecast errors mostly occur at extreme points. All the predicted components are summed to obtain the final daily water demand forecast results.
In order to investigate the performance of the proposed model, nine forecast models were built for comparison. The core idea behind the construction of STL-Ada-LSTM is extracting and simulating the characteristics of the water demand series, especially the substantial seasonal characteristics. However, there are also other ways to extract seasonal features. Autocorrelation regression is suitable for gaining prior knowledge of input variable dynamics, and it has been applied in hydrological studies (Mouatadid et al. 2019). To demonstrate the superiority of STL, the autocorrelation regression method was used to build contrast models (these models are marked "seasonal" in the parentheses of their names). On the other hand, the base learner of STL-Ada-LSTM is the LSTM model, so it is necessary to compare the performance of the proposed model with the LSTM model to show the improvement brought by the AdaBoost algorithm. Huang et al. (2021) combined BP and AdaBoost (Ada-BP) to forecast short-term water demand and achieved promising improvements in both accuracy and stability over the single BPNN and SARIMA models; thus, the Ada-BP model was also used for comparison. Therefore, nine models in total, namely BP(no-seasonal), BP(seasonal), LSTM(no-seasonal), LSTM(seasonal), Ada-BP(no-seasonal), Ada-BP(seasonal), Ada-LSTM(no-seasonal), Ada-LSTM(seasonal) and STL-LSTM, were constructed for comparison. All parameters that also exist in the proposed model are the same as in the STL-Ada-LSTM model.
The forecast results of the nine models are shown in Fig. 8. It is obvious that the proposed model achieves the best accuracy of all. In particular, the proposed model demonstrates outstanding fitting ability at extreme values, and thus offers distinct advantages over traditional SWDF for water demand series with high fluctuation. It can be observed that the LSTM-based models (i.e., the models containing LSTM as a component) perform better than the BP-based models, and that the AdaBoost algorithm enhances the forecasting ability at peaks.

Discussion
The STL decomposition and ensemble deep learning steps enhance the accuracy of the STL-Ada-LSTM model by exploiting their individual strengths. As shown in Fig. 8, the new model achieves the best accuracy among all models. To illustrate the forecast results more clearly, scatter plots between the actual and predicted values of the ten models are shown in Fig. 9. According to Fig. 9, the "seasonal" models do not show significant improvement over the "no-seasonal" models. The STL decomposition-based models are more accurate than those without STL, and they perform better than the models using autocorrelation regression, which proves the rationality of using STL. It is also evident that the accuracies of the LSTM and Ada-LSTM models exceed those of the corresponding BP-based models, consistent with previous findings (Mu et al. 2020). Among all models, the points of STL-Ada-LSTM are closest to the 45° diagonal line, which reveals that the proposed model predicts most accurately. The LSTM component of the proposed model offers high accuracy. The STL decomposition significantly eases the complicated short-term water demand forecasting process by refining the concise characteristics of the water demand data. The AdaBoost ensemble learning model guarantees the stability of the forecast model and enhances the forecasting ability at peaks. Therefore, the proposed STL-Ada-LSTM model achieves the best forecast accuracy among the models constructed in this study.
For further analysis, the performances of the forecast models were evaluated by the aforementioned criteria, and the results are tabulated in Table 2. It can be seen from Table 2 that the proposed model outperforms the other models not only in forecast accuracy but also in the stability and conciseness of the model itself. Compared to the BP-based models and the other LSTM-based models, the STL-Ada-LSTM model improves the MAPE metric by 73.08% to 88.37% for water plant P1. Meanwhile, the MAPE of the P2 forecast results is reduced by about 69.57%–88.52% relative to the other contrast models, except the STL-LSTM model. In particular, the MAPE of STL-LSTM equals that of the proposed model, with a value of 0.01%, for water plant P2. However, the RMSE and AIC of the STL-Ada-LSTM model are significantly lower than those of the STL-LSTM model, indicating that the proposed model improves on STL-LSTM in terms of stability and simplicity. In addition, the proposed model yields the highest R2 coefficient, in the range of 0.99–1.00. To sum up, the forecast performance of the STL-Ada-LSTM model is superior to the other models, indicating that the ensemble deep learning model improves the traditional LSTM model not only in forecast accuracy, but also in stability and simplicity. Therefore, the proposed STL-Ada-LSTM model performed the best among these common models.
To further demonstrate the significance of this study, our model is compared with state-of-the-art models in Table 3. Note that the "Technology" column lists only the dominant model used in each study; other details are omitted. The minimum MAPE, RMSE and AIC values reported in each study were selected for comparison, and the three indexes for our study were set to the means of the P1 and P2 results. As shown in Table 3, the proposed model achieves the best MAPE, with a value of 0.01%. Although the RMSE of the proposed model is larger than that of Chen et al. (2022), our forecast attains a lower MAPE with a relatively small RMSE, balancing the accuracy and stability of the forecast. AIC was not discussed in most studies; although the AIC in our study is larger than that of Salloom et al. (2021), the STL-Ada-LSTM model achieves more accurate results. Overall, the proposed model yields the most accurate results while balancing stability and simplicity. It should be noted that this comparison is limited by differences in dataset size, data stability and other factors.

Conclusions
In this study, a new ensemble deep learning model named STL-Ada-LSTM was proposed for daily water demand forecasting. The main goal of the new design is to maximize the accuracy of SWDF while preserving the conciseness, stability and accessibility of the model, in order to support the construction of intelligent water systems in water-scarce cities.
To enhance performance, the STL decomposition method and the AdaBoost-LSTM model were coupled for SWDF. The historical water demand data were decomposed into three components by STL to extract the concise characteristics of the series. Multiple LSTM models were then integrated by AdaBoost ensemble learning to achieve higher accuracy with stable predictive ability. Moreover, the highly integrated model makes operation easier for users.
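The AdaBoost integration of multiple base learners can be sketched with a minimal AdaBoost.R2-style loop. This is a hedged illustration only: weighted least squares stands in for training each LSTM, the data are synthetic, and the final combination uses a weighted average rather than the weighted median of the original AdaBoost.R2.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy one-step-ahead regression task standing in for the LSTM training data.
X = rng.uniform(0, 1, (200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.05, 200)

def fit_base(X, y, w):
    """Weighted least squares -- a stand-in for training one LSTM."""
    beta, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * X, np.sqrt(w) * y, rcond=None)
    return beta

T = 5                                  # number of base learners
w = np.ones(len(y)) / len(y)           # uniform sample weights to start
models, alphas = [], []
for _ in range(T):
    beta = fit_base(X, y, w)
    err = np.abs(X @ beta - y)
    L = err / err.max()                # normalised loss in [0, 1]
    eps = np.sum(w * L)                # weighted average loss
    if eps >= 0.5:                     # stop if the learner is too weak
        break
    b = eps / (1 - eps)
    w = w * b ** (1 - L)               # shrink weights of well-fit samples
    w /= w.sum()
    models.append(beta)
    alphas.append(np.log(1 / b))       # learner weight in the combination

# Final prediction: confidence-weighted average of the base learners.
pred = sum(a * (X @ m) for a, m in zip(alphas, models)) / sum(alphas)
```

Because later learners concentrate on the samples the earlier ones fit poorly, the ensemble tends to track hard cases such as demand peaks better than any single learner.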
The comparisons between the proposed model and nine other models reveal the benefits of the model structure. The results show that the proposed model predicts most accurately among the compared models, and that applying the AdaBoost ensemble algorithm contributes to both higher accuracy and stable forecasting ability.
Meanwhile, the comparisons with state-of-the-art models show that the proposed model achieves the lowest MAPE (0.01%) with a relatively small RMSE (368.79) and AIC (38887.42). These results indicate that the proposed model not only obtains the most accurate SWDF results among all the listed studies, but also provides stable forecasting ability with a relatively simple model structure.
Consequently, the proposed STL-Ada-LSTM model achieves remarkable accuracy for SWDF while balancing the stability and conciseness of the model. It has great potential for wide use in real-world intelligent water systems of cities with scarce water resources.