Water Quality Prediction in Urban Waterways Based on Wavelet Packet Denoising and LSTM

Abstract: The prediction of water quality in urban rivers plays a crucial role in supporting water environment management. This study collected and organized real-time water quality monitoring data from four water quality monitoring stations in the Fenjiang River Basin of Foshan City, spanning from 2016 to 2021. To reduce noise


Introduction
Water is a crucial resource for the proper functioning of cities. However, the urban water environment is increasingly polluted as a result of rapid socio-economic development, which poses a challenge for sustainable urban development [1][2][3]. [6]. Existing water quality prediction models can be divided into two types: mechanistic and non-mechanistic models [7,8]. Mechanistic models, such as the Streeter-Phelps (S-P) model [9][10][11][12], the QUAL2E model [13][14][15], the Water Quality Analysis Simulation Program (WASP) [15][16][17], and the Soil and Water Assessment Tool (SWAT) [18][19][20], are based on physical and chemical principles and require a large amount of basic and long-term monitoring data. However, such data are often unavailable or incomplete in practice, which limits the applicability and accuracy of mechanistic models. In contrast, non-mechanistic models are data-driven methods that use mathematical statistics and machine learning techniques to capture the dynamic patterns of water quality. These methods do not rely on prior knowledge of the underlying mechanisms, but only on data quality.
With the advancement of machine learning technology, various non-mechanistic methods have been applied to water quality prediction. These include the Support Vector Machine (SVM) [21][22][23], the Backpropagation (BP) neural network [24,25], the Radial Basis Function (RBF) neural network [26,27], and the Long Short-Term Memory (LSTM) neural network [28][29][30][31][32]. This research has, to some extent, enhanced the applicability and accuracy of water quality prediction in urban waterways. However, past studies of non-mechanistic water quality prediction often did not consider the impact of data quality on the prediction model. This limits model accuracy and can even allow serious data errors to accumulate and propagate within the model, resulting in error amplification. In recent years, some scholars have recognized the importance of data quality in non-mechanistic water quality prediction models and have conducted related research [33][34].
Data quality is a key factor for non-mechanistic water quality prediction models, as it directly affects the accuracy and reliability of the models. The original data are usually collected by online monitoring devices, which may be disturbed by various factors, resulting in noise, outliers, missing values and other problems. These problems reduce the integrity, consistency, accuracy and validity of the data, and thus affect the training and prediction performance of the models. Therefore, before non-mechanistic water quality prediction is performed, the raw data need to be preprocessed through data cleaning, data imputation, data smoothing and other steps, in order to improve the data quality, eliminate or reduce the noise and outliers in the data, and enhance the stability and credibility of the data. Data preprocessing can improve not only the accuracy of non-mechanistic water quality prediction models, but also their generalization ability and robustness, enabling them to adapt to different water quality scenarios and conditions. This study aims to build a more accurate water quality prediction model than the traditional LSTM by using a wavelet denoising method to denoise the original online water quality monitoring data, and then applying LSTM neural network technology to establish a single-factor prediction model for the main factors of water quality damage in urban waterways.
The rest of this paper is organized as follows: Section 2 and Section 3 introduce the study area and methodologies, Section 4 and Section 5 present the results and discussion, and Section 6 concludes the work.

Study Area
The Fenjiang River (Foshan Waterway) was selected as the study area (Fig. 1). It is a typical inland river flowing through urban areas in the Pearl River basin. It is located between 112°59′14″E and 113°12′52″E and between 22°58′33″N and 23°9′11″N in the downstream network area of the Pearl River Delta and flows through Chancheng and Nanhai, the two most densely populated and economically developed districts in Foshan.
The river starts at the Shakou sluice of the Tanzhou waterway in the west, runs to the Shawei Bridge in Guicheng, Nanhai District, and enters the Pingzhou waterway, with a length of 25.5 km. The Shakou and Shiken sluices control the upstream water flux, forming a semi-closed water body. The terrain in the basin is flat with a small drop. The sunshine duration is about 1800 h per year. The average annual temperature ranges between 21.2 and 22.2 °C. The frost-free period is more than 350 days. Precipitation is abundant, and the average annual precipitation is between 1600 and 2000 mm. The total basin area of the Fenjiang River is nearly 265 km². The river network in the basin is crisscrossed and scattered. Industry, mining, and commerce are the main human activities in the catchment. The population density and the risk of river contamination are high. According to the research conducted by Pang, J. [35] in 2022, the main factors causing water quality damage in the Fenjiang River in Foshan City are COD and NH3-N. To further study the changes and development trends of these water quality damage factors, four monitoring sections were set up in this study for COD and NH3-N monitoring: the Shakou section, the Renmin Bridge section, the Hengjiao section, and the Sanzhou section.

Wavelet Packet Denoising
The measured data of the four online water quality monitoring stations in the Fenjiang River Basin of Foshan City are often mixed with noise signals due to complex outdoor working conditions, which can have adverse effects on water quality prediction. To solve the problem of noisy signals, Donoho & Johnstone proposed the basic idea of wavelet threshold denoising in 1994 [36].
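A one-dimensional signal contaminated by noise is commonly modeled as shown below. This is a sketch of the standard model used in wavelet threshold denoising, assuming it matches the symbols f(n), ε and e(n) used in this section; the name s(n) for the observed signal and N for the signal length are assumptions introduced here for illustration.

```latex
s(n) = f(n) + \varepsilon \, e(n), \qquad n = 0, 1, \ldots, N-1
```

Under this model, denoising amounts to recovering an estimate of f(n) from the observed s(n).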
In the formula, f(n) is the original signal, ε is the noise deviation coefficient, and e(n) is the noise signal. Compared to wavelet analysis, wavelet packet analysis can provide more precise signal analysis. Wavelet packet analysis divides the time-frequency plane more finely and has better resolution for the high-frequency part of the signal than wavelet analysis. It can adaptively select the optimal wavelet basis function based on the characteristics of the signal, in order to better analyze the signal. Therefore, the denoising method based on wavelet packet thresholding has been deeply studied and widely applied [37]. The process of applying wavelet packet analysis to denoise signals is similar to wavelet denoising and is mainly divided into four steps, as follows.
(1) Wavelet packet decomposition of signals. Select a wavelet packet basis and determine the level of decomposition required, then perform wavelet packet decomposition on the signal.
(2) The selection of optimal wavelet packet basis.
(3) Thresholding of wavelet packet decomposition coefficients. For each wavelet packet decomposition coefficient, select an appropriate threshold to perform threshold quantization on the decomposed coefficients.
(4) Signal reconstruction. Perform wavelet packet reconstruction on low-frequency coefficients and processed high-frequency coefficients.
The first step in using wavelet packets for denoising is to select the wavelet packet basis and the number of decomposition layers. Wavelets with good symmetry do not produce phase distortion, while wavelets with good regularity readily yield smooth reconstructed signals.
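The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the configuration used in this study: it assumes the Haar wavelet, a fixed three-level full packet tree (so step (2), best-basis selection, is skipped), soft thresholding, and the universal threshold with a median-based noise estimate. All function names are hypothetical, and the demo signal is synthetic rather than river data.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the length of x."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_merge(approx, detail):
    """Inverse of haar_split: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def wp_decompose(x, level):
    """Step (1): full packet tree. Unlike the plain wavelet transform,
    both the low- and the high-frequency node are split at every level."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

def wp_reconstruct(nodes):
    """Step (4): merge sibling nodes pairwise until one signal remains."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; smaller ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def wp_denoise(x, level=3):
    """Denoise x with a fixed-depth Haar wavelet packet tree (assumed settings)."""
    n = len(x)
    if n == 0 or n % (2 ** level) != 0:
        raise ValueError("length must be a positive multiple of 2**level")
    nodes = wp_decompose(x, level)
    # Noise scale estimated from the first-level detail coefficients
    # (median absolute value divided by 0.6745).
    _, d1 = haar_split(list(x))
    sigma = statistics.median(abs(d) for d in d1) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(n))  # universal threshold
    # Step (3): keep the lowest-frequency node, threshold all other nodes.
    kept = [nodes[0]] + [soft_threshold(c, thr) for c in nodes[1:]]
    return wp_reconstruct(kept)

if __name__ == "__main__":
    rng = random.Random(0)
    clean = [4.0] * 32 + [6.0] * 32  # synthetic step signal, not monitoring data
    noisy = [v + rng.gauss(0.0, 0.5) for v in clean]
    smooth = wp_denoise(noisy)
    print("mean before/after:", statistics.mean(noisy), statistics.mean(smooth))
    print("std  before/after:", statistics.pstdev(noisy), statistics.pstdev(smooth))
```

Because only the lowest-frequency node is left untouched, this sketch preserves the mean of the series while reducing its standard deviation. A smoother wavelet with good symmetry and regularity would normally be preferred over Haar for real monitoring data.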

LSTM Water Quality Prediction Model
Water quality data are typically organized as time series arranged in chronological order. Recurrent neural networks (RNNs) are a commonly used method for predicting time series data [38]. However, RNNs struggle to learn long-term dependencies and can face issues such as vanishing gradients after multiple cycles. To address these challenges, Hochreiter [39] introduced the Long Short-Term Memory (LSTM) model, which has been proven to have significant advantages in predicting time series data. Compared to RNNs, LSTM models incorporate a dedicated memory unit that retains information from previous iterations. This enables them to overcome the vanishing gradient problem that occurs after multiple cycles, leading to improved prediction accuracy. In the mathematical formulation of the LSTM cell, the subscripts t and t-1 denote the current and previous moments, respectively; x is the input vector of the LSTM cell; f, i and O denote the forget gate, the input gate, and the output gate, respectively; h and C are the hidden state and the cell state, respectively; C̃ is the candidate cell update vector; and W and b are the weight matrices and bias vectors. The internal structure of the LSTM model is illustrated in Fig. 4.
Fig. 4 The computation workflow of LSTM.
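The LSTM cell update described above is usually written as the set of equations below. This is the standard formulation, given as a sketch that is assumed to match the symbol definitions in the text; the gate subscripts on W and b, the sigmoid function σ, the element-wise product ⊙, and the bracket notation for concatenating the previous hidden state with the current input are notational assumptions.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
O_t &= \sigma\left(W_O\,[h_{t-1}, x_t] + b_O\right) \\
h_t &= O_t \odot \tanh\left(C_t\right)
\end{aligned}
```

The additive form of the cell-state update in the fourth line is what allows information, and gradients, to persist across many time steps.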

Water Quality Monitoring Data
In the Fenjiang River, there are four monitoring sections that are sampled six times a day. Based on Pang's research in 2022, it is known that the main pollution factors in the Fenjiang River are COD and NH3-N [35]. Therefore, COD and NH3-N were chosen as the primary water quality factors for this study, and the COD and NH3-N monitoring data from the past 6 years were used to establish the LSTM water quality prediction model. The statistical analysis software SPSS Statistics was used for data analysis, and the data analysis software Origin 2021 was used for producing charts and plots.

Fenjiang River
In this study, we have analyzed the spatiotemporal evolution characteristics of NH3-N and COD in the Fenjiang River. From Fig. 7, it can be seen that although NH3-N and COD are both main factors causing water pollution in the Fenjiang River, their variation over time differs from section to section. The distribution pattern of NH3-N in the sections of the Fenjiang River is as follows: the average content at the Shakou Gate section is the lowest, the content at the Hengjiao section is the highest, and the Renmin Bridge and Sanzhou sections alternate in between. The distribution pattern of COD is as follows: the Renmin Bridge section was the main contributing section of COD pollutants before 2018, the Sanzhou section has been the main contributing section since 2018, and the COD contribution of every section has decreased since 2019.
The spatial layout of the sections shows that the Shakou Gate section is located upstream on the Fenjiang River, where the water quality is less affected by the economic and social development of the city. The Renmin Bridge and Hengjiao sections, on the other hand, are middle sections of the Fenjiang River that receive pollution discharge from urban waterways as well as collected production and domestic sewage, and therefore their water quality is poor. The Sanzhou section is located in the lower reaches of the Fenjiang River, near the intersection with other waterways. There are two reasons why its water quality is better than that of the middle section. Firstly, the pollutants are diluted by mixing with other water sources downstream. Secondly, the pollutants generated by economic production and social life are mainly concentrated in the middle reaches of the Fenjiang River. The self-purification ability of the water body is also a factor in the reduction of downstream pollutant content.

LSTM Model Based on Wavelet Denoising
This section discusses the predicted results of NH3-N and COD for the four sections of the Fenjiang River. The analysis of the prediction results considers two aspects: the changes in the time series before and after wavelet denoising, and the comparison of the RMSE of the LSTM model predictions before and after wavelet denoising.

Analysis of NH3-N Prediction Results
Comparing the mean and standard deviation of the NH3-N sequences before and after wavelet denoising (Table 2, Fig. 8, and Fig. 9) shows that the mean of each sequence remains basically unchanged after denoising, while the standard deviation decreases, making the sequence smoother.
The LSTM model based on wavelet denoising was used to perform 100 multi-step predictions on the NH3-N data of each section. As shown in Fig. 10, for a prediction time of 12 hours, the RMSE of the Shakou Gate section changed from 0.078 before denoising to 0.026 after denoising, that of the Renmin Bridge section from 0.521 to 0.228, that of the Hengjiao section from 0.415 to 0.185, and that of the Sanzhou section from 0.336 to 0.118. The NH3-N prediction after denoising is therefore better than before.
The LSTM model based on wavelet denoising was also used to perform 100 multi-step predictions on the NH3-N data of each section for a prediction time of 3 days. As shown in Fig. 11, the RMSE of the Shakou Gate section changed from 0.005 before denoising to 0.003 after denoising, that of the Renmin Bridge section from 0.030 to 0.008, that of the Hengjiao section from 0.020 to 0.007, and that of the Sanzhou section from 0.029 to 0.005. Again, the NH3-N prediction after denoising is better than before.
Fig. 11 Comparison of RMSE changes of 100 predictions before and after wavelet denoising of NH3-N data from the four sections of the Fenjiang River (3d).
Comparing the RMSE of NH3-N in the four sections of the Fenjiang River shows that, for both the 12-hour and the 3-day prediction time, the RMSE after wavelet denoising was smaller than before, indicating that wavelet denoising can improve the accuracy of the model to a certain extent.
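The size of the improvement can be checked directly from the reported NH3-N values. The sketch below defines RMSE in the usual way and computes the relative RMSE reduction per section; the function names and the dictionary layout are hypothetical, and only the RMSE figures themselves are taken from the text.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def reduction_pct(before, after):
    """Relative RMSE reduction in percent."""
    return 100.0 * (before - after) / before

# Reported NH3-N RMSE values as (before denoising, after denoising).
NH3N_RMSE = {
    "12h": {
        "Shakou Gate": (0.078, 0.026),
        "Renmin Bridge": (0.521, 0.228),
        "Hengjiao": (0.415, 0.185),
        "Sanzhou": (0.336, 0.118),
    },
    "3d": {
        "Shakou Gate": (0.005, 0.003),
        "Renmin Bridge": (0.030, 0.008),
        "Hengjiao": (0.020, 0.007),
        "Sanzhou": (0.029, 0.005),
    },
}

def summarize(pairs):
    """Return per-section reductions and their mean for one horizon."""
    reductions = {name: reduction_pct(b, a) for name, (b, a) in pairs.items()}
    mean = sum(reductions.values()) / len(reductions)
    return reductions, mean

if __name__ == "__main__":
    for horizon, pairs in NH3N_RMSE.items():
        reductions, mean = summarize(pairs)
        for name, value in reductions.items():
            print(f"{horizon} {name}: {value:.1f}% lower RMSE")
        print(f"{horizon} mean reduction: {mean:.1f}%")
```

The resulting ranges and means agree with the NH3-N reductions summarized in the conclusion of this paper.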

Analysis of COD Prediction Results
Comparing the mean and standard deviation of the COD sequences before and after wavelet denoising (Table 3, Fig. 12, and Fig. 13) shows that the mean of each sequence remains basically unchanged after denoising, while the standard deviation decreases, making the sequence smoother.
Fig. 12 Comparison of COD sequences from the four sections of the Fenjiang River before and after wavelet denoising (12h).
Fig. 13 Comparison of COD sequences from the four sections of the Fenjiang River before and after wavelet denoising (3d).
The LSTM model based on wavelet denoising was also used to perform 100 multi-step predictions on the COD data of each section. As shown in Fig. 14, for a prediction time of 12 hours, the RMSE of the Shakou Gate section changed from 4.343 before denoising to 3.570 after denoising, that of the Renmin Bridge section from 0.964 to 0.684, that of the Hengjiao section from 2.607 to 1.289, and that of the Sanzhou section from 2.242 to 1.796. The COD prediction after denoising is therefore better than before.
The same procedure was repeated for a prediction time of 3 days. As shown in Fig. 15, the RMSE of the Shakou Gate section changed from 0.291 before denoising to 0.091 after denoising, that of the Renmin Bridge section from 0.178 to 0.080, that of the Hengjiao section from 0.163 to 0.055, and that of the Sanzhou section from 0.315 to 0.158. Again, the COD prediction after denoising is better than before.
Comparing the RMSE of COD in the four sections of the Fenjiang River shows that, for both the 12-hour and the 3-day prediction time, the RMSE after wavelet denoising was lower than before, indicating that wavelet denoising can improve the accuracy of the model to a certain extent.
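As a worked check of the COD figures, the relative RMSE reduction can be written as below. The symbol Δ and the subscripts "before" and "after" are notation introduced here for illustration; the numerical values are the reported ones.

```latex
\Delta = \frac{\mathrm{RMSE}_{\mathrm{before}} - \mathrm{RMSE}_{\mathrm{after}}}{\mathrm{RMSE}_{\mathrm{before}}} \times 100\%

\text{12 h, Shakou Gate:}\quad \Delta = \frac{4.343 - 3.570}{4.343} \times 100\% \approx 17.8\%

\text{12 h, Hengjiao:}\quad \Delta = \frac{2.607 - 1.289}{2.607} \times 100\% \approx 50.6\%

\text{3 d, Shakou Gate:}\quad \Delta = \frac{0.291 - 0.091}{0.291} \times 100\% \approx 68.7\%

\text{3 d, Sanzhou:}\quad \Delta = \frac{0.315 - 0.158}{0.315} \times 100\% \approx 49.8\%
```

These four cases are the smallest and largest COD reductions at each prediction time, so they bound the per-section improvement for COD.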

Pattern of changes in the water quality in the Fenjiang River in recent years
An analysis of the main pollution factors, ammonia nitrogen (NH3-N) and chemical oxygen demand (COD), in the four sections of the Fenjiang River from 2016 to 2021 shows the following:
(2) Spatially: From the perspective of NH3-N content, the average content at the Shakou Gate section is the lowest, while the content at the Hengjiao section is the highest. The Renmin Bridge and Sanzhou sections alternate in between. From the perspective of COD content, the Renmin Bridge section was the main contributing section of COD pollutants in the Fenjiang River before 2018. Since 2018, the Sanzhou section has become the main contributing section, but after 2019 the COD contribution of every section declined.
From the analysis of the temporal and spatial dimensions, it can be found that the water quality in the upper reaches of the Fenjiang River continues to improve, the water quality in the middle and upper reaches is improving in step with it, and the water quality in the middle and lower reaches has shown relapses. Scholars have pointed out that the government's watershed management projects, industrial structure adjustment and transfer, and comprehensive river water environment management measures have all brought positive changes to the water quality of the middle reaches of the Fenjiang River (near the Renmin Bridge section) [40]. A survey found that the majority of local residents and shop owners believe that the river water is clearer and better than before, but still hope that the water quality can be further improved. Some respondents also hope the river will return to the drinkable and swimmable water quality of the "old Foshan" era. The survey also concluded that the government's closure and relocation of upstream polluting enterprises, establishment of sewage treatment plants, upstream sewage interception, construction of artificial floating islands, and other measures have produced beneficial results in improving the water quality of the Fenjiang River [41]. The research in this article confirms the conclusions of these scholars on the changes in water quality of the Fenjiang River, and it also shows that there is still significant room for improvement in water quality in the middle and lower reaches of the Fenjiang River basin.

Improvement of COD and NH3-N prediction in Fenjiang River using wavelet packet denoising
Wavelet analysis has been widely applied in noise filtering, data compression and signal analysis [42]. In terms of water quality prediction, Lin et al. successfully predicted the occurrence and magnitude of blooms by combining wavelet analysis with LSTM [43]. Research on urban river water quality prediction based on wavelet packet denoising is still at an exploratory stage. This article is an empirical study that attempts to combine wavelet packet denoising with LSTM to predict river water quality. In summary, the results indicate that the LSTM water quality prediction model based on wavelet packet denoising can improve the prediction of COD and NH3-N in the Fenjiang River.

WPD-LSTM's Role in Fenjiang River Water Quality Improvement
The WPD-LSTM model can provide more accurate water quality predictions, which helps in understanding the patterns and development trends of water quality changes. It can help water environment departments issue water quality warning information, take the initiative in decisions on drinking water safety and water pollution prevention and control, and improve work efficiency. In addition, the model could in future be applied to other sections of the Fenjiang River and even to other rivers, supporting water quality prediction, management, and decision-making for rivers and basins.

Conclusion
This study collected and organized over 52,560 water quality sampling data from four monitoring sections of the Fenjiang River from 2016 to 2021. Through statistical analysis of the spatiotemporal evolution of COD and NH3-N in the Fenjiang River Basin, it was found that from the perspective of spatial layout, the upstream of the

Methodology and Data
This study analyzed the water quality monitoring data from four online monitoring stations along the Fenjiang River in Foshan City between 2016 and 2021. The data encompassed eight parameters: pH, conductivity, water temperature, flow rate, turbidity, NH3-N, COD, and DO. Previous research identified NH3-N and COD as the main factors affecting water quality in the river. To improve data accuracy, the original NH3-N and COD monitoring data were denoised using wavelet packet techniques. Based on the denoised data, an LSTM-based prediction model was established to forecast NH3-N and COD levels. The model was evaluated for 12-hour and 72-hour prediction horizons for both COD and NH3-N.

Fig. 2 A summary of the analysis workflow.

e(n) is the noise signal. Due to the influence of various noise signals, the extraction of the signal f(n) requires the use of mathematical methods for noise reduction. Wavelet analysis can effectively resolve the time-frequency characteristics of signals, so it is often chosen for on-site signal processing. Wavelet analysis further decomposes the low-frequency part of the signal and no longer decomposes the high-frequency part. Therefore, the wavelet transform can effectively represent signals whose main component is low-frequency information, as shown in the following figure:
Each section was sampled six times a day. Over the period from 2016 to 2021, a total of 52,560 samples were collected. The monitoring indicators include both physical indicators (water temperature, flow rate, conductivity, turbidity) and chemical parameters (pH value, NH3-N, COD, and DO).
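The sampling scheme determines how the monitoring series can be framed for multi-step prediction. The sketch below is illustrative only: it assumes an evenly spaced 4-hour interval (six samples a day), so that a 12-hour horizon corresponds to 3 steps and a 3-day horizon to 18 steps, and it assumes 365-day years when reproducing the sample count. The look-back length and all function names are hypothetical and are not taken from this study.

```python
def expected_sample_count(stations=4, samples_per_day=6, years=6, days_per_year=365):
    """Nominal number of samples for the monitoring period (assumes 365-day years)."""
    return stations * samples_per_day * years * days_per_year

def horizon_steps(horizon_hours, samples_per_day=6):
    """Convert a prediction horizon in hours into a number of sampling steps."""
    interval_hours = 24 / samples_per_day
    steps = horizon_hours / interval_hours
    if steps != int(steps) or steps < 1:
        raise ValueError("horizon must be a positive multiple of the sampling interval")
    return int(steps)

def make_windows(series, lookback, horizon):
    """Slide over the series and pair each block of `lookback` past values
    with the `horizon` values that immediately follow it."""
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + lookback]
        targets = series[start + lookback:start + lookback + horizon]
        samples.append((inputs, targets))
    return samples

if __name__ == "__main__":
    print("nominal sample count:", expected_sample_count())
    print("steps for 12 h:", horizon_steps(12))
    print("steps for 72 h:", horizon_steps(72))
    toy_series = [float(v) for v in range(10)]  # placeholder values, not river data
    for inputs, targets in make_windows(toy_series, lookback=4, horizon=horizon_steps(12)):
        print(inputs, "->", targets)
```

Each (inputs, targets) pair is one training sample for a single-factor model, so the same framing can be applied separately to the NH3-N and COD series of each section.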
Fig. 5 Characteristics of NH3-N interannual changes in Fenjiang River

Fig. 8 Comparison of NH3-N sequences from the four sections of the Fenjiang River before and after wavelet denoising (12h).

Fig. 10 Comparison of RMSE changes of 100 predictions before and after wavelet denoising of NH3-N data from the four sections of the Fenjiang River (12h).

Fig. 14 Comparison of RMSE changes of 100 predictions before and after wavelet denoising of COD data from the four sections of the Fenjiang River (12h).

Fig. 15 Comparison of RMSE changes of 100 predictions before and after wavelet denoising of COD data from the four sections of the Fenjiang River (3d).

(1) Temporally: From the perspective of NH3-N content, NH3-N at the Shakou Gate, Renmin Bridge, and Sanzhou sections has shown a downward trend since 2018, while the Hengjiao section showed an upward trend from 2016 to 2020, with a slight decrease in 2021. From the perspective of COD content, the Shakou Gate section has been showing a downward trend since 2018, while the Renmin Bridge and Hengjiao sections have been showing a downward trend. The Sanzhou section showed an upward trend from 2016 to 2019, and decreased slightly in 2020 and 2021.
This paper is based on continuous monitoring data from 2016 to 2021, and predicts COD and NH3-N in four sections of the Fenjiang River by constructing an LSTM water quality prediction model. The model was trained and validated on the monitoring data, and the RMSE between the simulated results and the measured data was found to be small for both COD and NH3-N, which indicates that the model can predict COD and NH3-N in the Fenjiang River. Afterwards, wavelet packet denoising was performed on the monitoring data, and the above process was repeated. The COD and NH3-N predicted by the LSTM model based on wavelet packet denoising were compared with the predictions made without denoising. With wavelet packet denoising, the RMSE of COD and NH3-N decreased for both the 12-hour and the 3-day prediction time, and the model accuracy improved.

Fenjiang River was less affected by economic and social development, and the water quality was better; the middle section flows through densely populated urban built-up areas, and the water quality declined; the downstream is close to the intersection with other waterways, and the water quality rebounded. In terms of temporal distribution, the water quality of the Fenjiang River has gradually improved since 2018. This study established a long short-term memory (LSTM) neural network water quality prediction model based on wavelet denoising, and performed single-factor water quality prediction for the main factors of water quality damage in the Fenjiang River, COD and NH3-N. Combining wavelet packet denoising with LSTM achieved better prediction performance than the traditional LSTM model. The model improved the accuracy of single-factor water quality prediction for prediction periods of 12 hours and 3 days. In the 12-hour prediction, the RMSE values of NH3-N predictions in the four monitoring sections decreased by 55% to 67%, with an average decrease of 61%; the RMSE values of COD predictions in the four monitoring sections decreased by 18% to 51%, with an average decrease of 29%. In the 3-day prediction, the RMSE values of NH3-N predictions in the four monitoring sections decreased by 40% to 83%, with an average decrease of 65%; the RMSE values of COD predictions in the four monitoring sections decreased by 50% to 69%, with an average decrease of 60%. The experimental data also show that the model with a prediction time of 3 days has higher fitting performance and prediction accuracy than the 12-hour prediction model.
Acknowledgements This research was funded by Guangdong Basic and Applied Basic

Table 2 Comparison of means and standard deviations of NH3-N sequences before and after wavelet denoising in the four sections of the Fenjiang River.

Table 3 Comparison of means and standard deviations of COD sequences before and after wavelet denoising in the four sections of the Fenjiang River.