COD is an important indicator of the presence of organic pollutants in the water environment: a higher COD value indicates more severe organic pollution. Organic pollutants are often transferred to water bodies through atmospheric dry and wet deposition, agricultural runoff and wastewater discharge (Arfaeinia et al. 2017). When these pollutants are deposited from the atmosphere and enter water bodies, they may bioaccumulate, posing potential toxicity to both biota and humans (Agrell et al. 2002). Large amounts of organic substances have been found in the aqueous phase or particle extracts during rainstorm sampling, and the scope of pollution expands as they migrate with the runoff (Glaser et al. 2023). A study of organic compounds in 38 rivers of the United States showed that the 10 most widespread anthropogenic organic compounds are strongly associated with agricultural runoff (Bradley et al. 2017). Organic pollutants in the Yongding River basin were analyzed by two-dimensional gas chromatography combined with time-of-flight mass spectrometry (GC × GC-TOF-MS). The types of organic pollutants in downstream rivers were found to be relatively similar to those in the effluent of sewage treatment plants (Jiang et al. 2023), which showed that wastewater treatment plant effluents are a non-negligible source of organic pollutants in surface runoff. A survey of 46 wastewater treatment plant effluents in China found 302 trace organic pollutants, accounting for 59 percent of the total organic matter (Liu et al. 2024). Organic pollutants pose a serious threat to the environment because their toxic effects persist over prolonged periods (Kumar et al. 2022).
Organic pollutants can suffocate aquatic organisms by disrupting the photosynthesis of aquatic plants, and they significantly reduce water quality, which can indirectly affect human health (Muralikrishna and Manickam 2017). Mussels farmed along the coast of the Gulf of Naples and Domitio have been found to pose a high food safety risk in terms of heavy metals and persistent organic pollutants (POPs) (Esposito et al. 2020). POPs pose a greater risk in these environments because they are difficult to break down and readily accumulate in the air, water, soil, and fat cells of animals (Shetty et al. 2023; Zaynab et al. 2021; Lan et al. 2024; Mallah et al. 2022; Yua et al. 2018). Because some organic pollutants are toxic to human health, their characteristics and toxicity have become a focus of global attention (Li et al. 2017). Researchers focus not only on individual organic pollutants but also on the measurement of total organic content, and measuring and analyzing COD content has become a hot research topic. In China, a great deal of work has been carried out to measure, analyze and control COD, but many challenges remain (Li et al. 2012). Table 1 shows the status of COD emission and control in China.
Table 1
Status of COD emission and control in China
Items | China |
Formulation | Mainly national standards, with a small number of local standards |
Implementation | 1) National standards specify uniform limits for similar dischargers. 2) Control items are mainly under centralized control, and total quantity control is comprehensively implemented. 3) Local standards outperform national standards in execution. |
Control level | 1) The COD treatment rate exceeds 90%, higher than the treatment rate of ammonia nitrogen. 2) The monitoring and supervision system is largely established, but the development of the shared system is uneven. |
Limits of major pollutants | 1) National standard: Standard A: 50 mg/L; Standard B: 60 mg/L. 2) Sichuan Province standard: urban sewage treatment plants: 30 mg/L. |
Continuous determination of COD over long time series provides much of the data needed to analyze and predict organic matter in a watershed. To this end, methods for COD measurement have been developed continuously, and regional, routine and continuous monitoring has gradually been realized. Traditional methods for COD measurement include the potassium dichromate method, the potassium permanganate method, and ultraviolet-visible (UV-vis) spectroscopy. UV-vis spectroscopy combined with chemometric tools has been used to determine COD content; this method is particularly suitable for real-time and rapid determinations, but a large number of samples is required for modeling (Li et al. 2018). Carbon dot fluorescent capillary sensors are also used to determine COD. This method offers a low detection limit, a good linear range, reliability, portability and low cost, but it is easily limited by interference and environmental effects (Zhang et al. 2023). A nano-lead dioxide composite electrochemical sensor has been used for COD determination, offering fast response times, simple instrumentation, low cost, high detection sensitivity and a wide linear range. However, sampling limitations and sensor stability can pose challenges (Wang et al. 2022). COD monitoring has since advanced to automatic monitoring. A chemiluminescence (CL) system with flow injection was applied to COD determination, with the clear advantages of much shorter analysis time and simple operation, although its detection rate is lower (Li et al. 2003). A new reagent-free method for measuring COD was proposed based on ultraviolet absorption spectroscopy (UV-AS), which enables high-precision and long-range COD measurement by automatically selecting the wavelength and analyzing the full spectrum of the data. This method can therefore be used for in situ and online environmental monitoring (Wang et al. 2019).
The development of automated monitoring has significantly improved the efficiency and frequency of COD monitoring. It enables the simultaneous determination of COD over a wide area and provides technical and methodological support for regional COD monitoring research.
To better understand the changing patterns and drivers of organic matter in water bodies, scientists have established a variety of models to analyze and predict COD. These models take actual monitoring data as input and produce predicted COD values, so that the dynamics of COD in water bodies can be studied and the flow patterns and long-term series of organic matter can be analyzed and managed. The removal of COD from polluted solutions has been simulated and predicted using artificial neural networks (ANNs) (Elmolla et al. 2010; Masouleh et al. 2022). One main advantage of ANNs is their ability to handle complex relationships between input and output variables (Ataei et al. 2021). However, ANN techniques require large datasets for training and are computationally expensive (Khanmohammadi et al. 2024). Compared with ANNs, recurrent neural networks (RNNs) achieve faster convergence, higher convergence accuracy, and better pattern recognition (Al-Qaili et al. 2024; Gholami et al. 2023), and they are more suitable for time series prediction. However, RNNs suffer from the long-term dependence problem: gradients vanish or explode when an RNN learns long sequences, which makes it difficult to capture nonlinear relationships over long periods of time (Hochreiter 1991). Some researchers have therefore investigated convolutional neural networks (CNNs) for efficient feature extraction. A correlation between COD concentration and spectral reflectance in urban rivers was found, and COD concentration was accurately predicted using a one-dimensional convolutional neural network (1D-CNN) (Cai et al. 2022). The Long Short-Term Memory (LSTM) network, introduced by Hochreiter and Schmidhuber (1997), is an extension of the RNN. LSTMs can learn long-term dependencies while mitigating the exploding and vanishing gradient problem that affects traditional RNNs (Xu et al. 2020).
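The long-term dependence problem of plain RNNs described above can be illustrated numerically. The following is a minimal sketch that is not taken from any of the cited works: it assumes a hypothetical scalar recurrence h_t = tanh(w * h_{t-1}) with an illustrative weight w, and tracks how the gradient of the final state with respect to the initial state shrinks as the sequence grows longer.

```python
import numpy as np

def gradient_through_time(w, h0, steps):
    """Return |d h_T / d h_0| for the scalar recurrence h_t = tanh(w * h_{t-1}).

    w, h0 and steps are illustrative values, not parameters from the paper.
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        h = np.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh^2(w * h_prev))
        grad *= w * (1.0 - h ** 2)
    return float(abs(grad))

if __name__ == "__main__":
    for steps in (5, 20, 50):
        print(steps, gradient_through_time(w=0.5, h0=0.1, steps=steps))
```

Because every factor in the product has magnitude at most |w|, a weight with |w| < 1 forces the gradient toward zero geometrically with sequence length, which is the vanishing-gradient behavior that gated architectures such as the LSTM were designed to mitigate.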
A novel LSTM-based soft sensor was developed for monitoring and predicting COD in wastewater treatment (Xu et al. 2023). The LSTM method can mine the latent information among different water quality indicators at different time scales, improving prediction accuracy. Major pollutants such as biological oxygen demand (BOD), chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), and ammonia nitrogen (AN) were predicted using an integrated multivariate LSTM network and were shown to be highly correlated with each other (Wang, Xue et al. 2024). Although LSTMs can effectively process and predict events with long intervals and delays (Yousfi et al. 2017), they require a large number of parameters and converge slowly. For this reason, the gated recurrent unit (GRU) was proposed (Cho et al. 2014) to simplify the internal cell architecture of the LSTM and reduce network training time while maintaining prediction accuracy. In a study combining machine learning with sensor networks, multiple machine learning algorithms were employed to predict COD emissions, and the comparison showed that GRU performed better than LSTM (Miao et al. 2021).
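The difference in parameter count between the two cell types can be made concrete with a short calculation. The sketch below assumes the textbook formulations, in which an LSTM layer has four gate-like blocks (input, forget, output, candidate) and a GRU layer has three (update, reset, candidate), each with an input weight matrix, a recurrent weight matrix and one bias vector. The function names and the example sizes are hypothetical, and some software frameworks store additional bias vectors, so their reported counts may differ slightly.

```python
def gated_layer_params(input_size, hidden_size, n_blocks):
    """Parameters of one recurrent layer with n_blocks gate-like blocks.

    Each block has an input matrix (hidden x input), a recurrent matrix
    (hidden x hidden) and a single bias vector (hidden).
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block

def lstm_param_count(input_size, hidden_size):
    # Four blocks: input gate, forget gate, output gate, candidate cell state.
    return gated_layer_params(input_size, hidden_size, n_blocks=4)

def gru_param_count(input_size, hidden_size):
    # Three blocks: update gate, reset gate, candidate hidden state.
    return gated_layer_params(input_size, hidden_size, n_blocks=3)

if __name__ == "__main__":
    # Illustrative sizes: a univariate series and 32 hidden units.
    print("LSTM:", lstm_param_count(1, 32))
    print("GRU: ", gru_param_count(1, 32))
```

Under these assumptions a GRU layer always carries three quarters of the parameters of an LSTM layer of the same size, which is consistent with the shorter training time noted above.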
Prediction results are influenced by the characteristics of the model itself, so improving model performance has become a necessary research topic. Time-frequency domain transformation techniques have been combined with some models to enhance simulation performance. By frequency division, a complex signal can be decomposed into a series of intrinsic mode functions (IMFs) with different frequencies and amplitudes. The complexity and strong nonlinearity of a time series can thus be effectively reduced, yielding relatively stable subsequences at several different frequency scales. This approach extracts the main data components and represents them as subsequences with different frequency features, thereby improving the simulation performance of some intelligent models. Variational Mode Decomposition (VMD), an adaptive and fully non-recursive signal processing method, was proposed in 2014 (Dragomiretskiy and Zosso 2014), with the advantage that the number of modes can be determined. VMD was used to decompose river flow into IMFs, and the hybrid models RVFL_VMD, GRNN_VMD and RBFNN_VMD were established to predict river turbidity. RVFL_VMD achieved the best performance on the hourly time scale, while GRNN_VMD provided the best prediction on the daily time scale (Heddam et al. 2022). VMD can also improve the accuracy of power consumption forecasts: the combination of Bi-directional Gated Recurrent Unit (BiGRU) and LSTM models with VMD showed excellent prediction accuracy on various assessment metrics, especially for short- to medium-term predictions (Ahmed et al. 2023). The VMD method can also be used to study the periodicity of vegetation and its relationship with climate (Wang et al. 2023).
A new wind power forecasting method called IVMD, which combines VMD and HFCM, was developed to achieve more accurate wind power generation prediction; it reduces the prediction error by extracting time series features and learning the weights with Bayesian ridge regression (Qiao et al. 2022). The spectral characteristics obtained by VMD can be used to sort iron ore in hyperspectral images (Nie et al. 2023).
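The idea of splitting a series into subsequences with different frequency content can be sketched in a few lines. The example below is deliberately not VMD: it uses fixed FFT band masks as a simplified stand-in, whereas VMD estimates the center frequency and bandwidth of each mode adaptively. The function name, the number of bands and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def fft_band_split(x, n_bands):
    """Split a 1-D series into n_bands band-limited components that sum to x.

    Simplified stand-in for VMD: the band edges are fixed and evenly spaced,
    not optimized as in variational mode decomposition.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    # Partition the rfft bins into contiguous, non-overlapping bands.
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    components = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        components.append(np.fft.irfft(masked, n=x.size))
    return np.array(components)

if __name__ == "__main__":
    t = np.arange(256)
    # Synthetic series: one slow and one fast oscillation.
    series = np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 8)
    parts = fft_band_split(series, n_bands=4)
    print(parts.shape, np.allclose(parts.sum(axis=0), series))
```

Because the masks cover every frequency bin exactly once, the components add back to the original series, so a forecast can be built per component and then summed.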
In this paper, the rivers in the Chengdu area of China were selected as the research focus, with COD as the primary research index. The frequency division technique of VMD was applied to process the raw COD data. Moreover, three models (Random Forest, LSTM and GRU) were integrated with VMD to develop an optimized model framework. The research aimed to find a new method for the prediction and analysis of organic matter in the Chengdu watershed.
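The general shape of such a decompose, predict and recombine framework can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in this study: a least-squares autoregressive model stands in for Random Forest, LSTM or GRU, the components are synthetic sinusoids in place of VMD modes of COD data, and all function names and the lag length are hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds n_lags past values, y is the next value."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def fit_ar(series, n_lags):
    """Least-squares autoregressive fit (stand-in for RF, LSTM or GRU)."""
    X, y = make_lagged(series, n_lags)
    X = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(series, coef):
    """One-step-ahead prediction from the last n_lags values of the series."""
    n_lags = len(coef) - 1
    return float(np.dot(series[-n_lags:], coef[:-1]) + coef[-1])

def hybrid_one_step(components, n_lags=4):
    """Fit one model per component and sum the per-component forecasts."""
    return sum(predict_next(c, fit_ar(c, n_lags)) for c in components)

if __name__ == "__main__":
    t = np.arange(200)
    # Synthetic components standing in for decomposed modes.
    comp_a = np.sin(2 * np.pi * t / 50)
    comp_b = 0.3 * np.sin(2 * np.pi * t / 7)
    forecast = hybrid_one_step([comp_a[:-1], comp_b[:-1]])
    print("forecast:", forecast, "actual:", comp_a[-1] + comp_b[-1])
```

Fitting each component separately lets every model deal with a narrower frequency range than the raw series, which is the motivation given above for pairing a decomposition step with the predictors.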