## 2.1 Theoretical frameworks of proposed models

## 2.1.1 Multivariate adaptive regression spline (MARS)

The multivariate adaptive regression spline (MARS), a non-parametric and nonlinear regression technique introduced by Friedman (1991), was utilised in this investigation. MARS fits the data with numerous spline segments joined at knots (Friedman, 1991). The MARS model makes no assumption about the underlying functional link between inputs and outputs. Within each spline segment, the data are represented by basis functions (BFs); a BF can be expressed as a single equation between two knots. Two adjacent data domains meet at a knot, where the output remains continuous. The fitting procedure is an adaptive regression algorithm (Heddam and Kisi, 2018). The MARS model thus depicts the relationship between the input and output variables as a piecewise collection of line segments. Over-fitting of the training data is avoided by setting a predefined minimum number of observations between knots (Heddam and Kisi, 2018).

Let y be the target output, and let x = (\({x}_{1},\dots ,{x}_{n}\)) be a vector of n input variables. The data are then presumed to be generated by an unknown “true” model. For a continuous response, this takes the form:

$$y=f({x}_{1},\dots .,{x}_{n})+\xi =f\left(x\right)+\xi$$

1

In which \(\xi\) is the model error and n is the number of input variables. MARS approximates f(.) by adding a sufficient number of piecewise-linear BFs of the form max(*0, x-t*), where a knot exists at position *t* (Zhang and Goh 2013). The max(.) operator implies that only the positive part of the argument is used; otherwise, a value of zero is returned:

$$\max \left(0, x-t\right)=\left\{\begin{array}{ll}x-t, & \text{if}\ x\ge t\\ 0, & \text{otherwise}\end{array}\right. \hspace{2em} (2)$$
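The hinge function of Eq. (2) can be written directly in Python; the function name `hinge` is illustrative, not part of any library:

```python
def hinge(x, t):
    """Piecewise-linear MARS basis function max(0, x - t) with a knot at t."""
    return x - t if x >= t else 0.0

# The basis is zero below the knot and grows linearly above it.
print(hinge(3.0, 1.0))  # 2.0
print(hinge(0.5, 1.0))  # 0.0
```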

Thus, *f* (x) is constructed as a linear BF(x) combination:

$$f\left(x\right) = {\beta }_{0}+\sum _{i=1}^{n}{\beta }_{i}{BF}_{i}\left(x\right)$$

3

The coefficients \(\beta\) are constants estimated by least squares. *f* (x) is built in a forward-backward stepwise process that determines the knot positions where the character of the fit changes (Deo et al., 2017b). At the end of the forward stage, a deliberately large model is built that over-fits the training data. The model is then pruned, one basis function at a time, according to the generalised cross-validation (GCV) criterion. For training data with n observations, the GCV of a model is computed as:

$$\text{GCV} =\frac{\frac{1}{n}{\sum }_{i=1}^{n}{\left[{y}_{i}-f\left({x}_{i}\right)\right]}^{2}}{{\left[1-\frac{M+d\times (M-1)/2}{n}\right]}^{2}}$$

4

where M is the number of basis functions, d is the penalising parameter, n is the number of observations, and *f* \({(x}_{i})\) denotes the MARS model’s fitted values.
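Eq. (4) translates directly into Python; `gcv` below is an illustrative helper, not a library function:

```python
def gcv(y, y_hat, M, d):
    """Generalised cross-validation score of Eq. (4).

    y, y_hat : observed and fitted values
    M        : number of basis functions
    d        : penalising parameter
    """
    n = len(y)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    penalty = (1 - (M + d * (M - 1) / 2) / n) ** 2
    return mse / penalty

print(gcv([1, 2, 3, 4], [1, 2, 3, 4], M=2, d=2))  # 0.0 for a perfect fit
```

During pruning, the basis function whose removal yields the lowest GCV is deleted at each step.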

MARS is a flexible, non-parametric regression modelling technique that makes no assumptions about the relationships between the variables (Stull et al., 2014). The model is simple to understand and interpret (Kuhn and Johnson, 2013). MARS models typically exhibit a favourable bias-variance trade-off: they are flexible enough to capture non-linearity and variable interactions (and so have relatively low bias), while the constrained nature of the MARS basis functions precludes excessive flexibility (so the models also have relatively low variance).
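The forward-stage idea of Eq. (3), combining an intercept with a reflected pair of hinge bases, can be sketched with ordinary least squares for a single, hand-chosen knot (a real MARS run searches candidate knots automatically; the knot and data below are illustrative):

```python
import numpy as np

# Target is piecewise linear with a kink at x = 0.5:
# y = 1 for x < 0.5, and y = 1 + 2*(x - 0.5) above it.
x = np.linspace(-2, 2, 200)
y = np.where(x >= 0.5, 2.0 * (x - 0.5), 0.0) + 1.0

# Design matrix: intercept plus the reflected hinge pair at knot t.
t = 0.5
B = np.column_stack([np.ones_like(x),
                     np.maximum(0.0, x - t),    # max(0, x - t)
                     np.maximum(0.0, t - x)])   # max(0, t - x)

# Least-squares estimate of the beta coefficients in Eq. (3).
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
print(np.round(beta, 6))  # approximately [1., 2., 0.]
```

The recovered coefficients reproduce the target exactly because the knot coincides with the true kink; MARS automates this knot search and prunes redundant bases via GCV.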

## 2.1.5 Maximal overlap discrete wavelet transforms (MODWT)

The maximal overlap discrete wavelet transform (MODWT) is a modified version of the discrete wavelet transform (DWT) (Li et al., 2017a). The MODWT has qualities that make it appealing for time-series analysis: it avoids subsampling and therefore loses no data. The MODWT can extract additional information because the decomposed components at each level have the same length as the original time series. MODWT splits time-series data through high-pass and low-pass filters, producing two feature sets. The high-pass outputs can be decomposed further into several levels of detail, depending on the suitable time frame (He et al., 2017). The low-pass outputs reflect the underlying signal pattern of the time series, called the approximation. The signal \({x}_{m}\) is decomposed through the wavelet low-pass filter \({l}_{m}\) and the high-pass (detail) filter \({h}_{m}\), and reconstructed by digital reconstruction filters complementing the decomposition filters. This principle is described in the equations below:

$${x}_{m+1}\left(k\right)= \sum _{p}{l}_{p-2k}\ {x}_{m}\left(p\right)$$

5

$${d}_{m+1}\left(k\right)= \sum _{p}{h}_{p-2k}\ {x}_{m}\left(p\right)$$

6

$${x}_{m}\left(k\right)= \sum _{p}{l}_{p-2k}^{\text{'}}\ {x}_{m+1}\left(p\right)+ \sum _{p}{h}_{p-2k}^{\text{'}}\ {d}_{m+1}\left(p\right)$$

7
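One level of this two-channel analysis/synthesis filter bank can be sketched in numpy with the Haar pair as the low-pass and high-pass filters (an illustrative choice of wavelet; note that this sketch uses the classic factor-of-two subsampling, which MODWT deliberately omits):

```python
import numpy as np

l = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass (approximation) filter
h = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass (detail) filter

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

# Analysis: approximation and detail coefficients at the next level.
approx = np.array([l @ x[2 * k:2 * k + 2] for k in range(len(x) // 2)])
detail = np.array([h @ x[2 * k:2 * k + 2] for k in range(len(x) // 2)])

# Synthesis: upsample and filter with the reconstruction filters.
recon = np.zeros_like(x)
for k in range(len(approx)):
    recon[2 * k:2 * k + 2] += l * approx[k] + h * detail[k]

print(np.allclose(recon, x))  # True: perfect reconstruction
```

The orthonormal Haar pair makes the synthesis filters equal to the analysis filters, so the signal is recovered exactly.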

## 2.2 Comparing Models

In this study, we proposed a MODWT-MARS model to predict the dissolved oxygen of a flowing river. To find a practical combination of machine learning and feature decomposition methods, a pool of six machine learning models and five feature decomposition methods was also incorporated. The theory of the proposed algorithms (i.e., MODWT and MARS) was explained in the previous section; this section provides a short overview of the benchmark algorithms.

Breiman (2001) introduced the random forest (RF), which supports both regression and classification. A bootstrap resampling procedure generates new training sets from the initial training sample set N, and a forest of K decision trees is then built from these bootstrap sets. The RF model’s full specification may be found in Ali et al. (2020a). The random forest approach has become a prominent tool for classification, prediction, investigating variable relevance, variable selection, and outlier identification. An RF comprises a group (ensemble) of basic tree predictors, each of which generates a response for a given collection of predictor values (Jui et al., 2022; Yu et al., 2017).

Kernel ridge regression (KRR) reduces overfitting through regularisation and the kernel technique (Saunders et al., 1998). The “kernel trick” generates a nonlinear form of ridge regression: extending the linear framework, KRR allows nonlinear prediction. Linear, polynomial, and Gaussian kernels are only some of the many options available for enhancing overall performance (You et al., 2018). The fundamental advantage of the KRR technique is that it learns a global function and can predict any target variable using a regularised variant of least squares.
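A minimal numpy sketch of KRR’s closed-form solution with a Gaussian (RBF) kernel, solving (K + λI)α = y and predicting with k(x*, X)·α (the data and hyper-parameters below are illustrative, not this study’s settings):

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

X = np.linspace(0.0, 3.0, 30)   # toy training inputs
y = np.sin(X)                   # toy training targets

lam = 1e-3                      # ridge (regularisation) strength
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # dual coefficients

x_new = np.array([1.5])
pred = rbf(x_new, X) @ alpha
print(round(float(pred[0]), 3))  # close to sin(1.5) ~ 0.997
```

The regularisation strength λ trades training fit against smoothness; scikit-learn’s `KernelRidge` implements the same closed form.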

The Bayesian modelling approach handles hierarchical data (Huang and Abdel-Aty, 2010). Bayesian regression uses a regularisation parameter that is easily tailored to the data: a Gaussian prior is placed over the coefficients *w*, the noise precision λ is treated as a random variable rather than a fixed constant, and the maximum a posteriori estimate is then found. In contrast, most decision-making analyses based on maximum likelihood estimation entail fixing parameter values that may significantly impact the analysis outcome and for which there is considerable uncertainty. The capacity to include prior information is one of the primary advantages of the Bayesian technique (Saqib, 2021).

Support vector regression (SVR) is a machine learning kernel method that can be used for various purposes, including forecasting time series. Kernelised SVRs can also learn nonlinear trends in the training data. Three SVR variants are available, each with a different kernel (RBF, polynomial, and linear) (Yang et al., 2017). It should also be noted that the KRR model in its generic sense has been used in much research, including the forecasting of precipitation (Ali et al., 2020b), drought (Ali et al., 2019), wind speed (Alalami et al., 2019; Douak et al., 2013; Mishra et al., 2019; Naik et al., 2018; Zhang et al., 2019), and solar power (Dash et al., 2020).

The K-nearest neighbours (KNN) algorithm is an instance-based learning method that serves two purposes: 1) estimating the density function of the test data and 2) categorising the test data obtained from the test patterns (Shabani et al., 2020). Choosing the number of neighbours (k) is a crucial stage, as the method’s efficiency depends on selecting the nearest (most similar) samples from the reference database. If k is too large, points from other classes can fall inside the desired range (Wu et al., 2008). The KNN method has been applied successfully in previous work (Ghiassi et al., 2017; Liu et al., 2020).
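The neighbour-averaging idea behind KNN regression can be sketched in a few lines of plain Python (function name and data are illustrative, not scikit-learn’s API):

```python
def knn_predict(x_query, X_train, y_train, k=3):
    """Average the targets of the k nearest training samples (1-D inputs)."""
    # Sort training samples by distance to the query point.
    by_distance = sorted(zip((abs(x - x_query) for x in X_train), y_train))
    neighbours = by_distance[:k]
    return sum(y for _, y in neighbours) / k

# The three nearest samples to x = 2.1 are x = 2, 3 and 1.
print(knn_predict(2.1, [0, 1, 2, 3, 4], [0, 10, 20, 30, 40], k=3))  # 20.0
```

Increasing k smooths the prediction but risks pulling in dissimilar samples, which is exactly the trade-off discussed above.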

This study incorporated five decomposition methods (i.e., DWT, EMD, EEMD, MODWT and CEEMDAN) and six machine learning methods (i.e., MARS, RF, BNR, SVR, KNN and KRR) to address the prediction problem of dissolved oxygen concentration. DWT-assisted decomposition of hyperspectral features has been evaluated for its efficacy in discriminating between subtly different ground covers (Bruce et al., 2002); the theory of the method is explained by other researchers (Agbinya, 1996; Fowler, 2005; Shensa, 1992). Huang et al. (1998) developed the empirical mode decomposition (EMD) method for analysing the information contained in data derived from non-stationary and nonlinear systems. The algorithm decomposes the signal into a series of “well-behaved” oscillatory functions, referred to in this context as intrinsic mode functions (IMFs). The adaptive EMD tool behaves as a dyadic filter bank (Flandrin et al., 2004) and is useful for filtering out noise in the measurement domain (Khaldi et al., 2008). Torres et al. (2011) implemented the CEEMDAN process to reduce the computational cost while retaining the ability to eliminate mode mixing. Readers are referred to previous studies (Ahmed et al., 2021a; Zhang et al., 2017; Zhou et al., 2019) for further information on CEEMDAN.

## 2.3 Study Area and Data

Daily water quality variables were obtained for the Surma River, Bangladesh. Figure 1 depicts the Surma River monitoring stations. This river carries one of the heaviest runoffs in the Surma-Meghna Basin system (Chowdhury and Ali, 2006). The Surma River originates in Assam’s Cachar district, flows through Bangladesh’s Sylhet and Sunamganj districts, joins the Meghna River near Bhairab Bazar, Kishoreganj, and empties into the Bay of Bengal. Many studies of the river exist regarding water quality analysis (Ahmed, 2017; Ahmed and Shah, 2017a; Ahmed and Shah, 2017b), riverbank erosion (Islam and Hoque, 2014), stream flows (Ahmed and Shah, 2017b), and water level modelling (Biswas et al., 2009). The study’s water quality variables were collected at the Surma River’s Keane Bridge station between January 2017 and December 2019, sampled 15 cm to 20 cm below the surface.

The selection of prospective predictive factors is critical for predictive modelling. Various studies reveal that some variables predict DO better than others (Ahmed, 2017; Tomic et al., 2018). Ahmed (2017) used biological oxygen demand (BOD) and chemical oxygen demand (COD) for predicting the dissolved oxygen of the Surma River. Ay and Kisi (2012) observed that temperature, pH, and electrical conductivity are highly influential for Fountain Creek, Colorado. Ranković (2010), however, claimed that pH and water temperature have a practical relation to DO prediction, whereas nitrates, chloride, and total phosphate have poor connections. pH is a standard variable for predicting DO values using ANNs, followed by temperature; along with pH and temperature, some authors have also used oxygen-containing variables (PO43−, NO3-N) or oxygen-demanding variables (NH4-N, COD, and BOD) (Wen et al., 2013). Turbidity (Iglesias et al., 2014) and total solids can be considered essential water quality parameters, as high values of these typically accompany high values of the other parameters associated with water quality. Missing values were interpolated from the two adjacent values. The fundamental statistics of the input variables are tabulated in Table 1.

Table 1

Basic Statistics, i.e., minimum (min), maximum (max), mean (M), standard deviation (SD), and coefficient of variation (CV) of the water quality variables in Surma River, Sylhet, Bangladesh

| Variable | Acronym | Unit | Min | Max | Mean | SD | CV (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Humidity | h | % | 0.01 | 3.79 | 0.53 | 0.70 | 132 |
| Water Temperature | w | °C | 0.18 | 4.0 | 1.53 | 1.05 | 69 |
| Rainfall | r | mm | 8.00 | 127 | 32.66 | 20.99 | 64 |
| TDS | td | mg/l | 10.0 | 522 | 142.3 | 102.15 | 72 |
| pH | p | - | 5.70 | 8.25 | 6.92 | 0.55 | 8 |
| Turbidity | tr | NTU | 4.18 | 42.62 | 11.84 | 7.37 | 62 |
| Air Temperature | a | °C | 12.30 | 33.30 | 27.10 | 4.93 | 20 |
| DO | d | mg/l | 1.90 | 17.30 | 5.40 | 2.45 | 45 |

## 2.4 Development of the MODWT-MARS Model

The multi-phase MODWT-MARS model and the benchmark models were created in Python using the scikit-learn machine learning platform (Pedregosa et al., 2011b). All simulations were performed on a machine with an Intel i7 processor running at 3.6 GHz and 16 GB of RAM. MATLAB 2020 was employed for feature selection using neighbourhood component analysis (NCA), while *matplotlib* (Barrett et al., 2005) and *seaborn* (Waskom et al., 2020) were employed to visualise the forecasted DO. Figure 2 depicts the workflow of the proposed MODWT-MARS model.

The MODWT-MARS model was created by combining the MODWT wavelet transformation with the predictor variables filtered by the NCA approach. Identifying the wavelet-scaling filter type and the decomposition level is vital in creating a sound wavelet transformation model. Because there is no single approach for choosing the optimal filter, Al-Musaylh et al. (2020) used a trial-and-error strategy. Quilty and Adamowski (2018) discovered issues in wavelet-based forecasting models caused by erroneous wavelet decomposition of the forecast model inputs; the inaccuracy can be traced back to the boundary conditions of the decomposition process. They identified three problems: 1) improper use of future data, 2) unsuitable selection of decomposition levels and filters, and 3) incorrect division of validation and calibration data. Readers are encouraged to consult Quilty and Adamowski (2018) for details. These concerns about MODWT and DWT decomposition were addressed in this study. The DO variable was decomposed to resolve more comprehensive information for the MODWT-MARS model; Fig. 3 displays the time series of the intrinsic mode functions (IMFs), the residual components, and the decomposed components of MODWT.

There is no formula for verifying whether or not a model’s predictors are valid (Tiwari and Adamowski, 2013). Although the literature describes three input selection strategies for picking the time series of lagged memories of DO and predictors for an optimum model, it does not specify which method should be used. The three approaches to consider are the autocorrelation function (ACF), the partial autocorrelation function (PACF), and the cross-correlation function (CCF). In this study, substantial antecedent behaviour in the lags of DO was found using the PACF (Tiwari and Adamowski, 2013; Tiwari and Chatterjee, 2011). Figure 4 demonstrates the PACF for the DO time series, showing the antecedent behaviour of the lags of DO and of the MODWT-decomposed components of DO. It is clear from the figure that antecedent lags are significant.

The cross-correlation function determines which antecedent lag of each predictor best matches the input signal pattern (Adamowski et al., 2012); it establishes the statistical similarity between the predictors and the target variable. The cross-correlation function between the predictors and the DO for the River Surma is depicted in Fig. 5a. A set of significant input combinations was then determined by assessing the *r*cross of each predictor with DO. In this plot, the 95% confidence level for statistically significant *r*cross is shown as the blue line. Figure 5a shows that the correlation of each predictor with DO was highest at lag zero (*r*cross ≈ 0.25-0.45). A similar procedure was followed for the decomposed predictor variables: Figures 5b to 5f demonstrate the *r*cross value between *d*1 (DO) and *d*n (predictors) and their respective residuals (n = 1 to 4). Figure 5 shows that the *r*cross values ranged between 0.25 and 0.50, exceeding the 95% confidence level. The predictor data sets were normalised (Ahmed, 2017; Ali et al., 2019) between 0 and 1 to prevent any one variable from dominating:
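The lagged cross-correlation screening can be sketched with a simple Pearson estimator (function name and toy series are illustrative; the significance bound uses the usual large-sample approximation ±1.96/√N):

```python
import numpy as np

def r_cross(predictor, target, lag=0):
    """Pearson correlation between the predictor lagged by `lag` steps
    and the target series."""
    if lag > 0:
        predictor, target = predictor[:-lag], target[lag:]
    return np.corrcoef(predictor, target)[0, 1]

# Toy series: the target is the predictor delayed by one step,
# so the cross-correlation peaks at lag 1.
p = np.array([0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0])
t = np.concatenate(([0.0], p[:-1]))

best_lag = max(range(4), key=lambda lag: r_cross(p, t, lag))
print(best_lag)  # 1

# Approximate 95% significance bound for r_cross.
print(round(1.96 / np.sqrt(len(p)), 3))
```

Lags whose r_cross exceeds the significance bound are retained as candidate model inputs.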

$${DO}_{norm}=\frac{DO-{DO}_{min}}{{DO}_{max}-{DO}_{min}}$$

6

In the equation above, \(DO\) is the respective predictor, \({DO}_{min}\) is the minimum value of the predictor, \({DO}_{max}\) is its maximum value, and \({DO}_{norm}\) is the normalised value. After normalising the predictor variables, the data sets were partitioned into 70% for training, 15% for testing, and the remaining 15% for validation. The theoretical descriptions of the models are elucidated in Section 2.
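A minimal numpy sketch of the min-max normalisation and the 70/15/15 chronological split (the array values below are illustrative, not the study’s data):

```python
import numpy as np

def min_max(series):
    """Scale a series to [0, 1] as in the normalisation equation above."""
    s = np.asarray(series, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

do = np.array([1.9, 5.4, 17.3, 8.0, 3.2, 6.1])
do_norm = min_max(do)
print(do_norm.min(), do_norm.max())  # 0.0 1.0

# 70 / 15 / 15 split in time order (no shuffling for a time series).
n = len(do_norm)
train, test, valid = np.split(do_norm, [int(0.7 * n), int(0.85 * n)])
```

The same minimum and maximum computed on the training period should be reused for the test and validation periods in practice, to avoid information leakage.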

The Python-based *scikit-learn* (Pedregosa et al., 2011a) was used to build this study’s SVR, RF, KRR, BNR, and KNN models. The RBF (radial basis function) kernel was employed in developing the SVR model (Suykens et al., 2002). The RBF provides a fast function during training to examine non-linearities between the target and predictor variables (Goyal et al., 2014; Lin, 2003; Maity et al., 2010). The tricky process of creating an accurate SVR model required identifying the three parameters (C, σ, and ε) (Hoang et al., 2014). For this reason, the NCA algorithm was used to screen the inputs according to their feature weights (Pedregosa et al., 2011a).

Alternatively, the MARS model was built with the Python-based py-earth package (Rudy and Cherti, 2017). MARS models use either cubic or linear piecewise functions; this study used a piecewise cubic model because it provides a smoother response. Generalised recursive partitioning regression was adopted since it can handle multiple predictors. Forward and backward selection was used for optimisation: initially, the algorithm ran with a ‘naïve’ model that contained only the intercept term, and the training MSE was then reduced by iteratively adding reflected pairs of basis functions.

The hybrid MARS and the other comparison models were constructed using piecewise cubic and linear regression functions, respectively. The best MARS model was selected using the lowest generalised cross-validation (GCV) score (Lin, 2003); the MODWT-MARS model yielded the lowest RMSE and the highest LM, demonstrating the most accurate predictions.

## 2.5 Model Evaluation Benchmarks

Several statistical score metrics were considered in the rigorous evaluation of the proposed MODWT-MARS model against the counterpart models. The commonly adopted metrics were Pearson’s correlation coefficient (r), root mean square error (RMSE; mg/l), mean absolute error (MAE; mg/l), Nash-Sutcliffe efficiency (NSE), absolute percentage bias (APB; %), and Willmott’s index of agreement (WI) (Krause et al., 2005; Legates and McCabe, 1999; Nash and Sutcliffe, 1970; Willmott et al., 2012), as used elsewhere (Ahmed et al., 2021a; Ghimire et al., 2019c). Because of the stations’ geographic variation, relative (percentage) error measures such as RRMSE, RMAE, and MAPE were also considered. Owing to the inherent merits and weaknesses of individual metrics, combining them is prudent (Sharma et al., 2019). Different sets of model evaluation metrics, such as RMSE, MAE, and r² (coefficient of determination) (Chu et al., 2020); NSE, RMSE, MAE, and PERS (persistence index) (Tiwari and Chatterjee, 2010); and Legates-McCabe’s index (LM), Willmott’s index (WI), RRMSE, and RMAE (Ali et al., 2019; Ghimire et al., 2019b; Yaseen et al., 2019), were selected for evaluating the models with numerous sets of variables. The correlation coefficient (*r*) provides information only about the linear association between forecasted and observed DO data and is therefore limited in its capacity; moreover, *r* is considered oversensitive to extreme values (Willmott et al., 1985). RMSE and MAE provide appropriate information regarding forecasting skill: RMSE evaluates the robustness of the model with respect to high values, focusing on the deviation of the forecasted values from the observed (Deo et al., 2017a), although MAEs are not a perfect replacement for RMSEs (Chai and Draxler, 2014). The Nash-Sutcliffe efficiency (NSE) is a widely used evaluation criterion for hydrological models.
NSE is a dimensionless metric and a scaled version of the MSE, offering a better physical interpretation (Legates and McCabe, 2013). However, the NSE over-emphasises higher (outlier) values while neglecting lower values (Legates and McCabe, 1999). Because the observed and predicted means and variances are standardised, the robustness of *r* is limited; Willmott’s index (WI) addresses this issue by considering the ratio of the mean squared error instead of the differences. The mathematical notations of the statistical metrics are as follows:

$$\text{MAE}\ (\text{mg/l})=\frac{1}{N}\sum _{i=1}^{N}\left|{DO}_{for}-{DO}_{obs}\right|$$

7

$$\text{RMSE}\ (\text{mg/l})=\sqrt{\frac{1}{N}\sum _{i=1}^{N}{\left({DO}_{for}-{DO}_{obs}\right)}^{2}}$$

8

$$\text{NSE}=1- \left[\frac{\sum _{i=1}^{N}{\left({DO}_{obs}-{DO}_{for}\right)}^{2}}{\sum _{i=1}^{N}{\left({DO}_{obs}- {\stackrel{-}{DO}}_{obs}\right)}^{2}}\right]$$

9

$$r= \frac{\sum _{i=1}^{N}\left({DO}_{obs}-{\stackrel{-}{DO}}_{obs}\right)\left({DO}_{for}-{\stackrel{-}{DO}}_{for}\right)}{\sqrt{\sum _{i=1}^{N}{\left({DO}_{obs}-{\stackrel{-}{DO}}_{obs}\right)}^{2}}\sqrt{\sum _{i=1}^{N}{\left({DO}_{for}-{\stackrel{-}{DO}}_{for}\right)}^{2}}}$$

10

$$\text{RRMSE}\ \left(\%\right)= \frac{\sqrt{\frac{1}{N}\sum _{i=1}^{N}{\left({DO}_{for}-{DO}_{obs}\right)}^{2}}}{\frac{1}{N}\sum _{i=1}^{N}{DO}_{obs}} \times 100$$

11

$$\text{RMAE}\ \left(\%\right)= \frac{\frac{1}{N}\sum _{i=1}^{N}\left|{DO}_{for}-{DO}_{obs}\right|}{\frac{1}{N}\sum _{i=1}^{N}{DO}_{obs}} \times 100$$

12

$$\text{MAPE}\ \left(\%\right)=\frac{1}{N} \sum _{i=1}^{N}\left|\frac{{DO}_{for}- {DO}_{obs}}{{DO}_{obs}} \right|\times 100$$

13

$$\text{WI} = 1- \left[\frac{\sum _{i=1}^{N}{\left({DO}_{for}-{DO}_{obs}\right)}^{2}}{\sum _{i=1}^{N}{\left(\left|{DO}_{for}-{\stackrel{-}{DO}}_{obs}\right|+ \left|{DO}_{obs}-{\stackrel{-}{DO}}_{obs}\right| \right)}^{2}}\right]$$

14

$$\text{APB}\ \left(\%\right) = \frac{\sum _{i=1}^{N}\left|{DO}_{for}-{DO}_{obs}\right|}{\sum _{i=1}^{N}\left|{DO}_{obs}\right|}\times 100$$

15

$$\text{LM} = 1- \left[\frac{\sum _{i=1}^{N}\left|{DO}_{for}-{DO}_{obs}\right|}{\sum _{i=1}^{N}\left|{DO}_{obs}-{\stackrel{-}{DO}}_{obs}\right|}\right]$$

16

where \({DO}_{obs}\) and \({DO}_{for}\) denote the observed and model-forecasted values of the ith observation; \({\stackrel{-}{DO}}_{obs}\) and \({\stackrel{-}{DO}}_{for}\) denote their respective averages; and N is the number of DO observations.
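Several of these metrics translate directly into numpy; the observed/forecasted arrays below are illustrative:

```python
import numpy as np

def mae(obs, forc):
    """Mean absolute error."""
    return np.mean(np.abs(forc - obs))

def rmse(obs, forc):
    """Root mean square error."""
    return np.sqrt(np.mean((forc - obs) ** 2))

def nse(obs, forc):
    """Nash-Sutcliffe efficiency: 1 minus MSE scaled by observed variance."""
    return 1 - np.sum((obs - forc) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

obs = np.array([5.0, 6.0, 7.0, 8.0])
forc = np.array([5.1, 5.9, 7.2, 7.8])
print(round(float(mae(obs, forc)), 3), round(float(nse(obs, forc)), 3))
```

NSE equals 1 for a perfect forecast and 0 when the forecast is no better than the observed mean, which is what makes it a convenient dimensionless benchmark.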