Uncertainty analysis of a data-driven model using Monte Carlo simulations to predict the sodium adsorption ratio: A case study of the Aras, Sepid-Rud, and Karun Rivers in Iran


 Water quality management requires a thorough understanding of future variations in surface water and groundwater quality in order to assess and plan its use for human consumption, industry, and irrigation. Mathematical models such as Box-Jenkins time series models, Bayesian time series models, and data-driven models are available for predicting future water quality. However, the uncertainty associated with their forecasts is one of the main obstacles to using these models for water quality management and future planning. In the present work, Monte Carlo simulations are used to quantify the uncertainty of the Adaptive Neuro-Fuzzy Inference System based on fuzzy c-means clustering (ANFIS-FCMC, genfis3) in predicting the Sodium Adsorption Ratio (SAR) of water in the Aras, Sepid-Rud, and Karun Rivers. For the Aras River, the simulated combined standard and expanded uncertainties of SAR are 0.58 and 1.16, respectively, and the resulting interval is 2.412 ± 1.1622. For the Sepid-Rud River, the combined standard and expanded uncertainties are 1.11 and 2.22, respectively, and the interval is 2.235 ± 2.22. For the Karun River, the combined standard and expanded uncertainties are 2.063 and 4.126, respectively, and the interval is 4.79 ± 4.126. Overall, the ANFIS-FCMC (genfis3) model had the lowest uncertainty in predicting the SAR of the Aras River and the highest for the Karun River.


Introduction
For water resources management, forecasting water quality parameters is vital for the future planning and management of surface water and groundwater. Several statistical approaches are available for water quality modeling, such as regression analysis, Box-Jenkins time series, Bayesian time series, and data-driven techniques. Uncertainty arises both from the collection of water quality monitoring data and from the models developed, and it is one of the challenges to applying these models in practice. Methods for analyzing uncertainty include the Guide to the Expression of Uncertainty in Measurement (GUM) and Monte Carlo simulation (MCS). In the present study, an adaptive network-based fuzzy inference system (ANFIS) is used to predict the sodium adsorption ratio (SAR) of three rivers: the Aras, Sepid-Rud, and Karun. Analyzing the uncertainty of such models is necessary before they are used in practice. Previous work on the uncertainty of data-driven models is briefly reviewed in the following paragraphs. Zou et al. (2002) used a neural-network-embedded Monte Carlo (NNMC) method to determine the uncertainty in water quality modeling. By embedding a neural network (NN) into the conventional Monte Carlo simulation, the proposed method significantly improves upon the conventional method. They applied it to uncertainty and risk analyses of a phosphorus index for the Triadelphia Reservoir in Maryland and showed that the NNMC method has potential for effective uncertainty analysis in water quality modeling. Nakane and Haidary (2010) developed two linkage models based on multiple regression in twenty-one river basins in the Chugoku district of western Japan to forecast total nitrogen (R² = 0.70, p < 0.01) and total phosphorus (R² = 0.47, p < 0.01) concentrations. 
They also applied a Monte Carlo-based sensitivity analysis, and their results showed that the total nitrogen (TN) regression model could forecast stream water TN concentrations between 0.4 and 3.2 mg/L. Jiang et al. (2013) proposed a method that integrates an ANN into MCS to improve the computational efficiency of conventional risk assessment. They used an ANN model to replace the iterative finite element method (FEM) runs, and the number of water quality parameters was also reduced. Using chemical oxygen demand (COD) in the Lanzhou section of the Yellow River in China as a case study, they compared their method with the conventional risk assessment approach and showed that the ANN-MCS technique saves much computational effort without loss of accuracy. Antanasijević et al. (2014) performed the training, validation, testing, and uncertainty analysis of general regression neural network (GRNN) models for predicting dissolved oxygen (DO) in the Danube River. They determined the optimal data normalization and input selection methods, assessed the relative significance of uncertainty in different input parameters, and analyzed the uncertainty of the model outputs using MCS. They applied the GRNN models to monthly water quality parameters from 17 stations over 9 years. The best result was obtained with min-max normalized data and inputs selected according to their correlation with DO; this combination gave the most accurate GRNN model with the smallest number of inputs: pH, temperature, HCO3⁻, sulfate, nitrate, hardness, Na, Cl⁻, conductivity, and alkalinity. Cordoba et al. (2014) studied chlorine concentration in a water distribution system using a combination of ANN and Monte Carlo simulation. They tested the model at one specific location using hydraulic and water quality parameters such as flow, pH, and temperature. 
The model allows chlorine concentration to be projected at selected nodes of the water supply system. Šiljić et al. (2015) developed an initial general regression neural network (GRNN) with 20 inputs to predict biological oxygen demand. They then reduced the inputs to 15 using Monte Carlo simulation as the input selection method. The best result was obtained with the GRNN model using 25% fewer inputs than the initial model. Karimi et al. (2018) As the literature review above shows, few studies have analyzed the uncertainty of ANFIS-FCMC using Monte Carlo analysis. Moreover, none of them concerns the uncertainty of predicting the sodium adsorption ratio (SAR) in the Aras, Sepid-Rud, and Karun Rivers in Iran.
The objective of the present study was to determine the uncertainty of ANFIS-FCMC in forecasting the sodium adsorption ratio (SAR) of water in the Aras, Sepid-Rud, and Karun Rivers using Monte Carlo analysis.

Material and methods
The water quality data of the three rivers were collected by the Fars Water Authority, Iran.

Adaptive network-based fuzzy inference system (ANFIS)
ANFIS is based on the Takagi-Sugeno fuzzy inference system (FIS) (Güler and Übeyli 2005). A fuzzy system, which operates on membership functions, is an effective method for problem-solving, modeling, and data mining, and for reducing data complexity (Jang 1993). ANFIS combines an ANN with an FIS, giving it an effective learning algorithm and a high training rate.
Consequently, both nonlinear and complex systems can be modeled accurately with ANFIS (Heddam et al. 2012; Salahi et al. 2017; Asadi et al. 2020). We built the FIS structure using only fuzzy c-means clustering (FCMC) with Gaussian membership functions (Sivanandam et al. 2007). The ANFIS-FCMC method is available in MATLAB (R2018a) as the genfis3 function, which generates an FIS using FCMC.
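The FCMC step can be illustrated with a minimal sketch of standard fuzzy c-means in Python/NumPy. This is not MATLAB's genfis3 or its internal clustering routine; the fuzziness exponent m = 2, the random initialization, and the hypothetical `cluster_sigmas` rule used to turn each cluster into Gaussian membership-function widths are assumptions made for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means (FCM) clustering.

    X: (n_samples, n_features) array. Returns (centers, U), where U has shape
    (n_clusters, n_samples) and each column of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Random initial memberships, normalized so each sample's memberships sum to 1.
    U = rng.random((n_clusters, n_samples))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distance from every center to every sample: (n_clusters, n_samples).
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)  # guard against division by zero
        # Standard FCM update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers so they are consistent with the final memberships.
    Um = U ** m
    centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return centers, U

def gaussian_mf(x, center, sigma):
    """Gaussian membership function exp(-(x - c)^2 / (2 sigma^2))."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def cluster_sigmas(X, centers, U, m=2.0):
    """Hypothetical width rule: membership-weighted SD per cluster and feature."""
    Um = U ** m
    diff2 = (X[None, :, :] - centers[:, None, :]) ** 2  # (clusters, samples, features)
    var = (Um[:, :, None] * diff2).sum(axis=1) / Um.sum(axis=1)[:, None]
    return np.sqrt(var)
```

Each cluster center then serves as the center of one Gaussian membership function per input, which is the general idea behind building an FIS from FCM clusters.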

Genfis3 Function
The genfis3 function extracts a set of rules that models the data behavior. It requires separate sets of input and output data as input arguments. Equation 1 presents a sample command line for FCMC (The MathWorks 2018):

fis = genfis3(Xin, Xout, type, cluster_n)     (1)

where genfis3 builds a fuzzy system using FCMC, Xin and Xout are the input and output data, type is either 'sugeno' or 'mamdani', and cluster_n is the number of clusters. Sugeno systems are mathematically well organized and suitable for mathematical analysis (The MathWorks 2018); consequently, they were used in the present study.
To illustrate ANFIS, consider two fuzzy if-then rules based on a first-order Sugeno model. An ANFIS structure is formed from five distinct layers (Jang 1993; Wali et al. 2012); a detailed description of ANFIS is available in other studies (Yetilmezsoy 2010a). Briefly, the first layer contains the input nodes, which present the inputs to the ANFIS model. Each node is an adaptive node that gives the membership grade of a linguistic label, and the output of the first layer is the input of layer two. The second layer consists of fixed nodes that hold a predefined number of MFs selected according to the input value; these nodes evaluate the fuzzy rules and pass them to layer three with the corresponding firing strength (degree of activity). The third layer normalizes the firing strengths of all rules. The fourth layer performs defuzzification. The fifth layer comprises the output nodes, which compute the sum of the outputs of all rules received from the preceding layer. Figure 2 shows a schematic flow diagram of ANFIS and MCS for the uncertainty analysis of the SAR predictions.
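The five layers can be sketched as a forward pass for two inputs and two first-order Sugeno rules, following the classic arrangement in Jang (1993). The Gaussian membership parameters and consequent coefficients are hypothetical placeholders (in the study they would come from genfis3 and ANFIS training), and the product T-norm in layer 2 is an assumption.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def anfis_forward(x, y, mf_x, mf_y, consequents):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    Rule i: IF x is A_i AND y is B_i THEN f_i = p_i*x + q_i*y + r_i.
    mf_x, mf_y: lists of (center, sigma) for A_i and B_i (one pair per rule).
    consequents: list of (p_i, q_i, r_i).
    """
    # Layer 1 (adaptive): membership grades of x in A_i and of y in B_i.
    mu_a = np.array([gaussian_mf(x, c, s) for c, s in mf_x])
    mu_b = np.array([gaussian_mf(y, c, s) for c, s in mf_y])
    # Layer 2 (fixed): firing strength of each rule via the product T-norm.
    w = mu_a * mu_b
    # Layer 3 (fixed): normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4 (adaptive): each rule's linear consequent weighted by w_bar.
    f = np.array([p * x + q * y + r for p, q, r in consequents])
    weighted = w_bar * f
    # Layer 5 (fixed): overall output is the sum of the weighted rule outputs.
    return float(weighted.sum())
```

If both rules share the same consequent, the output equals that consequent regardless of the membership functions, because the normalized firing strengths sum to one.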

Model selection criteria
To evaluate the performance of ANFIS, two statistical criteria were examined: the root mean square error (RMSE) and the mean bias error (MBE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}$$

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)$$

where O_i and P_i are the observed (measured) and predicted (forecasted) values of the parameter, respectively, and n is the number of data points. The MBE indicates whether the model overestimates (MBE > 0) or underestimates (MBE < 0) in the training process; an MBE of zero is the best score.
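A minimal sketch of the two criteria, written from their definitions with observed values O_i and predicted values P_i (the function names are illustrative):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between observed O_i and predicted P_i."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def mbe(observed, predicted):
    """Mean bias error; > 0 means overestimation, < 0 underestimation."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean(p - o))
```

For example, observed values [1, 2, 3] and predictions [2, 2, 4] give MBE = 2/3 > 0 (overestimation) and RMSE = √(2/3) ≈ 0.816.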

Model Efficiency
To confirm the performance of the RBF neural network and the ANFIS, we calculated the coefficient of determination (R²), the index of agreement (IA), and the Nash-Sutcliffe efficiency (E) according to Equations (6) to (8):

$$R^2 = \frac{\left[\sum_{i=1}^{n}(O_i - \bar{O})(P_i - \bar{P})\right]^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2\,\sum_{i=1}^{n}(P_i - \bar{P})^2} \tag{6}$$

where O_i and P_i are the observed (measured) and predicted data of the parameters, respectively, and Ō and P̄ are the means of the observed and predicted data. The index of agreement (IA) measures how close the predicted data are to the observed data:

$$IA = 1 - \frac{\sum_{i=1}^{n}(P_i - O_i)^2}{\sum_{i=1}^{n}\left(\lvert P_i - \bar{O}\rvert + \lvert O_i - \bar{O}\rvert\right)^2} \tag{7}$$

IA varies from 0.0 to 1.0.
The efficiency E, suggested by Nash and Sutcliffe (1970), is one minus the sum of the squared differences between the predicted and observed data, normalized by the variance of the observed values over the study period:

$$E = 1 - \frac{\sum_{i=1}^{n}(O_i - P_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2} \tag{8}$$

E ranges from −∞ to 1.0 (perfect fit).
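The agreement statistics can be sketched the same way. The forms below are assumptions based on the standard definitions: R² as the squared Pearson correlation between O_i and P_i, IA in Willmott's form, and E as the Nash-Sutcliffe efficiency.

```python
import numpy as np

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    do, dp = o - o.mean(), p - p.mean()
    return float((do @ dp) ** 2 / ((do @ do) * (dp @ dp)))

def index_of_agreement(observed, predicted):
    """Willmott's index of agreement, between 0 and 1."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o_bar = o.mean()
    num = np.sum((p - o) ** 2)
    den = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return float(1.0 - num / den)

def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, unbounded below."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2))
```

For observed values [1, 2, 3, 4] and predictions shifted by +1, R² = 1 while E = 0.2 and IA = 0.84, which shows that E and IA penalize a systematic bias that R² ignores.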

Rahnama et al. (2020) developed different ANN models, including a radial basis function (RBF) network and an adaptive neuro-fuzzy inference system (ANFIS), to predict the SAR of water in four rivers: the Aras, Sepid-Rud, Karun, and Mond. The input variables were SO4²⁻ (mg/L), Cl⁻ (mg/L), HCO3⁻ (mg/L), pH, EC (µS/cm), and discharge (m³/s). A total of 144 data points were used for training and 36 for testing the RBF and ANFIS models. In this study, the uncertainty of one of these ANFIS models, genfis3, is analyzed for the SAR of the first three rivers.

Monte Carlo Analysis (MCA)
Our focus in this study was on evaluating the uncertainty of ANFIS models using Monte Carlo analysis (MCA). The Anderson-Darling (AD) test (Goktepe et al. 2008) was applied at the 5% significance level to evaluate goodness of fit and to find a suitable probability density function (PDF) for each input variable.
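The distribution-selection step can be sketched by fitting a few candidate PDFs and ranking them by the Anderson-Darling statistic A². The candidate set, fixing the location at zero for the positive-valued lognormal and gamma fits, and ranking by A² alone are assumptions. Because the parameters are estimated from the same data, standard AD critical values and p-values do not apply directly, so this sketch does not reproduce the p-values reported in the tables.

```python
import numpy as np
from scipy import stats

def anderson_darling_stat(data, dist):
    """Anderson-Darling statistic A^2 of `data` against a frozen scipy.stats distribution."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    # Clip the CDF away from 0 and 1 so the logarithms stay finite.
    F = np.clip(dist.cdf(x), 1e-12, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    # A^2 = -n - (1/n) * sum_i (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    return float(-n - np.mean((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1]))))

# Hypothetical candidate set; loc fixed at 0 for positive-valued concentrations.
CANDIDATES = {
    "normal": (stats.norm, {}),
    "lognormal": (stats.lognorm, {"floc": 0.0}),
    "gamma": (stats.gamma, {"floc": 0.0}),
}

def best_fit(data):
    """Fit each candidate by maximum likelihood; return the one with the smallest A^2."""
    data = np.asarray(data, dtype=float)
    results = {}
    for name, (family, fit_kwargs) in CANDIDATES.items():
        params = family.fit(data, **fit_kwargs)
        results[name] = (anderson_darling_stat(data, family(*params)), params)
    best = min(results, key=lambda name: results[name][0])
    return best, results
```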
After that, we applied the ANFIS (genfis3) model to the 1000 values generated for each input parameter using MATLAB (2018), and computed the PDF of the genfis3 output from these MCA-generated inputs. Figure 2 illustrates ANFIS-genfis3 and MCA ANFIS-genfis3 for SAR prediction in the rivers. Table 2 shows the results of ANFIS-genfis3 for predicting the SAR of water in the Aras, Sepid-Rud, and Karun Rivers. water is higher than in the other two rivers. The R² and IA values are also acceptable for SAR forecasting in all three rivers. However, the model performance for SAR prediction in the Sepid-Rud River (R² = 0.88) is higher than for the other two rivers. The main objective of the present work was to analyze the uncertainty of ANFIS-FCMC (genfis3) for the three rivers.
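The Monte Carlo propagation step can be sketched as follows: draw 1000 values per input from the fitted PDFs, pass them through the model, and summarize the output with its mean, standard deviation (combined standard uncertainty), and expanded uncertainty with coverage factor k = 2. Everything concrete here is hypothetical: the input distributions are placeholders rather than the study's fitted PDFs, `surrogate_sar_model` is a made-up linear stand-in for the trained genfis3 FIS, and sampling the inputs independently is an assumption.

```python
import numpy as np
from scipy import stats

N_SAMPLES = 1000  # number of Monte Carlo draws per input, as in the study

# Hypothetical fitted input PDFs (placeholders, NOT the study's fitted distributions).
INPUT_DISTS = {
    "SO4": stats.lognorm(s=0.4, scale=150.0),   # sulfate, mg/L
    "Cl": stats.lognorm(s=0.5, scale=120.0),    # chloride, mg/L
    "HCO3": stats.norm(loc=200.0, scale=30.0),  # bicarbonate, mg/L
    "pH": stats.norm(loc=7.8, scale=0.2),
    "Q": stats.gamma(a=2.0, scale=50.0),        # discharge, m^3/s
}

def surrogate_sar_model(inputs):
    """Hypothetical linear stand-in for a trained genfis3 FIS (illustration only)."""
    return (0.004 * inputs["SO4"] + 0.01 * inputs["Cl"] + 0.002 * inputs["HCO3"]
            - 0.05 * (inputs["pH"] - 7.0) - 0.002 * inputs["Q"])

def monte_carlo_uncertainty(model, dists, n=N_SAMPLES, k=2.0, seed=42):
    """Propagate independently sampled inputs through `model` and summarize the output."""
    rng = np.random.default_rng(seed)
    samples = {name: d.rvs(size=n, random_state=rng) for name, d in dists.items()}
    y = np.asarray(model(samples), dtype=float)
    mean = float(y.mean())
    u_c = float(y.std(ddof=1))  # combined standard uncertainty = SD of simulated outputs
    U = k * u_c                 # expanded uncertainty with coverage factor k
    return {"n": n, "mean": mean, "u_c": u_c, "U": U, "interval": (mean - U, mean + U)}
```

With a real trained model, `model` would be replaced by a function that evaluates the FIS on each sampled input vector; the summary step stays the same.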

Aras River results
Results of ANFIS (genfis3) uncertainty in predicting the SAR of Aras River water
Table 3 shows the fit of different probability density function (PDF) distributions to the sulfate, chloride, bicarbonate, pH, and flow rate data (180 input data points) of the genfis3 model for Aras River water. The fits were tested with the Anderson-Darling statistic and its p-value. Some fits had p-values below 0.05, which indicates a statistically poor fit. However, the data were collected and analyzed by the Iran Water Authority, and the authors played no role in water quality sampling, the dispersion of the data, or the number of samples. Hence, the best possible PDF fit to the available data was used in this study. The same explanation applies to the other two rivers. Table 4 shows the results of the Monte Carlo uncertainty analysis of the genfis3 SAR predictions for Aras River water. Figure 4 shows the tolerance interval of the SAR of Aras River water, with a 95% confidence level and a minimum coverage of 95% of the population, obtained using MCS; the distribution is almost normal. The simulated combined standard uncertainty of SAR is 0.58. The expanded uncertainty equals two standard deviations, corresponding to a 95% confidence interval (Farrance and Frenkel, 2014). Therefore, the expanded uncertainty of SAR for Aras River water is 1.16, and the resulting interval is 2.412 ± 1.1622.
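A two-sided normal tolerance interval with 95% confidence and 95% minimum coverage, as in Figure 4, has the form mean ± k·s. The sketch below computes k with Howe's (1969) approximation; this method is an assumption, and the software used in the study may compute k differently.

```python
import numpy as np
from scipy import stats

def normal_tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Two-sided normal tolerance factor k (Howe 1969 approximation)."""
    z = stats.norm.ppf((1.0 + coverage) / 2.0)
    chi2_low = stats.chi2.ppf(1.0 - confidence, n - 1)  # lower (1 - confidence) quantile
    return float(np.sqrt((n - 1) * (1.0 + 1.0 / n) * z ** 2 / chi2_low))

def tolerance_interval(samples, coverage=0.95, confidence=0.95):
    """Interval mean +/- k*s expected to contain `coverage` of a normal population."""
    x = np.asarray(samples, dtype=float)
    k = normal_tolerance_factor(x.size, coverage, confidence)
    m, s = x.mean(), x.std(ddof=1)
    return float(m - k * s), float(m + k * s)
```

For n = 1000 Monte Carlo outputs, k is about 2.04, slightly wider than the coverage factor of 2 used for the expanded uncertainty; as n grows, k approaches the normal quantile 1.96.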
Figure 4. Tolerance interval of the SAR of Aras River water with a 95% confidence level and minimum coverage of 95% of the population, obtained using MCA.

Table 5 shows the fit of different probability density function (PDF) distributions to the sulfate, chloride, bicarbonate, pH, and discharge data (180 input data points) of the genfis3 model for Sepid-Rud River water, tested with the Anderson-Darling statistic and p-value. Figure 6 shows the tolerance interval of the SAR of Sepid-Rud River water, with a 95% confidence level and a minimum coverage of 95% of the population, obtained using MCS; the distribution is approximately normal. The simulated combined standard uncertainty of SAR is 1.11. The expanded uncertainty equals two standard deviations, corresponding to a 95% confidence interval (Farrance and Frenkel, 2014). Therefore, the expanded uncertainty of SAR for Sepid-Rud River water is 2.22, and the resulting interval is 2.235 ± 2.22. Comparing the uncertainty of SAR for the Aras and Sepid-Rud Rivers shows that the Sepid-Rud River has the higher uncertainty.

Figure 6. Tolerance interval of the SAR of Sepid-Rud River water with a 95% confidence level and minimum coverage of 95% of the population, obtained using MCS.

Table 8 shows the results of the Monte Carlo uncertainty analysis of the genfis3 model in predicting the SAR of Karun River water. As Table 8 indicates, the standard deviation (SD) from MCA is higher than that of ANFIS-FCMC (genfis3). The distribution is roughly normal (Figure 6). The simulated combined standard uncertainty of SAR is 2.063. The expanded uncertainty of SAR for Karun River water is 4.126, and the resulting interval is 4.79 ± 4.126. Table 9 compares the mean and standard deviation of the ANFIS (genfis3) models with those of the Monte Carlo analysis.
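As a worked check, the expanded uncertainties and intervals reported for the three rivers follow from U = k·u_c with coverage factor k = 2. The rounded U = 1.16 is used for the Aras River, whereas the text quotes ±1.1622.

$$
\begin{aligned}
U &= k\,u_c, \qquad k = 2\\
\text{Aras:}\; U &= 2 \times 0.58 = 1.16, & 2.412 \pm 1.16 &= [1.252,\; 3.572]\\
\text{Sepid-Rud:}\; U &= 2 \times 1.11 = 2.22, & 2.235 \pm 2.22 &= [0.015,\; 4.455]\\
\text{Karun:}\; U &= 2 \times 2.063 = 4.126, & 4.79 \pm 4.126 &= [0.664,\; 8.916]
\end{aligned}
$$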

Karun River results
Comparing the uncertainty of SAR for the Karun and Sepid-Rud Rivers shows that the Karun River model (genfis3) has higher uncertainty in predicting SAR than the Sepid-Rud River model (Table 9). The uncertainty of SAR for Aras River water is the lowest of the three rivers. Uncertainty arises from the model configuration and from water quality data analysis. Given the low uncertainty of SAR for the Aras River, the model configuration and data measurement appear to have been precise. In contrast, for the Karun River, the genfis3 SAR model has higher uncertainty than for the Sepid-Rud and Aras Rivers, which stems from the model configuration and water quality data collection. Coulthard et al. (2013) stated, "uncertainty in numerical models has many origins: input data, model simplifications, algorithm structure, calibration process, calibration and validation data, as well as equifinality." However, according to Kamali et al. (2017), most uncertainty arises from the input data, and less from the model configuration. The input data were collected by the Fars Water Authority in Iran. As the literature review in the present work shows, no previous uncertainty study has predicted the SAR of river water using ANFIS-FCMC and MCA; therefore, it was not possible to compare the results with other studies.

Conclusions
Based on the Monte Carlo analysis (MCA) of the uncertainty of the ANFIS-FCMC (genfis3) model in predicting the sodium adsorption ratio (SAR) of Aras, Sepid-Rud, and Karun River water, the following conclusions can be drawn.