Using Water Quality Parameters To Predict Trihalomethane Ion Based On An Artificial Neural Network Model

Background: Trihalomethanes (THMs) are the first disinfection by-products identified in drinking water distribution networks and are classified as potential carcinogens. The presence of THMs in chlorinated water depends on the pH, water temperature, contact time between water and chlorine, type and dose of disinfectant, bromide ion concentration, and type and concentration of natural organic materials (NOMs). In the present study, the formation of THMs was evaluated from water quality parameters and modeled with an artificial neural network (ANN) approach for five water distribution networks (WDNs) and the Karoun River in Khuzestan province. Results: This study was conducted from October 2014 to September 2015. THM concentrations in the five WDNs (Shoushtar, Ahvaz 2, Ahvaz 3, Mahshahr, and Khorramshahr) and the Karoun River ranged over N.D.-9.39, 7.12-28.60, 38.16-67.00, 17.15-90.46, 15.14-29.99, and N.D.-156 µg/L, respectively. Conclusions: The concentration of THMs exceeded the Iranian and EPA standards in many cases in the Mahshahr and Khorramshahr WDNs. Evaluation of R², MSE, and RMSE showed an appropriate correlation between measured and modeled THMs, indicating that ANN has good potential for estimating THM formation in water sources.


Introduction
Water-related epidemics have decreased considerably since chlorine began to be used as a disinfectant for drinking water resources in 1904. Researchers later discovered a group of compounds, THMs, in chlorinated water; they form through the reaction of chlorine with NOM, both of which are known to be precursors of THMs. Because of the low molecular weight of these precursors, they are not removed by conventional water treatment units [1][2][3]. The type and concentration of THMs depend on several factors, i.e., the type of chlorinated disinfectant, contact time, water pH, and the concentrations of added disinfectant, organic materials, residual chlorine, and bromide ion. Chloroform (CHCl3) is the most frequent THM (in many cases it accounts for over 70% of THM compounds) and is classified as B2 according to the International Agency for Research on Cancer (IARC). The health effects of THMs are divided into acute and chronic effects. Skin lesions, allergic symptoms, and poisoning are acute complications, and cancer is a chronic complication [4]. Owing to these health effects, the USEPA imposed regulatory controls on the amount of THMs in drinking water in 1979. Accordingly, a maximum THM concentration in drinking water of 100 µg/L, as an annual average, was set, which was reduced to 80 µg/L in 1998 [5].
Researchers use different models to evaluate variations in environmental pollutants in order to manage environmental resources better. The use of these models is very complex and requires a significant amount of field data for analysis. In addition, many statistical models treat the relationship between response and predictor variables as linear with a normal distribution. However, environmental issues are governed by numerous factors. Thus, traditional models may not be practical or robust enough to solve environmental problems. In other words, they have weak accuracy when modeling nonlinear relationships among many different variables. ANN is capable of evaluating complex nonlinear relationships with high accuracy. At the same time, the ANN technique is flexible and can reveal hidden relationships among data. Therefore, it facilitates the modeling of nonlinear behavior. ANN is modeled on the biology of the human brain, in which millions of neurons are linked together to process complex information [6]. A suitable method is needed to model THM compounds, because they are affected by many different factors and standard mathematical models are not capable of analyzing them. Because of the simplicity and strength of ANN for simulation, prediction, and modeling, many researchers have used it [6,7]. Recently, many scientific fields, including water engineering and the biological and environmental sciences, have adopted the neural network approach. For example, it is used to simulate and predict the concentrations of different pollutants in air, water, and soil [8][9][10][11][12][13][14][15][16].
Recent studies have shown that ANN is a vital tool for enhancing the performance of water resources management systems [17,18]. This approach also describes the behavior of water quality parameters with higher accuracy than other methods, e.g., linear regression [14]. Since surface water is one of the main drinking water sources in Iran, and chlorination is the most frequently used method of disinfecting drinking water, the formation of THM compounds in treated water has increased.
In recent years, drinking water resources in Khuzestan province, including the Ahvaz 2, Ahvaz 3, Mahshahr, Khorramshahr, and Shoushtar water treatment plants and the Karoun River, have been confronted with high levels of pollution and a water shortage crisis. Therefore, the potential for THM formation during water treatment is high, and real-time management of THMs with robust tools such as ANN is vital. In this study, ANN is used to predict the concentration of THMs in WDNs from the influencing parameters.

Type of Study
In this descriptive, analytical, cross-sectional study, the quantitative and qualitative levels of THMs and their precursors were evaluated and modeled at the water withdrawal points of the Karoun River, including Shoushtar, Ahvaz, Mahshahr, and Khorramshahr, during 12 months of sampling (from October 2014 to September 2015). In every sampling round, one sample was taken from raw water before the water treatment plant, and three samples were taken along the WDN (at its beginning, middle, and end). Sampling was performed twice each month. Water temperature, pH, free residual chlorine (FRC), and UV254 were measured during sampling. Water temperature (°C) was measured with a digital thermometer (made in Germany) with an accuracy of about ±0.05. FRC and pH were measured with a digital chlorine/pH meter (Palintest, England, model Multi 1000). In this colorimetric method, phenol red and diethyl-p-phenylenediamine (DPD) tablets are used as the pH and FRC indicators, respectively. The measuring ranges of the device are 0-5 mg/L for FRC and 6.8-8.4 for pH.
The concentration of THMs was measured by an Agilent 6890 Gas Chromatograph (USA) with a microelectron capture detector (µECD). All stages of the study, including sampling, sample preparation and stabilization, and measurements, were conducted according to USEPA standards (EPA-METHOD 551.1) in the hydrology laboratory of Ahvaz Water Treatment Plant No.2, Health faculty of Ahvaz Jondishapour University of Medical Sciences, and Iran Mineral Processing Research Center.
The samples taken by grab sampling were used to analyze dissolved organic carbon (DOC) and ultraviolet absorbance at a wavelength of 254 nm. DOC samples were analyzed with a Shimadzu TOC Analyzer-VCSH (Japan). The water samples collected for measuring bromide ions were analyzed with a Waters Alliance 2695 ion chromatograph (USA) equipped with a Waters 2465 electrochemical detector (USA).
As shown in Fig. 1, the neural network model takes input values that are multiplied by a set of weights. The results are aggregated in the neurons of the middle layer, and finally, the outputs are calculated using Equation 1 [7,9,16,19].
In this equation, y is the simulated output, W_ij^T is the transpose of the weight of input i for neuron j, P is the input vector, and b is the bias.
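The weighted-sum computation described above can be sketched as follows. This is a minimal illustration that assumes Equation 1 has the common form y = f(W^T P + b), where f is an activation function; the tanh hidden activation, the linear output neuron, and all function names are assumptions made for illustration and are not details confirmed by the study.

```python
import math


def identity(x):
    """Linear activation, assumed here for the output neuron."""
    return x


def neuron_output(weights, inputs, bias, activation=math.tanh):
    """Return activation(sum_i w_i * p_i + b) for a single neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(w * p for w, p in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer followed by one output neuron.

    hidden_weights[j] holds the weights feeding hidden neuron j.
    The tanh/linear activations are assumptions, not taken from the paper.
    """
    hidden = [
        neuron_output(w_j, inputs, b_j, math.tanh)
        for w_j, b_j in zip(hidden_weights, hidden_biases)
    ]
    return neuron_output(output_weights, hidden, output_bias, identity)
```

In the study's setting, `inputs` would hold the six water quality parameters and the return value would be the simulated THM concentration.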
First, the neural network was trained on a portion of the data to determine the best values for its weights and biases. The network was then tested on the remaining data. In the next step, several evaluation criteria were used to determine the best method for training the network. To determine the best number of hidden neurons, the network was built several times with different numbers of neurons and trained several times in each case, and the results were compared in terms of root mean square error (RMSE), mean squared error (MSE), and coefficient of determination (R²), as formulated in Equations 2-4. In these equations, N is the number of data items, P_i is the value predicted by the network, O_i is the value obtained from the experiments, and i is the index of the data items.
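The three evaluation criteria can be sketched as below, using the symbols defined above (P_i predicted, O_i observed, N data items). MSE and RMSE follow their standard definitions. For R², this sketch assumes the form 1 - SS_res/SS_tot; the study may instead have used the squared correlation coefficient, so this form is an assumption rather than the confirmed Equation 4.

```python
import math


def mse(predicted, observed):
    """Mean squared error: (1/N) * sum_i (P_i - O_i)^2."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n


def rmse(predicted, observed):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(predicted, observed))


def r_squared(predicted, observed):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / n
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Lower MSE and RMSE and an R² closer to 1 indicate closer agreement between the predicted and measured concentrations.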
To determine the relative importance of each input variable for the solution, a sensitivity analysis was performed using Equation 5 (Garson's equation) [7,20].
In this equation, I_j is the relative importance of the j-th input for the output, N_i and N_h are the numbers of input and hidden neurons, respectively, W denotes the connection weights, the indices i, h, and o refer to the input, hidden, and output layers, and the indices k, m, and n refer to input, hidden, and output neurons, respectively [7,9].
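A minimal sketch of the weight-partitioning idea behind Garson's equation, for a network with a single output neuron, is given below. It follows the commonly published form of the method: each input's share of the absolute weights entering a hidden neuron is scaled by the absolute weight leaving that neuron, summed over hidden neurons, and normalized over inputs. The function name and the weight layout are illustrative assumptions, and the exact form of Equation 5 used in the study may differ in notation.

```python
def garson_importance(input_hidden, hidden_output):
    """Relative importance of each input for a single-output network.

    input_hidden[k][m]: weight from input k to hidden neuron m.
    hidden_output[m]:   weight from hidden neuron m to the output.
    Returns a list of fractions that sum to 1.
    """
    n_inputs = len(input_hidden)
    n_hidden = len(hidden_output)
    contributions = [0.0] * n_inputs
    for m in range(n_hidden):
        # Total absolute weight entering hidden neuron m.
        column_total = sum(abs(input_hidden[k][m]) for k in range(n_inputs))
        if column_total == 0:
            continue  # this hidden neuron receives no signal
        for k in range(n_inputs):
            share = abs(input_hidden[k][m]) / column_total
            contributions[k] += share * abs(hidden_output[m])
    total = sum(contributions)
    if total == 0:
        raise ValueError("all connection weights are zero")
    return [c / total for c in contributions]
```

Because only absolute weights are used, the result ranks inputs by magnitude of influence and says nothing about the direction of the effect.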
After data collection, the gathered data were processed in Excel. Then, the artificial neural network for predicting THM concentration was built in MATLAB. The inputs of this network were six parameters that affect THM formation, and its output was the concentration of THMs in water.

Results And Discussion
THM concentration at the points of water withdrawal from Karoun River
Changes in THM concentration along the Karoun River from Shoushtar to Khorramshahr in different seasons, together with the standard concentration of THMs in water, are illustrated in Fig. 2. As Fig. 2 shows, the concentration of THMs in the Karoun River gradually increases as it flows from Shoushtar to Khorramshahr. The THM value of the Karoun River in Ahvaz is almost twice that in Shoushtar. While passing through Ahvaz, the river receives significant amounts of municipal and industrial wastewater, which increase its DOC and, therefore, its THM level. As a result, in Mahshahr, the river has on average 1.5 times more THM than when it arrives at Ahvaz. In Khorramshahr, the DOC level of the Karoun River is even higher than in Mahshahr, further increasing the potential for THM formation.
Both Mahshahr and Khorramshahr sections of the river are at risk of THM concentrations above the EPA standard. In all four points, the highest THM concentrations emerge during summer.
Variations and the relative importance of parameters affecting THM formation in the studied river sections
The average values of the six parameters affecting THM formation in the studied water samples (DOC, pH, water temperature, UV254, bromide, and chlorine demand) are provided in Table 1. A previous study reported that the potential for THM formation increased with increasing pH, resulting in THM concentrations of 9.7, 20.7, and 41.6 µg/L at pH levels of 5.5, 7, and 7.9, respectively [21]. A study by Liang and Singer has also shown that more THM tends to form at pH 8 than at pH 6 [3]. Some studies have reported a linear relationship between pH and the formation of THMs [2]. However, in our study, the effect of pH on THM concentration in the studied areas and the Karoun River was generally not significant, which could be due to the reasonably low variation in water pH along the length of the Karoun River and over each year.
As shown in Fig. 2, the concentration of THMs in the drinking water of the studied networks usually exceeds the recommended level in summer, when the water temperature rises. Water temperature is an uncontrollable factor dictated by environmental conditions. Since rising temperature greatly accelerates the decay of residual chlorine in water, it is challenging to maintain a specific chlorine concentration in water distribution networks during the hot months of the year, and high doses of chlorine must be used to ensure sufficient residual chlorine in the water [22]. According to Villanova et al. and Rodriguez et al., water temperature is one of the factors that significantly affect the formation of THMs in water [22,23]. One study reported that the total THM concentration in three water distribution systems was 34.2, 35.5, and 35.7 µg/L when the water was cooler than 15 °C and increased to 64.2, 40.6, and 60.8 µg/L when the water temperature was above 15 °C [22]. Our results showed that water temperature had a notable impact on the formation of THMs in the Shoushtar and Mahshahr water distribution networks.
The bromide ion concentration along the Karoun River also gradually increases from Shoushtar to Khorramshahr. The results showed an increase in bromide ion levels owing to the gradual increase in water electrical conductivity (EC) along the river. This increase is much more pronounced in the Khorramshahr section, where the bromide ion concentration is on average three times that of the Ahvaz section. According to the results, the only place where the bromide ion concentration significantly affects THM concentration is the Khorramshahr water distribution network.
One of the reasons for the high bromide ion concentration in the Khorramshahr segment of the Karoun River is its proximity to the Persian Gulf and the effects of the tides. Bromide ion is an inorganic precursor for the formation of disinfection by-products. This ion is naturally present in the groundwater of coastal areas (because of seawater seepage). In chlorinated water, bromide ions are oxidized by hypochlorous acid (HOCl), forming hypobromous acid (HOBr), which reacts with natural organic matter to form disinfection by-products. Many studies have shown that the simultaneous presence of bromide and chlorine in a drinking water source during the chlorination process can lead to brominated and bromochlorinated by-products [24][25][26].
In a study of the Greek coastal city of Heraclion, Kampioti et al. observed high concentrations of bromide ions in raw water (4.0-4.2 mg/L). They reported that the brominated components of THMs were dominant over the chlorinated components of disinfection by-products in drinking water [27]. In the present study, the amount of residual chlorine was found to be the factor with the most significant effect on THM concentration in the Mahshahr water distribution network and also an essential determinant of this parameter in other places, including Ahvaz water treatment plants No.2 and 3, Khorramshahr, and the Karoun River as a whole. The significance of the effect of free residual chlorine concentration on THM concentration in the studied water distribution networks is directly associated with the dose of chlorine used.
The optimal number of hidden neurons in the artificial neural network model was determined by examining networks with 5 to 15 neurons; in each case, the network was trained several times, and the results were compared in terms of MSE, RMSE, and R². Figure 3 shows the error of the models with different numbers of neurons for all points. The best number of neurons is the one with the lowest MSE and RMSE and an R² greater than 0.9. As can be seen, the network tries to find the best weights for the connections coming from every input and going into every neuron. At some point, the model obtains the best possible weights, and any further change in the weight matrix produces worse results with larger errors.
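The selection rule stated above (lowest error among candidates whose R² exceeds 0.9) can be sketched as follows. The function name, the layout of the `results` dictionary, and the numbers in the usage example are hypothetical and chosen only for illustration; they are not results from the study. Since RMSE is the square root of MSE, ranking by RMSE gives the same order as ranking by MSE.

```python
def select_hidden_neurons(results, min_r2=0.9):
    """Pick the hidden-neuron count with the lowest RMSE among those with R^2 > min_r2.

    results maps a hidden-neuron count to a dict with keys "rmse" and "r2",
    e.g. the best of several training runs for that count (hypothetical layout).
    Ties in RMSE are broken in favor of the smaller network.
    """
    eligible = {n: m for n, m in results.items() if m["r2"] > min_r2}
    if not eligible:
        raise ValueError("no candidate meets the R^2 threshold")
    return min(eligible, key=lambda n: (eligible[n]["rmse"], n))


# Hypothetical values for illustration only.
example_results = {
    5: {"rmse": 2.0, "r2": 0.95},
    7: {"rmse": 1.2, "r2": 0.97},
    9: {"rmse": 0.9, "r2": 0.85},  # lowest error, but fails the R^2 threshold
}
best = select_hidden_neurons(example_results)
```

In this made-up example the nine-neuron network is rejected despite its lower RMSE because its R² does not exceed 0.9.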
In this study, for all sampling points, the model inputs were six parameters affecting the concentration of THMs (DOC, water temperature, pH, bromide ion concentration, UV254 absorbance, and residual chlorine content of water), and the model output was the concentration of THMs. Accordingly, the model was built with six neurons (the six water parameters) in the input layer and one neuron (the simulated THM concentration) in the output layer. The numbers of hidden neurons for the Shoushtar, Ahvaz 2, Ahvaz 3, Mahshahr, and Khorramshahr water distribution networks were 7, 13, 8, 7, and 7, respectively (Fig. 3). Neural network training was performed with 70% of the database to determine the best weights and biases; then 15% of the database was used to validate the model, and the last 15% was used to test the ability of the model to predict and simulate THM concentrations. The results of testing the developed artificial neural network for all sites are presented in Fig. 4, which shows the relationship between predicted and measured THM concentrations at all sites.
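The 70/15/15 partition described above can be sketched as a random split. The study does not state how records were assigned to each subset, so the random shuffling, the fixed seed, and the function name below are assumptions made for illustration.

```python
import random


def split_dataset(records, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle records and split them into training, validation, and test subsets.

    The test subset receives whatever remains after the first two subsets.
    Random assignment with a fixed seed is an assumption, not the paper's method.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for testing")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test
```

Keeping the test subset untouched during training and validation is what allows it to indicate how the model performs on unseen samples.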
The THM concentration values predicted by the network for each site are very close to and highly consistent with the measurements made at those points. Accordingly, no notable difference was observed between the predicted and measured THM concentrations.
This consistency is shown more clearly in Part B of Fig. 4, where the values simulated by the network are plotted against the measured values. In this diagram, most points are close to the bisector line representing R² = 1, indicating that R² is greater than 0.95 for all sites. This is a desirable level of consistency for environmental data.
Part C of Fig. 4 shows the error of the simulated values relative to the measured concentrations. As can be seen, over 90% of the data have almost zero error, indicating a high level of accuracy. The high accuracy of the model is also reflected in Part D of Fig. 4, which shows the histogram of error values. As this diagram demonstrates, the error histogram has a normal-like distribution, with the data points most frequently located around zero error. This indicates the excellent performance of the neural network in modeling and predicting the concentration of THMs [8,9,11,12].

Conclusion
The analysis of water samples taken from all points of water withdrawal from the Karoun River showed a gradual increase in the quality parameters that affect THM formation, including water temperature, DOC, and bromide ion, and consequently a gradual increase in THM concentration in water distribution networks from Shoushtar to Khorramshahr. All studied water distribution networks showed much higher THM concentrations in hot seasons (spring and summer) than in cold seasons (autumn and winter), with the difference being more pronounced in Ahvaz water treatment plant No.3 and the Mahshahr and Khorramshahr water distribution networks. In the Shoushtar water distribution network, the concentration of THMs and their components is much lower than the Iranian standard levels and WHO guidelines. However, in Ahvaz water treatment plants No.2 and 3, the Mahshahr water distribution network, and the Khorramshahr water distribution network, these concentrations occasionally exceed the Iranian and WHO standards in spring and summer.
DOC and free residual chlorine had the most significant impact on THM formation in all studied sections and in the Karoun River as a whole. The presence of free residual chlorine in the studied sections can be partly attributed to the use of chlorine in excessively high doses, especially in the warm seasons. While THM formation is typically influenced by water pH (the higher the pH, the higher the THM formation), in this study, water pH had little effect on THM levels in the studied water networks because of the small pH changes along the Karoun River and over each year.
The results showed that the developed artificial neural network could produce acceptably accurate predictions of THM concentration in the studied water distribution networks. This model can be used with reasonable accuracy to estimate THM concentrations, so it can help organizations and authorities avoid costly THM measurements. Given the parameters included in the model, it can also facilitate the adoption of appropriate strategies to control THMs. The modeling results of this study suggest that the majority of the studied water treatment plants will benefit from more aptly chosen DOC and chlorine dose control measures for controlling THMs, although further technical and economic assessments are needed to decide which strategy would be more appropriate for and responsive to the situation.