Water quality parameters
Table 1 presents the main characteristics of the treated/chlorinated water of the 19 systems. In general, the water quality was maintained from the outlet of the chlorinated water storage tank to the end of the network. The temperature range is typical for tropical countries, and the pH values were close to 7. The turbidity and color of all samples were relatively low, indicating that the treatments were efficient and/or that the source waters were of good quality. Similarly, in most cases TOC and DOC were quite low. Moreover, the UV254 values indicate a low presence of humic substances, and SUVA, in most cases below 2 L/mg·m, suggests non-humic NOM and low molecular weight aliphatic compounds (Edzwald and Tobiason 2011).
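As a brief illustration of the SUVA values discussed above, the sketch below uses the standard definition SUVA = 100 × UV254 / DOC, with UV254 in cm⁻¹ and DOC in mg/L. The function name `suva` is hypothetical, and the inputs are simply the whole-data-set medians from Table 1, used for illustration only.

```python
def suva(uv254_per_cm: float, doc_mg_per_l: float) -> float:
    """Specific UV absorbance in L/mg·m: 100 * UV254 (cm^-1) / DOC (mg/L)."""
    if doc_mg_per_l <= 0:
        raise ValueError("DOC must be positive")
    return 100.0 * uv254_per_cm / doc_mg_per_l


# Illustrative inputs only: the median UV254 and median DOC of the whole data set
value = suva(0.0082, 0.48)
print(f"SUVA = {value:.2f} L/mg·m")
```

The result need not equal the median SUVA reported in Table 1, because the median of per-sample ratios generally differs from the ratio of two medians.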
Table 1 Water characteristics of the data sets
| Parameter (unit) | Whole data set (N = 216) | Spring water data set (N = 70) | Surface water data set (N = 67) | Mixed water data set (N = 79) |
|---|---|---|---|---|
| Temperature (°C) | 22.0 ± 6.0 (17.0 – 32.1) | 22.0 ± 5.9 (17.8 – 31.3) | 23.6 ± 6.0 (17.0 – 31.1) | 21.1 ± 4.1 (17.9 – 32.1) |
| pH | 7.30 ± 1.01 (5.94 – 8.17) | 7.30 ± 0.92 (6.00 – 7.94) | 7.27 ± 0.80 (6.19 – 7.90) | 7.30 ± 1.08 (5.94 – 8.17) |
| Turbidity (NTU) | 0.35 ± 0.79 (< 0.01 – 7.88) | 0.18 ± 0.27 (< 0.01 – 1.53) | 0.87 ± 1.35 (< 0.01 – 6.66) | 0.46 ± 0.68 (< 0.01 – 7.88) |
| Apparent color (U Pt-Co) | 3.15 ± 8.64 (< 0.01 – 31.30) | 0.01 ± 4.44 (< 0.01 – 11.00) | 4.67 ± 17.23 (< 0.01 – 31.30) | 4.11 ± 7.19 (< 0.01 – 30.97) |
| Free residual chlorine (mg/L) | 0.45 ± 0.37 (< 0.02 – 1.64) | 0.40 ± 0.34 (< 0.02 – 1.64) | 0.53 ± 0.41 (0.04 – 1.64) | 0.43 ± 0.38 (< 0.02 – 1.13) |
| TOC (mg/L) | 0.50 ± 0.38 (0.16 – 4.81) | 0.35 ± 0.18 (0.16 – 2.42) | 0.79 ± 0.58 (0.32 – 4.81) | 0.51 ± 0.23 (0.18 – 3.52) |
| DOC (mg/L) | 0.48 ± 0.37 (0.10 – 4.74) | 0.30 ± 0.22 (0.10 – 2.42) | 0.66 ± 0.56 (0.25 – 4.74) | 0.45 ± 0.28 (0.10 – 3.47) |
| UV254 (cm⁻¹) | 0.0082 ± 0.0093 (0.0004 – 0.0861) | 0.0046 ± 0.0034 (0.0004 – 0.0478) | 0.0153 ± 0.0245 (0.0043 – 0.0861) | 0.0091 ± 0.0078 (0.0017 – 0.0829) |
| SUVA (L/mg·m) | 1.99 ± 1.31 (0.15 – 14.06) | 0.15 ± 1.14 (0.15 – 14.06) | 2.31 ± 1.44 (0.82 – 4.74) | 2.25 ± 1.65 (0.26 – 8.88) |
| TTHM (µg/L) | 10.64 ± 15.24 (< 0.20 – 91.31) | 7.22 ± 6.79 (< 0.20 – 24.62) | 19.91 ± 27.90 (< 0.20 – 91.31) | 10.65 ± 17.12 (< 0.20 – 65.45) |

Values are given as median ± interquartile range (IQR = Q3 – Q1), with minimum – maximum in parentheses. Values preceded by "<" are below the detection limit.
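The median and IQR summary used in Table 1 can be sketched as follows. This is a minimal illustration, not the authors' procedure: the helper `median_and_iqr` and the sample readings are hypothetical, and the "inclusive" quartile convention is an assumption, since the text does not state which convention was used.

```python
import statistics


def median_and_iqr(values):
    """Return (median, IQR), where IQR = Q3 - Q1."""
    # The 'inclusive' method is an assumed convention; other quantile
    # conventions give slightly different quartiles for small samples.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical TTHM readings in µg/L, deliberately right-skewed
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
median, iqr = median_and_iqr(readings)
print(f"{median:.1f} ± {iqr:.1f}")
```

With skewed data such as these, the median and IQR are barely affected by the single high reading, which is one reason they are preferred over the mean and standard deviation for this kind of summary.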
The low values of the NOM-related parameters above, together with the low concentrations of free residual chlorine, explain the low TTHM concentrations: only two samples slightly exceeded the 80 μg/L limit set by the US EPA (US EPA 1998). Chloroform was the dominant THM species, occurring in the highest percentage on average (62% of the samples) and at low concentrations (10.60 ± 13.86 μg CHCl3/L). In addition, the species CHBrCl2, CHBr2Cl and CHBr3 were frequently found, but at much lower concentrations (i.e., < 2 μg/L). Such THM speciation has been reported in other studies (Sérodes et al. 2003). In general, for all parameters except temperature, pH and free residual chlorine, the surface water values were at least double the spring water values, while the values for the mixed and whole data sets fell in between (Table 1). This is expected, as surface water is highly influenced by allochthonous and autochthonous production, an effect that is also observed in the whole and mixed water data sets. Furthermore, the higher concentrations of precursors (e.g., TOC, UV254) are reflected in higher THM concentrations.
Correlation of independent variables with THMs in treated water
The Anderson–Darling test (Ryan 2007) showed that the dependent variable (TTHM concentration) and most of the independent variables were non-normally distributed in all data sets (p value < 0.05) (Table S1, Online Resource 1). This is expected because the data come from systems with different operational characteristics. The data were positively skewed, that is, most observations fall in the low range of each parameter and comparatively few in the high range. Therefore, Spearman's non-parametric rank correlation was used to evaluate the correlation between the variables (Kurajica et al. 2020).
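The normality check described above can be sketched with the Python standard library. This is an illustrative sketch rather than the software the authors used: the function names are hypothetical, the mean and standard deviation are estimated from the sample, and the decision uses the commonly tabulated 5% critical value (0.752) for the small-sample-adjusted statistic instead of an exact p value.

```python
import math
from statistics import NormalDist, mean, stdev

# 5% critical value for the adjusted statistic when mean and SD are estimated
CRITICAL_5_PERCENT = 0.752


def anderson_darling_adjusted(values):
    """Adjusted Anderson-Darling statistic for a normality check."""
    x = sorted(values)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment for estimated parameters
    return a2 * (1.0 + 0.75 / n + 2.25 / n**2)


def looks_non_normal(values):
    """True when normality is rejected at roughly the 5% level."""
    return anderson_darling_adjusted(values) > CRITICAL_5_PERCENT


# Hypothetical samples: one strongly right-skewed, one built from normal quantiles
skewed = [math.exp(i / 3) for i in range(30)]
near_normal = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
print(looks_non_normal(skewed), looks_non_normal(near_normal))
```

A positively skewed sample like `skewed`, with many low values and a few high ones, is the pattern the text describes for the TTHM data.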
Temperature and pH showed weak and non-significant correlations (|rs| < 0.3, p value > 0.05) in all data sets (Table 2), as expected given that both parameters were relatively stable (Table 1). This differs from the results of Al-Tmemy et al. (2018), who found significant and moderate correlations for both parameters in treated water from five treatment plants in Iraq. Such correlations are mechanistically plausible: an increase in temperature tends to increase the rate of reaction between organic matter and chlorine, and THM concentrations increase with pH because many of the hydrolysis reactions that promote their formation occur in basic medium.
Turbidity presented a weak correlation in most data sets (|rs| < 0.3), reaching rs = 0.321 in the surface water data set, and was significant (p value < 0.05) only in the whole and surface water data sets (Table 2). Tsitsifli and Kanakoudis (2020) reported a stronger correlation between turbidity and TTHMs (r = 0.553) for two treatment plants using surface sources. With regard to apparent color, Table 2 shows a weak but significant negative correlation only in the mixed water data set (rs = -0.275, p value = 0.023); in the other data sets the correlation was not significant. Abdel et al. (2014) reported Pearson correlation coefficients between THMs and color of 0.87 to 0.93 for treated water at four treatment plants in Egypt.
Free residual chlorine showed a significant correlation in the whole, spring and mixed water data sets (Table 2). The correlation was positive in all data sets and moderate wherever it was significant (rs = 0.392 to 0.489). In contrast, some authors have reported negative correlations between this parameter and TTHMs (Feungpean et al. 2015; Kumari and Gupta 2015). Such an inverse correlation can be attributed to the radial diffusion and wall consumption of residual chlorine during THM formation (Kumari and Gupta 2015). However, as in the present study, positive and significant correlations have also been reported and attributed to the covariance of operational parameters or to interactions between parameters (Salam et al. 2020).
With regard to NOM, TOC and DOC presented a moderate positive correlation (0.3 < rs < 0.7) that was significant (p value < 0.05) in all data sets (Table 2), in agreement with the correlation values of 0.47 to 0.57 reported by several authors (Kumari and Gupta 2015; Shahi et al. 2020). Because chlorine reacts with NOM to produce THMs, THM concentrations tend to increase as TOC and DOC increase, as long as sufficient free residual chlorine is available (Kumari and Gupta 2015). UV254 presented a significant and moderate positive correlation in the whole and surface water data sets; in the other data sets the correlation was weak and not significant. Similar significant and moderate correlations between UV254 and THMs have been reported by other researchers (Semerjian et al. 2009; Kumari and Gupta 2015). Finally, SUVA presented a significant correlation only in the mixed water data set, where it was weak and negative (Table 2). Other studies have also reported weak negative correlations for SUVA, although not significant ones (Babaei et al. 2015).
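The rank correlation discussed above can be sketched as a Pearson correlation computed on average ranks, which handles tied values. The helper names are hypothetical, and the "strong" label for |rs| ≥ 0.7 is an assumption that extends the weak (< 0.3) and moderate (0.3 to 0.7) bands used in the text; the sketch does not compute p values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def strength(rs):
    """Label |rs| with the bands used in the text ('strong' is assumed)."""
    size = abs(rs)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"


# Coefficients taken from Table 2 (whole data set)
for name, rs in [("Turbidity", 0.146), ("DOC", 0.492), ("SUVA", 0.014)]:
    print(name, strength(rs))
```

Because only ranks enter the calculation, any monotonic relation (for example y = x²  for positive x) gives rs = 1, which is why the coefficient suits skewed, non-normal data.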
Table 2 Spearman correlation between TTHMs and the independent variables
| Parameter | Statistic | Whole data set (N = 216) | Spring water data set (N = 70) | Surface water data set (N = 67) | Mixed water data set (N = 79) |
|---|---|---|---|---|---|
| Temperature | rs | 0.042 | -0.072 | 0.150 | -0.051 |
| | p value | 0.548 | 0.568 | 0.242 | 0.654 |
| pH | rs | -0.010 | -0.034 | -0.065 | 0.108 |
| | p value | 0.884 | 0.785 | 0.613 | 0.347 |
| Turbidity | rs | 0.146 | 0.050 | 0.321 | -0.182 |
| | p value | 0.036 | 0.687 | 0.010 | 0.111 |
| Apparent color | rs | 0.135 | 0.189 | 0.164 | -0.275 |
| | p value | 0.058 | 0.128 | 0.199 | 0.023 |
| Free residual chlorine | rs | 0.392 | 0.432 | 0.220 | 0.489 |
| | p value | < 0.001 | < 0.001 | 0.083 | < 0.001 |
| TOC | rs | 0.454 | 0.330 | 0.325 | 0.380 |
| | p value | < 0.001 | 0.007 | 0.009 | < 0.001 |
| DOC | rs | 0.492 | 0.366 | 0.370 | 0.430 |
| | p value | < 0.001 | 0.003 | 0.003 | < 0.001 |
| UV254 | rs | 0.337 | 0.224 | 0.357 | 0.113 |
| | p value | < 0.001 | 0.071 | 0.004 | 0.325 |
| SUVA | rs | 0.014 | -0.109 | 0.104 | -0.256 |
| | p value | 0.842 | 0.386 | 0.417 | 0.024 |
|
Modeling THMs formation within distribution system
As shown in Table 3, linear, logarithmic and exponential models were developed for each type of water. All models were significant (p value < 0.05 in the F test), and in most cases the Durbin–Watson value fell between 1.5 and 2.5, the range recommended in the literature to avoid autocorrelation problems (Tsitsifli and Kanakoudis 2020). The adjusted R2 of the models ranged widely, from 0.132 to 0.687, indicating varied performance and goodness-of-fit.
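For reference, the Durbin–Watson statistic mentioned above is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below uses hypothetical helper names and then applies the 1.5 to 2.5 range from the text to the Durbin–Watson values listed in Table 3.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def within_recommended_range(dw, low=1.5, high=2.5):
    """Apply the 1.5 to 2.5 range cited in the text."""
    return low <= dw <= high


# Durbin-Watson values reported in Table 3, keyed by model number
reported_dw = {
    1: 1.54, 2: 1.59, 3: 1.77, 4: 1.56, 5: 1.58, 6: 1.42,
    7: 1.84, 8: 0.84, 9: 1.61, 10: 2.12, 11: 1.74, 12: 2.14,
}
outside = [m for m, dw in reported_dw.items() if not within_recommended_range(dw)]
print("Models outside 1.5 to 2.5:", outside)
```

Residuals that alternate in sign push the statistic toward 4, while residuals that drift slowly push it toward 0, so values far from 2 in either direction point to autocorrelation.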
The most appropriate models were selected (Table 3) not only on the basis of the coefficient of determination but also on the error-related statistics (i.e., SE, MSE and MAE). For the whole, spring and mixed water data sets, models 1, 4 and 10, respectively, presented the lowest SE, MSE and MAE and were selected even though their R2 was lower than that of the corresponding logarithmic models. Nevertheless, the R2 values of these models, 0.448, 0.657 and 0.531, respectively (Table 3), remain satisfactory and comparable to those reported by several authors (Babaei et al. 2015; Feungpean et al. 2015; Tsitsifli and Kanakoudis 2020). In the surface water data set, model 7 presented the lowest SE, MSE and MAE and the highest R2 (Table 3). Therefore, models 1, 4, 7 and 10, all linear, were selected as those with the best performance and goodness-of-fit. Among them, the goodness-of-fit was greatest for the spring water model (the source of highest quality), followed by the mixed water model, the whole data set model and, finally, the surface water model. In general, these models can be considered moderately robust and could be improved by including further parameters and operational variables that affect THM formation in distribution networks (e.g., bromide ion, contact time, chlorine dose) (Nikolaou et al. 2004).
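The error statistics used for model selection can be sketched as below. The exact definitions are not given in the text, so this is an assumption: MSE is the sum of squared residuals divided by a chosen number of degrees of freedom (the number of observations by default), SE is the square root of MSE, and MAE is the mean absolute residual. The SE and MSE columns of Table 3 are at least consistent with SE ≈ √MSE (for example, 2.32² ≈ 5.38 for model 4). The function name and sample values are hypothetical.

```python
import math


def error_metrics(measured, predicted, dof=None):
    """Return (SE, MSE, MAE) for paired measured and predicted values."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    n = len(residuals)
    # Assumption: divide by n unless the caller supplies degrees of freedom
    divisor = n if dof is None else dof
    mse = sum(e * e for e in residuals) / divisor
    mae = sum(abs(e) for e in residuals) / n
    return math.sqrt(mse), mse, mae


# Hypothetical TTHM values in µg/L
measured = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.0, 16.0, 18.0]
se, mse, mae = error_metrics(measured, predicted)
print(f"SE = {se:.2f}, MSE = {mse:.2f}, MAE = {mae:.2f}")
```

Because MSE squares the residuals, a few large misses raise SE and MSE much more than MAE, which is why the three statistics are worth reading together.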
A more detailed analysis of the selected models shows which variables most influence THM formation for each type of water source. Model 1, similar to the models reported by Kumari and Gupta (2015), includes pH, free residual chlorine, DOC and UV254. Model 4, for the spring water data set, includes free residual chlorine, DOC and turbidity; the latter has also been used in other THM prediction models (Al-Tmemy et al. 2018). Finally, in models 7 and 10, for the surface and mixed water data sets, free residual chlorine and organic matter content (DOC and TOC, respectively) are the influential variables.
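As a consistency check, the adjusted R2 can be recomputed from R2, the sample size n and the number of predictors p with the usual formula, adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p − 1). The predictor counts below are an assumption inferred from the variables listed above (three for model 4, two each for models 7 and 10); the R2, n and adjusted R2 values are those reported in Table 3.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination for n observations, p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# model number: (R2, n, assumed number of predictors, reported adjusted R2)
selected = {
    4: (0.657, 40, 3, 0.628),
    7: (0.342, 40, 2, 0.306),
    10: (0.531, 44, 2, 0.508),
}
for model, (r2, n, p, reported) in selected.items():
    print(model, round(adjusted_r2(r2, n, p), 3), reported)
```

Under these assumed predictor counts the recomputed values agree with Table 3 to three decimals for models 4, 7 and 10. For model 1 with four predictors the same formula gives about 0.430, close to but not identical with the reported 0.429.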
Table 3 TTHMs predictive models for various data sets
| Data set | Model | Form | R2 | Adjusted R2 | F test (p value) | SE | MSE | MAE | Durbin–Watson | n |
|---|---|---|---|---|---|---|---|---|---|---|
| Whole data | 1 | Linear: | 0.448 | 0.429 | 25.43 (< 0.001) | 8.67 | 74.93 | 6.58 | 1.54 | 131 |
| | 2 | Logarithmic: | 0.568 | 0.557 | 52.14 (< 0.001) | 11.94 | 142.60 | 7.22 | 1.59 | 123 |
| | 3 | Exponential: | 0.331 | 0.314 | 19.42 (< 0.001) | 14.08 | 198.12 | 8.83 | 1.77 | 122 |
| Spring water | 4 | Linear: | 0.657 | 0.628 | 22.97 (< 0.001) | 2.32 | 5.38 | 1.93 | 1.56 | 40 |
| | 5 | Logarithmic: | 0.718 | 0.687 | 22.91 (< 0.001) | 2.78 | 7.73 | 2.08 | 1.58 | 41 |
| | 6 | Exponential: | 0.281 | 0.224 | 4.94 (0.005) | 4.82 | 23.23 | 3.35 | 1.42 | 42 |
| Surface water | 7 | Linear: | 0.342 | 0.306 | 9.60 (< 0.001) | 12.22 | 149.33 | 10.09 | 1.84 | 40 |
| | 8 | Logarithmic: | 0.328 | 0.291 | 8.78 (0.001) | 13.92 | 193.82 | 10.77 | 0.84 | 39 |
| | 9 | Exponential: | 0.177 | 0.132 | 3.97 (0.027) | 14.67 | 221.06 | 11.19 | 1.61 | 40 |
| Mixed water | 10 | Linear: | 0.531 | 0.508 | 23.23 (< 0.001) | 6.71 | 45.04 | 5.50 | 2.12 | 44 |
| | 11 | Logarithmic: | 0.630 | 0.593 | 17.04 (< 0.001) | 21.96 | 482.40 | 10.80 | 1.74 | 45 |
| | 12 | Exponential: | 0.501 | 0.466 | 14.39 (< 0.001) | 18.94 | 358.57 | 11.88 | 2.14 | 47 |
Nomenclature: TTHM: total trihalomethanes (µg/L); Cl: free residual chlorine (mg/L); UV254: ultraviolet absorbance at 254 nm (cm⁻¹); DOC: dissolved organic carbon (mg/L); AP: apparent color (U Pt-Co); T: turbidity (NTU).
Validation of THMs models
Table 4 presents the validation results (R2, SE, MSE and MAE) together with the t test results for each model. The R2 values ranged from 0.359 to 0.772, which indicates a satisfactory level of explanation of the observed variability and is comparable with the values reported by Golfinopoulos and Arhonditsis (2002) (i.e., 0.37 to 0.54). As in the calibration phase, the SE, MSE and MAE results showed that models 4 and 10 (spring and mixed water, respectively) performed better. In addition, the bias of the four models, assessed with a t test (Shahi et al. 2020), showed no statistically significant difference between the predicted and measured mean values (p value > 0.05, Table 4). Furthermore, Fig. 2 shows that most of the data fall within the prediction interval for all the models. For the whole data set and surface water models (Fig. 2a and 2c), the data tend to move away from the line of best fit above 30 µg/L. For the spring water and mixed water models (Fig. 2b and 2d), which cover lower TTHM concentrations, the data are distributed more evenly. Therefore, the models seem to perform better at TTHM concentrations below 30 µg/L.
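The bias check described above compares the mean predicted and mean measured concentrations. The text does not specify which t test variant was applied, so the sketch below assumes an independent two-sample test with pooled variance; a paired test on the differences would be an equally plausible reading. The function name and sample values are hypothetical, and the critical value 2.447 is the standard two-sided 5% value for 6 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(predicted, measured):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(predicted), len(measured)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(predicted) + (n2 - 1) * variance(measured)) / dof
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(predicted) - mean(measured)) / standard_error, dof


# Hypothetical validation values in µg/L
measured = [5.0, 7.0, 9.0, 11.0]
predicted = [6.0, 7.0, 8.0, 12.0]
t_value, dof = pooled_t_statistic(predicted, measured)

# Two-sided 5% critical value of Student's t for 6 degrees of freedom
T_CRITICAL_DOF_6 = 2.447
significant = abs(t_value) > T_CRITICAL_DOF_6
print(f"t = {t_value:.3f}, dof = {dof}, significant bias: {significant}")
```

A non-significant result, as in the "No" entries of Table 4, means the test found no evidence of systematic over- or under-prediction; it does not by itself show that individual predictions are accurate.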
Table 4 Validation of proposed models for the prediction of TTHMs in the distribution systems
| Data set | Model | N | R2 | SE | MSE | MAE | t value | p value | Significance |
|---|---|---|---|---|---|---|---|---|---|
| Whole data | 1 | 50 | 0.393 | 10.05 | 101.05 | 6.95 | 0.56 | 0.576 | No |
| Spring water | 4 | 14 | 0.598 | 2.83 | 8.03 | 2.31 | -1.05 | 0.303 | No |
| Surface water | 7 | 17 | 0.359 | 15.84 | 250.85 | 11.20 | 0.96 | 0.346 | No |
| Mixed water | 10 | 16 | 0.772 | 4.40 | 19.33 | 3.50 | 0.44 | 0.665 | No |