Evaluation of the water quality data
As previously mentioned, standard monitoring of Lake Ludas includes reasonably regular analysis of 35 water quality parameters. The main issue with these data is that, over the years and amid various restructuring, some of the monitored parameters were replaced with others. Another inconvenience is that some of the parameters are analyzed monthly, while others are examined only four times a year. This is a consequence of the complexity of the overall location. As seen in Figure 1, the complete area is made up of nine separate segments: Lake Palic comprises seven of them, Lake Omladinsko is the eighth, and Lake Ludas the ninth. Hence, analyzing all 35 parameters in just one monthly sample from each segment would result in 315 analyses per month (9 × 35), or 3780 per year. This inevitably leads to attempts at reducing the number of tests conducted. Therefore, to conduct a thorough data analysis, we had to significantly reduce the amount of data included in the research, either by reducing the variety of examined quality parameters or by decreasing the temporal frequency at least threefold (from 12 data sets per year to 4). Since this research aims to investigate the temporal and spatial variations of the quality data, reducing the variety of water quality parameters was the preferable choice. Accordingly, the initial 35 parameters were reduced to 15 (Table 1). These are the parameters that were generally measured regularly at a monthly pace.
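The sampling load quoted above follows from simple multiplication. A minimal sketch, with variable names chosen here purely for illustration:

```python
# Sampling load if every parameter were analyzed once a month in every segment.
segments = 9      # seven in Lake Palic, plus Lake Omladinsko and Lake Ludas
parameters = 35   # parameters in the standard monitoring program

monthly_analyses = segments * parameters      # analyses per month
annual_analyses = monthly_analyses * 12       # analyses per year

# Option 1: keep all parameters but sample quarterly (frequency cut threefold).
quarterly_option = segments * parameters * 4

# Option 2: keep monthly sampling but only the 15 selected parameters.
reduced_parameter_option = segments * 15 * 12

print(monthly_analyses, annual_analyses)
print(quarterly_option, reduced_parameter_option)
```

The second option keeps the monthly resolution needed for studying temporal variation, which is the route taken in the text.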
Table 1
Methods and measuring uncertainties of the selected water quality parameters
No. | Parameter | Notation | Units | Method | Measuring uncertainty
1 | Temperature | Temp. | °C | 2250B* | 1.15%
2 | Power of hydrogen | pH | - | 500-H+B* | 1.23%
3 | Conductivity | Cond. | µS/cm | SRPS EN 27888/2009 | 17%
4 | Dissolved oxygen | DO | mg/L | SRPS EN 25813:2009/1/2011 | 19%
5 | Dichromate chemical oxygen demand | CODCr | mg/L | EPA 410.4 US 5220D, ISO 15705 | 17%
6 | Potassium permanganate chemical oxygen demand | CODMn | mg/L | 4500-KMnO4* | 11%
7 | Five-day biochemical oxygen demand | BOD5 | mg/L | SRPS EN 1899-1:2009 | 22%
8 | Chlorophyll-a | Chl-a | mg/m3 | 10200-H* | 25%
9 | Suspended sediments | SS | mg/L | 2540D* | 22%
10 | Total phosphorus | TP | mg/L | SRPS EN ISO 6878:2008 | 13%
11 | Orthophosphates | PO43- | mg/L | SRPS EN ISO 6878:2008 | 13%
12 | Total Kjeldahl nitrogen | TN | mg/L | 4500-N.C* | 3.5%
13 | Nitrite nitrogen | NO2--N | mg/L | SRPS EN 26777:2009 | 21%
14 | Nitrate nitrogen | NO3--N | mg/L | SRPS EN ISO 10304-1:2009 | 23%
15 | Ammonia nitrogen | NH4+-N | mg/L | ** | 2.2%
* Clesceri et al. (1999) ** Čoha (1990)
Table 2 gives the descriptive statistics, including the mean value, standard deviation (SD), minimum, median, and maximum value of the 15 quality parameters considered.
Table 2
Quality parameter characteristics
No. | Variable | Mean | SD | Minimum | Median | Maximum
1. | Temp | 16.371 | 7.701 | -4.30 | 16.20 | 29.50
2. | pH | 9.034 | 0.660 | 7.06 | 9.05 | 10.63
3. | Cond. | 1354.303 | 446.474 | 805.00 | 1253.00 | 4162.00
4. | DO | 10.843 | 4.582 | 1.78 | 10.64 | 24.40
5. | CODCr | 185.765 | 111.819 | 31.00 | 155.00 | 860.00
6. | CODMn | 26.350 | 12.892 | 6.36 | 24.59 | 92.43
7. | BOD5 | 58.076 | 45.731 | 3.00 | 46.00 | 206.00
8. | Chl-a | 382.078 | 364.549 | 0.10 | 273.00 | 2422.00
9. | SS | 73.890 | 112.143 | 0.13 | 38.00 | 800.00
10. | TP | 0.315 | 0.298 | 0.01 | 0.23 | 1.58
11. | PO43- | 5.215 | 8.806 | 0.00 | 0.06 | 49.37
12. | TN | 16.938 | 8.413 | 3.29 | 15.54 | 49.46
13. | NO2--N | 0.011 | 0.018 | 0.00 | 0.00 | 0.12
14. | NO3--N | 0.266 | 0.166 | 0.05 | 0.24 | 0.90
15. | NH4+-N | 1.367 | 0.584 | 0.15 | 1.27 | 3.53
The next step in the study was to investigate which of the remaining parameters are representative for the upcoming quality analysis by computing the Pearson correlation coefficient. The Pearson correlation coefficient r takes values ranging from -1, indicating a complete negative correlation of the considered parameters, to +1, indicating a complete positive correlation. A value of r approaching zero indicates the absence of a linear correlation between the examined parameters.
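As an illustration of the definition, the coefficient can be computed directly from two series of paired measurements. The sketch below uses only the Python standard library; the function name `pearson_r` and the sample values are made up for illustration and are not measurements from Lake Ludas.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values.
ss = [10.0, 20.0, 30.0, 40.0, 50.0]
chl_a = [12.0, 25.0, 29.0, 46.0, 52.0]
print(round(pearson_r(ss, chl_a), 3))

print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly linear, increasing
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfectly linear, decreasing
```

The last two calls reproduce the limiting cases described above: a perfectly increasing linear relation and a perfectly decreasing one.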
Figure 2 presents examples of different degrees of Pearson correlation. Figure 2(b) displays the value of r for SS and Cond., where it is quite apparent that these two quality parameters are not correlated. On the other hand, Figure 2(a) shows the correlation of SS and COD, and Figure 2(c) the correlation of SS and Chl-a; both show distinct positive correlations of the considered quality parameters.
Table 3 presents the correlation matrix for the 15 remaining quality parameters, where a correlation is considered significant if r is greater than or equal to 0.5. Significant correlations were noticed between pH and Temp (r=0.526); CODCr and BOD5 (r=0.595), Chl-a (r=0.648), and SS (r=0.546); and Chl-a and SS (r=0.549) and TN (r=0.502).
Table 3
The Pearson correlation table
Further analysis helped to additionally reduce the number of parameters by including only one parameter from each group of available data describing the same characteristic. For example, the presence of organic matter in the water is described by CODCr, CODMn, and BOD5, yet including all of them in further research would bias the representation of the water quality. Thus, further studies were conducted to select the most appropriate parameter to represent each water quality characteristic. To pinpoint the most suitable parameters, the best-suited distribution for the measured data was examined separately for each of the considered groups (phosphorus content, nitrogen, and organic matter), arguing that such numerous water quality data should follow some type of distribution. This helped to identify the most reliable data in each group and to decide which should be discarded as less trustworthy. It should be pointed out that the representative distributions in this part of the study were not the best-fitting ones for each compound within a group; e.g., the best distribution for TP was the Frechet distribution, while for PO43- it was a mixture distribution. Because the selection relied on comparing the results, the same distribution had to be assigned to all compounds within a group. Consequently, the five best-fitting distributions for each compound in a group were considered and compared. After identifying the distribution that is common to all the data within a group and is the best among those available, the distributions presented in Table 4 were selected: the Extreme Value distribution for the nitrogen group, the Student T distribution for the phosphorus group, and the LogNormal distribution for the organic matter content.
Table 4
Distribution fit test results
Parameter | Distribution* | PCS + | CVM + | AIC - | BIC - | HQIC - | LogLik. + | Com. -
TN | Extreme Value | 0.219030 | 0.726715 | -6.730 | -6.745 | -6.746 | -3.352 | 2
NO2--N | Extreme Value | 0.000125 | 0.048696 | 2.557 | 2.542 | 2.541 | 1.292 | 2
NH4+-N | Extreme Value | 0.273204 | 0.085680 | 0.332 | 0.318 | 0.318 | 0.178 | 2
NO3--N | Extreme Value | 0.171210 | 0.515795 | -1.948 | -1.963 | -1.963 | -0.961 | 2
PO43- | StudentT | 0.000000 | 0.000000 | -3.917 | -3.939 | -3.940 | -1.938 | 3
TP | StudentT | 0.000021 | 0.067025 | -0.264 | -0.286 | -0.286 | -0.113 | 3
BOD5 | LogNormal | 0.413103 | 0.731957 | -9.809 | -9.824 | -9.823 | -4.893 | 2
CODCr | LogNormal | 0.531894 | 0.735407 | -11.580 | -11.595 | -11.594 | -5.779 | 2
CODMn | LogNormal | 0.004310 | 0.238261 | -7.612 | -7.627 | -7.627 | -3.793 | 2
* the + and - signs indicate whether higher test results are better (+) or lower (-)
In addition to the selected distributions, Table 4 presents the test results used in the decision process: the Pearson χ2 (Chi-square) test (PCS), the Cramer von Mises test (CVM), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Hannan-Quinn information criterion (HQIC), the log-likelihood (LogLik), and the complexity (Com). The PCS tests the goodness of fit, where the null hypothesis is that the data are drawn from a population with the considered distribution, while the alternative hypothesis is that they are not. The results are presented as p values, with a significance level of 0.05, where small p values indicate that the data are unlikely to have come from the considered distribution. The CVM test results are also presented as p values with the same meaning: smaller p values mean it is less likely that the data come from the considered distribution. The AIC includes a penalty for model complexity to protect against over-fitted models. Since it relies on the assumption of an infinite-sized sample, it is reliable only for comparing various models rather than as a general decision-making approach. Although the adjusted AIC formula can be used for sample sizes below 40, the considered dataset is much larger (over 100) and was analyzed using the standard AIC, with lower values indicating more appropriate models. The BIC, also called the Schwarz information criterion, likewise includes a penalty against over-fitted models, and smaller values indicate a better model fit. The HQIC is often used as an alternative to the AIC or BIC and measures the goodness of fit, where lower values imply fewer explanatory variables, a better fit of the data, or both. The LogLik also measures the model's goodness of fit, where higher values indicate a better fit.
The log-likelihood ranges from negative infinity to positive infinity, so its value on its own does not allow straightforward conclusions. Instead, it can be compared between various models to aid decision-making. Finally, the analysis includes Com., the complexity of the data distribution, where higher complexity is considered a negative characteristic since it can lead to over-fitted models.
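For orientation, the three information criteria can be written in terms of the maximized log-likelihood, the number of fitted parameters k, and the sample size n. The sketch below uses the textbook definitions; the function names and the example numbers are hypothetical, and the output is not meant to reproduce Table 4, whose values may be scaled differently by the software that was used (an assumption, since the text does not say).

```python
import math

def information_criteria(log_lik, k, n):
    """Return (AIC, BIC, HQIC) from the textbook definitions.

    log_lik: maximized log-likelihood of the fitted distribution
    k:       number of fitted distribution parameters (the complexity)
    n:       number of observations
    """
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    hqic = 2 * k * math.log(math.log(n)) - 2 * log_lik
    return aic, bic, hqic

def aic_corrected(log_lik, k, n):
    """Small-sample adjusted AIC (often written AICc)."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical fit: two-parameter distribution, 120 observations.
aic, bic, hqic = information_criteria(log_lik=-250.0, k=2, n=120)
print(aic, round(bic, 2), round(hqic, 2))
print(round(aic_corrected(-250.0, 2, 120), 2))
```

All three criteria share the term -2 log L and differ only in how strongly they penalize k, which is why they usually rank candidate distributions similarly for a fixed sample.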
Based on the results, we selected TN, TP, and CODCr as the most suitable parameters to represent the considered characteristics. After choosing the representative parameters, the measured data were further evaluated against the most appropriate distribution for each parameter to analyze how well the data fit it. The best-fitting distributions are not the same as the ones used for the selection of the representative parameters, since at this point there was no longer any need to assign a common distribution within a group. The best distribution for CODCr and TN was the LogNormal distribution, with distribution parameters μCOD=4.9987 and σCOD=0.5266 for CODCr and μTN=2.6767 and σTN=0.4746 for TN. The most suitable distribution for TP was the Frechet distribution with the distribution parameters αTP=1.8384, βTP=0.2297, and μTP=-0.0752. Figure 3 displays the probability scale plots for these distributions, confirming a good fit of the data.
A distribution fit test was carried out for the selected distributions, with the results given in Table 5, which show that the data match the selected distributions. Based on these test results, it was safe to conclude that the chosen parameters include reliable data and can provide a reasonable representation of their compound groups, making them a sound choice for the upcoming evaluation.
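The LogNormal parameters can be checked against the descriptive statistics in Table 2, because a LogNormal distribution with parameters μ and σ has median exp(μ) and mean exp(μ + σ²/2). A minimal sketch, assuming μ and σ refer to the natural logarithm of the concentration (the usual parameterization, although the text does not state it):

```python
import math

def lognormal_summary(mu, sigma):
    """Median and mean of a LogNormal(mu, sigma) distribution."""
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    return median, mean

cod_median, cod_mean = lognormal_summary(4.9987, 0.5266)  # CODCr
tn_median, tn_mean = lognormal_summary(2.6767, 0.4746)    # TN

print(f"CODCr: median={cod_median:.1f} mg/L, mean={cod_mean:.1f} mg/L")
print(f"TN:    median={tn_median:.1f} mg/L, mean={tn_mean:.1f} mg/L")
```

Under this assumption the fitted CODCr distribution has a median of roughly 148 mg/L and a mean of roughly 170 mg/L, and the fitted TN distribution a median of roughly 14.5 mg/L and a mean of roughly 16.3 mg/L, which is the same order as the sample medians (155 and 15.54 mg/L) and means (185.765 and 16.938 mg/L) in Table 2.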
Table 5
Distribution fit test results
Parameter | CODCr | TN | TP
Null hypothesis | The data is distributed according to the LogNormal distribution with parameters (4.9987, 0.5266) | The data is distributed according to the LogNormal distribution with parameters (2.6767, 0.4746) | The data is distributed according to the Frechet distribution with parameters (1.8384, 0.2297, -0.0752)
Test | Statistic (CODCr) | p value (CODCr) | Statistic (TN) | p value (TN) | Statistic (TP) | p value (TP)
Anderson-Darling | 0.2481 | 0.9714 | 0.6153 | 0.6333 | 1.8801 | 0.1071
Cramer von Mises test | 0.0433 | 0.9154 | 0.0861 | 0.6571 | 0.3376 | 0.1063
Pearson χ2 | 12.0989 | 0.7371 | 22.8684 | 0.0624 | 32.9398 | 0.0048
Conclusion | The null hypothesis is not rejected at the 5 percent level based on the Cramer von Mises test | The null hypothesis is not rejected at the 5 percent level based on the Cramer von Mises test | The null hypothesis is not rejected at the 5 percent level based on the Cramer von Mises test
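The decision rule behind the conclusion row can be sketched by comparing each p value in Table 5 with the 5 percent level. The dictionary layout and function name below are illustrative choices, while the numbers are copied from the table.

```python
ALPHA = 0.05  # significance level used in the text

# p values copied from Table 5.
p_values = {
    "CODCr": {"Anderson-Darling": 0.9714, "Cramer von Mises": 0.9154, "Pearson chi2": 0.7371},
    "TN":    {"Anderson-Darling": 0.6333, "Cramer von Mises": 0.6571, "Pearson chi2": 0.0624},
    "TP":    {"Anderson-Darling": 0.1071, "Cramer von Mises": 0.1063, "Pearson chi2": 0.0048},
}

def rejecting_tests(p_by_test, alpha=ALPHA):
    """Names of the tests whose p value falls below alpha."""
    return [name for name, p in p_by_test.items() if p < alpha]

for parameter, tests in p_values.items():
    print(parameter, rejecting_tests(tests))
```

With these values, the Cramer von Mises and Anderson-Darling tests do not reject the null hypothesis for any of the three parameters; the only p value below 0.05 is the Pearson χ2 result for TP.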
The statistical significance of Pearson's correlation was assessed through the p values of the given correlations. The null hypothesis was that the correlation coefficient of the bivariate population is equal to zero, while the alternative hypothesis was that it is not equal to zero. Higher p values (above the 5% limit) represent statistically insignificant correlations, meaning the null hypothesis of no correlation cannot be rejected. These values helped us distinguish quality parameter pairs with reasonable correlations. After evaluating the results, the following parameters were selected for further analysis: Cond. and pH, indicating the ion content of the sample; CODCr, measuring all organic contaminants, including those that are not biodegradable; Chl-a, as an indicator of algae; and SS, indicating the presence of minerals and organic substances.
The p values matching the Pearson coefficients for the selected parameters are given in Table 6, where the superscript ** marks pairs for which the null hypothesis should not be rejected, meaning the pair's correlation may be zero.
Table 6
Table of p values for the Pearson coefficients
Pair | Pearson r | 95% CI | p-value | Pair | Pearson r | 95% CI | p-value
Cond., CODCr | 0.263 | 0.097 to 0.415 | 0.0022 | pH, Cond. | -0.109 | -0.274 to 0.063 | 0.2126**
Cond., Chl-a | -0.072 | -0.239 to 0.100 | 0.4122** | pH, CODCr | 0.468 | 0.324 to 0.591 | <0.0001
Cond., SS | 0.014 | -0.157 to 0.184 | 0.8729** | pH, Chl-a | 0.45 | 0.302 to 0.576 | <0.0001
Cond., TP | -0.018 | -0.187 to 0.153 | 0.8388** | pH, SS | 0.376 | 0.219 to 0.513 | <0.0001
Cond., TN | 0.099 | -0.072 to 0.265 | 0.2562** | pH, TP | 0.2 | 0.031 to 0.358 | 0.0208
CODCr, Chl-a | 0.64 | 0.528 to 0.731 | <0.0001 | pH, TN | 0.056 | -0.115 to 0.224 | 0.5193**
CODCr, SS | 0.549 | 0.418 to 0.658 | <0.0001 | SS, TP | 0.444 | 0.296 to 0.571 | <0.0001
CODCr, TP | 0.395 | 0.241 to 0.530 | <0.0001 | SS, TN | 0.166 | -0.004 to 0.327 | 0.0560**
CODCr, TN | 0.273 | 0.108 to 0.423 | 0.0015 | TP, TN | 0.038 | -0.133 to 0.207 | 0.6617**
Chl-a, SS | 0.57 | 0.443 to 0.675 | <0.0001 | Chl-a, TN | 0.331 | 0.170 to 0.474 | 0.0001
Chl-a, TP | 0.47 | 0.326 to 0.593 | <0.0001 | | | |
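Confidence intervals and p values of the kind listed in Table 6 can be approximated with the Fisher z-transformation. The sketch below is one common way to do this and is not necessarily the procedure used by the authors; in particular, the sample size n = 133 is an assumption chosen here because it roughly reproduces the reported intervals, since the number of paired samples is not stated in this section.

```python
import math
from statistics import NormalDist

def fisher_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r from n paired samples."""
    z = math.atanh(r)                    # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)          # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def fisher_p_value(r, n):
    """Two-sided p value for H0: rho = 0 (normal approximation)."""
    z_stat = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))

N_ASSUMED = 133  # assumption, not a value reported in the text

for pair, r in [("Cond., CODCr", 0.263), ("Cond., SS", 0.014)]:
    low, high = fisher_interval(r, N_ASSUMED)
    p = fisher_p_value(r, N_ASSUMED)
    print(pair, round(low, 3), round(high, 3), round(p, 4))
```

A pair whose interval contains zero, such as Cond. and SS, is one for which the null hypothesis of zero correlation is not rejected at the 5% level.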
Spatial and temporal distribution of the water quality data
To better understand the causes and nature of the spatial and temporal alterations, the selected water quality data were further assessed by constructing box plots (Figures 4, 5, and 6) with marked median, interquartile range (IQR) as the range between the 25th (Q1) and 75th (Q3) percentile, maximum (as Q3 + 1.5 IQR), minimum (as Q1 − 1.5 IQR), and outliers. Figures 4 and 5 present the temporal distribution of the box plots for the seven selected water quality parameters, separately for the three sampling locations.
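The quantities marked on each box plot can be computed as follows. This is a minimal sketch using the Python standard library; the quartile interpolation rule (`method="inclusive"`) is an assumption, since plotting packages differ in how they estimate Q1 and Q3, and the sample values are made up.

```python
import statistics

def box_plot_summary(values):
    """Median, quartiles, IQR, 1.5*IQR limits, and outliers of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr   # "minimum" as defined in the text
    upper_limit = q3 + 1.5 * iqr   # "maximum" as defined in the text
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        "outliers": outliers,
    }

# Made-up concentrations with one unusually high value.
sample = [12, 15, 16, 18, 20, 21, 23, 60]
summary = box_plot_summary(sample)
print(summary)
```

For this sample the value 60 lies above Q3 + 1.5 IQR and is therefore reported as an outlier.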
A general overview of the data shows an improvement of the water quality during the interval from 2013 to 2016 (possibly 2017). Although the improvement is not distinct in all of the quality indicators, there is a noticeable tendency supporting this statement. As an example, we can single out a decrease in the concentration of nutrients (TN and TP) in both the north and middle parts of Lake Ludas in 2013, and a clear increase in 2017, Figures 4(a), (b), (d), and (e). Matching trends are noticeable for SS and Chl-a, given in Figures 4(j), (k), (m), and (n), showing decreased concentrations from 2013 until 2016.
On the other hand, the same parameters show conflicting information in the south part of the lake, Figures 4(c), (f), (l), and (o). These different spatial tendencies can be explained by the inflow location of the Palic-Ludas channel, which carries untreated wastewater as well as runoff from agricultural lands, and by the proximity of two sampling locations (north and middle) to this inflow, as opposed to the remoteness of the south sampling location. Further examination supports this assumption, since all of the considered quality parameters show milder variations in the south than in the north and middle.
A more detailed examination of the data shows a significant rise in TP at the southern sampling location in 2015, visible in Figure 4(f), while according to Figures 4(d) and 4(e) the two other sites did not experience such sudden changes in phosphorus concentrations. Considering that the measurements are conducted only once a month, it is impossible to determine with sufficient certainty what caused this event. Such a conclusion would require a significant increase in the data collection frequency in space or time, potentially both, since a spatially and/or temporally denser data distribution would provide considerably more reliable information. By examining Figures 5(a), (b), and (c), we can recognize higher conductivity in the southern part of the lake in all the examined years, which also implies a significant spatial variation of the water quality between the considered locations. These spatial alterations of the water quality parameters also support the idea that additional water quality measurements are required.
Figure 6 displays the spatial distribution box plots jointly for the overall time interval from 2011 to 2018. This representation is used to eliminate the temporal influence of the data and help focus the attention onto the spatial variations. The results once again exhibit clear spatial distinctions of the water quality, affirming the previously noted poorer quality on the north part (Figures 6 (a), (b), (d), and (e)) that seems to be improving as one approaches the south part.
Considering that data are currently available only at the displayed locations, the only reasonable explanation one can draw is that the systematic spatial quality alterations result from the inflow of the Palic-Ludas channel at the north part of the lake. The influence of the Kires channel, or the effect of the agricultural lands surrounding the lake on its quality, would need to be monitored more thoroughly. Such an examination could be made by setting aside the temporal considerations and conducting extensive sampling all around the lake, which would provide insight into the micro-distribution of the quality parameters. On the other hand, a more comprehensive understanding of temporal changes in the lake's water quality would require more frequent water sampling. For this purpose, the authors initially propose daily sampling of the lake over a shorter time interval (e.g., a couple of weeks). Such data would provide an understanding of the frequency and intensity of temporal changes in the water quality. Furthermore, these values could be associated with precipitation measurements to assess the influence of the weather on the quality variations. Subsequently, even more frequent sampling should be conducted, e.g., hourly, to identify short-term alterations of the quality data and their causes. Given the limited amount of information that can be drawn from such long data series, regardless of the availability of spatially distributed data, the authors suggest implementing the proposed much denser measurements at least once for any lake, to help better adapt the monitoring approach to the encountered circumstances.
Finally, Figure 7 presents the scatter plot matrix, providing a graphical representation of the correlations between pairs of the analyzed water quality parameters. The red lines mark the 95% density ellipses indicating the correlation between two parameters, where narrower ellipses suggest a stronger correlation, such as between CODCr and Chl-a or CODCr and SS, which is also supported by the Pearson correlation coefficients given in Table 3. The different symbols in the scatter plot mark the sampling locations (north, middle, and south) in Lake Ludas. Another interesting observation is that a stronger correlation between the considered parameters coincides with more distinct clustering of the data. For example, the more strongly correlated pairs CODCr and Chl-a, and CODCr and SS, show more pronounced grouping than TP and pH, or TP and TN, which have low correlation coefficients, nearly circular ellipses, and quite dispersed measured values.
Principal component analysis
Principal component analysis (PCA) boils down to finding new variables, called principal components (PCs), that capture as much of the data variation as possible (Pastor et al. 2016). The principal components are linear combinations of the analyzed variables (in this case, water quality parameters), computed so that they are uncorrelated with each other and so that the first principal component captures the largest share of the original data variation, the second PC the next largest, etc. When the analyzed data sets are presented in a graph where the principal components are the axes, similar data points will cluster together, which can be used to attain a deeper insight into the nature of the water quality parameters (e.g., temporal, spatial, or other tendencies).
Table 7 gives the standardized variance where the sum of all variances equals the number of analyzed parameters (in this case, 7), the proportion of the variance contained in each PC, and the cumulative proportion of the standardized variance. We can see the first two components include 59% of the total variance, while the first three include over 72%.
Table 7
Variance of the principal components
Component | Variance | Proportion | Cumulative proportion
1 | 2.948 | 0.421 | 0.421
2 | 1.180 | 0.169 | 0.590
3 | 0.931 | 0.133 | 0.723
4 | 0.804 | 0.115 | 0.838
5 | 0.491 | 0.070 | 0.908
6 | 0.381 | 0.054 | 0.962
7 | 0.266 | 0.038 | 1.000
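The proportion and cumulative proportion columns of Table 7 follow directly from the variances: with standardized data the variances sum to the number of parameters (7), so each proportion is the variance divided by 7. A short sketch with the values copied from the table:

```python
from itertools import accumulate

# Standardized variances of the principal components (Table 7).
variances = [2.948, 1.180, 0.931, 0.804, 0.491, 0.381, 0.266]
n_parameters = 7  # the variances of standardized data sum to this number

proportions = [v / n_parameters for v in variances]
cumulative = list(accumulate(proportions))

for i, (v, p, c) in enumerate(zip(variances, proportions, cumulative), start=1):
    print(f"PC{i}: variance={v:.3f} proportion={p:.3f} cumulative={c:.3f}")
```

The printed values agree with the table to within rounding, including the 59% captured by the first two components and the 72% captured by the first three.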
The proportion of the standardized variance in each PC is also presented in the scree plot in Figure 8, which clearly shows that the first PC holds the largest share of the data variance, as indicated by the apparent drop in variance between components 1 and 2. In contrast, the differences in variance between all subsequent components are minor.
The coefficients that define the principal components as linear combinations of the analyzed water quality parameters are listed in Table 8. A higher absolute value of a coefficient implies a more significant impact of the water quality parameter on the given PC.
Table 8
Coefficients of the principal components
Parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7
pH | 0.361 | 0.280 | -0.037 | 0.709 | -0.297 | -0.426 | -0.128
Cond. | 0.038 | -0.791 | 0.464 | 0.153 | -0.058 | -0.133 | -0.337
CODCr | 0.487 | -0.228 | 0.122 | 0.194 | -0.063 | 0.410 | 0.697
Chl-a | 0.498 | 0.074 | -0.175 | -0.085 | -0.064 | 0.574 | -0.612
SS | 0.454 | 0.076 | 0.122 | -0.109 | 0.813 | -0.317 | -0.023
TP | 0.363 | 0.175 | 0.360 | -0.615 | -0.477 | -0.316 | 0.054
TN | 0.211 | -0.450 | -0.770 | -0.196 | -0.110 | -0.322 | 0.075
For instance, PC1 is relatively evenly influenced by three quality parameters, CODCr, Chl-a, and SS, with coefficients above 0.45, followed by pH and TP with coefficients of about 0.36. The next component, PC2, primarily consists of Cond. and TN, where the influence of Cond. (absolute coefficient 0.79) is more pronounced than that of TN (0.45). A similar distribution is observed for PC3, with absolute coefficients of 0.77 for TN and 0.46 for Cond. In the case of PC4, pH has a coefficient of almost 0.71, while TP has an absolute coefficient of about 0.61.
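One way to turn the coefficients in Table 8 into shares is to square them: each coefficient vector has approximately unit length, so the squared coefficients of a component sum to about one and can be read as the fraction each parameter contributes to that component. The sketch below does this for PC1 and PC2; the function name is illustrative.

```python
# Coefficients of the first two principal components (Table 8).
coefficients = {
    "PC1": {"pH": 0.361, "Cond.": 0.038, "CODCr": 0.487, "Chl-a": 0.498,
            "SS": 0.454, "TP": 0.363, "TN": 0.211},
    "PC2": {"pH": 0.280, "Cond.": -0.791, "CODCr": -0.228, "Chl-a": 0.074,
            "SS": 0.076, "TP": 0.175, "TN": -0.450},
}

def squared_shares(component):
    """Share of each parameter in a component, from squared coefficients."""
    total = sum(c ** 2 for c in component.values())
    return {name: c ** 2 / total for name, c in component.items()}

for pc, component in coefficients.items():
    length_squared = sum(c ** 2 for c in component.values())
    shares = squared_shares(component)
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    print(pc, f"sum of squares={length_squared:.3f}",
          [(name, round(share, 3)) for name, share in ranked[:3]])
```

On this reading, Cond. accounts for a little over 60% of PC2 and TN for about 20%, while CODCr, Chl-a, and SS each contribute roughly a fifth to a quarter of PC1.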
Since the first two principal components contain nearly 60% of the data variance, a two-dimensional representation of the measurements is reasonable. After the computation of the principal components, the original data can be shown on biplots (Figures 9, 10, and 11), where the horizontal and vertical axes mark the first two principal components. The biplots also contain (angled) axes representing those water quality parameters whose variation is included in the first two principal components, as well as the percentage of their impact.
The biplot in Figure 9 reveals a specific clustering of the data by sampling location (north, middle, and south), regardless of the sampling time. Nevertheless, some influence of the sampling time is anticipated. Consequently, we provide two additional biplots, given in Figures 10 and 11, showing potential grouping in terms of the timing of the data acquisition.
Figure 10 displays the accumulation of data depending on the sampling months for all three locations. In light of the large amount of evaluated data, the same symbols and colors by seasons were utilized, dividing each year into four groups of three months (January to March – group 1, April to June – group 2, July to September – group 3 and October to December – group 4). To help distinguish data within one group, the sizes of the symbols were varied in decreasing order, labeling the symbol of the first month in each group as the largest and the symbol of the last month as the smallest.
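The grouping scheme described above can be expressed compactly. In this sketch the function names and the three marker sizes are hypothetical; only the month-to-group assignment and the largest-to-smallest ordering come from the text.

```python
# Hypothetical marker sizes: first month of a group largest, last month smallest.
MARKER_SIZES = (12, 8, 4)

def season_group(month):
    """Group 1 = Jan-Mar, 2 = Apr-Jun, 3 = Jul-Sep, 4 = Oct-Dec."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (month - 1) // 3 + 1

def marker_size(month):
    """Marker size by position of the month within its three-month group."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return MARKER_SIZES[(month - 1) % 3]

for month in range(1, 13):
    print(month, season_group(month), marker_size(month))
```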
The analysis of the results given in Figure 10 does indicate a certain grouping of the data by both months and seasons, regardless of the location. Yet, the large amount of data makes it hard to identify more definite characteristics. Therefore, we included the biplot presented in Figure 11, which shows the monthly clustering using the same notation as Figure 10 but depicts only the data for the north part of the lake. Based on these results, we were able to draw some clear-cut conclusions. Namely, there is a distinct grouping of the data sampled in the first half of each year (January to June), which includes groups 1 and 2, marked with squares and circles and located on the left side of the biplot, in contrast to the values sampled in the second half of the year (July to December), which combines groups 3 and 4, marked with triangles and diamonds and located on the right side of the biplot.
Further analysis suggests a noticeable accumulation of the data during the three-month intervals, previously established as groups 1, 2, 3, and 4. Although this grouping is less pronounced, it is clearly identifiable since the first data group is mainly located in the lower-left quadrant of the biplot. The second is predominantly grouped within the upper left segment. In contrast, groups 3 and 4 seem to be almost evenly distributed on the upper and lower right quadrants.
A more exhaustive investigation considered the monthly clustering within each group (i.e., included the symbol sizes). Although there are some vague tendencies of monthly data accumulation, they are not nearly enough to draw reliable conclusions regarding their behavior. For that purpose, the authors once again suggest a more frequent data gathering campaign (e.g., daily sampling for a shorter time interval).
CCME water quality index
Although any substantial research requires a fair amount of data, standard approaches in water quality assessment rely on finally representing the overall water quality via a simplified methodology. Consequently, to provide a clearer understanding of the water quality, the evaluation of Lake Ludas was carried out by computing the water quality index (WQI), relying on the Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI), as the most suitable for the lake at hand (Davies 2006; Bilgin 2018; Horvat and Horvat 2020). This method was selected after analyzing several approaches presented in numerous research papers (Lumb et al. 2011; Voudouris and Voutsa 2012; Bhateria and Jain 2016; Ji et al. 2016; Liu et al. 2017; de Almeida and de Oliveira 2018). This study gives only a short overview of the CCME WQI method, while a detailed representation can be found elsewhere (CCME 2001, 2012; Lumb et al. 2006, 2011; Horvat and Horvat 2020).
The water quality index CCME WQI is computed as a combination of scope, marked as F1, frequency denoted with F2 and amplitude F3:
$$\mathrm{CCME\ WQI}=100-\left[\frac{\sqrt{F_{1}^{2}+F_{2}^{2}+F_{3}^{2}}}{1.732}\right] \tag{1}$$
where the factors are estimated using the following equations:
$$F_{1}=100\cdot\left[\frac{n_{nmo}}{n_{par}}\right], \quad F_{2}=100\cdot\left[\frac{n_{fail}}{n_{total}}\right] \tag{2}$$
$$F_{3}=\left[\frac{n_{nse}}{0.01\cdot n_{nse}+0.01}\right] \tag{3}$$
where $n_{nmo}$ denotes the number of parameters not meeting the objectives, $n_{par}$ is the total number of considered parameters (in this case $n_{par}=7$), $n_{fail}$ stands for the number of failed tests, and $n_{total}$ marks the total number of tests. The total number of tests is the product of the total number of considered parameters and the number of times the parameters were tested. The value $n_{nse}$ is the normalized sum of excursions, showing the collective amount by which individual tests fail to meet the objective, and is computed using the equation:
$$n_{nse}=\left[\frac{\sum_{i=1}^{n}\mathrm{excursion}_{i}}{n_{total}}\right] \tag{4}$$
where the sum runs over the failed tests. The individual excursion $\mathrm{excursion}_{i}$ represents the relative amount by which an individual measurement fails the objective and is determined according to:
$$\mathrm{excursion}_{i}=\left[\frac{x_{fail,i}}{y_{obj}}\right]-1, \quad \mathrm{excursion}_{i}=\left[\frac{y_{obj}}{x_{fail,i}}\right]-1 \tag{5}$$
where $x_{fail,i}$ is the value of the failed test and $y_{obj}$ is the objective of the test. The first expression applies when the test value must not exceed the objective, and the second when it must not fall below the objective.
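Equations (1) to (5) can be combined into a short routine. The following is a minimal sketch rather than the authors' implementation: the function names and the data layout are hypothetical, every objective is treated as an upper limit (an assumption made to keep the example short), and the sample values are made up.

```python
import math

def excursion(value, objective):
    """Excursion of a failed test whose value must not exceed the objective."""
    return value / objective - 1.0

def ccme_wqi(measurements, objectives):
    """CCME WQI from {parameter: [values]} and {parameter: upper limit}."""
    n_par = len(measurements)
    n_total = sum(len(values) for values in measurements.values())
    n_nmo = 0            # parameters with at least one failed test
    n_fail = 0           # failed tests
    excursion_sum = 0.0
    for name, values in measurements.items():
        limit = objectives[name]
        failed = [v for v in values if v > limit]
        if failed:
            n_nmo += 1
        n_fail += len(failed)
        excursion_sum += sum(excursion(v, limit) for v in failed)
    f1 = 100.0 * n_nmo / n_par            # scope
    f2 = 100.0 * n_fail / n_total         # frequency
    n_nse = excursion_sum / n_total       # normalized sum of excursions
    f3 = n_nse / (0.01 * n_nse + 0.01)    # amplitude
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732

# Made-up example with two parameters and three tests each.
example = {"SS": [30.0, 40.0, 70.0], "TP": [0.2, 0.3, 0.5]}
limits = {"SS": 35.0, "TP": 1.0}
print(round(ccme_wqi(example, limits), 1))
```

In this example one of two parameters fails (F1 = 50), two of six tests fail (F2 ≈ 33.3), and the excursions give F3 = 16, so the index comes out at about 64. A data set with no failed tests returns 100.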
Table 9
Water quality objectives for the computation of CCME WQI for Lake Ludas
No. | Parameter | Objectives
1. | SS (mg/L) | 35
2. | TP (mg/L) | 1
3. | Cond. (µS/cm) | 1500
4. | CODCr (mg/L) | 50
5. | Chl-a (mg/m3) | 100
6. | pH (-) | 8.5
7. | TN (mg/L) | 10a, 20b
a Objective from December until end of April, b Objective from May until end of November
The CCME WQI was determined in two ways: using the 7 selected water quality parameters to establish one representative WQI for the complete time interval of eight years, and computing one WQI for each year separately to detect alterations of the lake’s quality through the years. Considering the significant spatial variations of the quality parameters identified during the previous examinations, both approaches were performed for the three sampling locations separately. The desired objectives for the considered water quality parameters are given in Table 9.
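Because the TN objective in Table 9 depends on the time of year, any computation of the index has to select the objective by sampling month. A small sketch of how the objectives could be encoded; the dictionary and function names are hypothetical, while the values come from Table 9.

```python
# Fixed objectives from Table 9 (units as listed there).
FIXED_OBJECTIVES = {
    "SS": 35.0,       # mg/L
    "TP": 1.0,        # mg/L
    "Cond.": 1500.0,  # µS/cm
    "CODCr": 50.0,    # mg/L
    "Chl-a": 100.0,   # mg/m3
    "pH": 8.5,
}

def tn_objective(month):
    """TN objective in mg/L: 10 from December to April, 20 from May to November."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return 10.0 if month in (12, 1, 2, 3, 4) else 20.0

def objective(parameter, month):
    """Objective for a parameter in a given sampling month."""
    if parameter == "TN":
        return tn_objective(month)
    return FIXED_OBJECTIVES[parameter]

print(objective("TN", 4), objective("TN", 5), objective("SS", 5))
```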
The results for the CCME WQI are presented graphically in Figure 12, where Figure 12(a) depicts the water quality index for every year and sampling location, while Figure 12(b) presents the water quality index for the entire analyzed time period. A higher value of the CCME WQI describes a better water quality and vice versa.