Assessment of Spatial and Temporal Water Quality Distribution of Lake Ludas, Serbia

This work presents an analysis of both the spatial and temporal water quality distribution of Lake Ludas in the Republic of Serbia using water quality data from 2011 to 2018 at three different locations. By applying a set of standard methods, we reduced the initial 15 water quality parameters to 7 parameters representative of the subsequent temporal and spatial considerations. The selected parameters were subjected to a series of tests, such as spatial and temporal analysis. Principal component analysis (PCA) was employed to present the variation of the measurements most efficiently and to identify spatial and temporal tendencies. The PCA was expanded by the use of biplots, providing a more comprehensive understanding of the measurements. Finally, the overall state of the lake's quality was evaluated using the Canadian Council of Ministers of the Environment Water Quality Index method for each sampling location, both annually and for the overall time interval, and as one representative value for the whole lake. The presented research led to several conclusions, including the need for more detailed future measurements. It was shown that a reasonable monitoring approach leading to reliable conclusions should include much denser data in space and time. Furthermore, the necessity of three sampling locations remains relevant. In fact, a shorter list of monitored variables with denser acquisition in time and space would be preferable to a more diverse set of quality parameters evaluated at fewer locations or through temporally sporadic measurements.


Introduction
Considering the overall declining water quality induced by anthropogenic activities, there is an increasing need for efficient water quality monitoring and management. Yadav et al. (2019) investigated the influence of various land uses on 16 water quality parameters in the Mun River, concluding that there is a strong link between the water quality parameters and land usage. The estimation of the overall quality of a water body is done using various approaches aiming to reduce a large amount of data into a single number (Gradilla-Hernández et al., 2020). For example, a biological assessment of rivers was done using aquatic macroinvertebrates derived from the Tanzania River Scoring System (Dusabe et al., 2019). Computing the water quality index is another approach frequently utilized to evaluate water quality (Kükrer and Mutlu, 2019; Mutlu, 2019; Gradilla-Hernández et al., 2020; Radu et al., 2020). Others opt for hydrological simulations to provide the necessary current and future water quality information (Kumar, 2019). Models for flow and sediment transport simulations (Horvat et al., 2015), as well as heavy (toxic) metal transport simulations (Horvat and Horvat, 2016), can also be used to assess the water quality regarding sediment and/or other constituent content. However, one should keep in mind that water quality is often described with a considerable number of inter-correlated parameters, making the task of evaluating the acquired data sets quite difficult. Consequently, researchers opt for various approaches for establishing the most representative parameters to be utilized for a needed quality assessment.
These methods include visual inspection of the data, evaluation of the Pearson's correlation between the measured parameters (Palani et al., 2008; Radu et al., 2020), plotting of box and scatter plots, inspecting the most suitable distribution the given data will fit, or implementing the principal component analysis (PCA), often used to provide a deeper insight into the water quality data (Olsen et al., 2012). A critical feature of the PCA is its potential to help identify possible clusters of the gathered measurements and examine the impact of water quality on other environmental factors (Mandal et al., 2008). Researchers have also used PCA to evaluate water quality monitoring stations (Ouyang, 2005), streamline water quality management (Guo et al., 2018), and determine spatial and temporal changes in water quality (Zeinalzadeh and Rezaei, 2017).
The scope of this research was to implement an exhaustive investigation and analysis of a series of temporally and spatially distributed data in order to establish a more reasonable monitoring approach for a shallow lake. Mutlu (2019) carried out a similar study that included 11 points in space (sampling locations), which can provide excellent information on the spatial distribution of water quality parameters. However, it had only one year of monthly data, leaving room to enhance the temporal part of the research. Since standard lake management approaches rely on comprehensive field and laboratory work, we aimed to implement exhaustive research that considers the spatial and temporal distributions of the water quality data while reducing the number of monitored parameters as much as possible, without significant loss of information. Lake Ludas in the Republic of Serbia was selected as an example of a shallow lake with systematic measurements both in space and time. Relying on standard approaches, such as measurement visualization through box plots and scatter plots, the use of Pearson's correlation coefficient, and the inspection of numerous distributions with distribution fit tests to determine the most appropriate one, we reduced the initial number of quality parameters from 15 to seven. Further data analysis relied on the PCA and the preparation of biplots to identify potential grouping of the measurements. By establishing a reduced list of representative parameters for the evaluation of the water quality of Lake Ludas, we were able to simplify the water quality investigations, thus paving the way for measurements with increased frequency in space and time. Ultimately, we employed a standard methodology for representing the water quality by computing the Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI).

Study area
Lake Ludas is a nature reserve that represents one of the Ramsar sites in the Republic of Serbia, located in northern Vojvodina, about 12 km from the city of Subotica. The lake is approximately 5 km long, stretching in the north-south direction, and from 0.3 km to 1 km wide, occupying an area of approximately 593 ha. There are two surface inflows into the lake. One is from the Palic-Ludas channel, which enters the lake at its northwestern tip with an average annual discharge of roughly 0.3 m³/s and connects it to Lakes Palic and Omladinsko (Fig. 1). The other, less significant inflow (considering the small discharge entering the lake this way) is from the Kires channel in the lake's northern part. Since the inflow/outflow locations are positioned in the lake's northern region, it is justified to ask how this influences the spatial distribution of the water quality. Furthermore, Lake Palic serves as a recipient of the treated wastewater from the Wastewater treatment plant Subotica (WWTPS), and is also mostly surrounded by agricultural land that drains into the lake. Finally, the Palic-Ludas channel is an unofficial recipient of wastewater from numerous homes surrounding Lake Palic and the channel itself. Considering all this, both spatial and temporal variations of the quality status in Lake Ludas are to be expected.

Assessment Methodology
Another reason for selecting this location is the fact that Lake Ludas is a nature reserve site, implying that fairly regular measurements are conducted, covering a wide range of parameters for the time interval of the last eight years. Standard measurements conducted at Lake Ludas include monthly water sampling at three different locations in the northern, middle, and southern parts of the lake, marked with diamonds in Fig. 1. In this study, the 35 water quality parameters regularly assessed over the eight years from 2011 to 2018 were first reduced to 15, which are presented in Table 1, along with the measurement methodologies employed.
In order to investigate which of the remaining parameters are representative for the upcoming quality analysis, the Pearson correlation coefficient (Hinkle et al., 2003) was computed for each pair of quality parameters. In this way, the relation between the considered parameters was obtained by evaluating both the strength and direction of the potential correlations. Thus, the most appropriate parameters to represent each water quality characteristic were selected, and their number was reduced to seven.
These were further used in extensive data analysis, including spatial and temporal water quality distributions, the principal component analysis (PCA), and determining the water quality index relying on the Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI).
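The correlation screening described above can be illustrated with a minimal sketch. It assumes two equally long series of paired measurements; the function names and the sample values are hypothetical and are not taken from the Lake Ludas data set. The sketch returns the Pearson coefficient r together with the t statistic, which under the null hypothesis of zero correlation follows a Student t distribution with n − 2 degrees of freedom (the p value itself would be obtained from that distribution, e.g., with a statistics library).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing H0: correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical paired measurements (illustrative values only).
param_a = [1.0, 2.0, 3.0, 4.0, 5.0]
param_b = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(param_a, param_b)
t = t_statistic(r, len(param_a))
```

Computing r for every pair of parameters yields a correlation table of the kind referred to in the text; strongly correlated pairs are candidates for keeping only one representative parameter.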

Results
Evaluation of the water quality data
As previously mentioned, standard monitoring of Lake Ludas includes reasonably regular analysis of 35 water quality parameters. The main issue with these data is that over the years, amid various restructuring, some of the monitored parameters were replaced with others. Another inconvenience is that while some of these quality parameters are analyzed monthly, others are examined only four times a year. This is a consequence of the complexity of the overall location. Namely, as seen in Fig. 1, the complete area is made up of nine separate segments. Lake Palic incorporates seven of them, Lake Omladinsko is the eighth, and Lake Ludas the ninth. Hence, just one monthly sample within each segment would result in 315 analyzed values monthly (9 segments × 35 parameters), or 3780 annually. This inevitably leads to different attempts at reducing the number of tests being conducted. Therefore, to conduct a thorough data analysis, we had to significantly reduce the amount of data included in the research, either by reducing the variety of examined quality data or by decreasing the temporal frequency by a factor of three (from 12 data sets per year to 4). Since this research aims to investigate temporal and spatial variations of the quality data, the authors opted to reduce the variety of water quality parameters included in the research. This way, the initial number of 35 parameters was reduced to 15 parameters (Table 1). These are the measurements that were generally carried out regularly at a monthly pace. Table 2 gives the descriptive statistics, including the mean value, standard deviation (SD), minimum, median, and maximum value of the 15 quality parameters considered. The next step in the study was to investigate which of the remaining parameters are representative for the upcoming quality analysis by computing the Pearson correlation coefficient.
The Pearson correlation coefficient r ranges from −1, indicating a complete negative correlation of the considered parameters, to +1, indicating their complete positive correlation. A value of r approaching zero indicates the absence of correlation between the examined parameters. Temp (r = 0.526), COD Cr with BOD 5 (r = 0.595), Chl-a (r = 0.648) and SS (r = 0.546), Chl-a with SS (r = 0.549) and TN (r = 0.502).
Table 3 The Pearson correlation table
Further analysis helped us additionally reduce the number of parameters by including only one parameter from a group of available data for a considered characteristic. For example, the presence of organic matter in the water is described with COD Cr, COD Mn, and BOD 5, yet including all of these data in further research would provide a biased picture of the water quality. Thus, further studies were conducted to select the most appropriate parameter to represent each water quality characteristic. To pinpoint the most suitable parameters, the authors examined which distribution is best suited for the measured data for each of the considered groups separately (the phosphorus content, nitrogen, and organic matter), arguing that such numerous water quality data should follow some type of distribution. This helped us identify the most reliable data in each group and decide which should be discarded as less trustworthy. It should be pointed out that the representative distributions in this part of the study were not the best-fitting ones for each of the compounds within a group; e.g., for TP the best distribution was the Fréchet distribution, while for PO₄³⁻ it was a mixture distribution. The selection was made to compare the results, meaning the same distribution had to be assigned to all compounds within a group. Consequently, the five best-fitting distributions for each compound in a group were considered and compared.
After identifying a distribution that is common to all the data within a group, and is the best among the available ones, the distributions presented in Table 4 were selected: the Extreme value distribution for the nitrogen group; the Student T distribution for the phosphorus group; and the LogNormal distribution for the evaluation of the organic matter content. In addition to the selected distributions, Table 4 presents the test results used in the decision process, which include the Pearson χ² (Chi-square) test (PCS), the Cramér-von Mises test (CVM), the Bayesian information criterion (BIC), the Akaike information criterion (AIC), the Hannan-Quinn information criterion (HQC), the Log Likelihood (LogLik), and the complexity (Com). The PCS tests the goodness of fit, where the null hypothesis is that the data are drawn from a population with the considered distribution, while the alternative hypothesis is that the data are not drawn from that population. The results are presented using the p value, with a significance level of 0.05, where small p values indicate a small chance that the data came from the considered distribution. The CVM test is also presented in the form of p values, with the same meaning as in the PCS case: smaller p values mean a smaller probability that the data are from the considered distribution. The AIC includes a penalty for model complexity to protect against over-fitted models. Since it relies on the assumption of an infinite-sized sample, it is reliable only as a comparison test of various models rather than as a general decision-making approach. Although the adjusted AIC formula can be used for sample sizes below 40, the considered dataset is much larger (over 100) and was analyzed using the standard AIC, with lower values indicating more appropriate models. The BIC is also called the Schwarz information criterion and, much like the AIC, it includes a penalty against over-fitted models; smaller values indicate a better model fit.
The HQC is often utilized as an alternative to the AIC or BIC and measures the goodness of fit, where lower values of HQC imply either fewer explanatory variables, a better fit of the data, or both. The LogLik also measures the model's goodness of fit, where higher values indicate a better fit. The Log Likelihood value ranges from negative infinity to positive infinity, making it inappropriate for drawing straightforward conclusions on its own. Instead, it can be used to compare various models and aid decision-making. Finally, the analysis includes the Com, indicating the complexity of the data distribution, where higher complexity is considered a negative characteristic since it can lead to over-fitted models.
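The relationship between the log likelihood and the three information criteria can be shown in a short sketch. It uses the standard textbook definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L, HQC = 2k ln(ln n) − 2 ln L, with k fitted parameters and n samples) and a maximum-likelihood LogNormal fit as the example distribution. The software the authors used is not stated, so this is only an illustration under those assumptions; the function names and the sample values are hypothetical.

```python
import math

def lognormal_mle(data):
    """Maximum-likelihood estimates (mu, sigma) of a LogNormal distribution."""
    logs = [math.log(v) for v in data]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def lognormal_loglik(data, mu, sigma):
    """Log likelihood of positive data under LogNormal(mu, sigma)."""
    total = 0.0
    for v in data:
        total += -math.log(v * sigma * math.sqrt(2.0 * math.pi))
        total += -((math.log(v) - mu) ** 2) / (2.0 * sigma ** 2)
    return total

def information_criteria(loglik, k, n):
    """AIC, BIC and HQC for a model with k parameters fitted to n samples."""
    return {
        "AIC": 2.0 * k - 2.0 * loglik,
        "BIC": k * math.log(n) - 2.0 * loglik,
        "HQC": 2.0 * k * math.log(math.log(n)) - 2.0 * loglik,
    }

# Hypothetical concentrations (illustrative values only, not measured data).
sample = [120.0, 95.0, 210.0, 160.0, 140.0, 180.0, 110.0, 250.0]
mu, sigma = lognormal_mle(sample)
criteria = information_criteria(lognormal_loglik(sample, mu, sigma), k=2, n=len(sample))
```

When several candidate distributions are fitted to the same data, the one with the lowest criterion values (and the highest log likelihood) would be preferred, which mirrors the comparison logic described for Table 4.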
Based on the results, we selected TN, TP, and COD Cr as the most suitable parameters to represent the considered characteristics. After choosing the representative parameters, we further evaluated the measured data against the most appropriate distribution for each parameter to analyze how well the data fit the best distribution. The best-fitting distributions are not the same as the ones used for the selection of the representative parameters, since at this point there was no longer any need to assign a common distribution within a group. The best distribution for COD Cr and TN was the LogNormal distribution, with distribution parameters μ COD = 4.9987 and σ COD = 0.5266 for COD Cr, and μ TN = 2.6767 and σ TN = 0.4746 for TN. The most suitable distribution for TP was the Fréchet distribution with the distribution parameters α TP = 1.8384, β TP = 0.2297, and μ TP = −0.0752. Figure 3 displays the probability scale plots for these distributions, confirming a good fit of the data.
For the selected distributions, we carried out the distribution fit test, with the results given in Table 5, which show that the data match the selected distributions. Based on these test results, it was safe to conclude that the chosen parameters include reliable data and can provide a reasonable representation of the compound group, making them a sound choice for the upcoming evaluation. The statistical significance of Pearson's correlation was assessed through the p values for the given correlations. The null hypothesis was that the correlation coefficient of the bivariate population is equal to zero, while the alternative hypothesis was that the correlation coefficient is not equal to zero. Higher p values (above the 5% limit) represent statistically insignificant correlations, meaning that the null hypothesis of no correlation cannot be rejected. These values helped us distinguish quality parameter pairs with reasonable correlations. After evaluating the results, the following parameters were selected for further analysis: Cond. and pH, indicating the ion content of the sample; COD Cr, measuring all organic contaminants, including those that are not biodegradable; Chl-a, as an indicator of algae; and SS, indicating the presence of minerals and organic substances.
The p values matching the Pearson coefficients for the selected parameters are given in Table 6, where the superscript ** indicates that the null hypothesis should not be rejected, meaning the pair's correlation may be zero.
Spatial and temporal distribution of the water quality data
To better understand the causes and nature of the spatial and temporal alterations, the selected water quality data were further assessed by constructing box plots, and (c) we can recognize higher conductivity in the southern part of the lake in all the examined years, also implying there is a significant spatial variation of the water quality between the considered locations. These spatial alterations of the water quality parameters also support the idea that additional water quality measurements are required. Considering that the current data are available only at the displayed locations, the only reasonable explanation one can extract is that the systematic spatial quality alterations result from the inflow location of the Palic-Ludas channel in the northern part of the lake. The influence of the Kires channel, or the effect the agricultural lands surrounding the lake have on its quality, would need to be monitored more thoroughly.
This type of examination could be made by setting aside the temporal considerations and conducting extensive data sampling all around the lake, since it could provide an insight into the micro-distribution of the quality parameters around it. On the other hand, a more comprehensive understanding of temporal changes in the lake's water quality would require more frequent water sampling. For this purpose, the authors initially propose daily sampling of the lake for a shorter time interval (e.g., a couple of weeks). This type of data would help in understanding the frequency and intensity of temporal changes in the water quality data. Furthermore, these values could be associated with precipitation measurements to assess the influence of the weather on the quality variations as well. Subsequently, an even more frequent sampling should be conducted, e.g., hourly values, to identify short-term alterations of the quality data and the causes behind them. Given the limited amount of information that can be drawn from such long data series, regardless of the availability of spatially distributed data, the authors would suggest implementing the proposed much denser measurements at least once for any lake, to help better adapt the monitoring approach to the encountered circumstances.
Finally, Fig. 7 presents the scatter plot matrix providing a graphical representation of the correlations between pairs of the analyzed water quality parameters. The red lines mark the 95% density ellipses indicating the correlation between the two parameters, where narrower ellipses suggest a stronger correlation, such as the correlation between COD Cr and Chl-a, and COD Cr and SS, also supported by the Pearson's correlation coefficients given in Table 3. The different symbols employed in the scatter plot mark the locations (north, middle, and south) in Lake Ludas. Another interesting observation that can be made based on the scatter plot is that a better correlation between the considered parameters coincides with more distinct clustering of the data. For example, the higher correlation of the previously mentioned COD Cr and Chl-a, and COD Cr and SS, also shows more pronounced grouping compared to TP and pH, or TP and TN, where both pairs have low correlation coefficients, circle-shaped ellipses, and quite dispersed measured values.

Principal component analysis
The principal component analysis (PCA) boils down to finding new variables, called the principal components (PC), that capture as much of the data variation as possible (Pastor et al., 2016). The principal components are linear combinations of the analyzed variables (in this case, water quality parameters), computed so that they are not correlated with each other, and so that the first principal component captures the most of the original data variation, the second PC includes the next most variation, etc. When the analyzed data sets are presented in a graph where the principal components are the axes, similar data points will cluster together, which can be used to attain a deeper insight into the nature of the water quality parameters (e.g., temporal, spatial, or other tendencies). Table 7 gives the standardized variance, where the sum of all variances equals the number of analyzed parameters (in this case, 7), the proportion of the variance contained in each PC, and the cumulative proportion of the standardized variance. We can see that the first two components include 59% of the total variance, while the first three include over 72%. The proportion of the standardized variance in each PC is also presented by a scree plot in Fig. 8, where it is clearly visible that the first PC encloses most of the data variance, as indicated by the apparent drop of variance between components 1 and 2. In contrast, the differences in variance between all the subsequent components are minor.
The coefficients that define the principal components, as a linear combination of the analyzed water quality parameters, are listed in Table 8. Higher absolute values of the coefficients imply a more significant impact of the water quality parameter on the considered PC. Since the first two principal components contain nearly 60% of the data variance, a two-dimensional representation of the measurements is reasonable. After the computation of the principal components, the original data can be presented on biplots, shown in Figs. 9, 10, and 11, where the horizontal and vertical axes mark the first two principal components. The biplots also contain (angled) axes representing those water quality parameters whose variation is included in the first two principal components, as well as the percentage of their impact.
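The quantities reported for the PCA (standardized variances that sum to the number of parameters, their proportions, and the coefficients defining each component) can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the PCA is carried out on the correlation matrix, and it uses plain power iteration with deflation instead of a library eigen-solver. All function names and sample values are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long data columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def leading_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm < 1e-12:  # nothing left to extract
            break
        vec = [v / norm for v in new]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec

def principal_components(columns, n_components):
    """Variance, proportion, cumulative proportion and coefficients per component."""
    matrix = correlation_matrix(columns)
    p = len(matrix)
    results, cumulative = [], 0.0
    for _ in range(n_components):
        value, vec = leading_eigenpair(matrix)
        cumulative += value / p
        results.append({"variance": value, "proportion": value / p,
                        "cumulative": cumulative, "coefficients": vec})
        # Deflation: remove the component just found before searching for the next.
        matrix = [[matrix[i][j] - value * vec[i] * vec[j] for j in range(p)]
                  for i in range(p)]
    return results

# Two hypothetical parameters (illustrative values only).
pcs = principal_components([[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0]], 2)
```

The sign of each coefficient vector is arbitrary, so only the relative signs and magnitudes within a component are meaningful. Projecting the standardized measurements onto the first two coefficient vectors gives the point coordinates used in a biplot.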
The biplot in Fig. 9 lets us recognize a specific clustering of the data with respect to the sampling location (north, middle, and south), regardless of the sampling time. Nevertheless, some influence of the timing of the sampling is anticipated. Consequently, we provided two additional biplots, given in Figs. 10 and 11, showing potential grouping in terms of the timing of the data acquisition. Figure 10 displays the accumulation of data depending on the sampling months for all three locations. In light of the large amount of evaluated data, we used the same symbols and colors by season, dividing each year into four groups of three months (January to March: group 1; April to June: group 2; July to September: group 3; October to December: group 4). To help distinguish data within one group, we varied the sizes of the symbols in decreasing order, with the symbol of the first month in each group being the largest and the symbol of the last month the smallest.
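The labeling scheme described above (four three-month groups, with symbol size decreasing within each group) can be expressed as a small sketch. The function names are hypothetical and serve only to make the mapping explicit.

```python
def season_group(month):
    """Three-month group (1-4) for a calendar month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (month - 1) // 3 + 1

def symbol_rank(month):
    """Position within the group: 0 = first month (largest symbol), 2 = last (smallest)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (month - 1) % 3

# Map every month to its (group, rank) pair.
labels = {month: (season_group(month), symbol_rank(month)) for month in range(1, 13)}
```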
The analysis of the results given in Fig. 10 does indicate a certain grouping of data both by months and by seasons, regardless of the location. Yet, the large amount of data makes it hard to identify more definite characteristics. Therefore, we included the biplot presented in Fig. 11, which gives the monthly clusterings using the same notation as in Fig. 10, but depicts only data for the northern part of the lake. Based on these results, we were able to draw some clear-cut conclusions. Namely, there is a distinct grouping of the data sampled in the first half of each year (January to June), which includes groups 1 and 2, marked with squares and circles and located on the left side of the biplot, in contrast to the values sampled in the second half of the year (July to December), which combines groups 3 and 4, marked with triangles and diamonds and located on the right side of the biplot.
Further analysis suggests a noticeable accumulation of the data during the three-month intervals, previously established as groups 1, 2, 3, and 4. Although this grouping is less pronounced, it is clearly identifiable, since the first data group is mainly located in the lower-left quadrant of the biplot, while the second is predominantly grouped within the upper-left quadrant. In contrast, groups 3 and 4 seem to be almost evenly distributed over the upper-right and lower-right quadrants.
A more exhaustive investigation involved considering monthly clusterings within each group (i.e., including the symbol sizes in our considerations). Although there are some vague tendencies of monthly data accumulation, they are not nearly enough to draw reliable conclusions regarding their behavior. For that purpose, the authors would once again suggest a more frequent data gathering campaign (e.g., daily sampling for a shorter time interval).

CCME water quality index
Although any substantial research requires a fair amount of data, standard approaches in water quality assessment rely on finally representing the overall water quality via a simplified methodology. Consequently, to provide a clearer understanding of the water quality, the evaluation of Lake Ludas was carried out by computing the water quality index (WQI), relying on the Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI), as the most suitable for the lake at hand (Davies, 2006). The CCME WQI is computed as a combination of scope, marked as $F_1$, frequency, denoted $F_2$, and amplitude, $F_3$:

$$\mathrm{CCME\ WQI} = 100 - \frac{\sqrt{F_1^2 + F_2^2 + F_3^2}}{1.732}$$

where the factors are estimated using the following equations:

$$F_1 = \frac{n_{nmo}}{n_{par}} \cdot 100, \qquad F_2 = \frac{n_{fail}}{n_{total}} \cdot 100, \qquad F_3 = \frac{n_{nse}}{0.01\, n_{nse} + 0.01}$$

where $n_{nmo}$ denotes the number of parameters not meeting the objectives, $n_{par}$ is the total number of considered parameters (in this case $n_{par} = 7$), $n_{fail}$ stands for the number of failed tests, and $n_{total}$ marks the total number of tests. The total number of tests is the product of the total number of considered parameters and the number of times the parameters were tested. The value $n_{nse}$ is the normalized sum of excursions, showing the collective amount by which individual tests fail to meet the objective, and is computed using the equation:

$$n_{nse} = \frac{\sum_{i=1}^{n_{fail}} \mathrm{excursion}_i}{n_{total}}$$

while the individual $\mathrm{excursion}_i$ represents the factor by which an individual measurement fails the objective and is determined according to:

$$\mathrm{excursion}_i = \frac{x_{fail,i}}{y_{obj}} - 1$$

where $x_{fail,i}$ is the value of the failed test, and $y_{obj}$ is the objective of the test. The CCME WQI was determined in two ways: using the 7 selected water quality parameters to establish one representative WQI for the complete time interval of eight years, and computing one WQI for each year separately to detect alterations of the lake's quality through the years. Considering the significant spatial variations of the quality parameters identified during the previous examinations, both approaches were performed for the three sampling locations separately.
The desired objectives for the considered water quality parameters are given in Table 9.
The results for the CCME WQI are presented graphically in Figure 12, where Figure 12(a) depicts the water quality index for every year and sampling location, while Figure 12(b) presents the water quality index for the entire analyzed time period. A higher value of the CCME WQI describes a better water quality and vice versa.
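The index computation described in this section can be sketched in a few lines. This is a simplified illustration that assumes every objective is an upper limit, so an excursion is the failed value divided by the objective, minus one; objectives given as minimum values or as ranges (such as pH) would need additional handling that is not shown here. The function name, parameter names, objectives, and measurements are hypothetical and do not reproduce Table 9.

```python
import math

def ccme_wqi(measurements, objectives):
    """Return (WQI, F1, F2, F3) for upper-limit objectives.

    measurements: dict mapping parameter name -> list of measured values
    objectives:   dict mapping parameter name -> maximum acceptable value
    """
    n_par = len(objectives)
    n_total = 0
    n_fail = 0
    failed_parameters = set()
    excursion_sum = 0.0
    for name, values in measurements.items():
        limit = objectives[name]
        for value in values:
            n_total += 1
            if value > limit:  # test fails the objective
                n_fail += 1
                failed_parameters.add(name)
                excursion_sum += value / limit - 1.0
    f1 = 100.0 * len(failed_parameters) / n_par  # scope
    f2 = 100.0 * n_fail / n_total                # frequency
    nse = excursion_sum / n_total                # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)               # amplitude
    wqi = 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732
    return wqi, f1, f2, f3

# Hypothetical objectives and measurements (illustrative values only).
objectives = {"param_a": 10.0, "param_b": 5.0}
measurements = {"param_a": [5.0, 20.0], "param_b": [1.0, 2.0]}
wqi, f1, f2, f3 = ccme_wqi(measurements, objectives)
```

Calling the function once with all years pooled and once per year, separately for each sampling location, would correspond to the two ways of determining the index described above.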

Discussion
After implementing a series of statistical analyses, including the determination of the distributions most suitable for the considered parameters, data visualization, and evaluation of the correlation coefficient, the initially considered 15 water quality parameters were reduced to 7 parameters sufficient to provide a trustworthy picture of the quality state of Lake Ludas. Although the implemented analysis did not yield a clear tendency, it seems to point to the inflow in the north of the lake as a possible cause of the water quality alterations, possibly identifying the Palic-Ludas channel as a contaminant source. By studying the temporal variations of the quality parameters, we noted an increase in the water quality from 2013 to 2016 (2017) in both the northern and middle parts of the lake, while the southern part seems to have its own trends (Fig. 4). On the other hand, the southern part of the lake has a higher conductivity in virtually all the examined years (Fig. 5), indicating possible pollution from fertilizers from adjacent fields on this narrow part of the lake.
Employing the PCA allowed us a more comprehensive insight into the overall quality alterations within the lake, both in space and time. Table 7 and Fig. 8 present the variations included in the computed principal components, where we can see that PC1 and PC2 contain nearly 60% of the data variation. Figure 9 presents a spatially oriented biplot that includes the first two principal components, intended for the identification of possible clusters regarding the locations. The displayed results support this idea, as the measurements clearly group according to the sampling location, which is in accordance with the analyzed box plots. Although there is some overlapping, the clustering according to sampling location confirms the spatial variation in the measured data, as well as the source of pollution coming from the northern part of the lake. Figures 10 and 11 depict the same measurements in the form of biplots, now labeled with regard to the month of sampling. The clustering is less evident for the results given in Fig. 10, since it encloses all of the measurements both in space and time, making it harder to make out potential accumulation based on various criteria. Regardless, we can notice some clustering inclination, suggesting that some months (e.g., January, February, March, etc.) seem to be grouped more around one side of the plot, while others (e.g., October, November, and December) are grouped on the other. To point out this kind of behavior, we employed the same notation and coloring for groups of three months, as explained in Section Principal component analysis, differentiating the months within a group only by varied symbol sizes. Using this notation principle, accompanied by the reduction of the displayed data to the northern sampling location, the biplot given in Fig. 11 makes it relatively easy to detect a couple of characteristics.
There is a clear distinction between the first two data groups, including sampling from January to June, and the third and fourth groups, including data from July to December. Furthermore, the data from the first group (January to March) have generally different quality compared to the remaining groups. The same can be stated for the second group, while the third and fourth groups have a more pronounced overlapping of the quality characteristics. This suggests that the quality variations tend to decrease towards the second half of the year as the colder months approach. Although the data give an impression of additional grouping by individual months, drawing such conclusions based on the presented data would be reckless. Such considerations would require a more thorough insight into the data, which can be provided by introducing temporally much denser data sampling. For instance, daily measurements of properly selected water quality parameters would allow us to identify additional time-dependent behavior. Nonetheless, it can be safely concluded that the data clustering is more pronounced with respect to the sampling locations than to the data acquisition timing. This judgment also corroborates the results identified on the box plots and makes it safe to suggest that the number of monitored sampling locations should not be reduced and could potentially be increased.
The computed CCME WQI (Fig. 12) demonstrated that the water quality of Lake Ludas improved progressively from 2013 to 2016 or 2017, which is consistent with the conclusions drawn from Figs. 4 and 5. Furthermore, these results support the idea that the water quality improves with distance from the inflow location in the north, regardless of the timing of the data sampling, suggesting that a significant source of pollution is located in the northern part of the lake, most probably the inflow water from the Palic-Ludas channel. Another conclusion that can be drawn is that the presently used three sampling locations are indeed necessary, since there is a calculable difference in CCME WQI, and thus in the water quality, between the northern part and the rest of Lake Ludas. Considering the lake's hydraulic peculiarity, described in Sect. 2.1, the primary mechanism of transferring pollutants to the southern part of the lake is wind. Therefore, while the number of monitored parameters can be reduced and still be sufficient to assess the lake's water quality, the three sampling locations should remain, since they can give valuable information about the fate of pollution coming from the lake's northern part. Moreover, including additional sites for data analysis should be contemplated, even if only as a one-time undertaking, since it would provide valuable insight into the spatial distribution of the water quality.
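For readers unfamiliar with the index, the following is a minimal sketch of the standard CCME WQI formulation, which combines scope (F1, the percentage of failed variables), frequency (F2, the percentage of failed tests), and amplitude (F3, derived from the normalized sum of excursions) as WQI = 100 - sqrt(F1^2 + F2^2 + F3^2) / 1.732. The function name `ccme_wqi`, the parameter names, the measured values, and the objectives below are hypothetical placeholders; they are not the objectives or data used in this study.

```python
import math

def ccme_wqi(measurements, objectives):
    """Compute the CCME WQI.

    measurements: dict mapping a variable name to a list of positive values.
    objectives:   dict mapping a variable name to (low, high) limits;
                  use None for a bound that does not apply.
    """
    total_tests = 0
    failed_tests = 0
    failed_variables = 0
    excursion_sum = 0.0
    for name, values in measurements.items():
        low, high = objectives[name]
        variable_failed = False
        for value in values:
            total_tests += 1
            if high is not None and value > high:
                excursion = value / high - 1.0   # objective is an upper limit
            elif low is not None and value < low:
                excursion = low / value - 1.0    # objective is a lower limit
            else:
                continue                         # test passed
            failed_tests += 1
            excursion_sum += excursion
            variable_failed = True
        failed_variables += variable_failed
    f1 = 100.0 * failed_variables / len(measurements)  # scope
    f2 = 100.0 * failed_tests / total_tests            # frequency
    nse = excursion_sum / total_tests                  # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                     # amplitude
    return 100.0 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732

# Hypothetical example: two variables, two samples each, upper limits only.
example_measurements = {"param_a": [1.0, 2.0], "param_b": [4.0, 1.0]}
example_objectives = {"param_a": (None, 3.0), "param_b": (None, 2.0)}
example_wqi = ccme_wqi(example_measurements, example_objectives)
print(round(example_wqi, 2))
```

In this example one of two variables and one of four tests fail, with a single excursion of 1.0, so F1 = 50, F2 = 25, and F3 = 20. Applying the same function separately per sampling location and per year would yield the kind of location-by-year comparison discussed above.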

Conclusion
Lake Ludas is a Nature Reserve site, so its water quality is monitored fairly regularly, with data distributed both spatially and temporally. These measurements include monthly water sampling at three locations in the north, middle, and south parts of the lake. Although 35 physical and chemical parameters are being monitored, preliminary analysis revealed that only 15 of them were observed monthly during the entire time interval from 2011 to 2018. Aiming to reduce the number of parameters included in the subsequent investigations, we carried out a statistical analysis that involved computing the correlation matrix, establishing the data distributions, and applying distribution fit tests, which helped us select the water quality parameters most representative for further analysis. The chosen parameters were Cond., pH, COD_Cr, Chl-a, SS, TP, and TN.
Further temporal and spatial analysis of the water quality included box plots, scatter plots, and principal component analysis. Box plots made it apparent that both a spatial and a temporal change of the water quality can be detected. The temporal alteration suggests an improvement of the quality status during the time interval from around 2013 until 2016 and 2017. This improvement is noticeable in the north and middle parts of the lake, while it is less clear in the south part. Another distinct tendency is the spatial variation of the water quality. It was identified that the north part of the lake has the worst quality, which seems to improve with distance from the north, implying that the primary source of pollution is the Palic-Ludas channel, which connects to the lake at its north part. Another observation suggests that the south part of the lake has a higher conductivity. Considering how narrow the lake is at this location and the fact that it is surrounded by agricultural land, fertilizer runoff is a plausible source of this pollution.
The principal component analysis provided additional insight into the data characteristics through biplots focusing on both the temporal and the spatial distribution. We were able to confirm the existence of data clustering in both aspects. Additionally, we showed a general seasonal character of the measurements, which can indeed be grouped through time. By compiling groups of three consecutive months, we showed that the water quality differs between the first and second half of a year. Furthermore, there is a noticeable change in the quality between the first and second trimester, while the alterations between the third and fourth trimester seem less obvious. Finally, one can also recognize variations of the water quality within a trimester by comparing values across months, but in this case the changes are less explicit. As an overall conclusion, we found that a more thorough evaluation of the spatial and temporal characteristics of the water quality changes requires much denser data gathering, including daily measurements for the temporal analysis and a considerably increased number of sampling locations for the spatial evaluation.
The computation of the CCME WQI supported the previously noted conclusions regarding the spatial and temporal tendencies, once again indicating that a significant source of pollution is the inflow of water from the Palic-Ludas channel. Furthermore, although the number of monitored water quality parameters can be reduced significantly while still allowing a representative CCME WQI to be computed, the necessity of three sampling locations remains relevant. The data gathered from these three locations can give valuable information about the spread of pollution coming from the northern part of the lake, as well as information regarding possible pollution from agricultural land located in the south section of Lake Ludas. Ultimately, we propose implementing an enhanced sampling campaign with denser data acquisition in both space and time to allow a better understanding of the lake quality alterations. Namely, the presented data include eight years of monthly data at three locations, yet they proved insufficient for an overall unambiguous quality assessment that would result in clear-cut conclusions regarding the contaminant sources or aftereffects.

Examples of the Pearson's correlation coefficients