3.1.1. Hierarchical Cluster Analysis (HCA)
HCA was applied to the data matrix comprising the 18 sampling points and 28 variables. The variable Br− was removed from the analysis because its values were below the detection limit at all points in all months analyzed (Tables S4 and S6). The number of clusters was chosen with the Gap statistic, which takes the output of any hierarchical clustering algorithm and compares the change in within-cluster dispersion with that expected under an appropriate reference null distribution [55] (Fig. 2a). The resulting dendrogram grouped the 18 sampling points into four statistically significant clusters (Fig. 2b). Points within each cluster share similar characteristics and source types.
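The cluster-number selection described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: Ward linkage, a uniform reference distribution over the bounding box of the data, the number of reference sets and the function names (`hca_labels`, `gap_statistic`, `choose_k`) are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def within_dispersion(X, labels):
    """Pooled within-cluster sum of squared distances to each cluster centroid."""
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))


def hca_labels(X, k):
    """Cut a Ward dendrogram into (at most) k clusters. Ward linkage is an assumption."""
    return fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")


def gap_statistic(X, k_max=6, n_ref=20, seed=0):
    """Gap(k) and its simulation error s(k) for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, hca_labels(X, k)))
        # Reference sets: uniform draws over the bounding box of the data (assumed null)
        ref = np.array([
            np.log(within_dispersion(R, hca_labels(R, k)))
            for R in (rng.uniform(lo, hi, size=X.shape) for _ in range(n_ref))
        ])
        gaps.append(ref.mean() - log_w)
        errs.append(ref.std() * np.sqrt(1 + 1 / n_ref))
    return np.array(gaps), np.array(errs)


def choose_k(gaps, errs):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to the largest k tested."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return len(gaps)
```

In practice `X` would be the standardized points-by-variables matrix, and `choose_k(*gap_statistic(X))` would give the suggested number of clusters under these assumptions.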
Cluster 1 (P1, P3, P5, P9 and P13) contains the points considered most preserved, including headwaters of HUs with rural, urban and natural occupation. The headwaters of the Jardim River (P1) and Buriti Vermelho Stream (P3) belong to the Preto River watershed, a region with strong agricultural activity in the FD. Despite the intense activity in the region, the samples from these points share similar characteristics, such as low ion concentrations [56].
The headwaters of Chapadinha Stream (P5) and Sobradinho River (P9) belong to the São Bartolomeu River watershed, the former located in HU-RURAL and the latter in HU-URBAN. Although it is in an urban area, the source of the Sobradinho River lies within a Permanent Preservation Area (PPA). The source of Taquara Stream (P13), in turn, belongs to the Gama River Basin and is part of an Ecological Reserve.
Points P2, P4, P6, P7, P8, P11 and P14 make up the second grouping (Cluster 2). Points P2, P4, P6 and P8 belong to HUs with rural human influence and intense agricultural activity nearby. The other points in Cluster 2 (P7, P11 and P14) are similar to one another even though they comprise a spring in HU-RURAL (Cabeceira Comprida Stream, P7), a spring in HU-URBAN (Tamanduá Stream, P11) and a point with anthropic influence in HU-NATURAL (Taquara Stream, P14).
Cluster 3 groups points P15 and P16 (both in Ouro Stream) and points P17 and P18 (source and point with human influence in Contagem River). The points in this cluster are located in the Maranhão River watershed and have similar hydrogeological characteristics, with no differentiation between the source and the point with anthropic influence. This region has the highest concentrations of minerals and mineral deposits in the FD [57], as evidenced by the high levels of ions (Na+, K+, Ca2+, Mg2+, HCO3−) and by pH, total hardness, total alkalinity and total carbon values above the average of the other points (Tables S5 and S6).
Cluster 4 corresponds to points P10 (point with human influence on Sobradinho River) and P12 (point with human influence on Ponte Alta River). Both points are located in HUs with urban occupation and both receive effluents from Sewage Treatment Plants (STPs) serving three important cities in the FD [56]. These points showed low values of dissolved oxygen and high values of NO3−, NH4+, total phosphorus, total nitrogen, biochemical oxygen demand and Escherichia coli in all months sampled (Tables S5 and S6).
3.1.2. Principal Component Analysis (PCA)
PCA was applied separately to the data matrices, according to the land use/cover of the HUs, with 27 variables each. The KMO index was 0.708 for the HU-RURAL data matrix, 0.744 for HU-URBAN and 0.792 for HU-NATURAL. For all data sets, Bartlett's sphericity test was significant (p < 0.001), indicating that the data are suitable for PCA.
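For reference, the two adequacy checks can be computed directly from the correlation matrix. The sketch below assumes a samples-by-variables array `X` with no missing values and uses the standard textbook formulas; the function names are illustrative and do not correspond to the software used in the study.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix.

    Returns (chi-square statistic, degrees of freedom, p-value).
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # log|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return stat, dof, chi2.sf(stat, dof)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)
```

KMO values closer to 1 indicate that partial correlations are small relative to the raw correlations, which is the situation in which PCA or factor analysis is expected to be useful.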
The number of Principal Components (PCs) to interpret was determined with the broken-stick model. This method selects the appropriate number of PCs more accurately than common rule-of-thumb methods (e.g., eigenvalues > 1) and is typically more robust than statistically derived methods [58,59]. Figure 3 shows the eigenvalues and the broken-stick model for each component of the three matrices. As can be seen in the figure, for the HU-URBAN matrix the model selected only the first PC. For the HU-RURAL and HU-NATURAL matrices, the first two PCs were selected.
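The broken-stick criterion can be illustrated with a short sketch. It assumes the usual formulation, in which the expected share of variance of the k-th of p components is (1/p) Σ_{i=k}^{p} 1/i and components are retained while the observed share exceeds the expected one; the eigenvalues in the usage note are hypothetical and are not taken from Fig. 3.

```python
import numpy as np


def broken_stick(p):
    """Expected proportion of variance for each of p components under the broken-stick model."""
    return np.array([sum(1.0 / i for i in range(k, p + 1)) / p
                     for k in range(1, p + 1)])


def n_components_broken_stick(eigenvalues):
    """Number of leading components whose variance share exceeds the broken-stick expectation."""
    eig = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    observed = eig / eig.sum()
    expected = broken_stick(len(eig))
    n = 0
    for obs, exp in zip(observed, expected):
        if obs <= exp:
            break          # stop at the first component that does not beat the model
        n += 1
    return n
```

For example, with the hypothetical eigenvalues `[3.0, 1.4, 0.3, 0.2, 0.1]` the observed shares of the first two components (0.60 and 0.28) exceed the expected shares (about 0.457 and 0.257), while the third does not, so two components are retained.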
The loadings of the first two PCs retained for each data matrix (rural, urban and natural) are shown in Fig. 4. Principal component loadings can be used to determine the relative importance of a water quality variable compared with the other variables of the same PC; they do not reflect the importance of the component itself [51].
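As an illustration of how loadings and explained variance relate, the sketch below runs a PCA on the correlation matrix, which is equivalent to a PCA on standardized variables. Whether the study used the correlation matrix is an assumption, and `pca_loadings` is a name introduced only for this example.

```python
import numpy as np


def pca_loadings(X):
    """PCA on the correlation matrix of a samples-by-variables array.

    Returns (loadings, explained), where loadings[i, j] is the correlation
    between variable i and component j, and explained[j] is the share of
    total variance carried by component j.
    """
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)            # ascending order
    order = np.argsort(eigval)[::-1]              # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    explained = eigval / eigval.sum()
    return loadings, explained
```

The sign of each column of loadings is arbitrary, so only the magnitudes and the relative signs within a component are interpretable.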
For HU-RURAL, principal component 1 (PC1) explained 48.8% of the total variance and was positively associated with physical variables, minerals and inorganic nutrients (EC, TDS, TH, TA, HCO3−, NO3−, Ca2+ and Mg2+), all with loadings greater than 0.7. This indicates that these variables are the most representative in defining the water quality of the analyzed water bodies. Variables with absolute loadings greater than 0.70 are those that contribute appropriately to the data variation [60].
PC2 explained 10.6% of the total variance, with physical variables related to the load of substances dissolved in water (TURB and COLOR). In these two components, water quality variables related to physical and inorganic characteristics predominate over the organic and biological properties of the samples. Together, the two components explained 59.4% of the total variation in the data. In studies that apply PCA to water quality assessment, the first two or three principal components typically explain a large part of the variation in the original data (50 to 80%) without significant loss of information [61].
For HU-URBAN, PC1 explained 52.7% of the total variation. The variables that contributed most to this first component were BOD, TP and TN (as organic contributors) and EC, pH, TH, TA, TR, Cl− and SO42− (as physical and chemical variables related to the mineral characteristics and acidity of the water). For PC2, which explained 14.2% of the total variation, ECOLI, Na+, NH4+ and K+ were the contributing variables, all positively (Fig. 4).
PC1 of HU-NATURAL explained 43.5% of the total variance, with physical variables, minerals and inorganic nutrients contributing to this component (EC, TDS, pH, TH, TA, TR, TC, HCO3−, F−, K+, Ca2+ and Mg2+). As can be seen in Fig. 4, for PC2 of this HU (15.2% of the total variance), the variables that contributed most were Na+ and SO42−. The two components together explained 58.7% of the data variation.
These results show that the variables that influence the water quality of one group of water bodies (HUs) may not be important for other groups. The loadings of the first two components, for the three matrices, also reveal that variables such as TEMP, DO and SAR were less important in the general variation of water quality, with low loadings for these three variables (< 0.6).
As can be seen in Fig. 4, PC1 and PC2 for all matrices were (positively) influenced by a large number of variables, making it difficult to interpret which variables are most important in the general variation of water quality for a given land use or cover. Thus, Exploratory Factor Analysis (EFA) was applied in order to determine the relative importance of water quality variables.
Table 4 presents the rotated factor loadings from the EFA for the first three factors in each data matrix. The three factors accounted for 84.3%, 88.7% and 89.1% of the total variance in HU-RURAL, HU-URBAN and HU-NATURAL, respectively. Rotated loadings above 0.75 are considered strong, loadings between 0.75 and 0.5 moderate, and loadings between 0.5 and 0.3 weak [62]. In this study, only variables with strong factor loadings (> 0.75) were considered relevant, that is, as contributing to seasonal variations in water quality in each group.
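Table 4
Key variables for each land use/cover group.
Group | Factor | Key variable | Loading* |
HU-RURAL | F1 | TH | 0.796 |
HU-RURAL | F1 | TA | 0.821 |
HU-RURAL | F1 | HCO3− | 0.952 |
HU-RURAL | F1 | NO3− | 0.909 |
HU-RURAL | F2 | TURB | 0.856 |
HU-RURAL | F3 | SAR | 0.883 |
HU-URBAN | F1 | EC | 0.834 |
HU-URBAN | F1 | TP | 0.820 |
HU-URBAN | F1 | TN | 0.925 |
HU-URBAN | F1 | NH4+ | 0.912 |
HU-URBAN | F2 | TR | 0.903 |
HU-URBAN | F2 | ECOLI | 0.738 |
HU-URBAN | F3 | BOD | 0.761 |
HU-NATURAL | F1 | EC | 0.954 |
HU-NATURAL | F1 | TDS | 0.886 |
HU-NATURAL | F1 | TH | 0.976 |
HU-NATURAL | F1 | TC | 0.946 |
HU-NATURAL | F1 | HCO3− | 0.961 |
HU-NATURAL | F2 | Ca2+ | 0.798 |
HU-NATURAL | F2 | Mg2+ | 0.940 |
HU-NATURAL | F3 | F− | 0.816 |
* Only variables with loadings > 0.75.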
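The loading thresholds cited from [62] can be expressed as a simple rule. The sketch below is illustrative only: the label "negligible" for absolute loadings below 0.3 is an assumption, since the text does not name that class, and `loading_strength` is a hypothetical helper.

```python
def loading_strength(loading):
    """Classify a rotated factor loading by its absolute value."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "weak"
    return "negligible"  # label assumed; not defined in the text


def key_variables(loadings):
    """Keep only variables whose loading is classified as strong."""
    return [name for name, value in loadings.items()
            if loading_strength(value) == "strong"]
```

Applied to the HU-RURAL F1 loadings in Table 4 (`{"TH": 0.796, "TA": 0.821, "HCO3-": 0.952, "NO3-": 0.909}`), `key_variables` keeps all four variables.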
The key variables for HU-RURAL, in the first three rotated factors, were TH, TA, HCO3−, NO3−, TURB and SAR. These variables are important when water is used for irrigation. Water hardness refers to the presence of alkaline earth metals, mainly Ca2+ and Mg2+, which are the most common in natural waters [11]. Very hard water (> 180.0 mg/L CaCO3) can be unsuitable for certain irrigation techniques, such as sprinkler or drip irrigation [63,64]. High water hardness can also be limiting for fertigation, where values above 100 mg/L of calcium and 43 mg/L of magnesium increase the risk of phosphate fertilizers precipitating inside the pipes [65]. At points P1 to P8 (HU-RURAL), TH values ranged from 1.38 mg/L (P3) to 18.8 mg/L (P8). Calcium ranged from 0.078 mg/L (P1) to 5.992 mg/L (P2), and magnesium from 0.005 mg/L (P1) to 1.421 mg/L (P2) (Tables S3 and S4).
TA and HCO3− are equally important variables for assessing water quality for irrigation. The total alkalinity of water is the sum of all titratable bases, especially carbonate and bicarbonate (HCO3−). Waters rich in bicarbonates tend to precipitate calcium carbonate and magnesium carbonate when the soil solution is concentrated by evapotranspiration, increasing soil sodicity and consequently SAR. HCO3− levels above 518 mg/L in water can damage susceptible crops [66,67]. In this study, TA at the HU-RURAL points ranged from a minimum of 1.28 mg/L CaCO3 at P1 to a maximum of 16.52 mg/L CaCO3 at P2. HCO3− likewise ranged from a minimum of 1.562 mg/L at P1 to a maximum of 20.15 mg/L at P2 (Tables S3 and S4).
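The reported TA and HCO3− extremes are consistent with the common conversion in which bicarbonate (mg/L) is 1.22 times the alkalinity expressed as mg/L CaCO3 (the ratio of the equivalent weights, 61.02/50.04). The sketch below only checks that arithmetic; that the HCO3− values were obtained this way, and that alkalinity is entirely due to bicarbonate, are assumptions.

```python
# Ratio of equivalent weights HCO3- / CaCO3 (61.02 / 50.04), rounded to 1.22
HCO3_PER_CACO3 = 1.22


def bicarbonate_from_alkalinity(ta_mg_caco3_per_l):
    """Bicarbonate (mg/L) from total alkalinity (mg/L as CaCO3).

    Assumes all alkalinity is present as bicarbonate.
    """
    return HCO3_PER_CACO3 * ta_mg_caco3_per_l


# Reported HU-RURAL extremes of total alkalinity (mg/L CaCO3)
hco3_p2 = round(bicarbonate_from_alkalinity(16.52), 2)   # maximum, P2
hco3_p1 = round(bicarbonate_from_alkalinity(1.28), 3)    # minimum, P1
print(hco3_p2, hco3_p1)  # 20.15 1.562
```

Both results match the HCO3− maximum and minimum reported for P2 and P1.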
SAR is also an important variable for assessing water quality for irrigation. It expresses the ratio of Na+ to Ca2+ and Mg2+ and is used to estimate the potential for Na+ to accumulate in the soil, mainly at the expense of Ca2+, Mg2+ and K+, as a result of the regular use of water with a high sodium concentration. High SAR values (> 26 meq/L) can slow the percolation of water through the soil, decreasing the infiltration rate through dispersion and disaggregation of the soil structure [63,68–69]. For the HU-RURAL points, the highest mean SAR values were found at P2 (3.378 meq/L) and P8 (3.462 meq/L) (Table S4).
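For illustration, SAR is conventionally computed as Na+ / sqrt((Ca2+ + Mg2+) / 2), with all concentrations in meq/L. The sketch below assumes inputs in mg/L and converts them with rounded equivalent weights; the function name and the input format are choices made for this example, not the study's procedure.

```python
import math

# Equivalent weights in mg/meq (atomic mass divided by ionic charge), rounded
EQ_WEIGHT = {"Na": 22.99, "Ca": 20.04, "Mg": 12.15}


def sar(na_mg_l, ca_mg_l, mg_mg_l):
    """Sodium adsorption ratio from ion concentrations given in mg/L."""
    na = na_mg_l / EQ_WEIGHT["Na"]      # meq/L
    ca = ca_mg_l / EQ_WEIGHT["Ca"]      # meq/L
    mg = mg_mg_l / EQ_WEIGHT["Mg"]      # meq/L
    return na / math.sqrt((ca + mg) / 2)
```

Because sodium enters linearly and the divalent cations enter under a square root, doubling Na+ doubles SAR, whereas doubling Ca2+ and Mg2+ reduces it only by a factor of about 1.41.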
NO3-, in turn, is one of the most common pollutants found in surface and groundwater, coming from point and non-point sources. Some non-point sources include agricultural activities such as fertilizer and manure application, leguminous crops and irrigation with groundwater containing nitrogen compounds [70,71]. Excess NO3- in irrigation water can affect sensitive crops at concentrations above 5 mg/L. Most other crops are relatively unaffected by up to 30 mg/L nitrate [72]. The maximum content of NO3- was found in P4, with a value of 0.962 mg/L and the minimum in P8, a value of 0.001 mg/L (Table S4).
For HU-URBAN, the most important variables, with rotated loadings above 0.75, were EC, TP, TN, NH4+, TR, ECOLI and BOD. These variables are important indicators for water bodies in urban areas, since they can indicate contamination by, for example, domestic sewage effluents. Electrical conductivity (EC) is extremely useful as a general measure of water quality. Significant changes in conductivity can indicate that a discharge or some other source of pollution has reached a given water body, especially freshwater bodies [73,74]. The maximum EC for the HU-URBAN points was 448 µS/cm at P12 (Table S5).
Biochemical oxygen demand (BOD) is the amount of dissolved oxygen required to decompose the organic material present in the water sample, by aerobic biological organisms, in a given time at a certain temperature [75]. High BOD values in a water body are generally caused by the release of organic loads, mainly effluents from domestic sewage, and are associated with a decrease in dissolved oxygen in the water, which can lead to the mortality of aquatic organisms [44]. Point P12 showed a maximum of 6.42 mg/L, as can be seen in Table S5 (supplementary). BOD values can vary significantly; in general, unpolluted fresh water has a value below 1 mg/L, moderately polluted water from 2 to 8 mg/L and treated domestic effluent 20 mg/L [76].
Total residue (TR) represents the sum of dissolved solids and suspended solids in water, including colloidal particles. TR analysis in urban surface samples is an important indicator of pollution from domestic sewage or other point sources [44]. High levels of TR can affect the aesthetic quality of water, especially for human consumption, and can also reduce the efficiency of effluent treatment plants [77]. The highest levels of TR in this study were found in the HU-URBAN for P12 with a maximum of 450.9 mg/L (Table S5).
Phosphorus (P) and nitrogen (N) compounds are essential for the processes that occur in the aquatic environment. However, in excessive amounts they represent a significant source of water pollution [78]. P is a primary nutrient limiting the growth of algae and phytoplankton in many freshwater bodies, and its source can be either anthropogenic (domestic effluents and fertilizers) or natural (precipitation or geological materials) [79]. Total N is the sum of all forms of nitrogen present in water (organic, ammoniacal, nitrite and nitrate).
Elevated levels of N and P in water bodies cause nutrient imbalance and induce eutrophication, bringing anoxic conditions to the water [80]. Both TP and TN were key variables in the factor analysis of this study, along with NH4+. The maximum values were found at points with urban influence (HU-URBAN): 0.282 mg/L TP and 41.40 mg/L TN at P12, and 16.01 mg/L NH4+ at P10. Both points receive effluents from Sewage Treatment Plants (STPs).
Escherichia coli (ECOLI), a bacterium of the coliform group, is an important indicator of fecal pollution in freshwater bodies, especially in urban environments, and its analysis is considered simple and economical compared with that of other pathogens [81,82]. The maximum ECOLI concentrations detected by the enzyme substrate method in this study were 48,392 MPN/100 mL at P12 and 12,200 MPN/100 mL at P10.
The sampling points in the area under mostly natural land cover (HU-NATURAL) are located in the Ecological Reserve of the Brazilian Institute of Geography and Statistics (RECOR-IBGE), in the center-south of the FD, and in the Environmental Protection Area (APA) of Cafuringa, in the extreme north of the FD. These two regions are characterized by extensive areas of preserved vegetation in the Cerrado biome [83,84]. The key variables of greatest interest to the group were EC, TDS, TH, TC, HCO3−, Ca2+ and Mg2+. These variables are closely linked to the natural geological characteristics of these regions, since there is little or no human influence at the sampling points. Points P15, P16, P17 and P18 are located in a region characterized by the presence of Cambisol, originating from predominantly limestone rocks [85].
In the HU-NATURAL group, point P16 (point with anthropic influence in Ouro River) presented the maximum levels of EC (251 µS/cm), TDS (120.5 mg/L), TH (140.51 mg/L CaCO3), TC (32.514 mg/L) and HCO3− (170.31 mg/L). P18 (point with anthropic influence in Contagem River) showed the maximum Ca2+ (31.35 mg/L) and P15 (Ouro River headwater) the maximum Mg2+ (13.611 mg/L) (Tables S5 and S6).