Statistical modeling of groundwater geochemistry in northeastern Brazil

José Francisco de Francisco Oliveira Júnior (  junior_inpe@hotmail.com ) Federal University of Alagoas https://orcid.org/0000-0002-6131-7605 Alex Souza Moraes Universidade Federal Rural de Pernambuco Héliton Pandor Universidade Federal Rural de Pernambuco Jhon Lennon Bezerra da Silva Universidade Federal Rural de Pernambuco Pedro Henrique Dias Batista Universidade Federal Rural de Pernambuco Alexandre Maniçoba Rosa Ferraz Jardim Universidade Federal Rural de Pernambuco Gledson Luiz Pontes de Almeida Universidade Federal Rural de Pernambuco Taize Calvacante Santana Universidade Federal Rural de Pernambuco Marcio Mesquita Universidade Federal Rural de Pernambuco


Introduction
Brazil has the largest freshwater reservoir in the world, equivalent to 14% of the total planet Earth, being a country of high relevance worldwide in the supply of mineral water (Tundisi, 2008;Kunrath et al., 2020). In recent years, bottled mineral water has grown signi cantly on a national scale and thus has become an attractive and pro table business in Brazil (DNPM, 2016; ABINAM, 2020). The worldwide consumption of mineral water reached 284.1 billion liters in 2016, while other beverages, for example, soft drinks reached 159.1 billion liters in the same period (Huang and Liu, 2017), due to these increases, the need for evaluations and analyzes of the quality of drinking water becomes essential and with high frequency (Shruti et al., 2020).
The consumer bases his choice on extrinsic factors, such as brand and personal habits (Pacheco et al., activities implemented, followed by population densi cation and the urbanization process, among others, there was a need to assess the quality of groundwater (CPRH, 2020).
RMRE is located on a Deltaic Plain, which has distinct hydrogeological characteristics, composed of sediments of different origins: uvial, marine, colluvial, and mangroves, which cover the coastal sedimentary basins of Pernambuco and Paraíba, respectively to the south and north of the region, separated by the structural divider Lineamento Pernambuco. Figure 2 illustrates the spatial distribution of hypsometry, to establish a detailed understanding of the Beberibe aquifer that contemplates the western portion of the Pernambuco coast, with a variation from 0 (coast of the state of Pernambuco) to 261 m of altitude.
[INSERT TO FIGURE 2 HERE]

Data collect
The in-situ data collection was carried out in 34 extraction units (i.e., artesian wells) that capture mineral water, georeferenced, based on the analyzes made available by the companies and following the rules of the Pernambuco Health Surveillance Agency (APEVISA, 2020). The representation of the spatial distribution of artesian wells is shown in Figure 1.  hydrochemical analyzes, all from tubular wells and chemically homogeneous domains concerning potability, chemical facies, and the suitability of water for use in irrigation. The chemical reports were incorporated into a database (elaborated in Access), where they were classi ed in terms of potability, chemical facies, and the suitability of waters for use in irrigation, and represented in a shape le format.
These determinations were migrated and georeferenced in GeoMedia, where work was carried out to individualize chemically homogeneous zones, using geological, physiographic, and hydrogeological criteria, which allowed the demarcation of units that have similar characteristics within their limits.
The data made available on the platform are in the shape le format, with an established preclassi cation, in which reclassi cation techniques by geoprocessing were applied to reclassify these shape les, for the area comprising the study. The shape les were processed using ArcGIS® software, version 10.2.2 (ESRI, Redlands, California, USA).

Statistical analysis
The data obtained were subjected to analysis of descriptive statistics based on the mean, median, and coe cient of variation (CV, %), being classi ed as low when the CV < 12%; medium when 12% < CV < 24%, and high when CV > 24% (Warrick and Nielsen, 1980). Subsequently, the Kolmogorov-Smirnov normality test (KS test) was applied with a signi cance level set at p ≤ 0.01.
The geostatistics analysis was performed based on the calculation of the classic semivariances (Eq. 1), which estimated the structure and the spatial dependence between the pairs of observations.
where, γ(h) -is the experimental semi-variance estimator, obtained from the sampled values Z(X i ), Z(X i +h);

[ ( ) ]
N(h) -is the number of measured value pairs separated by the vector or delay distance; h -is the distance between sample pairs (i.e., it is the distance between two samples); Z(X i ) and Z(X i + h) -are the values of the i-th observation of the regionalized variable, collected at points X i and X i +h (i = 1, ..., n), separated by the vector h.
Spatial dependence was analyzed by adjusting the semivariogram based on the estimate of the semivariance using the GEO-EAS® program (England et al., 1989). The data were adjusted to the spherical, exponential, and Gaussian models (Eqs. 2, 3, and 4), respectively, according to Deutsch et al. (1998). The spherical, exponential, and Gaussian models are referred to in the literature as transitive theoretical models and are more common for adjusting semivariograms (Gois et al., 2015).
Spherical Model: Exponential Model: Gaussian model: where, γ(h) -is the experimental semi-variance estimator; C 0 +C -is the sill (i.e., it is the nugget effect plus the variance dispersion, given by the acronyms C 0 and C, respectively); h -is the distance between sample pairs; a -is the range (m).
The best models of the adjusted semivariograms were validated by the cross-validation of the Jack-Kni ng test, in which the mean must be close to zero and the standard deviation close to 1 (e.g., Vauclin et al., 1983), the program used for that analysis was GEO-EAS® (England et al., 1989).
For principal component analysis (PCA), 19 variables were admitted. Based on the principal components (PC), the covariance matrix was obtained to extract the eigenvalues that originate the eigenvectors. To identify the variables that presented correlation, the Kaiser criterion was used, considering the eigenvalues greater than 1.0, which generate components with a relevant amount of information contained in the original data (Kaiser, 1958).
Pearson's correlation (r) was also performed for all variables, seeking to correlate with the PCA, to highlight the similarities between the variables. The program used for PCA and the correlation r was RStudio, version 3.6.1 (R Core Team, 2019).
Linear regression models were established to justify and clarify the main hydrochemical characteristics that are correlated with nitrate (NO 3 ), a component that has a direct impact on human health. It was veri ed among the main variables, the one with the highest coe cient of determination (R 2 > 0.5) to reduce the main relevance and correlation with NO 3 . The analyzes were performed using the Minitab software, version 19. Most of the artesian wells in the mineral water production units are close to the coast, a densely populated region (Figures 1 and 3). The artesian wells close to the coast are of porous geological formation, with the lowest altitudes ( Figure 2), with ow variations from medium to very high. From the western portion of the study region, there is a ssural geological formation, considering the higher altitudes, with greater rocky extension and ow variations from medium to very high (Figures 2 and 3).

[INSERT TO FIGURE 3 HERE]
There is a speci c homogeneous ow occurring close to the coastal region, categorized as "very high productivity" (Figure 3). The obtained result corroborates with the studies carried out by Hu et al. (2017) who analyzed the hydrogeological con guration of groundwater for the entire Brazilian territory and highlighted good water retention capacity for recharging and discharging aquifers. The authors also pointed out that the aquifers close to the Atlantic Ocean play a role as a "wall or barrier", because the water tables in these regions always remain at the same level as the sea surface at the limits of the coast, thus maintaining better retention, which will lead to homogenization of water ow.
Although the original depth of the well is unknown, part of the water may come from shallower horizons  As recently highlighted by Teramoto et al. (2020), the hydrochemistry in the Guarani aquifer, due to its mixed bicarbonate composition, in uences the geological formation of the porous type. This statement is con rmed in the present study since the geological composition is heterogeneous and presents an advanced state of decomposition, which implies a high variability of bicarbonates.

Piper's diagram
The hydrochemical face is a term used to describe the percentage of the chemical composition of the water, resulting from the combined effect of the solution's kinetics, rock-water interactions, hydrogeological con gurations, and sources of contamination. Piper's trilinear diagram (Piper, 1944) consists of a diamond and a pair of equilateral triangles, in which two triangles represent the anion and the cation, respectively, connected by a diamond-shaped diagram. Figure 5 shows the hydrochemical composition of the groundwater in the Beberibe aquifer.

[INSERT TO FIGURE 5 HERE]
Page 10/30 The chemical composition of groundwater is mainly of the Na-K-Cl-SO 4 type ( Figure 5), where Na-K corresponds to approximately 90% in the rst triangle.    The variability between the medium and high categories is mainly due to the hydrogeological and hydrochemical diversity, which in turn occurs in the geological formation of rocks and which has a direct impact on the chemistry of groundwater (IBGE, 2018; IBGE, 2020). The results obtained corroborate those mentioned by Yang et al. (2020), where they investigated the hydrochemical composition of groundwater to highlight its quality and assess its risk/suitability for irrigation purposes in the Ordos Basin, China, they identi ed high CV% for all variables studied, except for the pH, where the CV% presented a value referring to the medium variability.

Descriptive statistics and geostatistics of the data
The highest CV% indicated that the ions were sensitive to the external environment, such as hydrological conditions, topography, and anthropogenic activities, as discussed by Yang et al. (2020). Another key factor is pH, which according to Liu et al. (2018), in a study of the dynamic characteristics of groundwater in the valley plain of the city of Lhasa (capital of the Autonomous Region of Tibet), the low CV% indicated that the pH of groundwater tends to be relatively stable, which differs from the present study, in which the pH was not stable and CV% is admitted with medium variability, according to the criterion of Warrick and Nielsen (1980) -( Table 2).
The pH according to the standards established in Table 1 and compliance with the minimum values recorded in Table 2, presented low standards, indicative of points of distribution of acidic mineral water for human health. High concentrations of NO 3 were observed in some mineral water distribution points, therefore, outside the limits established by Brazilian legislation (Table 1) The analysis of the spatial variability for the hydrochemical parameters is shown in Table 3. All the parameters evaluated, the Gaussian model was the one that best t, which is indicated for studies of groundwater. The highlight for the coe cient R 2 with variations between 0.6 to 0.99.  (Vauclin et al., 1983) was applied, were all established semivariograms presented the mean close to 0 and the standard deviation close to 1, and thus used in kriging maps for each studied hydrochemical variable ( Table 3).
The DSD was strong for all the parameters analyzed, which indicates that the value of each parameter associated with a given location, was more similar to the value of a neighboring sample than to the rest of the location of the sample set, which reinforces the validation of the Gaussian model for the studied hydrochemical variables (Cambardella et al., 1994). assessed the groundwater quality of the Lakhimpur district in Bangladesh based on water quality indices, geostatistical methods, and multivariate analysis pointed out that a kriging method is an effective tool for decision-makers on groundwater quality management.
It is noteworthy that the hydrogeological formation also contributes to NO 3 concentrations. It appears that the Pernambuco coast has a hydrogeological formation of the mixed bicarbonate type (Figure 4), which because of this, there is a high variability (> CV%) of the hydrochemical elements of groundwater (Table   2). This mixed diversity of bicarbonate can also be considered as one of the factors for this high concentration of NO 3 , as such regions are common of ssural geological formation, which in turn re ects and a greater tendency for contamination of these waters.  (Table 1). However, to the west of the coast, values were observed within the limits allowed for NO 3  The pH of groundwater on the coast met the acceptable standard for human health (pH between 6 and 9), with variations between acid to alkaline (pH between 6.4 and 7.6) (Ministério da Saúde, 1999; Ministério da Saúde, 2006; CETESB, 2016). However, the west region of the coast had a pH < 6 ( Figure 6). Abualnaeem et al. (2018), evaluated the salinity and quality of groundwater in the Gaza coastal aquifer, Gaza Strip in Palestine, and highlighted that acidity and alkalinity close to neutralization, is derived from the dissolution of carbonate minerals in the form of bicarbonate (HCO 3− ). Corroborating the statement of the aforementioned authors, the coastal region of the present study, to the east, showed pH with variations between acid and alkaline, still because it presents mixed bicarbonate hydrogeological formation ( Figure   4).
The elements SO 4 , K, Mn, F, Zn, Fe, HCO 3 , Si, Sr and Ba were the elements that had the highest concentrations west of the coast (Figure 6). This study region was also the one with the lowest pH.

Principal component analysis (PCA)
The PCA presents the eigenvalue, the proportion, and the accumulation of the total variance for all variables studied ( Table 4). The cumulative total variance for principal component 2 (PC2) was 49.61%, a signi cant result for the analyzed data set. According to the criteria established by Kaiser (1958), PC1 and PC2 presented their eigenvalues greater than 1, which indicated a signi cant information load and, thus, generate the biplot graphs.  total cumulative variance for the PC2, of 59.26%, which is higher than the study.
In Figure 7, the PC's biplot chart stands out and Pearson correlation of the hydrochemical parameters of the mineral water of the Beberibe aquifer in the study area, referring to the municipalities of Recife, Olinda, and Ba) were admitted and the correlated points refer to the artesian sample collection wells (A01 to A34).
The variables DR, Na, Ca, Mg, Cl, EC, Hn, and Sr, showed a correlation with NO 3 , as observed in kriging maps ( Figure 7A). However, it is noteworthy that among the observed correlations, Cl was the one with the highest correlation with NO 3 , in which, the Cl projection line follows the NO 3 line, which characterizes this strong correlation ( Figure 7B). To clarify more precisely the correlations between the variables, Pearson's correlation was applied, where the correlation was signi cant between Cl and NO 3 ( Figure 7B). Corroborating the ndings of this study, Ayed et al. (2017) also observed a strong correlation between Cl and NO 3 .

Linear regression analysis
Based on the analysis of the kriging maps ( Figure 6) and the PCA -( Figure 7A), was performed a simple linear regression analysis of the electrical conductivity (EC), hardness (Hn), evaporative resistance (DR), magnesium (Mg), sodium (Na) and chlorine (Cl), with nitrate (NO 3 ), as they present a greater correlation with NO 3 in the previous analyzes ( Figure 8). The R 2 criterion greater than 0.5 was established to represent only the most representative predictor variables with NO 3 .

[INSERT TO FIGURE 8 HERE]
Among the linear regressions, only NO 3 x DR showed R 2 < 0.5 (R 2 = 0.32), so the DR has no direct in uence on NO 3 ( Figure 8C). It can be seen that Cl, showed a satisfactory R 2 with NO 3 (R 2 = 0.51) -( Figure 8F), previously indicated by geostatistics based on kriging maps and PCA, and these two variables showed a high correlation between themselves. The results obtained corroborate those described by Tiwari et al. (2017), where they assessed the spatial distribution of groundwater quality and assessed regional suitability for drinking water in the Alwar region in India and found a strong correlation between NO 3 and Cl.

Conclusions
Based on Brazilian legislation, there are strategic wells that have a direct impact on human health and, therefore, there is a need to reanalyze possible approaches for improvements that support The anthropogenic in uence of the metropolitan center of Recife, has a direct impact on water quality for human health, especially on nitrate concentrations (NO 3 ). It is also noteworthy that the concentrations of chlorine (Cl) have a high correlation with NO 3 , being con rmed in the kriging maps and the PCA, Pearson's correlation, and in the linear regression. Other methodologies must be formulated and applied, to con rm or rule out the possibility of recharging the regional ow of groundwater. The methodology adopted in the study can be perfectly applied in other regions of the planet with conditions of social vulnerability and problems with water quality, with bias of monitoring and de nition of public policies.
Low-quality underground water should not be used for human consumption, as it becomes the main agent for the emergence of waterborne diseases, mainly diarrhea, typhoid fever, hepatitis A, leptospirosis, cholera, and intestinal infections, caused by bacteria, which it reaches from infants to the elderly, especially those who live in areas of social vulnerability and with low-quality water. Therefore, regular monitoring is essential to ensure good quality water and a healthy environment that provides quality of life for local families.    Piper's diagram of the hydrochemical parameters of the mineral water of the Beberibe aquifer in the study area.

Figure 6
Spatial distribution of the hydrochemical parameters of the mineral water from the Beberibe aquifer in the study area.