Assessment of social vulnerability to groundwater pollution using K-means cluster analysis

It is possible to assess the harm that society suffers as a consequence of groundwater contamination in aquifers. Indexing methodologies are commonly applied to assess the social vulnerability to polluted aquifers. However, they assign weighting and rating values to the different factors involved, which makes them very subjective. This research aims to assess the social vulnerability to groundwater pollution taking into account three factors: the uses of groundwater resources, the exposed population, and the socio-economic losses. In order to eliminate the subjectivity of current indexing methodologies, this work uses a K-means cluster analysis for the assessment of social vulnerability. With this method, a social vulnerability map can be produced with greater objectivity. The proposed methodology is applied to an aquifer located in central Spain, an area with significant agricultural development. Low population density and unproductive zones result in low social vulnerability in most of the area. However, high social vulnerability is observed in the southern sector due to agricultural development, which leads to higher socio-economic variables and demand for groundwater resources. Similarly, high social vulnerability is observed in the northeast, mainly influenced by the groundwater use and the exposed population. These results show that social vulnerability in most of the study area is not very significant for assessing the risk of groundwater contamination, because the damage to the social, environmental, or economic sector is low. However, in the south and northeast of the study area, pesticides and fertilizers should be used with caution, as they significantly increase the risk of groundwater contamination. The K-means clustering method proved to be an objective and reliable option for assessing social vulnerability to groundwater pollution in aquifers.


Introduction
Groundwater represents the most important source of water supply for urban, industrial, and agricultural uses in areas where surface water resources are scarce or the use of supply sources is limited by water quality. In general, groundwater is of better quality than surface water because aquifers have a natural protection against anthropogenic pollutants. However, natural or geogenic contamination by dissolution or chemical reactions between water and the solid matrix (rock, soil) is also important. In some cases, the use of contaminated groundwater negatively affects society, endangering human health, the environment, or the economic development of a region (Cutter 1996, 2010; Grondona et al. 2015; Perles et al. 2008). Social vulnerability is a simple way to assess the potential damage to society from a natural or anthropogenic event (Cutter 1996; Perles et al. 2008). In environmental studies, some authors (e.g., Ducci 1999; Huan et al. 2012; Sajedi-Hosseini et al. 2018; Rahman et al. 2021) consider that groundwater represents a valuable resource. Thus, the value of the water supply resource must be taken into account. The socio-economic value associated with groundwater supply uses incorporates variables such as population, number of employees, and economic productivity linked to activities that depend on the groundwater resource (Ducci 1999; Vias 2005; Perles et al. 2008; French et al. 2017; Orellana-Macías and Perles Roselló 2022). Indexing methodologies have been developed to assess social vulnerability (Ducci 1999; Vias 2005; Perles et al. 2008; Grondona et al. 2015; Orellana-Macías and Perles Roselló 2022). These methods incorporate the factors outlined above by assigning weighting and rating values that describe the degree of vulnerability of society to groundwater contamination. However, the subjectivity involved in the selection of relative weighting and rating values is a disadvantage in the application of these methodologies.
The main goal of this work is to develop a new methodology for the assessment of social vulnerability to anthropogenic groundwater pollution, through the application of K-means clustering techniques. The methodology will be applied to a case study related to a detrital aquifer located in the region of Madrid (central Spain).
Responsible Editor: Baojing Gu. Corresponding author: Marisela Uzcategui-Salazar (mariselauzcateguis@gmail.com).

The study area
The study area is located in the southeast of the region of Madrid (central Spain) and has an approximate area of 133 km² (Fig. 1). The climate is temperate-continental Mediterranean, with an average rainfall of 440 mm/year (Mostaza-Colado et al. 2018). The Jarama River is the main surface water resource, crossing the study area from north to south. In the south, a water canal (locally known as the "Acequia del Jarama") was constructed to channel water from the Jarama River and provide water for irrigation (Mostaza 2019).
In the study area, there is a detrital aquifer formed by three groundwater bodies, according to the definition of the regional water authority (Confederación Hidrográfica del Tajo, CHET): "Aluvial del Jarama: Guadalajara-Madrid," "Guadalajara," and "Aluviales Jarama-Tajuña." The latter is the most important because it covers more than 80% of the area (Fig. 1a). The aquifer is unconfined and shallow (~6 m depth, Carreño Conde et al. 2014) and consists of Quaternary sedimentary deposits: gravels and sands, with intercalations of clays and silts (Fig. 1). The Quaternary deposits lie on Tertiary sedimentary materials that include gypsum and marls (Bardají et al. 1990) and constitute the aquifer basement. Three vertical electrical soundings (VES) (Bardají et al. 1990) allow the interpretation of lithological sections of the aquifer (Fig. 1b), which has an average thickness of about 10 m (Carreño Conde et al. 2014). The region has important agricultural (mainly arable and tree crops) and livestock development (Fernández González 2013; MAPA 2021a). The population is sparse because most of the territory is used for agricultural activities (Comunidad de Madrid 2020).
Although it is not the only source of water, the aquifer is an important and valuable water resource in the region, providing water for agricultural, urban, and industrial activities (CHT 2015a). The availability of water (from the Jarama Canal, the Jarama River, and the aquifer) and the fertile soil conditions in the region have long promoted agricultural activities (Mostaza-Colado et al. 2018). The intensive use of fertilizers has had an important impact on groundwater, increasing nitrate concentrations, which in some cases exceed the limit established by the authority (50 mg/L) (BOE 2009; CHT 2015b).

Data set collection
The data considered in this work included the following: The data were stored as a geographical database in ArcGIS v10.4.1. The whole study area (~133 km²) was divided into 5842 pixels with a cell size of 150 m × 150 m, in order to obtain a large data set to evaluate the different variables at each point. The thematic maps were obtained using the map algebra tool in ArcGIS v10.4.1.
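As a quick consistency check on the grid described above, the area covered by the cells can be recomputed from the cell count and cell size given in the text. The following is a minimal sketch; the function and variable names are illustrative and not part of the original workflow.

```python
# Grid described in the text: 5842 square cells of 150 m x 150 m.
N_CELLS = 5842
CELL_SIDE_M = 150.0


def grid_area_km2(n_cells: int, cell_side_m: float) -> float:
    """Total area covered by n_cells square cells, in square kilometers."""
    cell_area_km2 = (cell_side_m / 1000.0) ** 2
    return n_cells * cell_area_km2


covered_km2 = grid_area_km2(N_CELLS, CELL_SIDE_M)
print(f"Area covered by the grid: {covered_km2:.2f} km^2")
```

The grid covers about 131.4 km², which is consistent with the approximate study area of 133 km²; the small difference is expected when an irregular boundary is rasterized.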

Estimation of factors to determine the social vulnerability
The social vulnerability assessment was carried out in the following two stages:
• Estimation of factors affecting the social vulnerability
• Mapping of social vulnerability using cluster analysis (K-means algorithm)
Three different factors were considered to assess the social vulnerability to a groundwater contamination event. These factors, which include social, economic, and environmental aspects, were evaluated by considering the following settings:
• Vulnerability of groundwater resources (V GR)
• Vulnerability of exposed population (V P)
• Socio-economic vulnerability (V S-E)
The obtained factor values were normalized to the range 0-1, so that factors with larger values do not dominate those with smaller values (Eq. 1; Salazar and Del Castillo 2018).
$$F_{norm} = \frac{F_x - F_{min}}{F_{max} - F_{min}} \qquad (1)$$
where F norm is the normalized value, Fx is the value of the factor at point x, and Fmin and Fmax are the minimum and maximum values of the range, respectively.
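The normalization step can be illustrated with a short NumPy sketch. This is not the authors' code (the paper reports that processing was done in ArcGIS and R); the function name, the handling of no-data cells as NaN, and the treatment of a constant factor are assumptions made for illustration. The three input values are the groundwater abstractions (hm³/year) quoted later in the text for the three groundwater bodies.

```python
import numpy as np


def min_max_normalize(factor: np.ndarray) -> np.ndarray:
    """Rescale a factor to [0, 1] with max-min scaling.

    NaN cells (assumed to mean "no data") are ignored when finding the
    range and stay NaN in the output.
    """
    f_min = np.nanmin(factor)
    f_max = np.nanmax(factor)
    if f_max == f_min:
        # Assumption: a constant factor carries no contrast, so return zeros.
        return np.where(np.isnan(factor), np.nan, 0.0)
    return (factor - f_min) / (f_max - f_min)


# Groundwater abstraction per groundwater body, in hm^3/year (from the text).
example = np.array([2.02, 14.20, 23.02])
normalized = min_max_normalize(example)
print(normalized)
```

After scaling, the smallest value maps to 0 and the largest to 1, so all three factors enter the cluster analysis on the same footing.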

Vulnerability of groundwater resources (V GR )
This factor represents the amount of the groundwater resources that can be affected by contamination, as it reduces the groundwater available for different uses. Groundwater contamination also has a negative impact because the contaminated water can reach other water bodies, affecting associated ecosystems. In the study area, the Jarama River and the aquifer have a hydraulic connection through which water flows from the aquifer to the river (Mostaza 2019). To obtain the vulnerability of groundwater resources according to groundwater uses, three variables were used: urban uses (U u), agricultural uses (A u), and industrial uses (I u). The amount of water abstracted for the different uses is estimated by CHET for each groundwater body (CHT 2015a).
The vulnerability of groundwater resources was calculated as the sum of urban, agricultural, and industrial uses, since the amount of groundwater resources affected by a contamination event in turn affects all groundwater uses. Thus, Eq. 2 is proposed in this research to calculate the vulnerability of groundwater resources.
$$V_{GR} = U_u + A_u + I_u \qquad (2)$$
where V GR is the vulnerability of groundwater resources, U u is the urban uses map of groundwater, A u is the agricultural uses map of groundwater, and I u is the industrial uses map of groundwater.

Vulnerability of exposed population (V P)
The exposed population was calculated from the population density located in the study area (Comunidad de Madrid 2020) and the percentage of urban groundwater use (CHT 2015a). On this basis, the population affected by the consumption of polluted groundwater was calculated as the number of inhabitants per square kilometer multiplied by the percentage of urban groundwater use, as proposed in Eq. 3.
$$V_P = \text{Population density} \times \%\ \text{of groundwater urban use} \qquad (3)$$
The percentage of groundwater urban use corresponds to the percentage of the total groundwater use in each groundwater body (Table 1).
The population density map was obtained from the number of inhabitants per square kilometer within each municipality in the study area (Comunidad de Madrid 2020).

Socio-economic vulnerability (V S-E )
To assess this factor, social and economic activities that depend on the groundwater resources were considered. Four variables were used: land prices, crop production, livestock production, and agricultural employment. The superposition (sum) of these four variables results in the total socio-economic vulnerability. Thus, we proposed Eq. 4 to calculate the socio-economic vulnerability (V S-E).
$$V_{S-E} = \text{Land prices} + \text{Crop production} + \text{Livestock production} + \text{Agricultural employment} \qquad (4)$$
It should be noted that irrigation facilities using surface water (the Jarama Canal) reduce the use of groundwater for irrigation, which in turn contributes to reducing the socio-economic vulnerability due to groundwater contamination. Thus, a reduction factor (Rf) can be considered according to the percentage of irrigated area with surface water, as shown in Table 2. Some municipalities in the study area use the Jarama Canal to irrigate part of the cultivated areas, due to the ease of access to water provided by this canal. The percentage of surface area irrigated with water from the Jarama Canal is shown in Table 3.
According to the percentage of surface area irrigated with water from the Jarama Canal in each municipality, a reduction factor (Rf) value was assigned according to Table 2.
The spatial distribution of Rf in the study area is shown in Fig. 3.
Finally, the socio-economic vulnerability value was adjusted by applying the reduction factor to Eqs. 4 and 5.
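The factor calculations described in this section amount to cell-by-cell map algebra. The sketch below illustrates the sums and the product with NumPy arrays standing in for raster layers. It is an illustration under stated assumptions, not the authors' implementation: all input values are hypothetical, the percentage of urban use is treated as a fraction, and the reduction factor is assumed to act as a multiplicative decrease (for example, 0.2 for a 20% reduction), since the exact form of the adjustment is not reproduced in the text.

```python
import numpy as np


def groundwater_resource_vulnerability(urban, agricultural, industrial):
    """V_GR as the cell-wise sum of the three groundwater uses."""
    return urban + agricultural + industrial


def exposed_population_vulnerability(pop_density, urban_use_pct):
    """V_P as population density times the share of urban groundwater use."""
    return pop_density * (urban_use_pct / 100.0)


def socio_economic_vulnerability(land, crops, livestock, employment, reduction=None):
    """V_S-E as the sum of four economic layers.

    ASSUMPTION: `reduction` is a fractional decrease per cell (0.1 = 10%),
    applied multiplicatively. The source does not give this formula.
    """
    v_se = land + crops + livestock + employment
    if reduction is not None:
        v_se = v_se * (1.0 - reduction)
    return v_se


# Hypothetical three-cell layers, for illustration only.
v_gr = groundwater_resource_vulnerability(
    np.array([1.0, 0.0, 2.0]), np.array([3.0, 1.0, 4.0]), np.array([0.5, 0.0, 1.0])
)
v_p = exposed_population_vulnerability(
    np.array([100.0, 40.0, 1000.0]), np.array([16.0, 0.0, 16.0])
)
v_se = socio_economic_vulnerability(
    np.array([10.0, 2.0, 5.0]),
    np.array([4.0, 1.0, 3.0]),
    np.array([1.0, 1.0, 1.0]),
    np.array([0.3, 0.1, 0.2]),
    reduction=np.array([0.2, 0.0, 0.1]),
)
print(v_gr, v_p, v_se)
```

Each function returns one layer per factor; in the workflow described in the text, these layers would then be normalized before clustering.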

• Land prices
The price of land is directly associated with the type of activity being developed. As already mentioned, agriculture is the most important economic activity in the study area. For this reason, land prices were obtained according to the types of crops and irrigation uses. It is important to note that soil contamination by irrigation with polluted water degrades the soil conditions for future crops, thus devaluing the land. Land prices for 2019 in the study area are shown in Table 4.
The land use map (IGN 2018) was used to delimit the type of crop and land classes in the study area (Fig. 4).

• Agricultural production (crops)
There are two types of crops in the study area: arable crops and woody crops. Both are delimited in the agricultural regions established by the government authority (Ministerio de Agricultura, Alimentación y Medioambiente de España) (Fernández González 2013). Thus, the agricultural regions provided the information about the cultivated area of the different crops. The main arable crops in the study area are wheat, barley, corn, chickpea, and oat (Fernández González 2013). Each type of crop has its own yield and market price (MAPA 2020, 2021c, 2021d) (Table 5).
The main woody crops in the study area are vineyards, olive groves, and fruit trees (not citrus) (Fernández González 2013). As with arable crops, each type of crop has a particular yield and market price (MAPA 2020; Subsecretaría de Agricultura, Pesca y Alimentación 2020).
The value of the production obtained from woody crops varies according to the specific product obtained. For this reason, the prices of the different products were averaged to obtain a single value per product (Table 6). The production of each type of woody crop is shown in Table 7.
The agricultural production of each type of crop was obtained by multiplying the cultivated area of the different crops by the economic production.
Finally, the total production of agricultural crops was obtained from the sum of agricultural production maps (arable and woody) obtained for different crops.
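The crop production calculation described above can be sketched as follows. This is a minimal illustration, assuming that the economic production per hectare is the product of yield and market price; the crop figures are hypothetical placeholders, not values from Tables 5 to 7.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Crop:
    name: str
    area_ha: float            # cultivated area (hypothetical value)
    yield_t_per_ha: float     # yield (hypothetical value)
    price_eur_per_t: float    # market price (hypothetical value)

    def production_eur(self) -> float:
        """Economic production: area x yield x price (assumed form)."""
        return self.area_ha * self.yield_t_per_ha * self.price_eur_per_t


def total_crop_production(crops: Iterable[Crop]) -> float:
    """Total production as the sum over arable and woody crops."""
    return sum(crop.production_eur() for crop in crops)


crops = [
    Crop("wheat", area_ha=100.0, yield_t_per_ha=3.0, price_eur_per_t=200.0),
    Crop("olive groves", area_ha=50.0, yield_t_per_ha=2.0, price_eur_per_t=500.0),
]
total = total_crop_production(crops)
print(f"Total crop production: {total:.0f} EUR/year")
```

In the study, the same sum is carried out on maps rather than single values, so each cell receives the production of the crops cultivated there.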

• Livestock production
Livestock production in the study area is based on bovine, ovine, goat, porcine, and poultry livestock, depending on the water resources available from different groundwater bodies (Table 8).
Each type of livestock generates different products, which have an associated yield and market price. The number of heads of livestock per hectare multiplied by the surface area of each groundwater body resulted in the total number of heads for each type of livestock. The total production for the different types of livestock was obtained by multiplying the number of heads related to each groundwater body by the annual production in €/year of each associated product (meat, milk, wool, and eggs) (Tables 9, 10, and 11).
In this work, Eq. 6 is proposed to calculate the total livestock production associated with the groundwater bodies in the study area.
$$\text{Livestock production (€/year)} = \text{Bovine production} + \text{Ovine production} + \text{Goat production} + \text{Porcine production} + \text{Poultry production} \qquad (6)$$

• Employment related to agricultural activities
In this research, employment was considered the main social variable to evaluate the vulnerability. Although there are four productive sectors in the study area (agriculture, industry, building, and services), the agricultural sector is the most important sector that depends on groundwater resources. For this reason, agricultural employment was chosen to assess the impact of groundwater contamination. Employment was calculated considering the permanent employment between June 2020 and July 2021, and the employment density per square kilometer (Table 12).
To assess the social impact of employment in the sector, the economic value generated through wages was considered. Taking into account that the average salary in the agricultural sector in Spain for the 2019-2020 period was 16,470 €/year (MAPA 2021b), we propose to calculate the value of employment using Eq. 7.

Social vulnerability mapping by cluster analysis (K-means method)
K-means cluster analysis was applied to the entire data set for the three factors obtained above. There were 5842 points (records) and three factors: vulnerability of groundwater resources (V GR), vulnerability of exposed population (V P), and socio-economic vulnerability (V S-E).
Data processing was carried out using RStudio v.4.0.5 software. Each factor was normalized with the max-min scaling method, in order to reduce the bias caused by the predominance of very high ranges over lower ranges. The Extract Values to Points tool in ArcGIS v10.4.1 was used to obtain the value of each variable for the 5842 points.
The goal of K-means is to group data points with intrinsic similarities in the data set. In this research, this process started with the selection of the optimal number of clusters, which was determined by the R package NbClust using the majority rule (Charrad et al. 2014). Euclidean distance was used to measure the distance from each point in the data set to the centroid of a temporary cluster. Points were assigned to clusters so as to minimize the sum of squared errors A (Eq. 8; Dabbura 2020) between each point and the centroid of its cluster.
$$A = \sum_{i=1}^{K} \sum_{x_k \in k_i} \lVert x_k - m_i \rVert^2 \qquad (8)$$
where x k = (x 1, x 2, x 3, ..., x n) are the data belonging to the cluster k i, K is the number of clusters, and m i is the centroid of the cluster k i (Eq. 9; Dabbura 2020):
$$m_i = \frac{1}{N_i} \sum_{x_k \in k_i} x_k \qquad (9)$$
where N i is the number of data objects in the cluster i. The procedure finishes when no points are reallocated from one cluster to another or when a pre-defined number of iterations is reached (Dabbura 2020).

Vulnerability of groundwater resources (V GR)
Table 13 shows the amount of groundwater for different uses in each groundwater body. The higher the quantity of groundwater resources affected, the higher the vulnerability and vice versa. In the case studied, agricultural uses account for the largest groundwater consumption in at least two of the three groundwater bodies (more than 50% of groundwater uses). For this reason, the activities associated with this sector are more severely impacted than the others. On the other hand, the "Guadalajara (030.006)" groundwater body is the one with the highest water consumption, although it represents a small area in the study (Fig. 1). Figure 5 shows the spatial representation of total groundwater consumption. The largest groundwater use corresponds to a small portion of the study area (almost 6% of the surface) and is mainly located to the northeast in the "Guadalajara (030.006)" water body. Although this consumption is mainly due to agricultural development, urban and industrial consumption is also high in this area (Table 13). The total consumption in this zone represents almost 60% of the groundwater resources available in the study area (~23.02 hm³/year). Despite being a very small area, the amount of groundwater affected by a contamination event is very significant. It is important to note that in recent years, nitrate contamination has increased in this area (CHT 2019; Mostaza 2019) (Fig. 2), highlighting the impact of urban and agricultural uses on groundwater contamination.
Most of the study area (almost 80% of the surface) consumes around 40% of the available groundwater resources (~14.20 hm³/year) in the water body "Aluvial Jarama-Tajuña (030.007)," which represents a significant amount of the available groundwater and affects a large part of the study area along the central zone from north to south. This consumption is mainly due to agricultural activities and, to a lesser extent, industrial use (Table 13). This means that an eventual contamination of the aquifer could have a major impact on the environment and agricultural activities, negatively affecting the economic and social development of a large part of the region. The hydraulic connection between the aquifer and the Jarama River favors an eventual contamination of the river water from the aquifer, affecting other natural resources. Figure 2 shows high values of nitrate concentration in this zone, mainly in the south, which evidences the potential impact on groundwater resources in this area. A minor impact on groundwater resources occurs in a small area to the northwest of the study area, in the "Jarama Aluvial: Guadalajara-Madrid (030.024)," which covers slightly more than 5% of the surface. Groundwater uses in this area represent only 5% of the available groundwater resources (~2.02 hm³/year) and are mainly related to industrial uses (Table 13). The low nitrate concentration in this area (Fig. 2) implies a lower impact on groundwater resources. The groundwater resources vulnerability factor (V GR) implied the consideration of an environmental component due to the possibility of incorporating contaminated water into the Jarama River (and its canal) from the aquifer, causing significant damage to flora and fauna as well as socio-economic losses.
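The shares of groundwater consumption quoted above can be recomputed from the three abstraction volumes given in the text. The sketch below assumes that the three groundwater bodies together account for all the groundwater used in the study area; the dictionary keys are the groundwater body codes, and the variable names are illustrative.

```python
# Groundwater abstraction per groundwater body, in hm^3/year (from the text).
abstraction_hm3 = {
    "030.006": 23.02,  # Guadalajara
    "030.007": 14.20,  # Aluviales Jarama-Tajuña
    "030.024": 2.02,   # Aluvial del Jarama: Guadalajara-Madrid
}

# Assumption: the study-area total is the sum of the three bodies.
total_hm3 = sum(abstraction_hm3.values())

shares_pct = {
    code: 100.0 * volume / total_hm3 for code, volume in abstraction_hm3.items()
}

for code, share in shares_pct.items():
    print(f"{code}: {share:.1f}% of {total_hm3:.2f} hm^3/year")
```

Under this assumption the shares are about 59%, 36%, and 5%, in line with the "almost 60%" and "5%" quoted for the northeastern and northwestern bodies and with the 36% figure cited later in the text for the central body.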

Vulnerability of exposed population (V P )
Figures 6 and 7 show the population density in the study area (Comunidad de Madrid 2020) and its percentage of urban groundwater use, respectively. In general, the study area has a low population density, as it is mainly an agricultural region. Therefore, urban development is low. The highest urban groundwater consumption is only 16% and occurs in a small area located to the northeast (5% of the total area). However, in this area, the nitrate concentration has increased in recent years (Fig. 2). Thus, the impact on urban uses is significant, and a high vulnerability is expected in this zone.
The distribution of the population exposed to the consumption of contaminated groundwater is shown in Fig. 8. By density, the exposed population is low. More than 90% of the study area presents an exposed population density of less than 50 inhabitants per square kilometer, due to the limited urban development in the area. For this reason, the vulnerability of the exposed population does not have a significant influence on the social vulnerability of the study area.

Socio-economic vulnerability (V S-E )
As mentioned in previous sections, the price of land depends on agricultural development and is higher where irrigated crops are present, mainly due to the presence of irrigation facilities. Figure 9a shows that about 60% of the area has a total land price below one million euros (the lowest value), which is consistent with the fact that most of the study area has few irrigated crops and a significant portion of unproductive areas. Thus, the land class map (Fig. 4) shows that most of the area consists of unproductive, forested, and rainfed crop zones. On the other hand, about 13% of the area has a total land value of more than ten million euros. This corresponds to the southern sector, where irrigated crops are highly developed because the Jarama Canal provides water for irrigation.
Crop production reached about four million euros in the largest part of the study area (44% of the area), mainly from arable crops of corn and wheat and from olive groves, located in the south-central part of the study area. The north of the study area (38% of the study area) had a low crop production of less than one million euros, from arable crops of barley, corn, and oat and from fruit trees. In the central area, the production of wheat, barley, corn, chickpeas, vineyards, olive groves, and fruit trees reached about three million euros (Fig. 9b). About 80% of the study area produces one million euros or more from livestock, mainly bovine and ovine. However, this value is low compared with crop production (Fig. 9c).
Agricultural employment is the type of employment most influenced by groundwater uses. The economic value contributed by employment income was less than 300,000 euros/year, lower than the other economic variables. The employment income was higher in the south of the study area (Fig. 9d), due to higher agricultural development.
The highest values of socio-economic vulnerability were located in the south of the study area (Fig. 10), mainly influenced by land prices and crop production. Both are in turn strongly conditioned by the availability of water from the Jarama Canal to irrigate crops. The reduction factor associated with the irrigation facilities of the Jarama Canal (Fig. 3) lowers the socio-economic vulnerability in this zone by 10-20%, owing to the combined use of surface water and groundwater for irrigation. However, even with this effect, this is still the area with the highest socio-economic vulnerability. The lowest values are located in the north due to the scarcity or absence of crops and low livestock production, which implies low agricultural employment in this zone. In the north-central area, the production of crops and livestock led to a moderate socio-economic vulnerability.

Social vulnerability by K-means cluster analysis
The optimal number of clusters obtained by NbClust was five (Fig. 11).
The results of the K-means cluster analysis are summarized in Table 14.
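To illustrate how a partition of this kind is produced, the following is a minimal NumPy sketch of the K-means procedure (Lloyd's iteration) on three normalized factors. It is not the authors' code: the study used R with the NbClust package, whereas this example is written in Python, uses synthetic data, starts from randomly chosen data points, and fixes the number of clusters by hand. The function name and its parameters are illustrative.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd's iteration). Returns (labels, centroids, sse)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct data points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dist2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no point was reallocated, so the procedure has converged
        labels = new_labels
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[i] = members.mean(axis=0)
    # Sum of squared errors between each point and its cluster centroid.
    sse = float(((points - centroids[labels]) ** 2).sum())
    return labels, centroids, sse


# Synthetic stand-in for normalized (V_GR, V_P, V_S-E) values: two tight groups.
rng = np.random.default_rng(42)
group_a = 0.1 + 0.01 * rng.standard_normal((50, 3))
group_b = 0.9 + 0.01 * rng.standard_normal((50, 3))
data = np.vstack([group_a, group_b])

labels, centroids, sse = kmeans(data, k=2)
print(centroids.round(2), round(sse, 4))
```

No weights or ratings are supplied at any point: the grouping depends only on the distances between records, which is the property the methodology relies on. In practice a library routine with multiple restarts would be preferable, since a single random start can converge to a poor local minimum.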
Cluster 1 gave the highest value of socio-economic vulnerability due to the important agricultural development located in the south of the study area. However, the reduction factor affecting this zone, related to the use of surface water for irrigation, decreases the social vulnerability in this sector. In fact, groundwater consumption is moderate in this area. In addition, the population in this zone is small, which limits the effect on human consumption. These conditions suggest that this cluster corresponds to high vulnerability, mainly influenced by socio-economic factors.
Cluster 3 represents the lowest values of vulnerability of groundwater resources and population vulnerability. At the same time, socio-economic vulnerability in this cluster is very low. These conditions contribute to very low or no social vulnerability. This cluster includes areas where there are no groundwater bodies (at the boundary of the study area and in a narrow branch located in the northeast), which reduces the impact on the water resources. In addition, the groundwater use in the northwest of the "Aluvial del Jarama: Guadalajara-Madrid" groundwater body has the lowest urban, industrial, and agricultural consumption (Table 13), resulting in the lowest population and socio-economic vulnerability.
Clusters 2 and 4 show low population vulnerability values due to the low urban development in these areas. However, the groundwater resources vulnerability overlaps in the range of index values between these clusters, with values lying between the highest and lowest ranges. This suggests that clusters 2 and 4 represent low and moderate social vulnerability, but it is not clear which corresponds to which. The socio-economic vulnerability is used to clarify the classification and assign clusters 2 and 4 to their corresponding level of social vulnerability. Of the two, cluster 4 has the higher values of socio-economic vulnerability and cluster 2 the lower ones. For this reason, cluster 2 represents low vulnerability and cluster 4 moderate vulnerability. In this case, both population and groundwater resource vulnerabilities have had a greater influence on the creation of the clusters, and the socio-economic vulnerability has allowed the classification to be refined.
Cluster 5 includes the highest values for the vulnerability of the exposed population and the vulnerability of groundwater resources; therefore, a very high social vulnerability would be expected. However, as mentioned above, the population vulnerability did not have a relevant influence on social vulnerability due to the low population. In fact, the socio-economic vulnerability had the lowest value, which means that despite the higher impact on the population and groundwater resource, the economic losses due to an eventual groundwater contamination would be low. Taking into account the very high vulnerability of exposed population and groundwater resources, cluster 5 was considered representative of high social vulnerability, similarly to cluster 1.
According to the previous discussion, none of the clusters corresponds to very high social vulnerability.
The clusters distribution and the social vulnerability maps using K-means analysis are shown in Fig. 12.
Some localities are not associated with a groundwater body, and therefore social vulnerability in these localities has been considered null. According to the results, more than 50% of the study area has a low social vulnerability, mainly located in the central zone and in some parts of the north-central area and the southern edges of the aquifer. This is due to the low population in these areas as well as low agricultural development, which implies a low economic production. In addition, about 10% of the area has little or no groundwater use, which results in very low or no social vulnerability. This condition is observed in the northwest sector of the "Aluvial del Jarama: Guadalajara-Madrid" groundwater body, which only consumes 5% of the groundwater resource in the study area. The low nitrate concentrations in this area (Fig. 2) support the methodology used, since they are in agreement with the results obtained. A small part of the study area (11%), located in the center-north and in some areas in the south, shows moderate social vulnerability, mainly influenced by the economic impact on agricultural development. It is important to note that the influence of surface water supplied by the Jarama Canal in the southern part of the study area reduces the impact of groundwater contamination (the use of surface water for irrigation reaches 23% in some localities). This result is in agreement with the nitrate concentration maps, which show concentrations of around 20-40 mg/L, a contaminant presence that, although high, does not exceed the permitted limit (50 mg/L). Finally, almost 20% of the area presents high social vulnerability, distributed in two different sectors and influenced by different factors. In the north, the high social vulnerability is due to the high groundwater resource vulnerability (~50% of the groundwater consumed in the study area) and the impact of the exposed population (~166 inhabitants per square kilometer). In the south of the study area, the high social vulnerability is due to the impact on the social and economic sectors related to agricultural production, employment, and land prices, although this impact is partially reduced by the use of irrigation water from the Jarama Canal. Despite this, the groundwater consumption in this area represents 36% of the total groundwater resource of the study area, which is a significant amount of the resource used for irrigation needs and whose contamination would negatively affect the economic and social development in the area.
The high nitrate concentrations in these two differentiated zones (Fig. 2) show the reliability of the results, as the areas of high social vulnerability correspond to high pollution.
These results demonstrate the advantage of using K-means cluster analysis in the assessment of social vulnerability, since it was not necessary to assign numerical rates and weights to the considered vulnerability factors. The similarity in the data set grouped the information required to classify the social impact, and the results are in agreement with the real situation of groundwater quality in the aquifer.

Conclusions
The interaction between contaminated groundwater, nature, and society requires an assessment of the risk and the damage that a contamination event could cause in a region. In this work, the assessment of social vulnerability allowed the delimitation of zones for an adequate prevention of damage to society by contaminated groundwater. To assess the social vulnerability, three factors were considered: the groundwater resource (V GR), the exposed population (V P), and the socio-economic impact (V S-E).
The results point out that the vulnerability of groundwater resources represents an important and influential factor in the assessment of social vulnerability. The agricultural uses account for more than 50% of the groundwater use, as agricultural activities are the most important productive sector in the region that depends on groundwater resources. On the other hand, a high value of groundwater use due to agricultural, industrial, and urban activities was identified in the northeastern part of the study area. Although small, this area revealed an important influence on the assessment of social vulnerability.
The study area is sparsely populated due to its significant agricultural development, which reduces the number of people exposed to the consumption of contaminated groundwater. In fact, less than 50 inhabitants per square kilometer are exposed in most of the study area (more than 90%). For this reason, the exposed population has a low influence on social vulnerability.
The socio-economic vulnerability is closely influenced by agricultural activities in the region that involve crops and livestock production that affect land prices and agricultural employment. The highest socio-economic vulnerability was observed in the south of the study area. This area covers ~14% of the study area, representing a small but significantly affected area.
The K-means cluster analysis made it possible to assess and delimit areas to classify the social vulnerability without using weighting and rating values for the three factors considered. Five clusters were obtained, revealing four levels of social vulnerability: very low (cluster 3), low (cluster 2), moderate (cluster 4), and high (clusters 1 and 5).
Most of the social vulnerability in the study area is low, which means that an eventual groundwater contamination would not cause a significant impact on society. However, in the south (14.2% of the area) and in the northeast (5.7% of the area), there are small zones that could be highly affected, with a more significant impact. In the northeast, this is due to the exposed population and the groundwater resource; in the south, it is due to the socio-economic factor related to agricultural development. In contrast, in the northwest of the study area, there is a small area (10.9%) that is not affected by groundwater contamination at the social level. The small and scattered areas in the center-north and south present moderate social vulnerability, due to the groundwater resources factor (accounting for 36% of the total groundwater consumption) and the socio-economic factor related to agricultural development. The results are consistent with the distribution of nitrate concentration in the aquifer, which validates and confirms the advantage of using a novel objective method through K-means cluster analysis in the assessment of social vulnerability to groundwater pollution. It is important to note that the assessment of the risk of contamination of an aquifer involves other variables in addition to social vulnerability, such as the intrinsic vulnerability of the aquifer and the assessment of the pollutants involved. It is therefore recommended that a K-means analysis be applied to obtain these other components to estimate the risk of contamination of aquifers in future research.

Author contribution All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Marisela Uzcategui-Salazar. The first draft of the manuscript was written by Marisela Uzcategui-Salazar and Javier Lillo, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations
Ethics approval This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate Not applicable.

Consent for publication Not applicable.
Competing interests The authors declare no competing interests.