Study of groundwater salinity in the HAHA syncline by the Kohonen self-organized classification (Essaouira, Morocco)

The coastal aquifer of the Essaouira syncline (Morocco) was studied to identify the main processes behind the salinization of its groundwater. A multicriteria analysis of hydrochemical data and physicochemical parameters of the Plio-Quaternary aquifer was used to understand their spatio-temporal variation and their origins. Integrated water resources management has become paramount at local, regional, national and international levels. The need for such management is reinforced by extreme hydrological events (droughts or floods), which can directly influence human, economic and political affairs. Appropriate management of a resource requires its evaluation. A statistical study by Kohonen's self-organized classification (SOM) of hydrochemical data for the years 1995 and 2009 was used to process 47 samples distributed over the entire study area. It showed that the physico-chemical parameters evolve in time and space, with values increasing from the center of the study area towards the southwest.


Introduction
Groundwater remains the main water resource for drinking water, agriculture, industry and tourism for the inhabitants of towns and rural areas in coastal zones with arid and semi-arid climates. Several factors affect the electrical conductivity and, consequently, degrade the quality of groundwater. These factors can be of natural origin (leaching, evaporation, dissolution, marine aerosols), of anthropogenic origin (marine intrusion, over-fertilization, water drainage, wastewater spreading, automobile emissions), or a combination of the two (Andreasen and Fleck 1997; Oulaaross 2009).
In nature, groundwater is generally connected to surface water, which infiltrates and percolates through the unsaturated zone to reach the aquifer (saturated zone). The effective infiltration rate influences the piezometric evolution and, consequently, the evolution of the water quality of the reservoir. Indeed, large effective infiltration raises the piezometric level, increases the flow and dilutes the mineralization (Urish and Frohlich 1990; Sherif et al. 2006; Attwa and Zamzam 2020). Conversely, weak or absent effective infiltration combined with intensive pumping lowers the water table and concentrates minerals in the groundwater (Chafouq et al. 2018). These waters are often threatened by contamination from pollutants of biological, chemical or physical origin.
Like other regions of Morocco, the Essaouira basin has seen a significant decrease in the quantity and quality of its water supplies. This situation has reduced agricultural productivity and degraded several ecosystems. However, the basin has an aquifer system formed by a set of aquifers of unequal size. These aquifers offer a natural regulating capacity, which makes them valuable in ensuring a safe and regular supply. The reserve also makes it possible to meet seasonal needs through temporary overexploitation, to the extent that recovery is possible (Chamchati et al., 2013).
The diffuse pollution generated by marine waters is amplified by point sources of pollution, mainly human and agro-industrial activities. As a result, the salt contents recorded in certain places greatly exceed the standards for drinking water or even for irrigation (Bahir et al. 2001). This situation risks ultimately jeopardizing the sustainability of agricultural activity as a whole, harming public health and compromising the self-purification capacity of groundwater.
These waters circulate slowly through the subsoil, so pollution from human activities can persist for periods ranging from a few years to several decades, or even longer for certain specific aquifers. In coastal areas, groundwater resources therefore require special attention to minimize saltwater intrusion, either locally or on a regional scale. The extent of saltwater intrusion depends on the geometry, structure and properties of the
The Cretaceous formations outcrop over the entire Essaouira basin with thicknesses varying from one place to another. In the study area, the Cretaceous outcrops in the east as gray marls and lumachellic limestones of Senonian age, dolomitic limestones with flint of Turonian age, and limestones, marls and gypsiferous marls of the Cenomanian along the Tidzi river. Towards the west, the outcrops of the Albian correspond to green marls surmounted by dolomitic limestones of Maestrichtian age. The Lower Cretaceous formations also outcrop to the NW of the study area (Choubert and Faure-Muret 1962; Duffaud et al. 1966; Rey et al. 1950).

Material And Method
The water samples were taken from the Plio-Quaternary water table in the coastal zone of Essaouira, between the Qsob river and the Tidzi river, at 35 wells in 1995 (Table 1; Mennani 2001) and at 12 wells in 2009 (Table 2; Chamchati 2014). The physical parameters are the piezometric level (PL), the temperature (T) and the electrical conductivity (CE). The chemical parameters are, on the one hand, the major cations (calcium, magnesium, sodium and potassium) and, on the other hand, the major anions (chloride, nitrate, sulphate and bicarbonate, HCO3−). The data were processed by advanced statistical analysis techniques. The Kohonen Self-Organizing Map (SOM) classification method was used to understand and visualize the spatial and temporal distribution of the samples.
Principal component analysis (ACP) and hierarchical classification of the SOM (ACH-SOM) were used to validate the classification by SOM.
Kohonen Self-Organizing Map (SOM)
This method is based on the neural networks of Kohonen (2001). These tools are increasingly used for multivariate data processing and provide practical visualizations. A SOM consists of units, called neurons, connected on a regular grid, usually a 2D hexagonal grid. It partitions a global training data set (input data space) into a reduced number of subsets whose members share certain statistical characteristics. Each subset is represented by a weight vector, which has as many components as the input data vectors. A weight vector is a vector in the input data space and corresponds to a neuron of the map (output space). A special feature of SOM algorithms compared with other grouping methods is the preservation of the topology of the input data space. This preservation of topology allows samples with similar characteristics to be placed close together on the map. Correlations and relationships between samples and variables can be easily visualized using the SOM component planes (Vesanto 1999). The accuracy of the map and of its topology preservation is evaluated by the mean quantization error (QE) and the topographic error (TE) (Kiviluoto 1996; Kohonen 2001).
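The training loop and the two quality measures described above can be sketched as follows. This is a minimal illustration and not the software used in the study. It assumes a rectangular grid (not the hexagonal one usually preferred), a Gaussian neighbourhood, linearly decaying learning rate and radius, and synthetic standardized data. The names `train_som`, `quantization_error` and `topographic_error` and all parameter values are hypothetical.

```python
import numpy as np

def train_som(data, rows=6, cols=8, n_iter=3000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small SOM; returns (weights, grid) with one weight vector per neuron."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of the neurons (assumption: rectangular lattice).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each weight vector with a randomly drawn sample.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks towards 0.5
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: the neuron whose weight vector is closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighbourhood measured on the grid, not in the data space.
        h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights, grid

def quantization_error(data, weights):
    """Mean distance between each sample and its best-matching unit."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1).mean()

def topographic_error(data, weights, grid):
    """Share of samples whose two closest neurons are not grid neighbours."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    first, second = order[:, 0], order[:, 1]
    # Assumption: neighbours are the 8 surrounding cells of the rectangular grid.
    adjacent = np.abs(grid[first] - grid[second]).max(axis=1) <= 1
    return 1.0 - adjacent.mean()

# Synthetic stand-in for 47 samples described by 11 standardized parameters.
rng = np.random.default_rng(1)
raw = rng.normal(size=(47, 11))
data = (raw - raw.mean(axis=0)) / raw.std(axis=0)
weights, grid = train_som(data)
qe = quantization_error(data, weights)
te = topographic_error(data, weights, grid)
print(f"QE = {qe:.3f}, TE = {te:.3f}")
```

A low QE means the weight vectors lie close to the samples, and a TE near zero means neighbouring samples in the data space land on neighbouring neurons. The values printed here come from random data and are not comparable with those of the study.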
Component planes show the value of each component in each neuron and how each input variable varies across the output space. Viewing several component planes at the same time makes it possible to detect correlated variables. Variables whose planes show the same color patterns increase or decrease together; these are positive correlations. Conversely, negatively correlated variables show the same patterns with an opposite color distribution (Fig. 3). In this study, the SOM is used to process all 47 samples.
Table 1 Physico-chemical data of samples taken from 35 wells during 1995 (Mennani 2001)
Hierarchical classification is an automatic classification method used in data analysis: starting from a set of individuals, its goal is to distribute these individuals into a certain number of classes (Fig. 4). Hierarchical classification methods divide the training data set into classes according to the proximity of its elements. The first iteration of an ascending hierarchical classification (AHC) algorithm combines the two closest individuals. Elements (individuals or groups of individuals) are then combined in pairs according to their proximity until a single class remains. The result is a hierarchy of groups, in which each level represents a partition of the data into disjoint groups. In the present study, the hierarchical classification of the SOM (SOM-HC) groups similar neurons of the SOM according to the distances between their weight vectors. Each sample is then assigned to the group of its neuron.
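As an illustration of the ascending hierarchical classification applied to SOM weight vectors, the sketch below merges the two closest groups repeatedly until a chosen number of classes remains, then gives each sample the class of its best-matching neuron. The linkage rule is not stated in the text, so average linkage with Euclidean distance is an assumption here. The function name `agglomerate` and the toy weight vectors are hypothetical.

```python
import numpy as np

def agglomerate(vectors, n_classes):
    """Ascending hierarchical classification (average linkage, an assumption)."""
    clusters = [[i] for i in range(len(vectors))]   # start: one group per neuron
    dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)
    while len(clusters) > n_classes:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all members of the two groups.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]     # merge the closest pair
        del clusters[b]
    labels = np.empty(len(vectors), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Toy "neuron weights": three well-separated groups of four neurons each.
rng = np.random.default_rng(0)
weights = np.vstack([c + 0.1 * rng.normal(size=(4, 2)) for c in (0.0, 5.0, 10.0)])
neuron_class = agglomerate(weights, n_classes=3)

# Each sample inherits the class of its best-matching neuron.
samples = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
bmu = np.argmin(np.linalg.norm(samples[:, None, :] - weights[None, :, :], axis=2), axis=1)
sample_class = neuron_class[bmu]
print(neuron_class, sample_class)
```

Stopping at a fixed number of classes is only one possible criterion. Stopping when the smallest inter-group distance exceeds a threshold would correspond to cutting the dendrogram at a given height.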

Correlation matrix
The correlation matrix is the matrix of correlation coefficients calculated on several variables taken in pairs. The resulting matrix is symmetric and its diagonal elements are equal to 1, since the correlation of a variable with itself is 1. It is used to identify the links between the variables easily. A coefficient of zero indicates the absence of a linear relation between two variables (which does not by itself prove that they are independent), and a coefficient whose absolute value is 1 indicates an exact linear relation. Intermediate values are more difficult to interpret.
However, a positive value indicates a positive correlation between two variables, while a negative value indicates an inverse correlation. The coefficient varies between −1 and +1; the linear relation is stronger the closer the coefficient is to +1 or −1, and weaker the closer it is to 0.
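The properties of the correlation matrix can be checked on a small made-up example. The sketch below computes Pearson coefficients from standardized columns. The function name `correlation_matrix` and the toy data are hypothetical and unrelated to the study's measurements. The last column is the square of the first, which gives a coefficient of zero even though the two columns are clearly related, a reminder that the coefficient only measures linear association.

```python
import numpy as np

def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (samples in rows)."""
    centered = data - data.mean(axis=0)
    z = centered / centered.std(axis=0, ddof=1)    # standardize each column
    return (z.T @ z) / (data.shape[0] - 1)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
data = np.column_stack([
    x,            # reference variable
    2.0 * x + 1,  # exact increasing linear function of x
    -x,           # exact decreasing linear function of x
    x ** 2,       # depends on x, but not linearly
])
r = correlation_matrix(data)
print(np.round(r, 3))
```

The result matches `np.corrcoef(data, rowvar=False)`, which is the usual way to obtain the same matrix in NumPy.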

Principal Component Analysis (ACP)
The objective of this methodology is to determine the factors that influence the variability of the parameters of the Essaouira basin. The ACP method used for this study is based on the interpretation of the correlation matrix and of the factors obtained from data processing. The number of principal axes retained is kept as small as possible while ensuring that the cumulative sum of their contributions is significant (75%, which represents three quarters of the total inertia). In this study, two variables are considered correlated when their correlation coefficient is greater than or equal to 0.7.
In addition, in the factorial plane, variables are only well represented when they lie close to the ends of the factor axes. When two variables are correlated, the variation of one is accompanied by the variation of the other. Analyses of variance were used to judge the significance of the relationships highlighted by the factor analysis.
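The selection of principal axes by cumulative contribution can be sketched as follows. This is an illustrative outline on synthetic data, assuming a principal component analysis carried out on the correlation matrix (standardized variables). The function names `pca_on_correlation` and `n_components_for` are hypothetical, and the 0.75 threshold mirrors the 75% of total inertia mentioned above.

```python
import numpy as np

def pca_on_correlation(data):
    """PCA of standardized variables; returns explained shares, loadings, scores."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    r = (z.T @ z) / (data.shape[0] - 1)            # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(r)           # eigh: r is symmetric
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()            # share of total inertia per axis
    # Loadings: correlations between the variables and the components.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    scores = z @ eigvecs                           # coordinates of the individuals
    return explained, loadings, scores

def n_components_for(explained, threshold=0.75):
    """Smallest number of axes whose cumulative share reaches the threshold."""
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)

# Synthetic data: three variables driven by a common factor, two independent ones.
rng = np.random.default_rng(0)
common = rng.normal(size=(47, 1))
data = np.hstack([common + 0.3 * rng.normal(size=(47, 3)), rng.normal(size=(47, 2))])
explained, loadings, scores = pca_on_correlation(data)
k = n_components_for(explained, 0.75)
print(np.round(explained, 3), k)
```

Plotting the first two columns of `loadings` gives the variables in the factorial plane, and the first two columns of `scores` give the projection of the individuals.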

Results And Discussion
Kohonen self-organizing map and correlation matrix
The distribution maps, taken from the Kohonen map, make it possible to visualize the distribution of the samples according to the hydro-chemical parameters (Tables 3, 4 and 5).
Dark red cells represent high values, while blue cells represent low values. The values are represented on a logarithmic scale. A topological map of 48 cells (8 columns × 6 rows) was selected for this analysis, with a quantization error QE = 0.338 and a topographic error TE = 0.000 (Fig. 5). Each map is representative of the corresponding variable and consequently makes it possible to study the correlations between the variables.
Table 3 Classification data and visualization of the spatial and temporal distribution of samples (Classes 1, 2).
The correlation matrix expresses the different correlations between the analyzed variables (Table 5). Overall, the correlations are relatively weak: more than 77% of the correlation coefficients are less than 0.53. It follows that the analyzed variables are not strongly correlated with each other. This finding indicates that there is little redundancy of information and justifies the choice of these variables to conduct the study.
The strong correlations observed link variables of the same class (Table 5).
Hierarchical classification ACH-SOM
The SOM hierarchical classification method applied consists in grouping the neurons as well as possible so as to give a more global view of the SOM map. To obtain a good partition, one can manually choose to cut at a level where the branches of the tree (dendrogram) are long, indicating that the data contained in the classes are very different.
With a lot of data, it is enough to visualize the nodes closest to the root. Another possible cutting criterion is the distance between the classes. The line in the figure shows a cutoff at 1.2 (Fig. 6). However, it is difficult to determine the correct cutting value automatically.
A further criterion is to cut according to the number of classes obtained by the SOM.
The second class (cluster II) is characterized by neurons 7, 8, 9, 15, 16, 23 and 24, which correspond to stations 7, 11, 21, 24, 35 and 39. These stations belong to the 1995 campaign and are located in the SW part of the study area. The third class (cluster III) presents the greatest number of neurons with 3, 4, 5, 6, 11,
The first principal component (PC1): a first grouping located in the positive values includes the elements Ca2+, Cl−, CE 25°C, Na+, Mg2+ and, to a lesser degree, i.e.b.
A second grouping located in the negative values includes PL (m) and rMg/rCa. The second principal component (PC2) correlates slightly with T (°C) in the positive part and with K+ and HCO3− in the negative part.

Projection of individuals
The axes express a significant percentage of the information, with 42% for PC1 and 16.1% for PC2, which gives a total of 58.1% of the information (Fig. 10, 11). The descriptive statistics of the physicochemical variables used reveal that the groundwater in the study area has temperatures ranging between 15.5 and 24.5 °C, with an average of 21.02 °C. The electrical conductivity ranges between 770 and 5040 µS/cm, with an average of 2682.19 µS/cm (Table 6).
Class 1 shows the highest piezometric level (PL), while class 3 shows the highest values of electrical conductivity (CE), Cl− and Na+ compared to the other two classes (Table 6). This increase is generally due to the influence of the Qsob river (dilution of the water table), the diapiric influence (flush and/or sub-flush diapirs) and the influence of the Atlantic Ocean.
To get an idea of the classification of the samples and their sampling position, the coordinates of the stations were projected on the geological map of the study area. This projection showed that the class 1 samples are grouped around the Qsob river. Recharge of the water table by infiltration of fresh water from this river dilutes the water of this class, which explains the low values of its parameters (Fig. 12).
The 2nd class is generally located in the eastern part, where the Tidzi diapir outcrops. The erosion of the clay-salt outcrops of Triassic age (Tidzi diapir) influences the synclinal basin and may contaminate the Plio-Quaternary water table; this explains the increase in the values of the physico-chemical parameters compared to the 1st class.
The 3rd class is characterized by high values for most of the parameters, probably related to its proximity to the Atlantic coast (infiltration of marine waters).

Figure 3
Neighborhood relations on the SOM.

Figure 4
Example of hierarchical classification and separation of groups in SOM.

Figure 5
Gradient of values of the hydro-chemical parameters studied on the Kohonen map.

Figure 6
Dendrogram obtained with the hierarchical classification ACH-SOM.

Figure 7
Class distribution on Kohonen's SOM map.

Figure 8
Graphical representation of the data in the factorial plane PC1 × PC2.

Figure 9
Graphical representation of the data in the factorial plane PC1 × PC3.

Figure 10
Projection of individuals on the factorial plane PC1 × PC3.

Figure 11
Projection of individuals on the factorial plane PC1 × PC2.

Figure 12
Projection of the 3 classes on the geological map.