Comparison of Data-Driven Methods for Estimating Deuterium and Oxygen-18 Isotopes of Groundwater

Isotope techniques are most commonly used where hydro-chemical analysis alone is insufficient to identify the origin and quality of groundwater or to reveal seawater intrusion into groundwater along coastlines. In this study, the potential of the Multilayer Perceptron (MLP), Radial Basis Neural Networks (RBNN), Generalized Regression Neural Networks (GRNN), Adaptive Neuro-Fuzzy Inference System (ANFIS), Support Vector Machines (SVM), Gaussian Process Regression (GPR), Classification and Regression Tree (CART), and Multiple Linear Regression Analysis (MLR) was compared for estimating deuterium (δD) and oxygen-18 (δ¹⁸O) isotopes in groundwater of the Bafra plain, Northern Turkey, from known hydro-chemical properties of the waters. A total of 61 water samples collected from the plain were chemically analyzed. All data were divided into training (70%) and test (30%) sets. Cluster analysis was performed to reduce the number of input variables, and electrical conductivity (EC), chloride (Cl), magnesium (Mg), and sulphate (SO₄) were introduced into the models as input variables after different combinations of these variables had been examined in the studied models. Three statistical indices were used to evaluate model performance: determination coefficient (R²), root mean square error (RMSE), and mean absolute error (MAE). Moreover, a visualization technique (Taylor diagram) was used to assess the similarity between the measured and estimated δD and δ¹⁸O values. The comparison revealed that MLP was the most accurate of the applied models for both δD and δ¹⁸O estimation. Overall, the study suggests using data-driven methods, especially MLP, when appropriate laboratories for isotope analysis are lacking or the cost of analysis is high.


Introduction
Groundwater, one of the most valuable natural resources, has a dual character. On the one hand, it is a resource that moves in the depths of the earth and is abstracted from it; on the other hand, it forms part or all of the water resource. The dominant role of groundwater resources is

The studies on water quality assessment revealed that the high volume of groundwater abstraction, excessive pumping, and low recharge in coastal wells lead to seawater intrusion, thus increasing groundwater salinization (Klassen and Allen 2017; Mohanty and Rao 2019). The hydro-chemical characteristics of groundwater and stable isotopes (δ¹⁸O and δ²H) have

These studies have contributed significantly to the knowledge base regarding the use of AI technology to estimate water quality parameters. However, no study has been reported to date on applying AI to estimate deuterium (δD) and oxygen-18 (δ¹⁸O) isotopes in groundwater. Considering that isotope analysis is very expensive and that very few laboratories are equipped to carry it out, the objective of this research was to develop a simple,

The delta's climate is semi-humid, with temperatures ranging from 6.60 °C in January

groundwater levels and drainage within the study area. Irrigation is generally performed using border and furrow methods. About 75% of the land is irrigated with surface water and the remaining 25% with groundwater (Cemek et al. 2007).

For this analysis, a total of sixty-one water samples were taken from October 2007 to September 2008 at different locations in the study area: fifty-six from 28 monitoring wells in the Bafra plain and five from the Black Sea. Samples were filtered through a 0.45 µm filter, sealed in polyethylene bottles, and stored at 4 °C until processing. Electrical conductivity (EC) and pH of the water samples were measured in situ with a handheld portable kit. Major cation (K⁺, Na⁺, Ca²⁺, Mg²⁺) and anion (Cl⁻, SO₄²⁻) concentrations were analyzed in the laboratory using

The concentrations (mg L⁻¹) of Ca²⁺, Mg²⁺, K⁺, and Na⁺ range from 37.00 to 305.00, 46.00 to

Adaptive Neuro-Fuzzy Inference System (ANFIS)
This system integrates the learning ability and relational structure of the ANN with the decision-making mechanism of the fuzzy inference system (FIS). ANFIS learns from samples in a training dataset, as an ANN does. In this way, the optimal ANN structure for solving the associated problem is obtained. In order to identify its effect on samples that were

The best decision function f(x) can be stated as:

The GPR model can be defined as:

In the MLP models, different numbers of hidden nodes were tested, and the number that produced the lowest RMSE in the testing phase was selected. In the ANFIS technique, membership functions (MFs) were tried to find the best outputs. In the RBNN model, the optimum spread and number of hidden nodes were determined by trial and error. In the GRNN application, the optimal models were obtained by trying different spread values. In the SVM technique, the optimal parameters were selected using rule and the stopping criteria.
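The trial-and-error search over GRNN spread values can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard GRNN formulation (a Gaussian-kernel weighted average of the training targets) and assumes that candidates are ranked by test-set RMSE, the criterion the text states for MLP. The function names, the candidate spreads, and the one-feature toy data are all hypothetical; in the study the inputs would be EC, Cl, Mg, and SO₄.

```python
import math


def grnn_predict(train_x, train_y, query, spread):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    weights = [math.exp(-d / (2.0 * spread ** 2)) for d in sq_dists]
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed to zero: fall back to the nearest sample.
        nearest = min(range(len(train_x)), key=lambda i: sq_dists[i])
        return train_y[nearest]
    return sum(w * y for w, y in zip(weights, train_y)) / total


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def select_spread(train_x, train_y, test_x, test_y, candidates):
    """Try each candidate spread and keep the one with the lowest test RMSE."""
    best = None
    for spread in candidates:
        preds = [grnn_predict(train_x, train_y, q, spread) for q in test_x]
        score = rmse(test_y, preds)
        if best is None or score < best[1]:
            best = (spread, score)
    return best


# Hypothetical toy data (NOT from the study): one input feature, linear target.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 2.0, 3.0]
test_x = [[0.5], [1.5], [2.5]]
test_y = [0.5, 1.5, 2.5]

best_spread, best_rmse = select_spread(
    train_x, train_y, test_x, test_y, candidates=[0.1, 0.5, 1.0, 2.0]
)
print(best_spread, best_rmse)
```

The same loop applies to any single tuning parameter, such as the number of MLP hidden nodes, by swapping the prediction function. Note that selecting on the test set, as described in the text, makes the reported test error optimistic; a separate validation split would avoid this.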

In the MLR analysis, δD and δ¹⁸O were used as dependent variables, whereas EC, Cl, SO₄,

Table 3 is near here

The measured and estimated δ¹⁸O and δD values from the optimal MLP, ANFIS, RBNN, GRNN, SVM, GPR, CART, and MLR models were plotted in Fig. 6 and Fig. 7.

From these figures, the MLP5(4,5,1) models appear to give better results than the other studied models for δ¹⁸O and δD estimation. Figure 8 shows scatter plots of the measured and modelled δ¹⁸O and δD values for the MLP5(4,5,1) models in the testing period.

The δ¹⁸O and δD estimation models were also evaluated using a Taylor diagram (Fig. 9). The MLP5(4,5,1) models gave a lower SD and RMSE and a higher correlation coefficient than the other studied models. Therefore, comparison of the findings shows that the MLP5(4,5,1) models were the most accurate in the

Overall, the study suggests using data-driven methods, especially MLP, when appropriate laboratories for isotope analysis are lacking or the cost of analysis is high.
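The quantities summarized by a Taylor diagram, together with the RMSE and MAE indices used in the model comparison, can be computed as in the sketch below. It is illustrative only: the function name and the sample values are hypothetical and are not data from the study, and it assumes population standard deviations (division by n). A Taylor diagram conventionally plots the centered RMS difference, which is related to the two standard deviations and the correlation coefficient R by E'² = σ_obs² + σ_pred² − 2·σ_obs·σ_pred·R.

```python
import math


def taylor_statistics(observed, predicted):
    """Return SDs, correlation, centered RMS difference, RMSE, and MAE."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population standard deviations (divide by n); an assumption of this sketch.
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    if sd_o == 0.0 or sd_p == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    corr = cov / (sd_o * sd_p)
    # Centered RMS difference: the means are removed before differencing.
    crmse = math.sqrt(sum(((p - mean_p) - (o - mean_o)) ** 2
                          for o, p in zip(observed, predicted)) / n)
    rmse = math.sqrt(sum((o - p) ** 2
                         for o, p in zip(observed, predicted)) / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return {"sd_obs": sd_o, "sd_pred": sd_p, "corr": corr,
            "crmse": crmse, "rmse": rmse, "mae": mae}


# Hypothetical values for illustration only (NOT measurements from the study).
observed = [-9.1, -8.7, -8.2, -7.9, -7.5]
predicted = [-9.0, -8.8, -8.0, -8.0, -7.4]

stats = taylor_statistics(observed, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

On the diagram, a model plots closer to the observation point when its correlation is high, its centered RMS difference is small, and its standard deviation is close to that of the measurements.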

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data is available in the paper.

Figure 1 Study area and groundwater sampling sites
Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.

Dendrogram showing the clustering of some parameters of groundwater in Bafra plain

The measured and estimated δ¹⁸O values by the optimal models for MLP, ANFIS, RBNN, GRNN, SVM, GPR, CART, and MLR

Taylor diagrams for evaluating the δ¹⁸O and δD estimation models