Spatial prediction of groundwater potentiality mapping using machine learning algorithms

Machine learning techniques offer powerful tools for the assessment and management of groundwater resources. Here, we evaluated the groundwater potential maps (GWPMs) of Md. Bazar Block of Birbhum District, India, using four GIS-based machine-learning algorithms (MLAs): predictive neural network (PNN), decision tree (DT), Naïve Bayes classifier (NBC), and random forest (RF). We used a database of 85 dug wells and one piezometer location identified through extensive field study, and employed 12 influencing factors (elevation, slope, drainage density (DD), topographical wetness index, geomorphology, lineament density, rainfall, geology, pond density, land use/land cover (LULC), geology, and soil texture) for evaluation through GIS. The 85 dug well locations and one piezometer location were divided in a 70:30 ratio into training and validation sets. The DT, RF, PNN, and NBC MLAs were implemented to analyse the relationship between the dug well locations and groundwater influencing factors and to generate GWPMs. The results predict excellent groundwater potential areas (GPA) DT RF of 17.38%, 14.69%, 20.43%, and 13.97% of the study area, respectively. The prediction accuracy of each GWPM was determined using a receiver operating characteristic (ROC) curve. Using the 30% validation data set, accuracies of 80.1%, 78.30%, 75.20%, and 69.2% were obtained for the PNN, RF, DT, and NBC models, respectively. The ROC values show that the four implemented models provide satisfactory and suitable results for GWP mapping in this region. In addition, the well-known mean decrease Gini (MDG) from the RF MLA was used to determine the relative importance of the variables for groundwater potentiality assessment. The MDG revealed that drainage density, lineament density, geomorphology, pond density, elevation, and stream junction frequency were the most useful determinants of GWPM. Our approach to delineating the GWPM can aid in the effective planning and management of groundwater resources in this region.
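The validation procedure summarised above (a 70:30 split followed by ROC analysis) can be illustrated with a minimal sketch. It assumes that the reported accuracies correspond to the area under the ROC curve and that each validation location carries a binary label (1 for a well location, 0 for a non-well location); neither detail is confirmed by the text here. The function names `split_70_30` and `roc_auc` and the placeholder location IDs are hypothetical.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle the samples and split them into 70% training and 30% validation."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a well location, 0 for a non-well location (assumed coding).
    scores: model-predicted groundwater potential for each location.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 86 placeholder IDs standing in for the 85 dug wells and one piezometer.
train, valid = split_70_30(range(86))
print(len(train), len(valid))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, so values such as 0.80 indicate that a model ranks a well location above a non-well location in about 80% of pairs.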


Groundwater scarcity and the drinking water crisis are among the severe challenges that the planet will face in the future. Groundwater is the most valuable but diminishing resource, and proper delineation and management strategies are required. According to the World Bank report of 2012, India is a highly groundwater-consuming country that uses approximately 230 km³ of groundwater every year, which is greater than one-fourth of the global total (The World Bank

The LULC is less susceptible to groundwater potentiality because it tends to initiate groundwater discharge (Balamurgan et al. 2016). A LULC map of this block was generated from Landsat 8 OLI imagery (9th April 2016) using a supervised classification method, and the results were confirmed with Cohen's kappa index, which gave a kappa value of 89.6%. The study area has seven LULC classes: residential area, water bodies, agricultural land, waste land, mining area, forest cover, and sand cover (Fig. 2b). The regional slope angle ranged from 0% to 7.25%. Groundwater

(Fig. 2i). The TWI is used to assess the influence of topography on hydrogeomorphic processes. It integrates the upstream area per unit width orthogonal to the flow path with the local slope. The TWI also helps to assess soil environment and topography influence

where α is the cumulative upslope area draining per unit width orthogonal to the flow direction and β is the inclination of the ground surface at the point (Fig. 2j).
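The TWI equation itself is not reproduced in the text above. The definitions of α and β match the commonly used form TWI = ln(α / tan β), and the following minimal sketch assumes that form. The function name `twi`, the use of degrees for the slope, and the small slope floor `min_slope_deg` (added to avoid division by zero on flat cells) are illustrative assumptions rather than the procedure used in this study.

```python
import math


def twi(alpha, beta_deg, min_slope_deg=0.01):
    """Topographical wetness index, assumed here to be ln(alpha / tan(beta)).

    alpha: cumulative upslope area per unit width (for example m^2 per m).
    beta_deg: local inclination of the ground surface, in degrees (assumed unit).
    min_slope_deg: hypothetical floor that keeps flat cells finite.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    beta = math.radians(max(beta_deg, min_slope_deg))
    return math.log(alpha / math.tan(beta))


# A gently sloping cell with a large contributing area scores higher
# than a steep cell with a small contributing area.
print(twi(5000.0, 1.0) > twi(50.0, 4.0))
```

Under this form, larger contributing areas and gentler slopes both raise the index, which is consistent with wetter, flatter terrain favouring water accumulation.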

Soil type is the most important predisposing factor for the assessment of the infiltration rate in any

The decision tree is a hierarchical model-based decision-support method. The DT model simplifies assumptions. The DT model is computationally advanced and can also handle data represented on various measurement scales (Pal and Mather 2003). In this study, a data-driven model, the DT, was adopted to obtain a more precise and reliable prediction of groundwater potentiality.

Random forest is a well-known machine learning algorithm for both regression and classification

where S is any groundwater prediction and K represents the separate trees in the algorithm.

The PNN is another type of neural network model which estimates the potentiality or susceptibility

The WQI is a rating of overall water quality that reflects the combined influence of individual water quality parameters. It is calculated with respect to suitability for human consumption. The World Health Organization (WHO) has proposed standard levels for drinking water that are used in WQI calculations.

The entire WQI methodology was conducted in three stages. In the initial stage, 11 parameters (EC, HCO3, pH, TH, Cl, Na, Ca, F, Mg, K, and SO4) were considered and assigned a weight (Wi) according to their degree of influence on human health (Nik and Pirohit, 2001) (Table 3). The
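The three-stage WQI calculation can be illustrated with a minimal sketch. Because the intermediate equations are not reproduced in this text, the sketch assumes the widely used weighted arithmetic form: a relative weight Wi = wi / Σwi, a quality rating qi = 100 × Ci / Si (measured concentration over the drinking water standard), a sub-index SIi = Wi × qi, and WQI = ΣSIi. The function name `wqi` and all sample values, limits, and weights below are hypothetical and are not taken from Table 3.

```python
def wqi(concentrations, standards, weights):
    """Water quality index as the sum of per-parameter sub-indices (assumed form)."""
    total_weight = sum(weights[name] for name in concentrations)
    index = 0.0
    for name, conc in concentrations.items():
        rel_weight = weights[name] / total_weight   # relative weight Wi
        rating = 100.0 * conc / standards[name]     # assumed quality rating qi
        index += rel_weight * rating                # sub-index SIi, accumulated
    return index


# Illustrative values only: two parameters, each at half of its assumed limit.
sample = {"Cl": 125.0, "F": 0.75}
limits = {"Cl": 250.0, "F": 1.5}
weights = {"Cl": 3, "F": 5}
print(wqi(sample, limits, weights))
```

In this form a sample with every parameter exactly at its standard scores 100, so lower values indicate better water quality for drinking purposes.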

In the last stage, the WQI was calculated using the sub-index of the ith parameter (SIi

The GWPMs were developed using the four well-accepted MLAs (Figs. 3 (a-c)). The groundwater zones make up 73.37% and 6.81% of the total area, respectively (

It is vital to visualise the relationships between the selection factors and dug well locations to evaluate the significance of each driver in the enhancement of groundwater recharge (Fig. 4).