Assessment and Prediction of Groundwater using Geospatial and ANN Modeling

In semi-arid regions, deteriorating groundwater quality and falling water levels underscore the importance of water resource management for drinking and irrigation. Geospatial techniques can therefore be integrated with mathematical models for accurate spatiotemporal mapping of groundwater risk areas at the village level. In the present study, changes in water level, quality patterns, and future trends were analyzed using eight years (2012–2019) of groundwater data for 171 villages of the Phagi tehsil, Jaipur district. The kriging interpolation method was used to draw spatial maps for the pre-monsoon season. These datasets were integrated with three time series forecasting models (Simple Exponential Smoothing, Holt's Trend Method, ARIMA) and Artificial Neural Network (ANN) models to predict groundwater level and quality parameters. Results reveal that the ANN model describes groundwater level and quality parameters more accurately than the time series forecasting models. A rise in groundwater level of more than 4.0 m was observed in 81 villages during 2012–2013, whereas the ANN results for 2023–2024 predict no rise in water level greater than 4.0 m. Moreover, based on the predicted results for 2024, the water level will drop by more than 6.0 m in 16 villages of Phagi. Assessment of the water quality index indicates that groundwater will be unfit for human consumption in 74% of villages in 2024. These time series and projections of groundwater level and quality at the micro-level can assist decision-makers in sustainable groundwater management.


Introduction
In arid and semi-arid environments, where rainfall is scanty and highly variable and evaporation rates are very high, groundwater is the vital local source for drinking, agricultural, industrial and domestic uses. Groundwater is generally considered better than surface water because of its higher quality, lower evapotranspiration losses and lower susceptibility to contamination (Chenini and Ben 2010; Kumar et al. 2016). In the last few decades, the availability of groundwater in the Rajasthan state of India has come under greater risk due to growing population, urbanization, and large withdrawals of groundwater for crop production. Over-exploitation of groundwater resources and drought events have caused a severe drop in the water table of Rajasthan. An acute water crisis can be observed in many parts of Rajasthan, especially during the summer months. Variations in the groundwater level reflect the impact of climatic conditions, groundwater consumption, water storage and other human activities (Minville et al. 2010; Ghazavi et al. 2012); hence groundwater level fluctuation is an important indicator of the ecology and hydrology of an arid region (Jolly et al. 2008).
The groundwater quality of a region worsens due to geochemical reactions in the aquifers/soils during its transportation through canals/drainages, and due to excessive use of chemical fertilizers and pesticides (Singh et al. 2013; Rawat et al. 2018). The use of such contaminated water may lead to a variety of water-borne diseases; hence periodic monitoring and assessment of groundwater resources is necessary to determine the impact of human activities on groundwater deterioration.
In areas with a shortage of surface water bodies, the development of management strategies for groundwater resources has emerged as an area of great concern in recent decades.
Geospatial techniques can be a powerful tool for solving groundwater resource problems such as determining water availability, tracking the rise and drop in water level, assessing water quality, monitoring, modeling, estimating contaminant concentrations in locations that lack measurement data (Gharbia et al. 2016) and using groundwater resources efficiently at local or regional scale (Srivastava et al. 2012; Verma et al. 2017). In addition, detailed spatial and temporal trend patterns of groundwater quality and quantity over a given period, derived using geospatial techniques, can support reliable, efficient, cost-effective and sustainable management measures. With the availability of soft computing techniques, different models can be adopted to understand groundwater scenarios for both quantity and quality (Nas and Berktay 2010; Charulatha et al. 2017).
A few studies (Yesilnacar et al. 2008; Mohanty et al. 2010; Sunayana et al. 2019) suggest that neural network models can help predict groundwater quality and groundwater level well in advance, giving an understanding of future scenarios. Artificial neural network (ANN) models have also been applied to water quality problems by several researchers (Singh et al. 2009; Ay and Kisi 2012; Csábrági et al. 2017). The present study is confined to a semi-arid rural region (Phagi tehsil) of Jaipur district in Rajasthan state. In Phagi tehsil most of the population depends on groundwater for domestic and agricultural needs. The groundwater of the study area is under stress from severe pollution (Singh et al. 2012; Sharma et al. 2015); hence the deterioration in groundwater quality and the drop in water level underline the importance of spatio-temporal mapping with geospatial interpolation techniques.
Geospatial techniques can evaluate the groundwater risk areas of a region even when only a limited number of sample points (measured data) are available. It is therefore imperative to understand the water quality trend patterns, changes in groundwater level and future scenarios of groundwater quality and quantity in Phagi tehsil. The aims of this research are: 1) to ascertain the spatial and temporal variations in the groundwater level and groundwater quality of Phagi tehsil during the last 8 years (2012 to 2019) by developing spatial interpolation maps; 2) to model water level and important water quality parameters using time series forecasting and ANN models, and to identify the optimum model through validation against historic data; 3) to predict the groundwater level changes and groundwater quality during 2019 to 2024 using the best forecasting model identified, in order to understand the variations in space and time at village level.

Study Area
Phagi tehsil is located in the Jaipur district of Rajasthan state. It covers geographical area of  Celsius. More than 90% of the annual rainfall is received during the monsoon season. The annual average rainfall in Phagi tehsil during the period 2005 to 2019 was 539.89 mm (Fig. 2a). The depth to groundwater varies from 3 to 45 m below ground level (bgl) in the study area. Groundwater occurs under water table conditions at shallow depth and under semi-confined conditions at greater depth (CGWB 2017).

Data Collection
Pre-monsoon data on groundwater level and groundwater quality for 2012 to 2019 were collected from the State Ground Water Department (SGWD), Jaipur, for different sampling locations (Fig. 2b) within the Phagi tehsil. All water level sampling locations were dug wells and piezometric wells, whereas water quality samples were collected from hand pumps, bore wells and wells. The sampling locations were plotted in a GIS environment using World Geodetic System (WGS) 1984 as the datum and Universal Transverse Mercator (UTM) Zone 43 North as the projection system. Tables 1 and 2 show the groundwater level and groundwater quality variations, respectively, in Phagi tehsil during the study period. Base maps such as district, tehsil, panchayat and village boundaries, along with the main rivers and hydrogeology of the study area, were generated using ArcGIS. In the present study, the spatial analyst tool was applied to analyze spatial and temporal trends of groundwater level and groundwater quality, in order to obtain a better picture of the behavior of the aquifer system over a long period for 171 villages of Phagi tehsil. Multiple years of pre-monsoon data were used because groundwater quality is normally worst during this period.
The ordinary kriging technique was applied to describe and model spatial patterns, predict values at unmeasured locations, and assess the uncertainty associated with a predicted value. Kriging can be seen as an unbiased point interpolation method, which requires a point map as input and returns a raster map of estimates. Kriging is known to be an exact estimator because observation points are correctly re-estimated (Marsily 1986; Journel 1989). The advantage of kriging over the inverse distance weighted (IDW) method is that it provides a measure of the probable error associated with the estimates. The estimated or predicted value ($Z$) is a linear combination of the known input point values ($z_i$) and has a minimum estimation error. Thus,
$$Z = \sum_{i} W_i z_i$$
where $W_i$ = weight factors.
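As an illustration of the weighted linear combination described above, the following is a minimal sketch of ordinary kriging at a single target location, using only the Python standard library. The exponential variogram model, its `sill` and `rng` parameters, the sample coordinates and the depth values are all assumptions made for the example; they are not the variogram or data used in this study, which relied on GIS software.

```python
import math

def exp_variogram(h, sill=1.0, rng=10.0):
    """Exponential semivariogram with zero nugget (an assumed model)."""
    return sill * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target, variogram=exp_variogram):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Kriging system: semivariances between data points, bordered by the
    # unbiasedness constraint (weights sum to 1) and a Lagrange multiplier.
    a = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * zi for wi, zi in zip(w, values))   # Z = sum(W_i * z_i)
    variance = sum(wi * gi for wi, gi in zip(w, b[:n])) + mu
    return estimate, variance, w

# Hypothetical wells (x, y in km) and depth to water (m bgl).
wells = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
depths = [12.0, 15.0, 9.0, 20.0]
print(ordinary_kriging(wells, depths, (5.0, 5.0)))
```

Because the assumed variogram has no nugget, evaluating the sketch at one of the well locations returns the measured value with zero variance, which is the "exact estimator" property mentioned above. The returned variance is the error measure that IDW does not provide.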
If the value of an output pixel depends on only three input points, then
$$Z = W_1 z_1 + W_2 z_2 + W_3 z_3$$
The input points that should be taken into account in the calculation of an output pixel value are selected by specifying a limiting distance and a minimum and maximum number of points (Nayak et al. 2015).
The trend from 2012 to 2019 shows that the water table in the northern and eastern parts of Phagi tehsil is deeper than in the other parts of the study area. The groundwater quality data were also interpolated. The zonal statistics tool summarizes the values of the input value raster that intersect or fall within each zone of a specified input zone dataset; a single output value is computed for every zone in the input zone dataset. In the present study, the zonal statistics tool was used to derive groundwater quality and groundwater level at village level for Phagi tehsil from the interpolated raster layers. Fig. 4 shows the temporal changes (2012 to 2019) in selected water quality parameters at village level for the pre-monsoon season.
In the SES method, forecasts are produced using weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, higher weights are given to more recent observations and lower weights to older ones. In the basic formulation (Brown 1956), the raw data sequence is represented by 'y' beginning at time t = 0, and the output of the exponential smoothing algorithm is written as '$S_t$', which may be regarded as a best estimate of what the next value of 'y' will be. The equation is expressed as:
$$S_t = \alpha y_t + (1 - \alpha) S_{t-1}$$
where α = the smoothing constant, which varies from 0 to 1, and t = time period. When α is close to zero, the smoothed series responds slowly to new observations.
The HTM method, by contrast, considers the trend component while generating forecasts. This method involves two smoothing equations, one for the level and another for the trend component. Holt (1957) extended simple exponential smoothing to allow the forecasting of data with a trend.
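Forecast equation:
$$\hat{y}_{t+h|t} = \ell_t + h b_t \quad (4)$$
Level equation:
$$\ell_t = \alpha y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1})$$
Trend equation:
$$b_t = \beta^{*}(\ell_t - \ell_{t-1}) + (1 - \beta^{*}) b_{t-1}$$
where $\ell_t$ is an estimate of the level of the series at time t, $b_t$ is an estimate of the trend (slope) of the series at time t, α is the smoothing parameter for the level, 0 ≤ α ≤ 1, and β* is the smoothing parameter for the trend, 0 ≤ β* ≤ 1. The level equation shows that $\ell_t$ is a weighted average of the observation $y_t$ and the one-step-ahead forecast for time t, $\ell_{t-1} + b_{t-1}$.
ARIMA modeling is one of the most popular approaches to time series forecasting.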
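The two exponential smoothing recursions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the initial states (the first observation for the level, the first difference for the trend), the smoothing parameters and the sample series are hypothetical choices, not the values fitted in this study.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the smoothed series S_t, starting from S_0 = y_0 (assumed)."""
    s = [y[0]]
    for obs in y[1:]:
        # S_t = alpha * y_t + (1 - alpha) * S_{t-1}
        s.append(alpha * obs + (1 - alpha) * s[-1])
    return s

def holt_forecast(y, alpha, beta, horizon):
    """Holt's linear trend method; returns forecasts for h = 1..horizon."""
    # Assumed initial state: level = first value, trend = first difference.
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast equation: y_hat(t+h | t) = level_t + h * trend_t
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical pre-monsoon depth to water (m bgl) for eight years.
depth = [18.2, 18.9, 19.5, 20.4, 20.8, 21.9, 22.3, 23.1]
print(simple_exponential_smoothing(depth, alpha=0.6)[-1])  # SES next-step estimate
print(holt_forecast(depth, alpha=0.6, beta=0.3, horizon=5))
```

The last smoothed value serves as the SES forecast for every future step, so SES cannot follow a steadily deepening water table; the Holt forecasts extend the estimated slope instead.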
While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data. An ARIMA model explains a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the resulting equation can be used to forecast future values. The model is denoted ARIMA(p,d,q), where 'p' is the order of the autoregressive process, 'd' is the order of differencing needed to make the data stationary, and 'q' is the order of the moving average process. The general form of ARIMA(p,d,q) can be written as described by Judge et al. (1988).
$$\Delta^d Z_t = C + (\phi_1 \Delta^d Z_{t-1} + \dots + \phi_p \Delta^d Z_{t-p}) - (\theta_1 e_{t-1} + \dots + \theta_q e_{t-q}) + e_t$$
where $\Delta^d$ denotes differencing of order d, i.e., $\Delta Z_t = Z_t - Z_{t-1}$, $\Delta^2 Z_t = \Delta Z_t - \Delta Z_{t-1}$ and so on; $Z_{t-1}, \dots, Z_{t-p}$ are past observations (lags); 'C' is a constant; and $\phi_1, \dots, \phi_p$ are the coefficients to be estimated by the autoregressive model. The autoregressive model of order 'p' is denoted AR(p) and is written as
$$Z_t = C + \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} + e_t$$
where $e_t$ is a random variable with zero mean and constant variance. In the moving average (MA) model, the coefficients $\theta_1, \dots, \theta_q$ need to be estimated. The MA model of order q can be written as:
$$Z_t = C - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \dots - \theta_q e_{t-q} + e_t$$
The degree of homogeneity (d) was determined on the basis of the autocorrelation function (ACF), by differencing until the ACF moved toward zero. After determining 'd', one needs to examine the stationary series $\Delta^d Z_t$, along with its autocorrelation function and partial autocorrelation function, to determine the values of p and q.
A large number of factors affect water level and water quality, and these usually show non-linear relations with the variables, so traditional data processing methods are not good enough (Xiang et al. 2006). The ANN approach therefore has advantages over semi-empirical models, since it requires only a known input data set, without any assumptions, and develops a mapping between the input and output variables, which can subsequently be used to predict the desired output as a function of suitable inputs (Schalkoff 1992; Gardner and Dorling 1998). A Multilayer Perceptron Neural Network (MLPNN) can approximate any smooth, measurable function between input and output vectors by selecting a suitable set of connecting weights and transfer functions (Singh et al. 2009). The ANN model has three or more layers: the input layer, where the data are introduced to the model and the weighted sum of the input is computed; the hidden layer or layers, where data are processed; and the output layer, where the results of the ANN are produced. Each layer consists of one or more basic elements called neurons or nodes. The signal passing through a node is modified by weights and transfer functions.
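To make the layer-by-layer description concrete, here is a minimal sketch of a forward pass through a multilayer perceptron with one hidden layer. The sigmoid transfer function, the linear output node, the network size and all weight values are assumptions chosen for illustration; they are not the architecture or trained weights of the models in this study.

```python
import math

def sigmoid(x):
    """Logistic transfer function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus bias, then transfer function."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Pass the input vector through the hidden layer, then a linear output layer."""
    hidden = layer(x, hidden_w, hidden_b, sigmoid)
    return layer(hidden, out_w, out_b, lambda v: v)

# Hypothetical network: 3 inputs (e.g. scaled values from previous years),
# 2 hidden nodes, 1 output node. Weights are arbitrary placeholders.
hidden_w = [[0.4, -0.2, 0.1], [0.3, 0.5, -0.6]]
hidden_b = [0.0, 0.1]
out_w = [[0.7, -0.3]]
out_b = [0.05]
print(mlp_forward([0.2, 0.5, 0.9], hidden_w, hidden_b, out_w, out_b))
```

Training consists of adjusting the weights and biases so that the outputs match the observed values; the sketch only shows how a signal is transformed once the weights are fixed.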
Each node in the input and inner layers receives input values, processes them, and passes the result to the next layer. This process is repeated until the output layer is reached (Govindaraju 2000). The number of neurons in the input, hidden and output layers depends on the problem. If the number of hidden neurons is small, the network may not have sufficient degrees of freedom to learn the process correctly; if the number is too high, training will take longer and the network may over-fit the data (Karunanithi et al. 1994). The algorithms as implemented in the neuralnet package for the R language were used to develop different neural network models. Fig. 5 shows the typical neural network architecture.
To determine the optimum split of data between training and validation, models with 55% to 85% randomly chosen training data were used, and the k-fold cross-validation method was used to validate the model. For each training percentage from 55% to 85%, in increments of 1%, the total dataset was partitioned into 100 randomly allotted training and testing samples and an ANN model was developed for each such set. This ensures that every data point gets a chance to be in both the test set and the training set; the method thus reduces the dependence of performance on the test-training split and reduces the variance of the performance metrics. The mean, maximum, minimum and standard deviation of $r^2$ over all 100 models were determined for each training percentage and plotted as shown in Fig. 6. It can be seen that beyond 70% the standard deviation rises continuously, indicating over-training. Therefore 70% was used as the optimum training percentage.
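The repeated random partitioning described above can be sketched as follows. This is a hedged illustration, not the study's procedure in code: a least-squares line stands in for the ANN so that the example stays self-contained, $r^2$ is computed as the coefficient of determination on the test set (an assumption about the metric), and the data, the `seed` and the function names are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for training an ANN."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def r_squared(obs, pred):
    """Coefficient of determination on held-out data."""
    mean = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def split_study(xs, ys, fractions, repeats=100, seed=0):
    """For each training fraction, summarise test-set r2 over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    summary = {}
    for frac in fractions:
        n_train = round(frac * len(idx))
        scores = []
        for _ in range(repeats):
            rng.shuffle(idx)  # new random partition for every repeat
            train, test = idx[:n_train], idx[n_train:]
            model = fit_line([xs[i] for i in train], [ys[i] for i in train])
            scores.append(r_squared([ys[i] for i in test],
                                    [model(xs[i]) for i in test]))
        summary[frac] = {"mean": statistics.fmean(scores),
                         "min": min(scores), "max": max(scores),
                         "sd": statistics.stdev(scores)}
    return summary

# Hypothetical noisy data; training fractions 55% to 85% in 1% steps.
noise = random.Random(1)
xs = [float(i) for i in range(60)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 5.0) for x in xs]
result = split_study(xs, ys, [p / 100 for p in range(55, 86)])
print(result[0.7])
```

Plotting the `sd` entries against the training fraction gives a curve of the kind shown in Fig. 6, from which a training percentage can be chosen.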

Model Performance evaluation
The performance of the applied models can be assessed by several statistical error measures.
The root mean square error (RMSE), mean absolute error (MAE) and Nash-Sutcliffe coefficient of efficiency (NSE) were used to indicate the goodness of fit between the observed and predicted values. These error measures are expressed as follows:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - y_i')^2}$$
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - y_i'\right|$$
$$NSE = 1 - \frac{\sum_{i=1}^{N}(y_i - y_i')^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
where N = total number of observations in the data set, $y_i$ and $y_i'$ are the observed and predicted values, and $\bar{y}$ is the mean of the observed values. NSE determines the relative magnitude of the residual variance ("noise") compared to the measured data variance ("information") (Nash and Sutcliffe 1970). NSE indicates how well the plot of observed versus simulated data fits the 1:1 line and ranges from negative infinity to 1. The model is deemed perfect when NSE is greater than 0.75, satisfactory when NSE is between 0.36 and 0.75, and unsatisfactory when NSE is smaller than 0.36 (Krause et al. 2005). Another important indicator is percent bias.
To select the optimum model out of multiple models, an important criterion is the Akaike information criterion (AIC). AIC estimates the prediction error and therefore evaluates the relative quality of different models for the same set of data. Thus, the AIC value can be used to select the optimum model out of multiple models. AIC is defined as (Poeter and Hill 2007; Zhou and Herath 2017)
$$AIC = n \ln(\sigma^2) + 2k \quad (13)$$
with
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} w_i (y_i - y_i')^2$$
where n is the number of observations, k is the number of model parameters, $w_i$ is the weight for the i-th observation, and $y_i$ and $y_i'$ are the measured and model-calculated values of the observed parameter, respectively. AIC values were used to find the optimum ANN models out of the multiple ANN models developed with variations in input, structure and algorithm.
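The error measures named above can be computed with a few short functions. This sketch assumes the standard definitions of RMSE, MAE and NSE and the common least-squares form of AIC, n ln(σ²) + 2k, with σ² taken as the weighted mean squared residual; the function names and sample numbers are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - residual variance / observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def aic(obs, pred, k, weights=None):
    """AIC = n ln(sigma^2) + 2k, sigma^2 = weighted mean squared residual."""
    n = len(obs)
    w = weights if weights is not None else [1.0] * n
    sigma2 = sum(wi * (o - p) ** 2 for wi, o, p in zip(w, obs, pred)) / n
    return n * math.log(sigma2) + 2 * k

# Hypothetical observed vs. simulated water levels (m bgl).
observed = [18.2, 19.0, 20.1, 21.5, 22.4]
simulated = [18.0, 19.4, 19.8, 21.9, 22.1]
print(rmse(observed, simulated), mae(observed, simulated), nse(observed, simulated))
print(aic(observed, simulated, k=3))
```

When two candidate models are fitted to the same data, the one with the lower AIC is preferred; the `2k` term penalises the model with more parameters.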

Water Quality Index (WQI) estimation
The Water Quality Index (WQI) is a very useful and efficient method for assessing the overall quality of water and for evaluating the suitability of groundwater for drinking purposes (Abassi 1999; Asadi et al. 2007). For computing the WQI, each of the four parameters was assigned a weight ($w_i$) according to its relative importance in the overall quality of water for drinking purposes (Ramakrishnaiah et al. 2009), and the relative weight ($W_i$) was computed from the following equation:
$$W_i = \frac{w_i}{\sum_{j=1}^{n} w_j}$$
where $W_i$ = relative weight, $w_i$ = weight of each parameter, and n = number of parameters.
The quality rating scale ($q_i$) is then calculated for each parameter using the following equation:
$$q_i = \frac{C_i}{S_i} \times 100$$
where $q_i$ = quality rating for the i-th parameter, $C_i$ = concentration of each chemical parameter in each water sample in mg/l, and $S_i$ = drinking water standard for each chemical parameter in mg/l according to the guidelines of the World Health Organisation (WHO 2017). SI is first determined for each chemical parameter, and is then used to determine the WQI as per the following equations:
$$SI_i = W_i \, q_i$$
$$WQI = \sum_{i=1}^{n} SI_i$$
The results (Fig. 8) suggest that ANN15, judged by the spatial distribution of observed and simulated groundwater level values, is capable of closely fitting the hidden relationships in the time series datasets. Therefore, ANN15 was used to forecast the groundwater level in the study area.
It is evident from the spatial distribution (Fig. 8) that depth to water level varied up to 11 m bgl in Karwa (23.37 m bgl).

Water Level Scenario (2012-2024)
In the present study, the rise in the water table was evaluated for 2012 to 2024 by comparing the data of two years at a time. The spatial and temporal changes in water level rise/fall are shown in Fig. 9; villages with a drop in water level include Daloowala (-9.4 m) and Chandawas (-9.69 m). It is clearly evident that until 2024 the groundwater level will drop more and more quickly, as not a single village in Phagi tehsil shows a rise of more than 4 m.

Groundwater quality prediction using ANN model
The spatial distribution (Fig. 10a) (2017) for drinking purposes. Groundwater with 500-1000 mg/l TDS is permissible for drinking and groundwater with TDS <3000 mg/l is useful for irrigation; however, groundwater with >3000 mg/l TDS is unfit for both drinking and irrigation (Davis and De Wiest 1966). The results for fluoride concentration (Fig. 10c) reveal that fluoride ranged from 2.09 to 3.15 mg/l in Phagi tehsil in 2019. The permissible limit for fluoride content is 1-1.5 mg/l (WHO 2017), and the spatial distribution indicates that the entire Phagi tehsil is affected by high concentrations of fluoride. The fluoride in groundwater is mainly attributed to leaching from fluoride-rich rocks, the semi-arid climate, long-term irrigation processes and the long residence time of groundwater (Srinivasamoorthy et al. 2008). Fluoride above 1.2 mg/l results in dental fluorosis, while fluoride above 2 mg/l initiates mottling (Singh et al. 2012). The observed data (2019) reveal more than 3.0 mg/l fluoride in Khera Hanumanji (3.15 mg/l), Jharana Khurd (3.05 mg/l), Madhorajpura (3.04 mg/l), Mandaliya Joga (3.07 mg/l) and Mohabbatpura (3.02 mg/l). However, the ANN-simulated data for 2019 reveal >3.0 mg/l fluoride concentration in 12 villages of Phagi tehsil (Fig. 7c). The ANN-predicted fluoride concentration for 2024 reveals that 47% of Phagi tehsil will be affected by >3.0 mg/l fluoride, especially in the eastern, central and northern villages of the study area. The adverse effects of consuming fluoridated water are inevitably experienced by villagers in Phagi tehsil; hence defluoridation of drinking water is required in the study area.
The groundwater data of Phagi tehsil show that Madanpura@katariya ka bas village had the maximum nitrate concentration (135.81 mg/l) while Barh Ramchandrapura had the minimum (11.9 mg/l) in 2019. The observed and ANN-simulated results (Fig. 10d) for 2019 reveal that nitrate content is higher than the maximum permissible limit recommended by WHO (WHO 2017) in the northern and eastern villages of Phagi tehsil. Excess nitrate is dangerous for infants below six months of age, and when the concentration of nitrate ion exceeds 45-50 mg/l, it causes methemoglobinemia in children (Sharma et al. 2015). The higher content of nitrate (>120 mg/l) will occur in Ladana (135.59 mg/l), Thala (133.76 mg/l), Chittora (133.05 mg/l), Bhojpura

Water Quality Index
The water quality index (WQI) was developed to evaluate the suitability of groundwater for drinking purposes. The spatial and temporal changes (Fig. 11) for the year 2024 reveal that 74% of the villages of Phagi tehsil will have very poor groundwater quality, with WQI ranging between 200 and 300, and the water cannot be used for domestic purposes.
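For readers who want to see how index values of this kind are obtained, the weighted-arithmetic WQI calculation described in the methodology can be sketched as follows. The parameter weights, the standards in the `standards` dictionary and the sample concentrations are illustrative placeholders, not the values used in this study. Only the 200-300 "very poor" class is taken from the text; the other class limits follow a commonly used scheme and are an assumption.

```python
def water_quality_index(concentrations, standards, weights):
    """WQI = sum_i W_i * q_i, with W_i = w_i / sum(w) and q_i = 100 * C_i / S_i."""
    total_weight = sum(weights.values())
    wqi = 0.0
    for name, conc in concentrations.items():
        relative_weight = weights[name] / total_weight   # W_i
        quality_rating = 100.0 * conc / standards[name]  # q_i
        wqi += relative_weight * quality_rating          # SI_i
    return wqi

def classify(wqi):
    """Assumed class limits; only 200-300 'very poor' is stated in the text."""
    if wqi < 50:
        return "excellent"
    if wqi < 100:
        return "good"
    if wqi < 200:
        return "poor"
    if wqi <= 300:
        return "very poor"
    return "unsuitable"

# Placeholder standards (mg/l) and weights for four parameters.
standards = {"TDS": 500.0, "chloride": 250.0, "fluoride": 1.5, "nitrate": 50.0}
weights = {"TDS": 4.0, "chloride": 3.0, "fluoride": 5.0, "nitrate": 5.0}
# Hypothetical sample from one village (mg/l).
sample = {"TDS": 1400.0, "chloride": 420.0, "fluoride": 3.1, "nitrate": 95.0}

index = water_quality_index(sample, standards, weights)
print(round(index, 1), classify(index))
```

A sample whose concentrations all equal their standards scores exactly 100, so values well above 100 indicate that one or more parameters exceed the drinking water limits by a wide margin.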

Conclusions
In general, the spatial and temporal results show that Phagi tehsil faces a declining groundwater table along with deteriorating groundwater quality, which poses a threat to public health. The villages at groundwater risk were identified using GIS and advanced modeling techniques for the pre-monsoon season. Different time series forecasting models and ANN models were compared to determine the optimum model for predicting future scenarios of groundwater quality and quantity in 171 villages of Phagi tehsil. The groundwater table results show a decline of >6.0 m, especially in the northern villages of Phagi tehsil, during the study period (2012–2024), which clearly indicates excess utilization of groundwater for irrigation and domestic purposes. Hence, it is suggested that groundwater pumping be reduced and that a groundwater abstraction policy be implemented in the high-risk villages of the study area. In addition, percolation tanks or farm ponds may be constructed to increase the natural recharge of rain water during the monsoon period. Groundwater quality parameters such as TDS, chloride and fluoride indicated that groundwater is not suitable for drinking in most of the villages of Phagi tehsil. The observed and ANN-simulated results for 2019 reveal high nitrate content in the north-eastern villages, and the ANN-predicted spatial distribution of nitrate for 2024 indicates that 58% of the villages of Phagi tehsil will exceed the maximum permissible limit for drinking. The WQI indicates very poor to unsuitable groundwater quality in 74% of the villages of the study area in 2024, meaning the water cannot be used for domestic purposes. Thus, the spatio-temporal maps point to the necessity of groundwater management with people's participation for more effective implementation of a mitigation strategy in Phagi tehsil at village level.