Groundwater level prediction in Visakhapatnam district, Andhra Pradesh, India using Bayesian Neural Networks

Groundwater level and rainfall measurements from 37 borewells in the Visakhapatnam district, Andhra Pradesh, India, from 2002 to 2021 were analysed using Bayesian Neural Networks (BNN) to assess their predictability. We found chaotic dynamics in the groundwater and rainfall data, but phase plots revealed a dominant trend component in the groundwater. The dynamics suggest the presence of self-organized criticality/chaos in the groundwater changes over decadal time scales. We used three BNN prediction models, (i) Non-linear Autoregressive (NAR), (ii) Non-linear Input Output (NIO) and (iii) Non-linear Autoregressive with Exogenic Input (NARX), to predict the groundwater level changes with rainfall as an exogenic input. We obtained ~94 to 95% prediction accuracy with the NAR model with optimal inputs and ~1% improvement with the added exogenic input. Interestingly, the study indicates that (i) the dynamics of the groundwater differ significantly from those of rainfall and temperature in the region, (ii) the Non-linear Autoregressive model, chosen on the basis of the self-organized dynamics of groundwater level changes, is robust and provides prediction accuracy up to ~95%, and (iii) the remaining 5% may be due to extreme events, whose dynamics are closely related to random processes attributed to randomly varying man-made and weather changes.


Introduction
The state of self-organized criticality is commonly found in many geophysical data sets. Groundwater variations, controlled by changes in precipitation, runoff and man-made exploitation, capture the characteristics of the local flow pattern and hydrogeology and show a self-organized critical behavior that may be useful for prediction through an understanding of past variability. The development of the human race and livelihood on the earth depends on the availability of water resources. Apart from surface water resources, groundwater is a prominent source of freshwater that serves domestic, agricultural, and industrial needs. A significant increase in water usage is thought to be the reason for the changes in the natural flow regimes and groundwater depletion (Alley et al.; Gleeson et al. 2012). Despite several forcing factors, a broad understanding of the natural and anthropogenic influences is useful in assessing its future state (Bonacci, 2004; Panda et al. 2007; Mathews et al. 2022). The underlying classification relies on the fact that natural variability is mainly due to natural forces like rainfall and runoff, whereas groundwater pumping, land excavation, barrier construction, etc. are considered anthropogenic influences. Although global warming is believed to be the major factor changing the hydrological system over the globe, an in-depth understanding of the independent roles of natural and anthropogenic activities in the observed changes in the groundwater system is essential. In particular, information on the dynamics of the natural and anthropogenic forces present in terrestrial water storage is crucial for understanding, modeling and predicting groundwater variability. The demand for groundwater increases with population and urbanization, especially in coastal regions where industrial development has been rapid in the past few decades. Hence, continuous monitoring of groundwater trends from groundwater level data acquired over dense station networks, along with modeling using robust and optimal techniques, helps to plan the safe and sustainable usage of groundwater reserves.
Firstly, physics-based numerical models using the governing equations of groundwater flow help to provide site-specific hydrological estimates with the help of software packages like MODFLOW, FEFLOW, and HydroGeoSphere (HGS) (Trefry and Muffels 2007; Wang et al. 2008; Brunner and Simmons 2012). However, predicting changes in groundwater levels using such physics-based modeling is a non-linear problem, particularly when considering natural and anthropogenic changes at local scales, which requires voluminous station-specific data. Therefore, researchers developed time series analysis-based approaches like auto-regressive integrated moving averages (ARIMA) (Bidwell, 2005) and artificial neural networks (ANNs) (Barzegar et al. 2017) etc., to model complex groundwater problems. Although ANNs are not physics-based models, an optimal selection of the input data and of the mathematical model, taking into account the physics and dynamics of the process being modeled, helps to develop a logical model.
The objective of the study is to understand the nature of the dynamics in the groundwater time series and to develop a predictive model of groundwater levels in the Visakhapatnam district of Andhra Pradesh using groundwater level and rainfall data at 37 borehole sites and Nonlinear Autoregressive (NAR) and NAR with exogenic input (NARX) Neural Network models. Return maps of the groundwater level, rainfall and temperature data are used to understand and compare the dynamics of the processes.

Data:
Borehole measurements of groundwater level (GWL) along with in-situ rainfall data from the 37 sites shown in Fig. 1 are used in this study. The GWL and rainfall data were collected from the Andhra Pradesh Groundwater Board for the period 2002-2021. In addition, we used temperature data for Visakhapatnam from the TerraClimate high-resolution online repository, a global temperature data set with 0.25° × 0.25° spatial resolution and 30-day temporal resolution, for the period April 2002 to December 2021 (Abatzoglou et al. 2018). The Land Use Land Cover (LULC) time series at the 37 sites were extracted through Google Earth Engine from the Global Land Cover and Land Use Change, 2000-2020 data set (GLAD, umd.edu). This data set provides annual changes in forest extent and height, cropland, built-up lands, surface water, and perennial snow and ice extent from 2000 to 2020 at 30 m spatial resolution (Potapov et al. 2022). The annual data were interpolated to obtain monthly LULC data.
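The annual-to-monthly interpolation of the LULC series can be sketched as follows. The source does not state which interpolation scheme was used, so linear interpolation is an assumption here, as are the function name `annual_to_monthly` and the sample built-up fractions, which are illustrative numbers only.

```python
import numpy as np

def annual_to_monthly(years, values):
    """Linearly interpolate annual values onto a monthly time axis.

    Assumption: each annual value sits at the start of its year and the
    months in between are filled linearly.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n_months = int(round((years[-1] - years[0]) * 12)) + 1
    t_monthly = years[0] + np.arange(n_months) / 12.0
    return t_monthly, np.interp(t_monthly, years, values)

# Hypothetical built-up fraction at one site (not the study data)
years = [2000, 2001, 2002]
built_up = [0.10, 0.16, 0.16]
t, monthly = annual_to_monthly(years, built_up)
```

A series that barely changes from year to year, as reported for the study area, stays almost flat after interpolation, which is why treating it as a constant loses little information.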

Methods:
2.2.1. Phase Plots/Return Map: Phase plots/return maps help to identify the dynamics of the underlying forces acting on a system (Packard et al. 1980; Takens, 1981; Broomhead and King, 1986; Tiwari, 2005). The return map depicts the relationship x(n + 1) = f(x(n)), where x(n) and x(n + 1) are two states of the system separated by a unit sampling time. The study of dynamics aims to understand how systems change over time (Morrison, 1991). The analysis of dynamics therefore unveils the influence of various forces on the system behavior between the defined time boundaries and hence gives insights into its stability, orderliness and predictability (Barton, 1994; Tiwari, 2005).
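As a minimal sketch of how a return map is built, the snippet below pairs each sample x(n) with x(n + τ) for a chosen delay τ. The two synthetic series (a trend plus annual cycle, and white noise) are illustrative assumptions and not the study data; the lag correlation printed at the end is only a crude numerical summary of how tightly the points cluster, and the map itself is obtained by plotting `x_next` against `x_now`.

```python
import numpy as np

def return_map(x, tau=1):
    """Return the pairs (x(n), x(n + tau)) that make up a return map."""
    x = np.asarray(x, dtype=float)
    if tau < 1 or tau >= x.size:
        raise ValueError("tau must satisfy 1 <= tau < len(x)")
    return x[:-tau], x[tau:]

# Synthetic examples (not the study data): trend plus cycle versus noise
rng = np.random.default_rng(0)
n = np.arange(240)                      # 20 years of monthly samples
trend_cycle = 0.02 * n + np.sin(2 * np.pi * n / 12)
noise = rng.standard_normal(n.size)

for name, series in [("trend+cycle", trend_cycle), ("white noise", noise)]:
    for tau in (1, 2):
        x_now, x_next = return_map(series, tau)
        r = np.corrcoef(x_now, x_next)[0, 1]
        print(f"{name:12s} tau={tau}: lag correlation = {r:.2f}")
```

A series dominated by trend and cycles traces an elongated, structured cloud in this plane, while a purely random series fills it without structure.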

Neural Network Time Series Model Prediction:
Time series prediction is a type of dynamic filtering that uses past values of the time series to predict future values (Tiwari and Rajesh, 2021). Depending on the temporal range of the prediction, the methods are classified as short-term or long-range prediction. Unlike regression-based methods, artificial neural network (ANN) based prediction has no prerequisites on the dynamical nature (such as trends and seasonal variations) or the statistical distribution of the data, and hence is data-driven (Burke, 1991; Maier and Dandy, 1996). Dynamic neural networks use tapped delay lines for nonlinear filtering and prediction. We applied three basic models, (i) Non-linear Autoregressive (NAR), (ii) Non-linear Input Output (NIO) and (iii) Non-linear Autoregressive with Exogenic Input (NARX), to analyze the predictability of the groundwater level data using rainfall as an exogenic input. Although land use land cover (LULC) is a crucial parameter that impacts groundwater recharge, the analysis of LULC for the study region shows only minimal change during the study period (see additional information file). Therefore, we treated LULC as a constant in the modeling. The second important parameter that influences groundwater recharge is the type of reservoir; however, as long as decadal data sets are considered, the reservoir setting (rock and soil types and slope) can also be regarded as constant.
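The difference between the three arrangements lies in which delayed signals feed the network. In the usual formulation (an assumption here, since the source does not write the equations out), NAR predicts y(t) from y(t-1), ..., y(t-d), NIO predicts y(t) from x(t-1), ..., x(t-d), and NARX uses both, with y the groundwater level, x the rainfall and d the number of delay states. The sketch below builds the corresponding tapped-delay input matrices; the helper names `tapped_delay` and `make_inputs` and the toy numbers are hypothetical.

```python
import numpy as np

def tapped_delay(series, delays):
    """Stack series[t-1], ..., series[t-delays] as columns for t = delays..N-1."""
    s = np.asarray(series, dtype=float)
    n = s.size
    return np.column_stack([s[delays - k : n - k] for k in range(1, delays + 1)])

def make_inputs(gwl, rain, delays, model="NAR"):
    """Return (inputs, targets) for the NAR, NIO or NARX arrangement."""
    gwl = np.asarray(gwl, dtype=float)
    targets = gwl[delays:]
    past_gwl = tapped_delay(gwl, delays)
    past_rain = tapped_delay(rain, delays)
    if model == "NAR":      # past groundwater levels only
        inputs = past_gwl
    elif model == "NIO":    # past rainfall only
        inputs = past_rain
    elif model == "NARX":   # past groundwater levels plus past rainfall
        inputs = np.hstack([past_gwl, past_rain])
    else:
        raise ValueError(f"unknown model: {model}")
    return inputs, targets

# Toy series (illustrative only)
gwl = [1, 2, 3, 4, 5]
rain = [10, 20, 30, 40, 50]
X_nar, y = make_inputs(gwl, rain, delays=2, model="NAR")
X_narx, _ = make_inputs(gwl, rain, delays=2, model="NARX")
```

Any regressor, including a small feed-forward network, can then be fitted on `(inputs, targets)`; the regularised network training itself is not reproduced here.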

Dynamical Analysis of Water Level and Rainfall data:
We analyze the dynamics of the GWL, rainfall and temperature data sets using their return maps. The return maps of the three data sets at delays (τ) of 1 and 2 months are shown in Fig. 3. The three processes show different dynamics. The GWL data clearly show the dynamics of a linear trend with quasi-cyclic oscillations and a small random component. The rainfall data show chaotic dynamics with some degree of randomness. The dynamics of the temperature data indicate the presence of cyclic processes with some degree of chaotic nature. It is well known that temperature on local to global scales exhibits cyclic changes ranging from diurnal to millennial scales (Tiwari, 2005). The complicated dynamics of temperature may be caused by several cyclic components with anthropogenically induced phase and frequency variations. Comparison of the phase plots of the three data sets with the phase plots of chaotic, stochastic and random processes given by Tiwari (2005) suggests strong chaotic dynamics in the rainfall data, strong cyclic components in the temperature data, and a mixture of chaotic, cyclic and mild random behavior with trend components in the groundwater data. Since randomness in any dynamic process cannot be predicted by any kind of model, the uncertainty in the prediction of a process is always linked to the percentage of randomness in the data. We can therefore assess the randomness in the following ANN prediction analysis from the departure of the prediction accuracy from 100%.
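One way to read the "departure of prediction accuracy from 100%" is sketched below. It assumes that accuracy means the regression coefficient R between targets and network outputs expressed in percent; this is consistent with the R values and accuracy percentages reported in this paper but is not defined explicitly in the source. The function name and the synthetic signal are hypothetical.

```python
import numpy as np

def accuracy_and_residual(targets, outputs):
    """Return (accuracy %, unexplained %) using Pearson's R as the accuracy measure."""
    r = np.corrcoef(targets, outputs)[0, 1]
    accuracy = 100.0 * r
    return accuracy, 100.0 - accuracy

# Synthetic illustration: a smooth signal and a noisy copy of it
rng = np.random.default_rng(1)
targets = np.sin(np.linspace(0, 8 * np.pi, 240))
outputs = targets + 0.25 * rng.standard_normal(targets.size)
acc, unexplained = accuracy_and_residual(targets, outputs)
print(f"accuracy = {acc:.1f}%, unexplained = {unexplained:.1f}%")
```

Under this reading, the unexplained share is an upper bound on the random component, since model misfit also contributes to it.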

Modeling the Groundwater level data using the NAR model with Bayesian Regularisation:
In the first step, we use a nonlinear autoregressive model, i.e., a model in which the present state of a dynamical process depends on the past dynamics of the system, to predict the groundwater level variation. The data from 36 borewells for the study period were used for training and testing the network.
After successfully achieving the best training state with minimum mean squared error (MSE), the network was tested for its performance/response using the data from the 37th borewell. The data were randomly grouped into 70%, 15%, and 15% for training, testing, and validation, respectively, for all the models used in this work. Figure 4a shows the network model used in this example. The regression between the targets and outputs of the neural network during training and testing is shown in Fig. 4b. Both the training and testing phases showed groundwater level prediction accuracy of up to 94%. The training converged with an MSE of 2.24 at the 164th epoch, with the best model performance. The performance of the neural network during training and testing is shown in Fig. 4c. The comparison of targets and outputs (i.e., original and predicted groundwater levels) along with their difference (error) is shown in Fig. 5d. We can observe that the error or mismatch between the target and output is large when the groundwater level rises sharply over a short period. Such episodes possibly represent sudden changes in groundwater levels due to heavy precipitation during monsoons, cyclones, etc.
Further, the efficacy of the model was tested using data from Visakhapatnam Urban, which were not used in the training and testing of the neural network model. Interestingly, the prediction accuracy for the groundwater levels in this additional test is above 95%, with very low errors. In the densely populated Visakhapatnam Urban region, numerous industrial and construction projects are underway, so groundwater levels are highly influenced by anthropogenic activity, i.e., by the associated randomness. However, the predictability of the groundwater levels at this site with 95% accuracy indicates that the possible random/unknown influences are limited to about 5%, since a process with random dynamics cannot be predicted. The high predictability of the groundwater using the NAR network model suggests self-organized dynamics in the groundwater level variability. The self-organized nature of the groundwater variability may be linked with inherent self-organized dynamics or with a self-organized pattern in its controlling/influencing processes. Therefore, in the following section we refine the model with rainfall as an external input, as it influences the groundwater variability.

Modeling the Groundwater Level data using Non-linear Autoregressive Network with Rainfall data as exogenic input:
In addition to the above modeling approach, which captures the self-organized dynamics to predict future variability without any exogenic input, this section considers rainfall to analyze its impact on the groundwater predictive model for the study area. As the relationship between groundwater level and rainfall is universal, we trained the network with rainfall as an exogenic input and groundwater level data from 36 sites of the study region to develop an ANN model using Bayesian Regularisation with 5 neurons and a single delay state. Although we tried more neurons and delays, there was no noticeable improvement in the model. The network block diagram is shown in Fig. 6a. The training state reached its best performance at the 394th epoch. The training and testing regressions showed R values of ~0.94. The regression between the input and targets of the model during the training and testing of the network and the associated network performances are shown in Figs. 6b, 6c. Finally, the trained network was tested on the data from the 37th site (Visakhapatnam Urban). Figure 7 shows the response of the trained network to the inputs from the 37th site. We can observe that large errors occur during large jumps in the groundwater levels, which are possibly associated with extreme events like cyclones and associated floods. Overall, compared to the previously trained NAR model without any exogenic input, there is not much improvement in prediction accuracy with the model trained with rainfall as exogenic input.

Discussion
The groundwater patterns in the Visakhapatnam district follow a non-linear change due to the cumulative influence of climate-induced cyclic variations, monotonic trends and human-induced anthropogenic changes (Mathews et al. 2022). Among these three forcing factors, the cyclic variations and monotonic trends together form a system with predictable self-organized dynamics (Tiwari, 2005), and anthropogenic changes cause random fluctuations. Techniques based on Neural Networks/Artificial Intelligence have been adopted for modeling non-linear and dynamic systems such as water resources systems (Maier and Dandy 2000). The availability of voluminous data from observation wells and advances in computational facilities have made Artificial Intelligence/Neural Networks an important framework for the hydrological community for understanding the dynamics of groundwater (Tao et al. 2022). The feed-forward back propagation neural network trained with the Levenberg-Marquardt algorithm (FFNN-LMB) was found to be effective in predicting monthly groundwater levels (Sujatha and Kumar, 2010; Daliakopoulos et al. 2005; Chitsazan et al. 2013). Maiti and Tiwari (2014) used neural networks with scaled conjugate gradient, Bayesian Neural Networks (BNN) and Adaptive Neuro-Fuzzy Inference System (ANFIS) techniques for the predictive modeling of groundwater level changes. They used the data from individual boreholes for testing and found that the BNN technique was more robust than the other two in dealing with data prediction in the presence of noise. In the present study, we used the BNN-based time series prediction models Nonlinear Autoregressive (NAR) and NAR with exogenic input (NARX), as they are robust for comprehending complex, non-linear relationships and for modeling interaction effects, to develop a regional neural network prediction model for the Visakhapatnam district. We modeled the groundwater level data from 37 borehole measurements from the study area using the Nonlinear Autoregressive models with and without rainfall and temperature as exogenic inputs. Although land use and land cover (LULC) influences the groundwater recharge patterns, the LULC values in the study area showed almost negligible change during the study period. Therefore, we did not use LULC as an exogenic influencing factor in the modeling. Interestingly, our modeling suggests that the NAR model with 2 delay states is the best model to capture the dynamics for predicting groundwater variability in the study area. The NARX model using rainfall as an exogenic input to predict the groundwater level has no advantage over the NAR model. The regression R values of the NAR and NARX models are ~0.94. In an additional test, the RMS errors of the NAR model on the data from the Visakhapatnam Urban site, which was not used in training the network, are smaller than those of the NARX model. It can be argued that the NAR model is capable not only of capturing the dynamics from the time series but also of accounting for the inherent spatial changes in the hydrogeology that control the recharge patterns; otherwise, the prediction would not have been possible for the Visakhapatnam Urban region, as the data from this region were not used for training or testing. Although we used the rainfall and temperature data sets as exogenic inputs to the NARX model (Additional Information), the improvement in prediction is negligible (~1%). The simple nonlinear input-output model, with rainfall and temperature together or rainfall alone as inputs and groundwater levels as output, failed to give satisfactory results (see additional figures). The best result with this nonlinear input-output model suggests a prediction accuracy of less than 30%.

Conclusions
The monthly groundwater level changes along with rainfall data from 37 borehole sites and regional temperature time series from April 2002 to April 2021 in the Visakhapatnam district, Andhra Pradesh, India, have been analyzed in this study to understand the dynamics for possible predictions. The phase plot analysis of the data suggests a clear distinction between the dynamics of the GWL, rainfall and temperature data sets. Unlike rainfall and temperature, the GWL data show a clear monotonic trend with superimposed quasi-cyclic oscillations. The rainfall and temperature data sets show strong chaotic and periodic dynamics, respectively. We used NAR and NARX models with the Bayesian regularisation algorithm. The NAR model, which relies on the past states of groundwater levels, is able to predict the GWL changes with 94 to 95% accuracy. The NARX model, which uses the past states of GWL and rainfall, predicts the GWL changes with only ~95% accuracy. Thus, the improvement in prediction with rainfall as an additional input is negligible compared with the accuracy of the NAR model, which uses the self-organized dynamics of the GWL data from past states. The study concludes that (i) the Non-linear Autoregressive (NAR) model based on the self-organized dynamics of groundwater level changes is robust, providing prediction accuracy up to ~95%, and (ii) the remaining 5% of changes may be due to the random dynamics associated with extreme precipitation events and with man-made and short-term weather changes due to anthropogenic activities. Finally, the neural network developed using the NAR model with Bayesian regularization is robust for predicting monthly groundwater level changes and is useful for the future management of groundwater resources.

Figure 1. Map ...
Figure 2. The ...
Figure 3. Return ...
Figure 5. Testing ...