4.1 Data processing
There are broadly three approaches to GWL fluctuation prediction using spatio-temporal/time-series models: taking GWL itself as an input component, taking rainfall as an input component for GWL prediction, and direct and recursive prediction strategies using hydrological components (Yoon et al., 2016). From the GWL perspective, several hydrological parameters that actively influence the GW scenario have been considered in this study. For instance, GLDAS simulation products from NASA's archive have been used to estimate the soil moisture (SM) equivalent and surface water (SW) equivalent (Rodell et al. 2004). The SW equivalent represents the amount of water stored in surface water bodies (e.g., rivers, lakes) within the model's grid cells and is influenced by precipitation and other hydrological processes simulated by the LSMs. Integrating GLDAS soil moisture data into hydrological models helps in understanding how soil moisture influences processes such as runoff, evapotranspiration, and groundwater recharge. The northwestern part of the Indian subcontinent does not fall within the snowfall zone (except minor sections of the Himalayas), so the snow contribution is negligible compared with the other hydrological factors; therefore, the snow water equivalent has not been considered in this study (Bhanja & Mukherjee 2019). Surface runoff has been derived from GLDAS as an alternative to the SW equivalent, since direct measurements are not available for NW India (Asoka et al. 2017).
The Noah land surface model (LSM) has been used for the calculation of SM and SW (Koren et al. 1999). Noah LSM models the movement of water within the land surface, including precipitation, infiltration into the soil, runoff generation, groundwater recharge, and evapotranspiration. It calculates soil moisture at multiple soil layers, representing the vertical profile of soil properties, and considers processes such as infiltration, percolation, and capillary rise to estimate the soil moisture content. In addition to soil moisture, Noah LSM accounts for surface water processes, including the calculation of surface runoff, i.e., the water that flows over the land surface in response to excess precipitation and may contribute to streamflow.
Time-series anomalies (Δ) of the hydrological components were obtained by subtracting the all-time mean from each dataset. Space-borne GRACE gravity anomalies are used worldwide to compute TWS change, especially for regions with few or no observation wells. Seasonality in hydrological modelling refers to the variations in hydrological processes (e.g., precipitation and evapotranspiration) that occur in response to the changing seasons throughout the year. It is a crucial aspect to consider in hydrological modelling because it directly affects the availability of water resources and has significant implications for applications such as water resource management.
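As an illustration, the anomaly computation reduces to removing each pixel's long-term mean from its monthly series; a minimal sketch with placeholder data (the array name and sizes are assumptions, not the actual archive layout):

```python
import numpy as np

# Placeholder stack of monthly grids (156 months for 2005-2017), with
# shape (months, rows, cols); real data would come from the GLDAS archive.
sm = np.random.rand(156, 44, 64)

# Anomaly (delta) = monthly value minus the all-time mean of each pixel.
sm_anomaly = sm - sm.mean(axis=0, keepdims=True)
```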
All the aforementioned parameters were collected from various sources (see section 3) and contain cyclic data for the years 2005–2017. To maintain the temporal continuity of the datasets, all monthly data were collated for this duration and rescaled to a common spatial resolution. In general, a spatio-temporal model requires input features of homogeneous size and rectangular shape for training and testing. Therefore, as a first processing step, all images were masked from the raw data using the shapefile of the NWI region in ArcGIS 10.5. Thereafter, all images were resized to a quadrangular shape of 44 × 64 pixels. A rectangular input has several benefits: it preserves the spatial arrangement of features, makes operations such as pooling and downsampling straightforward, and is memory efficient. In addition, the setup is designed to predict the difference in GWL based on the different factors and cycles (w.r.t. the monsoon season). The input images were stacked w.r.t. cycle (see Fig. 4) and further restacked w.r.t. factor. Consequently, the resultant matrix (or feature) becomes 44 × 64 × k × z, where k and z are the number of cycles and factors affecting the end result, respectively.
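A minimal sketch of this stacking step, assuming the per-factor, per-cycle grids have already been masked and resized to 44 × 64 (the array name and the values of k and z are illustrative):

```python
import numpy as np

k, z = 12, 4                          # illustrative cycle and factor counts
grids = np.random.rand(z, k, 44, 64)  # placeholder masked, resized layers

# Stack w.r.t. cycle, then restack w.r.t. factor: 44 x 64 x k x z.
features = np.transpose(grids, (2, 3, 1, 0))
print(features.shape)                 # (44, 64, 12, 4)
```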
After arranging the data, the internal (or brightness) values need to be calibrated. The resultant matrices of each selected hydrological layer were used for further processing to best fit the deep learning model. The choice and amount of data are significant aspects of any deep learning method. Since the size of the training dataset is limited, the diversity of the dataset was increased through several transformations using data augmentation, including random flipping, colour enhancement, rotation, and zooming. The processing steps are summarised in Fig. 4.
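A shape-preserving augmentation sketch (flips and 180° rotation only; zooming and colour enhancement are omitted for brevity, and the function name is hypothetical):

```python
import numpy as np

def augment(img, rng):
    """Randomly flip and rotate a 44x64 grid without changing its shape."""
    if rng.random() < 0.5:
        img = np.flip(img, axis=0)  # random vertical flip
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)  # random horizontal flip
    if rng.random() < 0.5:
        img = np.rot90(img, k=2)    # 180-degree rotation keeps 44x64 intact
    return img

rng = np.random.default_rng(seed=0)
sample = np.random.rand(44, 64)
augmented = [augment(sample, rng) for _ in range(5)]  # 5x samples
```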
In a deep learning approach, the internal parameters (such as weights and biases) lie in the range [0, 1], whereas the input features span very different ranges (see Table 1). This mismatch between the input features and the internal parameters causes abnormal gradient behaviour during training and leads to poor model performance (Soni et al. 2020). To calibrate the values, min-max normalisation was applied using the following function:
$$\underline{x}=\frac{x-x_{min}}{x_{max}-x_{min}}$$
1
where \(x_{min}\) and \(x_{max}\) are the minimum and maximum values within the same feature vector, x is the pixel value in the feature map, and \(\underline{x}\) is the normalised pixel value.
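In code, Eq. (1) amounts to a per-feature rescaling; a minimal sketch with a placeholder feature map:

```python
import numpy as np

def min_max_normalise(x):
    """Eq. (1): rescale a feature map to [0, 1] using its own min and max."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

feature = np.random.rand(44, 64) * 200.0 - 50.0  # placeholder feature map
normalised = min_max_normalise(feature)
print(normalised.min(), normalised.max())        # 0.0 1.0
```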
4.2 Model parameters
In previous work, spatial and temporal information were handled separately by conventional approaches; even in deep learning models, they were addressed separately by convolutional neural networks (CNNs) and LSTMs. A model designed with separate modalities cannot jointly exploit the spatial and temporal properties while learning its parameters. A CNN is a neural network that deals primarily with image information: it extracts features from the input image by applying several operations (or layers) and builds correlations between spatial features at the low level (e.g., edges). Its layers include convolution and pooling layers that enhance the quality of the extracted features; however, it does not provide adequate operations to extract information from images that change over time. Temporal information is handled by other DL architectures such as the LSTM. The LSTM is a type of recurrent neural network (RNN) designed to overcome the limitations of traditional RNNs in capturing long-range dependencies in sequential (or temporal) data. At its core, the LSTM uses hidden cells (or layers) to remember the sequence and its patterns. Therefore, we address these issues for the prediction of groundwater level using a composite CNN-LSTM module known as ConvLSTM (Xingjian et al. 2015). Some challenges accompany this integration, such as vanishing gradients and overfitting, which were carefully addressed during parameter setting and pre-processing (see section 4.1).
ConvLSTM extracts feature information using the convolution operation in the input-to-state and state-to-state transitions (shown in Fig. 5). In the preliminary operation, the input feature should comprise five dimensions: sample size, row, column, timestep, and channel (i.e., S × 44 × 64 × k × z, where S is the sample size, k the number of cycles/timesteps, and z the number of factors/channels). A ConvLSTM layer produces an output of similar size to the input image. The internal structure of ConvLSTM is shown in Fig. 5, and its operations are expressed below:
$$i_t=\sigma \left(W_{xi}*X_t+W_{hi}*H_{t-1}+W_{ci}\circ C_{t-1}+b_i\right)$$
2
$$f_t=\sigma \left(W_{xf}*X_t+W_{hf}*H_{t-1}+W_{cf}\circ C_{t-1}+b_f\right)$$
3
$$\tilde{C}_t=\tanh\left(W_{xc}*X_t+W_{hc}*H_{t-1}+b_c\right)$$
4
$$C_t=f_t\circ C_{t-1}+i_t\circ \tilde{C}_t$$
5
$$o_t=\sigma \left(W_{xo}*X_t+W_{ho}*H_{t-1}+W_{co}\circ C_t+b_o\right)$$
6
$$H_t=o_t\circ \tanh\left(C_t\right)$$
7
where \(X_t\) denotes the input feature of the current cell, and \(C_{t-1}\) and \(H_{t-1}\) are the cell state and hidden state of the previous cell. Here, i is the input gate, f the forget gate, o the output gate, C the cell state, \(\tilde{C}\) the candidate cell state, and H the hidden state; W and b are the weights and biases, \(\sigma\) is the sigmoid activation function, \(\circ\) denotes the Hadamard product, and \(*\) is the convolution operation.
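To make Eqs. (2)-(7) concrete, the sketch below implements one ConvLSTM update in NumPy for a single-channel grid; the dictionary keys, the 'same'-padded 2-D convolution, and the kernel shapes are simplifying assumptions (a real layer convolves across all channels and learns its weights):

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv(w, x):
    """The '*' of Eqs. (2)-(7): 2-D convolution with 'same' padding."""
    return convolve2d(x, w, mode="same")

def convlstm_step(X_t, H_prev, C_prev, W, b):
    """One ConvLSTM update following Eqs. (2)-(7). W holds 3x3 kernels for
    the convolutions and full-size arrays for the Hadamard (peephole) terms;
    b holds scalar biases (all names are illustrative)."""
    i = sigmoid(conv(W["xi"], X_t) + conv(W["hi"], H_prev) + W["ci"] * C_prev + b["i"])
    f = sigmoid(conv(W["xf"], X_t) + conv(W["hf"], H_prev) + W["cf"] * C_prev + b["f"])
    C_tilde = np.tanh(conv(W["xc"], X_t) + conv(W["hc"], H_prev) + b["c"])  # Eq. (4)
    C = f * C_prev + i * C_tilde                                            # Eq. (5)
    o = sigmoid(conv(W["xo"], X_t) + conv(W["ho"], H_prev) + W["co"] * C + b["o"])
    H = o * np.tanh(C)                                                      # Eq. (7)
    return H, C
```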
The network architecture is designed in an encoder-decoder fashion (Sari et al. 2020) and consists of two major parts: (1) an encoder and (2) a decoder. The encoder consists of three sets of ConvLSTM, batch normalisation, and max-pooling operations; it extracts the spatial features and adjusts the parameters based on the temporal features. In each set of operations, the feature maps are downsampled to enlarge the field of view. The decoder mirrors the encoder, with upsampling operations used instead of max-pooling. It regenerates the features using the upsampling operations and the spatial features extracted by the encoder, and this repeated upsampling recovers spatial properties (such as height and width) matching the input feature.
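A minimal Keras sketch of this encoder-decoder layout; the filter counts, kernel sizes, and the zero-padding used so that three 2× poolings divide the 44-row grid evenly are assumptions, not the authors' exact settings:

```python
from tensorflow.keras import layers, models

k, z = 12, 4                                  # timesteps (cycles), channels (factors)
inp = layers.Input(shape=(k, 44, 64, z))
x = layers.ZeroPadding3D(padding=(0, 2, 0))(inp)  # rows 44 -> 48 for clean pooling

for filters in (16, 32, 64):                  # encoder: ConvLSTM + BN + max-pooling
    x = layers.ConvLSTM2D(filters, (3, 3), padding="same", return_sequences=True)(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling3D(pool_size=(1, 2, 2))(x)   # downsample space, keep time

for filters in (64, 32, 16):                  # decoder: ConvLSTM + BN + upsampling
    x = layers.ConvLSTM2D(filters, (3, 3), padding="same", return_sequences=True)(x)
    x = layers.BatchNormalization()(x)
    x = layers.UpSampling3D(size=(1, 2, 2))(x)        # recover spatial size

x = layers.Cropping3D(cropping=(0, 2, 0))(x)  # rows 48 -> 44, back to input size
out = layers.Conv3D(1, (3, 3, 3), padding="same")(x)  # per-pixel GWL difference
model = models.Model(inp, out)
```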
The architecture was trained from scratch using the stochastic gradient descent (SGD) optimizer with a learning rate of 0.001 and a momentum of 0.9. Each iteration contains only 10 samples (and 5× as many samples using augmentation) with different time series. The loss function is based on the root mean square error (RMSE) and mean absolute error (MAE), which evaluate the training and testing performance of the network (see section 4.3). To avoid overfitting during training, a validation set (equivalent to testing) was used to continuously monitor model performance. In addition, an early stopping function with a patience of 5 was used, which halts training if no improvement is observed over five consecutive iterations.
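Continuing the sketch above, the training setup can be expressed as follows (MSE is used as the optimisation loss, since Keras provides no built-in RMSE loss and the two are monotonically related; the data arrays are placeholders):

```python
import numpy as np
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9),
              loss="mse", metrics=["mae"])

# Placeholder arrays standing in for the stacked hydrological features.
X_train = np.random.rand(10, 12, 44, 64, 4)
y_train = np.random.rand(10, 12, 44, 64, 1)
X_val = np.random.rand(4, 12, 44, 64, 4)
y_val = np.random.rand(4, 12, 44, 64, 1)

# Stop when the validation loss fails to improve for 5 consecutive epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=100, batch_size=2, callbacks=[early_stop])
```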
4.3 Evaluation metrics
This section explains the metrics used to perform quantitative analysis between the models. All the models follow a regression approach; therefore, the root mean square error and the mean absolute error have been used.
Root mean square error is the standard deviation of the prediction errors, where an error is the distance between the actual and predicted values. The mathematical expression is shown in Eq. 8:
$$RMSE=\sqrt{\frac{\sum_{i=1}^{N}{(x_i-\hat{x}_i)}^{2}}{N}}$$
8
Mean absolute error is the arithmetic mean of the absolute errors produced by the model, where an absolute error is the absolute difference between the actual and predicted values. The mathematical expression is shown in Eq. 9.
$$MAE=\frac{\sum_{i=1}^{N}\left|x_i-\hat{x}_i\right|}{N}$$
9
where \(x_i\) and \(\hat{x}_i\) are the actual and predicted values of data point i, respectively, and N denotes the total number of data points.
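Both metrics are straightforward to compute; a minimal sketch with illustrative values:

```python
import numpy as np

def rmse(x, x_hat):
    """Eq. (8): root mean square error."""
    return np.sqrt(np.mean((x - x_hat) ** 2))

def mae(x, x_hat):
    """Eq. (9): mean absolute error."""
    return np.mean(np.abs(x - x_hat))

actual = np.array([10.2, 9.8, 11.0, 10.5])      # illustrative GWL values
predicted = np.array([10.0, 10.1, 10.7, 10.6])
print(rmse(actual, predicted), mae(actual, predicted))
```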