SD methodology statistically links coarse resolution predictor variables with the fine resolution predictand variable. This study aims to propose computationally efficient and reliable SD methodology to obtain realistic projection of ISMR. We obtain projections of ISMR at 0.25° resolution using a Convolutional Long Short Term Memory (ConvLSTM) and Shared ConvLSTM modelling framework. The results of obtained projections are viewed in light of those from two most recently used approaches non-parametric Kernel Regression (KR) [Salvi et al., 2013] and LSTM [Misra et al., 2017]. Brief description of KR and LSTM methodologies is provided in Supplementary sections S1 and S2 respectively. This section provides an overview of modelling framework and data pre-processing followed by the steps for developing a standard ConvLSTM model as well as shared ConvLSTM model.

## 3.1 Statistical downscaling framework and data pre-processing

The overview of the ConvLSTM based SD methodology is represented in Fig. 1. The basic steps of the ConvLSTM-based SD model are as follows: Preparation of the predictor dataset, and application of ConvLSTM-model. The model is trained with the help of NCEP/NCAR reanalysis dataset of time period 1951–2000, and is validated using data of time period 2001–2013. The training time period of 50 years is considered long enough to establish the correctness of a SD model. The model trained and tested using NCEP/NCAR reanalysis dataset, is applied to the GCM data. The model is firstly applied to historical GCM data of the time period 1969–2005, to check correctness of the developed model application for GCM dataset. The model validated for GCM historical data is thereafter applied to GCM near future data of the time period 2030–2070 and far future 2070–2100 to obtain rainfall projections under changing climate. The GCM data depicts systematic variation with respect to the climate reanalysis data, known as bias. Bias in GCM data may result in uncertainty in model predictions. Therefore, it is essential to bias correct the GCM before it is taken as input to the model. The bias correction of GCM predictors is carried out prior to statistical downscaling, to reduce systematic biases in the mean and variance of various GCM predictors. The bias correction is carried out with respect to that of NCEP/NCAR predictors. Supplementary section S3 presents description of the quantile based bias correction method for GCM predictors (Li et al., 2010).

To account for highly variable climate patterns of the Indian landmass, the downscaling is performed independently for the seven climatologically homogeneous zones of the country (Parthasarathy et al. 1996). The extent of these regions are illustrated in Fig. 2a. The spatial extents for the NCEP/NCAR predictor variables used for predicting rainfall in these regions are shown with the boxes bounding over the rainfall zones (Fig. 2b, Table 1). The geographical extent of predictor region is selected following Salvi et al. (2013). Apart from the climate data representing the large scale circulation over a greater region around the selected region (zone), lag1 (previous day) rainfall of the zone is also included in predictor dataset. The ConvLSTM model mandates the data input in the form of regular shaped boundaries (square/rectangular box). Therefore the smallest square bounding box covering all the grid points in a particular zone is selected for training the ConvLSTM model.

The selected NCEP/NCAR variables, namely mean sea level pressure (MSLP), specific humidity (SHUM), horizontal component of wind velocity (UWIND) and vertical component of wind velocity (VWIND) have different numerical ranges. Therefore it is difficult to utilize them together as a combined predictor dataset. The process of standardization of predictor dataset is applied to normalize the variables to the range [0, 1]. The standard deviation and mean of predictor variables is computed with the help of daily dataset of respective predictor variable for monsoon time period (JJAS) for the defined time period. Large-scale climate predictor variables are standardized by subtracting the mean and dividing by the standard deviation of the respective variable taken over the predefined time period. In order to unify the grid size (geographical extent) of grids within the complete predictor dataset, the NCEP/NCAR predictor dataset having a geographical resolution of 2.50 is interpolated to grid size 0.250, to match the size of the rainfall data. The interpolation of NCEP/NCAR predictor dataset is performed using the bilinear interpolation methodology. NCEP/NCAR data and gridded lag-1 rainfall data are concatenated to form the input. For example, for the central region, the size of the gridded precipitation data is 48x48. Size of NCEP/NCAR data is 8x8x5 (for 5 predictor variables) which is interpolated to 48x48x5. Hence, the size of the final input to the model is 48x48x6. To ensure that the interpolated variables do not lose any information and maintain spatial structure of the variables, we check the statistical properties of selected variables at each grid point. Figure 3 presents mean value of NCEP/NCAR predictor variables before and after bias correction.

The traditional multisite downscaling models typically perform downscaling on a single homogeneous rainfall zone, predicting rainfall at one grid point in a single model run. In contrast to this, the proposed ConvLSTM based modelling framework projects rainfall for all the grids in a zone with a single model run. Further to this, the study proposes a shared ConvLSTM model with a novel modelling framework that shares a ConvLSTM model across multiple neighbouring regions. The shared ConvLSTM model captures the similarity in rainfall patterns of the neighbouring regions and customizes it to individual grid points. This downscaling approach provides a single end-to-end supervised model for predicting the future precipitation series for entire India. It captures the regional variability in rainfall better than a region wise trained model. The following sub section provides description and functioning of the Conv-LSTM model applied to the pre processed predictor dataset. Importantly, this model do not need specific feature engineering and can learn the features themselves. Moreover, ConvLSTMs can take three dimensional inputs. The dimensionality reduction techniques like Principal Component Analysis (PCA), reduces the dimension of predictor data, however this may cause loss of few important features. The proposed methodology completely omits the commonly mandated step of dimensionality reduction and hence preserves the complete information of the predictor dataset.

## 3.2 ConvLSTM model

LSTM is a Deep Learning (DL), Recurrent Neural Network (RNN) model proposed by Hochreiter and Schmidhuber. LSTM is designed to model temporal sequences with their long-range dependencies (Hochreiter and Schmidhuber, 1997). A standard Neural Network (NN) contains several simple, connected processors termed as neurons, each of them producing a series of real-valued activations. The input neurons are triggered with the help of sensors that perceive the environment, at the same time the other neurons are triggered with the help of weighted connections with previously activated neurons. The network may also function in a reverse order, that is, few of the neurons triggering actions that influence the environment. Recurrent Neural Networks (RNN) is special kind of neural networks designed to handle sequence dependence. The NN credit assignment or learning is referred as finding optimal weights that helps NN demonstrate the desired behavior. A simple traditional NN captures information in input very well fails to capture long term information dependency in input data. As the name suggests, LSTMs are specially designed to remember information for long time periods and avoid the problem of long-term dependency. Deep Learning (DL) in the context of NN is referred to assigning credit across long causal computational stages, where each stage governs a non-linear comprehensive activation of the network. LSTM is a widely used RNN that is proven to work well on a large range of problems such as: language modelling, image captioning, speech recognition, translation, etc.

LSTMs functions with a chain like structure, same as a traditional RNN, except that the repeating module has four interactive network layers in place of a single NN layer. LSTM removes or adds information to the cell state through a carefully controlled structure termed as gates. Gates are composed of a point wise multiplication operation and sigmoid neural net layer. LSTM has three different sigmoidal gates to determine state of the cell. The sigmoid layer outputs numbers between zero and one, describing weight of the information retention, i.e. value zero means no retention while the value one means a complete retention.

The first sigmoid layer which decides on information that will be discarded is termed as the ‘forget gate layer’. For any time step (t) a forget gate layer considers hidden output at previous time step (t-1) denoted as ht−1 and current input data denoted as xt to output between 0 and 1 corresponding to each number in cell state Ct−1 given as :

ft = σ(Wxf * xt +Whf * ht−1 + bf )……………………………..… (1)

Here, W denotes the weight matrix, suffix x, f, t, h denotes variable, forget layer, time step and hidden output respectively.

The next step decides new information retention in the cell state. This is achieved in two different parts. First, a sigmoid layer termed as the ‘input gate layer’. This layer decides the values to be updated. Second, tanh layer. This layer creates a vector of new candidate values, Ĉt that could be added to the state.

it = σ(Wxi * xt +Whi * ht−1 + bi)……………………………..…. (2)

Ĉt = tanh(Wxcxt +Whc * ht−1 + bc) …………………………..…. (3)

The following step, combines these two input information to update the old cell state, Ct−1, to new cell state Ct. This is achieved by multiplying the old state by ft, discarding information and adding it * Ĉt to determine new scaled state of the cell at time step t.

Ct = ft * Ct−1 + it * Ĉt …………………………………….(6)

The output of an LSTM cell is determined based on cell state Ct. The filtered output is determined by running a sigmoid layer deciding parts of cell state that will be retained in the output. In order to achieve this, the cell state is set through tanh to confine the values to range from − 1 and 1 and multiply it by output of the sigmoid gate.

ot = σ(Wxo * xt +Who * ht−1 + bo)…………………………….…. (5)

ht = ot * tanh(ct) …………………………………….…..(7)

ConvLSTM is a variation of LSTM, that comprises a convolution operation inside LSTM cell. The fully connected RNN layers in a standard LSTM are replaced by convolutional layers in a convolutional LSTM block. The model, therefore consist of 2 basic units, convolutional layers and LSTM cells. These layers are tailored for spatiotemporal input-output based prediction problems, as here in case of rainfall projection. A basic unit of a convolutional LSTM cell is depicted in Fig. 4.

Convolutional LSTMs [Krizhvesky et al., 2012] can be used to model dependencies between a three dimensional input-output volume with temporal dependencies, which is the same as a rainfall projection problem. In this study, large scale climate circulation and lag 1 rainfall acts as predictors or the input volume and the gridded rainfall data is taken as the output volume. The ConvLSTM model is thus applied to all grid locations in the input and output at the same time. The network connections in a ConvLSTM, as shown in Fig. 5, are built in such a way that it captures the relationship between the target (rainfall variable) at a grid point and the input (predictor variables) at all the grid locations in the input volume. A high degree of autocorrelation is generally observed in the daily rainfall series of any region. This autocorrelation can improvise predictive ability of a downscaling model by a significant proportion. Capturing autocorrelation in daily rainfall series is climatologically important in daily rainfall prediction studies. The ConvLSTM model thus essentially captures the relationship between large scale predictors and local rainfall. It also captures the spatial correlation between both input predictors and target rainfall fields. Due to the backward network connections in the ConvLSTM model, (as illustrated in Fig. 4) the model automatically takes care of the autocorrelation in the rainfall series. Shi et al. [2015] used the ConvLSTM model for precipitation nowcasting, showing a high accuracy in precipitation prediction. This study demonstrates that model processes a noteworthy skill of capturing all these dependencies and the autocorrelation in the daily rainfall series.

## 3.3 ConvLSTM model training

A four layer Convolutional LSTM model is trained with the input dataset and the current day rainfall as output. This model, as depicted in Fig. 5, consists of four ConvLSTM layers. The first three layers are with 64 filters, each having size 5×5, whereas the last layer has 1 filter of size 1×1 which enables to get the output of desired volume (48×48×1 in case of central region). Here, it is essential to note that there are different combinations of hyperparameters for training the ConvLSTM model. The model training is performed using all possible combinations of 1 to 5 ConvLSTM layer, each layer with filter size 3×3, 5×5 and 7×7. The number of filters are chosen to be 4, 8, 16, 32, 64, 128 and learning rate is selected as 0.0001, 0.001, 0.01, 0.1. The number of iterations are chosen from 5 to 500 with a stride of 5. Following this, the best performing model identified based on the weighted mean squared error estimation.

The selection of optimization methodology plays a key role in training of deep learning models. The advanced optimization method, namely Adam (Kingma and Ba, 2015) is adopted especially for training very deep networks. Here, the Adam optimizer is selected, mainly as it is an optimized version of stochastic gradient. This optimization algorithm is widely used for finding minima for convex optimization problems. It is generally expected to reach the global minima in a certain number of iterations if the learning rate is carefully chosen. The model is trained with Adam optimizer with an initial learning rate of 10− 5.

The model is developed in Tensorflow (Abadi et al., 2016) using Keras library and Python on a NVIDIA Tesla P100-PCIE GPU. The rainfall is highly variable across the grid points in a region, therefore to appropriately capture both mean and extremes of the rainfall at every grid point in a region weighted mean squared error ConvLSTM loss function WNMSE is used. The WNMSE applies weighting of individual observed rainfall values at every grid in the training set. The rainfall data has a vastly skewed distribution with values greater than 30 mm per day covering approximately 5% of the entire training set over Indian region. The loss is summation of losses over all grid points within the zone for which prediction of the rainfall distribution is made. The training loss function for the ConvLSTM model is as follows:

$$\text{W}\text{N}\text{M}\text{S}{\text{E}}_{\text{C}\text{o}\text{n}\text{v}\text{L}\text{S}\text{T}\text{M}} = {\text{w}}_{\text{i}}\text{*}\sum _{\text{i}=1}^{\text{n}}{({\text{t}}_{\text{i}}-{\text{m}}_{\text{i}})}^{2}$$

8

………………………..……....

Where n is a subset of the gridded output data having size r×c, ti is the true value of target, and mi is mean value of the predictive distribution.

The following weighting scheme is used for model evaluation:

$${w}_{i}= \left\{\begin{array}{c}{t}_{i} if {t}_{i}\ge 30\\ 0 if<30\end{array}\right.$$

9

……………………..…………………….

The trained model is validated using the predictor dataset for the validation time period. Multiple efforts of model training and testing are carried out to obtain an optimum weight matrix. The trained model is applied to the GCM data to generate future rainfall projections. This modelling framework is applied independently for each rainfall zone. The basic architecture of the ConvLSTM models for each region is depicted in table 2.

Gridded rainfall bounding boxes are taken as square shaped so that the NCEP predictor input volumes can be easily interpolated to the shape of the gridded rainfall. The regridded data is concatenated to form the final ConvLSTM based downscaling model input. For example, for all India rainfall data, NCEP predictors are of size 15×15 and gridded rainfall is of size 129×135. So, here we need to make the rainfall data of a shape which the predictors can be easily interpolated to. Hence, we take a larger region based rainfall data of size 135×135. We could have instead padded the data with zeros instead of taking larger region. The experimental investigation revealed that taking lag-1 rainfall from a larger region as input in the loss function worked better than the zero padding approach.

## 3.4 Shared ConvLSTM modeling framework

The previous works in literature proposed region based downscaling methodologies for larger regions of heterogeneous rainfall patterns like India by dividing the region into homogeneous zones. This study also applied the same design with the previous models. NN have been shown to benefit from sharing of related data and multitasking. Therefore, this study considers proposing a model, termed as shared ConvLSTM. The shared ConvLSTM model tries to explore whether the rainfall zones and the corresponding predictor data taken as a whole improve or worsen the results. This modelling framework considers gridded rainfall dataset of size 129X135 and a spatial resolution of 0.25°×0.25°. In order to ease training, we convert it to size 135×135 by padding with zeros. The zeros in the region outside Indian landmass are ignored during training by the use of the weighted loss function WConvLSTM. The NCEP/NCAR dataset size for entire India which we are using is 15×15 at a resolution of 2.5°×2.5°. We interpolate it to size 135×135 using bilinear interpolation as discussed above. The basic architecture of this model is same as that presented in Fig. 5. The basic steps are similar to the ConvLSTM model with the differences: gridded precipitation data with padding is used as the predictor and the interpolated NCEP/NCAR data is used as predictand for training the model. NCEP/NCAR data and gridded lag-1 rainfall data are concatenated to form the final input to the model. The four layer ConvLSTM model is trained with this input and current day rainfall as output. This model is trained using Adam optimizer with an initial learning rate of 10− 5 and the same WNMSE ConvLSTM loss function as discussed above. The trained model is then validated using NCEP/NCAR data and rainfall data for the validation period. Bias correction of GCM data is performed followed by validating the model for CanESM2 GCM data for the historical time period. The future rainfall predictions are generated using CanESM2 GCM data for the future time periods.