Effect of Climate Change on the Rainfall Trends: A case study of Pune, India

Predicting the impact of climate change, especially on rainfall, is essential for anticipating future environmental conditions on local as well as global scales. The present work studies the impact of climate change on the rainfall occurring over Pune, the eighth largest city in India. General Circulation Models (GCMs) are predominantly used to obtain climate data at grid points all over the globe, for past and future years. Rainfall values obtained at these grid points need to be downscaled to make them location specific. This study proposes a soft computing tool, the Artificial Neural Network (ANN), for the purpose of downscaling. Rainfall data at the 4 grid points surrounding Pune were extracted from 5 different GCMs and given as input to ANNs with observed rainfall as output, thus forming 5 models. For comparison, an existing downscaling technique, Distribution-based Scaling (DBS), was used. The coefficient of correlation (r) showed that ANN performed better than DBS: r was 0.73 for the least accurate ANN model, whereas DBS reached 0.73 only for its most accurate model. The future rainfall estimated with the trained ANN models shows an increase in mean rainfall over the Pune region of ∼2–15% and a decrease in maximum rainfall of ∼40–65%. The peak rainfall simulated by ANN was not very accurate, and improving it is the future scope of this study.


Introduction
The Earth's climate is constantly changing because of the energy transfer cycle from the sun to the earth's surface and back into the atmosphere. Current observations indicate that the temperature of the earth is rising steadily, leading to global warming. This continual variation in the global climate is termed climate change. The adverse effects of this changing climate can be observed in the global hydrological cycle. Certain areas where the mean rainfall is forecast to decrease are nevertheless likely to face extreme rainfall events of high intensity (Semenov and Bengtsson 2002; Wilby and Wigley 2002). This increasing intensity of extreme rainfall events would substantially affect the intensity-duration-frequency relations (Kao and Ganguly 2011) and may result in flooding of urban areas (Mailhot et al. 2007). Thus, detailed risk analysis is needed to study these trends, which would prove helpful for the design of infrastructure in urban localities.
This changing climate can be studied with the help of General Circulation Models (GCMs). These models imitate physical processes in the atmosphere, ocean, cryosphere and land surface. They are among the most advanced tools currently available for simulating the response of the global climate system to increasing greenhouse gas concentrations. GCMs give values of various climate variables at grid points. To obtain accurate results, these values have to be downscaled so that they represent a smaller, specific study area. Various statistical and dynamic downscaling techniques have been a major focus of research over the last decade.
Several studies are being conducted worldwide to gain a better understanding of how these changing climatic conditions are going to affect rainfall trends. For the Indian subcontinent, some researchers have inferred that the southwest monsoon seems to be weakening due to the impact of climate change (Kripalani et al. 2003; Ueda et al. 2006; Stowasser et al. 2009; Sabade et al. 2011; Krishnan et al. 2013), while others say that, due to global warming, the monsoon rainfall over a broad region encircling South Asia is likely to increase in intensity (Lal et al. 2000; Meehl and Arblaster 2003; Rupakumar et al. 2006). Rupakumar et al. (2006) have reported an increase in extreme rainfall along the west coast and western parts of central India. Rana et al. (2014) studied the effects of climate change on the precipitation in Mumbai. They inferred that the intensity of rainfall is expected to increase in the future (2010-2099). As per their research, the average increase in maximum rainfall is approximately 15-20% in each 30-year time period (2010-2040, 2041-2070, 2071-2099) and about 30-45% over the 90-year period. Along with a seasonal shift and delayed onset of the monsoon, the study also showed an increase in the total accumulated annual rainfall, ranging from 300 to 500 mm in the ensemble. The statistical downscaling approach termed Distribution-based Scaling (DBS) was tested and applied in their work to scale GCM data. The results of this scaling technique appeared potentially useful for impact studies. The delay in the onset of the monsoon and the fluctuations in rainfall intensity in the Pune region of the state of Maharashtra, India, motivated the authors to conduct this study to assess the impact of climate change on the rainfall intensity and pattern for this particular area.
Recently, numerous studies have been conducted in the field of downscaling of GCM data using soft computing tools like Artificial Neural Networks (ANN).
Fistikoglu and Okkan (2011) investigated the precipitation at three meteorological stations using an ANN with nine NCEP/NCAR parameters as inputs; the output was the monthly precipitation in the Tahtali basin, Turkey, which needed to be downscaled. The Levenberg-Marquardt (LM) algorithm was used for training the ANN, which had one hidden layer with 3 nodes. The results showed that ANN could successfully be used to downscale the grid-based NCEP/NCAR dataset. Chithra et al. (2014) worked with a similar methodology. In their work, a set of predictors chosen from the NCEP/NCAR reanalysis data were used as inputs to ANN models, and the maximum and minimum temperatures in the Chaliyar river basin in Kerala, India, were the outputs. Their results also indicated that ANN is a feasible option for downscaling climate data. Swain et al. (2017) combined ANN-predicted daily runoff with ANN-predicted 30-day cumulative runoff to improve flow estimates. They concluded that ANN performed well in representing the 2007-2012 and the drier 1994-1997 periods, and that ANN requires much less computational effort than a numerical model application without compromising the accuracy of results. Abdullahi and Elkiran (2017) focused on forecasting the effect climate change may have on reference evapotranspiration (ETo) for the Girne and Larnaca regions of Cyprus for the next 3 decades (2017-2050). A three-layer Feed Forward Back Propagation neural network (FFBP) trained by the Levenberg-Marquardt (LM) optimization algorithm was used to predict future evapotranspiration. The study implemented two approaches: in the first, the input was kept constant (6 causative input parameters) while the number of hidden neurons was changed; in the second, the inputs were varied from 2 to 6 parameters and the number of hidden neurons was double the number of inputs.
Their conclusion was that the efficiency of ANN in predicting future evapotranspiration in the regions could be significantly increased by adding more inputs. Tanteliniaina et al. (2021) chose 26 predictor variables with the help of the coefficient of correlation between them and the desired output (Temperature and pressure at Mangoky River, Madagascar). These 26 variables formed the input matrix for the neural networks with precipitation and temperature data as output. The trained models were then used to estimate the future rainfall and temperature.
In almost all the works mentioned above, standardization of the GCM data is suggested for the removal of systematic bias in the mean and standard deviation of the dataset. Also, many of the technical papers adopt methods such as bilinear interpolation (appendix 1) for re-gridding the data, as the grid points of the GCM differ from the grid points of the observed data. Based on their experience with soft computing techniques, the authors feel that these 2 steps can be dispensed with by using ANNs. This work proposes that the rainfall can be obtained from GCMs at the 4 nearest grid points surrounding the desired location and used as input variables for training the neural networks, with observed rainfall as output. It can be seen in the results section that this novel approach gives results slightly better than the traditional downscaling technique of Distribution Based Scaling (DBS) as suggested by Rana et al. (2014).
Another observation from the earlier works was that the LM algorithm was used in the majority of papers for training the neural networks. Several other algorithms have proven efficient in solving non-linear problems; the accuracy of 9 different training algorithms was therefore checked and the best one adopted for future rainfall simulations. Detailed results are given in the upcoming sections.

Artificial Neural Networks
As stated by Londhe and Shah (2019), Artificial Neural Networks (ANNs) possess excellent capabilities in modelling complex processes and have been applied to a wide range of problems in the field of hydrology. ANNs mimic the structure of the biological neural network in the human brain: the input neurons are the causative variables and the phenomenon to be modelled forms the output node, and these are connected through one or more hidden layers of neurons whose working resembles that of biological neurons.
The training of ANNs is an iterative process that progressively minimizes the error between the observed and predicted outputs, and it can be carried out with the help of various training algorithms. Once the network is calibrated, it can be applied to evaluate the output for unseen data.
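The iterative error minimisation described above can be illustrated with a minimal sketch. It is not the configuration used in this study: it assumes a single tanh hidden layer, a linear output node and plain full-batch gradient descent (not the LM algorithm), and it runs on synthetic data. The function name `train_network` and all parameter values are hypothetical.

```python
import math
import random


def train_network(inputs, targets, n_hidden=3, epochs=200, lr=0.05, seed=0):
    """Train a one-hidden-layer network by full-batch gradient descent.

    Returns the mean squared error recorded at the start of every epoch.
    """
    rng = random.Random(seed)
    n_in, n = len(inputs[0]), len(inputs)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        squared_error = 0.0
        for x, y in zip(inputs, targets):
            # Forward pass: tanh hidden layer, linear output node.
            hidden = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                      for j in range(n_hidden)]
            error = sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2 - y
            squared_error += error * error
            # Backward pass: accumulate gradients of 0.5 * error^2.
            g_b2 += error
            for j in range(n_hidden):
                g_w2[j] += error * hidden[j]
                delta = error * w2[j] * (1.0 - hidden[j] ** 2)
                g_b1[j] += delta
                for i in range(n_in):
                    g_w1[j][i] += delta * x[i]
        history.append(squared_error / n)
        # Gradient-descent update with the batch-averaged gradients.
        step = lr / n
        b2 -= step * g_b2
        for j in range(n_hidden):
            w2[j] -= step * g_w2[j]
            b1[j] -= step * g_b1[j]
            for i in range(n_in):
                w1[j][i] -= step * g_w1[j][i]
    return history


# Synthetic data only: 4 inputs per sample, target is their mean.
data_rng = random.Random(1)
toy_inputs = [[data_rng.random() for _ in range(4)] for _ in range(40)]
toy_targets = [sum(x) / 4 for x in toy_inputs]
mse_history = train_network(toy_inputs, toy_targets)
print(f"MSE before training: {mse_history[0]:.4f}, after: {mse_history[-1]:.4f}")
```

The recorded error history is what a training algorithm drives down; more elaborate algorithms differ mainly in how the weight update is computed from the gradients.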

Study Area And Data Used
Pune, the eighth largest metropolitan city of India with a population of over 5 million, is situated at approximately 18° 32' North latitude and 73° 51' East longitude. The city's total area is 729 square kilometres. Pune lies on the western margin of the Deccan plateau, at an altitude of 560 m above sea level.
As in the rest of the Indian subcontinent, Pune receives its rainfall in the 'monsoon', spread over almost 4 months. The monsoon lasts from June to October, with moderate rainfall (214.85 mm as per the data acquired from the India Meteorological Department, Pune) and temperatures ranging from 22 to 28°C. Figure 1 shows the location of Pune on the map of India.
The observed data was acquired from India Meteorological Department (IMD), Pune (https://www.imdpune.gov.in/). The data consists of cumulative monthly values of rainfall for 116 years, from 1901 to 2016, with no gaps.
Along with the observed values, GCM data from CM, obtained from the Copernicus Climate Change Service website (https://cds.climate.copernicus.eu/) was used in this study. Daily rainfall data from 5 GCMs was downloaded and extracted for the historical (1901-2016) and future (2021-2050) time steps. The GCMs used in this study are IPSL_CMA5, HADGEM2_CC, NorESM1_1, CNRM_CM5 and GFDL_CM3. Details of these are given in Table 1.
The monthly data for the entire globe was initially downloaded in NetCDF format and then converted into a Comma Separated Value (.csv) file using MATLAB. The four (4) grid points surrounding the Pune region were then identified and the rainfall data at those 4 grid points were extracted from the global data.

Methodology
As mentioned in the previous section, the climate data for rainfall was obtained from the 5 GCMs, at 4 grid points surrounding Pune. The technique of bilinear interpolation was used to obtain the value of the rainfall at the required grid point, which gave us the raw GCM data. This raw data was then used to obtain the downscaled data for Pune region using the Distribution based Scaling (DBS) technique, as suggested by Rana et al., (2014).
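The bilinear interpolation step mentioned above can be sketched as follows. This is an illustrative sketch only: the function name, the grid-cell corner coordinates and the rainfall values are hypothetical, and only the approximate location of Pune (18.53° N, 73.85° E) is taken from the text.

```python
def bilinear_interpolate(lon, lat, lon0, lon1, lat0, lat1, q00, q10, q01, q11):
    """Interpolate a value at (lon, lat) from the four surrounding grid points.

    q00 is the value at (lon0, lat0), q10 at (lon1, lat0),
    q01 at (lon0, lat1) and q11 at (lon1, lat1).
    """
    if not (lon0 <= lon <= lon1 and lat0 <= lat <= lat1):
        raise ValueError("target point must lie inside the grid cell")
    tx = (lon - lon0) / (lon1 - lon0)  # fractional position along longitude
    ty = (lat - lat0) / (lat1 - lat0)  # fractional position along latitude
    bottom = q00 * (1 - tx) + q10 * tx  # interpolate along the lat0 edge
    top = q01 * (1 - tx) + q11 * tx     # interpolate along the lat1 edge
    return bottom * (1 - ty) + top * ty


# Hypothetical 2.5-degree grid cell around Pune with made-up monthly rainfall (mm).
raw_gcm_value = bilinear_interpolate(
    73.85, 18.53, 72.5, 75.0, 17.5, 20.0, 100.0, 120.0, 140.0, 160.0
)
print(round(raw_gcm_value, 2))
```

Applying the function month by month to the four grid-point series would give one interpolated series per GCM, which is what the text refers to as the raw GCM data.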
The DBS approach includes two steps. In the first step, the wet fraction (i.e. the proportion of time steps with non-zero rainfall) is adjusted to match the actual observations. The interpolated and observed precipitation data were sorted in descending order and a cut-off value was defined as the threshold that reduced the percentage of wet months in the GCM data to that of the observations. Months having rainfall larger than this threshold value were considered wet months and the others dry months. For the detailed methodology, readers are referred to Rana et al. (2014).
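A minimal sketch of this first DBS step is given below. It is an interpretation of the description above, not the authors' MATLAB code: the function names are hypothetical, the data are made up, and ties at the threshold are simply treated as dry.

```python
def wet_threshold(gcm, observed):
    """Return the cut-off that makes the GCM wet fraction match the observations."""
    if not gcm or not observed:
        raise ValueError("both series must be non-empty")
    wet_fraction = sum(1 for v in observed if v > 0) / len(observed)
    n_wet_gcm = round(wet_fraction * len(gcm))  # wet months to keep in the GCM series
    ranked = sorted(gcm, reverse=True)          # descending order, as in the text
    if n_wet_gcm >= len(ranked):
        return 0.0                              # every non-zero month stays wet
    # The largest value that must become dry; anything above it stays wet.
    return ranked[n_wet_gcm]


def apply_wet_threshold(gcm, threshold):
    """Set months at or below the threshold to zero (dry months)."""
    return [v if v > threshold else 0.0 for v in gcm]


# Made-up monthly rainfall (mm) for one year.
observed = [0, 0, 0, 5, 50, 200, 150, 80, 10, 0, 0, 0]
gcm = [1, 2, 0.5, 8, 40, 180, 160, 90, 20, 3, 1.5, 0.2]
threshold = wet_threshold(gcm, observed)
adjusted = apply_wet_threshold(gcm, threshold)
print(threshold, sum(1 for v in adjusted if v > 0))
```

GCM output tends to contain many months with small non-zero rainfall, so the threshold mainly removes this spurious drizzle before the distribution fitting.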
In the second step of DBS, the remaining non-zero rainfall values were transformed to match the observed cumulative probability distribution in the reference data by fitting gamma distributions (equation 1) to both the observed and the interpolated monthly rainfall. A shape parameter (α) and a scale parameter (β) are needed to fit the gamma distribution function. As the distribution of rainfall values (f(x)) is heavily skewed towards lower intensities, distribution parameters estimated by maximum likelihood could be dominated by the most frequently occurring values and fail to describe extremes accurately (Rana et al., 2014). To capture the characteristics of normal rainfall as well as extremes, the rainfall distribution in DBS was divided into two partitions separated by the 95th percentile. Two sets of parameters, one representing non-extreme values and one representing extreme values above the 95th percentile, were estimated from the observations and from the GCM output for the historical data. These parameter sets were in turn used to bias-correct rainfall data from the GCM outputs for the entire projection period up to 2050 (P_DBS and P_DBS,95) using the inverse gamma function (equations 2 and 3). The authors developed their own code for this purpose using MATLAB 2013a.
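Equations 1 to 3 are not reproduced in this text. As a hedged reconstruction, the standard two-parameter gamma density and the usual form of the DBS quantile mapping are written out below, with F the gamma cumulative distribution function and F^{-1} its inverse. The subscripts Obs and GCM and the variable P for raw GCM rainfall are notational assumptions, not taken from the original equations.

```latex
% Equation 1 (assumed standard form): gamma density with shape \alpha and scale \beta
f(x) = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)},
\qquad x,\ \alpha,\ \beta > 0

% Equations 2 and 3 (assumed form): quantile mapping below and above the 95th percentile
P_{DBS} = F^{-1}\!\left(\alpha_{Obs},\ \beta_{Obs},\ F\!\left(P,\ \alpha_{GCM},\ \beta_{GCM}\right)\right),
\qquad P < 95\text{th percentile}

P_{DBS,95} = F^{-1}\!\left(\alpha_{Obs,95},\ \beta_{Obs,95},\ F\!\left(P,\ \alpha_{GCM,95},\ \beta_{GCM,95}\right)\right),
\qquad P \geq 95\text{th percentile}
```

In words, each GCM value is first converted to a cumulative probability with the gamma distribution fitted to the GCM data, and that probability is then mapped back to a rainfall amount with the gamma distribution fitted to the observations.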
Furthermore, using the 4 grid values obtained from each of the 5 GCM datasets as input and the observed monthly rainfall as output, 5 separate ANN models were trained and tested. To the authors' knowledge, ANN can work with reasonable accuracy as a statistical downscaling technique to correct the magnitude of the data and make it applicable to a specific geographical area. Thus, the 116 years of monthly GCM data for the years 1901 to 2016 (1392 values) were used along with the observed data to form a 4*1 (4 inputs, 1 output) matrix to train a feedforward backpropagation neural network using MATLAB 2013a. For all models, 70% of the data (974 values) was used for training and the remaining 30% (418 values) for testing.
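The data arrangement and the 70/30 division can be sketched as follows. The sketch assumes a chronological split, which the text does not state (the division could equally have been random), and it uses placeholder series rather than real data; the function names are hypothetical.

```python
def build_samples(grid_series, observed):
    """Pair the four grid-point values of each month with the observed rainfall."""
    if len(grid_series) != 4:
        raise ValueError("expected rainfall series for exactly 4 grid points")
    n = len(observed)
    if any(len(series) != n for series in grid_series):
        raise ValueError("all series must have the same length")
    inputs = [[series[t] for series in grid_series] for t in range(n)]
    return inputs, list(observed)


def split_train_test(inputs, targets, train_fraction=0.7):
    """Split chronologically (an assumption) into training and testing sets."""
    n_train = int(len(inputs) * train_fraction)
    return (inputs[:n_train], targets[:n_train]), (inputs[n_train:], targets[n_train:])


# Placeholder data: 116 years x 12 months = 1392 monthly values per series.
n_months = 116 * 12
grid_series = [[float(t + k) for t in range(n_months)] for k in range(4)]
observed = [float(t) for t in range(n_months)]

inputs, targets = build_samples(grid_series, observed)
(train_x, train_y), (test_x, test_y) = split_train_test(inputs, targets)
print(len(train_x), len(test_x))
```

With 1392 monthly values, truncating 70% to a whole number gives the 974 training and 418 testing values quoted above.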
During the training of the GCM models it was observed that a 3-layer Feedforward Backpropagation (FFBP) network with a single hidden layer was not sufficient, perhaps due to the highly non-linear nature of the problem. Figure 2 shows the time series plot for IPSL_MR_CMA5 using the LM algorithm for training. It can be seen that the model is not able to finish training due to the complexity of the problem. Hence, the authors added another hidden layer, which proved effective in improving the results to a satisfactory extent.
As suggested by Londhe and Deo (2003)

All the other parameters of the models were fixed by trial and error. Lastly, 5 separate models were finalised as the best performing models for each GCM. The network details for each of the final 5 models are given in table 2. These 5 trained networks were then used to estimate the rainfall over Pune for the next 30 years, that is, from 2021 to 2050. For this purpose, the future rainfall (2021 to 2050) at the 4 grid points surrounding Pune, extracted from the 5 GCMs mentioned in table 1, was used as unseen input.

Table 4 shows a comparison between the raw GCM data and the monthly precipitation values obtained from DBS and ANN. It can be inferred from the table that ANN consistently works slightly better than the other two methods. An extensive result analysis of the observed, DBS and ANN values was also carried out for monthly as well as annual rainfall for the past 116 years. The trained monthly models were used to compute the annual values; no separate annual models were trained or tested. The annual rainfall analysis was further divided into annual accumulated rainfall and annual average rainfall. The rainfall for all 12 months of each year was summed to obtain the annual accumulated rainfall, and the mean of these 12 values gives the annual average rainfall.

This detailed analysis sheds light on the distribution of rainfall over the entire year. For example, if the average rainfall shows an increasing trend but the maximum rainfall does not, the rainfall is more evenly distributed throughout the year. If it is the other way around, the total rainfall over the region might be lower, with extreme rainfall on certain days and no rainfall on the rest. Tables 5, 6 and 7 show the comparison of both techniques for monthly, annual accumulated and annual average rainfall respectively.
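The annual accumulated and annual average rainfall described above follow directly from the monthly series. A minimal sketch with made-up values (the function name is hypothetical):

```python
def annual_statistics(monthly, first_year):
    """Return {year: (accumulated rainfall, average monthly rainfall)}."""
    if len(monthly) % 12 != 0:
        raise ValueError("monthly series must cover whole years")
    stats = {}
    for k in range(len(monthly) // 12):
        year_values = monthly[12 * k:12 * k + 12]
        total = sum(year_values)                   # annual accumulated rainfall
        stats[first_year + k] = (total, total / 12)  # mean of the 12 monthly values
    return stats


# Two made-up years of monthly rainfall (mm).
monthly = [10.0] * 12 + [0, 0, 0, 0, 0, 120, 240, 180, 60, 0, 0, 0]
for year, (accumulated, average) in annual_statistics(monthly, 1901).items():
    print(year, accumulated, average)
```

Because the annual values are derived from the monthly predictions, any error in the monthly peaks carries over into the annual maxima.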
Alongside, to keep a check on the performance of the ANN models, a decade-wise RMSE analysis was carried out for all 116 years. The absence of a sudden spike or fall in the RMSE value indicates that the ANN models are reliable, and the analysis also helps in comparing the accuracy of the GCMs with each other. Table 8 gives the decade-wise RMSE analysis for all GCMs. Figures 4 to 13 give a pictorial representation of the results, in the form of time series plots and scatter plots, for all 5 GCMs. All the result analyses have been carried out on the combined training and testing datasets. An analysis of the testing datasets alone was not feasible, as comparison with DBS would have been difficult: DBS requires the entire dataset for its working methodology.

The result analysis for the monthly models (Table 5), annual accumulated rainfall (Table 6) and annual average rainfall (Table 7) also shows that ANN seems to work slightly better than DBS as a downscaling technique.
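A decade-wise RMSE analysis of this kind can be sketched as below. How the final, incomplete decade (2011-2016) was handled is not stated in the text; this sketch simply assumes it is evaluated as a shorter block. The function name and the data are placeholders.

```python
import math


def decade_rmse(observed, predicted, first_year=1901):
    """Return {first year of decade: RMSE over that block of monthly values}."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    block = 10 * 12  # months in a decade
    results = {}
    for start in range(0, len(observed), block):
        obs = observed[start:start + block]
        pred = predicted[start:start + block]
        mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)
        results[first_year + start // 12] = math.sqrt(mse)
    return results


# Placeholder series: a constant 3 mm error in every month of 1901-2016.
observed = [0.0] * (116 * 12)
predicted = [3.0] * (116 * 12)
rmse_by_decade = decade_rmse(observed, predicted)
print(rmse_by_decade)
```

A roughly flat sequence of values across decades is the behaviour the text interprets as a sign of a reliable model.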
However, ANN seems to lag in modelling the peak rainfall values, both monthly and annual. The maximum rainfall and PEP values indicate the accuracy with which the peaks are simulated. A positive PEP value indicates underestimation and a negative value indicates overestimation of the peaks. It can be observed for both the monthly (Table 5) and annual (Table 6 and Table 7) results that ANN has consistently underpredicted the peak rainfall values, whereas DBS has consistently overpredicted them.
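The two measures used in this comparison can be sketched as follows. The text does not give the PEP formula, so the definition below (relative difference between the observed and predicted maxima, in percent) is an assumption chosen to match the stated sign convention; the coefficient of correlation is the usual Pearson r. All values are made up.

```python
import math


def correlation(observed, predicted):
    """Pearson coefficient of correlation (r) between two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def percentage_error_in_peak(observed, predicted):
    """Assumed PEP definition: positive when the predicted peak is too low."""
    peak_obs = max(observed)
    peak_pred = max(predicted)
    return (peak_obs - peak_pred) / peak_obs * 100


# Made-up monthly rainfall (mm): one series with a low peak, one with a high peak.
observed = [10.0, 50.0, 400.0, 120.0]
low_peak = [12.0, 55.0, 300.0, 110.0]
high_peak = [8.0, 45.0, 480.0, 130.0]
print(percentage_error_in_peak(observed, low_peak))   # positive: underestimation
print(percentage_error_in_peak(observed, high_peak))  # negative: overestimation
print(round(correlation(observed, low_peak), 3))
```

A model can therefore score a high r while still having a large PEP, which is the pattern reported for ANN here.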
A similar pattern can be observed in the time series plots of all 5 GCMs as well (Figures 4, 6, 8, 10 and 12). In each of these figures, ANN appears to underpredict the higher values and DBS shows an overpredicting trend.
A brief glance at the scatter plots for all 5 GCMs (Figures 5, 7, 9, 11 and 13) shows that, compared to DBS, the ANN values are aligned better along the best-fit line. As discussed earlier, these figures also show that ANN underpredicts and does not give satisfactory results for higher values. It can be observed from figure 7 that the ANN values saturate at approximately 400 mm. A similar trend is observed in the scatter plots of the other GCMs. This is consistent with the PEP values and confirms that the models are not up to the mark for higher values: the network is not able to predict values beyond a certain limit. This might be due to insufficient information being provided to the neural networks.
Thus, the authors feel that this drawback could be overcome by using rainfall causative parameters, along with the previous year's rainfall, as input to the ANN models, so that they receive more information about the physical process being modelled.

In the decade-wise RMSE analysis, there is no sudden increase or decrease in the error value; the values lie in a similar range for all models. This inference is supported by figure 14, which plots the decade-wise RMSE for all climate models. This study also shows that all 5 GCMs perform almost on par in modelling the rainfall at Pune. Of the 5, HADGEM2_CC seems to have the upper hand, followed by NorESM1_1. However, a conclusive comparison of these models could be carried out only after the peak prediction of the models is improved.
Further, the 5 trained ANN models were used to estimate the rainfall for the next 30 years, that is, 2021-2050. The rainfall at the 4 surrounding grid points was given as input to the trained ANN models having the parameters mentioned in Table 2. For the future rainfall series obtained using ANN, the mean and maximum values were calculated for both the monthly and the annual data series. These future parameters are listed in Table 9, which also shows the percentage increase in the mean and maximum rainfall compared to the previous years. It can be seen in this table that the mean rainfall seems to increase in the future years. The percentage increase varies from 2.09% (HADGEM2_CC) to 15.76%

their rainfall trends, so much difference in the future maximum rainfall does not seem plausible. It has already been discussed that ANN constantly underpredicts the peak values, and the same error is therefore carried forward to the future rainfall as well. Hence, the dependability of these results could be increased once the peak estimation of the trained models is more accurate.

all other models show a slight increase in total rainfall up to 2040, after which it seems to decrease slightly. HADGEM2_CC shows a sudden rise in the rainfall for the last decade. Compared with the decade-wise RMSE analysis (figure 14), it can be seen that the RMSE value for this model increases for the last 2 decades, and thus the operational model might also have an increased error percentage. Here again, the rainfall trend over the next 3 decades could be depicted more accurately if the peak prediction by the ANN models were improved.

From the above results, it can be concluded that ANN works better than DBS as a downscaling technique. Not only is it more accurate, it is also a less tedious process, as the grid points can be used directly to train the models.
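The percentage changes in mean and maximum rainfall discussed above (Table 9) amount to a simple comparison between the historical and the future series. The sketch below assumes that the baseline is the full historical series, which the text only describes as "the previous years"; the function name and values are hypothetical.

```python
def percentage_change(historical, future):
    """Return the percentage change in (mean, maximum) from historical to future."""
    def change(old, new):
        return (new - old) / old * 100

    hist_mean = sum(historical) / len(historical)
    future_mean = sum(future) / len(future)
    return change(hist_mean, future_mean), change(max(historical), max(future))


# Made-up rainfall values (mm): the mean rises while the peak falls.
historical = [0.0, 100.0, 200.0, 100.0]
future = [60.0, 120.0, 140.0, 120.0]
mean_change, max_change = percentage_change(historical, future)
print(f"mean: {mean_change:+.1f}%, maximum: {max_change:+.1f}%")
```

In this toy case the mean increases while the maximum decreases, the same combination of signs reported for the ANN projections.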
The drawback seen in the ANN results is that the peaks are not estimated with satisfactory accuracy. The authors feel that this can be improved by using rainfall causative parameters, such as temperature, air pressure, wind and humidity, as input parameters along with the rainfall. This would give ANN more data from which to learn the rainfall trend. This is the future scope of the current study.

Conclusion And Future Scope
The present environmental scenario clearly dictates the need for an estimate of how the changing climate is going to affect hydrological parameters such as rainfall. Global Climate Models are being used all over the globe for this purpose. These GCMs give climatic parameters at widely spaced grid points for historic and future time periods. The parameter values obtained need to be downscaled so that they can be made location specific. The current work suggests the use of a soft computing tool, Artificial Neural Networks, as a statistical downscaling technique to study the rainfall 30 years into the future. The study has been carried out for the city of Pune.
The rainfall values obtained from 5 GCMs (IPSL_CMA5, HADGEM2_CC, NorESM1_1, CNRM_CM5 and GFDL_CM3) for the 4 grid points surrounding Pune were used as input for each model. The output was the measured rainfall obtained from the India Meteorological Department for the years 1901 to 2016. Distribution Based Scaling (DBS), as suggested by Rana et al. (2014), was also used for comparison with ANN. Statistical parameters and error measures were used to judge the accuracy of the models, along with time series plots and scatter plots. The results indicated that ANN performs better than DBS in all aspects except the estimation of peak values.
The future rainfall estimated with the help of the trained ANN models shows an increase in mean rainfall over the Pune region of ∼2-15% and a decrease in maximum rainfall of ∼40-65%. There is scope for improvement in the estimated percentage decrease in the future maximum rainfall, as it was observed that ANN was not able to capture the peaks with the desired accuracy. For this, the authors suggest using rainfall causative parameters as input to the ANN models along with rainfall. A study on this is currently being carried out by the authors. Furthermore, analysis of the climate models showed that all 5 GCMs perform on par, but HADGEM2_CC seems to perform slightly better than the others, apart from the slight increase in error for the last 2 decades.

Flowchart explaining methodology of the present work
Scatter plot (CNRM_CM5)
Figure 10 Time Series plot (NorESM1_1)
Figure 11 Scatter plot (NorESM1_1)
Figure 12 Time Series plot (GFDL_CM3)
Figure 13 Scatter plot (GFDL_CM3)
Figure 14 RMSE trend for all GCMs used.