Bias adjustment of the Cordex database based on a nonlinear artificial neural network method to simulate summer precipitation in southeast Iran


 Precipitation is variable and random in nature. It therefore behaves differently in different places and at different times, which increases uncertainty. Because of these uncertainties, the amount of precipitation varies widely, which makes this important quantity difficult to forecast. In this study, to reduce the uncertainty and properly estimate the output of the Cordex database, we used a multilayer perceptron neural network with the Levenberg–Marquardt training function. For this purpose, seven monthly parameters extracted from the Cordex database grids were used as independent input parameters: surface temperature, temperature at 850 hPa, surface pressure, 500 hPa height, surface moisture, specific humidity at 850 hPa and sunshine duration. Monthly precipitation at six synoptic stations (Zahedan, Kerman, Bandar Abbas, Chabahar, Iranshahr and Saravan) was evaluated separately as the dependent parameter of the artificial neural network for five climate models (CanESM2, CSIRO_Mk, GFDL_ESM2M, MIROC5 and NorESM1_M) over two periods: 1) 1979 to 2005 in the historical mode; and 2) 2006 to 2018 under RCP4.5 and RCP8.5. MIROC5 and CSIRO_MK in the historical mode, and GFDL_ESM2M and CSIRO_MK in the RCP modes, were selected as the most effective models. Given the high correlation of the outputs forecast from temperature, air pressure and humidity information, the nonlinear artificial neural network method can be used to bias-correct the Cordex database precipitation data for southeast Iran.


Introduction
Precipitation is one of the most significant factors in natural resource studies. Because precipitation time series are variable and irregular, it is important to identify their predictability and the factors that affect the precipitation process. Time-series forecasting has received relatively little attention in meteorological processes. The precipitation systems of each place vary with its geographical location. The southeast of Iran has a winter precipitation regime, but part of its precipitation occurs in summer (Armesh et al. 2018; Babaeyan et al. 2018; Ramesh 2016); precipitation therefore varies within and between years according to the intensity and strength of these systems. In addition, changes in the amount of this variable have been observed as a result of global warming (Daneshvar et al. 2019). Changes in precipitation, or climate change in general, can be so devastating to human beings that their consequences rank first among the top 10 threats to humanity in the 21st century (Carter et al. 2007). An integrated study of precipitation using gridded data with an appropriate spatial distribution is therefore necessary (Bosilovich et al. 2008), and producing climate data of adequate accuracy is one of the main goals of forecasting and modeling centers.
The most important benefits of such data are accurate estimation of climate variables in areas without stations, better-founded climate research, and the study of changes and variations in climate parameters (Forsythe et al. 2015). The Intergovernmental Panel on Climate Change (IPCC) has published five assessment reports, which form the basis of such studies; currently, only the fourth and fifth reports are used. The Fourth Assessment Report uses the Special Report on Emission Scenarios (SRES). In the Fifth Assessment Report, these greenhouse gas emission scenarios were replaced by new scenarios based on radiative forcing: the main SRES scenarios B1, B2, A1 and A2, which represented, respectively, the most optimistic to the most pessimistic emissions, were replaced by the RCP2.6, RCP4.5, RCP6.0 and RCP8.5 scenarios (Kouhestani et al. 2017).
One of the main concerns of Earth scientists is predicting changes in climate and meteorological variables.
A major problem with global climate models is that they are large-scale and their data have low spatial resolution, so the data cannot be used directly and must first be downscaled (Katzfi et al. 2009). Downscaling methods fall into two groups: statistical and dynamical. To make such data easier to access and to save the time that downscaling requires, several databases and institutions, such as NCEP/NCAR, ECMWF and CORDEX, have performed the downscaling and provide researchers with data usable at smaller scales (Hamidianpour and Ghanbari 2018). The CORDEX database was developed at the regional scale to model the Earth's future climate according to international standards and to examine the consequences of climate change and methods of adapting to it. In this project, land areas are divided into 13 regions, and the output of global climate models is produced at spatial resolutions of 0.44, 0.22 and 0.11 degrees (Giorgi et al. 2009, p. 178). Downscaled Cordex data have been used in many kinds of research. Hernandez et al. (2012) successfully evaluated the temperature distribution and precipitation cycle in the western part of the African monsoon region. Lapriez et al. (2013) estimated climate data under the optimistic scenario of the Cordex project; there are significant differences in precipitation estimates among all models. Sceparovic et al. (2013) successfully estimated and evaluated, at the regional scale, the distribution and cycle of temperature and precipitation in the western part of the African monsoon region using Cordex project data with a regional (CRCM) and a global (CGCM) model. Jacob et al. (2014) studied precipitation across continental Europe using Cordex project data and reported that, where temperature indices change strongly, the pessimistic RCP8.5 pathway departs further from the optimistic RCP4.5 pathway.
However, the two pathways show little difference in estimated precipitation amounts. Dojio et al. (2014) used CORDEX-Africa data with the COSMO-CLM (CCLM) regional climate model (RCM) and four global models (MPI-ESM-LR, HadGEM2-ES, CNRM-CM5 and EC-Earth) to study the sequence of precipitation on wet and dry days and heavy precipitation, and concluded that the CCLM data seem more appropriate for studying these variables. The Cordex project is defined as a one-time, coordinated combination of high-resolution GCM/RCP/RCM/ESD simulations to estimate past and future climate conditions over the land areas of the world. Perrin et al. (2015) compared 12.5 km and 50 km resolution data from the Cordex downscaling database over continental Europe and concluded that the high-resolution values estimate climate data more accurately and deserve more attention in orographic studies. Dojio and Panitz (2015) also simulated precipitation and temperature in Africa using Cordex downscaling data with a regional (RCM) and a global (GCM) model, and concluded that the difference in temperature rise between the two models exceeds 1 °C and also occurs locally. Al-Mazrouei et al. (2015) used Cordex downscaling data to present regional precipitation and temperature estimates from multiple RegCM4 simulations suitable for the Arab League (Arabic-speaking countries). To compare dynamical and statistical models, Casanova et al. (2016) used downscaled mean and extreme precipitation data for Spain extracted from the Cordex project.
They concluded that the dynamical model data perform very well in terms of seasonal errors and spatial patterns, but that no statistically significant difference between the outputs of the two models can be reported. Dirdel et al. (2018) evaluated seven climate simulations at 0.11-degree resolution from the Cordex database for summer rainfall over durations of 3 to 24 hours in Norway. Mianabadi et al. (2016) estimated potential evapotranspiration at Mashhad station for 2021 to 2070 under the pessimistic and optimistic radiative forcing pathways using 0.44-degree data from the Cordex database. The literature shows that the accuracy of CORDEX data has been evaluated in many parts of the world; however, Iranian studies related to the CORDEX project and to the use of its outputs in the area reached by the monsoon system in Iran are limited. Existing studies have taken one of two approaches: either they made forecasts assuming the historical data were validated (Moloud et al. 201; Mianabadi et al. 2016; Zolfaghari et al. 2017), or they relied only on ranking the database models. One study evaluated the performance of eleven global models and two regional models in the CORDEX database in modeling temperature changes at Zahedan synoptic station over 1987 to 2005, to determine whether temperature in the region was decreasing or increasing; contrary to expectations, the calculations yielded a very high correlation (Khajeh Amiri Khaledi and Salari Fanoodi 2018). On the other hand, the studies of  and Salari et al. (2020) showed that precipitation data for Iran, especially in the southeastern region, are highly biased and cannot be trusted.
Given the biases in the downscaled Cordex data, and to reduce the existing uncertainties, bias-correction methods can be applied to the required data before they are used in future studies. Various linear and nonlinear methods exist for better predicting the data. Among them, the artificial neural network method is very powerful for modeling nonlinear structures: it uses paired input and output sets to learn the relationships between them, so that, once trained, it estimates the corresponding output for any new member of the input set.
Artificial neural networks are currently used not only to forecast precipitation at different time scales but also, by many researchers, to forecast extreme precipitation events. Badri and Sirmak (2000) used artificial neural networks to forecast the amount of extreme precipitation that leads to summer floods in the Moravia region (eastern Czech Republic). They trained the network by backpropagation on 38 years of monthly data from two stations in the region and then forecast the next month's precipitation as well as the next year's summer precipitation. The results showed that the networks forecast extreme precipitation well, with little difference between the actual and forecast amounts. Lakshmi et al. (2003) have networks is more appropriate and accurate. Gita and Selwarj (2011) introduced a new method using neural network techniques to improve precipitation forecasts. They used 35 years of average precipitation data and meteorological data such as wind speed, average temperature, relative humidity and the amount of particulate matter in the environment as inputs to backpropagation neural networks; the model performed well in forecasting average monthly precipitation. Chang et al. (2017) used artificial neural networks to relate sea surface temperature (SST), which has a significant effect on precipitation patterns over land, to monthly and seasonal precipitation across the United States.
The studies reviewed show that most research has used synoptic station data and teleconnection patterns to forecast climate and hydrological variables; research that bias-corrects gridded precipitation data by simulating it from non-precipitation variables with an artificial neural network is rare. The present study is therefore the first research conducted in this regard. Here, the neural network method was evaluated for estimating monthly precipitation, and the model's performance was examined to determine whether this nonlinear method can better forecast precipitation in future periods. Estimating precipitation in the southeast of Iran, an arid region with a variable and irregular precipitation pattern, using the predictive power of the best model is needed more than ever to help managers and researchers in various research and executive sectors achieve their planning goals. To this end, the outputs of five (historical) and four (RCP) climate models were modeled with an artificial neural network for six important cities in the region, and the results were examined.

Data and Methods
Two sets of data were used in this research. The first consisted of seven monthly parameters extracted from the Cordex database grids (esgfdata.dkrz.de/projects/cordex-dkrz) over a 27-year statistical period: surface temperature, temperature at 850 hPa, surface pressure, 500 hPa height, surface moisture, specific humidity at 850 hPa and sunshine duration. Using these with the artificial neural network method, the monthly precipitation of six synoptic stations (Zahedan, Kerman, Bandar Abbas, Chabahar, Iranshahr and Saravan; source: Meteorological Organization) was estimated from the historical data of five climate models: CanESM2, CSIRO_Mk, GFDL_ESM2M, MIROC5 and NorESM1_M.
The second set consisted of five monthly parameters extracted for 2005 to 2018: surface temperature, surface pressure, 500 hPa height, surface moisture and sunshine duration.
With the artificial neural network method, the monthly precipitation of the six synoptic stations (Zahedan, Kerman, Bandar Abbas, Chabahar, Iranshahr and Saravan) was then estimated from four climate models (CSIRO_Mk, GFDL_ESM2M, MIROC5 and NorESM1_M) under RCP4.5 and RCP8.5. Finally, the simulated data were compared with the observational data of the studied stations to evaluate the output accuracy of these models using the mean squared error (MSE), root-mean-square error (RMSE), mean absolute error (MAE) and correlation coefficient (R).

Study Area
The southeast of Iran was selected as the study area because, during the hot period of the year, it lies within the reach of the hot and humid Indian monsoon system. The study area is limited to the three provinces of Sistan and Baluchestan, Hormozgan and Kerman. The location of the study area and the meteorological stations are shown in Fig. 1, and the geographical characteristics of the stations are given in Table 1.

Cordex Database
CORDEX is an internationally coordinated framework for producing an improved generation of climate change projections worldwide (Eich et al. 2015, p. 7). The purpose of the CORDEX project is to model the Earth's future climate within the coordination framework of the World Climate Research Programme (WCRP) and to examine the consequences of climate change and methods of adaptation at the regional scale (Giorgi et al. 2009, p. 178). In this project, land areas are divided into 13 regions, and the output of global climate models is produced at spatial resolutions of 0.44, 0.22 and 0.11 degrees. The output of the project's dynamically downscaled models is available at 3-hourly, 6-hourly, daily, monthly and yearly time resolution. Iran lies geographically in Region 6, South Asia (CORDEX-WAS), and Region 8, North Africa and the Middle East (CORDEX-MENA), of the project. The study area is located in Region 6 (South Asia), whose data are on a 0.44-degree grid (Fig. 2).
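A 0.44-degree grid corresponds to roughly 49 km north-south, and somewhat less east-west at the latitudes of the study area. The hypothetical helpers below, not part of the authors' workflow, compute the approximate cell size on a spherical Earth and the index of the grid cell nearest to a station. They assume a regular latitude-longitude grid with a known first cell centre (`lat0`, `lon0`); if the files use a rotated-pole grid, station coordinates would have to be rotated first. The Zahedan coordinates and the grid origin in the usage lines are approximate, illustrative values (the exact station coordinates are in Table 1).

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (spherical approximation)
GRID_STEP_DEG = 0.44       # nominal CORDEX-WAS44 grid spacing


def cell_size_km(lat_deg, step_deg=GRID_STEP_DEG):
    """Approximate (north-south, east-west) size of one grid cell in km."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0   # length of one degree of arc
    north_south = step_deg * km_per_deg
    # Meridians converge toward the poles, so east-west spacing shrinks with cos(lat)
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


def nearest_cell(lat, lon, lat0, lon0, step_deg=GRID_STEP_DEG):
    """Row/column index of the cell centre closest to a station.

    Hypothetical helper: assumes a regular lat-lon grid whose first cell
    centre is (lat0, lon0), not a rotated-pole grid.
    """
    row = round((lat - lat0) / step_deg)
    col = round((lon - lon0) / step_deg)
    return row, col


if __name__ == "__main__":
    # Approximate Zahedan coordinates (illustrative only; see Table 1)
    ns, ew = cell_size_km(29.5)
    print(f"cell size near Zahedan: {ns:.1f} km N-S, {ew:.1f} km E-W")
    # Hypothetical grid origin at 25.0 N, 55.0 E
    print(nearest_cell(29.5, 60.9, lat0=25.0, lon0=55.0))
```

In practice the nearest cell would be read from the coordinate variables stored in the downloaded files rather than from an assumed origin.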

Artificial Neural Networks
Neural networks include many architectures, such as the Boolean neural model, but the multilayer perceptron (MLP) with a nonlinear sigmoid output function is the most widely used. An MLP network with a hidden layer using a sigmoid (S-shaped) transfer function and a linear transfer function in the output layer can approximate any desired phenomenon, provided that the hidden layer has a sufficient number of neurons (Abqari 2005).
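The architecture just described, a sigmoid hidden layer feeding a linear output neuron, can be written as a short forward pass. This is a minimal pure-Python sketch with hand-set, hypothetical weights and a single hidden layer; in the study the weights would come from training, and the seven inputs would be the normalized CORDEX predictors.

```python
import math


def sigmoid(z):
    """Logistic (S-shaped) transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Output of a one-hidden-layer MLP.

    x        -- input vector (e.g. seven normalized predictors)
    w_hidden -- list of weight vectors, one per hidden neuron
    b_hidden -- list of hidden-neuron biases
    w_out    -- list of output weights, one per hidden neuron
    b_out    -- output-neuron bias
    """
    # Hidden layer: weighted sum of inputs passed through the sigmoid
    hidden = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    # Output layer: linear transfer function (weighted sum, no squashing)
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    x = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7]          # hypothetical normalized inputs
    w_hidden = [[0.1] * 7, [-0.2] * 7, [0.05] * 7]   # 3 hidden neurons, toy weights
    b_hidden = [0.0, 0.1, -0.1]
    w_out, b_out = [1.5, -0.7, 0.3], 0.2
    print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```

The linear output layer lets the network produce precipitation values outside the (0, 1) range of the sigmoid; a two-hidden-layer variant would simply apply a second sigmoid layer before the output.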
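The structure of the multilayer perceptron (Fig. 3) can be described as expressed by Pierce et al. With Bayesian regularization, the network minimizes the objective

$$F = \beta E_D + \alpha E_W, \qquad E_W = \frac{1}{2}\sum_{i=1}^{W} w_i^2$$

where $E_D$ is the data error function, $E_W$ is the penalty term, $w_i$ are the network weights and biases, $W$ is the number of weight and bias parameters in the neural network, and $\alpha$ and $\beta$ are the Bayesian hyperparameters. In Bayesian theory, the purpose is to calculate the probability of the weight and bias parameters $\mathbf{w}$ of the MLP model given the dataset $A$ (Ungar et al. 1996), following MacKay (1992):

$$P(\mathbf{w} \mid A, \alpha, \beta) = \frac{P(A \mid \mathbf{w}, \beta)\, P(\mathbf{w} \mid \alpha)}{P(A \mid \alpha, \beta)}$$

The input data are normalized as

$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where $X_{\mathrm{norm}}$ is the standardized input value, and $X_{\max}$ and $X_{\min}$ are the maximum and minimum data values, respectively.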
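The two formulas behind the training setup, min-max scaling of the inputs with the minimum and maximum data values, and the Bayesian-regularized objective that weighs the data error function against the weight penalty through the hyperparameters α and β, can be sketched as follows. This assumes the common MacKay-style form F = βE_D + αE_W, with E_D and E_W taken as half sums of squares and scaling to [0, 1]; the exact constants and scaling range used by the authors are not stated in the text.

```python
def min_max_normalize(values):
    """Scale a series to [0, 1] using its minimum and maximum values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def regularized_objective(targets, outputs, params, alpha, beta):
    """Bayesian-regularized objective F = beta * E_D + alpha * E_W.

    E_D: half the sum of squared errors between targets and network outputs.
    E_W: half the sum of squared weights and biases (the penalty term).
    """
    e_d = 0.5 * sum((t - y) ** 2 for t, y in zip(targets, outputs))
    e_w = 0.5 * sum(w ** 2 for w in params)
    return beta * e_d + alpha * e_w


if __name__ == "__main__":
    temps = [12.0, 18.5, 25.0, 31.5, 38.0]       # hypothetical monthly inputs
    print(min_max_normalize(temps))               # [0.0, 0.25, 0.5, 0.75, 1.0]
    print(regularized_objective([1.0, 2.0], [1.0, 1.0], [1.0, -1.0, 2.0],
                                alpha=0.1, beta=2.0))
```

A larger α relative to β pushes the weights toward zero and yields a smoother network; in MacKay's framework both hyperparameters are re-estimated from the data during training rather than fixed by hand.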
What percentage of the data is sufficient for training in artificial neural network modeling, and whether there is a threshold below which training is no longer efficient, depend on the complexity of the problem and the quality of the data. The error rate during training also depends on the number of examples used to train the network. If the number of patterns or of hidden-layer neurons is too low, the network cannot properly learn the relationship between inputs and outputs. Conversely, if the hidden layer has more neurons than required, the network starts to memorize patterns: it performs well in the training phase but poorly on test data and cannot generalize (Halabian and Darand 2012; quoted by Dezfulian and Akbarpour Shirazi 2011).
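The abstract names Levenberg–Marquardt as the training function used to drive this training error down. Its core update solves (JᵀJ + μI)δ = Jᵀr, where J is the Jacobian of the model outputs with respect to the parameters and r the residuals; the damping μ grows when a step fails (toward short gradient-descent steps) and shrinks when a step succeeds (toward Gauss-Newton). The sketch below applies that update to a toy two-parameter exponential curve, not to a neural network, with a finite-difference Jacobian. It illustrates the algorithm only and is not the authors' implementation, which the text does not specify.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def levenberg_marquardt(model, params, xs, ys, mu=1e-2, iters=100, h=1e-7):
    """Fit params by minimizing the sum of squared residuals y - model(x, params)."""
    params = list(params)
    n = len(params)

    def sse(p):
        return sum((y - model(x, p)) ** 2 for x, y in zip(xs, ys))

    cost = sse(params)
    for _ in range(iters):
        residuals = [y - model(x, params) for x, y in zip(xs, ys)]
        # Forward-difference Jacobian: d model(x, params) / d params[k]
        jac = []
        for x in xs:
            base = model(x, params)
            row = []
            for k in range(n):
                shifted = params[:]
                shifted[k] += h
                row.append((model(x, shifted) - base) / h)
            jac.append(row)
        # Damped normal equations: (J^T J + mu I) delta = J^T r
        A = [[sum(j[a] * j[b] for j in jac) + (mu if a == b else 0.0)
              for b in range(n)] for a in range(n)]
        g = [sum(j[a] * r for j, r in zip(jac, residuals)) for a in range(n)]
        delta = solve_linear(A, g)
        trial = [p + d for p, d in zip(params, delta)]
        trial_cost = sse(trial)
        if trial_cost < cost:
            # Step accepted: trust the Gauss-Newton direction more
            params, cost, mu = trial, trial_cost, mu / 10
        else:
            # Step rejected: lean toward short gradient-descent steps
            mu *= 10
    return params, cost


if __name__ == "__main__":
    # Toy data from y = 2 * exp(0.3 x); recover a = 2, b = 0.3 from a rough start
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * math.exp(0.3 * x) for x in xs]
    fit, final_cost = levenberg_marquardt(
        lambda x, p: p[0] * math.exp(p[1] * x), [1.0, 0.1], xs, ys)
    print(fit, final_cost)
```

For a network, the parameter vector would hold every weight and bias and the model would be the MLP forward pass; the cost of forming JᵀJ is why this method suits small networks like those used here.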

Criteria for evaluating the output of neural networks
Four statistical criteria were used to evaluate the accuracy of the simulated data against the observed data: MSE, RMSE, MAE and R (Equations 5 to 8). The correlation coefficient R measures the linear relationship between two variables and is a mathematical tool widely used in climatic analysis (Singh and Borah 2013).
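$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(X_{o,i} - X_{s,i}\right)^2 \tag{5}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_{o,i} - X_{s,i}\right)^2} \tag{6}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|X_{o,i} - X_{s,i}\right| \tag{7}$$

$$R = \frac{\sum_{i=1}^{N}\left(X_{o,i} - \bar{X}_o\right)\left(X_{s,i} - \bar{X}_s\right)}{\sqrt{\sum_{i=1}^{N}\left(X_{o,i} - \bar{X}_o\right)^2 \sum_{i=1}^{N}\left(X_{s,i} - \bar{X}_s\right)^2}} \tag{8}$$

In the above equations, $X_o$ is the observational data, $X_s$ is the simulated data, $\bar{X}_o$ and $\bar{X}_s$ are the means of the observational and simulated data, respectively, and $N$ is the number of data. Finally, the error between the observational data and the modeled data was calculated and analyzed.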
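The four evaluation criteria (Equations 5 to 8) translate directly into code. This minimal sketch computes MSE, RMSE, MAE and the Pearson correlation coefficient R for paired observed and simulated series; the sample values in the usage lines are hypothetical, and R is treated as undefined when either series is constant.

```python
import math


def evaluate(obs, sim):
    """Return MSE, RMSE, MAE and Pearson R for observed vs simulated values."""
    n = len(obs)
    if n == 0 or n != len(sim):
        raise ValueError("obs and sim must be non-empty and equally long")
    errors = [o - s for o, s in zip(obs, sim)]
    mse = sum(e * e for e in errors) / n              # Eq. 5
    rmse = math.sqrt(mse)                             # Eq. 6
    mae = sum(abs(e) for e in errors) / n             # Eq. 7
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_s = sum((s - mean_s) ** 2 for s in sim)
    if ss_o == 0 or ss_s == 0:
        raise ValueError("R is undefined for a constant series")
    r = cov / math.sqrt(ss_o * ss_s)                  # Eq. 8
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R": r}


if __name__ == "__main__":
    observed = [12.0, 0.0, 3.5, 25.1, 8.2]     # hypothetical monthly precipitation (mm)
    simulated = [10.4, 1.2, 2.9, 19.8, 9.0]
    print(evaluate(observed, simulated))
```

Note that R rewards matching the pattern of variation, while MSE, RMSE and MAE penalize magnitude errors; a model can score a high R yet still miss peak values, which is why all four criteria are reported together.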

Results and Discussion
Previous results have shown a weak relationship between the precipitation data of this database and observed station data (Khosravi et al. 2020). Since other variables from this database showed a better correlation, we estimated precipitation from them using an artificial neural network, which previous studies have shown to be a good tool for forecasting and estimating nonlinear data and series.
After forecasting precipitation for the stations in southeast Iran, we compared the observational and forecast data to determine the error rate of the models; the forecasts used a multilayer perceptron network model, which is capable of complex nonlinear mapping (following Ferreira et al. 2005). Each of the indicators gave acceptable results (Tables 2 and 3). For better representation, the forecast and observed data were then plotted (Figs. 5 and 6). The difference between the observed and forecast data is acceptable, which indicates that these networks performed well in forecasting precipitation in this study. The comparative graphs of observed and forecast precipitation from the artificial neural network model in the historical period (Fig. 6) show that, although the observed and forecast values are well correlated, the model performed weakly for peak and extreme values. However, the nature of heavy precipitation in the region, especially in the monsoon season, cannot be ignored. Campbell (2005) notes that artificial neural networks are mainly used for short-term forecasts of atmospheric elements, but that their use in modeling trends in precipitation and finding correlations between climate variables has given very accurate results; for this reason, they have been widely used in climate modeling, especially in simulating climate change (Mojarad et al. 2014). Hence, accurate forecasting of individual values, especially extreme values, is not possible, but an artificial neural network can be used when only the pattern of change is required. As can be seen in Table 3 (and Fig. 7), the correlation is good. The performance of the artificial neural network structure designed in this study for estimating precipitation from gridded data is notable in comparison with other studies.
In most studies, observational data and precipitation information in various forms, or more inputs, have been used as variables (Fallah Ghahlari et al. 2008; Ildromi et al. 2012; Karimi Goghari and Eslami 2008; Kavazos 2000; Khalili et al. 2008; Maria et al. 2005). In this study, by contrast, in addition to the gridded Cordex data, whose precipitation section contains many underestimation and overestimation errors, data from upper-atmosphere levels (500 and 850 hPa) were also used.
The main difference between the present study and other research, especially domestic research, lies in the precipitation regime of southeast Iran, where, according to the statistics, part of the annual precipitation, especially in the first half of the year, falls as showers and heavy rain, which makes it harder to forecast. In the other areas studied (Tabriz, Khuzestan, Tehran, Shiraz, Mashhad, etc.), precipitation is distributed systematically and proportionally throughout the year. The results also showed that using more and newer variables improves network performance and yields more accurate modeling. In contrast, some researchers observed no significant difference between measured and estimated values when input variables were added (Ghorbani Dashtaki et al. 2009). Besides increasing the time and cost of measurements, adding inputs enlarges the network, slows training and increases network errors (Karimi Goghari and Islami 2008).

Conclusion
In this study, to better understand precipitation, it was estimated from Cordex non-precipitation parameters. Since a single model cannot be relied upon for regional studies, several models were used in order to select the best estimation model and minimize the uncertainty in the results by reducing errors. Accordingly, to estimate precipitation at monsoon-affected stations in the south of Iran, we used five climate models of the CORDEX database in the first step, and four climate models under the RCP4.5 and RCP8.5 scenarios in the second step. Comparison of the CORDEX model results showed large variations in precipitation changes between models and scenarios. In the historical mode, MIROC5 and NorESM1_M were, respectively, the most and least effective models when combined with the neural networks; under RCP4.5 and RCP8.5, GFDL_ESM2M was the most effective and NorESM1_M the least. The relatively high correlations of the artificial neural network models, significant at the 95% confidence level, show that the correct trend was captured and that a relationship between the non-precipitation parameters and the predicted precipitation was established. These designs were run in 100 configurations (1 to 10 neurons in each of the first and second layers) for each station, and the best configuration was identified.
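The 100 configurations mentioned above are every pairing of 1 to 10 neurons in the first layer with 1 to 10 in the second (10 × 10 = 100). The sketch below enumerates that grid and keeps the configuration with the lowest score. The `score` callable is a hypothetical stand-in: in practice it would train the two-hidden-layer network for one station and return a validation error such as RMSE; the text does not state which criterion was used to pick the best mode.

```python
from itertools import product


def search_architectures(score, max_neurons=10):
    """Evaluate every (first-layer, second-layer) neuron count from 1..max_neurons.

    score -- callable taking (n1, n2) and returning an error (lower is better).
    Returns the best configuration, its score and the number of configurations tried.
    """
    configs = list(product(range(1, max_neurons + 1), repeat=2))
    scores = {cfg: score(*cfg) for cfg in configs}
    best = min(scores, key=scores.get)
    return best, scores[best], len(configs)


def toy_error(n1, n2):
    """Hypothetical error surface standing in for 'train the MLP, return validation RMSE'."""
    return (n1 - 4) ** 2 + (n2 - 7) ** 2 + 0.5


if __name__ == "__main__":
    print(search_architectures(toy_error))   # ((4, 7), 0.5, 100)
```

Scoring on held-out data rather than on the training set matters here: as noted in the methods, an oversized hidden layer can memorize training patterns and would otherwise look best.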
In general, because the precipitation outputs of the CORDEX database models correlate weakly with observed precipitation, these data are practically unusable directly for southeast Iran. As a novel approach, the authors therefore predicted precipitation indirectly from CORDEX non-precipitation data with an acceptable correlation; the results show higher correlation and lower error, which demonstrates the predictive power of the neural network.

Availability of data and materials:
We used two sets of data in this research. The first consists of seven monthly parameters extracted from the Cordex database grids (esgfdata.dkrz.de/projects/cordex-dkrz) over a 27-year statistical period: surface temperature, temperature at 850 hPa, surface pressure, 500 hPa height, surface moisture, specific humidity at 850 hPa and sunshine duration. These data are free and available to all researchers.
The second set consists of station data obtained from the Meteorological Organization of the Islamic Republic of Iran (https://www.irimo.ir/eng/wd/720-Products-Services.html).

Funding:
Not Applicable

Conflict of interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 1 The geographical location of the study area and the selected stations. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors.
Figure 2 CORDEX-WAS44 area with total coverage of Iran (www.cordex.org). This map has been provided by the authors.
Figure 3 Schematic diagram of multilayer perceptron neural network structure for Zahedan station
Figure 5 Sensitivity analysis of input parameters of multilayer perceptron neural network for Zahedan station
Figure 6 Comparison of observed and forecasted precipitation by the artificial neural network model in historical period
Comparison of observed and forecasted precipitation by the artificial neural network model in RCP4.5 and RCP8.5 scenarios