Performance of gridded precipitation products in the Black Sea region for hydrological studies

Gridded precipitation products are becoming good alternative data sources for regions with limited weather gauging stations. In this study, four gridded precipitation products were utilized, namely Climate Forecast System Reanalysis (CFSR), European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim/Land, the Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), and Multi-Source Weighted-Ensemble Precipitation (MSWEP). The key novelty of this study is that it fills a gap for an important part of the transcontinental region of Eurasia: Rize province in the Black Sea region was selected as the study area because it has complex topography and climatology and only a limited number of gauging stations. The precipitation products were first assessed against the observed precipitation data and then used to force a hydrological model (SWAT) to evaluate the basin response to each product. Three methods were considered: (i) spatial comparison, (ii) hydrological evaluation, and (iii) statistical evaluation. Along with precipitation forcing, the SWAT model simulations were analyzed in conjunction with streamflow observations. Overall, the percentage bias of the mean monthly precipitation of ERA-Interim/Land, CFSR, APHRODITE, and MSWEP is 19.9%, 33.4%, 41.4%, and 85.0%, respectively. For the flow simulations, CFSR and MSWEP produced exaggerated peak flows in the high-flow season due to overestimated precipitation forcing (Nash-Sutcliffe efficiency [NS] equal to 0.22 and −0.73, respectively). In contrast, APHRODITE underestimated the peak flows due to lower precipitation estimates (NS = 0.38). ERA-Interim/Land showed good agreement with the observed flows (NS = 0.53). From these results, we conclude that ERA-Interim/Land performed better against the observed precipitation, whereas CFSR showed the worst performance. The study suggests that gridded precipitation products could supplement observed precipitation data where observations are scarce in mountainous regions.


Introduction
High-quality precipitation data are essential in all climatological and hydrological studies. Unless a sufficient number of gauging stations is available, it is difficult to assess the spatial and temporal variability of precipitation for a basin (Sharma et al. 2013). Insufficient density of observation stations is a major problem in many parts of the world, and poor data coverage makes the estimation of all hydrologic variables difficult. The World Meteorological Organization (WMO) recommends that the distance between surface precipitation gauging stations not exceed 500 km, and that the distance between stations measuring temperature, humidity, and wind at upper levels not exceed 1000 km (WMO 1956).
In recent years, sophisticated high-resolution satellite-based reanalysis data and merged precipitation products have been developed to help scientists and researchers in areas with sparse rain gauges. Such data could be an essential asset for researchers who work with basins having a scarce coverage of precipitation stations (Bitew et al. 2011). It is important to emphasize that not all precipitation products are climate reanalysis data; some are merged observed and satellite precipitation data compiled from a number of meteorological organizations. Some of the well-known publicly available gridded precipitation data include the National Centre for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR) (NCEP 2015), the Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) (Yatagai et al. 2012), Multi-Source Weighted-Ensemble Precipitation (MSWEP) (Beck et al. 2017), European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim land precipitation data (Balsamo et al. 2012; Balsamo et al. 2015), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) (Ashouri et al. 2014), and Integrated Multi-satellite Retrievals for GPM (IMERG) (NASA 2019), which are examples of precipitation data produced by the merger of different precipitation products. In this study, we used the first four precipitation products, all of which are based on state-of-the-art algorithms and are freely available. In addition, these products have relatively high spatio-temporal resolutions for the study area at hand.
Reanalysis data and merged precipitation products have been increasingly used in recent hydro-climatological studies. For example, Renner et al. (2009) presented the outcomes of flow prediction for the Rhine River in Europe using rainfall-runoff models with probabilistic weather forecasting. They reiterated the importance of downscaling climate forecast products before using them to force rainfall-runoff models in flow forecasting. Verkade et al. (2013), in turn, advised removing the bias from forcing ensemble weather products, since a significant bias cascades into the flow forecasts. Richard (2008) compared two reanalysis datasets (ECMWF-ERA40 and NCEP-NDRa2) globally, and detected significant disagreements between the two products in regions of the world with higher topography. Feidas et al. (2009) and Feidas (2010) validated a number of tropical rainfall measuring mission (TRMM) precipitation products for the Mediterranean region. Inconsistency of the CFSR precipitation products was reported after a comparative study of model and satellite-derived long-term precipitation data for Mediterranean basins (Aznar et al. 2010). Moreover, Manzato et al. (2015) compared ECMWF-ERA reanalysis data with data from 104 gauging stations in Italy. They found that ECMWF significantly underestimates precipitation and that the bias and root mean square error (RMSE) results were not acceptable. Hussain et al. (2017) evaluated two gridded precipitation datasets (TMPA and APHRODITE) in a mountainous Himalayan basin. They found that the largest error in the gridded data arose mainly from elevation, and that the TMPA dataset showed poor correlation with ground observations, especially at higher altitudes.
In the western hemisphere, Ward et al. (2011) analyzed spatially averaged precipitation products (ERA-interim land, NCEP-R1 hindcast, PERSIANN, and TRMM reanalysis products) and compared with average precipitation observations in the Ecuadorian Andes and in Chilean Patagonia. They concluded that the observations were consistent over the two study regions which made extrapolation to other mountainous regions possible. In a number of basins on the North-American continent, reanalysis weather data (CFSR and ERA-Interim land) were used to force hydrological models. They found that the reanalysis data could successfully compensate the deficiency of surface observation records (Essou and Sabarly 2016;Essou et al. 2017a, b). In the case of hydrological modeling for regions with high spatial variability of precipitation such as mountainous regions, these studies also pointed out that reanalysis data have good performance when the density of the observation station network is low.
Moving to the east side of the globe, Kamiguchi et al. (2010) constructed historical daily precipitation data with high resolution (0.05° × 0.05°) over the Japanese land area as part of the APHRODITE project, which were used for validation of basin models. After examining long-term extreme precipitation trends in Japan, they found that a change in gauge density does not affect the trend of total precipitation, but does affect the trend of extreme values. In a study by Zhu et al. (2017), the performance of two reanalysis datasets (the twentieth century reanalysis, 20CR, and ERA-Interim land) was evaluated in the context of reproducing persistent weather extremes in China. They showed that the two datasets capture the intensity indices better than the frequency indices of the weather. They also found that the ERA-Interim land reanalysis data are able to depict the relationship between persistent precipitation extremes and local persistent temperature extremes. Turning to Africa, Worqlul et al. (2014) used three weather products, namely TRMM product 3B42, multi-sensor precipitation estimate-geostationary (MPEG), and CFSR, in the northern highland basin of Ethiopia to assess the capacity of these data to supplement ground observational stations, and stated that both MPEG and CFSR provided good precipitation estimates. Among other studies in Africa, Seyoum et al. (2013) studied flow forecasting in poorly gauged, flood-prone sub-basins of the Blue Nile to test the performance of four quantitative precipitation forecasts (QPFs). They concluded that freely available atmospheric forecasting products could provide additional information on precipitation and peak flow events in areas where precipitation data are not available. In a mountainous basin in Ethiopia, four climate products (the CPC MORPHing technique (CMORPH), TRMM, TMPA-3B42, and PERSIANN) were employed by Bitew et al. (2011) to assess their use in hydrological modeling. Although a significant bias was observed in the reanalysis and merged precipitation products, they found that such data improved the calibration of the hydrological model after bias correction.
All the aforementioned studies used reanalysis and gridded satellite precipitation estimates together with surface precipitation observations. Knowledge of the validity of gridded precipitation products in any part of the globe is of great scientific importance. Therefore, an investigation is critical to fill the gap in some parts of Eurasia such as the Eastern Black Sea region, which is known for its sparsely gauged, highly elevated basins. Moreover, this region has been frequently exposed to floods during the rainy season, so effective hydrological modeling and assessments are crucial to mitigate flood damages. For this purpose, we selected Rize province in the Eastern Black Sea region, where only 5 stations with at least 20 years of uninterrupted observations are present. In the province, there is one gauging station for each 606 km² of area, a density lower than the WMO's standard requirement (WMO 1956). Therefore, it is necessary to investigate the feasibility of using gridded precipitation products to supplement the surface observations used in hydrological studies. For this, we selected commonly used gridded precipitation products (CFSR, MSWEP, APHRODITE, and ERA Interim land). Hence, the objective of this study is to examine the performance of these four gridded reanalysis and merged precipitation products in estimating the spatio-temporal distribution of precipitation in the Rize province of the Black Sea region, Turkey. To achieve this goal, the products were compared with surface precipitation observations at each station where daily precipitation data were available. The performance of these products in estimating river flow was tested using a calibrated physically based hydrological model, namely the soil and water assessment tool (SWAT). The annual and seasonal performance of the climate reanalysis products was then assessed before testing the quality of prediction with various statistical metrics. The study results are presented in three categories: spatial comparison, hydrological evaluation, and statistical evaluation.

Study area
The Rize province is located in the northeastern Anatolian mainland of Turkey, in the so-called Eastern Black Sea region, and covers an area of 3920 km². The average annual precipitation in the region is nearly 2250 mm, and the total annual runoff is about 2745 million m³ (Sen and Kahya 2017). The Black Sea borders Rize province on its northern side. It is a mountainous region with elevations reaching up to 3000 m. The borders of the study area are delineated in Fig. 1. The area modeled with SWAT (Ikizdere basin), with its subbasins and flow gauging stations, is depicted in panel (a). In addition, the land use/land cover (LULC) and soil maps of the study area are plotted in panels (b) and (c). The upstream area is dominated by barren (BARR) and rangeland (RNGE) land use types, whereas forest (FRST) land use and the Sikeston soil type are dominant in the downstream areas. The climate of Rize province is in the class A (very wet) category based on the Thornthwaite climate classification technique (Sensoy 2008). In this technique, climates are subdivided into 9 subcategories ranging from A (very humid) to E (dry) based on their Thornthwaite climate index values. At the same time, the province is part of the homogeneous streamflow region 7 (as a result of principal component analysis) of Kahya et al. (2008a). In the same context, Rize province is covered by the homogeneous streamflow regions B, C, or E (as a result of cluster analysis) defined by Kahya et al. (2008b), depending on the month, demonstrating a distinctive pattern of hydrologic characteristics.

Data sources
The observational weather and streamflow data supplied by the Turkish Meteorological Directorate (MGM) comprise daily total precipitation (mm) and long-term average daily precipitation (mm) recorded at gauging stations in the Rize province (TÜMAS 2015). In addition, monthly average precipitation and daily flow data in the basin were supplied by the Turkish General Directorate of State Hydraulic Works (Devlet Su Isleri, DSI) (DSI 2015). Detailed information concerning the input data and the SWAT model setup for the study area is given in Swalih and Kahya (2021). All the input data of SWAT were first prepared using ArcGIS 10.1, and the SWAT model was then set up using its ArcGIS version (ArcSWAT). The model was run and calibrated using observed precipitation and flow data of the Ikizdere basin. The running period was taken as 1979 to 1996 since some of the observed precipitation series only extended to 1996. Consequently, the period from 1979 to 1990 was used for the calibration phase and the period from 1990 to 1996 for the validation phase.
The satellite precipitation estimates and the observed gauged precipitation data have different spatial and temporal scales. The ground gauging network consists of 5 daily precipitation stations and 14 daily streamflow gauging stations, which are not uniformly distributed across Rize province (Fig. 1). In our study, 11, 16, 12, and 14 grid points were used for the CFSR, ERA-Interim land, APHRODITE, and MSWEP reanalysis and merged satellite precipitation products, respectively. These grid points were given for our study area by default by the respective dataset providers (grid size resolution). To ensure uniformity, we chose a 0.5° × 0.5° horizontal resolution across the precipitation datasets since not all of these datasets have a higher resolution. Among the ECMWF products, we used ERA Interim land, which is interchangeably termed ECMWF throughout the manuscript. The grid meshes constructed for the gridded precipitation products are presented in Fig. 2, which illustrates the grid schemes of the four precipitation products over the study domain. The observed flow data at the outlet of the Ikizdere basin were employed to check the simulation accuracy obtained with each product set. Thus, one observed and four reanalysis and merged satellite precipitation product sets were used in our analysis.
The climate forecast system reanalysis (CFSR) is a global, high-resolution, coupled atmosphere-ocean-land surface-sea ice system designed to give the best reanalysis estimate of these coupled domains over the 32-year period of record from January 1979 to March 2014 (Saha et al. 2010). The CFSR dataset production was executed in such a way that each atmospheric and ocean analysis was repeated every 6 h (0000, 0600, 1200, and 1800 UTC) using a 9-h guess forecast with 30-min coupling to the ocean. The land analysis using observed precipitation with the Noah land model was made only at 0000 UTC. Finally, a coupled 5-day forecast from every 0000 UTC was delivered with an identical horizontal resolution version of the atmosphere as a check-up mechanism. The CFSR weather data were accessed from the official site of global SWAT weather data with a spatial resolution of 0.5° × 0.5° and a daily time step (https://globalweather.tamu.edu/).
The Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources project (APHRODITE) was an international cooperative project that generated a 57-year daily precipitation dataset for the Asian subcontinent by first collecting and analyzing rain gauge observations from thousands of stations in Asia in addition to the gauges reporting to the WMO Global Telecommunications System (Yatagai et al. 2012). The gridded fields of daily precipitation are defined by interpolating gauge observations obtained from meteorological stations. The main goal of this project was to produce a quantitative high-resolution product for the validation of high-resolution models and for studies on hydrological applications. In the production of the APHRODITE dataset, gridded analyses of global daily and monthly precipitation have been interpolated on a 0.05° latitude-longitude grid by merging several kinds of information sources with different characteristics, including gauge observations and estimates inferred from a variety of satellite observations (Xie and Arkin 1997; Yatagai et al. 2012). APHRODITE used the WorldClim technique for its data generation to obtain data throughout the world. The steps used in producing the daily global climatology include (i) summing daily data into monthly values, (ii) gathering the monthly data and averaging the values if the station has recorded data for more than 5 years, (iii) preparing the climatology at 0.05°, (iv) taking the ratio of the station climatology (step ii) to the climatology at 0.05° (step iii) for each month, (v) interpolating the ratio from step iv using the Spheremap technique at a resolution of 0.05°, (vi) multiplying the interpolated values of step v by the world climatology (step iii), and (vii) calculating the daily climatology by applying the Fourier transform technique to the values obtained in step vi. The precipitation data were downloaded with a daily temporal resolution and a 0.5° × 0.5° spatial resolution from the official data site of the APHRODITE project (http://www.chikyu.ac.jp/precip/english/products.html).
Fig. 1 The study area and flow observation stations (a), landuse types (b), and soil types (c) for the Rize province
The ERA-Interim/Land weather product of the numerical weather prediction system at the European Centre for Medium-Range Weather Forecasts (ECMWF) is designed to use data assimilation systems to "reanalyse" archived observations, creating global data sets describing the recent history of the atmosphere, land surface, and oceans. The quality of the product is controlled with ground and remote-sensing observations. ERA-Interim/Land is a weather reanalysis product with a horizontal resolution of 80 km and 3-hourly surface parameters. It is the result of a single 32-year (Jan 1979 to Dec 2010) simulation with the latest ECMWF land surface model driven by meteorological forcing from the ERA-Interim atmospheric reanalysis and precipitation adjustments based on monthly GPCP v2.1 (Global Precipitation Climatology Project). The monthly GPCP data set merges satellite and rain gauge data from a number of satellite sources including the global precipitation index. In addition, rain gauge data from the combination of the Global Historical Climate Network (GHCN), Climate Anomaly Monitoring System (CAMS), and Global Precipitation Climatology Centre (GPCC) data set (consisting of approximately 6700 quality-controlled stations around the globe interpolated into monthly area averages) are used over land. The technique used was a standalone land simulation for both global and point scales, given the complexity involved in coupled land-atmosphere assimilation systems. It is stated to be one of the most accurate meteorological forcings to drive land surface numerical schemes (Balsamo et al. 2012; Balsamo et al. 2015). The ERA Interim land precipitation data were downloaded with a daily temporal resolution and a 0.7° × 0.7° spatial resolution from the official website (http://apps.ecmwf.int/datasets/data/interim-land/type=fc/).
Finally, the Multi-Source Weighted-Ensemble Precipitation (MSWEP) is a new global precipitation dataset (1979-2016) with a high 3-hourly temporal and 0.1° spatial resolution. The dataset is unique in that it takes advantage of a wide range of data sources, including gauges, satellites, and atmospheric reanalysis models, to obtain the best possible precipitation estimates at the global scale. The long-term mean of MSWEP was based on the recently released Climate Hazards Group Precipitation Climatology (CHPclim version 1.0) dataset with 0.05° resolution (Beck et al. 2017), a global precipitation climatology based on gauge observations and satellite data. The procedure used in the dataset preparation includes the following steps: (i) calculating the long-term bias-corrected climatic mean; (ii) evaluating several gridded satellite and reanalysis precipitation datasets in terms of temporal variability in order to assess their potential inclusion in MSWEP; and finally, (iii) downscaling the long-term climatic mean temporally in a stepwise manner, first to monthly, then to daily, and finally to 3-hourly timescales, using weighted averages of precipitation anomalies derived from the gauge, satellite, and reanalysis datasets to yield the final MSWEP dataset. For accuracy, basin-ratio equations and streamflow values were used to correct the bias in the datasets. It has been reported that the MSWEP products outperformed other precipitation datasets, such as CMORPH-CRT, PERSIANN-CCS, and PERSIANN-CDR (Beck et al. 2017). The MSWEP precipitation dataset was downloaded with a daily temporal resolution and a 0.5° × 0.5° spatial resolution from the official website (http://www.gloh2o.org/).
The long-term monthly average precipitation scatter plots for the four weather products versus the observed precipitation in the basin are depicted in Fig. 3. The analysis was performed for the period 1979-1996. The precipitation regime of the area has been presented earlier in Fig. 1. The coastal region typically has a high precipitation compared with the mainland (mountainous inland) region. This phenomenon has been previously documented by Sensoy et al. (2008) and Sen (2013). The high peaks of mountains in the eastern Black Sea region are predominantly covered with snow during most of the time (especially in winter). Table 1 summarizes the datasets used in the study.

Methodology
The observational precipitation data were used to assess the performance of four state-of-the-art reanalysis and merged satellite-based precipitation products. We used a three-step evaluation in this study: (i) spatial comparison, (ii) hydrological assessment, and (iii) statistical evaluation. Annual average precipitation values were calculated as the arithmetic (equal-weight) average over the individual grid points. Moreover, a calibrated and validated physically based hydrological model (SWAT) of the study area was employed to assess the quality of these climate products.
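The equal-weight averaging of annual precipitation across grid points can be illustrated with a minimal Python sketch. The function names, the data layout (one list of (year, daily precipitation) pairs per grid point), and the sample values are assumptions made for illustration only; they are not taken from the study.

```python
from collections import defaultdict


def annual_totals(daily):
    """Sum daily precipitation (mm) into annual totals for one grid point.

    daily: iterable of (year, precipitation_mm) pairs (hypothetical layout).
    """
    totals = defaultdict(float)
    for year, precip in daily:
        totals[year] += precip
    return dict(totals)


def areal_annual_mean(grid_series):
    """Arithmetic (equal-weight) mean of annual totals across grid points.

    Only years present at every grid point are averaged.
    """
    per_point = [annual_totals(series) for series in grid_series.values()]
    common_years = set.intersection(*(set(t) for t in per_point))
    return {
        year: sum(t[year] for t in per_point) / len(per_point)
        for year in sorted(common_years)
    }


# Illustrative (made-up) values for two grid points
example = {
    "g1": [(1979, 10.0), (1979, 5.0), (1980, 8.0)],
    "g2": [(1979, 20.0), (1980, 4.0)],
}
result = areal_annual_mean(example)
```

Equal weighting treats every grid point as representing the same share of the basin; an area-weighted mean would be the natural alternative if grid cells overlapped the basin unequally.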
In general, process-based models focus to formulate the entire physical process from precipitation to flow in the hydrologic cycle by balancing the amount of water on daily, monthly, and seasonal time scales. The input data are usually temperature, humidity, soil moisture, soil texture, precipitation, evapotranspiration, lateral flow, and percolation rate. A major drawback of these models is the requirement of large number of parameters. This condition and other difficulties are limiting the use of the physical based models to a very small number of river basins (Demirel et al. 2009). One of the most popular process-based models is SWAT, which is used to examine the impacts of land use changes on the runoff and groundwater, production of sediment, and water quality; for example, flow in the tributaries or agricultural issues. Readers are referred to Arnold et al. (1996) and Arnold et al. (2012) for further information concerning the details of SWAT model. We set up the SWAT model for Ikizdere basin with a total area of 731.4 km 2 (located in Rize province). The SWAT model was calibrated using the observed rainfall and flow datasets. We have not used the reanalysis rainfall datasets in the model calibration. We only used the reanalysis rainfall datasets to observe the impact of the reanalysis datasets on the hydrology of the study area. Then, the model was run with all the four reanalysis precipitation data in order to be compared with the observed river flow records at the outlet of the basin. The general scheme used in this study is depicted in Fig. 4.

Spatial comparison
We applied the Kriging geostatistical technique to interpolate point precipitation data over the study domain. Kriging belongs to the geostatistical interpolation methods, which are based on statistical models that include autocorrelation, that is, the statistical relationships among measured points (Oliver 1990). Kriging is usually preferred because it not only predicts values but also provides a measure of the accuracy of the predicted values, unlike deterministic interpolation techniques such as inverse distance weighting (IDW), which estimate the unknown value based only on the distance to neighboring points. The Kriging calculation is based on the weighted sum of the data as follows:

$$P(Z_o) = \sum_{i=1}^{N} \mu_i \, P(Z_i) \tag{1}$$

where $P(Z_i)$ is the measured value at the ith point, $\mu_i$ is the weight for the measured value at the ith point, $P(Z_o)$ is the value at the prediction location, and $N$ is the number of measured values. The weight $\mu_i$ depends not only on the distance of the prediction point to the measured points, but also on the overall arrangement of the measured points in the study area (Oliver 1990). Among the spatial interpolation methods available in the ArcGIS interface (e.g., Kriging, IDW, and Spline), we used ordinary Kriging to create interpolated surfaces over the study area. The statistical results presented in our work are based on the grid values, not on the Kriging interpolation surfaces.
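To make the weighted-sum idea concrete, the following is a minimal pure-Python sketch of ordinary Kriging. It is not the ArcGIS implementation used in the study: the exponential semivariogram, its sill and range, and the station coordinates and values are all assumptions chosen for illustration. The weights are obtained by solving the ordinary Kriging system, which forces them to sum to one.

```python
import math


def variogram(h, sill=1.0, rng=50.0):
    """Exponential semivariogram (assumed model; sill and range are illustrative)."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging(points, values, target):
    """Return (estimate, weights) at target as a weighted sum of measured values."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Kriging system: semivariances between stations plus the unbiasedness row/column
    a = [[variogram(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights


# Hypothetical station coordinates (km) and annual precipitation (mm)
stations = [(0.0, 0.0), (30.0, 5.0), (10.0, 40.0), (45.0, 35.0)]
precip = [2250.0, 1900.0, 1400.0, 1100.0]
estimate, weights = ordinary_kriging(stations, precip, (20.0, 20.0))
```

Because the system depends on the distances among all stations as well as on the distances to the target, the weights reflect the overall arrangement of the measured points and not just their proximity to the prediction location.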

Statistical evaluation
The statistical estimates were calculated using the daily time series data for each grid station. In this section, we describe the statistical evaluation of the gridded climate products and of the SWAT streamflow simulations (with gridded precipitation forcing). We prepared histogram plots of the daily time series, the probability distribution function (PDF), and the cumulative distribution function (CDF). To quantify the performance of simulated flow compared with the observed flow values, Moriasi et al. (2007) recommended applying four quantitative statistics, namely (i) Nash-Sutcliffe efficiency (NS), (ii) the ratio of the root mean square error to the standard deviation of measured data (RSR), (iii) percent bias (PBIAS), and (iv) the coefficient of determination (R²). NS is used to assess how well the satellite data fit the observational data, indicating the closeness of the value of interest to the observation (Eq. 2). RSR standardizes the root mean square error (RMSE) statistic by the standard deviation of the observations. RMSE measures the difference between the ground precipitation observations and the satellite precipitation estimates. The lower the RMSE score, the more closely the satellite precipitation estimates represent the ground precipitation observations. RSR is calculated as the ratio of the RMSE to the standard deviation of measured data (Eq. 3). PBIAS measures the average tendency of the simulated data to be larger or smaller than their observed counterparts (Moriasi et al. 2007) (Eq. 4). Initially, the bias for the inland area and the coastal area was calculated separately to incorporate the regional impact. Then the average of these values was taken as the overall dataset bias. Our final quantitative statistic, R², is used to evaluate the goodness of fit of the relation. R², implying the degree of linear association between the two variables, addresses the question of how well the gridded precipitation estimates correspond to the ground precipitation observations (Eq. 5).
In the decision stage, the guidelines of Moriasi et al. (2007) can serve as practical evaluation criteria. They stated that model simulations could be judged as satisfactory if NS > 0.50 and RSR < 0.70, and if PBIAS is within ±25% for flow analysis. The statistics are defined as:

$$NS = 1 - \frac{\sum_{i=1}^{n} \left( Y_i^{obs} - Y_i^{n} \right)^2}{\sum_{i=1}^{n} \left( Y_i^{obs} - Y^{mean} \right)^2} \tag{2}$$

$$RSR = \frac{RMSE}{STDEV} = \frac{\sqrt{\sum_{i=1}^{n} \left( Y_i^{obs} - Y_i^{n} \right)^2}}{\sqrt{\sum_{i=1}^{n} \left( Y_i^{obs} - Y^{mean} \right)^2}} \tag{3}$$

$$PBIAS = \frac{\sum_{i=1}^{n} \left( Y_i^{obs} - Y_i^{n} \right) \times 100}{\sum_{i=1}^{n} Y_i^{obs}} \tag{4}$$

$$R^2 = \frac{\left[ \sum_{i=1}^{n} \left( G_i - \bar{G} \right) \left( S_i - \bar{S} \right) \right]^2}{\sum_{i=1}^{n} \left( G_i - \bar{G} \right)^2 \sum_{i=1}^{n} \left( S_i - \bar{S} \right)^2} \tag{5}$$

where $Y_i^{obs}$ is the ith observation; $Y_i^{n}$ is the ith precipitation/simulated flow value; $Y^{mean}$ is the mean of the observed data; STDEV is the standard deviation of the observed data; R² is the coefficient of determination; $G_i$ is the ground observation and $\bar{G}$ its mean; $S_i$ is the precipitation estimate/simulated flow and $\bar{S}$ its mean; and $n$ is the total number of data pairs or the total number of observations.
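The four statistics can be computed with a few lines of Python. The sketch below assumes the standard forms given by Moriasi et al. (2007), including their sign convention for PBIAS (observed minus simulated, so positive values indicate underestimation); the function names and sample values are illustrative and not part of the study.

```python
import math


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def rsr(obs, sim):
    """Ratio of RMSE to the standard deviation of the observations."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return math.sqrt(residual) / math.sqrt(spread)


def pbias(obs, sim):
    """Percent bias; assumed convention: positive means the estimate is too low."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def r_squared(ground, est):
    """Coefficient of determination as the squared Pearson correlation."""
    g_mean = sum(ground) / len(ground)
    s_mean = sum(est) / len(est)
    cov = sum((g - g_mean) * (s - s_mean) for g, s in zip(ground, est))
    g_var = sum((g - g_mean) ** 2 for g in ground)
    s_var = sum((s - s_mean) ** 2 for s in est)
    return cov ** 2 / (g_var * s_var)


def is_satisfactory(obs, sim):
    """Flow criteria quoted in the text: NS > 0.50, RSR < 0.70, |PBIAS| < 25%."""
    return (nash_sutcliffe(obs, sim) > 0.50
            and rsr(obs, sim) < 0.70
            and abs(pbias(obs, sim)) < 25.0)


# Made-up monthly flows (m3/s) for illustration
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
scores = (nash_sutcliffe(observed, simulated), rsr(observed, simulated),
          pbias(observed, simulated), r_squared(observed, simulated))
```

The example deliberately doubles every observed value: R² stays at 1 because the linear association is perfect, while NS, RSR, and PBIAS all flag the large overestimation, which is why the statistics are used together rather than individually.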
In addition, we used two statistical error metrics that are widely used by climatologists to evaluate precipitation datasets, namely the probability of detection (POD) and the false alarm rate (FAR) (Wilks 1995). These metrics describe the ability of the gridded precipitation dataset to detect the occurrence of rain and no-rain events without considering the amount of rainfall in these events. Let a be the number of events in which precipitation was both estimated by the gridded product and observed (hits), b the number estimated but not observed (false alarms), c the number observed but not estimated (misses), and d the number neither estimated nor observed (correct negatives). Then,

$$POD = \frac{a}{a + c} \tag{6}$$

$$FAR = \frac{b}{a + b} \tag{7}$$

The POD is the fraction of observed precipitation events that were also detected by the gridded product (Eq. 6). It is the likelihood that the event would be forecasted provided that it was observed. The best value of POD for a gridded precipitation product is one, and its worst value is zero. On the other hand, the FAR is the proportion of the gridded precipitation events that were not actually observed (Eq. 7). A FAR of zero is preferred for the best gridded dataset, since higher values indicate high uncertainty in the gridded precipitation dataset.
Fig. 4 Flow chart of the input-output setup adopted in the study
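A short Python sketch shows how POD and FAR follow from counting rain and no-rain days. It assumes the usual 2×2 contingency convention (a hits, b false alarms, c misses, d correct negatives) and a hypothetical wet-day threshold of 1 mm/day; neither the threshold nor the sample series comes from the study.

```python
def contingency_counts(observed, gridded, threshold=1.0):
    """Count hits (a), false alarms (b), misses (c), and correct negatives (d).

    A day counts as "rain" when precipitation >= threshold (assumed 1 mm/day).
    """
    a = b = c = d = 0
    for obs, est in zip(observed, gridded):
        obs_rain = obs >= threshold
        est_rain = est >= threshold
        if est_rain and obs_rain:
            a += 1
        elif est_rain and not obs_rain:
            b += 1
        elif obs_rain and not est_rain:
            c += 1
        else:
            d += 1
    return a, b, c, d


def pod(a, c):
    """Probability of detection: share of observed rain days that were detected."""
    return a / (a + c)


def far(a, b):
    """False alarm measure: share of estimated rain days that were not observed."""
    return b / (a + b)


# Made-up daily precipitation series (mm)
gauge = [0.0, 5.0, 2.0, 0.0, 8.0, 0.0]
product = [0.0, 3.0, 0.0, 4.0, 6.0, 0.0]
a, b, c, d = contingency_counts(gauge, product)
pod_value, far_value = pod(a, c), far(a, b)
```

Both scores ignore rainfall amounts, so they complement rather than replace the magnitude-based statistics such as PBIAS and RMSE.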

Hydrological evaluation
In this study, the semi-distributed hydrological model SWAT was used to model the Ikizdere basin of Rize province. The model was then used to assess impact of these various precipitation datasets on the hydrology of the basin. The weather data were used from available Turkish State Meteorological Service (MGM) stations in the SWAT model build up and calibration. ArcSWAT 2012 tool was used in this study to delineate the DEM and soil and landuse maps (which were provided by the Istanbul Technical University, Civil Engineering Department) and set up the SWAT model for the basin (refer Annex Tables 5 and 6 for landuse and soil types). Once the hydrological response unit (HRU) and subbasin analysis is done, the ArcSWAT interface then calls the weather data of the basin to finalize the SWAT model setup. For model optimizations, the DSI observed flow data of Camlıkdere station of Ikizdere basin was implemented for our model calibration and validation (Swalih and Kahya 2021).
The SWAT model setup for our study area was simulated for the period 1976-1996 (21 years) based on the available precipitation data in the basin. For the simulations and calibration phases, we employed observed precipitation data as it is only dominant climate factor, which has direct and significant impacts on the basin flow. In addition, we could not find alternative weather data with required level of quality in the study area. The mean monthly average values of temperature, wind speed, solar radiation, and humidity were generated from the climate forecast system reanalysis data (downloaded from the US National Centers for Environmental Prediction (NCEP -CFSR)). The simulations were run on a monthly time step, and a warm-up period of 3 years was provided to give time for the model to adjust itself to the data and the basin. The output of the model in the warm-up period was not used for our analysis. After successfully setting up the SWAT model for the basin, it was calibrated and validated using historical flow data of the basin.
The Parasol routine of the SWAT-CUP auto-calibration tool was used to calibrate the model (Abbaspour 2015). The first three years (1976-1978) were used for model warm-up, the next 12 years (1979-1990) for model calibration, and the final 6 years (1991-1996) for model validation. The sensitivity analysis method implemented in SWAT is called the Latin Hypercube One-factor-At-a-Time (LH-OAT) design as proposed by Morris (1991). Details on the LH-OAT technique can be found in Van Griensven et al. (2006). Based on recommendations from the literature and the result of the sensitivity analysis, the top 26 sensitive model parameters were chosen for model calibration (Table 2). Each of these parameters represents a physical process affecting the flow (Neitsch et al. 2011). Since it is not possible to determine all these parameters directly, they must be adjusted by tuning their default values until the most appropriate model output (flow in this case) is achieved without violating the threshold boundary values. The most sensitive parameters are essential in calibration since they affect the model output the most.
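The LH-OAT idea can be illustrated with a minimal sketch. This is a simplified variant written for illustration, not the routine implemented in SWAT: it perturbs each parameter by a fraction of its range (an assumption made here), and `toy_model`, the bounds, and the default settings are hypothetical stand-ins for a real SWAT run.

```python
import random


def latin_hypercube(n_points, bounds, rng):
    """Stratified sampling: each parameter range is cut into n_points equal
    intervals and every interval is sampled exactly once."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_points
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([low + (s + rng.random()) * width for s in strata])
    # Transpose so that each row is one complete parameter set.
    return [list(point) for point in zip(*columns)]


def lh_oat_effects(model, bounds, n_points=10, fraction=0.05, seed=0):
    """Mean absolute relative effect (percent change of the output per unit
    fraction) of each parameter, averaged over the Latin Hypercube points."""
    rng = random.Random(seed)
    totals = [0.0] * len(bounds)
    for base in latin_hypercube(n_points, bounds, rng):
        base_out = model(base)
        for i, (low, high) in enumerate(bounds):
            step = fraction * (high - low)
            if base[i] + step > high:  # step downwards near the upper bound
                step = -step
            perturbed = list(base)
            perturbed[i] = base[i] + step  # change one factor at a time
            new_out = model(perturbed)
            mean_out = 0.5 * (new_out + base_out)
            if mean_out != 0.0:
                totals[i] += abs(100.0 * (new_out - base_out) / mean_out / fraction)
    return [total / n_points for total in totals]


def toy_model(p):
    # Hypothetical stand-in for a SWAT run: returns a positive "flow".
    return 10.0 + 5.0 * p[0] + 0.5 * p[1] + 0.0 * p[2]


effects = lh_oat_effects(toy_model, [(0.0, 1.0)] * 3)
# Parameter indices ordered from most to least sensitive.
ranking = sorted(range(len(effects)), key=lambda i: effects[i], reverse=True)
```

In this toy case the first parameter ranks highest and the third has no effect, which mirrors how the most sensitive parameters would be shortlisted for calibration.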
Precipitation is the major component of the hydrological cycle. The streamflow and water yield of a basin are directly proportional to precipitation (rainfall and snowfall). In hydrology, the total amount of water flowing on the surface (streamflow) and subsurface (groundwater flow) is represented by the total water yield of a basin. It is calculated by subtracting evapotranspiration from precipitation. The daily precipitation time series was used to force the SWAT model and obtain the simulated flow from the basin on a monthly time scale. Therefore, the river flow analysis was carried out on monthly data for each gridded dataset. To keep the analysis period consistent, we restricted it to 1979-1996, since recording at some observational stations ceased from 1997 onwards.
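As a minimal sketch of the simplified balance stated above (water yield = precipitation minus evapotranspiration) and of the daily-to-monthly aggregation, the following could be used. The function names and the input series are hypothetical, and SWAT itself computes water yield from more detailed flow components.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_totals(start, daily_values):
    """Sum a daily series (mm/day) into {(year, month): total in mm}."""
    totals = defaultdict(float)
    for offset, value in enumerate(daily_values):
        day = start + timedelta(days=offset)
        totals[(day.year, day.month)] += value
    return dict(totals)


def monthly_water_yield(start, daily_precip, daily_et):
    """Simplified balance from the text: water yield = precipitation - ET."""
    if len(daily_precip) != len(daily_et):
        raise ValueError("precipitation and ET series must have equal length")
    precip = monthly_totals(start, daily_precip)
    et = monthly_totals(start, daily_et)
    return {month: precip[month] - et[month] for month in precip}


# Hypothetical input: 59 days starting on 1 January 1979.
yield_mm = monthly_water_yield(date(1979, 1, 1), [4.0] * 59, [1.0] * 59)
```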
The spatio-temporal performance of the reanalyses and the merged satellite-based precipitation product, as compared to the observed precipitation data, was evaluated using different graphs for visual assessment. The ArcSWAT interface of the ArcGIS software was used to run the model with the various input data (Winchell et al. 2013). Finally, all the weather products were tested via the calibrated and validated SWAT model, which was parameterized for our study basin using observed flow data at the basin outlet. Readers are referred to Neitsch et al. (2011) and Arnold et al. (2012) for further details on the hydrological model and its theoretical and input-output documentation.

Spatial comparison
As discussed in section 3.1, the spatial average precipitation of the four reanalyses and merged satellite-based precipitation products was computed using the Kriging technique, as depicted in Fig. 5. The analysis was done using daily precipitation data from 1979 to 1996. The annual average precipitation estimates of the products were used in the analysis. In the initial evaluation (yearly as well as seasonal), the APHRODITE and ERA Interim/land precipitation was satisfactory compared with the other datasets. The MSWEP estimate seems good for the northeast region; however, it was poor for the south-west region. On the other hand, CFSR exhibits poor performance since its annual estimates contradict the observed values. Similar conditions could be observed for the wet season (Sep-Dec) and the dry season (Mar-May). In an overall evaluation, the percentage bias of the mean monthly precipitation of each dataset was calculated using the observed precipitation as reference. First, the bias was calculated separately for the inland area and the coastal area to incorporate the regional effect; then, the average of these values was taken as the overall dataset bias. Hence, the mean monthly precipitation bias for ERA-Interim/land, APHRODITE, CFSR, and MSWEP was found to be 8%, 23%, 42%, and 22%, respectively. It should be noted that the observations are far from perfect due to the low density of gauging stations in the province. It is possible that the gridded precipitation products give a better description of the precipitation pattern since they have a higher grid density than the measuring gauges. Our analysis showed that gridded precipitation products could serve as a good substitute for observed precipitation data in sparsely gauged basins with insufficient weather data.
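The two-step bias calculation described above can be sketched as follows. The sign convention (positive when the product is wetter than the observations) and the option to average magnitudes rather than signed values are assumptions, since the text does not state them; all numbers are made up for illustration.

```python
def percent_bias(product, observed):
    """Percentage bias of a gridded product relative to observations.
    Assumed convention: positive means the product is wetter."""
    if len(product) != len(observed):
        raise ValueError("series must have equal length")
    total_obs = sum(observed)
    return 100.0 * (sum(product) - total_obs) / total_obs


def overall_bias(coastal_bias, inland_bias, absolute=False):
    """Average the two regional biases into one dataset-level value.
    With absolute=True the magnitudes are averaged, so opposite signs
    in the two regions do not cancel out."""
    if absolute:
        coastal_bias, inland_bias = abs(coastal_bias), abs(inland_bias)
    return 0.5 * (coastal_bias + inland_bias)


# Made-up mean monthly precipitation values (mm) for two months.
coastal = percent_bias([80.0, 80.0], [100.0, 100.0])    # product too dry
inland = percent_bias([130.0, 130.0], [100.0, 100.0])   # product too wet
signed_average = overall_bias(coastal, inland)
magnitude_average = overall_bias(coastal, inland, absolute=True)
```

The example shows why the choice matters: a product that is too dry on the coast and too wet inland can look nearly unbiased when signed values are averaged.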

Statistical evaluation
The statistical evaluation of the gridded climate products and the river flow simulation of SWAT (with climate products forcing) is discussed in this section. The histogram, probability density function (PDF), and cumulative distribution function (CDF) plots of the daily precipitation time series at the grid points are presented in Fig. 6. The histogram plots of ERA Interim and APHRODITE are comparable with the observed histogram except for the low precipitation values. However, the CFSR and MSWEP datasets exhibited poor performance in this comparison. The CDF plot of the observed precipitation is quite different from those of all the gridded precipitation points. Each measurement station has a distinct CDF, whereas the CDFs of the gridded precipitation products show little or no variability. MSWEP captured the PDF of the observed precipitation much better than the other datasets. The ERA Interim and APHRODITE exhibited poor PDF performance. Here, the relative performance of the precipitation datasets differs from what we observed in the spatial evaluation (section 4.1).
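To make the CDF comparison concrete, an empirical CDF can be built from each daily series and the largest vertical gap between two curves computed. This distance measure is an added illustration and not a statistic reported in the study, which relies on visual comparison of the plots; the sample values are made up.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function giving the fraction of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def max_cdf_gap(sample_a, sample_b):
    """Largest vertical distance between two empirical CDFs.
    Both curves are step functions, so checking the data points suffices."""
    cdf_a, cdf_b = empirical_cdf(sample_a), empirical_cdf(sample_b)
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Made-up daily precipitation values (mm/day).
observed = [0, 0, 1, 5]
gridded = [0, 2, 2, 8]
gap = max_cdf_gap(observed, gridded)
```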
The overall statistical performance of the SWAT model is presented in Table 3, which compares the model simulation runs forced with the observed precipitation and with the reanalyses and merged satellite-based precipitation products. The observed flow at the outlet of Ikizdere basin was used in our statistical analysis. In the table, MGM represents the flow simulated by SWAT using observed precipitation data. ERA Interim land precipitation data exhibited good performance (NS = 0.53, PBIAS = 19.9). These values are acceptable under the criteria set in the literature, as discussed in section 3.3. A simulation can be accepted as satisfactory if NS > 0.5 and PBIAS < 25% (Moriasi et al. 2007). On the other hand, the simulations with the CFSR, APHRODITE, and MSWEP precipitation products have very low performance (NS = 0.22, 0.38, and −0.73, respectively) and high percentage bias (PBIAS = 33.4, 41.4, and 85.0, respectively), and thus fail the minimum criteria for a satisfactory simulation.
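The acceptance criteria above can be written out as a short sketch. The NS and PBIAS formulas follow their standard definitions, with PBIAS in the convention of Moriasi et al. (2007); the function names are hypothetical, and the comparison on the magnitude of PBIAS is an assumption, since the text reports positive values only.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than
    using the mean of the observations as the prediction."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def flow_pbias(observed, simulated):
    """Percent bias of simulated flow: 100 * sum(obs - sim) / sum(obs)."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


def is_satisfactory(ns, pbias, ns_min=0.5, pbias_max=25.0):
    """Criteria quoted in the text: NS > 0.5 and PBIAS < 25%."""
    return ns > ns_min and abs(pbias) < pbias_max


# (NS, PBIAS) values as reported in the text for each forcing dataset.
reported = {
    "ERA-Interim/land": (0.53, 19.9),
    "CFSR": (0.22, 33.4),
    "APHRODITE": (0.38, 41.4),
    "MSWEP": (-0.73, 85.0),
}
verdicts = {name: is_satisfactory(ns, pb) for name, (ns, pb) in reported.items()}
```

With the reported values, only the ERA-Interim/land simulation passes both thresholds.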
The reliability of the gridded precipitation datasets was also checked with two further metrics, the probability of detection (POD) and the false alarm ratio (FAR). These metrics help explain the differences between the observed and gridded precipitation data. Table 3 shows that the POD values are all relatively high. The preferred POD value is one for the perfect scenario. This indicates that the probability of an observed precipitation event being captured by the gridded precipitation data was high for all the datasets. However, the FAR values for all the gridded datasets were also high, which is not desirable. We expect this value to be close to zero for a good dataset. A large FAR value means that a high proportion of the gridded precipitation events was not observed (i.e., all the datasets tend to give a "false alarm" when it comes to precipitation estimation in our study area). The overall performance of the model simulations using the reanalyses and merged satellite-based precipitation products, compared with the observed flow at the outlet of the study basin, is presented in Table 4. The notation MGM here represents the simulation run with observed precipitation data. The CFSR and MSWEP simulations overestimate the observed flow, while the APHRODITE simulation underestimates it. Once again, ERA Interim land performed well compared with the other reanalysis and merged precipitation products, and its simulated flow statistics are comparable with the observed flows of the basin.
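A minimal sketch of how POD and FAR can be computed from paired daily series is given below, using their standard contingency-table definitions. The wet-day threshold of 1 mm/day is an assumption, as the text does not state the threshold used, and the sample values are made up.

```python
def detection_scores(observed, estimated, wet_threshold=1.0):
    """Return (POD, FAR) for paired daily precipitation series.
    POD = hits / (hits + misses); FAR = false alarms / (hits + false alarms).
    wet_threshold (mm/day) is an assumed definition of a wet day."""
    hits = misses = false_alarms = 0
    for obs, est in zip(observed, estimated):
        obs_wet = obs >= wet_threshold
        est_wet = est >= wet_threshold
        if obs_wet and est_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif est_wet:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


# Made-up six-day example (mm/day).
pod, far = detection_scores([0, 5, 3, 0, 0, 2], [2, 4, 0, 1, 0, 6])
```

A dataset can score well on POD and poorly on FAR at the same time, which is the pattern described above: most observed events are detected, but many estimated events have no observed counterpart.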

Hydrological evaluation
For the precipitation datasets selected for this study, we first analyzed the annual and seasonal distribution over the whole study area. The yearly average values of the precipitation products are plotted in Figs. 7 and 8. The weighted averages of the grid values were used for both figures. There is a visible difference between the coastal and mainland precipitation values. In general, the coastal areas have higher precipitation than the mainland mountainous regions, which is in agreement with previous studies (Sensoy et al. 2008; Şen 2013). The mountains are predominantly covered with snow in winter, retaining a significant amount of the precipitation. The CFSR results are the only exception, in which the spatial pattern of the estimates was inverted. For our study area, the effect of elevation difference on precipitation is noticeable in Figs. 7 and 8, which show that precipitation is significantly higher in the coastal region for both the wet season (September-November) and the dry season (March-May), opposite to our expectation. Our study period spans from 1979 to 1996 since the observation records were not available after 1996. It is well known that precipitation increases with increase in altitude. Our results are in agreement with Sensoy et al. (2008), who reported mountain influences on the precipitation distribution of the country. Due to the presence of the Taurus mountains along the coastal areas, the rain clouds cannot penetrate into the interior parts of Turkey, so that the majority of precipitation falls along the coastal regions. For this reason, Rize province is one of the most humid coastal regions in the country.
Since the records of most gauging stations in the mainland region end in 1996, we present the analysis results only for the period 1979-1996. Our findings (Figs. 7 and 8) once again confirm the indications of Şen (2013), who reported annual precipitation higher than 1500 mm for the majority of Rize province. The CFSR reanalysis exhibited low performance, underestimating precipitation by 39% for the wet season in the coastal region. For the mainland region, the wet season was underestimated by 49%, and the dry season was overestimated by over 200%. On the other hand, the ERA Interim/land underestimated the wet season by 27% for the coastal region and overestimated it by 18% for the mainland region, but it gives a good estimate of the mean for the dry season (less than 20% underestimation). The APHRODITE underestimated precipitation in the mainland region for the wet and dry seasons by 58% and 14%, respectively. Its estimates for the coastal region exhibit a small deviation from the observed climatology. The MSWEP overestimates the precipitation in the coastal (mainland) region by 29% and 41% (64% and 42%) for the wet and dry seasons, respectively. These results are in good agreement with Fig. 3. The observed precipitation value drops suddenly in 1988 for both regions due to the drought conditions prevalent throughout Rize province in that year. Another striking result is that the CFSR dataset shows a reversed spatial distribution of annual precipitation compared to the observed data. It has been reported by the CFSR data provider that there The long-term monthly average precipitation plots for the four weather products versus the observed precipitation, for both the coastal and mainland regions of the basin, are depicted in Fig. 9a. The overall comparison of all the datasets is plotted in Fig. 9b. The results herein are consistent with those of the previous analysis in section 4.1.
The ERA Interim land captures the observed precipitation pattern comparatively well for the coastal and mainland areas, except for its underestimation of the coastal average precipitation. Likewise, the APHRODITE estimates the coastal area well, but underestimates the mainland average precipitation by 49%. In contrast, the MSWEP overestimates the average precipitation in both the coastal and mainland regions by 30% and 55%, respectively. The worst average precipitation estimates in this analysis were made by the CFSR, with underestimation of up to 30% for the coastal area and overestimation of up to 80% for the mainland. As an overall evaluation, it can be said that the seasonal pattern of precipitation was satisfactorily captured by the MSWEP, APHRODITE, and ERA Interim land in our study area. In contrast, the CFSR product poorly resembles the observed seasonal pattern.

Seasonal performance
We adopted a hydrological model (SWAT) parameterized for the Ikizdere basin to assess the hydrological responses to precipitation inputs. The Ikizdere basin was divided into 149 hydrological response units (HRUs) and 22 subbasins based on the overlay of the soil, landuse, and slope maps. The observed precipitation and the other weather data were then used to run the model. The Muskingum routing method was used in SWAT to route the river flow. The model calibration was done using the observed precipitation data (MGM). As mentioned in section 3.3, the observed precipitation data was divided into three periods: warm-up (1976-1978), calibration (1979-1990), and validation (1991-1996). Only after completing the calibration and validation did we apply the model to assess the impact of the various precipitation datasets in our study area.
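For readers unfamiliar with Muskingum routing, the textbook form of the method can be sketched as below. This is the classical formulation, not SWAT's internal implementation; the storage constant `k`, the weighting factor `x`, the time step `dt`, and the inflow hydrograph are hypothetical values chosen for illustration.

```python
def muskingum_route(inflow, k, x, dt, initial_outflow=None):
    """Route an inflow hydrograph through a reach with the classical
    Muskingum equation O2 = C0*I2 + C1*I1 + C2*O1.
    k and dt must share the same time unit; x is dimensionless (0 to 0.5)."""
    denom = k * (1.0 - x) + 0.5 * dt
    c0 = (0.5 * dt - k * x) / denom
    c1 = (0.5 * dt + k * x) / denom
    c2 = (k * (1.0 - x) - 0.5 * dt) / denom  # c0 + c1 + c2 equals 1
    outflow = [inflow[0] if initial_outflow is None else initial_outflow]
    for previous_in, current_in in zip(inflow, inflow[1:]):
        outflow.append(c0 * current_in + c1 * previous_in + c2 * outflow[-1])
    return outflow


# Hypothetical flood wave (m3/s) routed with k = 2, x = 0.2, dt = 1.
wave = [10.0, 20.0, 50.0, 30.0, 10.0, 10.0, 10.0]
routed = muskingum_route(wave, k=2.0, x=0.2, dt=1.0)
```

The routed hydrograph has a lower and later peak than the inflow, which is the attenuation and delay that channel routing is meant to represent.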
In this section, we evaluated the hydrological response of our study area to the precipitation input using a calibrated SWAT model. Figure 10 illustrates the annual cycle of precipitation (rainfall and snowfall) estimates of the gridded precipitation products compared with that of the observed precipitation. Several important indications can be drawn from Fig. 10. The annual cycle of the APHRODITE product stays below the measured annual cycle throughout, implying underestimation of precipitation, while the MSWEP output lies above the measured annual cycle, indicating overestimation. The annual cycle of CFSR performs poorly, with excessive positive deviations for the dry season (March-May) followed by negative deviations for the wet season (Sep-Nov). There is thus a dipole of product underperformance, with underestimation (overestimation) of precipitation in the APHRODITE (MSWEP) annual cycles. It is worthwhile to note that the peaks of the annual precipitation cycle of the CFSR and APHRODITE products correspond to the peak timing of the observed rainfall. A careful visual inspection reveals that, among the products, ERA Interim land best captures the annual cycle of the measured data, with little deviation from the measured precipitation values. These values are in agreement with the analysis in section 2.2 (Fig. 3). Considering snowfall, both CFSR and MSWEP significantly overestimate it in the winter season, while only the MSWEP overestimates snowfall in fall. The APHRODITE slightly underestimates snowfall in both the winter and fall seasons. ERA Interim gave a better estimation of the snowfall compared with the other datasets. In this particular study area, the snow cover is essential because a significant amount of precipitation is stored in the mountainous parts of the mainland region during the winter season.
Fig. 8 The long year average precipitation of the reanalysis/merged products for the coastal/mainland regions (top), and wet/dry seasons (bottom)
Fig. 9 Comparison plots for the monthly long term averages of precipitation for mainland and coastal regions (a) and inter-comparison of all datasets (b)
The SWAT model computes the various components of the hydrologic cycle in the basin, including water yield. The total amount of water flowing on the surface and subsurface is represented by the total water yield of the basin, which is calculated by subtracting evapotranspiration from precipitation. Figure 11 summarizes the seasonal water yield simulated by SWAT using the four gridded precipitation inputs for the Ikizdere Basin. It is clear that the annual cycles of all datasets are quite consistent with each other, each having a peak in May. In terms of peak water yield magnitude, the flow estimated in May by the simulations with the CFSR and MSWEP data is significantly greater than the observed water yield. However, the water yield value was lower for the APHRODITE. Moreover, water yields were highly overestimated during the period May-December by the MSWEP simulation, which is due to the overestimated precipitation by the MSWEP in the first four months (January-April), when the majority of precipitation falls in the form of snow. The high snowmelt volume simulated by SWAT seemingly results from the overestimated precipitation (snowfall) of CFSR and MSWEP (Fig. 10). The water yield estimate of the ERA Interim land is the most comparable with the observed value. In the case of surface streamflow estimations, both the CFSR and MSWEP again overestimated the flow out of the basin, whereas the APHRODITE tends to underestimate it. As a result, the flow most comparable with the observed value was that of the ERA Interim simulation. Our results are similar to those reported by Essou and Sabarly (2016) (NS = 0.78) and Manzato et al. (2015) (R ≥ 0.82), in that the ERA Interim land reanalysis data was found to successfully compensate for the deficiency of surface weather observation records. Other studies on CFSR have found that it performs well in their study areas, unlike what we have found for the Rize province (Essou and Sabarly 2016; Essou et al. 2017a, b). These studies were done on areas similar to our study area, with complex topography and a low density of weather gauging stations. In the study conducted by Hussain et al. (2017) for the evaluation of gridded precipitation in the Himalaya mountainous basin, the largest error in the gridded data arose mainly from elevation, showing the impact of elevation on precipitation data measurement accuracy. Further study is needed to examine the impacts of elevation difference on the precipitation distribution in the basin.
Fig. 10 Comparison of the total monthly rainfall and snowfall of the reanalysis/merged products with the observations for the study area
Fig. 11 The measured versus simulated monthly total water yield and streamflow comparison (mm/day) computed using the various weather inputs in the SWAT simulations

Temporal performance
In this section, the performance of the reanalyses and merged satellite-based precipitation product at the temporal scale is analyzed using the SWAT model simulations. The measured monthly total flow at the outlet of Ikizdere basin is compared with the simulations forced with gridded precipitation data (Fig. 12). The simulation run with the measured MGM precipitation data is represented by "Measured flow" in Fig. 12. The simulations using the MSWEP and CFSR precipitation resulted in overestimated flow, especially for the peak seasons. The flow exaggeration is due to the overestimation of precipitation over the study area by both these weather products. The simulation with the APHRODITE data underestimates the flow of our study area. Precipitation underestimation by the APHRODITE is once again the main cause (Fig. 10). The simulation with the ERA Interim land reanalysis is the closest to the observed flow at the outlet of Ikizdere basin. The MSWEP and CFSR precipitation data generally overestimate the peak flows, whereas the ERA Interim land precipitation data yield satisfactory simulation outcomes compared with the observed flow data. Our results are in agreement with Essou and Sabarly (2016) and Zhu et al. (2017), who concluded that the ERA Interim land reanalysis weather data could not only be successfully used in place of surface weather observation records, but also improve hydrological modeling performance.
In general, the MSWEP and CFSR results were not consistent with what has been reported in other parts of the world, and even in the Mediterranean basins. By contrast, the ERA Interim dataset analysis in our study produced results similar to those of Manzato et al. (2015), Essou and Sabarly (2016), and Essou et al. (2017a, b), in that the ERA Interim land reanalysis data was found to successfully compensate for the deficiency of surface weather observation records. Some studies on CFSR (e.g., Worqlul et al. 2014; Essou and Sabarly 2016; Essou et al. 2017a, b) reported good performance in their study areas, unlike our case in the Rize province. However, Worqlul et al. (2014) found that CFSR exhibited a similar inconsistency in regard to the bias over their study area (Lake Tana basin, Ethiopia). A possible reason is that both studies were conducted on similar areas with complex topography and a low density of weather gauging stations.

Conclusions
Any area with limited weather gauging stations is difficult to study. Therefore, an alternative or supplementary data source is essential to compensate for the data scarcity. Reanalysis and gridded precipitation products have been reported to be good alternative data sources, which could supplement scarce precipitation data in climate studies. In this study, four precipitation products, namely (i) CFSR, (ii) APHRODITE, (iii) ERA Interim land, and (iv) MSWEP, were subjected to performance tests in the Rize province of the Eastern Black Sea Region, Turkey.
The precipitation products were compared using three analysis techniques: (1) spatial average graph, (2) statistical assessment, and (3) hydrological evaluation. To evaluate the performance of precipitation averaged over the study area (Rize province), the spatial average of the precipitation datasets was plotted. The statistical evaluation of the gridded climate products and the streamflow simulation of SWAT (with the precipitation datasets as an input) were assessed in detail. The daily precipitation histogram, PDF, and CDF plots for each grid point were plotted. Other statistical metrics were also used, including NSE, RSR, PBIAS, POD, and FAR. The hydrological SWAT model was used to assess the basin's response to each precipitation product. The SWAT simulated flow outputs of the gridded precipitation products were assessed for seasonal and temporal performance in the study area.
In comparison with the observed precipitation data, the spatial annual average precipitation estimates of the APHRODITE and ERA Interim land were found to be satisfactory. The MSWEP estimates seem good for the Northern region. The CFSR estimates contradicted the observed values, exhibiting poor performance. In the overall evaluation, the mean monthly precipitation bias of the ERA-Interim/land and APHRODITE was 8% and 23%, respectively, while that of the CFSR and MSWEP was 42% and 22%, respectively. It should be noted that the observation data is far from sufficient due to the low density of gauging stations in the province. It is possible that the gridded precipitation products describe the precipitation pattern of a basin better when they have a higher grid density than the measuring gauges. This implies that gridded precipitation products could serve as a good supplement to observational data for sparsely gauged basins. However, observed precipitation data remains essential for the validation of the gridded precipitation data as well as for the calibration of hydrological models in basins where observed data is available.
The statistical evaluation of the precipitation datasets and the river flow simulations (with input climate products) shows that the MSWEP captured the PDF and CDF of the observed precipitation much better than the other datasets. The ERA Interim and APHRODITE exhibited poor performance. The histogram plots of ERA Interim and APHRODITE were found to be comparable with the observed precipitation, whereas those of the CFSR and MSWEP were found to be less comparable. The CDF plot of the observed precipitation is quite different from those of all the gridded precipitation points. Unlike what we observed in the spatial evaluation, the performance of the various gridded precipitation datasets based on daily time series statistics was quite different. The ERA Interim and APHRODITE datasets did not exhibit good performance. The POD of all the datasets was found to be satisfactory. However, the FAR metric showed that all the precipitation datasets exhibit sustained "false alarms". This shows that the performance of the gridded precipitation products can differ depending on the resolution and the type of data analysis technique used.
The SWAT model simulations with the CFSR, APHRODITE, and MSWEP precipitation datasets resulted in NS values of 0.2, 0.4, and −0.7, respectively, and percentage bias of 33.4%, 41.4%, and 85.0%, respectively. The CFSR and MSWEP datasets overestimated the peak flows of the basin since the most dominant weather variable of the SWAT model (precipitation) was overestimated. The APHRODITE simulation was found to underestimate streamflow (particularly low flows). These values indicate the poor performance of these precipitation datasets for hydrological study of the area. Unlike the other simulations, the ERA Interim land SWAT simulation was in good agreement with the observed flow pattern of the basin. The ERA Interim exhibited a much better hydrological simulation performance, with NS = 0.53 and PBIAS = 19.9, which is consistent with the range suggested by Moriasi et al. (2007) for a satisfactory model performance (section 3.3).
In this study, we made a thorough analysis of four gridded precipitation products for Rize province of the Eastern Black Sea region. Our results showed that gridded precipitation data could supplement weather observation records and could improve hydrological modeling performance in the basin. The ERA Interim showed better performance than the other datasets used as input to the hydrological model. However, the MSWEP and CFSR performed badly compared with the ground precipitation observations since they overestimated streamflow in the basin, in particular during high flow seasons. The APHRODITE precipitation underestimated the streamflow of the study area. Our results were in agreement with those of Aznar et al. (2010), Essou and Sabarly (2016), and Hussain et al. (2017), where the ERA-ECMWF gridded precipitation product was shown to perform well in mountainous regions (section 1). Therefore, we argue that gridded precipitation datasets that have been tested and shown to perform well could be used to supplement surface precipitation observations in hydrological studies.
In a nutshell, we encourage researchers to study other similar basins in the Black Sea region so as to put forward comprehensive assessments of product performance, in order to augment observed precipitation in basins with a low density of gauges. It would be valuable to select a particular climate product and analyze the impact of its spatial and temporal resolution. We also recommend expanding the climate dataset spectrum and incorporating more recently developed gridded precipitation data (for example, IMERG), as well as the effect of elevation on the precipitation distribution, in future studies. In addition, the errors of the precipitation datasets arising from the data itself need to be investigated thoroughly.
Author contribution SAS analyzed and interpreted the gridded precipitation datasets for the study area. In addition, he calibrated the SWAT model and prepared the draft manuscript. EK performed the datasets analysis checking and was a major contributor in writing/ editing the manuscript. All authors read and approved the final manuscript.
Funding We would like to acknowledge TUBİTAK (Turkish Scientific and Technological Research Council) for providing the fund to accomplish this study.
Availability of data and material Data used in this study could be provided upon request.

Code availability Not applicable
Table 5 The land use types of the study are used in the SWAT model (Neitsch et al. 2005)
No. Land cover Name