“A comprehensive assessment of suitability of Global Precipitation Products for hydro-meteorological applications in a data-sparse Himalayan region”

Rainfall observation networks in developing countries such as Nepal face many challenges, including limited data availability and quality. Global Precipitation Products (GPPs) could be an alternative to gauge-based observed rainfall (GOR) in hydro-meteorological studies. However, the performance of GPPs across the Himalayan region is still unknown and is influenced by several factors, such as spatial and temporal resolution and primary data sources. We have comprehensively assessed the suitability of the latest GPPs using categorical and continuous variable performance metrics for the Gandak River Basin in the Nepalese Himalayas. We then ranked the GPPs, for the first time, using a Multicriteria Decision-Making technique. Eleven of the 12 GPPs considered underestimated the annual rainfall in the basin. The performance of GPPs was also inconsistent between monthly/annual and daily timescales. At longer timescales, CHIRPS and IMERG_Final are better at representing the spatial and temporal pattern of rainfall (spatial correlation of 0.75) and have the lowest percentage bias (PBIAS < 12%). At a daily timescale, IMERG_Final, ERA5, and PERSIANN_CDR stand out for probability of detection (POD) of rainfall, while all GPPs perform poorly in false alarm ratio (FAR). Although all GPPs have relatively high RMSE (6-14 mm/day), correlation (CC) with observed rainfall was high for IMERG_Final, ERA5, and MERRA_2 in most of the sub-basins. The performance of all GPPs decreases with elevation, as evidenced by higher PBIAS and lower CC. Although there is plenty of room for improvement in rainfall estimation by GPPs, among the existing datasets IMERG_Final scored best on the majority of the performance indicators and ranked first in five out of six sub-basins. It is therefore the comparatively better choice in the data-sparse Himalayan region when daily rainfall data are required. For applications that require monthly/annual rainfall, both CHIRPS and IMERG_Final are equally suitable.
The method proposed in the study for assessing GPPs can be readily applied in other river basins and at sub-daily timescales.


Introduction
Precipitation data is crucial for analyzing regional and global climatic anomalies, forecasting extreme events, e.g., floods and drought, and understanding precipitation characteristics (Daly et al. 2017;Li et al. 2020;Luo et al. 2019). It also supports relevant policy-decisions in different sectors, e.g., agriculture, hydrology, industries, landslide management, and other emergencies (Hamal et al. 2020;Pokharel et al. 2020;Talchabhadel et al. 2018;Wang et al. 2021). Hydrometeorological inferences, such as hydropower generation, water availability and supply, irrigation, crop forecasting, and natural disaster studies (such as droughts and floods), are also sensitive to precipitation anomalies. However, such studies endure challenges in the data-sparse regions due to a lack of accurate precipitation datasets (Sun et al. 2018;Kumar et al. 2022a). Rain gauge networks are the most trusted sources of precise rainfall measurement (Hamal et al. 2020). Nevertheless, rain gauge networks face various challenges in mountainous terrains, e.g., sparse and irregular distribution and operational challenges, including rain gauge calibration and lack of regular maintenance, which causes inaccurate rainfall information (Ahmed et al. 2017;Khatakho et al. 2021).
Global Precipitation Products (GPPs), such as the fifth-generation global climate reanalysis published by the European Centre for Medium-Range Weather Forecasts (ECMWF) (ERA5), the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA_2), the Integrated Multi-Satellite Retrievals for GPM (IMERG) Early, Late, and Final runs, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN)-Climate Data Record (CDR), and PERSIANN-Cloud Classification System (CCS), have drawn the attention of researchers and policymakers as an alternative to gauge-based observed rainfall in data-sparse regions for various applications (Chawla and Mujumdar 2020). Previous studies (Belabid et al. 2019; Brocca et al. 2020; Debbarma et al. 2022; Kumar et al. 2022b; Ma et al. 2020; Yeditha et al. 2020) utilized advanced GPPs with finer spatio-temporal resolution for several applications, such as nowcasting and forecasting hydro-meteorological extremes, which shows the considerable potential of the latest GPPs. However, the relevance of GPPs across the morphologically complex Himalayan terrain is still unclear because their performance varies enormously in detecting rainfall events and their magnitudes (Chawla and Mujumdar 2020).
Consequently, many studies have also evaluated satellite and reanalysis-based precipitation products against gauge-based rainfall data using various statistical metrics worldwide (Ahmed et al. 2017; Ayehu et al. 2018; Chen et al. 2020; Chowdhury et al. 2021; Jiang et al. 2020; Soo et al. 2019; Su et al. 2019; Xu et al. 2019). Several studies also attempted to assess the suitability of GPPs based on their ability to capture rainfall and to simulate streamflow through hydrological simulation in various basins across the globe (Beck et al. 2017; Bai and Liu 2018; Kolluru et al. 2020; Mararakanye et al. 2020; Mazzoleni et al. 2019; Tarek et al. 2020; Wang et al. 2020). Some studies evaluated the performance of different GPPs in various regions of Nepal. For example, Kumar et al. (2016) [...] Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), across the Karnali River Basin, Nepal, on daily and monthly scales using the J2000 hydrological model. Kumar et al. (2022b) compared the performance of eight satellite precipitation products (SPPs), namely CMORPH, PERSIANN, PERSIANN_CDR, PERSIANN_CCS, IMERG_Late, IMERG_Early, IMERG_Final, and CHIRPS, across the Simat Khola River Basin, Nepal. Talchabhadel et al. (2022) highlighted the importance and applicability of IMERG_Early in diagnosing hydrometeorological hazards in quasi-real-time in the Nepal Himalayas using extreme precipitation indices.
We found two major research gaps based on a comprehensive review of the existing literature on GPPs. First, previous studies, particularly in data-sparse regions of the Himalayas, have primarily focused on analyzing a limited number of GPPs, typically ranging from 2 to 6. This may be due to the challenges associated with accessing and evaluating larger datasets. Consequently, the understanding of the performance of GPPs in the region remains limited. Increasing the number of GPPs analyzed could significantly extend previous studies and provide valuable insights into the precipitation dynamics in the Himalayas. Second, previous studies have identified that the performance of GPPs varies with the regional characteristics of the study area, as well as with the spatio-temporal resolution and the performance indicators used in the research. Some GPPs perform better in detecting rainfall events but fail to capture the magnitude of those events. For instance, Wang et al. (2020) found that TRMM performed well in probability of detection (POD) but had a lower correlation coefficient (CC) compared to other GPPs. As a result, selecting an appropriate GPP for a specific application is still challenging for researchers and decision-makers. To address this issue, a comprehensive evaluation of widely used and recommended GPPs using a wide array of performance indicators can provide users with information regarding each product's robustness and suitability for specific locations and purposes. Such evaluations would ideally consider the spatio-temporal variability of precipitation and provide insight into the strengths and limitations of different GPPs.
Therefore, this paper developed and utilized a Multicriteria Decision-Making (MCDM) technique to rank GPPs with a comprehensive evaluation for any application. We have chosen the Gandak River Basin, Nepal, representing the morphologically complex Himalayan Region, to evaluate GPPs for the following reasons. First, this basin is highly vulnerable to water-induced disasters (e.g. landslides, debris flow, and floods), particularly during the monsoon season (May-Sep), which causes a significant loss of life and property due to its geographical location and complex topography (Tiwari and Rayamajhi 2018). Second, very few rainfall stations are available in the upper catchment, which might affect the hydro-meteorological studies and stakeholders in providing timely early warnings during major disasters.
To address this goal, we adopted a holistic approach to evaluating the performance of the GPPs. We first assessed the performance of twelve GPPs based on categorical and continuous variable metrics for different sub-basins and meteorological stations (station-based analysis) in the basin. We then assigned a rank to the GPPs based on selected indicators using the MCDM technique. The significance of the study is that it demonstrates, for the first time, a comprehensive framework for evaluating GPPs that can systematically rank various datasets and identify the most suitable datasets for meteorological applications in an ungauged river basin.
Research findings might help local governments in Nepal and researchers to improve their understanding of GPPs and select the appropriate GPPs for water-related applications, e.g., hydropower generation, irrigation, and analyses of hydro-meteorological extremes (floods and droughts) in Himalayan terrain.

Study area
The Gandak River Basin, located between 25.49°N and 29.28°N and 81.80°E and 85.83°E, has an elevation ranging from 64 to 8141 m (Fig. 1). The basin, one of Nepal's major river systems, is drained by a transboundary river originating from the Tibetan Plateau, flowing through central Nepal and draining into the Ganges in India. It covers almost one-third of Nepal's territory with a catchment area of 46,300 km², of which 32,104 km² lies in Nepal, and the remaining catchment is in China and India (Chand et al. 2019). The river is significant for irrigation and hydropower generation in Nepal. Climatic conditions vary significantly across the basin, and the average annual precipitation ranged between 190 mm at the uppermost rainfall station (Chhoser) and 3900 mm at the lowermost rainfall station (Bhadaure Deurali). The basin receives approximately 86% of its annual rainfall in the monsoon season (May-Sep). To comprehensively evaluate the GPPs at different basin scales, we delineated six sub-basins, from small to large, by choosing individual pour points. The six delineated sub-basins (Mediseti, Budhigandaki, Marsandi, Trishuli, Kaligandaki, and Narayani) are shown in Fig. 1. We selected pour points based on the availability of discharge stations and well-known river tributaries. This enabled us to recommend GPPs for different river tributaries of the basin and to investigate how the performance of the GPPs depends on the basin size of the tributaries.

Gauge-based observed rainfall (GOR)
We obtained daily gauge-based observed rainfall (GOR) from 2003 to 2017 for 36 gauge stations across the basin from the Department of Hydrology and Meteorology (DHM), Nepal (https://www.dhm.gov.np/contents/resources). A summary of the rainfall stations and their locations is provided in the supplementary material (Table S1). We selected only those rainfall stations with less than 5% missing values to minimize the uncertainties in our evaluation results. In addition, to investigate and quantify the effects of elevation on GPP performance, we selected a few stations (with up to 10% missing values) in the higher elevation zone (elevation > 2500 m) from the available rainfall stations, since only a few stations are found at higher elevations. However, the missing value period was excluded from the analysis to avoid additional uncertainty due to interpolation.
Fig. 1 Location map of the study area, showing the six basins of different sizes corresponding to the six pour points (white circles) considered for the analysis, with elevation ranges and available rainfall stations

Global Precipitation Products (GPPs)
We selected a total of twelve GPPs, namely: (i) Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), (ii) CMORPH, (iii) IMERG_Early, (iv) IMERG_Late, (v) IMERG_Final, (vi) PERSIANN, (vii) PERSIANN_CDR, (viii) PERSIANN_CCS, (ix) CPCGPRT, (x) MERRA_2, (xi) MERRA_Land, and (xii) ERA5. The selection was based on three criteria: (1) suitability for water-related applications such as hydropower generation, irrigation, and analyses of hydrometeorological extremes (floods and droughts) in the study area; (2) latency of the GPPs, as some GPPs, such as IMERG_Early, IMERG_Late, and PERSIANN_CCS, are available for real-time applications such as flood and crop forecasting; and (3) recommendations from previous studies.
CHIRPS is a satellite-based precipitation dataset developed by the Climate Hazards Group (CHG) at the University of California, Santa Barbara. It is generated by combining monthly rainfall climatology from Climate Hazards Group Precipitation Climatology, observation from quasi-global geostationary thermal infrared satellite, TRMM based precipitation, simulated precipitation using the atmospheric model from the National Oceanic and Atmospheric Administration (NOAA), Climate Forecast System (CFS), and gauge-based rainfall (Funk et al. 2015). The dataset is widely used in hydrological modeling, climate analysis, and early warning systems for drought events. CHIRPS provides real-time, consistent, reliable, and up-to-date precipitation datasets. For further details, refer to the study by Funk et al. (2015).
CMORPH precipitation data are generated through the CPC Morphing technique (Joyce et al. 2004), which merges low-orbit satellite microwave observations with geostationary satellite IR data. The microwave observations come from passive microwave sensors aboard DMSP 13, 14, and 15 (SSM/I), NOAA-15, 16, 17, and 18 (AMSU-B), and from AMSR-E and TMI aboard NASA's Aqua and TRMM spacecraft, respectively. This method blends the two data sources to produce high-quality precipitation estimates with global coverage and high spatiotemporal resolution. The CMORPH dataset has been widely used for various applications, including flood monitoring, drought analysis, and climate modeling. It provides a reliable source of precipitation data for regions with limited or no ground-based observations, aiding in the management of water resources and early warning systems. For further details, refer to the study by Joyce et al. (2004).
IMERG is a precipitation dataset that combines all the data from the Global Precipitation Measurement (GPM) satellite constellation using different processing and filtering methods over time (Huffman et al. 2020). There are three versions of IMERG precipitation products with different latencies: IMERG_Early, IMERG_Late, and IMERG_Final (Table 1). IMERG_Early and IMERG_Late use temperature and humidity ancillary data from a constellation of passive microwave satellites and employ a forward propagation GPROF algorithm to estimate precipitation. IMERG_Late additionally includes backward propagation, while IMERG_Final uses both forward and backward propagation with the Goddard Profiling (GPROF) algorithm and a gauge correction technique. IMERG_Final also utilizes vertically integrated vapor data from the MERRA_2 dataset, monthly ground-based observations from the GPCC dataset, and ERA5 data to correct any biases in the final product. The additional ancillary data sources and gauge correction technique used in IMERG_Final result in a more accurate and reliable precipitation dataset than the earlier versions. This dataset suits various applications such as flood forecasting, agriculture, and climate studies. However, it is important to note that IMERG precipitation products may still have some uncertainties and limitations, especially in regions with complex terrain and extreme weather conditions. For further details, refer to the study by Huffman et al. (2020). There are two types of rainfall estimates available in IMERG, namely "HQprecipitation" (High-Quality precipitation from all available passive microwave sources) and "precipitationCal" (combined microwave-IR precipitation estimate with climatological calibration). This study considered the calibrated product (precipitationCal) for greater data accuracy.
PERSIANN uses a neural network approximation method to estimate rainfall rates from infrared brightness temperature images provided by geostationary satellites (GOES-8, GOES-10, GMS-5, Meteosat-6, and Meteosat-7) (Sorooshian et al. 2000). PERSIANN can be used for various applications such as flood forecasting, drought monitoring, and weather forecasting. PERSIANN_CDR provides long-term precipitation data to address the need for a consistent, long-term, high-resolution, and global precipitation dataset for studying changes and trends in daily precipitation, especially extreme precipitation events, due to climatic variability and change (Ashouri et al. 2015). PERSIANN_CDR can be used for various applications such as climate change impact assessments, hydrological modeling, and crop yield forecasting. PERSIANN_CCS categorizes cloud-patch features based on cloud height, areal extent, and variability of texture estimated from satellite imagery to provide real-time rainfall datasets (Hong et al. 2004). PERSIANN_CCS is suitable for various applications such as flood forecasting, weather forecasting, and drought monitoring.
CPCGPRT is a dataset that uses gauge observations to generate global precipitation estimates at a daily timescale. The primary objective of CPCGPRT is to develop a unified and reliable analysis of precipitation by integrating all available data sources at CPC and employing the optimal interpolation (OI) objective analysis technique. The CPCGPRT project aims to provide researchers, policymakers, and other stakeholders with accurate and high-quality information on global precipitation that can be used to inform climate-related decisions and actions. By continuing to improve the methodology and data sources used in CPCGPRT, the NOAA/OAR/ESRL PSL can enhance our understanding of global precipitation and its impact on society and the environment.
MERRA-2 is a reanalysis dataset that provides global atmospheric data with a resolution of 0.5° in latitude and 0.625° in longitude. It includes the M2T1NXFLX variable (MERRA-2 tavg1_2d_flx_Nx: 2d, 1-Hourly, Time-Averaged, Single-Level, Assimilation, Surface Flux Diagnostics V5.12.4), which provides surface flux diagnostics data. The dataset is generated using an updated version of the Goddard Earth Observing System Model, Version 5 (GEOS-5) data assimilation system (Rienecker et al. 2011), which includes updates to the model and the Gridpoint Statistical Interpolation (GSI) analysis scheme (Wu et al. 2002; Molod et al. 2015). MERRA-2 estimates total precipitation using a combination of satellite observations and surface precipitation gauge measurements.
MERRA-Land is another reanalysis dataset that provides land-surface data and includes the M2T1NXLND variable (MERRA-2 tavg1_2d_lnd_Nx: 2d, 1-Hourly, Time-Averaged, Single-Level, Assimilation, Land Surface Diagnostics V5.12.4). This dataset uses observation-based precipitation data as forcing for the land surface parameterization and is designed to provide high-quality land-surface data for use in climate research and other applications. MERRA-Land estimates total precipitation using a combination of satellite observations, surface precipitation gauge measurements, and atmospheric reanalysis data.
ERA5 is a global atmospheric reanalysis dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). The dataset provides hourly estimates of a wide range of atmospheric variables, including precipitation, with a spatial resolution of 0.25 degrees and a temporal resolution of 1 h. ERA5 is derived using a sophisticated data assimilation system that combines satellite and ground-based observations with a numerical weather prediction model to produce a consistent and continuous record of the global atmosphere from 1979 to present. ERA5 is widely used for a variety of applications, including climate research, weather forecasting, and environmental monitoring. The overall summary of GPPs, including their characteristics and used inputs and algorithms, is presented in Table 1.

Pre-processing of data
Gauge-based observed rainfall (GOR) is measured at a point scale (station), while all the GPP data are available at various grid sizes. The complicated topographic slopes and heterogeneous distribution of rainfall stations within the selected basin affect the performance of the GPPs and the accuracy of interpolated rainfall (Monsieurs et al. 2018). We first employed a widely used point-to-grid comparison (Ahmed et al. 2017; Bai and Liu 2018; Hamal et al. 2020; Yang and Nesbitt 2014), in which each gauge station is compared to its closest GPP grid cell. This approach was used to quantify the effect of elevation on GPP performance. The evaluation of GPP grid boxes against gauge stations is affected by the homogeneity of the enclosed point-based observations. Since one grid box can contain multiple gauge stations, it is compared multiple times against them. This oversampling in the point-to-grid comparison amplifies the sub-grid heterogeneity, which is a characteristic of precipitation distribution over complex terrain. Secondly, we compared the performance of the GPPs at the sub-basin scale to understand how they perform across the different sub-basins. We then ranked the GPPs for each sub-basin, since most policymakers and GPP users seek the applicability of GPPs at sub-basin and basin scales, not at the point scale (Fig. 2). Before the statistical evaluation, we also made a quick assessment of how well the GPPs capture the spatial and temporal rainfall pattern across the basin using temporal, spatial, and spatial correlation plots. As the available GPPs had different spatial resolutions, ranging from 0.05° to 0.67°, all data were re-interpolated using bilinear interpolation to a common grid size of 0.25° to allow better comparison. Many datasets used in this study were already at a resolution of 0.25°, avoiding the need for additional interpolation.
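The point-to-grid pairing described above can be illustrated with a minimal sketch. It assumes a regular latitude-longitude grid described by its cell-centre coordinates, and it uses `None` to mark missing gauge days; the function names `nearest_cell` and `pair_series` are hypothetical and are not taken from the study.

```python
def nearest_cell(station_lat, station_lon, grid_lats, grid_lons):
    """Return (row, col) of the grid cell whose centre is closest to a station.

    Assumes a regular lat-lon grid, so the nearest centre can be found
    independently along each axis.
    """
    row = min(range(len(grid_lats)), key=lambda i: abs(grid_lats[i] - station_lat))
    col = min(range(len(grid_lons)), key=lambda j: abs(grid_lons[j] - station_lon))
    return row, col


def pair_series(gauge, product):
    """Pair daily gauge and product values, dropping days with a missing gauge
    reading (None), so that no gap is filled by interpolation."""
    return [(g, p) for g, p in zip(gauge, product) if g is not None]


# Illustrative 0.25-degree cell centres (made-up values, not the study grid).
grid_lats = [27.875, 28.125, 28.375]
grid_lons = [83.875, 84.125, 84.375]
cell = nearest_cell(28.2, 84.05, grid_lats, grid_lons)
pairs = pair_series([1.0, None, 3.0], [0.5, 2.0, 2.5])
```

Because several stations can fall in the same cell, the same product series may be paired more than once, which is the oversampling effect noted in the text.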
GOR was interpolated using the Thiessen Polygon (TP) method and re-gridded using the bilinear interpolation method to compare spatial and temporal rainfall patterns. Thiessen polygons have been widely used, and although the sparse GOR network may introduce some uncertainty, the overall analysis still provides valuable insights into how well the different GPPs capture spatial variability.
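As an illustration of the Thiessen polygon idea, the sketch below approximates polygon areas by assigning each point of a regular set of basin sample points to its nearest station. This discrete approximation, the use of distances in degrees, and the names `thiessen_weights` and `areal_rainfall` are assumptions made for the example; the study may have used exact polygon geometry in GIS software.

```python
def thiessen_weights(stations, sample_points):
    """Approximate Thiessen weights.

    stations: list of (lat, lon) gauge locations.
    sample_points: list of (lat, lon) points covering the basin evenly.
    Returns the fraction of sample points closest to each station.
    Distances are computed in degrees, which ignores the convergence of
    meridians (a simplifying assumption).
    """
    counts = [0] * len(stations)
    for lat, lon in sample_points:
        closest = min(
            range(len(stations)),
            key=lambda s: (stations[s][0] - lat) ** 2 + (stations[s][1] - lon) ** 2,
        )
        counts[closest] += 1
    total = len(sample_points)
    return [c / total for c in counts]


def areal_rainfall(weights, station_rain):
    """Area-weighted mean rainfall from station values and Thiessen weights."""
    return sum(w * r for w, r in zip(weights, station_rain))


# Two stations and four sample points (illustrative values only).
weights = thiessen_weights([(0.0, 0.0), (0.0, 10.0)],
                           [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0), (0.0, 9.0)])
basin_mean = areal_rainfall(weights, [10.0, 30.0])
```

A denser set of sample points brings the weights closer to the true polygon area fractions.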

Statistical performance indicators
A total of six widely used performance indicators, namely probability of detection (POD) (Wang et al. 2021), false alarm ratio (FAR) (Wang et al. 2021), accuracy (ACC), root mean square error (RMSE) (Baghel et al. 2022; Chen et al. 2020; Wang et al. 2021), Pearson correlation coefficient (CC) (Baghel et al. 2022; Chen et al. 2020; Wang et al. 2021), and percentage bias (PBIAS) (Wang et al. 2021), were utilized to quantify the performance of the GPPs. POD, FAR, and ACC are categorical variable performance indicators that measure the capability of GPPs to detect rainfall events. Categorical variable verification is also known as yes/no dichotomous verification, wherein "yes" and "no" mean the event has happened and not happened, respectively. The threshold value of rainfall (Wu and Zhai 2012) was taken as 1 mm/day to avoid human error while taking a reading at gauge stations. Categorical variable verification is highly recommended for real-time flood evaluation studies (Kolluru et al. 2020). RMSE, CC, and PBIAS were used for continuous variable verification of the GPPs, which quantified the performance of GPPs in capturing the magnitude and temporal patterns of rainfall events. A brief description of all selected performance indicators is listed in Table 2.
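The six indicators can be sketched as follows, using their commonly used textbook definitions. The exact formulas of Table 2 are not reproduced in this text, so the definitions below (including treating a day as wet when rainfall is at or above the 1 mm/day threshold, and PBIAS as the summed error relative to summed observations) are assumptions, and the function names are hypothetical.

```python
import math

THRESHOLD_MM = 1.0  # rain/no-rain threshold (mm/day) stated in the text


def categorical_scores(obs, est, threshold=THRESHOLD_MM):
    """Return (POD, FAR, ACC) from paired daily series in mm/day."""
    hits = misses = false_alarms = correct_negatives = 0
    for o, e in zip(obs, est):
        obs_wet = o >= threshold  # assumption: wet means at or above threshold
        est_wet = e >= threshold
        if obs_wet and est_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif est_wet:
            false_alarms += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses) if hits + misses else math.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else math.nan
    acc = (hits + correct_negatives) / total if total else math.nan
    return pod, far, acc


def continuous_scores(obs, est):
    """Return (RMSE, CC, PBIAS); assumes non-empty series with non-zero total
    observed rainfall."""
    n = len(obs)
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / n)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    cc = cov / math.sqrt(var_o * var_e) if var_o and var_e else math.nan
    pbias = 100.0 * sum(e - o for o, e in zip(obs, est)) / sum(obs)
    return rmse, cc, pbias


# Made-up five-day example, not data from the study.
gauge = [0.0, 5.0, 10.0, 0.0, 2.0]
product = [0.0, 4.0, 12.0, 3.0, 0.5]
pod, far, acc = categorical_scores(gauge, product)
rmse, cc, pbias = continuous_scores(gauge, product)
```

With this sign convention, a negative PBIAS indicates that the product underestimates the gauge rainfall.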

Ranking of GPPs using MCDM technique
Normalization of statistical indicators
Since the study selected various performance indicators, including unitless indicators (e.g., POD, FAR, ACC, CC), indicators with units (e.g., RMSE, PBIAS), negatively directed indicators (e.g., FAR, RMSE), and positively directed indicators (e.g., POD, ACC, CC), we normalized the performance indicators to make them unitless and unidirectional, which is the first step in ranking the GPPs. Positively directed indicators are those in which higher absolute values denote better GPP performance. In contrast, for negatively directed indicators, higher absolute values denote poorer performance. The performance matrix (X) was formed using all of the performance indicators (m) and GPPs (n), where i indicates the value corresponding to the i-th performance indicator and j represents the corresponding value of the j-th GPP. y_ij is the normalized value of the element x_ij of the original performance indicator matrix, and {min x_ij}_n represents the minimum value of indicator i across all n GPPs. The values of all elements of the normalized performance indicator matrix lie in the range of 0 to 1.
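One way to obtain unitless, unidirectional values in the range 0 to 1 is a ratio-based normalization, sketched below. The normalization equations themselves are not reproduced in this text, so the specific form used here (value divided by the maximum for positively directed indicators, minimum divided by value for negatively directed ones, both on absolute values) is an assumption, as is the function name `normalize`.

```python
def normalize(values, positive):
    """Normalize one indicator across all GPPs to the range (0, 1].

    values: raw indicator values, one per GPP.
    positive: True if higher absolute values are better (e.g. POD, ACC, CC),
              False if lower absolute values are better (e.g. FAR, RMSE, PBIAS).
    Absolute values are used so that signed indicators such as PBIAS are
    judged by magnitude. Assumes no value is exactly zero.
    """
    magnitudes = [abs(v) for v in values]
    if min(magnitudes) == 0:
        raise ValueError("ratio-based normalization needs non-zero values")
    if positive:
        best = max(magnitudes)
        return [m / best for m in magnitudes]
    best = min(magnitudes)
    return [best / m for m in magnitudes]


# Illustrative values only: after normalization, 1.0 is always the best GPP.
pod_norm = normalize([0.5, 0.8, 1.0], positive=True)
rmse_norm = normalize([6.0, 12.0], positive=False)
pbias_norm = normalize([-10.0, 20.0], positive=False)
```

After this step every indicator points in the same direction, so they can be combined with weights in the ranking step.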

Calculation of entropy and weights
We used the entropy method to assign weights to all performance indicators; the method measures the dispersion in the decision criteria. The entropy method for weighting decision criteria is widely accepted (Baghel et al. 2022; Chowdhury et al. 2021; Zhu et al. 2020; Raju and Kumar 2014). A high entropy (Eq. 5) represents high uncertainty associated with the performance indicator, which means the indicator contains a low variation of information (Eq. 7) and is less valuable, i.e., it receives a lower weight (Eq. 8) in the ranking of GPPs (Chowdhury et al. 2021; Zhu et al. 2020). We calculated the entropy, degree of diversification, and weight of the performance indicators corresponding to the GPPs using Eqs. 5 and 6, Eq. 7, and Eq. 8, respectively.
Table 2 Brief description of the selected performance indicators
Probability of detection (POD): ability to detect rainfall events correctly in GPPs' data when the observed gauge showed rainfall
False alarm ratio (FAR), optimal value 0: how often GPPs detected rainfall when there was no rainfall
Accuracy (ACC): measures the level of agreement between GPPs and gauge-based observed rainfall
Root mean square error (RMSE), optimal value 0: measures the average magnitude of error
Pearson correlation coefficient (CC): measures the degree of agreement between gauge-based observed rainfall and a GPP's data
Percentage bias (PBIAS): the deviation between gauge-based observed rainfall and a GPP's data

where En_i(j) is the entropy value of each normalized performance indicator for sub-basin j, n is the total number of sub-basins, m is the number of performance indicators, k_ij is the value calculated using Eq. 6, y_i is the normalized value of each indicator, y_i(j) is the normalized performance indicator for sub-basin j, D_dj is the degree of diversification of performance indicator i, and W_i is the weight for indicator i corresponding to sub-basin j.
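The entropy weighting described above can be sketched with the standard formulation of the entropy method. Since Eqs. 5 to 8 are not reproduced in this text, the formulas in the comments are the commonly used ones and should be read as assumptions; the function name `entropy_weights` is hypothetical, and the columns of the input matrix stand for whichever alternatives (index j) are being compared.

```python
import math


def entropy_weights(normalized):
    """Entropy-based weights for m indicators compared over n alternatives.

    normalized[i][j] is the normalized value of indicator i for alternative j
    (all values positive, n >= 2). Assumed standard formulation:
      k_ij = y_ij / sum_j y_ij
      En_i = -(1 / ln n) * sum_j k_ij * ln k_ij
      D_i  = 1 - En_i
      W_i  = D_i / sum_i D_i
    """
    n = len(normalized[0])
    diversification = []
    for row in normalized:
        total = sum(row)
        shares = [y / total for y in row]
        entropy = -sum(k * math.log(k) for k in shares if k > 0) / math.log(n)
        diversification.append(1.0 - entropy)
    d_sum = sum(diversification)
    if d_sum <= 1e-12:
        # Every indicator is uniform across alternatives: fall back to equal weights.
        return [1.0 / len(normalized)] * len(normalized)
    return [d / d_sum for d in diversification]


# Indicator 0 is identical for all alternatives, so it carries no information
# and receives (almost) zero weight; indicator 1 varies and dominates.
weights = entropy_weights([[1.0, 1.0, 1.0], [1.0, 0.5, 0.1]])
```

This reproduces the behaviour described in the text: an indicator with high entropy (little variation across alternatives) receives a low weight.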
Compromise programming
We employed compromise programming, a Multiple Criteria Decision-Making (MCDM) technique, to rank the GPPs; it requires the weights of the selected performance indicators, which were assigned using the entropy method (refer to Section 2.4.2). Compromise programming (Eq. 9) assigns ranks based on the L_p value, where p is a parameter with a value equal to 1 for linear and 2 for squared Euclidean distance (p was taken as 2 in this study), y*_i is the normalized ideal value of a performance indicator, and y_i(j) denotes the normalized value of performance indicator i for sub-basin j.
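A minimal sketch of the ranking step follows. Eq. 9 is not reproduced in this text, so the L_p form used here, the weighted distance [sum_i W_i^p |y*_i - y_i(j)|^p]^(1/p), and the choice of the best normalized value of each indicator as its ideal value are assumptions; the function names are hypothetical.

```python
def lp_distances(weights, normalized, p=2):
    """L_p distance of every alternative from the ideal point.

    weights[i]: weight of indicator i.
    normalized[i][j]: normalized value of indicator i for alternative j.
    The ideal value of each indicator is taken as its best (largest)
    normalized value across alternatives (an assumption).
    """
    ideals = [max(row) for row in normalized]
    n = len(normalized[0])
    distances = []
    for j in range(n):
        total = sum(
            (w ** p) * abs(ideal - row[j]) ** p
            for w, ideal, row in zip(weights, ideals, normalized)
        )
        distances.append(total ** (1.0 / p))
    return distances


def ranks_from_distances(distances):
    """Rank 1 goes to the smallest distance (closest to the ideal point)."""
    order = sorted(range(len(distances)), key=lambda j: distances[j])
    ranks = [0] * len(distances)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


# Three alternatives, two equally weighted indicators (illustrative values).
d = lp_distances([0.5, 0.5], [[1.0, 0.5, 0.2], [1.0, 0.8, 0.4]], p=2)
ranking = ranks_from_distances(d)
```

The alternative with the smallest L_p value is closest to the ideal on all weighted indicators and is ranked first.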

Spatiotemporal variability of rainfall
There was a substantial spatial variation in annual rainfall across the study basin. The upper-west and eastern sides of the basin received less rainfall (< 1000 mm) than the lower (1600-2000 mm) and the middle (> 2100 mm) parts (Fig. 3a), which agrees with previous studies conducted in Nepal (Hamal et al. 2020; Hu et al. 1955). Several factors are responsible for the heterogeneous patterns of rainfall across the basin. The complex topography is one of the causes of orographic precipitation and its heterogeneity. Other contributing factors include differences in elevation, distance from the coast, the effects of monsoonal winds, and the timing and duration of the rainy season (Talchabhadel et al. 2018; Hamal et al. 2020; Hu et al. 1955). Compared to observation, PERSIANN (all three versions) and MERRA (both versions) significantly underestimated, while ERA5 overestimated, the annual rainfall across all sub-basins (Fig. 3a). IMERG_Early and IMERG_Late, which are based on passive microwave sensors and climatic calibration, demonstrate a similar pattern and overestimate rainfall in the southern part of the basin. IMERG_Final, which was calibrated using GPCC monthly ground-based observed data and ERA5, showed an improvement in rainfall capture capability in the eastern part of the basin (Fig. 3a). However, there is still a significant need for improvement in the western part of the basin. This suggests that further improvement in correction methods is required, potentially by incorporating another set of ground-based rainfall observations. Mediseti, located in the eastern part of the basin, is characterized by steep slopes and narrow valleys that lead to high rainfall due to its high elevation and proximity to monsoon winds. Both ERA5 and CHIRPS showed high rainfall amounts in Mediseti, similar to the ground-based observed rainfall.
In contrast, Kaligandaki in the northern part of the basin has high mountains and deep gorges, resulting in low to moderate rainfall due to its high elevation and distance from the monsoon winds.
All GPPs except ERA5 have overestimated rainfall in Kaligandaki, indicating a need to improve the wind and terrain correction algorithms in these GPPs.
To better understand the performance of GPPs in capturing the spatial variability in rainfall, the spatial correlation coefficient (SCC) plot between GPPs and GOR is shown in Fig. 3b. CHIRPS and IMERG_Final exhibited the highest SCC (0.75), followed by ERA5 (0.72). In contrast, IMERG_Early and IMERG_Late showed the lowest SCC (0.52) with GOR in capturing the spatial patterns of annual rainfall. The SCC of IMERG_Final was comparable with ERA5, but not with MERRA_2, as IMERG_Final depends on vertically integrated vapor data from the MERRA_2 dataset, while the other datasets use ERA5 data. However, the inter-SCC of IMERG_Final with ERA5 and MERRA_2 was found to be more than 0.76, indicating a significant dependency on each other. The lower spatial correlation of IMERG_Early and IMERG_Late with GOR may suggest the need for further calibration and correction with ground-based data. Moreover, CMORPH exhibited better spatial correlation with GOR compared to CPCGPRT, and at the same time showed a strong SCC (0.98) with CPCGPRT, indicating that the CPC Morphing technique used in CMORPH improves the capture of spatial rainfall patterns. These findings highlight the importance of carefully selecting precipitation datasets for accurate spatial and temporal analysis of rainfall patterns. Further efforts are needed to improve the calibration and correction methods of existing precipitation datasets.
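A spatial correlation coefficient of this kind can be sketched as the Pearson correlation between two co-located gridded fields, for example the mean annual rainfall of a GPP and of the gridded GOR on the common 0.25° grid. Whether the study computed it in exactly this way is an assumption; the function name `spatial_correlation` and the use of `None` for cells outside the basin are also hypothetical.

```python
import math


def spatial_correlation(field_a, field_b):
    """Pearson correlation across co-located grid cells of two 2-D fields.

    field_a, field_b: lists of rows with identical shape; a cell set to None
    in either field (for example outside the basin) is skipped.
    """
    pairs = [
        (a, b)
        for row_a, row_b in zip(field_a, field_b)
        for a, b in zip(row_a, row_b)
        if a is not None and b is not None
    ]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return math.nan  # correlation is undefined for a constant field
    return cov / math.sqrt(var_a * var_b)


# Illustrative 2 x 2 fields in mm/year; one cell lies outside the basin.
gauge_field = [[1000.0, 2000.0], [3000.0, None]]
product_field = [[900.0, 1700.0], [2500.0, 1200.0]]
scc = spatial_correlation(gauge_field, product_field)
```

Because the correlation is insensitive to a constant bias, a product can have a high SCC and still over- or underestimate rainfall, which is why SCC is read together with PBIAS.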
The Gandak basin receives the highest monthly rainfall, approximately 440 mm, in July and the lowest (6 mm) in November and December (Fig. 4). All GPPs except ERA5 and PERSIANN_CCS exhibited long-term monthly means similar to GOR during the dry period (Nov-Feb) (Fig. 4). However, during the monsoon season (May-Sep), ERA5 overestimated the rainfall (PBIAS = 36.59%), whereas PERSIANN, PERSIANN_CDR, PERSIANN_CCS, and CMORPH significantly underestimated it (PBIAS range: −38.39 to −52.96%). CHIRPS and IMERG_Final again outperformed the other GPPs in capturing the temporal rainfall patterns (with a correlation close to 1) on a monthly scale throughout the year (Fig. 4), followed by IMERG_Early, IMERG_Late, MERRA_2, and CPCGPRT. This is mainly because both CHIRPS and IMERG_Final are bias-corrected with monthly gauge-observed rainfall (refer to Table 1 for the detailed algorithms of the GPPs).
Products from similar sources, e.g., IMERG_Early and IMERG_Late, PERSIANN and PERSIANN_CCS, and MERRA_2 and MERRA_Land, perform very similarly, which might be due to the application of similar technologies to estimate precipitation. For example, IMERG is a global precipitation product created by NASA that uses data from multiple satellite sensors. IMERG_Early and IMERG_Late are different versions of this product that vary in their data sources and processing time. Despite these differences, they employ the same algorithms and techniques to estimate precipitation from satellite data, leading to similar results in certain regions, although this may not be the case for other areas. MERRA_2 and MERRA_Land both use 3D-Var data assimilation techniques on the GEOS-5 data assimilation system to estimate precipitation. Additionally, MERRA_Land utilizes observation-based data to improve data quality. PERSIANN and PERSIANN_CCS perform poorly in capturing monthly rainfall variations. These products are generated using Artificial Neural Networks from single-sensor infrared (IR) data. Therefore, improving their performance might require including data from various microwave sensors, since multi-source satellite products perform better than single-satellite products (Lei et al. 2022).
The statistical performance of GPPs based on infrared (IR) sensors, such as PERSIANN_CCS, was low because they use the "cold cloud duration" (CCD) technique in precipitation estimation (Hong et al. 2004). The CCD technique uses the duration of cold clouds (temperatures below −10 °C) as a proxy for precipitation. This technique assumes that all clouds with temperatures below the threshold will produce precipitation, which is not always true (Domenikiotis et al. 2003). IR sensors are known to have some limitations in accurately estimating precipitation. These limitations include the inability to differentiate between precipitation types (such as rain, snow, and sleet), the difficulty in detecting light precipitation, and the inability to accurately estimate precipitation in regions with high aerosol content or high surface reflectivity. Additionally, some studies have reported that IR-based datasets tend to overestimate rainfall, especially in convective precipitation systems. Previous studies have also reported low performance of IR-based precipitation datasets compared to other satellite-based datasets, such as those based on microwave sensors. For example, Kumar et al. (2022b) found that microwave-based precipitation datasets outperformed IR-based datasets in estimating precipitation over the Simat Khola River Basin, Nepal. Similarly, Dehaghani et al. (2023) found that microwave-based datasets detect precipitation events better than IR-based datasets.
Overall, an interpretation based solely on long-term monthly and annual plots cannot serve as a final recommendation because, in both cases, any increasing or decreasing rainfall trend over the study period is neglected. This analysis only provides a preliminary comparison of the GPPs' performance and was employed in this study as a quick screening step to eliminate some of the GPPs. Of the original 12 GPPs, we selected nine, namely CHIRPS, CMORPH, IMERG_Late, IMERG_Final, PERSIANN_CCS, PERSIANN_CDR, MERRA_2, CPCGPRT, and ERA5, for further performance assessment. Our selection process was based on two key criteria. First, we eliminated the GPPs that significantly underperformed in comparison to the others. Second, we made sure to include at least one GPP from each data source, even if its performance was comparatively poor.

Effect of elevation on GPPs' performance
Understanding the role of elevation in precipitation is necessary for hydro-meteorological studies since elevation influences the performance of GPPs (Hamal et al. 2020; Sharma et al. 2020; Lei et al. 2022). Thus, we assessed the dependency of the GPPs' performance on elevation in terms of both categorical (POD and FAR) and continuous (CC and PBIAS) variable performance indicators using the linear regression method, as shown in Figs. 5, 6, 7 and 8.
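The elevation dependency test can be sketched as an ordinary least-squares fit of a station-wise metric against station elevation. The sketch below is a minimal illustration under that assumption; the function name `linear_fit` and the station values are hypothetical. It returns the slope, intercept, and R² only; the P value reported in the figures would additionally require a t-distribution, which is left out here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical station elevations (m) and POD values (%), for illustration only.
elevation = [500.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
pod = [70.0, 68.0, 72.0, 69.0, 71.0, 70.0]
slope, intercept, r_squared = linear_fit(elevation, pod)
```

A slope near zero combined with a small R² corresponds to the case described in the text where a metric is only weakly dependent on elevation.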
POD and FAR were poorly correlated (R² < 0.1, P > 0.05) with elevation for most GPPs, and POD was less elevation-dependent than FAR. PODs were higher (> 60%) for CHIRPS, IMERG_Final, PERSIANN_CDR, PERSIANN_CCS, and ERA5 and were less influenced by elevation (almost zero slope) than for other GPPs. ERA5 showed the highest POD (> 80%), which may be attributed to its overestimation of rainfall and higher number of rainy days, as shown in Fig. 5 and Figure S1 of the supplementary material. Additionally, ERA5 incorporates a large number of historical observations through data assimilation, allowing for a more accurate estimation of precipitation by combining information from various sources, including satellite data and ground-based observations. In comparison, the lowest POD (between 20 and 40%) was found for CHIRPS at most of the stations. Finer-resolution GPPs such as CHIRPS (0.05°), IMERG_Late (0.1°), IMERG_Final (0.1°), and PERSIANN_CCS (0.1°) may exhibit greater fluctuations in POD with elevation than coarser-resolution GPPs. This is because finer-resolution data can capture more detailed information about precipitation patterns and variations in terrain, leading to greater variability in POD values (Duan et al. 2020). In addition, local effects such as topography and land use may also contribute to variations in POD values with elevation, and finer-resolution data can be more sensitive to them (Duan et al. 2020; Yu et al. 2020). These findings are consistent with previous research. For instance, Tadesse et al. (2022) found that CHIRPS, a finer-resolution GPP, exhibited lower POD (33.4%) than the coarser-resolution PERSIANN_CDR (56.9%) in the data-sparse Wabi Shebelle River Basin, Ethiopia. Similarly, Soo et al. (2019) reported that the finer-resolution product CMORPH had a lower detection capability (POD varied from 71.8 to 81.4%) than the coarser-resolution products TRMM and PERSIANN (POD varied from 84.6 to 96.2%) in three different sub-basins of Malaysia.
Fig. 3 (a) Spatial patterns of mean annual precipitation (mm/year). (b) Spatial correlation matrix plot of rainfall derived from gauge-based observed rainfall (GOR) and twelve Global Precipitation Products (GPPs) at 0.25° spatial resolution across the Gandak River Basin, Nepal (2003. The size and color of the circle represent the value of the correlation coefficient, e.g. a larger and darker red circle denotes a higher correlation coefficient
Similarly, FAR ranged between 20 and 60% at elevations lower than 2500 m but reached 60 to 80% in the higher elevation zone for all the GPPs. The possibility of falsely detecting a rainfall event is therefore significantly higher in higher elevation zones, and all the GPPs behaved similarly in terms of FAR. Still, ERA5 and PERSIANN_CDR, the best performers in terms of POD, performed worst in FAR. For a better-performing GPP such as IMERG_Final, POD and FAR were better at elevations < 2500 m and worse at elevations > 2000 m, which also holds for the other GPPs. The variation in performance is mainly due to the spatial heterogeneity of precipitation over complex terrain; GPPs have difficulty resolving the orographic effect (Hamal et al. 2020; Hu et al. 1955). The calculated POD, FAR, and ACC ranges in different elevation zones for IMERG_Final agree with past Nepal studies.
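For reference, the categorical indicators discussed above can be computed from a daily rain/no-rain contingency table. The sketch below uses the commonly used definitions (POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms), ACC = correct classifications / all days), which are assumed here to match the ones used in the study. The 1 mm/day rain threshold, the function name `categorical_scores`, and the sample series are illustrative assumptions.

```python
import math


def categorical_scores(observed, estimated, threshold=1.0):
    """Return (POD, FAR, ACC) for paired daily rainfall series in mm/day.

    A day counts as rainy when rainfall >= threshold (assumed 1 mm/day).
    """
    if len(observed) != len(estimated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    hits = misses = false_alarms = correct_negatives = 0
    for obs, est in zip(observed, estimated):
        obs_rain = obs >= threshold
        est_rain = est >= threshold
        if obs_rain and est_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
        else:
            correct_negatives += 1
    # Scores are undefined (NaN) when their denominator is zero.
    pod = hits / (hits + misses) if hits + misses else math.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else math.nan
    acc = (hits + correct_negatives) / len(observed)
    return pod, far, acc


# Hypothetical daily series (mm/day), for illustration only.
gauge = [0.0, 5.0, 12.0, 0.0, 0.5, 3.0, 0.0, 8.0]
product = [2.0, 4.0, 0.0, 0.0, 0.0, 6.0, 0.0, 1.0]
pod, far, acc = categorical_scores(gauge, product)
```

With these sample values there are three hits, one miss, one false alarm, and three correct negatives, so POD is 0.75, FAR is 0.25, and ACC is 0.75. A product that reports rain on most days raises POD and FAR together, which is the pattern described for ERA5.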
The continuous variable performance indicator PBIAS was more closely associated (R² < 0.21) with elevation than the CC for all the selected GPPs. The highest R² (0.23) between GPP performance in CC and elevation was found for CPCGPRT, while the lowest R² (0.01) was seen for PERSIANN_CCS (Figs. 7 and 8). The CC values of all reanalysis precipitation products (RPPs), namely MERRA_2, CPCGPRT, and ERA5, were more than 0.25 in elevation zones lower than 2500 m. The CC of PERSIANN_CCS was lower than 0.25 across the entire zone, while the CC for ERA5 was higher than 0.25, except for two stations. Overall, all the RPPs were better correlated with daily gauge-based observed rainfall than all the satellite precipitation products (SPPs) in all the elevation zones. IMERG_Final obtained the best CC among the SPPs, with values ranging from 0.25 to 0.40 for approximately 90% of the total stations. The PBIAS tended to increase with elevation: for almost all GPPs, higher PBIAS (> 100%) was observed in areas with elevations greater than 2500 m, excluding a few stations, while lower PBIAS values (< 25%) were observed in areas with elevations less than 2000 m, except at some stations. This indicates that, in general, GPPs tended to overestimate rainfall in higher elevation zones and underestimate rainfall in lower elevation zones (Yucel et al. 2011). The observed variation in PBIAS at different elevations may be attributed to the orographic effect, which causes enhanced rainfall on the windward side and reduced rainfall on the leeward side (Yoxtheimer et al. 2023). This effect is not adequately accounted for by any of the GPPs.
Negative PBIAS values were consistently higher at lower elevations, except in a few locations, indicating a general underestimation of rainfall in these regions. In the southern zone, higher precipitation is expected due to the orographic lift and convergence of monsoon winds, which first impact the southern part of the lower elevation zone. The complex terrain in this region can cause significant spatial variability in precipitation patterns, which highlights the need for localized models or data sources to accurately correct all GPPs. CHIRPS and IMERG_Final exhibited the smallest range of PBIAS (−25 to +25%) in areas with elevations less than 2500 m, suggesting that they provided more accurate estimates of precipitation in these regions. However, CHIRPS, IMERG_Final, PERSIANN_CDR, and MERRA_2 overestimated rainfall in areas with elevations greater than 2500 m. This underscores the limitations of GPPs in accurately estimating precipitation in regions with complex terrain and high elevations.
Figure caption (partial): the H-Line marks 60% POD; the Y-axis represents the POD value corresponding to the elevation on the X-axis, and R² and P are the coefficient of determination and statistical significance values between POD and elevation
Figure 9 shows the variations in the categorical variable indicators POD, FAR, and ACC of the nine selected GPPs for six sub-basins (Narayani, Kaligandaki, Budhigandaki, Trishuli, Marsandi, and Mediseti). The GPP that performed the worst (with POD values below 60%) was CHIRPS, followed by CPCGPRT, PERSIANN_CCS, and CMORPH for the six sub-basins. ERA5, IMERG_Final, IMERG_Late, and PERSIANN_CDR showed higher POD (> 75%) for all these sub-basins. Furthermore, FAR was lower than 20% for all the GPPs except PERSIANN_CCS, PERSIANN_CDR, and ERA5 in three sub-basins: Narayani, Kaligandaki, and Mediseti. Regarding FAR, ERA5 was the worst performer, with more than 25% for each sub-basin.
IMERG_Final was the best-performing GPP for all the sub-basins and had less than 15% FAR across 50% of the sub-basins. The accuracy (ACC) of all the GPPs, except PERSIANN_CCS, was more than 75% across all the sub-basins. CHIRPS and ERA5 showed comparatively low accuracy across the Budhigandaki, Trishuli, Marsandi, and Mediseti sub-basins. The ACC values were similar for IMERG_Late, IMERG_Final, PERSIANN_CDR, MERRA_2, and ERA5. Based on these three categorical performance indicators, IMERG_Final performed better than the other selected GPPs. Furthermore, all the precipitation products detected rainfall better in the larger sub-basins than in the smaller ones. This might be due to the greater number of rain gauges available in the larger sub-basins, which better capture the spatial patterns of rainfall. It can be concluded that the evaluation of the GPPs at the sub-basin scale is influenced by the number of gauge stations and the method of calculating the basin's areal rainfall. Therefore, the variations in the performance of these precipitation products in this research might be due to the lack of stations available in the smaller sub-basins.

Sub-basin-wise performance of GPPs
The bar diagrams of the continuous variable indicators (RMSE, CC, and PBIAS) for the six sub-basins (Narayani, Kaligandaki, Budhigandaki, Trishuli, Marsandi, and Mediseti) at the daily timescale are presented in Fig. 10a, b, and c, respectively. The lowest RMSE values (approximately 6 mm/day) with all the GPPs were found for the Narayani, Kaligandaki, and Trishuli sub-basins. In contrast, the highest RMSE (14 mm/day) with all the GPPs was observed for the Mediseti sub-basin. Interestingly, despite differences in the overall performance of the GPPs based on POD and FAR, all GPPs showed similar RMSE (around 8 mm/day) in all sub-basins. This suggests that while RMSE is a useful metric for evaluating overall accuracy, it may not capture differences in performance in specific regions or under varying conditions. These findings highlight the importance of using multiple evaluation metrics when comparing GPP performance. Precipitation estimates from all the GPPs, except PERSIANN_CCS, were better correlated with gauge-based observed rainfall at Narayani and Kaligandaki and less well correlated for the remaining sub-basins on a daily timescale. The CC of the RPPs was higher than that of the SPPs for the Budhigandaki, Trishuli, Marsandi, and Mediseti sub-basins. ERA5 and IMERG_Final had the highest CC values in the Narayani and Kaligandaki sub-basins, followed by CPCGPRT and MERRA_2; PERSIANN_CCS had the lowest CC value in all sub-basins. Negative PBIAS values were observed with all the GPPs, excluding ERA5, in all the sub-basins except Trishuli. Lower-magnitude PBIAS was observed for CHIRPS, IMERG_Late, and IMERG_Final. In comparison, higher PBIAS was seen for ERA5, CMORPH, and PERSIANN_CCS in all the sub-basins except Budhigandaki and Mediseti. The observed differences in the performance of GPPs in different sub-basins highlight the importance of considering the specific characteristics of a region when selecting a GPP for rainfall estimation.
For example, PERSIANN_CCS performed poorly across all sub-basins, suggesting that it may not be a suitable option for rainfall estimation in the study region. On the other hand, ERA5 and IMERG_Final demonstrated higher CC values in the Narayani and Kaligandaki sub-basins, indicating that they may be more appropriate for rainfall estimation in these regions.
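The three continuous indicators compared above can be sketched as follows. The formulas are the commonly used ones and are assumed to match those of the study: RMSE is the root mean square of the daily differences, CC is the Pearson correlation, and PBIAS = 100 × Σ(estimated − observed) / Σ(observed), so that a negative PBIAS denotes underestimation. The function name `continuous_scores` and the sample values are hypothetical.

```python
import math


def continuous_scores(observed, estimated):
    """Return (RMSE, CC, PBIAS) for paired daily rainfall series in mm/day."""
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("series must have equal length (at least 2)")
    diffs = [est - obs for obs, est in zip(observed, estimated)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    # CC is undefined (NaN) when either series is constant.
    cc = cov / math.sqrt(var_obs * var_est) if var_obs and var_est else math.nan

    total_obs = sum(observed)
    # Negative PBIAS means the product underestimates the gauge total.
    pbias = 100.0 * sum(diffs) / total_obs if total_obs else math.nan
    return rmse, cc, pbias


# Hypothetical daily values (mm/day), for illustration only.
gauge = [10.0, 20.0, 30.0]
product = [8.0, 18.0, 25.0]
rmse, cc, pbias = continuous_scores(gauge, product)
```

In this example the product underestimates the gauge total by 15% (PBIAS = −15) while remaining almost perfectly correlated with it, which shows why CC and PBIAS need to be read together.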
The negative PBIAS values observed for most of the GPPs, except ERA5, suggest a tendency towards underestimation of rainfall in the study area. The lower magnitude of PBIAS observed for CHIRPS, IMERG_Late, and IMERG_Final compared to ERA5, CMORPH, and PERSIANN_CCS may indicate that these products are better suited for capturing the characteristics of rainfall in the study region. However, the higher magnitude of PBIAS observed in ERA5, CMORPH, and PERSIANN_CCS cannot be ignored, and further investigation is needed to better understand the reasons for their bias and to improve their performance in the

Assigned rank of GPPs
GPPs are crucial for various applications, and selecting an appropriate GPP for a specific region is essential. However, not all precipitation datasets are suitable for all applications, so the performance of multiple datasets must be compared to select the best one. The chosen dataset must be able to replicate all the observed precipitation properties required for the particular application. Different statistical metrics are used to evaluate the performance of gridded precipitation datasets, but no single metric can measure all the dataset's properties simultaneously. Therefore, a combination of metrics is used to identify the best product. Because the metrics are not all equally important, their relative importance must be determined. To do so on a logical and mathematical basis, we assigned weights to the metrics using the entropy method, which assigns weights based on the degree of dispersion in the decision criteria. High entropy represents high uncertainty in the performance indicator, which means the indicator carries little variation in information and is less valuable, i.e., it receives a lower weight. Figure 11 shows the variation in weights assigned to six different metrics across the six sub-basins of the study area. Different weights were assigned to performance indicators based on their entropy values. POD and PBIAS are the most critical performance indicators for evaluating GPPs in our study area because they receive the highest weights. The assigned weights agree with the discussion of Figs. 9 and 10. We can clearly observe variations in the performance of different GPPs based on PBIAS and POD, emphasizing the importance of these metrics. However, distinguishing between GPP performances based on CC is difficult since all GPPs performed similarly across the sub-basins. This suggests that CC may be less critical in evaluating GPP performance.
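The entropy weighting idea can be illustrated with a minimal sketch of the standard entropy weight method, which is assumed here to correspond to the approach used in the study. Each column of the decision matrix is one performance indicator and each row is one GPP; a column whose values barely differ between products has entropy close to 1 and receives a weight close to 0. The function name `entropy_weights`, the requirement of non-negative scores (e.g. using the absolute value of PBIAS), and the sample matrix are assumptions made for illustration.

```python
import math


def entropy_weights(matrix):
    """Entropy-based weights for the columns (criteria) of a decision matrix.

    matrix[i][j] is the non-negative score of product i on criterion j.
    """
    n_products = len(matrix)
    n_criteria = len(matrix[0])
    if n_products < 2:
        raise ValueError("need at least two products")
    diversities = []
    for j in range(n_criteria):
        column = [row[j] for row in matrix]
        if any(value < 0 for value in column):
            raise ValueError("entropy weighting needs non-negative scores")
        total = sum(column)
        if total == 0:
            diversities.append(0.0)
            continue
        entropy = 0.0
        for value in column:
            share = value / total
            if share > 0:  # the limit of p * ln(p) as p -> 0 is 0
                entropy -= share * math.log(share)
        entropy /= math.log(n_products)  # scale entropy to the range [0, 1]
        diversities.append(max(0.0, 1.0 - entropy))
    spread = sum(diversities)
    if spread == 0:
        # No criterion separates the products: fall back to equal weights.
        return [1.0 / n_criteria] * n_criteria
    return [d / spread for d in diversities]


# Hypothetical scores for three products on [POD, CC], for illustration only.
scores = [[0.85, 0.40], [0.60, 0.40], [0.30, 0.40]]
weights = entropy_weights(scores)
```

In this example the CC column is identical for all products, so nearly all of the weight goes to POD, mirroring the observation that CC contributes little when all GPPs perform similarly on it.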
Furthermore, FAR and ACC received almost equal weightage, followed by RMSE, and CC received the lowest weightage. This indicates that FAR and ACC are relatively more important metrics in evaluating GPP performance, while CC is less significant. Regarding the weights assigned to metrics in individual sub-basins, POD received the highest weightage in all sub-basins except Budhigandaki, followed by RMSE and CC (see Figure S2 of the supplementary material). In the Narayani sub-basin, POD received the highest weightage, and CC received the lowest weightage. CC also received low weightage in all other sub-basins except Mediseti and Trishuli. The weights assigned to performance indicators in different sub-basins followed a similar pattern.
Fig. 9 Variations in the performance of the Global Precipitation Products (GPPs) in terms of probability of detection (POD), false alarm ratio (FAR), and accuracy (ACC) in the different sub-basins of the Gandak River Basin, Nepal, from 2003 to 2017. SPPs and RPPs represent satellite precipitation products and reanalysis precipitation products, respectively. The dotted black V-Line denotes the line of separation between SPPs and RPPs
With the assigned weightage, the GPPs in the six sub-basins of the study basin were ranked using compromise programming, based on their distance from the ideal solution (Table 3). IMERG_Final ranked first in five sub-basins and second in the remaining sub-basin. ERA5 received the first and second ranks in Kaligandaki and Mediseti, respectively. Both IMERG_Final and ERA5 performed exceptionally well in terms of POD and PBIAS, which received the highest weightage and resulted in the best ranking for these two products. None of the GPPs with a spatial resolution finer than 0.1° could secure a top-three ranking in any sub-basin. As discussed in Section 3.2, GPPs with a finer grid could not perform exceptionally well, leading to lower ranks in this section since our ranking approach relies entirely on evaluation metrics. PERSIANN_CCS, a real-time GPP, received the last ranking, which is not surprising since it consistently underestimated the rainfall and performed poorly throughout each evaluation step. Although CHIRPS was able to capture temporal and spatial rainfall patterns on monthly and yearly timescales, it was not ranked first in any sub-basin. The ranking of GPPs was based on only six indicators calculated on a daily timescale. CHIRPS showed the worst performance in all performance indicators except PBIAS across all sub-basins. Additionally, CHIRPS significantly underestimated the number of rainfall events of more than 1 mm/day (as seen in Figure S1 of the Supplementary Material), a major cause of its poor daily performance. This shows that a GPP does not necessarily perform similarly at different temporal scales, as illustrated by CHIRPS, which was superior at the monthly timescale but performed poorly at the daily timescale.
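The ranking step can be sketched with a generic compromise programming distance. For each product, the gap between its score and the best score on every criterion is normalized by the best-to-worst range, multiplied by the criterion weight, and combined into an Lp distance; the product with the smallest distance from the ideal solution ranks first. The exponent p = 2, the function name `compromise_rank`, and all sample scores and weights are assumptions for illustration and are not values from the study.

```python
def compromise_rank(scores, weights, higher_is_better, p=2):
    """Rank products by their weighted Lp distance from the ideal solution.

    scores maps a product name to its list of criterion values;
    higher_is_better[j] tells whether criterion j should be maximized.
    """
    names = list(scores)
    n_criteria = len(weights)
    columns = [[scores[name][j] for name in names] for j in range(n_criteria)]
    distances = {}
    for name in names:
        total = 0.0
        for j in range(n_criteria):
            if higher_is_better[j]:
                best, worst = max(columns[j]), min(columns[j])
            else:
                best, worst = min(columns[j]), max(columns[j])
            if best == worst:
                continue  # this criterion cannot separate the products
            gap = abs(best - scores[name][j]) / abs(best - worst)
            total += (weights[j] * gap) ** p
        distances[name] = total ** (1.0 / p)
    ranking = sorted(names, key=lambda name: distances[name])
    return ranking, distances


# Hypothetical scores on [POD, FAR, |PBIAS| in %], for illustration only.
scores = {
    "P1": [0.85, 0.30, 10.0],
    "P2": [0.60, 0.25, 40.0],
    "P3": [0.75, 0.50, 25.0],
}
weights = [0.5, 0.2, 0.3]             # e.g. taken from an entropy weighting step
higher_is_better = [True, False, False]
ranking, distances = compromise_rank(scores, weights, higher_is_better)
```

Here the hypothetical product P1 is best on the two most heavily weighted criteria and therefore ranks first even though it is not best on FAR, which is analogous to how products that score well on POD and PBIAS end up with the top ranks.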

Conclusions
The performance of 12 Global Precipitation Products was comprehensively assessed with several categorical and continuous variable performance indicators for hydro-meteorological applications in the Gandak River Basin, located in the Nepalese Himalayan region. CHIRPS and IMERG_Final showed better skill at representing the spatial and temporal patterns of the annual rainfall in the basin. Eleven out of twelve GPPs underestimated the annual rainfall in the basin, while ERA5 overestimated it. POD of rainfall was highest for IMERG_Final, ERA5, and PERSIANN_CDR; however, all GPPs have significantly high FAR. At higher elevations, all GPPs perform poorly, as evidenced by higher FAR and PBIAS and lower CC. High PBIAS (more than 100%) and low PBIAS (less than 25%) were found in the higher (> 2500 m) and lower (< 2500 m) elevation zones, respectively, for almost all the GPPs. Moreover, all the GPPs overestimated (underestimated) rainfall in the higher (lower) elevations. Overall, IMERG_Final scored better in the majority of the performance indicators, and it ranked first in five out of six sub-basins and second in the remaining one.
Fig. 10 Variations in the performances of the Global Precipitation Products (GPPs) in terms of (a) root mean square error (RMSE), (b) correlation coefficient (CC), and (c) percentage bias (PBIAS) in the different sub-basins of the Gandak River Basin, Nepal. SPPs and RPPs represent Satellite Precipitation Products and Reanalysis Precipitation Products, respectively. The dotted black V-Line and H-Line denote the line of separation between SPPs and RPPs, and the line of 6 mm/day for RMSE, 0.50 for CC, and ± 25% for PBIAS
Fig. 11 Variations in weights assigned to six different performance indicators using the entropy method across sub-basins of the study area
Although the analysis shows that there is plenty of room for improvement in estimating rainfall using GPPs, IMERG_Final would be the better choice among the existing datasets where daily rainfall data is required. The performance of GPPs at the monthly (and annual) scale was not consistent with that at the daily scale. At coarser temporal resolution (monthly and annual), CHIRPS performed as well as IMERG_Final. Given that CHIRPS has a finer spatial resolution (0.05°) and longer data availability (1981 onwards), it will be suitable for long-term water resources assessment, drought assessment, irrigation planning, and other such applications. However, IMERG_Final stands out among the others at the daily temporal scale. With high spatial (0.1°) and temporal (30 min) resolutions, IMERG_Final has several potential applications in hydropower, flood studies, water resources modeling, etc.
This study has contributed to the development of a comprehensive framework for evaluating the performance of GPPs, which can be applied in other river basins to improve the selection of accurate rainfall data in hydro-meteorological applications. The findings of this study also highlight the importance of considering the temporal scale when assessing the performance of GPPs. Further research is recommended to assess the performance of GPPs at a subdaily timescale, which can aid in flood risk assessment and management. However, there are some limitations associated with this study that could be addressed in future research. For instance, the analysis focused on daily performance metrics, and future studies could explore the performance of GPPs at an annual scale for each sub-basin. This would provide a more comprehensive understanding of the overall performance of GPPs in the study basin.