Bias correction method of high-resolution satellite-based precipitation product for Peninsular Malaysia

Satellite-based precipitation (SBP) is emerging as a reliable source of high-resolution rainfall estimates over the globe. However, uncertainty in SBP products remains significant, which limits their use without prior evaluation and, often, bias correction. The bias correction of SBP remains a challenge for atmospheric scientists. The present study evaluated the performance of six SBPs, namely, SM2RAIN-ASCAT, IMERG, GSMaP, CHIRPS, PERSIANN-CDR and PERSIANN-CCS, in replicating observed daily rainfall at 364 stations over Peninsular Malaysia. The bias of the most suitable SBP was corrected using a novel machine learning (ML)-based bias-correction method. The proposed method consists of an ML classifier to correct the bias in estimating rainfall occurrence and an ML regression model to correct the rainfall amount during rainfall events. In addition, the study evaluated the performance of widely used ML algorithms for classification and regression to select the most suitable algorithms for bias correction. IMERG performed best, with a higher coefficient of determination (R2) of 0.57 and Kling-Gupta Efficiency (KGE) of 0.5 than the other products. Random forest (RF) performed better than k-nearest neighbour (KNN) for classification and better than an artificial neural network (ANN) for regression. RF classified the rainfall events with a skill score of 0.38 and estimated the rainfall amount during rainfall events with a modified index of agreement (md) of 0.56. Comparison of IMERG and bias-corrected IMERG (BIMERG) revealed an average reduction in RMSE of 55% in simulating observed rainfall. The proposed method performed much better than conventional bias correction methods such as linear scaling and quantile regression. BIMERG could reliably replicate the spatial distribution of heavy rainfall events, indicating its potential for hydro-climatic studies such as flood and drought monitoring in the study area.


Significance Statement
A two-stage novel bias correction algorithm is proposed to correct the bias in the most suitable satellite-based precipitation product for Peninsular Malaysia to obtain a high-resolution, reliable precipitation dataset for hydrological modelling.

Introduction
Estimating the spatial distribution and temporal variability of rainfall is vital for any hydrological or climatic study (Ahmed et al., 2017). In-situ observations are the most reliable precipitation data; however, they are insufficient to provide details of spatial rainfall distribution in most regions due to the sparse distribution of rain gauges (Nashwan et al. 2019). Satellite-driven products are emerging as a dependable source of high spatial and temporal resolution rainfall measurement globally (Bhatti et al., 2016; Noor et al., 2020). Satellite-based precipitation products (SBP) have shown their potential in different hydroclimatic studies such as flood modelling, drought monitoring, water budgeting, and hydrological change assessment (Xie and Xiong 2011, Bitew and Gebremichael 2011, Yong et al., 2010; Nashwan et al., 2020).
Higher spatial resolution rainfall data is vital for capturing spatially heterogeneous rainfall patterns (Tan et al., 2015; Shiru et al., 2022). Highly localized intense rainfall is a common phenomenon in the tropical region due to higher convective activity (Sa'adi et al., 2020), and convective rainfall accounts for about 70% of the total rainfall (Badron et al., 2015). Such rainfalls are usually short and intense, with small areal coverage (Zafar and Chandrasekar 2004). Convective events mostly have a cell diameter of < 10 km, which is much smaller than the typical spacing between rain gauges in most regions of the globe (Schroeer et al., 2018). Hence, many rainfall events in the tropical region cannot be detected using the existing rainfall monitoring network. It has been reported that rising temperature would enhance convective moisture convergence (Schumacher 2015, Shiru et al. 2020), which would eventually increase the amount of convective rainfall and decrease its spatial extent (Wasko et al., 2016). Therefore, much higher spatial resolution rainfall data will be required for tropical rainfall analysis in the near future.
SBP are generally derived from passive microwave and infrared radiance precipitation retrievals. Several SBP datasets have been developed after the success of the Tropical Rainfall Measuring Mission (TRMM). The precision of SBP depends on the sensor used and the retrieval algorithm applied (Hsu et al., 1997). Recent rainfall retrieval algorithms such as the Integrated Multi-satellite Retrievals for Global Precipitation Measurement (GPM), known as IMERG (Huffman et al., 2015), the Goddard profiling algorithm (Kummerow et al., 2015) and SM2RAIN-ASCAT (Brocca et al., 2019) showed impressive rainfall estimation capability over various regions (Suliman et al., 2020). However, the capability of the sensors and retrieval algorithms varies widely with geography and climate. For example, many SBP products overestimate precipitation over deserts and underestimate precipitation over forests. The bias in SBP also varies significantly between arid and tropical climate zones (Nashwan et al. 2019, Ushio et al., 2009). Moreover, topography and cloud type add different types of bias to SBP (Sun et al., 2018; Serrat-Capdevila et al., 2016).
Biases in SBP over the tropical maritime continent of Southeast Asia are much more complex. Rainfall in this maritime continental region is governed by a complex interaction of ocean, irregular land-ocean interface and highly variable topography. Accurate estimation of such a complex rainfall process using the existing satellite sensors and retrieval algorithms is often not possible. Therefore, selecting the most appropriate SBP for such a region and correcting its biases are important before application in any hydroclimatological analysis (Katiraie-Boroujerdy et al., 2020).
Several studies have assessed the performance of SBP products over the maritime continent of Southeast Asia. Peña-Arancibia et al. (2013) evaluated three reanalyses and three SBPs. They showed that the reanalysis products performed well, except for the months affected by the Asia-Pacific monsoon. They also reported that SBP better estimates convective rainfall in the Southeast Asian region. Rauniyar et al. (2017) evaluated nine satellite products over the maritime continent and revealed inconsistent under- or overestimation in different areas, including mountain, ocean and coast. Mahmud et al. (2017) analyzed the skill of GPM precipitation in Malaysia and showed varying percentages of error in replicating the spatial variability of observed rainfall in different regions. Other studies also reported inconsistency in the performance of various SBP products. For example, Tan et al. (2015) found TRMM to perform better in Malaysia, while Soo et al. (2020) showed CMORPH to be the best SBP product. Soo et al. (2019) reported GSMaP-NRT to be a better product for simulating the flow of the Kelantan River, Malaysia. Semire et al. (2012) reported that among the three TRMM products (3B42 V6, 3B43 V6, 3A12 V6), 3B43 V6 showed a better correlation with the observed data in Peninsular Malaysia. A comparative study by Tan and Santo (2018) found that real-time IMERG performs better than other SBPs in Peninsular Malaysia. Tan and Duan (2017) also showed the least systematic bias in IMERG in detecting daily precipitation in Singapore. Overall, the review of the performance of SBP in Peninsular Malaysia indicates IMERG as the most reliable for rainfall estimation, though it has a significant bias. Recently, a new SBP, SM2RAIN-ASCAT, showed its potential in different regions (Gupta et al., 2019; Paredes-Trejo et al., 2018). However, the capability of SM2RAIN-ASCAT in the tropical maritime continental region compared to other products such as CHIRPS, PERSIANN and GSMaP has not been evaluated yet.
Though a large bias in SBPs in Peninsular Malaysia has been reported, no attempt has been made to correct it. Several efforts have been made to correct biases in SBP in other parts of the world using different methods, including linear regression (Yang et al., 2016; Alharbi, 2019), distribution function matching (Mastrantonas et al., 2019), mean bias correction (Hashemi et al., 2017, Chaudhary and Dhanya 2019), distribution mapping (Katiraie-Boroujerdy et al., 2020), linear scaling (Shiru et al. 2020) and a Bayesian algorithm (Ma et al., 2018). Recently, machine learning algorithms have also been introduced for satellite precipitation bias correction. Pratama et al. (2018) combined a genetic algorithm with a power transformation method for satellite precipitation bias correction. Le et al. (2020) used a neural network to correct satellite precipitation bias in the Mekong River basin. These studies revealed improvement in SBP performance after bias correction. However, significant bias still exists in replicating rainfall extremes such as consecutive dry days and extreme rainfall amounts, which are most important for estimating dry spells and floods. Therefore, there is a need for a better bias correction technique for SBP.

The objectives of the present study are (1) to compare the performance of recently available SBPs, namely, SM2RAIN-ASCAT, IMERG, GSMaP, CHIRPS, PERSIANN-CDR and PERSIANN-CCS, and (2) to reduce bias in the best performing SBP using a two-stage bias correction approach consisting of a classifier and a regressor. The novelty of the present study is a machine learning-based double correction approach for satellite rainfall. The bias adjustment methodology is developed in two main steps: classification (rain/no rain) and regression on rainy days. The bias-corrected IMERG data generated in this study can be used for hydrological and meteorological studies at a fine resolution.

Geography of Peninsular Malaysia
The methodology adopted in this paper was applied to Peninsular Malaysia, which is situated in Southeast Asia between latitudes 1.20° and 6.40° N and longitudes 99.35° and 104.20° E. Being near the equator, the region has a hot and humid climate. The rainforest climate of the region is controlled by Asian-Australian atmospheric dynamics, along with land-sea interaction, varying topography and monsoon winds (Webster et al., 1998). The average daily temperature varies between 21 and 32 °C, with an average annual variation of 3 °C. The annual average rainfall is about 2000-4000 mm, with an average of 150-200 rainy days per year (Tan et al., 2014; Noor et al., 2019). The distribution pattern of precipitation in the region is determined by the combined response of local topography and wind flow direction.
Peninsular Malaysia has two monsoon seasons, i.e. the Southwest Monsoon (SWM), which prevails from May to August, and the Northeast Monsoon (NEM), which prevails from November to February. Extreme rainfall events are generally observed during NEM, whereas the weather is relatively dry during SWM. Coastal areas are under the influence of NEM, whereas higher altitudes are less influenced by the monsoon (Ziarh et al., 2021). Peninsular Malaysia has humid weather, with maximum precipitation recorded during the 'inter-monsoon period'. Rain gauge density is a major source of uncertainty in evaluating any rainfall product. Better satellite precipitation evaluation is possible with dense observations (Gadelha et al., 2019). The World Meteorological Organization (WMO) recommended one station per 575 km² as the optimum threshold of rain gauge density (WMO, 1994). In this study, the records of 364 stations, distributed over Peninsular Malaysia covering an area of 132,265 km², were used. The gauging density is therefore one station per 363 km².

Data and sources
Data recorded at 432 rain gauges were acquired from the Department of Irrigation and Drainage (DID) Malaysia. Only the stations with less than 10% missing values (364 stations in total) were considered in this study. The locations of the observed stations are shown in Fig. 1. The inverse distance weighting (IDW) method was used to interpolate all the SBP rainfall to the locations of the selected 364 stations to evaluate their performance. IDW weights neighbouring points according to their distance from the station location, so that nearer points have more influence. It provides better interpolation when densely gauged data is available.
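The IDW step can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function name, the distance exponent of 2, the use of planar distances in degrees and the sample grid values are all assumptions made for illustration.

```python
import math

def idw_interpolate(grid_points, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    grid_points: list of (lon, lat, value) tuples for nearby satellite grid cells.
    target: (lon, lat) of the rain gauge.
    power: distance exponent (2 is a common default, assumed here).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lon, lat, value in grid_points:
        # Planar distance in degrees; adequate for a small neighbourhood sketch.
        distance = math.hypot(lon - target[0], lat - target[1])
        if distance == 0.0:
            return value  # the gauge coincides with a grid-cell centre
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical daily rainfall (mm) at four grid cells surrounding one gauge.
cells = [(101.0, 3.0, 10.0), (101.1, 3.0, 20.0), (101.0, 3.1, 30.0), (101.1, 3.1, 40.0)]
estimate = idw_interpolate(cells, (101.05, 3.05))
```

Because the hypothetical gauge sits at the centre of the four cells, all weights are equal and the estimate reduces to the simple average of the four values.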
SM2RAIN-ASCAT is a global-scale precipitation product generated from the European Space Agency Climate Change Initiative project soil moisture data (Brocca et al., 2014). SM2RAIN-ASCAT is available on a daily scale at a spatial resolution of 12.5 km for the period 2007-2019 (Ciabatta et al., 2018; Liu et al., 2011, 2012). Brocca et al. (2014) estimated rainfall through inversion of a soil moisture estimation equation, Eq. (1):
$$p(t) = Z^{*}\,\frac{ds(t)}{dt} + a\,s(t)^{b} \qquad (1)$$
where p(t) represents the computed precipitation; Z* is the soil moisture capacity; s(t) is the soil saturation at time t; and a and b are parameters describing the relation between drainage and soil saturation, which are obtained through a calibration procedure. The algorithm showed accurate results at global and regional scales (Ciabatta et al., 2015, 2017; Abera et al., 2016; Brocca et al., 2014). The computation of rainfall from soil moisture using SM2RAIN is based on the inversion of the soil water balance equation, which separates rainfall from runoff, evapotranspiration and infiltration. The data sets were obtained from https://doi.org/10.5281/zenodo.3635932.
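As a rough illustration of this inversion, the sketch below assumes the commonly cited SM2RAIN form p(t) = Z* ds(t)/dt + a s(t)^b, discretized with a finite difference between consecutive time steps. The discretization, the clipping of negative values to zero and all parameter values are assumptions for illustration only; they are not the calibrated settings of the SM2RAIN-ASCAT product.

```python
def sm2rain(saturation, z_star, a, b, dt=1.0):
    """Estimate rainfall (mm per step) from a relative soil saturation series.

    saturation: soil saturation s(t), each value between 0 and 1.
    z_star: soil moisture capacity Z* (mm); a, b: drainage parameters.
    """
    rain = []
    for prev, curr in zip(saturation, saturation[1:]):
        storage_change = z_star * (curr - prev) / dt   # Z* ds/dt
        drainage = a * ((curr + prev) / 2.0) ** b      # a s^b at the mid-step saturation
        rain.append(max(0.0, storage_change + drainage))  # negative estimates set to zero
    return rain

# Hypothetical saturation series and parameters.
rain = sm2rain([0.2, 0.5, 0.4], z_star=80.0, a=10.0, b=3.0)
```

In this toy series, the wetting step produces a positive rainfall estimate, while the drying step yields zero because the storage loss exceeds the drainage term.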
IMERG provides different versions of multi-satellite precipitation data. In this study, IMERG Version 06 Level 3 daily precipitation data of 10-km resolution was employed. The dataset is produced by the NASA GES Data and Information Services Center by aggregating the half-hourly GPM_3IMERGHH precipitation to daily totals, and it is released with a latency of 2-3 months. The data were obtained from the GPM website (http://pmm.nasa.gov/data-access/downloads/gpm/). Three types of products are generated from the IMERG algorithms (early, late and final run). The Early Run (IMERG-ER) uses forward propagation only, whereas the Late Run (IMERG-LR) uses both backward and forward propagation, which allows interpolation as well as extrapolation. The GPCC monthly precipitation is used to correct the Late Run to generate the IMERG Final Run (Huffman et al., 2020).
GSMaP is a multi-satellite product with global coverage developed under the Global Precipitation Measurement (GPM) mission. The global precipitation map has been developed using GPM core satellite data, the Dual-Frequency Precipitation Radar and other GPM constellation satellites. GSMaP has two products: GSMaP GC (Gauge Calibrated) and GSMaP NRT (Near Real-Time). The latter integrates the output of passive microwave radiometers and infrared images, whereas the former is a bias-adjusted product that uses NOAA CPC data with an algorithm developed by Mega et al. (2019). In this study, the GSMaP gauge-calibrated product, which has a spatial resolution of 0.1° × 0.1°, was used. The data is freely available at https://sharaku.eorc.jaxa.jp/GSMaP/index.htm.
Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) was developed by the Center for Hydrometeorology and Remote Sensing (CHRS) at the University of California, Irvine, by applying an artificial neural network (ANN) to infrared brightness temperature images of geostationary satellites. PERSIANN-CCS uses a threshold cloud segmentation algorithm to separate and classify cloud patches, whereas PERSIANN-CDR is developed using GridSat-B1 infrared data and bias-adjusted using the GPCP product. PERSIANN-CCS has a spatial resolution of 0.04° × 0.04°, whereas PERSIANN-CDR has a resolution of 0.25° × 0.25° (Nguyen et al., 2018). These two products were obtained from https://chrsdata.eng.uci.edu/.
Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS), developed to support the United States Agency for International Development Famine Early Warning Systems Network, is available at spatial resolutions of 0.25° and 0.05° (Funk et al., 2015). In this study, the CHIRPS data at 0.05° resolution was used.

Methodology
The satellite products were evaluated using categorical and continuous indices to assess their relative performance in replicating spatial and temporal variability of observed rainfall of Peninsular Malaysia. A two-stage ML-based bias correction method is proposed in this study to correct the bias of the best SBP product against the observed rainfall to improve its performance. The methods used in this study are elaborated in the following subsections.

Measuring data performance
The performance of SBP products was evaluated using five statistical metrics: coefficient of determination (R²), percentage of bias (Pbias), normalized root mean square error (NRMSE), modified index of agreement (md) and Kling-Gupta Efficiency (KGE), as given in Eqs. (2-6).
Fig. 1 Location of the study area and rain gauges. Red circles indicate the observed stations used for calibration, while the purple triangles represent the stations used for validation
In Eqs. (2-6), $x_{obs,i}$ and $x_{sat,i}$ are the i-th observed and satellite data; $\bar{x}_{obs}$ and $\bar{x}_{sat}$ are the mean observed and mean satellite data; N is the sample size, equal to the number of daily observations; $\sigma_{sat}$ and $\sigma_{obs}$ are the standard deviations of the satellite and observed data; and j is an arbitrary power (positive integer) used to calculate md, taken as j = 1 in this study. Values of NRMSE and Pbias near 0 and values of R², md and KGE near 1 represent better matching with the observed data.
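For readers who want to reproduce these continuous metrics, the following Python sketch uses their commonly published definitions. Several details are assumptions rather than statements of the study's exact formulas: Pbias is returned as a fraction (not multiplied by 100), NRMSE is normalized by the standard deviation of the observations, and sample (n - 1) standard deviations are used.

```python
import math

def continuous_metrics(obs, sat, j=1):
    """Return R2, Pbias, NRMSE, md and KGE for paired observed/satellite series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sat = sum(sat) / n
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / (n - 1))
    sd_sat = math.sqrt(sum((s - mean_sat) ** 2 for s in sat) / (n - 1))
    cov = sum((o - mean_obs) * (s - mean_sat) for o, s in zip(obs, sat)) / (n - 1)
    r = cov / (sd_obs * sd_sat)  # Pearson correlation
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sat)) / n)
    pbias = sum(s - o for o, s in zip(obs, sat)) / sum(obs)
    md = 1.0 - (
        sum(abs(o - s) ** j for o, s in zip(obs, sat))
        / sum((abs(s - mean_obs) + abs(o - mean_obs)) ** j for o, s in zip(obs, sat))
    )
    kge = 1.0 - math.sqrt(
        (r - 1.0) ** 2 + (sd_sat / sd_obs - 1.0) ** 2 + (mean_sat / mean_obs - 1.0) ** 2
    )
    return {"r2": r * r, "pbias": pbias, "nrmse": rmse / sd_obs, "md": md, "kge": kge}

# Hypothetical example: a satellite series that doubles every observed value.
obs = [0.0, 5.0, 10.0, 2.0]
sat = [0.0, 10.0, 20.0, 4.0]
scores = continuous_metrics(obs, sat)
```

In this toy case the correlation is perfect but the mean and the variability are both doubled, so R² stays at 1 while Pbias and KGE reveal the bias, which is why the metrics are read together rather than individually.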
Five categorical metrics, namely, hit rate (HR), Heidke Skill Score (HSS), Gerrity Skill Score (GSS), Hit Bias (HB) and Peirce Skill Score (PSS), were used to measure the capability of SBP to identify rainfall/no rainfall days. The mathematical equations of these metrics are given in Eqs. (7-11). The contingency table used for estimating the categorical metrics is given in Table 1. The explanation of the joint distribution of the output of the contingency table is given in Table 2.
In Eqs. (7-11), FA means false alarm, CN indicates correct negative and $s_{ij}$ is the scoring matrix. The HR, GSS and PSS range between 0 and 1, and HB from 0 to ∞, where 1 represents a perfect forecast (Gerrity 1992). The HSS ranges from −1 to +1, where a value closer to +1 shows better performance (Heidke, 1926).
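The contingency-table metrics can be sketched in Python as follows. The definitions are the commonly used textbook forms and are assumptions about the study's exact equations: HR is taken as hits / (hits + misses), HB as the frequency bias (hits + FA) / (hits + misses), and GSS is omitted because it needs the scoring matrix.

```python
def categorical_metrics(obs_wet, sat_wet):
    """Compute HR, HB, PSS and HSS from paired rain/no-rain flags (True = rain)."""
    hits = sum(1 for o, s in zip(obs_wet, sat_wet) if o and s)
    misses = sum(1 for o, s in zip(obs_wet, sat_wet) if o and not s)
    fa = sum(1 for o, s in zip(obs_wet, sat_wet) if s and not o)      # false alarms
    cn = sum(1 for o, s in zip(obs_wet, sat_wet) if not o and not s)  # correct negatives
    n = hits + misses + fa + cn
    hr = hits / (hits + misses)
    hb = (hits + fa) / (hits + misses)
    pss = hits / (hits + misses) - fa / (fa + cn)
    # Number of correct classifications expected by chance.
    expected = ((hits + misses) * (hits + fa) + (cn + misses) * (cn + fa)) / n
    hss = (hits + cn - expected) / (n - expected)
    return {"HR": hr, "HB": hb, "PSS": pss, "HSS": hss}

# Hypothetical eight-day record: 3 hits, 1 miss, 1 false alarm, 3 correct negatives.
observed = [True, True, True, False, False, False, False, True]
satellite = [True, True, False, True, False, False, False, True]
scores = categorical_metrics(observed, satellite)
```

With one miss and one false alarm, the number of predicted wet days equals the number observed, so HB is 1 even though the classification is not perfect; the skill scores capture that remaining error.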

Bias correction
A double correction approach is introduced in this study: zero rainfall days were first corrected using a classifier, and the rainfall amount on rainy days was then corrected using regression. To develop the classification model, the rainfall/no rainfall classes of the best SBP were used as input and the observed rainfall/no rainfall classes as output. The rainfall values on days classified as rainy were then corrected using a regression model. An individual model was developed for each month to account for seasonal rainfall variability. About 70% of the stations (255 stations) were randomly selected for model calibration, while model performance was evaluated at the remaining 109 stations. Rainfall data of all days of a month at all calibration stations were merged for model calibration. Validation was performed at each validation station separately.
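The control flow of this two-stage scheme can be sketched in Python as below. The classifier and regressor here are deliberately simple hypothetical stand-ins (a threshold rule and a linear function) rather than the trained ML models of the study; only the structure, a month-specific rain/no-rain decision followed by an amount correction on rainy days, reflects the description above.

```python
def two_stage_correct(records, monthly_models):
    """Apply a month-specific classifier, then a regressor on days classified as rainy.

    records: list of (month, satellite_rain_mm) pairs.
    monthly_models: {month: (is_wet, wet_amount)} where is_wet returns True/False
        and wet_amount returns the corrected rainfall (mm) for a rainy day.
    """
    corrected = []
    for month, sat_rain in records:
        is_wet, wet_amount = monthly_models[month]
        if is_wet(sat_rain):
            corrected.append(max(0.0, wet_amount(sat_rain)))  # stage 2: amount
        else:
            corrected.append(0.0)                             # stage 1: dry day
    return corrected

# Hypothetical stand-in models for two months (not fitted to any data).
toy_models = {
    12: (lambda r: r >= 1.0, lambda r: 1.5 * r),
    7: (lambda r: r >= 2.0, lambda r: 0.5 * r + 1.0),
}
records = [(12, 0.4), (12, 10.0), (7, 1.5), (7, 8.0)]
corrected = two_stage_correct(records, toy_models)
```

Keeping the two stages as separate callables makes it straightforward to swap in different classifiers and regressors, which is how alternative algorithms can be compared within the same framework.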
This study also evaluated the performance of two machine learning algorithms, random forest (RF) and k-nearest neighbour (KNN), to select the best classifier. The performance of two regression algorithms, RF and artificial neural network (ANN), was evaluated to find the best method to predict rainfall on rainy days. RF and KNN were chosen as classifiers because they have been reported among the most suitable classifiers (Fernández-Delgado et al., 2014). The two classical ML regression algorithms (RF and ANN) were used because of their ability to simulate complex nonlinear relationships effectively (Pour et al., 2020; Sa'adi et al., 2017). Recent studies that used the RF regression model to correct SBP (King et al., 2020; Beck et al., 2020) found it very effective in correcting the biases. They applied the RF model to the SNOw Data Assimilation System (SNODAS) and found an improvement of 86% in RMSE. The ML methods are described in the following sections. Details of the algorithms can be found in the literature cited in their description.

Random forest
RF (Breiman, 2001b) is amongst the most effective ML algorithms for predictive modelling. Continued improvements have made it applicable in various research fields and increased its significance in ML (Fawagreh et al., 2014). It is a flexible algorithm that can be used for both classification and regression. It introduces randomness into decision trees by bootstrap sampling, which makes RF less sensitive to overfitting (Heung et al., 2014; Nashwan and Shahid, 2019). The RF regression model is a kind of additive model that makes its prediction from a sequence of base models, as shown in Eq. (12):
$$\hat{f}(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x) \qquad (12)$$
Given a training set for bootstrap aggregating, $X = x_1, x_2, \dots, x_n$, with responses $Y = y_1, y_2, \dots, y_n$, the samples are bagged N times and a tree is fitted to each bootstrapped training set. Here $f_i(x)$ represents the individual random forest trees and $\hat{f}$ is the prediction at an unknown point using the bootstrapped data.
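Bootstrap aggregating can be illustrated with a small, self-contained Python sketch. To keep it short, each base model is a one-split regression tree (a stump) on a single feature, which is a simplification: a real random forest grows deeper trees and also randomizes the features considered at each split. The function names, the number of trees and the sample data are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature; return a predictor."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k][0] == pairs[k - 1][0]:
            continue  # no valid threshold between identical x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = sum((y - mean_left) ** 2 for y in left) + sum((y - mean_right) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best = (sse, threshold, mean_left, mean_right)
    if best is None:  # all x values identical: fall back to the mean response
        mean_all = sum(ys) / len(ys)
        return lambda x: mean_all
    _, threshold, mean_left, mean_right = best
    return lambda x: mean_left if x <= threshold else mean_right

def bagged_predict(xs, ys, x_new, n_trees=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(tree(x_new))
    return sum(predictions) / len(predictions)

# Hypothetical satellite (x) and gauge (y) rainfall pairs in mm.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [2.0, 2.5, 3.0, 20.0, 21.0, 22.0]
low = bagged_predict(xs, ys, 1.5)
high = bagged_predict(xs, ys, 11.5)
```

Averaging over many resampled trees smooths out the dependence of any single tree on the particular sample it was trained on, which is the property behind the reduced sensitivity to overfitting.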

K-nearest neighbour
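KNN is an efficient nonparametric classification algorithm that assigns data to a class based on its nearest neighbours (Huang et al., 2017). In a classification problem, assume that $T = \{x_n \in \mathbb{R}^d\}_{n=1}^{N}$ is a training set comprising N samples from M classes in d dimensions, where each sample $x_n$ is assigned the class label $c_n$. The distance between the unknown (query) point x and each training sample $x_n$ is estimated using the Euclidean distance, as given in Eq. (13):
$$d(x, x_n) = \sqrt{\sum_{i=1}^{d} (x_i - x_{n,i})^2} \qquad (13)$$
Next, the class of the query point x is estimated by majority voting among its k nearest neighbours, as shown in Eq. (14):
$$c(x) = \arg\max_{c} \sum_{x_n \in \mathrm{NN}_k(x)} \delta(c = c_n) \qquad (14)$$
where $\mathrm{NN}_k(x)$ is the set of the k nearest neighbours of x and δ(·) equals 1 when its argument is true and 0 otherwise.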
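The two steps of KNN, Euclidean distance followed by majority voting, can be sketched in a few lines of Python. The training values, labels and the choice k = 3 are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (features, label) pairs, where features is a tuple of floats.
    query: tuple of floats with the same length as the training features.
    """
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: satellite rainfall (mm) labelled with the observed class.
train = [
    ((0.0,), "dry"), ((0.2,), "dry"), ((0.5,), "dry"),
    ((4.0,), "wet"), ((6.0,), "wet"), ((9.0,), "wet"),
]
label_small = knn_classify(train, (0.3,))
label_large = knn_classify(train, (5.0,))
```

An odd k is used here so that a two-class vote cannot end in a tie.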

Artificial neural network
ANN is one of the most prevalent ML algorithms; it has been extensively studied and is commonly used in many fields. The ANN used in this study is a multilayer feed-forward network with the backpropagation learning algorithm (Sivapragasam et al., 2010). An ANN generally consists of an input, a hidden and an output layer (Urbanczik, 1996). The inputs (IMERG rainfall) are fed into the ANN through the input layer. A difficult task in ANN design is choosing the number of hidden layers. Generally, the number of layers is selected by trial and error (Ghaffari et al., 2006, Shankar and Bandyopadhyay 2007). The best performance in this study was found for an ANN with a single hidden layer. The hidden layer combines weights and biases with the inputs and applies activation functions to generate the output. A single input (x) is used in this study, which is combined with a vector of weights to determine the simulated output using Eq. (15),
where f(.) is the activation function, $w_j$ is the weight and $b_j$ is the bias at node j; x is the only input at each node of the hidden layer, or it acts as output in the case of a single layer; and the index j runs over the nodes of the layer. Each neuron has a transfer function that represents the neuron's internal activation level. Various transfer functions, such as hyperbolic tangent, linear and sigmoidal, are used for different relationships (Shankar and Bandyopadhyay 2007).
The generalized form of a sigmoidal transfer function is given in Eq. (16).
In this equation, $z_j$ represents the output at node j. In a multilayer neural network, the weight $w_j$ at each node is adjusted by the backpropagation method so that unknown data are estimated correctly.
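A forward pass through a network of this kind can be sketched as follows. The sketch assumes a logistic sigmoid in the hidden layer and a linear output node, and the weights are hypothetical constants rather than values learned by backpropagation; it only shows how a single input is turned into a prediction.

```python
import math

def sigmoid(y):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-y))

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Single-input, single-hidden-layer feed-forward pass with a linear output."""
    # Each hidden node j computes sigmoid(w_j * x + b_j).
    hidden = [sigmoid(w * x + b) for w, b in zip(hidden_weights, hidden_biases)]
    # The output node forms a weighted sum of the hidden activations.
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias

# Hypothetical weights for a network with two hidden nodes.
prediction = forward(0.0, [1.0, -1.0], [0.0, 0.0], [2.0, 4.0], 1.0)
```

With a zero input and zero hidden biases, both hidden nodes output 0.5, so the prediction is simply half of each output weight plus the output bias.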

Evaluation of satellite data
The spatial distributions of annual average rainfall estimated using observed data and SBPs are presented in Fig. 2. All the SBPs were interpolated to a resolution of 0.1° × 0.1° using the IDW method to evaluate their capability in replicating spatial rainfall distribution. Both IMERG and SM2RAIN-ASCAT indicated the high rainfall zone in the northeast coastal region and the low rainfall zone in the central part of the peninsula, whereas CHIRPS and PERSIANN-CCS overestimated rainfall over the entire study area and GSMaP showed relatively higher rainfall in the northeast coastal region. Overall, the comparison of the maps revealed the better capability of IMERG to replicate the spatial distribution of annual average rainfall. The assessment of map similarity using correlation analysis showed a high R² of IMERG with observed rainfall (0.56) compared to SM2RAIN-ASCAT (0.15), GSMaP (0.18), PERSIANN-CDR (0.14), PERSIANN-CCS (0.10) and CHIRPS (0.13) with a significance level > 0.001.
The rainfall time series of the six SBPs were also compared with the observed rainfall time series at each station for the period 2007-2019 to show their relative performance. Four statistical metrics were used for this purpose. Results are shown using box-whisker diagrams in Fig. 3. The box-whisker diagrams were prepared using the statistical metrics estimated at all 364 stations, where the horizontal line inside the box indicates the median value of a metric, while the height of the box represents its interquartile range.
The results showed that the lowest median NRMSE value was achieved by IMERG (0.9) and the highest value (3.8) by SM2RAIN-ASCAT, indicating that the error for IMERG is much less than that for SM2RAIN-ASCAT and all other products. IMERG also showed a good correlation with the observed station data compared to the other products. The R 2
Figure 4 shows the probability distribution function (PDF) of areal average monthly observed and satellite rainfall for the period 2007-2019. The results showed a better match of the observed PDF with the IMERG PDF than with those of the other SBPs. PERSIANN-CDR performed second best after IMERG, whereas SM2RAIN-ASCAT was the worst in replicating the observed rainfall probability. IMERG showed a slightly higher mean than the observed data, but it reliably replicated the observed rainfall variability.
IMERG showed a better correlation with observations and less bias in replicating the spatial and temporal variability of observed rainfall. Previous studies showed that a product with less bias and higher correlation with observations yields a better bias-corrected product (Valdés-Pineda et al., 2016). Therefore, only IMERG, among all SBPs, was selected for bias correction.

Bias correction
Although IMERG replicated the gauged rainfall better than the other products according to different statistics, there was still a significant bias in IMERG rainfall. The Pbias of IMERG was between −0.39 and 0.64, with a median of −0.16, as shown in Fig. 3. This indicates the need to improve the performance of IMERG rainfall. The performance of the proposed bias correction technique in detecting rainfall/no rainfall days and estimating the amount of rainfall on rainy days is discussed in the following subsections.

Performance of classifiers
Classifiers were developed to correct the number of rainfall days in IMERG data. RF and KNN were applied to IMERG to correct the zero rainfall days. The zero-rainfall-corrected data were compared with the observed data using the categorical indices at all the stations used for model validation. The IMERG rainfall days before and after classification using KNN and RF are presented using boxplots in Fig. 5. The results revealed that both classification algorithms improved the performance of IMERG in terms of all statistics. The performance comparison of RF and KNN revealed  1.20, 1.1 and 1.15, respectively, and the PSS were 0.31, 0.38 and 0.35, respectively. This indicates that the prediction of rainfall days was more accurate using RF than KNN.

Performance of regression models
Regression models were developed using both ANN and RF to predict the rainfall amount on rainy days. Their performance in estimating the rainfall amount on rainy days at all the stations used for model validation is presented using boxplots in Fig. 6. The performance of the RF and ANN regression models was evaluated using different statistical indices to determine the most suitable model. The relative performance of RF and ANN clearly shows the superiority of RF in estimating rainfall amounts on rainy days. The NRMSE, R², PBIAS and md for RF-estimated rainfall were 1.4, 0.34, 0.02 and 0.56 compared to 2.5, 0.13, −0.01 and 0.43, respectively, for ANN. Therefore, RF was used as the regression model for estimating rainfall in the proposed bias removal technique. The rainfall estimated by the RF model on rainy days was merged with the classifier output to generate the entire time series.

Comparison of performance with the conventional model
The performance of the newly developed bias-correction method was compared with two widely used conventional bias correction methods, linear scaling (LS) and quantile regression (QR). The relative performance of the methods was estimated using two categorical and two continuous indices, as presented in Fig. 7. The newly developed method showed higher performance than the conventional methods in correcting SBP bias. The RF reduced RMSE in rainfall by 55% compared to 20% by LS and 24% by QR, indicating about 125% better performance of the newly developed model than LS and QR.
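For context, the linear scaling baseline and the percentage reduction in RMSE can be sketched in Python as below. The multiplicative form of linear scaling (a single ratio of observed to satellite totals) is the commonly used variant for precipitation and is an assumption about the study's implementation; in practice the factor is usually derived separately for each calendar month. The sample series are hypothetical.

```python
import math

def rmse(obs, est):
    """Root mean square error between observed and estimated series."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))

def linear_scaling(obs_cal, sat_cal, sat_new):
    """Scale satellite rainfall by the ratio of observed to satellite totals."""
    factor = sum(obs_cal) / sum(sat_cal)
    return [factor * value for value in sat_new]

def rmse_reduction_percent(obs, raw, corrected):
    """Percentage reduction in RMSE achieved by a bias-corrected series."""
    before = rmse(obs, raw)
    return 100.0 * (before - rmse(obs, corrected)) / before

# Hypothetical example in which the satellite product doubles every observation.
obs = [0.0, 10.0, 20.0, 30.0]
raw = [0.0, 20.0, 40.0, 60.0]
scaled = linear_scaling(obs, raw, raw)
reduction = rmse_reduction_percent(obs, raw, scaled)
```

A purely multiplicative bias, as in this toy series, is removed completely by linear scaling; real satellite errors also include wrong rain/no-rain detection and nonlinear amount errors, which a single scaling factor cannot address.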

Performance in rainfall amount estimation
The BIMERG data was compared with observed data to estimate PBIAS and R² for each calendar day, using data of all grids in the study area. The results are plotted in Fig. 8, where the blue line represents BIMERG and the orange line represents IMERG before correction. The results revealed a large reduction of bias and a significant improvement in correlation after bias correction. The mean bias of IMERG for all the days considered for analysis was 158%, which was reduced to −13.3% in BIMERG. The mean R² for IMERG was 0.03, which increased to 0.24 after correction using the RF regression model. The R² was also found significant at p < 0.05. Therefore, BIMERG can be recommended for hydro-climatic studies in Peninsular Malaysia.

Performance in the estimation of spatial distributions of rainfall
Maps of the spatial distribution of observed and BIMERG annual and seasonal rainfall were prepared to show the ability of BIMERG to replicate the spatial variability of rainfall (Fig. 9). The northeast of the peninsula receives the highest amount of rainfall (3500-4000 mm/year). BIMERG also replicated this zone with an average annual rainfall of 3500-4000 mm/year. BIMERG was also able to reconstruct the low rainfall patches reliably. Similarly, BIMERG was capable of reconstructing the spatial variability of NEM and SWM rainfall. The higher amount of NEM rainfall in the coastal region of Peninsular Malaysia, in the range of 1000-1400 mm, was well simulated by BIMERG. The relatively low rainfall received in Peninsular Malaysia during SWM was also reproduced by BIMERG.

Performance in constructing extreme rainfall events
The performance of BIMERG in reconstructing extreme rainfall events was also evaluated to assess its applicability in hydro-climatic studies in Peninsular Malaysia. Two heavy rainfall events are shown as examples: (1) Event 1 (December 09, 2007): Several areas of Perak and Kelantan in the northern peninsula received high rainfall ranging between 37.5 and 137 mm, which caused flash floods in several regions. (2) Event 2 (December 18, 2014): Heavy rainfall caused floods in Kuala Terengganu, located in the northeast of the peninsula, when the peak rainfall at some of the stations exceeded 300 mm (405 mm at Kg. Keruak di Ulu Besut and 354.5 mm at Ibu Bekalan Sg. Angga, Ulu Besut station). Figure 10 shows the ability of IMERG and BIMERG to reconstruct Event 1. IMERG underestimated rainfall in high rainfall regions and overestimated it in low rainfall regions. BIMERG estimated the extreme rainfall of 137 mm at Machang and the observed rainfall variability (37.5-98 mm) in other places as 40-80 mm. The comparative performance of IMERG and BIMERG in reconstructing Event 2 is shown in Fig. 11. The figure shows that IMERG underestimated the event at most of the observed stations, whereas BIMERG estimated the event reliably. The extremely high rainfall amount of 405 mm was underestimated by BIMERG as 350 mm. However, BIMERG was able to estimate the overall rainfall amount and distribution reliably. The results indicate the potential of BIMERG for hydro-climatic studies like flood monitoring in Peninsular Malaysia.

Discussion
Accurate precipitation and temperature estimates are essential for hydrometeorological studies (Suliman et al., 2020). Satellite-based products are emerging as the most reliable source of spatially distributed weather estimates in data-scarce regions (Nashwan et al. 2019). Hydrometeorological studies require well-distributed and accurate datasets to simulate hydrological and ecological behaviour (Abdulkareem et al., 2018). Several researchers have compared and evaluated the performance of satellite precipitation products, such as CMORPH, PERSIANN-CDR, GSMaP_GC, GSMaP_NRT, IMERG (early, late and final runs), TRMM, GPCP, TMPA-3b42V7 and CHIRPS. Consistent with the present study, the majority of researchers found IMERG-F to be the best product for Peninsular Malaysia among all SBPs (Soo et al., 2020; DaSilva et al., 2021; Tan and Duan 2017). Soo et al. (2019) compared four SBPs to analyze their performance in replicating the 2014-2015 extreme flood. The study found that IMERG has a better ability to replicate extreme flood events than TRMM, CMORPH and PERSIANN. In that study, linear scaling, power transformation (PT) and local intensity scaling were applied to the data sets for bias correction. However, the conventional bias correction methods still leave some over- and underestimation in the corrected rainfall. In this study, the performance of six SBPs was evaluated against rainfall data obtained from 364 weather stations. Both categorical and continuous indices were used to assess their relative performance in replicating spatial and temporal variability. IMERG showed better performance than CHIRPS, PERSIANN-CSS, GSMaP, SM2RAIN-ASCAT and PERSIANN-CDR. The better performance of IMERG may be due to the use of dual-frequency radar sensors in GPM, which capture light rain more accurately (Su et al., 2019; Wei et al., 2018).
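As an illustration of how categorical and continuous indices of this kind can be computed for one station, a minimal Python sketch is given below. The specific choices are assumptions made for the example and are not taken from this section: a 1 mm/day rain/no-rain threshold, the probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI) as categorical indices, and the 2009 form of the Kling-Gupta Efficiency (KGE) as the continuous index. The function names are hypothetical.

```python
import math

def contingency(obs, sat, threshold=1.0):
    """Count hits, misses, false alarms and correct negatives for wet/dry days.
    The 1 mm/day threshold is an assumed value."""
    hits = misses = false_alarms = correct_neg = 0
    for o, s in zip(obs, sat):
        o_wet, s_wet = o >= threshold, s >= threshold
        if o_wet and s_wet:
            hits += 1
        elif o_wet:
            misses += 1
        elif s_wet:
            false_alarms += 1
        else:
            correct_neg += 1
    return hits, misses, false_alarms, correct_neg

def categorical_indices(obs, sat, threshold=1.0):
    """Return POD, FAR and CSI computed from the contingency counts."""
    h, m, f, _ = contingency(obs, sat, threshold)
    nan = float("nan")
    return {
        "POD": h / (h + m) if h + m else nan,
        "FAR": f / (h + f) if h + f else nan,
        "CSI": h / (h + m + f) if h + m + f else nan,
    }

def kge(obs, sat):
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sat) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sat) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sat)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both functions take two equal-length daily series (observed and satellite rainfall in mm) for a single station; a perfect product would give POD = 1, FAR = 0, CSI = 1 and KGE = 1.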
The literature suggests that bias correction reduces the uncertainties in IMERG data and improves its ability to replicate observed rainfall (Soo et al., 2020). Therefore, a double correction approach was adopted in this study, where classification and regression were applied in two steps to reduce the uncertainties. A bias correction model was developed for each month to account for seasonal variability. Two classifiers, RF and KNN, were applied to correct the occurrence of zero-rainfall days in IMERG data. The results revealed that both classification algorithms improved the performance of IMERG in terms of all statistics, with RF performing better than KNN. The better performance of RF may be related to its efficiency in handling large datasets. Besides, RF does not require feature scaling, unlike distance-based classifiers such as KNN. Moreover, RF is more robust to the selection of training data and to noise in the training dataset (Misra and Li 2020). RF also has a fast processing time and the capability to avoid overfitting, as reported in other fields of research (Breiman, 2001a; Caruana et al., 2008).
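The structure of the two-step, month-wise correction can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the study: a simple k-nearest-neighbour rule written in plain Python stands in for both the classifier and the regressor (the study found RF to perform better), the satellite estimate is the only predictor, and the 1 mm/day wet-day threshold, the class name `TwoStageCorrector` and its methods are hypothetical.

```python
from collections import defaultdict

WET_THRESHOLD = 1.0  # mm/day; assumed rain/no-rain cut-off, not taken from the paper

def _nearest(train, x, k):
    """Return the k training pairs whose satellite value is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]

class TwoStageCorrector:
    """Step 1: KNN classifier for rainfall occurrence (wet/dry day).
    Step 2: KNN regressor for rainfall amount, trained on observed wet days only.
    A separate pair of models is kept for each calendar month."""

    def __init__(self, k=3):
        self.k = k
        self.occurrence = defaultdict(list)  # month -> [(satellite, observed_is_wet)]
        self.amount = defaultdict(list)      # month -> [(satellite, observed)] on wet days

    def fit(self, months, sat, obs):
        for m, s, o in zip(months, sat, obs):
            wet = o >= WET_THRESHOLD
            self.occurrence[m].append((s, wet))
            if wet:
                self.amount[m].append((s, o))
        return self

    def predict_one(self, month, s):
        # Step 1: majority vote of the nearest neighbours decides occurrence.
        votes = [wet for _, wet in _nearest(self.occurrence[month], s, self.k)]
        if not votes or 2 * sum(votes) <= len(votes):
            return 0.0
        # Step 2: the amount is the mean observation of the nearest wet-day neighbours.
        neighbours = _nearest(self.amount[month], s, self.k)
        if not neighbours:
            return 0.0
        return sum(o for _, o in neighbours) / len(neighbours)
```

A call such as `TwoStageCorrector(k=3).fit(months, sat, obs).predict_one(12, 0.8)` returns the corrected rainfall for a December day on which the satellite reports 0.8 mm. Because the regressor only sees wet days, errors made by the classifier propagate directly into the corrected amounts.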
Two regression models, RF and KNN, were developed to predict rainfall amounts on rainy days. The best performing ML model was compared with conventional bias correction methods, such as linear scaling and quantile mapping. The newly developed method showed higher performance than the conventional methods in correcting SBP bias. RF reduced the RMSE in rainfall by 55%, compared to 20% by LS and 24% by QR, an RMSE reduction more than twice that achieved by either conventional method. A similar result was observed by Chaudhary and Dhanya (2020) in the bias correction of IMERG over India. The better performance of RF regression may be attributed to its simultaneous use of the bagging and randomization approaches (Breiman, 2001a).
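For reference, the conventional linear scaling baseline and the percentage reduction in RMSE used for such comparisons can be sketched in a few lines of Python. This is a minimal sketch assuming the multiplicative form of linear scaling, in which satellite rainfall is rescaled by the ratio of observed to satellite totals over a calibration period; in practice the factor is usually computed per month. The function names are hypothetical.

```python
import math

def rmse(obs, est):
    """Root mean square error between observed and estimated rainfall."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def linear_scaling(sat_calib, obs_calib, sat_target):
    """Multiplicative linear scaling: rescale the target series by the ratio of
    observed to satellite rainfall totals over the calibration period."""
    factor = sum(obs_calib) / sum(sat_calib)
    return [s * factor for s in sat_target]

def rmse_reduction_percent(obs, raw, corrected):
    """Percentage reduction in RMSE of the corrected series relative to the raw one."""
    before = rmse(obs, raw)
    return 100.0 * (before - rmse(obs, corrected)) / before
```

Because linear scaling applies a single factor, it adjusts the mean but cannot remove false rain days or reshape the intensity distribution, which is one reason a classifier followed by a regressor can reduce RMSE further.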
To further validate the performance of BIMERG in Peninsular Malaysia, the spatial distributions of observed and BIMERG annual and seasonal rainfall were compared. BIMERG was able to replicate the annual, NEM and SWM rainfall as well as the low rainfall values. The IMERG and BIMERG data during two actual flood events were compared to examine their performance in simulating extreme rainfall events. IMERG underestimated the rainfall, whereas BIMERG closely resembled the extreme rainfall amount during both events. This result agrees with DaSilva et al. (2021), who concluded that bias correction of IMERG is essential before its use in extreme rainfall analysis. The novel two-step bias correction approach estimated the low rainfall and replicated the extreme rainfall events. This indicates the potential of BIMERG for hydro-climatic studies like flood monitoring in Peninsular Malaysia.

Conclusion
The performance of SM2RAIN-ASCAT, CHIRPS, GSMaP, PERSIANN-CDR, PERSIANN-CSS and IMERG in replicating observed daily rainfall data over Peninsular Malaysia was evaluated, and the bias of the most suitable SBP was then corrected using a novel two-stage bias-correction method. Comparing the six SBPs with the observed data showed lower errors and a higher correlation with observed rainfall for IMERG than for all other data sets. However, the results also revealed that IMERG still contains significant biases. A novel two-stage bias correction method based on ML methods was therefore proposed to correct the bias in IMERG. The significant enhancement in the capability of BIMERG indicates the effectiveness of the method. Using a classifier before correcting the amount of rainfall through regression made the bias-correction method highly efficient. However, the method's performance largely depends on the rainfall/no-rainfall day classification. Therefore, efficient classification and regression algorithms should be employed for better performance of the method. In this study, two ML-based classification algorithms and two regression algorithms were evaluated. In the future, the potential of other ML algorithms for classification and regression should be explored to improve the bias-correction method. Besides, the classifier could be used to assign rainfall events to different classes of rainfall intensity, and the regression model could then be employed to estimate the rainfall amount within each intensity class to explore the possibility of improving model performance.