Comparison of spatial interpolation methods for estimating the precipitation distribution in Portugal

Precipitation has a strong and constant impact on different economic sectors, environment and social activities all over the world. An increasing interest for monitoring and estimating the precipitation characteristics can be claimed in the last decades. However, in some areas, the ground-based network is still sparse and the spatial data coverage insufficiently addresses the needs. In the last decades, different interpolation methods provide an efficient response for describing the spatial distribution of precipitation. In this study, we compare the performance of seven interpolation methods used for retrieving the mean annual precipitation over the mainland Portugal, as follows: local polynomial interpolation (LPI), global polynomial interpolation (GPI), radial basis function (RBF), inverse distance weighted (IDW), ordinary cokriging (OCK), universal cokriging (UCK) and empirical Bayesian kriging regression (EBKR). We generate the mean annual precipitation distribution using data from 128 rain gauge stations covering the period 1991 to 2000. The interpolation results were evaluated using cross-validation techniques and the performance of each method was evaluated using mean error (ME), mean absolute error (MAE), root mean square error (RMSE), Pearson’s correlation coefficient (R) and Taylor diagram. The results indicate that EBKR performs the best spatial distribution. In order to determine the accuracy of spatial distribution generated by the spatial interpolation methods, we calculate the prediction standard error (PSE). The PSE result of EBKR prediction over mainland Portugal increases from south to north.


Introduction
Precipitation has a major influence on environment and society at different spatial and temporal scales. Inadequate information on climate factors, including precipitation, can generate significant costs to different sectors, such as agriculture, infrastructure and services. Accurate information about spatial distribution of the rainfall would improve the management of natural resources as well as the prediction of natural disasters resulted from floods (Hao and Chang 2013) or landslides (Lenda et al. 2016). However, precipitation data are often sparsely distributed in space and discontinuous in time, making the development of consistent climatology in certain areas difficult. Since it is expensive and sometimes difficult to cover regions such as the mountain or urban areas, the spatial interpolation methods represent a good alternative for generating continuous spatial information using the available measurements over certain areas.
One of the earliest definitions of spatial interpolation is given by Lam (1983), as a function that can best represent the study area by using a data set to estimate other values. Over the years, different interpolation methods have emerged, being classified into two broad categories: deterministic and geostatistical (Ly et al. 2011). Deterministic methods, such as local polynomial (LPI), global polynomial (GPI), radial basis function (RBF) or inverse distance weighted (IDW), generate continuous distribution of precipitation, starting from measured points using mathematical formulas to determine the similarity or degree of smoothing. Geostatistical interpolation methods, such as ordinary cokriging (OCK), universal cokriging (UCK) and empirical Bayesian kriging regression (EBKR), use statistical methods for generating spatial distribution (Li and Heap 2014).
When interpolation methods use a single variable in estimation, they are called univariate, and when they include co-variables, they are called multivariate. GPI is one of the univariates methods which establish one equation for the entire data set. It has the advantage to capture the overall trend, but is not suitable for places where the variation is higher. In contrast, LPI establish an equation for a specified neighbourhood, being able to capture the local variation of precipitation which can lead to better results (Wang et al. 2014). IDW is influenced by the measured precipitation in the immediate vicinity of the new prediction and performs better if the data is uniform distributed. It has the advantage to capture the extreme precipitation. One of the disadvantages of IDW is the presence of the bull's eye and the limitation of prediction between minimum and maximum of measured precipitation, not being able to estimate outside the range (Achilleos 2011;Chen and Liu 2012). RBF is a fast method that can predict outside the measured precipitation. It generates better prediction when is used for a larger data set. The method is not suitable if precipitation presents high variance in short distance (Apaydin et al. 2004;Johnston et al. 2001).
Multivariate methods can include co-variables. This may lead to more accurate estimates depending on the co-variables used (e.g. altitude, latitude, longitude) and the research area (Table 1). The multivariate extension of kriging differs one to another in terms of dealing with modelling the trend of precipitation (Goovaerts 1998). OCK assumes the mean of precipitation to be constant and unknown, while UCK assumes to be nonstationary. EBKR has the advantage to automatically configure the construction process of a kriging model, reducing the operator's interaction to the difficult aspects, being considered the most difficult stage in geostatistical interpolation. The disadvantage of the EBKR method is the long processing time when the size of the data subset and the overlapping factor are larger (Gribov and Krivoruchko 2020;Gupta et al. 2017).
In the specialized literature, there are studies where both categories of interpolations have been applied (Table 1). In Portugal, comparative studies have been made by different researchers. For example, Goovaerts (2000) compared the result obtained by three multivariate geostatistical algorithms (SK local means, kriging with external drift and collocated kriging), using DEM (Digital Elevation Model) as co-variable. The methods were compared with three univariate methods (Thiessen polygon, IDW and ordinary kriging (OK)). The results of the study showed that the multivariate methods outperform univariate methods. In another study applied in southern Portugal, Costa et al. (2008) mapped the extreme precipitation index (R5D), through sequential cosimulation, using altitude as co-variable. Their results show that altitude and precipitation vary locally and the maps generated can be used in hydrological management as well as in different climate applications. Pereira et al. (2010) tested four deterministic methods (IDW, spline with tension (ST), thin plate spline (TPS) and LPI) and two geostatistical methods (OK and COK) in order to estimate the monthly precipitation in September, November and December in the mountainous area of Portugal. The results indicate that precipitation is dependent on topography and the most precise interpolation method is ST.
Belo- Pereira et al. (2011) developed a daily grid data set of precipitation named IB02. Their study was applied to Portugal and Spain, using data measured from 2000 weather stations over a period of 53 years . Precipitation was interpolated with OK, and the result was compared with four global gridded data sets (ERA-40, ERA-Interim, Climate Research Unit and Global Precipitation Climate Center). The results of the study show that IB02 have similar performance with the rest of the gridded data sets, especially in the northwest and southeast. IB02 largely captures the particularities of the spatial distribution of precipitation.
In a recent study, Álvarez-Rodríguez et al. (2019) use a hybrid model to estimate monthly precipitation in Portugal by combining two interpolation models. Firstly, the monthly values and standard deviation are analysed using a moving regression equation based on altitude to highlight the local and seasonal variability of precipitation. Secondly, a weighted technique to highlight the non-linearity of precipitation is used. The result is a model that can be applied on large areas with complex surface.
The studies made till now show that precipitation is difficult to estimate and there is no confirmed method to estimate the spatial distribution of precipitation, as the performance of methods depends on a number of factors (e.g. surface type, data distribution, sample size, stratification, sample clustering, co-variable) (Ahmed et al. 2014;Arowolo et al. 2017;Daly 2006). Therefore, the comparison of interpolation methods remains necessary to identify the most appropriate method to be used in the prediction.
As a consequence, this study aims (1) to compare the performance of four univariate (GPI, LPI, IDW, RBF) and three multivariate spatial interpolation methods (OCK, UCK, EBKR); (2) to select the best co-variables for explaining the variability of precipitation; and (3) to choose the most precise interpolation method over the entire mainland Portugal and calculate the prediction standard error (PSE) as a part of uncertainty.
MLR Multiple linear regression; GWR geographically weighted regression; OLS ordinary least square; RK regression kriging; UK universal kriging; ANN artificial neural

Study area
Portugal is located in the south-western edge of continental Europe, it spans over an area between 37° and 42°N and by − 6.5° and − 9.5°W, and it shares the Iberian Peninsula with Spain. According to the Köppen-Geiger classification (Peel et al. 2007), mainland Portugal lies in the temperate climate zone which is conditioned by large scale circulation patterns, driven by subpolar transient low-pressure systems during the winter and by a mid-latitude semi-permanent anticyclone during the summer. In this typical dry season, climate contrasts arise between inland hot weather and mild temperatures along the coast (Cunha et al. 2011).
Another climate factor regards the geography which is characterized by typical elevation values between 1000 and 1500 m and a maximum height of 1993 m at Serra da Estrela ( Fig. 1), embedded in a northeast-southwest mountain range in the central region of mainland Portugal. The precipitation distribution is characterized by large spatial variations, with annual precipitation ranging between 400 at southeast inland and 2200 mm in the north-western region (Soares et al. 2012).

Data analysis
This study employs the multiannual precipitation amounts over 1991-2000 from 128 weather stations with complete data sets downloaded from National Informational System for Water Resources website (SNIRH) (https:// snirh. apamb iente. pt) ( Fig. 1; Table 1 of Online Resource 1). Precipitation data quality is primarily controlled by the SNIRH (https:// snirh. apamb iente. pt). SNIRH comprises a network of automatic and conventional weather stations which are non-uniformly  distributed across mainland Portugal. Table 2 presents the summary statistics of data.

Co-variables selection
In order to reduce the complexity of interpolation methods and to use only variables with significant impact, we analysed the degree of correlation (R) between precipitation amounts and the following co-variables: latitude, longitude, altitude, slope and aspect. The 30-s resolution DEM for mainland Portugal has been downloaded from http:// www. diva-gis. org/ gdata, and all the DEM predictors have been calculated in ArcGIS Pro using Geoprocessing tool. Slope was calculated in degrees, altitude in metres and aspect using planar method. All the co-variables were extracted at each weather station.
The results show that only latitude, altitude and slope contribute to the explanation of precipitation in Portugal. The altitude returns the strongest correlation, R = 0.64, and the aspect has the weakest relationship (Table 3).

Spatial interpolation methodology
The interpolation analysis was performed using the Geostatistical Wizard (GW) tool from ArcGIS Pro and the statistical analyses were made in R Studio. In order to compare the spatial interpolation methods, we generate the mean annual precipitation distribution using data from 128 rain gauge stations. The performance of interpolation was evaluated using cross-validation (CV), while the difference between the measured and the estimated values is quantified using mean error (ME), root mean square error (RMSE) and mean absolute error (MAE). The linear dependent similarity between measured values and estimated ones was also calculated using the correlation coefficient R. Also, we graphically illustrate the performance of interpolation methods using Taylor diagram. The prediction standard error (PSE) was calculated for the interpolation methods with the best results, as a part of uncertainty in order to choose the suitable method. The entire methodological flow adopted in this study is presented in Fig. 2.

Interpolation methods
The interpolation methods are briefly described in this section, while more detailed presentations are provided by Isaaks and Srivastava (1989), Johnston et al. (2001), Li and Heap (2008), Longley et al. (2015) Webster and Oliver (2001).

GPI
GPI is a fast and global interpolator, which relies on polynomial function to predict precipitation (Ali et al. 2012;Apaydin et al. 2004;Wang et al. 2014). Being a global method, GPI uses all the measured precipitation to estimate the new ones and does not use local information (Ali et al. 2012). Webster and Oliver (2007) stated that, theoretically, GPI establish a plan between measured precipitations as if one were attempting to fit a surface between them. GPI involves few decisions to make and no assumption is required. The method has the advantage to capture the overall trend of precipitation, but is not capable of taking in consideration the local variation (Johnston et al. 2001;Wang et al. 2014). GPI is suitable for places where precipitation is changing slowly at a regular pace (Javari 2016). The challenge of method is to identify the degree of polynomial in order to fit the precipitation surface. In the present study, the degree of the polynomial was tested for values between 1 and 9.

LPI
LPI is more flexible than GPI and involves more parameters to be set up. The prediction of precipitation is mainly based on the spatial extent of area specified by the operator (Ali et al. 2012;Wang et al. 2014). The method fits for each area a new polynomial equation based on the following parameters: maximum and minimum of neighbours, sector, neighbourhood type and kernel type (e.g. exponential, Gaussian, quartic and constant) (Johnston et al. 2001). In order to have accurate prediction of precipitation, the measured data must have an equal distance between them and inside neighbourhood to be normal distributed. LPI can provide finer details of precipitation when it is used in narrower areas (Luo et al. 2008). The order of the polynomial function was tested for values between 0 and 10. And the polynomial equation was calculated using all type of kernels. The weight of the predicted precipitation ( i) can be calculated as: where d i is the distance between measured and predicted precipitation; R represents the neighbourhood area took in consideration; and p is the order of polynomial function defined by operator.

IDW
IDW is an exact interpolation method (it passes through each measured point) and can estimate precipitation only between minimum and maximum of measured precipitation (Li and Heap 2014). It is based on the assumption that things located nearby have closer values than those which are located more far away (Borges et al. 2015). The method has no assumption about data and has few parameters to be set up (Garnero and Godone 2013). In some studies, the method performed well when data was regularly spaced (Isaaks and Srivastava 1989). IWD is capable of including co-variables (Chen and Liu 2012), but in our study, we use it as univariate method.
The predicted precipitation uses only the measured precipitation located in the immediate vicinity. It is influenced by distance, so that a greater weight will be assigned to values located near than those at a greater distance (Borges et al. 2015;Johnston et al. 2001;Shepard 1968). In GW, the weight of each measured precipitation is control by the p-th power, which determines the main factor of accuracy of prediction. All the weights will be assigned proportional to the inverse of the distance and will be normalized so that their sum is equal to 1 (Achilleos 2011;Mueller et al. 2005).
The weights can be computed as: where d i0 is the Euclidean distance between predicted (a 0 ) and each measured precipitation (a i ) (d i0 > 0) and p is the parameter for controlling weights in relation to the distance of each measured precipitation (p > 0). The new predicted precipitation can be calculated with IDW as follows: where Ẑ is the predicted precipitation at a o location; Z is the measured precipitation at a i location; n is the number of weather stations located around the predicted location; and i is the weights assigned to each measured precipitation.

RBF
RBF (also named Spline) represents a series of accurate interpolation methods, which are based on a form of artificial neural networks (Borges et al. 2015;Johnston et al. 2001). The method consists of five different basis functions: TPS, ST, multi-quadric (MQ), inverse multi-quadric (IM) and completely regularized spline (CRS). In this study, we applied all five-basis function. Each function performs a different result depending on the smoothing parameter. RBF can predict precipitation outside minimum and maximum of measured precipitation. It is performing well when there is a small variation in the data set and when is applied for larger data sets (Johnston et al. 2001;Wang et al. 2014). RBF predicts the new precipitation based on an area specified by the operator and the prediction is force to pass through each measured precipitation. The weights are made according to their distance to the measured precipitation (Vicente-Serrano et al. 2003).
RBF can be computed as (Garnero and Godone 2013): where (r) represents one of the radial functions (CRS, TPS, ST, MQ, IM); i is the weights to be estimated; and ||s i -s 0 || expresses the Euclidean distance (r) between measured precipitation (s i ) and predicted precipitation (s 0 ).

OCK
OCK is similar with OK, but it takes in consideration the co-variables, and the mean is calculated locally (Pellicone et al. 2018). In general, the prediction made for most interpolation methods is expressed as weighted averages of the sampled data. OCK uses the same method, although the weighting is obtained using a semivariogram depending on the spatial correlation of data between precipitation and co-variables (Seyedmohammadi et al. 2016).
The predicted precipitation Z(s) represents a linear combination between measured precipitation and all co-variables used. In our case, OCK can be expressed as: where Z(s) represents the new prediction of precipitation at location s; Z I is the measured precipitation; I represents the weight assigned to Z I ; t 0 is the latitude; t 1 is the altitude; t 2 is the slope; and X 0 ÷ X 2 represents the weight assigned to each co-variable t 0 ÷ t 2 .

CK
Another method which takes into account co-variables is UCK (also named KED). UCK is an extension of UK which was developed by the mathematician Georges Matheron in 1963 (Biggs and Atkinson 2010;Kuhlman and Pardo 2010). According to Borges et al. (2015), UCK provides more accurate results when co-variables are strongly correlated with precipitation. UCK models the precipitation trend using a deterministic function. The resulting trend decreases from the locations where precipitation was recorded, and the autocorrelation is modelled in residues (ℇ) (Apaydin et al. 2004). Before estimating precipitation, the trend is added to the data set, which varies at each location (Biggs and Atkinson 2010;Goovaerts 1998).
UCK can be computed as: where Z(s) represents the new prediction of precipitation at location s; Z 0 is the measured precipitation; 0 represents the weight assigned to Z 0 ; t 1 is the latitude; t 2 is the altitude; t 3 is the slope; X 1 ÷ X 3 represents the weights assigned to each co-variable t 1 ÷ t 3 ; and 0 ÷ 3 is the mean of residuals of the measured precipitation and co-variables.

EBKR
EBKR is an extension of empirical Bayesian kriging (EBK), which includes a regression analysis based on ordinary least square (OLS) and two geostatistical models: intrinsic random function kriging (IRFK) and mixed linear model (MLM) (Gribov and Krivoruchko 2020). By combining the two models, EBKR manages to automate the processes of building a kriging model. It allows splitting the data set and use a large number of semivariograms for each data subset (Javari 2016). By dividing the data set, the method allows modelling the relationship between precipitation and covariable and to capture the nonstationary of precipitation. The biggest advantage compared with classical kriging is that EBKR takes in consideration the semivariogram error (Santanu et al. 2020). The weight of each semivariogram is calculated according to Bayes theorem (Yuanpei et al. 2016). Using co-variable as raster, the method solves the multicollinearity problem, being one of the well-known problems for the relationship between precipitation and co-variable. The method requires co-variables to be correlated with precipitation. The disadvantage of EBKR is the long processing time when the number of simulations is higher (Gribov and Krivoruchko 2020).

Methodology to evaluate the performance of interpolation
The performance of the prediction methods was performed using the CV. The method is part of the Monte Carlo group and is also known as "leave-one-out cross-validation" (LOOCV) (Berrar 2018;Chilès and Delfiner 2012). CV is the most widespread and applied method of verification in the field of climatology (Dobesch et al. 2010). The operation of the method takes into account all the data from the validation process (Keblouti et al. 2012). The CV algorithm involves two steps. In the first step, a measured precipitation value is temporarily removed from the data set, which is then predicted using the other measured precipitation values in the vicinity of the deleted point. In the second step, the value of the estimated point is compared with its true value. This procedure is repeated successively for all data in the data set. The result of the algorithm allows the measurement of the mismatch between the estimated data (E) and the measured data (M), which express the experimental error (ℇ).
The evaluation of disagreement between the two types of data is materialized with the help of error (Dumitrescu 2012). In this study, we used four types of indicators for each interpolation method, as follows: ME, RMSE, MAE and correlation coefficient R. The first indicator, ME, determines the average of the differences between the estimated and the measured values. ME has the ability to specify whether errors are overestimated or underestimated. If an interpolation method is accurate, then the value of ME must be 0 (Dumitrescu 2012;Johnston et al. 2001). ME can be computed as follows: Unlike the first indicator, RMSE allows the data to be transformed into absolute values and allows the detection of large errors by lifting to square and extracting the square root. The value of the RMSE must tend to 0 and is calculated according to the following relationship (Bostan et al. 2012): MAE calculate the modulus average of the difference between the measured and the estimated precipitation (Di Piazza et al. 2011;Hao and Chang 2013).
Correlation coefficient R is used to measure the linear dependent similarity between measured precipitation and the estimated one. According to Meng et al. (2013) and Borges et al. (2015), R can be computed as: To graphically illustrate the performance of interpolation methods, we used the Taylor diagram. The diagram provides a statistical summary of the estimation performance of the seven interpolation methods. It was developed by Karl E. Taylor and includes three coefficients: R, RMSE and standard deviation (Hu et al. 2018). The result of each method is displayed as a dot, and the closest one to the reference point marked as "measured" indicates the best results (Taylor 2001). Furthermore, anomalies can be detected and the unusual data erased (Apaydin et al. 2004). This study helps in choosing the best interpolation methods, different search strategies and different weighting methods (Apaydin et al. 2004;Keblouti et al. 2012).

Comparison of the spatial interpolation methods used for estimating the precipitation distribution
This study uses four deterministic (LPI, GPI, RBF and IDW) and three geostatistical (OCK, UCK and EBKR) methods. The interpolation results were compared using the basic principle of CV and four error indicators: ME, MAE, RMSE and R. The results of the indicators are presented in Table 4. Figure 3 shows the distribution of mean annual precipitation over mainland Portugal using data from 128 weather stations. Results of measured and predicted precipitation for each interpolation method applied at each weather station are presented in Table 2 of Online Resource 1.
One can notice that all methods generate the same spatial patterns of precipitation distribution. Precipitation increases from southeast to northwest. The GPI and LPI methods generate similar distribution, which is different from RBF and IDW. This is mainly due to the fact that the interpolations can be accurate or inaccurate. GPI and LPI are inaccurate methods, where estimated values differ from those measured. On the other hand, RBF and IDW are exact methods that generate a surface which passes through each measured point and returns the same value. Figure 3 shows that RBF and IDW are very similar.
GPI and LPI are defined by the degree of the polynomial. In this case, the best prediction was performed when the rank of the polynomial was set up to 2 for GPI and 0 to LPI. In the case of IDW, the accuracy of the precipitation estimation area is determined by the p parameter. In GW, the value of p can be chosen between 1 and 100. Although in different studies (Lloyd 2005), precise distributions were generated when the value of p was set to 2, in this study, the most precise distribution was generated when p was set up to 3. One of the well-known IDW problems reported in other studies (Soenario et al. 2010) and visible also in Table 4 Cross-validation results of mean annual precipitation distribution generated by seven interpolation methods  Ahmed et al. (2014). The authors compared IDW, RBF, UCK and disjunctive cokriging to map precipitation in Balochistan, Pakistan. The authors state that from all five sub-methods related to RBF and tested to predict precipitation, CRS perform the best prediction. Similar results were obtained also by Hutchinson (1995). The geostatistical methods applied in this study use co-variables (latitude, altitude and slope) to improve the prediction of precipitation. Even so, according to Table 4, IDW and RBF outperform OCK. A similar case was obtained by Pereira et al. (2010) in Portugal. The authors tested five univariate methods (IDW, ST, TPS and LPI) and one multivariate method (OCK) to predict monthly precipitation. They concluded that even altitude was taken in consideration to explain the presence of precipitation, it did not improve the results. In another study, applied in Belgium, Ly et al. (2011) proved that integrating altitude in prediction of daily precipitation using KED and OCK did not improve prediction and that IDW and OK provide better results. The same conclusion was made by Mair and Fares (2011). The authors compared Thiessen polygon, IDW, LR, OK and

Precipitation [mm]
SKlm and concluded that even they used co-variables, it did not generate better results. The only geostatistical methods to outperform univariate methods are EBKR and UCK. From all interpolation methods used, EBKR performs the best spatial distribution (smallest RMSE 174.74,MAE 110.44 and highest R 0.942) and GPI the lowest spatial distribution (highest RMSE 298.25,MAE 213.55 and smallest R 0.822). Also, the highest values for ME were registered by UCK (− 4.15). Taylor diagram (Fig. 4) reveals the accuracy of EBKR above all interpolation methods. With the exception of GPI, all methods capture the lower resolution of the precipitation distribution. The spatial distribution of precipitation shows similar patterns, with higher values (including the relative maximum values) located in the same regions. The discrepancies between methods are found in the contours details, which represent tens of kilometres of spatial variation. The precipitation contours are more outlined in the IDW, RBF and EBKR distributions. Besides, the EBKR distribution matches better the terrain orography. Precipitation is concentrated over mountain slopes and ridges, as a result of dynamic convection triggering the development of slopeward precipitation mechanisms.
The measured vs estimated precipitation plots (Fig. 5) generally show a slight heteroscedasticity. In all methods, there is a difference between reference line and regression line, except OCK and EBKR where it shows an almost identical match. Regarding the variations of the spatial precipitation, Fig. 3 shows that the mean annual precipitation increases from the southeast to the northwest. Thus, northwestern part of Portugal received the highest mean annual precipitation (2674.2 mm year −1 ) and the southern part the smallest amount of precipitation (473.5 mm year −1 ).

EBKR UCK
Comparing the new distribution of precipitation obtain in this study over mainland Portugal with other gridded data sets, some differences and similarities can be noticed. For example, the prediction of precipitation of E-OBS over Portugal is based only on 17 weather stations and not 128 as in our case. Also, the interpolation method used to obtain E-OBS is UK (Hofstra et al. 2009). Both gridded data sets have the same spatial patterns of precipitation distribution. Another example can be the daily gridded data set of precipitation named IB02 developed by Belo- Pereira et al. (2011) for Portugal and Spain. The method applied to obtain IBO2 is OK and is based on a dense network consisting of more than 400 weather stations distributed across mainland Portugal. Comparing the spatial distribution of our data with the results of IB02 and the rest of the data which Belo- Pereira et al. (2011) compared (ERA-40, ERA-Interim, Global Precipitation Climate Center and Climate Research Unit), we can notice that all of the data have the same pattern of spatial distribution. It is noticeable that for all data, the amount of precipitation is higher in places with mountains. From all gridded data set mentioned before, the best agreement is between IBO2 and our data. Both studies are applied for almost the same period of time (1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000); however, the amount of precipitation is different in some regions. For example, in the northern part of Portugal, the maximum precipitation predicted by IB02 is 2500 mm and by EBKR is 2674 mm. Also, the minimum precipitation predicted by IB02 is 450 mm and by EBKR is 473.5 mm. The difference can be explained by the number of weather stations used and the advantage of EBKR to capture the nonstationary of precipitation by dividing the data set.

Assessment of uncertainties
It is very important for the end users to know the uncertainty of the modelled spatial distribution of precipitation in order to determine the reliability of maps used in different applications (Angulo-Martínez et al. 2009;Bárdossy and Pegram 2013). Uncertainties are inherently included in the spatial distribution generated by the interpolation methods. In this case, the uncertainty of the spatial distribution is expressed as a PSE for the method with the best results of statistical indicators, namely EBKR (Table 4). The PSE of the precipitation predicted using EBKR was calculated in GW using the same parameters as for prediction: number of subset size, 50; number of simulations, 100; overlap factor, 2; and the semivariogram type, exponential. Figure 6 shows the PSE of the precipitation using EBKR at each location. The results show that the PSE of precipitation in mainland Portugal increases from south to north (Fig. 6). In the northern part of Portugal, we are overestimating or underestimating precipitation with 470.2 mm and in the south with 22.5 mm of precipitation. The increase of PSE in the northern part of Portugal is due to the bias introduced by density of rain gauges in the study area (Apaydin et al. 2004;Dobesch et al. 2010). The second reason is due to extrapolation extended to Portuguese border.

Conclusions
This research tackles the problem of geospatial representation of precipitation using different interpolation methods. The performance of seven interpolation methods used in this study (LPI, GPI, IDW, RBF, OCK, UCK and EBKR) were compared to estimate the spatial distribution of precipitation over Portugal using data from 128 weather stations for 10 years (1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000). The results were evaluated using CV, R, ME, RMSE and MAE. The difference between all predicted distribution results was highlighted in Taylor diagram. Also, the uncertainty of the best prediction was determined using PSE.
For the multivariate methods, six co-variables were taken in consideration to explain the presence of precipitation at the beginning of the study. From all six, only latitude, altitude and slope contribute to explain the precipitation in Portugal. The strongest correlation is register by latitude (R = 0.64).
One can notice that all methods generate the same spatial patterns of precipitation distribution. Precipitation increases from southeast to northwest. Thus, southern part of Portugal received the smallest amount of mean annual precipitation (473.5 mm year −1 ) and the northwestern part received the highest amount of precipitation (2674.2 mm year −1 ). From all seven cases of interpolation methods, EBKR performs the best spatial distribution (smallest RMSE 174.74, MAE 110.44 and highest R 0.942) and GPI the lowest spatial distribution (highest RMSE 298.25, MAE 213.55 and lowest R 0.822). Also, the highest values for ME were registered by UCK (− 4.15). Even OCK used co-variables to improve precipitation, it did not provide better results than univariate methods, such as IDW and RBF-CRS.
The PSE result of EBKR prediction over mainland Portugal increases from south to north. In the northern part of Portugal, we are overestimating or underestimating precipitation with 470.2 mm and in the south with 22.5 mm of precipitation. From the results, it is noticeable that the PSE is higher in places where the values of measured precipitation are also higher. Another correlation to justify the presence of high PSE in the northern part of Portugal is the increase of altitude. These two factors affect the accuracy of prediction. Also, according to Hofstra et al. (2009) andOtieno et al. (2014), the density of stations used in the interpolation process influences the accuracy of the spatial distribution generated by the interpolation methods and is less accurate in places with complex surfaces, such as mountains.