Impact of rainfall spatiotemporal variability and model structures on flood simulation in semi-arid regions

The spatiotemporal heterogeneity in precipitation and underlying surfaces and hybrid runoff generation mechanism make hydrological modelling and forecasting in semi-arid regions becoming a challenging work. Therefore, to provide a reference for the development of hydrological models in such regions, two nested hydrological experimental watersheds in semi-arid regions were selected for attribution analysis. Based on the concept of large-sample hydrology, large-scale numerical simulation experiments were performed by constructing different spatial and temporal scale rainfall schemes and combining three hydrological models with different runoff generation mechanisms. Finally, the influences of the time step, station density, and model structure on the flood simulations in semi-arid regions were evaluated. The spatial interpolation technique was used simultaneously to describe the high-dimensional complicated nonlinear relationships between the influencing factors and simulation results. The results show the following: (1) the flood simulation accuracy was more sensitive to the time step than the spatial station density of the rainfall schemes and was highly dependent on the time step of the original observation data, and (2) compared with the accuracy of the rainfall schemes, the model structure plays a dominant role in flood simulation accuracy. Thus, the hybrid model has significant potential for flood forecasting in semi-arid regions by combining different runoff generation mechanisms. (3) The spatial interpolation technique based on the k-nearest neighbour algorithm can construct a high-dimensional distribution between the influencing factors and model simulation accuracy, and describe the complicated relationships among the time step, station density, model structure, and flood simulations.


Introduction
Flooding is one of the most common natural disasters worldwide, especially in the semi-arid regions of northern China, which encompass approximately one-fourth of the area of China, and has caused severe loss of life and property (WMO 2011;Li et al. 2014). Rapid flooding is prone to occur in these regions owing to the short duration and high intensity rainfall, and poor surface vegetation coverage characteristic of these areas (Burns et al. 2001;Huang et al. 2016a;Shen et al. 2020). Meanwhile, the weak infiltration capacity and shallow soil infiltration layer in semi-arid regions can cause substantial variations in the surface soil water content under the forcing of local or heavy rainfall, leading to the alternating dominance of the saturation-and infiltration-excess runoff mechanisms (Buda et al. 2009;Huang et al. 2016b). Therefore, improving flood forecasting for semi-arid regions has become a hot and difficult topic in the field of hydrological modelling and forecasting. According to previous studies, the following two types of problems exist in the development of hydrological models for semi-arid regions: appropriate data resolutions and rational model structures (Al-Qurashi et al. 2008;Li et al. 2017). Precipitation in semi-arid regions is characterised by localisation. As a critical model input, high-resolution precipitation data is the fundamental guarantee of good simulation results. However, the rainfall station density in semi-arid regions is usually low, such as the Yellow River Basin in China (Li et al. 2017), rendering it challenging to conform to the model simulation requirements, which exacerbates the errors associated with average rainfall estimates (Nalder and Wein 1998;Al-Qurashi et al. 2008). Moreover, some hydrologic models require more auxiliary data (e.g., temperature, evapotranspiration, vegetation, and human activities, among others), which often exceed the conventional observation hydrologic elements for most watersheds. However, most semi-arid regions can be characterised as small-and medium-sized rivers, and the supporting hydrological observation equipment in the watersheds is relatively outdated or was only installed recently; many small-and medium-sized watersheds remain ungauged (The Yellow River Conservancy Commission Middle Reach Hydrology and Water Resources Bureau 2005).
Numerous exploratory studies have been conducted to solve the problem of the high-resolution precipitation and auxiliary data required for hydrological models. Expect for data unavailability or a lack of accuracy, which can be solved by obtaining the spatiotemporal distribution of rainfall, early studies proposed some rainfall spatiotemporal processing techniques, such as inverse distance weighting, ordinary kriging, Thiessen polygons, linear interpolation, and fractal similarity theory (Berne et al. 2004;Mair and Fares 2011;Piazza et al. 2011;Shen et al. 2012;Liu et al. 2020a). However, these techniques have proven to be inherently limited by the accuracy of the observational data (Haberlandt et al. 2007). With advances in remote sensing observation technology and satellite data inversion algorithms, satellite precipitation data have gradually become an important data source for hydrological research. The widely used satellite precipitation data are the Tropical Rainfall Measuring Mission (TRMM) and Global Precipitation Measurement (GPM) data. Despite being an earlier mission, the spatiotemporal resolution of TRMM data is lower (spatial resolution of 0.25°9 0.25°a nd a time step of 3 h) (Beck et al. 2017). In contrast, GPM has a higher data resolution (spatial resolution of 0.1°9 0.1°and a time step of 0.5 h) but had a later launch date (Siuki et al. 2017). Moreover, there are limited studies on whether the accuracy of the fusion of satellite precipitation data with rainfall station observation data meets the requirements for flood forecasting in small-and mediumsized watersheds in semi-arid regions. Therefore, for smalland medium-sized watersheds, studies using satellite data to compensate for data accuracy are not sufficiently mature and, at present, still rely on rainfall station observation data.
A hydrological model is an assumption and generalisation of real hydrological processes. Only with rational modelling strategies and suitable model structures can the model results be used to solve practical problems (Beven 1989). Zhao (1983) believed that runoff generation is the most complex hydrological process and the most difficult issue in the design of the model structure. Compared with humid and semi-humid areas, precipitation and surface factors in semi-arid areas have extremely obvious spatiotemporal heterogeneity, which makes the runoff generation in this area quite complicated, i.e., alternating dominance of the saturation-and infiltration-excess runoff mechanisms (Huang et al. 2016b).
To characterise semi-arid regions, previous studies have developed numerous types of hydrological models through a large number of field experiments and numerical simulations. These include single runoff generation models, such as CASC2D , SCS (Williams and LaSeur 1976), and TANK (Chen and Pi 2004), hybrid runoff generation models, such as the Vertical Mixed-Runoff model (Bao and Wang 1997), improved Xinanjiang model (Li et al. 2005), and Xinanjiang-Haihe model (Huang et al. 2013), and more popular flexible framework models, such as FUSE (Clark et al. 2008), FARM (Euser et al. 2012), and SCCM (Liu et al. 2020b), etc., which have achieved numerous reliable research results. However, flood simulation accuracies in semi-arid regions are still undeniably low compared with those in humid and semihumid regions (Huang et al. 2016a;Huo et al. 2020).
With in-depth research on flood forecasting, studies have discovered that there is mutual feedback among the hydrological structure, parameter optimisation, and precipitation data accuracy. Numerous studies have been performed on optimising hydrological model parameters and runoff simulation under different climatic conditions and spatial and temporal scales (Kim et al. 2013;Zhang et al. 2016). Ostrowski and Wolft (1984) found that deterministic hydrological model parameters were highly correlated with the time step of the observation data. Kavetski et al. (2011) performed a systematic qualitative and quantitative analysis of the parameters, simulations, and uncertainties using four conceptual hydrological models at different time scales. Moreover, Krajewski et al. (1991) compared the hydrological response processes at different temporal and spatial resolutions, finding that the hydrological response was more sensitive to the temporal scale than to the spatial scale. Wang et al. (2008) analysed the effect that the grid resolution and time step variation have on the hydrological response, finding that the latter had a significant effect on the simulation of the flooding process. Lin et al. (2011) found that a change in the time step can be obtained with a similar simulation accuracy by adjusting the model parameters. Xia et al. (2007) applied the distributed time-variant gain hydrological model (DTVGM) in the Wuding River basin at different spatiotemporal scales, indicating that hydrological simulations in semi-arid regions mainly depend on data accuracy. Previous studies have mostly focused on the effects that time, space, or spatiotemporal combinations have on flood response, not yet forming a systematic theoretical system and mature methodological techniques. This lack of a theoretical system and mature techniques is therefore inadequate for flood forecasting demands in semi-arid regions, where the complexity of the runoff generation mechanisms is combined with the spatiotemporal variability of rainfall.
In this study, based on the idea of organically integrating hydrological experiments and simulations, two nested experimental watersheds, i.e., the Suide and Caoping watersheds, in a semi-arid region were selected. The rainfall self-similarity fractal, station sampling, and inverse distance weighting methods were used to process the data with respect to time and space to obtain different spatiotemporal rainfall schemes to drive three hydrological models and achieve different simulation schemes. Furthermore, based on the limited simulation schemes, the time step, station density, and degree of the hybrid models were uniformly discretized and interpolated by the spatial interpolation method to construct the three-dimensional (3-D) distribution between the simulation results and influencing factors. By comparing and analysing the results, this study accomplished the following: (1) evaluated the effects of the time steps, station density, and model structures on flood simulation in semi-arid regions; (2) systematically revealed the difficulties associated with flood simulation in semi-arid regions; and (3) found a suitable presentation to describe the high-dimensional complex nonlinear relationships between different influencing factors and simulation effect. The framework of attribution analysis is shown in Fig. 1. 2 Study areas and data sources
Based on Fig. 2, the Suide hydrological station is located in the mainstream of the Dali River, while the Caoping hydrological station is located in a tributary downstream of the Dali River, the two watersheds are nested with the similar climatic and surface characteristics. Rainfall along the Dali River is unevenly distributed during the year, mainly concentrated in June to September, which can account for 60-70 % of the rainfall for the entire year. Floods mainly form via heavy rainfall and depend on the area and intensity of these heavy rainfall events. The watershed landscape is mainly loess hills and valleys, with low vegetation cover and severe soil erosion.
The hydrometeorological data included the precipitation, evaporation, and discharge provided by the Hydrology Bureau of Shaanxi Province. There are four hydrological stations and 10 rain gauge stations in the Suide Watershed, with a station density of approximately 278 km 2 /station. The Caoping Watershed contains one hydrological station and 12 rain gauge stations, with a higher station density of approximately 14 km 2 /station (see Fig. 1). In the original rainfall observation data, 12 of the 14 stations recording rainfall in the Suide Watershed had a rainfall recording interval of 2 h while the two other stations (Qingyangcha and Lijiahe) had a recording interval of 15-45 min. The recording interval for the rainfall stations in the Caoping Watershed was 5-25 min.
In this study, 19 flood events (from 2010 to 2018) in the Suide Watershed and 17 flood events (from 2000 to 2010) in the Caoping Watershed were selected for model simulation. Table 1 lists the flood event characteristics. Low-, mid-, and high-peak floods were included in both watersheds to allow the input of the model to adequately optimise the parameters.

Construction of spatiotemporal rainfall schemes
3.1.1 Temporal processing of rainfall schemes Self-similarity in fractal theory (i.e., the distribution pattern of rainfall over small time scales is similar to that over larger time scales) was used to aggregate and disaggregate the rainfall data (Waymire 1985;Olsson et al. 1993). The original rainfall data were processed at different intervals, resulting in M rainfall schemes with different time steps. A specific calculation can be presented using this example. The original rainfall event in time interval t is P 0 t , and the original rainfall event in t ? 1 is P 0 tþ1 . Then, the new rainfall event, P 0 tþ1 , in t ? 1 can be calculated according to the rainfall relationship between the current and previous times, i.e., Eq. (1). The original rainfall event, P 0 tþ1 , minus the new rainfall event, P 0 tþ1 , is the t ? 0.5 rainfall event, i.e., P 0 tþ0:5 , which were calculated as follows: and P 0 tþ0: 3.1.2 Spatial processing of rainfall schemes In this study, drawing on the concept of the station sampling method (Liang et al. 2016), n stations (n = 1, 2, 3, …) were sampled from all rainfall stations to obtain N spatial distribution schemes with different stations densities. Then, the inverse distance weighting (IDW) method was used to spatially interpolate the rainfall data of each scheme. The rainfall, P i , of the ith grid was calculated as follows: and where P j is the rainfall at the jth rain station, w i;j is the weight of the jth rain station in the ith grid, d i;j is the distance between the ith grid and the jth rain station, b is the number of rain stations, and k is the kth rain station closest to the ith grid. The original observation data were processed in time and space through the above methods; finally, M*N rainfall schemes, with different spatial and temporal distributions, were obtained.

Multiple runoff generation models
In semi-arid regions, the spatiotemporal heterogeneity of the surface and precipitation leads to the alternating spatial and temporal dominance of the saturation-and infiltration-excess runoff processes (Hassan et al. 2014). In this study, three types of runoff generation mechanism models were used: the saturation-to infiltration-excess Xinanjiang model (saturation-excess runoff mechanism), the Saturation and Infiltration-excess Combined model (hybrid runoff mechanism), and the Green-Ampt model (infiltration-excess runoff mechanism), which are referred to as M1, M2, and M3, respectively. Figure 2 shows a schematic representation of each model.

Xinanjiang model
The Xinanjiang model (M1), developed in 1973, is a conceptual hydrological model with the saturation-excess  runoff mechanism (Zhao et al. 1980). The core of the Xinanjiang model is the rainfall-runoff relation equation based on a parabola probability distribution (Zhao et al. 1992). The model consists of four modules: evapotranspiration, runoff generation, runoff separation, and flow concentration with 17 parameters (Fig. 3a). The Xinanjiang model has demonstrated high skills in predicting and simulating floods in humid and semi-humid regions. Previous studies have widely used this model for flood forecasting with limited input data in China (Li et al. 2014;Huang et al. 2016a, b;Huo et al. 2019;Liu et al. 2020b).

Green-Ampt model
The Green-Ampt model (M3) (Green and Ampt 1911) was developed to capture the infiltration-excess runoff based on the assumption of a sharp wetting front resulting from infiltration ( Fig. 3c). The model consists of three modules: evapotranspiration, runoff generation, and flow concentration. The infiltration-excess surface runoff was calculated using the Green-Ampt infiltration equation and the infiltration capacity distribution curve, neglecting subsurface runoff. This model is widely used for flood forecasting in semi-arid regions owing to its simple structure, relatively fewer parameters, and clear physical meaning (Morbidelli et al. 2018;Huo et al. 2020).

Saturation and infiltration-excess combined model
Based on the concept of the dominant hydrological process, the Saturation and Infiltration-excess Combined model (Huang et al. 2016b;Liu et al. 2020b) (M2, see Fig. 3b) is constructed by classifying the saturation-excess dominant (SED) or infiltration-excess dominant (IED) sub-watersheds and combining the Xinanjiang and Green-Ampt models.
The curve number (CN) is the only parameter of the SCS-CN model, which characterizes the tendency of infiltration-excess runoff generation (Williams and LaSeur 1976). The topographic index (TI) is introduced as a quantitative measure of the effect of topography on saturation-excess runoff generation (Beven and Kirkby 1979). Specifically, the CN-TI method (Huang et al. 2016b) was used to classify the watersheds into SED and IED subwatersheds. The detailed procedures are as follows: first, when the sub-watersheds were divided, the CN and topographic index of each sub-watershed were calculated using the DEM, soil, and land use data. Second, the init-classification was completed according to the CN values, i.e. less than 60 for SED sub-watersheds. Third, the topographic index was used for re-classification, i.e. sub-watersheds with the TI value greater than 25 are modified to SED and those less than 7 are modified to IED. Finally, the dominant runoff type in each sub-watershed was determined.
Combined with the sub-watershed classification results, the M2 model was constructed, i.e., the Green-Ampt infiltration-excess module was combined with the runoff generation module in the Xinanjiang model. When the runoff generation calculation was performed, the tension water storage capacity curve was used in the SED subwatershed while the infiltration capacity curve was used in the IED sub-watershed. Then, the lag-and-route method and Muskingum method were used for overland and channel routings, respectively.

Model calibration
Different spatiotemporal rainfall schemes were used as inputs to drive the three models, and the optimal parameter calibration was performed separately. To reduce parameter uncertainty and computational workload, only sensitive parameters were automatically optimised using the SCE- UA algorithm (Duan et al. 1992) in this study. Insensitive parameters were estimated using the trial-and-error method. The three of all flood events were selected in each watershed to validate the calibrated models. To facilitate the analysis, all flood simulation results were counted together.

Evaluation of rainfall heterogeneity
Two characteristic indicators were used to analyse the spatial heterogeneity of rainfall, i.e., the average rainfall ( À P ) and the coefficient of variation (Cv) (Zhang and Qian 2004) and calculated as follows: and where a i is the weight of the ith grid, P i is the accumulated rainfall in the ith grid, and m is the number of grid cells For the temporal variability in the rainfall, two indicators, reflecting the degree of rainfall concentration, were used for the analysis: the rainfall concentration degree (PCD) and rainfall concentration period (PCP), calculated as follows: where P i is the total precipitation at the ith rainfall station, p ij and h j are the precipitation at moment j at the ith rainfall station and the corresponding azimuth (the azimuth of the entire rainfall period was set to 360°), respectively, and k is the number of rainfall periods during the flood event.
PCD and PCP can quantitatively characterise the nonhomogeneity of precipitation over time. We note that PCD is not related to the rainfall magnitude, only representing the degree of rainfall concentration. The value of PCD ranges from 0 to 1; the closer it is to 1, the more concentrated the rainfall process is; in contrast, values closer to 0 denote a more homogeneous rainfall process. PCP reflects the period in which the maximum rainfall occurs during the entire rainfall process

Evaluation of model results
According to the standards for evaluating the accuracy of hydrological simulations and forecasts issued by the Hydrology Bureau of the Ministry of Water Resources, China (2008), four commonly used metrics were selected to assess the overall performance of the models for a set of flood events: the qualified ratio of the relative runoff depth error (NR), qualified ratio of the relative peak flow error (NQ), qualified ratio of the time-to-peak error (NT), and the mean value of the Nash-Sutcliffe efficiency coefficient (NS). For each flood event, a range of ± 20 % of the observed flood runoff depth and peak flow was taken as the allowable error for the runoff depth error and peak flow error, respectively. The allowable error for the time-topeak error was ± 3 h.
The value ranges of the above four statistical metrics were inconsistent and required normalisation. In order to combine the advantages of single metric and facilitate the evaluation of simulation results, the normalised statistical indicators were then cumulatively summed with equal weights to obtain the comprehensive indicator (CI). NR, NQ, and NT were normalised to (0,1) while NS was normalised to (-1,1). The comprehensive indicator was calculated as follows: where NR 0 j , NQ 0 j , NT 0 j , and NS 0 j are the normalised NR, NQ, NT, and NS of the jth scheme, respectively. The range of CI is (-1,4), with higher value representing better simulation accuracy

''Time-space-model'' spatial interpolation analysis
Based on the simulation results of existing schemes, the time step, station density, and the degree of the hybrid model were moderately discretized. The spatial interpolation was performed by the k-nearest neighbour (KNN) algorithm (Altman. 1992) to construct a 3-D cube of the simulation results and three factors to explore the relationship between the ''time-space-model'' factors and simulation results KNN algorithm is one of the simplest machine learning algorithms for classification and regression. The basic principle of KNN algorithm is that if the majority of a sample's k nearest neighbours in the feature space belong to a certain class, then that sample also belongs to that class. In the KNN algorithm, the Euclidean distance is always used to calculate the distance between two points. The Euclidean distances in the ''time-space-model'' 3-D cube were calculated as follows: where d 1;2 are the Euclidean distances between two schemes, and x t , y s , and z m are the time step, station number, and model's hybrid degree coordinates of the schemes, respectively The steps to generate the ''time-space-model'' 3-D cube based on the KNN method were as follows: (1) divide the M*N sets of schemes into a training set (80 % of sets) and test set (20 % remaining sets); (2) set the initial value of k, k 0 = 1 and the initial error of Euclidean distance between two samples, E 0 = 0; (3) calculate the Euclidean distance between each training sample in the training set and each target sample in the test set; (4) select the k training samples in training set with the smallest distance from each target sample in the test set, construct a weight matrix based on their Euclidean distances, and calculate the predicted value of the CI for each target sample; and (5) count the sum of the error between the actual and predicted values of all target samples in the test set, E n . If E n is less than E n-1 , k is added by 1. Repeat step (4) until E n is greater than E n-1 , k is taken as n. (6)

Analysis of spatiotemporal rainfall scheme results
The two watersheds are located in semi-arid regions characterised by short duration and high-intensity precipitation. Therefore, the time step and station density should be set to reflect the variability while meeting the representativeness.
Considering the flood duration, flood rise time, and confluence time of the two watersheds, the time steps of the two watersheds were set as follows: 15, 30, 60, and 120 min for the Suide watershed, and 5, 10, 20, and 30 min for the Caoping watershed. For the station density, considering the representativeness of the rainfall data, the minimum number of rainfall stations was set to half of the total rainfall stations. The number of different stations was evenly sampled from all rainfall stations in the two watersheds to obtain the five spatial distributions: 14, 12, 10, 8, and 6 stations for the Suide watershed,and 13,11,9,7, and 5 stations for the Caoping watershed (Fig. 4). Finally, by combining different time steps and the number of stations, 20 different spatiotemporal rainfall schemes were obtained in the Suide and Caoping watersheds, respectively.

Results of the spatiotemporal distribution of the rainfall schemes
Owing to a large number of rainfall events in both watersheds, we were not able to entirely present the results of all the rainfall schemes. The average rainfall was ranked to select the quartile flood events, i.e., 25, 50, 75, and 100 %, followed by the rainfall variability analysis. Table 2 lists the rainfall characteristics of the four flood events in the two watersheds. Figures 4 and 5 show the distribution of 20 rainfall schemes for four typical flood events in the Suide and Caoping watersheds, respectively. As the data used in these two figures were the total precipitation at each rainfall station, the time scale variability cannot be directly displayed. Therefore, the following descriptions focus on the spatial variability characteristics of the rainfall schemes.
For the Suide watershed, with an increase in the number of stations, the rainfall distribution of the four flood events can be characterized as follows. The number of storm centres for both #SD-2013081110 and #SD-2013080700 changed significantly; however, in terms of the rainfall distribution, the former became increasingly uneven relative to the latter (Fig. 5a, b). The storm centres of #SD-2010082000 and #SD-2017072308 showed no significant changes, but the upstream rainfall distribution was gradually uneven (Fig. 5c, d).
For the Caoping watershed, with an increase in the number of stations, the storm centre of #CP-2002081516 changed from 1 to 2 (Fig. 6a), but the storm centre of #CP-2006050716 disappeared, and its spatial rainfall distribution became more uniform relative to the former (Fig. 6b). The storm centre extent of #CP-2009071617 decreased, and the spatial distribution of rainfall did not change significantly (Fig. 6c); however, the number of storm centres at #CP-2006082918 increased, and the rainfall distribution became increasingly uneven relative to the former (Fig. 6d).
After removing the stations at the storm centre, the rainfall in this grid was calculated by interpolating the surrounding stations, such that the storm centre disappeared and the spatial rainfall distribution was severely homogenised. However, if the storm centre stations were retained, the spatial rainfall distribution had almost no change, e.g., #SD-2010082000 and #CP-2009071617. Therefore, fewer stations resulted in a more uniform rainfall distribution. In contrast, the more significant the storm centre, the more uneven the rainfall distribution.

Evaluation of variability in the rainfall schemes
1. Variability in rainfall distribution for the full range of flood events (Fig. 7).
For the Suide watershed, as the number of stations increased, the total rainfall of all flood events showed a decreasing trend (Fig. 7a, with a decrease of approximately 20 mm), the range in the average rainfall for each flood event gradually decreased (Fig. 7b), and the spatial variability in the rainfall gradually increased (Fig. 7c). We note that the distribution ranges of both the average rainfall and Cv values did not vary significantly at less than 75 %, but did vary significantly at the connecting line (75 % to Max.) (Fig. 7b), caused by flood events with higher rainfall. In contrast, when the number of stations was consistent, the total rainfall showed an increasing trend with an increasing time step while there was no significant trend in the average rainfall and Cv of flood events.
For the Caoping watershed, the total rainfall trend decreased as the number of stations increased by approximately 30 mm. In contrast, the total rainfall varied significantly with an increase in the time step, with a variation of approximately 10 mm (Fig. 7d). However, the average rainfall and Cv of flood events varied insignificantly (Fig. 7e, f).
2. Variability in spatial rainfall distribution for the four typical flood events.
The four selected typical flood events are analysed in depth below (Fig. 8) to understand the spatial variability in rainfall at different rainfall levels. Based on Fig. 8a, in the Suide watershed, with an increase in the number of stations, the rainfall in the 20 spatiotemporal schemes was essentially unchanged for the remaining three flood events, except for #SD-2017072308, which had a large reduction in the average rainfall of approximately 24 mm. Combined with Fig. 7a, we can infer that the 20 mm reduction in the total rainfall was predominantly caused by flood event #SD-2017072308. In contrast, #SD-2017072308 had the lowest rainfall at 14 stations and the highest rainfall at six stations. Combined with Fig. 5d, this apparent increase was due to an increase in the total rainfall after spatial interpolation, following the reduction in the number of stations.
The spatial variability in rainfall for the four typical flood events increased significantly with an increase in the number of stations (Fig. 8c). The largest increment was #SD-2013081110, with a value of 0.15, and the smallest increment was #SD-2010082000 (the largest decrease in the average rainfall), with a value of 0.05. The rainfall process in flood event #SD-2013081110 had a short-duration and was highly intense with scattered storm centres and an extremely uneven rainfall distribution (see Fig. 5a), resulting in particularly dramatic rainfall spatial variability. In contrast, the rainfall in flood event #SD-2010082000 was relatively larger than the former. The spatial rainfall distribution of #SD-2010082000 was almost similar regardless of the number of stations (Fig. 5c), leading to the lowest rainfall spatial variability. In addition, when the number of stations was the same, the average rainfall and rainfall spatial variability for the four typical flood events were consistent with increases in the time step (Fig. 8a, c). For the Caoping watershed, except for #CP-2006082918, the average rainfall during the three other flood events showed a fluctuating trend of decreasing with an increase in the number of stations (Fig. 8b). However, when the number of stations was the same, the average rainfall was practically consistent with the increase in the time step. The rainfall variation in flood event #CP-2006082918 was almost consistent with the total rainfall variation observed in all of the flood events (Fig. 7d), but the variation in the former was approximately 2 mm, significantly smaller than the 10 mm variation observed for the latter, which did not appear to play a dominant role like flood event #SD-2017072308 in the Suide watershed. By analysing the rainfall process of all the flood events, we found that the rainfall pattern for another flood event (with rainfall ranking at 82 %) was similar to that of #CP-2006082918, but with more substantial variability (7 mm), indicating that the rainfall variability in the remaining flood events was small. Overall, the variation in spatial rainfall variability for the four typical flood events was not notable with an increase in the number of stations and varied within a small range when the number of rainfall stations is consistent.
Comparing the rainfall spatial distribution of the two watersheds shows that although there are more rainfall stations in the watershed, the rainfall measurement was more accurate. However, for most rainfall events (small and medium), there was no difference in the total and average rainfall among the different schemes. Although the amount of change in the extra-large rainfall event was slightly notable, it was negligible compared to the total amount of rainfall. For the spatial rainfall variability, there was no significant variation in the small and medium rainfall during flood events in the Suide watershed, but there was quite a significant variation in the heavy rainfall.

Variability in rainfall temporal distribution for four typical flood events
The rainfall observation data were pre-processed using the aggregation and disaggregation methods to obtain rainfall data from different time steps at equal intervals. Then, two indicators, the PCD and PCP, were selected to respond to their concentration distribution on the rainfall time series (as shown in Fig. 9).
Based on Fig. 9a, b, the PCD of both the Suide and Caoping watersheds gradually decreases with an increase in the total rainfall for the four typical flood events. For the same flood event, the mean and median of the PCD gradually decrease with an increase in the time step. The difference is that the PCD ranges for the four flood events In summary, the variability in the rainfall schemes with different combinations of time steps and station densities is not significant when only focusing on rainfall variability. Therefore, the following section focuses on the effect that different temporal and spatial rainfall schemes have on the simulation results.

Single-factor analysis of the ''time-spacemodel'' simulation results
Twenty spatiotemporal rainfall schemes were input into three hydrological models for parameter optimisation. Accordingly, the optimal simulation results of these 60 ''time-space-model'' schemes were obtained. Figures 10, 11 and 12 show the four metrics of the simulation results for the Suide and Caoping watersheds. Heat maps of the four metrics for the simulation results of the two watersheds were plotted according to the three factors, i.e., the time step, station density, and model structure (e.g., Fig. 10a-h). Each heat map consists of several colour blocks. For example, in Fig. 10a, these 60 scheme metrics are divided into four groups according to all of the time steps, where each group is referred to as a colour block with each colour block containing the metrics of the entire simulated schemes for a given time step. In these heat maps, blue represents the qualified ratio of the relative runoff depth error (NR), green is the qualified ratio of the relative peak flow error (NQ), brown is the qualified ratio of the time-to-peak error (NT), and red is the mean value of the Nash-Sutcliffe efficiency coefficient (NS). Darker colours represent higher values (hereinafter referred to as ''high value''), which indicates high simulation accuracy; in contrast, a ''low value'' indicates low simulation accuracy.
The Suide and Caoping watersheds are geographically nested with similar climatic and surface characteristics, and both are semi-arid watersheds where the rainfall intensity strongly influences the runoff generation. Theoretically, smaller time steps yield higher rainfall accuracies, with a simulation that more closely approximates the natural process. However, based on the above simulation results, the simulation accuracy was not the best at the minimum time step, especially in the Suide watershed, where three of the four metrics were best at 60 min and the remaining metric was best at 120 min, which is mainly caused by the low accuracy of the observed rainfall. The rainfall observation interval in the Suide watershed was almost 2 h. Although the rainfall data was disaggregated by the nonlinear method, the accumulated error increases with a shorter time step. Thus, the disaggregated rainfall data cannot precisely approximate the natural rainfall process, especially the rainfall intensity distribution, which is why the model accuracy is high when the time step is closest to the observation time interval. In the Caoping watershed, the observed rainfall accuracy is high, with observation times ranging from 5 to 20 min. Most of the rainfall preprocessing in the Caoping watershed was aggregated from small-time steps to large-time steps. Therefore, the rainfall in the Caoping watershed can approximate the natural rainfall process at short time steps, allowing the model simulation to more closely approximate the natural flood process. 30min-6sta 60min-6sta 120min-6sta 15min-8sta 30min-8sta 60min-8sta 120min-8sta 15min-10sta 30min-10sta 60min-10sta 120min-10sta 15min-12sta 30min-12sta 60min-12sta 120min-12sta 15min-14sta 30min-14sta 60min-14sta 120min-14sta 5min-5sta 10min-5sta 20min-5sta 30min-5sta 5min-7sta 10min-7sta 20min-7sta 30min-7sta 5min-9sta 10min-9sta 20min-9sta 30min-9sta 5min-11sta 10min-11sta 20min-11sta 30min-11sta 5min-13sta 10min-13sta 20min-13sta 30min-13sta

Evaluation by station density groups.
For the Suide watershed, the results of the four metrics, with an increase in the station density, were as follows. The NR values varied insignificantly, and the simulation results are similar when the number of stations was 12 and 14 (Fig. 11a). The NQ values tended to first decrease and then increase (Fig. 11b). In the colour block for 14sta, the overall value was the largest due to more ''high values'' while in the colour block for 10sta, the overall value was smaller with few ''high values.'' The decreasing trend of NT was significant, with a reduced distribution of ''high values'' in each colour block (Fig. 11c), while the NS values were slightly decreasing (Fig. 11d). Based on Fig. 11a-d, the number of stations with the best results for the four metrics were 8, 14, 6, and 8, respectively.
For the Caoping watershed, the response relationships between the two metrics (NR and NT) and the number of stations were insensitive (Fig. 11e, g). The NQ values tended to decrease slightly with an increase in the number of stations (Fig. 11f) while the NS values increased incrementally (Fig. 11h). Overall, the number of stations with the best results for the four metrics were 11, 5, 13, and 11, respectively.
Based on the above results, there is no trend where ''the higher station density, results in higher values for the four metrics'' in the Suide and Caoping watersheds. Regardless  of the two watersheds, the effect that the reduction in the number of rainfall stations has on the rainfall data is mainly characterised by the absence of storm centres and a slight difference in the average rainfall. This can be temporarily remedied by interpolating the rainfall adjacent to the missing rainfall station using the IDW method. Although the spatial distribution of the interpolated rainfall scheme is not as adequate and accurate as when the real rainfall stations are available, the total rainfall error can be controlled within an acceptable range. Furthermore, we found that the small errors in the rainfall schemes could be ''offset'' by small adjustments to sensitive parameters to obtain similar flood simulation results when using the SCE-UA method to calibrate the model. However, offsetting errors in the input data by the model parameters is one of the errors that are strongly avoided in the hydrologic simulation field when constructing models and evaluating model performance. The most significant risk is that offsetting the errors does not truly reflect the errors in the precipitation inputs, possibly resulting in an incorrect calibration of the model, which is counterproductive to improving simulation capabilities.
3. Evaluation by model structure groups.
For the Suide watershed, the results of the four metrics, with an increase in the number of models (M1 to M3), were    Fig. 10 Group results for the four metrics in the Suide watershed (ad) and Caoping watershed (e-f) for the time step factor. The blue colour represents the qualified ratio of the relative runoff depth error (NR), green is the qualified ratio of the relative peak flow error (NQ), brown is the qualified ratio of the time-to-peak error (NT), and red is the mean value of the Nash-Sutcliffe efficiency coefficient (NS) as follows. The NR showed a significant increasing trend, and the ''high values,'' with a 100 % qualification rate, were all concentrated in the M3 model colour block (Fig. 12a). The ''high values'' for the NQ were mostly concentrated in the M2 model block, and only a few were scattered in the M1 and M3 model blocks (Fig. 12b). The NT distribution was relatively uniform, with only a few ''high values'' in the M2 model colour block (Fig. 12c).
The NS values were high in both the M2 and M3 model blocks due to the uniform distribution of ''high values,'' but the former had a higher mean (Fig. 12d). Based on Fig. 11a-d, the models with the best results for four metrics were M3, M2, M2, and M2, respectively. For the Caoping watershed, the mean value of the NR in the colour block of model M3 was significantly larger than the other two models (Fig. 12e). The ''high values'' for NQ were heavily clustered in the model M2 blocks, and their mean value was higher than those of the two other models (Fig. 12f). As the model number increases, both NT and NS tended to increase, and the colour blocks for the model M3 had more ''high values'' and higher mean values than the other models (Fig. 12g, h). In terms of the mean values of the colour blocks, the four models with the best results were M3, M2, M3, and M3 (Fig. 12e-h).
The runoff components of the flood processes in semiarid regions are mainly surface runoff, with a small proportion of subsurface runoff, leading to a high and thin flood process. In the model calibration, we found that model M1 was defective for the simulation of such flood processes, i.e., the high qualification rate of both the runoff depth and flood peak cannot be satisfied simultaneously. When the runoff depth of the simulated flood was qualified, the hydrograph was short and wide, resulting in a simulated flood peak that is significantly smaller than the observed one. In contrast, when the flood peak was qualified, the simulated runoff depth was still significantly larger than the  Fig. 11 Group results for the four metrics in the Suide watershed (a-d) and Caoping watershed (e-f) for the station density factor, and the colour settings for the four metrics are consistent with Fig. 10 Stochastic Environmental Research and Risk Assessment (2022) 36:785-809 801 observed peak, even though the interflow and groundwater in model M1 were reduced to the greatest extent possible. For another, model M3 (Green-Ampt model) considers that the infiltration-excess runoff only occurs at the surface and ignores subsurface runoff, allowing the simulated flood process to present a high and thin characteristic; thus, the simulated runoff depth can more easily meet the qualification. Therefore, when using the SCE-UA method for multi-objective calibration, model M1 tends to ''sacrifice'' the runoff depth qualification rate and prefers a more easily improved flood peak qualification rate to achieve the optimal multi-objective value while model M3 prefers the runoff depth qualification to achieve the optimal objective value. Moreover, model M2 combines two mechanisms, i.e., saturation-and infiltration-excess, such that it possesses the advantages of these two runoff mechanisms and solves the tendency for a single runoff mechanism model simulation. The runoff depth qualification rate for model M2 is between those of models M1 and M3, and the other three metrics are the best. Overall, in the semi-arid regions, simulations using model M1 with saturation-excess runoff generation had low accuracy while the simulation accuracy of the models with an infiltration-excess module (e.g., models M2 and M3) could be improved substantially. Regardless of how the time step and station density are combined, model M2 can obtain higher accuracy simulations with a high probability in the Suide watershed, whereas the simulations using model M3 always tend to have better results in the Caoping watershed. Using Eq. (10), the CIs of the 60 simulation schemes were calculated for the Suide and Caoping watersheds, whose    -8.83 -8.47 -8.55 -7.83 -7.83 -8.7 -8.93 -9.84 -8.4 -8.46 -7.62 -8.66 -8.32 -8.75 -8.8 -7.96 -9.43 -8.57 -9.39 - The second derivative of the fitting line was calculated to obtain several mutation points in its slope (second derivative equal to 0). According to these points of the fitting line and the CI distribution, all simulation schemes were divided into several parts for analysis. For the Suide watershed, the difference between the maximum (2.747) and minimum (0.799) CIs for the 60 simulation schemes was 1.948. For the Caoping watershed, the distribution range of the CIs was much wider than that for the Suide watershed (3.466), with a maximum value of 3.665. Moreover, the high CI values ([ 2.5) for the Caoping watershed were much more than those for the Suide watershed, indicating a higher accuracy of the simulation. Figure 13 shows the results of the CIs and the corresponding station number, time steps, and models for all simulation schemes in the Suide watershed. To facilitate the analysis, the CIs were sorted and divided into three parts: Parts 1-3. The CIs for Parts 1-3 accounted for 40.9, 36.4, and 22.7 % of the entire stage, respectively. Thus, Parts 1 and 2 were fast-rising stages while Part 3 was a stable-rising stage.
Part 3. Thirty-four schemes were in this part. Compared with Part 2, the CIs increasingly slowed. Model M2 was dominant (Fig. 13c, 17 schemes). The time step distribution was extreme (15 or 120 min), and the number of stations had no prominent distribution characteristics. Although the increase in the CIs was the smallest in this part, the simulation results for the Suide watershed were the best, suggesting that model M2 steadily dominated this part, regardless of the combination of the time step and rainfall stations.

Part 1
There were 20 simulation schemes in this section. There was no notable pattern in the number of stations in this entire part, and the time step gradually shortened, with M1 accounting for 100 % (Fig. 14a-c). (1) From Fig. 14d, the CIs in the prestage of Part 1 (1-5 schemes) increased rapidly (by 0.799) due to the increase in the number of stations.
(2) With the steady fluctuation in the time steps at 5 and 20 min, the CIs increased smoothly in the middle stage (6-16 schemes). (3) The time step stabilised at 10 min when the CIs increased further in the latter stage of this part (17-20 schemes). Part 2 There were 10 simulation schemes in this part, with the largest increase in the CIs in the Caoping watershed. Over the entire part, the number of stations showed a decreasing trend, the time step of 30 min accounted for 100 %, and the model factor was alternately dominated by M2 and M3 (50 % each). (1) Relative to Part 1, the CIs in the prestage of Part 2 (21-23 schemes) increased rapidly, caused by the change in the dominant model to M2 or M3. (2) The increase in the CIs slowed in the middle and later stages of this part (24-30 schemes): M2 and M3 alternately dominated, with the time step stabilising at 30 min, while there was a further decrease in the number of stations. Part 3 There were 17 schemes. Compared with Part 2, the increase in the CIs slowed, and M2 became dominant (Fig. 14c, 88 %). As the CIs increased, the time step showed a cyclical trend of increasing and decreasing, indicating that a smaller time step may significantly improve the CIs. Part 4 There were 13 schemes. As the CIs increased, the number of stations decreased, the time step shortened, and M3 entirely dominated this part. Relative to Part 3, the CIs of the pre-stage (48-52 schemes) had less of an increase (0.104). The CIs in the later part (53-60 schemes) increased rapidly, the number of stations decreased, and the time step stabilised at small steps (5 and 10 min).

Analysis of the ''Time-space-model'' spatial
Interpolation results Based on the above results and analysis, three factors, i.e., the time step, station density, and model structure, have different degrees of influence on the accuracy of the flood simulations. However, owing to the small sample size of the schemes, the limitation of the length of the observed data, and difficulties associated with using hybrid models, using mathematical formulas to describe the functional relationship between the different factors and simulation accuracy is challenging. Therefore, the KNN spatial interpolation method was used to present the relationship between the impact factors and simulation results in a 3-D manner. Before interpolation, the unit selection and discretization of the three factors must first be required. The discrete methods of the time step and station density were similar; both were interpolated based on the original scheme. For example, in the Caoping watershed, the four original time steps were interpolated to six time steps, and the station density was discretised from the original five stations to 13 stations (1-13). The runoff generation mechanisms in M1-M3 in this study were saturation-excess, hybrid, and infiltration-excess runoff, respectively. To facilitate the discretization of the models (degree of the hybrid model), the degrees of these three models were assumed as 1, 5, and 10, respectively. Based on this assumption, the degree of the hybrid models (from saturation-to infiltration-excess) was divided into 10 classes. Figure 15 shows the 3-D distribution of the comprehensive indicators (CIs) for the Suide watershed, where the x, y, and z axes represent the time step, station number, and model, respectively. According to the range of the comprehensive indicators, four typical interval values were selected, i.e.,  [2.6,2.7], to plot these iso-surfaces (Fig. 15b-c). Figure 15a shows a clear ''abrupt change zone'' in the 3-D cube, i.e., CI [ [2.3,2.4], where the comprehensive indicators change from low to high values, corresponding to the iso-surface shown in Fig. 15d. Combined with the analysis of the three factors in Sec. 4.2.1, the 3-D coordinates of the ''abrupt change zone'' approximated the corresponding threshold points of every single factor. The model factor was notably the most dominant factor that allows the appearance of this zone. In contrast, until the appropriate model became the dominant model, the simulation accuracy did not change significantly no matter how we adjust the combination of time steps and number of stations, mainly visualized in Fig. 15b. Therefore, model selection should be addressed as a priority in semi-arid regions. Figure 15e shows the iso-surface with higher CI values beyond the ''abrupt change zone.'' The optimal combination of time steps and number of stations was the key to further improve the simulation effect under the domination of the suitable model, which indicated that increasing the observation frequency and station density can effectively improve the simulation accuracy. Furthermore, Fig. 15c shows a rather unusual iso-surface, i.e., a ''gradual change zone'' before the appearance of suitable models. In this zone, achieving a similar simulation accuracy for different models by adjusting the time steps and number of stations was possible. According to the above results (Sec. 4.2.1), the ''gradual change zone'' may be caused by adjusting the model parameters, which offsets the error in the input data. Therefore, this feature may be appropriately used in production practices for emergency or rapid forecasting; however, this feature should be avoided in scientific research so as to yield ''local optimal solutions'' for the development of hydrological models. Figure 16 shows the 3-D distribution of the comprehensive indicators (CIs) for the Caoping watershed. The characteristics of the Caoping watershed almost consistent with those of the Suide watershed, or even more apparent. For example, in Fig. 16e, the CI values were heavily

Discussion
Overall, M3 performs well because of the small area of the Caoping watershed and the high temporal and spatial resolution of the observed rainfall. The Suide watershed has a large area, relatively sparse station density, and low temporal accuracy in the observed rainfall data, which leads to better performance in M2, as compared with the other models. This is because the Green-Ampt model is a pointscale infiltration model suitable for small or experimental watersheds with abundant data. The Caoping watershed fits well with the ideal requirements for Green-Ampt model operation, which leads to the highest simulation accuracy. However, high spatiotemporal resolution observation data are not available in the Suide watershed, such that we are unable to solve the scale transformation from a point-scale infiltration simulation to a watershed-scale rainfall-runoff simulation; thus, the simulation accuracy is significantly compromised. The infiltration-excess mechanism is characteristic of a local modelling scale, whereas the saturation-excess mechanism, which is linked to a cumulative phenomenon and conditioned by a lateral redistribution movement of the water in the soil, becomes dominant with an increase in the scale of the modelling (Blöschl and Sivapalan 1995). Until now, the resolution of data observation records for rainfall observations using self-registering rain gauges has been mostly limited and inconsistent. The determination of the time step is dependent on the resolution of the observation records, where long observation intervals lead to large time steps, affecting the rainfall-runoff responses. When the observation records are synchronised and the resolution is free, smaller time step settings yield more accurate simulations. In contrast, the time step setting for unsynchronized and resolution-limited observations should be set based on the data observation record. An excessively large time step would result in simulated values smaller than the observed record while too small a step would result in simulated values larger than the observed record.
For semi-arid regions, if the spatiotemporal resolution of the data for the watershed is high, the Green-Ampt model can be directly considered to obtain desirable simulation results. However, there have been few watersheds with sufficiently high spatial and temporal accuracy observation data and many watersheds with low-accuracy data, such as the Suide watershed. Therefore, with the existing data accuracy, the hybrid runoff generation model should be considered a priority.

Conclusions
In this study, based on the idea of organically integrating hydrological experiments and simulations, two nested experimental watersheds in a typical semi-arid region of China were selected. Furthermore, the rainfall self-similarity fractal method and station sampling method were conjunctively used to obtain different spatiotemporal rainfall schemes. These schemes were then used to drive three hydrological models with different runoff generation mechanisms to evaluate the effects of the time step, station density, and model structure on flood simulations in semiarid regions. Moreover, based on the limited simulation schemes, the time step, station density, and degree of the hybrid model were uniformly discretised. Then, the 3-D distribution between the simulation results and influencing factors were constructed through a machine learning-based spatial interpolation method to explore the complicated high-dimensional nonlinear relationship between the ''time-space-model'' factors and the simulation results. The following conclusions were obtained. 1. The rainfall intensity has a strong influence on the runoff generation in semi-arid regions. The results of the flood simulation with high-resolution rainfall data in the Caoping watershed showed that smaller time steps yield a higher accuracy for the rainfall scheme and flood simulations that more closely approximate the observed values. However, in practice, there are few watersheds with high-precision rainfall data. Thus, we must still consider the observation step of the original data; otherwise, the smaller the time step, the worse the simulation effect. Under the premise that the station density meets the representativeness, the effect that an increase or decrease in the station density has on rainfall is mainly characterised by the absence of storm centres and slight differences in the average rainfall. 2. Whether the hydrological model can accurately describe the dominant hydrological process is the key to flood simulation in semi-arid regions. If this basic premise cannot be satisfied, the simulation accuracy always tends to be low regardless of the combination of time steps and station density, and vice versa. Therefore, if a universally adapted model has poor application results in a semi-arid watershed, modules adapted to this region should be added or modified. In other words, the hydrological model should be adjusted and developed from the model structure. We accordingly recommend the hybrid model. 3. Based on the limited original simulation scheme, the KNN spatial interpolation technique was used to construct a high-dimensional spatial distribution of the influencing factors and model simulation results, which can visually and effectively describe the complicated relationships among the time step, station density, model structure, and simulation accuracy. Through further analysis of the ''abrupt change zone'' and ''gradual change zone,'' this study could partially enhance and provide directions for the development of hydrological models in semi-arid regions.
In this study, the model structure samples were limited and the interpolation results using the KNN technique were inevitably not sufficiently accurate. Future studies should consider hybrid models with different degrees. In addition, the quantitative expression function relationship between the impact factor and model simulation still requires further exploration.