2.1 Study area and data
A total of 619 catchments in CONUS were used in this study for analyzing the BFI. The streamflow, the precipitation, the potential evapotranspiration, and 31 catchment attributes were collected from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) datasets (Addor et al. 2017; Newman et al. 2015). Before application, 619 gauged sites with a length of more than 30 years (1981–2014) were selected to ensure the reliability of the long-term average calculation, and the streamflow change trend during the study period was analyzed as shown in Fig. 1.
The average daily streamflow datasets can be obtained from the USGS National Water Information System server (http://waterdata.usgs.gov/usa/nwis/sw). 31 attributes from five categories (i.e., topography and location, climatic indices, soil signatures, land cover characteristics, and geological characteristics) were applied to describe the generation of baseflow, and these factors were derived from the CAMELS data sets. Among the 31 attributes, 5 are topography and location attributes, 6 are climatic indices, 6 are land cover characteristics, 11 are soil signatures, and 3 are geological characteristics types (Table 1).
Table 1
Description of 31 catchment attributes
| Name | Unit | Description | | Name | Unit | Description |
Topography and location | LA | ° north | gauge latitude | Soil signatures | SC | cm/hr | soil conductivity |
EL | m | catchment mean elevation | MWC | m | maximum water content |
SM | m/km | catchment mean slope | SDS | m | soil depth to water and bedrock layers |
SIZ | km2 | catchment size | SIF | - | silt fraction |
LO | ° east | gauged longitude | SP | - | volume porosity of soil |
climate indices | SF | - | fraction of precipitation falling as snow | WF | - | water fraction in soil |
PET | mm/day | mean daily PET | Land cover characteristics | FF | - | forest fraction |
PM | mm/day | mean daily precipitation | LM | - | maximum leaf area index |
AR | - | aridity index | DL | - | dominant land cover type |
PS | - | seasonality and timing of precipitation | GVF | - | green vegetation fraction |
TEM | °C | mean temperature | RD50 | m | root depth of 50% land cover vegetation |
Soil signatures | CF | - | clay fraction | RD99 | m | root depth of 99% land cover vegetation |
SD | m | soil depth to bedrock | Geological characteristics | GSP | - | geology subsurface porosity |
OF | % | organic fraction in soil | CR | - | carbonate rocks fraction |
OF1 | % | other fraction in soil | GP | m2 | geology permeability (log10) |
SA | - | sand fraction | | | |
2.2 Methods
2.2.1 LH Baseflow separation
The Lyne and Hollick recursive digital filter method (the LH method), a widely used digital signal method, was used to separate the streamflow into baseflow and quick flow (Lyne and Hollick 1979). LH is easy to apply on large contiguous areas and can obtain the same results from different people through its objective and automaticity (Chapman 1991). The model formula is as follows (Nathan and McMahon 1990):
$${q}{q}\left(i\right)=\left{\begin{array}{c}\alpha {q}{q}\left(i-1\right)+\frac{\left(1+\alpha \right)}{2}\ 0, otherwise\end{array}\right.\left[q\left(i\right)-q\left(i-1\right)\right], if {q}_{q}\left(i\right)\ge 0$$
1$${q}{b\left(i\right)}=q\left(i\right)-{q}{q}\left(i\right)$$
2 features, such as soil and terrain, a constant parameter cannot provide a satisfactory baseflow estimate for all regions (Zhang et al. 2017a). Before separating the baseflow, the ABIT was used to assess the recession parameter of each catchment. The ABIT was proposed by Cheng et al. (2016) based on the BN77 method, which eliminates the impact of evapotranspiration and avoids the uncertainty of the recession time period after the rainfall, to obtain the baseflow objectively and quickly (Brutsaert and Nieber 1977). The formulation to describing the baseflow is express as:$$\frac{dq}{dt}=-\frac{1}{K}q$$
3 $${q}{b}={q}{0}\text{e}\text{x}\text{p}(-\frac{t}{K})$$
4 $$\alpha =\text{e}\text{x}\text{p}(-\frac{1}{K})$$
5 All the recession points are selected as the component of the baseflow record (Cheng et al.2016). This method aims to plot the points of logarithmic (-dq/dt) against drought flow q and obtain the lower envelope, keeping 5% points below it (Fig. S1). As a result, the combination of the LH method and the ABIT is hydrologically significant and can more reasonably estimate the baseflow generation process.
The daily baseflow was separated from the streamflow records, and the long-term BFI was calculated by averaging the baseflow to streamflow ratio over the entire period. The baseflow and the streamflow were split into four seasons: spring (March, April, May), summer (June, July, August), autumn (September, October, November), and winter (December, January, and February of the following year). Then, the seasonal BFIs were averaged the ratio of baseflow and streamflow in each season.
The coefficient of variation (CV) was applied to assess the variability of flow on an annual scale:$$CV=\frac{\sigma }{\mu }$$
2.2.2 Random Forest (RF)
The RF method is an intelligent artificial algorithm that can be used for classification and regression and has been applied in predicting the BFI by regarding the BFIs as a function of a series of selected driving factors. The RF method was proposed by Breiman (2001) and Cutler et al. (2011) based on bootstrap sampling methods and constructed from many independent decision trees.
According to the continuously growing decision trees, the non-linear relationship is integrated to connect the response with a set of predictors (Ellis et al. 2012; Singh et al. 2019). Each branch in the decision tree represents the current prediction results, and the final output of the model is determined by each decision tree in the forest. There are two aspects of randomness in the regression process that have ensured the robustness and stability of the prediction results, which are the sampling of sub-training sets and the selection of attributes for each decision tree during construction (Breiman 2001; Cutler et al. 2011).
The simulation results were qualitatively assessed with Out-Of-Bag (OOB) prediction. In OOB, the study sites were randomly divided into two groups, namely, the “gauged sites” for model training and the “ungauged sites” of model assessment. The “gauged sites” were involved in the construction of RF regression models through repeated training. The “ungauged sites” were assumed to be unobserved, and the long-term and seasonal BFIs were predicted by a series of predictors through constructed models (Booker and Snelder 2012). The prediction performance and reliability were validated qualitatively by plotting the predicted–measured values.
The RF method has three important parameters. The number of variables randomly sampled as candidates at each split is equal to 4 after successive selection. The number of decision trees in the forest is equal to 600 to stabilize error fluctuations. Importance values, which represent the increase in model prediction error when the variable is subtracted, are a variable importance matrix and measured by the residual sum of squares.
To improve the robustness of the prediction model, we developed two criteria for selecting the driving factors from the catchment attributes of CAMELS data set: 1) the driving factors have a good correlation with the BFI, and the significance level is less than 0.05; 2) no obvious collinearity exists between the selected driving factors.
2.2.3 Evaluation metrics The Leave-Location-Out (LLO) cross-validation approach was used to test the BFI prediction performance. Unlike the traditional cross-validation method, LLO packages neighboring sites together to eliminate spatial autocorrelation and avoid the optimistic prediction caused by over-fitting (Brenning 2005; Brenning and Lausen 2010). According to the longitude and latitude of the sites on the map, the K-means clustering method was used to cluster the neighboring sites in space without considering any catchment attribute mapping of the sites. During the LLO cross-validation, each cluster was assumed to be ungauged in turns, and the remaining clusters were used as the training set to establish the prediction model. Finally, the error result of the prediction model is the mean of the deviation of each prediction on the test set (Hüllermeier et al. 2010).
Furthermore, three metrics were applied to quantitatively assess the model performance, including the NSE coefficient (Nash and Sutcliffe 1970), the bias (Legates and McCabe 1999), and the root mean squared error $$\text{N}\text{S}\text{E}=1-\frac{{\sum {i=1}^{n}\left({BFI}{M}-{BFI}{P}\right)}^{2}}{{\sum {i=1}^{n}\left({BFI}{M}-\stackrel{-}{{BFI}{M}}\right)}^{2}}$$
7$$\text{b}\text{i}\text{a}\text{s}=\left|\frac{\sum {i=1}^{n}\left({BFI}{P}-{BFI}_{M}\right)}{\sum {i=1}^{n}{BFI}{M}}\times 100\text{%}\right|$$
8$$\text{R}\text{M}\text{S}\text{E}=\sqrt{\frac{{\sum {i=1}^{n}\left({BFI}{M}-{BFI}_{P}\right)}^{2}}{n}}$$
9