Catchment natural driving factors and prediction of baseflow index for Continental United States based on Random Forest technique

Baseflow plays a critical role in maintaining the health of aquatic environments. However, the driving factors and predictions of baseflow have not been rigorously investigated at large scales, which has partly prevented hydrologists from fully understanding runoff generation. To this end, the Lyne–Hollick digital filter method and the automatic baseflow identification technique (ABIT) were used to estimate the long-term and seasonal baseflow index (BFI) of 619 catchments across the Continental United States (CONUS) from 1981 to 2014. Six natural driving factors were selected from 31 catchment attributes describing topography and location, soil, geology, land cover, and climate. The Random Forest (RF) technique was then used to predict the BFI with the six selected driving factors as predictors. Results show that the long-term average BFI was 0.49 and that the BFI differed among the four seasons, with the highest value (0.55) in winter and the lowest (0.46) in autumn. The forest fraction, clay fraction, and snow fraction were the most influential factors affecting the long-term average BFI. Under Leave-Location-Out cross-validation, the RF technique predicted the BFI at the 619 sites across CONUS with an R² of 0.59, outperforming the multiple linear regression method. This study provides insight into the generation and variation of baseflow and can guide annual baseflow prediction for water resources management.


Introduction
Baseflow, which comprises groundwater discharge, slow subsurface (soil) runoff, and contributions from lakes, reservoirs, and glaciers, is important for determining the available water resources and for environmental protection during dry periods (Hall 1968). Because baseflow is released slowly as water moves through the soil matrix and maintains a relatively stable flow for long periods (Beck et al. 2013), it has important ecological functions, such as regulating the seasonal distribution of river flow (Gan et al. 2015), maintaining aquatic habitats under extreme weather conditions (Fan et al. 2013), and transporting physical and chemical substances (Bosch et al. 2017). Ensuring sufficient river discharge therefore requires efficient quantification or prediction of baseflow sources (Goncalves et al. 2020). However, the spatial variation and generation of baseflow remain unclear because of the small number of gauging stations and complex aquifer conditions. An in-depth analysis of the driving factors of baseflow is therefore essential for understanding the hydrological cycle and for sustainable water resources management (Araza et al. 2020).
The baseflow index (BFI) is the ratio of mean baseflow to mean streamflow at a site over a long period and can be calculated once the baseflow has been separated from the daily streamflow. Many methods, which can be divided into tracer and non-tracer methods, have been developed to separate baseflow and calculate the BFI as a hydrological characteristic (Brunner et al. 2018; Georgek et al. 2018; Jones et al. 2006). Because tracer methods have high labor and operating costs, non-tracer methods have been more widely used by hydrologists (Lott and Stewart 2016). Non-tracer methods include graphical methods (McNamara et al. 1997), digital filtering methods (Chapman 1991; Eckhardt 2005; Furey and Gupta 2001; Lyne and Hollick 1979), and physicochemical methods (Cook et al. 2006). Many studies have compared these methods in terms of baseflow separation performance. For example, Xie et al. (2020) applied hydrological-model and digital-filtering methods to separate daily baseflow in CONUS and compared their separation performance; they found that the BFI estimated by the Eckhardt digital filter had the highest median Nash–Sutcliffe efficiency (NSE) among the 1145 study sites. Zhang et al. (2017a) evaluated the relative advantages of three types of non-tracer separation methods using a tracer method as the benchmark, and their results demonstrate that using the automatic baseflow identification technique (ABIT) to estimate the recession constant produces smaller errors than using the default recession constant in baseflow simulation.
The BFI reflects the baseflow and streamflow regime of a catchment, which is closely related to catchment characteristics such as soil properties, geological conditions, and topography (Henderson and Wooding 1964; Zhang et al. 2019b). In previous studies, the BFI has often been used as a hydrological index of catchment low-flow conditions (Loveridge and Rahman 2014; Seo et al. 2018; Yang et al. 2018). Generally, regions with a high BFI have fewer extreme floods and more stable streamflow (Bastola 2018). Many BFI studies have been conducted for water resource management and planning. For example, Sapač et al. (2020) and Zhang et al. (2017b) described the BFI as the most common indicator in low-flow studies and used it as part of hydrograph recession analysis to provide an intuitive overview of the low-flow environment; Yang et al. (2020) calculated BFIs in the middle reaches of the Yellow River to evaluate ecological construction on the Loess Plateau in support of environmental management; and Schilling and Zhang (2004) and Zhu et al. (2019) relied on the BFI to estimate and control pollutant transport.
Although the BFI at a gauged site can be calculated easily using the aforementioned non-tracer methods, exploring the distribution and generation mechanism of baseflow and the BFI at ungauged sites to support river flow management remains a challenge. Several studies have attempted to transfer knowledge from gauged catchments to predict baseflow or the BFI in ungauged catchments. For example, Molla and Tegaye (2019) established a stepwise regression equation relating the BFI to slope and drainage density, providing a basic tool for exploring baseflow in ungauged catchments. Zhang et al. (2020) conducted a large-scale comparison of BFI prediction performance for 596 catchments in Australia; their results show that the multilevel regression method, with an NSE of 0.75 and a bias of 19%, outperformed linear regression methods and hydrological models (i.e., the SIMHYD and Xinanjiang models). They therefore concluded that multilevel regression could improve BFI prediction and be applied to predict large-scale hydrological characteristics. Singh et al. (2019) applied the random forest (RF) technique to estimate baseflow for all river reaches across New Zealand from catchment attributes and obtained generally satisfactory predictions of the spatial distribution of the long-term average and seasonal BFIs.
Despite many efforts on baseflow separation and simulation, the driving factors of baseflow have not been rigorously and systematically identified over large areas. Accurately identifying these driving factors helps in understanding baseflow generation and in predicting the BFI at ungauged stations. Therefore, this study has the following objectives: 1) to investigate the spatiotemporal pattern of baseflow in 619 catchments across CONUS, 2) to analyze in depth the driving factors of baseflow for a better understanding of baseflow generation, and 3) to provide a reliable approach to BFI prediction at ungauged sites using the RF technique to support water resources management.

Study area and data
A total of 619 catchments in CONUS were used in this study to analyze the BFI. Streamflow, precipitation, potential evapotranspiration, and 31 catchment attributes were collected from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) data sets (Addor et al. 2017; Newman 2015). Only gauged sites with records longer than 30 years were retained (619 sites) to ensure reliable long-term averages, and the streamflow trends during the study period are shown in Fig. 1.
Daily mean streamflow data are available from the USGS National Water Information System server (http://waterdata.usgs.gov/usa/nwis/sw). Thirty-one attributes from five categories (i.e., topography and location, climatic indices, soil signatures, land cover characteristics, and geological characteristics), derived from the CAMELS data sets, were used to describe baseflow generation. Of these 31 attributes, 5 are topography and location attributes, 6 are climatic indices, 6 are land cover characteristics, 11 are soil signatures, and 3 are geological characteristics (Table 1).

LH baseflow separation
The Lyne and Hollick recursive digital filter method (the LH method), a widely used digital signal processing method, was used to separate streamflow into baseflow and quick flow (Lyne and Hollick 1979). The LH method is easy to apply over large contiguous areas, and because it is objective and automatic, different users obtain the same results (Chapman 1991). The filter is formulated as follows (Nathan and McMahon 1990):

$$q_f(t) = \alpha\, q_f(t-1) + \frac{1+\alpha}{2}\left[Q(t) - Q(t-1)\right]$$

$$q_b(t) = Q(t) - q_f(t), \qquad 0 \le q_b(t) \le Q(t)$$

where Q(t) is the streamflow on day t, q_f(t) is the quick flow, q_b(t) is the baseflow, and α is the filter parameter, which is commonly set to a constant value (Nathan and McMahon 1990; Xie et al. 2020). However, in a large-scale study covering catchments with different features, such as soil and terrain, a constant parameter cannot provide satisfactory baseflow estimates for all regions (Zhang et al. 2017a). Therefore, before separating the baseflow, the ABIT was used to estimate the recession parameter of each catchment. The ABIT was proposed by Cheng et al. (2016) on the basis of the BN77 method (Brutsaert and Nieber 1977); it eliminates the influence of evapotranspiration and avoids the uncertainty in defining the recession period after rainfall, so that baseflow can be obtained objectively and quickly. Baseflow is described as:

$$-\frac{dq}{dt} = \frac{q}{K}$$

where q is the baseflow [mm/d], t is time in daily steps, and K is the characteristic drainage time of the catchment [d]. Integrating this equation gives the baseflow decay expression:

$$q(t) = q_0\, e^{-t/K}$$

where q_0 is the baseflow at t = 0 [mm/d]. The recession constant of each catchment can then be expressed as:

$$k = \frac{q(t+1)}{q(t)} = e^{-1/K}$$

All recession points are selected to form the baseflow record (Cheng et al. 2016). The method plots log(-dq/dt) against log q for these points and fits a lower envelope that leaves 5% of the points below it (Fig. S1). As a result, the combination of the LH method and the ABIT is hydrologically meaningful and gives a more reasonable estimate of the baseflow generation process.
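The two steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: recession days are simply days on which flow falls, -dq/dt is approximated by a one-day backward difference and q by the two-day mean, the lower envelope has unit slope (the linear-reservoir form) and is placed at the 5th percentile of log(-dq/dt) - log q, and, as the text implies without stating it outright, the resulting recession constant k is used as the filter parameter α. The function names `abit_recession_constant` and `lyne_hollick` are hypothetical, and the single forward pass with all flow on day 0 treated as baseflow is a simplification (many implementations run several forward and backward passes).

```python
import math


def abit_recession_constant(q, envelope_fraction=0.05):
    """Estimate drainage time K [d] and recession constant k from daily flow q.

    Simplified ABIT-style sketch: every day on which flow falls is treated as a
    recession point, and a unit-slope lower envelope is fitted in log-log space.
    """
    residuals = []
    for prev, cur in zip(q, q[1:]):
        if 0 < cur < prev:                     # falling limb only
            minus_dqdt = prev - cur            # -dq/dt over one daily step
            q_mid = 0.5 * (prev + cur)         # flow at the middle of the step
            # Linear reservoir: log(-dq/dt) = log(q) - log(K)
            residuals.append(math.log(minus_dqdt) - math.log(q_mid))
    if not residuals:
        raise ValueError("no recession points found")
    residuals.sort()
    # Intercept of the envelope that leaves ~envelope_fraction of points below
    intercept = residuals[int(envelope_fraction * len(residuals))]
    K = math.exp(-intercept)
    return K, math.exp(-1.0 / K)               # k = q(t+1) / q(t)


def lyne_hollick(Q, alpha):
    """Single forward pass of the Lyne-Hollick filter; returns daily baseflow."""
    baseflow = [Q[0]]                          # assumption: day 0 is all baseflow
    qf_prev = 0.0
    for t in range(1, len(Q)):
        qf = alpha * qf_prev + 0.5 * (1.0 + alpha) * (Q[t] - Q[t - 1])
        qf = min(max(qf, 0.0), Q[t])           # enforce 0 <= baseflow <= Q
        baseflow.append(Q[t] - qf)
        qf_prev = qf
    return baseflow


# Usage: K from a synthetic recession, then filter a small storm hydrograph.
recession = [10.0 * math.exp(-t / 20.0) for t in range(60)]
K, k = abit_recession_constant(recession)      # K is close to 20 days
storm = [2.0] * 5 + [12.0, 8.0, 5.0, 3.5, 2.8] + [2.5] * 5
base = lyne_hollick(storm, alpha=k)
bfi = sum(base) / sum(storm)
```

On the synthetic recession the estimated drainage time comes out close to the 20 days used to generate it; a real application would pass each catchment's observed daily record instead.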
Daily baseflow was separated from the streamflow records, and the long-term BFI was calculated by averaging the ratio of baseflow to streamflow over the entire period. The baseflow and streamflow records were also split into four seasons: spring (March, April, May), summer (June, July, August), autumn (September, October, November), and winter (December and the following January and February). The seasonal BFIs were then calculated by averaging the ratios of baseflow to streamflow within each season.
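A minimal sketch of the long-term and seasonal grouping, assuming daily records supplied as Python `datetime.date` objects with matching baseflow and streamflow lists. The hypothetical helper `bfi_by_season` computes each BFI as total baseflow divided by total streamflow, i.e. the ratio of mean baseflow to mean streamflow given in the Introduction's definition; averaging daily or annual ratios instead would give somewhat different values.

```python
import datetime
from collections import defaultdict

# Season of each calendar month. Only the month matters here because all years
# are pooled; per-year seasonal values would need December assigned to the
# following year's winter.
SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "autumn", 10: "autumn", 11: "autumn"}


def bfi_by_season(dates, baseflow, streamflow):
    """Long-term and seasonal BFI as total baseflow / total streamflow."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for day, b, q in zip(dates, baseflow, streamflow):
        for key in ("long-term", SEASON[day.month]):
            totals[key][0] += b
            totals[key][1] += q
    return {key: b / q for key, (b, q) in totals.items() if q > 0}


# Usage with one synthetic year: baseflow is 1.0 in winter months, else 0.5.
start = datetime.date(2001, 1, 1)
dates = [start + datetime.timedelta(days=i) for i in range(365)]
base = [1.0 if d.month in (12, 1, 2) else 0.5 for d in dates]
flow = [2.0] * 365
bfi = bfi_by_season(dates, base, flow)
```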
The coefficient of variation (CV) was used to assess the variability of flow at the annual scale:

$$CV = \frac{\sigma}{\mu}$$

where σ is the standard deviation of annual flow, and μ is the mean annual flow.
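A short sketch of the annual CV, assuming daily flows tagged with their calendar year, annual flow taken as the mean daily flow of each year, and σ computed as the sample standard deviation (the text does not state which estimator was used). `annual_cv` is a hypothetical helper name.

```python
import statistics
from collections import defaultdict


def annual_cv(years, flow):
    """CV = sigma / mu of annual flow, with annual flow = mean daily flow per year."""
    by_year = defaultdict(list)
    for year, f in zip(years, flow):
        by_year[year].append(f)
    annual = [statistics.fmean(v) for _, v in sorted(by_year.items())]
    # Sample standard deviation is an assumption; pstdev would also be defensible.
    return statistics.stdev(annual) / statistics.fmean(annual)


# Usage: two years with mean daily flows of 1 and 3 mm/d give CV = sqrt(2)/2.
cv = annual_cv([2000] * 3 + [2001] * 3, [1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
```

The same function applies unchanged to a baseflow series, which gives the baseflow CV used alongside the streamflow CV.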

Random Forest (RF)
The RF method is a machine learning algorithm for classification and regression; here it was used to predict the BFI by treating the BFI as a function of a set of selected driving factors. RF was proposed by Breiman (2001) (see also Cutler et al. 2011); it is built on bootstrap sampling and consists of many independent decision trees. By growing many decision trees, RF captures non-linear relationships between the response and a set of predictors (Ellis et al. 2012; Singh et al. 2019). Each leaf of a decision tree holds a prediction, and the final model output combines the predictions of all trees in the forest (for regression, their average). Two sources of randomness ensure the robustness and stability of the predictions: the sampling of sub-training sets and the random selection of candidate attributes during the construction of each tree (Breiman 2001; Cutler et al. 2011).
The simulation results were assessed qualitatively with out-of-bag (OOB) prediction. In this procedure, the study sites were randomly divided into two groups: "gauged sites" for model training and "ungauged sites" for model assessment. The gauged sites were used to construct the RF regression models through repeated training. The ungauged sites were treated as unobserved, and their long-term and seasonal BFIs were predicted from the predictors with the constructed models (Booker and Snelder 2012). Prediction performance and reliability were then checked qualitatively by plotting predicted against measured values.
The RF method has three important settings. The number of variables randomly sampled as candidates at each split was set to 4 after successive trials. The number of decision trees in the forest was set to 600 to stabilize error fluctuations. Variable importance, which represents the increase in model prediction error when a variable is removed, is reported as a variable importance matrix and measured by the residual sum of squares.
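The RF procedure above can be illustrated with a compact, pure-Python sketch of random forest regression that exposes the two settings named in the text: `mtry` candidate predictors at each split (4 in the study) and `ntree` trees (600 in the study). It splits nodes by minimizing the residual sum of squares, averages the tree predictions, and collects out-of-bag predictions for sites left out of each bootstrap sample (the classical OOB scheme, which is slightly different from the random gauged/ungauged split described above). The minimum leaf size and all function names are hypothetical. This is a teaching sketch, not the implementation used in the study (which this section does not name); in practice a maintained library such as R's randomForest or scikit-learn's RandomForestRegressor would be preferred.

```python
import random
import statistics


def _rss(values):
    """Residual sum of squares about the mean, used as the split criterion."""
    if not values:
        return 0.0
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def _grow(X, y, rows, mtry, rng, min_leaf):
    """Grow one regression tree; a leaf stores the mean target of its rows."""
    if len(rows) < 2 * min_leaf:
        return statistics.fmean([y[i] for i in rows])
    best = None
    candidates = rng.sample(range(len(X[0])), min(mtry, len(X[0])))
    for f in candidates:                       # random subset of predictors
        cuts = sorted({X[i][f] for i in rows})
        for lo, hi in zip(cuts, cuts[1:]):
            thr = 0.5 * (lo + hi)
            left = [i for i in rows if X[i][f] <= thr]
            right = [i for i in rows if X[i][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _rss([y[i] for i in left]) + _rss([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return statistics.fmean([y[i] for i in rows])
    _, f, thr, left, right = best
    return (f, thr, _grow(X, y, left, mtry, rng, min_leaf),
            _grow(X, y, right, mtry, rng, min_leaf))


def predict_tree(node, x):
    while isinstance(node, tuple):             # internal node: (f, thr, l, r)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, ntree=600, mtry=4, min_leaf=3, seed=0):
    """Bootstrap-aggregated trees; also returns out-of-bag (OOB) predictions."""
    rng = random.Random(seed)
    n = len(y)
    trees, oob_sum, oob_cnt = [], [0.0] * n, [0] * n
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]    # bootstrap sample
        tree = _grow(X, y, boot, mtry, rng, min_leaf)
        trees.append(tree)
        for i in set(range(n)) - set(boot):           # sites left out of bag
            oob_sum[i] += predict_tree(tree, X[i])
            oob_cnt[i] += 1
    oob = [s / c if c else None for s, c in zip(oob_sum, oob_cnt)]
    return trees, oob


def predict_forest(trees, x):
    """Forest prediction = average of the individual tree predictions."""
    return statistics.fmean([predict_tree(t, x) for t in trees])
```

With X holding the six driving factors of each catchment, `fit_forest(X, y, ntree=600, mtry=4)` would mirror the settings reported above; the pure-Python split search is slow, so this is only practical for small illustrations.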
To improve the robustness of the prediction model, two criteria were used to select the driving factors from the catchment attributes of the CAMELS data set: 1) each driving factor must be significantly correlated with the BFI (p < 0.05); and 2) there must be no obvious collinearity among the selected driving factors.
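The two selection criteria can be expressed as a small screening routine. This is a sketch under stated assumptions, not the procedure behind Tables S1 and S2: significance is tested with the t statistic of the Pearson correlation using a normal approximation (reasonable for n = 619, but an assumption), and collinearity is judged with a hypothetical pairwise threshold |r| < 0.7, applied greedily starting from the factor most strongly correlated with the BFI. The text does not say which collinearity measure was used (a variance inflation factor would be a common alternative).

```python
import math
from statistics import NormalDist, fmean


def pearson(a, b):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def p_value(r, n):
    """Two-sided p for H0: rho = 0, using a normal approximation to the t test."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def select_drivers(attributes, bfi, alpha=0.05, max_abs_r=0.7):
    """Criterion 1: p < alpha against BFI. Criterion 2: hypothetical |r| cap
    between selected factors, applied greedily from the strongest factor."""
    n = len(bfi)
    significant = sorted(
        ((abs(pearson(v, bfi)), name) for name, v in attributes.items()
         if p_value(pearson(v, bfi), n) < alpha),
        reverse=True)
    selected = []
    for _, name in significant:
        if all(abs(pearson(attributes[name], attributes[s])) < max_abs_r
               for s in selected):
            selected.append(name)
    return selected
```

Here `attributes` maps attribute names to per-catchment values; an attribute that passes the significance test but is strongly correlated with an already selected one is dropped, in the same spirit as the exclusion of the slope attribute described in the results.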

Evaluation metrics
The Leave-Location-Out (LLO) cross-validation approach was used to test BFI prediction performance. Unlike traditional cross-validation, LLO groups neighboring sites together to reduce the effect of spatial autocorrelation and avoid the overly optimistic predictions caused by over-fitting (Brenning 2005; Brenning and Lausen 2010). The K-means clustering method was applied to the longitude and latitude of the sites to group spatially neighboring sites, without considering any catchment attributes. During LLO cross-validation, each cluster in turn was assumed to be ungauged, and the remaining clusters were used as the training set to build the prediction model. The error of the prediction model is the mean deviation of the predictions over all test sets (Hüllermeier et al. 2010).
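The fold construction described above can be sketched as follows: K-means (Lloyd's algorithm) on site coordinates only, then one fold per cluster with that cluster held out. Assumptions: distances are plain Euclidean distances in degrees of latitude and longitude (a simplification that ignores Earth curvature), the centres are initialised by a deterministic farthest-first rule, and the number of clusters `k` is a free parameter because this section does not report the value used. The helper names are hypothetical.

```python
def _dist2(p, c):
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


def kmeans(points, k, iters=100):
    """Lloyd's k-means on (lat, lon) pairs with farthest-first initialisation."""
    pts = [tuple(p) for p in points]
    centers = [pts[0]]
    while len(centers) < k:
        centers.append(max(pts, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in pts]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def llo_folds(coords, k):
    """Yield (train, test) index lists, holding out one spatial cluster at a time."""
    labels = kmeans(coords, k)
    for c in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == c]
        train = [i for i, lab in enumerate(labels) if lab != c]
        yield train, test
```

Each (train, test) pair would then be used to fit the model on the training clusters and score it on the held-out cluster, so neighboring sites never appear on both sides of a split.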
In addition, three metrics were used to assess model performance quantitatively: the NSE coefficient (Nash and Sutcliffe 1970), the bias (Legates and McCabe 1999), and the root mean squared error (RMSE) (Gupta et al. 1999). In these metrics, BFI_M is the "measured" BFI (i.e., the BFI estimated by baseflow separation), BFI_P is the BFI predicted by the RF model, and n is the number of catchments.
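The three metrics can be sketched as plain functions of the measured and predicted BFI lists. NSE and RMSE follow their standard definitions. The exact form of the bias is not given in this section, so the relative form sum(BFI_P - BFI_M) / sum(BFI_M) used in `relative_bias` is an assumption; function names are illustrative.

```python
import math


def nse(measured, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares of measured BFI."""
    mean_m = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst


def relative_bias(measured, predicted):
    """Assumed form: (sum(BFI_P) - sum(BFI_M)) / sum(BFI_M)."""
    return (sum(predicted) - sum(measured)) / sum(measured)


def rmse(measured, predicted):
    """Root mean squared error over the n catchments."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
```

For example, predictions that overshoot measured BFIs of 0.2, 0.4, 0.6, and 0.8 by 0.1 each give an NSE of 0.8, a relative bias of 0.2, and an RMSE of 0.1.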

Spatial pattern of estimated baseflow in CONUS
The baseflow of the 619 sites across CONUS was estimated with the LH method, and the spatial distributions of the long-term and seasonal BFIs are shown in Fig. 2. In general, the long-term average BFI ranged from 0.02 to 0.98 across all sites, with a mean of 0.49. BFIs were markedly higher in the north-central part of CONUS (e.g., Michigan along the shores of Lake Michigan, and southwestern Wisconsin) than elsewhere, with values of approximately 0.7 to 1. These BFIs are high mainly because the regions lie along the coast of Lake Michigan, where ample water supply and high soil moisture favor baseflow generation (Wolock 2003; Zhang et al. 2013). Meanwhile, BFIs in the eastern and northwestern regions of CONUS ranged from 0.46 to 0.75, generally above the national average. In contrast, BFIs in the central region were generally below the national average, especially in the middle Mississippi Valley and along the Western Gulf Coast, where desert rock landscapes and sparse vegetation cover lead to weak soil storage capacity, so that most rainfall leaves the catchment as quick flow. Among the seasons, the average BFI was highest in winter (0.55), lower in spring (0.49) and summer (0.53), and lowest in autumn (0.48). However, anomalies occurred in Midwestern CONUS (e.g., the Rocky Mountains), where BFIs were highest in autumn and winter and lowest in spring. This occurs because these areas are dominated by snowfall and the freezing period generally begins in early October. Therefore, when the season shifts to autumn, streamflow mostly recharges the subsurface and sustains sufficient baseflow, resulting in the highest BFIs of the four seasons. In contrast, as temperatures rise in spring, most of the snow and ice melts and forms quick flow, whereas the baseflow responds with a lag, producing the lowest BFIs of the year in this season (Bryant 2020; Karlstrom and Houston 1984; Liefert et al. 2018).
Furthermore, the coefficient of variation (CV) was used to measure the variability of baseflow and streamflow at each site during the study period (Fig. 3). Comparison with the long-term BFI distribution (Fig. 2e) indicates that the BFI is negatively correlated with the CV, which is consistent with the results of Longobardi and Villani (2020). Sites with a high BFI usually have a low CV, indicating that baseflow and streamflow are generally more stable in high-BFI regions than in low-BFI regions. For example, BFIs were generally high (0.5-0.75) in the Pacific Northwest of CONUS, whereas the CVs of baseflow (0-0.3) and streamflow (0.4-1) were lower than the national means.

Selection of driving factors
The driving factors selected from the 31 catchment attributes are shown in Fig. 4, and the results of the significance analysis and the collinearity calculations among the selected factors are given in Tables S1 and S2. Six attributes, namely clay fraction (CF), soil depth to bedrock (SD), mean catchment elevation (EL), forest fraction (FF), subsurface porosity (GSP), and snow fraction (SF), were selected as the dominant factors for predicting the long-term and seasonal BFIs. Although the mean catchment slope (SM) is significantly and positively correlated with the BFI, it is strongly collinear with other factors and was therefore excluded. CF and SD represent soil characteristics and reflect the local water retention capacity (Bloomfield et al. 2009b). EL is a topography and location attribute that reflects the influence of terrain on hydrological processes (Santhi et al. 2008). FF is a land cover factor that reflects vegetation coverage. GSP is a geological attribute that indirectly reflects aquifer conditions in different regions. SF is the fraction of precipitation falling as snow (i.e., when the temperature is below 0°C); it is a climate characteristic that captures the effect of low temperatures on baseflow generation (Xie et al. 2020).

Importance of driving factors
The generation of baseflow is closely connected to local catchment attributes (Xie et al. 2020; Zhang et al. 2020). To understand how catchment attributes influence baseflow generation, the RF method was used to evaluate their importance, because it assigns an importance value to each attribute; the results are shown in Fig. 5. Importance corresponds to the increase in error that results from removing the attribute from the model, and thus represents the contribution of that attribute to prediction accuracy (Booker and Snelder 2012). According to the RF importance values, the forest fraction (FF) is the most important predictor of the long-term average BFI, with an importance value of 3.9, followed by the clay fraction (CF). FF therefore largely controls the long-term proportion of baseflow in most catchments. Because lush forests influence soil texture and the vegetation landscape and improve soil storage capacity, they provide a favorable environment for baseflow formation (Zhang 2019a). Furthermore, the dominant controls on baseflow generation vary among seasons. In spring, the pattern is similar to that of the long-term BFI: the forest fraction has the greatest influence, followed by the clay fraction. Autumn and winter are similar to each other: the most influential driving factor is the snow fraction (SF), followed by the mean elevation (EL). This is mainly due to the particular climate of the high-altitude regions of Midwestern CONUS, where winter starts early and lasts long (usually from October to June) and the annual precipitation of about 1000 mm is dominated by snow (Musselman et al. 1994).
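The idea of measuring importance as the loss of accuracy when a predictor's information is removed can be illustrated with permutation importance: each predictor column is shuffled in turn, and the resulting increase in mean squared error is recorded. This generic sketch uses a hypothetical helper name and is one common way of computing RF importance, not necessarily the exact measure behind Fig. 5 (the Methods also mention a residual-sum-of-squares measure). A fixed linear function stands in for a fitted model so that the example is self-contained.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is randomly shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                 # break the link between x_j and y
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            increase += mse(permuted) - baseline
        scores.append(increase / n_repeats)
    return scores


# Usage with a stand-in "model" that ignores its third predictor.
def toy_model(row):
    return 0.5 * row[0] + 0.1 * row[1]


X = [[(i * 7 % 50) / 50, (i * 11 % 50) / 50, (i * 13 % 50) / 50] for i in range(50)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
```

The predictor with the largest weight receives the largest score, and the ignored predictor scores exactly zero, which mirrors how FF and CF rank above the other factors for the long-term BFI.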

Prediction performance of long-term and seasonal BFIs
The LLO cross-validation approach was used to evaluate the performance of the RF technique in predicting long-term and seasonal BFIs. For comparison, the widely used multiple linear regression (MLR) method was also applied to predict the long-term and seasonal BFIs, and its performance was evaluated with the same LLO cross-validation approach. The validation results are shown in Fig. 6. Each point represents the predicted and measured BFI of one site, and the points generally follow the 1:1 line. The RF models outperformed the MLR method in predicting long-term and seasonal BFIs in terms of R², indicating that the RF predictions are more reliable. The predictive performance was further assessed quantitatively with the NSE, the RMSE, and the bias, as shown in Table 2. To illustrate the spatial prediction performance, the long-term and seasonal BFI predictions were interpolated, and the error at each site is plotted in Fig. 7, which can be compared with the estimated BFI patterns in Fig. 2. For the long-term BFI, 489 of the 619 "ungauged" sites were predicted with an error of less than 0.3. Overall, the prediction performance of the model is satisfactory, especially in areas with high BFI values, such as the northwestern regions of CONUS and the Atlantic coast.
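The MLR baseline in this comparison can be sketched as ordinary least squares with an intercept, solved from the normal equations by Gauss-Jordan elimination with partial pivoting. The sketch assumes that the MLR used the same predictors as the RF model (the section does not list them separately), and the helper names are hypothetical; within LLO cross-validation the model would be fitted on the training clusters and applied to the held-out cluster.

```python
def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    A = [[1.0] + list(row) for row in X]        # design matrix with intercept
    p = len(A[0])
    # Augmented system [X^T X | X^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for col in range(p):                        # Gauss-Jordan, partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_mlr(beta, row):
    """Intercept plus the weighted sum of the predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))
```

Because MLR can only represent additive linear effects, it cannot capture the threshold-like responses (for example to snow fraction) that the tree-based RF model can, which is consistent with RF performing better in this comparison.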

Discussion
To better understand baseflow at large scales, the LH method was combined with the ABIT to estimate the baseflow of 619 sites across CONUS, and the RF algorithm was applied for an in-depth, systematic analysis of the driving factors of the BFI to explore the mechanism of baseflow generation. The results show that FF and CF play an important role in long-term baseflow generation, whereas SF dominates seasonal baseflow generation (especially in autumn and winter). In addition, the large-scale prediction of long-term and seasonal BFIs based on the RF technique produced promising results.

Variations of baseflow and BFI
The spatiotemporal patterns of streamflow, baseflow, and BFI across CONUS were analyzed in this study, and the BFI pattern is consistent with a previous study by the U.S. Geological Survey (Wolock 2003). Many studies have explored baseflow and BFI variation over large areas and analyzed the effects of catchment attributes. Santhi et al. (2008) used the USGS data set to explore the spatial variation of the BFI in CONUS, but with a different aim: they examined the correlation between baseflow and hydrogeologic variables in different hydrologic landscape regions, whereas this work analyzes the influence of natural driving factors on baseflow generation and aims to make accurate predictions. Gudmundsson et al. (2019) studied historical trends in global river flow and assessed the availability of global water resources by focusing on changes in low, mean, and high flows; their overview of global flows supports the analysis of baseflow mechanisms. Furthermore, the changes in baseflow and streamflow at each site during the study period were also considered and measured as CVs.
Fig. 6 BFI prediction performance by RF (upper panel) and MLR (lower panel) in 619 catchments across CONUS
Comparing the long-term BFI distribution (Fig. 2e) with the CV distribution (Fig. 3) across CONUS shows that the CVs of baseflow and streamflow are significantly negatively correlated with the BFI, with correlation coefficients of -0.54 and -0.59, respectively (Fig. 8). This indicates that regions with high BFIs usually have stable baseflow and streamflow. Bastola et al. (2018) reached a similar conclusion when analyzing the contribution of monthly and annual baseflow changes to streamflow: regions with a high BFI usually have low CV values. Therefore, attention should be paid to low-BFI regions, whose baseflow and streamflow are vulnerable and may threaten the sustainable development of water resources.

Mechanism of baseflow generation under various catchment attributes
FF performs well as a predictor of the long-term BFI across CONUS. In general, FF is the most important indicator for evaluating baseflow: the higher the forest coverage, the more permeable the soil and the greater its water retention capacity (Lacey and Grayson 1988). Although some studies have suggested that higher forest cover increases evapotranspiration and thus reduces baseflow, many others have shown that higher forest cover increases baseflow because higher soil permeability provides adequate groundwater recharge (Bruijnzeel 1991, 2004). Ma et al. (2009) studied a watershed in southwestern China and found that reforestation increased baseflow and reduced streamflow, which they attributed to increased soil infiltration. Price et al. (2011) showed a significant positive correlation between forest cover and baseflow in the southern Appalachian Highlands. CF is the second most important factor; it is conducive to the infiltration of surface water and provides a favorable environment for baseflow generation (Bloomfield et al. 2009a). Xu et al. (2013) used a climate elasticity approach to assess the sensitivity of hydrological changes to climate and land surface changes in Midwestern CONUS and found that land surface changes affect baseflow and the BFI significantly more than climate change does; the present study supports this finding. In addition, snow fraction is an important driver of baseflow generation. When temperatures rise sufficiently, seasonal snowmelt produces peak discharges in spring and summer (Zhang et al. 2014), and most of the meltwater from the snowpack reaches the river as quick flow.
Moreover, during the snowmelt period, frozen soil hinders infiltration and accelerates the generation of quick flow (Shanley and Chalmers 1999). Therefore, in the high-altitude frozen catchments of CONUS, the BFI is lowest in spring and summer, which is consistent with the seasonal changes analyzed above. The snow fraction has the most obvious impact on the BFI in autumn and winter, especially in Midwestern CONUS (Fig. 2c, d). Catchments in Midwestern CONUS are characterized by high altitude, low temperature, and perennial snow cover, and they show a marked increase in BFI in autumn and winter. Because these catchments receive less precipitation in autumn and winter, streamflow receives less replenishment; moreover, the mean winter temperature is below 0°C at most sites. Streamflow therefore decreases markedly, whereas baseflow declines relatively steadily (Woods 2009). Since the BFI is the ratio of baseflow to streamflow, it becomes large and shows a clear increasing trend during these seasons. However, the snowmelt process also has uncertain effects on baseflow generation. In large catchments, the snowmelt rate varies with geographical conditions, the physical environment, and catchment elevation. In addition, much of the snowmelt water does not directly supplement river discharge but enters the snowpack and refreezes (Marsh and Woo 1984). These uncertain processes make it challenging to analyze baseflow generation.

BFI prediction for water resource planning and management
The RF technique was used in this study to predict long-term and seasonal BFIs, and its R², NSE, bias, and RMSE values were better than those of the MLR method (Table 2). The results indicate that the RF technique, which trains many decision trees on random subsets of attributes and training samples, is well suited to large-scale predictive learning. Many other studies have also used RF to predict hydrological variables with good results. Olson and Hawkins (2013) built an RF model to predict the continuous spatial variation of total P and total N in rivers of the western United States, and it outperformed the previous physical model. Fouad and Loaiciga (2020) evaluated regression models of percentile flows in 918 basins across CONUS and concluded that RF predicted better than the baseline regression procedure. Although RF is an empirical technique that does not represent physical mechanisms, the six selected predictors cover basin topography, soil, geology, land cover, and climate. The comparison of different combinations of catchment attributes, and with the MLR method, indicates that the current prediction model performs satisfactorily, especially in areas with high BFI values. Therefore, integrating digital filtering and the RF technique into a framework for large-scale baseflow separation and prediction is a promising approach that can support low-flow prediction and water ecological management. Baseflow is closely linked to complex hydrogeological conditions and is affected by human activities and climate change (Rodiger et al. 2020; Tan et al. 2020). The CAMELS catchments used in this study are minimally affected by human activities, which limits the influence of human activities on the baseflow generation analyzed here.
Future research should explore the driving mechanisms of human impacts (such as reservoir construction, water resource regulation, and vegetation protection), as well as the driving mechanisms under the combined influence of catchment attributes (River and Richardson 2018; Zhu et al. 2019). It is worth noting that Cheng et al. (2016) proposed an automatic technique based on BN77 theory (Brutsaert and Nieber 1977) to obtain the recession constant more objectively and quickly, and Singh et al. (2019) and Zhang et al. (2020) applied this method to baseflow analyses of New Zealand and Australian catchments, respectively, with satisfactory performance.

Conclusions
This study aims to systematically analyze the spatial variation and driving factors of baseflow and to provide a reliable approach to BFI prediction. Based on the study of CONUS from 1981 to 2014, the preliminary conclusions are as follows: (1) In terms of spatial pattern, the Great Lakes region and the Rocky Mountains, which have ample water supply, have the highest BFIs across CONUS. In contrast, the BFI is lower in the central region, especially in the middle Mississippi Valley and along the Western Gulf Coast, where desert rock landscapes and sparse vegetation cover lead to weak soil storage capacity, so that rainfall is converted into quick flow.
Funding This work was financially supported by the National Natural Science Foundation of China (Grant No. 51979198), led by Qianjin Dong.
Availability of data and material The data used in this work are from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) data sets, available at https://ral.ucar.edu/solutions/products/camels.
Code availability Not available.

Declarations
Conflicts of interest The authors declare that they have no conflicts of interest.