Results for both monthly and annual discharges are discussed together. In general, TWSA- and altimetry-derived discharges provide reasonable monthly and annual estimates because they are calibrated to the observed data (Fig. 3). In contrast, the GLDAS models are largely uncalibrated with respect to observed river discharges and do not account for water regulation. In this study, NOAH generally provided better estimates of discharge than CLSM. While this study used NOAH (GLDAS 2.1) and CLSM (GLDAS 2.2), these findings are consistent with Hou et al. (2017) 9, who showed that NOAH (GLDAS 2.0) exhibited stronger correlations with observed monthly streamflow globally than CLSM (GLDAS 2.0). For both GLDAS models, results tended to be better in the eastern US than in the western US. This finding is consistent with other studies 56 and is likely associated with non-linear rainfall-runoff behavior in arid and semi-arid watersheds (e.g., western US), where runoff generation is highly variable.
A major difference between the GLDAS models used in this study, CLSM (GLDAS 2.2) and NOAH (GLDAS 2.1), is that the CLSM model assimilates GRACE TWSA. This difference is a likely reason for the contrast in performance in the Great Lakes region. While NOAH discharges are reasonable in this region, both CLSM and TWSA-derived discharges perform poorly, likely due to ‘leakage effects’ in the GRACE-derived TWSA estimates in this region 57. This was also highlighted in Duvvuri & Beighley (2023) 29, which shows strong temporal correlation between TWSA and Great Lakes levels for watersheds near the Great Lakes. While not linked to GRACE leakage, the altimetry-derived discharges also tended to perform poorly in this region. One potential reason for this finding is the magnitude of uncertainty in altimetry-derived WSE relative to the range in monthly river WSE. For example, the Great Lakes region receives less precipitation and generally has lower landscape gradients than the eastern US 58,59, which generally leads to lower variability in river WSE. However, additional research is needed to better understand what is causing the lower performance of altimetry-derived discharges in this region.
In addition to potential leakage impacts, GRACE data assimilation does not appear to compensate for the lack of a groundwater withdrawal scheme in CLSM in regions with intensive groundwater abstraction 42. Girotto et al. (2016) 60 also suggest that the use of GRACE data can sometimes introduce negative trends in shallow groundwater. Thus, in irrigated areas, GRACE data assimilation might not properly attribute mass change to the different storage layers (i.e., surface soil moisture, root-zone soil moisture and groundwater). This could be a cause of the poor performance of the CLSM model, particularly where groundwater depletion is caused by anthropogenic impacts 61, such as in the Great Plains and the Colorado. This explanation applies where groundwater affects river discharge. For example, Nie et al. (2018) 62 showed that accounting for groundwater pumping in the Great Plains improved the representation of TWS variability by LSMs and agreement with GRACE.
As noted above, the performance of the GLDAS models is lower in the western US than in the eastern US. Challenges associated with runoff generation and its sensitivity to soil moisture are likely contributing factors. In the western US, simulated ET estimates tend to exhibit high relative uncertainty due to the limited availability of water, sparse vegetation cover, and high evaporative demand 63. Additionally, human interventions and intricate terrain and topography, such as mountain ranges and deep valleys, exert localized influences on weather patterns, atmospheric stability, and wind dynamics, creating microclimates and local meteorological conditions that affect ET model performance 64. Yu et al. (2023) 63 identified CLSM as one of the models with high uncertainties in ET estimates among the nine models in their study. Though not explicitly discussed in their study, annual average ET for CLSM showed a cluster of grid cells in the southeast US with anomalously small ET values. In this study, CLSM-HRR derived discharges are overestimated for a cluster of gauges in the southeast US (red points in Fig. 8) that corresponds with these low ET estimates.
Performance of the GLDAS models did not appear to vary with drainage area, while the performance of the remote sensing methods tended to improve for larger areas. The mean KGE and correlation at the gauges with KGE greater than 0.32 increased slightly with drainage area for TWSA- and altimetry-derived discharges. For altimetry-derived discharges, the mean correlation and KGE increased by nearly 0.2 from gauges draining less than 10,000 km² to gauges draining more than 25,000 km². This finding was expected, as both remote sensing methods are best suited for larger rivers. Fig. S1 in the supplementary material highlights these findings.
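The drainage-area comparison above can be illustrated with a minimal sketch. The gauge records, the `area_class` helper, and the middle class boundary handling are hypothetical and chosen only for illustration; they are not the study's data or code. The sketch keeps gauges above the 0.32 KGE target and averages KGE within three drainage-area classes.

```python
from statistics import mean

# Hypothetical gauge records: (drainage area in km^2, KGE). Illustrative values only.
GAUGES = [
    (4_000, 0.35), (8_500, 0.41), (9_000, 0.20),
    (12_000, 0.45), (20_000, 0.50), (24_000, 0.10),
    (30_000, 0.60), (75_000, 0.58),
]

KGE_TARGET = 0.32  # performance threshold used in the text


def area_class(area_km2):
    """Assign a gauge to one of three drainage-area classes (assumed boundaries)."""
    if area_km2 < 10_000:
        return "<10,000"
    if area_km2 <= 25_000:
        return "10,000-25,000"
    return ">25,000"


def mean_kge_by_area(gauges, threshold=KGE_TARGET):
    """Mean KGE per area class, using only gauges with KGE above the threshold."""
    groups = {}
    for area, kge in gauges:
        if kge > threshold:
            groups.setdefault(area_class(area), []).append(kge)
    return {label: mean(values) for label, values in groups.items()}


summary = mean_kge_by_area(GAUGES)
for label, value in summary.items():
    print(f"{label} km^2: mean KGE = {value:.2f}")
```

With real gauge data, the same grouping could be repeated for correlation to check whether both metrics rise with drainage area.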
Overall, all methods tend to perform better as watershed precipitation increases. This result is somewhat expected. For the GLDAS models, this is likely due to runoff generation being less impacted by dry initial conditions (i.e., highly non-linear infiltration behavior). For the TWSA-derived discharges, the TWSA-Q relationship tends to be strong because regulation is somewhat limited in these regions and large increases in monthly precipitation tend to produce corresponding increases in TWSA and discharge. For the altimetry-derived discharges, large increases in monthly precipitation tend to result in large increases in discharge (i.e., WSE), which limits the impact of altimetry-derived WSE uncertainties on the derived discharge.
To investigate contributions (i.e., correlation, bias and variability) to low KGE values (i.e., poor discharge estimates), the KGE component causing the largest reduction in KGE from its optimal value of 1 was determined at each gauge. In general, low correlation and high variability were the primary causes of the low KGE values. Here, we highlight results at the gauges with KGE less than 0.32 (i.e., locations where performance is below our targeted value). Table 3 shows the fraction of sites most impacted by each of the three KGE components. Several key points emerge from Table 3. First, contributions from poor bias are lowest for TWSA- and altimetry-derived discharges, which is likely due to these methods being trained on gauge observations (i.e., training results in predictions trending towards the mean). In addition, variability in TWSA-derived discharges is muted due to the nature of the TWSA signal (i.e., large variability is not likely). These contributions are also consistent for monthly and annual estimates. In contrast, contributions from poor correlation are highest for these two methods at the monthly scale, which is likely due to TWSA signals being impacted by regulation or GRACE leakage, or to uncertainty in altimetry measurements exceeding variations in water surface elevations. At the annual scale, there is a noticeable increase in sites where variability is the leading cause of low KGE, suggesting that the overall uncertainty in the effective relationships is significant.
Table 3
Leading cause of low KGE for monthly (1st value) and annual (2nd value) discharges at gauges with KGE less than 0.32; values represent percent of sites with a given KGE component providing the largest impact on KGE.
KGE Component | TWSA-derived | NOAH-HRR | CLSM-HRR | Altimetry-derived |
Correlation | 88 to 57 | 34 to 20 | 40 to 9 | 52 to 28 |
Bias | 0 to 1 | 19 to 42 | 25 to 27 | 13 to 13 |
Variability | 12 to 42 | 47 to 38 | 35 to 64 | 35 to 59 |
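The attribution summarized in Table 3 can be sketched as follows. This is a minimal illustration, assuming the commonly used form of the Kling-Gupta efficiency, KGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²), with r the linear correlation, α the ratio of simulated to observed standard deviation (variability) and β the ratio of simulated to observed mean (bias). The exact KGE variant and attribution procedure used in the study are not restated here, so the function names and the rule "largest deviation from 1" are assumptions.

```python
import math
from statistics import mean, pstdev


def pearson_r(sim, obs):
    """Pearson correlation between two equal-length series."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def kge_components(sim, obs):
    """Return KGE and its three components (assumed standard formulation)."""
    r = pearson_r(sim, obs)               # correlation
    alpha = pstdev(sim) / pstdev(obs)     # variability ratio
    beta = mean(sim) / mean(obs)          # bias ratio
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return kge, {"correlation": r, "variability": alpha, "bias": beta}


def leading_cause(components):
    """Component furthest from its optimal value of 1 (assumed attribution rule)."""
    return max(components, key=lambda name: abs(components[name] - 1.0))


# Hypothetical example: simulation with perfect timing and spread but a constant offset.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sim = [o + 3.0 for o in obs]
kge, parts = kge_components(sim, obs)
cause = leading_cause(parts)
print(f"KGE = {kge:.3f}, leading cause = {cause}")
```

Applying `leading_cause` at every gauge with KGE below 0.32 and counting the outcomes per method would yield percentages of the kind reported in Table 3.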
The GLDAS models tend to show a somewhat more uniform distribution of contributions to low KGE, but also suggest that low correlation dominates at the monthly scale and poor variability dominates at the annual scale. As noted above and in other studies (e.g., Lv et al., 2018; Xia et al., 2017 65,67), peaks from GLDAS-derived runoff/discharges tend to occur early. This could be a reason for poor correlation at the monthly scale. For example, Lv et al. (2018) observed that the snowmelt start time of GLDAS 2.1 NOAH tends to be early, causing early peaks in discharges as compared to in-situ gauges in the northern hemisphere. At the annual scale, the impacts of poor peak timing are muted, as the number of sites impacted by low correlation decreases, especially for CLSM-HRR (40% for monthly to only 9% for annual). The number of sites with low correlation also reflects the GLDAS models’ ability to provide reasonable water balance behavior (e.g., more precipitation – more runoff) at the annual scale, where impacts due to water regulation and snow accumulation and melt are muted.
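The effect of early peaks on monthly correlation can be illustrated with a small synthetic sketch. The series below are hypothetical sinusoids, not study data, and `lagged_correlation` is an illustrative helper; the simulated series is constructed to peak one month before the observed one.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sim, obs, lag):
    """Correlate sim[t] with obs[t + lag]; a positive lag means sim leads obs."""
    if lag > 0:
        s, o = sim[:-lag], obs[lag:]
    elif lag < 0:
        s, o = sim[-lag:], obs[:lag]
    else:
        s, o = sim, obs
    return pearson_r(s, o)


# Four years of synthetic monthly discharge with an annual cycle.
months = range(48)
obs = [10 + 5 * math.sin(2 * math.pi * (m - 3) / 12) for m in months]
sim = [10 + 5 * math.sin(2 * math.pi * (m - 2) / 12) for m in months]  # peaks one month early

r_lag0 = lagged_correlation(sim, obs, 0)
r_lag1 = lagged_correlation(sim, obs, 1)
best_lag = max(range(-3, 4), key=lambda k: lagged_correlation(sim, obs, k))
print(f"r at lag 0 = {r_lag0:.3f}, r at lag 1 = {r_lag1:.3f}, best lag = {best_lag}")
```

In this idealized case a one-month lead alone lowers the zero-lag correlation, while annual totals are unaffected by the shift, which is consistent with timing errors mattering less at the annual scale.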
In all four methods, bias is least likely to be the dominant cause of low KGE at the monthly scale, and in three of the four methods at the annual scale (i.e., bias is the major contributor at 42% of sites for NOAH-HRR). As noted previously, the remote sensing methods are calibrated to the gauges, which explains why bias is not a significant contributor. For the GLDAS models, the general water balance framework appears to be suited for average discharges.
In this discussion, results from the major watersheds are grouped together with a focus on differences between discharge estimation methods. First, the TWSA- and altimetry-derived discharges tend to track the monthly patterns in observed discharges reasonably well. However, at the annual scale, altimetry-derived discharges are limited due to gaps in the data. The most noticeable exception is the Susquehanna, especially for the altimetry-derived discharges, which are almost temporally opposite to the gauge (e.g., for most of 2020 in Fig. 5). For this river, the TWSA-derived discharges are also generally much larger than the gauge. Additional research is needed to better understand these results, as the river is not heavily regulated and is sufficiently wide for extracting water surface elevations from the altimetry observations. While there is a seasonal snowpack in the mountains, snowmelt is not a significant fraction of streamflow. One possible issue for TWSA-derived discharges could be the watershed’s proximity to the Great Lakes to the north/northwest and the Atlantic Ocean to the east/southeast, both of which could have leakage impacts on the TWSA signal. Differences between GRACE and GRACE-FO may also be an issue. For example, in Fig. 9e (2019–2022; GRACE-FO), TWSA-derived discharges are consistently overestimated in the Susquehanna. However, in Fig. 9e (2004–2016), TWSA-derived discharges track much closer to the observations, suggesting a difference in TWSA between the GRACE and GRACE-FO periods. In general, TWSA-derived annual discharges in the eastern US were often overestimated (Fig. 8a), which is similar to the findings in Mohanasundaram et al. (2021) 36, where the GRACE-PY model was shown to overestimate annual discharges in some eastern US watersheds, especially the Ohio, Susquehanna and some New England rivers.
For the Mississippi River, the results are mixed. For example, CLSM-HRR consistently overestimated discharges for the Missouri and underestimated discharges for the Ohio, with varied results for the upper Mississippi. This represents a shift in model performance from west to east across the watershed, which is consistent with results presented in Tijerina et al., 2021. The underestimation pattern for the Ohio River was also documented by Zaitchik et al., 2008 for runoff from CLSM after data assimilation with GRACE in the Ohio Basin compared to CLSM open loop, which is similar to the application of CLSM (GLDAS 2.2) used in this study. That study pointed out that the data assimilation possibly degraded hydrologic fluxes despite improving the simulation of hydrologic states. Additional research is needed to better understand differences in model forcings and impacts resulting from assimilating GRACE TWSA into CLSM, especially given that the differences between NOAH-HRR and CLSM-HRR are significant at all the Mississippi Basin gauges (20–40 cm/yr). Here, NOAH provides generally favorable results, with one clear exception in 2019, when the model significantly overestimates discharges at the beginning of the year (Fig. 5). This appears in the Missouri and upper Mississippi Rivers, suggesting that a large fraction of winter precipitation may have been simulated as rainfall with associated rapid runoff. The gauge data suggest a delay of several months and a slower release of runoff. Results for the Ohio were generally consistent with the gauge, and the resulting impacts of overestimating discharges in the Missouri and upper Mississippi were muted at the lower Mississippi gauge. For the Ohio River, as with the Susquehanna, TWSA-derived discharges tend to be overestimated to a larger degree in 2019–2022 but tend to agree better with observations for the GRACE period.
The GLDAS-derived discharges failed to accurately predict discharge in the upper Missouri and Columbia River Basins. This is consistent with the finding that the GLDAS models tend to perform poorly in cold regions, which may be related to challenges in capturing cryosphere hydrologic processes (e.g., snow, glacier and/or permafrost) 9. For gauges in the South, Midwest and Southwest, all methods appear to agree with the observed mean monthly discharges (Fig. 3); however, the KGEs are less than 0.32 for all the estimation methods. This could be due to the failure to predict the irregularities in the time series caused by regulation from the large number of dams in the region 69 and the semi-arid environment (e.g., < 60 cm/yr), which is especially challenging for estimating runoff (i.e., runoff generation is highly sensitive to soil moisture). The presence of reservoirs/dams in the region likely influences the relationship between discharge and TWSA, which has been previously shown by Duvvuri & Beighley, 2023b.
For the Columbia River, the differences between CLSM-HRR and NOAH-HRR are the most noticeable among the selected major rivers. CLSM-HRR captures the observed temporal pattern reasonably well, while NOAH-HRR peaks higher and earlier than the gauge, followed by a recession much faster than that of the gauge. NOAH-HRR shows a similar pattern to the TWSA-derived discharges, suggesting that NOAH-HRR is mimicking a large influx of water (i.e., increase of TWSA) and loss of water (i.e., decrease of TWSA) differently than CLSM-HRR and the gauge. This could be related to not capturing snow accumulation and melt correctly. For the other major rivers, CLSM-HRR and NOAH-HRR tend to track temporally with each other, suggesting that their forcings and runoff processes are generally capturing similar behavior. However, their magnitudes of discharge tend to be different, suggesting that their magnitudes of precipitation and/or fractions of precipitation converted to runoff are different. This could be attributed to the different sources of meteorological fields used to force the GLDAS LSMs. However, in most of the major rivers, the differences in discharge magnitude are consistent, with the exception of the Columbia River.
For the Colorado River, which has the smallest discharges (cm/month), the assessment of CLSM-HRR and NOAH-HRR is difficult, as they both significantly overestimate discharge and are nearly identical. Similar results have been observed for NOAH-MP, which incorporates additional physics, multiple soil layers and improved representations of vegetation and snow processes compared to GLDAS NOAH. Ma et al., 2017 found that NOAH-MP overestimated runoff by roughly a factor of two throughout the year in the Rio Grande and Lower Colorado watersheds. They attributed this to the challenges associated with precipitation forcings and simulating ET. In addition to possible errors in the forcings, uncertainties in the model structure and parameters were listed as possible reasons for poor performance in these regions. As noted previously, the GLDAS models tend to have seasonal peaks that occur early and are significantly higher, followed by a rapid decrease to below observed discharge magnitudes (i.e., rapid recession). These discrepancies are potentially due to incorrect snow accumulation and melt (i.e., simulated as rainfall and rapid runoff) and the lack of reservoir regulation/routing. While the altimetry-derived discharges are largely not available for the Colorado River, the TWSA-derived discharges performed reasonably well (KGE of 0.37 for monthly discharges and 8% relative bias for annual discharges).