Calibration and uncertainty analysis of integrated SWAT-MODFLOW model based on iterative ensemble smoother method for watershed scale river-aquifer interactions assessment

River-aquifer interaction is a key component of the hydrological cycle that affects water resources and quality. Recently, the application of integrated models to assess interaction has been increasing. However, calibration and uncertainty analysis of coupled models has been a challenge, especially for large-scale applications. In this study, we used PESTPP-IES, an implementation of the Gauss-Levenberg–Marquardt iterative ensemble smoother, to calibrate and quantify the uncertainty of an integrated SWAT-MODFLOW model for watershed-scale river aquifer interaction assessment. SWAT-MODFLOW combines the Soil and Water Assessment Tool (SWAT), a widely used watershed model, with a three-dimensional groundwater flow model (MODFLOW). The calibration performance of the model was evaluated, and the uncertainty in the parameters and observed ensemble, including the uncertainty in forecasting groundwater levels, was assessed. The results showed that the technique could enhance the model performance and reduce uncertainty. However, the results also revealed some limitations and biases, such as overestimating the groundwater levels in most monitoring wells. These biases were attributed to the limited availability of groundwater level in the first year of the calibration and the uncertainty in groundwater flow model parameters. The river-aquifer interactions analysis shows that water exchange occurs in almost all cells along the river, with most of the high-elevation areas receiving groundwater and flatter regions discharging water to the aquifer. The study showed that PESTPP-IES is a robust technique for watershed-scale river-aquifer modeling that can ensure model calibration and parameter uncertainty analysis. The findings of this study can be used to improve water resources management in watersheds and help decision-makers in making informed decisions.


Introduction
Surface water and groundwater are naturally unified systems that interact and influence each other's quantity and quality.However, they are often artificially separated for practical reasons, such as timescales, and analytical and computational challenges (Furman 2008;Fleckenstein et al. 2010;Wang and Chen 2021).To overcome these limitations and better capture the complex dynamics of the coupled surface-subsurface system, integrated modeling of surface water and groundwater has gained popularity over traditional approaches that ignore the connections between them.A particular focus of integrated modeling is to characterize river-aquifer interaction at the watershed scale, which is essential for water resources management and ecosystem protection (Flipo et al. 2014;Baratelli et al. 2016).
Integrated modeling of surface water and groundwater can be achieved using a variety of methods, including external coupling, iterative coupling, and full coupling (Furman 2008; Barthel and Banzhaf 2016;Wang and Chen 2021).The external coupling has limited functionality to correct or modify systems in each iteration, as computational results flow only one way.Full coupling solves both the surface and subsurface systems and their internal boundary conditions simultaneously.These models are more comprehensive but also more computationally demanding (Barthel and Banzhaf 2016).Some examples of full coupling models are ParFlow (Kollet and Maxwell 2006), HydroGeoSphere (Brunner and Simmons 2012), and OpenGeoSys (Kolditz et al. 2012).Iterative coupling solves one system, formulates interfacial boundary conditions, and solves the other system based on these conditions.Then it updates the internal boundary condition using the solution of the second system and repeats the process until convergence criteria are met.Coupled Groundwater and Surface-Water Flow (GSFLOW) (Markstrom et al. 2008) and SWAT-o (Kim et al. 2008;Guzman et al. 2015;Bailey et al. 2016) are widely used iteratively coupled models.This coupling method is less computationally intensive and has been increasingly used in regionalscale studies (Markstrom et al. 2008;Surfleet and Tullos 2013;Chung et al. 2014;Yifru et al. 2022).
The importance of considering uncertainty in model parameters and its propagation to predictions is widely acknowledged in hydrologic modeling (Hassan et al. 2008).However, integrated models typically have a large number of parameters, assumptions, and inputs, which can limit the time and resources available to explore uncertainty (Wu et al. 2014;Anderson et al. 2015).Consequently, calibration and quantification of uncertainties are challenging, particularly for regional-scale studies.Predominantly, coupled surface-subsurface environmental models, even though model development has progressed in accounting for detail processes and complexities, lack a consistent and transparent calibration framework.This is because many modern, model-independent parameter estimation tools such as PEST (Doherty 2004) rely on filling the Jacobian matrix and the computational cost of running a high dimensional coupled model can be prohibitive for automatic calibration (Moges et al. 2020b).
Process-based hydrological/hydrogeological modeling involves significant uncertainty due to the imperfectness of data and model structure, especially for integrated surfacewater groundwater interaction modeling (Wu et al. 2014).Therefore, systematic approaches for parameter estimation, sensitivity, and uncertainty analysis are needed to integrate data and models and quantify potential errors (Herrera et al. 2022).Several researchers (Chen and Oliver 2013;Crestani et al. 2013;Bocquet andSakov 2013, 2014) have proposed and used the Ensemble Kalman filter/ensemble smoother (Evensen 1994;van Leeuwen and Evensen 1996) to minimize computational costs on subsurface models.Recently, White (2018) proposed a model-independent iterative ensemble smoother (IES) and demonstrated its applicability by calibrating the MODFLOW model.Yet, real-world environmental problems seldom use this approach or they are limited to subsurface models.
The objective of this study is to perform calibration and uncertainty analysis of a coupled surface water and groundwater model using PESTPP-IES, an implementation of the Gauss-Levenberg-Marquardt IES, for watershed-scale river aquifer interactions assessment.The coupled model used in this study is SWAT-MODFLOW, which combines the Soil and Water Assessment Tool (SWAT) and the Modular Groundwater Flow Model (MODFLOW) to simulate the interactions between surface water and groundwater at the watershed scale.This study is the first to use IES to calibrate and uncertainty analyze a coupled surface-subsurface model for watershed-scale river aquifer interactions assessment.The results of this study will provide new insights.The rest of the article is organized as follows.First, an overview of calibration and uncertainty analysis of integrated models is provided.Then, the materials and methods section describes the study area, data, and model setting.Next, the results and discussion section present the calibration performance of the model, uncertainty analysis results, and spatiotemporal distribution of river-aquifer interactions.The summary and conclusion are also given following the results section.

Calibration and uncertainty analysis in integrated surface-subsurface models
Integrated environmental modeling inherently involves uncertainty stemming from imperfect data, parameters, and model structure (Wu et al. 2014;Moges et al. 2020a).Consequently, it becomes crucial to analyze how uncertainty propagates within the model to make accurate predictions (Hassan et al. 2008).However, the adoption of uncertainty analysis in coupled models remains limited, presenting challenges in its utilization as a decision-making tool, particularly in large-scale studies (Wu et al. 2014).While uncoupled, iteratively coupled, and fully coupled surface-subsurface models are generally regarded as integrated modeling approaches, iteratively coupled models are often preferred for large-scale applications (Freeze 1972;Furman 2008;Markstrom et al. 2008).This preference arises from the fact that uncoupled models lack the feedback mechanism between surface and subsurface models to mutually correct each other, while fully coupled models tend to be more complex and large systems to solve (Barthel and Banzhaf 2016).The most widely applied iteratively coupled surface-subsurface models are MODFLOW-based (Table 1).
Recently, applications of GSFLOW and SWAT-MOD-FLOW, among MODFLOW-based coupled models, have been steadily increasing (Tian et al. 2015;Barthel and Banzhaf 2016;Moges et al. 2020b).Following the revision of the modeling framework by Bailey et al. (2016), the use of SWAT-MODFLOW for a range of environmental studies has spread widely.In fact, it is now considered the most widely used integrated model for assessing the interaction between surface water and groundwater (Ntona et al. 2022).It has been applied to address various water resources issues, such as land use, land cover, groundwater abstraction scenarios, and climate change on surface water-groundwater interactions (Bailey et al. 2016;Aliyari et al. 2019;Gao et al. 2019;Yifru et al. 2022).

Integrated SWAT-MODFLOW model description
SWAT-MODFLOW integrates SWAT (Arnold et al. 1993) and MODFLOW (Harbaugh 2005) into a single executable code.MODFLOW is the U.S. Geological Survey's modular finite-difference model that simulates groundwater flow in three dimensions.MODFLOW consists of various packages that represent different boundary conditions and sources/ sinks of groundwater, such as rivers, wells, recharge, etc. SWAT-MODFLOW uses MODFLOW-NWT (Niswonger et al. 2011), which is a Newton-Raphson formulation for MODFLOW-2005 that improves the solution of unconfined groundwater-flow problems.MODFLOW applies a finite difference method to discretize the subsurface domain into cells with different hydraulic properties, such as hydraulic conductivity, porosity, and storage coefficient.By solving the ow flow equations in each cell and direction under the specified boundary conditions and stresses, MODFLOW predicts the changes in groundwater levels and fluxes (Eq.1).
where K xx , K yy , and K zz represent the principal hydraulic conductivity tensor; W represents the source or sink; S s denotes the specific storage (1/L); t is time; h represents the hydraulic head (L).
SWAT is a semi-distributed model continuous-time model developed by the U.S. Department of Agriculture, Agricultural Research Service (USDA, ARS) (Arnold et al. 1998).SWAT is physically based mainly on the soil processes, but surface runoff and dynamics in water-table and rivers are empirically based (Laurent and Ruelland 2011).The model is semi-distributed: some parameters are distributed whereas others are lumped.The basic unit for calculation in (1) The SWAT model provides two commonly employed surface runoff estimation options: the SCS curve number procedure (SCS 1972) and the Green & Ampt infiltration method (Green and Ampt 1912).Research has shown that both methods are equally accurate in simulating runoff (King et al. 1999;Ficklin and Zhang 2013;Bauwe et al. 2016).Despite both methods being equally accurate in simulating runoff, the SCS method is preferred over the Green & Ampt method due to its simplicity (Cheng et al. 2016), whereas the latter requires hourly data.SCS curve number procedure is based on the following equation (SCS 1972): The variables Q s , Rt, I o, and S represent surface runoff, rainfall depth for the day, initial abstraction, and the retention parameter all in millimeters of water, respectively.The initial abstraction includes surface storage, interception, and infiltration before runoff begins.The relationship used to approximate the retention parameter is as follows, where CN represents the curve number for the day: ET and percolation are also important water balance segments in the SWAT model framework.ET is the collective term for the processes that convert water to water vapor, such as evaporation from the plant canopy, (2)  (Swain and Wexler 1993) Branch model (Schaffranek et al. 1981) GSFLOW (Markstrom et al. 2008) Precipitation Runoff Modeling System (PRMS; Leavesley et al. 1983) Daflow-Modflow (Jobson and Harbaugh 1999) Daflow (Jobson 1989) transpiration, sublimation, and soil evaporation.The SWAT model utilizes the concept of potential evapotranspiration (PET) to calculate ET, employing three alternative methods that require varying inputs.The Penman-Monteith method (Monteith 1965) estimates PET based on solar radiation, air temperature, relative humidity, and wind speed.The Priestley-Taylor method (Priestley and Taylor 1972) requires solar radiation, air temperature, and relative humidity, while the Hargreaves method (Hargreaves and Samani 1985) solely uses air temperature as input.These methods have been extensively compared, explored, and documented (Jung et al. 2016;Aouissi et al. 2016).Percolation is calculated in each soil layer when water content exceeds the layer's field capacity and the layer below is not saturated, using a specific relationship to determine the available water volume for percolation.
The integrated SWAT-MODFLOW model replaces the SWAT groundwater module with the MODFLOW model.For example, the MODFLOW River package is used to compute river-aquifer interactions.The integrating code facilitates principal data exchange between SWAT and MODFLOW on a daily time scale, including recharge, river stage, groundwater head, and river-aquifer interaction (Fig. 1).MODFLOW computes groundwater levels and surface water-groundwater interactions, which are then passed to the SWAT model.In turn, SWAT computes ET and river stage, which are passed to MODFLOW.

Calibration and uncertainty analysis
Calibration is essential to make environmental models valuable for decision-making (Gupta et al. 1999;Renard et al. 2010;Anderson et al. 2015).The risk associated with uncertainties arising from measurement, simplifying assumptions made in representing the actual environmental system with a mathematical hypothesis, parameter variability, or lack of knowledge of the model parameters should also be evaluated (Pasetto et al. 2012;Herrera et al. 2022).Several techniques are available to assess uncertainty in hydrological and hydrogeological modeling, including Monte Carlo, Bayesian statistics, multi-objective analysis, and Generalized Likelihood Uncertainty Estimation (GLUE) (Beven 2007;Moges et al. 2020a).
Calibration and uncertainty analysis for coupled surface-subsurface modeling has not been extensively explored (Wu et al. 2014;Moges et al. 2020b).In the realm of integrated modeling, Wu et al. (2014) have utilized the efficient Probabilistic Collocation Method to perform uncertainty analysis on the GSFLOW model, while Moges et al. (2020b) have employed winding stairs and null-space Monte Carlo methods to achieve the same objectives.In most cases, coupled models are calibrated manually through trial-and-error attempts (Acero Triana et al. 2019).While SWAT and MOD-FLOW have well-defined calibration and uncertainty analysis frameworks, the integrated model lacks clear procedures for parameter estimation and uncertainty quantification.As a result, modelers follow diverse approaches.The common approaches can be broadly classified into two.The first one is calibrating and validating the two models independently and integrating them for a forward run (Bailey et al. 2016;Chunn et al. 2019;Wei et al. 2019).The second procedure is calibrating the two models independently and recalibrating for selected parameters after integrating the models (Taie  The SWAT model is commonly calibrated using either Sequential Uncertainty Fitting 2 (SUFI2) or GLUE (Arnold et al. 2012;Kumar et al. 2017;Zamani et al. 2021).On the other hand, MODFLOW calibration is performed commonly using a parameter estimation (PEST) interface (Doherty 2004).The latest version of PEST (PEST + +) offers a multitude of advanced features for calibration and uncertainty analysis.These include least-squares parameter estimation with integrated first-order and second-moment parameter and forecast uncertainty estimation, global sensitivity analysis capabilities, and ensemble smoother (White 2018;White et al. 2020).
The ensemble smoother (ES) algorithm (van Leeuwen and Evensen 1996) assimilates all available measurements or observations in a single step to update the model parameters (Chen and Oliver 2013;Emerick and Reynolds 2013).Compared to ensemble Kalman filters (EnKF) (Evensen 2003), which require simulation restart, the ES is a faster and easier-to-implement method due to its ability to avoid restarts (Emerick and Reynolds 2013).However, since ES uses a single Gauss-Newton correction to condition the ensemble to all data available, it may not provide acceptable history matching (Emerick and Reynolds 2013).Iterative methods have been developed for use with the EnKF in applications where the nonlinearity is large (Chen andOliver 2013, 2017).IES has less computational cost compared to EnKF (Li et al. 2018).But IES also requires a large number of iterations to match data relatively well.To overcome this computational constraint, Chen and Oliver (2013) formulated an IES based on the Gauss-Levenberg-Marquardt (GLM) (Hanke 1997) algorithm.More mathematical details on Gauss-Levenberg-Marquardt-based formulations can be found in Emerick andReynolds (2012, 2013), Chen and Oliver (2013), and White (2018).
The iterative nature of the IES improved ES on data matching for nonlinear problems, while the ensemble approximation to the Jacobian matrix of the GLM algorithm reduced the computational constraint induced by using large numbers of parameters (Emerick and Reynolds 2013;White 2018).And therefore, IES is an invaluable tool for mitigating the computational challenges associated with history-matching and uncertainty quantification on large-scale environmental models with high-dimensional parameter spaces (Chen and Oliver 2013;Emerick and Reynolds 2013;White 2018).
Recently, White (2018) employed Levenberg-Marquardt forms of IES (Chen and Oliver 2013) using the PEST + + interface (PESTPP-IES).PESTPP-IES is a data assimilation and uncertainty analysis approach that combines Bayesian methods with subspace methods to gain efficiencies, reducing the computational burden of applying Bayesian principles, especially with high parameter numbers.Compared to popular Bayesian sampling methodologies, such as Markov chain Monte Carlo, the numerical burden of using PESTPP-IES is much lower when the parameter numbers are moderate to high (White et al. 2020).PESTPP-IES finds an ensemble of parameter fields that are all able to adequately reflect the measured data and expert knowledge.

Data and study area description
This study focuses on a watershed within the Han River basin that covers more than 200 km 2 .The river that runs through the watershed is known as Tancheon (Fig. 2).The topography exhibits a rugged terrain with jagged ridgelines and deep valleys, resulting in a complex topography.While about 35% of the area has a relatively flat slope of less than 10%, slightly over 1% of the region has a steep slope that exceeds 50%.
The region has a humid climate with four distinct seasons, namely winter, summer, spring, and fall.Among these, summer (July-August) receives the highest levels of precipitation and humidity.On average, the relative humidity during The setup of the SWAT model requires meteorological, soil, land use/land cover, and topography information, which are obtained from various databases and agencies (Table 2).The meteorological data include daily maximum and minimum temperature, precipitation, solar radiation, relative humidity, and wind speed.The rainfall data were processed from the stations shown in Fig. 2, while other weather data were collected from Seoul and Suwon weather stations.
The data regarding soil characteristics are presented in the hydrologic soil group format, which categorizes soil based on its hydrological properties.The analysis reveals that soil in the region mainly belongs to the group 'A' or 'B' (Fig. 3).The region's principal land use/land cover types comprise four categories: forest, agriculture, urban, and water area.Forests cover the mountainous section of the watershed, while urbanization characterizes the lowerelevation areas.The region consists chiefly of artificial surfaces and forest cover, with forests occupying nearly half of the watershed.The agricultural area, however, accounts for only around 8.5% of the land use.
Precambrian geology characterizes the watershed as a diverse rock composition, encompassing metamorphic rock, unconsolidated sedimentary deposits, intrusive igneous rock, and carbonated salt rock.Groundwater serves multiple purposes, such as agricultural, domestic, and industrial use.For the model setup, 105 pumping wells with varying pumping rates and eight monitoring wells were assessed (Fig. 4).The pumping rate ranges from 0.76 to 90 m 3 /day.The hydraulic conductivity, which was obtained from the pumping test, ranges from 0.001 to 23.7 m/day.However, the available information is limited.

Model setup for the study area
SWAT and MODFLOW models are frequently developed separately to incorporate the necessary detail and boundary conditions in each model and subsequently integrated.Following a similar approach, the SWAT model was developed first  The groundwater flow model was built with a 200 × 200 m grid size and two vertical layers.On average, the thickness of the bottom layer ranges from 20 to 50 m, while the top layer typically measures between 10 to 30 m.The principal water sources and sinks were formulated using different MODFLOW packages, including river, well, general head, ET, and recharge packages (Fig. 4).The cells around the outlet of the watershed are represented using a general head boundary.The SWAT model river network was used to map the MODFLOW river cells.
Establishing the initial head is a crucial step in building a MODFLOW model because it determines the starting point for groundwater flow simulations.Usually, this value is interpolated from well-distributed monitoring wells in the study area.However, in this study, the limited number and uneven distribution of monitoring wells across the complex topography hinder accurate interpolation of the initial head.Therefore, we have used the average depth to the groundwater table to estimate the initial head, based on the elevation of the topmost aquifer layer, and made a moderate modification around the mountainous areas from the initial model checkup simulations.The top layer elevation is derived from the 30-m resolution digital elevation model (DEM).

Integrated SWAT-MODFLOW model calibration and uncertainty analysis
The process of calibration and uncertainty analysis was carried out using PESTPP-IES (White 2018;White et al. 2020) with a Python framework for Environmental Modeling Uncertainty analyses (pyEMU) (White et al. 2016) interface.The calibration was performed using both river flow data and groundwater level data.The MODFLOW model parameter calibration was based on pilot points (Fig. 4).Mainly specific yield, specific storage, and hydraulic conductivity were included in the calibration process.Calibration of SWAT model parameters was also carried out, which mainly comprised groundwater, soil, management, and HRU (Table 3).In most cases, the parameters were analyzed as a group, namely SWAT, hk (representing horizontal hydraulic conductivity), sy (representing specific yield), and ss (representing specific storage).An ensemble of 150 realizations was used.Figure 5 displays the prior parameter distribution for these groups.The calibration was performed from 2017 to 2018 daily.
PESTPP-IES minimizes an objective function (Φ) to find the best-fit parameter distribution.It generates ensembles of simulated values and residuals, so the user must select one or more parameter ensembles that produce the best or equally good fit.This can be done by selecting a single ensemble with the lowest Φ value or a group of ensembles that reflect uncertainty, with a Φ threshold used to select equally good fits (White et al. 2020).To evaluate the model's performance, a combination of statistical analysis and graphical comparison was conducted between the simulated and measured data.For the river flow simulation, the Nash-Sutcliffe Efficiency (NSE) (Nash and Sutcliffe 1970) and Percent Bias (PBIAS) (Sorooshian et al. 1993) were used as evaluation metrics (Eqs.5 and 6).However, due to the limited availability of measured groundwater level data, the Root Mean Square Error (RMSE) was used to evaluate the performance of the groundwater level simulation.The recorded groundwater level data is available from May 18th, 2017 to September 30 th , 2018, and the RMSE is a widely-used statistical measure for assessing observed and simulated groundwater levels.
In Eqs. 5 and 6, Q i and S i are the observed and computed data for the i th day, n is the length of data considered, and Q is the mean of observed data, respectively.

Spatiotemporal distribution of river-aquifer interactions
The advantage of using the integrated SWAT-MODFLOW model over the SWAT model is in estimating river-aquifer interaction.Since the SWAT model uses simplified groundwater and river seepage equations, the integrated SWAT-MODFLOW model replaces them with the MODFLOW river package (Eq.7).
where Q rivaq is volumetric water flux between river and aqui- fer (L 3 /T); K b is hydraulic conductivity of riverbed (L/T); L riv is river length (L); P riv denotes river-wetted perimeter (L); h riv andh gw are river stage and groundwater head (L) respectively; M is riverbed thickness (L).
Based on the spatial scale the river-aquifer interface can be classified into five categories: local, intermediate, watershed, regional, and continental (Flipo et al. 2014).These categories range from 10 cm to 10 m, 10 m to 1 km, 10 km 2 to 1000 km 2 , 10,000 km 2 to 1 million km 2 , and more than 10 million km 2 , respectively.Accordingly, we have used the watershed scale to describe the river-aquifer interaction modeling in this study.Following model calibration, this study simulates and processes the spatiotemporal distribution of river-aquifer interaction at a watershed scale from 2015 to 2018.

Model calibration performance
Figures 6 and 7 compare the SWAT-MODFLOW model simulated and measured data for river discharge and groundwater levels, respectively.The comparison of river discharge shows a good agreement between the simulated (7) and observed patterns, as evidenced by an NSE value of 0.85 and a PBIAS value of -6.89.These values indicate that the model can capture the temporal variability and magnitude of the river flow reasonably well.And according to the widely used guidelines (Moriasi et al. 2007), these statistical indices are generally within the "very good" range of river flow simulation performance evaluation criteria.However, the PBIAS value reveals that the model has a slight tendency to overestimate the flow, especially during low-flow periods.This bias is also reflected in the comparison of groundwater levels, where the model generally overestimates the observed heads in most monitoring wells throughout the simulation period.The overestimation is more pronounced in the first year than in the second year.
The model was able to estimate the groundwater level with a maximum RMSE of 1.34 m (Fig. 7).However, the comparison also reveals some temporal and spatial variations in the model performance.Temporally, the model shows a better agreement with the observed groundwater levels in 2018 than in 2017.This is because the available groundwater level data only started in May 2017, which limited the calibration of the model for the earlier period.Therefore, the model relied mainly on the river flow data to adjust the initial head condition, which may not capture the true groundwater dynamics.As a result, the simulated and   7 Plot showing simulated and measured depth to the groundwater table at monitoring wells used for model calibration observed groundwater levels in have large discrepancies, especially in the first few months.In contrast, the data in 2018 was complete, which enabled the model to calibrate the flux and groundwater level.Spatially, the model performs better in simulating the groundwater level in the lower reaches of the watershed than in the upper reaches.This may be related to the topographic nature and boundary conditions.Monitoring wells at the lower reaches are more at flatter areas.Moreover, the river flow at the lower reaches could contribute to the stability of groundwater level fluctuation.
In general, several factors may contribute to the integrated SWAT-MODFLOW model overestimation.One possible factor is the initial head condition used in the MODFLOW model, which was prepared from a limited number of observations.This interpolation may introduce some errors or uncertainties in the initial head distribution, especially in areas with highly undulating topography.Another possible factor is the rainfall pattern during the simulation period.
The first year has a significantly higher number of rainy days than the second year, which may result in an excess of water available for runoff generation and recharge to the aquifer.
The model performance in this study in capturing both river flow and groundwater level fluctuation is reasonable, compared to previous large-scale SWAT-MODFLOW model applications (Bailey et al. 2016;Aliyari et al. 2019;Taie Semiromi and Koch 2019).For example, Bailey et al. (2016) and Aliyari et al. (2019) found NSE values ranging from 0.22 to 0.83, while Taie Semiromi and Koch (2019) found a maximum NSE value of 0.67.Overall, the model performance in this study is comparable to or better than previous studies, suggesting that it is a reliable tool for simulating watershed-scale water resources.

Parameter and calibration data uncertainty
Figure 8 shows the mean and sigma (the standard deviation) changes of the prior and posterior parameter The sigma change represents the difference between the prior and posterior standard deviations of each parameter group, normalized by the prior standard deviation.These changes reflect how the parameter values and uncertainties are updated by the calibration process using the observed data.Most of the parameter groups show an increased sigma percentage, particularly for the MODFLOW model parameters.This indicates that the calibration process introduced more uncertainty to these parameters, possibly due to the limited number of observations.On the other hand, the SWAT model parameter mean changes are relatively small, suggesting that the prior parameter values were already close to the optimal values.
The uncertainty associated with the calibration data is presented in Figs. 9 and 10.The river flow plot (Fig. 9) and the groundwater level plot (Fig. 10) show the posterior and prior ensemble with observed data.The plots show that the posterior ensemble enclosed the observed ensemble, indicating that the calibration captured the river flow and groundwater level characteristics and eliminated unrealistic model outcomes.The plots also show that the posterior ensemble has a high overlap with the true value and a low overlap with the prior ensemble, indicating that the calibration aligned the model predictions with the data observations and distinguished them from the prior model predictions.
Although the posterior ensemble plot for groundwater level is narrower than the prior ensemble and closer to the observed data, the calibration process did not significantly improve the model fit or reduce the model uncertainty compared to the river flow.For example, the plot for the monitoring well of g_5926 demonstrates that the prior ensemble has a wide range from -20 m to 60 m, which is far from the observed data.Conversely, the posterior ensemble has a narrower range and is closer to the observed data.For watershed-scale integrated modeling with highly undulating topography and limited hydrogeological data, this result could be considered acceptable, particularly for regionalscale surface water-groundwater interaction modeling.

Forecast uncertainties
The forecast showed significant improvement in uncertainty reduction.In both monitoring wells (used as forecasting points), the posterior ensemble enclosed the true value and reduced the uncertainty (Fig. 11).This indicates that the calibration process improved the model predictions and reduced the model uncertainty.

River-aquifer interactions
The average groundwater depth ranges from 10 to 468 m (Fig. 12).An analysis of the interaction between the river and aquifer indicates that water exchange occurs in almost all cells along the river.Furthermore, the nature of the interaction largely follows the topography of the region in most areas.In the high-elevation areas, the river cells predominantly receive groundwater, while in the relatively flatter regions, the river tends to discharge water to the aquifer.From 2015 to 2018, the groundwater discharge into the river in the watershed averaged 521,809 m 3 /day, with a maximum of 18,553 m 3 /day per cell.In contrast, the seepage of water from the river to the aquifer amounted to 42,941 m 3 /day, with a maximum of 2,383 m 3 /day per cell.The daily interaction between rivers and aquifers at the watershed scale, as well as percolation, reflects the precipitation patterns in the area.While water seepage from the aquifer to the river occurs daily, percolation seepage from the river to the aquifer follows the precipitation pattern (Fig. 13).Although the study does not focus on evaluating water use scenarios, the results suggest that the impact of water use on the region's water resources is not significant.Although four years of data and simulation can provide some insight into watershed-scale water balance segments, they may not be entirely adequate.For instance, the considerable differences in the magnitude and duration of rainfall during the rainy seasons of 2015 and 2017 may lead to misinterpretation of the average model results.Nonetheless, it can be concluded that the model performed well in capturing the sources and sinks during the simulation period.

Summary and conclusions
This article applies PESTPP-IES, an iterative ensemble smoother, to calibrate and analyze the uncertainty of the integrated SWAT-MODFLOW model for watershed scale river-aquifer interaction modeling.PESTPP-IES is an iterative ensemble smoother implementation of the Gauss-Levenberg-Marquardt algorithm that can efficiently explore the parameter space and reduce the computational burden of calibration.The study presents a case study of a watershed in Korea, where the integrated SWAT-MODFLOW model was calibrated using river flow and groundwater level data.The model calibration performance and the uncertainties, including parameter and forecast uncertainties, were assessed.The comparison of the simulated and measured data showed that the model could reproduce the temporal variability and magnitude of the river flow reasonably well.However, the model generally overestimated the observed heads in most monitoring wells throughout the simulation period, with a maximum RMSE of 1.34 m.The model performed better in simulating the groundwater level in the lower reaches of the watershed than in the upper reaches.Nevertheless, the forecast demonstrated a significant reduction in uncertainty.The calibration process enhanced the model performance and decreased the model uncertainty, especially for the river flow.The uncertainty associated with the calibration data is relatively high due to the limited hydrogeological data and highly undulating topography.However, the calibrated model can provide acceptable results for watershed-scale surface water-groundwater interaction modeling.
The study finds that water exchange between the river and aquifer occurs in almost all cells along the river, with the nature of the interaction largely following the topography of the region.In high-elevation areas, the river cells predominantly receive groundwater, while in flatter regions, the river tends to discharge water to the aquifer.The daily interaction between rivers and aquifers reflects the precipitation patterns in the area, with water seepage from the aquifer to the river occurring daily and percolation seepage from the river to the aquifer following the precipitation pattern.The study also shows that the impact of water use on the region's water resources is not significant.The findings suggest that the integrated model can capture the sources and sinks of water during the simulation period, although the study acknowledges that the four-year data and simulation may not be entirely adequate to fully understand watershedscale water balance segments.
The findings of this study can be used to improve water resources management in river basins and help decisionmakers in making informed decisions.The study has some limitations that should be acknowledged.First, the data scarcity, especially for the groundwater level data in 2017, affected the initial head condition and the model calibration performance.Second, the river conductance was not calibrated, which may introduce some uncertainty in the river-aquifer exchange fluxes.Overall, the study offers valuable insights into the application of iterative ensemble smoother for integrated model calibration and uncertainty analysis for river-aquifer interactions modeling, providing a comprehensive understanding of the complex nature of river-aquifer interactions and their significance for water resources management.

Fig. 1
Fig. 1 Simplified diagram showing principal data flow in the coupled SWAT-MODFLOW model

Fig. 2
Fig. 2 Study region description, featuring location, topography, river network, and rainfall and river flow gauging stations

Fig. 3
Fig. 3 Land use/land cover (a) and hydrologic soil group (b) in the study area.The white dot labeled Seongnam represents the center of the city in the watershed

Fig. 4
Fig. 4 MODFLOW model boundary conditions (a) and simplified hydrogeology (b) of the study area: (a) shows monitoring wells and forecast points in black and red labeled with the corresponding grid ID

Fig. 6
Fig. 6 Comparison of the SWAT-MODFLOW model-simulated and observed river flow at the outlet of the study watershed

Fig.
Fig. 7 Plot showing simulated and measured depth to the groundwater table at monitoring wells used for model calibration

Fig. 8
Fig. 8 Parameter prior and posterior ensemble changes: Mean and sigma percentage changes plot of calibrated parameter groups

Fig. 13
Fig. 13 Watershed daily average river-aquifer interactions and percolation from 2015 to 2018