An innovative framework for real-time monitoring of pollutant point sources in river networks

Simultaneous identification of the location and release history of pollutant sources in river networks is an ill-posed and complicated problem, particularly in the case of multiple sources with time-varying release patterns. This study presents an innovative method for solving this problem using minimum observational data. To do so, a procedure is proposed in which, the number and the suspected reaches to the existence of pollutant sources are determined. This is done by defining two different types of monitoring stations with an adaptive arrangement in addition to real-time data collection and reliable flow and transport mathematical models. In the next step, the sources’ location and their release history are identified by solving the inverse source problem employing a geostatistical approach. Different scenarios are discussed for different conditions of number, release history and location of pollutant sources in the river network. Results indicated the capability of the proposed method in identifying the characteristics of the sources in complicated cases. Hence, it can be effectively used for the comprehensive monitoring of river networks for different purposes.


Introduction
Water resources are essential to keep life on the Earth, where these limited and valuable resources are increasingly under different threats. Due to their close routes to urban areas and extensive usage in industrial and agricultural activities, rivers are continuously exposed to accidental or intentional pollution spills. In recent years, great attention has been drawn to understanding and simulating the fate and transport of pollutants and also to identify the pollutant sources' characteristics. Recovering the release history of pollutant sources is essential in planning effective remediation strategies. Moreover, determining the number and location of pollutant sources help in identifying responsible parties for the occurred pollution and assigning remediation measure expenses among those parties (Skaggs and Kabala 1994;Liu and Ball 1999;Atmadja and Bagtzoglou 2001;Michalak 2002).
The pollution source identification problem is categorized as an inverse problem, given that the source characteristics must be recovered using limited concentration data collected at some observation points. Like most of the inverse problems, the inverse source problem does not fulfill the well-posedness criteria of Hadamard (1923). Based on Hadamard's definition, a problem is well-posed if its solution is existent, unique, and stable. A problem that lacks any of these features is called an ill-posed problem. However, since the observed pollution plume always must be originated from somewhere upstream, the inverse source problem always has a solution, therefore nonexistence would not be an issue to be raised. Hence, there are two main challenges in solving an inverse source problem, namely nonuniqueness and instability of the solution. The nonuniqueness means that different combinations of upstream release histories could lead to a single downstream concentration time series. To address the nonuniqueness issue, researchers often assume that some a priori information about the unknown source is available. The instability issue implies that small errors in the inputs of the inverse model may cause large errors in the outputs. It is mainly due to the irreversibility of the dispersion phenomena, which gradually smooth the pollution plume and decrease the amount of obtainable information from observational data (Skaggs and Kabala 1998). Hence considering uncertainties in the observed data associated with the measurement errors and the sparsity of data will increase the reliability of the inverse model.
In the last 30 years, various approaches have been proposed to solve pollution source identification problems in surface and groundwater. These approaches can be broadly categorized into three classes: optimization-based approaches, stochastic-based approaches, and mathematics-based approaches. Some reviews can be found in Morrison (2000a, b), Neupauer et al. (2000), Atmadja and Bagtzoglou (2001), Michalak and Kitanidis (2004b), Gómez-Hernández and Xu (2021), and Barati Moghaddam et al. (2021). In recent years, stochastic-based methods have been used frequently in solving the inverse source problem. The most significant feature of stochastic-based approaches is treating the unknown pollution source parameters as random variables and then use of the probability distribution functions for predicting them. This feature increases flexibility in the estimation of source characteristics and also takes into account the uncertainty of identification results due to errors in the observational data (Woodbury et al. 1998). Bagtzouglou (2003) introduced Reversible-Time Particle Tracking Method (RTPTM) which can identify the most probable spill location for both conservative and reactive solutes in spatially-varied flow fields. Later, Bagtzoglou and Ababou (2006) extended the RTPTM and presented Reverse Anti-diffusive Walk (RAW) method to address the issue of numerical instability associated with the solution of transport equation in negative time steps. Ababou et al. (2010) generalized the RAW method for more practical cases. Wang and Zabaras (2006) solved the one-dimensional backward advection-dispersion equation using the hierarchical Bayesian method and recovered the release history of a contaminant source in both homogeneous and heterogeneous porous media. Then a similar framework was used by Hazart et al. (2007) to estimate the location, release time, and amplitude of an instantaneous point source. Later, Hazart et al. (2014) extended their previous work to cover more realistic cases for point sources with more general temporal release patterns. Zeng et al. (2012) used an adaptive sparse grid-based Bayesian method to identify the pollutant sources with unknown parameters. Later, Zhang et al. (2015Zhang et al. ( , 2016 extended their previous work to design an optimal monitoring network for retrieving unknown contaminant source parameters in groundwater. Xu and Gómez-Hernández (2016) used the Ensemble Kalman filter (EnKF) to solve the inverse source problem in a two-dimensional synthetic aquifer. Later, they extended the method for simultaneous identification of aquifer parameters and contaminant source characteristics . Chen et al. (2018) assessed the performance of the EnKF method which had been successfully employed in some hypothetical cases by Gómez-Hernández 2016 and to a more realistic case based on the sandbox experiment to recover both aquifer geometry and contaminant source characteristics. The results indicated a good performance of the EnKF method for obtaining a reliable and accurate identification process and also provides an estimation uncertainty. Zhang and Huang (2017) developed a fully sequential identification framework based on the EnKF method for the reconstruction of pollution source characteristics in rivers, which also considers the uncertainties in hydrological parameters. Recently, Hwang et al. (2020) improved the Backward Probability Method (BPM) by introducing an innovative method for combining multiple probability density functions for identifying multiple contaminant sources under complex hydrogeological conditions.
One of the stochastic-based methods which is extensively used in solving the inverse source problem in groundwater is the Geostatistical (GS) method. The main assumption behind the GS method is that the unknown source function is a random variable with a known correlation structure but unknown correlation structural parameters. The optimal values of these structural parameters are obtained using the geostatistical inversion theory presented by Kitanidis (1996). According to this, the source function recovered by minimizing a likelihood function while retaining the assumed correlation structure. More details can be found in Kitanidis (1995Kitanidis ( , 1996 and Snodgrass and Kitanidis (1997). The GS approach has been widely tested and improved in groundwater source identification through hypothetical cases (Snodgrass and Kitanidis 1997;Michalak andKitanidis 2003, 2004b;Butera et al. 2013) and using field data (Michalak andKitanidis 2002, 2004a;Gzyl et al. 2014). Snodgrass and Kitanidis (1997) applied the GS method for estimating the release history of a conservative solute in a one-dimensional homogeneous aquifer. They were the first researchers who combined GS techniques with the Bayesian theory instead of using the usual iterative framework to obtain the best estimation of parameters. Michalak and Kitanidis (2002) applied the method given by Snodgrass and Kitanidis (1997) for the reconstruction of the contaminant release history for a three-dimensional plume at Gloucester landfill site in Ontario, Canada. Michalak and Kitanidis (2004b) combined the adjoint and the GS methods to reduce the computational cost and provide the possibility to use the approach in heterogeneous fields. This combination allowed them to employ commercial groundwater flow and transport codes in the framework of the proposed inverse method. Butera et al. (2013) proposed a framework based on GS for simultaneous identification of the location and release history of a single pollutant source in a two-dimensional confined aquifer with a strongly non-uniform flow field. Gzyl et al. (2014) presented a multi-step method based on performing an integral pumping test and GS approach to identify the location and release history of a pollutant source in groundwater. The results of applying this method to a complicated contamination case indicated that it can successfully detect the suspected areas. However, the proposed methods by Butera et al. (2013) and Gzyl et al. (2014) need a priori information for approximating the location of pollutant source at the beginning of a simulation, which could be a challenge in practical applications. The GS approach has been applied only once in pollution source identification in single-branched rivers considering the effects of transient storage zone as well as linear decay processes (Boano et al. 2005).
Compared to numerous studies on pollutant source identification in groundwater, only relatively few studies on solving an inverse source problem in surface waters can be found in the literature (El Badia and Hamdi 2007;Hamdi 2009Hamdi , 2016Andrle and El Badia 2012;Cheng and Jia 2010;Mazaheri et al. 2015;Yang et al. 2016;Wang et al. 2018;Amiri et al. 2019). It is worth mentioning that the pollutant transport in rivers is often advection-dominated, so pollutant substance is transported faster and further. Thus it may lead to partial capturing of the pollution plume at the observation points. On the other hand, on-time and accurate identification of illegal spills in rivers have great importance in providing scientific support for planning mitigation and adaptation strategies. Furthermore, the research about pollution source identification problems in surface waters was mainly limited to the single-branch rivers and rarely involved river networks. This is mainly due to the complexity in hydrodynamics and pollution transport of such systems along with the inherent illposedness of corresponding source identification problems. Since tributaries in a river network usually are less monitored, those areas might be considered as potential places for illegal discharge of pollutants. Therefore, to prevent further damage, it is necessary to pay more attention to identifying pollutant sources in river networks.
Focusing on pollution source identification in river networks, Telci and Aral (2011) used an adaptive sequential feature selection algorithm (Jiang 2008), to determine the location of a single instantaneous source among several candidate locations. However, their proposed method requires a significant simulation time for training monitoring stations with a large number of spill scenarios. Ghane et al. (2016) applied the backward probability method to identify the source location and the released time of a single spill in a river system. Lee et al. (2018) dealt with the problem of identifying the location of a single instantaneous source via analyzing changes in concentration levels observed by a sensor network in a river system. Using the random forest models, they determined the possibility that each candidate location is correct or not. However, all of the mentioned studies considered a single pollutant source with a simple form of release while in many practical applications there are more than one active source and the release functions may be complicated.
Apart from the issue of limited studies on pollution source identification in rivers, most previous studies considered the location of the pollutant source as a known a priori information. This assumption may not be appropriate with real-world conditions, since in most cases the location of the pollution source is also unknown as its release history. Introducing the source location as an unknown will have a significant effect on the source identification process and increase the complexity. In other words, different potential source location combinations may result in significantly different solutions. Moreover, the simultaneous identification of location and source release history is a very complicated ill-posed problem, particularly in the case of multiple unknown pollution sources with time-varying release patterns. The main motivation behind this study is to provide an innovative method for simultaneous identification of the number, locations, and release histories of multiple point sources in a river network using minimum observations. The proposed method includes two main steps: 1. Determining the number and suspected reaches to the presence of pollutant sources through the placement of observation points in a specific manner 2. Identification of the most probable location and release history of the sources by solving the inverse source problem using a geostatistical approach.
The method is effective and easy to use in complex river networks as well as single-branch ones. Moreover, it provides the possibility of simultaneous identification of all active pollutant sources. The required computational cost is significantly lower than the common iterative methods such as the simulation-optimization approach.

Governing equations and statement of the problem
The main governing equation of solute transport in surface waters is the advection-dispersion equation (ADE) (Taylor 1954), which is a parabolic partial differential equation derived from a combination of continuity equation and Fick's first law. The one-dimensional ADE equation is as follows (Fischer et al. 1979): where A is the flow area, C is the solute concentration, Q is the volumetric flow rate, D is the longitudinal dispersion coefficient, k is the first-order decay coefficient, m is the number of pollutant point sources, f i ðtÞ is correspondent release history of ith pollutant source, dðxÞ is the Dirac delta function, x i is the ith point source release location, t and x are the time and distance, respectively. It also should be noted that the hydrodynamic parameters (i.e., A, Q, D) in Eq. (1) are obtained from the hydrodynamics model, which is based on the well-established Saint-Venant equations (Wu 2007): in which g is the gravitational acceleration and z s and S f are the water level and the energy slope, respectively.
The general expression of the considered problem is that there are ''m'' pollutant point sources S 1 ; S 2 ; . . .; S m in a river network, in such a way that the number, locations x 1 ; x 2 ; ::::::::; x m ð Þ , and intensity functions f 1 t ð Þ; f 2 t ð Þ; :::::::; f m t ð Þ ð Þ of those ''m'' sources are unknown. The main objectives are to present a methodology for simultaneous identification of the characteristics of the sources (i.e., their number, locations, and intensity functions) and obtaining a unique solution for the considered inverse source problem with a minimum measured concentration data at observation points. The proposed method consists of two main steps. It starts with determining a spatial range in which the source of pollution is likely to be present. Then the location and release history of pollution sources are recovered using the geostatistical approach; that simultaneously considers all the possible candidates. The method is efficient and easy to use in complex river networks and single-branch ones as well. Moreover, since in each simulation all active pollutant sources are identified, the required computational cost is significantly lower than the common iterative methods. More details are given in the following sections.

Step1: Determination of the number and approximate location of the pollution sources
To determine the approximate location of the potential pollutant sources, some observation points are considered with a specific arrangement. Data collection at these observation points is managed based on some predefined rules. To provide the concentration data and proceed with the identification process, two types of observation points are defined, main stations M 1 ; M 2 ; :::::::::::; M n ð Þ and secondary stations P 1 ; P 2 ; :::::::::::; P k ð Þ (Fig. 1). The main stations measure concentration-time data continuously, and the secondary ones collect data occasionally and ondemand. The arrangement and distances of the main and secondary stations are based on some predefined criteria. These criteria related to the accuracy of identification from the viewpoint of source activity time retrieval and spatial range for pollution source localization. The main stations are arranged in such a way that the pollution plume travel time between two successive main stations always is less than or equal to the expected activity time of the sources. The travel time between successive main stations for each branch of the river network can be calculated using the following equation (Chapra 2008): where T is the travel time, C i is the concentration at time t i . The secondary stations are arranged in such a way that the distance between two successive stations is equal to or less than the expected accuracy for the spatial range of pollution source localization. The expected accuracy for the spatial range of pollution source localization depends on the sought problem, and even in a river network may vary from one tributary to another (see Sect. 3). This arrangement of monitoring stations makes it possible to identify active pollutant sources with minimum measurement data.
The key step in this study is the comparison of the observed concentration-time data and the corresponding simulated results at the main stations. Any difference between these sets could be a sign of the existence of a pollutant source upstream of that station. The simulated results are taken from integrated flow and transport models, which solve Eqs. (1)-(3) in the river network for a case of no active pollutant source. Numerical methods that are used to solve the flow and transport equations are the implicit Preissmann method for the flow model (Wu 2007) and the method of lines for the transport model (Pregla and Pascher 1989;Skeel and Berzins 1990;Schiesser 2012). It is emphasized that these simulations are real-time and continuously running, and the outputs are used in solving the inverse source problem by the proposed algorithm.
Once the main station detects a difference between the observed data and the simulated concentration results, all secondary stations located between that main station and the first upstream one start to measure one concentration data record at the time of difference detection. Depending on the type of communication setup of the network monitoring system, the referred command will be sent from a control center or directly from the main station that detected the difference. The first secondary station from upstream that shows a difference between observed data and simulated results guides us to the approximate location of the pollutant source. In other words, the pollutant source is probably located in the reach between that secondary station and the first secondary station at its upstream ( Fig. 2). Once the approximate location of the source is determined, the following steps take place: 1. The secondary station which detected the difference and its upstream one should start continuous data collecting to assure that other active sources at the upstream and/or downstream of that suspected reach is captured as well, 2. The source location must be determined more accurately, to continue the identification process and recover the release history of the detected source, 3. The forward simulation model should be revised to include the characteristics of the identified source.
It also should be noted that the continuous data collecting at the secondary stations framing the source location, will be stopped after the full passage of the pollution plume from the downstream secondary station. The identification process of the approximate location of a case with multiple pollutant sources is quite similar to what was described for the case with one active source (Fig. 3).
The main tool in using the proposed algorithm is an online monitoring system. Distributed and real-time monitoring networks are equipped with sensors that continuously measure specified water quality parameters. The recorded data are instantly sent to a control center through wireless communication systems (Zaev et al. 2016;Adu-Manu et al. 2017;Salim et al. 2017;Demetillo et al. 2019). The online monitoring provides an early warning system that assists the operators to detect any abnormality caused by a contamination source releases as soon as possible (Di Nardoa et al. 2015;Capodaglio et al. 2016). Observational data collected from these systems are used for pollutant source identification. Nowadays, real-time water quality monitoring systems are employed in many rivers around the world. For instance, real-time monitoring networks along the Rhine River or the Ohio River can be named (Grayman et al. 2000;Diehl et al. 2006).

2.3
Step 2: Recovering the characteristics of pollutant sources using a geostatistical method Once the number and suspected reaches are identified, the most probable location and approximate release history of pollution sources should be determined in the next step. Hence, at first, the suspected reaches are divided into some sub-reaches and the potential location of pollutant sources are considered at the center of those sub-reaches. Then, by solving an inverse source problem, the appropriate location of the pollutant sources is determined as a location that the highest contaminant release history is obtained. To solve the inverse source problem, a Geostatistical method has been used in this study. Regarding the linearity of Eq. (1), the solution of this equation subject to the corresponding initial and boundary conditions (i.e., (Skaggs and Kabala 1994): where K x; t À s ð Þ is the transfer function (TF), which describes the effect in time at a certain location x by a unit impulse source which is released at x 0 and time s. If M observational data are available and the time domain is discretized into N points, a general expression for the relation between the observations and the source values can be written as follows: where z is the M Â 1 ½ random vector of the observations, f is the N Â 1 ½ random vector of the discretized release history, h is the model function and v is the M Â 1 ½ random vector that represents the measurement errors. The error vector v is Gaussian with a zero mean and a covari- It also should be noted that N ) M, which means that there are more unknowns than measurements. Regarding the linearity of the ADE and by comparing Eqs. (6) and (5), it can be concluded that the function h f ð Þ is linear and therefore Eq. (6) can be rewritten as follows: where H is the M Â N ½ matrix known as the transfer matrix and is defined: where Ds is the time step and t i and s j are observation and release time, respectively. The elements of the transfer matrix represent the effect of a release at s j on observation data z i at which collected at t i . As expressed by Eq. (8), to construct the H matrix, it is necessary to calculate TFs at different times. TFs describe the response of the system to a unit impulse injection. Therefore, to calculate them, the ADE needs to be solved for a unit release pulse at the source location for different times. In the case of simple problems with a steady flow, regular cross-sections and constant parameters, TFs can be determined using analytical solutions. Due to the complex conditions considered in this study, the TFs have been computed using the finite volume method. To compute the H i;j elements, several runs of the corresponding numerical In the case of unsteady flow, the numerical model is executed for all the s j times in which the release function needs to be estimated. It is noted that the unit pulse is modeled using the Dirac delta function, d s j À Á , at the potential source location and the breakthrough curves at observation points are estimated as: Equation (7) is an ill-posed system that cannot be solved using methods. To overcome this issue, it is assumed that f has a normal distribution with the mean and the covariance as follows: where X is the N Â 1 ½ unit vector, b is the unknown mean, h is the vector of unknown structural parameters of the covariance function and Q is the covariance matrix of the release function. In this study, a Gaussian covariance matrix has been used as follows: where h T ¼ r 2 ; I f Â Ã are structural parameters. The reconstruction of the pollutant source release history in the geostatistical method consists of two steps. In the first step, the structural parameters of the covariance function h are determined and in the second one, the pollution source release function (f) is estimated using the Kriging method. Structural parameters are determined by minimizing the following objective function (Snodgrass and Kitanidis 1997): where: It is noted that this minimization problem is well-posed since the number of observation z is greater than the number of structural parameters h. In Eqs. (14) and (15), R is the covariance matrix of error in the observational data (v). The value of the unknown mean b is not relevant as it does not appear in Eqs. (13)-(15). The b coefficients are eliminated from Eq. (13) by averaging over all possible values of the coefficient (Hoeksema and Kitanidis 1985;Kitanidis 1995).
Once the structural parameters h are calculated, the release function is estimated through a Kriging system (De Marsily 1986): Equation (16) is a linear estimator. It is unbiased and also minimizes the estimate error variance (Boano et al. 2005;Butera et al. 2013), in other words: K is the N Â M ½ matrix of Kriging weights obtained by solving the following system: where M is the 1 Â N ½ matrix of Lagrange multipliers (De Marsily 1986). The mean of the release history is then estimated by Eq. (16), while its covariance matrix V is evaluated as: Using Eq. (20) the confidence interval of 95% is also determined so that for every time t i , the confidence interval is calculated asf Although the GS method is considered a practical and efficient method, in some cases it generates some nonphysical results. Usually, this problem is tackled by introducing additional constraints to the unknown variable (Box and Cox 1964;Snodgrass and Kitanidis 1997;Michalak andKitanidis 2003, 2004a). These constraints are imposed using a power transformation of the unknown variables. The new unknown function is written as follows: where a is the small positive parameter, the value which is chosen in a way that ensures the validity of inequalitỹ f [ À a. Kitanidis and Shen (1996) presented a method for choosing the optimal value of parameter a. Then the original variable f in Eq. (6) is substituted by the transformed variablef: where: Since the model is no longer linear with respect to the transformed variablef, the solution must be evaluated using successive iterations. More details can be found in Kitanidis (1995).
The method can easily be extended to the case of m multiple independent point sources, located at x ¼ x 1 ; x 2 ; . . .; x m ½ and p distinct measurement points located at Regarding the linearity of the ADE with respect to the concentration, it is written as: where i ¼ 1; 2; . . .p is the number of the observation points.
The matrix form of Eq. (24) is as follows: where Equation (25) is a system of equations in which, H i j , i ¼ 1; 2; . . .; p and j ¼ 1; 2; . . .; m, is the transfer matrix corresponding to the effect of the pollutant source release at x i on the measured concentration data at x j . Since the pollutant sources are independent, the covariance matrix can be expressed as the following block matrix: Other steps for solving the system (22) are similar to the solution for a single pollutant source described in the previous sections. Figure 4 represents the flowchart of the overall identification process.

Results and discussion
This section presents an application of the proposed method in identifying pollution source characteristics in a river network. For this purpose, a hypothetical river network consisting of a main stream (B1) and two tributaries (B2 and B3) under nonuniform and unsteady flow has been considered. The schematic of the hypothetical river network and the arrangement of the main and secondary stations are shown in Fig. 5. The main and secondary stations are arranged according to the criteria introduced in Sect. 2.2. First, the location of the main stations in all branches is determined by assuming the activity time of 10 h based on the corresponding travel time. Then, based on the desired spatial accuracy for retrieval of the source location, the secondary stations are placed in the intervals of 8 km, 7 km, and 9 km for the B1, B2, and B3, respectively.
A complete list of the main and secondary stations in each branch, along with their distances from the upstream and the travel time is given in Table 1. It can be seen from Table 1 that the travel time between two successive main stations is always less than or equal to the expected activity time for retrieval (10 h). It is worth mentioning that the main stations located at the beginning of all branches, namely M1, M5, and M7, in addition to usage in the identification process, are also used to apply upstream boundary conditions for the forward flow and transport models. The situation is similar for station M3 located at the end of B1. This station is used in the identification process as well as recording downstream boundary conditions of the forward flow and transport models.
After the arrangement of the main and secondary stations, the forward flow and transport models with the corresponding boundary conditions (Fig. 6) are executed. The execution of the forward model is done with and without considering the pollutant sources for obtaining the simulated results and the observed data sets. To evaluate the performance of the proposed method, three different scenarios for number, release location, and activity duration have been discussed. It also should be mantioned that in all of scenarios the first-order decay reaction was considered (with a decay coefficient k ¼ 1 Â 10 À6 ). The characteristics of these scenarios are listed in Table 2.

Scenario 1: one active source
In this scenario, it is assumed that there is only one active pollutant source at 8.75 km of the upstream end of the second branch (B2) with an activity duration of 11 h (Fig. 7). Figure 8 shows the comparison of the observed concentration data and the simulated results at the main stations. It should be noted that the initial period with zero concentration is due to the zero initial condition that was chosen for the sake of simplicity. It can be seen from Fig. 8 that the difference between the observed and the simulated data is first sensed by M6. After detecting the difference at M6, one concentration observation data record is collected at all of the secondary stations located upstream of M6 (P9 and P10). Figure 9 shows the collected data at these c Fig. 4 Flowchart of the identification process  Travel time (hr.) 9.98 10 6.12 secondary stations and their comparison with the simulated ones. As Fig. 9 shows, there is a significant difference between the observed and the simulated data only at P10. Therefore, this means that the release point is located in the reach between P9 and P10, i.e. at the range of 7 km to 14 km of the upstream end of the B2. Subsequently, secondary stations located upstream and downstream of the suspected reach should begin continuous data collecting to ensure there is no other active source. Figure 10 shows the comparison of the observed data and the simulated results at P9 and P10. It is concluded that there is no difference between the observed data and the simulated results at P9, meaning that no active source at the upstream of suspected reach during the period of activity of the discovered source exists. Comparison of these two series of data in P10 shows the difference. According to the form of the observed concentration-time curve and comparing it with the concentration-time curve at M6 (Fig. 8), it can be deduced that this difference is caused by the source that just has been discovered.
In the next step, the suspected reach to the presence of pollutant source (7 km to 14 km of the upstream end of the  B2) is divided into two sub-reaches (from 7 km to 10.5 km and 10.5 km to 14 km) and the potential locations of the pollutant source are considered at the center of each subreaches (8.75 km and 12.25 km of the upstream end of the B2). After that, using the inverse model and the spatial distribution of concentration data at all stations downstream of the suspected reach, the most probable location and approximate release function of the pollutant source are estimated. This is done after the full passage of the pollution plume from P10. Figure 11 shows the exact and recovered release functions of the identified pollutant source with 95 percent confidence interval for both potential locations. It can be seen from Fig. 11 when the potential and exact locations are the same, there is a good agreement between the exact and the recovered release functions. While for another location, the release function is trivial. When the most probable location of the source is estimated, the second step is to identify the release function more accurately. In this step, the release function is recovered using the observed concentration-time data at the first downstream main station (M6). It should be noted that the concentrations are measured every 3600 s, while the release history is retrieved every 600 s. The results are shown in Fig. 12. The error indices for both approximate recoveries using the spatial distribution of concentration data and the exact recovery using concentration-time data are given in Table 3. The indices used in evaluating the performance of the model in this study are: square of correlation coefficient (R 2 ), root mean square error (RMSE), mean absolute error (MAE) and Euclidean distance (de). The last index indicates the distance between the upper ru ð Þ and lower rl ð Þ bound of the 95% confidence interval. This index is mainly used to evaluate the uncertainty of the recovered release history based on the observation data: Figure 12 and Table 3 indicate that in both cases the proposed model has been retrieved the release history with almost the same accuracy. The slight difference in the 95% confidence interval means that there are more release history candidates consistent with the observations in the case of retrieval using the spatial distribution of concentration data. The main reason is the sparsity of spatial distribution of concentration data compared to the concentration-time data, which makes the ill-posedness issue more severe.
Once the first source is identified, the forward model is updated. After that, a comparison of the observed data and simulated results at the main stations shows no difference between these two sets (Fig. 13). This implies no other pollutant sources are expected to be active.

Scenario 2: two asynchronous active sources
In this scenario, it was assumed that during the simulation time there are two active pollutant sources in the river network. The second source starts releasing after the first pollutant source ends its activity. The location of the first source is similar to scenario 1 and the second one was located at 46 km of the upstream end of B1 (Fig. 14). After identifying the first pollutant source and updating the forward simulation model, a comparison of the observed data and simulated results at the main stations (Fig. 15) shows the difference between these two sets of data at the M3 located at 57 km of the upstream end of the B1. This indicates the existence of an active source upstream of the M3 station. Similar to the first source, a concentration data record is measured at all secondary stations located between station M3 and the first main upstream stations  Figure 16 depicts the collected data at these secondary stations and their comparison with the simulated ones. The only secondary station that recorded the difference is P5 located at the 48 km upstream end of B1. Hence, it can be concluded that the suspected reach for the second pollutant source is between P5 and the upstream secondary station (P4).
Following the same procedure taken for the first source, it is concluded that there is no difference between the observed and simulated data at P4 which means no active source at the upstream during the activity time of the discovered source (Fig. 17). Also by comparing the general form and peak of the observed concentration-time curve and the corresponding one at the M3 station (Fig. 15), it is concluded that this difference is only due to the discovered source and no other pollutant sources are expected to be active. Dividing the suspected reach into two sub-reaches with equal length and following the same procedure as previous, the most probable locations and their approximate release functions are estimated. The results are presented in Fig. 18. According to these results, it can be concluded that the second source is located at 46 km upstream end of B1.
Finally, by assuming the known source location and using concentration-time data at M3, the release function is determined more accurately. The measurements are made every 3600 s and the release history is retrieved every 600 s. The results are shown in Fig. 19. Also Table 4 shows the error indices.

Scenario 3: three active sources, with at least two simultaneously active sources
To show the capabilities of the proposed model in the case where several sources are simultaneously active, scenario 3 is designed. In this scenario, the identification of three sources where part of the activity time of two of them coincides is considered. Similar to scenario 1, the first source has been considered at 8.75 km upstream end of B2. The other two sources are considered at 10 km and 46 km of the upstream end of B1, respectively. It is also assumed  Fig. 12 The recovered release function using the most probable location of the source and the observed concentration-time data

a) Test 1
In the first case, it is assumed that the source at 46 km upstream end of B1 starts its activity 5 h earlier than the source at 10 km (Fig. 20). Once the first source is identified, similar to what was described in scenario 1, the forward model is updated for the recovered source. Then, a comparison of the observed data and simulated results at the main stations (Fig. 21), first shows a difference between these two sets of data at M3. A few hours later, while the pollution plume has not yet completely passed M3, a difference between the observed and simulated data at M2 is recognized. This implies that two sources are simultaneously active upstream of these two main stations.
To correctly identify the suspected reaches for this case, it is necessary to collect concentration data at the time of difference detection at all secondary stations between stations M3 and M2 and the first upstream main station. Figure 22a and b represent the comparison of the observed and simulated data at the sought secondary stations.
In addition to ensuring that there are no other active sources, the concentration-time data that has been collected at the secondary stations upstream and downstream of the suspected reaches are compared with the simulated results (Fig. 23). As can be seen from Fig. 23, there is no difference between the observed and simulated data at P1, which means that there is no active source upstream of that station during the activity of the detected source. A comparison of these two sets of data in P2 shows a difference. Also comparing the general form and peak concentration of the concentration-time curve with the corresponding curve at M2 (Fig. 21), reveals that this difference is only due to the discovered source. The observed and simulated data at P4 and P5 also show a difference. Similarly, it can be concluded that this difference is due to the discovered sources and that there is no other active source upstream of these stations. After determining the suspected reaches, the most probable location and approximate release function are f(kg/s) t(hr) exact recoverd 95% Fig. 19 The recovered release function using the most probable location of the source and using the observed concentration-time data  x= 46 km exact recoverd 95% Fig. 24 The recovered release function of S 2 at the two potential locations in scenario 3-test1 Stochastic Environmental Research and Risk Assessment (2022) 36:1791-1818 1809 recovered using the spatial distribution of concentration data in all downstream stations. Given that the source located at 40 km to 46 km upstream end of B1 has started its activity earlier, its most probable location must be determined first. This case is fundamentally different from the previous ones. In this case, due to the simultaneous activity of two pollutant sources, the most probable location and release function of the second source are estimated using the spatial distribution of concentration data at the time of discovering the effect of the third source. This is due to the observed data at the time of the full passage of pollutant plume from the downstream secondary station that represents the combined effect of two sources which may lead to the incorrect identification results. While at the time of detection of the third source, its effect has not yet reached downstream, and the data that has been recorded at the downstream main and secondary stations shows only the effect of the second source. Following similar steps as before, Fig. 24 shows the results of the inverse model for two potential locations of the second source. According to this figure, it can be concluded that the second source is located at 46 km upstream end of B1. Subsequently, the location of the third source is also determined using the spatial distribution of the concentration data at the time that pollutant plume fully passed from P2. The results of the inverse model for the two potential locations upstream end of B1 are also shown in Fig. 25.
Once the location of the pollutant is estimated, their release functions would be recovered more accurately. This is done by starting from upstream and updating the forward model. It should be noted that the observed concentrationtime data at M3 will only include the effect of the second pollutant source (located 46 km from upstream of B1), and therefore, the exact release function of this source can also be calculated.
The results of the recovery of the third source release function using the concentration-time observed data at M2 are shown in Fig. 26. The measurements are made every 3600 s and the release history is retrieved every 600 s. Figure 27 shows the results of the exact recovery of the release function of the second source using the observed concentration-time data at M3 after deducting the effect of the third source. The error indices for both approximate and exact recovery of the third source release function are given in Table 5. As can be seen from Figs. 26 and 27 and the error indices in Table 5, the accuracy of the results obtained using the concentration-time data is slightly better than the accuracy of those obtained using the spatial distribution of the concentration data. In addition, the 95% confidence interval width is narrower for the case of the exact recovery, which indicates less uncertainty in obtained results. The main reason for this is the difference in the number of observational data in these two cases. Since the spatial distribution of concentration data is usually sparse and the number of the available data is much less than the number of the desired time steps for recovery of the release function.

b) Test 2
In the second case, it is assumed that the source at 10 km upstream end of B1 starts its activity 5 h earlier than the source at 46 km (Fig. 28). This creates a different condition in the identification process than the first test. After identification of the first source, similar to what was described in scenario 1, the forward model is updated according to the recovered source characteristics. Updating the forward model, a comparison of the observed and simulated data at the main stations shows a difference between these two sets of data at the M2. (Fig. 29). Following similar steps as the previous cases, according to Fig. 30, it can be concluded that the suspected reach to the presence of second source is between P1 and P2. Also, to ensure that there are no other simultaneous active sources upstream and downstream of the suspected reach, the observed and simulated data are compared at P1 and P2 during the continuous data recording period by these stations (Fig. 31). As shown in Fig. 31, there is no difference between the observed and simulated data at P1, which means that there is no active source upstream of the suspected reach during the detection period. However, a comparison of these two sets of data in P2 shows a difference. Regarding the general form and peak concentration of the observed concentration-time curve with the corresponding one at M2 (Fig. 29), it can be understood  that this difference is due to the discovered pollutant source and no other pollutant source is expected to be active. Similarly, the results of the inverse model for both potential locations are presented in Fig. 32. As indicated in this figure, for a potential location of 14 km, the release function is trivial, while for a potential location of 10 km a non-zero release function is computed. Therefore, it can be concluded that the second pollutant source is located at 10 km upstream of the B1.
Once the most probable location of the source is determined, assuming the source location is known and using the observed concentration-time data at M2, its release function is recovered more accurately (Fig. 33). The error indices related to the second source for both approximate and exact recovery of the release function are given in Table 6. When the characteristics of the second pollutant source are identified, the forward model is updated according to the determined characteristics and then the observed data and simulated results computed by the updated forward model are compared. Comparison of these two sets of data indicates the existence of difference at M3 (Fig. 34). Therefore, it can be concluded that a pollution source upstream of this station, is active. By comparing the concentration data at the time of the difference detection in all secondary stations located between M3 and the first main station at upstream (M2) (Fig. 35), suspected reach to the presence of third pollutant source is determined between 40 to 48 km of the upstream end of the B1.
To ensure that there are no other simultaneous active sources upstream and downstream of the suspected reach, the observed and simulated data at P4 and P5 are compared during the continuous data recording period by these stations (Fig. 36). As shown in Fig. 36 there is no difference between the observed and simulated data at P4, meaning that there are no other active sources during the identification period. A comparison of these two sets of data in P5 shows the difference. By comparing the general form and peak concentrations of the observed concentration-time curve with the associated ones at M3 (Fig. 34), it can be concluded that this difference is due to the discovered contaminant source.
In the next step, the most probable location and approximate release function of the third source are recovered using the inverse model for two potential sources located at the center of two equal length sub-reaches, i.e. 42 km and 46 km of the upstream end of B1. The results are presented in Fig. 37. The figure shows the location of the third source at 46 km of the upstream end of B1, in which a trivial release function has been obtained. However, as shown in the figure, there is no good agreement between the recovered and the exact release functions and the reason is the time delay in identifying the effect of the third pollutant source at M3, due to the synchronization of its activity with the second pollutant source. As a result,   some part of the information of the third pollutant source is lost and consequently, retrieval accuracy is reduced and the associated uncertainty increased.
Once the most probable location of the third pollutant source is determined, its release function is retrieved more accurately using the observed concentration-time data at M3 (Fig. 38). The model has succeeded in recovering the release function of the mentioned pollutant source with acceptable accuracy. Also, the error indices presented in Table 6 confirm the results. It is worth mentioning that there is no limitation to the number of recovered pollution sources, and the presented model can be easily employed for cases with more than three active pollutant sources as well.

Conclusion
This study has been presented an innovative multistep method for simultaneous identification of number, location, and release history of pollutant source in a river network considering unsteady and non-uniform flow. The only prior information that the method needs are the expected activity period for recovery, the accuracy of spatial range for retrieval of the source location, and the travel time of each branch. Based on that prior information, at first, an adaptive arrangement of observation points is proposed. Then suspected reaches to the existence of pollutant sources are delineated by comparing the simulated and observed breakthrough curves at considered stations. In this step, the number of all simultaneous active pollution sources is also determined. Then, the suspected reaches are divided into some sub-reaches and it is assumed that the origin of possible sources is at the center of those sub-reaches. At the second step, the location and approximate release history of pollution sources are recovered using a geostatistical approach, that considered simultaneously all the possible candidates. The source location is considered as the location where the highest amount of the released pollutant is estimated. Finally, the exact release history is determined using the temporal distribution of observed concentration data at the first downstream main station. The proposed method is suitable for practical applications since it is based on one-dimensional flow and transport models and also considers complicated real-world conditions. The method is effective and easy to apply in complex river networks and single-branch ones as well. Moreover, since in each simulation it is possible to identify all active pollutant sources, the required computational cost is significantly lower than common iterative methods such as the simulation-optimization approach. Another significant advantage of the proposed method is that it provides unique results for sought characteristics, using minimum observational data. In fact, if the observation points are placed based on the suggested pattern, obtaining unique results is guaranteed. The results of the application of the method to some hypothetical river networks for different scenarios in terms of the number, release time, and location of pollutant sources, showed that the methodology performs well in the case of large-scale river networks. The quality of the recovery is dependent on the accuracy of the observation data. So, the uncertainty associated with results due to using erroneous observational data was considered also through a 95 percent confidence interval. This study is one of the first attempts to solve the complicated and illposed problem of simultaneous identification of all characteristics of multiple pollutant sources in a complex river network. Several aspects need further investigation.
Currently, the application of the proposed method is limited to cases in which the activity time of pollutant sources is equal to or greater than the expected activity time for recovery. Some measures such as considering random data collecting in secondary stations might alleviate this problem. This could be a subject for future research.
Funding No funding was received for conducting this study.
Availability of data and material Not applicable.
Code availability Not applicable.

Declarations
Conflict of interest The authors declare that they have no conflicts of interest.