An Efficient Approach for Nodal Water Demand Estimation in Large-scale Water Distribution Systems

Real-time modeling of a water distribution system (WDS) is a critical step for the control and operation of such systems. The nodal water demand, as the most important time-varying parameter, must be estimated in real time. The computational burden of nodal water demand estimation is intensive, leading to inefficiency in the modeling of large-scale networks. The Jacobian matrix computation and Hessian matrix inversion are the main processes that dominate the computation time. To address this problem, an approach for shortening the computation time for real-time demand estimation in largescale network is proposed. This approach allows the Jacobian matrix to be efficiently computed based on solving a system of linear equations, and a Hessian matrix inversion method based on matrix partitioning and the iterative Woodbury-Matrix-Identity Formula is proposed. The developed approach is applied to a large-scale network, in which the number of nodal water demands is 12523, and the number of measurements ranges from 10 to 2000. The results show that the time consumptions for the Jacobian computation and Hessian matrix inversion are within 465.3 ms and 1219.0 ms, respectively. The time consumption is significantly shortened compared with the existing approach, especially for nodal water demand estimation in large-scale WDSs.


Introduction
Hydraulic models of water distribution system have been applied extensively in the analysis and operation of such systems (Abu-Mahfouz et al. 2019;Chu et al. 2020;Moasheri and Jalili-Ghazizadeh 2020;Khatavkar and Mays 2019). For these hydraulic models, the nodal water demands, which are the most uncertain input parameters, can greatly influence the model accuracy (Chu et al. 2021a;Meirelles et al. 2017;Kapelan et al. 2007). Therefore, real-time estimation of nodal water demand has become a major task for the real-time modeling of WDSs. Thanks to the development of remote metering technology (Sankaranarayanan et al. 2020), the use of a large amount of measured data to estimate the nodal water demands in real-time has gradually received increasing attention in recent years (Vrachimis et al. 2018).
The sensors distributed in a WDS network usually upload measurements at regular time intervals, typically 5-15 minutes in China, and then these data are used for nodal water demand estimation. Therefore, the nodal water demands must be estimated within the time interval of data uploading. However, the computational burden of demand estimation is intensive, and the time consumption grows exponentially as the number of network nodes increases (Chu et al. 2021b). This leads to inefficiency of the existing approach when estimating the nodal water demands for a large-scale network. There is an urgent need to address such inefficiency problems for real-time nodal water demand estimation in largescale networks.
Many techniques have been proposed for estimating nodal water demands (Wang et al. 2019;Zhang et al. 2018). Examples include evolutionary algorithm(EA) based methods (Di Nardo et al. 2015;Zanfei et al. 2020), explicit methods (Berardi and Giustolisi 2021), singular value decomposition (SVD) methods (Weber and Hos 2020), and data assimilation methods (Chu et al. 2021a;Kang and Lansey 2009;Zhou et al. 2018). EA-based methods estimate nodal water demands by minimizing the deviation from the measurements to the model-simulated values. However, these methods require frequent calls to the EPANET hydraulic solver (Rossman 2000), which is time-consuming for a large-scale network. Explicit methods are computationally efficient methods, in which a parameter grouping strategy is adopted to reduce the dimensionality of the problem (Goulter and Bouchart 1990; Shang et al.2008;Freitas et al. 2018). This method groups the nodal water demands with similar characteristics as one parameter to be estimated (Rathi et al. 2020). However, the nonlinear relationship between the model outputs and model parameters is not considered, and improper grouping may result in additional errors (Jung et al. 2016). SVD methods have been used as an efficient means of estimating nodal water demands by solving the Moore-Penrose pseudoinverse of the coefficient matrix. However, such methods use only measured data, and the search space is not confined. Therefore, the nodal water demands can be arbitrarily adjusted, leading to considerable modeling errors.
Data assimilation methods estimate water demand by sequentially fusing the prior nodal water demand and measurements in a probabilistic form and finding a value with the maximum probability density value as the estimated nodal water demand (Law et al. 2015). Thus, the nodal water demand is estimated in a probabilistic form, typically described by the mean and covariance values. Data assimilation approaches can be further classified into sampling-based and numerical-based methods. Sampling-based methods are still time-consuming because they require frequent calls to the EPANET hydraulic solver to evaluate the probability density. In the numerical approach, the firstorder gradient (Jacobian matrix) or second order-gradient (Hessian matrix) information is utilized to search for the optimal solution. Therefore, such a method is expected to be more computationally efficient. However, estimation for large-scale networks is still time consuming when using current numerical approaches. Typically, the nodal water demands can be estimated only after several iterations, where each iteration involves the updating of the Jacobian matrix and Hessian matrix inversion. The updating of these two matrices involves an inversion operation for large matrix, which is time consuming. Therefore, the computation of the Jacobian matrix and Hessian matrix inversion are the processes that dominate the computation time.
The Jacobian matrix gives the first-order sensitivities of the outputs of a WDS hydraulic model (nodal pressure, pipe flow) to changes in the model parameters (nodal water demands, roughness) (Piller et al. 2017). The Jacobian matrix is widely used in many areas of water distribution system analysis, such as model parameter estimation, sensor placement, and uncertainty quantification. Piller et al. (2017) have presented high-quality research that provides explicit formulas for the Jacobian matrix of WDS steady-state heads and flows with respect to changes in the model parameters, including nodal water demands, roughness, and resistance factors. For the demand estimation problem, the Jacobian matrix between the model outputs and the nodal water demands is critical for minimizing the objective function. The computation of the Jacobian matrix involves an inversion operation on the matrix W ∈ ℝ n×n , which is typically a large, sparse, symmetric matrix (Coulbeck and Orr 1988), where n is the number of nodal water demands. Computing the Jacobian matrix using the conventional dense matrix inversion method is time-consuming, as its time complexity is O(n 3 ) . Piller et al. (2017) suggested that the Jacobian can be computed through the use of sparse Cholesky factorization, which is expected to be computationally efficient. However, this approach still suffers the disadvantage of being time consuming for demand estimation in a large-scale WDS, as the direct inversion of the matrix W involves many useless operations.
The computational burden of Hessian matrix inversion is also intensive, with a time complexity of O(n 3 ) . Some methods, such as using Hessian-free numerical optimization methods or solving a linear system of equations, allow the equations to be solved more efficiently. However, these methods are difficult to adopt in the data assimilation framework. For data assimilation approaches, the inversion of the Hessian matrix is used to compute the covariance matrix of the nodal water demands (Chu et al. 2021b). The covariance matrix is an important variable for quantifying the parameter uncertainty (Law et al. 2015). This highlights the importance of calculating the covariance matrix (Hessian matrix inversion) in a timely manner. Chu et al. (2021b) developed an efficient Hessian matrix inversion algorithm to shorten the time consumption of the demand estimation problem. This approach is efficient for problems in which the number of measurements m is far smaller than the number of nodal water demands ( m ≪ n ). However, it is not efficient enough when estimating the nodal water demands in sensor-dense networks (e.g., m n > 0.54 ) (Chu et al. 2021b).
This paper develops a computationally efficient demand estimation approach, especially for large-scale WDS networks. The contributions are as follows:1) an efficient method is developed to calculate the Jacobian matrix based on solving a system of linear equations; and 2) an efficient Hessian matrix inversion method is developed based on matrix partitioning and the iterative Woodbury-Matrix-Identity Formula. The developed approach is applied to a large-scale network, in which the number of nodal water demands is 12523 and the number of measurements ranges from 10 to 2000. The results demonstrate that the time consumptions for both the Jacobian computation and Hessian matrix inversion are significantly shortened compared with the existing approach.

Real-time Nodal Water Demand Estimation
A data assimilation method estimates nodal water demands through the fusion of realtime measurements with the prior nodal water demands in a probabilistic form, where X t|t ∈ ℝ n×1 represents the nodal water demands at the current time step; Y t ∈ ℝ m×1 is the measurement vector; P X t|t |Y t is the posterior probability distribution of X t|t , given the measurements Y t ; N X t|t |X t|t−1 , V t|t−1 is the prior probability distribution, describing the demand uncertainty predicted based on the previous time step ( t − 1 ); X t|t−1 ∈ ℝ n×1 and V t|t−1 ∈ ℝ n×n are the mean and covariance, respectively, of the nodal water demands predicted based on the previous time step t − 1 ; N Y t |g X t|t , R is the conditional probability distribution of the measurements Y t ; g X t|t ∈ ℝ m×1 and R ∈ ℝ m×m are the mean and covariance. g X t|t is the model outputs corresponding to the measurements, given the model input X t|t ; and m and n are the numbers of measurements and nodal water demands, respectively. The subscripts t|t − 1 and t|t represent that the states (values) at time step ( t ) are estimated from time steps t − 1 and t , respectively.
By maximizing the logarithm of P X t|t |Y t , the nodal water demands X t|t can be obtained. This is equivalent to minimizing the objective function shown in Eq. (2). The details of Eqs. (1, 2) can be found in (Chu et al. 2021b).
The objective function Eq. (2) is a nonlinear function and can be solved iteratively. The solution is achieved by solving Eqs. (3-6) iteratively.
where is the step size, satisfying < 1 ; ΔX k t|t ∈ ℝ n×1 is the demand corrector in the k-th iteration ;V t|t ∈ ℝ n×n is the covariance matrix of the estimated nodal water demands; and G k ∈ ℝ m×n is the Jacobian matrix in the k-th iteration.
Equation (5) aims to produce the Jacobian matrix of the measurements in each iteration. In addition, Eq. (6) involves the inversion of the Hessian matrix H = [V −1 t|t−1 + G k T R −1 G k ] ∈ ℝ n×n for the computation of the covariance matrix V t|t . For a large-scale network, the time consumption of Eqs. (5, 6) is intensive. Therefore, Eqs.
(5, 6) dominate the computation time. The bottleneck of computational efficiency lies in the solving of the Jacobian matrix and the Hessian matrix inversion (Chu et al. 2021b). Piller et al. (2017) provided explicit formulas for the Jacobian matrix between the model outputs (nodal pressure, pipe flow, etc.) and the model parameters (nodal water demand, pipe roughness, etc.). The mathematical formalism for the Jacobian matrix between the nodal pressures and nodal water demands can be written as shown in Eq. (7) (Piller et al. 2017).

Jacobian Matrix of the Pressure Measurements
where, p ∈ ℝ n×1 represents the pressures at all nodes; A ∈ ℝ np×n is the incidence matrix, F ∈ ℝ np×np is a diagonal matrix; G p ∈ ℝ n×n is the Jacobian matrix between the nodal pressures and the nodal water demands at the n nodes; and np is the number of pipes. Equation (7) gives the mathematical expression for the Jacobian matrix of n nodal pressures, which involves the inversion of the matrix −A T FA . The matrix −A T FA ∈ ℝ n×n is a square matrix, and the number of nonzero elements in −A T FA is far smaller than n 2 . Therefore, −A T FA is a large, sparse symmetric matrix (Coulbeck and Orr 1988). Treating −A T FA as a dense matrix and performing the direct inversion of −A T FA is time consuming with a time complexity of O n 3 . Piller et al. (2017) reported that the inversion of −A T FA can be computed at a marginal cost by using sparse Cholesky factorization. As shown in Fig. 1, the Jacobian matrix G p consists of n row vectors, where each row vector corresponds to a nodal pressure. Among the n row vectors, only m p ( m p ≪ n ) vectors (the right matrix in Fig. 1) correspond to measured pressure data and will be used for nodal water demand estimation, where m p is the number of pressure measurements. Even if sparse Cholesky factorization is used, the direct inversion method is not the most efficient approach especially for large-scale network, as most vectors ( n − m p ) in G p are discarded.
To improve the computational efficiency, we have developed an approach in which matrix inversion is avoided and the Jacobian matrix of the pressure measurements ( m p vectors) is directly calculated. Denoting (−A T FA) by W , the matrix G p can be computed by solving Eq. (8). Equation (8) can be decomposed into a system of linear equations, as shown in Eq. (9). Fig. 1 Calculation of the Jacobian matrix for the measured pressure data where, I ∈ ℝ n×n is the identity matrix; G p (i) ∈ ℝ n×1 is the i − th column of G p ; and I(i) ∈ ℝ n×1 is the i − th column of I. By solving Eq. (9), the i − th column vector G p (i) can be directly computed. Considering that G p is symmetric matrix, the i − th row of G p is the transposition of the column vector G p (i) . Therefore, m p column vectors corresponding to the pressure measurements can be computed by solving m p linear equations. Then, these m p column vectors are transposed to obtain the Jacobian matrix of the pressure measurements.
As mentioned previously, the matrix W is a large, sparse symmetric matrix. This allows the use of Cholesky factorization to solve the sparse symmetric system of linear equations in Eq. (9), an approach that has also been used in EPANET (Rossman 2000). The Cholesky factorization of matrix W can be written as where L is an upper triangular matrix with positive diagonal elements. Then, Eq. (9) can be written in its equivalent form, To calculate the Jacobian matrix for the pressure measurements, the Cholesky factorization of the matrix W should be performed first (Eq. 10). By submitting the matrix L to Eqs. (11 and 12), the Jacobian matrix ( G p (i) ) for i-th ( i = 1, 2, … , m p ) pressure measurement can be calculated. Considering that L is an upper triangular matrix, Eqs. (11 and 12) can be efficiently solved. Considering that Eqs. (11 and 12) for different measurements can be solved independently, parallelization implementation of Eqs. (11 and 12) is adopted to improve the computational efficiency. The pseudocode for the solution of the Jacobian matrix is shown in Fig. S1.

Jacobian Matrix for the Flow Measurements
As shown in Fig. S2, N 1 and N 2 denote the two nodes of a pipe, in which the flow rate is Q . The heads and pressures of the two nodes are h 1 and p 1 for N 1 and h 2 and p 2 for N 2 . The headloss of the pipe can be computed as h f = rQ = h 1 − h 2 , where r and are two coefficients. By taking the derivative of the head loss equation, we can obtain Eq. (13). Considering that , Eq. (13) can be written as Eq. (14).
(8) WG p = I (9) WG p (i) = I(i), i = 1, 2, … n To obtain the Jacobian matrix of the measured pipe flow data, we should first compute the pressure Jacobian matrix of the two nodes ( p 1 X t|t , p 2 X t|t ) of the pipe using the method developed in the previous section. The Jacobian matrix of the measured flow data is then calculated by substituting p 1 X t|t and p 2 X t|t into Eq. (14). Chu et al. (2021b) proposed an approach to solve the Hessian matrix inversion by adopting the iterative Sherman-Morrison Formula. This approach is efficient for problems in which the number of measurements is far smaller than the number of nodal water demands ( m ≪ n ). However, this method is not efficient enough when estimating the nodal water demands in sensor-dense networks (e.g., m n > 0.54 ) (Chu et al. 2021b). To fill this research gap, this paper provides an efficient method of computing the inversion of Hessian matrix.

Inversion of the Hessian Matrix
As shown in Fig. 2, the Jacobian matrix can be partitioned into several blocks, each of which corresponds to a group of measurements. Denoting the Hessian matrix by H NG , the Hessian matrix can be written as Eq. (15). where NG is the number of blocks (measurement groups); G i ∈ ℝ m i ×n is the i − th block of G ∈ ℝ m×n ; R i ∈ ℝ m i ×m i is the noise covariance of the measurements corresponding to G i ; and m i is the number of measurements corresponding to block i and satisfies ∑ NG i m i = m . Based on Eq. (15), the Hessian matrix has the following form.
Denoting G i H −1 i−1 by M ∈ ℝ m i ×n , based on Woodbury-Matrix-Identity Formula, the Hessian matrix inversion can be calculated as,

Fig. 2 Partitioned Jacobian matrix
As shown in Eqs. (17-19), the Hessian matrix inversion problem is solved after NG iterations. Each iteration involves the inversion of the matrix S ∈ ℝ m i ×m i , with a time complexity of O m i 3 . The total time complexity of each iteration can be written as O m i 3 + i , where i is the time complexity of other matrix operations (matrix multiplication and addition, etc.). Considering that the algorithm requires NG iterations, the total time complexity can be written as NG 2 . At this point, the time consumption decreases exponentially with an increasing number of measurement groups NG . Theoretically, the algorithm consumes the least time when NG = m (each block corresponds to just one measurement), in which case it is equivalent to the method developed by Chu et al. (2021b). However, Eq. (20) indicates that setting NG = m is not an optimal strategy. An increase in NG will reduce the time complexity of the inversion of the matrix S ; however, the time consumption of the other matrix operations ( ∑ NG i i ) will gradually increase. When NG = m , the total time In this situation, other matrix operations dominate the computation time, leading to inefficiency of the algorithm developed by Chu et al. (2021b). Compared with the method presented by Chu et al. (2021b), the developed method provides a more flexible means to select the appropriate NG to balance the time consumptions of the inversion of the matrix S and other matrix operations, and thus obtains a significant improvement when dealing with sensor-dense networks. The resulting nodal water demand estimation algorithm is shown in Fig. S3.

Case Study
The proposed approach has been developed in the C language. The algorithm for solving the Jacobian matrix needs to allocate a large amount of memory to store the matrix W ; thus, the CPU is most suitable for processing the parallel algorithm for Jacobian matrix computation. The algorithm for Hessian matrix inversion is executed in parallel on the GPU. Simulations were performed on an ASUS computer equipped with an Intel (R) Core I7-8750H CPU (6 cores and 12 threads) and an NVIDIA GeForce GTX 1050 Ti.
The developed method was applied to BWSN network 2, a large-scale network with 12523 nodal water demands to be estimated ( n = 12523 ), as shown in Fig. 3. Detailed information about BWSN network 2 can be found in Ostfeld et al. (2008). Various scenarios with numbers of measurements ( m ) ranging from 10 to 2000 were simulated to assess the properties of the developed approach. The sensor locations corresponding to these measurements were randomly selected. Both the prior demands and measurements were generated as synthetic data. The measured pressure data were generated by adding a normally distributed random value with a mean of 0 and a variance of 0.2 m. A uniformly distributed random value in U(−X, X) was added to the true nodal water demand ( X ) to generate the prior nodal water demand X t|t−1 . The diagonal elements of V t|t−1 were set to (X t|t−1 ) 2 . (20)

Computational Efficiency
The proposed approach, denoted by LE_SCF, computes the Jacobian matrix by solving linear equations (LEs) through the utilization of sparse Cholesky factorization (SCF). For comparison, the direct inversion method, denoted by Direct inversion_MKL, in which −A T FA is treated as a dense matrix is provided. This method (Direct inversion_MKL) is implemented through the use of a parallel inversion solver based on Intel MKL (Wang et al. 2014), a high-performance parallel matrix computing library provided by the Intel Corporation. In addition, a direct inversion method utilizing sparse Cholesky factorization(SCF) is also provided, denoted by Direct inversion_SCF. Typically, algorithms can estimate the nodal water demands after 15 iterations (Fig. S4). Figure 4 gives the average time consumption of the Jacobian matrix computation using each of the three methods in each iteration. From Fig. 4, we can see that the time  (Ostfeld et al. 2008) consumption of the LE_SCF method (the red line) increases linearly with an increasing number of measurements ( m ), while the time consumptions of the direct inversion methods (Direct inversion_MKL and Direct inversion_SCF) are relatively stable. This is because the developed LE_SCF method calculates the Jacobian matrix by solving m linear equations, whereas the direct inversion methods always directly invert the matrix −A T FA ∈ ℝ n×n ; thus, their time consumptions do not vary with the change in the number of measurements.
From Fig. 4 we can see that the time consumption of the Direct inversion_SCF method (the green line) is 16% of that of the Direct inversion_MKL method. This highlights the effectiveness of sparse Cholesky factorization in improving computational efficiency. In addition, the time consumption of the developed approach (red line) is far less than those of the direct inversion methods. Even when the number of measurements reaches 2000, the time consumption of the developed algorithm is only 20% of that of the Direct inver-sion_SCF method. Generally, the developed method can estimate the nodal water demands after 15 iterations (Fig. S4). In this situation, the total time consumption of the Jacobian matrix computation in the developed method over 15 iterations ranges from 0.18 s to 6.98 s as the number of measurements increases from 10 to 2000. For comparison, the total time consumptions of the Direct inversion_MKL method and the Direct inversion_SCF method are approximately 219 s and 34.77 s, respectively. The developed method, in which the Jacobian matrix is computed by solving a system of linear equations with the use of sparse Cholesky factorization, offers significantly shortened time consumption and improved usability, especially for the real-time modeling of a large-scale network. Figure S5 shows the average time consumption of Hessian matrix inversion in each iteration for different numbers of measurement groups ( NG ), with the number of measurements ranging from 10 to 2000. To facilitate the analysis, the time consumption for m = 2000 is shown in Fig. 5. It is interesting to note that two different performance-changing stages can be observed from Fig. 5 (red line). For the first stage, in which the number of measurement groups is fewer than 8 ( NG < 8 ), a rapid decrease in time consumption is observed; then follows the second  Table 1. Figure 6 shows the time consumption of Hessian matrix inversion after the number of measurement groups has been optimized (Table 1). For comparison, the time consumption of the Sh_Mo based method developed by Chu et al. (2021b) is also provided. As shown in Fig. 6, the time consumption of the developed approach increases as the number of measurements increases, and it is always within 1000 ms. In contrast, the time consumption of the Sh_Mo based method increases from 2315 ms to 165204 ms . On average, the time consumption of the Sh_Mo method is almost 60 times greater than that of the developed approach. The main reason is that the Sh_Mo method calculates the Hessian matrix inversion after m iterations (Chu et al. 2021b), making it equivalent to the developed approach with NG = m . Thus, the Sh_Mo method is not efficient, as it consumes considerable time for the other matrix operations. In contrast, in the developed approach, the appropriate number of groups ( NG ) can be selected to significantly shorten the time consumption of Hessian matrix inversion.

Modeling Accuracy
The cumulative probability of absolute error for the 12523 nodal water demands is shown in Fig. S6. The absolute error is computed as the deviation from the estimated water demands to the true values. The prior error is computed as the absolute deviation from the prior  Fig. S6, the estimated water demand error approaches the prior error. This is because the developed approach attempts to minimize the difference between the estimated nodal water demands ( X t|t ) and the prior values ( X t|t−1 ). Therefore, the algorithm may show similar performance when the same prior values are used. Figure S7 presents the cumulative probability of absolute error of the nodal pressures at the 12523 nodes. The absolute error is computed as the deviation from the estimated nodal pressures to the true values. The prior pressure error is the absolute deviation between the true pressures to the pressures computed by directly using the prior demands as model input. To facilitate the analysis, the estimated results for m = 30, 100, 1000, 2000 are shown in Fig. 7. As shown in Fig. 7, the estimated pressure error is smaller than the prior pressure error. In addition, the nodal pressure error decreases with an increasing number of measurements. For example, 50% of the nodes have an error greater than 0.5 m when m = 30 , while 20% of the nodes have an error greater than 0.5m when m = 100 . This significant improvement highlights the importance of the utilization of measurements to improve the modeling accuracy.

Conclusions
The timely estimation of nodal water demands is critical for the real-time modeling of WDS. The Jacobian matrix computation and Hessian matrix inversion are the processes that dominate the computation time. This paper has developed an efficient approach for improving the typical computational efficiency of nodal water demand estimation in the large-scale network.
The developed approach calculates the Jacobian matrix based on sparse Cholesky factorization and solving a system of linear equations. Each equation can be solved independently, allowing the use of parallel programming to improve the computational efficiency. An efficient Hessian matrix inversion method based on matrix partitioning and Iterative Woodbury-Matrix-Identity Formula has been developed. This approach provides a more flexible means to balance the time consumptions of the matrix inversion and other matrix operations, and thus obtains a significant improvement when dealing with sensor-dense networks. The results show that the time consumptions of the Jacobian matrix computation and Hessian matrix inversion are approximately 20% and 1.6%, respectively, of those in the existing approach.

Conflicts of Interest/Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.