Utilizing Matrix Completion for Simulation and Optimization of Water Distribution Networks

Simulation of water distribution networks (WDNs) constitutes a key element for the planning and management of water supply systems. The literature presents different formulations of heads-flows equations that differ in terms of dimensionality, computational cost, and solution accuracy. Whereas this problem has been the subject of active research in the past, in the last decades a state of stagnation was reached and no new formulations were introduced. In this study, we propose a novel formulation that utilizes a matrix completion technique to construct a reduced-size nonlinear system of equations that guarantees both mass and energy conservation. The advantages of the proposed method are demonstrated in simulation and optimization settings. In the former, the method demonstrates improved scalability and accuracy as compared with other widely known formulations. In the latter, the new formulation leads to compact optimization problems that result in better performance in terms of the objective function, run time, and rate of successful optimization runs.


Introduction
Water distribution networks (WDNs) are large-scale systems responsible for supplying drinking water from sources to consumers. These systems are composed of several links and nodes, such as pipes, pumps, valves, and reservoirs. Many problems require simulation of the flows and pressures in the network. For example, hydraulic simulators are needed in the WDN design problem (Saleh and Tanyimboh 2014;Marinnho et al. 2020;Pankaj et al. 2020;Wang et al. 2021) that seeks optimal sizing of pipes, pumps, and tanks to meet the water demand while satisfying pressure requirements within the network nodes. Hydraulic simulators are also used in the real-time operation problem (Pasha and Lansey 2014;Salehi and Shourian 2021) that seeks optimal pump scheduling to minimize operation cost subject to hydraulic requirements in the network. Moreover, many hydraulic monitoring systems use simulation engines (e.g. EPANET, Rossman 2000) for leakage estimation and detection (Martínez-Solano et al. 2017), system security monitoring (Housh and Ohar 2017), system maintenance and rehabilitation (Fuchs-Hanusch et al. 2008), and system resiliency and reliability assessment (Klise et al. 2017).
In general, the flows and pressures in a WDN are calculated by solving a linear set of mass conservation equations and a nonlinear set of energy conservation equations that should be solved simultaneously, thus making the problem nonlinear (Giustolisi et al. 2012; Moosavian and Roodsari 2014;Qiu and Ostfeld 2021). Different formulations exist that can be utilized to derive the flows and heads in the WDN. These formulations differ in terms of dimensionality, computational cost, and solution accuracy. There is a rich body of literature on the development of formulation and solution methods for WDN modeling. Next, we provide a concise summary of key formulations and techniques in this domain.
One of the most widely used formulations is the Heads and Flows (H-Q) formulation, proposed by Todini and Pilati (1988) as the Global Gradient Algorithm (GGA). In this formulation, the nodal heads and flows in the links are considered unknowns within a simultaneous nonlinear system of equations that includes both the mass and energy conservation equations. The formulation showed excellent convergence and was adopted in the widely used EPANET hydraulic simulator (Rossman 2000). Later, Basha and Kassab (1996) proposed solving the equations with the H-Q formulation based on the theory of disturbances, which yields a fast convergence rate to final solutions. However, Moosavian (2017) argued that the accuracy of the method was not sufficient.
Overall, the H-Q formulation suffers from high computational cost due to the problem's dimensionality. In several papers, the H formulation was proposed, in which only the nodal heads are considered unknowns within a nonlinear system of equations that guarantees both mass and energy conservation. Martin and Peters (1963) proposed formulating all the WDN equations in terms of nodal heads and updating the heads iteratively using the Newton-Raphson method. Liu (1969) solved the H formulation using this method while decomposing the Jacobian matrix into diagonal and nondiagonal matrices. This decomposition is utilized by observing that the nondiagonal matrix is negligible as compared to the diagonal one. However, Moosavian (2017) argued that this method suffers from divergence in the case of an inappropriate initial guess and in the case of large-scale networks.
An additional widely studied formulation is the Q formulation, in which the flows are the only unknowns within a nonlinear system of equations that guarantees both mass and energy conservation. In this formulation, the energy conservation equations are written around the loops in the network, such that in each loop the headloss should be equal to zero. This formulation was shown to be numerically superior to the H formulation (Nielsen 1989). Wood and Charles (1972) proposed the linear theory method for solving the Q formulation. In this method, the nonlinear headloss is linearized in each iteration, using the zero and the current flow as anchors. A disadvantage of this method is the low convergence rate due to the oscillations in the iterative process. Similarly, Wood and Funk (1993) used the extended Taylor series to solve a linearized Q formulation iteratively. This method is currently used in the commercial software KYPIPE (Wood 1980).
Building on the Q formulation, several authors suggested partitioning the network graph into a spanning tree and a corresponding co-tree (i.e., non-spanning tree links), where the size of the chord is equal to the number of loops in the network. Such partitioning will allow the spanning tree flows to be written as a function of chord flows, and thus the chord flows are the only unknowns within a nonlinear system of equations that guarantees both the mass and the energy conservation. We call this approach the Qchord formulation. It is noteworthy that the number of chord flows (i.e., number of loops) is always smaller than the number of links, meaning that this system of equations will have fewer unknowns (Nielsen 1989;Rahal 1995;Elhay et al. 2014;Qiu et al. 2020). An additional technique for reducing the number of unknowns is to formulate the system of equations in terms of corrections of flow around each loop (these correction variables are denoted by ΔQ ). We call this approach the Null-Space formulation since the loops of the network can be represented by the null space of the network's incidence matrix (Epp and Fowler 1970;Benzi et al. 2005;Housh 2011;Abraham and Stoianov 2016). In this formulation, observing that the mass conservation is written as an undetermined system of linear equations, the complete general solution to the underdetermined system can be characterized by adding a particular solution (i.e., an arbitrary flow distribution that satisfies the mass conservation) to a linear combination of the null space vectors. As such, mass conservation can be guaranteed by representing the flows as an affine map of ΔQ . The energy conservation equations are written around the loops in the network, such that in each loop the headloss should be equal to zero. Thus, we obtain a nonlinear system of equations where both the number of equations and the number of unknowns equals the number of loops in the network. This reduced system of equations can be solved using the Newton-Raphson method (Abraham and Stoianov 2016). Todini and Rossman (2013) proposed a unified framework to categorize the different available formulations in the literature. The paper shows that the previously discussed formulations can be classified either as linear-or nonlinear-transformation of the H-Q formulation. The different transformations project the H-Q system of equations onto vector bases smaller than the original bases (i.e. projection to nodes dimension in case of the H formulation, to pipes dimension in case of Q formulation, or loops dimension in case of the null-space formulation).
In this paper, we propose a novel formulation that utilizes the matrix completion technique to construct a reduced-size nonlinear system of equations that guarantees both mass and energy conservation. This reduction will result in increased efficiency in the nonlinear solver since each iteration will handle a smaller Jacobian matrix. Unlike former formulations that rely on the topology of the network (e.g., loops) and/or its original physical unknowns, in the proposed formulation we arbitrarily add "fictional" entries and auxiliary unknowns to facilitate direct mathematical manipulations. More specifically, we employ a matrix completion technique, in which arbitrary entries are added to the system of equations in order to obtain invertible squared matrices that facilitate the elimination of both the H and Q variables in the problem. To the best of our knowledge, there is no published study exists that considered such a procedure to solve the hydraulic simulation of WDNs, although it holds advantages compared to other methods discussed in the literature. Under the unified framework of Todini and Rossman (2013), the suggested completion formulation could be also categorized as a linear transformation of the H-Q formulation. However, as opposed to other approaches that use the network topology to project the problem to a smaller solution domain, we create a random projection by adding arbitrary random columns/rows to complete the system of equations to full rank matrices.
The idea of using the completion technique is inspired by Yang (2011) who used a similar technique to analyze a bilinear system of equations. Noteworthy that by matrix completion we do not mean the problem of low-rank matrix completion, which is a very popular problem in machine learning and computer vision applications Recht 2009, Candès andTao 2010). In the low-rank matrix completion problem, one wants to recover a matrix for which we have only measured a subset of the entries. In this study, we refer to the full-rank completion problem as defined in Yang (2011) and Johnson et al. (2014) where completion means adding rows to a known rectangular matrix such that it will be a squared full-rank matrix.
Using benchmark networks, the proposed method is compared to (1) the H-Q formulation, (2) the Qchord formulation, and (3) the Null-Space formulation. These formulations were chosen because the first is widely used in the literature while in the latter two formulations, the number of unknowns is reduced, similar to the proposed matrix completion formulation. Nonetheless, unlike the former methods, the suggested method does not rely on the topological properties (e.g., loops, spanning trees) of the network. We also demonstrated how the new formulation can facilitate the full elimination of both H-Q variables when used within an optimization model. This elimination reduces the size of the decision variables, which results in an efficient solution of the optimization model.

State-of-the-Art Formulations
In the WDN hydraulic simulation problem, the objective is to find the flows in the links (Q) and the heads at the nodes (H) such that the mass and the energy conservation are satisfied in the entire network.

Mass Conservation
Mass conservation at each node can be formulated as: where Q l is the flow in link l ; q k is the demand at node k ; IN k and OUT k are sets of ingoing and outgoing links that are linked to node k , respectively; nNodes is the number of the junction nodes in the network (i.e., not including source nodes).

Energy Conservation
Energy conservation in each link can be formulated as: where H l e and H l s are the heads at the end and start nodes of link l , respectively; nLinks is the number of the links; ΔH l is the head change of link l . In the case of a pipe, ΔH l is the headloss (note the minus in Eq. 3) which could be defined using the Hazen-William formula: where R l is the Hazen-William resistance of pipe l . In cases where the link represents a pump, the head gain is defined as the pump characteristic curve that dictates the head gain as a function of the flow in the pump. For example, when the pump characteristic curve is given as quadratic function (Lansey and Mays 1989), ΔH l can be defined as: where a , b and c are constants that are determined according to the specifications of the pump.
For nodes with fixed heads, Eq. (5) should be satisfied: where FH is a set of nodes with a predetermined fixed head (e.g. reservoir); H 0,k is the given value of the fixed head at node k.

Matrix Formulation
Equations (1)-(5) can be formulated in a matrix form. Following the notation of Todini and Rossman (2013) and many other studies, we define the oriented incidence matrix 12 ∈ ℝ nLinks×nNodes of the network in which each entry (l, k) equals 1 if link l is ingoing node k , − 1 if the link l is outgoing node k , or otherwise. We also define the incidence matrix of fixed head nodes 10 ∈ ℝ nLinks×|FH| which relates the pipes to the fixed heads that are defined in Eq. (5). In 10 an entry (l, k) will take value − 1 if link l is outgoing the fixed node k . For sake of clarity we define 21 = T 12 .
Equation (1), which dictates the mass conservation at the nodes, can be formulated as an underdetermined linear system of equations (underdetermined = more unknowns than equations) as shown in Eq. (6).
where the vector ∈ ℝ nLinks×1 collects the unknown flows in all links; the vector ∈ ℝ nNodes×1 collects the demands at all nodes; Since nLinks > nNodes in looped networks, the linear equation system above is undetermined, namely, the null space of 21 is not empty. If we define the matrix as the orthonormal basis for the null space, then ∈ ℝ nLinks×nLoops where nLoops = nLinks − nNodes equals to the number of loops and quasi-loops in the network (quasi-loops are paths between constant head nodes, Qiu et al. 2020).
where the vector ∈ ℝ nNodes×1 collects the heads at all nodes, Δ ( ) ∈ ℝ nLinks×1 is a vector of functions representing the head loss/gain in the links; 0 ∈ ℝ |FH|×1 is a vector that collects the values of the fixed heads in the network. Since the vector of functions Δ holds nonlinear terms, the system of equations in Eq. (7) is nonlinear. As discussed previously, the hydraulic simulation problem could be achieved using different formulations as summarized in Table 1: The H-Q formulation directly solves Eqs. (6), (7) as a simultaneous nonlinear system of equations with nLinks + nNodes unknowns/equations. Thus, both and are obtained as a direct outcome of the nonlinear equations solver. The H-formulation extracts from Eq. (7) as a function of nodal heads, = Δ −1 ( 12 ⋅ + 10 ⋅ 0 ) , which is then substituted in Eq. (6) to result in a nonlinear system of equations with nNodes unknowns/equations. The heads, , are obtained as a direct outcome of the nonlinear equations solver while the flows could be obtained by substituting in = Δ −1 ( 12 ⋅ + 10 ⋅ 0 ) . Noteworthy that, in Eq. (7), each element l in the vector of functions Δ ( ) , is a one-dimensional function of the form Δ l ( l ) , thus the functions inversion Δ −1 (⋅) could be derived analytically (Martin and Peters 1963;Boulos et al. 2006). In the Q-formulation, Eq. (6) is solved together with Eq. (8) as a simultaneous nonlinear system of equations with nLinks unknowns/equations. The flows, , are obtained as a direct outcome of the nonlinear equations solver while the heads, , could be obtained from Eq. (7) which becomes linear after determining the flows.
In the Qchord formulation, Eq. (6) is re-written as: where ST is sub-matrix of 21 which collects columns corresponding to a spanning tree; ch is sub-matrix of 21 which collects columns corresponding to co-tree; ST and ch are permutation matrices to extract the spanning tree and the chord flows, respectively; ST and ch are flows in spanning tree and chord links, respectively. After replacing in Eq. (8) with ( ch ) as obtained from Eq. (9), it could be solved as a nonlinear system of equations with nLoops unknowns/equations. The chord flows, ch , are obtained as a direct outcome of the nonlinear equations solver, the flows are obtained by substitution, from the function ( ch ) , while the heads, , could be obtained from Eq. (7) which becomes linear after determining the flows. In the Null-Space formulation, Eq. (6) is re-written as: where p is a particular solution to the undetermined system (i.e., arbitrary flow distribution that satisfies Eq. 6); Δ is corrections to flow around each loop. After replacing in Eq. (8) with (Δ ) as obtained from Eq. (10), it could be solved as a nonlinear system of equations with nLoops unknowns/equations. The flow corrections, Δ , are obtained as a direct outcome of the nonlinear equations solver, the flows are obtained by substitution, from the function (Δ ) , while the heads, , could be obtained from Eq. (7) which becomes linear after determining the flows.

Matrix Completion Formulation
The proposed method aims at obtaining the flows and heads through an auxiliary problem using the matrix completion technique. The matrix 21 is missing nLoops independent rows to become a squared full-rank matrix (i.e., invertible matrix). Let c 21 ∈ ℝ nLoops×nLinks be an arbitrary completion matrix of 21 , that collects nLoops rows which are linearly independent of the rows in 21 , then we can define a squared invertible matrix ̃ 21 ∈ ℝ nLinks×nLinks as: Equation (6) can be written using ̃ 21 as: where ∈ ℝ nLoops×1 is a vector of auxiliary variables. Notice that, unlike a "classical" linear equation system which only includes variables on the left-hand side (e.g., Eq. 6), in Eq. (12) we have a linear system with extra variables on the right-hand side. These auxiliary variables have an important role in the completion technique; without these variables adding more equations would change the solution space of the original problem (Eq. 6). But with these variables on the right-hand side, the original solution space will be maintained. To see that the solution space is maintained, note that any solution ( * , * ) to Eq. (12), must satisfy the first nNodes equations, which is exactly the linear system in Eq. (6). On the other hand, if * is a solution for Eq. (6), then we can set * = c 21 ⋅ * such that ( * , * ) is a solution for Eq. (12).
Recalling that ̃ 21 is an invertible matrix, we can extract as a function of , as follows: Note that there are many ways to complete the matrix 21 , since there is far more than one choice of the matrix c 21 that satisfies the linear independence requirement. In fact, randomly sampling the entries of c 21 is likely to produce a valid completion matrix that satisfies the linear independence requirement. In this study, the entries of c 21 are sampled arbitrarily from a uniform distribution between 0 and 1.
Next, we replace in Eq. (7) with ( ) as obtained from Eq. (13). This will yield an overdetermined equations system (more equations than unknowns): Equation (14) could be formulated using the squared invertible matrix, ̃ 12 =̃ T

21
, as follows: where ∈ ℝ nLoops×1 is a vector of auxiliary variables. This auxiliary vector is essential for matching the dimensions after using ̃ 12 instead of 12 . From Eq. (15), and could be calculated as a function of : Adding the variables, however, will change the original energy conservation in Eq. (7), unless we require that: That is, we need to have: where w = [ nNodes×nNodes | nLoops×nLoops ] is a permutation matrix to extract from ; is a matrix of zeros; is the identity matrix.
To conclude, in the proposed completion method, we solve the following nonlinear system of equations with nLoops unknowns/equations, in which the unknowns are the auxiliary variables: In Eq. (19), the vector is obtained as a direct outcome of the nonlinear equations solver, the flows are obtained by substituting in Eq. (13), while the heads are obtained by substituting in Eq. (20).
Noteworthy that the inversion of the matrix ̃ 12 should be performed one time for a specific network, thus the same ̃ −1 12 is valid in case we run the network with different loading conditions as done in extended period analysis.

Simulation
The proposed matrix completion technique for simulating WDNs is tested over an illustrative example and five benchmark networks. The solution obtained from the proposed methods is compared to: (a) H-Q formulation; (b) Qchord formulation and (c) Null-Space formulation. All the computations were carried out in Matlab R2020a with a processor of (15) 12 ⋅ = − 10 ⋅ 0 + Δ ( ( )) Intel ® Core ™ i7-10510 CPU @ 1.80 GHz 2.30 GHz and 16.0 GB RAM. All the formulations are solved using the fsolve solver which is shipped with the optimization toolbox of Matlab. The code and data are available in our GitHub Repository: https:// github. com/ SWSLAB/ Matri xComp letion. The results of the illustrative example are detailed in Appendix A. We performed hydraulic simulation for five networks which are available from EPANET-MATLAB Toolkit (Eliades et al. 2016). A summary of the network characteristics is given in Table 2. The initial guess of each run is sampled from a uniform distribution that ranges in practical bounds on the variables (e.g. maximum flow cannot exceed total demand). Figure 1 shows the average maximum absolute error (MAE) of the flows obtained from 20 different initial guesses. The bars show the average MAE, compared to the solution from EPANET, while the error intervals show the minimum and maximum MAE for each network and for each method. In comparison to the H-Q method, satisfactory results (below 0.07 m 3 hr −1 in all runs) were observed in the Completion method in all the networks. Examining the average MAE, Null-Space and Qchord methods obtained the largest MAE in Balerma, KL, and Hanoi networks. Despite the inferior MAE of the Completion method in the FossPoly1 network, the MAE is yet acceptable. Examining the minimum and maximum MAE shows that the H-Q method obtained a very narrow interval between minimum and maximum values in all networks. This indicates that the H-Q method is the While the H-Q method may be superior in estimation accuracy and stability, it might be inferior in terms of computational efficiency because of the need to update both the flow and head variables. To examine the computation time, Fig. 2 shows the average, minimum and maximum CPU time required for each network.
The results show that the computation time for the H-Q formulation is the highest among all methods in all networks. For example, noting the logarithm scale in the y-axis, the Completion formulation is 7 times faster than the H-Q formulation in the KL network. The Completion and Null-Space formulations produced the fastest results on average. However, comparing the Null-Space and Completion formulations in terms of accuracy ( Fig. 1) tips the scale toward the Completion formulation since it produced smaller average MAE in large networks such as KL and Balerma. For example, in the Balerma network, both the Null-Space and Completion solved the problem with about 2 s while the H-Q method required an average solution time of 200 s. However, the MAE in the Completion formulation is 0.03 CMH compared to 1000 CMH in the Null-Space formulation.

Optimization
The results show that the Completion method holds advantages compared to other methods for solving the hydraulic simulation problem. The benefit of this method, due to its feature of size reduction, is further highlighted when solving optimization problems that involve hydraulic simulation in the constraints. The proposed formulation can facilitate the full elimination of both H-Q variables, thus reducing the number of the decision variables significantly in the optimization model.
Many optimization problems in the water distribution systems literature involve hydraulic simulation. The most typical are the optimal design problem (Alperovits and Shamir 1977) and the optimal operation problem (Zessler and Shamir 1989). In the former, the Fig. 2 Average, minimum and maximum CPU time for benchmark networks decision-maker seeks optimal sizing of components (e.g. pipes, pump stations, tanks, etc.), while in the latter problem, the decision-maker seeks optimal scheduling of pumps to minimize operation cost. Many studies used a simulation-optimization framework for problems that involve hydraulic simulations. In this framework, a meta-heuristic technique, such as genetic algorithm (Savic and Walters 1997) is coupled with available hydraulic simulation tools (e.g. EPANET) to perform the objective function and constraints evaluation. As such, following this approach does not require the explicit formulation of the hydraulic simulation equations that involve H-Q variables. Nonetheless, with the advancement of new analytical optimization solvers, derivative-based optimization methods are regained research interest for their computational efficiency and their independence of parameter tuning processes (Mala-Jetmarova et al. 2017;Qiu et al. 2020). Unlike the simulation-optimization framework, when using analytical optimization solvers, the formulation of the optimization problem should explicitly include the hydraulic simulation within the problem's constraints. The straightforward approach for doing so is to include H-Q variables as decision variables in the optimization problem along with the mass and energy balance in Eqs. (6), (7) as equality constraints. For example, under these settings, the optimal design problem for a gravitational WDN could be formulated as in Eq. (21) where ∈ ℝ nLinks×1 is the per-unit-length resistance; s , s are vectors of flows and heads for demand load condition s ; N s is the number of demand load conditions; ∈ ℝ nLinks×1 is a vector of pipes length; min , max are vectors for minimum and maximum nodal heads; | s | is the element-wise absolute value of vector s ; • is the element-wise Hadamard product; f ( ) is the cost function. Note that there is a one-to-one function that relates to the diameters , i.e. ( = g( ) as shown in Eq. 22). As such, by seeking optimal per-unitlength resistance we determine the diameters of the pipes. Still, solving the optimization problem in terms of instead of the diameters, as suggested in Eq. (21), is advantageous since the formulation in terms of involves less nonlinear terms. The last constraint limits the resistance within minimum ( min ) and maximum ( max ) which represent maximum and minimum allowed diameters, respectively.
where C is the roughness coefficient and D is the diameter in centimeters.
The number of decision variables in the nonlinear optimization problem given in Eq. (21) equals to nLinks + (nLinks + nNodes) ⋅ N s . That is nLinks resistance variables, in addition to nLinks flow variables, and nNodes head variables for each demand loading condition. Using the Completion approach one can write an equivalent optimization problem (Eq. 23), which only has nLinks + nLoops ⋅ N s decision variables ( nLinks resistance variables, and nLoops of z variables for each demand loading condition). For an increasing number of demand loads, the size of the optimization problem in Eq. (23) is significantly smaller than the one in Eq. (21). This reduction in optimization problem size leads to an efficient solution, as will be demonstrated next. Noteworthy that the last line in Eq. (23) is not an equality constraint, but it is a definition that should be substituted in the first two constraints. Hence, the optimization problem in Eq. (23) involves only and s for each loading condition, s , as decision variables.
To demonstrate the advantage of using the proposed Completion method within an optimization problem we compared the optimization problem in Eq. (21) to the reduced size problem in Eq. (23) on the gravitational Hanoi network introduced by Fujiwara and Khang (1990). The network consists of 34 pipes and 31 demand nodes supplied by a single reservoir at a constant head of 100 (m). The minimum head is 30 (m) and the maximum head is 100 (m) at all nodes. The allowed diameter of all pipes ranges between 12 and 40 inches. The Hazen-Williams coefficient of all pipes is 130. The lengths of all pipes are listed in Fujiwara and Khang (1990). To create different demand loads, the demand was sampled from a joint normal distribution assuming mean demand equal to the demand in Fujiwara and Khang (1990). The standard deviation is defined as 10% of the mean demand for each node. Moreover, to consider different demand patterns and correlation, the system nodes were partitioned into two demand zones: zone 1 at nodes 1:15, zone 2 at nodes 16:32. Intrazone correlation is set to 0.8 reflecting that consumers in the same zone follow a similar demand pattern, whilst interzone correlation is set to − 0.6 to reflect different patterns in the two zones.
Using the above configuration, Hanoi optimal design problem is solved for increasing number of demand loads ranging from 1 demand load up to 30 demand loads. This process creates different optimization problems with different sizes. The optimization problems are solved using the fmincon solver (SQP algorithm), which is shipped with the optimization toolbox of Matlab. Since the optimization problem is non-convex and fmincon is a local optimization solver, we used a multi-start procedure, in which each optimization problem is solved from 50 starting points that are randomly drawn from uniform probability distribution defined by the decision variables bounds. Out of the 50 runs, the selected solution is the one that is feasible and minimizes the objective function (Loganathan et al. 1995). Figure 3 shows the box and whisker plot of the optimal cost statistics for both formulations and different problem sizes (i.e., increasing N s ). The optimal cost is analyzed for runs that terminated successfully out of 50 runs from different initial guesses. The box shows the mean, median, the first and third quartile while the whiskers show the minimum and maximum cost. The figure shows that the mean optimal cost achieved by the reduced problem (i.e., r-z formulation of the completion method) in Eq. (23) is generally better than the r-H-Q formulation in Eq. (21). However, the true exam is the best optimal cost, again the results show that the r-z (23) formulation obtained superior results, except when solving the problem with one loading condition in which the best cost of the r-z formulation is slightly outperformed. Figure 4 shows the number of runs which succeeded to declare successful termination.
The results show that the r-H-Q formulation failed to declare successful termination in more than half of the runs when the number of demand loads is greater than 10. For example, when N s = 30 the r-Q-H formulation converged to a solution only in 7 out 50 runs while the r-z formulation converged to a solution in 39 out of 50 runs. This could be explained by the significant difference in the size of the optimization problems. For the r-z formulation, when N s = 30 , the optimization problem involves 34 + 3 ⋅ 30 = 124 This reduction in the problem size is also beneficial in the mean CPU time required by the optimization solver. Figure 5 shows the mean CPU time of the successful runs, while Fig. 6 shows the mean CPU time of the failed runs. The results show that the reduced r-z formulation can obtain the optimal solution faster than the r-H-Q formulation. While the difference in the CPU time is negligible for small size optimization problems (i.e., N s = 1, 5 ), the CPU time saving can reach factor 15 for large optimization problems with N s = 30.

Conclusion
The proposed method utilizes the matrix completion technique to construct a reduced-size nonlinear system of equations that guarantees both mass and energy conservation. Unlike in former formulations that rely on the topology of the network, in the proposed method arbitrary entries are added to the system of equations to obtain invertible squared matrices that facilitate the elimination of both the H and Q variables in the problem. The advantage of the proposed method in terms of accuracy and run time in comparison to other formulations is demonstrated through five benchmark networks. Compared to the full H-Q formulation (the method used in EPANET), the proposed method is superior in terms of run time while maintaining acceptable accuracy. Moreover, the new formulation can be also embedded in optimization problems to simulate the hydraulic variables. Using the new formulation, a compact optimization problem is obtained which results in better performance in terms of the objective function, run time, and rate of successful optimization runs.
In light of the above, our method advantages are in the simplicity of the formulation and the efficiency of the solution both under simulation and optimization settings. To reduce the problem size one can find different cycle bases (for example, in the null-space formulation), but this will require analysis of the network topology and its fundamental loops while in the suggested approach any set of arbitrary entries can be used to derive the reduced size formulation. Our results show that the reduced problem is more efficient to solve. This efficiency is important for both theoretical and practical reasons. Often, the hydraulic simulation problem is a crucial module in Decision Support Systems, for example, in the optimal real-time operation problem, which is typically solved using Model Predictive Control (MPC), the solver must converge within minutes. Otherwise, the state of the system will change and the obtained solution will not be relevant for the controller. Salomons and Housh (2020) argued that a practical solution limit is five minutes, which is a very challenging limit for large-scale problems. Reducing the complexity of the hydraulic simulation using the completion method will contribute to more efficient optimization models for the real-time operation problem. Similarly, real-time monitoring systems that use hydraulic simulators can benefit from the enhanced efficiency of the suggested method. For design problems, one can argue that the solution efficiency (i.e. solver time) is not very important, since the problem is solved offline. This is why we can see in the literature time-consuming optimization-simulation frameworks, which link hydraulic solvers (e.g. EPANET) with evolutionary algorithms. This approach might take several hours or even days to solve the design problem for realistic case studies. Nonetheless, in practice, it is important to seek efficient solution approaches even for offline design problems. Typically, optimization models results will be used to facilitate disciplined discourse between stakeholders. A slow solution process will hinder real-time stakeholders' interaction with the model in which different scenarios are refined rapidly. In practice, it is hard to define the scenarios in advance; usually, the scenarios are redefined during an active discussion between the stakeholders, partly by seeing the solutions for different alternatives imposed by the decisionmakers. As such, solution efficiency in design problems is important because it facilitates near-real-time interaction with the model that can contribute to expediting stakeholders' engagement with Decision Support Systems.
Additional salient features of the proposed method include that (1) it is possible to write infinitely many equivalent auxiliary problems that guarantee mass and energy conservation for the same network and (2) the construction of the auxiliary problem can be efficiently updated in the case where the network is run with different loading conditions, since many of the algebraic operations (e.g., matrices inversion) are independent of the loading conditions. The former feature suggests that different completion strategies may result in different formulations for the same problem. This study did not explore different completion strategies. Here, only randomly sampled entries are used in the completion procedure. Nonetheless, we hypothesize that different completion strategies may yield even more enhanced performance. Future research should consider this limitation of the current work and suggest tailored completion strategies that benefit the solution process efficiency.

Illustrative Example
A synthetic network was generated as shown in Fig. 7. It consists of 7 pipes, 4 nodes, and one fixed head reservoir. The lengths of the pipes are 1000 m each, the diameters are 200 mm, and the Hazen-Williams coefficient is 100 in all pipes. The head at the reservoir is fixed at 10 m and the elevation of all network's nodes is 0 m. The nodal demand is 20 CMH at all nodes. Figure 8 shows the matrices definitions required by the completion technique as detailed in Eqs. (11)(12)(13)(14)(15)(16)(17)(18)(19)(20).  To analyze the robustness of the methods, we performed 100 runs where each run starts from a different initial guess. The initial guess of each run was sampled from a uniform distribution between − 100 and 100 (20 × 5 = 100 CMH). The average and the standard deviation (STD) of the MAE of the flows as obtained from the 100 runs are detailed in Table 3. The standard deviation is presented as an indication of the sensitivity of each formulation to the initial guess. The results in Table 3 show that the H-Q method outperformed other methods in terms of accuracy of estimation of flows as it has a lower average MAE. Comparing the other methods shows that the completion formulation outperforms other methods in average MAE and it is more robust since the standard deviation of the MAE is lower than the Qchord and Null-Space formulations.