A Model-Based Bayesian Framework for Pipeline Leakage Enumeration and Location Estimation

With urban development, the water loss and environmental pollution caused by pipeline leakage urgently need to be addressed. In this paper, a probabilistic, model-based Bayesian method is designed to solve the multi-leak detection problem in a reservoir-pipeline-valve system. Bayesian inference selects the model best supported by the measured data: it first estimates the number of leaks and then extracts the leak locations from the model that the data prefer. A likelihood function of the water head, based on the characteristics of the head in the pipeline, is given for the Bayesian evidence calculation. The method overcomes the limitation of recent methods, whose localization ability depends on the leak locations, and it determines the number and locations of leaks simultaneously. Different experimental settings and scenarios are used to verify the effectiveness of the proposed method. For three leaks that do not include closely spaced leaks, the RMSE of each leak is 2.3068 m; when closely spaced leaks are included, the average RMSE of each leak is 3.5011 m. The results demonstrate that this model-based Bayesian analysis is an accurate tool for leakage enumeration and location estimation.


Background and Introduction
With rapid economic development, pipeline transportation plays an increasingly important role in the national economy, the defense industry and people's daily life. Oil, natural gas and water pipelines have become the lifeblood of national economic development (Li et al. 2019). Pipelines are the cheapest means of transportation, but this does not mean that they are risk-free. In recent years, population growth, urbanization and industrialization have led to water shortages (Naik and Nagarajappa 2017; Pasalwad et al. 2019). Because water is usually supplied through pumped and piped systems (Suseela et al. 2020), pipeline leakage accidents, which occur frequently, are the main source of water losses (Gupta and Kulat 2018). The impact of water loss in urban water supply systems on water and energy resources, as well as on the quality of public services, has become a continuous and global challenge (Colombo and Karney 2002; Duan 2018; Del Teso et al. 2019). To ensure the safe operation of pipelines and to minimize the harm caused by leakage accidents, it is necessary to study leak detection technology and improve detection sensitivity.
Various leak detection methods have been developed over the past decades, and recent research has focused on transient-based methods. Transient-based technology uses the hydraulics of transient flow and the pressure response measured at specified location(s) to detect leaks in the pipeline (Wang and Ghidaoui 2018). Such methods are expected to work because the pressure signal measured at a specific location in a fluid conduit is modified by its interaction with the physical system as it propagates and reflects throughout the system. The signal therefore contains useful information about the conduit's properties and state. Specific examples of this approach are: (i) the inverse transient-based method (ITM), an inverse mathematical method that determines the set of solution parameters from transient events (Soares et al. 2011; Stephens et al. 2013; Vitkovsky et al. 2007); (ii) the frequency response-based method (FRM), a single-pipe leak detection method that uses the system frequency response as an indicator (Duan 2016; Sattar and Chaudhry 2008; Kim 2016); (iii) the transient damping-based method (TDM), which locates leaks from the different attenuation of the Fourier components of the transient (Wang et al. 2002); and (iv) the transient reflection-based method, a transient time-frequency analysis method (Covas et al. 2004; Sun et al. 2016).
Recent studies (Li et al. 2020; Wang and Ghidaoui 2018; Wang et al. 2019) each formulated a transient-based leak detection method and showed that it can locate leaks in one-leak and two-leak cases. However, the simulation results also indicated that the localization performance for two leaks depends on the leak locations. Moreover, these methods cannot simultaneously determine the number and locations of leaks. These are precisely the shortcomings that this paper addresses.
Model-based Bayesian inference is a methodology capable of incorporating information about the system under study, including a sufficiently accurate model derived from the observed phenomena. These methods have been used increasingly in recent years. Previous studies (Bush and Xiang 2018; Escolano et al. 2014) applied Bayesian inference to direction-of-arrival (DOA) analysis, and Knuth et al. (2015) applied Bayesian analysis in the context of signal processing. Bayesian model selection has also been applied to room-acoustic modal analysis (Beaton and Xiang 2017). Inspired by this research, the paper presents a transient model-based Bayesian analysis for pipeline leak localization using nested sampling, a method that can infer the number of leaks and their locations through probabilistic analysis.
The proposed model-based Bayesian leak detection algorithm overcomes the inability of recent transient leak detection methods to determine the number and locations of leaks at the same time. Its localization performance is independent of the leak locations, which addresses a limitation of the multiple signal classification (MUSIC)-like method. The localization error remains within an acceptable range at low signal-to-noise ratio (SNR), and the algorithm is also suitable for more complex leakage cases.
The remainder of this paper is organized as follows. First, the pipe system and the frequency-domain transient wave model are described. Then, the Bayesian framework is introduced: the basis of Bayesian parameter estimation and model selection, followed by nested sampling. Numerical simulations are then presented for cases of two leaks and three leaks, and the performance of the proposed method is evaluated. Finally, conclusions are drawn.

Linearized Model of Wave Propagation in Pipeline
In this section, the model of transient wave propagation in a pipeline is introduced. The configuration of the considered pipe system is shown in Fig. 1. A single, horizontal pipeline is bounded by an upstream node at $x = x_U = 0$ and a downstream node at $x = x_D = l$. A valve is positioned at $x = x_D$ to generate transient waves. A pressure measurement station is assumed to be placed near the downstream node, at $x = x_M$. The locations $(x_{L1}, \dots, x_{LN})$ of the $N$ leaks are the parameters to be estimated. $Q_{L0}$ and $H_{L0}$ denote the steady-state discharge and head at the leak section, respectively. The lumped leak parameter $s_L = C_d A_L$ stands for the leak size, in which $C_d$ is the discharge coefficient of the leak and $A_L$ denotes the flow area of the leak opening (orifice). The steady-state discharge of the leak is related to the lumped leak parameter by the orifice equation $Q_{L0} = s_L\sqrt{2g(H_{L0} - z_L)}$, where $z_L$ denotes the elevation of the pipe at the leak and $g$ is the gravitational acceleration.
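As a quick numerical illustration of how the lumped leak parameter translates into a leak discharge, the following Python sketch evaluates the standard orifice relation $Q_{L0} = s_L\sqrt{2g(H_{L0} - z_L)}$. The function name and the head value of 22 m are illustrative assumptions; only the leak size $1.0 \times 10^{-4}$ m$^2$ is taken from the examples in this paper.

```python
import math

def leak_discharge(s_L, H_L0, z_L=0.0, g=9.81):
    """Steady-state leak discharge from the orifice relation.

    s_L  : lumped leak parameter C_d * A_L (m^2)
    H_L0 : steady-state head at the leak (m)
    z_L  : pipe elevation at the leak (m); 0 for a horizontal pipe at datum
    """
    head = H_L0 - z_L
    if head < 0.0:
        raise ValueError("head above the leak elevation must be non-negative")
    return s_L * math.sqrt(2.0 * g * head)

# Hypothetical head of 22 m at the leak (assumed value, not from the paper)
Q_L0 = leak_discharge(1.0e-4, 22.0)
print(f"Leak discharge: {Q_L0:.3e} m^3/s")
```

The discharge scales linearly with the leak size and with the square root of the head, which is why the leak size can enter the later head-difference model as a linear coefficient once the steady-state head is fixed.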
Assuming that the pipe has a uniform cross-section, wall thickness and material, its field matrix, Eq. (1), can be expressed (Chaudhry 2014) in terms of the propagation function $\mu = a^{-1}\sqrt{-\omega^2 + i g A \omega R}$ and the characteristic impedance $Z = \mu a^2/(i \omega g A)$, where $a$ is the wave speed, $\omega$ is the angular frequency, $A$ is the cross-sectional area of the pipeline, and $R$ is the linearized steady-state friction resistance term, equal to $f Q_0/(g d A^2)$ for turbulent flow, in which $f$ is the Darcy-Weisbach friction factor, $Q_0$ is the steady-state discharge and $d$ is the pipe diameter.
Fig. 1 Setup of the considered pipeline system
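The propagation function and characteristic impedance can be evaluated directly. The sketch below does so in Python and builds a field matrix from them. The matrix layout, with state vector $(q, h)$ and entries $\cosh(\mu x)$, $-\sinh(\mu x)/Z$ and $-Z\sinh(\mu x)$, is an assumption based on the common transfer-matrix convention and may differ in sign from the form in Chaudhry (2014). The value of $Q_0$ is a placeholder, and all function names are illustrative.

```python
import numpy as np

G_ACC = 9.81  # gravitational acceleration (m/s^2)

def propagation_and_impedance(omega, a, A, R):
    """Return (mu, Z) for angular frequency omega."""
    mu = np.sqrt(-omega**2 + 1j * G_ACC * A * omega * R) / a
    Z = mu * a**2 / (1j * omega * G_ACC * A)
    return mu, Z

def field_matrix(x, omega, a, A, R):
    """Assumed 2x2 field matrix of an intact pipe reach of length x.

    It maps the state (q, h) at the upstream end of the reach to the
    state at its downstream end.
    """
    mu, Z = propagation_and_impedance(omega, a, A, R)
    c, s = np.cosh(mu * x), np.sinh(mu * x)
    return np.array([[c, -s / Z], [-Z * s, c]])

# Pipe data as in the numerical setup; Q0 is an assumed placeholder value
l, a, d, f = 2000.0, 1000.0, 0.5, 0.02
A = np.pi * d**2 / 4.0
Q0 = 0.2                                  # m^3/s, hypothetical
R = f * Q0 / (G_ACC * d * A**2)
omega = a * np.pi / (2.0 * l)             # fundamental frequency omega_th

M_full = field_matrix(l, omega, a, A, R)
M_split = field_matrix(800.0, omega, a, A, R) @ field_matrix(1200.0, omega, a, A, R)
print(np.allclose(M_full, M_split))
```

Two properties make a convenient sanity check for any implementation of this kind: the determinant of the matrix is 1, and the matrix of a full reach equals the product of the matrices of its sub-reaches.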
The flow discharge and head oscillations due to a rapid change in the flow setting are denoted by $q$ and $h$. Given the discharge $q(x_U)$ and head $h(x_U)$ at the upstream node $x_U$, the no-leak quantities $q^{NL}(x_M)$ and $h^{NL}(x_M)$ at $x_M$ are obtained via the linearized transfer matrix method, Eq. (2), where the superscript NL stands for no leak.
By rewriting Eq. (2), the head measurement $h_{jm}$ at $x = x_m$ ($m = 1, \dots, M$) near the downstream end, for a given angular frequency $\omega_j$ ($j = 1, 2, \dots, J$), is assumed to follow the theoretical expression from Eq. (2) plus a noise term (Wang et al. 2019), where the noise $n_{jm}$ is assumed to be additive, independent, zero-mean Gaussian with variance $\sigma^2$.
The corresponding head difference due to leaks can be approximated via a linearized model, Eq. (5), of the form $\Delta\mathbf{h} = \mathbf{G}(\mathbf{x}_L)\,\mathbf{s}_L + \mathbf{n}$, in which $\Delta h_{jm} = h(\omega_j, x_m) - h^{NL}(\omega_j, x_m)$ is the difference between the measured head in the presence of leaks and the theoretical leak-free head at the measurement station $x_m$ and frequency $\omega_j$. The data $\Delta\mathbf{h}$ are used in the following sections.
In Eq. (5), the $n$-th column of the matrix $\mathbf{G}(\mathbf{x}_L)$ is associated with the $n$-th leak location $x_{Ln}$. The upstream end is connected to a reservoir, so $h(x_U) = 0$ is reasonably assumed. The discharge $q(x_U)$ can be estimated by adding a pressure sensor at $x_{M0}$ near the upstream end. Assuming that there is no leak between $x_U$ and $x_{M0}$, and using the pressure head measurement $h(x_{M0})$ at $x_{M0}$ together with the boundary condition $h(x_U)$ at $x_U$, the discharge $q(x_U)$ can be solved from the no-leak transfer relation, which gives Eq. (10).
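The estimation of the upstream discharge from one extra pressure sensor can be sketched as follows. The sketch assumes the no-leak head relation $h(x) = -Z\sinh(\mu x)\,q(x_U) + \cosh(\mu x)\,h(x_U)$, which follows from the common transfer-matrix sign convention and is an assumption here. Solving it for $q(x_U)$ gives $q(x_U) = [\cosh(\mu x_{M0})\,h(x_U) - h(x_{M0})]/[Z\sinh(\mu x_{M0})]$. Function names and the value of $Q_0$ are illustrative.

```python
import numpy as np

G_ACC = 9.81  # gravitational acceleration (m/s^2)

def propagation_and_impedance(omega, a, A, R):
    mu = np.sqrt(-omega**2 + 1j * G_ACC * A * omega * R) / a
    Z = mu * a**2 / (1j * omega * G_ACC * A)
    return mu, Z

def head_no_leak(x, q_U, h_U, mu, Z):
    """Head oscillation at x for an intact pipe (assumed sign convention)."""
    return -Z * np.sinh(mu * x) * q_U + np.cosh(mu * x) * h_U

def estimate_q_upstream(h_M0, x_M0, mu, Z, h_U=0.0):
    """Invert the no-leak relation for q(x_U), given the head at x_M0."""
    return (np.cosh(mu * x_M0) * h_U - h_M0) / (Z * np.sinh(mu * x_M0))

# Round trip with the pipe data of the numerical setup (Q0 is hypothetical)
l, a, d, f, Q0 = 2000.0, 1000.0, 0.5, 0.02, 0.2
A = np.pi * d**2 / 4.0
R = f * Q0 / (G_ACC * d * A**2)
mu, Z = propagation_and_impedance(a * np.pi / (2.0 * l), a, A, R)

q_true = 0.3 + 0.1j                              # arbitrary test value
h_sensor = head_no_leak(50.0, q_true, 0.0, mu, Z)  # sensor at x_M0 = 50 m
q_est = estimate_q_upstream(h_sensor, 50.0, mu, Z)
print(np.isclose(q_est, q_true))
```

The inversion becomes ill-conditioned when $\sinh(\mu x_{M0})$ is close to zero, so in practice the sensor position and the frequencies used would need to avoid that situation.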

Numerical Setup
In this section, the numerical setup of the pipeline system shown in Fig. 1 is introduced. The pipe is connected to two reservoirs; the heads of the upstream and downstream reservoirs are $H_1 = 25$ m and $H_2 = 20$ m, respectively. The pipe length is $l = 2000$ m and the wave speed is $a = 1000$ m/s. The Darcy-Weisbach friction factor is $f = 0.02$ and the pipe diameter is $d = 0.5$ m. The steady-state discharge is $Q_0$. A valve at the downstream end of the pipeline generates the desired transient wave. Assuming that an impulse wave is generated by rapidly closing and opening the valve, the boundary conditions $h(x_U) = 0$ and $q(x_D) = 1$ are given. The dataset in the frequency domain is generated with the transfer matrix method in Eq. (1). There are three pressure measurement stations, located at $x_{M1} = 2000$ m, $x_{M2} = 1800$ m and $x_{M3} = 1600$ m. Another pressure sensor at $x = x_{M0} = 50$ m is used to estimate $q(x_U)$ according to Eq. (10). The resonant and anti-resonant frequencies $\omega = \{(1 + \alpha)\omega_{th},\ \alpha = 0, 0.02, 0.04, \dots, 25\}$ are used, where $\omega_{th} = a\pi/(2l)$. The dimension of the dataset is $JM = 3603$. Different noise levels are considered to study the performance of the proposed method. Zero-mean, independent and identically distributed Gaussian white noise is added to all the pressure sensors. The noise level is quantified by the signal-to-noise ratio (SNR) in decibels, defined as $\mathrm{SNR} = 20\log_{10}\left(|E(\Delta h)|/\sigma\right)$ (12), where $E(\Delta h)$ denotes the average head difference and $\sigma$ represents the standard deviation of the Gaussian white noise.
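The SNR definition $\mathrm{SNR} = 20\log_{10}(|E(\Delta h)|/\sigma)$ can be inverted to obtain the noise standard deviation for a target SNR, $\sigma = |E(\Delta h)|/10^{\mathrm{SNR}/20}$. The Python sketch below does this and adds complex Gaussian noise to a synthetic head-difference vector. Splitting the variance equally between real and imaginary parts (a circular complex Gaussian) is an assumption, as are the function names and the constant test signal.

```python
import numpy as np

def noise_std_from_snr(delta_h, snr_db):
    """Noise standard deviation sigma that yields the requested SNR in dB."""
    return np.abs(np.mean(delta_h)) / 10.0 ** (snr_db / 20.0)

def add_noise(delta_h, snr_db, rng):
    """Add zero-mean complex Gaussian noise with total variance sigma^2."""
    sigma = noise_std_from_snr(delta_h, snr_db)
    shape = delta_h.shape
    # Assumption: variance sigma^2 / 2 in each of the real and imaginary parts
    noise = sigma / np.sqrt(2.0) * (
        rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    )
    return delta_h + noise, sigma

rng = np.random.default_rng(0)
clean = np.full(200_000, 2.0 + 0.0j)      # synthetic, constant head difference
noisy, sigma = add_noise(clean, 20.0, rng)

empirical = np.sqrt(np.mean(np.abs(noisy - clean) ** 2))
print(f"target sigma = {sigma:.4f}, empirical sigma = {empirical:.4f}")
```

Because the definition uses the magnitude of the mean head difference, the resulting noise level depends on how strongly the complex data average out across frequencies and sensors.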

Bayesian Inference
Bayesian inference is based on Bayes' theorem. This section introduces the Bayesian analysis for the pipeline leak localization problem. Let $\Theta = \{x_{L1}, x_{L2}, \dots\}$ represent the vector of model parameters (leak locations). For a given dataset $D$ and a given model $M$, the Bayesian inference formulated for this problem begins with Bayes' theorem (Escolano et al. 2014; Landschoot and Xiang 2019): $P(\Theta|D, M) = P(D|\Theta, M)\,P(\Theta|M)/P(D|M)$. The term $P(\Theta|D, M)$ represents the posterior probability of the model parameters $\Theta$, which quantifies the state of information about the parameters. $P(D|\Theta, M)$ is referred to as the likelihood function, which indicates how well the model $M$ with parameters $\Theta$ resembles the data $D$. In this paper, the dataset $D$ is the head difference $\Delta\mathbf{h}$. Since the vector of head differences $\Delta\mathbf{h}$ follows a $JM$-dimensional complex-valued Gaussian distribution, its probability density function serves as the likelihood. The term $P(\Theta|M)$ represents the prior distribution of the parameters given the model $M$. It represents any prior knowledge about the likely values of the parameters. According to the principle of maximum entropy, it is taken to be uniform to avoid any preference. Finally, the conditional probability $P(D|M)$ of the observed data for a particular model can be considered the likelihood of the model given the data, and is referred to as the marginal likelihood, the Bayesian evidence, or simply the evidence for $M$.
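For a $JM$-dimensional complex Gaussian with independent components of variance $\sigma^2$, the log-likelihood takes the standard form $\ln P(D|\Theta, M) = -JM\ln(\pi\sigma^2) - \|\Delta\mathbf{h} - \Delta\mathbf{h}_{\mathrm{model}}(\Theta)\|^2/\sigma^2$. The Python sketch below evaluates this expression. It is a generic illustration rather than the exact likelihood of this paper: the model prediction is passed in as an array, and the function name is hypothetical.

```python
import numpy as np

def log_likelihood(delta_h, predicted, sigma):
    """Log-likelihood of complex data under i.i.d. complex Gaussian noise.

    delta_h   : measured head differences (complex array, JM entries)
    predicted : model prediction for a candidate set of leak parameters
    sigma     : noise standard deviation (total variance sigma^2 per entry)
    """
    delta_h = np.asarray(delta_h, dtype=complex).ravel()
    predicted = np.asarray(predicted, dtype=complex).ravel()
    n = delta_h.size
    misfit = np.sum(np.abs(delta_h - predicted) ** 2)
    return -n * np.log(np.pi * sigma**2) - misfit / sigma**2

# Synthetic check: a prediction closer to the data scores higher
data = np.array([1.0 + 1.0j, 2.0 - 0.5j, -0.3 + 0.2j])
good = data + 0.01
poor = data + 0.5
print(log_likelihood(data, good, 0.1) > log_likelihood(data, poor, 0.1))
```

Working with the logarithm avoids numerical underflow, which matters here because the dataset contains several thousand complex entries.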
The posterior probability of the parameters is normalized over the parameter space, $\int P(\Theta|D, M)\,d\Theta = 1$.
Combined with Bayes' theorem, this condition can be rearranged to $P(D|M) = \int P(D|\Theta, M)\,P(\Theta|M)\,d\Theta$.
The marginal likelihood $P(D|M)$ is thus evaluated over the whole parameter space by integrating the product of the likelihood and the prior distribution.
In a Bayesian formulation, estimating the number of leaks is an application of model selection. Bayesian model selection is a probabilistic method of evaluating a finite set of models, given the observed data, and then seeking the model that best describes the data. The idea behind model selection is to compare the posterior probabilities of a set of competing models (Escolano et al. 2014; Landschoot and Xiang 2019; Sivia and Skilling 2006). This is determined by the probability of the model $M_i$ given the data $D$, represented as $P(M_i|D)$. Bayes' theorem states that $P(M_i|D) = P(D|M_i)\,P(M_i)/P(D)$, where $P(M_i|D)$ is the posterior probability of the model and $P(D)$ is the probability of the measured data, which is independent of $M_i$ and is treated here as a constant of no further interest. $P(M_i)$ is the prior probability of the model and should be assigned based on any previous knowledge about the plausibility of the models. In this research, each model is given equal prior probability in order to avoid preference for any model.
For convenience, the posterior ratio of two models $M_i$ and $M_j$ is defined as $P(M_i|D)/P(M_j|D) = [P(D|M_i)\,P(M_i)]/[P(D|M_j)\,P(M_j)]$. When the competing models are assigned equal prior probability, i.e., $P(M_i) = P(M_j)$, model selection is determined by the likelihood $P(D|M_i)$ alone. If the numerator is greater than the denominator, the data prefer model $M_i$ over $M_j$. The likelihood function in model selection is exactly the Bayesian evidence term in the parameter estimation task. Therefore, model selection can be carried out simply by comparing the evidences obtained in the process of parameter estimation.
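With equal model priors, comparing models reduces to comparing evidences, and posterior model probabilities follow by normalizing them. The Python sketch below does this from log-evidences, subtracting the maximum before exponentiating to avoid underflow. The log-evidence values are made-up numbers for illustration only and are not results from this paper.

```python
import numpy as np

def model_posteriors(log_evidences):
    """Posterior model probabilities for equal prior model probabilities."""
    log_z = np.asarray(log_evidences, dtype=float)
    weights = np.exp(log_z - np.max(log_z))   # stable: largest weight is 1
    return weights / np.sum(weights)

# Hypothetical log-evidences for models with 1, 2, 3 and 4 leaks
log_z = {1: -5200.0, 2: -5100.0, 3: -5104.0, 4: -5109.0}
probs = model_posteriors(list(log_z.values()))
best_model = list(log_z)[int(np.argmax(probs))]

# Log of the posterior ratio between the two-leak and three-leak models
log_ratio_2_vs_3 = log_z[2] - log_z[3]
print(best_model, probs.round(4), log_ratio_2_vs_3)
```

A difference in log-evidence of a few units already corresponds to a strong preference, since the posterior ratio is the exponential of that difference.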

Nested Sampling
At the heart of Bayesian inference is the calculation of the evidence of each model for comparison, which in practice requires a numerical sampling method. Here, the Bayesian evidence is calculated with a sampling algorithm termed nested sampling. The approach takes the prior and likelihood as inputs and generates samples from the posterior as a secondary output. Nested sampling exploits the close relationship between the likelihood function $L(\Theta)$ and the prior mass $\xi(\lambda)$. In the terminology of the subject, mass denotes an accumulated amount of probability, and the prior mass can be accumulated from its elements $d\xi = P(\Theta|M)\,d\Theta$ in any order (Skilling 2004). Define $\xi(\lambda) = \int_{L(\Theta) > \lambda} P(\Theta|M)\,d\Theta$ as the cumulative prior mass covering all likelihood values greater than $\lambda$. As $\lambda$ increases, the enclosed mass $\xi$ decreases from 1 to 0. The evidence thus becomes the one-dimensional integral $P(D|M) = \int_0^1 L(\xi)\,d\xi$, and the cumulative form of the numerical integration in Eq. (15) can be expressed as $P(D|M) \approx \sum_{k=1}^{K} L_k\,(\xi_{k-1} - \xi_k)$, where $0 < \xi_K < \cdots < \xi_2 < \xi_1 < 1$, $\xi_0 = 1$ and $K$ is the total number of samples taken. In practical implementations with a population of $Q$ samples, the prior mass tends to shrink exponentially at each iteration, leading to $\xi_k \approx \exp(-k/Q)$ (Skilling 2006).
Figure 2 displays one sequence sampled from the experimental data. The likelihood value of each sample is recorded as the parameters change, under the constraint that each accepted parameter set increases the likelihood value of the sample. The curve grows monotonically, and the run may be considered complete when further increases become negligible as the sample population converges on the most likely parameter values. The number of iterations, $K$, is chosen accordingly.
The main steps in the implementation of nested sampling are summarized in Fig. 3.
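A minimal sketch of a nested sampling loop is given below for a one-dimensional toy problem, to make the evidence accumulation $\sum_k L_k(\xi_{k-1} - \xi_k)$ with $\xi_k = \exp(-k/Q)$ concrete. It is not the implementation used in this paper: the Gaussian toy likelihood, the uniform prior on $[0, 2000]$ m, the replacement of the worst sample by rejection sampling from the prior, and all names are assumptions chosen for brevity.

```python
import numpy as np

def nested_sampling(log_like, prior_draw, n_live=200, n_iter=1600,
                    seed=0, batch=2048):
    """Return (log-evidence, best sample) for a vectorized log-likelihood."""
    rng = np.random.default_rng(seed)
    live = prior_draw(rng, n_live)            # population of Q = n_live samples
    live_logl = log_like(live)
    log_z = -np.inf
    xi_prev = 1.0
    for k in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))     # lowest-likelihood live sample
        threshold = live_logl[worst]
        xi = np.exp(-k / n_live)              # expected remaining prior mass
        log_z = np.logaddexp(log_z, threshold + np.log(xi_prev - xi))
        xi_prev = xi
        # Replace the worst sample by a prior draw with higher likelihood
        while True:
            cand = prior_draw(rng, batch)
            cand_logl = log_like(cand)
            ok = np.flatnonzero(cand_logl > threshold)
            if ok.size:
                live[worst] = cand[ok[0]]
                live_logl[worst] = cand_logl[ok[0]]
                break
    # Contribution of the remaining live samples
    log_z = np.logaddexp(log_z, np.log(xi_prev) + np.log(np.mean(np.exp(live_logl))))
    return log_z, live[int(np.argmax(live_logl))]

# Toy problem: one "leak" at 1200 m, Gaussian likelihood of width 50 m
def toy_log_like(x):
    return -0.5 * ((x - 1200.0) / 50.0) ** 2

def uniform_prior(rng, n):
    return rng.uniform(0.0, 2000.0, n)

log_z, x_best = nested_sampling(toy_log_like, uniform_prior)
log_z_exact = np.log(np.sqrt(2.0 * np.pi) * 50.0 / 2000.0)
print(f"log Z = {log_z:.3f} (analytic {log_z_exact:.3f}), best x = {x_best:.1f} m")
```

Rejection sampling from the prior is exact but becomes slow as the likelihood constraint tightens; for multi-leak models, a constrained random walk started from an existing live sample is a common alternative.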

Two-Leak Example
Here, a numerical example from Li et al. (2020) is revisited. A single pipe connecting two reservoirs in a horizontal plane is considered. The configuration is described in the section above. Figure 4 shows the output function of the multiple signal classification (MUSIC)-like algorithm under the two-leak assumption, i.e., two leaks at $x_{L1} = 300$ m and $x_{L2} = 1200$ m with actual sizes $s_{L1} = 1.0 \times 10^{-4}$ m$^2$ and $s_{L2} = 1.2 \times 10^{-4}$ m$^2$. The SNR is set to 10 dB. Ideally, the output function reaches its maxima at the actual leak positions. There is indeed a local maximum near each actual leak, but there is a higher lobe at around 1500 m, which prevents correct estimates of the leak locations. However, the method can locate two leaks at other locations (e.g., two leaks at $x_{L1} = 600$ m and $x_{L2} = 1200$ m). These results show that the localization accuracy in the two-leak case depends on the leak locations.
In the following, simulation results are presented to demonstrate the outcomes of the Bayesian analysis method. The conditions are the same as in the previous case. The evidence results are shown as a line chart in Fig. 5 in order to compare models with different numbers of leaks. The evidence for each model is evaluated over 10 individual runs using nested sampling. Figure 5 illustrates the log-evidence estimates for models in which the number of leaks is 1, 2, 3 and 4. The position of the maximum indicates that there is most evidence for two leaks. Note that there is a fall-off from the maximum on the left side, because there is not enough structure in the model to adequately describe the data; there is a decrease on the right side, as the models become increasingly, and unnecessarily, complicated. According to the results presented in Fig. 5, one comes to the conclusion that the experimental data prefer the model $M_2$, indicating the presence of two leaks.
After model selection, the parameters of the sample with maximum likelihood in the final population of the two-leak model are taken as the estimated leak locations. The leak locations are displayed in Fig. 6, wherein the solid lines and circles represent the actual and estimated leak locations, respectively. The parameter estimation process localizes the leaks accurately: the two estimates, $\hat{x}_{L1} = 300.16$ m and $\hat{x}_{L2} = 1199.75$ m, are close to the actual leaks. Since the results are affected by random error, the statistical properties of the leak localization results are examined through the root-mean-square error (RMSE) of each leak location estimate, plotted against SNR in Fig. 7. Here, the RMSE is defined as $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{x}_L^{i} - x_L)^2}$, in which $x_L$ is the actual leak location and $\hat{x}_L^{i}$ represents the estimated value in the $i$th trial. Once the number of leaks is determined, each simulation is repeated 10 times, i.e., $N = 10$. The SNR is varied from -15 dB to 25 dB. The results show that as the SNR increases, the localization error (RMSE from 10 simulations) of each leak location decreases and the error is within the acceptable range (Fig. 7).
[Figure axes: Number of Leaks, Log Evidence]
Next, two further sets of results are considered to demonstrate the performance of the proposed method. The experimental data were obtained for two cases: one with leaks at $x_{L1} = 600$ m and $x_{L2} = 1200$ m with actual sizes $s_{L1} = s_{L2} = 1.2 \times 10^{-4}$ m$^2$, and one with two close leaks at $x_{L1} = 1000$ m and $x_{L2} = 1040$ m with actual sizes $s_{L1} = 1.0 \times 10^{-4}$ m$^2$ and $s_{L2} = 1.2 \times 10^{-4}$ m$^2$. According to the model selection results in Fig. 8a, b, the experimental data prefer the model $M_2$ in both cases, indicating that there are two leaks. Figure 8c, d plot the localization results of the two cases, respectively. The proposed method accurately localizes the two leaks in both cases, which suggests that it can locate two leaks regardless of their positions. Figure 8e displays the RMSE of leak localization for the former case, and Fig. 8f shows the leak localization results with respect to various SNR for the latter case. As indicated in these two figures, the estimation error of each leak location increases as the SNR decreases. A comparison of Fig. 8e and f shows that, in the low-SNR environment, the estimation error is smaller for the former case than for the latter. At a low SNR of -15 dB, reducing the distance between the two leaks increases the localization error.
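The RMSE over repeated trials, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{x}_L^{i} - x_L)^2}$, can be computed as in the short Python sketch below. The function name and the list of trial estimates are hypothetical and serve only to illustrate the calculation; they are not results from this paper.

```python
import numpy as np

def rmse(estimates, true_location):
    """Root-mean-square error of repeated location estimates (same units)."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_location) ** 2)))

# Hypothetical estimates from N = 10 trials for a leak at 300 m
trials = [300.2, 299.7, 300.5, 299.9, 300.1, 299.4, 300.8, 300.0, 299.6, 300.3]
print(f"RMSE = {rmse(trials, 300.0):.3f} m")
```

Because the error is measured against the actual location rather than the sample mean, the RMSE captures both the scatter and any systematic bias of the estimates.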

Three-Leak Example
In this section, a more complicated case with three leaks is considered. The leak locations are $x_{L1} = 400$ m, $x_{L2} = 1000$ m and $x_{L3} = 1400$ m; the leak sizes are $s_{L1} = 1.4 \times 10^{-4}$ m$^2$, $s_{L2} = 1.4 \times 10^{-4}$ m$^2$ and $s_{L3} = 1.2 \times 10^{-4}$ m$^2$. Other configuration parameters are the same as in the previous section. In the following, Bayesian inference is shown to be able to select the model that most appropriately represents the experimental data and to identify the leak locations. Figure 9 shows the localization results using the MUSIC-like method. The SNR is set to -10 dB. There are higher side lobes at around 200 m and 600 m, which may disturb the determination of the leak locations. Other smaller lobes may be mistakenly identified as leaks, especially when the number of leaks is unknown. Bayesian inference is then used to detect the leaks. First, the evidence obtained by nested sampling is used for model selection in order to determine the number of leaks. Figure 10 illustrates the log-evidence for models in which the number of leaks is 1, 2, 3 and 4. The highest evidence occurs for three leaks, so the experimental data prefer the model $M_3$, indicating the presence of three leaks. After the three-leak model is selected, the parameters of the sample with maximum likelihood in the final population of the selected model are taken as the estimated leak locations. Each estimated leak location is close to its actual value, with $\hat{x}_{L1} = 416.1376$ m, $\hat{x}_{L2} = 1002.5304$ m and $\hat{x}_{L3} = 1401.9674$ m. The left half of Table 1 lists the RMSE of the leak locations at various noise levels. As the SNR decreases, the RMSE (obtained from 10 simulations) of each leak increases. However, the average localization error of each leak exceeds 200 m at a low SNR of -15 dB.
In the following, a three-leak case that includes close leaks is also considered. The leak locations are $x_{L1} = 1000$ m, $x_{L2} = 1040$ m and $x_{L3} = 1600$ m; the leak sizes are $s_{L1} = 1.4 \times 10^{-4}$ m$^2$, $s_{L2} = 1.4 \times 10^{-4}$ m$^2$ and $s_{L3} = 1.2 \times 10^{-4}$ m$^2$. Figure 11 displays the log-evidence estimates over the different models. The results clearly imply that the model $M_3$ most appropriately represents the experimental data, indicating the presence of three leaks. As in the previous case, each leak location estimate is close to its actual value; the estimated values are $\hat{x}_{L1} = 999.7778$ m, $\hat{x}_{L2} = 1040.2151$ m and $\hat{x}_{L3} = 1606.5884$ m, respectively. In addition, the right half of Table 1 lists the localization error at various SNR. The localization performance improves in high-SNR environments.

Conclusions
In this paper, a model-based Bayesian leak detection algorithm is proposed to solve the multi-leak detection problem in a reservoir-pipeline-valve system. The method determines the number and locations of leaks simultaneously, and its localization performance is independent of the leak locations. The technique has been validated by two numerical studies, with two leaks and three leaks, respectively. In both cases, the information about the pipeline leaks can be identified from the results.
The results for the two-leak case illustrate that the localization performance of Bayesian inference is independent of the leak locations and that the localization error is acceptable even in a noisy environment, which overcomes the limitation that the localization ability of the MUSIC-like method depends on the leak locations. The proposed method is also investigated for the three-leak case, where it determines the number and locations of leaks even when close leaks are included. To better apply transient-based leak detection to practical applications, future work should study the various uncertainties that could affect leak enumeration and localization, such as imprecise measurement of the friction factor, wave speed and steady-state discharge.