An Accurate Leakage Localization Method for Water Supply Network Based on Deep Learning Network

In the water supply network, leakage of pipes will cause water loss and increase the risk of environmental pollution. For water supply systems, identifying the leak point can improve the efficiency of pipeline leak repair. Most existing leak location methods can only locate the leak point approximately at the node or pipe section of the pipe network but cannot locate the specific location of the pipe section. This paper presents a framework for accurate water supply network leakage location based on Residual Network (ResNet). This study proposes a leak localization idea with a parallel classification and regression process that enables the framework to pinpoint the exact position of leak points in the pipeline. Furthermore, a multi-supervision mechanism is designed in the regression process to speed up the model’s convergence. For a pipe network containing 40 pipes, the positioning accuracy of the pipe section is 0.94, and the MSE of the specific location of the leakage point is 0.000435. For the pipe network containing 117 pipes, the positioning accuracy of the pipe section is 0.91. The MSE of the specific location of the leakage point is 0.0009177. Experiments confirm the robustness and applicability of the framework.


Introduction
Water loss is a key issue in the management of water distribution systems because, in addition to the water that is lost, it leads to the use of additional energy and chemicals for water treatment and supply, and carries the risk of bacterial and pollutant contamination (Fontanazza et al. 2015). In addition to these hydraulic and water quality impacts, pipeline rupture can also damage surrounding infrastructure (for example, through ground collapse), posing a serious threat to public safety (Guo et al. 2013). Nowadays, many parts of the world suffer from rapid and uncoordinated growth in water demand, water depletion, and weak management (Ghandehari et al. 2020). The impact of water loss in urban water supply systems on water and energy resources and on the quality of public services has become a continuing global challenge (Duan 2018; Del Teso et al. 2019; Bajany et al. 2021). For example, in 2016, more than 100 pipe breaks were recorded in Guangzhou, China, which significantly affected the service quality of urban water distribution systems (WDS) (Zhang et al. 2016). In addition, pipe breaks often lead to social impacts such as water supply disruptions and traffic delays (Berardi et al. 2008). Therefore, locating the leak precisely is critical to restoring the water supply effectively.
Over the past century, fluid transportation through pipelines and pipeline networks has made great progress, and leak detection technology for water supply networks has been under development for more than twenty years. A variety of hardware-based leak detection equipment is now commercially available (Aghda et al. 2018). Similarly, software-based leak detection algorithms, covering both steady-state and transient-state conditions, have been proposed in recent studies (Zhou et al. 2019; Xing and Sela 2019). Hardware-based leak detection equipment can be roughly divided into "out of the tube" (external) equipment and "in tube" (robotic) equipment. Most software-based leak detection methods run under steady-state conditions (Perez et al. 2014; Zaman et al. 2020). These technologies analyze flow rate, pressure, consumer demand, or acoustic data collected from a large number of sensors to gather enough information about the pipeline system. In recent years, machine learning algorithms and deep learning techniques have been used to detect or locate leaks. Two machine learning classifiers, based on linear discriminant analysis (LDA) and a neural network (NNET), were developed to determine the leakage probability of each node in water distribution networks (WDN) (Irofti and Stoican 2020). Xu et al. (2020) proposed and verified a data-driven real-time leak detection method for water distribution networks, in which interference extraction is integrated with isolation forest technology to detect subtle burst signals in pressure data. Xie et al. (2019) offered a new near-real-time method for hydraulic monitoring and regional leak detection, in which sparse coding is used and a linear classifier is trained to identify the most likely leak region. Quinones-Grueiro et al. (2021) presented a leak detection, estimation, and location method that combines data-driven and model-based methods: deep neural networks are used for the leak detection task, and Gaussian process regression is then used to estimate the leakage size range. Soldevila et al. (2022) combined expert knowledge and data-driven models for leak detection and location in water distribution networks, solving leak location as a classification problem and simplifying this problem with a customized clustering scheme.
However, the methods above place the leak points at the nodes of the pipe network. Recent research has also made some attempts at the case where the leak point is located on a pipe section. For example, a leakage detection model (DBSCAN-MFCN) based on Density-Based Spatial Clustering Application with Noise (DBSCAN) and a Multi-Scale Fully Convolutional Network (MFCN) has been proposed to manage water loss. To reduce the number of categories, DBSCAN divides a large water network into several partitions to detect leaking areas (Hu et al. 2021). Zhou et al. (2019) offered a burst position recognition framework based on a fully linear dense network (BLIFF). The framework can effectively narrow down the potential burst area to one or more pipelines. Although this method can directly locate the leaking pipeline, research on the specific leak location along the pipeline is still lacking. Therefore, this paper proposes a framework that identifies the precise location of the leakage.
Machine learning algorithms, such as traditional support vector machines, artificial neural networks (ANN), and clustering, have been used to detect and/or locate leaks (Romano et al. 2014; Wu et al. 2016). However, the feature extractors of these methods must be set manually, and complex features are difficult to learn. In recent years, deep learning (Lecun et al. 2015), a newer branch of artificial neural networks, has become a tool for pattern recognition and feature recognition. Compared with traditional machine learning algorithms, deep learning methods can learn relatively complex functions from data and extract features automatically. The convolutional neural network (CNN) is a type of deep learning model that is often used for feature extraction and classification (Geng et al. 2020). Since AlexNet, state-of-the-art CNN architectures have become deeper and deeper. AlexNet has only 5 convolutional layers, whereas the subsequent VGG network (Simonyan and Zisserman 2014) and GoogLeNet (Szegedy et al. 2015) have 19 and 22 layers, respectively. Because of the vanishing gradient problem, deep networks are difficult to train. Therefore, He et al. (2016a) proposed the residual network (ResNet) architecture, which was extended in He et al. (2016b). The residual network is easy to optimize and can gain accuracy from considerably increased depth. Its residual blocks use skip connections, which alleviate the vanishing gradient problem caused by increasing depth in deep neural networks and strengthen the propagation of features. Therefore, this paper proposes a leak detection and location model based on ResNet to improve the accuracy of detection.
To date, there has been little investigation into the case in which the leak point lies on a pipe section, or into the precise location of the leakage in this case. This study aims to build a framework that can effectively identify the precise location of the leakage (the exact position along the pipeline where the leak point is located). The framework is based on ResNet and uses the Water Network Tool for Resilience (WNTR) to run hydraulic models. The pressure difference datasets at the meter locations are input into the ResNet-based backbone network, and the classification and regression processes determine the leaking pipe segment and the specific location of the leak point, respectively.

Method
Based on the hydraulic model of the WDN, a ResNet-based framework for the precise localization of leakage in the water supply pipe network is proposed in this paper to extract the characteristics of the pressure pattern that arises when a burst occurs in each pipe. For any burst event, the pressure drop response caused by the burst outflow differs between nodes of the WDN and between pipelines. The leaking pipeline is identified through the classification process, and the leak is located through the regression process. Section 2.1 introduces the framework structure of precise leakage localization. Section 2.2 introduces the backbone network of the framework based on ResNet. Section 2.3 introduces the multi-supervision module structure. Section 2.4 introduces the preparation process of the dataset.

Water Supply Network Leakage Precise Location Identification Framework Based on ResNet
Figure 1 shows the flowchart of the proposed precise positioning framework. The framework can be roughly divided into four parts: dataset generation, ResNet training, the classification process, and the regression process. The first step of the framework is to set different leakage sizes and locations in the hydraulic model. Pressure data at each node of the pipe network are first obtained through simulation. Then, according to the nodes at which sensors are placed, the data at the corresponding nodes are selected and input into the ResNet network. The output of the network is passed to the classification and regression processes, respectively. The output of the classification is the leakage probability $P_n$ of each pipeline, and the pipeline ID with the maximum probability is selected as the classification result, that is, the judged leaking pipeline. In the regression process, a regression is performed for every pipeline to output its corresponding leakage position $L_{on}$. According to the classification result, the regression result of the corresponding pipeline is selected, which is the specific location of the leakage on the pipe section. The process is shown in Fig. 2. In Fig. 2, $n$ runs up to $N_{pipe}$, $P_n$ is the leakage probability of each pipeline, and $L_{on}$ is the output of the corresponding leakage position. The shape of the output prediction of the classification process is $B \times N_{pipe}$, in which $B$ is the batch size (the amount of input data per training step) and $N_{pipe}$ is the number of pipes in the water supply network.
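The selection step that joins the two branches can be illustrated with a short sketch. It assumes that both branches return arrays of shape B × N_pipe; the function name `select_leak` and the toy values are hypothetical and only serve the illustration.

```python
import numpy as np

def select_leak(prob, loc):
    """prob: (B, N_pipe) leak probabilities P_n; loc: (B, N_pipe) regressed positions L_on."""
    pipe_idx = np.argmax(prob, axis=1)                  # judged leaking pipe per sample
    position = loc[np.arange(prob.shape[0]), pipe_idx]  # regression value of that pipe only
    return pipe_idx, position

# Toy batch: B = 2 samples, N_pipe = 3 pipes (illustrative values only).
prob = np.array([[0.1, 0.7, 0.2],
                 [0.6, 0.3, 0.1]])
loc = np.array([[0.50, 0.25, 0.90],
                [0.10, 0.80, 0.40]])
pipe_idx, position = select_leak(prob, loc)
```

In this toy batch the first sample is assigned to the pipe with index 1 and the second to the pipe with index 0, and only the regression outputs of those two pipes are kept.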

The Backbone Network of the Framework Based on ResNet
This paper constructs a ResNet-based network architecture with a convolution layer to reduce the size required for storage. The residual block shown in Fig. 3 is used to extract information, and global average pooling is used for feature pooling. A fully connected layer is attached as the output of the network. Regression and classification tasks are then performed on the output of the network, respectively. ResNet was first proposed in He et al. (2016a) and extended in He et al. (2016b). It explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Experience shows that deeper neural networks can extract more complex information, and that better results can be obtained by making neural networks deeper. However, because of vanishing gradients, the accuracy of a network saturates or even degrades as the network depth increases. A residual learning framework is used to solve this problem. If the input is $x$, the feature that the architecture should learn is $H(x)$. In the residual learning framework, the network instead learns the residual $F(x) = H(x) - x$, and the feature originally sought is recovered as $F(x) + x$. When the residual is 0, the stacked layers reduce to an identity mapping, so the network performance at least does not decrease. In practice the residual is not 0, which enables the stacked layers to learn new features on top of the input features and thus perform better. The residual learning framework also reduces the learning difficulty, because the residual is usually small, which means that ResNet needs to learn less information than a traditional framework. Compared with most computer vision tasks, the input of the dataset in this paper is simpler, so ResNet18 is chosen as the backbone network.
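The identity-mapping argument can be checked with a minimal sketch. As a simplifying assumption, the two weight layers are written here as dense matrix products rather than the convolutions used in ResNet18, and the names `residual_block`, `w1`, and `w2` are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Two weight layers compute the residual F(x); the skip connection adds x back."""
    f = relu(x @ w1) @ w2   # residual F(x)
    return relu(f + x)      # block output relu(F(x) + x)

x = np.array([[1.0, 2.0, 3.0]])
zero = np.zeros((3, 3))
# With zero weights F(x) = 0, so the block passes the non-negative input through unchanged.
y = residual_block(x, zero, zero)
```

With all weights at zero the block already reproduces its input, which is the sense in which adding such blocks should not make the network worse.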
[Fig. 2: for each row of classification probabilities $[P_1, P_2, P_3, \ldots, P_{n-2}, P_{n-1}, P_n]$, the regression result corresponding to the maximum value of the row is taken.]

[Fig. 3: residual block with two weight layers.]

Multi-Supervision Module Structure
In the regression process, this paper proposes a multi-supervision module to speed up the regression convergence of the model, which helps a poorly fitting network to converge. The structure of the module is shown in Fig. 4. The output of ResNet flows through the head structure three times; the head consists of a fully connected layer, a normalization step, a Rectified Linear Unit (ReLU), and a second fully connected layer. Each output is activated by the sigmoid function. The three outputs are then ensembled and regressed together. The structure of the multi-supervision module is described as follows:
1. The output of ResNet passes through the head structure and the activation function to give the output $L_1$.
2. Step 1 is executed three times, giving three outputs $L_1$, $L_2$, $L_3$. At this point each output $L_n$ has the shape $B \times N_{pipe}$, in which $B$ is the batch size (the amount of input data per training step) and $N_{pipe}$ is the number of pipes.
3. The dimension of each $L_n$ is expanded so that its shape becomes $B \times N_{pipe} \times 1$. The three $L_n$ are then concatenated to form $L_0$, which has the shape $B \times N_{pipe} \times 3$.
4. $L_0$ passes through a $3 \times 1$ linear layer, which transforms its shape into $B \times N_{pipe} \times 1$. The last dimension is then removed to obtain an output of shape $B \times N_{pipe}$.
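The shape bookkeeping of the four steps can be followed in a small forward-pass sketch. It makes several assumptions that the text does not fix: the three passes use separately initialized head weights, the normalization is a simple batch-wise standardization, and the feature and hidden sizes (`F`, `H`) are arbitrary. Weights are random, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def head(feat, w1, w2):
    """FC -> normalization -> ReLU -> FC -> sigmoid; returns one output of shape (B, N_pipe)."""
    h = feat @ w1
    h = (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-5)  # assumed batch-wise normalization
    h = np.maximum(h, 0.0)
    return sigmoid(h @ w2)

B, F, H, N_pipe = 4, 16, 8, 40      # illustrative sizes; N_pipe = 40 as in the Anytown case
feat = rng.normal(size=(B, F))      # stand-in for the ResNet output

outs = []
for _ in range(3):                  # steps 1-2: three head passes
    w1 = rng.normal(size=(F, H))
    w2 = rng.normal(size=(H, N_pipe))
    outs.append(head(feat, w1, w2)[..., None])  # step 3: expand to (B, N_pipe, 1)

L0 = np.concatenate(outs, axis=-1)  # step 3: (B, N_pipe, 3)
w_fuse = rng.normal(size=(3, 1))    # step 4: the 3 x 1 linear layer
L_out = (L0 @ w_fuse).squeeze(-1)   # step 4: back to (B, N_pipe)
```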

Datasets Preparation
The framework proposed in this paper is data-driven, so the amount of information contained in the datasets has an important impact on the accuracy of the deep learning tasks. In this paper, the generated datasets consist of differential pressure data, which contain information on different leak locations and leak sizes. In the first step, the network without leakage is simulated to obtain the pressure data of each node under normal conditions. Then, a new node is defined and leakage is simulated by setting the demand and flow rate of this node. To capture the characteristics of the pressure fluctuation caused by the leakage, the pressure value at each point with leakage is subtracted from the pressure value without leakage, and the differential pressure is taken as the value of the datasets.
Fig. 4 Multi-supervision module structure

Datasets Generation
In this paper, the network model is constructed from the INP file of EPANET, and the hydraulic simulation is performed using Pressure Driven Analysis (PDA). In a water supply network the length of each pipe varies, so when determining the leak location on a pipe, the Water Network Tool for Resilience (WNTR) is used to get the start and end nodes of the pipe. The location of the leak point is specified by its relative position between the start node and the end node. For example, a leakage position of 0.5 indicates that the leak point is in the middle of the pipe. In an actual water supply network, leaks can occur at any point in the network. To simulate the real situation, uniform random numbers are used to define the leakage position. The random sample ranges from 0 to 1, excluding 1. For the leak size settings, leaks are modeled with a general form of the equation proposed by Crowl and Louvar (2019), in which the mass flow rate of fluid through the hole is expressed as:
$$Q_l = C_d A p^{\alpha} \sqrt{\frac{2}{\rho}} \tag{1}$$
where $Q_l$ is the leak demand, $C_d$ is the discharge coefficient, taken as $C_d = 0.75$ for turbulent flow, $A$ is the area of the hole, $\alpha$ is an exponent related to the characteristics of the leak, with $\alpha = 0.5$ assuming a steel pipe with a large hole, $p$ is the gauge pressure, and $\rho$ is the density of the fluid.
To obtain the leakage diameter information, the formula is expanded as:
$$Q_l = C_d \frac{\pi D_{leak}^2}{4} p^{\alpha} \sqrt{\frac{2}{\rho}} \tag{2}$$
where $D_{leak}$ is the leak diameter. In this paper, a leakage factor is introduced to adjust the leak diameter and thereby control the size of the leakage:
$$D_{leak} = D_{pipe} f_{leak} \tag{3}$$
where $D_{pipe}$ is the pipe diameter and $f_{leak}$ is the introduced leakage factor. In a pipe network, each pipe has a different diameter. Therefore, $f_{leak}$ is introduced to imitate the randomness of the leakage size under real-world pipe network operation, and $f_{leak}$ is sampled from a uniform distribution over 0.2-0.5.
Because leaks in the water supply network are random, every pipe has the potential to leak. Therefore, when the datasets are generated, each pipe is set as a potential leak pipe. In the simulation, each pipe is traversed. The leak location on each pipe and the size of the leakage are random. When building the datasets, each run simulates a leaking pipe, the leak location along the pipe, and the leak diameter. The model traverses every location and diameter combination to generate the full network state simulations. After each simulation, the differential pressure at all of the nodes is saved to the datasets. The flowchart of data generation is shown in Fig. 5.
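As a worked example of the leak model, the sketch below evaluates the leak demand for one pipe. It assumes a circular hole, so that A = π D_leak² / 4, SI units (pressure in Pa, lengths in m), and a water density of 1000 kg/m³; the function name `leak_flow` and the numerical inputs are illustrative and not taken from the case studies.

```python
import math

def leak_flow(d_pipe, f_leak, pressure_pa, c_d=0.75, alpha=0.5, rho=1000.0):
    """Leak demand Q_l in m^3/s for a circular hole of diameter D_leak = D_pipe * f_leak."""
    d_leak = d_pipe * f_leak                # leak diameter from the leakage factor
    area = math.pi * d_leak ** 2 / 4.0      # hole area A (circular hole assumed)
    return c_d * area * pressure_pa ** alpha * math.sqrt(2.0 / rho)

# A 300 mm pipe with f_leak = 0.2 at a gauge pressure of 300 kPa.
q = leak_flow(d_pipe=0.3, f_leak=0.2, pressure_pa=300_000.0)
```

For these inputs the hole diameter is 60 mm and the leak demand is on the order of 0.05 m³/s, which shows how strongly the leak size grows with the leakage factor, since the area scales with its square.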
The main steps in datasets generation are summarized as follows:
1. Run the pipe network without leakage and obtain the leak-free pressure data of each node.
2. Set the number of simulations N for each pipeline; the leakage size of each simulation is random. The size of the dataset is controlled by adjusting N. The number of generated samples is $N \times N_{pipe}$.
3. Set the location of the leak point. A random number sampled from a uniform distribution over 0-1 is used to define the leak location.
4. Set the leak size. The leakage factor $f_{leak}$ is introduced, and a random number sampled from a uniform distribution over 0.2-0.5 is used to define it. The leakage diameter $D_{leak}$ is the product of the leakage factor and the pipe diameter $D_{pipe}$. The generated leakage flow can be calculated by Eq. (2).
5. Set the labels of the datasets: leak pipe and leak point location.
6. Run the pipe network with leakage and obtain the pressure data of each node with leakage.
7. Subtract the data obtained in steps (1) and (6) to obtain the pressure difference data.
8. Repeat steps (2)-(7) until the set number of simulations for each pipe is reached and every pipe has been traversed.
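The steps above can be sketched as a simple sampling loop. The hydraulic solver is replaced here by a hypothetical stand-in, `toy_simulate`, which is not a hydraulic model and does not reproduce the WNTR interface; in the actual workflow it would be a pressure-driven simulation of the network with the leak inserted.

```python
import random

def generate_samples(pipe_diameters, n_per_pipe, simulate, seed=0):
    """pipe_diameters: {pipe_id: D_pipe}; simulate(pipe_id, position, d_leak) -> node pressures."""
    rng = random.Random(seed)
    baseline = simulate(None, None, 0.0)                  # step 1: leak-free pressures
    samples = []
    for pipe_id, d_pipe in pipe_diameters.items():        # step 8: traverse every pipe
        for _ in range(n_per_pipe):                       # step 2: N runs per pipe
            position = rng.random()                       # step 3: uniform in [0, 1)
            d_leak = d_pipe * rng.uniform(0.2, 0.5)       # step 4: D_leak = D_pipe * f_leak
            leaky = simulate(pipe_id, position, d_leak)   # step 6: pressures with leakage
            diff = [b - p for b, p in zip(baseline, leaky)]  # step 7: pressure difference
            samples.append((diff, pipe_id, position))     # step 5: labels stored with the data
    return samples

def toy_simulate(pipe_id, position, d_leak):
    """Hypothetical stand-in for the hydraulic solver; pressures simply drop with leak size."""
    base = [50.0, 48.0, 45.0]
    return [p - 10.0 * d_leak for p in base]

data = generate_samples({"P1": 0.2, "P2": 0.4}, n_per_pipe=3, simulate=toy_simulate)
```

With two pipes and three runs per pipe, the loop yields N × N_pipe = 6 labeled samples.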

Data Preprocessing
Since the pressure range of different instruments may vary depending on their position and elevation, it is necessary to standardize the pressure values of the instruments to make them more comparable. In addition, standardization helps to improve the accuracy and efficiency of network training. Before training the algorithm, the data must be standardized to a uniform scale. After normalization, the mean and standard deviation of each feature in the data set are 0 and 1, respectively. The data set is composed of pressure difference data for each node. Therefore, the normalization formula is as follows:
$$x' = \frac{x - \mu_{data}}{\sigma_{data}} \tag{4}$$
where $x$ and $x'$ represent the original data and the normalized data, respectively, and $\mu_{data}$ and $\sigma_{data}$ represent the mean and standard deviation of the data, respectively.
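A minimal sketch of this standardization is given below, assuming that each column of the array holds the pressure differences at one sensor and that the statistics are computed per column; the function name `standardize` and the sample values are illustrative.

```python
import numpy as np

def standardize(data):
    """Column-wise z-score: subtract the mean and divide by the standard deviation."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)
    return (data - mu) / sigma, mu, sigma

# Three samples from two sensors with very different pressure-difference scales.
diff = np.array([[0.2, 1.0],
                 [0.4, 3.0],
                 [0.6, 5.0]])
z, mu, sigma = standardize(diff)
```

After the transformation both columns have zero mean and unit standard deviation, so neither sensor dominates the input purely because of its scale.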

Case Studies and Results
Two cases (a benchmark network and a relatively complex network) are studied to demonstrate the reliability and applicability of the proposed framework. For the training process, N is set to 300, and $N \times N_{pipe}$ leakage samples are generated as the training samples.
To test the performance, N is set to 30. After training, the framework predicts the location of the leakage in the test samples. The positioning accuracy for the leaking pipeline on the test samples can be assessed by Eq. (5):
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
where TP (True Positive) is the number of positive classes predicted as positive, TN (True Negative) is the number of negative classes predicted as negative, FP (False Positive) is the number of negative classes predicted as positive, and FN (False Negative) is the number of positive classes predicted as negative.
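The accuracy of the leak position estimate can be evaluated by the mean squared error (MSE) in Eq. (6):
$$MSE = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 \tag{6}$$
where $y$ is the true value on the test set, which is expressed by Eq. (7), $\hat{y}$ is the predicted value on the test set, and $m$ is the number of test set samples.
$$y = \frac{L_{leak} - L_{start}}{Length_{pipe}} \tag{7}$$
where $L_{leak}$ is the leak point location, $L_{start}$ is the position of the pipeline starting point, and $Length_{pipe}$ is the length of the corresponding pipe.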
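The evaluation metrics can be illustrated with a small numerical sketch. It assumes the standard forms of accuracy, (TP + TN) / (TP + TN + FP + FN), and of the mean squared error, and a relative leak position measured from the pipe start; all counts and positions below are made up for the illustration and are not results from the case studies.

```python
def relative_position(l_leak, l_start, pipe_length):
    """Leak position as a fraction of the pipe length, measured from the start node."""
    return (l_leak - l_start) / pipe_length

def mse(y_true, y_pred):
    """Mean squared error over the m test samples."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def accuracy(tp, tn, fp, fn):
    """Share of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Two leaks on a 300 m pipe, at 150 m and 75 m from the start node.
y_true = [relative_position(150.0, 0.0, 300.0), relative_position(75.0, 0.0, 300.0)]
y_pred = [0.52, 0.24]
err = mse(y_true, y_pred)
acc = accuracy(tp=45, tn=45, fp=6, fn=4)
```

Because the positions are normalized to the pipe length, the MSE is dimensionless; a value of 0.00025 here corresponds to a typical error of about 1.6% of the pipe length.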

Case1
The Anytown network (Walski et al. 1987), a small, highly looped WDN, is adopted in this paper to illustrate the application of the proposed scheme and its performance under different settings. The network consists of 19 nodes, 40 pipes, 3 reservoirs, and 1 water pump. The pipe diameter is 200 mm-400 mm, the base demand of the nodes is 12.5 L/s-63.1 L/s, and $N_{pipe} = 40$. It is assumed that the potential burst area is the entire network, that is, every pipeline in the network is a potential location of the leakage. Three sensor placements with different numbers of sensors $N_m$ in the pipe network are considered, as shown in Table 1 (Zhou et al. 2019) and marked in Fig. 6.
In practical applications, the sensor layout of a water supply pipe network is not unique. Therefore, the experiment examines various sensor layouts. In the analysis, the number of sensors is changed to a different value each time, while the other parameters are left at their defaults. In addition, because noise in the data set has a great influence on model accuracy, 30-40 dB Gaussian white noise is added to the training dataset and the test set to verify the applicability of the framework. The results for the different sensor arrangements are shown in Table 1.
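One way to inject such noise is sketched below. It assumes that the 30-40 dB figure is a signal-to-noise ratio (SNR), which the text does not state explicitly; the function name `add_white_noise` and the synthetic signal are illustrative.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power = 10^(snr_db / 10)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

rng = np.random.default_rng(0)
clean = np.linspace(0.5, 2.0, 1000)   # synthetic pressure-difference values
noisy = add_white_noise(clean, snr_db=30.0, rng=rng)

# Empirical SNR of the noisy signal, which should be close to the requested 30 dB.
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Under this reading, 30 dB corresponds to a noise power one thousandth of the signal power, and 40 dB to one ten-thousandth.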
The results of the classification process under different $N_m$ values are presented in Fig. 7. It can be seen that the accuracy increases with the number of sensors. With noise added, the class accuracy is 0.76 when $N_m = 2$, but it becomes higher than 0.9 when $N_m = 3$. This indicates that the number and location of sensors affect the leak detection results of the framework.
The results of the regression process under different $N_m$ values are presented in Fig. 8. The abscissa is the actual leak location, and the ordinate is the leak location predicted by the framework. As can be seen from the figure, the scattered points are mostly distributed along the diagonal. This shows that the predicted location of the leakage point on the pipeline is mostly correct. The positioning results under different $N_m$ values were also compared. As the number of sensors increases, the positioning capability of the framework improves.
The training epoch indicates the number of times that the network repeatedly learns the same training dataset. The default number of training epochs used in our study is 200. The effects of the training epoch are presented in Fig. 9. As shown in Fig. 9a, b, the class loss and the local loss of the model decrease with the iterations and finally stabilize. As shown in Fig. 9c, the proposed framework can achieve good results (acc > 0.80) after 45 training epochs, with a further improvement (acc > 0.90) if more than 75 training epochs are used. The regression MSE changes are shown in Fig. 9d: the MSE stabilizes after a period of oscillation and tends toward 0 during training. In other words, most of these leaks are accurately located.

Case2
A relatively complex network, Net3, which is a well-known example network from the literature (Diao et al. 2016), was used to test the reliability of the framework. The network consists of 117 pipes, 92 junctions, 2 reservoirs, and 3 tanks. It is assumed that the potential burst area is the entire network. The pipe diameter is 202 mm-2514 mm, the pipe length is 3 m-3000 m, and $N_{pipe} = 117$. There are four pressure sensors, which are marked with red dots in Fig. 9. Each pipe in the network was analyzed, and the pipe sections with poor positioning accuracy are marked in blue in Fig. 10. A hydraulic analysis of the Net3 pipe network was carried out, and the results are shown in Fig. 10. The blue sections have a low flow rate of about 0.3-2.3 L/s, whereas the other pipes carry relatively large flows. It can be seen that the positioning accuracy is relatively reduced for pipe sections with a small flow rate. When the pipeline flow is small, the pressure in the pipeline is lower, and the corresponding pressure difference caused by leakage is relatively small, which affects the positioning result. Classification results for Net3 are shown in Fig. 12a, where the abscissa is the actual leaking pipeline and the ordinate is the predicted leaking pipeline. The predictions of leaking pipes are mostly correct. As shown in Fig. 12b, the accuracy can exceed 0.90 as the number of iterations increases.
The regression loss curve without multi-supervision is shown in Fig. 13a, while the regression loss curve with multi-supervision is shown in Fig. 13b. It can be seen from the figure that the model converges faster when multi-supervision is added. The regression results are shown in Table 2, which lists the location results for a subset of the pipeline leakage points. It can be seen that, for most of the leakage points, the predicted location is close to the actual leak location.

Discussion
First, compared with other related studies, the framework proposed in this paper is more accurate in leakage localization because it can narrow the leakage area down to the exact position on the pipe section. As in this study, Hu et al. (2021) addressed the case where the leak point is on the pipe segment and proposed a leak detection model based on Density-Based Spatial Clustering Application with Noise (DBSCAN) and a Multi-Scale Fully Convolutional Network (MFCN) (DBSCAN-MFCN), which divides the pipe network using the clustering method and uses the cluster number of the pipelines as the category label to locate the leakage area. Similarly, the BLIFF framework of Zhou et al. (2019) can narrow the potential leak area to one or more pipes. The leak-localization framework proposed in this paper goes further than these methods: as shown in Fig. 8 and Table 2, it can accurately determine the exact location of the leak point along the pipeline. Second, the framework performs well under different conditions. In the case of Anytown, the influence of the number of sensors in the pipe network on localization accuracy is considered. With four sensors, the positioning accuracy of the pipeline can reach 0.94, and the MSE of the specific location of the leakage point is 0.000435. As shown in Figs. 7 and 8, satisfactory results can also be obtained with only two sensors. When noise is introduced, the framework still performs well, as shown in Table 1: with the addition of 30-40 dB noise, the positioning accuracy of the pipeline can reach 0.89. In the case study of Net3, the pipelines with a localization accuracy of less than 0.80 are examined. They are generally pipe segments with low flow in the pipe network. As discussed in Kallesoe and Jensen (2018), the larger the flow at nominal conditions, the higher the relative impact of leakage on pressure measurements. The smaller the flow rate, the lower the pipe section pressure, and the less noticeable the leakage characteristics. Therefore, when the leakage area is small, the pressure drop due to leakage is almost negligible, which affects the positioning accuracy of the proposed framework. For the relatively complex network Net3, as shown in Fig. 13, the convergence of the network can be accelerated by adding the proposed multi-supervision mechanism.
Third, in further research, for the phenomenon that pipeline flow affects the accuracy of the framework, more optimization methods can be used to improve the neural network structure and improve its feature capture ability. Considering the experimental conditions, more complex networks have not been experimentally investigated in this study so far, and large-scale water supply networks will be considered in the next work.

Conclusions
In this paper, a leakage localization method for urban water supply pipe networks based on the ResNet network is proposed. This study puts forward the new idea of a parallel classification and regression process for leak localization, which enables the framework to pinpoint the exact position of leak points in the pipeline. In addition, a multi-supervision mechanism is designed in the regression process to accelerate the convergence of the model. The results of the two cases show that the framework can accurately identify the location of leakage points under different conditions. Satisfactory results can also be obtained with only two sensors. The positioning error is acceptable in the noisy environment. In the current research, there is a lack of research on the specific location of specific pipe segments in the leak detection of water supply networks. This research presents a possibility for precise location. In further work, more optimization methods can be adopted to improve the neural network structure and carry out experiments on more complicated pipe networks to compare with the current work.
Author Contribution
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by J Li, WJ Zheng, and CG Lu. The first draft of the manuscript was written by WJ Zheng and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Availability of Data and Materials
All authors make sure that all data and materials as well as a software application or custom code support the published claims and comply with field standards.

Declarations
Ethics Approval
We certify that the submission is original work and has not been published elsewhere.

Consent to Participate
All authors gave explicit consent to participate in this work.

Consent to Publish
All authors gave explicit consent to publish this manuscript.

Conflicts of Interest
The authors declare that they have no known competing financial interests.