water

Article
A Two-Stage Model for Data-Driven Leakage Detection and
Localization in Water Distribution Networks
Vineet Tyagi † , Prerna Pandey *,† , Shashi Jain † and Parthasarathy Ramachandran †

Department of Management Studies, Indian Institute of Science, Bangalore 560012, India;

[email protected] (V.T.); [email protected] (S.J.); [email protected] (P.R.)
* Correspondence: [email protected]
† These authors contributed equally to this work.

Abstract: Water utilities face the challenge of reducing water losses by promptly detecting, localizing,
and repairing leaks during their operational stage. To address this challenge, utilities are exploring
alternative approaches to detect leaks with high accuracy in a timely manner, while minimizing
environmental and economic consequences. This research proposes a two-stage model that relies on data
analysis to predict leak incidents and their specific locations in water distribution networks (WDNs).
By leveraging pressure and flow rate data collected from multiple points in the network, the model
first calculates prediction errors in pressure heads. Subsequently, statistical measures applied to these
error distributions are used to classify the occurrence and location of leaks. The suggested approach
is both cost-effective and easily deployable. Through simulation-based case studies conducted on
various benchmark networks, the efficacy of the proposed model is demonstrated. The results show
that the model effectively predicts leak occurrences and their respective locations. However,
accuracy decreases as the network size increases. The sensitivity of the leak-prediction accuracy
to the number of sensors and to the level of noise is also evaluated.

Keywords: machine learning for leak detection; leak localization; water distribution network; sensors
for leak detection

Citation: Tyagi, V.; Pandey, P.; Jain, S.; Ramachandran, P. A Two-Stage Model for Data-Driven
Leakage Detection and Localization in Water Distribution Networks. Water 2023, 15, 2710.
https://doi.org/10.3390/w15152710

Academic Editors: Gabriele Freni and Mariacrocetta Sambito

Received: 16 June 2023
Revised: 8 July 2023
Accepted: 12 July 2023
Published: 27 July 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

Water 2023, 15, 2710. https://doi.org/10.3390/w15152710 https://www.mdpi.com/journal/water

1. Introduction
Maintaining a network of pipes to supply water and remove wastewater can be
challenging in an urban setting. One of the significant challenges faced by water
utilities lies in maintaining the pipe network infrastructure. Old and leaking pipes create a
loss of valuable resources and give rise to health and hygiene issues. The economic angle to
this problem is that the lost water fails to provide revenue to the utility. Non-revenue water
(NRW) is a significant issue for water utilities in developing countries. A large portion of the
water put into the distribution system is lost due to leakage and theft. Poor infrastructure
maintenance leads to leakage, while inadequate metering and administrative setup lead
to unauthorised consumption, resulting in NRW. For water pipelines, leakage reduces the
available capacity for drinking and farming while the demand is already outstripping the
supply, hurting many of the poorest people.
Ref. [1] found that as much as 48% of the water entering the distribution network in
Bangalore, India, is NRW. In absolute terms, around 590 million liters are
lost per day. Though unauthorized access contributes to this loss, a significant part of it
is also due to seepage and leaks in distribution mains and service pipes. Removing
these leaks requires significant continuous investment and effort.
In developing urban centers, the water utility network can be vast, with heterogeneous
pipes that are laid underground, making them difficult to monitor and maintain. For
example, the water network in Bangalore is around 5600 km long. Parts of the piped
network in the Bangalore central area may be more than 50 years old, with pipe dimensions
ranging from 50 mm to 600 mm in diameter and materials including cast iron,
galvanized iron, and PVC.
Typically, a hydraulic model is used to model single-phase flow in a water pipeline
network. The hydraulic parameters (the water pressure and the water flow rate at different
network locations) are computed from the energy and mass conservation equations.
The hydraulic model also needs to consider the water network’s topological characteristics.
However, there are challenges associated with developing an accurate hydraulic model
for an operational urban water network. The primary challenges are (a) obtaining accurate
real-time information on the demand at the different nodes of the network, (b) obtaining
sufficient information about the pipe characteristics for the entire network, and (c) building
and operating a model that incorporates all the characteristics of the actual system.
We develop an interpretable machine-learning-based model that helps predict and
localize leaks in a water network with a relatively small risk of false alarm. The proposed
model uses the real-time pressure and flow measurements from a few sensors in the
network to detect leaks and their locations. Because the leak detection model does not require
real-time demand information at each node, the exact topology of the pipeline network,
or the characteristics of the pipes in the WDN, it is well suited to practical
application. To produce a labeled dataset consisting of sensor flow and pressure readings,
we need to create controlled leaks on the network’s links. Consequently, the model requires
some knowledge about the network, specifically the locations of the links. The model
can detect leaks only in the links it has been trained on using this labeled
dataset. We consider a water network with a few select nodes, at assumed sensor locations,
where pressure and flow readings are measured and relayed in real time. The real-time
data are then fed into a two-phase algorithm. The first phase involves learning the
relationship between the flow rate and the difference in pressure heads for all possible
pairs of nodes with sensors in the network. In the absence of the demand information,
this relationship between observed pressure head difference and flow rates at any two
nodes can only partially be learned. The residual error from the predictions made by the
model for a pair, and the observed difference in the pressure head of the pair, carries some
information. The second phase of the algorithm then learns, in a supervised manner, the
occurrence of the leak (or unaccounted demand) and its location based on the statistical
changes in the distribution of the above residual errors. Here, we demonstrate the model
for three diverse hypothetical benchmark water distribution networks where the data are
generated using the industry-certified hydraulic model EPANET.
A significant amount of research has been conducted on detecting leaks in water
distribution networks using various machine learning approaches [2–5]. The key points,
including the novelty and limitations, observed using the current two-stage methodology
in comparison to previous work, are discussed below:
1. The model can learn to identify leaks in all the links of the network using only a
limited number of pressure and flow sensors.
2. The leak localization is not too sensitive to noise in the sensor data. Therefore, the
approach can provide an efficient and cost-effective solution for leak localization.
3. With only a fraction of sensors (compared to the number of links in the network), the
model is capable of localizing leaks with low sensitivity to the choice of sensor location.
4. The method is also successful in identifying leaks of various sizes.
5. The two-stage methodology, unlike many other machine learning models, is
interpretable, and therefore would be better suited from an operational view for the
water utilities.
Because the model is supervised, it requires labeled datasets of flow and
pressure readings that correspond to leaks in each link. These can be obtained by creating
controlled leaks in each link. However, generating such a dataset can be time-consuming,
which is a limitation of the model.
The paper is structured as follows. In Section 3, we describe the hydraulic model and
introduce the notations used in the paper. We present in Section 4 the simulation model used
in our implementation. In Section 5, we describe the application of methodology followed
for the identification of leaks. Section 6 describes numerical experiments conducted in
the HANOI network ([6]), Net3 ([7]), and C town network ([8]). Finally, we conclude our
findings in Section 7.

2. Literature Review
There is much research on leak detection and localization in a water distribution
network using either a model-based or data-driven approach. Ref. [9] provided a support
vector machine (SVM)-based approach that uses pressure data from different parts of the
network to predict and localize the leak in a water network. A relatively recent line of
research is focused on data-driven leak detection models. Ref. [10] provided an extensive
review of the data-driven approaches for burst detection in WDN. Their recommendations
include that data-driven strategies promise real-life burst detection, and reducing false
alarms for such a system is an important issue. Ref. [11] used error-domain model falsification
to detect leak regions in WDN. They also proposed a methodology to approximate
the demand at nodes in water supply and a method for estimating uncertainties through
experimentation. Ref. [12] compared the Bayesian probabilistic analysis, SVM, and artificial
neural network (ANN) approach for leak detection and determined the deficiencies of these
approaches under varying conditions. Ref. [13] used deep learning to narrow down the
pipe burst locations from a potential district to a few pipes in a WDN. The model trained
the neural network using simulated data from hydraulic models for pipe bursts. In this
framework, additional pressure meters were placed at limited, optimized places for a short
period (minutes to hours) to monitor system behavior after the burst and then localize the
leak location. Ref. [14] used a genetic algorithm to find the size and area of a leak, such
that the differences between the simulated and field-observed values for pressure head and
flow were minimized.
Ref. [15] proposed a multi-stage method for leak localization within a district metered area (DMA)
with the aid of active valve operations and smart demand metering. Each stage includes
partitioning the DMA into two subregions using valve operations and identifying poten-
tially leaking pipes within the subregions through water balance analysis based on smart
demand meters. Ref. [16] used an auto-encoder neural network (AN), an unsupervised
machine learning model, to detect a leak with unbalanced data—as water supply networks
mainly operate under no-leak conditions. Ref. [17] used a data-driven approach based on a
convolutional neural network for efficient flooding analysis and risk assessment in large
urban areas.
Ref. [18] proposed a novel multiple leak detection and localization framework
(MLDLF) based on the provided pressure and flow data. The methodology uses the
k-means clustering method to identify leak scenarios. Ref. [19] proposed a calibration
residual-based burst detection (CRBD) method that works on the output of a calibrated
model and burst localization using the vector angle method. Ref. [20] proposed a pressure-
data-based algorithm called the Leakage Identification and Localization Algorithm (LILA).
LILA identifies potential leaks using pairwise sensor pressure data and provides the
location of their nearest sensors. We refer the readers to [21] for a detailed review of recent
advances in data-driven and model-based approaches for leak detection and localization.
A drawback of model-based approaches is that they require highly calibrated hydraulic
models, and their accuracies are sensitive to modeling and measurement uncertainties.
Hydraulic models also need real-time demand information to generate the flow characteristics
for the WDN. Our proposed leak detection model does not require the topology of the
network, the pipe characteristics, or real-time demand information. A data-driven model
learns the relationship between the flow rates and pressure head between pairs of sensors.
Changes in error (between predicted and observed readings) distributions from each sensor
pair are then used as an input to the model to predict the leak and its location. Compared
to the existing models, the proposed model requires limited real-time information, is less
sensitive to the placement of sensors, and can predict leak occurrences and locations with a
low false-positive rate using a limited number of sensors.

3. Problem Formulation
We consider a water distribution network which has $n_p$ pipes, $n_j$ variable-head nodes,
and $n_f$ fixed-head nodes. The head loss in all pipes in the network is assumed to be modeled
by the Hazen–Williams formula, so the relation between the heads at the two ends (nodes $i$ and
$k$) of pipe $j$ and the flow is as follows:

$$H_i - H_k = r_j Q_j^{n}, \tag{1}$$

where $Q_j$ is the flow in pipe $j$, $H_i$ is the head at node $i$, $n = 1.852$, and $r_j$ is the pipe
resistance factor, which depends on the length, diameter, and material of the pipe. We
define $q = (Q_1, \ldots, Q_{n_p})^\top$ as the vector of unknown flows in the pipes.
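As a small numerical illustration of Equation (1), the sketch below evaluates the head loss along a single pipe. The expression used for the resistance factor is the commonly quoted SI form of the Hazen–Williams coefficient (flow in m³/s, lengths in m); it is an assumption here, since the text does not give $r_j$ explicitly, and the pipe length, diameter, and roughness values are hypothetical.

```python
def hazen_williams_resistance(length_m, diameter_m, roughness_c, n=1.852):
    """Resistance factor r_j in SI units (assumed form, not taken from the paper)."""
    return 10.67 * length_m / (roughness_c ** n * diameter_m ** 4.8704)


def head_loss(flow_m3s, r, n=1.852):
    """Head loss H_i - H_k = r_j * Q_j**n along a pipe (Equation (1)), for Q_j >= 0."""
    return r * flow_m3s ** n


# Hypothetical pipe: 1 km long, 300 mm diameter, roughness coefficient C = 100.
r = hazen_williams_resistance(length_m=1000.0, diameter_m=0.3, roughness_c=100.0)
loss_50 = head_loss(0.05, r)    # flow of 50 L/s
loss_100 = head_loss(0.10, r)   # flow of 100 L/s
print(f"r = {r:.2f}, head loss: {loss_50:.2f} m at 50 L/s, {loss_100:.2f} m at 100 L/s")
```

Because the exponent is $n = 1.852$, doubling the flow multiplies the head loss by $2^{1.852}$, which is why head differences respond strongly to the extra flow drawn by a leak.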
The network topology is modeled by using the incidence matrices $A_1 \in \mathbb{R}^{n_p \times n_j}$ and
$A_2 \in \mathbb{R}^{n_p \times n_f}$, for the unknown-head nodes and the fixed-head nodes, respectively. The
entry of each incidence matrix for pipe $j$ and node $i$ is defined as:

$$
(A_b)_{ji} =
\begin{cases}
-1 & \text{if the flow in pipe } j \text{ enters the node } i, \\
0 & \text{if the pipe } j \text{ does not connect to the node } i, \\
1 & \text{if the flow in pipe } j \text{ leaves the node } i,
\end{cases}
\tag{2}
$$

where $b = 1, 2$. The unknown heads at different nodes are defined as $h = (H_1, \ldots, H_{n_j})^\top$,
the known nodal demands as $d_m \in \mathbb{R}^{n_j}$, and the fixed head elevations as $e_l \in \mathbb{R}^{n_f}$.
Additionally, we define the following matrices: $O$, an $n_j$ square zero matrix; $o$, an $n_p \times n_j$ zero matrix;
and $G$, an $n_p \times n_p$ diagonal matrix with the following diagonal entries:

$$G_{jj} = r_j |Q_j|^{n-1}.$$

The hydraulic problem is then to solve for the unknown flows in the $n_p$ pipes, $q$,
and the unknown heads at the $n_j$ nodes, $h$, given the network topology, $A_b$, the demand
at nodes, $d_m$, and the fixed head elevations, $e_l$, such that the mass and energy for the flow are
balanced. The continuity equation to be solved can be written in matrix form as (see [22]
for details):

$$
f(y) =
\begin{pmatrix} G & -A_1 \\ -A_1^\top & O \end{pmatrix}
\begin{pmatrix} q \\ h \end{pmatrix}
-
\begin{pmatrix} A_2 e_l \\ d_m \end{pmatrix}
= o, \tag{3}
$$

where we solve for the unknown $y := (q^\top, h^\top)^\top$. This set of equations is typically
solved using Rossman’s popular program EPANET ([23]) to obtain the steady-state solution.
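To make the block structure of Equation (3) concrete, the following sketch evaluates $f(y)$ for a tiny hypothetical network (one reservoir feeding two junctions in series) and checks that the residual vanishes at the solution obtained by hand from mass balance. The network, the resistance factors, and the demands are illustrative assumptions; EPANET solves the same system iteratively for general looped networks.

```python
N_EXP = 1.852  # Hazen-Williams exponent n from Equation (1)

# Hypothetical network: reservoir -> pipe 0 -> node 0 -> pipe 1 -> node 1.
# Sign convention of Equation (2): -1 if the flow enters the node, +1 if it leaves.
A1 = [[-1, 0],    # pipe 0 enters node 0
      [1, -1]]    # pipe 1 leaves node 0 and enters node 1
A2 = [[1],        # pipe 0 leaves the reservoir
      [0]]        # pipe 1 does not touch the reservoir
r = [50.0, 80.0]      # assumed pipe resistance factors
e_l = [100.0]         # fixed head at the reservoir (m)
d_m = [0.03, 0.02]    # nodal demands (m^3/s)


def residual(q, h):
    """Evaluate f(y) of Equation (3); every entry is zero at the solution."""
    n_p, n_j = len(q), len(h)
    energy = []
    for j in range(n_p):
        g_jj = r[j] * abs(q[j]) ** (N_EXP - 1)           # diagonal entry of G
        a1_h = sum(A1[j][i] * h[i] for i in range(n_j))
        a2_e = sum(A2[j][f] * e_l[f] for f in range(len(e_l)))
        energy.append(g_jj * q[j] - a1_h - a2_e)
    mass = []
    for i in range(n_j):
        a1t_q = sum(A1[j][i] * q[j] for j in range(n_p))
        mass.append(-a1t_q - d_m[i])
    return energy + mass


# In a branched network the flows follow directly from mass balance.
q = [d_m[0] + d_m[1], d_m[1]]
h0 = e_l[0] - r[0] * q[0] ** N_EXP
h1 = h0 - r[1] * q[1] ** N_EXP
f = residual(q, [h0, h1])
print("flows:", q, "heads:", [h0, h1], "max |f|:", max(abs(v) for v in f))
```

The first two entries of the residual are the energy equations for the pipes and the last two are the mass balances at the junctions.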

3.1. Simulation: Base Scenarios

We mimic an operational water distribution network by simulating the hydraulic
parameters of the network, i.e., $q$ and $h$, by solving the steady-state Equation (3) for a
range of fixed head elevations $e_l$ and nodal demands $d_m$. We assume that $e_l$ and $d_m$ are
stochastic and drawn from a multivariate normal distribution. Specifically, $x := (e_l^\top, d_m^\top)^\top$
has the following distribution:

$$x \sim \mathcal{N}(\mu, \Sigma), \tag{4}$$

where $\mu$ is an $\mathbb{R}^{(n_f + n_j) \times 1}$ vector of mean head elevations and nodal demands, while $\Sigma$ is
the corresponding covariance matrix. If the water surface elevation level in the tanks or
reservoir is constant, we set the corresponding variance entry in the covariance matrix
to zero. We first draw $N_{\text{base}}$ samples of the vector $x$, $x_1, \ldots, x_{N_{\text{base}}}$, from the multivariate
distribution provided in Equation (4). For a given sample of elevations and nodal demands,
$x_i$, $i \in \{1, \ldots, N_{\text{base}}\}$, the corresponding $y_i^{\text{base}}$, the steady-state values of the unknown heads
and flows, is determined by solving Equation (3). Denote $P_{\text{base}}$ as the joint distribution of
$y$ obtained as above. We assume there are $n_s$ sensors ($n_s < n_j$), located at a subset of the
$n_j$ nodes of the network, and only the simulated pressure and flow values at these sensor
nodes are used as input to our model, making it a low-dimensional representation of the
high-dimensional $y$. The flow and pressure values at the sensor nodes, generated as above,
serve as the baseline data for the study.
In this work, we do not optimize the placement of sensors. Instead, we adhere to the
common practice of locating the sensors at critical measuring points (CMPs). CMPs
are the nodes at which the pressure may fall below the minimum
pressure necessary to reach the consumers. Furthermore, we ensure
that the sensors are positioned away from the source to account for the substantial head
loss that occurs in the network as the supply reaches the farthest points.

3.2. Simulation: Scenarios with a Leak

Following [24], the demand due to a leak (the mass flow rate of fluid through the
hole) is expressed in a general form:

$$d_{\text{leak}} = C_d A p^{\alpha} \sqrt{\frac{2}{\rho}}, \tag{5}$$

where $d_{\text{leak}}$ is the equivalent water demand due to the leak, $C_d$ is the discharge coefficient,
for which we use a default value of 0.75, $A$ is the area of the hole in the pipe, $p$ is the internal
water pressure, $\rho$ is the density of the fluid, and the exponent $\alpha$ is a unitless parameter
related to the characteristics of the leak. We use the default value of $\alpha = 0.5$, which results
in the equivalent equation:

$$d_{\text{leak}} = C_d A \sqrt{2 g h},$$

where $g$ is the acceleration of gravity and $h$ is the gauge head. We use the Water Network
Tool for Resilience (WNTR) ([25]), a Python-based package that is compatible with EPANET,
for our simulations of both leak and no-leak scenarios. The leak is added at a location
in a pipe by splitting the pipe into two sections and adding a node with the demand
characteristics specified in Equation (5).
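The equivalence of the two forms of the leak demand for $\alpha = 0.5$ follows from $p = \rho g h$, and can be checked numerically with the short sketch below. The water density, the value of $g$, and the 30 m gauge head are assumed values chosen only for illustration; the function names are hypothetical and are not part of WNTR.

```python
import math


def leak_demand(area_m2, pressure_pa, c_d=0.75, alpha=0.5, rho=1000.0):
    """Equation (5): d_leak = C_d * A * p**alpha * sqrt(2 / rho)."""
    return c_d * area_m2 * pressure_pa ** alpha * math.sqrt(2.0 / rho)


def leak_demand_from_head(area_m2, head_m, c_d=0.75, g=9.81):
    """Equivalent form for alpha = 0.5: d_leak = C_d * A * sqrt(2 * g * h)."""
    return c_d * area_m2 * math.sqrt(2.0 * g * head_m)


rho, g = 1000.0, 9.81          # assumed water density (kg/m^3) and gravity (m/s^2)
head = 30.0                    # assumed gauge head (m)
pressure = rho * g * head      # p = rho * g * h

for area in (0.0005, 0.002, 0.003, 0.004):   # example hole areas (m^2)
    d_p = leak_demand(area, pressure, rho=rho)
    d_h = leak_demand_from_head(area, head, g=g)
    print(f"A = {area} m^2: {d_p:.5f} (pressure form) vs {d_h:.5f} (head form)")
```

For a fixed pressure the leak demand scales linearly with the hole area, so the smaller leaks are harder to distinguish from ordinary demand fluctuations.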
With this additional leak node $k$, and with $N_{\text{leak},k}$ stochastic demands $x_1, \ldots, x_{N_{\text{leak},k}}$
sampled from the distribution in Equation (4), the corresponding $y^{\text{leak}}$, the steady-state
values of the unknown heads and flows for all the nodes and pipes of the network, are
then obtained by solving Equation (3). These samples represent flow characteristics with
an incremental leak added to the network. We denote $P_{\text{leak}}^{k}$ as the joint distribution of $y$
obtained as above.
Given a series of pressure head and flow values at the $n_s$ sensor nodes, we want to label
them as data from the base scenario, $y^{\text{base}}$, or from a leak scenario, $y^{\text{leak}}$. In the case of a leak
scenario, we would like to identify the node $k$ where the leak was introduced.

4. Methodology
Let $S$ be the set of all nodes in the network. Define $S_s$, a subset of $S$ with cardinality
$n_s$, containing any $n_s$ elements of $S$. $S_s$ can be seen as the set of nodes where the sensors
for measuring pressure heads and flows are located. Define $h_s \equiv (H_i)^\top$, $i \in S_s$, and
$q_s \equiv (Q_i)^\top$, $i \in S_s$, as the vectors of pressure heads and flows, respectively, measured at the
nodes with the sensors for the base network (where no additional leak demand node is
added). We first model the relationship between the nodal pairs in $S_s$ for the base network.
There will be $\binom{n_s}{2}$ such pairs, and for each pair, we fit a linear regression model:

$$\mathbb{E}_{y \sim P_{\text{base}}}\left[\Delta h_{ij} \mid Q_i, Q_j\right] = \beta_0 + \beta_1 Q_i + \beta_2 Q_i^2 + \beta_3 Q_j + \beta_4 Q_j^2, \tag{6}$$

where $i, j \in S_s$, $i \neq j$, $\Delta h_{ij} = H_i - H_j$, and $e_{ij}$ is the unexplained error for the pair. The
residual error for the $(i, j)$th pair is defined as:

$$e_{ij} = \mathbb{E}_{y \sim P_{\text{base}}}\left[\Delta h_{ij} \mid Q_i, Q_j\right] - \Delta h_{ij}. \tag{7}$$
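As an illustration of Equations (6) and (7), the following sketch fits the five coefficients for one sensor pair by ordinary least squares on synthetic data and then computes the residual errors. The data, the "true" coefficients, and all function names are hypothetical. The normal equations are solved directly only to keep the example free of dependencies; a library least-squares routine would normally be preferred.

```python
import random


def features(q_i, q_j):
    """Regressors of Equation (6): 1, Q_i, Q_i^2, Q_j, Q_j^2."""
    return [1.0, q_i, q_i ** 2, q_j, q_j ** 2]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_pair(q_i, q_j, dh):
    """OLS estimate of beta_ij from the normal equations X^T X beta = X^T y."""
    rows = [features(a, b) for a, b in zip(q_i, q_j)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, dh)) for a in range(k)]
    return solve(xtx, xty)


def residual_errors(beta, q_i, q_j, dh):
    """Equation (7): predicted minus observed head difference."""
    return [sum(b * f for b, f in zip(beta, features(a, c))) - y
            for a, c, y in zip(q_i, q_j, dh)]


rng = random.Random(0)
true_beta = [1.0, 2.0, 0.5, -1.0, 0.3]            # hypothetical coefficients
q_i = [rng.uniform(0.0, 1.0) for _ in range(300)]
q_j = [rng.uniform(0.0, 1.0) for _ in range(300)]
# The noise stands in for the variation that the two flows cannot explain.
dh = [sum(b * f for b, f in zip(true_beta, features(a, c))) + rng.gauss(0.0, 0.05)
      for a, c in zip(q_i, q_j)]

beta = fit_pair(q_i, q_j, dh)
errors = residual_errors(beta, q_i, q_j, dh)
mean_error = sum(errors) / len(errors)
print("beta:", [round(b, 3) for b in beta], "mean residual:", mean_error)
```

Because the model contains an intercept, the residuals on the fitting data average to zero up to rounding, which is the property the second stage relies on when it looks for shifts in the error distribution.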

Of the $N_{\text{base}}$ base scenarios generated as described in Section 3.1, a subset of $N_{\text{base}}^{\text{Training1}}$
scenarios is used to obtain the coefficients for the $(i, j)$th pair, $\beta_{ij} \equiv (\beta_0, \ldots, \beta_4)^\top$, using
ordinary least squares regression. The flow readings for the pair can only partially explain the
variation in the difference between pressure heads for the two nodes, as we cannot fully
explain the outcomes of the high-dimensional model (as described in Section 3) using a
low-dimensional dataset. If the high-dimensional data were available, a machine learning
model could precisely learn the functional relationship between flow and pressure heads.
Any deviation from this learned relationship would be adequate for leak detection. In
the current setup, as only partial information from this high-dimensional data is available,
the model will make predictions with error. However, changes in the distribution of the
prediction errors can be used to localize leaks. This is what is achieved in the second stage
of the model. As long as the network topology and the distribution of the input
to the hydraulic model are fixed, i.e., $x \sim \mathcal{N}(\mu, \Sigma)$, the distribution of $y$ and of the residual
error $e_{ij}$ will remain unchanged. As ordinary least squares is an unbiased estimator, the mean
of $e_{ij}$ would be close to zero [26].

4.1. Identifying the Occurrence of Leaks

Once an additional leak node is added to the network, the distribution of the outcomes,
$y$, changes from $P_{\text{base}}$ to $P_{\text{leak}}$. Now, $\Delta h_{ij}$ and $Q_i, Q_j$ are not drawn from the distribution
$P_{\text{base}}$ used to train the model in Equation (6), and thus the distribution of the residuals
obtained from Equation (7) cannot be guaranteed to be centered around zero or, in
general, to have the same distribution as when $\Delta h_{ij}$ and $Q_i, Q_j$ are drawn from $P_{\text{base}}$. As we are
dealing with a discrete sample of $y$, and as a result discrete samples of $e_{ij}$, if we have known
sampled errors for the base case (which serve as the reference empirical distribution),
we can infer the change in the distribution of $y$ by statistically measuring the changes
in the distribution of $e_{ij}$. The Kolmogorov–Smirnov test (KS test) is a commonly used
non-parametric statistical test of the equality of one-dimensional probability distributions
that can be used to compare two samples. If $F_{1,n}^{ij}$ is the empirical distribution function for
the base residual errors with $n$ iid samples for nodal pair $(i, j)$, and $F_{2,m}^{ij}$ is the empirical
distribution function for the unknown residual errors (it is unknown whether they come from
base scenarios or from scenarios with a leak at $k$) with $m$ iid samples for the same pair,
then the KS statistic is defined as:

$$D_{n,m}^{ij} = \sup_x \left| F_{1,n}^{ij}(x) - F_{2,m}^{ij}(x) \right|. \tag{8}$$

We could use the KS statistic, $D_{n,m}^{ij}$, to accept or reject the null hypothesis that sample 2
is drawn from the same distribution as sample 1 at a given level $\alpha$, and therefore conclude,
with a certain level of confidence, whether there is a leak in the network or
not. However, it would not help us deduce at which node the leak exists. This is because
there could be several nodal pairs for which the distribution of residual errors changes
when a leak is introduced in the network.
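A minimal sketch of the two-sample KS statistic in Equation (8) is given below. The residual values are made up for illustration, and the rejection threshold uses the standard large-sample approximation $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$, which is an assumption here rather than something the text specifies.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_1, sample_2):
    """Equation (8): largest gap between the two empirical distribution functions."""
    s1, s2 = sorted(sample_1), sorted(sample_2)
    d = 0.0
    for x in s1 + s2:   # the supremum is attained at one of the observed values
        f1 = bisect_right(s1, x) / len(s1)
        f2 = bisect_right(s2, x) / len(s2)
        d = max(d, abs(f1 - f2))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample rejection threshold for the two-sample KS test."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


base_errors = [-0.4, -0.1, 0.0, 0.2, 0.3]      # hypothetical residuals, no leak
unknown_errors = [0.1, 0.4, 0.5, 0.7, 0.9]     # hypothetical residuals, unknown label
d = ks_statistic(base_errors, unknown_errors)
threshold = ks_critical_value(len(base_errors), len(unknown_errors))
print("D =", d, "threshold =", threshold, "reject null:", d > threshold)
```

With only five values per sample the asymptotic threshold is a rough guide at best; the sample sizes used in practice would be much larger.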

4.2. Identifying the Location of Leaks

We consider a case where a single incremental leak is introduced to the network at
one of the $n_p$ links. We want to develop a supervised learning-based model that can, given
a sequence of pressure head and flow readings at the nodes with sensors, help us infer
at which of the $n_p$ links the leak was introduced. We use multinomial logistic regression to
model the posterior probabilities of the occurrence of a certain class (here, a leak at one of
the $n_p$ links or no leak). For our leak detection model, we use $z = (D^{ij})^\top$, $i, j \in S_s$, $i \neq j$, as
the predictors. As the residuals are normally distributed, we alternatively also consider the
following statistics for our predictors:

$$\mu_1^{ij} = \frac{1}{n} \sum_{l=1}^{n} e_{ij,l}^{\text{base}},$$

and

$$\mu_2^{ij} = \frac{1}{m} \sum_{l=1}^{m} e_{ij,l}^{\text{unknown}},$$

namely, $z = (\mu_1^{ij}, \mu_2^{ij})^\top$, $i, j \in S_s$, $i \neq j$. Here, $n$ and $m$ are the numbers of samples we pick
from the base scenarios and from scenarios with a particular label (leak at one of the nodes
among $K$, or no leak). While training, the label $k$ would be known, and we would then test
on a new test dataset, where the label is not revealed, to measure the accuracy of the model. We
generate $N_k^{\text{train2}}$ samples, $z_1, \ldots, z_{N_k^{\text{train2}}}$, for each label $k = 0, \ldots, n_p$, with $k = 0$ being the
label for no leak, $k = 1$ corresponding to a leak at link 1, and so on. With $\delta$ denoting the
coefficients, the logistic regression model is then of the form:

$$
\begin{aligned}
\Pr(G = k \mid Z = z_i) &= \frac{\exp(\delta_{k0} + \delta_k^\top z_i)}{1 + \sum_{l=0}^{K-1} \exp(\delta_{l0} + \delta_l^\top z_i)}, \quad k = 0, \ldots, n_p, \\
\Pr(G = K \mid Z = z_i) &= \frac{1}{1 + \sum_{l=0}^{K-1} \exp(\delta_{l0} + \delta_l^\top z_i)}.
\end{aligned}
\tag{9}
$$
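The class probabilities in Equation (9) can be evaluated as in the sketch below, which treats one class as the reference (its linear score is fixed at zero) and models the remaining classes with their own intercepts and coefficient vectors. The predictor vector and all coefficient values are hypothetical; in practice the coefficients would be estimated from the labeled training data.

```python
import math


def class_probabilities(z, intercepts, coefs):
    """Probabilities for the modelled classes, followed by the reference class.

    intercepts[k] and coefs[k] play the roles of delta_k0 and delta_k in Equation (9).
    """
    scores = [b0 + sum(w * x for w, x in zip(ws, z))
              for b0, ws in zip(intercepts, coefs)]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)   # reference class
    return probs


# Hypothetical example: two predictors, two modelled classes plus the reference class.
z = [0.8, -0.1]
intercepts = [0.0, -1.0]
coefs = [[2.0, 0.0], [0.5, 1.5]]
probs = class_probabilities(z, intercepts, coefs)
predicted = max(range(len(probs)), key=probs.__getitem__)
print("probabilities:", [round(p, 3) for p in probs], "predicted index:", predicted)
```

The predicted class is simply the one with the largest probability, and the coefficients attached to each predictor indicate which sensor pairs drive a given classification, which is what makes the second stage interpretable.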

4.3. Summary of the Method

Here, a summary of the steps involved is described.
Generating base scenarios and regression:

Step 1: Draw Nbase demands from the multivariate normal distribution.

Step 2: Using each set of the Nbase demands, solve for flow and pressure at the links and
nodes of the network.
Step 3: Retain the flow and pressure data for nodes marked as sensors and mark the
dataset as base scenarios.
Step 4: For each sensor node pair, perform an ordinary least squares regression using a subset
of the Nbase base scenarios, following Equation (6).

Generating labeled leak scenarios:

Step 5: Introduce a leak node with a given area at the center of the jth link, $j \in [1, n_p]$.
Step 6: For the demand in the remaining nodes, draw Nleak samples from the multivariate
normal distribution.
Step 7: Using each set of the Nleak demands, with an additional leak node as described in
Step 5, solve for flow and pressure for the network.
Step 8: Retain the flow and pressure data for nodes marked as sensors and mark the Nleak
dataset with a leak link id j.
Step 9: Repeat Step 5 to Step 8 for all the links.

Generating predictors:

Step 10: From the Nleak scenarios labeled as 'leak in link j', pick m ≤ Nleak randomly
chosen scenarios without replacement.
Step 11: For each of the m scenarios, find the residual errors for each sensor pair, where the
prediction is made using the coefficients from Step 4.
Step 12: Pick n ≤ Nbase randomly chosen scenarios without replacement from the Nbase
base scenarios.
Step 13: For each of these n scenarios, find the residual errors for each sensor pair, where the
prediction is made using the coefficients from Step 4.
Step 14: Compute the predictors for each sensor pair $(i, j)$, i.e., $D^{ij}$, $\mu_1^{ij}$, and $\mu_2^{ij}$, $i, j \in S_s$, $i \neq j$.
Step 15: Repeat Step 10 to Step 14 $N^{\text{train2}}_{\text{leak in link } j}$ times to obtain
$z_1, \ldots, z_{N^{\text{train2}}_{\text{leak in link } j}}$ with label 'leak in link j'.
Step 16: Repeat Step 10 to Step 15 for a leak in each of the links $j \in [1, n_p]$.

Classification:

Step 17: With the labeled z data, train the logistic regression model, as described in Section 4.2,
with appropriate thresholds for multinomial classification of the leak location. The
process flow of the approach is shown in Figure 1.

Figure 1. Process flow of linear regression and logistic regression.
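Steps 10 to 14 can be sketched as follows for a single draw. The sketch assumes that the residual errors of Steps 11 and 13 have already been computed for every scenario and sensor pair, and it uses synthetic residuals in which a hypothetical leak shifts the errors of one pair only. The function names and the data layout (a dictionary keyed by sensor pair) are illustrative choices, not the authors' implementation.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic (Equation (8))."""
    s1, s2 = sorted(a), sorted(b)
    return max(abs(bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2))
               for x in s1 + s2)


def build_predictors(base_resid, unknown_resid, n, m, rng):
    """One draw of Steps 10 to 14: returns [D, mu_1, mu_2] for every sensor pair.

    base_resid and unknown_resid map a sensor pair to its residual errors,
    one entry per scenario, with scenarios in the same order for every pair.
    """
    pairs = sorted(base_resid)
    base_idx = rng.sample(range(len(base_resid[pairs[0]])), n)      # Step 12
    unk_idx = rng.sample(range(len(unknown_resid[pairs[0]])), m)    # Step 10
    z = []
    for pair in pairs:
        e_base = [base_resid[pair][i] for i in base_idx]
        e_unknown = [unknown_resid[pair][i] for i in unk_idx]
        z += [ks_statistic(e_base, e_unknown),                      # Step 14
              sum(e_base) / n,
              sum(e_unknown) / m]
    return z


rng = random.Random(1)
pairs = [(0, 1), (0, 2), (1, 2)]
# Synthetic residuals: the hypothetical leak shifts the errors of pair (0, 1) only.
base = {p: [rng.gauss(0.0, 1.0) for _ in range(1000)] for p in pairs}
leak = {p: [rng.gauss(2.0 if p == (0, 1) else 0.0, 1.0) for _ in range(1000)]
        for p in pairs}
z = build_predictors(base, leak, n=200, m=200, rng=rng)
print([round(v, 3) for v in z])
```

Repeating the draw many times for each label (Steps 15 and 16) yields the labeled predictor vectors on which the classifier of Step 17 is trained.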

5. Application of the Methodology on Water Distribution Networks

5.1. Data Simulation
The base water demand at all the nodes of the network is assumed to be known.
Several artificial demand patterns were generated assuming a multivariate Gaussian
distribution for the joint demand at all the nodes. The mean of the multivariate
Gaussian distribution is taken as the base water demand. The demand standard deviation
was assumed to be between 16% and 20% for all the nodes. We chose this range to examine a
scenario with high variability in the nodal demand, which is often the case as
demand fluctuates throughout the day. Additionally, a
sensitivity analysis on the demand variability showed that the model’s outcome
is not significantly influenced by this variability. A constant correlation coefficient of
0.6 between any pair of nodes was used to generate the covariance matrix.
EPANET 2.0 and the WNTR Python package were used to
perform the hydraulic simulation of the network for the demand scenarios generated
above (the Python code and scripts used for the experiments are available at
https://github.com/adOption/DPTrans (20 September 2021)). The node heads and link
flow values from the simulation at the pre-defined sensor nodes were used in place of
sensor data.
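One way to generate demands with a constant pairwise correlation is a one-factor construction, sketched below: each node's demand combines a shock shared by all nodes with a node-specific shock. This is an illustrative alternative to building the full covariance matrix explicitly, not necessarily what the authors' scripts do. The base demands are hypothetical, and the sketch assumes that the quoted 16% to 20% standard deviation is expressed as a fraction of each node's mean demand.

```python
import math
import random


def sample_demands(base_demand, cv, rho, n_samples, rng):
    """Draw equicorrelated Gaussian demands.

    Node i has mean base_demand[i], standard deviation cv[i] * base_demand[i]
    (assumed interpretation), and correlation rho with every other node.
    """
    samples = []
    for _ in range(n_samples):
        common = rng.gauss(0.0, 1.0)               # shock shared by all nodes
        row = []
        for mean, c in zip(base_demand, cv):
            own = rng.gauss(0.0, 1.0)              # node-specific shock
            shock = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * own
            row.append(mean + c * mean * shock)
        samples.append(row)
    return samples


def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)
base_demand = [10.0, 25.0, 40.0]    # hypothetical base demands (L/s)
cv = [0.16, 0.18, 0.20]             # standard deviation as a fraction of the mean
demands = sample_demands(base_demand, cv, rho=0.6, n_samples=5000, rng=rng)
corr_01 = correlation([d[0] for d in demands], [d[1] for d in demands])
print("sample correlation between nodes 0 and 1:", round(corr_01, 3))
```

Each combined shock has unit variance and any two nodes share only the common component, so the pairwise correlation equals `rho` by construction.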
A leak node was introduced in each of the $n_p$ links, one after the other. For training the
model, these leaks were positioned exactly at the midpoint of each link, while for
the test dataset, the leaks were introduced randomly at either 3/10 or 7/10 of the
length of the pipe. For the training dataset, we create equal numbers of samples of the leak node
with four variations of leak area, i.e., 0.0005 m², 0.002 m², 0.003 m², and 0.004 m². For the test
dataset, the area had three variations: 0.0001 m², 0.001 m², and 0.005 m². The
generated scenarios were labeled as $k = 0$ for no leak and $k = 1, \ldots, n_p$ for a leak in link $k$,
where $n_p$ is the number of links. The flow in the links and pressure head at sensor nodes were
recorded along with the labels for both training and test datasets.
The experimental configuration for each case study is illustrated in the figures in
Section 6. The nodes where pressure sensors are placed are shown as green
circles, and the links where flow sensors are placed are shown as orange lines. The choice
of these sensor locations is based on factors such as criticality and distance from the sources.
However, sensor placement optimization becomes less critical when a sufficient number
of pressure and flow sensors is available, as demonstrated in the case studies
discussed in Section 6. In the model setup, only one leak node is active at any given time.

5.2. Regression
We fit regression models for each sensor pair combination using a subset of samples
from the base no-leak scenarios (there will be $\binom{n_s}{2}$ such sensor pairs). For any given nodal
pair $(i, j)$, the regression model predicts $\Delta h_{ij}$, given the flow rates at the links associated
with the two nodes. As discussed earlier, the regression model can only partially
explain the variations in the head difference. Figure 2 shows the predicted $\Delta h_{ij}$ from the linear
regression against the observed $\Delta h_{ij}$ for the sensor node pair $i$ and $j$, when only data from
the base scenarios are used. The residual error is also plotted; as expected, it
is centered around a mean of 0 and has a standard deviation lower than that
of the observed $\Delta h_{ij}$.

Figure 2. For the sensor node pair $i$ and $j$, the regression model helps in partially explaining the
variations observed in $\Delta h_{ij}$. (a) $\Delta h_{ij}$ observed vs. predicted. (b) Distribution of $e_{ij}$ observed.
5.3. Impact of Leaks on Residual Errors

Using the regressed functions for the sensor pairs, we predict $\Delta h_{ij}$ using flow data from
scenarios labeled as no leak ($k = 0$) and as a leak in link $k$ ($k = 1, \ldots, n_p$, where
$n_p$ is the number of links). The residual errors between the predicted and observed $\Delta h$ for
scenarios with leaks differ in distribution from the corresponding residual
errors when only base scenarios are used.
Figure 3 compares the residual error in the prediction of $\Delta h_{ij}$ for no-leak scenarios
and scenarios with a leak at any link in the network. While the distribution of residual
errors is unbiased for the base scenarios, the errors are no longer guaranteed to be unbiased
when a leak is introduced. Additionally, a single leak will impact the error distribution
between most sensor nodal pairs. In Figure 3, we see that a leak introduced in a link
shifts the residual error distribution away from the one obtained when there is no leak.

(a) $\Delta h_{i,j}$ observed vs. predicted.

(b) Distribution of $e_{i,j}$ for the no-leak and leak cases.

Figure 3. For the sensor node pair i and j, the regression model helps in partially explaining the observed
variations in $\Delta h_{i,j}$ when a leak is introduced in a link.

5.4. Classification
Figure 3 demonstrates that the introduction of a leak in the network would impact
the residual errors across sensor nodal pairs. Even while comparing the distribution of
residual errors between base no-leak and labeled no-leak scenarios, there will be deviations
depending upon the sample sizes n and m used. As discussed in Section 4.2, we therefore use
multinomial logistic regression, a supervised learning model, to classify whether a set
of residual errors comes from k = 0, i.e., the no-leak case, or from a leak in the kth
link, where $k \in \{1, \ldots, n_p\}$. The input to the logistic regression model is the vector z, whose
elements are as described in Section 4.2. We use a set of n = m = 500 error samples from
the base no-leak scenarios and from scenarios labeled as a leak at a particular link, respectively.
For this set, as described in Section 4.1, we obtain the K-S statistics, the sample means of
the errors for the base no-leak scenarios, and the sample means of the errors for the labeled
scenarios to obtain a single instance of z.
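To make the construction of a single instance of z more concrete, here is a minimal standard-library Python sketch. It computes the two-sample K-S statistic (the largest gap between two empirical distribution functions) and the two sample means for each sensor pair. The ordering of the entries within z, the dictionary layout, and the synthetic residuals are assumptions for illustration; the exact definition of z is the one given in Section 4.2.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the maximum gap is attained at one of the sample points
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def feature_vector(base_errors, labeled_errors):
    """Build one instance of z from per-pair residual samples (dicts keyed by pair)."""
    z = []
    for pair in sorted(base_errors):
        base, labeled = base_errors[pair], labeled_errors[pair]
        z.append(ks_statistic(base, labeled))  # distributional change
        z.append(sum(base) / len(base))        # sample mean, base no-leak errors
        z.append(sum(labeled) / len(labeled))  # sample mean, labeled-scenario errors
    return z


random.seed(1)
pairs = [("s1", "s2"), ("s1", "s3"), ("s2", "s3")]  # hypothetical sensor pairs
# Synthetic residuals: unbiased for the base case, shifted when a leak is present.
base = {p: [random.gauss(0.0, 1.0) for _ in range(500)] for p in pairs}
leak = {p: [random.gauss(0.8, 1.0) for _ in range(500)] for p in pairs}
z = feature_vector(base, leak)
print(len(z), [round(v, 3) for v in z[:3]])
```

With three sensor pairs, this layout yields a nine-element vector; a library routine such as SciPy's `ks_2samp` could replace the hand-written statistic.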
We test the efficacy of the trained model by predicting the occurrence of a leak and
the location of the leak for a test dataset with $N^{\text{test}}_{k=0} = 500$ scenarios for the no-leak case and
$N^{\text{test}}_{k} = 500$ scenarios, $k = 1, \ldots, n_p$, for each link with a leak. The predictors for the test
data are prepared following the same procedure as that for the training data. The error
metrics used to assess the performance of the leak classification are generated using the
test dataset.
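For readers less familiar with multinomial logistic regression, the sketch below shows how a fitted model would map a feature vector z to a predicted label, with label 0 standing for the no-leak case and label k for a leak in link k. The weights, biases, and two-element feature vector are hypothetical values chosen for illustration; in practice they would come from fitting the model on the training instances of z.

```python
import math


def softmax(scores):
    """Convert class scores into probabilities that sum to one."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(z, weights, biases):
    """Return the most probable class index and the class probabilities."""
    scores = [
        sum(w * x for w, x in zip(row, z)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical fitted parameters for three classes (0 = no leak, 1 and 2 = leak links).
weights = [[-2.0, 0.0], [3.0, -1.0], [1.0, 2.0]]
biases = [1.0, -0.5, -0.5]

label, probs = predict_label([0.9, 0.1], weights, biases)
print(label, [round(p, 3) for p in probs])
```

The predicted class is simply the one with the largest probability, so localization accuracy depends on how well the features in z separate the classes.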

5.5. Impact of Noisy Sensor Data

To make the simulated experiments closer to reality, we account for the additional
measurement noise that sensor data would typically contain. Thus, we add Gaussian white
noise to the simulated flow and pressure data at the sensor nodes. We varied the standard
deviation of the white noise between 0.25% and 10%. Thus, we have,
$$\hat{h} = h\,(1 + e_h^{\top}), \qquad e_h \sim N(\mathbf{0}, \Sigma_e);$$

where $\Sigma_e$ is a diagonal covariance matrix with diagonal entries as $\sigma$ for the appropriate
noise level. Similarly:

$$\hat{q} = q\,(1 + e_q^{\top}), \qquad e_q \sim N(\mathbf{0}, \Sigma_q).$$
We use the noisy pressure head data and flow data as the inputs to our model.
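A minimal Python sketch of this noise model is given below. It assumes that the noise is applied independently to each reading and that a noise level of 1% corresponds to a standard deviation of 0.01 in the multiplicative term; the function name and the sample readings are hypothetical.

```python
import random


def add_relative_noise(values, sigma, rng):
    """Scale each reading by (1 + e), with e drawn from N(0, sigma^2)."""
    return [v * (1.0 + rng.gauss(0.0, sigma)) for v in values]


rng = random.Random(42)
heads = [52.3, 48.7, 45.1]  # hypothetical pressure heads at sensor nodes
flows = [0.12, 0.08, 0.05]  # hypothetical flow rates at sensor links

noisy_heads = add_relative_noise(heads, 0.01, rng)  # 1% noise level
noisy_flows = add_relative_noise(flows, 0.01, rng)
print(noisy_heads)
print(noisy_flows)
```

Because the noise is multiplicative, its absolute size scales with the magnitude of each reading, so large heads receive larger absolute perturbations than small flows.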

5.6. Error in Terms of Topological Distance

Topological distance refers to the shortest-path distance between two links.
In this study, it is used to determine the distance between the true and predicted leak
locations. It indicates how far the predicted leak location is
from the original location by counting the number of links between the two
along the shortest path. Alternatively, instead of counting the number of links between
the true and predicted leak locations, one can directly measure the distance between
them along the shortest path. The Python package NetworkX has been used to measure
the topological distance [27].
We introduce the term average topological distance (ATD) as the average of the
topological distance between the predicted and true leak locations over the $N^{\text{test}}$ scenarios.
The ATD should be close to zero for a good leak localization model.
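The ATD computation can be sketched as follows. The study uses NetworkX for the shortest-path search; this example uses a standard-library breadth-first search instead so that it runs without extra dependencies (`networkx.shortest_path_length` would serve the same purpose for the node-to-node step). The counting convention is an assumption: identical links have distance 0, links sharing a node have distance 1, and so on. The toy network and the predictions are hypothetical.

```python
from collections import deque


def node_hops(adjacency, source, target):
    """Breadth-first search: number of links on the shortest node-to-node path."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("nodes are not connected")


def link_distance(adjacency, link_a, link_b):
    """Topological distance between two links, each given as a pair of end nodes."""
    if set(link_a) == set(link_b):
        return 0
    return 1 + min(node_hops(adjacency, u, v) for u in link_a for v in link_b)


def average_topological_distance(adjacency, true_links, predicted_links):
    """Mean link-to-link distance over all test scenarios."""
    distances = [
        link_distance(adjacency, t, p) for t, p in zip(true_links, predicted_links)
    ]
    return sum(distances) / len(distances)


# Toy chain network 0-1-2-3-4 with links (0,1), (1,2), (2,3), (3,4).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
true_links = [(0, 1), (1, 2), (3, 4)]
predicted_links = [(0, 1), (2, 3), (0, 1)]
atd = average_topological_distance(adjacency, true_links, predicted_links)
print(round(atd, 3))
```

In this toy case the three distances are 0, 1, and 3, so the ATD is 4/3.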

6. Results
The leak detection framework described in the previous sections is applied to three
different WDNs, shown in Figure 4, to demonstrate its effectiveness:
(a) Hanoi WDN (WDN1);
(b) Net3 (WDN2);
(c) C-Town Network (WDN3).
The example networks differ from each other in
terms of their size and complexity. WDN1 consists of $n_j = 31$ nodes and $n_p = 34$ pipes.
WDN2 has $n_j = 92$ nodes, $n_p = 117$ pipes, 2 reservoirs, and 3 tanks, while WDN3 has
$n_j = 388$ nodes, $n_p = 429$ pipes, 1 reservoir, and 7 tanks. The other topological details of
the networks can be found in [6–8] for WDN1, WDN2, and WDN3, respectively.

Figure 4. Network topologies of the three example networks: (a) Hanoi WDN, (b) Net3 WDN, and
(c) C-Town WDN.

We use a combination of critical measuring points (CMP) and distance from the
source for identifying the sensor location. The choice of location for the sensors in the
three networks is provided in Figure 5. The pressure and flow sensors are assumed to
be positioned in the same location. This assumption is necessary because the regression
in Stage 1 focuses on modeling the relationship between flow and pressure using a pair
of sensors.

Figure 5. Network topologies with sensor locations for the three example networks (a–c). The green
dot represents the location of a pressure sensor, while the orange line represents the presence of a
flow sensor.

6.1. Case Study 1: Hanoi WDN (WDN1)

The developed methodology is initially implemented and demonstrated on a small-
scale water distribution network. The sensors are strategically installed at seven locations
in links 4, 18, 17, 20, 27, 30, and 33 to capture pressure and flow data. Linear regression
models are then fitted to predict the measured pressure difference between each sensor
pair using the flow readings for the pair. Subsequently, a multinomial logistic regression is
fitted to classify the leak location for the labeled training dataset. Finally, the model is used
on a separate test dataset to predict the location of the leak in the 34 links of the network.
The model is evaluated for a varying number of sensors and different levels of noise in
the sensor readings.

Experiment with Varying Number of Sensors and Noise in Sensor Data

We first test the accuracy in predicting the leak location (including the identification
of the no-leak cases) for the network using three sensors (links 8, 20, and 27), five sensors
(links 4, 18, 17, 30, and 32), and seven sensors (links 4, 18, 17, 20, 27, 30, and 33) and for the
following noise levels: 0.5%, 1%, 2.5%, 5%, and 10%. The accuracy is defined as the number
of scenarios where the link with a leak was correctly identified over the total number of
scenarios. We see in Figure 6 that the best accuracy in identifying leak location is 91%, while
the lowest accuracy achieved is 79%. The accuracy increases with the number of sensors
used, although roughly similar levels are achieved with five and seven sensors. As
expected, the accuracy in identifying the leak location degrades as the noise in the sensor
data increases. However, even when 2.5% noise was added to the sensor data, the
method achieved satisfactory accuracy in leak location identification.

Figure 6. Accuracy and ATD variation with noise levels and sensor count in WDN1.

Another measure that we use to check the efficacy of the model is the ATD. The ATD
measures, on average, how far from the actual leak link the algorithm predicted the leak
location (Section 5.6). A good leak detection algorithm would have an ATD value close to
zero. We see in Figure 6 that ATD increases as we reduce the number of sensors, or increase
the level of noise in the sensor reading. We see that even for the worst case, with only three
sensors and a noise of 10% in the sensor reading, the algorithm on average predicts the
leak link within 1.3 links from the actual leak location. Notably, the ATD achieved using five
sensors is as good as that obtained using seven sensors.

6.2. Case Study 2: Net3 (WDN2)

To evaluate the scalability of the model described in the previous case study, its
application is expanded to a medium-sized network. The model is trained to detect
leaks in any of the 117 links within this network, which is relatively more complex than
WDN1. Unlike WDN1, WDN2 comprises five sources (three tanks, two reservoirs) and two
pumping stations (assumed as non-operational). For this network, a total of six sensors are
strategically installed at links 18, 31, 82, 89, 108, and 110 to record flow and pressure data.
Additionally, the model is tested with varying numbers of sensors and different levels of
noise in the sensor data.

6.2.1. Experiment with Varying Number of Sensors and Noise in Sensor Data
The accuracy in predicting the leak location (including the identification of the no-leak
cases) is tested for three sensors (links 31, 89 and 110), five sensors (links 18, 31, 82, 89
and 108) and six sensors (links 18, 31, 82, 89, 108, and 110) and for noise levels of 0.5%,
1%, 2.5%, 5%, and 10%. From Figure 7, we observe that the highest accuracy achieved in
identifying leak locations is 79%, while the lowest accuracy obtained is 49%. The accuracy
improves as the number of sensors used increases, although roughly similar levels of accuracy are
attained with five and six sensors. As the level of noise in the sensor data increases,
the accuracy in identifying leak locations declines, as expected. However, even when
1% noise was introduced to the sensor data, the model was still able to achieve satisfactory
accuracy in identifying leak locations.

Figure 7. Accuracy and ATD variation with noise levels and sensor count in WDN2.

We can observe from Figure 7 that as the number of sensors decreases or the level of
noise in the sensor readings increases, the ATD increases. The ATD in the best scenarios
(with five and six sensors) ranges between 2 and 2.3. Even in one of the worst-case scenarios,
with only three sensors and a 2.5% noise level in the sensor readings, the algorithm achieves
an average prediction of the leak link within a distance of 3.8 links from the actual leak
location. Importantly, the ATD attained using five sensors is comparable to the results
obtained with six sensors, highlighting the effectiveness of the algorithm with a reduced
sensor count.

6.3. Case Study 3: C-Town Network (WDN3)

The framework is deployed in a larger benchmark network with increased complexity,
consisting of one reservoir, seven tanks (eight sources), and three valves (considered fully
opened). Two test cases are conducted, involving different numbers of sensors (10 and
5). In the ten-sensor case, the sensors are placed at links 423, 250, 363, 228, 218, 74, 166,
407, 415, and 136. The accuracy and ATD are then
compared across five distinct levels of noise. The model is employed to predict
the location of the leak among the 429 links of the network and is evaluated for a
varying number of sensors and different levels of noise in the sensor readings.

6.3.1. Experiment with Varying Numbers of Sensors and Noise in Sensor Data
The accuracy in predicting leak locations (including the identification of no-leak
cases) is tested using five sensors (links 166, 72, 228, 250, and 423) and ten
sensors (links 423, 250, 363, 228, 218, 74, 166, 407, 415, and 136) for the noise levels
defined in Section 6.2.1. The highest accuracy achieved when locating leaks is 30%,
while the lowest accuracy observed is 10%, as shown in Figure 8. The accuracy improves with
an increase in the number of sensors. Additionally, a decline in the accurate identification of
leak locations is observed as the noise levels in the sensor data increase. Even with minimal
noise levels, the accuracy is affected, dropping to 25%.

Figure 8. Accuracy and ATD variation with noise levels and sensor count in WDN3.

From Figure 8, it is observed that as the number of sensors decreases or the level of
noise in the sensor readings increases, there is an increase in ATD. In the best-case scenarios
(with ten sensors), the ATD ranges from 8 to 12. Even in one of the worst-case scenarios,
where only five sensors are used and the sensor readings have a noise level of 0.5%, the
algorithm achieves an average prediction of the leak location within a distance of 12 links
from the actual leak location.

6.3.2. Comparative Analysis of the Results from Three Networks

The results obtained using the framework are compared across the three networks, as shown in Figure 9.

Figure 9. Comparison of the accuracy achieved across three distinct water networks.

It is evident from the figure that as the size and complexity of the network increase,
the accuracy obtained decreases. The highest accuracy obtained for the
three networks WDN1, WDN2, and WDN3 is 92%, 79%, and 34%, respectively.
While the accuracy obtained in identifying the leak location in the larger network is low,
a closer look at the predicted leak locations shows that the method still provides useful
insights. Figure 10 illustrates a few examples of the true leak location and the corresponding
predicted link with a leak. For our accuracy measurement, we count as a correct classification
only those scenarios where the link with a leak is identified exactly.

Figure 10. Comparison between actual and predicted leak locations in WDN3.

If we instead specify a tolerance, a prediction is classified as a correct identification when the
predicted link with a leak is within the given topological distance of the true leak location.
In order to evaluate the accuracy of the model in classifying the leak location more
comprehensively, we count as correct the cases where the predicted link with a leak is within a
topological distance of 2, 5, 8, 10, and 20 links from the true leak location. Figure 11 shows that
even for a significant level of noise in the sensor reading, using ten sensors, the algorithm
is able to predict leaks within a topological distance of 10 links. A practical use case is
to identify the leak location coarsely using a few sensors and then pinpoint the actual
location with additional instrumentation around this region.

Figure 11. Accuracy of the leak location classification of the WDN3, corresponding to different levels
of tolerance in topological distance between the true and the predicted link with a leak.
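The tolerance-based accuracy described above can be sketched in a few lines of Python. The list of topological distances is hypothetical, and treating a prediction as correct when its distance is less than or equal to the tolerance is an assumption about the boundary case.

```python
def accuracy_within_tolerance(distances, tolerance):
    """Share of scenarios whose predicted link lies within the tolerance."""
    hits = sum(1 for d in distances if d <= tolerance)
    return hits / len(distances)


# Hypothetical topological distances between true and predicted leak links.
distances = [0, 1, 3, 7, 12, 25, 0, 4]

for tolerance in (0, 2, 5, 8, 10, 20):
    print(tolerance, accuracy_within_tolerance(distances, tolerance))
```

A tolerance of 0 reproduces the exact-match accuracy used in the earlier results, and the accuracy can only stay the same or increase as the tolerance grows.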

7. Conclusions
We have developed an algorithm to detect leaks in water distribution networks using a
two-stage methodology, where in the first stage we learn the relationship between the flow
and pressure readings of the installed sensors in the network with no leaks. In the second
stage, using a labeled dataset of leak location, we fit a multinomial logistic regression model
using as input the change in the error distribution of the predicted head between sensor
pairs. The premise of our approach is based on the observation that the introduction of a
leak alters the distributional properties of hydraulic parameters, specifically pressure and
flow rates. By measuring the flow characteristics at specific locations in the network, we aim
to distinguish between different leak locations or the absence of a leak by mapping changes
in a high-dimensional distribution to a low-dimensional error distribution. Introduction of
a leak will result in a discernible change in the error distribution, enabling us to locate the
leaks effectively.
By utilizing a simulated dataset, we implemented our approach on three benchmark
networks, namely WDN1, WDN2, and WDN3. The results demonstrated that our model
exhibits reasonably good accuracy in classifying leak locations within these networks.
Specifically, for WDN1 and WDN2, we observed that our approach successfully identified
leaks even when they had a small orifice area. However, as the noise in the data increased
and the number of sensors decreased, the method occasionally resulted in misclassification.
Nevertheless, it is worth noting that even when the number of sensors was slightly reduced,
the model maintained a relatively high level of accuracy.
In contrast, when dealing with the larger water network WDN3, the accuracy
of our approach was significantly compromised. It became evident that the accuracy
did not hold up well as the network size increased. We observed a substantial number of
misclassifications in identifying the specific links where leaks occurred. However,
as the predicted leak links were in the close neighbourhood of the true leak location,
we incorporated a tolerance factor in classifying whether the leak location has been correctly
identified. This means that even if the identified leak location was in a nearby link rather
than the exact one, it would still be considered an acceptable classification. With this
modification, the accuracy of identifying a leak in the network was found to be close to 60%,
even in the presence of high levels of noise in the sensor readings. Therefore, while the
method cannot precisely identify the leak location, we show that it is able to identify the
area within which the leak is present.
In summary, our algorithm exhibited promising performance in classifying leak lo-
cations in the benchmark networks, demonstrating its potential usefulness in practical
scenarios. Despite some challenges posed by increased noise and reduced sensor numbers,
the model’s accuracy remained satisfactory, showcasing its robustness to certain variations
in the data. By creating a labeled dataset of controlled leaks in the links of a real WDN, the
presented approach can be used to learn to locate new leaks in the network. As a future
step, the two-stage methodology will be applied to a real-life network, which will help
validate the assumptions of the simulation-based model.

Author Contributions: V.T. played a significant role in shaping the methodology, conducting formal
analysis, and validating the results. P.P. was primarily responsible for crafting the initial draft and
overseeing its revision. S.J. played a key role in conceptualizing the methodology and validating the
results. Furthermore, he made remarkable contributions in writing, reviewing, and editing the main
draft. P.R. provided invaluable technical expertise, resource coordination, problem formulation, and
meticulous review and editing of the final draft. All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: No data were utilized in this study. Therefore, data availability is
not applicable.
Acknowledgments: The authors would like to thank MeITY, India, for providing financial support
to implement the “DP-Trans (Digital twin for Pipeline TRANSport Network) project”.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Raj, K. Sustainable urban habitats and urban water supply: Accounting for unaccounted for water in Bangalore City, India. Curr.
Urban Stud. 2013, 1, 156. [CrossRef]
2. Mashhadi, N.; Shahrour, I.; Attoue, N.; El Khattabi, J.; Aljer, A. Use of machine learning for leak detection and localization in
water distribution systems. Smart Cities 2021, 4, 1293–1315. [CrossRef]
3. Fan, X.; Yu, X. An innovative machine learning based framework for water distribution network leakage detection and localization.
Struct. Health Monit. 2022, 21, 1626–1644. [CrossRef]
4. Fares, A.; Tijani, I.; Rui, Z.; Zayed, T. Leak detection in real water distribution networks based on acoustic emission and machine
learning. Environ. Technol. 2022, 1–17. [CrossRef]
5. Kammoun, M.; Kammoun, A.; Abid, M. Experiments based comparative evaluations of machine learning techniques for leak
detection in water distribution systems. Water Supply 2022, 22, 628–642. [CrossRef]
6. Fujiwara, O.; Khang, D.B. A two-phase decomposition method for optimal design of looped water distribution networks. Water
Resour. Res. 1990, 26, 539–549. [CrossRef]
7. Bashi-Azghadi, S.N.; Afshar, M.H.; Afshar, A. Multi-objective optimization response modeling to contaminated water distribution
networks: Pressure driven versus demand driven analysis. KSCE J. Civ. Eng. 2017, 21, 2085–2096. [CrossRef]
8. Sousa, J.; Muranho, J.; Sá Marques, A.; Gomes, R. Optimal management of water distribution networks with simulated annealing:
The c-town problem. J. Water Resour. Plan. Manag. 2016, 142, C4015010. [CrossRef]
9. Mashford, J.; De Silva, D.; Marney, D.; Burn, S. An approach to leak detection in pipe networks using analysis of monitored
pressure values by support vector machine. In Proceedings of the 2009 Third International Conference on Network and System
Security, Gold Coast, QLD, Australia, 19–21 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 534–539.
10. Wu, Y.; Liu, S. A review of data-driven approaches for burst detection in water distribution systems. Urban Water J. 2017,
14, 972–983. [CrossRef]
11. Moser, G.; Paal, S.G.; Smith, I.F. Leak detection of water supply networks using error-domain model falsification. J. Comput. Civ.
Eng. 2018, 32, 04017077. [CrossRef]
12. Van der Walt, J.; Heyns, P.S.; Wilke, D.N. Pipe network leak detection: Comparison between statistical and machine learning
techniques. Urban Water J. 2018, 15, 953–960. [CrossRef]
13. Zhou, X.; Tang, Z.; Xu, W.; Meng, F.; Chu, X.; Xin, K.; Fu, G. Deep learning identifies accurate burst locations in water distribution
networks. Water Res. 2019, 166, 115058. [CrossRef] [PubMed]
14. Sophocleous, S.; Savić, D.; Kapelan, Z. Leak localization in a real water distribution network based on search-space reduction. J.
Water Resour. Plan. Manag. 2019, 145, 04019024. [CrossRef]
15. Huang, Y.; Zheng, F.; Kapelan, Z.; Savic, D.; Duan, H.F.; Zhang, Q. Efficient leak localization in water distribution systems using
multistage optimal valve operations and smart demand metering. Water Resour. Res. 2020, 56, e2020WR028285. [CrossRef]
16. Fan, X.; Zhang, X.; Yu, X.B. Machine learning model and strategy for fast and accurate detection of leaks in water supply network.
J. Infrastruct. Preserv. Resil. 2021, 2, 1–21. [CrossRef]
17. Guo, Z.; Leitao, J.P.; Simões, N.E.; Moosavi, V. Data-driven flood emulation: Speeding up urban flood predictions by deep
convolutional neural networks. J. Flood Risk Manag. 2021, 14, e12684. [CrossRef]
18. Li, Z.; Wang, J.; Yan, H.; Li, S.; Tao, T.; Xin, K. Fast Detection and Localization of Multiple Leaks in Water Distribution Network
Jointly Driven by Simulation and Machine Learning. J. Water Resour. Plan. Manag. 2022, 148, 05022005. [CrossRef]
19. Huang, L.; Du, K.; Guan, M.; Huang, W.; Song, Z.; Wang, Q. Combined Usage of Hydraulic Model Calibration Residuals and
Improved Vector Angle Method for Burst Detection and Localization in Water Distribution Systems. J. Water Resour. Plan. Manag.
2022, 148, 04022034. [CrossRef]
20. Daniel, I.; Pesantez, J.; Letzgus, S.; Khaksar Fasaee, M.A.; Alghamdi, F.; Berglund, E.; Mahinthakumar, G.; Cominola, A. A
Sequential Pressure-Based Algorithm for Data-Driven Leakage Identification and Model-Based Localization in Water Distribution
Networks. J. Water Resour. Plan. Manag. 2022, 148, 04022025. [CrossRef]
21. Hu, Z.; Chen, B.; Chen, W.; Tan, D.; Shen, D. Review of model-based and data-driven approaches for leak detection and location
in water distribution systems. Water Supply 2021, 21, 3282–3306. [CrossRef]
22. Simpson, A.; Elhay, S. Jacobian matrix for solving water distribution system equations with the Darcy-Weisbach head-loss model.
J. Hydraul. Eng. 2011, 137, 696–700. [CrossRef]
23. Rossman, L.A. EPANET 2. In Users Manual; US Environmental Protection Agency (EPA): Washington, DC, USA, 2000.
24. Crowl, D.A.; Louvar, J.F. Chemical Process Safety: Fundamentals with Applications; Pearson Education: London, UK, 2001.
25. Klise, K.; Hart, D.; Bynum, M.; Hogge, J.; Haxton, T.; Murray, R.; Burkhardt, J. Water Network Tool for Resilience (WNTR) User
Manual; Technical Report; Sandia National Lab. (SNL-NM): Albuquerque, NM, USA, 2020.
26. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2021.
27. Hagberg, A.; Conway, D. Networkx: Network Analysis with Python. 2020. Available online: https://networkx.github.io
(accessed on 8 January 2010).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

No ratings yet
Machine Design by S K Mondal
81 pages
SWFGD Cy Aiche
No ratings yet
SWFGD Cy Aiche
8 pages
LFTC Flanged Bearing Dimensions.
No ratings yet
LFTC Flanged Bearing Dimensions.
60 pages
Statistics: Cambridge International Examinations General Certificate of Education Ordinary Level
No ratings yet
Statistics: Cambridge International Examinations General Certificate of Education Ordinary Level
12 pages
1, General
No ratings yet
1, General
4 pages
Rsa Algorithm
100% (1)
Rsa Algorithm
3 pages
Chemistry Notes Vtu
67% (3)
Chemistry Notes Vtu
160 pages
Multiferroics
No ratings yet
Multiferroics
18 pages
Crankcase Pressure Regulator
No ratings yet
Crankcase Pressure Regulator
10 pages
Relative Valuation
No ratings yet
Relative Valuation
96 pages
Icecce49384 2020 9179470
No ratings yet
Icecce49384 2020 9179470
5 pages
Throughput Enhancement of IEEE 802.11 WLAN For Next Generation Communications
100% (1)
Throughput Enhancement of IEEE 802.11 WLAN For Next Generation Communications
68 pages
Resine Ip 2015-04 - Version3
No ratings yet
Resine Ip 2015-04 - Version3
2 pages
Serres and Hallward - The Science of Relations - An Interview
No ratings yet
Serres and Hallward - The Science of Relations - An Interview
13 pages


Water 2023, 15, 2710. https://doi.org/10.3390/w15152710 https://www.mdpi.com/journal/water

where $b = 1, 2$. The unknown heads at different nodes are defined as $h = (H_1, \ldots, H_{n_j})^{\top}$,

3.1. Simulation: Base Scenarios

3.2. Simulation: Scenarios with a Leak

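$\mathbb{E}_{y \sim P_{\text{base}}}\left[\Delta h_{ij} \mid Q_i, Q_j\right] = \beta_0 + \beta_1 Q_i + \beta_2 Q_i^2 + \beta_3 Q_j + \beta_4 Q_j^2,$

$e_{ij} = \mathbb{E}_{y \sim P_{\text{base}}}\left[\Delta h_{ij} \mid Q_i, Q_j\right] - \Delta h_{ij}.$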
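
The two expressions above can be illustrated with a minimal sketch, assuming ordinary least squares via NumPy on synthetic flow data. The function names (`design_matrix`, `fit_head_difference`, `residuals`), the coefficient values, and the size of the leak-induced shift are hypothetical and are not taken from the paper.

```python
import numpy as np

def design_matrix(q_i, q_j):
    """Regressors of the base model: 1, Q_i, Q_i^2, Q_j, Q_j^2."""
    q_i = np.asarray(q_i, dtype=float)
    q_j = np.asarray(q_j, dtype=float)
    return np.column_stack([np.ones_like(q_i), q_i, q_i**2, q_j, q_j**2])

def fit_head_difference(q_i, q_j, dh_ij):
    """Least-squares estimate of beta_0..beta_4 from leak-free (base) samples."""
    X = design_matrix(q_i, q_j)
    beta, *_ = np.linalg.lstsq(X, np.asarray(dh_ij, dtype=float), rcond=None)
    return beta

def residuals(beta, q_i, q_j, dh_ij):
    """e_ij = predicted head difference minus observed head difference."""
    return design_matrix(q_i, q_j) @ beta - np.asarray(dh_ij, dtype=float)

# Synthetic, illustrative data only (assumed values, not from the paper).
rng = np.random.default_rng(0)
q_i = rng.uniform(1.0, 5.0, size=200)
q_j = rng.uniform(1.0, 5.0, size=200)
true_beta = np.array([0.5, 0.2, 0.03, -0.1, -0.02])
dh_base = design_matrix(q_i, q_j) @ true_beta

# Fit on base (no-leak) data, then compute residuals for both cases.
beta_hat = fit_head_difference(q_i, q_j, dh_base)
e_base = residuals(beta_hat, q_i, q_j, dh_base)

# Hypothetical leak: the observed head difference shifts by a constant.
dh_leak = dh_base + 0.4
e_leak = residuals(beta_hat, q_i, q_j, dh_leak)
```

In this toy setting the base residuals are essentially zero, while the leak shifts the residual distribution away from zero. A shift of this kind in the error distribution is what a second-stage classifier would look for.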

4.1. Identifying the Occurrence of Leaks

4.2. Identifying the Location of Leaks

4.3. Summary of the Method

Step 1: Draw $N_{\text{base}}$ demands from the multivariate normal distribution.

Generating labeled leak scenarios:

Figure 1. Process flow of linear regression and logistic regression.

5. Application of the Methodology on Water Distribution Networks

(a) $\Delta h_{i,j}$ observed vs. predicted.

(b) Distribution of $e_{i,j}$ observed

5.3. Impact of Leaks on Residual Errors

(a) $\Delta h_{i,j}$ observed vs. predicted.

(b) Distribution of $e_{i,j}$ for the no-leak and leak cases

5.5. Impact of Noisy Sensor Data

5.6. Error in Terms of Topological Distance

6.1. Case Study 1: Hanoi WDN (WDN1)

Experiment with Varying Number of Sensors and Noise in Sensor Data

6.2. Case Study 2: Net3 (WDN2)

6.3. Case Study 3: C-Town Network (WDN3)

6.3.2. Comparative Analysis of the Results from Three Networks
