
electronics

Article
AI-Driven High-Precision Model for Blockage Detection in
Urban Wastewater Systems
Ravindra R. Patil 1, * , Rajnish Kaur Calay 1 , Mohamad Y. Mustafa 1 and Saniya M. Ansari 2

1 Faculty of Engineering Science and Technology, UiT the Arctic University of Norway, 8514 Narvik, Norway;
[email protected] (R.K.C.); [email protected] (M.Y.M.)
2 Department of E & TC Engineering, Ajeenkya D Y Patil School of Engineering, Pune 411047, India
* Correspondence: [email protected]

Abstract: In artificial intelligence (AI), computer vision consists of intelligent models to interpret and
recognize the visual world, similar to human vision. This technology relies on a synergy of extensive
data and human expertise, meticulously structured to yield accurate results. Tackling the intricate
task of locating and resolving blockages within sewer systems is a significant challenge due to their
diverse nature and the lack of robust detection techniques. This research utilizes the previously introduced “S-BIRD”
dataset, a collection of frames depicting sewer blockages, as the foundational training data for a deep
neural network model. To enhance the model’s performance and attain optimal results, transfer
learning and fine-tuning techniques are strategically implemented on the YOLOv5 architecture, using
the corresponding dataset. The outcomes of the trained model exhibit a remarkable accuracy rate
in sewer blockage detection, thereby boosting the reliability and efficacy of the associated robotic
framework for proficient removal of various blockages. Particularly noteworthy is the achieved mean
average precision (mAP) score of 96.30% at a confidence threshold of 0.5, maintaining a consistently
high-performance level of 79.20% across Intersection over Union (IoU) thresholds ranging from 0.5 to
0.95. It is expected that this work contributes to advancing the applications of AI-driven solutions for
modern urban sanitation systems.

Keywords: AI; object detection; S-BIRD dataset; computer vision; transfer learning; YOLOv5; wastewater management

Citation: Patil, R.R.; Calay, R.K.; Mustafa, M.Y.; Ansari, S.M. AI-Driven High-Precision Model for Blockage Detection in Urban Wastewater Systems. Electronics 2023, 12, 3606. https://doi.org/10.3390/electronics12173606

Academic Editors: Dong Zhang and Dah-Jye Lee

Received: 10 August 2023; Revised: 22 August 2023; Accepted: 24 August 2023; Published: 26 August 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Computer vision is a field of artificial intelligence (AI) with its own conventional algorithms that extract required information from various visual forms such as photos and videos and, based on that information, perform actions or make recommendations in order to detect and identify distinct objects. Thus, large datasets should increase the performance properties of computer vision.

Object detection techniques of computer vision detect the occurrence of objects in an image or video with bounding boxes and identify their classes. Initially, machine learning was mainly used for object detection tasks, but when deep neural networks, i.e., deep learning methods, emerged, they became popular due to automatic representative feature extraction from large datasets for training purposes [1]. Occlusion, clutter, and low resolution are some of the sub-problems that are handled very efficiently by deep learning-based detection frameworks [2,3]. There are two method types: single-stage, which prioritizes inference speed and real-time use, and two-stage, which prioritizes model performance, i.e., detection accuracy. Single-stage detectors remove the process of region of interest (ROI) extraction and move directly to classification and regression, whereas two-stage detectors extract ROIs and then apply classification and regression. The YOLO detection models (YOLOv2 [4], YOLOv3 [5], YOLOv4 [6], and YOLOv5 [7]), SSD [8], CenterNet [9], CornerNet [10], etc., are some single-stage detectors. Region proposal models (R-CNN [11], Fast-RCNN [12], Faster
RCNN [13], Cascade R-CNN [14], and R-FCN [15]) are two-stage detectors. Classification
and localization accuracy and inference speed are two important metrics for object detec-
tors. In the advancement of detection models, transfer learning techniques with quality
datasets meet the requirements with a minimum training time [16,17]. Transfer learning
harnesses prior knowledge to enhance performance on novel tasks. By fine-tuning, pre-
trained deep neural models are adapted to new contexts with certain layers preserved and
others refined. This leads to many advantages such as achieving quick convergence, good
performance, and adaptability in real-world scenarios. As the applications of AI evolve,
such as video surveillance, military applications, security aspects, health monitoring, and
critical detection tasks, the AI techniques are being enhanced to suit these needs.
To address application-based needs and produce sensible and accurate results, detection models need to be adapted and modified, which usually entails heavy computational demands. However, methods such as the embedded vision approach with AI can enable real-time, efficient, and intelligent visual processing directly on edge devices, which reduces dependency on cloud computing and enhances privacy and responsiveness in many applications [18,19].
Detecting various sewer blockages is a major challenge due to their complex and
heterogeneous nature. Moreover, their locations in the sewer network may vary, including
main lines, lateral connections, and junctions. Blockages can exhibit varying levels of
severity, from partial restrictions that gradually reduce flow to complete blockages that
cause sewer overflows. The dynamic and unpredictable nature of urban wastewater
systems, influenced by factors such as climate, wastewater composition, and hydraulic
conditions adds another layer of complexity. In this research work, transfer learning
and fine-tuning techniques are utilized to achieve a high precision rate in the detection
of blockages within urban wastewater systems. This approach is intended for real-time
implementation on mobile devices and other environments with limited resources, with the
goal of effectively removing such blockages. Our primary emphasis is on the training of the
single-stage YOLOv5 model using the S-BIRD dataset [20,21], which contains representative
and critical multi-class images depicting prevalent sewer blockage scenarios.
The study implements all computer vision and model training procedures using
Python programming, OpenCV, PyTorch framework, and other machine learning libraries.
These operations are carried out on a DGX GPU workstation system running on the
Linux platform, ensuring a robust and efficient experimental environment. The results are
analyzed and discussed to demonstrate the effectiveness of the methodology used.

2. Structural Insights of YOLOv5 Model


YOLOv5 is an anchor-based single-stage detection model, which is built on the PyTorch
framework. It focuses on simplicity, model scaling, and transfer learning, making it versatile
for a wide range of object detection tasks. The model’s backbone is CSP Darknet-53, which
incorporates Cross Stage Partial (CSP) connections to enhance information flow and feature
representation.
To create feature pyramids for effective object scaling and generalization, YOLOv5
employs the Path Aggregation Network (PAN) as its neck. The head design utilizes anchor
boxes to generate output vectors that contain class probabilities, objectness scores, and
bounding box coordinates (center_x, center_y, height, and width). The model parameters
are updated during training using the following loss function:

Loss = λ1 ∗ L_cls + λ2 ∗ L_obj + λ3 ∗ L_loc (1)

where L_cls represents the Binary Cross Entropy loss for predicted classes, L_obj represents the Binary Cross Entropy loss for objectness scores, and L_loc represents the Complete Intersection over Union (CIoU) loss for bounding box locations. Here, λ1, λ2, and λ3 are hyperparameters controlling the contribution of each component to the overall loss. The employed auto-anchor routine automatically determines and generates anchor boxes based on the distribution of bounding boxes in the custom dataset using K-means clustering and a genetic learning algorithm. The SiLU (Sigmoid Linear Unit) activation function in the hidden layers captures intricate details, while the Sigmoid activation function in the output layer handles binary classification.
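To make the weighted composition in Equation (1) concrete, the following minimal PyTorch sketch combines the three terms. It is only an illustrative sketch: the λ defaults, tensor shapes, and the simplified CIoU helper are assumptions, not the exact Ultralytics implementation.

```python
import math
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # used for both the class and the objectness terms

def complete_iou(box1, box2, eps=1e-7):
    """Simplified CIoU for boxes given as (center_x, center_y, width, height), shape (N, 4)."""
    # Corner coordinates of both sets of boxes
    b1_x1, b1_y1 = box1[:, 0] - box1[:, 2] / 2, box1[:, 1] - box1[:, 3] / 2
    b1_x2, b1_y2 = box1[:, 0] + box1[:, 2] / 2, box1[:, 1] + box1[:, 3] / 2
    b2_x1, b2_y1 = box2[:, 0] - box2[:, 2] / 2, box2[:, 1] - box2[:, 3] / 2
    b2_x2, b2_y2 = box2[:, 0] + box2[:, 2] / 2, box2[:, 1] + box2[:, 3] / 2
    # Plain IoU
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)
    union = box1[:, 2] * box1[:, 3] + box2[:, 2] * box2[:, 3] - inter + eps
    iou = inter / union
    # Center-distance penalty, normalised by the diagonal of the enclosing box
    cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)
    ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)
    rho2 = (box1[:, 0] - box2[:, 0]) ** 2 + (box1[:, 1] - box2[:, 1]) ** 2
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency penalty
    v = (4 / math.pi ** 2) * (torch.atan(box2[:, 2] / (box2[:, 3] + eps)) -
                              torch.atan(box1[:, 2] / (box1[:, 3] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

def composite_loss(pred_cls, true_cls, pred_obj, true_obj, pred_box, true_box,
                   lam_cls=0.5, lam_obj=1.0, lam_loc=0.05):   # illustrative weights only
    """Weighted sum of class, objectness, and localization losses, mirroring Equation (1)."""
    l_cls = bce(pred_cls, true_cls)                            # BCE over class logits
    l_obj = bce(pred_obj, true_obj)                            # BCE over objectness logits
    l_loc = (1.0 - complete_iou(pred_box, true_box)).mean()    # CIoU box-regression loss
    return lam_cls * l_cls + lam_obj * l_obj + lam_loc * l_loc
```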
As shown in Figure 1, the backbone employs Convolutional and C3 layers to extract image features, which are then combined at various levels using Conv, Upsample, Concat, and C3 layers in the head. The object detection process is facilitated by a Detect layer that uses anchor boxes and the indicated class count. In particular, each C3 (CSP-3) block consists of two parallel convolutional streams: the first channels the input features through a bottleneck layer, compressing the information, while the second outputs the features directly. These streams are then concatenated and processed through pooling and convolutional layers. The C3 blocks also use skip connections and attention mechanisms to enhance information flow and reduce noisy features.

Figure 1. Architectural perception of YOLOv5 model.
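The C3 structure just described can be sketched in a few lines of PyTorch. This is a simplified reading of the block, assuming one plausible bottleneck arrangement; the attention mechanism and the exact channel widths of the Ultralytics modules are not reproduced.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Convolution -> BatchNorm -> SiLU, the basic unit used throughout the network."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """1x1 compression followed by a 3x3 convolution, with a residual (skip) connection."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = ConvBlock(c, c, 1)
        self.cv2 = ConvBlock(c, c, 3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))

class C3(nn.Module):
    """Two parallel streams: bottlenecked features and a direct 1x1 path, concatenated and fused."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_mid = c_out // 2
        self.stream1 = nn.Sequential(ConvBlock(c_in, c_mid, 1),
                                     *[Bottleneck(c_mid) for _ in range(n)])
        self.stream2 = ConvBlock(c_in, c_mid, 1)        # direct feature stream
        self.fuse = ConvBlock(2 * c_mid, c_out, 1)       # concatenate and fuse

    def forward(self, x):
        return self.fuse(torch.cat((self.stream1(x), self.stream2(x)), dim=1))

x = torch.randn(1, 64, 52, 52)
print(C3(64, 128, n=2)(x).shape)   # torch.Size([1, 128, 52, 52])
```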

3. Details of Training Instances in Critical Multi-Class S-BIRD

The dataset comprises a total of 14,765 training frames of three classes (grease, plastics, and tree roots), which are meticulously annotated with 69,061 objects, as shown in Figure 2, resulting in an average of 4.7 annotations per frame. Specifically, the dataset comprises 26,847 annotations for grease, 21,553 annotations for tree roots, and 20,661 annotations for plastics. To ensure uniformity and standardization, the frames were preprocessed and augmented, resulting in an average frame size of 0.173 megapixels. The frames were resized to 416 × 416 pixels, thereby maintaining a 1:1 aspect ratio for every class. The angle of the diagonal was calculated to be 0.785 radians (equivalent to 45 degrees), with the diagonal length measuring 588 pixels.
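The statistics quoted above are mutually consistent, which a few lines of Python confirm; this is only a sanity check on the reported numbers, not part of the training pipeline.

```python
import math

total_frames = 14765
total_annotations = 26847 + 21553 + 20661          # grease + tree roots + plastics
print(total_annotations)                            # 69061 annotated objects
print(round(total_annotations / total_frames, 1))   # ~4.7 annotations per frame

side = 416                                          # resized frame edge in pixels
print(round(side * side / 1e6, 3))                  # ~0.173 megapixels per frame
print(round(math.hypot(side, side)))                # diagonal length ≈ 588 pixels
print(round(math.atan2(side, side), 3))             # diagonal angle ≈ 0.785 rad (45°)
```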
Figure 2. Labelling details of training instances from dataset.

Regarding pixel density, the dataset exhibits a density of 12 pixels per millimeter, or 290 pixels per inch. These specific computational details are vital for understanding the characteristics and intricacies of the S-BIRD dataset, which plays a crucial role in effectively training the deep neural network. Figure 3 illustrates the distribution of object classes in each training frame based on the center x coordinate for the S-BIRD dataset, showing the relative distribution of center x coordinates across different classes during training. Each segment is color-coded and displays data values and percentiles, providing a clear understanding of object positions along the x-axis. This section provides valuable insights into the dataset's dimensions, resolutions, and geometric properties, which contribute to the successful implementation of transfer learning and fine-tuning techniques for the deep neural detection model.

Figure 3. Object classes in each training frame by center x.
4. Training Method and Evaluation

The training process for the YOLOv5-s model (based on PyTorch 1.10.0a0 with CUDA support) on the S-BIRD dataset involved a series of steps aimed at achieving the highest precision in detecting sewer blockages. Through the application of transfer learning
and fine-tuning techniques, the model’s formulation was optimized to suit the specific
characteristics of the representative dataset, enabling its effective adaptation for real-world
scenarios. To facilitate the training process, annotations for object classes were applied in
PyTorch TXT format, as needed. The training process was performed over 6000 epochs,
using the stochastic gradient descent (SGD) optimizer with specified hyperparameters. The
training process utilized the configurations listed in Table 1. The DGX-1 (with a 32 GB GPU card) available at UiT Narvik, running a Docker container with a defined image, served as the training platform, leveraging GPU parallelization for faster computations.
Overfitting was mitigated using Early Stopping with a patience of 100 epochs.

Table 1. Principal training configurations.

Attributes Implications
learning model YOLOv5-s
Annotation data type PyTorch TXT
max_epoch 6000
patience 100
batch_size 16
fp16 True
num_classes 3
Params 7.2 M
Gflops 15.9
depth 0.33
width 0.5
input_size (416, 416)
workers 8
anchor_t 4.0
scale 0.5
hsv_h, hsv_s, hsv_v 0.015, 0.7, 0.4
warmup_epochs 3
weight_decay 0.0005
momentum 0.937
translate 0.1
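The optimizer and early-stopping attributes in Table 1 interact as sketched below. This condensed loop is only a schematic, assuming a model that returns the composite loss of Equation (1) and a user-supplied validation callback; the actual training used the Ultralytics YOLOv5 training script rather than this code.

```python
import torch

def fit(model, train_loader, validate, max_epochs=6000, patience=100):
    """SGD training with early stopping, mirroring the momentum, weight decay, and patience of Table 1."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.937, weight_decay=0.0005)
    best_val, best_state, stale_epochs = float("inf"), None, 0

    for epoch in range(max_epochs):
        model.train()
        for images, targets in train_loader:
            loss = model(images, targets)      # assumed to return the composite loss of Eq. (1)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        val_loss = validate(model)             # validation pass after every epoch
        if val_loss < best_val:                # improvement: remember the best weights
            best_val, stale_epochs = val_loss, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:                                  # no improvement for `patience` epochs: stop
            stale_epochs += 1
            if stale_epochs >= patience:
                break

    model.load_state_dict(best_state)          # keep the best-performing checkpoint
    return model
```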

The training progression concluded at 933 epochs due to a lack of improvement in the last 100 epochs. The most promising results were obtained at epoch 832, leading to the
selection of the corresponding model for practical applications. The evaluation metrics
are essential for quantifying the model’s performance, and they are computed using the
following formulas:
Precision = TP/(TP + FP) (2)

Recall = TP/(TP + FN) (3)

mAP = ∑(AP for each class)/Number of classes (4)

F1 score = 2 ∗ (Precision ∗ Recall)/(Precision + Recall) (5)


Here, TP—true positive, FP—false positive, FN—false negative, and mAP—mean
average precision.
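Equations (2)–(5) translate directly into code. The short sketch below also re-derives the reported mAP at IoU 0.5 from the per-class average precisions in Figure 4 as a consistency check; the F1 inputs are the precision and recall values quoted in the next paragraph.

```python
def precision(tp, fp):
    return tp / (tp + fp)                              # Equation (2)

def recall(tp, fn):
    return tp / (tp + fn)                              # Equation (3)

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)       # Equation (4)

def f1_score(p, r):
    return 2 * p * r / (p + r)                         # Equation (5)

# Consistency check against the reported per-class AP values (grease, plastic, tree roots)
print(round(mean_average_precision([0.959, 0.984, 0.945]), 4))   # 0.9627 -> the 96.30% mAP@0.5
print(round(f1_score(0.944, 0.939), 3))                          # 0.941 -> the ~94% F1 score
```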
During the training, at epoch 832, the model exhibited impressive precision (P) and
recall (R) values of 94.40% and 93.90%, respectively, across all classes. Notably, Figure 4
illustrates that the developed detection model achieved outstanding average precision
values of 95.90% for grease blocks, 98.40% for plastic blocks, and 94.50% for tree root blocks.
These high precision values are indicative of the model’s ability to accurately detect and
classify instances belonging to these specific classes. The overall mean average precision
(mAP) for all classes, as indicated in Table 3, is remarkably high at 96.30% with a confidence threshold of 0.5. This highlights the model's proficiency in making precise detections across all classes within the dataset. Moreover, the calculated mAP over various Intersection over Union (IoU) thresholds, ranging from 0.5 to 0.95 with an increment of 0.05, yielded a consistent performance of 79.20%. This demonstrates that the model maintains accurate localization of objects across a broad range of IoU thresholds. The timing results in Table 2 show that the model has efficient inference times, with an average forward time of 0.2 ms, an average NMS time of 1.1 ms, and an average inference time of 11 ms. These low inference times make the model suitable for real-time applications.
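The non-maximum suppression (NMS) stage whose average latency is reported above can be illustrated with torchvision's built-in operator; the boxes and scores below are made-up placeholders used only to show the mechanics.

```python
import torch
from torchvision.ops import nms

# Hypothetical raw detections: boxes as (x1, y1, x2, y2) pixel corners with confidence scores.
boxes = torch.tensor([[ 50.,  60., 180., 200.],
                      [ 55.,  62., 185., 205.],     # heavy overlap with the first box
                      [240., 100., 400., 260.]])
scores = torch.tensor([0.91, 0.62, 0.88])

keep = nms(boxes, scores, iou_threshold=0.45)        # drop overlapping, lower-score boxes
print(keep)                                          # tensor([0, 2]) — the duplicate is removed
```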

Figure 4. Obtained higher precision rate for each class.

Table 2. Temporal evaluation details.

Timing Attributes          Outturns (Milliseconds)
Average forward time       0.2 ms
Average NMS time           1.1 ms
Average inference time     11 ms

Table 3. Precision assessment details.

Object Class       Average Precision       map_5095       map_50
tree roots         0.945                   –              –
grease             0.959                   –              –
plastic            0.984                   –              –
all classes        –                       0.792          0.9630

The confusion matrix in Figure 5 provides an overview of the model's performance in correctly classifying instances of grease, plastic, and tree roots. This visualization provides a clear breakdown of correct and incorrect classifications for each category.

Figure 5. Confusion matrix details for all classes.

Figure 6 shows correlation connections within the frames of the dataset, demonstrating the exact connection between instances and their labels among discrete views. It is also evident that a majority of instances in the dataset are situated towards the outer edges of both the top and bottom sides of the images. This indicates the efficiency of the trained model in detecting and classifying multiple objects in various real-world scenarios.

Figure 6. Correlogram for frames detailing.

The scatter diagram in Figure 7 displays the instances in the dataset and their corresponding labels. This visualization helps with understanding the distribution of instances across different classes and assists with identifying potential clustering patterns.

Figure 7. Scatter chart for instances and linked labels.

The graph in Figure 8 illustrates the relationship between precision (P) and confidence (C), showing how the model's precision changes at different confidence levels and providing insights into the model's ability to make accurate detections at various confidence thresholds.

Figure 8. Precision (P) versus confidence (C) chart.

Figure 9 displays the correlation between recall (R) and confidence (C), which clarifies how well the model can recall positive instances at different confidence levels, giving sensitivity details on the detection of true positives.

Figure 9. Recall (R) versus confidence (C) chart.

Figure 10 showcases the mean average precision (mAP) of the model, comparing the truth bounding box and the detection box. A higher mAP indicates better overall performance in detecting and localizing objects across all classes.

Figure 10. Precision (P) versus recall (R) chart.

Figure 11 exhibits the F1 score at a 94% threshold with a confidence level of 0.566. The F1 score considers both precision and recall, making it a valuable metric for assessing model performance.

Figure 11. F1 score versus confidence (C) chart.

Figure 12 exhibits the training and validation losses of the detection model over 932 epochs on the S-BIRD dataset. This graph helps in understanding the model's learning progress during the training and validation phases. A decrease in loss indicates that the model is learning to make better predictions.

Figure 12. Detailing of losses in training and validation.

Figure 13 exhibits the detection outcomes obtained by deploying the trained model on Google Source frames [22–27] as input data. The outcomes include the location of objects and the corresponding class labels (tree roots, grease, or plastic) predicted by the model. These results are of utmost importance as they enable a thorough evaluation of the model's performance and adaptability when dealing with new and diverse data in real-world scenarios. Additionally, the model has been specifically optimized to handle multiple sewer blockages within the same frame, making it highly suitable for real-time detection in various practical situations.

Figure 13. Identification and localization outcomes.
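A deployment of this kind is commonly reproduced through the torch.hub interface of the Ultralytics YOLOv5 repository, as sketched below; the weight file and image name are placeholders, and the authors' robotic integration is not part of this snippet.

```python
import torch

# Load a fine-tuned YOLOv5 checkpoint; 'best.pt' and 'sewer_frame.jpg' are placeholder names.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.5                       # confidence threshold matching the reported mAP@0.5

results = model("sewer_frame.jpg")     # run inference on a single frame
results.print()                        # per-class counts and speed summary
results.save()                         # writes the annotated frame to runs/detect/exp*
print(results.pandas().xyxy[0])        # boxes, confidences, and predicted class labels
```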

5. Comparing AI-Driven Approach to MOEAs

The AI-driven approach presented in this research offers several advantages over Multi-Objective Evolutionary Algorithms (MOEAs) [28] commonly used in wastewater system management. While MOEAs such as NSGA-II, SPEA2, MOPSO, and MODE are effective at optimizing multiple objectives, they often come with the burden of complex mathematical models and high computational requirements [29,30]. In contrast, the AI approach leverages advanced computer vision and deep learning techniques to detect sewer blockages promptly and accurately. The model achieves a remarkable mean average precision (mAP) of 96.30% at a confidence threshold of 0.5, highlighting its exceptional precision in sewer blockage detection, which in turn enhances the reliability and efficiency of wastewater management systems.

Furthermore, the AI approach relies on labelled training data and lightweight deep learning models, enhancing its efficiency and real-time capabilities. This aligns well with the urgent need to address sewer blockages swiftly and prevent disruptions and overflows. The model's accuracy, speed, and specialized focus on sewer blockage detection make it a highly promising solution for immediate and effective urban wastewater system management. In comparison, MOEAs such as the sensitivity-based adaptive procedure (SAP) [31], optimal control algorithms [32], and novel methodologies [33] have shown efficiency in various aspects of wastewater management, such as sewer rehabilitation and optimal scheduling. However, their computational demands and reliance on complex algorithms might hinder their real-time applicability. The AI-driven approach's ability to process data in real-time, coupled with its high accuracy in detection, gives it a distinct edge for addressing dynamic and critical scenarios like sewer blockages.

Overall, while both AI-driven approaches and MOEAs contribute to the advancement of wastewater management, the AI approach's ability to quickly detect and respond to sewer blockages makes it particularly well-suited for immediate, on-the-ground applications in modern urban sanitation systems.

6. Conclusions

This research highlights the potential of artificial intelligence in sewer blockage detection by employing the YOLOv5 single-stage detection model and transfer learning on the critical S-BIRD image dataset. By harnessing the power of AI, we achieved a high precision rate suitable for real-time deployment on resource-constrained mobile devices. Based on the current work, the following specific conclusions may be made.

• The developed model demonstrated noticeable precision and recall rates, achieving
94.50%, 95.90%, and 98.40% average precision for tree roots, grease, and plastics,
respectively. The mean average precision (mAP) reached an outstanding 96.30% at a
confidence threshold of 0.5 and maintained consistent performance at mAP of 79.20%
across IoU thresholds ranging from 0.5 to 0.95, indicating the model’s proficiency
in handling different sewer blockage scenarios. The inference times were efficient,
making the model suitable for real-time applications. The detection outcomes on
Google Source frames further validated the model’s adaptability to diverse data.
• The results emphasize the effectiveness of transfer learning and fine-tuning in reducing training time, enhancing performance, and adapting deep neural network models to new contexts.
• The presented model’s ability to accurately detect sewer blockages holds promise for
its application in modern wastewater management systems. The AI-driven sewer
blockage detection system showcased in this research has significant implications for
real-world applications, ranging from urban infrastructure management to environ-
mental conservation.
As AI technologies continue to advance, the integration of computer vision and deep
learning models will pave the way for more efficient and intelligent solutions in various
new domains.

Author Contributions: Conceptualization, R.R.P., M.Y.M. and R.K.C.; methodology, R.R.P.; software,
R.R.P.; dataset creation, R.R.P.; validation, R.R.P., M.Y.M. and R.K.C.; formal analysis, R.R.P., M.Y.M.
and R.K.C.; investigation, R.R.P.; writing—original draft preparation, R.R.P.; writing—review and
editing, R.K.C. and R.R.P.; visualization, R.K.C. and R.R.P.; project administration, R.K.C., M.Y.M. and
S.M.A.; and funding acquisition, R.K.C. All authors have read and agreed to the published version of
the manuscript.
Funding: The research visit of R.R.P. is funded by project PEERS (UTF 2020/10131). The publication
charges for this article have been funded by the publication fund of UiT The Arctic University of
Norway.
Data Availability Statement: The research data will be made available upon request.
Acknowledgments: Authors acknowledge the support from SPRING EU-India Project (No. 821423
and GOI No. BT/IN/EU-WR/60/SP/2018) and UiT The Arctic University of Norway, Narvik,
Norway, for the Ph.D. studies of Ravindra R. Patil. We extend our thanks to ADY Patil School of
Engineering, Pune, India.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019,
30, 3212–3232. [CrossRef] [PubMed]
2. Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2022, 132, 103812.
[CrossRef]
3. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [CrossRef]
4. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
5. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
6. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
7. Ultralytics/yolov5. Available online: https://github.com/ultralytics/yolov5 (accessed on 9 June 2023).
8. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings
of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer
International Publishing: Cham, Switzerland, 2016; pp. 21–37.
9. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. Centernet: Keypoint triplets for object detection. In Proceedings of the
2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019;
pp. 6569–6578.
10. Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer
Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.

11. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014;
pp. 580–587.
12. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile,
7–13 December 2015; pp. 1440–1448.
13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural
Inf. Process. Syst. 2015, 39, 1137–1149. [CrossRef] [PubMed]
14. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
15. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the
Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
16. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the
Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014.
17. Long, M.; Cao, Y.; Wang, J.; Jordan, M. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd
International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 97–105.
18. Vaidya, O.S.; Patil, R.; Phade, G.M.; Gandhe, S.T. Embedded Vision Based Cost Effective Tele-operating Smart Robot. Int. J. Innov.
Technol. Explor. Eng. 2019, 8, 1544–1550.
19. Patil, R.R.; Vaidya, O.S.; Phade, G.M.; Gandhe, S.T. Qualified Scrutiny for Real-Time Object Tracking Framework. Int. J. Emerg.
Technol. 2020, 11, 313–319.
20. Patil, R.R.; Ansari, S.M.; Calay, R.K.; Mustafa, M.Y. Review of the State-of-the-art Sewer Monitoring and Maintenance Systems
Pune Municipal Corporation-A Case Study. TEM J. 2021, 10, 1500–1508. [CrossRef]
21. Patil, R.R.; Mustafa, M.Y.; Calay, R.K.; Ansari, S.M. S-BIRD: A Novel Critical Multi-Class Imagery Dataset for Sewer Monitoring
and Maintenance Systems. Sensors 2023, 23, 2966. [CrossRef] [PubMed]
22. Google Source Images. Available online: https://www.drainmasterohio.com/red-flags-of-tree-root-intrusion-in-your-drain-pipes/ (accessed on 30 June 2023).
23. Google Source Images. Available online: https://arboriculture.files.wordpress.com/2016/02/treerootpipe.jpg (accessed on 30 June 2023).
24. Google Source Images. Available online: https://spunout.ie/wp-content/uploads/elementor/thumbs/Plastic_bottles_in_the_sea-q0ubkb8pkwa5boeuhpaj6o0v1e8l43mla862l6488o.jpg (accessed on 30 June 2023).
25. Google Source Images. Available online: https://bbwsd.com/wordpress/wp-content/uploads/2018/03/FOG-850x425.jpg (accessed on 30 June 2023).
26. Google Source Images. Available online: https://images.squarespace-cdn.com/content/v1/55e97d2de4b0a47f46957437/1499308890029-VM48EFRJJMCSOFFHFETV/iStock-482437666.jpg?format=1000w (accessed on 30 June 2023).
27. Google Source Images. Available online: https://www.istockphoto.com/photo/plastic-bottles-isolated-on-white-gm1202347223-345153972 (accessed on 30 June 2023).
28. Wang, Z.; Pei, Y.; Li, J. A Survey on Search Strategy of Evolutionary Multi-Objective Optimization Algorithms. Appl. Sci. 2023, 13,
4643. [CrossRef]
29. Jiang, L.; Geng, Z.; Gu, D.; Guo, S.; Huang, R.; Cheng, H.; Zhu, K. RS-SVM machine learning approach driven by case data for
selecting urban drainage network restoration scheme. Data Intell. 2023, 5, 413–437. [CrossRef]
30. Yazdi, J. Rehabilitation of urban drainage systems using a resilience-based approach. Water Resour. Manag. 2018, 32, 721–734.
[CrossRef]
31. Cai, X.; Shirkhani, H.; Mohammadian, A. Sensitivity-based adaptive procedure (SAP) for optimal rehabilitation of sewer systems.
Urban Water J. 2022, 19, 889–899. [CrossRef]
32. Rathnayake, U. Migrating storms and optimal control of urban sewer networks. Hydrology 2015, 2, 230–241. [CrossRef]
33. Draude, S.; Keedwell, E.; Kapelan, Z.; Hiscock, R. Multi-objective optimisation of sewer maintenance scheduling. J. Hydroinform.
2022, 24, 574–589. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
