A Semi-Supervised Learning Approach For Pixel-Level Pavement Anomaly Detection

This document presents a semi-supervised learning approach for pixel-level pavement anomaly detection using a network called PAD Net, which leverages generative adversarial networks (GANs). The proposed method effectively identifies anomalous pavement segments without requiring pixel-level annotations, achieving an accuracy of 80.75% on the dataset. The framework includes two generators and three discriminators to maintain background pixels while modifying anomalous regions, addressing challenges in traditional pavement distress detection methods.


IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 24, NO. 9, SEPTEMBER 2023

A Semi-Supervised Learning Approach for Pixel-Level Pavement Anomaly Detection

Ruiqi Ren, Peixin Shi, Pengjiao Jia, and Xiangyang Xu

Abstract— Accurate and fast detection of pavement distress can provide reliable and effective technical support for pavement maintenance and rehabilitation. Recently, deep learning has been widely used in pavement distress detection. However, its application is still limited by the laborious and difficult annotation process due to the complex topology of pavement distress. In this study, we propose a pavement anomaly detection network (PAD Net), which is a semi-supervised learning approach based on generative adversarial networks for identifying pixel-level anomalous image segments. We build a mapping function for unpaired abnormal and normal pavement images through a framework containing two generators and three novel discriminators. The framework is capable of maintaining background pixels and modifying anomalous foreground regions with the help of multi-style discriminators that consider interrelationships of multi-scale generated images. Meanwhile, pixel-level abnormal areas are detected through an end-to-end mask channel. Experiments show that our approach is able to achieve 80.75% accuracy on our dataset without pixel-level or patch-level annotations. Quantitative comparisons with several prior semi-supervised methods demonstrate the superiority of our approach.

Index Terms— Pavement distress, anomaly detection, semi-supervised learning, generative adversarial network.

Manuscript received 24 October 2022; accepted 10 April 2023. Date of publication 21 April 2023; date of current version 30 August 2023. This work was supported by the National Natural Science Foundation of China under Grant 52278405. The Associate Editor for this article was S. A. Haider. (Corresponding author: Peixin Shi.)
The authors are with the School of Rail Transportation, Soochow University, Suzhou 215000, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/TITS.2023.3267433

I. INTRODUCTION

PAVEMENT distress directly affects road service life and driving safety. Accurate and fast detection of pavement distress can provide reliable and effective technical support for pavement maintenance and rehabilitation. In the early days, pavement detection was mainly conducted manually by visually collecting and subjectively evaluating distress information. Manual detection is inefficient and prone to large errors. In recent decades, pavement inspection vehicles and machine vision based automatic detection have greatly improved the efficiency of pavement distress detection.

The diversity and topological complexity of pavement distress, the low contrast and intensity inhomogeneity of images, and the noisy background have led researchers to put forward targeted solutions. For instance, Cubero-Fernandez et al. [1] preprocessed the pavement crack image through a logarithmic transformation, a bilateral filter, the Canny algorithm, and a morphological filter to improve the contrast. Nguyen et al. [2] developed a crack detection method for features calculated along every free-form path, which takes into account the noisy texture background. Li et al. [3] extracted pavement cracks through the F* seed-growing algorithm to search for complex crack topology structures. Salman et al. [4] employed the Gabor filter for distinguishing the fine structure of pavement cracks. The above methods can address some of the difficulties in pavement distress detection, but no general solutions were obtained.

In recent years, deep learning has been widely used for pavement distress detection because of its higher accuracy and better generalization performance. It can be generally categorized into supervised learning, which requires manual annotations, and weakly supervised learning without them. A supervised learning based approach requires various forms of distress annotations to obtain positive samples and then trains a discriminative network based on them to identify distress. For example, some researchers partition large-size pavement images into small-size images and determine whether they belong to a known distress form to achieve patch-level detection [5], [6]. Recently, object detection has been adopted to locate the distress of pavement in large-scale images using bounding boxes with variable size [7], [8]. Different algorithms, such as RetinaNet [9], YOLOv5 [10], and Faster R-CNN [11], have been developed and applied in pavement distress detection. Meanwhile, pixel-level segmentation is adopted to characterize the pavement distress morphology, including the length, shape, size, etc. [12], [13]. For instance, Hou et al. [14] use generative adversarial networks (GANs) for image augmentation, and Ren et al. [15] integrate dilated convolution, spatial pyramid pooling, and skip connections to improve the segmentation accuracy.

Supervised learning approaches greatly improve the accuracy of pavement distress detection and are easy to transfer to similar tasks. However, the complex topology of pavement distress makes pixel-level annotations time-consuming and laborious. Furthermore, we cannot guarantee that all exception categories are annotated for supervised learning, which means that untrained categories may be difficult to detect. From another perspective, pavement distress detection can be regarded as an anomaly detection problem in addition to the object detection problem, since the distress is an abnormal condition. Anomaly detection assumes that most instances in the dataset are normal and detects anomalies by looking for instances that do not match normal data. Obviously, most of the pavement will appear in normal condition without distress
1558-0016 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.


during the service life. Our goal is to find a mapping between abnormal pavement images and normal pavement images. In this way, the input image can be compared with the output one to obtain the abnormal region. Human disease detection and industrial defect detection have both used a similar idea. The similarity between human organs makes it possible to apply anomaly detection for the diagnosis of brain lesions [16], pulmonary embolism [17], and other lesions [18]. In the same way, the similarity of industrial products allows anomaly detection to be used for industrial defect detection, such as broken bolts and textile weaving errors [19], [20], [21], [22].

The aforementioned studies indicate that we can train a network to reconstruct abnormal pavement images into normal ones and detect pavement distress by comparing the images before and after reconstruction, but this still faces enormous challenges: 1) pavement images are usually taken by pavement inspection vehicles to maintain normal traffic. This results in very different images, such as different lighting conditions, color ranges, and shooting angles, which can lead to poor reconstruction results; 2) even if a high-quality reconstruction image is obtained, it may not match the original one because normal pavement images are different from each other, resulting in the failure to obtain abnormal areas.

We therefore propose a new framework based on generative adversarial networks (GANs), as shown in Figure 1, which has a generator G : X → Y that can transform abnormal pavement images into normal ones. By considering the features of multi-scale images, we employ a multi-style discriminator to ensure that the generated images belong to the correct domain. In order to ensure that the generated image is paired with the original one, we use another generator F : Y → X, which can transform the normal images back to the abnormal ones. By comparing the difference between these abnormal images and the original ones, we encourage the generator G : X → Y to generate paired images. In addition, the generator F : Y → X is also employed to transform the original image, which further encourages the generator G : X → Y to generate paired images. Meanwhile, we add a mask channel to images for segmenting abnormal areas, which makes the network end-to-end. We are able to obtain more accurate abnormal areas based on this channel than by comparing the images before and after reconstruction. To our knowledge, our method is the first to implement pixel-level pavement distress detection without pixel-level or patch-level annotations.

Fig. 1. Training schema of PAD Net.

II. RELATED WORK

1) Crack Segmentation: Cracks are one of the main manifestations of pavement distress and attract the interest of many researchers because of their unique shape. Current research focuses on the improvement of accuracy, mainly based on supervised learning. For example, a feature pyramid and hierarchical boosting network [13], a grouped convolution pyramid and edge enhancement network [12], and self-attention and scaling-attention mechanisms [23] have been successfully applied to crack segmentation. These methods are based on high-quality expert annotations, and provide guidance for high-precision feature extraction of cracks.

2) Anomaly Detection: Anomaly detection is used in human disease detection, industrial defect detection, and autonomous driving. In human disease detection, Schlegl et al. [18] proposed an unsupervised learning method based on GAN to build a rapid mapping by training on healthy retinal images. Baur et al. [16] introduced an auto-encoder for brain MRI based on an unsupervised approach for detecting pathologies. Siddiquee et al. [17] proposed a fixed-point translation based on GAN for medical images. It obtained higher accuracy than [18] by supervising same-domain translation and regularizing cross-domain translation. In industrial defect detection, MVTec provides high-resolution color images for anomaly detection [24]. Based on it, Song et al. [19] proposed a self-supervised method, AnoSeg, to achieve anomaly detection by designing anomaly patches. Zavrtanik et al. [20] achieved anomaly detection by reconstructing the block-masked image as the original one. In autonomous driving, positive progress has been made by using street scene semantic segmentation annotations [21] and self-set anomaly annotations [22]. Anomaly detection inspires us to achieve pavement distress detection by reconstruction of abnormal pavement. However, these methods either have definite reconstruction targets (such as standardized industrial products) or relevant labels (such as street scene segmentation annotations), and so cannot be directly applied to the reconstruction of pavement images. Our approach therefore encourages reconstruction networks with more flexibility, allowing a variety of different pavements to be generated.

3) Unpaired Image-to-Image Translation: Unpaired image-to-image translation requires one type of image to be transformed into another. Cycle-GAN [25] is a representative work, which achieves style translation such as photo generation from paintings through cycle-consistency loss and has achieved great success. Further, Huang et al. [26] proposed a multi-modal unsupervised image-to-image translation (MUNIT) framework, which can generate diverse outputs from the source domain, such as converting house cats into different big cats. In addition, Zhao et al. [27] proposed adversarial-consistency loss GAN (ACL-GAN) to improve on Cycle-GAN for shape changes, removing large objects, and other limitations. The above approaches are often used for style transfer tasks. They tend to change colors, regular stripes,


painting styles, etc., and are not good at changing or removing complex targets, such as pavement distress. Our approach is inspired by these ideas and is adapted to pavement distress.

III. METHODOLOGY

Let X and Y be the domains of abnormal and normal pavement images, respectively, and let x_i ∈ X and y_i ∈ Y be images sampled from them. Our approach aims to find a mapping function G : X → Y between the two domains X and Y. As illustrated in Figure 1, we propose a training schema that includes two generators and three discriminators. Similar to Cycle-GAN [25], we design another mapping function F : Y → X, used to map the Y domain to the X domain. The domain X̂ is mapped from Y through the function F : Y → X. Letting the X domain pass through the mapping function F : Y → X results in the domain X′. Assuming x_1 = (ξ_1, ξ_2, ξ_3, ···, η_1, η_2, ···), where ξ_i are normal areas and η_i are abnormal areas in the abnormal pavement images, the sets from domain Y obtained by the mapping function G : X → Y can be expressed as G(x_1) = y_1 = (ξ_1, ξ_2, ξ_3, ξ_4, ξ_5, ···), indicating that the abnormal areas have been replaced by normal areas. Similarly, we expect that the sets from domain X̂ obtained by the mapping function F : Y → X can be expressed as F(y_1) = x̂_1 = (ξ_1, ξ_2, ξ_3, ···, η_1, η_2, ···), where F : Y → X is optimized by comparing x̂_1 with x_1. The above measures ensure that the generated y_i = G(x_i) belongs to the domain Y and is similar to x_i. Therefore, we can detect the abnormal areas by comparing x_i with y_i. Concretely, we want the mapping function G : X → Y to cure the distress on the pavement image without changing other pixels. To ensure that the generated images belong to the target domain, we employ three discriminators to determine whether they belong to the domains Y, X̂ and X′, respectively. In addition, we optimize the whole framework by comparing the similarity between the domains X, X′ and X̂. The loss function used to train our network is introduced as follows.

A. Loss Functions

1) Adversarial Loss: We use the least squares GAN (LSGAN) loss [28] to calculate the distance between the generated distribution p_g and the true distribution p_data as follows:

$$L_{adv}^{D} = \mathbb{E}_{x \sim p_{data}(x)}\left[(D(x) - 1)^2\right] + \mathbb{E}_{x \sim p_g(x)}\left[(D(G(x)))^2\right] \tag{1}$$

where the labels 1 and 0 are used to encode the real and generated samples, respectively. By minimizing this loss function, we can continuously optimize the discriminator. Similarly, the generator is optimized using the following loss function:

$$L_{adv}^{G} = \mathbb{E}_{x \sim p_g(x)}\left[(D(G(x)) - 1)^2\right] \tag{2}$$

2) Adversarial-Consistency Loss: The adversarial loss enables the generated image to be in the correct domain but does not guarantee that it is similar to the original image. For this purpose, we introduce the adversarial-consistency loss [27]. It aims to measure the distance between the generated image distribution and the original image distribution. Unlike the cycle-consistency loss [25], we do not encourage the generated image to be similar to the original one. Specifically, we synthesise multi-modal images in their neighborhood distribution. The adversarial-consistency loss is as follows:

$$L_{acl} = \mathbb{E}_{(x,\hat{x}) \sim p(x,\{\hat{x}\})}\left[\log \hat{D}(x, \hat{x})\right] + \mathbb{E}_{(x,x') \sim p(x,\{x'\})}\left[\log\left(1 - \hat{D}(x, x')\right)\right] \tag{3}$$

B. Generation of Segmentation Mask Image

The prior method compares the images before and after reconstruction and marks the pixel regions that differ in order to provide a pixel-level binary segmentation image of the distress. This can cause a variety of issues because the reconstructed images frequently differ from the original ones despite looking similar. In order to make the segmentation process trainable, we provide an end-to-end method for anomaly segmentation. In addition to the three RGB channels, we add a fourth channel, the mask channel, to generate segmentation images. Its values are limited to the range between 0 and 1, and are encouraged to be either 0 or 1. It is eventually combined with the RGB channel values to form a new image. By using the mask channel, we urge the generator to change the foreground while leaving the background alone.

C. Network Structure

1) Generator: Pavement distress requires special feature extraction methods due to its special topology. Previous research shows that the combination of the fully connected (FC) layer and convolutional layer can locate the pavement distress more effectively. Inspired by this, we chose the auto-encoder from MUNIT [26] as the generator. It has a downsampling-upsampling structure, as depicted in Figure 2. To assist the decoder in eradicating the anomalous area, the encoder component must be sensitive to it. The encoder consists of two sections: the content encoder and the style encoder. Specifically, the content encoder first adopts convolution layers for downsampling, followed by the concatenation of several residual blocks, and all convolution layers are subjected to instance normalization (IN). The style encoder also uses convolutional layers for downsampling, followed by a global pooling layer and an FC layer. The IN layer is not used here so as to retain more style information. The decoder part is responsible for reconstructing the content code and style code into an image by means of residual blocks and several upsampling and convolutional layers.

2) Discriminator: Inspired by the MUNIT [26] generator and multi-scale discriminators [29], we propose a multi-style discriminator. As shown in Figure 3, the multi-style discriminator contains a downsampling module and then encodes the feature maps in four convolutional layers, which are encoded into style codes corresponding to different scales by the global pooling layer and FC layer, respectively. Finally, the style codes are turned into a scalar by a multilayer perceptron (MLP). The whole discriminator finally generates a list that contains four values. Unlike the multi-scale discriminator,


we do not use fully convolutional layers, but convolutional layers and MLP layers in the same way as the style encoder in the generator. This allows the discriminator to obtain more detailed information. In addition, multi-style discriminators are able to combine information from different scales because each scale is linked to the others and is generated by the same network. This is different from the multi-scale discriminator, which distinguishes the feature maps of each scale independently. The latter part of the experiments shows how this affects the reconstruction of the image.

Fig. 2. Generator network architecture.

Fig. 3. Discriminator network architecture.

IV. EXPERIMENT

A. Dataset

Our dataset contains 88,088 digital pavement images of size 256 × 256 taken by a pavement inspection vehicle. These images contain transverse cracks, longitudinal cracks, oblique cracks, alligator cracks, potholes, and many other distress types. We randomly selected 300 images with pavement distress to fine-tune a classifier based on ResNet50. It finally divided the 88,088 images into two categories: 57,183 normal images and 30,905 abnormal images. Therefore, the entire annotation consists of only 300 image-level labels without any patch-level or pixel-level annotations.

B. Metrics

We focus on two aspects, the quality of the reconstruction image and the quality of the segmentation image. In order to evaluate the quality of reconstruction images, we introduce two evaluation indicators, Fréchet Inception Distance (FID) and mean Structural Similarity (mSSIM). We use FID to evaluate the quality of the generated image set as a whole, and SSIM to evaluate each generated image against its paired original. FID and mSSIM are calculated as follows:

$$FID(X, Y) = \lVert \mu_X - \mu_Y \rVert^2 + \mathrm{tr}\left(\Sigma_X + \Sigma_Y - 2(\Sigma_X \Sigma_Y)^{\frac{1}{2}}\right) \tag{4}$$

where μ_X is the mean of X, μ_Y is the mean of Y, Σ_X is the covariance matrix of X, and Σ_Y is the covariance matrix of Y.

$$mSSIM = \sum_{i=1}^{n} SSIM(x_i, y_i) / n \tag{5}$$

where n is the number of test images from domain Y, and x_i and y_i are images from domain X and domain Y.

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{6}$$

where μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, and σ_xy is the covariance of x and y. c_1 and c_2 are constants used to maintain stability.

Furthermore, in order to evaluate distress segmentation images, we calculate the mean Pixel Accuracy (mPA) of the segmentation mask images. mPA is calculated as follows:

$$mPA = \sum_{i=1}^{n} PA(y_i) / n \tag{7}$$

$$PA(y) = \frac{p_{correct}}{p_{total}} \tag{8}$$

where p_correct is the number of correctly classified pixels, and p_total is the total number of pixels in the image.

We randomly select 100 images from our dataset as a test set and manually segment them to calculate mPA.
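As an illustration of Eqs. (5) to (8), the following is a minimal sketch in plain Python, not the authors' evaluation code. It assumes images are flattened lists of intensities in [0, 1] and masks are flattened lists of 0/1 labels. The function names are hypothetical, the constants `c1` and `c2` are the conventional SSIM defaults rather than values stated in the paper, and Eq. (6) is evaluated once over the whole image, whereas common SSIM implementations average it over local windows.

```python
from statistics import fmean


def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Eq. (6) computed from whole-image statistics (assumption: no local windows)."""
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_ssim(originals, reconstructions):
    """Eq. (5): average SSIM over paired images."""
    return fmean([ssim_global(x, y) for x, y in zip(originals, reconstructions)])


def pixel_accuracy(predicted_mask, manual_mask):
    """Eq. (8): share of pixels whose predicted label matches the manual label."""
    correct = sum(1 for p, t in zip(predicted_mask, manual_mask) if p == t)
    return correct / len(manual_mask)


def mean_pixel_accuracy(predicted_masks, manual_masks):
    """Eq. (7): average pixel accuracy over the test set."""
    return fmean([pixel_accuracy(p, t) for p, t in zip(predicted_masks, manual_masks)])
```

For example, a predicted mask `[1, 0, 0, 1]` compared with a manual mask `[1, 0, 1, 1]` has three of four pixels correct, so its pixel accuracy is 0.75.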

TABLE I
PARAMETERS OF MULTI-STYLE DISCRIMINATOR

TABLE II
HYPERPARAMETERS OF PAD NET

C. Implementation Details

Experiments were conducted on our pavement distress dataset. To validate our three main innovations, we designed three different experiments: 1) evaluating our PAD Net framework against other semi-supervised approaches by comparing reconstruction and distress segmentation images; 2) testing whether to use a mask channel to generate segmentation images; and 3) testing whether to train the network with our multi-style discriminator. In the ablation experiments, the hyperparameters of the discriminator are also analyzed. Other details are as follows: we used the Adam optimizer for training; the discriminators update twice for each generator update; and, similar to Fixed-Point [17], we also implement domain self-conversion. See Table I for details on the multi-style discriminators and Table II for the remaining hyperparameters. Training was performed on an Ubuntu 18.04 platform using an NVIDIA A40 GPU.

V. RESULTS AND ANALYSIS

A. Baselines

1) ACL-GAN [27]: ACL-GAN is proposed to solve the unpaired image-to-image translation problem, and encourages the foreground shape of the image in the original domain to change through the adversarial-consistency loss. It employs a MUNIT generator and multi-scale discriminators and is used in male-to-female translation or glasses removal tasks.

2) MUNIT [26]: MUNIT develops a content encoder and a style encoder to implement multi-domain image translation. It is applied to animal image translation and street scene translation tasks.

3) Fixed-Point [17]: Fixed-Point is able to identify a minimal subset of target pixels for domain translation. It is applied to pulmonary embolism detection.

4) f-AnoGAN [18]: Fast unsupervised anomaly detection with generative adversarial networks (f-AnoGAN) is a semi-supervised learning method based on generative adversarial networks, which is trained on normal medical images for detecting human lesions.

5) GANomaly [30]: Semi-supervised anomaly detection via adversarial training (GANomaly) is proposed for anomaly detection, such as X-ray security screening problems. It trains a reconstruction network on normal images.

B. Comparison Study

We compare the proposed method with state-of-the-art methods as baselines, and the quantitative analysis results are shown in Table III. Most of them are able to reconstruct abnormal pavement images into normal ones, and the distress regions are obtained by comparing the areas that differ before and after reconstruction. Among them, GANomaly is not used for pixel-level anomaly detection, so SSIM and FID are not calculated for it. Figure 4 and Figure 5 show the image reconstruction results and the segmentation images obtained by each approach.

1) Reconstruction: As shown in Table III, our method obtained the highest mSSIM among all the baselines. This shows that the image reconstructed by our method is most similar to the original image. The FID of our method ranks third after ACL-GAN and f-AnoGAN. It can be seen in Figure 4 that ACL-GAN and f-AnoGAN cannot effectively reconstruct the distress region, implying that our method changes the foreground information more efficiently and keeps the background information unchanged. MUNIT, with the highest FID, over-altered the image, which can be seen subjectively or from its lowest mSSIM.

2) Segmentation: As shown in Table III, our method obtained the highest mPA, which is the most important indicator. This indicates that our method is able to detect pavement distress more accurately. A more significant advantage is


reflected in Figure 5, where only our method is able to indicate the location of the distress.

Fig. 4. Reconstruction results against baselines. From left to right: input, our PAD Net, ACL-GAN [27], MUNIT [26], Fixed-Point [17], f-AnoGAN [18], and GANomaly [30].

Fig. 5. Segmentation results against baselines. From left to right: manual annotations, our PAD Net, ACL-GAN [27], MUNIT [26], Fixed-Point [17], f-AnoGAN [18], and GANomaly [30].

C. Ablation Study

We analyzed two different settings to reflect the two innovations in our method: 1) comparing the segmented image quality obtained with or without the mask channel; 2) analyzing the effect of multi-style discriminators on the quality of the segmentation images obtained. The results of different settings are shown in Figure 6, and the quantitative results are listed in Table IV, in which mSSIM represents the average similarity of each image before and after reconstruction, and FID represents the gap between image sets before and after reconstruction.

1) Mask Channel: For each discriminator, we obtain the segmentation images in two ways: from the mask channel, and by comparing the paired images through SSIM. Qualitatively, using a mask channel can segment the distress area more accurately, and the segmented area is continuous. The segmentation images obtained through SSIM consist of independent small-area distress regions. The quantitative results in Table IV show that employing the mask channel to obtain segmentation images gives a higher mPA than segmentation images obtained by SSIM, regardless of the discriminator used.
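To make the two segmentation routes in this comparison concrete, the sketch below is a hedged illustration in plain Python rather than the paper's implementation. The paper states only that the mask channel is combined with the RGB channels to form a new image; the linear blend used here, the 0.5 threshold, and the simple absolute-difference test (a stand-in for the SSIM comparison) are all assumptions, and every function name is hypothetical. Images are nested lists of RGB tuples in [0, 1], and the mask is a nested list of values in [0, 1].

```python
def blend_with_mask(original, generated, mask):
    """Assumed blend: keep original pixels where mask is 0, generated pixels where it is 1."""
    return [
        [tuple(m * g + (1.0 - m) * o for o, g in zip(orig_px, gen_px))
         for orig_px, gen_px, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


def mask_to_segmentation(mask, threshold=0.5):
    """Binarize the mask channel (threshold is an assumed value)."""
    return [[1 if m >= threshold else 0 for m in row] for row in mask]


def difference_segmentation(original, reconstructed, threshold=0.1):
    """Stand-in for the SSIM route: flag pixels whose mean channel change exceeds threshold."""
    return [
        [1 if sum(abs(o - r) for o, r in zip(orig_px, rec_px)) / len(orig_px) > threshold else 0
         for orig_px, rec_px in zip(orig_row, rec_row)]
        for orig_row, rec_row in zip(original, reconstructed)
    ]


# Toy 1 x 2 image: the first pixel is background, the second is distress.
original = [[(0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]]
generated = [[(0.5, 0.5, 0.5), (0.1, 0.1, 0.1)]]  # raw output also shifts the background
mask = [[0.0, 1.0]]

blended = blend_with_mask(original, generated, mask)
```

In this toy case the raw generator output changes both pixels, so the difference route flags the background as well, while the blended image differs from the input only where the mask is 1. This mirrors the qualitative observation that comparison-based segmentation picks up changes outside the distress area.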

2) Discriminator: The proposed multi-style discriminator consists of multiple MLP layers for different downsampling scales. The number of MLP layers is a hyperparameter. To obtain the optimal parameters, we set five different numbers of MLP layers (considering that the scale of the input images is 256 × 256). The results in Table IV show that each evaluation metric first increases and then decreases as the number of MLP layers increases. The optimal solution appears when the number of MLP layers is four. At this setting, we obtain 80.75% mPA while obtaining 96.16% mSSIM. In addition, the comparison with the widely adopted multi-scale discriminator shows that the detection accuracy of the multi-style discriminator using one MLP layer is weaker than that of the multi-scale discriminator, but it performs better when using two or more MLP layers. As shown in Table III, our method obtains the highest mPA while also obtaining the maximum FID. Considering that FID tends to evaluate the contents or features between two domains, we believe that our method changes content information in the original image without changing most pixels in the background. Qualitatively, the use of the multi-style (one layer) discriminator leads to the collapse of the mask channel, and the expected segmentation images are not obtained. Comparing paired images through SSIM also shows that the use of the multi-style (one layer) discriminator results in the generated image being similar to the original one, with only a small amount of distress detected. The multi-scale discriminator tends to transform the original image as a whole, and the white patches in the segmentation images obtained by it are scattered over the entire image, failing to detect the distress area accurately.

Fig. 6. Segmentation results of different ablation settings. From left to right: manual annotations, multi-style discriminator (4 layers) with mask channel, multi-style discriminator (4 layers) with SSIM, multi-style discriminator (1 layer) with mask channel, multi-style discriminator (1 layer) with SSIM, multi-scale discriminator with mask channel, and multi-scale discriminator with SSIM.

TABLE III
QUANTITATIVE RESULTS OF DIFFERENT METHODS

TABLE IV
QUANTITATIVE RESULTS OF DIFFERENT ABLATION SETTINGS

D. High Quality Cases

Figure 7 shows some high-quality results obtained by our approach, including transverse cracks, longitudinal cracks, oblique cracks, and alligator cracks. The last row in Figure 7 is the manually labeled pavement distress segmentation image, which has some problems such as missing labels and mislabels. This is an inevitable problem of manually labeled images, so supervised learning based approaches trained on images labeled as


"ground-truth" may learn wrong information. In contrast, our approach detects some subtle distress that was not labeled.

Fig. 7. High-quality cases of our approach. The red boxes indicate pavement distress that can be detected by our method, but is missed when manually labeled.

E. Typical Failure Cases

Although our method has achieved high-quality results for many distress situations, it still has some limitations. Figure 8 shows some typical failure cases obtained by our method. Part of the normal pavement in the first image was reconstructed, causing it to be detected as pavement distress. The second image failed to detect potholes covered by standing water. The third image failed to accurately detect some pavement distress that was close to the normal pavement color. These failure cases offer some hints, such as the darker and wider

Fig. 8. Typical failure cases of our approach. From left to right: the first column mistakenly identified normal pavement as having pavement distress; the second column failed to identify potholes with standing water; and the third column failed to identify some pavement distress similar to normal pavement.

VI. CONCLUSION AND FUTURE WORK

For the purpose of detecting pavement distress, we propose a novel framework, the pavement anomaly detection network (PAD Net), which is able to identify pavement distress at the pixel level for the first time without pixel-level annotations. Within this framework, we present an end-to-end network for pavement distress segmentation via a mask channel and apply multi-style discriminators that take correlation information from multi-scale images into account. In both qualitative and quantitative comparisons, our strategy outperforms previous methods. It can also be applied to tasks that require similar anomaly detection, such as industrial defect detection.

However, our method does not perform well in some special cases, such as when the potholes contain stagnant water or the color of the distress is very close to that of the normal pavement. Human prior knowledge may assist us in determining whether such a region is abnormal, but PAD Net lacks this capability. Furthermore, the convergence speed of our framework is slow. In the future, we intend to use more advanced methods to improve convergence speed and to draw on prior information to improve the robustness of the framework.

REFERENCES

[1] A. Cubero-Fernandez, F. J. Rodriguez-Lozano, R. Villatoro, J. Olivares, and J. M. Palomares, “Efficient pavement crack detection and classification,” EURASIP J. Image Video Process., vol. 2017, no. 1, pp. 1–11, Dec. 2017.
[2] T. S. Nguyen, S. Begot, F. Duculty, and M. Avila, “Free-form anisotropy: A new method for crack detection on pavement surface images,” in Proc. 18th IEEE Int. Conf. Image Process., Sep. 2011, pp. 1069–1072.
[3] Q. Li, Q. Zou, D. Zhang, and Q. Mao, “FoSA: F* seed-growing approach for crack-line detection from pavement images,” Imag. Vis. Comput., vol. 29, no. 12, pp. 861–872, Nov. 2011.
[4] M. Salman, S. Mathavan, K. Kamal, and M. Rahman, “Pavement crack
detection using the Gabor filter,” in Proc. 16th Int. IEEE Conf. Intell.
range of the incorrect detection area in the first image; the Transp. Syst. (ITSC), Oct. 2013, pp. 2039–2044.
smoother surface of the water in the pit in the second image; [5] S. Park, S. Bang, H. Kim, and H. Kim, “Patch-based crack detection
and the color of the distress in the third image, which is in black box images using convolutional neural networks,” J. Comput.
Civil Eng., vol. 33, no. 3, May 2019, Art. no. 04019017.
difficult to distinguish with the naked eye. We suspect this
[6] B. Li, K. C. P. Wang, A. Zhang, E. Yang, and G. Wang, “Automatic clas-
is because our dataset is obtained by a classifier with an error sification of pavement crack using deep convolutional neural network,”
probability and is not fully divided into normal and abnormal Int. J. Pavement Eng., vol. 21, no. 4, pp. 457–463, Mar. 2020.
categories. In fact, it is possible to avoid this problem without [7] F.-C. Chen and M. R. Jahanshahi, “NB-CNN: Deep learning-based
crack detection using convolutional neural network and Naïve Bayes
adding more labor by obtaining images of the pavement at data fusion,” IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4392–4400,
different ages. May 2018.
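
As an illustration of the multi-scale inputs that the multi-style discriminators take into account, the following minimal Python sketch builds a pyramid of downsampled images from a 256 × 256 input. It is not the PAD Net implementation: the 2 × 2 average pooling, the factor-of-two step between scales, and the names `average_pool_2x` and `build_pyramid` are assumptions made only for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def average_pool_2x(image: Image) -> Image:
    """Halve height and width by averaging each non-overlapping 2x2 block.

    Assumption: average pooling stands in for whatever downsampling
    operator the actual network uses.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def build_pyramid(image: Image, num_scales: int) -> List[Image]:
    """Return [full resolution, 1/2, 1/4, ...] with num_scales entries.

    num_scales is a hypothetical parameter of this sketch; each entry
    could be fed to its own discriminator branch.
    """
    if num_scales < 1:
        raise ValueError("num_scales must be at least 1")
    pyramid = [image]
    for _ in range(num_scales - 1):
        pyramid.append(average_pool_2x(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    # Synthetic checkerboard with the 256 x 256 input size stated in the paper.
    image = [[float((r + c) % 2) for c in range(256)] for r in range(256)]
    for level in build_pyramid(image, num_scales=3):
        print(len(level), "x", len(level[0]))
```

With `num_scales=3`, the sketch produces images of 256 × 256, 128 × 128, and 64 × 64 pixels. How the number of MLP layers studied in Table IV maps onto such scales is specific to the paper's architecture and is not reproduced here.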

Ruiqi Ren received the B.E. degree from Central South University, Changsha, China. He is currently pursuing the Ph.D. degree with the School of Rail Transportation, Soochow University, Suzhou, China. His research interests include structural health monitoring in civil and infrastructure, deep learning, and computer vision.

Peixin Shi received the Ph.D. degree in civil and environmental engineering from Cornell University. He is currently a Full Professor with the School of Rail Transportation, Soochow University. His research topics include tunnelling and underground space technology, smart underground infrastructures, and lifeline earthquake engineering.

Pengjiao Jia received the Ph.D. degree in geotechnical engineering from Northeastern University in 2020. He is currently an Associate Professor of geotechnical engineering with the School of Rail Transportation, Soochow University. His research has been recognized by 56 technical and more than ten patents. His research interests include pipelines and trenchless technology, pipe jacking, and tunneling.

Xiangyang Xu is a full professor at Soochow University. She has published about 60 SCI papers, and has served as the associate/guest editor for several SCI journals. Her research topics mainly involve structural health monitoring of civil and infrastructure, digital twins, and artificial intelligence.
