A Semi-Supervised Learning Approach For Pixel-Level Pavement Anomaly Detection
Abstract— Accurate and fast detection of pavement distress can provide reliable and effective technical support for pavement maintenance and rehabilitation. Recently, deep learning has been widely used in pavement distress detection. However, its application is still limited by the laborious and difficult annotation process caused by the complex topology of pavement distress. In this study, we propose a pavement anomaly detection network (PAD Net), a semi-supervised learning approach based on generative adversarial networks for identifying pixel-level anomalous image segments. We build a mapping function for unpaired abnormal and normal pavement images through a framework containing two generators and three novel discriminators. The framework is capable of maintaining background pixels and modifying anomalous foreground regions with the help of multi-style discriminators that consider interrelationships of multi-scale generated images. Meanwhile, pixel-level abnormal areas are detected through an end-to-end mask channel. Experiments show that our approach achieves 80.75% accuracy on our dataset without pixel-level or patch-level annotations. Quantitative comparisons with several prior semi-supervised methods demonstrate the superiority of our approach.

Index Terms— Pavement distress, anomaly detection, semi-supervised learning, generative adversarial network.

I. INTRODUCTION

PAVEMENT distress directly affects road service life and driving safety. Accurate and fast detection of pavement distress can provide reliable and effective technical support for pavement maintenance and rehabilitation. In the early days, pavement detection was mainly conducted manually by visually collecting and subjectively evaluating distress information. Manual detection is inefficient and prone to large errors. In recent decades, pavement inspection vehicles and machine-vision-based automatic detection have greatly improved the efficiency of pavement distress detection.

The diversity and topological complexity of pavement distress, the low contrast and intensity inhomogeneity of images, and the noisy background have led researchers to put forward targeted solutions. For instance, Cubero-Fernandez et al. [1] preprocessed the pavement crack image through logarithmic transformation, bilateral filtering, the Canny algorithm, and a morphological filter to improve the contrast. Nguyen et al. [2] developed a crack detection method for features calculated along every free-form path, which takes into account the noisy texture background. Li et al. [3] extracted pavement cracks through the F* Seed growth algorithm to search for complex crack topology structures. Salman et al. [4] employed the Gabor filter for distinguishing the fine structure of pavement cracks. The above methods can address some of the difficulties in pavement distress detection, but no general solutions were obtained.

In recent years, deep learning has been widely used for pavement distress detection because of its higher accuracy and better generalization performance. It can be generally categorized into supervised learning, which requires manual annotations, and weakly supervised learning, which does not. A supervised learning based approach requires various forms of distress annotations to obtain positive samples and then trains a discriminative network on them to identify distress. For example, some researchers partition large pavement images into small images and determine whether each belongs to a known distress form to achieve patch-level detection [5], [6]. More recently, object detection has been adopted to locate pavement distress in large-scale images using bounding boxes of variable size [7], [8]. Different algorithms, such as RetinaNet [9], YOLOv5 [10], and Faster R-CNN [11], have been developed and applied in pavement distress detection. Meanwhile, pixel-level segmentation is adopted to characterize the pavement distress morphology, including the length, shape, size, etc. [12], [13]. For instance, Hou et al. [14] use generative adversarial networks (GANs) for image augmentation, and Ren et al. [15] integrate dilated convolution, spatial pyramid pooling, and skip connections to improve the segmentation accuracy.

Supervised learning approaches greatly improve the accuracy of pavement distress detection and are easy to transfer to similar tasks. However, the complex topology of pavement distress makes pixel-level annotation time-consuming and laborious. Furthermore, we cannot guarantee that all anomaly categories are annotated for supervised learning, which means that untrained categories may be difficult to detect. From another perspective, pavement distress detection can be regarded as an anomaly detection problem in addition to an object detection problem, since distress is an abnormal condition. Anomaly detection assumes that most instances in the dataset are normal and detects anomalies by looking for instances that do not match normal data. Obviously, most of the pavement will appear in normal condition without distress

Manuscript received 24 October 2022; accepted 10 April 2023. Date of publication 21 April 2023; date of current version 30 August 2023. This work was supported by the National Natural Science Foundation of China under Grant 52278405. The Associate Editor for this article was S. A. Haider. (Corresponding author: Peixin Shi.)
The authors are with the School of Rail Transportation, Soochow University, Suzhou 215000, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/TITS.2023.3267433
1558-0016 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY. Downloaded on December 25,2024 at 10:07:37 UTC from IEEE Xplore. Restrictions apply.
10100 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 24, NO. 9, SEPTEMBER 2023
REN et al.: SEMI-SUPERVISED LEARNING APPROACH FOR PIXEL-LEVEL PAVEMENT ANOMALY DETECTION 10101
painting styles, etc., and are not good at changing or removing complex targets, such as pavement distress. Our approach is inspired by these ideas and is adapted to pavement distress.

III. METHODOLOGY

Let $\{x_i\} \in X$ and $\{y_i\} \in Y$ be the domains of abnormal and normal pavement images, respectively, where $x_i$ and $y_i$ are images sampled from $X$ and $Y$. Our approach aims to find a mapping function $G : X \to Y$ between the two domains. As illustrated in Figure 1, we propose a training schema that includes two generators and three discriminators. Similar to CycleGAN [25], we design another mapping function $F : Y \to X$, used to map the $Y$ domain to the $X$ domain. The domain $\hat{X}$ is mapped from $Y$ through $F : Y \to X$. Letting the $X$ domain pass through the mapping function $F : Y \to X$ results in the domain $X'$. Assuming $x_1 = (\xi_1, \xi_2, \xi_3, \cdots, \eta_1, \eta_2, \cdots)$, where $\xi_i$ are normal areas and $\eta_i$ are abnormal areas in the abnormal pavement image, the sets from domain $Y$ obtained by the mapping function $G : X \to Y$ can be expressed as $G(x_1) = y_1 = (\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \cdots)$, indicating that the abnormal areas have been replaced by normal areas. Similarly, we expect that the sets from domain $\hat{X}$ obtained by the mapping function $F : Y \to X$ can be expressed as $F(y_1) = \hat{x}_1 = (\xi_1, \xi_2, \xi_3, \cdots, \eta_1, \eta_2, \cdots)$, where $F : Y \to X$ is optimized by comparing $\hat{x}_1$ with $x_1$. The above measures ensure that the generated $y_i = G(x_i)$ belongs to the domain $Y$ and is similar to $x_i$. Therefore, we can detect the abnormal areas by comparing $x_i$ with $y_i$. Concretely, we want the mapping function $G : X \to Y$ to cure the distress on the pavement image without changing other pixels. To ensure that the generated images belong to the target domain, we employ three discriminators to determine whether they belong to the domains $Y$, $\hat{X}$ and $X'$, respectively. In addition, we optimize the whole framework by comparing the similarity between the domains $X$, $X'$ and $\hat{X}$. The loss functions used to train our network are introduced as follows.

A. Loss Functions

1) Adversarial Loss: We use the least squares GAN (LSGAN) loss [28] to calculate the distance between the generated distribution $p_g$ and the true distribution $p_{data}$ as follows:

$$\mathcal{L}_{adv}^{D} = \mathbb{E}_{x \sim p_{data}(x)}\left[(D(x) - 1)^2\right] + \mathbb{E}_{x \sim p_g(x)}\left[D(G(x))^2\right] \tag{1}$$

where the labels 1 and 0 encode the real and generated samples, respectively. By minimizing this loss function, we can continuously optimize the discriminator. Similarly, the generator is optimized using the following loss function:

$$\mathcal{L}_{adv}^{G} = \mathbb{E}_{x \sim p_g(x)}\left[(D(G(x)) - 1)^2\right] \tag{2}$$

2) Adversarial-Consistency Loss: The adversarial loss enables the generated image to be in the correct domain but does not guarantee that it is similar to the original image. For this purpose, we introduce the adversarial-consistency loss [27]. It aims to measure the distance between the generated image distribution and the original image distribution. Unlike the cycle-consistency loss [25], we do not encourage the generated image to be similar to the original one. Specifically, we synthesise multi-modal images in their neighborhood distribution. The adversarial-consistency loss is as follows:

$$\mathcal{L}_{acl} = \mathbb{E}_{(x, \hat{x}) \sim p(x, \{\hat{x}\})}\left[\log \hat{D}(x, \hat{x})\right] + \mathbb{E}_{(x, x') \sim p(x, \{x'\})}\left[\log\left(1 - \hat{D}(x, x')\right)\right] \tag{3}$$

B. Generation of Segmentation Mask Image

Prior methods compare the images before and after reconstruction and mark the pixel regions that differ in order to provide a pixel-level binary segmentation image of the distress. This can cause a variety of issues because the reconstructed images frequently differ from the original ones despite looking similar. To make the segmentation process trainable, we provide an end-to-end method for anomaly segmentation. In addition to the three RGB channels, we add a fourth channel, the mask channel, to generate segmentation images. Its values are limited to the range between 0 and 1 and are encouraged to be either 0 or 1. The mask is eventually combined with the RGB channel values to form a new image. By using the mask channel, we encourage the generator to change the foreground while leaving the background alone.

C. Network Structure

1) Generator: Pavement distress requires special feature extraction methods due to its special topology. Previous research shows that the combination of the fully connected (FC) layer and convolutional layer can locate the pavement distress more effectively. Inspired by this, we chose the auto-encoder from MUNIT [26] as the generator. It has a downsampling-upsampling structure, as depicted in Figure 2. To assist the decoder in eradicating the anomalous area, the encoder component must be sensitive to it. The encoder consists of two sections: the content encoder and the style encoder. Specifically, the content encoder first adopts convolutional layers for downsampling, followed by several concatenated residual blocks, and all of its convolutional layers are subjected to instance normalization (IN). The style encoder also uses convolutional layers for downsampling, followed by a global pooling layer and an FC layer. The IN layer is not used here so as to retain more style information. The decoder part is responsible for reconstructing the content code and style code into an image by means of residual blocks and several upsampling and convolutional layers.

2) Discriminator: Inspired by the MUNIT [26] generator and multi-scale discriminators [29], we propose a multi-style discriminator. As shown in Figure 3, the multi-style discriminator contains a downsampling module and then encodes the feature maps in four convolutional layers, each of which is encoded into a style code corresponding to a different scale by a global pooling layer and an FC layer. Finally, each style code is turned into a scalar by a multilayer perceptron (MLP). The whole discriminator therefore outputs a list that contains four values. Unlike the multi-scale discriminator,
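As a minimal sketch of the adversarial losses in Eqs. (1) and (2), the following plain-Python functions evaluate both terms on a batch of discriminator scores. The function names and the example scores are hypothetical and only illustrate the formulas; an actual training loop would compute the same quantities on tensors inside a deep learning framework.

```python
from statistics import fmean


def lsgan_discriminator_loss(d_real, d_fake):
    """Eq. (1): push D(x) toward 1 on real samples and D(G(x)) toward 0."""
    real_term = fmean([(d - 1.0) ** 2 for d in d_real])
    fake_term = fmean([d ** 2 for d in d_fake])
    return real_term + fake_term


def lsgan_generator_loss(d_fake):
    """Eq. (2): push D(G(x)) toward 1 so that generated samples look real."""
    return fmean([(d - 1.0) ** 2 for d in d_fake])


# Hypothetical discriminator scores for a batch of three images.
d_real = [0.9, 0.8, 1.0]   # scores on real images
d_fake = [0.1, 0.3, 0.2]   # scores on generated images
loss_d = lsgan_discriminator_loss(d_real, d_fake)
loss_g = lsgan_generator_loss(d_fake)
```

With these scores the discriminator loss is small because real and generated images are already well separated, while the generator loss is large for the same reason.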
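The mask channel of Section III-B can be illustrated with a short NumPy sketch. The text states that the mask is combined with the RGB channels to form a new image but does not give the rule in this excerpt, so the blend below (mask times generated RGB plus one minus mask times input) is an assumption, as are the function names and the 0.5 binarisation threshold.

```python
import numpy as np


def compose_with_mask(x, generated_rgbm):
    """Blend generator output into the input using the fourth (mask) channel.

    x:              input image, shape (H, W, 3), values in [0, 1]
    generated_rgbm: generator output, shape (H, W, 4); channels 0-2 are RGB,
                    channel 3 is the mask with values in [0, 1]
    Assumed rule (not stated in the text): out = m * rgb + (1 - m) * x.
    """
    rgb = generated_rgbm[..., :3]
    mask = generated_rgbm[..., 3:4]  # keep a trailing axis for broadcasting
    return mask * rgb + (1.0 - mask) * x


def segmentation_from_mask(generated_rgbm, threshold=0.5):
    """Binarise the mask channel; the 0.5 threshold is an illustrative choice."""
    return (generated_rgbm[..., 3] >= threshold).astype(np.uint8)


# Toy 2x2 example: only the top-left pixel is flagged as anomalous.
x = np.full((2, 2, 3), 0.2)
g = np.zeros((2, 2, 4))
g[..., :3] = 0.8   # generator proposes a different surface everywhere
g[0, 0, 3] = 1.0   # mask is 1 at the anomalous pixel and 0 elsewhere
y = compose_with_mask(x, g)
seg = segmentation_from_mask(g)
```

Under this assumed rule, pixels where the mask is 0 are copied from the input unchanged, which matches the stated goal of modifying the foreground while leaving the background alone.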
B. Metrics
We focus on two aspects: the quality of the reconstructed images and the quality of the segmentation images. To evaluate the quality of the reconstructed images, we introduce two indicators, Fréchet Inception Distance (FID) and mean Structural Similarity (mSSIM). We use FID to evaluate the quality of the generated domain as a whole (the set of generated images), and SSIM to evaluate each generated image against its paired original. FID and mSSIM are calculated as follows:

$$\mathrm{FID}(X, Y) = \lVert \mu_X - \mu_Y \rVert^2 + \operatorname{tr}\left(\Sigma_X + \Sigma_Y - 2\left(\Sigma_X \Sigma_Y\right)^{1/2}\right) \tag{4}$$
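Eq. (4) can be sketched in NumPy as follows. The sketch assumes that feature vectors have already been extracted for both image sets (FID conventionally uses activations of an Inception network, which is outside this example), and the function name `frechet_distance` is hypothetical. The trace of the matrix square root is obtained from the eigenvalues of the covariance product rather than from an explicit matrix square root.

```python
import numpy as np


def frechet_distance(feat_x, feat_y):
    """Eq. (4) on two feature sets of shape (n_samples, n_features)."""
    mu_x, mu_y = feat_x.mean(axis=0), feat_y.mean(axis=0)
    sigma_x = np.atleast_2d(np.cov(feat_x, rowvar=False))
    sigma_y = np.atleast_2d(np.cov(feat_y, rowvar=False))
    # tr((Sigma_X Sigma_Y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of Sigma_X Sigma_Y, which are real and non-negative for
    # positive semi-definite covariances (up to round-off).
    eigvals = np.linalg.eigvals(sigma_x @ sigma_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(sigma_x) + np.trace(sigma_y) - 2.0 * trace_sqrt)


# Random stand-in features; real usage would pass network activations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4))
shift = np.array([1.0, 0.0, 0.0, 2.0])
fid_same = frechet_distance(feats, feats)
fid_shifted = frechet_distance(feats, feats + shift)
```

Shifting every feature vector by a constant leaves the covariance unchanged, so only the mean term contributes and the distance equals the squared length of the shift (here 5), while identical sets give a distance of approximately zero.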
TABLE I
PARAMETERS OF MULTI-STYLE DISCRIMINATOR
Fig. 4. Reconstruction results against baselines. From left to right: input, our PAD Net, ACL-GAN [27], MUNIT [26], Fixed Point [17], f-AnoGAN [18], and GANomaly [30].
Fig. 5. Segmentation results against baselines. From left to right: manual annotations, our PAD Net, ACL-GAN [27], MUNIT [26], Fixed Point [17], f-AnoGAN [18], and GANomaly [30].
reflected in Figure 5, where only our method is able to indicate the location of the distress.

C. Ablation Study

We analyzed two different settings to reflect the two innovations in our method: 1) comparing the quality of the segmentation images obtained with and without the mask channel; 2) analyzing the effect of multi-style discriminators on the quality of the segmentation images obtained. The results of the different settings are shown in Figure 6, and the quantitative results are listed in Table IV, in which mSSIM represents the average similarity of each image before and after reconstruction, and FID represents the gap between the image sets before and after reconstruction.

1) Mask Channel: For each discriminator, we obtain segmentation images in two ways: from the mask channel, and by comparing the paired images through SSIM. Qualitatively, using the mask channel segments the distress area more accurately, and the segmented area is continuous. The segmentation images obtained through SSIM consist of independent small-area distress regions. The quantitative results in Table IV show that segmentation images obtained from the mask channel have a higher mPA than those obtained by SSIM, regardless of the discriminator used.
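The mPA figure used in this comparison can be illustrated with a short sketch. This excerpt does not define mPA, so the code assumes the common definition of mean pixel accuracy, that is, the per-class pixel accuracy averaged over the classes (here background = 0 and distress = 1). The function name and the toy masks are hypothetical.

```python
import numpy as np


def mean_pixel_accuracy(pred, target, num_classes=2):
    """Average of per-class pixel accuracy; classes absent from target are skipped."""
    accuracies = []
    for c in range(num_classes):
        in_class = target == c
        if in_class.any():
            accuracies.append(float((pred[in_class] == c).mean()))
    return sum(accuracies) / len(accuracies)


# Toy 2x4 masks: 0 = background, 1 = distress.
target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 1, 0],
                 [0, 1, 1, 1]])
mpa = mean_pixel_accuracy(pred, target)  # 0.75: each class has 3 of 4 pixels correct
```

Averaging per class rather than over all pixels keeps the large background area from dominating the score, which matters when distress occupies only a small fraction of each image.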
Fig. 6. Segmentation results of different ablation settings. From left to right: manual annotations, multi-style discriminator (4 layers) with mask channel, multi-style discriminator (4 layers) with SSIM, multi-style discriminator (1 layer) with mask channel, multi-style discriminator (1 layer) with SSIM, multi-scale discriminator with mask channel, and multi-scale discriminator with SSIM.
TABLE III
QUANTITATIVE RESULTS OF DIFFERENT METHODS

TABLE IV
QUANTITATIVE RESULTS OF DIFFERENT ABLATION SETTINGS

increase of MLP layers. The optimal solution appears when the number of MLP layers is four. At this time, we obtain 80.75% mPA while obtaining 96.16% mSSIM. In addition, the comparison with the widely adopted multi-scale discriminator shows that the detection accuracy of the multi-style discriminator using one MLP layer is weaker than that of the multi-scale discriminator, but it performs better when using two or more MLP layers. As shown in Table III, our method obtains the highest mPA while also obtaining the maximum FID. Considering that FID tends to evaluate the contents or features between two domains, we believe that our method changes content information in the original image without changing most pixels in the background. Qualitatively, the use of the multi-style (one layer) discriminator leads to the collapse of the mask channel, and the expected segmentation images are not obtained. Comparing paired images through SSIM also shows that the use of the multi-style (one layer) discriminator results in the generated image being similar to the original one, with only a small amount of distress detected. The multi-scale discriminator tends to transform the original image as a whole, and the white patches in the segmentation images obtained by it are scattered over the entire image, failing to detect the distress area accurately.
Fig. 7. High-quality cases of our approach. The red boxes indicate pavement distress that is detected by our method but was missed during manual labeling.
[8] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, “Road damage detection and classification using deep neural networks with smartphone images,” Comput.-Aided Civil Infrastruct. Eng., vol. 33, no. 12, pp. 1127–1141, Jun. 2018.
[9] V. P. Tran, T. S. Tran, H. J. Lee, K. D. Kim, J. Baek, and T. T. Nguyen, “One stage detector (RetinaNet)-based crack detection for asphalt pavements considering pavement distresses and surface objects,” J. Civil Struct. Health Monit., vol. 11, no. 1, pp. 205–222, Feb. 2021.
[10] G. X. Hu, B. L. Hu, Z. Yang, L. Huang, and P. Li, “Pavement crack detection method based on deep learning models,” Wireless Commun. Mobile Comput., vol. 2021, pp. 1–13, May 2021.
[11] X. Xu et al., “Crack detection and comparison study based on faster R-CNN and mask R-CNN,” Sensors, vol. 22, no. 3, p. 1215, Feb. 2022.
[12] Y. Wang, K. Song, J. Liu, H. Dong, Y. Yan, and P. Jiang, “RENet: Rectangular convolution pyramid and edge enhancement network for salient object detection of pavement cracks,” Measurement, vol. 170, Jan. 2021, Art. no. 108698.
[13] F. Yang, L. Zhang, S. Yu, D. V. Prokhorov, X. Mei, and H. Ling, “Feature pyramid and hierarchical boosting network for pavement crack detection,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 4, pp. 1525–1535, Apr. 2020.
[14] Y. Hou et al., “A deep learning method for pavement crack identification based on limited field images,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 11, pp. 22156–22165, Nov. 2022.
[15] Y. Ren et al., “Image-based concrete crack detection in tunnels using deep fully convolutional networks,” Construct. Building Mater., vol. 234, Feb. 2020, Art. no. 117367.
[16] C. Baur, S. Denner, B. Wiestler, N. Navab, and S. Albarqouni, “Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study,” Med. Image Anal., vol. 69, Apr. 2021, Art. no. 101952.
[17] M. M. R. Siddiquee et al., “Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 191–200.
[18] T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth, “f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks,” Med. Image Anal., vol. 54, pp. 30–44, May 2019.
[19] J. Song, K. Kong, Y.-I. Park, S.-G. Kim, and S.-J. Kang, “AnoSeg: Anomaly segmentation network using self-supervised learning,” 2021, arXiv:2110.03396.
[20] V. Zavrtanik, M. Kristan, and D. Skočaj, “Reconstruction by inpainting for visual anomaly detection,” Pattern Recognit., vol. 112, Apr. 2021, Art. no. 107706.
[21] G. Di Biase, H. Blum, R. Siegwart, and C. Cadena, “Pixel-wise anomaly detection in complex driving scenes,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 16918–16927.
[22] T. Vojir, T. Šipka, R. Aljundi, N. Chumerin, D. O. Reino, and J. Matas, “Road anomaly detection by partial image reconstruction with segmentation coupling,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 15651–15660.
[23] H. Liu, X. Miao, C. Mertz, C. Xu, and H. Kong, “CrackFormer: Transformer network for fine-grained crack detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 3783–3792.
[24] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “MVTec AD—A comprehensive real-world dataset for unsupervised anomaly detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 9592–9600.
[25] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2223–2232.
[26] X. Huang, M.-Y. Liu, S. Belongie, and J. Kautz, “Multimodal unsupervised image-to-image translation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 172–189.
[27] Y. Zhao, R. Wu, and H. Dong, “Unpaired image-to-image translation using adversarial consistency loss,” in Proc. 16th Eur. Conf. Comput. Vis. (ECCV). Glasgow, U.K.: Springer, Aug. 2020.
[28] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, “Least squares generative adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2794–2802.
[29] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional GANs,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8798–8807.
[30] S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “Ganomaly: Semi-supervised anomaly detection via adversarial training,” in Proc. 14th Asian Conf. Comput. Vis. (ACCV). Perth, WA, Australia: Springer, Dec. 2018.

Ruiqi Ren received the B.E. degree from Central South University, Changsha, China. He is currently pursuing the Ph.D. degree with the School of Rail Transportation, Soochow University, Suzhou, China. His research interests include structural health monitoring in civil and infrastructure, deep learning, and computer vision.

Peixin Shi received the Ph.D. degree in civil and environmental engineering from Cornell University. He is currently a Full Professor with the School of Rail Transportation, Soochow University. His research topics include tunnelling and underground space technology, smart underground infrastructures, and lifeline earthquake engineering.

Pengjiao Jia received the Ph.D. degree in geotechnical engineering from Northeastern University in 2020. He is currently an Associate Professor of geotechnical engineering with the School of Rail Transportation, Soochow University. His research has been recognized by 56 technical and more than ten patents. His research interests include pipelines and trenchless technology, pipe jacking, and tunneling.

Xiangyang Xu is a full professor at Soochow University. She has published about 60 SCI papers, and has served as the associate/guest editor for several SCI journals. Her research topics mainly involve structural health monitoring of civil and infrastructure, digital twins, and artificial intelligence.