
Fast and Accurate Traffic Sign Recognition for Self-Driving Cars using RetinaNet based Detector

Shehan P Rajendran, Linu Shine, Pradeep R, Sajith Vijayaraghavan


Dept. of Electronics and Communication Engineering
College of Engineering Trivandrum
Trivandrum, India
[email protected], [email protected], [email protected], [email protected]

Abstract—Increase in the number of vehicles on road necessitates the use of automated systems for driver assistance. These systems also form important components of self-driving vehicles. A Traffic Sign Recognition system is such an automated system, providing contextual awareness for the self-driving vehicle. CNN based methods like Faster R-CNN for object detection provide human level accuracy and real time performance and are proven successful in Traffic Sign Recognition systems [1]. Single stage detection systems such as YOLO [2] and SSD [3], despite offering state-of-the-art real-time detection speed, are not preferred for the traffic sign detection problem due to their reduced accuracy and small object detection issues. RetinaNet has shown promising results with respect to the accuracy and speed required for object detection problems. It uses Focal Loss [4] and a Feature Pyramid Network (FPN) [5] to tackle the low accuracy and small object detection problems. In this paper, an approach for a traffic sign recognition system for self-driving cars based on RetinaNet is presented, with a comparative analysis of its performance against a Faster R-CNN based sign detector [1] and a YOLOv3 [9] based detector. RetinaNet forms the traffic sign detection network and a CNN-based classifier forms the traffic sign class recognizer. The network training and evaluation are done using the German Traffic Sign Detection Benchmark (GTSDB) [6] dataset and the classifier performance is verified using the German Traffic Sign Recognition Benchmark (GTSRB) [7] dataset.

Keywords— Traffic sign detection, CNN, RetinaNet, Faster R-CNN, YOLO

I. INTRODUCTION

Systems for assisting drivers to avoid accidents are becoming more and more important as the number of vehicles on the road increases exponentially. Advanced Driver Assistance Systems (ADAS) are being effectively used in automobiles for providing lane keep assistance, forward collision warning, pedestrian warning, driver drowsiness detection, traffic sign assist etc. These form essential systems in autonomous cars for contextual awareness and road attribute mapping in order to control the vehicle motion trajectory. Traffic Sign Recognition (TSR) is the core component of the traffic sign assist system, providing timely instructions and warnings to drivers regarding traffic restrictions and information. In self-driving cars, the inputs from the traffic sign recognition system are used by the car to make suitable decisions, for example, to reduce speed or prepare for a detour.

Traffic sign recognition involves traffic sign detection and classification. Several studies have been conducted to address the traffic sign recognition problem. Even though some of the existing approaches have demonstrated good results on various benchmark datasets, most do not work very well in adverse conditions such as motion blur, poor illumination, rainy or foggy weather, small sized signs etc. Recent studies use deep learning techniques, especially Convolutional Neural Networks (CNNs), for solving the traffic sign detection and classification problems. The majority of such studies use the German Traffic Sign Detection Benchmark (GTSDB) [6] and German Traffic Sign Recognition Benchmark (GTSRB) [7] datasets for training and evaluation of their detection and classification networks. Improvements have been made to general CNN based object detection networks for traffic sign detection. A modified version of Faster R-CNN, a two-stage object detection network comprising a Region Proposal Network, a bounding box regressor and a classifier network, has shown good accuracy and speed, and is found to be promising for real world problems like self-driving cars. Single stage detectors like YOLO [2] offer real time detection speed; however, they suffer from foreground-background class imbalance problems and do not provide the required detection accuracy. Also, these detectors are not efficient in detecting small objects, which is critical for traffic sign recognition. The recent improvements made as part of YOLOv2 [8] and YOLOv3 [9] have tackled these problems to some extent.

In this paper, the RetinaNet object detection network, which is based on a feature pyramid network and focal loss, is tuned and used for traffic sign detection. A very good detection result on the GTSDB database is observed using this approach. The CNN based traffic sign classifier proposed by Li and Wang [1] is used for classification of the traffic signs.

The rest of the paper is organized as follows. Section II covers the related work, Section III describes the technical approach, Section IV details the experiments done, Section V covers the performance evaluation and Section VI draws the conclusion of the work.

II. RELATED WORK

A. Traffic Sign Recognition

Extensive research has been done by the computer vision and machine learning communities in the past decade to address the problem of automatic traffic sign detection and recognition. Issues such as non-uniform scene illumination, blurring due to motion of the vehicle-mounted camera with respect to traffic signs, and traffic sign occlusion due to other vehicles, trees etc. make the traffic sign recognition task very challenging. Traffic sign detection based on extracting sign proposals and classifying them using a color probability model and Histogram of Oriented Gradients (HOG) is proposed by Y. Yang et al. [10].


However, manual features such as HOG fail due to the challenges mentioned above. Recently, several object detection problems are being better addressed using Convolutional Neural Networks (CNNs). Computer vision research for intelligent transportation systems is also following this trend, and many such works are finding practical use in advanced driver assistance applications as well as in autonomous driving vehicles. In line with this, several researchers have attempted to solve the problem of traffic sign recognition using CNN based object detection frameworks.

Y. Zhu et al. [11] proposed a method based on deep learning components for traffic sign recognition. It consisted of a Fully Convolutional Network (FCN) for proposal generation and a CNN for classifying signs. Later, Zhu et al. [12] proposed a traffic sign recognition system using an end-to-end multi-class CNN for simultaneous traffic sign detection and classification. They also created a new dataset, the Tsinghua-Tencent 100K dataset, for performance analysis, and proposed a single class detection network for traffic sign detection with a separate classifier for classifying the detected traffic signs.

The performance of Fast R-CNN [13] for traffic sign recognition was studied by Zhu et al. [12]. Peng et al. [14] analysed Faster R-CNN for traffic sign detection; even though this approach was promising compared to the previous studies, consistent accuracy and detection speed could not be achieved. Another real time traffic sign recognizer based on Faster R-CNN [15] following the MobileNet [16] structure, proposed recently by Li and Wang [1], could deliver the real-time performance and accuracy required for applications such as self-driving cars, based on their evaluation using the GTSDB dataset. In this approach they also proposed a classifier network using asymmetric kernels for classifying the signs into 43 classes.

B. Traffic Sign Classification

CNN based classifiers trained with the GTSRB dataset could achieve high classification accuracy in classifying traffic signs. The classifier based on a Multi Column Deep Neural Network (MCDNN) proposed by Ciregan et al. [17] and the multi-scale CNN by Sermanet et al. [18] could achieve accuracies of 99.17% and 99.65% respectively. These networks, however, are large and have to learn a huge number of parameters. Another approach by T. L. Yuan [19] using Spatial Transformer Networks (STN) could achieve an accuracy of 99.59%. The classifier network based on a CNN using asymmetric kernels proposed by Li and Wang [1] reported an accuracy of 99.66% when evaluated with the GTSRB dataset. Their Faster R-CNN based detector and CNN based classifier combination has proved superior to the state-of-the-art traffic sign recognizers. The CNN based classifier employing asymmetric kernels proposed in [1] is used as the classifier in the proposed traffic sign recognition pipeline.

III. APPROACH

In this section, our approach for traffic sign recognition based on RetinaNet is presented. The traffic sign recognition pipeline, which is illustrated in Figure 1, consists of the RetinaNet based detector, the bounding box pre-processor and the CNN based classifier. The detector, trained using the GTSDB dataset, detects and localizes the candidate traffic signs. The bounding box pre-processor prepares the localized traffic signs for classification: it enlarges the bounding boxes, crops and resizes the boxes containing candidate traffic signs, and feeds them to the classifier. The CNN based classifier classifies the candidate traffic sign as belonging to one of the 43 traffic sign classes.

A. RetinaNet based Traffic Sign Detection

RetinaNet is used as the traffic sign detector. RetinaNet is a composite network consisting of the following:

- Feature Pyramid Network (FPN) [5], which forms the backbone network
- A subnetwork for object classification
- A subnetwork for bounding box regression

The RetinaNet structure is illustrated in Figure 2.

1) Feature Pyramid Network (FPN): The RetinaNet based traffic sign detector uses an FPN as the backbone network, built on top of a ResNet-50 [20] deep feature extractor. The FPN is a fully convolutional network which takes a traffic scene image of arbitrary size and outputs feature maps at multiple scales, resulting in a feature pyramid: a semantically strong multi-scale feature representation. This multi-scale representation enables the detector to detect candidate traffic signs of varying sizes in the traffic scene. The feature pyramid network consists of bottom-up and top-down pathways. A feature hierarchy comprising feature maps of different scales is generated by the bottom-up pathway. The top-down pathway up-samples the feature map from the higher pyramid level to generate semantically strong, but spatially coarse, feature maps of different spatial sizes. Using lateral connections, the feature maps of the same spatial size from both pathways are merged, resulting in feature maps of different scales which are semantically strong with accurately localized activations.
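To make the merge step concrete, the following is a minimal Keras sketch of one top-down FPN merge, assuming square feature maps whose spatial sizes differ by exactly a factor of two; the function and argument names are illustrative and this is not the actual keras-retinanet implementation.

```python
from tensorflow.keras import layers

def fpn_merge(higher_level_map, lateral_map, feature_size=256):
    """One top-down FPN merge step (illustrative sketch).

    higher_level_map: semantically strong but spatially coarse map from the
                      level above; lateral_map: bottom-up map of 2x the size.
    """
    # 1x1 convolution brings the bottom-up map to the common channel depth
    lateral = layers.Conv2D(feature_size, 1, padding='same')(lateral_map)
    # Nearest-neighbour upsampling of the coarse top-down map by a factor of 2
    upsampled = layers.UpSampling2D(size=2)(higher_level_map)
    # Element-wise addition merges the two pathways (shapes assumed to match)
    merged = layers.Add()([upsampled, lateral])
    # 3x3 convolution smooths the aliasing introduced by upsampling
    return layers.Conv2D(feature_size, 3, padding='same')(merged)
```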
2) Classification Subnet: The object classification subnetwork comprises fully convolutional networks attached to each feature map level of the feature pyramid. Each subnetwork consists of four convolutional layers of filter size 3 × 3 with 256 filters and ReLU activations, followed by another 3 × 3 convolutional layer with C × A filters and sigmoid activation, where C is the number of classes and A is the number of anchor boxes. The output shape is (W, H, C×A), where W and H are proportional to the width and height of the input feature map level respectively.

Figure 1. RetinaNet based traffic sign detection pipeline


Figure 2. RetinaNet detector architecture

3) Regression Subnet: The bounding box regression subnet is also attached to each feature map level of the feature pyramid. Its structure is similar to that of the classification subnetwork; the only difference is in the filter count of the last convolutional layer, which is 4A in this case, with 4 bounding box coordinates per anchor box. The output shape is (W, H, 4A).
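As an illustration of the two heads, the sketch below builds subnets with the structure just described: four 3 × 3 convolutions with 256 filters and ReLU, then a task-specific output layer. The anchor count A = 9 and the single-class setting are assumptions made for the sketch, not values stated in this section.

```python
from tensorflow.keras import layers, models

def build_subnet(out_filters, activation, feature_size=256):
    """One RetinaNet head: four 3x3/256 ReLU convolutions followed by a
    task-specific 3x3 output convolution. Applied to every pyramid level."""
    inputs = layers.Input(shape=(None, None, feature_size))
    x = inputs
    for _ in range(4):
        x = layers.Conv2D(feature_size, 3, padding='same', activation='relu')(x)
    outputs = layers.Conv2D(out_filters, 3, padding='same', activation=activation)(x)
    return models.Model(inputs, outputs)

num_classes, num_anchors = 1, 9   # assumed: single 'traffic sign' class, 9 anchors/location
# Classification subnet: C*A sigmoid outputs per location -> (W, H, C*A)
classification_subnet = build_subnet(num_classes * num_anchors, 'sigmoid')
# Regression subnet: same structure, 4 box offsets per anchor -> (W, H, 4A)
regression_subnet = build_subnet(4 * num_anchors, None)
```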
4) Loss Functions: RetinaNet uses a multi-task loss that contains two terms, a localization loss and a classification loss:

$L = L_{loc} + \lambda L_{cls}$  (1)

Here $L_{loc}$ is the localization loss and $L_{cls}$ is the classification loss; λ is a balancing factor which balances the two task losses.

In single stage detectors, due to foreground-background class imbalance, the training process is dominated by easily classifiable background examples, which results in less accurate detection performance. RetinaNet resolves the class imbalance problem by using a variant of the Focal Loss proposed by Lin et al. [4] as the classification loss. For each anchor, the classification loss is defined as:

$L_{cls} = -\sum_{i=1}^{C} \left\{ \alpha_i\, y_i\, (1 - p_i)^{\gamma} \log(p_i) + (1 - \alpha_i)(1 - y_i)\, p_i^{\gamma} \log(1 - p_i) \right\}$  (2)

where C denotes the number of classes; yi = 1 if the ground truth belongs to the ith class and 0 otherwise; pi is the predicted score for the ith class; γ is the focusing parameter, whose value can range from 0 to +∞; and αi is the weighting factor for the ith class, ranging from 0 to 1. Here, the categorical cross entropy loss is modified by the term (1 - pi)^γ, called the modulating factor, which down-weights easy examples and thus focuses training on hard negative examples. For a misclassified example with a small pi value, the modulating factor is close to 1 and the loss is not affected, whereas when pi is close to 1, i.e., for a well-classified example, the modulating factor approaches 0 and the loss is down-weighted. αi and γ are hyper-parameters which can be tuned; the effect of the modulating factor increases with γ. If αi = 1 and γ = 0, the focal loss becomes equivalent to the categorical cross entropy loss. The Smooth L1 loss function [13] is used as the localization loss.
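A minimal TensorFlow sketch of Eq. (2) follows, assuming one-hot ground truth y_true and sigmoid scores y_pred, with the per-class weights αi collapsed to a single scalar (as is done with α = 0.25 in the training section); the anchor masking that a full implementation needs is omitted.

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss of Eq. (2) for a batch of anchors.
    The (1 - p)^gamma and p^gamma modulating factors down-weight easy examples."""
    p = tf.clip_by_value(y_pred, eps, 1.0 - eps)       # avoid log(0)
    pos = alpha * y_true * tf.pow(1.0 - p, gamma) * tf.math.log(p)
    neg = (1.0 - alpha) * (1.0 - y_true) * tf.pow(p, gamma) * tf.math.log(1.0 - p)
    return -tf.reduce_sum(pos + neg, axis=-1)          # sum over the C classes
```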
B. Bounding Box Pre-processor

The bounding box pre-processor stage extracts and prepares the detected candidate traffic sign boxes for classification. The center of the regressed bounding box is determined and the box is enlarged by 25% to compensate for any regression errors, making sure the traffic sign is completely enclosed in the region. The enlarged boxes are cropped, resized to 48 × 48 and fed to the classifier network for recognition of the traffic sign class.
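These steps amount to a few lines of image arithmetic; the following is a sketch using OpenCV, assuming boxes in pixel coordinates (the function name and box format are illustrative).

```python
import cv2

def preprocess_box(image, box, enlarge=0.25, out_size=48):
    """Enlarge a detected box by 25% about its center, then crop and resize
    to 48x48 for the classifier. 'box' is assumed to be (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0            # box center
    w = (x2 - x1) * (1 + enlarge)                        # enlarged width
    h = (y2 - y1) * (1 + enlarge)                        # enlarged height
    # Clip the enlarged box to the image bounds
    nx1 = int(max(cx - w / 2, 0)); ny1 = int(max(cy - h / 2, 0))
    nx2 = int(min(cx + w / 2, image.shape[1])); ny2 = int(min(cy + h / 2, image.shape[0]))
    crop = image[ny1:ny2, nx1:nx2]
    return cv2.resize(crop, (out_size, out_size))
```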
C. Traffic Sign Classifier

The fast and accurate traffic sign classifier network architecture proposed in [1] is used here.

In this architecture, an n × n convolution is replaced by an n × 1 convolution followed by a 1 × n convolution, which reduces both the number of convolution operations and the number of network parameters. This leads to reduced computational cost and increased speed.

The classifier network structure is given in Table I. Batch Normalization and ReLU layers follow all layers other than the final dense layer. The sixth layer forms an inception module, where kernels of different sizes are used to extract information from the feature map output of the previous layer; the output feature maps from the inception branches are concatenated. Dropout layers are used to regularize the activations of the final stages. To recognize the 43 traffic sign classes, a fully-connected layer with an output size of 43 and Softmax activation is used as the last layer.

IV. EXPERIMENTS

The RetinaNet based traffic sign detector and the CNN based classifier are implemented using Keras with the TensorFlow backend. The detector and classifier are trained and evaluated on the GTSDB and GTSRB datasets respectively. A Google Colaboratory environment with an Nvidia Tesla T4 GPU having 16 GB of GPU memory was used for training the detector. The evaluation of the detector, as well as the training and evaluation of the classifier, are done on a computer with an Nvidia GTX 1060 GPU having 6 GB of GPU memory.


Figure 3. Traffic scene image samples from GTSDB dataset [6]. These show different driving scenarios and lighting
conditions.
TABLE I. TRAFFIC SIGN CLASSIFIER ARCHITECTURE [1]

| Layer | Type | Filter Size/Parameter | Filters/Stride | Output Shape |
|-------|------|-----------------------|----------------|--------------|
| 1 | Conv | 3 × 3 | 32, s1 | 48 × 48 × 32 |
| 2 | Conv | 7 × 1 | 48, s1 | 48 × 48 × 48 |
| 3 | Conv | 1 × 7 | 48, s1 | 48 × 48 × 48 |
| 4 | MaxPool | 2 × 2 | s2 | 24 × 24 × 48 |
| 5 | DropOut | 0.2 | - | 24 × 24 × 48 |
| 6-1 | Conv (Inception) | 3 × 1 | 64, s1 | 24 × 24 × 64 |
| 7-1 | Conv (Inception) | 1 × 3 | 64, s1 | 24 × 24 × 64 |
| 6-2 | Conv (Inception) | 1 × 7 | 64, s1 | 24 × 24 × 64 |
| 7-2 | Conv (Inception) | 7 × 1 | 64, s1 | 24 × 24 × 64 |
| 8 | Concatenate | - | - | 24 × 24 × 128 |
| 9 | MaxPool | 2 × 2 | s2 | 12 × 12 × 128 |
| 10 | DropOut | 0.2 | - | 12 × 12 × 128 |
| 11 | Conv | 3 × 3 | 128, s1 | 12 × 12 × 128 |
| 12 | Conv | 3 × 3 | 256, s1 | 12 × 12 × 256 |
| 13 | MaxPool | 2 × 2 | s2 | 6 × 6 × 256 |
| 14 | DropOut | 0.3 | - | 6 × 6 × 256 |
| 15 | Dense | 256 | - | 256 |
| 16 | DropOut | 0.4 | - | 256 |
| 17 | Dense (Softmax) | 43 | - | 43 |
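For concreteness, a Keras sketch of the Table I architecture follows, assuming 48 × 48 RGB inputs and 'same' padding throughout (both implied by the table's output shapes); it is a reconstruction from the table, not the authors' code.

```python
from tensorflow.keras import layers, models

def conv_bn_relu(x, filters, kernel):
    """Conv followed by Batch Normalization and ReLU, as used throughout Table I."""
    x = layers.Conv2D(filters, kernel, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_classifier(num_classes=43):
    inp = layers.Input(shape=(48, 48, 3))              # assumed RGB input
    x = conv_bn_relu(inp, 32, (3, 3))                  # layer 1
    x = conv_bn_relu(x, 48, (7, 1))                    # layer 2: asymmetric 7x1
    x = conv_bn_relu(x, 48, (1, 7))                    # layer 3: asymmetric 1x7
    x = layers.MaxPooling2D(2, strides=2)(x)           # layer 4
    x = layers.Dropout(0.2)(x)                         # layer 5
    # Inception-style module (layers 6-8): two asymmetric branches, concatenated
    b1 = conv_bn_relu(conv_bn_relu(x, 64, (3, 1)), 64, (1, 3))
    b2 = conv_bn_relu(conv_bn_relu(x, 64, (1, 7)), 64, (7, 1))
    x = layers.Concatenate()([b1, b2])                 # 24 x 24 x 128
    x = layers.MaxPooling2D(2, strides=2)(x)           # layer 9
    x = layers.Dropout(0.2)(x)                         # layer 10
    x = conv_bn_relu(x, 128, (3, 3))                   # layer 11
    x = conv_bn_relu(x, 256, (3, 3))                   # layer 12
    x = layers.MaxPooling2D(2, strides=2)(x)           # layer 13 -> 6 x 6 x 256
    x = layers.Dropout(0.3)(x)                         # layer 14
    x = layers.Flatten()(x)
    x = layers.ReLU()(layers.BatchNormalization()(layers.Dense(256)(x)))  # layer 15
    x = layers.Dropout(0.4)(x)                         # layer 16
    out = layers.Dense(num_classes, activation='softmax')(x)              # layer 17
    return models.Model(inp, out)
```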
A. Datasets

The GTSDB and GTSRB datasets are very popular datasets for training and evaluating models for traffic sign detection and recognition respectively.

The GTSDB dataset is generated from video sequences recorded near Bochum, Germany. It consists of images from urban, rural and highway scenarios under different weather and lighting conditions, making it quite representative and challenging. The dataset consists of 900 images of 1360 × 800 pixels in raw PPM format, divided into 600 training images and 300 test images. The training images contain 846 traffic signs and the test images contain 360 traffic signs. The traffic sign sizes in the images vary between 16 and 128 pixels.

Representative sample images from the GTSDB dataset are shown in Figure 3. They show traffic scenes from urban, rural and highway driving scenarios under different lighting and weather conditions. It can be seen that, due to harsh background lighting, the traffic sign is barely visible in one of the images; also, in the highway scene, the traffic signs are small in size. Being a dataset with images of such diverse conditions, the GTSDB dataset is used to train and evaluate the RetinaNet based detector.

The GTSRB dataset consists of more than 50000 traffic sign images of 43 classes, with sizes varying between 15 × 15 pixels and 222 × 193 pixels. Sample images of the different traffic sign classes are shown in Figure 4. The classifier is trained using 39209 training images and evaluated on 12630 test images from the GTSRB dataset.

B. Evaluation criteria

A detection with Intersection over Union (IoU) between the ground truth box and the predicted bounding box greater than 0.5 is considered a positive proposal. Mean average precision (mAP) is used for evaluating the detector performance. The detection speed per image, measured in milliseconds (or as the number of frames per second), is also used for evaluating the detector. These performance measures are used for comparing the performance of the proposed detector with the Faster R-CNN and YOLOv3 based detectors.


Accuracy is used as the measure for evaluating the classifier performance. Confusion matrix-based evaluation for analyzing the accuracy of the classifier across the various traffic sign classes is also used.
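The IoU criterion is standard; a small helper like the following (names illustrative) is enough to score a detection against a ground-truth box.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes; a detection
    with IoU > 0.5 against a ground-truth box counts as a positive proposal."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)      # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```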
C. Detector training

Detector training was done on the GTSDB dataset with 600 training images. A ResNet-50 model pretrained on the COCO dataset was loaded as the backbone model for the RetinaNet. The regression and classification subnets were trained for 64 epochs with 500 steps per epoch, using minibatch training with a batch size of 4. The Adam algorithm [21] was used as the loss function optimizer, with an initial learning rate of 1e-5 and a learning rate reduction by a factor of 0.1 if learning plateaus. For the focal loss, a weighting factor of α = 0.25 and a focusing parameter γ = 2 were used; λ = 1 was used as the balancing factor for the multi-task loss. After 64 epochs of training, the classification loss had reduced to 0.0066 and the regression loss to 0.196.
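In Keras terms, the training setup amounts to something like the sketch below; retinanet_model, focal, smooth_l1 and train_generator are placeholders standing in for the keras-retinanet model (with the ResNet-50/COCO backbone), its losses and a GTSDB batch generator with batch size 4, not verbatim API calls.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Placeholders: retinanet_model, focal, smooth_l1 and train_generator are
# assumed to come from the detector setup described above.
retinanet_model.compile(
    optimizer=Adam(learning_rate=1e-5),            # initial learning rate from the paper
    loss={'classification': focal, 'regression': smooth_l1},
)
retinanet_model.fit(
    train_generator,
    steps_per_epoch=500,                           # 500 steps per epoch
    epochs=64,                                     # 64 epochs in total
    callbacks=[ReduceLROnPlateau(monitor='loss', factor=0.1)],  # reduce LR on plateau
)
```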
D. Classifier training

The classifier was trained on the GTSRB dataset with 39209 training images. Minibatch training with a batch size of 16 was done. The Adam optimizer was used with an initial learning rate of 0.001 and a learning rate decay of 1e-6 per minibatch. The classifier was trained for 50 epochs initially without any data augmentation, and for an additional 200 epochs with data augmentation, as proposed in [1], using various image transformations like shifting, shearing, scaling and rotation.
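A sketch of the two-phase training with Keras' ImageDataGenerator follows; the augmentation magnitudes are illustrative, since the paper names the transformations but not their ranges, and classifier, x_train and y_train are assumed from the earlier classifier sketch.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 'classifier' is the Table I model; x_train/y_train are GTSRB arrays (one-hot labels).
classifier.compile(optimizer=Adam(learning_rate=0.001, decay=1e-6),
                   loss='categorical_crossentropy', metrics=['accuracy'])

# Phase 1: 50 epochs without augmentation
classifier.fit(x_train, y_train, batch_size=16, epochs=50)

# Phase 2: 200 more epochs with shift/shear/scale/rotation augmentation
# (the magnitudes below are assumed, not taken from the paper)
augmenter = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
                               shear_range=0.1, zoom_range=0.1, rotation_range=10)
classifier.fit(augmenter.flow(x_train, y_train, batch_size=16), epochs=200)
```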
V. EVALUATION

A. Detector Performance

The RetinaNet based traffic sign detector was evaluated using the GTSDB test set consisting of 300 images. Mean Average Precision (mAP) is used as the performance measure for the detector. The proposed detector could achieve a state-of-the-art detection performance of 96.7% on the GTSDB test set; an mAP of 97.1% is observed when verified with the training set images.

Figure 4. Sample images of the 43 traffic sign classes from GTSRB dataset [7]

Figure 5. (a) Traffic sign detection results. (b) Traffic signs labelled with classification results


The detector took an average of 197 ms for processing an image, i.e., an average frame processing rate of approximately 5 frames per second (fps) could be achieved. Examples of detection results are shown in Figure 5(a).

The detector performance is compared with traffic sign detector implementations using Faster R-CNN [1] and YOLOv3 [9], evaluated using the GTSDB test set. The proposed RetinaNet based detector outperforms both detectors in terms of accuracy; however, YOLOv3 could achieve a higher frame rate of 9.87 fps, with an mAP of 92.2%.

A comparison of the detector performances, evaluated on an Intel i7 based PC with an Nvidia GTX 1060 GPU, is given in Table II.
TABLE II. TRAFFIC SIGN DETECTOR COMPARISON

| Detector | mAP | Time per image | Frame Rate |
|----------|-----|----------------|------------|
| Faster R-CNN [1] | 84.5% | 261 ms | 3.82 fps |
| YOLOv3 [9] | 92.2% | 101 ms | 9.87 fps |
| RetinaNet (proposed) | 96.7% | 197 ms | 5.07 fps |
B. Classifier Performance

The CNN based custom traffic sign classifier was evaluated using the GTSRB test set consisting of 12630 images. The model trained without data augmentation could achieve an accuracy of 96.46%. With data augmentation, the accuracy improved to 99.6%, as reported in [1]. Figure 6 shows the confusion matrix of the classification results from the model trained with data augmentation, where the per-class accuracy of the classification results can be seen. Most of the classes could be classified with 100% accuracy; a minimum accuracy of 97% is observed for one of the classes.

A complete traffic sign recognition pipeline using the RetinaNet based detector, the bounding box pre-processor and the CNN based classifier has been set up by integrating the components. The final result images, with labels generated by the classifier on bounding boxes from the detector, are shown in Figure 5(b).
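The accuracy and per-class figures above can be reproduced from predictions with a few lines of scikit-learn; a sketch, assuming classifier, x_test and one-hot y_test carry over from the earlier steps.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# 'classifier' and the GTSRB test arrays are assumed from the training step.
y_pred = np.argmax(classifier.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)

print('overall accuracy:', accuracy_score(y_true, y_pred))
cm = confusion_matrix(y_true, y_pred)
# Per-class accuracy: diagonal over row sums of the confusion matrix
per_class = cm.diagonal() / cm.sum(axis=1)
print('minimum per-class accuracy:', per_class.min())
```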
VI. CONCLUSION

In this paper, a traffic sign recognition system based on a RetinaNet detector and a custom CNN based classifier is proposed. The detection performance, with traffic signs considered as a single class, is found to be superior to all previous detectors in terms of detection accuracy, with a reasonably high frame rate. The proposed detector has proved able to detect almost all categories of traffic signs and could regress accurate bounding boxes for the detected signs. The CNN based traffic sign classifier is simple in architecture with very high accuracy. The RetinaNet based detector and the CNN based classifier complete the traffic sign recognition pipeline.

The fine-tuning and evaluation of the proposed detection network on other benchmark datasets such as Tsinghua-Tencent 100K can be taken up in future work, which could make the recognition system more robust.

Figure 6. Traffic Sign Classifier Confusion Matrix


Also, there is a possibility of enhancing the datasets with newly added traffic signs, so that the model can be kept current with the existing transportation infrastructure in various countries. An end-to-end traffic sign recognition pipeline using the proposed detector is another aspect which can be explored further. Based on some initial evaluations, it is observed that the RetinaNet detector's performance is promising for end-to-end traffic sign recognition. Experiments with fine-tuning the network parameters, to bring the performance of end-to-end recognition to a level comparable to the detector-classifier combined approach, can be taken up in future work.

REFERENCES

[1] J. Li and Z. Wang, "Real-time traffic sign recognition based on efficient CNNs in the wild," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, Mar. 2019.
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 779–788, Jun. 2016.
[3] W. Liu et al., "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis., Cham, Switzerland: Springer, 2016.
[4] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," 2017. [Online]. Available: https://arxiv.org/abs/1708.02002
[5] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," 2016. [Online]. Available: https://arxiv.org/abs/1612.03144
[6] S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, "Detection of traffic signs in real-world images: The German traffic sign detection benchmark," in Proc. IEEE Int. Joint Conf. Neural Netw., pp. 1–8, Aug. 2013.
[7] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, "The German traffic sign recognition benchmark: A multi-class classification competition," in Proc. IEEE Int. Joint Conf. Neural Netw., pp. 1453–1460, Aug. 2011.
[8] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," 2016. [Online]. Available: https://arxiv.org/abs/1612.08242
[9] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018. [Online]. Available: https://arxiv.org/abs/1804.02767
[10] Y. Yang, H. Luo, H. Xu, and F. Wu, "Towards real-time traffic sign detection and classification," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 7, pp. 2022–2031, Jul. 2016.
[11] Y. Zhu, C. Zhang, D. Zhou, X. Wang, X. Bai, and W. Liu, "Traffic sign detection and recognition using fully convolutional network guided proposals," Neurocomputing, vol. 214, pp. 758–766, Nov. 2016.
[12] Z. Zhu, D. Liang, S. Zhang, X. Huang, B. Li, and S. Hu, "Traffic-sign detection and classification in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2110–2118.
[13] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 1440–1448.
[14] E. Peng, F. Chen, and X. Song, "Traffic sign detection with convolutional neural networks," in Proc. Int. Conf. Cogn. Syst. Signal Process., Singapore: Springer, pp. 214–224, 2016.
[15] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[16] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017. [Online]. Available: https://arxiv.org/abs/1704.04861
[17] D. Ciregan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 3642–3649.
[18] P. Sermanet and Y. LeCun, "Traffic sign recognition with multi-scale convolutional networks," in Proc. IEEE Int. Joint Conf. Neural Netw., Aug. 2011, pp. 2809–2813.
[19] T. L. Yuan, GTSRB_Keras_STN. Accessed: Nov. 1, 2017. [Online]. Available: https://github.com/hello2all/GTSRB_Keras_STN
[20] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2015. [Online]. Available: https://arxiv.org/abs/1512.03385
[21] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014. [Online]. Available: https://arxiv.org/abs/1412.6980
