Traffic Light Counter Detection Comparison Using You Only Look Once Version 3 and Version 5
Corresponding Author:
Zamani Md Sani
Department of Mechatronics, Universiti Teknikal Malaysia Melaka
Hang Tuah Jaya, Malaysia
Email: [email protected]
1. INTRODUCTION
Road safety is globally recognised as one of the most significant problems that need to be
appropriately addressed. Running a red light is one of the most common causes of accidents at
intersections. According to research conducted by the Insurance Institute for Highway Safety (IIHS),
traffic light violations resulted in around 928 fatalities and 115,741 injuries on United States highways
in 2020 [1]. Regrettably, most of those killed or seriously injured were in good health prior to the
accidents. Hence, traffic lights are critical in ensuring road safety, especially on urban roads.
Several studies on traffic safety have examined the different components of the system [2]–[4].
The detection of traffic light counters on the road is critical for the safety of drivers, whether in
autonomous vehicles or conventional cars. The perception system, which gives the vehicle the ability to observe
and comprehend its surroundings, is a fundamental part of an autonomous automobile. The development of
autonomous vehicles has largely been motivated by a desire to cut down on the number of accidents that take
place worldwide. Detecting the traffic light counter is an essential kind of perception for an autonomous
vehicle because it informs the control decisions the vehicle must make, whether to reduce speed and stop at the
traffic light junction or to continue driving and cross the intersection. Furthermore, for a driver who is
unfamiliar with the traffic light signals, a system that presents the details of those signals, or that helps the
driver act on the remaining time shown on the counter, can be crucial in a sensitive driving manoeuvre (for
instance, crossing an intersection) [5].
The aim of this research is to design and develop a system that detects the traffic light counter,
classifies the numbers (0-9) and their colour (red or green) on the counter, and compares the results of the
you only look once version 3 (YOLOv3) algorithm with the YOLOv5 algorithm in aspects such as
accuracy and confidence rate. The research is limited to the classification of traffic light counter digits from
zero to nine, in red and green only, and the detection of the traffic light counter is performed in daytime only.
2. RELATED WORK
This section reviews related literature on traffic light counter detection. Although autonomous
vehicles have been studied extensively, most of the research focused on road signs and traffic lights
without including the traffic light counter. The study by Bascón et al. [6] focused on road signs, where the
detection and recognition were based on support vector machines (SVMs), and the system proved to be accurate
and reliable. Furthermore, in [7], detection and recognition exploited illumination conditions and
multi-exposure images and was also based on an SVM classifier. Although the results of that system were
accurate and reliable, the SVM classifier is a comparatively old technique and is less useful for current
detection and classification tasks. Convolutional neural networks (CNNs) [8] are more relevant for contemporary
conditions and have been used for applications in traffic lights, traffic signals, and traffic light counter detection.
Meanwhile, Muller and Dietmayer [9], and Li and Zhou [10] used single-shot multibox detection
(SSD) for traffic light detection. They utilised the DriveU traffic light dataset [11], and the research achieved
95% recall for small objects and up to 98% recall for larger objects, while the false positive rates were
between 0.1 and 1. It was also demonstrated by Jensen et al. [12] that using YOLO [13]–[15] with the
laboratory for intelligent and safe automobiles (LISA) traffic light dataset [16] and the logistic activity recognition
challenge (LARa) traffic light dataset [17] produced 96.38% recall for YOLOv3, 68.06% recall for
YOLOv2, and 42.3% recall for YOLOv1. Another study [18] used faster region-based convolutional neural
networks (faster R-CNN) [19] with the LISA traffic light dataset [16] and the Bosch small traffic light dataset [20],
and the results achieved were 56.31% mean average precision (mAP) on the Bosch dataset and 76.37% mAP on the
LISA dataset.
All the mentioned studies covered the traffic light signals only and did not include the traffic light
counter. Other research has used deep learning and YOLO for different purposes [21]–[28]. However, the study
by Chand et al. [5] used mask R-CNN [29] specifically for the countdown timer of the traffic light.
The datasets used were microsoft common objects in context (MS COCO) [30] and the street view house numbers
(SVHN) dataset [31], with results of 82.2% precision and 82.78% recall. Based on the review of
past research, it is clear that multiple researchers have worked on traffic light detection and recognition systems
[32] and have compared multiple algorithms to decide the best method for traffic light detection and classification
[33]. Nevertheless, the same has not been done for the detection and classification of the timer counter on the traffic
light. Therefore, this paper presents a method for the detection and classification of the counter, and
subsequently compares the performance of two algorithms, YOLOv3 and YOLOv5.
3. METHOD
This project employed a deep learning method with the YOLOv3 algorithm. Given an image, the
algorithm identifies and recognises the numerous objects it contains in real time. Object detection in
YOLO is framed as a regression problem, which results in the generation of bounding boxes and class
probabilities for the objects in the picture. A CNN is used in the YOLO method to recognise objects in real time.
As implied by the name, the approach needs only a single forward propagation through the neural network:
a single run of the algorithm is sufficient to predict the content of the whole image, forecasting several
class probabilities and bounding boxes simultaneously.
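The paper does not reproduce its inference code, but the single forward pass described above can be illustrated with a minimal sketch using OpenCV's DNN module; the file names (yolov3_counter.cfg, yolov3_counter.weights, counter.names, frame.jpg) and the thresholds are placeholders, not the exact files or settings used in this work.

    import cv2
    import numpy as np

    # Load the trained Darknet model (placeholder file names).
    net = cv2.dnn.readNetFromDarknet("yolov3_counter.cfg", "yolov3_counter.weights")
    classes = open("counter.names").read().strip().split("\n")  # 20 classes: 0-9 red, 0-9 green

    img = cv2.imread("frame.jpg")
    h, w = img.shape[:2]

    # Single forward pass: the whole image goes through the network once.
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > 0.5:
                cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression removes duplicate boxes for the same digit.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    for i in np.array(keep).flatten():
        print(classes[class_ids[i]], confidences[i], boxes[i])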
The dataset is a video collection of traffic lights with counters recorded via a smartphone camera around
the city of Melaka, Malaysia. The videos were split into multiple frames per second to acquire a total of
2,204 frames, of which 1,764 (80%) were used for training and 440 (20%) for testing. The flow
chart of the system building and training process is illustrated in Figure 1.
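As an illustration of this preprocessing step, the following is a minimal sketch of extracting frames from the recorded videos and splitting them 80/20 into training and testing lists; the folder names, the sampling interval, and the use of OpenCV are assumptions rather than the authors' exact procedure.

    import cv2
    import random
    from pathlib import Path

    videos = sorted(Path("videos").glob("*.mp4"))   # assumed folder of smartphone recordings
    out_dir = Path("frames")
    out_dir.mkdir(exist_ok=True)

    step = 5          # assumed sampling interval: keep every 5th frame (several frames per second)
    count = 0
    for video in videos:
        cap = cv2.VideoCapture(str(video))
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                cv2.imwrite(str(out_dir / f"{video.stem}_{count:05d}.jpg"), frame)
                count += 1
            idx += 1
        cap.release()

    # 80/20 split of the extracted frames into training and testing lists.
    frames = sorted(out_dir.glob("*.jpg"))
    random.seed(0)
    random.shuffle(frames)
    split = int(0.8 * len(frames))
    Path("train.txt").write_text("\n".join(str(p) for p in frames[:split]))
    Path("test.txt").write_text("\n".join(str(p) for p in frames[split:]))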
The dataset was labelled manually and individually via the computer vision annotation tool (CVAT),
an online platform, and organised into 20 classes (0-9 red and 0-9 green). The YOLOv3 algorithm was then
trained on the Google Colab platform using Python. For the training process, the maximum batch value was
set to 40,000 and the filters to 75. Figure 2 shows a sample of the dataset used for training.
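For context, the reported filter value follows the standard Darknet convention for YOLOv3: each [yolo] layer predicts three boxes per grid cell, and each box carries four coordinates, one objectness score, and one score per class, so the 20 counter classes require (20 + 5) × 3 = 75 filters in the convolutional layer preceding each [yolo] layer. A short sketch of this calculation (the variable names are illustrative only):

    # Relation between the 20 counter classes and the reported Darknet settings.
    num_classes = 20                                  # 0-9 red and 0-9 green
    boxes_per_cell = 3                                # YOLOv3 default (3 anchors per scale)
    filters = (num_classes + 5) * boxes_per_cell      # 4 box coords + 1 objectness + 20 class scores
    max_batches = 40_000                              # training iterations used on Google Colab

    print(filters)       # 75
    print(max_batches)   # 40000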
Recall = TP / (TP + FN)    (2)
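As a worked example of this metric, the average recall reported later for YOLOv3 follows directly from its totals of 1,984 true positives and 16 false negatives; the helper functions below are a minimal sketch, not the authors' evaluation script.

    # Evaluation metrics from true positive (TP), false positive (FP), and false negative (FN) counts.
    def precision(tp, fp):
        return tp / (tp + fp)

    def recall(tp, fn):
        return tp / (tp + fn)

    # Totals reported for YOLOv3 on the test set.
    print(round(100 * recall(1984, 16), 1))   # 99.2, matching the reported average recall (%)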
[Table fragment (classes 5-9 and 0): per-class precision results for two test conditions; average precision 91.5% and 86.247% respectively.]
Overall YOLOv3 evaluation metrics:
Precision %    Recall %    F1 score %
89             83          70
IoU %          mAP %       Iterations
70.49          87.89       40,000
[Table fragment (classes 5-9 and 0): per-class true positives, false negatives, recall %, and accuracy % for YOLOv3 testing. Totals: 1,984 true positives and 16 false negatives; average recall 99.2%, average accuracy 99.2%.]
[Table fragment (classes 5-9, 0, Green, and Red): per-class true positives, false negatives, recall %, and accuracy %. Totals: 1,170 true positives and 30 false negatives; average recall 97.5%, average accuracy 97.5%.]
[Figure: confidence rate (%) per digit.
Red counter   - 1: 96.3, 2: 97.6, 3: 99.2, 4: 99.4, 5: 99, 6: 97.7, 7: 99.3, 8: 99, 9: 98.6, 0: 97.9; average 98.4
Green counter - 1: 98.6, 2: 99.7, 3: 99.1, 4: 99.6, 5: 99, 6: 97.7, 7: 99.7, 8: 99.2, 9: 99.2, 0: 99.5; average 99.33]
[Figure: frames per second (FPS) of YOLOv3 on a Tesla T4 GPU and an Nvidia Jetson Nano; a value of 2.5 FPS is labelled on the chart.]
5. CONCLUSION
In conclusion, the YOLOv3 algorithm was successfully tested with the dataset collected around the
city of Melaka, Malaysia. A total of 2,204 images were split into 80% for training and 20% for testing,
labelled via CVAT, and trained via Google Colab. The system was able to detect the traffic light counter and
classify the numbers (0-9) and their colour (red or green). Accuracy and recall are at 99.2%, precision is at
89%, intersection over union (IoU) is at 70.49%, and mAP is at 87.89%. YOLOv5 was also tested and compared,
and its results are close to those of YOLOv3 in terms of accuracy and reliability. However, YOLOv5 has some
limitations in terms of compatibility with the Nvidia Jetson Nano kit, as it could not be deployed on it.
Moreover, YOLOv3 is lighter and has fewer layers; thus, it should achieve better FPS results on both the
Jetson Nano and a personal computer.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the funding support received from Universiti Teknikal
Malaysia Melaka (UTeM) through the Facilitation Research Program by the Centre for Research and Innovation
Management (CRIM).
REFERENCES
[1] F. K. Green, “Red light running,” Research Report ARR, no. 356, 2002, doi: 10.1007/978-1-4614-7883-6_588-2.
[2] W. Wang, F. Hou, H. Tan, and H. Bubb, “A framework for function allocations in intelligent driver interface design for comfort
and safety,” International Journal of Computational Intelligence Systems, vol. 3, no. 5, pp. 531–541, 2010,
doi: 10.1080/18756891.2010.9727720.
[3] W. Wang et al., “Driver’s various information process and multi-ruled decision-making mechanism: a fundamental of intelligent
driving shaping model,” International Journal of Computational Intelligence Systems, vol. 4, no. 3, pp. 297–305, 2011,
doi: 10.1080/18756891.2011.9727786.
[4] W. Wang, H. Guo, H. Bubb, and K. Ikeuchi, “Numerical simulation and analysis procedure for model-based digital driving
dependability in intelligent transport system,” KSCE Journal of Civil Engineering, vol. 15, no. 5, pp. 891–898, 2011,
doi: 10.1007/s12205-011-1190-0.
[5] D. Chand, S. Gupta, and I. Kavati, “TSCTNet: traffic signal and countdown timer detection network for autonomous vehicles,”
International Journal of Computer Information Systems and Industrial Management Applications, vol. 13, no. December,
pp. 182–191, 2021.
[6] S. Maldonado-Bascón, S. Lafuente-Arroyo, P. Gil-Jiménez, H. Gómez-Moreno, and F. López-Ferreras, “Road-sign detection and
recognition based on support vector machines,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2,
pp. 264–278, 2007, doi: 10.1109/TITS.2007.895311.
[7] C. Jang, C. Kim, D. Kim, M. Lee, and M. Sunwoo, “Multiple exposure images based traffic light recognition,” in IEEE Intelligent
Vehicles Symposium, Proceedings, 2014, pp. 1313–1318, doi: 10.1109/IVS.2014.6856541.
[8] J. Wu, “Introduction to convolutional neural networks,” pp. 1–31, 2017.
[9] J. Muller and K. Dietmayer, “Detecting traffic lights by single shot detection,” in IEEE Conference on Intelligent Transportation
Systems, Proceedings, ITSC, 2018, vol. 2018, pp. 266–273, doi: 10.1109/ITSC.2018.8569683.
[10] Z. Li and F. Zhou, “FSSD: feature fusion single shot multibox detector,” 2017, [Online]. Available: http://arxiv.org/abs/1712.00960.
[11] A. Fregin, J. Müller, U. Kreßel, and K. Dietmayer, “The DriveU traffic light dataset: introduction and comparison with existing
datasets,” in Proceedings - IEEE International Conference on Robotics and Automation, 2018, pp. 3376–3383,
doi: 10.1109/ICRA.2018.8460737.
[12] M. B. Jensen, K. Nasrollahi, and T. B. Moeslund, “Evaluating state-of-the-art object detector on challenging traffic light data,”
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2017-July, pp. 882–888, 2017,
doi: 10.1109/CVPRW.2017.122.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” Proceedings of the
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-December, pp. 779–788, 2016,
doi: 10.1109/CVPR.2016.91.
[14] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” Proceedings - 30th IEEE Conference on Computer Vision and
Pattern Recognition, CVPR 2017, vol. 2017, pp. 6517–6525, 2017, doi: 10.1109/CVPR.2017.690.
[15] J. Redmon and A. Farhadi, “YOLOv3: an incremental improvement,” Apr. 2018, [Online]. Available:
http://arxiv.org/abs/1804.02767.
[16] M. B. Jensen, M. P. Philipsen, A. Møgelmose, T. B. Moeslund, and M. M. Trivedi, “Vision for looking at traffic lights:
issues, survey, and perspectives,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 7,
pp. 1800–1815, 2016, doi: 10.1109/TITS.2015.2509509.
[17] M. P. Philipsen, M. B. Jensen, M. M. Trivedi, A. Mogelmose, and T. B. Moeslund, “Ongoing work on traffic lights: detection and
evaluation,” 2015, doi: 10.1109/AVSS.2015.7301730.
[18] Z. Ennahhal, I. Berrada, and K. Fardousse, “Real time traffic light detection and classification using deep learning,” 2019,
doi: 10.1109/WINCOM47513.2019.8942446.
[19] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, Feb. 2017,
doi: 10.1109/TPAMI.2016.2577031.
[20] K. Behrendt, L. Novak, and R. Botros, “A deep learning approach to traffic lights: detection, tracking, and classification,” in
Proceedings - IEEE International Conference on Robotics and Automation, 2017, pp. 1370–1377,
doi: 10.1109/ICRA.2017.7989163.
[21] Handoko, J. H. Pratama, and B. W. Yohanes, “Traffic sign detection optimization using color and shape segmentation as pre-
processing system,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 19, no. 1, pp. 173–181, 2021,
doi: 10.12928/TELKOMNIKA.V19I1.16281.
[22] N. Rachburee and W. Punlumjeak, “An assistive model of obstacle detection based on deep learning: YOLOv3 for visually impaired
people,” International Journal of Electrical and Computer Engineering, vol. 11, no. 4, pp. 3434–3442, 2021,
doi: 10.11591/ijece.v11i4.pp3434-3442.
[23] P. N. Andono, E. H. Rachmawanto, N. S. Herman, and K. Kondo, “Orchid types classification using supervised learning algorithm
based on feature and color extraction,” Bulletin of Electrical Engineering and Informatics, vol. 10, no. 5,
pp. 2530–2538, 2021, doi: 10.11591/eei.v10i5.3118.
[24] D. P. Lestari and R. Kosasih, “Comparison of two deep learning methods for detecting fire hotspots,” International Journal of
Electrical and Computer Engineering, vol. 12, no. 3, pp. 3118–3128, 2022, doi: 10.11591/ijece.v12i3.pp3118-3128.
[25] S. Firdose, S. S. Kumar, and R. G. N. Meegama, “A novel predictive model for capturing threats for facilitating effective social
distancing in COVID-19,” International Journal of Electrical and Computer Engineering, vol. 12, no. 1, pp. 596–604, 2022,
doi: 10.11591/ijece.v12i1.pp596-604.
[26] H. S. Abdul-Ameer, H. J. Hassan, and S. H. Abdullah, “Development smart eyeglasses for visually impaired people based on you
only look once,” Telkomnika (Telecommunication Computing Electronics and Control), vol. 20, no. 1, pp. 109–117, 2022,
doi: 10.12928/TELKOMNIKA.v20i1.22457.
[27] N. E. Budiyanta, C. O. Sereati, and F. R. G. Manalu, “Processing time increasement of non-rice object detection based on YOLOv3-
tiny using Movidius NCS 2 on Raspberry Pi,” Bulletin of Electrical Engineering and Informatics, vol. 11, no. 2,
pp. 1056–1061, 2022, doi: 10.11591/eei.v11i2.3483.
[28] I. A. Dahlan, M. B. G. Putra, S. H. Supangkat, F. Hidayat, F. F. Lubis, and F. Hamami, “Real-time passenger social distance
monitoring with video analytics using deep learning in railway station,” Indonesian Journal of Electrical Engineering and Computer
Science, vol. 26, no. 2, pp. 773–784, 2022, doi: 10.11591/ijeecs.v26.i2.pp773-784.
[29] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” Mar. 2017, [Online]. Available: http://arxiv.org/abs/1703.06870.
[30] T. Y. Lin et al., “Microsoft COCO: common objects in context,” Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8693 LNCS, no. PART 5, pp. 740–755, 2014,
doi: 10.1007/978-3-319-10602-1_48.
[31] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011, [Online].
Available: http://ufldl.stanford.edu/housenumbers/.
[32] T. W. Yeh, S. Y. Lin, H. Y. Lin, S. W. Chan, C. T. Lin, and Y. Y. Lin, “Traffic light detection using convolutional neural networks
and lidar data,” Proceedings - 2019 International Symposium on Intelligent Signal Processing and Communication Systems, ISPACS
2019, 2019, doi: 10.1109/ISPACS48206.2019.8986310.
[33] R. Gokul, A. Nirmal, K. M. Bharath, M. P. Pranesh, and R. Karthika, “A Comparative study between state-of-the-art object detectors
for traffic light detection,” International Conference on Emerging Trends in Information Technology and Engineering, ic-ETITE
2020, 2020, doi: 10.1109/ic-ETITE47903.2020.449.
BIOGRAPHIES OF AUTHORS
Zamani Md Sani received his degree in 2000 from Universiti Sains Malaysia.
He worked at Intel Malaysia in Kulim for six years and obtained his Master's degree from the same university
in 2009. He later joined Universiti Teknikal Malaysia Melaka as an academic and obtained his PhD from
Multimedia University in 2020. His research interests are in image processing and artificial intelligence,
and he can be contacted at email: [email protected].