Design and Implementation of a Monitoring Robotic System Based on the You Only Look Once Model Using a Deep Learning Technique
Corresponding Author:
Maad Issa Al-Tameemi
Department of Computer Engineering, College of Engineering, University of Baghdad
Baghdad, Iraq
Email: [email protected]
1. INTRODUCTION
The use of mobile robots has recently become essential because of their capabilities in several areas, including surveillance, especially monitoring places that must be observed from a distance and detecting objects before entering those places. In [1], [2], the authors presented an IP camera video monitoring system utilizing a Raspberry Pi. They aimed to capture images in real time and display them in a web browser using the transmission control protocol/internet protocol (TCP/IP). The algorithm was designed and implemented on the Raspberry Pi to detect human faces as well as display live video in real time. The work was done without any monitoring reactions [1], [2].
An electric strike door lock opens automatically when an authorized radio frequency identification (RFID) tag is passed through an RFID reader; this embedded system is based on a Raspberry Pi that controls the e-door [3]. Various electronic devices working on the internet of things (IoT) are controlled remotely by users with a mobile phone, and data from these devices are transmitted to IoT platforms through the internet [4], [5].
The YOLOv5 algorithm has been used to detect wheat spikes in unmanned aerial vehicle images [6], apples in orchard images (alongside YOLOv3) [7], apple stems/calyxes in real time [8], face masks [9], mold on food surfaces [10], and ships in optical sensing images [11]. Chest abnormalities have been detected with a YOLOv5 model using ResNet50 [12]. A vehicle was designed to float on different water bodies and automatically collect floating garbage using the you only look once model; that work is based on a Raspberry Pi 3 model B that controls the overall vehicle [13]. Various technologies use the you only look once (YOLO) family of algorithms, based on machine and deep learning, to detect smoke early in forest satellite imagery [14]. Object recognition, speech processing, and image classification technologies have been implemented on the Jetson Nano board to detect obstacles for blind people [15]. A system built on a Raspberry Pi 3 controller with an attached camera performs object detection and recognition in different places based on the YOLO algorithm [16]. The authors of [17] proposed an online assistive system for the blind that detects different objects based on the YOLO algorithm and is implemented on a Raspberry Pi 3 Model B.
For the rover robot system, the main goal of producing live video with real-time object detection is to find an object detection model that combines accuracy with high speed, and to use a controller that runs this model in real time without having to send the data elsewhere, such as a high-specification computer, for processing. With Redmon et al. [18], a great development appeared in object detection: the creation of the YOLO algorithm. YOLOv2 and YOLO9000 were then introduced to be better, faster, and stronger than YOLO at detecting objects, where YOLO9000 was the second version ("YOLOv2") trained and used to detect more than 9000 object categories [19]. YOLOv3 updated YOLOv2 to be faster than the previous YOLO versions [20]. YOLOv4 was then created to improve on YOLOv3 for real-time object detection by achieving optimal speed and accuracy [21].
YOLOv5, the latest YOLO version, detects objects with fast detection speed and high precision, achieving 72% AP at 0.5 on the Common Objects in Context val2017 dataset [22]. Besides, YOLOv5 comes in multiple versions; the smallest model is YOLOv5s at 14 megabytes, which is convenient for deployment [23]. YOLOv4 has been proposed and compared with many object detectors, including EfficientDet and YOLOv3; the proposed algorithm proved to perform better than the others, being twice as fast as EfficientDet and significantly improving on YOLOv3 in average precision and frames per second by 10% and 12%, respectively [21]. Fang et al. [24] proposed a YOLOv5-based method for detecting surface knots on sawn timber. YOLOv3 with spatial pyramid pooling (SPP) and the faster region-based convolutional neural network (R-CNN) were implemented on two datasets and compared with the YOLOv5 model. The experimental results showed that the YOLOv5 model has the best performance in terms of accuracy and speed [24].
A comparison among the YOLOv5l, YOLOv4, and YOLOv3 algorithms for effective landing area detection found that YOLOv5l outperforms the other algorithms in both accuracy and speed [25]. Thus, of the five versions of the YOLO algorithm, the last is the best choice for real-time object detection according to the results and comparisons in the cited references. A mobile robotic surveillance system has been designed and implemented in both Java and Python based on a Raspberry Pi 3 controller [26]. That system requires a client-server on the PC and a server-client on the robot's controller to transfer data between them, with image processing done on the PC because the controller cannot process data in real time due to its limited capabilities, which makes the system very slow. The system streams live video through two cameras and transfers the video frames to the computer, where they are processed with the Haar cascade algorithm to detect objects; the processed frames with the detected objects are then displayed directly on the PC screen [26].
As the main contribution of this work, a remote-controlled mobile system based on a Raspberry Pi 4 model B is proposed. It is distinguished from the systems mentioned above by its ability to broadcast live video of different places with a moving camera and detect objects in real time, with processing and display performed by the Raspberry Pi itself and no need to send data to a high-specification computer. In addition, this system can detect objects in stored images, videos, or YouTube video links.
On a public network, the public address should be known and received from internet service providers (ISPs), and anyone can access it via the internet.
The terminal and status of the Raspberry Pi controller can be monitored remotely through the Remote Desktop Connection application, which can also tunnel to any network services running on the user's Raspberry Pi (such as the hypertext transfer protocol or the secure shell protocol) so that they can be accessed worldwide over the internet. Begin by opening the Remote Desktop Connection application on the Windows computer, entering the Raspberry Pi's local IP address in the "Computer:" field, and clicking the "Connect" button. The PC connects to the Raspberry Pi by receiving a screen from the xrdp software, which must be pre-installed on the Raspberry Pi to allow remote connection. Then, the username and password of an account that exists on the Raspberry Pi are entered. After that, the Raspberry Pi screen appears and the user can manage and control the Raspberry Pi and all the devices connected to it.
110 ISSN: 2252-8938
The Raspberry Pi is used to implement the monitoring subsystem with object detection in real time. First, the overall system is implemented in Python by importing all the necessary packages, libraries, and frameworks. The YOLOv5 model is then loaded using PyTorch, an open-source machine learning framework, for detecting the objects. The live video from a camera connected to the Raspberry Pi, or stored images or videos, is processed using the OpenCV library and then sent to the YOLOv5s model. OpenCV provides a video capture object that handles everything related to opening and closing the webcam; creating this object is all that is needed to read and save frames from it. After that, the webcam is opened to capture and scale the frames, and an infinite loop keeps reading frames from the webcam until the "q" key is pressed on the keyboard to exit. Thus, the YOLOv5s model performs object detection on these frames in real time and then shows them in a window or saves them on the Raspberry Pi.
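The capture-and-detect loop described above can be sketched as follows. This is a minimal illustration, not the paper's actual code: it assumes the YOLOv5s model is loaded through PyTorch's `torch.hub` interface as published by Ultralytics, and the helper name `scale_frame_size` is hypothetical.

```python
# Minimal sketch of the webcam capture-and-detect loop (illustrative only;
# assumes PyTorch, OpenCV, and the ultralytics/yolov5 hub entry are available).

def scale_frame_size(width, height, target=640):
    """Scale a frame so its longer side equals `target`, keeping aspect ratio."""
    ratio = target / max(width, height)
    return int(round(width * ratio)), int(round(height * ratio))

def run_webcam_detection():
    import cv2
    import torch

    # Load the small YOLOv5s model (about 14 MB) via the PyTorch hub.
    model = torch.hub.load("ultralytics/yolov5", "yolov5s")

    # OpenCV's video capture object handles opening and closing the webcam.
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Scale the frame before inference.
        w, h = scale_frame_size(frame.shape[1], frame.shape[0])
        frame = cv2.resize(frame, (w, h))
        results = model(frame)                 # YOLOv5s inference on one frame
        cv2.imshow("detections", results.render()[0])
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press "q" to exit the loop
            break
    cap.release()
    cv2.destroyAllWindows()

# run_webcam_detection()  # uncomment to start the live loop on a device with a webcam
```

The same `results` object could instead be saved to disk on the Raspberry Pi rather than shown in a window, matching the save option described above.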
Figure 4. The structure of the YOLOv5 network, which consists of the backbone, neck, and the output
The robotic system can move in all directions (forward, backward, right, and left) and rotate the camera right, left, up, and down; in addition, it can detect objects privately or publicly in images, videos, and a live connected camera, as shown in Figure 7. Keyboard keys are used to perform these functions: the ↑, ↓, ←, → keys move the robot forward, backward, right, or left; the R, F, D, G keys control the pan-tilt mechanism and rotate the webcam up, down, left, or right; finally, the space key is pressed to stop the robot. The robot system is designed with several features to provide live video with object detection, recording abilities, or image and video processing with object detection. This system is based on the YOLOv5 model for detecting objects in real time using PyTorch, one of the most used frameworks in the field of deep learning. The overall functions of the robot are controlled by a Raspberry Pi 4.
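As a sketch, the key bindings above can be expressed as a simple dispatch table; the key names and action names here are illustrative, not the paper's actual implementation.

```python
# Hypothetical key-to-action dispatch table for the controls described above.
KEY_ACTIONS = {
    "up": "move_forward",      # up arrow
    "down": "move_backward",   # down arrow
    "left": "turn_left",       # left arrow
    "right": "turn_right",     # right arrow
    "r": "camera_up",          # pan-tilt controls
    "f": "camera_down",
    "d": "camera_left",
    "g": "camera_right",
    "space": "stop",
}

def dispatch(key):
    """Return the robot action bound to `key`, or None for unmapped keys."""
    return KEY_ACTIONS.get(key.lower())
```

A table like this keeps the control loop to a single lookup per keypress, so adding or remapping a key does not touch the motion code.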
Despite the limited capabilities of the Raspberry Pi compared with computers that have high-specification memory, graphics processing units (GPU), and central processing units (CPU), and despite the difficulty of applying deep learning algorithms on the Raspberry Pi, applying the fifth generation of YOLO, YOLOv5, has become possible and has proven its efficiency in real time. YOLOv5s is implemented in this system with Python 3.9.0 and all requirements installed, including PyTorch 1.7, and the models can be downloaded automatically from the YOLOv5 releases. The system can detect objects from a webcam in real time, and in an image, video, directory, or YouTube link, privately or publicly [29]. The obtained results demonstrate the efficiency of this robot and its ability to detect, classify, and identify different objects, such as cars, persons, and cellphones, in images, videos, or live video in real time with acceptable accuracies, recording confidence scores of 0.85 for close objects and 0.55 for far objects, as shown in the results in Figure 8.
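The reported confidence scores suggest a simple thresholding step over the model's detections; the sketch below, with illustrative names and a hypothetical threshold of 0.5, shows how low-score detections could be filtered out while the close-range (0.85) and far-range (0.55) scores reported above would both be kept.

```python
# Illustrative filter over (label, confidence) detections; the 0.5 threshold
# is an assumption, and the sample scores follow the reported results.

def filter_detections(detections, min_conf=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [(label, conf) for label, conf in detections if conf >= min_conf]

sample = [("car", 0.85), ("person", 0.55), ("cellphone", 0.30)]
kept = filter_detections(sample)  # the 0.30 detection is dropped
```

Raising or lowering `min_conf` trades missed far objects against spurious detections, which is the practical knob when far objects score near 0.55.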
4. CONCLUSION
This work proposes a rover robotic monitoring system using a Raspberry Pi 4 controller with a connected USB webcam, utilizing deep-learning-based YOLOv5s, OpenCV, PyTorch, and the COCO 2020 object detection task dataset to detect objects. The evaluation results have proven the system's efficiency and its ability to detect, classify, and identify different objects in images, videos, or live video in real time with acceptable accuracies. In conclusion, the overall work was implemented in Python as a relatively low-priced robot with many features and functions that can perform more than one task at the same time, including moving the robot while detecting objects in live video. As an extension of this work, a further comparative study could be applied with other datasets, different weather conditions, or other algorithms.
REFERENCES
[1] S. Singh, P. Anap, Y. Bhaigade, and P. J. P. Chavan, “IP Camera Video surveillance using Raspberry Pi,” IJARCCE, pp. 326–328,
Feb. 2015, doi: 10.17148/IJARCCE.2015.4272.
[2] B. K. Oleiwi, “Scouting and controlling for mobile robot based Raspberry Pi 3,” Journal of Computational and Theoretical
Nanoscience, vol. 16, no. 1, pp. 79–83, Jan. 2019, doi: 10.1166/jctn.2019.7701.
[3] M. I. Younis, M. I. Al-Tameemi, and M. S. Hussein, “MPAES: a multiple-privileges access e-door system based on passive RFID
technology,” 2017.
[4] S. Saha and A. Majumdar, “Data centre temperature monitoring with ESP8266 based wireless sensor network and cloud based
dashboard with real time alert system,” in 2017 Devices for Integrated Circuit (DevIC), Mar. 2017, pp. 307–310. doi:
10.1109/DEVIC.2017.8073958.
[5] P. Singh and S. Saikia, “Arduino-based smart irrigation using water flow sensor, soil moisture sensor, temperature sensor and
ESP8266 WiFi module,” in 2016 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), Dec. 2016, pp. 1–4. doi:
10.1109/R10-HTC.2016.7906792.
[6] J. Zhao et al., “A wheat spike detection method in UAV images based on improved YOLOv5,” Remote Sensing, vol. 13, no. 16, p.
3095, Aug. 2021, doi: 10.3390/rs13163095.
[7] A. Kuznetsova, T. Maleva, and V. Soloviev, “Detecting apples in orchards using YOLOv3 and YOLOv5 in general and close-up
images,” 2020, pp. 233–243. doi: 10.1007/978-3-030-64221-1_20.
[8] Z. Wang, L. Jin, S. Wang, and H. Xu, “Apple stem/calyx real-time recognition using YOLO-v5 algorithm for fruit automatic loading
system,” Postharvest Biology and Technology, vol. 185, p. 111808, Mar. 2022, doi: 10.1016/j.postharvbio.2021.111808.
[9] G. Yang et al., “Face mask recognition system with YOLOV5 based on image recognition,” in 2020 IEEE 6th International
Conference on Computer and Communications (ICCC), Dec. 2020, pp. 1398–1404. doi: 10.1109/ICCC51575.2020.9345042.
[10] F. Jubayer et al., “Detection of mold on the food surface using YOLOv5,” Current Research in Food Science, vol. 4, pp. 724–728,
2021, doi: 10.1016/j.crfs.2021.10.003.
[11] Y. Chen, C. Zhang, T. Qiao, J. Xiong, and B. Liu, “Ship detection in optical sensing images based on YOLOv5,” in Twelfth International Conference on Graphics and Image Processing (ICGIP 2020), Jan. 2021, p. 61. doi: 10.1117/12.2589395.
[12] Y. Luo, Y. Zhang, X. Sun, H. Dai, and X. Chen, “Intelligent solutions in chest abnormality detection based on YOLOv5 and
ResNet50,” Journal of Healthcare Engineering, vol. 2021, pp. 1–11, Oct. 2021, doi: 10.1155/2021/2267635.
[13] C. Patil, S. Tanpure, A. Lohiya, S. Pawar, and P. Mohite, “Autonomous amphibious vehicle for monitoring and collecting marine
debris,” in 2020 5th International Conference on Robotics and Automation Engineering (ICRAE), Nov. 2020, pp. 163–168. doi:
10.1109/ICRAE50850.2020.9310888.
[14] C.-L. C. Huang and T. Munasinghe, “Exploring various applicable techniques to detect smoke on the satellite images,” in 2020
IEEE International Conference on Big Data (Big Data), Dec. 2020, pp. 5703–5705. doi: 10.1109/BigData50022.2020.9378466.
[15] R. Joshi, M. Tripathi, A. Kumar, and M. S. Gaur, “Object recognition and classification system for visually impaired,” in 2020
International Conference on Communication and Signal Processing (ICCSP), Jul. 2020, pp. 1568–1572. doi:
10.1109/ICCSP48568.2020.9182077.
[16] H. Gupta, R. S. Yadav, S. M. S. Kumar, and M. J. Leo, “A novel trespassing detection system using deep networks,” 2021, pp. 633–
645. doi: 10.1007/978-981-15-8443-5_54.
[17] M. M. Abdul, F. Alkhalid, and B. K. Oleiwi, “Online blind assistive system using object recognition,” International Research
Journal of Innovations in Engineering and Technology, vol. 3, no. 12, pp. 47–51, 2019.
[18] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” 2015.
[19] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Jul. 2017, pp. 6517–6525. doi: 10.1109/CVPR.2017.690.
[20] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018. doi: 10.48550/ARXIV.1804.02767.
[21] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” Apr. 2020,
arXiv:2004.10934.
[22] “Ultralytics-Yolov5,” GitHub. https://github.com/ultralytics/yolov5 (accessed Jan. 01, 2021).
[23] R. Xu, H. Lin, K. Lu, L. Cao, and Y. Liu, “A forest fire detection system based on ensemble learning,” Forests, vol. 12, no. 2, p.
217, Feb. 2021, doi: 10.3390/f12020217.
[24] Y. Fang, X. Guo, K. Chen, Z. Zhou, and Q. Ye, “Accurate and automated detection of surface knots on sawn timbers using YOLO-
V5 model,” BioResources, vol. 16, no. 3, pp. 5390–5406, 2021.
[25] U. Nepal and H. Eslamiat, “Comparing YOLOv3, YOLOv4 and YOLOv5 for autonomous landing spot detection in faulty UAVs,”
Sensors, vol. 22, no. 2, p. 464, Jan. 2022, doi: 10.3390/s22020464.
[26] M. I. Al-Tameemi, “RMSRS: Rover multi-purpose surveillance robotic system,” Baghdad Science Journal, vol. 17, no. 3(Suppl.), p. 1049, Sep. 2020, doi: 10.21123/bsj.2020.17.3(Suppl.).1049.
[27] T.-Y. Lin et al., “Microsoft COCO: common objects in context,” in European Conference on Computer Vision, May 2014, pp. 740–755. [Online]. Available: http://arxiv.org/abs/1405.0312
[28] F. Zhou, H. Zhao, and Z. Nie, “Safety helmet detection based on YOLOv5,” in 2021 IEEE International Conference on Power
Electronics, Computer Applications (ICPECA), Jan. 2021, pp. 6–11. doi: 10.1109/ICPECA51329.2021.9362711.
[29] G. Jocher, “ultralytics/yolov5: v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration,” 2021, doi:
10.5281/zenodo.4418161.
BIOGRAPHIES OF AUTHORS