YOLO Based Real Time Human Detection Using Deep Learning
Abstract. Person detection is a widely used and challenging computer vision task, employed in
a variety of fields such as autonomous vehicles, robotics, security tracking, and assistance for
visually impaired people. As deep learning developed rapidly, numerous algorithms
strengthened the link between video analysis and visual comprehension. The goal of all these
algorithms, regardless of how their network architectures operate, is to find multiple people
inside a complicated image. Vision impairment restricts freedom of movement in unfamiliar
environments, so it is crucial to apply modern technologies and teach them to assist blind
people whenever necessary. We provide a system that detects multiple people in everyday
scenes and then issues a voice prompt to inform the user about both nearby and distant people.
1. Introduction
Humans are taught by their parents to classify numerous objects, including themselves, nearly from
birth. Because of its high accuracy and precision, the human visual system can manage several tasks
even when the conscious mind is not fully engaged. When there is a lot of data, a more precise system
is required to recognize and localize several objects concurrently. With better algorithms, we can now
train computers to recognize several items in an image with high accuracy and precision. Because it
requires a deep understanding of images, recognizing objects is among the most difficult tasks in
computer vision. To put it another way, an object tracker searches for the
presence of objects throughout a number of frames and detects them separately. Complex visuals,
information loss, and the transformation of 3D environments into 2D images can all cause problems for
the tracker. Detecting objects is important, but to identify things with high precision we must also
pinpoint the locations of several objects whose positions may change from image to image. Building
the best real-time object-tracking algorithm is a difficult undertaking. Such problems
have been addressed since 2012 using deep learning. This study attempts to evaluate the performance
of both algorithms in a number of real-world circumstances and was specifically intended for persons
who are blind or visually impaired. Blind people are forced to follow someone else or make physical
contact with them, both of which can be quite dangerous. Without some clever technologies, blind
people may find it daunting to navigate new settings on a regular basis. The major objective of this
contribution is to investigate performing several such tasks at once in order to improve the assistance
given to visually impaired people.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
4th National Conference on Communication Systems (NCOCS 2022) IOP Publishing
Journal of Physics: Conference Series 2466 (2023) 012034 doi:10.1088/1742-6596/2466/1/012034
2. Related Work
Real-time object tracking and detection are crucial functions in many computer vision systems.
Variations in object shape, partial and total occlusion, and scene illumination pose serious challenges
for reliable object tracking. The two key components of object tracking that can be accomplished by
applying these methods are the representation of the target item and the location prediction [1]. Feature
sets for reliable human recognition are examined using linear SVM-based human detection as a test
case; experiments demonstrate that grids of Histograms of Oriented Gradient (HOG) descriptors
significantly outperform existing feature sets for human detection [2]. Shallow trainable structures and
handcrafted features form the basis of typical person identification systems. It is simple to improve
their performance by creating
intricate ensembles that combine a number of low-level picture variables with high-level data from
scene classifiers and object detectors. As a result of deep learning's rapid development, more powerful
tools that can learn semantic, high-level, deeper features are becoming available in response to the
problems with conventional designs [3]. Our approach combines two ideas: (1) domain-specific fine-
tuning after supervised pre-training on an auxiliary task significantly improves performance when
labelled training data are scarce; and (2) bottom-up region proposals can be combined with high-
capacity convolutional neural networks (CNNs) to localize and segment objects [4]. The study presents
camera applications for person detection and
identification based on convolutional neural networks (CNN) YOLO-v2. Deep learning-based
computer vision is used to determine the person's position and status [5]. A method is proposed for
near-real-time human detection, localization, and recognition using frames from video data acquired
from a security camera. After a predetermined amount
of time, the model begins receiving input frames and can assign an action label based on a single frame.
We were able to determine the action label for the video stream by combining information gathered
over a predetermined period of time [6]. The You Only Look Once (YOLO) deep learning model has
been used to examine person detection from an overhead perspective.
The model is tested on person data from an overhead view after being trained on frontal view data.
Furthermore, data from categorized bounding boxes has been utilized. With a true positive rate (TPR) of 95%, the
YOLO model generates noticeably good outcomes [7].
3. System Analysis and Feasibility Study
3.3.2 YOLO
YOLO (You Only Look Once) is an object detection algorithm. Object detection, a computer vision
discipline closely related to image processing, aims to find instances of semantic objects that belong to
a certain class (such as people, buildings, or automobiles) in digital photos and videos. Two well-studied
object detection domains are face and pedestrian detection. Object detection is necessary for several
computer vision applications, including image retrieval and video monitoring. Every object class has
distinguishing properties that make it easier to classify its instances; for example, all circles are round.
These distinctive properties are used to detect the object class. When searching for circles, for instance,
one looks for points that lie at a fixed distance from a common centre.
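YOLO-style detectors predict many candidate boxes with class scores, and a standard post-processing step is non-maximum suppression (NMS), which keeps the highest-scoring box and discards overlapping duplicates. The paper does not give its implementation, so the following is a minimal illustrative sketch in plain Python; the (x1, y1, x2, y2) box format and the 0.5 overlap threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: visit boxes in descending score
    order and keep a box only if it overlaps no kept box by more than
    `thresh`. Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping person boxes collapse to the single higher-scoring one, while a distant third box survives.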
are combined in the following phase, using the inputs from the preceding system design phase. Unit
testing lets us test the designed system and evaluate each individual unit.
Integration and Testing – After the unit-testing stage, all the units are combined. The merged system
is then passed to the next phase to be tested for faults and errors.
Deployment of system − Once the product is found free of errors and faults, it is released or installed
in the market to satisfy the users’ needs.
Maintenance – If the system does not satisfy the users’ needs, various problems arise. Patches are
released to fix the issues raised, after which the next versions of the system are released. Maintenance
is required to bring about these changes requested by users.
expectations regarding the system. A system designed to the stated requirements should need few
alterations and should place only modest demands on the user.
3.5.3 Social Feasibility
The main aim of this study is to gauge how far users accept the system. Once users accept the system,
they are willing to use it effectively. Users should be made comfortable with the system rather than
feel threatened by it. The method used to familiarize users with the system is the main factor that
influences how accepting they are of it.
4. System Requirements and Specifications
4.1 Functional Requirements
For the system to fulfil the fundamental needs of the end user, these requirements must be met. They
are the functions that programmers must include to help users complete their tasks, and they must be
stated clearly so that the system is developed in a manner that satisfies them. These requirements
typically describe how a system will behave in a given situation.
Examples of functional requirements:
• Whenever a user logs into the system, they must authenticate themselves.
• The system must shut down if any cyber threats are detected.
• If a person registers for the first time, a verification email has to be sent to the user.
5. Results
5.1 Home page
From the Home Page, the user can navigate to the rest of the pages of the Human Detection Application.
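The abstract describes prompting the user with a voice message about nearby and distant people. The paper does not specify how distance is judged; one simple heuristic, sketched below under assumed parameters (the 0.5 height ratio and the message wording are illustrative, not from the paper), is to compare each person's bounding-box height with the frame height.

```python
def proximity_label(box_height_px, frame_height_px, near_ratio=0.5):
    """Label a detected person 'near' when their bounding box spans at
    least `near_ratio` of the frame height, else 'far'.
    The 0.5 ratio is an illustrative assumption, not a tuned value."""
    return "near" if box_height_px / frame_height_px >= near_ratio else "far"

def voice_message(box_heights_px, frame_height_px):
    """Compose the text a text-to-speech engine would speak for one frame,
    given the heights of all detected person boxes."""
    labels = [proximity_label(h, frame_height_px) for h in box_heights_px]
    near = labels.count("near")
    far = labels.count("far")
    return f"{near} person(s) near, {far} person(s) far"
```

In practice the returned string would be passed to a text-to-speech library; the counting logic itself is independent of which TTS engine is used.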
6. Conclusion
In this paper, we have developed a YOLO approach for identifying and classifying any item in front
of the webcam with excellent accuracy and in a short amount of time. YOLO is quick to pick up all
adjacent items, but when tested against a complex scene it overlooks small and distant objects.
Achieving acceptable accuracy across all of the deep learning algorithms used for this application is a
significant problem.
In this study, we evaluate algorithms on a small scale with regard to their accuracy, recall, and inference
time. In future work we aim to evaluate alternative algorithms with more parameters and more images.
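The precision and recall metrics mentioned above reduce to simple counts of true positives (TP), false positives (FP), and false negatives (FN); recall is the same quantity reported as TPR in Section 2. The sketch below shows the computation; the counts in the test are hypothetical values, not the paper's measured results.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP): fraction of predicted detections
    that are correct. Recall (true-positive rate) = TP / (TP + FN):
    fraction of actual people that were detected."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

A detector that finds 95 of 100 people while raising 10 false alarms thus has recall 0.95 and precision about 0.905.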
7. References
[1] Z.-Q. Zhao, P. Zheng, S.-T. Xu and X. Wu, Object detection with deep learning: A review, IEEE
Transactions on Neural Networks and Learning Systems, 30(11), 3212-3232, 2019.
[2] R. Bharti, K. Bhadane, P. Bhadane and A. Gadhe, Object Detection and Recognition for Blind
Assistance, International Research Journal of Engineering and Technology (IRJET), e-ISSN:
2395-0056, Volume 06, 2019.
[3] Misbah Ahmad, Imran Ahmed and Awais Adnan, Overhead view person detection using YOLO,
IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference
(UEMCON), 2019.
[4] Jiahui Sun, Huayong Ge and Zhehao Zhang, AS-YOLO: An Improved YOLOv4 based on
Attention Mechanism and SqueezeNet for Person Detection, IEEE Advanced Information
Technology, Electronic and Automation Control Conference (IAEAC), 2021.
[5] Prasanth Kannadaguli, YOLO v4 Based Human Detection System Using Aerial Thermal
Imaging for UAV Based Surveillance Applications, IEEE Decision Aid Sciences and
Application (DASA), 2020.
[6] Huy Hoang Nguyen, Thi Nhung Ta, Ngoc Cuong Nguyen, Van Truong Bui, Hung Manh Pham
and Duc Minh Nguyen, YOLO Based Real-Time Human Detection for Smart Video
Surveillance at the Edge, IEEE International Conference on Communications and Electronics
(ICCE), 2020.
[7] Sophia Riziel C. De Guzman, Lauren Castro Tan and Jocelyn Flores Villaverde, Social
Distancing Violation Monitoring Using YOLO for Human Detection, IEEE International
Conference on Control Science and Systems Engineering (CCSSE), 2021.
[8] Sheshang Degadwala, Dhairya Vyas, Utsho Chakraborty, Abu Raihan Dider and Haimanti
Biswas, YOLO-v4 Deep Learning Model for Medical Face Mask Detection, International
Conference on Artificial Intelligence and Smart Systems (ICAIS), 2021.
[9] Muhammad Azhad Bin Zuraimi and Fadhlan Hafizhelmi Kamaru Zaman, Vehicle Detection
and Tracking using YOLO and Deep SORT, IEEE Symposium on Computer Applications &
Industrial Electronics (ISCAIE), 2021.
[10] Huy Hoang Nguyen, Thi Nhung Ta, Ngoc Cuong Nguyen, Van Truong Bui and Hung Manh
Pham, YOLO Based Real-Time Human Detection for Smart Video Surveillance at the Edge,
International Conference on Communications and Electronics (ICCE), 2021.