© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 452
Convolutional Neural Network Based Real Time
Object Detection Using YOLO V4
Srinath S 1, Kishore S M 2, Hariharan G 3, Manikumar R 4
1,2,3 Final Year BE Student, Department of Electronics and Communication Engineering.
4 Assistant Professor, Department of Electronics and Communication Engineering.
1,2,3,4 Government College of Engineering (Autonomous), Bargur, Krishnagiri, Tamil Nadu, India.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Object detection is a challenging task in computer
vision with applications in fields such as robotics,
surveillance, and automated vehicles. Compared to image
classification, object detection is harder to analyze,
consumes more energy, and is more computation-intensive.
Our project aims to develop a real-time object
detection module using a Convolutional Neural Network
implemented in MATLAB. The proposed system leverages
deep learning techniques, specifically CNNs, to
accurately detect and localize objects in real-time video
streams.
Key Words: Convolutional Neural Networks, Object
detection, YOLO V4
1. INTRODUCTION
In today's evolving technological landscape, computer vision plays a
pivotal role in various applications, ranging from
autonomous vehicles to surveillance systems, robotics, and
beyond. Real-time object detection, a subfield of computer
vision, has gained significant prominence due to its ability to
identify and locate objects within live video streams or
images as they arrive. This capability has wide-ranging
applications, from enhancing safety and security to enabling
autonomous decision-making in machines.
This project, titled "Real-Time Object Detection Using
Convolutional Neural Networks (CNN) in MATLAB," is
dedicated to developing a robust and efficient system that
can perform real-time object detection with accuracy
and speed. At its core, this project harnesses the power of
deep learning, specifically CNNs, to revolutionize the way
objects are detected and localized within visual data.
The primary objective of this project is to create a real-time
object detection system that operates seamlessly on live
video feeds or images, allowing for the immediate
identification and localization of objects within the frame.
We aim to achieve state-of-the-art accuracy and reliability in
object detection, ensuring that the system can make correct
decisions with minimal false positives and false negatives.
Real-time performance is of paramount importance.
Our project strives to process video frames or images swiftly,
enabling real-time decision-making in various applications.
In addition to detecting objects, the system will classify them
into predefined categories or classes. This classification
capability enhances the system's utility in differentiating
between various types of objects within the scene.
2. INTRODUCTION TO CONVOLUTIONAL NEURAL
NETWORK:
The Convolutional Neural Network represents a critical
milestone in the evolution of artificial neural networks,
playing a pivotal role in revolutionizing the field of computer
vision and image analysis. Conceived as a specialized
architecture for processing visual data, CNNs have become
the go-to tool for tasks like image classification, object
detection, and facial recognition.
What sets CNNs apart is their ability to automatically and
hierarchically extract intricate features from raw images,
emulating the human visual system's knack for pattern
recognition. This adaptability has paved the way for
transformative applications across a wide spectrum of
industries, from healthcare (medical image analysis) to
automotive (autonomous driving systems) and beyond. With
their deep layers of interconnected neurons and carefully
designed convolutional and pooling operations, CNNs have
not only achieved high accuracy but, through weight sharing
and pooling, also need far fewer parameters than comparable
fully connected networks, making them efficient and scalable.
This versatility, combined with their capacity for transfer
learning, where pre-trained models can be fine-tuned for
specific tasks, has made CNNs indispensable in the world of
machine learning and artificial intelligence. This overview
only scratches the surface of the impact of CNNs,
as researchers and practitioners continue to push the
boundaries of what is achievable in visual data analysis.
Fig- 1: Schematic diagram of a basic convolutional neural
network (CNN) architecture
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 10 | Oct 2023 www.irjet.net p-ISSN: 2395-0072
2.1 Input Layer:
The Input Layer is the entry point for the network and
represents the raw image data. It is structured as a grid of
pixel values, with one value per color channel at each pixel.
The dimensions of this layer correspond to the height, width,
and number of channels of the input image.
2.2 Convolutional Layers:
Convolutional Layers are responsible for extracting
features from the input image. These layers consist of
multiple learnable filters that slide across the image,
performing convolution operations. Each filter learns to
recognize specific patterns and features, such as edges,
corners, and textures. The Convolutional Layers form a
hierarchy, with deeper layers capturing increasingly
complex and abstract features.
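
As a rough illustration of a filter sliding across an image, the following minimal sketch applies one small filter to a tiny grayscale image with no padding and a stride of 1. The function name `conv2d_valid`, the example image, and the hand-written vertical-edge filter are assumptions made for illustration; in a trained CNN the filter values are learned, and the paper's MATLAB implementation is not shown here. Like most deep learning libraries, the sketch does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch currently under the filter.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 1], [-1, 1]]

print(conv2d_valid(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The response is non-zero only in the column where the edge sits, which is the sense in which a filter "recognizes" a pattern.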
2.3 Activation Functions:
After each convolution operation, an Activation Function
is applied element-wise. The most common activation
function is ReLU (Rectified Linear Unit), which
introduces non-linearity into the network. ReLU replaces
negative values with zeros, allowing the network to model
complex relationships in the data.
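
The element-wise behaviour of ReLU can be sketched in a few lines. The function name `relu` and the sample values are illustrative assumptions.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative values become 0, others pass through."""
    return [[max(0.0, value) for value in row] for row in feature_map]


print(relu([[-2.0, 0.5], [3.0, -0.1]]))  # [[0.0, 0.5], [3.0, 0.0]]
```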
2.4 Pooling Layers:
Pooling Layers follow convolutional layers and reduce the
spatial size of the feature maps while retaining essential
information. Max-pooling, for instance, selects the maximum
value from a small region, effectively downsampling the
data. This reduces computational complexity and makes the
network less sensitive to minor variations in object position
or scale.
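
A minimal sketch of max-pooling, assuming a 2x2 window with a stride of 2 (a common choice, not one stated in the paper). The function name `max_pool2d` and the sample feature map are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving `stride` cells at a time."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool2d(feature_map))  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, and shifting a strong activation by one cell inside a window leaves the pooled value unchanged.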
2.5 Fully Connected Layers:
Fully Connected Layers come after several convolutional
and pooling layers and are typically found in the later stages
of the network. These layers flatten the feature maps into a
one-dimensional vector and connect every neuron to every
neuron in the previous and subsequent layers. They are
crucial for making final predictions or decisions, such as
classifying objects in an image.
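
The flatten-then-connect step can be sketched as follows. The names `flatten` and `dense`, and all weights and biases, are made up for illustration; real weights come from training.

```python
def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one 1-D vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: each output neuron sees every input value."""
    return [sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
            for neuron_weights, bias in zip(weights, biases)]


vector = flatten([[[1.0, 2.0], [3.0, 4.0]]])
# Two hypothetical output neurons, four weights each.
weights = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
biases = [0.0, 1.0]
print(dense(vector, weights, biases))  # [1.0, 6.0]
```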
2.6 Output Layer:
The Output Layer is the final layer of the network, and its
structure depends on the specific task. In image classification
tasks, it typically contains as many neurons as there are
classes, with a SoftMax activation function. The SoftMax
function converts the network's output into class
probabilities, allowing it to assign a likelihood score to each
class, thus making a prediction.
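
A short sketch of how SoftMax turns raw output scores into class probabilities. The function name `softmax` and the three example scores are assumptions; subtracting the maximum score is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the largest raw score always receives the largest probability, so the prediction is simply the index of the maximum.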
3. OBJECT DETECTION
The goal of object detection is to develop computer vision systems
capable of automatically identifying, localizing, and
classifying objects within images or video frames. This task
serves as a foundational building block in the field of
computer vision, addressing the fundamental question of
"what is where" in visual data.
Object detection systems are designed to go beyond
simple image classification, providing a more comprehensive
understanding of the visual content. The key goals and
objectives of object detection can be summarized as follows:
3.1 Precise Localization:
Object detection aims to precisely outline the boundaries
of objects in an image, often by drawing bounding boxes
around them. This localization information is vital in
applications such as robotics, where it enables robots to
interact with objects accurately.
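
How well a predicted bounding box matches the true object is commonly measured with Intersection over Union (IoU). The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels; that format and the function name `iou` are illustrative choices, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7, so 1/7
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap.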
3.2 Accurate Classification:
Another primary goal is to classify detected objects into
predefined categories or classes. This classification is
essential for tasks like scene understanding, content-based
image retrieval, and even assisting visually impaired
individuals in understanding their surroundings.
3.3 Multi-Object Recognition:
Object detection systems are designed to handle multiple
objects within a single image, distinguishing and identifying
each object separately. This capability is crucial in scenarios
like autonomous driving, where the system must recognize
and react to various objects on the road.
3.4 Real-Time Performance:
Many applications, such as video surveillance and self-driving
cars, require object detection to operate in real-time
or near-real-time. Achieving efficient and fast object
detection is a key objective to meet these requirements.
4. SYSTEM ARCHITECTURE
Fig -2: Block Diagram of Proposed system
4.1 Dataset Collection:
Gather a dataset containing images or video frames with
the objects to detect and classify. The dataset includes
annotated bounding boxes and class labels for the objects.
4.2 Data Preprocessing:
Preprocess the dataset by resizing images, normalizing
pixel values, and augmenting the data to enhance model
generalization.
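
Two of these preprocessing steps can be sketched as follows: scaling 8-bit pixel values to the range 0 to 1, and rescaling a bounding box so it still matches the image after resizing. The function names, the (x_min, y_min, x_max, y_max) box format, and the image sizes are assumptions for illustration.

```python
def normalize_pixels(image):
    """Scale 8-bit pixel values (0-255) to the range 0.0-1.0."""
    return [[value / 255.0 for value in row] for row in image]


def rescale_box(box, old_size, new_size):
    """Rescale an (x_min, y_min, x_max, y_max) box after an image resize.

    old_size and new_size are (width, height) tuples.
    """
    scale_x = new_size[0] / old_size[0]
    scale_y = new_size[1] / old_size[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)


print(normalize_pixels([[0, 255]]))  # [[0.0, 1.0]]
# Hypothetical resize from 640x480 down to 320x240.
print(rescale_box((100, 50, 300, 200), (640, 480), (320, 240)))
# (50.0, 25.0, 150.0, 100.0)
```

Forgetting to rescale the annotations along with the image is a common source of misaligned training labels.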
4.3 Label Dataset:
Annotate the dataset with class labels and bounding box
coordinates for each object in the images or frames.
4.4 Test:
Split the dataset into training, validation, and testing
subsets. The training set is used for model training, the
validation set is for tuning hyperparameters, and the testing
set is for evaluating the model's performance.
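
One way to perform such a split is sketched below. The 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are assumptions; the paper does not state the ratios it used.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into training, validation, and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


# Hypothetical image file names.
files = [f"frame_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(files)
print(len(train), len(val), len(test))
```

Every image lands in exactly one subset, so the test set stays unseen during training and tuning.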
4.5 Classes.txt:
Create a classes.txt file that lists all the object classes to
detect in your dataset.
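
A sketch of how such a file might be read, assuming it holds one class name per line (a common convention for YOLO tooling; the paper does not show the file's layout). The function name `parse_class_names` and the class names are hypothetical.

```python
def parse_class_names(text):
    """Map class index -> class name, assuming one name per non-empty line."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return dict(enumerate(names))


# Hypothetical contents of classes.txt.
classes_txt = "person\ncar\n\ndog\n"
print(parse_class_names(classes_txt))  # {0: 'person', 1: 'car', 2: 'dog'}
```

The index of each name is what the annotation files and the model's output refer to, so the order of lines must not change after labeling.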
4.6 Training YOLO Model (within CNN):
Train a YOLO model, which is itself a CNN-based detector,
using the preprocessed and annotated dataset. The CNN
backbone extracts features and the YOLO layers predict boxes and classes.
4.7 Model File:
After training, save the trained CNN model, including the
YOLO components, as a model file. This file will contain both
the CNN and YOLO model weights and architecture.
4.8 Model Evaluation:
Evaluate the performance of the trained CNN-YOLO model
using metrics like precision, recall, F1-score, and mAP (mean
Average Precision).
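
Precision, recall, and F1-score follow directly from counts of true positives, false positives, and false negatives, as in this sketch. The counts are made up and are not results from the paper. mAP additionally averages precision over recall levels and classes and is not computed here.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from detection counts."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed objects.
precision, recall, f1 = precision_recall_f1(8, 2, 2)
print(precision, recall, f1)
```

A detection usually counts as a true positive only if its class is correct and its overlap with a ground-truth box exceeds a chosen threshold.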
4.9 Model Deployment:
Deploy the trained CNN-YOLO model to a production
environment where it can perform real-time object detection.
4.10 Input Video:
Provide the deployed model with an input video stream.
This can be a live video feed or a pre-recorded video file.
4.11 Process Frame:
For each frame in the input video, pass it through the
CNN-YOLO model for object detection.
4.12 Segmentation:
The CNN-YOLO model segments each frame into a grid
and assigns bounding boxes to detected objects. It also
estimates class probabilities for each detected object.
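
The grid idea can be sketched by finding which cell contains the centre of an object. The 416x416 input size and 13x13 grid are typical YOLO values used here as assumptions, and `responsible_cell` is a hypothetical name.

```python
def responsible_cell(box_center, image_size, grid_size):
    """Return the (row, col) of the grid cell containing the box centre."""
    center_x, center_y = box_center
    width, height = image_size
    # Clamp so a centre on the far edge still falls in the last cell.
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col


print(responsible_cell((208, 104), (416, 416), 13))  # (3, 6)
```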
4.13 Bounding Box, Contour:
Extract the bounding box coordinates for each detected
object. These coordinates are used to draw bounding boxes
around the objects.
4.14 Label Classes:
Assign class labels to the detected objects based on the
class with the highest probability in the CNN-YOLO model's
output.
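
Picking the highest-probability class can be sketched as below. The 0.5 confidence cut-off, the function name `label_detection`, and the class names are assumptions for illustration.

```python
def label_detection(class_probs, class_names, min_confidence=0.5):
    """Return (name, probability) of the most likely class, or None if too uncertain."""
    best_index = max(range(len(class_probs)), key=lambda i: class_probs[i])
    best_prob = class_probs[best_index]
    if best_prob < min_confidence:
        return None  # discard weak detections
    return class_names[best_index], best_prob


names = ["person", "car", "dog"]  # hypothetical classes
print(label_detection([0.1, 0.7, 0.2], names))  # ('car', 0.7)
print(label_detection([0.4, 0.3, 0.3], names))  # None
```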
4.16 Visualization:
Visualize the video frames with bounding boxes and class
labels overlaid on the detected objects for better
understanding and analysis.
4.17 Save Output File (in CNN):
Save the processed video frames with bounding boxes and
class labels as an output video file or image sequence for
further analysis or sharing.
5. PROPOSED METHOD- YOLO V4
The YOLO (You Only Look Once) v4 architecture
is a state-of-the-art deep learning model for real-time object
detection. It builds upon the success of earlier versions like
YOLOv3, aiming to improve accuracy while maintaining
real-time performance. YOLO v4 incorporates several
architectural enhancements and innovations to achieve its
objectives.
5.1 Backbone Network:
YOLO v4 starts with a powerful backbone network that
extracts features from the input image. The backbone
used in YOLO v4 is CSPDarknet53, an improved version of
Darknet53. CSPDarknet53 incorporates Cross-Stage Partial
connections, which enhance feature learning and
representation.
5.2 Feature Pyramid Network (FPN):
YOLO v4 utilizes a Feature Pyramid Network similar to
YOLO v3. The FPN is responsible for creating feature maps at
multiple scales, which helps in detecting objects of different
sizes. These feature maps are extracted from different layers
of the backbone network.
5.3 Neck Architecture:
In YOLO v4, there's a "neck" architecture that further
refines the features obtained from the FPN. This neck
architecture often includes PANet (Path Aggregation
Network) modules. PANet helps in aggregating multi-scale
features effectively, improving the model's ability to handle
objects at various scales.
5.4 Detection Head:
The detection head of YOLO v4 consists of multiple
detection layers. Each detection layer is associated with
specific feature maps from the FPN or the neck architecture.
These layers are responsible for predicting bounding boxes
and class probabilities for objects. The head also predicts
anchor box offsets for precise localization.
5.5 Anchor Boxes:
YOLO v4 uses anchor boxes, which are predefined
bounding box shapes of different sizes and aspect ratios.
These anchor boxes serve as references for the model to
predict object locations and sizes accurately. The model
predicts adjustments (offsets) to these anchor boxes.
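
The sketch below shows how predicted offsets can be turned into a box using the anchor parameterization introduced in YOLOv2/v3: a sigmoid keeps the centre inside its grid cell, and an exponential scales the anchor's width and height. The paper does not give the exact formula its implementation uses, so treat this as an assumption; the name `decode_box`, the anchor size, and the stride are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(offsets, cell, anchor, stride):
    """Convert predicted offsets into (center_x, center_y, width, height) in pixels.

    offsets: (tx, ty, tw, th) raw network outputs
    cell:    (col, row) of the grid cell making the prediction
    anchor:  (width, height) of the anchor box in pixels
    stride:  size of one grid cell in pixels
    """
    tx, ty, tw, th = offsets
    col, row = cell
    center_x = (sigmoid(tx) + col) * stride
    center_y = (sigmoid(ty) + row) * stride
    width = anchor[0] * math.exp(tw)
    height = anchor[1] * math.exp(th)
    return center_x, center_y, width, height


# Zero offsets place the box at the cell centre with the anchor's own size.
print(decode_box((0, 0, 0, 0), cell=(6, 3), anchor=(100, 50), stride=32))
# (208.0, 112.0, 100.0, 50.0)
```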
5.6 Multi-scale Predictions:
YOLO v4 makes predictions at multiple scales, allowing it
to detect objects of varying sizes in the same image. This
multi-scale approach ensures that the model can identify
both small and large objects effectively.
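
To make the multiple scales concrete, this sketch computes the grid resolution at each detection scale. The strides of 8, 16, and 32 and the 416-pixel input are values commonly used with YOLO v3 and v4, assumed here rather than taken from the paper.

```python
def grid_sizes(input_size, strides=(8, 16, 32)):
    """Grid resolution at each detection scale for a square input image."""
    return [input_size // stride for stride in strides]


print(grid_sizes(416))  # [52, 26, 13]
```

The fine 52x52 grid suits small objects, while the coarse 13x13 grid, whose cells each cover a larger image area, suits large ones.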
5.7 Spatial Attention Module:
YOLO v4 incorporates the Spatial Attention Module in its
architecture to introduce spatial attention. This module helps
the model focus on important regions in the feature maps,
improving the model's attention to relevant details.
5.8 Data Augmentation:
During training, data augmentation techniques are
applied to the input images. These techniques include
random scaling, translation, rotation, and color jittering. Data
augmentation helps the model become more robust to
variations in the input data.
5.9 Loss Functions:
YOLO v4 uses a combination of loss functions, including
classification loss, localization loss, and confidence loss. These loss
functions are designed to guide the training process and
encourage accurate object detection.
5.10 Training Strategy:
YOLO v4 is trained on large datasets like COCO, often
using transfer learning from pre-trained models. Fine-tuning
is also employed to adapt the model to specific object
detection tasks.
5.11 Post-processing:
After inference, YOLO v4 employs post-processing
techniques such as non-maximum suppression (NMS) to
remove duplicate and low-confidence detections, ensuring
that only the most confident predictions are retained.
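
A minimal sketch of greedy non-maximum suppression, assuming detections are (box, score) pairs with boxes in (x_min, y_min, x_max, y_max) form. The thresholds and names are illustrative assumptions, and this version ignores class labels, whereas practical pipelines usually apply NMS per class.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5, score_threshold=0.25):
    """Drop low-score boxes, then keep each box only if it does not
    overlap an already kept, higher-scoring box too much."""
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # best box for the first object
    ((1, 1, 11, 11), 0.8),    # near-duplicate of the first box
    ((50, 50, 60, 60), 0.7),  # a second, separate object
    ((0, 0, 5, 5), 0.1),      # low-confidence detection
]
print(non_max_suppression(detections))
# [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.7)]
```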
5.12 Model Variants:
YOLO v4 has various model variants, including the full-
sized YOLOv4 and smaller variants like YOLOv4-tiny, which
have different trade-offs in terms of speed and accuracy.
The YOLO v4 architecture is a sophisticated and powerful
object detection system that achieves state-of-the-art results
in real-time detection tasks. Its ability to handle multi-scale
objects and adapt to various applications makes it a popular
choice in computer vision and deep learning research and
applications.
Fig -3: Block Diagram of Typical YOLO V4
6. SOFTWARE USED- MATLAB
MATLAB, short for "MATrix LABoratory," is a high-level
programming environment and a proprietary programming
language developed by MathWorks. It is widely used in
academia, research, and industry for various scientific and
engineering applications. MATLAB is known for its versatility,
powerful mathematical capabilities, and extensive toolboxes
that facilitate a wide range of tasks.
7. OUTPUT SNAPSHOTS
7.1 Input Video
7.2 Output Video
7.3 Performance Metrics for Video
7.4 Input Image
7.5 Output Image
7.6 Performance Metrics for Image
8. FUTURE WORK
In this project, we developed Convolutional Neural
Network (CNN)-based real-time object detection using
YOLOv4 for images and video streams. In future work, we will
extend the system to live video streams and to more advanced
object detection algorithms.
9. CONCLUSIONS
In conclusion, our project on Convolutional Neural Network
(CNN)-based real-time object detection using YOLOv4 has
achieved remarkable results in the realm of computer vision
and object recognition. Leveraging the cutting-edge YOLOv4
architecture, we have developed a highly accurate and
efficient system capable of instantaneously identifying and
locating objects in real-world scenarios, such as live video
streams. This project's success is a testament to the
incredible progress made in deep learning and CNNs,
allowing us to address complex, real-time object detection
tasks with unprecedented accuracy and speed.
Our YOLOv4-based solution holds significant potential for
applications across numerous domains, including
autonomous vehicles, surveillance, and industrial
automation, where real-time object detection is critical. As
we continue to advance the capabilities of CNNs and object
detection algorithms, we are at the forefront of shaping a
future where machines can perceive and interact with their
environments with remarkable precision and efficiency.

More Related Content

Similar to Convolutional Neural Network Based Real Time Object Detection Using YOLO V4 (20)

PDF
REAL-TIME OBJECT DETECTION USING OPEN COMPUTER VISION
IRJET Journal
 
PPTX
Real Time Object Dectection using machine learning
pratik pratyay
 
PDF
ASSISTANCE SYSTEM FOR DRIVERS USING IOT
IRJET Journal
 
PPTX
slide-171212080528.pptx
SharanrajK22MMT1003
 
PPTX
HOW TO WASTE YOUR TIME ON SIMPLE THINGS DONT JUST FEEL INSTEAD BLAME OTHERS A...
lanaw86385
 
PDF
Real Time Sign Language Recognition Using Deep Learning
IRJET Journal
 
PDF
seminar report kshitij on PBL presentation.pdf
sayalishivarkar1
 
PDF
Object Detection An Overview
ijtsrd
 
PDF
IRJET- Applications of Object Detection System
IRJET Journal
 
PDF
IRJET- Smart Traffic Control System using Yolo
IRJET Journal
 
PDF
Detection of a user-defined object in an image using feature extraction- Trai...
IRJET Journal
 
PDF
IRJET- Object Detection in an Image using Deep Learning
IRJET Journal
 
PDF
Intelligent Accident Detection, Prevention and Reporting System
IRJET Journal
 
PPTX
Object detection with Tensorflow Api
ArwinKhan1
 
PPTX
Traffic Violation Detector using Object Detection
shri ram murti smarak college of engineering,technology & research
 
PDF
USING IMAGE CLASSIFICATION TO INCENTIVIZE RECYCLING
IRJET Journal
 
PDF
Real Time Moving Object Detection for Day-Night Surveillance using AI
IRJET Journal
 
PDF
Machine learning based augmented reality for improved learning application th...
IJECEIAES
 
PPTX
Object detection with deep learning
Sushant Shrivastava
 
PDF
IRJET- Object Detection and Recognition for Blind Assistance
IRJET Journal
 
REAL-TIME OBJECT DETECTION USING OPEN COMPUTER VISION
IRJET Journal
 
Real Time Object Dectection using machine learning
pratik pratyay
 
ASSISTANCE SYSTEM FOR DRIVERS USING IOT
IRJET Journal
 
slide-171212080528.pptx
SharanrajK22MMT1003
 
HOW TO WASTE YOUR TIME ON SIMPLE THINGS DONT JUST FEEL INSTEAD BLAME OTHERS A...
lanaw86385
 
Real Time Sign Language Recognition Using Deep Learning
IRJET Journal
 
seminar report kshitij on PBL presentation.pdf
sayalishivarkar1
 
Object Detection An Overview
ijtsrd
 
IRJET- Applications of Object Detection System
IRJET Journal
 
IRJET- Smart Traffic Control System using Yolo
IRJET Journal
 
Detection of a user-defined object in an image using feature extraction- Trai...
IRJET Journal
 
IRJET- Object Detection in an Image using Deep Learning
IRJET Journal
 
Intelligent Accident Detection, Prevention and Reporting System
IRJET Journal
 
Object detection with Tensorflow Api
ArwinKhan1
 
Traffic Violation Detector using Object Detection
shri ram murti smarak college of engineering,technology & research
 
USING IMAGE CLASSIFICATION TO INCENTIVIZE RECYCLING
IRJET Journal
 
Real Time Moving Object Detection for Day-Night Surveillance using AI
IRJET Journal
 
Machine learning based augmented reality for improved learning application th...
IJECEIAES
 
Object detection with deep learning
Sushant Shrivastava
 
IRJET- Object Detection and Recognition for Blind Assistance
IRJET Journal
 

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

PDF
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
PDF
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
PDF
bs-en-12390-3 testing hardened concrete.pdf
ADVANCEDCONSTRUCTION
 
PPSX
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
PDF
A Brief Introduction About Robert Paul Hardee
Robert Paul Hardee
 
PDF
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
PDF
Designing for Tomorrow – Architecture’s Role in the Sustainability Movement
BIM Services
 
PDF
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
PPTX
template.pptxr4t5y67yrttttttttttttttttttttttttttttttttttt
SithamparanaathanPir
 
PPTX
CM Function of the heart pp.pptxafsasdfddsf
drmaneharshalid
 
PDF
PROGRAMMING REQUESTS/RESPONSES WITH GREATFREE IN THE CLOUD ENVIRONMENT
samueljackson3773
 
PDF
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
PPTX
Explore USA’s Best Structural And Non Structural Steel Detailing
Silicon Engineering Consultants LLC
 
PDF
Plant Control_EST_85520-01_en_AllChanges_20220127.pdf
DarshanaChathuranga4
 
PPTX
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
PDF
June 2025 Top 10 Sites -Electrical and Electronics Engineering: An Internatio...
elelijjournal653
 
PDF
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
PPTX
Electrical_Safety_EMI_EMC_Presentation.pptx
drmaneharshalid
 
PDF
Decision support system in machine learning models for a face recognition-bas...
TELKOMNIKA JOURNAL
 
PDF
June 2025 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
輪読会資料_Miipher and Miipher2 .
NABLAS株式会社
 
bs-en-12390-3 testing hardened concrete.pdf
ADVANCEDCONSTRUCTION
 
OOPS Concepts in Python and Exception Handling
Dr. A. B. Shinde
 
A Brief Introduction About Robert Paul Hardee
Robert Paul Hardee
 
How to Buy Verified CashApp Accounts IN 2025
Buy Verified CashApp Accounts
 
Designing for Tomorrow – Architecture’s Role in the Sustainability Movement
BIM Services
 
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
template.pptxr4t5y67yrttttttttttttttttttttttttttttttttttt
SithamparanaathanPir
 
CM Function of the heart pp.pptxafsasdfddsf
drmaneharshalid
 
PROGRAMMING REQUESTS/RESPONSES WITH GREATFREE IN THE CLOUD ENVIRONMENT
samueljackson3773
 
CLIP_Internals_and_Architecture.pdf sdvsdv sdv
JoseLuisCahuanaRamos3
 
Explore USA’s Best Structural And Non Structural Steel Detailing
Silicon Engineering Consultants LLC
 
Plant Control_EST_85520-01_en_AllChanges_20220127.pdf
DarshanaChathuranga4
 
Kel.3_A_Review_on_Internet_of_Things_for_Defense_v3.pptx
Endang Saefullah
 
June 2025 Top 10 Sites -Electrical and Electronics Engineering: An Internatio...
elelijjournal653
 
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
Electrical_Safety_EMI_EMC_Presentation.pptx
drmaneharshalid
 
Decision support system in machine learning models for a face recognition-bas...
TELKOMNIKA JOURNAL
 
June 2025 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
Ad

Convolutional Neural Network Based Real Time Object Detection Using YOLO V4

  • 1. © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 452 Convolutional Neural Network Based Real Time Object Detection Using YOLO V4 Srinath S 1, Kishore S M 2, Hariharan G 3 Manikumar R 4 1,2,3 Final Year BE Student ,Department of Electronics and Communication Engineering. 4 Assistant professor, Department of Electronics and Communication Engineering. 1,2,3,4Government College of Engineering(Autonomous),Bargur, Krishnagiri, Tamil Nadu ,India. ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Object detection is a difficult work in computer vision and finds applications in Different fields like robotics, surveillance and automated vehicles. However, compared to image classification the object detection tasks are more difficult to analyze, more energy consuming and computation intensive. Our project aims to develop a real time object detection module using Convolutional Neural Network implemented in Matlab. The proposed systemwillleveragethe power of deep learning techniques, specifically CNNs, to accurately detect and localize objects in real-time video streams . Key Words: Convolutional Neural Networks, Object detection, YOLO V4 1.INTRODUCTION Evolving technological landscape, computer vision plays a pivotal role in various applications, ranging from autonomous vehicles to surveillance systems, robotics, and beyond. Real-time object detection, a subfield of computer vision, has gained significant prominence due to itsabilityto identify and locate objects within live video streams or images in real-time. This capability has wide-ranging applications, from enhancing safety and security toenabling autonomous decision-making in machines. 
This project, titled "Real-Time Object Detection Using Convolutional Neural Networks (CNN) in MATLAB," is dedicated to developing a robust and efficient system that performs real-time object detection with both accuracy and speed. At its core, the project uses deep learning, specifically CNNs, to detect and localize objects within visual data. The primary objective is a real-time object detection system that operates on live video feeds or images and immediately identifies and localizes the objects in each frame. We aim to achieve state-of-the-art accuracy and reliability, so that the system makes correct decisions with minimal false positives and false negatives. Real-time performance is equally important: the system must process video frames or images quickly enough to support real-time decision-making in various applications. In addition to detecting objects, the system will classify them into predefined categories or classes, which lets it differentiate between the types of objects in a scene.

2. INTRODUCTION TO CONVOLUTIONAL NEURAL NETWORK:
The Convolutional Neural Network represents a critical milestone in the evolution of artificial neural networks and has played a pivotal role in computer vision and image analysis. Conceived as a specialized architecture for processing visual data, CNNs have become the go-to tool for tasks like image classification, object detection, and facial recognition. What sets CNNs apart is their ability to automatically and hierarchically extract intricate features from raw images, emulating the human visual system's knack for pattern recognition.
This adaptability has paved the way for applications across a wide spectrum of industries, from healthcare (medical image analysis) to automotive (autonomous driving systems) and beyond. With their deep layers of interconnected neurons and carefully designed convolutional and pooling operations, CNNs achieve high accuracy on visual tasks, and because convolutional filters share weights across the image they need far fewer parameters than fully connected networks of similar input size, which makes them efficient and scalable. This versatility, combined with their capacity for transfer learning, where pre-trained models are fine-tuned for specific tasks, has made CNNs indispensable in machine learning and artificial intelligence. This overview only scratches the surface of the impact of CNNs, as researchers and practitioners continue to push the boundaries of what is achievable in visual data analysis.

Fig- 1: Schematic diagram of a basic convolutional neural network (CNN) architecture

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 10 | Oct 2023 www.irjet.net p-ISSN: 2395-0072
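The convolutional and pooling operations mentioned above can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the authors' MATLAB implementation; the function names, the toy 5x5 image, and the 2x2 edge filter are all assumptions chosen for illustration. As in most CNN libraries, the "convolution" here does not flip the kernel.

```python
# Illustrative sketch only: hypothetical helper names, toy data.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative values with zero, element by element."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image: dark columns (0) on the left, bright columns (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d_valid(image, kernel)   # 4x4 feature map
activated = relu(features)               # negatives clipped to zero
pooled = max_pool(activated, size=2)     # 2x2 downsampled map
print(pooled)
```

The filter responds only where the image changes from dark to bright, and pooling keeps that response while halving each spatial dimension.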
2.1 Input Layer:
The Input Layer is the entry point of the network and represents the raw image data. It is structured as a grid of pixel values, with one value per color channel at each pixel. The dimensions of this layer match the dimensions of the input image.

2.2 Convolutional Layers:
Convolutional Layers extract features from the input image. Each layer consists of multiple learnable filters that slide across the image, performing convolution operations. Each filter learns to recognize specific patterns and features, such as edges, corners, and textures. The Convolutional Layers form a hierarchy, with deeper layers capturing increasingly complex and abstract features.

2.3 Activation Functions:
After each convolution operation, an Activation Function is applied element-wise. The most common activation function is ReLU (Rectified Linear Unit), which introduces non-linearity into the network. ReLU replaces negative values with zeros, allowing the network to model complex relationships in the data.

2.4 Pooling Layers:
Pooling Layers follow convolutional layers and shrink the spatial size of the feature maps while retaining essential information. Max-pooling, for instance, selects the maximum value from a small region, effectively downsampling the data. This reduces computational complexity and makes the network less sensitive to minor variations in object position or scale.

2.5 Fully Connected Layers:
Fully Connected Layers come after several convolutional and pooling layers and are typically found in the later stages of the network. The feature maps are flattened into a one-dimensional vector, and each neuron in a fully connected layer is connected to every neuron in the adjacent layers.
They are crucial for making final predictions or decisions, such as classifying objects in an image.

2.6 Output Layer:
The Output Layer is the final layer of the network, and its structure depends on the specific task. In image classification tasks, it typically contains as many neurons as there are classes, with a SoftMax activation function. The SoftMax function converts the network's output into class probabilities, assigning a likelihood score to each class; the class with the highest score is taken as the prediction.

3. OBJECT DETECTION
The aim of object detection is to develop computer vision systems capable of automatically identifying, localizing, and classifying objects within images or video frames. This task is a foundational building block of computer vision, addressing the fundamental question of "what is where" in visual data. Object detection systems go beyond simple image classification, providing a more comprehensive understanding of the visual content. The key goals and objectives of object detection can be summarized as follows:

3.1 Precise Localization:
Object detection aims to precisely outline the boundaries of objects in an image, often by drawing bounding boxes around them. This localization information is vital in applications such as robotics, where it enables robots to interact with objects accurately.

3.2 Accurate Classification:
Another primary goal is to classify detected objects into predefined categories or classes. This classification is essential for tasks like scene understanding, content-based image retrieval, and assisting visually impaired individuals in understanding their surroundings.

3.3 Multi-Object Recognition:
Object detection systems are designed to handle multiple objects within a single image, distinguishing and identifying each object separately. This capability is crucial in scenarios like autonomous driving, where the system must recognize and react to various objects on the road.
3.4 Real-Time Performance:
Many applications, such as video surveillance and self-driving cars, require object detection to operate in real time or near real time. Achieving efficient and fast object detection is a key objective to meet these requirements.
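The localization goal in Section 3.1 is commonly scored with Intersection over Union (IoU), the overlap between a predicted bounding box and the annotated one divided by the area they cover together. The paper does not name IoU, so the sketch below is an illustration under assumptions: boxes are taken to be (x_min, y_min, x_max, y_max) tuples, and the example coordinates are invented.

```python
# Illustrative sketch only: box format and example values are assumptions.

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


predicted = (10, 10, 50, 50)      # hypothetical detector output
ground_truth = (30, 30, 70, 70)   # hypothetical annotation
print(round(iou(predicted, ground_truth), 3))  # 400 / 2800 -> 0.143
```

An IoU of 1.0 means the boxes coincide exactly, and 0.0 means they do not overlap at all.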
4. SYSTEM ARCHITECTURE

Fig -2: Block Diagram of Proposed system

4.1 Dataset Collection:
Gather a dataset of images or video frames containing the objects to detect and classify. The dataset includes annotated bounding boxes and class labels for the objects.

4.2 Data Preprocessing:
Preprocess the dataset by resizing images, normalizing pixel values, and augmenting the data to improve model generalization.

4.3 Label Dataset:
Annotate the dataset with class labels and bounding box coordinates for each object in the images or frames.

4.4 Test:
Split the dataset into training, validation, and testing subsets. The training set is used for model training, the validation set for tuning hyperparameters, and the testing set for evaluating the model's performance.

4.5 Classes.txt:
Create a classes.txt file that lists all the object classes to detect in the dataset.

4.6 Training YOLO Model (within CNN):
Train a YOLO model on the preprocessed and annotated dataset. YOLO is itself a CNN-based detector, so its detection layers are trained together with the convolutional feature-extraction layers.

4.7 Model File:
After training, save the trained CNN model, including the YOLO components, as a model file. This file contains the weights and architecture of the whole network.

4.8 Module Evaluation:
Evaluate the performance of the trained CNN-YOLO model using metrics like precision, recall, F1-score, and mAP (mean Average Precision).

4.9 Model Deployment:
Deploy the trained CNN-YOLO model to a production environment where it can perform real-time object detection.

4.10 Input Video:
Provide the deployed model with an input video stream. This can be a live video feed or a pre-recorded video file.
4.11 Process Frame:
Pass each frame of the input video through the CNN-YOLO model for object detection.

4.12 Segmentation:
The CNN-YOLO model segments each frame into a grid and assigns bounding boxes to detected objects. It also estimates class probabilities for each detected object.

4.13 Bounding Box, Contour:
Extract the bounding box coordinates for each detected object. These coordinates are used to draw bounding boxes around the objects.

4.14 Label Classes:
Assign class labels to the detected objects based on the class with the highest probability in the CNN-YOLO model's output.

4.16 Visualization:
Visualize the video frames with bounding boxes and class labels overlaid on the detected objects for better understanding and analysis.

4.17 Save Output File (in CNN):
Save the processed video frames with bounding boxes and class labels as an output video file or image sequence for further analysis or sharing.

5. PROPOSED METHOD- YOLO V4
The YOLO (You Only Look Once) v4 architecture is a state-of-the-art deep learning model for real-time object detection. It builds upon the success of earlier versions like YOLOv3, aiming to improve accuracy while maintaining real-time performance. YOLO v4 incorporates several architectural enhancements and innovations to achieve these objectives.
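The precision, recall, and F1-score named in Section 4.8 follow directly from counts of true positives, false positives, and false negatives. The sketch below shows the standard formulas; the function name and the example counts are hypothetical and are not results from this project. How a detection is matched to an annotation (typically by an overlap threshold) is left out of the sketch.

```python
# Illustrative sketch only: the counts below are invented, not measured results.

def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 8 correct detections, 2 spurious ones, 4 missed objects.
precision, recall, f1 = detection_metrics(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision penalizes spurious detections and recall penalizes missed objects; F1 is their harmonic mean, so it is high only when both are high.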
5.1 Backbone Network:
YOLO v4 starts with a powerful backbone network that extracts features from the input image. The backbone usually chosen for YOLO v4 is CSPDarknet53, an improved version of Darknet53. CSPDarknet53 incorporates Cross-Stage Partial connections, which enhance feature learning and representation.

5.2 Feature Pyramid Network (FPN):
YOLO v4 utilizes a Feature Pyramid Network similar to YOLO v3. The FPN creates feature maps at multiple scales, which helps in detecting objects of different sizes. These feature maps are extracted from different layers of the backbone network.

5.3 Neck Architecture:
In YOLO v4, a "neck" architecture further refines the features obtained from the FPN. This neck often includes PANet (Path Aggregation Network) modules. PANet aggregates multi-scale features effectively, improving the model's ability to handle objects at various scales.

5.4 Detection Head:
The detection head of YOLO v4 consists of multiple detection layers. Each detection layer is associated with specific feature maps from the FPN or the neck architecture. These layers predict bounding boxes and class probabilities for objects. The head also predicts anchor box offsets for precise localization.

5.5 Anchor Boxes:
YOLO v4 uses anchor boxes, which are predefined bounding box shapes of different sizes and aspect ratios. These anchor boxes serve as references for the model to predict object locations and sizes accurately. The model predicts adjustments (offsets) to these anchor boxes.

5.6 Multi-scale Predictions:
YOLO v4 makes predictions at multiple scales, allowing it to detect objects of varying sizes in the same image.
This multi-scale approach ensures that the model can identify both small and large objects effectively.

5.7 Spatial Attention Module:
YOLO v4 incorporates a Spatial Attention Module in its architecture to introduce spatial attention. This module helps the model focus on important regions in the feature maps, improving its attention to relevant details.

5.8 Data Augmentation:
During training, data augmentation techniques are applied to the input images. These techniques include random scaling, translation, rotation, and color jittering. Data augmentation helps the model become more robust to variations in the input data.

5.9 Loss Functions:
YOLO v4 uses a combination of loss functions: classification loss, localization loss, and confidence loss. These loss functions guide the training process and encourage accurate object detection.

5.10 Training Strategy:
YOLO v4 is trained on large datasets like COCO, often using transfer learning from pre-trained models. Fine-tuning is also employed to adapt the model to specific object detection tasks.

5.11 Post-processing:
After inference, YOLO v4 employs post-processing techniques such as non-maximum suppression (NMS) to remove duplicate and low-confidence detections, ensuring that only the most confident predictions are retained.

5.12 Model Variants:
YOLO v4 has various model variants, including the full-sized YOLOv4 and smaller variants like YOLOv4-tiny, which have different trade-offs in terms of speed and accuracy.

The YOLO v4 architecture is a sophisticated and powerful object detection system that achieves state-of-the-art results in real-time detection tasks. Its ability to handle multi-scale objects and adapt to various applications makes it a popular choice in computer vision and deep learning research and applications.

Fig -3: Block Diagram of Typical YOLO V4
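The non-maximum suppression step described in Section 5.11 can be sketched as a greedy procedure: discard low-confidence detections, then repeatedly keep the highest-scoring box and drop any remaining box that overlaps it too much. This is a minimal single-class illustration in plain Python, not the authors' MATLAB code; the box format (x_min, y_min, x_max, y_max), the threshold values 0.5 and 0.25, and the sample detections are assumptions.

```python
# Illustrative sketch only: thresholds and detections are assumed values.

def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5, score_threshold=0.25):
    """Greedy NMS over a list of (box, score) pairs belonging to one class."""
    # Drop low-confidence detections, then visit the rest from most to least confident.
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap strongly with any box already kept.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((10, 10, 50, 50), 0.9),      # strongest detection of an object
    ((12, 12, 52, 52), 0.8),      # near-duplicate of the first box
    ((100, 100, 140, 140), 0.7),  # a separate object
    ((200, 200, 240, 240), 0.1),  # below the confidence threshold
]
for box, score in non_max_suppression(detections):
    print(box, score)
```

In a multi-class detector the same procedure is usually run separately for each class, so that overlapping objects of different classes do not suppress one another.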
6. SOFTWARE USED- MATLAB
MATLAB, short for "MATrix LABoratory," is a high-level programming environment and a proprietary programming language developed by MathWorks. It is widely used in academia, research, and industry for scientific and engineering applications. MATLAB is known for its versatility, powerful mathematical capabilities, and extensive toolboxes that facilitate a wide range of tasks.

7. OUTPUT SNAPSHOTS
7.1 Input Video
7.2 Output Video
7.2 Performance Metrics for Video
7.3 Input Image
7.4 Output Image
7.2 Performance Metrics for Image

8. FUTURE WORK
In this project we developed Convolutional Neural Network (CNN)-based real-time object detection using YOLOv4 for images and video streams. In future work we will extend the system to live video streams and explore more advanced object detection algorithms.

9. CONCLUSIONS
In conclusion, our project on Convolutional Neural Network (CNN)-based real-time object detection using YOLOv4 has achieved strong results in computer vision and object recognition. Using the YOLOv4 architecture, we have developed an accurate and efficient system capable of identifying and locating objects in real time in real-world scenarios, such as live video streams. This outcome reflects the progress made in deep learning and CNNs, which makes it possible to address complex, real-time object detection tasks with high accuracy and speed. Our YOLOv4-based solution has potential for applications across numerous domains, including autonomous vehicles, surveillance, and industrial automation, where real-time object detection is critical. As the capabilities of CNNs and object detection algorithms continue to advance, machines will be able to perceive and interact with their environments with increasing precision and efficiency.