Multiple Object Detection
In partial fulfillment of the requirements of the degree of
Bachelor of Engineering & Technology
in
Computer Science
by
Manish Raghav (1501010002)
Mohit Kumar (1501010033)
Kunal Dogra (1501010027)
Under the Supervision of
Mrs. Saneh Lata Yadav
K. R. MANGALAM UNIVERSITY, GURUGRAM, HARYANA,
INDIA
April 2019
TABLE OF CONTENTS
__________________________________________________________________________
1. Certificate
2. Declaration
3. Approval sheet
4. Acknowledgement
5. Abstract
6. Introduction
6.1 Problem Statement
6.2 Applications
6.3 Challenges
7. Literature Review
8. Objectives
9. Methodology
9.1 Tools and Technologies
9.2 Software Requirement Specification
10. Working
11. Result
12. Conclusion
13. References
1. CERTIFICATE
__________________________________________________________________________
It is certified that the work contained in the project report titled "Multiple Object Detection" by
the following students:
Name of the Student Roll Number
Manish Raghav 1501010002
Mohit Arora 1501010033
Kunal Dogra 1501010027
has been carried out under our supervision, and that this work has not been submitted elsewhere
for a degree.
Mrs. Saneh Lata
Assistant Professor
School of Engineering and Technology
K R Mangalam University
Gurugram, Haryana
India
___________________________________________________________________________
2. DECLARATION
___________________________________________________________________________
I declare that this written submission represents my ideas in my own words and where others'
ideas or words have been included, I have adequately cited and referenced the original sources. I
also declare that I have adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I
understand that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been properly cited or
from whom proper permission has not been taken when needed.
Name of the Student Roll Number Signature
Manish Raghav 1501010002
Mohit Arora 1501010033
Kunal Dogra 1501010027
Date: __________
3. APPROVAL SHEET
___________________________________________________________________________
This project report, "Multiple Object Detection", is approved for the degree of B.Tech CSE,
School of Engineering and Technology.
Dean (SOET) Supervisor
Dr. Ranjeet Assistant Professor Mrs Saneh Lata Yadav
Date :____________
Place:____________
___________________________________________________________________________
4. ACKNOWLEDGEMENT
___________________________________________________________________________
It gives me immense pleasure to express my deepest sense of gratitude and sincere thanks to my
highly respected and esteemed guide Mrs Saneh Lata, for her valuable guidance,
encouragement and help for completing this work. Her useful suggestions for this whole work
and co-operative behaviour are sincerely acknowledged.
I would like to express my sincere thanks to Dr./Mr. …………….., ……………….., KRMU for
giving me this opportunity to undertake this project. I would also like to thank Dr./Mr.
………………………… for wholehearted support.
At the end I would like to express my sincere thanks to all my friends and others who helped me
directly or indirectly during this project work.
Place: Gurugram MANISH RAGHAV
KUNAL DOGRA
MOHIT KUMAR
Date:
5. ABSTRACT
___________________________________________________________________________
Efficient and accurate object detection has been an important topic in the advancement of
computer vision systems. With the advent of deep learning techniques, the accuracy of object
detection has increased drastically. The project aims to incorporate a state-of-the-art technique
for object detection with the goal of achieving high accuracy with real-time performance. A major
challenge in many object detection systems is the dependency on other computer vision
techniques to support the deep learning based approach, which leads to slow and non-optimal
performance. In this project, we use a completely deep learning based approach to solve the
problem of object detection in an end-to-end fashion. The network is trained on the most
challenging publicly available dataset, on which an object detection challenge is conducted
annually. The resulting system is fast and accurate, thus aiding those applications which require
object detection.
6. Introduction
6.1 Problem Statement
Many problems in computer vision had reached a plateau in accuracy a decade ago. However,
with the rise of deep learning techniques, the accuracy on these problems improved drastically.
One of the major problems was image classification, which is defined as predicting the class
of an image. A slightly more complicated problem is image localization, where the image
contains a single object and the system should predict both the class and the location of the
object in the image (a bounding box around the object). The more complicated problem addressed
in this project, object detection, involves both classification and localization. In this case,
the input to the system is an image, and the output is a bounding box for each object in the
image, along with the class of the object in each box. An overview of all these problems is
depicted in Fig. 1.
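The difference between the three problems can be seen in the shape of their outputs. The sketch below is only illustrative: the class names, the corner-coordinate box layout and the sample values are assumptions made for this example and are not taken from the project code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Box stored as (x_min, y_min, x_max, y_max) in pixels (assumed layout).
Box = Tuple[float, float, float, float]


@dataclass
class Classification:
    """Image classification: one class label for the whole image."""
    label: str


@dataclass
class Localization:
    """Image localization: one label and one box for the single object."""
    label: str
    box: Box


@dataclass
class Detection:
    """Object detection: one of possibly many (label, box, score) results."""
    label: str
    box: Box
    score: float


def count_per_class(detections: List[Detection]) -> Dict[str, int]:
    """Count how many objects of each class were detected in one image."""
    counts: Dict[str, int] = {}
    for det in detections:
        counts[det.label] = counts.get(det.label, 0) + 1
    return counts


# Hypothetical detector output for one image: the list length varies per image.
sample = [
    Detection("dog", (10.0, 20.0, 110.0, 220.0), 0.91),
    Detection("cow", (150.0, 30.0, 400.0, 260.0), 0.84),
]
print(count_per_class(sample))
```

Classification and localization always return a fixed-size result, whereas detection returns a list whose length depends on the image.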
6.2 Applications
A well-known application of object detection is face detection, which is used in almost all
mobile cameras. A more generalized (multi-class) application is autonomous driving, where a
variety of objects need to be detected. Object detection also has an important role to play in
surveillance systems. These systems can be integrated with other tasks such as pose estimation,
where the first stage in the pipeline is to detect the object and the second stage is to estimate
the pose in the detected region. Object detection can also be used for tracking objects, and thus
in robotics and medical applications. This problem therefore serves a multitude of applications.
6.3 Challenges
The major challenge in this problem is the variable dimension of the output, which is caused by
the variable number of objects that can be present in any given input image. Most machine
learning models require a fixed dimension of input and output in order to be trained. Another
important obstacle to widespread adoption of object detection systems is the requirement of
real-time performance (>30 fps) while remaining accurate in detection. The more complex the
model is, the more time it requires for inference; the less complex the model is, the lower its
accuracy. This trade-off between accuracy and speed needs to be chosen as per the application.
The problem involves classification as well as regression, so the model has to learn both tasks
simultaneously. This adds to the complexity of the problem.
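The real-time requirement can be restated as a time budget per frame: at 30 fps, each frame has about 33.3 ms for capture, inference and drawing. The helper below is a minimal sketch of that arithmetic; the function names and the example timings are hypothetical and do not come from measurements in this project.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def achieved_fps(inference_ms: float, overhead_ms: float = 0.0) -> float:
    """Frame rate reached when each frame costs inference plus overhead."""
    return 1000.0 / (inference_ms + overhead_ms)


def is_real_time(inference_ms: float, target_fps: float = 30.0,
                 overhead_ms: float = 0.0) -> bool:
    """True if the per-frame cost fits inside the budget for target_fps."""
    return inference_ms + overhead_ms <= frame_budget_ms(target_fps)


# Hypothetical timings: a small model at 25 ms/frame and a larger one at 40 ms/frame.
for name, ms in [("small model", 25.0), ("large model", 40.0)]:
    print(name, round(achieved_fps(ms), 1), "fps, real-time:", is_real_time(ms))
```

A check of this kind makes the accuracy versus speed trade-off explicit when choosing between candidate models.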
7. Literature Review
These days, there are video surveillance systems everywhere. Monitoring technologies
are common in everyday life, but they are also used for military and other purposes. The goal of
this thesis is to examine different algorithms for object detection using neural networks and pick
the most suitable one for pedestrian counting on affordable hardware, such as Intel NUCs or
NVIDIA Jetsons, which both cost roughly 400 to 600 euros. These requirements impose
some limitations on the detection model, because the most accurate models require a lot of
computing power.
There are several different methods for object detection using computer vision, and some
methods are more reliable and robust than others. The most modern method is to use deep
learning. In deep learning, a computer learns to perform classification tasks directly from
examples and can achieve top-quality accuracy. Deep learning is part of the machine learning
family, and machine learning is one of the fastest-growing and most exciting fields in artificial
intelligence. Deep learning has been around since the 1980s, but has become useful only
recently because it requires a great amount of labelled data and computing power. Deep learning
architectures have been applied to multiple fields, including computer vision, speech recognition
and board games, where in some cases these solutions have produced results comparable to
human experts, if not superior. Most of the references used in this thesis are website articles
and blog posts, but all sources should be well known and popular in the deep learning
community.
This thesis is structured so that the first chapters introduce the reader to the subject and explain
what object detection is and how neural networks work. The following chapters go through the
most famous deep learning algorithms and the tools used in this project. The last chapter goes
through the development in this project and briefly explains all the steps. However, because the
project is built on top of Fider's own code and due to an NDA, no important code is shown.
WHAT IS OBJECT DETECTION?
Computer vision, as the name suggests, is a field in computer science that works on giving
computers the ability to see, identify and process images in the same way that human eyesight
does. In computer vision, object detection means searching for an object in an image or a
video. After detection, that object can be classified into one of several categories, such as a
human or a boat. Video is just a sequence of images displayed in rapid succession, so
image processing techniques can be applied to it frame by frame. Object detection is one of the
areas in computer vision that is evolving very rapidly. New algorithms keep outperforming the
older ones in terms of speed and accuracy.
Historically, object detection emerged in 2001 when Paul Viola and Michael Jones came up with
the idea of Haar cascades. A Haar cascade is a classifier which is used to detect the object it
has been trained for. It is trained using a set of positive and negative images, where positive
images contain the object and negative images contain something else. With the introduction of
convolutional neural networks (CNNs) and their proven success in computer vision, cascade
classifiers are now the second-best alternative. Convolutional neural networks work by applying
learned filters to small local regions of the input and passing the result to the next layer,
which does the same thing with different filters.
Object detection and classification often serve as preceding steps for object tracking. In
object tracking, the goal is to keep track of an object's motion, location and occlusion. Object
tracking is used in many different applications, such as video surveillance, robotics and traffic
monitoring.
Computer vision deals with the extraction of meaningful information from the contents of digital
images or video. This is distinct from mere image processing, which involves manipulating visual
information on the pixel level. Applications of computer vision include image classification,
visual detection, 3D scene reconstruction from 2D images, image retrieval, augmented reality,
machine vision and traffic automation. Today, machine learning is a necessary component of
many computer vision algorithms. Such algorithms can be described as a combination of image
processing and machine learning. Effective solutions require algorithms that can cope with the
vast amount of information contained in visual images and, critically for many applications, can
carry out the computation in real time.
Object detection is one of the classical problems of computer vision and is often described as a
difficult task. In many respects, it is similar to other computer vision tasks, because it
involves creating a solution that is invariant to deformation and changes in lighting and
viewpoint. What makes object detection a distinct problem is that it involves both locating and
classifying regions of an image [20]. The locating part is not needed in, for example, whole
image classification. To detect an object, we need to have some idea where the object might be
and how the image is segmented. This creates a type of chicken-and-egg problem where, to
recognize the shape (and class) of an object, we need to know its location, and to recognize the
location of an object, we need to know its shape. Some visually dissimilar features, such as the
clothes and face of a human being, may be parts of the same object, but it is difficult to know
this without recognizing the object first. On the other hand, some objects stand out only
slightly from the background, requiring separation before recognition. Low-level visual
features of an image, such as a saliency map, may be used as a guide for locating candidate
objects.
The location and size are typically defined using a bounding box, which is stored in the
form of corner coordinates. Using a rectangle is simpler than using an arbitrarily shaped
polygon, and many operations, such as convolution, are performed on rectangles in any case. The
sub-image contained in the bounding box is then classified by an algorithm that has been trained
using machine learning. The boundaries of the object can be further refined iteratively after
making an initial guess.
During the 2000s, popular solutions for object detection utilized feature descriptors, such as
the scale-invariant feature transform (SIFT) developed by David Lowe in 1999 and the histogram of
oriented gradients (HOG) popularized in 2005. In the 2010s, there has been a shift towards
utilizing convolutional neural networks. Before the wide-scale adoption of CNNs, there were two
competing solutions for generating bounding boxes. In the first solution, a dense set of region
proposals is generated and then most of these are rejected. This typically involves a sliding
window detector. In the second solution, a sparse set of bounding boxes is generated using a
region proposal method, such as Selective Search. Combining sparse region proposals with
convolutional neural networks has provided good results and is currently popular.
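The dense, sliding-window style of proposal generation described above can be sketched in a few lines. This is a generic illustration rather than the method used in this project; the window size, stride and image size are arbitrary assumptions, and boxes are returned as corner coordinates as described in the text.

```python
from typing import Iterator, Tuple

# Box stored as corner coordinates (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


def sliding_windows(img_w: int, img_h: int,
                    win_w: int, win_h: int, stride: int) -> Iterator[Box]:
    """Yield every window of size win_w x win_h that fits fully in the image,
    stepping by `stride` pixels horizontally and vertically."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Assumed example: a 640x480 image scanned with a 64x128 window, stride 32.
proposals = list(sliding_windows(640, 480, 64, 128, 32))
print(len(proposals), "candidate boxes, first:", proposals[0])
```

Each candidate box would then be passed to a classifier, and most candidates are rejected. A real detector repeats the scan at several scales, which is why dense proposals are costly compared with a sparse method such as Selective Search.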
8. Objectives
Since many interesting lines of inquiry exist for improving convolutional object detection
systems, is it worthwhile to study the lessons learned from testing the geometric inference
method of the "Putting Objects in Perspective" publication? The most immediate lesson is that
the method in its current form does not improve the performance of a convolutional object
detector, except in certain marginal cases. These cases are difficult to separate from the
numerous cases where the method degrades performance. From a practical point of view, the method
is also inefficient, because it requires a long computation time, which would have made it
impractical even if it had performed as expected. On one hand, the negative results from the
geometric inference can be perceived as a testament to the performance capabilities of
state-of-the-art systems. Fast R-CNN already works well enough to render irrelevant the effects
of a system designed for the previous generation of object detectors, and as we have
demonstrated, many methods exist for improving the detection speed and accuracy of Fast R-CNN.
Figure: False negative cases in context (specifically, the two small red boxes in the
background). True boxes are shown in a darker colour than detections.
On the other hand, the starting point of the original authors of "Putting Objects in
Perspective" would still appear to be valid. The improved convolutional methods still consider
the object proposals (mostly) out of context. However, we know from practical examples that
sometimes objects are only detectable from their context. Looking back at the false negative
cases in the figure, we can see that the first two human forms are almost impossible to
visually detect as humans from the cropped images. However, from the complete image in the
figure, we can, with some difficulty, identify the figures as humans from their general shape,
their location in the street and their slightly different colour compared to the surrounding
environment.
9. Methodology
The code for this project was implemented in the Python language using the OpenCV library and
Caffe. During the process, different frameworks and pre-trained models were tested, including
Caffe and PyTorch.
Due to the limitations in computing power, the model had to be small and fast. TensorFlow was
chosen as a framework because it was easy to implement and the pre-trained models were easy
to use thanks to frozen graphs.
Training of a model was also tested, in the hope of achieving better accuracy in detecting
pedestrians from a bird's eye view.
This project aims to classify the input image as either a dog or a cat image. The input image
given to the system is analyzed and the predicted result is given as output.
A convolutional neural network is used to classify the image. The dataset contains a large
number of images of cats and dogs. Our aim is to make the model learn the distinguishing
features between cats and dogs. Once the model is trained, it will be able to classify the
input image as either a cat or a dog.
Figure 11: Dog-Cat Image Classification Overview
9.1 Tools and Technologies
Anaconda: an open-source distribution of the Python and R programming languages for data
science and machine learning applications that aims to simplify package management
and deployment. The distribution comes with more than 1,000 data packages as well as the conda
package and virtual environment manager and the Anaconda Navigator interface, so it eliminates
the need to learn to install each library independently.
TensorFlow: an open-source software library for dataflow programming across
a range of tasks. It is a symbolic math library and is also used for machine learning
applications such as neural networks. It is used for both research and production at Google.
TensorFlow was developed by the Google Brain team for internal Google use. It was released under
the Apache 2.0 open-source license on November 9, 2015.
Caffe: a deep learning framework with the following strengths.
Expressive architecture: encourages application and innovation. Models and optimization
are defined by configuration without hard-coding. A single flag switches between CPU and GPU,
so a model can be trained on a GPU machine and then deployed to commodity clusters or mobile
devices.
Extensible code: fosters active development. In Caffe's first year, it was forked by over
1,000 developers and had many significant changes contributed back. Thanks to these
contributors, the framework tracks the state of the art in both code and models.
Speed: makes Caffe well suited to research experiments and industry deployment. Caffe can
process over 60M images per day with a single NVIDIA K40 GPU. That is about 1 ms/image for
inference and 4 ms/image for learning, and more recent library versions and hardware are faster
still. Its developers believe that Caffe is among the fastest convnet implementations available.
Community: Caffe already powers academic research projects, startup prototypes, and even
large-scale industrial applications in vision, speech, and multimedia. Its community is active
on the caffe-users group and GitHub.
CNN: convolutional neural network, a class of deep, feed-forward artificial neural networks
most commonly applied to analyzing visual imagery. CNNs, like other neural networks, are made up
of neurons with learnable weights and biases. Each neuron receives several inputs, takes a
weighted sum over them, passes it through an activation function and responds with an output.
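The behaviour of a single neuron described above (weighted sum, bias, activation) can be written out directly. This is a minimal sketch in plain Python for illustration only; the function names, the choice of ReLU as the default activation and the sample numbers are assumptions, not part of the project code.

```python
import math
from typing import Callable, Sequence


def relu(z: float) -> float:
    """Rectified linear unit: passes positive values, clamps negatives to 0."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float] = relu) -> float:
    """Weighted sum of the inputs plus bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical values: 0.5*1.0 + 1.0*2.0 + 0.25 = 2.75, unchanged by ReLU.
print(neuron([1.0, 2.0], [0.5, 1.0], 0.25))
```

In a CNN the same computation is applied to small patches of the image with shared weights, and training adjusts the weights and biases.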
9.2 Software Requirement Specification
The experiments in this project were carried out using the following hardware and software
packages.
Software
1. Python 3.6
Library
1. OpenCV
Hardware
1. Processor: Intel Core i5-6700HQ (2.6 GHz)
2. RAM: 4 GB DDR4
3. GPU: Nvidia GTX 1060 2 GB
10. Working
There has been a lot of work in object detection using traditional computer vision techniques
(sliding windows, deformable part models). However, they lack the accuracy of deep learning
based techniques. Among the deep learning based techniques, two broad classes of methods are
prevalent: two-stage detection (R-CNN [1], Fast R-CNN [2], Faster R-CNN [3]) and unified
detection (YOLO [4], SSD [5]). The major concepts involved in these techniques are explained
below.
Bounding Box
The bounding box is a rectangle drawn on the image which tightly fits the object in the image. A
bounding box exists for every instance of every object in the image. For each box, four numbers
(center x, center y, width, height) are predicted. The prediction can be trained using a
distance measure between the predicted and ground-truth bounding boxes. The measure used is the
Jaccard distance, which is one minus the intersection over union (IoU) of the predicted and
ground-truth boxes, as shown in Fig. a.
Fig. a: Jaccard distance
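The intersection over union and the corresponding Jaccard distance can be computed as in the sketch below. It assumes boxes in the (center x, center y, width, height) form mentioned above; the helper names and the sample boxes are chosen for this illustration only.

```python
from typing import Tuple

# Box given as (center_x, center_y, width, height).
CenterBox = Tuple[float, float, float, float]


def to_corners(cx: float, cy: float, w: float, h: float):
    """Convert a center-format box to (x_min, y_min, x_max, y_max)."""
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


def iou(box_a: CenterBox, box_b: CenterBox) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    ax1, ay1, ax2, ay2 = to_corners(*box_a)
    bx1, by1, bx2, by2 = to_corners(*box_b)
    # Overlap width and height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def jaccard_distance(box_a: CenterBox, box_b: CenterBox) -> float:
    """Jaccard distance: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(box_a, box_b)


predicted = (5.0, 5.0, 10.0, 10.0)      # covers x 0..10, y 0..10
ground_truth = (10.0, 5.0, 10.0, 10.0)  # covers x 5..15, y 0..10
print(iou(predicted, ground_truth), jaccard_distance(predicted, ground_truth))
```

For the two sample boxes the overlap is 50 and the union is 150, so the IoU is 1/3 and the Jaccard distance is 2/3.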
11. Result
List of figures
1. In this photo, our application captures three persons and detects them according to their
appearance.
Fig(1)
2. In this image, our application detects a photograph of a dog and a bottle on
its left. Both objects are identified correctly.
Fig(2)
3. In this photograph, our application detects one cow and one dog correctly.
Fig(3)
4. Our application correctly captures one person and a chair which are in front of the camera.
Fig(4)
Fig(5)
12. Conclusion
An accurate and efficient object detection system has been developed which achieves metrics
comparable to the existing state-of-the-art systems. This project uses recent techniques in the
field of computer vision and deep learning. A custom dataset was created using labelling, and
the evaluation was consistent. The system can be used in real-time applications which require
object detection for pre-processing in their pipeline. An important extension would be to train
the system on video sequences for use in tracking applications. Adding a temporally consistent
network would enable smoother detection than per-frame detection.
13. References
[1] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for
accurate object detection and semantic segmentation. In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2014.
[2] Ross Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV), 2015.
[3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time
object detection with region proposal networks. In Advances in Neural Information Processing
Systems (NIPS), 2015.
[4] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once:
Unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2016.
[5] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang
Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on
Computer Vision (ECCV), 2016.
[6] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556, 2014.

  • 7. 5. ABSTRACT ___________________________________________________________________________ Efficient and accurate object detection has been an important topic in the advancement of computer vision systems. With the advent of deep learning techniques, the accuracy of object detection has increased drastically. The project aims to incorporate a state-of-the-art technique for object detection with the goal of achieving high accuracy with real-time performance. A major challenge in many object detection systems is the dependency on other computer vision techniques to support the deep learning based approach, which leads to slow and non-optimal performance. In this project, we use a completely deep learning based approach to solve the problem of object detection in an end-to-end fashion. The network is trained on the most challenging publicly available dataset, on which an object detection challenge is conducted annually. The resulting system is fast and accurate, thus aiding those applications which require object detection.
  • 8. 6. Introduction 1.1 Problem Statement A decade ago, accuracy on many computer vision problems had plateaued. With the rise of deep learning techniques, accuracy on these problems improved drastically. One of the major problems is image classification, which is defined as predicting the class of an image. A slightly more complicated problem is image localization, where the image contains a single object and the system should predict both the class and the location of the object in the image (a bounding box around the object). The more complicated problem addressed in this project, object detection, involves both classification and localization for multiple objects. In this case, the input to the system is an image, and the output is a bounding box for each object in the image, along with the class of the object in each box. An overview of all these problems is depicted in Fig. 1. 1.2 Applications A well-known application of object detection is face detection, which is used in almost all mobile cameras. A more generalized (multi-class) application is autonomous driving, where a variety of objects need to be detected. It also has an important role to play in surveillance
  • 9. systems. These systems can be integrated with other tasks such as pose estimation, where the first stage in the pipeline detects the object and the second stage estimates the pose in the detected region. Object detection can also be used for tracking objects, and thus in robotics and medical applications. The problem therefore serves a multitude of applications. 1.3 Challenges The major challenge in this problem is the variable dimension of the output, which is caused by the variable number of objects that can be present in any given input image. A general machine learning task requires a fixed dimension of input and output for the model to be trained. Another important obstacle to widespread adoption of object detection systems is the requirement of real-time operation (>30 fps) while remaining accurate in detection. The more complex the model is, the more time it requires for inference; the less complex the model is, the lower its accuracy. This trade-off between accuracy and speed needs to be chosen as per the application. The problem involves classification as well as regression, so the model must learn both tasks simultaneously. This adds to the complexity of the problem.
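To make the point about learning classification and regression simultaneously concrete, the following is a minimal sketch of a combined per-object loss. It is an illustration under stated assumptions, not the loss used in this project: the function names, the `box_weight` parameter, and the choice of cross-entropy plus smooth L1 (a combination used by detectors such as Fast R-CNN and SSD) are all assumed here.

```python
import math

def cross_entropy(class_probs, true_class):
    """Classification term: negative log-probability assigned to the true class."""
    return -math.log(max(class_probs[true_class], 1e-12))

def smooth_l1(pred_box, true_box):
    """Regression term: smooth L1 distance summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def detection_loss(class_probs, true_class, pred_box, true_box, box_weight=1.0):
    """Hypothetical combined loss: both tasks are optimised through one scalar."""
    return cross_entropy(class_probs, true_class) + box_weight * smooth_l1(pred_box, true_box)

# Example: a confident, correct class prediction with a slightly misplaced box.
loss = detection_loss([0.1, 0.9], 1, (0.50, 0.50, 0.20, 0.30), (0.52, 0.48, 0.20, 0.35))
print(round(loss, 4))
```

Because a single scalar drives training, the weight between the two terms (here the assumed `box_weight`) decides how much the model favours correct labels over tight boxes.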
  • 10. 7. Literature Review These days, there are video surveillance systems everywhere. Monitoring technologies are common in everyday life, but they are also used for military and other purposes. The goal of this thesis is to examine different algorithms for object detection using neural networks and pick the most suitable one for pedestrian counting on affordable hardware, such as Intel NUCs or NVIDIA Jetsons, which both cost roughly 400 to 600 euros. These requirements place some limitations on the detection model, because the most accurate models require a lot of computing power. There are several different methods for object detection using computer vision, and some methods are more reliable and robust than others. The most modern method is to use deep learning. In deep learning, a computer learns to perform classification tasks directly from examples and can achieve top-quality accuracy. Deep learning is part of the machine learning family, and machine learning is one of the fastest-growing and most exciting fields in artificial intelligence. Deep learning has been around since the 1980s, but has become useful only recently because it requires a great amount of labelled data and computing power. Deep learning architectures have been applied to multiple fields including computer vision, speech recognition and board games, where in some cases these solutions have produced results comparable to human experts, if not superior. Most of the references used in this thesis are website articles
  • 11. and blog posts, but all sources should be well-known and popular in the deep learning community. This thesis is structured so that the first chapters introduce the reader to the subject and explain what object detection is and how neural networks work. The following chapters go through the most famous deep learning algorithms and the tools used in this project. The last chapter goes through the development in this project and briefly explains all the steps. However, because the project is built on top of Fider as own code and due to NDA, no important code is shown. WHAT IS OBJECT DETECTION? Computer vision, as the name suggests, is a field in computer science that works on giving computers the ability to see, identify and process images in the same way that human eyesight does. In computer vision, object detection means searching for an object in an image or a video. After detection, that object can be classified into one of multiple categories, such as a human or a boat. Video is just a sequence of images displayed in rapid succession, so all image processing techniques can be applied to it. Object detection is one of the areas in computer vision that is evolving very rapidly. New algorithms keep outperforming the older ones in terms of speed and accuracy. Historically, object detection emerged in 2001 when Paul Viola and Michael Jones came up with the idea of Haar Cascades. A Haar Cascade is a classifier which is used to detect the object for which it has been trained. A Haar Cascade classifier is trained using a set of positive and negative images, where positive images are images of the object and negative images show something else. With the introduction of convolutional neural networks (CNNs) and their proven success in computer vision, cascade classifiers are now the second-best alternative.
Convolutional neural networks work by splitting the input into smaller chunks and then passing the result to the next layer, which does the same thing with different rules. Object detection and classification are simply preceding steps for object tracking. In object tracking, the goal is to keep track of an object's motion, location and occlusion. Object tracking is used in many different applications, such as video surveillance, robotics and traffic monitoring. Computer
  • 12. vision deals with the extraction of meaningful information from the contents of digital images or video. This is distinct from mere image processing, which involves manipulating visual information on the pixel level. Applications of computer vision include image classification, visual detection, 3D scene reconstruction from 2D images, image retrieval, augmented reality, machine vision and traffic automation. Today, machine learning is a necessary component of many computer vision algorithms. Such algorithms can be described as a combination of image processing and machine learning. Effective solutions require algorithms that can cope with the vast amount of information contained in visual images and, critically for many applications, can carry out the computation in real time. Object detection is one of the classical problems of computer vision and is often described as a difficult task. In many respects, it is similar to other computer vision tasks, because it involves creating a solution that is invariant to deformation and changes in lighting and viewpoint. What makes object detection a distinct problem is that it involves both locating and classifying regions of an image [20]. The locating part is not needed in, for example, whole image classification. To detect an object, we need to have some idea where the object might be and how the image is segmented. This creates a type of chicken-and-egg problem, where, to recognize the shape (and class) of an object, we need to know its location, and to recognize the location of an object, we need to know its shape. Some visually dissimilar features, such as the clothes and face of a human being, may be parts of the same object, but it is difficult to know this without recognizing the object first. On the other hand, some objects stand out only slightly from the background, requiring separation before recognition.
Low-level visual features of an image, such as a saliency map, may be used as a guide for locating candidate objects. The location and size are typically defined using a bounding box, which is stored in the form of corner coordinates. Using a rectangle is simpler than using an arbitrarily shaped polygon, and many operations, such as convolution, are performed on rectangles in any case. The sub-image contained in the bounding box is then classified by an algorithm that has been trained using machine learning. The boundaries of the object can be further refined iteratively, after making an initial guess. During the 2000s, popular solutions for object detection utilized feature descriptors, such as the scale-invariant feature transform (SIFT), developed by David Lowe in 1999, and the histogram of oriented gradients (HOG), popularized in 2005. In the 2010s, there has been a shift towards utilizing convolutional neural networks. Before the wide-scale adoption of CNNs, there were two competing solutions for generating bounding boxes. In the first solution, a dense set of region proposals is generated and then most of these are rejected. This typically involves a sliding window detector. In the second solution, a sparse set of bounding boxes is generated using a region proposal method, such as Selective Search. Combining sparse region proposals with convolutional neural networks has provided good results and is currently popular.
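The dense, sliding-window style of proposal generation described above can be sketched as follows. This is a minimal illustration, not code from the project: the function name `sliding_windows`, the single fixed window size, and the stride are assumptions, and a real detector would typically repeat the scan over several scales and aspect ratios.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield dense candidate boxes as corner coordinates (x_min, y_min, x_max, y_max)."""
    win_w, win_h = window_size
    # Step the window across the image on a regular grid, keeping it fully inside.
    for y in range(0, image_height - win_h + 1, stride):
        for x in range(0, image_width - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)

# Example: 128x128 windows with a 32-pixel stride over a 640x480 image.
boxes = list(sliding_windows(640, 480, (128, 128), 32))
print(len(boxes), boxes[0], boxes[-1])
```

Even this single-scale scan produces a few hundred candidates per image, most of which a classifier would reject, which is why sparse proposal methods became attractive.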
  • 13. 8. Objectives Since many interesting lines of inquiry exist for improving convolutional object detection systems, is it worthwhile to study the lessons learned from testing the geometric inference method of the “Putting Objects in Perspective” publication? The most immediate lesson is that the method in its current form does not improve the performance of a convolutional object detector, except in certain marginal cases. These cases are difficult to separate from the numerous cases where the method degrades performance. From a practical point of view, the method is also inefficient, because it requires a long computation time, which would have made it impractical even if it had performed as expected. On one hand, the negative results from the geometric inference can be perceived as a testament to the performance capabilities of state-of-the-art systems. Fast R-CNN already works well enough to render irrelevant the effects of a system designed for the previous generation of object detectors, and as we have demonstrated, many methods exist for improving the detection speed and accuracy of Fast R-CNN. [Figure: False negative cases in context (specifically, the two small red boxes in the background). True boxes are shown in a darker colour than detections.] On the other hand, the starting point of the original authors of “Putting Objects in Perspective” would still appear to be valid. The improved convolutional methods still consider the object proposals (mostly) out of context. However, we know from
  • 14. practical examples that sometimes objects are only detectable from their context. Looking back at the false negative cases, we can see that the first two human forms are almost impossible to visually detect as humans from the cropped images. However, from the complete image, we can, with some difficulty, identify the figures as humans from their general shape, their location in the street and their slightly different colour compared to the surrounding environment. 9. Methodology The code for this project was implemented in Python using the OpenCV library and Caffe. During the process, different frameworks and pre-trained models were tested, including Caffe and PyTorch. Due to limitations in computing power, the model had to be small and fast. TensorFlow was chosen as a framework because it was easy to implement and the pre-trained models were easy to use thanks to frozen graphs. Training of a model was also tested, in the hope of achieving better accuracy on pedestrians seen from a bird’s-eye view. This project aims to classify the input image as either a dog or a cat image. The input image given to the system is analyzed and the predicted result is returned as output. A convolutional neural network is used to classify the image. The dataset contains many images of cats and dogs. Our aim is to make the model learn the distinguishing features between cats and dogs. Once the model has been trained, it will be able to classify the input image as either a cat or a dog.
  • 15. Figure 11: Dog-Cat Image Classification Overview 9.1 Tools and Technologies IDE: an open source distribution of the Python and R programming languages for data science and machine learning applications that aims to simplify package management and deployment. The distribution comes with more than 1,000 data packages as well as a package and virtual environment manager, called Anaconda Navigator, so it eliminates the need to learn to install each library independently. TensorFlow: TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache 2.0 open-source license on November 9, 2015. Caffe: Expressive architecture encourages application and innovation. Models and optimization are defined by configuration without hard-coding. Switch between CPU and GPU by setting a single flag to train on a GPU machine, then deploy to commodity clusters or mobile devices. Extensible code fosters active development. In Caffe’s first year, it was forked by over 1,000 developers and had many significant changes contributed back. Thanks to these contributors, the framework tracks the state of the art in both code and models. Speed makes Caffe perfect for research experiments and industry deployment. Caffe can process over 60M images per day with a single NVIDIA K40 GPU. That’s 1 ms/image for
  • 16. inference and 4 ms/image for learning, and more recent library versions and hardware are faster still. We believe that Caffe is among the fastest convnet implementations available. Community: Caffe already powers academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Join our community of brewers on the caffe-users group and GitHub. CNN: A convolutional neural network is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery. CNNs, like other neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function and responds with an output. 9.2 Software Requirement Specification The experiments in this project were carried out using the following hardware and software packages. Software: 1. Python 3.6; Library: OpenCV. Hardware: 1. Processor: Intel Core i5-6700HQ (2.6 GHz) 2. RAM: 4 GB DDR4 3. GPU: Nvidia GTX 1060 2 GB. 10. Working There has been a lot of work in object detection using traditional computer vision techniques (sliding windows, deformable part models). However, they lack the accuracy of deep learning based techniques. Among the deep learning based techniques, two broad classes of methods are
  • 17. prevalent: two-stage detection (R-CNN [1], Fast R-CNN [2], Faster R-CNN [3]) and unified detection (YOLO [4], SSD [5]). The major concepts involved in these techniques are explained below. Bounding Box The bounding box is a rectangle drawn on the image which tightly fits the object in the image. A bounding box exists for every instance of every object in the image. For each box, 4 numbers (center x, center y, width, height) are predicted. The predictor can be trained using a distance measure between the predicted and ground truth bounding boxes. The distance measure is the Jaccard distance, which is derived from the intersection over union (IoU) between the predicted and ground truth boxes, as shown in Fig. a. Fig. a: Jaccard distance
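The bounding-box overlap measure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the project's code: the helper names are hypothetical, boxes are assumed to be given as (center x, center y, width, height), and the Jaccard distance is taken as 1 minus the intersection over union.

```python
def center_to_corners(box):
    """Convert (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(box_a, box_b):
    """Intersection over union of two boxes given as corner coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def jaccard_distance(box_a, box_b):
    """Jaccard distance: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(box_a, box_b)

# Example: predicted and ground truth boxes in (center x, center y, width, height) form.
predicted = center_to_corners((1.0, 1.0, 2.0, 2.0))
ground_truth = center_to_corners((2.0, 2.0, 2.0, 2.0))
print(iou(predicted, ground_truth), jaccard_distance(predicted, ground_truth))
```

Converting to corner coordinates first keeps the overlap computation to simple min and max operations; the same `iou` function is also what a detector would typically use to match predictions to ground truth during evaluation.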
  • 18. 11. Result List of figures 1. In this photo, our application captures three people and detects them according to their appearance. Fig(1)
  • 19. 2. In this image, our application detects a photograph of a dog and a bottle to its left, and labels both correctly. Fig(2) 3. In this photograph, our application accurately detects one cow and one dog.
  • 20. Fig(3) 4. Our application accurately detects one person and a chair that are in front of the camera. Fig(4)
  • 21. Fig(5) 12. Conclusion An accurate and efficient object detection system has been developed which achieves metrics comparable to existing state-of-the-art systems. This project uses recent techniques in the fields of computer vision and deep learning. A custom dataset was created by labelling images, and the evaluation was consistent. The system can be used in real-time applications which require object detection as a pre-processing step in their pipeline. An important direction for future work would be to train the system on video sequences for use in tracking applications. Adding a temporally consistent network would enable smoother detection that is closer to optimal than per-frame detection.
  • 22. 13. References [1] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. [2] Ross Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV), 2015.
  • 23. [3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS), 2015. [4] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. [5] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV), 2016. [6] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.