Object Detection in
TensorFlow
Nissim Cantor, Avi Radinsky, Jacob Silbiger
GitHub: https://github.com/ndcantor/tensorflow-street-classifier
Our Mentor, Gershom Kutliroff
• Chief Science Officer, Taranis
• CTO & Founder, ClearVuze
• Principal Engineer, Intel
• CTO & Founder, Omek Interactive
• Chief Scientist, IDT Video Technologies
Linkedin
Goal
• The goal of our project was to create an image classification and object detection model
that simulates one that might be used in a self-driving car
• Our model uses a neural network to draw boxes around and label objects in an image
• Brief overview of model architecture:
• TensorFlow transfer learning image classifier
• Object detection using:
• Selective search
• Non-maximum suppression
Background
Image Classification vs. Object Detection
• Image classification - determining
the class of a given image
(e.g., is this a photograph of a cat or
a dog?).
• Object detection - locating objects
of given classes within a
larger picture (e.g., within this
picture of many animals, draw a
bounding box around all relevant
animals and identify them).
Neural Networks
• “A neural network is a series of
algorithms that endeavors to recognize
underlying relationships in a set of data
through a process that mimics the way
the human brain operates. In this sense,
neural networks refer to systems of
neurons, either organic or artificial in
nature.” (investopedia.com)
• It turns out that neural networks often
produce generalizable results.
• It's best to view them as a black box.
Convolutional Neural Networks
• A class of Neural Networks
commonly used in computer vision
applications
Wikipedia
VGG16 (Simonyan and Zisserman)
• A CNN architecture developed by
Oxford University researchers that
showed high rates of success in
image recognition.
• It showed that deeper networks (more
layers built from small 3×3 convolution filters)
can be more effective than shallower
networks with larger filters.
Wiki
But how does a Neural Network actually work?
• Currently, the prevalent theory as to
how a neural network works is that
each layer is detecting a specific
characteristic of the image that it sees,
and as you travel through the layers of
the network the layers begin detecting
increasingly complex characteristics.
Transfer Learning
• The characteristic-detecting nature of
neural networks allows us to reuse
pre-trained networks on different
problems with minimal changes.
• The changes can be as minimal as
retraining the last layer in the network,
but sometimes involve retraining
several of the last layers in the
network.
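As a rough illustration of retraining only the last layer, the sketch below freezes a MobileNetV2 base in Keras and attaches a new classification head. The helper name `build_classifier`, the 96×96 input size, and `weights=None` (chosen here so that nothing is downloaded; an actual transfer-learning run would typically pass `weights="imagenet"`) are assumptions for illustration and are not taken from the project's code.

```python
import tensorflow as tf

def build_classifier(num_classes, input_shape=(96, 96, 3), weights=None):
    """Frozen MobileNetV2 base plus a new, trainable softmax head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # keep the base's feature detectors fixed

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run batch norm in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Six outputs: five street classes plus 'background'.
model = build_classifier(num_classes=6)
```

With the base frozen, only the kernel and bias of the final dense layer are updated during training. Unfreezing some of the last layers of `base` would correspond to the "several of the last layers" case.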
Fine Tuning
Through tweaking the parameters of a neural network, one can try to maximize its accuracy.
However, each tweak of the parameters has its positives and negatives.
• Running through the data numerous times (multiple epochs)
• It can make up for small data sizes (+)
• It can cause overfitting (overtraining) in the model (-)
• Increasing the size of the neural network
• Accuracy can increase (+)
• It becomes easier to overfit the model (-)
TensorFlow Street Classifier
Data
• Since we wanted to simulate a model that might be used in a self-driving car, we decided
to train our network to recognize everyday objects found on city streets
• Classes:
• Bicycle
• Car
• Motorcycle
• Person
• Train
Datasets
• We used the COCO (Common Objects
in Context) dataset, as it contains high-quality
images of these objects in
real-life scenes.
• We also used the Open Images V6 dataset
to supply additional images for
the smaller classes.
• Both datasets contained labels and
bounding boxes which we used as
ground truth to train our model.
Generating Train/Test Images
• Every bounding box in every dataset image was cropped and saved inside
its class folder (crops of cars were stored in the ‘car’ folder, etc.)
• These cropped images were then split into train and test folders
• Shift augmentations were applied to the crops in the train folder to help the
model learn more about the features of each class.
• These crops were then used to train the model
• Any dataset image that had no bounding boxes for the classes we were
training the model to recognize was sliced into quadrants, with each quarter
placed into a ‘background’ class folder
• The purpose of the background class is to allow the model to classify unknown objects
as ‘background’ instead of mistakenly classifying them as one of the five other classes.
• In total the model was trained on over 848,000 cropped images
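The box arithmetic behind the quadrant slicing and shift augmentation described above might look like the sketch below. The function names, the 10% shift fraction, and the choice to shift the crop window itself are assumptions for illustration; the project may instead shift pixels inside an already-cropped image.

```python
def quadrant_boxes(width, height):
    """Return four (left, top, right, bottom) boxes that tile the image."""
    mid_x, mid_y = width // 2, height // 2
    return [
        (0, 0, mid_x, mid_y),           # top-left
        (mid_x, 0, width, mid_y),       # top-right
        (0, mid_y, mid_x, height),      # bottom-left
        (mid_x, mid_y, width, height),  # bottom-right
    ]

def shifted_boxes(box, width, height, shift=0.1):
    """Shift a ground-truth box left, right, up and down by a fraction
    of its own size, keeping each shifted box inside the image."""
    left, top, right, bottom = box
    dx = int((right - left) * shift)
    dy = int((bottom - top) * shift)
    shifted = []
    for ox, oy in [(-dx, 0), (dx, 0), (0, -dy), (0, dy)]:
        # Clamp the offset so the box keeps its size and stays in bounds.
        ox = max(-left, min(ox, width - right))
        oy = max(-top, min(oy, height - bottom))
        shifted.append((left + ox, top + oy, right + ox, bottom + oy))
    return shifted

background_crops = quadrant_boxes(640, 480)
augmented = shifted_boxes((100, 100, 200, 300), 640, 480)
```

Each tuple uses the (left, top, right, bottom) order, so it can be passed directly to an image library's crop call, such as Pillow's `Image.crop`.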
Classification Model
• Our model uses transfer learning with the
MobileNetV2 architecture
• The model was trained for 20 epochs, the first 10 with a
learning rate of 0.0001 and the last 10 with a
learning rate of 0.00001
• After training, our model obtained a score of
over 95% test accuracy
• The figure to the right is a confusion matrix,
showing how accurate the model was for each
of the 6 classes
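The two-stage learning rate described above can be written as a small schedule function. This is a minimal sketch: the function name is hypothetical, and epochs are assumed to be counted from zero.

```python
def learning_rate_for_epoch(epoch, lr=None):
    """Return 1e-4 for epochs 0-9 and 1e-5 for epochs 10-19.

    The second argument (the current learning rate) is accepted but unused.
    """
    return 1e-4 if epoch < 10 else 1e-5

schedule = [learning_rate_for_epoch(epoch) for epoch in range(20)]
```

A function with this `(epoch, lr)` signature can be handed to `tf.keras.callbacks.LearningRateScheduler`, which calls it at the start of every epoch. Whether the project sets its learning rate this way or by compiling and fitting twice is not stated in the slides.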
Object Detection
• After classifying each image, the next step is object detection
• Object detection works by sending image crops into the classification model and drawing
boxes around all crops that were classified as one of the non-background classes
• We tried two different methods of generating crops to send into the classifier:
1. Sliding windows
2. Selective search
Sliding Windows
• Sliding windows works by passing a fixed-size rectangle across the image, cropping the
image and saving the crop’s location on the original image. The image is then enlarged
and the process is repeated in order to generate smaller boxes to help find smaller
objects.
• The process of repeatedly enlarging the image is called pyramid scaling
• The crops are then sent to the classification model to be classified. The pipeline outputs the
original image with bounding boxes drawn around all crops that were classified as
non-background objects
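The sliding-window and pyramid-scaling steps above can be sketched as pure coordinate arithmetic. The window size, step, scale factor and number of levels below are illustrative assumptions, not the values used in the project.

```python
def sliding_windows(width, height, window=64, step=32):
    """Yield (left, top, right, bottom) boxes for a fixed-size window."""
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            yield (left, top, left + window, top + window)

def pyramid_windows(width, height, window=64, step=32, scale=2.0, levels=3):
    """Slide the window over successively enlarged copies of the image and
    map every box back to coordinates on the original image."""
    boxes = []
    factor = 1.0
    for _ in range(levels):
        scaled_w, scaled_h = int(width * factor), int(height * factor)
        for box in sliding_windows(scaled_w, scaled_h, window, step):
            # Dividing by the enlargement factor shrinks the box on the original.
            boxes.append(tuple(int(v / factor) for v in box))
        factor *= scale
    return boxes

boxes = pyramid_windows(128, 128, window=64, step=64, levels=2)
```

In this call the first level produces four 64×64 boxes and the second, on the image enlarged 2×, produces sixteen boxes that cover 32×32 regions of the original. The number of crops grows quickly with each level, which is the cost of the brute-force approach.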
Example
Original Image
Image crops with pyramid scaling
Problems with Sliding Windows
The sliding windows method had two main problems:
1. It is a brute force method, so it took a long time to run
2. The boxes all had a fixed size, so it was hard to classify multiple classes of objects that
had different shapes
Selective Search
• Selective search is an algorithm that works by creating
crops based on certain criteria in the image
• color, texture, shape, size
• This produces boxes of different shapes and sizes
without exhaustively scanning a window
across the image
• Crops are then sent to the model to be classified
Our model uses selective search, as we found it faster and more
accurate than sliding windows
(Diagram from pyimagesearch.com)
Non-maximum Suppression (NMS)
• After the cropped regions are classified, many adjacent regions
will be classified as the same object and will overlap with each
other. In order to keep only the best crop for each object, we
run a non-maximum suppression algorithm to remove the
redundant overlapping regions.
• How NMS works:
• Select the crop whose classification the model is most
confident about
• Remove all bounding boxes whose
intersection over union (IOU) with the selected crop is
above a certain threshold
• Repeat with the most confident remaining crop until none are left
• Output the selected crops, whose overlap with one another is at or below
the threshold
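The NMS steps above can be sketched in plain Python as follows. The 0.5 threshold and the example boxes and scores are made up for illustration. This version also ignores class labels, whereas a detector would usually run NMS separately for each class.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # most confident remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the selected one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the first two boxes overlap with an IOU of about 0.68, so the less confident one is suppressed. The third box does not overlap either of them and is kept.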
Before NMS After NMS
Results
• Our model is able to classify images
with relatively high accuracy
• We were successful in creating an
object detection model that can detect whether
an object of any of our classes is in the
image
• With our GitHub repo, you can
download the data and train your own
model
Further Exploration
• How we might be able to further improve our model:
• Uneven training data class sizes led our model to be skewed toward the bigger
classes. One way to prevent this would be to find more data to increase the size of
the smaller classes
• Another way to address this problem is to use a focal loss function. This loss function
down-weights examples that the model already classifies confidently, so that hard,
misclassified examples (often from the smaller classes) contribute more to the
weight updates
• Another aspect to improve is the accuracy of the bounding boxes. To get
tighter boxes, we could run a bounding box regression algorithm
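To make the focal loss suggestion above concrete, the sketch below compares it with ordinary cross-entropy for a single example, using the standard form FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the probability the model assigns to the true class. The values gamma = 2 and alpha = 1 are illustrative assumptions; in practice alpha may be set per class to counter class imbalance.

```python
import math

def cross_entropy(p_true):
    """Standard cross-entropy loss for one example."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one example, given the probability of its true class."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# Fraction of the cross-entropy loss that remains under focal loss.
easy_ratio = focal_loss(0.9) / cross_entropy(0.9)  # confident, correct example
hard_ratio = focal_loss(0.1) / cross_entropy(0.1)  # badly misclassified example
```

With gamma = 2, the confidently classified example keeps about 1% of its cross-entropy loss while the misclassified one keeps about 81%, so the hard examples have far more influence on training.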
Do it Yourself - How to Build and Run Our Model
Pre-requisites:
1. Python 3.7+
2. CUDA 10.0
3. cuDNN 7.5.1
Run:
1. Clone the repo - git clone https://github.com/ndcantor/tensorflow-street-classifier.git
2. Install required Python libraries - pip3 install -r requirements.txt
3. Download data, build, train, and run inference on the model - python3 street_classifier.py
4. Results:
A. The model’s train, validation, and test accuracies will be printed on the screen
B. A confusion matrix as well as sample test images will be saved to disk
YU CS Summer 2021 Project | TensorFlow Street Image Classification and Object Detection Model
