
Object Detection Using Overfeat

The document discusses object detection using a convolutional neural network. It describes Overfeat, which uses a pretrained model, sliding window detection, image pyramids, and fully connected layers as convolutions. Overfeat was the winner of the localization task at ILSVRC2013 and used these techniques along with non-maximum suppression for detection.

Uploaded by

Sprout Gigs
Copyright © All Rights Reserved

Object Detection using Convolutional Neural Network

Overfeat
Overfeat
Idea 4: Sliding window (spatial output) + Pretrained Model + Image Pyramid + FC as Convnet

1. Train a localization model (multi-class classification + Bbox) - to understand the generalization property of the Bbox prediction

2. Use it as a pretrained model for the object detection task - multi-class detection

3. Sliding window - to detect multiple instances of an object

4. Image pyramid - to detect objects of varying size

5. FC as convnet - to overcome the CNN constraint (fixed input size)

6. NMS - for the final prediction

Reference: Overfeat paper
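Item 5 (FC as convnet) can be illustrated with a small NumPy sketch. It is not the Overfeat implementation: the function name `fc_as_conv`, the kernel size `k` and the layer sizes in the usage note are assumptions made for illustration. The idea shown is that a fully connected layer trained on a k x k feature map is the same as a k x k convolution, so a larger input simply yields a larger spatial output instead of a size error.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weights, fc_bias, k):
    """Apply a fully connected layer as a k x k convolution (stride 1, no padding).

    feature_map: (C, H, W) activations from the last conv/pool layer.
    fc_weights:  (n_out, C * k * k) weights of an FC layer trained on k x k maps.
    fc_bias:     (n_out,) bias of that FC layer.
    Returns:     (n_out, H - k + 1, W - k + 1) spatial output.
    """
    c, h, w = feature_map.shape
    n_out = fc_weights.shape[0]
    # View each FC weight row as a (C, k, k) convolution kernel.
    kernels = fc_weights.reshape(n_out, c, k, k)
    out = np.empty((n_out, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = feature_map[:, i:i + k, j:j + k]
            # Contract over (C, k, k): one FC evaluation per window position.
            out[:, i, j] = np.tensordot(kernels, patch, axes=3) + fc_bias
    return out
```

With a feature map of exactly k x k the output is 1 x 1 and equals the ordinary FC result; with a hypothetical 6 x 7 map and k = 5 the same weights give a 2 x 3 spatial output, one prediction per window position.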


Overfeat
Idea 4: Sliding window (spatial output) + Pretrained Model + Image Pyramid + FC as convnet (Testing)

[Figure: test-time pipeline. Blocks shown: Image Pyramid, Sliding window, Pre Trained Localize Model (VGG 16; Feature Extraction, FC1, FC2, with FC as convnet), Classification, Regression.]
Overfeat - Experiment

• Winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013)

Paper link: https://arxiv.org/abs/1312.6229


Overfeat - Experiment - Classification and Localization - Training

• ImageNet 2012 dataset

• 1000 classes (example labels in the figure: Vehicle, Dog, Craft)

• Classification and localization tasks trained on a modified AlexNet

• ILSVRC 2013 winner of the classification with localization task

• 3rd rank in the detection task
Overfeat - Experiment - Classification and Localization - Training

[Figure: training pipeline. Input: Image + Bounding Box from the ImageNet 2012 dataset (1000 classes). Model: Modified AlexNet (Feature Extraction, FC1, FC2). Outputs: Classification (classify class k) and Localization (bounding box regression).]

• ILSVRC 2013 winner of the classification with localization task


Overfeat - Experiment - Object Detection - Training

• Training without a background class leads to a lot of false positive (FP) predictions.

• To avoid FPs, one additional class (background) is used.

• Training data for the background class is taken randomly as 245 x 245 crops where no object is present.

• Classification and localization are trained on 20 + 1 classes.

• The base dimension for training is 245 x 245.

[Figure: Classification and Localization model with Regression and Classification heads, trained on C + 1 classes (+1 for background) at 245 x 245.]
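A minimal sketch of how background training crops could be sampled is given below. The slide only says the 245 x 245 crops are taken randomly where no object is present; the zero-overlap test, the retry limit `max_tries` and the function name `sample_background_crop` are assumptions for illustration, not the procedure used by the authors.

```python
import random

def sample_background_crop(img_w, img_h, boxes, size=245, max_tries=100, rng=random):
    """Return (x1, y1, x2, y2) of a size x size crop that overlaps no object box.

    boxes: iterable of ground-truth boxes as (x1, y1, x2, y2) in pixels.
    Returns None if the image is too small or no free crop was found.
    """
    if img_w < size or img_h < size:
        return None
    for _ in range(max_tries):
        x1 = rng.randint(0, img_w - size)
        y1 = rng.randint(0, img_h - size)
        x2, y2 = x1 + size, y1 + size
        # Assumption: "no object present" means zero overlap with every box.
        overlaps = any(x1 < bx2 and bx1 < x2 and y1 < by2 and by1 < y2
                       for bx1, by1, bx2, by2 in boxes)
        if not overlaps:
            return (x1, y1, x2, y2)
    return None
```

Crops returned by this helper would be labelled with the extra background class (the "+ 1" in C + 1).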
Overfeat - Experiment - Object Detection - Inference

• Pretrained classification and localization network trained on C + 1 classes

• Spatial output over the image pyramid; the base dimension of a prediction is 245 x 245

• 6-scale image pyramid with a 1:2 factor of resolution

• FC as convnet

• Resolution / subsampling ratio / effective stride = 36


Overfeat - Experiment - Resolution

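Spatial output for different resolutions (for 1 class only):

• Resolution / subsampling ratio / effective stride = 36: input 389 x 461 -> spatial output 5 x 7

• Resolution / subsampling ratio / effective stride = 36: input 317 x 389 -> spatial output 3 x 5

• Resolution / subsampling ratio / effective stride = 18: input 317 x 389 -> spatial output 6 x 10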
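The relation between input size, base window and effective stride can be sketched as a small helper. This is an idealized formula and an assumption on my part (one output cell for the 245 x 245 base window plus one more per full stride of extra input); `spatial_output` is a hypothetical name. It reproduces the stride-36 case 389 x 461 -> 5 x 7, but it should be treated as an approximation: it does not reproduce every figure on the slides, for example the stride-18 output, and for a 425 x 497 input it gives 6 x 8.

```python
def spatial_output(height, width, base=245, stride=36):
    """Idealized size of the spatial output map for a given input size.

    One cell for the base window, plus one per full stride of extra input.
    """
    if height < base or width < base:
        raise ValueError("input is smaller than the base window")
    rows = (height - base) // stride + 1
    cols = (width - base) // stride + 1
    return rows, cols


if __name__ == "__main__":
    for size in [(245, 245), (281, 317), (389, 461)]:
        print(size, "->", spatial_output(*size))
```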
Overfeat - Experiment - Image Pyramid - Spatial Output
6-scale image pyramid with a 1:2 factor of resolution. Resolution / effective stride = 36.

• 245 x 245 -> 1 x 1 x (C+1)
• 281 x 317 -> 2 x 3 x (C+1)
• 317 x 389 -> 3 x 5 x (C+1)
• 389 x 461 -> 5 x 7 x (C+1)
• 425 x 497 -> 6 x 7 x (C+1)
• 461 x 569 -> 7 x 10 x (C+1)

Credit: https://www.youtube.com/watch?v=9I6nzfx_kpE&list=PL1GQaVhO4f_jLxOokW7CS5kY_J1t1T17S&ab_channel=Cogneethi
Overfeat - Experiment - Object Detection - Inference
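Spatial output

1x1 + 2x3 + 3x5 + 5x7 + 6x7 + 7x10 = 169 locations, each with C+1 class scores (resolution / effective stride = 36)

Each location corresponds to a 245 x 245 window in the input. All of these predictions are passed to NMS.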
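The count above, and the link between an output location and its 245 x 245 window, can be sketched as follows. The output map sizes are copied from the slides; the window placement (windows starting at the top-left corner and shifting by the effective stride of 36) and the name `windows_for_scale` are assumptions for illustration.

```python
def windows_for_scale(rows, cols, base=245, stride=36):
    """Yield (row, col, y1, x1, y2, x2) for each cell of a rows x cols output map.

    Assumes cell (0, 0) covers the top-left base x base window and each step
    in the output map shifts the window by `stride` pixels in the input.
    """
    for r in range(rows):
        for c in range(cols):
            y1, x1 = r * stride, c * stride
            yield r, c, y1, x1, y1 + base, x1 + base


# Output map sizes per pyramid scale, as listed on the slides.
OUTPUT_SIZES = [(1, 1), (2, 3), (3, 5), (5, 7), (6, 7), (7, 10)]

total_windows = sum(len(list(windows_for_scale(r, c))) for r, c in OUTPUT_SIZES)
```

Here `total_windows` is the number of predictions handed to NMS for one class, and the last window of the 5 x 7 map ends exactly at the 389 x 461 image border under the stated assumption.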


Overfeat - Experiment - Object Detection - Inference
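[Figure: the query image at each pyramid scale is passed through the Classification and Localization model (Modified AlexNet). The output shapes per scale are listed below.]

Classification (classify class, c+1 scores per location):

• 245 x 245 -> 1 x 1 x (c+1)
• 281 x 317 -> 2 x 3 x (c+1)
• 317 x 389 -> 3 x 5 x (c+1)
• 389 x 461 -> 5 x 7 x (c+1)
• 425 x 497 -> 6 x 7 x (c+1)
• 461 x 569 -> 7 x 10 x (c+1)

Localization (bounding box regression, 4 coordinates per class per location):

• 245 x 245 -> 1 x 1 x 4 x (c+1)
• 281 x 317 -> 2 x 3 x 4 x (c+1)
• 317 x 389 -> 3 x 5 x 4 x (c+1)
• 389 x 461 -> 5 x 7 x 4 x (c+1)
• 425 x 497 -> 6 x 7 x 4 x (c+1)
• 461 x 569 -> 7 x 10 x 4 x (c+1)

Credit: https://www.youtube.com/watch?v=9I6nzfx_kpE&list=PL1GQaVhO4f_jLxOokW7CS5kY_J1t1T17S&ab_channel=Cogneethi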
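One way to read these output shapes is sketched below: for each location, take the best class from the classification map and the matching 4 coordinates from the regression map. The array layouts follow the shapes on the slide; everything else (background stored at index 0, scores being probabilities, the 0.5 threshold, and the name `candidates_from_maps`) is an assumption for illustration rather than the paper's procedure.

```python
import numpy as np

def candidates_from_maps(scores, boxes, background=0, threshold=0.5):
    """Collect detection candidates from one pyramid scale.

    scores: (rows, cols, C+1) class scores per location.
    boxes:  (rows, cols, 4, C+1) box coordinates per class per location.
    Returns a list of (class_id, score, box) with box of shape (4,).
    """
    candidates = []
    rows, cols, _ = scores.shape
    for r in range(rows):
        for c in range(cols):
            k = int(np.argmax(scores[r, c]))
            s = float(scores[r, c, k])
            # Assumption: skip background and low-confidence locations.
            if k != background and s >= threshold:
                candidates.append((k, s, boxes[r, c, :, k].copy()))
    return candidates
```

Candidates from all six scales would then be pooled before the final suppression step.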
[Figure: the query image is passed through the model. The classification head outputs a confidence and the regression head outputs a bounding box. Input resolution = 36; box resolution = 12.]

Credit: https://arxiv.org/abs/1312.6229
Overfeat - Experiment - Object Detection - Inference

Spatial outputs -> NMS -> final prediction

Credit: https://arxiv.org/abs/1312.6229
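For reference, a generic greedy non-maximum suppression can be sketched in NumPy as below. This is the textbook procedure, not necessarily the exact way predictions are combined in the paper; the IoU threshold of 0.5 and the box format (x1, y1, x2, y2) are assumptions.

```python
import numpy as np

def iou(box, others):
    """IoU between one box (4,) and an array of boxes (N, 4), format x1, y1, x2, y2."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept by greedy NMS, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
```

In a multi-class setting this would typically be run once per class on the pooled candidates from all pyramid scales.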
Overfeat - Experiment - Object Detection - Research Paper Result

• 3rd rank in the detection task

Paper link: https://arxiv.org/abs/1312.6229


Overfeat - Object Detection - Drawbacks

• Each inference takes 2 seconds

• Computationally inefficient and expensive (2013)

• A lot of background region is processed unnecessarily because of the sliding window (dense sampling) approach

• The sliding window approach creates a lot of FPs and therefore lowers mAP

• Can we find only those regions where background is not present?

• Can we get more accurate predictions and increase mAP?

• RPNN END 
Credit: https://www.pyimagesearch.com/2020/06/29/opencv-selective-search-for-object-detection/
