Object Recognition: Juan Carlos Niebles and Ranjay Krishna, Stanford Vision and Learning Lab

This document summarizes a lecture on object recognition. It introduces common visual recognition tasks such as classification, image search, and detection; discusses challenges in object recognition such as viewpoint and illumination variation, scale changes, deformation, occlusion, background clutter, and intra-class variation; and then presents the machine learning framework for recognition problems and the k-nearest neighbor (k-NN) algorithm for classification, which assigns a test example the label voted by its k nearest training examples under a chosen distance measure.


Lecture:

Object Recognition

Juan Carlos Niebles and Ranjay Krishna


Stanford Vision and Learning Lab

Stanford University, Lecture 14, 31-Oct-17


-
What will we learn today?
• Introduction to object recognition
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline



-
What are the different visual recognition tasks?



-
Classification:
Does this image contain a building? [yes/no]

Yes!



-
Classification:
Is this a beach?



-
Image search



-
Image Search

Organizing photo collections



-
Detection:
Does this image contain a car? [where?]

car



-
Detection:
Which object does this image contain? [where?]

Building

clock

person
car



-
Detection:
Accurate localization (segmentation)

clock



-
Detection: Estimating object semantic &
geometric attributes

Object: Building, 45º pose, 8-10 meters away. It has bricks.
Object: Person, back; 1-2 meters away.
Object: Police car, side view, 4-5 m away.


-
Categorization vs Single instance
recognition
Does this image contain the Chicago Macy’s building?



-
Categorization vs Single instance
recognition
Where is the Crunchy Nut?



-
Applications of computer vision
• Recognizing landmarks on mobile platforms

+ GPS



-
Activity or Event recognition
What are these people doing?



-
Visual Recognition
• Design algorithms that can:
– Classify images or videos
– Detect and localize objects
– Estimate semantic and geometric attributes
– Classify human activities and events

Why is this challenging?


-
How many object categories are there?



-
Challenges: viewpoint variation

Michelangelo 1475-1564



-
Challenges: illumination

image credit: J. Koenderink



-
Challenges: scale



-
Challenges: deformation



-
Challenges:
occlusion

Magritte, 1957



-
Art segue - Magritte



-
Challenges: background clutter

Kilmeny Niland. 1995



-
Challenges: intra-class variation



-
What will we learn today?
• Introduction
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline



-
The machine learning framework

y = f(x)
where y is the output, f is the prediction function, and x is the image feature.

• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)},
estimate the prediction function f by minimizing the prediction error on the
training set
• Testing: apply f to a never-before-seen test example x and output the
predicted value y = f(x)
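A minimal sketch of this train/test loop, assuming images arrive as NumPy arrays; the feature extractor is a placeholder and a nearest-centroid rule stands in for the learned prediction function f (the names extract_features, train, and predict are illustrative, not from the lecture):

import numpy as np

def extract_features(image):
    # Placeholder feature: a flattened, [0, 1]-scaled pixel vector.
    return image.reshape(-1).astype(np.float64) / 255.0

def train(features, labels):
    # "Training" here just stores one mean feature vector (centroid) per class;
    # any learning algorithm could be swapped in for f.
    classes = np.unique(labels)
    return {c: features[labels == c].mean(axis=0) for c in classes}

def predict(centroids, x):
    # f(x): the label of the closest class centroid.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))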



Slide credit: L. Lazebnik
-
Classification
• Assign input vector to one of two or more classes
• Any decision rule divides input space into
decision regions separated by decision
boundaries

Slide credit: L. Lazebnik



-
Nearest Neighbor Classifier
• Assign label of nearest training data point
to each test data point

[Diagram: for a test image, compute the distance to every training image and choose the k nearest records.]

Source: N. Goyal
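A minimal sketch of the diagram above, assuming equal-sized NumPy image arrays and plain Euclidean distance on raw pixels (the helper name nearest_neighbor_label is illustrative):

import numpy as np

def nearest_neighbor_label(test_image, train_images, train_labels):
    # Compute the distance from the test image to every training image
    # and return the label of the single closest one (1-NN).
    test = test_image.reshape(-1).astype(np.float64)
    dists = [np.linalg.norm(test - img.reshape(-1).astype(np.float64))
             for img in train_images]
    return train_labels[int(np.argmin(dists))]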



-
Nearest Neighbor Classifier
• Assign label of nearest training data point
to each test data point

[Figure from Duda et al.: partitioning of the feature space for two-category 2D and 3D data. Source: D. Lowe]



-
K-nearest neighbor
Distance measure - Euclidean:

\mathrm{Dist}(X^n, X^m) = \sqrt{\sum_{i=1}^{D} \left(X_i^n - X_i^m\right)^2}

where X^n and X^m are the n-th and m-th data points and D is the feature dimension.

[Figure: 2D feature space with axes x1 and x2, training points from two classes (x and o), and a query point (+).]
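A minimal sketch of this distance and of a k-NN vote over it, assuming X_train is an N×D NumPy array and y_train a length-N label array (knn_predict is an illustrative name, not from the lecture):

import numpy as np

def euclidean_dist(x_n, x_m):
    # Dist(X^n, X^m) = sqrt( sum_{i=1..D} (X_i^n - X_i^m)^2 )
    return np.sqrt(np.sum((x_n - x_m) ** 2))

def knn_predict(X_train, y_train, x_query, k=3):
    # Distances from the query point to every training point.
    dists = np.array([euclidean_dist(x, x_query) for x in X_train])
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels.
    labels, counts = np.unique(np.asarray(y_train)[nearest], return_counts=True)
    return labels[np.argmax(counts)]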



-
1-nearest neighbor
Distance measure - Euclidean:

\mathrm{Dist}(X^n, X^m) = \sqrt{\sum_{i=1}^{D} \left(X_i^n - X_i^m\right)^2}

where X^n and X^m are the n-th and m-th data points.

[Figure: the query point (+) takes the label of its single nearest neighbor.]



-
3-nearest neighbor
Distance measure - Euclidean:

\mathrm{Dist}(X^n, X^m) = \sqrt{\sum_{i=1}^{D} \left(X_i^n - X_i^m\right)^2}

where X^n and X^m are the n-th and m-th data points.

[Figure: the 3 nearest neighbors of the query point (+) vote on its label.]



-
5-nearest neighbor
Distance measure - Euclidean:

\mathrm{Dist}(X^n, X^m) = \sqrt{\sum_{i=1}^{D} \left(X_i^n - X_i^m\right)^2}

where X^n and X^m are the n-th and m-th data points.

[Figure: the 5 nearest neighbors of the query point (+) vote on its label.]



-
K-NN: a very useful algorithm

• Simple, a good one to try first


• Very flexible decision boundaries
• With infinite examples, 1-NN provably has an error rate that is at most twice
the Bayes optimal error (out of scope for this class).



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
[Figure: decision boundaries for K=1 (left) and K=15 (right).]



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!



-
Cross validation
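A minimal K-fold cross-validation sketch for choosing k, assuming NumPy arrays X (N×D) and y (N,) and the knn_predict sketch from the earlier slide; fold handling is deliberately simple:

import numpy as np

def cross_validate_k(X, y, candidate_ks=(1, 3, 5, 15), n_folds=5):
    # Split the training indices into folds, hold each fold out in turn,
    # and keep the k with the highest average held-out accuracy.
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        fold_accs = []
        for fold in folds:
            train_idx = np.setdiff1d(np.arange(len(X)), fold)
            preds = np.array([knn_predict(X[train_idx], y[train_idx], X[i], k)
                              for i in fold])
            fold_accs.append(np.mean(preds == y[fold]))
        if np.mean(fold_accs) > best_acc:
            best_k, best_acc = k, float(np.mean(fold_accs))
    return best_k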



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)



-
Euclidean measure
111111111110 vs 011111111111   d = 1.4142
100000000000 vs 000000000001   d = 1.4142

The first pair shares ten 1s while the second pair shares none, yet both pairs
are the same Euclidean distance apart.
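A small numeric check of the example above; the unit-length normalization suggested a couple of slides later is included for comparison:

import numpy as np

a = np.array([1]*11 + [0], dtype=float)   # 111111111110
b = np.array([0] + [1]*11, dtype=float)   # 011111111111
c = np.array([1] + [0]*11, dtype=float)   # 100000000000
d = np.array([0]*11 + [1], dtype=float)   # 000000000001

print(np.linalg.norm(a - b), np.linalg.norm(c - d))   # both ~1.4142

# After normalizing each vector to unit length, the highly overlapping pair
# becomes much closer than the disjoint pair:
a, b, c, d = (v / np.linalg.norm(v) for v in (a, b, c, d))
print(np.linalg.norm(a - b), np.linalg.norm(c - d))   # ~0.43 vs ~1.41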



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)
– Solution: normalize the vectors to unit length



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)
– Solution: normalize the vectors to unit length
• Curse of Dimensionality



-
Curse of dimensionality
• Assume 5000 points uniformly distributed in the unit
hypercube and we want to apply 5-NN. Suppose our query
point is at the origin.
– In 1 dimension, we must go a distance of 5/5000 = 0.001 on average to
capture the 5 nearest neighbors.
– In 2 dimensions, we must go √0.001 ≈ 0.032 to get a square that contains
0.001 of the volume.
– In d dimensions, we must go (0.001)^(1/d); for d = 10 this is already ≈ 0.5,
so the "nearest" neighbors are no longer local.
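A quick check of the (0.001)^(1/d) figure:

# Expected edge length of the cube around the query point that contains a
# fraction 0.001 of the unit hypercube's volume (about 5 of 5000 points):
for d in (1, 2, 3, 10, 100):
    print(d, round(0.001 ** (1.0 / d), 3))   # 0.001, 0.032, 0.1, 0.501, 0.933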



-
K-NN: issues to keep in mind
• Choosing the value of k:
– If too small, sensitive to noise points
– If too large, neighborhood may include points from other
classes
– Solution: cross validate!
• Can produce counter-intuitive results (using
Euclidean measure)
– Solution: normalize the vectors to unit length
• Curse of Dimensionality
– Solution: no good one



-
Many classifiers to choose from
Which is the best one?
• K-nearest neighbor
• SVM
• Neural networks
• Naïve Bayes
• Bayesian network
• Logistic regression
• Randomized Forests
• Boosted Decision Trees
• RBMs
• Etc.
Slide credit: D. Hoiem



-
Generalization

[Figure: a training set (labels known) and a test set (labels unknown).]

• How well does a learned model generalize from the data it was trained on to a
new test set?
Slide credit: L. Lazebnik



-
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not
enough flexibility).

• Models with too many parameters are inaccurate because of a large variance
(too much sensitivity to the sample).
Slide credit: D. Hoiem



-
Bias versus variance
• Components of generalization error
– Bias: how much does the average model over all training sets differ
from the true model?
• Error due to inaccurate assumptions/simplifications made by
the model
– Variance: how much models estimated from different training
sets differ from each other
• Underfitting: model is too simple to represent all the
relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too complex and fits irrelevant
characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
Slide credit: L. Lazebnik
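A hedged sketch of how this shows up for k-NN, reusing the knn_predict sketch from the earlier slide: comparing training and test error rates as k varies makes the under-/overfitting pattern visible.

import numpy as np

def error_rates(X_train, y_train, X_test, y_test, k):
    # Training error (each training point is classified with itself in the set,
    # so k = 1 gives zero training error by construction) vs. test error.
    train_preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_train])
    test_preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_test])
    return np.mean(train_preds != y_train), np.mean(test_preds != y_test)

# Overfitting: very small k -> low training error, possibly high test error.
# Underfitting: very large k -> both training and test error are high.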



-
Bias versus variance trade off



-
No Free Lunch Theorem

In a supervised learning setting, we can't tell which classifier will have the
best generalization.
Slide credit: D. Hoiem



-
Remember…
• No classifier is inherently
better than any other: you
need to make assumptions to
generalize

• Three kinds of error


– Inherent: unavoidable
– Bias: due to over-simplifications
– Variance: due to inability to
perfectly estimate parameters
from limited data
Slide credit: D. Hoiem



-
How to reduce variance?

• Choose a simpler classifier

• Regularize the parameters

• Get more training data

How do you reduce bias?

Slide credit: D. Hoiem



-
Last remarks about applying machine
learning methods to object recognition
• There are many machine learning algorithms to choose from
• Know your data:
– How much supervision do you have?
– How many training examples can you afford?
– How noisy?
• Know your goal (i.e. task):
– Affects your choices of representation
– Affects your choices of learning algorithms
– Affects your choices of evaluation metrics
• Understand the math behind each machine learning
algorithm under consideration!



-
What will we learn today?
• Introduction
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline



-
Object recognition:
a classification framework
• Apply a prediction function to a feature representation of
the image to get the desired output:

f( [apple image] ) = apple
f( [tomato image] ) = tomato
f( [cow image] ) = cow
Dataset: ETH-80, by B. Leibe. Slide credit: L. Lazebnik



-
A simple pipeline - Training

Training Images → Image Features



-
A simple pipeline - Training
Training Images (+ Training Labels) → Image Features → Training



-
A simple pipeline - Training
Training Images (+ Training Labels) → Image Features → Training → Learned Classifier



-
A simple pipeline - Training
Training: Training Images (+ Training Labels) → Image Features → Training → Learned Classifier
Testing: Test Image → Image Features



-
A simple pipeline - Training
Training: Training Images (+ Training Labels) → Image Features → Training → Learned Classifier
Testing: Test Image → Image Features → Learned Classifier → Prediction
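A minimal end-to-end sketch of this pipeline with k-NN as the classifier, assuming images as NumPy arrays; feature_fn can be any descriptor, for example the color-histogram sketch after the next slide (train_pipeline and predict_pipeline are illustrative names):

import numpy as np

def train_pipeline(train_images, train_labels, feature_fn):
    # Training stage: Training Images (+ Labels) -> Image Features -> Learned Classifier.
    # For k-NN, the "learned classifier" is simply the stored features and labels.
    feats = np.stack([feature_fn(im) for im in train_images])
    return feats, np.asarray(train_labels)

def predict_pipeline(classifier, test_image, feature_fn, k=3):
    # Testing stage: Test Image -> Image Features -> Learned Classifier -> Prediction.
    feats, labels = classifier
    dists = np.linalg.norm(feats - feature_fn(test_image), axis=1)
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]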



-
Image features
Input image
Color: Quantize RGB values Invariance?
? Translation
? Scale
? Rotation
? Occlusion
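A minimal sketch of such a quantized-RGB color feature, assuming an H×W×3 uint8 image (color_histogram is an illustrative name, not from the lecture):

import numpy as np

def color_histogram(image, bins_per_channel=4):
    # Quantize RGB values into a joint color histogram and normalize it.
    # Histograms of this kind do not depend on where pixels are, so they are
    # robust to translation and in-plane rotation, but not to occlusion or
    # strong illumination changes.
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=bins_per_channel, range=[(0, 256)] * 3)
    return hist.ravel() / max(hist.sum(), 1.0)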



-
Image features
Input image
Color: Quantize RGB values Invariance?
? Translation
? Scale
? Rotation (in-plane)
? Occlusion



-
Image features
Input image
Color: Quantize RGB values Invariance?
? Translation
? Scale
? Rotation (in-plane)
? Occlusion

Global shape: PCA space Invariance?


? Translation
? Scale
? Rotation (in-plane)
? Occlusion



-
Image features
Input image
Color: Quantize RGB values Invariance?
? Translation
? Scale
? Rotation
? Occlusion

Global shape: PCA space Invariance?


? Translation
? Scale
? Rotation
? Occlusion

Local shape: shape context Invariance?


? Translation
? Scale
? Rotation (in-plane)
? Occlusion



-
Image features
Input image
Color: Quantize RGB values Invariance?
? Translation
? Scale
? Rotation
? Occlusion

Global shape: PCA space Invariance?


? Translation
? Scale
? Rotation
? Occlusion

Local shape: shape context Invariance?
? Translation
? Scale
? Rotation (in-plane)
? Occlusion

Texture: Filter banks Invariance?
? Translation
? Scale
? Rotation (in-plane)
? Occlusion



-
Classifiers: Nearest neighbor

[Figure: training examples from class 1 and training examples from class 2 in feature space.]

Slide credit: L. Lazebnik



-
Classifiers: Nearest neighbor

[Figure: a test example shown among training examples from class 1 and class 2.]

Slide credit: L. Lazebnik



-
Results

Dataset: ETH-80, by B. Leibe, 2003



-
What have we learned today?
• Introduction
• K-nearest neighbor algorithm
• A simple Object Recognition pipeline


