SIFT - The Scale Invariant Feature Transform
[Figure: each region is summarized by a feature vector, e.g. <0 12 31 0 0 23>, <5 0 0 11 37 15>, <14 21 10 0 3 22>]
Ideal Interest Points/Regions
Lots of them
Repeatable
Representative orientation/scale
Fast to extract and match
SIFT Overview
Detector
1. Find Scale-Space Extrema
2. Keypoint Localization & Filtering
Improve keypoints and throw out bad ones
3. Orientation Assignment
Remove effects of rotation and scale
4. Create descriptor
Using histograms of orientations
Descriptor
Scale Space
Need to find the characteristic scale of each feature
Scale space: a continuous function of scale
The only reasonable kernel is the Gaussian:
$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$
Mikolajczyk 2002
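A minimal sketch of computing one level $L(x, y, \sigma)$ of the scale space on a plain list-of-lists image. Assumptions not in the slides: the kernel is truncated at $3\sigma$, borders are handled by replicating edge pixels, and the names `gaussian_kernel`, `convolve_rows` and `gaussian_blur` are hypothetical. Because the 2D Gaussian is separable, it blurs the rows and then the columns.

```python
import math

def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def convolve_rows(img, k):
    """Filter every row with the symmetric kernel k, replicating edge pixels."""
    r = len(k) // 2
    w = len(img[0])
    return [[sum(k[j] * row[min(max(x + j - r, 0), w - 1)] for j in range(len(k)))
             for x in range(w)] for row in img]

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y): blur rows, then columns."""
    k = gaussian_kernel(sigma)
    return transpose(convolve_rows(transpose(convolve_rows(img, k)), k))
```

Blurring a single bright pixel with `gaussian_blur(img, 1.6)` spreads it into a symmetric bump whose values still sum to 1, since the kernel is normalized.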
Approximate LoG
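LoG is expensive, so let's approximate it
Using the heat-diffusion equation:
$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G \approx \frac{G(k\sigma) - G(\sigma)}{k\sigma - \sigma}$
so that
$G(k\sigma) - G(\sigma) \approx (k - 1)\sigma^2 \nabla^2 G$
Define the Difference-of-Gaussians (DoG):
$D(\sigma) \equiv \big(G(k\sigma) - G(\sigma)\big) * I$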
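The approximation can be checked numerically with closed-form 2D Gaussians. This is a small sketch (the helper names are hypothetical) that evaluates the ratio of $G(k\sigma) - G(\sigma)$ to $(k - 1)\sigma^2 \nabla^2 G$ at a point. As $k \to 1$ the ratio tends to 1; at the origin it equals $(k + 1)/(2k^2)$ exactly, so for a coarser step such as $k = \sqrt{2}$ the DoG is only proportional to the scale-normalized LoG, not equal to it.

```python
import math

def G(x, y, s):
    """2D Gaussian with standard deviation s."""
    return math.exp(-(x * x + y * y) / (2 * s * s)) / (2 * math.pi * s * s)

def laplacian_G(x, y, s):
    """Closed-form Laplacian of G: G * (r^2 / s^4 - 2 / s^2)."""
    r2 = x * x + y * y
    return G(x, y, s) * (r2 / s**4 - 2 / s**2)

def dog_over_approx(x, y, s, k):
    """Ratio of G(k s) - G(s) to its approximation (k - 1) s^2 laplacian(G)."""
    dog = G(x, y, k * s) - G(x, y, s)
    approx = (k - 1) * s * s * laplacian_G(x, y, s)
    return dog / approx
```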
DoG Efficiency
The smoothed images need to be computed in any case for feature description.
We only need to subtract two images.
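DoB Filter ('Difference of Boxes')
An even faster approximation uses box filters, computed with an integral image
With A, B, C, D the integral-image values at the top-left, top-right, bottom-left and bottom-right corners of a box, the sum inside the box is:
$\Sigma = D - B - C + A$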
Integral Image Usage
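A minimal sketch of the integral image and the four-lookup box sum, using the corner naming A (top-left), B (top-right), C (bottom-left), D (bottom-right). The zero-padded layout and the function names are assumptions for illustration. The cost of `box_sum` does not depend on the box size, which is what makes box filters cheap at every scale; a box-filter response is then a difference of two such sums.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1]; an extra zero row/column avoids edge cases."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of img[y][x] for y0 <= y < y1, x0 <= x < x1, using four lookups."""
    A = ii[y0][x0]   # top-left
    B = ii[y0][x1]   # top-right
    C = ii[y1][x0]   # bottom-left
    D = ii[y1][x1]   # bottom-right
    return D - B - C + A
```

For `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `box_sum(integral_image(img), 1, 1, 3, 3)` adds the bottom-right 2x2 block, 5 + 6 + 8 + 9 = 28.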
Scale-Space Construction
First construct the scale space (σ increasing):
First octave: $G(\sigma) * I$, $G(k\sigma) * I$, $G(k^2\sigma) * I$, ...
Second octave: $G(2\sigma) * I$, $G(2k\sigma) * I$, $G(2k^2\sigma) * I$, ...
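The blur schedule can be listed explicitly. This sketch assumes values not given in the slides: $\sigma_0 = 1.6$, $s = 3$ intervals per octave (so $k = 2^{1/s}$) and $s + 3$ images per octave, as suggested in Lowe's 2004 paper. Sigmas are expressed relative to the input image; in practice each octave is also downsampled by 2.

```python
def octave_sigmas(sigma0=1.6, s=3, n_octaves=2):
    """Blur of each Gaussian image, octave by octave (relative to the input image).

    Within an octave sigma grows by k = 2**(1/s); each octave starts at twice the
    previous octave's first sigma. s + 3 images per octave is an assumption
    (Lowe 2004) so that the DoG extrema search covers s full scales."""
    k = 2 ** (1.0 / s)
    return [[sigma0 * 2 ** o * k ** i for i in range(s + 3)] for o in range(n_octaves)]
```

Because $k^s = 2$, image `s` of one octave has the same blur as image 0 of the next, which is why it can be downsampled to start the next octave.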
Difference-of-Gaussians
Now take differences of adjacent Gaussian images:
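Taking the differences is a pixel-by-pixel subtraction of neighbouring levels. A minimal sketch (the name `dog_stack` is hypothetical), where each image is a list of rows:

```python
def dog_stack(gaussians):
    """D_i = L_{i+1} - L_i for adjacent Gaussian images; n images give n - 1 DoGs."""
    return [[[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(la, lb)]
            for la, lb in zip(gaussians, gaussians[1:])]
```

For three 1x2 images `[[[1.0, 2.0]], [[3.0, 5.0]], [[4.0, 4.0]]]` this returns `[[[2.0, 3.0]], [[1.0, -1.0]]]`.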
Scale-Space Extrema
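Choose all extrema within a 3x3x3 neighborhood: each sample is compared with its 8 neighbors at the same scale and the 9 neighbors in each adjacent scale (26 in all).
Low cost: usually only a few neighbors are checked before a candidate is rejected.
[Figure: the 3x3x3 neighborhood spanning $D(\sigma)$, $D(k\sigma)$, $D(k^2\sigma)$]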
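A sketch of the 3x3x3 test with an early exit, which is where the low cost comes from: most candidates stop being both a possible maximum and a possible minimum after a few comparisons. The function name and the `dog[s][y][x]` indexing are assumptions; the caller must keep `s`, `y`, `x` away from the borders of the stack.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above (or strictly below) all 26 neighbours."""
    v = dog[s][y][x]
    is_max = is_min = True
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == dy == dx == 0:
                    continue
                n = dog[s + ds][y + dy][x + dx]
                if n >= v:
                    is_max = False
                if n <= v:
                    is_min = False
                if not (is_max or is_min):
                    return False  # early exit: cannot be an extremum any more
    return True
```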
Keypoint Localization & Filtering
Now we have far fewer points than pixels.
However, there are still lots of points (~1000s)
With only pixel accuracy at best
At higher scales, this corresponds to several pixels in the base image
And this includes many bad points
[Figure: true extrema vs. detected extrema on the sampling grid x]
Keypoint Localization
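The solution:
Take the Taylor series expansion of D around the sample point, with $\mathbf{x} = (x, y, \sigma)^T$ the offset from that point:
$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^T \mathbf{x} + \frac{1}{2}\mathbf{x}^T \frac{\partial^2 D}{\partial \mathbf{x}^2} \mathbf{x}$
Setting the derivative to zero gives the location of the extremum:
$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$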
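A minimal sketch of computing $\hat{\mathbf{x}}$ from the DoG stack, assuming central finite differences for the gradient and Hessian and a 3x3 solve by Cramer's rule. The names are hypothetical. Lowe's paper additionally moves to the neighbouring sample when any component of the offset exceeds 0.5; that step is left to the caller here.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve A x = b for a 3x3 system with Cramer's rule; None if singular."""
    d = det3(A)
    if abs(d) < 1e-12:
        return None
    out = []
    for i in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = b[r]
        out.append(det3(Ai) / d)
    return out

def refine_offset(D, s, y, x):
    """Offset x_hat = -H^{-1} g in (x, y, sigma) order, from differences of D[s][y][x]."""
    c = D[s][y][x]
    gx = (D[s][y][x + 1] - D[s][y][x - 1]) / 2
    gy = (D[s][y + 1][x] - D[s][y - 1][x]) / 2
    gs = (D[s + 1][y][x] - D[s - 1][y][x]) / 2
    hxx = D[s][y][x + 1] - 2 * c + D[s][y][x - 1]
    hyy = D[s][y + 1][x] - 2 * c + D[s][y - 1][x]
    hss = D[s + 1][y][x] - 2 * c + D[s - 1][y][x]
    hxy = (D[s][y + 1][x + 1] - D[s][y + 1][x - 1] - D[s][y - 1][x + 1] + D[s][y - 1][x - 1]) / 4
    hxs = (D[s + 1][y][x + 1] - D[s + 1][y][x - 1] - D[s - 1][y][x + 1] + D[s - 1][y][x - 1]) / 4
    hys = (D[s + 1][y + 1][x] - D[s + 1][y - 1][x] - D[s - 1][y + 1][x] + D[s - 1][y - 1][x]) / 4
    H = [[hxx, hxy, hxs], [hxy, hyy, hys], [hxs, hys, hss]]
    return solve3(H, [-gx, -gy, -gs])
```

On samples of a quadratic such as $-(x - 0.3)^2 - (y + 0.2)^2 - (\sigma - 0.1)^2$ the finite differences are exact, so the returned offset is (0.3, -0.2, 0.1).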
[Figure: point detection; a point constrained in both directions vs. a point that can move along an edge]
Keypoint Filtering - Edges
To check whether the ratio of principal curvatures is below some threshold r, check (H is the 2x2 Hessian of D at the keypoint):
$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}$
r = 10
Only 20 floating-point operations are needed to test each keypoint
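A sketch of the edge test on a single DoG image `D[y][x]` (hypothetical name and layout). The inequality is rearranged to avoid a division, and points with $\mathrm{Det}(H) \le 0$ are rejected because curvatures of opposite sign cannot form a peak (this rejection follows Lowe's paper rather than the slide).

```python
def passes_edge_test(D, y, x, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    c = D[y][x]
    dxx = D[y][x + 1] - 2 * c + D[y][x - 1]
    dyy = D[y + 1][x] - 2 * c + D[y - 1][x]
    dxy = (D[y + 1][x + 1] - D[y + 1][x - 1] - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # opposite-sign or degenerate curvature: not a peak
    return tr * tr * r < (r + 1) ** 2 * det
```

A round blob (equal curvatures) passes, while a ridge whose curvatures differ by a factor of 20 gives $21^2 / 20 = 22.05 > 12.1$ and is rejected.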
Keypoint Filtering
Ideal Descriptors
Robust to:
Affine transformation
Lighting
Noise
Distinctive
Fast to match
Not too large
Usually L1 or L2 matching
Orientation Assignment
Now we have a set of good points
Choose a region around each point
Remove effects of scale and rotation
Orientation Assignment
Use the scale of the point to choose the correct image:
$L(x, y) = G(x, y, \sigma) * I(x, y)$
Compute the gradient magnitude and orientation using finite differences:
$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$
$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$
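The two formulas map directly to code. This sketch (hypothetical name, `L[y][x]` row-major indexing) uses `math.atan2` instead of a plain arctangent of the quotient so that the full 360° range is recovered and a zero horizontal difference does not divide by zero.

```python
import math

def gradient(L, y, x):
    """Gradient magnitude m(x, y) and orientation theta(x, y) of the blurred image L."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    m = math.hypot(dx, dy)        # sqrt(dx^2 + dy^2)
    theta = math.atan2(dy, dx)    # radians in (-pi, pi], quadrant-aware
    return m, theta
```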
Orientation Assignment
Create a gradient-orientation histogram (36 bins of 10° each)
Weighted by gradient magnitude and by a Gaussian window whose σ is 1.5 times the scale of the keypoint
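A minimal sketch of the weighted histogram. Assumptions beyond the slide: the window is truncated at $3\sigma_w$, each sample goes to a single bin (no interpolation between bins), and the function name and argument order are hypothetical.

```python
import math

def orientation_histogram(L, y0, x0, sigma, n_bins=36):
    """36-bin orientation histogram around (x0, y0); each sample is weighted by its
    gradient magnitude and a Gaussian window with sigma_w = 1.5 * sigma."""
    sigma_w = 1.5 * sigma
    radius = int(round(3 * sigma_w))   # assumption: truncate the window at 3 sigma_w
    h, w = len(L), len(L[0])
    hist = [0.0] * n_bins
    for y in range(max(1, y0 - radius), min(h - 1, y0 + radius + 1)):
        for x in range(max(1, x0 - radius), min(w - 1, x0 + radius + 1)):
            dx = L[y][x + 1] - L[y][x - 1]
            dy = L[y + 1][x] - L[y - 1][x]
            theta = math.atan2(dy, dx) % (2 * math.pi)
            weight = math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma_w ** 2))
            b = int(theta / (2 * math.pi) * n_bins) % n_bins
            hist[b] += weight * math.hypot(dx, dy)
    return hist
```

On a horizontal intensity ramp every gradient points along +x, so all the weight lands in bin 0.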
Orientation Assignment
Any peak within 80% of the highest peak is used to create a keypoint with that orientation
Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching
Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy
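A sketch of peak selection and parabolic interpolation over a circular histogram (hypothetical name). The parabola through $(-1, l)$, $(0, c)$, $(1, r)$ has its vertex at $\frac{1}{2}\frac{l - r}{l - 2c + r}$; the denominator is negative at a strict local peak, so it never vanishes. Results are in bin units (multiply by 10 for degrees with 36 bins).

```python
def orientation_peaks(hist, ratio=0.8):
    """Interpolated positions (in bins) of local peaks that reach ratio * max(hist)."""
    n = len(hist)
    top = max(hist)
    peaks = []
    for i in range(n):
        l, c, r = hist[(i - 1) % n], hist[i], hist[(i + 1) % n]
        if c > l and c > r and c >= ratio * top:
            offset = 0.5 * (l - r) / (l - 2 * c + r)  # vertex of the fitted parabola
            peaks.append((i + offset) % n)
    return peaks
```

With bins 9, 10, 11 holding 1, 3, 2, the peak shifts toward the larger neighbour, to bin 10 + 1/6 (about 101.7°).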
SIFT Descriptor
Each point so far has x, y, σ, m, θ
Now we need a descriptor for the region
Could sample intensities around the point, but this is
Sensitive to lighting changes
Sensitive to slight errors in x, y, θ
Look to biological vision
Neurons respond to gradients at a certain frequency and orientation
But the location of the gradient can shift slightly!
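Pooling gradients into coarse orientation histograms is what tolerates those small shifts. The sketch below is a simplified illustration, not the full method: the 4x4 grid of cells, 8 orientation bins (128 values) and the 0.2 clamp before renormalizing are taken from Lowe's 2004 paper as assumptions, while the Gaussian weighting and trilinear interpolation are omitted. Input samples are assumed to be already rotated into the keypoint frame, with `(u, v)` in a 16x16 patch and `theta` relative to the keypoint orientation.

```python
import math

def _normalize(vec):
    norm = math.sqrt(sum(d * d for d in vec))
    return [d / norm for d in vec] if norm > 0 else vec

def sift_like_descriptor(samples, n_cells=4, n_orient=8, clamp=0.2):
    """samples: iterable of (u, v, m, theta). Returns a 4*4*8 = 128-d vector."""
    cell = 16.0 / n_cells
    desc = [0.0] * (n_cells * n_cells * n_orient)
    for u, v, m, theta in samples:
        cx = max(0, min(int(u // cell), n_cells - 1))
        cy = max(0, min(int(v // cell), n_cells - 1))
        o = int((theta % (2 * math.pi)) / (2 * math.pi) * n_orient) % n_orient
        desc[(cy * n_cells + cx) * n_orient + o] += m   # hard binning
    desc = _normalize(desc)                 # invariance to contrast changes
    desc = [min(d, clamp) for d in desc]    # limit the influence of large gradients
    return _normalize(desc)
```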
[Figure: matching a feature from Image 1 against Image 2, showing the best match, a true 2nd-best match and a false 2nd-best match]
Fast Nearest-Neighbor Matching to Feature Database
Hypotheses are generated by approximate nearest-neighbor matching of each feature to vectors in the database
SIFT uses the best-bin-first (Beis & Lowe, 1997) modification of the k-d tree algorithm
A heap data structure is used to visit bins in order of their distance from the query point
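A compact sketch of best-bin-first search on a median-split k-d tree. Unexplored branches wait in a heap keyed by the squared distance from the query to the splitting plane (a lower bound for every point in that branch), and the search stops after `max_checks` nodes, which is what makes it approximate. It keeps the two nearest neighbours because the best/second-best comparison in the matching figure needs both; the 0.8 ratio in `ratio_test` is the value suggested in Lowe's 2004 paper and is an assumption here. All names are hypothetical.

```python
import heapq
import math

class Node:
    def __init__(self, index, axis, left, right):
        self.index, self.axis, self.left, self.right = index, axis, left, right

def build_kdtree(points, indices=None, depth=0):
    """Median-split k-d tree over a list of equal-length tuples."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build_kdtree(points, indices[:mid], depth + 1),
                build_kdtree(points, indices[mid + 1:], depth + 1))

def bbf_two_nearest(points, root, query, max_checks=200):
    """Return [(dist2, index), (dist2, index)] for the two nearest points found."""
    best = [(math.inf, -1), (math.inf, -1)]
    heap = [(0.0, 0, root)]
    counter, checks = 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best[1][0]:
            continue  # this branch cannot beat the current second-best
        while node is not None:
            checks += 1
            p = points[node.index]
            d2 = sum((a - b) ** 2 for a, b in zip(p, query))
            if d2 < best[0][0]:
                best = [(d2, node.index), best[0]]
            elif d2 < best[1][0]:
                best[1] = (d2, node.index)
            diff = query[node.axis] - p[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            if far is not None:
                heapq.heappush(heap, (diff * diff, counter, far))
                counter += 1
            node = near
    return best

def ratio_test(best, ratio=0.8):
    """Accept the nearest neighbour only if it is clearly closer than the second."""
    return best[0][0] < (ratio ** 2) * best[1][0]
```

With a very large `max_checks` nothing is cut off except provably worse branches, so the result equals a brute-force search; smaller values trade accuracy for speed.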
[Figure: example image pair with scale = 2.5 and rotation = 45°]
Mikolajczyk & Schmid 2005
A note regarding invariance/robustness
There is a trade-off between invariance and distinctiveness.
For some tasks it is better not to be invariant:
Local features and kernels for classification of texture and object categories: An in-depth study - Zhang, Marszalek, Lazebnik and Schmid. IJCV 2007.
11 color names - J. van de Weijer, C. Schmid. Applying Color Names to Image Description. ICIP 2007.
Conclusion: Local features
Much work left to be done
Efficient search and matching
Combining with global methods
Finding better features
SIFT extensions
Color
Color SIFT - G. J. Burghouts and J. M. Geusebroek. Performance evaluation of local colour invariants. Computer Vision and Image Understanding, 2009
Hue and Opponent histograms - J. van de Weijer, C. Schmid. Coloring Local Feature Extraction. ECCV 2006
11 color names - J. van de Weijer, C. Schmid. Applying Color Names to Image Description. ICIP 2007
PCA-SIFT
Only changes step 4 (creation of the descriptor)
Pre-compute an eigenspace for local gradient patches of size 41x41
2x39x39 = 3042 elements
Keep only 20 components
A more compact descriptor
In K. Mikolajczyk, C. Schmid 2005, PCA-SIFT tested inferior to the original SIFT
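The 3042 figure comes from taking horizontal and vertical gradients at the 39x39 interior pixels of a 41x41 patch. This sketch builds that vector and projects it onto a precomputed eigenspace; the `basis` (for example the top 20 eigenvectors) and optional `mean` are hypothetical inputs that would come from offline PCA training, and any extra normalization used by the original method is not shown.

```python
def pca_sift_gradients(patch):
    """Horizontal then vertical gradients of the 39x39 interior: 2 * 39 * 39 = 3042 values."""
    if len(patch) != 41 or any(len(row) != 41 for row in patch):
        raise ValueError("expected a 41x41 patch")
    inner = range(1, 40)
    gx = [patch[y][x + 1] - patch[y][x - 1] for y in inner for x in inner]
    gy = [patch[y + 1][x] - patch[y - 1][x] for y in inner for x in inner]
    return gx + gy

def project(vec, basis, mean=None):
    """Project onto precomputed eigenvectors (rows of the hypothetical `basis`)."""
    if mean is not None:
        vec = [v - m for v, m in zip(vec, mean)]
    return [sum(b * v for b, v in zip(row, vec)) for row in basis]
```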
Speed Improvements
SURF - Bay et al. 2006
Approx SIFT - Grabner et al. 2006
GPU implementation - Sudipta N. Sinha et al. 2006
GLOH (Gradient Location-Orientation Histogram)
Modifies the SIFT descriptor:
17 location bins
16 orientation bins
Analyze the 17x16 = 272-d eigenspace, keep 128 components
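The 17 location bins come from a log-polar grid. This sketch assumes the layout reported by Mikolajczyk & Schmid: radii 6, 11 and 15, an undivided central disc, and two outer rings split into 8 angular sectors (1 + 2 x 8 = 17). Treat these values and the function name as assumptions. Each location bin then holds a 16-bin orientation histogram, giving the 272-d vector that PCA reduces to 128.

```python
import math

def gloh_location_bin(dx, dy, radii=(6, 11, 15), n_sectors=8):
    """Log-polar location bin (0..16) for an offset (dx, dy) from the keypoint;
    None outside the outer radius."""
    r = math.hypot(dx, dy)
    if r >= radii[2]:
        return None
    if r < radii[0]:
        return 0                          # central disc, not split by angle
    ring = 0 if r < radii[1] else 1
    sector = int((math.atan2(dy, dx) % (2 * math.pi)) / (2 * math.pi) * n_sectors) % n_sectors
    return 1 + ring * n_sectors + sector
```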