
BSB663 Image Processing
Pinar Duygulu

Slides are adapted from Selim Aksoy
Image matching
• Image matching is a fundamental aspect of many problems in
computer vision.
• Object or scene recognition
• Solving for 3D structure from multiple images
• Stereo correspondence
• Image alignment & stitching
• Image indexing and search
• Motion tracking
• Find “interesting” pieces of the image.
• Focus attention of algorithms
• Speed up computation
Image matching applications

Object recognition: find correspondences between feature points in training and test images.
Image matching applications
Stereo correspondence and 3D reconstruction
Image matching applications

Two images of Rome from Flickr


Image matching applications

Two images of Rome from Flickr: harder case


Image matching applications

Two images from NASA Mars Rover: even harder case


Image matching applications

Two images from NASA Mars Rover: matching using local features
Image matching applications
Recognition

Texture recognition

Car detection
Advantages of local features
• Locality
• features are local, so robust to occlusion and clutter
• Distinctiveness
• can differentiate a large database of objects
• Quantity
• hundreds or thousands in a single image
• Efficiency
• real-time performance achievable
• Generality
• exploit different types of features in different situations
Local features
• What makes a good feature?
• We want uniqueness.
• Look for image regions that are unusual.
• Lead to unambiguous matches in other images.
• How to define “unusual”?
• 0D structure: not useful for matching.
• 1D structure: an edge; can be localized in 1D but subject to the aperture problem.
• 2D structure: a corner; can be localized in 2D, good for matching.
Local measures of uniqueness
• We should easily recognize the local feature by looking through a
small window.
• Shifting the window in any direction should give a large change in
intensity.

• "flat" region: no change in any direction.
• "edge": no change along the edge direction.
• "corner": significant change in all directions.
Local features and image matching
• There are three important requirements for feature points to give good correspondences for matching:
• Points corresponding to the same scene points should be detected
consistently over different views.
• They should be invariant to image scaling, rotation and to change in
illumination and 3D camera viewpoint.
• There should be enough information in the neighborhood of the points so
that corresponding points can be automatically matched.
• These points are also called interest points.
Overview of the approach
[Figure: interest points and their local descriptors]

1. Extraction of interest points (characteristic locations).


2. Computation of local descriptors.
3. Determining correspondences.
4. Using these correspondences for matching/recognition/etc.
Local features: detection
• We will now talk about one particular feature detection algorithm.
• Idea: find regions that are dissimilar to their neighbors.
Local features: detection
• Consider shifting the window W by (u,v): how do the pixels in W change?
• The auto-correlation function measures the self-similarity of a signal and is related to the sum-of-squared difference.
• Compare each pixel before and after the shift by summing up the squared differences (SSD).
• This defines an SSD "error" E(u,v):

E(u,v) = \sum_{(x,y) \in W} \left[ I(x+u, y+v) - I(x,y) \right]^2
Local features: detection
• Taylor Series expansion of I:

I(x+u, y+v) \approx I(x,y) + u\, I_x(x,y) + v\, I_y(x,y)

• If the motion (u,v) is assumed to be small, then the first order approximation is good.
• Plugging this into the formula on the previous slide...
Local features: detection
• Sum-of-squared differences error E(u,v):

E(u,v) \approx \sum_{(x,y) \in W} \left[ u\, I_x(x,y) + v\, I_y(x,y) \right]^2
Local features: detection
• This can be rewritten [after summing over all (x,y) in W]:

E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} H \begin{bmatrix} u \\ v \end{bmatrix},
\quad H = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

• For the example above:


• You can move the center of the green window to anywhere on the blue unit circle.
• Which directions will result in the largest and smallest E values?
Local features: detection
• We want to find (u,v) such that E(u,v) is maximized or minimized:
E(u,v) = \begin{bmatrix} u & v \end{bmatrix} H \begin{bmatrix} u \\ v \end{bmatrix}

• By definition, we can find these directions by looking at the eigenvectors of H.

• First eigenvector of H is a unit vector that maximizes E(u,v).

• Second eigenvector of H is a unit vector that minimizes E(u,v).


Quick eigenvector/eigenvalue review
• Relevant theorem:

http://fedc.wiwi.hu-berlin.de/xplore/tutorials/mvahtmlnode16.html
Quick eigenvector/eigenvalue review
• The eigenvectors of a matrix A are the vectors x that satisfy:

A x = \lambda x

• The scalar λ is the eigenvalue corresponding to x.
• The eigenvalues are found by solving:

\det(A - \lambda I) = 0

• In our case, A = H is a 2x2 matrix, so we have:

\det \begin{bmatrix} h_{11} - \lambda & h_{12} \\ h_{21} & h_{22} - \lambda \end{bmatrix} = 0

• The solution:

\lambda_{\pm} = \tfrac{1}{2} \left[ (h_{11} + h_{22}) \pm \sqrt{4\, h_{12} h_{21} + (h_{11} - h_{22})^2} \right]

• Once you know λ, you find x by solving:

(H - \lambda I)\, x = 0

Local features: detection
• This can be rewritten:

E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} H \begin{bmatrix} u \\ v \end{bmatrix} \quad \text{[sum over all (x,y)]}

• Eigenvalues and eigenvectors of H:
• Define shifts with the smallest and largest change (E value).
• x+ = direction of largest increase in E.
• λ+ = amount of increase in direction x+.
• x− = direction of smallest increase in E.
• λ− = amount of increase in direction x−.
Local features: detection
• How are +, x+, -, and x- relevant for feature detection?
• What’s our feature scoring function?
• Want E(u,v) to be large for small shifts in all directions.
• The minimum of E(u,v) should be large, over all unit vectors [u v].
• This minimum is given by the smaller eigenvalue (λ−) of H.
Local features: detection
• Here’s what you do:
• Compute the gradient at each point in the image.
• Create the H matrix from the entries in the gradient.
• Compute the eigenvalues.
• Find points with large response (λ− > threshold).
• Choose those points where λ− is a local maximum as features.
Harris detector
• To measure the corner strength:
R = det(H) − k (trace(H))^2
where
trace(H) = λ1 + λ2
det(H) = λ1 λ2
(λ1 and λ2 are the eigenvalues of H).
• R is positive for corners, negative in edge regions, and small in flat regions.
• Very similar to λ− but less expensive (no square root).
• Also called the “Harris Corner Detector” or “Harris Operator”.
• Lots of other detectors, this is one of the most popular.
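As an illustration, here is a minimal NumPy/SciPy sketch of this recipe; the Gaussian window weighting, sigma, threshold, and k = 0.05 are illustrative choices (k is commonly taken in the 0.04–0.06 range), not values prescribed by these slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(gray, sigma=1.5, k=0.05):
    """Compute R = det(H) - k * (trace(H))^2 at every pixel."""
    gray = gray.astype(float)
    Iy, Ix = np.gradient(gray)              # image gradients
    # Structure tensor entries, summed over a Gaussian-weighted window.
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    det_H = Ixx * Iyy - Ixy ** 2
    trace_H = Ixx + Iyy
    return det_H - k * trace_H ** 2

def harris_corners(gray, thresh_ratio=0.01, nms_size=7):
    """Keep points where R is above a threshold AND a local maximum."""
    R = harris_response(gray)
    mask = (R > thresh_ratio * R.max()) & (R == maximum_filter(R, size=nms_size))
    return np.argwhere(mask)                # (row, col) feature coordinates
```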
Harris detector example
Harris detector example

R values (red high, blue low)


Harris detector example

Threshold (R > value)


Harris detector example

Local maxima of R
Harris detector example

Harris features (red)


Local features: descriptors

[Figure: a local descriptor computed around each interest point]

• Describe points so that they can be compared.


• Descriptors characterize the local neighborhood of a
point.
Local features: matching
• We know how to detect good features.
• Next question: how to match them?

• Vector comparison using a distance measure can be used.
Local features: matching
• Given a feature in I1, how to find the best match in I2?
1. Define a distance function that compares two descriptors.
2. Test all the features in I2, find the one with minimum distance.

[Figure: distances from a query feature to candidate features, e.g. 50, 75, 200; the minimum-distance candidate is the match]
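A minimal sketch of this brute-force rule, assuming des1 and des2 are (N, D) and (M, D) NumPy descriptor arrays (the names are ours):

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_min_distance(des1, des2):
    """For each descriptor in image 1, return the index of its nearest
    neighbor in image 2 and the corresponding Euclidean distance."""
    dists = cdist(des1, des2)                       # (N, M) pairwise distances
    nearest = dists.argmin(axis=1)
    return nearest, dists[np.arange(len(des1)), nearest]
```

In practice this is often refined with Lowe's ratio test, which rejects a match when the nearest and second-nearest distances are too similar.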
Matching examples
Matching examples
Local features: matching
• Matches can be improved using local constraints
• neighboring points should match
• angles, length ratios should be similar
[Figure: corresponding angles and segment lengths should be approximately equal across the two images]
Summary of the approach

• Detection of interest points/regions


• Harris detector
• Blob detector based on Laplacian
• Computation of descriptors for each point
• Gray value patch, differential invariants, steerable filter, SIFT descriptor
• Similarity of descriptors
• Correlation, Mahalanobis distance, Euclidean distance
• Semi-local constraints
• Geometrical or statistical relations between neighborhood points
• Global verification
• Robust estimation of geometry between images
Local features: invariance
• Suppose you rotate the image by some angle.
• Will you still pick up the same features?
• What if you change the brightness?
• What about scale?
• We’d like to find the same features regardless of the
transformation.
• This is called transformational invariance.
• Most feature methods are designed to be invariant to
• Translation, 2D rotation, scale.
• They can usually also handle
• Limited 3D rotations.
• Limited affine transformations (some are fully affine invariant).
• Limited illumination/contrast changes.
How to achieve invariance?
Need both of the following:
1. Make sure your detector is invariant.
• Harris is invariant to translation and rotation.
• Scale is trickier.
• Common approach is to detect features at many scales using a Gaussian
pyramid (e.g., MOPS).
• More sophisticated methods find “the best scale” to represent each feature
(e.g., SIFT).
2. Design an invariant feature descriptor.
• A descriptor captures the information in a region around the detected
feature point.
• The simplest descriptor: a square window of pixels.
• Let’s look at some better approaches…
Rotation invariance for descriptors
• Find dominant orientation of the image patch.
• This is given by x+, the eigenvector of H corresponding to λ+ (the larger eigenvalue).
• Rotate the patch according to this angle.
Multi-scale Oriented Patches (MOPS)
• Take 40x40 square window around detected feature.
• Scale to 1/5 size (using prefiltering).
• Rotate to horizontal.
• Sample 8x8 square window centered at feature.
• Intensity normalize the window by subtracting the mean, dividing by the standard deviation
in the window.
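A sketch of these steps with OpenCV; the helper name and the naive border handling are our own simplifications, and a full MOPS implementation samples from the appropriate level of the Gaussian pyramid rather than the original image:

```python
import cv2
import numpy as np

def mops_patch(gray, x, y, angle_deg):
    """MOPS-style patch: rotate the feature's dominant orientation to
    horizontal, downsample a 40x40 neighborhood to 8x8, then normalize.
    Assumes the feature lies at least 20 px from the image border."""
    # Rotate the image about the feature so its orientation is horizontal.
    M = cv2.getRotationMatrix2D((x, y), angle_deg, 1.0)
    rot = cv2.warpAffine(gray, M, (gray.shape[1], gray.shape[0]))
    # Take the 40x40 window around the feature.
    patch = rot[int(y) - 20:int(y) + 20, int(x) - 20:int(x) + 20]
    # Scale to 1/5 size; INTER_AREA prefilters to avoid aliasing.
    small = cv2.resize(patch, (8, 8), interpolation=cv2.INTER_AREA).astype(float)
    # Intensity normalize: subtract the mean, divide by the std deviation.
    return (small - small.mean()) / (small.std() + 1e-8)
```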
Multi-scale Oriented Patches (MOPS)
• Extract oriented patches at multiple scales of the Gaussian pyramid.
Scale Invariant Feature Transform (SIFT)

• The SIFT operator developed by David Lowe is both a detector and a descriptor, invariant to translation, rotation, scale, and other imaging parameters.
Overall approach for SIFT
1. Scale space extrema detection
• Search over multiple scales and image locations.
2. Interest point localization
• Fit a model to determine location and scale.
• Select interest points based on a measure of stability.
3. Orientation assignment
• Compute best orientation(s) for each interest point region.
4. Interest point description
• Use local image gradients at selected scale and rotation to
describe each interest point region.
Scale space extrema detection
• Goal: Identify locations and scales that can be repeatably assigned
under different views of the same scene or object.
• Method: search for stable features across multiple scales using a
continuous function of scale.
• Prior work has shown that under a variety of assumptions, the best
function is a Gaussian function.
• The scale space of an image is a function L(x,y,σ) that is produced
from the convolution of a Gaussian kernel (at different scales) with
the input image.
Scale space interest points
• Laplacian of Gaussian (LoG) kernel

• Scale space detection: find local maxima across scale-space.
• The difference of Gaussian (DoG) kernel is a close approximation to the scale-normalized LoG.
Lowe’s pyramid scheme

For each octave of scale space, the initial image is repeatedly convolved with Gaussians to produce the set of scale space images (left). Adjacent Gaussian images are subtracted to produce difference of Gaussian images (right). After each octave, the Gaussian image is downsampled by a factor of 2.
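A sketch of this scheme in Python with OpenCV; s = 3 intervals per octave and an initial sigma of 1.6 follow Lowe's published defaults, but the function name and exact parameters here are assumptions:

```python
import cv2
import numpy as np

def dog_pyramid(gray, n_octaves=4, s=3, sigma0=1.6):
    """Per octave: blur with geometrically increasing sigma, subtract
    adjacent blurred images, then downsample by 2 for the next octave."""
    k = 2 ** (1.0 / s)                              # scale step between images
    octaves = []
    img = gray.astype(np.float32)
    for _ in range(n_octaves):
        blurred = [cv2.GaussianBlur(img, (0, 0), sigma0 * k ** i)
                   for i in range(s + 3)]           # s+3 Gaussian images
        dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
        octaves.append(np.stack(dogs))              # (s+2, H, W) DoG stack
        img = blurred[s][::2, ::2]                  # sigma doubled: downsample
    return octaves
```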
Interest point localization
• Detect maxima and minima of the difference of Gaussian in scale space.
• Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below.
• Select it only if it is greater or smaller than all of the others.
• For each max or min found, the output is the location and the scale.
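A minimal sketch of this 26-neighbor comparison, assuming dog is one octave's (S, H, W) DoG stack as built above:

```python
import numpy as np

def is_extremum(dog, i, y, x):
    """Check sample (i, y, x) against its 26 neighbors: 8 in the same
    scale plus 9 in each of the scales above and below. Ties pass here;
    a stricter test would require strictly greater/smaller."""
    cube = dog[i - 1:i + 2, y - 1:y + 2, x - 1:x + 2]   # 3x3x3 neighborhood
    v = dog[i, y, x]
    return v == cube.max() or v == cube.min()
```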
Orientation assignment
• Create a histogram of local gradient directions computed at the selected scale.
• Assign the canonical orientation at the peak of the smoothed histogram.
• Each key specifies stable 2D coordinates (x, y, scale, orientation).

[Figure: gradient orientation histogram over 0 to 2π]
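A minimal sketch of the histogram step; Lowe additionally applies Gaussian weighting, histogram smoothing, and keeps secondary peaks within 80% of the maximum, all of which this sketch omits:

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """36-bin histogram of gradient directions over the patch, weighted
    by gradient magnitude; the canonical orientation is the peak bin."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)          # directions in [0, 2π)
    hist, edges = np.histogram(ang, bins=n_bins,
                               range=(0, 2 * np.pi), weights=mag)
    peak = hist.argmax()
    return (edges[peak] + edges[peak + 1]) / 2      # center of the peak bin
```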
Interest point descriptors
• At this point, each interest point has
• location,
• scale,
• orientation.
• Next step is to compute a descriptor for the local image region about
each interest point that is
• highly distinctive,
• as invariant as possible to variations such as changes in viewpoint and illumination.
Lowe’s interest point descriptor
• Use the normalized circular region about the interest point.
• Rotate the window to standard orientation.
• Scale the window size based on the scale at which the point was found.
• Compute gradient magnitude and orientation at each point in the region.
• Weight them by a Gaussian window overlaid on the circle.
• Create an orientation histogram over each of the 4x4 subregions of the window.
• In practice, a 4x4 array of histograms over a 16x16 sample array was used; 4x4 histograms times 8 directions gives a vector of 128 values.
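In practice one rarely implements this by hand; OpenCV (4.4+) ships SIFT as cv2.SIFT_create. A minimal usage sketch (the image filename is hypothetical):

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each keypoint carries (x, y), scale, and orientation; each descriptor
# is the 128-dimensional vector described above.
print(len(keypoints), descriptors.shape)              # N and (N, 128)
```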
Lowe’s interest point descriptor

[Figure: an input image (left) with the computed descriptors overlaid (right)]

Adapted from www.vlfeat.org


Example applications
• Object and scene recognition
• Stereo correspondence
• 3D reconstruction
• Image alignment & stitching
• Image indexing and search
• Motion tracking
• Robot navigation
Examples: 3D recognition
Examples: 3D reconstruction
Examples: location recognition
Examples: robot localization
Examples: robot localization
Map continuously built over time
Examples: panoramas
• Recognize overlap from an unordered set of images and
automatically stitch together.
• SIFT features provide initial feature matching.
• Image blending at multiple scales hides the seams.

Panorama of Lowe’s lab automatically assembled from 143 images


Examples: panoramas
Image registration and blending
Examples: panoramas
Sony Aibo
• SIFT usage:
• Recognize charging station
• Communicate with visual cards
• Teach object recognition
Photo tourism: exploring photo collections

• Joint work by University of Washington and Microsoft Research


• http://phototour.cs.washington.edu/
• http://research.microsoft.com/IVM/PhotoTours/
• Photosynth Technology Preview at Microsoft Live Labs
• http://photosynth.net/

• Don't forget to check the cool video and demo at http://phototour.cs.washington.edu/.
Photo tourism: exploring photo collections

• Detect features using SIFT.


Photo tourism: exploring photo collections

• Match features between each pair of images.


Photo tourism: exploring photo collections

• Link up pairwise matches to form connected components of matches across several images.

[Figure: matches linked across Image 1 through Image 4]


Photo tourism: exploring photo collections
Photo tourism: exploring photo collections

Photos are automatically placed inside a sketchy 3D model of the scene; an optional overhead map also shows each photo's location.
Photo tourism: exploring photo collections

An info pane on the left shows information about the current image and navigation buttons
for moving around the collection; the filmstrip view on the bottom shows related images;
mousing over these images brings them up as a registered overlay.
Photo tourism: exploring photo collections

Photographs can also be taken in outdoor natural environments. The photos are correctly placed in 3-D, and more free-form geometric models can be used for inter-image transitions.
Photo tourism: exploring photo collections

Annotations entered in one image (upper left) are automatically transferred to all other related images.
Scene summarization for online collections

• http://grail.cs.washington.edu/projects/canonview

[Figure: scene summary browsing (left) and enhanced 3D browsing (right)]
