[EIE529] Feature Extraction

The document discusses feature extraction in digital image processing, focusing on shape and region descriptors, including techniques like the Scale-Invariant Feature Transform (SIFT). It explains the importance of feature detection and description, the use of chain codes for boundary representation, and Fourier descriptors for shape analysis. Additionally, it covers region descriptors based on area, perimeter, and texture to enhance object characterization in images.


Feature Extraction

Dr Yuyi MAO
Department of Electronic and Information Engineering
The Hong Kong Polytechnic University
EIE529 Digital Image Processing (Fall 2021)
Topics to be Discussed

• Shape descriptors
• Region descriptors
• Scale-invariant feature transform (SIFT)

EIE529 (Fall 2021), The Hong Kong Polytechnic University 2


Introduction
• Image feature - Distinctive attribute we want to label or differentiate
• Feature extraction includes feature detection and feature description
▪ Feature detection - e.g. find the positions of object corners
▪ Feature description - e.g. find the size, sharpness, orientation of the corners
• Images may need to be pre-processed to facilitate feature extraction
▪ Histogram equalization, restoration, segmentation, etc.
• A feature descriptor is invariant with respect to a set of transformations if its value remains unchanged after the
application of any transformation from the family
▪ E.g. Area of an object is an invariant descriptor to {translation, reflection, rotation}
• A feature descriptor is covariant with respect to a set of transformations if its value changes proportionally to the
degree of transformation from the family
▪ E.g. Area of an object is a covariant descriptor to {scaling}
• A good feature descriptor should be invariant/covariant to a large set of common transformations of the image



Introduction

• During the image segmentation process, pixels of the objects of interest are grouped into separate regions
• It is standard practice to use schemes that compact the segmented data into representations that facilitate the computation of descriptors
• Image representation - representing and describing the resulting aggregates of segmented pixels in a form suitable for further computer processing, after segmenting an image into regions
• Two choices for representing a region
▪ External characteristics: Shape characteristics represented by the boundary
▪ Internal characteristics: Reflectivity properties such as the color and texture of the
pixels comprising the region



Shape Descriptors



Boundary Representation - (Freeman) Chain Code

• A boundary can be represented by a connected sequence of straight line segments of specified length and direction
• The direction of each segment is coded by using a numbering scheme

Directions for (a) 4-directional chain code and (b) 8-directional chain code.
Chain Codes - Example

{007665424222} {00303332212111}



Chain Codes - Code Generation Scheme
4-directional chain code

Grid cell center


8-directional chain code



Problems
The smaller the grid size, the longer the chain code, and the more sensitive the code is to noise.

{0000 77 66666 5 444 22 44 222222} {0000 0606 66666 64 44 3 2 44 222222}



Boundary Resampling
• A frequently used method to solve the
problem is to resample the boundary by
selecting a larger grid spacing
• The accuracy of the resulting code
representation depends on the spacing
of the sampling grid
• Tradeoff: a larger grid is less sensitive to
noise but gives lower resolution

a) Digital boundary with resampling grid;
b) Result of resampling;
c) 4-directional chain code;
d) 8-directional chain code.



Normalization

• A chain code should be normalized so that it is invariant to the size and orientation of the object, and to the starting point of the code
• Normalization for size - adjust the size of the resampling grid
• Normalization for the starting point
▪ The chain code depends on the starting point
▪ It can be normalized by treating the code as a circular sequence and redefining the starting point so that the resulting sequence of numbers forms an integer of minimum value

Example 1: {766665533212120} → {076666553321212}

Example 2: {553321212076666} → {076666553321212}



Normalization for Rotation

• Use the first difference of the chain code instead of the code itself
• The first difference is obtained by counting (counter-clockwise) the number of directions separating each pair of adjacent elements of the code

Original object: chain code {007665424222}; 1st difference {607707762600}; min. integer {006077077626}
Rotated object: chain code {007646444221}; 1st difference {707762600607}; min. integer {006077077626}
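The normalization steps above can be sketched in a few lines. This is an illustrative sketch (the function names are ours, not from the lecture), assuming the chain code is given as a list of digits; the rotation-normalized first difference is exactly the "shape number" introduced on the next slide:

```python
def first_difference(code, directions=8):
    """First difference of a circular chain code: the number of
    counter-clockwise direction steps between adjacent elements
    (the first element is differenced against the last)."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

def min_circular(seq):
    """The circular rotation of seq that forms the smallest integer."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))
```

Running this on the 8-directional example above reproduces the slide's {607707762600} and {006077077626}; with `directions=4` it also reproduces the order-6 shape-number example.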
Example: the image is smoothed by a 9x9 box filter and thresholded using Otsu's method; the outer boundary is extracted, the sample points are joined, and the boundary is resampled.
8-directional Freeman chain code: 00006066666666444444242222202202


First difference: 60006260000000600000626000062062
Minimum number: 00000006000006260000620626000626
Boundary Descriptor - Shape Number

• The shape number of a boundary is defined as the first difference, of smallest magnitude, of its chain code
• The order 𝑛 of a shape number is the number of digits in its representation
• The following figures show all shapes of order 4 and 6 in a 4-directional chain
code:
Order 4: chain code 0321; difference 3333; shape no. 3333
Order 6: chain code 003221; difference 303303; shape no. 033033
Boundary Descriptor - Fourier Descriptor
• For the shape number method, alignment needs to be done first to adjust the size of the
object
• Also, if the object is not rotated close to the directions of the chain code scheme, further
alignment of the object is required
• The 1-D Fourier transform of the boundary pixels can also be a good boundary descriptor
of an object without the abovementioned problems
• Coordinate pairs of points encountered in traversing an 𝑁-point boundary in the xy plane
are recorded as a sequence of complex numbers
• Example: (1,2), (2,3), (2,4), ⋯, (x,y), ⋯ ⇒ 1+2i, 2+3i, 2+4i, ⋯, x+yi, ⋯
• An 𝑁-point DFT is performed to the sequence and the complex coefficients obtained are
called the Fourier descriptors of the boundary



Example
Object
Contour = { 1, 2-j, 1-2j, -2j,
-1-2j, -2-j, -1, -2+j,
-1+2j, 2j, 1+2j, 2+j}



Example - Object is Rotated by 90°
Rotated Object
Contour = { 2, 2-j, 1-2j, -j,
-1-2j, -2-j, -2, -2+j,
-1+2j, j, 1+2j, 2+j}

Only the phase has changed, not the magnitude



Example - Object is Shifted by 2 Units
Translated Object
Contour = { 3, 4-j, 3-2j, 2-2j,
1-2j, -j, 1, j,
1+2j, 2+2j, 3+2j, 4+j}
={ 1, 2-j, 1-2j, -2j,
-1-2j, -2-j, -1, -2+j,
-1+2j, 2j, 1+2j, 2+j} + 2

Only the "dc" coefficient has changed, not the rest of the magnitude. The phase is also the same



Example – Scaling by 2 Times
Scaled Object
Contour = { 2, 4-2j, 2-4j, -4j,
-2-4j, -4-2j, -2, -4+2j,
-2+4j, 4j, 2+4j, 4+2j}
={ 1, 2-j, 1-2j, -2j,
-1-2j, -2-j, -1, -2+j,
-1+2j, 2j, 1+2j, 2+j} x 2

All Fourier coefficients are scaled up, but the shape is unchanged. The phase is also the same



Example - Different Starting Point
Object (different starting point)
Contour = { 2j, 1+2j, 2+j, 1,
2-j, 1-2j, -2j, -1-2j,
-2-j, -1, -2+j, -1+2j}

Only the phase is changed, not the magnitude



Summary

• Sequence of coordinates: s(k) = x(k) + j·y(k), k = 0, 1, ⋯, K−1

• Fourier descriptors of the boundary:

a(u) = Σ_{k=0}^{K−1} s(k) e^{−j2πuk/K}, u = 0, 1, ⋯, K−1

• The inverse Fourier transform of {a(u)} restores the boundary {s(k)}:

s(k) = (1/K) Σ_{u=0}^{K−1} a(u) e^{j2πuk/K}, k = 0, 1, ⋯, K−1

• Suppose only the first P coefficients in {a(u)} are used:

ŝ(k) = (1/K) Σ_{u=0}^{P−1} a(u) e^{j2πuk/K}, k = 0, 1, ⋯, K−1
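These formulas map directly onto NumPy's FFT, which uses the same sign and 1/K conventions. A minimal sketch (function names are ours); note that keeping literally the first P coefficients follows the slide's formula, whereas a smoother low-pass reconstruction would keep coefficients from both ends of the spectrum, since DFT low frequencies wrap around:

```python
import numpy as np

def fourier_descriptors(boundary_xy):
    """a(u): DFT of the complex boundary sequence s(k) = x(k) + j*y(k).
    boundary_xy is an (K, 2) array of (x, y) points."""
    s = boundary_xy[:, 0] + 1j * boundary_xy[:, 1]
    return np.fft.fft(s)

def reconstruct(a, P):
    """s_hat(k) from the first P coefficients, as in the slide's formula."""
    a_trunc = np.zeros_like(a)
    a_trunc[:P] = a[:P]
    return np.fft.ifft(a_trunc)
```

With P = K the boundary is restored exactly; the "dc" coefficient a(0) is simply the sum of the boundary points.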
Summary



Summary

• In general, only the first few coefficients are of significant magnitude, and they are sufficient to describe the general shape of the boundary
• Fourier descriptors are not totally insensitive to geometrical changes such as translation, rotation and scale changes, but the changes can be related to simple transformations of the descriptors
• Similar to the shape number method, using all boundary pixels ends up with a long Fourier descriptor, which is difficult to handle
• Using only a small percentage of the Fourier descriptors can still represent the boundary well
▪ The high-frequency coefficients represent details of the boundary, which are in general not needed for feature extraction



Reconstruction of the Boundary Using a Small
Percentage of the Boundary Pixels

Boundary of a human chromosome (2868 points), and boundaries reconstructed using different percentages of Fourier descriptors (by cutting the tail).



Region Descriptors



Some Basic Region Descriptors

Area (A) and perimeter (P)
▪ Make sense only when they are normalized

Compactness = P²/A, indicates how compact the object shape is
▪ A dimensionless measure
▪ For a circle, compactness = (2πr)²/(πr²) = 4π. For a square of side r, compactness = (4r)²/r² = 16

Circularity = 4πA/P², indicates how circular the object shape is

Eccentricity = √(λ₁² − λ₂²)/λ₁ ∈ [0, 1), given λ₁ ≥ λ₂, indicates how different the major and minor axes are when the region is fitted with an ellipse
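These descriptors are direct formulas once P, A and the ellipse axes are measured; a minimal sketch (function names are ours):

```python
import math

def compactness(perimeter, area):
    """P^2 / A: dimensionless; 4*pi for a circle, 16 for a square."""
    return perimeter ** 2 / area

def circularity(perimeter, area):
    """4*pi*A / P^2: 1 for a circle, smaller for less circular shapes."""
    return 4 * math.pi * area / perimeter ** 2

def eccentricity(lam1, lam2):
    """sqrt(lam1^2 - lam2^2) / lam1 for fitted-ellipse axes, lam1 >= lam2."""
    return math.sqrt(lam1 ** 2 - lam2 ** 2) / lam1
```

Plugging in the analytic perimeter and area of a circle and a square reproduces the 4π and 16 values quoted above.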



Feature Vectors

• A single descriptor very often is not sufficient to clearly describe an object
• By using multiple descriptors and arranging them as a vector (a feature vector), more information about the object can be captured
• E.g. from the table, the star is very compact, but not circular, and its major and minor axes are similar, i.e. it is regular in all directions



Feature Space

• By plotting the vectors in a feature space, the differences between them can be easily measured by their weighted distance

• E.g., the weighted distance between objects A and B is

D(A, B) = ‖(w₁(x₁ᴬ − x₁ᴮ), w₂(x₂ᴬ − x₂ᴮ), w₃(x₃ᴬ − x₃ᴮ))‖

▪ where w₁, w₂, and w₃ are weights that normalize the three terms



Region Descriptor Based on Texture

• An important approach to region description is to quantify its texture content
• Provides measures of properties such as smoothness, coarseness, and regularity
• Two major approaches:
▪ Statistical - yield characterizations of textures as smooth, coarse, grainy, etc.
▪ Spectral - based on properties of the Fourier spectrum; detect global periodicity in an image

(Figure: examples of a regular texture and a coarse texture.)



Statistical Approach for Texture Description

• One of the simplest approaches for describing texture is to use moments of the gray-level histogram of an image or region
• Image f → histogram h_f

1st-order moment (mean): μ = Σ_{i=0}^{L−1} i·h_f(i). Measures average intensity.
2nd-order moment (variance): σ² = Σ_{i=0}^{L−1} (i − μ)²·h_f(i). Measures intensity contrast (useful for measuring smoothness).
3rd-order moment: Σ_{i=0}^{L−1} (i − μ)³·h_f(i). Measures skewness of the histogram.
4th-order moment: Σ_{i=0}^{L−1} (i − μ)⁴·h_f(i). Measures flatness of the histogram.
Entropy: −Σ_{i=0}^{L−1} h_f(i)·log₂ h_f(i). Measures variability of intensity (0 for a constant image).
Example
• A normalized measure based on the 2nd-order moment:

R(z) = 1 − 1 / (1 + σ²(z)/(L−1)²)

• Measure of uniformity:

U(z) = Σ_{i=0}^{L−1} h_f(i)²

(Figure: smooth, coarse, and regular texture samples.)
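A sketch of these histogram-based measures, assuming an unsigned image with L gray levels; the binning convention and function name are our choices:

```python
import numpy as np

def texture_stats(image, L=256):
    """Texture measures from the normalized gray-level histogram h_f."""
    h, _ = np.histogram(image, bins=L, range=(0, L))
    h = h / h.sum()                       # normalized histogram h_f(i)
    i = np.arange(L)
    mean = np.sum(i * h)                  # 1st moment: average intensity
    var = np.sum((i - mean) ** 2 * h)     # 2nd moment: contrast
    R = 1 - 1 / (1 + var / (L - 1) ** 2)  # normalized smoothness measure
    U = np.sum(h ** 2)                    # uniformity
    nz = h[h > 0]
    entropy = -np.sum(nz * np.log2(nz))   # variability (0 for constant image)
    return mean, var, R, U, entropy
```

For a constant image this gives variance 0, R = 0, uniformity 1, and entropy 0, matching the interpretations above.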



Co-occurrence Matrix

• Measures of texture computed using only histograms carry no information about the spatial relationships between pixels
• The relative positions of pixels in an image should also be considered
• This can be addressed by using the co-occurrence matrix
• Given that the condition Q (a.k.a. the position operator) is "find the relationship with the pixel immediately to the right of the current one", the co-occurrence matrix G is formed by counting, for each intensity pair (i, j), how many times a pixel of intensity j occurs immediately to the right of a pixel of intensity i; e.g., an entry of 3 in row 6, column 2 means that three times a 2 is on the right of a 6



Co-occurrence Matrix (Cont’d)

• The size of 𝐺 is equal to 𝐿 × 𝐿, where 𝐿 is the possible number of intensity levels


▪ Assume 𝐿 = 256, the size of 𝐺 becomes 256 x 256, not particularly big
▪ However, since co-occurrence matrix can be used in a sequence to form a feature vector, it is more
desirable to reduce the size of the matrix for easy handling
• One approach is to quantize the intensities into a few bands
▪ Levels 0 to 31 are assigned to 1, 32 to 63 are assigned to 2, …
▪ The original 256 x 256 matrix becomes 8 x 8 by quantizing the intensities
• Assuming the total sum of all elements in G is n, the quantity

p_ij = g_ij / n

is an estimate of the probability that a pair of intensity values (i, j) satisfies the condition Q
• These probabilities are in the range [0, 1] and their sum is 1
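The construction of G for this Q can be sketched as a brute-force loop (here intensities index G directly from 0, whereas the slide's example matrix is indexed from 1):

```python
import numpy as np

def cooccurrence(image, levels):
    """Co-occurrence matrix G for Q = 'pixel immediately to the right'.
    G[i, j] counts how often intensity j appears directly right of i."""
    G = np.zeros((levels, levels), dtype=int)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols - 1):
            G[image[r, c], image[r, c + 1]] += 1
    return G
```

Dividing G by its total sum gives the probability estimates p_ij used by the descriptors that follow.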



Example:
• A random image gives a random G
• For a sinusoidal pattern varying horizontally, the pixel next to the current one often has a similar intensity, so the most significant coefficients in G lie on the diagonal
• For a natural image, the distribution is denser, since normal images often have rich variation in intensity; it is still clustered on the diagonal, since large jumps in intensity are rare
Descriptors Used for Characterizing the Co-
occurrence Matrix

• Different measures can be applied to G to quantify its "content"



Correlation

• Measures the correlation of a pixel with its neighbour (following the condition Q) over the entire image:

Σ_{i=1}^{K} Σ_{j=1}^{K} (i − m_r)(j − m_c) p_ij / (σ_r σ_c)

where K is the size of G (K × K), and

m_r = Σ_{i=1}^{K} i Σ_{j=1}^{K} p_ij ;  m_c = Σ_{j=1}^{K} j Σ_{i=1}^{K} p_ij

σ_r² = Σ_{i=1}^{K} (i − m_r)² Σ_{j=1}^{K} p_ij ;  σ_c² = Σ_{j=1}^{K} (j − m_c)² Σ_{i=1}^{K} p_ij

• The range of values is 1 (perfectly positively correlated) to −1 (perfectly negatively correlated)



Example:
• Totally random image: the correlation between neighboring pixels is ≈ 0
• Repetitive (sinusoidal) pattern: many pixel pairs satisfy the condition Q and concentrate in a few patterns, hence high correlation
• Natural image: many pixel pairs satisfy the condition Q (hence the diagonal); there are a few more sharp intensity changes due to object edges, but the correlation is still high
Uniformity

• A measure of uniformity in the image following the condition defined by Q:

Σ_{i=1}^{K} Σ_{j=1}^{K} p_ij²

• The value is in the range 0 to 1. For a constant image, it equals 1
• Due to the squaring, large p_ij values are weighted heavily
• That is, an image in which many pixel pairs satisfy the condition Q but concentrate on a small number of patterns will have high uniformity
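Both the correlation and uniformity descriptors operate on the normalized matrix p. A sketch using 1-based indices i, j, as in the formulas above (function names are ours):

```python
import numpy as np

def glcm_correlation(p):
    """Correlation descriptor of a normalized co-occurrence matrix p (K x K)."""
    K = p.shape[0]
    i = np.arange(1, K + 1)
    m_r = np.sum(i * p.sum(axis=1))            # row mean
    m_c = np.sum(i * p.sum(axis=0))            # column mean
    s_r = np.sqrt(np.sum((i - m_r) ** 2 * p.sum(axis=1)))
    s_c = np.sqrt(np.sum((i - m_c) ** 2 * p.sum(axis=0)))
    ii, jj = np.meshgrid(i, i, indexing="ij")
    return np.sum((ii - m_r) * (jj - m_c) * p) / (s_r * s_c)

def glcm_uniformity(p):
    """Sum of squared probabilities: 1 for a constant image."""
    return np.sum(p ** 2)
```

A p concentrated on the diagonal gives correlation +1, and one concentrated on the anti-diagonal gives −1, matching the stated range.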



Example:
• Random image: the coefficients scatter in G and each p_ij is small, hence low uniformity
• Repetitive (sinusoidal) pattern: many pixel pairs satisfy the condition Q and concentrate in a few patterns, so each p_ij is large, hence high uniformity
• Natural image: many pixel pairs satisfy the condition Q (hence the diagonal), but there are many patterns, so each p_ij is not very large, hence medium uniformity
(Table annotations: the higher values are due to stronger correlation and higher regularity; the more uniform the texture, the less information it carries.)



Spectral Approaches

• Spectral techniques are based on properties of the Fourier spectrum and are used primarily to detect global periodicity in an image by identifying high-energy narrow peaks in the spectrum
• The Fourier spectrum is ideally suited for describing the directionality of periodic or almost periodic 2-D patterns in an image



Spectral Approaches (Cont’d)

• Three features of the Fourier spectrum that are useful for texture
description:
▪ Prominent peaks in the spectrum give the principal direction of the texture
patterns
▪ The location of the peaks in the frequency plane gives the fundamental spatial
period of the patterns
▪ Eliminating any periodic components via filtering leaves nonperiodic image
elements, which can then be described by statistical techniques



Spectral Approaches (Cont’d)

• The spectrum can be expressed in polar coordinates to yield a function S(r, θ)
• Two functions can then be used to describe the texture accordingly:

S(θ) = Σ_{r=0}^{R} S(r, θ)

S(r) = Σ_{θ=0}^{π} S(r, θ)
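A discrete sketch of S(r) and S(θ), assuming a centered FFT magnitude spectrum and integer-binned polar coordinates (the binning choices and function name are ours):

```python
import numpy as np

def spectral_profiles(image, num_theta=180):
    """S(r) and S(theta): sums of the centered Fourier magnitude spectrum
    over rings of radius r and over angle bins (dc term excluded)."""
    S = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    rows, cols = S.shape
    cy, cx = rows // 2, cols // 2
    y, x = np.indices(S.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    theta = (np.degrees(np.arctan2(y - cy, x - cx)) % 180).astype(int)
    R = min(cy, cx)
    S_r = np.array([S[r == k].sum() for k in range(1, R + 1)])
    S_t = np.array([S[(theta == t) & (r >= 1)].sum() for t in range(num_theta)])
    return S_r, S_t
```

For a purely periodic pattern, S(r) peaks at the ring corresponding to the pattern's spatial frequency, as in the chequered-background example below.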



Example

• The periodic bursts of energy extending quadrilaterally in two dimensions in both Fourier spectra are due to the periodic texture of the coarse background material on which the objects rest
• The other dominant components in the spectra in (c) are caused by the random orientation of the object edges in (a)
• On the other hand, the main energy in (d) not associated with the background is along the horizontal axis, corresponding to the strong vertical edges in (b)

(a) and (b) Images of random and ordered objects. (c) and (d) Corresponding Fourier spectra. All images are of size 600 × 600 pixels.
Example
𝑆(𝑟) and 𝑆(𝜃) for the Fourier spectra in (c)
• The plot of 𝑆(𝑟) for the random objects shows no strong
periodic components (i.e., there are no dominant peaks in
the spectrum besides the peak at the origin, which is the
dc component)
• Conversely, the plot of 𝑆(𝑟) for the ordered objects
shows a strong peak near r = 15 and a smaller one near
r = 25, corresponding to the periodic horizontal
repetition of the light (objects) and dark (background)
regions
• The random nature of the energy bursts in (c) is quite
apparent in the plot of 𝑆(𝜃)
• By contrast, the plot in (d) shows strong energy
components in the region near the origin and at 90° and
180°
𝑆(𝑟) and 𝑆(𝜃) for the Fourier spectra in (d)



Scale-invariant Feature Transform (SIFT)

[Reference: David G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.]
(Based in part on slides by Shalomi Eldar http://cs.haifa.ac.il/hagit/courses/seminars/visionTopics/Presentations/Lecture01_SIFT.ppt)
SIFT

Citations of the SIFT paper: > 63K (Professor D. G. Lowe)



Desired “Features” of the Features for Extraction

• Robustness ⇒ invariance to changes in illumination, scale, rotation, affine and perspective transformations
• Locality ⇒ derived from local properties of the image; important for cluttered images, and robust to occlusion and to changes in irrelevant parts of the image
• Distinctiveness ⇒ easy to match against a large database of objects
• Quantity ⇒ many features can be generated for even small objects
• Efficiency ⇒ computationally "cheap", real-time performance



SIFT Algorithm

• Input: Image 𝑛 × 𝑚
• Output: Set of descriptors of image’s features
• Descriptor: Based on spatial structure - Extrema
• Algorithm
1) Scale-space extrema detection
2) Keypoint localization
3) Orientation assignment
4) Generation of keypoint descriptors
• Performance
▪ Typical image of size 500x500 pixels produces about 2000 stable keypoints
▪ The descriptors are invariant to changes in scale, orientation, brightness and contrast of the image
▪ Near real-time performance can be achieved



Scale Space

• The first stage of the SIFT algorithm is to find image locations that are invariant to scale
change
• Achieved by searching for stable features across all possible scales, using a function of
scale known as scale space
▪ Actual scale of the objects in an image is not known
▪ Precompute all possible scales of the objects and form a scale space
• Lowering the scale of an image reduces its details; this effect needs to be simulated
▪ Achieved by smoothing the image with Gaussian kernels of different sizes
• A scale space 𝐿 𝑥, 𝑦, 𝜎 of a grayscale image, 𝑓(𝑥, 𝑦), is produced by convolving 𝑓 with
a variable-scale Gaussian kernel 𝐺 𝑥, 𝑦, 𝜎
𝐿 𝑥, 𝑦, 𝜎 = 𝐺 𝑥, 𝑦, 𝜎 ∗ 𝑓(𝑥, 𝑦)



Scale Space (Cont’d)

• For the 1st octave, f is convolved with G of different sizes (σ), controlled by powers of k
• The 2nd octave is generated by convolving f with G having a σ twice as large, and then downsampling the result by a factor of 2
• The other octaves are generated similarly



Scale Space Example

• From the scale numbers in the table, it can be seen that we do not need to do the convolution for every scale
• Scale 1 of Octave 2 can be generated by directly downsampling the image of Scale 3 of Octave 1
• The other scales can be obtained in the same way



Detecting Local Extrema

• An extremum is a maximum or minimum point in the image. Extrema can be detected by applying a Laplacian operator to the image
• To avoid unstable results due to noise, the image is usually smoothed by a Gaussian kernel before applying the Laplacian operator
• This leads to the Laplacian of Gaussian (LoG) operator (the LoG kernel):

∇²G(x, y) ∗ f(x, y)

• The difference of Gaussians (DoG) is a good approximation to the LoG, so we can detect the extrema by directly taking differences of adjacent scale levels. E.g., for scales 1 and 2,

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ f(x, y) = L(x, y, kσ) − L(x, y, σ)
Detecting Local Extrema (Cont’d)



1D Example

A 1D signal with 3 bumps of different widths.

[Reference: A. Torralba, “Image features, SIFT, homographies, RANSAC and panoramas,” Lecture notes for Advances in Computer Vision.]
1D Example (Cont’d)

(Figure: the signal blurred with Gaussians of increasing width forms the scale space; differences of adjacent blurred signals form the DoG scale space.)
1D Example (Cont’d)

• The scales of the peak responses are proportional to the bump widths (the characteristic scale of each bump): [1.71, 3.14, 5.28] ./ [5, 9, 15] ≈ [0.3429, 0.3492, 0.3524]
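The 1D experiment can be reproduced with a small sketch (the kernel radius and scale sampling are our choices): blur the signal with Gaussians of increasing σ, take differences of adjacent scales at a bump's center, and observe that the scale of the peak |DoG| response grows with bump width.

```python
import numpy as np

def gauss_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 4 sigma."""
    radius = int(4 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def dog_response(signal, center, sigmas):
    """|DoG| response at one sample point across a list of scales."""
    blurred = [np.convolve(signal, gauss_kernel(s), mode="same") for s in sigmas]
    return np.array([abs(blurred[i + 1][center] - blurred[i][center])
                     for i in range(len(sigmas) - 1)])
```

Running it on two box bumps of widths 5 and 15 shows the wider bump's response peaking at a larger σ, mirroring the slide's ratio experiment.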



Extracting Keypoints

• A coefficient is selected as a keypoint if it is larger or smaller than all 26 of its neighbours, i.e. it is an extremum

(Figure: the coefficient under consideration and its 26 neighbours in a 3×3×3 block spanning the adjacent scales.)
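The 26-neighbour test can be sketched directly on a stack of DoG images indexed as (scale, row, column); the function name and indexing convention are ours:

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly greater or strictly smaller than
    all 26 neighbours in the 3x3x3 block across adjacent DoG scales."""
    block = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    others = np.delete(block.ravel(), 13)  # drop the center element
    return bool((center > others).all() or (center < others).all())
```

Interior points only: the test needs one neighbouring scale on each side and a one-pixel border in space.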



Extracting Keypoints
Extrema detection example
• Each arrow represents a keypoint
• The magnitude and orientation of
the arrow represents the magnitude
and orientation of the keypoint
• The scale at which the keypoint is located is not shown in the image
• 233 x 189 pixels image ⇒ 832
DoG keypoints

Not all of them are good …



Problematic Keypoints – Inaccurate Location

• Inaccurate locations are obtained due to scaling and sampling



Inaccurate location - Solution

• Fit an interpolating function at each extremum point, then look for an improved extremum location
• Let the DoG at X = (x, y, σ)ᵀ be D(X). For any offset X̂ from X not on the sampling grid, D(X + X̂) can be estimated using the Taylor series expansion:

D(X + X̂) = D(X) + (∂D/∂X)ᵀ X̂ + ½ X̂ᵀ (∂/∂X)(∂D/∂X)ᵀ X̂
          = D(X) + (∇D)ᵀ X̂ + ½ X̂ᵀ H X̂      (a)

where the gradient vector and Hessian matrix are

∇D = ∂D/∂X = [∂D/∂x, ∂D/∂y, ∂D/∂σ]ᵀ

H = [ ∂²D/∂x²   ∂²D/∂x∂y  ∂²D/∂x∂σ
      ∂²D/∂y∂x  ∂²D/∂y²   ∂²D/∂y∂σ
      ∂²D/∂σ∂x  ∂²D/∂σ∂y  ∂²D/∂σ² ]
Inaccurate location - Solution (Cont’d)

• To find the X̂ that gives the true extremum, differentiate Eqn. (a) with respect to X̂ and set the result to 0:

∂D(X + X̂)/∂X̂ = ∂D(X)/∂X̂ + ∂((∇D)ᵀX̂)/∂X̂ + ∂(½ X̂ᵀHX̂)/∂X̂

• Since D, ∇D and H are evaluated at the sample point X, they do not change with X̂. Thus

∂D(X + X̂)/∂X̂ = 0 + (∇D)ᵀ + ½ (2X̂ᵀH) = (∇D)ᵀ + X̂ᵀH

• Setting the derivative to 0, we have

X̂ᵀ = −(∇D)ᵀ H⁻¹, or equivalently X̂ = −H⁻¹ ∇D (since H, and hence H⁻¹, is symmetric)

• If X̂ is greater than 0.5 in any of its three dimensions (assuming the sampling distance is normalized to 1), choose the next sample point
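The refinement step is a single linear solve. A sketch (function names are ours) that also evaluates D(X + X̂) = D(X) + ½(∇D)ᵀX̂, the interpolated value used in the low-contrast test on the next slide:

```python
import numpy as np

def refine_offset(grad, hess):
    """X_hat = -H^{-1} * grad: offset from the sample to the interpolated extremum."""
    return -np.linalg.solve(hess, grad)

def refined_value(d, grad, offset):
    """D(X + X_hat) = D(X) + 0.5 * grad^T * X_hat."""
    return d + 0.5 * grad @ offset
```

For a quadratic D with its true extremum at x0, the solve recovers exactly the offset x0 − X from any nearby sample X.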



Problematic Keypoints - Low Contrast

• If an extremum does not have a big difference from its neighbours, it may be the
result of noise
• Setting such extremum as a keypoint is highly unreliable
• For SIFT, if |D(X + X̂)| < 0.03 (pixel values in the range [0,1]), the keypoint is discarded



Problematic Keypoints - Edge Response

• Image edges can generate many extrema since they introduce sharp
changes in intensity
• However, they are not good keypoints since all extrema along an edge can
have similar magnitude and orientation
• Choose corners as keypoints

(Figure: a detected point can move along an edge while keeping similar magnitude and orientation.)
Edge Response - Solution

• Check each keypoint's "cornerness"
• High "cornerness" → no dominant principal curvature component
• Edges → strong curvature in one direction but much weaker in the orthogonal direction

(Figure: at a corner, the point is constrained in both directions.)



Edge Response - Solution (Cont’d)

• The curvature at a point of an image can be estimated from the 2×2 Hessian matrix evaluated at that point
• Thus, to estimate the local curvature of the DoG D at any level of the scale space, we compute the Hessian matrix of D at that level:

H = [ ∂²D/∂x²   ∂²D/∂x∂y ] = [ D_xx  D_xy ]
    [ ∂²D/∂y∂x  ∂²D/∂y²  ]   [ D_yx  D_yy ]

• The ratio of the largest and smallest eigenvalues of H,

r ≜ λ_max / λ_min,

is proportional to the ratio between the principal curvature and that orthogonal to it



Edge Response - Solution (Cont’d)

• Rather than directly computing the eigenvalues, a simpler method is to evaluate

Tr(H)² / Det(H) = (D_xx + D_yy)² / (D_xx·D_yy − D_xy²) = (r + 1)² / r

• The minimum of (r + 1)²/r occurs when the eigenvalues are equal (i.e. r = 1), and it increases with r
• In SIFT, keypoints with a curvature ratio r greater than 10 are eliminated
• Thus, we just need to check whether

Tr(H)² / Det(H) < (10 + 1)² / 10
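The edge test then reduces to a few arithmetic operations on the three second derivatives. A sketch (the function name is ours; following common practice, a non-positive determinant is also rejected, since the curvatures then have opposite signs):

```python
def passes_edge_test(dxx, dyy, dxy, r_max=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r_max + 1)^2 / r_max."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: reject
    return tr * tr / det < (r_max + 1) ** 2 / r_max
```

A mildly elongated response (eigenvalue ratio 5) passes, while a strongly edge-like one (ratio 100) is rejected.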



Example

Reduced from 832 to 536 keypoints after the above keypoint filtering process



Keypoint Orientation

• After the positions of the keypoints are determined, their orientations are to be
evaluated next
• Rather than using the orientation of the keypoint itself, the average of orientations
of surrounding sample points is adopted
• For a particular scale, the magnitude and orientation of every sample point are
determined as follows:
m(x, y) = √[ (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² ]

θ(x, y) = tan⁻¹[ (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) ]



Keypoint Orientation (Cont’d)

• A histogram of orientations is formed from the gradient orientations of sample points in a neighborhood of each keypoint
• The histogram has 36 bins covering the 360° range of orientations in the image plane (10° each)
• Each sample added to the histogram is weighted by its gradient magnitude and by a circular Gaussian function with a σ 1.5 times the scale of the keypoint
• The peak is the principal orientation
• Any histogram peak within 80% of the highest peak is also assigned to the keypoint (multiple assignments possible)
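A sketch of the 36-bin orientation histogram (the window radius, border handling and function name are our choices; SIFT samples the Gaussian-weighted neighbourhood at the keypoint's own scale level L):

```python
import numpy as np

def orientation_histogram(L, keypoint, radius=8, bins=36, sigma_w=1.5):
    """Gradient-orientation histogram around a keypoint; each sample is
    weighted by gradient magnitude and a circular Gaussian window."""
    ky, kx = keypoint
    hist = np.zeros(bins)
    for y in range(ky - radius, ky + radius + 1):
        for x in range(kx - radius, kx + radius + 1):
            if not (1 <= y < L.shape[0] - 1 and 1 <= x < L.shape[1] - 1):
                continue  # skip samples without central-difference neighbours
            dx = L[y, x + 1] - L[y, x - 1]
            dy = L[y + 1, x] - L[y - 1, x]
            m = np.hypot(dx, dy)
            theta = np.degrees(np.arctan2(dy, dx)) % 360
            w = np.exp(-((y - ky) ** 2 + (x - kx) ** 2) / (2 * sigma_w ** 2))
            hist[int(theta // (360 / bins)) % bins] += m * w
    return hist
```

For an image whose intensity ramps linearly along x, every gradient points at 0°, so all the histogram mass lands in bin 0, which becomes the principal orientation.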
Keypoint Orientation (Cont’d)

• For each keypoint, the relative orientations of the 16×16 neighbouring pixels are accumulated into sixteen 8-bin (45° each) histograms, weighted by their gradient magnitudes multiplied by a Gaussian function
• The orientations are made relative to the principal orientation of the keypoint
• The 16 histograms form a descriptor of 128 elements for the keypoint



Why SIFT Descriptor is Scale, Rotation, Brightness,
and Contrast Invariant?
• Scale invariant
▪ The scale space is populated with the keypoints of different scales of an image
▪ If an image is enlarged or shrunk, a keypoint descriptor can still find a match in
one of the scales of the original image
• Rotation invariant
▪ The orientation of a keypoint descriptor is made relative to the principal
orientation of the keypoint
▪ If an image is rotated, the descriptor will have no change since the principal
orientation of the keypoint is also rotated by the same amount



Why SIFT Descriptor is Scale, Rotation, Brightness,
and Contrast Invariant?
• Brightness invariant
▪ A global change of brightness refers to adding/subtracting a constant to/from every pixel of an image
▪ A keypoint descriptor is based on gradients (differences of pixels), so it does not change when a constant is added to or subtracted from the pixels
• Contrast invariant
▪ A global change of contrast refers to multiplying every pixel of an image by a constant
▪ It scales the magnitude of the keypoint descriptor, but the relative differences among the descriptor elements remain the same
▪ Hence it can be easily rectified by a normalization process



Summary of the SIFT Algorithm

1. Construct the scale space


2. Obtain the initial keypoints
3. Improve the accuracy of the location of the keypoints using the Taylor
expansion method
4. Delete unsuitable keypoints due to low contrast or the edges
5. Compute keypoint orientations and determine the principal orientation
6. Compute the 128-element keypoint descriptors, which are the signatures of the keypoints



Performance
Keypoint matching result (keypoints computed separately on each image)
• Only 3 errors in 36 matches
• Shows the uniqueness of the keypoints



Performance
Subimage rotated by 5°: only 2 errors in 10 matches, showing insensitivity to rotation. Subimage scaled down to half the original size: only 4 errors in 11 matches, still showing good performance under scaling.



Applications of SIFT - Object Recognition
• Object recognition: keypoints from a training image are matched against a query image

(Figure: training image, query image, and recognition result.)



Applications of SIFT - Image Stitching

[Reference: M. Brown and D. G. Lowe, “Recognising panoramas,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), Oct. 2003.]



Applications of SIFT - Object Tracking



Summary

• Shape descriptors
▪ Chain code
▪ Fourier descriptor
• Region descriptors
▪ Statistical approach
▪ Spectral approach
• Scale-invariant feature transform (SIFT)



References

• D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J.


Comput. Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
• M. Brown and D. G. Lowe, “Recognising panoramas,” in Proc. IEEE Int. Conf.
Comput. Vision (ICCV), Oct. 2003.

Copyright and Usage of Online Learning Materials


The learning and teaching platforms of The Hong Kong Polytechnic University (“PolyU”) are for the use of PolyU students to facilitate their learning. The student shall use the platforms and the
materials available (including teaching sessions conducted by staff of PolyU) for their personal study only. Where a student needs to download or save the materials available on the platforms for the
permitted purposes, the student shall take all necessary measures to prevent their access by other parties. The materials are copyright protected. Save for the permitted purposes, no copying, distribution,
transmission or publication of the materials in whole or in part in any form is permitted.

