[EIE529] Feature Extraction
Dr Yuyi MAO
Department of Electronic and Information Engineering
The Hong Kong Polytechnic University
EIE529 Digital Image Processing (Fall 2021)
Topics to be Discussed
• Shape descriptors
• Region descriptors
• Scale-invariant feature transform (SIFT)
• During image segmentation, the pixels of the objects of interest are clustered into separate regions
• It is standard practice to compact the segmented data into representations that facilitate the computation of descriptors
• Image representation - after segmenting an image into regions, represent and describe the resulting aggregates of segmented pixels in a form suitable for further computer processing
• Two choices for representing a region
▪ External characteristics: Shape characteristics represented by the boundary
▪ Internal characteristics: Reflectivity properties such as the color and texture of the
pixels comprising the region
Directions for (a) 4-directional chain code and (b) 8-directional chain code.
EIE529 (Fall 2021), The Hong Kong Polytechnic University 6
Chain Codes - Example
• 8-directional chain code: {007665424222}; 4-directional chain code: {00303332212111}
• Use the first difference of the chain code instead of the code itself
• The difference is obtained by counting (counter-clockwise) the number of directions that separate two adjacent elements of the code
• Original boundary: code {007665424222} → 1st difference {607707762600} → minimum-integer form {006077077626}
• Rotated boundary: code {007646444221} → 1st difference {707762600607} → minimum-integer form {006077077626}
• Both boundaries normalize to the same shape number, so the representation is rotation invariant
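The first-difference and minimum-integer normalization above can be sketched as follows (a minimal illustration, not from the slides; function names are my own):

```python
# Chain-code normalization: first difference (rotation invariance) and
# minimum-integer rotation (starting-point invariance).
def first_difference(code, directions=8):
    """Counter-clockwise first difference of a circular chain code."""
    n = len(code)
    # difference between each element and its predecessor, wrapping around
    return [(code[i] - code[i - 1]) % directions for i in range(n)]

def min_integer(code):
    """Rotation of the code that forms the smallest integer (the shape number)."""
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations)  # lexicographic min = smallest integer for digit lists

original = [0, 0, 7, 6, 6, 5, 4, 2, 4, 2, 2, 2]   # {007665424222}
rotated  = [0, 0, 7, 6, 4, 6, 4, 4, 4, 2, 2, 1]   # {007646444221}

# Both boundaries yield the same normalized shape number {006077077626}
assert min_integer(first_difference(original)) == min_integer(first_difference(rotated))
```

The assertion reproduces the example above: the rotated boundary produces a different raw code but the same normalized shape number.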
Example
(Figure: the image is smoothed by a 9×9 box filter, thresholded using Otsu's method, and the resulting boundary is resampled.)
When the boundary is scaled, all Fourier coefficients are scaled up, but the shape is unchanged; the phase also remains the same.

a(u) = Σ_{k=0}^{K−1} s(k) e^{−j2πuk/K},  u = 0, 1, ⋯, K − 1
• In general, only the first few coefficients have significant magnitude, and they are sufficient to describe the general shape of the boundary
• Fourier descriptors are not totally insensitive to geometrical changes such as
translation, rotation and scale changes, but the changes can be related to
simple transformations on the descriptors
• Similar to the shape number method, using all boundary pixels results in a long Fourier descriptor, which is difficult to handle
• Using only a small percentage of the Fourier descriptors can still represent the boundary well
▪ Since the high frequency ones represent the detail of the boundary which is in
general not needed for the purpose of feature extraction
(Figure: boundary of a human chromosome (2868 points), and boundaries reconstructed using different percentages of Fourier descriptors, obtained by cutting the high-frequency tail.)
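A minimal sketch (not from the slides) of boundary reconstruction from truncated Fourier descriptors: boundary points (x_k, y_k) are encoded as complex numbers s(k) = x_k + j·y_k, transformed, and inverted after keeping only the lowest frequencies:

```python
import numpy as np

def reconstruct(boundary, keep):
    """Reconstruct an (N, 2) boundary keeping only 'keep' Fourier descriptors."""
    s = boundary[:, 0] + 1j * boundary[:, 1]   # s(k) = x_k + j*y_k
    a = np.fft.fft(s)                          # Fourier descriptors a(u)
    mask = np.zeros_like(a)
    # keep the lowest frequencies (both ends of the DFT spectrum)
    mask[:keep // 2] = 1
    mask[-(keep // 2):] = 1
    z = np.fft.ifft(a * mask)                  # invert the truncated descriptors
    return np.stack([z.real, z.imag], axis=1)

# A circle is exactly represented by very few descriptors
t = 2 * np.pi * np.arange(64) / 64
pts = np.stack([5 + 2 * np.cos(t), 5 + 2 * np.sin(t)], axis=1)
rec = reconstruct(pts, 4)                      # keep only 4 of 64 coefficients
```

Smooth boundaries, like the circle here, are recovered almost perfectly from a small fraction of the descriptors, which is the point made above.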
Eccentricity = √(λ₁² − λ₂²) / λ₁ ∈ [0, 1), given λ₁ ≥ λ₂; it indicates how different the major and minor axes are when the region is fitted with an ellipse
• One of the simplest approaches for describing texture is to use moments of the
gray-level histogram of an image or region
Image f → histogram h_f

1st order moment (mean): μ = Σ_{i=0}^{L−1} i · h_f(i) — measure of average intensity

2nd order moment (variance): σ² = Σ_{i=0}^{L−1} (i − μ)² · h_f(i) — measure of intensity contrast (useful for measuring smoothness)

3rd order moment: Σ_{i=0}^{L−1} (i − μ)³ · h_f(i) — measure of skewness of the histogram

4th order moment: Σ_{i=0}^{L−1} (i − μ)⁴ · h_f(i) — measure of flatness of the histogram

Entropy: −Σ_{i=0}^{L−1} h_f(i) log₂ h_f(i) — measure of variability of intensity (0 for a constant image)
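The histogram moments and entropy above can be computed directly (a minimal sketch, not from the slides, assuming an 8-bit image so L = 256):

```python
import numpy as np

def texture_stats(image, L=256):
    """Histogram-based texture measures of a grayscale image."""
    hist = np.bincount(image.ravel(), minlength=L).astype(float)
    h = hist / hist.sum()                      # normalized histogram h_f(i)
    i = np.arange(L)
    mu = np.sum(i * h)                         # 1st moment: mean intensity
    var = np.sum((i - mu) ** 2 * h)            # 2nd moment: contrast
    mu3 = np.sum((i - mu) ** 3 * h)            # 3rd moment: skewness
    mu4 = np.sum((i - mu) ** 4 * h)            # 4th moment: flatness
    nz = h[h > 0]                              # skip empty bins (log2(0) undefined)
    entropy = -np.sum(nz * np.log2(nz))        # 0 for a constant image
    return mu, var, mu3, mu4, entropy
```

For a constant image every measure except the mean is zero, matching the note above that the entropy of a constant image is 0.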
Example
• A normalized measure based on the 2nd order moment:

R(z) = 1 − 1 / (1 + [σ(z)/(L−1)]²)

• Measure of uniformity:

U(z) = Σ_{i=0}^{L−1} h_f(i)²

(Figure: examples of smooth, coarse and regular textures.)
• Measure the correlation of a pixel with its neighbour (following the condition of Q) over the entire image:

Correlation = Σ_{i=1}^{K} Σ_{j=1}^{K} [(i − m_r)(j − m_c) / (σ_r σ_c)] p_ij

where

m_r = Σ_{i=1}^{K} i Σ_{j=1}^{K} p_ij ;  m_c = Σ_{j=1}^{K} j Σ_{i=1}^{K} p_ij

σ_r² = Σ_{i=1}^{K} (i − m_r)² Σ_{j=1}^{K} p_ij ;  σ_c² = Σ_{j=1}^{K} (j − m_c)² Σ_{i=1}^{K} p_ij
• The value ranges from 1 (perfectly positively correlated) to −1 (perfectly negatively correlated)
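A minimal sketch (not from the slides) of the correlation measure, using Q = "one pixel to the right" as the neighbour relation; the 0-based indices used here shift i and j by a constant, which leaves the correlation unchanged:

```python
import numpy as np

def cooccurrence_correlation(img, levels):
    """Correlation of horizontally adjacent pixel pairs via a co-occurrence matrix."""
    G = np.zeros((levels, levels))
    # count pairs (left pixel value, right pixel value)
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        G[a, b] += 1
    p = G / G.sum()                            # normalized co-occurrence p_ij
    i = np.arange(levels)
    m_r = np.sum(i * p.sum(axis=1))            # row mean m_r
    m_c = np.sum(i * p.sum(axis=0))            # column mean m_c
    s_r = np.sqrt(np.sum((i - m_r) ** 2 * p.sum(axis=1)))
    s_c = np.sqrt(np.sum((i - m_c) ** 2 * p.sum(axis=0)))
    ii, jj = np.meshgrid(i, i, indexing="ij")
    return np.sum((ii - m_r) * (jj - m_c) * p) / (s_r * s_c)

# Rows of constant intensity: every horizontal pair is identical,
# so the correlation is exactly 1
img = np.array([[0, 0, 0],
                [1, 1, 1]])
```

An image whose rows are constant gives only diagonal entries in the co-occurrence matrix, i.e. perfect positive correlation.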
• Three features of the Fourier spectrum that are useful for texture
description:
▪ Prominent peaks in the spectrum give the principal direction of the texture
patterns
▪ The location of the peaks in the frequency plane gives the fundamental spatial
period of the patterns
▪ Eliminating any periodic components via filtering leaves nonperiodic image
elements, which can then be described by statistical techniques
S(θ) = Σ_r S(r, θ)  (sum over all radii r, starting from r = 0)

S(r) = Σ_θ S(r, θ)  (sum over all angles θ, starting from θ = 0)
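A minimal sketch (not from the slides) of these 1-D spectral signatures: the centered power spectrum is binned by integer radius to get S(r) and by angle to get S(θ); the 1-degree angular binning is my own choice:

```python
import numpy as np

def spectral_signatures(img):
    """Ring sums S(r) and ray sums S(theta) of the centered power spectrum."""
    F = np.fft.fftshift(np.fft.fft2(img))
    S = np.abs(F) ** 2                         # power spectrum S(u, v)
    h, w = S.shape
    cy, cx = h // 2, w // 2                    # spectrum center after fftshift
    y, x = np.indices(S.shape)
    r = np.round(np.hypot(y - cy, x - cx)).astype(int)
    theta = np.round(np.degrees(np.arctan2(y - cy, x - cx))).astype(int) % 180
    S_r = np.bincount(r.ravel(), weights=S.ravel())              # sum over angles
    S_theta = np.bincount(theta.ravel(), weights=S.ravel(),
                          minlength=180)                         # sum over radii
    return S_r, S_theta
```

Peaks in S(θ) reveal the principal direction of the texture, and peaks in S(r) its fundamental spatial period, as described above.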
[Reference: David G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.]
(Based in part on slides by Shalomi Eldar http://cs.haifa.ac.il/hagit/courses/seminars/visionTopics/Presentations/Lecture01_SIFT.ppt)
SIFT
(Photo: Professor D. G. Lowe)
• Input: Image 𝑛 × 𝑚
• Output: Set of descriptors of image’s features
• Descriptor: Based on spatial structure - Extrema
• Algorithm
1) Scale-space extrema detection
2) Keypoint localization
3) Orientation assignment
4) Generation of keypoint descriptors
• Performance
▪ Typical image of size 500x500 pixels produces about 2000 stable keypoints
▪ The descriptors are invariant to changes in scale, orientation, brightness and contrast of the image
▪ Near real-time performance can be achieved
• The first stage of the SIFT algorithm is to find image locations that are invariant to scale
change
• Achieved by searching for stable features across all possible scales, using a function of
scale known as scale space
▪ Actual scale of the objects in an image is not known
▪ Precompute all possible scales of the objects and form a scale space
• Lowering the scale of an image often reduces its details; such an effect needs to be simulated
▪ Achieved by smoothing the image with a Gaussian kernel of different sizes
• A scale space 𝐿 𝑥, 𝑦, 𝜎 of a grayscale image, 𝑓(𝑥, 𝑦), is produced by convolving 𝑓 with
a variable-scale Gaussian kernel 𝐺 𝑥, 𝑦, 𝜎
𝐿 𝑥, 𝑦, 𝜎 = 𝐺 𝑥, 𝑦, 𝜎 ∗ 𝑓(𝑥, 𝑦)
[Reference: A. Torralba, “Image features, SIFT, homographies, RANSAC and panoramas,” Lecture notes for Advances in Computer Vision.]
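The scale-space construction above can be sketched as follows (a minimal illustration, not the official SIFT implementation; the kernel radius of 3σ is my own assumption):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D normalized Gaussian kernel with radius 3*sigma."""
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def scale_space(img, sigmas):
    """L(x, y, sigma) for each sigma, plus the difference-of-Gaussian levels."""
    levels = []
    for s in sigmas:
        g = gaussian_kernel(s)
        # the 2-D Gaussian is separable: filter rows, then columns
        tmp = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, img)
        levels.append(np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, tmp))
    dog = [b - a for a, b in zip(levels, levels[1:])]  # adjacent-level differences
    return levels, dog
```

Each DoG level is the difference of two adjacent Gaussian-smoothed images; SIFT searches these levels for local extrema across space and scale.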
1D Example (Cont’d)
(Figure: a 1-D Gaussian scale space and the corresponding difference-of-Gaussian scale space.)
Inaccurate Location - Solution
• Fit an interpolating function at each extremum point, then look for an improved extremum location
• Let the DoG at X = (x, y, σ)ᵀ be D(X). For any offset X̂ from X not on the sampling grid, D(X + X̂) can be estimated using the Taylor series expansion:

D(X + X̂) ≈ D(X) + (∂D/∂X)ᵀ X̂ + (1/2) X̂ᵀ [∂/∂X (∂D/∂X)ᵀ] X̂
          = D(X) + ∇Dᵀ X̂ + (1/2) X̂ᵀ H X̂    (a)

where

∇D = ∂D/∂X = [∂D/∂x  ∂D/∂y  ∂D/∂σ]ᵀ   (gradient vector)

H = [ ∂²D/∂x²   ∂²D/∂x∂y   ∂²D/∂x∂σ ;
      ∂²D/∂y∂x  ∂²D/∂y²    ∂²D/∂y∂σ ;
      ∂²D/∂σ∂x  ∂²D/∂σ∂y   ∂²D/∂σ²  ]   (Hessian matrix)
Inaccurate location - Solution (Cont’d)
• To find the X̂ that gives the true extremum, differentiate Eqn. (a) with respect to X̂ and set the result to 0:

∂D(X + X̂)/∂X̂ = ∂D(X)/∂X̂ + ∂(∇Dᵀ X̂)/∂X̂ + ∂[(1/2) X̂ᵀ H X̂]/∂X̂

• Since D, ∇D and H are evaluated at the sample point X, they do not change with X̂. Thus,

∂D(X + X̂)/∂X̂ = 0 + ∇Dᵀ + (1/2)(2 X̂ᵀ H)

• Setting the derivative to 0, we have

X̂ᵀ = −∇Dᵀ H⁻¹,  or  X̂ = −H⁻¹ ∇D   (since H⁻¹ is symmetric)

• If X̂ is greater than 0.5 in any of its three dimensions (assuming the sampling distance is normalized to 1), choose the next sample point
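The refinement step is a single linear solve (a minimal sketch, not from the slides; in practice ∇D and H are estimated by finite differences on the DoG grid). The interpolated value D(X + X̂) = D(X) + ½ ∇Dᵀ X̂ follows by substituting X̂ back into Eqn. (a):

```python
import numpy as np

def refine_extremum(grad, hess):
    """Subpixel offset X^ = -H^{-1} grad, and the change in D at the refined point."""
    x_hat = -np.linalg.solve(hess, grad)      # X^ = -H^{-1} ∇D
    d_shift = 0.5 * grad @ x_hat              # D(X + X^) - D(X) = 0.5 * ∇D . X^
    return x_hat, d_shift

# Sanity check on a known quadratic D(X) = -||X - c||^2 sampled at the origin:
# grad = 2c and H = -2I there, so the refined offset must be exactly c
c = np.array([0.2, -0.3, 0.1])
x_hat, d_shift = refine_extremum(2 * c, -2 * np.eye(3))
```

If any component of `x_hat` exceeds 0.5, the caller would move to the neighbouring sample point and repeat, as stated above.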
• If an extremum does not differ much from its neighbours, it may be the result of noise
• Setting such an extremum as a keypoint is highly unreliable
• For SIFT, if |D(X + X̂)| < 0.03 (pixel values in the range [0, 1]), the keypoint is discarded
• Image edges can generate many extrema since they introduce sharp
changes in intensity
• However, they are not good keypoints since all extrema along an edge can
have similar magnitude and orientation
• Choose corners as keypoints
• The curvature at a point of an image can be estimated from the 2×2 Hessian matrix evaluated at that point
• Thus, to estimate the local curvature of the DoG D at any level of the scale space, we compute the Hessian matrix of D at that level:

H = [ ∂²D/∂x²   ∂²D/∂x∂y ;  ∂²D/∂y∂x  ∂²D/∂y² ] = [ D_xx  D_xy ;  D_yx  D_yy ]

• The ratio of the largest and smallest eigenvalues of H,

r ≜ λ_max / λ_min,

is proportional to the ratio between the principal curvature and that orthogonal to it
The number of keypoints is reduced from 832 to 536 after the above keypoint filtering process
• After the positions of the keypoints are determined, their orientations are to be
evaluated next
• Rather than using the orientation of the keypoint itself, the average of orientations
of surrounding sample points is adopted
• For a particular scale, the magnitude and orientation of every sample point are
determined as follows:
m(x, y) = √( [L(x+1, y) − L(x−1, y)]² + [L(x, y+1) − L(x, y−1)]² )

θ(x, y) = tan⁻¹( [L(x, y+1) − L(x, y−1)] / [L(x+1, y) − L(x−1, y)] )
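The per-pixel magnitude and orientation above vectorize directly (a minimal sketch, not from the slides; it assumes axis 1 is x and axis 0 is y, and evaluates only interior pixels):

```python
import numpy as np

def gradient_mag_ori(L):
    """Gradient magnitude m(x, y) and orientation theta(x, y) of a smoothed level L."""
    dx = L[1:-1, 2:] - L[1:-1, :-2]           # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]           # L(x, y+1) - L(x, y-1)
    m = np.hypot(dx, dy)                      # central-difference magnitude
    theta = np.arctan2(dy, dx)                # orientation in (-pi, pi]
    return m, theta

# A horizontal intensity ramp: uniform gradient of magnitude 2, orientation 0
L = np.add.outer(np.zeros(5), np.arange(5.0))
m, theta = gradient_mag_ori(L)
```

SIFT then builds an orientation histogram of these values, weighted by magnitude, over the neighbourhood of each keypoint.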
(Figure: SIFT-based recognition - keypoints in a query image are matched against those of a training image.)
[Reference: M. Brown and D. G. Lowe, “Recognising panoramas,” in Proc. IEEE Int. Conf. Comput. Vision (ICCV), Oct. 2003.]
• Shape descriptors
▪ Chain code
▪ Fourier descriptor
• Region descriptors
▪ Statistical approach
▪ Spectral approach
• Scale-invariant feature transform (SIFT)