05-06 Feature and Object Descriptions
Computer Vision
Outline
• Feature and Object Descriptions
• Descriptors for contours
• Descriptors for areas
• Descriptors for points
• Points matching
• Model matching
Slide 2 of 103
Feature and Object Descriptions
Matching problem
How to match features on different images?
Slide 4 of 103
Description of the contours
The result of detectors and segmentation algorithms is a set of special points for which a mathematical description must be constructed.
Contour (chain) codes
Starting from the first point, the contour is traversed clockwise, with each subsequent
point being encoded with a number from 0 to 7, depending on its location.
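As a sketch, the encoding step can look as follows. The direction numbering here is one common convention (0 = step to the right, a hypothetical choice; the slides' exact clockwise numbering may differ):

```python
# Freeman chain code sketch: encode each step between successive contour
# pixels as a direction number 0-7. This particular mapping is an assumed
# convention, not necessarily the one used in the slides.
DIRS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
        (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def chain_code(points):
    """Encode a contour given as a list of (x, y) pixel coordinates."""
    return [DIRS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]
```

Note that changing the starting point or rotating the contour changes the code, which is exactly the invariance problem listed on the next slide.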
Slide 5 of 103
Description of the contours
Contour (chain) codes
Disadvantages:
• dependence on the starting point of encoding;
• do not have the property of invariance to rotation;
• instability to noise, local changes in the contour can lead to different encoding results.
Slide 6 of 103
Description of the contours
Piecewise polynomial approximation
Search for a curve passing near a given set of contour points.
The curve is divided by separate nodes into segments, while the approximating function
on each of the segments looks like:
$f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$,
Slide 7 of 103
Description of the contours
Piecewise linear approximation
For each pair of nodes, two coefficients $a_0$ and $a_1$ must be determined; the total number of coefficients to be determined is $2(n+1)$, where $n$ is the total number of nodes.
For piecewise linear approximation, an iterative algorithm for the selection of endpoints
can be used:
1. The end points of the contour A and B are connected by a straight line.
2. Distances to line AB are calculated for the remaining points.
3. The point C that has the greatest deviation from line AB is taken as an additional node.
4. The curve is replaced by two segments AC and CB.
The procedure continues until the maximum deviation of the points is less than the specified threshold. The accuracy of the piecewise linear approximation is determined by the threshold value.
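The endpoint-selection procedure above is essentially the Ramer-Douglas-Peucker algorithm; a minimal recursive sketch (assuming distinct endpoints):

```python
import math

def rdp(points, threshold):
    """Iterative endpoint fit (Ramer-Douglas-Peucker) for a list of (x, y) points."""
    def point_line_dist(p, a, b):
        # Perpendicular distance from p to the line through a and b.
        (px, py), (ax, ay), (bx, by) = p, a, b
        return abs((by - ay) * (px - ax) - (bx - ax) * (py - ay)) / math.hypot(bx - ax, by - ay)

    a, b = points[0], points[-1]
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_line_dist(points[i], a, b)
        if d > dmax:
            dmax, idx = d, i
    if dmax > threshold:
        # Split at the point C of maximum deviation and recurse on AC and CB.
        left = rdp(points[:idx + 1], threshold)
        right = rdp(points[idx:], threshold)
        return left[:-1] + right
    return [a, b]
```

Raising the threshold yields a coarser polyline, which is the accuracy trade-off the slide describes.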
Slide 8 of 103
Description of the contours
Piecewise linear approximation
Disadvantages:
• the approximating function is not smooth (the first derivatives are discontinuous at the
grid nodes);
• dependence of the approximation results on the initial experimental data.
Slide 9 of 103
Description of the contours
Spline fit
• In practice, cubic splines are often used for approximation.
• Cubic splines give a high approximation accuracy and smoothness of the function.
• If the function being approximated has strong inflections, then in some cases the cubic spline produces overshoots (outliers).
• A first-degree spline does not produce such overshoots in this situation, but with it it is difficult to ensure the required approximation accuracy.
• Significant difficulties arise in the case of approximation of functions with large
values of curvature.
• The use of both cubic and first degree splines is associated with a large number of
interpolation nodes.
Slide 10 of 103
Description of the contours
Rational splines
They combine the properties of first-degree and cubic splines, allow approximating
functions with large curvature values and with breakpoints.
A rational spline is a function $S_R(x)$ which on each segment $[x_i, x_{i+1}]$ has the form:

$S_R(x) = a_i t + b_i (1-t) + \dfrac{c_i t^3}{1 + p_i (1-t)} + \dfrac{d_i (1-t)^3}{1 + q_i t}$,

where $t = \dfrac{x - x_i}{x_{i+1} - x_i}$; $p_i$, $q_i$ – given numbers, $0 < p_i, q_i < \infty$.
The parameters 𝑝𝑖 , 𝑞𝑖 define the properties of rational splines:
1. If 𝑝𝑖 , 𝑞𝑖 are close to zero, then the rational spline becomes cubic;
2. if the parameters 𝑝𝑖 , 𝑞𝑖 are large enough, then the estimates of the spline error are
comparable to a first-degree spline.
In most cases, it is customary to assume 𝑝𝑖 = 𝑞𝑖 .
Slide 11 of 103
Description of the contours
Natural curve representation
• The natural presentation of the curve implies the absence of connection points and branches on
the contours.
• The contour is represented as a one-dimensional function of an attribute on the length of the arc.
• The arc length $l_j$ of the discrete contour at the point $P_j = (x_j, y_j)$ can be approximated as follows:

$l_j = \sum_{i=1}^{j-1} \sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2}$

• The contour is often represented as a function of curvature $K(l)$, calculated by the formula:

$K(l) = K(x(l), y(l)) = \dfrac{f'_x f''_y - f''_x f'_y}{\left(f_x'^2 + f_y'^2\right)^{3/2}}$,

where $f'_x$, $f'_y$ – the first derivatives with respect to $x$ and $y$ respectively;
$f''_x$, $f''_y$ – the second derivatives with respect to $x$ and $y$.
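The cumulative arc length of a discrete contour can be computed directly from the sum above; a small sketch:

```python
import math

def arc_length(points):
    """Cumulative arc length l_j along a discrete contour of (x, y) points."""
    l = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        l.append(l[-1] + math.hypot(x1 - x0, y1 - y0))
    return l
```

The curvature function K(l) would then be sampled against these arc-length values.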
Slide 12 of 103
Description of the contours
Natural curve representation
Disadvantages:
• lack of invariance to scale;
• rectilinear (straight) contours cannot be represented as a function of curvature;
• the need to approximate curves for accurate calculation of derivatives at a point.
Slide 13 of 103
Description of the contours
Natural curve representation
• An analogue of curvature is the amount of contour inflection at a point.
• To obtain the inflection value, no curve approximation is required, but a discrete representation of
the curve in the form of a sequence of pixel coordinates of the contour points is used.
(Figure: contour points $P_{i-k}$, $P_i$, $P_{i+k}$ used to compute the inflection at $P_i$.)
Slide 15 of 103
Description of the contours
Singular points of contours
• The number and positions of contour singular points (points of maximum inflection,
local extrema of the curvature function, end points, branch points) can be used as
characteristic features.
• First of all, try to select corner points on the contour, because endpoints and branch
points are not reliable enough and are highly susceptible to noise.
• A reliable way to identify special points is to search for the extreme values of any
attribute of the contour, for example, the extrema of the curvature function, for the
search of which it is necessary:
1. Perform piecewise polynomial approximation of the contour;
2. Construct a curvature function;
3. Find all local extrema of curvature.
Slide 16 of 103
Description of the contours
Singular points of contours
Piecewise polynomial approximation of the curve makes it possible to compute the first two directional derivatives at the contour points more accurately, and consequently the value of the curvature itself.
Slide 17 of 103
Description of areas
Description of selected areas
• Descriptors are feature vectors describing a point's neighborhood.
• Features are built on the basis of information about the intensity, color and texture of
special points.
• It is necessary to describe each point of interest with a certain set of parameters.
Slide 18 of 103
Description of areas
Typical feature set for areas (images)
1. Topological features:
• the number of disconnected components – the number of separate objects in the image;
• the number of holes – whether there are holes inside the object;
• Euler's number (Euler characteristic) – the number of objects minus the number of holes.
2. Geometric features (characterize the shape of the image):
• the area of the image 𝑆 is calculated as the number of nonzero elements of the image;
• the position of the center of gravity of the image, calculated in terms of static moments:

$x_c = \dfrac{\iint_\Omega B(x, y)\, x\, dx\, dy}{\iint_\Omega B(x, y)\, dx\, dy}, \qquad y_c = \dfrac{\iint_\Omega B(x, y)\, y\, dx\, dy}{\iint_\Omega B(x, y)\, dx\, dy}$,

where $\Omega$ – the image in the Cartesian coordinate system $(x, y)$;
$B(x, y)$ – the value of the intensity function at the point $(x, y)$.
Slide 19 of 103
Description of areas
Typical feature set for areas
• position of the center of gravity of the area of a binary image:

$x_c = \dfrac{\sum_\Omega x}{S}, \qquad y_c = \dfrac{\sum_\Omega y}{S}$

• for a grayscale image:

$x_c = \dfrac{\sum_\Omega x\, B(x, y)}{\sum_\Omega B(x, y)}, \qquad y_c = \dfrac{\sum_\Omega y\, B(x, y)}{\sum_\Omega B(x, y)}$

• the perimeter of the area is equal to the sum of the moduli of the elementary vectors of the contour connecting neighboring elements (by 8-connectivity):

$P = \sum_{k=1}^{N_1} |P_1| + \sum_{k=N_1+1}^{N} |P_2|$

where $P_1$ and $P_2$ – elementary vectors oriented along the grid and at an angle of 45°, respectively.
• the ratio of the square of the perimeter to the area of the image;
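The center-of-gravity formulas above can be sketched in a few lines; `centroid` is a hypothetical helper name, and the binary case is the special case where B contains only 0 and 1:

```python
import numpy as np

def centroid(B):
    """Center of gravity (x_c, y_c) of an image B.

    For a binary image the intensities are 0/1 and this reduces to the mean
    of the nonzero pixel coordinates; for grayscale the intensities act as weights.
    """
    ys, xs = np.mgrid[0:B.shape[0], 0:B.shape[1]]
    total = B.sum()
    return (xs * B).sum() / total, (ys * B).sum() / total
```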
Slide 20 of 103
Description of areas
Typical feature set for areas
• format feature – the ratio of the sides of the circumscribed rectangle
To calculate the value of the F (format) feature, a scattering matrix is built from the contour points of the image:

$E = \begin{pmatrix} S_{20} & S_{11} \\ S_{11} & S_{02} \end{pmatrix}, \qquad S_{pq} = \sum_{(x, y) \in D_\Omega} (x - x_c)^p (y - y_c)^q$

and the eigenvalues of the scattering matrix are considered.
• To determine the orientation, the eigenvectors of the scattering matrix are found:

$\begin{pmatrix} S_{20} - \lambda_1 & S_{11} \\ S_{11} & S_{02} - \lambda_2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0$
Slide 22 of 103
Description of areas
Typical feature set for areas
• The magnitude of the projection of the contour point of the image 𝑥, 𝑦 onto
one of the eigenvectors (for example, 𝑥1 , 𝑦1 , corresponding to the eigenvalue
𝜆1 ) is determined by the formula:
$R = \sqrt{x^2 + y^2}\, \sin\!\left(\operatorname{arctg}\dfrac{y}{x} - \operatorname{arctg}\dfrac{y_1}{x_1}\right)$

Substituting the values of the eigenvectors, we get:

$R_1 = \dfrac{y - \dfrac{\lambda_1 - S_{20}}{S_{11}}\, x}{\sqrt{x_1^2 + y_1^2}}, \qquad R_2 = \dfrac{y - \dfrac{\lambda_2 - S_{02}}{S_{11}}\, x}{\sqrt{x_2^2 + y_2^2}}$,
where 𝑅1 and 𝑅2 - are the sides of the circumscribed rectangle oriented along the eigenvectors (the
projection of the image onto the eigenvectors).
Slide 23 of 103
Description of areas
Typical feature set for areas
• the perimeter and area of the circumscribed minimum area rectangle;
• the ratio of the area of the circumscribed rectangle to the area of the image;
• the ratio of the square of the perimeter of the circumscribed rectangle to its area;
• the format of the circumscribed rectangle;
$F_1 = \dfrac{T_1}{T_2}$

where $T_1$ and $T_2$ – the sides of the circumscribed rectangle.
• the relative width and height of the image:

$P_3 = \dfrac{P}{T_1}, \qquad P_4 = \dfrac{P}{T_2}$
Slide 24 of 103
Description of areas
Typical feature set for areas
3. Moments
$m_{\alpha\beta} = \iint_\Omega B(x, y)\, x^\alpha y^\beta\, dx\, dy$

For a discrete case:

$m_{pq} = \sum_{(x, y) \in \Omega} x^p y^q\, B(x, y)$
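The discrete raw moment can be sketched directly from the sum above (`moment` is a hypothetical helper); note that $m_{00}$ is the total mass and the center of gravity from the earlier slide is $(m_{10}/m_{00},\, m_{01}/m_{00})$:

```python
import numpy as np

def moment(B, p, q):
    """Discrete raw moment m_pq = sum over pixels of x^p * y^q * B(x, y)."""
    ys, xs = np.mgrid[0:B.shape[0], 0:B.shape[1]]
    return float((xs**p * ys**q * B).sum())
```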
Slide 27 of 103
Description of special points
Simple Neighborhood Approach
Disadvantages:
• The point detector is rotation invariant, but the neighborhood is not;
• Small shifts, i.e. errors in locating a point, make pixel-by-pixel comparison impossible.
Slide 28 of 103
Description of special points
SIFT Descriptor (Scale-Invariant Feature Transform)
• This descriptor is used when using a DoG detector to determine the position and scale of a
feature and is resistant to light changes and small shifts.
• The search for the orientation of a singular point is based on the idea of finding the main direction
of pixel gradients in the vicinity of a point. Algorithm:
1. Calculate the orientation histogram; each point's contribution is weighted with a Gaussian centered at the singular point.
2. Rotate the fragment so that the dominant gradient direction points upwards.
3. If there are several local maxima, we assume that there are several points with different orientations.
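The orientation-histogram step can be sketched as follows; the bin count, Gaussian width, and the helper name `dominant_orientation` are illustrative assumptions, not the exact SIFT parameters:

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Dominant gradient direction (radians) of a square patch.

    Builds a histogram of gradient orientations, each pixel weighted by its
    gradient magnitude times a Gaussian centered on the patch.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # in [-pi, pi]
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = h / 2.0
    g = np.exp(-((xx - w / 2)**2 + (yy - h / 2)**2) / (2 * sigma**2))
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=(mag * g).ravel(), minlength=n_bins)
    return 2 * np.pi * np.argmax(hist) / n_bins - np.pi
```

Secondary peaks in `hist` would give the additional orientations mentioned in step 3.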
David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004. Slide 29 of 103
Description of special points
SIFT Descriptor (Scale-Invariant Feature Transform)
For each feature found, we now know the characteristic scale and orientation.
• Let's select the corresponding rectangular neighborhood (Rotation Invariant Frame)
• Bring the neighborhood to a standard size (scale).
Slide 30 of 103
Description of special points
SIFT Descriptor (Scale-Invariant Feature Transform)
Slide 31 of 103
Description of special points
SIFT Descriptor (Scale-Invariant Feature Transform)
Algorithm for constructing a SIFT descriptor:
1. Calculate the direction of the gradient in each pixel;
2. Quantize the orientation of the gradients by 8 cells (directions);
• Tagging each pixel with a cell number;
3. Calculate the histogram of the directions of the gradients;
• For each cell, calculate the number of pixels with the same gradient direction;
• Each point's contribution is weighted with a Gaussian centered at the center of the neighborhood.
The resulting histograms can be compared with different distance measures:
• Histogram intersection:

$D(h_1, h_2) = \sum_{i=1}^{N} \min\left(h_1(i), h_2(i)\right)$

• Chi-square distance $\chi^2$:

$D(h_1, h_2) = \sum_{i=1}^{N} \dfrac{\left(h_1(i) - h_2(i)\right)^2}{h_1(i) + h_2(i)}$
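Both histogram measures are one-liners; a small sketch (the `eps` guard against empty bins is an added assumption):

```python
import numpy as np

def intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima."""
    return float(np.minimum(np.asarray(h1, float), np.asarray(h2, float)).sum())

def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return float(((h1 - h2)**2 / (h1 + h2 + eps)).sum())
```

Note intersection is a similarity (larger = more alike), while chi-square is a distance (0 for identical histograms).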
Slide 34 of 103
Description of special points
Various SIFT modifications are used for color images.
RGB-SIFT
Computes three SIFT descriptors, one for each channel.
C-SIFT
Uses the opponent channels $O_1$ and $O_2$:

$O_1 = \dfrac{R - G}{\sqrt{2}}, \qquad O_2 = \dfrac{R + G - 2B}{\sqrt{6}}, \qquad O_3 = \dfrac{R + G + B}{\sqrt{3}}$

rgSIFT
Uses the normalized channels $r$ and $g$:

$r = \dfrac{R}{R + G + B}, \qquad g = \dfrac{G}{R + G + B}, \qquad b = \dfrac{B}{R + G + B}$
Koen E. A. van de Sande, Theo Gevers and Cees G. M. Snoek, Evaluating Color Descriptors for Object and Scene Recognition, IEEE PAMI, 2010 Slide 35 of 103
Description of special points
SIFT Descriptor (Scale-Invariant Feature Transform)
Advantages:
1. The SIFT descriptor is distinctive and robust to illumination changes and small shifts.
2. SIFT (detector, neighborhood selection, descriptor) schema is a very efficient tool for
image analysis.
3. Has become widespread.
Slide 36 of 103
Description of special points
PCA-SIFT descriptor (Principal Components Analysis-SIFT)
• For each feature point, a 41 × 41 neighborhood is considered.
• This gives neighborhood gradient vectors containing 2 × 39 × 39 = 3042 elements.
• The vectors are reduced to 32 elements by means of principal component analysis (PCA).
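The PCA reduction step (3042-dimensional gradient vectors projected to 32 components) can be sketched with an SVD; `pca_reduce` is a hypothetical helper, and a real PCA-SIFT implementation learns the projection from a separate training set rather than from the descriptors being reduced:

```python
import numpy as np

def pca_reduce(X, k=32):
    """Project row vectors of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```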
Y. Ke and R. Sukthankar, PCA-SIFT: A More Distinctive Representation for Local Image Descriptors CVPR, 2004. Slide 37 of 103
Description of special points
GLOH descriptor (Gradient location-orientation histogram)
• A polar grid is used for dividing the neighborhood into bins: 3 radial blocks with
radii of 6, 11 and 15 pixels and 8 sectors.
• The result is a vector containing 272 components, which is projected into a space
of dimension 128 using principal component analysis (PCA).
K. Mikolajczyk and C. Schmid,“A performance evaluation of local descriptors ,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp.
1615-1630, Oct. 2005 Slide 38 of 103
Description of special points
DAISY descriptor
• Works on a dense set of pixels throughout the image.
• Runs about 66 times faster than SIFT computed on the same dense set of pixels.
• The ideas of constructing SIFT and GLOH descriptors are used.
• Similarly with GLOH, a circular neighborhood of the singular point is selected, with the
bins being represented not by partial sectors, but by circles.
• For each bin, the same actions are performed as in the SIFT
algorithm, but the weighted sum of the gradient magnitudes is
replaced by the convolution of the original image with the
derivatives of the Gaussian filter taken in 8 directions.
• The constructed descriptor retains invariance while solving the matching problem in the case when all pixels are considered special, and requires less computational cost.
Tola, Engin, Vincent Lepetit, and Pascal Fua. "A fast local descriptor for dense matching." 2008 IEEE conference on computer vision and pattern recognition.
IEEE, 2008. Slide 39 of 103
Description of special points
BRIEF descriptor (Binary Robust Independent Elementary Features)
Scheme for constructing feature vectors:
1. The image is split into patches (separate overlapping areas). Let's say patch 𝑃 has dimensions
𝑆 × 𝑆 pixels.
2. A set of pairs of pixels $\{(X, Y)\}$, $X = (u, v)$, $Y = (u', v')$ in the neighborhood is selected from the patch, for which a set of binary tests is constructed:

$\tau(P; X, Y) = \begin{cases} 1, & I(X) < I(Y) \\ 0, & \text{otherwise} \end{cases}$

where $I(X)$ – the intensity of the pixel $X$.
3. For each patch, a set is selected containing $n_d$ pairs of points $(X_i, Y_i)$ that uniquely define the set of binary tests.
4. Based on these tests, a binary string is built:

$f_{n_d}(P) = \sum_{i=1}^{n_d} 2^{i-1}\, \tau(P; X_i, Y_i)$
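The binary-string construction and the Hamming comparison mentioned on the next slide can be sketched as follows (the pair layout is an assumed input; real BRIEF samples the pairs from a fixed random pattern):

```python
import numpy as np

def brief_descriptor(patch, pairs):
    """Binary descriptor from intensity comparisons; pairs are ((x1,y1),(x2,y2))."""
    bits = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pairs):
        # tau(P; X, Y) = 1 if I(X) < I(Y); set bit i accordingly (2^i weight).
        if patch[y1, x1] < patch[y2, x2]:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Hamming distance between two binary descriptors."""
    return bin(a ^ b).count("1")
```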
Calonder, Michael, et al. "BRIEF: Computing a local binary descriptor very fast." IEEE transactions on pattern analysis and machine intelligence 34.7 (2011): 1281-1298. Slide 40 of 103
Description of special points
BRIEF descriptor (Binary Robust Independent Elementary Features)
• Provides recognition of the same areas of the image that were shot from different
points of view.
• The recognition algorithm is reduced to the construction of a random forest or a naive
Bayesian classifier on a certain training set of images and the subsequent classification
of areas of test images.
• In a simplified version, the nearest neighbor method can be used to find the most
similar patch in the training set.
• The low computational cost comes from representing the feature vector as a binary string and, as a consequence, using the Hamming distance as the similarity measure.
Calonder, Michael, et al. "BRIEF: Computing a local binary descriptor very fast." IEEE transactions on pattern analysis and machine intelligence 34.7 (2011): 1281-1298. Slide 41 of 103
Neighborhood normalization
Perspective distortion
Different fragments fall into the circular neighborhood: in the left image half of the letter G is inside the circle, while in the right image it barely falls inside at all.
Slide 42 of 103
Neighborhood normalization
Perspective distortion
It is necessary to find the appropriate neighborhoods, and describe them with an ellipse,
taking affine transformations into account.
Normalizing ellipsoids
Slide 44 of 103
Neighborhood normalization
Affine adaptation
• The matrix 𝑀 can be represented as an ellipse, in which the lengths of the axes are
determined by the eigenvalues, and the orientation is determined by the matrix 𝑅.
• The main problem is that we count the matrix 𝑀 by a round (square) neighborhood. In
different images, the content will not match, and we will not be able to select the
same areas (ellipses).
$E(u, v) \approx \begin{pmatrix} u & v \end{pmatrix} \mathbf{M} \begin{pmatrix} u \\ v \end{pmatrix}$,

$\mathbf{M} = \sum_{x, y} w(x, y) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} = R^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} R$
Slide 45 of 103
Neighborhood normalization
Affine adaptation
Solution: iterative neighborhood adaptation.
In the case of affine distortions, the problem is that the matrix of the second moments
determined by the weights 𝑤 (𝑥, 𝑦) should be calculated from the characteristic shape of
the region.
The iterative refinement algorithm is as follows:
1. Calculation of the matrix of moments using a round window.
2. Application of affine adaptation to obtain an elliptical window.
3. Recalculation of the moment matrix along the normalized
neighborhood. Go to step 1.
Slide 46 of 103
Neighborhood normalization
Affine adaptation
Slide 47 of 103
Neighborhood normalization
Normalization
Normalize the neighborhood by converting the ellipses to circles of unit radius. Moreover,
the ellipse of the second moments can be considered the "characteristic shape" of the
region.
You can rotate and mirror a unit circle and it will remain a unit circle. This property can be
used to find the desired orientation of the neighborhood. After normalization, calculate
the dominant gradient and rotate the neighborhood.
Slide 48 of 103
Neighborhood normalization
Normalization
Slide 49 of 103
Points matching
Comparison
We have a set of selected singular points and their descriptors.
How to match the same points in different images?
Slide 50 of 103
Points matching
Comparison
To match the points, it is necessary to generate candidate pairs: for each patch in one
image, we find several patches most similar in terms of the selected metric in another
image.
Methods for selecting pairs of candidate points:
1. Full search:
• For each feature, calculate the distances to all features of the second image and take the best one.
2. Accelerated approximate methods:
• Hierarchical structures (kd-trees, vocabulary trees).
• Hashing.
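The full-search variant can be sketched in a few lines of vectorized code (`match_bruteforce` is a hypothetical helper; practical pipelines usually add a ratio test or cross-check on top of this):

```python
import numpy as np

def match_bruteforce(desc1, desc2):
    """For each row descriptor in desc1, index of the nearest row of desc2 (L2)."""
    # Pairwise distance matrix of shape (len(desc1), len(desc2)).
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    return d.argmin(axis=1)
```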
Slide 51 of 103
Description of objects
Geometric models of structures
Local features belong to specific geometric structures:
1. the corners of the windows lie on straight lines;
2. the edges of the windows lie on straight lines.
Based on these structures, their geometric models can be calculated.
Slide 53 of 103
Description of objects
Parametric curves
$F(x, a) = 0$ – parametric model,
where $a$ – the parameters of the model,
$x$ – a vector corresponding to some point in space,
$X = \{x_i\}$ – a set of vectors corresponding to points in space.
Straight line: $F(x, a) = a_1 x_1 + a_2 x_2 + a_3 = 0$.
Circle: $F(x, a) = (x_1 - a_1)^2 + (x_2 - a_2)^2 - a_3 = 0$.
Conic: $F(x, a) = a_1 x_1^2 + a_2 x_1 x_2 + a_3 x_2^2 + a_4 x_1 + a_5 x_2 + a_6 = 0$
(a conic section of a plane with a circular cone, i.e. a second-order curve).
Slide 54 of 103
Description of objects
Parametric curves
In the case of a two-dimensional plane 𝑥1 = 𝑥, 𝑥2 = 𝑦.
Slide 56 of 103
Direct linear transformation
Direct linear transformation (DLT)
Problem: a set of points with coordinates $(x_i, y_i)$, $i = 1, \dots, n$, is given on a two-dimensional image. It is necessary to find the line that best approximates them.
Solution:
• using the least squares method, calculate a straight line with the smallest possible sum
of squares of distances from points to a straight line;
• probabilistic formulation: search for the maximum likelihood line:

$\hat{l} = \arg\max_l P\left( \{(x_i, y_i)\} \mid l \right)$.
Slide 57 of 103
Direct linear transformation
Model of a straight line with Gaussian noise applied to the points perpendicular to the line:

$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} + \varepsilon \begin{pmatrix} a \\ b \end{pmatrix}$,

where $(u\ v)^T$ – a point on the line,
$\varepsilon$ – normally distributed Gaussian noise with zero mean and standard deviation $\sigma$,
$(a\ b)^T$ – the normal vector.
Slide 58 of 103
Direct linear transformation
Probabilistic approach
• It is necessary to find the points with the maximum likelihood of being on the line
(maximum likelihood).
Slide 59 of 103
Direct linear transformation
Geometrical approach
• Since the distance from the point $(x_i, y_i)$ to the line along the normal is $|a x_i + b y_i - d|$, it is necessary to find parameters $(a, b, d)$ that minimize the function $E$:

$E = \sum_{i=1}^{n} (a x_i + b y_i - d)^2$
Slide 60 of 103
Direct linear transformation
We differentiate the function $E$ with respect to $d$ and equate to zero:

$\dfrac{\partial E}{\partial d} = \sum_{i=1}^{n} -2\,(a x_i + b y_i - d) = 0$,

and express $d$:

$d = \dfrac{a}{n} \sum_{i=1}^{n} x_i + \dfrac{b}{n} \sum_{i=1}^{n} y_i = a\bar{x} + b\bar{y}$

Substitute the resulting expression into the function $E$:

$E = \sum_{i=1}^{n} \left( a(x_i - \bar{x}) + b(y_i - \bar{y}) \right)^2 = \left\| \begin{pmatrix} x_1 - \bar{x} & y_1 - \bar{y} \\ \vdots & \vdots \\ x_n - \bar{x} & y_n - \bar{y} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \right\|^2 = (AN)^T (AN)$.
Slide 61 of 103
Direct linear transformation
Differentiate $(AN)^T (AN)$ with respect to $N$:

$\dfrac{dE}{dN} = 2\,(A^T A)\,N = 0$.

The minimizer under the constraint $\|N\|^2 = 1$ is the eigenvector of $A^T A$ corresponding to the minimum eigenvalue.
The expression $A^T A$ allows you to find the singular values of the matrix $A$.
Slide 62 of 103
Direct linear transformation
SVD procedure
To simplify the search for singular numbers, consider the Singular Value Decomposition
(SVD).
Slide 63 of 103
Direct linear transformation
SVD procedure
We use this decomposition to compute the least-squares solution. Let the equation be given:

$A p = 0$,

where the norm of the vector $p$ is $\|p\| = 1$.
To minimize $\|A p\|$, it is necessary to minimize the norm $\|U D V^T p\|$.
Slide 64 of 103
Direct linear transformation
SVD procedure
Since $U$ is orthogonal and $\|V^T p\| = \|p\| = 1$, it is sufficient to minimize:

$\|D\, V^T p\|$.

We denote $y = V^T p$; then it is necessary to minimize:

$\|D y\|$ subject to $\|y\| = 1$,

where the diagonal entries of $D$ are ordered in descending order, so the minimum is attained at $y = (0, \dots, 0, 1)^T$, i.e. $p$ is the last column of $V$.
Slide 65 of 103
Direct linear transformation
We use the OLS (Ordinary Least Squares) and SVD (Singular Value Decomposition)
procedure to construct lines.
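The whole derivation (center the points, take the right-singular vector of A for the smallest singular value as the line normal) fits in a few lines; `fit_line_tls` is a hypothetical helper name:

```python
import numpy as np

def fit_line_tls(points):
    """Total-least-squares line fit: returns (a, b, d) with ax + by = d, a^2 + b^2 = 1."""
    pts = np.asarray(points, float)
    mean = pts.mean(axis=0)
    A = pts - mean  # centered data matrix; d = a*x_mean + b*y_mean afterwards
    # Last row of Vt = right-singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    a, b = Vt[-1]
    d = a * mean[0] + b * mean[1]
    return a, b, d
```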
Slide 66 of 103
Direct linear transformation
DLT problems:
• Often, some of the points obtained are not generated by the model 𝐹 𝑥, 𝑎 .
• In such a situation, when estimating by the least squares method, the result can be
arbitrarily far from the true one.
• For example, we have a set of pixels selected by the threshold and build a straight line
based on them:
Slide 67 of 103
M-estimators
To reduce the influence of distant points, we parameterize the line in polar coordinates:

$x \cos\theta + y \sin\theta = R$,

then the objective function takes the form:

$(\hat\theta, \hat{R}) = \arg\min_{\theta, R} \sum_i (x_i \cos\theta + y_i \sin\theta - R)^2$.

We denote $\varepsilon_i = x_i \cos\theta + y_i \sin\theta - R$ and modify the objective function:

$(\hat\theta, \hat{R}) = \arg\min_{\theta, R} \sum_i \rho(\varepsilon_i)$,

where in the case $\rho(\varepsilon) = \varepsilon^2$ we obtain the least squares method.
Slide 68 of 103
M-estimators
The following function is usually minimized:

$\sum_i \rho\left( r_i(x_i, \theta);\ \sigma \right)$,

where $r_i(x_i, \theta)$ – the residual of the $i$-th point given the model parameters $\theta$,
$\rho$ – a robust function with scale $\sigma$.
The robust function $\rho$ behaves like the squared distance for small values of $u$ and flattens out as the value of $u$ increases.
Slide 69 of 103
M-estimators
The most frequently used variants of the robust function $\rho$ are:
1. Tukey's function:

$\rho(\varepsilon) = \begin{cases} \dfrac{K^2}{6} \left( 1 - \left( 1 - \left( \dfrac{\varepsilon}{K} \right)^2 \right)^3 \right), & \text{if } |\varepsilon| \le K \\[2mm] \dfrac{K^2}{6}, & \text{if } |\varepsilon| > K \end{cases}$

2. Cauchy function:

$\rho(\varepsilon) = \dfrac{c^2}{2} \log\left( 1 + \left( \dfrac{\varepsilon}{c} \right)^2 \right)$,

where $K$ and $c$ – tuning constants.
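Both loss functions translate directly into code; a sketch with the tuning constants from the next slide as defaults:

```python
import math

def tukey_rho(e, K=4.685):
    """Tukey's biweight loss: quadratic-like near 0, constant K^2/6 beyond K."""
    if abs(e) <= K:
        return (K**2 / 6) * (1 - (1 - (e / K)**2)**3)
    return K**2 / 6

def cauchy_rho(e, c=2.385):
    """Cauchy loss: grows only logarithmically for large residuals."""
    return (c**2 / 2) * math.log(1 + (e / c)**2)
```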
Slide 70 of 103
M-estimators
On the left is the Tukey function for 𝐾 = 4.685, on the right is the Cauchy
function for 𝑐 = 2.385.
Methods:
1. Nonlinear optimization methods;
2. Weighted Least Squares;
3. Iteratively reweighted least squares.
Slide 72 of 103
Weighted Least Squares
Using the example of searching for straight lines:
𝑎, 𝑏, 𝑑 = arg min σ𝑖 𝑤𝑖 𝑎𝑥𝑖 + 𝑏𝑦𝑖 + 𝑑 2 ,
𝑎,𝑏 : 𝑎 +𝑏2 =1
2
σ𝑖 𝑤𝑖 = 1,
where 𝑤𝑖 – the weight of each point.
Slide 73 of 103
Weighted Least Squares
In a covariance matrix:
• the maximum eigenvector of the matrix 𝐶𝑜𝑣 specifies the direction of the straight line,
• the minimum eigenvector – the direction of the normal 𝑎, 𝑏 .
Slide 74 of 103
Iteratively reweighted least squares
1. Get the initial approximation of the model by the least squares method: $\Theta^{(0)} = (\rho^{(0)}, \theta^{(0)})$.

$\sigma^{(t)} = 1.4826 \cdot \operatorname{median}_i \left| r_i\left( x_i, \Theta^{(t-1)} \right) \right|$
Slide 75 of 103
Iteratively reweighted least squares
4. Calculate the weights of the points $w_i^{(t)}$ taking into account the function $\rho$:
a. in general:

$w_i = \dfrac{\rho'\!\left( \dfrac{\varepsilon_i}{\sigma} \right)}{\dfrac{\varepsilon_i}{\sigma}}$,

b. in the case of the Tukey function:

$w\!\left( \dfrac{\varepsilon}{\sigma} \right) = \begin{cases} \left( 1 - \left( \dfrac{\varepsilon}{\sigma K} \right)^2 \right)^2, & \text{if } \left| \dfrac{\varepsilon}{\sigma} \right| \le K, \\[2mm] 0, & \text{if } \left| \dfrac{\varepsilon}{\sigma} \right| > K, \end{cases}$

c. in the case of the Cauchy function:

$w\!\left( \dfrac{\varepsilon}{\sigma} \right) = \dfrac{1}{1 + \left( \dfrac{\varepsilon}{c \sigma} \right)^2}$
Slide 76 of 103
Iteratively reweighted least squares
5. Using weighted least squares, get $\Theta^{(t)}$.
6. If the desired tolerance is not achieved, i.e.

$\left\| \Theta^{(t)} - \Theta^{(t-1)} \right\| > \varepsilon^*$,

go to step 3, where $\varepsilon^*$ – the maximum desired deviation.
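The loop above can be sketched for the line-fitting case with Cauchy weights; `irls_line` is a hypothetical helper, and a fixed iteration count stands in for the tolerance test of step 6:

```python
import numpy as np

def irls_line(points, c=2.385, iters=20):
    """IRLS line fit with Cauchy weights; returns (a, b, d) with ax + by = d."""
    pts = np.asarray(points, float)
    w = np.ones(len(pts))  # initial fit = ordinary (total) least squares
    for _ in range(iters):
        mean = (w[:, None] * pts).sum(0) / w.sum()
        A = (pts - mean) * np.sqrt(w)[:, None]
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        a, b = Vt[-1]                      # normal = smallest-singular-value direction
        d = a * mean[0] + b * mean[1]
        r = pts @ np.array([a, b]) - d     # signed residuals
        sigma = 1.4826 * np.median(np.abs(r)) + 1e-12  # robust scale estimate
        w = 1.0 / (1.0 + (r / (c * sigma))**2)         # Cauchy weights
    return a, b, d
```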
Slide 77 of 103
Iteratively reweighted least squares
Method results
Slide 78 of 103
M-estimators
Disadvantages:
Slide 79 of 103
Random Sample Consensus
Random Sample Consensus - RANSAC
The idea: to estimate not all data, but only a small sample that does not contain outliers.
• Since it is not known in advance which points are outliers and which are not, it is
possible to construct many samples at once in a random way.
• Then, for each of the samples, we build a hypothesis. After that, we choose the hypothesis that best agrees with all the data.
Slide 80 of 103
Random Sample Consensus
The main problem: the number of such samples is huge, so it is necessary to build
hypotheses on the minimum sample size.
• For example, when fitting a straight line to a set of points on a plane, the method takes only the two points necessary to define a line and uses them to build the model.
• After that, it is checked how many points correspond to the model using an estimation
function with a given threshold.
Slide 81 of 103
Random Sample Consensus
Example
Slide 82 of 103
Random Sample Consensus
Example
Two minimal samples (two points each) with cutoff along the proposed line.
On the left image, 11 points fell into the area, on the right - 4.
Slide 83 of 103
Random Sample Consensus
Example
The left sample more adequately describes the straight line (received
more "votes"), and accordingly is the right decision.
Slide 84 of 103
Random Sample Consensus
Basic scheme of the RANSAC algorithm is a loop with N iterations:
1. Build a sample $S \subset X$ ($x_i \in X$). Typically, the sample size is the smallest possible size which is enough for estimating the model parameters.
2. Put forward a hypothesis $\Theta$ on the sample $S$.
3. Evaluate the degree of agreement between hypothesis $\Theta$ and the set of input data $X$. Each point is labeled as an "outlier" or an "inlier".
4. After checking all points, check whether the current hypothesis is "the best" one by comparing it with the previous "best" one. If yes, it replaces the previous "best" hypothesis.
At the end of the loop, the last best hypothesis is kept, from which it is possible to determine the parameters of the model, as well as the points marked as "outliers" and "inliers".
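The scheme above, specialized to line fitting with minimal two-point samples, can be sketched as follows (`ransac_line` is a hypothetical helper; the fixed seed makes the sketch reproducible):

```python
import random

def ransac_line(points, n_iters=200, threshold=1.0, seed=0):
    """Minimal RANSAC for a line: sample 2 points, count inliers, keep the best."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two sampled points: a*x + b*y = d, unit normal (a, b).
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # degenerate sample (coincident points)
        a, b = a / norm, b / norm
        d = a * x1 + b * y1
        inliers = [p for p in points if abs(a * p[0] + b * p[1] - d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, d), inliers
    return best_line, best_inliers
```

A final refit of the model on `best_inliers` (e.g. by least squares) is the usual last step.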
Slide 85 of 103
Random Sample Consensus
• To obtain a model built without outliers with a given probability $p$, the number of iterations $N$ of the loop can be calculated if the expected fraction of outliers $e$ can be specified.
• The number of samples $N$ is chosen so that the probability of choosing at least one sample without outliers is not lower than the given one (for example, 0.99). Thus:

$p = 1 - \left( 1 - (1 - e)^s \right)^N \;\Rightarrow\; N = \dfrac{\log(1 - p)}{\log\left( 1 - (1 - e)^s \right)}$,

$R(\Theta) = \sum_i p\left( \varepsilon_i^2 \mid \Theta \right), \qquad p\left( \varepsilon_i^2 \mid \Theta \right) = \begin{cases} 1, & \text{if } \varepsilon_i^2 \le T^2 \\ 0, & \text{if } \varepsilon_i^2 > T^2 \end{cases}, \quad i = 1, \dots, n$

where $\varepsilon_i(\Theta)$ – the residual of the $i$-th point for the estimated hypothesis;
$p$ – the inlier indicator (1 – "inlier", 0 – "outlier");
$T$ – the threshold chosen so that the probability of accepting an inlier is $p \approx 0.95$.
Typically, a Gaussian noise model with zero mean is used, such that $T^2 = 3.84\sigma^2$.
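The iteration-count formula is a one-liner; a sketch with the hypothetical helper name `ransac_iterations`:

```python
import math

def ransac_iterations(p=0.99, e=0.5, s=2):
    """Number of RANSAC samples N for success probability p,
    outlier fraction e, and minimal sample size s."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e)**s))
```

For example, fitting a line (s = 2) at p = 0.99 with 50% outliers needs 17 samples, which matches the standard RANSAC tables.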
Slide 87 of 103
Random Sample Consensus
The number of samples grows rapidly with the growth of the sample size and the
proportion of outliers.
Dependence of the number of samples on the sample size and the proportion of "outliers"
How to minimize the number of samples if the fraction of outliers is not known in advance?
Slide 88 of 103
Random Sample Consensus
You can start the algorithm with a rough estimate, say 50%, and then refine the number
of samples sequentially.
RANSAC Adaptive Algorithm Completion:
Slide 90 of 103
Random Sample Consensus
Threshold selection problem
Slide 91 of 103
Random Sample Consensus
Threshold selection problem
Slide 92 of 103
Random Sample Consensus
To assess the degree of agreement between hypotheses, consider the hypothesis
evaluation function:
3. M-SAC (M-estimator Sample Consensus)

$R(\Theta) = \sum_i p\left( \varepsilon_i^2 \mid \Theta \right), \qquad p\left( \varepsilon_i^2 \mid \Theta \right) = \begin{cases} \varepsilon_i^2, & \text{if } \varepsilon_i^2 \le T^2 \\ T^2, & \text{if } \varepsilon_i^2 > T^2 \end{cases}, \quad i = 1, \dots, n$

which is similar to the RANSAC function except for the modification of the evaluation function.
This method gives a more accurate estimate without increasing computational complexity
and guarantees the correct solution.
Slide 93 of 103
Random Sample Consensus
An example of using RANSAC: matching the same feature points.
When matching singular points by descriptors, quite a lot of false pairs will be detected.
Slide 94 of 103
Random Sample Consensus
Matching algorithm using RANSAC:
Slide 95 of 103
Random Sample Consensus
Slide 96 of 103
Random Sample Consensus
Use case: building panoramas from a set of photos.
Slide 97 of 103
Random Sample Consensus
Use case: building a panorama from an unordered set of photos, so it is necessary to
determine which of them belong to one image and which to another.
Slide 98 of 103
Random Sample Consensus
RANSAC advantages:
Slide 99 of 103
Random Sample Consensus
RANSAC disadvantages:
Andrei Zhdanov
[email protected]