Image Feature Extraction

Abstract - A feature-based technique is presented for 3D motion and structure estimation of rigid objects from a sequence of images. Considered is a feature-based rigid object representation which is more complete as compared to previous techniques. In this approach the features provide information regarding the location and type of significant object components and connections. A robust enhanced Hough transform (HT) approach is proposed for line and junction detection. Based on a perspective projection imaging model, the extracted feature points of an object are used for estimating its 3D motion and structure parameters by means of the unscented Kalman filter (UKF). The performances of the HT-based feature extractor and the UKF are illustrated via simulation results.

Keywords: Tracking, 3D Motion Estimation, Feature Extraction, Hough Transform

* Research supported by ARO grant W911NF-04-1-0274 and NASA/LEQSF grant (2001-4)-01.

1 Introduction

This paper is concerned with estimating the 3D motion and structure of a rigid object from a sequence of 2D monocular images. In contrast to customary point-target tracking, where the motion of some target "center" is of interest (i.e., the target is considered as a mass-point), this problem involves estimating the rotational motion of the object body as well as some of its structure parameters (e.g., the coordinates of some feature points). Within the standard state-space estimation framework, two main approaches have been applied to this problem: feature-based [1], [2] and optical flow-based [3]. The optical flow-based approach relies on measuring (or computing) the velocity of the points in the image plane. In this work we pursue the feature-based approach. The target is considered as a set of features which satisfy the rigid body assumption: all feature points preserve their relative positions. The 2D features are obtained from raw 2D images by a signal processing algorithm referred to as a feature extractor. Feature correspondences (feature matching) across multiple frames need to be established in order to estimate the 3D object motion, based on assumed 3D motion and 2D feature "measurement" models. The latter part of the problem, estimating the object state (motion and structure parameters) based on feature-point measurements, has been well developed in both batch (parameter estimation) [2], [4] and recursive (Kalman or nonlinear) filtering [1], [5], [4], [6], [7], [8] frameworks. All of the above cited works, however, assume perfect feature extraction and matching and deal with the estimation issues only. In reality the feature extraction and matching are crucial for the performance of the overall estimation algorithm. While abundant publications on feature extraction (e.g., [9], [10], [11]) and feature matching [12] exist in the image processing literature, it is rarely the case that they are considered in an integrated dynamic feature detection-tracking framework. Apparently, a great potential for enhancement of the feature extraction exists, as compared to "static" (single-frame) feature extraction, if the valuable information from the state estimator (predictor) is used. The same argument is valid for the feature matching as well. The ultimate aim of our work is the development of an overall 3D object tracker based on raw 2D image data which incorporates in an integrated manner all processing stages: feature extraction, feature association (hard or "soft" matching), and nonlinear state estimation.

This paper is focused on the feature extraction. Considered is a feature-based rigid object representation which is more complete as compared to previous techniques. In this approach the features provide information regarding the location and type of significant object components and connections. A robust enhanced Hough transform (HT) approach is proposed for line and junction detection. Based on a perspective projection imaging model, the extracted feature points of an object are used for estimating its 3D motion and structure parameters by means of the unscented Kalman filter (UKF) [13, 14]. The performances of the HT-based feature extractor and the UKF are illustrated via simulation results.

The rest of the paper is organized as follows. Section 2 provides the problem formulation, including the models used. An outline of the overall tracking algorithm is given in Section 3. Section 4 provides a description of the proposed enhanced feature detection via a modified Hough transform. Section 5 presents simulation results of feature extraction and 3D object structure estimation and tracking by the UKF. Section 6 provides conclusions and further work directions.
2 Problem Formulation

2.1 2D Feature Observation Model

The image acquisition is modeled by the known perspective projection imaging model of a pin-hole camera [1, 2, 3]. The object is represented through a set of feature points p^{(i)} of a rigid body, with IDs i = 1, 2, ..., N.

Let X and Y denote the axes of the 2D image coordinate system on the image plane of a static camera (Fig. 1). An inertial coordinate system I = Oxyz is defined with respect to the camera such that its z-axis points along the optical axis of the camera and its x and y axes are parallel to X and Y, respectively. If p^{(i)} = (x^{(i)}, y^{(i)}, z^{(i)})' denotes the position vector of the ith feature point at time k in I, then the image plane position measurements X^{(j)} and Y^{(j)} are

X_k^{(j)} = \varphi\, x_k^{(i)} / z_k^{(i)} + n_X^{(j)}    (1)

Y_k^{(j)} = \varphi\, y_k^{(i)} / z_k^{(i)} + n_Y^{(j)}    (2)

where j = 1, 2, ..., M denotes the measurement ID, \varphi is the camera focal length, and n_X^{(j)}, n_Y^{(j)} denote the observation errors. The observation errors result from the imaging process and the accuracy of the feature extraction, and are assumed to be random, zero-mean, white Gaussian noise.

Fig. 1: Imaging geometry: image plane axes (X, Y), inertial frame I, and body frame C.

A body coordinate system C is fixed with the rigid body, and its origin C = C(t) is the rotational center of the rigid body motion at time t (Fig. 1); C is referred to as the body frame. Note that the coordinates \bar{p}^{(i)} of the feature points in the body frame are time invariant due to the rigidity assumption.

The motion of any feature point p^{(i)} in I is then given by

p^{(i)}(t) = p_C(t) + R(t)\, \bar{p}^{(i)}    (3)

where p_C = (x_C, y_C, z_C)' are the inertial coordinates of the vector OC and R(t) is the 3 x 3 coordinate transformation matrix which aligns C to I. p_C(t) describes the motion of the rotation center C (referred to as translational motion), while R(t) describes the rotational motion of the rigid body around C. The rotation matrix R(t) can be conveniently described as an explicit function R(t) = R(q(t)) (see [1]) of the unit quaternion q(t) = (q_1(t), q_2(t), q_3(t), q_4(t))', which propagates in time according to the equation

\dot{q} = Q(\omega(t))\, q(t), \qquad Q(\omega) = \frac{1}{2} \begin{bmatrix} 0 & \omega_z & -\omega_y & \omega_x \\ -\omega_z & 0 & \omega_x & \omega_y \\ \omega_y & -\omega_x & 0 & \omega_z \\ -\omega_x & -\omega_y & -\omega_z & 0 \end{bmatrix}    (4)

where \omega = (\omega_x, \omega_y, \omega_z)' is the angular rate of the rotational motion.
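To make the observation and motion models concrete, the following is a minimal numerical sketch of the projection (1)-(2) and of one Euler step of the kinematics (3)-(4). The focal length, noise level, step size, and the scalar-last quaternion convention are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix R(q) for a unit quaternion q = (q1, q2, q3, q4),
    assuming q4 is the scalar part (one common convention, cf. [1])."""
    q1, q2, q3, q4 = q
    return np.array([
        [1 - 2*(q2**2 + q3**2), 2*(q1*q2 - q3*q4),     2*(q1*q3 + q2*q4)],
        [2*(q1*q2 + q3*q4),     1 - 2*(q1**2 + q3**2), 2*(q2*q3 - q1*q4)],
        [2*(q1*q3 - q2*q4),     2*(q2*q3 + q1*q4),     1 - 2*(q1**2 + q2**2)],
    ])

def quat_rate_matrix(w):
    """Q(w) from equation (4); w = (wx, wy, wz)."""
    wx, wy, wz = w
    return 0.5 * np.array([
        [0.0,  wz, -wy,  wx],
        [-wz, 0.0,  wx,  wy],
        [ wy, -wx, 0.0,  wz],
        [-wx, -wy, -wz, 0.0],
    ])

def project(p, focal=1.0, noise_std=0.01, rng=np.random.default_rng(0)):
    """Pin-hole projection (1)-(2) of a 3D point p = (x, y, z)."""
    x, y, z = p
    nX, nY = rng.normal(0.0, noise_std, 2)
    return np.array([focal * x / z + nX, focal * y / z + nY])

# One Euler step of the rigid-body motion (3)-(4).
dt = 0.05
q = np.array([0.0, 0.0, 0.0, 1.0])         # unit quaternion
w = np.array([0.1, 0.0, 0.2])              # angular rate (rad/s), held constant
pC = np.array([0.0, 0.0, 10.0])            # rotation center in I
p_body = np.array([1.0, 1.0, 1.0])         # feature point in the body frame

q = q + dt * quat_rate_matrix(w) @ q
q /= np.linalg.norm(q)                     # re-normalize onto the unit sphere
p_inertial = pC + quat_to_rot(q) @ p_body  # equation (3)
print(project(p_inertial))
```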
2.3 State-Space Model
The state function f of the target model (5) is given componentwise by

\dot{s}_1 = s_3 - s_1 s_5
\dot{s}_2 = s_4 - s_2 s_5
\dot{s}_3 = -s_3 s_5
\dot{s}_4 = -s_4 s_5
\dot{s}_5 = -s_5^2
\dot{s}_6 = \tfrac{1}{2}( s_{12} s_7 - s_{11} s_8 + s_{10} s_9 )
\dot{s}_7 = \tfrac{1}{2}( -s_{12} s_6 + s_{10} s_8 + s_{11} s_9 )
\dot{s}_8 = \tfrac{1}{2}( s_{11} s_6 - s_{10} s_7 + s_{12} s_9 )
\dot{s}_9 = \tfrac{1}{2}( -s_{10} s_6 - s_{11} s_7 - s_{12} s_8 )
\dot{s}_{10} = \dot{s}_{11} = \dot{s}_{12} = 0
\dot{s}_{13} = -s_{13} s_5, \; \ldots, \; \dot{s}_{12+3N} = -s_{12+3N} s_5    (7)

where q = (s_6, s_7, s_8, s_9)' is the quaternion, \omega = (s_{10}, s_{11}, s_{12})' is the angular rate, and

s = [\, x_C/z_C, \; y_C/z_C, \; \dot{x}_C/z_C, \; \dot{y}_C/z_C, \; \dot{z}_C/z_C, \; q', \; \omega', \; \bar{p}^{(1)'}/z_C, \; \ldots, \; \bar{p}^{(N)'}/z_C \,]'

is the normalized state vector. The normalization with z_C(t) is needed to compensate the unknown scale factor inherent to the projection imaging model. Rows 6-9 are the quaternion kinematics (4), rows 10-12 correspond to a constant angular rate, and the remaining rows propagate the z_C-normalized translational and structure coordinates.
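As a cross-check of the state equations, here is a small sketch assembling f(s) for N feature points; the state indexing follows the layout reconstructed above and should be read as an assumption about the original notation rather than a verbatim transcription.

```python
import numpy as np

def f_state(s, N):
    """Continuous-time state function f(s), equation (7), for the
    (12 + 3N)-dimensional normalized state."""
    ds = np.zeros(12 + 3 * N)
    s1, s2, s3, s4, s5 = s[:5]
    q = s[5:9]
    wx, wy, wz = s[9:12]
    # Normalized translational kinematics (constant-velocity translation).
    ds[0] = s3 - s1 * s5
    ds[1] = s4 - s2 * s5
    ds[2] = -s3 * s5
    ds[3] = -s4 * s5
    ds[4] = -s5 ** 2
    # Quaternion kinematics, rows 6-9 (equation (4)).
    Q = 0.5 * np.array([
        [0.0,  wz, -wy,  wx],
        [-wz, 0.0,  wx,  wy],
        [ wy, -wx, 0.0,  wz],
        [-wx, -wy, -wz, 0.0],
    ])
    ds[5:9] = Q @ q
    # Rows 10-12: constant angular rate, derivative stays zero.
    # Remaining rows: normalized structure coordinates decay with s5.
    ds[12:] = -s[12:] * s5
    return ds
```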
The measurement function h of the measurement model (6) is [3]

h(s) = [\, h^{(1)}(s)', \; h^{(2)}(s)', \; \ldots, \; h^{(M)}(s)' \,]'

where

h^{(j)}(s) = \varphi \left[ \frac{s_1 + a_1^{(j)}}{1 + a_3^{(j)}}, \; \frac{s_2 + a_2^{(j)}}{1 + a_3^{(j)}} \right]', \quad j = 1, 2, \ldots, M    (8)

with a^{(j)} = (a_1^{(j)}, a_2^{(j)}, a_3^{(j)})' = R(q)\, \bar{p}^{(j)}/z_C, i.e., each h^{(j)} is the perspective projection (1)-(2) of the jth feature point position (3) expressed through the normalized state.

3 Tracking Algorithm Outline

The object structure is represented through junctions. A junction J = (\eta_0, \eta_1, \ldots, \eta_n) consists of a center point \eta_0 and peripheral points \eta_1, \ldots, \eta_n: the center is directly connected to all peripheral points, while a peripheral point is connected directly to the center within a considered junction (Fig. 2). The number of peripheral points n is characteristic for a junction and is called its rank. The representation of the object structure through junctions is advantageous in two main aspects. Firstly, estimating a junction center by the feature extraction algorithm can be done much more accurately as compared, e.g., to estimating the endpoints and the center of a segment [18]. Secondly, the junction representation greatly facilitates the feature matching by exploiting the information about its rank and structure.

Fig. 2: Junction of rank n (center \eta_0, peripheral points \eta_1, \ldots, \eta_n).

Feature Matching. In principle, matching the observed feature points to the predicted ones can be done just "pointwise" by using well established techniques for data association in the multiple target tracking of "point" targets, such as joint probabilistic data association (JPDA), multiple hypothesis tracking (MHT) [19], or 2-D assignment algorithms [20]. It is, however, highly beneficial to perform the association in terms of junctions rather than in terms of single feature points. In this work we implement a simple nearest neighbor (NN) matching logic by using the following distance between an observed junction J and a predicted junction \hat{J}:

\delta(J, \hat{J}) = \| \eta_0 - \hat{\eta}_0 \| + \sum_{j=1}^{n} \min_i \| \eta_j - \hat{\eta}_i \|    (9)
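A minimal sketch of the NN junction matching based on the distance (9); the junction container and the greedy one-to-one assignment policy are illustrative assumptions.

```python
import numpy as np

def junction_distance(J, J_hat):
    """Distance (9) between an observed junction J and a predicted one;
    each junction is a pair (center, peripherals) of 2D image points."""
    c, periph = J
    c_hat, periph_hat = J_hat
    d = np.linalg.norm(c - c_hat)
    for eta in periph:
        d += min(np.linalg.norm(eta - eta_hat) for eta_hat in periph_hat)
    return d

def nn_match(observed, predicted):
    """Greedy nearest-neighbor association of observed to predicted junctions."""
    matches, used = [], set()
    for i, J in enumerate(observed):
        candidates = [(junction_distance(J, Jh), k)
                      for k, Jh in enumerate(predicted) if k not in used]
        if candidates:
            d, k = min(candidates)
            used.add(k)
            matches.append((i, k, d))
    return matches
```

Restricting candidate pairs to junctions of compatible rank would be a natural refinement, in line with the rank/structure argument above.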
Fig. 3: Tracking Algorithm Outline (Feature Detection → Feature Matching → State Estimation).

4 Junction Detection via a Modified Hough Transform

The Hough transform (HT) is one of the most effective approaches used for line detection [9], [10]. Since any two or more lines having a common point in the image I(X, Y) define a junction, line detection may be the basic component of a junction detection procedure. Therefore, in order to detect junctions, a Hough-based approach may be employed [11].
Prior to the HT, edge detection is performed, resulting in N edge pixels with coordinates (X_j, Y_j), j = 1, ..., N. The HT uses the following parametric equation of a line:

\rho = X_j \cos\theta + Y_j \sin\theta    (10)

to build a two-dimensional histogram A[\rho^{(q)}, \theta^{(q)}] of the parameters \rho^{(q)} and \theta^{(q)}, which are quantized versions of \rho and \theta, respectively. More specifically, the angle parameter \theta varies, assuming quantized values \theta^{(q)} in the interval [0°, 180°], to obtain the corresponding \rho parameter; \rho is also quantized, to \rho^{(q)}. In order to generate the histogram A[\rho^{(q)}, \theta^{(q)}], this procedure is repeated for every edge pixel. For each pair (\rho^{(q)}, \theta^{(q)}), the corresponding histogram bin is incremented:

A[\rho^{(q)}, \theta^{(q)}] = A[\rho^{(q)}, \theta^{(q)}] + 1    (11)

If \rho^{(q)} = \rho^{(l)}, \theta^{(q)} = \theta^{(l)} correspond to a local maximum in A[\rho^{(q)}, \theta^{(q)}], the equation \rho^{(l)} = X \cos\theta^{(l)} + Y \sin\theta^{(l)} identifies a line in the image I(X, Y).

Although the application of the HT is straightforward, its line detection success depends on the quality of the image. Noisy images and multiple or thick edges may result in false local maximum points, while short line segments may not produce local maxima in A[\rho^{(q)}, \theta^{(q)}].
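For reference, a compact implementation of the standard accumulation (10)-(11); the quantization steps (1° in θ, one pixel in ρ) are illustrative choices, not the paper's settings.

```python
import numpy as np

def hough_accumulate(edge_xy, n_theta=180, rho_res=1.0):
    """Build the histogram A[rho_q, theta_q] of (10)-(11) from edge pixel
    coordinates edge_xy, an (N, 2) array of (X, Y) pairs."""
    thetas = np.deg2rad(np.arange(n_theta))            # quantized theta in [0, 180)
    rho_max = np.hypot(np.abs(edge_xy[:, 0]).max(),
                       np.abs(edge_xy[:, 1]).max())    # bound on |rho|
    n_rho = int(2 * rho_max / rho_res) + 1
    A = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in edge_xy:
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # equation (10)
        rho_q = np.round((rho + rho_max) / rho_res).astype(int)
        A[rho_q, np.arange(n_theta)] += 1              # equation (11)
    return A, thetas, rho_max
```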
In this work, a robust HT-based approach is proposed to identify only true line segments in images. In general, it is expected that the overall maximum in A[\rho^{(q)}, \theta^{(q)}], found at \rho^{(q)} = \rho^{(m)}, \theta^{(q)} = \theta^{(m)}, is more probable to correspond to a true line in the image space than other local maxima. Line l^{(m)}: \rho^{(m)} = X \cos\theta^{(m)} + Y \sin\theta^{(m)} is the one that corresponds to A[\rho^{(m)}, \theta^{(m)}]. Define all edge pixels with a distance less than d^{(m)} from line l^{(m)} as p_j^{(m)}, j = 1, ..., M. These M edge pixels are considered as being part of line l^{(m)}. The next step is to remove the contribution of these points from the histogram A[\rho^{(q)}, \theta^{(q)}] in order to eliminate their false contribution to other lines. Practically, not all p_j^{(m)} pixels are considered part of a true line segment. Several of those edge pixels may be due to noise, or simply parts of other edge components that happen to be located around line l^{(m)}. The actual line component that defines part of a junction should in general be a continuous line segment. Therefore, continuous line segments should be identified from the set of pixels p_j^{(m)}. This can be achieved by applying a two-dimensional rotation transformation to the coordinates of p_j^{(m)}, namely x_j^{(m)} = [X_j^{(m)}, Y_j^{(m)}]', using -\theta^{(m)} as the angle of rotation. The new coordinates obtained are:

x_j^{(r)} = R(-\theta^{(m)})\, x_j^{(m)}    (12)

where R(\theta) is the rotation matrix defined as

R(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}    (13)

This transformation rotates line l^{(m)} into a horizontal line. Therefore, the line's continuity can be determined by simply examining the continuity of the horizontal X_j^{(r)} coordinates. A threshold T_{sep} can be used to specify if the separation between neighboring X^{(r)} coordinates is significant. More specifically, considering the j_0-th rotated point with horizontal coordinate X_{j_0}^{(r)}, a significant separation is defined by the following:

X_j^{(r)} \notin \left( X_{j_0}^{(r)}, \; X_{j_0}^{(r)} + T_{sep} \right), \quad \forall j \neq j_0    (14)

Moreover, a threshold T_{long} can be used to specify if a segment is long enough to be considered a true line segment. Considering that X_{left}^{(r)} and X_{right}^{(r)} are, respectively, the segment's leftmost and rightmost points, the segment's length is considered significant if the following holds:

X_{right}^{(r)} - X_{left}^{(r)} > T_{long}    (15)

If multiple line segments are found, they can be considered as different components. However, the contribution of the pixels located on all these segments should be eliminated from the histogram A[\rho^{(q)}, \theta^{(q)}]. For this purpose, equation (10) is used to find the set of pairs (\rho^{(q)}, \theta^{(q)}) that correspond to the line segment pixels, similarly to the original HT. Then, the histogram is updated as:

A[\rho^{(q)}, \theta^{(q)}] = A[\rho^{(q)}, \theta^{(q)}] - 1    (16)

Nevertheless, the extreme segment points may be excluded, since it is possible that they belong to different line segments of the same junction. Eliminating them may reduce the chances of identifying those other segments.
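The rotate-and-scan logic of (12)-(15) can be sketched as follows; scanning gaps over the sorted rotated coordinates is one illustrative realization of the separation test (14).

```python
import numpy as np

def split_segments(points, theta_m, t_sep, t_long):
    """Split the pixels attributed to line l(m) into continuous segments,
    following (12)-(15). points: (M, 2) array of (X, Y) coordinates;
    theta_m: the line angle theta(m) in radians."""
    c, s = np.cos(-theta_m), np.sin(-theta_m)
    R = np.array([[c, s], [-s, c]])               # rotation matrix (13)
    rotated = points @ R.T                        # equation (12)
    order = np.argsort(rotated[:, 0])             # scan left to right
    xs = rotated[order, 0]
    breaks, start = [], 0
    for j in range(1, len(xs)):
        if xs[j] - xs[j - 1] >= t_sep:            # significant separation (14)
            breaks.append((start, j))
            start = j
    breaks.append((start, len(xs)))
    # Keep only segments whose horizontal extent passes the length test (15).
    return [points[order[a:b]] for a, b in breaks if xs[b - 1] - xs[a] > t_long]
```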
This procedure is repeated until all significant (based on the threshold T_{long}) lines are found. Detected lines with similar (\rho^{(q)}, \theta^{(q)}) parameters may be considered as a single line. A summary of the algorithmic steps is presented next; a code sketch of the loop follows the list.

1. Identify significant edges in the image.

2. Apply the original HT to generate the histogram A[\rho^{(q)}, \theta^{(q)}].

3. Identify the maximum peak in A[\rho^{(q)}, \theta^{(q)}]. This corresponds to line l^{(m)} in the image space.

4. Draw line l^{(m)} in the image and identify all pixels p_j^{(m)} with a distance d^{(m)} or less from the line.

5. Apply the two-dimensional rotation matrix to the coordinates of the points p_j^{(m)}, namely x_j^{(m)}, to obtain new points with coordinates x_j^{(r)} located around a horizontal line, as in equation (12).

6. Examine the continuity of the horizontal coordinates of the new points x_j^{(r)} from left to right. An interval larger than or equal to T_{sep} in which there is no point p_j^{(m)} with corresponding coordinate X_j^{(r)} defines a separation between two line segments. This is described in equation (14).

7. Examine the length of a line segment as in equation (15). If it is less than the threshold T_{long}, this line segment is considered too short to be a true line segment.

8. Remove the contribution of all pixels located on the detected line segments from the Hough histogram A[\rho^{(q)}, \theta^{(q)}], as in (16).

9. If the maximum peak in A[\rho^{(q)}, \theta^{(q)}] is still more than a threshold, go back to step 4. Otherwise, stop.
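A sketch of the overall loop (steps 3-9), reusing hough_accumulate and split_segments from the sketches above; the peak threshold and the point-to-line distance test are illustrative, and for simplicity the votes of all attributed pixels are removed rather than excluding the extreme segment points.

```python
import numpy as np

def detect_segments(edge_xy, d_m=2.0, t_sep=5.0, t_long=10.0, peak_min=30):
    """Modified HT: iteratively take the global Hough peak (step 3), split
    its pixels into continuous segments (steps 4-7), and remove their
    votes from the histogram (step 8), as in (16)."""
    A, thetas, rho_max = hough_accumulate(edge_xy)
    n_theta = len(thetas)
    found = []
    while A.max() > peak_min:                      # step 9 stopping rule
        r_idx, t_idx = np.unravel_index(A.argmax(), A.shape)
        theta_m, rho_m = thetas[t_idx], r_idx - rho_max
        # Step 4: pixels within distance d(m) of line l(m).
        dist = np.abs(edge_xy[:, 0] * np.cos(theta_m)
                      + edge_xy[:, 1] * np.sin(theta_m) - rho_m)
        on_line = edge_xy[dist < d_m]
        # Steps 5-7: rotate, split at gaps, keep long-enough segments.
        found.extend(split_segments(on_line, theta_m, t_sep, t_long))
        # Step 8: decrement the votes of the attributed pixels (16).
        for x, y in on_line:
            rho = x * np.cos(thetas) + y * np.sin(thetas)
            rho_q = np.round(rho + rho_max).astype(int)
            A[rho_q, np.arange(n_theta)] -= 1
    return found
```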
It can easily be concluded that the number of operations required for the proposed HT is approximately double that of the original HT. Existing techniques for speeding up the original Hough transform can also be used here; however, this is beyond the scope of this work. Simulation results for the proposed Hough-based technique are presented in the next section.
5 Simulation Results

5.1 Feature Extraction

In this section, simulation results are presented to illustrate the effectiveness of the modified HT technique of Sect. 4. The method is evaluated on the simple example of a moving cube. Several states of the cube, including various rotations, are shown in Fig. 4.

Fig. 4: Several stages of a moving cube.

Fig. 5 shows one of the cube states after edge detection. The pixel coordinates found on those edges contribute to the generation of the histogram A[\rho^{(q)}, \theta^{(q)}]. Fig. 6 presents the line segments detected for the cube states shown in Fig. 4 using the proposed Hough technique. The pixels at which line segments meet define the junction positions.

Fig. 5: Cube after edge detection.

The current example, although seemingly simple, can help in identifying some problems of the traditional HT technique, due to the relatively low image resolution, edge thickness, and parametric similarity of some line segments. For instance, the cube position presented second from the right in Fig. 4 shows a tilted lower face. The two lowest edges of that face could be erroneously considered as one line segment. Increasing the Hough histogram resolution would possibly help in reducing this type of error. However, there would be a higher possibility of causing more errors due to a lack of sufficient points per histogram bin, resulting in multiple local maxima and thus unreliable histograms. For instance, Fig. 7 presents the histogram A[\rho^{(q)}, \theta^{(q)}] for one of the cube states. It can be observed that multiple local maxima exist, which makes the selection of significant lines considerably difficult. However, the proposed method deals with one line at a time. Fig. 8 depicts the histogram after two lines are detected and the contribution of the corresponding image pixels is removed. Therefore, the third significant line in this histogram is clearer compared to the original histogram of Fig. 7.

Fig. 8: Histogram A[\rho^{(q)}, \theta^{(q)}] for the edges of the cube shown in Fig. 5 after two line segments have been removed.

Another example of line segment detection is shown in Fig. 9. In this example, one of the line segments is shorter compared to the rest, and its presence is not obvious in the traditional Hough histogram. On the other hand, once the pixels corresponding to the significant lines are removed, the line segment's presence is apparent.

5.2 Object Tracking

This section briefly illustrates the capabilities of the unscented Kalman filter (UKF) [13, 14] to estimate the 3D motion and structure parameters of a moving rigid body. The simulated scenario involved a moving cube.
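A minimal sketch of one UKF cycle on the state model (7) and measurement model (8), using the plain unscented transform; the weights, scaling parameter, and Euler discretization are generic textbook choices assumed here, not necessarily the paper's exact settings. A discretized state function can be taken as f_disc = lambda x: x + dt * f_state(x, N), with f_state from the earlier sketch, and h(x) stacking the projections (8) of all matched features.

```python
import numpy as np

def sigma_points(x, P, kappa=1.0):
    """Standard unscented-transform sigma points and weights."""
    n = len(x)
    S = np.linalg.cholesky((n + kappa) * P)
    pts = [x] + [x + S[:, i] for i in range(n)] + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 0.5 / (n + kappa))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def ukf_step(x, P, z, f_disc, h, Q, Rn):
    """One UKF predict/update cycle; Q is the process noise covariance,
    Rn the measurement noise covariance of the extracted features."""
    X, w = sigma_points(x, P)
    Xp = np.array([f_disc(xi) for xi in X])        # propagate through (7)
    x_pred = w @ Xp
    P_pred = Q + sum(wi * np.outer(d, d) for wi, d in zip(w, Xp - x_pred))
    Z = np.array([h(xi) for xi in Xp])             # project through (8)
    z_pred = w @ Z
    S = Rn + sum(wi * np.outer(d, d) for wi, d in zip(w, Z - z_pred))
    C = sum(wi * np.outer(dx, dz)
            for wi, dx, dz in zip(w, Xp - x_pred, Z - z_pred))
    K = C @ np.linalg.inv(S)
    return x_pred + K @ (z - z_pred), P_pred - K @ S @ K.T
```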
Fig. 10: RMSE: Normalized Center Position.

Fig. 12: RMSE: Normalized Position of a Feature Point.
References

[9] E. R. Davies. Application of the Generalised Hough Transform to Corner Detection. IEE Proceedings E, 135(1):49-54, Jan. 1988.

[10] T. Tuytelaars, M. Proesmans, and L. Van Gool. The Cascaded Hough Transform. In Proc. International Conference on Image Processing, pages 736-739, Oct. 1997.