Object Recognition in Infrared Image Sequences Using Scale Invariant Feature Transform
1. INTRODUCTION
Recently, visual simultaneous localization and map building (v-SLAM) has come to require multi-modal sensors, such as ultrasound sensors, range sensors, infrared (IR) sensors, encoders (odometers), and multiple visual sensors. Recognition-based localization is considered the most promising method of image-based SLAM [1]. A multi-sensor based image
system is a challenging task and has many applications, such as security systems, defense applications, and intelligent
machines. Object recognition techniques have been actively investigated and have wide applications in various fields. Combining infrared and visual images is a potential solution to improve target detection, tracking, recognition, and fusion performance. Tracking and recognition based on visual images are sensitive to variations in illumination conditions [2].
Tracking and recognition of targets using different imaging modalities, in particular infrared (IR) images, has become an
area of growing interest. Thermal IR imagery is nearly invariant to changes in ambient illumination, and provides a
capability for identification under all lighting conditions including total darkness. Infrared sensors are routinely used in
remote sensing applications. Coupling an infrared sensor with a visible band sensor - for frame of reference or for
additional spectral information - and properly processing the two information streams has the potential to provide
valuable information in night and/or poor visibility conditions [3].
Object detection and recognition techniques have proven more popular than other biometric methods because of their efficiency and convenience. They can also use a low-cost personal computer (PC) camera instead of expensive equipment, and they require minimal user interaction. Object authentication has become a promising research field related to object recognition [4]. Object recognition using infrared imaging sensors has become an area of growing interest in recent years. Thermal infrared techniques have been used in object recognition systems, which have advantages in object detection, detection of disguised objects, and object recognition under poor lighting conditions. However, thermal infrared is not always desirable because of the higher cost of thermal sensors and their instability at different temperatures. In contrast, IR has attracted more and more attention due to its preferable attributes and low cost, and it is also adopted in this paper.
Object recognition systems tend to be classified into three categories. The first category includes geometric feature-based methods, where feature vectors are used for representation. Feature vectors can decide the identity of an object as well as the validity of an object region. The most popular solution for robust recognition is the scale-invariant feature transform (SIFT) approach, which transforms an input image into a large collection of local feature vectors, each of which is invariant to image translation, scaling, and rotation [1]. Local descriptors [5] are commonly employed in a number of real-world applications such as object recognition [5] and image retrieval [6] because they can be computed efficiently, are resistant to partial occlusion, and are relatively insensitive to changes in viewpoint. In this paper, the proposed feature extraction for object recognition uses SIFT. The second category includes template-based methods: correlation-based, Karhunen-Loeve (KL) expansion, linear discriminant, singular value decomposition (SVD), matching pursuit, neural network (NN), and dynamic link methods. A template is made from multiple features. Template-based methods are very robust to illumination change, but sensitive to scale change. The third category includes model-based methods based on the hidden Markov model (HMM) for object recognition and detection. They integrate information at various scales and directions by using a probability model. Model-based methods are particularly useful in cases where feature vectors cannot decide the originality of the object template [4].
The proposed method consists of two stages. First, we localize interest points in position and scale for moving objects. Second, we build a description of each interest point and recognize the moving objects. The proposed method uses SIFT for effective feature extraction in a PowerPC-based IR imaging system [7]. The proposed SIFT method consists of scale-space extrema detection, orientation assignment, keypoint description, and feature matching. Because IR images yield fewer SIFT feature values than visual images, the SIFT descriptor covers a region about 1.5 times wider than for a visual image. Field tests suggest that objects in IR images appear in a more spread-out form than in visual images; therefore, the proposed SIFT descriptor is built over this wider region for precise matching of objects. Experimental results show that the proposed method extracts object feature values in the PowerPC-based IR imaging system. This paper is organized as follows: Section 2 describes feature extraction and recognition using the SIFT in our system. Section 3 presents a robust algorithm against Gaussian noise. Section 4 presents experimental results, and Section 5 concludes the paper.
2. FEATURE EXTRACTION AND RECOGNITION USING SIFT
SIFT features are invariant to image scaling and rotation, and partially invariant to changes in illumination. The proposed method uses SIFT for effective feature extraction in the PowerPC-based IR imaging system. The following are the major stages of computation [5] used to generate the set of IR image features:
1) Scale-space extrema detection
2) Keypoint localization
3) Orientation assignment
4) Keypoint descriptor
For object matching and recognition, SIFT features are first extracted from a set of reference IR images and stored in a database. In this paper, a new IR image is matched by individually comparing each feature from the new image to this database and finding candidate matching features based on the Euclidean distance of their feature vectors.
2.1 Scale-space extrema detection
The first stage of keypoint detection is to identify locations and scales that can be repeatedly assigned under differing views of the same object. According to SIFT [5], the scale-space kernel is the Gaussian function. The scale space of an
image is defined as a function, L(x, y, σ), that is produced from the convolution of a variable-scale Gaussian, G(x, y, σ), with an input image, I(x, y):

L(x, y, σ) = G(x, y, σ) * I(x, y),  (1)

where * is the convolution operation in x and y, and

G(x, y, σ) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²)).  (2)

To efficiently detect stable keypoint locations in scale space, scale-space extrema are found in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ).  (3)

In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below. It is selected only if it is larger than all of these neighbors or smaller than all of them.
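A minimal sketch of Eqs. (1)-(3) plus the within-level part of the extremum test; the image, σ, and k values are illustrative (Lowe uses k = 2^(1/s) per octave), and the Gaussian blur is implemented here as a separable numpy convolution rather than a production filter:

```python
import numpy as np

# Eq. (2): sampled 1-D Gaussian kernel (the 2-D Gaussian is separable).
def gaussian_kernel(sigma):
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

# Eq. (1): L(x, y, sigma) = G * I, via row then column convolution.
def blur(img, sigma):
    g = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(np.convolve, 1, img, g, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, g, mode="same")

rng = np.random.default_rng(1)
image = rng.random((64, 64))          # stand-in IR frame
sigma, k = 1.6, 2 ** 0.5              # illustrative scale parameters

# Eq. (3): D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma).
D = blur(image, k * sigma) - blur(image, sigma)

# Within-level extremum test: a sample survives only if it exceeds all 8
# neighbours at its own scale; the full test also compares against the
# 9 neighbours in the DoG levels above and below.
center = D[1:-1, 1:-1]
neighbours = np.stack([D[1 + dy:63 + dy, 1 + dx:63 + dx]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                       if (dy, dx) != (0, 0)])
is_max = np.all(center > neighbours, axis=0)
```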
2.2 Keypoint localization
Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit
to the nearby data for location, scale, and ratio of principal curvatures. Lowe's approach uses the Taylor expansion (up to the quadratic terms) of the scale-space function, D(x, y, σ), shifted so that the origin is at the sample point:

D(x) = D + (∂D^T/∂x) x + (1/2) x^T (∂²D/∂x²) x,  (4)

where D and its derivatives are evaluated at the sample point and x = (x, y, σ)^T is the offset from this point. The location of the extremum, x̂, is determined by taking the derivative of this function with respect to x and setting it to zero, giving

x̂ = −(∂²D/∂x²)^(−1) (∂D/∂x).  (5)
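Eqs. (4)-(5) can be sketched numerically: central differences over a 3x3x3 neighbourhood in (x, y, σ) give the gradient and Hessian of D, and solving the resulting linear system yields the sub-sample offset. The function name and the test cube below are hypothetical, not from the paper:

```python
import numpy as np

# Fit a 3-D quadratic around a sample (Eq. (4)) and solve Eq. (5),
# x_hat = -(d2D/dx2)^(-1) (dD/dx), for the sub-sample offset.
def refine_extremum(cube):
    # First derivatives by central differences at the centre sample.
    dx = (cube[1, 1, 2] - cube[1, 1, 0]) / 2.0
    dy = (cube[1, 2, 1] - cube[1, 0, 1]) / 2.0
    ds = (cube[2, 1, 1] - cube[0, 1, 1]) / 2.0
    grad = np.array([dx, dy, ds])

    # Second derivatives (Hessian) by central differences.
    c = cube[1, 1, 1]
    dxx = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    dyy = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    dss = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    dxy = (cube[1, 2, 2] - cube[1, 2, 0] - cube[1, 0, 2] + cube[1, 0, 0]) / 4.0
    dxs = (cube[2, 1, 2] - cube[2, 1, 0] - cube[0, 1, 2] + cube[0, 1, 0]) / 4.0
    dys = (cube[2, 2, 1] - cube[2, 0, 1] - cube[0, 2, 1] + cube[0, 0, 1]) / 4.0
    H = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])

    return -np.linalg.solve(H, grad)   # offset (x, y, sigma) from the sample

# Synthetic DoG neighbourhood: a paraboloid peaking at offset (0.2, -0.1, 0.3).
s, y, x = np.meshgrid([-1, 0, 1], [-1, 0, 1], [-1, 0, 1], indexing="ij")
cube = -((x - 0.2) ** 2 + (y + 0.1) ** 2 + (s - 0.3) ** 2)
offset = refine_extremum(cube)
```

For a pure quadratic the central differences are exact, so the recovered offset is exactly the paraboloid's peak; real DoG data only approximates this, which is why offsets larger than 0.5 trigger a re-fit at the neighbouring sample in Lowe's scheme.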
2.3 Orientation assignment
For each image sample, L(x, y), at the keypoint's scale, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences:

m(x, y) = sqrt( (L(x + 1, y) − L(x − 1, y))² + (L(x, y + 1) − L(x, y − 1))² ),  (6)

θ(x, y) = tan⁻¹( (L(x, y + 1) − L(x, y − 1)) / (L(x + 1, y) − L(x − 1, y)) ).  (7)
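Eqs. (6)-(7) can be computed for a whole image at once with array differences; the small ramp image L below is illustrative:

```python
import numpy as np

# Synthetic smoothed image L[y, x] = 5*y + x, so gradients are constant.
L = np.arange(25, dtype=float).reshape(5, 5)

dx = L[1:-1, 2:] - L[1:-1, :-2]    # L(x+1, y) - L(x-1, y)
dy = L[2:, 1:-1] - L[:-2, 1:-1]    # L(x, y+1) - L(x, y-1)

m = np.sqrt(dx ** 2 + dy ** 2)     # Eq. (6): gradient magnitude
theta = np.arctan2(dy, dx)         # Eq. (7): quadrant-aware tan^-1
```

On this ramp, dx = 2 and dy = 10 everywhere, so m = sqrt(104) and θ = atan2(10, 2) at every interior pixel; in SIFT these values feed the orientation histogram around each keypoint.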
Fig. 1. Key point description: (a) image gradients and (b) keypoint descriptor
2.4 Keypoint descriptor and feature matching
Each keypoint descriptor extracted from the input IR image is matched to the keypoints K_kj stored in our DB. The affine transformation of a model point [x y]^T to an image point [u v]^T can be written as

[u]   [m1 m2] [x]   [tx]
[v] = [m3 m4] [y] + [ty],  (8)

where the model translation is [tx ty]^T and the affine rotation, scale, and stretch are represented by the mi parameters, respectively. The positions of a keypoint pair corresponding under the affine transform are (x, y) and (u, v). To solve for the transformation parameters, the equation above can be rewritten to gather the unknowns into a column vector:

[ x  y  0  0  1  0 ] [m1]   [u]
[ 0  0  x  y  0  1 ] [m2]   [v]
[       ...        ] [m3] = [...]
                     [m4]
                     [tx]
                     [ty]  (9)
Eq. (9) shows a single match, but any number of further matches can be added, with each match contributing two more rows to the first and last matrices. At least three matches are needed to provide a solution. We can write this linear system as

A x = b.  (10)

The parameters are calculated with the pseudo-inverse: the least-squares solution for the parameters x can be determined by solving the corresponding normal equations,

x = (A^T A)^(−1) A^T b,  (11)

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations. The object can then be recognized using the estimated parameters.
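A sketch of the least-squares fit of Eqs. (8)-(11): each match contributes two rows to A and b, and the parameters are recovered with the pseudo-inverse. The true parameter values and match coordinates below are illustrative, not from the paper:

```python
import numpy as np

# Ground-truth affine parameters [m1, m2, m3, m4, tx, ty] (illustrative).
true_p = np.array([1.1, -0.2, 0.3, 0.9, 5.0, -3.0])

# Synthetic keypoint matches (x, y) -> (u, v); at least 3 are required.
rng = np.random.default_rng(2)
model_pts = rng.random((4, 2)) * 100
u = true_p[0] * model_pts[:, 0] + true_p[1] * model_pts[:, 1] + true_p[4]
v = true_p[2] * model_pts[:, 0] + true_p[3] * model_pts[:, 1] + true_p[5]

# Eq. (9): two rows of A and b per match.
rows, b = [], []
for (x, y), uu, vv in zip(model_pts, u, v):
    rows.append([x, y, 0, 0, 1, 0])
    rows.append([0, 0, x, y, 0, 1])
    b.extend([uu, vv])
A, b = np.array(rows), np.array(b)

# Eq. (11): least-squares solution x = (A^T A)^(-1) A^T b via pseudo-inverse.
params = np.linalg.pinv(A) @ b
```

With noise-free matches the recovered parameters equal the ground truth; with real (noisy, partly wrong) matches this least-squares estimate is what the verification stage checks against the image locations.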
3. ROBUST ALGORITHM AGAINST GAUSSIAN NOISE
Fig. 3. (a) Position shift and (b) generation of a false maximum extreme point by Gaussian noise
Let I_i(r, c) be the intensity at pixel (r, c) in the i-th frame, and n_i(r, c) be Gaussian noise on (r, c) at the i-th frame, where r and c represent a row and a column in the image, respectively. We assume:

n_i(r, c) ~ N(0, σ_n²).  (12)
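The Gaussian noise assumption n_i(r, c) ~ N(0, σ_n²) can be simulated directly when testing the recognition pipeline; σ_n and the frame contents below are illustrative values, not ones used in the experiments:

```python
import numpy as np

# Additive zero-mean Gaussian noise on a stand-in IR frame I_i(r, c).
rng = np.random.default_rng(3)
frame = np.full((240, 320), 128.0)
sigma_n = 5.0
noise = rng.normal(0.0, sigma_n, frame.shape)
noisy = frame + noise
```

Over a 240x320 frame the sample mean of the noise is close to 0 and its standard deviation close to σ_n, matching the model; such noisy sequences are what Figure 6 evaluates the recognition and tracking algorithms against.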
Figure 4 shows the procedure of object recognition in our system. Our system is implemented on a PowerPC-based IR imaging system. In this architecture, IR or CCD image sequences and the parameters involved in the recognition algorithms are first transmitted to the PowerPC. The recognition module can process image sequences captured by the frame grabber in the slave unit or stored in the master unit. The parameters are selected by the user and transmitted to the recognition module from the master unit.
Fig. 4. Procedure of object recognition in the PowerPC-based IR imaging system
4. EXPERIMENTAL RESULTS
In this section we present some of the experiments using image sequences with the PowerPC-based IR imaging system and the recognition and tracking algorithms. Figure 5 shows the recognition and tracking result for an object. If the recognized object disappears, our system changes to tracking mode. Figure 6 shows the recognition and tracking result for an object with Gaussian noise.
5. CONCLUSIONS
In this paper, we proposed automated target recognition using SIFT in a PowerPC-based IR imaging system. The proposed method consists of two stages. First, we localize interest points in position and scale for moving objects. Second, we build a description of each interest point and recognize the moving objects. The proposed method uses SIFT for effective feature extraction in the PowerPC-based IR imaging system. The proposed SIFT method consists of scale-space extrema detection, orientation assignment, keypoint description, and feature matching. Because IR images yield fewer SIFT feature values than visual images, the SIFT descriptor covers a region about 1.5 times wider than for a visual image. Field tests suggest that objects in IR images appear in a more spread-out form than in visual images; therefore, the proposed SIFT descriptor is built over this wider region for precise matching of objects. Experimental results show that the proposed method extracts object feature values in the PowerPC-based IR imaging system.
REFERENCES
[1] J. Lee, Y. Kim, C. Park, C. Park, and J. Paik, "Robust feature detection using 2D wavelet transform under low light environment," Proc. Intelligent Computing in Signal Processing and Pattern Recognition (ICIC 2006) (345), 1042-1050 (2006).
[2] J. Wang, J. Liang, H. Hu, Y. Li, and B. Feng, "Performance evaluation of infrared and visible image fusion algorithms for face recognition," Proc. International Conf. Intelligent Systems and Knowledge Engineering (ISKE 2007), 1-8 (2007).
[3] D. Socolinsky, A. Selinger, and J. Neuheisel, "Face recognition with visible and thermal infrared imagery," Computer Vision and Image Understanding 91(1-2), 72-114 (2003).
[4] C. Park, "Multimodal human verification using stereo-based 3D information, IR, and speech," Proc. SPIE 6543, 65431D-10 (2007).
[5] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. Journal of Computer Vision 60(2), 91-110 (2004).
[6] K. Mikolajczyk and C. Schmid, "Indexing based on scale invariant interest points," Proc. Int. Conf. Computer Vision, 525-531 (2001).
[7] J. Lee, J. Youn, and C. Park, "PowerPC-based system for tracking in infrared image sequences," Proc. SPIE 6737, 67370S-9 (2007).
Fig. 5. Experimental results of the proposed recognition and tracking algorithms: (a) 24th frame, (b) 28th frame, (c) 32nd frame, (d) 40th frame
Fig. 6. Experimental results of the proposed recognition and tracking algorithms with Gaussian noise: (a) 24th frame, (b) 28th frame, (c) 32nd frame, (d) 40th frame