Abstract
In this paper, we propose a head pose estimation method which combines a texture-based head tracking method and
the Kalman filter. The texture-based tracking method first estimates the head pose in the current frame by recovering
the relative head motion between consecutive frames. The Kalman filter predicts the head pose in the next frame, which helps the tracking method recover the motion starting from the predicted pose. Our method has been tested on a real video sequence. The experimental results show that it successfully tracks a head and improves the efficiency of the tracking method.
© 2011 Published by Elsevier B.V. Open access under CC BY-NC-ND license.
Selection and/or peer-review under responsibility of Garry Lee.
Keywords: Head Pose Estimation; Tracking Method; Kalman Filter
1875-3892. doi:10.1016/j.phpro.2011.11.066
Wang Yu and Liu Gang / Physics Procedia 22 (2011) 420 – 427 421
1. Introduction
In many computer vision applications, the 6 degrees of freedom (DOF) [1] of a head are the key to analyzing a person's motion and intentions, evaluating his focus of attention, and reconstructing a 3D face. But estimating these 3D parameters automatically and robustly remains a challenging problem. An obvious difficulty lies in calculating the rotation angles from a pixel-based representation of a head. Moreover, factors such as varying illumination, partial occlusion, and complex backgrounds also affect the estimation result.
In the last few years, a number of approaches have been proposed to solve the problem of head pose
estimation. Some of them make use of stereoscopic cameras, which can provide the depth information of
an image. But stereoscopic cameras are too expensive for some applications, so people still focus on the
monocular methods.
So far, many monocular methods have been applied to head pose estimation. Among them, tracking methods achieve high accuracy. These tracking methods estimate head pose by recovering the full motion of the head between consecutive frames of a video sequence. But most tracking methods consider only how to compute the 3D parameters of head motion from the gray or color information of two frames. In our opinion, it is helpful to take the previous state into account when recovering head motion. For example, when we detect that a head is turning right in the current frame, it is likely that the head will keep turning right in the next frame. Using this prior knowledge can increase the efficiency of estimation.
Thus, we propose a head pose estimation method combining a texture-based head tracking method and the Kalman filter [2]. The texture-based tracking method is used to estimate the head pose in the current frame. The Kalman filter predicts the head pose in the next frame by maximizing the a posteriori probability of the head pose given the previous estimates. The result of the Kalman filter prediction is then used to improve the performance of the tracking method.
The remainder of the paper is organized as follows: related work is briefly reviewed in Section 2. In Section 3, the texture-based tracking method and the Kalman filter are discussed in detail. In Section 4, experimental results demonstrate the performance of the proposed method. Section 5 gives conclusions and future work.
2. Related Work
Murphy-Chutorian and Trivedi have classified current methods of head pose estimation into eight categories [1]. Among
them, manifold embedding methods, flexible model methods, and tracking methods have been widely
investigated.
The manifold embedding methods [3] consider the high-dimensional head images as a set of
geometrically related points lying on a smooth low-dimensional manifold. For head pose estimation, a
manifold is first computed based on images with known head poses, and then a new head image is
projected onto this manifold by an embedding technique to get the head pose. These methods can obtain
satisfactory results. A difficulty, however, is obtaining a regular sampling of poses from many people when training the manifold.
The flexible model methods use non-rigid models to fit the face and can successfully operate in many
scenarios. Models such as ASM (Active Shape Model) [4] and AAM (Active Appearance Model) [5]
have been utilized in tracking faces and facial features. To estimate 3D head pose, people often train 3D face shape models. Murphy-Chutorian and Trivedi [6] used a 3D facial model to track the motion of a driver's head.
Although these models can work efficiently, they are usually very complex and computationally
expensive.
The tracking methods recover head motion by using a 3D head model. The model is projected onto a 2D face image, and the head motion is obtained by minimizing the squared difference between the rendered face image and the input frame. Texture-based tracking methods were proposed by La Cascia et al. [7] and Xiao et al. [8]; they have shown promising performance by modeling the head as a textured cylinder. Compared to other methods, tracking methods need to know a precise initial position of the head.
3. The Proposed Method

The 3D head pose estimation method proposed in this paper uses a texture-based tracking technique. Let P represent the motion parameter vector, P = [ω_x, ω_y, ω_z, t_x, t_y, t_z]^T, where t_x, t_y, t_z are the 3D translations along the three axes and ω_x, ω_y, ω_z are the rotations about them.

We use AAM to detect the frontal face and obtain an initial head pose P_0. When the kth frame arrives, the relative movement ΔP_k between the kth frame and the (k-1)th frame is recovered by the tracking method. The head pose of the kth frame is then

P_k = P_0 + Σ_{n=1}^{k} ΔP_n.
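As an illustration of this accumulation (a minimal sketch; the function name and the numerical values are ours, not from the paper, and the pose is stored as a 6-vector [ω_x, ω_y, ω_z, t_x, t_y, t_z]):

```python
import numpy as np

def accumulate_pose(p0, deltas):
    """Accumulate relative inter-frame motions onto the initial pose:
    P_k = P_0 + sum_{n=1}^{k} dP_n."""
    return p0 + np.sum(deltas, axis=0)

# Hypothetical example: frontal initial pose plus two small increments
# of yaw rotation and z-translation.
p0 = np.zeros(6)
deltas = np.array([[0.02, 0, 0, 0, 0, 0.1],
                   [0.03, 0, 0, 0, 0, 0.1]])
pk = accumulate_pose(p0, deltas)  # -> [0.05, 0, 0, 0, 0, 0.2]
```

Note that adding rotation angles component-wise is only a valid approximation when the inter-frame rotations are small, which is the regime the tracker operates in.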
The Kalman filter algorithm contains two phases: the prediction phase and the correction phase. The prediction phase predicts the head pose in a new image, which helps the tracking method estimate the head pose in that image. Finally, the correction phase updates the estimate and the error covariance for the next round of prediction. The flow chart of our method is shown in Fig. 1.
Fig. 1. Flow chart of the proposed method. For each input image, if the detected head pose is frontal, it is used to initialize the tracker; otherwise the prediction phase of the Kalman filter predicts the head motion ΔP for the next frame, and the tracking method estimates the head pose P starting from ΔP.
3.1. Frontal Face Detection

A difficulty of tracking methods is the requirement of an accurate initialization of the head pose. Generally, the frontal view of the head is adopted as the initial pose. As a fast and accurate algorithm, the Adaboost algorithm has been widely applied to frontal face detection. But when the head rotates by a small angle, the Adaboost algorithm still considers the face frontal. To avoid this problem, we use AAM to detect facial features and determine whether the face is frontal.
Active Appearance Models (AAMs) [10] are generative face models, which contain a statistical model of the shape and grey-level appearance of the face. The facial feature points located by AAM are shown in Fig. 2. Once certain facial features, such as the eyes and mouth, are correctly found, a good determination of the frontal face can be obtained by exploiting face symmetry. In a face region, the eyes and mouth are assumed approximately coplanar, and the tip of the nose lies on the symmetry axis of the face. The eyes and mouth are assumed to be symmetrically positioned on each side of this axis. Here we select seven feature points: the nose tip, two outer eye corners, two inner eye corners, and two mouth corners. The properties determining a frontal face are as follows:
(1) The four eye corners lie on the same horizontal line;
(2) The vertical distance between the left outer eye corner and the nose tip is approximately equal to that between the right outer eye corner and the nose tip;
(3) The length of the left eye is approximately equal to that of the right eye;
(4) The ratio of two lengths, D1/D2, is larger than 1.8, where D1 is the distance between the two outer eye corners and D2 is the vertical distance between the nose tip and the bottom lip.
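The four properties can be checked directly on the feature point coordinates. The sketch below is our illustration: the tolerance `tol` (a fraction of the outer eye-corner distance) is an assumed value, and the bottom-lip point, which is not among the seven selected points, is approximated by the midpoint of the two mouth corners:

```python
import numpy as np

def is_frontal(l_out, l_in, r_in, r_out, nose, mouth_l, mouth_r,
               tol=0.1, ratio_thresh=1.8):
    """Test the four frontal-face symmetry properties on 2D feature points
    (eye corners ordered left to right, nose tip, two mouth corners)."""
    l_out, l_in, r_in, r_out, nose, mouth_l, mouth_r = (
        np.asarray(p, float)
        for p in (l_out, l_in, r_in, r_out, nose, mouth_l, mouth_r))
    d1 = np.linalg.norm(r_out - l_out)  # outer eye-corner distance D1
    eps = tol * d1
    # (1) the four eye corners lie on one horizontal line
    ys = [l_out[1], l_in[1], r_in[1], r_out[1]]
    horizontal = max(ys) - min(ys) < eps
    # (2) nose tip vertically equidistant from the two outer corners
    equidist = abs(abs(nose[1] - l_out[1]) - abs(nose[1] - r_out[1])) < eps
    # (3) left and right eye lengths approximately equal
    eye_lens = abs(np.linalg.norm(l_in - l_out)
                   - np.linalg.norm(r_in - r_out)) < eps
    # (4) D1 / D2 > 1.8, with the bottom lip approximated by the mouth midpoint
    lip = (mouth_l + mouth_r) / 2.0
    d2 = abs(lip[1] - nose[1])
    ratio_ok = d2 > 0 and d1 / d2 > ratio_thresh
    return bool(horizontal and equidist and eye_lens and ratio_ok)
```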
3.2. Texture-based Tracking Method
Once a frontal face is detected, the head pose is assigned an initial value. The tracking method then estimates the head pose by recovering the relative movement of the head between frames. We choose texture-based tracking because it can detect small shifts of the head.
To track the 6 DOF of head motion, a 3D cylindrical model is used to approximately represent a human
head. In texture-based techniques, the head tracking problem is solved in terms of image registration.
After initialization, the head region in each frame is extracted to create a texture. The texture is mapped
onto a cylindrical surface which consists of 40 triangles, and the textured head model is then rendered on
an image depending on the 3D position and orientation of the cylinder. In fact, image registration is
executed between the rendered image and a new frame.
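As a rough sketch of the underlying geometry, the code below maps a point on the cylinder surface through the rigid motion encoded in P and a pinhole camera. The cylinder parametrization, radius, and focal length are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from a rotation vector (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project_cylinder_point(u, v, pose, radius=1.0, focal=500.0):
    """Map a cylinder surface point (angle u, height v) through the rigid
    motion pose = [wx, wy, wz, tx, ty, tz] and project it with a pinhole
    camera of focal length `focal` looking down the z-axis."""
    x = np.array([radius * np.sin(u), v, radius * np.cos(u)])  # model point
    R = rodrigues(np.asarray(pose[:3], float))
    t = np.asarray(pose[3:], float)
    X = R @ x + t                                   # point in camera frame
    return focal * X[0] / X[2], focal * X[1] / X[2]  # perspective projection
```

Rendering the textured model amounts to applying this mapping to the vertices of the 40 triangles and interpolating the texture in between.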
To implement texture-based registration, an iterative image registration algorithm is performed. The algorithm first assumes that the position and rotation of the head in the new frame are the same as in the rendered image. Under this assumption, the mean error between the new frame and the rendered image may be very small. If the error is larger than a threshold, however, the head pose has changed, and the algorithm searches for the motion parameters. This process consists of multiple iterations. In every iteration, the algorithm uses spatial intensity gradient information and calculates the motion parameter vector P by equation (1).
P = ( Σ_Ω (I_u F_P)^T (I_u F_P) )^{-1} Σ_Ω I_t (I_u F_P)^T    (1)

where Ω denotes the set of pixels in the head region, I_u is the spatial image gradient, I_t is the temporal intensity difference between the two images, and F_P is the Jacobian of the motion model. Each pixel's contribution is weighted according to the angle θ between the normal of its triangle and the direction to the camera center.
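A single least-squares step of this form can be sketched as follows (our illustration; `Iu_F` stacks the per-pixel 1×6 products of image gradient and motion Jacobian, `It` holds the per-pixel temporal differences, and the triangle-angle weighting is omitted for brevity):

```python
import numpy as np

def motion_step(Iu_F, It):
    """Solve the normal equations of equation (1) for the 6-vector P:
    P = (sum (Iu F)^T (Iu F))^{-1} sum It (Iu F)^T.
    Iu_F: (n_pixels, 6) array, It: (n_pixels,) array."""
    A = Iu_F.T @ Iu_F   # 6x6 normal matrix, summed over all pixels
    b = Iu_F.T @ It     # 6-vector right-hand side
    return np.linalg.solve(A, b)
```

When the temporal differences are exactly explained by a motion P, i.e. It = Iu_F @ P, this step recovers P in one solve; in practice it is iterated until convergence.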
When implementing the tracking method, we utilize an image pyramid, which represents the image at different resolutions. The parameters are first computed at the top (coarsest) layer of the pyramid, and then propagated to the lower layers.
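A minimal pyramid construction might look like the following sketch, which uses 2×2 block averaging as a simple stand-in for the usual Gaussian downsampling:

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Build an image pyramid by 2x2 block averaging; level 0 is the
    original resolution and the last level is the coarsest."""
    pyr = [np.asarray(img, float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2  # crop to even size
        a = a[:h, :w]
        pyr.append((a[0::2, 0::2] + a[1::2, 0::2]
                    + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return pyr  # coarse-to-fine search starts at pyr[-1]
```

Coarse layers make the registration robust to large motions; the fine layers then refine the parameters.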
A major problem of texture-based tracking methods is that they search for the best transformation starting from the position and rotation of the head in the last frame. When the current pose differs greatly from the previous one, the algorithm may spend much computation time to converge. However, if the parameters are precisely predicted, the tracking method can start searching from a pose near the actual values. In this way, the number of search iterations can be decreased effectively.
3.3. Pose Prediction with the Kalman Filter

For pose prediction, we choose the Kalman filter. The Kalman filter is computationally efficient, and it can model noise explicitly. It requires a process model of head motion and a measurement model. The process model governs the dynamic relationship between the states of two successive time steps, and can be defined by equation (3):

X_{k+1} = A X_k + W_k    (3)

where X_k is the state vector at the kth frame, containing six variables for the translation and rotation of the head, six for velocities, and six for accelerations, and W_k is the process noise.
The measurement matrix H_k of equation (8) selects the six pose components from the 18-dimensional state vector:

H_k = [ I_6  0_{6×6}  0_{6×6} ]    (8)

where I_6 is the 6×6 identity matrix and 0_{6×6} is the 6×6 zero matrix.
After processing the kth frame, the Kalman filter provides an optimal estimate of the current state X_k using the current measurement P_k. It then predicts the future state X_{k+1} using the underlying process model. As the process model takes the previous head pose, velocities, and accelerations into account, X_{k+1} provides a good prediction of the head pose in the next frame.
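The two phases can be sketched compactly as below. Since the paper's process matrix A is not reproduced in the text above, its form here is our assumption of the standard constant-acceleration model; the measurement matrix matches equation (8):

```python
import numpy as np

def make_models(dt=1.0, n=6):
    """Constant-acceleration process matrix A for the 18-dim state
    [pose(6), velocity(6), acceleration(6)] and the measurement matrix H
    of equation (8), which extracts the six pose components."""
    I, Z = np.eye(n), np.zeros((n, n))
    A = np.block([[I, dt * I, 0.5 * dt * dt * I],
                  [Z, I,      dt * I],
                  [Z, Z,      I]])
    H = np.hstack([I, np.zeros((n, 2 * n))])
    return A, H

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/correct cycle of the discrete Kalman filter.
    x: state, P: error covariance, z: pose measurement from the tracker,
    Q/R: process and measurement noise covariances."""
    x_pred = A @ x                           # prediction phase
    P_pred = A @ P @ A.T + Q
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)    # correction phase
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In the proposed method, H @ (A @ x) would serve as the predicted pose handed to the tracker, and the tracker's output z drives the correction.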
4. Experiment Results
The experiments for the methods described above were carried out in a laboratory environment. We implemented our methods using the OpenGL and OpenCV libraries. To evaluate the image registration algorithm, we use the video sequences from Boston University [9]. We first choose a frame with a frontal face, and then test the algorithm on the other frames.
Fig.3 shows that the image registration algorithm performs quite well when the angle of rotation is not
very large. By updating the texture for every frame, the texture-based tracking method can recover head
motion effectively.
The numbers of iterations that the texture-based tracking algorithm spent on the frames in Fig. 3 are recorded in Table 1. The table also gives the numbers of iterations when the tracking algorithm uses the pose prediction provided by the Kalman filter. Table 1 demonstrates that using the Kalman filter reduces the number of search iterations and saves computation time.
Table 1. Numbers of iterations of the tracking algorithm, with and without Kalman pose prediction.

Frame               a    b    c    d
No pose prediction  23   27   41   24
Pose prediction     16   17   31   16
5. Conclusion
In this paper, a tracking method combined with the Kalman filter has been proposed for estimating head pose. The method uses a texture-based tracking technique to estimate the head pose and the Kalman filter to predict the head motion. The experiments show that our method decreases the computation time of the tracking method. In the future, we plan to apply it in more complex environments.
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (2010-IV-
064).
References
[1] E. Murphy-Chutorian, M. M. Trivedi, Head pose estimation in computer vision: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, pp. 607-626, IEEE Press, Washington (2009)
[2] M. Perse, J. Pers, Physics-based modelling of human motion using Kalman filter and collision avoidance algorithm. In: 4th International Symposium on Image and Signal Processing and Analysis, pp. 328-333, IEEE Press, Zagreb (2005)
[3] C. Shan, W. Chan, Head pose estimation using spectral regression discriminant analysis. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-8, IEEE Press, Miami (2009)
[4] P. Xiong, L. Huang, C. Liu, Initialization and pose alignment in active shape model. In: 2010 International Conference on Pattern Recognition, pp. 3971-3974, IEEE Press, Istanbul (2010)
[5] T. F. Cootes, G. J. Edwards, C. J. Taylor, Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, pp. 681-685, Washington (2001)
[6] E. Murphy-Chutorian, M. M. Trivedi, Head pose estimation and augmented reality tracking: an integrated system and evaluation for monitoring driver awareness. IEEE Transactions on Intelligent Transportation Systems, Vol. 11, pp. 300-311, IEEE Press, Washington (2010)
[7] M. La Cascia, S. Sclaroff, V. Athitsos, Fast, reliable head tracking under varying illumination: an approach based on registration of texture-mapped 3D models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, pp. 322-336, IEEE Press, Washington (2000)
[8] J. Xiao, T. Moriyama, T. Kanade, J. F. Cohn, Robust full-motion recovery of head by dynamic templates and re-registration techniques. In: 5th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 85-94, IEEE Press, Washington (2003)
[9] http://csr.bu.edu/headtracking/