
Advances in Electrical and Computer Engineering Volume 9, Number 3, 2009

Feature Extraction for Facial Expression Recognition based on Hybrid Face Regions
Seyed M. LAJEVARDI, Zahir M. HUSSAIN
RMIT University, Australia
seyed.lajevardi@rmit.edu.au

Abstract—Facial expression recognition has numerous applications, including psychological research, improved human computer interaction, and sign language translation. A novel facial expression recognition system based on hybrid face regions (HFR) is investigated. The expression recognition system is fully automatic and consists of the following modules: face detection, facial detection, feature extraction, optimal feature selection, and classification. The features are extracted from both the whole face image and face regions (eyes and mouth) using log Gabor filters. Then, the most discriminative features are selected based on mutual information criteria. The system can automatically recognize six expressions: anger, disgust, fear, happiness, sadness and surprise. The selected features are classified using the Naive Bayesian (NB) classifier. The proposed method has been extensively assessed using the Cohn-Kanade and JAFFE databases. The experiments have highlighted the efficiency of the proposed HFR method in enhancing the classification rate.

Index Terms—Facial expression recognition, Gabor filters, Face regions, Human computer interaction, Feature extraction

I. INTRODUCTION

Over the last decade, a growing interest in human computer interaction (HCI) systems has developed. Automated facial expression recognition is an important task in HCI systems that include emotion processing. Humans are capable of producing thousands of facial actions during communication that vary in complexity, intensity, and meaning. Emotion or intention is often communicated by subtle changes in one or several discrete facial features. The addition or absence of one or more facial actions may alter the interpretation of an expression. In addition, some facial expressions may have a similar gross morphology but indicate varied meanings at different expression intensities.

Automatic facial expression analysis is a flourishing area of research in computer science. Problems that have been tackled previously include the tracking of facial expression in static images and video sequences, the transfer of expressions to novel faces, the repurposing of a person's expression onto a virtual model, and the recognition of facial expression. These recognition tasks have focused on the classification of emotional expressions [1], the classification of complex mental states [2], or the automatic recognition of FACS action units [3].

A problem frequently encountered in each of these tasks is that of partial occlusions. Occlusions can introduce errors into the predicted expression or result in an incorrect expression being transferred to a virtual head. One type of partial occlusion is a temporary occlusion, caused by part of the face being obscured momentarily by an object, or by a person moving their head so that not all features of the face can be seen by the camera. Another type is a systematic occlusion, caused for instance by a person wearing something such as a head-mounted display, which makes the features of the upper half of the face invisible. Systematic occlusions are potentially more damaging, since they hide whole features of relevance to judging facial expression.

Plenty of work has been done on facial expression recognition [15], [17], [19], [21]. In this study, we investigate the part of the face that contains the most discriminative information for facial expression recognition and propose a hybrid face region method for feature extraction. Automatic classification of facial expressions consists of two stages: feature extraction and feature classification. Feature extraction is extremely important to the whole classification process: if inadequate features are used, even the best classifier can fail to achieve accurate recognition. In most cases of facial expression classification, the feature extraction process yields a prohibitively large number of features, and a smaller sub-set of features subsequently needs to be selected according to some optimality criteria.

The Gabor wavelet feature representation has shown high performance in the recognition of facial actions from image sequences. Although Gabor wavelet facial feature representations have been widely adopted [6], [7], [20], it is computationally expensive to convolve the face images with multi-level banks of Gabor filters in order to extract the scale and orientation coefficients. Furthermore, Gabor wavelet analysis suffers from two major limitations: the maximum bandwidth of a Gabor filter is limited to approximately one octave, and Gabor filters are not optimal when the objective is to achieve broad spectral information with maximum spatial localization. These drawbacks can be overcome by using the logarithmic form of the Gabor filters in the feature extraction process. The log Gabor filters are known to provide excellent simultaneous localization of spatial and frequency information; however, the dimensionality of the resulting data is high. This dimensionality can be reduced by selecting a small sub-set of the log Gabor features based on specified optimality criteria.

The reduction can be achieved by selecting the more informative features using feature selection and data reduction methods such as principal component analysis (PCA), independent component analysis (ICA), and mutual information (MI) [10], [11], [12]. In this paper, mutual information [25] is investigated as the means to select the optimum features for classification. In contrast to classical correlation-based feature selection methods, mutual information can measure arbitrary relations between variables and does not depend on transformations applied to the individual variables. It can therefore be useful in problems where methods based on linear relations between the data do not perform well. Figure 1 illustrates a block diagram of the proposed facial expression recognition system.

Figure 1. Block diagram of proposed system.

Figure 2. Sample static images from JAFFE database.

II. IMAGE DATASET


All the facial expression classification tests described here are performed using two popular databases: the JAFFE database [6] and the Cohn-Kanade database [1]. The Japanese Female Facial Expression (JAFFE) database contains 213 images of the six basic facial expressions (happiness, anger, disgust, fear, sadness and surprise) as well as the neutral expression. The images were taken of 10 Japanese female models, and the emotions expressed in each picture were subjectively rated by 60 Japanese volunteers. A sample of images from the JAFFE database is shown in Figure 2.

The Cohn-Kanade database includes 388 image sequences from 100 subjects, each sequence containing 12-16 frames. The subjects' ages ranged from 18 to 30 years. Sixty-five percent of the subjects were female and thirty-five percent were male; fifteen percent of the subjects came from an African-American background, and three percent from an Asian or Latino-American background. The image sequences show the subjects expressing different stages of an expression's development, starting from a low arousal stage, reaching a peak of arousal and then declining. The facial expressions of each subject represent the six basic emotions: anger, disgust, fear, happiness, sadness and surprise. Some subjects did not have image sequences corresponding to all six expressions, and in some cases only one image sequence per expression was available. Figure 3 shows an example of image sequences from the Cohn-Kanade database.

Figure 3. Image sequences from Cohn-Kanade database.

III. IMAGE PREPROCESSING

The image pre-processing procedure is a very important step in the facial expression recognition task. The aim of the pre-processing phase is to obtain images which have normalized intensity and uniform size and shape, and which depict only a face expressing a certain emotion. The pre-processing procedure should also eliminate the effects of illumination and lighting. The face area of an image is detected using the Viola-Jones method [4], based on Haar-like features and the AdaBoost learning algorithm. The Viola and Jones method is an object detection algorithm providing competitive object detection rates in real time, and it is primarily designed for the problem of face detection. The features used by Viola and Jones are derived from pixels selected from rectangular areas imposed over the picture and show high sensitivity to vertical and horizontal lines.
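As an illustration of this detection step, here is a minimal sketch using OpenCV's pretrained frontal-face Haar cascade (OpenCV is our choice for illustration, not the authors' implementation; the image name and detection parameters are assumptions):

```python
import cv2

# OpenCV ships a pretrained frontal-face Haar cascade trained with AdaBoost.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("subject.png")                  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                    # crude illumination normalization

# Scan the image at multiple scales with the boosted cascade of Haar-like features.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(48, 48))

for (x, y, w, h) in faces:
    face = cv2.resize(gray[y:y + h, x:x + w], (128, 128))  # uniform size
```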

AdaBoost is an adaptive learning algorithm that can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that classifiers built in subsequent iterations are made to fix instances misclassified by previous classifiers. At each iteration, a distribution of weights is updated such that the weights of incorrectly classified examples are increased (or, alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples.

The final stage of pre-processing for image sequences is the detection of the frame which depicts the emotion at its maximum intensity. A new method based on mutual information (MI), called facial detection, is used [21], [26]. For each frame, the mutual information between the current frame and the initial frame is calculated, and the frame with the minimum mutual information is selected as the frame that represents the emotion with the maximum intensity. Finally, the images are scaled to the same size. Figure 4 shows the images after the pre-processing step.
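This peak-frame selection can be sketched as follows, assuming 8-bit grayscale frames and a histogram-based MI estimate (the function names and bin count are ours):

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Histogram estimate of I(A;B) between two grayscale frames."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def peak_expression_frame(frames):
    """Return the frame least similar (lowest MI) to the neutral first frame."""
    scores = [mutual_information(frames[0], f) for f in frames[1:]]
    return frames[1 + int(np.argmin(scores))]
```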

Figure 4. Images after pre-processing step.

For face region location, we use a face model to extract the eyes and the mouth from the face image. The locations of the eyes and mouth are shown in Figure 5. Samples of the eye and mouth images used for the training and testing sets are illustrated in Figure 6 and Figure 7.

Figure 5. Face template used for eyes and mouth detection.

Figure 6. Eye images used for training and testing.

Figure 7. Mouth images used for training and testing.

IV. FEATURE EXTRACTION

Gabor filters are commonly recognized as one of the best choices for obtaining localized frequency information. However, they suffer from two major limitations: the maximum bandwidth of a Gabor filter is limited to approximately one octave, and Gabor filters are not optimal if one is seeking broad spectral information with maximal spatial localization. Log Gabor filters, proposed by Field [13], circumvent these limitations. They always have a null DC component and can be constructed with an arbitrary bandwidth, which can be optimized to produce a filter with minimal spatial extent. Log Gabor filters can be defined in the frequency domain, in polar coordinates, by H(f, θ) = Hf × Hθ, where Hf is the radial component and Hθ the angular one:

H(f,\theta) = \exp\left\{ -\frac{[\ln(f/f_0)]^2}{2[\ln(\sigma_f/f_0)]^2} \right\} \exp\left\{ -\frac{(\theta-\theta_0)^2}{2\sigma_\theta^2} \right\}    (1)

where f0 is the filter centre frequency and θ0 the filter direction. The constant σf defines the radial bandwidth B in octaves:

B = 2\sqrt{2/\ln 2} \cdot \left| \ln(\sigma_f/f_0) \right|    (2)

and the constant σθ defines the angular bandwidth ΔΩ in radians:

\Delta\Omega = 2\sigma_\theta \sqrt{2\ln 2}    (3)

In this study, the ratio σf/f0 is kept constant for varying f0, B is set to one octave, and the angular bandwidth is set to ΔΩ = π/4 radians. There then remains only σf to be determined for a varying value of f0.

Several scales and orientations are implemented to extract features from the face images. This leads to different filter transfer functions representing different scales and orientations. The image filtering is performed in the frequency domain, which makes the process faster than convolution in the spatial domain. After the 2D FFT transformation into the frequency domain, the image arrays I(x,y) are changed into spectral vectors (If) and multiplied by the log Gabor transfer functions, producing different spectral representations for each image. The spectra are then transformed back to the spatial domain via the 2D inverse FFT.
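For illustration, a minimal NumPy sketch of one log Gabor transfer function and the frequency-domain filtering described above, following Equation 1 (the frequency-grid construction and default parameter values are our assumptions, not the authors' code):

```python
import numpy as np

def log_gabor_bank_filter(image, f0=0.1, theta0=0.0,
                          sigma_ratio=0.75, sigma_theta=np.pi / 8):
    """Filter `image` with one log Gabor transfer function H(f, theta) (Eq. 1).

    sigma_ratio is sigma_f/f0; 0.75 gives roughly a one-octave bandwidth (Eq. 2).
    """
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    theta = np.arctan2(fy, fx)

    f = np.where(f == 0, 1e-9, f)  # avoid log(0); the filter has no DC response anyway
    radial = np.exp(-(np.log(f / f0))**2 / (2 * np.log(sigma_ratio)**2))
    dtheta = np.arctan2(np.sin(theta - theta0), np.cos(theta - theta0))  # wrapped angle
    angular = np.exp(-dtheta**2 / (2 * sigma_theta**2))

    H = radial * angular
    H.flat[0] = 0.0  # enforce the null DC component

    # Multiply in the frequency domain, then return to the spatial domain.
    return np.fft.ifft2(np.fft.fft2(image) * H)  # complex; magnitude often used as feature
```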

To create the feature vector based on the HFR method, we extract the features from the eyes and the mouth and concatenate them with the whole-face features (Equation 4):

\vec{f}_{HFR} = \vec{f}_{eyes} \cup \vec{f}_{mouth} \cup \vec{f}_{face}    (4)

This process results in a prohibitively large number of feature arrays. For large training and testing sets, the computations become highly impractical. In order to improve the computational efficiency, it is critical to reduce the feature dimensions. This is achieved using a feature selection process [5], [10], [25], [26].
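A sketch of Equation 4, reusing the log_gabor_bank_filter function from the previous sketch; the bank of (scale, orientation) pairs below is hypothetical:

```python
import numpy as np

def hfr_features(face_img, eyes_img, mouth_img, filter_bank):
    """Concatenate log Gabor magnitude features from eyes, mouth and whole face (Eq. 4)."""
    parts = [np.abs(log_gabor_bank_filter(region, f0=f0, theta0=t0)).ravel()
             for region in (eyes_img, mouth_img, face_img)
             for (f0, t0) in filter_bank]   # one response per (scale, orientation) pair
    return np.concatenate(parts)

# Hypothetical bank: 3 scales and 6 orientations, the configuration chosen in Section VII.
bank = [(f0, t0) for f0 in (0.1, 0.2, 0.4)
                 for t0 in np.linspace(0, np.pi, 6, endpoint=False)]
```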
V. FEATURE SELECTION

An optimal subset of features is selected on the basis of the mutual information (MI) criterion [10], [25]. The mutual information is a measure of the information shared by two random variables, say X and Y, and is given as:

I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}    (5)

where p(x) is the probability density function (pdf), defined as p(x) = Pr{X = x}, and p(x,y) is the joint pdf, defined as p(x,y) = Pr{X = x and Y = y}. The MI can also be expressed in terms of the entropy:

I(X;Y) = H(X) - H(X|Y)    (6)

where H(X) is the entropy of a random variable X, given as:

H(X) = -\sum_{x \in X} p(x) \log p(x)    (7)

and H(X|Y) in Equation 6 is the conditional entropy, given as:

H(X|Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(x|y)    (8)
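As a small worked check of Equations 5-7, consider two binary variables with an illustrative joint distribution (the numbers are ours):

```python
import numpy as np

# Joint pmf of two binary variables X, Y (rows: x, cols: y) -- illustrative numbers.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)

# Equation 5: I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x)p(y)) )
mi = (pxy * np.log2(pxy / (px * py))).sum()          # ~0.278 bits

# Equation 6 check: I(X;Y) = H(X) - H(X|Y)
hx = -(px * np.log2(px)).sum()                       # 1 bit
hx_given_y = -(pxy * np.log2(pxy / py)).sum()        # ~0.722 bits
assert np.isclose(mi, hx - hx_given_y)
```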
The mutual information feature selection (MIFS) algorithm described in [10] is applied to perform the feature selection. In this approach, starting from the empty set, the best available feature vectors are added one by one to the selected feature set until the size of the set reaches the desired value NS. The sub-set S of feature vectors is selected through a simultaneous maximization of the mutual information between the selected feature vectors in S and the class labels C, and a minimization of the mutual information between the selected feature vectors within S:

I(C; f_i | S) = I(C; f_i) - \beta \sum_{s_k \in S} I(f_i; s_k)    (9)

where I(C; fi) is the MI between a candidate feature and the class label, I(fi; sk) is the MI between an already selected feature and the candidate feature, and β weights the redundancy penalty. As a result, an optimal sub-set S ⊂ F of mutually independent and highly representative feature vectors is obtained.
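A greedy sketch of the MIFS loop around Equation 9, reusing the mutual_information estimator sketched in Section III (the array layout and β value are assumptions):

```python
def mifs_select(features, labels, n_select, beta=0.5, mi=mutual_information):
    """Greedy MIFS [10]: maximize I(C; f) - beta * sum of I(f; s) over selected s (Eq. 9).

    `features` has shape (n_samples, n_features); `labels` holds the class indices.
    """
    remaining = list(range(features.shape[1]))
    selected = []
    while len(selected) < n_select:
        def score(i):
            relevance = mi(features[:, i], labels)
            redundancy = sum(mi(features[:, i], features[:, s]) for s in selected)
            return relevance - beta * redundancy
        best = max(remaining, key=score)      # feature maximizing Eq. 9
        selected.append(best)
        remaining.remove(best)
    return selected
```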

VI. CLASSIFICATION

The Naive Bayesian (NB) classifier is a probabilistic method that has been shown to be effective in many classification problems [8], [14]. It assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. Using c to represent the value of the class variable and {f1, ..., fk} for the features, the classification decision is made using the following formula:

C = \arg\max_{c} \left\{ p(c) \prod_{j=1}^{k} p(f_j \mid c) \right\}    (10)

where p(c) = (number of samples in class c) / (total number of samples), the p(fj | c) are conditional tables (or conditional densities) learned from examples during training, and k is the length of the feature vector. Despite the independence assumption, NB has been shown to have very good classification performance for many real data sets, on par with many more sophisticated classifiers.
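A minimal sketch of the decision rule in Equation 10. The paper does not specify the form of p(fj | c) for continuous log Gabor features, so per-feature Gaussian densities are assumed here:

```python
import numpy as np
from scipy.stats import norm

def nb_train(X, y):
    """Estimate priors p(c) and per-feature Gaussian densities p(f_j | c)."""
    classes = np.unique(y)
    priors = {c: float(np.mean(y == c)) for c in classes}
    params = {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0) + 1e-9)
              for c in classes}
    return classes, priors, params

def nb_predict(x, classes, priors, params):
    """Equation 10 in log space: argmax_c [ log p(c) + sum_j log p(f_j | c) ]."""
    def log_posterior(c):
        mu, sd = params[c]
        return np.log(priors[c]) + norm.logpdf(x, mu, sd).sum()
    return max(classes, key=log_posterior)
```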

VII. EXPERIMENTS AND RESULTS

In this study, we used the JAFFE (JF) and Cohn-Kanade (C-K) databases to train and test the facial expression recognition system. Each test was performed 3 times using randomly selected testing and training sets, and the average result was calculated. Training was done for six expressions (C = 6). The subjects represented in the training set were not included in the testing set of images, thus ensuring a person-independent classification of facial expressions. Automatic face detection, facial detection and face region detection were used, and the faces were also scaled. The test images were classified using the log Gabor filter for feature extraction and the naïve Bayesian classifier.

We extracted the features for different scales and orientations and tested them with the naïve Bayesian classifier in order to choose the best scale and orientation for the log Gabor filters. Figure 8 illustrates the recognition rate for five scales and eight orientations using the C-K database. As a result, we chose log Gabor filters with 3 scales and 6 orientations, which gave the maximum accuracy, for our experiments.

Figure 8. Percentage of recognition rate for 5 scales and 8 orientations based on log Gabor filters.

Furthermore, expression recognition from different face regions was considered. The classification accuracy for different numbers of features is shown in Figure 9. The accuracy results for both the C-K and JF databases, based on the HFR method using the log Gabor filter and mutual information, are shown in Table I. The recognition rate increased from 87% and 93% on the basis of the whole face image to 91.8% and 97.9% with the HFR method for the C-K and JF databases respectively. Overall, the accuracy improved by nearly 5% for both databases.
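The person-independent protocol described above (three random splits, with no subject shared between training and testing) can be approximated as follows; scikit-learn's GaussianNB is a stand-in for the NB classifier, and the split size is an assumption:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.naive_bayes import GaussianNB

def evaluate(X, y, subject_ids, n_repeats=3, test_size=0.3, seed=0):
    """Average accuracy over repeated splits that keep each subject on one side only."""
    splitter = GroupShuffleSplit(n_splits=n_repeats, test_size=test_size,
                                 random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X, y, groups=subject_ids):
        clf = GaussianNB().fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```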

Figure 9. The correct classification rate for different numbers of features. CK: Cohn-Kanade database, JF: JAFFE database.

VIII. CONCLUSION

Facial expression recognition based on hybrid face regions (HFR) has been investigated to determine the part of the face that contains the most discriminative information for the facial expression classification task. The feature vectors were extracted from the whole face, eye, and mouth images by multiplying the images with log Gabor filters in the frequency domain. Then, the most informative features were selected on the basis of mutual information, and the selected features were classified using the Naïve Bayesian classifier. We compared the classification results against those obtained when only the eye or mouth regions are used. Overall, we found the proposed HFR method to be robust in recognizing different expressions.

TABLE I. PERCENTAGE OF CORRECT CLASSIFICATION BASED ON HFR METHOD.

(a) Cohn-Kanade database

           Anger  Disgust  Fear   Happy  Sad    Surprise
Anger      81.2   10.1     0.0    0.0    8.7    0.0
Disgust    8.7    85.1     6.3    0.0    0.0    0.0
Fear       0.0    0.0      95.5   4.5    0.0    0.0
Happy      0.0    0.0      3.0    97.0   0.0    0.0
Sad        8.0    0.0      0.0    0.0    92.0   0.0
Surprise   0.0    0.0      0.0    0.0    0.0    100
Average: 91.8

(b) JAFFE database

           Anger  Disgust  Fear   Happy  Sad    Surprise
Anger      95.8   0.0      0.0    0.0    4.2    0.0
Disgust    0.0    96.5     3.5    0.0    0.0    0.0
Fear       0.0    0.0      97.9   2.1    0.0    0.0
Happy      0.0    0.0      1.5    98.5   0.0    0.0
Sad        1.6    0.0      0.0    0.0    98.4   0.0
Surprise   0.0    0.0      0.0    0.0    0.0    100
Average: 97.9

REFERENCES

[1] Kanade, T., Cohn, J. F., Tian, Y., "Comprehensive database for facial expression analysis", Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, pp. 46-53, 2000.
[2] Kaliouby, R. E., Robinson, P., "Real-time inference of complex mental states from facial expressions and head gestures", Conference on Computer Vision and Pattern Recognition Workshop, vol. 3, pp. 181-200, 2004.
[3] Tian, Y., Kanade, T., Cohn, J. F., "Recognizing action units for facial expression analysis", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 97-115, 2001.
[4] Viola, P., Jones, M., "Robust real-time object detection", International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004.
[5] Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. A., "Feature Extraction: Foundations and Applications", Springer, 2006.
[6] Lyons, M., Akamatsu, S., Kamachi, M., Gyoba, J., "Coding facial expressions with Gabor wavelets", FG '98: Proceedings of the 3rd International Conference on Face and Gesture Recognition, Washington, USA, 1998.
[7] Zheng, D., Zhao, Y., Wang, J., "Features extraction using a Gabor filter family", Proceedings of the Sixth IASTED International Conference on Signal and Image Processing, Hawaii, USA, 2004.
[8] Rish, I., "An empirical study of the naive Bayes classifier", IJCAI Workshop on Empirical Methods in Artificial Intelligence, vol. 335, pp. 41-46, 2001.
[9] Chibelushi, C. C., Bourel, F., "Facial Expression Recognition: A Brief Tutorial Overview", 2003.
[10] Battiti, R., "Using mutual information for selecting features in supervised neural net learning", IEEE Trans. on Neural Networks, vol. 5, no. 4, pp. 537-550, 1994.
[11] Liu, F., Wang, Z., Wang, L., Meng, X., "Facial expression recognition using HLAC features and WPCA", Lecture Notes in Computer Science, Springer, 2005.
[12] Buciu, I., Kotropoulos, C., Pitas, I., "ICA and Gabor representation for facial expression recognition", International Conference on Image Processing, vol. 2, pp. 14-17, 2003.
[13] Field, D. J., "Relations between the statistics of natural images and the response properties of cortical cells", Journal of the Optical Society of America A, vol. 4, pp. 2379-2394, 1987.
[14] Duda, R. O., Hart, P. E., Stork, D. G., "Pattern Classification", Wiley, New York, 2001.
[15] Park, S., Kim, D., "Subtle facial expression recognition using motion magnification", Pattern Recognition Letters, vol. 30, no. 7, pp. 708-716, 2009.
[16] Xie, X., Lam, K. M., "Facial expression recognition based on shape and texture", Pattern Recognition, vol. 42, no. 5, pp. 1003-1011, 2009.
[17] Kotsia, I., Zafeiriou, S., Pitas, I., "Novel multiclass classifiers based on the minimization of the within-class variance", IEEE Trans. on Neural Networks, vol. 20, no. 1, pp. 14-34, 2009.
[18] Geetha, A., Ramalingam, V., Palanivel, S., Palaniappan, B., "Facial expression recognition: a real time approach", Expert Systems with Applications, vol. 36, no. 1, pp. 303-308, 2009.
[19] Lajevardi, S. M., Lech, M., "Facial expression recognition using a bank of neural networks and logarithmic Gabor filters", Proceedings of Digital Image Computing: Techniques and Applications (DICTA '08), pp. 77-83, Australia, 2008.
[20] Lajevardi, S. M., Lech, M., "Averaged Gabor filter features for facial expression recognition", Proceedings of Digital Image Computing: Techniques and Applications (DICTA '08), pp. 71-76, Australia, 2008.
[21] Lajevardi, S. M., Lech, M., "Facial expression recognition from image sequences using optimised feature selection", 23rd International Conference on Image and Vision Computing (IVCNZ '08), pp. 1-6, New Zealand, 2008.
[22] Lajevardi, S. M., Hussain, Z. M., "Facial expression recognition: Gabor filters versus higher-order correlators", International Conference on Communication, Computer and Power (ICCCP '08), pp. 354-358, Oman, 2009.
[23] Lajevardi, S. M., Hussain, Z. M., "Facial expression recognition using log-Gabor filters and local binary pattern operators", International Conference on Communication, Computer and Power (ICCCP '08), pp. 349-353, Oman, 2009.
[24] Lajevardi, S. M., Hussain, Z. M., "Zernike moments for facial expression recognition", International Conference on Communication, Computer and Power (ICCCP '08), pp. 371-381, Oman, 2009.
[25] Lajevardi, S. M., Hussain, Z. M., "Feature selection for facial expression recognition based on mutual information", IEEE-GCC '09 Conference, Kuwait, 2009.
[26] Lajevardi, S. M., Hussain, Z. M., "Feature selection for facial expression recognition based on optimization algorithm", Second International Workshop on Nonlinear Dynamics and Synchronization (INDS '09), Klagenfurt, Austria, 2009.
