
Image and Vision Computing 27 (2009) 803–816

Contents lists available at ScienceDirect

Image and Vision Computing


journal homepage: www.elsevier.com/locate/imavis

Facial expression recognition based on Local Binary Patterns: A comprehensive study

Caifeng Shan a,*, Shaogang Gong b, Peter W. McOwan b

a Philips Research, High Tech Campus 36, 5656 AE Eindhoven, The Netherlands
b Department of Computer Science, Queen Mary, University of London, Mile End Road, London E1 4NS, UK

* Corresponding author. E-mail addresses: [email protected] (C. Shan), [email protected] (S. Gong), [email protected] (P.W. McOwan).

Article history:
Received 12 June 2006
Received in revised form 14 February 2008
Accepted 16 August 2008

Keywords:
Facial expression recognition
Local Binary Patterns
Support vector machine
AdaBoost
Linear discriminant analysis
Linear programming

Abstract: Automatic facial expression analysis is an interesting and challenging problem, and impacts important applications in many areas such as human–computer interaction and data-driven animation. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. In this paper, we empirically evaluate facial representation based on statistical local features, Local Binary Patterns, for person-independent facial expression recognition. Different machine learning methods are systematically examined on several databases. Extensive experiments illustrate that LBP features are effective and efficient for facial expression recognition. We further formulate Boosted-LBP to extract the most discriminant LBP features, and the best recognition performance is obtained by using Support Vector Machine classifiers with Boosted-LBP features. Moreover, we investigate LBP features for low-resolution facial expression recognition, which is a critical problem but seldom addressed in the existing work. We observe in our experiments that LBP features perform stably and robustly over a useful range of low resolutions of face images, and yield promising performance in compressed low-resolution video sequences captured in real-world environments.

© 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.imavis.2008.08.005

1. Introduction

Facial expression is one of the most powerful, natural and immediate means for human beings to communicate their emotions and intentions. Automatic facial expression analysis is an interesting and challenging problem, and impacts important applications in many areas such as human–computer interaction and data-driven animation. Due to its wide range of applications, automatic facial expression recognition has attracted much attention in recent years [1–4]. Though much progress has been made [5–24], recognizing facial expressions with high accuracy remains difficult due to the subtlety, complexity and variability of facial expressions.

Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. There are two common approaches to extracting facial features: geometric feature-based methods and appearance-based methods [4]. Geometric features present the shape and locations of facial components, which are extracted to form a feature vector that represents the face geometry. Recently Valstar et al. [22,23] have demonstrated that geometric feature-based methods provide similar or better performance than appearance-based approaches in Action Unit recognition. However, geometric feature-based methods usually require accurate and reliable facial feature detection and tracking, which is difficult to accommodate in many situations. With appearance-based methods, image filters, such as Gabor wavelets, are applied to either the whole face or specific face regions to extract the appearance changes of the face. Due to their superior performance, most work on appearance-based methods has focused on Gabor-wavelet representations [25,7,8,26,19]. However, it is both time and memory intensive to convolve face images with a bank of Gabor filters to extract multi-scale and multi-orientation coefficients.

In this work, we empirically study facial representation based on Local Binary Pattern (LBP) features [27,28] for person-independent facial expression recognition. LBP features were proposed originally for texture analysis, and have recently been introduced to represent faces in facial image analysis [29–31]. The most important properties of LBP features are their tolerance against illumination changes and their computational simplicity. We examine different machine learning methods, including template matching, Support Vector Machine (SVM), Linear Discriminant Analysis (LDA) and the linear programming technique, to perform facial expression recognition using LBP features. Our study demonstrates that, compared to Gabor wavelets, LBP features can be derived very fast in a single scan through the raw image and lie in a low-dimensional feature space, while still retaining discriminative facial information in a compact representation. We further formulate Boosted-LBP by learning the most discriminative LBP features with AdaBoost, and the recognition performance of different classifiers is improved by using the Boosted-LBP features. We also evaluate the generalization ability of LBP features across different databases.

One limitation of existing facial expression recognition methods is that they attempt to recognize facial expressions from data collected in a highly controlled environment with high-resolution frontal faces [26]. However, in real-world applications such as smart meetings and visual surveillance, the input face images are often at low resolutions. Obviously, low-resolution images in real-world environments make real-life expression recognition much more difficult. Recently Tian et al. [32,26] made a first attempt to recognize facial expressions at low resolutions. In [26], Tian studied the effects of different image resolutions on each step of automatic facial expression recognition. In this work, we investigate LBP features for low-resolution facial expression recognition. Experiments on different image resolutions show that LBP features perform stably and robustly over a useful range of low resolutions of face images. The encouraging performance on real-world compressed video sequences illustrates their promise for real-world environments.

This paper is an extended version of our previous work described in [33]. The main contributions of this paper are summarized as follows:

- We empirically evaluate LBP features for person-independent facial expression recognition. Different machine learning methods are exploited to classify expressions on several databases. LBP features were previously used for facial expression classification in [31], and more recently, following our work [33], Liao et al. [34] presented an extended LBP operator to extract features for facial expression recognition. However, these existing works were conducted on a very small database (JAFFE) using an individual classifier. In contrast, here we comprehensively study LBP features for facial expression recognition with different classifiers on much larger databases.
- We investigate LBP features for low-resolution facial expression recognition, a critical problem that is seldom addressed in the existing work. We not only perform evaluation on different image resolutions, but also conduct experiments on real-world compressed video sequences. Compared to the previous work [32,26], LBP features provide just as good or better performance, so they are very promising for real-world applications.
- We formulate Boosted-LBP by learning the most discriminative LBP histograms with AdaBoost for each expression, and the recognition performance of different classifiers is improved by using the Boosted-LBP features. We also evaluate the generalization ability of LBP features across different databases.

The remainder of this paper is structured as follows. We present a brief review of related work in the next section. Local Binary Patterns are introduced in Section 4. Section 5 discusses facial expression recognition using LBP features with different classification techniques. We investigate low-resolution expression recognition in Section 6. Boosting LBP for expression recognition is presented in Section 7. We also evaluate across-dataset generalization in Section 8. Finally, Section 9 concludes the paper.

2. Previous work

Automatic facial expression recognition has attracted much attention from behavioral scientists since the work of Darwin in 1872 [35]. Suwa et al. [36] made the first attempt to automatically analyze facial expressions from image sequences in 1978. Much progress has been made in the last decade, and a thorough survey of the existing work can be found in [1,2]. Here we briefly review some previous work in order to put our work in context.

2.1. Facial representation

Automatic facial expression recognition involves two vital aspects: facial representation and classifier design [4]. Facial representation is to derive a set of features from original face images to effectively represent faces. The optimal features should minimize within-class variations of expressions while maximizing between-class variations. If inadequate features are used, even the best classifier could fail to achieve accurate recognition. In some existing work [5,6,14,13], optical flow analysis has been used to model muscle activities or estimate the displacements of feature points. However, flow estimates are easily disturbed by nonrigid motion and varying lighting, and are sensitive to the inaccuracy of image registration and motion discontinuities [18]. Facial geometry analysis has been widely exploited in facial representation [9,10,16,17,24], where shapes and locations of facial components are extracted to represent the face geometry. For example, Zhang et al. [25] used the geometric positions of 34 fiducial points as facial features to represent facial images. In image sequences, the facial movements can be quantified by measuring the geometric displacement of facial feature points between the current frame and the initial frame. Valstar et al. [22] presented AU detection by classifying features calculated from tracked fiducial facial points. Their method detects a similar number of AUs with similar or higher recognition rates than those reported in [10,3,37], so they argued that a facial representation based on tracked facial points is well suited for facial expression analysis. Recently they [23] further presented a fully automatic AU detection system that can automatically localize facial points in the first frame and recognize AU temporal segments using a subset of the most informative spatio-temporal features selected by AdaBoost. However, the geometric feature-based representation commonly requires accurate and reliable facial feature detection and tracking, which is difficult to accommodate in many situations. In [18], motions of facial features are measured by simultaneously using active Infra-Red illumination and Kalman filtering to deal with large variations of head motion.

Another kind of method to represent faces is to model the appearance changes of faces. Holistic spatial analysis, including Principal Component Analysis (PCA) [38], Linear Discriminant Analysis (LDA) [39], Independent Component Analysis (ICA) [40] and Gabor wavelet analysis [7], has been applied to either the whole face or specific face regions to extract the facial appearance changes. Donato et al. [8] explored different techniques to represent face images for facial action recognition, including PCA, ICA, Local Feature Analysis (LFA), LDA and local schemes such as the Gabor-wavelet representation and local principal components. Best performances were obtained using the Gabor-wavelet representation and ICA. Due to their superior performance, Gabor-wavelet representations have been widely adopted in face image analysis [25,7,26,19]. However, the computation of Gabor-wavelet representations is both time and memory intensive; for example, in [19], the Gabor-wavelet representation derived from each 48 × 48 face image has a high dimensionality of O(10^5). Recently Local Binary Patterns have been introduced as effective appearance features for facial image analysis [31,29,30]. We [33] compared LBP features with Gabor features for facial expression recognition, and studied their performance over a range of image resolutions. In [41], we further presented facial expression manifold learning in the LBP feature space. More recently, Liao et al. [34] introduced

an improved LBP operator to extract features in both intensity and gradient maps for facial expression recognition, and also tested their methods on facial images of reduced resolutions. However, their experiment was carried out on a very small database (213 images from 10 subjects). In this work, we comprehensively study LBP features for facial expression recognition on several databases.

2.2. Facial expression recognition

Different techniques have been proposed to classify facial expressions, such as Neural Networks [42,25,26], Support Vector Machines (SVM) [19], Bayesian Networks (BN) [11] and rule-based classifiers [9,17,24]. In Lyons et al.'s work [7], the principal components of the feature vectors from training images were analyzed by LDA to form discriminant vectors, and facial image classification was performed by projecting the input vector of a testing image along the discriminant vectors. Cohen et al. compared different Bayes classifiers [11], and Gaussian Tree-Augmented-Naive (TAN) Bayes classifiers performed best. Bartlett et al. [19] performed a systematic comparison of different techniques including AdaBoost, SVM and LDA for facial expression recognition; best results were obtained by selecting a subset of Gabor filters using AdaBoost and then training SVM on the outputs of the selected filters. Pantic and Rothkrantz adopted rule-based reasoning to recognize action units and their combinations [17].

To exploit the temporal behavior of facial expressions, different techniques have been presented for facial expression recognition in image sequences. There have been several attempts to track and recognize facial expressions over time based on optical flow analysis [5,6]. Tian et al. [10] presented a Neural Network based approach to recognize facial action units in image sequences. Hidden Markov Models (HMMs) have been widely used to model the temporal behavior of facial expressions in image sequences [11,13]. Cohen et al. [11] proposed a multi-level HMM classifier, which allows not only expression classification on a video segment, but also automatic segmentation of a long video sequence into different expression segments without resorting to heuristic methods of segmentation. However, HMMs cannot deal with dependencies in observations. Dynamic Bayesian Networks (DBNs) were recently exploited for sequence-based expression recognition [16,14,18]. Kaliouby and Robinson [16] proposed a system for inferring complex mental states from videos of facial expressions and head gestures, where a multi-level DBN classifier was used to model complex mental states as a number of interacting facial and head displays. Zhang and Ji [18] explored the use of multisensory information fusion with DBNs for modeling and understanding the temporal behavior of facial expressions in image sequences. Chang et al. proposed a probabilistic video-based facial expression recognition method based on manifolds [15]. Lee and Elgammal [21] recently introduced a framework to learn decomposable generative models for the dynamic appearance of facial expressions, where facial motion is constrained to one-dimensional closed manifolds. The learned model can generate different dynamic facial appearances for different people and for different expressions, so enabling simultaneous recognition of faces and facial expressions.

3. Facial expression data

Facial expressions can be described at different levels [4]. A widely used description is the Facial Action Coding System (FACS) [43], which is a human-observer-based system developed to capture subtle changes in facial expressions. With FACS, facial expressions are decomposed into one or more Action Units (AUs). AU recognition or detection has attracted much attention recently [8,10,18,23]. Meanwhile, psychophysical studies indicate that basic emotions have corresponding universal facial expressions across all cultures [44]. This is reflected by most current facial expression recognition systems [7,11–13,19], which attempt to recognize a set of prototypic emotional expressions including disgust, fear, joy, surprise, sadness and anger. Therefore, in this work, we also focus on prototypic expression recognition. We consider both 6-class prototypic expression recognition and 7-class expression recognition by including the neutral expression.

We mainly conducted experiments on the Cohn–Kanade database [45], one of the most comprehensive databases in the current facial-expression-research community. The database consists of 100 university students aged from 18 to 30 years, of which 65% were female, 15% were African-American and 3% were Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of prototypic emotions. Image sequences from neutral to target display were digitized into 640 × 490 pixel arrays with 8-bit precision for grayscale values. Fig. 1 shows some sample images from the Cohn–Kanade database.

For our experiments, we selected 320 image sequences from the database. The only selection criterion was that a sequence could be labeled as one of the six basic emotions. The sequences come from 96 subjects, with 1–6 emotions per subject. For each sequence, the neutral face and three peak frames were used for prototypic expression recognition, resulting in 1280 images (108 Anger, 120 Disgust, 99 Fear, 282 Joy, 126 Sadness, 225 Surprise and 320 Neutral). To evaluate the generalization performance to novel subjects, we adopted a 10-fold cross-validation testing scheme in our experiments. More precisely, we partitioned the dataset randomly into ten groups of roughly equal numbers of subjects. Nine groups were used as training data to train the classifiers, while the remaining group was used as test data. The above process was repeated ten times, with each group in turn omitted from the training process. We report the average recognition results on the test sets.
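For readers who wish to reproduce this protocol, the sketch below expresses the subject-independent 10-fold scheme with a grouped cross-validation splitter. scikit-learn is an assumption on our part (the paper's own experiments used Matlab), and the feature, label and subject arrays are hypothetical placeholders.

```python
# Minimal sketch of the subject-independent 10-fold protocol: folds are
# grouped by subject so that no subject appears in both the training
# and the test folds. All arrays below are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = rng.random((1280, 2478))        # one LBP feature vector per image
labels = np.arange(1280) % 7               # 7 expression classes
subjects = rng.integers(0, 96, size=1280)  # 96 subjects in total

scores = []
for train_idx, test_idx in GroupKFold(n_splits=10).split(features, labels, groups=subjects):
    clf = SVC(kernel="rbf").fit(features[train_idx], labels[train_idx])
    scores.append(clf.score(features[test_idx], labels[test_idx]))

# The paper reports the average recognition rate over the ten test folds.
print(f"mean accuracy: {np.mean(scores):.3f}")
```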
Following Tian [26], we normalized the faces to a fixed distance between the two eyes. We manually labeled the eye locations, to evaluate LBP features in the condition of no face registration errors. Automatic face registration can be achieved by face detection [46] and eye localization [26,47], which will be addressed in our future work. Facial images of 110 × 150 pixels were cropped from the original frames based on the locations of the two eyes. No further registration, such as alignment of the mouth [25], was performed in our algorithms. As the faces in the database are in frontal view, we did not consider head pose changes. For realistic sequences with head pose variation, head pose estimation [26] can be adopted to detect frontal or near-frontal views. Illumination changes exist in the database, but no attempt was made to remove them [26] in our experiments, due to LBP's gray-scale invariance. Fig. 2 shows an example of the original face image and the cropped image.

Fig. 1. Sample face expression images from the Cohn–Kanade database.

Fig. 2. The original face image and the cropped image.

4. Local Binary Patterns (LBP)

The original LBP operator was introduced by Ojala et al. [27], and has proved a powerful means of texture description. The operator labels the pixels of an image by thresholding a 3 × 3 neighborhood of each pixel with the center value and considering the results as a binary number (see Fig. 3 for an illustration), and the 256-bin histogram of the LBP labels computed over a region is used as a texture descriptor. The derived binary numbers (called Local Binary Patterns or LBP codes) codify local primitives including different types of curved edges, spots, flat areas, etc. (as shown in Fig. 4), so each LBP code can be regarded as a micro-texton [30].

Fig. 3. The basic LBP operator [29].

Fig. 4. Examples of texture primitives which can be detected by LBP (white circles represent ones and black circles zeros) [30].

The limitation of the basic LBP operator is its small 3 × 3 neighborhood, which cannot capture dominant features with large-scale

structures. Hence the operator was later extended to use neighborhoods of different sizes [28]. Using circular neighborhoods and bilinearly interpolating the pixel values allows any radius and number of pixels in the neighborhood. See Fig. 5 for examples of the extended LBP operator, where the notation (P, R) denotes a neighborhood of P equally spaced sampling points on a circle of radius R that form a circularly symmetric neighbor set.

The LBP operator LBP_{P,R} produces 2^P different output values, corresponding to the 2^P different binary patterns that can be formed by the P pixels in the neighbor set. It has been shown that certain bins contain more information than others [28]. Therefore, it is possible to use only a subset of the 2^P Local Binary Patterns to describe the texture of images. Ojala et al. [28] called these fundamental patterns uniform patterns. A Local Binary Pattern is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular. For example, 00000000, 00111000 and 11100001 are uniform patterns. It is observed that uniform patterns account for nearly 90% of all patterns in the (8, 1) neighborhood and for about 70% in the (16, 2) neighborhood in texture images [28]. Accumulating the patterns which have more than two transitions into a single bin yields an LBP operator, denoted LBP^{u2}_{P,R}, with fewer than 2^P bins. For example, the number of labels for a neighborhood of 8 pixels is 256 for the standard LBP but 59 for LBP^{u2}.

After labeling an image with the LBP operator, a histogram of the labeled image f_l(x, y) can be defined as

$H_i = \sum_{x,y} I(f_l(x,y) = i), \quad i = 0, \ldots, n-1$    (1)

where n is the number of different labels produced by the LBP operator and

$I(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{if } A \text{ is false} \end{cases}$    (2)

This LBP histogram contains information about the distribution of local micro-patterns, such as edges, spots and flat areas, over the whole image, so it can be used to statistically describe image characteristics.
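To make the operator concrete, the following minimal numpy sketch (an illustration, not the authors' implementation) computes the basic 3 × 3 LBP codes and the histogram of Eq. (1) for a region.

```python
# Minimal sketch of the basic 3x3 LBP operator and the histogram of
# Eq. (1): each pixel's eight neighbors are thresholded against the
# center value and read off as an 8-bit number.
import numpy as np

def basic_lbp(image: np.ndarray) -> np.ndarray:
    """Return the 8-bit LBP label for every interior pixel."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    # Neighbor offsets, enumerated clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy,
                       1 + dx:img.shape[1] - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(codes: np.ndarray, n_labels: int = 256) -> np.ndarray:
    """Eq. (1): H_i counts how often label i occurs in the region."""
    return np.bincount(codes.ravel(), minlength=n_labels)

face = (np.random.default_rng(1).random((150, 110)) * 255).astype(np.uint8)
hist = lbp_histogram(basic_lbp(face))
print(hist.shape)  # (256,), the 256-bin descriptor of the region
```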
C. Shan et al. / Image and Vision Computing 27 (2009) 803–816 807

Fig. 5. Three examples of the extended LBP operator [28]: the circular (8, 1) neighborhood, the circular (12, 1.5) neighborhood, and the circular (16, 2) neighborhood, respectively.
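The circular sampling and the u2 mapping described above can be illustrated as follows. This is an illustrative reimplementation under the stated definitions; the interpolation details are chosen for clarity rather than speed, and the caller is assumed to stay at least R + 1 pixels away from the image border.

```python
# Sketch of the extended LBP_{P,R} operator with circular sampling and
# bilinear interpolation, plus the u2 (uniform) label mapping.
import numpy as np

def bilinear(img, y, x):
    """Bilinearly interpolate img at real-valued coordinates (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0] + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0] + dy * dx * img[y0 + 1, x0 + 1])

def lbp_code(img, y, x, P=8, R=2.0):
    """LBP_{P,R} code at pixel (y, x): threshold P circular samples."""
    code = 0
    for p in range(P):
        angle = 2.0 * np.pi * p / P
        sy, sx = y - R * np.sin(angle), x + R * np.cos(angle)
        if bilinear(img, sy, sx) >= img[y, x]:
            code |= 1 << p
    return code

def u2_label_map(P=8):
    """Map each of the 2^P patterns to a u2 label: uniform patterns
    (at most two circular 0/1 transitions) get their own bin, and all
    non-uniform patterns share one bin. For P = 8 this gives 59 labels."""
    labels, next_label = {}, 0
    for pattern in range(1 << P):
        bits = [(pattern >> i) & 1 for i in range(P)]
        transitions = sum(bits[i] != bits[(i + 1) % P] for i in range(P))
        if transitions <= 2:
            labels[pattern] = next_label
            next_label += 1
    nonuniform_bin = next_label
    return {p: labels.get(p, nonuniform_bin) for p in range(1 << P)}

img = np.random.default_rng(3).random((20, 20)) * 255
codes = [[lbp_code(img, y, x) for x in range(3, 17)] for y in range(3, 17)]
print(max(u2_label_map(8).values()) + 1)  # 59 labels for P = 8
```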

Face images can be seen as a composition of micro-patterns which can be effectively described by the LBP histograms. Therefore, it is intuitive to use LBP features to represent face images [29–31]. An LBP histogram computed over the whole face image encodes only the occurrences of the micro-patterns, without any indication of their locations. To also consider the shape information of faces, face images were equally divided into small regions R_0, R_1, ..., R_m to extract LBP histograms (as shown in Fig. 6). The LBP features extracted from each sub-region are concatenated into a single, spatially enhanced feature histogram defined as

$H_{i,j} = \sum_{x,y} I\{f_l(x,y) = i\}\, I\{(x,y) \in R_j\}$    (3)

where i = 0, ..., n − 1 and j = 0, ..., m − 1.

Fig. 6. A face image is divided into small regions from which LBP histograms are extracted and concatenated into a single, spatially enhanced feature histogram.

The extracted feature histogram represents the local texture and global shape of face images. Some parameters can be optimized for better feature extraction. One is the LBP operator, and the other is the number of regions. Following the setting in [29], we selected the 59-bin LBP^{u2}_{8,2} operator, and divided the 110 × 150 pixel face images into regions of 18 × 21 pixels, giving a good trade-off between recognition performance and feature vector length. Thus face images were divided into 42 (6 × 7) regions, as shown in Fig. 7, and represented by LBP histograms of length 2478 (59 × 42).

Fig. 7. (Left) A face image divided into 6 × 7 sub-regions. (Right) The weight set for the weighted dissimilarity measure. Black squares indicate weight 0.0, dark gray 1.0, light gray 2.0 and white 4.0.

5. Facial expression recognition using LBP

In this section, we perform person-independent facial expression recognition using LBP features. Different machine learning techniques, including template matching, Support Vector Machines, Linear Discriminant Analysis and the linear programming technique, are examined to recognize expressions.

5.1. Template matching

Template matching was used in [29] to perform face recognition with the LBP-based facial representation: a template is formed for each class of face images, then a nearest-neighbor classifier is used to match the input image with the closest template. Here we first adopted template matching to classify facial expressions for its simplicity. In training, the histograms of the expression images in a given class were averaged to generate a template for this class.

Following [29], we also selected the Chi square statistic (χ²) as the dissimilarity measure for histograms:

$\chi^2(S, M) = \sum_i \frac{(S_i - M_i)^2}{S_i + M_i}$    (4)

where S and M are two LBP histograms. It is observed that some local facial regions contain more useful information for expression classification than others; for example, the facial features contributing to facial expressions mainly lie in regions such as the eye and mouth regions. Therefore, a weight can be set for each sub-region based on its importance. The particular weight set we adopted is shown in Fig. 7, and was designed empirically based on this observation. The weighted χ² statistic is then given as

$\chi^2_w(S, M) = \sum_{i,j} w_j \frac{(S_{i,j} - M_{i,j})^2}{S_{i,j} + M_{i,j}}$    (5)

where S and M are two LBP histograms, and w_j is the weight for region j.

Template matching achieved a generalization performance of 79.1% for the 7-class task and 84.5% for the 6-class task. We compared the results with those reported in [11], where Cohen et al. adopted Bayesian network classifiers to classify 7-class emotional expressions based on tracked geometric facial features (eyebrows, eyelids and mouth). They carried out 5-fold cross-validation on a subset of 53 subjects from the Cohn–Kanade database, and obtained a best performance of 73.2% using Tree-Augmented-Naive Bayes (TAN) classifiers. Although we cannot make a direct comparison due to different experimental setups and pre-processing procedures, the comparison in Table 1 indicates that our simple template matching using LBP features provides slightly better overall performance. The confusion matrix of 7-class recognition is shown in Table 2. We can observe that Joy and Surprise can be recognized with high accuracy (around 90–92%), but Anger and Fear are easily confused with others.
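The following sketch ties together Eqs. (3)–(5): regional histograms concatenated into the 2478-dimensional feature, and nearest-template classification under the weighted χ² distance. It assumes a label image produced by an LBP^{u2}_{8,2}-style operator such as the one sketched in Section 4; the names and the empty-bin handling are illustrative.

```python
# Sketch of the spatially enhanced histogram of Eq. (3) and the
# weighted chi-square template matching of Eqs. (4)-(5).
import numpy as np

def regional_histograms(label_image, grid=(7, 6), n_labels=59):
    """Split the label image into grid regions and concatenate the
    per-region histograms: 7 x 6 regions x 59 bins = 2478 dimensions."""
    feats = []
    for row in np.array_split(label_image, grid[0], axis=0):
        for region in np.array_split(row, grid[1], axis=1):
            feats.append(np.bincount(region.ravel(), minlength=n_labels))
    return np.concatenate(feats).astype(np.float64)

def weighted_chi2(s, m, weights, n_labels=59):
    """Eq. (5): chi-square distance with one weight per sub-region."""
    s, m = s.reshape(-1, n_labels), m.reshape(-1, n_labels)
    num, den = (s - m) ** 2, s + m
    per_bin = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.sum(weights[:, None] * per_bin))

def classify(sample, templates, weights):
    """Nearest template (per-class mean histogram) under Eq. (5)."""
    return min(templates, key=lambda c: weighted_chi2(sample, templates[c], weights))

# `templates` maps each expression to the mean feature of its training
# images; `weights` holds one empirical weight per sub-region (42 here).
```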

Table 1
Comparison between the geometric-features-based TAN [11] and our LBP-based template matching

Methods (feature + classifier)      7-Class recognition (%)    6-Class recognition (%)
LBP + template matching             79.1 ± 4.6                 84.5 ± 5.2
Geometric features + TAN [11]       73.2                       –

Table 2
Confusion matrix of 7-class facial expression recognition using template matching with LBP features

           Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)  Neutral (%)
Anger      58.7       5.5          0         0        26.7         0             9.1
Disgust    3.3        85.0         2.5       0        2.5          0             6.7
Fear       1.0        0            61.7      24.0     10.3         0             3.0
Joy        0          0            6.0       90.4     0            0             3.6
Sadness    4.9        0            0         0        72.4         1.7           21.0
Surprise   0          0            1.3       0        2.7          92.4          3.6
Neutral    2.0        0.8          0.4       0.8      25.7         0             70.3

5.2. Support Vector Machine (SVM)

A previously successful technique for facial expression classification is the Support Vector Machine (SVM) [48,19,22,23], so we adopted SVMs as alternative classifiers for expression recognition. As a powerful machine learning technique for data classification, an SVM [49] performs an implicit mapping of the data into a higher (possibly infinite) dimensional feature space, and then finds a linear separating hyperplane with the maximal margin to separate the data in this higher dimensional space.

Given a training set of labeled examples {(x_i, y_i), i = 1, ..., l} where x_i ∈ R^n and y_i ∈ {1, −1}, a new test example x is classified by the following function:

$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right)$    (6)

where α_i are the Lagrange multipliers of a dual optimization problem that describe the separating hyperplane, K(·, ·) is a kernel function, and b is the threshold parameter of the hyperplane. The training samples x_i with α_i > 0 are called support vectors, and the SVM finds the hyperplane that maximizes the distance between the support vectors and the hyperplane. Given a non-linear mapping Φ that embeds the input data into the high-dimensional space, kernels have the form K(x_i, x_j) = ⟨Φ(x_i) · Φ(x_j)⟩. SVM allows domain-specific selection of the kernel function. Though new kernels are being proposed, the most frequently used kernel functions are the linear, polynomial, and Radial Basis Function (RBF) kernels.

SVM makes binary decisions, so multi-class classification here is accomplished by the one-against-rest technique, which trains binary classifiers to discriminate one expression from all others and outputs the class with the largest binary-classification output. With regard to the parameter selection of SVM, as suggested in [50], we carried out a grid search on the hyper-parameters in the 10-fold cross-validation; the parameter setting producing the best cross-validation accuracy was picked. We used the SVM implementation in the publicly available machine learning library SPIDER¹ in our experiments. The generalization performances achieved with different kernels are shown in Table 3, where the degree of the polynomial kernel is 1, and the standard deviation of the RBF kernel is 2^13 for 7-class recognition and 2^11 for 6-class recognition. The confusion matrices of 6-class and 7-class recognition with the RBF kernel are shown in Tables 4 and 5. It is observed that Disgust, Joy, Surprise and Neutral can be recognized with high accuracy (90–98%), while the recognition rates for Fear and Sadness are much lower (68–69%). Compared to the recognition results of template matching in Table 2, the recognition performance for every expression is increased except Fear. For the 6-class problem, the number of support vectors of the linear/polynomial SVMs was 18–29% of the total number of training samples, while the RBF SVMs employed 18–31%. For the 7-class problem, the linear/polynomial SVMs employed 15–30%, while the RBF SVMs employed 16–35%.

Table 3
Recognition performance of LBP-based SVMs with different kernels

                    6-Class recognition (%)    7-Class recognition (%)
SVM (linear)        91.5 ± 3.1                 88.1 ± 3.8
SVM (polynomial)    91.5 ± 3.1                 88.1 ± 3.8
SVM (RBF)           92.6 ± 2.9                 88.9 ± 3.5

Table 4
Confusion matrix of 6-class facial expression recognition using SVM (RBF)

           Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)
Anger      89.7       2.7          0         0        7.6          0
Disgust    0          97.5         2.5       0        0            0
Fear       0          2.0          73.0      22.0     3.0          0
Joy        0          0.4          0.7       97.9     1.0          0
Sadness    10.3       0            0.8       0.8      83.5         4.6
Surprise   0          0            1.3       0        0            98.7

Table 5
Confusion matrix of 7-class facial expression recognition using SVM (RBF)

           Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)  Neutral (%)
Anger      85.0       2.7          0         0        4.8          0             7.5
Disgust    0          97.5         2.5       0        0            0             0
Fear       0          2.0          68.0      22.0     1.0          0             7.0
Joy        0          0            0.7       94.7     1.1          0             3.5
Sadness    8.6        0            0         0        69.5         2.3           19.6
Surprise   0          0            1.3       0        0            98.2          0.5
Neutral    1.6        0.4          0         1.6      6.0          0.4           90.0

¹ http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html.
sample xi with ai > 0 is called support vectors, and SVM finds the
We further compare LBP features with Gabor-wavelet features
hyperplane that maximizes the distance between the support vec-
for facial expression recognition using SVMs. Following Bartlett
tors and the hyperplane. Given a non-linear mapping U that embeds
et al. [48,19], we converted images into a Gabor magnitude repre-
the input data into the high dimensional space, kernels have the
sentation using a bank of Gabor filters at 8 orientations and 5 spa-
form of Kðxi ; xj Þ ¼ hUðxi Þ  Uðxj Þi. SVM allows domain-specific selec-
tial frequencies (9:36 pixels per cycle at 1/2 octave steps2). To
tion of the kernel function. Though new kernels are being proposed,
reduce the length of the feature vector, the outputs of the 40 Gabor
the most frequently used kernel functions are the linear, polyno-
filters were downsampled by a factor of 16 [8], so the dimension-
mial, and Radial Basis Function (RBF) kernels.
ality of the Gabor feature vector is 42; 650ð40  110=4  150=4Þ.
SVM makes binary decisions, so the multi-class classification
We report the generalization performance of Gabor-wavelet fea-
here is accomplished by using the one-against-rest technique,
tures in Table 6.
which trains binary classifiers to discriminate one expression from
Bartlett et al. [48,19] recently conducted similar experiments
all others, and outputs the class with the largest output of binary
using the Gabor-wavelet representation with SVMs on the Cohn–
classification. With regard to the parameter selection of SVM, as
Kanade database. They selected 313 image sequences from the
suggested in [50], we carried out grid-search on the hyper-param-
database, which came from 90 subjects, with 1–6 emotions per
eters in the 10-fold cross-validation. The parameter setting pro-
subject. The facial images were converted into a Gabor magnitude
ducing best cross-validation accuracy was picked. We used the
representation using a bank of 40 Gabor filters. They [48] divided
SVM implementation in the public available machine learning li-
the subjects randomly into ten groups of roughly equal size and
brary SPIDER1 in our experiments. The generalization performances
did ‘‘leave one group out” cross-validation. SVMs with linear, poly-
achieved with different kernels are shown in Table 3, where the de-
nomial and RBF kernels were used to classify 7-class expressions.
gree of the polynomial kernel is 1, and the standard deviation for the
Linear and RBF kernels performed best, achieving recognition rates
RBF kernel is 213 for 7-class recognition and 211 for 6-class recogni-
of 84.8% and 86.9%, respectively. We also include the recognition
tion. The confusion matrices of 6-class and 7-class recognition with

2
pffiffiffi pffiffiffi pffiffiffi
1
pffiffiffii.e., 9, 9 2, 18, 18 2, 36 pixels per cycle, so the frequencies used 12, 6 2, 6,
http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html. 3 2, 3 cycles/image-width.
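For reference, the Gabor-magnitude baseline can be sketched as below. The kernel construction and envelope width are our assumptions in the spirit of Bartlett et al.'s design; only the wavelengths follow the footnote above, and the exact feature dimensionality depends on the rounding of the downsampling.

```python
# Sketch of a Gabor-magnitude feature extractor: a bank of 40 filters
# (8 orientations x 5 scales), magnitude responses downsampled by a
# factor of 4 in each direction.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(wavelength, theta, size=31, sigma_ratio=0.56):
    """Complex Gabor kernel with its carrier wavelength in pixels."""
    sigma = sigma_ratio * wavelength
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(2j * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_magnitude_features(face, wavelengths=(9, 12.7, 18, 25.5, 36),
                             n_orientations=8, step=4):
    feats = []
    for lam in wavelengths:                     # 9, 9*sqrt(2), ..., 36 px/cycle
        for k in range(n_orientations):
            kernel = gabor_kernel(lam, np.pi * k / n_orientations)
            response = fftconvolve(face, kernel, mode="same")
            feats.append(np.abs(response)[::step, ::step].ravel())
    return np.concatenate(feats)

face = np.random.default_rng(2).random((150, 110))
print(gabor_magnitude_features(face).shape)  # 40 downsampled magnitude maps
```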

Table 6
Comparison between LBP features and Gabor-filter features for facial expression recognition using SVMs

                    6-Class LBP (%)  6-Class Gabor (%)  7-Class LBP (%)  7-Class Gabor (%)  7-Class Gabor [48] (%)
SVM (linear)        91.5 ± 3.1       89.4 ± 3.0         88.1 ± 3.8       86.6 ± 4.1         84.8
SVM (polynomial)    91.5 ± 3.1       89.4 ± 3.0         88.1 ± 3.8       86.6 ± 4.1         Worse than RBF/linear
SVM (RBF)           92.6 ± 2.9       89.8 ± 3.1         88.9 ± 3.5       86.8 ± 3.6         86.9

Comparisons summarized in Table 6 show that the LBP-based SVMs perform slightly better than the Gabor-wavelet based SVMs. More crucially though, the advantage of LBP features lies in their very fast feature extraction. We compare the time and memory costs of the feature extraction process (Matlab implementation) for LBP features and Gabor-wavelet features in Table 7, where the Gabor-filter convolutions were calculated in the spatial domain. It is observed that LBP features bring a significant speed benefit and, compared to the high dimensionality of the Gabor-wavelet features, LBP features lie in a much lower dimensional space.

Table 7
Time and memory costs for extracting LBP features and Gabor-filter features

                                  LBP       Gabor     Gabor [48]
Memory (feature dimension)        2478      42,650    92,160
Time (feature extraction time)    0.03 s    30 s      –

5.3. Linear Discriminant Analysis (LDA)

Facial deformations lie intrinsically on much lower dimensional subspaces. Therefore, subspace analysis has been widely exploited to discover such subspaces for face image analysis [38,39,42,7]. LDA [39] is a supervised subspace learning technique, and has previously been applied to facial expression recognition [7]. Here we further adopted LDA to recognize expressions using LBP features. LDA searches for the projection axes on which the data points of different classes are far from each other, while requiring data points of the same class to be close to each other.

Given multi-dimensional data samples x_1, x_2, ..., x_m in R^n that belong to c classes, LDA finds a transformation matrix W that maps these m points to y_1, y_2, ..., y_m in R^l (l ≤ c), where y_i = W^T x_i. The objective function of LDA is as follows:

$\max_w \frac{w^T S_B w}{w^T S_W w}$    (7)

$S_B = \sum_{i=1}^{c} n_i (m^{(i)} - m)(m^{(i)} - m)^T$    (8)

$S_W = \sum_{i=1}^{c} \left( \sum_{j=1}^{n_i} (x_j^{(i)} - m^{(i)})(x_j^{(i)} - m^{(i)})^T \right)$    (9)

where m is the mean of all the samples, n_i is the number of samples in the ith class, m^{(i)} is the average vector of the ith class, x_j^{(i)} is the jth sample in the ith class, S_B is the between-class scatter matrix, and S_W is the within-class scatter matrix. In practice, the dimension of the feature space (n) is often much larger than the number of samples in the training set (m), so the matrix S_W is singular. To overcome this problem, the dataset is usually first projected into a lower dimensional PCA space.

In each trial of our 10-fold cross-validation experiments, the training data was first projected into a PCA subspace (98% of the information was kept according to the reconstruction error; the resulting number of eigenvectors ranges over 358–378 for the 6-class problem and 405–431 for the 7-class problem), then the LDA transformation matrix was trained in the PCA subspace, where the dimension kept by LDA was c − 1. For facial expression recognition, we adopted a Nearest-Neighbor (NN) classifier for its simplicity, with the Euclidean metric as the distance measure. The generalization performance LDA + NN achieved is 73.4% for 7-class recognition and 79.2% for 6-class recognition. We also adopted SVM (linear) to perform recognition in the PCA subspace, i.e., the input of the SVM is not the original LBP features but their PCA projections. We compare LDA + NN with SVM (linear) in Table 8; it is observed that the performance of LDA + NN is much lower than that of SVMs.

Table 8
Comparison between LDA + NN and SVM (linear) for facial expression recognition using LBP features

                7-Class recognition (%)    6-Class recognition (%)
LDA + NN        73.4 ± 5.6                 79.2 ± 7.2
SVM (linear)    80.2 ± 4.9                 87.7 ± 4.7
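The PCA + LDA + NN pipeline of this subsection can be sketched as follows; scikit-learn is an assumption on our part, and PCA's explained-variance criterion stands in for the paper's reconstruction-error criterion.

```python
# Sketch of the PCA + LDA + nearest-neighbor pipeline of Section 5.3:
# PCA keeps 98% of the variance (avoiding a singular within-class
# scatter matrix), LDA reduces to c - 1 dimensions, and a 1-NN
# classifier with the Euclidean metric performs the recognition.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.random((400, 2478))   # placeholder LBP features
y_train = np.arange(400) % 7        # c = 7 expression classes

model = make_pipeline(
    PCA(n_components=0.98),                      # keep 98% of the variance
    LinearDiscriminantAnalysis(n_components=6),  # c - 1 discriminant axes
    KNeighborsClassifier(n_neighbors=1),         # Euclidean 1-NN
)
model.fit(X_train, y_train)
print(model.predict(X_train[:5]))
```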

5.4. Linear programming

Feng et al. [51] recently presented an approach for facial expression recognition that uses LBP features with a linear programming technique, and demonstrated its effectiveness on a small database (JAFFE). In [52], the linear programming technique was adopted to perform simultaneous feature selection and classifier training for facial expression recognition. Here we also examine the linear programming technique for facial expression recognition using LBP features.

Given two sets of data samples A and B in R^n, we seek a linear function such that f(x) > 0 if x ∈ A, and f(x) ≤ 0 if x ∈ B. This function is given by f(x) = w^T x − c, and determines a plane w^T x = c with normal w ∈ R^n that separates A from B. Let the set of m samples in A be represented by a matrix A ∈ R^{m×n} and the set of k samples in B be represented by a matrix B ∈ R^{k×n}. After normalization, we want to satisfy

$Aw \geq ec + e, \qquad Bw \leq ec - e$    (10)

where e is a vector of all 1s of appropriate dimension. Practically, because of the overlap between the two classes, one has to minimize some norm of the average error in Eq. (10) [52]:

$\min_{w,c} f(w,c) = \min_{w,c} \frac{1}{m} \left\| (-Aw + ec + e)_+ \right\|_1 + \frac{1}{k} \left\| (Bw - ec + e)_+ \right\|_1$    (11)

where x_+ denotes the vector with components (x_+)_i = max{x_i, 0}, i = 1, ..., n, and ‖·‖_1 denotes the 1-norm. Eq. (11) can be modeled as a so-called robust linear programming problem [52]:

$\min_{w,c,y,z} \frac{e^T y}{m} + \frac{e^T z}{k} \quad \text{subject to} \quad -Aw + ec + e \leq y, \quad Bw - ec + e \leq z, \quad y \geq 0, \; z \geq 0$    (12)

which minimizes the average sum of misclassification errors. We use Eq. (12) to solve the classification problem.

Following Feng et al. [51], multi-class facial expression recognition was decomposed into one-to-one pairs of binary classification,

where each binary classifier was produced by the linear programming technique. The binary classifiers were combined with a voting scheme to output the final recognition result. To reduce the length of the LBP feature vector, we also discarded the dimensions whose occurrence frequency is lower than a threshold [51]; a threshold of 5 was adopted in our experiments.

In our 10-fold cross-validation experiments, the linear programming technique produces a generalization performance of 82.3% for 7-class recognition and 89.6% for 6-class recognition. We compare its performance with that of SVM (linear) in Table 9, where the input of SVM (linear) is also the feature vectors with dimensions discarded. It is observed that the linear programming technique produces slightly inferior performance to SVM (linear).

Table 9
Comparison between the linear programming technique and SVM (linear) for facial expression recognition

                      7-Class recognition (%)    6-Class recognition (%)
Linear programming    82.3 ± 3.5                 89.6 ± 3.6
SVM (linear)          86.0 ± 3.3                 90.4 ± 3.9
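For completeness, Eq. (12) maps directly onto a standard LP solver. The sketch below stacks the variables as [w, c, y, z] and solves one binary pair with scipy.optimize.linprog; the data are synthetic placeholders.

```python
# Sketch of the robust linear program of Eq. (12) for one binary pair:
# minimize e'y/m + e'z/k subject to -Aw + ec + e <= y, Bw - ec + e <= z,
# with w and c free and y, z nonnegative.
import numpy as np
from scipy.optimize import linprog

def train_lp_classifier(A, B):
    m, n = A.shape
    k = B.shape[0]
    # Objective: zero cost on w and c, 1/m on y, 1/k on z.
    cost = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    upper = np.block([
        [-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))],  # -Aw + c - y <= -1
        [B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)],  #  Bw - c - z <= -1
    ])
    rhs = -np.ones(m + k)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)
    res = linprog(cost, A_ub=upper, b_ub=rhs, bounds=bounds)
    w, c = res.x[:n], res.x[n]
    return w, c  # decide class A when w @ x - c > 0

rng = np.random.default_rng(0)
A = rng.normal(1.0, 1.0, size=(40, 10))   # samples of one expression
B = rng.normal(-1.0, 1.0, size=(60, 10))  # samples of the other
w, c = train_lp_classifier(A, B)
print(np.mean(A @ w - c > 0), np.mean(B @ w - c <= 0))
```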

6. Low-resolution facial expression recognition

In real-world environments such as smart meetings and visual surveillance, only low-resolution video input is available. Fig. 8 shows a real-world image recorded in a smart meeting scenario. How to derive a discriminative facial representation from low-resolution images is a critical problem for real-world applications. In this section, we investigate LBP features for low-resolution facial expression recognition. We first evaluated LBP features on different image resolutions, then performed experiments on real-world compressed low-resolution video sequences.

Fig. 8. An example of low-resolution facial expressions recorded in real-world environments (from the PETS 2003 dataset).

6.1. Evaluation on different resolutions

As shown in Table 10, a total of six different resolutions of the face region were studied (110 × 150, 55 × 75, 36 × 48, 27 × 37, 18 × 24 and 14 × 19 pixels) based on the Cohn–Kanade database. The lower resolution images were down-sampled from the original images. For LBP feature extraction, the lower resolution face images were divided into 10 × 10 pixel regions (which may overlap with each other in the small face images). For example, face images of 14 × 19 pixels were divided into 12 (3 × 4) regions of 10 × 10 pixels: the overlap between adjacent regions is 8 pixels (along the side of 14 pixels) and/or 7 pixels (along the side of 19 pixels). We adopted the 4-neighborhood LBP operator LBP_{4,1} for each sub-region.
resolutions. (2) The presented LBP features perform slightly better
region.
than the Gabor-wavelet representation on low-resolution expres-
To compare with Tian’s work [26], we conducted experiments
sion recognition. Recently Liao et al. [34] also compared LBP fea-
on 6-class basic expression recognition using SVM with RBF kernel.
ture with Gabor-filter features on the JAFFE database, and their
We report the recognition results in Table 10, where the standard
experiments demonstrated that LBP features provide better perfor-
deviation of RBF kernels were 211 , 29 , 27 , 28 , 26 and 28 , respectively.
mance for low-resolution face images, which reenforces our find-
Besides LBP features, We also carried out experiments with the Ga-
ing (3) The LBP features perform robustly and stably over a
bor-magnitude representation by convolving images with a bank
useful range of low resolutions. This reenforce the superiority of
of 40 Gabor filters at 8 orientations and 5 spatial frequencies.
LBP features in face detection and recognition in low-resolution
The generalization performances of the Gabor-wavelet representa-
images reported in [30]. So LBP features are very promising for
tion are also shown in Table 10.
real-world applications where low-resolution video input is only
In Tian’s experiments [26], 375 image sequences were selected
available.
from the Cohn–Kanade database for 6-class expression classifica-
tion. Tian extracted two types of facial features: geometric features
6.2. Evaluation on real-world video sequences (PETS)
and appearance features. Geometric features were derived by fea-
ture tracking [10] and feature detection [32], respectively. For
We further conducted experiments on compressed low-resolu-
appearance features, a bank of 40 Gabor filters were applied to
tion image sequences recorded in a real environment. We used the
the difference images to extract facial appearance changes, where
smart meeting dataset in the PETS 2003 evaluation datasets.4 Re-
the difference images were obtained by subtracting a neural
sults on scenario A, camera 1 were reported here. In this scenario,
expression for each image. A three-layer Neural Network was
each person enters the conference room one after the other, goes to
adopted to recognize expressions. Recognition results of Tian’s
his place, presents himself to the frontal camera, and sits down.
methods are summarized in Table 10.3 However, we cannot make
Then each person looks at the other people with different expres-
direct comparative analysis between Tian’s results with ours be-
sions. Fig. 8 shows an example frame in the video sequence. Three
facial expressions, neutral, anger and joy, are available in the
dataset.
3
In [26], the different resolutions of the head region are 144  192, 72  96, 36  48,
18  24 pixels, which are comparable to the resolutions of the face region 110  150,
4
55  75, 27  37, 14  17 pixels in our experiments. http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html.

Table 10
Recognition performance (%) in low-resolution images with different methods

                          110 × 150     55 × 75      36 × 48      27 × 37      18 × 24      14 × 19
LBP                       92.6 ± 2.9    89.9 ± 3.1   87.3 ± 3.4   84.3 ± 4.1   79.6 ± 4.7   76.9 ± 5.0
Gabor                     89.8 ± 3.1    89.2 ± 3.0   86.4 ± 3.3   83.0 ± 4.3   78.2 ± 4.5   75.1 ± 5.1
Gabor [26]                92.2          91.6         –            77.6         –            68.2
Feature tracking [26]     91.8          91.6         –            N/A          –            N/A
Feature detection [26]    73.8          72.9         –            61.3         –            N/A

"–" indicates that the image resolution was not examined in [26]; "N/A" indicates that the resolution was studied in [26] but no recognition result was obtained.

The real-world video sequence contains the full range of head motion. In Tian's previous work [32], the head pose was first estimated based on the detected head, and then, for frontal and near-frontal views of the face, facial features were extracted to perform facial expression recognition. Since our focus was investigating the validity of LBP features on real-world compressed video inputs, we did not consider pose estimation here. We cropped the face region in frontal and near-frontal views based on the locations of the two eyes in the input image sequence, then performed recognition on the cropped face images. Fig. 9 shows face regions cropped in one frame.

Fig. 9. We cropped the face region in frontal and near-frontal views based on the locations of the two eyes in the input image sequence (frame 17,130).

It is very difficult, even for human beings, to recognize facial expressions at low resolution. Following Tian et al. [32], experiments were conducted in which some frames of expressions at low resolution were shown to a small set of human observers (in this instance five researchers in our lab); many could not perform recognition consistent with the ground truth provided by the PETS dataset (original GT). Tian et al. modified the ground truth based on the majority. Here we also generated a new ground truth (modified GT) for some frames based on human observations. Examples of modified GT vs original GT are shown in Table 11.

Table 11
Examples of modified GT vs original GT

Original GT    Neutral     Joy         Neutral    Neutral
Modified GT    Sideview    Sideview    Joy        Joy

A total of 1209 images from the Cohn–Kanade database were used to train the SVM classifier. Since face regions in the PETS dataset are around 40 × 50 pixels, the training images were down-sampled from the original images to 38 × 48 pixels. The trained classifier recognized five expressions: neutral, joy, anger, surprise and others (including fear, sadness and disgust).

Our method performed well on the input real-world image sequence. The overall recognition rate on frames 18,000 to 18,190 was 91.5%, which is comparable to results reported in Tian's work [32]. Table 12 shows some failed examples. We observe that some frames of near-frontal view were incorrectly classified because our training data includes only frontal-view expressions. Additionally, as the training images were captured with subjects exaggeratedly posing their facial expressions, while the test images are natural facial expressions without any deliberate exaggerated posing, this difference in data also introduces some classification errors.

Table 12
Examples of failed recognition

Modified GT     Joy       Joy       Neutral    Neutral
Test results    Others    Others    Joy        Others

7. Boosting LBP for facial expression recognition

The above experiments clearly demonstrate that LBP features are effective for facial expression recognition, performing just as well as or better than reported existing techniques, but with a significant low-computation advantage. In the above investigation, face images are equally divided into small sub-regions from which LBP histograms are extracted and concatenated into a single feature vector. However, the extracted LBP features apparently depend on the chosen sub-regions, so this LBP feature extraction scheme suffers from fixed sub-region sizes and positions. By shifting and scaling a sub-window over face images, many more sub-regions can be obtained, bringing many more LBP histograms, which yield a more complete description of face images. To cope with the very large number of LBP histograms necessarily introduced by shifting and scaling a sub-window, boosting learning [53] can be used to learn the most effective LBP histograms, i.e., those containing the most discriminative information. In [54], Zhang et al. presented an approach for face recognition by boosting LBP-based classifiers, where the distance between corresponding LBP histograms of two face images is used as a discriminative feature, and AdaBoost is used to learn a few of the most efficient features. In our previous work [55], we presented a conditional mutual information based boosting scheme to select the most discriminative LBP histograms for facial expression recognition. We observed that AdaBoost performs better than the conditional mutual information based boosting when using several tens of weak classifiers. Therefore, in this section, we learn the most discriminative LBP histograms using AdaBoost for better facial representation.

AdaBoost methods [56,53] provide a simple yet effective approach for stagewise learning of a nonlinear classification function. AdaBoost learns a small number of weak classifiers whose performance need only be better than random guessing, and boosts them iteratively into a strong classifier of higher accuracy. The process of AdaBoost maintains a distribution over the training samples. At each iteration, the weak classifier which minimizes the weighted error rate is selected, and the distribution is updated to increase the weights of the misclassified samples and reduce the importance of the others. AdaBoost has been successfully used in many problems, such as face detection [46].

As each LBP histogram is calculated from a sub-region, AdaBoost is in effect used to find the sub-regions whose LBP histograms contain the most discriminative information for facial expression classification. As the weak classifier for AdaBoost, we adopted histogram-based template matching: for each sub-region, the LBP histograms in a given class are averaged to generate a template for this class; the trained weak classifier matches the input histogram with the closest template, and outputs the corresponding class label. The Chi square statistic (χ²) was used as the dissimilarity measure for histograms (Eq. (4)). As traditional AdaBoost works on two-class problems, the multi-class problem here is accomplished by using the one-against-rest technique, which trains AdaBoost between one expression and all others. For each AdaBoost learner, the images of one expression were positive samples, while the images of all other expressions were negative samples.
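The boosting loop can be sketched as follows. The weak learner is the χ² nearest-template matcher described above; re-estimating the templates from the weighted samples at each round is one plausible way to couple the weak learner to the AdaBoost distribution, and is our assumption rather than the paper's stated procedure.

```python
# Sketch of one-against-rest AdaBoost over sub-region LBP histograms.
# `region_hists` has shape (n_images, n_regions, n_bins); y is in {+1, -1}.
import numpy as np

def chi2(s, m):
    den = s + m
    return np.sum(np.divide((s - m) ** 2, den, out=np.zeros_like(den),
                            where=den > 0), axis=-1)

def weak_predict(hists_r, pos_t, neg_t):
    """+1 when the positive-class template is the closer one."""
    return np.where(chi2(hists_r, pos_t) < chi2(hists_r, neg_t), 1, -1)

def adaboost(region_hists, y, n_rounds=50):
    n, n_regions, _ = region_hists.shape
    D = np.full(n, 1.0 / n)          # distribution over training samples
    strong = []                      # (region, templates, alpha) per round
    for _ in range(n_rounds):
        best = None
        for r in range(n_regions):
            hists_r = region_hists[:, r, :]
            # Weighted mean histograms as the class templates (assumption).
            pos_t = np.average(hists_r[y == 1], axis=0, weights=D[y == 1])
            neg_t = np.average(hists_r[y == -1], axis=0, weights=D[y == -1])
            pred = weak_predict(hists_r, pos_t, neg_t)
            err = np.sum(D[pred != y])
            if best is None or err < best[0]:
                best = (err, r, pos_t, neg_t, pred)
        err, r, pos_t, neg_t, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        strong.append((r, pos_t, neg_t, alpha))
        D *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
        D /= D.sum()
    return strong

def strong_score(region_hists, strong):
    # Across the one-vs-rest learners, the class with the largest score wins.
    return sum(alpha * weak_predict(region_hists[:, r, :], p, q)
               for r, p, q, alpha in strong)
```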
By shifting and scaling a sub-window, 16,640 sub-regions, i.e., 16,640 LBP histograms, in total were extracted from each face image. The sub-window was shifted across the whole image with a shifting step of 4 pixels, while its size was scaled between 10 × 10 pixels and 25 × 20 pixels with a scaling step of 5 pixels. AdaBoost was used to learn a small subset (in the tens) of effective LBP histograms. We plot in Fig. 10 the spatial localization of the 50 sub-regions (i.e., the centers of the sub-regions) that correspond to the top 50 LBP histograms selected by AdaBoost for each expression. It is observed that different expressions have different key discriminant LBP features, and that the discriminant features are mainly distributed in the eye and mouth regions.
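For concreteness, a sketch of such a sub-window scan is given below. The exact set of window sizes is our reading of the text (widths and heights stepped by 5 pixels within the stated bounds), so the enumeration illustrates the mechanism rather than reproducing the reported count of 16,640 exactly.

```python
# Enumerate candidate sub-regions of a 110 x 150 face image; each window
# would contribute one LBP histogram to the feature pool for AdaBoost.
def enumerate_subwindows(img_w=110, img_h=150,
                         widths=(10, 15, 20, 25), heights=(10, 15, 20),
                         shift=4):
    windows = []
    for w in widths:                                  # scaling step: 5 px
        for h in heights:
            for y in range(0, img_h - h + 1, shift):  # shifting step: 4 px
                for x in range(0, img_w - w + 1, shift):
                    windows.append((x, y, w, h))
    return windows

print(len(enumerate_subwindows()))  # on the order of 10^4 sub-regions
```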
We performed facial expression recognition using the strong classifiers boosted by AdaBoost, and output the class whose one-against-rest classifier gives the maximum response. AdaBoost training continued until the classifier output distributions for the positive and negative samples were completely separated, so the number of LBP histograms selected for each expression was not pre-defined, but automatically decided by the AdaBoost learner itself. In the 10-fold experiments, the number of selected LBP histograms ranges from 49 to 52 for 6-class expressions and from 65 to 70 for 7-class expressions. For example, Fig. 11 displays the selected sub-regions (LBP histograms) for each basic expression in one trial of the 10-fold cross-validation. We can observe that the selected sub-regions have variable sizes and positions. Moreover, while the weights of sub-regions in the template matching in Section 5.1 were chosen empirically, the weights in the boosted classifiers were learned by AdaBoost. The generalization performance of the boosted classifiers is 84.6% for 7-class recognition and 89.8% for 6-class recognition, respectively. As shown in Table 13, compared to the LBP based template matching in Section 5.1, AdaBoost (Boosted-LBP) provides improved performance. We also show the confusion matrix of 7-class recognition using AdaBoost in Table 14, where Disgust, Joy, Surprise and Neutral are recognized with high accuracy. It can be seen that AdaBoost's performance is inferior to that of SVM (RBF) reported in Table 5 for most expressions except Fear and Neutral.
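A sketch of this one-against-rest decision, reusing the ensembles produced by the earlier AdaBoost sketch; the per-expression mapping `ensembles` is an illustrative structure, not the authors' data layout.

```python
# Pick the expression whose boosted one-against-rest classifier responds
# most strongly to a single sample x.
import numpy as np

def predict_expression(x, ensembles):
    # ensembles: dict mapping expression name -> list of (alpha, weak classifier)
    def margin(ensemble):
        # Real-valued response of one boosted binary classifier
        return sum(a * h.predict(np.atleast_2d(x))[0] for a, h in ensemble)
    return max(ensembles, key=lambda expr: margin(ensembles[expr]))
```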
Fig. 10. Distributions of the top 50 sub-regions (LBP histograms) selected by AdaBoost for each expression.

Fig. 11. The sub-regions (LBP histograms) selected by AdaBoost for each emotion. From left to right: Anger, Disgust, Fear, Joy, Sadness, Surprise.
Table 13
Recognition performance of Boosted-LBP vs. LBP

                           7-Class recognition (%)   6-Class recognition (%)
AdaBoost (Boosted-LBP)     85.0 ± 4.5                89.8 ± 4.7
LBP + template matching    79.1 ± 4.6                84.5 ± 5.2
Table 14
Confusion matrix of 7-class facial expression recognition using AdaBoost (Boosted-LBP)

           Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)  Neutral (%)
Anger      66.6       3.7          2.0       0        7.3          0             20.4
Disgust    0          92.5         2.5       0        0            0             5.0
Fear       0          0            70.0      17.0     3.0          0             10.0
Joy        0          0            2.5       90.1     0            0             7.4
Sadness    6.4        0            0         0        61.2         0.8           31.6
Surprise   0          0            1.3       0        0.5          92.5          5.7
Neutral    0          0            0.8       0.4      3.6          0             95.2
Table 15
Recognition performance of Boosted-LBP based SVMs vs. LBP based SVMs

                    7-Class                       6-Class
                    Boosted-LBP (%)  LBP (%)      Boosted-LBP (%)  LBP (%)
SVM (linear)        91.1 ± 4.0       88.1 ± 3.8   95.0 ± 3.2       91.5 ± 3.1
SVM (polynomial)    91.1 ± 4.0       88.1 ± 3.8   95.0 ± 3.2       91.5 ± 3.1
SVM (RBF)           91.4 ± 3.8       88.9 ± 3.5   95.1 ± 3.4       92.6 ± 2.9
We further combined feature selection by AdaBoost with classification by SVM. In particular, we trained SVM with the Boosted-LBP features. In each trial of the 10-fold cross-validation, we applied AdaBoost to learn the discriminative LBP histograms for each expression, and then utilized the union of the selected LBP histograms as the input for SVMs. For example, in Fig. 11, the union of all selected sub-regions resulted in a total of 51 LBP histograms. The generalization performance of Boosted-LBP based SVM is summarized in Table 15, where the degree of the polynomial kernel is 1 and the standard deviation for the RBF kernel is 2^11. For comparison, we also include the recognition performance of LBP based SVMs (from Section 5.2) in Table 15. We observe that Boosted-LBP based SVMs outperform LBP based SVMs by around 2.5–3.5 percentage points. The 7-class expression recognition result of 91.4% is very encouraging compared to the state of the art [11]. Bartlett et al. [19] obtained their best performance of 93.3% by selecting a subset of Gabor filters using AdaBoost and then training SVM on the outputs of the selected filters. With regard to the 6-class recognition, the result of 95.1% is, to the best of our knowledge, the best recognition rate reported so far in the published literature on this database. Previously, Tian [26] achieved 94% performance using a three-layer neural network combining geometric features and Gabor wavelet features. The confusion matrix of 7-class expression recognition using Boosted-LBP based SVM (RBF) is shown in Table 16. We can observe that Disgust, Joy and Surprise are recognized with very high accuracy (more than 97%), while Sadness is the most easily confused expression, with a recognition accuracy around 75%. We also re-ran the experiments on low-resolution face images in Section 6 using the Boosted-LBP features, and the recognition rates all increase by 3–5%.
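A sketch of this combination follows, assuming the per-image LBP histograms are stored as an (images × sub-regions × bins) array; `selected_regions` is an illustrative structure holding the sub-region indices chosen by each per-expression AdaBoost learner, and the library choice (scikit-learn) is ours.

```python
# Feed the union of AdaBoost-selected LBP histograms into an SVM.
import numpy as np
from sklearn.svm import SVC

def boosted_lbp_features(histograms, selected_regions):
    """histograms: (n_images, n_regions, n_bins) array of LBP histograms.
    selected_regions: dict mapping each expression to the set of sub-region
    indices chosen by its AdaBoost learner (illustrative structure)."""
    union = sorted(set().union(*selected_regions.values()))
    # Keep only the selected sub-regions, flattened into one feature vector
    return histograms[:, union, :].reshape(len(histograms), -1)

# X_train = boosted_lbp_features(train_hists, selected_regions)
# clf = SVC(kernel='rbf').fit(X_train, y_train)   # classifiers as in Table 15
```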
We also evaluated LDA using Boosted-LBP features. As discussed in Section 5.3, in each trial of the 10-fold cross-validation, the training data was first projected into a PCA subspace with 98% of the variance kept, and the dimension of the LDA subspace was c − 1. The nearest-neighbor classifier with the Euclidean distance measure was adopted as the classifier. Boosted-LBP based LDA obtained a generalization performance of 77.6% for 7-class recognition and 84.2% for 6-class recognition. As shown in Table 17, LDA's performance is clearly improved by using Boosted-LBP features, but it is still inferior to that of SVM.
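A sketch of this pipeline using scikit-learn (our choice of library, not the authors'): PCA keeping 98% of the variance, LDA reducing to at most c − 1 dimensions, and a Euclidean nearest-neighbor classifier.

```python
# PCA -> LDA -> 1-NN pipeline, with parameter choices following the text.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    PCA(n_components=0.98),               # keep 98% of the variance
    LinearDiscriminantAnalysis(),         # at most c - 1 discriminant axes
    KNeighborsClassifier(n_neighbors=1),  # nearest neighbor, Euclidean metric
)
# clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```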
Table 16
Confusion matrix of 7-class facial expression recognition using Boosted-LBP based SVM

           Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)  Neutral (%)
Anger      85.1       2.7          0         0        8.6          0             3.6
Disgust    0          97.5         0.8       1.7      0            0             0
Fear       0          1.0          79.9      11.0     3.1          1.0           4.0
Joy        0          0            0         97.5     0.4          0             2.1
Sadness    12.0       0            0.8       0        74.7         0             12.5
Surprise   0          0            1.3       0.9      0            97.3          0.5
Neutral    1.2        0            0.8       3.6      2.4          0             92.0

Table 17
Recognition performance of LBP based LDA vs. Boosted-LBP based LDA

                        7-Class recognition (%)   6-Class recognition (%)
LBP based LDA           73.4 ± 5.6                79.2 ± 7.2
Boosted-LBP based LDA   77.6 ± 5.7                84.2 ± 6.1
8. Generalization to other datasets

We evaluated the Boosted-LBP based SVM approach on two further publicly available databases: the MMI database [57] and the JAFFE database [7]. The MMI database includes more than 20 students and research staff members of both sexes (44% female), ranging in age from 19 to 62, and having either a European, Asian, or South American ethnic background. Subjects were instructed to display 79 series of facial expressions, six of which are prototypic emotions. Image sequences have neutral faces at the beginning and the end, and were digitized into 720 × 576 pixels. Some sample images from the MMI database are shown in Fig. 12. Although the original data in the MMI database are color images, in our experiments we converted them to 8-bit grayscale images. As can be seen, the subjects displayed facial expressions with and without glasses, which makes facial expression recognition more difficult. The JAFFE database consists of 213 images of Japanese female facial expressions. Ten expressers posed 3 or 4 examples of each of the seven basic expressions (six emotional expressions plus the neutral face). The image size is 256 × 256 pixels. Fig. 13 shows some sample images from the JAFFE database.

In our experiments, 96 image sequences were selected from the MMI database. The only selection criterion was that a sequence could be labeled as one of the six basic emotions. The sequences come from 20 subjects, with 1–6 emotions per subject. The neutral face and three peak frames of each sequence (hence 384 images in total) were used for 7-class expression recognition. All 213 images of the JAFFE database were used for 7-class expression recognition. As we did for the Cohn–Kanade database, we normalized faces from the MMI database and the JAFFE database to a fixed distance between the two eyes; face images of 110 × 150 pixels were cropped from the original frames based on the location of the two eyes.

Fig. 12. Sample facial expression images from the MMI database.

Fig. 13. Sample facial expression images from the JAFFE database.
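A sketch of this normalization is given below, using OpenCV as an assumed toolchain. The target eye distance and the placement of the eyes within the 110 × 150 crop are not specified here, so those values are illustrative only.

```python
# Scale a face so the eyes are a fixed distance apart, then crop around them.
import numpy as np
import cv2  # OpenCV, an assumed dependency

def normalize_face(img, left_eye, right_eye, eye_dist=55, crop=(110, 150)):
    (x1, y1), (x2, y2) = left_eye, right_eye
    scale = eye_dist / np.hypot(x2 - x1, y2 - y1)   # eye_dist is illustrative
    img = cv2.resize(img, None, fx=scale, fy=scale)
    # Eye midpoint in the rescaled image
    cx, cy = (x1 + x2) / 2 * scale, (y1 + y2) / 2 * scale
    # Assumed placement: eyes centred horizontally, about a third from the top
    x0 = max(0, int(cx - crop[0] / 2))
    y0 = max(0, int(cy - crop[1] / 3))
    return img[y0:y0 + crop[1], x0:x0 + crop[0]]
```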
We first performed 10-fold cross-validation on each dataset; the recognition rates are shown in the top two rows of Table 18, where the degree of the polynomial kernel is 1 and the standard deviation for the RBF kernel is 2^15 for the MMI database and 2^8 for the JAFFE database. The best recognition performance of 86.9% on the MMI database is inferior to that on the Cohn–Kanade database. This is possibly because there are fewer images in the dataset and the subjects wear glasses. The performance on the JAFFE database is the worst overall compared to that on the Cohn–Kanade and MMI databases, and this may also be due to the much smaller dataset. With LBP features and a linear programming technique, Feng et al. [51] reported a performance of 93.8% on the JAFFE database. They preprocessed the images using the CSU Face Identification Evaluation System [58] to exclude the non-face area with an elliptical mask. Liao et al. [34] recently reported a recognition performance of 85.6% on the JAFFE database, but they did not conduct 10-fold cross-validation.

We then performed cross-dataset experiments, i.e., we performed LBP feature selection and SVM training on the Cohn–Kanade database, and then tested the classifier on the MMI database and the JAFFE database, respectively. Recognition results are shown in the bottom two rows of Table 18, where the degree of the polynomial kernel is 1 and the standard deviation for the RBF kernel is 2^14 for the MMI database and 2^11 for the JAFFE database. We observe that the generalization performance across datasets is much lower: around 50% on the MMI database and around 40% on the JAFFE database. These results reinforce Bartlett et al.'s recent finding [59]: they trained SVMs on selected Gabor-wavelet features from the Cohn–Kanade database, tested them on the Pictures of Facial Affect database, and obtained 56–60% performance. As we preprocessed the face images of the different databases in the same way, the only difference between them is that they were collected under different controlled environments. Thus, an expression classifier trained on a single dataset with a uniformly controlled environment works well only within that dataset. In order to generalize across image collection environments, we have to collect large training datasets with variations in imaging conditions [59].
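A sketch of this cross-dataset protocol, with hypothetical feature arrays: the essential point is that the classifier is fitted once on Cohn–Kanade features and then evaluated unchanged on the other databases.

```python
# Train on one database, test unchanged on the others.
from sklearn.svm import SVC

def cross_dataset_eval(X_train, y_train, test_sets):
    """test_sets maps a database name to (X, y); all features are assumed to
    be extracted with the identical Boosted-LBP preprocessing."""
    clf = SVC(kernel='rbf').fit(X_train, y_train)
    return {name: clf.score(X, y) for name, (X, y) in test_sets.items()}

# e.g. cross_dataset_eval(X_ck, y_ck,
#                         {'MMI': (X_mmi, y_mmi), 'JAFFE': (X_jaffe, y_jaffe)})
```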
Table 18
Generalization performance of Boosted-LBP based SVM on other datasets

                                  SVM (linear) (%)  SVM (polynomial) (%)  SVM (RBF) (%)
MMI                               86.7              86.7                  86.9
JAFFE                             79.8              79.8                  81.0
Train: Cohn–Kanade, Test: MMI     50.8              50.8                  51.1
Train: Cohn–Kanade, Test: JAFFE   40.4              40.4                  41.3

9. Conclusions and future work

In this paper, we present a comprehensive empirical study of facial expression recognition based on Local Binary Patterns features. Different classification techniques are examined on several databases. The key issues of this work can be summarized as follows:

1. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. We empirically evaluate LBP features to describe appearance changes of expression images. Extensive experiments illustrate that LBP features are effective and efficient for facial expression recognition.
2. One challenge for facial expression recognition is recognizing facial expressions at low resolutions, as only compressed low-resolution video input is available in real-world applications. We investigate LBP features on low-resolution images, and observe that LBP features perform stably and robustly over a useful range of low resolutions of face images.
3. We adopt AdaBoost to learn the most discriminative LBP features from a large LBP feature pool. The best recognition performance is obtained by using SVM with Boosted-LBP features. However, this method has limited generalization to other datasets.

Since the performance of the boosted strong classifier originates in the characteristics of its weak hypothesis space, we will evaluate other kinds of weak classifiers as alternatives to template matching, in order to achieve better classification performance. One limitation of this work is that the recognition is performed on static images without exploiting the temporal behavior of facial expressions. The psychological experiments by Bassili [60] suggested that facial expressions are more accurately recognized from a dynamic image than from a single static image. We will explore temporal information in our future work.
Recently, volume LBP and LBP from three orthogonal planes have been introduced for dynamic texture recognition [61], showing promising performance on facial expression recognition in video sequences. Another limitation of the current work is that we do not consider head pose variations and occlusions, which will be addressed in our future work. We will also study the effect of imprecise face localization on expression recognition results.
Acknowledgements

We would like to thank Prof. Jeffery Cohn for the use of the Cohn–Kanade database, Prof. Maja Pantic and Dr. Michel F. Valstar for the use of the MMI database, and Dr. Michael J. Lyons for the use of the JAFFE database.
References

[1] M. Pantic, L. Rothkrantz, Automatic analysis of facial expressions: the state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) 1424–1445.
[2] B. Fasel, J. Luettin, Automatic facial expression analysis: a survey, Pattern Recognition 36 (2003) 259–275.
[3] M. Pantic, L. Rothkrantz, Toward an affect-sensitive multimodal human–computer interaction, in: Proceedings of the IEEE, vol. 91, 2003, pp. 1370–1390.
[4] Y. Tian, T. Kanade, J. Cohn, Handbook of Face Recognition, Springer, 2005 (Chapter 11. Facial Expression Analysis).
[5] Y. Yacoob, L.S. Davis, Recognizing human facial expression from long image sequences using optical flow, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (6) (1996) 636–642.
[6] I. Essa, A. Pentland, Coding, analysis, interpretation, and recognition of facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 757–763.
[7] M.J. Lyons, J. Budynek, S. Akamatsu, Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999) 1357–1362.
[8] G. Donato, M. Bartlett, J. Hager, P. Ekman, T. Sejnowski, Classifying facial actions, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (10) (1999) 974–989.
[9] M. Pantic, L. Rothkrantz, Expert system for automatic analysis of facial expression, Image and Vision Computing 18 (11) (2000) 881–905.
[10] Y. Tian, T. Kanade, J. Cohn, Recognizing action units for facial expression analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2) (2001) 97–115.
[11] I. Cohen, N. Sebe, A. Garg, L. Chen, T.S. Huang, Facial expression recognition from video sequences: temporal and static modeling, Computer Vision and Image Understanding 91 (2003) 160–187.
[12] L. Yin, J. Loi, W. Xiong, Facial expression representation and recognition based on texture augmentation and topographic masking, in: ACM Multimedia, 2004.
[13] M. Yeasin, B. Bullot, R. Sharma, From facial expression to level of interests: a spatio-temporal approach, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[14] J. Hoey, J.J. Little, Value directed learning of gestures and facial displays, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[15] Y. Chang, C. Hu, M. Turk, Probabilistic expression analysis on manifolds, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[16] R.E. Kaliouby, P. Robinson, Real-time inference of complex mental states from facial expressions and head gestures, in: IEEE CVPR Workshop on Real-time Vision for Human–Computer Interaction, 2004.
[17] M. Pantic, L.J.M. Rothkrantz, Facial action recognition for facial expression analysis from static face images, IEEE Transactions on Systems, Man, and Cybernetics 34 (3) (2004) 1449–1461.
[18] Y. Zhang, Q. Ji, Active and dynamic information fusion for facial expression understanding from image sequences, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (5) (2005) 1–16.
[19] M.S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, J. Movellan, Recognizing facial expression: machine learning and application to spontaneous behavior, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[20] F. Dornaika, F. Davoine, Simultaneous facial action tracking and expression recognition using a particle filter, in: IEEE International Conference on Computer Vision (ICCV), 2005.
[21] C.S. Lee, A. Elgammal, Facial expression analysis using nonlinear decomposable generative models, in: IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2005.
[22] M. Valstar, I. Patras, M. Pantic, Facial action unit detection using probabilistic actively learned support vector machines on tracked facial point data, in: IEEE Conference on Computer Vision and Pattern Recognition Workshop, vol. 3, 2005, pp. 76–84.
[23] M. Valstar, M. Pantic, Fully automatic facial action unit detection and temporal analysis, in: IEEE Conference on Computer Vision and Pattern Recognition Workshop, 2006, p. 149.
[24] M. Pantic, I. Patras, Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences, IEEE Transactions on Systems, Man, and Cybernetics 36 (2) (2006) 433–449.
[25] Z. Zhang, M.J. Lyons, M. Schuster, S. Akamatsu, Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron, in: IEEE International Conference on Automatic Face & Gesture Recognition (FG), 1998.
[26] Y. Tian, Evaluation of face resolution for expression analysis, in: CVPR Workshop on Face Processing in Video, 2004.
[27] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on featured distribution, Pattern Recognition 29 (1) (1996) 51–59.
[28] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987.
[29] T. Ahonen, A. Hadid, M. Pietikäinen, Face recognition with local binary patterns, in: European Conference on Computer Vision (ECCV), 2004.
[30] A. Hadid, M. Pietikäinen, T. Ahonen, A discriminative feature space for detecting and recognizing faces, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[31] X. Feng, A. Hadid, M. Pietikäinen, A coarse-to-fine classification scheme for facial expression recognition, International Conference on Image Analysis and Recognition (ICIAR), Lecture Notes in Computer Science, vol. 3212, Springer, 2004, pp. 668–675.
[32] Y. Tian, L. Brown, A. Hampapur, S. Pankanti, A. Senior, R. Bolle, Real world real-time automatic recognition of facial expression, in: IEEE Workshop on Performance Evaluation of Tracking and Surveillance (PETS), Australia, 2003.
[33] C. Shan, S. Gong, P.W. McOwan, Robust facial expression recognition using local binary patterns, in: IEEE International Conference on Image Processing (ICIP), Genoa, vol. 2, 2005, pp. 370–373.
[34] S. Liao, W. Fan, C.S. Chung, D.-Y. Yeung, Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features, in: IEEE International Conference on Image Processing (ICIP), 2006, pp. 665–668.
[35] C. Darwin, The Expression of the Emotions in Man and Animals, John Murray, London, 1872.
[36] M. Suwa, N. Sugie, K. Fujimora, A preliminary note on pattern recognition of human emotional expression, in: International Joint Conference on Pattern Recognition, 1978, pp. 408–410.
[37] M. Bartlett, G. Littlewort, C. Lainscsek, I. Fasel, J. Movellan, Machine learning methods for fully automatic recognition of facial expressions and facial actions, in: IEEE International Conference on Systems, Man & Cybernetics, Netherlands, 2004.
[38] M. Turk, A.P. Pentland, Face recognition using eigenfaces, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1991.
[39] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[40] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks 13 (6) (2002) 1450–1464.
[41] C. Shan, S. Gong, P.W. McOwan, Appearance manifold of facial expression, in: N. Sebe, M.S. Lew, T.S. Huang (Eds.), IEEE ICCV Workshop on Human–Computer Interaction, Lecture Notes in Computer Science, vol. 3723, Springer, Beijing, 2005, pp. 221–230.
[42] C. Padgett, G. Cottrell, Representing face images for emotion classification, in: Advances in Neural Information Processing Systems (NIPS), 1997.
[43] P. Ekman, W. Friesen, Facial Action Coding System: A Technique for Measurement of Facial Movement, Consulting Psychologists Press, 1978.
[44] P. Ekman, W. Friesen, Pictures of Facial Affect, Consulting Psychologists, 1976.
[45] T. Kanade, J. Cohn, Y. Tian, Comprehensive database for facial expression analysis, in: IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2000.
[46] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[47] M.R. Everingham, A. Zisserman, Regression and classification approaches to eye localization in face images, in: IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2006, pp. 441–446.
[48] M. Bartlett, G. Littlewort, I. Fasel, R. Movellan, Real time face detection and facial expression recognition: development and application to human–computer interaction, in: CVPR Workshop on CVPR for HCI, 2003.
[49] V.N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[50] C.-W. Hsu, C.-C. Chang, C.-J. Lin, A Practical Guide to Support Vector Classification, Tech. Rep., Taipei, 2003.
[51] X. Feng, M. Pietikäinen, T. Hadid, Facial expression recognition with local binary patterns and linear programming, Pattern Recognition and Image Analysis 15 (2) (2005) 546–548.
[52] G. Guo, C.R. Dyer, Simultaneous feature selection and classifier training via linear programming: a case study for face expression recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[53] R.E. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions, Machine Learning 37 (3) (1999) 297–336.
[54] G. Zhang, X. Huang, S.Z. Li, Y. Wang, X. Wu, Boosting local binary pattern (LBP)-based face recognition, in: Chinese Conference on Biometric Recognition (SINOBIOMETRICS), 2004, pp. 179–186.
[55] C. Shan, S. Gong, P.W. McOwan, Conditional mutual information based boosting for facial expression recognition, in: British Machine Vision Conference (BMVC), Oxford, vol. 1, 2005, pp. 399–408.
[56] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1) (1997) 119–139.
[57] M. Pantic, M. Valstar, R. Rademaker, L. Maat, Web-based database for facial expression analysis, in: IEEE International Conference on Multimedia and Expo (ICME), 2005.
[58] D. Bolme, M. Teixeria, J. Beveridge, B. Draper, The CSU face identification evaluation system: its purpose, features and structure, in: International Conference on Vision Systems, 2003, pp. 304–311.
[59] G. Littlewort, M. Bartlett, I. Fasel, J. Susskind, J. Movellan, Dynamics of facial expression extracted automatically from video, Image and Vision Computing 24 (6) (2006) 615–625.
[60] J.N. Bassili, Emotion recognition: the role of facial movement and the relative importance of upper and lower area of the face, Journal of Personality and Social Psychology 37 (11) (1979) 2049–2058.
[61] G. Zhao, M. Pietikäinen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (6) (2007) 915–928.