Facial expression recognition based on Local Binary Patterns: A comprehensive study

C. Shan, S. Gong, P.W. McOwan
Image and Vision Computing 27 (2009) 803–816
doi:10.1016/j.imavis.2008.08.005

Article history: Received 12 June 2006; Received in revised form 14 February 2008; Accepted 16 August 2008

Keywords: Facial expression recognition; Local Binary Patterns; Support vector machine; AdaBoost; Linear discriminant analysis; Linear programming

Abstract

Automatic facial expression analysis is an interesting and challenging problem, and impacts important applications in many areas such as human–computer interaction and data-driven animation. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. In this paper, we empirically evaluate facial representation based on statistical local features, Local Binary Patterns, for person-independent facial expression recognition. Different machine learning methods are systematically examined on several databases. Extensive experiments illustrate that LBP features are effective and efficient for facial expression recognition. We further formulate Boosted-LBP to extract the most discriminant LBP features, and the best recognition performance is obtained by using Support Vector Machine classifiers with Boosted-LBP features. Moreover, we investigate LBP features for low-resolution facial expression recognition, which is a critical problem but seldom addressed in the existing work. We observe in our experiments that LBP features perform stably and robustly over a useful range of low resolutions of face images, and yield promising performance in compressed low-resolution video sequences captured in real-world environments.

© 2008 Elsevier B.V. All rights reserved.
low-dimensional feature space, while still retaining discriminative facial information in a compact representation. We further formulate Boosted-LBP by learning the most discriminative LBP features with AdaBoost, and the recognition performance of different classifiers is improved by using the Boosted-LBP features. We also evaluate the generalization ability of LBP features across different databases.

One limitation of the existing facial expression recognition methods is that they attempt to recognize facial expressions from data collected in a highly controlled environment, given high-resolution frontal faces [26]. However, in real-world applications such as smart meetings and visual surveillance, the input face images are often at low resolutions. Obviously, low-resolution images in real-world environments make real-life expression recognition much more difficult. Recently Tian et al. [32,26] made a first attempt to recognize facial expressions at low resolutions. In [26], Tian studied the effects of different image resolutions for each step of automatic facial expression recognition. In this work, we investigate LBP features for low-resolution facial expression recognition. Experiments on different image resolutions show that LBP features perform stably and robustly over a useful range of low resolutions of face images. The encouraging performance on real-world compressed video sequences illustrates their promising applications in real-world environments.

This paper is an extended version of our previous work described in [33]. The main contributions of this paper are summarized as follows:

• We empirically evaluate LBP features for person-independent facial expression recognition. Different machine learning methods are exploited to classify expressions on several databases. LBP features were previously used for facial expression classification in [31], and more recently, following our work [33], Liao et al. [34] presented an extended LBP operator to extract features for facial expression recognition. However, these existing works were conducted on a very small database (JAFFE) using an individual classifier. In contrast, here we comprehensively study LBP features for facial expression recognition with different classifiers on much larger databases.
• We investigate LBP features for low-resolution facial expression recognition, a critical problem but seldom addressed in the existing work. We not only perform evaluation on different image resolutions, but also conduct experiments on real-world compressed video sequences. Compared to the previous work [32,26], LBP features provide just as good or better performance, so are very promising for real-world applications.
• We formulate Boosted-LBP by learning the most discriminative LBP histograms with AdaBoost for each expression, and the recognition performance of different classifiers is improved by using the Boosted-LBP features. We also evaluate the generalization ability of LBP features across different databases.

The remainder of this paper is structured as follows. We present a brief review of related work in the next section. Section 3 describes the facial expression data used in our experiments. Local Binary Patterns are introduced in Section 4. Section 5 discusses facial expression recognition using LBP features with different classification techniques. We investigate low-resolution expression recognition in Section 6. Boosting LBP for expression recognition is presented in Section 7. We also evaluate across-dataset generalization in Section 8. Finally, Section 9 concludes the paper.

2. Previous work

Automatic facial expression recognition has attracted much attention from behavioral scientists since the work of Darwin in 1872 [35]. Suwa et al. [36] made the first attempt to automatically analyze facial expressions from image sequences in 1978. Much progress has been made in the last decade, and a thorough survey of the existing work can be found in [1,2]. Here we briefly review some previous work in order to put our work in context.

2.1. Facial representation

Automatic facial expression recognition involves two vital aspects: facial representation and classifier design [4]. Facial representation is to derive a set of features from original face images to effectively represent faces. The optimal features should minimize within-class variations of expressions while maximizing between-class variations. If inadequate features are used, even the best classifier could fail to achieve accurate recognition. In some existing work [5,6,14,13], optical flow analysis has been used to model muscle activities or estimate the displacements of feature points. However, flow estimates are easily disturbed by nonrigid motion and varying lighting, and are sensitive to the inaccuracy of image registration and motion discontinuities [18]. Facial geometry analysis has been widely exploited in facial representation [9,10,16,17,24], where shapes and locations of facial components are extracted to represent the face geometry. For example, Zhang et al. [25] used the geometric positions of 34 fiducial points as facial features to represent facial images. In image sequences, the facial movements can be quantified by measuring the geometrical displacement of facial feature points between the current frame and the initial frame. Valstar et al. [22] presented AU detection by classifying features calculated from tracked fiducial facial points. Their method detects a similar number of AUs with similar or higher recognition rates than those reported in [10,3,37], so they argued that the facial representation based on tracked facial points is well suited for facial expression analysis. Recently they [23] further presented a fully automatic AU detection system that can automatically localize facial points in the first frame and recognize AU temporal segments using a subset of the most informative spatio-temporal features selected by AdaBoost. However, the geometric feature-based representation commonly requires accurate and reliable facial feature detection and tracking, which is difficult to accommodate in many situations. In [18], motions of facial features are measured by simultaneously using active Infra-Red illumination and Kalman filtering to deal with large variations of head motion.

Another kind of method to represent faces is to model the appearance changes of faces. Holistic spatial analysis including Principal Component Analysis (PCA) [38], Linear Discriminant Analysis (LDA) [39], Independent Component Analysis (ICA) [40] and Gabor wavelet analysis [7] has been applied to either the whole face or specific face regions to extract the facial appearance changes. Donato et al. [8] explored different techniques to represent face images for facial action recognition, including PCA, ICA, Local Feature Analysis (LFA), LDA and local schemes such as Gabor-wavelet representation and local principal components. The best performances were obtained using Gabor-wavelet representation and ICA. Due to their superior performance, Gabor-wavelet representations have been widely adopted in face image analysis [25,7,26,19]. However, the computation of Gabor-wavelet representations is both time and memory intensive; for example, in [19], the Gabor-wavelet representation derived from each 48×48 face image has a high dimensionality of O(10^5). Recently Local Binary Patterns have been introduced as effective appearance features for facial image analysis [31,29,30]. We [33] compared LBP features with Gabor features for facial expression recognition, and studied their performance over a range of image resolutions. In [41], we further presented facial expression manifold learning in the LBP feature space. More recently, Liao et al. [34] introduced
an improved LBP operator to extract features in both intensity and gradient maps for facial expression recognition, and also tested their methods on facial images of reduced resolutions. However, their experiment was carried out on a very small database (213 images from 10 subjects). In this work, we comprehensively study LBP features for facial expression recognition on several databases.

2.2. Facial expression recognition

Different techniques have been proposed to classify facial expressions, such as Neural Networks [42,25,26], Support Vector Machines (SVM) [19], Bayesian Networks (BN) [11] and rule-based classifiers [9,17,24]. In Lyons et al.'s work [7], the principal components of the feature vectors from training images were analyzed by LDA to form discriminant vectors, and facial image classification was performed by projecting the input vector of a testing image along the discriminant vectors. Cohen et al. compared different Bayes classifiers [11], and Gaussian Tree-Augmented-Naive (TAN) Bayes classifiers performed best. Bartlett et al. [19] performed a systematic comparison of different techniques including AdaBoost, SVM and LDA for facial expression recognition; the best results were obtained by selecting a subset of Gabor filters using AdaBoost and then training SVM on the outputs of the selected filters. Pantic and Rothkrantz adopted rule-based reasoning to recognize action units and their combinations [17].

To exploit the temporal behavior of facial expressions, different techniques have been presented for facial expression recognition in image sequences. There have been several attempts to track and recognize facial expressions over time based on optical flow analysis [5,6]. Tian et al. [10] presented a Neural Network based approach to recognize facial action units in image sequences. Hidden Markov Models (HMMs) have been widely used to model the temporal behavior of facial expressions in image sequences [11,13]. Cohen et al. [11] proposed a multi-level HMM classifier, which allows not only expression classification on a video segment, but also automatic segmentation of a long video sequence into different expression segments without resorting to heuristic methods of segmentation. But HMMs cannot deal with dependencies in observations. Dynamic Bayesian Networks (DBNs) were recently exploited for sequence-based expression recognition [16,14,18]. Kaliouby and Robinson [16] proposed a system for inferring complex mental states from videos of facial expressions and head gestures, where a multi-level DBN classifier was used to model complex mental states as a number of interacting facial and head displays. Zhang and Ji [18] explored the use of multisensory information fusion with DBNs for modeling and understanding the temporal behavior of facial expressions in image sequences. Chang et al. proposed a probabilistic video-based facial expression recognition method based on manifolds [15]. Lee and Elgammal [21] recently introduced a framework to learn decomposable generative models for the dynamic appearance of facial expressions, where facial motion is constrained to one-dimensional closed manifolds. The learned model can generate different dynamic facial appearances for different people and different expressions, so enabling simultaneous recognition of faces and facial expressions.

3. Facial expression data

Facial expressions can be described at different levels [4]. A widely used description is the Facial Action Coding System (FACS) [43], which is a human-observer-based system developed to capture subtle changes in facial expressions. With FACS, facial expressions are decomposed into one or more Action Units (AUs). AU recognition or detection has attracted much attention recently [8,10,18,23]. Meanwhile, psychophysical studies indicate that basic emotions have corresponding universal facial expressions across all cultures [44]. This is reflected by most current facial expression recognition systems [7,11–13,19], which attempt to recognize a set of prototypic emotional expressions including disgust, fear, joy, surprise, sadness and anger. Therefore, in this work, we also focus on prototypic expression recognition. We consider both 6-class prototypic expression recognition and 7-class expression recognition by including the neutral expression.

We mainly conducted experiments on the Cohn–Kanade database [45], one of the most comprehensive databases in the current facial-expression-research community. The database consists of 100 university students aged from 18 to 30 years, of which 65% were female, 15% were African-American and 3% were Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of prototypic emotions. Image sequences from neutral to target display were digitized into 640×490 pixel arrays with 8-bit precision for gray-scale values. Fig. 1 shows some sample images from the Cohn–Kanade database.

For our experiments, we selected 320 image sequences from the database. The only selection criterion was that a sequence could be labeled as one of the six basic emotions. The sequences come from 96 subjects, with 1–6 emotions per subject. For each sequence, the neutral face and three peak frames were used for prototypic expression recognition, resulting in 1280 images (108 Anger, 120 Disgust, 99 Fear, 282 Joy, 126 Sadness, 225 Surprise and 320 Neutral). To evaluate the generalization performance to novel subjects, we adopted a 10-fold cross-validation testing scheme in our experiments. More precisely, we partitioned the dataset randomly into ten groups of roughly equal numbers of subjects. Nine groups were used as training data to train the classifiers, while the remaining group was used as test data. The above process was repeated ten times, with each group in turn omitted from the training process. We report the average recognition results on the test sets.

Following Tian [26], we normalized the faces to a fixed distance between the two eyes. We manually labeled the eye locations, in order to evaluate LBP features under the condition of no face registration errors. Automatic face registration can be achieved by face detection [46] and eye localization [26,47], which will be addressed in our future work. Facial images of 110×150 pixels were cropped from the original frames based on the location of the two eyes. No further registration such as alignment of the mouth [25] was performed in our algorithms. As the faces in the database are in frontal view, we did not consider head pose changes. For realistic sequences with head pose variation, head pose estimation [26] can be adopted to detect frontal or near-frontal views. Illumination changes exist in the database, but no attempt was made to remove them [26] in our experiments, due to LBP's gray-scale invariance. Fig. 2 shows an example of the original face image and the cropped image.

4. Local Binary Patterns (LBP)

The original LBP operator was introduced by Ojala et al. [27], and was proved a powerful means of texture description. The operator labels the pixels of an image by thresholding a 3×3 neighborhood of each pixel with the center value and considering the results as a binary number (see Fig. 3 for an illustration); the 256-bin histogram of the LBP labels computed over a region is then used as a texture descriptor. The derived binary numbers (called Local Binary Patterns or LBP codes) codify local primitives including different types of curved edges, spots, flat areas, etc. (as shown in Fig. 4), so each LBP code can be regarded as a micro-texton [30].

The limitation of the basic LBP operator is its small 3×3 neighborhood, which cannot capture dominant features with large scale
Fig. 1. Sample facial expression images from the Cohn–Kanade database.
Fig. 4. Examples of texture primitives which can be detected by LBP (white circles represent ones and black circles zeros) [30].
Fig. 5. Three examples of the extended LBP [28]: the circular (8,1) neighborhood, the circular (12,1.5) neighborhood, and the circular (16,2) neighborhood, respectively.
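As an illustration of the basic operator described above, the following is a minimal sketch in Python with NumPy (not the authors' code) of computing 3×3 LBP codes, building a 256-bin region histogram, and concatenating per-sub-region histograms into a face descriptor. The 7×6 grid and the neighbor bit ordering are assumptions made for illustration; the experiments in this paper use the extended operator of Fig. 5, whose details are not fully reproduced in this excerpt.

```python
import numpy as np

def lbp_image(gray):
    """Label each interior pixel with its basic 3x3 LBP code (0..255)."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    # Eight neighbors, enumerated clockwise from the top-left corner
    # (the bit order is an arbitrary but fixed convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (neighbor >= center).astype(np.int32) << bit
    return code

def lbp_histogram(region):
    """256-bin histogram of the LBP labels in one region, normalized to sum to 1."""
    hist = np.bincount(lbp_image(region).ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def face_descriptor(face, grid=(7, 6)):
    """Concatenate per-sub-region histograms into a single feature vector,
    following the equally divided sub-region scheme used in this paper.
    The grid size is a hypothetical choice for illustration."""
    rows, cols = grid
    h, w = face.shape
    hists = [lbp_histogram(face[r * h // rows:(r + 1) * h // rows,
                                c * w // cols:(c + 1) * w // cols])
             for r in range(rows) for c in range(cols)]
    return np.concatenate(hists)
```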
Table 1
Comparisons between the geometric-features-based TAN [11] and our LBP-based template matching

Methods (feature + classifier)     7-Class recognition (%)    6-Class recognition (%)
LBP + template matching            79.1 ± 4.6                 84.5 ± 5.2
Geometric features + TAN [11]      73.2                       –

Table 3
Recognition performance of LBP-based SVM with different kernels

                    6-Class recognition (%)    7-Class recognition (%)
SVM (linear)        91.5 ± 3.1                 88.1 ± 3.8
SVM (polynomial)    91.5 ± 3.1                 88.1 ± 3.8
SVM (RBF)           92.6 ± 2.9                 88.9 ± 3.5
Table 2
Confusion matrix of 7-class facial expression recognition using template matching with LBP features

            Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)  Neutral (%)
Anger       58.7       5.5          0         0        26.7         0             9.1
Disgust     3.3        85.0         2.5       0        2.5          0             6.7
Fear        1.0        0            61.7      24.0     10.3         0             3.0
Joy         0          0            6.0       90.4     0            0             3.6
Sadness     4.9        0            0         0        72.4         1.7           21.0
Surprise    0          0            1.3       0        2.7          92.4          3.6
Neutral     2.0        0.8          0.4       0.8      25.7         0             70.3

Table 4
Confusion matrix of 6-class facial expression recognition using SVM (RBF)

            Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Sadness (%)  Surprise (%)
Anger       89.7       2.7          0         0        7.6          0
Disgust     0          97.5         2.5       0        0            0
Fear        0          2.0          73.0      22.0     3.0          0
Joy         0          0.4          0.7       97.9     1.0          0
Sadness     10.3       0            0.8       0.8      83.5         4.6
Surprise    0          0            1.3       0        0            98.7
Table 5
Confusion matrix of 7-class facial expression recognition using SVM (RBF)

1 http://www.kyb.tuebingen.mpg.de/bs/people/spider/index.html.
2 i.e., 9, 9√2, 18, 18√2, 36 pixels per cycle, so the frequencies used are 12, 6√2, 6, 3√2, 3 cycles/image-width.
Table 7
Time and memory costs for extracting LBP features and Gabor-filter features

                                  LBP       Gabor     Gabor [48]
Memory (feature dimension)        2478      42,650    92,160
Time (feature extraction time)    0.03 s    30 s      –

Table 8
Comparison between LDA + NN and SVM (linear) for facial expression recognition using LBP features

                7-Class recognition (%)    6-Class recognition (%)
LDA + NN        73.4 ± 5.6                 79.2 ± 7.2
SVM (linear)    80.2 ± 4.9                 87.7 ± 4.7
where each binary classifier was produced by the linear programming technique. Binary classifiers were combined with a voting scheme to output the final recognition result. To reduce the length of the LBP feature vector, we also discarded the dimensions whose occurrence frequency is lower than a threshold [51]. A threshold of 5 was adopted in our experiments.

In our 10-fold cross-validation experiments, the linear programming technique produces a generalization performance of 82.3% for 7-class recognition and 89.6% for 6-class recognition. We compare its performance with that of SVM (linear) in Table 9, where the input of SVM (linear) is also the feature vector with dimensions discarded. It is observed that the linear programming technique produces slightly inferior performance to SVM (linear).

Table 9
Comparison between the linear programming technique and SVM (linear) for facial expression recognition

                      7-Class recognition (%)    6-Class recognition (%)
Linear programming    82.3 ± 3.5                 89.6 ± 3.6
SVM (linear)          86.0 ± 3.3                 90.4 ± 3.9
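To make the dimension-discarding step concrete, here is one plausible reading of it, sketched in Python with NumPy (an assumption about the exact procedure, which the text specifies only as dropping rarely occurring dimensions): accumulate the raw LBP bin counts over the training set and drop every dimension whose total occurrence is below the threshold of 5.

```python
import numpy as np

def select_frequent_dims(train_counts, threshold=5):
    """Return a boolean mask over feature dimensions, keeping only those
    whose accumulated occurrence over the training set reaches the threshold.
    train_counts: (n_samples, n_dims) array of raw LBP bin counts."""
    return train_counts.sum(axis=0) >= threshold

# Usage sketch: apply the mask learned on the training data to both splits.
# mask = select_frequent_dims(train_counts, threshold=5)
# train_reduced, test_reduced = train_counts[:, mask], test_counts[:, mask]
```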
Table 10
Recognition performance (%) on low-resolution images with different methods

                          110×150       55×75         36×48         27×37         18×24         14×19
LBP                       92.6 ± 2.9    89.9 ± 3.1    87.3 ± 3.4    84.3 ± 4.1    79.6 ± 4.7    76.9 ± 5.0
Gabor                     89.8 ± 3.1    89.2 ± 3.0    86.4 ± 3.3    83.0 ± 4.3    78.2 ± 4.5    75.1 ± 5.1
Gabor [26]                92.2          91.6          –             77.6          –             68.2
Feature tracking [26]     91.8          91.6          –             N/A           –             N/A
Feature detection [26]    73.8          72.9          –             61.3          –             N/A

"–" indicates that the image resolution was not examined in [26], and "N/A" indicates that the image resolution was studied in [26], but no recognition result was obtained.
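The resolution sweep of Table 10 can be outlined in code by down-sampling each face crop before feature extraction. The sketch below reuses the hypothetical face_descriptor helper from the Section 4 sketch and nearest-neighbor resizing; the interpolation actually used, and the coarser sub-region grids appropriate at the smallest resolutions, are assumptions not specified in this excerpt.

```python
import numpy as np

def downsample(img, out_h, out_w):
    """Nearest-neighbor resize (an assumption; the paper's interpolation is unspecified)."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols]

# The (width x height) resolutions examined in Table 10.
RESOLUTIONS = [(110, 150), (55, 75), (36, 48), (27, 37), (18, 24), (14, 19)]

def descriptors_at_all_resolutions(face):
    """Extract one LBP descriptor per resolution from a full-size face crop.
    A coarse 3x2 grid is used so every sub-region stays large enough for
    3x3 LBP codes even at 14x19; the paper's grids are not given here."""
    return {f"{w}x{h}": face_descriptor(downsample(face, h, w), grid=(3, 2))
            for (w, h) in RESOLUTIONS}
```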
The real-world video sequence contains the full range of head motion. In Tian's previous work [32], the head pose was first estimated based on the detected head, and then, for frontal and near-frontal views of the face, the facial features were extracted to perform facial expression recognition. Since our focus was investigating the validity of LBP features on real-world compressed video inputs, we did not consider pose estimation currently. We cropped the face region in frontal and near-frontal views based on the location of the two eyes from the input image sequence, then performed recognition on the cropped face images. Fig. 9 shows face regions cropped in one frame.

It is very difficult, even for human beings, to recognize facial expressions at low resolution. Following Tian et al. [32], experiments were conducted in which some frames of expressions at low resolution were shown to a small set of human observers (in this instance, five researchers in our lab); many of them could not perform recognition consistent with the ground truth provided by the PETS dataset (original GT). Tian et al. modified the ground truth based on the majority. Here we also generated a new ground truth (modified GT) for some frames based on human observations. Examples of modified GT vs original GT are shown in Table 11.

Table 11
Examples of modified GT vs original GT

Original GT    Neutral     Joy         Neutral    Neutral
Modified GT    Sideview    Sideview    Joy        Joy

A total of 1209 images from the Cohn–Kanade database were used to train the SVM classifier. Since face regions in the PETS dataset are around 40×50 pixels, the training images were down-sampled from the original images to 38×48 pixels. The trained classifier recognized five expressions: neutral, joy, anger, surprise and others (including fear, sadness and disgust).

Our method performed well on the input real-world image sequence. The overall recognition rate on frames 18,000 to 18,190 was 91.5%, which is comparable to results reported in Tian's work [32]. Table 12 shows some failed examples. We observe that some frames in near-frontal view were incorrectly classified because our training data includes only frontal-view expressions. Additionally, as the training images are captured when subjects exaggeratedly pose their facial expressions, while the test images are natural facial expressions without any deliberate exaggerated posing, this difference in data also brings some classification errors.

Table 12
Examples of failed recognition

Modified GT     Joy       Joy       Neutral    Neutral
Test results    Others    Others    Joy        Others

Fig. 9. We cropped the face region in frontal and near-frontal view based on the location of the two eyes from the input image sequence (frame 17,130).

7. Boosting LBP for facial expression recognition

The above experiments clearly demonstrate that the LBP features are effective for facial expression recognition, and performed just as well as or better than reported existing techniques, but with a significant low-computation advantage. In the above investigation, face images are equally divided into small sub-regions from which LBP histograms are extracted and concatenated into a single feature vector. However, the extracted LBP features apparently depend on the divided sub-regions, so this LBP feature extraction scheme suffers from fixed sub-region sizes and positions. By shifting and scaling a sub-window over face images, many more sub-regions can be obtained, bringing many more LBP histograms, which yield a more complete description of face images. To sift the very large number of LBP histograms necessarily introduced by shifting and scaling a sub-window, boosting learning [53] can be used to learn the most effective LBP histograms, those containing the most discriminative information. In [54], Zhang et al. presented an approach for face recognition by boosting LBP-based classifiers, where the distance between corresponding LBP histograms of two face images is used as a discriminative feature, and AdaBoost was used to learn a few of the most efficient features. In our previous work [55], we presented a conditional mutual information based boosting scheme to select the most discriminative LBP histograms for facial expression recognition. We observed that AdaBoost performs better than the conditional mutual information based boosting when using several tens of weak classifiers. Therefore, in this section, we learn the most discriminative LBP histograms using AdaBoost for better facial representation.
AdaBoost methods [56,53] provide a simple yet effective approach for stagewise learning of a nonlinear classification function. AdaBoost learns a small number of weak classifiers whose performance can be just better than random guessing, and boosts them iteratively into a strong classifier of higher accuracy. The process of AdaBoost maintains a distribution over the training samples. At each iteration, the weak classifier which minimizes the weighted error rate is selected, and the distribution is updated to increase the weights of the misclassified samples and reduce the importance of the others. AdaBoost has been successfully used in many problems such as face detection [46].

As each LBP histogram is calculated from a sub-region, AdaBoost is actually used to find the sub-regions that contain more discriminative information for facial expression classification in terms of the LBP histogram. For the weak classifiers of AdaBoost, we adopted histogram-based template matching. For each sub-region, the LBP histograms in a given class are averaged to generate a template for this class. The trained weak classifier matches the input histogram with the closest template, and outputs the corresponding class label. The chi-square statistic (χ²) was used as the dissimilarity measure for histograms (Eq. (4)). As the traditional AdaBoost works on two-class problems, the multi-class problem here is accomplished by using the one-against-rest technique, which trains AdaBoost between one expression and all the others. For each AdaBoost learner, the images of one expression were positive samples, while the images of all other expressions were negative samples.

By shifting and scaling a sub-window, 16,640 sub-regions, i.e., 16,640 LBP histograms, in total were extracted from each face image. The sub-window was shifted across the whole image with a shifting step of 4 pixels, while its size was scaled between 10×10 pixels and 25×20 pixels with a scaling step of 5 pixels. AdaBoost was used to learn a small subset (in the tens) of effective LBP histograms. We plot in Fig. 10 the spatial localization of the 50 sub-regions (i.e., the centers of the sub-regions) that correspond to the top 50 LBP histograms selected by AdaBoost for each expression. It is observed that different expressions have different key discriminant LBP features, and the discriminant features are mainly distributed in the eye and mouth regions.

We performed facial expression recognition using the strong classifiers boosted by AdaBoost, outputting the class with the largest positive output among the binary classifiers. In our experiments, AdaBoost training continued until the classifier output distributions for the positive and negative samples were completely separated, so the number of LBP histograms selected for each expression was not pre-defined, but automatically decided by the AdaBoost learner itself. In the 10-fold experiments, the number of selected LBP histograms ranges from 49 to 52 for 6-class expressions and from 65 to 70 for 7-class expressions. For example, Fig. 11 displays the selected sub-regions (LBP histograms) for each basic expression in one trial of the 10-fold cross-validation. We can observe that the selected sub-regions have variable sizes and positions. Moreover, while the weights of sub-regions in the template matching in Section 5.1 were chosen empirically, the weights in the boosted classifiers were learned by AdaBoost. The generalization performance of the boosted classifiers is 84.6% for 7-class recognition and 89.8% for 6-class recognition, respectively. As shown in Table 13, compared to the LBP based template matching in Section 5.1, AdaBoost (Boosted-LBP) provides improved performance. We also show the confusion matrix of 7-class recognition using AdaBoost in Table 14, where Disgust, Joy, Surprise and Neutral can be recognized with high accuracy. It can be seen that AdaBoost's performance is inferior to that of SVM (RBF) reported in Table 5 for most expressions except Fear and Neutral.

We further combine feature selection by AdaBoost with classification by SVM. In particular, we train SVM with the Boosted-LBP features. In each trial of the 10-fold cross-validation, we applied AdaBoost to learn the discriminative LBP histograms for each expression, and then utilized the union of the selected LBP histograms as the input for SVMs. For example, in Fig. 11, the union of all selected sub-regions resulted in a total of 51 LBP histograms. The generalization performance of Boosted-LBP based SVMs is summarized in Table 15, where the degree of the polynomial kernel is 1 and the standard deviation for the RBF kernel is 2^11. For comparison, we also include the recognition performance of LBP based SVMs (in Section 5.2) in Table 15. We observe that Boosted-LBP based SVMs outperform LBP-based SVMs by around 2.5–3.5 percentage points. The 7-class expression recognition result of 91.4% is very encouraging, compared to the state of the art [11]. Bartlett et al. [19] obtained the best performance of 93.3% by selecting a subset of Gabor filters using AdaBoost and then training SVM on the outputs of the selected filters. With regard to the 6-class recognition,
Fig. 10. Distributions of the top 50 sub-regions (LBP histograms) selected by AdaBoost for each expression.
Fig. 11. The sub-regions (LBP histograms) selected by AdaBoost for each emotion, from left to right: Anger, Disgust, Fear, Joy, Sadness, Surprise.
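Eq. (4) itself is not reproduced in this excerpt; the sketch below assumes the standard chi-square histogram dissimilarity, χ²(S, M) = Σ_i (S_i − M_i)² / (S_i + M_i), and shows in outline (Python with NumPy, not the authors' implementation) how the template-matching weak learner over a single sub-region plugs into a one-against-rest discrete AdaBoost loop that selects discriminative LBP histograms.

```python
import numpy as np

def chi_square(s, m, eps=1e-12):
    """Chi-square dissimilarity between two histograms (assumed form of Eq. (4))."""
    return np.sum((s - m) ** 2 / (s + m + eps))

class RegionTemplateStump:
    """Weak classifier over one sub-region: +1 if the region histogram is
    closer to the positive-class template than to the negative one."""
    def __init__(self, region):
        self.region = region  # index of the sub-region this stump looks at

    def fit(self, feats, y, w):
        # feats: (n_samples, n_regions, n_bins); y in {+1, -1}; w: sample weights.
        # Templates are class averages, here weighted by the current AdaBoost
        # distribution (one natural choice; the paper averages per class).
        x = feats[:, self.region]
        self.pos = np.average(x[y == 1], axis=0, weights=w[y == 1])
        self.neg = np.average(x[y == -1], axis=0, weights=w[y == -1])
        return self

    def predict(self, feats):
        x = feats[:, self.region]
        d_pos = np.array([chi_square(h, self.pos) for h in x])
        d_neg = np.array([chi_square(h, self.neg) for h in x])
        return np.where(d_pos < d_neg, 1, -1)

def adaboost_select(feats, y, n_rounds=50):
    """Discrete AdaBoost with one-against-rest labels already encoded in y.
    Returns the chosen (region, alpha, stump) triples."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    chosen = []
    for _ in range(n_rounds):
        best, best_err, best_pred = None, 0.5, None
        for r in range(feats.shape[1]):            # every candidate sub-region
            stump = RegionTemplateStump(r).fit(feats, y, w)
            pred = stump.predict(feats)
            err = np.sum(w[pred != y])
            if err < best_err:
                best, best_err, best_pred = stump, err, pred
        if best is None:                           # no weak learner beats chance
            break
        alpha = 0.5 * np.log((1 - best_err) / max(best_err, 1e-12))
        chosen.append((best.region, alpha, best))
        w *= np.exp(-alpha * y * best_pred)        # re-weight the samples
        w /= w.sum()
    return chosen
```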
Table 13
Recognition performance of Boosted-LBP vs LBP

                           7-Class recognition (%)    6-Class recognition (%)
AdaBoost (Boosted-LBP)     85.0 ± 4.5                 89.8 ± 4.7
LBP + template matching    79.1 ± 4.6                 84.5 ± 5.2

Table 16
Confusion matrix of 7-class facial expression recognition using Boosted-LBP based SVM

Table 17
Recognition performance of LBP based LDA vs Boosted-LBP based LDA

                          7-Class recognition (%)    6-Class recognition (%)
LBP based LDA             73.4 ± 5.6                 79.2 ± 7.2
Boosted-LBP based LDA     77.6 ± 5.7                 84.2 ± 6.1
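Training the Boosted-LBP based SVM can then be sketched as follows, with scikit-learn standing in for the Spider toolbox cited in the earlier footnote (an assumption, as is the mapping from the quoted "standard deviation" 2^11 to the gamma parameter); feats and adaboost_select refer to the sketch above.

```python
import numpy as np
from sklearn.svm import SVC

def boosted_lbp_svm(feats, labels, per_class_selections, sigma=2.0 ** 11):
    """Train an SVM on the union of the LBP histograms selected by the
    per-expression AdaBoost runs (a sketch of the Boosted-LBP based SVM)."""
    # Union of sub-region indices chosen by all one-against-rest learners.
    regions = sorted({r for sel in per_class_selections for (r, _, _) in sel})
    X = feats[:, regions].reshape(len(feats), -1)  # concatenate chosen histograms
    # RBF kernel with gamma = 1 / (2 * sigma^2), assuming "standard deviation"
    # refers to the Gaussian kernel width.
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2)).fit(X, labels)
    return clf, regions
```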
Fig. 13. Sample facial expression images from the JAFFE database.
tween the two eyes; face images of 110×150 pixels were cropped from the original frames based on the location of the two eyes.

We first performed 10-fold cross-validation on each dataset; the recognition rates are shown in the top two rows of Table 18, where the degree of the polynomial kernel is 1 and the standard deviation for the RBF kernel is 2^15 for the MMI database and 2^8 for the JAFFE database. The best recognition performance of 86.9% on the MMI database is inferior to that on the Cohn–Kanade database. This is possibly because there are fewer images in the dataset, and subjects are wearing glasses. The performance on the JAFFE database is worst overall compared to that on the Cohn–Kanade database and the MMI database, and this may also be due to the much smaller dataset. With LBP features and the linear programming technique, Feng et al. [51] reported a performance of 93.8% on the JAFFE database. They preprocessed the images using the CSU Face Identification Evaluation System [58] to exclude the nonface area with an elliptical mask. Liao et al. [34] recently reported a recognition performance of 85.6% on the JAFFE database, but they did not conduct 10-fold cross-validation.

We then performed across-dataset experiments, i.e., we performed LBP feature selection and SVM training on the Cohn–Kanade database, and then tested the classifier on the MMI database and the JAFFE database, respectively. Recognition results are shown in the bottom two rows of Table 18, where the degree of the polynomial kernel is 1 and the standard deviation for the RBF kernel is 2^14 for the MMI database and 2^11 for the JAFFE database. We observe that the generalization performance across datasets was much lower: around 50% on the MMI database and around 40% on the JAFFE database. These results actually reinforce Bartlett et al.'s recent finding [59], where they trained Gabor-wavelet-feature-based SVMs on the Cohn–Kanade database, tested them on another database, Pictures of Facial Affect, and obtained 56–60% performance. As we preprocessed face images of different databases in the same way, the only difference between them is that they were collected under different controlled environments. So the current expression classifier trained on a single dataset with a uniformly controlled environment works well only within that dataset. In order to generalize across image collection environments, we have to collect large training datasets with variations in imaging conditions [59].

Table 18
Generalization performance of Boosted-LBP based SVM on other datasets

                                   SVM (linear) (%)    SVM (polynomial) (%)    SVM (RBF) (%)
MMI                                86.7                86.7                    86.9
JAFFE                              79.8                79.8                    81.0
Train: Cohn–Kanade, Test: MMI      50.8                50.8                    51.1
Train: Cohn–Kanade, Test: JAFFE    40.4                40.4                    41.3

9. Conclusions and future work

In this paper, we present a comprehensive empirical study of facial expression recognition based on Local Binary Patterns features. Different classification techniques are examined on several databases. The key issues of this work can be summarized as follows:

1. Deriving an effective facial representation from original face images is a vital step for successful facial expression recognition. We empirically evaluate LBP features to describe the appearance changes of expression images. Extensive experiments illustrate that LBP features are effective and efficient for facial expression recognition.
2. One challenge for facial expression recognition is recognizing facial expressions at low resolutions, as only compressed low-resolution video input is available in real-world applications. We investigate LBP features on low-resolution images, and observe that LBP features perform stably and robustly over a useful range of low resolutions of face images.
3. We adopt AdaBoost to learn the most discriminative LBP features from a large LBP feature pool. The best recognition performance is obtained by using SVM with Boosted-LBP features. However, this method has limited generalization to other datasets.

Since the performance of the boosted strong classifier originates in the characteristics of its weak hypothesis space, we will evaluate other kinds of weak classifiers as alternatives to template matching, in order to achieve better classification performance. One limitation of this work is that the recognition is performed on static images without exploiting the temporal behavior of facial expressions. The psychological experiments by Bassili [60] suggested that facial expressions are more accurately recognized from a dynamic image than from a single static image. We will explore temporal information in our future work. Recently volume
LBP and LBP from three orthogonal planes have been introduced for dynamic texture recognition [61], showing promising performance on facial expression recognition in video sequences. Another limitation of the current work is that we do not consider head pose variations and occlusions, which will be addressed in our future work. We will also study the effect of imprecise face location on expression recognition results.

Acknowledgements

We would like to thank Prof. Jeffery Cohn for the use of the Cohn–Kanade database, Prof. Maja Pantic and Dr. Michel F. Valstar for the use of the MMI database, and Dr. Michael J. Lyons for the use of the JAFFE database.

References

[1] M. Pantic, L. Rothkrantz, Automatic analysis of facial expressions: the state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) 1424–1445.
[2] B. Fasel, J. Luettin, Automatic facial expression analysis: a survey, Pattern Recognition 36 (2003) 259–275.
[3] M. Pantic, L. Rothkrantz, Toward an affect-sensitive multimodal human–computer interaction, Proceedings of the IEEE 91 (2003) 1370–1390.
[4] Y. Tian, T. Kanade, J. Cohn, Handbook of Face Recognition, Springer, 2005 (Chapter 11: Facial Expression Analysis).
[5] Y. Yacoob, L.S. Davis, Recognizing human facial expression from long image sequences using optical flow, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (6) (1996) 636–642.
[6] I. Essa, A. Pentland, Coding, analysis, interpretation, and recognition of facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 757–763.
[7] M.J. Lyons, J. Budynek, S. Akamatsu, Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (12) (1999) 1357–1362.
[8] G. Donato, M. Bartlett, J. Hager, P. Ekman, T. Sejnowski, Classifying facial actions, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (10) (1999) 974–989.
[9] M. Pantic, L. Rothkrantz, Expert system for automatic analysis of facial expression, Image and Vision Computing 18 (11) (2000) 881–905.
[10] Y. Tian, T. Kanade, J. Cohn, Recognizing action units for facial expression analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2) (2001) 97–115.
[11] I. Cohen, N. Sebe, A. Garg, L. Chen, T.S. Huang, Facial expression recognition from video sequences: temporal and static modeling, Computer Vision and Image Understanding 91 (2003) 160–187.
[12] L. Yin, J. Loi, W. Xiong, Facial expression representation and recognition based on texture augmentation and topographic masking, in: ACM Multimedia, 2004.
[13] M. Yeasin, B. Bullot, R. Sharma, From facial expression to level of interests: a spatio-temporal approach, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[14] J. Hoey, J.J. Little, Value directed learning of gestures and facial displays, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[15] Y. Chang, C. Hu, M. Turk, Probabilistic expression analysis on manifolds, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[16] R.E. Kaliouby, P. Robinson, Real-time inference of complex mental states from facial expressions and head gestures, in: IEEE CVPR Workshop on Real-time Vision for Human–Computer Interaction, 2004.
[17] M. Pantic, L.J.M. Rothkrantz, Facial action recognition for facial expression analysis from static face images, IEEE Transactions on Systems, Man, and Cybernetics 34 (3) (2004) 1449–1461.
[18] Y. Zhang, Q. Ji, Active and dynamic information fusion for facial expression understanding from image sequences, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (5) (2005) 1–16.
[19] M.S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, J. Movellan, Recognizing facial expression: machine learning and application to spontaneous behavior, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[20] F. Dornaika, F. Davoine, Simultaneous facial action tracking and expression recognition using a particle filter, in: IEEE International Conference on Computer Vision (ICCV), 2005.
[21] C.S. Lee, A. Elgammal, Facial expression analysis using nonlinear decomposable generative models, in: IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2005.
[22] M. Valstar, I. Patras, M. Pantic, Facial action unit detection using probabilistic actively learned support vector machines on tracked facial point data, in: IEEE Conference on Computer Vision and Pattern Recognition Workshop, vol. 3, 2005, pp. 76–84.
[23] M. Valstar, M. Pantic, Fully automatic facial action unit detection and temporal analysis, in: IEEE Conference on Computer Vision and Pattern Recognition Workshop, 2006, p. 149.
[24] M. Pantic, I. Patras, Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences, IEEE Transactions on Systems, Man, and Cybernetics 36 (2) (2006) 433–449.
[25] Z. Zhang, M.J. Lyons, M. Schuster, S. Akamatsu, Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron, in: IEEE International Conference on Automatic Face & Gesture Recognition (FG), 1998.
[26] Y. Tian, Evaluation of face resolution for expression analysis, in: CVPR Workshop on Face Processing in Video, 2004.
[27] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on featured distribution, Pattern Recognition 29 (1) (1996) 51–59.
[28] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7) (2002) 971–987.
[29] T. Ahonen, A. Hadid, M. Pietikäinen, Face recognition with local binary patterns, in: European Conference on Computer Vision (ECCV), 2004.
[30] A. Hadid, M. Pietikäinen, T. Ahonen, A discriminative feature space for detecting and recognizing faces, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[31] X. Feng, A. Hadid, M. Pietikäinen, A coarse-to-fine classification scheme for facial expression recognition, in: International Conference on Image Analysis and Recognition (ICIAR), Lecture Notes in Computer Science, vol. 3212, Springer, 2004, pp. 668–675.
[32] Y. Tian, L. Brown, A. Hampapur, S. Pankanti, A. Senior, R. Bolle, Real world real-time automatic recognition of facial expression, in: IEEE Workshop on Performance Evaluation of Tracking and Surveillance (PETS), Australia, 2003.
[33] C. Shan, S. Gong, P.W. McOwan, Robust facial expression recognition using local binary patterns, in: IEEE International Conference on Image Processing (ICIP), Genoa, vol. 2, 2005, pp. 370–373.
[34] S. Liao, W. Fan, C.S. Chung, D.-Y. Yeung, Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features, in: IEEE International Conference on Image Processing (ICIP), 2006, pp. 665–668.
[35] C. Darwin, The Expression of the Emotions in Man and Animals, John Murray, London, 1872.
[36] M. Suwa, N. Sugie, K. Fujimora, A preliminary note on pattern recognition of human emotional expression, in: International Joint Conference on Pattern Recognition, 1978, pp. 408–410.
[37] M. Bartlett, G. Littlewort, C. Lainscsek, I. Fasel, J. Movellan, Machine learning methods for fully automatic recognition of facial expressions and facial actions, in: IEEE International Conference on Systems, Man & Cybernetics, Netherlands, 2004.
[38] M. Turk, A.P. Pentland, Face recognition using eigenfaces, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1991.
[39] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) (1997) 711–720.
[40] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Transactions on Neural Networks 13 (6) (2002) 1450–1464.
[41] C. Shan, S. Gong, P.W. McOwan, Appearance manifold of facial expression, in: N. Sebe, M.S. Lew, T.S. Huang (Eds.), IEEE ICCV Workshop on Human–Computer Interaction, Lecture Notes in Computer Science, vol. 3723, Springer, Beijing, 2005, pp. 221–230.
[42] C. Padgett, G. Cottrell, Representing face images for emotion classification, in: Advances in Neural Information Processing Systems (NIPS), 1997.
[43] P. Ekman, W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement, Consulting Psychologists Press, 1978.
[44] P. Ekman, W. Friesen, Pictures of Facial Affect, Consulting Psychologists, 1976.
[45] T. Kanade, J. Cohn, Y. Tian, Comprehensive database for facial expression analysis, in: IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2000.
[46] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[47] M.R. Everingham, A. Zisserman, Regression and classification approaches to eye localization in face images, in: IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2006, pp. 441–446.
[48] M. Bartlett, G. Littlewort, I. Fasel, R. Movellan, Real time face detection and facial expression recognition: development and application to human–computer interaction, in: CVPR Workshop on CVPR for HCI, 2003.
[49] V.N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[50] C.-W. Hsu, C.-C. Chang, C.-J. Lin, A Practical Guide to Support Vector Classification, Tech. Rep., Taipei, 2003.
[51] X. Feng, M. Pietikäinen, T. Hadid, Facial expression recognition with local binary patterns and linear programming, Pattern Recognition and Image Analysis 15 (2) (2005) 546–548.
[52] G. Guo, C.R. Dyer, Simultaneous feature selection and classifier training via linear programming: a case study for face expression recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[53] R.E. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions, Machine Learning 37 (3) (1999) 297–336.
[54] G. Zhang, X. Huang, S.Z. Li, Y. Wang, X. Wu, Boosting local binary pattern (LBP)-based face recognition, in: Chinese Conference on Biometric Recognition (SINOBIOMETRICS), 2004, pp. 179–186.
[55] C. Shan, S. Gong, P.W. McOwan, Conditional mutual information based boosting for facial expression recognition, in: British Machine Vision Conference (BMVC), Oxford, vol. 1, 2005, pp. 399–408.
[56] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1) (1997) 119–139.
[57] M. Pantic, M. Valstar, R. Rademaker, L. Maat, Web-based database for facial expression analysis, in: IEEE International Conference on Multimedia and Expo (ICME), 2005.
[58] D. Bolme, M. Teixeira, J. Beveridge, B. Draper, The CSU face identification evaluation system: its purpose, features and structure, in: International Conference on Vision Systems, 2003, pp. 304–311.
[59] G. Littlewort, M. Bartlett, I. Fasel, J. Susskind, J. Movellan, Dynamics of facial expression extracted automatically from video, Image and Vision Computing 24 (6) (2006) 615–625.
[60] J.N. Bassili, Emotion recognition: the role of facial movement and the relative importance of upper and lower areas of the face, Journal of Personality and Social Psychology 37 (11) (1979) 2049–2058.
[61] G. Zhao, M. Pietikäinen, Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (6) (2007) 915–928.