Cohn Kanade Database
infrequently, however, and provide an incomplete description of facial expression [11]. To capture the subtlety of human facial expression, fine-grained description of facial expression is needed. The Facial Action Coding System [FACS: 4] is a human-observer-based system designed to detect subtle changes in facial features. Viewing videotaped facial behavior in slow motion, trained observers can manually FACS code all possible facial displays, which are referred to as action units (AU) and may occur individually or in combinations.

FACS consists of 44 action units. Thirty are anatomically related to contraction of a specific set of facial muscles (Table 1) [22]. The anatomic basis of the remaining 14 is unspecified (Table 2). These 14 are referred to in FACS as miscellaneous actions. Many action units may be coded as symmetrical or asymmetrical. For action units that vary in intensity, a 5-point ordinal scale is used to measure the degree of muscle contraction.

Although Ekman and Friesen proposed that specific combinations of FACS action units represent prototypic expressions of emotion, emotion-specified expressions are not part of FACS; they are coded in separate systems, such as EMFACS [8]. FACS itself is purely descriptive and includes no inferential labels. By converting FACS codes to EMFACS or similar systems, face images may be coded for emotion-specified expressions (e.g., joy or anger) as well as for more molar categories of positive or negative emotion [13].
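As a concrete illustration of how such codes might be handled computationally, FACS annotations are commonly written as compact strings, with action units joined by "+" and intensity, when scored, appended as a letter on the 5-point A-E scale. The following minimal Python sketch parses such strings; the exact string format and the function name are our illustrative assumptions, not part of FACS or of the database described here.

def parse_facs_code(code):
    # Map the 5-point ordinal intensity scale (commonly written A-E) to 1-5.
    intensity_scale = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
    parsed = []
    for token in code.split("+"):
        token = token.strip()
        if token and token[-1].upper() in intensity_scale and token[:-1].isdigit():
            parsed.append((int(token[:-1]), intensity_scale[token[-1].upper()]))
        else:
            parsed.append((int(token), None))  # no intensity scored
    return parsed

# Example: AU 1+4, plus AU 12 scored at intensity B.
print(parse_facs_code("1+4+12B"))  # [(1, None), (4, None), (12, 2)]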
Table 1 (continued). Action units and their muscular basis.
AU   Muscular basis                                      Description of movement
28   Orbicularis oris                                    Lips sucked
41   Relaxation of levator palpebrae superioris          Upper eyelid droop
42   Orbicularis oculi                                   Eyelid slit
43   Relaxation of levator palpebrae superioris;         Eyes closed
     orbicularis oculi, pars palpebralis
44   Orbicularis oculi, pars palpebralis                 Eyes squinted
45   Relaxation of levator palpebrae superioris;         Blink
     orbicularis oculi, pars palpebralis
46   Relaxation of levator palpebrae superioris;         Wink
     orbicularis oculi, pars palpebralis

Table 2. Miscellaneous actions.
AU   Description of movement
8    Lips toward each other
19   Tongue show
21   Neck tighten
29   Jaw thrust
30   Jaw sideways
31   Jaw clench
32   Bite lip
33   Blow
34   Puff
35   Cheek suck
36   Tongue bulge
37   Lip wipe
38   Nostril dilate
39   Nostril compress
2.2 Transitions among expressions

A simplifying assumption in previous research is that expressions are singular and begin and end from a neutral position. In reality, facial expression is more complex, especially at the level of action units. Action units may occur in combinations or show serial dependence. Transitions from one action unit or combination of actions to another may involve no intervening neutral state. Parsing the stream of behavior is an essential requirement of a robust facial analysis system, and training data are needed that include dynamic combinations of action units, which may be either additive or non-additive.
An example of an additive combination is smiling (AU 12) with mouth opening, which would be coded as AU 12+25, AU 12+26, or AU 12+27 depending on the degree of lip parting and whether and how far the mandible is lowered. In the case of AU 12+27, for instance, the facial analysis system would need to detect transitions among all three levels of mouth opening while continuing to recognize AU 12, which may be simultaneously changing in intensity.

Non-additive combinations represent further complexity. Following usage in speech science, we refer to these interactions as co-articulation effects. An example is the combination AU 12+15, which often occurs in embarrassment. While AU 12 raises the cheeks, its action on the lip corners is modified by the downward action of AU 15. The resulting appearance change is highly dependent on timing: the downward action of the lip corners may occur either simultaneously or sequentially. To be comprehensive, a database should include individual action units and both additive and non-additive combinations, especially those that involve co-articulation effects. A classifier trained only on single action units may perform poorly for combinations in which co-articulation effects occur, as the sketch below illustrates.
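Because combinations are the joint occurrence of constituent action units rather than new categories, AU recognition is naturally a multi-label rather than a multi-class problem. A minimal Python sketch of the corresponding target encoding follows; the AU subset and helper name are our illustrative assumptions, not part of the database.

def encode_au_targets(code, au_vocabulary):
    # Binary multi-label vector: one slot per action unit in the vocabulary.
    present = {int(au) for au in code.split("+")}
    return [1 if au in present else 0 for au in au_vocabulary]

AU_VOCABULARY = [1, 4, 6, 12, 15, 25, 26, 27]  # illustrative subset only

print(encode_au_targets("12+25", AU_VOCABULARY))  # additive:     [0, 0, 0, 1, 0, 1, 0, 0]
print(encode_au_targets("12+15", AU_VOCABULARY))  # non-additive: [0, 0, 0, 1, 1, 0, 0, 0]

Note that the encoding itself cannot express the co-articulation effect: AU 12 receives the same label in both rows even though its appearance differs, which is why training data must contain the non-additive combinations themselves.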
2.3 Deliberate versus spontaneous expression

Most face expression data have been collected by asking subjects to perform a series of expressions. These directed facial action tasks may differ in appearance and timing from spontaneously occurring behavior [5]. Deliberate and spontaneous facial behavior are mediated by separate motor pathways, the pyramidal and extra-pyramidal motor tracts, respectively [16]. As a consequence, fine-motor control of deliberate facial actions is often inferior to, and less symmetric than, that which occurs spontaneously. Many people, for instance, are able to raise their outer brows spontaneously while leaving their inner brows at rest; few can perform this action voluntarily. Spontaneous depression of the lip corners (AU 15) and raising and narrowing the inner corners of the brow (AU 1+4) are common signs of sadness. Without training, few people can perform these actions deliberately, which incidentally is an aid to lie detection [5]. Differences in the temporal organization of spontaneous and deliberate facial actions are particularly important in that many pattern recognition approaches, such as hidden Markov models, are highly dependent on the timing of appearance change. Unless a database includes both
deliberate and spontaneous facial actions, it will likely prove inadequate for developing face expression methods that are robust to these differences.

2.4 Reliability of expression data

When training a system to recognize facial expression, the investigator assumes that training and test data are accurately labeled. This assumption may or may not be accurate. Asking subjects to perform a given action is no guarantee that they will. To ensure internal validity, expression data must be manually coded, and the reliability of the coding verified. Inter-observer reliability can be improved by providing rigorous training to observers and monitoring their performance. FACS coders must pass a standardized test, which ensures (initially) uniform coding among international laboratories. Monitoring is best achieved by having observers independently code a portion of the same data. As a general rule, 15% to 20% of data should be comparison coded. To guard against drift in coding criteria [12], re-standardization is important. In assessing reliability, coefficient kappa [7] is preferable to the raw percentage of agreement, which may be inflated by the marginal frequencies of codes. Kappa quantifies inter-observer agreement after correcting for the level of agreement expected by chance; a brief sketch of the computation follows.
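The sketch below computes the statistic directly from its definition: observed agreement corrected by the agreement expected from the coders' marginal frequencies. The variable names and toy labels are ours, for illustration only.

from collections import Counter

def cohen_kappa(coder_a, coder_b):
    # Observed proportion of items on which the two coders agree.
    n = len(coder_a)
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: FACS codes assigned independently by two coders.
coder_1 = ["12", "12+25", "neutral", "1+4", "12"]
coder_2 = ["12", "12+25", "neutral", "4",   "12"]
print(round(cohen_kappa(coder_1, coder_2), 3))  # 0.737; raw agreement is 0.8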
2.5 Individual differences among subjects

Face shape, texture, color, and facial and scalp hair vary with sex, ethnic background, and age [6, 23]. Infants, for instance, have smoother, less textured skin and often lack facial hair in the brows or scalp. The eye opening and the contrast between iris and sclera differ markedly between Asians and Northern Europeans, which may affect the robustness of eye tracking and facial feature analysis more generally. Beards, eyeglasses, or jewelry may obscure facial features. Such individual differences in appearance may have important consequences for face analysis, yet few attempts to study their influence exist. One exception is a study by Zlochower et al. [23], who found that algorithms for optical flow and high-gradient component detection that had been optimized for young adults performed less well when used with infants. The reduced texture of infants' skin, their increased fatty tissue, juvenile facial conformation, and lack of transient furrows may all have contributed to the differences observed in face analysis between infants and adults.

In addition to individual differences in appearance, there are individual differences in expressiveness, which refers to the degree of facial plasticity, morphology, frequency of intense expression, and overall rate of expression. Individual differences in these characteristics are well established and are an important aspect of individual identity [14]. (Incidentally, these individual differences could be used to augment the accuracy of face recognition algorithms.) An extreme example of variability in expressiveness occurs in individuals who have incurred damage either to the facial nerve or to the central nervous system [16, 19, 21]. To develop algorithms that are robust to individual differences in facial features and behavior, it is essential to include a large sample of varying ethnic background, age, and sex that includes people who have facial hair and wear jewelry or eyeglasses, and that includes both normal and clinically impaired individuals.

2.6 Head orientation and scene complexity

Face orientation relative to the camera, the presence and actions of other people, and background conditions may all influence face analysis. In the face recognition literature, face orientation has received deliberate attention. The FERET database [17], for instance, includes both frontal and oblique views, and several specialized databases have been collected to try to develop methods of face recognition that are invariant to moderate change in face orientation [20]. In the face expression literature, use of multiple perspectives is rare, and relatively little attention has been focused on the problem of pose invariance. Most researchers assume that face orientation is limited to in-plane variation [1] or that out-of-plane variation is small [10, 11]. In reality, large out-of-plane variation in head position is common and often accompanies change in expression. Kraut [9] found that smiling typically occurs while turning toward another person. Camras [2] showed that infant surprise expressions often occur as the infant pitches her head back. To develop pose-invariant methods of face expression analysis, image data are needed in which facial expression changes in combination with significant non-planar change in pose.

Scene complexity, such as background and the presence of other people, potentially influences the accuracy of face detection, feature tracking, and expression recognition. Most databases use image data in which the background is neutral or has a consistent pattern and only a single person is present in the scene. In natural environments, multiple people interacting with each other are likely to be present, and their effects need to be understood. Unless this variation is represented in training data, it will be difficult to develop and test algorithms that are robust to such variation.
2.7 Image acquisition and resolution

Image acquisition includes the properties and number of the video cameras and digitizer, the size of the face image relative to total image dimensions, and ambient lighting. All of these factors may influence facial expression analysis. Images acquired in low light or at coarse resolution provide less information about facial features. Similarly, when the face image is small relative to the total image size, less information is available. NTSC cameras record images at 30 frames per second; the implications of down-sampling from this rate are unknown. Many algorithms for optical flow assume that pixel displacement between adjacent frames is small. Unless they are tested at a range of sampling rates, robustness to sampling rate and resolution cannot be assessed; one possible probe is sketched below.
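One way to probe the small-displacement assumption is to re-estimate dense optical flow at several temporal down-sampling factors and compare the resulting displacement magnitudes. The sketch below assumes OpenCV's Farneback flow and a pre-loaded grayscale sequence; the function name, parameter values, and down-sampling factors are our illustrative choices, not a published benchmark.

import cv2
import numpy as np

def mean_flow_magnitude(frames, step):
    # Mean per-pixel displacement between frames `step` apart; growth with
    # `step` signals violation of the small-displacement assumption.
    magnitudes = []
    for i in range(0, len(frames) - step, step):
        flow = cv2.calcOpticalFlowFarneback(
            frames[i], frames[i + step], None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        magnitudes.append(float(np.linalg.norm(flow, axis=2).mean()))
    return float(np.mean(magnitudes))

# frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in sequence_paths]
# for step in (1, 2, 3, 5):  # 30, 15, 10, and 6 fps equivalents
#     print(step, mean_flow_magnitude(frames, step))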
Within an image sequence, change in head position relative to the light source and variation in ambient lighting have potentially significant effects on face expression analysis. A light source above the subject's head will cause shadows to fall below the brows, which can obscure the eyes, especially for subjects with more pronounced bone structure or hair. Methods that work well in studio lighting may perform poorly in more naturalistic lighting (e.g., through an exterior window) when the angle of lighting changes across an image sequence.

Most investigators use single-camera setups, which is problematic when a frontal orientation is not required. With image data from a single camera, out-of-plane variation may be difficult to standardize. For more than small out-of-plane rotation, multiple cameras may be required. Multiple-camera setups can support 3-D modeling and, in some cases, ground truth with which to assess the accuracy of image alignment.

Image resolution is another concern. Professional-grade PAL cameras, for instance, provide very high resolution images; by contrast, security cameras provide images that are seriously degraded. Although post-processing may improve image resolution, the degree of potential improvement is likely limited, and the effects of post-processing on expression recognition are not known. Algorithms that work well at optimal resolutions of full-face frontal images and studio lighting can be expected to perform poorly when recording conditions are degraded or images are compressed. Without knowing the boundary conditions of face expression algorithms, comparative performance is difficult to assess. Algorithms that appear superior within one set of boundary conditions may perform more poorly across the range of potential applications. Appropriate data with which these factors can be tested are needed.

2.8 Relation to non-facial behavior

Facial expression is one of several channels of nonverbal communication that may occur together. Contraction of the zygomaticus major (AU 12), for instance, often is associated with positive or happy vocalizations, and smiling tends to increase vocal fundamental frequency [3]. Few research groups, however, have attempted to integrate gesture recognition, broadly defined, across multiple channels of communication. An important question is whether there are advantages to early rather than late integration. Databases containing multi-modal expressive behavior afford opportunity for integrated approaches to analysis of facial expression, prosody, gesture, and kinetic expression.

2.9 Summary and problem statement

The problem space for facial expression includes multiple dimensions. To develop robust methods of facial expression analysis, these dimensions must be adequately sampled. In addition, to allow for comparative tests of alternative approaches to facial expression analysis, appropriate data must be made available to the face analysis community. To meet these needs, we have developed the CMU-Pittsburgh AU-Coded Facial Expression Database to serve as a test-bed for algorithm development and testing.

3. The CMU-Pittsburgh AU-Coded Face Expression Image Database

Our interdisciplinary research group of psychologists and computer scientists is developing a large, representative facial expression database for use in both training and testing of algorithms for facial expression analysis. In this section we first describe the CMU-Pittsburgh AU-Coded Face Expression Database. We then evaluate the database against the criteria presented above and discuss current and future work.

3.1 Description of database

Facial behavior was recorded in 210 adults between the ages of 18 and 50 years. They were 69% female and 31% male; 81% were Euro-American, 13% Afro-American, and 6% other groups (Table 3). Subjects were observed in an observation room equipped with a chair on which to sit and two Panasonic WV3230 cameras, each connected to a Panasonic
AG-7500 video recorder with a Horita synchronized time-code generator. One of the cameras was located directly in front of the subject; the other was positioned 30 degrees to the subject's right. An example of image data from the CMU-Pittsburgh AU-Coded Facial Expression Database can be seen in Figure 1. For approximately one third of subjects, ambient room lighting augmented by a high-intensity lamp was used. For the other two thirds, two high-intensity lamps with reflective umbrellas were used to provide uniform lighting.

… apex and not from beginning to end, we are performing additional coding. We also are increasing the number of action units that have been coded for intensity.
The database has several limitations in its present form. Intensity scoring of action units is incomplete, and for many sequences only target frames rather than the entire sequence have been coded. As noted above, continued coding is underway in order to further improve the data. Another limitation is the lack of spontaneous expressions. Because deliberate and spontaneous expressions may have different appearance and timing, it is important to have adequate numbers of each.

One solution, which we are pursuing, is to examine our videotapes for instances of spontaneously occurring action units that occurred during the experimental session. We now have a large sample of spontaneous smiles (AU 12) and related action units (e.g., AU 6), and these will be added to the database.

… the face analysis problem space. The face images and facial features, especially in the infant, are small relative to the image size, the infant's face has low texture, some shadows occur, and the likelihood of sudden and large motion, occasional occlusion, and moderate out-of-plane motion is high. These are challenging problems for which appropriate training and testing data are critical.

Surgical application. Another data source is facial behavior from patients who have experienced damage to the facial nerve or the higher brain centers that control facial behavior [16]. An example from [21] can be seen in Figure 3. Notice the mild asymmetry in repose due to muscle weakness and the more marked asymmetry that occurs in the second frame. The inclusion of clinical data such as these challenges assumptions of symmetry, which are common when working with directed facial action task images from subjects who have normal facial function.
… characteristics, and relation to other non-verbal behavior. Development of robust methods of facial expression analysis requires access to databases that adequately sample this problem space. The CMU-Pittsburgh AU-Coded Facial Expression Image Database provides a valuable test-bed with which multiple approaches to facial expression analysis may be tested. In current and new work, we will further increase the generalizability of this database.

Figure 4. Face images obtained from omni-view camera. (6 images are shown here out of the 50 that were available.)

5. Acknowledgements

This research was supported by grant number R01 MH51435 from the National Institute of Mental Health.

6. References

[1] M.S. Bartlett, J.C. Hager, P. Ekman, T.J. Sejnowski, Measuring facial expressions by computer image analysis. Psychophysiology, 36:253-264, 1999.
[2] L.A. Camras, L. Lambrecht, G.F. Michel, Infant "surprise" expressions as coordinative motor structures. Journal of Nonverbal Behavior, 20:183-195, 1996.
[3] J.F. Cohn, G.S. Katz, Bimodal expression of emotion by face and voice. ACM and ATR Workshop on Face/Gesture Recognition and Their Applications, 41-44, 1998.
[4] P. Ekman, W.V. Friesen, Facial Action Coding System. Consulting Psychologists Press, Palo Alto, CA, 1978.
[5] P. Ekman, E. Rosenberg (Eds.), What the Face Reveals. NY: Oxford University, 1997.
[6] L.G. Farkas, I.R. Munro, Anthropometric Facial Proportions in Medicine. Springfield, IL: Charles C. Thomas, 1987.
[7] J.L. Fleiss, Statistical Methods for Rates and Proportions. NY: Wiley, 1981.
[8] W.V. Friesen, P. Ekman, EMFACS-7: Emotional Facial Action Coding System. Unpublished manuscript, University of California at San Francisco, 1983.
[9] R.E. Kraut, R.E. Johnston, Social and emotional messages of smiling: An ethological approach. Journal of Personality and Social Psychology, 37:1539-1553, 1979.
[10] J.J.J. Lien, T. Kanade, J.F. Cohn, C.C. Li, Detection, tracking, and classification of subtle changes in facial expression. Journal of Robotics and Autonomous Systems, in press.
[11] J.J.J. Lien, T. Kanade, J.F. Cohn, C.C. Li, Automated facial expression recognition. Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition (FG'98), 390-395, 1998.
[12] P. Martin, P. Bateson, Measuring Behavior: An Introductory Guide. Cambridge: Cambridge University, 1986.
[13] R. Matias, J.F. Cohn, S. Ross, A comparison of two systems to code infants' affective expression. Developmental Psychology, 25:483-489, 1989.
[14] A.S.R. Manstead, Expressiveness as an individual difference. In R.S. Feldman & B. Rime (Eds.), Fundamentals of Nonverbal Behavior, 285-328. NY: Cambridge University, 1991.
[15] P.J. Narayanan, P. Rander, T. Kanade, Constructing virtual worlds using dense flow. Proceedings of the International Conference on Computer Vision '98, 3-10, 1998.
[16] W.E. Rinn, The neuropsychology of facial expression: A review of the neurological and psychological mechanisms for producing facial expressions. Psychological Bulletin, 95:52-77, 1984.
[17] S.R. Rizvi, P.J. Phillips, H. Moon, The FERET verification testing protocol for face recognition algorithms. Proceedings of the Third International Conference on Automatic Face and Gesture Recognition, 48-55, 1998.
[18] Y.L. Tian, T. Kanade, J.F. Cohn, Robust lip tracking by combining shape, color and motion. Asian Conference on Computer Vision, 2000.
[19] J.M. VanSwearingen, J.F. Cohn, A. Bajaj-Luthra, Specific impairment of smiling increases severity of depressive symptoms in patients with facial neuromuscular disorders. Journal of Aesthetic Plastic Surgery, in press.
[20] T. Vetter, Learning novel views to a single face image. Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition '95, 22-29, 1995.
[21] G.S. Wachtman, J.F. Cohn, J. VanSwearingen, E.K. Manders, Pixel-wise tracking of facial movement by computer image processing. Robert H. Ivy Society of Plastic & Reconstructive Surgeons, Pittsburgh, 1999.
[22] A.J. Zlochower, Deciphering emotion from the face: An evaluation of FACS, EMG, and computer-vision based approaches to facial expression analysis. Unpublished manuscript, University of Pittsburgh, 1997.
[23] A.J. Zlochower, J.F. Cohn, J.J. Lien, T. Kanade, A computer vision based method of facial expression analysis in parent-infant interaction. International Conference on Infant Studies, Atlanta, Georgia, April 1998.