Affective Computing
Focus on Emotion Expression, Synthesis and Recognition

Edited by
Jimmy Or

I-TECH Education and Publishing


Published by the I-Tech Education and Publishing, Vienna, Austria

Abstracting and non-profit use of the material is permitted with credit to the source. Statements and
opinions expressed in the chapters are those of the individual contributors and not necessarily those of
the editors or publisher. No responsibility is accepted for the accuracy of information contained in the
published articles. The publisher assumes no responsibility or liability for any damage or injury to persons
or property arising out of the use of any materials, instructions, methods or ideas contained inside. After
this work has been published by Advanced Robotic Systems International, authors have the right to
republish it, in whole or part, in any publication of which they are an author or editor, and to make
other personal use of the work.

© 2008 I-Tech Education and Publishing


www.i-techonline.com
Additional copies can be obtained from:
[email protected]

First published May 2008


Printed in Croatia

A catalogue record for this book is available from the Austrian Library.
Affective Computing, Emotion Expression, Synthesis and Recognition, Edited by Jimmy Or
p. cm.
ISBN 978-3-902613-23-3
1. Affective Computing. 2. Or, Jimmy.
Preface

Affective Computing is a branch of artificial intelligence that deals with the design
of systems and devices that can recognize, interpret, and process emotions. Since
the introduction of the term “affective computing” by Rosalind Picard at MIT in
1997, the research community in this field has grown rapidly. Affective Computing
is an important field because computer systems have become part of our daily
lives. As we now live in the Age of Information Overload and computer systems
are becoming more complex, there is a need for more natural user interfaces for
overwhelmed computer users. Given that humans communicate with each other
by using not only speech but also, implicitly, their facial expressions and body
postures, machines that can understand human emotions and display affect
through these multimodal channels could be beneficial. If virtual agents and
robots are able to recognize and express emotions through these channels, the
result will be more natural human-machine communication, allowing human
users to focus more on the tasks at hand.
This volume provides an overview of state-of-the-art research in Affective Comput-
ing. It presents new ideas, original results and practical experiences in this increas-
ingly important research field. The book consists of 23 chapters categorized into
four sections. Since one of the most important means of human communication is
facial expression, the first section of this book (Chapters 1 to 7) presents research
on the synthesis and recognition of facial expressions. Given that we use not only
the face but also body movements to express ourselves, in the second section
(Chapters 8 to 11) we present research on the perception and generation of emotional
expressions using full-body motions. The third section of the book (Chapters 12 to
16) presents computational models of emotion, as well as findings from neurosci-
ence research. In the last section of the book (Chapters 17 to 22) we present applica-
tions related to affective computing.
A brief introduction to the book chapters follows:
Chapter 1 presents a probabilistic neural network classifier for 3D analysis of facial
expressions. By using 11 facial features and taking symmetry of the human face
into consideration, the 3D distance vectors based recognition system can achieve a
high recognition rate of over 90%. Chapter 2 provides a set of deterministic and
stochastic techniques that allow efficient recognition of facial expression from a se-
ries of video imaging showing head motions. Chapter 3 reviews recent findings of
human-human interaction and demonstrates that the tangential aspects of an
emotional signal (such as gaze and the type of face that shows the expression) can
affect the perceived meaning of the expression. Findings presented in this chapter
could contribute to the design of avatars and agents used in the human-computer
interface. Chapter 4 presents an approach to using genetic algorithm and neural
network for the recognition of emotion from the face. In particular, it focuses on
the eye and lip regions for the study of emotions. Chapter 5 proposes a system that
analyzes facial expressions based on topographic shape structure (eyebrow, eye,
nose and mouth) and the active texture.
Chapter 6 proposes a model of layered fuzzy facial expression generation (LFFEG)
to create expressive facial expressions for an agent in the affective human com-
puter interface. In this model, social, emotional and physiological layers contribute
to the generation of facial expression. Fuzzy theory is used to produce rich facial
expressions and personality for the virtual character. Based on recent findings that
the dynamics of facial expressions (such as timing, duration and intensity) play an
important role in the interpretation of facial expressions, Chapter 7 examines the
analysis of facial expressions from computer vision and behavioral science
points of view. A technique that allows synthesis of photo-realistic expressions of
various intensities is described.
In recent years, humanoid robots and simulated avatars have gained popularity.
Researchers try to develop both real and simulated humanoids that can behave
and communicate with humans more naturally. It is believed that a real humanoid
robot situated in the real world could better interact with humans. Given that we
also use whole body movements to express emotions, the next generation human-
oid robots should have a flexible spine and be able to express themselves by using
full body movements. Chapter 8 points out some of the challenges in developing
flexible spine humanoid robots for emotional expressions. Then, the chapter pre-
sents the development of emotional flexible spine humanoid robots based on find-
ings from research on belly dance. Results of psychological experiments on the
effect of a full-body spine robot on human perceptions are presented.
Chapter 9 provides a review of the cues that we use in the perception of affect
from body movements. Based on findings from psychology and neuroscience, the
authors raise the issue of whether giving a machine the ability to experience emo-
tions might help to accomplish reliable and efficient emotion recognition. Given
that human communication is multimodal, Chapter 10 reviews recent research
on systems that are capable of multiple input modalities and the use of alternative
channels to perceive affect. This is followed by a presentation of systems that are
capable of analyzing spontaneous input data in real world environments. Chapter
11 applies findings from art theory to the synthesis of emotional expressions for
virtual humans. Lights, shadows, composition and filters are used as part of the
expression of emotions. In addition, the chapter proposes the use of genetic algo-
rithms to map affective states to multimodal expressions.
Since the modeling of emotion has become important in affective computing,
Chapter 12 presents a computational model of emotion. The model is capable of
integrating emotion, personality and motivation to allow the simulated characters to


have the ability of self-control in the virtual environment. Chapter 13 provides an-
other model for simulating emotions. This model, called SIMPLEX, operates in
three interconnected layers, namely personality, mood-states and emotions. Ex-
perimental results show that the simulated agents whose emotions were generated
by the model were able to exhibit emergent behavior. Chapter 14 proposes the use
of psychological emotion models to construct a new generation of user interfaces
that are capable of automatic emotional recognition by sensing and responding to
the user’s affective feedback. A Multidimensional Emotional Appraisal Semantic
Space (MEAS) semantic model is introduced. Chapter 15 reviews findings from
Neuroscience on the involvement of amygdala in emotion. This chapter explains a
general framework of how this area of the brain processes information on emotion.
Chapter 16 presents a study that shows that it is possible for a computer to automati-
cally recognize emotions of its users based on physiological signals such as PPG,
GSR and SKT gathered through a specially designed mouse. Depending on the
state of the user’s emotion, the computer can adapt its actions correspondingly.
Chapter 17 presents the iFace facial expression training system. The system can be
used for rehabilitation, improvement of business skills and daily communications.
Chapter 18 introduces an automated real time virtual character based interface.
The 3D agents are able to interact with the user through multimodal and emotional
interaction. Depending on the emotional state the agents detect from the user’s fa-
cial expression during conversation, the agents are able to modify their emotional
states accordingly. The system allows more natural and interactive communica-
tions between computers and users. Chapter 19 proposes the design of an intelli-
gent tutoring system based on hand movements around the face of the user. Chap-
ter 20 presents a framework for affective-sensitive human-machine interaction.
Based on physiological signals from children users with ASD, an affect-sensitive
robot adapts its behavior to the affect of its users accordingly in real time. The sys-
tem could be used for interactive autism intervention. Chapter 21 discusses the de-
velopment of a plug-in interface for the storytelling authoring tools Inscape and
Tetrix. Using the plug-in, the authors are able to easily create interactive stories
that explore the emotional dimension of characters in the virtual world. The inter-
esting point is that the actions of the virtual characters can be influenced by their
own personal experience. Finally, Chapter 22 reviews computer therapy systems
that have been used in recent years to treat emotional disorders such as phobias.
These systems propose that by presenting anxiety-provoking stimuli in a con-
trolled virtual environment, different social and emotional disorders can be
treated. A model that supports computer assisted regulation and voluntary control
of emotion is presented.

Acknowledgements
This book would not have been possible without the support of my colleagues and
friends. I owe a great debt to Atsuo Takanishi of Waseda University. He gave me
freedom and support to pursue my research on flexible spine humanoid robotics
during my stay in his lab. I also would like to thank Robin Cohen, Lenhart Schu-
bert, Shun-ichi Amari, Michael Arbib, Auke Ijspeert, David Willshaw, Xie Ming,
Eugene Fink, Charles Sanders, Hyun Wook Park, Jungmin Han and many others
for their support over the years. Many thanks to Lorna Gow for introducing me to
the wonderful world of belly dance. Special thanks to KAIST President Nam Pyo
Suh and Dean of Engineering Yong Hoon Lee for their support during my stay at
KAIST. Many thanks to the authors of the book chapters for their contributions. Fi-
nally, I would like to express my thanks to Dr. Vedran Kordic and the staff at I-
Tech Education and Publishing for their help in making the production of this
book possible.

Jimmy Or
May 2008
Center for High-Performance Integrated Systems
Korea Advanced Institute of Science and Technology
Daejeon, Republic of Korea
Contents

Preface ..... V

1. Facial Expression Recognition Using 3D Facial Feature Distances ..... 001
   Hamit Soyel and Hasan Demirel

2. Facial Expression Recognition in the Presence of Head Motion ..... 013
   Fadi Dornaika and Franck Davoine

3. The Devil is in the Details - the Meanings of Faces and How They Influence
   the Meanings of Facial Expressions ..... 045
   Ursula Hess, Reginald B. Adams, Jr. and Robert E. Kleck

4. Genetic Algorithm and Neural Network for Face Emotion Recognition ..... 057
   M. Karthigayan, M. Rizon, R. Nagarajan and Sazali Yaacob

5. Classifying Facial Expressions Based on Topo-Feature Representation ..... 069
   Xiaozhou Wei, Johnny Loi and Lijun Yin

6. Layered Fuzzy Facial Expression Generation: Social, Emotional and Physiological ..... 083
   Xia Mao, Yuli Xue, Zheng Li and Haiyan Bao

7. Modelling, Classification and Synthesis of Facial Expressions ..... 107
   Jane Reilly, John Ghent and John McDonald

8. The Development of Emotional Flexible Spine Humanoid Robots ..... 133
   Jimmy Or

9. The Perception of Bodily Expressions of Emotion and the Implications for Computing ..... 157
   Winand H. Dittrich and Anthony P. Atkinson

10. From the Lab to the Real World: Affect Recognition Using Multiple Cues and Modalities ..... 185
    Hatice Gunes, Massimo Piccardi and Maja Pantic

11. The Art of Expressing Emotions in Virtual Humans ..... 219
    Celso de Melo and Ana Paiva

12. Computational Emotion Model for Virtual Characters ..... 235
    Zhen Liu

13. SIMPLEX - Simulation of Personal Emotion Experience ..... 255
    Henrik Kessler, Alexander Festini, Harald C. Traue, Suzanne Filipic, Michael Weber and Holger Hoffmann

14. From Signals to Emotions: Applying Emotion Models to HM Affective Interactions ..... 271
    Rita Ciceri and Stefania Balzarotti

15. The Information Processing Role of the Amygdala in Emotion ..... 297
    Wataru Sato

16. A Physiological Approach to Affective Computing ..... 309
    Mincheol Whang and Joasang Lim

17. iFace: Facial Expression Training System ..... 319
    Kyoko Ito, Hiroyuki Kurose, Ai Takami and Shogo Nishida

18. Affective Embodied Conversational Agents for Natural Interaction ..... 329
    Eva Cerezo, Sandra Baldassarri, Isabelle Hupont and Francisco J. Seron

19. Exploring Un-Intentional Body Gestures for Affective System Design ..... 355
    Abdul Rehman Abbasi, Nitin V. Afzulpurkar and Takeaki Uno

20. Towards Affect-sensitive Assistive Intervention Technologies for Children with Autism ..... 365
    Karla Conn, Changchun Liu, Nilanjan Sarkar, Wendy Stone and Zachary Warren

21. Authoring Emotion ..... 391
    Nelson Zagalo, Rui Prada, Isabel Machado Alexandre and Ana Torres

22. Computer-Assisted Regulation of Emotional and Social Processes ..... 405
    Toni Vanhala and Veikko Surakka

23. Generating Facial Expressions with Deep Belief Nets ..... 421
    Joshua M. Susskind, Geoffrey E. Hinton, Javier R. Movellan and Adam K. Anderson
1

Facial Expression Recognition Using 3D Facial Feature Distances

Hamit Soyel and Hasan Demirel
Eastern Mediterranean University
Northern Cyprus

1. Introduction
The face plays an important role in human communication. Facial expressions and gestures
carry nonverbal information which contributes to human communication. By
recognizing facial expressions from facial images, a number of applications in the field of
human-computer interaction can be facilitated. Over the last two decades, developments
and prospects in the field of multimedia signal processing have attracted the attention of
many computer vision researchers to the problems of facial expression recognition. The
pioneering studies of Ekman in the late 70s provided evidence for the classification of the
basic facial expressions. According to these studies, the basic facial expressions are those
representing happiness, sadness, anger, fear, surprise, disgust and neutral. The Facial Action
Coding System (FACS) was developed by Ekman and Friesen to code facial expressions, in
which the movements on the face are described by action units. This work inspired many
researchers to analyze facial expressions in 2D by means of image and video processing
where, by tracking facial features and measuring the amount of facial movement, they
attempt to classify different facial expressions. Recent work on facial expression analysis
and recognition has used these seven basic expressions as the basis for the introduced
systems.
Almost all of the methods developed use the 2D distribution of facial features as input to a
classification system, and the outcome is one of the facial expression classes. They differ
mainly in the facial features selected and the classifiers used to distinguish among the
different facial expressions. Information extracted from 3D face models is rarely used in
facial expression recognition. This chapter considers techniques that use information
extracted from 3D space for the analysis of facial images and the recognition of facial
expressions.
The first part of the chapter introduces the methods of extracting information from 3D
models for facial expression recognition. The 3D distributions of the facial feature points
and the estimation of characteristic distances in order to represent the facial expressions are
explained by using a rich collection of illustrations including graphs, charts and face images.
The second part of the chapter introduces 3D distance-vector based facial expression
recognition. The architecture of the system is explained by the block diagrams and
flowcharts. Finally 3D distance-vector based facial expression recognition is compared with
the conventional methods available in the literature.

2. Information extracted from 3D models for facial expression recognition


Conventional methods for analyzing expressions in facial images use limited information
such as gray levels of pixels and positions of feature points in a face [Donato et al., 1999],
[Fasel & Luettin, 2003], [Pantic & Rothkrantz, 2004]. Their results depend on the
information used. If the information cannot be precisely extracted from the facial images,
then we may obtain unexpected results. In order to increase the reliability of the results of
facial expression recognition, the selection of the relevant feature points is important.
In this section we are primarily concerned with gathering the relevant data from the facial
animation sequences for expression recognition. The section is organised as follows. In
section 2.1 we will present the description of the primary facial expressions while section 2.2
shows the muscle actions involved in the primary facial expressions and in section 2.3 we
will present the optimization of the facial feature points.

2.1 Primary facial expressions


In the past, facial expression analysis was essentially a research topic for psychologists.
However, recent progress in image processing and pattern recognition has motivated
significant research activity on automatic facial expression recognition [Braathen et
al.,2002]. Basic facial expressions, shown in Figure 1, typically recognized by psychologists
are neutral, anger, sadness, surprise, happiness, disgust and fear [P. Ekman & W.
Friesen,1976]. The expressions are textually defined in Table 1.

Fig.1. Emotion-specified facial expression [Yin et al., 2006]: 1-Neutral, 2-Anger, 3-Sadness, 4-
Surprise, 5- Happiness, 6- Disgust, 7- Fear.

Expression    Textual Description

Neutral       All face muscles are relaxed. Eyelids are tangent to the iris. The mouth is
              closed and lips are in contact.
Anger         The inner eyebrows are pulled downward and together. The eyes are wide
              open. The lips are pressed against each other or opened to expose the teeth.
Sadness       The inner eyebrows are bent upward. The eyes are slightly closed. The
              mouth is relaxed.
Surprise      The eyebrows are raised. The upper eyelids are wide open, the lower
              relaxed. The jaw is opened.
Happiness     The eyebrows are relaxed. The mouth is open and the mouth corners pulled
              back toward the ears.
Disgust       The eyebrows and eyelids are relaxed. The upper lip is raised and curled,
              often asymmetrically.
Fear          The eyebrows are raised and pulled together. The inner eyebrows are bent
              upward. The eyes are tense and alert.

Table 1. Basic Facial Expressions [Pandzic & Forchheimer, 2002]

Fig. 2. The 3D orientation of the facial feature points [Pandzic & Forchheimer, 2002].

2.2 Muscle actions involved in the primary facial expressions


The Facial Definition Parameter set (FDP) and the Facial Animation Parameter set (FAP)
were designed in the MPEG-4 framework to allow the definition of a facial shape and
texture, as well as animation of faces reproducing expressions, emotions and speech
pronunciation. The FAPs [Pandzic & Forchheimer, 2002] are based on the study of minimal
facial actions and are closely related to muscle activation, in the sense that they represent a
complete set of atomic facial actions; therefore they allow the representation of even the
most detailed natural facial expressions, even those that cannot be categorized as particular
ones. All the parameters involving translational movement are expressed in terms of the
Facial Animation Parameter Units (FAPU). These units are defined with respect to specific
distances in a neutral pose in order to allow interpretation of the FAPs on any facial model
in a consistent way. As a result, description schemes that utilize FAPs produce reasonable
results in terms of expression and speech related postures.
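
To make the FAPU normalisation concrete, the minimal Python sketch below expresses a
vertical lip displacement as a multiple of the mouth-nose separation (MNS) unit measured on
the neutral pose. The point names, the sample coordinates and the 1/1024 scaling commonly
associated with the MPEG-4 distance units are assumptions for illustration, not values taken
from this chapter.

    # Minimal sketch: expressing a facial displacement in MPEG-4 FAPU.
    # Assumed: point names, sample coordinates, and the 1/1024 scaling of
    # the distance-based units.

    def fapu_mns(nose_tip_y, upper_lip_y):
        """Mouth-nose separation unit, measured once on the neutral face."""
        return abs(nose_tip_y - upper_lip_y) / 1024.0

    def lip_fap_value(neutral_lip_y, current_lip_y, mns_unit):
        """A vertical lip FAP expressed as a multiple of the MNS unit."""
        return (neutral_lip_y - current_lip_y) / mns_unit

    mns = fapu_mns(nose_tip_y=52.0, upper_lip_y=34.0)                         # neutral pose
    fap = lip_fap_value(neutral_lip_y=34.0, current_lip_y=31.5, mns_unit=mns)

Because the unit is derived from the subject's own neutral pose, the same FAP value produces
a comparable deformation on any face model, which is what makes FAP-based description
schemes portable across models.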

Expression   Muscle Actions

Anger        squeeze_l_eyebrow (+), squeeze_r_eyebrow (+),
             lower_t_midlip (-), raise_b_midlip (+),
             raise_l_i_eyebrow (+), raise_r_i_eyebrow (+),
             close_t_r_eyelid (-), close_t_l_eyelid (-),
             close_b_r_eyelid (-), close_b_l_eyelid (-)

Sadness      raise_l_i_eyebrow (+), raise_r_i_eyebrow (+),
             close_t_l_eyelid (+), close_t_r_eyelid (+),
             raise_l_m_eyebrow (-), raise_r_m_eyebrow (-),
             raise_l_o_eyebrow (-), raise_r_o_eyebrow (-),
             close_b_l_eyelid (+), close_b_r_eyelid (+)

Surprise     raise_l_o_eyebrow (+), raise_r_o_eyebrow (+),
             raise_l_i_eyebrow (+), raise_r_i_eyebrow (+),
             raise_l_m_eyebrow (+), raise_r_m_eyebrow (+),
             squeeze_l_eyebrow (-), squeeze_r_eyebrow (-),
             open_jaw (+)

Joy          close_t_l_eyelid (+), close_t_r_eyelid (+),
             close_b_l_eyelid (+), close_b_r_eyelid (+),
             stretch_l_cornerlip (+), stretch_r_cornerlip (+),
             raise_l_m_eyebrow (+), raise_r_m_eyebrow (+),
             lift_r_cheek (+), lift_l_cheek (+),
             lower_t_midlip (-), raise_b_midlip (-),
             OR open_jaw (+)

Disgust      close_t_l_eyelid (+), close_b_l_eyelid (+),
             close_t_r_eyelid (+), close_b_r_eyelid (+),
             lower_t_midlip (-), open_jaw (+),
             squeeze_l_cornerlip (+) AND/OR squeeze_r_cornerlip (+)

Fear         raise_l_o_eyebrow (+), raise_r_o_eyebrow (+),
             raise_l_m_eyebrow (+), raise_r_m_eyebrow (+),
             raise_l_i_eyebrow (+), raise_r_i_eyebrow (+),
             squeeze_l_eyebrow (+), squeeze_r_eyebrow (+),
             open_jaw (+),
             OR {close_t_l_eyelid (-), lower_t_midlip (-)},
             OR {close_t_r_eyelid (-), lower_t_midlip (+)}

Table 2. Muscle Actions involved in the six basic expressions [Karpouzis et al., 2000].

In general, facial expressions and emotions can be described as a set of measurements (FDPs
and derived features) and transformations (FAPs) that can be considered atomic with
respect to the MPEG-4 standard. In this way, one can describe the anatomy of a human face,
as well as any animation parameters with the change in the positions of the facial feature
points, thus eliminating the need to explicitly specify the topology of the underlying
geometry. These facial feature points can then be mapped to automatically detected
measurements and indications of motion on a video sequence and thus help analyse or
reconstruct the emotion or expression recognized by the system.
MPEG-4 specifies 84 feature points on the neutral face. The main purpose of these feature
points is to provide spatial references to key positions on a human face. These 84 points
were chosen to best reflect the facial anatomy and movement mechanics of a human face.
The location of these feature points has to be known for any MPEG-4 compliant face model.
The feature points on the model should be located according to the points illustrated in
Figure 2. After a series of analyses on faces, we have concluded that mainly 15 FAPs are
affected by these expressions [Soyel et al., 2005].
These facial features are moved due to the contraction and expansion of facial muscles,
whenever a facial expression is changed. Table 2 illustrates the description of the basic
expressions using the MPEG-4 FAPs terminology.
Although muscle actions [P. Ekman & W. Friesen,1978] are of high importance, with respect
to facial animation, one is unable to track them analytically without resorting to explicit
electromagnetic sensors. However, a subset of them can be deduced from their visual
results, that is, the deformation of the facial tissue and the movement of some facial surface
points. This reasoning resembles the way that humans visually perceive emotions, by
noticing specific features in the most expressive areas of the face, the regions around the
eyes and the mouth. The seven basic expressions, as well as intermediate ones, employ facial
deformations strongly related with the movement of some prominent facial points that can
be automatically detected. These points can be mapped to a subset of the MPEG-4 feature
point set. It should be noted that MPEG-4 defines the neutral face as one in which all face
muscles are relaxed.

2.3 Relevant facial feature points


In order to reduce the amount of time required to perform the experiments, a small set of 11
feature points were selected. Care was taken to select facial feature points from the whole
set defined by the MPEG-4 standard. The MPEG-4 standard divides feature points into a
number of groups, listed in Table 3, corresponding to the particular region of the face
to which they belong. A few points from nearly all the groups were taken. Nine points were
selected from one side of the face (repeating the selection on the other side is not needed due
to symmetry). The feature points were chosen to have varying predicted extraction
difficulty. The feature points selected are shown in Figure 3.

Fig. 3. 11-facial feature points: 1-Left corner of outer-lip contour, 2-Right corner of outer-lip
contour, 3-Middle point of outer upper-lip contour, 4- Middle point of outer lower-lip
contour, 5-Right corner of the right eye, 6-Left corner of the right eye, 7-Centre of upper
inner-right eyelid, 8-Centre of lower inner-right eyelid, 9-Uppermost point of the right
eyebrow, 10-Outermost point of right-face contour, 11- Outermost point of left-face contour.
Feature Point Groups     Selected Feature Points

2 - Chin, inner lip      -
3 - Eyes                 3.10 - centre of lower inner-right eyelid
                         3.11 - left corner of the right eye
                         3.12 - right corner of the right eye
                         3.14 - centre of upper inner-right eyelid
4 - Eyebrows             4.4 - uppermost point of the right eyebrow
5 - Cheek                -
6 - Tongue               -
7 - Spine                -
8 - Outer lip            8.1 - middle point of outer upper-lip contour
                         8.2 - middle point of outer lower-lip contour
                         8.3 - left corner of outer-lip contour
                         8.4 - right corner of outer-lip contour
9 - Nose, nostrils       -
10 - Ear                 10.9 - outermost point of left-face contour
                         10.10 - outermost point of right-face contour
11 - Hair line           -

Table 3. Selected facial feature points.

3. 3D distance-vector based facial expression recognition


3.1 Information extracted from 3D Space
By using the distribution of the 11 facial feature points from the 3D facial model, we extract
the six characteristic distances shown in Table 4, which serve as input to the neural network
classifier used for recognizing the different facial expressions.

Distance No   Distance Name    Distance Description

D1            Eye Opening      Distance between the right corner of the right eye and
                               the left corner of the right eye.
D2            Eyebrow Height   Distance between the centre of upper inner-right
                               eyelid and the uppermost point of the right eyebrow.
D3            Mouth Opening    Distance between the left corner of outer-lip contour
                               and right corner of outer-lip contour.
D4            Mouth Height     Distance between the middle point of outer upper-lip
                               contour and middle point of outer lower-lip contour.
D5            Lip Stretching   Distance between the right corner of the right eye and
                               right corner of outer-lip contour.
D6            Normalization    Distance between the outermost point of right-face
                               contour and outermost point of left-face contour.

Table 4. Six characteristic distances.
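
As an illustration of Table 4, the sketch below computes the six distances from the 11
tracked 3D feature points and normalises D1-D5 by D6, as done in Section 3.3. The 0-based
point indices follow the ordering of the Figure 3 caption, and the use of NumPy is our
assumption; the chapter does not prescribe an implementation.

    import numpy as np

    def distance_vector(points):
        """points: (11, 3) array ordered as in Figure 3 (0-based indices)."""
        def d(i, j):                     # Euclidean distance between two feature points
            return np.linalg.norm(points[i] - points[j])
        d1 = d(4, 5)     # D1 eye opening: right and left corners of the right eye
        d2 = d(6, 8)     # D2 eyebrow height: upper inner-right eyelid to eyebrow top
        d3 = d(0, 1)     # D3 mouth opening: left to right corner of the outer lip
        d4 = d(2, 3)     # D4 mouth height: outer upper-lip to outer lower-lip middle
        d5 = d(4, 1)     # D5 lip stretching: right eye corner to right lip corner
        d6 = d(9, 10)    # D6 normalization: right to left outermost face contour
        return np.array([d1, d2, d3, d4, d5]) / d6   # D6 normalises D1..D5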

3.2 Basic architecture of facial expression recognition system


Facial expression recognition includes both measurement of facial motion and recognition of
expression. The general approach to Automatic Facial Expression Analysis (AFEA) systems,
which is shown in Figure 4, can be categorised by three steps.
• Face acquisition.
• Facial feature extraction and representation.
• Facial expression recognition.
Face acquisition is the first step of the facial expression recognition system, finding the face
region in the input frame images. After determining the face location, various facial feature
extraction approaches can be used. There are two general approaches: geometric
feature-based methods and appearance-based methods. The first one utilizes the shape and
the location of face components such as: mouth, nose, and eyes which are represented by a
feature vector extracted from these facial components. In appearance-based methods, image
filters, such as Gabor wavelets, are applied to either the whole face or specific regions in a
face image to extract a feature vector.
Depending on the different facial feature extraction methods, the effects of in-plane head
rotation and different scales of the faces can be eliminated, either by face normalization
before the feature extraction or by feature representation before the step of expression
recognition. The last stage of the facial expression analysis system is facial expression
recognition using different classification approaches. Facial expression recognition usually
results in classes according to either the Facial Action Coding System (FACS) or the seven
basic facial expressions.
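
The three steps above can be summarised as a small skeleton. The class and method names
below are ours and only mirror the structure described in the text; they are not taken from an
existing system.

    class AFEAPipeline:
        """Illustrative skeleton of the three AFEA steps described above."""

        def acquire_face(self, frame):
            # Step 1: locate the face region in the input frame
            raise NotImplementedError

        def extract_features(self, face_region):
            # Step 2: geometric (feature-point) or appearance (e.g. Gabor) features,
            # normalised against in-plane rotation and face scale
            raise NotImplementedError

        def recognize_expression(self, feature_vector):
            # Step 3: classify into FACS action units or one of the seven basic expressions
            raise NotImplementedError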

Fig. 4. Basic Architecture of Facial Expression Recognition System

3.3 Classification of the facial expressions


By using the information introduced in the previous section, we achieve 3D facial
expression recognition in the following phases. First, we extract the characteristic distance
vectors as defined in Table 4. Then, we classify a given distance vector with a previously
trained neural network. The sixth distance, D6, is used to normalize the first five distances.
The neural network architecture consists of a multilayered perceptron with input, hidden and
output layers, trained using the backpropagation algorithm. The input layer receives a vector
of six distances and the output layer represents the 7 possible facial expressions mentioned
in the preceding sections.
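
A minimal sketch of such a classifier, using scikit-learn's MLPClassifier as a stand-in for the
multilayered perceptron trained with backpropagation; the hidden-layer size, solver settings
and the placeholder training data are assumptions, since the chapter does not specify an
implementation.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    EXPRESSIONS = ["Neutral", "Anger", "Sadness", "Surprise",
                   "Happiness", "Disgust", "Fear"]

    rng = np.random.default_rng(0)
    X_train = rng.random((54 * 7, 6))            # placeholder normalised distance vectors
    y_train = rng.integers(0, 7, size=54 * 7)    # placeholder expression labels (0..6)

    clf = MLPClassifier(hidden_layer_sizes=(20,), activation="logistic",
                        solver="sgd", max_iter=2000)
    clf.fit(X_train, y_train)
    predicted = [EXPRESSIONS[i] for i in clf.predict(X_train[:5])]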
Backpropagation was created by generalizing the Widrow-Hoff learning rule to multiple-
layer networks and nonlinear differentiable transfer functions. Input vectors and the
corresponding target vectors are used to train a network until it can approximate a function
to associate input vectors with specific output vectors, or classify the input vectors.
Networks with biases, a sigmoid layer, and a linear output layer are capable of
approximating any function with a finite number of discontinuities.
Standard backpropagation is a gradient descent algorithm, as is the Widrow-Hoff learning
rule, in which the network weights are moved along the negative of the gradient of the
performance function. The term backpropagation refers to the manner in which the gradient
is computed for nonlinear multilayer networks. There are a number of variations on the
basic algorithm that are based on other standard optimization techniques, such as conjugate
gradient and Newton methods.
Properly trained backpropagation networks tend to give reasonable answers when
presented with inputs that they have never seen. Typically, a new input leads to an output
similar to the correct output for input vectors used in training that are similar to the new
input being presented. This generalization property makes it possible to train a network on
a representative set of input/target pairs and get good results without training the network
on all possible input/output pairs [ Rumelhart et al.,1986].
We used the BU-3DFE database [Yin et al., 2006] in our experiments to train and test our model.
The database we have used contains 7 facial expressions for 60 different people. We
arbitrarily divided the 60 subjects into two subsets: one with 54 subjects for training and the
other with 6 subjects for testing. During the recognition experiments, a distance vector is
derived for every 3D model. Consecutive distance vectors are assumed to be statistically
independent as well as the underlying class sequences. The vector is eventually assigned to
the class with the highest likelihood score.
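
A sketch of the subject-based partition described above, assuming a subject identifier is
stored for every distance vector; the helper name and the random seed are illustrative.

    import numpy as np

    def subject_split(subject_ids, n_test=6, seed=0):
        """Return train/test indices so that no test subject appears in training."""
        rng = np.random.default_rng(seed)
        subjects = np.unique(subject_ids)
        test_subjects = rng.choice(subjects, size=n_test, replace=False)
        test_mask = np.isin(subject_ids, test_subjects)
        return np.where(~test_mask)[0], np.where(test_mask)[0]

    # 60 subjects x 7 expressions, as in the experiment described above
    subject_ids = np.repeat(np.arange(60), 7)
    train_idx, test_idx = subject_split(subject_ids)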

4. Performances analysis and discussions


4.1 Training and testing the data
Neural networks are composed of simple elements operating in parallel. These elements are
inspired by biological nervous systems. As in nature, the network function is determined
largely by the connections between elements. We can train a neural network to perform a
particular function by adjusting the values of the connections (weights) between elements.
Commonly neural networks are adjusted, or trained, so that a particular input leads to a
specific target output. Such a situation is shown in Figure 5. The network is adjusted, based
on a comparison of the output and the target, until the network output matches the target.
Typically many such input/target pairs are used, in this supervised learning, to train a
network.

Fig. 5. Basic Neural Network Structure (inputs pass through weighted connections between
neurons; the output is compared with the target and the weights are adjusted).


Batch training of a network proceeds by making weight and bias changes based on an entire
set of input vectors. Incremental training changes the weights and biases of a network as
needed after presentation of each individual input vector. Incremental training is sometimes
referred to as "on line" or "adaptive" training.
Once the network weights and biases have been initialized, the network is ready for
training. The network can be trained for function approximation, pattern association, or
pattern classification. The training process requires a set of examples of proper network
behaviour - network inputs and target outputs. During training, the weights and biases of
the network are iteratively adjusted to minimize the average squared error
between the network outputs and the target outputs.
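
Since the text refers to the Widrow-Hoff rule, the toy sketch below contrasts batch and
incremental updates that reduce the squared error of a single linear layer. It only illustrates
the two update schedules; the actual system uses a multilayer perceptron with
backpropagation.

    import numpy as np

    def batch_update(W, X, T, lr=0.01):
        """One batch step: use the error over the entire training set."""
        E = T - X @ W                      # errors for all examples at once
        return W + lr * X.T @ E / len(X)

    def incremental_update(W, x, t, lr=0.01):
        """One on-line ('adaptive') step: use the error of a single example."""
        e = t - x @ W
        return W + lr * np.outer(x, e)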
We have tested our neural network setup on the BU-3DFE database [Yin et al., 2006], which
contains posed emotional facial expression images with seven fundamental emotional
states, Anger, Disgust, Fear, Happiness, Sadness, Surprise and Neutral. In our experiment,
we used the data captured from 60 subjects for each expression. The test is based on the
seven fundamental expressions. The 3D distribution of the 84 feature vertices was provided
for each facial model. A detailed description of the database construction, post-processing, and
organization can be found in [Yin et al., 2006].

4.2 System performance


Our facial expression analysis experiments are carried out in a person-independent manner,
which is thought to be more challenging than a person-dependent approach. We arbitrarily
divided the 60 subjects into two subsets: one with 54 subjects for training and the other with
6 subjects for testing. The experiments ensure that no subject used for testing appears in the
training set, because the random partition is based on the subjects rather than the individual
expressions. The tests are executed 10 times with different partitions to achieve a stable
generalized recognition rate. The entire process ensures that every subject is tested at least
once for each classifier. For each round of the test, all the classifiers are reset and re-trained
from the initial state. We show the results for all the neural network classifiers in Table 5.
Note that most of the expressions are detected with high accuracy and the confusion is larger
with the Neutral and Anger classes. One reason why Anger is detected with only 85%
accuracy is that, in general, this emotion's confusion with Sadness and Neutral is much
larger than with the other emotions. When we compared the proposed 3D Distance-Vector
based Facial Expression Recognition method (3D-DVFER) with the 2D appearance-feature
based Gabor-wavelet (GW) approach [Lyons et al., 1999], we found that the Gabor-wavelet
approach performs poorly, with an average recognition rate of around 80%. Compared to the
performance shown in Table 5, the 3D-DVFER method is superior to the 2D appearance-
feature based methods when classifying the seven prototypic facial expressions.

Input/Output   Neutral   Happy   Fear    Surprise  Sadness  Disgust  Anger

Neutral        86.7%     0.0%    1.7%    0.0%      3.7%     1.7%     6.7%
Happy          0.0%      95.0%   3.3%    0.0%      0.0%     5.0%     3.3%
Fear           0.0%      3.3%    91.7%   1.7%      0.0%     1.7%     0.0%
Surprise       0.0%      0.0%    0.0%    98.3%     0.0%     0.0%     0.0%
Sadness        6.7%      0.0%    1.7%    0.0%      90.7%    0.0%     5.0%
Disgust        1.7%      1.7%    0.0%    0.0%      1.9%     91.7%    0.0%
Anger          5.0%      0.0%    1.7%    0.0%      3.7%     0.0%     85.0%

Table 5. Average confusion matrix using the NN classifier (BU-3DFE database) [H. Soyel &
H. Demirel, 2007].
When we compare the results of the proposed system with the results reported in [Wang et
al., 2006], which uses the same 3D database with an LDA classifier, we can see that our
method outperforms the recognition rates in Table 6 for all of the facial expressions except
the Happy case; both systems give the same performance for the “Happy” facial expression.
Note that the classifier in [Wang et al., 2006] does not consider the Neutral case as an
expression, which gives an advantage to that approach.
The average recognition rate of the proposed system is 91.3%, whereas the average
performance of the method given in [Wang et al., 2006] stays at 83.6% for the recognition of
the facial expressions on the same 3D database.
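
The 91.3% figure is simply the mean of the diagonal of Table 5, and the 83.6% figure the mean
of the diagonal of Table 6 (shown below); a quick check, with the values copied from the
tables:

    import numpy as np

    nn_diagonal  = np.array([86.7, 95.0, 91.7, 98.3, 90.7, 91.7, 85.0])   # Table 5
    lda_diagonal = np.array([95.0, 75.0, 90.8, 80.4, 80.4, 80.0])         # Table 6
    print(round(nn_diagonal.mean(), 1))    # 91.3, proposed 3D-DVFER system
    print(round(lda_diagonal.mean(), 1))   # 83.6, LDA classifier of [Wang et al., 2006]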

Input/Output   Happy   Fear    Surprise  Sadness  Disgust  Anger

Happy          95.0%   3.8%    0.0%      0.4%     0.8%     0.0%
Fear           12.5%   75.0%   2.1%      7.9%     2.5%     0.0%
Surprise       0.0%    1.2%    90.8%     5.4%     0.8%     1.7%
Sadness        0.0%    2.9%    5.8%      80.4%    2.5%     8.3%
Disgust        3.8%    4.2%    0.4%      6.7%     80.4%    4.6%
Anger          0.0%    6.3%    0.8%      11.3%    1.7%     80.0%

Table 6. Average confusion matrix of the LDA-based classifier in [Wang et al., 2006]

5. Conclusion
In this chapter we have shown that a probabilistic neural network classifier can be used for
the 3D analysis of facial expressions without relying on all of the 84 facial features or an
error-prone face pose normalization stage. Face deformation as well as facial muscle
contraction and expansion are important indicators of facial expression, and by using only
11 facial feature points and the symmetry of the human face, we are able to extract enough
information from a face image. Our results show that 3D distance-vector based recognition
outperforms similar systems based on 2D and 3D facial feature analysis. The average facial
expression recognition rate of the proposed system reaches 91.3%. The quantitative results
clearly suggest that the proposed approach produces encouraging results and opens a
promising direction for higher-rate expression analysis.

6. References
Ekman, P. & Friesen, W. (1976). Pictures of Facial Affect. Consulting Psychologists Press,
        Palo Alto, CA
Ekman, P. & Friesen, W. (1978). The Facial Action Coding System: A Technique for the
        Measurement of Facial Movement, Consulting Psychologists Press, San Francisco
Rumelhart, D., Hinton, G. & Williams, R. (1986). Learning internal representations by error
        propagation, In: Parallel Distributed Processing, D. Rumelhart and J. McClelland (Ed.),
        pp. 318-362, The M.I.T. Press, Cambridge, MA
Donato, G., Bartlett, M., Hager, J., Ekman, P. & Sejnowski, T. (1999). Classifying facial actions.
        IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(10), pp. 974-989
Lyons, M., Budynek, J. & Akamatsu, S. (1999). Automatic classification of single facial
        images. IEEE Transactions on PAMI, 21, pp. 1357-1362
Karpouzis, K., Tsapatsoulis, N. & Kollias, S. (2000). Moving to Continuous Facial Expression
        Space using the MPEG-4 Facial Definition Parameter (FDP) Set, In Proceedings of
        Electronic Imaging, San Jose, USA
Braathen, B., Bartlett, M., Littlewort, G., Smith, E. & Movellan, J. (2002). An approach to
        automatic recognition of spontaneous facial actions. In Proceedings of the International
        Conference on FGR, pp. 345-350, USA
Pandzic, I. & Forchheimer, R. (Ed.) (2002). MPEG-4 Facial Animation: the Standard,
        Implementation and Applications, Wiley
Fasel, B. & Luettin, J. (2003). Automatic facial expression analysis: A survey. Pattern
        Recognition, 36(1), pp. 259-275
Pantic, M. & Rothkrantz, L. (2004). Facial action recognition for facial expression analysis
        from static face images. IEEE Transactions on SMC-Part B: Cybernetics, 34, pp. 1449-1461
Soyel, H., Yurtkan, K., Demirel, H., Ozkaramanli, H., Uyguroglu, E. & Varoglu, M. (2005).
        Face Modeling and Animation for MPEG Compliant Model Based Video Coding,
        IASTED International Conference on Computer Graphics and Imaging
Yin, L., Wei, X., Sun, Y., Wang, J. & Rosato, M. (2006). A 3D facial expression database for
        facial behavior research. In Proceedings of the International Conference on FGR,
        pp. 211-216, UK
Wang, J., Yin, L., Wei, X. & Sun, Y. (2006). 3D Facial Expression Recognition Based on
        Primitive Surface Feature Distribution. IEEE CVPR'06, Volume 2, pp. 1399-1406
Soyel, H. & Demirel, H. (2007). Facial Expression Recognition using 3D Facial Feature
        Distances, Lecture Notes in Computer Science (ICIAR 07), vol. 4633, pp. 831-838
2

Facial Expression Recognition in the Presence of Head Motion

Fadi Dornaika¹ and Franck Davoine²
¹ National Geographical Institute (IGN), 2 avenue Pasteur, 94165 Saint-Mandé
² Heudiasyc Mixed Research Unit, CNRS/UTC, 60205 Compiègne
France

1. Introduction
The human face has attracted attention in a number of areas including psychology,
computer vision, human-computer interaction (HCI) and computer graphics (Chandrasiri et
al, 2004). As facial expressions are the direct means of communicating emotions, computer
analysis of facial expressions is an indispensable part of HCI designs. It is crucial for
computers to be able to interact with the users, in a way similar to human-to-human
interaction. Human-machine interfaces will require an increasingly good understanding of a
subject's behavior so that machines can react accordingly. Although humans detect and
analyze faces and facial expressions in a scene with little or no effort, the development of an
automated system that accomplishes this task is rather difficult.
One challenge is to construct robust, real-time, fully automatic systems to track the facial
features and expressions. Many computer vision researchers have been working on tracking
and recognition of the whole face or parts of the face. Within the past two decades, much
work has been done on automatic recognition of facial expression. The initial 2D methods
had limited success, mainly because of their dependency on the camera viewing angle. One of
the main motivations behind 3D methods for face or expression recognition is to enable a
broader range of camera viewing angles (Blanz & Vetter, 2003; Gokturk et al., 2002; Lu et
al., 2006; Moreno et al., 2002; Wang et al., 2004; Wen & Huang, 2003; Yilmaz et al., 2002).
To classify expressions in static images many techniques have been proposed, such as those
based on neural networks (Tian et al., 2001), Gabor wavelets (Bartlett et al., 2004), and
Adaboost (Wang et al., 2004). Recently, more attention has been given to modeling facial
deformation in dynamic scenarios, since it is argued that information based on dynamics is
richer than that provided by static images. Static image classifiers use feature vectors related
to a single frame to perform classification (Lyons et al., 1999). Temporal classifiers try to
capture the temporal pattern in the sequence of feature vectors related to each frame. These
include the Hidden Markov Model (HMM) based methods (Cohen et al., 2003) and Dynamic
Bayesian Networks (DBNs) (Zhang & Ji, 2005). In (Cohen et al., 2003), the authors introduce
facial expression recognition from live video input using temporal cues. They propose a
new HMM architecture for automatically segmenting and recognizing human facial
expression from video sequences. The architecture performs both segmentation and
recognition of the facial expressions automatically using a multi-level architecture
composed of an HMM layer and a Markov model layer. In (Zhang & Ji, 2005), the authors
present a new approach to spontaneous facial expression understanding in image
sequences. The facial feature detection and tracking is based on active Infra Red
illumination. Modeling dynamic behavior of facial expression in image sequences falls
within the framework of information fusion with DBNs. In (Xiang et al., 2008), the authors
propose a temporal classifier based on the use of fuzzy C-means, where the features are
given by the Fourier transform.
Surveys of facial expression recognition methods can be found in (Fasel & Luettin, 2003;
Pantic & Rothkrantz, 2000). A number of earlier systems were based on facial motion
encoded as a dense flow between successive image frames. However, flow estimates are
easily disturbed by illumination changes and non-rigid motion. In (Yacoob & Davis, 1996),
the authors compute optical flow of regions on the face, then they use a rule-based classifier
to recognize the six basic facial expressions. Extracting and tracking facial actions in a video
can be done in several ways. In (Bascle & Black, 1998), the authors use active contours for
tracking the performer's facial deformations. In (Ahlberg, 2002), the author retrieves facial
actions using a variant of Active Appearance Models. In (Liao & Cohen, 2005), the authors
used a graphical model for modeling the interdependencies of defined facial regions for
characterizing facial gestures under varying pose. The dominant paradigm involves
computing a time-varying description of facial actions/features from which the expression
can be recognized; that is to say, the tracking process is performed prior to the recognition
process (Dornaika & Davoine, 2005; Zhang & Ji, 2005).
However, the results of both processes affect each other in various ways. Since these two
problems are interdependent, solving them simultaneously increases reliability and
robustness of the results. Such robustness is required when perturbing factors such as
partial occlusions, ultra-rapid movements and video streaming discontinuity may affect the
input data. Although the idea of merging tracking and recognition is not new, our work
addresses two complicated tasks, namely tracking the facial actions and recognizing
expression over time in a monocular video sequence.
In the literature, simultaneous tracking and recognition has been used in simple cases. For
example, (North et al., 2000) employs a particle-filter-based algorithm for tracking and
recognizing the motion class of a juggled ball in 2D. Another example is given in (Zhou et
al., 2003); this work proposes a framework allowing the simultaneous tracking and
recognizing of human faces using a particle filtering method. The recognition consists in
determining a person's identity, which is fixed for the whole probe video. The authors use a
mixed state vector formed by the 2D global face motion (affine transform) and an identity
variable. However, this work does not address either facial deformation or facial expression
recognition.
In this chapter, we describe two frameworks for facial expression recognition given natural
head motion. Both frameworks are texture- and view-independent. The first framework
exploits the temporal representation of tracked facial action in order to infer the current
facial expression in a deterministic way. The second framework proposes a novel paradigm
in which facial action tracking and expression recognition are simultaneously performed.
The second framework consists of two stages. First, the 3D head pose is estimated using a
deterministic approach based on the principles of Online Appearance Models (OAMs).
Second, the facial actions and expression are simultaneously estimated using a stochastic
approach based on a particle filter adopting mixed states (Isard & Blake, 1998). This
proposed framework is simple, efficient and robust with respect to head motion given that
(1) the dynamic models directly relate the facial actions to the universal expressions, (2) the
learning stage does not deal with facial images but only concerns the estimation of auto-
regressive models from sequences of facial actions, which is carried out using closed-form
solutions, and (3) facial actions are related to a deformable 3D model and not to entities
measured in the image plane.

1.1 Outline of the chapter


This chapter provides a set of recent deterministic and stochastic (robust) techniques that
perform efficient facial expression recognition from video sequences. The chapter
organization is as follows. The first part of the chapter (Section 2) briefly describes a real
time face tracker adopting a deformable 3D mesh and using the principles of Online
Appearance Models. This tracker can provide the 3D head pose parameters and some facial
actions. The second part of the chapter (Section 3) focuses on the analysis and recognition of
facial expressions in continuous videos using the tracked facial actions. We propose two
pose- and texture-independent approaches that exploit the tracked facial action parameters.
The first approach adopts a Dynamic Time Warping technique for recognizing expressions
where the training data are a set of trajectory examples associated with universal facial
expressions. The second approach models trajectories associated with facial actions using
Linear Discriminant Analysis. The third part of the chapter (Section 4) addresses the
simultaneous tracking and recognition of facial expressions. In contrast to the mainstream
approach "tracking then recognition", this framework simultaneously retrieves the facial
actions and expression using a particle filter adopting multi-class dynamics that are
conditioned on the expression.

2. Face and facial action tracking


2.1 A deformable 3D model
In our study, we use the Candide 3D face model (Ahlberg, 2002). This 3D deformable
wireframe model was first developed for the purposes of model-based image coding and
computer animation. The 3D shape of this wireframe model (triangular mesh) is directly
recorded in coordinate form. It is given by the coordinates of the 3D vertices Pi, i = 1,…, n
where n is the number of vertices. Thus, the shape up to a global scale can be fully described
by the 3n-vector g, the concatenation of the 3D coordinates of all vertices Pi. The vector g is
written as:

g = ḡ + S τ_s + A τ_a        (1)

where ḡ is the standard shape of the model, τ_s and τ_a are shape and animation control
vectors, respectively, and the columns of S and A are the Shape and Animation Units. A
Shape Unit provides a means of deforming the 3D wireframe so as to be able to adapt eye
width, head width, eye separation distance, etc. Thus, the term S τ s accounts for shape
variability (inter-person variability) while the term A τ a accounts for the facial animation
(intra-person variability). The shape and animation variabilities can be approximated well
enough for practical purposes by this linear relation. Also, we assume that the two kinds of
variability are independent. With this model, the ideal neutral face configuration is
represented by τ a = 0. The shape modes were created manually to accommodate the
subjectively most important changes in facial shape (face height/width ratio, horizontal and
vertical positions of facial features, eye separation distance). Even though a PCA was
initially performed on manually adapted models in order to compute the shape modes, we
preferred to consider the Candide model with manually created shape modes whose semantic
meaning makes them easy to use by human operators who need to adapt the 3D mesh to
facial images. The animation modes were measured from pictorial examples in the Facial
Action Coding System (FACS) (Ekman & Friesen, 1977).
In this study, we use twelve modes for the facial Shape Units matrix S and six modes for the
facial Animation Units (AUs) matrix A. Without loss of generality, we have chosen the six
following AUs: lower lip depressor, lip stretcher, lip corner depressor, upper lip raiser,
eyebrow lowerer and outer eyebrow raiser. These AUs are enough to cover most common
facial animations (mouth and eyebrow movements). Moreover, they are essential for
conveying emotions. The effects of the Shape Units and the six Animation Units on the 3D
wireframe model are illustrated in Figure 1.

Figure 1: First row: Facial Shape units (neutral shape, mouth width, eyes width, eyes vertical
position, eye separation distance, head height). Second and third rows: Positive and
negative perturbations of Facial Action Units (Brow lowerer, Outer brow raiser, Jaw drop,
Upper lip raiser, Lip corner depressor, Lip stretcher).
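As an illustration of equation (1), the following Python sketch assembles a deformed shape from the static and animation terms; the number of vertices and the matrix contents are placeholder assumptions, not the actual Candide-3 data, and only the 12 shape modes and 6 animation modes follow the text.

import numpy as np

# Illustrative dimensions: n vertices, 12 Shape Units, 6 Animation Units.
n = 113                                  # assumed vertex count (placeholder)
g_bar = np.zeros(3 * n)                  # standard shape, stacked 3D coordinates
S = 0.01 * np.random.randn(3 * n, 12)    # Shape Units (placeholder values)
A = 0.01 * np.random.randn(3 * n, 6)     # Animation Units (placeholder values)

def deform(tau_s, tau_a):
    # Equation (1): g = g_bar + S tau_s + A tau_a
    return g_bar + S @ tau_s + A @ tau_a

# The ideal neutral configuration of a subject corresponds to tau_a = 0.
g_neutral = deform(np.zeros(12), np.zeros(6))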
In equation (1), the 3D shape is expressed in a local coordinate system. However, one should
relate the 3D coordinates to the image coordinate system. To this end, we adopt the weak
perspective projection model. We neglect the perspective effects since the depth variation of
the face can be considered as small compared to its absolute depth. Therefore, the mapping
between the 3D face model and the image is given by a 2×4 matrix, M, encapsulating both
the 3D head pose and the camera parameters.
Thus, a 3D vertex P_i = (X_i, Y_i, Z_i)^T belonging to g will be projected onto the image point p_i = (u_i, v_i)^T given by:

p_i = M [X_i, Y_i, Z_i, 1]^T        (2)
For a given subject, τs is constant. Estimating τs can be carried out using either feature-based
(Lu et al., 2001) or featureless approaches (Ahlberg, 2002). In our work, we assume that the
control vector τs is already known for every subject, and it is set manually using for instance
the face in the first frame of the video sequence (the Candide model and target face shapes
are aligned manually). Therefore, Equation (1) becomes:

g = g_s + A τ_a        (3)

where g_s represents the static shape of the face, i.e., the neutral face configuration. Thus, the state
of the 3D wireframe model is given by the 3D head pose parameters (three rotations and
three translations) and the animation control vector τa. This is given by the 12-dimensional
vector b:

b = [θ_x, θ_y, θ_z, t_x, t_y, t_z, τ_a^T]^T        (4)

  = [h^T, τ_a^T]^T        (5)

where the vector h represents the six degrees of freedom associated with the 3D head pose.
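For illustration, the weak perspective projection of equation (2) and the pose vector h of equations (4)-(5) can be sketched as follows; the particular way the 2×4 matrix M is assembled from the rotation angles and translation is an assumed convention, not taken from the chapter.

import numpy as np

def projection_matrix(h, focal=600.0):
    # Build a 2x4 weak-perspective matrix from h = (theta_x, theta_y, theta_z,
    # t_x, t_y, t_z); one possible convention, with an assumed focal length.
    cx, sx = np.cos(h[0]), np.sin(h[0])
    cy, sy = np.cos(h[1]), np.sin(h[1])
    cz, sz = np.cos(h[2]), np.sin(h[2])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    s = focal / h[5]                       # weak-perspective scale from depth t_z
    M = np.zeros((2, 4))
    M[:, :3] = s * R[:2, :]                # first two rows of the rotation
    M[:, 3] = s * np.array([h[3], h[4]])   # in-plane translation
    return M

def project(M, g):
    # Equation (2): project each stacked 3D vertex of g onto the image plane.
    P = g.reshape(-1, 3)
    P_h = np.hstack([P, np.ones((len(P), 1))])   # homogeneous 3D points
    return (M @ P_h.T).T                          # n x 2 image points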

Figure 2: (a) an input image with correct adaptation of the 3D model. (b) the corresponding
shape-free facial image.

2.2 Shape-free facial patches


A facial patch is represented as a shape-free image (geometrically normalized raw-brightness
image). The geometry of this image is obtained by projecting the standard shape g with a
centered frontal 3D pose onto an image with a given resolution. The geometrically
normalized image is obtained by texture mapping from the triangular 2D mesh in the input
image (see Figure 2) using a piece-wise affine transform, W. The warping process applied to
an input image y is denoted by:

x = W(y, b)        (6)
where x denotes the shape-free patch and b denotes the geometrical parameters. Several
resolution levels can be chosen for the shape-free patches. The reported results are obtained
with a shape-free patch of 5392 pixels. Regarding photometric transformations, a zero-mean
unit-variance normalization is used to partially compensate for contrast variations. The
complete image transformation is implemented as follows: (i) transfer the raw-brightness
facial patch y using the piece-wise affine transform associated with the vector b, and (ii)
perform the gray-level normalization of the obtained patch.
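A minimal sketch of the shape-free patch computation follows. The piece-wise affine warp itself depends on the mesh rasterization and is only assumed here as a helper passed in by the caller; the normalization step matches the zero-mean, unit-variance scheme described above.

import numpy as np

def normalize_patch(patch):
    # Zero-mean, unit-variance normalization (partial contrast compensation).
    patch = patch.astype(np.float64)
    return (patch - patch.mean()) / (patch.std() + 1e-8)

def shape_free_patch(image, b, warp):
    # x = W(y, b): geometric normalization by an assumed piece-wise affine
    # warping routine, followed by photometric normalization.
    raw = warp(image, b)
    return normalize_patch(raw)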

2.3 Adaptive facial texture model


In this work, the facial texture model (appearance model) is built online using the tracked
shape-free patches. We use the HAT symbol for the tracked parameters and patches. For a
given frame t, b̂ t represents the computed geometric parameters and x̂ t the corresponding
shape-free patch, that is,

x̂_t = W(y_t, b̂_t)        (7)

The estimation of b̂ t from the sequence of images will be presented in Section 2.4. b̂ 0 is set
manually, according to the face in the first video frame. The facial texture model
(appearance model) associated with the shape-free facial patch at time t is time-varying in
that it models the appearances present in all observations x̂ up to time t - 1. This may be
required as a result, for instance, of illumination changes or out-of-plane rotated faces.
By assuming that the pixels within the shape-free patch are independent, we can model the
appearance using a multivariate Gaussian with a diagonal covariance matrix Σ. In other
words, this multivariate Gaussian is the distribution of the facial patches x̂ t. Let μ be the
Gaussian center and σ the vector containing the square root of the diagonal elements of the
covariance matrix Σ. μ and σ are d-vectors (d is the size of x).
In summary, the observation likelihood is written as:

p(x_t | b_t) = Π_{i=1..d} N(x_i; μ_i, σ_i)        (8)

where N(x_i; μ_i, σ_i) is the normal density:

N(x_i; μ_i, σ_i) = (1 / (σ_i √(2π))) exp(-(x_i - μ_i)^2 / (2σ_i^2))        (9)

We assume that the appearance model summarizes the past observations under an exponential envelope with a forgetting factor α = 1 - exp(-log 2 / n_h), where n_h represents the half-life of the envelope in frames (Jepson et al., 2003).
When the patch x̂ t is available at time t, the appearance is updated and used to track in the
next frame. It can be shown that the appearance model parameters, i.e., the μi's and σi's can
be updated from time t to time (t + 1) using the following equations (see (Jepson et al., 2003)
for more details on OAMs):

μ_{t+1} = (1 - α) μ_t + α x̂_t        (10)

σ^2_{t+1} = (1 - α) σ^2_t + α (x̂_t - μ_t)^2        (11)

This technique is simple, time-efficient and therefore suitable for real-time applications. The
appearance parameters reflect the most recent observations within a roughly L = 1 / α
window with exponential decay.
Note that μ is initialized with the first patch x̂_0. However, equation (11) is not used with a constant α until the number of frames reaches a given value (e.g., the first 40 frames). For these early frames, the classical variance is used, that is, equation (11) is applied with α set to 1/t.
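A minimal sketch of the recursive update of equations (10)-(11), including the 1/t schedule used during the first frames, might look as follows; the half-life value is an arbitrary choice and the 40-frame switch follows the example given in the text.

import numpy as np

class OnlineAppearanceModel:
    def __init__(self, first_patch, half_life=20, warmup=40):
        self.mu = first_patch.astype(np.float64).copy()    # Gaussian center
        self.var = np.full_like(self.mu, 1e-2)             # diagonal variances
        self.alpha = 1.0 - np.exp(-np.log(2) / half_life)  # forgetting factor
        self.warmup = warmup
        self.t = 1

    def update(self, patch):
        # Equations (10)-(11); a 1/t factor (classical mean/variance) is used
        # for the first `warmup` frames, the constant alpha afterwards.
        self.t += 1
        a = 1.0 / self.t if self.t <= self.warmup else self.alpha
        self.var = (1 - a) * self.var + a * (patch - self.mu) ** 2
        self.mu = (1 - a) * self.mu + a * patch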
Here we used a single Gaussian to model the appearance of each pixel in the shape-free
template. However, modeling the appearance with Gaussian mixtures can also be used at
the expense of an additional computational load (e.g., see (Lee, 2005; Zhou et al., 2004)).

2.4 Face and facial action tracking


Given a video sequence depicting a moving head/face, we would like to recover, for each
frame, the 3D head pose and the facial actions encoded by the state vector bt (equation 5).
The purpose of the tracking is to estimate the state vector bt by using the current appearance
model encoded by μ t and σ t. To this end, the current input image yt is registered with the
current appearance model. The state vector bt is estimated by minimizing the Mahalanobis
distance between the warped image patch and the current appearance mean (the current Gaussian center):

e(b_t) = Σ_{i=1..d} [ (W(y_t, b_t)_i - μ_i) / σ_i ]^2        (12)

The above criterion can be minimized using an iterative gradient descent method where the
starting solution is set to the previous solution b̂ t-1. Handling outlier pixels (caused for
instance by occlusions) is performed by replacing the quadratic function with Huber's cost
function (Huber, 1981). The gradient matrix is computed for each input frame. It is
approximated by numerical differences. More details about this tracking method can be
found in (Dornaika & Davoine, 2006).
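The registration of Section 2.4 could be sketched as below, with the warp passed in as an assumed helper; the robust Huber cost and the numerical-difference gradient follow the description above, while the step size and iteration count are arbitrary illustrative choices.

import numpy as np

def huber(r, k=1.345):
    # Huber cost applied element-wise to the normalized residuals.
    a = np.abs(r)
    return np.where(a <= k, 0.5 * r ** 2, k * a - 0.5 * k ** 2)

def robust_cost(b, image, mu, sigma, warp):
    # Robustified version of the distance in equation (12).
    return huber((warp(image, b) - mu) / sigma).sum()

def register(b_prev, image, mu, sigma, warp, n_iters=10, step=1e-3, eps=1e-4):
    # Gradient descent started from the previous solution; the gradient is
    # approximated by numerical differences, as in the text.
    b = b_prev.astype(np.float64).copy()
    for _ in range(n_iters):
        base = robust_cost(b, image, mu, sigma, warp)
        grad = np.zeros_like(b)
        for j in range(b.size):
            bp = b.copy()
            bp[j] += eps
            grad[j] = (robust_cost(bp, image, mu, sigma, warp) - base) / eps
        b -= step * grad
    return b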

3. Tracking then recognition


In this section, we show how the time series representation of the estimated facial actions, τa,
can be utilized for inferring the facial expression in continuous videos. We propose two
different approaches. The first one is a non-parametric approach and relies on Dynamic
Time Warping. The second one is a parametric approach and is based on Linear
Discriminant Analysis.
In order to learn the spatio-temporal structure of the facial actions associated with the
universal expressions, we have used the following. Video sequences have been picked up
from the CMU database (Kanade et al., 2000). These sequences depict five frontal view
universal expressions (surprise, sadness, joy, disgust and anger). Each expression is
performed by 7 different subjects, starting from the neutral one. Altogether we select 35
video sequences composed of around 15 to 20 frames each, that is, the average duration of
each sequence is about half a second. The learning phase consists in estimating the facial
action parameters τ a (a 6-vector) associated with each training sequence, that is, the
temporal trajectories of the action parameters.
Figure 3 shows six videos belonging to the CMU database. The first five images depict the
estimated deformable model associated with the high magnitude of the five basic
expressions. Figure 4 shows the computed facial action parameters associated with three
training sequences: surprise, joy and anger. The training video sequences have an
interesting property: all performed expressions go from the neutral expression to a high
magnitude expression by going through a moderate magnitude around the middle of the
sequence.

Figure 3: Six video examples associated with the CMU database (Surprise, Sadness, Joy, Disgust, Anger, Neutral). The first five images depict the high magnitude of the five basic expressions.
Figure 4: Three examples (sequences) of learned facial action parameters as a function of
time. (a) Surprise expression. (b) Joy expression. (c) Anger expression.
3.1 Dynamic time warping


In the recognition phase, the head and facial actions are recovered from the video sequence
using our developed appearance-based tracker (Dornaika & Davoine, 2006). The current
facial expression is then recognized by computing a similarity measure between the tracked
facial actions τ a(t) associated with the test sequence and those associated with each universal
expression. This recognition scheme can be carried out either online or off-line. One can
notice that a direct comparison between the tracked trajectories and the stored ones is not
feasible since there is no frame-to-frame correspondence between the tracked facial actions
and the stored ones. To overcome this problem, we use dynamic programming which
allows temporal deformation of time series as they are matched against each other.
We infer the facial expression associated with the current frame t by considering the
estimated trajectory, i.e. the sequence of vectors τ a(t), within a temporal window of size T
centered at the current frame t. In our tests, T is set to 9 frames. This trajectory is matched
against the 35 training trajectories using the Dynamic Time Warping (DTW) technique
(Rabiner & Juang, 1993; Berndt & Clifford, 1994). For each training trajectory, the DTW
technique returns a dissimilarity measure between the tested trajectory and the training
trajectory (known universal expression). The classification rule stipulates that the smallest
average dissimilarity decides the expression classification, where the dissimilarity measures
associated with a given universal expression are averaged over the 7 subjects.
The proposed scheme accounts for the variability in duration since the DTW technique
allows non-linear time scaling. The segmentation of the video is obtained by repeating the
whole recognition scheme for every frame in the test video.
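A compact sketch of this classification rule is given below; the DTW routine is the standard dynamic-programming recurrence with Euclidean frame costs, which is an assumption about the exact variant used in (Rabiner & Juang, 1993; Berndt & Clifford, 1994).

import numpy as np

def dtw_distance(X, Y):
    # Dynamic Time Warping dissimilarity between two trajectories of shape
    # (length, 6), computed with the standard DP recurrence.
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_window(window, training):
    # window  : tracked facial actions over T frames, shape (T, 6)
    # training: dict mapping expression name -> list of training trajectories
    # Returns the expression with the smallest average dissimilarity.
    scores = {expr: np.mean([dtw_distance(window, traj) for traj in trajs])
              for expr, trajs in training.items()}
    return min(scores, key=scores.get)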
In order to evaluate the performance, we have created test videos featuring the universal
facial expressions. To this end, we have asked a volunteer student to perform each universal
expression several times in a relatively long sequence. The subject was instructed to display
the expression in a natural way, i.e. the displayed expressions were independent of any
database. Each video sequence contains several cycles depicting a particular universal facial
expression.
The performance of the developed recognition scheme is evaluated by utilizing five test
videos. Table 1 shows the confusion matrix for the dynamical facial expression classifier
using the DTW technique. We point out that the learned trajectories were inferred from the
CMU database while the used test videos were created at our laboratory. The recognition
rate of dynamical expressions was 100% for all universal expressions except for the disgust
expression for which the recognition rate was 44%. The reason is that the disgust expression
performed by our subject was very different from that performed by most of the CMU
database subjects. Therefore, for the above experiment, the overall recognition rate is 90.4%.

Table 1: Confusion matrix for the dynamical facial expression classifier using the DTW
technique (the smallest average dissimilarity). The learned trajectories were inferred from the
CMU database while the used test videos were created at our laboratory. The recognition
rate of dynamical expressions was 100% for all basic expressions except for the disgust
expression for which the recognition rate was 44%.
3.2 Linear discriminant analysis


As can be seen from the previous section, the CPU time of the recognition scheme based on
the DTW technique is proportional to the number of the subjects present in the database.
Whenever this number is very large, the recognition scheme becomes computationally
expensive. In this section, we propose a parametric recognition scheme by which the
training trajectories can be represented in a more compact form. The computational cost of
the recognition scheme does not depend on the number of examples.

Figure 5: The parameterized modeling of facial expressions using Eigenspace and Fisherspace.
Learning. The learning phase is depicted in Figure 5. Again, we use the training videos
associated with the CMU database. In order to obtain trajectories with the same number of
frames (duration) the trajectories belonging to the same expression class are aligned using
the DTW technique. Recall that this technique allows a frame-to-frame correspondence
between two time series.
Let e_ij be the aligned trajectory i belonging to the expression class j. The example e_ij is represented by a column vector of dimension 6T×1 and is obtained by concatenating the facial action 6-vectors τ_a(t):

e_ij = [τ_a(1)^T, τ_a(2)^T, …, τ_a(T)^T]^T

Note that T represents the duration of the aligned trajectories which will be fixed for all
examples. For example, a nominal duration of 18 frames for the aligned trajectories makes
the dimension of all examples eij (all i and j) equal to 108.
Applying a Principal Component Analysis on the set of all training trajectories yields the
mean trajectory ē as well as the principal modes of variation. Any training trajectory e can be approximated by the principal modes using the q largest eigenvalues:

ê = ē + Σ_{l=1..q} c_l U_l

In our work, the number of principal modes is chosen such that the variability of the
retained modes corresponds to 99% of the total variability. The vector c can be seen as a
parametrization of any input trajectory, ê, in the space spanned by the q basis vectors U_l.
The vector c is given by:

c = [U_1, …, U_q]^T (e - ē)        (13)

Thus, all training trajectories eij can now be represented by the vectors cij (using (13)) on
which a Linear Discriminant Analysis can be applied. This gives a new space (the
Fisherspace) in which each training video sequence is represented by a vector of dimension
l -1 where l is the number of expression classes. Figure 6 illustrates the learning results
associated with the CMU data. In this space, each trajectory example is represented by a 5-
vector. Here, we use six facial expression classes: Surprise, Sadness, Joy, Disgust, Anger, and
Neutral. (a) displays the second component versus the first one, and (b) displays the fourth
component versus the third one. In this space, the neutral trajectory (a sequence of zero
vectors) is represented by a star.
Recognition. The recognition scheme follows the main steps of the learning stage. We infer
the facial expression by considering the estimated facial actions provided by our face tracker
(Dornaika & Davoine, 2006). We consider the one-dimensional vector e’ (the concatenation
of the facial actions τa(t)) within a temporal window of size T centered at the current frame t.
Note that the value of T should be the same as in the learning stage. This vector is projected
onto the PCA space, then the obtained vector is projected onto Fisherspace in which the
classification occurs. The expression class whose mean is the closest to the current trajectory
is then assigned to this trajectory (current frame).
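The recognition step can be sketched as follows, assuming the PCA mean ē, the basis U, the LDA (Fisherspace) projection and the per-class means in Fisherspace have already been estimated during the learning stage; these names and shapes are assumptions made for the example.

import numpy as np

def recognize(window, e_bar, U, W_lda, class_means):
    # window      : tracked facial actions over T frames, shape (T, 6)
    # e_bar       : mean trajectory, shape (6T,)
    # U           : PCA basis, shape (6T, q)
    # W_lda       : LDA (Fisherspace) projection, shape (q, l - 1)
    # class_means : dict expression -> mean vector in Fisherspace, shape (l - 1,)
    e = window.reshape(-1)                    # concatenate the 6-vectors
    c = U.T @ (e - e_bar)                     # equation (13)
    f = W_lda.T @ c                           # projection onto Fisherspace
    return min(class_means,
               key=lambda expr: np.linalg.norm(f - class_means[expr]))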
Performance evaluation. Table 2 shows the confusion matrix for the dynamical facial
expression classifier using Eigenspace and Fisherspace. The learned trajectories were
inferred from the CMU database while the used test videos were created at our laboratory.
The recognition rate of dynamical expressions was 100% for all basic expressions except for
the disgust expression for which the recognition rate was 55%. Therefore, for the above
experiment, the overall recognition rate is 92.3%. One can notice the slight improvement in
the recognition rate over the classical recognition scheme based on the DTW.
Table 2: Confusion matrix for the dynamical facial expression classifier using Eigenspace
and Fisherspace. The learned trajectories were inferred from the CMU database while the
used test videos were created at our laboratory. The recognition rate of dynamical
expressions was 100% for all basic expressions except for the disgust expression for which
the recognition rate was 55%.

Figure 6: The 35 trajectory examples associated with five universal facial expressions
depicted in Fisherspace. In this space, each trajectory example is represented by a 5-vector.
Here, we use six facial expression classes: Surprise, Sadness, Joy, Disgust, Anger, and
Neutral. (a) displays the second component versus the first one, and (b) displays the fourth
component versus the third one. In this space, the neutral trajectory (a sequence of zero
vectors) is represented by a star.
4. Tracking and recognition


In Section 3, the facial expression was inferred from the time series representation of the
tracked facial actions. In this section, we propose to simultaneously estimate the facial
actions and the expression from the video sequence.
Since the facial expression can be considered as a random discrete variable, we need to
append to the continuous state vector bt a discrete state component γ t in order to create a
mixed state:

[b_t^T, γ_t]^T        (14)

where γ_t ∈ ε = {1, 2, …, N_γ} is the discrete component of the state, drawn from a finite set of
integer labels. Each integer label represents one of the six universal expressions: surprise,
disgust, fear, joy, sadness and anger. In our study, we adopt these facial expressions
together with the neutral expression, that is, Nγ is set to 7. There is another useful
representation of the mixed state which is given by:

[h_t^T, a_t^T]^T        (15)

where ht denotes the 3D head pose parameters, and at the facial actions appended with the
expression label γ_t, i.e. a_t = [τ_a(t)^T, γ_t]^T.
This separation is consistent with the fact that the facial expression is highly correlated with
the facial actions, while the 3D head pose is independent of the facial actions and
expressions. The remainder of this section is organized as follows. Section 4.1 provides some
background. Section 4.2 describes the proposed approach for the simultaneous tracking
and recognition. Section 4.3 describes experiments and provides evaluations of performance
to show the feasibility and robustness of the proposed approach.

4.1 Background
4.1.1 Facial action dynamic models
Corresponding to each basic expression class, γ, there is a stochastic dynamic model
describing the temporal evolution of the facial actions τ a(t), given the expression. It is
assumed to be a Markov model of order K. For each basic expression γ, we associate a
Gaussian Auto-Regressive Process defined by:

τ_a(t) = A_1^γ τ_a(t-1) + A_2^γ τ_a(t-2) + … + A_K^γ τ_a(t-K) + d^γ + B^γ w_t        (16)

in which w_t is a vector of 6 independent random N(0, 1) variables. The parameters of the dynamic model are: (i) deterministic parameters A_1^γ, A_2^γ, …, A_K^γ and d^γ, and (ii) stochastic parameters B^γ which are multipliers for the stochastic process w_t. It is worth noting that the
above model can be used in predicting the process from the previous K values. The
predicted value at time t obeys a multivariate Gaussian centered at the deterministic value
of (16), with B^γ (B^γ)^T being its covariance matrix. In our study, we are interested in second-
order models, i.e. K = 2. The reason is twofold. First, these models are easy to estimate.
Second, they are able to model complex dynamics. For example, these models have been
used in (Blake & Isard, 2000) for learning the 2D motion of talking lips (profile contours),
beating heart, and writing fingers.

4.1.2 Learning the second-order auto-regressive models


Given a training sequence τ a(1),…, τ a(T), with T > 2, belonging to the same expression class, it
is well known that a Maximum Likelihood Estimator provides a closed-form solution for the
model parameters (Blake & Isard, 2000). For a second-order model, these parameters reduce
to two 6×6 matrices A_1^γ and A_2^γ, a 6-vector d^γ, and a 6×6 covariance matrix C^γ = B^γ (B^γ)^T.
Therefore, Eq. (16) reduces to:

τ_a(t) = A_1^γ τ_a(t-1) + A_2^γ τ_a(t-2) + d^γ + B^γ w_t        (17)

The parameters of each auto-regressive model can be computed from temporal facial action
sequences. Ideally, the temporal sequence should contain several instances of the
corresponding expression.
More details about auto-regressive models and their computation can be found in (Blake &
Isard, 2000; Ljung, 1987; North et al., 2000). Each universal expression has its own second-
order auto-regressive model given by Eq.(17). However, the dynamics of facial actions
associated with the neutral expression can be simpler and are given by:

τ_a(t) = τ_a(t-1) + D w_t
where D is a diagonal matrix whose elements represent the variances around the ideal
neutral configuration τa = 0. The right-hand side of the above equation is constrained to
belong to a predefined interval, since a neutral configuration and expression is characterized
by both the lack of motion and the closeness to the ideal static configuration. In our study,
the auto-regressive models are learned using a supervised learning scheme. First, we asked
volunteer students to perform each basic expression several times in approximately 30-
second sequences. Each video sequence contains several cycles depicting a particular facial
expression: Surprise, Sadness, Joy, Disgust, Anger, and Fear. Second, for each training
video, the 3D head pose and the facial actions τa(t) are tracked using our deterministic
appearance-based tracker (Dornaika & Davoine, 2006) (outlined in Section 2). Third, the
parameters of each auto-regressive model are estimated using the Maximum Likelihood
Estimator.
Figure 7 illustrates the value of the facial actions, τ a(t), associated with six training video
sequences. For clarity purposes, only two components are shown for a given plot. For a
given training video, the neutral frames are skipped from the original training sequence
used in the computation of the auto-regressive models.
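A minimal sketch of the estimation of the second-order model follows: under the Gaussian noise of equation (17), the Maximum Likelihood solution coincides with ordinary least squares over the regressors (τ_a(t-1), τ_a(t-2), 1). This is a generic estimator written for illustration, not the exact procedure of (Blake & Isard, 2000).

import numpy as np

def fit_ar2(tau):
    # tau: tracked facial actions of one expression class, shape (T, 6).
    # Returns A1, A2 (6x6), d (6,) and the noise covariance C = B B^T (6x6).
    Y = tau[2:]                                            # targets tau_a(t)
    X = np.hstack([tau[1:-1], tau[:-2], np.ones((len(Y), 1))])
    W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)         # least squares fit
    A1, A2, d = W[:6].T, W[6:12].T, W[12]
    resid = Y - X @ W
    C = resid.T @ resid / len(Y)                           # ML noise covariance
    return A1, A2, d, C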

4.1.3 The transition matrix


In our study, the facial actions as well as the expression are simultaneously retrieved using a
stochastic framework, namely the particle filtering method. This framework requires a
transition matrix T whose entries T_γ′,γ describe the probability of transition between two
expression labels γ’ and γ. The transition probabilities need to be learned from training video
sequences. In the literature, the transition probabilities associated with states (not
necessarily facial expressions) are inferred using supervised and unsupervised learning
techniques. However, since we are dealing with high level states (the universal facial
expressions), we have found that a realistic a priori setting works very well. We adopt a 7×7 symmetric matrix whose diagonal elements are close to one (e.g. T_γ,γ = 0.8, that is, 80% of the transitions occur within the same expression class). The remaining probability mass is distributed equally among the other expressions. In this model, transitions from one expression to another
expression without going through the neutral one are allowed. Furthermore, this model
adopts the most general case where all universal expressions have the same probability.
However, according to the context of the application, one can adopt other transition
matrices in which some expressions are more likely to happen than others.
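The a priori transition matrix described above takes only a few lines; the diagonal value 0.8 follows the example in the text and the uniform off-diagonal split is the stated assumption.

import numpy as np

def build_transition_matrix(n_expr=7, p_stay=0.8):
    # 7x7 symmetric matrix: p_stay on the diagonal, the remaining probability
    # mass split equally among the other expressions (rows sum to 1).
    T = np.full((n_expr, n_expr), (1.0 - p_stay) / (n_expr - 1))
    np.fill_diagonal(T, p_stay)
    return T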

4.2 Approach
Since at any given time, the 3D head pose parameters can be considered as independent of
the facial actions and expression, our basic idea is to split the estimation of the unknown
parameters into two main stages. For each input video frame yt, these two stages are
invoked in sequence in order to recover the mixed state [h_t^T, a_t^T]^T. Our proposed approach
is illustrated in Figure 8. In the first stage, the six degrees of freedom associated with the 3D
head pose (encoded by the vector ht) are obtained using a deterministic registration
technique similar to that proposed in (Dornaika & Davoine, 2006). In the second stage, the
facial actions and the facial expression (encoded by the vector a_t = [τ_a(t)^T, γ_t]^T) are
simultaneously estimated using a stochastic framework based on a particle filter. Such
models have been used to track objects when different types of dynamics can occur (Isard &
Blake, 1998). Other examples of auxiliary discrete variables beside the main hidden state of
interest are given in (Perez & Vermaak, 2005). Since τ_a(t) and γ_t are highly correlated, their
simultaneous estimation will give results that are more robust and accurate than results
obtained with methods estimating them in sequence. In the following, we present the
parameter estimation process associated with the current frame yt. Recall that the head pose
is computed using a deterministic approach, while the facial actions and expressions are
estimated using a probabilistic framework.

4.2.1 3D head pose


The purpose of this stage is to estimate the six degrees of freedom associated with the 3D
head pose at frame t, that is, the vector ht. Our basic idea is to recover the current 3D head
pose parameters from the previous 12-vector b̂_(t-1) = [θ̂_x(t-1), θ̂_y(t-1), θ̂_z(t-1), t̂_x(t-1), t̂_y(t-1), t̂_z(t-1), τ̂_a(t-1)^T]^T = [ĥ_(t-1)^T, τ̂_a(t-1)^T]^T using the same region-based registration technique outlined in
Section 2.4. However, this time the unknown parameters are only given by the 3D head pose
parameters:

b_t = [h_t^T, τ̂_a(t-1)^T]^T        (18)
4.2.2 Simultaneous facial actions and expression


In this stage, our goal is to simultaneously infer the facial actions as well as the expression
label associated with the current frame t given (i) the observation model (Eq.(8)), (ii) the
dynamics associated with each expression (Eq.(17)), and (iii) the 3D head pose for the
current frame computed by the deterministic approach (see Section 4.2.1). This will be
performed using a particle filter paradigm. Thus, the statistical inference of such paradigm
will provide a posterior distribution for the facial actions τ_a(t) as well as a probability mass function for the facial expression γ_t.
Since the 3D head pose h_t is already computed, we are left with the mixed state a_t = [τ_a(t)^T, γ_t]^T. The dimension of the vector a_t is 7. Here we will employ a particle filter algorithm allowing the recursive estimation of the posterior distribution p(a_t | x_1:t) using a particle set. This is approximated by a set of J particles {(a_t^(0), w_t^(0)), …, (a_t^(J), w_t^(J))}. Once this distribution is known, the facial actions as well as the expression can be inferred using some loss function such as the MAP or the mean. Figure 9 illustrates the proposed two-stage approach. It shows how the current posterior p(a_t | x_1:t) can be inferred from the previous posterior p(a_(t-1) | x_1:(t-1)) using a particle filter algorithm.
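A schematic particle-filter step for the mixed state a_t = (τ_a(t), γ_t) is sketched below; the dynamics, the transition matrix and the observation likelihood of equation (8) are assumed to be available as callables, and the resampling scheme is deliberately simplified.

import numpy as np

def particle_filter_step(particles, weights, T, dynamics, likelihood, rng):
    # particles : list of ((tau_prev1, tau_prev2), gamma) mixed-state samples
    # weights   : normalized importance weights, shape (J,)
    # T         : 7x7 expression transition matrix
    # dynamics  : dynamics[gamma](tau_prev1, tau_prev2, rng) -> new tau sample
    # likelihood: likelihood(tau) -> observation likelihood, from equation (8)
    J = len(particles)
    idx = rng.choice(J, size=J, p=weights)               # resampling
    new_particles, new_weights = [], np.empty(J)
    for j, i in enumerate(idx):
        (tau1, tau2), gamma = particles[i]
        gamma_new = rng.choice(len(T), p=T[gamma])       # sample expression label
        tau_new = dynamics[gamma_new](tau1, tau2, rng)   # propagate facial actions
        new_particles.append(((tau_new, tau1), gamma_new))
        new_weights[j] = likelihood(tau_new)             # weight by observation
    new_weights /= new_weights.sum()
    return new_particles, new_weights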
On a 3.2 GHz PC, a C implementation of the approach computes the 3D head pose parameters in 25 ms
and the facial actions/expression in 31 ms where the patch resolution is 1310 pixels and the
number of particles is 100.

4.3 Experimental results


In this section, we first report results on simultaneous facial action tracking and expression
recognition. Then we present performance studies, considering different perturbing factors
such as robustness to rapid facial movements or to imprecise 3D head pose estimation.

4.3.1 Simultaneous tracking and recognition


Figure 10 shows the application of the proposed approach to a 748-frame test video
sequence. The upper part of this figure shows 9 frames of this sequence: 50, 130, 221, 300,
371, 450, 500, 620, and 740. The two plots illustrate the probability of each expression as a
function of time (frames). The lower part of this figure shows the tracking results associated
with frames 130, 371, and 450. The upper left corner of these frames depicts the appearance
mean and the current shape-free facial patch. Figure 11.a illustrates the weighted average of
the tracked facial actions, τ̂_a(t). For the sake of clarity, only three out of six components are
shown. For this sequence, the maximum probability correctly indicated the displayed
expression. We noticed that some displayed expressions can, during a short initial phase
(very few frames), be considered as a mixture of two expressions (the displayed one and
another one). This is due to the fact that face postures and dynamics at some transition
phases can be shared by more than one expression. This is not a problem since the frame-
wise expression probabilities can be merged and averaged over a temporal patch including
contiguous non-neutral frames. Figure 11.b illustrates this scheme and shows the resulting
segmentation of the used test video. Note that the same holds true for a human observer, who may fail to recognize a gesture from a single frame.
In the above experiment, the total number of particles is set to 200. Figure 12 illustrates the
same facial actions when the number of particles is set to 100. We have found that there is no
significant difference in the estimated facial actions and expressions when the tracking is
performed with 100 particles (see Figures 11.a and 12), which is due to the use of learned
multi-class dynamics.
Figure 13 shows the tracking results associated with another 600-frame test video sequence
depicting significant out-of-plane head movements. The recognition results were correct.
Recall that the facial actions are related to the deformable 3D model and thus the recognition
based on them is independent from the viewing angle.
A challenging example. We have dealt with a challenging test video. For this 1600-frame
test video, we asked our subject to adopt arbitrarily different facial gestures and expressions
for an arbitrary duration and in an arbitrary order. Figure 14 (Top) illustrates the probability
mass distribution as a function of time. As can be seen, surprise, joy, anger, disgust, and fear
are clearly and correctly detected. Also, we find that the facial actions associated with the
subject's conversation are correctly tracked using the dynamics of the universal expressions.
The tracked facial actions associated with the subject's conversation are depicted in nine
frames (see the lower part of Figure 14). The whole video can be found at http://www.hds.utc.fr/~fdavoine/MovieTrackingRecognition.wmv.

4.3.2 Performance study


One-class dynamics versus multi-class dynamics. In order to show the advantage of using multi-class dynamics and mixed states, we conducted the following experiment. We used a
particle filter for tracking facial actions. However, this time the state consists only of facial
actions and the dynamics are replaced with a simple noise model, i.e. motion is modelled by
a random noise. Figures 15.a and 15.b show the tracking results associated with the same
input frame. (a) displays the tracking results obtained with a particle filter adopting a single-
class dynamics. (b) displays the tracking results with our proposed approach adopting the
six auto-regressive models. As can be seen, by using mixed states with learned multi-class
dynamics, the facial action tracking becomes considerably more accurate (see the adaptation
of the mouth region, i.e., the lower lip).
Effect of rapid and/or discontinuous facial movements. It is well known that facial
expressions introduce rapid facial feature movements, and hence many developed trackers
may fail to keep track of them. In order to assess the behavior of our developed tracker
whenever very rapid facial movements occur, we conducted the following experiment to
simulate an ultra-rapid mouth motion¹. We cut about 40 frames from a test video. These
frames (video segment) overlap with a surprise transition. The altered video is then tracked
using two different methods: (i) a deterministic approach based on a registration technique
estimating both the head and facial action parameters (Dornaika & Davoine, 2006), and (ii)
our stochastic approach. Figures 16.a and 16.b show the tracking results associated with the
same input frame immediately after the cut. Note the difference in accuracy between the
deterministic approach (a) and the stochastic one (b) (see the eyebrow and mouth region).
Thus, despite the motion discontinuity of the mouth and the eyebrows, the particles are still able to provide the correct state (both the discrete and the continuous components) almost instantaneously (see the correct alignment between the 3D model and the region of the lips and mouth in Figure 16.b).

1 This experiment also simulates a discontinuity in video streaming.
Low resolution video sequences. In order to assess the behavior of our developed approach
when the resolution and/or the quality of the videos is low, we downloaded several low-
quality videos used in (Huang et al., 2002). In each 42-frame video, one universal expression
is displayed. Figure 17 shows our recognition results (the discrete probability distribution)
associated with three such videos. The left images display the 25th frame of each video. Note
that the neutral curve is not shown for reasons of clarity. As can be seen, the recognition
obtained with our stochastic approach was very good despite the low quality of the videos
used. The resolution of these videos is 320×240 pixels.
Impact of noisy estimated 3D head pose. The estimated appearance-based 3D head pose
may suffer from some inaccuracies associated with the out-of-plane movements, which is
the case for all monocular systems. It would seem reasonable to fear that these inaccuracies
might lead to a failure in facial action tracking. In order to assess the effect of 3D head pose
inaccuracies on the facial action tracking, we conducted the following experiment. We
acquired a 750-frame sequence and performed our approach twice. The first was a
straightforward run. In the second run, the estimated out-of-plane parameters of the 3D
head pose were perturbed by a uniform noise, then the perturbed 3D pose was used by the
facial action tracking and facial expression recognition. Figure 18 shows the value of the
tracked actions in both cases: the noise-free 3D head pose (solid curve) and the noisy 3D
head pose (dotted curves). In this experiment, the two out-of-plane angles were perturbed
with additive uniform noise belonging to [-7 degrees, +7 degrees] and the scale was
perturbed by an additive noise belonging to [-2%, +2%]. As can be seen, the facial actions are
almost unaffected by the introduced noise. This can be explained by the fact that the 2D projection of out-of-plane errors produces very small errors in the image plane such that the
2D alignment between the model and the regions of lips and eyebrows is still good enough
to capture their independent movements correctly.
Robustness to lighting conditions. The appearance model used was given by one single
multivariate Gaussian with parameters slowly updated over time. The robustness of this
model is improved through the use of robust statistics that prevent outliers from
deteriorating the global appearance model. This relatively simple model was adopted to
allow real-time performance. We found that the tracking based on this model was successful
even in the presence of temporary occlusions caused by a rotated face and occluding hands.
Figure 19 illustrates the tracking results associated with a video sequence provided by the
Polytechnic University of Madrid2, depicting head movements and facial expressions under
significant illumination changes (Buenaposada et al., 2006). As can be seen, even though brief perturbations caused temporary tracking inaccuracies with our simple appearance model, the track was never lost. Moreover, whenever the perturbation disappears, the tracker once again provides accurate parameters.

2 http://www.dia.fi.upm.es/~pcr/downloads.html

5. Conclusion
This chapter provided a set of recent deterministic and stochastic (robust) techniques that
perform efficient facial expression recognition from video sequences. More precisely, we
described two texture- and view-independent frameworks for facial expression recognition
given natural head motion. Both frameworks use temporal classification and do not require
any learned facial image patch since the facial texture model is learned online. The latter
property makes them more flexible than many existing recognition approaches. The
proposed frameworks can easily include other facial gestures in addition to the universal
expressions.
The first framework (Tracking then Recognition) exploits the temporal representation of
tracked facial actions in order to infer the current facial expression in a deterministic way.
Within this framework, we proposed two different recognition methods: i) a method based
on Dynamic Time Warping, and ii) a method based on Linear Discriminant Analysis. The
second framework (Tracking and Recognition) proposes a novel paradigm in which facial
action tracking and expression recognition are simultaneously performed. This framework
consists of two stages. In the first stage, the 3D head pose is recovered using a deterministic
registration technique based on Online Appearance Models. In the second stage, the facial
actions as well as the facial expression are simultaneously estimated using a stochastic
framework based on multi-class dynamics.
We have shown that possible inaccuracies affecting the out-of-plane parameters associated
with the 3D head pose have no impact on the stochastic tracking and recognition. The
developed scheme lends itself nicely to real-time systems. We expect the approach to
perform well in the presence of perturbing factors, such as video discontinuities and
moderate illumination changes. The developed face tracker was successfully tested with
moderately rapid head movements. Should ultra-rapid head movements break tracking, it is
possible to use a re-initialization process or a stochastic tracker that propagates a probability
distribution over time, such as the particle-filter-based tracking method presented in our
previous work (Dornaika & Davoine, 2006). The out-of-plane face motion range is limited
within the interval [-45 deg, 45 deg] for the pitch and the yaw angles. Within this range, the
obtained distortions associated with the facial patch are still acceptable to estimate the
correct pose of the head. Note that the proposed algorithm does not require that the first
frame be a neutral face since all universal expressions have the same probability.
The current work uses an appearance model given by one single multivariate Gaussian
whose parameters are slowly updated over time. The robustness of this model is improved
through the use of robust statistics that prevent outliers from deteriorating the global
appearance model. This relatively simple model was adopted to allow real-time
performance. We found that the tracking based on this model was successful even in the
presence of occlusions caused by a rotated face and occluding hands. The current
appearance model can be made more sophisticated through the use of Gaussian mixtures
(Zhou et al., 2004; Lee, 2005) and/or illumination templates to take into account sudden and
significant local appearance changes due for instance to the presence of shadows.

Figure 7: The automatically tracked facial actions, τ a(t), using the training videos. Each video
sequence corresponds to one expression. For a given plot, only two components are
displayed.

Figure 8: The proposed two-stage approach. In the first stage (Section 4.2.1), the 3D head
pose is computed using a deterministic registration technique. In the second stage (Section
4.2.2), the facial actions and expression are simultaneously estimated using a stochastic
technique involving multi-class dynamics.
Discovering Diverse Content Through
Random Scribd Documents
Of Pluto, to have quite set free
His half regain’d Eurydice.
These delights, if thou canst give,
Mirth with thee, I mean to live.

311.
Il Penseroso
HENCE vain deluding joyes,
The brood of folly without father bred,
How little you bested,
Or fill the fixèd mind with all your toyes;
Dwell in som idle brain,
And fancies fond with gaudy shapes possess,
As thick and numberless
As the gay motes that people the Sun Beams,
Or likest hovering dreams
The fickle Pensioners of Morpheus train.
But hail thou Goddes, sage and holy,
Hail divinest Melancholy,
Whose Saintly visage is too bright
To hit the Sense of human sight;
And therfore to our weaker view,
Ore laid with black staid Wisdoms hue.
Black, but such as in esteem,
Prince Memnons sister might beseem,
Or that Starr’d Ethiope Queen that strove
To set her beauties praise above
The Sea Nymphs, and their powers offended.
Yet thou art higher far descended,
Thee bright-hair’d Vesta long of yore,
To solitary Saturn bore;
His daughter she (in Saturns raign,
Such mixture was not held a stain)
Oft in glimmering Bowres, and glades
He met her, and in secret shades
Of woody Ida’s inmost grove,
Whilst yet there was no fear of Jove.
Com pensive Nun, devout and pure,
Sober, stedfast, and demure,
All in a robe of darkest grain,
Flowing with majestick train,
And sable stole of Cipres Lawn,
Over thy decent shoulders drawn
Over thy decent shoulders drawn.
Com, but keep thy wonted state,
With eev’n step, and musing gate,
And looks commercing with the skies,
Thy rapt soul sitting in thine eyes:
There held in holy passion still,
Forget thy self to Marble, till
With a sad Leaden downward cast,
Thou fix them on the earth as fast.
And joyn with thee calm Peace, and Quiet,
Spare Fast, that oft with gods doth diet,
And hears the Muses in a ring,
Ay round about Joves Altar sing.
And adde to these retirèd Leasure,
That in trim Gardens takes his pleasure;
But first, and chiefest, with thee bring,
Him that yon soars on golden wing,
Guiding the fiery-wheelèd throne,
The Cherub Contemplation,
And the mute Silence hist along,
’Less Philomel will daign a Song,
In her sweetest, saddest plight,
Smoothing the rugged brow of night,
While Cynthia checks her Dragon yoke,
Gently o’re th’accustom’d Oke;
Sweet Bird that shunn’st the noise of folly,
Most musicall, most melancholy!
Thee Chauntress oft the Woods among,
I woo to hear thy eeven-Song;
And missing thee, I walk unseen
On the dry smooth-shaven Green,
To behold the wandring Moon,
Riding neer her highest noon,
Like one that had bin led astray
Through the Heav’ns wide pathles way;
And oft, as if her head she bow’d,
Stooping through a fleecy cloud.
Oft on a Plat of rising ground,
I hear the far-off Curfeu sound
Over som wide-water’d shoar,
Swinging slow with sullen roar;
Or if the Ayr will not permit,
Som still removèd place will fit.
Where glowing Embers through the room
Teach light to counterfeit a gloom,
Far from all resort of mirth,
Save the Cricket on the hearth,
Or the Belmans drousie charm,
To bless the dores from nightly harm:
Or let my Lamp at midnight hour,
Be seen in som high lonely Towr,
Where I may oft out-watch the Bear,
With thrice great Hermes, or unsphear
The spirit of Plato to unfold
What Worlds, or what vast Regions hold
The immortal mind that hath forsook
Her mansion in this fleshly nook:
And of those Dæmons that are found
In fire, air, flood, or under ground,
Whose power hath a true consent
With Planet, or with Element.
Som time let Gorgeous Tragedy
In Scepter’d Pall com sweeping by,
Presenting Thebs, or Pelops line,
Or the tale of Troy divine.
Or what (though rare) of later age,
Ennoblèd hath the Buskind stage.
But, O sad Virgin, that thy power
Might raise Musæeus from his bower
Or bid the soul of Orpheus sing
Such notes as warbled to the string,
Drew Iron tears down Pluto’s cheek,
And made Hell grant what Love did seek.
Or call up him that left half told
The story of Cambuscan bold,
Of Camball, and of Algarsife,
Of Camball, and of Algarsife,
And who had Canace to wife,
That own’d the vertuous Ring and Glass,
And of the wondrous Hors of Brass,
On which the Tartar King did ride;
And if ought els, great Bards beside,
In sage and solemn tunes have sung,
Of Turneys and of Trophies hung;
Of Forests, and inchantments drear,
Where more is meant then meets the ear.
Thus night oft see me in thy pale career,
Till civil-suited Morn appeer,
Not trickt and frounc’t as she was wont,
With the Attick Boy to hunt,
But Cherchef’t in a comly Cloud,
While rocking Winds are Piping loud,
Or usher’d with a shower still,
When the gust hath blown his fill,
Ending on the russling Leaves,
With minute drops from off the Eaves.
And when the Sun begins to fling
His flaring beams, me Goddes bring
To archèd walks of twilight groves,
And shadows brown that Sylvan loves,
Of Pine, or monumental Oake,
Where the rude Ax with heavèd stroke,
Was never heard the Nymphs to daunt,
Or fright them from their hallow’d haunt.
There in close covert by som Brook,
Where no profaner eye may look,
Hide me from Day’s garish eie,
While the Bee with Honied thie,
That at her flowry work doth sing,
And the Waters murmuring
With such consort as they keep,
Entice the dewy-feather’d Sleep;
And let som strange mysterious dream,
Wave at his Wings in Airy stream,
Of li l t t di l ’d
Of lively portrature display’d,
Softly on my eye-lids laid.
And as I wake, sweet musick breath
Above, about, or underneath,
Sent by som spirit to mortals good,
Or th’unseen Genius of the Wood.
But let my due feet never fail,
To walk the studious Cloysters pale,
And love the high embowèd Roof,
With antick Pillars massy proof,
And storied Windows richly dight,
Casting a dimm religious light.
There let the pealing Organ blow,
To the full voic’d Quire below,
In Service high, and Anthems cleer,
As may with sweetnes, through mine ear,
Dissolve me into extasies,
And bring all Heav’n before mine eyes.
And may at last my weary age
Find out the peacefull hermitage,
The Hairy Gown and Mossy Cell,
Where I may sit and rightly spell
Of every Star that Heav’n doth shew,
And every Herb that sips the dew;
Till old experience do attain
To somthing like Prophetic strain.
These pleasures Melancholy give,
And I with thee will choose to live.

312.
From ‘Arcades’
O’re the smooth enameld green
Where no print of step hath been,
Follow me as I sing,
And touch the warbled string.
Under the shady roof
Of branching Elm Star-proof,
Follow me,
I will bring you where she sits
Clad in splendor as befits
Her deity.
Such a rural Queen
All Arcadia hath not seen.
From ‘Comus’
313.
i
THE Star that bids the Shepherd fold,
Now the top of Heav’n doth hold,
And the gilded Car of Day,
His glowing Axle doth allay
In the steep Atlantick stream,
And the slope Sun his upward beam
Shoots against the dusky Pole,
Pacing toward the other gole
Of his Chamber in the East.
Mean while welcom Joy, and Feast,
Midnight shout, and revelry,
Tipsie dance, and Jollity.
Braid your Locks with rosie Twine
Dropping odours, dropping Wine.
Rigor now is gon to bed,
And Advice with scrupulous head,
Strict Age, and sowre Severity,
With their grave Saws in slumber ly.
We that are of purer fire
Imitate the Starry Quire,
Who in their nightly watchfull Sphears,
Lead in swift round the Months and Years.
The Sounds, and Seas with all their finny drove
Now to the Moon in wavering Morrice move,
And on the Tawny Sands and Shelves,
Trip the pert Fairies and the dapper Elves;
By dimpled Brook, and Fountain brim,
The Wood-Nymphs deckt with Daisies trim,
Their merry wakes and pastimes keep:
What hath night to do with sleep?
Night hath better sweets to prove,
Venus now wakes, and wak’ns Love....
Com, knit hands, and beat the ground,
In a light fantastick round.
314.
ii
Echo

SWEET Echo, sweetest Nymph that liv’st unseen


Within thy airy shell
By slow Meander’s margent green,
And in the violet imbroider’d vale
Where the love-lorn Nightingale
Nightly to thee her sad Song mourneth well.
Canst thou not tell me of a gentle Pair
That likest thy Narcissus are?
O if thou have
Hid them in som flowry Cave,
Tell me but where
Sweet Queen of Parly, Daughter of the Sphear!
So maist thou be translated to the skies,
And give resounding grace to all Heav’ns Harmonies?

315.
iii
Sabrina
The Spirit sings:
SABRINA fair
Listen where thou art sitting
Under the glassie, cool, translucent wave,
In twisted braids of Lillies knitting
The loose train of thy amber-dropping hair,
Listen for dear honour’s sake,
Goddess of the silver lake,
Listen and save!

Listen and appear to us,


In name of great Oceanus,
By the earth-shaking Neptune’s mace,
And Tethys grave majestick pace,
By hoary Nereus wrincled look,
And the Carpathian wisards hook,
By scaly Tritons winding shell,
And old sooth-saying Glaucus spell,
By Leucothea’s lovely hands,
And her son that rules the strands,
By Thetis tinsel-slipper’d feet,
And the Songs of Sirens sweet,
By dead Parthenope’s dear tomb,
And fair Ligea’s golden comb,
Wherwith she sits on diamond rocks
Sleeking her soft alluring locks,
By all the Nymphs that nightly dance
Upon thy streams with wily glance,
Rise, rise, and heave thy rosie head
From thy coral-pav’n bed,
And bridle in thy headlong wave,
Till thou our summons answered have.
Listen and save!

Sabrina replies:
BY the rushy-fringèd bank,
Where grows the Willow and the Osier dank,
My sliding Chariot stayes,
Thick set with Agat, and the azurn sheen
Of Turkis blew, and Emrauld green
That in the channell strayes,
Whilst from off the waters fleet
Thus I set my printless feet
O’re the Cowslips Velvet head,
That bends not as I tread,
Gentle swain at thy request
I am here.

316.
iv
The Spirit epiloguizes:
TO the Ocean now I fly,
And those happy climes that ly
Where day never shuts his eye,
Up in the broad fields of the sky:
There I suck the liquid ayr
All amidst the Gardens fair
Of Hesperus, and his daughters three
That sing about the golden tree:
Along the crispèd shades and bowres
Revels the spruce and jocond Spring,
The Graces, and the rosie-boosom’d Howres,
Thither all their bounties bring,
That there eternal Summer dwels,
And West winds, with musky wing
About the cedar’n alleys fling
Nard, and Cassia’s balmy smels.
Iris there with humid bow,
Waters the odorous banks that blow
Flowers of more mingled hew
Than her purfl’d scarf can shew,
And drenches with Elysian dew
(List mortals, if your ears be true)
Beds of Hyacinth, and roses
Where young Adonis oft reposes,
Waxing well of his deep wound
In slumber soft, and on the ground
Sadly sits th’ Assyrian Queen;
But far above in spangled sheen
Celestial Cupid her fam’d son advanc’t,
Holds his dear Psyche sweet intranc’t
After her wandring labours long,
Till free consent the gods among
Make her his eternal Bride,
And from her fair unspotted side
Two blissful twins are to be born,
Youth and Joy; so Jove hath sworn
Youth and Joy; so Jove hath sworn.
But now my task is smoothly don,
I can fly, or I can run
Quickly to the green earths end,
Where the bow’d welkin slow doth bend,
And from thence can soar as soon
To the corners of the Moon.
Mortals that would follow me,
Love vertue, she alone is free.
She can teach ye how to clime
Higher then the Spheary chime;
Or if Vertue feeble were,
Heav’n it self would stoop to her.

317.
Lycidas
A Lament for a friend drowned in his passage from Chester on the Irish
Seas, 1637
YET once more, O ye Laurels, and once more
Ye Myrtles brown, with Ivy never-sear,
I com to pluck your Berries harsh and crude,
And with forc’d fingers rude,
Shatter your leaves before the mellowing year.
Bitter constraint, and sad occasion dear,
Compels me to disturb your season due:
For Lycidas is dead, dead ere his prime
Young Lycidas, and hath not left his peer:
Who would not sing for Lycidas? he knew
Himself to sing, and build the lofty rhyme.
He must not flote upon his watry bear
Unwept, and welter to the parching wind,
Without the meed of som melodious tear.
Begin, then, Sisters of the sacred well,
That from beneath the seat of Jove doth spring,
Begin, and somwhat loudly sweep the string.
Hence with denial vain, and coy excuse,
So may som gentle Muse
With lucky words favour my destin’d Urn,
And as he passes turn,
And bid fair peace be to my sable shrowd.
For we were nurst upon the self-same hill,
Fed the same flock, by fountain, shade, and rill.
Together both, ere the high Lawns appear’d
Under the opening eye-lids of the morn,
We drove a field, and both together heard
What time the Gray-fly winds her sultry horn,
Batt’ning our flocks with the fresh dews of night,
Oft till the Star that rose, at Ev’ning, bright
Toward Heav’ns descent had slop’d his westering wheel.
Mean while the Rural ditties were not mute,
Temper’d to th’Oaten Flute;
Rough Satyrs danc’d, and Fauns with clov’n heel,
From the glad sound would not be absent long,
And old Damætas lov’d to hear our song.
But O the heavy change, now thou art gon,
Now thou art gon, and never must return!
Thee Shepherd, thee the Woods, and desert Caves,
With wilde Thyme and the gadding Vine o’regrown,
And all their echoes mourn.
The Willows, and the Hazle Copses green,
Shall now no more be seen,
Fanning their joyous Leaves to thy soft layes.
As killing as the Canker to the Rose,
Or Taint-worm to the weanling Herds that graze,
Or Frost to Flowers, that their gay wardrop wear,
When first the White thorn blows;
Such, Lycidas, thy loss to Shepherds ear.
Where were ye Nymphs when the remorseless deep
Clos’d o’re the head of your lov’d Lycidas?
For neither were ye playing on the steep,
Where your old Bards, the famous Druids ly,
Nor on the shaggy top of Mona high,
Nor yet where Deva spreads her wisard stream:
Ay me, I fondly dream!
Had ye bin there—for what could that have don?
What could the Muse her self that Orpheus bore,
The Muse her self, for her inchanting son
Whom Universal nature did lament,
When by the rout that made the hideous roar,
His goary visage down the stream was sent,
Down the swift Hebrus to the Lesbian shore.
Alas! what boots it with uncessant care
To tend the homely slighted Shepherds trade,
And strictly meditate the thankles Muse,
Were it not better don as others use,
To sport with Amaryllis in the shade,
Or with the tangles of Neæra’s hair?
Fame is the spur that the clear spirit doth raise
(That last infirmity of Noble mind)
To scorn delights, and live laborious dayes;
But the fair Guerdon when we hope to find,
And think to burst out into sudden blaze,
Comes the blind Fury with th’abhorrèd shears,
And slits the thin spun life. But not the praise,
Phœbus repli’d, and touch’d my trembling ears;
Fame is no plant that grows on mortal soil,
Nor in the glistering foil
Set off to th’world, nor in broad rumour lies,
But lives and spreds aloft by those pure eyes,
And perfet witnes of all judging Jove;
As he pronounces lastly on each deed,
Of so much fame in Heav’n expect thy meed.
O fountain Arethuse, and thou honour’d floud,
Smooth-sliding Mincius, crown’d with vocall reeds,
That strain I heard was of a higher mood:
But now my Oate proceeds,
And listens to the Herald of the Sea
That came in Neptune’s plea,
He ask’d the Waves, and ask’d the Fellon winds,
What hard mishap hath doom’d this gentle swain?
And question’d every gust of rugged wings
That blows from off each beakèd Promontory,
They knew not of his story,
And sage Hippotades their answer brings,
That not a blast was from his dungeon stray’d,
The Ayr was calm, and on the level brine,
Sleek Panope with all her sisters play’d.
It was that fatall and perfidious Bark
Built in th’eclipse, and rigg’d with curses dark,
That sunk so low that sacred head of thine.
Next Camus, reverend Sire, went footing slow,
His Mantle hairy, and his Bonnet sedge,
Inwrought with figures dim, and on the edge
Like to that sanguine flower inscrib’d with woe.
Ah; Who hath reft (quoth he) my dearest pledge?
Last came, and last did go,
The Pilot of the Galilean lake,
Two massy Keyes he bore of metals twain,
(The Golden opes, the Iron shuts amain)
He shook his Miter’d locks, and stern bespake,
How well could I have spar’d for thee, young swain,
Anow of such as for their bellies sake,
Creep and intrude, and climb into the fold?
Of other care they little reck’ning make,
Then how to scramble at the shearers feast,
And shove away the worthy bidden guest.
Blind mouthes! that scarce themselves know how to hold
A Sheep-hook, or have learn’d ought els the least
That to the faithfull Herdmans art belongs!
What recks it them? What need they? They are sped;
And when they list, their lean and flashy songs
Grate on their scrannel Pipes of wretched straw,
The hungry Sheep look up, and are not fed,
But swoln with wind, and the rank mist they draw,
Rot inwardly, and foul contagion spread:
Besides what the grim Woolf with privy paw
Daily devours apace, and nothing sed,
But that two-handed engine at the door,
Stands ready to smite once, and smite no more.
Return Alpheus, the dread voice is past,
That shrunk thy streams; Return Sicilian Muse,
And call the Vales, and bid them hither cast
Their Bels, and Flourets of a thousand hues.
Ye valleys low where the milde whispers use,
Of shades and wanton winds, and gushing brooks,
On whose fresh lap the swart Star sparely looks,
Throw hither all your quaint enameld eyes,
That on the green terf suck the honied showres,
And purple all the ground with vernal flowres.
Bring the rathe Primrose that forsaken dies.
The tufted Crow-toe, and pale Gessamine,
The white Pink, and the Pansie freakt with jeat,
The glowing Violet.
The Musk-rose, and the well attir’d Woodbine.
With Cowslips wan that hang the pensive hed,
And every flower that sad embroidery wears:
Bid Amaranthus all his beauty shed,
And Daffadillies fill their cups with tears,
To strew the Laureat Herse where Lycid lies.
For so to interpose a little ease,
Let our frail thoughts dally with false surmise.
Ay me! Whilst thee the shores, and sounding Seas
Wash far away, where ere thy bones are hurld,
Whether beyond the stormy Hebrides,
Where thou perhaps under the whelming tide
Visit’st the bottom of the monstrous world;
Or whether thou to our moist vows deny’d,
Sleep’st by the fable of Bellerus old,
Where the great vision of the guarded Mount
Looks toward Namancos and Bayona’s hold;
Look homeward Angel now, and melt with ruth.
And, O ye Dolphins, waft the haples youth.
Weep no more, woful Shepherds weep no more,
For Lycidas your sorrow is not dead,
Sunk though he be beneath the watry floar,
So sinks the day-star in the Ocean bed,
And yet anon repairs his drooping head,
And tricks his beams, and with new spangled Ore,
Flames in the forehead of the morning sky:
So Lycidas sunk low, but mounted high,
Through the dear might of him that walk’d the waves
Where other groves, and other streams along,
With Nectar pure his oozy Lock’s he laves,
And hears the unexpressive nuptiall Song,
In the blest Kingdoms meek of joy and love.
There entertain him all the Saints above,
In solemn troops, and sweet Societies
That sing, and singing in their glory move,
And wipe the tears for ever from his eyes.
Now Lycidas the Shepherds weep no more;
Hence forth thou art the Genius of the shore,
In thy large recompense, and shalt be good
To all that wander in that perilous flood.
Thus sang the uncouth Swain to th’Okes and rills,
While the still morn went out with Sandals gray,
He touch’d the tender stops of various Quills,
With eager thought warbling his Dorick lay:
And now the Sun had stretch’d out all the hills,
And now was dropt into the Western bay;
At last he rose, and twitch’d his Mantle blew:
To morrow to fresh Woods, and Pastures new.

317*.
To the Lady Margaret Ley
DAUGHTER to that good Earl, once President
Of Englands Counsel, and her Treasury,
Who liv’d in both, unstain’d with gold or fee,
And left them both, more in himself content,
Till the sad breaking of that Parlament
Broke him, as that dishonest victory
At Chæronèa, fatal to liberty
Kil’d with report that Old man eloquent,
Though later born, then to have known the dayes
Wherin your Father flourisht, yet by you
Madam, me thinks I see him living yet;
So well your words his noble vertues praise,
That all both judge you to relate them true,
And to possess them, Honour’d Margaret.

318.
On His Blindness
WHEN I consider how my light is spent,
E’re half my days, in this dark world and wide,
And that one Talent which is death to hide,
Lodg’d with me useless, though my Soul more bent
To serve therewith my Maker, and present
My true account, least he returning chide,
Doth God exact day-labour, light deny’d,
I fondly ask; But patience to prevent
That murmur, soon replies, God doth not need
Either man’s work or his own gifts, who best
Bear his milde yoak, they serve him best, his State
Is Kingly. Thousands at his bidding speed
And post o’re Land and Ocean without rest:
They also serve who only stand and waite.
319.
To Mr. Lawrence
LAWRENCE of vertuous Father vertuous Son,
Now that the Fields are dank, and ways are mire,
Where shall we sometimes meet, and by the fire
Help wast a sullen day; what may be won
From the hard Season gaining: time will run
On smoother, till Favonius re-inspire
The frozen earth; and cloth in fresh attire
The Lillie and Rose, that neither sow’d nor spun.
What neat repast shall feast us, light and choice,
Of Attick tast, with Wine, whence we may rise
To hear the Lute well toucht, or artfull voice
Warble immortal Notes and Tuskan Ayre?
He who of those delights can judge, and spare
To interpose them oft, is not unwise.

320.
To Cyriack Skinner
CYRIACK, whose Grandsire on the Royal Bench
Of Brittish Themis, with no mean applause
Pronounc’t and in his volumes taught our Lawes,
Which others at their Barr so often wrench:
To day deep thoughts resolve with me to drench
In mirth, that after no repenting drawes;
Let Euclid rest and Archimedes pause,
And what the Swede intend, and what the French.
To measure life, learn thou betimes, and know
Toward solid good what leads the nearest way;
For other things mild Heav’n a time ordains,
And disapproves that care, though wise in show,
That with superfluous burden loads the day,
And when God sends a cheerful hour, refrains.

321.
On His Deceased Wife
METHOUGHT I saw my late espousèd Saint
Brought to me like Alcestis from the grave,
Whom Joves great Son to her glad Husband gave,
Rescu’d from death by force though pale and faint.
Mine as whom washt from spot of child-bed taint,
Purification in the old Law did save,
And such, as yet once more I trust to have
Full sight of her in Heaven without restraint,
Came vested all in white, pure as her mind:
Her face was vail’d, yet to my fancied sight,
Love, sweetness, goodness, in her person shin’d
So clear, as in no face with more delight.
But O as to embrace me she enclin’d
I wak’d, she fled, and day brought back my night.
322.
Light
HAIL holy light, ofspring of Heav’n first-born,
Or of th’ Eternal Coeternal beam
May I express thee unblam’d? since God is light,
And never but in unapproachèd light
Dwelt from Eternitie, dwelt then in thee,
Bright effluence of bright essence increate.
Or hear’st thou rather pure Ethereal stream,
Whose Fountain who shall tell? before the Sun,
Before the Heavens thou wert, and at the voice
Of God, as with a Mantle didst invest
The rising world of waters dark and deep,
Won from the void and formless infinite.
Thee I re-visit now with bolder wing,
Escap’t the Stygian Pool, though long detain’d
In that obscure sojourn, while in my flight
Through utter and through middle darkness borne
With other notes then to th’ Orphean Lyre
I sung of Chaos and Eternal Night,
Taught by the heav’nly Muse to venture down
The dark descent, and up to reascend,
Though hard and rare: thee I revisit safe,
And feel thy sovran vital Lamp; but thou
Revisit’st not these eyes, that rowle in vain
To find thy piercing ray, and find no dawn;
So thick a drop serene hath quencht thir Orbs,
Or dim suffusion veild. Yet not the more
Cease I to wander where the Muses haunt
Cleer Spring, or shadie Grove, or Sunnie Hill,
Smit with the love of sacred song; but chief
Thee Sion and the flowrie Brooks beneath
That wash thy hallowd feet, and warbling flow,
Nightly I visit: nor somtimes forget
Those other two equal’d with me in Fate,
So were I equal’d with them in renown.
Blind Thamyris and blind Mæonides,
And Tiresias and Phineus Prophets old.
Then feed on thoughts, that voluntarie move
Harmonious numbers; as the wakeful Bird
Sings darkling, and in shadiest Covert hid
Tunes her nocturnal Note. Thus with the Year
Seasons return, but not to me returns
Day, or the sweet approach of Ev’n or Morn,
Or sight of vernal bloom, or Summers Rose,
Or flocks, or herds, or human face divine;
But cloud in stead, and ever-during dark
Surrounds me, from the chearful waies of men
Cut off, and for the Book of knowledg fair
Presented with a Universal blanc
Of Natures works to mee expung’d and ras’d,
And wisdome at one entrance quite shut out.
So much the rather thou Celestial light
Shine inward, and the mind through all her powers
Irradiate, there plant eyes, all mist from thence
Purge and disperse, that I may see and tell
Of things invisible to mortal sight.
From ‘Samson Agonistes’
323.
i
OH how comely it is and how reviving
To the Spirits of just men long opprest!
When God into the hands of thir deliverer
Puts invincible might
To quell the mighty of the Earth, th’ oppressour,
The brute and boist’rous force of violent men
Hardy and industrious to support
Tyrannic power, but raging to pursue
The righteous and all such as honour Truth;
He all thir Ammunition
And feats of War defeats
With plain Heroic magnitude of mind
And celestial vigour arm’d,
Thir Armories and Magazins contemns,
Renders them useless, while
With wingèd expedition
Swift as the lightning glance he executes
His errand on the wicked, who surpris’d
Lose thir defence distracted and amaz’d.

324.
ii
ALL is best, though we oft doubt,
What th’ unsearchable dispose
Of highest wisdom brings about,
And ever best found in the close.
Oft he seems to hide his face,
But unexpectedly returns
And to his faithful Champion hath in place
Bore witness gloriously; whence Gaza mourns
And all that band them to resist
His uncontroulable intent.
His servants he with new acquist
Of true experience from this great event
With peace and consolation hath dismist,
And calm of mind all passion spent.
SIR JOHN SUCKLING
1609-1642
325.
A Doubt of Martyrdom
O FOR some honest lover’s ghost,
Some kind unbodied post
Sent from the shades below!
I strangely long to know
Whether the noble chaplets wear
Those that their mistress’ scorn did bear
Or those that were used kindly.

For whatsoe’er they tell us here
To make those sufferings dear,
’Twill there, I fear, be found
That to the being crown’d
T’ have loved alone will not suffice,
Unless we also have been wise
And have our loves enjoy’d.

What posture can we think him in
That, here unloved, again
Departs, and ’s thither gone
Where each sits by his own?
Or how can that Elysium be
Where I my mistress still must see
Circled in other’s arms?

For there the judges all are just,
And Sophonisba must
Be his whom she held dear,
Not his who loved her here.
The sweet Philoclea, since she died,
Lies by her Pirocles his side,
Not by Amphialus.

Some bays, perchance, or myrtle bough
For difference crowns the brow
Of those kind souls that were
The noble martyrs here:
