Machine Perception Laboratory

Auto FACS Coding

Principal Investigators: Marian Stewart Bartlett, Gwen Littlewort, Javier Movellan, Mark S. Frank

Fully automated facial action coding

The output of the face detector is fed directly to the facial expression analysis system (see Figure 1). First the face image is passed through a bank of Gabor filters at 8 orientations and 9 scales (2-32 pixels/cycle at 0.5-octave steps). The filterbank representations are then channeled to a statistical classifier to code the image in terms of a set of expression dimensions. We conducted a comparison of classifiers, including support vector machines (SVM's), AdaBoost (Freund & Schapire, 1996), and Linear Discriminant Analysis (Littlewort et al., in press; Bartlett et al., 2003). SVM's were found to be very effective for classifying facial expressions. Recent research at our lab has demonstrated that both speed and accuracy are enhanced by performing feature selection on the Gabor filters prior to classification (e.g., Bartlett et al., 2003). This approach employs AdaBoost for feature selection prior to classification by SVM's. AdaBoost sequentially selects the feature that gives the most information about classification, given the features that have already been selected. AdaBoost was more effective than other feature selection techniques such as PCA.
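For concreteness, here is a minimal Python sketch of the kind of pipeline described above: a Gabor filter bank with the stated 8 orientations and 9 half-octave scales, followed by AdaBoost feature selection over the filter outputs. It is only an illustration under our own assumptions (the kernel parameterization, the envelope width, and the number of boosting rounds are not taken from the page), not the laboratory's actual implementation.

    import numpy as np
    from scipy.signal import fftconvolve
    from sklearn.ensemble import AdaBoostClassifier

    def gabor_kernel(wavelength, theta, sigma_ratio=0.56):
        """Complex Gabor kernel: a sinusoid at `wavelength` pixels/cycle and
        orientation `theta` under a Gaussian envelope (width is an assumption)."""
        sigma = sigma_ratio * wavelength
        half = int(np.ceil(3 * sigma))                 # cover +/- 3 sigma
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        x_r = x * np.cos(theta) + y * np.sin(theta)    # coordinate along the carrier
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
        carrier = np.exp(1j * 2 * np.pi * x_r / wavelength)
        return envelope * carrier

    # 8 orientations and 9 scales, 2-32 pixels/cycle in 0.5-octave steps.
    ORIENTATIONS = [k * np.pi / 8 for k in range(8)]
    WAVELENGTHS = [2.0 * 2 ** (0.5 * k) for k in range(9)]   # 2, 2.8, 4, ..., 32

    def gabor_features(face_img):
        """Magnitudes of the full filter bank response, flattened to one vector."""
        responses = []
        for lam in WAVELENGTHS:
            for theta in ORIENTATIONS:
                resp = fftconvolve(face_img, gabor_kernel(lam, theta), mode="same")
                responses.append(np.abs(resp).ravel())
        return np.concatenate(responses)

    def adaboost_select(features, labels, n_rounds=200):
        """Indices of the Gabor outputs used by AdaBoost's one-feature decision
        stumps; the union over boosting rounds approximates the sequential
        selection described above."""
        booster = AdaBoostClassifier(n_estimators=n_rounds)  # default learner is a stump
        booster.fit(features, labels)
        return np.flatnonzero(booster.feature_importances_)

The SVM's would then be trained on the selected columns of the feature matrix rather than on the full Gabor representation.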
Figure 1. Fully automated facial action coding system, using machine learning techniques.

a. Results on the DFAT-504 dataset
We developed a system for fully automatic facial action coding of a subset of FACS. This prototype system recognizes all eighteen upper face AU's (1, 2, 4, 5, 6, 7 and 9) and is 100% automated. Face images are detected and aligned automatically in the video frames and sent directly to the recognition system. The system was trained on Cohn and Kanade's DFAT-504 dataset (Kanade, Cohn, & Tian, 2000). This is a dataset of video sequences of university students posing facial expressions. In addition to being labeled for basic emotion, the dataset was coded for FACS scores by two certified FACS coders. There were 313 sequences from 90 subjects, with 1 to 6 emotions per subject. All faces in this dataset were successfully detected by the automatic face tracking system. The automatically detected faces were then passed to the expression analysis system (as shown in Figure 1).

Seven support vector machines, one for each AU, were trained to detect the presence of a given
AU, regardless of whether it occurred alone or in combination with other AU’s. The expression
recognition system was trained on the last frame of each sequence, which contained the highest
magnitude of the target expression (AU peak). Negative examples consisted of all peak frames that
did not contain the target AU, plus 313 neutral images consisting of the first frame of each
sequence. A nonlinear radial basis function kernel was employed. Generalization to new subjects
was tested using leave-one-out cross-validation (Tukey, 1951). The results are shown in Table 1.
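To make the training and evaluation protocol concrete, here is a minimal Python sketch of one AU detector, reading "leave-one-out" as leave-one-subject-out since generalization to new subjects is the stated goal. The feature matrix, label vectors, and SVM hyperparameters are illustrative assumptions, not the values actually used.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneGroupOut

    def evaluate_au_detector(X, y, subject_ids):
        """Leave-one-subject-out evaluation of a single AU detector.

        X           : (n_examples, n_features) features for the peak and neutral frames
        y           : (n_examples,) 1 if the example contains the target AU, else 0
        subject_ids : (n_examples,) subject label, so that no subject appears in
                      both the training and the test fold
        Returns mean percent agreement with the human FACS codes across folds.
        """
        fold_scores = []
        for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject_ids):
            clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # nonlinear RBF kernel
            clf.fit(X[train_idx], y[train_idx])
            fold_scores.append(np.mean(clf.predict(X[test_idx]) == y[test_idx]))
        return 100.0 * float(np.mean(fold_scores))

    # One detector per action unit, trained on AU-present vs. AU-absent examples:
    # for au in (1, 2, 4, 5, 6, 7, 9):
    #     print(au, evaluate_au_detector(gabor_feats, au_labels[au], subjects))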

Table 1
Performance for fully automatic recognition of 18 facial actions, generalization to novel subjects. N: total number of examples of each AU, including combinations containing that AU. Agreement: percent agreement with human FACS codes (positive and negative examples classed correctly). Nhit:FA: raw number of hits and false alarms, where the number of negative test samples was 626-N.
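For reference, the quantities reported in Table 1 can be computed from a detector's test decisions roughly as follows; the function below is an illustrative sketch with our own variable names, not the scoring code used to produce the table.

    import numpy as np

    def table1_row(predicted, actual):
        """N, percent agreement, hits, and false alarms for one AU detector.

        predicted : (n_test,) boolean detector decisions on the test examples
        actual    : (n_test,) boolean human FACS codes (target AU present or not)
        """
        predicted = np.asarray(predicted, bool)
        actual = np.asarray(actual, bool)
        n_pos = int(actual.sum())                          # N: examples containing the AU
        agreement = 100.0 * np.mean(predicted == actual)   # positives and negatives classed correctly
        hits = int(np.sum(predicted & actual))             # raw number of hits
        false_alarms = int(np.sum(predicted & ~actual))    # false alarms among the 626-N negatives
        return n_pos, agreement, hits, false_alarms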

System outputs for full image sequences are shown in Figure 2. These are results for test sequences, i.e., sequences that were not used for training.

Figure 2
Fully automated FACS measurements for image sequences. a. System outputs for full image sequences of surprise expressions from four subjects, scored by the human coder as containing AU's 1, 2, and 5 (inner brow raise, outer brow raise, and upper lid raise). Curves show automated system output for AU's 1, 2, and 5. b. System outputs for full image sequences of disgust expressions from four subjects, scored by the human coder as containing AU's 4, 7, and 9 (brow lower, lower lid raise, nose wrinkle). Curves show system output for AU's 4, 7, and 9.


The system obtained a mean of 93.6% agreement with human FACS labels for fully automatic recognition of 18 upper facial actions. This is an exciting result, as performance rates are equal to or better than other systems tested on this dataset that employed manual registration (Tian, Kanade & Cohn, 2001; Kapoor, Qi, & Picard, 2003). Kapoor et al. obtained 81.2% correct on this dataset, using hand-marked pupil positions for alignment. Tian et al. obtained a level of performance similar to ours, but hand-marked a set of feature points in neutral expression images immediately preceding each movement. The high performance rate obtained by our system is the result of many years of systematic comparisons investigating which image features (representations) are most effective (Bartlett et al., 1999; Donato et al., 1999), which classifiers are most effective (Littlewort et al., 2004), the optimal resolution and spatial frequency (Donato et al., 1999; Littlewort et al., submitted), and which feature selection techniques work best (Littlewort et al., submitted), as well as comparing flow-based to texture-based recognition (Bartlett et al., 1999; Donato et al., 1999).

b. Preliminary results on the RU-FACS-1 dataset


The UCSD team has just received digital video and FACS codes for the first 20 subjects of the RU-FACS-1 dataset from Rutgers. We are presently testing fully automatic FACS recognition in this continuous video stream using the system described above.

Faces were detected automatically in the video, alignment in the 2D plane was then refined using automatically detected eye locations, and the resulting images were passed to the AU detectors trained on the DFAT-504 dataset. The figure below shows example system outputs for a video sequence containing an AU 1 and an AU 2. Preliminary results based on one subject show a mean agreement rate of 88.7% between the automated system and the human FACS codes for the four actions with sufficient data to test (AU's 1, 2, 6, and 7); agreement by action was 87.3% for AU 2, 92.4% for AU 6, and 86.4% for AU 7. Here 'agreement' is the percentage of frames scored above or below threshold in accordance with the human codes. In the coming months, we will train AU detectors directly on the spontaneous expression samples from Rutgers as the data becomes available.
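As a small illustration of this frame-level agreement measure, here is a minimal Python sketch; the decision threshold and variable names are our own assumptions, since the page does not say how the threshold was chosen.

    import numpy as np

    def frame_agreement(detector_output, human_codes, threshold=0.0):
        """Percent of video frames on which the detector agrees with the human codes.

        detector_output : (n_frames,) real-valued AU detector output per frame
        human_codes     : (n_frames,) True where the human coder scored the AU present
        threshold       : assumed decision threshold on the detector output
        """
        predicted = np.asarray(detector_output) > threshold
        return 100.0 * np.mean(predicted == np.asarray(human_codes))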

Fully automated FACS coding on a sample subject from the RU-FACS-1 dataset. b. Outputs of the AU 1 detector and the AU 2 detector. Arrows indicate the onset, offset, and peak as scored by the human coder. AU 1 was scored as intensity D and AU 2 as intensity C. c. Output of the AU 1 detector for 500 frames of video. A human-coded AU 1 event from onset to offset is shown in green, and the peak is identified by the red dot.
