
Face Recognition Algorithms as Models of Human Face Processing

Alice J. O'Toole
The University of Texas at Dallas
Richardson, TX 75083-0688
[email protected]

P. Jonathon Phillips
National Institute of Standards and Technology
Gaithersburg, MD 20899-9840
[email protected]

Yi Cheng
The University of Texas at Dallas
Richardson, TX 75083-0688
[email protected]

Brendan Ross
The University of Texas at Dallas
Richardson, TX 75083-0688
[email protected]

Heather A. Wild
The University of Texas at Dallas
Richardson, TX 75083-0688

Abstract

We evaluated the adequacy of computational algorithms as models of human face processing by looking at how the algorithms and humans process individual faces. By comparing model- and human-generated measures of the similarity between pairs of faces, we were able to assess the accord between several automatic face recognition algorithms and human perceivers. Multidimensional scaling (MDS) was used to create a spatial representation of the subject response patterns. Next, the model response patterns were projected into this space. The results revealed a common bimodal structure for both the subjects and for most of the models. The bimodal subject structure reflected strategy differences in making similarity decisions. For the models, the bimodal structure was related to combined aspects of the representations and the distance metrics used in the implementations.

1. Introduction

Understanding the principles and processes involved in recognizing faces has been an active area of research in the domains of neuroscience, psychology, and computer science. A primary goal of the psychological research is to understand how the human visual system converts retinal images of faces into mental representations that can be used to recognize known faces. The applications of this research include the evaluation of human subject credibility in eyewitness identifications and the development of techniques for conveying visual memories of faces to others who must recognize the faces later (e.g., the description of missing children, criminal suspects, etc.).

A primary goal of the computational research has been to develop algorithms capable of recognizing human faces automatically. The applications of such algorithms are evident for automatic vigilance systems, security verification systems, and automated data bases that can be searched by descriptions. The accuracy of these automated face recognition systems has been tested extensively. Most of these algorithms perform impressively (e.g., [4, 2, 11, 12]), even on completely objective tests for thousands of faces [7].

Although there is a wealth of information about the relative strengths and weaknesses of individual algorithms for the problem of automatic face recognition, much less is known about how these algorithms compare to the system that many of them are designed to replace: the human perceiver. This comparison is of interest both to psychologists and to computational modelers. For psychologists, comparing human performance to the performance of computational models of face recognition can potentially give insight into the processes used by the human visual system to encode and retrieve faces. For computational theorists, knowledge of the way human face processing works may yield insights into processing strategies that will allow for more flexible and robust face processing systems.

A simple and direct test of the performance of humans and models is to compare overall accuracy on a face recognition task. Ironically, comparing human and model performance is complicated by the high accuracy of the published computational models. This high accuracy stems both from the enormous amount of progress that has been made in image encoding/retrieval in the last decade and from the types of tasks on which the models have been tested. For the most part, automatic face recognition algorithms have been tested primarily within a relatively small range of viewpoints and under moderately well-controlled conditions of illumination (for a summary, see [7]; see [9] for an exception, though other aspects of the faces in that study, e.g., hair, were better controlled than in the algorithms considered in the present study). Thus, although the models perform very accurately under certain controlled conditions, the flexibility of the human system is likely to prove superior to current models in most practical contexts.

A second possible approach to comparing human and model performance is to concentrate on the way in which individual faces are processed by a human or computational system, instead of on the overall accuracy. One of the most robust findings in the psychological literature on face recognition is that not all faces are equally "recognizable". Some faces, by nature, are easier to recognize than others. More formally, psychologists have defined recognizability to be a characteristic of individual faces. Highly recognizable faces are those which are easily recalled when they have been seen previously (high hit rates) and are rarely recognized by mistake (low false alarm rates) when they have not been seen previously.

Given that individual faces vary in recognizability, face recognition performance can be characterized according to the pattern of errors made by a human or algorithm across any given set of faces. Comparing the pattern of errors for individual faces is an under-utilized, but useful, method for evaluating the accord between human and algorithm performance for recognizing faces.

Before proceeding with the methods, we present a brief look at psychological ideas about variations in the recognizability of individual faces for human subjects and show how they might relate to measures commonly available from tests of computational models of face recognition.

1.1. Typicality, Similarity and Human Face Recognition Accuracy

Both automatic and human face recognition involve a series of comparisons between a "test" face and the "known" faces in memory. To recognize a known face, a model or human must judge the test picture to be more similar to a stored picture of the same face than to pictures of other faces in the data base. Because of natural image variations, changes in expression, and/or changes in viewpoint, a test face may "appear" or even "be" more similar to other faces than to its own stored picture in the data base. For this reason, computational models are often evaluated on a test stimulus by ranking the best matches to the test face. To do this ranking, a measure of the "similarity" between the test face and all other faces in the data base is usually computed.

This face confusability phenomenon has been studied extensively in the human face recognition literature. The most robust and consistent predictor of face recognition accuracy for human subjects is the rated "typicality" of a face [3, 6, 8]. Typicality is in fact a "short-cut" measure that captures a sort of "average" of a face's similarity to most other faces [3]. Typical faces are judged to be similar to many other faces, whereas unusual faces are judged to be dissimilar to many other faces. Such ratings can be gathered easily from human subjects for a set of faces by asking the subjects to rate the similarity of all possible pairs of faces in the set. Estimates of the typicality of a face can then be computed as the average of the similarity ratings the face receives in combination with all other faces in the data base. It is clear, then, that the reason why typical faces are recognized less accurately than unusual faces is that we are more likely to confuse typical faces with other similar faces (of which there are many, by definition) than we are to confuse an unusual face with similar faces (of which there are far fewer).

The adequacy of computational algorithms as models of human face processing can be evaluated by looking at the accord between typicality measures, which are based on human subject similarity ratings, and comparable measures extracted from the automatic face recognition systems. Measures for the automatic face recognition models are available from the extensive data collected in the FERET project. In the FERET program, the US Government evaluated 18 state-of-the-art automatic face recognition algorithms between August 1994 and March 1997. The primary interest of the US Government in the FERET project was to explore the potential of these algorithms for automating visual monitoring activities. Several comprehensive tests of the accuracy of the algorithms have been reported in detail elsewhere [7]. Despite the extensive testing of these algorithms for recognition accuracy, to date, there has been no attempt to make use of the data generated in the FERET project to evaluate the

suitability of the algorithms as models of human face processing.

In the present study, we develop a method for comparing human and model performance at the level of individual faces. This method is based on a multidimensional representation that combines the human- and model-generated measures of the "similarity" between all pairs of faces in a data set. Multidimensional analysis is a method for creating a spatial map of "item locations" that tries to preserve the similarities/distances between items. Similarities/distances are assessed along a set of variables. Here, the items we analyzed consisted of 13 of the 18 FERET algorithms (technical problems prevented us from considering all 18) and 22 human observers. The variables consisted of the human- or algorithm-generated typicality measures for a set of faces. This analysis yielded a map of the similarity of performance of the models and human subjects in the same multidimensional space.

2. Methods

This paper is organized as follows. First, we describe the stimuli for both the algorithm and human data collection. Next, we present a brief description and classification of the subset of FERET-evaluated models and describe the algorithm measures of face typicality. We then describe the psychological methods for gathering data analogous to the algorithm measures. Finally, we present a set of combined analyses aimed at determining the accord between human subjects and the algorithms.

Face Stimuli. The stimuli consisted of smiling and neutral expression pictures of twenty Caucasian male faces in their twenties, taken from the FERET data base. We chose faces from the relatively homogeneous category of Caucasian males in their twenties to examine algorithm and human performance under challenging conditions. Also, smiling and neutral versions of each face were used instead of identical images to prevent "picture-matching" for both the algorithms (which would nearly always generate perfect performance under these conditions) and humans.

Algorithm Measures. The 13 algorithms we considered and their basic characteristics are listed in Table 1. These can be divided into two groups. The first group consisted of seven algorithms developed by independent researchers not involved in designing the FERET evaluation method. These algorithms were developed at: Excalibur Corp. (EX), Michigan State University (MSU) [12], the Media Lab at the Massachusetts Institute of Technology (MIT95, MIT96) [4], University of Maryland (UMD97) [12], University of Southern California (USC) [2] and Rutgers University (RUT) [11].

Six additional PCA-based algorithms, implemented by researchers involved in designing the FERET evaluations, were included: (a) to provide a performance baseline control model using a standard PCA algorithm, and (b) to gain a better understanding of the impact of varying the "retrieval" stage of the model via variations of the distance metric implemented in the nearest neighbor classifier. These NIST control algorithms are as follows: city block distance (L1), Euclidean distance (L2), Mahalanobis distance (MD), Mahalanobis-weighted city block distance (ML1), Mahalanobis-weighted Euclidean distance (ML2), and angular distance or cosine (AN). These algorithms were all based on PCA representations of faces, and varied only in the distance metrics used to recognize the faces. The overall accuracy of these algorithms is detailed elsewhere [5] and indicates that variations in the distance metric impact the recognition performance of the PCA substantially.

The performance of the algorithms was assessed using the FERET September 1996 evaluation method [7]. This method supplies a similarity measure between all possible pairs of images for each algorithm. These similarity measures can be represented in the form of a matrix S, where S_ij contains the similarity between the i-th and j-th faces. For each algorithm, we calculated the typicality of all 20 faces as follows. Each column of the matrix contained the similarity between a particular face and the other 19 faces. Thus, averaging the columns yielded a vector, t, containing the typicality of each of the 20 faces for the algorithm.

Human Measures. The purpose of the human experiments was to generate face typicality measures analogous to those available from the algorithms. We did this by collecting similarity ratings between all possible pairs of faces. For the human subjects, the stimuli were centered in a frame and clothing was edited out digitally to prevent matching the faces by clothing cues. Twenty-two students from The University of Texas at Dallas volunteered to participate in the experiment. On each trial, subjects viewed a pair of facial images, consisting of a neutral and a smiling expression face, for 1 second. A computer prompt then asked the subject to rate the face pair as consisting of: (0) identical persons, (1) similar persons, or (2) dissimilar persons. These three response categories allowed us to assess both similarity and matching accuracy (as described below). Subjects viewed all possible pairs of faces, presented in random order, for a total of 400

(Note to Table 1: {x_i} and {y_i} are points in a PCA face space, z_i = 1/sqrt(lambda_i), and the lambda_i are the eigenvalues of the PCA.)

Table 1. Computational Algorithms

    ALGORITHM               REPRESENTATION              DISTANCE METRIC
    Excalibur Corp. (EX)    Unknown                     Unknown
    MIT95                   PCA-based                   L2
    MIT96                   PCA-difference space        MAP Bayesian statistic
    Michigan St. U. (MSU)   Fisher discriminant         L2
    Rutgers U. (RUT)        Greyscale projection        Weighted L1
    U. of So. CA (USC)      Dynamic link architecture   Elastic matching
    U. of MD (UMD97)        Fisher discriminant         L2
    NIST (L1)               PCA                         Σ_i |x_i − y_i|
    NIST (L2)               PCA                         Σ_i (x_i − y_i)²
    NIST (MD)               PCA                         −Σ_i x_i y_i z_i
    NIST (AN)               PCA                         −(Σ_i x_i y_i) / √(Σ_i x_i² · Σ_i y_i²)
    NIST (ML1)              PCA                         Σ_i |x_i − y_i| z_i
    NIST (ML2)              PCA                         Σ_i (x_i − y_i)² z_i

    (Sums run over the k PCA coefficients, i = 1, ..., k.)
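As a purely illustrative sketch, the quantities defined in this section — the six NIST control metrics as reconstructed in Table 1, the column-averaged typicality vector t, and the metric-MDS projection of model data into the subject space — can be written in NumPy as follows. This is not the authors' code: the function names are ours, the data are random stand-ins for the real similarity matrices, and the MDS step uses the PCA equivalence the authors note in a footnote.

```python
import numpy as np

def nist_metrics(x, y, eigenvalues):
    """The six NIST nearest-neighbor measures of Table 1, computed on
    k-dimensional PCA coefficient vectors x and y.  z_i = 1/sqrt(lambda_i);
    MD and AN are negated similarities, so smaller always means closer."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = 1.0 / np.sqrt(np.asarray(eigenvalues, float))
    return {
        "L1":  np.sum(np.abs(x - y)),                 # city block
        "L2":  np.sum((x - y) ** 2),                  # (squared) Euclidean
        "MD":  -np.sum(x * y * z),                    # Mahalanobis
        "AN":  -np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2)),  # angle
        "ML1": np.sum(np.abs(x - y) * z),             # Mahalanobis-weighted L1
        "ML2": np.sum((x - y) ** 2 * z),              # Mahalanobis-weighted L2
    }

def typicality(S):
    """Column-average a pairwise similarity matrix S over the other
    n - 1 faces (diagonal excluded): one typicality score per face."""
    S = np.asarray(S, float)
    return (S.sum(axis=0) - np.diag(S)) / (S.shape[0] - 1)

def mds_space(vectors, n_axes=2):
    """Metric MDS on the subjects' typicality vectors.  With a linear
    distance metric this reduces to a standard PCA: center, then SVD."""
    X = np.asarray(vectors, float)                    # subjects x faces
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:n_axes]                                # principal directions
    explained = (s ** 2 / np.sum(s ** 2))[:n_axes]    # variance proportions
    return mean, axes, explained, (X - mean) @ axes.T

# Random stand-in data: 22 subjects' and 13 algorithms' typicality
# vectors over 20 faces (the study derives these from actual ratings).
rng = np.random.default_rng(0)
subject_t, model_t = rng.random((22, 20)), rng.random((13, 20))
mean, axes, explained, subject_xy = mds_space(subject_t)
model_xy = (model_t - mean) @ axes.T   # project models into subject space
print(subject_xy.shape, model_xy.shape)  # (22, 2) (13, 2)
```

One detail is an assumption: the paper does not say whether the FERET similarity matrices store self-similarities on the diagonal, so the diagonal exclusion in `typicality` simply follows the text's wording that each column holds a face's similarity to the other 19 faces.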

trials. (All experimental events were controlled by a computer programmed with PsyScope [1].)

Analysis. As for the algorithms, data from individual subjects were represented in the form of a matrix, with each element S_ij containing the similarity rating (0, 1, or 2) between the i-th and j-th faces. Thus, each value on the diagonal contained a zero if the subject correctly matched all identical persons from the smiling and neutral versions. The off-diagonal elements contained relatively higher numbers for relatively more dissimilar face pairs. Zeros in the off-diagonal elements also represented errors in which different faces were rated as identical persons. Face typicality was calculated by averaging the columns of the subject matrices, yielding typicality vectors for each subject.

Subject Accuracy. Although not our primary purpose, it was nonetheless useful to verify that the subjects, like the models, performed the task accurately. The similarity ratings were used to indicate accuracy with two types of errors: (a) judging identical people as "similar" or "dissimilar" persons, and (b) judging different people as "identical persons". Across all subjects, the proportion of errors of the first type was .09, and the proportion of errors of the second type was .04. This indicates relatively high human subject accuracy on the task.

Combined Analysis. We assessed the degree of accord between the algorithms' and human subjects' estimates of the similarity among the set of faces as follows. First, we created a spatial representation of human subject response patterns to the 400 face pairs. Next, we projected analogous data from the computational algorithms into the subject space. Algorithms producing patterns of typicality scores comparable to the human subjects should cluster close to the subjects in the space.

More formally, we analyzed the typicality vectors from the 22 human subjects using metric MDS. (Metric multidimensional scaling is based on a linear distance metric and hence reduces to a standard PCA.) This produces a low-dimensional spatial representation of the response patterns of the 22 subjects on the task. The first two axes of this analysis, which explain .73 and .08 of the variance, respectively, appear in Figure 1. Each individual subject is marked with an "*". Subjects with similar face typicality response patterns appear close to each other in the space, and subjects with different response patterns appear relatively farther away from each other in the space. The figure reveals two clusters of human subject response patterns. This indicates simply that the subjects' response patterns are bimodally distributed.

An initial interpretation of the bimodal distribution of subject responses in the space is that there were two "types" of subjects, who rated faces for similarity in different ways. This could arise if subjects in the two groups paid attention to different features in their similarity judgments (e.g., one group based their similarity judgments more on hair style and the other group based

their judgments more on face shape). We will address this question in more detail shortly. In any case, this is an interesting and somewhat unexpected psychological finding, in that it is often assumed that experiments of this sort tap a single face typicality metric (though see [10], which questions this assumption).

We then projected the analogous typicality vectors for the algorithms into the space derived from the human subject data. These appear superimposed on the subject data in Figure 1. Each algorithm is marked with a ".", and is labeled with its abbreviated name (see Table 1).

A few descriptive points are worth noting. First, we were surprised to see the bimodal distribution of subject response patterns, which we hypothesized to be due to differences in human subject response strategies, mirrored in the model data. In one cluster of subjects, we find the algorithms from UMD97, USC, MSU and EX, with the PCA control models employing the ML1, ML2 and L2 distance metrics. In the second cluster, we see the MIT96 algorithm and the PCA control models employing the AN, L1 and MD metrics. The MIT95 and RUT algorithms are not embedded in subject clusters. Second, a rather encouraging finding here was that nearly all of the algorithms fell reasonably close to subject data. This indicates that the algorithms are capturing some of the features of human performance on the task.

[Figure 1 ("Face Typicality: Human Subjects and Models"; Axis 1 vs. Axis 2): Two-dimensional MDS solution of the pattern of face typicalities produced by human subjects (*). Algorithm data projected into this space (.). See Table 1 for label abbreviations.]

3. Interpretation and Discussion

Human Subjects. As noted, an initial explanation of the bimodal subject distribution is that subjects can be divided into two groups, based on differences in the similarity criteria they applied to the task. Precisely what criteria the different subject groups used, however, is difficult to assess directly. A less direct, but nonetheless useful, approach is to apply an MDS to the original similarity ratings of the faces. This gives a spatial map that represents the similarity relationships among the faces. When applied separately to each group of subjects, MDS produces a face map for each group. One can then examine these maps and attempt to interpret the axes (i.e., the directions in the space that explain the most variation). This can be done by simply "laying out" pictures of the faces at their coordinates in the space(s).

We performed this analysis separately for subjects on the left and right sides of Figure 1. When we examined the faces that were contrasted by the axes, a relatively straightforward interpretation emerged. There were two common axes for both sets of subjects. However, the axes were reversed in terms of which explained more variance. One axis could be interpreted as an "age/maturity" axis, which contrasted "college student" faces at one end with more mature "twenty-something" faces at the other end. The other axis contrasted masculine/athletic faces (wide faces, large necks) with less masculine faces. The age criteria dominated for the left cluster of subjects, while the athletic criteria dominated for the right cluster.

Algorithms and Human Data. In projecting the algorithm response patterns into the human response pattern space, three findings are worth noting. First, as we found for the subjects, different algorithms varied in the patterns of faces they found similar/confusable. Thus, there were meaningful variations in algorithm/human performance at the level of individual faces. This supports the use of similarity data for characterizing and for comparing the performance of algorithms and subjects.

Second, the data indicate that the face algorithms are not performing in a way that is unrelated to the human subject performance. Had the performance of the algorithms been unrelated to the subject performance, the algorithm projections would have landed far from the human subjects in the space. This clearly did not occur for any of the algorithms.

Third, as for the human subjects, the algorithms also tended to cluster bimodally. Several algorithms aligned with one or the other of the human response patterns. Determining the factors responsible for the

common bimodal clustering is complicated by the fact that we did not have full access to implementation details (e.g., some were proprietary, and not all details of the algorithms are reported). In any case, some conclusions can be drawn. First, it is clear that the algorithms do not separate exclusively based on their underlying representation. From Figure 1, algorithms with rather different representations can clearly cluster together (and vice versa). Although some clustering might relate to representation (e.g., the UMD97 and MSU algorithms, both based on linear Fisher discriminant analysis, are very close), representation is apparently not the only factor.

Second, the distance metrics, which comprised the most controlled design feature in this study, had a potent effect on the response patterns. (See [5] for a quantitative performance assessment of the distance metrics, which is consistent with the qualitative findings reported here.) This is clear in the distribution of the NIST control implementations, which are scattered across the space. Given that only the distance metric varies in these models, the way in which "similarity" is computed in the algorithms can give rise to very different patterns of typicality at the level of individual faces (see also [5]).

Finally, it is worth noting that the extensive FERET tests suggest that performance accuracy probably did not contribute strongly to the algorithm clustering. The three most accurate algorithms, MIT96, UMD97, and USC, are divided between the clusters.

Combined, it is clear that the bimodal algorithm structure is determined by complex trade-offs between the representations and distance metrics. Furthermore, given that the algorithm response patterns align with the human response patterns, and given that an abstract interpretation of these human response patterns was possible, it is perhaps reasonable to characterize the algorithms in these terms. The "abstract" dimensions used by the subjects relate to real physical properties of the faces. Athletic/masculine faces are shaped differently than non-athletic/non-masculine faces. Human subjects are very comfortable using these terms to describe faces, and neural network models have also been shown to be sensitive to this shape distinction. The age of a face is also specified by physical information in the face, e.g., its shapes and facial textures. Therefore, it is not surprising that such dimensions relate to the algorithms' predictions of face similarity. The possibility of classifying faces in this abstract manner provides a link between the information that humans and algorithms use to recognize faces.

References

[1] J. D. Cohen, B. MacWhinney, M. Flatt, and J. Provost. PsyScope: A new graphic interactive environment for designing psychology experiments. Behavior Research Methods, Instruments and Computers, 25:257-271, 1993.
[2] K. Okada et al. The Bochum/USC face recognition system. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications. Springer-Verlag, Berlin, 1998.
[3] L. Light, F. Kayra-Stuart, and S. Hollander. Recognition memory for typical and unusual faces. Journal of Experimental Psychology: Human Learning and Memory, 5:212-228, 1979.
[4] B. Moghaddam and A. Pentland. Beyond linear eigenspaces: Bayesian matching for face recognition. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications. Springer-Verlag, Berlin, 1998.
[5] H. Moon and P. J. Phillips. Analysis of PCA-based face recognition algorithms. In K. W. Bowyer and P. J. Phillips, editors, Empirical Evaluation Techniques in Computer Vision. IEEE Computer Society Press, Los Alamitos, CA, 1998.
[6] A. J. O'Toole, K. A. Deffenbacher, D. Valentin, and H. Abdi. Structural aspects of face recognition and the other-race effect. Memory & Cognition, 22:208-224, 1994.
[7] P. J. Phillips, H. Moon, S. Rizvi, and P. Rauss. The FERET evaluation. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications. Springer-Verlag, Berlin, 1998.
[8] T. Valentine and V. Bruce. The effects of distinctiveness in recognising and classifying faces. Perception, 15:525-536, 1986.
[9] T. Vetter and T. Poggio. Image synthesis from a single example image. In B. Buxton and R. Cipolla, editors, Computer Vision - ECCV'96: Lecture Notes in Computer Science, pages 652-659. Springer-Verlag, Cambridge, UK, 1996.
[10] J. R. Vokey and J. D. Read. Familiarity, memorability, and the effect of typicality on the recognition of faces. Memory & Cognition, 20:291-302, 1992.
[11] J. Wilder. Face recognition using transform coding of gray scale projections and the neural tree network. In R. J. Mammone, editor, Artificial Neural Networks with Applications in Speech and Vision, pages 520-536. Chapman Hall, 1994.
[12] W. Zhao, A. Krishnaswamy, R. Chellappa, D. Swets, and J. Weng. Discriminant analysis of principal components for face recognition. In H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman Soulie, and T. S. Huang, editors, Face Recognition: From Theory to Applications. Springer-Verlag, Berlin, 1998.
