
FACE RECOGNITION IN REAL-WORLD IMAGES

Xavier Fontaine Radhakrishna Achanta Sabine Süsstrunk

School of Computer and Communication Sciences


École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

ABSTRACT

Face recognition systems are designed to handle well-aligned images captured under controlled situations. However, real-world images present varying orientations, expressions, and illumination conditions. Traditional face recognition algorithms perform poorly on such images. In this paper we present a method for face recognition adapted to real-world conditions that can be trained using very few training examples and is computationally efficient. Our method consists of performing a novel alignment process followed by classification using sparse representation techniques. We present our recognition rates on a difficult dataset that represents real-world faces, where we significantly outperform state-of-the-art methods.

Index Terms— Face recognition, sparse representation, alignment, mesh warping, facial landmarks.

1. INTRODUCTION

Face recognition is probably one of the most prominent areas of research in imaging and has a wide range of real-world applications including surveillance, access control, identity authentication [1], and photo management. Face recognition systems either perform face verification, i.e., classify a pair of pictures as belonging to the same individual or not, or perform face identification, i.e., put a label on an unknown face with respect to some training set. In this paper, we address the latter problem of face identification.

During the last thirty years automatic face recognition has seen considerable progress. Despite this, face recognition is a very challenging problem when the training examples are few and the conditions of capture are unconstrained, resulting in face images varying widely in orientation, expression, and illumination.

In our work, we focus on the difficult problem of recognizing faces captured in uncontrolled environments. We impose additional constraints on the number of training samples and on computational efficiency without needing any specialized hardware. This rules out deep learning approaches, which are data and computation hungry.

Our contribution is a face recognition scheme that performs automatic face alignment and recognition of detected faces with high accuracy and speed without employing any specialized hardware or parallel processing. Given detected faces and their landmarks, we present an algorithm to align these faces and use a modified version of a state-of-the-art algorithm to recognize faces. Our algorithm is able to identify pictures in almost real-time on a simple PC. Except for deep learning based schemes, our method outperforms all other schemes we are aware of in terms of recognition accuracy.

The rest of this paper is organized as follows: Section 2 briefly reviews prominent state-of-the-art methods for face recognition, Section 3 presents our approach, Section 4 shows the results of our experiments on datasets for face recognition, and Section 5 concludes the paper.

2. PREVIOUS WORK

The initial methods developed for face recognition used individual features on the faces, such as eyes, mouth or nose, to perform identification [2]. However, such methods did not lead to good results because of the variability of poses and the low amount of information used.

From the 90s, new methods that use global features of the faces were developed. For example, Turk and Pentland proposed EigenFaces [3], which uses Principal Component Analysis (PCA). Other methods like Fisherfaces [4] or Laplacianfaces [5] extract features from face images and perform nearest neighbor identification using a Euclidean distance measure. Moghaddam et al. [6] use a Bayesian approach where a probabilistic similarity measure is used to perform classification.

Wright et al. applied the ideas of sparse coding to face recognition: they proposed the Sparse Representation based Classification (SRC) scheme [7], a dictionary learning based approach to recognize faces. This method, which can be seen as an improvement over the previous ones, is far more robust and is able to handle occlusions and corruption of face images. The SRC algorithm led to other approaches [8, 9, 10, 11] that use sparsity and improve the robustness in dealing with face alignment and pose variation issues.



Following the success of the use of sparsity in face recognition, Zhang et al. [12] questioned whether sparsity was the key to the success of the SRC algorithm. They concluded that it is the use of collaborative representation (i.e., using an overcomplete dictionary), and not the sparsity constraint, that improves face recognition performance. This led to the development of other algorithms like MSPCRC or PCRC [13] that use this idea of collaborative representation.

Blanz and Vetter [14] use 3D scans of heads to learn a 3D model of faces and fit this model to 2D images of faces. Classification is done using the 3D representation. The advantage of such a method is its independence from face orientation or illumination. For instance, Zhu et al. [15] use 3D morphable models to eliminate pose and expression variations.

In recent years, deep learning methods [16, 17] have been adapted to the face recognition problem. These methods achieve very good recognition rates and clearly outperform the "standard" algorithms. However, they generally require a considerable amount of data and specialized hardware to train and deploy in practice. This makes them hard to train and less suited for embedded and low-power devices.

3. OUR APPROACH

In order to be robust, computationally efficient, and use minimal training, we choose to use the Robust Sparse Coding (RSC) algorithm [9] with modifications. However, since the RSC algorithm represents an input image as a linear combination of the training images, all images used in this approach should be of similar size, well-aligned, and fully frontal. It is mandatory to align the face images prior to using RSC on real-world images. We thus describe our novel approach for automatically aligning faces and then the application of RSC on them for identification.

3.1. Automatic face alignment

To align a given face to a reference, we mesh-warp the input face image. The goal of mesh warping is to deform the input image to match its features with the corresponding features of a reference image based on a triangulation mesh. This is done in three steps: face detection, facial landmark detection, and face warping.

First, we detect faces in the picture using the Viola-Jones approach [18]. Then we detect landmarks on the faces, i.e., particular points that are present in all face images and whose correspondences are supposed to be preserved. For facial landmark detection we use the regression tree method of Kazemi and Sullivan [19] implemented in the Dlib [20] library. The 68 detected landmarks are mainly located on the eyes, nose, and mouth, as shown in Fig. 1d.

Apart from the 68 detected landmarks, we add equally spaced points on the border of the face square. This allows us to compute the Delaunay triangulation mesh that covers the entire face image, as in Figure 1b. For the landmarks of each input face image we copy this reference face triangulation mesh. We now simply need to warp each of these triangles to map to the corresponding triangle on the reference face mesh. This is done by an affine transformation consisting of a rotation, a scaling, and a translation, in order to map a point [x y]^T on the input face to a point [x' y']^T on the reference face as:

\[
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\qquad (1)
\]

where a, b, c, d are the rotation and scaling parameters, while t_x, t_y are the translation parameters. The warped image thus obtained is now aligned such that the inter-eye distance and the chin-eye distance are roughly the same in all images. This is done with the following transformations:

1. Rotation to force the eyes to be horizontally aligned.
2. Rescaling to obtain fixed inter-eye and eye-chin distances.
3. Translation to set the position of the left eye to a fixed predefined value.
4. Cropping the relevant 30 × 30 pixel part of the face image.

This completes the face alignment process. The faces for training are pre-processed as explained above and stored. In order to perform identification, every candidate image is subject to the same alignment process. The mapping of all the triangles to the reference results in an image whose landmark points coincide with the landmarks of the reference image. This deformation results in a kind of "frontalization" of the input image, as shown in Figures 1e and 2c. The goal of this phase is not to obtain a visually accurate version of the input image, but to prepare the image for the recognition step. Notably, there exist automatic face alignment techniques [21, ?], but they are computationally too expensive to allow real-time applications. Our pre-processing, as described in Section 3.1, takes roughly 0.1 s per image with our Python implementation, leaving ample room for an even more efficient implementation.

3.2. Face recognition

We use a modified version of the Robust Sparse Coding (RSC) algorithm [9] for recognizing faces. The RSC algorithm is an improvement of the SRC [7] method. The SRC method creates a dictionary matrix D containing all training images. For a given unknown image y the goal is to find a vector of weights x such that y = Dx. By using the ℓ1-norm of x as the regularizer, x is forced to be sparse.

RSC [9] differs from SRC in the use of a diagonal weight matrix W for improved robustness to occlusions and lighting changes, and in the use of a Maximum Likelihood Estimator to solve the sparse coding problem.

(a) Reference face¹ (b) Triangle mesh (c) LFW sample (d) Face landmarks (e) Mesh warping (f) Aligned image

Fig. 1: The steps of alignment using our method on an LFW sample image of G. W. Bush. Output (f) is cropped and resized after alignment.

¹Source: http://lanimeshvariousarticles.blogspot.fr/2014/05/the-science-of-sex-appeal-face.html

In summary, the RSC algorithm solves the weighted-LASSO problem:

\[
\min_x \left\| W^{1/2} (y - Dx) \right\|_2^2 \quad \text{s.t.} \quad \|x\|_1 \le \varepsilon \qquad (2)
\]

where ε is a constant representing the noise level. If we use the ℓ2-norm instead of the ℓ1-norm, we obtain the regularized least-squares problem

\[
\min_x \left\| W^{1/2} (Dx - y) \right\|_2^2 + \lambda \|x\|_2^2 \qquad (3)
\]

with λ > 0. Setting the gradient of (3) with respect to x to zero gives the analytical solution

\[
x = (D^T W D + \lambda I)^{-1} D^T W y \qquad (4)
\]

Even though this solution requires inverting a matrix of size #(training samples) × #(training samples), it is computationally much more efficient than the ℓ1-regularized version.

On the AR database [22], using the procedure described in [9], we obtain a recognition rate of 95.0% in 2.4 s with the ℓ1-norm and a rate of 94.1% in 0.6 s with the ℓ2-norm. That is, using the ℓ2-norm we lose less than 1% in accuracy for a 4-fold speed-up. We thus use the modified ℓ2-norm version of the RSC algorithm.
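To make the ℓ2 variant above concrete, here is a minimal NumPy sketch, not the authors' implementation: it solves the ridge problem of Eq. (3) in closed form via Eq. (4) and assigns the class with the smallest weighted reconstruction residual, following the usual SRC/RSC decision rule. The function name, the default λ, and treating W as a fixed diagonal weight vector are illustrative assumptions; RSC itself re-estimates these weights iteratively.

import numpy as np

def l2_rsc_classify(D, labels, y, lam=1e-3, w=None):
    """Hedged sketch of the l2 (collaborative-representation) classifier.
    D: (d, n) matrix of vectorized, aligned training faces (one per column);
    labels: (n,) class label of each column; y: (d,) vectorized probe face;
    lam: ridge parameter (illustrative default); w: (d,) diagonal of W, or
    None for uniform weights."""
    d, n = D.shape
    w = np.ones(d) if w is None else w
    WD = D * w[:, None]                                    # W D
    # Eq. (4): x = (D^T W D + lam I)^{-1} D^T W y, an n x n solve
    x = np.linalg.solve(D.T @ WD + lam * np.eye(n), WD.T @ y)
    sqrt_w = np.sqrt(w)
    residuals = {}
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)                 # keep class-c coefficients only
        residuals[c] = np.linalg.norm(sqrt_w * (y - D @ xc))   # ||W^(1/2)(y - D x_c)||_2
    return min(residuals, key=residuals.get)               # label with smallest residual

With 7 training images for each of the 158 LFWa identities used in Section 4, D has roughly 1,100 columns, so the n × n system of Eq. (4) remains small.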
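The alignment front end of Section 3.1 can be sketched in the same hedged spirit, using dlib for the 68 landmarks [19, 20], SciPy for the Delaunay triangulation, and OpenCV for the per-triangle affine warps of Eq. (1). This is an illustration of the described pipeline, not the authors' code: dlib's HOG face detector stands in for the Viola-Jones detector [18], and the model filename, border-point count, canvas size, and the ref_pts argument (landmark plus border coordinates computed once on an already normalized reference face) are assumptions.

import cv2
import dlib
import numpy as np
from scipy.spatial import Delaunay

CANVAS = 110   # working canvas size (assumption); the paper finally crops a 30 x 30 region
detector = dlib.get_frontal_face_detector()            # stand-in for Viola-Jones [18]
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_with_border(gray, rect, n_border=8):
    """68 dlib landmarks plus equally spaced points on the face-box border."""
    pts = [(p.x, p.y) for p in predictor(gray, rect).parts()]
    x0, y0, x1, y1 = rect.left(), rect.top(), rect.right(), rect.bottom()
    for t in np.linspace(0.0, 1.0, n_border, endpoint=False):
        pts += [(x0 + t * (x1 - x0), y0), (x1, y0 + t * (y1 - y0)),
                (x1 - t * (x1 - x0), y1), (x0, y1 - t * (y1 - y0))]
    return np.float32(pts)

def align(img, ref_pts):
    """Warp every triangle of the input mesh onto the reference mesh (Eq. 1).
    ref_pts: (m, 2) float32 array of reference landmark and border points."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rect = detector(gray, 1)[0]                         # assume a single detected face
    src_pts = landmarks_with_border(gray, rect)         # same point count/order as ref_pts
    triangles = Delaunay(ref_pts).simplices             # triangulation of the reference mesh
    out = np.zeros((CANVAS, CANVAS, 3), dtype=img.dtype)
    for tri in triangles:
        src_t = np.float32(src_pts[tri])
        dst_t = np.float32(ref_pts[tri])
        A = cv2.getAffineTransform(src_t, dst_t)        # rotation, scale, translation (Eq. 1)
        warped = cv2.warpAffine(img, A, (CANVAS, CANVAS))
        mask = np.zeros((CANVAS, CANVAS), dtype=np.uint8)
        cv2.fillConvexPoly(mask, dst_t.astype(np.int32), 1)
        out[mask == 1] = warped[mask == 1]              # paste only this triangle's pixels
    return out                                          # crop and resize to 30 x 30 afterwards

Warping the whole image once per triangle is wasteful but keeps the sketch short; masking each warp to its destination triangle is what turns a set of global affine warps into the piecewise-affine mesh warp the paper describes.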

4. RESULTS

On well-aligned images of the AR dataset [22], our modified RSC algorithm achieves a 95.0% recognition rate. For our experiments we therefore use the more challenging Labeled Faces in the Wild (LFW) database [23, 24], which consists of unaligned real-world images.
The full LFW dataset contains more than 13,000 images
of 5749 individuals in unconstrained environments. Among
these, 158 individuals have at least 10 distinct images. To
be able to compare our method with the state-of-the-art we
use the LFWa version [25] of the dataset, which consists of
the LFW dataset images that are pre-aligned using a commer-
cial alignment software. A few such images can be seen in
Fig. 2a. Despite the pre-alignment, these images are not well
suited for RSC [9] based recognition. As we note later in Table 1, our alignment proves to be more effective for improving recognition rates.

(a) Original image (b) Triangulation (c) Warped

Fig. 2: Mesh warping examples on pictures of David Beckham, Tony Blair, Gordon Brown, Angelina Jolie and Hu Jintao. Despite the distortions introduced by our warping, the resulting aligned images of column (c) are better suited for recognition than the original images of column (a). The results in Table 1 prove this.

In order to prove the effectiveness of our method, for the 158 classes we randomly select 7 images for training and 3 for testing, and run the following three recognition experiments:
1. Applying RSC algorithm on the original LFWa dataset.

2. Applying RSC algorithm on the faces detected on the LFWa images.
3. Applying our modified RSC algorithm, after performing our alignment step on the LFWa images.

The images from the LFWa dataset are 250 × 250 pixels in size. For the first experiment, we resize them to 50 × 50 without any other modifications. For the second experiment we detect faces in the LFWa images and resize the face region of the images to 50 × 50. In the third experiment, which corresponds to our algorithm, the final aligned images are of size 30 × 30. The results are summarized in Table 1.

                         Exp. 1   Exp. 2   Exp. 3
Recognition rate (%)     19.6     28.8     76.4
Time for one image (s)   3.2      3.0      1.6

Table 1: Results on the LFWa database with 7 training images and 3 test images.

The results presented in Table 1 show clearly that the alignment phase performed with our method is essential to obtain good recognition rates. We obtain scores that are nearly 4 times better than those with the use of raw LFWa images, while simultaneously halving the runtime. This demonstrates the effectiveness of our alignment before running face recognition algorithms on real-world images.

We also compare our method with recent state-of-the-art methods [7, 12, 13] that present benchmark results on the LFWa dataset using only 2 and 5 training samples. Table 2 shows this comparison.

Method         2 samples       5 samples
NN             9.3 ± 1.7%      14.3 ± 1.9%
SRC [7]        24.4 ± 2.4%     44.1 ± 2.6%
CRC [12]       27.4 ± 2.1%     42.0 ± 3.2%
MSPCRC [13]    35.0 ± 1.6%     41.1 ± 2.8%
Ours           51.1 ± 2.9%     74.2 ± 2.5%
Our time       0.15 s          0.85 s

Table 2: Recognition rates on the LFWa dataset for different methods and with 2 and 5 training samples per person, using settings as proposed in [13].

We finally run experiments with a single training image per person. This setting is quite extreme because the identity of a person is determined by only one image, which affects the robustness of the algorithms. However, using such a small number of training images may be inevitable for many real-life applications. The results of this comparison are presented in Table 3. Note that recognition using single training samples is very fast, since the size of the dictionary reduces significantly.

Method         Rate
NN             10.6%
SRC [7]        22.3%
ESRC [11]      26.7%
PCRC [13]      25.0 ± 1.8%
SVDL [10]      30.2%
Ours           33.3 ± 3.4%
Our time       0.02 s

Table 3: Recognition rates on the LFWa dataset for the extreme case of using a single training sample per person.

For the Nearest Neighbor, SRC [7], ESRC [11], and SVDL [10] methods we took the results from Yang et al. [10], which correspond to the algorithms run with 2000 dimensions, the setting that gives the best performance in their case. The PCRC values are taken from the article of Zhu et al. [13], which uses images resized to 80 × 80. For our simulations, we run our algorithm 10 times with different training and testing images in order to compute the means and standard deviations.

The results of our experiments (Tables 1, 2, 3) using the LFW dataset with a small number of training samples show that our method provides better recognition rates than other algorithms in near real-time. It is worth mentioning that deep learning methods perform considerably better, achieving recognition rates around 96% on the LFW dataset, as explained in [17]. However, such methods need a large amount of training data (300,000 images [17]) and powerful hardware to handle the computation involved. Our method, on the other hand, can perform fairly accurate recognition with fewer than 10 training samples in a very efficient manner.

5. CONCLUSION

We presented a computationally efficient face recognition technique for real-world images that requires fewer than 10 training examples and ordinary hardware to deliver near real-time recognition. Compared to the existing state-of-the-art, we nearly double the recognition rate while halving the computational runtime. We presented results on the LFW dataset that show that our method significantly outperforms existing non-deep-learning algorithms.

6. REFERENCES

[1] Michel Owayjan, Amer Dergham, Gerges Haber, Nidal Fakih, Ahmad Hamoush, and Elie Abdo, "Face recognition security system," in New Trends in Networking, Computing, E-learning, Systems Sciences, and Engineering, pp. 343–348. Springer, 2015.

[2] Woodrow Wilson Bledsoe, "Man-machine facial recognition," Panoramic Research Inc., 1966.

[3] Matthew Turk and Alex Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, 1991.

[4] Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.

[5] Xiaofei He, Shuicheng Yan, Yuxiao Hu, Partha Niyogi, and Hong-Jiang Zhang, "Face recognition using Laplacianfaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 328–340, 2005.

[6] Baback Moghaddam, Tony Jebara, and Alex Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, pp. 1771–1782, 2000.

[7] John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma, "Robust Face Recognition via Sparse Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, February 2009.

[8] Andrew Wagner, John Wright, Arvind Ganesh, Zhou Zihan, Hossein Mobahi, and Yi Ma, "Towards a Practical Face Recognition System: Robust Alignment and Illumination by Sparse Representation."

[9] Meng Yang, Jian Yang, and David Zhang, "Robust Sparse Coding for Face Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 625–632, June 2011.

[10] Meng Yang, Luc Van Gool, and Lei Zhang, "Sparse variation dictionary learning for face recognition with a single training sample per person," in The IEEE International Conference on Computer Vision, December 2013.

[11] Weihong Deng, Jiani Hu, and Jun Guo, "Extended SRC: Undersampled Face Recognition via Intraclass Variant Dictionary," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1864–1870, September 2012.

[12] Lei Zhang, Meng Yang, and Xiangchu Feng, "Sparse representation or collaborative representation: Which helps face recognition?," in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 471–478.

[13] Pengfei Zhu, Lei Zhang, Qinghua Hu, and Simon C. K. Shiu, "Multi-scale patch based collaborative representation for face recognition with margin distribution optimization," ECCV, 2012.

[14] Volker Blanz and Thomas Vetter, "Face Recognition Based on Fitting a 3D Morphable Model," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1063–1074, September 2003.

[15] Xiangyu Zhu, Zhen Lei, Junjie Yan, Dong Yi, and Stan Z. Li, "High-fidelity pose and expression normalization for face recognition in the wild," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 787–796.

[16] Florian Schroff, Dmitry Kalenichenko, and James Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering," 2015.

[17] Yi Sun, Ding Liang, Xiaogang Wang, and Xiaoou Tang, "DeepID3: Face Recognition with Very Deep Neural Networks," February 2015.

[18] Paul Viola and Michael Jones, "Robust real-time object detection," in IJCV, 2001.

[19] Vahid Kazemi and Josephine Sullivan, "One Millisecond Face Alignment with an Ensemble of Regression Trees," in CVPR. IEEE Computer Society, 2014.

[20] Davis E. King, "Dlib-ml: A Machine Learning Toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.

[21] Gary B. Huang and Vidit Jain, "Unsupervised joint alignment of complex images," in ICCV, 2007.

[22] A. M. Martinez and R. Benavente, "The AR Face Database," Tech. Rep. 24, CVC, June 1998.

[23] Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller, "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments," Tech. Rep. 07-49, University of Massachusetts, Amherst, October 2007.

[24] Gary B. Huang and Erik Learned-Miller, "Labeled Faces in the Wild: Updates and New Reporting Procedures," Tech. Rep. UM-CS-2014-003, University of Massachusetts, Amherst, May 2014.

[25] Lior Wolf, Tal Hassner, and Yaniv Taigman, "Similarity scores based on background samples," Asian Conference on Computer Vision, September 2009.