Real Time Face Detection and Recognition
Abstract—Face detection and recognition have both been active research areas over the past few decades and have proven effective in many applications, such as computer security and artificial intelligence. This paper introduces a practical system for tracking and recognizing faces in real time using a webcam. The first part of the system is face detection, which is achieved using Haar feature-based cascade classifiers, an approach proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features" [1]. To further improve the method, geometric transformations are applied to each frame before face detection, allowing faces to be detected with head tilts of up to 45 degrees. The second part of the system, face recognition, is achieved through a hybrid model consisting of feature extraction and classification, trained on the cropped Extended Yale Face Database B [2]. To build the model, 2452 samples from 38 people in the database are split into training and testing sets in a 3:1 ratio. The top 150 eigenfaces are extracted from the 1839 training faces using Principal Component Analysis (PCA). The principal components are then fed into the CSVM classification model, which is trained with various kernels. At the end of the recognition task, an accuracy of 93.3% is obtained with the Radial Basis Function (RBF) kernel on the test set of 613 samples. Used in real time via a webcam, the proposed system runs at 10 frames per second, with high recognition accuracy that depends on the number of training images of each real-time user and on how representative those training images are.

The social, informational forms of nonverbal communication, such as facial expressions, have become a primary focus of attention in many areas, including security systems, law enforcement, criminal identification, authentication, and credit card verification. Therefore, many researchers have proposed different algorithms for solving these problems, which are considered vital to human liability, interpersonal relationships, and commercial applications [3]. In practice, even though a person can recognize thousands of different faces over a lifetime, human recognition eventually becomes difficult because of aging and changes in size, illumination, rotation, and time apart. Unlike other identification systems, facial recognition does not require the cooperation of the individual after the first detection, and by taking these side factors into account, a facial recognition algorithm can produce very accurate results compared with other verification methods and with human recognition skills.

The goal of this paper is to propose a practical system for tracking and recognizing faces in real time using a webcam. The design is shown in Figure 1.

Fig. 1. Real time Face Detection and Recognition System Design Model

The face detection part of the proposed system is implemented in Python using OpenCV. The Haar feature-based detection algorithm is used with the pre-trained face cascade classifiers that come with OpenCV, and it detects faces in frames obtained from the webcam video stream. Haar cascade detection is a machine learning approach: the cascade is trained on a large number of positive (face) and negative (non-face) images [4]. Once trained, the cascade of classifiers can process images efficiently and rapidly.

Fig. 3. Haar Features Used By OpenCV Cascade Classifiers [4]
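The abstract notes that each frame is geometrically transformed so that tilted heads can still be found by an upright-face cascade. The NumPy sketch below illustrates that transformation under stated assumptions: it builds the 2×3 affine matrix that the OpenCV documentation gives for cv2.getRotationMatrix2D and maps points through it and its inverse, which is what mapping a box found in a rotated frame back to the original frame requires. The helper names (rotation_matrix, transform_points, invert_affine), the 30-degree angle and the 160×120 frame are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix for a rotation of angle_deg about center.

    Follows the formula in the OpenCV documentation for cv2.getRotationMatrix2D
    (positive angles rotate counter-clockwise; image origin at the top-left):
        [[ a, b, (1 - a) * cx - b * cy],
         [-b, a, b * cx + (1 - a) * cy]],  a = scale*cos, b = scale*sin
    """
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])


def transform_points(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return homogeneous @ matrix.T


def invert_affine(matrix):
    """Inverse of a 2x3 affine matrix, e.g. to map a box back to the original frame."""
    full = np.vstack([matrix, [0.0, 0.0, 1.0]])
    return np.linalg.inv(full)[:2]


# Usage: round-trip the corners of a face box through a 30-degree rotation
# about the centre of a hypothetical 160x120 resized frame.
rot = rotation_matrix((80.0, 60.0), 30)
corners = np.array([[10.0, 10.0], [50.0, 10.0], [50.0, 50.0], [10.0, 50.0]])
rotated_corners = transform_points(rot, corners)
corners_back = transform_points(invert_affine(rot), rotated_corners)
```

In a real pipeline the frame itself would be rotated with cv2.warpAffine using the same matrix; only the point mapping is shown here so the example runs without OpenCV.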
A. METHOD DESCRIPTION
Haar feature-based classifiers are appearance-based. When this algorithm is used to detect a face, a combination of simple rectangular features describing the intensity structure of the face is computed and evaluated. These features are extracted and tested through a cascade of classifiers, or stages, that make up the Haar cascade. During detection, the stages are applied to a region of interest one after another until the region fails a stage and is rejected; only a region that passes every stage is reported as a face. The classifier at each stage is built from basic decision-tree classifiers using AdaBoost variants [4]. The fundamental concept for detecting objects is the use of Haar-like features, since they are the basic inputs of these classifiers.
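The stage-by-stage rejection described above is what makes the cascade fast: most non-face windows are discarded by the first, cheapest stages. A minimal pure-Python sketch of that control flow follows. Each stage is modelled as a list of weighted decision stumps plus a stage threshold, in the spirit of Viola and Jones [1]; the class names and the toy parameters are hypothetical and are not OpenCV's trained cascade values.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """Weak classifier: adds `weight` to the stage score when
    polarity * feature[index] < polarity * threshold."""
    index: int
    threshold: float
    polarity: int   # +1 or -1
    weight: float


@dataclass
class Stage:
    stumps: List[Stump]
    threshold: float  # the window passes the stage if its score reaches this


def stage_score(stage: Stage, features: Sequence[float]) -> float:
    score = 0.0
    for s in stage.stumps:
        if s.polarity * features[s.index] < s.polarity * s.threshold:
            score += s.weight
    return score


def run_cascade(stages: List[Stage], features: Sequence[float]) -> Tuple[bool, int]:
    """Return (is_face, stages_evaluated). The first failing stage rejects the
    window, so the remaining stages are never evaluated for it."""
    for i, stage in enumerate(stages, start=1):
        if stage_score(stage, features) < stage.threshold:
            return False, i
    return True, len(stages)


# Toy two-stage cascade with hypothetical parameters:
# stage 1 needs feature 0 > 0.5; stage 2 needs feature 1 > 0.2 and feature 2 < 0.
stages = [
    Stage([Stump(0, 0.5, -1, 1.0)], threshold=1.0),
    Stage([Stump(1, 0.2, -1, 0.6), Stump(2, 0.0, 1, 0.6)], threshold=1.0),
]
```

For example, run_cascade(stages, [0.1, 0.5, -0.1]) stops after the first stage, while [0.9, 0.5, -0.1] passes both stages and is accepted.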
Haar-like features exploit the contrast between adjacent groups of pixels, which is used to detect lighting differences in the image. For example, Haar-like features can be selected based on the property that the eyes are darker than the bridge of the nose, as shown in Figure 2. These Haar features usually consist of two or three such groups of pixels and can be scaled to fit the region of interest being examined [3]. The cascade classifiers in OpenCV use the Haar features shown in Figure 3.

Fig. 2. Using Haar Features to Find Lighting Differences on Human Faces [4]

However, a single 24×24 detection window already contains well over a hundred thousand possible Haar-like features, so how does one select the best ones? Feature selection is achieved with AdaBoost, a machine learning technique that finds, for each feature, the threshold that best separates the training images into positive (face) and negative (non-face) examples based on lighting differences. The features with the minimum error rate, that is, the features that best separate face from non-face regions, are selected. According to the original paper by Viola and Jones, a classifier built from only 200 selected features can reach a face detection rate of 95%.
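Two ideas from this passage can be made concrete: a Haar-like feature is cheap because any rectangle sum can be read from an integral image with four lookups [1], and AdaBoost's feature selection amounts to choosing, for each candidate feature, the threshold and polarity with the lowest weighted error. The NumPy sketch below shows a two-rectangle (dark band over bright band) feature and one round of that weak-learner search. The function names and the tiny 4×4 patches are made up for illustration; a full AdaBoost loop would repeat the search over many features and reweight the samples after each round.

```python
import numpy as np


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x], with a zero first row/column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_vertical(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: lower half minus upper half.
    Large when a dark band (e.g. the eyes) sits above a brighter one."""
    half = h // 2
    return rect_sum(ii, x, y + half, w, half) - rect_sum(ii, x, y, w, half)


def best_stump(values, labels, weights):
    """One AdaBoost weak-learner search for a single feature.

    labels are +1 (face) / -1 (non-face); a stump predicts +1 when
    polarity * value < polarity * threshold. Returns the
    (weighted_error, threshold, polarity) with the lowest weighted error.
    """
    best = (np.inf, None, None)
    for thr in np.unique(values):
        for polarity in (1, -1):
            pred = np.where(polarity * values < polarity * thr, 1, -1)
            err = weights[pred != labels].sum()
            if err < best[0]:
                best = (err, thr, polarity)
    return best


# Synthetic 4x4 patches (made-up values): two "faces" with a dark band above
# a bright band, one uniform patch and one inverted patch.
face_a = np.array([[20] * 4] * 2 + [[200] * 4] * 2)
face_b = np.array([[40] * 4] * 2 + [[180] * 4] * 2)
flat = np.full((4, 4), 100)
inverted = face_a[::-1]
values = np.array([two_rect_vertical(integral_image(p), 0, 0, 4, 4)
                   for p in (face_a, face_b, flat, inverted)])
labels = np.array([1, 1, -1, -1])
weights = np.full(4, 0.25)
err, thr, polarity = best_stump(values, labels, weights)
```

Here the feature values are [1440, 1120, 0, -1440], and the selected stump ("face if the value is greater than 0") classifies all four patches correctly.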
The facial recognition part of the system is a hybrid model consisting mainly of three modules: data preparation, feature extraction, and classification.

A. DATABASE PREPARATION
The database used to train the CSVM classification model is prepared using the cropped faces from the Extended Yale Face Database B, which contains 2452 images of 38 human subjects under 9 poses and 64 illumination conditions [2]. All training and testing images must satisfy the same initial conditions: they must be taken under similar illumination and without anything obstructing the face. Images must also be grayscale and contain only facial features, with a height ranging from the hairline above the forehead to the chin. The originally cropped images are 192 by 168 pixels, and they are resized to 50 by 50 pixels to achieve higher processing speed during training and detection.

To test the integrated system in real time, images of the user's face must be captured, cropped, resized, and converted to grayscale until they satisfy the initial conditions used for database preparation discussed above; they can then be stored in the database for feature extraction and SVM training. I have built an automatic training system in OpenCV that allows users to take pictures of their faces (the code is in Appendix C). The training system crops out the user's face according to the initial conditions for data preparation, using the face detection system discussed above, and creates a custom face profile directory in the default database in which all training images are saved in real time.

B. FEATURE EXTRACTION
To build the model from the prepared Extended Yale Face Database, the 2452 samples from 38 people are randomly split into training and testing sets in a 3:1 ratio. The next step is to extract the eigenfaces from the training faces using Principal Component Analysis (PCA). The following steps explain how PCA is used to extract the eigenface feature vectors:

1. Zero-mean the data: compute the mean face $m = \frac{1}{N}\sum_{i=1}^{N} x^{(i)}$ and subtract it from all the data, $z^{(i)} = x^{(i)} - m$.
2. Compute the covariance matrix $\frac{1}{N} Z Z^{\top}$ and its eigenvectors and associated eigenvalues: [Vp, Dp] = eig(1/N*Z*Z');
3. Sort the eigenvectors in order of decreasing eigenvalue: [V, D] = eigsort(Vp, Dp);
4. Compute the principal components: C = V'*Z;
5. Project the original data onto the reduced "principal component" space by taking the top k eigenvectors: C_hat = C(1:k,:); where $k \le p$ and p is the original data dimension. The top k eigenvectors are the eigenfaces we are looking for [6].

A visual representation of the feature vectors extracted from the prepared database is shown in the plot gallery of the 12 most significant eigenfaces in Figure 7.

Fig. 7. Top 12 Eigenfaces
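The random 3:1 split and the five PCA steps above translate directly into NumPy. The sketch below rests on a few assumptions: faces are already grayscale and flattened to one row per image (2500 values for a 50×50 face), the data in the usage lines is random stand-in data of 10×10 "faces" rather than the Yale images, and step 2 is implemented with numpy.linalg.eigh on the p×p covariance matrix (practical implementations often use an SVD instead, which yields the same eigenfaces up to sign). The helper names are hypothetical. With the paper's 2452 samples, a 3:1 split gives 1839 training and 613 test faces.

```python
import numpy as np


def split_3_to_1(n_samples, rng):
    """Randomly split sample indices into training and test sets in a 3:1 ratio."""
    order = rng.permutation(n_samples)
    n_train = (3 * n_samples) // 4
    return order[:n_train], order[n_train:]


def eigenfaces_pca(X, k):
    """PCA following steps 1-5; X holds one flattened face per row (N x p).

    Returns the mean face m (p,), the top-k eigenfaces as rows (k x p)
    and the reduced components C_hat (k x N).
    """
    m = X.mean(axis=0)
    Z = (X - m).T                          # step 1: p x N matrix of z(i) = x(i) - m
    N = Z.shape[1]
    cov = (Z @ Z.T) / N                    # step 2: p x p covariance (1/N) Z Z^T
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix: real eigenpairs
    order = np.argsort(eigvals)[::-1]      # step 3: decreasing eigenvalue
    V = eigvecs[:, order]
    C = V.T @ Z                            # step 4: principal components (p x N)
    C_hat = C[:k, :]                       # step 5: keep the top k
    return m, V[:, :k].T, C_hat


def project(face, m, eigenfaces):
    """Feature vector of a new flattened face in eigenface space."""
    return eigenfaces @ (face - m)


# Usage on stand-in data: 200 random "faces" of 10x10 pixels (p = 100).
rng = np.random.default_rng(0)
train_idx, test_idx = split_3_to_1(2452, rng)   # 1839 training / 613 test indices
X = rng.normal(size=(200, 100))
m, eigenfaces, C_hat = eigenfaces_pca(X, k=12)
features = project(X[0], m, eigenfaces)         # 12-dimensional feature vector
```

The projected vectors (here of length k = 12; the paper keeps k = 150) are what the classifier receives in place of raw pixels.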
C. CLASSIFICATION
CSVM is used for the classification part of the model. CSVM, or type 1 SVM, is a statistical model that separates data classes by maximizing the distance (margin) between them. By searching for an optimal separating hyperplane (OSH), the data classes are distinguished and separated. The training samples that lie on the margin boundaries closest to the OSH are the so-called support vectors [2]. Training involves minimizing the error function of the CSVM classification model:

$$\frac{1}{2} w^{\top} w + C \sum_{i=1}^{N} \xi_i$$

subject to $y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for $i = 1, \dots, N$, where $w$ is the weight vector, $b$ is the bias, $C$ is the capacity constant, $\phi$ is the kernel feature mapping, $y_i \in \{-1, +1\}$ are the class labels, and the slack variables $\xi_i$ absorb samples that violate the margin.

At the end of the recognition task, an accuracy of 93.3% is obtained with the Radial Basis Function (RBF) kernel on the test set of 613 samples. To give a qualitative view of the model's predictions, the results on a portion of the test set are plotted in Figure 8.

Fig. 8. Sample Predictions From Extended Yale Face Database B

TABLE I
Comparisons Of Different Kernel Tricks For Facial Recognition
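To make the CSVM error function concrete, the NumPy sketch below evaluates it for a given linear (w, b), using the fact that for fixed w and b the smallest feasible slack is ξ_i = max(0, 1 − y_i(wᵀx_i + b)). It also computes the RBF kernel K(x, x′) = exp(−γ‖x − x′‖²), which replaces the inner product wᵀφ(x) when the RBF kernel is used. This only evaluates the quantities; the minimization itself is left to an SVM solver such as LIBSVM or scikit-learn's SVC. The function names, γ = 0.5 and the toy points are arbitrary assumptions.

```python
import numpy as np


def csvm_objective(w, b, X, y, C):
    """Value of 0.5 * w^T w + C * sum(xi) for a linear decision function.

    For fixed (w, b) the smallest slacks satisfying y_i (w^T x_i + b) >= 1 - xi_i
    and xi_i >= 0 are xi_i = max(0, 1 - y_i (w^T x_i + b)).
    """
    margins = y * (X @ w + b)
    slacks = np.maximum(0.0, 1.0 - margins)
    return 0.5 * float(w @ w) + C * float(slacks.sum())


def rbf_kernel(A, B, gamma):
    """RBF (Gaussian) kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off


# Toy data: the third point lies on the wrong side of the boundary x = 0.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, -1.0])
objective = csvm_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0)
K = rbf_kernel(X, X, gamma=0.5)
```

With w = (1, 0) and b = 0, only the misclassified third point needs a slack (1.5), so the objective is 0.5 + 1.5 = 2.0. A larger C penalizes such slacks more heavily, trading margin width for fewer training errors.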
http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html.
"""
====================================================
Faces recognition and detection using OpenCV
====================================================
http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html
Summary:
Real time facial tracking and recognition using harrcascade
and SVM
To Ignore Warnings:
python W ignore main.py
"""
import cv2
import os
import numpy as np
from scipy import ndimage
from time import time
import logging
import matplotlib.pyplot as plt
import utils as ut
import svm
import warnings
print(__doc__)
###############################################################################
# Building SVC from database
# load YaleDatabaseB
load_Yale_Exteded_Database(40)
# print target_names
print face_target.shape[0], " samples from ", len(target_names), " people are loaded"
for i in range(1,2): print ("\n")
###############################################################################
# Facial Recognition and Tracking Live
def get_rotation_map(rotation):
""" Takes in an angle rotation, and returns an optimized rotation map """
if rotation > 0: return rotation_maps.get("right", None)
if rotation < 0: return rotation_maps.get("left", None)
if rotation == 0: return rotation_maps.get("middle", None)
current_rotation_map = get_rotation_map(0)
webcam = cv2.VideoCapture(0)
ret, frame = webcam.read() # get first frame
frame_scale = (frame.shape[1]/SCALE_FACTOR,frame.shape[0]/SCALE_FACTOR) # (y, x)
crop_face = []
num_of_face_saved = 0
while ret:
key = cv2.waitKey(1)
# exit on 'q' 'esc' 'Q'
if key in [27, ord('Q'), ord('q')]:
break
# resize the captured frame for face detection to increase processing speed
resized_frame = cv2.resize(frame, frame_scale)
processed_frame = resized_frame
# Skip a frame if the no face was found last frame
t0 = time()
if frame_skip_rate == 0:
faceFound = False
for rotation in current_rotation_map:
# for f in faces:
# x, y, w, h = [ v*SCALE_FACTOR for v in f ] # scale the bounding box back to original frame size
# cv2.rectangle(frame, (x,y), (x+w,y+h), (0,255,0))
# cv2.putText(frame, "DumbAss", (x,y), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0))
if len(faces):
for f in faces:
# Crop out the face
x, y, w, h = [ v for v in f ] # scale the bounding box back to original frame size
crop_face = rotated_frame[y: y + h, x: x + w] # img[y: y + h, x: x + w]
crop_face = cv2.resize(crop_face, DISPLAY_FACE_DIM, interpolation = cv2.INTER_AREA)
# Name Prediction
face_to_predict = cv2.resize(crop_face, FACE_DIM, interpolation = cv2.INTER_AREA)
face_to_predict = cv2.cvtColor(face_to_predict, cv2.COLOR_BGR2GRAY)
face_to_predict = face_to_predict.ravel()
name_to_display = svm.predict(clf, pca, face_to_predict, target_names)
# Display frame
cv2.rectangle(rotated_frame, (x,y), (x+w,y+h), (0,255,0))
cv2.putText(rotated_frame, name_to_display, (x,y), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0))
faceFound = True
break
if faceFound:
frame_skip_rate = 0
# print "Face Found"
else:
frame_skip_rate = SKIP_FRAME
# print "Face Not Found"
else:
frame_skip_rate = 1
# print "Face Not Found"
if len(crop_face):
cv2.imshow("Cropped Face", cv2.cvtColor(crop_face, cv2.COLOR_BGR2GRAY))
# face_to_predict = cv2.resize(crop_face, FACE_DIM, interpolation = cv2.INTER_AREA)
# face_to_predict = cv2.cvtColor(face_to_predict, cv2.COLOR_BGR2GRAY)
# name_to_display = svm.predict(clf, pca, face_to_predict, target_names)
# get next frame
ret, frame = webcam.read()
webcam.release()
cv2.destroyAllWindows()
Appendix B (utils.py)
"""
Author: Chenxing Ouyang
Summary: Utilities used for facial tracking in OpenCV and facial recognition in SVM
"""
import errno
import os

###############################################################################
# Used For Facial Tracking and Training in OpenCV


def create_directory(path):
    """ create directories for saving images"""
    try:
        print("Making directory")
        os.makedirs(path)
    except OSError as exception:
        # an already existing directory is fine; re-raise anything else
        if exception.errno != errno.EEXIST:
            raise

###############################################################################
# Used for Facial Recognition in SVM
"""
Auther: Chenxing Ouyang <[email protected]>
"""
import cv2
import numpy as np
from scipy import ndimage
import sys
import os
import utils as ut
def get_rotation_map(rotation):
""" Takes in an angle rotation, and returns an optimized rotation map """
if rotation > 0: return rotation_maps.get("right", None)
if rotation < 0: return rotation_maps.get("left", None)
if rotation == 0: return rotation_maps.get("middle", None)
current_rotation_map = get_rotation_map(0)
webcam = cv2.VideoCapture(0)
crop_face = []
num_of_face_to_collect = 150
num_of_face_saved = 0
if len(sys.argv) == 1:
print "\nError: No Saving Diectory Specified\n"
exit()
elif len(sys.argv) > 2:
print "\nError: More Than One Saving Directory Specified\n"
exit()
else:
profile_folder_path = ut.create_profile_in_database(sys.argv[1])
while ret:
key = cv2.waitKey(1)
# exit on 'q' 'esc' 'Q'
if key in [27, ord('Q'), ord('q')]:
break
# resize the captured frame for face detection to increase processing speed
resized_frame = cv2.resize(frame, frame_scale)
processed_frame = resized_frame
# Skip a frame if the no face was found last frame
if frame_skip_rate == 0:
faceFound = False
for rotation in current_rotation_map:
# for f in faces:
# x, y, w, h = [ v*SCALE_FACTOR for v in f ] # scale the bounding box back to original frame size
# cv2.rectangle(frame, (x,y), (x+w,y+h), (0,255,0))
# cv2.putText(frame, "DumbAss", (x,y), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0))
if len(faces):
for f in faces:
x, y, w, h = [ v for v in f ] # scale the bounding box back to original frame size
crop_face = rotated_frame[y: y + h, x: x + w] # img[y: y + h, x: x + w]
crop_face = cv2.resize(crop_face, FACE_DIM, interpolation = cv2.INTER_AREA)
cv2.rectangle(rotated_frame, (x,y), (x+w,y+h), (0,255,0))
cv2.putText(rotated_frame, "DumbAss", (x,y), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,255,0))
faceFound = True
break
if faceFound:
frame_skip_rate = 0
# print "Face Found"
else:
frame_skip_rate = SKIP_FRAME
# print "Face Not Found"
else:
frame_skip_rate = 1
# print "Face Not Found"
if len(crop_face):
cv2.imshow("Cropped Face", cv2.cvtColor(crop_face, cv2.COLOR_BGR2GRAY))
if num_of_face_saved < num_of_face_to_collect and key == ord('p'):
face_to_save = cv2.resize(crop_face, (50, 50), interpolation = cv2.INTER_AREA)
face_name = profile_folder_path+str(num_of_face_saved)+".png"
cv2.imwrite(face_name, face_to_save)
print "Pic Saved: ", face_name
num_of_face_saved += 1
webcam.release()
cv2.destroyAllWindows()
Appendix D (svm.py)
"""
Author: Chenxing Ouyang
"""
from time import time

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

import utils as ut


def build_SVC(face_data, face_target, target_names, face_dim):
    """
    Build SVM classification model using the face_data matrix (numOfFace X numOfPixel)
    and face_target array, face_dim is a tuple of the dimension of each image (h, w)
    Returns the SVM classification model and the fitted PCA
    """
    X = face_data
    y = face_target
    # Random 3:1 train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    # Extract the top 150 eigenfaces from the training faces
    n_components = 150
    pca = PCA(n_components=n_components, svd_solver="randomized", whiten=True).fit(X_train)
    X_train_pca = pca.transform(X_train)
    X_test_pca = pca.transform(X_test)

    # CSVM with the RBF kernel (default C and gamma; tuned values were not preserved)
    clf = SVC(kernel="rbf")
    clf.fit(X_train_pca, y_train)
    # Other kernels tried, with results from the original listing:
    # clf = SVC(kernel='poly')
    # Train_pca Test Error Rate: 0.201005025126
    # Train_pca Test Recognition Rate: 0.798994974874
    # clf = SVC(kernel='sigmoid')
    # Train_pca Test Error Rate: 0.985318107667
    # Train_pca Test Recognition Rate: 0.0146818923328

    ###########################################################################
    # Quantitative evaluation of the model quality on the test set
    print("\nPredicting people's names on the test set")
    y_pred = clf.predict(X_test_pca)
    # ut.errorRate is defined in utils.py (not reproduced in Appendix B)
    print("Test Error Rate: ", ut.errorRate(y_pred, y_test))
    print("Test Recognition Rate: ", 1.0 - ut.errorRate(y_pred, y_test))

    ###########################################################################
    # Testing: predict a single test picture and time it
    X_test_pic1 = X_test[0]
    t0 = time()
    pic1_pred_name = predict(clf, pca, X_test_pic1, target_names)
    print("\nPrediction took %0.3fs" % (time() - t0))
    print("\nPredicted result for picture_1 name: ", pic1_pred_name)
    print("\n")

    return clf, pca


def predict(clf, pca, img, face_target_names):
    """Predict the name for one flattened face image (reconstructed helper)."""
    face_pca = pca.transform(np.asarray(img, dtype=float).reshape(1, -1))
    label = clf.predict(face_pca)[0]
    return face_target_names[label]