Neural Network

Facial recognition software uses deep learning algorithms to map facial features mathematically and compare live images to stored faceprints to verify identity. High quality cameras allow facial recognition on mobile devices. Apple's Face ID captures over 30,000 variables from facial scans to authenticate users and payments on iPhones. Character recognition allows computers to identify written or printed characters like numbers and letters and convert them to a usable format. Models use convolutional and recurrent neural networks with a connectionist temporal classification layer to map images to character sequences.

Face Recognition

What is Face Recognition?

Facial recognition is a category of biometric software that maps an individual's facial features
mathematically and stores the data as a faceprint. The software uses deep learning algorithms to compare
a live capture or digital image to the stored faceprint in order to verify an individual's identity.

High-quality cameras in mobile devices have made facial recognition a viable option for authentication as
well as identification. Apple’s iPhone X, for example, includes Face ID technology that lets users unlock
their phones with a faceprint mapped by the phone's camera system. The phone, which uses 3-D depth
mapping to resist being spoofed by photos or masks, projects and analyzes over 30,000 infrared dots.
As of this writing, Face ID can be used to authenticate purchases with Apple Pay and in the iTunes Store,
App Store and iBooks Store. Apple encrypts the faceprint data and stores it on the device itself, in the
Secure Enclave, rather than in the cloud, and authentication also takes place directly on the device.

Model Overview:

What is one-shot learning?


In one-shot learning, only one image per person is stored in the database. That image is passed through the
neural network to generate an embedding vector, which is then compared with the embedding vector generated
for the person to be recognized. If the two vectors are sufficiently similar, the system recognizes that
person; otherwise, the person is treated as not being in the database. The picture below illustrates this.

Fig: One-shot learning

1 JnU/M.Sc. in CSE(e)/7th Bach


Understanding the basic design

Let’s visualize how to create a basic facial recognition application using a pre-trained deep neural network.
The network has already been trained, as shown in the diagram below.

Fig: OpenFace’s training module

I am using this pre-trained network to compare the embedding vectors of the images stored in the file
system with the embedding vector of the image captured from the webcam, as the diagram below
illustrates.

Fig: Facial recognition using one-shot learning

As the diagram above shows, if the face captured by the webcam has a 128-dimensional embedding vector
similar to one stored in the database, the system can recognize the person. All the images stored in the
file system are converted to a dictionary with names as keys and embedding vectors as values.

Calculating similarity between two images


To compare two images for similarity, we compute the distance between their embeddings. This can be
done by calculating either the Euclidean (L2) distance or the cosine distance between the 128-dimensional
vectors. If the distance is less than a threshold (which is a hyperparameter), the faces in the two pictures
are taken to be the same person; if not, they are taken to be two different persons.
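
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy 4-dimensional embeddings (standing in for real 128-dimensional ones) and the threshold value 0.6 are all hypothetical, and a real threshold would have to be tuned on validation data.

```python
import math

def euclidean_distance(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def recognize(query, database, threshold=0.6, distance=euclidean_distance):
    """Return the name whose stored embedding is closest to `query`,
    or None when even the closest one is not under `threshold`."""
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        d = distance(query, embedding)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None

# Hypothetical database: one embedding per person (one-shot learning).
database = {
    "alice": [0.1, 0.9, 0.2, 0.4],
    "bob": [0.8, 0.1, 0.7, 0.3],
}
print(recognize([0.12, 0.88, 0.21, 0.41], database))  # query close to alice's vector
print(recognize([5.0, 5.0, 5.0, 5.0], database))      # query far from every stored vector
```

Passing `distance=cosine_distance` switches the metric; the threshold then needs a different value, because the two distances live on different scales.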

What is an Affine transformation?
Pose and illumination have been long-standing challenges in face recognition. A potential bottleneck in
the face recognition system is that faces may be looking in different directions, which can result in
a different embedding vector each time. We can address this issue by applying an affine
transformation to the image, as shown in the diagram below.

Fig: Affine transformation to normalize the face

An affine transformation rotates the face and makes the positions of the eyes, nose, and mouth consistent
from face to face. Fixing the positions of the eyes, nose, and mouth in this way makes it easier to
find the similarity between two images when applying one-shot learning to face recognition.
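
As a rough illustration of the idea, the sketch below computes the 2x3 affine matrix that moves three detected landmarks onto fixed template positions. The landmark coordinates, the template positions and the function names are made up for the example; a real pipeline would obtain landmarks from a detector and then warp the whole image with the resulting matrix.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("landmarks are collinear; the affine map is undefined")
    solution = []
    for col in range(3):
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

def affine_from_landmarks(src, dst):
    """Return [[a, b, tx], [c, d, ty]] mapping each src point onto its dst point."""
    system = [[float(x), float(y), 1.0] for x, y in src]
    row_x = solve3(system, [p[0] for p in dst])
    row_y = solve3(system, [p[1] for p in dst])
    return [row_x, row_y]

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return tuple(r[0] * x + r[1] * y + r[2] for r in matrix)

# Hypothetical landmarks of a tilted face: left eye, right eye, nose tip.
detected = [(30.0, 40.0), (70.0, 50.0), (52.0, 80.0)]
# Hypothetical template: eyes level, nose centred below them.
template = [(30.0, 35.0), (66.0, 35.0), (48.0, 62.0)]

M = affine_from_landmarks(detected, template)
for point in detected:
    print(point, "->", apply_affine(M, point))
```

Three non-collinear point pairs determine an affine map exactly, which is why three landmarks are enough here.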

Face Detection Code for MATLAB:


% by Tolga Birdal
% Implementation of the paper:
% "A simple and accurate face detection algorithm in complex background"
% by Yu-Tang Pai, Shanq-Jang Ruan, Mon-Chau Shie, Yi-Chi Liu

% Additions by me:
% Minimum face size constraint
% Adaptive theta thresholding (theta is thresholded by mean2(theta)/4)
% Parameters are modified by me to detect better. Please check the paper for
% the parameters they propose.
% Check the paper for more details.

% usage:
% I=double(imread('c:\Data\girl1.jpg'));
% detect_face(I);
% The function will display the bounding box if a face is found.

% Notes: This algorithm is very primitive and doesn't work in real life.
% The reason why I implemented it is that I believe for low cost platforms
% people need such kind of algorithms. However this one doesn't perform so
% well in my opinion (if I implemented it correctly)

function []=detect_face(I)

close all;

% No faces at the beginning
Faces=[];
numFaceFound=0;

I=double(I);

H=size(I,1);
W=size(I,2);
R=I(:,:,1);
G=I(:,:,2);
B=I(:,:,3);

%%%%%%%%%%%%%%%%%% LIGHTING COMPENSATION %%%%%%%%%%%%%%%


YCbCr=rgb2ycbcr(I);
Y=YCbCr(:,:,1);

%normalize Y
minY=min(min(Y));
maxY=max(max(Y));
Y=255.0*(Y-minY)./(maxY-minY);
YEye=Y;
Yavg=sum(sum(Y))/(W*H);

T=1;
if (Yavg<64)
T=1.4;
elseif (Yavg>192)
T=0.6;
end

if (T~=1)
RI=R.^T;
GI=G.^T;
else
RI=R;
GI=G;
end

C=zeros(H,W,3);
C(:,:,1)=RI;
C(:,:,2)=GI;
C(:,:,3)=B;

figure,imshow(C/255);
title('Lighting compensation');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%% EXTRACT SKIN %%%%%%%%%%%%%%%%%%%%%%

YCbCr=rgb2ycbcr(C);
Cr=YCbCr(:,:,3);

S=zeros(H,W);
[SkinIndexRow,SkinIndexCol] =find(10<Cr & Cr<45);
for i=1:length(SkinIndexRow)
S(SkinIndexRow(i),SkinIndexCol(i))=1;
end

figure,imshow(S);
title('skin');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%

%%%%%%%%%%%%%%%% REMOVE NOISE


%%%%%%%%%%%%%%%%%%%%%%%%%%%%
SN=zeros(H,W);
for i=1:H-5
for j=1:W-5
localSum=sum(sum(S(i:i+4, j:j+4)));
SN(i:i+5, j:j+5)=(localSum>12);
end
end

figure,imshow(SN);
title('skin with noise removal');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%

%%%%%%%%%%%%%%% FIND SKIN COLOR BLOCKS %%%%%%%%%%%%%%%%%

L = bwlabel(SN,8);
BB = regionprops(L, 'BoundingBox');
bboxes= cat(1, BB.BoundingBox);
widths=bboxes(:,3);
heights=bboxes(:,4);
hByW=heights./widths;

lenRegions=size(bboxes,1);
foundFaces=zeros(1,lenRegions);

rgb=label2rgb(L);
figure,imshow(rgb);
title('face candidates');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%

%%%%%%%%%%%%%%%% CHECK FACE CRITERIONS


%%%%%%%%%%%%%%%%%%%%%%%%%%%

5 JnU/M.Sc. in CSE(e)/7th Bach


for i=1:lenRegions

% 1st criterion: height to width ratio, computed above.
if (hByW(i)>1.75 || hByW(i)<0.75)
% this cannot be a face region. discard
continue;
end

% implemented by me: Impose a min face dimension constraint
if (heights(i)<20 && widths(i)<20)
continue;
end

% get current region's bounding box


CurBB=bboxes(i,:);
XStart=CurBB(1);
YStart=CurBB(2);
WCur=CurBB(3);
HCur=CurBB(4);

% crop current region


rangeY=int32(YStart):int32(YStart+HCur-1);
rangeX= int32(XStart):int32(XStart+WCur-1);
RIC=RI(rangeY, rangeX);
GIC=GI(rangeY, rangeX);
BC=B(rangeY, rangeX);

figure, imshow(RIC/255);
title('Possible face R channel');

% 2nd criterion: existence & localisation of mouth

M=zeros(HCur, WCur);

theta=acos( 0.5.*(2.*RIC-GIC-BC) ./ sqrt( (RIC-GIC).*(RIC-GIC) + (RIC-BC).*(GIC-BC) ) );


theta(isnan(theta))=0;
thetaMean=mean2(theta);
[MouthIndexRow,MouthIndexCol] =find(theta<thetaMean/4);
for j=1:length(MouthIndexRow)
M(MouthIndexRow(j),MouthIndexCol(j))=1;
end

% now compute vertical mouth histogram


Hist=zeros(1, HCur);

for j=1:HCur
Hist(j)=length(find(M(j,:)==1));
end

wMax=find(Hist==max(Hist));
wMax=wMax(1); % just take one of them.

if (wMax < WCur/6)
%reject due to not existing mouth
continue;
end

figure, imshow(M);
title('Mouth map');

% 3rd criterion: existence & localisation of eyes

eyeH=HCur-wMax;
eyeW=WCur;

% BoundingBox coordinates are fractional, so reuse the integer crop ranges
YC=YEye(rangeY(1:eyeH), rangeX);

E=zeros(eyeH,eyeW);
[EyeIndexRow,EyeIndexCol] =find(65<YC & YC<80);
for j=1:length(EyeIndexRow)
E(EyeIndexRow(j),EyeIndexCol(j))=1;
end

% check if eyes are acceptable.
EyeExist=find(Hist>0.3*wMax);
if isempty(EyeExist)
continue;
end

foundFaces(i)=1;
numFaceFound=numFaceFound+1;

end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% the trailing semicolon on numFaceFound suppressed the output, so display it explicitly
disp('Number of faces found');
disp(numFaceFound);

if (numFaceFound>0)
disp('Indices of faces found: ');
ind=find(foundFaces==1);
CurBB=bboxes(ind,:);
CurBB
else
close all;
end

end

Character Recognition

What is Character Recognition?


Character recognition is a process that allows computers to recognize written or printed characters, such
as numbers or letters, and to convert them into a form that the computer can use.
By another definition, it is a magnetic or optical process used to detect the shape of individual characters
printed or written on paper.

Model Overview:

We use a neural network (NN) for our task. It consists of convolutional NN (CNN) layers, recurrent NN
(RNN) layers and a final Connectionist Temporal Classification (CTC) layer. Fig. 2 shows an overview of
our handwritten text recognition (HTR) system.

Fig. 2: Overview of the NN operations (green) and the data flow through the NN (pink).

We can also view the NN in a more formal way as a function (see Eq. 1) which maps an image (or matrix)
M of size W×H to a character sequence (c1, c2, …) with a length between 0 and L. As you can see, the
text is recognized on character-level, therefore words or texts not contained in the training data can be
recognized too (as long as the individual characters get correctly classified).

$$\mathrm{NN}: M_{W \times H} \rightarrow (c_1, c_2, \ldots, c_n), \qquad 0 \le n \le L$$

Eq. 1: The NN written as a mathematical function which maps an image M to a character sequence (c1,
c2, …).
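
To make the role of the CTC layer more concrete, here is a minimal sketch of greedy (best-path) CTC decoding, a standard way to turn per-time-step scores into a character sequence. It is not necessarily the decoder used in the system described here. The tiny alphabet, the scores and the convention that the blank label is the last class are assumptions made for the example.

```python
def ctc_best_path_decode(frame_scores, alphabet, blank_index=None):
    """Greedy CTC decoding: pick the best label per time step,
    collapse consecutive repeats, then drop the blank label."""
    if blank_index is None:
        blank_index = len(alphabet)  # assumption: blank is the last class
    best_labels = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    previous = None
    for label in best_labels:
        if label != previous and label != blank_index:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)

# Hypothetical scores for 7 time steps over the classes ['o', 't', blank].
alphabet = "ot"
frame_scores = [
    [0.10, 0.80, 0.10],  # t
    [0.20, 0.70, 0.10],  # t (repeat, collapsed)
    [0.10, 0.10, 0.80],  # blank
    [0.90, 0.05, 0.05],  # o
    [0.60, 0.30, 0.10],  # o (repeat, collapsed)
    [0.10, 0.10, 0.80],  # blank separates the two o's
    [0.70, 0.20, 0.10],  # o
]
print(ctc_best_path_decode(frame_scores, alphabet))
```

Because each time step contributes at most one character, a sequence of L time steps yields between 0 and L characters, which matches the length bound stated for the character sequence.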

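Character Recognition Code for MATLAB:

%% Character Recognition Example (I): Image Pre-processing

%% Manual Cropping
img = imread('sample.bmp');                 % load the source image
imshow(img)
imgGray = rgb2gray(img);                    % convert RGB to grayscale
imgCrop = imcrop(imgGray);                  % interactively select the region to keep
imshow(imgCrop)

%% Resizing
imgLGE = imresize(imgCrop, 5, 'bicubic');   % enlarge 5x with bicubic interpolation
imshow(imgLGE)

%% Rotation
imgRTE = imrotate(imgLGE, 35);              % rotate 35 degrees counterclockwise
imshow(imgRTE)

%% Binary Image
imgBW = im2bw(imgLGE, 0.90455);             % threshold the resized (unrotated) image
imshow(imgBW)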

Backpropagation Algorithm
Backpropagation is a method used in artificial neural networks to calculate the gradient of the error with
respect to each weight; this gradient is then used to update the weights of the network.
Backpropagation proceeds in two main phases. The first, propagation, consists of these
steps:
1. Initialize the weights of the neural network.
2. Propagate the inputs forward through the network to generate the output values.
3. Calculate the error.
4. Propagate the error back through the network to generate the error terms of all output and
hidden neurons.
The second phase of backpropagation updates the weights of the connections:
1. The error term at a weight’s output and the input to that weight are multiplied to find the gradient of the weight.
2. A fraction of the weight’s gradient, set by the learning rate, is
subtracted from the weight.
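
The two phases can be sketched on the smallest network that still has a hidden neuron: one input, one sigmoid hidden neuron and one linear output. Everything concrete here (the architecture, the initial weights, the squared-error loss, the learning rate 0.1 and the single training sample) is an assumption chosen to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, x, target, learning_rate):
    """One propagation + weight-update pass.
    Returns the updated weights and the loss measured before the update."""
    # Phase 1, steps 2-3: forward propagation and error
    h = sigmoid(w["w1"] * x + w["b1"])   # hidden activation
    y = w["w2"] * h + w["b2"]            # linear output
    loss = 0.5 * (y - target) ** 2
    # Phase 1, step 4: propagate the error back to every neuron
    delta_out = y - target
    delta_hidden = delta_out * w["w2"] * h * (1.0 - h)
    # Phase 2, step 1: gradient = error term * input of that weight
    grads = {
        "w2": delta_out * h,
        "b2": delta_out,
        "w1": delta_hidden * x,
        "b1": delta_hidden,
    }
    # Phase 2, step 2: subtract a learning-rate fraction of the gradient
    new_w = {name: w[name] - learning_rate * grads[name] for name in w}
    return new_w, loss

# Phase 1, step 1: initialize the weights (arbitrary values).
weights = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1}
losses = []
for _ in range(200):
    weights, loss = train_step(weights, x=1.0, target=0.5, learning_rate=0.1)
    losses.append(loss)
print("first loss:", losses[0])
print("last loss:", losses[-1])
```

With a small learning rate the loss shrinks at every step on this toy problem; larger rates can overshoot, which is why the learning rate is treated as a tunable setting.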

Logic Gate Simulator

First, we need to know that the Perceptron algorithm states that:


Prediction (y`) = 1 if Wx + b >= 0, and 0 if Wx + b < 0
In the worked examples that follow, φ denotes this step function: φ(z) = 1 if z >= 0 and φ(z) = 0 if z < 0.
Also, the steps in this method are very similar to how neural networks learn, which is as follows:

• Initialize weight values and bias
• Forward propagate
• Check the error
• Backpropagate and adjust weights and bias
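
The four steps above can be sketched with the classic perceptron learning rule, here trained on the AND truth table. The rule (add learning rate times error times input to each weight), the zero initialization, the learning rate of 1 and the epoch limit are assumptions for this sketch; the worked examples in this document use hand-picked weights instead of training.

```python
def step(z):
    """Perceptron activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Perceptron learning rule on a list of (inputs, target) pairs."""
    weights = [0.0] * len(samples[0][0])   # 1. initialize weights and bias
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            output = predict(weights, bias, inputs)   # 2. forward propagate
            error = target - output                   # 3. check the error
            if error != 0:                            # 4. adjust weights and bias
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:   # every sample classified correctly
            break
    return weights, bias

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

The learned weights need not equal hand-picked ones such as W1 = W2 = 1, b = -1.5; any weights that place the decision boundary between (1, 1) and the other three inputs implement AND.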

AND Gate

Input       Output
X1    X2    Y
0     0     0
0     1     0
1     0     0
1     1     1

Fig: Truth table

Row 1:
From W1X1 + W2X2 + b, initializing W1 and W2 as 1 and b as -1.5, we get:
X1(1) + X2(1) + (-1.5) = 0×1 + 0×1 - 1.5 = -1.5, ∴ φ(-1.5) = 0
∴ Output, Y = 0
Row 2:
X1(1) + X2(1) + (-1.5) = 0×1 + 1×1 - 1.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output, Y = 0
Row 3:
X1(1) + X2(1) + (-1.5) = 1×1 + 0×1 - 1.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output, Y = 0
Row 4:
X1(1) + X2(1) + (-1.5) = 1×1 + 1×1 - 1.5 = 0.5, ∴ φ(0.5) = 1
∴ Output, Y = 1

NAND Gate

Input       Output
X1    X2    Y
0     0     1
0     1     1
1     0     1
1     1     0

Fig: Truth table

Row 1:
From W1X1 + W2X2 + b, initializing W1 and W2 as -1 and b as 1, we get:
X1(-1) + X2(-1) + 1 = 0×(-1) + 0×(-1) + 1 = 1, ∴ φ(1) = 1
∴ Output, Y = 1
Row 2:
X1(-1) + X2(-1) + 1 = 0×(-1) + 1×(-1) + 1 = 0, ∴ φ(0) = 1
∴ Output, Y = 1
Row 3:
X1(-1) + X2(-1) + 1 = 1×(-1) + 0×(-1) + 1 = 0, ∴ φ(0) = 1
∴ Output, Y = 1
Row 4:
X1(-1) + X2(-1) + 1 = 1×(-1) + 1×(-1) + 1 = -1, ∴ φ(-1) = 0
∴ Output, Y = 0

OR Gate

Input       Output
X1    X2    Y
0     0     0
0     1     1
1     0     1
1     1     1

Fig: Truth table

Row 1:
From W1X1 + W2X2 + b, initializing W1 and W2 as 1 and b as -1, we get:
X1(1) + X2(1) + (-1) = 0×1 + 0×1 - 1 = -1, ∴ φ(-1) = 0
∴ Output, Y = 0
Row 2:
X1(1) + X2(1) + (-1) = 0×1 + 1×1 - 1 = 0, ∴ φ(0) = 1
∴ Output, Y = 1
Row 3:
X1(1) + X2(1) + (-1) = 1×1 + 0×1 - 1 = 0, ∴ φ(0) = 1
∴ Output, Y = 1
Row 4:
X1(1) + X2(1) + (-1) = 1×1 + 1×1 - 1 = 1, ∴ φ(1) = 1
∴ Output, Y = 1

NOT Gate

Input   Output
X1      Y
0       1
1       0

Fig: Truth table

Row 1:
From W1X1 + b (a NOT gate has a single input, so there is no W2), initializing W1 as -1 and b as 0.5, we get:
X1(-1) + 0.5 = 0×(-1) + 0.5 = 0.5, ∴ φ(0.5) = 1
∴ Output, Y = 1
Row 2:
X1(-1) + 0.5 = 1×(-1) + 0.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output, Y = 0

NOR Gate

Input       Output
X1    X2    Y
0     0     1
0     1     0
1     0     0
1     1     0

Fig: Truth table

Row 1:
From W1X1 + W2X2 + b, initializing W1 and W2 as -1 and b as 0.5, we get:
X1(-1) + X2(-1) + 0.5 = 0×(-1) + 0×(-1) + 0.5 = 0.5, ∴ φ(0.5) = 1
∴ Output, Y = 1
Row 2:
X1(-1) + X2(-1) + 0.5 = 0×(-1) + 1×(-1) + 0.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output, Y = 0
Row 3:
X1(-1) + X2(-1) + 0.5 = 1×(-1) + 0×(-1) + 0.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output, Y = 0
Row 4:
X1(-1) + X2(-1) + 0.5 = 1×(-1) + 1×(-1) + 0.5 = -1.5, ∴ φ(-1.5) = 0
∴ Output, Y = 0
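
The hand calculations for the single-layer gates can be checked mechanically. The sketch below plugs the weights and biases used in the worked rows (AND, NAND, OR, NOT, NOR) into one step-activation perceptron and prints each truth table; the helper names are hypothetical.

```python
def step(z):
    """Step activation used throughout the worked rows: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

# (weights, bias) exactly as chosen in the worked examples.
GATES = {
    "AND": ((1, 1), -1.5),
    "NAND": ((-1, -1), 1),
    "OR": ((1, 1), -1),
    "NOT": ((-1,), 0.5),
    "NOR": ((-1, -1), 0.5),
}

def truth_table(name):
    """Outputs of the named gate for all input rows, in truth-table order."""
    weights, bias = GATES[name]
    if len(weights) == 1:
        rows = [(0,), (1,)]
    else:
        rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [perceptron(row, weights, bias) for row in rows]

for name in GATES:
    print(name, truth_table(name))
```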

XNOR Gate

Input       Output
X1    X2    Y
0     0     1
0     1     0
1     0     0
1     1     1

Fig: Truth table

The Boolean representation of an XNOR gate is:


X1X2 + X1`X2`
where ` denotes inversion (NOT).
From the expression, we can say that the XNOR gate consists of an AND gate (X1X2), a NOR
gate (X1`X2`), and an OR gate that combines their outputs.
This means we will have to combine 3 perceptrons:

• AND (X1 + X2 - 1.5)
• NOR (-X1 - X2 + 0.5)
• OR (X1 + X2 - 1)

In each row, the AND and NOR perceptrons are evaluated first, and their outputs are then fed to the OR perceptron.

Row 1 (X1 = 0, X2 = 0):
AND: X1(1) + X2(1) - 1.5 = 0×1 + 0×1 - 1.5 = -1.5, ∴ φ(-1.5) = 0
NOR: X1(-1) + X2(-1) + 0.5 = 0×(-1) + 0×(-1) + 0.5 = 0.5, ∴ φ(0.5) = 1
∴ Output (OR) = 0×1 + 1×1 - 1 = 0, ∴ φ(0) = 1
Row 2 (X1 = 0, X2 = 1):
AND: X1(1) + X2(1) - 1.5 = 0×1 + 1×1 - 1.5 = -0.5, ∴ φ(-0.5) = 0
NOR: X1(-1) + X2(-1) + 0.5 = 0×(-1) + 1×(-1) + 0.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output (OR) = 0×1 + 0×1 - 1 = -1, ∴ φ(-1) = 0
Row 3 (X1 = 1, X2 = 0):
AND: X1(1) + X2(1) - 1.5 = 1×1 + 0×1 - 1.5 = -0.5, ∴ φ(-0.5) = 0
NOR: X1(-1) + X2(-1) + 0.5 = 1×(-1) + 0×(-1) + 0.5 = -0.5, ∴ φ(-0.5) = 0
∴ Output (OR) = 0×1 + 0×1 - 1 = -1, ∴ φ(-1) = 0
Row 4 (X1 = 1, X2 = 1):
AND: X1(1) + X2(1) - 1.5 = 1×1 + 1×1 - 1.5 = 0.5, ∴ φ(0.5) = 1
NOR: X1(-1) + X2(-1) + 0.5 = 1×(-1) + 1×(-1) + 0.5 = -1.5, ∴ φ(-1.5) = 0
∴ Output (OR) = 1×1 + 0×1 - 1 = 0, ∴ φ(0) = 1
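
The XNOR construction amounts to a small two-layer network: the AND and NOR perceptrons form the first layer and the OR perceptron forms the second. A minimal sketch with the same weights and biases as the rows above (function names are hypothetical):

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w1, w2, b):
    return step(w1 * x1 + w2 * x2 + b)

def xnor(x1, x2):
    # First layer: AND and NOR both look at the raw inputs.
    and_out = perceptron(x1, x2, 1, 1, -1.5)
    nor_out = perceptron(x1, x2, -1, -1, 0.5)
    # Second layer: OR combines the two first-layer outputs.
    return perceptron(and_out, nor_out, 1, 1, -1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xnor(x1, x2))
```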

XOR Gate

Input       Output
X1    X2    Y
0     0     0
0     1     1
1     0     1
1     1     0

Fig: Truth table

The Boolean representation of an XOR gate is:
X1X2` + X1`X2
We first rewrite the Boolean expression. Adding the terms X1`X1 and X2`X2, which are always 0, gives:
X1`X2 + X1X2` + X1`X1 + X2`X2
= X1(X1` + X2`) + X2(X1` + X2`)
= (X1 + X2)(X1` + X2`)
= (X1 + X2)(X1X2)`   (by De Morgan's law)
From the simplified expression, we can say that the XOR gate consists of an OR gate (X1 + X2) and a
NAND gate ((X1X2)`), whose outputs are combined by an AND gate.
This means we will have to combine 3 perceptrons:
• OR (X1 + X2 - 1)
• NAND (-X1 - X2 + 1)
• AND (X1 + X2 - 1.5)

In each row, the OR and NAND perceptrons are evaluated first, and their outputs are then fed to the AND perceptron.

Row 1 (X1 = 0, X2 = 0):
OR: X1(1) + X2(1) - 1 = 0×1 + 0×1 - 1 = -1, ∴ φ(-1) = 0
NAND: X1(-1) + X2(-1) + 1 = 0×(-1) + 0×(-1) + 1 = 1, ∴ φ(1) = 1
∴ Output (AND) = 0×1 + 1×1 - 1.5 = -0.5, ∴ φ(-0.5) = 0
Row 2 (X1 = 0, X2 = 1):
OR: X1(1) + X2(1) - 1 = 0×1 + 1×1 - 1 = 0, ∴ φ(0) = 1
NAND: X1(-1) + X2(-1) + 1 = 0×(-1) + 1×(-1) + 1 = 0, ∴ φ(0) = 1
∴ Output (AND) = 1×1 + 1×1 - 1.5 = 0.5, ∴ φ(0.5) = 1
Row 3 (X1 = 1, X2 = 0):
OR: X1(1) + X2(1) - 1 = 1×1 + 0×1 - 1 = 0, ∴ φ(0) = 1
NAND: X1(-1) + X2(-1) + 1 = 1×(-1) + 0×(-1) + 1 = 0, ∴ φ(0) = 1
∴ Output (AND) = 1×1 + 1×1 - 1.5 = 0.5, ∴ φ(0.5) = 1
Row 4 (X1 = 1, X2 = 1):
OR: X1(1) + X2(1) - 1 = 1×1 + 1×1 - 1 = 1, ∴ φ(1) = 1
NAND: X1(-1) + X2(-1) + 1 = 1×(-1) + 1×(-1) + 1 = -1, ∴ φ(-1) = 0
∴ Output (AND) = 1×1 + 0×1 - 1.5 = -0.5, ∴ φ(-0.5) = 0
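
The sketch below wires the three perceptrons together with the same weights as the rows above, and also illustrates why a second layer is needed: a brute-force search over a small grid of weights finds no single step perceptron that reproduces XOR, which is consistent with XOR not being linearly separable. The grid (values from -2 to 2 in steps of 0.5) and the helper names are arbitrary choices for the illustration.

```python
from itertools import product

def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w1, w2, b):
    return step(w1 * x1 + w2 * x2 + b)

def xor(x1, x2):
    # First layer: OR and NAND both look at the raw inputs.
    or_out = perceptron(x1, x2, 1, 1, -1)
    nand_out = perceptron(x1, x2, -1, -1, 1)
    # Second layer: AND combines the two first-layer outputs.
    return perceptron(or_out, nand_out, 1, 1, -1.5)

ROWS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR_TABLE = [0, 1, 1, 0]

for x1, x2 in ROWS:
    print(x1, x2, "->", xor(x1, x2))

# Try every (w1, w2, b) on a coarse grid as a single perceptron.
grid = [i * 0.5 for i in range(-4, 5)]
single_layer_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if [perceptron(x1, x2, w1, w2, b) for x1, x2 in ROWS] == XOR_TABLE
]
print("single-perceptron solutions found on the grid:", len(single_layer_solutions))
```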
