Research On Face Recognition Based On CNN
Abstract. With the development of deep learning, face recognition technology based on the CNN (Convolutional Neural Network) has become the main method adopted in the field of face recognition. In this paper, the basic principles of CNNs are studied, and the convolutional and downsampling layers of a CNN are constructed using the convolution and downsampling functions in OpenCV to process the images. At the same time, the basic principles of the MLP (multi-layer perceptron) are studied in order to grasp the fully connected layer and the classification layer, which are implemented with Python's theano library. The construction and training of a CNN model for face recognition are then studied. To simplify the CNN model, each convolution layer and its sampling layer are combined into a single layer. Based on the trained network, the image recognition rate is greatly improved.
1. Introduction
Intelligent systems appear more and more in people's lives, and users often need to be identified when using them. Traditional methods of identification rely on personal possessions such as identity documents and keys, which have obvious shortcomings: they are easily forgotten, lost, or faked. Identification based on intrinsic personal characteristics, such as face recognition or fingerprinting, works considerably better.
In terms of algorithms, the convolution layers of a CNN share parameters with one another. The advantage of this is that memory requirements are reduced, the number of parameters to be trained is correspondingly reduced, and the performance of the algorithm is therefore improved. At the same time, other machine learning algorithms require the images to be preprocessed or features to be extracted by hand, whereas such operations are rarely needed when using a CNN for image processing. This is something other machine learning algorithms cannot match. Deep learning also has some shortcomings. One of them is that a large number of samples is required to train a deep model, which limits the application of the algorithm. Nevertheless, very good results have already been achieved in face recognition and license-plate character recognition, so this paper presents some simple research on CNN-based face recognition technology.
An image enters the network through the input layer and is then processed layer by layer; each layer has convolution kernels that extract the most significant features of the data. Features that are robust to translation, rotation, and the like, as mentioned above, can be obtained by this method.
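As a minimal sketch of what a convolution kernel does to an image, the "valid" 2-D convolution that produces one feature map can be written directly in NumPy (this is an illustrative implementation, not the paper's OpenCV-based code; the kernel values here are random, not trained weights):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and
    take a weighted sum at each position, with no padding."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.empty((oh, ow), dtype=np.float32)
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.random.rand(32, 32).astype(np.float32)   # stand-in grayscale input
kernel = np.random.rand(5, 5).astype(np.float32)    # illustrative 5*5 kernel
fmap = conv2d_valid(image, kernel)
print(fmap.shape)  # (28, 28), i.e. (32-5+1, 32-5+1)
```

Because the kernel's weights are shared across every position of the image, the same feature is detected wherever it appears, which is the source of the translation robustness described above.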
h_{W,b}(x) = f(W^T x) = f( \sum_{i=1}^{3} W_i x_i + b )    (1)
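Equation (1) for a single unit can be evaluated directly; the following sketch uses the sigmoid as the activation f, with illustrative (untrained) weights:

```python
import numpy as np

def logistic_unit(x, W, b):
    """Single neuron of equation (1): h_{W,b}(x) = f(W^T x + b),
    with the sigmoid activation f(z) = 1 / (1 + e^{-z})."""
    z = np.dot(W, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])     # three inputs, as in the text
W = np.array([0.1, -0.2, 0.05])   # illustrative weights
b = 0.0                           # illustrative offset
print(logistic_unit(x, W, b))     # a value in (0, 1)
```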
This unit is also called the Logistic regression model. When many such neurons are linked together and arranged in layers, the resulting structure is called a neural network model. Figure 1 shows a neural network with a hidden layer.
In this neural network, x1, x2, x3 are the inputs. The +1 node is the offset node, also known as the intercept term. The leftmost column of this neural network model is the input layer, and the rightmost column is the output layer. The middle layer of the network model is a hidden layer, fully connected between the input layer and the output layer; the values of its nodes cannot be observed in the training sample set. By inspecting this neural network model, we can see that it contains a total of 3 input units, 3 hidden units, and 1 output unit.
Now let n_l denote the number of layers in the neural network; in this network n_l = 3. Label each layer L_l, so that the input layer is L_1 and the output layer is L_{n_l}. This neural network has the following parameters:
(W, b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)})    (2)
W_{ij}^{(l)} is the connection parameter between the j-th unit of layer l and the i-th unit of layer l+1, and b_i^{(l)} is the offset of the i-th unit of layer l+1. In this neural network model, let a_i^{(l)} denote the output value of the i-th unit in layer l.
2nd International Symposium on Resource Exploration and Environmental Science, IOP Publishing
IOP Conf. Series: Earth and Environmental Science 170 (2018) 032110, doi:10.1088/1755-1315/170/3/032110
Given the parameters W and b, we can use h_{W,b}(x) to calculate the output of this neural network. The calculation steps are as follows:
a_1^{(2)} = f(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)})
a_2^{(2)} = f(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)})    (3)
a_3^{(2)} = f(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)})
h_{W,b}(x) = a_1^{(3)} = f(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)})
Forward propagation is calculated as shown in equation (3). Training a neural network is similar to training a Logistic regression model, but because the network has multiple layers, gradient descent must be combined with the chain rule of differentiation.
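The forward-propagation steps of equation (3) for the 3-3-1 network above can be sketched in matrix form (the weights here are random placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation for the 3-input, 3-hidden, 1-output network
    of equation (3): a^(2) = f(W^(1) x + b^(1)), h = f(W^(2) a^(2) + b^(2))."""
    a2 = sigmoid(W1 @ x + b1)    # hidden-layer activations a^(2)
    a3 = sigmoid(W2 @ a2 + b2)   # output h_{W,b}(x) = a^(3)
    return a3

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 3)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)
x = np.array([0.5, -1.0, 2.0])
print(forward(x, W1, b1, W2, b2))  # one output unit, value in (0, 1)
```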
As shown in Figure 2, an image of size 32*32 enters the network structure through the input layer. The first layer after the input layer is a convolution layer, denoted C1. The number of convolution kernels is 6 and their size is 5*5, so after this layer the number of neurons is 28*28*6 and the number of trainable parameters is (5*5+1)*6. The next layer after C1 is a downsampling layer, S2, shown in the figure, whose input is the 28*28 output of the convolution layer. It samples over 2*2 spatial neighborhoods: the 4 numbers in each neighborhood are added, multiplied by a trainable coefficient, and a trainable offset is added; the result is output through the sigmoid function. The number of neurons in layer S2 is 14*14*6. After the S2 sampling layer, each feature map is a quarter the size of the output of the preceding convolution layer.
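The S2 sampling rule just described (sum each 2*2 block, multiply by a trainable coefficient, add a trainable offset, apply the sigmoid) can be sketched as follows; the coefficient and offset values here are illustrative, not learned:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lenet_subsample(fmap, weight, bias):
    """S2-style subsampling: sum each non-overlapping 2x2 block,
    scale by a trainable coefficient, add a trainable offset,
    and pass the result through the sigmoid."""
    h, w = fmap.shape
    blocks = fmap.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return sigmoid(weight * blocks + bias)

fmap = np.random.rand(28, 28).astype(np.float32)  # one C1 feature map
out = lenet_subsample(fmap, weight=0.25, bias=0.0)
print(out.shape)  # (14, 14): each side halved, area a quarter of the input
```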
The layer after S2 is another convolutional layer, called C3 in the figure, with a total of 16 convolution kernels, each the same size (5*5) as those of C1. The output feature maps of this layer are 10*10 in size. The 6 feature maps of the S2 layer are connected with all the feature maps of the C3 layer, so each feature map obtained in this layer is a different combination of the output feature maps of the previous layer. The S4 layer works in the same way as the S2 layer and produces 16 feature maps. So far, the network structure has reduced the number of neurons to 400. The next layer, C5, is again a convolutional layer, fully connected with the previous layer; its convolution kernel size is still 5*5, so after this layer processes the image, the image size becomes 5-5+1 = 1. Each kernel therefore outputs a single neuron, and since this layer contains a total of 120 convolution kernels, the final output is 120 neurons. The last layer, F6, is a fully connected layer: it computes the dot product between the input vector and the weight vector, adds a bias, and passes the result through the sigmoid function.
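The layer-size arithmetic above follows from two rules: a "valid" 5*5 convolution shrinks each side by 4, and a 2*2 subsampling halves it. A few lines suffice to trace the whole network:

```python
def conv_out(size, kernel=5):
    """Side length after a 'valid' convolution with a square kernel."""
    return size - kernel + 1

def pool_out(size):
    """Side length after non-overlapping 2x2 subsampling."""
    return size // 2

s = 32                               # input image side
s = conv_out(s); print("C1:", s)     # 28
s = pool_out(s); print("S2:", s)     # 14
s = conv_out(s); print("C3:", s)     # 10
s = pool_out(s); print("S4:", s)     # 5  -> 5*5*16 = 400 neurons
s = conv_out(s); print("C5:", s)     # 1  -> 120 single-neuron outputs
```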
The first part is the input layer. For this design, face images of 44 people were collected, 10 images per person, for a total of 440 samples. The size of each face image is 57*47 = 2679 pixels, and each image is a grayscale image. The face data set obtained after the face images are collected and processed is the input of the convolutional neural network.
The first layer after the input layer is the first convolution-plus-downsampling layer. The image data input to this layer is 57*47 and the convolution kernel size is 5*5, so the image size after convolution is (57-5+1)*(47-5+1) = 53*43. After the convolution operation, the image is max-downsampled, resulting in an image size of 26*21.
The input to the second convolution-plus-sampling layer is the output of the first, so the input image size in this layer is 26*21. As in the first convolution-plus-sampling layer, the image is convolved first, giving an image of size 22*17. After the subsequent max-downsampling operation, the resulting image size is 11*8.
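The dimension arithmetic for both layers of the face-recognition network can be traced the same way (sizes after 2*2 downsampling round down, which is why 53*43 gives 26*21):

```python
def conv_then_pool(h, w, k=5):
    """One convolution-plus-sampling layer: 'valid' k x k convolution
    followed by 2x2 max downsampling (odd sides round down)."""
    h, w = h - k + 1, w - k + 1   # convolution
    return h // 2, w // 2          # max downsampling

h, w = 57, 47                              # one grayscale face image
h, w = conv_then_pool(h, w); print(h, w)   # 26 21
h, w = conv_then_pool(h, w); print(h, w)   # 11 8
```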
4. Summary
This paper has studied the basic structure and basic principles of CNNs. The convolutional and downsampling layers of the CNN are constructed using the OpenCV convolution and downsampling functions. At the same time, the basic principles of the multi-layer perceptron (MLP) are studied in order to grasp the fully connected layer and the classification layer, which are implemented with Python's theano library. This article simplifies the CNN model by merging each convolution layer and its sampling layer into a single layer. The model consists of two convolution-plus-sampling layers, a fully connected layer, and a Softmax classification layer. This model is trained on the face data set to optimize the model parameters.