A Face Emotion Recognition Method Using Convolutional Neural Network and Image Edge Computing
A Face Emotion Recognition Method Using Convolutional Neural Network and Image Edge Computing
Received October 3, 2019, accepted October 16, 2019, date of publication October 28, 2019, date of current version November 13, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2949741
ABSTRACT To avoid the complex process of explicit feature extraction in traditional facial expression
recognition, a face expression recognition method based on a convolutional neural network (CNN) and
an image edge detection is proposed. Firstly, the facial expression image is normalized, and the edge
of each layer of the image is extracted in the convolution process. The extracted edge information is
superimposed on each feature image to preserve the edge structure information of the texture image. Then,
the dimensionality reduction of the extracted implicit features is processed by the maximum pooling method.
Finally, the expression of the test sample image is classified and recognized by using a Softmax classifier.
To verify the robustness of this method for facial expression recognition under a complex background,
a simulation experiment is designed by scientifically mixing the Fer-2013 facial expression database with the
LFW data set. The experimental results show that the proposed algorithm can achieve an average recognition
rate of 88.56% with fewer iterations, and the training speed on the training set is about 1.5 times faster than
that on the contrast algorithm.
INDEX TERMS Face expression recognition, convolutional neural network, edge computing, deep learning,
image edge detection.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
VOLUME 7, 2019 159081
H. Zhang et al.: Face Emotion Recognition Method Using CNN and Image Edge Computing
autistic children and help doctors understand themselves. representative cross-transform features. The work in [21]
Psychological changes in autistic children, so as to develop presents a hierarchical Bayesian topic model based on pose to
more accurate treatment programs [10]. The application of solve the challenging problem in multi-user facial expression
facial expression recognition in teaching field can enable the recognition. The model combines local appearance features
teaching system to capture and record students’ emotional with global geometric information and learns intermediate
changes in learning, and provide better reference for teach- representation before recognizing expression. By sharing a
ers to teach students in accordance with their aptitude. The set of functions with different postures, it provides a unified
application of facial expression recognition in traffic field solution for multi-functional facial expression recognition,
can be used to judge the fatigue state of pilots or drivers, bypassing the individual training and parameter adjustment
and to avoid the occurrence of traffic hazards by technical of each posture, so it can be extended to a large number of
means. Applying facial expression recognition to daily life, postures.
life management robots can understand people’s mental state Although the CNN algorithm has made some progress
and intention according to facial expression recognition, and in the field of facial expression recognition, it still has
then make appropriate responses, thus enhancing the experi- some shortcomings, such as too long training time and low
ence of human-computer interaction. recognition rate in the complex background. To avoid the
In the recent years, the development of facial expression complex process of explicit feature extraction in traditional
recognition technologies has been rapid and many schol- facial expression recognition, a facial expression recognition
ars have contributed to the development of facial expres- method based on CNN and image edge detection is proposed
sion recognition [11], [12]. Among them, the Massachusetts in this paper. The main innovations of this method are as
Institute of Technology Media Laboratory and Japan’s Art follows:
Media Information Science Laboratory are representative. (1) The edge of each layer of the input image is extracted,
The research of expression recognition in computer field and then the extracted edge information is superimposed on
mainly focuses on the feature extraction and feature classi- each feature image to preserve the edge structure information
fication. The so-called feature extraction refers to extract- of the texture image.
ing features that can be used for classification from input (2) In this paper, the maximum pooling method is used to
pictures or video streams [13], [14]. There are many meth- reduce the dimension of the extracted implicit features, which
ods of feature extraction. According to the type of data shortens the training time of the convolutional neural network
input, the existing methods of feature extraction can be model.
divided into two categories: one is based on static images and (3) The Fer-2013 facial expression database and LFW
the other is based on a dynamic sequence. Feature extrac- (Labeled Faces in the wild) data set are scientifically mixed to
tion methods based on static images include Gabor wavelet design a simulation experiment, which proves that the method
transform [15], Haar wavelet transform [16], Local Binary proposed in this paper has a certain robustness for facial
Pattern (LBP), and Active Appearance Models (AAM) [17]. expression recognition under a complex background.
Generally speaking, the dimension of feature is large before
and after the completion of feature, and thus the dimen- II. FACIAL EXPRESSION DATA PREPROCESSING
sion reduction is usually carried out [18]. The facial expres- Because the original pictures of facial expressions have com-
sion classification refers to the use of specific algorithms plex background, different sizes, different shades and other
to identify the categories of facial expressions according to factors, a series of image pre-processing processes have to
the extracted features. Commonly used methods of facial be completed before facial expressions are input into the
expression classification are Hidden Markov Model (HMM), network for training. Firstly, we locate the face in the image
Support Vector Machine (SVM), AdaBoost, and Artificial and cut out the face image. Then, we normalize the face
Neural Networks (ANN) [6]. To avoid the complex process image to a specific size. Next, we equalize the histogram of
of explicit feature extraction and low-level data manipulation the image to reduce the influence of illumination and other
in traditional facial expression recognition, a fast R-CNN factors. Finally, we extract the edge of each layer of the image
(Faster Regions with Convolutional Neural Network Fea- in the convolution process. The extracted edge information
tures) facial expression recognition method is proposed in is superimposed on each feature image to preserve the edge
the literature [19]. The trainable convolution kernel is used structure information of texture image.
to extract the implicit features, and the maximum pool is
used to reduce the dimension of the extracted implicit fea- A. FACE DETECTION AND LOCATION
tures. The work in [20] presents a Feature Redundancy- This paper uses a Haar classifier for human detection. The
Reduced Convolutional Neural Network (FRR-CNN). Unlike Haar classifier is trained by Haar-like small features and an
traditional CNN, the convolution core of FRR-CNN diverges integral graph method combined with the AdaBoost algo-
due to the more discriminant differences between feature rithm. The Haar-like is a commonly used texture descriptor,
maps at the same level, resulting in fewer redundant fea- and its main features are linear, edge, center and diagonal.
tures and a more compact image representation. In addition, Adaboost is an improvement of Boosting algorithm and its
the transformation invariant pool strategy is used to extract core idea is to form a strong classifier by iterating not only
weak classifiers but also weak classifiers. The Viola-Jones region B. According to formula (3) and formula (4), the for-
detector is a milestone in the history of face detection. It has mula for calculating eigenvalue is as follows.
been widely used because of its high efficiency and fast
detection. This method uses the Haar-like to extract facial T = ii(5) − ii(4) + ii(3) − ii(2) − (ii(2)
features, and uses an integral graph to realize fast calcula- −ii(1)) − (ii(6) − ii(5)) (5)
tion of Haar-like features, and screens out important features
It can be seen that the eigenvalues of rectangular fea-
from a large number of Haar-like features. Then, we use the
tures are only related to the integral graph of rectangu-
Adaboost algorithm to train and integrate the weak classi-
lar endpoints. Through simple integral graph addition and
fier into a strong classifier. Finally, several strong classifiers
subtraction operation, the eigenvalues can be calculated,
are cascaded in series to improve the accuracy of the face
which greatly improves the speed of target detection. Next,
detection.
the extracted Haar-like features are used to train the classifier,
The Haar-like feature can reflect the gray level change of
and the AdaBoost algorithm is used to train the classifier.
image, so it is very effective to describe human face, because
Finally, the trained classifier is used to extract the face from
many features of human face have obvious contrast change
the image.
characteristics. However, the calculation of eigenvalues is
very time-consuming. In order to improve the calculation
B. SCALE NORMALIZATION
speed, this paper uses the integral graph method to calculate
the Haar like eigenvalues. The concept of an integral graph is Because the input of the network is a fixed sized picture,
expressed in Figure 1 (a). The integral graph of the coordinate before the picture is input into the network, the original pic-
A (x, y) in a graph is defined as the sum of all the pixels in its ture should be normalized to generate a specific size picture.
upper left corner. Let point (x, y) in the original picture be normalized and
X mapped to point x 0 , y0 . The mapping is as follows:
A (x, y) ii (x, y) = i x 0 , y0
(1) 0
x sx 0 0 x
x 0 ≤x,y0
y0 = 0 sy 0 y (6)
1 0 0 1 1
where sx represents the scaling ratio of the image in the
direction of x axis and sy represents the scaling ratio of the
image in the direction of y axis. In the process of image
scaling, bilinear interpolation algorithm is also needed to fill
the image. A, B, C and D are the four points around the
pixel (x, y). The corresponding gray values are g (A), g (B),
FIGURE 1. Integral graph method to calculate eigenvalues. g (C), g (D). To get the gray value of point (x, y) and calculate
the gray value of points E and F, the formula is as follows:
Here, ii (x, y) represents the integral image. i x 0 , y0 rep-
resents the original image; for gray image, here represents g (E) = (x − xD ) (g (C) − g (D)) + g (D) (7)
the gray value and for color image, here represents the color g (F) = (x − xA ) (g (B) − g (A)) + g (A) (8)
value.
The pixel value of an area can be calculated by using the xA and xD are the abscissa of point A and point D, respectively.
integral graph of the end points of the area, as shown in The gray scale formula of (x, y) is as follows:
Figure 1 (b). The pixel value of region D can be calculated by g (x, y) = (y − yD ) (g (F) − g (E)) + g (E) (9)
S (D) = ii (4) + ii (1) − ii (2) − ii (3) (2) where yD represents the ordinates of CD points. Through
where ii (1) represents the pixel value of region A, normalization, the input image is scaled to 128 × 128 size.
ii (2) represents the pixel value of region A + B, ii (3) rep- As shown in Figure 2, it is a normalized contrast map.
resents the pixel value of region A + C, ii (4) represents the
pixel value of regions A + B + C+ D. The eigenvalues of
rectangular features can be calculated by integral graphs of
feature endpoints. Taking the edge feature a as an example,
the eigenvalue calculation can be expressed by Fig. 1 (c). The
pixel values of point A and point B are:
S (A) = ii (5) + ii (1) − ii (2) − ii (4) (3)
S (B) = ii (6) + ii (2) − ii (5) − ii (3) (4)
According to the definition, the eigenvalue of rectangular
feature is the pixel value of region A minus the pixel value of FIGURE 2. Contrast before and after normalization.
C. GRAY LEVEL EQUALIZATION Kirsch edge operator is used to extract image edge informa-
In the actual image acquisition process, it is easy to be tion. The template of eight directions of Kirsch operator is
affected by illumination, shadows and other factors, which respectively.
makes the collected image show a state of uneven distribution
5 5 5 −3 5 5
of light and shade, which will increase the difficulty of feature
a0 = −3 0 −3 , a1 = −3 0 5 ,
extraction. Therefore, it is necessary to average the gray level
−3 −3 −3 −3 −3 −3
of the image to enhance the contrast of the image. In this
paper, the Histogram Equalization (HE) method is used to −3 −3 5 −3 −3 −3
process images. The basic idea is to transform the histogram a2 = −3 0 5 , a3 = −3 0 5
of the original graph into a uniform distribution form [22]. −3 −3 5 −3 5 5
If the gray level of the gray image is L, the size is M × N , and −3 −3 −3 −3 −3 −3
the number of pixels in the ri gray level is E, the correspond- a4 = −3 0 −3 , a5 = 5 0 −3 ,
ing probability of gray level occurrence is as follows: 5 5 5 5 5 −3
ni 5 −3 −3 5 5 −3
Pr (ri ) = , i = 0, 1, · · · , L − 1 (10)
M ×N a6 = 5 0 −3 , a7 = 5 0 −3 (13)
5 −3 −3 −3 −3 −3
Subsequently, the cumulative distribution function is calcu-
lated using the following equation. Assuming that any pixel PA in the image is surrounded by
i the gray level of 3 × 3 area, and that gi (i = 0, 1, · · · , 7) is
the gray level of point A obtained by convolution of the i + 1
X
T (ri ) = Pr rj , i = 0, 1, · · · , L − 1
(11)
j=0
template of the Kirsch edge operator of the image, the gray
level of point A can be obtained by the convolution of the D
Finally, the image histogram is averaged using the follow- template of the Kirsch edge operator
ing mapping relations:
g0 = 5 × (a3 +a4 +a5)−3(a2 +a6 )−3(a1 +a0 +a7) (14)
ej = INT [(emax − emin ) T (r) + emin + 0.5] (12)
j=0,1,··· ,L−1 In Equation (14), ai (i = 0, 1, · · · , 7) is the neighborhood
The processing results are shown in Figure 3. When the pixel of the arbitrary point A. The gray value of point A in
histogram of the image is completely uniform, the entropy of other directions can be calculated by the same method of
the image is the largest and the contrast of the image is the Equation (14). After processing, the gray value of point A is
largest. In fact, gray level equalization realizes the uniform calculated by
distribution of image histogram, which enhances the contrast gA = max (gi ) i = 0, 1, · · · , 7 (15)
of the image and makes the details clearer, and is conducive
to the extraction of facial features.
III. FACE EXPRESSION RECOGNITION NETWORK
MODEL BASED ON CNN
The essence of deep learning method is to construct a deep
neural network similar to human brain structure, which learns
more advanced feature expression of data layer by layer
through multi-hidden non-linear structure. This mechanism
of automatically learning the internal rules of large data
makes the extracted features have more essential character-
FIGURE 3. Grayscale equalization before and after contrast.
ization of the data, and thus the classification results can
be greatly enhanced. For a two-dimensional image input,
the neural network model can interpret it layer-by-layer from
D. IMAGE EDGE DETECTION the pixels initially understood by the computer to edges, parts,
The edge information of an image is often reflected in the area contours of objects, objects understood by the human brain,
where the gradient information of the image changes dramat- and then can classify it directly within the model to obtain
ically. The edge of the image gives people a stronger visual recognition results.
sense. Therefore, the edge information of the image cannot The CNN is a feedforward neural network, which can
be ignored in the process of texture synthesis. Some edge extract features from a two-dimensional image and optimize
information of the image is lost, which results in the blurred network parameters by using back propagation algorithm.
edge information in the final synthesis result and affects the Common CNNs usually consist of three basic layers: a con-
table. In this paper, we extract the edge of each layer of volution layer, a pooling layer and a connective layer. Each
the image in the convolution process, and then superimpose layer is composed of several two-dimensional planes, that is,
the extracted edge information on each feature map, which feature maps, and each feature map has many neurons. In con-
preserves the edge structure information of texture image. volution neural network, the input layer is a two-dimensional
then convolutes the characteristic graphs of convolution layer Among them, ylj and yl−1j represent the j-th feature map
C1 output. 64 feature graphs are obtained. The size of each of the current layer and the first layer respectively; down (·)
feature graph is (92 − 5 + 1) × (92 − 5 + 1) = 88 × 88. represents a down sampling function; βjl and blj represent the
In convolution layer C3, 128 5 × 5 convolution kernels multiplicative and additive biases of the j-th feature map of
are used to convolute the characteristic maps of pool layer the current layer, respectively. In the experiment, βjl = 1,
S1 output, and 128 feature maps are obtained. The size of blj = 0 and θ (·) are used as activation functions, and identical
each feature map is (44 − 5 + 1) × (44 − 5 + 1) = 40 × 40. functions are used in the experiment.
The principle of weight sharing is that the statistical char- After sharing the local receptive fields and weights,
acteristics of one part of an image are similar to those of the number of training parameters is greatly reduced, but the
other parts, so the same convolution kernel can be used to dimension of the feature map is not much reduced. There
extract features for all positions on the image. However, are two problems. Firstly, if the dimension of feature graph
it is not enough to use only one convolution kernel to learn is too large, the number of training parameters generated by
the features. Therefore, in the actual training of convolu- full connection will be very large; secondly, the computer will
tion neural network, many convolution kernels are used to waste a lot of time on convolution calculation in the process
increase the diversity of feature mapping. Each kind of con- of training network.
volution can get the mapping plane of different features
of the image. By using weight sharing, not only abundant C. FULL CONNECTION LAYER
image information can be obtained, but also the number The input of the full connection layer must be a
of parameters needed for network training can be greatly one-dimensional array, whereas the output of the previous
reduced. Under the condition of reasonable control of net- pooling layer S2 is a two-dimensional array. First, the two-
work structure, the generalization ability of convolutional dimensional array corresponding to each feature graph is
neural network can be enhanced. The feature extracted by converted into a one-dimensional array, and then 128 one-
convolution operation can be directly used to train classifiers, dimensional arrays are connected in series to a feature vector
but it still faces huge computational challenges. In order to of 51200 dimensions (20 × 20 × 128 = 51200) as the full
further reduce the parameters, a down sampling operation is connection. The output of each neuron is
proposed after the convolution operation. The basis of down
sampling is that the pixels in the continuous range of the hw,b (x) = θ wT x + b (19)
image have the same characteristics (local correlation), so the
features of different locations can be aggregated and counted. where hw,b (x) denotes the output value of neurons. x denotes
For example, we can calculate the average or maximum value the input eigenvector of neurons. w denotes the weight vector.
of a specific feature in an image region. This statistical dimen- b denotes bias. b = 0 denotes the activation function in
sionality reduction method not only reduces the number of experiments. θ (·) denotes the activation function, and ReLU
parameters, prevents fitting, but also makes the model obtain function is used in experiments. The number of neurons will
the scaling invariance of the image. affect the training speed and fitting ability of the network. The
experimental results show that when the number of neurons
B. POOLING LAYER is 300, the effect is better.
The main purpose of the pooling operation is to reduce the
dimension. A pooling window of 2 × 2 step size can reduce D. SOFTMAX LAYER
the dimension of the next feature map by half. Although there The last layer of the CNN uses a Softmax classifier. The
is no direct reduction in the number of training parameters, Softmax classifier is a multi-output competitive classifier.
halving the dimension of feature graph means that the compu- When a given sample is input, each neuron outputs a value
tational complexity of convolution operation will be greatly between 0 and 1, which represents the probability that the
reduced, which greatly improves the training speed. input sample belongs to that class. Therefore, the category
If we train the Softmax classifier directly with all the corresponding to the neuron with the largest output value is
features we have learned, it will inevitably bring about the selected as the classification result.
problem of dimension disaster. To avoid this problem, a pool-
ing layer is usually used after the convolution layer to reduce E. CNN PARAMETER TRAINING
the feature dimension [25], [26]. Down sampling does not The training process of CNN is essentially the process of
change the number of feature maps, but reduces the output optimizing and updating network weights. Appropriate ini-
of feature maps, which reduces the sensitivity to translation, tialization of weights has a great impact on the updating of
scaling, rotation and other transformations. If the size of the weights. The commonly used initialization methods include
sampling window is n × n, then after one down-sampling, constant initialization, uniform distribution initialization and
the size of the feature graph becomes 1/n×1/n of the original Gauss distribution initialization. The CNN essentially imple-
feature graph. The general expression of pooling is ments a mapping relationship between input and output. The
CNN carries out supervised training. Before starting training,
ylj = θ βjl down yl−1
j + blj (18)
it initializes the ownership value of the network with some
different small random numbers. The training of convolution recognition performance. The training set and the test set
neural network is divided into two stages: contained seven expressions of happiness, fear and surprise,
1) Forward propagation stage. Sample x is extracted from respectively.
the training sample set. Its corresponding category label is y, In order to verify the performance of the proposed
ỹ is a 7-dimensional vector whose elements represent the algorithm, the proposed method and R-CNN model [19]
probability that x is divided into different categories. x is input are compared experimentally. In the experiment, the same
to the CNN network. The output of the upper layer is the input experimental environment and experimental data were used.
of the current layer. Then, the output of the current layer is Through simulation experiments, the relationship between
calculated by activation function, which is passed down layer iterations and Accuracy of training sets of the two models is
by layer. Finally, the output ỹ of the Softmax layer is obtained. obtained, as shown in Figure 5.
(2) Back propagation stage, also known as error propa-
gation stage. Calculate the error between the output ỹ of
Softmax layer and the class label vector y of a given sample
(y is a 7-dimensional vector, only the element corresponding
to the class label y is 1, the other elements are 0), and adjust
the weight parameters by minimizing the mean square error
cost function.
IV. EXPERIMENT
In this section, two sets of experiments are designed to verify
the performance of the proposed method. The first group of
experiments analyze the performance of the algorithm and
FIGURE 5. Expression recognition rate under different iterations.
verify that the training time of the algorithm is lower than
that of the traditional CNN algorithm model. The experi- From Figure 5, we can see that both the proposed method
mental data comes from the Fer-2013 expression database. and R-CNN algorithm converge. It needs to be explained that
The second group of experiments is used to verify that the this experiment takes the training set as an example. There
recognition rate of the algorithm has increased under complex are 23,924 images in the training set, 48 facial expressions
background. The experimental data comes from the Fer-2013 (batch_size = 48) are processed at a time, and 499 samples
facial expression database and LF mixed sets of W data can be processed at a time. This experiment has trained
sets [27], [28]. The Fer-2013 facial expression database con- 50 generations in the training set, that is, 25,000 iterations.
tains 28,709 training pictures and 7,178 test pictures, each From the above figure, the following conclusions can be
of which is a 48 × 48 gray scale image. Each face is more or drawn:
less in the middle of the picture. Therefore, in the experiment, (1) As can be seen from Figure 5, both the proposed
the image data can be directly input into the network for algorithm and the R-CNN algorithm converge after a certain
training without any other pre-processing. number of iterations. When the model converges, the recogni-
We use a Keras framework to build the network. Keras is tion rates are 85.54% and 77.78%, respectively. It can be seen
a python-based neural network framework, which supports that the proposed algorithm improves by nearly 8 percentage
seamless switching between theano and tensorflow [29]. The points compared to the R-CNN algorithm. Thus, the proposed
hardware platform of the experiment is Intel (R) Core (TM) algorithm has certain advantages in facial expression recog-
i5-6500 CPU main frequency 3.2GHz, 16GB memory and nition rate.
6GB NVIDIA GeForce GTX 1060 GPU display memory. (2) As seen from Figure 5, both models converge after
a certain number of iterations. When the proposed model
A. PERFORMANCE ANALYSIS EXPERIMENTS is iterated to 10,000 times, the model begins to converge.
The purpose of this experiment is to test the performance of The R-CNN algorithm converges after 15,000 iterations. This
the proposed algorithm, and verify that the proposed algo- shows that the proposed algorithm can achieve satisfactory
rithm has a lower training time than the original algorithm. results after fewer iterations, that is to say, the training speed
The experimental data come from the images of Fer-2013 of the proposed method on the training set is 1.5 times faster
expression database. The Fer-2013 expression database has than that of R-CNN algorithm.
been introduced in Section 3.D. In order to improve the The proposed method is compared with R-CNN and
reliability of the experimental results, three cross-validation FRR-CNN algorithms [20]. The experimental data come
experiments were carried out, which divided 35,886 facial from Fer-2013 facial expression data set. Table 1 lists the
expression images into three parts on average. Two of recognition rate comparison of the three algorithms, and the
them were used as training samples in each experiment, time comparison on the test set and the training set.
and the remaining one was used as test samples. The The training time and test time of training set indicated
experiments were repeated three times, and the average in Table 1 refer to the time used to process a batch of
recognition results of three times were taken as the final images. In this paper, 48 images are processed in batches.
more robust models which satisfy real conditions. We will [23] F. Zhang, Q. Mao, X. Shen, Y. Zhan, and M. Dong, ‘‘Spatially coherent
also focus on how to reduce the complexity of network struc- feature learning for pose-invariant facial expression recognition,’’ ACM
Trans. Multimedia Comput., Commun., Appl., vol. 14, no. 1s, Apr. 2018,
ture, and will try to recognize dynamic expressions with 3D Art. no. 27.
convolution technology. [24] H. Ma and T. Celik, ‘‘FER-Net: Facial expression recognition using
densely connected convolutional network,’’ Electron. Lett., vol. 55, no. 4,
pp. 184–186, Feb. 2019.
REFERENCES [25] L. Wei, C. Tsangouri, F. Abtahi, and Z. Zhu, ‘‘A recursive framework
[1] R. M. Mehmood, R. Du, and H. J. Lee, ‘‘Optimal feature selection and deep for expression recognition: From Web images to deep models to game
learning ensembles method for emotion recognition from human brain dataset,’’ Mach. Vis. Appl., vol. 29, no. 3, pp. 489–502, 2018.
[26] S. Li and W. Deng, ‘‘Reliable crowdsourcing and deep locality-preserving
EEG sensors,’’ IEEE Access, vol. 5, pp. 14797–14806, 2017.
[2] T. Song, W. Zheng, C. Lu, Y. Zong, X. Zhang, and Z. Cui, ‘‘MPED: learning for unconstrained facial expression recognition,’’ IEEE Trans.
A multi-modal physiological emotion database for discrete emotion recog- Image Process., vol. 28, no. 1, pp. 356–370, Jan. 2019.
[27] A. V. Savchenko, ‘‘Deep neural networks and maximum likelihood search
nition,’’ IEEE Access, vol. 7, pp. 12177–12191, 2019.
[3] E. Batbaatar, M. Li, and K. H. Ryu, ‘‘Semantic-emotion neural network for for approximate nearest neighbor in video-based image recognition,’’ Opt.
emotion recognition from text,’’ IEEE Access, vol. 7, pp. 111866–111878, Memory Neural Netw., vol. 26, no. 2, pp. 129–136, Apr. 2017.
[28] R. Massey et al., ‘‘The behaviour of dark matter associated with four bright
2019.
[4] Y. Zhang, L. Yan, B. Xie, X. Li, and J. Zhu, ‘‘Pupil localization algorithm cluster galaxies in the 10 kpc core of Abell 3827,’’ Monthly Notices Roy.
combining convex area voting and model constraint,’’ Pattern Recognit. Astronomical Soc., vol. 449, no. 4, pp. 3393–3406, 2017.
[29] A. Moeini, K. Faez, H. Moeini, and A. M. Safai, ‘‘Facial expression recog-
Image Anal., vol. 27, no. 4, pp. 846–854, 2017.
[5] H. Meng, N. Bianchi-Berthouze, Y. Deng, J. Cheng, and J. P. Cosmas, nition using dual dictionary learning,’’ J. Vis. Commun. Image Represent.,
‘‘Time-delay neural network for continuous emotional dimension predic- vol. 45, pp. 20–33, May 2017.
tion from facial expression sequences,’’ IEEE Trans. Cybern., vol. 46, HONGLI ZHANG graduated from the Beijing
no. 4, pp. 916–929, Apr. 2016. Institute of Technology, in 2014, and the Ph.D.
[6] X. U. Feng and J.-P. Zhang, ‘‘Facial microexpression recognition: A sur- degree in computer science. She is currently an
vey,’’ Acta Automatica Sinica, vol. 43, no. 3, pp. 333–348, 2017.
[7] M. S. Özerdem and H. Polat, ‘‘Emotion recognition based on EEG fea-
Associate Professor with Inner Mongolia Normal
tures in movie clips with channel selection,’’ Brain Inf., vol. 4, no. 4, University. She has authored more than 15 peer-
pp. 241–252, 2017. reviewed articles on computer networks and intel-
[8] S. Escalera, X. Baró, I. Guyon, H. J. Escalante, G. Tzimiropoulos, ligent algorithms. Her current research interests
M. Valstar, M. Pantic, J. Cohn, and T. Kanade, ‘‘Guest editorial: The include artificial intelligent, data dining, and cog-
computational face,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, nitive computing.
no. 11, pp. 2541–2545, Nov. 2018.
[9] X. Yu, S. Zhang, Z. Yan, F. Yang, J. Huang, N. E. Dunbar, M. L. Jensen,
ALIREZA JOLFAEI received the Ph.D. degree
J. K. Burgoon, and D. N. Metaxas, ‘‘Is interactional dissynchrony a clue
to deception? Insights from automated analysis of nonverbal visual cues,’’ in applied cryptography from Griffith University,
IEEE Trans. Cybern., vol. 45, no. 3, pp. 492–506, Mar. 2015. Gold Coast, Australia. He is currently an Assistant
[10] F. Vella, I. Infantino, and G. Scardino, ‘‘Person identification through Professor in cyber security with Macquarie Uni-
entropy oriented mean shift clustering of human gaze patterns,’’ Multime- versity, Sydney, Australia. He has authored more
dia Tools Appl., vol. 76, no. 2, pp. 2289–2313, Jan. 2017. than 50 peer-reviewed articles on topics related
[11] S. H. Lee, K. N. K. Plataniotis, and Y. M. Ro, ‘‘Intra-class variation to cyber security. His current research interests
reduction using training expression images for sparse representation based include cyber security, the IoT security, human-
facial expression recognition,’’ IEEE Trans. Affect. Comput., vol. 5, no. 3, in-the-loop CPS security, cryptography, AI, and
pp. 340–351, Jul./Sep. 2014. machine learning for cyber security. He received
[12] D. Ghimire, S. Jeong, J. Lee, and S. H. Park, ‘‘Facial expression recognition
based on local region specific features and support vector machines,’’ the prestigious IEEE Australian Council Award for his research article
Multimed. Tools Appl., vol. 76, no. 6, pp. 7803–7821, Mar. 2017. published in the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY.
[13] S. K. A. Kamarol, M. H. Jaward, H. Kälviäinen, J. Parkkinen, and He has served more than ten conferences in leadership capacities, including
R. Parthiban, ‘‘Joint facial expression recognition and intensity estimation the Program Co-Chair, Track Chair, Session Chair, and Technical Pro-
based on weighted votes of image sequences,’’ Pattern Recognit. Lett., gram Committee Member, including IEEE TrustCom. He has served as a
vol. 92, pp. 25–32, Jun. 2017. Guest Associate Editor for IEEE journals and transactions, including the
[14] J. Cai, Q. Chang, X.-L. Tang, C. Xue, and C. Wei, ‘‘Facial expression IEEE INTERNET OF THINGS JOURNAL, the IEEE TRANSACTIONS ON INDUSTRIAL
recognition method based on sparse batch normalization CNN,’’ in Proc. APPLICATIONS, and the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION
37th Chin. Control Conf. (CCC), Jul. 2018, pp. 9608–9613. SYSTEMS.
[15] B. Yang, X. Xiang, D. Xu, X. Wang, and X. Yang, ‘‘3D palmprint recogni-
tion using shape index representation and fragile bits,’’ Multimedia Tools MAMOUN ALAZAB received the Ph.D. degree
Appl., vol. 76, no. 14, pp. 15357–15375, 2017. in computer science from the School of Science,
[16] N. Kumar and D. Bhargava, ‘‘A scheme of features fusion for facial Information Technology and Engineering, Federa-
expression analysis: A facial action recognition,’’ J. Statist. Manage. Syst., tion University Australia. He is currently an Asso-
vol. 20, no. 4, pp. 693–701, 2017. ciate Professor with the College of Engineering,
[17] G. Tzimiropoulos and M. Pantic, ‘‘Fast algorithms for fitting active appear- IT and Environment, Charles Darwin University,
ance models to unconstrained images,’’ Int. J. Comput. Vis., vol. 122, no. 1,
Australia. He is also a Cyber-Security Researcher
pp. 17–33, 2017.
[18] M. Takalkar, M. Xu, Q. Wu, and Z. Chaczko, ‘‘A survey: Facial and Practitioner with industry and academic expe-
micro-expression recognition,’’ Multimedia Tools Appl., vol. 77, no. 15, rience. His current research interests include cyber
pp. 19301–19325, 2018. security and the digital forensics of computer sys-
[19] J. Li, D. Zhang, J. Zhang, J. Zhang, T. Li, Y. Xia, Q. Yan, and L. Xun, tems, including current and emerging issues in the cyber environment like
‘‘Facial expression recognition with faster R-CNN,’’ Procedia Comput. cyber-physical systems, and the Internet of Things, by taking into consider-
Sci., vol. 107, pp. 135–140, Jan. 2017. ation the unique challenges present in these environments, with a focus on
[20] S. Xie and H. Hu, ‘‘Facial expression recognition with FRR-CNN,’’ Elec- cybercrime detection and prevention. He has authored or coauthored more
tron. Lett., vol. 53, no. 4, pp. 235–237, Feb. 2017. than 100 research articles, two of his papers were selected as the featured
[21] Q. Mao, Q. Rao, Y. Yu, and M. Dong, ‘‘Hierarchical Bayesian theme mod-
els for multipose facial expression recognition,’’ IEEE Trans. Multimedia, articles, and two other articles received the Best Paper Award. He was a
vol. 19, no. 4, pp. 861–873, Apr. 2017. recipient of the Short Fellowship from the Japan Society for the Promotion
[22] V. Magudeeswaran and J. F. Singh, ‘‘Contrast limited fuzzy adaptive of Science (JSPS) based on his nomination from the Australian Academy of
histogram equalization for enhancement of brain images,’’ Int. J. Imag. Science.
Syst. Technol., vol. 27, no. 1, pp. 98–103, 2017.