Retrieval Number F9170088619/2019©BEIESP
DOI: 10.35940/ijeat.F9170.088619
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Deep Learning based Student Emotion Recognition from Facial Expressions in Classrooms
To build the 7-class dataset, the foremost image (a neutral expression) from each of the 309 sequences was selected and added to the 6-class dataset. This is shown in Fig 3 (a), where a sample of the seven prototypic expressions taken from the CK+ dataset is displayed.

The research works [16], [17] used the FER 2013 dataset, which includes around 35,000 human facial expression images across seven emotion categories: anger, disgust, fear, happiness, sadness, surprise, and neutral. The label breakdown is skewed towards happiness and away from disgust, as can be seen from the sample images in Fig 4.

Fig. 4. Sample Images from the FER 2013 Dataset [24].

B. Face Detection

In the research work [18], the authors did not use any face detection algorithm, most likely because theirs was an e-learning system in which only one individual is expected to be visible in the frame. Hence, there was no need to detect the face, and the facial expressions were analyzed directly, as discussed in the next section.

However, the researchers in [19] used a method termed Key Frame Extraction to discard duplicate frames from the stored video caused by slower processing. The intersection of color histograms is calculated for each video frame and the resulting values are noted; these values are compared against threshold values, a screen-cut is detected, and the redundant frames are removed from training. The Viola-Jones algorithm [6] was then used to detect the faces in the selected frames.

Tang et al. in [17] used a convolutional model for detecting the face as well as predicting the facial expressions with machine learning schemes such as Support Vector Machines (SVM), Logistic Regression, and Random Forest Regression. Once the face in the image is located, it is cropped and resized to 48 x 48 pixels.

Chen et al. in [16] used multiple methods for face detection, such as geometric feature-based methods, template matching methods, and subspace LDA, to detect the face in the input images.

C. Feature Extraction

In the research work [18], the facial features were categorized into four categories: eyes, eyebrows, lips, and head. A Hidden Markov Model (HMM) was used to extract facial features; it differs from other approaches, such as template-based and appearance-based methods, in that it builds a series of observation vectors from the face pattern, as represented in Fig 5. The identified features are labelled D1, D2, D3, etc., and fed to the neural network. Any change in the distance metrics points to an instance of a facial feature, and the collection of these features can be used to classify the six facial emotions. Threshold values on the distances were provided to make decisions related to facial emotions, which can then be used to classify an unknown pattern.

In their work [19], the researchers used Haar Cascades based facial feature extraction, a well-known technique implemented in OpenCV. This cascade classifier consists of several stages, where each stage is an ensemble of weak learners, as shown in Fig 6. In each stage, the classifier labels the region defined by the current position of the sliding window as either positive or negative.

Fig. 6. Haar Cascade Feature Extraction model [25].

The authors of [17] proposed the Regions with CNN (R-CNN) method to identify the exact target position in the whole image, since region proposals exploit precise information such as color and texture in an image. This overcomes the problem of missing features associated with the earlier sliding-window method, by producing candidate windows of better quality than a sliding window with a fixed aspect ratio. This region proposal approach extracts the facial features well. In the research work [16], Gabor filters and the Discrete Wavelet Transform (DWT) were used for feature extraction.
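The histogram-intersection test behind the Key Frame Extraction step of [19] can be sketched in plain Python; the 3-bin histograms and the 0.9 threshold below are illustrative assumptions, not values taken from the paper:

```python
# Key-frame selection by color-histogram intersection (sketch).
# A frame is kept only when its histogram overlaps the last kept
# frame's histogram by less than `threshold` -- i.e. the scene changed.

def hist_intersection(h1, h2):
    """Normalized histogram intersection: 1.0 means identical histograms."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return overlap / float(sum(h1))

def select_key_frames(histograms, threshold=0.9):
    """Return indices of frames that differ enough from the
    previously kept key frame to be worth keeping."""
    keys = [0]  # always keep the first frame
    for i in range(1, len(histograms)):
        if hist_intersection(histograms[i], histograms[keys[-1]]) < threshold:
            keys.append(i)
    return keys

# Four 3-bin histograms: frames 0-1 are near-duplicates, frame 2 is a cut.
frames = [[4, 4, 4], [4, 4, 4], [1, 7, 4], [1, 7, 4]]
print(select_key_frames(frames))  # -> [0, 2]
```

Only the surviving key frames would then be passed to the Viola-Jones face detector, which is what makes this filtering step a processing-time saver.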
International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-8 Issue-6, August 2019
Fig. 11. Proposed Facial Emotion Classification Model – Process Flow Diagram
The proposed method for face detection is Viola-Jones [6]. In a multi-camera or dual-camera assembly it is not hard to identify the faces, since at least one of the cameras is likely to have the proper alignment for easy face detection. Our proposed system uses a single digital camera pointing straight ahead towards the students. The algorithm detects the faces in a video frame and crops each detected face into a separate image of a fixed size, 150x150 pixels. The cropped images can be converted into grayscale if necessary, and each detected face is highlighted by drawing a rectangular region around it. To remove the background and other edge-lying obscurities, the subject's face is cropped from the original image based on the positions of the eyes.

Landmark locations are provided for each image, where every location represents a position on the face. Using these landmarks, the distance D between the eyes is calculated once the midpoints of the left and right eyes are identified. As Fig 12 shows, the face is then cropped using empirically selected percentages of D, with the center of the left eye as the reference point.

Fig. 12. Viola-Jones proposed Facial features extraction process [23].

C. Feature Extraction

After due consideration of all the previously discussed works on feature extraction, the Haar Cascades extraction method would be the most suitable technique for high performance.

D. Training and Feature Learning

The proposed method is to implement a CNN model with transfer learning. For image data, so-called deep CNNs have proved to perform similarly to, or even better than, humans on some recognition tasks. One of the main assumptions of most CNN architectures is that the inputs to these networks are raw images; this aids them in encoding specific properties into the network structure. The CNN transforms the dimensions of the image layer by layer, applying the various activation functions in the different layers, until it finally reduces to a single vector of scores assigned to each class. These scores are arranged along the depth of the final layer.

Convolutional Networks (ConvNets) typically consist of three layer types, namely CONV, POOL, and FC (fully connected). These layers are stacked to form a full ConvNet architecture. The activation function used for the network at hand is the Rectified Linear Unit (ReLU), R(z) = max(0, z), which is applied elementwise. Symbolically, such a network can be described as [INPUT-CONV-RELU-POOL-FC]. Once the features are learnt through the ConvNet architecture, classification of emotions takes place.

Deep Learning Algorithm 1: CNN Training Model on the FER 2013 dataset

Stage 1. Pre-Processing:
1) Load data: database: FER2013, image size: 48x48 = 2304 vector, #classes = 7 [0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral].
2) Split data: (training : test) = (28273 : 7067).
3) Augment data: rotation, scaling, shifts along the X and Y axes.

Stage 2. Creating the Network:
Add layers sequentially: [CONV-CONVNORM-RELU-POOL]x3 [FC].

Stage 3. Training the Network:
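One CONV-RELU-POOL stage of the ConvNet described above can be sketched in NumPy to show how the image dimensions shrink layer by layer on a 48x48 FER 2013 input; the single 3x3 kernel, valid padding, and 2x2 pooling are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation (what CNN libraries call 'convolution')."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise Rectified Linear Unit: R(z) = max(0, z)."""
    return np.maximum(0.0, x)

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling (drops any odd remainder row/column)."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((48, 48))   # one grayscale FER-2013-sized input
kernel = rng.random((3, 3)) - 0.5

feat = conv2d(image, kernel)   # 48x48 -> 46x46 (valid convolution)
feat = relu(feat)              # shape unchanged, negatives clipped to 0
feat = maxpool2x2(feat)        # 46x46 -> 23x23
print(feat.shape)              # -> (23, 23)
```

Stacking three such stages and flattening into an FC layer reduces the image, exactly as described above, to a single vector of class scores.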
TABLE I
EMOTION CLASSIFICATION RESULTS ACROSS REVIEWED RESEARCH WORKS

Research Work | Dataset | Face Detection | Feature Extraction | Classifier | Accuracy
Sahla K. S. and T. Senthil Kumar [19] | CK+ | Haar Cascades | Haar Cascades | CNN with LBP Encoding | 80.1%
Jielong Tang, Xiaotian Zhou, Jiawei Zheng [17] | FER 2013 | CNN Model | R-CNN | SVM, Random Forest, Logistic Regression | 59.3%, 55.1%, 54.0%
Sheng Chen, Jianbang Dai and Yizhe Yan [16] | FER 2013 | Geometric Feature, Template Matching, Subspace LDA | Gabor Filter, Discrete Wavelet Transform | Multi-layer Perceptron (MLP) | 85%
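The "CNN with LBP Encoding" entry in the first row refers to Local Binary Patterns. The 8-neighbour LBP code of a single pixel can be sketched in plain Python; the clockwise-from-top-left sampling order used here is one common convention, not necessarily the one used in [19]:

```python
def lbp_code(img, i, j):
    """8-bit Local Binary Pattern code for pixel (i, j):
    each neighbour >= centre contributes one bit."""
    centre = img[i][j]
    # 8 neighbours, clockwise starting from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (di, dj) in enumerate(offsets):
        if img[i + di][j + dj] >= centre:
            code |= 1 << bit
    return code

patch = [[10, 20, 10],
         [ 5, 15, 25],
         [30, 15,  5]]
print(lbp_code(patch, 1, 1))  # -> 106 (bits 1, 3, 5 and 6 set)
```

A histogram of these codes over a face region gives an illumination-robust texture descriptor that can complement the CNN's learnt features.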