
International Journal of Engineering and Advanced Technology (IJEAT)

ISSN: 2249 – 8958, Volume-8 Issue-6, August 2019

Deep Learning based Student Emotion Recognition from Facial Expressions in Classrooms

Archana Sharma, Vibhakar Mansotra
Abstract: Classroom teaching assessments are intended to give valuable advice on the teaching-learning process as it happens. The finest classroom assessments also serve as substantial sources of information for teachers, helping them to recognize what they imparted well and how they can improve their lecture content to keep the students attentive. In this paper, we have surveyed some of the recent research works on facial emotion recognition of students in a classroom arrangement, and we propose our own deep learning approach to analyze emotions with improved classification results and offer optimized feedback to the instructor. A deep learning based convolutional neural network algorithm will be used in this paper to train on the FER2013 facial emotion image database, and a transfer learning technique will be used to pre-train a VGG16 architecture-based model, with its own weights and biases, on the Cohn-Kanade (CK+) facial image database. The trained model will capture live streaming of the students through a high-resolution digital video camera facing the students, capturing their live emotions through facial expressions and classifying the emotions as sad, happy, neutral, angry, disgust, surprise, and fear. This can offer us an insight into the class group emotion that reflects the mood among the students in the classroom. This experimental approach can also be used for video conferences, online classes etc. This proposition can improve the accuracy of emotion recognition and facilitate faster learning. We have presented the research methodologies and the achieved results on student emotions in a classroom atmosphere and have proposed an improved CNN model based on transfer learning that can significantly improve the emotion classification accuracy.

Keywords: classification, convolutional neural network, deep learning, emotion recognition, face recognition

I. INTRODUCTION

In modern times, where technology is forever evolving, the interaction between humans and machines is gaining importance, and there is a growing demand for developing machines that are intelligent and self-decisive, machines that can capture the gestures and emotions of humans to automate tasks and handle communication better. A machine that understands the emotions of a human better can predict and respond to human behavior better, which in turn can significantly improve the efficiency of the task at hand. Such a machine can act as an important tool for the behavioral science analysis of the social and communicative intricacies of humans, thereby aiding in building complex and sophisticated software frameworks that can identify social and emotional behavior well enough to be used in robots. Human emotions play a significant role in the way we respond to an action, in making decisions, learning capabilities, motivating oneself, planning, reasoning, thinking, perception, etc. This potent hold that emotions wield over our daily life, dictating and affecting our reactions to events, provides the impetus to analyze them better and build a model that can conclusively identify the right emotions in any situation. This has made emotion recognition a trending topic today and a promising research field that may spring many surprises about human emotions in the future. Emotions are normally recognized by analyzing speech signals, text content and facial expressions using intelligent software algorithms.

Of all the modes of emotion recognition, emotion expressed through facial reactions is the most powerful and universal way of conveying emotions. The expression of one's feelings through facial expressions is authentic and natural, as psychological research finds. The human-machine interface (HMI) [1] framework aids us in constructing devices capable of managing human-machine interactions. For instance, the facial emotions in our study can be deciphered as indicators for machines to investigate the underlying sentimental state through an intelligent framework and present the emotional state of the human being studied. Facial expression based emotion recognition practices have a significant role in the modern fields of computer vision and artificial intelligence [2]. Although emotions can be captured and analyzed with wearable sensors, it is essential, and more importantly adaptable, to detect facial emotions with visual inputs without the need for a physical connection. This necessity to detect emotions in a real-world scenario, and the complexity and scalability of modern computer vision algorithms coupled with high-performance hardware to process loads of data in real time, have given the development and application of artificial intelligence based facial emotion recognition a strong head start among other contemporary methods used in facial emotion analysis. However, this technology poses its own challenges in facial analysis, detection, recognition and emotion classification. Simply mimicking the way by which humans recognize faces

Revised Manuscript Received on August 20, 2019


Dr. Archana Sharma, Department of Computer Science, Government
M.A.M College, Cluster University of Jammu, Jammu, India. Email:
[email protected]
Dr. Vibhakar Mansotra, Department of Computer Science and IT,
University of Jammu, Jammu, India. Email: [email protected]

Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number F9170088619/2019©BEIESP
DOI: 10.35940/ijeat.F9170.088619
4691
instantly without much effort is a huge challenge for machines. Several other factors, like the broad range of individuals with their own characteristic faces, further complicated by age and ethnicity, must also be considered. Furthermore, head size variations, face orientations, facial hair, eyeglasses etc. make this task of identification more difficult. Locating the face in images that contain a group of people, or a group of objects mixed with people, is an even more fundamental issue in facial emotion recognition. In an environment like a classroom or meeting hall, the face poses may differ due to the camera angle, where faces may be at an angle that obscures the facial features necessary for the analysis. This requires us to employ good preprocessing methods so that we can work with input images that vary greatly in the scaling and rotation of the head. Many feature-based techniques used in facial emotion analysis use local spatial analysis to identify the facial features, and the automatic localization of those facial key points helps a lot in analyzing the facial emotions in a robust manner. Recognizing facial components like the eyes, nose, brows, lips, mouth etc. is also an important part of analyzing facial emotions, and many techniques are employed today to identify those key facial points.

In our study and review of facial emotion recognition, we consider the classroom scenario, where we analyze the mood of the students in the classroom in response to the lecture and examine the moods or emotions that they experience, to understand the student psychology of what interests them and what makes them bored during a lecture. In the present classroom scenario, teaching assessments are basic, non-graded, and limited in the way they are designed to offer helpful feedback on the students' emotions in a classroom. This study is undertaken because the commitment and involvement of the students during the lectures is vital for understanding the concepts of the topic being taught and can undoubtedly improve the academic credentials of the students. Even though direct supervision by the instructors is possible in a classroom environment, it cannot be used as a tool to measure the attentiveness of the students in the classroom. Also, there are quite a handful of students in every classroom who can lose their concentration and become inattentive to the lecture even under direct supervision. This presents a need for an approach to quantitatively assess and find out the lapses of concentration and attentiveness of the students in the classroom. Hence, a continuous assessment of the students' facial emotions in class can act as an aid in predicting the complete class behavior, like attention to the class (neutral), laughing during the lecture (happy), sleeping during the lecture (bored/sad) and so on. This study has been presented to discuss the previous works that were carried out in this exact classroom scenario, using the students' facial expressions during a lecture to extract mood patterns.

Fig. 1. Flow Representation of Facial Emotion Recognition System

This way of analyzing the various emotions of the students within a classroom environment can offer more insights into their emotional states during the lecture, can assist in designing teaching aids to improve their attentiveness, and can also improve the efficacy of the content delivery by the instructor in the classroom.

After analyzing the past works on classroom student emotion recognition, we propose our own system using the deep convolutional neural network (CNN) technique for recognizing the students' classroom emotions. A video camera present in the classroom captures live video of the students in the classroom as a whole, and the captured video frames are presented to the processing unit, which extracts the key frames from the video and applies facial feature extraction techniques to detect the faces in each video frame. The CNN model has been trained on similar databases of people and faces, and it can be used to predict the classes of emotions with high accuracy so that the most probable emotion is decided. An assessment can be carried out based on the emotion predictions, and the emotion classes happy, sad, surprise, disgust, angry, fear and neutral are predicted and analyzed to provide better teaching tools, improve lecture content and lead to a better classroom environment.

II. RELATED WORKS

Facial emotion identification and classification systems have gathered a substantial amount of interest, with a vast number of applications in the current era. In most of these applications, a user's face is tracked by a digital high-resolution video camera and analyzed for emotions, and this usage is growing exponentially. One can easily analyze the emotions to bring corresponding changes to the surrounding environment. To analyze human facial expressions, face recognition must be the first step. The extraction and identification of the face from live images or a live video stream is one of the major challenges in this field. Varying the physical attributes of a face causes a major change in the identification. Myunghoon et al. [3] proposed using an Active Shape Model (ASM) to extract 77 facial features based on a geometric model, where the detected image is iteratively deformed to fit the shape model and the facial features are filtered out after comparison with the ASM. In 2014, Kamlesh et al. [4] used a hybrid approach based on the Active Appearance Model (AAM) and Local Binary Patterns (LBP). This approach extracted 68 facial points, where AAM is a geometric-based approach and LBP is an appearance-based approach.

In 2016, Krithika L.B and Lakshmi Priya G.G [5] proposed a student face emotion identification system for e-learning environments. This model captures the students' faces, identifies the students' emotions, and tracks the dynamically changing emotions in response to the part of the lecture being listened to in the e-learning environment. They applied the Viola-Jones [6] method and Local Binary Patterns (LBP) to detect the face and classify an expression or an emotion of a student.

In another recent work, an intelligent classroom facial emotion recognition system based on deep learning was presented by

Fig. 2. Classroom Layout with the digital video cameras positioned on either side of the platform as proposed by Sheng Chen et al. [16].

Chao M et al. [7]. They had used the FER 2013 database, with a fixed image size of 48 x 48 pixels in grayscale color mode. A three-layer CNN model was used in their paper. The results were not precise enough to support any conclusive agreement on the emotion states, achieving 50-60% accuracy.

Recently, in 2019, Boonroungrut et al. proposed student emotion identification in a classroom environment by using cloud facial recognizer technology [8]. Interestingly, this technology has also been utilized in marketing to measure and explore group satisfaction.

Several emotion identification algorithms for facial expression analysis have been discussed in the literature survey. Machine learning methods [9] such as Support Vector Machine (SVM) [10], Random Forest Regression [11], Logistic Regression [12], Neural Networks (NN) [13], Hidden Markov Model [14] and AdaBoost [15] are different types of classifiers used for the identification and classification of emotions.

In a prior work, Chen et al. [16] proposed a classroom teaching feedback system based on machine learning based facial emotion recognition, with a complete classroom setup for capturing live streaming video to identify facial emotions and provide feedback to the instructor. As we discussed above, classical machine learning models do not learn rich representations from the data; they only predict or classify. Hence this model did not achieve better results compared to the previous works discussed above; the layout is shown in Fig. 2.

Models based on neural networks are very effective in capturing complex facial patterns from facial images. Both supervised and unsupervised learning approaches are used to train neural networks. Since finding a large enough training dataset is questionable, unsupervised neural networks are preferable. Recent studies show that the Convolutional Neural Network (CNN) has become the most predominant technique used in the field of image processing.

In 2018, another classroom-based environment designed by Tang et al. [17] used a classroom assessment and feedback system that utilizes computer vision technology. With the help of the FER 2013 dataset, a CNN model was trained and used to predict the emotion states of the students. An NVIDIA Jetson TX1, the first supercomputer module to run a GPU computing architecture, was used to train the neural network, where the 150K steps of the final model took around 4 hours to train.

Another classroom teaching feedback-based system was proposed and implemented by Chen et al. [16]. Two GS3-U3 digital video cameras connected to an HP laptop (CPU: I7-855, 8GB RAM, Graphics: NVIDIA MX15), used as the host computer, were employed to train and test the neural network model. A Multi-Layer Perceptron (MLP) classifier was used in this study to classify the emotion states.

III. EXISTING SYSTEM FRAMEWORKS

In this paper, we intend to review those research works and their system frameworks to analyze the work done thus far in this field of emotion recognition in a classroom environment. For this investigation, we have selected four previous research works that focus on the analysis of student emotions in a classroom arrangement. The selection of these papers was done based on the following criteria: the year of publication of the research work (we have primarily considered the most recent research papers); the methodology used in the work (we have selected papers that use different methodologies for feature extraction and classification of emotions); and finally the physical design and implementation: how the hardware setup consisting of the cameras and the processing unit has been implemented in the classroom, the student population, and the computing power of the processing unit used for facial image processing. We aim this survey at the facial emotion identification of students in a classroom.

For our review, we have considered Abdulkareem Al-Alwani's [18] proposal on Mood Extraction Using Facial Features to Improve Learning Curves of Students in E-Learning Systems, where there is no direct supervision involved in gauging the emotions of the students. Another work we have taken up for review was originally presented by Sahla K. S. and T. Senthil Kumar [19] for assessing classroom teaching based on student emotions. This work is unique in that the system was designed to capture video of the teacher to predict his/her emotions along with the students' emotions, hence providing a two-way dissection of the emotions experienced by both the instructor and the students and offering more insight into the classroom lectures. Our third paper for review was presented by Tang et al. [17] for the design of intelligent classroom-based facial recognition using a CNN and the FER-2013 facial image dataset. This work is one of the few that use the FER-2013 dataset for the classification of students' emotions in a classroom. Since we have planned to use the FER-2013 dataset for our proposed work, we found it appropriate to analyze their work more closely. The final research work considered here was by Chen et al. [16], who proposed a real-time feedback system with a camera array to capture the facial reactions of the students in the classroom and identify their emotions, assisting the instructor in handling the learning process of the students dynamically.

A. Dataset

In the research works [18] [19], the Cohn-Kanade AU-Coded (CK+) Facial Expression Database was used. This is an extension of the former dataset, referred to as the Cohn-Kanade (CK) dataset, with an increased number of subjects and image sequences. Every peak expression in this dataset is fully FACS-coded and a prototypic emotion label is assigned.
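Both CK+ and FER 2013 are usually consumed as plain arrays; FER 2013 in particular is commonly distributed as a single CSV of 48x48 grayscale pixel strings. A minimal loading sketch, assuming the common `fer2013.csv` column layout (`emotion`, `pixels`, `Usage`) rather than anything specific to the reviewed papers:

```python
import csv
import io
import numpy as np

def load_fer2013(lines):
    """Parse FER2013-style CSV rows into (images, labels, usages).

    Assumes the common fer2013.csv layout: `emotion` is a class
    index 0-6, `pixels` holds 2304 space-separated grayscale
    values forming a 48x48 image, and `Usage` marks the split.
    """
    images, labels, usages = [], [], []
    for row in csv.DictReader(lines):
        img = np.array(row["pixels"].split(), dtype=np.uint8)
        images.append(img.reshape(48, 48))
        labels.append(int(row["emotion"]))
        usages.append(row["Usage"])
    return np.stack(images), np.array(labels), usages

# Tiny in-memory stand-in for the real CSV file.
demo = io.StringIO(
    "emotion,pixels,Usage\n"
    "3," + " ".join("0" for _ in range(2304)) + ",Training\n"
)
images, labels, usages = load_fer2013(demo)
```

In practice the same function can be fed an open file handle over the full 35,887-row CSV.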


Fig. 3. Sample Expressions in a) CK+ (Extended Cohn-Kanade) and b) JAFFE [23].

For 6-class expression recognition, the three or four most expressive images from each sequence were selected.

Fig. 4. Sample Images from the FER 2013 Dataset [24].

To build the 7-class dataset, the foremost image (neutral expression) from each of the 309 sequences was selected and added to the 6-class dataset. This is shown in Fig. 3(a), where a sample of the seven prototypic expressions is taken from the CK+ dataset.

The research works [16] [17] have used the FER 2013 dataset, which includes around 35,000 human facial expression images covering seven emotion categories: anger, disgust, fear, happiness, sadness, surprise, and calm. The image label breakdown shows a skew towards happiness and away from disgust, as can be clearly seen from the facial expressions in the image (Fig. 4).

B. Face Detection

In the research work [18], the authors did not use any face detection algorithm, most likely because theirs was an e-learning system where, most probably, only one individual is visible in the frame. Hence, there was no need to detect the face, and the facial expressions were directly analyzed, as will be discussed in the next section. However, a new method was used by the researchers in their paper [19], termed Key Frame Extraction, to avoid duplicate frames among the stored video frames, which cause slower processing. Each video frame is evaluated by the intersection of color histograms, and the corresponding values are noted. The resulting values are compared with threshold values, a screen-cut is detected, and the frame is removed from the training. The Viola-Jones algorithm [6] was used to detect the faces in the selected frames.

Tang et al. in [17] used a convolutional model for detecting the face as well as predicting the facial expressions, alongside machine learning schemes such as Support Vector Machines (SVM), Logistic Regression and Random Forest Regression. Once the face in the image is located, it is cropped and resized to 48 x 48 pixels.

Chen et al. in [16] used multiple methods for face detection, such as geometric feature-based methods, template matching methods, and subspace LDA, to detect the face at different angles from the two cameras positioned on both sides of the classroom. But there can be some hindrances in capturing the facial features well due to low light conditions, background noise, hand postures etc., and normalization of the recognized faces can help eliminate this problem.

Fig. 5. Distance calculated between Facial Objects [18]

C. Feature Extraction

In the research work [18], the facial features were categorized into four different categories: eyes, eyebrows, lips, and head. A Hidden Markov Model (HMM) was used to extract the facial features; it differs from other approaches, like template-based and appearance-based ones, in that it builds a series of observation vectors from the face pattern, as represented in Fig. 5. The identified features are then labelled as D1, D2, D3… etc., and fed to the neural network. Any change in the distance metrics points to an instance of a facial feature, and the collection of these features can be used to classify the six facial emotions. Certain threshold values were provided for the distances to make decisions related to the facial emotions, which can then be used to classify an unknown pattern.

In their work [19], the researchers used Haar Cascades based facial feature extraction, a well-known technique, and implemented it in OpenCV. This cascade classifier consists of several stages, where each stage is an ensemble of weak learners, as shown in Fig. 6. In each stage, the classifier labels the region defined by the current position of the sliding window as either positive or negative.

Fig. 6. Haar Cascade Feature Extraction model [25].

The authors [17] in their paper proposed the Regions with CNN (R-CNN) method to exactly identify the target position in the whole image, since region proposals use precise information like color and texture in an image. This overcomes the problem of missing features associated with the previous work that used the sliding-window method, by achieving a candidate window of better quality than the previous sliding window with its fixed aspect ratio. This region proposal approach results in extracting the facial features well. In the research work [16], the Gabor filter and the Discrete Wavelet Transform (DWT) were used for feature extraction.
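The key-frame extraction step described for [19] above, which compares color-histogram intersections against a threshold, can be sketched as follows. The bin count and threshold value are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Normalized per-channel color histogram of an HxWx3 uint8 frame."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def extract_key_frames(frames, threshold=0.8):
    """Keep a frame only when its histogram intersection with the last
    key frame drops below `threshold`, i.e. the content changed enough;
    near-duplicate frames are skipped."""
    key_frames, last_hist = [], None
    for i, frame in enumerate(frames):
        h = color_histogram(frame)
        if last_hist is None or np.minimum(h, last_hist).sum() < threshold:
            key_frames.append(i)
            last_hist = h
    return key_frames

# Demo: two identical dark frames followed by one bright frame.
frames = [np.zeros((8, 8, 3), np.uint8),
          np.zeros((8, 8, 3), np.uint8),
          np.full((8, 8, 3), 255, np.uint8)]
keys = extract_key_frames(frames)  # the duplicate frame is dropped
```

The histogram intersection of two identical frames is 1.0, so the second frame is discarded, while the bright third frame intersects the first at 0 and becomes a new key frame.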


The Gabor wavelet transform, a potent image processing algorithm that mimics the perception of the human visual system, can improve edge detection in images. A feature vector was calculated for the facial features using a Gaussian kernel function, as seen in Fig. 7. The image demonstrates precisely how the kernel output changes from low-level features like eye corners and eyebrow edges to advanced features like the eyes, nose and mouth.

Fig. 7. Facial features extraction with Gabor Filter and Discrete Wavelet Transform [16].

D. Training and Feature Learning

A radial basis function neural network was the preferred method used in [18] to classify emotions such as happy, sad, surprised, confused and disturbed using the facial features that encode the distance points of the eyes, mouth and lips. A data association mining technique was used to estimate unknown relationships and decision rules in the dataset to improve decision making and the prediction of human facial expressions. The radial basis function network for facial expressions based on the distance-based approach is shown in Fig. 8.

Fig. 8. Radial Basis Function Neural Network Model

The research work [19] used a CNN with Local Binary Patterns (LBP) encoding, where the cropped faces are processed to obtain the inputs for the CNN model. The classification of emotions is carried out as discussed below:
- The LBP encoding method was applied to the pixels.
- The metric space values were transformed from unordered code values by using Multi-Dimensional Scaling (MDS).
- The CNN model was trained with the RGB cropped face images given as input to the deep CNN model. The class with the maximum average prediction is the final classification.

The researchers [17] experimented with three different machine learning algorithms, multi-class SVM, Random Forest and Logistic Regression, for comparison with a deep learning technique, the CNN, to classify the human emotions. The CNN architecture for face detection and emotion recognition is given in Fig. 9. In the preprocessing stage, fixed-size (48x48) images are given to the input layer. But first, the faces are detected in each image using AdaBoost to locate and crop them, which in turn reduces the size of the images. The input layer is then transferred to the Convolution2D layer, where the number of filters is stated as a hyper-parameter. Each filter, like a sliding window, moves through the whole image to create a feature map with shared weights. The feature maps constructed by the convolution layer show where the pixel values are elevated, for instance by edge, light, and pattern detection. The CNN model produced better results than the machine learning techniques.

Fig. 9. Generic Convolutional Neural Network Model that takes 48x48 images as input and passes them through three sequential convolution blocks, with the feature maps doubled in every block, followed by two sequential dense layers with a SoftMax activation function at the output [17].

Fig. 10. Multi-Layer Perceptron based Emotion Classification Model [16]

Sheng Chen, Jianbang Dai and Yizhe Yan in [16] proposed an MLP (Multi-Layer Perceptron) emotion classifier that adopts the sigmoid function and the back-propagation network for training, which can reduce falling into a local optimum. Fig. 10 shows the classification methodology of the MLP, where the perceptron output is determined by multiplying by weights and adding a bias. With this method, to express the students' emotion states like angry, disgust, sadness, happy, surprise, fear and normal, a simple mapping of emotions to the students' learning state is realized by classifying them into positive, negative, insignificant and unrecognized.

IV. PROPOSED SYSTEM FRAMEWORK

The system architecture for the proposed emotion recognition framework is represented in Fig. 11.

A. Dataset Description

There are certain publicly available databases which contain the basic human expressions and are widely used in emotion identification systems. Both the databases used here consist of 7 expressions. CK+ consists of 123 subjects and 593 samples, and FER2013 has 35,887 sample images. Our intention here is to use the FER2013 database as the training set and apply transfer learning for a new dataset such as CK+.
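The LBP encoding step used by [19] above can be sketched in its basic 3x3 form: every interior pixel is replaced by an 8-bit code built from threshold comparisons with its eight neighbours. The neighbour ordering below is one common convention, assumed here rather than taken from the paper:

```python
import numpy as np

def lbp_encode(gray):
    """Basic 3x3 Local Binary Patterns over a 2-D grayscale array.

    Each interior pixel becomes an 8-bit code: bit k is set when the
    k-th neighbour (clockwise from top-left) is >= the center pixel.
    """
    g = gray.astype(int)
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = g.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = g[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offs):
        neigh = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh >= center).astype(np.uint8) << bit
    return out

# A uniform patch yields the all-ones code 255 everywhere; a lone
# bright center pixel yields code 0 (no neighbour reaches it).
flat = lbp_encode(np.full((5, 5), 7, dtype=np.uint8))
spike = np.zeros((3, 3), dtype=np.uint8)
spike[1, 1] = 10
spike_code = int(lbp_encode(spike)[0, 0])
```

Histograms of these codes over image cells would then form the texture features fed onward, as in the LBP-based works cited above.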

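The emotion-to-learning-state mapping described for [16] above can be illustrated with a small sketch. The particular assignment of each emotion to positive/negative/insignificant below is a hypothetical example, since the paper only names the four target states:

```python
from collections import Counter

# Hypothetical mapping in the spirit of [16]: seven predicted
# emotions collapsed into coarse learning states. The exact
# assignment here is illustrative, not taken from the paper.
EMOTION_TO_STATE = {
    "happy": "positive", "surprise": "positive",
    "angry": "negative", "disgust": "negative",
    "sad": "negative", "fear": "negative",
    "neutral": "insignificant",
}

def class_mood(predicted_emotions):
    """Summarize per-face emotion predictions into learning states,
    counting anything outside the seven labels as 'unrecognized'."""
    states = [EMOTION_TO_STATE.get(e, "unrecognized")
              for e in predicted_emotions]
    return Counter(states)

# One label per detected face in a frame.
summary = class_mood(["happy", "neutral", "sad", "happy", "???"])
```

The resulting counts are the kind of class-level summary an instructor feedback system could report per lecture segment.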

B. Face Detection

There are a few models that are frequently used for extracting the face from an image, such as the Viola-Jones, Haar Cascade and dlib face detectors. The efficiency of the face detection hinges on the effective uncovering of faces in images that contain other objects and human body parts.

Fig. 11. Proposed Facial Emotion Classification Model – Process Flow Diagram

The proposed method for face detection is Viola-Jones [6]. In a multi-camera or dual-camera assembly, it is not hard to identify the faces, since one or more of the cameras may have the proper alignment for easy face detection. Our proposed system uses a single digital camera pointing straight ahead towards the students. This algorithm can be used to detect the faces in a video frame and crop the detected faces into separate images of a fixed size, 150x150 pixels. The cropped images can be converted into grayscale if necessary, and each detected face is highlighted by drawing a rectangular region around it. To remove the background and other edge-lying obscurities, the subject's face must be cropped from the original image based on the positions of the eyes. There are some landmark locations provided for an image, and each location represents some position on the face. Using those landmark locations, the distance between the eyes is calculated once the midpoints of the left and right eyes are identified. As seen in Fig. 12, the face is then cropped using empirically selected percentages of D, with the center of the left eye as a reference point.

Fig. 12. Viola-Jones proposed Facial features extraction process [23].

C. Feature Extraction

After due consideration of all the previously discussed works on feature extraction, the Haar Cascades extraction method would be the most suitable technique for high-performance images and high-resolution faces. This is because, as already noted for [16], the Gabor filter and wavelet transform extract only certain facial features, missing them when the face is not aligned towards the camera. In such a situation, it is hard to train the model and classify the emotions with high accuracy.

D. Training and Feature Learning

The proposed method is to implement a CNN model with transfer learning. For image data, so-called deep CNNs have proved to perform similarly to, or even better than, humans in some recognition tasks. One of the main assumptions of most CNN architectures is that the inputs to these networks are raw images; this aids them in encoding specific properties into the network structure. The CNN transforms the dimensions of the image layer by layer through the various activation functions in the different layers, until it finally reduces to a single vector of scores assigned to each class. These scores are arranged along the depth of the final layer.

Convolutional Networks (ConvNets) typically consist of three layer types, namely CONV, POOL and FC (fully connected). These layers are stacked to form a full ConvNet architecture. The activation function used for the network at hand is the Rectified Linear Unit (ReLU), R(z) = max(0, z), applied as an elementwise activation function. Symbolically, such a network can be described by [INPUT-CONV-RELU-POOL-FC]. Once the features are learnt through the ConvNet architecture, the classification of emotions takes place.

Deep Learning Algorithm 1: CNN Training Model on the FER 2013 dataset

Stage 1. Pre-Processing:
1) Load data: database: FER2013, image size: 48x48 = 2304 vector, #classes = 7 [0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral].
2) Split data: (training : test) = (28273, 7067).
3) Augment data: rotation, scaling, shift along X and Y axes.
Stage 2. Creating the Network:
Add layers sequentially: [CONV-CONVNORM-RELU-POOL]x3 [FC].
Stage 3. Training the Network:
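The eye-distance-based cropping step described above can be sketched as follows. This is a minimal NumPy illustration: the `crop_face` helper and its offset fractions are assumptions made for the sketch, not the exact percentages of D used in the paper.

```python
import numpy as np

def crop_face(image, left_eye, right_eye, scale=1.6):
    """Crop a face region based on the two eye-centre landmarks."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    # Inter-eye distance D between the eye-centre landmarks.
    d = float(np.hypot(rx - lx, ry - ly))
    # Left-eye centre as the reference point, offset by fractions of D
    # (illustrative values) to frame forehead, chin and cheeks.
    x0 = int(max(lx - 0.45 * d, 0))
    y0 = int(max(ly - 0.60 * d, 0))
    side = int(scale * d)
    x1 = min(x0 + side, image.shape[1])
    y1 = min(y0 + side, image.shape[0])
    return image[y0:y1, x0:x1]

img = np.zeros((200, 200), dtype=np.uint8)   # dummy grayscale image
face = crop_face(img, left_eye=(80, 90), right_eye=(130, 90))
print(face.shape)  # → (80, 80)
```

In practice the landmark coordinates would come from the face detection stage rather than being supplied by hand.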
Published By:
Retrieval Number F9170088619/2019©BEIESP Blue Eyes Intelligence Engineering
DOI: 10.35940/ijeat.F9170.088619 4696 & Sciences Publication
International Journal of Engineering and Advanced Technology (IJEAT)
ISSN: 2249 – 8958, Volume-8 Issue-6, August 2019
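The elementwise ReLU activation R(z) = max(0, z) used between the convolutional layers amounts to clipping negative activations to zero; a minimal NumPy sketch:

```python
import numpy as np

# Elementwise ReLU, R(z) = max(0, z): negative activations are clipped
# to zero, positive scores pass through unchanged.
def relu(z):
    return np.maximum(0.0, z)

scores = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(scores))
```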
TABLE I
EMOTION CLASSIFICATION RESULTS ACROSS REVIEWED RESEARCH WORKS

Authors | Dataset | Face Detection Methods | Feature Extraction | Classifier Models | Results
Abdulkareem Al-Alwani [18] | CK+ | -NA- | Hidden Markov Model | Radial Basis Function | 70%
Sahla K. S. and T. Senthil Kumar [19] | CK+ | Haar Cascades | Haar Cascades | CNN with LBP Encoding | 80.1%
Jielong Tang, Xiaotian Zhou, Jiawei Zheng [17] | FER 2013 | CNN Model | R-CNN | SVM, Random Forest, Logistic Regression | 59.3%, 55.1%, 54.0%
Sheng Chen, Jianbang Dai and Yizhe Yan [16] | FER 2013 | Geometric Feature, Template Matching, Subspace LDA | Gabor Filter, Discrete Wavelet Transform | Multi-layer Perceptron (MLP) | 85%

Num_epochs = 100.
Fit model on batches with real-time augmentation.
Stage 4. Learning decision:
Determine loss on the training and test sets over the training epochs.
Stage 5. Making Predictions:
Test on individual images.
Evaluate the trained model on the test set.

E. Transfer Learning
A deep CNN is made up of many nodes, resulting in a high number of weights to be trained and therefore requiring a large amount of training data when trained from scratch. Since our focus lies on the recognition of the emotional classes, the training effort can be reduced by using a pre-trained model from the deep learning algorithm already devised. Since the weights are already known, small datasets like the Extended Cohn-Kanade (CK+) [21] dataset can also be used. In doing so, training the network becomes a fine-tuning of the last 10 layers and the output layer's weights. This notion of representations learnt from pretrained networks trained on one dataset (FER2013) being transferred to a different dataset (CK+) for facial expression recognition is explored here. The DL architecture that we plan to deploy is VGG16.

Deep Learning Algorithm 2 - Transfer Learning: Pre-trained VGG16 on CK+ Database
For the comparison, we have used the VGG-16 network by Simonyan and Zisserman [20], shown in Fig. 13, from the Keras library for Python with the TensorFlow backend. As described earlier, the network was pre-trained with the FER2013 model weights. We then adapt the work in [22] to use pretrained CNNs for learning and classifying samples from the smaller CK+ database. We choose to adapt the bottleneck features of a pretrained network to build a model for the CK+ dataset. For the problem at hand, the output layer of the network was truncated and replaced by a SoftMax layer with seven output nodes. To avoid overfitting, we added a dropout layer before the output layer. The VGG16 architecture is represented as [[CONVx2-POOL]x2 [CONVx3-POOL]x3 FCx3]. Only the convolutional part of VGG16, up to the fully connected layers, is instantiated. This model is run once on the training and validation data of CK+, thus recording the bottleneck features from the VGG16 model. A fully connected model is then trained on top of the stored features.

Fig. 13. VGG16 Architecture [26].

V. RESULT AND DISCUSSION
We have analyzed the previous research works done on facial emotion identification in a classroom environment. The best results were obtained by Chen et al. [16], with 85% overall accuracy, based on the FER2013 facial emotion dataset consisting of 7 expressions. The results obtained by each of the reviewed methods are presented in Table I. With our proposed system, shown in Fig. 14, we plan to achieve even better classification accuracy in predicting the emotions, with the proposed CNN model and transfer learning applied to our pre-trained VGG16 on the CK+ dataset.

Fig. 14: Proposed Classroom Emotion Recognition System
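Deep Learning Algorithm 1 above can be sketched in Keras, the library named earlier. The filter counts and kernel sizes below are illustrative assumptions; the paper fixes only the block structure [CONV-CONVNORM-RELU-POOL]x3 [FC], the 48x48 grayscale input and the seven output classes.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_fer_cnn(input_shape=(48, 48, 1), n_classes=7):
    """[CONV-CONVNORM-RELU-POOL]x3 [FC], as in Stage 2 of Algorithm 1."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):              # three conv blocks
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.BatchNormalization()(x)     # the "CONVNORM" step
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    # FC layer producing one score per emotion class.
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_fer_cnn()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 7)
```

Stages 1 and 3 would then use Keras's real-time augmentation (rotation, scaling, X/Y shifts) and fit the model on batches of the FER2013 training split for the stated 100 epochs.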
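The bottleneck-feature procedure of Section E can be sketched as follows. In this sketch, `weights=None` and the random stand-in batch are placeholders so the code is self-contained; the paper instead starts from its FER2013-pretrained VGG16 weights and runs the real CK+ training and validation images through the convolutional base.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Instantiate VGG16 only up to its convolutional blocks (no FC layers).
conv_base = VGG16(weights=None, include_top=False, input_shape=(48, 48, 3))

# Run the convolutional base once over the data to record the
# "bottleneck" features (random stand-in batch replaces CK+ images).
images = np.random.rand(4, 48, 48, 3).astype("float32")
bottleneck = conv_base.predict(images, verbose=0)

# Fully connected classifier trained on top of the stored features:
# a dropout layer before the seven-node SoftMax output, as in Section E.
top = keras.Sequential([
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),
])
preds = top(bottleneck)
print(bottleneck.shape, tuple(preds.shape))
```

Recording the bottleneck features once and training only the small top model is what makes fine-tuning on a dataset as small as CK+ practical.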
TABLE II
PROPOSED EMOTION CLASSIFICATION APPROACH RESULTS

Emotion | Result | Emotion | Result
Happy | 95% | Fear | 84.2%
Surprise | 4.1% | Sad | 12.4%
Neutral | 0.9% | Neutral | 3.4%
Sad | 86.4% | Happy | 96.6%
Neutral | 9.7% | Surprise | 3.4%
Fear | 3.9% | |

VI. CONCLUSION
Face detection and emotion recognition are among the challenging problems in the field of computer vision. In this paper, we have presented a detailed survey of various research works implemented in a classroom environment, as seen from the comparison of databases and methodologies in Table I. We have also inferred that Chen et al. achieved satisfactory results compared to the other methods in this comparison, as they concentrated intensively on the pre-processing steps to extract the facial features, which can be difficult considering the different angles and sitting postures of the students. After due consideration and building on the techniques and methodologies from the discussed research works, we have proposed an improved methodology that can overcome the shortcomings of the above discussed frameworks by focusing on pre-processing and classification of emotions. For pre-processing, the Haar Cascade, OpenCV and Viola-Jones algorithms performed well in extracting the facial features. For analyzing emotions, the transfer learning methodology that we have used can analyze emotions in real time with faster processing than the other discussed works. This works especially well for faces that are slightly blurred or at different angles, where it is hard to conclusively identify the emotions. Even so, there can be certain pitfalls in every approach, and this has provided scope for our future work, which can combine speech with facial features to analyze the emotions of the classroom as a whole, termed group emotions, presenting the overall mood or sentiment of the classroom during the lecture.

REFERENCES
1. J.D. Bauer, H.F. II Kenneth, and R.N. Flores, "Intelligent human-machine interface," U.S. Patent No. 7,966,269, 21 Jun. 2011.
2. S.J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Malaysia: Pearson Education Limited, 2016.
3. M. Suk and B. Prabhakaran, "Real-time mobile facial expression recognition system - a case study," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014, pp. 132-137.
4. K. Mistry, L. Zhang, S.C. Neoh, M. Jiang, A. Hossain and B. Lafon, "Intelligent appearance and shape based facial emotion recognition for a humanoid robot," The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), Dhaka, 2014, pp. 1-8.
5. L.B. Krithika, "Student Emotion Recognition System (SERS) for e-learning improvement based on learner concentration metric," Procedia Computer Science 85, 2016, pp. 767-776.
6. O.H. Jensen, "Implementing the Viola-Jones face detection algorithm," MS thesis, Technical University of Denmark, DTU, Lyngby, Denmark, 2008.
7. C. Ma, C. Sun, D. Song, X. Li and H. Xu, "A deep learning approach for online learning emotion recognition," 2018 13th International Conference on Computer Science & Education (ICCSE), Colombo, 2018, pp. 1-5.
8. C. Boonroungrut, T.T. Oo, and K. One, "Exploring classroom emotion with cloud-based facial recognizer in the Chinese beginning class: a preliminary study," International Journal of Instruction 12.1, 2019, pp. 947-958.
9. M.S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan, "Recognizing facial expression: machine learning and application to spontaneous behavior," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 2, 2005, pp. 568-573.
10. M. Abdulrahman and A. Eleyan, "Facial expression recognition using support vector machines," 23rd Signal Processing and Communications Applications Conference (SIU), 2015, pp. 276-279.
11. E. Kremic and A. Subasi, "Performance of random forest and SVM in face recognition," Int. Arab J. Inf. Technol. 13.2, 2016, pp. 287-293.
12. C. Xing, X. Geng, and H. Xue, "Logistic boosting regression for label distribution learning," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4489-4497.
13. M. Ali, D. Chan, and M.H. Mahoor, "Going deeper in facial expression recognition using deep neural networks," IEEE Winter Conference on Applications of Computer Vision (WACV), 2016, pp. 1-10.
14. A.V. Nefian and M.H. Hayes, "Hidden Markov models for face recognition," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'98), vol. 5, 1998, pp. 2721-2724.
15. P. Yang, S. Shan, W. Gao, S.Z. Li, and D. Zhang, "Face recognition using Ada-boosted Gabor features," Proc. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 356-361.
16. S. Chen, J. Dai, and Y. Yan, "Classroom teaching feedback system based on emotion detection," 9th International Conference on Education and Social Science (ICESS 2019), 2019, pp. 940-946.
17. J. Tang, X. Zhou, and J. Zheng, "Design of intelligent classroom facial recognition based on deep learning," Journal of Physics: Conference Series, vol. 1168, no. 2, IOP Publishing, 2019.
18. A. Al-Awni, "Mood extraction using facial features to improve learning curves of students in e-learning systems," International Journal of Advanced Computer Science and Applications 7.11, 2016, pp. 444-453.
19. K.S. Sahla and T.S. Kumar, "Classroom teaching assessment based on student emotions," The International Symposium on Intelligent Systems Technologies and Applications, Springer: Cham, 2016, pp. 475-486.
20. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
21. P. Lucey, J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 94-101.
22. S. Ramalingam and F. Garzia, "Facial expression recognition using transfer learning," International Carnahan Conference on Security Technology (ICCST), 2018, pp. 1-5.
23. R.P. Holder and J.R. Tapamo, "Improved gradient local ternary patterns for facial expression recognition," EURASIP Journal on Image and Video Processing 2017.1, 2017, p. 42.
24. C. Pramerdorfer and M. Kampel, "Facial expression recognition using convolutional neural networks: state of the art," 2016.
25. J. Cuevas, A. Chua, E. Sybingco and E.A. Bakar, "Identification of river hydromorphological features using Viola-Jones algorithm," 2016 IEEE Region 10 Conference (TENCON), Singapore, 2016, pp. 2300-2306.
26. D. Acharya, W. Yan and K. Khoshelham, "Real-time image-based parking occupancy detection using deep learning," 2018.
AUTHORS PROFILE

Dr. Archana Sharma received her Doctorate from Jodhpur National University, Jodhpur, and her MCA from the University of Jammu, India. She is currently working as Assistant Professor, Department of Computer Science, Govt. M.A.M. College, Cluster University of Jammu. She has 12 years of teaching experience at the University of Jammu. Her research areas are data mining and artificial intelligence. She has published several papers in national and international journals.

Dr. Vibhakar Mansotra received his Doctorate in Computer Science from the University of Jammu, and holds an M.Sc. and M.Phil. (Physics), PGDCA, and M.Tech. (IIT Delhi), India. He is currently working as Professor and Former Head of the Department of Computer Science and IT, Dean of the Faculty of Mathematical Science, Director of CITES&M, Coordinator IGNOU (S.C-1201), University of Jammu, and Chairperson Division-IV, Computer Society of India. He has 26 years of teaching experience at the University of Jammu. His research areas are data mining, software engineering and information retrieval. He has published several papers in national and international journals.