
Proc. of the International Conference on Electrical, Computer and Energy Technologies (ICECET 2022)
20-22 July 2022, Prague, Czech Republic

Facial Micro-Expression Recognition (FMER) using Model Compression

2022 International Conference on Electrical, Computer and Energy Technologies (ICECET) | 978-1-6654-7087-2/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICECET55527.2022.9872920

Nikhil Singh
Research Scholar, Department of Electronics & Communication Engineering
Delhi Technological University, Delhi, India
[email protected]

Rajiv Kapoor
Professor, Department of Electronics & Communication Engineering
Delhi Technological University, Delhi, India
[email protected]

Abstract— Emotion recognition based on facial expressions is critical for AI systems, such as social robots, to interact effectively with humans. In the real world, however, it is considerably more difficult to distinguish facial micro-expressions (FMEs) than to recognize general facial expressions with complex emotions. An FME is a form of facial expression that lasts only a fraction of a second, has a modest magnitude, and usually involves only local movement. Because of these properties of MEs (micro-expressions), obtaining ME data is challenging, which is a constraint when applying deep learning algorithms to FME recognition. The facial action coding system (FACS) can likewise encode these FMEs, which indicates that there is a relationship between Action Units (AUs) and FMEs. To exploit this, a knowledge transfer scheme for ME recognition that compresses and transfers information from an AU model is presented in this manuscript. Experiments are carried out on four publicly available ME databases. According to the experimental findings, our model outperforms state-of-the-art systems.

Keywords- Deep Neural Network, FME-Recognition, Knowledge Transfer, Model Compression.

I. INTRODUCTION

Emotion detection technology based on facial expression, action, and voice has recently been investigated as a means of assisting intelligent systems. Recognizing emotions associated with macro-expressions is generally not difficult [1]. In practice, however, people rarely reflect their feelings on their faces. FME emotion recognition is therefore among the most active research areas [2], [3], [4], [5], [6]. In terms of facial expression, even the same emotion may produce vastly different quantitative values. For reliable emotion recognition, we must handle even such FMEs. Regrettably, there have been few previous investigations into FMER. Furthermore, datasets for FMER research are typically derived from broadcast content or films in which actors or actresses deliberately construct their facial expressions [7].

The concept of micro-expression was initially proposed in 1966, and micro-expressions were first described in 1969: in a dialogue between a psychotherapist and a depressed client, researchers discovered that the client, who smiled frequently, was in fact concealing several very painful emotions. MEs are quick, involuntary, and impulsive facial gestures that frequently arise when people experience strong emotions. An ME lasts for only a short period of time, normally 1/25 to 1/3 of a second. For macro-expression recognition, careful temporal selection is beneficial [8]; as a result, we use three different temporal selection strategies to recognize MEs. The FACS, a widely used coding system in the field of macro-affect recognition, can also encode micro-expressions. It defines a set of predefined coding sheets, each of which represents a certain facial movement and is referred to as an AU (action unit).

Previous FMER research can be summarized as follows. An FME recognition technique using a temporal interpolation model and a random forest was proposed in [9]. The authors of [10] presented a hierarchical spatio-temporal descriptor that controls feature weights by searching for subtle facial muscle actions. Much FMER research has used LBP-TOP (Local Binary Patterns on Three Orthogonal Planes) and similar models [11]; for micro-expression analysis, the Local Binary Pattern with Six Intersection Points (LBP-SIP) was suggested, whose key contribution is that it reduces the dimensionality of the features and improves feature extraction efficiency. Using Main Directional Mean Optical flow (MDMO), [12] devised an FMER system; they used sparse coding to extract the atomic features representing the region of interest from the optical flow data and then classified the result using an SVM. An ME grand challenge (MEGC) has been run for several years to stimulate the competitive development of FME recognition algorithms [13].
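To illustrate the LBP-style texture features behind methods such as LBP-TOP and LBP-SIP mentioned above, the following is a minimal sketch of basic local binary pattern extraction on a grayscale patch; LBP-TOP extends the same idea to the XY, XT, and YT planes of a video volume. This is an illustration only: the neighborhood ordering is one common convention, and the helper names are not taken from any specific paper.

```python
# Minimal LBP sketch: each pixel is compared with its 8 neighbors,
# producing an 8-bit code; a histogram of codes is the texture feature.
# LBP-TOP computes such histograms on three orthogonal planes of a video.

def lbp_code(img, y, x):
    """8-bit LBP code for pixel (y, x); neighbors ordered clockwise."""
    c = img[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= c:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            hist[lbp_code(img, y, x)] += 1
    n = (h - 2) * (w - 2)
    return [v / n for v in hist]

patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
hist = lbp_histogram(patch)  # single interior pixel -> one nonzero bin
```

In practice the face region is divided into blocks and the per-block histograms are concatenated, which is what gives LBP-SIP's dimensionality reduction its benefit.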


Fig.1. Block Diagram of Proposed Algorithm
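The flow of Fig. 1 can be sketched end to end as follows. This is a structural illustration only: the function names (`detect_and_align`, `tutor_features`, etc.) are hypothetical stand-ins, each stage is reduced to a stub, and the trained networks and SVM of the paper are not reproduced here.

```python
# Structural sketch of the proposed pipeline (Fig. 1):
# pre-processing -> pre-trained tutor DNN -> shallow (tutee) network -> SVM.
# All names and stub implementations are illustrative placeholders.

def detect_and_align(frame):
    """Stand-in for Viola-Jones detection + affine alignment + resize.
    Returns a 160x160 'face crop' (here just a fixed-size placeholder)."""
    return [[0.0] * 160 for _ in range(160)]

def tutor_features(face):
    """Stand-in for F_inst: output of the pre-trained DNN's last two
    fully connected layers (placeholder feature vector)."""
    return [sum(row) for row in face[:8]]

def tutee_features(face):
    """Stand-in for the shallow network's features, trained to mimic
    the tutor's (model compression / knowledge transfer)."""
    return tutor_features(face)  # after distillation, F_sh approximates F_inst

def classify(features):
    """Stand-in for the final SVM over spliced shallow-network features."""
    return "negative" if sum(features) <= 0 else "positive"

def recognize_micro_expression(frames):
    # Temporal phase selection would pick onset/peak/offset frames here;
    # we simply take the middle frame as the 'peak' placeholder.
    peak = frames[len(frames) // 2]
    face = detect_and_align(peak)
    return classify(tutee_features(face))

label = recognize_micro_expression([None, None, None])
```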

Deep learning schemes are very common in the domain of affective computing. Deep learning, however, requires an abundance of training data. When the training data are not sufficient, we typically pre-train the system on a related database before fine-tuning it on the target database. For FMEs, this data-size issue is always present.

To overcome the aforementioned problems, we offer a model compression scheme that compresses and transfers multi-model information from AUs for ME detection, using the ground truth of critical temporal sequences. An objective function is used to construct a tutor-tutee correlative structure. Finally, knowledge relating to multi-level AUs is collected for ME recognition based on critical temporal sequences.

The rest of this paper is laid out as follows. Section 2 delves into the specifics of the suggested methodology. Section 3 describes the experiments in full. Section 4 presents our conclusions and plans for future work.

II. METHODOLOGY

A. Pre-Processing

Temporal phase selection, face identification and synchronization, and image size normalization are all part of the data pre-processing procedure. ME representation is, in general, a phased and physiologically limited structure. It features a general temporal phase structure that includes phases such as onset, peak, offset, and neutral, among others.

To detect and recognize the face, we use the Viola-Jones algorithm. Through an affine transformation, all sequences are aligned to a basis face. The frames are then resized to 160 x 160 pixels.

B. Transfer Learning Framework

We take into account the properties of FMEs as well as the known relationship between AUs and MEs, and apply facial AU domain knowledge to the FME domain. In this manuscript, a model compression based FMER system is proposed. It consists of a shallow NN and a pre-trained instructor DNN. The goal of this design is to apply the deep learning scheme to a tiny ME database and use AU information for MER.

1. Pre-trained DNN

The residual network is divided into three groups. Each group contains four blocks, the structure of which is depicted in Fig. 2. A residual block with identity mapping can be expressed as:

p^(l+1) = p^l + R(p^l, t^l)    (1)

where p^l is the input, p^(l+1) is the output, R is the residual function, and t^l are the block's parameters. Our model's loss is determined as follows:

F_inst = β1 F1 + β2 F2    (2)

where F_inst is the neural network leading loss, and F1 and F2 are the AU detection and face identification losses, respectively. The weights β1 and β2 are introduced to balance the sub-tasks.

Fig.2. Residual block architecture

F_inst is formed from the output of the last two fully connected layers. The shallow neural network is then guided by these features [14].

2. Shallow Neural Network

The training of a deep network with many parameters necessitates a lot of computing [15], [16]. However, the micro-expression data set is insufficient to facilitate deep network training. As a result, we attempt to reduce the network size while preserving performance, by means of model compression. F_tut is made up of the output features from the last two fully connected layers [17]. In this paper, we define MEs as a sort of expression that uses characteristics similar to those used in ordinary expressions. As a result, for a given input image, F_sh should be similar to F_inst, which is the kernel of the given network. Hence, the loss is given by:

F_sh = η1 F1 + η2 F2    (3)

where F_sh is the shallow network leading loss.
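Equations (1)-(3) can be made concrete with a small numeric sketch. The residual update and the weighted loss combinations below follow the formulas directly; the particular residual function `R`, the loss values, and the weights are illustrative placeholders, not the paper's trained components.

```python
# Sketch of Eq. (1): identity residual block, p_(l+1) = p_l + R(p_l, t_l),
# and Eqs. (2)-(3): weighted combinations of the AU-detection loss F1 and
# the face-identification loss F2. All values and weights are placeholders.

def residual_block(p, R, t):
    """Identity residual mapping of Eq. (1) on a feature vector p."""
    r = R(p, t)
    return [pi + ri for pi, ri in zip(p, r)]

# A toy residual function R(p, t): scale the features by the block's
# (placeholder) parameter t.
R = lambda p, t: [t * pi for pi in p]

p_l = [1.0, 2.0, 3.0]
p_next = residual_block(p_l, R, t=0.5)

def combined_loss(F1, F2, w1, w2):
    """Weighted sub-task loss: used as F_inst (Eq. 2) with (beta1, beta2)
    for the tutor, and as F_sh (Eq. 3) with (eta1, eta2) for the tutee."""
    return w1 * F1 + w2 * F2

F1, F2 = 0.8, 0.4          # placeholder AU-detection / face-ID losses
F_inst = combined_loss(F1, F2, w1=0.7, w2=0.3)  # tutor loss, Eq. (2)
F_sh   = combined_loss(F1, F2, w1=0.5, w2=0.5)  # shallow loss, Eq. (3)
```

The tutor and tutee share the same loss form; only the balancing weights differ, which is what lets the shallow network inherit the tutor's AU knowledge.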
III. PERFORMANCE EVALUATION

A. Training Model Procedure

The FERA 2017 dataset was used to train the DNN after data pre-processing, and the best model was kept. F_inst was then extracted as prior knowledge to help the shallow network learn MER on the target ME database using the pre-trained model. The output from the shallow network's last two fully connected layers was then retrieved and spliced, and the final results were obtained using an SVM. At a given time, one participant is chosen as testing data, while the others are used as training data (leave-one-subject-out). Finally, the classification performance is computed by averaging the identification outcomes over all subjects. This is also in line with the protocol of the comparison methods, which we discuss later.

B. Qualitative Comparison

The qualitative comparison is done in terms of accuracy, mean absolute error (MAE), and F-measure.

1. Accuracy

The accuracy of a model is determined by how well it recognizes correlations and patterns between variables in a dataset using the input, or training, data. It is given as:

Acc = Σ_{i=1}^{z} TP_i / Σ_{i=1}^{z} (TP_i + FP_i)    (4)

where TP_i and FP_i are the true and false positives for class i, and z is the number of ME classes.

2. MAE

The mean absolute error (MAE) is a measure of the error between paired observations describing the same phenomenon. It is given as:

MAE = Σ_{i=1}^{z} |FP_i − TP_i| / z    (5)

3. F-Measure

The F-score, often known as the F-measure, is a measurement of a test's accuracy. It combines precision, the number of true positive results divided by the total number of positive predictions (including those mistakenly recognized as positive), and recall, the number of true positive results divided by the total number of samples that should have been detected as positive. It is written as:

F-Score = 2PR / (P + R)    (6)

where P is precision and R is recall.

TABLE I: STATE-OF-THE-ART COMPARISON ON THE CASME DATASET

Reference        | Accuracy | MAE  | F-Score
[22]             | 65.45    | 7.72 | 0.58
[23]             | 57.89    | 8.36 | 0.61
[24]             | 72.61    | 7.58 | 0.67
Proposed Scheme  | 74.22    | 7.22 | 0.69

TABLE II: STATE-OF-THE-ART COMPARISON ON THE SAVEE DATASET

Reference        | Accuracy | MAE  | F-Score
[22]             | 72.44    | 7.77 | 0.69
[23]             | 74.53    | 8.19 | 0.58
[24]             | 76.06    | 7.55 | 0.71
Proposed Scheme  | 77.12    | 7.47 | 0.85

TABLE III: STATE-OF-THE-ART COMPARISON ON THE SMIC DATASET

Reference        | Accuracy | MAE  | F-Score
[22]             | 72.06    | 7.55 | 0.63
[23]             | 74.72    | 7.52 | 0.77
[24]             | 71.47    | 8.19 | 0.71
Proposed Scheme  | 75.05    | 7.32 | 0.80

TABLE IV: STATE-OF-THE-ART COMPARISON ON THE SAMM DATASET

Reference        | Accuracy | MAE  | F-Score
[22]             | 73.65    | 8.03 | 0.79
[23]             | 70.88    | 7.69 | 0.73
[24]             | 86.74    | 7.47 | 0.83
Proposed Scheme  | 90.22    | 6.81 | 0.86

IV. CONCLUSION

This paper presents a model compression based facial micro-expression recognition scheme that compresses and transfers information from AU recognition to MER. The results of the experiments reveal that our model outperforms the most recent FMER approaches. The study's findings suggest that temporal

selection is linked to MER. In the future, an end-to-end learning framework that considers temporal selection and FMER at the same time will be investigated.

REFERENCES

[1] D. Y. Choi and B. C. Song, "Facial micro-expression recognition using two-dimensional landmark feature maps", IEEE Access, vol. 8, pp. 121549-121563, 2020, doi: 10.1109/ACCESS.2020.3006958.
[2] A. C. Le Ngo, Y.-H. Oh, R. C.-W. Phan and J. See, "Eulerian emotion magnification for subtle expression recognition", Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 1243-1247, Mar. 2016.
[3] Y. Wang, J. See, Y.-H. Oh, R. C.-W. Phan, Y. Rahulamathavan, H.-C. Ling, et al., "Effective recognition of facial micro-expressions with video motion magnification", Multimedia Tools Appl., vol. 76, no. 20, pp. 21665-21690, Oct. 2017.
[4] R. Zhao, Q. Gan, S. Wang and Q. Ji, "Facial expression intensity estimation using ordinal information", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3466-3474, Jun. 2016.
[5] K. Zhang, Y. Huang, Y. Du and L. Wang, "Facial expression recognition based on deep evolutional spatial-temporal networks", IEEE Trans. Image Process., vol. 26, no. 9, pp. 4193-4203, Sep. 2017.
[6] S. Xie and H. Hu, "Facial expression recognition using hierarchical features with deep comprehensive multipatches aggregation convolutional neural networks", IEEE Trans. Multimedia, vol. 21, no. 1, pp. 211-220, Jan. 2019.
[7] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression", Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.-Workshops, pp. 94-101, Jun. 2010.
[8] B. Sun et al., "Affect recognition from facial movements and body gestures by hierarchical deep spatio-temporal features and fusion strategy", Neural Networks, vol. 105, pp. 36-51, 2018.
[9] T. Pfister, X. Li, G. Zhao and M. Pietikainen, "Recognising spontaneous facial micro-expressions", Proc. Int. Conf. Comput. Vis., pp. 1449-1456, Nov. 2011.
[10] Y. Zong, X. Huang, W. Zheng, Z. Cui and G. Zhao, "Learning from hierarchical spatiotemporal descriptors for micro-expression recognition", IEEE Trans. Multimedia, vol. 20, no. 11, pp. 3160-3172, Nov. 2018.
[11] X. Huang et al., "Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns", Neurocomputing, vol. 175, pp. 564-578, 2016.
[12] Y.-J. Liu, B.-J. Li and Y.-K. Lai, "Sparse MDMO: Learning a discriminative feature for spontaneous micro-expression recognition", IEEE Trans. Affect. Comput., Jul. 2018.
[13] J. See, M. H. Yap, J. Li, X. Hong and S.-J. Wang, "MEGC 2019—The second facial micro-expressions grand challenge", Proc. 14th IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), pp. 1-5, May 2019.
[14] C. Li, Z. Wang and H. Qi, "An efficient pipeline for pruning convolutional neural networks", Proc. 19th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), pp. 907-912, 2020.
[15] Z. Wang, "Zero-shot knowledge distillation from a decision-based black-box model", arXiv preprint arXiv:2106.03310, 2021.
[16] Y. He, G. Kang, X. Dong, Y. Fu and Y. Yang, "Soft filter pruning for accelerating deep convolutional neural networks", arXiv preprint arXiv:1808.06866, 2018.
[17] Z. Wang, "Data-free knowledge distillation with soft targeted transfer set synthesis", arXiv preprint arXiv:2104.04868, 2021.

