Emotion Triggered H.264 Compression
Now, to understand the importance of a compression technique, let us consider the videos
shown in the figure. We notice that there is a large amount of redundancy in the consecutive
frames of these raw video files. The main aim of any compression technique is to remove the
redundancy present in a raw video signal, thereby reducing the number of binary bits required to
represent the raw video.
In order to compress a raw video, many standards are available nowadays, such as MPEG-1,
MPEG-2, JPEG, JPEG2000, and many more. A standard itself does not define the encoding
process. Rather, it defines the syntax in which the original video is represented in its compressed
form, along with the method to decode the compressed data and obtain the decoded output. It
also ensures that the encoder and decoder are compliant with each other, which means that a
raw video encoded by the encoder can be successfully decoded by a compliant decoder.
Video compression standards like MPEG-1 and MPEG-2 were developed by the Moving
Pictures Expert Group (MPEG) and are now widely used for the communication and storage of
digital video. The widely popular JPEG and JPEG2000 standards were developed by the Joint
Photographic Experts Group (JPEG) for coding still images. The ITU-T Video Coding Experts
Group developed the H.263+ standard. The latest development by the ITU-T and the Joint Video
Team (JVT) led to the H.264 standard for video compression, which is widely used nowadays.
The main advantage of the H.264 standard over the previous standards is its improved
compression efficiency for low bit-rate encoding of video sequences.
Measuring the visual quality of the video at the output of the decoder is not an easy task, as
many factors are involved in the quality-measurement process. The viewer's state of mind or
personal opinion about the quality is, for example, one important factor. Another factor is the
kind of video being encoded. For example, a person watching a football match may not look at
the details of the audience watching the match, but the same person may look closely at the
facial expressions of a newsreader reading the news on TV. The objective for which the encoding
is done also affects the quality measure; for example, people may expect a high-quality video
output for a videoconferencing or surveillance scene.
Keeping all these factors in mind, a very commonly used procedure for subjective quality
assessment of video is outlined in ITU-R Recommendation BT.500, known as the Double
Stimulus Continuous Quality Scale (DSCQS) method. The experimental setup for the
procedure is shown in Figure 2.
Figure 2. Double Stimulus Continuous Quality Scale (DSCQS) method (original video sequence → encoder → decoder → display)
In this procedure for assessing the subjective quality of the video, the viewer is shown two
versions of the same video sequence. One version (version A) is the original or reference video,
and the other (version B) is the encoded and decoded one. These two versions are shown to the
viewer in random order, and he/she rates the quality of the two versions on a continuous scale
from 'Excellent' to 'Bad'. Many such videos, each comprising two versions, are shown to the
viewer to arrive at the final assessment of the quality of the encoder and decoder.
PSNR = 10 log10 [ (2^n − 1)^2 / MSE ]
Here, MSE (Mean Square Error) is calculated between the original video and the reconstructed
video at the output of the decoder, and n is the number of bits per image sample. Although PSNR
is a convenient measure of the quality of the reconstructed video sequence, its calculation
requires the original video sequence, which may not always be available.
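For concreteness, the following is a minimal sketch of how the above PSNR formula could be evaluated for a pair of frames, assuming 8-bit samples held in NumPy arrays; the function name and data layout are our own.

import numpy as np

def psnr(original: np.ndarray, decoded: np.ndarray, bits_per_sample: int = 8) -> float:
    """PSNR in dB between an original frame and its decoded version."""
    orig = original.astype(np.float64)
    dec = decoded.astype(np.float64)
    mse = np.mean((orig - dec) ** 2)           # Mean Square Error over all samples
    if mse == 0:
        return float("inf")                    # identical frames
    peak = (2 ** bits_per_sample - 1) ** 2     # (2^n - 1)^2, e.g. 255^2 for 8-bit video
    return 10.0 * np.log10(peak / mse)

# Example: average PSNR of the luma plane over a short sequence
# frames_orig, frames_dec: lists of HxW uint8 arrays (hypothetical data)
# avg_psnr = np.mean([psnr(o, d) for o, d in zip(frames_orig, frames_dec)])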
Original: [Frames 1 to 6 of the raw video sequences]
Figure 4: Plot of average PSNR in dB versus quantization step size for the “akiyo_orig.yuv” sequence
[Plot of average PSNR in dB versus bit rate in kbps]
We encode the video with QP = 1 and QP = 25 (Figure 7b and c).
From earlier research results, it is evident that the most important regions for recognizing the
emotion of a person are the eyes and lips. For emotion-based QP setting of the macroblocks, the
ROI is therefore the eye and lip regions of the person. The macroblocks belonging to these
regions are encoded with a smaller QP value than the rest of the frame. This approach leads to
better quality in the ROI, so the emotion of the person is not lost due to compression.
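The chapter does not fix the exact QP values, so the following sketch only illustrates the idea under assumed numbers (base_qp = 30, roi_qp = 22): macroblocks whose 16×16 luma block overlaps the eye/lip ROI mask receive the smaller QP. The mask format and function name are our own.

import numpy as np

MB_SIZE = 16  # H.264 macroblock size in luma samples

def qp_per_macroblock(roi_mask: np.ndarray, base_qp: int = 30, roi_qp: int = 22) -> np.ndarray:
    """Return a QP value for every macroblock of a frame.

    roi_mask: HxW boolean array, True for pixels inside the eye/lip ROI.
    Macroblocks that overlap the ROI are coded with the smaller QP (better quality),
    the remaining macroblocks with the larger base QP.
    """
    h, w = roi_mask.shape
    mb_rows, mb_cols = h // MB_SIZE, w // MB_SIZE
    qp_map = np.full((mb_rows, mb_cols), base_qp, dtype=np.int32)
    for r in range(mb_rows):
        for c in range(mb_cols):
            block = roi_mask[r*MB_SIZE:(r+1)*MB_SIZE, c*MB_SIZE:(c+1)*MB_SIZE]
            if block.any():           # macroblock touches the ROI
                qp_map[r, c] = roi_qp
    return qp_map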
1.5 Methodology
The two main steps involved in emotion-based H.264 encoding are: (i) generation of a look-up
table that records, for every frame, the recognized emotion and the macroblocks belonging to the
ROI, and (ii) encoding of the video with the QP of each macroblock set according to this
look-up table.
A look-up table needs to be generated for all the frames in the video sequence. This look-up table
comprises information such as the frame number, the emotion expressed in a given frame, and the
macroblock numbers involved in the different ROI portions of the frame. Figure 8 shows the
schematic diagram for the look-up table generation for all the frames in a video sequence.
Figure 8: Schematic diagram for Look-up Table generation for a video sequence (video frames 1 to n pass through Feature Extraction and Emotion Recognition to produce the Look-up Table)
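One possible in-memory form of such a look-up table is sketched below; the field and function names are hypothetical and only mirror the information listed above (frame number, recognized emotion, ROI macroblock numbers).

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LookupEntry:
    frame_number: int
    emotion: str                # e.g. "anger", "disgust", "happiness", "relax", "fear"
    roi_macroblocks: List[int]  # indices of macroblocks covering the eye and lip regions

def build_lookup_table(entries: List[LookupEntry]) -> Dict[int, LookupEntry]:
    """Index the per-frame entries by frame number for use during encoding."""
    return {e.frame_number: e for e in entries}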
In order to recognize the emotion of a person in a frame, a fuzzy face-space is first constructed,
which comprises the primary and secondary membership curves of 10 known subjects. Considering
5 emotions (anger, disgust, happiness, relax and fear) and 5 facial features, we get 5×5×10 = 250
primary and secondary curves in the fuzzy face-space.
The first step is to separate the skin and non-skin regions of the image. The skin region
detection is performed in the HSV (Hue-Saturation-Value) color model. Two parameters,
namely x and y, are identified based on specific formulae; the computation of x and y is done
for each pixel according to those equations. A pixel is said to be a skin pixel provided the values
of the parameters x, y and H of the pixel satisfy the following inequalities:
140 ≤ x ≤ 195
140 ≤ y ≤ 165
0.01≤ H ≤ 0.1
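Since the formulae for x and y are not reproduced here, the sketch below assumes these two parameters have already been computed per pixel and only applies the thresholds stated above; the array and function names are our own.

import numpy as np

def skin_mask(x: np.ndarray, y: np.ndarray, hue: np.ndarray) -> np.ndarray:
    """Boolean skin mask from per-pixel x, y and HSV hue values.

    x, y : HxW arrays of the two parameters defined in the text (formulae not shown here).
    hue  : HxW array of H values in the range [0, 1].
    A pixel is declared skin only when all three inequalities hold.
    """
    return (
        (x >= 140) & (x <= 195) &
        (y >= 140) & (y <= 165) &
        (hue >= 0.01) & (hue <= 0.1)
    )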
As the skin region detection is based purely on color-value matching, parts of the body other
than the face and neck are also included in the skin regions. For example, Fig. 2(b), obtained by
applying the skin separation procedure to Fig. 2(a), contains portions of both hands apart from
the face and neck region. It is also possible that the skin separation procedure detects some
skin-colored regions in the image background, provided the background color values fall within
the skin range. These regions, along with the non-facial skin portions, do not contribute to the
features required for emotion detection; on the contrary, they may lead to incorrect emotion
inference. In order to achieve high accuracy in emotion detection, the unwanted skin regions and
glitches need to be removed. The next step, 'Face and Neck Region Extraction', is carried out for
that purpose.
In order to filter out the unwanted skin regions, the column-wise sum is obtained for all the
columns. The non-zero-column-sum windows are then marked by grouping adjacent columns
having non-zero column sums. Among these windows, the window with the maximum size
corresponds to the face and neck portion. The remaining windows are discarded by setting the
pixel values in those windows to zero. For example, in Fig. 2(b) we obtain three windows. From
left to right, the windows are: i) the right-hand region window, ii) the face and neck region
window, and iii) the left-hand region window. We select the face and neck region window as it
has the maximum size among the three; the left- and right-hand region windows are thus
eliminated. Similar operations are carried out row-wise on the new image to find the maximum
row window (based on non-zero row sums). In this way, the glitches and unnecessary skin
patches are eliminated. After the face and neck region extraction of the image in Fig. 2(b), we
obtain the image shown in Fig. 2(c).
Function: Face_and_neck_region_extraction
Input: Skin-segmented image P of m × n pixels
Output: Face and neck region extracted image P_face of h × w pixels
Begin
  S := ϕ // a set holding the sum of pixel values of each column
  count := 1 // counter for the number of windows
  For i := 1 to n do Begin
    S(i) := Σ_{k=1}^{m} P(k, i)
  End For
  window := ϕ
  min_col_list := ϕ
  max_col_list := ϕ
  window_size := 0
  min_col := 1
  max_col := n
  For i := 1 to n do Begin
    If S(i) ≠ 0 then
      window_size := window_size + 1
    Else
      If window_size ≠ 0 then
        window(count) := window_size
        min_col_list(count) := min_col
        max_col_list(count) := (i − 1)
        count := count + 1
      End If
      window_size := 0
      min_col := (i + 1)
    End If
  End For
  Find the index i for which window(i) is maximum;
  min_col := min_col_list(i)
  max_col := max_col_list(i)
  Eliminate the other windows by setting their pixel values to 0;
  Repeat the above operations row-wise to determine the row boundaries (min_row and max_row);
  return P_face := P(min_row to max_row, min_col to max_col)
End
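For readers who prefer a runnable form, the following NumPy sketch mirrors the column-window/row-window idea of the pseudocode above: it keeps the largest run of non-zero column sums and then the largest run of non-zero row sums. The function names are our own and edge handling is simplified.

import numpy as np

def largest_nonzero_window(sums: np.ndarray):
    """Return (start, end) indices of the longest run of non-zero entries in a 1-D array."""
    best = (0, -1)
    start = None
    for i, v in enumerate(sums):
        if v != 0 and start is None:
            start = i
        if (v == 0 or i == len(sums) - 1) and start is not None:
            end = i - 1 if v == 0 else i
            if end - start > best[1] - best[0]:
                best = (start, end)
            start = None
    return best

def face_and_neck_region(skin: np.ndarray) -> np.ndarray:
    """Crop a skin-segmented image to its largest column window and then its largest row window."""
    c0, c1 = largest_nonzero_window(skin.sum(axis=0))     # column-wise sums
    cropped = skin[:, c0:c1 + 1]
    r0, r1 = largest_nonzero_window(cropped.sum(axis=1))  # row-wise sums
    return cropped[r0:r1 + 1, :]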
3. Localization of Eye Region Search Area for Left and Right Eye
A sharp change in color is used as the key to localize the eye region. For example, moving down
from the forehead region, the eyebrow region has a distinctly different color from the forehead.
In this approach, the column-wise sum is first computed for each column. The columns whose
sum is less than 50% of the maximum column sum are eliminated; this narrows the search region
by redefining the min_col and max_col values. Next, in order to detect sharp color changes, we
take the row sums enclosed by the column boundaries (say sum_1 to sum_h) and calculate the
gradient between consecutive row sums, i.e., the difference between the sums of the i-th and
(i+1)-th rows. Taking the ten maximum gradients and their corresponding row indices, the least
row index among these gives the row where the eyebrows are located. If the left and right
eyebrows are not aligned in the same row, we get the row of the upper eyebrow. Once the
eyebrows are located, the upper limit of the search area is defined. The lower boundary of the
search area is limited to the upper limit plus half of the column window width (max_col − min_col).
The locations of the left and right boundaries for both eyes are determined as described in the
following algorithm:
Function: Search_area_for_eyes()
Input: P_face of h × w pixels
Output: Left_eye_area of a × b pixels; Right_eye_area of c × d pixels
Begin
  S := ϕ // a set holding the sum of pixel values of each column
  For i := 1 to w do Begin
    S(i) := Σ_{k=1}^{h} P_face(k, i)
  End For;
  Find the maximum of set S and store it in max_col_sum;
  For i := 1 to w do Begin
    If S(i) ≤ (0.5 × max_col_sum) then
      S(i) := 0
    End If
  End For;
  Mark the beginning and end of the non-zero values of S to get the min_col and max_col values;
  Z := ϕ // a set holding the difference of the sums of pixel values of two consecutive rows
  For j := 1 to h−1 do Begin
    Z(j) := Σ_{k=1}^{w} P_face(j, k) − Σ_{k=1}^{w} P_face(j+1, k)
  End For;
  Find the 10 maximum values of Z and store their corresponding row indices in A;
  min_row := min(A); // row location of the eyebrow
A look at Fig. 2 reveals that there is a sharp change in color while moving from the forehead
region to the eyebrow region. Thus, to detect the location of the eyebrow, we take the average
intensity (in the three primary color planes) over each row of the image from the top and identify
the row with the maximum dip in all three planes. This row indicates the top of the eyebrow
region (Fig. 3b). Similarly, we detect the lower eyelid by identifying the row with a sharp dip in
intensity in all three planes while scanning the face upwards from the bottommost row. The
location of the top eyelid is identified by scanning the face upwards from the marked lower
eyelid until a dip in all three color planes is noted.
Function: Estimation_of_eye_features()
Input: Left_eye_area (L_image) of a × b pixels
Output: Estimates of EOL and LEEL
Begin
  S := ϕ // set holding the differences of the row sums, scanning from top to bottom
  For j := 1 to a−1 do Begin
    S(j) := Σ_{k=1}^{b} L_image(j, k) − Σ_{k=1}^{b} L_image(j+1, k)
  End For;
  Find the 10 maximum values of S and store their corresponding row indices in A;
  Eyebrow_row := min(A); // row location of the eyebrow
  S := ϕ // set holding the differences of the row sums, scanning from bottom to top
  For j := a−1 downto 1 do Begin
    S(j) := Σ_{k=1}^{b} L_image(j+1, k) − Σ_{k=1}^{b} L_image(j, k)
  End For;
  Find the 10 maximum values of S and store their corresponding row indices in A;
  Lower_Eyelid_row := max(A); // row location of the lower eyelid
  S := ϕ // set holding the differences of the row sums between the eyebrow and the lower eyelid
  For j := Eyebrow_row to Lower_Eyelid_row do Begin
    S(j) := Σ_{k=1}^{b} L_image(j, k) − Σ_{k=1}^{b} L_image(j+1, k)
  End For;
  Find the 10 maximum values of S and store their corresponding row indices in A;
  Upper_Eyelid_row := min(A); // row location of the upper eyelid
  // Feature list
  return EOL := Lower_Eyelid_row − Upper_Eyelid_row;
  return LEEL := Lower_Eyelid_row − Eyebrow_row;
  // EOR and LEER can be estimated similarly using the right eye search area as the input image
End;
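A compact NumPy rendering of the same row-sum-gradient idea is given below; it assumes the input is a 2-D intensity array of the left (or right) eye search area, that the lower eyelid lies below the eyebrow, and it simplifies the off-by-one row conventions of the pseudocode.

import numpy as np

def eye_features(eye_area: np.ndarray, top_k: int = 10):
    """Estimate EOL and LEEL from a 2-D intensity array of the eye search area."""
    row_sums = eye_area.sum(axis=1).astype(np.float64)
    drop_down = row_sums[:-1] - row_sums[1:]   # intensity drop from row j to row j+1
    drop_up = row_sums[1:] - row_sums[:-1]     # intensity drop from row j+1 to row j

    # Rows with the ten strongest downward drops; the smallest index is the eyebrow.
    eyebrow_row = int(np.sort(np.argsort(drop_down)[-top_k:])[0])
    # Rows with the ten strongest upward drops; the largest index is the lower eyelid.
    lower_eyelid_row = int(np.sort(np.argsort(drop_up)[-top_k:])[-1]) + 1
    if lower_eyelid_row <= eyebrow_row:
        raise ValueError("lower eyelid not found below the eyebrow")

    # Search for the upper eyelid between the eyebrow and the lower eyelid.
    between = drop_down[eyebrow_row:lower_eyelid_row]
    upper_eyelid_row = eyebrow_row + int(np.sort(np.argsort(between)[-top_k:])[0])

    eol = lower_eyelid_row - upper_eyelid_row   # eye opening
    leel = lower_eyelid_row - eyebrow_row       # lower eyelid to eyebrow distance
    return eol, leel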
[Table: step-by-step feature extraction, one row per emotion (HAPPY, DISGUST, FEAR, ANGER, RELAX), showing the original image, skin region, face-neck region, right and left eye search areas, mouth search area, lip cluster, and extracted facial features]
1.6 Experimental Details and Results
The experiment is conducted with two sets of subjects: a) the first set of 10 subjects (n = 10) is
considered for designing the fuzzy face-space, and b) the other set of 30 facial expressions, taken
from 6 unknown subjects, is considered to validate the proposed emotion classification scheme.
The experiment thus consists of two distinct phases, as indicated in the next two sub-sections.
a. Construction of the fuzzy face-space
The type-2 fuzzy face-space contains both primary and secondary membership distributions for
each facial feature. In order to create the primary curves, we consider 10 known subjects. Ten
instances of one subject expressing a given emotion, say anger, are considered. We record the ten
values of a given facial feature, say EOL, from these ten snapshots. The mode of these values is
taken and the second moment about the mode is calculated. A bell-shaped curve is drawn with
its peak at the mode and the second moment about the mode as its standard deviation.
Since we have 5 facial features and the experiment includes 5 distinct emotions of 10
subjects, we obtain 10×5×5 = 250 primary membership curves. These 250 membership curves are
grouped into 25 groups, each containing the 10 membership curves of the ten subjects for a
specific feature representing a given emotion. The primary membership curves for the different
features and emotions are shown in Figure 5.
[Figure 5: Primary membership curves — primary membership plotted against feature value, with one panel per emotion (e.g., FEAR, RELAX)]
[Plots of the secondary membership distributions against feature value and primary membership]
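As a hedged sketch, one primary membership curve could be generated from the ten readings of a single feature for one subject and one emotion as follows, using the mode as the peak and the second moment about the mode as the spread; the Gaussian (bell) shape and the function name are our assumptions.

import numpy as np

def primary_membership(samples, feature_axis):
    """Bell-shaped primary membership curve from ten readings of one facial feature."""
    samples = np.asarray(samples, dtype=np.float64)
    values, counts = np.unique(samples, return_counts=True)
    mode = values[np.argmax(counts)]                    # peak of the curve
    spread = np.sqrt(np.mean((samples - mode) ** 2))    # second moment about the mode
    spread = max(spread, 1e-6)                          # avoid division by zero if all samples coincide
    axis = np.asarray(feature_axis, dtype=np.float64)
    return np.exp(-0.5 * ((axis - mode) / spread) ** 2) # membership in [0, 1], equal to 1 at the mode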
b. Emotion recognition of an unknown person
The process of emotion recognition for an unknown person is divided into two steps, as
outlined below.
1. Feature Extraction
The facial features are extracted as described in Section IV-A. The extracted features are self-
normalized by dividing each feature obtained in a given emotional state by its value in the
relaxed state. This nullifies the effect of the variation in distance during image capture. The step-
by-step feature extraction for the unknown facial image, and for the same person in the relaxed
state, is shown in tabular form. The extracted features for Fig. 7 are listed in Table IB.
As mentioned earlier, we follow two main steps to recognize the emotion of the person.
[Step-by-step feature extraction for the unknown facial image: original image, skin region, face-neck region, right and left eye search areas, mouth search area, lip cluster, extracted facial features]
TABLE IB
Calculated Feature Values for the Unknown Facial Image
EOL    EOR    MO    LEEL    LEER
7      10     19    25      27
The step-by-step facial feature extraction for the same person in the relaxed state is shown in
Table IIA, and the extracted features are tabulated in Table IIB.
TABLE IIA
Step-by-Step Feature Extraction Method for the Same Person in the Relaxed State
[Table: original image, skin region, face-neck region, right and left eye search areas, mouth search area, lip cluster, extracted facial features]
TABLE IIB
Calculated Feature Values for the Same Person in the Relaxed State
EOL    EOR    MO    LEEL    LEER
11     13     20    35      36
The final feature list for the unknown facial image is obtained by dividing the features of the
unknown image by the corresponding features of the same person in the relaxed state. Thus,
dividing the feature values in Table IB by those in Table IIB, we obtain Table III.
TABLE III
Normalized Feature Values for the Unknown Facial Image
EOL    EOR    MO     LEEL    LEER
0.64   0.77   0.95   0.71    0.75
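The division behind Table III can be reproduced directly from the values in Tables IB and IIB; a small sketch:

import numpy as np

features = ["EOL", "EOR", "MO", "LEEL", "LEER"]
unknown_state = np.array([7, 10, 19, 25, 27], dtype=float)   # Table IB
relaxed_state = np.array([11, 13, 20, 35, 36], dtype=float)  # Table IIB

normalized = unknown_state / relaxed_state                   # Table III
for name, value in zip(features, normalized):
    print(f"{name}: {value:.2f}")  # EOL: 0.64, EOR: 0.77, MO: 0.95, LEEL: 0.71, LEER: 0.75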
2. Consulting the Fuzzy face-space for emotion recognition
TABLE IVA
Consulting the Fuzzy Face-Space
Each row lists, for one of the ten known subjects, the primary membership µpri, the secondary membership µsec and their product µpri×µsec of the measured feature value under each emotion:
Feature (value) | Anger: µpri, µsec, µpri×µsec | Disgust: µpri, µsec, µpri×µsec | Happy: µpri, µsec, µpri×µsec | Fear: µpri, µsec, µpri×µsec | Relax: µpri, µsec, µpri×µsec
0.04 0.61 0.02 0.50 0.45 0.23 0.36 0.45 0.16 0.02 0.61 0.01 0 0.64 0
0 0.59 0 0.96 0.91 0.87 0 0.61 0 0 0.67 0 0 0.65 0
0 0.61 0 0.02 0.53 0.01 0.02 0.53 0.01 0.5 0.6 0.3 0 0.67 0
EOL(0.64) 0.16 0.61 0.09 0.01 0.53 0.01 0 0.65 0 0.01 0.6 0.01 0 0.64 0
0 0.62 0 0.02 0.45 0.01 0.01 0.53 0.01 0 0.68 0 0 0.64 0
Dipti_disgust 0.07 0.61 0.04 0.81 0.91 0.74 0.08 0.45 0.04 0 0.68 0 0 0.61 0
0.29 0.58 0.17 0.04 0.53 0.02 0.8 0.91 0.73 0 0.67 0 0 0.66 0
0.01 0.61 0.01 0.26 0.45 0.12 0 0.63 0 0 0.68 0 0 0.66 0
0 0.63 0 0 0.53 0 0 0.66 0 0.01 0.62 0.01 0 0.65 0
0.3 0.6 0.18 0.96 0.9 0.86 0.11 0.64 0.07 0 0.60 0 0 0.6 0
0 0.67 0 0.52 0.44 0.23 0.38 0.47 0.18 0.03 0.58 0.02 0 0.58 0
0 0.69 0 0.88 0.89 0.78 0.01 0.66 0.01 0.01 0.65 0.01 0 0.59 0
0.19 0.63 0.12 0.02 0.56 0.02 0 0.49 0 0.44 0.62 0.27 0 0.65 0
EOR (0.77) 0 0.58 0 0.01 0.45 0 0.85 0.91 0.77 0.02 0.69 0.01 0 0.63 0
0.03 0.60 0.02 0.83 0.89 0.74 0 0.61 0 0 0.62 0 0 0.67 0
0.25 0.59 0.15 0.27 0.46 0.12 0 0.69 0 0 0.64 0 0 0.69 0
0 0.62 0 0 0.52 0 0.15 0.62 0.09 0.01 0.69 0.01 0 0.61 0
0.03 0.62 0.02 0.98 0.91 0.89 0.01 0.45 0 0 0.62 0 0 0.60 0
0 0.69 0. 0.01 0.56 0.02 0.09 0.53 0.05 0 0.60 0 0 0.65 0
0.34 0.61 0.21 0.05 0.56 0.03 0 0.69 0 0 0.68 0 0 0.69 0
0.51 0.72 0.36 1 0.9 0.9 0.94 0.72 0.67 0.77 0.67 0.51 0.39 0.72 0.28
0.85 0.77 0.65 0.49 0.68 0.33 0.94 0.77 0.72 0.93 0.77 0.71 0.32 0.72 0.23
MO 0.9 0.72 0.65 0 0.6 0 1 0.9 0.9 1 0.9 0.9 0.38 0.72 0.27
(0.95) 0.99 0.9 0.89 0.13 0.6 0.08 0.86 0.6 0.51 0.75 0.67 0.51 0.03 0.72 0.02
0.46 0.45 0.21 0.44 0.68 0.3 0.99 0.9 0.89 0.88 0.77 0.67 0.45 0.72 0.3
0.62 0.68 0.42 0.72 0.72 0.51 0.95 0.77 0.73 0.89 0.67 0.59 0 0.72 0
0.84 0.77 0.64 0.36 0.68 0.24 0.92 0.77 0.71 0.92 0.77 0.71 0.18 0.72 0.12
0 0.6 0 0.9 0.9 0.81 0 0.51 0 0 0.6 0 0.01 0.72 0.01
0.02 0.51 0.01 0.9 0.9 0.81 0.96 0.6 0.57 0.16 0.51 0.08 0.05 0.72 0.03
0.81 0.60 0.49 0 0.6 0 0.91 0.77 0.71 1 0.9 0.9 0.21 0.72 0.15
0.14 0.48 0.07 0.9 0.9 0.81 0 0.45 0 0.5 0.48 0.24 0 0.5 0
0 0.58 0 0.01 0.6 0.01 0 0.48 0 0.01 0.5 0.01 0 0.5 0
0.06 0.54 0.03 0 0.64 0 0.57 0.48 0.27 0.09 0.5 0.05 0 0.48 0
0.35 0.48 0.2 0.56 0.8 0.45 0 0.45 0 0.05 0.48 0.02 0 0.49 0
LEEL (0.71) 0.55 0.45 0.25 0.84 0.9 0.76 0 0.48 0 0 0.49 0 0 0.48 0
0.45 0.53 0.24 0 0.45 0 0.86 0.6 0.52 0.01 0.45 0 0 0.5 0
0.03 0.53 0.02 0.97 0.9 0.87 0 0.48 0 0.03 0.49 0.01 0 0.48 0
0 0.48 0 0 0.6 0 0 0.48 0 0 0.49 0 0 0.48 0
0 0.58 0 0.99 0.9 0.89 0 0.53 0 0 0.5 0 0 0.49 0
0.12 0.48 0.06 0.99 0.9 0.89 0 0.48 0 0 0.5 0 0 0.5 0
TABLE IVB
Final Range for Facial Features
Emotion | Range of Features (EOL, EOR, MO, LEEL, LEER) | Range after intersection | Centre Value
Anger 0 – 0.18 0 – 0.21 0 – 0.89 0 – 0.18 0.09
Disgust 0 – 0.87 0 – 0.89 0 – 0.9 0 – 0.87 0.435
Happy 0 – 0.73 0 – 0.77 0 – 0.9 0 – 0.52 0.26
Fear 0 – 0.3 0 – 0.27 0 – 0.9 0 – 0.24 0.12
Relax 0–0 0–0 0 – 0.3 0–0 0
From the last column of Table IVB, it is seen that the emotion 'disgust' has the highest centre
value among all the emotion classes. Thus, it is concluded that the emotion expressed by
unknown facial image 1 is disgust.
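The last two columns of Table IVB follow from the per-feature µpri×µsec products; a sketch of the computation, assuming each feature's range runs from 0 up to the largest product observed for that emotion and that the centre value is the midpoint of the intersected range:

def emotion_ranges(products_per_feature):
    """Intersect per-feature ranges [0, max product] and return (upper bound, centre value).

    products_per_feature: for one emotion, a list of lists of µpri×µsec values,
    one inner list per facial feature (over the ten known subjects).
    """
    upper_bounds = [max(products) for products in products_per_feature]
    intersection_upper = min(upper_bounds)   # all ranges start at 0
    centre = intersection_upper / 2.0
    return intersection_upper, centre

# Example with the 'Anger' maxima read off Table IVB:
# emotion_ranges([[0.18], [0.21], [0.89]])  ->  (0.18, 0.09)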
UNKNOWN IMAGE 1:
Let us consider the unknown facial image shown in Figure 7. The process of emotion recognition
for the unknown person is divided into two steps, as outlined below.
1. Feature Extraction
TABLE VI
Step-by-Step Feature Extraction Method
[Table: original image, skin region, face-neck region, right and left eye search areas, mouth search area, lip cluster, extracted facial features]
TABLE VIII
Calculated Feature Value
EOL EOR MO LEEL LEER
0.636 0.6371 1.77 0.969 0.968
TABLE III
CALCULATED FEATURE RANGES AND CENTRE VALUE FOR EACH EMOTION
1.7 Conclusion
This chapter proposed a simple and time-efficient scheme for emotion recognition from a pre-
constructed type-2 fuzzy face-space. Experiments reveal that the classification accuracy of
emotion when both type-2 primary and secondary memberships are considered is as high as
96.67%. The accuracy falls by more than 8% when only the type-2 primary memberships are
considered. The classical rule-based method for emotion classification depends largely on the
relational matrix used to represent implication relations. In the present context, the emotion
analysis is intentionally performed on the fuzzy encoded measurement space to make the system
performance robust.
1.8 Summary