
Speech Emotion Recognition Using Machine Learning Approach

S. G. Shaila, A. Sindhu, L. Monish, D. Shivamma, and B. Vaishali

Department of CSE (Data Science), Dayananda Sagar University, Bangalore, Karnataka, India
{shaila-cse,sindhua-cse,monishl-cse,shivammad-cse,vaishalivb-cse}@dsu.edu.in

Abstract. Nowadays, emotion recognition and classification play a vital role in the field of Human-Computer Interaction (HCI). Emotions are recognized through behaviors of the body such as facial expression, voice tone, and body movement. The present research considers Speech Emotion Recognition (SER) as one of the most widely used modalities for identifying emotions. Among the datasets available for SER, the Ravdess dataset is used in this work. Speech is chosen as the modality because of its high temporal resolution, low cost, and absence of risk to subjects. Over the last decades, many researchers have combined SER signals with Brain-Computer Interface (BCI) techniques to detect emotions. The pipeline includes removing noise from audio signals, extracting temporal or spectral features from the audio signals, analyzing the time or frequency domain respectively, and eventually designing a multi-class classification strategy. The paper discusses an approach for identifying and classifying human emotions based on audio signals. The approach uses machine learning techniques such as Random Forest (RF), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Decision Tree (DT) models for classification. The experimental results obtained are promising, with good accuracy in emotion classification.

Keywords: Emotions · Audio Signal · Random Forest (RF) · Multilayer Perceptron (MLP) · Support Vector Machine (SVM) · Convolutional Neural Network (CNN) · Decision Tree (DT) · Ravdess Dataset · Classification

1 Introduction

Emotions play an essential role in human life and in the process of communication between people. They are expressed in many ways, including facial expressions, movement of the body, and speech, and they are one way of conveying one's feelings to others. Hence, researchers nowadays tend to adopt the approach of recognizing human emotions through audio signals. Emotion recognition has become a very active research topic, as it enables effective and easy interaction between computers and people. Emotions can be recognized through different communication channels such as body language, facial expressions, voice, etc. In a face-to-face conversation, the emotions of the person
can be easily analyzed through his/her facial expression and body language. When the conversation and interaction take place through the medium of a channel, however, it is much harder to predict the emotion of the person. Here, speech emotion recognition (SER) is a method of inferring one's emotional state from his/her speech. The main feature by which humans differ from other living beings is modulated vocal sound. The human voice can be characterized by several attributes, such as loudness, pitch, vocal tone, and timbre, and through these vocal attributes, we can analyse human emotions. There are a few universal emotions, such as anger, sadness, happiness, surprise, fear, and neutrality, that any system can be trained to identify. Feature extraction from the human audio signal supports recognizing these emotions. Emotion recognition has also gained importance because it supports physically disabled people who cannot otherwise express their emotions.
The paper compares and contrasts approaches for identifying emotions using SVM, MLP, Random Forest, Decision Tree, and CNN models. SVM stands for Support Vector Machine; it is a supervised learning algorithm that works for both regression and classification problems, although it is mainly used for classification. The main goal of SVM is to find the best boundary that separates the n-dimensional feature space into the specified classes. This boundary is known as a hyperplane, and it allows new data points to be assigned to the correct category in the future. MLP stands for Multilayer Perceptron; it is a feed-forward artificial neural network that generates a set of outputs from a set of inputs. It consists of input, hidden, and output layers, and it is one of the deep learning methods that use backpropagation for training. An MLP connects multiple layers into a single graph that defines the signal path through the nodes; aside from the input nodes, each node has a nonlinear activation function. Random Forest is a supervised learning technique based on the concept of ensemble learning, a method of combining many classifiers for better performance and for solving complex problems. A random forest contains a collection of decision trees and predicts the output by majority voting instead of depending on a single decision tree. The Decision Tree is a type of supervised learning algorithm in which the dataset is represented as a tree; it works for both regression and classification problems and is mainly used for classification. As it is a tree-structured method, the dataset features are represented by internal nodes, decision rules by branches, and the outputs by leaf nodes. CNN stands for Convolutional Neural Network; it is an advanced neural network used here to classify emotions. A CNN automatically detects the important features without any human intervention, and it uses pooling and convolution operations for computational efficiency. The rest of this paper is organized as follows. We review the literature in Sect. 2, and the proposed work is presented in Sect. 3. In Sect. 4, we present the experimental results, and we conclude the paper in Sect. 5.

2 Literature Review

This section discusses the research done in the field of speech emotion recognition. Various authors have used machine learning and deep learning techniques such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) and Bidirectional LSTM networks, and Support Vector Machines (SVM) for predicting human emotions. In [1], the authors analyzed speech using a deep convolutional recurrent network with LSTM in order to automatically learn the best representation of the speech signal; the work uses the RECOLA dataset. The authors in [2] used the IEMOCAP dataset, which contains both motion capture markers and audio data from five pairs of actors, and applied 1D CNN-LSTM and 2D CNN-LSTM network architectures.
The authors in [3] presented a deep neural network for silent video, which does not contain any audio. From silent video, 70% of the mood and condition of the person enacting could be detected; similarly, from the voice of a person, 75% of the mood or condition could be detected. In [4], the authors used the LDC and UGA datasets, which contain emotions of individuals and groups of people. The work uses support vector machines and focuses mainly on gender-specific datasets. The authors in [5] presented a deep learning algorithm, a DCNN, and evaluated its accuracy: 76.96% in the speaker-dependent setting and 65.32% in the speaker-independent setting. In [6], the authors presented two popular machine learning techniques, DNN and SVM, applied them to the IEMOCAP dataset, and compared their performance; the accuracy is less than 55%. The authors in [7] used an SVM classifier to recognize emotion. Emotion recognition is analyzed in two phases: first, extracting 42-dimensional features, and second, classifying them with SVM, achieving an accuracy of 74.62%. The main drawback of these works is that, since the datasets are audio signal-based, a large number of features are used. This paper concentrates mainly on five feature groups.

3 Proposed Work

This section presents the proposed model, shown in Fig. 1. The approach uses the Ravdess dataset as input for experimentation. The dataset is pre-processed and then passed through feature selection and extraction. The approach uses cross-validation for splitting the data into training and validation sets. Classification is done using various machine learning algorithms, and their performance is evaluated.

Fig. 1. Proposed Model for Identifying Emotions

3.1 Dataset Description

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) is a dataset widely used for SER. It contains voice recordings from 24 professional actors, 12 male and 12 female, in a total of 7356 files. Speech includes the emotions neutral, calm, happiness, sadness, anger, surprise, disgust, and fear with two statements: "kids are talking by the door" and "dogs are sitting by the door". These two statements are expressed in all the above-listed emotions. The database contains full audio/video, video-only, and audio-only recordings; as this work focuses on speech, only the audio recordings are used. Each recording has an emotional intensity of either normal or strong. The dataset is summarized in Table 1.

Table 1. Details of the Ravdess Dataset

Dataset  | Actors | Instances | Emotions
RAVDESS  | 24     | 7356      | 8
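Since the labels used for classification come from the recordings themselves, it is worth noting that RAVDESS encodes all metadata, including the emotion, directly in each filename. The following Python sketch (an illustration, not code from the paper) shows one way to recover the labels; the field layout follows the published RAVDESS filename convention:

```python
# RAVDESS filenames look like "03-01-05-01-02-01-12.wav":
# modality - vocal channel - emotion - intensity - statement - repetition - actor.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess_filename(name: str) -> dict:
    """Extract the emotion label and related metadata from a RAVDESS filename."""
    parts = name.removesuffix(".wav").split("-")
    return {
        "emotion": EMOTIONS[parts[2]],
        "intensity": "normal" if parts[3] == "01" else "strong",
        "actor": int(parts[6]),  # 1-24; odd numbers are male, even are female
    }

print(parse_ravdess_filename("03-01-05-01-02-01-12.wav"))
# {'emotion': 'angry', 'intensity': 'normal', 'actor': 12}
```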

3.2 Data Pre-processing

In preprocessing, data augmentation is performed. Data augmentation is a set of techniques that artificially increase the amount of data by generating new samples from existing data points; it involves making small changes to the data in order to improve the performance of the model. It has been noted that synthetic generation of spoken MFCC data can improve the recognition of a speaker from their utterances via transfer learning. One common form of data augmentation, used here, is injecting noise into the dataset to test the robustness of the models.
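As an illustration of the noise injection described above, a minimal sketch (assuming librosa for loading audio; the paper does not publish its augmentation code, so the scaling scheme below is an assumption) could look like this:

```python
import numpy as np
import librosa

def add_noise(signal: np.ndarray, noise_factor: float = 0.005) -> np.ndarray:
    """Inject Gaussian noise, scaled relative to the signal's peak amplitude."""
    noise = np.random.randn(len(signal))
    return (signal + noise_factor * np.max(np.abs(signal)) * noise).astype(signal.dtype)

# Example: create a noisy copy of one RAVDESS clip.
y, sr = librosa.load("03-01-05-01-02-01-12.wav", sr=None)
y_noisy = add_noise(y)
```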

3.3 Feature Selection and Extraction

Feature extraction is a crucial step in audio signal classification. The proposed approach focuses mainly on Chroma, MFCC, Mel, contrast, and Tonnetz features. Chroma is a powerful tool mainly used for analyzing pitch; it is further divided into 12 features. MFCC stands for Mel Frequency Cepstral Coefficients, one of the most popular features for characterizing the vocal tract and hence the speaker; it contains 39 coefficients in total for describing the speaker's audio. Mel features represent the short-term power spectrum. Contrast captures the spectral modulation of speech, and Tonnetz features describe the tonal content. Out of a total of 120 extracted feature values, these five feature groups are the ones utilized in the proposed approach. The dataset is split into training, validation, and testing sets using a 70-20-10 split ratio. The sets are then separated into input X and target Y of the respective categories for further processing. The attributes of the dataset do not all share the same standard deviation, and because of this, certain attributes would otherwise end up being weighted more heavily than others, which motivates standardizing the features before training.
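A hedged sketch of this extraction step with librosa follows (the paper does not give code; the per-group coefficient counts shown are common librosa defaults, not necessarily the paper's exact 120-feature configuration):

```python
import numpy as np
import librosa

def extract_features(path: str) -> np.ndarray:
    """Compute the five feature groups (MFCC, Chroma, Mel, contrast, Tonnetz)
    and average each over time to obtain one fixed-length vector per clip."""
    y, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(y))
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)          # 12 features
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)             # 128 bands
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)  # 7 bands
    tonnetz = np.mean(
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)    # 6 dims
    return np.concatenate([mfcc, chroma, mel, contrast, tonnetz])
```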

3.4 Classification

The dataset includes the audio signal features Chroma, contrast, MFCC, Mel spectrum, and Tonnetz. The dataset contains a total of five feature groups, and classification is done based on these features. The models are trained to predict the emotion of each sample. Initially, the proposed approach used the SVM model with a linear kernel function, where 70% of the data samples are used for developing the prediction model and 30% for validating it. In the next stage, the Random Forest model uses 80% of the data samples for training and 20% for validating the developed model. Further, the proposed approach used a random forest with 100 decision trees for experimenting, with a 70:30 ratio (70% for training and 30% for testing) and 50 epochs. Next, a decision tree is used with three classes. An MLP classifier with 500 iterations and with 300 iterations is used with an 80:20 ratio. The CNN is used with two, three, and four layers. The Random Forest architecture is depicted in Fig. 2, and a sketch of the overall training setup follows below.

Fig. 2. RF Architecture
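A minimal sketch of this experimental setup with scikit-learn (the paper reports the hyperparameters but not code; the file names, random seed, and scaling step are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# X: (n_samples, n_features) matrix from Sect. 3.3; y: emotion labels.
X, y = np.load("features.npy"), np.load("labels.npy")   # hypothetical files
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)   # 70:30 split

scaler = StandardScaler().fit(X_train)   # equalize attribute scales (Sect. 3.3)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "SVM": SVC(kernel="linear"),                                 # linear kernel
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),   # 100 trees
    "MLP": MLPClassifier(max_iter=500),                          # 500 iterations
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```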
SVM stands for Support Vector Machine; it is mainly used for classification problems. The main goal of SVM is to find the best boundary that separates the n-dimensional feature space into the specified classes; this boundary is known as the hyperplane. The decision tree is a type of supervised learning algorithm in which the dataset is represented as a tree. As it is a tree-structured method, the dataset features are represented by internal nodes, decision rules by branches, and the outputs by leaf nodes; the decision of the classifier is represented by decision nodes and the outcome by leaf nodes. Random forest is a supervised learning technique based on the concept of ensemble learning, a method of combining many classifiers for better performance and for solving complex problems. The random forest contains a collection of decision trees and predicts the output by majority voting instead of depending on a single decision tree.

MLP stands for Multilayer Perceptron; it is a feed-forward artificial neural network that generates a set of outputs from a set of inputs. It consists of input, hidden, and output layers, and it is one of the deep learning methods that use backpropagation for training. The MLP connects multiple layers into a single graph that defines the signal path through the nodes; aside from the input nodes, each node has a nonlinear activation function. CNN stands for Convolutional Neural Network; it is an advanced neural network used to classify emotions. The CNN automatically detects the important features without any human intervention, and its pooling and convolution operations make it computationally efficient. It contains an input layer, convolution layers, and an output layer; the hidden part comprises convolution layers, a max-pooling layer, and a flattening layer with dropout. This architecture is depicted in Fig. 3.

Fig. 3. CNN Architecture

Table 2. Performance Evaluation on Testing Data

Algorithm     | Accuracy | F1 Score | Precision | Recall
SVM           | 0.78     | 0.79     | 0.75      | 0.78
Decision Tree | 0.78     | 0.79     | 0.80      | 0.84
Random Forest | 0.85     | 0.88     | 0.87      | 0.89
MLP           | 0.81     | 0.82     | 0.83      | 0.82
CNN           | 0.82     | 0.84     | 0.82      | 0.84
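The paper does not state the exact layer sizes, so the following Keras sketch is only an assumption consistent with the description above and Fig. 3: 1D convolutions over the extracted feature vector, max pooling, flattening with dropout, and a softmax output over the eight emotion classes.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 193   # assumed length of the feature vector from Sect. 3.3
n_classes = 8      # the eight RAVDESS emotions

model = models.Sequential([
    layers.Input(shape=(n_features, 1)),
    layers.Conv1D(64, kernel_size=5, activation="relu"),   # convolution layer
    layers.MaxPooling1D(pool_size=2),                      # max pooling layer
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),                                      # flattening layer
    layers.Dropout(0.3),                                   # with dropout
    layers.Dense(n_classes, activation="softmax"),         # output layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train[..., None], y_train, epochs=50, validation_split=0.2)
```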

4 Results and Discussion

The experimental evaluation considers five different models: SVM, Random Forest, Decision Tree, MLP, and CNN. The 7356 samples were first divided 70:30 (70% for training and 30% for testing); the dataset was then divided 80:20 and the experiment run for 50 epochs. Once the models are trained, the approach uses the confusion matrix as a performance metric to evaluate the algorithms. The confusion matrix is built from the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) counts. From the confusion matrix, classification accuracy, precision, recall, and F1-score are evaluated for all the classifiers. Classification accuracy is evaluated for each model by verifying the true and false labels against the correct classification. The best results were obtained using the five sets of features. Table 2 shows the performance measures of the proposed approach for the SVM, Random Forest, Decision Tree, MLP, and CNN classifiers. The best results are obtained with the Random Forest model with 100 trees, which achieved an accuracy of 0.85.
Thus, the machine learning approach to emotion recognition based on audio signals is observed to give good results.
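For reference, these metrics can be computed from model predictions with scikit-learn (a generic sketch rather than the paper's code; macro averaging over the emotion classes is an assumption):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# y_test and the fitted models come from the sketch in Sect. 3.4.
y_pred = models["Random Forest"].predict(X_test)
print(confusion_matrix(y_test, y_pred))   # per-class error structure (TP/FP/TN/FN)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro")
print(accuracy_score(y_test, y_pred), precision, recall, f1)
```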

5 Conclusion and Future Work

In this paper, the proposed approach focuses on emotion classification based on audio signals. The proposed system uses SVM, Decision Tree, Random Forest, MLP, and CNN models to identify emotions based on the five feature groups extracted from each audio signal. For the emotions happy, sad, and neutral, the SVM model achieves an accuracy of 78.57%, the decision tree 78.56%, the random forest 85.71%, the MLP 81.82%, and the CNN 82.98%. Future work will focus on compound emotions such as happily surprised, happily disgusted, sadly fearful, sadly angry, sadly surprised, sadly disgusted, and angrily fearful, with a comparison of different algorithms.

References
1. Tzirakis, P., Zhang, J., & Schuller, B. W. (2018). End-to-end speech emotion recognition using a deep convolutional recurrent network. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
2. Zhao, J., Mao, X., & Chen, L. (2018). Learning deep features to recognise speech emotion using merged deep CNN. IET Signal Processing, 12(6), 713–721.
3. Tarunika, K., Pradeeba, R. B., & Aruna, P. (2018, July). Applying machine learning techniques for speech emotion recognition. In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1–5). IEEE.
4. Jain, M., Narayan, S., Balaji, P., Bhowmick, A., & Muthu, R. K. (2020). Speech emotion recognition using support vector machine. arXiv preprint arXiv:2002.07590.
5. Meng, H., Yan, T., Yuan, F., & Wei, H. (2019). Speech emotion recognition from 3D log-mel spectrograms with deep learning network. IEEE Access, 7, 125868–125881.
6. Deng, J., Xu, X., Zhang, Z., Frühholz, S., & Schuller, B. (2017). Semisupervised autoencoders for speech emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1), 31–43.
7. Aouani, H., & Ayed, Y. B. (2018, March). Emotion recognition in speech using MFCC with SVM, DSVM and auto-encoder. In 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) (pp. 1–5). IEEE.
8. Mirsamadi, S., Barsoum, E., & Zhang, C. (2017, March). Automatic speech emotion recognition using recurrent neural networks with local attention. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2227–2231). IEEE.
9. Aouani, H., & Ayed, Y. B. (2020). Speech emotion recognition with deep learning. Procedia Computer Science, 176, 251–260.
10. Sajjad, M., & Kwon, S. (2020). Clustering-based speech emotion recognition by incorporating learned features and deep BiLSTM. IEEE Access, 8, 79861–79875.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
