Speech Emotion Recognition using ML

Submitted By: Akanksha Raj (U. Roll No. - 2019272)
Under the mentorship of: Dr. Ankit Tomar, Assistant Professor

Introduction
Speech is the most elementary form of human communication. To enrich interaction, one needs to recognize the emotion of another person and know how to react to it. Unlike machines, we humans can naturally recognize the nature and emotion behind speech. Can a machine also detect emotion from speech? This can be made possible using machine learning: machines need a specific model for detecting the emotions in speech, and such a model can be built with machine learning. A machine that detects the emotion in human speech can prove useful in various industries. A basic use of speech emotion recognition is in the health sector, where it can help detect depression, anxiety, and stress in a patient. It can also be used in areas such as crime investigation, where emotions recognized from speech can help distinguish between victims and criminals.

Machine learning is a well-established approach for predicting or classifying information to help people make important decisions. To learn from previous experience and analyse historical data, ML algorithms are trained on examples. Simply building a model is not sufficient: the model must be sufficiently refined and tuned to give accurate results, and achieving the best results requires optimization strategies such as tweaking the hyperparameters. As the model repeatedly trains on examples, it learns to detect patterns, enabling more precise decision-making. When the model is presented with new data, it applies what it has learnt and produces predictions. Using standard optimization methodologies, one can keep improving a model with respect to its latest accuracy; in this way, ML models learn to adapt to new examples and deliver better outcomes.
Problem Statement
Emotions play a fundamental role in communication, so their detection and analysis is of vital importance in today's digital world of remote communication. Emotion detection is a challenging task, since emotions are subjective. We define a Speech Emotion Recognition (SER) system as a collection of methods that process and classify speech signals to detect the emotions embedded in them. Such a system has a wide variety of applications, such as intelligent voice-based assistants and caller-agent conversation analysis. The goal of this work is to identify fundamental emotions in recorded speech by analysing the acoustic components of the audio data. In this project, we predict the emotion in a person's speech on the given dataset using CNN and deep learning algorithms. The dataset comprises 2,800 audio recordings of two female voices expressing emotions such as anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.
Methodology

1. Import Required Libraries (the imports are sketched after this list)
• Librosa is a library used for analyzing audio. It helps in loading audio files, extracting the characteristics of the audio, and visualizing audio data.

• The os library provides functions for interacting with the operating system, allowing tasks like file management and directory manipulation in Python.

• TensorFlow is a popular deep learning framework used for building, training, and
deploying machine learning models, particularly neural networks.

• Matplotlib is a plotting library in Python used to create high-quality 2D and 3D visualizations of data and results.

• NumPy is used for numerical computing in Python and provides essential tools for
array manipulation, mathematical operations, and linear algebra.
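
A minimal sketch of the imports described above (the aliases are the conventional ones, not prescribed by the slides):

```python
import os                        # file and directory handling

import numpy as np               # numerical computing and array manipulation
import matplotlib.pyplot as plt  # 2D/3D visualizations of data and results
import librosa                   # loading audio files and extracting features
import librosa.display          # waveform and spectrogram plotting helpers
import tensorflow as tf          # building and training the neural network
```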

2. Data Collection and Preprocessing

• TESS is a dataset containing audio files of 200 target words spoken in the carrier phrase "Say the word _" by two actresses (aged 26 and 64 years); recordings were made of the set portraying each of seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral). There are 2,800 audio files in total.
• The dataset is organized such that each of the two female actors and each of their emotions is contained within its own folder, and within each folder all 200 target-word audio files can be found. The audio files are in WAV format (a sketch of collecting the file paths follows below).
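
A minimal sketch of collecting file paths and emotion labels from this folder layout. The root folder name `TESS` and the filename pattern (emotion as the last underscore-separated token, e.g. `OAF_back_angry.wav`) are assumptions based on the commonly distributed form of the dataset, not details confirmed by the slides:

```python
import os

paths, labels = [], []
# Walk the dataset tree; 'TESS' is an assumed root folder name.
for dirname, _, filenames in os.walk('TESS'):
    for filename in filenames:
        if not filename.endswith('.wav'):
            continue
        paths.append(os.path.join(dirname, filename))
        # Assumed filename pattern: the emotion is the last
        # underscore-separated token, e.g. 'OAF_back_angry.wav' -> 'angry'.
        label = filename.split('_')[-1].split('.')[0].lower()
        labels.append(label)

print(f'{len(paths)} files found')  # expected: 2800
```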
3. Exploratory Data Analysis
Wave plots and spectrograms of sample recordings are drawn for each emotion to inspect the data.

4. Feature Extraction
MFCC features are extracted from each audio file to capture its acoustic characteristics (a sketch of both steps follows below).
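
A minimal sketch of both steps for a single file, assuming (from the model's input shape of (40, 1) described in the next section and the MFCC method named in the conclusions) that 40 MFCC coefficients are extracted and averaged over time; the exact parameters are assumptions:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

path = paths[0]             # any file collected in the sketch above
y, sr = librosa.load(path)  # load audio at librosa's default sample rate

# Exploratory plots: waveform and spectrogram.
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform')
plt.show()

db = librosa.amplitude_to_db(np.abs(librosa.stft(y)))
librosa.display.specshow(db, sr=sr, x_axis='time', y_axis='hz')
plt.title('Spectrogram')
plt.show()

# Feature extraction: 40 MFCCs, averaged over time -> shape (40,).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
features = np.mean(mfcc.T, axis=0)
```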
5. Model Architecture and Training
• Sequential is used to create a linear stack of layers; Dense, LSTM, and Dropout are layer types that can be added to the model (see the sketch after this list).

• LSTM Layer: A Long Short-Term Memory (LSTM) layer with 256 units, set to return only the last output (return_sequences=False). It takes input sequences of shape (40, 1), where 40 is the sequence length and 1 is the number of features at each time step.

• Dropout Layer: A dropout layer with a dropout rate of 0.2 is added after the LSTM layer. Dropout is a
regularization technique that helps prevent overfitting by randomly setting a fraction of input units to 0 at
each update during training.

• Dense Layer (ReLU Activation): A fully connected (dense) layer with 128 units and Rectified Linear Unit
(ReLU) activation function is added. ReLU is a common activation function that introduces non-linearity.

• Dropout Layer: Another dropout layer with a rate of 0.2 is added after the dense layer.

• Dense Layer (ReLU Activation): Another fully connected layer with 64 units and ReLU activation.

• Dropout Layer: A dropout layer with a rate of 0.2 is added after the second dense layer.

• Dense Layer (Softmax Activation): The final layer is a dense layer with 7 units and softmax activation. This is commonly used in multi-class classification problems, where the network outputs a probability distribution over the different classes.
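
Putting the layers above together, a minimal sketch of the model in Keras; the compile and fit settings are illustrative assumptions, since the slides specify only the layer stack:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    # LSTM over the 40-step MFCC sequence, one feature per time step.
    LSTM(256, return_sequences=False, input_shape=(40, 1)),
    Dropout(0.2),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(7, activation='softmax'),  # one unit per emotion class
])

# Assumed training configuration, for illustration only.
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
# X has shape (n_samples, 40, 1); y is one-hot over the 7 emotions.
# model.fit(X, y, validation_split=0.2, epochs=50, batch_size=64)
```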
Result and Discussion
4.1 Model Performance Metrics:
The implemented CNN-based SER model exhibited commendable performance on the provided dataset. The
model achieved an accuracy of approximately 97% on the training dataset and 94% on the testing dataset.
Evaluation of the confusion matrix showed the model to be robust in recognizing various emotions, particularly excelling at discerning 'Neutral' and 'Happy'. However, it exhibited relatively lower accuracy in classifying 'Angry' and 'Disgust', possibly due to the inherent complexity and nuance of identifying these emotions solely from speech signals.
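
A minimal sketch of how such a confusion-matrix evaluation might be produced with scikit-learn; scikit-learn is not mentioned in the slides, and `X_test`, `y_test`, and the trained `model` are assumed from the steps above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Predicted class = argmax of the softmax output for each test sample.
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)  # assuming one-hot test labels

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))
```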
4.2 Comparative Analysis:
Comparing the model's performance against existing state-of-the-art SER approaches reveals noteworthy observations. The proposed CNN-based model yielded competitive accuracy rates compared to traditional machine learning techniques, demonstrating the efficacy of leveraging deep learning for SER tasks. Nevertheless, further analysis is required to comprehend the model's performance concerning specific emotions and the potential influence of imbalanced data distribution across emotion classes.
4.3 Strengths and Limitations:
The strengths of the CNN-based SER model lie in its ability to automatically extract intricate patterns and hierarchical features from MFCC
representations, enabling better discrimination among various emotions. The model’s adaptability to complex data and its high-dimensional feature
extraction capabilities contribute significantly to its success. However, inherent limitations exist, notably the dependency on the quality and diversity of
the dataset. The model’s performance might be influenced by imbalanced data distributions among emotion classes, potentially leading to biased
predictions. Additionally, challenges persist in accurately capturing subtle emotional nuances and cultural variations in speech, warranting further
exploration and data augmentation strategies.
Conclusion and Future Work
5.1 Conclusions
In this project, deep learning is used to analyse speech samples. To illustrate the various human emotions, the dataset is first loaded using the Librosa library and depicted in the form of various wave plots and spectrograms. Then the MFCC feature extraction method is used to analyse the acoustic characteristics of all the samples, and the sequential data obtained is organized into the 3D array form that the CNN model accepts.
Using the Matplotlib library, the data is put into graphical form; after repeated testing with various values, the model's average accuracy is found to be 94% at testing and 97% in the training phase.

5.2 Future Scope
5.2.1. Data Augmentation and Diverse Datasets:
 Augmentation Strategies: Implement advanced data augmentation techniques to address data imbalances and enrich the diversity of emotional expressions within the dataset (see the sketch after this list).
 Multilingual and Multicultural Datasets: Curate datasets encompassing diverse languages and cultural contexts to improve the model's adaptability and robustness in
recognizing emotions across different demographics.
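
A minimal sketch of two common audio augmentation techniques, noise injection and pitch shifting with librosa; these particular transforms are illustrative assumptions, not techniques specified in the slides:

```python
import numpy as np
import librosa

def add_noise(y, noise_factor=0.005):
    """Inject Gaussian noise scaled by noise_factor."""
    return y + noise_factor * np.random.randn(len(y))

def shift_pitch(y, sr, n_steps=2):
    """Shift pitch by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

y, sr = librosa.load(paths[0])
augmented = [add_noise(y), shift_pitch(y, sr)]
```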
5.2.2. Model Refinement and Optimization:
 Architecture Refinement: Explore modifications to the CNN architecture, incorporating attention mechanisms, ensemble techniques, or deeper network structures to capture finer emotional nuances present in speech signals.
 Feature Engineering: Investigate alternative feature representations or the fusion of multimodal features (audio-visual, textual) to extract more discriminative emotional cues and improve classification accuracy.
5.2.3. Contextual Understanding and Real-Time Applications:
 Contextual Analysis: Incorporate contextual understanding by analysing the context surrounding speech to enhance emotion recognition accuracy, emphasizing temporal dynamics and sequence modelling for a more comprehensive interpretation of emotional cues.
 Real-Time Applications: Adapt the model for real-time applications, enabling its integration into interactive systems, virtual assistants, or therapeutic applications requiring accurate emotion detection in speech.
Thank you!
