MiniProject 5
using ML
Machine learning (ML) is a widely used approach for predicting or classifying data to help people make important decisions.
ML algorithms are trained on examples so that they can learn from past experience and analyse historical data. Simply building
a model is not enough: the model must be optimised and tuned so that it gives accurate results, and optimisation typically
involves adjusting the hyperparameters. As a model trains repeatedly on the examples, it learns to detect patterns, which
enables more accurate decision-making. When the trained model is given new data, it applies what it has learnt to that data
and makes predictions. Using standard evaluation methods, one can refine a model based on its most recent accuracy.
In the same way, AI models learn to adapt to new examples and deliver better results.
Problem Statement
Emotions play a fundamental part in communication, and
detecting and analysing them is of vital importance in
today's digital world of remote communication.
Emotion detection is a challenging task, since emotions
are subjective. We define a speech emotion recognition
(SER) system as a collection of methods that process and
classify speech signals to detect the emotions embedded
in them. Such a system has a wide variety of applications,
such as intelligent voice-based assistants and caller-agent
conversation analysis. The goal of this work is to
identify basic emotions in recorded speech by analysing
the acoustic features of the audio recordings. In this
project, we predict the emotion in a person's speech on
the given dataset using CNN and deep learning
algorithms. The dataset comprises 2,800 audio files of
2 female voices with various emotions such as anger,
disgust, fear, happiness, pleasant surprise, sadness,
and neutral.
Methodology
1. Import Required Libraries
• Librosa is a library used for audio analysis. It helps in loading audio files, extracting
audio features, and visualizing audio data.
• The os library provides functions for interacting with the operating system, allowing
tasks like file management and directory manipulation in Python.
• TensorFlow is a popular deep learning framework used for building, training, and
deploying machine learning models, particularly neural networks.
• NumPy is used for numerical computing in Python and provides essential tools for
array manipulation, mathematical operations, and linear algebra.
• TESS (Toronto Emotional Speech Set) is a dataset containing audio recordings of
200 target words spoken in the carrier phrase "Say the word _" by two actresses
(aged 26 and 64 years). Each set of recordings portrays one of seven emotions
(anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral).
There are 2800 audio files in total.
• The dataset is organized so that each emotion of each of the two actresses is
contained within its own folder, and each folder holds the audio files for all
200 target words. The audio files are in WAV format.
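The folder layout described above can be traversed with the os library. The following is a minimal sketch, not the project's actual loading code. It assumes that the emotion label is the last underscore-separated token of each file name (for example, a hypothetical `OAF_back_angry.wav`); the function names and the root directory are illustrative.

```python
import os
from collections import Counter


def collect_labeled_files(root_dir):
    """Walk root_dir and return a list of (path, label) pairs for every .wav file.

    Assumption: the label is the last underscore-separated token of the
    file name without its extension, e.g. "OAF_back_angry.wav" -> "angry".
    """
    samples = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.lower().endswith(".wav"):
                continue  # skip anything that is not a WAV recording
            stem = os.path.splitext(name)[0]
            label = stem.split("_")[-1].lower()
            samples.append((os.path.join(dirpath, name), label))
    return samples


def label_counts(samples):
    """Count how many files were found for each label."""
    return Counter(label for _path, label in samples)
```

Calling `label_counts(collect_labeled_files("path/to/dataset"))` on the real dataset is a quick way to check that every emotion has the expected number of files before any features are extracted.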
3. Exploratory Data Analysis
4. Feature Extraction
5. Model Architecture and Training
• Sequential is used to create a linear stack of layers, and Dense, LSTM, and Dropout are layer types that
can be added to the model.
• LSTM Layer: A Long Short-Term Memory (LSTM) layer with 256 units, set to return only the output of the
last time step (return_sequences=False). It takes input sequences of shape (40, 1), where 40 is the
sequence length and 1 is the number of features at each time step.
• Dropout Layer: A dropout layer with a dropout rate of 0.2 is added after the LSTM layer. Dropout is a
regularization technique that helps prevent overfitting by randomly setting a fraction of input units to 0 at
each update during training.
• Dense Layer (ReLU Activation): A fully connected (dense) layer with 128 units and Rectified Linear Unit
(ReLU) activation function is added. ReLU is a common activation function that introduces non-linearity.
• Dropout Layer: Another dropout layer with a rate of 0.2 is added after the dense layer.
• Dense Layer (ReLU Activation): Another fully connected layer with 64 units and ReLU activation.
• Dropout Layer: A dropout layer with a rate of 0.2 is added after the second dense layer.
• Dense Layer (Softmax Activation): The final layer is a dense layer with 7 units, one per emotion class,
and softmax activation. Softmax is commonly used in multi-class classification problems, where the
network outputs a probability distribution over the classes.
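The layer stack listed above can be sketched in Keras as follows. The layer sizes, dropout rates, activations, and input shape come from the description; the loss function, optimizer, and metric passed to `compile` are assumptions (typical choices for multi-class classification with one-hot labels), and the function name `build_ser_model` is illustrative.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout


def build_ser_model(n_steps=40, n_features=1, n_classes=7):
    """Build the LSTM classifier described in the list above."""
    model = Sequential([
        Input(shape=(n_steps, n_features)),
        # return_sequences=False keeps only the output of the last time step
        LSTM(256, return_sequences=False),
        Dropout(0.2),
        Dense(128, activation="relu"),
        Dropout(0.2),
        Dense(64, activation="relu"),
        Dropout(0.2),
        # one output unit per emotion class; softmax yields class probabilities
        Dense(n_classes, activation="softmax"),
    ])
    # Assumed training configuration; the source does not state these settings.
    model.compile(
        loss="categorical_crossentropy",
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    build_ser_model().summary()
```

With these sizes the LSTM layer accounts for most of the trainable parameters, which is one reason dropout is applied after it and after each hidden dense layer.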
Result and Discussion
4.1 Model Performance Metrics:
The implemented SER model performed well on the provided dataset, achieving an accuracy of approximately
97% on the training dataset and 94% on the testing dataset. The confusion matrix shows that the model
recognizes the various emotions robustly and is particularly good at discerning the 'Neutral' and 'Happy'
emotions. However, it is less accurate at classifying the 'Angry' and 'Disgust' emotions, possibly because
of the inherent complexity and nuance of identifying these emotions from speech signals alone.
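Overall accuracy and per-emotion performance can both be derived from a confusion matrix. The following is a minimal sketch in plain Python, assuming true and predicted labels are available as lists of strings; the function names are illustrative and the labels in the usage example are toy values, not results from this project.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where entry [i][j] counts samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_recall(matrix, labels):
    """For each class, the fraction of its true samples that were predicted correctly."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall


if __name__ == "__main__":
    # Toy labels for illustration only.
    labels = ["angry", "happy", "neutral"]
    y_true = ["angry", "angry", "happy", "neutral", "neutral"]
    y_pred = ["angry", "happy", "happy", "neutral", "neutral"]
    cm = confusion_matrix(y_true, y_pred, labels)
    print(accuracy(cm))                  # 0.8
    print(per_class_recall(cm, labels))  # angry: 0.5, happy: 1.0, neutral: 1.0
```

Per-class recall is what exposes a weakness on individual emotions that a single overall accuracy figure would hide.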
4.2 Comparative Analysis: