A Final Report
on
Emotion Based Music Recommendation System

Submitted To:
Department of Computer Science and Information Technology
Ambikeshwari Campus
Tribhuvan University, Kirtipur, Kathmandu

Submitted by:
Milan Dangi

August 2024
STUDENT DECLARATION
I hereby declare that the project report entitled “Emotion Based Music
Recommendation System” is submitted in partial fulfillment of the requirements for the
degree of Bachelor of Science in Computer Science and Information Technology.
…………………………
Milan Dangi
Tribhuvan University
SUPERVISOR’S RECOMMENDATION
I hereby recommend that this project report, prepared under my supervision by Milan
Dangi, entitled “Emotion Based Music Recommendation System”, is satisfactory in
partial fulfillment of the requirements for the degree of Bachelor of Science in Computer
Science and Information Technology and be processed for evaluation.
………………….
Mr. Raj Singh Jora
Supervisor
Tribhuvan University
LETTER OF APPROVAL
This is to certify that this project prepared by Milan Dangi entitled “Emotion Based Music
Recommendation System” in partial fulfillment of the requirements for the Bachelor's degree
in Computer Science and Information Technology of Tribhuvan University has been well
studied. In our opinion, it is satisfactory in scope and quality for the degree.
………………………………
Supervisor
Mr. Raj Singh Jora
Ambikeshwari Campus

………………………………
Campus Chief
Mr. Chandra Prakash Khanal
Ambikeshwari Campus

………………………………
Internal Examiner
Ambikeshwari Campus

………………………………
External Examiner
IOST, Tribhuvan University
ACKNOWLEDGEMENT
I would like to extend my sincere gratitude to everyone who contributed to the successful
completion of the project.
I extend my heartfelt thanks to Mr. Chandra Prakash Khanal, Campus Chief, for providing
me with this opportunity to showcase my learning through the project and for his constant
motivation, encouragement, and a suitable working environment.
I am profoundly grateful to my supervisor and Head of Department, Mr. Raj Singh Jora,
for his expert guidance, continuous encouragement, and constructive feedback during the
project development. His useful suggestions for this project and cooperative behavior are
sincerely acknowledged.
I also wish to acknowledge the support and inspiration provided by my teacher Er. Sanni
Khanal and the entire faculty, whose knowledge and advice played a crucial role during
the project.
Finally, I would like to express my sincere thanks to all friends and others who helped
me directly or indirectly during the project work.
ABSTRACT
The Emotion Based Music Recommendation System is an application that
automatically plays music chosen according to the real-time facial emotions of the user.
Many music platforms currently use collaborative filtering and content-based
models to match user preferences to songs, but these methods can be
improved by analyzing the user's current facial expressions and emotions. The main
purpose of this project is to provide a set of songs matching the current mood of the user
by analyzing his/her facial expressions and detecting his/her emotions.
This project aims to design a music recommendation system that assists users in deciding
which music to listen to, ultimately helping them reduce stress levels. This means there is
no longer a need to spend too much time searching for songs, since the system
automatically suggests suitable tracks according to the user's mood. To do this, CNNs
(Convolutional Neural Networks) are employed to recognize faces and classify
emotions. By integrating the model with the Spotify Web API, the system recommends
music to users depending on their moods, ensuring a personalized, seamless music
experience.
Keywords: Music, Mood, Facial recognition, Feature extraction, Feature detection,
Convolutional neural network
TABLE OF CONTENTS
STUDENT DECLARATION....................................................................................................
SUPERVISOR’S RECOMMENDATION..............................................................................ii
LETTER OF APPROVAL......................................................................................................iii
ACKNOWLEDGEMENT.......................................................................................................iv
ABSTRACT...............................................................................................................................v
TABLE OF CONTENT...........................................................................................................vi
LIST OF ABBREVIATIONS................................................................................................viii
LIST OF FIGURES.................................................................................................................ix
LIST OF TABLES.....................................................................................................................x
Chapter 1: Introduction...........................................................................................................1
1.1 Introduction.......................................................................................................................1
1.2 Problem Statement............................................................................................................2
1.3 Objectives..........................................................................................................................2
1.4 Scope and limitation..........................................................................................................2
1.5 Development Methodology...............................................................................................3
1.6 Report Organizations........................................................................................................4
Chapter 2: Background Study and Literature Review..........................................................6
4.1 Design.............................................................................................................................14
4.2 Algorithm Details............................................................................................................15
Chapter 5: Implementation and Testing.................................................................23
5.1 Implementation...............................................................................................................23
5.1.1 Tool used..................................................................................................................23
5.1.2 Implementation details of modules..........................................................................26
5.2 Testing.............................................................................................................................27
5.2.1 Test Cases for Unit Testing......................................................................................27
5.2.2 Test Cases for System Testing..................................................................................29
5.3 Result Analysis................................................................................................................29
Chapter 6: Conclusion and Future Recommendation.........................................................33
6.1 Conclusion......................................................................................................................33
6.2 Future Recommendation.................................................................................................33
References................................................................................................................................34
Appendices...............................................................................................................................35
LIST OF ABBREVIATIONS
API Application Programming Interface
CNN Convolutional Neural Network
DFD Data Flow Diagram
GPU Graphics Processing Unit
FER Facial Expression Recognition
GIT Global Information Tracker
HOG Histogram of Oriented Gradients
IDE Integrated Development Environment
MIT Massachusetts Institute of Technology
OpenCV Open Source Computer Vision Library
PCA Principal Component Analysis
RAM Random Access Memory
RGB Red Green Blue
ReLU Rectified Linear Unit
LIST OF FIGURES
Figure 1.1 Life cycle of Agile Methodology .......................................................................4
Figure 3.1 Use Case Diagram of System .............................................................................8
Figure 3.2 Gantt chart ........................................................................................................11
Figure 3.3 Flowchart of system .........................................................................................12
Figure 3.4 DFD level 0 ......................................................................................................13
Figure 3.5 DFD level 1 ......................................................................................................13
Figure 4.1 System architecture ..........................................................................................14
Figure 4.2 Feature extraction and recognition process in CNN ........................................16
Figure 4.3 Gray-Scale Image .............................................................................................16
Figure 4.4 Convolution Layer ............................................................................................17
Figure 4.5 Convolution matrix after ReLU function applied ............................................17
Figure 4.6 Pooling Layer ...................................................................................................18
Figure 4.7 Flatten matrix after pooling operation..............................................................18
Figure 4.8 Fully-Connected Layer ....................................................................................19
Figure 4.9 Dropout Layer ..................................................................................................21
Figure 5.1 Samples of FER2013 Dataset ...........................................................................25
Figure 5.2 Line Chart for accuracy and loss analysis ........................................................31
Figure 5.3 Predicted output on test data ............................................................................31
Figure 5.4 Confusion Matrix for True labels vs Predicted Labels .................................... 32
LIST OF TABLES
Table 1 Gantt chart ............................................................................................................ 10
Table 2 Confusion Matrix ..................................................................................................22
Table 3 CNN Model Testing............................................................................................. 27
Table 4 Spotify Integration Testing...................................................................................28
Table 5 Recommendation music track .............................................................................. 28
Table 6 System Testing..................................................................................................... 29
Table 7 Accuracy of model................................................................................................30
Table 8 Hyperparameters and their values ........................................................ 30
Chapter 1: Introduction
1.1 Introduction
Music plays an important role in influencing and motivating people, which makes such a
system highly significant. Many studies in recent years show that humans respond and
react to music, and that music has a strong influence on the activity of the human brain.
By aligning music choices with the listener's current mood, these systems enhance the
overall listening experience and can offer comfort, motivation, or relaxation, and may
boost the listener's working capacity.
The meter, timbre, rhythm, and pitch of music are processed in areas of the brain that
affect emotions and mood. Interaction between individuals is a major aspect of
lifestyle. It reveals fine details and much information about humans, whether in the
form of body language, speech, facial expressions, or emotions. Nowadays, emotion
detection is considered an important technique used in many applications such as
smart card applications, surveillance, image database investigation, criminal
investigation, video indexing, civilian applications, security, and adaptive
human-computer interfaces with multimedia environments [1].
The project aims to capture the emotion expressed by the person through the web camera
interface available on the computing system or through uploaded images. The software
extracts features (eyebrows, eye movement, mouth shape, nose position) from the target
user with the help of image segmentation and image processing techniques and tries to
detect the facial expression the user is expressing, which may be one of the following:
“Angry”, “Disgust”, “Fear”, “Happy”, “Sad”, “Surprise”, or “Neutral”. When the software
detects the user's emotion, it helps to lighten the user's mood by playing songs that match
it. In some cases, mood alteration may also help in overcoming situations like depression
and sadness. With the aid of expression analysis, many health risks can be avoided, and
steps can be taken to bring the user's mood to a better state.
1.2 Problem Statement
Human emotions are complex and diverse, and people are in a constant search for
happiness as they face life's challenges. Many accidental mood changes occur because of
mild stressors. Some people, on the other hand, experience chronic feelings of dread that
amount to depression, a common mood disorder. It has been established that these
unsettled feelings interfere with daily functioning and thus reduce productivity. In this
swiftly developing modern world characterized by technology, there is a strong
inclination towards instant pleasures as a way of boosting one's morale. The choice often
falls on music, an acceptable source of entertainment for almost anyone. But it can be
hard to find the perfect song that aligns with one's current emotional state, because there
are millions of songs on the internet. Therefore, the number of people wanting music
matched to their mood has grown rapidly, driving the development of applications that
tailor tunes to any kind of feeling.
1.3 Objectives
The project attempts to fulfill the following objectives:
● To develop an accurate and reliable model that detects and classifies the user's
emotional state based on their facial expression in real time.
● To suggest songs to the user by mapping their emotion to the mood type of the song.
● To develop a user-friendly interface where users can easily upload their image or
use a webcam for real-time mood detection and receive music recommendations
instantly.
1.4 Scope and Limitation
Scope
● It enhances the music listening experience through real-time mood recognition
and personalized song selection.
● It can be integrated into mental health and wellness apps that aim to reduce a
person's stress or depression using music therapy.
● In the video game industry, mood-oriented music can be used to dynamically
adjust the soundtrack according to the player's feelings, creating a more
absorbing and emotionally rich experience.
● It can be integrated with wearable IoT devices to detect mood and play songs
accordingly.
Limitation
● The accuracy of the emotion detection system may be limited by the quality of
images or webcams due to factors like lighting and facial obstructions (e.g.,
glasses, masks).
● The system is not able to detect complex emotions.
● The recommendations will not align with the mood of the user if the system's
accuracy is low.
Figure 1.1 Life cycle of Agile Methodology
1.6 Report Organization
Chapter 1: Introduction
The report begins by introducing music and how a music recommendation system can be
improved through emotion detection, along with its importance for the future. It also
covers topics like the problem statement, objectives, scope and limitation, and the
development methodology by which the system is developed.
Chapter 2: Background Study and Literature Review
This chapter describes the necessity of the project in the background study section and
explores journals describing previous work on such systems in the literature review
section.
Chapter 4: System Design
This chapter describes the architecture of the system and the algorithm details of the
Convolutional Neural Network, which acts as the backend tool of the system.
The project started with developing custom CNN models to detect emotion in real
time; integrated with Spotify, the system automatically offers the best list of music
tracks with song names, artist names, and song images, reducing the time users spend
searching for music.
2.2 Literature Review
In the 1960s, a research team led by Woodrow W. Bledsoe ran an experiment to check
whether computers could recognize the human face, but it wasn't successful. It was the
first step in proving that facial recognition was a viable biometric. After that, various
research was done in this sector. In 1991, Alex Pentland and Matthew Turk of the
Massachusetts Institute of Technology (MIT) presented the first successful example of
facial recognition technology, Eigenfaces, which uses the statistical Principal Component
Analysis (PCA) method [2]. Subsequently, various machine learning and deep learning
algorithms were developed, which reduced the challenges of developing such systems.
Once facial recognition models were built, people started forecasting their applications,
one of which is music recommendation using facial expressions.
The research paper [3] explored advances in human affect recognition, focusing on
methods for analyzing audio/visual displays of emotions like happiness, sadness, fear,
anger, disgust, and surprise. Their research addresses challenges in developing automated
affect recognizers, highlighting issues often overlooked in emotion recognition models.
Florence and Uma [4] developed a web-based music player whose model was trained on
an extensive dataset. They used the Haar and HOG algorithms for face detection and
recognizing facial landmarks. They then prepared an emotion-audio integration module
which recommended songs from the database matching the user emotion detected by the
Fisherface algorithm.
Chapter 3: System Analysis
3.1. System Analysis
3.1.1 Requirement Analysis
i. Functional Requirement
Functional requirements describe the behavior or features of the system; they specify
what the software system should do. The functional requirements of the system are:
● Users can upload an image from storage or capture one through the camera, and the
system detects the emotion.
● The detected emotion is mapped to a predefined music mood (like happy, sad, calm),
and the system fetches the corresponding music from the Spotify API.
● The system provides a visual interface to display the predicted emotion and the
recommended songs with artist name, song name, and image from Spotify.
ii. Non-Functional Requirement
● Performance: The system should detect the face, recognize emotions in real
time, and recommend music tracks quickly to ensure a seamless user experience
with high accuracy.
● Scalability: The system is scalable, as its web API can be integrated into
different platforms such as Spotify, YouTube, hospitals for music therapy, and
mobile applications.
● Usability: Any user with minimal technical knowledge can use it effectively.
● Maintainability: The system needs to be maintained to increase accuracy or fix
bugs. The project is well documented and its instructions are clearly explained,
so it can be easily maintained in the future.
3.1.2 Feasibility Study
A feasibility study is the process of analyzing and evaluating a proposed system to
determine whether the project plan could be successful. It is done by examining different
constraints like cost-benefit analysis, resource requirements, time, etc. It focuses on
answering the question: “Should we proceed with the project?”
i. Technical
All the tools required to proceed with the project are easily and readily available on the
web. A standard PC or laptop with a webcam and 4 GB of RAM is sufficient; however, a
GPU might be necessary to handle deep learning computations efficiently for a more
complex project.
ii. Operational
Operational feasibility is a measure of how people feel about the system or project. It
answers the question: “Is the problem worth solving?” This project gives a better solution
for music recommendation than methods like collaborative filtering and content-based
models. By suggesting songs in response to the recognized emotion, the system can give
relief to users who need music therapy to reduce a depressed mood. Thus, the project is
operationally feasible.
iii. Economical
Most of the tools, libraries, and IDEs required for the project are open source and free to
use. However, scaling the project to production level would incur high computational,
database, and API costs.
8
iv. Schedule
The project is categorized into different phases with respect to time constraints and
milestones in order to resolve the problem in time and complete the project within the
given time boundary. A Gantt chart is used to show the different jobs with respect to
time, as shown below.
Figure 3.2 Gantt chart
3.1.3 Analysis
i. Flowchart
Figure 3.4 DFD level 0
DFD Level 1
The user interface is responsible for taking input, such as a photo uploaded from storage
or captured with the camera. The face is then detected by the popular Cascade Classifier
algorithm. Next, the Convolutional Neural Network model comes into play: the detected
face image is preprocessed and essential features are extracted to classify the emotion,
and the resulting emotion is mapped to a predefined music mood. Using the Spotify API,
the top 15 most popular music tracks are recommended to the user, along with cover
images, artist names, and song names.
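As a sketch of the detection step described above, the following minimal example, assuming OpenCV's bundled Haar cascade file (the trained CNN itself is loaded separately), captures a webcam frame, detects the face, and prepares the 48x48 grayscale region that an FER2013-trained model expects:

import cv2
import numpy as np

# Haar cascade shipped with OpenCV for frontal face detection
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # open the default webcam
ret, frame = cap.read()            # grab a single frame
cap.release()

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # FER2013 input size
    roi = roi.astype("float32") / 255.0                 # normalize pixel values
    roi = roi.reshape(1, 48, 48, 1)                     # batch of one image
    # roi is now ready for the trained CNN, e.g. model.predict(roi)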
Step 8: The track_id column is taken from the music_moods dataset after filtering by
popularity, and the corresponding song information is fetched using the Spotify Web
API.
Step 10: The predicted emotion is mapped to the music_moods dataset, and the top 15 most
popular songs corresponding to the mapped mood are recommended to the user.
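The fetching in Step 8 can be sketched with the spotipy client library; the credential placeholders are hypothetical, and real values come from the Spotify developer dashboard:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Hypothetical credentials; obtain real ones from the Spotify developer dashboard
auth = SpotifyClientCredentials(client_id="YOUR_CLIENT_ID",
                                client_secret="YOUR_CLIENT_SECRET")
sp = spotipy.Spotify(auth_manager=auth)

def fetch_track_info(track_id):
    """Fetch song name, artist name, and cover image URL for one track id."""
    track = sp.track(track_id)
    return {
        "song":   track["name"],
        "artist": track["artists"][0]["name"],
        "image":  track["album"]["images"][0]["url"],
    }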
Detailed explanation of the algorithm used in the system: Convolutional Neural Network
A convolutional neural network, also known as ConvNet, is a class of deep neural
networks mostly used to solve image recognition, segmentation, and classification
problems.
The algorithm is inspired by the layered architecture of the animal visual cortex. Visual
signals from the eyes go straight to the primary visual cortex, where edge and orientation
recognition starts; likewise, each cortical area performs a specific operation, recognizing
properties such as the shape, size, location, and motion of objects. Similarly, in a CNN,
filters extract different types of information, such as object orientation and edges, and by
combining all this information, objects are classified or recognized.
The architecture of a CNN is formed mainly of three types of layers: the convolution
layer, the pooling layer, and the fully connected layer. A simplified diagram is illustrated
below with brief concepts and mathematical descriptions.
Figure 4.3 Gray-Scale Image
Convolution Layer
The convolution layer is the first stage of the convolutional pipeline; it extracts the
features of the captured image using a series of filters. As the network gets deeper, the
number of filters is typically made two to three times larger than in the previous layer.
The convolution operation involves sliding the filters across the input image, computing
the element-wise product of the filter with the corresponding region of the image, and
summing the results to obtain feature maps.
Let's explore the mathematical computation:
At first, the kernel matrix is applied to the image matrix from the top-left corner,
performs the element-wise operation, and, by summing all the products, generates the
value of the first top-left entry of the convolved matrix. The filter first moves left to
right, shifting one column at a time; this shift is called the stride (here the stride is 1).
It then moves down a row, and the process repeats until the image matrix is fully
covered.
An activation function (ReLU) is applied after each convolution operation; it introduces
non-linearity into the model by replacing negative values with zero, making the model
more robust at identifying different patterns.
In the example above, we apply the convolution operation with no padding to the input
image.
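To make the sliding-window computation above concrete, here is a minimal sketch in plain NumPy of a valid (no-padding) convolution with stride 1, followed by ReLU; the small image and kernel values are illustrative:

import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image, summing element-wise products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)
    return out

image  = np.array([[1, 2, 0], [4, 5, 6], [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0], [0, -1]], dtype=float)   # simple edge-like filter
feature_map = conv2d_valid(image, kernel)
activated   = np.maximum(0, feature_map)            # ReLU: negatives become zero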
Pooling Layer
Pooling layers are used to reduce the dimensions of the feature map (they are applied
after the convolution operation) while preserving its dominant, i.e., important, features at
each pooling step, which mitigates overfitting. There are different pooling techniques,
such as max pooling, average pooling, and min pooling. Max pooling is used in most
CNN-related projects; it takes the maximum value in each region of the feature map,
usually with a 2×2 filter and a stride of 2.
The figure above illustrates how the max pooling operation is performed with a kernel
size of 2×2 and a stride of 2.
Once feature extraction is done, the resulting matrix is flattened into a single vector and
passed as input to the fully connected layer, where the classification is done.
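A corresponding sketch of 2×2 max pooling with stride 2 and the flatten step, again in plain NumPy with an illustrative feature map:

import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the maximum value in each size x size window."""
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 8]], dtype=float)
pooled    = max_pool2d(fm)        # 4x4 -> 2x2, dominant values kept
flattened = pooled.flatten()      # single vector for the fully connected layer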
Figure 4.8 Fully-Connected Layer
Batch Normalization
Batch normalization is the process of scaling values so that the data falls within a
specific range, which helps reduce training time and maintain stability during the
training process, mitigating overfitting problems. Generally, batch normalization is
applied after a convolution layer or fully connected layer, assuming a Gaussian
distribution. It is computed by subtracting the mean from the output at each layer and
then dividing by the standard deviation.
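In symbols, a standard formulation of the normalization described above (with batch mean $\mu_B$, batch variance $\sigma_B^2$, a small constant $\epsilon$ for numerical stability, and learned scale and shift parameters $\gamma$ and $\beta$) is:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$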
Activation Functions (Non-linearity):
In the CNN model, two activation functions are commonly used: softmax activation and
ReLU activation.
Softmax activation is applied in the output layer to generate a probability distribution
over the output classes, i.e., it chooses the class with the highest probability and assigns
that label to the input image. It is computed by the following formula:

$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

where $p_i$ is the probability for output category $i$, $z_j$ denotes each un-normalized
output from the previous layer in the network, and $K$ is the number of classes.
Cross-entropy, the loss function, is widely used to measure the performance of the model
and is computed using the formula:

$$L(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)$$

The Rectified Linear Unit (ReLU) activation function is widely used in convolutional
neural networks; it replaces negative values with zero, but if it receives a positive value,
it simply returns the same value:

$$f(x) = \max(0, x)$$
Dropout layer:
The dropout layer is used to address overfitting by randomly cutting off some neurons of
the network during the training process on each epoch. For example, a dropout rate of
0.25 means 25% of the neurons are dropped from the network. The value should be
chosen according to the architecture of the model.
Figure 4.9 Dropout Layer
Early stopping
While training the model, we split the dataset into training, validation, and testing sets.
Generally, the validation set takes 15 to 20% of the whole dataset and is used for
cross-validation.
Early stopping halts the training process when the validation error stops decreasing,
because beyond that point the model's ability to generalize also decreases. The model
weights from the best epoch are then restored.
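Putting these pieces together, a minimal Keras sketch of a CNN of this general shape is shown below; the layer sizes are illustrative assumptions rather than the exact architecture used, while the optimizer, learning rate, batch size, epochs, and class count follow the hyperparameter table in Chapter 5:

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),            # FER2013 grayscale input
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),   # more filters as depth grows
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.25),                       # drop 25% of neurons each epoch
    layers.Dense(7, activation="softmax"),      # seven emotion classes
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Stop when validation loss stops improving; keep the best epoch's weights
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=150, batch_size=64, callbacks=[early_stop])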
Evaluation Metrics
Evaluating the performance of a classification model is crucial to find how the model
performs on unseen data and to guide improvements to the model. This can be achieved
using various evaluation metrics such as accuracy, precision, recall, and F1-score. The
confusion matrix is one of the most widely used tools for measuring the performance of
a classification model.
Confusion Matrix
A confusion matrix is an N×N matrix used for evaluating the performance of a
classification model, where N is the number of target classes. The matrix compares the
labels predicted by the machine learning model against the actual labels, giving a holistic
view of the performance and of what kinds of errors occur. By observing the kinds of
errors, we can tune the model's hyperparameters and retrain the model to improve its
performance. Classification problems can be binary or multiclass. For binary
classification, the structure of the confusion matrix looks like the table below:

Table 2 Confusion Matrix

                     Predicted Positive       Predicted Negative
Actual Positive      True Positive (TP)       False Negative (FN)
Actual Negative      False Positive (FP)      True Negative (TN)
True Positive (TP): Correctly classified positive cases; both the actual and predicted
class are positive.
True Negative (TN): Correctly classified negative cases; both the actual and predicted
class are negative.
False Positive (FP): The model predicts the positive class, but the actual class is
negative. This is called a Type-1 error.
False Negative (FN): The model predicts the negative class, but the actual class is
positive. This is called a Type-2 error.
Accuracy: The percentage of correct predictions made by the model.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision: The percentage of correctly classified positive cases out of all cases predicted
as positive.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: The percentage of correctly classified positive cases out of all actual positive
cases.

$$\text{Recall} = \frac{TP}{TP + FN}$$

F1 Score: The harmonic mean of precision and recall.

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
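As a practical note, these metrics can be computed directly with scikit-learn; the label arrays below are hypothetical:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [0, 1, 1, 0, 1, 0, 1]   # hypothetical actual labels
y_pred = [0, 1, 0, 0, 1, 1, 1]   # hypothetical predicted labels

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))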
Chapter 5: Implementation and Testing
5.1 Implementation
Implementation is the execution phase of the software development life cycle, where the
conceptual design and planning stages are converted into working code. This phase
integrates with the testing phase to verify whether the system returns the expected
results specified in the requirements.
The project consists of two primary phases: predicting the emotion and recommending
music tracks.
In the first phase, we took the FER2013 dataset from Kaggle, which consists of train and
test sets, each with seven classes (happy, sad, angry, neutral, surprised, fear, and
disgust). The training set was then split into training and validation subsets (80% and
20%, respectively). After that, the model was trained using the CNN algorithm, which
predicts facial emotion and returns one of the seven class labels.
In the second phase, the predicted emotion is mapped to a music mood, and a list of
music tracks is fetched from the Spotify Web API using the track id and recommended
to the user.
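The mapping and top-15 selection in the second phase can be sketched with pandas; the emotion-to-mood dictionary, file path, and column names here are assumptions, not the exact ones used:

import pandas as pd

# Assumed mapping from the seven FER2013 emotions to the four music moods
EMOTION_TO_MOOD = {"happy": "happy", "surprised": "surprised",
                   "sad": "sad", "angry": "calm", "fear": "calm",
                   "disgust": "calm", "neutral": "happy"}

def recommend_track_ids(predicted_emotion, moods_csv="music_moods.csv"):
    """Return the 15 most popular track ids for the mapped mood."""
    df = pd.read_csv(moods_csv)                       # hypothetical file path
    mood = EMOTION_TO_MOOD[predicted_emotion]
    matches = df[df["mood"] == mood]                  # assumed column name
    top = matches.sort_values("popularity", ascending=False).head(15)
    return top["track_id"].tolist()                   # assumed column name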
Jupyter Notebook is a web-based application used to create interactive documents that
can contain live code, equations, visualizations, media, and output [7].
In this project, it is used for preprocessing and visualizing the dataset, building the CNN
model, and preparing the music_mood dataset.
TensorFlow
TensorFlow is an open-source software library developed by Google for numerical
computation using data flow graphs. It is primarily used for optimizing and deploying
machine learning models and deep neural networks; it can run on both GPU and CPU
and even on mobile operating systems. It accepts input in the form of a multidimensional
array called a tensor.
Keras sits on top of TensorFlow and provides an easy interface for designing neural
networks. In this project, TensorFlow and Keras were used for building the CNN model
and as the backend tool that tests a new image given by the user and predicts the
emotion.
Dataset
In this project, two datasets were downloaded from Kaggle: the FER2013 dataset and
the Spotify millions dataset.
FER2013 dataset
The FER2013 dataset is split into two parts: a training set and a test set. The training set
consists of 28,709 images, while the test set consists of 7,178 images. The dataset
contains 48×48-pixel grayscale images of faces, each labeled as one of seven classes:
happy, sad, angry, surprise, neutral, fear, or disgust. The grayscale images shown below
are taken from the FER2013 dataset and show different moods.
Figure 5.1 Samples of FER2013 Dataset
Music_moods dataset
I took a dataset that is already preprocessed and clustered into four different music
moods: happy, sad, calm, and surprised. It consists of the Spotify music track id, song
name, mood type, artist name, and so on.
In this project, the track_id is taken from the dataset after mapping the predicted
emotion to a music mood, and the song details are then fetched from the Spotify Web
API.
Pandas
Pandas is a powerful tool for data manipulation that can present any type of data in a
tabular structure. It offers various functions for checking null values, duplicate values,
shape, data info, and so on, making it an important tool for cleaning, transformation,
and analysis.
Numpy
NumPy stands for Numerical Python and is a useful tool for numerical computing. It
offers functions for working with multidimensional arrays, mathematical operations,
linear algebra, etc.
Matplotlib
It supports visualization of data using bar charts, column charts, scatter plots, pie charts,
etc.
OpenCV
OpenCV is an open-source computer vision and machine learning library used to detect
and recognize faces, identify objects, classify human actions in videos, track camera
movements, extract 3D models of objects, and so on. In this project, it is used to detect
faces in real time [8].
Streamlit
Streamlit is an open-source Python framework which allows us to turn our work into a
web app with just a few lines of code. Using it, data scientists and ML engineers can
deploy apps on streamlit.io without the overhead of configuration [9].
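For illustration, a minimal Streamlit sketch of the upload-and-recommend flow; the commented helper calls are hypothetical placeholders tying together the sketches above:

import streamlit as st
import numpy as np
from PIL import Image

st.title("Emotion Based Music Recommendation System")

uploaded = st.file_uploader("Upload a face image", type=["jpg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("L").resize((48, 48))  # grayscale 48x48
    st.image(uploaded, caption="Uploaded image")

    x = np.asarray(image, dtype="float32").reshape(1, 48, 48, 1) / 255.0
    # Hypothetical helpers: a trained model plus the recommender sketched earlier
    # emotion = predict_emotion(model, x)
    # for track_id in recommend_track_ids(emotion):
    #     st.write(fetch_track_info(track_id))
    st.write("Prediction and recommendations would be shown here.")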
Spotify Web API
It provides a wide range of functionality, such as retrieving song metadata, playback
control, managing playlists, and recommending songs.
Documentation tools
Microsoft Word, draw.io, and Excel were used as supporting tools for documentation.
Actual Result: Verified successfully; the image, song name, and artist are fetched
successfully.
Emotion-mood mapping: the predicted emotion should be successfully mapped to a
predefined mood category. Result: mapped successfully.
Table 8 Hyperparameters and their values

Hyperparameter       Value
Batch size           64
No. of classes       7
Learning rate        0.001
Optimizer            Adam
Epochs               150
Figure 5.3 Predicted output on test data
Confusion Matrix
The figure below illustrates how the model predicted the label of each input image,
comparing the true labels against the predicted labels.
Chapter 6: Conclusion and Future Recommendation
6.1 Conclusion
The project started with extensive research into emotion recognition and emotion-based
music recommendation systems proposed by different data scientists and developers.
Nowadays, AI-powered solutions are growing rapidly, so we decided to work in this
area.
In our case, we utilized the FER2013 dataset to train a convolutional neural network
(CNN) model for facial emotion detection, achieving 60% accuracy on the test data.
The predicted emotion is then mapped to a music mood; the music_moods dataset,
which acts as a file storage system, provides the corresponding track ids, and the related
music metadata is fetched from Spotify and recommended to the user.
References
[1] M. Athavle, “Music Recommendation Based on Face Emotion Recognition,” Journal of
Informatics Electrical and Electronics Engineering (JIEEE), vol. 2, no. 2, pp. 1–11, Jun.
2021, doi: 10.54060/jieee/002.02.018.
[4] S. Metilda Florence and M. Uma, “Emotional Detection and Music Recommendation
System based on User Facial Expression,” IOP Conference Series: Materials Science
and Engineering, vol. 912, no. 6, p. 062007, Aug. 2020, doi: 10.1088/1757-899x/912/6/062007.
Appendices