Emotion Based Music Recommendation System
Abstract— A person often finds it difficult to choose which music to listen to from a large collection. Depending on the user's mood, a variety of recommendation frameworks have been made available for topics including music, festivals and celebrations. The main goal of our music recommendation system is to offer users recommendations that match their preferences. Understanding the user's present facial expression enables us to predict the user's emotional state, since humans frequently use their facial expressions to convey their intentions. More than 60% of users have at some point felt that their music playlists contain so many songs that they are unable to choose one to play. A recommendation system can help a user decide which music to listen to, allowing the user to feel less stressed. This work is a study on how to track and match the user's mood using a face detection mechanism, saving the user the time spent searching or looking up music. Emotion detection is performed using deep learning, a well-known approach in the facial recognition arena, and a convolutional neural network has been used for the facial recognition. We use Streamlit, an open-source app framework, to turn the model into a web application. The user is then shown songs that match his or her mood: we capture the user's expression using a webcam, and appropriate music is then played according to the detected mood or emotion.

Keywords— emotion recognition, convolutional neural network, Streamlit

I. INTRODUCTION

Nowadays, music services make vast amounts of music easily accessible. People are constantly attempting to improve music organization and search management in order to alleviate the difficulty of selection and make discovering new music easier. Recommendation systems are becoming increasingly common, allowing users to choose suitable music for any circumstance. Music recommendations can be used in a range of situations, including music therapy, sports, studying, relaxing, and supporting mental and physical activity. [1] However, in terms of personalization and emotion-driven recommendations, there is still a gap. Humans are massively influenced by music: it is a key factor in regulating mood, relaxation, mental and physical work, and stress relief, and music therapy can be used in a variety of clinical contexts and practices to help people feel better. In this project, we create a web application that recommends music based on emotions. Emotion influences how people live and interact with one another; at times it can seem that we are controlled by our emotions. The emotion we are experiencing at any given moment affects the decisions we make, the actions we undertake, and the impressions we form. Neutral, angry, disgust, fear, glad, sad, and surprise are the seven primary universal emotions, and the look on a person's face can reveal these basic emotions. This study presents a method for detecting these basic universal emotions from frontal facial expressions. After implementing the facial recognition machine learning model, we then turn it into a web application using Streamlit. The emotion detection is performed using deep learning, a well-known approach in the pattern recognition arena. The Keras library is used, along with the convolutional neural network (CNN) algorithm. A CNN is an artificial neural network with machine learning components; among other things, CNNs can be used to detect objects, perform facial recognition and process images. [2]

II. LITERATURE REVIEW

Humans frequently convey their emotions in a variety of ways, such as hand gestures, voice and tonality, but they do so mostly through facial expressions. An expert would be able to determine the emotions being experienced by another person by observing or examining them. Nevertheless, with the technological advancement in today's world, machines are attempting to become smarter and aim to operate in an increasingly human-like way. By training a computer on human emotions, the machine becomes capable of analyzing and reacting like a human. By enabling precise expression patterns with improved competence and error-free emotion calculation, data mining can assist machines in perceiving and acting more like humans. A music player that depends on emotions takes less time to find appropriate music that the user can resonate with. People typically have a lot of music in their playlists, which makes it difficult to choose an appropriate song, and random music does not make the user feel better; with the aid of this technology, songs can be played automatically based on the user's mood. [3] The webcam records the user's image and the pictures are stored; the system records the user's varied expressions to assess their emotions and select apt music.

The ability to read a person's mood from their expression is important. A webcam is used to capture the facial expressions. This input can be used, among other things, to extract data from which a person's attitude can be inferred. Songs are suggested using the emotion inferred from this input, which reduces the tedious job of manually classifying songs into various lists. The main objective of the Facial Expression Based Music Recommender is to scan and analyze the data and then suggest music in line with the user's mood. [4]

By utilizing image processing, we have developed an emotion-based music system that would allow the user to have suitable music played automatically according to his or her mood.
III. METHODOLOGY

This work takes into account the major challenges that a machine learning system faces, and the core of the system is the data training part. The training portion of the system is trained using real data of people's facial emotions. For instance, for the system to recognize an angry facial emotion, it must first be trained with the angry expression; similarly, for the model to recognize a happy facial emotion, it must first be trained with the happy emotion. To prepare the model with these emotion types, we make use of a re-training process, with re-training data assembled from the real world. The re-training portion was the main challenge in the system, although various other parts are also challenging. Machine learning is an extremely powerful tool that allows more efficient and rapid processing of large databases, which makes emotion detection more accurate. The system is able to provide feedback in real time: the model does not need to wait to produce the final result at a later time, and the captured photo does not need to be stored.

A. Data Collection

MediaPipe assigns different landmarks to different points on the face. The data contains the different landmark points of the face, and one particular row comprises all the key points: face key points, left-hand landmarks and right-hand landmarks; every other sample has the same properties. We compare the differences in those landmarks for each emotion to train the model on the different emotions shown by the user, such as happy, sad, etc. Hence, the model is able to classify each of the emotions expressed by the user.

We use the video capture class to capture the video feed coming from the webcam. After capturing the video, the system reads each frame and shows it to the user. We make use of the Holistic solution inside MediaPipe: it takes in the frame and returns all the key points, such as the face, the left hand and the right hand. The frame is first converted from cv2's BGR colour format to RGB, since MediaPipe expects RGB input. We then call the process function of the Holistic object, pass it the frame, and get the result out of it.

We then use the drawing utility to mark the face landmarks, right-hand landmarks and left-hand landmarks of the result variable on the frame, and we store all these readings in a list.

The collected data for the various emotions is then stored in NumPy file format with a specific name associated with each emotion (such as happy.npy, sad.npy, angry.npy, neutral.npy, rock.npy, surprise.npy).
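The collection loop described above can be sketched as follows. This is a minimal illustration assuming OpenCV, NumPy and MediaPipe are installed; the frame count, the zero-padding for missing hands and the use of only x and y coordinates are illustrative choices, not the paper's exact implementation.

# Minimal data-collection sketch: capture webcam frames, extract MediaPipe
# Holistic landmarks, and store them as one .npy file per emotion.
import cv2
import numpy as np
import mediapipe as mp

emotion_name = "happy"                     # illustrative label; one file per emotion
holistic = mp.solutions.holistic.Holistic()
cap = cv2.VideoCapture(0)

samples = []
while len(samples) < 100:                  # collect a fixed number of frames
    ok, frame = cap.read()
    if not ok:
        break
    # cv2 delivers BGR frames; MediaPipe expects RGB
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = holistic.process(rgb)

    if result.face_landmarks:
        row = []
        # one row = all face key points plus both hand landmarks (zeros if absent)
        for lm in result.face_landmarks.landmark:
            row.extend([lm.x, lm.y])
        for hand in (result.left_hand_landmarks, result.right_hand_landmarks):
            if hand:
                for lm in hand.landmark:
                    row.extend([lm.x, lm.y])
            else:
                row.extend([0.0] * 42)     # 21 hand landmarks x 2 coordinates
        samples.append(row)

    cv2.imshow("collecting", frame)
    if cv2.waitKey(1) == 27:               # Esc to stop early
        break

cap.release()
cv2.destroyAllWindows()
np.save(f"{emotion_name}.npy", np.array(samples))   # e.g. happy.npy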
B. Data Training

Creating a convolutional neural network (CNN) is an excellent method for categorizing images using deep learning, and we make use of the Keras library in Python to construct the CNN model. Computers see photos as pixels, and the pixels in a photo are normally related to each other: for instance, an edge in the image, or some related pattern, might be represented by a specific set of pixels, and convolution uses these pixel relationships to recognize images. In the convolution layer, the matrix of pixels is multiplied element-wise with a filter matrix (also known as a kernel) and the products are summed. After processing one patch of pixels, the filter moves on to the next patch and processes it in the same way, and this continues until the whole image has been covered.
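As a concrete illustration of this multiply-and-add step, the following toy example slides a 3x3 filter over a 5x5 image; the image values and the kernel are invented purely for illustration and are not from the paper.

# Toy illustration of the convolution step: a 3x3 kernel is multiplied
# element-wise with each 3x3 patch of pixels and the products are summed.
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # stand-in 5x5 grayscale image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)       # simple vertical-edge filter

out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]
        out[i, j] = np.sum(patch * kernel)          # multiply and add, then slide on

print(out)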
We make use of a model of type Sequential, which is the simplest way to build a model in Keras: the model is created layer by layer, and we use several layers in our model. The first two layers are convolution layers that work on the input images, which are represented as 2-dimensional matrices. We use ReLU (the rectified linear unit) as the activation function for these first two layers, since the ReLU activation has proved to work well with neural networks. A Flatten layer sits between the Conv2D layers and the Dense layers; it serves as the connection between the convolutional output and the Dense layers. The predicted result is the class with the highest probability. The next step is compilation of the model, which is done using three important parameters: metrics, optimizer and loss. Of the three, the learning rate is managed by the optimizer. For the loss function, categorical cross-entropy was used, a widely used option for classification; the lower the score, the better the performance of the model. To make the results easier to interpret while the model is trained, the accuracy metric is used.
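A minimal Keras sketch of the layer stack and compilation step described above follows. The 48x48 grayscale input shape, the filter counts, the Dense width, the six-class output (matching the .npy files listed in the text) and the choice of the Adam optimizer are assumptions for illustration; the paper specifies only the layer types, the ReLU activations, the categorical cross-entropy loss and the accuracy metric.

# Sketch of the Sequential CNN described in the text.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense

num_emotions = 6   # happy, sad, angry, neutral, rock, surprise (assumed class count)

model = Sequential([
    # first two layers are convolutions with ReLU activation on 2-D image input
    Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    Conv2D(64, (3, 3), activation="relu"),
    # Flatten bridges the Conv2D output and the Dense layers
    Flatten(),
    Dense(128, activation="relu"),
    # one probability per emotion; the highest probability is the prediction
    Dense(num_emotions, activation="softmax"),
])

# compiled with the three parameters named in the text: optimizer, loss, metrics
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# after preparing X and y (see the data-preparation sketch below), the model
# would be trained and saved, e.g.:
#   model.fit(X, y, epochs=50)
#   model.save("model.h5")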
Initially, all the files created during data collection (happy.npy, sad.npy, angry.npy, neutral.npy, rock.npy, surprise.npy) are located, and the emotion label is filtered out of each file name using the split function. The samples are stored in one array (the X array) with the associated labels in another (the y array). For example, in the case of happy.npy, all the data under happy.npy becomes input to the model and is stored in the X array, and the value we require the model to predict for those rows is the emotion "happy"; this prediction target is stored in the y array. Once this initialization is completed, we concatenate the per-file arrays: the input data is concatenated into the X array and the prediction targets into the y array, with each file name mapped to an integer label. The model then passes this data through the CNN algorithm to learn to predict the emotion of the person.
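A sketch of this data-preparation step, assuming the .npy files produced during data collection are present in the working directory; the integer label per file and the one-hot conversion follow the description above, while the variable names are illustrative.

# Build X (samples) and y (labels) from the per-emotion .npy files.
import numpy as np
from tensorflow.keras.utils import to_categorical

files = ["happy.npy", "sad.npy", "angry.npy", "neutral.npy", "rock.npy", "surprise.npy"]

X, y = None, None
labels = []
for idx, fname in enumerate(files):
    data = np.load(fname)
    label = fname.split(".")[0]          # filter the label out of the file name
    labels.append(label)
    if X is None:
        X, y = data, np.full(len(data), idx)
    else:
        X = np.concatenate([X, data])
        y = np.concatenate([y, np.full(len(data), idx)])

y = to_categorical(y)                     # integer labels -> one-hot vectors
print(X.shape, y.shape, labels)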
C. Frontend

Earlier we created a model that can detect different emotions such as sad, angry and happy. We then deploy this model into a web app. The trained model gives us a model.h5 file.
It is important to note that the h5 file stores structured data (the model's weights and configuration), not a model in and of itself; Keras stores models in this manner because the weights and the model setup can easily be kept in a single file. Keras is a powerful and user-friendly free open-source Python tool for building and evaluating deep learning models. The model.h5 file is used to create the web app with different Python libraries, mainly Streamlit and streamlit-webrtc.
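A schematic Streamlit frontend is sketched below, assuming the streamlit-webrtc package. The class name, widget labels and label ordering are illustrative, and the landmark extraction and prediction inside recv() are only indicated in comments rather than implemented.

# Schematic Streamlit frontend: load model.h5 and stream webcam frames.
import av
import streamlit as st
from tensorflow.keras.models import load_model
from streamlit_webrtc import webrtc_streamer, VideoProcessorBase

model = load_model("model.h5")
labels = ["happy", "sad", "angry", "neutral", "rock", "surprise"]  # assumed order

st.title("Emotion Based Music Recommender")
language = st.text_input("Language")
singer = st.text_input("Singer")

class EmotionProcessor(VideoProcessorBase):
    def recv(self, frame):
        img = frame.to_ndarray(format="bgr24")
        # here the real app would extract MediaPipe landmarks from img and run:
        #   emotion = labels[np.argmax(model.predict(features))]
        # before handing the detected emotion to the recommendation step
        return av.VideoFrame.from_ndarray(img, format="bgr24")

webrtc_streamer(key="emotion", video_processor_factory=EmotionProcessor)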
The web app also takes the user's preferred language and singer as inputs. To recommend songs from YouTube we import another module named webbrowser, which opens the URL. We inject all the keywords retrieved from the frontend into the URL query. [17] The user is then redirected to a YouTube page according to the URL query. For example, if we input the language as English and the singer as Taylor Swift, and the emotion is detected as happy, then the web app will recommend happy songs by Taylor Swift.
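A sketch of this redirection step, assuming a plain YouTube search URL; the exact query format is not specified in the paper, so the string construction here is illustrative.

# Open a YouTube search built from the user's language, singer and detected emotion.
import webbrowser

def recommend(language: str, singer: str, emotion: str) -> None:
    query = f"{language}+{emotion}+songs+{singer}".replace(" ", "+")
    webbrowser.open(f"https://www.youtube.com/results?search_query={query}")

# e.g. English + Taylor Swift + detected emotion "happy"
recommend("English", "Taylor Swift", "happy")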
Fig 4: Final Result

V. CONCLUSION AND FUTURE WORK

One of the essential areas of study is the identification of emotions from facial expressions, which has previously attracted a lot of interest. It is clear that the difficulty of emotion recognition using image processing algorithms has been growing daily, and by utilizing various features and image processing techniques, researchers are constantly looking for solutions to this problem.

In this paper we have implemented a system where two predicates, language and singer, are used to understand the preference of the user. Once the predicates are entered, the webcam captures the image of the user and the model analyzes the captured emotion; the inputs "Language", "Singer" and "Emotion" are then injected into the URL query, and the user is redirected to a YouTube page as required.

This study presents a method for detecting the basic universal emotions from frontal facial expressions. After implementing the facial recognition machine learning model, we then turn it into a web application using Streamlit. Emotion detection is performed using deep learning, a well-known approach in the field of pattern recognition. The Keras library is used, along with the convolutional neural network (CNN) algorithm. A CNN is an artificial neural network that includes machine learning components; among other things, CNNs can be used to detect objects, perform facial recognition and process images.

Some modifications that could be made to this system are:
• Add advice that would help the user according to his or her emotion (for example, if the system detects the user's emotion as sad, it would provide motivational quotes or other advice to cheer up the user).
• Provide small activities that would help improve the mood of the person.
• Improve the face detection accuracy.

VI. REFERENCES

[1] Rumiantcev, M. and Khriyenko, O., 2020. Emotion based music recommendation system. In Proceedings of Conference of Open Innovations Association FRUCT. FRUCT Oy.
[2] Ali, M.F., Khatun, M. and Turzo, N.A., 2020. Facial Emotion Detection Using Neural Network. International Journal of Scientific and Engineering Research.
[3] Dureha, A., 2014. An accurate algorithm for generating a music playlist based on facial expressions. International Journal of Computer Applications, 100, pp. 33-39.
[4] James, H.I., Arnold, J.J.A., Ruban, J.M.M., Tamilarasan, M. and Saranya, R., 2019. Emotion based music recommendation system. Emotion, 6(03).
[5] Gupte, A., Naganarayanan, A. and Krishnan, M. Emotion Based Music Player-XBeats. International Journal of Advanced Engineering Research and Science, 3, 236854.
[6] Ruchika, A. V. Singh, and M. Sharma, "Building an effective recommender system using machine learning based framework," in 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions) (ICTUS), Dec 2017, pp. 215–219.
[7] L. Shou-Qiang, Q. Ming, and X. Qing-Zhen, "Research and design of hybrid collaborative filtering algorithm scalability reform based on genetic algorithm optimization," in 2016 6th International Conference on Digital Home (ICDH), Dec 2016, pp. 175–179.
[8] Will Hill, Larry Stead, Mark Rosenstein, George Furnas, and South Street. Recommending and Evaluating Choices in a Virtual Community of Use. Mosaic: A Journal for the Interdisciplinary Study of Literature, pages 5–12, 1995.
[9] M.A. Casey, Remco Veltkamp, Masataka Goto, Marc Leman, Christophe Rhodes, and Malcolm Slaney. Content-based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE, 96(4):668–696, 2008.
[10] Qing Li, Byeong Man Kim, Dong Hai Guan, and Duk Oh. A Music Recommender Based on Audio Features. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information
retrieval, pages 532–533, Sheffield, United Kingdom, 2004.
ACM.
[11] Bhat, A. S., Amith, V. S., Prasad, N. S., & Mohan, M. (2014). An Efficient Classification Algorithm For Music Mood Detection In Western and Hindi Music Using Audio Feature Extraction. 2014 Fifth International Conference on Signal and Image Processing, 359-364.
[12] Talele, M., Gurnani, Y., Rochani, H., Patil, M. and Soneja, K. Smart Music Player Using Mood Detection.
[13] Ninad Mehendale, "Facial emotion recognition using convolutional neural networks (FERC)," 18 February 2020.
[14] Fan, X., Zhang, F., Wang, H., & Lu, X. (2012). The System of Face Detection Based on OpenCV. In 24th Chinese Control and Decision Conference (CCDC), Taiyuan, China. IEEE.
[15] Gilda, S., Zafar, H., Soni, C., & Waghurdekar, K. (2017). Smart Music Player Integrating Facial Emotion Recognition and Music Mood Recommendation. In 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India. IEEE.