Project Report On Emotion Aware Smart Music Recommendation System Using CNN
• CNN ALGORITHM
• CONVOLUTION LAYER
• POOLING LAYER
• FULLY CONNECTED LAYER
• PROBLEM STATEMENT
• PROBLEM DESCRIPTION
2 LITERATURE SURVEY
• MOOD BASED MUSIC RECOMMENDER SYSTEM
● An Emotion-Aware Personalized Music Recommendation System Using a Convolutional Neural Networks Approach
● DEEP LEARNING IN MUSIC
● Review on Facial Expression Based Music Player
• SMART MUSIC PLAYER BASED ON FACIAL EXPRESSION
3 SOFTWARE REQUIREMENTS
• EXISTING SYSTEM
• PROPOSED SYSTEM
• ARCHITECTURE OVERVIEW
• SYSTEM REQUIREMENTS
4 SYSTEM ANALYSIS
• PURPOSE
• SCOPE
• EXISTING SYSTEM
● PROPOSED SYSTEM
5 SYSTEM DESIGN
● INPUT DESIGN
● OUTPUT DESIGN
● DATA FLOW
● UML DIAGRAMS
6 MODULES
● Data Collection Module
● Emotion Extraction Module
● Audio Extraction Module
● Emotion-Audio Integration Module
7 SYSTEM IMPLEMENTATION
● SYSTEM ARCHITECTURE
8 SYSTEM TESTING
● TEST PLAN
● VERIFICATION
● VALIDATION
● WHITE BOX TESTING
● BLACK BOX TESTING
● TYPES OF TESTING
● REQUIREMENT ANALYSIS
● FUNCTIONAL ANALYSIS
● NON-FUNCTIONAL ANALYSIS
9 CONCLUSION
REFERENCES
PLAGIARISM REPORT
LIST OF FIGURES
ABSTRACT
A CNN takes an image as input, processes it, and classifies it under a certain category such as dog, cat, lion or tiger. A computer sees an image as an array of pixels, and the size of that array depends on the image resolution. Based on the resolution, the computer sees the image as h * w * d, where h = height, w = width and d = depth (the number of channels). For example, a 6 * 6 RGB image is a 6 * 6 * 3 array, and a 4 * 4 grayscale image is a 4 * 4 * 1 array.
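To make the h * w * d layout concrete, here is a minimal Python sketch. It assumes an image is stored as nested lists indexed as [row][column][channel]; the helper names make_image and shape are hypothetical, and a practical project would normally use NumPy arrays instead.

```python
# Minimal sketch (assumption: an image is a nested list indexed
# [row][column][channel]; real projects usually use NumPy arrays).

def make_image(h, w, d, value=0):
    """Return an h x w x d image filled with a constant pixel value."""
    return [[[value] * d for _ in range(w)] for _ in range(h)]

def shape(image):
    """Return (height, width, depth) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))

rgb = make_image(6, 6, 3)   # 3 channels: red, green, blue
gray = make_image(4, 4, 1)  # 1 channel: intensity

print(shape(rgb))   # (6, 6, 3)
print(shape(gray))  # (4, 4, 1)
```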
In a CNN, each input image passes through a sequence of convolution layers with filters (also known as kernels), pooling layers and fully connected layers. Finally, the softmax function is applied to classify the object with probability values between 0 and 1.
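The softmax step can be sketched in a few lines of Python. The class labels and raw scores below are hypothetical values chosen for illustration, not outputs of this project's model.

```python
import math

def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum score avoids overflow and leaves the result unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["dog", "cat", "lion", "tiger"]
scores = [2.0, 1.0, 0.1, -1.0]  # hypothetical outputs of the last layer
probs = softmax(scores)         # each value lies between 0 and 1
best = labels[probs.index(max(probs))]
print(best)  # dog
```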
Convolution Layer
The convolution layer is the first layer to extract features from an input image. Because it learns image features from small squares of input data, the convolutional layer preserves the spatial relationship between pixels. Convolution is a mathematical operation that takes two inputs: an image matrix and a kernel (or filter).
Convolving a 5*5 image matrix with a 3*3 filter matrix (stride 1, no padding) produces a 3*3 output called the "feature map".
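The following Python sketch computes such a feature map for a 5*5 image and a 3*3 filter, assuming stride 1 and no padding. As in most deep-learning libraries, the filter is slid over the image without being flipped (strictly a cross-correlation). The pixel and filter values are illustrative only, and the function name convolve2d is hypothetical.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (stride 1, no padding).

    The kernel is slid over the image without flipping it, as deep-learning
    libraries do (strictly, a cross-correlation).
    """
    n, f = len(image), len(kernel)
    out = n - f + 1  # a 5x5 image and a 3x3 kernel give a 3x3 feature map
    feature_map = []
    for i in range(out):
        row = []
        for j in range(out):
            # Multiply the kernel element-wise with the patch under it and sum.
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
result = convolve2d(image, kernel)
print(result)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```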
Strides
Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, the filter moves 1 pixel at a time; when the stride is 2, it moves 2 pixels at a time. The following figure shows how the convolution works with a stride of 2.
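A minimal Python sketch of a strided convolution follows, assuming a square image and filter with no padding. For an n*n input, an f*f filter and stride s, the feature map has side floor((n - f) / s) + 1, so a larger stride gives a smaller output. The function names and the all-ones example values are hypothetical.

```python
def output_size(n, f, stride=1):
    """Side length of the feature map: floor((n - f) / stride) + 1."""
    return (n - f) // stride + 1

def convolve2d_strided(image, kernel, stride):
    """Valid convolution where the kernel jumps `stride` pixels per step."""
    n, f = len(image), len(kernel)
    out = output_size(n, f, stride)
    return [
        [
            # Top-left corner of the current patch is (i*stride, j*stride).
            sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(out)
        ]
        for i in range(out)
    ]

image = [[1] * 7 for _ in range(7)]   # 7x7 image of ones
kernel = [[1] * 3 for _ in range(3)]  # 3x3 filter of ones
result = convolve2d_strided(image, kernel, stride=2)
print(result)  # [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
print(output_size(7, 3, 1), output_size(7, 3, 2))  # 5 3
```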
Padding
Padding plays a crucial role in building a convolutional neural network. Without padding, the image shrinks after every convolution, so a network with hundreds of layers would leave only a very small image after the final filter.
What happens if we convolve a grayscale image with a 3*3 filter?
It is clear from the above picture that a pixel in the corner is covered only once, while a pixel in the middle is covered several times. The network therefore gets more information about the middle pixels than about the corner pixels, which leads to two downsides:
o Shrinking outputs
o Losing information at the corners of the image.
Padding addresses both problems by adding a border of extra pixels, usually zeros, around the image before convolving it.
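Both downsides, and the effect of zero padding, can be checked with a short Python sketch. The hypothetical helper coverage counts how many 3*3 filter positions touch each pixel of a 6*6 image, and the hypothetical helper zero_pad adds a one-pixel border of zeros so that a 3*3 filter keeps the output at 6*6.

```python
def coverage(n, f):
    """Count how many f x f filter positions cover each pixel of an n x n
    image during a valid (unpadded, stride-1) convolution."""
    counts = [[0] * n for _ in range(n)]
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            for a in range(f):
                for b in range(f):
                    counts[i + a][j + b] += 1
    return counts

def zero_pad(image, p):
    """Surround an image with a border of p zeros on every side."""
    w = len(image[0]) + 2 * p
    top = [[0] * w for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * w for _ in range(p)]
    return top + middle + bottom

counts = coverage(6, 3)
print(counts[0][0], counts[2][2])  # 1 9  (corner vs. middle pixel)

padded = zero_pad([[1] * 6 for _ in range(6)], 1)
# An 8x8 padded input with a 3x3 filter gives an 8 - 3 + 1 = 6x6 output.
print(len(padded), len(padded[0]))  # 8 8
```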
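Pooling Layer
A pooling layer down-samples each feature map, reducing the number of parameters and the amount of computation when images are large while keeping the most important information.
Max Pooling
Max pooling divides the input into rectangular pooling regions and keeps the maximum value of each region.
Average Pooling
Average pooling performs the down-scaling by dividing the input into rectangular pooling regions and computing the average value of each region.
Syntax (MATLAB Deep Learning Toolbox)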
layer = averagePooling2dLayer(poolSize)
layer = averagePooling2dLayer(poolSize,Name,Value)
Sum Pooling
The sub-regions for sum pooling and mean pooling are set exactly as for max pooling, but instead of the max function we use the sum or the mean.
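The pooling variants differ only in the function applied to each region, as this Python sketch shows. It assumes 2*2 regions with a stride of 2, and the feature-map values are made up for illustration. The function pool2d is hypothetical; in practice a library layer, such as the averagePooling2dLayer syntax above, would be used.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a feature map with max, average or sum pooling.

    The input is split into size x size regions, `stride` pixels apart,
    and each region is reduced to a single number.
    """
    reducer = {
        "max": max,
        "average": lambda vals: sum(vals) / len(vals),
        "sum": sum,
    }[mode]
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            reducer([feature_map[i * stride + a][j * stride + b]
                     for a in range(size) for b in range(size)])
            for j in range(cols)
        ]
        for i in range(rows)
    ]

fm = [[1, 1, 2, 4],
      [5, 6, 7, 8],
      [3, 2, 1, 0],
      [1, 2, 3, 4]]
print(pool2d(fm, mode="max"))      # [[6, 8], [3, 4]]
print(pool2d(fm, mode="average"))  # [[3.25, 5.25], [2.0, 2.0]]
print(pool2d(fm, mode="sum"))      # [[13, 21], [8, 8]]
```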
Fully Connected Layer
In the fully connected layer, the output of the previous layers is flattened into a vector and fed in. The layer transforms this vector into the desired number of output classes.
In the above diagram, the fully connected layers convert the feature map matrix into a vector x1, x2, x3, ..., xn. These features are combined to create a model, and an activation function such as softmax or sigmoid is applied to classify the output as a car, dog, truck, etc.
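The flatten, fully connected and activation steps can be sketched in Python as below. The two 2*2 feature maps, the weights and the biases are hypothetical numbers chosen by hand (a trained network would learn them), and sigmoid is used as the activation here; softmax is the usual choice when exactly one class must be picked.

```python
import math

def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one vector x1, x2, ..., xn."""
    return [v for fm in feature_maps for row in fm for v in row]

def dense(x, weights, biases):
    """Fully connected layer: each output is a weighted sum of ALL inputs."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]

def sigmoid(z):
    """Squash a score into the range (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical values: two 2x2 feature maps and 3 classes.
maps = [[[1, 0], [0, 1]], [[0, 2], [1, 0]]]
x = flatten(maps)  # [1, 0, 0, 1, 0, 2, 1, 0]
weights = [[0.5, 0, 0, 0.5, 0, 0, 0, 0],  # hypothetical "car" weights
           [0, 0, 0, 0, 0, 1.0, 0.5, 0],  # hypothetical "dog" weights
           [-1, 0, 0, -1, 0, 0, 0, 0]]    # hypothetical "truck" weights
biases = [0.0, -1.0, 0.0]
scores = dense(x, weights, biases)  # [1.0, 1.5, -2.0]
probs = [sigmoid(s) for s in scores]
labels = ["car", "dog", "truck"]
print(labels[probs.index(max(probs))])  # dog
```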
CHAPTER 2
LITERATURE SURVEY