
Chapter 1: Introduction: 1.1 Overview


Uploaded by

deepak singla

Chapter 1: Introduction

1.1 Overview
The primary goal of this project was the production of an effective and easy-to-use
software system that could classify music by genre after having been programmed with a
given genre hierarchy and trained on sample recordings. Before this could be
accomplished, of course, there were a number of intermediate tasks to complete, each
with varying degrees of research value of their own. The first task was to study and
consider musical genre from theoretical and psychological perspectives in order to
achieve a broader understanding of the issues involved. This was useful in gaining
insights on how to implement the classification taxonomy and in understanding what
kinds of assumptions might be reasonable to make and what kinds should be avoided.
The next task was the compilation of a library of features, or pieces of information that
can be extracted from music and used to describe or classify it. Features relating to
instrumentation, texture, dynamics, rhythm, melodic gestures and harmonic content can
all be used by humans to make distinctions between genres. Features based on these
parameters were considered along with features that might not be obvious to humans, but
could be useful to a computer.
A model genre hierarchy was then constructed and a large set of MIDI files was
collected in order to train and test the system. Although the large number of genres in
existence made it impossible to consider every possible genre, efforts were made to
incorporate as many different ones as possible, including genres from classical, jazz and
popular music. Each feature from the feature library was then extracted and stored for
each MIDI file. A variety of classification methodologies, based on statistical pattern
recognition and machine learning, were then applied to this data and a system was built
for coordinating the classifiers and improving their collective performance. Feature
selection was performed using genetic algorithms.

Jazz, rock, blues, classical: these are all music genres that people use extensively in
describing music. Whether in the music store on the street or in an online store such as
Apple’s iTunes, with more than 2 million songs, music genres are one of the most
important descriptors of music. This dissertation lies in the research area of Music Genre
Classification, which focuses on computational algorithms that (ideally) can classify a song
or a shorter sound clip into its corresponding music genre. This topic has seen increased
interest recently as one of the cornerstones of the general area of Music Information
Retrieval (MIR). Other examples in MIR are music recommendation systems, automatic
playlist generation and artist identification. MIR is expected to become very important in
the near future in the processing, searching and retrieval of digital music.

Another thing to consider when dealing with genres is the fit of the label to a certain song. Is
there a typical Pop song against which other songs are compared? Whether it is the artists
themselves who place their music into a genre, or an expert working as a producer, they
might have different opinions about what defines a certain genre. Additionally, the mapping
from songs to genres is not one-to-one: a song can have influences from many different
genres at once, which makes classification harder. For example, a Pop song could have
jazzy elements and therefore be labeled Jazz/Pop.

1.2 Machine Learning Technique


Machine learning is a data analytics technique that teaches computers to do what comes
naturally to humans and animals: learn from experience. Machine learning algorithms use
computational methods to “learn” information directly from data without relying on a
predetermined equation as a model. The algorithms adaptively improve their performance as
the number of samples available for learning increases. Deep learning is a specialized form of
machine learning.

Machine learning uses two types of techniques:

Supervised learning, which trains a model on known input and output data so that it can
predict future outputs.
Unsupervised learning, which finds hidden patterns or intrinsic structures in input data.

FIG 1: Machine Learning Technique (machine learning divides into unsupervised learning,
used for clustering, and supervised learning, used for classification and regression)

Supervised Learning

Supervised machine learning builds a model that makes predictions based on evidence in the
presence of uncertainty. A supervised learning algorithm takes a known set of input data and
known responses to the data (output) and trains a model to generate reasonable predictions
for the response to new data. Use supervised learning if you have known data for the output
you are trying to predict.

Supervised learning uses classification and regression techniques to develop predictive
models.

Classification techniques predict discrete responses, for example, whether an email is
genuine or spam, or whether a tumor is cancerous or benign. Classification models classify
input data into categories. Typical applications include medical imaging, speech recognition,
and credit scoring.

Regression techniques predict continuous responses, for example, changes in temperature
or fluctuations in power demand. Typical applications include electricity load forecasting and
algorithmic trading.
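
The difference between the two kinds of supervised output can be illustrated with a minimal
sketch. It is not taken from the project: the data values are invented, and the helper names
fit_line, fit_class_means and classify are hypothetical. A least-squares line stands in for a
regression technique and a nearest-class-mean rule stands in for a classification technique.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (a regression model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def fit_class_means(xs, labels):
    """Store the mean feature value of each class (a nearest-mean classifier)."""
    return {c: mean(x for x, l in zip(xs, labels) if l == c) for c in set(labels)}


def classify(class_means, x):
    """Return the label whose class mean is closest to x."""
    return min(class_means, key=lambda c: abs(class_means[c] - x))


# Regression: invented hour -> temperature readings (continuous response).
hours = [0, 1, 2, 3, 4]
temps = [10.0, 12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(hours, temps)
predicted_temp = slope * 5 + intercept

# Classification: invented suspicious-word counts -> label (discrete response).
counts = [0, 1, 2, 8, 9, 10]
labels = ["genuine", "genuine", "genuine", "spam", "spam", "spam"]
class_means = fit_class_means(counts, labels)
predicted_label = classify(class_means, 7)
```

Both models are trained on known inputs and known responses; only the type of the predicted
response differs (a number in the first case, a category in the second).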

Unsupervised Learning

Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw
inferences from datasets consisting of input data without labeled responses.
Clustering is the most common unsupervised learning technique. It is used for exploratory
data analysis to find hidden patterns or groupings in data. Applications for cluster analysis
include gene sequence analysis, market research, and object recognition.
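
As an illustration of clustering, the following sketch runs plain k-means on one-dimensional
data. The tempo values and the starting centers are invented for the example, and kmeans_1d is
a hypothetical helper, not part of the project code. No labels are supplied; the grouping
emerges from the data alone.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: assign to nearest center, then recompute centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster received no points.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Invented tempo values (beats per minute) with two obvious groups.
tempos = [60, 62, 64, 118, 120, 122]
centers, clusters = kmeans_1d(tempos, centers=[60, 62])
```

Even with a poor choice of starting centers, the slow and fast tempos end up in separate
clusters after a few iterations.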

1.3 Ensemble of Machine Learning


An ensemble is itself a supervised learning algorithm, because it can be trained and then used
to make predictions. The trained ensemble, therefore, represents a single hypothesis. This
hypothesis, however, is not necessarily contained within the hypothesis space of the models
from which it is built. Thus, ensembles can be shown to have more flexibility in the functions
they can represent. This flexibility can, in theory, enable them to over-fit the training data
more than a single model would, but in practice, some ensemble techniques tend to reduce
problems related to over-fitting of the training data.
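
The claim that an ensemble's hypothesis need not lie in the hypothesis space of its members
can be shown with a small sketch. It assumes a majority vote over three one-threshold
classifiers (decision stumps); the thresholds and the helper names stump and majority_vote
are chosen only for illustration.

```python
def stump(threshold, above=True):
    """Return a one-threshold classifier (a decision stump) on a single feature."""
    if above:
        return lambda x: int(x > threshold)
    return lambda x: int(x < threshold)


def majority_vote(models, x):
    """Predict 1 when more than half of the models predict 1."""
    votes = sum(m(x) for m in models)
    return int(votes * 2 > len(models))


# Three stumps: x > 2, x < 5 and x > 8.
models = [stump(2), stump(5, above=False), stump(8)]
ensemble_outputs = [majority_vote(models, x) for x in (1, 3, 6, 9)]
```

A single stump can change its prediction at most once along the feature axis, whereas the
voted ensemble changes three times over the four test points, so it represents a function
that none of its members can.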

A distinction must be made between instrument identification and musical ensemble, or
instrumentation, identification. Instrument identification is concerned with determining which
specific instruments are present in a piece of audio, usually accomplished with frame-level
signal analysis, whereas instrumentation identification attempts to classify the musical
ensemble that produced a song. If the ensemble is classified correctly, the instruments present
in a song are known by default, as an ensemble is defined by its constituent instruments. While
a large amount of research has been performed on instrument identification techniques,
musical ensemble identification has not received the same level of focus.

Fig: system overview

1.4 Music Genre


Music classification can be applied to a wide variety of tasks, both academic and
commercial in nature. Correspondingly, there are many ways in which one can classify music,
for many different purposes. For example, automatic music classification techniques can be
of great use to: libraries and other institutions that archive music; composers and musicians
who wish to make use of these technologies in their creative works; educational institutions
that can use automatic classification techniques in pedagogically useful teaching software;
courts making decisions on potential copyright violations; recording studios and record
companies; music vendors; listeners who wish to improve and customize their listening
experiences and personal music collections; and researchers in both music technology-oriented
disciplines and more traditional fields, such as musicology and music theory. This
sub-section provides highlights of some of the ways in which automatic music classification
can be of value to such users.

To begin with, automatic music classification is an essential part of many types of MIR
research, as is made clear by an examination of the sub-disciplines of MIR. Indeed, many
important areas of MIR research can be formulated directly as automatic music classification
problems. Tasks such as genre, mood, artist and composer classification are all examples of
this, as are tag prediction and classification by time period or geographical place of origin.

Many of the MIR research areas associated with musical similarity also use techniques very
similar to those used in automatic music classification. For example, tasks such as playlist
generation, music recommendation, cover song detection and hit prediction all typically
involve many of the same features used in automatic music classification. Such tasks also
typically require the collection and labelling of ground-truth data and the application of
machine learning algorithms, albeit sometimes using unsupervised rather than supervised
approaches.

Genre is used by music retailers, music libraries and people in general as a primary means of
organizing music. Anyone who has attempted to search through the discount bins at a music
store will have experienced the frustration of searching through music that is not sorted by
genre. There is no doubt that genre is one of the most important means available of
classifying and organizing music. Listeners use genres to find music that they’re looking for
or to get a rough idea of whether they’re likely to like a piece of music before hearing it.
Industry, in contrast, uses genre as a key way of defining and targeting different markets. The
importance of genre in the mind of listeners is exemplified by research showing that the style
in which a piece is performed can influence listeners’ liking for the piece more than the piece
itself.

Unfortunately, consistent musical genre identification is a difficult task, both for humans and
for computers. There is often no generally accepted agreement on what the precise
characteristics are of a particular genre and there is often not even a clear consensus on
precisely which genre categories should be used and how different categories are related to
one another. The problems of determining which musical features to consider for
classification and determining how to classify feature sets into particular genres make the
automatic classification of music a difficult and interesting problem.

The need for an effective automatic means of classifying music is becoming increasingly
pressing as the number of recordings available continues to increase at a rapid rate. It is
estimated that 2000 CDs a month are released in Western countries alone. Software capable
of performing automatic classifications would be particularly useful to the administrators of
the rapidly growing networked music archives, as their success is very much linked to the
ease with which users can search for types of music on their sites. These sites currently rely
on manual genre classifications, a methodology that is slow and unwieldy. An additional
problem with manual classification is that different people classify genres differently, leading
to many inconsistencies, even within a single database of recordings. The mechanisms used
in human genre classification are poorly understood, and constructing an automatic classifier
to perform this task could produce valuable insights.
Genres: The genres considered in this analysis (Pop, Jazz, Classical and R&B) are quite
ambiguous. Usually, one can hear the difference between, for example, Pop music and
Classical music.

Pop music is a very wide genre, appealing to the larger mass of listeners. Pop music is
often said to have simple chord progressions. Pop songs are usually repetitive, with
recurring choruses and catchy melodies. The instrumentation varies heavily, from acoustic
guitars to electronic music.

Jazz music is mostly known for its complex chord progressions, which frequently use seventh
chords, extended chords and borrowed chords. A common chord progression is the ...
Instrumentation is commonly piano, with a set of brass instruments such as trumpets,
trombones and saxophones in addition to the drums, bass guitar and electric guitar.

Classical music is very melodic, and usually orchestrated or played on a solo instrument
such as piano. A common chord progression seen in classical music is the descending-fifths
progression, in which the root of each chord lies a fifth below the root of the previous one.

R&B music is, much like Pop, a very wide genre, except that it features rap as well as sung
melodies. R&B music is usually heavy on beats and bass, with electronic melodies produced
in music production software. It is unclear whether any specific chord progressions
characterize this genre.

1.5 Music Genre Classification


Most of the proposed music genre classification systems consider a few genres in a flat
hierarchy. A hierarchical genre taxonomy has also been suggested for 13 different music
genres, three speech types and a “background” class. The genre taxonomy has four levels with
2-4 splits in each. Hence, to reach the decision of e.g. “String Quartet”, the sound clip first
has to be classified as “Music”, then “Classical”, then “Chamber Music” and finally “String
Quartet”. Feature selection was used on each decision level to find the most relevant features
for a given split, and Gaussian mixture model classifiers were trained for each of these splits.
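
The top-down decision process described above can be sketched as a walk through a nested
dictionary. Only the path Music, Classical, Chamber Music, String Quartet comes from the
text; the other node names, the score dictionary and the toy_choose function (a stand-in for
a trained per-split classifier) are hypothetical placeholders.

```python
# Partial taxonomy: only the "String Quartet" path is taken from the text,
# the remaining nodes are invented to give each split an alternative.
TAXONOMY = {
    "Music": {
        "Classical": {
            "Chamber Music": {"String Quartet": {}, "Other Chamber": {}},
            "Orchestral": {},
        },
        "Jazz": {},
    },
    "Speech": {},
    "Background": {},
}


def classify_hierarchically(features, tree, choose):
    """Walk from the root to a leaf, asking one classifier per split."""
    path = []
    while tree:
        label = choose(path, sorted(tree), features)
        path.append(label)
        tree = tree[label]
    return path


def toy_choose(path, candidates, features):
    """Stand-in for a trained per-split classifier: pick the highest-scoring child."""
    return max(candidates, key=lambda label: features.get(label, 0.0))


# Invented per-label scores for one sound clip.
clip_scores = {"Music": 0.9, "Classical": 0.8, "Chamber Music": 0.7, "String Quartet": 0.6}
decision_path = classify_hierarchically(clip_scores, TAXONOMY, toy_choose)
```

In a real system each call to the chooser would run the classifier trained for that
particular split, using the features selected for that decision level.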
So far, the music has been represented as an audio signal. In symbolic music genre
classification, however, symbolic representations such as the MIDI format or ordinary music
notation (sheet music) are used. This area is very closely related to “audio-based” music
genre classification, but has the advantage that information such as the instrumentation is
known exactly and that the different instruments are split into separate streams. Limitations
of the symbolic representation include the lack of vocal content and the use of a limited
number of instruments.

For classification purposes, a number of standard statistical pattern recognition (SPR)
classifiers were used. The basic idea behind SPR is to estimate the probability density
function (pdf) for the feature vectors of each class. In supervised learning, a labelled
training set is used to estimate the pdf for each class. In the simple Gaussian (GS)
classifier, each pdf is assumed to be a multidimensional Gaussian distribution whose
parameters are estimated using the training set. In the Gaussian mixture model (GMM)
classifier, each class pdf is assumed to consist of a mixture of a specific number of
multidimensional Gaussian distributions. The iterative EM algorithm can be used to estimate
the parameters of each Gaussian component and the mixture weights. In this work, GMM
classifiers with diagonal covariance matrices are used and their initialization is performed
using the k-means algorithm with multiple random starting points. Finally, the k-nearest
neighbor (k-NN) classifier is an example of a nonparametric classifier where each sample is
labeled according to the majority of its k nearest neighbors. That way, no functional form for
the pdf is assumed and it is approximated locally using the training set. More information
about statistical pattern recognition can be found in the literature.
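
A minimal sketch of the GMM and k-NN classifiers described above, using scikit-learn. It is
an illustration under stated assumptions, not the code used in this work: the two-class data
is synthetic, the number of mixture components, restarts and neighbours is arbitrary, equal
class priors are assumed, and fit_gmm_classifier and predict_gmm are hypothetical helper
names.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Two synthetic, well-separated classes in a 2-D feature space (placeholder data).
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)


def fit_gmm_classifier(X, y, n_components=2):
    """Fit one diagonal-covariance GMM per class with EM, k-means init, 5 restarts."""
    return {
        label: GaussianMixture(
            n_components=n_components,
            covariance_type="diag",
            init_params="kmeans",
            n_init=5,
            random_state=0,
        ).fit(X[y == label])
        for label in np.unique(y)
    }


def predict_gmm(models, X):
    """Assign each sample to the class whose GMM gives the highest log-likelihood."""
    labels = sorted(models)
    scores = np.column_stack([models[label].score_samples(X) for label in labels])
    return np.array(labels)[scores.argmax(axis=1)]


gmm_models = fit_gmm_classifier(X, y)
gmm_accuracy = float((predict_gmm(gmm_models, X) == y).mean())

# Nonparametric alternative: label each sample by the majority of its 3 neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
knn_accuracy = float(knn.score(X, y))
```

The accuracies here are measured on the training data itself and only show that the sketch
runs; a real evaluation would use held-out tracks.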

In this experiment, the dataset contains 420 audio tracks for training, 120 for validation
and 60 for testing. Each audio track lasts 30 seconds. We set the batch size, which defines
the number of samples propagated through the network in one training step, to 35. The
accuracy and loss improve over the first 20 epochs. At epoch 20, the test accuracy reaches
its maximum and the loss its minimum. We achieved a classification accuracy of around 0.5
to 0.6. There is still room for improvement: with more training samples, we may be able to
achieve an accuracy of 0.6 to 0.7. The major limitation is the small training set, which
leads to low accuracy and overfitting. Although some genres, such as metal, stand out and
are easy to recognize, other genres that are quite similar to one another are hard to tell apart.
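
The figures quoted above imply a few derived quantities, worked out in the short sketch
below. The variable names are chosen for illustration, and the count of weight updates
assumes one update per batch, which the text does not state.

```python
train_size, val_size, test_size = 420, 120, 60
batch_size, epochs = 35, 20

total = train_size + val_size + test_size
# Share of the dataset used for training, validation and testing.
fractions = [round(n / total, 2) for n in (train_size, val_size, test_size)]
# Number of batches needed to see every training track once (ceiling division).
batches_per_epoch = -(-train_size // batch_size)
# Assumes one weight update per batch.
total_updates = batches_per_epoch * epochs
# Each track lasts 30 seconds.
audio_minutes = total * 30 / 60
```

The split is therefore 70/20/10, and the whole dataset amounts to five hours of audio, which
is consistent with the remark that the training set is small.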

Genre Classification Phase- In this phase a dataset is fed to the classifier, which builds an
in-memory model, here a regression model. This is done by the Logistic Regression module of
the scikit-learn library, using a Python script written for this purpose. Once the model has
been created, we can use it to predict the genres of other audio files. For efficient further
use, the generated model is serialized to disk and de-serialized when it needs to be used
again. This simple step improves performance greatly, because the model does not have to be
retrained for every prediction. As of now, the Python script has to be run before any testing
with an unknown audio file can be performed. Once the script is run, it saves the generated
model. Once the model has been successfully saved, the classification script need not be run
again until some newly labeled training data is available.
Fig: Classification of music genres in GTZAN dataset

Model for music genre


A dataset is used for training the classifier, which generates an in-memory regression model.
This is done by the Logistic Regression module of the scikit-learn library. A Python script
has been provided for this purpose. Once the model has been generated, we can use it to
predict the genres of other audio files. For efficient further use, the generated model is
serialized to disk and de-serialized when it needs to be used again. This simple step
improves performance greatly. As of now, the Python script must be run before any testing
with unknown music can be done. Once the script is run, it saves the generated model. Once
the model has been successfully saved, the classification script need not be run again until
some newly labeled training data is available.
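
The train, serialize and de-serialize cycle described above could look roughly like the
following sketch. It is not the project's script: the feature vectors are synthetic
placeholders, the file name genre_model.pkl is hypothetical, and the standard-library pickle
module is assumed as the serialization mechanism.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder feature vectors; real ones would be extracted from the audio files.
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(5.0, 1.0, (50, 4))])
y_train = np.array(["classical"] * 50 + ["metal"] * 50)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

model_path = Path(tempfile.mkdtemp()) / "genre_model.pkl"  # hypothetical file name
with model_path.open("wb") as f:
    pickle.dump(model, f)  # serialize once, after training

with model_path.open("rb") as f:
    restored = pickle.load(f)  # de-serialize later, without retraining

# Predict the genre of a new (synthetic) feature vector with the restored model.
new_track = np.full((1, 4), 5.0)
predicted_genre = restored.predict(new_track)[0]
```

Only the loading and prediction steps need to run when a new file is classified; training is
repeated only when new labeled data becomes available.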

A few other steps in this process are the following:


i. Testing- A Python script de-serializes the previously cached model and uses it to label
new, unseen audio files.
ii. Output Interpreter- All music files are classified and the trained model is saved to disk.
Graphs are also generated and saved in the directory.
iii. ROC Curves- Receiver Operating Characteristic curves are generated and saved; they
indicate how reliably each genre is predicted once the music files have been classified.
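
To make the ROC step concrete, the sketch below computes the curve and the area under it for
a single genre treated as "this genre versus the rest". The labels and scores are invented,
and roc_points and auc are hypothetical helpers written in plain Python rather than the
project's plotting code.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) pairs obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points, tp, fp = [(0.0, 0.0)], 0, 0
    ranked = sorted(zip(scores, labels), reverse=True)
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: 1 marks a track that really is jazz, scores are model outputs.
is_jazz = [1, 1, 0, 1, 0, 0]
jazz_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
curve = roc_points(is_jazz, jazz_scores)
area = auc(curve)
```

An area close to 1 means the classifier ranks tracks of the genre above the others almost
always, while an area near 0.5 is no better than chance.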

These examples show that there is a whole level of semantics inherent in song lyrics that cannot
be detected solely by audio-based techniques. We thus assume that a song’s text content
can help in better understanding its perception, and evaluate a new approach for combining
descriptors extracted from the audio domain of music with descriptors derived from the
textual content of lyrics. Our approach is based on the assumption that a diversity of music
descriptors and a diversity of machine learning algorithms are able to make further
improvements.

Music information retrieval (MIR) is concerned with adequately accessing (digital) audio.
Important research directions include similarity retrieval, musical genre classification,
music analysis and knowledge representation. Comprehensive overviews of the research field
are available in the literature. The prevalent technique for MIR purposes is to analyse the
audio signal. Popular feature sets include MFCCs, Chroma, and the MPEG-7 audio descriptors.

