Chapter 1: Introduction
1.1 Overview
The primary goal of this project was to produce an effective, easy-to-use
software system that could classify music by genre after being programmed with a
given genre hierarchy and trained on sample recordings. Before this could be
accomplished, of course, there were a number of intermediate tasks to complete, each
with research value of its own. The first task was to study and
consider musical genre from theoretical and psychological perspectives in order to
achieve a broader understanding of the issues involved. This was useful in gaining
insights on how to implement the classification taxonomy and in understanding what
kinds of assumptions might be reasonable to make and what kinds should be avoided.
The next task was the compilation of a library of features, or pieces of information that
can be extracted from music and used to describe or classify it. Features relating to
instrumentation, texture, dynamics, rhythm, melodic gestures and harmonic content can
all be used by humans to make distinctions between genres. Features based on these
parameters were considered along with features that might not be obvious to humans, but
could be useful to a computer.
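As a minimal illustration of such a feature, the sketch below computes a pitch-class histogram, a simple harmonic-content descriptor of the kind discussed above. It assumes the MIDI notes have already been parsed into a plain list of pitch numbers; that intermediate representation is invented here for illustration and is not the project's actual code.

```python
from collections import Counter

def pitch_class_histogram(notes):
    """Return a normalized 12-bin histogram of pitch classes.

    `notes` is a list of MIDI pitch numbers (0-127); the histogram
    describes how often each of the 12 pitch classes occurs, giving a
    simple harmonic-content feature.
    """
    counts = Counter(pitch % 12 for pitch in notes)
    total = sum(counts.values()) or 1
    return [counts.get(pc, 0) / total for pc in range(12)]

# A C-major arpeggio (C4, E4, G4, C5) concentrates its mass on pitch
# classes 0 (C), 4 (E) and 7 (G).
hist = pitch_class_histogram([60, 64, 67, 72])
```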
A model genre hierarchy was then constructed and a large set of MIDI files was
collected in order to train and test the system. Although the large number of genres in
existence made it impossible to consider every possible genre, efforts were made to
incorporate as many different ones as possible, including genres from classical, jazz and
popular music. Each feature from the feature library was then extracted and stored for
each MIDI file. A variety of classification methodologies, based on statistical pattern
recognition and machine learning, were then applied to this data and a system was built
for coordinating the classifiers and improving their collective performance. Feature
selection was performed using genetic algorithms.
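The genetic-algorithm approach to feature selection can be sketched roughly as follows. This is an illustrative toy version (bit-mask individuals, truncation selection, one-point crossover, bit-flip mutation), not the system's actual implementation; in practice the `fitness` function would be a classifier's cross-validated accuracy on the selected features, whereas the toy fitness below simply rewards masks close to a known target subset.

```python
import random

def ga_feature_selection(fitness, n_features, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Select a feature subset with a simple genetic algorithm.

    Each individual is a bit mask over the features; `fitness` scores a
    mask (higher is better).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        survivors = scored[:pop_size // 2]           # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate)  # bit-flip mutation
                     for bit in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: penalize bits that differ from a known "useful" subset.
target = [1, 0, 1, 1, 0, 0, 1, 0]
best = ga_feature_selection(
    lambda mask: -sum(m != t for m, t in zip(mask, target)),
    n_features=8)
```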
Jazz, rock, blues, classical... These are all music genres that people use extensively in
describing music. Whether in the music store on the street or in an online store
such as Apple’s iTunes with more than 2 million songs, music genres are one of the most
important descriptors of music. This dissertation lies in the research area of Music Genre
Classification, which focuses on computational algorithms that (ideally) can classify a song
or a shorter sound clip into its corresponding music genre. This is a topic which has seen
increased interest recently as one of the cornerstones of the general area of Music Information
Retrieval (MIR). Other examples in MIR are music recommendation systems, automatic
playlist generation and artist identification. MIR is expected to become very important in the
near future for the processing, searching and retrieval of digital music.
Another thing to consider when dealing with genres is how well a label fits a particular song. Is
there a typical Pop song against which other songs are compared? Whether it is the artists
themselves who place their music into a genre, or an expert working as a producer, they
may have different opinions about what defines a certain genre. Additionally, the mapping from
songs to genres is not one-to-one: a single song can draw influences from many
different genres at once, which makes classification harder. For example, a Pop song with
jazzy elements might be labeled Jazz/Pop.
Fig 1: Machine learning techniques. Machine learning divides into unsupervised learning, with clustering as its main task, and supervised learning, comprising classification and regression.
Supervised Learning
Supervised machine learning builds a model that makes predictions based on evidence in the
presence of uncertainty. A supervised learning algorithm takes a known set of input data and
known responses to the data (output) and trains a model to generate reasonable predictions
for the response to new data. Use supervised learning if you have known data for the output
you are trying to predict.
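A minimal sketch of this idea follows, using a nearest-centroid classifier chosen purely for brevity; the two-dimensional feature vectors and genre labels are invented for illustration. The "model" learned from the labeled examples is just one mean feature vector per class.

```python
def train_nearest_centroid(X, y):
    """Fit a nearest-centroid classifier from labeled examples:
    the model is one mean (centroid) feature vector per class."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in by_class.items()}

def predict(model, features):
    """Predict the class whose centroid is closest (squared Euclidean)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical 2-D feature vectors (say, tempo and brightness),
# each labeled with a genre — the "known responses" mentioned above.
model = train_nearest_centroid(
    [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]],
    ["classical", "classical", "rock", "rock"])
print(predict(model, [1.1, 0.9]))   # -> classical
```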
Unsupervised Learning
Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw
inferences from datasets consisting of input data without labeled responses.
Clustering is the most common unsupervised learning technique. It is used for exploratory
data analysis to find hidden patterns or groupings in data. Applications for cluster analysis
include gene sequence analysis, market research, and object recognition.
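Clustering can be illustrated with a bare-bones k-means sketch; the data points and the naive initialization below are invented for illustration only. Note that, unlike the supervised case, no labels are provided, yet the algorithm recovers the grouping.

```python
def kmeans(points, k, iters=20):
    """Minimal k-means: alternately assign each point to its nearest
    centroid, then recompute each centroid as its cluster's mean."""
    centroids = points[:k]                      # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        # Recompute means; keep the old centroid if a cluster is empty.
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c
                     else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups in a 2-D feature space.
points = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
centroids, clusters = kmeans(points, k=2)
```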
Genre is used by music retailers, music libraries and people in general as a primary means of
organizing music. Anyone who has attempted to search through the discount bins at a music
store will have experienced the frustration of searching through music that is not sorted by
genre. There is no doubt that genre is one of the most important means available of
classifying and organizing music. Listeners use genres to find music that they’re looking for
or to get a rough idea of whether they’re likely to like a piece of music before hearing it.
Industry, in contrast, uses genre as a key way of defining and targeting different markets. The
importance of genre in the mind of listeners is exemplified by research showing that the style
in which a piece is performed can influence listeners’ liking for the piece more than the piece
itself.
Unfortunately, consistent musical genre identification is a difficult task, both for humans and
for computers. There is often no generally accepted agreement on what the precise
characteristics are of a particular genre and there is often not even a clear consensus on
precisely which genre categories should be used and how different categories are related to
one another. The problems of determining which musical features to consider for
classification, and of determining how to classify feature sets into particular genres, make
the classification of music a difficult and interesting problem.
The need for an effective automatic means of classifying music is becoming increasingly
pressing as the number of recordings available continues to increase at a rapid rate. It is
estimated that 2,000 CDs a month are released in Western countries alone. Software capable
of performing automatic classifications would be particularly useful to the administrators of
the rapidly growing networked music archives, as their success is very much linked to the
ease with which users can search for types of music on their sites. These sites currently rely
on manual genre classifications, a methodology that is slow and unwieldy. An additional
problem with manual classification is that different people classify genres differently, leading
to many inconsistencies, even within a single database of recordings. The mechanisms used
in human genre classification are poorly understood, and constructing an automatic classifier
to perform this task could produce valuable insights.
Genres: the set of genres considered in this analysis (Pop, Jazz, Classical and R&B) is quite
ambiguous. Usually, one can hear the difference between, for example, Pop music and
Classical music.
Pop music is a very wide genre, appealing to the larger mass of listeners. Pop music is
often said to have simple chord progressions. Pop songs are usually repetitive, with
recurring choruses and catchy melodies. The instrumentation varies heavily, from acoustic
guitars to electronic production.
Jazz music is mostly known for its complex chord progressions, frequently using seventh
chords, extended chords and borrowed chords. A common chord progression in Jazz is the
ii-V-I. Instrumentation is commonly piano, with a set of brass instruments such as trumpets,
trombones and saxophones, in addition to drums, bass guitar and electric guitar.
Classical music is very melodic, and usually orchestrated or played on a solo instrument
such as the piano. A common chord progression in Classical music is the descending-fifths
progression, in which the chord roots repeatedly fall by a fifth down the scale.
R&B music is, like Pop, a very wide genre, except that it features rap as well as sung
melodies. R&B is usually heavy on beats and bass, with electronic melodies produced in
music-production software. It is unclear whether any specific chord progressions
characterize this genre.
In this experiment, the dataset contains 420 audio tracks for training, 120 for validation
and 60 for testing. Each audio track lasts 30 seconds. We set the batch size, which defines
the number of samples propagated through the network per training step, to 35. The accuracy
and loss improve over the first 20 epochs; at epoch 20 the test accuracy reaches its
maximum and the loss is minimized. We achieved a classification accuracy of around 0.5
to 0.6. There is still room for improvement: with more training samples, we may be
able to achieve an accuracy of 0.6 to 0.7. The major limitation is the small training set,
which leads to low accuracy and overfitting. Although some genres, such as metal, stand out
and are easy to recognize, it is hard to classify other genres that are quite similar to one another.
Genre Classification Phase: in this phase a dataset is fed to the classifier, which builds an
internal model, in this case a regression model. This is done with the
LogisticRegression module of the scikit-learn library; the Python
script for this purpose is stated. Once the model has been created, we can use it to predict the
genres of other audio files. For efficient further use, the generated model is permanently
serialized to disk, and is de-serialized when it needs to be used again. This simple step
improves performance greatly. As it stands, the Python script must be run before any testing
with unknown audio files can be performed. Once the script has run and the generated
model has been successfully saved, the classification script need not be run
again until newly labeled training data becomes available.
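Assuming the features and labels have already been extracted, this phase might look roughly as follows. The feature values below are invented toy data, and the on-disk serialization described above is condensed to an in-memory pickle round trip for brevity; this is a sketch of the approach, not the project's actual script.

```python
import pickle
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for extracted audio features and their genre labels;
# in the real pipeline these come from the feature-extraction step.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = ["classical", "classical", "rock", "rock"]

# Train the logistic regression model on the labeled data.
model = LogisticRegression().fit(X, y)

# Serialize once, de-serialize for later predictions, so the
# comparatively slow training step is not repeated on every run.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored.predict([[0.15, 0.85]])[0])
```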
Fig: Classification of music genres in the GTZAN dataset
These examples show that there is a whole level of semantics inherent in song lyrics that
cannot be detected by audio-based techniques alone. We thus assume that a song’s text content
can help in better understanding its perception, and evaluate a new approach for combining
descriptors extracted from the audio domain of music with descriptors derived from the
textual content of lyrics. Our approach is based on the assumption that a diversity of music
descriptors and a diversity of machine learning algorithms are able to make further
improvements.
Music information retrieval (MIR) is concerned with adequately accessing (digital) audio.
Important research directions include similarity retrieval, musical genre classification, or
music analysis and knowledge representation. Comprehensive overviews of the research
field are available. The prevalent technique for MIR purposes is to analyse the audio signal.
Popular feature sets include MFCCs, Chroma, and the MPEG-7 audio descriptors.
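Computing full MFCCs is involved, but the flavour of such signal-derived features can be shown with a much simpler descriptor, the spectral centroid, a standard "brightness" measure computed from the magnitude spectrum. The sketch below is illustrative only and is not one of the feature sets named above.

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Spectral centroid: the magnitude-weighted mean frequency of the
    signal's spectrum, a simple brightness descriptor."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# A pure 440 Hz tone should have its centroid at roughly 440 Hz.
sr = 8000                               # sample rate in Hz
t = np.arange(sr) / sr                  # one second of samples
centroid = spectral_centroid(np.sin(2 * np.pi * 440 * t), sr)
```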
Music classification can be applied to a wide variety of tasks, both academic and
commercial in nature. Correspondingly, there are many ways in which one can classify music,
for many different purposes. This sub-section highlights some of the ways in
which automatic music classification can be of value to such users. To begin with, automatic
music classification is an essential part of many types of MIR research, as is made clear by an
examination of the sub-disciplines of MIR. Indeed, many important areas of MIR
research can be formulated directly as automatic music classification problems. Tasks such as
genre, mood, artist and composer classification are all examples of this, as are tag prediction
and classification by time period or geographical place of origin. Many of the MIR research
areas associated with musical similarity also use very similar techniques to those used in
automatic music classification. For example, tasks such as playlist generation, music
recommendation, cover song detection and hit prediction all typically involve many of the
same features used in music classification.