Analysis and Clustering of Musical Compositions Using Melody-Based Features
Abstract
This paper demonstrates that melodic structure fundamentally differentiates musical genres. We use two methods: the
k-Means algorithm, an unsupervised learning method for understanding how to cluster unorganized music, and a Markov
Chain Model, which estimates the relative probabilities of the values the next note is likely to take. We evaluate
each algorithm's accuracy in predicting the correct genre. Our experiments indicate that the k-Means approach is
modestly successful at separating most genres, whereas the Markov Chain Model tends to be very accurate for music
classification.
1 Objective

This paper demonstrates that melodic structure, i.e. note subsequences and which notes are likely to follow other notes, can fundamentally differentiate musical genres, without additional information about instrumentation, chord structure, language, etc.

This idea is inspired in part by the concept in Indian classical music that each raga, or scale, is distinguished by its own characteristic melodic phrases, or melodic idioms. This occurs in Western classical music also; for instance, consider the typical third, trilled second, root, root pattern that ends phrases in Baroque pieces in the major scale.

Potential applications of our models include: using the k-Means algorithm to define musical clusters for melodies without specified genres; using either k-Means cluster centroids or the Markov Chain Model to find known melodies similar to a new melody; using either technique for automated genre prediction; and using the Markov Chain Model to generate new melodies within a musical genre or tradition.

2 Data

Songs were stored as arrays of integers, with each integer representing a musical note. Rhythm was ignored. The sources of data were the following:

2,000 Irish traditional songs scraped from thesession.org in the Dorian, Mixolydian, Ionian (Major) and Aeolian (Minor) modes and in the time signatures 2/4, 3/4, 4/4, 6/8 and 9/8. The majority (70%) were major.

27 Carnatic (South Indian classical) compositions in scales equivalent to the Irish modes: Shankarabharanam (Major), Kharaharapriya (Dorian), Harikambhoji (Mixolydian), Bhairavi (Minor), and Malahari (incomplete Minor).

Smaller data sets: a variety of children's songs and 13 Sarali Varasai (Carnatic vocal exercises) in ragam Mayamalavagowla.

All data are expressed relative to the root: the key is disregarded.

3 Methods

3.1 k-Means Clustering

3.1.1 Rationale

Given a dataset of melodies with unknown genre, can we identify which melodies are similar? To answer this question, we looked for an unsupervised learning algorithm with a non-probabilistic model to identify song clusters. Although we believed that our data would fit subspaces better than clusters, we did not want to obscure the data's original features in our results. Therefore, we decided to use the k-Means algorithm as opposed to alternatives such as PCA.

To test the success of our clustering, we ran k-Means with two clusters on pairs of melody genres, used maximum recall probability to determine the correct cluster assignment, and calculated an F-score to account for both precision and recall error. A higher F-score indicates less error and better success. To account for variability in k-Means, we averaged the F-scores over multiple runs of k-Means.

Eventually, to determine the ideal cluster for a new song, one could compare the song's distance in the feature space to the cluster centroids; the cluster whose centroid is the minimum distance away would be considered the cluster of best fit.
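
The evaluation and nearest-centroid steps described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function names are hypothetical, the clusters are assumed to have been computed already, mapping clusters to genres by the higher mean recall is one reading of "maximum recall probability", and scoring a single genre as the positive class is likewise an assumption.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def f_score(true_labels, predicted_labels, positive):
    """F-score for the genre `positive`: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def label_clusters_by_recall(true_labels, cluster_ids, genres):
    """Map cluster ids 0 and 1 to the two genres, keeping the mapping with
    the higher mean recall. Assumes both genres occur in `true_labels`."""
    def mean_recall(mapping):
        predicted = [mapping[c] for c in cluster_ids]
        recalls = []
        for genre in genres:
            members = [p for t, p in zip(true_labels, predicted) if t == genre]
            recalls.append(sum(p == genre for p in members) / len(members))
        return sum(recalls) / len(recalls)

    first, second = genres
    candidates = [{0: first, 1: second}, {0: second, 1: first}]
    return max(candidates, key=mean_recall)

# Toy example: five songs, their true genres, and their k-Means cluster ids.
true_genres = ["irish", "irish", "irish", "carnatic", "carnatic"]
cluster_ids = [1, 1, 0, 0, 0]
mapping = label_clusters_by_recall(true_genres, cluster_ids, ("irish", "carnatic"))
predicted = [mapping[c] for c in cluster_ids]
score = f_score(true_genres, predicted, "irish")
```

Averaging `score` over several k-Means runs with different initializations would correspond to the averaging step mentioned in the text.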
3.1.2 Feature Definition

We define each feature to be a sequence of absolute notes of a specified length, such as C-E-G#. We characterize each composition by the frequency of each feature in that composition, and cluster the compositions based on their location in the resulting feature space.

3.1.3 Feature Subset Selection

Since using all possible note sequences as features would result in a feature space too sparse for k-Means clustering, we selected a subset of features to serve as the axes of the feature space.

We considered two methods for selecting features: 1) selecting the features that are most frequent in total across all songs in the two genres, and 2) selecting the features with the highest variance in relative frequency, i.e. features that are very frequent in some categories and very infrequent in others.

3.1.4 Number of Features

We varied the number of features selected for k-Means clustering from 1 to 200. Using 1 feature is equivalent to checking whether a single note sequence is more prevalent in some genres than in others. The maximum number of features is $(\text{number of notes})^{\text{feature length}}$.

3.1.5 Feature Length

We examined features of length 1 to 5. For feature length 1, our algorithm is equivalent to analyzing differences in note frequency distributions.

3.2 Markov Chain Model

For our second model, we modeled each genre with a Markov chain model for a variety of levels k. This model makes intuitive sense as a way to capture melodic idioms, because it explicitly models each note as being drawn from a probability distribution dependent on the k notes directly preceding it.

For a level k Markov chain model, we model the probability $P(d^{(i)} \mid g)$ of a held-out document $d^{(i)}$ of length $n$ given a genre $g$ with the following formula:

$$P(d^{(i)} \mid g) = \prod_{j=k+1}^{n} p\left(d_j^{(i)} \mid d_{j-k \ldots j-1}^{(i)}, g\right)$$

where each term $p(d_j^{(i)} \mid d_{j-k \ldots j-1}^{(i)}, g)$ is the smoothed probability given by the Markov chain that the k-note subsequence $d_{j-k \ldots j-1}^{(i)}$ (henceforth also: feature) is followed by the note $d_j^{(i)}$. This equation is the result of a relaxed variant of the Naive Bayes assumption: it is derived by assuming that note $d_j^{(i)}$ is independent of all notes more than k notes before it, so each factor conditions only on the k preceding notes.

Because of the varying sizes of the data sets, we used a standard 70/30% hold-out split for the large data sets and leave-one-out cross-validation (LOOCV) for the smaller ones. The Markov Chain Model tended to perform quite well, with training error of around 1% for k = 3 for the entire data set. The following data involve representations of the songs in terms of relative degree.

Observing the performance of the model as a function of k provides important insight into the data (Figure 2b). Over a variety of parameter values and data subsets, Markov models of level 3 and 4 showed the least average training error.

The failure of k = 1 to predict genre well demonstrates that looking back only one note, which corresponds to melodic idioms of length two (i.e. intervals), is too myopic. (Note that it still performs significantly better than chance, however.) Longer features also lead to poorer models. This also makes intuitive sense, in that longer subsequences begin to be characteristic of the overall melody of a specific song, and are consequently long enough to be easily and consciously recognized. The drive to be unique will therefore discourage songs from developing similar features of this length. Equally importantly, the data become sparser for these values of k, because the size of the feature space is exponential in the length of the feature.

We postulate moreover that levels three and four showed the best genre categorization because they are similar to the most common lengths of measures, which are natural structural breaks in melody. Irish songs in particular tend to be strongly rhythmic, and furthermore form a more robust dataset. We therefore separated the Irish tunes into categories based on their time signatures (2/4, 3/4, 4/4, 6/8 and 9/8), predicting that k = 3 would predict better for 3/4, 6/8 and 9/8, whereas k = 4 would predict better for 4/4 (Figure 3).
This certainly turned out to be true for k = 4. For
k = 3 the odd-numbered time signatures fared much
better, but still had higher error than 4/4. This could
indicate either that songs in the time signature 4/4 tend
to be more self-similar and therefore more predictable, or
that this category simply has more songs.
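
The level-k Markov chain classifier of Section 3.2 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation behind the reported results: the text says only that the probabilities are smoothed, so add-alpha (Laplace) smoothing is assumed here, all function names are hypothetical, and choosing the genre with the highest likelihood implicitly assumes equal genre priors. Log-probabilities are summed instead of multiplying probabilities, to avoid numerical underflow on long songs.

```python
import math
from collections import defaultdict

def train_markov(songs, k):
    """Count how often each k-note context is followed by each note."""
    counts = defaultdict(lambda: defaultdict(int))
    for song in songs:
        for j in range(k, len(song)):
            context = tuple(song[j - k:j])
            counts[context][song[j]] += 1
    return counts

def log_likelihood(song, counts, k, alphabet_size, alpha=1.0):
    """log P(song | genre) for a level-k chain with add-alpha smoothing (assumed)."""
    total = 0.0
    for j in range(k, len(song)):
        context = tuple(song[j - k:j])
        following = counts.get(context, {})
        numerator = following.get(song[j], 0) + alpha
        denominator = sum(following.values()) + alpha * alphabet_size
        total += math.log(numerator / denominator)
    return total

def classify(song, models, k, alphabet_size):
    """Return the genre whose chain assigns the song the highest likelihood."""
    return max(models, key=lambda g: log_likelihood(song, models[g], k, alphabet_size))

# Toy data: notes are integers relative to the root (0 to 12, so 13 symbols).
k = 2
training = {
    "up": [[0, 2, 4, 5, 7, 9, 11, 12], [0, 2, 4, 5, 7]],
    "down": [[12, 11, 9, 7, 5, 4, 2, 0]],
}
models = {genre: train_markov(songs, k) for genre, songs in training.items()}
guess_up = classify([4, 5, 7, 9], models, k, alphabet_size=13)
guess_down = classify([9, 7, 5, 4], models, k, alphabet_size=13)
```

The loop over `j` from `k` to the end of the song is the zero-indexed counterpart of the product over j = k+1, ..., n in the formula.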
Table 1: F-Scores for Feature Subset Selection by Highest Frequency
GENRE             malahari    children's  harikam-    shankara-   bhairavi    kharahara-  irish       irish       irish       irish
                              tunes       bodhi       bharanam                priya       major       minor       dorian      mixolydian
sarali varasai    0.8833      0.8198      0.8077      0.8539      0.9167      0.9042      0.2790      0.5747      0.5983      0.8228
malahari                      0.8542      0.8333      0.8452      0.8667      0.8542      0.2456      0.5531      0.4053      0.4494
children's tunes                          0.7471      0.6392      0.8875      0.8750      0.2482      0.4246      0.4405      0.5118
harikambodhi                                          0.6198      0.8667      0.6889      0.2521      0.3906      0.4160      0.5420
shankarabharanam                                                  0.8786      0.8661      0.2517      0.4428      0.3816      0.6126
bhairavi                                                                      0.6467      0.2636      0.5680      0.5159      0.7406
kharaharapriya                                                                            0.2509      0.4734      0.4875      0.7510
irish major                                                                                           0.5043      0.4459      0.3519
irish minor                                                                                                       0.5983      0.6684
irish dorian                                                                                                                  0.5600
Mean F-score: 60.32%
GENRE             malahari    children's  harikam-    shankara-   bhairavi    kharahara-  irish       irish       irish       irish
                              tunes       bodhi       bharanam                priya       major       minor       dorian      mixolydian
sarali varasai    0.7266      0.7566      0.6701      0.7855      0.8328      0.6466      0.3163      0.9582      0.9594      0.9314
malahari                      0.7689      0.8333      0.8452      0.8667      0.8542      0.2450      0.5170      0.4008      0.4052
children's tunes                          0.6410      0.6606      0.6990      0.6761      0.2463      0.4464      0.3854      0.5453
harikambodhi                                          0.6750      0.8667      0.6889      0.2392      0.3850      0.3407      0.4590
shankarabharanam                                                  0.8786      0.8661      0.2478      0.4250      0.3626      0.4593
bhairavi                                                                      0.7075      0.2589      0.5344      0.4327      0.6249
kharaharapriya                                                                            0.2520      0.4581      0.3857      0.5129
irish major                                                                                           0.4172      0.4372      0.3386
irish minor                                                                                                       0.6197      0.6813
irish dorian                                                                                                                  0.5619
Mean F-score: 57.70%
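
The feature definition and the two feature-selection strategies of Sections 3.1.2 and 3.1.3 can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and computing the variance across per-genre mean relative frequencies is an assumed reading of "highest variance in relative frequency", not a confirmed detail of the experiments.

```python
from collections import Counter
from statistics import pvariance

def ngram_frequencies(song, length):
    """Relative frequency of each note subsequence (feature) of `length` in one song."""
    grams = [tuple(song[i:i + length]) for i in range(len(song) - length + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}

def select_most_frequent(songs, length, n_features):
    """Method 1: features with the highest total count across all songs."""
    totals = Counter()
    for song in songs:
        totals.update(tuple(song[i:i + length]) for i in range(len(song) - length + 1))
    return [g for g, _ in totals.most_common(n_features)]

def select_highest_variance(songs_by_genre, length, n_features):
    """Method 2 (assumed form): features whose mean relative frequency
    varies most from genre to genre."""
    genre_means = {}
    for genre, songs in songs_by_genre.items():
        freqs = [ngram_frequencies(s, length) for s in songs]
        seen = set().union(*freqs)
        genre_means[genre] = {f: sum(fr.get(f, 0.0) for fr in freqs) / len(songs) for f in seen}
    all_features = set().union(*genre_means.values())
    variance = {f: pvariance([m.get(f, 0.0) for m in genre_means.values()]) for f in all_features}
    # Sort by descending variance; ties are broken by the feature itself.
    return sorted(all_features, key=lambda f: (-variance[f], f))[:n_features]

def feature_vector(song, features, length):
    """Coordinates of a song in the feature space spanned by `features`."""
    freqs = ngram_frequencies(song, length)
    return [freqs.get(f, 0.0) for f in features]

# Toy example with features of length 2.
songs_by_genre = {"a": [[0, 2, 4, 0, 2, 4]], "b": [[7, 5, 7, 5, 0, 2]]}
all_songs = [s for songs in songs_by_genre.values() for s in songs]
top_frequent = select_most_frequent(all_songs, 2, 1)
top_variance = select_highest_variance(songs_by_genre, 2, 2)
vector = feature_vector(songs_by_genre["a"][0], [(0, 2), (7, 5)], 2)
```

In this toy example the subsequence (0, 2) is the most frequent overall but occurs in both genres, whereas the variance criterion prefers subsequences that occur in only one of them.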
Figure 2
Figure 3: Training error versus time signature (beats per measure). When training with features of length 4 (right), songs
in 4/4 are much more accurately classified. Features of length 3 (left) significantly improve the classification of songs
in odd-valued time signatures, but these songs are still not classified as well as songs in 4/4.