Piece Identification in Classical Piano Music Without Reference Scores

This paper presents a method for identifying classical piano music pieces from short audio excerpts without requiring reference scores. The approach involves compiling a reference database of performances obtained from online sources, which are then transcribed and processed using a symbolic fingerprinting algorithm to facilitate audio matching. The system automates the creation of the reference database and improves identification accuracy by increasing redundancy and employing a preprocessing step to select suitable performances.

PIECE IDENTIFICATION IN CLASSICAL PIANO MUSIC WITHOUT REFERENCE SCORES

Andreas Arzt, Gerhard Widmer

Department of Computational Perception, Johannes Kepler University, Linz, Austria
Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria

© Andreas Arzt, Gerhard Widmer. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Andreas Arzt, Gerhard Widmer. "Piece Identification in Classical Piano Music Without Reference Scores", 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.

ABSTRACT

In this paper we describe an approach to identify the name of a piece of piano music, based on a short audio excerpt of a performance. Given only a description of the pieces in text format (i.e. no score information is provided), a reference database is automatically compiled by acquiring a number of audio representations (performances of the pieces) from internet sources. These are transcribed, preprocessed, and used to build a reference database via a robust symbolic fingerprinting algorithm, which in turn is used to identify new, incoming queries. The main challenge is the amount of noise that is introduced into the identification process by the music transcription algorithm and the automatic (but possibly suboptimal) choice of performances to represent a piece in the reference database. In a number of experiments we show how to improve the identification performance by increasing redundancy in the reference database and by using a preprocessing step to rate the reference performances regarding their suitability as a representation of the pieces in question. As the results show, this approach leads to a robust system that is able to identify piano music with high accuracy, without any need for data annotation or manual data preparation.

1. INTRODUCTION

Efficient algorithms for content-based audio retrieval enable systems that allow users to browse and explore music collections (see e.g. [10] for an overview). In this context audio fingerprinting algorithms, which permit the fast identification of an unknown recording (as long as an almost exact replica is contained in the reference database), play an important role. For this task there exist highly efficient algorithms that are in everyday commercial use (see e.g. [3, 6, 13, 15–17]).

However, these algorithms are not able to identify different performances of the same piece of music, as they are not designed to work in the face of musical variations such as different tempi, expressive timing, differences in instrumentation, ornamentation and other performance aspects. Regarding classical music, the identification of performances that derive from a common musical score is of special interest, as in general there exists a large number of performances of the same piece (and new renditions are performed every day).

This task is generally called audio matching (or, mostly in the context of popular music, cover version identification, see e.g. [14]). A common approach to solve this problem is to use an audio alignment algorithm. This is computationally expensive, as it basically involves aligning the query snippet with every position within every audio file in the database (see [12], and [11] for an indexing method that makes the problem more tractable). Furthermore, due to the coarse feature resolution of these algorithms, relatively large query sizes are needed.

As there exist efficient fingerprinting algorithms, it seems natural to try to adapt them to the problem of cover version identification. A first study towards this is presented in [9], where the authors focused on the suitability of different low-level features as a basis for fingerprinting algorithms, but neglected the problem of tempo differences between performances. In [1] an extension to a well-known fingerprinting algorithm [17] is proposed that makes it invariant to the global tempo. With the help of an audio transcription algorithm for piano music (see [5]) a system was built that, given a short audio query, almost instantly returns the corresponding (symbolic) score from a reference database, despite the fact that audio transcription is a very hard problem and thus introduces a lot of noise in the process.

In this paper we show how to use this algorithm in the absence of symbolic scores to identify unknown performances, using a reference database based on other performances of the pieces in question. As symbolic scores are often not readily available, this increases the applicability of this algorithm in real-life systems. The downside of this approach is that now audio transcription is used for both the data contained in the reference database and for the queries, which introduces even more noise. Furthermore, the transcription algorithm we are using is optimised on piano sounds, which for now limits the proposed system to piano music only.

We are going to describe this approach in the context of a system geared towards fully automatic identification of classical piano music, in the sense that even the creation
of the collection of audio recordings, which is needed to perform the identification task, is automated. The motivation for this is to reduce the amount of costly manual annotation to a minimum, and instead make use of available, albeit noisy, web sources like YouTube (https://www.youtube.com) or Soundcloud (https://soundcloud.com). The main challenge in this setting is the noise introduced into the identification process via multiple processes (automatic retrieval of reference performances, audio transcription of reference performances, and audio transcription of the query). In the paper we will show how to deal with this amount of noise by increasing redundancy in the reference database and by an automatic selection strategy for the reference performances.

The paper is structured as follows. Section 2 gives an overview of the proposed system. Then, in Section 3 the data we are using for our experiments is described. Sections 4, 5, 6 and 7 describe the core experiments of the paper, showing that our approach is robust enough to cope with the multiple sources of noise and performs well in our experiments. A brief outlook on possible improvements and applications is given in Section 8.

2. SYSTEM OVERVIEW

In this section we are going to describe the piece identification system that will be used throughout the paper. The main goals of the system are 1) to automate the process of compiling a reference database, thus making manual annotations obsolete, and 2) based on this reference database, to allow for robust and fast piece identification. Figure 1 depicts how the components interact with each other.

[Figure 1: block diagram. Reference side: Database Definition (list of pieces, text), Web Crawler (crawl web source for audio recordings, e.g. YouTube), Music Transcription Algorithm (transcribe recordings, i.e. performances of pieces), Automatic Preprocessing (automatically identify suitable performances), Tempo-invariant Symbolic Fingerprinter. Query side: Query (audio snippet of an unseen performance of a piece), Music Transcription Algorithm (transcribe query), Tempo-invariant Symbolic Fingerprinter, Query Results (name of the piece corresponding to the query).]

Figure 1. System Overview

The system is based on a Database Definition file, which is a list of pieces that are to be included in the database. On this list each piece is represented by an ID, the name of the composer and the name of the piece, including identifiers like the opus number (see Figure 2 for an excerpt of the list). We would like to emphasise once more that this is the only input our system needs (in addition to a source from which the recordings can be retrieved). All the data necessary to perform the identification task is then prepared automatically. This also means that extending the database is as easy as adding a new line to the text file, describing the new piece. The data in this file also defines the granularity of the database. For example, movements of a sonata could be represented as individual pieces or combined as a single piece; for our experiments we took the latter approach. For our proof-of-concept implementation we settled for 339 piano pieces of well-known composers (Mozart, Beethoven, Chopin, Scriabin, and Debussy), which already represents a substantial share of the classical piano music repertoire.

ID; Composer; Piece
...
17; Mozart; Piano Sonata No. 17 in B-flat major K 570
18; Mozart; Piano Sonata No. 18 in D major K 576
19; Mozart; Fantasy No. 1 with Fugue in C major K 394
20; Mozart; Fantasy No. 2 in C minor, K 396
...
41; Beethoven; Piano Sonata No. 14, Op. 27, No. 2 "Moonlight"
42; Beethoven; Piano Sonata No. 15, Op. 28 "Pastoral"
...
168; Chopin; Mazurka Op. 7 No. 5 in C major
169; Chopin; Nocturne Op. 15 No. 1 in F major
170; Chopin; Nocturne Op. 15 No. 2 in F-sharp major
171; Chopin; Nocturne Op. 15 No. 3 in G minor
...
281; Debussy; L 113, Children's Corner, Doctor Gradus ad Parnassum
282; Debussy; L 113, Children's Corner, Jimbo's Lullaby
...
332; Scriabin; Piano Sonata No. 3, Op. 23
333; Scriabin; Piano Sonata No. 4, Op. 30
...

Figure 2. An excerpt of the file used for collecting the database.

A Web Crawler takes this list of pieces and retrieves audio recordings of performances of the pieces. In our case we use a simple crawler for YouTube (an alternative would be to use Soundcloud, amongst others). The queries are constructed by concatenating the name of the composer and the piece, and adding the word "piano", to ensure that mainly piano performances are returned.

Next, the collected recordings are fed into a Music Transcription Algorithm that takes the audio files and transcribes them into series of symbolic events. For this step we rely on a well-known neural-network-based method presented in [5], more specifically the version that is available as part of the Madmom library [4]. As input it takes a series of preprocessed and filtered STFT frames with two different window lengths. The neural network consists of a linear input layer with 324 units, three bidirectional fully connected recurrent hidden layers with 88 units, and a regression output layer with 88 units, which directly represent the MIDI pitches. The output of the transcription algorithm is a list of detected musical events, represented by their pitches and start times. For details we refer the reader to [5]. This algorithm exhibits state-of-the-art results for the task of piano transcription, as was demonstrated at MIREX 2014 (http://www.music-ir.org/mirex/wiki/2014:MIREX2014_Results). Still, polyphonic music transcription is a very hard problem, and thus the output of this transcription algorithm contains a relatively large amount of noise, to which the following components need to be robust.

The Automatic Preprocessing step is concerned with the question of which of the downloaded recordings for each piece should be used in our fingerprint database. In this paper we discuss three setups: take the top match returned by the web crawler (see Section 4), take the top five or fifteen matches returned by the web crawler (see Section 5), and download 30 recordings for each piece, rank them automatically by comparing them to each other and use the top recordings identified via this approach (see Section 6). This means that in the latter two experiments a single piece is represented by multiple recordings, adding redundancy to the reference database.

The transcribed sequences of symbolic event information, i.e. sequences of pairs (pitch, onset time), are fed to the Tempo-invariant Symbolic Fingerprinter, to build a database of fingerprints that later on can be used to identify queries. The algorithm is used as described in [1], thus it will be summarised here very briefly. The principal idea of the fingerprinting algorithm is to represent an instance (in this case a transcribed performance, representing a piece) via a large number of local, tempo-invariant fingerprint tokens. These tokens are created based on the pitches of three temporally local note events, together with the ratio of their distances in time. Due to the way they are created, the tokens are invariant to the global tempo, and can be stored in a hash table and efficiently queried for.

An incoming Query is processed in the same way as above by the Music Transcription Algorithm. The resulting sequence of symbolic events is used to query the Tempo-invariant Symbolic Fingerprinter for matches. To do so, from the query the same kind of fingerprint tokens are computed, and matching tokens are retrieved from the fingerprint database. Finally, in this result set continuous sequences of matching tokens, which are a strong indication that the query matches a specific part of a piece stored in the fingerprint database, are identified (via a fast, histogram-based approach).

The Query Result is a list of positions within the reference performances that were inserted into the database (see Table 1). The positions in the result set are ordered by their number of tokens matching the query. As can be seen, the result set is actually more detailed than necessary for our application scenario, as we are only interested in identifying the respective piece, and not a specific reference performance (or even a position within a reference performance). Thus for the experiments in this paper we summarise all occurrences of a piece into one score by summing up the matching scores of all its occurrences in the result set.

Piece ID    Performance ID    Time in Ref.    Score
1           0                 99              351
1           0                 21              292
1           4                 16              109
1           4                 15              36
1           4                 148             36
1           4                 150             32
10          48                368             7
1           0                 239             7

Table 1. An example of a result returned by the fingerprinting algorithm. This query was performed on a database in which multiple reference performances represent a piece of music, hence for the piece with ID 1 results for two performances are returned. The score is the number of matching fingerprint tokens for the given query at the specific time in the reference recording. For our purposes we summarise the results per piece, i.e. the matching score for the piece with ID 1 is 863, and for the piece with ID 10 it is 7.

3. GROUND TRUTH DATA AND EXPERIMENTAL SETUP

For the experiments presented in this paper, ground truth data, i.e. performances for which the composer and the name of the piece are known, is needed. We are using commercial recordings of a large part of the pieces contained in our database. This includes e.g. Uchida's recordings of the Mozart Sonatas, Brendel's recordings of the Beethoven Sonatas, Chopin recordings by Arrau, Pires and Pollini, and Debussy recordings by Pollini, Thibaudet and Zimerman. We would like to emphasise that to get realistic results, in our experiments we made sure manually that no exact replicas of these performances are contained in the automatically downloaded data that is used to build the reference database later on. In total 370 tracks were selected and assigned manually to the respective pieces (roughly 30 hours of music, or 665 000 transcribed events). Some of the tracks were assigned to the same piece, as e.g. the movements of the sonatas are typically represented as different audio tracks, but are represented as a single piece in our database.

The experimental setup is as follows. We are going to use the same set of randomly extracted queries for each experiment. We are using three query lengths of 2, 5 and 10 seconds (we only took queries that had at least 10 transcribed notes, to avoid, e.g., querying for silence), and extract for each length ten queries for each ground truth performance (giving a total of 3 700 queries for each query length). The experiments are based on different strategies to automatically compile the reference database. We start with a simple baseline approach (Section 4) and then gradually improve on it by introducing redundancy and a selection strategy (Sections 5 to 7).

As evaluation measure we use the Recall at Rank k (the related measure Precision at Rank k is not useful in our experimental setup, as there will only be at most one correct result in the result set).
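The query extraction described in the experimental setup can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: it assumes a performance is already available as a list of (pitch, onset time in seconds) pairs and that excerpts are cut from that list (the text does not say whether excerpts are cut before or after transcription). The names `extract_queries`, `MIN_NOTES` and the retry limit `max_tries` are hypothetical.

```python
import random

MIN_NOTES = 10  # from the text: queries need at least 10 transcribed notes


def extract_queries(notes, length_s, n_queries=10, seed=0, max_tries=1000):
    """Pick random excerpts of `length_s` seconds from a transcribed performance.

    `notes` is a list of (midi_pitch, onset_seconds) pairs sorted by onset.
    Excerpts with fewer than MIN_NOTES notes (e.g. silence) are rejected.
    Gives up after `max_tries` attempts (hypothetical safeguard).
    """
    rng = random.Random(seed)
    if not notes:
        return []
    last_onset = notes[-1][1]
    queries = []
    for _ in range(max_tries):
        if len(queries) == n_queries:
            break
        # Random start time such that the window fits into the performance.
        start = rng.uniform(0.0, max(0.0, last_onset - length_s))
        excerpt = [(p, t - start) for p, t in notes
                   if start <= t < start + length_s]
        if len(excerpt) >= MIN_NOTES:
            queries.append(excerpt)
    return queries
```

Calling this once per query length (2, 5 and 10 seconds) on each of the 370 ground truth tracks with `n_queries=10` would give the 3 700 queries per query length mentioned in the text.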
Recall at Rank k is the percentage of queries which have the correct corresponding piece in the first k retrieval results. In our experiments we look at the recall at ranks 1, 5 and 10. In addition, we also report the Mean Reciprocal Rank (MRR).

\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i} \qquad (1)

Here, rank_i refers to the rank position of the correct result for the i-th query.

The mean query times (i.e. the mean time it takes to process a single query) given in the tables are based on a desktop computer on a single core (Intel Core i7 6700K 4 GHz with 32 GB RAM). If needed, the computation could easily be sped up by multi-threading the query process.

4. BASELINE APPROACH

The baseline approach is very straightforward. The web crawler is used to download the top result from the web source for each piece on the list. The downloaded audio files are transcribed and then processed by the fingerprinting algorithm to build the reference database, i.e. in the reference database each piece is represented by one performance. Note that due to the automatic process the database can be quite noisy, as some of the pieces might be incomplete (e.g. only a single movement of a piece), represented by more than the actual piece (if e.g. the performance downloaded for the piece also contains other pieces, like a recording of a full concert), or the representation is wrong (if the top result of the web crawler is actually a performance of some other piece).

The generated fingerprint database is queried via the prepared excerpts of the collected ground truth data (see Section 3). The results of this first experiment can be seen in Table 2. As can be seen, already in this scenario and despite the small query sizes the method gives reasonable results. For queries of length ten seconds the algorithm returns the correct name of the piece in close to 50% of the cases. A closer look at the results though showed that the main problem with this simplistic approach is that, as expected, for many pieces the representation in the database is incorrect or incomplete. This problem is tackled in the following sections.

                          Query Length
                          2 s       5 s       10 s
Recall at Rank 1          0.28      0.38      0.46
Recall at Rank 5          0.34      0.45      0.54
Recall at Rank 10         0.35      0.47      0.55
Mean Reciprocal Rank      0.30      0.41      0.48
Mean Query Time           0.13 s    0.41 s    0.92 s

Table 2. Results of the baseline approach. The results are based on 3 700 queries for each query length.

5. USING MULTIPLE INSTANCES PER PIECE

A simple way to improve the performance of the system is to increase the redundancy within the reference database. Instead of relying on a single instance (recording) for each piece in the reference database, each piece is represented by multiple recordings. For the first experiment five performances per piece were downloaded using the web crawler. The performances were processed in the same way as for the baseline approach in Section 4 above and inserted into the fingerprint database. Then, on this database the same set of queries was run. As described in Section 2, the match score of a piece is computed by summing up the scores of the performances representing the piece in question (also see Table 1).

Table 3 shows the results of this experiment. As can be seen, the increased redundancy leads to a substantial increase in identification results, compared to the baseline (see Table 2). The added redundancy increases the chances that for each piece at least one "good" performance (in the sense of corresponding to the piece and relatively easy to transcribe) is contained in the reference database, and thus mitigates the problems caused by noise, at least to some extent.

                          Query Length
                          2 s       5 s       10 s
Recall at Rank 1          0.58      0.69      0.74
Recall at Rank 5          0.72      0.84      0.90
Recall at Rank 10         0.74      0.86      0.92
Mean Reciprocal Rank      0.64      0.77      0.84
Mean Query Time           0.34 s    0.81 s    2.49 s

Table 3. Results on the reference database based on multiple recordings (the top five results according to the web source) to represent each piece. The results are based on 3 700 queries for each query length.

                          Query Length
                          2 s       5 s       10 s
Recall at Rank 1          0.76      0.87      0.91
Recall at Rank 5          0.84      0.94      0.97
Recall at Rank 10         0.86      0.95      0.98
Mean Reciprocal Rank      0.80      0.90      0.94
Mean Query Time           0.82 s    2.85 s    6.08 s

Table 4. Results on the reference database based on multiple recordings (the top fifteen results according to the web source) to represent each piece. The results are based on 3 700 queries for each query length.

For an additional experiment we increased the number of performances to fifteen per piece. These results are shown in Table 4. This improved the results even further. The downside of adding more instances to the fingerprint database is a significant increase in computation time.
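As a worked illustration of how a query result is turned into a piece-level score and then evaluated, the following sketch sums the rows of Table 1 per piece and computes Recall at Rank k and the MRR of Equation (1). The function names are hypothetical, and treating a query whose correct piece is missing from the ranking as contributing 0 to the MRR is an assumption, not something stated in the text.

```python
from collections import defaultdict


def piece_scores(result_rows):
    """Sum fingerprint match scores per piece.

    `result_rows` holds (piece_id, performance_id, time_in_ref, score)
    tuples, i.e. the columns of Table 1.
    """
    totals = defaultdict(int)
    for piece_id, _performance_id, _time_in_ref, score in result_rows:
        totals[piece_id] += score
    return dict(totals)


def ranked_pieces(result_rows):
    """Piece IDs ordered by descending summed score."""
    totals = piece_scores(result_rows)
    return sorted(totals, key=lambda pid: totals[pid], reverse=True)


def recall_at_k(rankings, truths, k):
    """Fraction of queries whose correct piece is among the first k results."""
    hits = sum(1 for ranking, truth in zip(rankings, truths)
               if truth in ranking[:k])
    return hits / len(truths)


def mean_reciprocal_rank(rankings, truths):
    """Equation (1); a missing correct piece contributes 0 (assumption)."""
    total = 0.0
    for ranking, truth in zip(rankings, truths):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(truths)


# Rows of Table 1: (piece ID, performance ID, time in reference, score)
TABLE_1 = [
    (1, 0, 99, 351), (1, 0, 21, 292), (1, 4, 16, 109), (1, 4, 15, 36),
    (1, 4, 148, 36), (1, 4, 150, 32), (10, 48, 368, 7), (1, 0, 239, 7),
]

print(piece_scores(TABLE_1))  # piece 1 sums to 863, piece 10 to 7
```

The summed values agree with the caption of Table 1 (863 for piece 1, 7 for piece 10). With several reference performances per piece, more rows contribute to the same piece ID, which is how the added redundancy enters the score.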
6. AUTOMATICALLY SELECTING SUITABLE REPRESENTATIONS

A closer look at the results so far shows that increasing the redundancy in the reference database indeed leads to better results, but also increases the computation time. The main problem with our approach is that in addition to useful data, the process also adds a lot of extra noise to the fingerprint database. The web crawler returns a considerable number of performances of the wrong piece, performances played on a different instrument, and performances recorded in very bad quality. This kind of data increases the runtime and decreases the identification accuracy. In this section we present a method for identifying performances in a given set of candidates for a piece that most probably are related to the piece in question, which also enables us to discard performances that most probably are noise. In this way we try to reduce the number of stored fingerprint tokens, which generally decreases the computation time, while still achieving good identification performance.

Thus, for each piece we perform the following process to select appropriate representations. First, 30 recordings are downloaded via the web crawler. With a high probability at least some of these are actually piano performances of the piece we are looking for, while the others might have nothing in common. The idea now is to find a homogeneous group within this set of candidates. To identify performances which are part of this group, we again employ the symbolic fingerprinting process, but limited to the set of candidate performances. To do so, the performances are transcribed and inserted into a new fingerprint database. The intuition is that for a query extracted from the same set of candidate performances (that actually matches the piece), the fingerprinter will likely return three kinds of results. Firstly, the top result will be the performance the query was taken from. This is a perfect fit for all tokens, which results in the maximum score. Secondly, a number of other performances will probably also have a high score, identifying them as being based on the same piece and as being transcribed in sufficient quality. Thirdly, performances that actually belong to a different piece, or which are transcribed poorly, will score very low.

Based on these observations, we designed the process of ranking the performances regarding their suitability to represent the piece in question as follows. For each of the performances ten queries are randomly extracted (for our experiments we used a query length of ten seconds) and processed by the fingerprinting algorithm. As in all other experiments, the results are summarised on the performance level (i.e. match scores of positions within the same performance are summed up). Then, for each result the score of the top match (i.e. of the performance the query stems from) is stored, this performance is removed from the result set, and the remaining matching scores are normalised by dividing by the top match score. The reasoning behind this is that the absolute scores depend on the particulars of the query (foremost the length in the sense of the number of notes, but also e.g. whether the part in question is normally played in a steady tempo or is subject to expressive tempo changes, which makes it harder to detect and leads to a lower score).

This results in 300 preprocessed and normalised result sets. The suitability of a performance to represent the piece in question is computed by summing up all the scores of all its occurrences in the result sets. The higher this value is for a performance, the more it has in common with the other performances assigned to the piece in question.

Based on this ranking we repeat the experiments from Sections 4 and 5, but this time for each piece we select the top one or top five performances, respectively, according to the computed rank within the candidate set for each piece. The results are shown in Tables 5 and 6, which should be compared to Tables 2 and 3, respectively. As can be seen, the selection strategy increases the identification performance for both scenarios and for all query lengths.

                          Query Length
                          2 s       5 s       10 s
Recall at Rank 1          0.54      0.68      0.74
Recall at Rank 5          0.63      0.76      0.83
Recall at Rank 10         0.64      0.78      0.85
Mean Reciprocal Rank      0.58      0.72      0.78
Mean Query Time           0.14 s    0.47 s    0.97 s

Table 5. Results on the reference database based on the top recording selected via the proposed strategy to represent each piece. The results are based on 3 700 queries for each query length.

                          Query Length
                          2 s       5 s       10 s
Recall at Rank 1          0.72      0.85      0.89
Recall at Rank 5          0.82      0.92      0.96
Recall at Rank 10         0.84      0.93      0.97
Mean Reciprocal Rank      0.77      0.88      0.92
Mean Query Time           0.49 s    1.71 s    3.83 s

Table 6. Results on the reference database based on multiple recordings (top five recordings selected via the proposed strategy) to represent each piece. The results are based on 3 700 queries for each query length.

A comparison of Tables 6 and 4 shows that by using the proposed selection strategy a lower number of performances (5 versus 15) is sufficient to achieve comparable identification accuracy. The decreased number of tokens also results in roughly half the computation time.

The runtime actually depends on a number of factors, most importantly the size of the fingerprint database. But of similar influence is the actual number of tokens that are returned by the fingerprint database for a specific query. The reason is that each of these tokens has to be processed individually to come up with the matching score. This also means that queries for pieces which are represented in the database by a large number of performances will actually take longer to compute, a further argument in favour of the selection strategy presented in this section.
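The ranking step of this section can be sketched as below. This is a minimal illustration under assumptions, not the authors' implementation: each result set is assumed to be already summarised per performance, the normaliser is taken to be the score of the performance the query was cut from (which the text equates with the top match), and queries without a positive self-match are skipped. The name `rank_candidates` and the input layout are hypothetical.

```python
from collections import defaultdict


def rank_candidates(result_sets):
    """Rank candidate performances of one piece by mutual similarity.

    `result_sets` is a list of (source_id, scores) pairs, one per query.
    `scores` maps performance IDs to match scores summed per performance;
    `source_id` is the performance the query was extracted from.
    Returns (ranking, suitability): IDs sorted best first, and the
    summed normalised scores per performance.
    """
    suitability = defaultdict(float)
    for source_id, scores in result_sets:
        suitability[source_id] += 0.0  # make sure every candidate is listed
        top = scores.get(source_id, 0)
        if top <= 0:
            continue  # assumption: ignore queries without a self-match
        for perf_id, score in scores.items():
            if perf_id != source_id:  # the source performance is removed
                suitability[perf_id] += score / top
    ranking = sorted(suitability, key=lambda p: suitability[p], reverse=True)
    return ranking, dict(suitability)


# Toy example with three candidates and one query each (made-up scores).
toy = [
    ("a", {"a": 100, "b": 50, "c": 5}),
    ("b", {"b": 80, "a": 60}),
    ("c", {"c": 60, "a": 3, "b": 3}),
]
ranking, scores = rank_candidates(toy)
print(ranking)
```

In the setup described in the text there would be 30 candidates with ten queries each, i.e. 300 result sets per piece, and the top one or top five entries of the ranking would be kept as reference performances.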
7. USING MULTIPLE QUERIES PER PERFORMANCE

So far the assumption was that we only have access to a single short query of two to ten seconds. If instead we have access to a full recording, just querying for one short query would be a suboptimal approach. Thus, we tried an additional query strategy on the reference database based on the performance selection strategy from Section 6 above.

A standard approach for processing long queries (in this case a whole performance) would be to apply shingling [2, 7, 8], i.e. splitting longer queries into shorter, overlapping ones and tracking the results of these sub-queries over time. Here, as proof of concept we use an even simpler method: we select ten random queries from the piece we want to identify, process them individually and sum up the results. This can be seen as adding redundancy (relying on multiple queries instead of a single one) on the query side. We perform this experiment on the reference database based on the top five recordings selected via the proposed strategy. The results are shown in Table 7. As can be seen, this again considerably improves the results, and we are getting very close to 100%. The main cause for this is that the retrieval precision heavily depends on the quality of the transcription. Some parts of a performance are much harder to transcribe than others (e.g. heavily polyphonic parts with a lot of sustain pedal, which are difficult to transcribe correctly). Using multiple queries, randomly distributed over the whole performance, increases the chances that at least some parts are transcribed in good quality, and that together these queries enable high retrieval accuracy.

                          Query Length
                          2 s       5 s       10 s
Recall at Rank 1          0.92      0.95      0.95
Recall at Rank 5          0.98      0.99      0.99
Recall at Rank 10         0.99      1         1
Mean Reciprocal Rank      0.94      0.97      0.97
Mean Query Time           0.49 s    1.71 s    3.83 s

Table 7. Results for querying for a whole performance via ten random small queries with ten seconds each. The results are based on 3 700 queries for each query length.

Finally, we had a closer look at the few performances that were still misclassified and identified two problems. Our approach does not take care of the problem of recordings of full concerts. If included in the reference database for multiple pieces, these will lead to misclassifications. Furthermore, for some pieces only a small number of performances exists, which causes the crawler to return "similar" but wrong performances (e.g. performances of other pieces of the same composer). We sketch a possible solution to these problems in Section 8 below.

8. CONCLUSIONS AND FUTURE WORK

In this paper we presented an approach towards piece identification for performances of piano music, based on an automatically compiled reference database using web sources. It is shown that the symbolic fingerprinting method is robust enough to deal with the noise introduced by the transcription algorithms and allows for fast querying in the symbolic domain. Furthermore, increasing the redundancy by using multiple performances to represent a single piece, especially using the proposed selection strategy [...] the automatic compilation of the reference database. Additionally, this increases the robustness of the identification process via the fingerprinting algorithm, as "problematic" sections (e.g. regarding the transcription process) are represented multiple times, thus increasing the chances that the parts in question are well covered by the reference database.

There exist a number of possible improvements regarding the automatic selection of performances for a piece. In our implementation the focus is on increasing the homogeneity within the group of performances for a piece by comparing them to each other. An additional option is to analyse matches on the full reference database and try to find out which performances match well to multiple pieces and exclude them (as they cover multiple songs or were mistakenly assigned to multiple pieces by the crawler).

We are currently in the process of collecting a much larger collection of classical piano music. This dataset will contain a few thousand pieces, covering a large part of the classical piano repertoire (see footnote 6). On this dataset we are going to conduct experiments regarding the scalability of our approach in terms of runtime and retrieval accuracy.

In the future, we will also investigate the usefulness of the presented approach for non-classical piano music. Preliminary experiments have shown that this is a much harder task, as compared to classical piano music the pieces are not as strictly defined via a detailed score (e.g. popular songs and jazz standards are mostly described via lead sheets). Thus, performances of the same piece differ more heavily than in classical music. Of course we would also like to lift the restriction to piano music and try our method on other genres, but thus far general music transcription is not robust enough to be used with our approach. Hopefully this will change in the future.

Finally, regarding real-world applications, an automatic method to determine which pieces are well covered by the database, and which ones would benefit from manual intervention, would be desirable. This would help to quickly build a reference database which already covers most pieces well, and then to manually add additional references (based on performances, or even on symbolic score data) for pieces the identification algorithm struggles with.

6 The reference database is of course compiled automatically (based on the list of pieces), but the preparation of the ground truth for the ex-
egy, largely alleviates the problem of noise introduced by periments is a time consuming, manual process.
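As an illustration of the exclusion idea mentioned in this section (dropping reference performances that match well to multiple pieces, such as full-concert recordings), the following is a minimal sketch and not the authors' implementation. The input format (tuples of performance ID, piece ID and a match score), the function names, and both thresholds are assumptions made purely for illustration; the paper does not specify how such matches would be scored or thresholded.

```python
from collections import defaultdict


def find_ambiguous_performances(matches, min_score=0.5, max_pieces=1):
    """Return IDs of reference performances that match well to too many pieces.

    `matches` is an iterable of (performance_id, piece_id, score) tuples.
    The tuple layout and both thresholds are hypothetical.
    """
    matched_pieces = defaultdict(set)
    for performance_id, piece_id, score in matches:
        if score >= min_score:  # count only sufficiently strong matches
            matched_pieces[performance_id].add(piece_id)
    return {
        performance_id
        for performance_id, pieces in matched_pieces.items()
        if len(pieces) > max_pieces
    }


def prune_reference_database(reference_db, ambiguous):
    """Drop ambiguous performances from a {piece_id: [performance_id, ...]} mapping."""
    return {
        piece_id: [p for p in performances if p not in ambiguous]
        for piece_id, performances in reference_db.items()
    }


# Toy data: "perf_a" behaves like a full-concert recording that matches two pieces.
matches = [
    ("perf_a", "piece_1", 0.9),
    ("perf_a", "piece_2", 0.8),
    ("perf_b", "piece_1", 0.95),
    ("perf_b", "piece_2", 0.1),
    ("perf_c", "piece_2", 0.7),
]
reference_db = {
    "piece_1": ["perf_a", "perf_b"],
    "piece_2": ["perf_a", "perf_c"],
}

ambiguous = find_ambiguous_performances(matches)
pruned = prune_reference_database(reference_db, ambiguous)
```

In this toy example only perf_a exceeds the threshold for more than one piece, so it is removed from both reference groups. In practice the thresholds would have to be tuned, since pieces with very few available performances could otherwise lose all of their references.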
9. ACKNOWLEDGEMENTS

This work is supported by the European Research Council (ERC Grant Agreement 670035, project CON ESPRESSIONE).