Open-Source Practices for Music Signal Processing Research
Recommendations for transparent, sustainable, and reproducible audio research
In the early years of music information retrieval (MIR), research problems were often centered around conceptually simple tasks, and methods were evaluated on small, idealized data sets. A canonical example is genre recognition (i.e., which one of n genres describes this song?), which was often evaluated on the GTZAN data set (1,000 musical excerpts balanced across ten genres) [1]. Because task definitions were simple, so too were signal analysis pipelines: they often derived from methods for speech processing and recognition, and typically consisted of simple stages of feature extraction, statistical modeling, and evaluation. When describing a research system, the expected level of detail was superficial: it was sufficient to state, e.g., the number of mel-frequency cepstral coefficients (MFCCs) used, the statistical model (e.g., a Gaussian mixture model), the choice of data set, and the evaluation criteria, without stating the underlying software dependencies or implementation details. Owing to the growing abundance of methods, the proliferation of software toolkits, the explosion of machine learning, and a shift in focus toward more realistic problem settings, modern research systems are substantially more complex than their predecessors. Modern MIR researchers must therefore pay careful attention to detail when processing metadata, implementing evaluation criteria, and disseminating results.
References
[1] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002. doi: 10.1109/TSA.2002.800560.
[2] B. L. Sturm, "Revisiting priorities: Improving MIR evaluation practices," in Proc. 17th Int. Society for Music Information Retrieval Conf. (ISMIR), New York, 7–11 Aug. 2016, pp. 488–494.
[3] T. Cho, R. J. Weiss, and J. P. Bello, "Exploring common variations in state of the art chord recognition systems," presented at the Sound and Music Computing Conf., 2010.
[4] C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, and D. P. W. Ellis, "mir_eval: A transparent implementation of common MIR metrics," in Proc. 15th Int. Society for Music Information Retrieval Conf. (ISMIR), Taipei, Taiwan, 27–31 Oct. 2014, pp. 367–372.
[5] H. Pashler and E. Wagenmakers, "Editors' introduction to the special section on replicability in psychological science: A crisis of confidence?" Perspectives Psychological Sci., vol. 7, no. 6, pp. 528–530, 2012.
[6] P. Vandewalle, J. Kovacevic, and M. Vetterli, "Reproducible research in signal processing," IEEE Signal Processing Mag., vol. 26, no. 3, pp. 37–47, 2009.
[7] J. B. Buckheit and D. L. Donoho, "Wavelab and reproducible research," in Wavelets and Statistics, A. Antoniadis and G. Oppenheim, Eds. New York: Springer, 1995, pp. 55–81.
[8] M. Owens and G. Allen, SQLite. New York: Springer-Verlag, 2010.
[9] M. Folk, A. Cheng, and K. Yates, "HDF5: A file format and I/O library for high performance computing applications," in Proc. Supercomputing, vol. 99, 1999, pp. 5–33.
[10] F. Bellard, M. Niedermayer, et al. (2012). FFmpeg. [Online]. Available: http://ffmpeg.org
[11] E. de Castro Lopo. (2011). Libsndfile. [Online]. Available: http://www.mega-nerd.com/libsndfile/
[12] M. Good, "MusicXML for notation and analysis," Virtual Score: Representation, Retrieval, Restoration, vol. 12, pp. 113–124, 2001.
[13] P. Roland, "The music encoding initiative (MEI)," in Proc. First Int. Conf. Musical Applications Using XML, 2002, pp. 55–59.
[14] S.-F. Chang, T. Sikora, and A. Puri, "Overview of the MPEG-7 standard," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 6, pp. 688–695, 2001.
[15] E. J. Humphrey, J. Salamon, O. Nieto, J. Forsyth, R. M. Bittner, and J. P. Bello, "JAMS: A JSON annotated music specification for reproducible MIR research," in Proc. 15th Int. Society for Music Information Retrieval Conf. (ISMIR), Taipei, Taiwan, 27–31 Oct. 2014, pp. 591–596.
[16] B. McFee, E. J. Humphrey, and J. P. Bello, "A software framework for musical data augmentation," in Proc. 16th Int. Society for Music Information Retrieval Conf. (ISMIR), Málaga, Spain, 26–30 Oct. 2015, pp. 248–254.
[17] M. Mauch and S. Ewert, "The audio degradation toolbox and its application to robustness evaluation," in Proc. 14th Int. Society for Music Information Retrieval Conf. (ISMIR), Curitiba, Brazil, 4–8 Nov. 2013, pp. 83–88.
[18] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, "Scaper: A library for soundscape synthesis and augmentation," presented at the Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, Oct. 2017.
[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, et al., "Scikit-learn: Machine learning in Python," J. Mach. Learning Res., vol. 12, pp. 2825–2830, Oct. 2011.
[20] B. Whitman, G. Flake, and S. Lawrence, "Artist detection in music with minnowmatch," in Proc. 2001 IEEE Signal Processing Society Workshop, 2001, pp. 559–568.
[21] B. McFee, C. Jacoby, E. J. Humphrey, and W. Pimenta. (2018). Pescadores/pescador: 2.0.0. [Online]. Available: https://doi.org/10.5281/zenodo.1165998
[22] B. Van Merriënboer, D. Bahdanau, V. Dumoulin, D. Serdyuk, D. Warde-Farley, J. Chorowski, and Y. Bengio. (2015). Blocks and fuel: Frameworks for deep learning. arXiv. [Online]. Available: https://arxiv.org/abs/1506.00619
[26] S. Böck, F. Korzeniowski, J. Schlüter, F. Krebs, and G. Widmer, "Madmom: A new Python audio and music signal processing library," in Proc. 2016 ACM Multimedia Conf., 2016, pp. 1174–1178.
[27] G. Tzanetakis and P. Cook, "Marsyas: A framework for audio analysis," Organised Sound, vol. 4, no. 3, pp. 169–175, 2000. doi: 10.1017/S1355771800003071.
[28] J. Urbano, D. Bogdanov, P. Herrera, E. Gómez, and X. Serra, "What is the effect of audio quality on the robustness of MFCCs and chroma features?" in Proc. 15th Int. Society for Music Information Retrieval Conf. (ISMIR), Taipei, Taiwan, 27–31 Oct. 2014, pp. 573–578.
[29] F. Chollet, et al. (2015). Keras. [Online]. Available: https://keras.io
[30] A. Mesaros, T. Heittola, and T. Virtanen, "Metrics for polyphonic sound event detection," Appl. Sci., vol. 6, no. 6, p. 162, 2016. doi: 10.3390/app6060162.
[31] A. Said and A. Bellogín, "Rival: A toolkit to foster reproducibility in recommender system evaluation," in Proc. 8th ACM Conf. Recommender Systems, 2014, pp. 371–372.
[32] S. Böck, F. Krebs, and M. Schedl, "Evaluating the online capabilities of onset detection methods," in Proc. 13th Int. Society for Music Information Retrieval Conf. (ISMIR), Porto, Portugal, 8–12 Oct. 2012, pp. 49–54.
[33] S. Böck. "onset_db." Accessed on: Jan. 2018. [Online]. Available: https://github.com/CPJKU/onset_db
[34] D. P. Ellis. (2006). PLP and RASTA (and MFCC, and inversion) in MATLAB using melfcc.m and invmelfcc.m. [Online]. Available: http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat
[35] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, et al., "Hidden technical debt in machine learning systems," in Proc. Advances in Neural Information Processing Systems, 2015, pp. 2503–2511.
[36] G. Wilson, J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, and T. K. Teal, "Good enough practices in scientific computing," PLoS Computational Biology, vol. 13, no. 6, 2017. doi: 10.1371/journal.pcbi.1005510.
[37] GitHub, Inc. No license. [Online]. Available: https://choosealicense.com/no-permission/
[38] K. Beck, Test-Driven Development: By Example. Reading, MA: Addison-Wesley, 2003.
[39] F. S. Chirigati, D. E. Shasha, and J. Freire, "ReproZip: Using provenance to support computational reproducibility," presented at the 5th USENIX Conf. Theory and Practice of Provenance (TaPP'13), 2013.
[40] B. L. Sturm, "An analysis of the GTZAN music genre dataset," in Proc. 2nd Int. ACM Workshop on Music Information Retrieval With User-Centered Multimodal Strategies, 2012, pp. 7–12.
[41] M. Cartwright, A. Seals, J. Salamon, A. Williams, S. Mikloska, D. MacConnell, E. Law, J. Bello, and O. Nov, "Seeing sound: Investigating the effects of visualizations and complexity on crowdsourced audio annotations," Proc. ACM on Human-Computer Interaction, vol. 1, no. 1, 2017. doi: 10.1145/3134664.
[42] Bioacoustics Research Program. (2014). Raven Pro: Interactive sound analysis software (version 1.5). [Online]. Available: http://www.birds.cornell.edu/raven
[43] D. Mazzoni and R. Dannenberg. (2000). Audacity. [Online]. Available: https://www.audacityteam.org
[44] M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon, J. Dai, J. Bello, and S. Dixon, "Computer-aided melody note transcription using the Tony software: Accuracy and efficiency," presented at the 1st Int. Conf. Technologies for Music Notation and Representation, 2015.
[45] E. Fonseca, J. Pons Puig, X. Favory, F. Font Corbera, D. Bogdanov, A. Ferraro, S. Oramas, A. Porter, and X. Serra, "Freesound datasets: A platform for the creation of open audio datasets," in Proc. 18th Int. Society for Music Information Retrieval Conf. (ISMIR), Suzhou, China, Oct. 2017, pp. 486–493.
[46] K. Borne and Z. Team, "The Zooniverse: A framework for knowledge discovery from citizen science data," in Proc. AGU Fall Meeting Abstracts, 2011.
[47] G. Peeters and K. Fort, "Towards a (better) definition of the description of annotated MIR corpora," in Proc. 13th Int. Society for Music Information Retrieval Conf. (ISMIR), Porto, Portugal, 8–12 Oct. 2012, pp. 25–30.
[48] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, and K. Crawford. (2018). Datasheets for datasets. arXiv. [Online]. Available: https://arxiv.org/abs/1803.09010