Open-Source Practices for Music Signal Processing Research
Recommendations for transparent, sustainable, and reproducible audio research
In the early years of music information retrieval (MIR), research problems were often centered around conceptually simple tasks, and methods were evaluated on small, idealized data sets. A canonical example is genre recognition (i.e., which one of n genres describes this song?), which was often evaluated on the GTZAN data set (1,000 musical excerpts balanced across ten genres) [1]. Because task definitions were simple, so too were signal analysis pipelines: they often derived from methods for speech processing and recognition, and typically consisted of simple stages of feature extraction, statistical modeling, and evaluation. When describing a research system, the expected level of detail was superficial: it was sufficient to state, e.g., the number of mel-frequency cepstral coefficients (MFCCs) used, the statistical model (e.g., a Gaussian mixture model), the choice of data set, and the evaluation criteria, without stating the underlying software dependencies or implementation details. Owing to the growing abundance of methods, the proliferation of software toolkits, the explosion of machine learning, and a shift in focus toward more realistic problem settings, modern research systems are substantially more complex than their predecessors. Modern MIR researchers must therefore pay careful attention to detail when processing metadata, implementing evaluation criteria, and disseminating results.
References
[1] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002. doi: 10.1109/TSA.2002.800560.
[2] B. L. Sturm, "Revisiting priorities: Improving MIR evaluation practices," in Proc. 17th Int. Society for Music Information Retrieval Conf. (ISMIR), New York, 7–11 Aug. 2016, pp. 488–494.
[3] T. Cho, R. J. Weiss, and J. P. Bello, "Exploring common variations in state of the art chord recognition systems," presented at the Sound and Music Computing Conf., 2010.
[4] C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, and D. P. W. Ellis, "mir_eval: A transparent implementation of common MIR metrics," in Proc. 15th Int. Society for Music Information Retrieval Conf. (ISMIR), Taipei, Taiwan, 27–31 Oct. 2014, pp. 367–372.
[5] H. Pashler and E. Wagenmakers, "Editors' introduction to the special section on replicability in psychological science: A crisis of confidence?" Perspectives Psychological Sci., vol. 7, no. 6, pp. 528–530, 2012.
[6] P. Vandewalle, J. Kovacevic, and M. Vetterli, "Reproducible research in signal processing," IEEE Signal Processing Mag., vol. 26, no. 3, pp. 37–47, 2009.
[7] J. B. Buckheit and D. L. Donoho, "Wavelab and reproducible research," in Wavelets and Statistics, A. Antoniadis and G. Oppenheim, Eds. New York: Springer, 1995, pp. 55–81.
[8] M. Owens and G. Allen, SQLite. New York: Springer-Verlag, 2010.
[9] M. Folk, A. Cheng, and K. Yates, "HDF5: A file format and I/O library for high performance computing applications," in Proc. Supercomputing, vol. 99, 1999, pp. 5–33.
[10] F. Bellard, M. Niedermayer, et al. (2012). FFmpeg. [Online]. Available: http://ffmpeg.org
[11] E. de Castro Lopo. (2011). Libsndfile. [Online]. Available: http://www.mega-nerd.com/libsndfile/
[12] M. Good, "MusicXML for notation and analysis," Virtual Score: Representation, Retrieval, Restoration, vol. 12, pp. 113–124, 2001.
[13] P. Roland, "The music encoding initiative (MEI)," in Proc. First Int. Conf. Musical Applications Using XML, 2002, pp. 55–59.
[14] S.-F. Chang, T. Sikora, and A. Puri, "Overview of the MPEG-7 standard," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 6, pp. 688–695, 2001.
[15] E. J. Humphrey, J. Salamon, O. Nieto, J. Forsyth, R. M. Bittner, and J. P. Bello, "JAMS: A JSON annotated music specification for reproducible MIR research," in Proc. 15th Int. Society for Music Information Retrieval Conf. (ISMIR), Taipei, Taiwan, 27–31 Oct. 2014, pp. 591–596.
[16] B. McFee, E. J. Humphrey, and J. P. Bello, "A software framework for musical data augmentation," in Proc. 16th Int. Society for Music Information Retrieval Conf. (ISMIR), Málaga, Spain, 26–30 Oct. 2015, pp. 248–254.
[17] M. Mauch and S. Ewert, "The audio degradation toolbox and its application to robustness evaluation," in Proc. 14th Int. Society for Music Information Retrieval Conf. (ISMIR), Curitiba, Brazil, 4–8 Nov. 2013, pp. 83–88.
[18] J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, "Scaper: A library for soundscape synthesis and augmentation," presented at the Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, Oct. 2017.
[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, et al., "Scikit-learn: Machine learning in Python," J. Mach. Learning Res., vol. 12, pp. 2825–2830, Oct. 2011.
[20] B. Whitman, G. Flake, and S. Lawrence, "Artist detection in music with minnowmatch," in Proc. 2001 IEEE Signal Processing Society Workshop, 2001, pp. 559–568.
[21] B. McFee, C. Jacoby, E. J. Humphrey, and W. Pimenta. (2018). Pescadores/pescador: 2.0.0. [Online]. Available: https://doi.org/10.5281/zenodo.1165998
[22] B. Van Merriënboer, D. Bahdanau, V. Dumoulin, D. Serdyuk, D. Warde-Farley, J. Chorowski, and Y. Bengio. (2015). Blocks and fuel: Frameworks for deep learning. arXiv. [Online]. Available: https://arxiv.org/abs/1506.00619
[26] S. Böck, F. Korzeniowski, J. Schlüter, F. Krebs, and G. Widmer, "Madmom: A new Python audio and music signal processing library," in Proc. 2016 ACM Multimedia Conf., 2016, pp. 1174–1178.
[27] G. Tzanetakis and P. Cook, "Marsyas: A framework for audio analysis," Organised Sound, vol. 4, no. 3, pp. 169–175, 2000. doi: 10.1017/S1355771800003071.
[28] J. Urbano, D. Bogdanov, P. Herrera, E. Gómez, and X. Serra, "What is the effect of audio quality on the robustness of MFCCs and chroma features?" in Proc. 15th Int. Society for Music Information Retrieval Conf. (ISMIR), Taipei, Taiwan, 27–31 Oct. 2014, pp. 573–578.
[29] F. Chollet, et al. (2015). Keras. [Online]. Available: https://keras.io
[30] A. Mesaros, T. Heittola, and T. Virtanen, "Metrics for polyphonic sound event detection," Appl. Sci., vol. 6, no. 6, p. 162, 2016. doi: 10.3390/app6060162.
[31] A. Said and A. Bellogín, "Rival: A toolkit to foster reproducibility in recommender system evaluation," in Proc. 8th ACM Conf. Recommender Systems, 2014, pp. 371–372.
[32] S. Böck, F. Krebs, and M. Schedl, "Evaluating the online capabilities of onset detection methods," in Proc. 13th Int. Society for Music Information Retrieval Conf. (ISMIR), Porto, Portugal, 8–12 Oct. 2012, pp. 49–54.
[33] S. Böck. "onset_db." Accessed on: Jan. 2018. [Online]. Available: https://github.com/CPJKU/onset_db
[34] D. P. Ellis. (2006). PLP and RASTA (and MFCC, and inversion) in MATLAB using melfcc.m and invmelfcc.m. [Online]. Available: http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat
[35] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, et al., "Hidden technical debt in machine learning systems," in Proc. Advances in Neural Information Processing Systems, 2015, pp. 2503–2511.
[36] G. Wilson, J. Bryan, K. Cranston, J. Kitzes, L. Nederbragt, and T. K. Teal, "Good enough practices in scientific computing," PLoS Computational Biology, vol. 13, no. 6, 2017. doi: 10.1371/journal.pcbi.1005510.
[37] GitHub, Inc. No license. [Online]. Available: https://choosealicense.com/no-permission/
[38] K. Beck, Test-Driven Development: By Example. Reading, MA: Addison-Wesley, 2003.
[39] F. S. Chirigati, D. E. Shasha, and J. Freire, "ReproZip: Using provenance to support computational reproducibility," presented at the 5th USENIX Conf. Theory and Practice of Provenance (TaPP'13), 2013.
[40] B. L. Sturm, "An analysis of the GTZAN music genre dataset," in Proc. 2nd Int. ACM Workshop on Music Information Retrieval With User-Centered Multimodal Strategies, 2012, pp. 7–12.
[41] M. Cartwright, A. Seals, J. Salamon, A. Williams, S. Mikloska, D. MacConnell, E. Law, J. Bello, and O. Nov, "Seeing sound: Investigating the effects of visualizations and complexity on crowdsourced audio annotations," Proc. ACM on Human-Computer Interaction, vol. 1, no. 1, 2017. doi: 10.1145/3134664.
[42] Bioacoustics Research Program. (2014). Raven Pro: Interactive sound analysis software (version 1.5). [Online]. Available: http://www.birds.cornell.edu/raven
[43] D. Mazzoni and R. Dannenberg. (2000). Audacity. [Online]. Available: https://www.audacityteam.org
[44] M. Mauch, C. Cannam, R. Bittner, G. Fazekas, J. Salamon, J. Dai, J. Bello, and S. Dixon, "Computer-aided melody note transcription using the Tony software: Accuracy and efficiency," presented at the 1st Int. Conf. Technologies for Music Notation and Representation, 2015.
[45] E. Fonseca, J. Pons Puig, X. Favory, F. Font Corbera, D. Bogdanov, A. Ferraro, S. Oramas, A. Porter, and X. Serra, "Freesound datasets: A platform for the creation of open audio datasets," in Proc. 18th Int. Society for Music Information Retrieval Conf. (ISMIR), Suzhou, China, Oct. 2017, pp. 486–493.
[46] K. Borne and Z. Team, "The Zooniverse: A framework for knowledge discovery from citizen science data," in Proc. AGU Fall Meeting Abstracts, 2011.
[47] G. Peeters and K. Fort, "Towards a (better) definition of the description of annotated MIR corpora," in Proc. 13th Int. Society for Music Information Retrieval Conf. (ISMIR), Porto, Portugal, 8–12 Oct. 2012, pp. 25–30.
[48] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, and K. Crawford. (2018). Datasheets for datasets. arXiv. [Online]. Available: https://arxiv.org/abs/1803.09010