Zsa.Descriptors: a library for real-time descriptors analysis
Mikhail Malt, Emmanuel Jourdan

To cite this version:

Mikhail Malt, Emmanuel Jourdan. Zsa.Descriptors: a library for real-time descriptors analysis. 5th Sound and Music Computing Conference, Jul 2008, Berlin, Germany. pp. 134-137. ⟨hal-01580326⟩

HAL Id: hal-01580326

https://hal.science/hal-01580326
Submitted on 1 Sep 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Mikhail Malt*, Emmanuel Jourdan†

* IRCAM, Paris, France
† IRCAM, Paris, France

I. INTRODUCTION

In the past few years, several strategies to characterize sound signals have been suggested. The main objective of these strategies was to describe the sound [1]. However, it was only with the creation of MPEG-7, a new standard format for indexing and transferring audio data, that the desire to define semantic content descriptors for audio data came about [2, p. 52]. The widely known document written by Geoffroy Peeters [1] is an example: even though its announced goal is not to carry out a systematic taxonomy of all the functions intended to describe sound, it does in fact systematize the presentation of various descriptors.

A. Descriptors Today

Descriptors are used primarily for indexing and browsing the contents of sound databases, or for the re-synthesis of sounds, as in telephone transmissions.

Our interest concerns the real-time analysis of descriptors for the creation and analysis of contemporary music. In this domain, with the exception of the fundamental frequency and the energy of the sound signal, the use of spectral descriptors is still rare. It is nonetheless important to note the experiments on computer-assisted improvisation carried out by Assayag, Bloch, and Chemillier [3] [4] and the developments in "concatenative synthesis" by Diemo Schwarz [5], where the analysis of a variety of sound descriptors is used to control re-synthesis.

B. Descriptors and Music Composition

The fact that descriptors are rarely used in contemporary music compositions is due to several factors, including:
• The lack of knowledge of the relationships between descriptors and the pertinent perceptual characteristics of the sound for use in musical composition;
• The fact that one descriptor is not sufficient to characterize a complex "sound state" such as that of a note played "live." Recent studies [6] show that composed functions of descriptors are more effective in recognizing the characteristics of a given sound signal than the use of one descriptor at a time;
• The lack of a large choice of real-time descriptors that artists can test and learn to use.

II. REAL-TIME ENVIRONMENTS AND DESCRIPTORS

Among the most widely used software environments for real-time musical performance are SuperCollider [7], PureData [8], and Max/MSP [9]. Max/MSP offers the largest selection of tools to work with sound descriptors. Currently, several libraries offering descriptor analysis are available in Max/MSP. The best known of these include the library by Tristan Jehan [10] [11] (pitch~, loudness~, brightness~, noisiness~, bark~, analyzer~, shifter~, segment~, beat~), the iana~ object by Todor Todoroff, the yin~ object implemented by Norbert Schnell according to the de Cheveigné and Kawahara model [12], the FTM/Gabor object library [13] [14], which enables the development of descriptors, and finally the classic fiddle~ and bonk~ by Miller Puckette [15].

However, a large number of the descriptors offered are, as already mentioned, based on the recognition of the fundamental frequency and the energy. The only exceptions are the descriptors offered in the Gabor library, but they do not yet cover a large set of descriptors.

III. THE FIRST DESCRIPTORS SET AVAILABLE IN ZSA.DESCRIPTORS

The Zsa.Descriptors library is intended to provide a set of audio descriptors specially designed to be used in real-time. This collection of objects comprises a set of sound descriptors drawn from the MPEG-7 descriptors outlined by Peeters [1], peak-search algorithms from Serra [16], and some ideas from the computer-assisted composition developments realized by the Musical Representation Team at IRCAM. In the next paragraphs we describe some of these tools, already developed in the Zsa.Descriptors library.
A. Spectral Centroid (brightness)

This is a very well known descriptor. The spectral centroid is the barycentre of the spectrum, computed as follows:

$$\mu = \frac{\sum_{i=0}^{n-1} f[i]\,a[i]}{\sum_{i=0}^{n-1} a[i]}$$

where:
$n$ is half of the FFT window size;
$i$ is the bin index;
$a[i]$ is the amplitude of bin $i$, the real part of the FFT calculus;
$f[i]$ is the frequency of bin $i$, where $f[i] = i \cdot \dfrac{\text{sample rate}}{\text{FFT window size}}$;
and $\mu$ is the spectral centroid in hertz.

B. Spectral Spread (spectral centroid variance)

As usual, we consider the spectral centroid as the first moment of the spectrum, considered as a frequency distribution, which is related to the weighted mean frequency. The spectral spread is the second moment, i.e., the variance around the mean calculated above:

$$v = \frac{\sum_{i=0}^{n-1} \left(f[i] - \mu\right)^2 a[i]}{\sum_{i=0}^{n-1} a[i]}$$

C. Spectral Slope

The spectral slope is an estimate of the rate at which the spectral magnitude decreases, computed by a linear regression on the magnitude spectrum:

$$\text{slope} = \frac{1}{\sum_{i=0}^{n-1} a[i]} \cdot \frac{n \sum_{i=0}^{n-1} f[i]\,a[i] - \sum_{i=0}^{n-1} f[i] \sum_{i=0}^{n-1} a[i]}{n \sum_{i=0}^{n-1} f^2[i] - \left(\sum_{i=0}^{n-1} f[i]\right)^2}$$

D. Spectral Decrease

The spectral decrease is similar in meaning to the spectral slope, representing the amount of decrease of the spectral magnitude. According to Peeters [1], this formulation comes from perceptual studies and is supposed to be more correlated with human perception.

$$\text{decrease} = \frac{\sum_{i=0}^{n-1} \left(a[i] - a[1]\right)}{\sum_{i=2}^{n-1} a[i]\,(i - 1)}$$

E. Spectral Roll-Off

The spectral roll-off point is the frequency $f_c[i]$ such that x% of the signal energy falls below this frequency. By default $x$ is taken as 0.95 (i.e., 95%). The roll-off point is calculated as follows:

$$\sum_{i=0}^{f_c[i]} a^2[f[i]] = x \sum_{i=0}^{n-1} a^2[f[i]]$$

where:
$f_c[i]$ is the roll-off point and
$x$ is the accumulated roll-off energy percentage.

F. Sinusoidal model based on peaks detection

We have based our algorithm on the widely known method defined by Smith and Serra [16] [18, p. 38-48], where a peak is defined as a local maximum in the real magnitude spectrum $a_k[i]$, $k$ being the frame index. As not all the peaks are equally important in the spectrum, we have used a sliding five-point window to scan the magnitude spectrum, avoiding undesired peaks. For each vector of five magnitudes we check the third point $a_k[2]$ and, for a given threshold value $\delta_t$, we compute:

$$a_k[2] = \max\{a_k[0], a_k[1], \ldots, a_k[4]\} \;\wedge\; a_k[2] > \delta_t$$

If the condition is true, then $a_k[2]$ becomes a peak. A parabolic interpolation is then applied on the three adjacent points $a_k[1]$, $a_k[2]$, $a_k[3]$. Solving for the parabola peak location [18, p. 47], a coefficient $p_j$ of the $j$-th peak is then calculated:

$$p_j = \frac{1}{2} \cdot \frac{a_k[1] - a_k[3]}{a_k[1] - 2a_k[2] + a_k[3]}$$

The true peak location (in bins) is given by:

$$i_{peak}[j] = i_{a_k[2]} + p_j$$

To estimate the true magnitude we use $p_j$ as follows:

$$a_{k,peak}[j] = a_k[2] - \frac{1}{4}\left(a_k[1] - a_k[3]\right) p_j$$

At the end of the process we have collected a set of partials $pk_j = (f_j, a_j)$.

G. A Tempered Virtual Fundamental

This descriptor is based on the harmonic histogram technique described by Jean Laroche [20, p. 52-53]. We adapted this method in order to approximate the result, and the search phase for the "best candidate", by a tempered musical scale with a given division.
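The peak-picking stage and the tempered harmonic histogram can be illustrated with a minimal Python sketch. It is not the Zsa.Descriptors implementation: the function names (`pick_peaks`, `tempered_virtual_fundamental`, `hz_to_midi`, `midi_to_hz`) are hypothetical, and mapping the "pitch-class space" to MIDI pitch with A4 = 440 Hz is an assumption made here for concreteness. Each partial frequency is divided by n = 1..6, each sub-harmonic candidate is snapped to a grid of q semitones, amplitudes landing on the same grid value are summed, and the grid value with the largest sum is returned.

```python
import math
from collections import defaultdict


def pick_peaks(mag, threshold, bin_hz):
    """Five-point peak picking with parabolic refinement (sketch).

    mag: magnitude spectrum of one frame; bin_hz: sample rate / FFT size.
    Returns a list of (frequency_hz, amplitude) partials.
    """
    partials = []
    for c in range(2, len(mag) - 2):
        window = mag[c - 2:c + 3]
        # The centre of the window must be its maximum and exceed the threshold.
        if mag[c] != max(window) or mag[c] <= threshold:
            continue
        left, mid, right = mag[c - 1], mag[c], mag[c + 1]
        denom = left - 2.0 * mid + right
        # Fractional bin offset of the parabola vertex (0 on a flat top).
        p = 0.5 * (left - right) / denom if denom != 0 else 0.0
        amp = mid - 0.25 * (left - right) * p
        partials.append(((c + p) * bin_hz, amp))
    return partials


def hz_to_midi(freq):
    # Assumption: pitch index = MIDI pitch, with A4 = 440 Hz = 69.
    return 69.0 + 12.0 * math.log2(freq / 440.0)


def midi_to_hz(pitch):
    return 440.0 * 2.0 ** ((pitch - 69.0) / 12.0)


def tempered_virtual_fundamental(partials, q=1.0, max_div=6):
    """Return (pitch, frequency_hz) of the best sub-harmonic candidate."""
    histogram = defaultdict(float)
    for freq, amp in partials:
        for n in range(1, max_div + 1):
            pitch = hz_to_midi(freq / n)
            snapped = round(pitch / q) * q  # nearest multiple of q semitones
            histogram[snapped] += amp       # sum amplitudes per grid value
    best = max(histogram, key=histogram.get)
    return best, midi_to_hz(best)


if __name__ == "__main__":
    demo = [(200.0, 1.0), (300.0, 1.0), (400.0, 1.0), (500.0, 1.0)]
    print(tempered_virtual_fundamental(demo, q=1.0))
```

With q = 1.0, partials at 200, 300, 400 and 500 Hz vote most strongly for MIDI pitch 43 (about 98 Hz), the tempered pitch nearest to the 100 Hz fundamental they share. A smaller q gives a finer grid at the cost of spreading the votes over more candidates.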
Given a set of peaks $pk_j = (f_j, a_j)$, calculated as shown previously:
1) For each $pk_j = (f_j, a_j)$ we calculate a set of $pk_{jn} = \left(\dfrac{f_j}{n}, a_j\right)$, $n \in \mathbb{N}$, $n \in [1, \ldots, 6]$.
2) All the $\dfrac{f_j}{n}$ are converted into indexes $i_{jn}$ in a pitch-class space. At this level $i_{jn} \in \mathbb{R}$.
3) The $i_{jn}$ are approximated by a grid of discrete values, multiples of $q$, $q \in \mathbb{R}$, $q \in [0, 1]$, returning a new set of values $i^q_{jn}$, multiples of $q$. Notice that $q$ can be seen as a half-tone division: $q = 1$ means an approximation by a half-tone, $q = 0.5$ an approximation by a quarter-tone, and so on.
4) This leads us to new couples $pk^q_{jn} = (i^q_{jn}, a_j)$.
5) Collecting all couples with identical $i^q_{jn}$, we build new couples $pk^q_{jn} = (i^q_{jn}, \sum a_j)$, where $\sum a_j$ is the sum of all $a_j$ for the identical $i^q_{jn}$.
6) The best candidate to be our virtual fundamental will be the $pk^q_{jn} = (i^q_{jn}, \sum a_j)$ that maximises $\sum a_j$.
7) In the last phase, $i^q_{jn}$ is converted into floating-point MIDI pitches or into a frequency space.

IV. THE SOLUTION OFFERED BY ZSA.DESCRIPTORS

As explained previously, the main goal of Zsa.Descriptors, a library of sound descriptors and spectral analysis tools, is to expand the capabilities of sound description using the systematic approach of the MPEG-7 standard, and to offer a set of truly integrated objects for the Max/MSP [9] graphical programming environment. In addition to sound descriptors and original analysis features, the external objects of the Zsa.Descriptors library are designed to compute multiple descriptors in real-time with both efficiency in terms of CPU usage and guaranteed synchronization. Consequently, a modular approach was chosen. In the Max/MSP environment, this was made possible by sharing the expensive process of the windowed FFT.

Max/MSP already has an object that efficiently calculates a windowed FFT: the "pfft~" facility. As stated in the documentation, "pfft~" is a "Spectral processing manager for patchers", i.e., an object that can load specially designed "patchers". The "pfft~" object takes at least three arguments: the patcher name, the FFT window size and an overlap factor, used to calculate the hop size (Fig. 1). The loaded "patcher" must also follow a general structure: it must have at least an "fftin~" object. The pfft~ object manages the windowing and overlap of the incoming signal; fftin~ applies the window function (envelope) and performs the FFT. The fftin~ object takes two arguments: the "inlet assignment" and the name of the window envelope function (hanning, hamming, square and blackman are included), or the name of a buffer~. It is therefore possible to use any kind of window, depending on the type of sound that we want to analyse.

Fig. 1 the pfft~ object

Fig. 2 the fftin~ object

Therefore, most of the objects of the library were designed to run inside the standard Max/MSP pfft~ object (Fig. 3). This strategy offers multiple advantages: modularity, efficiency, and also the ability to use the analysis directly as a parameter for sound processing in the spectral domain.

Fig. 3 Interior of the pfft~ object

Furthermore, the fact that the objects of this library can work within the Max/MSP environment either together or by themselves, and the fact that they work smoothly in conjunction with other standard Max/MSP objects, make it possible to exploit all the synchronization resources available in this environment.

V. CONCLUSIONS AND PERSPECTIVES

We have presented in this paper a set of sound descriptors and signal analysis tools intended to be used in real-time as a toolbox for composers, musicologists and researchers. This will allow the use of, and research on the use of, sound descriptors in the field of systematic musicology and as a decision-making tool in the context of real-time performance. As part of future work, we have also planned research on musical segmentation based on sound descriptors as a strategy for musical analysis.

The main advantage of this library, more than the small improvements we made in some algorithms, is the modular development technique and implementation we have used, trying to optimise the computation through a strong integration in the Max/MSP environment.

Of course, the work presented here is still preliminary, but it will be improved with the implementation of the following list of features, which are already implemented or currently being developed: temporal variation of the spectrum, bark, inharmonicity, harmonic spectral deviation, odd-to-even harmonic energy ratio, tristimulus, frame energy, harmonic part energy (these harmonic descriptors will use a monophonic F0 algorithm developed by Chunghsin Yeh in his Ph.D. thesis [21]), noise part energy, and other descriptors coming from the signal processing and computer-assisted composition worlds.

ACKNOWLEDGMENT

We would like to thank Richard Dudas for his fruitful remarks, comments and suggestions, Arshia Cont for his friendly support and Cyril Beros for funding the travel support.

REFERENCES

[1] G. Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project. CUIDADO project report, Institut de Recherche et de Coordination Acoustique/Musique (IRCAM), 2004.
[2] F. Gouyon, Extraction automatique de descripteurs rythmiques dans des extraits de musiques populaires polyphoniques, DEA ATIAM thesis, Université de la Méditerranée, Université Paris VI, IRCAM, Télécom-Paris, Université du Maine, Ecole Normale Supérieure, ACROE-IMAG, July 2000.
[3] G. Assayag, G. Bloch, M. Chemillier, A. Cont, S. Dubnov, "Omax Brothers: A Dynamic Topology of Agents for Improvization Learning", in Workshop on Audio and Music Computing for Multimedia, ACM Multimedia 2006, Santa Barbara, USA, October 2006.
[4] G. Assayag, G. Bloch, M. Chemillier, "OMax-Ofon", in Sound and Music Computing (SMC) 2006, Marseille, France, May 2006.
[5] D. Schwarz, "Current research in concatenative sound synthesis", Proceedings of the International Computer Music Conference (ICMC), Barcelona, Spain, September 5-9, 2005.
[6] A. Zils, Extraction de descripteurs musicaux: une approche évolutionniste, doctoral thesis, Université Paris 6, September 2004.
[7] SuperCollider, © James McCartney, http://www.audiosynth.com/
[8] PureData, © Miller Puckette, http://crca.ucsd.edu/~msp/Pd_documentation/
[9] Max/MSP, © Cycling74, www.cycling74.com
[10] T. Jehan, B. Schoner, "An Audio-Driven, Spectral Analysis-Based, Perceptual Synthesis Engine", in Audio Engineering Society, Proceedings of the 110th Convention, Amsterdam, The Netherlands, 2001.
[11] T. Jehan, Creating Music by Listening, PhD Thesis in Media Arts and Sciences, Massachusetts Institute of Technology, September 2005.
[12] A. de Cheveigné, H. Kawahara, "YIN, a fundamental frequency estimator for speech and music", J. Acoust. Soc. Am. 111, 1917-1930, 2002.
[13] N. Schnell et al., "FTM Complex Data Structures for Max/MSP", in ICMC 2005, Barcelona, Spain, 2005.
[14] N. Schnell, D. Schwarz, "Gabor, multi-representation real-time analysis/synthesis", Proc. of the 8th Int. Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, September 20-22, 2005.
[15] M. Puckette, T. Apel, "Real-time audio analysis tools for Pd and MSP", Proceedings, International Computer Music Conference. San Francisco: International Computer Music Association, 1998, pp. 109-112.
[16] J.O. Smith, X. Serra, "PARSHL: an analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation", Proc. 1987 Int. Computer Music Conf. (ICMC'87), Urbana, Illinois, August 1987, pp. 290-297.
[17] B. Doval, X. Rodet, "Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs", Proceedings of the ICASSP '93, 1993, pp. 221-224.
[18] X. Serra, A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition. Ph.D. dissertation, Stanford University, Oct. 1989.
[19] X. Rodet, "Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models", in TFTS'97 (IEEE Time-Frequency and Time-Scale Workshop 97), Coventry, Great Britain, August 1997.
[20] J. Laroche, Traitement des signaux audio-fréquences, TELECOM, Handout, Paris, France, February 1995.
[21] C. Yeh, Multiple fundamental frequency estimation of polyphonic recordings, Ph.D. thesis, Université Paris 6, 2008.
