
Zsa.Descriptors: a library for real-time descriptors analysis

Mikhail Malt, Emmanuel Jourdan

To cite this version:

Mikhail Malt, Emmanuel Jourdan. Zsa.Descriptors: a library for real-time descriptors analysis. 5th Sound and Music Computing Conference, Jul 2008, Berlin, Germany. pp. 134-137. ⟨hal-01580326⟩

HAL Id: hal-01580326


https://hal.science/hal-01580326
Submitted on 1 Sep 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Zsa.Descriptors: a library for real-time descriptors analysis

Mikhail Malt*, Emmanuel Jourdan†

* IRCAM, Paris, France, [email protected]
† IRCAM, Paris, France, [email protected]

I. INTRODUCTION

In the past few years, several strategies to characterize sound signals have been suggested. The main objective of these strategies was to describe the sound [1]. However, it was only with the creation of MPEG-7, a new standard format for indexing and transferring audio data, that the desire to define semantic content descriptors for audio came about [2, p. 52]. The widely known document written by Geoffroy Peeters [1] is an example where, even if the announced goal is not to carry out a systematic taxonomy of all the functions intended to describe sound, it does in fact systematize the presentation of various descriptors.

A. Descriptors Today

A large percentage of the uses for descriptors concern primarily indexing and browsing the contents of sound databases, or the re-synthesis of sounds, such as in telephone transmissions.

Our interest concerns the use of real-time descriptor analysis for the creation and analysis of contemporary music. In this domain, with the exception of the fundamental frequency and the energy of the sound signal, the use of spectral descriptors is still rare. It is important, nonetheless, to examine the experiments on computer-assisted improvisation carried out by Assayag, Bloch, and Chemillier [3] [4] and the developments in “concatenative synthesis” by Diemo Schwarz [5], where the analysis of a variety of sound descriptors is used to control re-synthesis.

B. Descriptors and Music Composition

The fact that descriptors are rarely used in contemporary music compositions is due to several factors, including:
• The lack of knowledge of the relationships between descriptors and the pertinent perceptual characteristics of the sound for use in musical composition;
• The fact that one descriptor is not sufficient to characterize a complex “sound state” such as that of a note played “live.” Recent studies [6] show how composed functions of descriptors are more effective in recognizing the characteristics of a given sound signal than the use of one descriptor at a time;
• The lack of a large choice of real-time descriptors that artists can test and learn to use.

II. REAL-TIME ENVIRONMENTS AND DESCRIPTORS

Among the most widely used software environments for real-time musical performances are SuperCollider [7], PureData [8], and Max/MSP [9]. Max/MSP offers the largest selection of tools to work with sound descriptors. Currently, several libraries offering descriptor analyses are available in Max/MSP. The best known of these include the library by Tristan Jehan [10] [11] (pitch~, loudness~, brightness~, noisiness~, bark~, analyzer~, shifter~, segment~, beat~), the iana~ object by Todor Todoroff, the yin~ object implemented by Norbert Schnell according to the de Cheveigné and Kawahara model [12], the FTM/Gabor object library [13] [14] that enables the development of descriptors, and finally the classic fiddle~ and bonk~ by Miller Puckette [15].

However, a large number of the descriptors offered are, as we have already mentioned, based on the recognition of the fundamental frequency and the energy. The only exceptions are the descriptors offered in the Gabor library, but they do not yet cover a large set of descriptors.

III. THE FIRST DESCRIPTORS SET AVAILABLE IN ZSA.DESCRIPTORS

The Zsa.Descriptors library is intended to provide a set of audio descriptors specially designed to be used in real-time. This collection of objects encloses a set of sound descriptors coming from the MPEG-7 descriptors outlined by Peeters [1], algorithms for peak search from Serra [16], and some ideas from the computer-assisted composition developments realized by the Musical Representation Team at IRCAM. In the next paragraphs we will describe some of these tools, already developed in the Zsa.Descriptors library.
A. Spectral Centroid (brightness)

This is a very well known descriptor. The spectral centroid is the barycentre of the spectrum, computed as follows:

\mu = \frac{\sum_{i=0}^{n-1} f[i]\, a[i]}{\sum_{i=0}^{n-1} a[i]}

where:
n is half of the FFT window size,
i is the bin index,
a[i] is the magnitude of bin i, taken from the FFT,
f[i] is the frequency of bin i, with f[i] = i \cdot samplerate / (FFT window size),
and \mu is the spectral centroid in hertz.

B. Spectral Spread (spectral centroid variance)

As usual, we consider the spectral centroid as the first moment of the spectrum, considered as a frequency distribution, which is related to the weighted frequency mean value. The spectral spread is the second moment, i.e., the variance of the mean calculated above:

v = \frac{\sum_{i=0}^{n-1} (f[i] - \mu)^2\, a[i]}{\sum_{i=0}^{n-1} a[i]}
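The two formulas above can be sketched in Python with NumPy. This is an illustration of the computation, not the library's Max/MSP externals; the function name and the eps guard against silent frames are our own:

```python
import numpy as np

def centroid_spread(frame, samplerate, eps=1e-12):
    """Spectral centroid (Hz) and spread (variance) of one audio frame."""
    n = len(frame) // 2                         # half the FFT window size
    a = np.abs(np.fft.rfft(frame))[:n]          # bin magnitudes a[i]
    f = np.arange(n) * samplerate / len(frame)  # bin frequencies f[i]
    mu = np.sum(f * a) / (np.sum(a) + eps)      # first moment: centroid
    v = np.sum((f - mu) ** 2 * a) / (np.sum(a) + eps)  # second moment: spread
    return mu, v
```

For a pure sine tone, the centroid sits at the tone's frequency and the spread is close to zero, which matches the intuition of these two moments.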
C. Spectral Slope

The spectral slope is an estimation of the amount of spectral magnitude decrease, computed by a linear regression on the magnitude spectrum:

slope = \frac{1}{\sum_{i=0}^{n-1} a[i]} \cdot \frac{n \sum_{i=0}^{n-1} f[i]\, a[i] - \sum_{i=0}^{n-1} f[i] \sum_{i=0}^{n-1} a[i]}{n \sum_{i=0}^{n-1} f[i]^2 - \left(\sum_{i=0}^{n-1} f[i]\right)^2}

D. Spectral Decrease

The spectral decrease is similar in meaning to the spectral slope, representing the amount of spectral magnitude decrease. According to Peeters [1], this formulation comes from perceptual studies, and it is supposed to be better correlated with human perception:

decrease = \frac{1}{\sum_{i=2}^{n-1} a[i]} \sum_{i=2}^{n-1} \frac{a[i] - a[1]}{i - 1}

E. Spectral Roll-Off

The spectral roll-off point is the frequency f_c such that x% of the signal energy falls below this frequency; x is taken as 0.95 by default. The roll-off point is calculated as follows:

\sum_{i=0}^{i_c} a^2[i] = x \sum_{i=0}^{n-1} a^2[i]

where f_c = f[i_c] is the roll-off frequency and x is the accumulated energy ratio.
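The three descriptors above can be sketched under the same assumptions (a Python/NumPy illustration with a name of our own, operating on a precomputed magnitude spectrum):

```python
import numpy as np

def slope_decrease_rolloff(a, samplerate, fft_size, x=0.95):
    """Spectral slope, decrease, and roll-off from a magnitude spectrum a[0..n-1]."""
    n = len(a)
    f = np.arange(n) * samplerate / fft_size
    # Slope: linear regression of magnitude over frequency, scaled by total magnitude.
    num = n * np.sum(f * a) - np.sum(f) * np.sum(a)
    den = n * np.sum(f ** 2) - np.sum(f) ** 2
    slope = num / (den * np.sum(a))
    # Decrease: average magnitude drop relative to the first bin (Peeters).
    i = np.arange(2, n)
    decrease = np.sum((a[i] - a[1]) / (i - 1)) / np.sum(a[i])
    # Roll-off: lowest frequency below which the fraction x of the energy lies.
    cumulative = np.cumsum(a ** 2)
    rolloff = f[np.searchsorted(cumulative, x * cumulative[-1])]
    return slope, decrease, rolloff
```

On a flat (white) spectrum, slope and decrease are both zero and the roll-off falls at 95% of the band, which is a quick sanity check for an implementation.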
F. Sinusoidal Model Based on Peak Detection

We have based our algorithm on the widely known method defined by Smith & Serra [16] [18, pp. 38-48], where a peak is defined as a local maximum in the magnitude spectrum a_k[i], “k” being the frame index. As not all the peaks are equally important in the spectrum, we use a sliding five-point window to scan the magnitude spectrum, avoiding undesired peaks. For each vector of 5 magnitudes we check the third point, a_k[2]: for a given threshold value \theta_t, we test whether

a_k[2] = \max\{a_k[0], a_k[1], \ldots, a_k[4]\} \quad \text{and} \quad a_k[2] > \theta_t.

If the condition is true, then a_k[2] becomes a peak. A parabolic interpolation is then applied on the three adjacent points a_k[1], a_k[2], a_k[3]. Solving the parabola for the peak location [18, p. 47], a coefficient “p” of the “j”-th peak is calculated:

p_j = \frac{1}{2} \cdot \frac{a_k[1] - a_k[3]}{a_k[1] - 2\, a_k[2] + a_k[3]}

The true peak location (in bins) is given by:

i_{peak[j]} \approx i_{a_k[2]} + p_j

To estimate the true magnitude we use p_j as follows (note that the interpolated magnitude is never below a_k[2]):

a_{k,\,peak[j]} \approx a_k[2] - \frac{1}{4} (a_k[1] - a_k[3])\, p_j

At the end of the process we have collected a set of partials pk_j = (f_j, a_j).
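The five-point test and the parabolic refinement can be sketched as follows. This is a Python illustration of the Smith & Serra procedure; `find_peaks` and the `threshold` parameter are our names, not the library's API:

```python
import numpy as np

def find_peaks(a, threshold):
    """Peak picking with a 5-point sliding window plus parabolic refinement.
    `a` is one frame's magnitude spectrum; returns (bin_location, magnitude) pairs."""
    peaks = []
    for i in range(2, len(a) - 2):
        window = a[i - 2:i + 3]
        # The centre sample must be the maximum of the five and above the threshold.
        if a[i] == window.max() and a[i] > threshold:
            a1, a2, a3 = a[i - 1], a[i], a[i + 1]
            p = 0.5 * (a1 - a3) / (a1 - 2 * a2 + a3)  # parabola vertex offset
            loc = i + p                               # true peak location (bins)
            mag = a2 - 0.25 * (a1 - a3) * p           # interpolated peak magnitude
            peaks.append((loc, mag))
    return peaks
```

Multiplying the refined bin location by samplerate / fft_size then gives the partial frequency f_j in hertz.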
G. A Tempered Virtual Fundamental

This descriptor is based on the harmonic histogram technique described by Jean Laroche [20, pp. 52-53]. We adapted this method in order to approximate the result, and the search phase for the “best candidate”, by a tempered musical scale with a given division.

Given a set of peaks pk_j = (f_j, a_j), calculated as shown previously:

1) For each pk_j = (f_j, a_j) we calculate a set of sub-harmonic candidates

pk_{jn} = (f_j / n,\; a_j), \quad n \in \mathbb{N},\; n \in [1, \ldots, 6].

2) All the f_j / n are converted into indexes i_{jn} in a pitch-class space. At this level, i_{jn} \in \mathbb{R}.

3) The i_{jn} are approximated by a grid of discrete values, multiples of q, with q \in \mathbb{R}, q \in\, ]0, 1], returning a new set of values i^q_{jn}, multiples of q. Notice that q can be seen as a half-tone division: q = 1 means an approximation by a half-tone, q = 0.5 an approximation by a quarter-tone, and so on.

4) This leads us to new couples pk^q_{jn} = (i^q_{jn}, a_j).

5) Collecting all the couples with identical i^q_{jn}, we build new couples pk^q_{jn} = (i^q_{jn}, \sum a_j), where \sum a_j is the sum of all a_j sharing the identical i^q_{jn}.

6) The best candidate for our virtual fundamental is the pk^q_{jn} = (i^q_{jn}, \sum a_j) that maximises \sum a_j.

7) In the last phase, i^q_{jn} is converted into floating-point MIDI pitches or into a frequency space.
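A simplified sketch of the histogram procedure in Python: it uses absolute floating-point MIDI pitches rather than the pitch-class indexes described above, and the function name is illustrative:

```python
import math
from collections import defaultdict

def virtual_fundamental(peaks, q=1.0, max_divisor=6):
    """Tempered virtual fundamental by harmonic histogram (after Laroche).
    `peaks` is a list of (frequency_hz, amplitude) pairs; `q` is the
    semitone grid (1.0 = half-tone, 0.5 = quarter-tone). Returns a MIDI pitch."""
    histogram = defaultdict(float)
    for freq, amp in peaks:
        for n in range(1, max_divisor + 1):
            # Sub-harmonic candidate f/n, converted to a (fractional) MIDI pitch...
            pitch = 69 + 12 * math.log2((freq / n) / 440.0)
            # ...then snapped to the tempered grid of step q semitones.
            pitch_q = round(pitch / q) * q
            histogram[pitch_q] += amp  # accumulate amplitude per grid index
    # Best candidate: the grid pitch with the greatest accumulated amplitude.
    return max(histogram, key=histogram.get)
```

With a harmonic series on 220 Hz, every partial contributes a sub-harmonic at 220 Hz, so MIDI pitch 57 accumulates the most amplitude and wins.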
IV. THE SOLUTION OFFERED BY ZSA.DESCRIPTORS

As was exposed previously, the main goal of Zsa.Descriptors, a library of sound descriptors and spectral analysis tools, is to expand the capabilities of sound description using the systematic approach of the MPEG-7 standard, and to offer a set of truly integrated objects for the Max/MSP [9] graphical programming environment. In addition to sound descriptors and original analysis features, the external objects of the Zsa.Descriptors library are designed to compute multiple descriptors in real-time with both efficiency in terms of CPU usage and guaranteed synchronization. In consequence, a modular approach was chosen. In the Max/MSP environment, this was made possible by sharing the expensive process of the windowed FFT.

Max/MSP already has an object that calculates a windowed FFT in an efficient form: the “pfft~” facility. As is said in the documentation, “pfft~” is a “spectral processing manager for patchers”, i.e., an object that can load specially designed “patchers”. The “pfft~” object takes at least three arguments: the patcher name, the FFT window size, and an overlap factor used to calculate the hop size (Fig. 1).

Fig. 1 the pfft~ object

The loaded “patcher” must also follow a general structure: it must have at least an “fftin~” object. The pfft~ object manages the windowing and overlap of the incoming signal; fftin~ applies the window function (envelope) and performs the FFT. The fftin~ object takes two arguments, the “inlet assignment” and the name of the window envelope function (hanning, hamming, square and blackman are included), or the name of a buffer~. It is therefore possible to use any kind of window depending on the type of sound that we want to analyse.

Fig. 2 the fftin~ object

Therefore, most of the objects of the library were designed to run inside the standard Max/MSP pfft~ object (Fig. 3). This strategy offers multiple advantages: modularity, efficiency, and also the ability to use the analysis directly as a parameter for sound processing in the spectral domain.

Fig. 3 Interior of the pfft~ object

Furthermore, the fact that the objects of this library can work within the Max/MSP environment either together or by themselves, and the fact that they work smoothly in conjunction with other standard Max/MSP objects, makes it possible to exploit all the synchronization resources available in this environment.

V. CONCLUSIONS AND PERSPECTIVES

We have presented in this paper a set of sound descriptors and signal analysis tools intended to be used in real-time as a toolbox for composers, musicologists and researchers. This will allow the use of, and research on, sound descriptors in the field of systematic musicology and as a tool for decision-making in the real-time performance context. As part of the future work, we have also planned research on musical segmentation based on sound descriptors as a strategy for musical analysis.

The main advantage of this library, beyond the improvements we made in some algorithms, is the modular development technique and implementation we have used, trying to optimise the calculus by a strong integration in the Max/MSP environment.

Of course, the work presented here is still preliminary, but it will be improved with the implementation of the following list of features, which are already implemented or currently being developed: temporal variation of the spectrum, bark, inharmonicity, harmonic spectral deviation, odd-to-even harmonic energy ratio, tristimulus, frame energy, harmonic part energy (these harmonic descriptors will use a monophonic F0 algorithm developed by Chunghsin Yeh in his Ph.D. thesis [21]), noise part energy, and other descriptors coming from the signal processing and computer-assisted composition worlds.

ACKNOWLEDGMENT

We would like to thank Richard Dudas for his fruitful remarks, comments and suggestions, Arshia Cont for his friendly support, and Cyril Beros for the travel funding.

REFERENCES

[1] G. Peeters, A large set of audio features for sound description (similarity and classification) in the CUIDADO project. CUIDADO project report, Institut de Recherche et de Coordination Acoustique/Musique (IRCAM), 2004.
[2] F. Gouyon, Extraction automatique de descripteurs rythmiques dans des extraits de musiques populaires polyphoniques, mémoire de DEA ATIAM, Université de la Méditerranée, Université Paris VI, IRCAM, Télécom-Paris, Université du Maine, École Normale Supérieure, ACROE-IMAG, July 2000.
[3] G. Assayag, G. Bloch, M. Chemillier, A. Cont, S. Dubnov, “OMax Brothers: A Dynamic Topology of Agents for Improvization Learning”, in Workshop on Audio and Music Computing for Multimedia, ACM Multimedia 2006, Santa Barbara, USA, October 2006.
[4] G. Assayag, G. Bloch, M. Chemillier, “OMax-Ofon”, in Sound and Music Computing (SMC) 2006, Marseille, France, May 2006.
[5] D. Schwarz, “Current research in concatenative sound synthesis”, Proceedings of the International Computer Music Conference (ICMC), Barcelona, Spain, September 5-9, 2005.
[6] A. Zils, Extraction de descripteurs musicaux : une approche évolutionniste, Thèse de Doctorat, Université Paris 6, September 2004.
[7] SuperCollider, © James McCartney, http://www.audiosynth.com/
[8] PureData, © Miller Puckette, http://crca.ucsd.edu/~msp/Pd_documentation/
[9] Max/MSP, © Cycling74, www.cycling74.com
[10] T. Jehan, B. Schoner, “An Audio-Driven, Spectral Analysis-Based, Perceptual Synthesis Engine”, in Audio Engineering Society, Proceedings of the 110th Convention, Amsterdam, The Netherlands, 2001.
[11] T. Jehan, Creating Music by Listening, Ph.D. thesis, Media Arts and Sciences, Massachusetts Institute of Technology, September 2005.
[12] A. de Cheveigné, H. Kawahara, “YIN, a fundamental frequency estimator for speech and music”, J. Acoust. Soc. Am. 111, pp. 1917-1930, 2002.
[13] N. Schnell et al., “FTM Complex Data Structures for Max/MSP”, in ICMC 2005, Barcelona, Spain, 2005.
[14] N. Schnell, D. Schwarz, “Gabor, multi-representation real-time analysis/synthesis”, Proc. of the 8th Int. Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, September 20-22, 2005.
[15] M. Puckette, T. Apel, “Real-time audio analysis tools for Pd and MSP”, Proceedings of the International Computer Music Conference, San Francisco: International Computer Music Association, 1998, pp. 109-112.
[16] J. O. Smith, X. Serra, “PARSHL: an analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation”, Proc. 1987 Int. Computer Music Conf. (ICMC'87), Urbana, Illinois, August 1987, pp. 290-297.
[17] B. Doval, X. Rodet, “Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs”, Proceedings of the ICASSP '93, 1993, pp. 221-224.
[18] X. Serra, A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition, Ph.D. dissertation, Stanford University, October 1989.
[19] X. Rodet, “Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models”, in TFTS'97 (IEEE Time-Frequency and Time-Scale Workshop 97), Coventry, Great Britain, August 1997.
[20] J. Laroche, Traitement des signaux audio-fréquences, TELECOM Paris, handout, Paris, France, February 1995.
[21] C. Yeh, Multiple fundamental frequency estimation of polyphonic recordings, Ph.D. thesis, Université Paris 6, 2008.
