ZsaDescriptors A Library
This is a very well-known descriptor. The spectral centroid is the barycentre of the spectrum, computed as follows:

$$\mu = \frac{\sum_{i=0}^{n-1} f[i]\, a[i]}{\sum_{i=0}^{n-1} a[i]}$$

where:
$n$ is half of the FFT window size;
$i$ is the bin index;
$a[i]$ is the amplitude of bin $i$, the real part of the FFT calculation;
$f[i]$ is the frequency of bin $i$, where

$$f[i] = i \cdot \frac{\text{samplerate}}{\text{fft window size}}$$

and $\mu$ is the spectral centroid in hertz.

B. Spectral Spread (spectral centroid variance)

As usual, we consider the spectral centroid as the first moment of the spectrum, considered as a frequency distribution, which is related to the weighted frequency mean value. The spectral spread is the second moment, i.e., the variance around the mean calculated above.

$$v = \frac{\sum_{i=0}^{n-1} (f[i] - \mu)^2\, a[i]}{\sum_{i=0}^{n-1} a[i]}$$

C. Spectral Slope

The spectral slope is an estimation of the amount of spectral magnitude decrease, computed by a linear regression on the magnitude spectrum.

$$\text{slope} = \frac{1}{\sum_{i=0}^{n-1} a[i]} \cdot \frac{n \sum_{i=0}^{n-1} f[i]\, a[i] - \sum_{i=0}^{n-1} f[i] \sum_{i=0}^{n-1} a[i]}{n \sum_{i=0}^{n-1} f^2[i] - \left( \sum_{i=0}^{n-1} f[i] \right)^2}$$

D. Spectral Decrease

The meaning of the spectral decrease is similar to that of the spectral slope: it represents the amount of decrease of the spectral magnitude. According to Peeters [1], this formulation comes from perceptual studies and is supposed to be more correlated with human perception.

$$\text{decrease} = \frac{1}{\sum_{i=2}^{n-1} a[i]} \sum_{i=2}^{n-1} \frac{a[i] - a[1]}{i - 1}$$

E. Spectral Roll-Off

The spectral roll-off point is the frequency $f_c[i]$ such that x% of the signal energy falls below this frequency. “x” is taken as 0.95 by default. The roll-off point is calculated as follows:

$$\sum_{i=0}^{f_c[i]} a^2[f[i]] = x \sum_{i=0}^{n-1} a^2[f[i]]$$

where $f_c[i]$ is the roll-off point and $x$ is the accumulated roll-off energy percentage.

F. Sinusoidal model based on peaks detection

We have based our algorithm on the widely known method defined by Smith & Serra [16] [18, p. 38-48], where a peak is defined as a local maximum in the real magnitude spectrum $a_k[i]$; “k” is the frame index. As not all the peaks are equally important in the spectrum, we have used a sliding five-point window to scan the magnitude spectrum, avoiding undesired peaks. For each vector of 5 magnitudes we check the third point $a_k[2]$ and, for a given threshold value $\delta_t$, we compute:

$$a_k[2] = \max\{a_k[0], a_k[1], \ldots, a_k[4]\} \;\wedge\; a_k[2] > \delta_t$$

If the condition is true, then $a_k[2]$ becomes a peak. A parabolic interpolation is then applied on the three adjacent points $a_k[1]$, $a_k[2]$, $a_k[3]$. Solving for the parabola peak location [18, p. 47], a coefficient “p” of the “j” peak is then calculated:

$$p_j = \frac{1}{2} \cdot \frac{a_k[1] - a_k[3]}{a_k[1] - 2 a_k[2] + a_k[3]}$$

The true peak location (in bins) is given by:

$$i_{peak[j]} \approx i_{a_k[2]} + p_j$$

To estimate the true magnitude we use $p_j$ as follows:

$$a_{k,\,peak[j]} \approx a_k[2] - \frac{1}{4} \left( a_k[1] - a_k[3] \right) p_j$$

At the end of the process we have collected a set of partials $pk_j = (f_j, a_j)$.

G. A Tempered Virtual Fundamental

This descriptor is based on the harmonic histogram technique described by Jean Laroche [20, p. 52-53]. We adapted this method so that the result, and the search phase for the “best candidate”, are approximated by a tempered musical scale with a given division.
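As a rough illustration, the peak picking of section F and the tempered harmonic histogram introduced above can be sketched together in Python. This is a minimal sketch under stated assumptions, not the code of the library: the function names are hypothetical, the pitch index is assumed to be a floating-point MIDI pitch (A4 = 440 Hz = 69), and flat plateaus or ties between candidates are not treated specially.

```python
import math
from collections import defaultdict


def pick_peaks(mag, threshold, sample_rate, fft_size):
    """Return (frequency_hz, amplitude) partials for one magnitude frame."""
    partials = []
    for c in range(2, len(mag) - 2):
        w = mag[c - 2:c + 3]  # sliding five-point window centred on bin c
        if w[2] == max(w) and w[2] > threshold:
            # Parabolic interpolation on the three adjacent points.
            denom = w[1] - 2.0 * w[2] + w[3]
            p = 0.5 * (w[1] - w[3]) / denom if denom != 0.0 else 0.0
            amp = w[2] - 0.25 * (w[1] - w[3]) * p
            partials.append(((c + p) * sample_rate / fft_size, amp))
    return partials


def hz_to_midi(freq):
    """Assumed pitch index: floating-point MIDI pitch, A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(freq / 440.0)


def midi_to_hz(pitch):
    """Inverse of hz_to_midi, for a result expressed in hertz."""
    return 440.0 * 2.0 ** ((pitch - 69.0) / 12.0)


def virtual_fundamental(partials, q=1.0, max_div=6):
    """Return (midi_pitch, summed_amplitude) of the best candidate, or None.

    q is the grid step in half-tones and must be greater than zero.
    """
    histogram = defaultdict(float)
    for freq, amp in partials:
        for n in range(1, max_div + 1):
            # Sub-harmonic f/n, mapped to a pitch and rounded to the q grid.
            step = round(hz_to_midi(freq / n) / q)
            histogram[step] += amp  # sum amplitudes sharing a grid value
    if not histogram:
        return None
    best = max(histogram, key=histogram.get)
    return best * q, histogram[best]


# Synthetic frame: three equal peaks at 200, 300 and 400 Hz (10 Hz per bin).
mag = [0.0] * 512
for c in (20, 30, 40):
    mag[c - 1], mag[c], mag[c + 1] = 0.5, 1.0, 0.5
partials = pick_peaks(mag, threshold=0.1, sample_rate=10240, fft_size=1024)
pitch, weight = virtual_fundamental(partials, q=1.0)
```

In this synthetic frame the three partials share the sub-harmonic 100 Hz, so with q = 1 the best candidate falls on MIDI pitch 43. Using the integer grid index as the histogram key, rather than the rounded floating-point pitch, avoids spurious mismatches between values that should be identical.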
Given a set of peaks $pk_j = (f_j, a_j)$, calculated as shown previously:

1) For each $pk_j = (f_j, a_j)$ we calculate a set of

$$pk_{jn} = \left( \frac{f_j}{n}, a_j \right), \quad n \in \mathbb{N},\ n \in [1, \ldots, 6].$$

2) All the $f_j / n$ are converted into indexes, $i_{jn}$, in a pitch-class space. At this level $i_{jn} \in \mathbb{R}$.

3) The $i_{jn}$ are approximated by a grid of discrete values, multiples of $q$, $q \in \mathbb{R}$, $q \in [0, 1]$, returning a new set of values $i^q_{jn}$, multiples of $q$. Notice that $q$ can be seen as a half-tone division: $q = 1$ means an approximation by a half-tone, $q = 0.5$ an approximation by a quarter-tone, and so on.

4) This leads us to new couples $pk^q_{jn} = (i^q_{jn}, a_j)$.

5) Collecting all couples that share an identical $i^q_{jn}$, we build new couples $pk^q_{jn} = (i^q_{jn}, \sum a_j)$, where $\sum a_j$ is the sum of all $a_j$ for the identical $i^q_{jn}$.

6) The best candidate to be our virtual fundamental will be the $pk^q_{jn} = (i^q_{jn}, \sum a_j)$ that maximises $\sum a_j$.

7) In the last phase, $i^q_{jn}$ is converted into floating-point MIDI pitches or into a frequency space.

IV. THE SOLUTION OFFERED BY ZSA.DESCRIPTORS

As explained previously, the main goal of Zsa.Descriptors, a library of sound descriptors and spectral analysis tools, is to expand the capabilities of sound description using the systematic approach of the MPEG7 standard, and to offer a set of truly integrated objects for the Max/MSP [9] graphical programming environment. In addition to sound descriptors and original analysis features, the external objects of the Zsa.Descriptors library are designed to compute multiple descriptors in real-time with both efficiency in terms of CPU usage and guaranteed synchronization. Consequently, a modular approach was chosen. In the MAX/MSP environment, this was made possible by sharing the expensive process of the windowed FFT.

Fig. 1 The pfft~ object

MAX/MSP currently has an object that efficiently calculates a windowed FFT: the “pfft~” facility. As stated in the documentation, “pfft~” is a “Spectral processing manager for patchers”, i.e., an object that can load specially designed “patchers”. The “pfft~” object takes at least three arguments: the patcher name, the FFT window size and an overlap factor, used to calculate the hop size (Fig. 1). The loaded “patcher” must also follow a general structure: it must have at least one “fftin~” object. The pfft~ object manages the windowing and overlap of the incoming signal; fftin~ applies the window function (envelope) and performs the FFT. The fftin~ object takes two arguments, the "inlet assignment" and the name of the window envelope function (hanning, hamming, square and blackman are included), or the name of a buffer~. It is therefore possible to use any kind of window, depending on the type of sound that we want to analyse.

Fig. 2 The fftin~ object

Therefore, most of the objects of the library were designed to run inside the standard MAX/MSP pfft~ object (Fig. 3). This strategy offers multiple advantages: modularity, efficiency, and also the ability to use the analysis directly as a parameter for sound processing in the spectral domain.

Fig. 3 Interior of the pfft~ object

Furthermore, the fact that the objects of this library can work within the Max/MSP environment either together or by themselves, and the fact that they work smoothly in conjunction with other standard Max/MSP objects, make it possible to exploit all the synchronization resources available in this environment.

V. CONCLUSIONS AND PERSPECTIVES

We have presented in this paper a set of sound descriptors and signal analysis tools intended to be used in real-time as a toolbox for composers, musicologists and researchers. This will allow the use of sound descriptors, and research on their use, in the field of systematic musicology and as a decision-making tool in the real-time performance context. As part of our future work, we have also planned research on musical segmentation based on sound descriptors as a strategy for musical analysis.

The main advantage of this library, more than the small improvements we made to some algorithms, was the modular development technique and implementation we
have used, trying to optimise the computation through a strong integration in the MAX/MSP environment.

Of course, the work presented here is still preliminary, but it will be improved with the implementation of the following list of features, which are already implemented or currently being developed: temporal variation of the spectrum, bark, inharmonicity, harmonic spectral deviation, odd-to-even harmonic energy ratio, tristimulus, frame energy, harmonic part energy (these harmonic descriptors will use a monophonic F0 algorithm developed by Chunghsin Yeh in his Ph.D. thesis [21]), noise part energy, and other descriptors coming from the signal processing and computer-assisted composition worlds.

ACKNOWLEDGMENT

We would like to thank Richard Dudas for his fruitful remarks, comments and suggestions, Arshia Cont for his friendly support and Cyril Beros for funding the travel support.

[14] N. Schnell, D. Schwarz, “Gabor, multi-representation real-time analysis/synthesis”, Proc. of the 8th Int. Conference on Digital Audio Effects (DAFx’05), Madrid, Spain, September 20-22, 2005.
[15] M. Puckette, T. Apel, “Real-time audio analysis tools for Pd and MSP”, Proceedings, International Computer Music Conference, San Francisco: International Computer Music Association, 1998, pp. 109-112.
[16] J.O. Smith, X. Serra, “PARSHL: an analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation”, Proc. 1987 Int. Computer Music Conf. (ICMC’87), Urbana, Illinois, August 1987, pp. 290-297.
[17] B. Doval, X. Rodet, “Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs”, Proceedings of the ICASSP ’93, 1993, pp. 221-224.
[18] X. Serra, A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition, Philosophy Dissertation, Stanford University, Oct. 1989.
[19] X. Rodet, “Musical Sound Signal Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models”, in TFTS'97 (IEEE Time-Frequency and Time-Scale Workshop 97), Coventry, Great Britain, August 1997.
[20] J. Laroche, Traitement des signaux audio-fréquences, TELECOM, Handout, Paris, France, February 1995.
[21] C. Yeh, Multiple fundamental frequency estimation of polyphonic recordings, Ph.D. thesis, Université Paris 6, 2008.
REFERENCES