Digital Sound Synthesis by Physical Modelling
1. Introduction
The last 150 years have seen tremendous advances in electrical, electronic, and digital information transmission and processing. From the very beginning, the available technology has been used not only to send written or spoken messages but also for more entertaining purposes: to make music! An early example is the Musical Telegraph of Elisha Gray in 1876, based on the telephone technology of that time. Later examples used vacuum tube oscillators throughout the first half of the last century, transistorized analog synthesizers in the 1960s, and the first digital instruments in the 1970s. By the end of the last century, digital sound cards with various methods for sound reproduction and generation were commonplace in any personal computer.
This development continues rapidly. One driving force is certainly the availability of ever more powerful hardware. Cheap memory allows sound samples to be stored in high quality and astonishing variety, and the increase in processing power makes it possible to compute sounds in real time. New algorithms and more powerful software also give desktop computers the functionality of stereo equipment or sound studios. An example is given by new coding schemes for high-quality audio. Together with rising bit rates for file transmission on the internet, they have made digital music recordings freely available on the world wide web. Another example is the combination of high-performance sound cards, high-capacity and fast-access hard disks, and sophisticated software for audio recording, processing, and mixing. A high-end personal computer equipped with these components and programs provides the full functionality of a small home recording studio.
While more powerful hardware and software turn a single computer into a music machine, advances in standardization pave the way to networked solutions. The benefits of audio coding standards have already been mentioned. Moreover, the new MPEG-4 video and audio coding standard provides not only natural but also synthetic audio coding. This means that not only compressed samples of recorded music can be transmitted, but also digital scores similar to MIDI, together with algorithms for the sound generation. Finally, the concept of Structured Audio allows an acoustic scene to be broken down into its components, which can be transmitted and manipulated independently.
While natural audio coding is a well-researched subject with widespread applications, the creation of synthetic high-quality music is a topic of active development. For some time, applications have been confined to the refinement of digital musical instruments and software synthesizers. Recently, digital sound synthesis has also found its way into the MPEG-4 video and audio coding standard. The most recent and maybe most interesting family of synthesis algorithms is based on physical models of vibrating structures.
This article highlights some of the methods for digital sound synthesis with special emphasis on physical modelling. Section 2 presents a survey of synthesis methods. Two algorithms for physical modelling are described in section 3. Applications to computer music are given in section 4.
Additive synthesis describes a sound as a superposition of components ψ_l(t), e.g. sinusoids, weighted with time-varying amplitudes F_l(t),

f(t) = Σ_l F_l(t) ψ_l(t) .    (1)

[Fig. 1: Frequency modulation with two coupled oscillators: VCO 1 generates the modulator with frequency ωm, VCO 2 the carrier with frequency ω0; the output signal is f(t).]

Frequency modulation (FM) synthesis instead modulates the phase of a single carrier oscillator,

f(t) = sin(ω0 t + φ(t)) .    (3)
The implementation consists of at least two coupled oscillators. In (3) the carrier sin(ω0 t) is modulated by the time-dependent modulator φ(t) such that the frequency becomes time-dependent with ω(t) = ω0 + ∂φ(t)/∂t. If the modulator is also sinusoidal with φ(t) = β sin(ωm t), as shown in Fig. 1, the resulting spectrum consists of the carrier frequency ω0 and side frequencies at ω0 ± nωm, n ∈ ℕ. The relations between the amplitudes of the discrete frequencies can be varied with the modulation index β. They are given by the values of the Bessel functions of order n with argument β. Four different FM spectra for ω0 = 1 kHz and different modulator frequencies and different modulation indices are shown in Fig. 2. The spectrum for β = 1 has a simple rational relation between ω0 and ωm, resulting in a harmonic spectrum. Increasing the modulation index to β = 2 preserves the distance of the frequency lines but increases their number (top right). A slight decrease of ωm moves the frequency components closer together and produces a non-harmonic spectrum (bottom left). Spectrally very rich sounds can be produced by combining small values of the modulation frequency ωm with high modulation indices, as shown for β = 8. However, due to the dependence on only a few parameters, arbitrary spectra as in additive synthesis cannot be produced. Therefore this method fails to reproduce natural instruments. Nevertheless, FM is frequently used in synthesizers and in sound cards for personal computers, often with more than just two oscillators in a variety of different connections.
[Fig. 2: Magnitude spectra of FM signals for ω0 = 1 kHz and different modulator frequencies and modulation indices; panels β = 1 (top left), β = 2 (top right), β = 2 with decreased ωm (bottom left), and β = 8 (bottom right); horizontal axes f in kHz.]
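To make the FM relations above concrete, the following short Python sketch (added here for illustration; NumPy and all parameter values are assumptions, not part of the original text) generates f(t) = sin(ω0 t + β sin(ωm t)) and can be used to reproduce spectra like those in Fig. 2.

# Illustrative FM synthesis sketch (not from the original paper).
import numpy as np

def fm_tone(f0=1000.0, fm=200.0, beta=2.0, fs=44100, duration=1.0):
    """Return f(t) = sin(2*pi*f0*t + beta*sin(2*pi*fm*t)) sampled at fs.

    The spectrum contains lines at f0 +/- n*fm whose amplitudes are given
    by Bessel functions of order n with argument beta, as described above.
    """
    t = np.arange(int(fs * duration)) / fs
    phi = beta * np.sin(2 * np.pi * fm * t)   # time-dependent modulator phi(t)
    return np.sin(2 * np.pi * f0 * t + phi)   # modulated carrier

# Example: harmonic spectrum for a rational ratio f0/fm, compare Fig. 2 (top left).
signal = fm_tone(f0=1000.0, fm=200.0, beta=1.0)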
3. Physical Modelling
Sound synthesis by physical modelling requires two essential steps: the description of a vibrating structure by the principles of physics and the transformation into a discrete-time, discrete-space model which is suitable for computer implementation. Each step requires certain simplifications
and allows variations. These are discussed in the following
sections.
Longitudinal waves in the string obey the one-dimensional wave equation

E ∂²y(x,t)/∂x² − ρ ∂²y(x,t)/∂t² = 0 .    (4)
For sound generation, transversal waves are more important, since they transmit energy to the resonance body and the surrounding air. They are characterized by a fourth-order spatial derivative:

EI ∂⁴y(x,t)/∂x⁴ + ρA ∂²y(x,t)/∂t² = 0 .    (5)
Typically, a string is under tension with a certain force F, resulting in an additional second-order term:

EI ∂⁴y(x,t)/∂x⁴ − F ∂²y(x,t)/∂x² + ρA ∂²y(x,t)/∂t² = 0 .    (6)
Adding damping terms with the coefficients d1 and d3 and an excitation force density f(x,t) leads to the complete string model

EI ∂⁴y(x,t)/∂x⁴ − F ∂²y(x,t)/∂x² + ρA ∂²y(x,t)/∂t² + d1 ∂y(x,t)/∂t + d3 ∂³y(x,t)/(∂t ∂x²) = f(x,t) .    (7)

Note that for ideally flexible (E = 0) or very thin (I = 0) strings with no damping (d1 = d3 = 0), (7) has the same structure as the wave equation (4). Its solution can then be written as a superposition of two waves travelling in opposite directions with speed c = √(F/(ρA)),

y(x,t) = yl(x + ct) + yr(x − ct) .    (8)

A string supported at the positions x0 and x1 fulfills the boundary conditions

y(x0, t) = 0 ,   y''(x0, t) = 0 ,    (9)
y(x1, t) = 0 ,   y''(x1, t) = 0 .    (10)

The double prime denotes the second-order spatial derivative y'' = ∂²y/∂x². Elastic fixing at the ends of the string or interface conditions to the sound board are described in the same way, e.g. by prescribing a certain linear combination of y(x0, t) and y''(x0, t). The boundary conditions can also include an excitation function at the boundary, as they occur in woodwind instruments.

Typical excitation modes for musical instruments are to pluck or to strike the string. These modes are expressed in mathematical terms as initial conditions of the PDE. Because the highest time derivative in (7) is two, we need two initial conditions,

y(x, t) = yi0(x) ,   t = 0 ,    (11)
ẏ(x, t) = yi1(x) ,   t = 0 .    (12)

The dot denotes the time derivative, ẏ(x,t) = ∂y/∂t. The initial profile of a string plucked close to x1 is shown in Fig. 5, while Fig. 6 shows the initial velocity of a string struck by a hammer at the position xe. In general, both yi0(x) and yi1(x) can be specified independently from each other.

For a discrete-time implementation, (8) is sampled with x = mΔx, t = kT, and Δx = cT, which gives the discrete travelling wave decomposition

y[m, k] = yl[m + k] + yr[m − k] .    (13)
The two travelling waves in (13) are implemented by two delay lines operating in opposite directions, while losses and dispersion of the string are lumped into digital filters H(z) inserted into the delay lines. Boundary conditions are considered by a proper termination of the dual delay line waveguide. They are also realized by digital filters, as shown in Fig. 10. Similar to the filters H(z) for loss and dispersion, the boundary reflection filters Rl(z) and Rr(z) represent the reflections at the string terminations. For an efficient implementation, all of these filters are consolidated into a smaller number of different filters. In practical implementations, their transfer functions are not derived directly from H(z), Rl(z), and Rr(z). Instead they are designed to produce a certain waveform. It has turned out that three different filters are adequate to model the correct pitch, dispersion, and frequency-dependent damping [14]. Fig. 11 shows the resulting arrangement of a single delay line and three digital filters.
[Fig. 11: Single delay line waveguide between the excitation f(t) and the output y(t) with the three consolidated filters Hfd, Hdisp, and HTP.]
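A strongly simplified sketch of this structure is given below (added for illustration only; it keeps a single delay line and a crude loss filter and omits the fractional delay and dispersion filters, so pitch and decay are only approximate; all names and values are assumptions):

# Simplified single-delay-line waveguide with a two-point loss filter
# (illustrative sketch; no fractional delay or dispersion filter).
import numpy as np

def waveguide_pluck(f0=220.0, fs=44100, duration=1.0, loss=0.996):
    n = int(round(fs / f0))                  # delay line length determines the pitch
    delay = np.random.uniform(-1.0, 1.0, n)  # initial string state ("pluck")
    out = np.zeros(int(fs * duration))
    for k in range(out.size):
        out[k] = delay[0]
        # two-point average acts as a crude frequency-dependent damping filter
        fed_back = loss * 0.5 * (delay[0] + delay[1])
        delay = np.roll(delay, -1)
        delay[-1] = fed_back
    return out

y = waveguide_pluck()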
The second approach does not discretize the travelling wave solution but applies transformations with respect to time and space to the initial-boundary value problem directly. As an example, consider the lossless, non-stiff string; the complete problem statement is

ÿ(x,t) − c² y''(x,t) = 0 ,     x0 < x < x1 ,
y(x, 0) = yi0(x) ,             x0 < x < x1 ,
ẏ(x, 0) = yi1(x) ,             x0 < x < x1 ,
y(x0, t) = 0 ,                 x = x0 ,
y(x1, t) = 0 ,                 x = x1 .    (14)

Laplace transformation with respect to time turns the partial differential equation from (14) into an ordinary differential equation in x,

s² Y(x, s) − c² Y''(x, s) = s yi0(x) + yi1(x) .    (15)

Note that the second-order time derivative has turned into a multiplication with s² and that the initial values from (14) appear as additive terms on the right-hand side.
To remove also the spatial derivative and to consider the boundary conditions, we apply the spatial transformation

T{Y(x)} = Y(β) = ∫_{x0}^{x1} Y(x) K(β, x) dx    (16)

with the kernel

K(β, x) = K sin(β (x − x0)) ,    (17)
β = βν = νπ / (x1 − x0) ,   ν ∈ ℕ .    (18)

This special form of the spatial transformation (finite sine transformation) has been chosen because the transformation kernel from (17) fulfills the same boundary conditions as the deflection y(x,t) of the string (compare (14)):

K(β, x0) = 0 ,    (19)
K(β, x1) = 0 .    (20)
The inverse transformation is given by the series expansion

y(x) = T⁻¹{Y(β)} = Σ_{ν=0}^{∞} (1/Nν) Y(βν) K(βν, x) ,    (22)

where Nν denotes the norm of the kernel K(βν, x).

Application of (16) and (22) now turns the boundary-value problem (15) into an algebraic equation. The spatial derivative is removed by the differentiation theorem of the transformation,

T{Y''(x)} = ∫_{x0}^{x1} Y''(x) K(β, x) dx = −β² Y(β) ,    (23)

so that the transformed deflection follows from (15) as

Y(β, s) = s/(s² + c²β²) · yi0(β) + 1/(s² + c²β²) · yi1(β) .    (24)
In terms of transfer functions, (24) reads Y(β, s) = Gi0(β, s) yi0(β) + Gi1(β, s) yi1(β) with the transfer functions for the initial values yi0(β) and yi1(β)

Gi0(β, s) = s / (s² + c²β²) ,    (26)
Gi1(β, s) = 1 / (s² + c²β²) .    (27)
For a discrete-time implementation with the sampling interval T, the transfer functions (26) and (27) are discretized such that their impulse responses are preserved at the sampling instants. With the eigenfrequencies ων = cβν, this impulse-invariant discretization gives the correspondences

s / (s² + c²β²)   →   (z² − z cos(ων T)) / (z² − 2z cos(ων T) + 1) ,
1 / (s² + c²β²)   →   (z sin(ων T)/ων) / (z² − 2z cos(ων T) + 1) ,

so that the transformed deflection of the discrete-time model becomes

Yd(β, z) = (z² − z cos(ων T)) / (z² − 2z cos(ων T) + 1) · yi0(β) + (z sin(ων T)/ων) / (z² − 2z cos(ων T) + 1) · yi1(β) .    (28)

Inverse z-transformation of (28) yields one second-order difference equation per discrete frequency βν,

yd(β, k) = 2 cos(ων T) yd(β, k−1) − yd(β, k−2) + yi0(β) δ(k) + ( sin(ων T)/ων · yi1(β) − cos(ων T) yi0(β) ) δ(k−1) ,    (29)

i.e. the sound is synthesized by a bank of recursive systems of second order, one for each mode, followed by the inverse transformation (22).
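As an illustration of how (16)-(29) translate into a synthesis algorithm, the following sketch (added here; the string length, wave speed, pluck shape, pickup position, and mode count are assumed values, not data from the original text) transforms a triangular pluck shape yi0(x) with the kernel (17) (taking K = 1, so that Nν = (x1 − x0)/2), runs one second-order recursion per mode equivalent to (29), and sums the modes at a pickup point according to the inverse transformation (22):

# Transform-domain synthesis sketch for the lossless string (illustrative values).
import numpy as np

def ftm_string(L=0.65, c=200.0, x_pluck=0.2, x_pick=0.3,
               n_modes=40, fs=44100, duration=1.0):
    T = 1.0 / fs
    n_samples = int(fs * duration)
    x = np.linspace(0.0, L, 500)

    # Initial deflection yi0(x): triangular "pluck"; initial velocity yi1 = 0.
    yi0_x = np.where(x < x_pluck, x / x_pluck, (L - x) / (L - x_pluck))

    y_out = np.zeros(n_samples)
    for nu in range(1, n_modes + 1):
        beta = nu * np.pi / L                 # spatial eigenvalue, cf. (18)
        omega = c * beta                      # eigenfrequency of mode nu
        kernel = np.sin(beta * x)             # transformation kernel, cf. (17)
        yi0 = np.trapz(yi0_x * kernel, x)     # forward transform of yi0(x), cf. (16)
        yi1 = 0.0                             # a struck excitation would enter here

        # Second-order recursion equivalent to (29); the impulse terms of (29)
        # appear as the first two output samples of each mode.
        a = 2.0 * np.cos(omega * T)
        yd_prev2 = yi0
        yd_prev1 = np.cos(omega * T) * yi0 + np.sin(omega * T) / omega * yi1
        weight = (2.0 / L) * np.sin(beta * x_pick)   # inverse transform at x_pick, cf. (22)
        y_out[0] += weight * yd_prev2
        y_out[1] += weight * yd_prev1
        for k in range(2, n_samples):
            yd = a * yd_prev1 - yd_prev2
            y_out[k] += weight * yd
            yd_prev2, yd_prev1 = yd_prev1, yd
    return y_out

y = ftm_string()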
The PDEs in section 3.1 exhibit more complex differentiation operators in time and space than the wave equation presented above. Higher-order differential operators with respect to time simply introduce higher-order polynomials in s into the transfer functions, resulting in recursive systems of higher order. Higher-order spatial operators require a careful construction of the spatial transformation T; the suitable theoretical framework for this task is the theory of special boundary-value problems of the Sturm-Liouville type.
Hammered String. To model a real hammer-string interaction, the dynamics of the hammer have to be taken into account. The hammer deflection can be modelled by one second-order recursive system. The input force for this recursive system is the negative of the input force for the recursive systems of the string. The hammer interacts nonlinearly with the string because of the nonlinear force-deflection law of the hammer felt. The input variable here is the initial hammer velocity vh. The algorithm is shown in Fig. 17. The nonlinear operation includes a delay for computability.
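The following sketch (added for illustration) mirrors the structure just described: the hammer is a second-order recursive system driven by the negative contact force, the force follows a nonlinear power law of the felt compression, and the force enters the loop with a one-sample delay for computability. The felt parameters and the ideal-string driving-point impedance 2Z that stands in for the full recursive string model are assumptions, not values from this article.

# Sketch of the nonlinear hammer-string interaction (illustrative only;
# the full recursive string model is replaced by an ideal string with
# driving-point impedance 2*Z).
import numpy as np

def hammer_contact_force(v_h=2.0, m_h=0.005, k_felt=1e8, p=2.5,
                         Z=10.0, fs=44100, n_samples=300):
    T = 1.0 / fs
    y_h, v = 0.0, v_h      # hammer deflection and velocity (input variable: v_h)
    y_s = 0.0              # string deflection under the hammer
    f_prev = 0.0           # contact force, fed back with one sample delay
    f_out = np.zeros(n_samples)
    for n in range(n_samples):
        compression = max(y_h - y_s, 0.0)
        f = k_felt * compression ** p      # nonlinear felt force-deflection law
        # hammer: discrete second-order system driven by the negative force
        v += -(f_prev / m_h) * T
        y_h += v * T
        # string: driven by the positive force (placeholder for the full model)
        y_s += (f_prev / (2.0 * Z)) * T
        f_prev = f
        f_out[n] = f
    return f_out

forces = hammer_contact_force()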
5. Conclusions
Digital sound synthesis is an emerging application for multimedia processing. With ever increasing computing power, real-time implementation of demanding physical models has become feasible. The advantage of physical modelling over conventional sound reproduction or synthesis methods lies in the combination of highly flexible and at the same time physically correct models. The high flexibility allows the player of a virtual instrument to control all parameters of the model during operation, while the physical correctness ensures stable operation and meaningful results for all parameter variations.
Future developments are expected in different directions. The complexity of the models for strings, membranes, bells, tubes, and other objects will certainly increase. Furthermore, the interactions between different kinds of models for different components of an instrument have to be established and implemented. Finally, the control of the player over the virtual instrument will be extended by new, human gesture-based interfaces.
References

[1] A. Chaigne and A. Askenfelt. Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods. J. Acoust. Soc. Am., 95(2):1112–1118, 1994.
[2] M. Goodwin and M. Vetterli. Time-frequency signal models for music analysis, transformation, and synthesis. In Proc. IEEE Int. Symp. on Time-Frequency and Time-Scale Analysis, pages 133–136, 1996.
[3] M. Kahrs and K. Brandenburg, editors. Applications of Digital Signal Processing to Audio and Acoustics. Kluwer Academic Publishers, Boston, 1998.
[4] G. De Poli, A. Piccialli, and C. Roads. Representations of Musical Signals. MIT Press, Cambridge, Mass., 1991.
[5] C. Roads, S. Pope, A. Piccialli, and G. De Poli, editors. Musical Signal Processing. Swets & Zeitlinger, Lisse, 1997.
[6] T. Rossing and N. Fletcher. Principles of Vibration and Sound. Springer, New York, 1995.
[7] G. P. Scavone. Digital waveguide modeling of the non-linear excitation of single reed woodwind instruments. In Proc. Int. Computer Music Conference, 1995.
[8] E. D. Scheirer, R. Väänänen, and J. Huopaniemi. AudioBIFS: Describing audio scenes with the MPEG-4 multimedia standard. IEEE Transactions on Multimedia, 1(3):237–250, September 1999.
[9] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74–91, 1992.
[10] M. Tohyama, H. Suzuki, and Y. Ando. The Nature and Technology of Acoustic Space. Academic Press, London, 1995.
[11] L. Trautmann, S. Petrausch, and R. Rabenstein. Physical modeling of drums by transfer function methods. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'01). IEEE, 2001.
[12] L. Trautmann and R. Rabenstein. Digital sound synthesis based on transfer function models. In Proc. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 1999.
[13] R. Väänänen. Synthetic audio tools in MPEG-4 standard. In Proc. 108th AES Convention. Audio Engineering Society, February 2000. Preprint 5080.
[14] V. Välimäki, J. Huopaniemi, and M. Karjalainen. Physical modeling of plucked string instruments with application to real-time sound synthesis. Journal of the Audio Engineering Society, 44(5):331–353, 1996.
[15] V. Välimäki and T. Takala. Virtual musical instruments – natural sound using physical models. Organised Sound, 1(2):75–86, 1996.
[16] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer. Structured audio: Creation, transmission, and rendering of parametric sound representations. Proc. of the IEEE, 86(5):922–940, 1998.
[17] L. J. Ziomek. Fundamentals of Acoustic Field Theory and Space-Time Signal Processing. CRC Press, Boca Raton, 1995.
[18] G. Zoia and C. Alberti. An audio virtual DSP for multimedia frameworks. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'01), 2001.