Ramakrishnan, C. Zirkonium, Non-Invasive Software For Sound Spatialisation
Zirkonium: Non-invasive software for sound spatialisation*
CHANDRASEKHAR RAMAKRISHNAN
Native Systems Group, ETH Zürich, Clausiusstrasse 59, CH-8006 Zürich, Switzerland
E-mail: [email protected]
Zirkonium is a flexible, non-invasive, open-source program for sound spatialisation over spherical (dome-shaped) loudspeaker setups. By non-invasive, we mean that Zirkonium offers the artist spatialisation capabilities without forcing her to change her usual way of working. This is achieved by supporting a variety of means of designing and controlling spatialisations. Zirkonium accommodates user-defined speaker distributions and offers HRTF-based headphone simulation for situations when the actual speaker setup is not available. It can acquire sound sources from files, live input, or via the so-called device mode, which allows Zirkonium to appear to other programs as an audio interface. Control data may be predefined and stored in a file or generated elsewhere and sent over OSC. This paper details Zirkonium, its design philosophy and implementation, and how we have been using it since 2005.

*The Klangdom project was generously funded by the Zukunftsoffensive initiative of Baden-Württemberg.

Organised Sound 14(3): 268–276 © Cambridge University Press, 2009. doi:10.1017/S1355771809990082

1. INTRODUCTION

The Institut für Musik und Akustik (IMA) at ZKM | Zentrum für Kunst und Medientechnologie, Karlsruhe has as its primary mission commissioning, producing and presenting concerts of electroacoustic music. It was founded in 1989 and has been at its present location in the ZKM complex since 1997. IMA is home to composers’ ateliers and studios for recording and producing electronic and acoustic music, as well as a concert space, the Kubus, a rectangular concert hall inside the iconic cubical building at the main entrance to ZKM. The Klangdom project was launched in 2004 in order to extend the existing capabilities of the Kubus and offer a more immersive sound-spatialisation system.

The Klangdom, as with all spatialisation systems, is made up of three components: loudspeaker positioning, a technique for routing sound to the speakers, and a controller for defining the routing. In the Klangdom, the Zirkonium software defines the latter two.

Zirkonium’s primary task is to offer the user a model of a room, let her place virtual sound sources in the space, and route sound to speakers to realise this placement. This paper describes how Zirkonium performs this. We begin by providing historical background on sound spatialisation approaches (Section 2). We then describe the Klangdom (Section 3) and the motivation behind Zirkonium and how it handles spatialisation (Section 4), and discuss its implementation, providing details on certain interesting aspects (Sections 5 and 6). We have been using Zirkonium in production since 2005, and we relate our experience using Zirkonium in a variety of concert situations (Section 7). We conclude with an assessment of the lessons we have learned and suggestions of future work still to be done.

2. HISTORICAL BACKGROUND

Spatial distribution of sound events has played an important role in electroacoustic music since its very beginnings. As early as 1951, the studio at the Radiodiffusion-Télévision Française (RTF) employed a quadraphonic spatialisation system with two front channels, one channel in the back, and one above the listeners and developed a controller, the pupitre d’espace, used for live control of spatialisation (Zvonar 2005).

Richard Zvonar (Zvonar 2005) and Leo Küpper (Küpper 1984) offer fascinating histories of the development of spatialisation in music, each with his own focus. Through their research, we can see two broad tendencies in systems for sound spatialisation devised since the late 1970s. We label these the acousmatic approach and simulation approach (Table 1).

The acousmatic approach can be thought of as an extension of musique concrète. Similar to musique concrète’s focus on the capabilities of the tape machine as a source of generating musical material, the acousmatic approach focuses on the loudspeaker and its position as the way to organise sound in space. This is well illustrated by several different systems.

The acousmonium was developed at the Groupe de Recherches Musicales (GRM) in the mid-1970s. An acousmonium is made up of different types of loudspeakers distributed throughout a room. It is usually played live by a composer/performer who routes the audio of a piece (often 2-channel) to the different loudspeakers, taking advantage of the sound reproduction characteristics and physical placement of the loudspeakers to realise a spatialised live performance.

Another system with a similar philosophy, but a different realisation, is the sound dome of the sort
championed by Leo Küpper (Küpper 1984) and exemplified by the German Pavilion at the World Expo ’70 in Osaka, Japan. A sound dome differs from an acousmonium in that it is made up of one type of speaker and specifies a specific distribution of the speakers in space (on the surface of a sphere). Nonetheless, both share the core spatialisation philosophy that the position of the loudspeaker determines the origin of the virtual sound source.

The technique of vector-base amplitude panning, or VBAP, represents a further development in this way of thinking (Pulkki 1997). As with the dome, it places the speakers on the surface of a sphere, but provides a more sophisticated scheme for creating a virtual sound source by routing the audio to the three speakers closest to the position of the virtual sound source.

The simulation approach, on the other hand, uses signal processing to simulate physical acoustics to produce the illusion of moving sound sources. Whereas the acousmatic approach can be realised in the analogue domain with just a mixer, the simulation approach requires the intercession of a computer.

One of the early pioneers was John Chowning, who in his 1971 article (Chowning 1971) described techniques for simulating moving sound sources over a quadraphonic speaker setup. The Spatialisateur or Spat (Jot and Warusfel 1995) developed at IRCAM is an evolution of this idea and lets the user specify a sound source’s position as well as its reverberation characteristics, which are also important for the perception of sound localisation.

Other recent work in this direction has focused on Wave Field Synthesis and Ambisonics. Both of these techniques have as their basis the wave equation. Wave Field Synthesis (Berkhout 1988) uses a large number of small loudspeakers to synthesise an approximation of the wavefront that would be observed were there a sound source at the specified position. Ambisonics similarly approximates the sound field at a point, but does so using ordinary speakers (Daniel 2001).

the unconstrained freedom of a tabula rasa would certainly take a different form. But for the Klangdom, a goal of the design was to extend the existing capabilities of the Kubus while at the same time supporting the established day-to-day activities of IMA.

Composers and musicians who perform at IMA | ZKM come from a variety of backgrounds and represent a multitude of approaches to music. These different approaches can make contradictory demands on a spatialisation system. Some composers place much worth on having the system deliver the sounds they have composed as faithfully as possible to the loudspeakers. Other composers are more interested in hearing a convincing illusion of movement and are willing to tolerate their sounds being processed, filtered and reverberated to achieve that effect. This is parallel to the distinction between the acousmatic and simulation approaches to spatialisation.

The hardware and software of the Klangdom were designed to accommodate, as much as possible, both uses. The loudspeakers are distributed in a dome shape, enveloping the audience. This configuration is suitable for VBAP as well as Ambisonics. The loudspeakers are suspended from three concentric rings around the Kubus. The first two rings are made of track, allowing the speakers to be easily moved, and the third ring can be raised or lowered. We chose MeyerSound UPJ-1P loudspeakers for their quality of sound reproduction, as well as the fact that they are normal, full-range concert loudspeakers. This offers the advantage that the speakers can be removed from the dome temporarily and used elsewhere if there is an acute need. This was motivated by the reality of our concert schedule.

The Klangdom was completed and first used in concert in 2005 (Ramakrishnan, Gossmann, Brümmer and Sturm 2006). The Zirkonium software, developed to control the Klangdom, has since undergone two major iterations. This paper describes the version current as of February 2009.
her preferred tools whenever possible. The model we envisioned is that of a series of services for spatialisation to be leveraged and utilised in combination with other programmes (Digital Audio Workstations, Max/MSP, SuperCollider, etc.), as opposed to a monolithic application that imposes itself into all aspects of the compositional process. Thus, there was a conscious design decision that Zirkonium focus solely on panning and the construction of time-based panning choreographies, while allowing other domains of spatialisation, such as reverberation, to be implemented elsewhere and still be incorporated.

To this end, Zirkonium defines a simple, straightforward panning model and provides an open infrastructure with entry points where other programmes may dock themselves. This enables a full spectrum of possibilities for sound and panning to be generated and controlled in real time or composed beforehand for storage and playback. Furthermore, since composers will want to perform their pieces in different locations, where different infrastructure may be available, Zirkonium lets composers describe the relevant elements of the concert space and adapts the spatialisation to those parameters, minimising setup overhead and obviating the need for interchange formats.

5. ZIRKONIUM

At its core, all Zirkonium does is give the user a model of a room, let her place virtual sound sources in the space, and route sound to speakers to realise this request. When so described, it sounds simple, but there are many design decisions involved. As with the hardware, flexibility and pragmatism were the guiding principles in making these decisions. Thus, we defined the goal of offering this functionality in a minimalist way. This resulted in an acousmatic system. For many uses, this is sufficient. In other situations, Zirkonium can also be used as a basic building block and extended with simulations.

The philosophy of pragmatism and flexibility manifests itself in many places. Zirkonium uses VBAP for spatialisation, but it also defines an interface to allow for use of other algorithms, such as Ambisonics. Zirkonium does not add reverb, distance cues, or simulation of movement artefacts (e.g., Doppler shift), but it does support the user in doing this herself (see Section 7 for a further discussion). Furthermore, Zirkonium has been designed to be used in other spaces, not just at ZKM. Users may define their own loudspeaker configurations that describe their local environment. And realising that the environment itself may not always be at the disposal of the artist, Zirkonium can simulate speaker configurations for headphone listening.

The following section, Section 6, discusses Zirkonium’s spatialisation model – how it represents virtual sound sources in space and converts that representation into sound. The section after that, Section 7, concerns itself with the process of defining and acquiring sound sources.

6. PANNING MODEL

Zirkonium takes a mono sound source and a point in space and creates the illusion of the sound emanating from that point in space. This is achieved through the interaction of several components: the panning algorithm, VBAP; the distribution of the speakers in the room (speaker setup); and the specification of the desired position of the virtual sound source (figure 1). This section describes these three pieces of the puzzle.

Figure 1. Positioning a sound in space with Zirkonium.

6.1. VBAP

Equal power panning (EPP) is a standard technique for stereo panning while keeping the perceived loudness constant (Roads 1996: 460–1). Vector base amplitude panning is an extension of EPP to speakers distributed over the surface of a sphere (Pulkki 1997). In VBAP, virtual point sources are panned using a combination of the closest three speakers and scaled such that the perceived loudness does not change.

6.2. Speaker setup

The first step necessary to realise a spatialisation is the definition of a speaker setup. The speaker setup defines for Zirkonium the positions of the speakers in the room, information necessary to use VBAP (figure 2).

Speaker positions can be specified freely in 3-D Cartesian coordinates (X/Y/Z). As the speakers are added, Zirkonium does two things:

1. it projects the position of the new speaker onto the surface of a sphere, as required by VBAP, and
2. it computes a triangular mesh that decomposes the individual speaker positions into triplets of speakers (figure 3).

Although Pulkki himself proposes an algorithm for computing a triangular mesh (Pulkki and Lokki 1998), we use Delaunay triangulation, a well-known and widely studied method from computational geometry (Bern and Eppstein 1992). In particular, we use the implementation provided by Triangle (Shewchuk 1996).

A further feature offered by the speaker setup is the ability to simulate it for headphone listening. This simulation is performed using a head-related transfer function (HRTF).

6.3. Positioning virtual sources

Positioning a virtual sound source involves taking a user-specified position and generating gain coefficients for each loudspeaker. The properties of VBAP imply that at most three speakers per virtual sound source will have non-zero coefficients.

Virtual source positions may be specified in either Cartesian (X/Y) or spherical (azimuth/zenith) coordinates. One of the initially confusing aspects of Zirkonium is that source positions are two dimensional. This is because VBAP assumes that speakers are located on the surface of a sphere (Pulkki 1997). The surface of a sphere, though embedded in three dimensions, is two dimensional (i.e., a surface). Thus, only two coordinates are necessary to specify a point on it – the third coordinate is implicit (figure 4).

The first step in applying VBAP is to determine which triplet of speakers should be used. As pointed out in Pulkki’s original article (Pulkki 1997), this can be easily done by computing the VBAP coefficients for each of the speaker triplets. Only one of these triplets will yield all non-negative coefficients. These are the ones that are then passed on to the mixer to realise the virtual sound source.
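The triplet search described above can be illustrated with a minimal Python sketch. It is not Zirkonium's own code: every function name, the five-speaker layout, and the hand-written mesh are hypothetical, and the angle convention (azimuth in the horizontal plane, elevation measured upward from it, both in radians) is an assumption made for the example. The gains for one triplet are found by solving p = g1*l1 + g2*l2 + g3*l3 for the unit speaker vectors l1, l2, l3, and are then scaled so that their squares sum to one, which keeps the total power constant as in equal power panning.

```python
import math

def project_to_sphere(pos):
    """Scale a Cartesian speaker position (x, y, z) onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in pos))
    return tuple(c / norm for c in pos)

def source_direction(azimuth, elevation):
    """Turn a 2-D source position (radians) into a unit vector.
    Assumed convention: azimuth in the horizontal plane, elevation above it."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def triplet_gains(p, l1, l2, l3):
    """Solve p = g1*l1 + g2*l2 + g3*l3 by Cramer's rule.
    Returns None when the three speakers are coplanar with the origin."""
    d = det3(l1, l2, l3)
    if abs(d) < 1e-12:
        return None
    return (det3(p, l2, l3) / d, det3(l1, p, l3) / d, det3(l1, l2, p) / d)

def vbap_gains(p, speakers, triplets, eps=1e-9):
    """Return one gain per speaker; at most three are non-zero."""
    gains = [0.0] * len(speakers)
    for i, j, k in triplets:
        g = triplet_gains(p, speakers[i], speakers[j], speakers[k])
        # Keep the first triplet whose coefficients are all non-negative.
        if g is not None and min(g) >= -eps:
            g = [max(x, 0.0) for x in g]
            norm = math.sqrt(sum(x * x for x in g))  # constant-power scaling
            for index, x in zip((i, j, k), g):
                gains[index] = x / norm
            return gains
    raise ValueError("source direction is not covered by any triplet")

# Hypothetical setup: four speakers around the listener and one overhead,
# given at arbitrary distances and projected onto the unit sphere.
speakers = [project_to_sphere(s) for s in
            [(2, 0, 0), (0, 2, 0), (-2, 0, 0), (0, -2, 0), (0, 0, 3)]]
# Hand-written mesh for the sketch; a real mesh would come from a triangulation.
triplets = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

gains = vbap_gains(source_direction(math.radians(45), 0.0), speakers, triplets)
print([round(g, 3) for g in gains])  # speakers 0 and 1 share the signal equally
```

A source that lies exactly on an edge shared by two triangles satisfies the non-negativity test for both; the sketch simply takes the first match, which yields the same gains either way because the third coefficient is zero.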
The file also specifies sound sources to be spatialised (figure 7). The sound sources may be located in audio files, or come from live input. The audio files themselves may include any number of channels and be in a variety of formats, including common LPCM formats (AIFF, WAV, SND), as well as compressed formats (MP3, MPEG-4 Audio), and even more exotic formats such as SDII. The user defines how the channels of a file or live input map to sound sources. Individual channels may map to zero, one, or multiple virtual sound sources.

7.2. Device mode

An alternative way to acquire sound sources is via device mode. In device mode, Zirkonium appears as an audio device to other software. This gives any audio program, including Max/MSP, SuperCollider, and DAWs such as Logic or Digital Performer, access to Zirkonium.

Though similar results may be achieved by other means – either using two computers and connecting the output of one computer to another running Zirkonium, or by using the Jack audio router (Letz, Fober, Orlarey and Davis 2004) – the device mode offers advantages (figure 8). For one, it is usually simpler to set up. And, furthermore, device mode does not add latency.

7.3. OSC control

All the aspects of a virtual sound source – its position, spread and gain – may be controlled by an OSC interface. The OSC namespace provides for several different ways of positioning a sound source: Cartesian coordinates, spherical coordinates, or placing a sound source at the position of the nearest speaker (Table 3).

OSC control is the best option when a complex algorithm defines the source position. It is the only
away from the centre. Distance can be handled similarly.

The composer Todor Todoroff did exactly this for his piece Around and above, weightless….

8.3. Real-time control

With live control, a performer can use Zirkonium as an instrument and play a spatialisation. We have built a very simple live controller for spatialisation using the JazzMutant Lemur interface. This controller lets a user create several rings of virtual sound sources and control the height in the dome and rotational speed of each ring. Despite its simplicity, it has been used to great effect by the composers Gilles Gobeil and Robert Normandeau.

Another use for real-time control is to algorithmically determine the positions of sound sources. For Alvin Curran’s TransDADAExpress 2, Frank Halbig programmed a Max/MSP controller that dynamically positioned channels based on chaotic oscillators. The audio for the piece was played back from file, but the positions of the sound sources were determined in real time and controlled via OSC.

9. FUTURE WORK

As with any software project, Zirkonium is continually evolving, and we are always trying to incorporate better solutions for the problems we encounter.

A particularly major gap is Zirkonium’s poor integration with DAWs. This is the preferred environment for many electroacoustic composers, and better tools for controlling Zirkonium from them would be an appreciated addition.

Extending the spatialisation capabilities of Zirkonium is another present goal. First, it would be nice to support Ambisonics as well as VBAP, since the basic premises of Zirkonium are compatible with Ambisonics as well. Furthermore, we would like to add decorrelation to better create the impression of diffuse areas of sound as opposed to point sources.

10. CONCLUSION

At this point, it seems appropriate to reflect on the lessons we have learned from three years of using and revising Zirkonium, and to offer some advice to media technologists building spatialisation systems and to composers who use them.

First, we recommend, of course, checking out Zirkonium (www.zkm.de/zirkonium). It is open source and available without cost. As we have presented here, we think it does a good job of solving the problems it sets out to address. But it is not going to be the solution for everyone. All software involves design decisions and compromises, and those made by Zirkonium may not be appropriate for you. Still, we think we can extract some general advice, applicable to all spatialisation systems.

Expect to handle standard formats. Composers will continue to compose for stereo, quadraphonic, 5.1, and octophonic configurations. Research into formats for description and interchange of spatial sound is an ongoing topic (Kendall, Peters and Geier 2008), but no standard format has yet been agreed upon. So have a good solution for incorporating content in standard formats to create an appealing spatialisation with little work.

And advice for composers: compose for the format in which you expect your work will primarily be heard. If that format is four channels, then optimise for four channels. But keep your source material (ProTools session, or whatever) in a form from which you can easily produce sub-mixes of the entire piece. These sub-mixes can then be placed in space to create a convincing result.

Take advantage of the spatialisation capabilities of the format you compose for. If you are targeting a 4-channel system, then go ahead and add movement, reverberation, and so on. These effects can be leveraged when translating to a pluriphonic rendering. Movement through space offered by, for example, a sound dome or WFS system is a powerful effect, but the ability to use space itself, and place channels that would otherwise overlap in different locations, is an even more powerful effect.

REFERENCES

Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A. and Sorensen, D. 1999. LAPACK Users’ Guide. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Berkhout, A. J. 1988. A Holographic Approach to Acoustic Control. Journal of the Audio Engineering Society 36(12): 977–95.
Bern, M. and Eppstein, D. 1992. Mesh Generation and Optimal Triangulation. In Ding-Zhu Du and Frank Hwang (eds.), Computing in Euclidean Geometry. Singapore: World Scientific.
Chowning, J. M. 1971. The Simulation of Moving Sound Sources. Journal of the Audio Engineering Society 19: 2–6.
Daniel, Jérôme. 2001. Représentation de champs acoustiques, applications à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimedia. PhD thesis, Université Paris VI.
Jot, J. and Warusfel, O. 1995. Spat~: A Spatial Processor for Musicians and Sound Engineers. CIARM: International Conference on Acoustics and Musical Research. Ferrara, Italy. Available online at: http://catalogue.ircam.fr/articles/textes/Jot95a.
Kendall, G. 1995. The Decorrelation of Audio Signals and its Impact on Spatial Imagery. Computer Music Journal 19(4): 71–87.
Kendall, G., Peters, N. and Geier, M. 2008. Towards an Interchange Format for Spatial Audio Scenes. Proceedings of the 2008 International Computer Music Conference, Belfast, UK.
Küpper, L. 1984. Musik und Technik heute – Die Parameter des Klangraumes in der Musik. Ars Electronica Festival-Katalog 1984 (Band 01). Available online at http://90.146.8.18/de/archiv_files/19841/1984_016.pdf.
Letz, S., Fober, D., Orlarey, Y. and Davis, P. 2004. Jack audio server: MacOS X port and multi-processor version. Paper presented at the Sound and Music Computing ’04 Conference, Paris, France.
Pulkki, V. 1997. Virtual Sound Source Positioning Using Vector Base Amplitude Panning. Journal of the Audio Engineering Society 45(6): 456–66.
Pulkki, V. and Lokki, T. 1998. Creating Auditory Displays to Multiple Loudspeakers using VBAP: A Case Study with Diva Project. International Conference on Auditory Display. Glasgow: British Computer Society.
Ramakrishnan, C., Gossmann, J., Brümmer, L. and Sturm, B. 2006. The ZKM Klangdom. Proceedings of the 2006 International Conference on New Interfaces for Musical Expression. Paris: IRCAM, 140–3.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, MA: MIT Press.
Shewchuk, J. R. 1996. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In Ming C. Lin and Dinesh Manocha (eds.), Applied Computational Geometry: Towards Geometric Engineering. Berlin: Springer-Verlag. Lecture Notes in Computer Science (1148): 203–222.
Zvonar, R. 2005. A History of Spatial Music. eContact! 7.4. Université Concordia. Accessed on 12 February 2009 at: http://cec.concordia.ca/econtact/Multichannel/spatial_music.html.