Pro Techniques for Sound Design
Table of Contents
1. Practical Sound Design
Jean-Luc Sinclair
Principles of Game Audio and Sound Design
2. Designing a moment
Rob Bridgett
Leading with Sound
Introduction
Sound Designers are often found in theatrical, television/movie, and
corporate productions and are responsible for everything the
audience hears. This includes creating sound effects for visual
media, as well as helping to design and oversee system
installations.
This free guide is ideal for those who use sound design techniques in their careers. Using both research-based and hands-on advice, it covers a range of topics such as the spatialization of sound, 3D audio techniques and sound design for games.
The chapters featured are sourced from a selection of Routledge
books which focus on sound design and audio engineering. More
details about each chapter are noted below.
If you would like to delve deeper into any of the topics, the full
versions of the books are available to purchase from
Routledge.com.
Chapter 2 - Designing a moment
Context drives the interactive design range of the sound, and how it
needs to change over time and circumstance. This chapter from
'Leading with Sound' explores how to use the web of context to
create a narrative sound story to convey information to the
audience.
Chapter 3 - Emotion in sound design
The audience's reaction is crucial to storytelling. This chapter from 'Sound for Moving Pictures' explains how to elicit desired emotions in an audience, drawing on a wealth of research from specialists in mixing and sound design.
6 PRACTICAL SOUND DESIGN
Learning Objectives
In Chapter five we looked at the origins of sound design and some of the most commonly used techniques and processes in the trade. In this chapter we look at a few more specific examples of how to apply these techniques in the context of linear and interactive sound design. We will also introduce the concept of prototyping, which consists of building interactive sound objects, such as vehicles or crowd engines, and recreating their behavior in an interactive model built in software such as MaxMSP or Pure Data prior to integration in the game engine. The process of prototyping is extremely helpful in testing, communicating and demonstrating the intended behavior or possible behaviors of the interactive elements in a game. But first we shall take a closer look at some of the major pitfalls game sound designers run into when setting up a session for linear sound design, such as cut scenes, as well as some basics of signal flow and gain staging.
1. Signal Flow
The term signal flow refers to the order in which the audio signal encounters, or flows through, the various elements in a mixer or external processors, from the input – usually the hard drive during playback, or a microphone or line input when recording – to the digital-to-analog converters (DACs) and out to the speakers.
In this chapter we will use Avid's Pro Tools as our DAW. The concepts discussed here, however, will apply easily to other software, especially as most
DAW mixers tend to mimic the behavior and setup of classic analog mixers.
Let’s take a look at how the signal flows, from input to output, in a tra-
ditional DAW and how understanding this process will make us better audio
engineers and therefore sound designers.
The following chart will help us understand this process in more detail:
a. Input
In most mixers the very first stage is the input. The input varies depending on whether we are in recording mode, in which case it will usually be a microphone or line input, or in playback mode, in which case it will be the audio clip or clips in the currently active playlist.
b. Inserts
The next stage your signal is going to run into is the insert section. This is where you can add effects to your audio, such as equalization,
compression and whatever else may be available. Inserts are often referred
to as an access point, allowing you to add one or multiple processors in your
signal path. In most DAWs, the signal goes from the first insert to the last from
top to bottom.
c. Pre-Fader Send
After the inserts, a pre-fader send is the next option for your signal. This is
where you will send a copy of your audio to another section of your mixer,
using a bus. A bus is a path that allows you to move one or multiple signals to
a single destination on another section of the mixer. Sending out a signal at this point of the channel strip means the amount sent is independent of the main fader; changes in volume across the track set by the main fader will not affect the amount of audio going out on the pre-fader send. The
amount of signal sent is only dependent on the level of the send and, of course,
the level of the signal after the insert section.
If you were to send vocals to a reverb processor at this stage, fading out the
vocals would not affect the level of the reverb, and you would eventually end
up with reverberation only after fading out the vocals.
d. Volume Fader
The next stage is the volume fader, which controls the overall level of the
channel strip or audio track. When the volume fader is set to a value of 0dB,
known as unity, no gain is applied to the overall track, and all the audio is play-
ing at the post insert audio level. Raising or lowering the fader by any amount
will change the current gain value by as much.
Often it is here that you will find panning, to place the audio output in
stereo or surround space, depending on the format you are working with.
Next to the volume fader, you will usually find a level meter. Please check with
your DAW’s manual to find out exactly how the meter is measuring the level
(Peak, RMS, LUFS etc.). Some DAWs will allow you to change the method of metering. Regardless of the method employed, you have the option to monitor signals pre-fader or post-fader. By default, most mixers will have their meters set to post-fader mode, which means the meter will display the level after the volume fader and will therefore be affected by it. When monitoring pre-fader,
the meter will display the level of the signal right after the last insert, giving
you an accurate sense of the level at this stage. It’s probably a good idea to
at least occasionally monitor your signals pre-fader, so you can be sure your
signal is clean coming out of the insert section.
Please refer to your DAW’s documentation to find out how to monitor pre
or post-fader.
f. Post-Fader Send
Next we find the post-fader send. The level sent to the bus will be impacted
by any changes in the level of the volume fader. This is the most commonly
used type of send. In this case, if you are sending vocals to a reverb processor,
fading out the vocals will also fade out the level of the reverb.
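To make the difference concrete, here is a minimal sketch of the gain math involved; the names, linear-gain model and example values are assumptions for illustration and do not come from any particular DAW.

```csharp
using System;

// Minimal sketch of a channel strip's gain math, showing why a pre-fader send
// ignores the volume fader while a post-fader send scales with it.
public static class ChannelStripExample
{
    static double DbToLinear(double db) => Math.Pow(10.0, db / 20.0);

    public static void Main()
    {
        double postInsertLevel = 0.5; // signal level coming out of the insert section
        double faderDb = -12.0;       // main volume fader (0 dB would be unity)
        double sendDb = -6.0;         // send knob setting

        double preFaderSend = postInsertLevel * DbToLinear(sendDb);
        double postFaderSend = postInsertLevel * DbToLinear(faderDb) * DbToLinear(sendDb);
        double channelOutput = postInsertLevel * DbToLinear(faderDb);

        Console.WriteLine($"Pre-fader send:  {preFaderSend:F3}");  // unaffected by the fader
        Console.WriteLine($"Post-fader send: {postFaderSend:F3}"); // follows the fader
        Console.WriteLine($"Channel output:  {channelOutput:F3}");
    }
}
```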
g. Output
Last, we find the output, which determines where the signal is routed to next,
by default usually the master bus, where all the audio is summed. Often the
output of an audio track should be routed to a submix, where multiple audio
tracks that can or should be processed in the same way are mixed together,
such as all the ambience tracks in a session or the dialog, music etc.
It's probably a good rule of thumb to make sure that no track is routed directly to the master fader but rather to a subgroup or submix. Routing individual tracks directly to the master will make your mix messy and difficult to manage.
You may have already noticed that DAWs often do not display the information on a channel strip in their mixer in the order in which the signal flows from top to bottom. If you are unaware of this, it is easy to make mistakes that get in the way of the task at hand.
Frame rates for video are usually lower than the ones we work with in gaming. Frame rates ranging from 24 to 30 frames per second are common in video, film and broadcast. Find out what the frame rate is of the video you are
working with, and make sure to set your DAW’s timeline to be displayed in
Timecode format, rather than bars and beats.
Figure 6.2
Timecode is a way to make sure that each and every frame in a piece of video has a unique address that can be easily recalled. It is expressed in the following format:
HH:MM:SS:FF.
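Since a timecode address is simply a frame count divided up by the frame rate, a small helper can make the format concrete. The sketch below is purely illustrative (a hypothetical function, assuming a non-drop-frame rate such as 24, 25 or 30 fps).

```csharp
using System;

public static class TimecodeExample
{
    // Converts an absolute frame count to HH:MM:SS:FF (non-drop-frame).
    public static string FramesToTimecode(long totalFrames, int fps)
    {
        long frames  = totalFrames % fps;
        long seconds = (totalFrames / fps) % 60;
        long minutes = (totalFrames / (fps * 60L)) % 60;
        long hours   = totalFrames / (fps * 3600L);
        return $"{hours:D2}:{minutes:D2}:{seconds:D2}:{frames:D2}";
    }

    public static void Main()
    {
        Console.WriteLine(FramesToTimecode(90125, 24)); // prints "01:02:35:05"
    }
}
```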
Figure 6.3
The clipping may not be obvious, especially to tired ears and when mixed in with other audio signals, but it can lead to harsh-sounding mixes and make your task much more difficult.
A better solution than simply pulling down the volume fader (which sits after the inserts and therefore does nothing to prevent clipping inside them) is to turn the gain down at the first insert by adding a trim plugin, reducing the level before it hits the first plugin and preventing any clipping from occurring in the first place.
The term dynamic range in the context of a mixing session or a piece of equipment refers to the difference – or ratio – between the loudest and the softest sound or signal that can be accurately processed by the system. In digital audio,
the loud portion of the range refers to the point past which clipping occurs, intro-
ducing distortion by shaving off the top of the signal. The top of the dynamic
range in the digital audio domain is set to 0dBFS, where FS stands for full scale.
Figure 6.4 shows the same audio file twice; the one on the right shows the characteristic flat top of a clipped file, whose fidelity will be severely affected.
Figure 6.4
In the digital audio world, the bottom of the dynamic range depends on the
number of bits the session or processor is running at. A rule of thumb is that
1 bit = 6dB of dynamic range. Keep in mind this is an approximation, but it
is a workable one. A session at 24 bits will therefore offer a dynamic range
of 144dB, from 0 to −144dBFS. This, theoretically, represents a considerable
improvement over previous high-end large format analog mixing consoles.
Any signal below that level will simply blend into the background noise and
probably will sound quite noisy as it approaches that level.
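The 1 bit = 6dB rule of thumb is easy to verify: the exact figure is 20·log10(2^bits), or roughly 6.02dB per bit. The snippet below is a simple illustration of that calculation.

```csharp
using System;

public static class DynamicRangeExample
{
    // Theoretical dynamic range of a fixed-point system with the given bit depth.
    public static double DynamicRangeDb(int bits) => 20.0 * Math.Log10(Math.Pow(2.0, bits));

    public static void Main()
    {
        Console.WriteLine($"16-bit: {DynamicRangeDb(16):F1} dB"); // ~96.3 dB
        Console.WriteLine($"24-bit: {DynamicRangeDb(24):F1} dB"); // ~144.5 dB
    }
}
```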
Figure 6.5
Delivery of stems is quite common and often expected when working with lin-
ear media. Stems are submixes of the audio by category such as music, dialog
and sound effects. Stems make it convenient to make changes to the mix, such
as replacing the dialog, without needing to revisit the entire mix. Having a
separate music bounce also allows for more flexible and creative editing while
working on the whole mix to picture.
It also makes sense to structure our overall mix in terms of music, effects
and dialog busses for ease of overall mixing. Rather than trying to mix all
tracks at once, the mix ultimately comes down to a balance between the three
submixes, allowing us to quickly change the relative balance between the
major components of the mix.
Effect loops are set up by using a pre or post-fader send to send a portion of
the signal to a processor, such as reverb, in order to obtain both a dry and
wet version of our signals in the mixer, allowing for maximum flexibility. The
effect we are routing the signal to usually sits on an auxiliary input track.
Figure 6.6
Additionally, when it comes to effects such as reverbs and delays, which are meant to be applied to multiple tracks, it usually makes more sense to use effect loops and sends rather than inserting a new reverb plugin directly on every track that requires one. The point of reverberation when working with sound replacement is often to give us a sense of the space the scene takes place in, which means that most sound effects and dialog tracks will require some reverberation at some point. All our sounds, often coming from completely different
contexts, will also sound more cohesive and convincing when going through the
same reverb or reverbs. Furthermore, applying individual plugins to each track
requiring reverb is wasteful in terms of CPU resources and makes it very difficult
to make changes, such as a change of space from indoors to outdoors, as they
must be replicated over multiple instances of the plugins. This process is also time
consuming and difficult to manage as your mix grows in complexity.
As a rule, always set up separate aux send effect loops for reverberation
processors and delays used for modeling the environment. In addition to the
benefits mentioned earlier, this will also allow you to process the effects inde-
pendently from the original dry signal. The use of equalization or effects such
as chorus can be quite effective in enhancing the sound of a given reverb. Like all rules, though, this one can be broken, but only if there is a reason for it.
Figure 6.7
In this configuration, no audio from the mix is routed directly to the master fader. Rather, there is an additional mixing stage, a master submix, where all the audio from our mix is routed. The sub master is then sent to the master output (sub master -> master output). This gives us an additional mix stage, the sub master, where all premastering and/or mastering processing can be applied, while the master output of the mix is used only as a monitoring stage for audio levels, spatial image and spectral balance.
Since all premastering or mastering is done at the master sub mix, our master
outputs will be ‘clean’. Should we wish to use a reference track, this configura-
tion means that we can route our reference track directly to the master out and
compare it to the mix without running the reference through any of the master-
ing plugins as well as easily adjust the levels between our mix and the reference.
The next stage from the top is where we find the submixes by category or group for music, dialog and sound effects, as well as the effect loops for reverb and other global effects. All the audio or MIDI tracks in the session are summed to one of these; no tracks go out directly to the master or sub master output. Each of the groups will likely in turn contain a few submixes depending on the needs and complexity of the mix. Sound effects are often the most complex of the groups and often contain several submixes, as illustrated in the diagram.
Figure 6.8
The screenshot shows an example of a similar mix structure for stereo out-
put realized in Avid’s Pro Tools, although this configuration is useful regard-
less of the DAW you are working with. The submixes are located on the left
side of the screen, to the left of the master fader, and the main groups for
music, dialog and sound effects are located on the right side.
• On each of the audio tracks routed to the groups a trim plugin would
be added at the first insert, in order to provide the sound designer with
an initial gain stage and prevent clipping.
• Each audio track is ultimately routed to a music, dialog or sound effect
submix, but some, especially sound effects, are routed to subgroups,
such as ambience, gunshots and vehicles that then get routed to the
sound effect submix.
• Three effect loops were added for various reverberation plugins or
effects.
f. Further Enhancements
We can further enhance our mix by adding features and effects that give us yet more control.
Group Sidechaining
A compressor on the sound effects group can be sidechained to the dialog group, automatically lowering the level of the sound effects when dialog is present and letting them return to full level where there is no dialog present. This type of group sidechaining is most common in game engines but is also used in linear mixing.
Monitoring
While the meters in the mixer section of your DAW give you some sense of the
levels of your track, it is helpful to set up additional monitoring for frequency
content of the mix, stereo image (if applicable) and a good LUFS meter to have
an accurate sense of the actual loudness of your mix.
At this point, we are ready to mix. Additional steps may be required, based
on the session and delivery requirements, of course.
1. Guns
Guns are a staple of sound design in entertainment, and in order to stay
interesting from game to game they demand constant innovation in terms
of sound design. In fact, the perceived impact and power of a weapon very
much depends on the sound associated with it. The following is meant as an introduction to the topic of gun sound design, as well as an insight into how guns are implemented in games. There are lots of great resources on the topic, and the reader is encouraged to investigate it further.
There are many types of guns used in games, but one of the main differences
is whether the weapon is a single shot or an automatic weapon.
Most handguns are single shot, or one shot, meaning that for every shot fired the user needs to pull the trigger; holding down the trigger will not fire additional rounds.
Assault rifles and other compact and subcompact weapons are sometimes automatic, meaning the weapon will continue to fire as long as the player is holding the trigger or until the weapon runs out of ammunition.
The difference between one shot and automatic weapons affects the way
we design sounds and implement the weapon in the game. With a one-shot weapon it is possible to design each sound as a single audio asset, including both the initial impulse (the detonation when the user pulls the trigger) and the tail, the long decaying portion of the sound.
Figure 6.9
In the case of an automatic weapon, the sound designer may design the
weapon in two parts: a looping sound to be played as long as the player is
holding onto the trigger and a separate tail sound to be played as soon as the
player lets go of the trigger, to model the sound of the weapon decaying as the
player stops firing. This will sound more realistic and less abrupt. Additional
sounds may be designed and triggered on top of the loop, such as the sound
of the shell casings being ejected from the rifle.
Figure 6.10
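A minimal sketch of this two-part behavior in Unity might look like the following; the field names and the use of the default 'Fire1' button are assumptions for illustration and are not taken from any particular project.

```csharp
using UnityEngine;

// Illustrative two-part automatic weapon: a firing loop plays while the trigger
// is held, and a separate tail sample plays the moment it is released.
public class AutoWeaponAudio : MonoBehaviour
{
    public AudioSource loopSource; // looping fire sound (set Loop = true on the source)
    public AudioSource tailSource; // one-shot tail/decay sound

    void Update()
    {
        if (Input.GetButtonDown("Fire1"))
        {
            tailSource.Stop();
            loopSource.Play();   // start the firing loop
        }
        else if (Input.GetButtonUp("Fire1"))
        {
            loopSource.Stop();   // stop the loop as soon as the trigger is released
            tailSource.Play();   // and let the tail ring out
        }
    }
}
```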
b. General Considerations
Overall, regardless of the type of weapon you are designing and implementing, keep these few aspects in mind:
• Sound is the best way to give the player a sense of the power and capabilities of the weapon they're firing. Short of haptic feedback, it remains the most direct way to convey the impact and energy of the weapon, so it plays an especially critical role when it comes to weapons.
• Guns are meant to be scary and need to be loud. Very loud. Perhaps louder than you've been comfortable designing sounds so far, if this is a new area for you. A good loudness maximizer/mastering limiter is a must, as is a transient shaper plugin, in order to make the weapon both loud and impactful.
• Guns have mechanical components; from the sound of the gun being han-
dled to the sound of the firing pin striking the round in the chamber to that
of the bullet casings being ejected after each shot (if appropriate), these
elements will make the weapon sound more compelling and give you as a
sound designer the opportunity to make each gun slightly different.
• As always, do not get hung up on making gun sounds realistic, even if you are sound designing for a real-life weapon. A lot of sound designers won't use actual recordings of handguns, or guns at all, when sound designing one.
• The sound of a gun is highly dependent on its environment, especially the tail end of it. If a weapon is to be fired in multiple environments, you might want to design the initial firing sound and the environmental layer separately, so you can swap in the appropriate sound for a given environment. Some sound designers will take this two-step approach even for linear applications. That environmental layer may be played on top of the gunshot itself or baked in with the tail portion of the sound.
Figure 6.11
c. Designing a Gunshot
One approach when sound designing a gun is to break the sound down into several layers. A layered approach makes it easy to experiment with various samples for each layer and to process the different aspects of the sound individually for best results.
Three separate layers are a good place to start:
• Layer 1: the detonation, or the main layer. In order to give your guns
maximum impact, you will want to make sure this sample has a nice
transient component to it. This is the main layer of the sound, which
we are going to augment with the other two.
• Layer 2: a top end, metallic/mechanical layer. This layer will increase
realism and add to the overall appeal of the weapon. You can use this
layer to give your guns more personality.
• Layer 3: a sub layer, to add bottom end and make the sound more
impactful. A subharmonic generator plugin might be helpful. This
layer will give your sound weight.
When selecting samples for each layer, prior to processing, do not limit
yourself to the sounds that are based in reality. For instance, when looking
for a sound for the detonation or the main layer, go bigger. For a handgun,
try a larger rifle or shotgun recording; they often sound more exciting than
handguns. Actual explosions, perhaps smaller ones for handguns, may be
appropriate too.
Figure 6.12
As always, pick your samples wisely. A lot of sound effects libraries out there are filled with gun sounds that are not always of the best quality, may not be the right perspective (recorded from a distance) or already have a lot of reverberation baked in. You'll usually be looking for a sample that is as dry as possible, something that ideally already sounds impressive and scary. Look for something with a healthy transient; you might want to use a transient shaping plugin to bring the attack out further.
When a shot is fired through a gun, some of the energy is transferred into
the body of the gun and in essence turns the gun itself into a resonator. This
is partially responsible for the perceived mechanical or metallic aspect of the
sound. In addition, some guns will eject the casing of the bullet after every shot, and the case being ejected and hitting the floor obviously adds its own sound. The mechanical layer gives you a lot of opportunity for customization. When sound designing a lot of guns for a game, inevitably they will
tend to sound somewhat similar. This layer is a good place to try to add some
personality to each gun. Generally speaking, you will be looking for a bright
sound layer that will cut through the detonation and the bottom end layers. It
should help give your gun a fuller sound by filling up the higher frequencies
that the detonation and the sub may not reach. It also adds a transient to your
gun sound, which will make it sound all the more realistic and impactful.
The purpose of the sub layer is to give our sounds more weight and impact and
give the player a sense of power, difficult to achieve otherwise, except perhaps
via haptic feedback systems. Even then, sound remains a crucial aspect of
making the player ‘feel’ like their weapon is as powerful as the graphics imply.
A sub layer can be created in any number of ways, all worth experimenting
with.
It can be created with a synthesizer, by modifying an existing bass preset or creating a new one and applying a subharmonic generator to it for yet more depth and weight. Another option is to start from an actual recording, perhaps
an explosion or detonation, low pass filtering it and processing it with a sub-
harmonic generator to give it more weight still. A third option would be to use
a ready-made sub layer, readily found in lots of commercial sound libraries.
Avoid using a simple sine wave for this layer. It may achieve the desired effect
on nice studio monitors but might get completely lost on smaller speakers,
while a more complex waveform, closer to a triangle wave, will translate much
better, even on smaller speakers.
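As a rough illustration of that last point, the sketch below generates a short, triangle-based 50Hz sub layer with a simple decay envelope as a Unity AudioClip. The frequency, length and decay values are arbitrary starting points to experiment with, not prescriptions from the text.

```csharp
using UnityEngine;

// Illustrative sub layer generator: a 50 Hz triangle-like wave with a simple
// exponential decay, returned as a mono AudioClip.
public static class SubLayerExample
{
    public static AudioClip CreateSubLayer(float frequency = 50f, float seconds = 1.2f, int sampleRate = 48000)
    {
        int length = Mathf.RoundToInt(seconds * sampleRate);
        var data = new float[length];

        for (int i = 0; i < length; i++)
        {
            float t = i / (float)sampleRate;
            float phase = (t * frequency) % 1f;
            float triangle = 4f * Mathf.Abs(phase - 0.5f) - 1f; // richer in harmonics than a sine
            float envelope = Mathf.Exp(-4f * t);                // simple decay
            data[i] = triangle * envelope * 0.8f;
        }

        var clip = AudioClip.Create("SubLayer", length, 1, sampleRate, false);
        clip.SetData(data, 0);
        return clip;
    }
}
```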
Guns and explosions are impossible to abstract from the environment they
occur in. Indeed, the same weapon will sound quite different indoors and
outdoors, and since in games it is often possible to fire the same gun in
several environments, game sound designers sometimes resort to design-
ing the tail end of the gun separately so that the game engine may con-
catenate them together based on the environment they are played into. In
some cases, sound designers will also add an environment layer to the gun
sounds simply because the reverb available in the game may not be quite
sophisticated enough to recreate the depth of the sound a detonation will
create when interacting with the environment. This environment layer is
usually created by running the sound of the gun through a high-end rever-
beration plugin.
The environment layer may be baked into the sound of the gun – that is,
bounced as a single file out of the DAW you are working with – or triggered
separately by the game engine, on top of the gun sound. This latter approach
allows for a more flexible weapon sound, one that can adapt to various
environments.
Once you have selected the sounds for each layer, you are close to being done,
but there still remain a few points to take into consideration.
Start by adjusting the relative mix of each layer to get the desired effect.
If you are unsure how to proceed, start by listening to some of your favorite
guns and weapon sounds from games and movies. Consider importing one or more into the session you are currently working on as a reference. (Note: make
sure you are not routing your reference sound to any channels that you may
have added processors to.) Listen, make adjustments and check against your
reference. Repeat as needed.
Since guns are extremely loud, don’t be shy, and use loudness maximizers
and possibly even gain to clip the waveform or a layer in it. The real danger
here is to destroy transients in your sound, which may ultimately play against
you. There is no rule here, but use your ears to strike a compromise that is
satisfactory. This is where a reference sound is useful, as it can be tricky to
strike the proper balance.
In order to blend the layers together, some additional processing may
be a good idea. Compression, limiting, equalization and reverberation
should be considered in order to get your gun sound to be cohesive and
impactful.
Player Feedback
It is possible to provide the player with subtle hints to let them know how
much ammunition they have left via sound cues rather than by having to
look at the screen to find out. This is usually done by increasing the volume
of the mechanical layer slightly as the ammunition is running out. The idea is
to make the gun sound slightly hollower as the player empties the magazine.
This approach does mean that you will need to render the mechanical layer
separately from the other two and control its volume via script. While this
requires a bit more work, it can increase the sense of immersion and real-
ism as well as establish a deeper connection between the player and their
weapon.
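A hypothetical sketch of that idea in Unity might look like the following; the magazine size, boost range and base level are invented values for illustration only.

```csharp
using UnityEngine;

// Illustrative ammo feedback: the separately rendered mechanical layer is played
// slightly louder as the magazine empties.
public class GunAmmoFeedback : MonoBehaviour
{
    public AudioSource mechanicalLayer;
    public int magazineSize = 30;
    public float baseVolume = 0.5f;
    public float maxBoostDb = 4f;

    public void PlayShot(int roundsLeft)
    {
        // 0 when the magazine is full, 1 when it is empty.
        float emptiness = 1f - Mathf.Clamp01(roundsLeft / (float)magazineSize);
        float boostDb = emptiness * maxBoostDb;
        mechanicalLayer.volume = baseVolume * Mathf.Pow(10f, boostDb / 20f);
        mechanicalLayer.Play();
    }
}
```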
2. Prototyping Vehicles
When approaching the sound design for a vehicle or interactive element, it is
first important to understand the range of actions and potential requirements
for sounds as well as limitations prior to starting the process.
The implementation may not be up to you, so you will need to know and
perhaps suggest what features are available to you. You will likely need the
ability to pitch shift up and down various engine loops and crossfade between
different loops for each rpm. Consider the following as well: will the model
support tire sounds? Are the tire sounds surface dependent? Will you need
to provide skidding samples? What type of collision sounds do you need to
provide? The answers to these questions and more lie in the complexity of the
model you are dealing with.
a. Specifications
A common starting point for cars is to assume a two-gear vehicle, with a low and a high gear. For each gear we will create an acceleration and a deceleration loop, which the engine will crossfade between based on the user's actions. This is a basic configuration that can easily be expanded upon by adding more RPM samples and therefore a more complex gear mechanism.
The loops we create should be seamless, therefore steady in pitch and
without any modulation applied. We will use input from the game engine
to animate them, to create a sense of increased intensity as we speed up by
pitching the sound up or decreased intensity as we slow down by pitching the
sound down. As the user starts the car and accelerates, we will raise the pitch
and volume of our engine sample for low RPM and eventually crossfade into
the high RPM engine loop, which will also increase in pitch and volume until
we reach the maximum speed. When the user slows down, we will switch to
the deceleration samples.
Figure 6.13
Let's start by creating the audio loops, which we can test using the basic car model provided in the Unity Standard Assets package, also included in the Unity level accompanying this chapter.
Once you have gathered enough sounds to work with, it's time to import and process them in order to create the four loops we need.
There are no rules here, but there are definitely a few things to watch out for:
• The sample needs to loop seamlessly, so make sure that there are no obvi-
ous variations in pitch and amplitude that could make it sound like a loop.
• Do not export your sounds with micro fades.
Use all the techniques at your disposal to create the best possible sound, but, of
course, make sure that whatever you create is in line with both the aesthetics
of the vehicle and the game in general.
Here are a few suggestions for processing:
• Layer and mix: do not be afraid to layer sounds in order to create the
right loop.
• Distortion (experiment with various types of distortion) can be applied
to increase the perceived intensity of the loop. Distortion can be
applied or ‘printed’ as a process in the session, or it can be applied in
real time in the game engine and controlled by a game parameter, such
as RPM or user input.
• Pitch shifting is often a good way to turn something small into some-
thing big and vice versa or into something entirely different.
• Comb filtering is a process that often naturally occurs in a combustion
engine; a comb filter tuned to the right frequency might make your
sound more natural and interesting sounding.
Once you have created the assets and checked that their length is correct, that they loop without issue and that they sound interesting, it's time for the next step: hearing them in context, something you can only truly do by prototyping.
d. Building a Prototype
No matter how good your DAW is, it probably won’t be able to help you with
the next step, making sure that, in the context of the game, as the user speeds
up and slows down, your sounds truly come to life and enhance the experi-
ence significantly.
The next step is to load the samples in your prototype. The tools you use
for prototyping may vary, from a MaxMSP patch to a fully functioning object
in the game engine. The important thing here is not only to find out whether the sounds you created in the previous step work well when 'put to picture', but also to find out the best ranges for the parameters the game engine will control. In the case of the car, the main parameters to adjust are pitch shift,
volume and crossfades between samples. In other words, tuning your model. If
the pitch shift applied to the loops is too great, it may make the sound feel too
synthetic, perhaps even comical. If the range is too small, the model might not
be as compelling as it otherwise could be and lose a lot of its impact.
We will rely on the car model that comes with the Unity Standard Assets package, downloadable from the asset store. It is also included in the Unity level for this chapter. Open the Unity project PGASD_CH06 and open the
scene labelled ‘vehicle’. Once the scene is open, in the hierarchy, locate and
click on the Car prefab. At the bottom of the inspector for the car you will
find the Car Audio script.
Figure 6.14
The script reveals four slots for audio clips, as well as some adjustable param-
eters, mostly dealing with pitch control. The script will also allow us to work
with a single clip for all the engine sounds or with four audio clips, which is
the method we will use. You can switch between both methods by clicking on
the Engine Sound Style tab. You will also find the script that controls the audio
for the model, and although you are encouraged to look through it, it may
make more sense to revisit the script after going through Chapters seven and
eight if you haven’t worked with scripting and C# in Unity. This script will
crossfade between a low and high intensity loop for acceleration and decel-
eration and perform pitch shifting and volume adjustments in response to the
user input. For the purposes of this exercise, it is not necessary to understand
how the script functions as long as four appropriate audio loops have been
created. Each loop audio clip, four in total, is then assigned to a separate audio
source. It would not be possible for Unity to swap samples as needed using
a single audio source and maintain seamless playback. A short interruption
would be heard as the clips get swapped.
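For readers who want to see the general shape of such a script without opening the Standard Assets code, here is a simplified, hypothetical sketch of the four-clip approach: four looping audio sources crossfaded and pitch shifted from a normalized engine value. It is not the actual Car Audio script, just an illustration of the same idea.

```csharp
using UnityEngine;

// Simplified illustration of the four-clip idea: low/high acceleration and
// low/high deceleration loops, crossfaded and pitch shifted from a normalized
// engine value fed by the car controller.
public class SimpleCarEngineAudio : MonoBehaviour
{
    public AudioSource lowAccel, highAccel, lowDecel, highDecel;
    [Range(0f, 1f)] public float revs;   // 0 = idle, 1 = maximum RPM
    public bool accelerating = true;     // true while the player holds the throttle
    public float minPitch = 0.7f, maxPitch = 1.6f;

    void Update()
    {
        float pitch = Mathf.Lerp(minPitch, maxPitch, revs);
        float highBlend = Mathf.SmoothStep(0f, 1f, revs); // crossfade low -> high with RPM
        float accelBlend = accelerating ? 1f : 0f;        // crossfade accel <-> decel with input

        SetSource(lowAccel,  pitch, (1f - highBlend) * accelBlend);
        SetSource(highAccel, pitch, highBlend * accelBlend);
        SetSource(lowDecel,  pitch, (1f - highBlend) * (1f - accelBlend));
        SetSource(highDecel, pitch, highBlend * (1f - accelBlend));
    }

    static void SetSource(AudioSource source, float pitch, float volume)
    {
        source.pitch = pitch;
        source.volume = volume;
        if (!source.isPlaying) source.Play(); // all four loops keep running; only their levels change
    }
}
```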
Next, import your sounds into the Unity project for each engine loop, load them into the appropriate slots in the car audio script and start the scene. You should be able to control the movement of the car using the WASD keys. Listen to the way your sounds play off each other. After driving the vehicle for some time and getting a feel for it, ask yourself a few basic questions:
• Does my sound design work for this? Is it believable and does it make
the vehicle more exciting to drive?
• Do the loops work well together? Are the individual loops seamless?
Do the transitions from one sample to another work well and convey
the proper level of intensity? Try to make sure you can identify when
and how the samples transition from one another when the car is
driving.
• Are any adjustments needed? Are the loops working well as they are,
or could you improve them by going back to your DAW and exporting
new versions? Are the parameter settings for pitch or any other avail-
able ones at their optimum? The job of a game audio designer includes
understanding how each object we are designing sound for behaves,
and adjusting available parameters properly can make or break our
model.
In all likelihood, you will need to experiment in order to get the best results. Even if your loops sound good at first, try experimenting with the various settings available to you. Try using different loops, from realistic ones based on recordings of existing vehicles to completely made-up ones, using other vehicle sounds and any other interesting sounds at your disposal. You will be surprised at how different a car can feel when different sounds are used for its engine.
Other sounds may be required in order to make this a fully interactive and believable vehicle, such as tire and surface sounds, skids and collisions.
There is obviously a lot more to explore here and to experiment with. This car
model does not include options to implement a lot of the sounds mentioned
earlier, but that could be easily changed with a little scripting knowledge.
Even so, adding features may not be an option based on other factors such
as RAM, performance, budget or deadlines. Our job is, as much as possible,
to do our best with what we are handed, and sometimes plead for a feature
we see as important to making the model come to life. If you know how to prototype regardless of the environment, you will already have a working model to demonstrate, which makes your case for implementing new features far more convincing to the programming team or the producer.
3. Creature Sounds
Creatures in games are often AI characters that can sometimes exhibit a wide
range of emotions, which sound plays a central role in effectively communi-
cating. As always, prior to beginning the sound design process, try to under-
stand the character or creature you are working on. Start with the basics: is it
endearing, cute, neutral, good, scary etc.? Then consider what its emotional
span is. Some creatures can be more complex than others, but all will usually
have a few basic emotions and built in behaviors, from simply roaming around
to attacking, getting hurt or dying. Getting a sense for the creature should be
the first thing on your list.
Once you have established the basic role of the creature in the narrative,
consider its physical characteristics: is it big, small, reptilian, feline? The
appearance and its ‘lineage’ are great places to start in terms of the sonic
characteristics you will want to bring out. Based on its appearance, you can
determine if it should roar, hiss, bark, vocalize, a combination of these or
more. From these characteristics, you can get a sense for the creature’s main
voice or primary sounds, the sounds that will clearly focus the player’s atten-
tion and become the trademark of this character. If the creature is a zombie,
the primary sounds will likely be moans or vocalizations.
Realism and believability come from attention to detail; while the main
voice of the creature is important, so are all the peripheral sounds that will
help make the creature truly come to life. These are the secondary sounds:
breaths, movement sounds coming from a creature with a thick leathery skin,
gulps, moans and more will help the user gain a much better idea of the type of
creature they are dealing with, not to mention that this added information
will also help consolidate the feeling of immersion felt by the player. In the
case of a zombie, secondary sounds would be breaths, lips smacks, bones
cracking or breaking etc. It is, however, extremely important that these
peripheral or secondary sounds be clearly understood as such and do not get
in the way of the primary sounds, such as vocalizations or roars for instance.
This could confuse the player and make the creature and its intentions hard to decipher. Make sure that they are mixed at a lower volume than the primary sounds.
Remember that all sound design should be clearly understood or leg-
ible. If it is felt that a secondary sound conflicts with one of the primary
sound effects, you should consider adjusting the mix further or removing it
altogether.
b. Emotional Span
Once you have mapped out the creature's emotional span, make sure that the sounds you create all translate these emotions clearly and give us a wide range of sonic transformations while at the same time clearly appearing to be emanating from the same creature.
The study or observation of how animals express their emotions in the real
world is also quite useful. Cats and dogs can be quite expressive, making it
clear when they are happy by purring or when they are angry by hissing and
growling in a low register, possibly barking etc. Look beyond domestic ani-
mals and always try to learn more.
Creature sound design tends to be approached in one of several ways: by processing and layering human voice recordings, by using animal sounds, by working from entirely unrelated but sonically interesting material or by any combination of these.
Your voice talent may sound fabulous and deliver excellent raw material, but it is unlikely that they will be able to sound like a 50-meter-tall creature or a ten-centimeter fairy. This is where pitch shifting can be extremely helpful.
Pitch shifting was detailed in the previous chapters, but there are a few fea-
tures that are going to be especially helpful in the context of creature sound
design.
Since pitch is a good way to gauge the size of a character, it goes without saying that raising the pitch will make the creature feel smaller, while lowering it will inevitably increase its perceived size.
The amount of pitch shift to be applied is usually specified in cents and
semitones.
Note: there are 12 semitones in an octave and 100 cents in a semitone.
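For reference, a transposition expressed in semitones and cents corresponds to a frequency (or playback-speed) ratio of 2^((semitones + cents/100)/12). The small helper below is purely illustrative.

```csharp
using System;

public static class PitchShiftMath
{
    // Frequency (or playback-speed) ratio for a shift in semitones and cents.
    public static double Ratio(double semitones, double cents = 0)
        => Math.Pow(2.0, (semitones + cents / 100.0) / 12.0);

    public static void Main()
    {
        Console.WriteLine(Ratio(-12));   // 0.5 -> one octave down
        Console.WriteLine(Ratio(3, 50)); // ~1.224 -> three and a half semitones up
    }
}
```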
The amount by which to transpose the vocal recording is going to be a
product of size and experimentation, yet an often-overlooked feature is the
formant shift parameter. Not all pitch shifting plugins have one, but it is rec-
ommended to invest in a plugin that does.
Formants are peaks of spectral energy that result from resonances usually
created by the physical object that created the sound in the first place. More
specifically, when it comes to speech, they are a product of the vocal tract and
other physical characteristics of the performer. The frequency of these for-
mants therefore does not change very much, even across the range of a singer,
although they are not entirely static in the human voice.
Table 6.1 lists approximate formant frequencies in Hz for several vowel sounds (e.g. 'E', 'A', 'Oh', 'Ooh'). These values are meant as starting points only, and the reader is encouraged to research more detailed information online.
When applying pitch shifting techniques that transpose the signal and
ignore formants, these resonant frequencies also get shifted, implying a
smaller and smaller creature as they get shifted upwards. This is the clas-
sic ‘chipmunk’ effect. Having individual control over the formants and the
amount of the pitch shift can be extremely useful. Lowering the formants
without changing the pitch can make a sound appear to be coming from
a larger source or creature, and vice versa. Having independent control of
the pitch and formant gives us the ability to create interesting and unusual
hybrid sounds.
Distortion is a great way to add intensity to a sound. The amount and type of
distortion should be decided based on experience and experimentation, but
when it comes to creature design, distortion can translate into ferocity. Distor-
tion can either be applied to an individual layer of the overall sound or to a
submix of sounds to help blend or fuse the sounds into one while making the
overall mix slightly more aggressive. Of course, if the desired result is to use
distortion to help fuse sounds together and add mild harmonics to our sound,
a small amount of distortion should be applied.
Watch out for the overall spectral balance upon applying distortion, as
some algorithms tend to take away high frequencies and as a result the overall
effect can sound a bit lo-fi. If so, try to adjust the high frequency content by
boosting high frequencies using an equalizer or aural exciter.
Note: as with many processes, you might get more natural-sounding results
by applying distortion in stages rather than all at once. For large amounts, try
splitting the process across two separate plugins in series, each carrying half of the load.
As with any application, a good equalizer will provide you with the ability to fix any tonal issues with the sound or sounds you are working with: adding bottom end to a growl to make it feel heavier and bigger, for instance, or simply bringing up the high frequency content after a distortion stage.
Another less obvious application of equalization is the ability to add
formants into a signal that may not contain any or add more formants to a
signal that already does. By adding formants found in the human voice to non-human creature sounds, we can achieve interesting hybrid results.
Since a formant is a buildup of acoustical energy at a specific frequency, it
is possible to add formants to a sound by creating very narrow and powerful
boosts at the right frequency. This technique was mentioned in Chapter five as
a way to add resonances to a sound and therefore make it appear like it takes
place in a closed environment.
In order to create convincing formants, drastic equalization curves are required. Some equalizer plugins include various formants as part of their presets.
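One common way to build such a narrow, powerful boost is a peaking biquad filter; the sketch below uses the widely published RBJ cookbook formulas, and the 800Hz, +12dB and Q of 8 values in the usage note are arbitrary examples rather than a recommended formant.

```csharp
using System;

// Peaking-EQ biquad (RBJ audio EQ cookbook) for carving a narrow, strong boost
// at a chosen frequency - one way to fake a formant.
public class FormantBoost
{
    readonly double b0, b1, b2, a1, a2;
    double x1, x2, y1, y2; // filter memory

    public FormantBoost(double sampleRate, double freqHz, double gainDb, double q)
    {
        double A = Math.Pow(10.0, gainDb / 40.0);
        double w0 = 2.0 * Math.PI * freqHz / sampleRate;
        double alpha = Math.Sin(w0) / (2.0 * q);
        double a0 = 1.0 + alpha / A;

        b0 = (1.0 + alpha * A) / a0;
        b1 = -2.0 * Math.Cos(w0) / a0;
        b2 = (1.0 - alpha * A) / a0;
        a1 = -2.0 * Math.Cos(w0) / a0;
        a2 = (1.0 - alpha / A) / a0;
    }

    public float Process(float x)
    {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return (float)y;
    }
}

// Usage: var formant = new FormantBoost(48000, 800, 12, 8);
// then run every sample of the creature layer through formant.Process().
```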
Figure 6.15
Animal samples can provide us with great starting points for our creature
sound design. Tigers, lions and bears are indeed a fantastic source of fero-
cious and terrifying sounds, but at the same time they offer a huge range of
emotions: purring, huffing, breathing, whining. The animal kingdom is a very
rich one, so do not limit your searches to these obvious candidates. Look far and wide, research other sound designers' work on films and games and experiment.
The main potential pitfall when working with animal samples is to
create something that actually sounds like an animal, in other words too
easily recognizable as a lion or large feline, for instance. This is usually a sign that the samples should be processed further in order to make them sound less easily identifiable. Another trick to help disguise sounds
further is to chop off the beginning of the sample you are using. By remov-
ing the onset portion of a sample you make it harder to identify. Taking
this technique further you can also swap the start of a sample with another
one, creating a hybrid sound that after further processing will be difficult
to identify.
Ring modulation is another useful process: it creates sidebands above and below each frequency component of the original sound while at the same time removing these original components. In other words, ring modulation removes the original partials in the sound file and replaces them with sidebands. While the
process can sound a little electronic, it is a great way to drastically change a
sound while retaining some of its original properties.
When trying to create hybrid sounds using convolution, first make sure the
files you are working with are optimal and share at least some frequency con-
tent. You may also find that you get slightly more natural results if you apply
an equalizer to emphasize high frequencies in either input file, rather than
compensating after the process.
Some convolution plugins will give you control over the window length or
size. Although this term, window size, may be labelled slightly differently in
different implementations, it is usually expressed as a power of two, such as
256 or 512 samples. This is because most convolution algorithms are imple-
mented in the frequency domain, often via a Fourier algorithm, such as the
fast Fourier transform.
In this implementation, both audio signals are broken down into small
windows whose length is a power of two, and a frequency analysis is run
on each window or frame. The convolution algorithm then performs a
spectral multiplication of each frame and outputs a hybrid. The resulting
output is then returned to the time domain by performing an inverse Fou-
rier transform.
The process of splitting the audio in windows of a fixed length is not
entirely transparent, however. There is a tradeoff at the heart of this process
that is common to a lot of FFT-based algorithms: a short window size, such
as 256 and under, will tend to result in better time resolution but poorer fre-
quency resolution. Inversely, a larger window size will yield better frequency
resolution and a poorer time resolution. In some cases, with larger window
sizes, some transients may end up lumped together, disappearing or getting
smeared. Take your best guess at the window size based on your material, and adjust from there.
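As a concrete illustration of that trade-off, assuming a 48kHz file, time resolution is roughly windowSize / sampleRate and frequency resolution (bin width) is sampleRate / windowSize:

```csharp
using System;

public static class WindowSizeTradeoff
{
    public static void Main()
    {
        int sampleRate = 48000;
        foreach (int window in new[] { 256, 1024, 4096 })
        {
            double timeMs = 1000.0 * window / sampleRate;
            double binHz = (double)sampleRate / window;
            Console.WriteLine($"{window,5} samples: {timeMs,5:F1} ms per frame, {binHz,6:F1} Hz per bin");
        }
    }
}
```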
Experimentation and documenting your results are keys to success.
A perhaps less obvious approach when gathering material for creature and monster sound design is to use material that comes from sources that are neither human nor animal. Remember that we can find interesting sounds all around us, and non-organic elements can be great sources of
raw material. Certain types of sounds might be more obvious candidates
than others. The sound of a flame thrower can be a great addition to a
dragon-like creature, and the sound of scraping concrete blocks or stone can
be a great way to add texture to an ancient molten lava monster, but we can
also use non-human or animal material for primary sounds such as vocaliza-
tions or voices.
Certain sounds naturally exhibit qualities that make them sound
organic. The right sound of a bad hinge on a cabinet door, for instance,
can sound oddly similar to a moan or creature voice when the door is
slowly opening. The sound of a plastic straw pulled out of a fast food cup
can also, especially when pitch shifted down, have similar characteristics.
The sound of a bike tire pump can sound like air coming out of a large
creature’s nostrils and so on. It’s also quite possible to add formants to
most sounds using a flexible equalizer as was described in the previous
section.
Every situation is different of course, and every creature is too. Keep experimenting with new techniques and materials and trying new sounds. Combining human, animal and non-organic material can create some of the most interesting and unpredictable results.
Rather than doing simple crossfades between two samples, we will rely on
an XY pad instead, with each corner linked to an audio file. An XY pad gives
us more options and a much more flexible approach than a simple crossfade.
By moving the cursor to one of the corners, we can play only one file at a time.
By sliding it toward another edge, we can mix between two files at a time, and
by placing the cursor in the center of the screen, we can play all four at once.
This means that we could, for instance, recreate the excitement of fans as their team is about to score, while at the same time playing a little of the boos from the opposing fans as they express their discontent. As you can see, XY pads
are a great way to create interactive audio objects, certainly not limited to a
crowd engine.
Figure 6.16
We will rely on four basic crowd loops for the main sound of the crowd, one for each corner of the XY pad.
Each one of these samples should loop seamlessly, and we will work with loops about 30 seconds to a minute in length, although that figure can be adjusted to balance memory requirements against the desired complexity and degree of realism of the prototype. As always when choosing loops, make sure that the looping point is seamless, but also that the recording doesn't contain an easily recognizable event, such as an awkward, loud, high-pitched burst of laughter by someone close to the microphone, which the player would eventually remember, making the loop feel a lot less realistic and, eventually, annoying. In order to load the files into the crowd engine, just drag the desired file to the area on each corner labelled 'drop file'.
As previously stated, we will crossfade between these sounds by moving the
cursor in the XY pad area. When the cursor is all the way in one corner, only
the sound file associated with that corner should play; when the cursor is in
the middle, all four sound files should play. Furthermore, for added flexibility,
each sound file should also have its own individual sets of controls for pitch,
playback speed and volume. We can use pitch shift as a way to increase intensity, by bringing the pitch up slightly when needed, or to lower it in a subtle but efficient manner. This is not unlike how we approached the car engine, except that we will
use much smaller ranges in this case.
In order to make our crowd engine more realistic we will also add a sweeteners
folder. Sweeteners are usually one-shot sounds triggered by the engine to make
the sonic environment more dynamic. In the case of a crowd engine these could be
additional yells by fans, announcements on the PA, an organ riff at a baseball game
etc. We will load samples from a folder and set a random timer for the amount
of time between sweeteners. Audio files can be loaded in the engine by dragging
and dropping them in each corner of the engine, and sweeteners can be loaded by
dropping a folder containing .wav or .aif files into the sweetener area.
Once all the files have been loaded, press the space bar to start the playback.
By slowly moving and dragging around the cursor in the XY pad while the
audio files are playing, we are able to recreate various moods from the crowd
by starting at a corner and moving toward another. The XY pad is convenient
because it allows us to mix more than one audio file at once; the center posi-
tion would play all four, while a corner will only play one.
Recreating the XY pad in Unity would not be very difficult; all it would require is five audio sources (one for each corner plus one for the sweeteners) and a controller moving on a 2D plane.
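A hypothetical sketch of that Unity version might look like the following: four corner loops whose volumes are set by bilinear weighting of the cursor position, plus a fifth source reserved for the sweeteners. The names, ranges and the small pitch rise are assumptions for illustration.

```csharp
using UnityEngine;

// Illustrative XY pad crowd mixer: four corner loops weighted bilinearly by the
// cursor position, plus a fifth source reserved for one-shot sweeteners.
public class CrowdXYPad : MonoBehaviour
{
    public AudioSource cornerA, cornerB, cornerC, cornerD; // four looping crowd beds
    public AudioSource sweeteners;                         // one-shots, triggered elsewhere
    [Range(0f, 1f)] public float x;                        // cursor position on the pad
    [Range(0f, 1f)] public float y;

    void Update()
    {
        // Each corner is loudest when the cursor sits on it; all four are equal
        // (0.25) when the cursor is in the centre.
        cornerA.volume = (1f - x) * (1f - y);
        cornerB.volume = x * (1f - y);
        cornerC.volume = (1f - x) * y;
        cornerD.volume = x * y;

        // A very small pitch rise on the 'excited' corners, as suggested above.
        float excitement = Mathf.Max(cornerB.volume, cornerD.volume);
        cornerB.pitch = cornerD.pitch = 1f + 0.05f * excitement;
    }
}
```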
The architecture of this XY pad is very open and can be applied to many other situations with few modifications, and there is plenty of room for further improvements.
Conclusion
Sound design, either linear or interactive, is a skill learned through experimentation and creativity, but one that also requires the designer to be organized and aware of the pitfalls ahead of them. When it comes to linear sound design, organizing
the session for maximum flexibility while managing dynamic range are going
to be some of the most important aspects to watch out for on the technical
side of things. When it comes to interactive sound design, being able to build
or use prototypes that effectively demonstrate the behavior of the object in the
game by simulating the main parameters is also very important. This will allow
you to address any potential faults with the mechanics or sound design prior
to implementation in the game and communicate more effectively with your
programming team.
Note
1. In order to try out this example, the reader will need to install Cycling '74's MaxMSP; a free trial version is available from their website.
Chapter 2
Designing a moment
15 Designing a moment
Temporality in interactive sound design
Emotion in sound design
2 Emotion in sound design
Introduction
This chapter looks at notable examples of current professional and academic lit-
erature that are relevant to this topic, and to the increasing interest in the study of
sound and emotion. This chapter also clarifies the terminology used, discusses to
what extent the relationships of music and emotions – and speech and emotions –
are relevant to this book, and looks at existing sound-related theoretical structures.
Whilst in his essay ‘Why Emotions Are Never Unconscious’, Clore proposes that:
And then, by considering an aspect of the work presented by Deleuze and Guattari
themselves, specifically their ‘autonomy of affect’ theory, which proposes that
affect is independent of the bodily mode through which an emotion is made vis-
ible (Schrimshaw, 2013), it seemed to be incongruous, particularly as far as the
topic of this study is concerned, to elevate the impersonal concept of affect over
the personal and social factors that constitute a cinema-viewing experience, and
more readily align with the term emotion.
Another clarification of the two terms is provided by Lisa Feldman Barrett,
writing an endnote to Chapter 4 of her book How Emotions are Made: The Secret
Life of the Brain:
Many scientists use the word ‘affect’ when really, they mean emotion.
They’re trying to talk about emotion cautiously, in a non-partisan way, with-
out taking sides in any debate. As a result, in the science of emotion, the word
‘affect’ can sometimes mean anything emotional. This is unfortunate because
affect is not specific to emotion; it is a feature of consciousness. (Feldman
Barrett, 2017)
And so, throughout this book, it is proposed that the context for the work under-
taken by a Sound Designer and Re-recording Mixer most appropriately lies within
the boundaries of influencing audience emotion.
The challenge of defining what constitutes an emotion remains, however. As
Kathrin Knautz observes, whilst it may be straightforward to determine our own,
because of the difficulty in defining emotions, some researchers resort to formu-
lating a definition by instead looking at the features of emotions. (Knautz, 2012)
Fehr and Russell comment on this conundrum:
Everyone knows what an emotion is, until one is asked to give a definition.
Then, it seems, no one knows.
(1984, p. 464)
In the introduction to his book From Passions to Emotions, Dixon (2003) suggests
that the rise in academic work in a range of fields concerned with the emotions is a
modern trend, one that is in direct contrast to the preoccupation with intellect and
reason of earlier studies. Furthermore, he feels that this is no bad thing:
Human emotion is not just about sexual pleasures or fear of snakes. It is also
about the horror of witnessing suffering and about the satisfaction of seeing
justice served; about our delight at the sensual smile of Jeanne Moreau or
the thick beauty of words and ideas in Shakespeare’s verse; about the world-
weary voice of Dietrich Fischer-Dieskau singing Bach’s Ich habe genung
and the simultaneously earthly and otherworldly phrasings of Maria João
Pires playing any Mozart, any Schubert; and about the harmony that Einstein
sought in the structure of an equation. In fact, fine human emotion is even
triggered by cheap music and cheap movies, the power of which should never
be underestimated. (Damasio, 2000, pp. 35–36)
The first studies of emotion with regard to sound were related to music and came
in the late nineteenth century, coinciding with psychology becoming an independ-
ent discipline around 1897; although the early peak in studies was seen sometime
later, in the 1930s and 1940s (Juslin and Sloboda, 2010).
Today, a multidisciplinary approach pervades the field of emotion in music,
and although there is not yet unanimous agreement on whether there are uniquely
musical emotions, or whether the nature of these emotions is basic or complex,
the field of emotion in music is steadily advancing (Ibid.).
Jenefer Robinson articulates the complexity that the analysis of music and
emotions can produce:
However, Juslin and Sloboda broaden the perspective of the way sound can evoke
emotion from that of a purely music-based discussion, by suggesting that it is now
recognized that a significant proportion of our day-to-day emotions are evoked
by cultural products other than music; and therefore designers should be mindful
of emotion in the products and interfaces that they design, in order to make them
richer and challenging to the user (Juslin and Sloboda, 2010).
From the advent of the medium, moving picture producers have described and
promoted their films by describing the emotions that the audience is intended to
feel when they watch them (e.g. horror, romantic-comedy, or mystery-thriller).
So, it is reasonable to suggest that audiences have proven themselves not only to be susceptible to, but even desirous of, having their emotions evoked in a movie theatre.
Holland, in Literature and the Brain (2009), writes on our emotional response
to literary work, of which cinema is an important part:
We bring to bear on what we now see, some feeling or experience from our
own past. And my bringing my own past to bear on the here and now of trag-
edy makes me feel it all the more strongly. (Holland, 2009, p. 72)
Rather than allow the audience to come to their own conclusions the music
presses an emotional button that tells the audience what to feel, overriding the
words and thoughts of the film’s characters. (Sider, 2003, p. 9)
Above all, I feel that the sounds of this world are so beautiful in themselves
that if only we could learn to listen to them properly, cinema would have no
need of music at all. (Tarkovsky, 1987, p. 162)
So whilst this book looks carefully at the interplay between dialogue and sound
effects, a relationship to which music also makes a conspicuous contribution,
music in this study is treated respectfully for its emotional power in its own right;
but from a Re-recording Mixer’s perspective, music is but one of the sounds that
require balancing.
Because all sounds – not just music – can be emotionally important in a movie
(e.g. a single gunshot suddenly featured in a scene that had only music playing
will immediately draw the listener’s attention away from the music) and whilst
a sound may be interpreted in several ways, often depending on the context it is
heard in, all sounds in this study are referred to, considered as, or classified by,
their primary emotional function or purpose in the soundtrack.
And so, through the combination of all these sounds, the relative proportions
of which are solely determined by the Re-recording Mixer during the act of pre-
mixing and final mixing, the underlying meaning of the soundtrack is revealed.
Secondly, when considering the soundtrack and the way it forms part of
an audio-visual work, there are comparisons that may be drawn between the
Re-recording Mixer’s mix-balancing with an emotional intent in mind, and the
way that everyday speech is used to convey emotion. In speech, the meanings
of words are quite fixed within a language, yet the actual emphasis of the words
being spoken can be quite fluid due to inflection, tonality or accent.
The emphasis on words plays an important role in inducing different emotions
in the listener. For example, I might say the words ‘I’m really sad’ in a helpless
sounding way, or in a sarcastic sounding way. The words are the same and indi-
cate an emotion, but the sound of the words will determine the emotion that the
listener will perceive.
So too in a movie, where the words of dialogue that the characters use may
on their own have clear meaning for the plot and storyline; yet when balanced
amongst other mix elements in the soundtrack, what results is a listening experi-
ence that is emotionally richer for the other sound elements that have been placed
carefully around the speech.
Additionally, the visual elements of a film (the acting, editing, lighting, grad-
ing, composition, etc.) can powerfully portray a particular emotional direction
(similarly to how the meanings of words do in speech). But the soundtrack, and
the balancing of its elements by the Re-recording Mixer, can shift the emotional
direction of the overall experience.
This is similar to how the changes in the prosodic patterns that naturally exist
in speech produce emotional shifts: e.g. the tendency to speak unwittingly loud
when gleeful, or in a higher than usual pitch when greeting a sexually attractive
person (Bachorowski, 1999); and this is described in other research studies of
listeners inferring emotion from vocal cues (see Frick, 1985; Graham, San Juan &
Khu, 2016; van Bezooijen, 1984 to name but a few).
In an audio-visual piece of work with emotional meanings already suggested
through the visuals, or through words and other selected sounds, variations in
emotional meaning can also be produced by manipulating the mix balance of the soundtrack, much as natural variations in pitch, loudness, tempo and rhythm produce such shifts in speech.
For twenty-five centuries, Western knowledge has tried to look upon the
world. It has failed to understand that the world is not for the beholding. It is
for hearing. It is not legible, but audible. (Attali, 1985, p. 3)
Which implies that sound itself carries a quality, or set of qualities, that can not
only inform a cinema audience, but also impart meaning on what they are seeing;
which in turn relates to the assertions of Holland (2009) and accords with my
notion that (especially) within narrative filmmaking, a significant responsibility is
capable of being borne by the soundtrack to fully engage and emote an audience.
In his essay Art in Noise, Mark Ward suggests that:
Ward also argues against the primacy of speech and music in the traditional
process of soundtrack dissection, instead elevating what might be termed as
environmental sound, or sound effects, to a status at least equal to dialogue and
score (Ward, 2015). This also implies that these fuller soundtracks require careful
balancing by the Re-recording Mixer:
Causal listening can condition, or even prepare, the listener by the very nature of
the sounds heard – for instance, the sound effect of a dog barking readily recalls
the image of a dog in the listener.
Chion goes on to describe how a film soundtrack might manipulate causal lis-
tening through its relationship to the pictures; a term he calls Synchresis; whereby
we are not necessarily listening to the initial causes of the sounds in question, but
rather causes that the film has led us to believe in:
For Chion, causal and semantic listening can occur simultaneously within a sound
sequence:
We hear at once what someone says and how they say it. In a sense, causal
listening to a voice is to listening to it semantically, as perception of the hand-
writing of a written text is to reading it. (Chion, 1994, p. 28)
Chion thirdly suggests that reduced listening refers to the listening mode that
focuses on the traits of the very sound itself, independent of its cause and of its
meaning:
Reduced listening has the enormous advantage of opening up our ears and
sharpening our power of listening […] The emotional, physical and aesthetic
value of a sound is linked not only to the causal explanation we attribute to it
but also to its own qualities of timbre and texture, to its own personal vibra-
tion. (Chion, 1994, p. 31)
Finally, Chion asserts that natural sounds or noises have become the forgotten
or repressed elements within the soundtrack – in practice and in analysis; whilst
music has historically been well studied and the spoken voice more recently has
found favour for research:
noises, those humble footsoldiers, have remained the outcasts of theory, hav-
ing been assigned a purely utilitarian and figurative value and consequently
neglected.
(Chion, 1994, pp. 144–145)
Layer 1: dialogue
Layer 2: music
Layer 3: footsteps (Murch’s ‘linguistic effects’)
Layer 4: musical effects (Murch’s ‘atmospheric tonalities’)
Layer 5: sound effects.
This highlights the fact that seeing something on-screen can evoke an emotional
reaction in the observer’s brain through the activity of so-called ‘mirror neurons’,
which are thought to be the main route to human empathy.
Neuroscientist Vilayanur Ramachandran believes that these mirror neurons
actually dissolve the barrier between self and others, light-heartedly referring to
them as ‘Gandhi Neurons’ (Ramachandran, 2009).
But what would seem to be highly significant to this investigation into emo-
tions evoked by sound, is what Keysers et al. (2003) described from the research
they conducted into monkey mirror neurons, in which they found that the same
neurons fired whether an action was performed, seen or simply heard:
In plain terms:
The spectators’ imagination is by far the best filmmaker if it’s given a fair
chance to work. The more precise a scene is, the more unlikely it is to affect
the audience emotionally. By being explicit the filmmaker reduces the pos-
sibilities for interpretation. […] With a minimal amount of visual information
and sounds suggesting something, you can get the audiences’ imaginations
running. (Dykhoff, 2003)
There are many examples of this style of feature film sound design, but a notable
example is the sounds associated with the dinosaurs featured in Jurassic Park
(1993) (Sound Designer and Re-recording Mixer – Gary Rydstrom), which are
seen on-screen for only 15 of the movie’s total 127 minutes – a little over 10% of
the film’s total running time; whilst their mysterious ‘off-screen’ sound is heard by
the audience long before they eventually make an appearance (Van Luling, 2014).
Regarding audience emotions being evoked by the soundtrack, Dykhoff goes
on to make a highly relevant point:
It’s interesting to speculate about how much information the trigger must
contain and how much it actually triggers. (Dykhoff, 2003)
An exploration of the existing literature on emotions and film would seem to sug-
gest that the understanding of the relationship between the overall organization of
a soundtrack and the emphasis within the mix – and the resulting emotions evoked
in an audience – is still very much in its infancy; even if work on the correlation
between emotion categories and types of sounds, or emotions and the acoustic
parameters of sounds in music and speech, has begun to be examined more closely:
Whilst sounds such as speech, music, effects and atmospheres constitute the tradi-
tional groupings of sounds within a moving picture soundtrack – especially dur-
ing its editing and mixing stages – the Four Sound Areas of this research are not
intended to be considered as alternative labels for the long-established audio post-
production working categories of ‘dialogue’, ‘music’ and ‘effects’ stems.
Rather, they sit alongside instead of replacing those headings; and in any case
they do not directly correspond to those categories, by virtue of their being used in
a rather different context: the traditional labels of dialogue, music and effects are
used primarily in the sub-master ‘stems’ delivery process before (and after) the
final mixing of the soundtrack has been undertaken by the Re-recording Mixer.
As will be seen in subsequent chapters, the Four Sound Areas framework is
instead an alternative kind of structure: one that can guide Sound Designers on
how best to group emotionally complementary sounds together at the track-laying
stage of a moving picture project (i.e. a ‘bottom-up’ approach); and then help
Re-recording Mixers to understand which elements of a mix require emphasis, to
increase their ability to enhance, steer or evoke an audience towards a particular
area of emotion (i.e. a ‘top-down’ approach).
2.5.1 Questions
• What emotions were evoked in you by the sound design of the opening sequences of Minority Report?
• What influence do you think that had on the rest of the movie?
Notes
1 A traditional and frequently heard idiom amongst film industry technicians wanting
to highlight the importance of the soundtrack is ‘No one ever came out of a cinema
whistling a two-shot’.
2 Director Alan Crosland and Sound Engineer Nathan Levinson had completed a movie
for Warner Brothers a year earlier – Don Juan (1926) – that used the same Vitaphone
sound playback system as The Jazz Singer (1927). However, although the soundtrack
of Don Juan was synchronized to picture, it consisted solely of music with no speech
from the actors.
3 There is a famous Hollywood story that suggests the composer Arnold Schoenberg
once wrote a film score thinking that a feature film would subsequently be made to
match his music.
4 As well as being the Sound Designer and Re-recording Mixer, Murch also picture-
edited The Conversation (1974) and Apocalypse Now (1979). He won an Academy
Award for Best Sound Mixing on Apocalypse Now.
CHAPTER 4
Learn More »
4 Immersive Sound Production Using Ambisonics and Advance Audio Practices
What is Ambisonics?
Ambisonics is a completely different approach: it captures and reproduces, as far as possible, the entire 360° immersive soundfield from every direction equally – sound from the front, sides, above and below – at a single capture/focal point. Ambisonics attempts to reproduce as much of the soundfield as possible regardless of speaker number or location, because ambisonics is a speaker-independent representation of the soundfield and its transmission channels do not carry speaker setups. Since higher order ambisonics (HOA) is based on the entire soundfield in all dimensions, a significant benefit of this spatial audio coding is the ability to create dimensional sound mixes with spatial proximity and both horizontal and vertical localization.
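To make the idea of speaker independence concrete, here is a minimal sketch, assuming the common AmbiX convention (ACN channel ordering with SN3D normalization; other conventions such as FuMa order and scale the channels differently). It encodes a mono signal into the four first-order ambisonic channels purely from an azimuth and elevation, with no reference to any loudspeaker layout:

```python
import numpy as np

def encode_foa(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order ambisonics (AmbiX: ACN order, SN3D).

    Returns an array of shape (4, N) holding the W, Y, Z, X channels. This is a
    sketch of the encoding math only; real tools also handle moving sources,
    near-field compensation and higher orders.
    """
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    gains = np.array([
        1.0,                      # W: omnidirectional component
        np.sin(az) * np.cos(el),  # Y: left/right
        np.sin(el),               # Z: up/down
        np.cos(az) * np.cos(el),  # X: front/back
    ])
    return gains[:, None] * mono[None, :]

# Example: a 1 kHz tone placed 45 degrees to the left and 30 degrees up.
fs = 48000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
bformat = encode_foa(tone, azimuth_deg=45, elevation_deg=30)
print(bformat.shape)  # (4, 48000): stays speaker-independent until decoded
```

The loudspeaker (or binaural) layout only enters the picture when a decoder is chosen, which is precisely why the transmission channels carry no speaker setup.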
Figure 4.3 Higher order ambisonics (HOA) listens to the entire soundfield at a macro level
Production Note
HOA allows the sound designer/producer to create or develop the direction of the
sound and not be tied to where the speaker or reproduction device may be, which is
contradictory to the way a lot of sound is produced.
The fact is that dimensional sound formats will not go away. People want options for con-
sumption and the challenge is how to produce and manage a wide range of audio experiences
and formats as fast and cheaply as possible. Cheaply also means minimizing the amount of data
being transferred. If the rendering is done in the consumer device, it inherently means that
there is a need to deliver more channels/data to the device. However, data compression has
significantly advanced to the point that twice as much audio can be delivered over the same
bit stream as previous codecs.
Now consider the upside of the production options using HOA. You have the ability
to reproduce essentially all spatial formats over 7 of the 16 channels (a metadata channel is
needed), then you have another eight individual channels for multiple languages, custom
audio channels, and other audio elements or objects that are unique to a particular mix.
Additionally, producing the foundation soundfield separately from the voice and
personalized elements facilitates maximum dynamic range along with loudness compliance
while delivering consistent sound over the greatest number of playout options.
The sonic advantages of ambisonics reside with the capture and/or creation of HOA.
Ambisonics works on a principle of sampling and reproducing the entire soundfield.
Intuitively, as you increase the ambisonic order the results will be higher spatial resolution
and greater detail in the capture and reproduction of the soundfield. However, nothing comes
without a cost. Greater resolution requires more soundfield coefficients to map more of the
soundfield with greater detail. Some quick and easy math: an order-N soundfield needs (N + 1)² coefficients, so fourth order ambisonics requires 25 coefficients, fifth order requires 36, sixth order requires 49, and so on. The problem has been that HOA production requires a very high channel count to be effective, which did not fit the current ecosystem; but coding from Qualcomm and THX has reduced the bandwidth of an HOA signal to fit in 8 channels of the 15- or 16-channel architecture, leaving channels free for objects and interactive channels.
Dr. Deep Sen has been researching the benefits of HOA for decades and headed a team that developed a "mezzanine coding" that reduces the channels of up to 29th order HOA (900 channels) to 6 channels plus a control track. Now consider a sound designer's production options. HOA provides the foundation for stereo, 5.1, 7.1, 7.1+4, 10.2, 11.1 up to 22.2 and higher using only 7 channels in the data stream. I suspect that there are points of diminishing returns. Scalability – the first four channels in a fifth order and a first order are exactly the same.4
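The channel counts quoted above all follow from the (N + 1)² relationship for a full-sphere soundfield of order N; the short sketch below simply verifies the arithmetic (how a given delivery system then compresses those channels, as described above, is a separate matter):

```python
def hoa_channels(order):
    """Number of full-sphere soundfield coefficients (channels) for a given order."""
    return (order + 1) ** 2

for n in (1, 4, 5, 6, 29):
    print(f"order {n:>2}: {hoa_channels(n)} channels")
# order  1: 4 channels
# order  4: 25 channels
# order  5: 36 channels
# order  6: 49 channels
# order 29: 900 channels
```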
Capture or Create?
HOA is a post-production as well as a live format; however, live production is dependent on
complexity and latency.
Figure 4.5 Channel control strip from a HOA tools plug-in with controls for azimuth, elevation, distance and size
Limits of perception
The human auditory system can only process sound waves within a certain frequency range.
This does not mean these extended frequencies do not exist, just that humans do not process
them through the auditory system. Additionally, the auditory system does not process all frequencies the same: some frequencies are perceived as more intense than others even when they are at the same amplitude. For example, low frequency sound waves require significantly more energy to be heard than high frequencies. Our hearing is not linear, and the equal loudness curves, known professionally as the Fletcher-Munson curves, show the relationship between frequency, amplitude and loudness. Finally, complex soundfields can suffer from frequency masking: two sounds of the same amplitude with overlapping frequencies are difficult to distinguish, because the brain needs a minimum difference in frequency to process each sound individually.
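One way to see this non-linearity in numbers is the standard A-weighting curve, which approximates the inverse of an equal loudness contour at moderate listening levels; the sketch below evaluates it at a few frequencies (this is an approximation derived from such contours, not the Fletcher-Munson data itself):

```python
import math

def a_weighting_db(f):
    """Approximate A-weighting gain in dB at frequency f (Hz)."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0  # normalized to roughly 0 dB at 1 kHz

for f in (50, 100, 1000, 4000, 10000):
    print(f"{f:>5} Hz: {a_weighting_db(f):6.1f} dB")
# Low frequencies are weighted far below 0 dB, mirroring how much more energy
# they need in order to sound as loud as mid-range frequencies.
```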
Sound localization is impacted by the size of the head and chest and the physical distance between the ears; this filtering is known as the head related transfer function (HRTF). The sound usually reaches the left ear and the right ear at slightly different times and intensities, and along with tone and timbre the brain uses these cues to identify the location a sound is coming from.
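As a small worked example of one of these cues, here is a sketch using the well-known Woodworth approximation for a rigid spherical head (measured HRTFs are individual and far richer); it estimates the interaural time difference for a source at a given azimuth:

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth estimate of interaural time difference for a spherical head.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:>2} deg -> ITD {itd_seconds(az) * 1e6:5.0f} microseconds")
# Even at 90 degrees the difference is only around 650 microseconds, yet it is
# one of the main cues the brain uses to localize a sound.
```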
Cognition is what happens in the mind where the brain infuses personal biases and
experiences. For example, when a baby laughs there is one reaction as opposed to when
they cry. Cognitive perception of sound has created an entire language of sound. Defining
and describing sound is often a difficult exercise because our vocabulary for sound includes
descriptive phrases that comprise both objective and subjective metaphors. Technical
characteristics such as distortion, dynamic range, frequency content and volume are measur-
able and have a fairly universal understanding, but when describing the aesthetic aspects and
sonic characteristics of sound, our descriptors tend to become littered with subjective phrase-
ology. Here is the simple yet complicated phrase which has always irritated me: “I don’t like
the way that sounds.” Exactly what does that mean? I worked with a producer who made that
comment halfway through a broadcast season during which I had not changed anything sub-
stantial in the sound mix. Being the diligent audio practitioner, I took his comment to heart
and really spent time listening to try to understand why he said what he said.
Broadcast sound and sound design is a subjective interpretation of what is being presented
visually. The balance of the commentary with the event and venue sound is interpreted by
the sound mixer. The sports director and producer are consumed with camera cuts, graphics and replays, and focusing on the sonic qualities of a mix may be beyond their concentration. Factor in the distractions and high ambient noise levels in an OB van – remember, technical communications are verbal and everybody in the OB van wears headsets – and now you have to wonder who is really listening.
Meanwhile, after objectively listening and considering what the problem could be,
I inquired about the balance of mix, its tonal qualities, and my physical execution of the mix.
Once again the answer was, “I don’t like the sound.” My next move was to look really busy
and concerned and ultimately do nothing. That seemed to work.
When surround sound came along, a common description emerged to describe the sound
design goals: to enhance the viewer experience. At least now when there is talk about multi-
channel 3D sound, the conversation begins with the nebulous notion of immersive experi-
ence. I think this has to do with creating the illusion of reality … go ahead, close your eyes …
do you believe you are there?
So what do balance, bite, clarity, detail, fidelity, immersive experience, punch, presence,
rounded, reach, squashed or warmth have to do with sound? As audio practitioners we seem
Principles of Psychoacoustics
Understanding how we hear, along with how the brain perceives sounds, gives sound
designers and software engineers the ability to model sound-shaping algorithms based on
psychoacoustic principles and thought. Common considerations when modeling sound are frequency and time, so instead of using elevation to achieve height, try using equalization, which can be an effective means of creating an impression of height. We naturally hear high frequencies as coming from above because high frequencies are more directional and reach our ears with less reflection. This principle is known as the Blauert effect.5
Significantly, a lot of the low frequency energy has already been lost. By equalizing cer-
tain frequencies, you can create the illusion of top and bottom; in other words, the greater
the contrast between the tone of the top and the bottom, the wider the image appears to be.
This principle works well for sports and entertainment because you can build a discernable
layer of high frequency sounds (such as atmosphere) slightly above the horizontal perspective
of the ear.
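A minimal sketch of this equalization idea is shown below; the crossover frequency and boost are illustrative values rather than a recipe drawn from any particular production, and the point is simply that the tonal contrast between a darker 'bottom' layer and a brighter 'top' layer can be adjusted directly:

```python
import numpy as np
from scipy.signal import butter, lfilter

def split_top_bottom(x, fs, crossover_hz=4000.0, top_boost_db=3.0):
    """Split a signal into a low-passed 'bottom' layer and a high-passed 'top' layer.

    Increasing the level or tonal contrast between the two layers strengthens
    the impression of vertical separation. Cutoff and boost are illustrative.
    """
    nyq = fs / 2.0
    b_lo, a_lo = butter(2, crossover_hz / nyq, btype="low")
    b_hi, a_hi = butter(2, crossover_hz / nyq, btype="high")
    bottom = lfilter(b_lo, a_lo, x)
    top = lfilter(b_hi, a_hi, x) * (10 ** (top_boost_db / 20.0))
    return bottom, top

# Example with one second of noise standing in for an atmosphere bed.
fs = 48000
noise = np.random.default_rng(0).normal(scale=0.1, size=fs)
bottom, top = split_top_bottom(noise, fs)
print(bottom.shape, top.shape)
```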
Psychoacoustic Masking
Psychoacoustic masking is the brain's ability to accept and subdue – to basically filter out – certain distracting sounds. I have read articles touting the miracles of sound replication by Edison's phonograph. Edison demoed real singers side by side with his devices; he would pull back the curtain and exclaim that the reproduction was better than life – pay no attention to those pops and ticks in the recording. The mechanical reproduction devices suffered from a significant amount of scratches and ticks, but the brain filters out the undesirable noise. For example, radio static is filtered out by the brain when a high proportion of high frequency components are introduced. Additionally, noise and artifacts from over-compressed digital sampling may be filtered by the brain but result in unhealthy sound.
DSpatial
DSpatial created a bundle of plug-ins that work under the AAX platform in a fully coordinated
way. Reality Builder is inserted on each input channel and can operate in real time in the pro
DENNIS BAXTER (dB): As a sound designer, creating a sense of motion and speed has always been a challenge, particularly with events that do not have a lot of dynamics, like downhill skiing or ski jumping. Creating the illusion of someone flying through the air on a pair of skis is a challenge.
RAFAEL DUYOS (RD): Scientifically speaking, what we have done is a balanced trade-off between physical and psychoacoustic modeling principles. By that I mean that if something mathematically correct doesn't sound right, we have twisted it until it sounds right. After all, film and TV are not reality but a particular interpretation of reality. So we are not always true to reality, but we are true to the human perception of it.
RD: We have applied this (principle) to all the effects we have modeled. For example, Doppler is a direct consequence of the delay between the source and the listener when either or both are moving in relation to the other, but we have made this delay optional because sometimes it can become disturbing. Inertia was implemented to make the Doppler effect more realistic by simulating the mass of moving objects. Inertia is applied to each source according to its actual mass. Small masses have much more erratic movements. The Doppler of a fly doesn't sound the same as the Doppler of a plane. Doppler and Inertia usually have to be adjusted in parallel; very high degrees of Doppler usually require more inertia.
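For reference, the underlying physics is compact. The sketch below is the textbook relation for a moving source and a stationary listener, not DSpatial's actual implementation, which as described above derives the effect from the source-listener delay and blends in inertia and perceptual adjustments:

```python
def doppler_frequency(f_source, source_speed, speed_of_sound=343.0, approaching=True):
    """Observed frequency for a moving source and a stationary listener.

    Uses the classic relation f' = f * c / (c - v), with v positive for an
    approaching source and negative for a receding one.
    """
    v = source_speed if approaching else -source_speed
    return f_source * speed_of_sound / (speed_of_sound - v)

# A 1 kHz source moving at 30 m/s (roughly 108 km/h):
print(round(doppler_frequency(1000, 30, approaching=True)))   # about 1096 Hz
print(round(doppler_frequency(1000, 30, approaching=False)))  # about 920 Hz
```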
In the case of proximity, for example, we have even provided for an adjustment of the
amount of proximity effect, from nothing (like current panning systems) to fully real-
istic. We use equalization only marginally. Normally we use impulse responses and
convolutions because they are much more realistic. A very important part of the algo-
rithm is the reflections. Take binaural, for example. A loose HRTF usually doesn’t sound
very realistic. However, if you take a good binaural microphone, it sounds much better
than an HRTF alone, and that’s because with the microphones you get millions of micro
reflections coming from everywhere. That's what we try to model as much as possible. We
are probably the system that needs the most computation to work, but we are not worried
about that because computers are getting more and more powerful. Time is on our side.
dB: I thought your program for Walls and doors – reflection, refraction, diffraction and scattering produced by walls and doors – was very clever and useful. Can you explain your scatter principle?
RD: Dispersion is achieved through extreme complexity. The key to our system is our
Impulse Response creator. This is something that cannot be achieved with algorithmic
reverberations, and allows us to get the best of convolution and the best of algorithms.
RD: The complexity of IR modeling allows us to create fully decorrelated IRs for each of the
speakers. That’s simply not possible with microphone recorded IRs. For us it’s the essen-
tial part of our design. Our walls, doors, reflection, refraction, diffraction and scattering
base their performance on the complexity. Rotate, collapse, explode, etc. are created in
our DSpatial native format, and can then be exported to any format, be it Ambisonics,
binaural, ATMOS, Auro3D. There is no format limit. As we record the automations and
not the audio, we can always change it later.
dB: What are the X, Y and Z controls for?
Figure 4.6 DSpatial ambient options window
Figure 4.7 DSpatial Reality 2.0
gunshots, screaming, horses, door closing sounds, footsteps, etc. In these cases, there is a
pad-controlled firing mode, of course, supporting spatialization parameters.
The Ambient system is also intelligent enough to use multichannel audio in both the
ambience source and the number of channels in the final mix, ensuring the best possible
spatialization.
dB: Can you explain Spatial objects?
RD: Spatial-Objects are what we call DSpatial objects, which are the next generation of objects.
Traditional objects are simple mono or stereo files located in a grid of speakers. They
lack the ambience, which in reality is closely linked to the original signal. The envir-
onment is supplied separately in the form of beds. But that has the problem that the
beds don’t have good spatial resolution. If our goal is to make the system realistic, using
beds is not a good idea. To be realistic, objects have to be linked to their reflections. But
for that you need an integrated system that manages everything. That is exactly what
Reality Builder does.
RD: DSpatial-Objects are devoted to production, not just delivery. Contrary to other object-based formats, DSpatial works with objects from the very beginning of a production.
dB: Remember, Dolby required a bed to get started.
RD: With a DSpatial workflow it is ideal to work dry, and add as many, or as few, reverbs as you want afterwards. There is no need to record the original reflections; the hyper-realism and repositioning possibilities of DSpatial's extreme realism allow for total control in post-production.
This author listens and mixes in a neutral acoustic environment using Pro Tools, Nuendo and Reaper with 11.1 Genelec speakers (7.1.4), and has auditioned and mixed the plug-ins described in this book.
The ability to create sonic spaces in real time is a powerful tool in immersive sound creation and production. Remember, sports sound design is equal parts sport specific, event specific and venue specific. As discussed in Chapter 5, capturing sport-specific sound with microphones is possible, but capturing the right venue tone is complicated by poor acoustics and little noise control. Advance audio production practices advocate manufacturing an immersive soundbed to develop upon.
Sound Particles
You probably have heard sound particles on film-type productions, but sound particles has
developed an immersive audio generator that produces sounds in virtual sound worlds. Sound
particles is a 3D native audio system that uses computer graphic imagery (CGI) (modeling)
techniques to create three-dimensional images in films and television. Sound particles uses
similar CGI computer modeling principles to generate thousands of 3D sound particles cre-
ating complex sound effects and spatial imaging.All sound particles processes require rendering.
Practical application – Sound Particles is a post-production plug-in, but because of its flexible I/O configurations a timed event could be triggered, exported from the live domain to Sound Particles, rendered and played out live through the sound I/O with the live action. For example, a wide shot of the back stretch of a horse race is probably a sample playback, and the sample playback could be processed, rendered in real time and timed to the duration of the horses' run along a particular distance.
Sound particles can be anything from a single simple particle to a group of particles forming complex systems. To build a new project from scratch, open the menu and select EMPTY, which opens a blank timeline. Now you can build your new timeline with video at the top and then add audio track(s), add a particle group, add a particle emitter, add a microphone, or begin with presets.
An audio track is the sound that is going to be processed and can be mono, stereo or ambisonic. This is usually a file format such as .wav or another audio format. You import your audio file or files to the timeline. If you use multiple files, each particle will randomly select an audio file from the selection of imported files.
Other Plug-Ins
DTS-X Neural Surround Upmixer converts stereo and surround sound content to 5.1.4,
7.1.4, 7.1.5 and 9.1.4. (See Chapter 8.)
The Waves MaxxAudio Suite includes extended bass range using psychoacoustics, offering better sound reproduction through small speakers, laptops, tablets and portable speakers. Waves also has a standalone head-tracking controller.
The NuGen Halo Upmix 3D offers channel-based output as well as ambisonics: native upmix to Dolby Atmos 7.1.2 stems with height channel control, as well as first-order ambisonics. During rendering, the software conforms the mix to the required loudness specification and prepares the content for delivery over a wide array of audio formats, from mono to various immersive formats supporting up to 7.1.2. NuGen's software can also down-process audio signals with its Halo Downmix feature, which gives the audio mastering process new ranges for downmix coefficients, and includes a Netflix preset as well.
The Gaudio Spatial Upmix extracts each sound object from the stereo mix and then spatializes the 3D scene using binaural rendering technology adopted from the Next Generation Audio standard ISO/IEC 23008-3 (MPEG-H).
Notes
1 Christiaan Huygens, Traité de la Lumière (1690). See also Sciencedirect.com/topics/physics-and-astronomy/Huygens-principle and courses.lumenlearning.com/austincc-physics2/chapter/27-2-huygens-principle.
2 M. A. Gerzon, "Periphony: With-Height Sound Reproduction," J. Audio Eng. Soc., vol. 21, no. 1,
pp. 2–10 (1973 February).
3 Olivieri, Ferdinando, Nils Peters, and Deep Sen. 2019. Review of Scene-Based Audio and Higher
Order Ambisonics: A Technology Overview and Application to Next-Generation Audio, VR and 360°
Video. EBU Technical Review. https://tech.ebu.ch/docs/techreview/trev_2019-Q4_SBA_HOA_
Technology_Overview.pdf.
4 D. Sen, N. Peters, M. Kim, and M. Morrell, “Efficient Compression and Transportation of Scene-
Based Audio for Television Broadcast,” Paper 2-1, (2016 July).
5 Blauert, Jens. 2001. Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge: The
MIT Press.
6 H. Haas, "The Influence of a Single Echo on the Audibility of Speech," J. Audio Eng. Soc., vol. 20, no. 2,
pp. 146–159 (1972 March).
7 “The Doppler Effect: Christian Doppler Wissensplattform.” n.d. Accessed December 16, 2021. www.
christian-doppler.net/en/doppler-effect/.
8 McKamey, Timothy. 2013. “Restoration of the Missing Fundamental.” Sound Possibilities Forum.
September 7, 2013. https://soundpossibilities.net/2013/09/06/restoration-of-the-missing-fund
amental/.
CHAPTER 5
Leveraging Motion and Conceptual Frameworks of Sound as a Novel Means of Sound Design in Extended Reality
Learn More »
8 Leveraging Motion and
Conceptual Frameworks of Sound
as a Novel Means of Sound Design
in Extended Reality
Tom A. Garner
1 Introduction
Bemoaning the under-appreciation of sound, specifically when compared to
visuals, in the design of virtual worlds is almost something one could build
an academic career on. Indeed, many of my prior works have been introduced
in this manner, to the extent that I sometimes even question my commitment
to addressing the issue. Were it to be solved, I would need to find something
else to complain about. Personal issues aside, it would be unfair to suggest that
sound design for virtual worlds has not progressed. In many ways it has, and
in leaps and bounds, but it often feels at least one step behind its visual cousin.
It is one of the most recent examples of this issue that is the subject of this
chapter, namely the consideration of sound amidst a form of cross-pollination
of technology and practice that is being driven by extended reality, or XR.
The term ‘extended reality’, long before it was abbreviated, goes back at least
25 years. It appears in the title of a 1996 paper by Yuval Ne’eman, in which the
term described a theoretical infinite sequence of parent universes connected in
a linear sequence, each birthing the next in line; essentially, reality extending
beyond the known universe. Extended reality reappears in academic literature
a couple of years later, this time analogous to augmented reality, in its ability
to extend reality by way of digital overlays upon a user-view of a physical
environment (Klinker et al. 1998). Over the next few years, the term remained
rather obscure, but the notion of extending our reality through technology, art
and thought persisted and continued to develop.
In most contemporary definitions, the meaning of XR takes much influence
from the taxonomy of Milgram and Kishino (1994) and functions as an umbrella
term to refer to the collective suite of virtual, augmented and mixed-reality
technologies, such as head-mounted displays, spatial computing systems and
wearables. To be clear, this usage of the term would arguably be best abbrevi-
ated to ‘xR’, with the prefix in lower case to signify something. Research typi-
cally deploys xR when describing an area of industry such as manufacturing
(Fast-Berglund et al. 2018) or construction (Alizadehsalehi et al. 2020) that
utilise a combination of virtual, augmented or mixed-reality systems as a suite
of technological solutions. Otherwise ‘XR’ is at present the default, and we
therefore use this format throughout the chapter. Broadly speaking, the ratio
of virtual to physical content within a singular user experience identifies three
conceptual classes within extended reality, namely: Virtual Reality (VR), with
its emphasis upon virtual content; Augmented Reality (AR), which prioritises
experience of physical content; and Mixed-Reality (MR), a more balanced or
complex interplay between physical and virtual content.
In many recent cases, what constitutes virtual, augmented or mixed reality
has become entwined with specific hardware devices, presented as platforms
to exclusively deliver that form of XR content. The head-mounted display
(HMD) has arguably become so synonymous with virtual reality, in particular,
that many perceive the device and the concept to be the same thing. MR has
its equivalent in location-based experiences: installations comprising bespoke
physical and virtual content, such as digitally enhanced museum exhibits or
theme park rollercoasters. The immediate problem with us understanding XR
in this way, which feeds heavily into matters of extended reality sound design,
is that it constrains our expectations for what technologies and practices can
be deployed. If XR is restricted to VR and AR in particular, both of which are
themselves viewed as restricted to HMD hardware, this arguably limits numer-
ous opportunities to provide more nuanced, effective and efficient solutions.
The core aim of this chapter is to emphasise the great potential of sound
design research and practice to meaningfully enhance extended reality applica-
tions, both now and in the future. Feeding into this overarching ambition, the
discussion commences with a rationale for cross-pollination: extending the
meaning of XR by considering it more holistically, as a wider array of tech-
nologies that should not be deployed or developed in isolation, but rather as
a collection of potentials from which an ideal solution can emerge. Following
on from this, the discussion then turns to make the case for human motion to
be appreciated as one of the most significant opportunities to drive innovative
sound design in XR. This is done in three stages, each based on a key prem-
ise. The first premise is that human motion is the defining innovative asset of
contemporary XR technology. The second is that sound and human motion are
intrinsically and deeply interconnected. The final premise is that the substan-
tial body of literature concerning acoustic ecology and theories of sound and
listening can be leveraged to reveal numerous opportunities for developing
innovative approaches to motion-driven XR sound design.
References
Alizadehsalehi, S., Hadavi, A., & Huang, J. C. (2020). From BIM to extended reality in
AEC industry. Automation in Construction, 116, 103254.
Bellack, A. S., Hersen, M., & Lamparski, D. (1979). Role-play tests for assessing social
skills: Are they valid? Are they useful? Journal of Consulting and Clinical Psychol-
ogy, 47(2), 335.
Bijsterveld, K. (2019). Sonic Skills: Listening for Knowledge in Science, Medicine and
Engineering (1920s-Present) (p. 174). Springer Nature, Cham.
Chatzidimitris, T., Gavalas, D., & Michael, D. (2016, April 18–20). SoundPacman:
Audio augmented reality in location-based games. In 2016 18th Mediterranean Elec-
trotechnical Conference (MELECON) (pp. 1–6). IEEE, Lemesos, Cyprus.
Chion, M. (2012). The three listening modes. The Sound Studies Reader, 48–53.
Collins, K. (2013). Playing with Sound: A Theory of Interacting with Sound and Music
in Video Games. MIT Press.
Çöltekin, A., Lochhead, I., Madden, M., Christophe, S., Devaux, A., Pettit, C., . . . Hed-
ley, N. (2020). Extended reality in spatial sciences: A review of research challenges
and future directions. ISPRS International Journal of Geo-Information, 9(7), 439.
Cox, T. J. (2008). Scraping sounds and disgusting noises. Applied Acoustics, 69(12),
1195–1204.
Crawford, K. (2009). Following you: Disciplines of listening in social media. Con-
tinuum, 23(4), 525–535.
D’Amico, G., Del Bimbo, A., Dini, F., Landucci, L., & Torpei, N. (2010) Natural
human—computer interaction. In: Shao, L., Shan, C., Luo, J., & Etoh, M. (eds.),
Multimedia Interaction and Intelligent User Interfaces. Advances in Pattern Recog-
nition. Springer, London.
D’Auria, D., Di Mauro, D., Calandra, D. M., & Cutugno, F. (2015). A 3D audio aug-
mented reality system for a cultural heritage management and fruition. Journal of
Digital Information Management, 13(4).
Donohue, W. A., Diez, M. E., & Hamilton, M. (1984). Coding naturalistic negotiation
interaction. Human Communication Research, 10(3), 403–425.
Doolani, S., Wessels, C., Kanal, V., Sevastopoulos, C., Jaiswal, A., Nambiappan, H., &
Makedon, F. (2020). A review of extended reality (XR) technologies for manufactur-
ing training. Technologies, 8(4), 77.
Fast-Berglund, Å., Gong, L., & Li, D. (2018). Testing and validating Extended Reality
(xR) technologies in manufacturing. Procedia Manufacturing, 25, 31–38.
Flavián, C., Ibáñez-Sánchez, S., & Orús, C. (2019). The impact of virtual, augmented
and mixed reality technologies on the customer experience. Journal of Business
Research, 100, 547–560.
Frank, R. J. (2000, August 27 – September 1). Temporal elements: A cognitive system of
analysis for electro-acoustic music. In International Computer Music Conference Pro-
ceedings (Vol. 2000). Michigan Publishing, University of Michigan Library, Berlin.
Goodwin, S. N. (2019). Beep to Boom: The Development of Advanced Runtime Sound
Systems for Games and Extended Reality. Routledge, New York.
Grimshaw, M. N. (2007). The acoustic ecology of the first-person shooter (Doctoral
dissertation, The University of Waikato).
Haga, E. (2008). Correspondences between music and body movement (Doctoral dis-
sertation, University of Oslo).
Halliwell, S. (2014). Diegesis—mimesis. Handbook of Narratology, 129–137.
Helsel, S. (1992). Virtual reality and education. Educational Technology, 32(5), 38–42.
Hong, D., Lee, T. H., Joo, Y., & Park, W. C. (2017, February). Real-time sound propa-
gation hardware accelerator for immersive virtual reality 3D audio. In Proceedings
of the 21st ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games
(pp. 1–2). ACM, New York.
Johnson, D., Damian, D., & Tzanetakis, G. (2019). OSC-XR: A toolkit for extended reality immersive music interfaces. http://smc2019.uma.es/articles/S3/S3_04_SMC2019_paper.pdf (accessed 04.03.2021)
Kaghat, F. Z., Azough, A., Fakhour, M., & Meknassi, M. (2020). A new audio aug-
mented reality interaction and adaptation model for museum visits. Computers &
Electrical Engineering, 84, 106606.
Klinker, G., Stricker, D., & Reiners, D. (1998, June). The use of reality models in aug-
mented reality applications. In European Workshop on 3D Structure from Multiple
Images of Large-Scale Environments (pp. 275–289). Springer, Berlin, Heidelberg.
Krasnor, L. R., & Rubin, K. H. (1983). Preschool social problem solving: Attempts and
outcomes in naturalistic interaction. Child Development, 1545–1558.
Laurel, B., & Mountford, J. (1990). The Art of Human-Computer Interface Design. Addison-Wesley Longman, Boston.
Linqin, C., Shuangjie, C., Min, X., Jimin, Y., & Jianrong, Z. (2017). Dynamic hand ges-
ture recognition using RGB-D data for natural human-computer interaction. Journal
of Intelligent & Fuzzy Systems, 32(5), 3495–3507.
Luck, M., & Aylett, R. (2000). Applying artificial intelligence to virtual reality: Intel-
ligent virtual environments. Applied Artificial Intelligence, 14(1), 3–32.
Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE
Transactions on Information and Systems, 77(12), 1321–1329.
Morawitz, F. (2018, March). Quantum: An art-science case study on sonification and
sound design in virtual reality. In 2018 IEEE 4th VR Workshop on Sonic Interactions
for Virtual Environments (SIVE) (pp. 1–5). IEEE.
Norton, R. (1972). What is virtuality? The Journal of Aesthetics and Art Criticism,
30(4), 499–505.
Nymoen, K., Godøy, R. I., Jensenius, A. R., & Torresen, J. (2013). Analyzing corre-
spondence between sound objects and body motion. ACM Transactions on Applied
Perception (TAP), 10(2), 1–22.
O’Callaghan, C. (2011). Lessons from beyond vision (sounds and audition). Philosoph-
ical Studies, 153(1), 143–160.
Orcutt, J. D., & Anderson, R. E. (1974). Human-computer relationships: Interactions
and attitudes. Behavior Research Methods & Instrumentation, 6(2), 219–222.
Pajala-Assefa, H., & Erkut, C. (2019, October). A study of movement-sound within
extended reality: Skeleton conductor. In Proceedings of the 6th International Confer-
ence on Movement and Computing (pp. 1–4). ACM, New York.
Plouffe, G., Cretu, A. M., & Payeur, P. (2015, October). Natural human-computer inter-
action using static and dynamic hand gestures. In 2015 IEEE International Sympo-
sium on Haptic, Audio and Visual Environments and Games (HAVE) (pp. 1–6). IEEE.
Poerio, G. L., Blakey, E., Hostler, T. J., & Veltri, T. (2018). More than a feeling: Auton-
omous sensory meridian response (ASMR) is characterized by reliable changes in
affect and physiology. PloS One, 13(6), e0196645.
Raghuvanshi, N., & Snyder, J. (2018). Parametric directional coding for precomputed
sound propagation. ACM Transactions on Graphics (TOG), 37(4), 1–14.
Rautaray, S. S., & Agrawal, A. (2012). Real time multiple hand gesture recognition
system for human computer interaction. International Journal of Intelligent Systems
and Applications, 4(5), 56–64.
Rebelo, P., Green, M., & Hollerweger, F. (2008). A typology for listening in place. In
Proceedings of the 5th International Mobile Music Workshop (pp. 15–18).
Rice, T. (2015). Listening. In: Novak, D., & Sakakeeny, M. (eds.), Keywords in Sound.
Duke University Press, Durham, NC.
Sanchez, G. M. E., Van Renterghem, T., Sun, K., De Coensel, B., & Botteldooren,
D. (2017). Using Virtual Reality for assessing the role of noise in the audio-visual
design of an urban public space. Landscape and Urban Planning, 167, 98–107.
Savioja, L., Huopaniemi, J., Lokki, T., & Väänänen, R. (1999). Creating interactive
virtual acoustic environments. Journal of the Audio Engineering Society, 47(9),
675–705.
Seeger, A. (1994). Music and dance. Companion Encyclopedia of Anthropology,
686–705.
Serafin, S., Erkut, C., Kojs, J., Nilsson, N. C., & Nordahl, R. (2016). Virtual reality
musical instruments: State of the art, design principles, and future directions. Com-
puter Music Journal, 40(3), 22–40.
Serafin, S., Geronazzo, M., Erkut, C., Nilsson, N. C., & Nordahl, R. (2018). Sonic inter-
actions in virtual reality: State of the art, current challenges, and future directions.
IEEE Computer Graphics and Applications, 38(2), 31–43.
Skult, N., & Smed, J. (2020). Interactive storytelling in extended reality: Concepts for
the design. Game User Experience and Player-Centered Design, 449–467.
Slater, M., Steed, A., & Usoh, M. (1995). The virtual treadmill: A naturalistic meta-
phor for navigation in immersive virtual environments. In Virtual Environments’ 95
(pp. 135–148). Springer, Vienna.
Smith, S. L., & Goodwin, N. C. (1970). Computer-generated speech and man-computer
interaction. Human Factors, 12(2), 215–223.
Song, Y., Demirdjian, D., & Davis, R. (2012). Continuous body and hand gesture rec-
ognition for natural human-computer interaction. ACM Transactions on Interactive
Intelligent Systems (TiiS), 2(1), 1–28.
Sterne, J. (2003). The Audible Past: Cultural Origins of Sound Reproduction. Duke
University Press, Durham, NC.
Summers, C., Lympouridis, V., & Erkut, C. (2015, March). Sonic interaction design
for virtual and augmented reality environments. In 2015 IEEE 2nd VR Workshop on
Sonic Interactions for Virtual Environments (SIVE) (pp. 1–6). IEEE.
Székely, G., & Satava, R. M. (1999). Virtual reality in medicine. BMJ: British Medical
Journal, 319(7220), 1305.
Treu, S. (1976, October). A framework of characteristics applicable to graphical user-
computer interaction. In Proceedings of the ACM/SIGGRAPH Workshop on User-
oriented Design of Interactive Graphics Systems (pp. 61–71). ACM, New York.
Truax, B. (2001). Acoustic Communication. Greenwood Publishing Group, Santa Bar-
bara, CA.
Tuuri, K., & Eerola, T. (2012). Formulating a revised taxonomy for modes of listening.
Journal of New Music Research, 41(2), 137–152.
Vi, S., da Silva, T. S., & Maurer, F. (2019, September). User experience guidelines for
designing HMD extended reality applications. In IFIP Conference on Human-Computer
Interaction (pp. 319–341). Springer, Cham.
Vorländer, M., Schröder, D., Pelzer, S., & Wefers, F. (2015). Virtual reality for architec-
tural acoustics. Journal of Building Performance Simulation, 8(1), 15–25.