Multichannel Future
Floyd E. Toole
Harman International Industries, Inc.
Northridge, CA 91329, USA
Presented at
the 109th Convention
2000 September 22-25
Los Angeles, California, USA
This preprint has been reproduced from the author’s advance
manuscript, without editing, corrections or consideration by the
Review Board. The AES takes no responsibility for the
contents.
Floyd E. Toole
Harman International Industries, Inc.,
8500 Balboa Blvd., Northridge, CA 91329 USA
PH: 818 895 5761 FAX: 818 893 7139
e-mail: [email protected]
Abstract - At the point of delivering recorded sounds to listeners, the complex interactions of loudspeakers, rooms, and
listeners present many challenges to delivering satisfying listening experiences. In the past, the directional and spatial
limitations of two-channel stereo led to many creative solutions in both loudspeakers and room design. However,
multichannel audio changes the rules, and sets new requirements for optimum listening conditions, not all of which are
yet understood. Underlying all of this, though, are the basic requirements for timbral accuracy – good sound quality. In
this respect there is a need for the recording industry to catch up to the highest standards of consumer audio if the “art” is
to be adequately preserved. This presentation is a review of the science of sound reproduction in rooms, with a specific
focus on what may be needed in order for our industry to face the challenges of multichannel recording and reproduction.
Audio products must sound good. That is a given. However, the determination of what constitutes “good sound” is a
matter that has been controversial. Some assert that it is a matter of personal taste, that our opinions of sound quality are
as variable as our tastes in “wine, persons or song”. This would place audio manufacturers in the category of artists,
trying to appeal to a varying public “taste”. Others, like the author, take a more pragmatic view, namely that artistry is
the domain of the instrument makers and musicians and that it is the role of audio devices to capture, store and reproduce
their art with as much accuracy as technology allows. The audio industry then becomes the messenger of the art.
Interestingly, though, this process has created new “artists”, the recording engineers, who are free to editorialize on the
impressions of direction, space, timbre and dynamics of the original performance, as perceived by listeners through their
audio systems. Other creative opportunities exist at the point of reproduction, as audiophiles tailor the fundamental form
of the sound field in listening rooms by selecting loudspeakers of differing timbral signatures and directivities, and by
adjusting the acoustics of the listening space with furnishings or special acoustical devices.
To design audio products, engineers need technical measurements. Historically, measurements have been viewed with
varying degrees of trust. However, in recent years, the value of measurements has increased dramatically, as we have
found better ways to collect data, and as we have learned how to interpret the data in ways that relate more directly to
what we hear. Measurements inevitably involve objectives, telling us when we are successful. Some of these design
objectives are very clear, and others still need better definition. All of them need to be moderated by what is audible.
Imperfections in performance need not be immeasurably small, but they should be inaudible. Achieving this requires
knowledge of psychoacoustics, the relationship between what we measure and what we hear. This is a work in progress,
and considerable gains have been made. Given that loudspeakers, listeners and rooms form a complex acoustical system,
some effort must necessarily be devoted to understanding the interactions of these elements so that we may maximize the
performance of the entire system in real world circumstances.
In the midst of learning how to design good loudspeakers and, naturally, living and working with our evolutionary
mistakes, the audio industry has been making recordings. In the production of recordings we use loudspeakers for
monitoring our progress, and for judging the success of the result. Over the years, loudspeakers have been inconsistent
in their performance, and indeed, in their design objectives. The result has been variability in the recorded art.
[Figure: diagram with the labels “LOUDSPEAKERS which are evaluated by using” and “Studio Monitor LOUDSPEAKERS which are evaluated by using”.]
In a very real sense, the audio industry, the manufacturers of the equipment used in recording, storing and reproducing
sound, is employing science in the service of art. To be fully successful in this endeavor, we need to act in a concerted
manner at all levels, acknowledging that “sounding good” is not enough, because there are reasons why something that
sounds “good” could be an aberration, an accident of compensating errors. A simple example would be a recording that
has excessive high frequency content, because the monitor loudspeakers roll off the highs. The combination may sound
just fine. Such accidents do not serve the audio community at large – our customers – they merely provide momentary
gratification for those whose egos are involved with whatever product is being evaluated.
Subjective evaluations are relative evaluations. Technical measurements are (or should be!) absolute data. We need to
know enough about technical measurements to be able to use them in ways that can reduce the variables in subjective
evaluations. The long-term objective is to have two parallel measurement systems, one subjective and one objective,
both of which lead us to the same conclusions about the performance of a loudspeaker.
Figure 4. The effect of hearing loss, in this case averaged over the audiometric frequencies below 1 kHz, on the
variability in repeated sound-quality evaluations of the same sounds. Note that the trend is well established at threshold
shifts well below those at which hearing impairment is normally acknowledged. Hearing impairment, in audiometric
terms, is normally related to the ability to understand speech. Here we are applying a much more rigorous criterion.
From ref. 2.
[Plot axes: judgment variability versus broadband hearing loss (dB), 0 to 30.]
It is interesting to note that, in Fig. 4, the correlation is with hearing threshold shifts below 1 kHz. The much more
common high-frequency losses, by themselves, were not clearly correlated, although they usually accompany the low-
frequency losses in sensitivity.
Listeners with hearing loss not only exhibit high judgment variability, they can also exhibit strong individualistic biases
in their judgments. This comes as no surprise, since such individuals are really in search of a “prosthetic” loudspeaker
that somehow compensates for their disability. Since the disabilities vary enormously, so do the biases.
The evidence of Figure 5 is that the group of normal-hearing listeners substantially agree in their ratings. Interestingly,
the second group shares the opinion of the truly good speakers, A and B. However, speakers C and D exhibit
characteristics that are viewed as problems by the normal group, but about which the second group has substantially no
opinion. Based on the opinions of individual listeners, either C or D could be the best or worst speaker in the world. It
appears that their disabilities prevented some of the listeners from hearing certain of the deficiencies. Sadly, some
listeners who fall into the problem category are talented and knowledgeable musicians or audio professionals whose
vocations may have contributed to their condition. However articulately their opinions are enunciated, their views are of
value only to them, personally.
Figure 5. Evaluations of four loudspeakers by two groups of listeners, one that exhibited low variability in repeated
sound-quality judgments on a 10-point “fidelity” scale, and another that exhibited high variability. From reference 2.
[Plot: ratings from 0 to 10 for loudspeakers A, B, C and D; one group is labeled “listeners with low judgment
variability / normal hearing”, the other “listeners with high judgment variability / hearing loss”.]
The conclusion is clear. If there is any desire to extrapolate the results of a listening evaluation to the population at large,
it is essential to use representative listeners. In this context, it appears to be adequate to employ listeners with broadband
hearing levels within about 20 dB of audiometric zero. According to some large surveys, this is representative of about
75%, or more, of the population – an acceptable target audience for most commercial purposes. This is not an “elitist”
criterion.
The Listening Room, The Final Audio Component – The Final Experimental Variable
It has been recognized for many years that the room is a critical factor in determining the sound quality, imaging and
impressions of space and envelopment. Most of the scientific work was done in the context of concert halls and other
large performance spaces. It has been only relatively recently that serious investigations of loudspeaker/room acoustical
interactions have been seen (8,9,10). In the early days of listening tests, the problems of rooms could be mostly
overlooked, as the problems within the loudspeakers themselves were of such a magnitude that a kind of rough justice
would prevail even in poorly controlled evaluations. As time passed, and loudspeakers improved, it became necessary to
repeat the evaluations several times, each time changing the position of the loudspeaker and/or the listener, so that the
variations would be randomized, i.e. handled statistically. Eventually, loudspeakers improved still further, and that too
became inadequate. Now, we have an elegant, but expensive, solution to the problem of loudspeaker position – make it a
constant.
Figure 7 shows two stereo pairs of loudspeakers, each loudspeaker resting on a pneumatically-driven platform, in a
computer-controlled system wherein the listening location of the loudspeakers (as shown, the more forward locations)
can be specified in both X and Y dimensions, and the unused pair of loudspeakers is parked against the back wall. The
system allows for comparisons of up to four single loudspeakers, four stereo pairs, and three left, center, right
multichannel arrays. The quiet exchange takes 2 to 3 seconds, during which the music is paused. A control computer
randomly selects the program and the displayed coded identities of the test products, which are changed for each program.
The sequence of comparisons, and the exposure times, are controlled by the single listener, who is visually isolated from
both the loudspeakers under test and the room itself by a visually opaque curtain. There is no time limit for the
evaluation.
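The randomized, coded presentation described above can be sketched in a few lines. This is only an illustration of the idea, not the facility's actual control software (reference 11 describes the facility); the function name `make_trial_plan`, the product and program names, and the letter-code scheme are all hypothetical.

```python
import random

def make_trial_plan(products, programs, seed=None):
    """Return one entry per program, each with a fresh random code-to-product map.

    Hypothetical sketch: the names and the letter-code scheme are assumptions.
    """
    rng = random.Random(seed)
    program_order = list(programs)
    rng.shuffle(program_order)  # programs are presented in random order
    plan = []
    for program in program_order:
        shuffled = list(products)
        rng.shuffle(shuffled)  # the coding is changed for every program
        codes = [chr(ord("A") + i) for i in range(len(shuffled))]
        plan.append({"program": program, "coding": dict(zip(codes, shuffled))})
    return plan

if __name__ == "__main__":
    for entry in make_trial_plan(["spk1", "spk2", "spk3", "spk4"], ["jazz", "choir"], seed=1):
        print(entry)
```

Because the code letters are reassigned for each program, a listener cannot carry an opinion about "B" from one program to the next.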
Just as the position of the loudspeakers under test can be eliminated as an experimental variable, it can also be introduced
as the key variable, to ascertain a loudspeaker's sensitivity to room position. The interior of the room is designed to allow for great
freedom in the interior acoustical treatment. Reference 11 describes the facility in detail.
Figure 7. The pneumatically-operated, computer-controlled, loudspeaker “shuffler” in a special listening room designed
to eliminate or to use position as a variable in listening tests.
At this point, it is correct to say that, with adequate experimental controls, we are no longer conducting “listening tests”,
we are performing “subjective measurements”. The results are repeatable at different times, with different listeners, with
statistical variances that are known, and with results in forms that can be subjected to statistical tests for significance.
Opinions have, in a fashion, been turned into “facts”.
In the early days of audio, everything was “colorful”: the experiences, the mythology, the charismatic characters, the
stories, the memories and the sounds of the loudspeakers; they were all colorful. How a loudspeaker sounded, in this era,
was as much a matter of accident as of design. There were measurements, but they were not always accurate or
comprehensive and, for the most part, there were no reliable rules by which they could be interpreted. In the end,
performance was judged subjectively, usually in ways that were themselves fundamentally flawed. Not the least of the
problems in those days was the variability of recordings and of the playback apparatus. We can now do a lot better.
With statistically reliable, repeatable, numerical subjective ratings of loudspeakers in hand, we are now in a position to
look for orderly relationships with technical data on the same products. This is the entry point of psychoacoustics: the
relationship between what we measure, and what we hear (12).
Loudspeakers are not designed to entertain us in anechoic spaces although, for very practical reasons, that is where we
usually choose to measure them. Nowadays, computer-based measurement systems allow us to simulate anechoic
measurements in reflective spaces, although there are resolution trade-offs that must be made. There is no “free lunch”.
Since we listen in rooms, it behooves us to make measurements that give us indications of how these loudspeakers will
behave in rooms. It is a simple observation that most of the sound we hear in rooms arrives at our ears after reflection,
sometimes after many reflections, from room boundaries and furnishings. Therefore, it is also a simple observation that
the on-axis behavior of a loudspeaker is but a beginning in collecting useful data about its performance.
Let us look at what happens in a typical room. The loudspeaker used in this example was one that listeners reported as
having some timbral colorations, and it is a good way to demonstrate some important fundamentals of loudspeaker
design. The data are from reference 12.
Figure 8 shows some measurements, indicating that the design directive for this loudspeaker system was focused on the
on-axis performance. The off axis measurements indicate progressive deterioration in the crossover regions between the
woofer and the midrange, and between the midrange and the tweeter. By the standards of that time (the early 1980’s),
this was not an unusually bad performance. In fact, this product scored well, albeit not at the top, in listening tests.
[Figure 8 plot: level in dB (10 to -20) for three curves: on axis, 30 degrees off axis, and 60 degrees off axis.]
Figure 9 shows a synthesis, from anechoic measurements on the loudspeaker, of the sounds arriving at a listener’s ears in
a typical room. The sounds have been divided into three categories, direct, early reflections (floor, ceiling and sidewall
bounces), and reverberant sounds. For this example, the room boundaries were assumed to be relatively reflective. It
can be seen that, at low frequencies the dominant factor is the reverberant sound (sound power). The direct (on-axis)
sound is not really a factor. At the other extreme, at the highest frequencies, what we hear is absolutely dominated by
the direct (on-axis) sound. In between, for a frequency range covering the most important instrumental and vocal sounds
– from a few hundred Hz to a few thousand Hz – everything matters. All three components are clearly influential. So, if
we wish to characterize the sound of a loudspeaker in a room, by using anechoic measurements of frequency response, it
is quite clear that we must measure absolutely everything! Reducing the description of frequency response to a single
curve is overly simplistic. An on-axis frequency response tells only part of the story, as does a measurement of total
sound power.
[Figure 9 plot: level in dB (20 to -10), including a curve labeled “predicted room curve”.]
This is interesting, but it is still theory. What actually happens in a room? Figure 10 shows that the real world is more
complex. At low frequencies, the room resonances and adjacent-boundary reflections dominate the communication of
bass sounds from the loudspeaker to the listener, and different loudspeaker positions yield very different impressions of
bass – from artificially punchy “rock ‘n roll” bass (solid curve), through wimpy, inadequate bass (dotted curve), to quite
good (dashed curve). Although not shown, moving the listener can achieve similar differences in bass performance.
[Figure 10 plot: level in dB (30 to -10) versus frequency, 20 Hz to 20 kHz.]
Figure 10. Measurements of the example loudspeaker in a room. The loudspeaker was placed in three realistic left/right
channel locations and, for each one, measurements were made at four possible listening locations, all within a two-foot
radius. Each of the bottom curves is an energy average of the four measurements. The top curve is the predicted room
curve from Figure 9, which, for clarity, has been shifted vertically by 10 dB. Adapted from reference 12, Figure 20.
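An energy average, as used for the bottom curves, averages the measurements as power rather than as decibel values. A minimal sketch, assuming the four measurements are available as levels in dB on a common frequency grid; the function name and the sample values are illustrative and are not data from reference 12:

```python
import numpy as np

def energy_average_db(levels_db):
    """Average dB curves on a power basis; rows are measurements, columns are frequencies."""
    levels_db = np.asarray(levels_db, dtype=float)
    power = 10.0 ** (levels_db / 10.0)           # dB -> relative power
    return 10.0 * np.log10(power.mean(axis=0))   # mean power -> dB

if __name__ == "__main__":
    # Made-up levels at two frequencies for four listening locations.
    four_seats = [[70.0, 62.0], [76.0, 60.0], [64.0, 61.0], [73.0, 63.0]]
    print(energy_average_db(four_seats))
```

Because the average is taken in power, peaks count for more than dips: averaging 0 dB and 10 dB gives about 7.4 dB, not 5 dB.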
The predicted room curve fits remarkably well at frequencies above about 400 Hz, and is only slightly in error from 200
Hz up. At lower frequencies, it failed completely, because it did not consider the profound effect of standing waves and
adjacent boundary interactions, although there are ways to do it for specific rooms. The conclusion, though, is clear – at
low frequencies, position is everything. Even with “perfect” woofers, the room and the arrangement of sources and
listeners within it, determine the quality of the bass we hear.
At middle and high frequencies, the result is strikingly different. Position matters much less, and our calculated room
curve fits almost perfectly. From “sterile” anechoic data on the loudspeaker, we have been able to describe what
happens in a room. This is clearly an important fact.
Summing this up, it can be stated that there are two distinct domains in a listening room. Below about 400-500 Hz the
room dominates what we hear and, above that division, the loudspeaker dominates. In the case of this loudspeaker, it is
clear that listeners’ complaints about colored midrange sound were caused by the way the loudspeaker was designed.
Focusing on the direct/on-axis sound ignored the fact that what we hear in a room is very much influenced by sounds
radiated in other directions.
Now, let us return to the question posed in the title of this section: what do we measure? In terms of frequency response,
it is clear that no one curve is sufficient to describe how a loudspeaker will sound in a room. An on-axis curve is but a
beginning. It is necessary to have enough multi-directional data to be able to estimate the early-reflected sound arriving
at a listener, as well as total sound power, as an estimator of reverberant energy. In short, one must measure everything.
issue. Neither are equalizers that are limited to attenuating room resonances at very low frequencies. At issue are “brand
X” equalizers, operating over the entire audio bandwidth and being applied, after measurements in a room, to the
“correction” of “brand Y” loudspeakers. Such an endeavor seems more like a research project than a product, at this
stage in the game. Ironically, it amounts to finishing the design of the loudspeaker after it is installed in the room.
It is worth noting here that equalization is, at best, a partial solution for the kind of problem illustrated in Figure 10.
Above about 300 Hz, changing the shape of the room curve destroys the only good thing the loudspeaker had, its on-axis
response. By altering its frequency response, this loudspeaker can be made to sound different, possibly slightly better,
but it can never equal the sound of a loudspeaker that is properly designed to begin with.
To avoid coloration in its interaction with the listening room, a loudspeaker must be well behaved both on and off axis. If
it is designed to have relatively constant directivity, as a function of frequency, then it is possible for all three categories
of sound, direct, early reflected, and reverberant, to exhibit similarly accurate timbral signatures. A loudspeaker that has
constant, or at least smoothly-and-gradually-changing directivity, as a function of frequency, can be equalized if it does
not exhibit a smooth flat frequency response. But this needs to be known in advance, and most manufacturers don’t, or
cannot, measure directivity as a parameter. If they could, they might have made the loudspeaker flat and smooth to begin
with.
At low frequencies, we enter much more complex territory because it involves the room. This is a subject that will be
addressed in a later section. For the moment, it is sufficient to note that there are some things that equalization can do
very well at low frequencies, and other things that should not even be attempted.
[Figure 11: a curve annotated “The information in this curve needs to be separated into: ... Acoustical interference”.]
Figure 11 describes what might be possible, in an ideal world, to address the first of the listed requirements. As it turns
out, the real world is not all that far from this goal, but some special techniques are necessary.
The importance of spectral balance is unquestioned. A few minutes experimenting with an equalizer, or even simple
bass and treble controls, underlines how dramatically sound quality is influenced by broadband trends in frequency
response. Fortunately, this is the easiest parameter to isolate. Spectral smoothing can do it, or it can be done simply
with the eye-brain combination, which is very effective at identifying such underlying trends, even in unprocessed data.
Resonances are also fundamental audible problems, and it is interesting to speculate why. Resonances are the
fundamental building blocks of most of the sounds we are entertained and informed by. Vocal tract resonances give our
voices their distinctive timbres, such that we can usually be recognized even over limited bandwidth telephones. High-Q
(frequency-specific) resonances provide the perceived pitches for musical instruments - they play the tunes. Medium-
and low-Q resonances are excited by the vibrating strings, air columns, etc. in musical instruments, and through sounds
radiated by thin resonant panels, acoustical cavities, and the like, they give the instruments their individual auditory
identities, their distinctive timbres. Some of us can distinguish among pianos of different brands, recognize violins of
old Italian origin, and so on. It should therefore come as no surprise that, at the end of careful subjective evaluations,
many of the problems identified by listeners are traceable to resonances in the loudspeakers. Our ear-brain combination
has become very sensitive to even small changes in the resonant structure of sounds. Opportunities for resonances
abound in loudspeaker systems, in transducers, in enclosures and, last but not least, in the listening room.
The final category, acoustical interference, is one that can give us some problems. Sometimes, the visual evidence in a
curve is such that it looks remarkably like a resonance; however, to the ears, when listening in a room, there is no audible
problem. The explanation is that acoustical interference exists because of the interaction of sounds from two or more
origins, such as a direct sound from a midrange driver, and diffracted sounds from each of the sharp corners of a
rectangular enclosure. A key factor that distinguishes interference of this kind is that, because of differences in each of
the propagation path lengths, the frequency response irregularities change with microphone location. So, if one performs
several measurements, each at different angles on and off axis of the loudspeaker, an average of these measurements will
tend to suppress visual evidence of acoustical interference. The perceptual version of this is that, in a room, each
reflected sound involves a different angle of radiation from the loudspeaker, so that all of the sounds arriving at the ears
have different timbral signatures. The result is a kind of perceptual suppression of acoustical interference. Resonances,
in contrast, tend to radiate similar misbehavior over wide ranges of angles, and these will tend to remain as visible
deviations in the spatially-averaged curve. Perceptually, the resonances are repeated in each reflected version of the
sound, and are easily audible. Fortunately, this simple process of spatial averaging of measurements is a remarkably
effective means of isolating resonant misbehavior.
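The effect of spatial averaging can be illustrated with a toy model. The sketch below is not based on measured data: it assumes a single delayed, attenuated copy of the direct sound whose delay changes with measurement angle (standing in for a diffraction path), plus a simple resonance-shaped bump that is identical at every angle. All parameter values are assumptions chosen for illustration.

```python
import numpy as np

freqs = np.linspace(200.0, 5000.0, 481)   # Hz, 10 Hz steps
f0, q, bump_db = 1000.0, 5.0, 3.0         # assumed resonance parameters
# Simple resonance-shaped bump, the same at every measurement angle.
resonance_db = bump_db / (1.0 + (q * (freqs / f0 - f0 / freqs)) ** 2)

# One delayed, attenuated copy of the direct sound; the delay is assumed
# to change with measurement angle (path-length difference).
delays_s = np.linspace(0.2e-3, 1.0e-3, 9)
gain = 0.3
curves_db = np.array([
    20.0 * np.log10(np.abs(1.0 + gain * np.exp(-2j * np.pi * freqs * d))) + resonance_db
    for d in delays_s
])

# Spatial average of the nine curves, taken on a power basis.
average_db = 10.0 * np.log10(np.mean(10.0 ** (curves_db / 10.0), axis=0))

band = freqs >= 2000.0                    # region away from the resonance
ripple_single = np.std(curves_db[0][band])
ripple_average = np.std(average_db[band])
peak = average_db[np.argmin(np.abs(freqs - f0))] - np.median(average_db[band])

if __name__ == "__main__":
    print(f"ripple, single curve:        {ripple_single:.2f} dB")
    print(f"ripple, spatial average:     {ripple_average:.2f} dB")
    print(f"resonance height in average: {peak:.2f} dB")
```

In this toy case the interference ripple shrinks substantially in the averaged curve, while the bump at 1 kHz, being common to all angles, remains clearly visible.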
Figure 13 shows an example of the results, in this case the detection thresholds as revealed by symphonic music, which
was less revealing than pink noise, and more revealing than multi-miked, pan-potted, non-reverberant music. It is
interesting that the result is counter-intuitive in the sense that the high-Q resonances, with the associated long ringing
decays, are more difficult to hear than the medium- and low-Q resonances with their better temporal behavior. Looking at
these phenomena in the time domain can lead one to an erroneous conclusion about the audibility of a problem! What
the eyes see does not correspond to what the ears hear. This is psychoacoustics.
Figure 13. Detection threshold levels for resonances as revealed to listeners through symphonic music. From top to
bottom, the curves represent steady-state measured levels for resonances with Q’s of 50, 10 and 1.
[Plot axes: level in dB (10 to -30) versus frequency, 100 Hz to 10 kHz; curves labeled Q = 50, Q = 10 and Q = 1.]
Figure 14 shows these data translated into a form that is more useful in practice: the measured deviation in a frequency
response curve that would be caused by resonances just at the point of detection. Any lower, and the resonance would be
perceptually eliminated for this or less revealing music. Any higher, and the resonance has an increasing risk of being
audibly detected as a change in the instrumental or vocal timbre.
Of practical concern is the precision required in measurements in order to accurately reveal the nature and amplitude of
these resonances. High resolution is required in both amplitude and frequency. High-Q resonances can be found at low
frequencies in loudspeaker drivers, enclosures and in rooms. Time-windowed, simulated anechoic, FFT or TDS
measurements can reveal these phenomena with enough accuracy for them to be interpreted only if the time windows are
very large – so large as to deny their use in conventional rooms. Figure 15 shows an example of how audible problems
can be hidden by the poor low-frequency resolution of a 17 ms time windowed FFT measurement, having a 60 Hz
frequency resolution. This is better than is done in much of the industry, requiring a very large measurement space.
One-third-octave measurements would show none of this, and are useless for anything other than broadband, tone-
control kinds of measurements and adjustments – see “spectral balance” in Figure 11. Anechoic chambers, and the great
outdoors, are NOT obsolete as measurement spaces.
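The 60 Hz figure follows from the reciprocal relationship between window length and frequency resolution, roughly 1/T for a window of length T. The sketch below also estimates how much longer than the direct path the first reflection path must be for the reflection to arrive after the window closes. The speed of sound of 343 m/s is an assumption, not a value from the text, and the function names are illustrative.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature

def frequency_resolution_hz(window_s):
    """Approximate resolution of a time-windowed measurement: about 1 / window length."""
    return 1.0 / window_s

def extra_reflection_path_m(window_s, c=SPEED_OF_SOUND_M_S):
    """Minimum extra path length a reflection needs to arrive after the window closes."""
    return c * window_s

if __name__ == "__main__":
    for window_ms in (5.0, 17.0, 100.0):
        t = window_ms / 1000.0
        print(f"{window_ms:6.1f} ms -> {frequency_resolution_hz(t):6.1f} Hz, "
              f"{extra_reflection_path_m(t):5.1f} m extra path")
```

A 17 ms window gives about 59 Hz, consistent with the 60 Hz quoted, and already needs every reflection path to be nearly 6 m longer than the direct path. A 10 Hz resolution would need a reflection-free window of 100 ms, or roughly 34 m of extra path under the same assumption.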
Figure 15. High-Q resonances at equal level, distributed across the audible frequency range are progressively hidden at
low frequencies by the limited frequency resolution of a time-windowed measurement. Window = 17 ms. Frequency
resolution = 60 Hz.
[Plot: level in dB versus frequency, 20 Hz to 20 kHz, with the labels “True Level” and “CANNOT MEASURE WHAT WE HEAR”.]
The Frequency and Time Domains / Waveform Fidelity / The Audibility of Phase
What happens in the frequency domain is only partly described by the frequency response (amplitude vs. frequency).
The other half of the description is the phase vs. frequency characteristic, the phase response. Both of these
measurements together, processed by the Fourier transformation, give us the time-domain behavior, the so-called
transient or impulse response. The reverse relationship is also true, that a time response measurement can be Fourier
transformed into the amplitude and phase responses vs. frequency. In other words, comprehensive measurements in the
frequency domain are sufficient to completely describe the behavior of a device in the time domain, and vice versa. A
surprisingly high percentage of professional audio people do not know this important fact. As a consequence, a difficult
subject is further complicated by misunderstanding.
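The equivalence of the two domains can be checked numerically. The following sketch uses NumPy's FFT routines on an arbitrary decaying test signal (a stand-in, not a measured loudspeaker response): amplitude and phase together reconstruct the time-domain waveform, while amplitude alone does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# Arbitrary decaying test signal standing in for a measured impulse response.
impulse_response = rng.standard_normal(n) * np.exp(-np.arange(n) / 8.0)

spectrum = np.fft.rfft(impulse_response)
amplitude = np.abs(spectrum)      # frequency response (amplitude vs. frequency)
phase = np.angle(spectrum)        # phase response (phase vs. frequency)

# Amplitude and phase together recover the time-domain behavior.
rebuilt = np.fft.irfft(amplitude * np.exp(1j * phase), n=n)

# Amplitude alone (phase discarded) yields a different waveform.
amplitude_only = np.fft.irfft(amplitude, n=n)

if __name__ == "__main__":
    print("max error with amplitude and phase:", np.max(np.abs(rebuilt - impulse_response)))
    print("max error with amplitude only:     ", np.max(np.abs(amplitude_only - impulse_response)))
```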
Now, while amplitude vs. frequency, the frequency response, is an easy characteristic to measure, its other half, phase
response, is a little more difficult. But, for much of what we do, it is of no consequence. There are two reasons.
1. It is fortunate, in the audio industry, that humans are relatively, indeed substantially, insensitive to phase shift.
As an audible factor it can largely be ignored. From an engineering perspective, this comes as a surprise,
because it is such a logical concept to strive for accurate waveform reproduction (which requires both flat
amplitude and phase responses, and results in a perfect impulse response). Consequently, over the years,
several independent investigations have been launched to look into the audibility of phase. Each time the result
has been a great disappointment to anyone hoping for a strongly positive result. It turns out that, within very
generous tolerances, humans are insensitive to phase shifts. Under carefully contrived circumstances, special
signals auditioned in anechoic conditions, or through headphones, people have heard slight differences.
However, even these limited results have failed to provide clear evidence of a “preference” for a lack of phase
shift. When auditioned in real rooms these differences disappear, as the combination of direct and reflected
sounds destroys any semblance of phase coherency. Why is this so? Perhaps it is because, in real life, we listen
in rooms, to a combination of direct and reflected sounds. Musical instruments, such as pianos, rely on direct
sounds, and sounds reflected from the open top of the instrument, and from the floor underneath, to convey the
total sound energy to listeners. Is there a definitive waveform that describes the sound of a piano? I wonder
how, or where, one would measure it? The answer, of course, is no. Every microphone location would yield a
different waveform, yet, to our ears, the sound is unambiguously definitive.
2. The second fortunate fact related to phase response has to do with the reality that many devices in the audio
realm can be described as “minimum-phase devices”. It is a convenient property of such devices that, if one
measures only the frequency response, one can calculate the phase response from it. Knowing both, one can
calculate the time response. So, misbehavior in the frequency response, such as evidence of a resonance,
translates unambiguously into evidence of ringing in the time domain. So, what are these convenient devices?
Many forms of analog and digital electronics function in this manner, as do loudspeaker transducers: woofers,
midranges and tweeters, and just to add icing to the cake, so do room resonances at low frequencies. In each
case, a perfectly smooth, flat frequency response means that the time-domain behavior will be pristine, at least
over the bandwidth of each of the “minimum-phase” devices. In a loudspeaker system, the crossover regions,
where two or more transducers can operate simultaneously, clearly violate the conditions, since there can be
time delays between sounds arriving from each one of them. Nevertheless, the advantages of this fact are
considerable. For example, if a minimum-phase device has a problem resonance, which shows itself as a bump
in the frequency response, this resonance can be eliminated by adding the perfectly equal and opposite
minimum-phase parametric equalizer in series with the device. Simultaneously, the bump in the frequency
response and the associated ringing in the time domain are eliminated. As loudspeaker systems become active,
with dedicated electronics, more of this will be possible. Dedicated electronics can make a good loudspeaker
better, especially in a room.
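The equal-and-opposite correction described in point 2 can be demonstrated with a small simulation. This is a sketch under stated assumptions: the "problem resonance" is modeled as a second-order peaking filter in the commonly used audio-EQ-cookbook form, and its parameters (+6 dB at 1 kHz, Q = 10, 48 kHz sampling) are invented for illustration. In that form, the filter with the opposite gain is the exact inverse, so the cascade should return the original impulse.

```python
import math
import numpy as np

def peaking_biquad(f0_hz, q, gain_db, fs_hz):
    """Peaking-filter coefficients (b, a) in the audio-EQ-cookbook form."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return b, a

def run_biquad(b, a, x):
    """Direct-form difference equation for one second-order section."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc / a[0]
    return y

fs = 48000.0
impulse = np.zeros(2048)
impulse[0] = 1.0

# Assumed "problem resonance": +6 dB bump at 1 kHz with Q = 10.
ringing = run_biquad(*peaking_biquad(1000.0, 10.0, +6.0, fs), impulse)
# Equal-and-opposite equalizer: same frequency and Q, -6 dB.
corrected = run_biquad(*peaking_biquad(1000.0, 10.0, -6.0, fs), ringing)

if __name__ == "__main__":
    print("ringing after the impulse:    ", np.max(np.abs(ringing[1:])))
    print("residual after equalization:  ", np.max(np.abs(corrected - impulse)))
```

The uncorrected response rings after the initial sample; after the matching equalizer, the output differs from the input impulse only by rounding error, so the bump and the ringing disappear together.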
Technically, it is necessary to measure a loudspeaker from enough perspectives to be able to anticipate its behavior in a
room. Figure 16, below, shows such a collection of data derived from anechoic measurements.
[Figure 16 curve labels: direct sound, listening window, early reflections, sound power, directivity index.]
Figure 16. From a collection of 72 anechoic frequency response measurements, measured on horizontal and vertical
orbits, estimates of direct, early reflected, and reverberant (sound power) sounds have been calculated. In addition, a
spatial average over a ±30º horizontal and ±10º vertical “listening window” has been calculated, as well as the
traditional directivity index (the difference between the direct sound and sound power) and an invented directivity index
(the bottom curve) showing the difference between the direct sound and the early reflected sounds.
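The arithmetic behind the listening window and directivity index curves can be sketched for a single frequency. This is a rough illustration only. It assumes the 72 measurements are taken every 10 degrees on the two orbits, it uses a made-up directional pattern in place of real data, and it estimates sound power with an unweighted energy average, whereas a rigorous calculation would weight each measurement by the portion of the sphere it represents. All names are hypothetical.

```python
import math

def energy_average_db(levels_db):
    """Average levels given in dB on an energy (power) basis."""
    mean_power = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_power)

def hypothetical_level_db(angle_deg):
    """Made-up directional pattern: 0 dB on axis, falling to -12 dB at the rear."""
    return -6.0 * (1.0 - math.cos(math.radians(angle_deg)))

# Two orbits sampled every 10 degrees give 36 + 36 = 72 measurements
# (the 10 degree step is an assumption).
angles = range(0, 360, 10)
horizontal = {a: hypothetical_level_db(a) for a in angles}
vertical = {a: hypothetical_level_db(a) for a in angles}

# Direct sound: the on-axis measurement.
direct_db = horizontal[0]

# Listening window: +/-30 degrees horizontally, +/-10 degrees vertically.
window = [horizontal[a % 360] for a in (-30, -20, -10, 0, 10, 20, 30)]
window += [vertical[a % 360] for a in (-10, 10)]
listening_window_db = energy_average_db(window)

# Crude sound power estimate: unweighted energy average of all 72 values.
sound_power_db = energy_average_db(list(horizontal.values()) + list(vertical.values()))

# Traditional directivity index: direct sound minus sound power.
directivity_index_db = direct_db - sound_power_db

print(f"listening window:  {listening_window_db:.2f} dB")
print(f"sound power:       {sound_power_db:.2f} dB")
print(f"directivity index: {directivity_index_db:.2f} dB")
```

Repeating the same calculation at each measured frequency would yield curves of the kind labeled in Figure 16. An omnidirectional source, with equal level at every angle, gives a directivity index of 0 dB.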
The loudspeaker used in the example shown in Figure 16 is a very good performer that does very well in subjective
evaluations. It is an expensive consumer product. Such is not the case with all loudspeakers, of course, and it is
interesting to correlate the kinds of data shown in Figure 16 with the results of hundreds of controlled listening tests.
So, where do we stand today? Those who care to dig out reference 12 can see where we were in the mid-1980s. The
best loudspeakers of that era would make a respectable showing today, but none would rate as highly as the best of this
generation of products. However, it is also fair to say that the average level of performance has gone up. It is no longer
necessary to take out a loan to have very high sound quality at home. Even many portable audio and car audio systems
are sounding quite natural these days. However, there remain glaring examples of opportunism or incompetence, as
shown in Figure 19. Perhaps there always will be.
A fundamental change has occurred in the professional monitor category of products, though. In the past, in order to
make the professional products reliable, capable of being played very loud for long periods of time, and reasonably
sensitive – in short “bulletproof” – there were tradeoffs in sound-quality. Now, we have the knowledge and the
technology to be able to have both superb sound quality and reliability. A professional studio monitor now can have as
much “finesse” and “detail” as the highest of high end audio gear, with the important difference that it can play very loud
without distress, or power compression, and be physically abused, and survive. In this sense, some consumer products
need to catch up.
In terms of the “circle of confusion” shown at the beginning of this paper, we have arrived. The bar has been raised for
the industry; there is no need to compromise the recorded “art”.
As a result, customers shopping for loudspeakers today are confronted with a variety of options:
• Forward-facing direct radiators - the normal woofers, midranges and tweeters arrayed on one face of a six-sided
box.
• Forward- and backward-facing direct radiators – if they appear in approximately equal numbers on each face,
and radiate in the same polarity, the systems are called “bipoles”, implying bi-directional in-phase sound
radiation. In effect, such speakers are essentially omnidirectional at low and middle frequencies, showing front-
back favoritism only at the highest frequencies where tweeters begin to beam.
• Forward / backward out-of-phase radiation, i.e. dipoles. All electrostatic and electromagnetic “panel”
loudspeakers fall into this class as do some cone/dome arrangements. They are distinguished by acoustical
nulls at the ±90º axes.
• Omnidirectional. Equal sound radiation in all directions is neither practical nor necessary, so most such systems
aim to achieve a uniform dispersion in the horizontal plane.
• Other. Various combinations of drivers, pointing in different directions, designed to deliver different ratios of
direct and reflected sounds to the listening area.
[Figure 22 polarity labels: “+ +” (bipole, in phase); “+ -” (dipole, out of phase)]
Figure 22. Three of the commonly available directivities in loudspeakers, showing how low, medium and high
frequencies radiate from each type. On the left is the common forward-facing direct radiator, in the middle is the so-
called “bipole”, and on the right is the classic dipole radiator (the low frequencies have dispersion identical to the mids).
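The difference between bipole and dipole radiation can be sketched with a highly idealized model: two point sources, one facing forward and one backward, radiating in free space either in the same polarity or in opposite polarity. This is an assumption-laden illustration, not a model of any real loudspeaker. It ignores cabinet diffraction and driver beaming, and the 0.3 m spacing, the 343 m/s speed of sound and the function name are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def two_source_magnitude(angle_deg, freq_hz, spacing_m, polarity):
    """Far-field magnitude of two ideal point sources on the front-back axis.

    polarity = +1 models a bipole (in phase), -1 a dipole (out of phase).
    angle_deg is measured from the forward axis.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    half_path = 0.5 * k * spacing_m * math.cos(math.radians(angle_deg))
    front = cmath.exp(1j * half_path)
    back = polarity * cmath.exp(-1j * half_path)
    return abs(front + back)

SPACING = 0.3  # hypothetical front-to-back source spacing, m
FREQ = 100.0   # a low frequency, Hz

for name, polarity in (("bipole", +1), ("dipole", -1)):
    row = [two_source_magnitude(a, FREQ, SPACING, polarity) for a in (0, 45, 90)]
    print(name, [round(v, 3) for v in row])
```

At this low frequency the in-phase pair radiates almost equally toward 0, 45 and 90 degrees, while the out-of-phase pair shows the null at 90 degrees that distinguishes a dipole.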
The differences among these types of loudspeakers are so great that listeners will be exposed to quite different
perceptions of direction, space and envelopment for any given recording. Since these factors are an important part of the
sound design process undertaken in the recording control room, what we have here is a record/reproduction system that
is out of control. The “circle of confusion” is truly in place, since consumers can, by the selection of loudspeaker
directivity, change important components of the “art”. When they do so, they find that, if certain kinds of recordings are
flattered by the manipulation, there will be others that are not. All recording studios that the author is aware of employ
front-facing direct radiator studio monitoring loudspeakers. The rooms in which they are used tend to be acoustically
less reflective than typical homes which, it is reasonably argued, allows audio engineers to be more analytical in the
mixing and mastering processes. For recreational listening, even audio professionals tend to enjoy a higher proportion of
reflected sounds. However, multidirectional loudspeakers move the discussion into a different category.
point, with headphones. Now that we have five essentially independent channels, it is possible to add this decorrelation
to the “mix”, to make the sense of ambiguity and space an artistic variable.
The front left, center and right loudspeakers, following these guidelines, need to be able to place listeners in a strongly
direct sound field. This can be achieved through the use of loudspeakers with sufficient directional control to address the
audience, without energizing the room boundaries. It can also be achieved with loudspeakers of any directivity, but with
sound absorption on the major reflecting surfaces to attenuate the delayed arrivals at the listening positions. In either
case, or in a combination of both approaches, one needs to have a means to assess where one is - a measurement method
that correlates, psychoacoustically, with perceptions.
[Figure 23 plot: relative level (dB, +10 to -40) versus delay (0 to 60 ms), with the direct sound at 0 dB. Curves, top to bottom: equal level (“Haas effect”); separate source; image-shift threshold; spaciousness detection threshold. Test signal: speech.]
Figure 23. The sequence of perceptual effects of a single reflection, as determined with speech as a signal, for a
reflection arriving from a simulated right adjacent room wall. The bottom curve shows the absolute threshold of
detection. When listeners were asked what they heard, “spaciousness” or similar descriptions were most
common. About 10 dB higher, the perception was that the dominant sound image had changed size and/or shifted
position towards the reflected sound, although the reflection itself was still not perceived as a distinct spatial event.
About 10 dB higher still, listeners were aware of the reflected sound as a second source. Obviously, the tests were simulated,
since the reflected sound is higher in level than the direct sound, something that is unlikely to occur in nature. The top
curve is truly artificial, in the sense that the “reflected” sound has been adjusted in amplitude so that, subjectively, it is as
loud as the direct sound. This is the curve that Haas generated in his experiments. The six dots clustered just below the
0 dB, 0 delay intersection are the first six reflected sounds arriving at a listener in a typical domestic or control room.
Obviously they cannot be ignored. Data are from reference 17; the compilation from reference 18; Haas is cited therein.
In Figure 23, it is important to note that all of the curves are continuous. There is no change in perceptions at delays
below 15 ms, 20 ms, or any other arbitrarily selected point in the delay domain. The widespread notion that perceptual
events are somehow “merged” at short time delays is simply not supported by the evidence from this study or that of
Haas, the usually cited source. Temporal effects on speech intelligibility, for example, can occur, but they tend to be
signal dependent. The data shown in Figure 23, and those in Haas, were determined with running speech – respectively
English and German, one presumes. Intelligibility has to do with average syllabic rates, voice spectra, and so on. If we
listened only to speech, such generalizations might be sufficient, but we do not.
Another important message from Figure 23 is that all reflected sounds above the threshold of audibility affect one’s
sense of “space”, at the very least. The room, any room, therefore becomes a part of the listening experience.
The curves in Figure 23 all have basically the same shape. The top curve, from Haas, is different, presumably because of
some differences in the experimental setup, or the nature of the speech signal (his would be in German). When we
repeated the Haas experiment, the shape of the topmost curve followed that of the lower ones. However, the story does
not end there. The shapes of the curves change very dramatically for other kinds of sounds (17). With transient sounds,
the thresholds at short delays are higher, and beyond about 10 ms the thresholds drop precipitously, meaning that the
reflections exhibit their audible effects at very much lower sound levels. With more continuous sounds, like reverberant
classical music, the curves flatten out, making delay almost irrelevant, and late reflected sounds are much less audible.
So, as detailed as Figure 23 is in some ways, it is but an entry point to understanding the perceptual effects of delayed
sounds. Since we listen to much more than just running speech, designing a listening experience based on these data
alone is wildly simplistic.
And yet we do it, perhaps because we can. It is, after all, a start. EBU Tech 3276-E, perhaps because of the importance
of speech in that business, specifies a restriction on early reflected sounds in control rooms that is sufficient to catch
those that fall into the first 15 ms following the direct sound, and that are within 10 dB of the direct sound amplitude. In
Figure 23 this just nicely eliminates the first 6 reflections shown by the dots. The recommendation is that these
reflections be eliminated at frequencies in the band 1 to 8 kHz. A layer of one-inch thick fiberglass board on the
offending surface will suffice.
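The criterion as paraphrased above (reflections arriving within 15 ms of the direct sound and within 10 dB of its level) lends itself to a simple check. The sketch below is illustrative only: the function name is hypothetical, the list of reflections is invented for the example, and the standard itself should be consulted for the exact wording and measurement conditions.

```python
def flag_early_reflections(reflections, window_ms=15.0, level_range_db=10.0):
    """Return the reflections that call for treatment under the criterion above.

    Each reflection is (delay_ms, level_db), with the level given relative
    to the direct sound (0 dB). The default thresholds are the figures
    quoted in the text.
    """
    return [
        (delay_ms, level_db)
        for delay_ms, level_db in reflections
        if 0.0 < delay_ms <= window_ms and level_db >= -level_range_db
    ]

# Invented reflections for illustration only: (delay in ms, level in dB).
measured = [(2.1, -3.0), (4.8, -5.5), (9.0, -8.0), (13.5, -12.0), (22.0, -6.0)]

flagged = flag_early_reflections(measured)
print(flagged)
```

Here the first three reflections are flagged. The fourth is early but more than 10 dB down, and the fifth is strong but arrives after the 15 ms window.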
Figure 24. An Energy-Time Curve (ETC) showing the direct and reflected sounds arriving at a listening position in a room. It is frequently assumed that the amplitudes of the spikes are directly related to the audibility of the reflected sounds. Often they are not.
[Figure 24 annotation: “Do these levels correspond to what we hear?” Horizontal axis: Time.]
The fundamental problem with the ETC, and indeed conventional “reflectograms”, is that they contain no information
about spectrum.
[Figure 25 annotations: “It is the spectrum level of the reflection that determines its audibility, not the height of the ETC spike.” “Both of these reflections are at the threshold of detection.” Panel labels in each column: Spectrum Level; Energy-Time Curve.]
Figure 25. An illustration of how the amplitude of a simple time-domain measurement can be seriously misleading. The
top data block is the familiar “waterfall” diagram, showing frequency as the horizontal axis, amplitude as the vertical
axis, and time moving back to front. The reflection of interest here is the one closest to the front. The middle data block
is the spectrum of that reflection. The bottom data block is the ETC. In each column of data, the level of the reflection
has been adjusted until it is just at the threshold of detection. Attenuating the high frequencies of a reflection, by
introducing some high-frequency absorption at the reflection point, dramatically reduces the height of an ETC spike, but
the audible threshold for the reflection is unchanged. We hear something different, but we still hear the effects of the
reflection. In measurements, it is necessary to be analytical in the frequency domain. From reference 17.
Having criticized the ETC as a measurement method for this purpose, let us now propose a technique for making it
useful. If the test signal is bandpass filtered, say into octave bands, the ETC for each band can be used to build up a
picture of what is happening. It is equivalent to taking frequency-limited slices through the waterfall diagrams in Figure
25.
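The band-by-band idea can be sketched with a synthetic impulse response. This is a simplified stand-in under stated assumptions, not a measurement procedure: the reflection is modeled as a half-amplitude copy of the direct sound, delayed 5 ms and passed through a one-pole lowpass near 1 kHz to mimic high-frequency absorption; the octave bands use simple second-order bandpass filters; and windowed energy ratios take the place of reading levels off a true ETC, which is normally derived from the analytic signal. All names and values are hypothetical.

```python
import math

FS = 48000.0  # hypothetical sample rate, Hz

def bandpass_biquad(f0, q=math.sqrt(2.0)):
    """Second-order bandpass with 0 dB peak gain, roughly one octave wide."""
    w0 = 2.0 * math.pi * f0 / FS
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad_filter(b, a, x):
    """Run signal x through one biquad section (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def energy_db(samples):
    """Total energy of a block of samples, in dB."""
    return 10.0 * math.log10(sum(v * v for v in samples))

# Synthetic impulse response: direct sound at t = 0, reflection at 5 ms.
SPLIT = 240  # 5 ms at 48 kHz; samples before this belong to the direct sound
ir = [0.0] * 720
ir[0] = 1.0
pole = math.exp(-2.0 * math.pi * 1000.0 / FS)  # one-pole lowpass near 1 kHz
for n in range(SPLIT, len(ir)):
    ir[n] += 0.5 * (1.0 - pole) * pole ** (n - SPLIT)

# Height of the broadband reflection spike relative to the direct sound.
spike_db = 20.0 * math.log10(max(abs(v) for v in ir[SPLIT:]))

# Reflection level relative to the direct sound, one octave band at a time.
band_levels = {}
for f0 in (500.0, 1000.0, 2000.0, 4000.0, 8000.0):
    b, a = bandpass_biquad(f0)
    y = biquad_filter(b, a, ir)
    band_levels[f0] = energy_db(y[SPLIT:]) - energy_db(y[:SPLIT])

print(f"broadband spike height: {spike_db:.1f} dB re direct")
for f0, level in band_levels.items():
    print(f"{f0:6.0f} Hz band: reflection {level:.1f} dB re direct")
```

In this example the broadband spike is small, yet in the lowest band the reflection remains much closer to the level of the direct sound. That is the kind of information a single broadband ETC conceals.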
As a result of multichannel audio, there are some new considerations that apply to our room acoustic efforts:
• It is usual for all of the low bass to come from a common-channel subwoofer. This leaves us free to locate this
device for optimum bass performance, while locating the five other loudspeakers for optimum directional and
spatial effects.
• We now attempt to design to satisfy several listeners, not just a solo listener in a “sweet spot”. Optimally
locating the listeners in the standing-wave patterns can greatly improve the uniformity of bass response at
several seating locations.
• Multiple subwoofers can, in some circumstances, be used to achieve more uniform bass performance
over a listening area, especially along a single axis: width or length. The principle at work here is to use the
multiple sources to selectively energize or cancel modes affecting what is heard.
• The acoustical damping of low-frequency room modes, always a “good thing”, now takes on an enhanced
importance since it can result in more uniform bass performance over a listening area. Low-Q modes have
lower pressure peaks, and higher pressure minima.
• If one is successful in achieving a relatively uniform bass performance over a listening area, it is then possible
to apply equalization to make it sound good. The challenge here is to know what can be corrected, and what
cannot.
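The principle of using multiple sources to selectively energize or cancel modes can be sketched for the simplest case: axial standing waves between two parallel, rigid walls. This is a textbook idealization offered as an illustration, not a design method. The 6 m dimension, the 343 m/s speed of sound and the function names are hypothetical, and real rooms add damping, other mode families and non-rigid boundaries.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def axial_mode_frequency(order, length_m):
    """Frequency of the axial standing wave of a given order between rigid walls."""
    return order * SPEED_OF_SOUND / (2.0 * length_m)

def mode_shape(order, x_m, length_m):
    """Relative sound pressure of the mode at distance x from one wall."""
    return math.cos(order * math.pi * x_m / length_m)

def mode_drive(order, source_positions_m, length_m):
    """Net excitation of a mode by identical in-phase sources."""
    return sum(mode_shape(order, x, length_m) for x in source_positions_m)

LENGTH = 6.0  # hypothetical room dimension along one axis, m

for order in (1, 2, 3):
    f = axial_mode_frequency(order, LENGTH)
    one_sub = mode_drive(order, [0.0], LENGTH)
    two_subs = mode_drive(order, [0.0, LENGTH], LENGTH)
    print(f"mode {order}: {f:5.1f} Hz, one sub {one_sub:+.2f}, two subs {two_subs:+.2f}")
```

In this model, identical subwoofers at opposite walls drive the odd-order modes with opposite signs, so those modes cancel while the even-order modes are reinforced. The same `mode_shape` function shows why seating matters: a listener at mid-room sits in the pressure null of the first-order mode.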
[Figure 26 panel labels: BEFORE; AFTER. Embedded caption: “Figure 2. Frequency response measurements of a subwoofer, before and after a single band of parametric equalization.”]
Figure 26. High-resolution measurements of a subwoofer in a room, before and after equalization with a single band of
parametric filtering. It can be seen that the time domain misbehavior, as well as the frequency-domain misbehavior, has
been greatly improved. Audibly, the difference was dramatic.
6. CONCLUSION
With an A-to-Z topic such as this, it is difficult to be concise. Inevitably there have been omissions, as well as selective
emphases. Still, it is important to try to summarize one’s understanding of a subject from time to time. It is clear, to this
writer at least, that in terms of understanding why loudspeakers sound the way they do, and what we can do to improve
sound quality in rooms, we are far ahead of where we were 15 years ago. In terms of delivering good sound through
loudspeakers, we are much more technically advanced than we were, and through the application of good science and
good engineering there are many more truly good loudspeakers in circulation. We also think we have the technology to
deliver more uniformly good sound to many acoustical spaces. That is progress.
That the improvements apply equally to professional and consumer audio means that the “circle of confusion” is not now
the problem that it once was. However, visits to some recording studios remind one that old habits die hard. Some
places have some catching up to do.
Music and movies are art. Audio is a science. By increasing our understanding of the science of audio, and intelligently
applying the technology that derives from it, we will continue to make progress.
Multichannel audio presents a wonderful opportunity to create sounds in isolation, and sounds to accompany pictures,
that stimulate and entertain people in ways never before possible. “Realistic” sounds can be more real. Contrived
sounds can be even more adventurous. As important as anything, I think, is the fact that we have moved from stereo, an
essentially antisocial mode of listening, to a mode where several listeners can experience something close to what the
creators intended. Multichannel audio should be a social experience, just as the “real thing” is. That also is progress.
Maximizing the performance of such systems in the many varieties of real rooms our customers listen in is no small feat.
Nevertheless, the audio industry is on track to achieve just that.
Apologia
Most of this paper was written while I was on vacation, away from my references and resources. I have omitted several
obvious references to the work of others, and for this I apologize. If this were, or should become, an archival document,
the omissions will be rectified.
REFERENCES
(1) Toole, F.E., “Listening Tests, Turning Opinion Into Fact”, J. Audio Eng. Soc., vol. 30, pp. 431-445 (1982 June).
(2) Toole, F.E., “Subjective Measurements of Loudspeaker Sound Quality and Listener Performance”, J. Audio Eng.
Soc., vol. 33, pp. 2-32 (1985 January/February).
(3) Toole, F.E., “Listening Tests - Identifying and Controlling the Variables”, Proceedings of the 8th International
Conference, Audio Eng. Soc. (1990 May).
(4) Toole, F.E., “Subjective Evaluation”, in J. Borwick, ed. Loudspeaker and Headphone Handbook - Second Edition,
chap. 11, Focal Press, London, 1994, Third Edition: in press.
(5) Olive, S.E. "A Method for Training of Listeners and Selecting Program Material for Listening Tests", 97th
Convention, Audio Eng. Soc., Preprint No. 3893 (1994 November).
(6) Olive, S.E., “A Method for Training Listeners: Part II –“, presented at the 101st Convention, Audio Eng. Soc., (no
preprint), abstract published in J. AES Vol. 44, No. 12 (1996 Dec.).
(7) Toole, F.E. and S.E. Olive, "Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests and
Other Interesting Things", 97th Convention, Audio Eng. Soc., Preprint No. 3894 (1994 Nov.).
(8) Schuck, P.L., S. E. Olive, J. Ryan, F. E. Toole, S Sally, M. Bonneville, E. Verreault, K. Momtahan, "Perception of
Reproduced Sound in Rooms: Some Results from the Athena Project", pp.49-73, Proceedings of the 12th International
Conference, Audio Eng. Soc. (1993 June).
(9) Olive, S.E., P. Schuck, S. Sally, M. Bonneville, “The Effects of Loudspeaker Placement on Listener Preference
Ratings”, J. Audio Eng. Soc., Vol. 42, pp. 651-669 (1994 September).
(10) Olive, S.E., P. Schuck, J. Ryan, S. Sally, M. Bonneville, “The Variability of Loudspeaker Sound Quality Among
Four Domestic-Sized Rooms”, presented at the 99th AES Convention, preprint 4092 K-1 (1995 October).
(11) Olive, S.E., B. Castro and F.E. Toole, “A New Laboratory For Evaluating Multichannel Systems and Audio
Components”, 105th AES Convention, Audio Eng. Soc., preprint no. 4842 (1998 Sept).
(12) Toole, F.E., “Loudspeaker Measurements and Their Relationship to Listener Preferences”, J. Audio Eng. Soc., vol.
34, pt. 1, pp. 227-235 (1986 April), pt. 2, pp. 323-348 (1986 May).
(13) Toole, F.E. and S.E. Olive, “The Modification of Timbre by Resonances: Perception and Measurement”, J. Audio
Eng. Soc., vol. 36, pp. 122-142 (1988 March).
(14) Steinberg, J.C. and Snow, W.B., “Auditory Perspective - Physical Factors”, Electrical Engineering, vol. 53,
pp. 12-17, Jan. 1934.
(15) British Patent No. 394 325 “Improvements in and relating to Sound-transmission, Sound-recording and Sound-
reproducing systems” granted to Alan Blumlein and EMI, 1933. This is reprinted in J. Audio Eng. Soc., vol.6, pp. 91-
98, 130, April 1958.
(16) Toole, F.E., “The Future of Stereo”, Part 1, Audio, Vol.81, No.5, pp. 126-142 (1997, May), Part 2, Audio, Vol. 8,
No. 6, pp. 34-39 (1997 June). The text of this article, slightly updated, appears on the www.harman.com website, under
“white papers” with the title “How Many Channels?”
(17) Olive, S.E. and F.E. Toole, “The Detection of Reflections in Typical Rooms”, J. Audio Eng. Soc., vol. 37,
pp. 539-553 (1989 July/August).
(18) Toole, F.E., “Loudspeakers and Rooms for Stereophonic Sound Reproduction”, Proceedings of the 8th International
Conference, Audio Eng. Soc. (1990 May).
(19) Toole, F.E., “Loudspeakers and Rooms, Working Together”, in “white papers” at www.harman.com.
(20) Toole, F.E., “Maximizing Loudspeaker Performance in Rooms”, in “white papers” at www.harman.com.