The Auditory System at the Cocktail Party
Editors
John C. Middlebrooks
Department of Otolaryngology, Department of Neurobiology & Behavior, Department of Cognitive Sciences, Department of Biomedical Engineering, Center for Hearing Research
University of California
Irvine, CA
USA

Jonathan Z. Simon
Department of Electrical & Computer Engineering, Department of Biology, Institute for Systems Research
University of Maryland
College Park, MD
USA

Arthur N. Popper
Department of Biology
University of Maryland
College Park, MD
USA

Richard R. Fay
Loyola University of Chicago
Chicago, IL
USA
Series Preface
The following preface is the one that we published in Volume 1 of the Springer
Handbook of Auditory Research back in 1992. As anyone reading the original
preface, or the many users of the series, will note, we have far exceeded our original
expectation of eight volumes. Indeed, with books published to date and those in the
pipeline, we are now set for over 60 volumes in SHAR, and we are still open to new
and exciting ideas for additional books.
We are very proud that there seems to be consensus, at least among our friends
and colleagues, that SHAR has become an important and influential part of the
auditory literature. While we have worked hard to develop and maintain the quality
and value of SHAR, the real value of the books is due largely to the
numerous authors who have given their time to write outstanding chapters and to
our many coeditors who have provided the intellectual leadership to the individual
volumes. We have worked with a remarkable and wonderful group of people, many
of whom have become great personal friends of both of us. We also continue to
work with a spectacular group of editors at Springer. Indeed, several of our past
editors have moved on in the publishing world to become senior executives. To our
delight, this includes the current president of Springer US, Dr. William Curtis.
But the truth is that the series would not, and could not, be possible without the support
of our families, and we want to take this opportunity to dedicate all of the SHAR
books, past and future, to them. Our wives, Catherine Fay and Helen Popper, and our
children, Michelle Popper Levit, Melissa Popper Levinsohn, Christian Fay, and
Amanda Fay Seirra, have been immensely patient as we developed and worked on
this series. We thank them and state, without doubt, that this series could not have
happened without them. We also dedicate the future of SHAR to our next generation
of (potential) auditory researchers—our grandchildren—Ethan and Sophie
Levinsohn, Emma Levit, and Nathaniel, Evan, and Stella Fay.
Preface 1992
The cocktail party is the archetype of a complex auditory scene: multiple voices
compete for attention; glasses clink; background music plays. Other situations of
daily life, including busy offices, crowded restaurants, noisy classrooms, and
congested city streets, are no less acoustically complex. The normal auditory system
exhibits a remarkable ability to parse these complex scenes. Even relatively
minor hearing impairment, however, can disrupt this auditory scene analysis.
This volume grew out of the Presidential Symposium, “Ears and Brains at the
Cocktail Party,” at the Midwinter Meeting of the Association for Research in
Otolaryngology, held in 2013 in Baltimore, Maryland. In this volume, the authors
describe both the conditions in which the auditory system excels at segregating
signals of interest from distractors and the conditions in which the problem is
insoluble, all the time attempting to understand the neural mechanisms that underlie
both the successes and the failures. In Chap. 1, Middlebrooks and Simon introduce
the volume and provide an overview of the cocktail party problem, putting it into
the perspective of broader issues in auditory neuroscience. In Chap. 2,
Shinn-Cunningham, Best, and Lee further set the stage by elaborating on the key
concept of an auditory object, which can be thought of as the perceptual correlate of
an external auditory source and the unit on which target selection and attention
operate. In Chap. 3, Culling and Stone address the challenges of low-level separation
of signal from noise and consider the mechanisms by which those challenges
may be overcome. They introduce the distinction between energetic and informational
masking. Next, in Chap. 4, Kidd and Colburn develop the concept of
informational masking by focusing on speech-on-speech masking.
Computational models can aid in formalizing the basic science understanding of
a problem as well as in generating algorithms that exploit biological principles for
use in solving practical engineering problems. In Chap. 5, Elhilali considers the
challenges of creating useful computational models of the cocktail party problem.
Then, in Chap. 6, Middlebrooks considers the importance of spatial separation of
sound sources for stream segregation and reviews the psychophysics and physiological
substrates of spatial stream segregation. Next, in Chap. 7, Simon reviews
new developments in the field of experimental human auditory neuroscience.
x Volume Preface
A cocktail party is no place for infants and children. The auditory scene,
however, is easily as acoustically complex on a noisy playground or in a crowded
classroom. Young people apprehend these scenes with immature auditory systems
and not-yet-crystallized language recognition. Werner, in Chap. 8, considers multiple
stages and levels of development. Next, in Chap. 9, Pichora-Fuller, Alain, and
Schneider consider older adults in whom maturity of language skills and stores of
knowledge can to some degree compensate for senescence of the peripheral and
central auditory systems. Finally, in Chap. 10, Litovsky, Goupell, Misurelli, and
Kan consider the consequences of hearing impairment and the ways in which
hearing can be at least partially restored.
Successful communication at the eponymous cocktail party as well as in other,
everyday, complex auditory scenes demands all the resources of the auditory system,
from basic coding mechanisms in the periphery to high-order integrative
processes. The chapters of this volume are intended as a guide to exploring
these resources at all levels: in normal mature hearing, in early development, in
aging, and in pathology.
John C. Middlebrooks, Irvine, CA, USA
Jonathan Z. Simon, College Park, MD, USA
Arthur N. Popper, College Park, MD, USA
Richard R. Fay, Chicago, IL, USA
Contents
Contributors
Abstract The cocktail party is a popular metaphor for the complex auditory scene
that is everyday life. In busy offices, crowded restaurants, and noisy streets, a
listener is challenged to hear out signals of interest—most often speech from a
particular talker—amid a cacophony of competing talkers, broadband machine
noise, room reflections, and so forth. This chapter defines the problems that the
auditory system must solve and introduces the ensuing chapters, which explore the
relevant perception and physiology at all levels: in normal mature hearing, in early
development, in aging, and in pathology.
Keywords Auditory object · Auditory scene analysis · Cocktail party problem · Energetic masking · Grouping · Informational masking · Stream segregation · Streaming
1.1 Introduction
The cocktail party is the archetype of a complex auditory scene: multiple voices vie
for attention; glasses clink; background music plays; all of which are shaken, not
stirred, by room reflections. Colin Cherry (1953) brought hearing science to the
cocktail party when he introduced the term “cocktail party problem.” Cherry’s
cocktail party was rather dry: just two talkers reading narratives at the same time,
either with one talker in each earphone or with the two talkers mixed and played
to both earphones. Real-life cocktail parties are far more acoustically complex, as
are other auditory situations of daily life, such as busy offices, crowded restaurants,
noisy classrooms, and congested city streets. Albert Bregman (1990) has referred to
people’s efforts to solve these everyday cocktail party problems as “auditory scene
analysis.”
The normal auditory system exhibits a remarkable ability to parse these complex
scenes. As pointed out by Shinn-Cunningham, Best, and Lee (Chap. 2), the best
efforts of present-day technology pale compared to the ability of even a toddler to
hear out a special voice amid a crowd of distractors. Conversely, even a relatively
minor hearing impairment can disrupt auditory scene analysis. People with mild to
moderate hearing loss report that their inability to segregate multiple talkers or to
understand speech in a noisy background is one of their greatest disabilities
(Gatehouse and Noble 2004).
In attempting to make sense of the auditory scene, a listener must form distinct
perceptual images—auditory objects—of one or more sound sources, where the
sound sources might be individual talkers, musical lines, mechanical objects, and so
forth. Formation of an auditory object requires grouping of the multiple sound
components that belong to a particular source and segregation of those components
from those of other sources. Grouping can happen instantaneously across frequencies,
such as grouping of all the harmonics of a vowel sound or of all the
sounds resulting from the release of a stop consonant. Grouping must also happen
across time, such as in the formation of perceptual streams from the sequences of
sounds from a particular source. In the cocktail party example, the relevant streams
might be the sentences formed by the successions of phonemes originating from the
various competing talkers. To a large degree, segregation of auditory objects takes
place on the basis of low-level differences in sounds, such as fundamental frequencies,
timbres, onset times, or source locations. Other, higher-level, factors for
segregation include linguistic cues, accents, and recognition of familiar voices.
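Grouping of a vowel's harmonics by their common fundamental frequency (F0) can be illustrated with a toy sketch. It assumes that component frequencies have already been extracted and that the candidate F0s are known in advance; the function name `assign_to_f0`, the 3% mistuning tolerance, and the example frequencies are hypothetical choices for illustration, not a model proposed in this volume.

```python
def assign_to_f0(components_hz, f0_candidates_hz, tolerance=0.03):
    """Group spectral components by the candidate F0 that best explains them.

    Toy illustration (hypothetical, not from the text): a component is
    treated as a harmonic of an F0 if it lies within `tolerance` (a fraction
    of that F0) of an integer multiple of it. Components that fit no
    candidate are collected under the key None.
    """
    groups = {f0: [] for f0 in f0_candidates_hz}
    groups[None] = []
    for freq in components_hz:
        best_f0, best_err = None, float("inf")
        for f0 in f0_candidates_hz:
            n = max(1, round(freq / f0))       # nearest harmonic number
            err = abs(freq - n * f0) / f0      # mistuning relative to F0
            if err <= tolerance and err < best_err:
                best_f0, best_err = f0, err
        groups[best_f0].append(freq)
    return groups


# Two hypothetical "voices" at F0 = 100 Hz and 130 Hz, plus one stray component.
mixture = [100, 200, 300, 400, 130, 260, 390, 520, 475]
groups = assign_to_f0(mixture, [100, 130])
print(groups)
# {100: [100, 200, 300, 400], 130: [130, 260, 390, 520], None: [475]}
```

A component that is a harmonic of both candidates (for example, 1300 Hz = 13 × 100 Hz = 10 × 130 Hz) is ambiguous; this sketch simply keeps the first candidate, which is one reason cues such as onset time or source location matter in addition to F0.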
Failure to segregate the components of sound sources can impair formation of
auditory objects: this is masking. When a competing sound coincides in frequency
and time with a signal of interest, the resulting masking is referred to as energetic.
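The definition of energetic masking as overlap in frequency and time can be made concrete with a toy time-frequency grid. The sketch below assumes that target and masker levels are known separately in each cell (possible only in simulation) and labels a cell as energetically masked when the target does not exceed the masker by a criterion; the function name `energetic_overlap`, the 0 dB criterion, and the example levels are all hypothetical.

```python
def energetic_overlap(target_db, masker_db, criterion_db=0.0):
    """Split target time-frequency cells into glimpsed and masked.

    target_db and masker_db are equal-shaped lists of lists of levels in dB,
    indexed [time_frame][frequency_channel]. A cell is counted as
    energetically masked when the target-to-masker ratio there falls below
    `criterion_db` (a hypothetical threshold chosen for illustration).
    Returns (glimpsed_fraction, list_of_masked_cells).
    """
    masked = []
    total = 0
    for t, (t_row, m_row) in enumerate(zip(target_db, masker_db)):
        for f, (t_level, m_level) in enumerate(zip(t_row, m_row)):
            total += 1
            if t_level - m_level < criterion_db:
                masked.append((t, f))
    return 1 - len(masked) / total, masked


# Made-up levels: 2 time frames x 3 frequency channels.
target = [[60, 40, 30],
          [55, 50, 20]]
masker = [[40, 45, 50],
          [40, 40, 40]]
fraction, masked_cells = energetic_overlap(target, masker)
print(fraction, masked_cells)  # 0.5 [(0, 1), (0, 2), (1, 2)]
```

A sketch like this captures only energetic masking; it cannot represent interference that arises without spectral or temporal overlap between target and masker.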
1 Ear and Brain Mechanisms for Parsing the Auditory Scene 3
The present volume addresses conditions in which the auditory system succeeds at
segregating signals from distractors and conditions in which the cocktail party
problem cannot be solved. Shinn-Cunningham, Best, and Lee (Chap. 2) set the
stage by introducing the notion of the auditory object, which can be thought of as
the perceptual correlate of an external auditory source and the unit on which target
selection and attention operate. Sequences of auditory objects that are extended in
time form auditory streams. Parsing of the auditory scene, then, consists of selection
of particular auditory objects through some combination of bottom-up object salience
and top-down attention, filtered by experience and expectation.
Culling and Stone (Chap. 3) address the challenges of low-level formation of
auditory objects and consider some mechanisms by which those challenges can be
overcome. They introduce the notion of energetic masking, in which interfering
sounds disrupt the representation of speech signals at the level of the auditory
nerve. Release from energetic masking can be achieved by exploiting differences
between target and masker, such as differences in their harmonic structure or
interaural time differences. In some conditions a listener can circumvent energetic
masking by “listening in the dips,” where “the dips” are moments at which masker
amplitude is minimal. In addition, a listener might exploit the acoustic shadow of
the head by attending to the ear at which the target-to-masker ratio is higher.
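Better-ear listening and listening in the dips can be sketched with per-ear level tracks. The example below assumes broadband levels (in dB) per time frame for target and masker at each ear, with a head-shadow difference built into made-up numbers. It power-averages the levels to obtain a long-term target-to-masker ratio (TMR) per ear, picks the ear with the higher TMR, and lists the frames in which a fluctuating masker dips below the target. The function names and all numbers are hypothetical.

```python
import math


def mean_level_db(levels_db):
    """Power-average a sequence of levels in dB (not a plain mean of dB values)."""
    mean_power = sum(10 ** (x / 10) for x in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def better_ear(target_db, masker_db):
    """Return (ear, tmr_db) for the ear with the higher long-term TMR.

    target_db and masker_db map an ear name to per-frame levels in dB.
    """
    tmr = {ear: mean_level_db(target_db[ear]) - mean_level_db(masker_db[ear])
           for ear in target_db}
    ear = max(tmr, key=tmr.get)
    return ear, tmr[ear]


def dip_frames(target_db, masker_db):
    """Indices of frames in which the masker dips below the target."""
    return [i for i, (t, m) in enumerate(zip(target_db, masker_db)) if t > m]


# Made-up example: target on the right, fluctuating masker on the left,
# so the head shadow attenuates each source at the far ear.
target = {"left": [59, 59, 59, 59], "right": [65, 65, 65, 65]}
masker = {"left": [70, 50, 70, 50], "right": [64, 44, 64, 44]}

ear, tmr = better_ear(target, masker)
print(ear, round(tmr, 1))                           # right 4.0
print(dip_frames(target["left"], masker["left"]))   # [1, 3]
```

Here the right ear has the higher long-term TMR (about 4 dB versus about -8 dB at the left), and even at the poorer left ear the target exceeds the masker in frames 1 and 3, when the masker dips.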
Understanding of a speech target can be impaired by the presence of a competing
speech source even in the absence of energetic masking, that is, when there is no
spectral or temporal overlap of target and masker. That residual informational
masking is the topic of Chap. 4, by Kidd and Colburn. Focusing on
speech-on-speech masking, the authors contrast and compare energetic and