


Hearing Research 245 (2008) 35–47

Contents lists available at ScienceDirect

Hearing Research
journal homepage: www.elsevier.com/locate/heares

Research paper

Envelope and spectral frequency-following responses to vowel sounds


Steven J. Aiken a,*, Terence W. Picton b

a School of Human Communication Disorders, Dalhousie University, 5599 Fenwick Street, Halifax, Canada B3H 1R2
b Rotman Research Institute, Baycrest Centre for Geriatric Care, University of Toronto, Canada

article info

Article history:
Received 16 February 2008
Received in revised form 15 July 2008
Accepted 13 August 2008
Available online 19 August 2008

Keywords:
Auditory evoked potentials
Frequency-following responses
Speech envelope
Vowel sounds
Fourier analyzer

abstract

Frequency-following responses (FFRs) were recorded to two naturally produced vowels (/a/ and /i/) in normal hearing subjects. A digitally implemented Fourier analyzer was used to measure response amplitude at the fundamental frequency and at 23 higher harmonics. Response components related to the stimulus envelope ("envelope FFR") were distinguished from components related to the stimulus spectrum ("spectral FFR") by adding or subtracting responses to opposite polarity stimuli. Significant envelope FFRs were detected at the fundamental frequency of both vowels, for all of the subjects. Significant spectral FFRs were detected at harmonics close to formant peaks, and at harmonics corresponding to cochlear intermodulation distortion products, but these were not significant in all subjects, and were not detected above 1500 Hz. These findings indicate that speech-evoked FFRs follow both the glottal pitch envelope as well as spectral stimulus components.

© 2008 Elsevier B.V. All rights reserved.

Abbreviations: CM, cochlear microphonic; FFR, frequency-following response; ABR, auditory brainstem response; ASSR, auditory steady-state response
* Corresponding author. Tel.: +1 902 494 1057; fax: +1 902 494 5151. E-mail address: [email protected] (S.J. Aiken).

0378-5955/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.heares.2008.08.004

1. Introduction

Infants with hearing impairment detected by neonatal hearing screening are referred for hearing aids within the first few months of age. Fitting is mainly based on thresholds obtained by electrophysiological measurements – generally the auditory brainstem response (ABR) to tone bursts (Stapells, 2000b, 2002) or the auditory steady-state response (ASSR) to amplitude modulated tones (Picton et al., 2003; Stueve and O'Rourke, 2003; Luts et al., 2004; Luts and Wouters, 2005). However, these measurements are not exact. Both ABR and ASSR threshold estimates predict behavioral thresholds with standard deviations that range from 5 to 15 dB across various studies (see Tables 1 and 2, Tlumak et al., 2007 and Table 4 in Herdman and Stapells, 2003; Stapells et al., 1990; Stapells, 2000a). It would thus be helpful to have some way of assessing how well the amplified sound is received in the infant's brain (Picton et al., 2001; Stroebel et al., 2007). Speech stimuli would be optimal because the main intent of amplification is to provide the child with sufficient speech information to allow communication and language learning.

Speech sounds elicit both transient and sustained activity in the human brainstem and cortex. Transient brainstem responses can be recorded with a consonant–vowel diphone stimulus. The speech-evoked auditory brainstem response evoked by /da/ has been used to investigate the brainstem encoding of speech in children with learning problems (Cunningham et al., 2001; King et al., 2002) and children with auditory processing problems (Johnson et al., 2007), but not in children or adults wearing hearing aids. Transient cortical responses have also been used in subjects with hearing impairment and with hearing aids (Billings et al., 2007; Golding et al., 2007; Korczak et al., 2005; Rance et al., 2002; Tremblay et al., 2006). However, these responses are more variable in morphology than the brainstem responses – especially in infants (Wunderlich et al., 2006) – and less clearly related to speech parameters other than major changes in intensity or frequency.

Most commercial hearing aids exhibit sharply non-linear behavior designed to preferentially amplify speech and attenuate other sounds. As a result, hearing aid gain and output characteristics are different for speech and non-speech stimuli, and different for transient and sustained stimuli. We have therefore been considering the use of sustained speech stimuli such as vowel sounds (Aiken and Picton, 2006) or even sentences (Aiken and Picton, 2008). Sustained speech stimuli can evoke a variety of potentials from the cochlea to the cortex. Since cortical potentials in infants are variable and change with maturation, a reasonable approach might be to measure frequency-specific brainstem responses to speech stimuli presented at conversational levels.

Brainstem responses to sustained speech and speech-like stimuli have been called envelope-following responses (Aiken and Picton, 2006), frequency-following responses (Krishnan et al., 2004), and auditory steady-state responses (Dimitrijevic et al., 2004). Although frequency-following responses have sometimes been distinguished from envelope-following responses (e.g. Levi et al., 1995), the term 'frequency-following response' (FFR) has been used to describe responses to speech formants (Plyler and

Ananthanarayan, 2001), intermodulation distortion arising from two-tone vowels (Krishnan, 1999), speech harmonics (Aiken and Picton, 2006; Krishnan, 2002), and the speech fundamental frequency (Krishnan et al., 2004), which presumably relates to the speech envelope. Thus the term 'frequency-following response' can be used in a general sense – denoting a response that follows either the spectral frequency of the stimulus or the frequency of its envelope. For the purposes of this paper, we shall distinguish between "spectral FFR" and "envelope FFR." A similar distinction was suggested by Krishnan (2007, Table 15.1), who proposed using alternating-phase stimuli to record responses locked to the envelope and fixed-phase stimuli to record responses phase-locked to spectral components. For simplicity, we shall restrict the term FFR to responses generated in the nervous system, and not include the CM or stimulus artifact, even though these do follow the spectral frequencies of the stimulus.

An important difference between spectral and envelope FFR is that the latter is largely insensitive to stimulus polarity, much like the transient auditory brainstem response (Krishnan, 2002; Small and Stapells, 2005). Spectral FFR can thus be teased apart from the transient response by recording responses to stimuli presented in alternate polarities, and averaging the difference between the responses (Huis in't Veld et al., 1977; Yamada et al., 1977). Other researchers have averaged the sum of responses to stimuli presented in alternate polarities, in order to separate the FFR from the cochlear microphonic (e.g. Cunningham et al., 2001; King et al., 2002), but this manipulation would eliminate (or severely distort) the spectral FFR, preserving only the envelope FFR (Chimento and Schreiner, 1990).

Speech FFRs may be ideal for evaluating the peripheral encoding of speech sounds, since they can be evoked by specific elements of speech (e.g. vowel harmonics; Aiken and Picton, 2006; Krishnan, 2002). FFRs may be evoked by several separate elements of speech. One is the speech fundamental – the rate of vocal fold vibration. The other is the harmonic structure of speech. Voiced speech has energy at the integer multiples of the fundamental frequency, which are selectively enhanced by formants (resonance peaks created by the shape of the vocal tract). Responses to harmonics may thus provide information about the audibility of the formant structure of speech.

Table 1
Frequencies (Hz) of formants and harmonics

/a/: first formant peak 937 (closest harmonic 960, f9); second formant peak 1408 (closest harmonic 1387, f13)
/i/: first formant peak 229 (closest harmonic 244, f2); second formant peak 2613 (closest harmonic 2562, f21)

Table 2
Average response nomenclature

++ : Average together all responses to the original stimulus. Components: envelope FFR, spectral FFR, cochlear microphonic, stimulus artifact.
+− : Average together an equal number of responses to the original stimulus and responses to the inverted stimulus. Components: envelope FFR.
−− : Subtract responses to the inverted stimulus from an equal number of responses to the original stimulus and divide by the total number of responses. Components: spectral FFR, cochlear microphonic, stimulus artifact.

1.1. Responses to the fundamental

Frequency-following responses to the speech fundamental frequency should be relatively easy to record, since speech is naturally amplitude modulated at this rate by the opening and closing of the vocal folds. Although the amplitude envelope does not have any spectral energy of its own, energy at the envelope frequency is introduced into the auditory system as a result of rectification during cochlear transduction.

In an earlier study (Aiken and Picton, 2006), we recorded responses to the fundamental frequencies of naturally produced vowels with steady or changing fundamental frequencies. We used a Fourier analyzer to measure the energy in each response as the fundamental frequency changed over time (followed a 'trajectory'). When the frequency trajectory of a response can be predicted in advance, the Fourier analyzer can provide an optimal estimate of the response energy along that trajectory. This is in contrast to traditional windowed signal processing techniques (e.g. the short-term fast Fourier transform), which assume that a response does not change its frequency within each window (is 'stationary'). With the Fourier analyzer, significant responses were recorded in all of the subjects, and the average time required to elicit a significant response varied from 13 to 86 s.

Other techniques have also been used to evaluate the fundamental response. Krishnan et al. (2004) recorded frequency-following responses to Mandarin Chinese tones with changing fundamental frequencies, using a short-term autocorrelation algorithm. Dajani et al. (2005) used a filterbank-based algorithm inspired by cochlear physiology to analyze responses to speech segments with changing fundamental frequencies. Both techniques can measure the frequency trajectory well, but neither accurately estimates the response energy if the signal changes frequency within the window used for the filter or the autocorrelation.

1.2. Responses to harmonics

Although responses to the fundamental can be measured quickly and reliably, such responses provide limited information about the audibility of speech in different frequency ranges. Since all energy in voiced speech is amplitude modulated at the fundamental frequency, a response at the fundamental could be mediated by audible speech information at any frequency, and thus any place on the basilar membrane.

In order to measure place-specific responses to speech, it might be best to record responses directly to the harmonics of the fundamental frequency. Using the Fourier analyzer (Aiken and Picton, 2006), we recorded significant responses to the second and third harmonics of vowels with steady and changing fundamental frequencies. We did not measure responses to higher harmonics, due to the limited bandwidth of the electroencephalographic recording (1–300 Hz).

Krishnan (2002) recorded wide-band responses to synthetic vowels with formant frequencies below 1500 Hz (i.e. back vowels) using a fast Fourier transform. Since the frequencies in the synthesized stimuli were stationary, the fast Fourier transform would have provided an optimal estimate of the response energy in each frequency bin. Responses were detected at harmonics close to formant peaks, and at several low-frequency harmonics distant from formant peaks, but not at the vowel fundamental frequency. In this study, half of the responses were recorded to a polarity-inverted stimulus, and the final result was derived by subtracting the responses obtained in one polarity from the responses obtained in the opposite polarity. This subtractive approach (see also Greenberg et al., 1987; Huis in't Veld et al., 1977) is analogous to the compound histogram technique used in neurophysiologic studies (Anderson et al., 1971; Goblick and Pfeiffer, 1969). Its rationale stems from the effects of half-wave rectification involved in inner hair cell transduction (Brugge et al., 1969). Discharges only occur during the rarefaction phase of the stimulus. If the polarity of the stimulus is inverted, the discharges to the rarefaction phase of the inverted stimulus now occur during the condensation phase of the initial stimulus. Subtracting the period histogram of this inverted stimulus from the period histogram of the non-inverted stimulus cancels the rectification-related distortion, and the discharge pattern corresponds to the stimulating waveform. Scalp-recorded frequency-following responses reflect the activity of synchronized neuronal discharges, so it is reasonable to apply the compound histogram technique to these data.

This approach shows different results for envelope and spectral FFRs. By subtracting responses to alternate stimulus polarities, the alternate rectified responses to the stimulus are combined to produce non-rectified analogues of stimulus components (inasmuch as the neural system is able to phase-lock to those components). Subtracting responses to alternate polarities thus removes distortions associated with half-wave rectification (e.g. the energy at the envelope) that exist in the neural response. Using the subtractive procedure, Krishnan (2002) found responses at prominent stimulus harmonics, but not at the envelope frequency. In contrast, when this subtractive procedure has not been used, robust responses have been recorded at the envelope frequency (Aiken and Picton, 2006; Krishnan et al., 2004; Greenberg et al., 1987).

An alternate technique that has been used to analyze FFR is to add responses recorded to alternate polarity stimuli (e.g. Johnson et al., 2005; Small and Stapells, 2005). This technique is generally employed to eliminate the cochlear microphonic or residual artifact from the stimulus, and to preserve the envelope FFR. Summing alternate responses cancels the cochlear microphonic and artifact, leaving the envelope FFR. The downside of this approach is that it also cancels the spectral FFR.

1.3. Relationship between harmonics and formants

Formants and formant trajectories carry information that is essential for speech sound identification. The lowest two or three formants convey enough information to identify vowels, and to specify the consonant place of articulation (Liberman et al., 1954). Formants correspond to peaks in the spectral shape, and not to specific harmonics (for a review, see Rosner and Pickering, 1994). The vocal tract can be characterized as a filter that shapes the output from the glottal source (Fant, 1970), with the peaks of the spectrum (i.e. formants) corresponding to the poles of the filter.

Voigt et al. (1982) recorded auditory nerve responses to noise-source vowels (i.e. vowels with no harmonics), and found that the Fourier transforms of interval histograms had large frequency components corresponding to the peaks of the formants. However, this temporal encoding would not likely produce measurable responses at the scalp, since the temporal intervals would have occurred at random phases in the absence of a synchronizing stimulus. The frequency-following response requires synchronized neural activity.

FFRs recorded to formant-related harmonics could be used to assess the audibility of formants and formant trajectories. For example, responses recorded at harmonics related to the first and second formant would indicate that the formant peaks had been neurally encoded, and that the information was likely available for the development of the phonemic inventory. Krishnan (2002) recorded frequency-following responses to synthetic vowels with steady frequencies, where the peaks of the first and second formants were not multiples of f0. In this study, responses were found at harmonics related to the first two formants. Plyler and Ananthanarayan (2001) found that the frequency-following response was able to represent second formant transitions in synthetic consonant–vowel pairs. FFRs to harmonics may thus provide useful information about speech encoding.

In the present study, we recorded wideband responses to several vowels in alternate polarities and analyzed both the additive and subtractive averages with the same dataset. We also recorded two responses in the same polarity, so that we could compare the added and subtracted alternate polarity averages to a constant-polarity average calculated across the same number of responses. We hypothesized that the average response to the constant-polarity stimuli would have energy at prominent stimulus harmonics (i.e. near formant peaks) as well as energy corresponding to the stimulus envelope. We further hypothesized that the harmonic pattern in the stimulus would be displayed in the subtractive average, and that the stimulus envelope would be displayed in the additive average. We hypothesized that we would be able to obtain reliable individual subject responses to the fundamental and stimulus harmonics up to approximately 1500 Hz (the upper limit for recording frequency-following responses; Krishnan, 2002; Moushegian et al., 1973).

2. Materials and methods

2.1. Subjects

Seven women (ages 20–30) and three men (ages 23–30) were recruited internally at the Rotman Research Institute. All subjects were right-handed, and had hearing thresholds that were 15 dB HL or better at octave frequencies from 250 to 8000 Hz. Nine subjects (6f/3m) participated in the main experiment, and smaller numbers of subjects participated in subsidiary experiments to evaluate the recording montage (4f/1m) and to examine masking (2f/1m).

2.2. Stimuli

Two naturally-produced vowels, /a/ (as in 'father') and /i/ (as in 'she'), were recorded for the experiment. The /i/ was chosen because its second formant frequency is higher than other vowels (Hillenbrand et al., 1995), occurring where synchronized responses are most difficult to record. A response to harmonics near the second formant of /i/ would suggest that responses could be recorded to the second formants of all the other vowels. The /a/ was chosen because it has a low-frequency second formant, to maximize the chances of recording a second formant response for at least one of the stimuli.

The vowels were recorded from a male speaker in a double-walled sound-attenuating chamber. A Shure KSM-44 large-diaphragm cardioid microphone was placed approximately 3 in. from the mouth, with an analogue low-pass filter (6 dB/octave above 200 Hz) employed to mitigate the proximity effect. The signal was digitized at 32 kHz with 24 bits of resolution using a Sound Devices USBPre™ digitizer, and saved to hard disk using Adobe Audition™.

Two tokens of /a/ and two tokens of /i/ were selected from portions of the recordings where vocal amplitude was steady. Each token was manually trimmed to be close to 1.5 s long, with onsets and offsets placed at zero-crossings spaced by integer multiples of the pitch period. This made each token suitable for continuous play, with no audible discontinuities between successive stimulus iterations. Each token was then resampled at a rate slightly higher or lower than 32 kHz in order to give exactly 48,000 samples over 1.5 s. This resampling introduced a slight pitch shift, but this was less than ±1 Hz. Stimuli were then bandpass filtered between 20 and 6000 Hz with a 1000-point finite impulse response filter having no phase delay.

Stimuli of reversed polarity were obtained by multiplying the stimulus by −1. Thus, there were in total eight stimuli – two vowels, two tokens and two polarities. These were named a1+, a1−, a2+, a2−, i1+, i1−, i2+, and i2−.

An LPC (linear predictive coding) analysis was conducted in order to determine the formant structure of each vowel. Since formant structure can be estimated more easily after removing the low-pass characteristic of speech, the spectrum of each token (x) was pre-emphasized (or 'whitened') using the following equation:

y[n] = x[n] − a·x[n − 1]

where a (0.90 for the /a/ tokens and 0.94 for the /i/ tokens) was calculated by conducting a first-order linear predictive coding (LPC) analysis on each of the tokens (the first-order LPC providing an estimate of spectral tilt).

Fig. 1 (left) shows the spectra of the /a/ and /i/ stimuli, as calculated using the Fourier analyzer (solid line), as well as the spectral shape of the vowels, as calculated via the 34th-order LPC analysis (dotted line). The location of the formant peaks and the closest harmonic are given in Table 1, and the harmonics closest to the first two formants are indicated on the figure. The relative intensity of each harmonic in the cochlea was estimated by calculating its amplitude in the digital stimulus waveform with the Fourier analyzer, and then modifying this value to take into account the effects of the middle ear transfer function (see Fig. 2, Puria et al., 1997). These middle ear compensated spectra are shown in Fig. 1 (right).

Stimulus presentation was controlled by a version of the MASTER software (John and Picton, 2000) modified to present external stimuli. The digital stimuli were DA converted at a rate of 32 kHz, routed through a GSI 16 audiometer, and presented monaurally with an EAR-Tone 3A insert earphone in the right ear. The left ear was occluded with a foam EAR earplug. All stimuli were scaled to produce a level of 60 dBA (RMS) in a 2-cm³ coupler.

2.3. Procedure

The first experiment examined the responses to natural vowels of the same or opposite polarity. Each 1.5-s stimulus was presented continuously for 75 s, corresponding to 50 iterations (with no time delay between successive presentations). This process was repeated 4 times per block, with results averaged offline to provide a single 5 min (200-sweep) average. Each of the /i/ and /a/ tokens was presented twice in the same polarity, and once in the opposite polarity.

The second experiment investigated three possible sources for the responses – electrical artifact, brainstem, and CM. In order to ensure that the responses were not contaminated by electrical artifact, responses were recorded to the first /a/ token routed to an insert earphone, which was not coupled to the ear. Since the subject's ears were occluded during the experiment, this rendered the stimulus inaudible. The transducer of this insert earphone was in the same location as when it was connected to the ear canal. Recording significant responses in this condition would indicate that the recordings made when the earphone was normally coupled to the ear were contaminated by artifact.

We then recorded responses to the first /a/ token between electrodes at the right and left mastoids, in order to increase the sensitivity of the recording to horizontally aligned dipoles (e.g. related to activity in the cochlea, auditory nerve or lower brainstem).

In a third condition, we attempted to determine whether any part of the response could reflect the cochlear microphonic (which may precede neural responses by as little as 3 ms; Huis in't Veld et al., 1977), by recording responses (using the vertical Cz to nape montage) to the first /a/ token in the presence of speech-shaped masking noise. Masking eliminates neural responses without eliminating the cochlear microphonic, so any response recorded in the presence of an effective masker would likely be cochlear microphonic. The minimum effective masking level was determined for each subject by testing whether the /a/ token (a1+) could be detected while speech-shaped noise was being played. The noise was first presented at 50 dB HL, with the level raised by 5 dB after two correct behavioral responses. This process was repeated until the subject could no longer detect the vowel. During the electroencephalographic recording, the noise was presented 5 dB above each subject's minimum effective masking level.
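The pre-emphasis step used to whiten the tokens (Section 2.2) can be sketched in a few lines. This is our own illustration, not the authors' code: the function name and the random-walk test signal are assumptions for the example, and the coefficient is estimated as the lag-1 autocorrelation ratio, which is what a first-order LPC fit reduces to.

```python
import numpy as np

def preemphasize(x):
    """Whiten a token with the first-order filter y[n] = x[n] - a*x[n-1].

    The coefficient a is the first-order LPC solution, i.e. the ratio of
    the lag-1 autocorrelation to the signal power, which estimates the
    spectral tilt of the token."""
    x = np.asarray(x, dtype=float)
    a = np.dot(x[1:], x[:-1]) / np.dot(x, x)  # first-order LPC coefficient
    y = x.copy()
    y[1:] = x[1:] - a * x[:-1]                # y[0] is left as x[0]
    return y, a

# A strongly low-pass signal (here a random walk) gives a close to 1, and
# pre-emphasis flattens it: the output variance is far below the input's.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(20000))
y, a = preemphasize(x)
```

For the vowel tokens in the paper, this kind of estimate came out to 0.90 for /a/ and 0.94 for /i/.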



Fig. 1. Left: Spectra of the /a/ and /i/ vowels, as calculated using the Fourier analyzer (solid line) as well as the spectral shape of the vowels, as calculated via a 34th-order LPC
analysis (dotted line). The reference signals of the Fourier analyzer followed the f0 trajectory, and are plotted with respect to the average frequency in each reference
trajectory. The harmonics closest to the first and second formants are indicated on the figure. Right: Estimated vowel spectra at the level of the cochlea (i.e. taking the middle
ear transfer function into account). See text for details.

Fig. 2. Top left panels (A, B) show two periods of a 200 Hz tone, in opposite polarities. Top right panels (G, H) show a 2 kHz tone, amplitude modulated at 200 Hz, also in opposite polarities. The four centre panels show the expected scalp-recorded FFR for either the 200-Hz tone (C, D) or the 200 Hz envelope (I, J). Nerve fibers are unable to lock to the 2000 Hz carrier of the AM tone but can lock to its 200 Hz envelope, so the right column deals with only envelope FFRs. The average response to stimuli presented in a single polarity (++) contains both spectral and envelope FFRs and there are responses in both left and right columns (C, I). However, when spectral FFRs to opposite polarities (C, D) are added (+−), the result (E) is small and twice the frequency of the actual responses. Conversely, when spectral FFRs to opposite polarities are subtracted (−−), the result (F) resembles the original spectral component in the stimulus (A).
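The additive/subtractive logic modeled in Fig. 2 can also be expressed numerically. The sketch below is our own illustration (not the study's stimuli or analysis code): half-wave rectification stands in for hair-cell transduction, and the "+-" and "--" averages separate the envelope FFR from the spectral FFR.

```python
import numpy as np

fs = 8000                               # sampling rate (Hz)
t = np.arange(fs) / fs                  # 1 s of signal, so 1 Hz FFT bins
f_env, f_car = 200.0, 2000.0            # envelope and carrier, as in Fig. 2
stim = (1 + np.sin(2 * np.pi * f_env * t)) * np.sin(2 * np.pi * f_car * t)

def neural_response(s):
    # crude transduction model: half-wave rectification
    return np.maximum(s, 0.0)

r_orig = neural_response(stim)          # response to original polarity
r_inv = neural_response(-stim)          # response to inverted polarity

added = (r_orig + r_inv) / 2            # "+-" average: keeps the envelope FFR
subtracted = (r_orig - r_inv) / 2       # "--" average: keeps the spectral FFR

spec_add = np.abs(np.fft.rfft(added))
spec_sub = np.abs(np.fft.rfft(subtracted))
# The added average has its energy at the 200 Hz envelope rate, while the
# subtracted average peaks at the 2000 Hz carrier with nothing at 200 Hz.
```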

These experiments were part of an ongoing research project on "evoked response audiometry" which was approved by the Research Ethics Committee of the Baycrest Centre for Geriatric Care.

2.4. Recordings

Electroencephalographic recordings were made while subjects relaxed in a reclining chair in a double-walled sound-attenuating chamber. Subjects were encouraged to sleep during the recording. Responses were recorded between gold disc electrodes at the vertex and the mid-posterior neck for all conditions except the horizontal condition in the second experiment. For this condition, responses were recorded between the right and left mastoids. A ground electrode was placed on the left clavicle. Inter-electrode impedances were maintained below 5 kΩ for all recordings. Responses were preamplified and filtered between 30 Hz and 3 kHz with a Grass LP511 AC amplifier and digitized at 8 kHz by a National Instruments E-Series data acquisition card.

Prior to analysis the recordings were averaged in the following way. For each subject and vowel three different responses were calculated. A ++ average was obtained by averaging all four responses to the original stimulus (e.g. two presentations of the a1+ token and the a2+ token). A +− average was obtained by averaging the first two responses to the original stimulus (e.g. one presentation of the a1+ token and the a2+ token) together with two responses to the inverted stimulus (e.g. the a1− token and the a2− token). A −− average was then obtained by subtracting the two responses to the inverted stimulus from the two responses to the original stimulus. In this nomenclature, the first sign gives the operation and the second sign codes whether the second response is the response to the original or the inverted stimulus. We always start with a response to the original stimulus. Table 2 summarizes these procedures. For each type of average response, grand mean average responses were obtained by averaging the responses of all subjects together.

Fig. 2 shows a simple model of these procedures – with the spectral FFR on the left and the envelope FFR on the right. The top left panels (A, B) of Fig. 2 show two periods of a 200 Hz tone, in opposite polarities. The top right panels (G, H) show a 2 kHz tone, amplitude modulated at 200 Hz, also in opposite polarities. Inversion of the modulated tone has no effect on the modulation envelope. One might simplistically consider these as the two parts of a vowel sound with a fundamental frequency of 200 Hz and a formant frequency at 2000 Hz that was amplitude modulated at the fundamental frequency.

The four centre panels show the expected scalp-recorded FFR for either the 200-Hz tone (C, D) or the 200 Hz envelope (I, J). The model assumes that the nerve fibers are unable to lock to the 2000 Hz carrier of the AM tone but can lock to its 200 Hz envelope. This makes the right column deal with only envelope FFRs. The combined effect of the hair cell transduction and the synaptic transmission between the hair cell and the afferent nerve fiber effectively rectifies the signal. Action potentials occur with the greatest probability during the rarefaction phase of the stimulus (plotted upward), so the magnitude of the neural population response is proportional to a half-wave rectified version of the stimulus. Synchronized neural activity gives rise to the voltage fluctuations in the FFR, so this similarly resembles the half-wave rectified stimulus. This is analogous to the period histogram technique used to study the temporal structure of neurophysiologic responses (Anderson et al., 1971; Brugge et al., 1969). In the figure, delays associated with stimulus presentation, cochlear transduction, synaptic transmission and neural conduction have been excluded in order to align the modeled responses with the stimulus period.

The different averaging procedures (Table 2) then give three different responses. In our nomenclature, the first sign denotes the procedure (addition or subtraction), and the second sign denotes the polarity of the second set of responses (the first is always the original). The average response to stimuli presented in a single polarity (++) contains both spectral and envelope FFRs and there
40 S.J. Aiken, T.W. Picton / Hearing Research 245 (2008) 35–47

are responses in both left and right columns (C, I). However, when spectral FFRs to opposite polarities (C, D) are added (+ −), the result (E) is small and twice the frequency of the actual responses. Conversely, when spectral FFRs to opposite polarities are subtracted (− −), the result (F) resembles the original spectral component in the stimulus (A). The subtraction artificially reconstitutes the pre-rectified stimulus components present in the response, thereby removing rectification-related distortion (including the envelope). Spectral FFR is therefore present in all of the averages, but clearest in the − − average. In the + + and + − averages, it is mixed with envelope FFR and rectification-related distortion (and its frequency is doubled in the + − average).

When envelope FFRs to opposite polarities (I, J) are added (+ −), the average (K) is the same as the actual response, since polarity inversion has little effect on envelope modulations (H). For the same reason, subtracting opposite polarity envelope FFRs (− −) eliminates the envelope FFR in the average (L). Therefore, envelope FFR is present in the + + and + − averages, but absent in the − − average.

2.5. Analysis

2.5.1. Natural vowels

The energy in voiced speech is concentrated at the fundamental frequency (f0) – equal to the rate of vocal fold vibration – and at its harmonics, which are integer multiples of f0. The harmonics are labeled with a subscript that corresponds to the harmonic number. For example, when f0 is 100 Hz, f2 is 200 Hz. When it is present, f1 is equal to f0, although f0 can be perceived in the absence of any actual energy at f1 (a phenomenon known as the "missing fundamental").

The fundamental frequency and harmonics of natural speech vary across time. The rate of f0 variation in a steady naturally-produced vowel can be as high as 50 Hz/s (see Fig. 5c in Aiken and Picton, 2006). The response to the speech f0 precisely mirrors its frequency changes (Aiken and Picton, 2006; Krishnan et al., 2004), so responses to natural speech cannot be accurately analyzed with techniques that require a stationary signal. The stimuli and responses were therefore analyzed using a Fourier analyzer. Unlike the fast Fourier transform (FFT), which calculates energy in static frequency bins, a Fourier analyzer calculates energy in relation to a set of reference signals, which need not be static. Fig. 3 shows the spectrum of the first /a/ stimulus as calculated using the FFT and as calculated using the Fourier analyzer. Both analyses were conducted with a resolution of 2 Hz, but the reference signals of the Fourier analyzer were constructed to follow the f0 trajectory of the speech. For the Fourier analyzer, data are plotted relative to the mean frequency in each reference trajectory. Note that harmonic amplitudes were much greater when the analysis was conducted with the Fourier analyzer, indicating that the FFT underestimated these amplitudes.

Fig. 3. Spectra of the first /a/ token calculated with a fast Fourier transform (right) and with a Fourier analyzer (left). The resolution of each analysis was 2 Hz, but the reference signals of the Fourier analyzer were constructed to follow the f0 trajectory of the vowel. Fourier analyzer data are plotted with respect to the average frequency of each reference trajectory.

The Fourier analyzer was used to quantify the amplitude of the response along the trajectory of f0 and 23 of its harmonics (f2–f24). The same analyzer was used to quantify response amplitude along 16 frequency trajectories adjacent to each of the harmonics (i.e. 8 above and 8 below). Each trajectory was separated by 2 Hz, so the highest and lowest trajectories were 16 Hz above and below each harmonic, respectively. The 16 adjacent trajectories were used to quantify non-stimulus-locked electrophysiologic activity, considered to be electrophysiologic 'noise' for the purpose of statistical testing.

Reference frequency tracks at each trajectory were created in the following manner. Since the first harmonic was present in all of the vowel tokens, it was used to create the f0 reference track. Visual inspection of the stimulus spectrum indicated that the first harmonic was slightly higher than 100 Hz, so f1 was isolated by filtering the response between 50 and 200 Hz (with a 1000-point phase-corrected finite impulse response filter). The Hilbert transform provided the complex representation of f1 and the four-quadrant inverse tangent provided its instantaneous phase. The instantaneous frequency of f1 could then be calculated by finding the derivative of the instantaneous phase with respect to time. This frequency track was smoothed to remove any sharp changes introduced by the process of approximate differentiation, using a 50 ms boxcar moving average applied 3 times. Frequency tracks at each higher harmonic fi were created by multiplying the f0 frequency track by each integer between 2 and 24. Adjacent frequency tracks were created by transposing each track by the appropriate number of Hz.

A Fourier analyzer computes the amplitude of a signal along a particular frequency trajectory by scalar multiplication of the signal with a set of reference sinusoids (i.e. sine and cosine projections) that represent the trajectory. Reference sinusoids were created for each trajectory by calculating the sine and cosine of the instantaneous phase angle of the corresponding frequency track (i.e. the cumulative sum of the starting phase and the derivative of the instantaneous frequency). This produced pairs of orthogonal sinusoids with unity amplitude. Responses were resampled to 32 kHz (4 times the data acquisition rate) prior to multiplication with the reference sinusoids. The products were then integrated over the length of the sweep (1.5 s), producing two values (x and y). The amplitude (a) and phase (θ) of the response along each trajectory was then calculated by finding the vector magnitude and phase, using the following equations:

a = √(x² + y²)
θ = tan⁻¹(y/x)

Since the analysis requires scalar multiplication of the response with each of the reference signals, the result is sensitive to the temporal alignment of the stimulus and response. If there is a lag between the stimulus (the basis for the reference sinusoids) and the response, the trajectories will not be temporally aligned. A lag that is smaller than the period of the reference frequency will merely shift the phase of the measured response, but a greater lag may result in an underestimation of the response amplitude. This problem can be circumvented by delaying the reference sinusoids by an amount equal to the delay in the response, prior to the multiplication. Evoked potentials are delayed by the time required to transduce the stimulus, as well as the time required for activation to reach the place where the response is generated. We estimated the delay of the response to be approximately 10 ms (see Table 1 in Picton et al., 2003), and delayed the reference sinusoids by 10 ms prior to multiplication.

The significance of the response at each harmonic was evaluated by comparing the power of the response along the harmonic's trajectory with the power of the response along adjacent trajectories, using an F statistic (Zurek, 1992; Dobie and Wilson, 1996; Lins et al., 1996). An alpha criterion of 0.05 was selected for all analyses. A Bonferroni correction was applied to account for the 24 significance tests (1 per harmonic) involved in each analysis. Thus the F statistic was accepted as significant for the grand mean recordings at p < 0.0021. An additional Bonferroni correction (further dividing the alpha criterion by the number of subjects) was applied when individual subject responses were tested for significance.

We estimated the relative intensity of each harmonic in the cochlea by calculating its amplitude in the digital stimulus waveform with the Fourier analyzer, and then modifying this value to take into account the effects of the middle ear transfer function (the frequency response function of the ER-3A insert earphone in the outer ear is relatively flat – within ±4 dB – from 100 to 1500 Hz). The middle ear provides a gain of approximately 20 dB between 500
Fig. 4. Grand (vector) average responses to /a/. For the top three panels, black lines indicate the response amplitude at the fundamental or harmonic. Grey bars indicate the average amplitude at the 16 adjacent frequencies. Significant responses (p < 0.0021) are marked with an asterisk. The top panel shows the + + average, which was created by averaging responses to all four presentations of the /a/ stimulus (in the same polarity). The second panel shows the + − average, which was created by averaging responses to two presentations of the /a/ with responses to two presentations of the inverted-polarity /a/. The third panel shows the − − average, which was created by subtracting responses to two presentations of the /a/ from responses to two presentations of the inverted-polarity /a/. The stimulus is displayed in the lowest panel for reference (modified by the middle ear transfer function).
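The logic behind the three averages plotted in Fig. 4 (+ + retaining everything, + − retaining the envelope FFR, − − retaining the spectral FFR) can be illustrated with a minimal numerical sketch. This is not the study's analysis code: the stimulus is a toy amplitude-modulated tone, the parameter values are illustrative, and the auditory periphery is reduced to half-wave rectification, as in the simplified model of Fig. 2.

```python
import numpy as np

fs = 8000                          # sampling rate (Hz), matching the recordings
t = np.arange(0, 0.1, 1 / fs)

def neural_response(stim):
    # crude peripheral model: hair-cell transduction plus synaptic
    # transmission behave approximately as a half-wave rectifier
    return np.maximum(stim, 0.0)

# toy "vowel": a 600 Hz harmonic amplitude-modulated at a 200 Hz fundamental
f0, fh = 200.0, 600.0
envelope = 0.5 * (1.0 + np.cos(2 * np.pi * f0 * t))
stim = envelope * np.sin(2 * np.pi * fh * t)

r_orig = neural_response(stim)     # response to the original polarity
r_inv = neural_response(-stim)     # response to the inverted polarity

avg_add = (r_orig + r_inv) / 2     # "+ -" average (addition)
avg_sub = (r_orig - r_inv) / 2     # "- -" average (subtraction)

freqs = np.fft.rfftfreq(t.size, 1 / fs)

def amplitude(x, f):
    # spectral amplitude of x at frequency f (an exact FFT bin in this toy)
    return np.abs(np.fft.rfft(x))[np.argmin(np.abs(freqs - f))] / x.size

# the envelope component (at f0) survives addition but cancels in the
# subtraction; the spectral component (at the harmonic) survives
# subtraction but cancels in the addition
print(amplitude(avg_add, f0) > 10 * amplitude(avg_sub, f0))   # True
print(amplitude(avg_sub, fh) > 10 * amplitude(avg_add, fh))   # True
```

In this sketch the subtraction recovers the pre-rectified stimulus component exactly (avg_sub equals stim/2), which is why the − − average is described above as artificially reconstituting the spectral components and removing rectification-related distortion.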
and 2000 Hz, but only about 10 dB at 300–400 Hz, 7 dB at 200 Hz and 2 dB at 100 Hz (see Fig. 2, Puria et al., 1997). After taking this transfer function into account, the three most intense harmonics for /a/ were f8, f9, and f13, and the three least intense harmonics were f17, f18, and f20. The three least intense harmonics below 1500 Hz (the approximate frequency limit of FFR) were f0, f3, and f4. The most intense harmonics for /i/ were f2, f3, and f4, and the three least intense harmonics were f10, f12, and f15. Below 1500 Hz, the three least intense harmonics were f9, f10, and f12.

We expected + + responses and + − responses to have envelope FFRs at f0. Responses at f0 were compared to the responses at other harmonics (in terms of amplitude and in terms of incidence of significant responses). For the /a/, the first harmonic was estimated to be one of the three least intense harmonics, so a response at f0 would not likely be a spectral FFR. Spectral FFRs would be present in the + + and − − responses. This was tested by comparing measurements at the most and least intense harmonics.

A further goal of the study was to determine whether reliable responses could be detected (with the Fourier analyzer) at higher frequencies (up to approximately 1500 Hz) and in individual subjects. We therefore calculated the percentage of individual subject responses that were significant in each condition.

3. Results

3.1. Experiment 1: natural vowels

3.1.1. Responses to /a/

Fig. 4 shows the amplitude of the grand average response (coherently averaged across all subjects) to /a/. Although we were able to record up to 3000 Hz, there were no significant responses in the grand average or individual subject waveforms above f14 (1493 Hz). For simplicity, responses are only shown up to 2000 Hz. Response amplitudes at speech harmonics are represented by the narrow black bars, and mean response amplitudes at adjacent frequencies are represented by the wide gray bars. The stimulus spectrum (as calculated with the Fourier analyzer) is displayed in the bottom panel. An asterisk indicates those responses that were significant at p < 0.0021 (i.e. alpha of 0.05 with Bonferroni correction for 24 significance tests).

The three most intense harmonics in the cochlea were f8, f9 (the harmonics closest to the first formant) and f13 (the harmonic closest to the second formant). Significant responses were detected at two of these harmonics in the + + average (f8, f9), and at all three of these harmonics in the − − average. The presence of these responses in the − − average indicates that they were spectral FFRs. In the − − average, significant responses were also recorded at other stimulus harmonics: f2, f5, f6, f7, and f14.

The three least intense harmonics in the cochlea were f17, f18, and f20 (or f0, f3, and f4 below 1500 Hz). There were no significant responses to f17, f18, or f20 in any of the averages. There were no significant responses to f0, f3 and f4 in the − − average, although there were significant responses to these harmonics in the + + and + − averages, and these responses were higher in amplitude than most other responses.

The top panels of Fig. 5 show the percentage of individual subject responses that were significant at the fundamental and at the harmonic nearest to the first formant of /a/. There were no significant individual subject responses to the harmonics near the second formant (f13 and f14), even though these were significant in the grand mean average over all subjects (in the − − average response). All subjects displayed significant responses to the fundamental frequency, in both the + + and + − averages, but only one displayed significant responses in the − − average. At the harmonic nearest to the first formant (f9), most of the subjects displayed significant responses in the + + and − − averages, but none displayed significant responses in the + − average.

Fig. 5. Percentage of individual subject responses that were significant at the fundamental frequency (left) and at the harmonic nearest to the first formant (right). The top row corresponds to the /a/ stimulus, and the bottom row corresponds to the /i/ stimulus. Significance was determined by comparing the power of the response at the fundamental or harmonic with response power in adjacent frequency bins, using an F statistic. A Bonferroni correction was applied for repeated testing at 24 harmonics and 9 subjects (2160 significance tests).

3.1.2. Responses to /i/

Fig. 6 shows the grand average responses to /i/. There were no significant responses in the grand average or individual subject waveforms above f9 (1098 Hz), so responses are only shown up to 2000 Hz. Data are presented as in Fig. 4.

Fig. 6. Grand (vector) average responses to /i/. Data are presented as in Fig. 4.

The three most intense harmonics were f2 (the harmonic closest to the first formant), f3 and f4. Significant responses were detected at all of these harmonics in all averages, although response amplitude at f0 (the envelope) was greatest in the + + and + − averages, and the response amplitude at f2 (the harmonic nearest the first formant peak) was greatest in the − − condition. The three least intense harmonics were f10, f12, and f15 (or f9, f10, and f12 below 1500 Hz). No significant responses were recorded at these harmonics in any of the conditions, with the exception of a significant response to f9 in the + + average.

The bottom panels of Fig. 5 show the percentage of individual subject responses that were significant at the fundamental and at the harmonic nearest to the first formant of /i/. All subjects displayed significant responses to the fundamental frequency, in both the + + and + − averages, and half of the subjects displayed significant responses in the − − average. At the harmonic nearest to the first formant (f2), all of the subjects displayed significant responses in the + + and − − averages, and most displayed significant responses in the + − average.

3.2. Experiment 2: investigating the sources of the harmonic responses

Fig. 7 shows the results from the three parts of Experiment 2. All of the averages in this experiment were + + averages. When the stimulus was routed to the left earphone, which was not inserted into the ear, there were no significant responses (top line "artifact"). This was true for the grand mean average as well as for the individual subjects.

When responses were recorded between the right and left mastoids (see Fig. 7, second panel), the grand average response showed significant responses at f5 and f9. Only two individual subject responses were significant, and both occurred at f9.

When the stimulus was effectively masked using speech-shaped masking noise (see Fig. 7, third panel), only one significant response (at f9) was detected in the grand average, but none of the individual subject responses were significant.

4. Discussion

4.1. Envelope FFR and spectral FFR

The present study investigated the human FFR to naturally-produced vowels, and related these responses to the amplitude envelope (the fundamental frequency) and the formant frequencies in the vowel spectrum (at harmonics of the fundamental frequency, themselves modulated in amplitude at the frequency of the glottal waveform). By adding or subtracting responses to opposite polarity stimuli, response components related to the envelope ("envelope FFR") can be distinguished from components related to the spectrum ("spectral FFR").

The results of the present study can be interpreted in light of the model presented in Fig. 2. Since only spectral FFR is present in the − − average, one would expect that this average would resemble the stimulus spectrum most closely, subject to the upper frequency limit for neural phase-locking. For the /a/, the harmonics that should have been most intense in the cochlea were f8, f9, and f13 (the harmonics closest to the formant peaks). In the − − average, significant responses were recorded at all of these harmonics. The harmonics that should have been least intense in the cochlea (below 1500 Hz) were f0, f3, and f4. No significant responses were
Fig. 7. Grand (vector) average responses to /a/, in Experiment 2. Data are presented as in Fig. 4. Responses in the top panel were collected with the stimulus routed to an earphone that was not inserted in the ear canal. Responses in the second panel were collected using a horizontal electrode montage (non-inverting electrode on ipsilateral mastoid; inverting electrode on contralateral mastoid). Responses in the third panel were collected in the presence of an effective speech-shaped masker (with the standard vertical electrode montage used elsewhere in the study).
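The asterisks in Figs. 4–7 come from the F statistic described in Section 2.5: response power on the harmonic's trajectory divided by the mean power on the 16 neighbouring trajectories. The following is a simplified stationary stand-in for that procedure (plain FFT bins rather than f0-following trajectories), with hypothetical signal and noise levels chosen only for illustration:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
fs, dur = 8000, 1.5                      # sampling rate (Hz) and sweep length (s)
t = np.arange(0, dur, 1 / fs)

# hypothetical sweep: a 100 nV response at 200 Hz buried in broadband noise
recording = (100e-9 * np.sin(2 * np.pi * 200.0 * t)
             + 300e-9 * rng.standard_normal(t.size))

power = np.abs(np.fft.rfft(recording)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)  # 1/1.5 s = 0.67 Hz bin spacing

k = np.argmin(np.abs(freqs - 200.0))     # bin containing the response
# 8 bins above and 8 below estimate the non-stimulus-locked "noise"
noise_bins = np.r_[k - 8:k, k + 1:k + 9]

F = power[k] / power[noise_bins].mean()
# under the null hypothesis of no response, F follows an F distribution
# with 2 and 2 * 16 degrees of freedom (Zurek, 1992; Dobie and Wilson, 1996)
p = f_dist.sf(F, 2, 2 * noise_bins.size)
print(p < 0.0021)                        # True: the response is detected
```

The 0.0021 criterion is the Bonferroni-corrected alpha used for the grand mean recordings (0.05 divided by 24 tests); the study applied a further division by the number of subjects for individual-subject testing.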

detected at these harmonics in the − − average, even though all three were significant in the + + and + − averages.

For the /i/, the harmonics that should have been most intense in the cochlea were f2, f3, and f4. Significant responses were detected at these frequencies in the + + and + − averages, but only to f2 and f3 in the − − average. Also, the largest response in the − − average occurred at f2 (the harmonic closest to the first formant peak), whereas the largest response in the + + and + − averages occurred at f0 (the frequency of the glottal pitch envelope). Thus, the spectral FFR peaked near the first formant, while the envelope FFR peaked at the fundamental frequency.

No significant responses were recognized above 1493 Hz (in any of the averages). This indicates that the central nervous system does not follow frequencies higher than this limit in a time-locked way. Clearly the nervous system responds to these higher frequencies – but this is likely accomplished with a rate-place code rather than a synchronized temporal code (Rhode and Greenberg, 1994). A 1300 Hz limit for following temporal signals is also found in studies of sound localization wherein the timing of signals is compared between the ears (Zwislocki and Feldman, 1956). Interestingly, such localization is dependent on the carrier timing and independent of envelope timing (Schiano et al., 1986).

The response at f0 is mainly an envelope FFR. This was most clearly shown for the /a/, since it was present in the + + and + − averages, but not in the − − average. For the /i/ it was present in all of the averages. However, the first formant was very close to f0 and there may have been a combination of an envelope FFR to the glottal pitch envelope and a spectral FFR to harmonics close to the formant peak.

The responses to harmonics near formant peaks are best examined in the /a/ response, since the first two formants were separate from the fundamental, but within the frequency range of observable FFR components. The response at the harmonic closest to the first formant was significant in all of the averages, and thus might have been a combination of envelope and spectral FFR, although it also could have been entirely spectral FFR (since spectral FFR may be present in the + − average – see Fig. 2, panel E). A significant response was detected at the harmonic closest to the second formant peak in the + + and − − averages, but not in the + − average, indicating that this was a spectral FFR.

Two phenomena are not explained by the concept that f0 is detected by an envelope FFR and that formant harmonics are detected by spectral FFRs:

1. The occurrence of large envelope FFRs to the harmonics f2–f7 (in the + + and + − averages for both /a/ and /i/). These responses could be introduced by non-linearities in the auditory nervous system beyond the auditory nerve, or harmonic distortion resulting from the rectification of the envelope within the cochlea. Although the model in Fig. 2 presents the envelope FFR as a DC-shifted sinusoid (I, J), this is only because of the sinusoidal nature of the modeled stimulus envelopes (G, H). The actual envelope of the glottal waveform is triangular or sawtooth-shaped, with the glottis closing more quickly than it opens (Holmberg et al., 1988). This sawtooth characteristic is responsible for the rich harmonic structure of the vocal source, and this broadly harmonic signal is then accentuated by the resonance characteristics of the vocal tract to give the characteristic formant spectrum of a vowel. All of the harmonics present in the sound can give rise to spectral FFRs. Each harmonic varies in amplitude over time according to the glottal waveform. Envelope FFRs relate to the energy introduced by the rectification of this modulating envelope – mainly at its fundamental frequency, but probably also at its harmonics (cf. Cebulla et al., 2006). Unlike the acoustic harmonics present in the stimulus, these harmonics are produced by the rectification of the envelope during inner hair cell transduction, and are thus not spectrally-shaped by the vocal tract. They relate only to the energy in the stimulus envelope.

2. The occurrence of spectral FFRs to the /a/ at f5 and f6, where the energy in the stimulus was low. This is probably related to distortion products in the cochlea. The cochlear nonlinearity produces multiple intermodulation distortion products in response to pairs of tones, including prominent intermodulation distortion components at the 2fa − fb cubic difference frequency, and at the fb − fa difference frequency, where fa and fb can refer to any pair of frequencies (fa < fb). These are called distortion product otoacoustic emissions when acoustically measured with a sensitive microphone in the ear canal, but they can also be electrically detected in the scalp-recorded brainstem responses (Chertoff et al., 1992; Krishnan, 2002; Pandya and Krishnan, 2004; Rickman et al., 1991). Because they are not related to half-wave rectification, they are not removed by subtracting responses to alternate polarities, and they are present in the − − average (Krishnan, 2002).

The response at f5 could have followed the 2f9 − f13 cubic distortion product and/or intermodulation distortion at the f14 − f9 and f13 − f8 difference frequencies. Similarly, the response at f6 could have followed intermodulation distortion at the f14 − f8 difference frequency. These were all prominent harmonics near formant peaks (at which spectral FFRs were detected), so we would expect these harmonics to give rise to cochlear distortion products.

4.2. Contributions from the cochlear microphonic

The second experiment was conducted to investigate the possibility of non-neurogenic (e.g. artifactual or cochlear) contributions to the measured response. No significant responses were detected when the stimulus was uncoupled from the ear, while the electrical components (i.e. the EAR-3A transducer) were in place, indicating that responses could not be attributed to electrical artifact.

However, a significant response at the harmonic closest to the first formant of the /a/ was recorded with a horizontal electrode montage. This suggests a contribution of the CM to the recorded FFRs, although it could have been earlier neurogenic activity (e.g. from the auditory nerve). The scalp-recorded FFR likely represents a combination of fields generated in the upper brainstem near the inferior colliculus and in the cochlea (Sohmer et al., 1977). The cochlear response is best recorded from electrodes near the cochlea, such as on the mastoid, whereas the brainstem generator is best recorded in the vertically oriented electrode montage (Galbraith et al., 2000).

The results with masking are more definite. The masking will prevent any synchronous activation of neurons to the stimulus and therefore remove any neurally dependent response. However, masking does not affect the CM, which readily reproduces the acoustic signal in the noise. Averaging the masked CM attenuates the microphonic of the masking noise (which is random from trial to trial) and leaves the microphonic of the stimulus. Various masking techniques have been proposed to distinguish the CM from the brainstem FFR (e.g. Chimento and Schreiner, 1990). Our procedure simply demonstrated that a significant component of our response was immune to noise masking and likely CM.

A portion of what is recorded from the scalp as spectral FFR therefore originates as CM. Due to synaptic delays, CM activity occurs earlier than neural activity. At a constant frequency, these timing differences amount to differences in phase, and the contributions of each source are difficult to distinguish. However, with a changing frequency stimulus and a Fourier analyzer, the maximum response amplitude measurement should occur when the stimulus and response are temporally aligned. Thus, by repeating the analysis at different response lags, the CM component should be distinguishable from the neural component.

In the present study, the spectral content of the vowels was not variable enough to permit this type of analysis. The average instantaneous frequency rate of change was about 12 Hz/s for the /a/ tokens, and 13 Hz/s for the /i/ tokens. If this change occurred consistently in one direction, a 10 ms mismatch between the stimulus and response would result in an average frequency mismatch of 0.12 Hz. Since the resolution of the Fourier analyzer was 0.67 Hz (the reciprocal of the integration time), the response would still fall within the same analysis bin, and the amplitude of the measured response would be unaffected. With such a small rate of change, the stimulus–response timing mismatch would have to be at least 55 ms for the average deviation between the stimulus and the response to be large enough to reduce the amplitude of the measured response. Moreover, the frequency changes were small fluctuations in the rate of vocal fold vibration that ultimately remained relatively steady, so any timing mismatch between the stimulus and response would have to be even greater than this to have an effect on the measurement.

In future studies it would be helpful to use stimuli with rapid frequency changes (e.g. with exaggerated intonation), since these would facilitate a precise analysis of the temporal delay of the response. The CM may precede the spectral FFR by as little as 3 ms (Huis in't Veld et al., 1977). A stimulus with a 333 Hz/s frequency change would increase or decrease in frequency by 1 Hz in 3 ms. With a 0.67 Hz analysis resolution (1.5 s integration time), it would be possible to tease these contributions apart. The separation of neural FFR from CM could be further aided by the use of a near field (i.e. meatal or middle ear) recording electrode, which would make it possible to determine the precise timing (and phase) of the CM.

One interesting aspect of our responses is that the CM that we recorded as part of the + + or − − response was not recognizable at frequencies greater than 1500 Hz. Cochlear microphonic is generated by hair cells at all regions of the cochlea, and one therefore would not expect it to be limited by the frequency postulated as the limit of neural synchronization. One might therefore consider that part of what we are terming the CM is actually related to synchronous firing in the afferent auditory neurons (cf. Coats et al., 1979). Since this neurophonic would not survive the masking manipulation, however, we would still have to find an additional reason for no CM responses at frequencies greater than 1500 Hz. The absent CM above 1500 Hz can perhaps be explained on the basis of the stimulus energy being greatest near the first formant. If the speech energy had been more evenly distributed across the frequencies, CM may have been recognized at higher frequencies.

4.3. Stimulus–response relationships

With a harmonic stimulus like speech, stimulus components and related distortions may overlap at harmonic frequencies. The glottis produces a sawtooth-shaped pulse, introducing acoustic energy at integer multiples of the glottal pulse rate (f0). This is registered in the cochlea, where the cochlear nonlinearity produces harmonic and intermodulation distortion components, all of which occur at integer multiples of f0. The rectification involved in the transduction of stimulus energy at speech harmonics (and cochlear distortion products) produces neural harmonic and intermodulation distortion products, all of which occur at integer multiples of f0. The rectification also introduces energy at the fundamental frequency of the amplitude envelope, which overlaps with energy at f0. Finally, harmonic distortion resulting from rectification of the envelope produces energy at integer multiples of f0. The neural response occurring at a particular harmonic frequency might thus relate to stimulus energy at that harmonic, a cochlear distortion product related to the stimulus energy at other harmonics, a rectification-related distortion of a stimulus component at other harmonics, the stimulus envelope, or a rectification-related

the FFR and harmonic speech components is complicated by the spectral overlap of stimulus harmonics, distortion products, and the stimulus envelope.

The + − average provides a way to ensure that reliable speech-related information has reached the nervous system. However, it may not indicate that the nervous system is receiving sufficient information to discriminate between different vowels. The envelope FFR shows that energy is being modulated at the fundamental frequency of the vowel but it does not indicate the frequencies of the modulated energy. Deriving envelope FFRs using high-pass
distortion of the stimulus envelope. It may even relate to a distor- noise techniques may be a way to determine what frequencies
tion involved in the transduction of the signal (e.g. an earphone the envelope FFR is carrying.
non-linearity), electrical artifact, or in the generation of the CM. The spectral FFR that is obtained in the   average may be the
This precludes any simple interpretation of the speech-evoked most useful measure clinically, since it can be used to assess the
FFR – the response at a given speech harmonic may not indicate audibility of formants via the related harmonics. Future studies
audibility of that harmonic. should incorporate techniques to eliminate the potential for co-
Using both +  and   averages (Table 2), it is possible to tease chlear microphonic contamination, so that these responses can
apart some of the possible contributions to the scalp-recorded re- be unequivocally related to neural activity. These responses should
sponse. The   average eliminates the envelope FFR. The +  aver- also be related to behavioral measures of speech understanding.
age eliminates the CM, and preserves the envelope FFR. However, The best approach for evaluating the neural encoding of speech
we have seen that some distortion products of the spectral FFR has yet to be determined, but this will likely involve the measure-
may show up in the +  average. Interestingly one of the early stud- ment of brainstem FFRs, along with other speech-evoked activity
ies of the tone-evoked FFR demonstrated a neurally generated FFR from the cortex. Future studies should focus on ways to relate re-
(as distinct from the CM) by using a +  average which showed a corded responses to the neural encoding of important speech com-
response at twice the frequency of the tone (Sohmer et al., 1977). ponents (e.g. formants), and to relate these responses to speech
Since frequency components in the +  average could relate to understanding.
distorted spectral FFR or envelope FFR, this derivation is not partic-
ularly helpful in distinguishing the two types of FFR. A simpler way Acknowledgments
to distinguish spectral and envelope FFR would be to compare the
+ + average with the   average. Spectral FFR components should This study was supported by a Grant from the Canadian Insti-
be present in both, while envelope FFR components should only be tutes for Health Research and by funds donated by James Knowles.
present in the former. However, the +  average is unique in that it Patricia Van Roon provided technical assistance with the record-
eliminates the CM and the stimulus artifact. The +  average would ings and with the manuscript.
thus be preferred for establishing that a neurogenic response has
occurred. References
Recent work with the ABR (Johnson et al., 2007; King et al.,
2002; Russo et al, 2004) has used a +  average to evaluate the Aiken, S.J., Picton, T.W., 2006. Envelope following responses to natural vowels.
transient FFR to brief speech stimuli. Stimulus artifact and CM Audiol. Neuro-otol. 11 (4), 213–232.
Aiken, S.J., Picton, T.W., 2008. Cortical responses to the speech envelope. Ear Hear.
are eliminated by averaging together responses to stimuli of oppo- 29 (2), 139–157.
site polarity leaving an FFR that follows the fundamental frequency Anderson, D.J., Rose, J.E., Hind, J.E., Brugge, J.F., 1971. Temporal position of
of the vowel. In our terminology this would be an envelope FFR. discharges in single auditory nerve fibers within the cycle of a sine-wave
stimulus: frequency and intensity effects. J. Acoust. Soc. Am. 49 (4, Suppl. 2),
The downside of the +  average is that it eliminates most spec- 1131–1139.
tral FFR in addition to the CM. For instance, +  average responses Billings, C.J., Tremblay, K.L., Souza, P.E., Binns, M.A., 2007. Effects of hearing aid
(to the /a/) were not detected at prominent stimulus harmonics or amplification and stimulus intensity on cortical auditory evoked potentials.
Audiol. Neuro-otol. 12 (4), 234–246.
at their second harmonics, apart from the significant response at f9.
Brugge, J.F., Anderson, D.J., Hind, J.E., Rose, J.E., 1969. Time structure of discharges in
If the goal of the technique is to assess the neural encoding of par- single auditory nerve fibers of the squirrel monkey in response to complex
ticular speech features (e.g. harmonics), it would be best to use the periodic sounds. J. Neurophys. 32 (3), 1005–1024.
  average, and control for the CM. Cebulla, M., Stürzebecher, E., Elberling, C., 2006. Objective detection of auditory
steady-state responses: comparison of one-sample and q-sample tests. J. Am.
Cochlear microphonic can be identified by recording noise- Acad. Audiol. 17 (2), 93–103.
masked responses. Masking eliminates the FFR, but not the cochlear Chertoff, M.E., Hecox, K.E., Goldstein, R., 1992. Auditory distortion products
microphonic. With a stimulus with rapid frequency changes, the co- measured with averaged auditory evoked potentials. J. Speech Hear. Res. 35
(1), 157–166.
chlear microphonic and the neural FFR could be distinguished by Chimento, T.C., Schreiner, C.E., 1990. Selectively eliminating cochlear microphonic
varying the stimulus–response delay in the Fourier analyzer. The contamination from the frequency-following response. Electroencephalogr.
short-latency cochlear microphonic would be expected to be pres- Clin. Neurophysiol. 75 (2), 88–96.
Coats, A.C., Martin, J.L., Kidder, H.R., 1979. Normal short-latency
ent in the noise-masked recording, but the longer latency spectral electrophysiological filtered click responses recorded from vertex and
FFR would be expected to be eliminated by the masker. It should external auditory meatus. J. Acoust. Soc. Am. 65 (3), 747–758.
therefore be possible to separate the FFR from cochlear micro- Cunningham, J., Nicol, T., Zecker, S., Bradlow, A., Kraus, N., 2001. Neurobiologic
responses to speech in noise in children with learning problems: deficits and
phonic, and to verify the effectiveness of this separation with mask- strategies for improvement. Clin. Neurophys. 112 (5), 758–767.
ing. A simpler technique might be to subtract the noise-masked Dajani, H., Purcell, D., Wong, W., Kunov, H., Picton, T., 2005. Recording human
response from the   average (Chimento and Schreiner, 1990). evoked potentials that follow the pitch contour of a natural vowel. IEEE Trans.
Biomed. Eng. 52 (9), 1614–1618.
Dimitrijevic, A., John, M.S., Picton, T.W., 2004. Auditory steady-state responses and
4.4. Clinical Implications word recognition scores in normal-hearing and hearing-impaired adults. Ear
Hear. 25 (1), 68–84.
The speech-evoked FFR might be useful for the validation of Dobie, R.A., Wilson, M.J., 1996. A comparison of t test, F test and coherence methods
of detecting steady-state auditory evoked potentials, distortion-product
hearing aid fittings in infants, by providing information about the otoacoustic emissions, or other sinusoids. J. Acoust. Soc. Am. 100 (4), 2236–
neural encoding of speech. However, the relationship between 2246.
S.J. Aiken, T.W. Picton / Hearing Research 245 (2008) 35–47 47

Fant, G., 1970. Acoustical Theory of Speech Production: With Calculations Based on X-Ray Studies of Russian Articulations. Walter de Gruyter, The Hague.
Galbraith, G.C., Threadgill, M.R., Hemsley, J., Salour, K., Sondej, N., Ton, J., Cheung, L., 2000. Putative measure of peripheral and brainstem frequency-following in humans. Neurosci. Lett. 292 (2), 123–127.
Goblick, T.J., Pfeiffer, R.R., 1969. Time-domain measurements of cochlear nonlinearities using combination click stimuli. J. Acoust. Soc. Am. 46 (4), 924–938.
Golding, M., Pearce, W., Seymour, J., Cooper, A., Ching, T., Dillon, H., 2007. The relationship between obligatory cortical auditory evoked potentials (CAEPs) and functional measures in young infants. J. Am. Acad. Audiol. 18 (2), 117–125.
Greenberg, S., Marsh, J.T., Brown, W.S., Smith, J.C., 1987. Neural temporal coding of low pitch. I. Human frequency following responses to complex tones. Hear. Res. 25 (2–3), 91–114.
Herdman, A.T., Stapells, D.R., 2003. Auditory steady-state response thresholds of adults with sensorineural hearing impairments. Int. J. Audiol. 42 (5), 237–248.
Hillenbrand, J., Getty, L.A., Clark, M.J., Wheeler, K., 1995. Acoustic characteristics of American English vowels. J. Acoust. Soc. Am. 97 (5 pt 1), 3099–3111.
Holmberg, E.B., Hillman, R.E., Perkell, J.S., 1988. Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice. J. Acoust. Soc. Am. 84 (2), 511–529 (Erratum in: J. Acoust. Soc. Am. 85 (4) (1989) 1787).
Huis in’t Veld, F., Osterhammel, P., Terkildsen, K., 1977. The frequency selectivity of the 500 Hz frequency following response. Scand. Audiol. 6 (1), 35–42.
John, M.S., Picton, T.W., 2000. MASTER: a Windows program for recording multiple auditory steady-state responses. Comput. Meth. Prog. Biomed. 61 (2), 125–150.
Johnson, K.L., Nicol, T.G., Kraus, N., 2005. Brain stem response to speech: a biological marker of auditory processing. Ear Hear. 26 (5), 424–434.
Johnson, K.L., Nicol, T.G., Zecker, S.G., Kraus, N., 2007. Auditory brainstem correlates of perceptual timing deficits. J. Cognit. Neurosci. 19 (3), 376–385.
King, C., Warrier, C.M., Hayes, E., Kraus, N., 2002. Deficits in auditory brainstem pathway encoding of speech sounds in children with learning problems. Neurosci. Lett. 319 (2), 111–115.
Korczak, P.A., Kurtzberg, D., Stapells, D.R., 2005. Effects of sensorineural hearing loss and personal hearing aids on cortical event-related potential and behavioral measures of speech-sound processing. Ear Hear. 26 (2), 165–185.
Krishnan, A., 1999. Human frequency-following responses to two-tone approximations of steady-state vowels. Audiol. Neuro-otol. 4 (2), 95–103.
Krishnan, A., 2002. Human frequency-following responses: representation of steady-state synthetic vowels. Hear. Res. 166 (1–2), 192–201.
Krishnan, A., 2007. Frequency-following response. In: Burkhard, R.F., Eggermont, J.J., Don, M. (Eds.), Auditory Evoked Potentials: Basic Principles and Clinical Application. Lippincott Williams & Wilkins, New York, pp. 313–333.
Krishnan, A., Xu, Y., Gandour, J.T., Cariani, P.A., 2004. Human frequency-following response: representation of pitch contours in Chinese tones. Hear. Res. 189 (1–2), 1–12.
Levi, E.C., Folsom, R.C., Dobie, R.A., 1995. Coherence analysis of envelope-following responses (EFRs) and frequency-following responses (FFRs) in infants and adults. Hear. Res. 89 (1–2), 21–27.
Liberman, A.M., Delattre, P.C., Cooper, F.S., Gerstman, L.J., 1954. The role of consonant–vowel transitions in the perception of the stop and nasal consonants. Psychol. Monogr. 68, 1–13.
Lins, O.G., Picton, T.W., Boucher, B.L., Durieux-Smith, A., Champagne, S.C., Moran, L.M., Perez-Abalo, M.C., Martin, V., Savio, G., 1996. Frequency-specific audiometry using steady-state responses. Ear Hear. 17 (1), 81–96.
Luts, H., Desloovere, C., Kumar, A., Vandermeersch, E., Wouters, J., 2004. Objective assessment of frequency-specific hearing thresholds in babies. Int. J. Pediat. Otorhinolaryngol. 68 (7), 915–926.
Luts, H., Wouters, J., 2005. Comparison of MASTER and AUDERA for measurement of auditory steady-state responses. Int. J. Audiol. 44 (4), 244–253.
Moushegian, G., Rupert, A.L., Stillman, R.D., 1973. Laboratory note. Scalp-recorded early responses in man to frequencies in the speech range. Electroencephalogr. Clin. Neurophysiol. 35 (6), 665–667.
Pandya, P.K., Krishnan, A., 2004. Human frequency-following response correlates of the distortion product at 2F1–F2. J. Am. Acad. Audiol. 15 (3), 184–197.
Picton, T.W., Dimitrijevic, A., van Roon, P., John, M.S., Reed, M., Finkelstein, H., 2001. Possible roles for the auditory steady-state responses in fitting hearing aids. In: Seewald, R.C., Gravel, J.S. (Eds.), A Sound Foundation through Early Amplification: Proceedings of the Second International Conference. Phonak AG, Basel, pp. 59–69.
Picton, T.W., John, M.S., Dimitrijevic, A., Purcell, D., 2003. Human auditory steady-state responses. Int. J. Audiol. 42 (4), 177–219.
Plyler, P., Ananthanarayan, A.K., 2001. Human frequency-following responses: representation of second formant transitions in normal and hearing-impaired listeners. J. Am. Acad. Audiol. 12 (10), 523–533.
Puria, S., Peake, W.T., Rosowski, J.J., 1997. Sound-pressure measurements in the cochlear vestibule of human-cadaver ears. J. Acoust. Soc. Am. 101 (5), 2754–2770.
Rance, G., Cone-Wesson, B., Wunderlich, J., Dowell, R., 2002. Speech perception and cortical event related potentials in children with auditory neuropathy. Ear Hear. 23 (3), 239–253.
Rhode, W.S., Greenberg, S., 1994. Encoding of amplitude modulation in the cochlear nucleus of the cat. J. Neurophysiol. 71 (5), 1797–1825.
Rickman, M.D., Chertoff, M.E., Hecox, K.E., 1991. Electrophysiological evidence of nonlinear distortion products to two-tone stimuli. J. Acoust. Soc. Am. 89 (6), 2818–2826.
Rosner, B.S., Pickering, J.B., 1994. Vowel Perception and Production. Oxford University Press, Toronto.
Russo, N., Nicol, T., Musacchia, G., Kraus, N., 2004. Brainstem responses to speech syllables. Clin. Neurophysiol. 115 (9), 2021–2030.
Schiano, J.L., Trahiotis, C., Bernstein, L.R., 1986. Lateralization of low-frequency tone and narrow bands of noise. J. Acoust. Soc. Am. 79 (5), 1563–1570.
Small, S.A., Stapells, D.R., 2005. Multiple auditory steady-state responses to bone-conduction stimuli in adults with normal hearing. J. Am. Acad. Audiol. 16 (3), 172–183.
Sohmer, H., Pratt, H., Kinarti, R., 1977. Sources of frequency following responses (FFR) in man. Electroencephalogr. Clin. Neurophysiol. 42 (5), 656–664.
Stapells, D.R., 2000a. Threshold estimation by the tone-evoked auditory brainstem response: a literature meta-analysis. J. Speech-Lang. Path. Audiol. 24 (2), 74–83.
Stapells, D.R., 2000b. Frequency-specific evoked potential audiometry in infants. In: Seewald, R.C. (Ed.), A Sound Foundation through Early Amplification: Proceedings of an International Conference. Phonak AG, Basel, pp. 13–31.
Stapells, D.R., 2002. The tone-evoked ABR: why it’s the measure of choice for young infants. Hear. J. 55, 14–18.
Stapells, D.R., Picton, T.W., Durieux-Smith, A., Edwards, C.G., Moran, L.M., 1990. Thresholds for short-latency auditory-evoked potentials to tones in notched noise in normal-hearing and hearing-impaired subjects. Audiology 29 (5), 262–274.
Stroebel, D., Swanepoel, W., Groenewald, E., 2007. Aided auditory steady-state responses in infants. Int. J. Audiol. 46 (6), 287–292.
Stueve, M.P., O’Rourke, C., 2003. Estimation of hearing loss in children: comparison of auditory steady-state response, auditory brainstem response, and behavioral test methods. Am. J. Audiol. 12 (2), 125–136.
Tlumak, A.I., Rubinstein, E., Durrant, J.D., 2007. Meta-analysis of variables that affect accuracy of threshold estimation via measurement of the auditory steady-state response (ASSR). Int. J. Audiol. 46 (11), 692–710.
Tremblay, K.L., Billings, C.J., Friesen, L.M., Souza, P.E., 2006. Neural representation of amplified speech sounds. Ear Hear. 27 (2), 93–103.
Voigt, H.F., Sachs, M.B., Young, E.D., 1982. Representation of whispered vowels in discharge patterns of auditory-nerve fibers. Hear. Res. 8 (1), 49–58.
Wunderlich, J.L., Cone-Wesson, B.K., Shepherd, R., 2006. Maturation of the cortical auditory evoked potential in infants and young children. Hear. Res. 212 (1–2), 185–202.
Yamada, O., Yamane, H., Kodera, K., 1977. Simultaneous recordings of the brain stem response and the frequency-following response to low-frequency tone. Electroencephalogr. Clin. Neurophysiol. 43 (3), 362–370.
Zurek, P.M., 1992. Detectability of transient and sinusoidal otoacoustic emissions. Ear Hear. 13 (5), 307–310.
Zwislocki, J., Feldman, R.S., 1956. Just noticeable difference in dichotic phase. J. Acoust. Soc. Am. 28 (5), 860–864.
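Editor's note: the polarity-based averaging discussed in Section 4.3 can be illustrated with a toy numerical sketch. This is not the recording or analysis pipeline of the study; the two-harmonic "vowel", the half-wave rectifier standing in for the transduction nonlinearity, and all signal parameters below are invented for illustration. Adding responses to opposite-polarity stimuli preserves the envelope component at f0 (an envelope FFR) while cancelling polarity-following components; subtracting preserves the fine-structure (spectral) components while cancelling the envelope.

```python
import numpy as np

fs = 16000                        # sampling rate (Hz), illustrative
t = np.arange(0, 0.5, 1 / fs)     # 500 ms of signal
f0 = 100.0                        # "glottal pulse rate" (Hz)

def stimulus(polarity):
    # Two harmonics of f0 (700 and 800 Hz): fine structure whose envelope beats at f0.
    return polarity * (np.sin(2 * np.pi * 7 * f0 * t) +
                       np.sin(2 * np.pi * 8 * f0 * t))

def response(polarity):
    # Half-wave rectification stands in for the transduction nonlinearity.
    return np.maximum(stimulus(polarity), 0.0)

r_pos, r_neg = response(+1.0), response(-1.0)
sum_avg = (r_pos + r_neg) / 2     # "+-" average: envelope survives, polarity-following parts cancel
diff_avg = (r_pos - r_neg) / 2    # "--" average: fine structure survives, envelope cancels

def amp(x, f):
    # Fourier-analyzer-style amplitude of x at frequency f (exact over integer cycles).
    return 2 * np.abs(np.mean(x * np.exp(-2j * np.pi * f * t)))

print(amp(sum_avg, f0), amp(sum_avg, 7 * f0))    # energy at f0, almost none at 700 Hz
print(amp(diff_avg, f0), amp(diff_avg, 7 * f0))  # almost none at f0, fine structure at 700 Hz
```

Note that the rectified responses also carry distortion at other integer multiples of f0, which is exactly the interpretive ambiguity the text describes: energy at a given harmonic of the averaged response need not come from stimulus energy at that harmonic.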

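The masking argument in the discussion — averaging many noise-masked sweeps attenuates the random masker (by roughly the square root of the number of trials) while the stimulus-locked cochlear microphonic survives — can be sketched in the same toy style. The CM amplitude, masker level, frequency, and trial count below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(0, 0.3, 1 / fs)
f = 700.0                                # illustrative CM frequency (Hz)
cm = 0.2 * np.sin(2 * np.pi * f * t)     # stimulus-locked "cochlear microphonic"

def masked_trial():
    # Under masking noise the neural FFR is abolished; the CM still reproduces
    # the stimulus on every trial, so only the masker varies from trial to trial.
    return cm + rng.normal(0.0, 1.0, t.size)

n_trials = 400
avg = sum(masked_trial() for _ in range(n_trials)) / n_trials

def amp(x, freq):
    # Fourier-analyzer-style amplitude of x at frequency freq.
    return 2 * np.abs(np.mean(x * np.exp(-2j * np.pi * freq * t)))

print(amp(avg, f))   # converges to the 0.2 CM amplitude as the random masker averages out
```

A response component that survives this manipulation, as in the paper's argument, cannot be a neurally generated FFR.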