Theory and Design of Spatial Active Noise Control Systems
All content following this page was uploaded by Hanchi Chen on 09 June 2019.
Hanchi Chen
September 2017
A thesis submitted for the degree of Doctor of Philosophy
of The Australian National University
Research School of Engineering
College of Engineering and Computer Science
The Australian National University
Declaration
The contents of this thesis are the results of original research and have not been
submitted for a higher degree to any other university or institution. Much of this
work has either been published or submitted for publication as journal papers and
conference proceedings. The following is a list of these papers.
Journal Publications
Conference Proceedings
The following papers are also results from my Ph.D. study, but not included in
this thesis:
Conference Proceedings
The research work presented in this thesis has been performed jointly with Prof.
Thushara D. Abhayapala, Dr. Wen Zhang and Dr. Prasanga Samarasinghe. Ap-
proximately 80% of this work is my own.
Hanchi Chen
Acknowledgements
Without the support of many colleagues and friends, this work would never have
been completed. I would like to acknowledge and thank each of the following.
First and foremost, my supervisors, Prof. Thushara Abhayapala and Dr. Wen
Zhang, for their professional guidance and consistent encouragement. Special
thanks go to Thushara, who provided me with knowledge and experience
not only in research, but also in many other aspects of life.
Dr. Glenn Dickins for inviting me to visit Dolby Labs, and sharing with
me his extensive knowledge of every aspect of audio.
The Australian National University, for the PhD scholarship and the funding
and assistance for my patent application.
Mr. Xianjun Zhen and Mr. Erasmo Scipione for providing technical support
and electronics parts for my experiments.
Mr. Yuki Mitsufuji for giving me the internship opportunity at Sony Japan.
Abstract
The concept of spatial active noise control is to use a number of loudspeakers to
generate anti-noise sound waves, which would cancel the undesired acoustic noise
over a spatial region. The acoustic noise hazards that exist in a variety of situations
provide many potential applications for spatial ANC. However, using existing ANC
techniques, it is difficult to achieve satisfactory noise reduction over a spatial area,
especially using a practical hardware setup. Therefore, this thesis explores various
aspects of spatial ANC, and seeks to develop algorithms and techniques to improve
the performance and feasibility of spatial ANC in real-life applications.
We use the spherical harmonic analysis technique as the basis for our research
in this work. This technique provides an accurate representation of the spatial
noise field, and enables in-depth analysis of the characteristics of the noise field.
Incorporating this technique into the design of spatial ANC systems, we developed
a series of algorithms and methods that optimize spatial ANC systems, towards
both improving noise reduction performance and reducing system complexity.
Several contributions of this work are: (i) design of compact planar microphone
array structures capable of recording 3D spatial sound fields, so that the noise field
can be monitored with minimum physical intrusion to the quiet zone, (ii) derivation
of a Direct-to-Reverberant Energy Ratio (DRR) estimation algorithm which can be
used for evaluating reverberant characteristics of a noisy environment, (iii) proposal
of several methods to estimate and optimize the spatial noise reduction of an ANC system,
including a new metric for measuring spatial noise energy level, and (iv) design of
an adaptive spatial ANC algorithm incorporating the spherical harmonic analysis
technique. The combination of these contributions enables the design of compact,
high performing spatial ANC systems for various applications.
Contents
Declaration
Acknowledgements
Abstract
1 Introduction
1.1 Motivation and scope
1.2 Problem description
1.3 Recent advancements in spatial ANC
1.4 Thesis outline
Bibliography
Notations and Symbols
Chapter 1
Introduction
two sound waves would cancel each other, thus reducing the noise level (Fig. 1.2).
Contrary to the passive noise control strategy, the active noise control method works
better at lower frequencies [2]. At lower frequencies (up to a few hundred Hz), the
wavelength of the sound is longer, thus making it easier for the anti-noise signal to
match with the unwanted noise.
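The frequency dependence can be illustrated with an idealized calculation. The sketch below assumes a pure tone and an anti-noise tone whose only imperfection is a fixed timing error; the function name and the 50 microsecond error are illustrative choices, not values from this thesis. For such a pair the residual amplitude is |1 − g·e^{−i2πfτ}|, so the same timing error costs far more attenuation at high frequencies than at low ones.

```python
import cmath
import math

def attenuation_db(freq_hz, delay_error_s, gain=1.0):
    """Noise reduction in dB for a tone cancelled by an anti-noise tone.

    `gain` is the amplitude of the anti-noise relative to the noise and
    `delay_error_s` is its timing error; both are hypothetical parameters.
    A perfect match (zero residual) is not handled by this sketch.
    """
    phase_error = 2.0 * math.pi * freq_hz * delay_error_s
    residual = abs(1.0 - gain * cmath.exp(-1j * phase_error))
    return -20.0 * math.log10(residual)

# Same 50 microsecond timing error, three different frequencies.
for f in (100.0, 500.0, 2000.0):
    print(f"{f:6.0f} Hz: {attenuation_db(f, 50e-6):5.1f} dB")
```

Under these assumptions the achievable attenuation falls steadily as the frequency rises, which is consistent with the observation that active control is most effective for low-frequency noise.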
The most commonly seen application of the active noise control technique is
active noise cancelling (ANC) headphones (Fig. 1.3). ANC headphones
typically employ a reference microphone mounted on the outer surface of the
headphone housing. The reference microphone picks up the ambient noise and sends
the noise signal to a processing unit, which generates the anti-noise signal and
plays it through the headphone driver along with the music signal [3]. In some
designs, an additional error microphone is placed inside the ear cup to monitor the
residual noise. It is also possible to use a feedback ANC structure, in which the
reference microphone is omitted; one such design is detailed in [3]. Noise cancelling
headphones can yield reasonably good noise attenuation, partly because
the secondary loudspeaker and the error microphone are placed very close to
the ear. According to [4], significant attenuation of sinusoidal noise signals can be
achieved for frequencies up to 2 kHz. Another study on consumer ANC headphone
performance [5] suggests that the noise reduction achievable by ANC headphones
is typically between 10 and 25 dB, and that the performance is highly dependent on
how tightly the headphones are worn.
Although ANC headphones yield very good performance in terms of noise level
attenuation, one of their disadvantages is that the user is required to wear
the headphones constantly, which is inconvenient, or even impractical, in many
scenarios. In such cases, it is desirable to attenuate the noise over a spatial area,
such that people within the area can enjoy a noise-free acoustic environment. A
well-developed approach to achieving this goal is the multiple-input multiple-output
(MIMO) ANC system, or multi-channel ANC system [6]. In these systems, multiple secondary
loudspeakers are used to generate the anti-noise signals, while multiple error
microphones are distributed in the quiet zone to monitor the residual noise level.
For feed-forward systems, one or more reference microphones are placed close to the
noise source to pick up the noise signal; for feedback systems, reference microphones
are not needed [6]. Fig. 1.4 illustrates a feed-forward MIMO ANC system.
The MIMO ANC system has been successfully implemented to reduce the noise
in environments such as vehicle cabins [7, 8] and rooms [9]. However, conventional
MIMO ANC controllers minimize the sound pressure measured by the error mi-
crophones. Since the noise level is only known at the microphone locations, when
a number of error microphones are randomly distributed inside the desired quiet
zone, only the space in the proximity of each microphone can be expected to have
significant noise reduction; in the area not covered by microphones, noise reduction
cannot be guaranteed. One straightforward solution to this problem is to place a
large number of microphones inside the quiet zone; however, this approach greatly
reduces the feasibility of MIMO ANC systems in real-life applications.
A potential way of overcoming this issue is to employ spatial sound analysis
techniques, where the noise captured by a microphone array is transformed
into another domain, which results in a more accurate representation of the spatial
noise field. One such technique is spherical harmonic analysis [10], where
the noise field inside a spherical region is decomposed into a series of spherical
harmonic functions. This technique allows accurate representation and reconstruction
of the noise field, which makes it possible to perform ANC over a continuous
space, rather than at a number of sampling points. Furthermore, the transformation
into the spherical harmonic domain allows in-depth analysis of the noise field, such as
Direction-of-Arrival (DOA) estimation [11] and Direct-to-Reverberant Ratio (DRR)
estimation [12]. However, in order to perform the spherical harmonic analysis, the
error microphones need to be arranged in specific geometries, typically in a spherical
arrangement [13, 14].
In general, spherical microphone arrays designed for spherical harmonic analysis
of sound fields can be divided into two categories: the rigid sphere topology and the
open sphere topology. In the former case, the microphones are mounted on a rigid
spherical baffle whose radius is the same as that of the region of interest; in the open
sphere case, microphones are placed on the surface of the region of interest, without
the use of a rigid baffle. However, for the open sphere topology, the microphone
array may suffer from ill-conditioning, due to the inherent properties of spherical
Bessel functions [15]. One way to overcome this is to use two concentric spherical
arrays with similar radii [15–18]. Although the open sphere topology is easier to
implement than the rigid sphere topology when the region size is large, the region
of interest is still fully surrounded by microphones, which limits its feasibility in
practical ANC applications.
From the discussion above, an unsolved problem regarding the active noise control
technique can be summarized as follows:
How can a complicated noise field be attenuated over a space using active noise
cancellation strategies, especially with a hardware system that is feasible for practical
applications?
Noise field modelling is about acquiring information about the noise field, so that
an ANC algorithm can use this information to generate suitable anti-noise signals to
cancel the noise. This can be further divided into two elements, namely the real-time
tracking of the noise and the characterization of the noise field. The real-time tracking
of the noise happens while the ANC system is online; it provides the ANC system
with the instantaneous noise field information, and measures the noise attenuation
achieved by the ANC system, so that the system can quickly respond to changes
in the noise field and minimize the residual noise. A number of sensors, typically
microphones, are usually employed to keep track of the noise in real time. The
number and position of these sensors play a key role in determining the performance
of an ANC system. Although distributing a large number of sensors over the entire
quiet zone would provide very complete information about the noise field, practical
applications demand more compact and economical sensing solutions. In addition,
in some ANC systems, the reference noise is synthesized based on measurement of
noise source movement (such as engine rotation) and some prior knowledge of the
noise composition (such as harmonic components); this can also be categorized as
real-time noise tracking.
On the other hand, modelling of the noise characteristics can be done without a
functional ANC system; it is about analyzing the nature of the spatial noise, such
as its spectrum, direction of arrival, and spatial dimensionality. This information
helps to determine whether or not a given noise environment is suitable for spatial
ANC, and whether the characteristics of the noise field can be exploited to reduce
the design complexity of the spatial ANC system. Both the noise source itself and a
reverberant environment contribute to the characteristics of the spatial noise field.
Modelling them separately would provide more insight into the noise field.
Since the goal of every ANC system is to use a suitable sound wave to cancel the
noise, generation of the optimal anti-noise signal is critical to the performance of a
spatial ANC system. The position of the secondary loudspeakers, and consequently
the sound field they can produce inside the quiet zone, play an important role in
determining the anti-noise signal to be played. Badly positioned loudspeakers may
result in very high driving signals, causing excessive noise level outside the quiet
zone without achieving any significant noise attenuation inside the quiet zone, and
may damage the loudspeakers themselves; on the other hand, a few well-placed loud-
speakers may be able to minimize the noise level over a large region with very small
output power. Once the loudspeakers are placed, it is then critical to accurately
measure, and keep track of, the acoustic signal channels between each loudspeaker
and the quiet zone, as inaccurate channel information can cause instability of the
ANC system. Therefore, studying the placement of loudspeakers and estimating the
loudspeaker channels are essential for designing compact and efficient spatial
ANC systems.
Designing loudspeakers suitable for spatial ANC is also important, and some of
the design goals are different from those of consumer loudspeaker products. While
consumer products aim for a wide and flat frequency response, strong and deep bass,
and an attractive design, loudspeakers designed for ANC purposes should have
characteristics such as low harmonic distortion (especially at low frequencies), high
sensitivity, and good power handling capabilities, combined with a small form factor.
The frequency range need only be as wide as the target noise frequency band, and a
flat response curve is not necessary, since the adaptive filter will act as an equalizer
automatically. Although in some cases loudspeakers designed for music listening
have to be employed for ANC purposes, such as in-car noise cancellation, it is
still desirable to keep in mind the properties that make a good ANC loudspeaker
when selecting speakers for the ANC system.
The active noise control algorithm governs how the anti-noise signal is generated,
and depending on the optimization criteria, each algorithm would result in different
noise attenuation level at each position within the quiet zone. The Least-Mean-
Square error algorithm, commonly used in existing multi-channel ANC systems, may
not result in the best performance in a spatial noise control application. Utilizing
the latest spatial sound field analysis techniques, more advanced ANC algorithms
may be developed.
Many active noise control systems utilize an adaptive algorithm to estimate the
noise channel, as well as generate the driving signals for loudspeakers. The use of
of a large sound field, using the local spherical harmonic coefficients captured by
each higher order microphone. Compared to the method developed in Chapter 3,
this method requires a significantly smaller number of microphone units, due to the
use of higher order microphones. This method can be seen as a generalization of the
method proposed in Chapter 3.
This chapter presents an algorithm for DRR estimation using a first order micro-
phone system, which helps to characterize the noise environment, and the relevant
room acoustics. Using the relationship between first order spherical harmonics and
the acoustic particle velocity developed in Chapter 2, we derive an expression for
modelling certain characteristics of the reverberation that are related to DRR esti-
mation. Based on the estimated reverberation characteristics, we use the coherence
function between sound pressure and particle velocity to estimate DRR. All the re-
quired data can be obtained using a single first order microphone. The proposed
method addresses the overestimation problem observed in a previous DRR estima-
tion algorithm.
Overview: This chapter provides a brief overview of the theory and techniques re-
lated to spherical harmonic analysis. We first introduce the mathematical expressions
of the spherical harmonic expansion, and show how these expressions can be used
to express a spatial sound field. Then, we present a number of special properties of
the spherical harmonics. This is followed by a review of the techniques for recording
spatial sound using microphone arrays, as well as synthesizing spatial sound using
loudspeaker arrays, both of which are based on spherical harmonic analysis. The
techniques described in this chapter form a foundation for the rest of the thesis.
Background: Spherical harmonic analysis and synthesis of sound fields
decomposition reveals the underlying characteristics of the sound field, thus allowing
high-accuracy manipulation and analysis of the sound field. Therefore, in this
thesis, we use spherical harmonic analysis as the fundamental tool for the
development of the theories.
The essential idea of spherical harmonic analysis of a sound field is to use the
weighted sum of a set of orthogonal basis functions to describe the pressure field of
propagating sound. These functions, known as spherical harmonics, are solutions to
the Helmholtz wave equation in 3D space and represent the propagation modes
of a sound wave.
The spherical harmonics expansion of a sound field is divided into two cases: the
interior field expression, and the exterior field expression. The former is used to
describe the wave field within a spatial region with no sound source inside, and all
impinging sound waves are due to sources outside the region; the latter is used for
the situations where the sound sources are positioned within a limited area, and the
region of interest is defined as the space enclosing the source area.
In this work, only the interior field problem is considered; therefore, we only
describe the spherical harmonic expansion for the interior field case in this section.
Consider a sound field within a source-free region. The sound pressure at a point
$(r, \theta, \phi)$ with respect to the origin $O$ can be expressed as [46]
$$P(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta, \phi) \tag{2.1}$$
where $C_{lm}(k)$ are the spherical harmonic coefficients, $k = 2\pi f/c$ is the wave number, $f$
is the frequency, $c$ is the speed of sound propagation, $j_l(kr)$ is the $l$th order spherical
Bessel function of the first kind, and $Y_{lm}(\theta, \phi)$ are the spherical harmonics, defined
by
$$Y_{lm}(\theta, \phi) = \mathcal{P}_{l|m|}(\cos\theta)\, E_m(\phi) \tag{2.2}$$
where
$$\mathcal{P}_{l|m|}(\cos\theta) \triangleq \sqrt{\frac{(2l+1)}{2}\,\frac{(l-|m|)!}{(l+|m|)!}}\; P_{l|m|}(\cos\theta), \quad \text{and} \tag{2.3}$$
$$E_m(\phi) \triangleq (1/\sqrt{2\pi})\, e^{im\phi} \tag{2.4}$$
are the normalized associated Legendre functions and normalized exponential
functions, respectively; $P_{l|m|}(\cos\theta)$ are the associated Legendre functions.
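As an illustration of definitions (2.2) to (2.4), the following is a minimal sketch of how the spherical harmonics could be evaluated numerically. It assumes that the associated Legendre functions carry no Condon-Shortley phase, a convention inferred here from the positive value of $Y_{11}(\pi/2, 0)$ used later in this chapter; the function names are illustrative and not part of the thesis.

```python
import cmath
import math

def assoc_legendre(l, m, x):
    """Unnormalized associated Legendre function P_{l m}(x) for 0 <= m <= l.

    Assumption: no Condon-Shortley phase factor (-1)^m is included.
    """
    root = math.sqrt(max(0.0, 1.0 - x * x))
    p_prev = 1.0                       # P_{m m}(x) = (2m-1)!! (1-x^2)^{m/2}
    for k in range(1, m + 1):
        p_prev *= (2 * k - 1) * root
    if l == m:
        return p_prev
    p_curr = x * (2 * m + 1) * p_prev  # P_{m+1, m}(x)
    for n in range(m + 2, l + 1):      # upward recurrence in the degree
        p_prev, p_curr = p_curr, ((2 * n - 1) * x * p_curr - (n + m - 1) * p_prev) / (n - m)
    return p_curr

def normalized_legendre(l, m, x):
    """Normalized associated Legendre function of (2.3)."""
    m = abs(m)
    norm = math.sqrt((2 * l + 1) / 2.0 * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, x)

def sph_harm(l, m, theta, phi):
    """Complex spherical harmonic Y_lm(theta, phi) of (2.2)."""
    e_m = cmath.exp(1j * m * phi) / math.sqrt(2.0 * math.pi)   # (2.4)
    return normalized_legendre(l, m, math.cos(theta)) * e_m

print(sph_harm(1, 1, math.pi / 2, 0.0), math.sqrt(3.0 / (8.0 * math.pi)))
```

With this convention the zeroth-order harmonic is the constant $1/\sqrt{4\pi}$, which is a quick sanity check on the normalization.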
2.2 Properties of the spherical harmonic expansion
where $\delta_{l,m}$ is the two-dimensional Dirac delta function. The orthogonality property of
the spherical harmonics is very useful in simplifying the mathematical expressions
related to spatial sound; this property will be used later in this thesis in the
derivation of many results.
It can be seen from the decomposition (2.1) and the expression (2.2) that the
spherical Bessel function $j_l(kr)$ governs the radial and frequency-dependent component
of the basis functions, while $\mathcal{P}_{l|m|}(\cos\theta)$ and $E_m(\phi)$ govern the elevation and
azimuth components, respectively. Due to the low-pass nature of spherical Bessel
functions, spherical harmonics of higher order $l$ have very little energy when the value
of $kr$ is lower than a certain threshold. Therefore, a common practice is to truncate
the infinite summation in (2.1) at a maximum order $l = L$, such that the finite
summation provides an accurate approximation of the sound field; thus (2.1) can be
approximated as
$$P(r, \theta, \phi, k) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta, \phi). \tag{2.6}$$
$$L = \lceil ekr/2 \rceil, \tag{2.7}$$
where $e$ is Euler's number. Using this truncation, the number of spherical
harmonics required to approximate any sound field of a certain radius and frequency
is limited to $(L + 1)^2$.
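The effect of the truncation rule (2.7) can be examined numerically. The sketch below is an illustration under stated assumptions: it uses a unit plane wave as the test field, written with the standard Rayleigh expansion $e^{ikr\cos\gamma} = \sum_l (2l+1)\, i^l j_l(kr) P_l(\cos\gamma)$, and evaluates $j_l$ from its power series, which is adequate only for moderate arguments. All function names are illustrative.

```python
import math

def truncation_order(k, r):
    """Truncation order L = ceil(e*k*r/2) of (2.7)."""
    return math.ceil(math.e * k * r / 2.0)

def sph_bessel_j(l, x, terms=40):
    """Spherical Bessel function j_l(x) from its ascending power series."""
    term = x ** l
    for odd in range(1, 2 * l + 2, 2):      # divide by (2l+1)!!
        term /= odd
    total = term
    for n in range(1, terms):
        term *= -(x * x / 2.0) / (n * (2 * l + 2 * n + 1))
        total += term
    return total

def legendre(l, x):
    """Legendre polynomial P_l(x) by the three-term recurrence."""
    p_prev, p_curr = 1.0, x
    if l == 0:
        return p_prev
    for n in range(2, l + 1):
        p_prev, p_curr = p_curr, ((2 * n - 1) * x * p_curr - (n - 1) * p_prev) / n
    return p_curr

def plane_wave_truncated(kr, cos_gamma, order):
    """Order-limited Rayleigh expansion of exp(i*kr*cos_gamma)."""
    return sum((2 * l + 1) * (1j ** l) * sph_bessel_j(l, kr) * legendre(l, cos_gamma)
               for l in range(order + 1))

def max_truncation_error(kr, order, n_angles=181):
    """Largest pointwise error over n_angles directions at radius r."""
    worst = 0.0
    for i in range(n_angles):
        cg = math.cos(math.pi * i / (n_angles - 1))
        exact = complex(math.cos(kr * cg), math.sin(kr * cg))
        worst = max(worst, abs(exact - plane_wave_truncated(kr, cg, order)))
    return worst

L = truncation_order(4.0, 1.0)              # example with kr = 4
print(L, (L + 1) ** 2, max_truncation_error(4.0, L), max_truncation_error(4.0, L + 4))
```

In this example the rule gives $L = 6$, i.e. 49 coefficients. A small residual error remains at the boundary of the region and shrinks rapidly when a few more orders are added, which reflects the low-pass behaviour of $j_l(kr)$ described above.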
$$(x^2 - 1)\,\frac{dP_{l|m|}(x)}{dx} = l\,x\,P_{l|m|}(x) - (|m| + l)\,P_{(l-1),|m|}(x). \tag{2.8}$$
In the special case where $x = 0$, (2.8) can be simplified to
$$P'_{l|m|}(0) = (|m| + l)\,P_{(l-1),|m|}(0), \tag{2.9}$$
which indicates that the first order derivative of the associated Legendre functions
at $x = 0$ can be directly calculated from the same functions of a lower order.
By taking the derivative of (2.3) and setting $\cos\theta = 0$, expressing $P'_{l|m|}(0)$ using
(2.9) and expressing $P_{(l-1),|m|}(0)$ in terms of $\mathcal{P}_{(l-1)|m|}(0)$ using (2.3), we derive the following
relationship for the normalized associated Legendre functions
$$\mathcal{P}'_{l|m|}(0) = \sqrt{\frac{(2l+1)(l^2 - m^2)}{(2l-1)}}\;\mathcal{P}_{(l-1)|m|}(0), \tag{2.10}$$
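Relationship (2.10) can be checked numerically against a finite-difference derivative. The following sketch assumes associated Legendre functions without the Condon-Shortley phase (the phase cancels in (2.10) in any case, since both sides share the same $m$) and treats the case $|m| = l$, where the right-hand side vanishes, separately. The helper names are illustrative.

```python
import math

def assoc_legendre(l, m, x):
    """Unnormalized associated Legendre function P_{l m}(x), 0 <= m <= l."""
    root = math.sqrt(max(0.0, 1.0 - x * x))
    p_prev = 1.0
    for k in range(1, m + 1):
        p_prev *= (2 * k - 1) * root
    if l == m:
        return p_prev
    p_curr = x * (2 * m + 1) * p_prev
    for n in range(m + 2, l + 1):
        p_prev, p_curr = p_curr, ((2 * n - 1) * x * p_curr - (n + m - 1) * p_prev) / (n - m)
    return p_curr

def normalized_legendre(l, m, x):
    """Normalized associated Legendre function of (2.3)."""
    m = abs(m)
    norm = math.sqrt((2 * l + 1) / 2.0 * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, x)

def derivative_numeric(l, m, h=1e-5):
    """Central finite difference of the normalized function at x = 0."""
    return (normalized_legendre(l, m, h) - normalized_legendre(l, m, -h)) / (2.0 * h)

def derivative_closed_form(l, m):
    """Right-hand side of (2.10); zero when |m| = l."""
    m = abs(m)
    if m >= l:
        return 0.0
    scale = math.sqrt((2 * l + 1) * (l * l - m * m) / (2 * l - 1))
    return scale * normalized_legendre(l - 1, m, 0.0)

for l in range(1, 4):
    for m in range(0, l + 1):
        print(l, m, derivative_numeric(l, m), derivative_closed_form(l, m))
```

The two columns printed by the loop should agree to within the accuracy of the finite difference.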
The relationship between $C_{lm}$ and $B_{\nu\mu}$ can be described by the spherical harmonic
addition theorem [57]. The relationship can be written as [58]
$$B_{\nu\mu} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}\, \hat{S}_{l\nu}^{m\mu}(R), \tag{2.12}$$
where
$$\hat{S}_{l\nu}^{m\mu}(R) = 4\pi i^{\nu-l} \sum_{\ell=|\mu-m|}^{l+\nu+1} i^{\ell} (-1)^{2m-\mu}\, j_{\ell}(kR)\, Y^{*}_{\ell(\mu-m)}(\vartheta, \varphi)\, W, \tag{2.13}$$
$$W = W_1 W_2 \sqrt{\frac{(2l+1)(2\nu+1)(2\ell+1)}{4\pi}}. \tag{2.14}$$
Here, $W_1$ and $W_2$ denote Wigner 3-j symbols, with
$$W_1 = \begin{pmatrix} l & \nu & \ell \\ 0 & 0 & 0 \end{pmatrix}, \quad W_2 = \begin{pmatrix} l & \nu & \ell \\ m & -\mu & \mu-m \end{pmatrix}. \tag{2.15}$$
It can be seen that by substituting (2.12) into (2.11), one can derive the sound
pressure decomposition of a given point with respect to $O'$ using the spherical harmonic
coefficients with respect to $O$.
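$$\mathbf{B} = \hat{\mathbf{S}}\mathbf{C}, \tag{2.16}$$
where $\mathbf{C} = [C_{00}\; C_{11}\; C_{10}\; \ldots\; C_{LL}]^T$ and $\mathbf{B} = [B_{00}\; B_{11}\; B_{10}\; \ldots\; B_{VV}]^T$. $\hat{\mathbf{S}}$
is the translation matrix that maps the coefficients $\mathbf{C}$ to the coordinate system $O'$.
$\hat{\mathbf{S}}$ consists of all the $\hat{S}_{l\nu}^{m\mu}(R)$ needed to translate $\mathbf{C}$ into $\mathbf{B}$; the orders of $\hat{S}_{lm}^{\nu\mu}(R)$
are arranged in correspondence with $\mathbf{B}$ and $\mathbf{C}$, thus $\hat{\mathbf{S}}$ can be written as [58]
$$\hat{\mathbf{S}} = \begin{bmatrix}
\hat{S}_{00}^{00} & \hat{S}_{11}^{00} & \hat{S}_{10}^{00} & \cdots & \hat{S}_{LL}^{00} \\
\hat{S}_{00}^{11} & \hat{S}_{11}^{11} & \hat{S}_{10}^{11} & \cdots & \hat{S}_{LL}^{11} \\
\hat{S}_{00}^{10} & \hat{S}_{11}^{10} & \hat{S}_{10}^{10} & \cdots & \hat{S}_{LL}^{10} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\hat{S}_{00}^{VV} & \hat{S}_{11}^{VV} & \hat{S}_{10}^{VV} & \cdots & \hat{S}_{LL}^{VV}
\end{bmatrix}. \tag{2.17}$$
For a given maximum order $L$, there are a total of $(L + 1)^2$ spherical harmonics
available, thus the size of $\hat{\mathbf{S}}$ becomes $(V + 1)^2$ by $(L + 1)^2$.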
where $\beta_{lm}$ and $C_{lm}$ represent the spherical harmonic coefficients after and before
rotation, respectively. The values of $M_{lm}^{l'm'}$ can be calculated using numerical
integration [59],
$$M_{lm}^{l'm'} = \int_{s} Y_{l'm'}(Rs)\, Y^{*}_{lm}(s)\, ds \tag{2.19}$$
Theorem 1. The acoustic particle velocity at the point $\mathbf{0} \equiv (0, 0, 0)$ along the x, y
and z axes at a particular wave number $k$ can be expressed using the first order spherical
harmonic coefficients,
$$V_x(\mathbf{0}, k) = \frac{i}{\rho_0 c \sqrt{24\pi}}\,\big(C_{11}(k) + C_{1,-1}(k)\big) \tag{2.20}$$
$$V_y(\mathbf{0}, k) = \frac{-1}{\rho_0 c \sqrt{24\pi}}\,\big(C_{11}(k) - C_{1,-1}(k)\big) \tag{2.21}$$
$$V_z(\mathbf{0}, k) = \frac{i}{\rho_0 c \sqrt{12\pi}}\,C_{10}(k), \tag{2.22}$$
where $\rho_0$ is the density of the medium, $c$ is the speed of sound, and $C_{lm}(k)$ denotes
the spherical harmonic coefficient of order $l$ and mode $m$.
$$V_x(x_0, k) = \frac{i}{k\rho_0 c}\,\frac{\partial P(x_0, k)}{\partial x}. \tag{2.23}$$
For the proof of (2.20), we consider the sound pressure at a point on the x-axis, whose
coordinates in the spherical coordinate system are $(r, \pi/2, 0)$. The sound pressure can
be decomposed using (2.1),
$$P(r, \tfrac{\pi}{2}, 0, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, Y_{lm}(\tfrac{\pi}{2}, 0). \tag{2.24}$$
Taking the partial derivative of $P(r, \pi/2, 0, k)$ in the direction of $r$, which is equivalent
to $\partial P(x, y, z)/\partial x$, we have
$$\frac{\partial P(r, \frac{\pi}{2}, 0, k)}{\partial r} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, \frac{\partial j_l(kr)}{\partial r}\, Y_{lm}(\tfrac{\pi}{2}, 0) \tag{2.25}$$
Since we consider the partial derivative at the origin, we let $r \to 0$. Using the
recurrence relation [61]
$$l\, j_{l-1}(x) - (l + 1)\, j_{l+1}(x) = (2l + 1)\, \frac{d\, j_l(x)}{dx}, \tag{2.26}$$
$$\lim_{r \to 0} \frac{\partial P(r, \frac{\pi}{2}, 0, k)}{\partial r} = \frac{k}{3}\left[C_{11}\, Y_{11}(\tfrac{\pi}{2}, 0) + C_{1,-1}\, Y_{1,-1}(\tfrac{\pi}{2}, 0)\right] \tag{2.29}$$
Substituting (2.29) into (2.23) with the values $Y_{11}(\pi/2, 0) = Y_{1,-1}(\pi/2, 0) = \sqrt{3/(8\pi)}$
completes the proof.
For the proof of (2.21), we consider the partial derivative of sound pressure at
$(r, \pi/2, \pi/2)$. The derivation is identical to that of $\partial P/\partial x$, except that $Y_{lm}(\pi/2, 0)$ are
replaced by $Y_{lm}(\pi/2, \pi/2)$.
$$\frac{\partial P(r, 0, \phi, k)}{\partial r} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, \frac{\partial j_l(kr)}{\partial r}\, Y_{lm}(0, \phi). \tag{2.30}$$
Due to the fact that $Y_{11}(0, \phi) = 0$ and $Y_{1,-1}(0, \phi) = 0$, and utilizing (2.28), we can
simplify (2.30), such that
$$\lim_{r \to 0} \frac{\partial P(r, 0, \phi, k)}{\partial r} = \frac{k}{3}\, C_{10}\, Y_{10}(0, \phi) \tag{2.31}$$
Substituting (2.31) into (2.23) with $Y_{10}(0, \phi) = \sqrt{3/(4\pi)}$ completes the proof.
Theorem 1 provides a direct link between the signal received by a first order
microphone and the 1st order spherical harmonic coefficients representing the sound
field. For example, when a bi-directional microphone is placed at the origin with its
two beams coinciding with the z axis, the signal received by the microphone is
equivalent to the coefficient $C_{10}$, up to a constant scaling factor.
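Theorem 1 can be checked numerically for a simple test field. The sketch below is an illustration under stated assumptions: the field is a unit plane wave $e^{ik\,\mathbf{s}\cdot\mathbf{x}}$ whose first order coefficients are $C_{1m} = 4\pi i\, Y^{*}_{1m}(\mathbf{s})$, the velocity prefactors are $1/(\rho_0 c\sqrt{24\pi})$ and $1/(\rho_0 c\sqrt{12\pi})$ as obtained by substituting (2.29) and (2.31) into (2.23), and the values of $\rho_0$ and $c$ are typical figures for air rather than values from this thesis. The reference velocity is computed from (2.23) with a central finite difference.

```python
import cmath
import math

RHO0 = 1.2        # density of air in kg/m^3 (assumed value)
C_SOUND = 343.0   # speed of sound in m/s (assumed value)

def first_order_sh(direction):
    """Y_{1,-1}, Y_{10}, Y_{11} at a unit vector, in the convention of (2.2)-(2.4)."""
    x, y, z = direction
    a = math.sqrt(3.0 / (8.0 * math.pi))
    return {-1: a * (x - 1j * y), 0: math.sqrt(3.0 / (4.0 * math.pi)) * z, 1: a * (x + 1j * y)}

def plane_wave_first_order_coeffs(direction):
    """C_{1m} of the unit plane wave exp(i k s.x): 4*pi*i*conj(Y_1m(s))."""
    return {m: 4.0 * math.pi * 1j * y.conjugate() for m, y in first_order_sh(direction).items()}

def velocity_from_coeffs(c1):
    """Particle velocity at the origin from first order coefficients (Theorem 1)."""
    s24 = RHO0 * C_SOUND * math.sqrt(24.0 * math.pi)
    s12 = RHO0 * C_SOUND * math.sqrt(12.0 * math.pi)
    return (1j * (c1[1] + c1[-1]) / s24, -(c1[1] - c1[-1]) / s24, 1j * c1[0] / s12)

def velocity_from_gradient(direction, k, h=1e-4):
    """Reference value: (2.23) applied with a central finite difference."""
    def pressure(point):
        return cmath.exp(1j * k * sum(s * p for s, p in zip(direction, point)))
    result = []
    for axis in range(3):
        step = [0.0, 0.0, 0.0]
        step[axis] = h
        grad = (pressure(step) - pressure([-v for v in step])) / (2.0 * h)
        result.append(1j * grad / (k * RHO0 * C_SOUND))
    return tuple(result)

direction = (0.48, 0.6, 0.64)                   # unit vector (illustrative)
k = 2.0 * math.pi * 500.0 / C_SOUND             # 500 Hz tone
v_theorem = velocity_from_coeffs(plane_wave_first_order_coeffs(direction))
v_reference = velocity_from_gradient(direction, k)
print(v_theorem)
print(v_reference)
```

For this test field both computations give a velocity of $-\mathbf{s}/(\rho_0 c)$ at the origin, independent of $k$, which is consistent with the scaling stated in the theorem.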
The technique of spherical harmonic analysis is widely used in areas other than
spatial audio, such as geophysics [62, 63] and computer graphics [59]. In many of
these applications, the spherical functions to be analyzed are real-valued. For these
applications, it is sufficient to use real-value spherical harmonics to decompose the
spatial functions.
Compared to the complex-value spherical harmonics, the only
difference is that, instead of using the complex exponential $e^{im\phi}$ to express the
function in the azimuth direction, the real-value spherical harmonics use sinusoidal
functions. Therefore, many properties of the complex-value spherical harmonics are
also valid for the real-value spherical harmonics.
It can be seen that the complex-value and real-value spherical harmonics are
related through the following equation
The complex-value spherical harmonics are used for analyzing the spatial sound
in the frequency domain. However, the time domain sound pressure signal is real-
valued, therefore, if the spherical harmonic analysis is performed in the time domain,
it is preferable to use real-value spherical harmonics instead. This is discussed in
detail in Chapter 7.
2.3 Spatial sound recording and synthesis using spherical harmonic expansion
Spherical microphone arrays are well suited to capturing the spherical harmonic
coefficients of a spatial sound field, since their geometry coincides with that of
the spherical harmonics. The methods to capture spherical harmonics using open
and rigid spherical microphone arrays have been described in [10] and [64]. The
orthogonality property of the spherical harmonics is exploited in both of these methods.
For open sphere microphone arrays with radius $R$, the sound pressure on the
surface of the spherical array can be expressed using (2.1). Multiplying both sides
of (2.1) by $Y^{*}_{lm}(\theta, \phi)$ and integrating over the sphere yields
$$C_{lm}(k)\, j_l(kR) = \int_{0}^{\pi} \int_{0}^{2\pi} P(R, \theta, \phi, k)\, Y^{*}_{lm}(\theta, \phi)\, \sin\theta\, d\phi\, d\theta, \tag{2.35}$$
$$C_{lm}(k) = \frac{1}{j_l(kR)} \sum_{i} P(R, \theta_i, \phi_i, k)\, Y^{*}_{lm}(\theta_i, \phi_i)\, \gamma_i, \tag{2.36}$$
where $\theta_i$ and $\phi_i$ are the elevation and azimuth angles of the $i$th microphone, and $\gamma_i$
are weighting coefficients specific to the sampling scheme of the microphone
array. The number of microphones on the sphere should be no fewer than $(L + 1)^2$,
where $L$ is the maximum order of the spatial sound in the area, determined using
(2.7).
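The discrete estimate (2.36) can be illustrated with a small synthetic example. The sketch below assumes a first order field ($L = 1$), four microphones at the vertices of a regular tetrahedron, and equal weights $\gamma_i = 4\pi/Q$; this layout is chosen here only because it integrates products of first order harmonics exactly, and it is not a design from this thesis. The coefficient values and function names are likewise illustrative.

```python
import math

SQRT3 = math.sqrt(3.0)
# Four microphone directions at the vertices of a regular tetrahedron (assumed layout).
MIC_DIRS = [(1 / SQRT3, 1 / SQRT3, 1 / SQRT3), (1 / SQRT3, -1 / SQRT3, -1 / SQRT3),
            (-1 / SQRT3, 1 / SQRT3, -1 / SQRT3), (-1 / SQRT3, -1 / SQRT3, 1 / SQRT3)]
MODES = [(0, 0), (1, -1), (1, 0), (1, 1)]

def sh_low_order(l, m, d):
    """Y_lm for l <= 1 at unit vector d, in the convention of (2.2)-(2.4)."""
    x, y, z = d
    if l == 0:
        return complex(1.0 / math.sqrt(4.0 * math.pi))
    if m == 0:
        return complex(math.sqrt(3.0 / (4.0 * math.pi)) * z)
    return math.sqrt(3.0 / (8.0 * math.pi)) * complex(x, m * y)

def sph_bessel_j(l, x):
    """Closed forms of j_0 and j_1."""
    if l == 0:
        return math.sin(x) / x
    return math.sin(x) / x ** 2 - math.cos(x) / x

def simulate_pressures(coeffs, kR):
    """Pressure at each microphone from (2.1), truncated at L = 1."""
    return [sum(coeffs[(l, m)] * sph_bessel_j(l, kR) * sh_low_order(l, m, d) for l, m in MODES)
            for d in MIC_DIRS]

def estimate_coeffs(pressures, kR):
    """Discrete estimate (2.36) with equal weights gamma_i = 4*pi/Q."""
    gamma = 4.0 * math.pi / len(MIC_DIRS)
    return {(l, m): sum(p * sh_low_order(l, m, d).conjugate() * gamma
                        for p, d in zip(pressures, MIC_DIRS)) / sph_bessel_j(l, kR)
            for l, m in MODES}

true_coeffs = {(0, 0): 1.0 + 0.5j, (1, -1): -0.3j, (1, 0): 0.7, (1, 1): 0.2 - 0.4j}
estimated = estimate_coeffs(simulate_pressures(true_coeffs, kR=1.0), kR=1.0)
for mode in MODES:
    print(mode, true_coeffs[mode], estimated[mode])
```

Here four microphones match the minimum of $(L + 1)^2 = 4$ for $L = 1$. If the field contained higher order components, they would alias into the estimates, which is why the truncation order has to be chosen with (2.7) in mind.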
In the case of a rigid sphere microphone array, the microphones are mounted on
a rigid spherical baffle. The sound field around the microphone array is affected by
the baffle, and the sound pressure on the surface of the baffle can be expressed by
$$P(R, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, b_l(kR)\, Y_{lm}(\theta, \phi), \tag{2.37}$$
where
$$b_l(kR) = j_l(kR) - \frac{j_l'(kR)}{h_l^{(2)\prime}(kR)}\, h_l^{(2)}(kR), \tag{2.38}$$
and $h_l^{(2)}(kR)$ is the spherical Hankel function of the second kind. Using the same
spherical integration method, the spherical harmonic coefficients can be calculated as [64]
$$C_{lm} = \frac{1}{b_l(kR)} \sum_{i} P(R, \theta_i, \phi_i, k)\, Y^{*}_{lm}(\theta_i, \phi_i)\, \gamma_i. \tag{2.39}$$
Compared to the open sphere microphone array, the rigid sphere array avoids the
ill-conditioning problem caused by $j_l(kR)$ approaching zero at certain combinations of $k$
and $R$. However, the rigid baffle completely encloses the region of interest, which makes
this array format hard to implement in larger sizes and hinders its application in
fields such as spatial ANC.
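The difference in conditioning between the two array types can be seen by evaluating the two radial terms at a zero of $j_l$. The sketch below is illustrative only: it assumes the standard rigid-sphere mode strength $b_l(x) = j_l(x) - \big(j_l'(x)/h_l^{(2)\prime}(x)\big)\, h_l^{(2)}(x)$, and it evaluates the Bessel functions by upward recurrence, which is adequate only for small orders. The function names are not from this thesis.

```python
import math

def sph_jn(l, x):
    """Spherical Bessel function of the first kind (upward recurrence, small l only)."""
    f_prev, f_curr = math.sin(x) / x, math.sin(x) / x ** 2 - math.cos(x) / x
    if l == 0:
        return f_prev
    for n in range(1, l):
        f_prev, f_curr = f_curr, (2 * n + 1) / x * f_curr - f_prev
    return f_curr

def sph_yn(l, x):
    """Spherical Bessel function of the second kind (upward recurrence)."""
    f_prev, f_curr = -math.cos(x) / x, -math.cos(x) / x ** 2 - math.sin(x) / x
    if l == 0:
        return f_prev
    for n in range(1, l):
        f_prev, f_curr = f_curr, (2 * n + 1) / x * f_curr - f_prev
    return f_curr

def sph_h2(l, x):
    """Spherical Hankel function of the second kind, h = j - i*y."""
    return complex(sph_jn(l, x), -sph_yn(l, x))

def derivative(func, l, x):
    """d/dx via f_l' = f_{l-1} - (l+1)/x * f_l, valid for j, y and h."""
    if l == 0:
        return -func(1, x)
    return func(l - 1, x) - (l + 1) / x * func(l, x)

def rigid_mode_strength(l, x):
    """Radial term b_l(x) on the surface of a rigid sphere (assumed standard form)."""
    return sph_jn(l, x) - derivative(sph_jn, l, x) / derivative(sph_h2, l, x) * sph_h2(l, x)

kR = math.pi                                   # first zero of j_0
print(abs(sph_jn(0, kR)), abs(rigid_mode_strength(0, kR)))
```

At $kR = \pi$ the open-sphere term $j_0(kR)$ vanishes, so the division in (2.36) breaks down, while the rigid-sphere term stays well away from zero.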
Non-spherical microphone array layouts have also been proposed for the purpose
of spatial sound recording based on spherical harmonic analysis. In [13], it is pro-
posed to use multiple circular microphone arrays to capture the spatial sound. This
method offers superior flexibility in terms of array geometry compared to spherical
microphone arrays, since the radius and position of each circular array can vary
within a certain limit. We briefly outline this work in this section.
Consider a circular microphone array placed parallel to the x-y plane, with its
center located on the z axis. The sound pressure at a point on the array can be
expressed using (2.1) and (2.2) as
$$P(R, \vartheta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR)\, \mathcal{P}_{l|m|}(\cos\vartheta)\, E_m(\phi), \tag{2.40}$$
where $R$ is the distance from the origin to the circular array, and $\vartheta$ is the elevation
angle of the array. Multiplying both sides of (2.40) by $E_{-m}(\phi)$ and integrating with
respect to $\phi$ over $[0, 2\pi)$, we have
$$\alpha_m(R, \vartheta, k) = \sum_{l=|m|}^{L} C_{lm}(k)\, j_l(kR)\, \mathcal{P}_{l|m|}(\cos\vartheta), \tag{2.41}$$
where we define
$$\alpha_m(R, \vartheta, k) \triangleq \int_{0}^{2\pi} P(R, \vartheta, \phi, k)\, E_{-m}(\phi)\, d\phi. \tag{2.42}$$
For a given circular array, the maximum order of observable spherical harmonics is
limited by (2.7). Equation (2.42) can be evaluated for $m = -L, -L + 1, \ldots, L$.
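In practice the integral (2.42) has to be approximated from a finite number of microphones on the circle. A minimal sketch, assuming $N$ equally spaced microphones and a simple rectangle-rule sum, is given below; the choice $N = 2L + 1$ is the usual condition for avoiding azimuthal aliasing and is stated here as an assumption of the sketch, not as a result of this thesis. The test field and function names are illustrative.

```python
import cmath
import math

def azimuth_basis(m, phi):
    """Normalized exponential E_m(phi) of (2.4)."""
    return cmath.exp(1j * m * phi) / math.sqrt(2.0 * math.pi)

def circular_harmonics(pressures, max_order):
    """Approximate alpha_m of (2.42) for m = -L..L from N equally spaced samples."""
    n_mics = len(pressures)
    d_phi = 2.0 * math.pi / n_mics
    return {m: sum(p * azimuth_basis(-m, i * d_phi) for i, p in enumerate(pressures)) * d_phi
            for m in range(-max_order, max_order + 1)}

# Synthetic pressure on the circle: P(phi) = sum_m a_m E_m(phi) with |m| <= 2.
true_alpha = {-2: 0.1j, -1: 0.5, 0: 1.0, 1: -0.3 + 0.2j, 2: 0.05}
n_mics = 5                                     # 2L + 1 microphones for L = 2
pressures = [sum(a * azimuth_basis(m, 2.0 * math.pi * i / n_mics) for m, a in true_alpha.items())
             for i in range(n_mics)]
alpha = circular_harmonics(pressures, max_order=2)
for m in sorted(alpha):
    print(m, true_alpha[m], alpha[m])
```

For a field limited to $|m| \le L$, the rectangle rule with $2L + 1$ samples recovers each $\alpha_m$ exactly up to rounding, since the sum is a discrete Fourier transform over the microphone positions.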
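When multiple circular arrays are deployed, each with radius and elevation angle
$(R_q, \vartheta_q)$, the spherical harmonic coefficients of mode $m$ can be obtained by solving
the LMS problem
$$\mathbf{J}_m \mathbf{C}_m = \boldsymbol{\alpha}_m, \tag{2.43}$$
where $\mathbf{C}_m = [C_{|m|,m},\, C_{|m|+1,m},\, \ldots,\, C_{Lm}]^T$ is a vector containing all the spherical
harmonic coefficients of mode $m$, $\boldsymbol{\alpha}_m = [\alpha_m^{1},\, \alpha_m^{2},\, \ldots,\, \alpha_m^{Q}]^T$ is a vector containing
$\alpha_m$ from each of the $Q$ circular arrays, and
$$\mathbf{J}_m = \begin{bmatrix}
j_{|m|}(kR_1)\mathcal{P}_{|m|,|m|}(\cos\vartheta_1) & j_{|m|+1}(kR_1)\mathcal{P}_{|m|+1,|m|}(\cos\vartheta_1) & \cdots & j_L(kR_1)\mathcal{P}_{L,|m|}(\cos\vartheta_1) \\
j_{|m|}(kR_2)\mathcal{P}_{|m|,|m|}(\cos\vartheta_2) & j_{|m|+1}(kR_2)\mathcal{P}_{|m|+1,|m|}(\cos\vartheta_2) & \cdots & j_L(kR_2)\mathcal{P}_{L,|m|}(\cos\vartheta_2) \\
\vdots & \vdots & \ddots & \vdots \\
j_{|m|}(kR_Q)\mathcal{P}_{|m|,|m|}(\cos\vartheta_Q) & j_{|m|+1}(kR_Q)\mathcal{P}_{|m|+1,|m|}(\cos\vartheta_Q) & \cdots & j_L(kR_Q)\mathcal{P}_{L,|m|}(\cos\vartheta_Q)
\end{bmatrix}. \tag{2.44}$$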
The complete set of spherical harmonic coefficients can be found by solving (2.43) for every
value of $m$ which satisfies $|m| \le L$. At certain combinations of array radius, position
and sound frequency, the value of $j_l(kR)\mathcal{P}_{l|m|}(\cos\vartheta)$ may equal zero for some $l$ and
$m$ [13]. Ill-conditioning of $\mathbf{J}_m$ due to this phenomenon can be avoided by employing
extra circular microphone arrays [13].
Compared to spherical microphone array apertures, this method allows more
flexible placement of the microphones, and the use of circular arrays can simplify
the supporting structure for the microphones. Therefore this method presents a
more practical solution for spatial sound recording over a larger region.
on the array so that the combined spherical harmonic coefficients due to all the
loudspeakers equal some desired value, i.e.,
$$\sum_{q} D_q H_{lm}^{q} = C_{lm}^{\text{desire}}, \tag{2.45}$$
where $H_{lm}^{q}$ denotes the spherical harmonic coefficients due to the $q$th loudspeaker
playing a unit signal, $C_{lm}^{\text{desire}}$ denotes the spherical harmonic coefficient of the desired
sound field, and $D_q$ is the driving signal for the $q$th loudspeaker. This problem can
be solved in an LMS manner, as
$$\mathbf{D} = \mathbf{H}^{-1} \mathbf{C}^{\text{desire}}, \tag{2.46}$$
where $\mathbf{D} = [D_1,\, D_2,\, \ldots,\, D_Q]^T$ is the vector containing all the driving signals,
$\mathbf{C}^{\text{desire}} = [C_{00}^{\text{desire}},\, C_{11}^{\text{desire}},\, \ldots,\, C_{LL}^{\text{desire}}]^T$ is the vector of desired spherical harmonic
coefficients, and
$$\mathbf{H} = \begin{bmatrix}
H_{00}^{1} & H_{00}^{2} & \cdots & H_{00}^{Q} \\
H_{11}^{1} & H_{11}^{2} & \cdots & H_{11}^{Q} \\
\vdots & \vdots & \ddots & \vdots \\
H_{LL}^{1} & H_{LL}^{2} & \cdots & H_{LL}^{Q}
\end{bmatrix} \tag{2.47}$$
is the channel matrix containing the spherical harmonic coefficients due to each
loudspeaker.
Assuming a loudspeaker can be modeled as a point source, the sound field due
to a loudspeaker placed at $(R, \vartheta, \varphi)$ can be expanded as [44]
\[ P(r, \theta, \phi, k) = ik \sum_{l=0}^{\infty} j_l(kr)\, h_l(kR) \sum_{m=-l}^{l} Y_{lm}(\theta, \phi)\, Y_{lm}(\vartheta, \varphi)^{*}. \tag{2.48} \]
If the loudspeaker is placed far from the reproduction region, its
sound wave can be treated as a plane wave, which can be expanded as [44]
\[ P(r, \theta, \phi, k) = 4\pi \sum_{l=0}^{\infty} i^{l} j_l(kr) \sum_{m=-l}^{l} Y_{lm}(\theta, \phi)\, Y_{lm}(\vartheta, \varphi)^{*}. \tag{2.49} \]
Comparing (2.48) and (2.49) with (2.1), it can be seen that the spherical harmonic
coefficients corresponding to a point source and a plane wave source are
\[ H_{lm}^{\text{point}}(R, \vartheta, \varphi) = ik\, h_l(kR)\, Y_{lm}(\vartheta, \varphi) \tag{2.50} \]
and
\[ H_{lm}^{\text{plane}}(\vartheta, \varphi) = 4\pi i^{l}\, Y_{lm}(\vartheta, \varphi) \tag{2.51} \]
respectively. If the loudspeakers are arranged in a spherical geometry around the
reproduction region with a uniform spherical sampling scheme, then by the orthogonality
of the spherical harmonics we have
\[ \mathbf{H}^{-1} = \mathbf{H}^{H}, \tag{2.52} \]
and thus the driving signals for each loudspeaker can be obtained from
\[ \mathbf{D} = \mathbf{H}^{H} \mathbf{C}^{\text{desire}}. \tag{2.53} \]
Perfect reproduction of the desired sound field cannot be guaranteed if the
loudspeakers are not distributed evenly around the sphere, or if too few
loudspeakers are available. However, if no fewer than (L + 1) loudspeakers
are used and they are uniformly distributed in a spherical arrangement, high quality
sound field reproduction can be achieved [44].
Overview: Spherical harmonic analysis is a very useful tool for representing the
noise field. However, a drawback of this technique is the three-dimensional micro-
phone arrays required for recording the noise sound field. In this chapter, a method
to design 2D planar microphone arrays that are capable of capturing 3D spatial
sound fields is proposed. Through the utilization of both omni-directional and first
order microphones, the proposed microphone array is capable of measuring sound
field components that are undetectable to conventional planar omni-directional mi-
crophone arrays, thus providing the same functionality as 3D arrays designed for
the same purpose. Simulations show that the accuracy of the planar microphone
array is comparable to traditional spherical microphone arrays. Due to its compact
shape, the proposed microphone array greatly increases the feasibility of 3D sound
field analysis techniques in spatial ANC applications.
3.1 Introduction
We use spherical harmonic analysis as a tool to represent the 3D noise field, due
to its various benefits such as accurate sound field representation and the ability to
perform in-depth analysis of the noise field. In order to capture the 3D noise field in
real time for the ANC system, it is necessary to use a microphone array which is
able to capture the 3D sound field in terms of spherical harmonic coefficients
30 Planar microphone array apertures for 3D spatial sound field analysis
of the sound field. To the best of our knowledge, all of the previously developed
microphone array structures designed for this purpose have a 3D geometry, which
limits their feasibility for compact ANC systems suitable for real-life applications.
As was discussed in Chapter 2, spherical microphone array geometries are well-suited
for the spherical harmonic transform, and both open and rigid sphere models
have been studied [10, 43]. Both models are widely used in research applications,
such as room geometry inference [65] and near field acoustic holography (NAH) [66].
An inherent drawback of the open sphere model is the numerical ill-conditioning
problem, which is due to the nulls in the spherical Bessel functions; the diameter
of the microphone array therefore has to be chosen carefully. It has been shown that this
ill-conditioning problem can be overcome by methods such as using concentric spheres
[67,68], co-centered rigid/open spheres [69], or by measuring the radial velocity [43].
The placement of microphones on a spherical array has to follow a strict rule
of orthogonality of the spherical harmonics [15, 70], which limits the flexibility of
the array configuration. The spherical shape of the array also poses difficulties for
implementation as well as practical usage.
Non-spherical microphone arrays, such as the conical microphone array aperture
proposed by Gupta et al. [71] and the multiple circular microphone array proposed
by Abhayapala et al. [13, 72], can also be used for spherical harmonic analysis.
These microphone arrays offer greater geometrical flexibility compared to spherical
microphone arrays, thus allowing easier implementation of larger microphone arrays.
However, these apertures still occupy a 3D space, which hinders the development of
compact microphone arrays for practical applications.
On the other hand, microphone arrays featuring 2D geometry are easy to implement,
yet existing 2D microphone arrays are incapable of capturing complete 3D
sound field information. Meyer et al. have shown that a 2D microphone array can be
used to measure certain vertical components of a 3D sound field [73]. However, due
to inherent properties of the spherical harmonics, some spherical harmonic modes
are invisible to omni-directional pressure microphones on the x − y plane, which
explains why previously proposed 2D microphone arrays fail to extract full 3D sound
field information. Measurement of these sound field components on the x − y plane
calls for additional types of sensors; to the best of our knowledge, no such technique
has been proposed.
First order microphones, such as differential microphones and cardioid microphones,
are known to have the capability of detecting acoustic velocity in a certain
direction [74]. Kuntz et al. have shown that through using cardioid microphones
In this chapter, we first investigate using first order microphones to aid the de-
tection of 3D sound fields, and propose a new method for 3D sound field recording
using a 2D planar microphone array. In our approach, we use first order microphones
in conjunction with omni-directional microphones to measure the “invisible” com-
ponent of a 3D sound field on the x − y plane. Also, we propose a method of using
multiple co-centered circular arrays of omnidirectional/first order microphones to
compute the sound field coefficients associated with the spherical space enclosing
the planar array aperture. We show that the proposed planar microphone array of-
fers the same functionality as spherical/multiple circular arrays designed for sound
field analysis.
This chapter is arranged as follows: Section 3.2 derives the wave domain expression
of the sound field measured by a general first order microphone. We show that the full
3D sound field can be observed on a plane with the aid of first order microphones
by exploiting a property of the associated Legendre functions. Section 3.3 introduces
the co-centered hybrid circular microphone array for sound field recording,
and shows how the sound field coefficients can be calculated using the data measured
by different components of the hybrid array. We also provide a step-by-step
design procedure for determining parameters of an array based on system requirements.
Section 3.4 provides an analysis of the recording accuracy of the proposed
array. Two primary causes of errors are identified, and their impact on each sound
field coefficient is discussed. Section 3.5 gives a hypothetical design example of the
proposed microphone array, as well as an experimental microphone array built for
validation of the theory. Detailed simulation results are provided for the hypothetical
design example, and the test results of the experimental array are compared with
corresponding simulation results for performance evaluation.
For reasons that will become clear later in the chapter, we consider pressure gradient
of a sound field along the direction of θ. That is, we consider either differential or
velocity microphones placed in such a way that they measure pressure gradient in
the direction of θ at a given point (r, θ, φ).
We define the pressure gradient of sound along the direction of θ at a point
(r, θ, φ) as
\[ P_\theta(r, \theta, \phi, k) \triangleq \frac{\partial P(r, \theta, \phi, k)}{\partial \theta}. \tag{3.1} \]
By substituting (2.1) into (3.1) and taking the partial derivative with respect to θ,
the pressure gradient can be expressed as
\[ P_\theta(r, \theta, \phi, k) = -\sin\theta \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, P'_{l|m|}(\cos\theta)\, E_m(\phi), \tag{3.2} \]
where
\[ P'_{l|m|}(u) = \frac{dP_{l|m|}(u)}{du} \]
is the first order derivative of the normalized associated Legendre function.
as
\[ P_c(r, \theta, \phi, k) \triangleq \beta P(r, \theta, \phi, k) + (1 - \beta) P_\theta(r, \theta, \phi, k), \tag{3.3} \]
where β is a weighting factor in the range [0, 1). When β = 0, $P_c(r, \theta, \phi, k)$
contains only the differential pattern, which is considered a special case of first
order pick-up patterns; here, differential microphones are regarded as one type of
first order microphone. When β = 0.5, $P_c(r, \theta, \phi, k)$ becomes the pick-up pattern of
a “standard” cardioid microphone. Substituting (2.1) and (3.2) into (3.3) yields the
wave domain representation of the signal received by a general first order microphone
as
\[ P_c(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr) \left[ \beta P_{l|m|}(\cos\theta) - (1 - \beta) \sin\theta\, P'_{l|m|}(\cos\theta) \right] E_m(\phi). \tag{3.4} \]
\[ P(r, \pi/2, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, P_{l|m|}(0)\, E_m(\phi). \tag{3.5} \]
Observe that when l + |m| is an odd integer the value of Pl|m| (0) is equal to zero [13].
Consequently, the spherical harmonics associated with these associated Legendre
functions are equal to zero. This property makes the odd mode spherical harmonics
“invisible” on the θ = π/2 plane, which is why the complete 3D sound field
information cannot be extracted by sampling on a single plane using
omni-directional microphones.
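On the other hand,
\[
P'_{l|m|}\!\left(\cos\frac{\pi}{2}\right) =
\begin{cases}
\text{a non-zero value}, & \text{when } l + |m| \text{ is an odd integer}, \\
0, & \text{when } l + |m| \text{ is an even integer}.
\end{cases}
\]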
Observe that the expression for the pressure gradient in (3.2) has the terms $P'_{l|m|}(\cdot)$.
Hence the ‘odd’ components of the pressure gradient along the direction of θ are
non-zero on the x-y plane. Thus, the pressure gradient measurements contain the ‘odd’
Clm (k) (i.e., l + |m| odd) coefficients. We use this property in this work to propose
a method to extract 3D sound field components by sampling the field on the x-y
plane using differential (or first order) and omni-directional microphones together.
Using the recurrence relation of the normalized associated Legendre functions,
i.e., substituting (2.10) into (3.2) and (3.4), we can write the output of the differential
and general first order microphones placed at a point (r, π/2, φ) on the x-y plane
along the direction of θ (i.e., perpendicular to the x-y plane) as
\[ P_\theta\!\left(r, \frac{\pi}{2}, \phi, k\right) = -\sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr) \sqrt{\frac{(2l+1)(l^2 - m^2)}{(2l-1)}}\; P_{(l-1)|m|}(0)\, E_m(\phi) \tag{3.6} \]
and
\[ P_c\!\left(r, \frac{\pi}{2}, \phi, k\right) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr) \left[ \beta P_{l|m|}(0) - (1 - \beta) \sqrt{\frac{(2l+1)(l^2 - m^2)}{(2l-1)}}\; P_{(l-1)|m|}(0) \right] E_m(\phi), \tag{3.7} \]
respectively.
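The parity property of the Legendre functions and the square-root factor appearing in (3.6) and (3.7) can be checked numerically. The sketch below is a minimal pure-Python check. It assumes the normalization $P_{l|m|}(u) = \sqrt{\tfrac{2l+1}{4\pi}\tfrac{(l-|m|)!}{(l+|m|)!}}\,P_l^{|m|}(u)$, which is a common convention but is not restated in this section, and all function names are hypothetical.

```python
import math

def assoc_legendre(l, m, x):
    """Unnormalized associated Legendre function P_l^m(x), 0 <= m <= l,
    computed with the standard upward recurrence in l."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt((1.0 - x) * (1.0 + x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    if l == m + 1:
        return pmmp1
    for ll in range(m + 2, l + 1):
        pll = ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1

def norm_legendre(l, m, x):
    """Normalized P_{l|m|}(x); the normalization is an assumed convention."""
    m = abs(m)
    if l < m:
        return 0.0
    norm = math.sqrt((2 * l + 1) / (4.0 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, x)

def d_norm_legendre_at_zero(l, m, h=1e-5):
    """Central-difference estimate of dP_{l|m|}(u)/du at u = 0."""
    return (norm_legendre(l, m, h) - norm_legendre(l, m, -h)) / (2.0 * h)

def recurrence_rhs(l, m):
    """sqrt((2l+1)(l^2-m^2)/(2l-1)) * P_{(l-1)|m|}(0), the factor in (3.6)."""
    if l == 0:
        return 0.0
    scale = math.sqrt((2 * l + 1) * (l * l - m * m) / (2 * l - 1))
    return scale * norm_legendre(l - 1, m, 0.0)

for l in range(0, 5):
    for m in range(0, l + 1):
        parity = "odd" if (l + m) % 2 else "even"
        print(l, m, parity,
              f"P(0)={norm_legendre(l, m, 0.0):+.4f}",
              f"P'(0)={d_norm_legendre_at_zero(l, m):+.4f}",
              f"rhs={recurrence_rhs(l, m):+.4f}")
```

For every pair (l, m) the table shows that exactly one of P(0) and P'(0) is non-zero, which is why omnidirectional and differential sensors on the plane see complementary sets of coefficients.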
In this section we outline possible geometric configurations of first order and
omni-directional sensors on the x-y plane to extract both the even and odd spherical
harmonic components of the sound field.
Consider a circle placed on the x-y plane such that an arbitrary point on the circle
is given by (Rq , π/2, φ). Then the output of an omni-directional microphone on the
circle is given by
3.3 Array configuration 35
\[ P\!\left(R_q, \frac{\pi}{2}, \phi, k\right) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR_q)\, P_{l|m|}(0)\, E_m(\phi). \tag{3.8} \]
Since sound fields over a spherical region of finite radius are mode limited (2.6), the
infinite summation on the right hand side of (3.8) can be approximated by a finite sum,
\[ P\!\left(R_q, \frac{\pi}{2}, \phi, k\right) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR_q)\, P_{l|m|}(0)\, E_m(\phi), \tag{3.9} \]
where L denotes the maximum harmonic order at the array’s radius Rq and the
highest operating frequency [53]. Multiplying both sides of (3.9) by Em (−φ) and
integrating with respect to φ over [0, 2π) yields the total sound pressure received by
the ring, as
\[ \alpha_m(R_q, k) \triangleq \int_0^{2\pi} P(R_q, \pi/2, \phi, k)\, E_m(-\phi)\, d\phi \tag{3.10} \]
\[ = \sum_{l=|m|}^{L} C_{lm}(k)\, j_l(kR_q)\, P_{l|m|}(0). \tag{3.11} \]
Note that only the even mode harmonics are present in (3.11), since Pl|m| (0) = 0
for l + |m| odd. Let there be a total of Q circles placed at different radii but all
on the θ = π/2 plane (x-y plane). Thus, for q = 1, . . . , Q, the relationship between
the even mode sound field coefficients of mode m and the azimuth sound pressure
harmonics αm (Rq , k) on each circle can be expressed as
\[ = -\sum_{l=|m|}^{L} C_{lm}(k)\, j_l(kR_q) \sqrt{\frac{(2l+1)(l^2 - m^2)}{(2l-1)}}\; P_{(l-1)|m|}(0). \tag{3.18} \]
Note that only the odd mode harmonics are present in (3.18), since P(l−1)|m| (0) = 0
for l + |m| even.
We can estimate the odd harmonic coefficients from (3.19), provided $\mathbf{V}_m(k)$ is
non-singular, as
\[ \mathbf{C}_m^{\text{odd}}(k) = \mathbf{V}_m^{\dagger}(k)\, \boldsymbol{\alpha}_m^{(d)}(k). \tag{3.23} \]
Thus the complete set of sound field coefficients can be derived through solv-
ing for the even and odd harmonics coefficients separately using the signal received
from omni-directional microphones (3.12) and differential microphones (3.19), re-
spectively.
Alternatively, the even and odd harmonic coefficients may be calculated together in
one matrix operation. This method is especially suitable for planar arrays that uti-
lize cardioid microphones (or general first order) instead of differential microphones.
According to (3.7), a first order (e.g., cardioid) microphone placed on the x-y plane
picks up both the even and odd components of the sound field. For a set of finite
radii circular arrays of first order microphones placed on the x-y plane, we can write
a matrix equation using (3.7) and following similar steps as in the previous two
subsections:
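\[ \boldsymbol{\alpha}_m^{(f)}(k) = \beta\, \mathbf{U}_m(k)\, \mathbf{C}_m^{\text{even}}(k) + (1 - \beta)\, \mathbf{V}_m(k)\, \mathbf{C}_m^{\text{odd}}(k). \tag{3.24} \]
Equation (3.26) can be solved to calculate both the even and odd harmonic coefficients
given by $\mathbf{C}_m^{\text{even}}(k)$ and $\mathbf{C}_m^{\text{odd}}(k)$.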
where Nq is the number of microphones placed on a circle and φs denotes the
azimuth angle of the location of the sth microphone.
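The discrete approximation of the ring integral can be sketched as follows. This is a minimal example, assuming equally spaced microphones, a rectangle-rule sum, and the normalized exponential $E_m(\phi) = e^{im\phi}/\sqrt{2\pi}$; the function names and the test harmonics are made up for illustration.

```python
import cmath
import math

def E(m, phi):
    """Normalized azimuthal exponential, assumed to be exp(i*m*phi)/sqrt(2*pi)."""
    return cmath.exp(1j * m * phi) / math.sqrt(2.0 * math.pi)

def azimuth_harmonics(samples, max_order):
    """Approximate alpha_m for m = -max_order..max_order from N equally spaced
    pressure samples on one ring (rectangle-rule version of the integral)."""
    n = len(samples)
    phis = [2.0 * math.pi * s / n for s in range(n)]
    return {
        m: (2.0 * math.pi / n) * sum(p * E(-m, phi) for p, phi in zip(samples, phis))
        for m in range(-max_order, max_order + 1)
    }

# Synthetic ring signal with known harmonics up to order 2 (made-up values).
true_alpha = {-2: 0.3 - 0.1j, -1: 0.5j, 0: 1.0 + 0.0j, 1: -0.4 + 0.2j, 2: 0.25 + 0.0j}
n_mics = 5  # 2 * 2 + 1 microphones for maximum order 2
ring = [sum(a * E(m, 2.0 * math.pi * s / n_mics) for m, a in true_alpha.items())
        for s in range(n_mics)]
estimate = azimuth_harmonics(ring, 2)
for m in sorted(estimate):
    print(m, estimate[m])
```

With fewer than 2L + 1 samples, harmonics whose orders differ by the number of microphones fold onto each other, which is the spatial aliasing discussed in the following paragraphs.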
Due to the spatial sampling of the sound field, one can only extract a limited number
of harmonic orders from each array. In order to sample a set of circular harmonics of
maximum order L, the number of microphones required is given by $n_{\text{mic}} \geq 2L + 1$,
and L is determined using $L \leq \lceil ekR/2 \rceil$, where k is the wave number and R is the
radius of the region of interest [53]. The exact number of microphones to be used
for each circular array thus depends on the radius of the array as well as the target
frequency band.
The truncation of spherical harmonics leads to errors, which will be discussed in
Section 3.4. The “rule of thumb” $L \leq \lceil ekR/2 \rceil$ gives a sufficiently high precision for
most applications [53]. For applications that require less accuracy, an alternative
truncation number is given by $L \leq \lceil kR \rceil$ [46], which truncates the order to a lower
value, hence reducing system complexity at the cost of accuracy. The former rule is
used in this work for higher accuracy.
Since the number of microphones on each circular array is directly linked to
the wave number k, which can in turn be expressed through the wavelength λ, the
number of microphones needed can be easily derived from the target frequency of the
application as
where c is the speed of wave propagation; in the case of sound, c = 340 m/s. Thus
one can directly calculate the number of sampling points (microphones) for a given
array radius and a target frequency band. For example, a circular array of 0.4 m
radius, designed for audio signals up to 1500 Hz, would need 33 microphones.
Figure 3.1: Example of omnidirectional (dot) and first order (triangle) microphone
arrangement on a 2D plane for 3D sound field analysis.
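The 0.4 m, 1500 Hz example can be reproduced with a few lines of arithmetic. The sketch below assumes $k = 2\pi f / c$ with c = 340 m/s and applies the two truncation rules quoted above; the function names are hypothetical.

```python
import math

def truncation_order(freq_hz, radius_m, c=340.0):
    """Rule of thumb L = ceil(e*k*R/2), with k = 2*pi*f/c."""
    k = 2.0 * math.pi * freq_hz / c
    return math.ceil(math.e * k * radius_m / 2.0)

def relaxed_truncation_order(freq_hz, radius_m, c=340.0):
    """Lower-accuracy alternative L = ceil(k*R)."""
    k = 2.0 * math.pi * freq_hz / c
    return math.ceil(k * radius_m)

def mics_per_ring(order):
    """Minimum number of microphones on a ring: n_mic = 2L + 1."""
    return 2 * order + 1

L_strict = truncation_order(1500.0, 0.4)
L_relaxed = relaxed_truncation_order(1500.0, 0.4)
print(L_strict, mics_per_ring(L_strict))    # 16 33
print(L_relaxed, mics_per_ring(L_relaxed))  # 12 25
```

The stricter rule reproduces the 33 microphones quoted in the text, while the relaxed rule would ask for 25 on the same ring.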
Configuration(s)
The array system can be configured to have multiple circular microphone arrays
placed on a plane, with half of the arrays using omni-directional microphones and the
other half using first order microphones placed perpendicular to the plane. The
number of microphones on each array is decided by the target wave number and the
radius of the array, therefore smaller arrays may have fewer microphones.
Figure 3.1 illustrates such a configuration.
An alternative configuration is to use closely placed omni-directional microphone
pairs to realize differential microphones. In this way, each microphone pair is used
in two different ways: the difference of the two microphone output signals creates
the bi-directional pick-up pattern, which is used for calculation of the odd-numbered
coefficients; meanwhile, one of the two microphone outputs is used to calculate
the even-numbered coefficients. Figure 3.2 shows an example of such an array
arrangement.
The two microphone array configurations require the same number of microphones
for the same design target, although the second option uses half the number
of circular arrays. However, it should be noted that the distance between the two
microphones in each microphone pair should be small compared to the array radius,
so as to best approximate Pθ (r, θ, φ, k) in (3.1).
Step 1: Determine the desired frequency band and the radius R of the region of
interest.
Step 2: Calculate the maximum order of the sound field using $L = \lceil ekR/2 \rceil$.
Step 3: Based on the maximum order L, decide the number of circular arrays to be
implemented. For the first order microphone configuration, at least $L_{\text{omni}} = \lceil L/2 \rceil$
omnidirectional sensor arrays and $L_{\text{first}} = L - L_{\text{omni}}$ first order arrays are
needed. For the differential microphone configuration, no fewer than $L_{\text{diff}} = \lceil L/2 \rceil$
arrays of microphone pairs are required.
Step 4: Determine the radius of each circular array. Choose the radii such that
the spherical Bessel zeros for the target frequency band are avoided. Ensure
that the radii of the circular arrays have good diversity.
Step 5: For each circular array, decide the maximum spherical harmonic order $L_i$,
and estimate the number of microphones to be placed on the array, based on
$n_{\text{mic}} = 2L_i + 1$.
After settling on a design, the parameters for sound field calculation can then
be set based on the dimensions of the array.
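The arithmetic parts of the procedure (Steps 2, 3 and 5) can be sketched in code. This is only an illustration under stated assumptions: the function name, its arguments and the input values are hypothetical, c = 340 m/s is assumed, and Step 4 (choosing radii that avoid the Bessel zeros) is left to the caller.

```python
import math

def design_planar_array(f_max_hz, region_radius_m, ring_radii_m, use_pairs, c=340.0):
    """Return (L, minimum ring counts, microphones or pairs per ring)."""
    k = 2.0 * math.pi * f_max_hz / c

    # Step 2: maximum order of the sound field, L = ceil(e*k*R/2).
    order = math.ceil(math.e * k * region_radius_m / 2.0)

    # Step 3: minimum number of circular arrays for each configuration.
    if use_pairs:
        min_rings = {"pair": math.ceil(order / 2)}
    else:
        n_omni = math.ceil(order / 2)
        min_rings = {"omni": n_omni, "first_order": order - n_omni}

    # Step 5: per-ring order L_i and sensor count n_mic = 2*L_i + 1.
    per_ring = []
    for radius in ring_radii_m:
        ring_order = math.ceil(math.e * k * radius / 2.0)
        per_ring.append(2 * ring_order + 1)
    return order, min_rings, per_ring

# Illustrative inputs: 850 Hz upper frequency, 0.46 m region, seven rings of pairs.
radii = [0.46, 0.40, 0.34, 0.28, 0.22, 0.16, 0.10]
order, min_rings, per_ring = design_planar_array(850.0, 0.46, radii, use_pairs=True)
print(order, min_rings, per_ring)
```

The per-ring counts fall off with the radius because the local order $L_i$ scales with $kR_i$, so the inner rings are cheap compared with the outer one.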
3.3.4 Comments
We make the following comments and observations on the proposed array:
1. The even spherical harmonics are symmetric about the z = 0 (x-y) plane,
while the odd modes are not. A planar microphone array comprising only
omnidirectional microphones cannot distinguish waves that impinge
from either side of the plane. This explains why this type of array
is not capable of detecting the full 3D sound field.
2. First order cardioid microphones that are placed perpendicular to the array
plane can pick up a combination of even and odd mode harmonics, but are un-
able to separate the two components. However, if the even mode harmonic co-
efficients are known (which can be provided by an omnidirectional microphone
array), then it becomes easy to solve for the remaining odd mode coefficients.
Thus a hybrid array of both omnidirectional and first order microphones is
crucial for detecting full 3D sound field using a planar array aperture.
3. The zeros in the spherical Bessel functions cause certain spherical harmonics
to be “invisible” at some radius and frequency, which limits an array’s wide
band capabilities. The proposed array aperture samples the sound field at
multiple radii, thus improving the array’s redundancy against zero points in
the spherical Bessel functions. However, the user should carefully design the
array such that at each frequency, a sufficient number of circular arrays are
unaffected by the Bessel zeros and are available for calculating the coefficients.
In general, a properly designed planar array can avoid the Bessel zero problem
for all frequencies and thus has wideband capabilities; this is shown in
Section 3.5 using a hypothetical design example.
3.4 Error analysis 43
4. Although the proposed array has a planar geometry, the free space assumption
still applies to our array system, which requires that no sound source or scat-
terer should exist within the spherical region enveloping the planar array. For
this reason, the array cannot be directly placed on walls or tables to capture
the surrounding sound. However, a work-around to this problem is to place an
appropriate sound absorbing material between the rigid surface (wall, table)
and the planar array, which eliminates all reflections from the surface, thus
the setup no longer violates the free-space assumption. Furthermore, if the
reflection characteristics of the surface are known, it is possible to compensate
for the reflection in the calculation. However, this is beyond the scope of this
chapter, and we will investigate it in future work.
By choosing a sufficiently small value of dx, the error of the approximation can be
minimized. However, due to implementation constraints such as the physical dimensions
of the microphone units, a very good approximation of (3.31) may not be achievable.
We recommend choosing dx ≈ 0.1/kmax , where kmax is the wave number corresponding
to the maximum operating frequency of the microphone array, so as to minimize
the error due to the approximation.
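To give a feel for the recommended spacing, the sketch below evaluates dx = 0.1/kmax for an example frequency and estimates the error of a two-point difference applied to a unit plane wave. It assumes c = 340 m/s and a symmetric (central) difference across the array plane; the text does not specify how the pair is arranged, so the error figure is indicative only, and the function names are hypothetical.

```python
import cmath
import math

def pair_spacing(f_max_hz, c=340.0, factor=0.1):
    """Recommended spacing dx = factor / k_max, with k_max = 2*pi*f_max/c."""
    return factor * c / (2.0 * math.pi * f_max_hz)

def finite_difference_error(k, dx, cos_incidence=1.0):
    """Relative error of a central difference applied to the plane wave
    exp(i*k*cos_incidence*z), compared with its exact z-derivative at z = 0."""
    kz = k * cos_incidence
    exact = 1j * kz
    approx = (cmath.exp(1j * kz * dx / 2.0) - cmath.exp(-1j * kz * dx / 2.0)) / dx
    return abs(approx - exact) / abs(exact)

f_max = 850.0  # example upper frequency in Hz
k_max = 2.0 * math.pi * f_max / 340.0
dx = pair_spacing(f_max)
print(f"dx = {dx * 1000:.2f} mm")
print(f"relative error at f_max = {finite_difference_error(k_max, dx):.2e}")
```

Under these assumptions the relative error equals $1 - \sin(k\,dx/2)/(k\,dx/2)$, so it grows roughly with the square of $k\,dx$; doubling the spacing costs about a factor of four in accuracy.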
Since this approximation only exists for the sampling of the odd coefficients,
the accuracy of the calculated odd coefficients is expected to be slightly worse than
that of the even coefficients when the differential microphone approximation is used
to implement the array. This phenomenon is observed in the hypothetical design
example.
The same approximation error can be defined for (3.17) and (3.25). Generally
speaking, this error is small as long as the Nyquist sampling criterion is met; however,
using extra microphones on each circular array can help to improve the accuracy of
the system.
The truncation of spherical harmonic modes mentioned in Section 3.3 also leads
to errors, as the energy of the truncated higher order harmonics is aliased into the
observed harmonics during calculation. The truncation error can be expressed as
\[
\Delta E_{\text{trunc}} \triangleq \sum_{l=|m|}^{\infty} C_{lm}\, j_l(kr)\, P_{l|m|}(0) - \sum_{l=|m|}^{L} C_{lm}\, j_l(kr)\, P_{l|m|}(0)
= \sum_{l=L+1}^{\infty} C_{lm}\, j_l(kr)\, P_{l|m|}(0). \tag{3.33}
\]
Using the “rule of thumb” given in [53], the error is on the order of 1 percent. It
should be noted that the truncation error will only be aliased into coefficients of the
highest order, due to inherent properties of the spherical Bessel functions.
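The property of the spherical Bessel functions that keeps the truncation error small is their rapid decay once the order exceeds the argument. The sketch below evaluates $j_l(x)$ from its ascending power series, which is adequate in double precision for the moderate arguments used here. The function name and the example values (850 Hz, 0.46 m, c = 340 m/s) are illustrative assumptions.

```python
import math

def sph_bessel_j(l, x, terms=60):
    """Spherical Bessel function j_l(x) from the series
    j_l(x) = x^l * sum_k (-x^2/2)^k / (k! * (2l+2k+1)!!)."""
    if x == 0.0:
        return 1.0 if l == 0 else 0.0
    # Leading term x^l / (2l+1)!!, built up factor by factor.
    term = 1.0
    for i in range(1, l + 1):
        term *= x / (2 * i + 1)
    total = term
    for k in range(1, terms):
        # Ratio between consecutive series terms.
        term *= -(x * x / 2.0) / (k * (2 * l + 2 * k + 1))
        total += term
    return total

k = 2.0 * math.pi * 850.0 / 340.0
x = k * 0.46  # kR for the example ring
for l in range(8, 15):
    print(l, f"{sph_bessel_j(l, x):.3e}")
```

Since each truncated term in (3.33) is weighted by $j_l(kr)$, the printed values indicate how quickly the contributions of orders beyond L shrink for this kR.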
Due to the structure of the proposed design example and the nature of the
spherical Bessel functions, the lower order spherical harmonic modes are sampled
by multiple circular arrays, whereas the highest order ones are only visible to one or
two circular arrays. As a result, when solving for the sound field coefficients using
(3.12) and (3.19), the lower order coefficients are less affected by the approximation
and aliasing errors than the higher order coefficients. This trend is shown in Fig. 3.6.
array’s capabilities. Then the implemented array is used to validate the technique
through lab experiments.
\[ L = \left\lceil \frac{ekr}{2} \right\rceil = 10, \tag{3.34} \]
which means that the outer ring of the array should have at least 2L + 1 = 21 microphone
pairs. In the same manner, we place a series of circular arrays of different
radii inside the outer circle. Following the design procedure given in Section 3.3, the
radii of the rings are set to 0.46 m, 0.4 m, 0.34 m, 0.28 m, 0.22 m, 0.16 m and
0.1 m. Thus, the numbers of microphone pairs on the rings are 21, 19, 17, 13, 11, 9
and 7, respectively.
To evaluate the performance of the proposed array system, we place a single
point source with frequencies from 150 to 1150 Hz at (R, θ, φ) = (1.6 m, 60°, 90°). We use
the array to estimate the spherical harmonic coefficients and then reconstruct the
sound field. We compare the reconstructed sound field to the original sound field
and calculate the overall reproduction error of the system. Figure 3.3 depicts the
error for different frequencies. Note that the error is small when the frequency is
below 850 Hz, which is the desired maximum frequency for the array. Beyond this
upper frequency, the error percentage increases dramatically. The reason is that
as the frequency increases, the order of active spherical harmonics also grows. At
frequencies above 850 Hz, the number of microphones needed to estimate the higher
frequency components is greater than the number of microphones on the array,
thus causing aliasing. Also, the total number of coefficients for each mode m exceeds
the number of circular arrays available; as a result the matrix inversion problems
Figure 3.3: Reproduction error percentage for a point source of frequencies 150 −
1150 Hz, located at (1.6 m, 60°, 90°). (Axes: frequency in Hz versus error percentage in %.)
Figure 3.4: Actual (a,c) and recorded (b,d) sound field due to an 850 Hz point source
located at θ = 45°, R = 1.6 m, reconstructed at z = 0 (a,b) and z = 0.2 m (c,d)
plane. (Four panels a to d; axes: x in m versus y in m.)
Figure 3.5: Reproduction error percentage for a plane wave source at 850 Hz, moving
from θ = 0 to θ = 180°. (Axes: impinging elevation angle in degrees versus error percentage in %.)
Figure 3.6: Average coefficient error due to a 500 Hz plane wave impinging from
different elevation angles. (Axes: sound field coefficient index versus normalized average
error; coefficients are grouped from 0th to 6th order, with even-mode and odd-mode
coefficients marked separately.)
Table 3.1: Condition number of matrix U m of the hypothetical design example for
frequencies 100 Hz, 200 Hz, 400 Hz and 800 Hz.
Frequency   m=0      m=1     m=2     m=3     m=4     m=5     m=6     m=7    m=8    m=9    m=10
100 Hz      5.76     1.00    1.00    /       /       /       /       /      /      /      /
200 Hz      13.25    4.57    1.00    1.00    /       /       /       /      /      /      /
400 Hz      46.30    19.38   15.97   6.33    1.00    1.00    /       /      /      /      /
800 Hz      181.35   21.80   110.9   13.24   54.40   41.80   10.88   4.20   7.88   1.00   1.00
Fig. 3.6 plots the normalized average error for each coefficient. It can be observed
that the lower order coefficients are measured more accurately than the
higher order ones; also, the even mode coefficients are more accurate than
the odd mode coefficients.
Table 3.1 shows the condition number of the matrix U m of the designed array
for various frequencies. Due to the separation of the even and odd mode harmonic
coefficients, the coefficients CL,±L , CL,±(L−1) and CL−1,±(L−1) are solved uniquely,
therefore the matrices U L and U L−1 are in fact vectors whose condition numbers equal
1. The size of U m grows as the frequency increases, and the condition number for
lower modes increases correspondingly. The design example consists of the minimum
number of circular arrays. We expect the condition numbers to be lower should
additional circular arrays be used in the system. Also, for high order systems (L ≥
5), regularization should be applied when inverting the matrix U m .
3.5 Design examples 49
In general, we can see from the simulations that the design example offers good
accuracy, with its error on the order of 1 percent. This is comparable to the performance
of spherical microphone arrays [43] and other previously proposed array
configurations such as the multiple circular microphone array [13] and the double
sided cone array [71] of the same order, assuming that a similar number of microphones
is used in each array configuration.
Figure 3.8: Comparison of (a) recorded and (b) simulated sound field for an 850 Hz
source at (R, θ, φ) = (1.64 m, 45°, 100°), reconstructed at the z = 0.05 m plane.
Microphone locations are marked with “*”. (Axes: y in m.)
Table 3.2: Sound field coefficient comparison between simulation and experimental
results. The sound fields are due to a point source located at (R, θ, φ) =
(1.64 m, 45°, 100°) and (1.5 m, 90°, 225°), respectively.
Recording 1   C0,0      C1,−1    C1,0     C1,1      C2,−2     C2,−1    C2,0      C2,1     C2,2
Recorded      1.5413    0.9608   0.9936   0.8351    0.9566    0.7312   1.0255    1.3735   0.5825
Simulated     1.1079    0.9569   1.1892   0.9626    0.8198    1.4140   0.5095    1.4513   0.6788
Mag. Error    0.5142    0.0040   0.1645   0.1325    0.1668    0.4829   1.1026    0.0536   0.1418
Phase Error   0.0009    0.0226   0.1510   0.0888    0.4383    0.3350   0.0145    0.1566   0.3904
Recording 2   C0,0      C1,−1    C1,0     C1,1      C2,−2     C2,−1    C2,0      C2,1     C2,2
Recorded      1.7838    1.5167   0.0968   1.5957    1.7211    0.1066   0.6772    0.1121   1.3902
Simulated     1.2380    1.4368   0        1.4137    1.6001    0        1.3127    0        1.8995
Mag. Error    0.5457    0.0798   /        0.1820    0.1209    /        −0.6355   /        −0.5094
Phase Error   −0.0515   0.1143   /        −0.1372   −0.1365   /        −3.9421   /        −0.4036
data as well as those acquired from the simulation results. It can be seen that al-
though rather significant errors occur with some coefficients, the general patterns
match very well. The microphone data used are raw recordings processed with
microphone calibration data that was acquired before assembling the array; therefore
all the errors mentioned previously are present and have an impact on the recorded
coefficients. Further calibration of the system, including microphone gain calibration,
array geometry adjustments and modification of algorithm parameters, can be
expected to greatly improve the accuracy of the system.
We would like to point out that our array system utilizes 16 microphones to capture
a second order sound field, whereas in theory, the minimum number of microphones
required to capture a second order sound field is 9. Therefore, the proposed array
system does not reduce the number of microphones required to sample the sound
field. The advantage of our proposed array structure is that it reduces the physical
dimension of a higher order microphone array system without compromising its
functionality.
3.6 Summary
This chapter first introduces a method of measuring complete 3D sound field infor-
mation on a 2D plane, through the combined use of omnidirectional microphones
and first order microphones. Two options are provided for planar microphone ar-
ray implementation based on the proposed sound field measuring method. Both
array configurations consist of multiple co-centered circular arrays, with one option
using both omni-directional microphones and first order microphones, and the
other using omni-directional microphones only. The associated algorithms
to calculate sound field coefficients are also given in the chapter. We show in the
simulation example that the proposed 2D microphone array system has good accuracy
within its designed operating frequency band, and both even and odd sound
field coefficients can be accurately calculated. We also built an experimental planar
microphone array to further validate the proposed theory.
This chapter’s work has been published in the following journal paper. [75]
Overview: This chapter proposes the theory and design of circular higher-order
microphone arrays for 3D sound field analysis using spherical harmonics. Through
employing the spherical harmonic translation theorem, the local spatial sound fields
recorded by each higher-order microphone placed in the circular arrays are combined
to form the sound field information of a large global spherical region. The proposed
design reduces the number of the required sampling points and the geometrical com-
plexity of microphone arrays. We develop a two-step method to calculate sound field
coefficients using the proposed array structure, i) analytically combine local sound
field coefficients on each circular array and ii) solve for global sound field coeffi-
cients using data from the first step. Simulation and experimental results show that
the proposed array is capable of acquiring the full 3D sound field information over a
relatively large spherical region with decent accuracy and computational simplicity,
hence suitable for spatial ANC applications especially over large regions.
4.1 Introduction
A higher-order microphone is capable of measuring the local sound field within
its proximity, and extracting the sound field coefficients up to a certain spherical
harmonics order. It has been shown that the sound field over a large region can
be recorded using a number of higher order microphones in a spherical geometry
[76]. Compared to using omnidirectional microphones for the same purpose, the
higher order microphone array proposed in [76] requires significantly fewer
individual microphone units, thereby reducing the complexity of system deployment,
especially for spatial sound recording over a large region.
In Chapter 3 we introduced a planar microphone array geometry consisting of
differential microphone pairs, which is capable of recording 3D spatial sound. A
differential microphone pair can also be seen as a special kind of higher-order mi-
crophone, since the sound pressure and pressure gradient it captures are related to
the 0th order and 1st order spherical harmonic coefficients as shown in Theorem
1. Intuitively, if differential microphone arrays arranged on a plane can capture 3D
sound field, then general higher-order microphones should also have this capability.
In this chapter, we present an algorithm to capture a 3D sound field using circular
arrays of higher order microphones, placed on a 2D plane. Compared to [76], this
method requires a simpler microphone geometry, and thus reduces the implementation
difficulty of higher order microphone arrays for the purpose of large area sound field
recording. This method can be seen as a generalization of the algorithm discussed
in Chapter 3.
For clarity, in this section, we refer to the sound field with origin O as the global
sound field, which can be expressed in terms of spherical harmonics using (2.1); the
corresponding coefficients Clm are considered as the global sound field coefficients.
In addition, we define a local origin Oq whose position with respect to O is
Rq = (Rq , θq , φq ), then the sound pressure at a point r = (r, ϑ, ϕ) with respect to
Oq can be expressed by
\[ P(r, \vartheta, \varphi) = \sum_{\nu=0}^{\infty} \sum_{\mu=-\nu}^{\nu} B_{\nu\mu}(k)\, j_{\nu}(kr)\, Y_{\nu\mu}(\vartheta, \varphi), \tag{4.1} \]
where Bνµ (k) represent the sound field coefficients with respect to the local origin
Oq . The sound field with respect to Oq is called the local sound field.
Using the spherical harmonic addition theorem (2.12), the relationship between
the local and the global sound field coefficients can be written as
\[ B_{\nu\mu} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}\, \hat{S}_{lm}^{\nu\mu}(\mathbf{R}_q). \tag{4.2} \]
In (4.2), Bνµ are the local sound field coefficients in (4.1) and Clm are the global
sound field coefficients in (2.1).
\[ P_q(r, \vartheta, \varphi) = \sum_{\nu=0}^{V} \sum_{\mu=-\nu}^{\nu} B_{\nu\mu}\, j_{\nu}(kr)\, Y_{\nu\mu}(\vartheta, \varphi). \tag{4.3} \]
Theorem 2. Given a set of local sound field coefficients Bνµ(ϕ) which are measured
along a circle, and an integer m′, their relationship with the global sound field
where
\[ H_{lm}^{\nu\mu}(R_s, \vartheta_s) = 4\pi i^{\nu-l} \sum_{\ell=|\mu-m|}^{l+\nu+1} i^{\ell} (-1)^{2m-\mu}\, j_{\ell}(kR_s)\, P_{\ell|\mu-m|}(\vartheta_s)\, W, \tag{4.5} \]
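\[ \hat{S}_{lm}^{\nu\mu}(R_s, \vartheta_s, \varphi) = H_{lm}^{\nu\mu}(R_s, \vartheta_s)\, E_{(m-\mu)}(\varphi), \tag{4.6} \]
where $H_{lm}^{\nu\mu}(R_s, \vartheta_s)$ is given by (4.5). Substituting (4.6) into (2.12) yields
\[ B_{\nu\mu}(\varphi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}\, H_{lm}^{\nu\mu}(R_s, \vartheta_s)\, E_{(m-\mu)}(\varphi). \tag{4.7} \]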
Multiplying both sides of (4.7) with $E_{m'}(\varphi)$ and integrating with respect to ϕ over
[0, 2π), due to the orthogonality property of complex exponential functions
\[ \int_{0}^{2\pi} E_{(m-\mu)}(\varphi)\, E_{m'}(\varphi)^{*}\, d\varphi = \delta_{m-\mu,\,m'}, \tag{4.8} \]
the integration $\int_{0}^{2\pi} C_{lm} H_{lm}^{\nu\mu}(R_s, \vartheta_s) E_{(m-\mu)}(\varphi) E_{m'}(\varphi)\, d\varphi$ is non-zero only when
$m = \mu - m'$, thus (4.7) reduces to (4.4), which completes the proof.
By replacing Bνµ (ϕ) with Bνµ (ϕq ), the discrete form of (4.4) can be written as
\[ \frac{1}{Q} \sum_{q=1}^{Q} B_{\nu\mu}(\varphi_q)\, E_{\mu-m}(\varphi_q) \approx \sum_{l=|m|}^{\infty} C_{lm}\, H_{lm}^{\nu\mu}(R_s, \vartheta_s), \tag{4.9} \]
where Q is the number of sampling points evenly distributed on the circle. In (4.9),
the variable m′ has been replaced by (µ − m) to illustrate the direct relationship
between Bνµ and Clm. Due to the spatial sampling, an upper bound for the range
of (µ − m) that can be evaluated is given by
\[ |\mu - m| \le \left\lfloor \frac{Q-1}{2} \right\rfloor. \tag{4.10} \]
A method for calculating the global sound field coefficients Clm up to order L using
the local coefficients Bνµ (ϕq ) can be formulated based on (4.9).
Step 1 of the method is to evaluate the summation on the left hand side of
(4.9). For each existing global sound field mode m, evaluate the summation for all
combinations of Bνµ(ϕq) and m that satisfy (4.10). Denote the summation as $\alpha_{\nu\mu}^{m}$,
then
\[ \alpha_{\nu\mu}^{m} = \frac{1}{Q} \sum_{q=1}^{Q} B_{\nu\mu}(\varphi_q)\, E_{(m-\mu)}(\varphi_q). \tag{4.11} \]
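Step 1 amounts to a discrete Fourier-type sum over the Q sampling positions on the circle. The following minimal Python sketch illustrates (4.11) and the bound (4.10). It assumes the angular basis is the unnormalized complex exponential E_n(ϕ) = exp(inϕ) and that the sampling angles are ϕ_q = 2πq/Q; the definition of E used in this thesis is given elsewhere and may include a normalization factor. The function names are illustrative only.

```python
import cmath
import math

def E(n, phi):
    """Assumed angular basis E_n(phi) = exp(i*n*phi); the normalization is an assumption."""
    return cmath.exp(1j * n * phi)

def max_mode_difference(Q):
    """Upper bound on |mu - m| from (4.10) for Q evenly spaced sampling points."""
    return (Q - 1) // 2

def alpha(B_samples, m, mu):
    """Evaluate (4.11): (1/Q) * sum over q of B_{nu,mu}(phi_q) * E_{m-mu}(phi_q)."""
    Q = len(B_samples)
    if abs(mu - m) > max_mode_difference(Q):
        raise ValueError("|mu - m| exceeds the sampling bound (4.10)")
    phis = [2 * math.pi * q / Q for q in range(Q)]
    return sum(B * E(m - mu, phi) for B, phi in zip(B_samples, phis)) / Q

# Synthetic check: a local coefficient that varies as exp(-i*(m - mu)*phi)
# along the circle is picked out by the matching (m, mu) pair only.
Q, m, mu = 8, 2, 1
B = [cmath.exp(-1j * (m - mu) * 2 * math.pi * q / Q) for q in range(Q)]
a_match = alpha(B, m, mu)
a_other = alpha(B, m + 1, mu)
```

With evenly spaced samples the sum is exact for angular variations within the bound (4.10); faster variations alias onto lower modes.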
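The second step is to solve a matrix inversion problem to find Clm. Using (4.9) and
(4.11), the relationship between Clm and $\alpha_{\nu\mu}^{m}$ can be represented in matrix form as
\[ \boldsymbol{\alpha}_m = \mathbf{H}_m \mathbf{C}_m, \tag{4.12} \]
where $\boldsymbol{\alpha}_m = [\alpha_{00}^{m}\ \alpha_{1(-1)}^{m}\ \alpha_{10}^{m}\ \ldots\ \alpha_{\nu\mu}^{m}]^T$, and $\mathbf{C}_m = [C_{|m|m}\ C_{(|m|+1)m}\ \ldots\ C_{Lm}]^T$
is the set of global coefficients of mode m.
\[ \mathbf{H}_m = \begin{bmatrix}
H_{|m|m}^{00} & H_{(|m|+1)m}^{00} & \cdots & H_{Lm}^{00} \\
H_{|m|m}^{1(-1)} & H_{(|m|+1)m}^{1(-1)} & \cdots & H_{Lm}^{1(-1)} \\
\vdots & \vdots & \ddots & \vdots \\
H_{|m|m}^{\nu\mu} & H_{(|m|+1)m}^{\nu\mu} & \cdots & H_{Lm}^{\nu\mu}
\end{bmatrix} \]
is the matrix that contains the weights for spherical harmonics translation. A solution
for $\mathbf{C}_m$ can be found by calculating the Moore-Penrose pseudoinverse of $\mathbf{H}_m$.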
The size of $\mathbf{H}_m$ is $(V+1)^2$ by $(L-|m|+1)$, which is significantly smaller than
the $(L+1)^2$-by-$(L+1)^2$ matrix inversion proposed in [58]; thus both the computational
cost and the conditioning of the matrix inversion are significantly better
than for the method in [58].
The complete set of global sound field coefficients is found by solving (4.12) for
m = [−L : L], where L is the maximum order of the global sound field.
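\[ \mathbf{C}_m = (\hat{\mathbf{H}}_m^{H} \hat{\mathbf{H}}_m + \lambda \mathbf{I})^{-1} \hat{\mathbf{H}}_m^{H} \hat{\boldsymbol{\alpha}}_m \tag{4.13} \]
where $\hat{\mathbf{H}}_m = [\mathbf{H}_{m;1}^T\ \mathbf{H}_{m;2}^T\ \ldots\ \mathbf{H}_{m;K}^T]^T$, λ is the regularization parameter, and
$\hat{\boldsymbol{\alpha}}_m = [\boldsymbol{\alpha}_{m;1}^T\ \boldsymbol{\alpha}_{m;2}^T\ \ldots\ \boldsymbol{\alpha}_{m;K}^T]^T$. Evaluating (4.13) for m = [−L : L] yields the complete
set of global sound field coefficients.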
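The regularized solution (4.13) can be illustrated with a small dependency-free Python sketch. It stacks the per-array blocks vertically and solves the normal equations by Gaussian elimination. The helper names and the toy matrices below are hypothetical; in practice the blocks would hold the translation weights from (4.5), and a numerical linear algebra library would normally be used instead of the hand-written solver.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def regularized_solve(H_blocks, alpha_blocks, lam):
    """(4.13): stack the blocks, then solve (H^H H + lam*I) C = H^H alpha."""
    H = [row for block in H_blocks for row in block]
    a = [v for block in alpha_blocks for v in block]
    Hh = hermitian(H)
    G = matmul(Hh, H)
    for i in range(len(G)):
        G[i][i] += lam
    rhs = [sum(Hh[i][k] * a[k] for k in range(len(a))) for i in range(len(Hh))]
    return solve(G, rhs)

# Toy example with K = 2 blocks and two unknown coefficients (made-up numbers).
H1 = [[1 + 0j, 0j], [0j, 1 + 0j]]
H2 = [[1 + 0j, 1 + 0j]]
C_true = [2 + 1j, -1 + 0j]
a1 = [2 + 1j, -1 + 0j]          # equals H1 applied to C_true
a2 = [1 + 1j]                   # equals H2 applied to C_true
C_est = regularized_solve([H1, H2], [a1, a2], lam=0.0)
C_reg = regularized_solve([H1, H2], [a1, a2], lam=1.0)
```

With λ = 0 and consistent data the true coefficients are recovered; a positive λ shrinks the solution, trading bias for robustness when the stacked matrix is poorly conditioned.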
\[ Q \ge 2 \left\lceil \frac{e k R_s}{2} \right\rceil + 1. \tag{4.15} \]
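Reading (4.15) as a lower bound on the number of sampling points Q for a circle of radius R_s at wavenumber k, the bound can be evaluated as in the sketch below. The speed of sound (343 m/s), the example radius of 0.4 m and the function names are assumptions made for illustration, not values taken from this chapter.

```python
import math

def wavenumber(freq_hz, c=343.0):
    """k = 2*pi*f/c; the speed of sound c = 343 m/s is an assumed value."""
    return 2 * math.pi * freq_hz / c

def min_sampling_points(k, R_s):
    """Smallest Q allowed by (4.15): Q >= 2*ceil(e*k*R_s/2) + 1."""
    return 2 * math.ceil(math.e * k * R_s / 2) + 1

# Hypothetical example: 700 Hz and a circle of radius 0.4 m.
Q_min = min_sampling_points(wavenumber(700.0), 0.4)
```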
SNR of 40 dB. A point source is placed at (R, θ, φ) = (1.6, 60°, −60°) for all the
simulation setups.
[Figure 4.1, panels (a)-(d): sound field plots over x (m) and y (m), each axis spanning −0.6 to 0.6 m.]
Figure 4.1: Comparison of original and recorded sound field due to 700 Hz point
source, reconstructed at z=0 and z=0.2 m plane
Figure 4.1 shows the simulation result for the first order array configuration. In
this simulation, the sound field generated by the point source is recorded by the
array, and the resulting sound field coefficients are used to reconstruct the sound
field. The sound field is plotted for two layers: the z = 0 plane and z = 0.2 m
plane. Plots (a) and (c) show the original sound field at these two planes, and plots
(b) and (d) show the reconstruction of the sound field coefficients obtained from
the microphone array. The result shows that the microphone array is capable of
accurately capturing the sound field within its coverage (yellow circle).
Figure 4.2 depicts the error performance for two different array configurations
over a frequency range of 100 − 1000 Hz. For this figure, the error is calculated by
averaging the amplitude error over the entire region of interest, and normalizing
by the average sound pressure in the same region. Since both array configurations
are designed to operate at up to 700 Hz, it can be seen from Fig. 4.2 that the
reproduction error for both configurations is low for frequencies below 700 Hz,
and the error increases rapidly once the frequency becomes higher than the design
frequency.
[Figure 4.2 plot: relative error (log scale, 10^−2 to 10^0) versus frequency (Hz), 100 to 1000 Hz.]
Figure 4.2: Reproduction error at different frequencies for first and second order
array configurations
[Figure 4.3 plot: relative error (log scale) versus height of reproduction layer (m), 0 to 0.3 m; legend entries: 1st order array at 500 Hz, 600 Hz and 700 Hz.]
Figure 4.3: Reproduction error at different elevations and frequencies for first and
second order array configurations
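The error measure described for Fig. 4.2 can be sketched in a few lines of Python. The sketch assumes that "amplitude error" means the magnitude of the complex difference between the reconstructed and the original pressure at each evaluation point, and that both averages are plain arithmetic means over the same set of points; the function name is illustrative.

```python
def relative_error(p_true, p_rec):
    """Mean |p_rec - p_true| over the region, normalized by the mean |p_true|."""
    n = len(p_true)
    mean_err = sum(abs(r - t) for t, r in zip(p_true, p_rec)) / n
    mean_ref = sum(abs(t) for t in p_true) / n
    return mean_err / mean_ref

# Made-up pressure samples at two evaluation points.
err_exact = relative_error([1 + 0j, 2j], [1 + 0j, 2j])
err_small = relative_error([1 + 0j, 1 + 0j], [1.1 + 0j, 0.9 + 0j])
```

Restricting the evaluation points to a single horizontal plane gives the per-layer error of the kind plotted in Fig. 4.3.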
The reproduction error is also evaluated at different planes using the same
method, but with the region limited to horizontal planes within the spherical area.
The results are shown in Fig. 4.3. The recorded sound field is reconstructed on
planes of different heights, ranging from z = 0 to z = 0.3 m. The simulation shows
that the reproduction error is smaller around the equator compared to that near
the poles of the sphere, which is due to the fact that the microphones are clustered
around the equator plane.
4.5 Experimental results
[Figure: two sound field plots over x (m) and y (m), each axis spanning −0.1 to 0.1 m.]
We believe that the proposed spatial sampling method allows for easier imple-
mentation of sound field recording systems compared to spherical sampling methods,
especially when combined with the recording technique used in this experiment, and
for applications such as room response modelling over a large space.
4.6 Summary
In this chapter, we propose a circular higher-order microphone array structure and
an associated analytical algorithm for sound field analysis based on spherical har-
monics decomposition. This method can be seen as a generalization of the planar
microphone array proposed in Chapter 3. In this method, through employing the
spherical harmonic translation theorem, the local spatial sound fields recorded by
each higher-order microphone placed in the circular arrays are combined to form
the sound field information of a large global spherical region. The proposed design
reduces the number of the required sampling points and the geometrical complex-
ity of microphone arrays. Simulations and experiments show that the proposed
array architecture offers decent accuracy and robustness, and has the potential of
simplifying sound field recording systems in certain applications.
64 Direct-to-reverberant energy ratio estimation using a first order microphone
5.1 Introduction
The direct-to-reverberation energy ratio (DRR), defined as the energy ratio between
direct signal and its reverberations, is an important parameter to characterize a re-
verberant environment, along with other parameters such as reverberation time.
Since reverberation energy affects the speech signal’s clarity [78], the DRR has an
influence on the algorithms for various applications such as speech dereverbera-
tion [79], teleconferencing [80] and hearing aids [81], both in terms of algorithm
performance and strategy. The minimum audible difference in DRR has been inves-
tigated in [82]. In [83], DRR is utilized for parametric spatial audio coding. DRR
also finds its application in the field of psychoacoustics, where it is believed that
DRR helps humans to determine the distance of the sound source [78, 84, 85].
DRR estimation methods based on estimating room impulse responses have been
presented by Larsen et al. [86] and Falk et al. [87]; however, pre-processing is
required for both methods. Mosayyebpour et al. [88] presented a method for blind
DRR estimation based on higher order statistics, where the inverse filter of the room
impulse response is estimated using the skewness of the speech signal. Parada et
al. presented a single channel DRR estimation method based on a neural network
learning algorithm [89].
Methods for blind DRR estimation using multiple sensors have also been pro-
posed in the literature. With the goal of estimating source distance, Lu [90] pre-
sented a DRR estimation algorithm using the equalization-cancellation method,
where a binaural microphone system is used to capture sound signal. The coherence
function framework was first introduced by Vesa [91] for estimating source distance
using binaural signals, where the coherence function of the two input signals was used
as a characterization of source distance. Later, the coherence function framework
was also used by Jeub [92] to develop a DRR estimation algorithm. In this work,
the DRR is estimated by comparing the coherence value computed from two microphone
inputs with theoretical coherence functions in a diffuse sound field. Thiergart [93]
also developed a DRR estimation algorithm based on the complex coherence
function of two omnidirectional microphones. In [94] a DRR estimation method based
on the spectral standard deviation of two microphones was proposed.
Directional or beamforming microphone arrays have also been used to estimate
DRR, such as the methods presented in [95] and [96]. In both of these works, the
power spectral density (PSD) of the reverberant field was used to estimate DRR.
Another method [97] uses a circular microphone array to estimate DRR; the method
relies on the spatial correlation matrix of the microphones’ received signals. The
reverberation is modelled as a diffuse field in this work, while the direct path is
assumed to be a plane wave. The DRR is solved using a least mean square method.
Kuster [98] presented a method based on the coherence function of sound pressure and
particle velocity at the receiver position, measured by a differential microphone
array.
In recent years, the use of higher order microphones and the technique of spher-
ical harmonic decomposition [10] have become popular in the field of room acoustic
analysis. Jarrett et al. [12] proposed a method to estimate Signal-to-Diffuse Ra-
tio (SDR, equivalent to DRR when assuming diffuse reverberation field) utilizing
spherical harmonic coefficients captured by a higher order microphone. It is shown
that this method minimizes the SDR estimation bias. In our previous work [99], we
implemented Kuster’s method [98] in spherical harmonic domain, utilizing the first
order spherical harmonic coefficients to estimate DRR.
In many of the previous works, such as [98], [12], [93] and our previous work [99],
the direct path signal is assumed to be plane wave, and the reverberant sound field
is assumed to be diffuse field. In real-life reverberant environments where these
assumptions may not hold, the DRR estimation accuracy of these algorithms may
degrade. For example, the DRR estimated using Kuster’s method tends to be higher
than ground truth in reverberant rooms [98].
In this work, we first develop a general expression for DRR estimation using
the coherence function of sound pressure and particle velocity, using a point source
model for the direct path signal, and without applying any assumptions for the
reverberation field. Using the relationship between spherical harmonic coefficients
and acoustic particle velocity, we develop the framework in the spherical harmonics
domain. Then, for the direct path model, we provide a detailed analysis on the
error in DRR estimation results when using the plane wave model. We propose a
rule of thumb for determining whether the plane wave model can be used without
introducing significant error, based on the source-to-microphone distance and tar-
get frequency. For the reverberation sound field, we show that the reverberation
characteristics related to DRR estimation can be expressed using two parameters,
and that under the diffused field assumptions, the values of these parameters can
be determined theoretically, which results in the simplified DRR solutions in [98]
and [12]. We also provide a theoretical analysis on the two parameters, their physi-
cal meanings, and their impact on the DRR estimation, which explains the positive
bias phenomenon of Kuster’s method [98]. Furthermore, we propose a method to
estimate these two parameters for a given reverberant environment, using a first
order microphone, under a number of assumptions on the reverberant field which
are less strict than the diffuse field model. The DRR can then be calculated using
the estimated parameters.
The performance of the proposed DRR estimation algorithm is verified using the
ACE Challenge Dataset [100]. It is shown that the results agree with the theoreti-
cal analysis, and that the proposed method addresses the positive bias problem of
Kuster’s method [98], and the mean DRR estimation error is less than 2 dB for all
recording scenes in the ACE Challenge Dataset.
\[ P_D(r, \theta, \phi, k) = \sum_{l=0}^{1} \sum_{m=-l}^{l} B_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta, \phi), \tag{5.1} \]
\[ P_R(r, \theta, \phi, k) = \sum_{l=0}^{1} \sum_{m=-l}^{l} \alpha_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta, \phi), \tag{5.2} \]
where Blm(k) and αlm(k) represent the coefficients of the direct path and the
reverberant sound field, respectively.
5.2 DRR estimation based on coherence measurements 67
The following assumptions are made regarding the direct path sound:
2: The direct path signal PD (r, θ, φ, k) is uncorrelated with the reverberant sound
field PR (r, θ, φ, k). Using (5.1) and (5.2), this assumption can be expressed as
Since the direct path signal is modelled as sound waves emitted by a point source,
Blm(k) can be written using the following expression [101]
\[ B_{lm}(k) = A_D\, ik\, h_l^{(1)}(kr_0)\, Y_{lm}^{*}(\vartheta, \varphi), \tag{5.4} \]
where $A_D$ indicates the magnitude of the impinging sound, $h_l^{(1)}(kr_0)$ is the lth
order spherical Hankel function of the first kind, r0 is the distance between the
point source and the microphone with r0 > r, (r0, ϑ, ϕ) denotes the position of the
point source, and (·)∗ represents complex conjugate. Since the coordinate system is
defined such that ϑ = 0, and due to the fact that Y11(0, ϕ) = Y1,−1(0, ϕ) = 0, we
have B11(k) = B1,−1(k) = 0. Thus the combined sound field coefficients Clm(k) can
be expressed as follows
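\[ C_{00}(k) = A_D\, ik\, h_0^{(1)}(kr_0)\, Y_{00}^{*}(0, 0) + \alpha_{00}(k), \tag{5.5} \]
\[ C_{10}(k) = A_D\, ik\, h_1^{(1)}(kr_0)\, Y_{10}^{*}(0, 0) + \alpha_{10}(k), \tag{5.6} \]
\[ C_{11}(k) = \alpha_{11}(k), \tag{5.7} \]
\[ C_{1,-1}(k) = \alpha_{1,-1}(k). \tag{5.8} \]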
Equations (5.5)-(5.8) show that in the coordinate system defined in this section,
the direct path signal is only present in C00(k) and C10(k), but not in C11(k) and
C1,−1(k).
We note that the four coefficients C00 (k), C10 (k), C11 (k) and C1,−1 (k) can be
captured by a first order microphone. Although in the general sense, microphones
with certain directional beam patterns, such as cardioid microphones and differential
microphones are commonly referred to as first order microphones, in the context of
this section, a first order microphone is a microphone system which is capable of
acquiring the 0th and 1st order spherical harmonic coefficients of its surrounding
The coherence function between the sound pressure P (0, k) and particle velocity
Vz (0, k) along the z direction can be defined as [98],
where the assumption that direct path is uncorrelated with the reverberations (5.3)
is used, and we denote
\[ H_0 \triangleq A_D\, ik\, h_0^{(1)}(kr_0)\, Y_{00}^{*}, \tag{5.12} \]
\[ H_1 \triangleq A_D\, ik\, h_1^{(1)}(kr_0)\, Y_{10}^{*}. \tag{5.13} \]
Note that the angle arguments (ϑ = 0, ϕ = 0) of Ynm (ϑ, ϕ) and the frequency
arguments (k) of Cnm and αnm have been omitted for simplicity.
¹ Although removing the imaginary unit i here does not affect γ², we keep i for the
derivation of further expressions.
The linear scale direct-to-reverberant energy ratio is defined here to be the ratio
of measured acoustic energy at the position of measurement due to the direct path
and reverberation. Since P(0) = C00 Y00, we have
\[ \mathrm{DRR} = \frac{E\{|A_D\, ik\, h_0^{(1)}(kr_0)\, Y_{00}^{*}|^2\}}{E\{|\alpha_{00}|^2\}} = \frac{E\{|H_0|^2\}}{E\{|\alpha_{00}|^2\}}. \tag{5.15} \]
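\[ H \triangleq \left( \frac{h_1^{(1)}(kr_0)}{h_0^{(1)}(kr_0)} \right)^{*}. \tag{5.19} \]
\[ \gamma^2 = \frac{|-\mathrm{DRR} \cdot i \cdot H + R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR}\,|H|^2 + R_2)} \tag{5.20} \]
\[ \phantom{\gamma^2} = \frac{|\mathrm{DRR}|^2 |H|^2 + 2\,\mathrm{DRR} \cdot \mathrm{Im}\{H R_1^{*}\} + |R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR}\,|H|^2 + R_2)}, \tag{5.21} \]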
where Im{·} denotes the imaginary part of the argument. From (5.20) it can be seen
that the characteristics of reverberation which affect DRR estimation using the coherence
method can be expressed using two parameters R1 and R2.
In previous works, the direct path signal is often assumed to be a plane wave [12,98].
Under this assumption, the following approximation can be applied (see Appendix
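\[ \gamma^2 = \frac{|\mathrm{DRR} + R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR} + R_2)} \tag{5.23} \]
\[ \phantom{\gamma^2} = \frac{\mathrm{DRR}^2 + 2\,\mathrm{DRR} \cdot \mathrm{Re}\{R_1\} + |R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR} + R_2)}, \tag{5.24} \]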
where Re{·} denotes real part of the argument. The plane wave assumption leads
to bias in the DRR estimation, primarily for lower frequencies and smaller values of
r0 , which is shown in Section 5.3.2.
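The difference between the point source expression (5.20) and the plane wave expression (5.23) can be explored numerically with the short Python sketch below. It uses the closed forms of the spherical Hankel functions of the first kind of orders 0 and 1, for which the ratio in (5.19) evaluates to H = 1/(kr0) + i, so that H tends to i for large kr0 and (5.20) tends to (5.23). The default values R1 = 0 and R2 = 1/3 and the function names are illustrative assumptions.

```python
import cmath

def h0(x):
    """Spherical Hankel function of the first kind, order 0."""
    return -1j * cmath.exp(1j * x) / x

def h1(x):
    """Spherical Hankel function of the first kind, order 1."""
    return -cmath.exp(1j * x) * (x + 1j) / x ** 2

def H(kr0):
    """(5.19): complex conjugate of h1/h0 evaluated at k*r0."""
    return (h1(kr0) / h0(kr0)).conjugate()

def gamma2_point(drr, kr0, R1=0j, R2=1 / 3):
    """(5.20): coherence for a point source direct path."""
    Hv = H(kr0)
    return abs(-drr * 1j * Hv + R1) ** 2 / ((drr + 1) * (drr * abs(Hv) ** 2 + R2))

def gamma2_plane(drr, R1=0j, R2=1 / 3):
    """(5.23): coherence for a plane wave direct path."""
    return abs(drr + R1) ** 2 / ((drr + 1) * (drr + R2))

# Compare the two models at a linear DRR of 1 (0 dB) for several values of k*r0.
for kr0 in (0.5, 2.0, 10.0):
    print(kr0, gamma2_point(1.0, kr0), gamma2_plane(1.0))
```

For a fixed DRR the two coherence values differ most at small kr0, which is where the plane wave approximation is expected to bias the estimate.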
In many previous works, the sound field due to reverberation is often modelled
as diffused field [12, 98], although the exact definition of diffused field may vary.
In [12], the diffuse field is defined as an infinite number of uncorrelated plane waves
impinging uniformly from the sphere. Under this assumption, it is shown that
$E\{\alpha_{00}\alpha_{10}^{*}\} = 0$, and $E\{|\alpha_{lm}|^2\} = E\{|\alpha_{l'm'}|^2\}$ for all values of l and m [12]. In
this case, R1 = 0, R2 = |Y00|²/|Y10|² = 1/3, and (5.24) becomes equivalent to the
magnitude-squared version of Eq. (18) in [12].²
In the case of Kuster’s work [98], the reverberant field is assumed to be plane
waves whose impinging directions distribute uniformly over θin ∈ [0, 2π), where θin
is the angle between direct path and the plane wave impinging direction. This
assumption differs from the reverberant field model used in [12], where plane waves
are distributed uniformly over the sphere; this assumption can be fulfilled if the
plane waves impinge uniformly over a circle. Under this assumption, Kuster has
derived an expression for γ 2 which takes the same form as (5.24), but with R1 = 0,
and R2 = 0.5 [98].
In many real acoustic environments, the diffused field assumptions for reverberant
field made in [12] and [98] often cannot be met, which may lead to inaccuracies in
the DRR estimation result. In this work, in order to improve the accuracy of DRR
² For c00 and c10, with Ωdir = (0, 0).
1: The average sound intensity (product of sound pressure and particle velocity)
[102] of the reverberant field has the same magnitude in x, y and z directions.
where $P^r$ and $V^r$ denote sound pressure and particle velocity due to
reverberation, respectively.
3: The reverberant field sound intensity is zero mean when averaged over a frequency
band.
\[ \int_{k_1}^{k_2} E\{P^r(k)\, V_z^r(k)^{*}\}\, dk = 0. \tag{5.27} \]
In (5.27), the real part of $P^r V_z^{r*}$ is often referred to as the active sound intensity,
which represents the coherent flow of sound energy in the z direction [102]. The
imaginary part of sound intensity, on the other hand, is referred to as the reactive
sound intensity, which represents the coherent, but non-propagating, “standing
wave” sound energy. A detailed justification of the assumption (5.27) is given in
Section 5.3.1.
In a diffuse sound field, both active and reactive components of the sound
intensity are equal to zero since the phase of particle velocity varies randomly. The
energy of particle velocity can be analytically computed [12, 98]. Applying these
results to (5.23) leads to simplified expressions of γ² as shown in [12] and [98].
However, this work does not assume a diffuse field. Hence the expected energy of
particle velocity and the sound intensity cannot be directly computed without
knowledge of the reverberant field. Therefore, a method to estimate these
characteristics is needed to compute the DRR. The following subsection describes one such
method, using measurements from a first (or higher) order microphone system.
From (5.7) and (5.8), it can be observed that the spherical harmonic coefficients C11
and C1,−1 do not contain the direct path signal. In fact, C11 and C1,−1 collectively
represent the particle velocity of the reverberations in the directions orthogonal to
the direct path. The assumptions on the reverberation (5.25), (5.26) and (5.27) can
be expressed using spherical harmonic coefficients as
and
\[ \int_{k_1}^{k_2} E\{\alpha_{00}(k)\, (\alpha_{10}(k) \cdot i)^{*}\}\, dk = 0. \tag{5.30} \]
Since it is assumed that the direct path signal is uncorrelated with the reverber-
ation signal, substituting (5.3), (5.5), (5.7) and (5.8) into (5.28), we can write
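\[ \begin{aligned} \sqrt{2}\, |E\{\alpha_{00} (\alpha_{10}\, i)^{*}\}| &= |E\{C_{00} (C_{11}\, i + C_{1,-1}\, i)^{*}\}| \\ &= |E\{C_{00} (C_{11}\, i - C_{1,-1}\, i)^{*}\}|, \end{aligned} \tag{5.31} \]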
which illustrates a way to indirectly estimate the value of |R1| in (5.24). Using (5.5)
and (5.6), the energy of the reverberation can be approximated by
\[ E\{|\alpha_{00}|^2\} = E\{|C_{00}|^2\} - \frac{Y_{00}^2}{Y_{10}^2 |H|^2} \left( E\{|C_{10}|^2\} - E\{|\alpha_{10}|^2\} \right). \tag{5.32} \]
If the plane wave model is used for the direct path, (5.32) can be simplified using
(5.22), as
\[ E\{|\alpha_{00}|^2\} \approx \left( \frac{E\{|C_{00}|^2\}}{Y_{00}^2} - \frac{E\{|C_{10}|^2\}}{Y_{10}^2} + \frac{E\{|\alpha_{10}|^2\}}{Y_{10}^2} \right) Y_{00}^2, \tag{5.33} \]
where E{|α10|²} can be estimated using (5.29). Substituting (5.29), (5.31) and
(5.32) into (5.17), the estimation expression for |R1| can be written as
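where we define
\[ M_{pwr} \triangleq \frac{1}{2} \left( E\{|C_{11} + C_{1,-1}|^2\} + E\{|C_{11} - C_{1,-1}|^2\} \right). \tag{5.35} \]
Similarly, by substituting (5.7), (5.8) and (5.32) into (5.18), R2 can be written as
\[ R_2 \approx \frac{M_{pwr}}{E\{|C_{00}|^2\} - E\{|C_{10}|^2\} \dfrac{Y_{00}^2}{Y_{10}^2 |H|^2} + M_{pwr} \dfrac{Y_{00}^2}{Y_{10}^2 |H|^2}}. \tag{5.36} \]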
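The estimate of R2 in (5.36) uses only quantities available from a first order recording. The Python sketch below illustrates the computation, assuming that each expectation E{·} is approximated by an average over time frames of one frequency bin, and that the spherical harmonics follow the common normalization with |Y00|² = 1/(4π) and |Y10(0, 0)|² = 3/(4π), which agrees with the ratio of 1/3 quoted earlier in this section. The function names and the sample values are hypothetical.

```python
import math

Y00_SQ = 1 / (4 * math.pi)   # |Y_00|^2 under the assumed normalization
Y10_SQ = 3 / (4 * math.pi)   # |Y_10(0, 0)|^2 under the assumed normalization

def mean_sq(values):
    """Approximate E{|x|^2} by averaging over frames."""
    return sum(abs(v) ** 2 for v in values) / len(values)

def m_pwr(C11, C1m1):
    """(5.35): half the sum of the powers of C11 + C1,-1 and C11 - C1,-1."""
    plus = mean_sq([a + b for a, b in zip(C11, C1m1)])
    minus = mean_sq([a - b for a, b in zip(C11, C1m1)])
    return 0.5 * (plus + minus)

def estimate_R2(C00, C10, C11, C1m1, H_abs_sq=1.0):
    """(5.36); H_abs_sq = |H|^2, equal to 1 under the plane wave model."""
    scale = Y00_SQ / (Y10_SQ * H_abs_sq)
    M = m_pwr(C11, C1m1)
    return M / (mean_sq(C00) - mean_sq(C10) * scale + M * scale)

# Single made-up frame, chosen so the arithmetic is easy to follow by hand.
R2 = estimate_R2(C00=[2 + 0j], C10=[1 + 0j], C11=[1 + 0j], C1m1=[0j])
```

For the point source model, `H_abs_sq` would be set to |H|² from (5.19) at the frequency and source distance of interest.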
It can be seen that all the coefficients required for the calculation can be acquired
by a first order microphone array directly. The estimated values of |R1 | and R2 can
be directly substituted into (5.21) or (5.24) for estimation of DRR using γ 2 .
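As an illustration of this substitution, (5.24) with Re{R1} = 0 can be rearranged into a quadratic in DRR, namely (γ² − 1)DRR² + γ²(1 + R2)DRR + γ²R2 − |R1|² = 0, and solved for its positive root. The Python sketch below follows that rearrangement; it is derived here from (5.24) rather than copied from a closed-form expression in the thesis. Choosing the larger root when two positive roots exist, and the 10·log10 conversion to decibels, are assumptions.

```python
import math

def drr_from_gamma2(gamma2, r1_abs, r2):
    """Solve (5.24) with Re{R1} = 0 for the linear-scale DRR."""
    # gamma2*(DRR + 1)*(DRR + R2) = DRR^2 + |R1|^2, written as a*DRR^2 + b*DRR + c = 0
    a = gamma2 - 1.0
    b = gamma2 * (1.0 + r2)
    c = gamma2 * r2 - r1_abs ** 2
    disc = b * b - 4.0 * a * c
    if a >= 0.0 or disc < 0.0:
        raise ValueError("no real DRR solution for these inputs")
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    positive = [r for r in roots if r > 0.0]
    if not positive:
        raise ValueError("no positive DRR solution for these inputs")
    return max(positive)  # assumption: take the larger root if two are positive

def to_db(drr_linear):
    """Convert a linear energy ratio to decibels (assumed 10*log10 convention)."""
    return 10.0 * math.log10(drr_linear)

# Round trip: DRR = 2 with R1 = 0 and R2 = 1/3 gives gamma^2 = 4/7 in (5.24).
drr = drr_from_gamma2(4.0 / 7.0, 0.0, 1.0 / 3.0)
drr_db = to_db(drr)
```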
where the assumption (5.27) is used, which leads to Re{R1 } = 0. The calculated
DRR is in linear scale, and the more commonly used log-scale DRRlog is defined as
From our experience in testing the algorithm using the ACE Challenge
Development Dataset [100], the estimation of |R1| and R2 at a single frequency is often
unstable. However, for typical room environments, one can assume that the
characteristics of reverberation do not vary rapidly over frequencies since sound waves of
similar wavelength are likely to have similar propagation modes. Therefore |R1| and
R2 can be treated as constant if the frequency band of interest is sufficiently narrow,
and one can use the average values of |R1| and R2 over a particular frequency band
for the calculation of DRR for this frequency band.
For subband and full band DRR estimation, the results are obtained by taking
the average of the single frequency DRR estimations within the band, then the
values are converted to log scale for convenience.
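The order of operations described here, averaging in the linear domain first and converting to log scale afterwards, can be written as the small Python sketch below. The 10·log10 conversion and the function name are assumptions, and the per-bin values are made up.

```python
import math

def band_drr_db(drr_linear_bins):
    """Average the single frequency linear-scale DRR estimates, then convert to dB."""
    mean_linear = sum(drr_linear_bins) / len(drr_linear_bins)
    return 10.0 * math.log10(mean_linear)

# Made-up linear-scale DRR estimates for the bins of one subband.
band_db = band_drr_db([5.0, 15.0])
```

Averaging the decibel values instead would generally give a different, lower result, since the mean of logarithms does not exceed the logarithm of the mean.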
We recommend the following procedures to estimate the DRR of a particular
frequency band from a recording:
A disadvantage of the original coherence method for DRR estimation is that the
angle between the direct path and the particle velocity measurement direction is
generally unknown, and in a real measurement, the microphone has to be pointed
towards the direct path [98]. In our improved method, since we use a first order
microphone for measurement, which records the complete sound field, it is possible
to derive the velocity measurement in any direction, through rotation of the spherical
harmonic coefficients. In addition, the data acquired by the microphone can be used
to perform Direction-of-Arrival (DOA) estimation for the direct path; therefore there
is no special requirement for positioning the microphone during measurements.
[Figure 5.1 plot: γ² versus direct-to-reverberant ratio (dB), −10 to 20 dB, with curves for (R1, R2) = (0, 0.5), (0, 0.33), (0, 0.28), (−0.15, 0.28), (0.15, 0.28) and (0.25i, 0.28).]
proposed by Kuster [98] (R1, R2 = 0, 0.5) and Jarrett [12] (R1, R2 = 0, 1/3) as well
as a number of other values that were commonly found in our experiment (R1, R2 =
0, 0.28; 0.15, 0.28; −0.15, 0.28; 0.25i, 0.28, respectively), as shown in Fig. 5.1. We
note that the assumption (5.27) is not applied here, in order to illustrate the impact
of R1 on the DRR estimation. It can be seen from Fig. 5.1 that depending on the
values of R1 and R2, a deviation of ±3 dB in estimated DRR can be observed for
low values of γ².
From (5.17), it can be seen that R1 is equivalent to the sound intensity in the
z direction with certain normalization. Since all normalization factors are real,
the real and imaginary parts of R1 correspond to the active and reactive sound
intensity, respectively. When Re{R1} > 0, the net energy flow of
reverberation coincides with the direct path signal; as a result, the reverberation
is “added” to the direct path and contributes to the coherence function
γ² positively. On the other hand, if Re{R1} < 0, the net reverberation energy flow
in the z direction opposes the direct path, essentially cancelling part of the direct
path sound intensity, and therefore contributes to γ² negatively. As a result,
as can be seen in Fig. 5.1, for the same value of γ², a positive Re{R1} corresponds
to a lower value of DRR, and vice versa.
The absolute value of R1 represents the overall coherence of the reverberant
field in the z direction. This includes the reactive part of R1, which corresponds
to the resonating reverberation energy. It can be seen from (5.24) that |R1| always
contributes to γ² positively. Therefore, as seen in Fig. 5.1, a non-zero value of |R1|
results in a lower value of DRR for the same γ²; this is especially significant at lower
values of γ².
Using a first order microphone, it is possible to estimate |R1| for each frequency
bin, if it is assumed that the reverberant sound intensity is uniform in each
direction. Unfortunately, the sign of Re{R1}, which indicates the direction of energy
flow, cannot be determined through observation of the sound field in its orthogonal
directions. However, by observing the reverberation sound field from the ACE
Challenge Development Dataset [100], it was found that both active and reactive sound
intensity of the reverberation in the x and y directions have zero mean when
averaged over each 1/3 octave subband, indicating that the energy flow of reverberation
changes randomly and rapidly with frequency. Therefore it is reasonable to assume
that $P V_z^{*}$ is also zero mean when observed at multiple frequencies. As a result,
when averaging the estimated DRR over each subband, the impact of Re{R1} (and
Im{HR1∗} in (5.21)) on each frequency bin will be cancelled out, and the term can be
removed in the derivation of (5.37), provided that appropriate frequency averaging
is performed after calculating DRR for each frequency bin.
As can be seen from Fig. 5.1, R2 does not affect the estimated DRR as strongly
as R1, and a lower value of R2 results in a slightly lower estimation of DRR. From
(5.18) it can be seen that R2 reflects the expected energy ratio between sound
pressure and particle velocity. In Jarrett’s diffuse field model [12], the value of R2
is lower (R2 = 1/3); therefore, we expect Jarrett’s method to yield a slightly lower
estimation of DRR compared to Kuster’s. From our analysis of the ACE Challenge
Development Dataset, the value of R2 typically varies between 0.25 and 0.33, which is
close to Jarrett’s model (see Table 5.2).
[Figure 5.2 plot: ∆DRR (dB) versus k·r0 (0.5 to 3); one curve each for γ² = 0.86, 0.65, 0.33 and 0.19]
Figure 5.2: Plot of theoretical DRR versus kr0 using plane wave model (5.24) and
point source model (5.20) with γ 2 = 0.86, 0.65, 0.33 and 0.19.
estimations than that of the point source model for smaller values of kr0, where
∆DRR ≈ 2 to 4 dB for kr0 = 0, depending on the value of γ². At higher frequencies
and larger source-microphone distances (higher kr0), the difference between the two
models reduces rapidly; for kr0 > 3, the difference in the calculated DRR using the
two models becomes negligible.
Comparing the curves corresponding to each value of γ 2 , it can be seen that the
estimation error of the plane wave model is smaller when γ 2 is larger, corresponding
to higher values of DRR. The user may select the appropriate model for their applications,
based on the target frequency band and expected source distance. Here,
we propose a rule of thumb for determining whether to use the point source model
or the plane wave model. When kr0 > 2, the error caused by the plane wave model is
less than 0.5 dB for all values of γ², as can be observed in Fig. 5.2. For kr0 < 2, the
use of the point source model is recommended for improving DRR estimation accuracy.
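The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name, the speed of sound of 343 m/s and the handling of the boundary case kr0 = 2 (treated here as point source) are assumptions, not part of the original work.

```python
import math

def recommended_direct_path_model(freq_hz, r0_m, c=343.0, threshold=2.0):
    """Return (k*r0, model name) following the kr0 > 2 rule of thumb.

    c (speed of sound, m/s) and the helper itself are illustrative assumptions.
    """
    k = 2.0 * math.pi * freq_hz / c      # wavenumber in rad/m
    kr0 = k * r0_m                       # dimensionless source distance
    model = "plane wave" if kr0 > threshold else "point source"
    return kr0, model

# Example: a source 0.5 m from the microphone at three frequencies.
for f in (200.0, 500.0, 2000.0):
    kr0, model = recommended_direct_path_model(f, r0_m=0.5)
    print(f"{f:6.0f} Hz, r0 = 0.5 m: kr0 = {kr0:.2f} -> {model} model")
```

For a fixed source distance the point source model is therefore only needed in the lowest subbands, which matches the observation that the two models converge as kr0 grows.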
Table 5.1: Room dimensions (approx.) and minimum/maximum DRR for each room
recording configuration
Room Name              Lecture Room 1  Lecture Room 2  Meeting Room 1  Meeting Room 2  Office 2
Length (m)             6.9             13.4            6.6             10.3            5.1
Width (m)              9.7             9.2             4.7             9.2             3.2
Height (m)             3.0             2.9             3.0             2.6             2.9
Volume (m³)            200             360             92              250             48
Setup A min DRR (dB)   -0.82           -0.37           -2.0            -2.6            -0.44
Setup A max DRR (dB)   15              13              11              11              13
Setup B min DRR (dB)   0.87            -3.7            -3.1            1.1             -2.3
Setup B max DRR (dB)   7.9             6.4             7.6             12              9.5
database, using which the participants can train and fine tune their algorithms. The
Evaluation dataset is used to evaluate the performance of fine-tuned algorithms.
The Evaluation dataset consists of 4500 synthesized recordings of various con-
figurations. A total of 5 rooms are used to record the room impulse responses, with
two recording setups (positions) for each room. The room details are summarized in
Table 5.1. We note that although the impulse responses of 7 rooms were recorded ac-
cording to [100], only 5 of them are used to create the Evaluation dataset; the other
two rooms were used to create the Development dataset. The speech and noise setup
for the Development dataset differs from that of the Evaluation dataset; therefore,
in this work, the Development dataset is only used for developing the DRR algorithm,
and the results presented in this section are all generated using the Evaluation dataset.
The impulse responses are recorded using an Eigenmike, and the reverberant
speech recordings are synthesized by convolving the impulse responses with anechoic
speech recordings [100]. The speech recordings consist of voices of 10 talkers, 5
female and 5 male, with 5 separate utterance recordings for each talker. Three
different types of noise (“Ambient”, “Fan” and “Babble”) are recorded separately
under the same room setup and mixed into the reverberant speech recordings, each
with three SNR settings (−1 dB, 12 dB and 18 dB).
The ground truths for both full band and subband DRR have been provided.
For subband DRR, the central frequencies for all bands have been chosen according
to the ISO standard [100].
the direct path). To find the frames containing speech, a simple speech detection
algorithm calculates the average signal energy of each frame and selects the frames
with higher energy, which are considered to contain the speech signal. If the energy of
a frame is significantly higher than that of the previous one, then this frame is considered
to contain the beginning of an utterance. We then calculate the spherical harmonic
coefficients for each selected frame and for frequencies between 200-2000 Hz, and
perform a frequency averaged MUSIC DOA estimation in the spherical harmonic
domain [11, 103]. The estimated DOA is used for further calculations.
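The energy-based frame selection described above can be sketched as follows. The frame length, the two threshold ratios and the use of the median energy as a reference level are hypothetical choices made for illustration; the text does not specify the values actually used.

```python
import numpy as np

def select_speech_frames(x, fs, frame_ms=10.0, energy_ratio=2.0, onset_ratio=4.0):
    """Flag frames that likely contain speech and frames that start an utterance.

    All thresholds are illustrative assumptions, not values from the thesis.
    """
    n = int(round(fs * frame_ms / 1000.0))           # samples per frame
    n_frames = len(x) // n
    frames = np.asarray(x[: n_frames * n], dtype=float).reshape(n_frames, n)
    energy = np.mean(frames ** 2, axis=1)            # average energy per frame
    # "Higher energy" frames: above a multiple of the median frame energy.
    is_speech = energy > energy_ratio * np.median(energy)
    # Onset: energy significantly higher than that of the previous frame.
    prev = np.concatenate(([energy[0]], energy[:-1]))
    is_onset = is_speech & (energy > onset_ratio * prev)
    return is_speech, is_onset

# Synthetic check: a quiet 1 kHz tone with a loud burst in frames 12 to 15.
fs = 16000
t = np.arange(20 * 160) / fs
x = 0.01 * np.sin(2 * np.pi * 1000 * t)
x[12 * 160 : 16 * 160] *= 100.0
is_speech, is_onset = select_speech_frames(x, fs)
print(np.flatnonzero(is_speech), np.flatnonzero(is_onset))
```

Only the flagged frames would then be passed on to the spherical harmonic decomposition and the DOA estimation.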
In order to maintain the highest possible frequency resolution while still
satisfying the assumption that the direct path signal and reverberations
are uncorrelated, we choose the analysis window length to be 10 ms. When fine-tuning
our algorithm using the ACE Development Dataset, it was found that a
window length shorter than 10 ms does not reduce the average value of γ 2 , therefore
we assume that the chosen window length is appropriate.
For each speech recording, only the windows that contain the speech signal are
used for analysis. For each frequency subband, we calculate the 0th and 1st order
spherical harmonic coefficients for each selected window and for all the frequency
bins within each subband. We then follow steps 3 through 6 in Section 5.2.5 to
estimate DRR for each subband.
Although the ground truth for subband DRR is given for all frequency bands
between 20 Hz and 20 kHz [100], the recorded speech signal does not cover the
complete spectrum. Therefore, we focus on the subbands with central frequency
between 199.52 Hz and 2511.89 Hz, where there is sufficient energy in the speech
recordings for DRR estimation. For this reason, we cannot estimate the full band
DRR in the complete sense; instead, we calculate the average DRR over the selected
subbands, which is used as the full band estimation. The full band ground truth
DRR used for comparison is also calculated by averaging the corresponding subband
ground truths, instead of using the full band DRR provided by the database.
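As a small illustration of the subband selection, the sketch below generates one-third-octave centre frequencies from the base-ten series 10^(n/10), which reproduces the two limits quoted above (199.52 Hz and 2511.89 Hz) for n = 23 to 34. That this is the exact series used by the dataset is an assumption, as is the averaging helper: the text does not state whether the subband DRR values are averaged in dB or in the linear domain, so both options are shown.

```python
import numpy as np

# Assumed base-ten one-third-octave series: f_c = 10 ** (n / 10).
band_index = np.arange(23, 35)                 # n = 23 ... 34, 12 subbands
centre_hz = 10.0 ** (band_index / 10.0)
print(np.round(centre_hz, 2))

def average_drr_db(subband_drr_db, in_db=True):
    """Average subband DRR values (given in dB) into one 'full band' figure.

    in_db=True averages the dB values directly; in_db=False averages the
    linear energy ratios. Which one applies here is an assumption.
    """
    d = np.asarray(subband_drr_db, dtype=float)
    if in_db:
        return float(np.mean(d))
    return float(10.0 * np.log10(np.mean(10.0 ** (d / 10.0))))
```

The same averaging would be applied to the subband ground truths so that the estimate and the reference cover the same set of bands.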
$$\mathrm{DRR}_{\mathrm{err}} = 10 \log_{10} \frac{\mathrm{DRR}_{\mathrm{est}}}{\mathrm{DRR}_{\mathrm{truth}}}. \tag{5.39}$$
The mean and standard deviation of the DRR estimation error are then calculated using
DRRerr from each recording.
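A minimal sketch of the error statistic in (5.39) is given below. The input values are made up purely for illustration, and the function name is hypothetical; the inputs are linear energy ratios, so for DRR values already expressed in dB the error reduces to a simple difference.

```python
import numpy as np

def drr_error_db(drr_est, drr_truth):
    """Eq. (5.39): 10 log10(DRR_est / DRR_truth), inputs as linear energy ratios."""
    est = np.asarray(drr_est, dtype=float)
    truth = np.asarray(drr_truth, dtype=float)
    return 10.0 * np.log10(est / truth)

# Made-up values for three recordings (not taken from the dataset).
est = np.array([2.0, 4.0, 1.0])
truth = np.array([2.0, 2.0, 2.0])
err = drr_error_db(est, truth)
print("mean error (dB):", err.mean())
print("standard deviation (dB):", err.std())
```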
[Figure 5.3 plot: estimated DRR for setups A and B in Lecture Rm 1, Lecture Rm 2, Meeting Rm 1, Meeting Rm 2 and Office 2; legend: Proposed, Kuster, Jarrett et al.]
Figure 5.3: Mean and standard deviation of estimated DRR using the proposed
method (blue), Kuster’s method (red) and Jarrett’s method (pink) for all 5 rooms
and 2 locations (A and B) in each room, with 18 dB SNR, averaged over 3 noise
types. Dashed lines indicate ground truth DRR.
band, which would add uncertainty to the distribution of estimated DRR. However,
from Fig. 5.3 it can be seen that the proposed algorithm yields almost the same
standard deviation as Kuster’s method, which indicates that the primary contributor
to the standard deviation is the coherence function γ², which is common to both the
proposed method and Kuster’s method.
On the other hand, Jarrett’s method results in the lowest error standard devia-
tion for all scenarios. The reason for this is that in the other two methods, only the
first order spherical harmonics are used to calculate the coherence γ 2 , while Jarrett’s
method utilizes all of the available spherical harmonic coefficients to reach a more
consistent estimation of γ 2 , which reduces its deviation due to random interference
and other sources of error.
[Figure 5.4 plot: DRR Estimation Error (dB) versus Central Frequency (Hz) for the subbands centred at 199, 251, 316, 398, 501, 631, 794, 1000, 1258, 1584, 1995 and 2511 Hz; legend: Proposed, Kuster]
Figure 5.4: Mean and standard deviation of subband DRR estimation error for
all rooms and configurations with 18 dB SNR, using the proposed method (blue),
Kuster’s method (red) and Jarrett’s method (pink).
From Fig. 5.4 it can be seen that in general, the mean error of the proposed
method falls within 1 dB of the ground truth for all frequency bands. Furthermore,
the subband results below 1000 Hz show a different pattern from the subbands
above 1000 Hz. Below 1000 Hz, the mean errors are all positive, indicating a slight
overestimation of DRR; the error standard deviation is approximately 3 dB for
these subbands. For frequency bands above 1000 Hz, the mean
error becomes negative; the standard deviation of the estimation error reduces to 2
dB at 1000 Hz, and decreases further at higher frequencies. In contrast,
both Kuster’s and Jarrett’s methods show a clear trend of overestimation; this is
especially significant for Kuster’s method at lower frequencies. Jarrett’s method
yields lower DRR estimations compared to Kuster’s and, in most frequency bands,
has the lowest standard deviation.
Due to the geometry of the Eigenmike, only the 1st order spherical harmonics
can be reliably captured for frequencies below 1000 Hz [53]. Below 1000 Hz, the
2nd order spherical harmonics are aliased onto the 1st order coefficients, and the
aliasing error increases with frequency; at 1000 Hz and above, our algorithm begins
to calculate the 2nd order coefficients, which removes the aliasing and improves
the accuracy of the 1st order coefficients. Furthermore, at higher frequencies, the
wavelength of the sound becomes closer to the dimension of the Eigenmike (8.4
cm diameter), which further increases the accuracy of 1st order spherical harmonic
acquisition. This explains why the error standard deviation decreases gradually at
5.4 Validation using ACE Challenge Database 83
[Figure 5.5 plot: DRR estimation error for the frequency ranges 199 - 398 Hz, 501 - 1000 Hz and 1258 - 2511 Hz]
Figure 5.5: Mean and standard deviation of DRR estimation error with 18dB, 12dB
and −1dB SNR.
higher frequencies.
Overall it can be seen that compared to the two baseline algorithms, the proposed
method produces an unbiased DRR estimation. The standard deviation of the
proposed algorithm is on par with Kuster’s method, but slightly higher than Jarrett’s
method.
[Figure 5.6 plot: DRR Estimation Error (dB) for the frequency ranges 199 - 398 Hz, 501 - 1000 Hz and 1258 - 2511 Hz; legend: Ambient, Babble, Fan]
Figure 5.6: Mean and standard deviation of DRR estimation error in multiple noisy
environments with −1dB SNR.
viation. When developing and testing our algorithm using the ACE Development
Dataset, we noticed that our frequency averaged MUSIC DOA algorithm became
much less reliable at −1 dB SNR, compared to 18 dB and 12 dB SNR. A direct re-
sult of inaccurate DOA estimation is the decreased consistency of DRR estimations
at different utterance/interference configurations in the same room setup, which
is reflected by a higher error standard deviation. It is expected that if a more
interference-robust DOA algorithm is applied, or if the DOA information can be
measured directly, the proposed algorithm would produce more consistent estima-
tions at low SNR.
How different types of interference affect the performance of the DRR estimation
is also investigated. The three noise types mixed into the recordings each have
different spectral characteristics, and therefore their effects on the subband DRR
estimation vary. This is illustrated in Fig. 5.6, which plots the estimation results for
the low, medium and high frequency ranges and for each of the three noise types.
The SNR of all recordings used in this analysis is −1 dB.
From Fig. 5.6 it can be seen that the “Ambient” noise type has the least effect on
DRR estimation accuracy, causing only a small bias towards underestimation, while
the “Babble” noise results in more than 3 dB of underestimation for all frequency
ranges. The “Fan” noise has slightly more impact than the “Ambient” noise type,
but less than that of the “Babble” noise. This result is due to both the
spectral and spatial characteristics of the different noise types.
Fig. 5.7 plots the normalized power spectrum of the three noise types; the spectra
are acquired by manually selecting the sections of the recordings that contain only
noise. It can be seen that the “Ambient” noise consists of primarily low
[Figure 5.7 plot: normalized power spectrum versus Frequency (Hz) on a logarithmic axis from 10^1 to 10^4; legend: Ambient, Babble, Fan]
Figure 5.7: Normalized power spectrum of the “Ambient”, “Babble” and “Fan”
noises in the ACE Evaluation Dataset.
frequency signals that do not overlap with the speech signal spectrum. Therefore,
the subbands of interest are most likely to have higher SNR than the full band SNR
of −1 dB. As a result, the ambient noise has the least effect on the accuracy of DRR
estimation. On the other hand, the “Babble” noise is essentially a speech recording
by itself, therefore it almost completely overlaps with the spectrum of the speech
of the talker, resulting in the lowest SNR in the speech spectrum of the three noise
types. The “Fan” noise has spectral characteristics very similar to those of the “Ambient”
noise type, although its higher frequency components have more energy than those
of the “Ambient” noise, which leads to slightly more impact on DRR estimation.
According to the ACE Challenge description [100], the “Fan” noise is generated
using one or two fans inside the recording environment, while the “Babble” noise
records the voices of up to 7 people talking around the recording location. The
“Ambient” noise is a recording of the ambient noise within the room. Due to the
larger number of uncorrelated sources, each with a different DOA, the “Babble” noise
is likely to have a lower coherence level than that of the “Fan” noise. Therefore when
mixed into the speech recording, the “Babble” noise would lower γ 2 further than
the “Fan” noise. Although the nature of the “Ambient” noise is unclear, in typical
room environments its source is likely to be AC vents or windows, both of which
can be considered as localized sources, thus creating a more coherent sound field
than the “Babble” noise. In addition, due to its spectral characteristics, its impact
on DRR estimation is the smallest of all three noise types.
Table 5.2: Mean of estimated parameters in each room configuration and frequency
range
                        |R1|                     R2
Room             Setup  Low    Med    High       Low    Med    High
Lecture Room 1   A      0.280  0.194  0.219      0.288  0.251  0.265
Lecture Room 1   B      0.277  0.293  0.331      0.290  0.293  0.332
Lecture Room 2   A      0.232  0.201  0.146      0.316  0.268  0.290
Lecture Room 2   B      0.239  0.189  0.232      0.337  0.277  0.314
Meeting Room 1   A      0.191  0.120  0.157      0.294  0.248  0.253
Meeting Room 1   B      0.239  0.279  0.118      0.297  0.273  0.291
Meeting Room 2   A      0.226  0.265  0.278      0.201  0.329  0.321
Meeting Room 2   B      0.211  0.215  0.225      0.241  0.255  0.281
Office 2         A      0.268  0.167  0.174      0.252  0.269  0.286
Office 2         B      0.199  0.213  0.193      0.263  0.282  0.278
The parameters |R1| and R2 estimated for each subband of every speech recording
in the ACE Evaluation Dataset have been recorded and are presented in Table 5.2,
where we have taken the average values of |R1| and R2 for the low, medium and
high frequency ranges and for all the recordings from each room configuration. Only
the data from recordings with 18 dB SNR are used for this calculation.
As can be seen from Table 5.2, although the values of R1 and R2 vary for each
room configuration and frequency range, in general, |R1| falls within the range of
0.15-0.25, while R2 lies between 0.25 and 0.33 in the majority of cases. From Fig. 5.1,
it can be seen that the values of |R1 | and R2 shown in Table 5.2 would lead to our
proposed algorithm yielding lower DRR estimations than assuming R1 = 0, R2 =
0.5, which is indeed the case in our estimation results.
From the above results, we believe that setting |R1 | = 0.2 and R2 = 0.28 pro-
vides a more reasonable and accurate model for a general reverberant sound field
within room environments, compared to the diffuse model where it is assumed that
R1 = 0, and R2 = 1/2 or 1/3. It is sometimes easier to acquire or implement differ-
ential microphone pairs than complete first order microphone systems (such as the
Eigenmike), and when a differential array is to be used to estimate room DRR, we
suggest using (5.20) or (5.37) to calculate DRR, and assume |R1 | = 0.2, R2 = 0.28,
which is likely to yield more accurate estimation results.
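As a quick sanity check on the suggested constants, the sketch below takes the unweighted mean of all entries of Table 5.2. The text does not state how the values 0.2 and 0.28 were selected, so treating them as grand means over rooms, setups and frequency ranges is an assumption made only for this illustration.

```python
import numpy as np

# Rows follow Table 5.2 (setup A then B for each room); columns: Low, Med, High.
R1_ABS = np.array([
    [0.280, 0.194, 0.219], [0.277, 0.293, 0.331],   # Lecture Room 1
    [0.232, 0.201, 0.146], [0.239, 0.189, 0.232],   # Lecture Room 2
    [0.191, 0.120, 0.157], [0.239, 0.279, 0.118],   # Meeting Room 1
    [0.226, 0.265, 0.278], [0.211, 0.215, 0.225],   # Meeting Room 2
    [0.268, 0.167, 0.174], [0.199, 0.213, 0.193],   # Office 2
])
R2 = np.array([
    [0.288, 0.251, 0.265], [0.290, 0.293, 0.332],   # Lecture Room 1
    [0.316, 0.268, 0.290], [0.337, 0.277, 0.314],   # Lecture Room 2
    [0.294, 0.248, 0.253], [0.297, 0.273, 0.291],   # Meeting Room 1
    [0.201, 0.329, 0.321], [0.241, 0.255, 0.281],   # Meeting Room 2
    [0.252, 0.269, 0.286], [0.263, 0.282, 0.278],   # Office 2
])

mean_r1 = float(R1_ABS.mean())
mean_r2 = float(R2.mean())
print(f"mean |R1| = {mean_r1:.3f}, mean R2 = {mean_r2:.3f}")
```

The grand means come out at roughly 0.22 for |R1| and 0.28 for R2, which is consistent with the suggested values.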
5.5 Summary
In this work, we present a novel algorithm for estimating DRR using a first order
microphone system. We show that the proposed algorithm is a generalization of pre-
vious DRR estimation methods based on sound pressure-particle velocity coherence
function. Using the proposed algorithm, it is possible to estimate the characteristics
of a reverberant sound field which are relevant to DRR estimation, thereby improv-
ing the estimation accuracy of the method. We also show that at low frequency
and small source-to-microphone distance, using the plane wave model for the direct
path signal can result in a positive bias on the estimated DRR. Through validating
the proposed algorithm using the ACE Challenge Dataset, it was found that the
proposed algorithm provides ±2 dB mean estimation error for the frequency range
of human speech (200-2500 Hz), and shows no obvious bias.
$$h_l^{(1)}(z) = i^{-l-1} z^{-1} e^{iz} \sum_{k=0}^{l} \left(l + \tfrac{1}{2}, k\right) (-2iz)^{-k}. \tag{5.40}$$
The expression of $h_0^{(1)}(z)$ and $h_1^{(1)}(z)$ can then be written as
$$h_0^{(1)}(z) = -\frac{1}{z} i e^{iz} \tag{5.41}$$
$$h_1^{(1)}(z) = -e^{iz} \frac{z + i}{z^2}, \tag{5.42}$$
$$\lim_{r_0 \to \infty} \frac{h_1^{(1)}(kr_0)}{h_0^{(1)}(kr_0)} = \lim_{r_0 \to \infty} \frac{kr_0 + i}{ikr_0} = \lim_{r_0 \to \infty} \left(-i + \frac{1}{kr_0}\right) = -i, \tag{5.43}$$
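The closed forms (5.41) and (5.42) and the limit (5.43) can be checked numerically with the short sketch below. It assumes the standard definition of the symbol in (5.40), (l + 1/2, k) = (l + k)! / (k! (l − k)!), which is not restated in this section; the function names are illustrative.

```python
import cmath
from math import factorial

def hankel_series(l, z):
    """Finite series (5.40), assuming (l + 1/2, k) = (l + k)! / (k! (l - k)!)."""
    s = sum(
        factorial(l + k) / (factorial(k) * factorial(l - k)) * (-2j * z) ** (-k)
        for k in range(l + 1)
    )
    return (1j) ** (-l - 1) / z * cmath.exp(1j * z) * s

def h0(z):
    """Closed form (5.41)."""
    return -1j * cmath.exp(1j * z) / z

def h1(z):
    """Closed form (5.42)."""
    return -cmath.exp(1j * z) * (z + 1j) / z ** 2

# The ratio h1/h0 equals -i + 1/(k r0); it approaches -i as k r0 grows.
for kr0 in (0.5, 2.0, 3.0, 100.0):
    ratio = h1(kr0) / h0(kr0)
    print(kr0, ratio, abs(ratio + 1j))
```

The distance of the ratio from −i is exactly 1/(kr0), which gives a feel for how quickly the plane wave approximation of the direct path becomes adequate.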
Overview: The use of spherical harmonic expansion to model noise fields enables
in-depth analysis and manipulation of the sound field. In this chapter, we introduce a
number of techniques for improving the spatial ANC performance. In Section 6.2, we
propose an improved sound field synthesis method based on spherical harmonic mode
matching. Through the use of spherical harmonic addition theorem, this method
allows the user to define a number of high priority regions within the quiet zone,
where greater noise attenuation can be achieved compared to the rest of the quiet
zone. In Section 6.3, we propose a new metric for measuring noise energy over a
spherical region, and use the new metric to evaluate the ANC performance of an
experimental ANC system. Finally, in Section 6.4, we use this metric to develop
a method for estimating optimum noise cancellation performance for a given noise
environment, and use this method to estimate the ANC performance in a passenger
car.
6.1 Introduction
The goal of spatial ANC is to minimize the noise level inside a certain quiet zone.
However, the exact “optimal” loudspeaker driving signals that would yield the best
noise reduction depends on the loudspeaker setup as well as the characteristics of
the quiet zone. For example, some regions within the quiet zone may be more
90 Methods for spatial ANC performance evaluation and optimization
important than the others, because the users are more likely to stay within these
regions. In such a case, it would be beneficial to focus the ANC resources toward
these “more important” regions, which would result in a better overall ANC quality
than attenuating the noise evenly within the quiet zone.
Furthermore, the definition of “optimal” noise attenuation often depends on
the method employed to measure the noise level. For the noise level at a single
point in the space, one microphone is enough to pick up the sound pressure of the
noise; however, for a spatial region, measurement of the average sound pressure level
becomes much more complicated, as sampling the noise field at a few points within
the quiet zone cannot accurately represent the overall noise level inside the whole
quiet zone.
In addition, the ability to estimate, or predict the potentially achievable “op-
timal” spatial noise attenuation would be greatly beneficial to the design process
of spatial ANC systems. The designer would be able to find out the amount of
hardware that is necessary to achieve the desired noise attenuation, or to deter-
mine whether the available loudspeaker setup is sufficient for the ANC task, before
physically implementing a complete ANC system.
In this chapter, we utilize the spherical harmonic analysis technique to develop
a set of algorithms and tools to address the above problems. We show that char-
acterization and control of the noise field can be conveniently done by appropriate
manipulation of the spherical harmonic coefficients of the noise field. This chapter
is organized as follows:
In Section 6.2, we introduce a spatial single zone sound field reproduction tech-
nique which allows for higher reproduction accuracy within certain sub-zones while
maintaining a reasonable reproduction accuracy in the global region. By applying
the spherical harmonics addition theorem, we connect the spherical harmonic coefficients
of the global region with those of the local sub-zones, and use a weighting
method to enhance the reproduction quality at the sub-zones. This technique is
particularly useful when the available loudspeakers cannot provide very good sound
reproduction for the whole region, but a high accuracy is desired for at least some
sub-zones, as in some spatial ANC scenarios.
In Section 6.3, we propose a new metric for the measurement of average noise
level over a region. It is formulated in terms of the spherical harmonic decomposi-
tion of sound fields. Through a series of experiments, we show that the proposed
metric provides superior characterization of the noise level within the control region,
compared to existing methods where a number of microphones are placed around
6.2 Enhanced sound field reproduction within prioritized control region 91
the control region to sample the noise level. This metric is particularly suitable for
environments with irregular geometry and a fixed control region with moderate size,
such as vehicle and aircraft cabins.
In Section 6.4, utilizing the noise level metric developed in Section 6.3, we
evaluate a passenger car’s integrated loudspeakers’ noise cancelling capabilities by
analyzing the in-car noise field and the loudspeaker responses. Our proposed analysis
method decomposes the noise field into a number of basis sound patterns,
evaluates the loudspeakers’ capability at reproducing these basis patterns, then calculates
the expected overall noise reduction based on these results. Our results show
that the noise field inside a vehicle cabin has a sparse nature, and that the car’s
loudspeakers are capable of cancelling the noise around the passengers’ head posi-
tions.
6.2.1 Background
In 3D sound field synthesis, a fundamental problem arises which makes implementation
very difficult: the synthesis quality is strongly related to the number and
position of the loudspeakers [105–107]. The ideal placement of the loudspeakers for
the mode-matching technique is to have the loudspeakers evenly distributed on a
sphere surrounding the region of interest [44]; such a structure is impractical in reality.
To solve this problem, an array configuration for 3D sound field synthesis using
multiple circular loudspeaker arrays was proposed by Zhang and Abhayapala [108];
this method uses a functional analysis based algorithm to derive the driving signals.
Still, the trade-off between the number of loudspeakers and the size and
frequency of the reproduction zone exists. The reproduction quality degrades rapidly
as the number of loudspeakers falls below the minimum required number.
In the case that the region of interest can be separated and reduced into a few
smaller regions, it is possible to control the sound field in these small regions through
spatial multizone reproduction techniques [109]. However, the calculation involves
matrix inversion, and if done without proper regularization, the results may be
highly unstable.
The goal of this section is to introduce a spatial single zone sound field repro-
duction technique which allows for higher reproduction accuracy within certain sub-
zones while maintaining a reasonable reproduction accuracy in the global region.
This can be achieved by balancing between the single zone reproduction and the
spatial multizone reproduction techniques. Through the use of spherical harmonic
translation, the mode-matching method can be simultaneously applied to both the
global zone of interest and certain sub-regions within it (referred to as high priority
regions), and by adjusting the weighting factors in the LMS solution, one can easily
control the reproduction quality of different regions. This technique is particularly
useful when the reproduction region is large but too few loudspeakers
are available, and/or in applications where a high reproduction accuracy
is required for certain sub-zones, such as active noise cancellation.
where $H_{lm}^{v}$ is the spherical harmonic coefficient of order l and mode m, due to the
vth loudspeaker playing a unit signal. The total number of coefficients is given by
$N = (L + 1)^2$. A desired sound field of the same order can be expressed as a column
vector of spherical harmonic coefficients
$$Q = \left[ Q_{00}, Q_{11}, Q_{10}, \ldots, Q_{LL} \right]^T. \tag{6.2}$$
The least mean square solution for the driving signals D can be written as
$$D = H^{-1} Q, \tag{6.3}$$
where $[\cdot]^{-1}$ denotes the pseudoinverse of the matrix. This minimizes the cost function
Our goal is to find a driving function, which minimizes a new cost function that
contains both L and Lq , and also has a weighting factor which could further enhance
the reproduction accuracy within the sub-region. This can be expressed as
Then, an LMS solution for synthesizing the desired sound field only within the sub-region
$O_q$ can be found by solving
$$\hat{S}_q Q = \hat{S}_q H D. \tag{6.8}$$
$$D = (\hat{S}_q H)^{-1} \hat{S}_q Q. \tag{6.9}$$
It should be noted that the solution provided by (6.9) normally requires regularization,
since the matrix $\hat{S}_q H$ may be ill conditioned, which may result in very large
driving signals for the loudspeakers. This is especially true when the sub-region $O_q$
is small.
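The effect of regularization on the size of the driving signals can be illustrated with Tikhonov (ridge) regularization. This is one common choice and is used here only as an example; the text does not state which regularization scheme was applied. The matrices below are random stand-ins for $\hat{S}_q H$ and $\hat{S}_q Q$, and the regularization weight is arbitrary.

```python
import numpy as np

def regularized_driving_signals(A, b, lam=1e-3):
    """Tikhonov solution of min ||A d - b||^2 + lam ||d||^2.

    A stands in for S_q H and b for S_q Q; lam is a hypothetical weight.
    """
    A = np.asarray(A, dtype=complex)
    b = np.asarray(b, dtype=complex)
    n_spk = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + lam * np.eye(n_spk), A.conj().T @ b)

# Ill-conditioned stand-in: two loudspeaker columns are almost identical.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
A[:, 3] = A[:, 2] + 1e-6 * rng.standard_normal(6)
b = rng.standard_normal(6) + 1j * rng.standard_normal(6)

d_plain = np.linalg.pinv(A) @ b              # unregularized pseudoinverse, as in (6.9)
d_reg = regularized_driving_signals(A, b)    # regularized alternative
print("condition number:", np.linalg.cond(A))
print("norm without regularization:", np.linalg.norm(d_plain))
print("norm with regularization:", np.linalg.norm(d_reg))
```

The regularized solution trades a small increase in reproduction error for driving signals of much more moderate amplitude.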
The local cost function $L_q$ can be expressed as
$$L_q = (\hat{S}_q Q - \hat{S}_q H D)^H (\hat{S}_q Q - \hat{S}_q H D), \tag{6.10}$$
which corresponds to the sum of the squared errors in all local spherical harmonic
coefficients in the subregion $O_q$.
The combined cost function (6.7) can then be written as
$$L_{all} = E_G^H E_G + \alpha E_Q^H E_Q, \tag{6.11}$$
The solution (6.13) considers not only the set of global coefficients, but also a linear
mapping of these coefficients which corresponds to the sound field in a sub-region Oq
within the zone of interest. By adding a weighting factor α, the priority of the sub-region
can be controlled. When α = 0, the sub-region is ignored and the solution
becomes identical to (6.3); if α = 10, the local reproduction accuracy becomes
10 times more significant than the global accuracy, and as a result the driving
signals D would construct a sound field where the reproduction error within Oq is
approximately 10 times smaller than the global average.
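Equation (6.13) can be extended to the multiple sub-region case, with each region
controlled by a separate weighting factor
$$D = \begin{bmatrix} \alpha_1 \hat{S}_{q_1} H \\ \alpha_2 \hat{S}_{q_2} H \\ \vdots \\ \alpha_n \hat{S}_{q_n} H \\ \beta H \end{bmatrix}^{-1} \begin{bmatrix} \alpha_1 \hat{S}_{q_1} Q \\ \alpha_2 \hat{S}_{q_2} Q \\ \vdots \\ \alpha_n \hat{S}_{q_n} Q \\ \beta Q \end{bmatrix}. \tag{6.14}$$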
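The stacked system in (6.14) can be sketched with NumPy as below. The matrices are random stand-ins: in particular `S_q` merely plays the role of the spherical harmonic translation matrix and is not computed from the addition theorem, and all dimensions are arbitrary. Note that weighting a block of rows by α corresponds to weighting its squared error by α² in the least squares cost.

```python
import numpy as np

def prioritized_driving_signals(H, Q, S_list, alphas, beta=1.0):
    """Stack weighted local and global mode-matching equations as in (6.14)
    and solve them with the pseudoinverse."""
    rows = [a * (S @ H) for S, a in zip(S_list, alphas)] + [beta * H]
    rhs = [a * (S @ Q) for S, a in zip(S_list, alphas)] + [beta * Q]
    return np.linalg.pinv(np.vstack(rows)) @ np.concatenate(rhs)

rng = np.random.default_rng(0)
N, V, Nq = 16, 6, 4   # global coefficients (L = 3), loudspeakers, local coefficients
H = rng.standard_normal((N, V)) + 1j * rng.standard_normal((N, V))
Q = rng.standard_normal(N) + 1j * rng.standard_normal(N)
# Hypothetical stand-in for the translation matrix of one sub-region.
S_q = rng.standard_normal((Nq, N)) + 1j * rng.standard_normal((Nq, N))

def errors(D):
    """Return (global squared error, local squared error) for driving signals D."""
    e = Q - H @ D
    return np.linalg.norm(e) ** 2, np.linalg.norm(S_q @ e) ** 2

D_global = prioritized_driving_signals(H, Q, [S_q], [0.0])    # sub-region ignored
D_prior = prioritized_driving_signals(H, Q, [S_q], [10.0])    # sub-region prioritized
glob0, loc0 = errors(D_global)
glob1, loc1 = errors(D_prior)
print("global error:", glob0, "->", glob1)
print("local error: ", loc0, "->", loc1)
```

With the weight set to zero the result coincides with the plain pseudoinverse solution (6.3); raising the weight lowers the local error at the cost of a higher global error, which is the trade-off discussed in this section.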
[Figure 6.1 panels (a) to (d): sound field plots, both axes in metres from −1 to 1]
Figure 6.1: Comparison of three methods for sound field reproduction on θ = π/2
plane. (a) plots the desired sound field; (b) plots the synthesized sound field using
LMS mode matching for global region; (c) plots the synthesized sound field using
proposed prioritized region LMS method, and (d) plots the synthesized sound field
using LMS mode matching for the high priority region only.
[Figure 6.2 panels (a) to (d): sound field plots, both axes in metres from −1 to 1]
Figure 6.2: Comparison of three methods for sound field reproduction with 2 high
priority zones. (a) plots the desired sound field; (b) plots the synthesized sound field
using LMS mode matching for global region; (c) plots the synthesized sound field
using proposed method with two high priority regions, and (d) plots the synthesized
sound field using LMS mode matching for the high priority regions only.
[Figure 6.3 panels (a) to (c): both axes in metres from −1 to 1; colour scale −0.1 to 0.1 for (a) and 0 to 5 × 10⁻³ for (b) and (c)]
Figure 6.3: Reproduction error plots for different high priority zone weight settings:
(a) desired sound field, (b) reproduction error with α1 = α2 = 10, and (c)
reproduction error with α1 = 30, α2 = 10.
The first row in Table 6.1 shows the performance of the normal LMS algorithm as
a reference. The mean square error of the LMS method without prioritized control
is used as a reference for comparison of mean square error of different setups. It
can be seen that an average error of 7.24% is observed from a total of 121 sound
field coefficients. The next four rows show the simulation results for 1 priority zone,
with its maximum order set to 3 and 4, respectively. In both situations, an increase
in the global mean square error is observed; the global error percentage also saw
a slight increase. Most importantly, the error percentages in the priority zones are
much smaller than the global error percentage (0.48% and 1.93%); the error in row
5, Table 6.1 is greater because of the larger size of the priority region, as well as the
lower weight applied to the priority zone.
The rest of Table 6.1 shows the simulation results for 2 and 3 high priority
regions; in both cases each sub-region is given a different weighting factor. The
effect of the weighting factors can be easily seen, as the sub-region with the largest
weighting always results in the lowest error percentage, while the regions
with low weights and large radii only see a slight improvement over the global zone.
Another observation is that the global synthesis precision degrades more
when a large weight is given to the high priority zone. Therefore, in practice, we
recommend choosing the weightings and the radii of the sub-regions according to
the needs of the application, rather than simply using overly large values.
In order to investigate the impact of the weighting factor on the prioritized
sub-region and the global sound field, a series of simulations are carried out. The
[Figure 6.4 plot: Error Percentage (0 to 0.12) versus High Priority Zone Weight (0 to 10); legend: Global Error, Local Error]
Figure 6.4: Effect of the weighting factor on the local and global error percentage
accuracy; if the local region is large, the accuracy gain may become smaller. In prac-
tice, Figure 6.4 can be used as a trade-off guidance when choosing the appropriate
weighting factors for each region.
6.3.1 Background
For the successful development of an ANC system, it is important to accurately
measure its noise reduction capability over space, especially at the design stage.
At present, the performance of ANC systems is analyzed in terms of (i) sound
pressure at the error microphones or (ii) recordings from secondary microphone(s)
in the cancellation region. The first approach is widely used in theory, where noise
reduction performance is characterized by the average noise reduction in decibels at
the error microphones [111]. The second approach is mostly used with human head
shaped mannequins with 2 microphones placed at the ear locations to interpret
the noise reduction levels experienced by humans [112, 113]. While both of the
above methods are adequate to obtain an acceptable measure of the ANC system
performance, their accuracy in terms of spatial coverage is limited due to the limited
number of measurement points and the sparse nature of the spatial sampling.
Here, we propose an improved metric to evaluate the noise reduction in spatial
regions. It is formulated in terms of the spherical harmonic decomposition of sound-
fields and requires the measurements from a secondary microphone array distributed
over a spherical surface, preferably enclosing the center of the region of interest. The
6.3 Evaluation of spatial active noise cancellation performance using acoustic
potential energy 103
spatial metric is defined as the acoustic potential energy inside a spherical region,
and we formulate it in terms of the aforementioned microphone array recordings.
A similar spatial metric was introduced in [114] for rectangular enclosures, where
the acoustic potential energy was described in terms of room modes. However, the
results were limited to simulations, and the extraction of room modes is difficult in
practical applications, where the natural modes of a room depend greatly on the
geometry of the room.
The proposed potential energy method calculates the average noise level in the
entire spatial region; therefore there is no need to take multiple samplings of the
control region, which simplifies the process of ANC performance evaluation. Compared
to [114], our method represents the potential noise using spherical harmonic
coefficients, which can be conveniently captured by spherical microphone arrays,
rather than room modes. The main advantage of this approach is its applicability to any
arbitrary enclosure and its independence from the ANC system in use. Therefore
our method is particularly suitable for environments with irregular geometry and a
fixed control region of moderate size, such as vehicle and aircraft cabins.
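\[ E_p(k) = \frac{1}{4\rho_0 c^2} \int_S |P(r, \theta, \phi, k)|^2 \, dS \tag{6.15} \]
where $\int_S dS = \int_0^R \int_0^{\pi} \int_0^{2\pi} r^2 \, dr \, \sin(\theta) \, d\theta \, d\phi$ denotes the integral over a sphere. Using
(2.1), we can decompose the integral of sound energy as [115]
\begin{align}
& \int_S |P(r, \theta, \phi, k)|^2 \, dS \tag{6.16} \\
&= \int_0^R \int_0^{\pi} \int_0^{2\pi} P(r, \theta, \phi, k) P^*(r, \theta, \phi, k) \, r^2 \, dr \, \sin(\theta) \, d\theta \, d\phi \tag{6.17} \\
&= \sum_{l,m} C_{lm}(k) C_{lm}^*(k) \int_0^R j_l^2(kr) \, r^2 \, dr, \tag{6.18}
\end{align}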
104 Methods for spatial ANC performance evaluation and optimization
where the orthogonal property of the spherical harmonics (2.5) was used. Therefore
(6.15) can be expressed using the spherical harmonic coefficients as
\[ E_p(k) = \frac{1}{4\rho_0 c^2} \sum_{l,m} |C_{lm}(k) W_l(k)|^2, \tag{6.19} \]
where $\rho_0$ denotes the density of the medium and $c$ is the speed of sound, and we define
\[ W_l(k) \triangleq \left( \int_0^R j_l(kr)^2 \, r^2 \, dr \right)^{1/2}. \tag{6.20} \]
The above result shows that the acoustic potential energy within a spherical
region is given by a sum of squared spherical harmonic coefficients with the weighting
Wl (k).
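As a rough illustration of (6.19) and (6.20), the following sketch evaluates the radial weighting numerically and sums the weighted coefficients. The function names, the air density of 1.2 kg/m^3, the speed of sound of 343 m/s and the example coefficients are all assumptions made for the illustration, not values taken from the experiments.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import spherical_jn


def radial_weight(l, k, R):
    """W_l(k) from (6.20): square root of the integral of j_l(kr)^2 r^2 over [0, R]."""
    integrand = lambda r: spherical_jn(l, k * r) ** 2 * r ** 2
    value, _ = quad(integrand, 0.0, R)
    return np.sqrt(value)


def potential_energy(coeffs, k, R, rho0=1.2, c=343.0):
    """E_p(k) from (6.19); coeffs maps (l, m) to the complex coefficient C_lm(k)."""
    total = sum(abs(C_lm * radial_weight(l, k, R)) ** 2
                for (l, m), C_lm in coeffs.items())
    return total / (4.0 * rho0 * c ** 2)


# Made-up first-order field at 200 Hz inside a 0.1 m region (illustrative only).
k = 2 * np.pi * 200.0 / 343.0
coeffs = {(0, 0): 1.0 + 0.5j, (1, -1): 0.2j, (1, 0): 0.3, (1, 1): -0.1}
E_p = potential_energy(coeffs, k, R=0.1)
```

Numerical quadrature is used here only for brevity; a closed-form expression for the integral of squared spherical Bessel functions could replace it.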
The commonly used criterion for ANC performance evaluation measures the
attenuation of the noise energy at microphone positions, where the microphones are either
the error microphones themselves or some additional microphones placed within
the region of interest. In the former case, it is difficult to gain any insight into the
spatial ANC performance of the system, due to the lack of sampling of the noise level
inside the control region. When additional microphones are utilized to measure the
noise level inside the control region, it is necessary to sample the control region at
multiple locations in order to have a complete evaluation of the noise attenuation.
On the other hand, the proposed potential energy method calculates the average
noise level in the entire spatial region; therefore there is no need to take multiple
samplings of the control region, which simplifies the process of ANC performance
evaluation. In [114], the potential energy criterion is applied to evaluate the noise
level inside rectangular cabins. However, in practical scenarios, the natural modes of
a room depend greatly on the geometry of the room, and measuring these modes in
a practical environment is very difficult. Our method, on the other hand, represents
the potential noise using spherical harmonic coefficients, which can be conveniently
captured by spherical microphone arrays. Therefore our method is particularly
suitable for environments with irregular geometry and a fixed control region of
moderate size, such as vehicle and aircraft cabins.
[Figure 6.5 diagram: primary source, secondary sources 1 and 2, error microphones Mic 1 to Mic 5 around the origin O, with dimension labels 0.95, 0.1, 1.35, 0.2 and 0.95.]
Figure 6.5 shows the hardware configuration of the system, where the control region
is defined as a spherical area with 0.1 m radius, which approximately covers the size
of a human head. Five AKG CK92 omnidirectional microphones are placed evenly
on the horizontal plane boundary of the region, which act as the error sensors.
In order to investigate the differences in ANC performance due to different error
microphone setups, we vary the radius of the error microphone array, as well as the
number of active microphones in the array. The array radius is varied between 10
cm and 20 cm, and the microphones used in each experiment are either (i) all five
microphones, or (ii) “Mic 2” and “Mic 5” shown in Fig. 6.5 only. This results in a
total of four combinations of array radius and microphone number.
Three TANNOY 600 loudspeakers are used as the primary and secondary sources.
The two secondary speakers are placed on either side of the primary source, forming
an angle of 72 degrees.
The error microphone signals are transmitted to a PC, which performs the adap-
tive ANC algorithm and generates the secondary loudspeaker driving signals in real
time. Since the focus of this experiment is not on the performance of the MIMO
ANC algorithm itself, the reference signal is obtained directly from the electronic
signal path of the primary source, rather than using a separate reference
microphone. This eliminates the feedback signal path from the secondary sources to the
reference sensor, which may affect the ANC performance.
An Eigenmike is placed at the center to monitor the noise field within the control
region. Although the Eigenmike is capable of capturing spatial sounds up to 4th
order at 4 kHz, we only focus on the lower frequency sounds (up to 800 Hz and 1st
order). This is because at higher frequencies, the second order spherical harmonics
begin to have a higher contribution towards the sound energy close to the boundary
of the control region, but the Eigenmike is unable to capture the second order sound
field at those frequencies, due to its smaller radius (4.2 cm) compared to the control
region.
A separate computer is used to process the audio signal recorded by the Eigen-
mike and calculate the potential energy while the ANC system is running. The
Eigenmike is not involved in the signal path of the ANC system in any way.
Both narrow-band and wide-band signals are used as the primary noise for the
experiments. The narrow-band signals are sine waves with frequencies of 100 − 800
Hz; the wide-band signal is generated by filtering an in-car noise recording through
a 100 − 800 Hz bandpass filter.
For narrow-band experiments, a sine wave is played through the primary speaker,
then we calculate the average sound energy recorded by each error microphone with
and without ANC, and calculate the attenuation of the noise energy due to ANC.
The attenuation of the average sound energy within the control region is measured
in the same way.
For wide-band experiments, we play the wide-band noise and record a section of
signal from the error microphones as well as the Eigenmike while the ANC system is
not active, then repeat the recording with ANC active and fully converged. We then
calculate the average frequency spectrum of each recorded section. The playback
and recordings are synchronized such that the same section of noise signal is recorded
each time.
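The comparison of recordings with and without ANC can be sketched as follows. This is a minimal example, assuming Welch averaging as the way of obtaining the "average frequency spectrum"; the function name, the 8 kHz sampling rate and the synthetic signals are hypothetical and stand in for the actual recordings.

```python
import numpy as np
from scipy.signal import welch


def attenuation_db(sig_off, sig_on, fs, nperseg=4096):
    """Per-frequency attenuation 10*log10(P_on / P_off).

    Negative values mean the noise is quieter with ANC active,
    matching the sign convention of the attenuation plots.
    """
    f, p_off = welch(sig_off, fs=fs, nperseg=nperseg)
    _, p_on = welch(sig_on, fs=fs, nperseg=nperseg)
    return f, 10.0 * np.log10(p_on / p_off)


# Synthetic stand-in: the "ANC on" recording is the same noise at half amplitude.
fs = 8000
rng = np.random.default_rng(0)
noise = rng.standard_normal(10 * fs)
f, att = attenuation_db(noise, 0.5 * noise, fs)
band = (f >= 100) & (f <= 800)      # band of interest, 100 to 800 Hz
mean_att = att[band].mean()         # about -6 dB for a halved amplitude
```

The same function applies to an error microphone signal or to a potential energy time series, as long as the two recordings cover the same section of the noise signal.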
Figure 6.6: Picture of the experiment setup; the small loudspeakers in the
background are not used in the experiment.
[Figure 6.7 plots, panels (a) and (b): attenuation (dB) versus frequency (Hz), 100 to 800 Hz.]
Figure 6.7: Average narrow-band noise energy attenuation at control region and
microphone locations using 5 error microphones (a) and 2 error microphones (b).
Legend of (b) is the same as (a).
microphone signals are no longer a good indication of the ANC system’s performance.
The effect of the number of error microphones is also investigated. For this purpose,
we repeated the narrow-band experiments with only two error microphones active,
and the noise attenuation results are shown in Fig. 6.7 (b). From this figure, it can
be observed that both the microphone signal attenuations and the potential energy
attenuations become very different from the case where 5 microphones are used. In
particular, the microphone signals can achieve more than 10 dB attenuation for all
frequencies, and the attenuation does not decay with increased frequency; on the
other hand, the potential energy attenuation is significantly worse compared to the
5-channel case, and the value even becomes positive (higher noise level with ANC
active) at some higher frequencies.
The cause of this phenomenon is that the number of error microphones is equal
to the number of secondary sources, therefore a solution always exists to significantly
[Figure 6.8 plots, panels (a) and (b): sound energy (dB) versus frequency (Hz), 100 to 800 Hz; curves: without ANC, 10 cm radius, 20 cm radius.]
Figure 6.8: Spectrum of potential energy at control region (a) and microphone
locations (b) using wide-band noise signal.
reduce the sound pressure at the microphone positions, but often at the cost of very
high secondary source driving signals. Since the two microphones do not provide complete
coverage of the control region, the potential energy inside the region becomes less
controllable compared to the 5-channel case, and in some extreme cases, this results
in positive attenuation values (an increase in noise) inside the region. In this case, the
microphone signals are not a good indication of the ANC performance inside the control
region, except at very low frequencies (below 150 Hz), as can be seen in Fig. 6.7 (b).
Wide-band performance
From Fig. 6.8 (a) it can be seen that overall the wide-band ANC performance
agrees with the corresponding narrow-band results shown in Fig. 6.7 (a), where an
attenuation of 10 dB or more can be achieved for most frequencies below 400 Hz,
while at higher frequencies, the attenuation gradually reduces to 3−5 dB. Comparing
the curves corresponding to 10 cm and 20 cm microphone array radius, it can be seen
that the low frequency ANC performance of the two configurations is very similar,
with the 10 cm configuration being superior in certain frequency ranges. At higher
frequencies, the 10 cm configuration yields consistently better attenuation than the
20 cm configuration, which also agrees with Fig. 6.7 (a), although the attenuation
is slightly worse for the 10 cm radius case.
In the case of Fig. 6.8 (b), the attenuation observed at the microphone positions
differs greatly between the two radius configurations. For the 10 cm radius, the attenuation is
greater than 5 dB for nearly all frequencies between 100−800 Hz, while for the 20 cm
setup, the attenuation becomes negligible above 475 Hz. Upon closer inspection, it
can be seen that the microphone attenuation at 10 cm radius is 2−3 dB greater than
that of the potential energy inside the control region. Therefore, neither of the two
microphone position configurations faithfully reflects the ANC performance within
the control region, although the result is more accurate when the microphones are
placed closer to the control region.
6.4.1 Background
The application of noise cancellation methods to minimize interior cabin noise has
been a key topic of research in the automobile industry for the last 15 − 20 years
[116]. Initially, this problem was approached via passive noise cancellation methods,
which use acoustic treatments such as structural damping and acoustic absorption.
However, with the growing need to improve fuel efficiency, there has been a stronger
preference for lighter bodies and smaller engines, which has significantly increased the
structural vibration and consequent interior noise, predominantly at low frequencies
(e.g. 0 − 500 Hz) [117]. As passive methods are least effective against low frequency
noise, active methods were developed in which secondary loudspeakers are used
to attenuate measured noise inside the cabin [6, 117–120]. With modern in-car
entertainment systems providing 4 − 6 built-in loudspeakers, the addition of an
active noise cancellation system is considered to involve no greater cost [7].
To the best of our knowledge, the existing in-car MIMO controllers are constrained
to a set of arbitrary observation points. As a result, spatial control over
continuous regions is limited and worsens with increasing frequency. Addressing
this issue, we focus this work on modeling vehicle-interior noise over a continuous
spatial region such that noise control can be achieved over a region with size similar
to a human head for frequencies up to f = 500 Hz. We also derive the maximum
attenuation levels for a given speaker arrangement so that industrial designers can
investigate the potential noise cancellation capability of a given loudspeaker system
for various noise sources and driving conditions. All of the analyses we perform are
based on acoustic measurements taken in a real in-car environment.
Denote the unwanted noise pressure at a point x as $P_n(x)$, and the sound pressure
due to the loudspeakers as $P_c(x)$. The average residual noise energy within the
region of interest S can then be expressed as
\[ \int_S |P_r(x)|^2 \, dS = \int_S |P_n(x) + P_c(x)|^2 \, dS. \tag{6.21} \]
where the expression (6.19) is used, and $C_{lm}^{(n)}$ and $C_{lm}^{(c)}$ are the spherical harmonic
coefficients representing the noise field and the loudspeaker anti-noise field, respectively.
We then move on to derive an estimation of $\int_S |P_r(x)|^2 \, dS$, by analyzing the noise
field and loudspeaker channel characteristics.
For a certain driving condition, we assume that the random noise field within S can
be seen as a weighted combination of a number of fixed basis noise patterns, or
noise modes [121]; each driving condition may have a different set of basis patterns. Then
the noise field pressure within S at any time under a fixed driving condition can be
6.4 In car spatial ANC performance analysis 113
decomposed as
\[ P_n(x) = \sum_i g_i P_i(x), \tag{6.24} \]
where $P_i(x)$ denotes the $i$th basis noise pattern at x, and $g_i$ are random weighting
factors for each noise pattern. Theoretically an infinite number of modes are
needed to fully describe an arbitrary noise field; however, for a relatively small region
and low frequencies, only a small number of noise modes are required for a good
approximation of the noise field [121]. Using the spherical harmonics decomposition
(2.1) to decompose the noise field $P_n(x)$ and the basis patterns $P_i(x)$, we can express
each noise field coefficient $C_{lm}^{(n)}$ using the corresponding coefficient $C_{lm}^{i}$ of every basis
pattern,
\[ C_{lm}^{(n)} = \sum_i g_i C_{lm}^{i}. \tag{6.25} \]
We can write all the coefficients in vector form such that $\mathbf{C} = [C_{00}^{(n)}, C_{11}^{(n)}, \ldots]^T$ and
$\mathbf{C}^{i} = [C_{00}^{i}, C_{11}^{i}, \ldots]^T$; then from (6.25) we have
\[ \mathbf{C} = \sum_i g_i \mathbf{C}^{i}. \tag{6.26} \]
where $\mathbf{c}$ represents the random noise field in S and $\|\mathbf{c}\|^2 = \int_S |P_n(x)|^2 \, dS$. Similar to
the modal domain MUSIC DOA algorithm [122], we can find a set of $\mathbf{c}_i$ by calculating
the autocorrelation matrix $E\{\mathbf{c}\mathbf{c}^H\}$, and then decompose $E\{\mathbf{c}\mathbf{c}^H\}$ to acquire a set of
orthonormal eigenvectors and their corresponding eigenvalues. Unlike the MUSIC
DOA method which utilizes the noise subspace eigenvectors, we select the signal
subspace eigenvectors to be $\mathbf{c}_i$, which correspond to the eigenvalues $\lambda_i$ whose values
are significant. The eigenvalues indicate the energy distribution of the overall noise
field among the basis noise patterns, and $E\{|g_i|\} = \lambda_i$.
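The eigen-decomposition step described above can be sketched as follows. This is a minimal example under stated assumptions: the snapshots are stored row-wise, the relative threshold of 0.01 for a "significant" eigenvalue is a hypothetical choice, and the two synthetic patterns exist only to exercise the function.

```python
import numpy as np


def noise_basis(snapshots, rel_threshold=0.01):
    """Estimate the basis noise patterns from coefficient snapshots.

    snapshots: (N, M) complex array, one weighted coefficient vector c per row.
    Returns the normalized eigenvalues (descending) and the matching
    orthonormal eigenvectors of the sample estimate of E{c c^H},
    keeping only eigenvalues above rel_threshold.
    """
    n = snapshots.shape[0]
    cov = snapshots.T @ snapshots.conj() / n      # sample estimate of E{c c^H}
    eigvals, eigvecs = np.linalg.eigh(cov)        # Hermitian solver, ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = eigvals / eigvals[0]                # normalize to the largest
    keep = eigvals > rel_threshold
    return eigvals[keep], eigvecs[:, keep]


# Synthetic field: two fixed patterns with random complex weights g_i.
rng = np.random.default_rng(1)
patterns = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.5, 0.0, 0.0]])
g = rng.standard_normal((500, 2)) + 1j * rng.standard_normal((500, 2))
c = g @ patterns
lam, basis = noise_basis(c)
```

For this synthetic field only two eigenvalues survive the threshold, reflecting the two underlying patterns.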
By decomposing the noise field into basis noise patterns, we gain more
insight into the dimensionality/sparsity of the noise field. A noise field of high order may
have a compact representation using (6.24). Furthermore, additional signal analysis
methods such as direction-of-arrival (DOA) estimation may be applied to the basis
noise patterns to identify principal noise sources, which helps in determining optimal
loudspeaker placement for ANC purposes when designing the vehicle.
with $H_{lm}^{v}$ being the spherical harmonic coefficient of order l and mode m, defined
in the region of interest S, due to the $v$th loudspeaker playing a unit signal at one
frequency. For an $L$th order region S and an array of V independent loudspeakers,
the size of $\mathbf{H}$ is $(L+1)^2$-by-$V$.
Since the noise field can be completely described by its eigenvectors $\mathbf{c}_1, \mathbf{c}_2, \ldots$, we
can estimate the noise cancellation performance by comparing the eigenvectors with
the loudspeaker channels. In particular, we define the weighted channel matrix
$\mathbf{T} = \mathbf{W}\mathbf{H}$, where $\mathbf{W}$ is the diagonal matrix defined in Section 6.4.3.
Then we can solve for the loudspeaker driving signal solution that minimizes
(6.22) for each basis noise field pattern defined by ci , which can be derived as
The driving signal solution (6.30) is essentially the Least Mean-Square Error (LMS)
solution over the continuous space S, instead of the LMS solution based on a number
of discrete spatial sampling points which is commonly used in existing car ANC
systems.
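A least-squares driving signal over the weighted coefficients can be sketched as below. Since the expression for (6.30) is not reproduced in this text, the sketch assumes the standard regularized least-squares form and the residual convention e = c + T d used in (6.41); the function name, the regularization value and the small channel matrix are illustrative assumptions.

```python
import numpy as np


def driving_signal(T, c, reg=1e-6):
    """Regularized least-squares driving signal (assumed form).

    Minimizes ||c + T d||^2 + reg * ||d||^2 over d, where T is the weighted
    channel matrix (M-by-V) and c a weighted noise coefficient vector (M,).
    Returns the driving signal d and the residual vector e = c + T d.
    """
    V = T.shape[1]
    gram = T.conj().T @ T + reg * np.eye(V)
    d = -np.linalg.solve(gram, T.conj().T @ c)
    e = c + T @ d
    return d, e


# Hypothetical 4-coefficient, 2-loudspeaker channel matrix.
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]], dtype=complex)

# A noise pattern inside the span of the loudspeaker channels is cancelled almost fully.
c_in = T @ np.array([1.0 + 1.0j, -0.5])
d_in, e_in = driving_signal(T, c_in)

# A generic pattern leaves a residual that the loudspeakers cannot reproduce.
c_out = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)
d_out, e_out = driving_signal(T, c_out)
```

Because the coefficients are weighted by W before the fit, the squared norm being minimized corresponds to energy over the continuous region rather than at discrete sampling points.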
We use the eigenvalues $\lambda_i$ as well as the original and residual noise field vectors,
$\mathbf{c}_i$ and $\mathbf{e}_i$, respectively, to express the noise cancelling performance, and the overall
expected noise power reduction ratio can be given using (6.28)
\[ e = \frac{E\{\int_S |P_r(x)|^2 \, dS\}}{E\{\int_S |P_n(x)|^2 \, dS\}} = \frac{\sum_i \|\lambda_i \mathbf{e}_i\|^2}{\sum_i \|\lambda_i \mathbf{c}_i\|^2}, \tag{6.32} \]
In this experiment, we use the method developed in the previous sections to analyze
the potential noise cancellation performance of the loudspeakers installed in a car
(2005 Ford Falcon XR6 sedan).
We use an Eigenmike to measure the in-car noise field; the region of interest
is chosen to be a spherical region with 10 cm radius, located at the head position
of the front passenger seat. The radius of the region is larger than that of the
Eigenmike (4.2 cm), therefore we only analyze the sound field for frequencies below
500 Hz. Within this frequency range, only the 0th and 1st order sound field harmonics
are active inside the region of interest [53], and these can be reliably measured by the
Eigenmike placed in the center of the region. Also, spectral analysis of the in-car
noise indicates that the majority of the noise power lies below 500 Hz (an example
of the noise spectrum is shown in Fig. 6.11), thus the noise cancelling performance
within this frequency band is indicative of the overall cancelling quality.
The vehicle has four full-band loudspeakers installed, two of which are integrated
in either of the front doors, while the other two are placed behind each rear seat.
Unfortunately, the car’s audio system can only play stereo signals, which means the
two loudspeakers on either side cannot be driven separately, and always play the
same signal.
In order to characterize the noise field, we record the in-car noise under various
driving conditions. We also record the noise fields due to the engine and the air
conditioner while the car is stationary. For each driving condition, a 10-second-long
recording is separated into 100 snapshots; we then calculate the sound field
coefficients for each snapshot and at every frequency bin, and finally calculate the
coefficient covariance matrix over all 100 snapshots. The covariance matrix is used
as the estimate of $E\{\mathbf{c}\mathbf{c}^H\}$ in the further data analysis.
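The snapshot processing can be sketched as follows, assuming the spherical harmonic signals are already available as time-domain channels (one row per coefficient). The function name, the Hann window, the 8 kHz sampling rate and the random test signal are assumptions of this illustration rather than details of the measurement chain.

```python
import numpy as np


def coefficient_covariance(coeff_signals, fs, n_snapshots=100):
    """Per-frequency-bin covariance of sound field coefficients.

    coeff_signals: (M, n_samples) real array, one row per coefficient channel.
    Returns the bin frequencies and an array of shape (n_bins, M, M) holding
    the sample covariance over the snapshots for every frequency bin.
    """
    M, n_samples = coeff_signals.shape
    frame_len = n_samples // n_snapshots
    frames = coeff_signals[:, :frame_len * n_snapshots]
    frames = frames.reshape(M, n_snapshots, frame_len)
    spectra = np.fft.rfft(frames * np.hanning(frame_len), axis=-1)   # (M, N, bins)
    # cov[b] = (1/N) * sum over snapshots n of c_n c_n^H at frequency bin b
    cov = np.einsum('inb,jnb->bij', spectra, spectra.conj()) / n_snapshots
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, cov


# Stand-in for a 10-second recording of four coefficient channels.
fs = 8000
rng = np.random.default_rng(2)
signals = rng.standard_normal((4, 10 * fs))
freqs, cov = coefficient_covariance(signals, fs)
```

With a 10-second recording and 100 snapshots, each frame lasts 0.1 s, which gives a frequency resolution of 10 Hz.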
When calculating the residual noise field vector ei , we include a small regular-
Table 6.2: Table of noise field eigenvalues for freeway driving condition and pure
engine noise
[Figure 6.10 plot: noise power attenuation (dB) versus frequency (Hz), 50 to 500 Hz; curves: Busy Road, Engine Only, AC Only, Freeway.]
Figure 6.10: Noise power spectrum attenuation for 4 different driving conditions.
bin. Therefore we expect that the lower frequency noise fields can be seen as sparse,
thus controlling such sound fields may require only a small number of well-placed
loudspeakers which can nicely reproduce the dominant noise pattern.
Figure 6.10 plots the noise power attenuation for four different driving conditions,
with the values calculated using (6.32). In addition to the freeway recording and
the engine noise recording, the “Busy Road” recording was taken while driving on
a 3-lane road at moderate speed with multiple vehicles passing by, while the “AC
only” recording was taken with the car parked in a quiet place, the engine idling and the
air conditioning turned to maximum.
Figure 6.10 indicates that for most cases, the noise cancelling performance is
relatively consistent, with the attenuation reducing gradually from 30 − 35 dB at 50
Hz to 15−20 dB at 500 Hz. This frequency-dependent performance is expected, since
the noise field becomes more complicated and harder to reproduce/cancel
when the wavelength is shorter. We also notice that the noise field due to air
conditioning is particularly difficult to cancel at 50 − 100 Hz, compared to other
scenarios. We expect this is because the noise field due to AC is less similar to
that of the loudspeakers, compared to other noise sources. One may also notice the
[Figure 6.11 plot: noise field power spectrum versus frequency (Hz) on a logarithmic axis from 10^2 to 10^4 Hz.]
Figure 6.11: Comparison of average noise field power spectrum before and after
cancellation.
common peak in all cases at 470 Hz; clearly, at this frequency the loudspeakers are
unable to reproduce the noise fields very well.
We also include Fig. 6.11 which depicts the overall noise spectrum without at-
tenuation, and the expected residual noise spectrum if the in-car loudspeakers are
employed to cancel the noise field. The original noise spectrum is recorded while
driving at 70 km/h with air conditioning at minimum. The attenuation is cut off at
500 Hz. We can see from the figure that the most dominant noise frequencies can
be effectively cancelled by the integrated loudspeakers, resulting in a much quieter
sound field within the region of interest.
In general, we can conclude that the integrated loudspeakers are capable of can-
celling the noise field within our defined region of interest at the front passenger seat.
However, we would expect the performance to degrade should the noise cancellation
be carried out for multiple seats. Nevertheless, a proper in-car ANC system would
be able to drive the four loudspeakers separately, which provides extra degrees of
freedom for the loudspeaker channels, thereby promoting the overall performance of
the system.
The theory developed in Section 6.4.3 can be easily extended to the multi-zone case.
Assume that a number of adjacent regions are defined inside the car cabin. Then,
considering one of the control regions $S_j$, we can use the spherical harmonics
decomposition (2.1) to decompose the noise field $P_n(x)$, $x \in S_j$, as well as the basis
patterns $P_i(x)$, $x \in S_j$. We can then express the noise field coefficients belonging
to the $j$th control region, $C_{lm}^{j}$, using the corresponding coefficient $C_{lm}^{j,i}$ of every basis
pattern,
\[ C_{lm}^{j} = \sum_i g_i C_{lm}^{j,i}. \tag{6.34} \]
We have shown that the average energy of a noise field is related to the spherical
harmonic coefficients that represent the noise field by $W_l$. Substituting $c_{lm} = C_{lm} W_l$
into (6.34), we have
\[ c_{lm}^{j} = \sum_i g_i c_{lm}^{j,i}. \tag{6.35} \]
Since we are considering the overall noise field over all of the control regions, it
is convenient to write the coefficients of all regions in vector form, such that
$\mathbf{c} = [c_{00}^{1}, c_{11}^{1} \ldots c_{00}^{2} \ldots c_{LL}^{j}]^T$ and $\mathbf{c}_i = [c_{00}^{1,i}, c_{11}^{1,i} \ldots c_{11}^{2,i} \ldots c_{LL}^{j,i}]^T$. Then from (6.35), combining
the coefficients of all control regions, we have the vector representation
\[ \mathbf{c} = \sum_i g_i \mathbf{c}_i. \tag{6.36} \]
A limitation of the mode matching method for deriving loudspeaker driving signals
is that the amplitude of the loudspeaker driving signal is unbounded. Although
a regularization can be added to the matrix inversion in (6.30) to avoid extremely
high driving signals, there is no strict upper bound on the loudspeaker output power.
From a practical point of view, driving a loudspeaker beyond its linear operating
range would result in harmonic distortions, which introduce additional noise in the
control regions. In order to avoid this problem, we define the optimization problem
where $D_i$ are the elements of $\mathbf{D}$ and represent the driving signal for the $i$th loudspeaker,
$\|\cdot\|$ denotes the $\ell_2$-norm, and K is a constant which sets the volume upper bound
where $H_{lm}^{j,v}$ is the spherical harmonic coefficient of order l and mode m, associated
with the $j$th control region, due to the $v$th loudspeaker playing a unit signal.
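One way to respect a volume upper bound when solving for the driving signals is sketched below. The exact constraint of the optimization problem is not reproduced in this text, so the sketch assumes a per-loudspeaker amplitude bound |D_v| <= K and uses projected gradient descent, which is a swapped-in solver rather than the method used in this work. All names and numbers are hypothetical.

```python
import numpy as np


def bounded_driving_signal(T, c, K, n_iter=500):
    """Minimize ||c + T d||^2 subject to |d_v| <= K for every loudspeaker v.

    Projected gradient descent: take a gradient step, then scale any entry
    whose magnitude exceeds K back onto the bound (assumed constraint form).
    """
    d = np.zeros(T.shape[1], dtype=complex)
    step = 1.0 / np.linalg.norm(T, 2) ** 2            # 1 / largest eigenvalue of T^H T
    for _ in range(n_iter):
        d = d - step * (T.conj().T @ (c + T @ d))      # gradient step
        mag = np.abs(d)
        d = d * np.minimum(1.0, K / np.maximum(mag, 1e-12))   # projection
    return d


# Hypothetical channel matrix with orthogonal columns (T^H T = 3 I).
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
c = -T @ np.array([2.0, 0.1])      # unconstrained optimum would be d = [2.0, 0.1]
K = 1.0
d = bounded_driving_signal(T, c, K)
```

In this example the first loudspeaker is clipped to the bound while the second keeps its unconstrained value, so the residual noise is larger than in the unbounded case but the drive level stays within the assumed linear range.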
Experiment Setup
In this experiment, we aim to investigate the noise field complexity within a 2005
Ford Falcon XR6 sedan, under various driving conditions; as well as examine the
noise cancelling potential of the multimedia loudspeakers installed in the car. The
regions of interest are chosen to be spherical regions located at the head position of
each of the four seats, the radius of each region is set to 10 cm, which covers the
size of a human head.
For this experiment, we focus on the noise below 200 Hz. Using (6.19), we can
calculate the relative contribution of each spherical harmonic mode towards the total
noise energy within the control regions. At f = 200 Hz we have
\[ \frac{\int_S |P_{00}(x)|^2 \, dS}{\int_S |P(x)|^2 \, dS} = \frac{|\alpha_{00} W_0|^2}{\sum_{l,m} |\alpha_{lm} W_l|^2} \approx 0.972, \tag{6.40} \]
thus the 0th order spherical harmonic accounts for the vast majority of the noise
energy within the control regions. For frequencies below 200 Hz, the contribution of
the 0th mode is even higher (99.3% at 100 Hz). Therefore, in our experiments,
we only monitor the 0th order spherical harmonic for each control region, which
can be done by placing a single omnidirectional microphone at the center of each
region. We note that we measure only the 0th mode spherical harmonic because
at low frequencies, the 0th mode contributes the majority of the noise energy,
not because we believe the noise field is isotropic. For noise field analysis of larger
regions and higher frequencies, higher-order microphones such as the
Eigenmike are required.
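The order of magnitude of the ratio in (6.40) can be checked with a short calculation. The sketch below assumes that every coefficient has the same magnitude, a speed of sound of 343 m/s and a truncation at 4th order; none of these assumptions is stated in the text, so the result should only be expected to land near the quoted figures, not to reproduce them exactly.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import spherical_jn


def zeroth_order_share(f, R=0.1, c=343.0, max_order=4):
    """Share of the in-region energy carried by the 0th order.

    Assumes equal-magnitude coefficients alpha_lm, so that order l contributes
    (2l + 1) * W_l^2 to the weighted sum in (6.19).
    """
    k = 2 * np.pi * f / c
    W2 = [quad(lambda r, l=l: spherical_jn(l, k * r) ** 2 * r ** 2, 0.0, R)[0]
          for l in range(max_order + 1)]
    total = sum((2 * l + 1) * W2[l] for l in range(max_order + 1))
    return W2[0] / total


share_200 = zeroth_order_share(200.0)   # close to the 0.972 quoted in (6.40)
share_100 = zeroth_order_share(100.0)   # close to the 99.3% quoted for 100 Hz
```

Under these assumptions the 0th order carries roughly 97% of the energy at 200 Hz and more than 99% at 100 Hz, which supports monitoring a single omnidirectional microphone per region at these frequencies.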
The recording system we use consists of four AKG CK92 omnidirectional condenser
microphones, connected to a TubeFire 8 audio interface via four AKG SE300B
microphone pre-amps. The synchronous audio streams are recorded using a MacBook,
which is connected to the TubeFire 8 via FireWire.
We record the noise field at the four control regions simultaneously for various
driving conditions, including the pure engine noise recording, where the car is parked
in a relatively quiet place with the engine running at 2000 rpm. For each driving condition,
we record the noise for 10 seconds. The recording is then split into 100 frames and
transformed into spherical harmonic coefficients $\alpha_{00}^{j}(k)$ at different frequency bins
for further analysis.
The Ford sedan has four full-band loudspeakers installed, two of which are in-
tegrated at the bottom of either of the front doors, while the other two are placed
behind each rear seat. However, the car’s audio playback system only supports
stereo signals, which means the two loudspeakers on the left side simultaneously
play the left channel of the stereo signal, and the same goes for the right channel.
We obtain the loudspeaker channel matrix by measuring the impulse response
at the region of interest due to the left channel and right channel separately, and
then calculating the corresponding sound field coefficients for each frequency bin, in
the same way as we obtain the noise field measurements. The channel matrix takes
the form of (6.39). The 0th order sound fields at 4 regions and the stereo speaker
system result in a 4-by-2 channel matrix for each frequency bin.
In order to estimate the noise cancellation capability of the in-car loudspeakers
in each driving condition, we solve (6.37) for each of the 100 snapshots in every
recording, and calculate the expected residual noise energy for each snapshot. The
value of K is chosen such that the sound energy at the regions of interest due to each
loudspeaker is no more than 3 times that due to the noise. We then calculate
the average noise energy attenuation using
\[ A = \frac{\sum_{n=1}^{100} \|\mathbf{c}_n + \mathbf{T}\mathbf{D}_n\|^2}{\sum_{n=1}^{100} \|\mathbf{c}_n\|^2}, \tag{6.41} \]
where $\mathbf{c}_n$ and $\mathbf{D}_n$ are the weighted coefficient vectors and the optimal driving signals
Table 6.3: Noise field eigenvalues for freeway driving condition and pure engine noise

100 km/h       40 Hz    80 Hz    120 Hz   160 Hz   200 Hz
λ1             1.000    1.000    1.000    1.000    1.000
λ2             0.292    0.282    0.498    0.476    0.292
λ3             0.062    0.207    0.181    0.372    0.102
λ4             0.007    0.139    0.049    0.092    0.053

Engine Only    40 Hz    80 Hz    120 Hz   160 Hz   200 Hz
λ1             1.000    1.000    1.000    1.000    1.000
λ2             0.033    0.315    0.108    0.293    0.042
λ3             0.005    0.095    0.018    0.106    0.031
λ4             0.000    0.018    0.003    0.045    0.015
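The average attenuation in (6.41) can be computed from stored snapshots as in the sketch below. The array layout (one snapshot per row), the function name and the unconstrained least-squares driving signals used in the example are assumptions of this illustration; they stand in for the constrained solutions of (6.37).

```python
import numpy as np


def average_attenuation(c_snapshots, T, D_snapshots):
    """Average noise energy attenuation A from (6.41).

    c_snapshots: (N, M) weighted coefficient vectors c_n, one per row.
    D_snapshots: (N, V) driving signals D_n, one per row.
    T: (M, V) weighted channel matrix.
    """
    residual = c_snapshots + D_snapshots @ T.T     # row n holds c_n + T D_n
    return np.sum(np.abs(residual) ** 2) / np.sum(np.abs(c_snapshots) ** 2)


# Hypothetical 4-region, stereo (2-channel) example with 100 random snapshots.
rng = np.random.default_rng(3)
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
c = rng.standard_normal((100, 4))
D = -(c @ np.linalg.pinv(T).T)                     # unconstrained least squares per snapshot
A = average_attenuation(c, T, D)
A_db = 10.0 * np.log10(A)
A_off = average_attenuation(c, T, np.zeros_like(D))   # loudspeakers silent
```

With the loudspeakers silent the ratio is 1 (0 dB), and any useful set of driving signals brings it below 1.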
Data Analysis
We first investigate the dimensionality of the combined noise field over the four
control regions by observing the eigenvalues of the estimated covariance matrix of
the spherical harmonic coefficients. We normalize the eigenvalues and sort them
from the largest to the smallest; the results for pure engine noise and the noise
when driving at 100 km/h are shown in Table 6.3. We can see from Table 6.3 that
the eigenvalues of the engine noise are almost always smaller than the corresponding
eigenvalues of the freeway driving condition (100 km/h). In the case of engine noise,
the fourth eigenvalue is on the order of 0.01 for most frequencies, therefore the noise
field may be modelled using 3 noise modes in (6.25) without significant loss of
accuracy. As a result, in order to effectively cancel the engine noise over the four
control regions simultaneously, a minimum of 3 loudspeakers would be sufficient,
assuming that the loudspeaker channels have sufficient diversity.
On the other hand, the noise field of the freeway driving condition is more
complicated: the fourth eigenvalues are above 0.01 for all frequencies above 40 Hz.
Therefore at least four independent loudspeakers are required to effectively cancel
the noise within the control regions simultaneously.
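The mode counting argument can be made concrete by thresholding the eigenvalues of Table 6.3. The cut-off of 0.05 used below is a hypothetical choice made for this illustration; the text itself only speaks of eigenvalues "on the order of 0.01", and a different cut-off would give different counts.

```python
import numpy as np

freqs_hz = [40, 80, 120, 160, 200]

# Normalized eigenvalues from Table 6.3 (rows: lambda_1..lambda_4, columns: frequency).
freeway = np.array([[1.000, 1.000, 1.000, 1.000, 1.000],
                    [0.292, 0.282, 0.498, 0.476, 0.292],
                    [0.062, 0.207, 0.181, 0.372, 0.102],
                    [0.007, 0.139, 0.049, 0.092, 0.053]])
engine = np.array([[1.000, 1.000, 1.000, 1.000, 1.000],
                   [0.033, 0.315, 0.108, 0.293, 0.042],
                   [0.005, 0.095, 0.018, 0.106, 0.031],
                   [0.000, 0.018, 0.003, 0.045, 0.015]])


def significant_modes(eigs, threshold):
    """Number of normalized eigenvalues above the threshold at each frequency."""
    return (eigs > threshold).sum(axis=0)


threshold = 0.05   # hypothetical cut-off, not a value given in the text
modes_freeway = significant_modes(freeway, threshold)
modes_engine = significant_modes(engine, threshold)
```

With this cut-off the engine noise never needs more than 3 modes while the freeway noise needs 4 at several frequencies, which is in line with the loudspeaker counts discussed above.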
Since the car’s loudspeakers can only play stereo signals, and the combined
noise fields require no fewer than 4 independent loudspeaker channels for effective
control, we do not expect a high noise energy attenuation over 3 or 4 seats. However,
we expect the loudspeakers to simultaneously cancel the noise over two control
regions with good results. In order to validate our expectations, we use (6.41)
to calculate the expected noise attenuation for simultaneous noise cancellation for
2, 3 and 4 seats; the results are shown in Figs. 6.12-6.15. The noise cancellation
[Figure 6.12 plot: average attenuation (dB) versus frequency (Hz), 40 to 200 Hz; curves: 60 km/h, 80 km/h, 100 km/h, Engine Noise.]
Figure 6.12: Expected noise power attenuation after noise cancellation in the two
front seats only.
performance for the two front seats only is shown in Fig. 6.12. The attenuations are
calculated for frequencies from 40 Hz to 200 Hz, and for driving speeds at 60 km/h,
80 km/h and 100 km/h. The attenuation for the engine noise is also included in the
figure. We can see from Fig. 6.12 that the attenuation for all three driving speeds
are very similar. The residual noise level is highest at 40 Hz, and gradually reduces
to around -40 dB for all three driving speeds. The engine noise, on the other hand,
can be effectively cancelled at most frequency bins. We believe this is because of
the low dimensionality of the engine noise field, as is shown in Table 6.3.
Since we are only considering the 0th order coefficients in our calculations, while
ignoring the other coefficients which contribute to approximately 1 percent of total
noise energy, the upper bound of actual achievable attenuation would be around 20
dB, depending on the loudspeakers’ ability to attenuate the higher order coefficients.
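The 20 dB figure follows from a short calculation, sketched here under the assumption that the 0th order component is cancelled completely while the higher order components, carrying roughly 1 percent of the total noise energy, are left untouched. The symbols $E_{\text{total}}$ and $E_{\text{higher}}$ are introduced only for this illustration.

```latex
\[
A_{\max} \approx 10 \log_{10} \frac{E_{\text{total}}}{E_{\text{higher}}}
         = 10 \log_{10} \frac{1}{0.01}
         = 20~\text{dB}
\]
```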
Figure 6.13 shows the results for simultaneous noise control for the two right
side seats. A trend similar to that in Fig. 6.12 can be observed. We believe that
the attenuation increases with frequency because of the impact of wavelength on the
loudspeaker channels: at low frequencies, the sound pressure at two different seats
due to one particular loudspeaker is very similar. Therefore the loudspeaker channel
matrix is highly coupled at low frequencies, resulting in less noise attenuation
under the same output power constraint.
Figure 6.13: Expected noise power attenuation after noise cancellation in the two
right side seats only. [Plot: average attenuation (dB) versus frequency (Hz),
40-200 Hz; curves for 60 km/h, 80 km/h, 100 km/h and engine noise.]
Figure 6.14 illustrates the expected ANC performance for simultaneous 3-seat noise
control (two front seats and left passenger seat). As expected, the noise energy
reduction is significantly worse than in the two-seat cases, with around 10 dB
reduction across all frequency bins of interest. We also notice that the engine noise
is no longer easier to cancel than the other noise fields, apart from a few frequency
bands (40-60 Hz). This is consistent with Table 6.3, where the third and fourth
eigenvalues of engine noise at 40 Hz are very small, indicating a sparse noise field
with 2 degrees of freedom, which can therefore be controlled by a stereo system. We
also include Fig. 6.15, which depicts the four-seat ANC performance. Compared to
Fig. 6.14, the attenuation is even smaller, at around 6-7 dB. However, the ANC
performance is once again consistent over different driving speeds. From this
observation, we estimate that the noise fields at different driving speeds are
similar, and that a loudspeaker array’s capability of controlling in-car noise does
not vary greatly with driving speed.
The attenuation of the engine noise is often lower than that of the noise fields under
various driving conditions. However, from our subjective tests, the majority of the
noise in the car cabin came from the tires and suspension; the engine noise only
plays a small part in the overall perceived noise. Therefore, it is understandable
that the overall noise reduction is different from the engine noise suppression under
the same conditions.
Figure 6.14: Expected noise power attenuation after noise cancellation in the two
front seats and the left passenger seat. [Plot: average attenuation (dB) versus
frequency (Hz), 40-200 Hz; curves for 60 km/h, 80 km/h, 100 km/h and engine noise.]
We would like to point out that the analysis of multiple-seat ANC performance
is limited to 200 Hz and 0th order only because of the limitations in the hardware
setup, more specifically, the lack of synchronized higher order microphones. In order
to obtain the analysis results for higher frequency and/or larger control regions,
it is necessary to replace the omni-directional microphones that are used in this
experiment with suitable higher order microphones, so that sound field components
of higher orders can be captured.
Figure 6.15: Expected noise power attenuation after noise cancellation in all four
seats. [Plot: average attenuation (dB) versus frequency (Hz), 40-200 Hz; curves for
60 km/h, 80 km/h, 100 km/h and engine noise.]
6.5 Summary
In this chapter, we proposed a method to enhance the sound field reproduction
quality over a large region by prioritizing the reproduction at some smaller
sub-zones. This method improves the sound field reproduction accuracy at the smaller
sub-zones at the cost of slightly worse overall reproduction accuracy. This method
is especially useful when there is an insufficient number of loudspeakers available
for sound field reproduction.
We also proposed a new metric for measuring average noise level within a region.
It is shown that this metric is more robust and accurate than the commonly used
method where the noise level is determined by averaging the noise pressure measured
by microphones.
This metric is then utilized to develop a method to estimate the potential
performance of a spatial ANC system. We use this method to evaluate the in-car
loudspeakers’ capability of cancelling the in-car noise at the passengers’ head
positions; it was shown that the loudspeakers have the capability to attenuate the
noise level at lower frequencies for the given region of interest.
7.1 Introduction
Active noise cancellation (ANC) over space has been a hot topic of research in
the last two decades. Typically, ANC systems targeting spatial noise reduction
are realized by Multi-Input Multi-Output (MIMO) systems [126] employing a
feed-forward or feedback control algorithm [6]. Popular applications of such systems
include aircraft cabin noise reduction [127] and automobile noise reduction [7, 128].
The most widely used MIMO ANC systems employ a number of microphones,
placed within the spatial area where noise attenuation is desired. The algorithms
are designed to minimize the average noise level captured by the error microphones
130 Spatial active noise cancellation system architectures
in the least-mean-square sense [6], through playing back counter-noise signals from a
number of loudspeakers. Using these algorithms, the noise attenuation is only max-
imized at the positions of the error microphones, while the overall noise attenuation
quality within the region of interest cannot be guaranteed.
We develop the algorithm in the frequency domain, and since most ANC
algorithms are implemented in the time domain, we also present the time domain
equivalent of the algorithm, which can be realized through time domain filtering of
microphone signals. In order to validate the proposed algorithm, a prototype
spatial ANC system is built inside our laboratory. The system is used to investigate
the spatial ANC performance of the proposed algorithm under various hardware
configurations.
7.2 Background theory
where
\[ y'_z(n) = \sum_{v} y_v(n) * s_{vz}, \quad \text{for } z = 1, 2, \ldots, Z \tag{7.2} \]
$y_v(n)$ is the driving signal for the $v$th loudspeaker, $s_{vz}$ is the secondary channel
between the $v$th loudspeaker and $z$th error microphone, and “$*$” denotes linear
convolution.
The secondary source driving signals in each iteration can be represented by
\[ y_v(n) = \sum_{u} \mathbf{w}_{uv}^T(n)\, \mathbf{x}_u(n), \quad \text{for } v = 1, 2, \ldots, V \tag{7.3} \]
where $\mathbf{w}_{uv}(n) = [w_{uv,0}(n), w_{uv,1}(n), \ldots, w_{uv,L-1}(n)]^T$ are the adaptive filter
coefficients in the $n$th iteration, and $L$ is the length of the FIR adaptive filters.
Figure 7.1: Block diagram of the time domain feedforward ANC system.
The update equation of the multi-channel FxLMS algorithm is given by
\[ \mathbf{w}_{uv}(n+1) = \mathbf{w}_{uv}(n) - \mu \sum_{z} \mathbf{x}'_{uvz}(n)\, e_z(n), \quad \text{for } v = 1, 2, \ldots, V, \text{ and } u = 1, \ldots, U \tag{7.4} \]
where $\mu$ is the step size, $\mathbf{x}'_{uvz}(n) = [x'_{uvz}(n), \ldots, x'_{uvz}(n-L+1)]^T$ is the vector of the
latest $L$ filtered reference signals, and the filtered reference signals can be obtained
by
\[ x'_{uvz}(n) = x_u(n) * \hat{s}_{vz}. \tag{7.5} \]
Filtering the reference signal $x_u(n)$ by the secondary channel estimate helps to
improve the convergence speed of the adaptive algorithm, especially when the
secondary path has a long delay [6].
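The structure of (7.3)-(7.5) can be sketched for the single-channel case (U = V = Z = 1) in Python. This is an illustrative simulation only: the function name `fxlms`, the primary path `p`, the secondary path `s`, the step size and the assumption that the secondary path estimate is exact are all hypothetical choices, and the residual is modelled as the sum of the primary noise and the loudspeaker contribution.

```python
import numpy as np

def fxlms(x, d, s, s_hat, L=16, mu=0.01):
    """Single-channel FxLMS. x: reference, d: primary noise at the error microphone,
    s: true secondary path (used only to simulate the acoustics), s_hat: its estimate."""
    w = np.zeros(L)                   # adaptive FIR coefficients
    x_buf = np.zeros(L)               # latest L reference samples
    xf_buf = np.zeros(L)              # latest L filtered-reference samples
    xs_buf = np.zeros(len(s_hat))     # reference history used to compute x'(n)
    y_buf = np.zeros(len(s))          # loudspeaker output history
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y_buf = np.roll(y_buf, 1)
        y_buf[0] = w @ x_buf          # driving signal, cf. (7.3)
        e[n] = d[n] + s @ y_buf       # residual: primary noise plus secondary contribution
        xs_buf = np.roll(xs_buf, 1)
        xs_buf[0] = x[n]
        xf_buf = np.roll(xf_buf, 1)
        xf_buf[0] = s_hat @ xs_buf    # filtered reference, cf. (7.5)
        w = w - mu * xf_buf * e[n]    # coefficient update, cf. (7.4)
    return e, w

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                 # white reference noise
p = np.array([0.0, 0.0, 0.0, 0.9, 0.4])       # hypothetical primary path
s = np.array([0.0, 0.7, 0.2])                 # hypothetical secondary path
d = np.convolve(x, p)[:len(x)]
e, w = fxlms(x, d, s, s_hat=s)
```

In this toy setup the primary path has a longer delay than the secondary path, so a causal controller exists and the residual power decays as the filter converges.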
Figure 7.2: Block diagram of the Frequency Domain feedforward ANC system.
where $F_{2N}$ denotes the $2N$-point FFT, $j$ is the frame index, and $\mathbf{x}(j)$ and $\mathbf{x}(j+1)$ denote the
previous and current frame of input data, respectively.
The cancelling signal is generated by convolving the reference signal and the
filter $\mathbf{w}(k)$ in the frequency domain, and discarding the first $N$ samples of the
IFFT output, which can be expressed as
\[ \mathbf{y}(j+1) = [\mathbf{O}_N \;\; \mathbf{I}_N]\, F_{2N}^{-1} [\mathbf{X}(j+1) \otimes \mathbf{W}(j)], \tag{7.7} \]
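The frame-based filtering in (7.7) is an overlap-save convolution. The NumPy sketch below shows one frame of it, assuming that the frequency domain filter is the $2N$-point FFT of the length-$N$ filter padded with $N$ zeros; the function name `overlap_save_frame` and the test signals are illustrative assumptions.

```python
import numpy as np

def overlap_save_frame(x_prev, x_curr, w):
    """Filter one frame. x_prev, x_curr and w are real arrays of length N."""
    N = len(w)
    X = np.fft.fft(np.concatenate([x_prev, x_curr]))   # 2N-point FFT of two frames
    W = np.fft.fft(np.concatenate([w, np.zeros(N)]))   # zero-padded filter (assumption)
    y_full = np.fft.ifft(X * W).real                   # 2N-point circular convolution
    return y_full[N:]                                  # [O_N I_N]: discard the first N samples

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(2 * N)
w = rng.standard_normal(N)
y_frame = overlap_save_frame(x[:N], x[N:], w)
y_ref = np.convolve(x, w)[N:2 * N]                     # same samples from linear convolution
```

The last $N$ output samples are free of circular wrap-around, which is why only they are kept.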
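The filtering of the reference signal with the secondary path estimate is
implemented in a similar manner. Denoting the estimated secondary path impulse
response of length $N$ as $\hat{\mathbf{s}}$, the frequency domain filtered reference signal
$\mathbf{X}'(j+1)$ can be calculated as
\[ \mathbf{X}'(j+1) = F_{2N} \begin{bmatrix} \mathbf{x}(j) \\ \mathbf{x}(j+1) \end{bmatrix} \otimes \hat{\mathbf{S}}, \tag{7.9} \]
where $\hat{\mathbf{S}}$ is the $2N$-point FFT of $\hat{\mathbf{s}}$,
\[ \hat{\mathbf{S}} = F_{2N} \begin{bmatrix} \mathbf{I}_N \\ \mathbf{O}_N \end{bmatrix} \hat{\mathbf{s}}. \tag{7.10} \]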
The adaptive filter $\mathbf{w}(k)$ is implemented in the time domain, in the form of a
vector of length $N$. The filter is updated when a new frame of reference signal and
error signal is available, and the update equation can be written as
\[ \mathbf{c}(j+1) = [\mathbf{O}_N \;\; \mathbf{T}_N]\, F_{2N}^{-1} [\mathbf{X}'(j+1) \otimes \mathbf{E}(j+1)], \tag{7.12} \]
with $\mathbf{T}_N$ being a time reversal matrix, with its secondary diagonal equal to 1, and
other entries equal to 0. The frequency domain error signal $\mathbf{E}(j+1)$ is obtained
by taking the $2N$-point FFT of the latest frame of the time domain error signal,
expressed as
\[ \mathbf{E}(j+1) = F_{2N} \begin{bmatrix} \mathbf{I}_N \\ \mathbf{O}_N \end{bmatrix} \mathbf{e}(j+1). \tag{7.13} \]
\[ 0 \le \mu \le \frac{1}{N \lambda_{\max}}, \tag{7.14} \]
where $\lambda_{\max}$ is the maximum eigenvalue of the autocorrelation matrix of the input
signal, and
\[ E[\mathbf{X}'^{T}(k)\, \mathbf{X}'(j)] = 0, \quad k \neq j. \tag{7.15} \]
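\[ \mathbf{w}_{uv}(j+1) = \mathbf{w}_{uv}(j) + 2\mu \sum_{z=1}^{Z} \mathbf{c}_{uvz}(j+1), \tag{7.16} \]
where
\[ \mathbf{c}_{uvz}(j+1) = [\mathbf{O}_N \;\; \mathbf{T}_N]\, F_{2N}^{-1} [\mathbf{X}'_{uvz}(j+1) \otimes \mathbf{E}_z(j+1)], \tag{7.17} \]
\[ \mathbf{y}_v(j+1) = \sum_{u=1}^{U} [\mathbf{O}_N \;\; \mathbf{I}_N]\, F_{2N}^{-1} [\mathbf{X}_u(j+1) \otimes \mathbf{W}_{uv}(j)]. \tag{7.18} \]
It can be seen that the computational complexity grows quickly as the number of
reference signals, loudspeakers and error microphones increases.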
Spatial ANC systems using circular harmonic transform have been proposed in [31].
The overall data flow in this method is similar to the multi-channel ANC algorithm.
However, in this method, the error microphones form a circular array, which is
surrounded by the secondary loudspeaker array, also taking a circular geometry.
The reference microphones surround the loudspeaker array, and form a third circular
array. The extensive use of circular array geometries allows for transforming both
the reference and error signals into circular harmonic coefficients; in addition, the
secondary loudspeaker channels H v (k) are also transformed into circular harmonic
domain, under the assumption that the loudspeakers are point sources.
In this method, the filtered reference signals are generated by filtering the
reference circular harmonic coefficients through the secondary channel circular harmonic
coefficients; therefore $\mathbf{X}'(k)$ is also in the circular harmonic domain. Furthermore,
since the error signals are also transformed into circular harmonics, the adaptive
filter, which takes the same form as in the MIMO adaptive algorithm, operates in
the circular harmonic domain, and updates the adaptive filter $\mathbf{W}(k)$, which contains
circular harmonic coefficients that mimic the primary channel.
The loudspeaker driving signals are first generated by filtering the reference
coefficients through the adaptive filter, also in the form of circular harmonic coefficients.
Then, an inverse circular harmonic transform maps the coefficients to each individual
loudspeaker, and produces the final output signal for each speaker.
Due to the use of the circular harmonic transform, this method is able to attenuate the
noise within the 2D space covered by the error microphone array, while significantly
reducing the computational complexity, compared to a multi-channel algorithm
using the same loudspeaker-microphone setup [31]. However, one disadvantage of this
method is that in order to express the secondary sources using circular harmonics,
the loudspeakers have to be arranged as a circular array.
7.3 Frequency domain feed-forward architecture for spatial ANC systems
where $\boldsymbol{\beta}(k, j)$ is a vector of length $(L+1)^2$, containing all the spherical harmonic
coefficients for the error signals at frequency $k$ and frame index $j$, and $\mathbf{E}(k, j)$ is a
vector of length $Z$, containing the signals of all error microphones at frequency $k$
and frame index $j$. $\mathbf{T}(k)$ is the transformation matrix specific to the frequency bin
and microphone array geometry. For a uniform spherical error microphone array of
Figure 7.3: Block diagram of the frequency domain feedforward spatial ANC system.
The spherical harmonics transform for the filtered reference signals can be defined
similarly as
\[ \boldsymbol{\alpha}_{uv}(k, j) = \mathbf{T}(k)\, \mathbf{X}'_{uv}(k, j), \tag{7.21} \]
where $\boldsymbol{\alpha}_{uv}(k, j)$ are the spherical harmonic coefficients of reference signal $u$ filtered
through the channel responses of secondary source $v$.
For a microphone array suitable for spherical harmonics analysis, the transformation
of the signal to the spherical harmonic domain changes the number of channels
from $Z$ to $(L+1)^2$, and since $Z \ge (L+1)^2$ when no spatial aliasing occurs, the
transform scales the complexity of the adaptive algorithm by a factor of approximately
$(L+1)^2/Z$, which is at most 1.
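As a rough worked example of this ratio, assume an array of $Z = 6$ microphones and a truncation order of $L = 1$; both values are chosen here purely for illustration.

```latex
\[
\frac{(L+1)^2}{Z} = \frac{(1+1)^2}{6} = \frac{4}{6} \approx 0.67
\]
```

Under these assumed values the adaptive algorithm would process 4 coefficient channels instead of 6 microphone channels.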
The Least-Mean-Square algorithm employed in the system aims to minimize
the mean square of all the error inputs. For a multi-channel ANC system without
spherical harmonics transformation, the optimization goal is
\[ \min\Big\{ \sum_{z} |E_z(k, j)|^2 \Big\} \tag{7.22} \]
which minimizes the average signal energy over the error microphones. After
applying the spherical harmonic transform to both $\mathbf{X}'(k)$ and $\mathbf{E}(k)$, the LMS
minimization criterion becomes
\[ \min\Big\{ \sum_{l} \sum_{m} |\beta_{lm}(k, j)|^2 \Big\} \tag{7.23} \]
which approximately reduces the noise level within a sphere, rather than at a finite
number of points. However, in order to achieve minimum residual acoustic potential
energy, an additional weighting needs to be applied to each of the obtained spherical
harmonic coefficients, such that
\[ \min\Big\{ \sum_{l} \sum_{m} |W_l\, \beta_{lm}(k, j)|^2 \Big\} = \min\Big\{ \int_S P(r, \theta, \phi, k)\, dS \Big\}, \tag{7.24} \]
where the expression of the weights $W_l$ is given in (6.20). The same weighting needs
to be applied to both the filtered reference signal coefficients and the error signal
coefficients, as shown in Fig. 7.3. The weighting procedure can be combined with the
spherical harmonics transform, and the weighted transform matrix can be expressed
as
\[ \mathbf{T}_W(k) = \mathcal{W}(k)\, \mathbf{T}(k) \tag{7.25} \]
where $\mathcal{W}(k)$ is an $(L+1)^2$-by-$(L+1)^2$ diagonal matrix, with its diagonal elements
arranged as $\mathrm{diag}\{\mathcal{W}(k)\} = [W_0(k), W_1(k), W_1(k), \ldots, W_L(k)]^T$.
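The construction of the weighted transform in (7.25) can be sketched as follows. The sketch assumes that each weight $W_l$ is repeated once for each of the $2l+1$ modes of order $l$; the numerical weights and the transform matrix `T` are placeholders, since the actual values come from (6.20) and the array geometry, and the helper name `weight_matrix` is hypothetical.

```python
import numpy as np

def weight_matrix(order_weights):
    """Diagonal matrix with W_l repeated for each of the 2l+1 modes of order l."""
    diag = np.concatenate([np.full(2 * l + 1, w_l)
                           for l, w_l in enumerate(order_weights)])
    return np.diag(diag)

L = 1                                        # truncation order (illustrative)
Z = 6                                        # number of error microphones (illustrative)
order_weights = np.array([1.0, 0.5])         # placeholder values for W_0, W_1
T = np.ones(((L + 1) ** 2, Z)) / Z           # placeholder for the transform matrix T(k)
T_W = weight_matrix(order_weights) @ T       # weighted transform, cf. (7.25)
```

Because the weighting is diagonal, folding it into the transform matrix costs nothing extra at run time.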
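The adaptive filter $\mathbf{w}_{uv}$ for the $u$th reference signal and $v$th secondary source is
updated according to
\[ \mathbf{w}_{uv}(j+1) = \mathbf{w}_{uv}(j) + 2\mu \sum_{l,m} \mathbf{c}^{uv}_{lm}(j+1) \tag{7.26} \]
where $\mathbf{c}^{uv}_{lm}(j+1)$ is generated by taking the IFFT of the product of $\boldsymbol{\alpha}_{uv}(k, j+1)$
and $\boldsymbol{\beta}(k, j+1)$ for all $k$,
\[ \mathbf{c}^{uv}_{lm}(j+1) = [\mathbf{O}_N \;\; \mathbf{T}_N]\, F_{2N}^{-1} [\boldsymbol{\alpha}_{uv}(j+1) \otimes \boldsymbol{\beta}(j+1)]. \tag{7.27} \]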
It can be seen that due to the frame-based data processing scheme, generation of
the secondary driving signals as well as updating of the adaptive filter are delayed
by at least $N$ samples, which is undesirable for ANC applications. One way to
reduce the delay of the secondary path signal is to implement (7.28) in the time
domain, i.e., use the time domain convolution method to generate the secondary
signals [129]. This way, the latency of the secondary path signal can be reduced
to the same level as that of time domain ANC algorithms, at the cost of computational
efficiency. However, updating of the adaptive filter still needs to be processed frame
by frame; therefore the updating latency of the adaptive filter is always higher than
in time domain implementations.
7.4 Time domain feed-forward architecture for spatial ANC systems
\[ P(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \tau_{lm}(k, r)\, Y_{lm}(\theta, \phi). \tag{7.29} \]
By comparing (7.29) with (2.1), it can be seen that $\tau_{lm}(k, r)$ is related to the
commonly used spherical harmonic coefficients by
\[ p(r, \theta, \phi, n) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \hat{\tau}_{lm}(n, r)\, Y_{lm}(\theta, \phi), \tag{7.31} \]
where $p(r, \theta, \phi, n)$ is the sound pressure at a discrete time index $n$, and $\hat{\tau}_{lm}(n, r)$
is a set of coefficients defined in the time domain. The physical meaning of (7.31)
is that the sound pressure on the surface of a spherical region at a certain time
instant can be represented by an infinite summation of spherical harmonics. We note
that $p(r, \theta, \phi, n)$ is the sum of spatial sound of all frequencies, and since higher
frequency sound has a shorter wavelength and hence a more complicated pressure field,
it is necessary to use higher order spherical harmonics to represent the higher
frequency spatial sound components.
However, since for a given frequency, only a finite number of spherical harmonics
are needed to represent the sound field [53], if the sound signal is band limited, the
infinite summation in (7.31) may be truncated to a finite summation up to order $L$
given by (2.7). When sampling the sound field using a finite number of microphones,
in order to avoid spatial aliasing, it is necessary to low pass filter the input signal
before applying the time domain spherical harmonic transform, so that the higher
order spherical harmonics associated with higher frequency sound components are
removed from the input signal.
Rearranging (7.30) and taking its inverse Fourier transform, we have
where $\hat{C}_{lm}(n) = \mathcal{F}^{-1}\{C_{lm}\}$. From (7.32), it can be seen that the time domain
spherical harmonic coefficients $\hat{C}_{lm}(n)$ can be obtained by filtering the corresponding
$\hat{\tau}_{lm}(n, r)$ with a filter whose frequency response is equal to $1/j_l(kr)$.
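The filtering step described here can be sketched in three lines, under the assumption that the frequency domain relation takes the standard interior sound field form in which each coefficient is scaled by the spherical Bessel function $j_l(kr)$. The first line is therefore an assumed form, not a quotation of (7.30).

```latex
\begin{align}
\tau_{lm}(k, r) &= C_{lm}(k)\, j_l(kr)
  && \text{(assumed interior-field relation)} \\
C_{lm}(k) &= \frac{\tau_{lm}(k, r)}{j_l(kr)}
  && \text{(rearranged)} \\
\hat{C}_{lm}(n) &= \hat{\tau}_{lm}(n, r) * \mathcal{F}^{-1}\!\left\{ \frac{1}{j_l(kr)} \right\}
  && \text{(inverse Fourier transform)}
\end{align}
```

The last line uses the fact that a product in the frequency domain corresponds to a convolution in the time domain, which is consistent with the $1/j_l(kr)$ filter mentioned in the text.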
Since time domain sound pressure signals are real-valued, it is sufficient to use
real-valued spherical harmonics (2.32) as the basis functions. The sound pressure at
a certain location $(r', \theta', \phi')$ and time index $n$ can be obtained by
\[ p(r', \theta', \phi', n) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \hat{C}_{lm}(n) * \mathcal{F}^{-1}\{j_l(kr')\}\, Y^R_{lm}(\theta', \phi'). \tag{7.33} \]
We note that the filters corresponding to $1/j_l(kr)$ and $W_l(k)$ can be designed
such that their frequency responses are accurate only for the frequency band of
interest, so as to reduce the difficulty and complexity of the filter design problem.
We propose a time domain ANC system based on the time domain multichannel
ANC architecture and the time domain spherical harmonic analysis techniques. The
system structure is illustrated in Fig. 7.4.
Figure 7.4: Block diagram of the time domain feedforward spatial ANC system.
In the proposed system, the filtered reference signals $x'_{uvz}(n)$ are obtained by
The filtered reference signal is then passed through a low pass filter, whose cutoff
frequency equals the maximum operating frequency of the ANC system. The
maximum frequency needs to agree with the capability of the error microphone array,
i.e., the array should be able to capture the spatial sound field at this frequency without
spatial aliasing.
For a uniform spherical error microphone array, the time domain coefficients
$\hat{\tau}^{uv}_{lm}(n)$ of order $l$ and mode $m$, due to reference signal $u$ and secondary source $v$, are
obtained by
\[ \hat{\tau}^{uv}_{lm}(n) = \sum_{z} x'_{uvz}(n)\, Y^R_{lm}(\theta_z, \phi_z), \tag{7.36} \]
where $(\theta_z, \phi_z)$ denotes the angular position of the $z$th error microphone. For spherical
microphone arrays that do not have a uniform spatial sampling scheme, the
alternative option is to solve for the coefficients in the least-squares sense, which
can be expressed by
\[ \hat{\boldsymbol{\tau}}^{uv}(n) = (\mathbf{Y}^R)^{-1} \mathbf{x}'_{uv}(n), \tag{7.37} \]
where $\hat{\boldsymbol{\tau}}^{uv}(n) = [\hat{\tau}^{uv}_{00}(n), \hat{\tau}^{uv}_{11}(n), \hat{\tau}^{uv}_{10}(n), \ldots]^T$ is the vector of all the coefficients $\hat{\tau}^{uv}_{lm}$ at
time instant $n$, $\mathbf{x}'_{uv}(n) = [x'_{uv1}(n), x'_{uv2}(n), \ldots]^T$ is the vector containing the filtered
reference signals for all the error microphones at time instant $n$, and $(\mathbf{Y}^R)^{-1}$ is the
Moore-Penrose pseudo inverse of the matrix $\mathbf{Y}^R$, which is given by
\[ \mathbf{Y}^R = \begin{bmatrix}
Y^R_{00}(\theta_1, \phi_1) & Y^R_{11}(\theta_1, \phi_1) & Y^R_{10}(\theta_1, \phi_1) & \cdots \\
Y^R_{00}(\theta_2, \phi_2) & Y^R_{11}(\theta_2, \phi_2) & Y^R_{10}(\theta_2, \phi_2) & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
Y^R_{00}(\theta_Z, \phi_Z) & Y^R_{11}(\theta_Z, \phi_Z) & Y^R_{10}(\theta_Z, \phi_Z) & \cdots
\end{bmatrix}. \tag{7.38} \]
The pseudo inverse of the matrix $\mathbf{Y}^R$ can be obtained offline; therefore it does not
increase the computational complexity of the algorithm.
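A minimal NumPy sketch of the pseudo inverse approach in (7.37)-(7.38) is given below for orders 0 and 1. The real spherical harmonic normalization and the mode ordering [Y00, Y11, Y10, Y1-1] are assumptions (the convention of (2.32) may differ), and the six microphone angles and the function name `real_sh_order1` are illustrative.

```python
import numpy as np

def real_sh_order1(theta, phi):
    """Real spherical harmonics up to order 1, columns ordered [Y00, Y11, Y10, Y1-1].
    theta is the polar angle and phi the azimuth, both in radians."""
    c0 = np.sqrt(1.0 / (4.0 * np.pi))
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([
        c0 * np.ones_like(theta),
        c1 * np.sin(theta) * np.cos(phi),
        c1 * np.cos(theta),
        c1 * np.sin(theta) * np.sin(phi),
    ], axis=-1)

# Illustrative six-microphone layout (degrees converted to radians).
theta = np.deg2rad(np.array([60.0, 60.0, 60.0, 120.0, 120.0, 120.0]))
phi = np.deg2rad(np.array([0.0, 120.0, 240.0, 60.0, 180.0, 300.0]))

Y = real_sh_order1(theta, phi)        # Z x (L+1)^2 matrix, cf. (7.38)
Y_pinv = np.linalg.pinv(Y)            # computed once, offline

tau_true = np.array([0.2, 0.0, 1.0, -0.3])   # synthetic coefficients
x_mics = Y @ tau_true                         # microphone samples at one time instant
tau = Y_pinv @ x_mics                         # least-squares estimate, cf. (7.37)
```

For this layout the four columns of `Y` are linearly independent, so the synthetic coefficients are recovered exactly up to rounding.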
For clarity, instead of using $\hat{C}_{lm}(n)$, we use $\hat{\alpha}^{uv}_{lm}(n)$ and $\hat{\beta}_{lm}(n)$ to represent the
time domain spherical harmonic coefficients for the filtered reference signals and
error signals, respectively. We note that both $\hat{\alpha}^{uv}_{lm}(n)$ and $\hat{\beta}_{lm}(n)$ are real-valued
coefficients, due to the use of real-valued spherical harmonics. The time domain
spherical harmonic coefficients can be obtained according to (7.32). Next, the weighted
spherical harmonic coefficients $\tilde{\alpha}^{uv}_{lm}(n)$ are obtained by passing the coefficients $\hat{\alpha}^{uv}_{lm}(n)$
through the weighting filter according to (7.34).
As can be seen in Fig. 7.4, the error signals from the microphone array are
processed in the same way as the filtered reference signal, and it is necessary that
the same low pass filter is used for both the reference signal and the error signal. We
denote the weighted error coefficients as $\tilde{\beta}_{lm}(n)$.
Since the inputs to the Least-Mean-Square algorithm are spherical harmonic
coefficients, the LMS algorithm operates in the spherical harmonic domain. The
adaptive filter bank $\mathbf{w}(n)$ has the size $U \times V \times L$. At each time instant, $\mathbf{w}(n)$ is
updated using the equation
\[ \mathbf{w}_{uv}(n+1) = \mathbf{w}_{uv}(n) + \mu \sum_{l} \sum_{m} \tilde{\boldsymbol{\alpha}}^{uv}_{lm}(n) \otimes \tilde{\boldsymbol{\beta}}_{lm}(n), \tag{7.39} \]
where $\tilde{\boldsymbol{\alpha}}^{uv}_{lm}(n) = [\tilde{\alpha}^{uv}_{lm}(n), \tilde{\alpha}^{uv}_{lm}(n-1), \tilde{\alpha}^{uv}_{lm}(n-2), \ldots, \tilde{\alpha}^{uv}_{lm}(n-L+1)]$ is the vector of the
latest $L$ samples of the reference signal coefficients, and $\tilde{\boldsymbol{\beta}}_{lm}(n) = [\tilde{\beta}_{lm}(n), \tilde{\beta}_{lm}(n-1), \tilde{\beta}_{lm}(n-2), \ldots, \tilde{\beta}_{lm}(n-L+1)]$
is the vector of the latest $L$ samples of the error coefficients.
The driving signal for the $v$th loudspeaker is generated in the same way as in the
existing multi-channel algorithm, given by
\[ y_v(n) = \sum_{u} \mathbf{w}_{uv}(n)\, \mathbf{x}_u(n)^T. \tag{7.40} \]
The steps of filtering the reference signal and converting the filtered signal into
the spherical harmonic domain may be simplified. Considering the equation
\begin{align}
\hat{\alpha}^{uv}_{lm}(n) &= \Big( \sum_{z} (x_u * \hat{s}_{vz})\, Y^R_{lm}(\theta_z, \phi_z) \Big) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\} \tag{7.41} \\
&= x_u * \Big( \sum_{z} \hat{s}_{vz}\, Y^R_{lm}(\theta_z, \phi_z) \Big) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\}, \tag{7.42}
\end{align}
we may define the spherical harmonic domain secondary channel impulse response
as
\[ \hat{S}^{v}_{lm} \triangleq \sum_{z} \hat{s}_{vz}\, Y^R_{lm}(\theta_z, \phi_z) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\}, \tag{7.43} \]
which essentially transforms the secondary channel impulse responses into the
spherical harmonic domain, and $\hat{S}^{v}_{lm}$ represents the spherical harmonic impulse response
of order $l$ and mode $m$, due to secondary source $v$. Since $\hat{S}^{v}_{lm}$ can be calculated
offline without knowledge of the reference signal, the computational cost of obtaining
$\hat{\alpha}^{uv}_{lm}(n)$ can be greatly reduced through directly filtering $x_u(n)$ by $\hat{S}^{v}_{lm}$; however, it
should be noted that $x_u(n)$ needs to be low pass filtered first in order to avoid
spatial aliasing.
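The equivalence between (7.41) and the precomputed form (7.42)-(7.43) rests on the linearity and associativity of convolution, which the following NumPy sketch checks numerically. All signals here are random placeholders: `y_lm` stands in for the samples $Y^R_{lm}(\theta_z, \phi_z)$ and the short FIR filter `g_l` stands in for $\mathcal{F}^{-1}\{1/j_l(kr)\}$; neither is derived from a real array.

```python
import numpy as np

rng = np.random.default_rng(1)
Z, n_taps = 6, 8
x_u = rng.standard_normal(256)              # reference signal
s_hat = rng.standard_normal((Z, n_taps))    # secondary path estimates, one row per microphone
y_lm = rng.standard_normal(Z)               # placeholder for Y^R_lm at the Z microphones
g_l = rng.standard_normal(5)                # placeholder FIR for the 1/j_l(kr) filter

# Direct route, cf. (7.41): filter per microphone, combine, then apply g_l.
filtered = np.array([np.convolve(x_u, s_hat[z]) for z in range(Z)])
alpha_direct = np.convolve(y_lm @ filtered, g_l)

# Precomputed route, cf. (7.43) then (7.42): one offline filter, one online convolution.
S_lm = np.convolve(y_lm @ s_hat, g_l)
alpha_fast = np.convolve(x_u, S_lm)
```

The direct route needs Z convolutions with the reference signal at run time, whereas the precomputed route needs only one per harmonic.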
7.5 Experiment validation
microphones are placed on the boundary of the region. The angular positions of
the microphones are (60°, 0°), (60°, 120°), (60°, 240°), (120°, 60°), (120°, 180°) and
(120°, 300°), respectively. An additional microphone is placed at the center of the
array; however, this microphone is only used for monitoring purposes and is not part
of the ANC system.
The audio playback / record as well as real time signal processing are handled
by a desktop PC, with the ANC algorithm implemented using MATLAB R2016. The
proposed time domain feed-forward spatial ANC system is implemented; we also
implemented the time domain MIMO ANC algorithm for comparison. The spatial
ANC algorithm is implemented in a frame based manner, with a frame size of 384
samples at a sampling rate of 44100 Hz. At each frame time, the program receives
audio input from the microphones, performs the ANC algorithm and generates the
loudspeaker driving signals to be played during the next frame. The noise signal is
also generated by the program, in order to create a controlled experimental
environment.
Normally, in a feed-forward ANC system, the reference noise is picked up by
a reference microphone / sensor, sometimes attached to the primary noise source.
The digital signal processing system then processes the reference noise and generates
the anti-noise signals while the noise sound propagates towards the control region.
Ideally, the anti-noise sound is played by the secondary speakers before the noise
reaches the control region. In order to achieve this, the signal processing latency
must be smaller than the propagation time of the noise from the reference microphone
to the control region. Unfortunately, the signal path round trip latency of our system
is more than 2000 samples, or 45 ms, due to the buffering in the computer’s data
path. This means that, should a reference microphone be used, the primary sources
would need to be placed more than 15 meters away from the control region, which is
not possible in our lab conditions¹.
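The distance requirement can be checked with a few lines of Python. The sketch assumes a speed of sound of 343 m/s (a typical room temperature value that is not stated in the text) and uses the 44100 Hz sampling rate and the 2000-sample latency quoted above; the constant names are illustrative.

```python
SPEED_OF_SOUND = 343.0     # m/s, assumed room temperature value
SAMPLE_RATE = 44100        # Hz, sampling rate of the system
LATENCY_SAMPLES = 2000     # round trip latency in samples (lower bound)

latency_s = LATENCY_SAMPLES / SAMPLE_RATE          # latency in seconds
min_distance_m = latency_s * SPEED_OF_SOUND        # distance sound travels in that time
print(f"{latency_s * 1e3:.1f} ms, {min_distance_m:.1f} m")
```

The noise must travel at least this far between the reference sensor and the control region for the anti-noise to arrive in time.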
For this reason, the reference noise is directly picked up in the electronic
path instead of being captured by a microphone, which eliminates the delay in the
capture of the reference signal completely. In addition, the feedback from secondary
sources to the reference microphone is also avoided.
The aim of the experiment is to evaluate the performance of the proposed ANC
system under various system configurations. The target frequency band is 200-500
Hz; this frequency band is chosen because it was found that the typical noise energy
inside cars is below 500 Hz, and because the loudspeakers being used as secondary
sources have limited low frequency capabilities. Within this frequency band, only
the 0th and 1st order spherical harmonic modes exist within the noise field, and the
microphone array can reliably pick up the noise field.
¹ Using embedded signal processing systems such as microcontrollers, DSPs or FPGAs to
implement the adaptive algorithm and AD/DA conversion would significantly reduce the round trip
latency, down to only a few milliseconds or less, in which case primary noise source distance would
not be a problem.
In order to evaluate the system’s performance at each frequency, for each
experiment, a sine wave of a certain frequency is played through one or two of the primary
noise sources; after a small period of time, the ANC algorithm begins to function
and gradually cancels the noise. The sound pressure received by the microphones
is recorded throughout each experiment, and the level of noise attenuation at the
microphone positions is calculated by
\[ A_{\mathrm{mic}} = 10 \log_{10} \frac{\sum_{z} E\{|e^{\mathrm{bef}}_z(n)|^2\}}{\sum_{z} E\{|e^{\mathrm{aft}}_z(n)|^2\}}, \tag{7.44} \]
\[ A_{\mathrm{avg}} = 10 \log_{10} \frac{\sum_{l,m} E\{|\tilde{\beta}^{\mathrm{bef}}_{lm}(n)|^2\}}{\sum_{l,m} E\{|\tilde{\beta}^{\mathrm{aft}}_{lm}(n)|^2\}}, \tag{7.45} \]
where $\tilde{\beta}^{\mathrm{bef}}_{lm}(n)$ and $\tilde{\beta}^{\mathrm{aft}}_{lm}(n)$ represent the weighted error signal coefficients before ANC
begins and after the ANC algorithm fully converges, respectively.
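The attenuation measure in (7.44) can be sketched in plain Python as follows. The expectation is approximated by a time average over each recording, which is an assumption about how the recordings would be processed; the function name `attenuation_db` and the synthetic tone are illustrative. The same function applies to (7.45) if the channels are weighted harmonic coefficients instead of microphone signals.

```python
import math

def attenuation_db(before, after):
    """Attenuation in dB following the form of (7.44).

    before, after: lists of channels, each channel a list of samples.
    The expectation E{.} is approximated by a time average (an assumption).
    """
    def mean_power(channel):
        return sum(v * v for v in channel) / len(channel)

    power_before = sum(mean_power(ch) for ch in before)
    power_after = sum(mean_power(ch) for ch in after)
    return 10.0 * math.log10(power_before / power_after)

# Two synthetic microphone channels: a 300 Hz tone, reduced to 10% amplitude.
fs = 44100
tone = [math.sin(2 * math.pi * 300 * n / fs) for n in range(fs)]
before = [tone, tone]
after = [[0.1 * v for v in tone], [0.1 * v for v in tone]]
a_mic = attenuation_db(before, after)
```

A tenfold amplitude reduction corresponds to a hundredfold power reduction, so the synthetic example evaluates to 20 dB.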
Figure 7.6: Comparison of spatial noise attenuation using MIMO algorithm (blue)
and the proposed spatial ANC algorithm (red). [Plot: attenuation (dB) versus
frequency (Hz), 200-500 Hz; curves labelled MIMO and Harmonic.]
is activated, and we only use speaker 1 (in Fig. 7.5) as the secondary source.
Figure 7.6 plots the average noise attenuation within the control region, using
the existing MIMO ANC method as well as the proposed spatial ANC method.
It can be seen that overall, the two methods result in similar noise attenuations,
and the attenuation at lower frequencies is better than that at higher frequencies.
Since the 6 microphones are placed evenly over the spherical boundary of the
control region (in order to capture the spherical harmonic coefficients), and their
spacing is much smaller than the wavelength of the noise, they provide a very
good representation of the noise level within the field. As a result, the proposed
method does not show a clear advantage over the MIMO method.
However, it can be seen from Fig. 7.6 that at higher frequencies (460 Hz and
above), the spatial ANC method begins to yield consistently better attenuation
than the MIMO method. The reason for this is that as the wavelength shortens,
samples of the noise pressure on the boundary of the control region become less
correlated with the noise field inside the region. As a result, minimizing the
noise pressure at the microphone positions no longer guarantees minimization of the
noise level inside the region. The proposed method, on the other hand, is able to
control the entire region through converting the microphone signals into the spherical
harmonic domain. Should the region size be larger, this phenomenon would be more
pronounced.
Of special notice is the peak at 300 Hz in Fig. 7.6, where the attenuation is 0
dB for both the MIMO algorithm and the spatial algorithm. A careful investigation
reveals that at this frequency, a standing wave is formed between the noise source
and the walls of the lab, with the control region located at a “minimum” point of
the standing wave, i.e., the amplitude of the standing wave is very small. Therefore,
the noise field essentially cancels itself without need of the secondary loudspeakers,
which results in minimal attenuation gain for the ANC system.
Figure 7.7: Spatial noise attenuation using primary noise source 1 and various
numbers of secondary loudspeakers. [Plot: attenuation (dB) versus frequency (Hz),
200-500 Hz; curves for speaker 1, speakers 1 & 2, and speakers 1,2,4,5,6.]
In a multi-channel ANC system, more than one secondary source may be employed
in order to improve the attenuation level of the system. When the control region
is large, or the target frequency band is high, it is expected that a larger number
of secondary sources is required to achieve sufficient noise attenuation, due to
the increased complexity of the noise field. In this experiment, we validate this
assumption using the experimental ANC system.
First, we use only one primary noise source (noise source 1 in Fig. 7.5), and
perform the ANC experiment using the proposed algorithm with (i) speaker 1, (ii)
speakers 1 & 2, and (iii) speakers 1,2,4,5,6. The average noise attenuation for the
three cases at frequencies from 200-500 Hz is shown in Fig. 7.7.
From Fig. 7.7, we can see that the noise attenuation is indeed greater when
more secondary sources are active. Using only speaker 1, the system is able to
achieve around 10 dB attenuation for most frequencies, and overall, the attenuation
is better at lower frequencies (below 300 Hz) than at higher frequencies (above 400
Hz). When speaker 2 is added to the system, a much better attenuation is observed
at the lowest frequencies, while a smaller performance gain is achieved at the higher
frequencies. Neither of the two configurations was able to attenuate the noise at
300 Hz.
Figure 7.8: Spatial noise attenuation using primary noise source 1 and various
numbers of secondary loudspeakers. [Plot: attenuation (dB) versus frequency (Hz),
200-500 Hz; curves for speaker 2, speakers 1 & 3, and speakers 1 - 6.]
When five secondary sources are used in the ANC system, compared to the case
with only speakers 1 and 2, the noise attenuation from 200 Hz to 240 Hz is almost
identical. As frequency increases, the difference between the two configurations
becomes more significant. At 440 Hz and above, the 5-speaker configuration results
in approximately 10 dB higher attenuation than the 2-speaker case. Furthermore,
unlike the other two cases, the 5-speaker setup is able to maintain more than 15 dB
noise reduction throughout the whole frequency band, with the only exception at
300 Hz, where only 1 dB of noise reduction is observed.
The experiment is repeated with both primary noise sources active, with each
playing a sine wave at the same frequency, but with a different phase. This time, the
secondary sources activated are (i) speaker 2, (ii) speakers 1 & 3, and finally
(iii) speakers 1-6. We plot the noise attenuation in Fig. 7.8.
The overall trend shown in Fig. 7.8 is similar to that of Fig. 7.7, where the single
secondary source case results in the least noise attenuation, and the case where
all 6 speakers are used has the most attenuation. The overall trend of decreasing
ANC performance with increased frequency is also observed for the single and dual
secondary source cases. At 420 Hz and above, using two speakers (speakers 1 and 3)
does not provide significant improvement over using speaker 2 alone. However, when
all 6 speakers are used, the noise attenuation can be further improved by 5-10 dB.
We also note that when both primary sources are active, the system no longer
experiences difficulty at 300 Hz. This is because the speakers are capable of cancelling
the noise field due to the second noise source, whose noise field does not exhibit the
self-cancelling behavior of noise source 1.
It can be seen from both Fig. 7.7 and Fig. 7.8 that at lower frequencies, a small
number of secondary sources is sufficient to provide more than 15 dB of noise
attenuation, and the benefit of adding more secondary sources is only marginal.
This is because, given the radius of our control region (0.145 m), the 0th order
spherical harmonic mode is dominant within the control region at around 200 Hz.
Since the 0th mode is uniform and isotropic, a secondary source located in any
direction is capable of producing this mode within the control region. Therefore,
only one or two secondary speakers are sufficient to reproduce the noise field
generated by the noise source, resulting in high attenuation of the noise energy.
On the other hand, at higher frequencies, the 1st order spherical harmonic modes
begin to have more impact on the sound field. Since 1st order modes are directional,
a single secondary source can produce good results only if the sound field it pro-
duces within the control region is very similar to that of the primary noise source.
Because this is very unlikely to be the case, using a single or a small number of
secondary sources generally cannot provide high attenuation at higher frequencies.
The combined use of multiple secondary sources, however, can substantially improve
the spatial ANC system’s sound field reproduction capability, and hence leads to better
results.
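The relative weight of the 0th and 1st order modes at the two ends of the tested band can be gauged from the spherical Bessel functions j_0(kr) and j_1(kr) evaluated at the edge of the control region. The sketch below assumes a speed of sound of 343 m/s and uses the closed-form expressions for j_0 and j_1; the function names are introduced here for illustration. It only compares the radial functions, whereas the actual modal content also depends on the source positions and the room.

```python
import math

C = 343.0   # assumed speed of sound in air (m/s)
R = 0.145   # control-region radius quoted in the text (m)

def j0(x):
    """Spherical Bessel function of the first kind, order 0 (x > 0)."""
    return math.sin(x) / x

def j1(x):
    """Spherical Bessel function of the first kind, order 1 (x > 0)."""
    return math.sin(x) / x ** 2 - math.cos(x) / x

def first_to_zeroth_ratio(freq_hz, radius=R):
    """|j1(kr) / j0(kr)| at the boundary of the control region."""
    kr = 2.0 * math.pi * freq_hz / C * radius
    return abs(j1(kr) / j0(kr))

for f in (200.0, 500.0):
    kr = 2.0 * math.pi * f / C * R
    print(f"{f:.0f} Hz: kr = {kr:.2f}, |j1/j0| = {first_to_zeroth_ratio(f):.2f}")
```

Under these assumptions the ratio is roughly 0.18 at 200 Hz and roughly 0.5 at 500 Hz, which is consistent with the 0th order mode dominating at the low end of the band and the directional 1st order modes becoming significant towards the high end.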
In Section 7.5.2, it was stated that the noise attenuation is related to the secondary
source’s ability to reproduce the noise field, which in turn is related to the placement
of secondary sources in relation to the primary noise. This is investigated in more
detail in this section.
In this experiment, we first use primary source 1 to generate the noise field, and
compare the ANC performance using secondary speaker 1 and secondary speaker 3.
We plot the experiment results in Fig. 7.9.
Figure 7.9: Spatial noise attenuation using primary noise source 1 with secondary
speaker 1 (red) and secondary speaker 3 (blue). (Plot of attenuation in dB against
frequency from 200 Hz to 500 Hz.)
It can be observed from Fig. 7.9 that the overall noise attenuation using speaker
1 is superior to that using speaker 3. Below 250 Hz, the difference between the two is not
very significant; however, at higher frequencies, speaker 1 begins to consistently
outperform speaker 3.
From Fig. 7.5, it can be seen that if viewed from the center of the control region,
secondary speaker 1 lies approximately in the same direction as primary source 1;
on the other hand, secondary speaker 3 forms a 45 degree angle with primary source
1. Since the loudspeakers employed in this experiment (both primary sources and
secondary sources) can be approximately seen as point sources, it can be expected
that the sound field produced by secondary speaker 1 would be very close to that
of primary source 1. Secondary speaker 3, on the other hand, will produce a very
different sound field due to its different impinging direction.
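The effect of the impinging direction can be illustrated with a simplified free-field sketch. It assumes that the primary and secondary fields are unit-amplitude plane waves (a far-field stand-in for the point-source loudspeakers), ignores room reflections, and samples the control region on a regular grid; the 343 m/s speed of sound and all function names are assumptions made here. A single secondary source is given the least-squares optimal complex gain, and the remaining energy inside the region is reported.

```python
import cmath
import math

C = 343.0   # assumed speed of sound in air (m/s)
R = 0.145   # control-region radius quoted in the text (m)

def ball_points(radius, n=9):
    """Regular grid of sample points inside a sphere of the given radius."""
    pts = []
    for i in range(n):
        for j in range(n):
            for l in range(n):
                x, y, z = (radius * (2 * t / (n - 1) - 1) for t in (i, j, l))
                if x * x + y * y + z * z <= radius * radius:
                    pts.append((x, y, z))
    return pts

def plane_wave(freq_hz, azimuth_deg, pts):
    """Unit-amplitude plane wave arriving in the horizontal plane."""
    k = 2.0 * math.pi * freq_hz / C
    a = math.radians(azimuth_deg)
    dx, dy = math.cos(a), math.sin(a)
    return [cmath.exp(1j * k * (dx * x + dy * y)) for x, y, _ in pts]

def single_source_attenuation_db(freq_hz, angle_deg, pts):
    """Residual energy (dB, negative = attenuation) with one optimal secondary."""
    p = plane_wave(freq_hz, 0.0, pts)         # primary field
    s = plane_wave(freq_hz, angle_deg, pts)   # secondary field, unit drive
    # Least-squares gain g minimising sum |p + g*s|^2 over the sample points.
    g = -sum(si.conjugate() * pi for si, pi in zip(s, p)) / sum(abs(si) ** 2 for si in s)
    residual = sum(abs(pi + g * si) ** 2 for pi, si in zip(p, s))
    primary = sum(abs(pi) ** 2 for pi in p)
    return 10.0 * math.log10(max(residual / primary, 1e-12))  # floor avoids log(0)

points = ball_points(R)
for f in (200.0, 500.0):
    print(f, single_source_attenuation_db(f, 45.0, points))
```

In this idealised setting a secondary source in the same direction as the primary cancels it almost perfectly at any frequency, whereas a source 45 degrees away achieves less attenuation as the frequency rises, which is qualitatively in line with the behaviour of speakers 1 and 3 described above.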
We note that in the case of speaker 3, the ANC system fails to converge at 280
Hz, in addition to 300 Hz. The failure at 280 Hz is not due to a standing wave of
the noise field, but likely due to the secondary channel being very different from the
primary channel, thus causing difficulty with the convergence of the algorithm.
The experiment is also repeated with multiple secondary sources. Instead of
changing the position of secondary sources, we examine the impact of primary source
position on ANC performance, given a fixed set of secondary loudspeakers. The
secondary sources used in this experiment are speakers 1, 2, 4, 5 and 6. We plot the
noise attenuation results achieved using primary source 1 and primary source 2 in
Fig. 7.10.
Figure 7.10: Spatial noise attenuation using secondary speakers 1, 2, 4, 5 and 6 with
primary source 1 (blue) and primary source 2 (red). (Plot of attenuation in dB against
frequency from 200 Hz to 500 Hz.)
In Fig. 7.10 we can see that the noise attenuation for both noise sources is very
similar at lower frequencies; at 400 Hz and above, however, the noise attenuation for
primary source 1 becomes significantly better than that of primary source 2, with
more than 5 dB extra attenuation at some frequencies. Also, we note that primary
source 2 does not produce a standing wave at 300 Hz as primary source 1 does,
and therefore our system is able to yield a noise attenuation consistent with other
frequencies.
It can be seen from Fig. 7.5 that the five selected secondary sources essentially
“surround” primary source 1, if viewed from the control region, with secondary
speakers 1 and 2 being the closest to the noise source. The other three secondary
speakers have very different elevation angles from the noise source; however, their
elevation angles approximately coincide with those of the waves emitted by primary
source 1 and reflected from the ceiling and the floor. Therefore, the secondary source
setup provides very good coverage of the noise field generated by primary source 1,
hence the high, consistent ANC performance across the whole frequency band. In
the case of primary source 2, due to its different azimuth angle, the secondary
array’s capability to reproduce its noise field is limited and degrades gradually as
the frequency increases and the sound field becomes more complex, which results in
a slowly decaying attenuation level.
Overall, it can be concluded that the performance of the spatial ANC system
is affected by both the number of secondary sources and the relative position between
secondary sources and primary sources. For a small control region and low
frequencies, even a small number of secondary sources is sufficient to provide ad-
equate noise attenuation, and the system performance is not very sensitive to the
placement of secondary sources. As the frequency increases, the system’s perfor-
mance becomes more sensitive to the number and position of secondary sources. If
the secondary sources are placed such that they cover the impinging directions of
the primary noises, it is possible to achieve consistent noise attenuation of over 15
dB for frequencies up to 500 Hz within a spherical region of 0.145 m radius, using
our proposed spatial ANC system.
7.6 Summary
In this chapter we propose a spatial active noise cancellation algorithm based on
spherical harmonic decomposition of the noise field. Both the frequency domain
implementation and the equivalent time domain implementation of the algorithm are
discussed. The proposed algorithm allows flexible placement of secondary sources,
and is able to optimize the noise attenuation for a given spherical region. Through
a series of experiments with the proposed ANC system, we show that the spatial
noise cancelling quality of the system depends on both the number and the placement
of secondary sources, and that an average noise attenuation of over 15 dB is achievable
for a spherical region of 0.145 m radius, over a frequency range of 200-500 Hz, using the
proposed spatial ANC system.
Chapter 8
Conclusion and future works
8.1 Conclusion
To effectively attenuate the noise level inside a spatial region in
a practical environment using active noise control methods, a number of problems
and difficulties have to be addressed. These include accurate acquisition of noise
field information, as well as generation of optimal noise cancelling signals. The
goal of this thesis was to address these problems by proposing a number of signal
processing algorithms, which include algorithms for spatial sound recording and noise
environment modelling, as well as a spatial adaptive ANC system architecture.
Spherical harmonic analysis has been shown to be an efficient and accurate tool
for representing spatial sound fields. However, existing microphone array layouts
suitable for spherical harmonic analysis all exhibit 3D geometries. In Chapter 3, we
proposed a 2D planar microphone array layout which has the capability of capturing
3D spatial sound fields. Through the use of vertically placed first order microphone
units, the proposed planar array is able to detect sound field components that are
“invisible” on a plane. The proposed array geometry is shown to have the same
capability as a spherical array of the same radius. However, it is desirable to use
high precision microphones to implement the proposed array, since its robustness is
inferior to that of spherical arrays. In Chapter 4, we also proposed a generalization of
this method, which allows the use of planar higher order microphone arrays to sample
3D sound fields. This method reduces the total number of microphones required for
sound field recording, and hence promotes the feasibility of spherical harmonic analysis
in real-life ANC applications.
In a reverberant environment, the noise field due to a single source can be
reflected multiple times, thereby creating a more complex noise field. Such a
noise field is harder to control, due to its wide range of impinging directions. It
is therefore critical to have a method to estimate the level of reverberation in a
given environment, so as to aid the design of the ANC system. In Chapter 5, we
developed an algorithm for Direct-to-Reverberant Ratio (DRR) estimation. Compared to
existing methods for DRR estimation, the proposed method models the reverberant
field more accurately, and its estimate of the DRR is therefore more
accurate. The sound field information required by the algorithm can be captured
by a first order microphone system.
In order to develop spatial noise cancellation techniques, one first needs a metric
to measure the average noise level inside a spatial area. We proposed one such
metric in Chapter 6, which can be calculated by taking the weighted squared sum
of the spherical harmonic coefficients of the sound field. Using this metric, we
developed a method to predict the optimum spatial ANC performance in a given
noise environment, before physically implementing an ANC system. This method
is then applied to estimate the performance of in-car loudspeakers for the purpose
of cancelling the car cabin noise under various driving conditions. In this chapter,
we also show that by appropriately weighting the spherical harmonic coefficients, it
is possible to optimize the ANC performance at a number of sub-regions within the
desired quiet zone, at only a small sacrifice in the global noise reduction performance.
This technique is shown to be especially useful when the number of secondary sources
is insufficient.
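A metric of this kind can be sketched as follows. The sketch assumes the usual interior expansion p(r, θ, φ) = Σ α_nm j_n(kr) Y_nm(θ, φ) with orthonormal spherical harmonics, in which case the sound energy integrated over a sphere of radius R is Σ |α_nm|² w_n with w_n = ∫_0^R j_n(kr)² r² dr. The exact weights and normalisation used in Chapter 6 may differ; the function names, the 343 m/s speed of sound, the restriction to orders 0 and 1, and the example coefficients are all assumptions made here.

```python
import math

C = 343.0  # assumed speed of sound in air (m/s)

def j0(x):
    return math.sin(x) / x

def j1(x):
    return math.sin(x) / x ** 2 - math.cos(x) / x

def radial_weight(n, freq_hz, radius, steps=2000):
    """w_n = integral_0^R j_n(kr)^2 r^2 dr via the midpoint rule (n = 0 or 1)."""
    k = 2.0 * math.pi * freq_hz / C
    jn = (j0, j1)[n]  # only orders 0 and 1 are provided in this sketch
    h = radius / steps
    return sum(jn(k * (i + 0.5) * h) ** 2 * ((i + 0.5) * h) ** 2 * h
               for i in range(steps))

def region_energy(coeffs, freq_hz, radius):
    """Weighted squared sum of coefficients; coeffs maps (n, m) -> alpha_nm."""
    return sum(abs(a) ** 2 * radial_weight(n, freq_hz, radius)
               for (n, _m), a in coeffs.items())

def attenuation_db(before, after, freq_hz, radius):
    """Change in region energy in dB (negative values mean attenuation)."""
    return 10.0 * math.log10(region_energy(after, freq_hz, radius)
                             / region_energy(before, freq_hz, radius))

# Hypothetical coefficients before and after control, for illustration only.
noise = {(0, 0): 1.0 + 0j, (1, 0): 0.5j}
resid = {(0, 0): 0.1 + 0j, (1, 0): 0.3j}
print(attenuation_db(noise, resid, 300.0, 0.145))
```

Because each coefficient is weighted by w_n, a residual in a higher order mode contributes less to the metric at low kR than the same residual in the 0th order mode.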
In Chapter 7, we developed an adaptive ANC algorithm based on the spherical
harmonic analysis technique. Through transforming the microphone signals into the
spherical harmonics domain, the algorithm is able to optimize the noise attenuation
within a spherical region. Both frequency domain and time domain implementations
are discussed. An experimental spatial ANC system based on the proposed algorithm
has been implemented, and we used this system to investigate the performance
of the ANC system under various system configurations.
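The idea of adapting in the harmonic domain can be sketched at a single frequency with a generic steepest-descent update. This is not the algorithm of Chapter 7 itself, only a minimal illustration under stated assumptions: alpha holds the noise field coefficients, T maps the secondary driving signals d to the coefficients they produce, and the update d ← d − μ T^H (alpha + T d) reduces the residual coefficient energy. The matrix T, the vector alpha, the step size and all function names are hypothetical.

```python
def mat_vec(T, d):
    """Coefficients produced by the secondary sources: T @ d."""
    return [sum(t * x for t, x in zip(row, d)) for row in T]

def herm_mat_vec(T, e):
    """Conjugate-transpose product T^H @ e."""
    rows, cols = len(T), len(T[0])
    return [sum(T[i][j].conjugate() * e[i] for i in range(rows))
            for j in range(cols)]

def residual(T, alpha, d):
    """Residual coefficients: noise field plus secondary field."""
    return [a + r for a, r in zip(alpha, mat_vec(T, d))]

def adapt(T, alpha, mu=0.1, iterations=500):
    """Steepest descent on the residual coefficient energy |alpha + T d|^2."""
    d = [0j] * len(T[0])
    for _ in range(iterations):
        grad = herm_mat_vec(T, residual(T, alpha, d))
        d = [x - mu * g for x, g in zip(d, grad)]
    return d

# Hypothetical 2-mode, 2-speaker example (values chosen for illustration only).
T = [[1.0 + 0j, 0.5 + 0j], [0.2j, 1.0 + 0j]]
alpha = [1.0 + 0j, 0.3 - 0.4j]
d = adapt(T, alpha)
print(sum(abs(e) ** 2 for e in residual(T, alpha, d)))
```

The step size must be small relative to the largest eigenvalue of T^H T for the iteration to remain stable, and the convergence speed is governed by the smallest eigenvalue, so a poorly conditioned T converges slowly.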
Overall, it can be concluded that, despite the challenges that remain unsolved,
the spatial active noise cancellation technique has been developed to the
point where it exists not only in theory and simulations, but has become feasible
and practical to deploy in many applications to solve real-world problems.
Further development of the algorithms related to spatial ANC would surely lead
to improvements in the performance of spatial ANC systems, as well as to the
identification of more potential applications for the technique.
8.2 Future works
slower. In a previous work, the secondary driving signals are transformed into circu-
lar harmonics, which overcomes this problem. The drawback of this method is that
it requires a circular loudspeaker array, which is often impractical. For non-circular
speaker arrays, it may be possible to define a different transformation which does
not require specific loudspeaker placement, thus improving the convergence speed
of the adaptive algorithm.
[2] S. J. Elliott and P. A. Nelson, “Active noise control,” IEEE Signal Processing
Magazine, vol. 10, no. 4, pp. 12–35, Oct 1993.
[3] S. M. Kuo, S. Mitra, and Woon-Seng Gan, “Active noise control system for
headphone applications,” IEEE Transactions on Control Systems Technology,
vol. 14, no. 2, pp. 331–335, March 2006.
[4] S. M. Kuo and S. Mitra, “Design of noise reduction headphone,” in Proc. 2006
Digest of Technical Papers International Conference on Consumer Electronics,
Jan 2006, pp. 457–458.
163
164 Bibliography
[10] T. D. Abhayapala and D. B. Ward, “Theory and design of high order sound
field microphones using spherical microphone array,” in Proc. 2002 IEEE In-
ternational Conference on Acoustics, Speech, and Signal Processing (ICASSP),
2002, vol. 2, pp. II–1949–II–1952.
[18] M. Chan, “Theory and design of higher order sound field recording,” Depart-
ment of Engineering, FEIT, ANU, Honours Thesis, 2003.
[19] Rishabh RANJAN, Jianjun HE, Tatsuya MURAO, Lam BHAN, and
Woon Seng GAN, “Selective active noise control system for open windows
using sound classification,” in Proc. Inter.noise 2016, Nov 2016.
[20] Chuang Shi, Tatsuya Murao, Dongyuan Shi, Bhan Lam, and Woon-Seng Gan,
“Open loop active control of noise through open windows,” The Journal of
the Acoustical Society of America, vol. 140, no. 4, pp. 3313–3313, 2016.
[21] Tatsuya Murao, Chuang Shi, Woon-Seng Gan, and Masaharu Nishimura,
“Mixed-error approach for multi-channel active noise control of open win-
dows,” Applied Acoustics, vol. 127, pp. 305 – 315, 2017.
[22] Jordan Cheer and Stephen J. Elliott, “Multichannel control systems for the
attenuation of interior road noise in vehicles,” Mechanical Systems and Signal
Processing, vol. 6061, pp. 753 – 769, 2015.
[23] S. J. Elliott, W. Jung, and J. Cheer, “The spatial properties and local active
control of road noise,” in Proc. of Euro-noise, 2015, pp. 2189–2194.
[26] Akira Takahashi, Toshio Inoue, Kosuke Sakamoto, and Yasunori Kobayashi,
“Integrated active noise control system for low-frequency noise in automo-
biles,” in INTER-NOISE and NOISE-CON Congress and Conference Proceed-
ings. Institute of Noise Control Engineering, 2011, vol. 2011, pp. 2105–2113.
[28] J. Cheer and S. Daley, “An investigation of delayless subband adaptive filtering
for multi-input multi-output active noise control applications,” IEEE/ACM
Transactions on Audio, Speech, and Language Processing, vol. 25, no. 2, pp.
359–373, Feb 2017.
[29] Tongwei Wang, Woon-Seng Gan, and Sen M. Kuo, “New feedback active
noise control system with improved performance,” in Proc. 2014 IEEE Inter-
national Conference on Acoustics, Speech and Signal Processing (ICASSP),.
IEEE, 2014, pp. 6662–6666.
[31] S. Spors and H. Buchner, “Efficient massive multichannel active noise control
using wave-domain adaptive filtering,” in Proc. 3rd International Symposium
on Communications, Control and Signal Processing, March 2008, pp. 1480–
1485.
[34] Shefeng Yan, Haohai Sun, U. P. Svensson, Xiaochuan Ma, and J. M. Hovem,
“Optimal modal beamforming for spherical microphone arrays,” IEEE Trans-
actions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 361–371,
2011.
[48] Y. Peled and B. Rafaely, “Method for dereverberation and noise reduction
using spherical microphone arrays,” in Proc. 2010 IEEE International Confer-
ence on Acoustics Speech and Signal Processing (ICASSP), 2010, pp. 113–116.
[56] I.S. Gradshteyn and I.M. Ryzhik, Table of Integrals, Series, and Products, p.
955, Academic Press, 2000.
[59] J. Kautz, J. Snyder, and P. J. Sloan, “Fast arbitrary BRDF shading for low-
frequency lighting using spherical harmonics,” Rendering Techniques, vol. 2,
pp. 291–296, 2002.
[62] Frederik J. Simons, “Slepian functions and their use in signal estimation and
spectral analysiss,” in Handbook of Geomathematics, Willi Freeden, M. Zuhair
Nashed, and Thomas Sonar, Eds., pp. 891–923. Springer Berlin Heidelberg,
2010.
[64] J. Meyer and G. Elko, “A highly scalable spherical microphone array based on
an orthonormal decomposition of the soundfield,” in Proc. IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2002,
vol. 2, pp. II–1781–II–1784.
[68] T. D. Abhayapala and M.C.T. Chan, “Limitation and errior analysis of spher-
ical microphone arrays,” in Proc. 14th International Congress on Sound and
Vibration (ICSV14), Cairns, Australia, July 2007.
[69] C. Jin, A. Parthy, and A. Van Schaik, “Optimisation of co-centred rigid and
open spherical microphone arrays,” in Proc. 120th Audio Engineering Society
Convention, Paris, France, May 2006, p. 6 pages, Audio Engineering Society.
[71] A. Gupta and T. D. Abhayapala, “Double sided cone array for spherical
harmonic analysis of wavefields,” in Proc. IEEE International Conference on
Acoustics Speech and Signal Processing (ICASSP), March 2010, pp. 77–80.
[77] H. Chen, T. D. Abhayapala, and W. Zhang, “3D sound field analysis us-
ing circular higher-order microphone array,” in Proc. 23rd European Signal
Processing Conference (EUSIPCO),, Aug 2015, pp. 1153–1157.
[78] D. Griesinger, “The importance of the direct to reverberant ratio in the per-
ception of distance, localization, clarity, and envelopment,” in Proc. Audio
Engineering Society Convention 126. Audio Engineering Society, 2009.
[82] E. Larsen, N. Iyer, C. R. Lansing, and A. S. Feng, “On the minimum audible
difference in direct-to-reverberant energy ratio,” The Journal of the Acoustical
Society of America, vol. 124, no. 1, pp. 450–461, 2008.
to-reverberant cues,” Experimental brain research, vol. 224, no. 4, pp. 623–633,
2013.
[87] T. H. Falk and W. Chan, “Temporal dynamics for blind measurement of room
acoustical parameters,” IEEE Trans. on Instrumentation and Measurement,
vol. 59, no. 4, pp. 978–989, 2010.
[90] Y. Lu and M. Cooke, “Binaural estimation of sound source distance via the
direct-to-reverberant energy ratio for static and moving sources,” IEEE Trans.
on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1793–1805,
2010.
[91] S. Vesa, “Sound source distance learning based on binaural signals,” in Proc.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
Oct 2007, pp. 271–274.
[96] Y. Hioka and K. Niwa, “PSD estimation in beamspace for estimating direct-to-
reverberant ratio from a reverberant speech signal,” in Proc. ACE Challenge
Workshop, a satellite event of WASPAA, New Paltz, NY, USA, Oct 2015.
[98] M. Kuster, “Estimating the direct-to-reverberant energy ratio from the coher-
ence between coincident pressure and particle velocity,” The Journal of the
Acoustical Society of America, vol. 130, no. 6, pp. 3781–3787, 2011.
[101] E. G. Williams, Fourier Acoustics: Sound Radiation and Near field Acoustical
Holography, USA: Academic, 1999.
[102] F.J. Fahy, Sound Intensity, Elsevier Applied Science, London, 1989.
[105] N. Epain and E. Friot, “Active control of sound inside a sphere via control of
the acoustic pressure at the boundary surface,” J. Sound Vibr., vol. 299, no.
3, pp. 587–604, 2007.
[111] S. C. Douglas, “Fast implementations of the filtered-X LMS and LMS algo-
rithms for multichannel active noise control,” IEEE Transactions on Speech
and Audio Processing, vol. 7, no. 4, pp. 454–465, Jul 1999.
[117] P. A. Nelson and S. J. Elliott, Active control of sound, Academic press, 1991.
[120] S. J. Elliott and P. A. Nelson, “The active control of sound,” Electronics &
communication engineering journal, vol. 2, no. 4, pp. 127–136, 1990.
[121] S. J. Elliott, “A review of active noise and vibration control in road vehicles,”
Technical Report 981, ISVR Technical Memorandum, 2008.
[122] Xuan Li, Shefeng Yan, Xiaochuan Ma, and Chaohuan Hou, “Spherical har-
monics MUSIC versus conventional MUSIC,” Applied Acoustics, vol. 72, no.
9, pp. 646 – 652, 2011.
[127] S.J. Elliott, P.A. Nelson, I.M. Stothers, and C.C. Boucher, “In-flight exper-
iments on the active control of propeller-induced cabin noise,” Journal of
Sound and Vibration, vol. 140, no. 2, pp. 219 – 238, 1990.
[129] D. P. Das, G. Panda, and S. M. Kuo, “New block filtered-x lms algorithms for
active noise control systems,” IET Signal Processing, vol. 1, no. 2, pp. 73–81,
June 2007.