Theory and Design of Spatial Active Noise Control Systems

The thesis by Hanchi Chen focuses on the theory and design of spatial active noise control (ANC) systems, aiming to improve noise reduction in various applications. It explores the use of spherical harmonic analysis to optimize ANC performance and reduce system complexity, resulting in several key contributions including new microphone array designs and algorithms for evaluating noise characteristics. The research presents advancements in spatial ANC that enhance its feasibility for real-life applications, particularly in managing acoustic noise hazards.


Theory and Design of Spatial Active Noise Control Systems

Thesis · January 2017


Hanchi Chen
Australian National University

All content following this page was uploaded by Hanchi Chen on 09 June 2019.


Theory and Design of Spatial
Active Noise Control Systems

Hanchi Chen

Bachelor of Engineering (Hons 1)

Australian National University

September 2017

A thesis submitted for the degree of Doctor of Philosophy
of The Australian National University

Research School of Engineering
College of Engineering and Computer Science
The Australian National University

© Hanchi Chen 2017
All Rights Reserved

Declaration

The contents of this thesis are the results of original research and have not been
submitted for a higher degree to any other university or institution. Much of this
work has either been published or submitted for publication as journal papers and
conference proceedings. The following is a list of these papers.

Journal Publications

H. Chen, T. D. Abhayapala, and W. Zhang, “Theory and design of compact hybrid
microphone arrays on two-dimensional planes for three-dimensional soundfield
analysis,” The Journal of the Acoustical Society of America, vol. 138, no. 5,
pp. 3081–3092, 2015.

H. Chen, T. D. Abhayapala, P. N. Samarasinghe, and W. Zhang, “Direct-to-reverberant
energy ratio estimation using a first order microphone,” IEEE/ACM Transactions on
Audio, Speech, and Language Processing, vol. 25, no. 2, pp. 226–237, Feb 2017.

P. N. Samarasinghe, T. D. Abhayapala, and H. Chen, “Estimating the
Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics Based Spatial
Correlation Model,” IEEE Transactions on Audio, Speech and Language Processing,
vol. 25, no. 2, pp. 310–319, Feb 2017.

Conference Proceedings

H. Chen, T. D. Abhayapala, and W. Zhang, “3D sound field analysis using circular
higher-order microphone array,” in Proc. 23rd European Signal Processing Conference
(EUSIPCO), Aug 2015, pp. 1153–1157.

H. Chen, P. N. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Estimation of the
direct-to-reverberant energy ratio using a spherical microphone array,” in Proc. ACE
Challenge Workshop, a satellite event of WASPAA, New Paltz, NY, USA, Oct 2015.

H. Chen, P. N. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Spatial noise
cancellation inside cars: Performance analysis and experimental results,” in
Proc. 2015 IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics (WASPAA), Oct 2015, pp. 1–5.

H. Chen, P. N. Samarasinghe, and T. D. Abhayapala, “In-car noise field analysis and
multi-zone noise cancellation quality estimation,” in Proc. 2015 Asia-Pacific Signal
and Information Processing Association Annual Summit and Conference (APSIPA),
Dec 2015, pp. 773–778.

H. Chen, J. Zhang, P. N. Samarasinghe, and T. D. Abhayapala, “Evaluation of spatial
active noise cancellation performance using spherical harmonic analysis,” in
Proc. 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC),
Sept 2016, pp. 1–5.

H. Chen, T. D. Abhayapala, and W. Zhang, “Enhanced sound field reproduction within
prioritized control region,” in INTER-NOISE and NOISE-CON Congress and Conference
Proceedings 2014, vol. 249, no. 3, pp. 4055–4064, Nov 2014.

The following papers are also results from my Ph.D. study, but are not included in
this thesis:

Conference Proceedings

G. Dickins, H. Chen, and W. Zhang, “Soundfield control for consumer device testing,”
in Proc. 9th International Conference on Signal Processing and Communication
Systems (ICSPCS’2015), Cairns, Australia, 2015.

The research work presented in this thesis has been performed jointly with Prof.
Thushara D. Abhayapala, Dr. Wen Zhang and Dr. Prasanga Samarasinghe.
Approximately 80% of this work is my own.

Hanchi Chen

Research School of Engineering

The Australian National University
Canberra ACT 2601
January 2017
Acknowledgments

Without the support of many colleagues and friends, this work would never have
been completed. I would like to acknowledge and thank each of the following.

First and foremost, my supervisors, Prof. Thushara Abhayapala and Dr. Wen
Zhang, for their professional guidance and consistent encouragement. Special
thanks go to Thushara, who provided me with knowledge and experience
not only in research, but also in many other aspects of life.

Dr. Prasanga Samarasinghe, who provided suggestions on many research
problems and helped in the writing and editing of many papers.

Dr. Glenn Dickins, for inviting me to visit Dolby Labs and sharing with
me his extensive knowledge of every aspect of audio.

The Australian National University, for the PhD scholarship and the funding
and assistance for my patent application.

My fellow students in the Applied Signal Processing Group, especially Jing,
Yurui, Xiang and Aimee, for their true friendship.

Mr. Xianjun Zhen and Mr. Erasmo Scipione, for providing technical support
and electronics parts for my experiments.

Mr. Yuki Mitsufuji, for giving me the internship opportunity at Sony Japan.

My parents, for sending me to Australia in the first place and supporting my
study and life all these years.

Finally, my girlfriend Mendy, for accompanying me throughout my PhD study
and helping me out during the busiest days.

Abstract
The concept of spatial active noise control (ANC) is to use a number of loudspeakers
to generate anti-noise sound waves that cancel the undesired acoustic noise
over a spatial region. The acoustic noise hazards that exist in a variety of situations
provide many potential applications for spatial ANC. However, with existing ANC
techniques it is difficult to achieve satisfactory noise reduction over a spatial area,
especially with a practical hardware setup. This thesis therefore explores various
aspects of spatial ANC, and seeks to develop algorithms and techniques that improve
the performance and feasibility of spatial ANC in real-life applications.
We use the spherical harmonic analysis technique as the basis for our research
in this work. This technique provides an accurate representation of the spatial
noise field, and enables in-depth analysis of the characteristics of the noise field.
Incorporating this technique into the design of spatial ANC systems, we developed
a series of algorithms and methods that optimize spatial ANC systems, both
improving noise reduction performance and reducing system complexity.
Several contributions of this work are: (i) the design of compact planar microphone
array structures capable of recording 3D spatial sound fields, so that the noise field
can be monitored with minimum physical intrusion into the quiet zone; (ii) the
derivation of a direct-to-reverberant energy ratio (DRR) estimation algorithm, which
can be used to evaluate the reverberant characteristics of a noisy environment;
(iii) several methods to estimate and optimize the spatial noise reduction of an ANC
system, including a new metric for measuring spatial noise energy level; and (iv) the
design of an adaptive spatial ANC algorithm incorporating the spherical harmonic
analysis technique. The combination of these contributions enables the design of
compact, high-performing spatial ANC systems for various applications.

Contents

Declaration

Acknowledgments

Abstract

Notations and Symbols

1 Introduction
  1.1 Motivation and scope
  1.2 Problem description
  1.3 Recent advancements in spatial ANC
  1.4 Thesis outline

2 Background: Spherical harmonic analysis and synthesis of sound fields
  2.1 Spherical harmonic expansion of a sound field
  2.2 Properties of the spherical harmonic expansion
    2.2.1 Recurrent property of associated Legendre functions
    2.2.2 Addition theorem
    2.2.3 Rotation of spherical harmonics
    2.2.4 Relationship between first order spherical harmonics and particle velocity
    2.2.5 Real-valued spherical harmonics
  2.3 Spatial sound recording and synthesis using spherical harmonic expansion
    2.3.1 Spatial sound recording using spherical microphone array
    2.3.2 Spatial sound recording using non-spherical microphone array
    2.3.3 Spatial sound synthesis based on mode matching

3 Planar microphone array apertures for 3D spatial sound field analysis
  3.1 Introduction
  3.2 First order microphones for sound field acquisition
    3.2.1 General expression for first order microphones
    3.2.2 Sampling on a plane
  3.3 Array configuration
    3.3.1 Calculation of harmonic coefficients
    3.3.2 Discrete sensor placement: sampling of continuous aperture
    3.3.3 Array design procedure
    3.3.4 Comments
  3.4 Error analysis
    3.4.1 Differential microphone approximation
    3.4.2 Spatial sampling and spatial aliasing
  3.5 Design examples
    3.5.1 Hypothetical design example
    3.5.2 Array implementation
  3.6 Summary
  3.7 Related patents and publications

4 3D sound field analysis using circular higher order microphone array
  4.1 Introduction
  4.2 Sound field model
  4.3 Higher-order microphone array
    4.3.1 Higher-order microphone
    4.3.2 Continuous circular higher-order microphone array
    4.3.3 Solving for global coefficients
    4.3.4 Dimensionality analysis
  4.4 Simulation results
  4.5 Experimental results
  4.6 Summary
  4.7 Related Publications

5 Direct-to-reverberant energy ratio estimation using a first order microphone
  5.1 Introduction
  5.2 DRR estimation based on coherence measurements
    5.2.1 Representation of reverberant sound field
    5.2.2 Representation of DRR using coherence function
    5.2.3 Assumptions for the reverberant sound field
    5.2.4 Reverberant field estimation
    5.2.5 DRR estimation procedure
  5.3 Impact of parameters on DRR estimation
    5.3.1 Reverberation parameter
    5.3.2 Nearfield sound source
  5.4 Validation using ACE Challenge Database
    5.4.1 The ACE Challenge Database
    5.4.2 Algorithm setup
    5.4.3 Full band results
    5.4.4 Subband results
    5.4.5 Impact of noise on DRR estimation
    5.4.6 Estimated parameters from the ACE Evaluation Dataset
  5.5 Summary
  5.6 Related Publications
  5.7 Proof of Equation (5.22)

6 Methods for spatial ANC performance evaluation and optimization
  6.1 Introduction
  6.2 Enhanced sound field reproduction within prioritized control region
    6.2.1 Background
    6.2.2 Problem formulation
    6.2.3 Combined Least Mean Square Solution for sound field reproduction
    6.2.4 Simulation Results
    6.2.5 Observations and insights
  6.3 Evaluation of spatial active noise cancellation performance using acoustic potential energy
    6.3.1 Background
    6.3.2 Calculation of the acoustic potential energy
    6.3.3 Performance evaluation
    6.3.4 Result analysis
    6.3.5 Observations and insights
  6.4 In car spatial ANC performance analysis
    6.4.1 Background
    6.4.2 Problem Formulation
    6.4.3 Noise field characterization
    6.4.4 Residual noise level estimation
    6.4.5 Experiment on a single passenger seat
    6.4.6 Experiment with multiple passenger seats and limited loudspeaker output power
    6.4.7 Observations and insights
  6.5 Summary
  6.6 Related Publications

7 Spatial active noise cancellation system architectures
  7.1 Introduction
  7.2 Background theory
    7.2.1 Time domain multi-channel feed-forward ANC architecture
    7.2.2 Frequency domain feed-forward ANC architecture
  7.3 Frequency domain feed-forward architecture for spatial ANC systems
    7.3.1 Existing spatial ANC system based on circular harmonic transform
    7.3.2 Proposed spatial ANC system based on spherical harmonic transform
  7.4 Time domain feed-forward architecture for spatial ANC systems
    7.4.1 Time domain spherical harmonics representation of sound field
    7.4.2 Spatial ANC architecture using time domain spherical harmonics analysis
  7.5 Experiment validation
    7.5.1 System setup
    7.5.2 Experiment results
  7.6 Summary

8 Conclusion and future works
  8.1 Conclusion
  8.2 Future works

Bibliography

Notations and Symbols

$\lceil \cdot \rceil$   ceiling operator
$\lfloor \cdot \rfloor$   floor operator
$[\cdot]^{*}$   complex conjugate of a matrix
$[\cdot]^{T}$   transpose of a matrix
$[\cdot]^{H}$   complex conjugate transpose of a matrix
$|\cdot|$   Euclidean norm of a vector
$\|\cdot\|$   $\ell_2$-norm of a vector
$A^{-1}$   matrix pseudoinverse
$E\{\cdot\}$   expectation operator
$\mathrm{Re}\{\cdot\}$   real part
$\mathrm{Im}\{\cdot\}$   imaginary part
$\delta\{\cdot\}$   Dirac delta function
$\delta_{nm}\{\cdot\}$   Kronecker delta function
$\{*\}$   linear convolution
$\mathcal{F}_N[\cdot]$   N-point Fast Fourier Transform
$\mathcal{F}_N^{-1}[\cdot]$   N-point Inverse Fast Fourier Transform
$\mathcal{F}^{-1}\{\cdot\}$   Inverse Fourier Transform
$i$   $\sqrt{-1}$
ANC   active noise cancellation
DRR   direct-to-reverberant energy ratio
DOA   direction of arrival
SNR   signal to noise ratio

Chapter 1

Introduction

1.1 Motivation and scope

A wide range of human activities generate unwanted noise, and acoustic noise is one
of the most common hazards in the world. Exposure to acoustic noise causes discomfort
and pain; long-term exposure to excessive noise can also have chronic effects on
human health, especially hearing loss, which can limit one’s ability to hear
high-frequency sounds and understand speech [1].
Methods to reduce excessive noise fall into two categories: passive noise control
and active noise control. Passive noise control methods use sound-absorbing
materials, such as glasswool, acoustic foam, or other insulation materials,
to absorb the impinging noise (Fig. 1.1). Sometimes the material is cut into special
geometries to enhance its sound absorption capability. The overall noise isolation
capability depends on a number of factors, including the sound frequency and the
type, thickness, and geometry of the material. A common property of passive
noise isolation materials is that the sound absorption coefficient rises with sound
frequency [2]. When the wavelength of the sound becomes larger than the thickness
of the material, it becomes difficult for the material to absorb the sound. As a
result, passive noise control systems perform well at higher frequencies, but their
effectiveness drops significantly at low frequencies. In many real-world scenarios,
low-frequency noise dominates the noise spectrum; in such cases, passive noise
control becomes less effective [2].
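
To make the wavelength argument concrete, the sketch below compares the wavelength λ = c/f with the thickness of an absorber at a few frequencies. The speed of sound (343 m/s) and the 5 cm panel thickness are assumed values chosen for illustration; they do not come from this thesis. Comparing wavelength with thickness is only the rough criterion stated above, not an absorption model.

```python
# Rough illustration: compare acoustic wavelength with absorber thickness.
SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at about 20 degrees C
PANEL_THICKNESS = 0.05   # m, hypothetical absorber thickness


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a plane wave at the given frequency."""
    return c / frequency_hz


def wavelength_exceeds_thickness(frequency_hz, thickness_m=PANEL_THICKNESS):
    """True when the wavelength is larger than the absorber thickness."""
    return wavelength(frequency_hz) > thickness_m


for f in (100.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz: wavelength = {wavelength(f):.4f} m, "
          f"exceeds thickness: {wavelength_exceeds_thickness(f)}")
```

With these assumed numbers, the wavelength at 100 Hz is 3.43 m, far larger than the panel, whereas at 10 kHz it is about 3.4 cm, smaller than the panel.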

Figure 1.1: Passive noise control system.

The alternative method is active noise control. Active noise control systems rely
on one or more loudspeakers, called “secondary sources”, which produce a sound
wave whose magnitude is the same as that of the noise but which is 180° out of
phase, so that the two sound waves cancel each other, thus reducing the noise level
(Fig. 1.2). In contrast to passive noise control, active noise control works
better at lower frequencies [2]. At lower frequencies (up to a few hundred Hz), the
wavelength of the sound is longer, which makes it easier for the anti-noise signal
to match the unwanted noise.
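
The cancellation principle, and its sensitivity to mismatch, can be sketched with phasors. The function below is an illustrative assumption rather than part of the thesis: it models the noise at one point as a unit phasor and the anti-noise as a phasor with a chosen amplitude ratio and a phase error relative to the ideal 180°.

```python
import numpy as np


def residual_level_db(amplitude_ratio, phase_error_deg):
    """Level of (noise + anti-noise) relative to the noise alone, in dB.

    The noise is modelled as a unit phasor. The anti-noise has magnitude
    `amplitude_ratio` and phase 180 degrees plus `phase_error_deg`.
    """
    anti_noise = amplitude_ratio * np.exp(1j * np.deg2rad(180.0 + phase_error_deg))
    residual = abs(1.0 + anti_noise)
    # Floor the residual so that perfect cancellation does not produce log10(0).
    return 20.0 * np.log10(max(residual, 1e-12))


for err in (0.0, 5.0, 10.0, 30.0, 60.0):
    print(f"phase error {err:4.1f} deg -> {residual_level_db(1.0, err):7.1f} dB")
```

For equal magnitudes the residual amplitude is 2 sin(ε/2) for a phase error ε, so a 60° error gives no reduction at all. A fixed timing error corresponds to a smaller phase error at a lower frequency, which is one way to read the statement that matching is easier when the wavelength is long.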
The most common application of active noise control is the active noise
cancelling (ANC) headphone (Fig. 1.3). ANC headphones typically employ a reference
microphone mounted on the outer surface of the headphone housing. The reference
microphone picks up the ambient noise and sends the noise signal to a processing
unit, which generates the anti-noise signal and plays it through the headphone
driver along with the music signal [3]. In some designs, an additional error
microphone is placed inside the ear cup to monitor the residual noise. It is also
possible to use a feedback ANC structure, in which the reference microphone is
omitted; one such design is detailed in [3]. Noise cancelling headphones can yield
reasonably good noise attenuation, partly because the secondary loudspeaker and
the error microphone are placed very close to the ear. According to [4],
significant attenuation of sinusoidal noise signals can be achieved for frequencies
up to 2 kHz. Another study on consumer ANC headphone performance [5] suggests that
the noise reduction achievable by ANC headphones is typically between 10 and 25 dB,
and that the performance depends strongly on how tightly the headphones fit.
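
As a small worked conversion, the decibel figures quoted above can be turned into residual sound pressure and energy fractions. The helper names below are hypothetical; the formulas are the standard decibel definitions.

```python
def db_to_pressure_ratio(attenuation_db):
    """Residual sound pressure as a fraction of the original pressure."""
    return 10.0 ** (-attenuation_db / 20.0)


def db_to_energy_ratio(attenuation_db):
    """Residual sound energy as a fraction of the original energy."""
    return 10.0 ** (-attenuation_db / 10.0)


for att in (10.0, 25.0):
    print(f"{att:4.1f} dB attenuation: pressure x {db_to_pressure_ratio(att):.3f}, "
          f"energy x {db_to_energy_ratio(att):.4f}")
```

A reduction of 10 dB leaves about 32% of the original sound pressure, and 25 dB leaves about 5.6%.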

Figure 1.2: Active noise control system.

Figure 1.3: Structure of feed-forward active noise cancelling headphones.

Although ANC headphones perform very well in terms of noise level attenuation,
one of their disadvantages is that the user must wear them constantly, which is
inconvenient or even impractical in many scenarios. In such cases, it is desirable
to attenuate the noise over a spatial area, so that people within the area can
enjoy a noise-free acoustic environment. A well-developed approach to achieving
this goal is the multiple-input multiple-output (MIMO) ANC system, also called the
multi-channel ANC system [6]. In these systems, multiple secondary
loudspeakers are used to generate the anti-noise signals, while multiple error
microphones are distributed in the quiet zone to monitor the residual noise level.
For feed-forward systems, one or more reference microphones are placed close to the
noise source to pick up the noise signal; for feedback systems, reference microphones
are not needed [6]. Fig. 1.4 illustrates a feed-forward MIMO ANC system.

Figure 1.4: Feed-forward MIMO active noise cancelling system.

The MIMO ANC system has been successfully implemented to reduce noise
in environments such as vehicle cabins [7, 8] and rooms [9]. However, conventional
MIMO ANC controllers minimize the sound pressure measured by the error
microphones. Since the noise level is known only at the microphone locations, when
a number of error microphones are randomly distributed inside the desired quiet
zone, significant noise reduction can be expected only in the proximity of each
microphone; in areas not covered by microphones, noise reduction
cannot be guaranteed. One straightforward solution to this problem is to place a
large number of microphones inside the quiet zone; however, this approach greatly
reduces the feasibility of MIMO ANC systems in real-life applications.
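
The limitation described above can be illustrated with a single-frequency, free-field simulation. Everything in this sketch is an assumption made for illustration (monopole sources, the 500 Hz tone, the geometry, four loudspeakers and four microphones). It is a generic multi-point least-squares controller, not a system from this thesis.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def transfer_matrix(sources, receivers, k):
    """Free-field monopole transfer functions, shape (n_receivers, n_sources)."""
    dist = np.linalg.norm(receivers[:, None, :] - sources[None, :, :], axis=-1)
    return np.exp(-1j * k * dist) / (4.0 * np.pi * dist)


def points_in_ball(n, radius, rng):
    """n points drawn uniformly inside a ball of the given radius at the origin."""
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / 3.0)
    return directions * radii


def reduction_db(residual, primary):
    """Residual energy relative to primary energy, in dB (negative = quieter)."""
    tiny = np.finfo(float).tiny  # avoids log10(0) for an exact cancellation
    return 10.0 * np.log10((np.sum(np.abs(residual) ** 2) + tiny)
                           / np.sum(np.abs(primary) ** 2))


rng = np.random.default_rng(0)
k = 2.0 * np.pi * 500.0 / SPEED_OF_SOUND          # wavenumber at 500 Hz
noise_source = np.array([[3.0, 0.5, 0.2]])        # hypothetical noise position
loudspeakers = 1.5 * np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                               [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
error_mics = points_in_ball(4, 0.3, rng)          # 4 random error microphones
zone_points = points_in_ball(500, 0.3, rng)       # dense sampling of the quiet zone

# Conventional multi-point controller: least squares on the microphone pressures.
d_mics = transfer_matrix(noise_source, error_mics, k)[:, 0]
G_mics = transfer_matrix(loudspeakers, error_mics, k)
weights = np.linalg.lstsq(G_mics, -d_mics, rcond=None)[0]

d_zone = transfer_matrix(noise_source, zone_points, k)[:, 0]
G_zone = transfer_matrix(loudspeakers, zone_points, k)

mic_db = reduction_db(d_mics + G_mics @ weights, d_mics)
zone_db = reduction_db(d_zone + G_zone @ weights, d_zone)
print(f"at the error microphones: {mic_db:.1f} dB")
print(f"averaged over the zone:   {zone_db:.1f} dB")
```

With as many loudspeakers as microphones, the pressure at the microphones can be driven essentially to zero. Nothing in the cost function constrains the field elsewhere, so the reduction averaged over the zone is much smaller.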
A potential way of overcoming this issue is to employ spatial sound analysis
techniques, in which the noise captured by a microphone array is transformed
into another domain that yields a more accurate representation of the spatial
noise field. One such technique is spherical harmonic analysis [10], in which
the noise field inside a spherical region is decomposed into a series of spherical
harmonic functions. This technique allows accurate representation and
reconstruction of the noise field, which makes it possible to perform ANC over a
continuous space rather than at a number of sampling points. Furthermore, the
transformation into the spherical harmonic domain allows in-depth analysis of the
noise field, such as direction-of-arrival (DOA) estimation [11] and
direct-to-reverberant energy ratio (DRR) estimation [12]. However, in order to
perform spherical harmonic analysis, the error microphones need to be arranged in
specific geometries, typically in a spherical arrangement [13, 14].
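
To give a feel for the size of such a representation, the sketch below counts the spherical harmonic coefficients needed for a region of radius R. The truncation rule N = ⌈kR⌉ is a commonly used rule of thumb and is an assumption here, as are the 343 m/s speed of sound and the 0.3 m radius; this section does not state which rule the thesis adopts. An expansion truncated at order N has (N + 1)² coefficients.

```python
import math


def truncation_order(frequency_hz, radius_m, c=343.0):
    """Rule-of-thumb truncation order N = ceil(k R), with k = 2 pi f / c."""
    k = 2.0 * math.pi * frequency_hz / c
    return math.ceil(k * radius_m)


def num_coefficients(order):
    """Number of spherical harmonic coefficients up to and including `order`."""
    return (order + 1) ** 2


for f in (100.0, 500.0, 1000.0):
    n = truncation_order(f, 0.3)
    print(f"{f:6.0f} Hz, R = 0.3 m: order {n}, {num_coefficients(n)} coefficients")
```

Under these assumptions a 0.3 m region at 500 Hz needs order 3, that is 16 coefficients, which also bounds from below the number of sensors required to estimate them.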
In general, spherical microphone arrays designed for spherical harmonic analysis
of sound fields can be divided into two categories: rigid sphere topology and open
sphere topology. In the former, the microphones are mounted on a rigid spherical
baffle whose radius is the same as that of the region of interest; in the latter,
the microphones are placed on the surface of the region of interest without
a rigid baffle. However, with the open sphere topology, the microphone
array may suffer from ill-conditioning due to the inherent properties of spherical
Bessel functions [15]. One way to overcome this is to use two concentric spherical
arrays with similar radii [15–18]. Although the open sphere topology is easier to
implement than the rigid sphere topology when the region is large, the region
of interest is still fully surrounded by microphones, which limits its feasibility in
practical ANC applications.
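
The ill-conditioning can be illustrated with the zeroth-order term alone. This sketch assumes the standard interior expansion, in which an open array of radius R observes the order-n harmonics through the factor j_n(kR). Whenever kR falls on a zero of j_n, that order cannot be recovered from the measurements. The closed form j_0(x) = sin(x)/x is used, and the 0.3 m radius and 343 m/s speed of sound are assumed values.

```python
import math


def spherical_j0(x):
    """Zeroth-order spherical Bessel function of the first kind."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def first_null_frequency(radius_m, c=343.0):
    """Frequency at which k R = pi, the first zero of j0, for an open sphere."""
    return c / (2.0 * radius_m)


radius = 0.3  # m, hypothetical array radius
f_null = first_null_frequency(radius)
k_null = 2.0 * math.pi * f_null / 343.0
print(f"first j0 null at {f_null:.1f} Hz, j0(kR) = {spherical_j0(k_null * radius):.2e}")
```

Near such a frequency the order-0 coefficient is divided by a value close to zero, so measurement noise is strongly amplified. The higher orders have their own zeros at other frequencies.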
From the discussion above, an unsolved problem regarding the active noise
control technique can be summarized as follows:

How can a complicated noise field be attenuated over a space using active noise
cancellation strategies, especially with a hardware system that is feasible for
practical applications?

1.2 Problem description

We break this problem down into a number of sub-tasks. As shown in Fig. 1.5, the
spatial ANC problem can be divided into two major components: modelling of the
noise field, and generation of the anti-noise signal.

Figure 1.5: Breakdown of the spatial active noise cancellation problem.

Noise field modelling is about acquiring information about the noise field, so that
an ANC algorithm can use this information to generate suitable anti-noise signals to
cancel the noise. It can be further divided into two elements: real-time
tracking of the noise, and characterization of the noise field. Real-time tracking
of the noise happens while the ANC system is online. It provides the ANC system
with instantaneous noise field information and measures the noise attenuation
achieved by the ANC system, so that the system can respond quickly to changes
in the noise field and minimize the residual noise. A number of sensors, typically
microphones, are usually employed to keep track of the noise in real time. The
number and positions of these sensors play a key role in determining the performance
of an ANC system. Although distributing a large number of sensors over the entire
quiet zone would provide very complete information about the noise field, practical
applications demand more compact and economical sensing solutions. In addition,
in some ANC systems the reference noise is synthesized from measurements of
noise source movement (such as engine rotation) and some prior knowledge of the
noise composition (such as harmonic components); this can also be categorized as
real-time noise tracking.
On the other hand, modelling of the noise characteristics can be done without a
functional ANC system. It is about analyzing the nature of the spatial noise, such
as its spectrum, direction of arrival, and spatial dimensionality. This information
helps to determine whether a given noise environment is suitable for spatial
ANC, and whether the characteristics of the noise field can be exploited to reduce
the design complexity of the spatial ANC system. Both the noise source itself and the
reverberant environment contribute to the characteristics of the spatial noise field.
Modelling them separately would provide more insight into the noise field.

Since the goal of every ANC system is to use a suitable sound wave to cancel the
noise, generation of the optimal anti-noise signal is critical to the performance of a
spatial ANC system. The positions of the secondary loudspeakers, and consequently
the sound fields they can produce inside the quiet zone, play an important role in
determining the anti-noise signal to be played. Badly positioned loudspeakers may
require very high driving signals, which cause excessive noise levels outside the
quiet zone without achieving any significant noise attenuation inside it, and
may damage the loudspeakers themselves; on the other hand, a few well-placed
loudspeakers may be able to minimize the noise level over a large region with very
little output power. Once the loudspeakers are placed, it is critical to accurately
measure, and keep track of, the acoustic channels between each loudspeaker
and the quiet zone, as inaccurate channel information can cause instability of the
ANC system. Therefore, studying the placement of loudspeakers and estimating the
loudspeaker channels are essential for designing compact and efficient spatial
ANC systems.
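
The link between poor placement and high driving signals can be sketched numerically. The transfer matrix below is random and purely hypothetical; two of its columns are made nearly identical as a stand-in for loudspeakers whose fields in the quiet zone are almost redundant. Tikhonov regularisation, a standard technique not taken from this section, is used to show how limiting the driving effort trades against the residual noise.

```python
import numpy as np


def driving_signals(G, d, reg=0.0):
    """Weights w minimising ||d + G w||^2 + reg * ||w||^2 (Tikhonov regularisation)."""
    n_speakers = G.shape[1]
    normal_matrix = G.conj().T @ G + reg * np.eye(n_speakers)
    return np.linalg.solve(normal_matrix, -G.conj().T @ d)


rng = np.random.default_rng(1)
n_points, n_speakers = 8, 4
G = rng.normal(size=(n_points, n_speakers)) + 1j * rng.normal(size=(n_points, n_speakers))
# Make two loudspeakers nearly redundant, a stand-in for a poor placement.
G[:, 3] = G[:, 2] + 1e-3 * (rng.normal(size=n_points) + 1j * rng.normal(size=n_points))
d = rng.normal(size=n_points) + 1j * rng.normal(size=n_points)  # primary noise

w_plain = driving_signals(G, d, reg=0.0)
w_reg = driving_signals(G, d, reg=1e-2)
effort_plain, effort_reg = np.linalg.norm(w_plain) ** 2, np.linalg.norm(w_reg) ** 2
residual_plain = np.linalg.norm(d + G @ w_plain) ** 2
residual_reg = np.linalg.norm(d + G @ w_reg) ** 2
print(f"unregularised: effort {effort_plain:.3g}, residual {residual_plain:.3g}")
print(f"regularised:   effort {effort_reg:.3g}, residual {residual_reg:.3g}")
```

The unregularised solution spends a large effort on the two nearly redundant loudspeakers. The regularised solution accepts a somewhat larger residual in exchange for much smaller driving signals.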
Designing loudspeakers suitable for spatial ANC is also important, and some of
the design goals differ from those of consumer loudspeaker products. While
consumer products aim for a wide and flat frequency response, strong and deep bass,
and an attractive design, loudspeakers designed for ANC purposes should have
characteristics such as low harmonic distortion (especially at low frequencies),
high sensitivity, and good power handling capability, combined with a small form
factor. The frequency range needs to be only as wide as the target noise frequency
band, and a flat response curve is not necessary, since the adaptive filter
automatically acts as an equalizer. Although in some cases, such as in-car noise
cancellation, loudspeakers designed for music listening have to be employed for ANC
purposes, it is still desirable to keep in mind the properties that make a good ANC
loudspeaker when selecting speakers for the ANC system.
The active noise control algorithm governs how the anti-noise signal is generated,
and depending on the optimization criterion, each algorithm results in a different
noise attenuation level at each position within the quiet zone. The
least-mean-square error algorithm, commonly used in existing multi-channel ANC
systems, may not give the best performance in a spatial noise control application.
Using the latest spatial sound field analysis techniques, more advanced ANC
algorithms may be developed.
Many active noise control systems use an adaptive algorithm to estimate the
noise channel as well as to generate the driving signals for the loudspeakers. The
use of an adaptive algorithm enables an ANC system to respond quickly to changes in
the noise signal and to continuously generate the anti-noise signals best suited to
the current noise. To yield the optimum spatial noise attenuation, it is necessary
to incorporate the spatial sound processing algorithms, especially the optimum
spatial ANC algorithms discussed above, into the real-time adaptive algorithm, so
that the adaptive ANC system can generate the optimum anti-noise signals in real time.
Motivated by the problems above, we develop a series of techniques to improve the
performance and feasibility of spatial ANC systems, as well as methods to analyze
the spatial noise field, with the aim of aiding the development and evaluation of spatial
ANC systems.

1.3 Recent advancements in spatial ANC

In recent years, researchers have made significant progress in the field of spatial ANC.
In [19], the authors investigated the problem of cancelling the noise propagating
through an open window. In this work, a number of pre-set filters are used to cancel
the spatial noise through the window, while an additional algorithm recognizes the nature
of the impinging noise and selects the most suitable filter for the ANC system. This
open-loop ANC system is further investigated in [20]. Furthermore, a mixed-error
approach to reduce adaptive filter complexity for the open window ANC application
has been presented in [21].
ANC systems aimed at reducing in-car noise are investigated in [22–25] and [26],
with [22–24] specifically targeting the control of road noise during driving. The results
in [22] show that up to 8 dB of noise reduction is achievable for lower frequencies.
In [27], a MIMO ANC system is deployed in the master cabin of a yacht to cancel
the noise of the diesel engine, yielding a 23% reduction in the noise loudness.
Improvements to the adaptive algorithms for ANC systems have also been proposed.
The performance of multiple subband MIMO adaptive algorithms was studied
in [28]. A new feedback adaptive ANC algorithm with a faster convergence rate
was proposed in [29], where an adaptive notch filter is used to track the frequency
components of the noise. Jihui et al. proposed a feedback adaptive spatial ANC
algorithm [30], which is capable of cancelling the impinging noise over a spatial region.
For 2D spatial ANC, Spors et al. [31] proposed a feed-forward adaptive algorithm
based on the circular harmonic transform; this algorithm is able to significantly reduce
the computational complexity of massive ANC systems.

1.4 Thesis outline

This thesis is organized into 8 chapters. The key contributions of each chapter are:

Chapter 2 - Background: Spherical harmonic analysis and synthesis of sound fields
This chapter briefly reviews the theory of spherical harmonic analysis for spatial
sound, and presents a number of properties of the spherical harmonics. The techniques
for spatial sound recording and synthesis using spherical microphone / loudspeaker
arrays are also briefly reviewed. These properties and techniques are used
later in the thesis for the development of various theories and techniques. In this
chapter, we also derive the mathematical relationship between the first order spherical
harmonics and acoustic particle velocity. Although this relationship has been
assumed to exist and used in the literature, its mathematical proof has not, to the
best of our knowledge, been presented.

Chapter 3 - Planar microphone array apertures for 3D spatial sound field analysis
In this chapter, we present a novel method to capture 3D spatial sound fields using
a 2D planar microphone array. In general, it is assumed that capturing a 3D sound
field requires the use of a microphone array with 3D geometry. Here, we explain the
reason for this requirement by investigating the properties of the spherical harmonics.
We also show that by exploiting a property of the associated Legendre functions, it
is possible to capture the full 3D sound field using first order differential microphones
placed on a 2D plane. A planar microphone array structure consisting of multiple
concentric circular arrays is proposed, as well as an algorithm to calculate the
spherical harmonic coefficients of the sound field using this array structure.

Chapter 4 - 3D sound field analysis using circular higher order microphone array
This chapter develops a method to use circular higher order microphone arrays
placed on a 2D plane to capture 3D spatial sound. We use the spherical harmonic
addition theorem to derive a method for calculating the spherical harmonic coefficients
of a large sound field, using the local spherical harmonic coefficients captured by
each higher order microphone. Compared to the method developed in Chapter 3,
this method requires a significantly smaller number of microphone units, due to the
use of higher order microphones. This method can be seen as a generalization of the
method proposed in Chapter 3.

Chapter 5 - Direct-to-reverberant energy ratio estimation using a first order microphone

This chapter presents an algorithm for DRR estimation using a first order micro-
phone system, which helps to characterize the noise environment, and the relevant
room acoustics. Using the relationship between first order spherical harmonics and
the acoustic particle velocity developed in Chapter 2, we derive an expression for
modelling certain characteristics of the reverberation that are related to DRR esti-
mation. Based on the estimated reverberation characteristics, we use the coherence
function between sound pressure and particle velocity to estimate DRR. All the re-
quired data can be obtained using a single first order microphone. The proposed
method addresses the overestimation problem observed in a previous DRR estima-
tion algorithm.

Chapter 6 - Methods for spatial ANC performance evaluation and optimization

In Chapter 6, we develop a series of methods to estimate and optimize spatial noise
control performance. First, in Section 6.2, we present a method to maximize the
noise reduction at certain high priority sub-regions within the global quiet zone;
this technique is particularly useful when the number of secondary loudspeakers is
insufficient. Then, in Section 6.3, we propose a new metric for measuring spatial
noise level. This metric provides a more robust and accurate representation of the
average noise energy over space compared to existing metrics. Finally, in Section 6.4,
we use the proposed metric to develop a method for estimating the potential ANC
performance for a given noise environment and loudspeaker setup. This method is
then used to estimate in-car ANC performance for both single-seat and multiple-seat
scenarios.

Chapter 7 - Spatial Active Noise Cancellation System Architectures
In this chapter, we present a novel adaptive algorithm designed for spatial ANC. The
proposed algorithm is based on the conventional multi-channel feed-forward adaptive
algorithm, but incorporates the spherical harmonic transform, thereby achieving
superior spatial ANC performance compared to existing multi-channel ANC algorithms.
We present both frequency domain and time domain implementations of the
algorithm, which are mathematically equivalent but each have their own advantages.
An experimental ANC system utilizing the proposed time domain algorithm
is implemented in our laboratory, and we use this system to investigate the impact
of secondary loudspeaker placement on the performance of the ANC system.

Chapter 8 - Conclusion and future works

Chapter 8 concludes this thesis and discusses a number of directions for future work
that would further improve the performance and practical feasibility of spatial active
noise control systems.
Chapter 2

Background: Spherical harmonic analysis and synthesis of sound fields

Overview: This chapter provides a brief overview of the theory and techniques re-
lated to spherical harmonic analysis. We first introduce the mathematical expressions
of the spherical harmonic expansion, and show how these expressions can be used
to express a spatial sound field. Then, we present a number of special properties of
the spherical harmonics. This is followed by a review of the techniques for recording
spatial sound using microphone arrays, as well as synthesizing spatial sound using
loudspeaker arrays, both of which are based on spherical harmonic analysis. The
techniques described in this chapter form a foundation for the rest of the thesis.

2.1 Spherical harmonic expansion of a sound field

Three dimensional (3D) sound field decomposition based on spherical harmonic analysis
has become a popular tool in the field of array signal processing. Applications
of this technique can be found in both acoustic and radio frequency (RF) areas,
such as spatial filtering and beamforming [32–38], room acoustic modeling [39–42],
sound field analysis [10, 43], spatial sound field reproduction [44–46], source
localization [11, 39, 47], active noise control [31, 48, 49], and phase mode processing for
antenna arrays [50]. While other spatial sound field representation and reproduction
techniques such as wave front synthesis [51] and plane wave decomposition [52]
each have their own advantages, spherical harmonic analysis based 3D sound field
decomposition reveals the underlying characteristics of the sound field, thus allowing
high accuracy manipulation and analysis of the sound field. Therefore, in this
thesis, we choose to use spherical harmonic analysis as the fundamental tool for the
development of the theories.
The essential idea of spherical harmonic analysis of a sound field is to use the
weighted sum of a set of orthogonal basis functions to describe the pressure field of
propagating sound. These functions, known as spherical harmonics, are solutions to
the Helmholtz wave equation in 3D space and represent the propagation modes
of a sound wave.
The spherical harmonics expansion of a sound field is divided into two cases: the
interior field expression, and the exterior field expression. The former is used to
describe the wave field within a spatial region with no sound source inside, and all
impinging sound waves are due to sources outside the region; the latter is used for
the situations where the sound sources are positioned within a limited area, and the
region of interest is defined as the space enclosing the source area.
In this work, only the interior field problem is considered, therefore we only
describe the spherical harmonics expansion for the interior field case in this section.
Consider a sound field within a source free region. The sound pressure at a point
(r, θ, φ) with respect to the origin O can be expressed as [46]

$$P(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta, \phi) \tag{2.1}$$

where Clm (k) are spherical harmonic coefficients, k = 2πf /c is the wave number, f
is the frequency, c is the speed of sound propagation, jl (kr) is the lth order spherical
Bessel function of the first kind, and Ylm (θ, φ) are the spherical harmonics, defined
by
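$$Y_{lm}(\theta, \phi) = \mathcal{P}_{l|m|}(\cos\theta)\, E_m(\phi) \tag{2.2}$$

where

$$\mathcal{P}_{l|m|}(\cos\theta) \triangleq \sqrt{\frac{(2l+1)}{2} \frac{(l-|m|)!}{(l+|m|)!}}\; P_{l|m|}(\cos\theta), \quad \text{and} \tag{2.3}$$

$$E_m(\phi) \triangleq (1/\sqrt{2\pi})\, e^{im\phi} \tag{2.4}$$

are the normalized associated Legendre functions and normalized exponential functions,
respectively; $P_{l|m|}(\cos\theta)$ are the associated Legendre functions.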

$Y_{lm}(\theta, \phi)$ has the orthogonal property

$$\int_0^{\pi} \int_0^{2\pi} Y_{lm}(\theta, \phi)\, Y^*_{l'm'}(\theta, \phi) \sin(\theta)\, d\theta\, d\phi = \delta_{l-l',\, m-m'}, \tag{2.5}$$

where $\delta_{l,m}$ is the two dimensional Kronecker delta function, equal to one when both
indices are zero and zero otherwise. The orthogonal property of
the spherical harmonics is very useful in simplifying the mathematical expressions
related to spatial sound; this property will be utilized later in this thesis in the
derivation of many results.
It can be seen from the decomposition (2.1) and the expression (2.2) that the
spherical Bessel function $j_l(kr)$ governs the radial and frequency dependent component
of the basis functions, while the normalized associated Legendre functions
$\mathcal{P}_{l|m|}(\cos\theta)$ and the exponential functions $E_m(\phi)$ govern the elevation and
azimuth components, respectively. Due to the low pass nature of spherical Bessel
functions, spherical harmonics of higher order $l$ have very little energy when the value
of $kr$ is lower than a certain threshold. Therefore, a common practice is to truncate
the infinite summation in (2.1) at a maximum order $l = L$, such that the finite
summation provides an accurate approximation of the sound field; thus (2.1) can be
approximated as

$$P(r, \theta, \phi, k) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta, \phi). \tag{2.6}$$

A rule of thumb for determining the upper bound $L$ is given by [53–55]

$$L = \lceil e k r / 2 \rceil, \tag{2.7}$$

where $e$ is Euler's number (the base of the natural exponential). Using this truncation,
the number of spherical harmonics required to approximate any sound field of a certain
radius and frequency is limited to $(L + 1)^2$.
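As a rough illustration of the rule of thumb (2.7), the following Python sketch computes the truncation order and the resulting number of spherical harmonics for a given frequency and region radius. The function names and the default speed of sound of 343 m/s are assumptions made for this example and are not taken from the thesis.

```python
import math


def truncation_order(freq_hz, radius_m, speed_of_sound=343.0):
    """Truncation order L = ceil(e * k * r / 2) from (2.7).

    speed_of_sound defaults to 343 m/s, an assumed value for air.
    """
    k = 2.0 * math.pi * freq_hz / speed_of_sound  # wave number k = 2*pi*f/c
    return math.ceil(math.e * k * radius_m / 2.0)


def num_harmonics(order):
    """Number of spherical harmonics up to and including the given order."""
    return (order + 1) ** 2


if __name__ == "__main__":
    # Hypothetical example: a region of radius 0.5 m at 1 kHz.
    order = truncation_order(1000.0, 0.5)
    print(order, num_harmonics(order))
```

Since the order grows linearly with kr, the number of coefficients grows quadratically with both the frequency and the radius of the region, which is one reason why large quiet zones at high frequencies are demanding in terms of hardware.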

2.2 Properties of the spherical harmonic expansion

2.2.1 Recurrent property of associated Legendre functions

A recurrent relationship between the associated Legendre function and its first order
derivative is given by [56]
$$(x^2 - 1) \frac{dP_{l|m|}(x)}{dx} = l\, x\, P_{l|m|}(x) - (|m| + l)\, P_{(l-1),|m|}(x). \tag{2.8}$$
In the special case where x = 0, (2.8) can be simplified to

$$P'_{l|m|}(0) = (|m| + l)\, P_{(l-1),|m|}(0), \tag{2.9}$$

which indicates that the first order derivative of the associated Legendre functions
at x = 0 can be directly calculated from the same functions of a lower order.
By taking the derivative of (2.3) and setting $\cos\theta = 0$, expressing $P'_{l|m|}(0)$ using
(2.9) and expressing $P_{(l-1),|m|}(0)$ with $\mathcal{P}_{(l-1)|m|}(0)$ using (2.3), we derive the following
relationship for the normalized associated Legendre functions $\mathcal{P}_{l|m|}$:

$$\mathcal{P}'_{l|m|}(0) = \sqrt{\frac{(2l+1)(l^2 - m^2)}{(2l-1)}}\; \mathcal{P}_{(l-1)|m|}(0), \tag{2.10}$$

which illustrates a relationship between the normalized associated Legendre functions
and their first order derivatives. We will show in Chapter 3 that this property can
be exploited to develop compact microphone arrays for spatial sound recording.
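The relationship (2.10) can be checked numerically. The sketch below, in plain Python, evaluates the associated Legendre functions with a standard upward recurrence and compares a central-difference estimate of the derivative at x = 0 with the right-hand side of (2.10). The helper names, the step size, and the choice to omit the Condon-Shortley phase are assumptions of this example; the identity itself should not depend on that phase, since both sides share the same m.

```python
import math


def assoc_legendre(l, m, x):
    """Associated Legendre function P_{l,m}(x) for m >= 0.

    Assumed convention: no Condon-Shortley phase. Returns 0 when l < m.
    """
    if l < m:
        return 0.0
    # Seed: P_{m,m}(x) = (2m - 1)!! * (1 - x^2)^(m/2).
    p_mm = 1.0
    for k in range(1, m + 1):
        p_mm *= (2 * k - 1) * math.sqrt(1.0 - x * x)
    if l == m:
        return p_mm
    # Second seed: P_{m+1,m}(x) = x * (2m + 1) * P_{m,m}(x).
    prev, curr = p_mm, x * (2 * m + 1) * p_mm
    # Upward recurrence in the order n:
    # (n - m) P_{n,m} = x (2n - 1) P_{n-1,m} - (n + m - 1) P_{n-2,m}.
    for n in range(m + 2, l + 1):
        prev, curr = curr, (x * (2 * n - 1) * curr - (n + m - 1) * prev) / (n - m)
    return curr


def normalized_legendre(l, m, x):
    """Normalized associated Legendre function of (2.3), using |m|."""
    m = abs(m)
    if l < m:
        return 0.0
    scale = math.sqrt((2 * l + 1) / 2.0
                      * math.factorial(l - m) / math.factorial(l + m))
    return scale * assoc_legendre(l, m, x)


def derivative_at_zero(l, m, step=1e-5):
    """Central-difference estimate of the derivative of (2.3) at x = 0."""
    return (normalized_legendre(l, m, step)
            - normalized_legendre(l, m, -step)) / (2.0 * step)


def rhs_2_10(l, m):
    """Right-hand side of (2.10); zero for l = 0."""
    if l == 0:
        return 0.0
    factor = math.sqrt((2 * l + 1) * (l * l - m * m) / (2 * l - 1))
    return factor * normalized_legendre(l - 1, m, 0.0)


def max_mismatch(max_order):
    """Largest |lhs - rhs| of (2.10) over all (l, m) up to max_order."""
    return max(abs(derivative_at_zero(l, m) - rhs_2_10(l, m))
               for l in range(max_order + 1)
               for m in range(-l, l + 1))
```

Calling `max_mismatch(5)` should return a value at the level of the finite-difference error, which suggests that the derivative at the equator carries the same information as the function of the next lower order.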

2.2.2 Addition theorem

The addition theorem describes the relationship of spherical harmonic coefficients
with respect to two different coordinate systems. It shows that each spherical
harmonic coefficient with respect to one coordinate system can be expressed as
a weighted sum of the coefficients with respect to another coordinate system. The
addition theorem also has an interior field variant as well as an exterior field variant.
In this section, we briefly outline the addition theorem for the interior field case.
In addition to the coordinate system with origin $O$, we define a new coordinate
system with its origin $O'$ located at $\mathbf{R} = (R, \vartheta, \varphi)$ with respect to $O$. The two
coordinate systems are defined such that they have the same orientation, i.e., the x,
y and z axes of the two coordinate systems point in the same directions.
The sound field with respect to $O'$ can also be decomposed using spherical harmonics,
but using another set of weighting coefficients $B_{\nu\mu}$, such that the sound
pressure at a point $(r', \theta', \phi')$ with respect to $O'$ can be expressed by

$$P(r', \theta', \phi', k) = \sum_{\nu=0}^{\infty} \sum_{\mu=-\nu}^{\nu} B_{\nu\mu}(k)\, j_{\nu}(kr')\, Y_{\nu\mu}(\theta', \phi'). \tag{2.11}$$

The relationship between $C_{lm}$ and $B_{\nu\mu}$ can be described by the spherical harmonic
addition theorem [57]. The relationship can be written as [58]

$$B_{\nu\mu} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}\, \hat{S}^{m\mu}_{l\nu}(\mathbf{R}), \tag{2.12}$$
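where

$$\hat{S}^{m\mu}_{l\nu}(\mathbf{R}) = 4\pi\, i^{\nu-l} \sum_{\ell=|\mu-m|}^{l+\nu+1} i^{\ell} (-1)^{2m-\mu}\, j_{\ell}(kR)\, Y^*_{\ell(\mu-m)}(\vartheta, \varphi)\, W, \tag{2.13}$$

$$W = W_1 W_2 \sqrt{\frac{(2l+1)(2\nu+1)(2\ell+1)}{4\pi}}. \tag{2.14}$$

Here, $W_1$ and $W_2$ denote Wigner 3-j symbols, with

$$W_1 = \begin{pmatrix} l & \nu & \ell \\ 0 & 0 & 0 \end{pmatrix}, \quad W_2 = \begin{pmatrix} l & \nu & \ell \\ m & -\mu & \mu - m \end{pmatrix}. \tag{2.15}$$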

It can be seen that by substituting (2.12) into (2.11), one can derive the sound
pressure decomposition of a given point with respect to $O'$ using the spherical harmonic
coefficients with respect to $O$.

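Equation (2.12) can be conveniently represented in matrix form, as

$$\mathbf{B} = \hat{\mathbf{S}} \mathbf{C}, \tag{2.16}$$

where $\mathbf{C} = [C_{00}\; C_{11}\; C_{10}\; \ldots\; C_{LL}]^T$ and $\mathbf{B} = [B_{00}\; B_{11}\; B_{10}\; \ldots\; B_{VV}]^T$. $\hat{\mathbf{S}}$
is the translation matrix that maps the coefficients $\mathbf{C}$ to the coordinate system $O'$.
$\hat{\mathbf{S}}$ consists of all the $\hat{S}^{m\mu}_{l\nu}(\mathbf{R})$ needed to translate $\mathbf{C}$ into $\mathbf{B}$; the orders of $\hat{S}^{\nu\mu}_{lm}(\mathbf{R})$
are arranged in correspondence with $\mathbf{B}$ and $\mathbf{C}$, thus $\hat{\mathbf{S}}$ can be written as [58]

$$\hat{\mathbf{S}} = \begin{bmatrix}
\hat{S}^{00}_{00} & \hat{S}^{00}_{11} & \hat{S}^{00}_{10} & \cdots & \hat{S}^{00}_{LL} \\
\hat{S}^{11}_{00} & \hat{S}^{11}_{11} & \hat{S}^{11}_{10} & \cdots & \hat{S}^{11}_{LL} \\
\hat{S}^{10}_{00} & \hat{S}^{10}_{11} & \hat{S}^{10}_{10} & \cdots & \hat{S}^{10}_{LL} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\hat{S}^{VV}_{00} & \hat{S}^{VV}_{11} & \hat{S}^{VV}_{10} & \cdots & \hat{S}^{VV}_{LL}
\end{bmatrix}. \tag{2.17}$$

For a given maximum order $L$, there are a total number of $(L + 1)^2$ spherical harmonics
available, thus the size of $\hat{\mathbf{S}}$ becomes $(V + 1)^2$ by $(L + 1)^2$.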
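When the coefficient vectors in (2.16) are stored in an array, a fixed mapping from (l, m) to a position in the vector is needed. The Python sketch below assumes that, within each order l, the modes are listed from m = l down to m = -l, which is the ordering suggested by the first entries (00, 11, 10, 1,-1) shown in the thesis. The function names are hypothetical and the ordering beyond those first entries is an assumption.

```python
def coeff_index(l, m):
    """Position of coefficient (l, m) in the stacked vector.

    Assumed ordering: orders ascending, and within each order the modes
    run from m = l down to m = -l, i.e. 00, 11, 10, 1-1, 22, 21, ...
    """
    if l < 0 or abs(m) > l:
        raise ValueError("require l >= 0 and |m| <= l")
    return l * l + (l - m)


def num_coeffs(max_order):
    """Length of a coefficient vector truncated at max_order."""
    return (max_order + 1) ** 2


def translation_matrix_shape(order_out, order_in):
    """Shape (rows, columns) of the translation matrix in (2.16)."""
    return (num_coeffs(order_out), num_coeffs(order_in))
```

For example, `translation_matrix_shape(2, 3)` gives a 9 by 16 matrix, in line with the $(V + 1)^2$ by $(L + 1)^2$ size stated above.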

2.2.3 Rotation of spherical harmonics

In some applications, it is convenient to perform a rotation to the coordinate system.
In this case, the spherical harmonics of the sound field would also need to be rotated
such that they still represent the same sound field. Here, we outline a method to
perform such rotation to the spherical harmonic coefficients.
The rotation can be performed through a transformation matrix $\mathbf{M}$, so that the
original and transformed coefficients can be expressed as

$$\begin{bmatrix} \beta_{00} \\ \beta_{11} \\ \beta_{10} \\ \beta_{1,-1} \\ \vdots \end{bmatrix} =
\begin{bmatrix}
M^{00}_{00} & M^{00}_{11} & M^{00}_{10} & M^{00}_{1,-1} & \cdots \\
M^{11}_{00} & M^{11}_{11} & M^{11}_{10} & M^{11}_{1,-1} & \cdots \\
M^{10}_{00} & M^{10}_{11} & M^{10}_{10} & M^{10}_{1,-1} & \cdots \\
M^{1,-1}_{00} & M^{1,-1}_{11} & M^{1,-1}_{10} & M^{1,-1}_{1,-1} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} C_{00} \\ C_{11} \\ C_{10} \\ C_{1,-1} \\ \vdots \end{bmatrix}, \tag{2.18}$$

where $\beta_{lm}$ and $C_{lm}$ represent the spherical harmonic coefficients after and before
rotation, respectively. The values of $M^{l'm'}_{lm}$ can be calculated using numerical
integration [59],

$$M^{l'm'}_{lm} = \int_{s} Y_{l'm'}(\mathbf{R}s)\, Y^*_{lm}(s)\, ds \tag{2.19}$$

where $\mathbf{R}$ denotes the rotation matrix for the spherical coordinates.

2.2.4 Relationship between first order spherical harmonics and particle velocity
The term “particle velocity” is commonly used to describe the velocity component
of impinging sound; it refers to the velocity of particle movement in the medium
during wave propagation. In the literature, microphones with first order beam
patterns (such as cardioid and dipolar) are considered to have the capability of
picking up this velocity component of sound. This suggests that the first order
spherical harmonics, which form all of the first order beam patterns, should have
some mathematical relationship with the particle velocity of the sound.
Here, we derive the expressions that relate the 1st order spherical harmonic
coefficients to the acoustic particle velocity in the x, y and z directions.
Defining the spherical coordinate system (r, θ, φ) in relation to the Cartesian
coordinate system, the particle velocity at the origin is related to the spherical
harmonic coefficients by the following theorem:

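Theorem 1. The acoustic particle velocity at the point $\mathbf{0} \equiv (0, 0, 0)$ along the x, y
and z axes at a particular wave number $k$ can be expressed using the first order spherical
harmonic coefficients,

$$V_x(\mathbf{0}, k) = \frac{i}{\rho_0 c \sqrt{24\pi}} \left( C_{11}(k) + C_{1,-1}(k) \right) \tag{2.20}$$

$$V_y(\mathbf{0}, k) = \frac{-1}{\rho_0 c \sqrt{24\pi}} \left( C_{11}(k) - C_{1,-1}(k) \right) \tag{2.21}$$

$$V_z(\mathbf{0}, k) = \frac{i}{\rho_0 c \sqrt{12\pi}}\, C_{10}(k), \tag{2.22}$$

where $\rho_0$ is the density of the medium, $c$ is the speed of sound, and $C_{lm}(k)$ denotes
the spherical harmonic coefficient of order $l$ and mode $m$. The factor $\rho_0 c$ appears in
the denominator, as follows from substituting (2.29) and (2.31) into (2.23).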

Proof. The particle velocity $V_x(\mathbf{x}_0, k)$ at position $\mathbf{x}_0$, in the direction $x$, is related
to the sound pressure by [60]

$$V_x(\mathbf{x}_0, k) = \frac{i}{k \rho_0 c} \frac{\partial P(\mathbf{x}_0, k)}{\partial x}. \tag{2.23}$$

For the proof of (2.20), we consider the sound pressure at a point on the x-axis, whose
coordinate in the spherical coordinate system is $(r, \pi/2, 0)$. The sound pressure can
be decomposed using (2.1),

$$P(r, \tfrac{\pi}{2}, 0, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, Y_{lm}(\tfrac{\pi}{2}, 0). \tag{2.24}$$

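Taking the partial derivative of $P(r, \pi/2, 0, k)$ in the direction of $r$, which is equivalent
to $\partial P(x, y, z) / \partial x$, we have

$$\frac{\partial P(r, \tfrac{\pi}{2}, 0, k)}{\partial r} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k) \frac{\partial j_l(kr)}{\partial r}\, Y_{lm}(\tfrac{\pi}{2}, 0) \tag{2.25}$$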

Since we consider the partial derivative at the origin, we let $r \to 0$. Using the
recurrent relationship [61]

$$l\, j_{l-1}(x) - (l+1)\, j_{l+1}(x) = (2l+1) \frac{d\, j_l(x)}{dx}, \tag{2.26}$$

and the fact that

$$j_l(0) = \begin{cases} 1, & \text{if } l = 0 \\ 0, & \text{if } l = 1, 2, 3, \ldots \end{cases} \tag{2.27}$$

it can be shown that

$$\lim_{r \to 0} \frac{\partial j_l(kr)}{\partial r} = \begin{cases} k/3, & \text{if } l = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{2.28}$$

In addition, $Y_{10}(\pi/2, 0) = 0$. Therefore from (2.25) we have

$$\lim_{r \to 0} \frac{\partial P(r, \tfrac{\pi}{2}, 0, k)}{\partial r} = \frac{k}{3} \left[ C_{11} Y_{11}(\tfrac{\pi}{2}, 0) + C_{1,-1} Y_{1,-1}(\tfrac{\pi}{2}, 0) \right] \tag{2.29}$$

Substituting (2.29) into (2.23) with the values $Y_{11}(\pi/2, 0) = Y_{1,-1}(\pi/2, 0) = \sqrt{3/(8\pi)}$
completes the proof.

For the proof of (2.21), we consider the partial derivative of sound pressure at
$(r, \pi/2, \pi/2)$. The derivation is identical to that of $\partial P / \partial x$, except that $Y_{lm}(\pi/2, 0)$ are
replaced by $Y_{lm}(\pi/2, \pi/2)$.

In the case of (2.22), we consider the partial derivative of sound pressure at
$(r, 0, \phi)$ along $r$. Similar to (2.25), we can write

$$\frac{\partial P(r, 0, \phi, k)}{\partial r} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k) \frac{\partial j_l(kr)}{\partial r}\, Y_{lm}(0, \phi). \tag{2.30}$$

Due to the fact that $Y_{11}(0, \phi) = 0$ and $Y_{1,-1}(0, \phi) = 0$, and utilizing (2.28), we can
simplify (2.30), such that

$$\lim_{r \to 0} \frac{\partial P(r, 0, \phi, k)}{\partial r} = \frac{k}{3}\, C_{10} Y_{10}(0, \phi) \tag{2.31}$$

Substituting (2.31) into (2.23) with $Y_{10}(0, \phi) = \sqrt{3/(4\pi)}$ completes the proof.

Theorem 1 provides a direct link between the signal received by a first order
microphone and the 1st order spherical harmonic coefficients representing the sound
field. For example, when a bi-directional microphone is placed at the origin with its
two beams coinciding with the z axis, the signal received by the microphone is
equivalent to the coefficient $C_{10}$, up to a constant scaling factor.
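Theorem 1 can be sanity-checked with a single plane wave. The Python sketch below takes the first order coefficients of a unit plane wave from the expansion (2.49), namely $4\pi i\, Y^*_{1m}$ of the incidence direction, evaluates the particle velocity through Theorem 1 with $\rho_0 c$ in the denominator (as implied by (2.23)), and compares it with the velocity obtained by applying (2.23) directly to $P = e^{i k \mathbf{n} \cdot \mathbf{x}}$. The constants, function names, and the convention $P_{11}(x) = \sqrt{1 - x^2}$ (no Condon-Shortley phase) are assumptions of this example.

```python
import cmath
import math

RHO0 = 1.2   # assumed density of air in kg/m^3
C0 = 343.0   # assumed speed of sound in m/s


def y1(m, theta, phi):
    """First order spherical harmonic Y_1m following (2.2)-(2.4).

    Assumes P_11(x) = sqrt(1 - x^2), i.e. no Condon-Shortley phase,
    and uses |m| in the Legendre part as in (2.2).
    """
    if m == 0:
        legendre = math.sqrt(3.0 / 2.0) * math.cos(theta)
    else:
        legendre = math.sqrt(3.0 / 4.0) * math.sin(theta)
    return legendre * cmath.exp(1j * m * phi) / math.sqrt(2.0 * math.pi)


def plane_wave_first_order(theta_inc, phi_inc):
    """C_1m of a unit plane wave, read off from (2.49)."""
    return {m: 4.0 * math.pi * 1j * y1(m, theta_inc, phi_inc).conjugate()
            for m in (-1, 0, 1)}


def velocity_from_coefficients(c1):
    """Particle velocity at the origin according to Theorem 1."""
    scale = 1.0 / (RHO0 * C0)
    vx = 1j * scale / math.sqrt(24.0 * math.pi) * (c1[1] + c1[-1])
    vy = -scale / math.sqrt(24.0 * math.pi) * (c1[1] - c1[-1])
    vz = 1j * scale / math.sqrt(12.0 * math.pi) * c1[0]
    return vx, vy, vz


def velocity_direct(theta_inc, phi_inc):
    """Velocity from (2.23) for P = exp(i k n.x): V = -n / (rho0 * c)."""
    n = (math.sin(theta_inc) * math.cos(phi_inc),
         math.sin(theta_inc) * math.sin(phi_inc),
         math.cos(theta_inc))
    return tuple(-component / (RHO0 * C0) for component in n)
```

Under these assumptions the two routes agree to rounding error for any incidence direction, which is consistent with the scaling factors in (2.20)-(2.22).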

2.2.5 Real-valued spherical harmonics

The technique of spherical harmonic analysis is widely used in areas other than
spatial audio, such as geophysics [62, 63] and computer graphics [59]. In many of
these applications, the spherical functions to be analyzed are real-valued. For these
applications, it is sufficient to use real-value spherical harmonics to decompose the
spatial functions.

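The real-value spherical harmonics can be defined as [62]

$$Y^R_{lm}(\theta, \phi) = \begin{cases}
\sqrt{\frac{2l+1}{4\pi} \frac{(l-|m|)!}{(l+|m|)!}}\; P_{l|m|}(\cos\theta) \cos(m\phi), & \text{if } m > 0; \\
\sqrt{\frac{2l+1}{4\pi} \frac{(l-|m|)!}{(l+|m|)!}}\; P_{l|m|}(\cos\theta), & \text{if } m = 0; \\
\sqrt{\frac{2l+1}{4\pi} \frac{(l-|m|)!}{(l+|m|)!}}\; P_{l|m|}(\cos\theta) \sin(m\phi), & \text{if } m < 0.
\end{cases} \tag{2.32}$$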
The real-value spherical harmonics have the orthogonal property

$$\int_0^{\pi} \int_0^{2\pi} Y^R_{lm}(\theta, \phi)\, Y^R_{l'm'}(\theta, \phi) \sin(\theta)\, d\theta\, d\phi = \delta_{l-l',\, m-m'}. \tag{2.33}$$

Compared to the complex-value spherical harmonics, it can be seen that the only
difference is that instead of using the complex exponential eimφ to express the func-
tion in the azimuth direction, the real-value spherical harmonics use the sinusoid
functions. Therefore, many properties of the complex-value spherical harmonics are
also valid for the real-value spherical harmonics.

It can be seen that, for $m \geq 0$, the complex-value and real-value spherical harmonics
are related through the following equation

$$Y^R_{lm}(\theta, \phi) = \frac{Y_{lm}(\theta, \phi) + Y_{l,-m}(\theta, \phi)}{2}. \tag{2.34}$$

The complex-value spherical harmonics are used for analyzing the spatial sound
in the frequency domain. However, the time domain sound pressure signal is real-
valued, therefore, if the spherical harmonic analysis is performed in the time domain,
it is preferable to use real-value spherical harmonics instead. This is discussed in
detail in Chapter 7.

2.3 Spatial sound recording and synthesis using spherical harmonic expansion

2.3.1 Spatial sound recording using spherical microphone array

Spherical microphone arrays are well suited to capturing the spherical harmonic
coefficients of a spatial sound field, since their geometry coincides with that of
the spherical harmonics. The methods to capture spherical harmonic coefficients using
open and rigid spherical microphone arrays have been described in [10] and [64]. The
orthogonal property of the spherical harmonics is exploited in both of these methods.
For open sphere microphone arrays with radius $R$, the sound pressure on the
surface of the spherical array can be expressed using (2.1). Multiplying both sides
of (2.1) with $Y^*_{lm}(\theta, \phi)$ and integrating over the sphere yields

$$C_{lm}(k)\, j_l(kR) = \int_0^{\pi} \int_0^{2\pi} P(R, \theta, \phi, k)\, Y^*_{lm}(\theta, \phi) \sin(\theta)\, d\theta\, d\phi, \tag{2.35}$$

where the orthogonal property (2.5), including its $\sin(\theta)$ weighting, is used in the derivation.

The integration in (2.35) can be approximated using a finite number of micro-
phones, placed uniformly over the sphere. The spherical harmonic coefficients can
thus be calculated using [10]

$$C_{lm}(k) = \frac{1}{j_l(kR)} \sum_{i} P(R, \theta_i, \phi_i, k)\, Y^*_{lm}(\theta_i, \phi_i)\, \gamma_i, \tag{2.36}$$

where θi and φi are the elevation and azimuth angle of the ith microphone, and γi
are some weighting coefficients specific to the sampling scheme of the microphone
array. The number of microphones on the sphere should be no fewer than (L + 1)2 ,
where L is the maximum order of the spatial sound in the area, determined using
(2.7).
In the case of a rigid sphere microphone array, the microphones are mounted on
a rigid spherical baffle. The sound field around the microphone array is affected by
the baffle, and the sound pressure on the surface of the baffle can be expressed by

$$P(R, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, b_l(kR)\, Y_{lm}(\theta, \phi), \tag{2.37}$$
where

$$b_l(kR) = j_l(kR) - \frac{j_l'(kR)}{h_l^{(2)\prime}(kR)}\, h_l^{(2)}(kR), \tag{2.38}$$

and $h_l^{(2)}(kR)$ is the spherical Hankel function of the second kind; the ratio of derivatives
follows from requiring zero radial particle velocity on the rigid surface. Using the same
spherical integration method, the spherical harmonic coefficients can be calculated as [64]

$$C_{lm} = \frac{1}{b_l(kR)} \sum_{i} P(R, \theta_i, \phi_i, k)\, Y^*_{lm}(\theta_i, \phi_i)\, \gamma_i. \tag{2.39}$$

Compared to the open sphere microphone array, the rigid sphere array avoids the
ill-conditioning problem caused by $j_l(kR)$ approaching zero at certain combinations of $k$
and $R$. However, the rigid baffle completely encloses the region of interest, which makes
this array format hard to implement in larger sizes and hinders its application in
fields such as spatial ANC.
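To make the ill-conditioning of the open sphere array concrete, the Python sketch below looks only at the zeroth order term, for which $j_0(x) = \sin(x)/x$ has its first zero at $kR = \pi$. It reports the frequency of that zero and the size of the $1/j_0(kR)$ factor that (2.36) applies near it. The function names, the restriction to l = 0, and the default speed of sound of 343 m/s are assumptions of this example.

```python
import math


def spherical_j0(x):
    """Spherical Bessel function of the first kind, order 0: sin(x) / x."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def open_sphere_mode0_gain(freq_hz, radius_m, speed_of_sound=343.0):
    """Magnitude of the 1 / j_0(kR) factor applied in (2.36) for l = 0."""
    k = 2.0 * math.pi * freq_hz / speed_of_sound
    return 1.0 / abs(spherical_j0(k * radius_m))


def first_null_frequency(radius_m, speed_of_sound=343.0):
    """Lowest frequency with j_0(kR) = 0, i.e. kR = pi, so f = c / (2R)."""
    return speed_of_sound / (2.0 * radius_m)


if __name__ == "__main__":
    # Hypothetical open sphere array of radius 0.1 m.
    f_null = first_null_frequency(0.1)
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, open_sphere_mode0_gain(fraction * f_null, 0.1))
```

For a 0.1 m array the first null falls at about 1.7 kHz, and the gain grows without bound as the frequency approaches it, so any microphone noise in that band is strongly amplified. Higher orders have their own nulls at other values of kR.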

2.3.2 Spatial sound recording using non-spherical microphone array

Non-spherical microphone array layouts have also been proposed for the purpose
of spatial sound recording based on spherical harmonic analysis. In [13], it is pro-
posed to use multiple circular microphone arrays to capture the spatial sound. This
method offers superior flexibility in terms of array geometry compared to spherical
microphone arrays, since the radius and position of each circular array can vary
within a certain limit. We briefly outline this work in this section.
Consider a circular microphone array placed parallel to the x − y plane, with its
center located on the z axis. The sound pressure at a point on the array can be
expressed using (2.1) and (2.2) as

$$P(R, \vartheta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR)\, \mathcal{P}_{l|m|}(\cos\vartheta)\, E_m(\phi), \tag{2.40}$$

where $R$ is the distance from the origin to the circular array, and $\vartheta$ is the elevation
angle of the array. Multiplying both sides of (2.40) by $E_{-m}(\phi)$ and integrating with
respect to $\phi$ over $[0, 2\pi)$, we have

$$\alpha_m(R, \vartheta, k) = \sum_{l=|m|}^{L} C_{lm}(k)\, j_l(kR)\, \mathcal{P}_{l|m|}(\cos\vartheta), \tag{2.41}$$
where we define

$$\alpha_m(R, \vartheta, k) \triangleq \int_0^{2\pi} P(R, \vartheta, \phi, k)\, E_{-m}(\phi)\, d\phi. \tag{2.42}$$

For a given circular array, the maximum order of observable spherical harmonic is
limited by (2.7). Equation (2.42) can be evaluated for m = −L, −L + 1...L.
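When multiple circular arrays are deployed, each with radius and elevation angle
$(R_q, \vartheta_q)$, the spherical harmonic coefficients of mode $m$ can be obtained by solving
the least-squares problem

$$\mathbf{J}_m \mathbf{C}_m = \boldsymbol{\alpha}_m, \tag{2.43}$$

where $\mathbf{C}_m = [C_{|m|,m},\, C_{|m|+1,m},\, \ldots,\, C_{Lm}]^T$ is a vector containing all the spherical
harmonic coefficients of mode $m$, $\boldsymbol{\alpha}_m = [\alpha_m^1,\, \alpha_m^2,\, \ldots,\, \alpha_m^Q]^T$ is a vector whose $q$th entry
is the $\alpha_m$ of the $q$th circular array, and, following (2.41),

$$\mathbf{J}_m = \begin{bmatrix}
j_{|m|}(kR_1)\mathcal{P}_{|m|,|m|}(\cos\vartheta_1) & j_{|m|+1}(kR_1)\mathcal{P}_{|m|+1,|m|}(\cos\vartheta_1) & \cdots & j_L(kR_1)\mathcal{P}_{L,|m|}(\cos\vartheta_1) \\
j_{|m|}(kR_2)\mathcal{P}_{|m|,|m|}(\cos\vartheta_2) & j_{|m|+1}(kR_2)\mathcal{P}_{|m|+1,|m|}(\cos\vartheta_2) & \cdots & j_L(kR_2)\mathcal{P}_{L,|m|}(\cos\vartheta_2) \\
\vdots & \vdots & \ddots & \vdots \\
j_{|m|}(kR_Q)\mathcal{P}_{|m|,|m|}(\cos\vartheta_Q) & j_{|m|+1}(kR_Q)\mathcal{P}_{|m|+1,|m|}(\cos\vartheta_Q) & \cdots & j_L(kR_Q)\mathcal{P}_{L,|m|}(\cos\vartheta_Q)
\end{bmatrix}. \tag{2.44}$$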
The complete set of spherical harmonic coefficients can be found by solving (2.43) for every
value of $m$ which satisfies $|m| \leq L$. At certain combinations of array radius, position
and sound frequency, the value of $j_l(kR)\mathcal{P}_{l|m|}(\cos\vartheta)$ may equal zero for some $l$ and
$m$ [13]. Ill-conditioning of $\mathbf{J}_m$ due to this phenomenon can be avoided by employing
extra circular microphone arrays [13].
Compared to spherical microphone array apertures, this method allows more
flexible placement of the microphones, and the use of circular arrays can simplify
the supporting structure for the microphones. Therefore this method presents a
more practical solution for spatial sound recording over a larger region.

2.3.3 Spatial sound synthesis based on mode matching

Similar to the recording of spatial sound, a desired spatial sound field can be synthe-
sized using a loudspeaker array, placed around the designated reproduction region.
A commonly used spatial sound synthesis method is the spherical harmonic mode
matching method, where the loudspeaker array often has a spherical shape, such as
in [44–46].
The mode matching method aims to find driving signals for each loudspeaker
on the array so that the combined spherical harmonic coefficients due to all the
loudspeakers equal some desired values, i.e.,

$$\sum_{q} D_q H^q_{lm} = C^{\text{desire}}_{lm}, \tag{2.45}$$

where $H^q_{lm}$ denotes the spherical harmonic coefficients due to the $q$th loudspeaker
playing a unit signal, $C^{\text{desire}}_{lm}$ denotes the spherical harmonic coefficient of the desired
sound field, and $D_q$ is the driving signal for the $q$th loudspeaker. This problem can
be solved in a least-squares manner, as

$$\mathbf{D} = \mathbf{H}^{-1} \mathbf{C}^{\text{desire}}, \tag{2.46}$$

where $\mathbf{D} = [D_1, D_2, \ldots, D_Q]^T$ is the vector containing all the driving signals,
$\mathbf{C}^{\text{desire}} = [C^{\text{desire}}_{00}, C^{\text{desire}}_{11}, \ldots, C^{\text{desire}}_{LL}]^T$ is the vector of desired spherical harmonic
coefficients, and

$$\mathbf{H} = \begin{bmatrix}
H^1_{00} & H^2_{00} & \cdots & H^Q_{00} \\
H^1_{11} & H^2_{11} & \cdots & H^Q_{11} \\
\vdots & \vdots & \ddots & \vdots \\
H^1_{LL} & H^2_{LL} & \cdots & H^Q_{LL}
\end{bmatrix} \tag{2.47}$$

is the channel matrix containing the spherical harmonic coefficients due to each
loudspeaker.
Assuming a loudspeaker can be modeled as a point source, the sound field due
to a loudspeaker placed at $(R, \vartheta, \varphi)$ can be expanded as [44]

$$P(r, \theta, \phi, k) = ik \sum_{l=0}^{\infty} j_l(kr)\, h_l(kR) \sum_{m=-l}^{l} Y_{lm}(\theta, \phi)\, Y_{lm}(\vartheta, \varphi)^*. \tag{2.48}$$

If the loudspeaker is placed at a long distance from the reproduction region, its
sound wave can be regarded as a plane wave, which can be expanded as [44]

$$P(r, \theta, \phi, k) = 4\pi \sum_{l=0}^{\infty} j_l(kr)\, i^l \sum_{m=-l}^{l} Y_{lm}(\theta, \phi)\, Y_{lm}(\vartheta, \varphi)^*. \tag{2.49}$$

Comparing (2.48) and (2.49) with (2.1), it can be seen that the spherical harmonic
coefficients corresponding to a point source and a plane wave source are

$$H^{\text{point}}_{lm}(R, \vartheta, \varphi) = ik\, h_l(kR)\, Y_{lm}(\vartheta, \varphi)^* \tag{2.50}$$

and

$$H^{\text{plane}}_{lm}(\vartheta, \varphi) = 4\pi i^l\, Y_{lm}(\vartheta, \varphi)^* \tag{2.51}$$

respectively. If the loudspeakers are arranged in a spherical geometry around the
reproduction region, with a uniform spherical sampling scheme, then due to the orthogonal
property of the spherical harmonics, we have

$$\mathbf{H}^{-1} = \mathbf{H}^H, \tag{2.52}$$

thus the driving signals for each loudspeaker can be solved using

$$\mathbf{D} = \mathbf{H}^H \mathbf{C}^{\text{desire}}. \tag{2.53}$$

Perfect reproduction of the desired sound field cannot be guaranteed if the
loudspeakers are not distributed evenly around the sphere, or if too few
loudspeakers are available. However, if no fewer than $(L + 1)^2$ loudspeakers
are used, and they are uniformly distributed in a spherical arrangement, high quality sound
field reproduction can be achieved [44].
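
As a minimal numerical sketch of (2.46) and (2.53), the Python snippet below uses a randomly generated unitary matrix as a stand-in for the channel matrix H. This is an assumption made purely for illustration; a real H would be filled with the coefficients in (2.50) or (2.51). Under that assumption, the Hermitian transpose should give the same driving signals as the explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 2                      # illustrative maximum harmonic order
n_coeffs = (L + 1) ** 2    # number of coefficients up to order L
n_speakers = n_coeffs      # square system, so that H is invertible

# Hypothetical stand-in for the channel matrix: a random unitary matrix,
# mimicking the orthogonality assumed in (2.52).
A = rng.standard_normal((n_coeffs, n_speakers)) \
    + 1j * rng.standard_normal((n_coeffs, n_speakers))
H, _ = np.linalg.qr(A)

# Arbitrary desired sound field coefficients.
C_desire = rng.standard_normal(n_coeffs) + 1j * rng.standard_normal(n_coeffs)

D_inverse = np.linalg.solve(H, C_desire)   # (2.46): D = H^{-1} C
D_hermitian = H.conj().T @ C_desire        # (2.53): D = H^H C

# Coefficients actually produced by the loudspeakers.
C_reproduced = H @ D_hermitian
print(np.max(np.abs(D_inverse - D_hermitian)))
```

When H is not unitary (for example, with non-uniform loudspeaker placement), only the explicit solve or a least-squares solution remains applicable.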

The technique of spherical harmonic expansion, the properties of the spherical
harmonics as well as the spatial sound recording and synthesis techniques discussed
in this chapter form a foundation for the algorithms and techniques to be developed
in the later chapters of this thesis.

Chapter 3

Planar microphone array apertures for 3D spatial sound field analysis

Overview: Spherical harmonic analysis is a very useful tool for representing the
noise field. However, a drawback of this technique is the three-dimensional micro-
phone arrays required for recording the noise sound field. In this chapter, a method
to design 2D planar microphone arrays that are capable of capturing 3D spatial
sound fields is proposed. Through the utilization of both omni-directional and first
order microphones, the proposed microphone array is capable of measuring sound
field components that are undetectable to conventional planar omni-directional mi-
crophone arrays, thus providing the same functionality as 3D arrays designed for
the same purpose. Simulations show that the accuracy of the planar microphone
array is comparable to traditional spherical microphone arrays. Due to its compact
shape, the proposed microphone array greatly increases the feasibility of 3D sound
field analysis techniques in spatial ANC applications.

3.1 Introduction
We use spherical harmonic analysis as a tool to represent the 3D noise field, due
to its various benefits such as accurate sound field representation and the ability to
perform in-depth analysis of the noise field. In order to capture the 3D noise field in
real time for the ANC system, it is necessary to use a microphone array that can
capture the 3D sound field in terms of its spherical harmonic coefficients.
To the best of our knowledge, all of the previously developed
microphone array structures designed for this purpose have a 3D geometry, which
limits their feasibility for compact ANC systems suitable for real-life applications.
As was discussed in Chapter 2, spherical microphone array geometries are well-
suited for the spherical harmonic transform, and both open and rigid sphere models
have been studied [10, 43]. Both models are widely used in research applications,
such as room geometry inference [65] and near field acoustic holography (NAH) [66].
An inherent drawback of the open sphere model is the numerical ill-conditioning
problem, which is due to the nulls of the spherical Bessel functions; the diameter
of the microphone array therefore has to be chosen carefully. It has been shown that
this ill-conditioning problem can be overcome by using concentric spheres
[67, 68], co-centered rigid/open spheres [69], or by measuring the radial velocity [43].
The placement of microphones on a spherical array has to follow a strict rule
of orthogonality of the spherical harmonics [15, 70], which limits the flexibility of
the array configuration. The spherical shape of the array also poses difficulties for
implementation as well as practical usage.
Non-spherical microphone arrays, such as the conical microphone array aperture
proposed by Gupta et al. [71] and the multiple circular microphone array proposed
by Abhayapala et al. [13, 72], can also be used for spherical harmonic analysis.
These microphone arrays offer greater geometrical flexibility compared to spherical
microphone arrays, thus allowing easier implementation of larger microphone arrays.
However, these apertures still occupy a 3D space, which hinders the development of
compact microphone arrays for practical applications.
On the other hand, microphone arrays featuring 2D geometry are easy to
implement, yet existing 2D microphone arrays are incapable of capturing complete 3D
sound field information. Meyer et al. have shown that a 2D microphone array can be
used to measure certain vertical components of a 3D sound field [73]. However, due
to inherent properties of the spherical harmonics, some spherical harmonic modes
are invisible to omni-directional pressure microphones on the x-y plane, which
explains why previously proposed 2D microphone arrays fail to extract full 3D sound
field information. Measurement of these sound field components on the x-y plane
calls for additional types of sensors; to the best of our knowledge, no such
technique has been proposed.
First order microphones, such as differential microphones and cardioid
microphones, are known to have the capability of detecting acoustic velocity in a certain
direction [74]. Kuntz et al. have shown that by using cardioid microphones
pointed in the radial direction to replace omni-directional microphones in a
circular array, the numerical ill-conditioning problem can be solved for a 2D sound field
analysis system [60].

In this chapter, we first investigate using first order microphones to aid the de-
tection of 3D sound fields, and propose a new method for 3D sound field recording
using a 2D planar microphone array. In our approach, we use first order microphones
in conjunction with omni-directional microphones to measure the “invisible” com-
ponent of a 3D sound field on the x − y plane. Also, we propose a method of using
multiple co-centered circular arrays of omnidirectional/first order microphones to
compute the sound field coefficients associated with the spherical space enclosing
the planar array aperture. We show that the proposed planar microphone array of-
fers the same functionality as spherical/multiple circular arrays designed for sound
field analysis.

In addition, we propose a method to capture the 3D sound field using circular arrays
of higher order microphones, also placed on a 2D plane. This method can be seen
as a generalization of the method discussed in the first few sections of this chapter.

This chapter is arranged as follows: Section 3.2 derives the wave domain
expression of the sound field measured by a general first order microphone. We show that the full
3D sound field can be observed on a plane with the aid of first order microphones
by exploiting a property of the associated Legendre functions. Section 3.3
introduces the co-centered hybrid circular microphone array for sound field recording,
and shows how the sound field coefficients can be calculated using the data
measured by different components of the hybrid array. We also provide a step-by-step
design procedure for determining the parameters of an array based on system
requirements. Section 3.4 provides an analysis of the recording accuracy of the proposed
array. Two primary causes of error are identified, and their impact on each sound
field coefficient is discussed. Section 3.5 gives a hypothetical design example of the
proposed microphone array, as well as an experimental microphone array built for
validation of the theory. Detailed simulation results are provided for the
hypothetical design example, and the test results of the experimental array are compared with
corresponding simulation results for performance evaluation.

3.2 First order microphones for sound field acquisition
In this section, we derive the pressure gradient of the sound field at a point along
a given direction and the wave domain expression of the signal received by a general first
order microphone. We also show that the 3D sound field coefficients can be divided
into even and odd components: the even modes can be measured by omnidirectional
microphones, while the odd components of the sound field can be observed on a
plane by using a recurrence relation of the associated Legendre functions.

Wave domain expression of pressure gradient

For reasons that will become clear later in the chapter, we consider pressure gradient
of a sound field along the direction of θ. That is, we consider either differential or
velocity microphones placed in such a way that they measure pressure gradient in
the direction of θ at a given point (r, θ, φ).
We define the pressure gradient of sound along the direction of θ at a point
(r, θ, φ) as
$$ P_\theta(r, \theta, \phi, k) \triangleq \frac{\partial P(r, \theta, \phi, k)}{\partial \theta}. \tag{3.1} $$
By substituting (2.1) into (3.1) and taking the partial derivative with respect to θ,
the pressure gradient can be expressed as

$$ P_\theta(r, \theta, \phi, k) = -\sin\theta \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, P'_{l|m|}(\cos\theta)\, E_m(\phi), \tag{3.2} $$

where

$$ P'_{l|m|}(u) = \frac{dP_{l|m|}(u)}{du} $$

is the first order derivative of the normalized associated Legendre function.

3.2.1 General expression for first order microphones

The pick-up pattern of any first order microphone can be considered as a weighted
sum of an omni-directional pattern and a differential pattern. Using P (r, θ, φ, k)
to represent the omnidirectional component of the measured sound pressure and
Pθ (r, θ, φ, k) for the differential component in the θ direction at point (r, θ, φ), the
total sound pressure measured by an arbitrary first order microphone can be written
as

$$ P_c(r, \theta, \phi, k) \triangleq \beta P(r, \theta, \phi, k) + (1 - \beta) P_\theta(r, \theta, \phi, k), \tag{3.3} $$

where β is a weighting factor in the range [0, 1). When β = 0, $P_c(r, \theta, \phi, k)$
contains only the differential pattern, which is considered a special case of first
order pick-up patterns; differential microphones are thus regarded as one type of
first order microphone. When β = 0.5, $P_c(r, \theta, \phi, k)$ becomes the pick-up pattern of
a “standard” cardioid microphone. Substituting (2.1) and (3.2) into (3.3) yields the
wave domain representation of the signal received by a general first order microphone
as

$$ P_c(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr) \left[ \beta P_{l|m|}(\cos\theta) - (1 - \beta) \sin\theta\, P'_{l|m|}(\cos\theta) \right] E_m(\phi). \tag{3.4} $$

3.2.2 Sampling on a plane

Without loss of generality, let us place the co-ordinate system such that the plane of
interest for sensor placement is the x-y plane. In the spherical co-ordinate system,
θ = π/2 (i.e., cos θ = 0), for all points on the x-y plane. Thus, the output of an
omni-directional sensor placed on the x-y plane is

$$ P(r, \pi/2, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr)\, P_{l|m|}(0)\, E_m(\phi). \tag{3.5} $$

Observe that when l + |m| is an odd integer, the value of $P_{l|m|}(0)$ is equal to zero [13].
Consequently, the spherical harmonics associated with these associated Legendre
functions are equal to zero on the x-y plane. This property makes the odd mode spherical harmonics
“invisible” on the θ = π/2 plane, which is why the complete 3D sound
field information cannot be extracted by sampling on a single plane using
omni-directional microphones.
On the other hand,

$$ P'_{l|m|}\!\left(\cos\frac{\pi}{2}\right) = \begin{cases} \text{a non-zero value}, & \text{when } l + |m| \text{ is an odd integer}, \\ 0, & \text{when } l + |m| \text{ is an even integer}. \end{cases} $$

Observe that the expression for the pressure gradient in (3.2) contains the terms $P'_{l|m|}(\cdot)$.
Hence the ‘odd’ components of the pressure gradient along the direction of θ are
non-zero on the x-y plane. Thus, the pressure gradient measurements contain the ‘odd’
$C_{lm}(k)$ (i.e., l + |m| odd) coefficients. We use this property in this work to propose
a method to extract 3D sound field components by sampling the field on the x-y
plane using differential (or first order) and omni-directional microphones together.
Using the recurrence relation of the normalized associated Legendre functions,
i.e., substituting (2.10) into (3.2) and (3.4), we can write the outputs of the differential
and general first order microphones placed at a point (r, π/2, φ) on the x-y plane
along the direction of θ (i.e., perpendicular to the x-y plane) as

$$ P_\theta(r, \tfrac{\pi}{2}, \phi, k) = -\sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr) \sqrt{\frac{(2l + 1)(l^2 - m^2)}{2l - 1}}\; P_{(l-1)|m|}(0)\, E_m(\phi) \tag{3.6} $$

and

$$ P_c(r, \tfrac{\pi}{2}, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kr) \left[ \beta P_{l|m|}(0) - (1 - \beta) \sqrt{\frac{(2l + 1)(l^2 - m^2)}{2l - 1}}\; P_{(l-1)|m|}(0) \right] E_m(\phi), \tag{3.7} $$

respectively.
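
The parity property and the derivative relation used in (3.6) can be checked numerically. The sketch below is only an illustration under an assumed normalization, $P_{l|m|} = \sqrt{\frac{2l+1}{2}\frac{(l-|m|)!}{(l+|m|)!}}\,P_l^{|m|}$, which may differ from the exact convention of (2.1); the helper names are hypothetical. It evaluates the functions with the standard three-term recurrence and compares a central-difference derivative at zero with the square-root factor appearing in (3.6).

```python
import math

def assoc_legendre(l, m, x):
    """Unnormalized associated Legendre function P_l^m(x), 0 <= m <= l."""
    pmm = 1.0
    if m > 0:
        root = math.sqrt((1.0 - x) * (1.0 + x))
        odd = 1.0
        for _ in range(m):
            pmm *= -odd * root
            odd += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1

def norm_legendre(l, m, x):
    """Normalized function P_{l|m|}(x) under the assumed normalization."""
    m = abs(m)
    if m > l:
        return 0.0
    scale = math.sqrt((2 * l + 1) / 2 * math.factorial(l - m) / math.factorial(l + m))
    return scale * assoc_legendre(l, m, x)

def derivative_at_zero(l, m, h=1e-6):
    """Central-difference estimate of P'_{l|m|}(0)."""
    return (norm_legendre(l, m, h) - norm_legendre(l, m, -h)) / (2 * h)

def derivative_from_recurrence(l, m):
    """P'_{l|m|}(0) written with the square-root factor of (3.6)."""
    m = abs(m)
    if l == m:
        return 0.0
    factor = math.sqrt((2 * l + 1) * (l * l - m * m) / (2 * l - 1))
    return factor * norm_legendre(l - 1, m, 0.0)

for l in range(4):
    for m in range(l + 1):
        parity = "odd" if (l + m) % 2 else "even"
        print(l, m, parity, norm_legendre(l, m, 0.0), derivative_at_zero(l, m))
```

In the printed table, the function value vanishes for l + |m| odd and the derivative vanishes for l + |m| even, which is the complementary behaviour exploited by the hybrid array.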

3.3 Array configuration

In this section we outline possible geometric configurations of first order and
omni-directional sensors on the x-y plane to extract both the even and odd spherical
harmonic components of the sound field.

3.3.1 Calculation of harmonic coefficients

Even coefficients: Omni-Array

Consider a circle placed on the x-y plane such that an arbitrary point on the circle
is given by $(R_q, \pi/2, \phi)$. Then the output of an omni-directional microphone on the
circle at $(R_q, \pi/2, \phi)$ is given by

$$ P(R_q, \tfrac{\pi}{2}, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR_q)\, P_{l|m|}(0)\, E_m(\phi). \tag{3.8} $$

Since sound fields over a spherical region of finite radius are mode limited (2.6), the
infinite summation on the right hand side of (3.8) can be approximated by a finite sum,

$$ P(R_q, \tfrac{\pi}{2}, \phi, k) \approx \sum_{l=0}^{L} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR_q)\, P_{l|m|}(0)\, E_m(\phi), \tag{3.9} $$

where L denotes the maximum harmonic order at the array’s radius Rq and the
highest operating frequency [53]. Multiplying both sides of (3.9) by Em (−φ) and
integrating with respect to φ over [0, 2π) yields the total sound pressure received by
the ring, as
$$ \alpha_m(R_q, k) \triangleq \int_0^{2\pi} P(R_q, \pi/2, \phi, k)\, E_m(-\phi)\, d\phi \tag{3.10} $$

$$ = \sum_{l=|m|}^{L} C_{lm}(k)\, j_l(kR_q)\, P_{l|m|}(0). \tag{3.11} $$

Note that only the even mode harmonics are present in (3.11), since Pl|m| (0) = 0
for l + |m| odd. Let there be a total of Q circles placed at different radii but all
on the θ = π/2 plane (x-y plane). Thus, for q = 1, . . . , Q, the relationship between
the even mode sound field coefficients of mode m and the azimuth sound pressure
harmonics αm (Rq , k) on each circle can be expressed as

$$ \boldsymbol{\alpha}_m(k) = \mathbf{U}_m(k)\, \mathbf{C}_m^{\text{even}}(k) \tag{3.12} $$

where $\boldsymbol{\alpha}_m(k) = [\alpha_m(R_1, k), \alpha_m(R_2, k), \ldots, \alpha_m(R_Q, k)]^T$,

$$ \mathbf{C}_m^{\text{even}}(k) = \begin{cases} [C_{mm}(k), C_{(m+2)m}(k), \ldots, C_{Lm}(k)]^T, & \text{if } m \text{ and } L \text{ are both even or both odd}, \\ [C_{mm}(k), C_{(m+2)m}(k), \ldots, C_{(L-1)m}(k)]^T, & \text{otherwise} \end{cases} \tag{3.13} $$

is the vector of the even mode coefficients of mode m, and

$$ \mathbf{U}_m(k) = \begin{bmatrix}
j_m(kR_1) P_{m|m|}(0) & j_{m+2}(kR_1) P_{(m+2)|m|}(0) & \cdots & j_L(kR_1) P_{L|m|}(0) \\
j_m(kR_2) P_{m|m|}(0) & j_{m+2}(kR_2) P_{(m+2)|m|}(0) & \cdots & j_L(kR_2) P_{L|m|}(0) \\
\vdots & \vdots & \ddots & \vdots \\
j_m(kR_Q) P_{m|m|}(0) & j_{m+2}(kR_Q) P_{(m+2)|m|}(0) & \cdots & j_L(kR_Q) P_{L|m|}(0)
\end{bmatrix}, \tag{3.14} $$

for the case where L and m are both odd or both even (otherwise replace L in (3.14)
by L − 1).
We can estimate the even mode coefficients from (3.12), provided U m (k) is not
singular, as
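$$ \mathbf{C}_m^{\text{even}}(k) = \mathbf{U}_m^{\dagger}(k)\, \boldsymbol{\alpha}_m(k) \tag{3.15} $$

where $\mathbf{U}_m^{\dagger} = (\mathbf{U}_m^T \mathbf{U}_m)^{-1} \mathbf{U}_m^T$ is the pseudo inverse of $\mathbf{U}_m$.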
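
The following Python sketch illustrates how (3.12) to (3.15) might be evaluated for a single mode m. It assumes NumPy and SciPy are available, and it assumes the normalization $P_{l|m|} = \sqrt{\frac{2l+1}{2}\frac{(l-|m|)!}{(l+|m|)!}}\,P_l^{|m|}$; the wave number, radii and coefficient values are hypothetical and chosen only to keep the example small.

```python
import math
import numpy as np
from scipy.special import lpmv, spherical_jn

def norm_legendre_at_zero(l, m):
    """Normalized associated Legendre function P_{l|m|}(0) (assumed normalization)."""
    m = abs(m)
    scale = math.sqrt((2 * l + 1) / 2 * math.factorial(l - m) / math.factorial(l + m))
    return scale * lpmv(m, l, 0.0)

def build_U(m, L, k, radii):
    """Matrix of (3.14): one row per ring, one column per even-mode order l."""
    orders = range(abs(m), L + 1, 2)          # l = |m|, |m| + 2, ...
    return np.array([[spherical_jn(l, k * R) * norm_legendre_at_zero(l, m)
                      for l in orders] for R in radii])

# Hypothetical parameters for illustration only.
k = 7.5                        # wave number in rad/m
radii = [0.1, 0.2, 0.3, 0.4]   # ring radii in metres (Q = 4)
m, L = 0, 4                    # unknowns: C_00, C_20, C_40

U = build_U(m, L, k, radii)
C_true = np.array([1.0 + 0.0j, -0.5 + 0.2j, 0.3j])
alpha = U @ C_true                     # noise-free ring harmonics, as in (3.12)
C_est = np.linalg.pinv(U) @ alpha      # pseudo-inverse solution, as in (3.15)
print(np.round(C_est, 6))
```

With more rings than unknowns, the pseudo inverse gives the least-squares solution; with noise-free data generated from the same model, it should return the original coefficients.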

Note that the calculation of the even harmonic coefficients is similar to the work
presented in [13]. However, we show in the following subsection how to extract the odd
harmonic coefficients by placing differential microphones on the x-y plane, which
is a method not reported elsewhere to the best of our knowledge.

Odd coefficients: differential microphone array

Consider a circular array of differential microphones with radius $R_q$ placed on the
x-y plane with all differential microphones pointed perpendicular to the x-y plane
(i.e., θ = π/2 plane). Then the output of a differential microphone on the circle at
$(R_q, \pi/2, \phi)$ is given by (3.6). Using the properties of the spherical Bessel functions,
we can show that the infinite summation of (3.6) can be truncated to a finite number of terms
(similar to the case of (3.9)). The resulting equation is given below:

$$ P_\theta(R_q, \tfrac{\pi}{2}, \phi, k) = -\sum_{l=0}^{L} \sum_{m=-l}^{l} C_{lm}(k)\, j_l(kR_q) \sqrt{\frac{(2l + 1)(l^2 - m^2)}{2l - 1}}\; P_{(l-1)|m|}(0)\, E_m(\phi). \tag{3.16} $$
By multiplying both sides of (3.16) by Em (−φ) and integrating with respect to φ
over [0, 2π), we obtain the response of the differential microphone array, named as
azimuth pressure gradient harmonics
$$ \alpha_m^{(d)}(R_q, k) \triangleq \int_0^{2\pi} P_\theta(R_q, \tfrac{\pi}{2}, \phi, k)\, E_m(-\phi)\, d\phi \tag{3.17} $$

$$ = -\sum_{l=|m|}^{L} C_{lm}(k)\, j_l(kR_q) \sqrt{\frac{(2l + 1)(l^2 - m^2)}{2l - 1}}\; P_{(l-1)|m|}(0). \tag{3.18} $$

Note that only the odd mode harmonics are present in (3.18), since P(l−1)|m| (0) = 0
for l + |m| even.

By evaluating (3.18) for q = 1, . . . , Q, the relationship between the odd sound
field coefficients of mode m and $\alpha_m^{(d)}(R_q, k)$ on each circle can be expressed as a
matrix equation:

$$ \boldsymbol{\alpha}_m^{(d)}(k) = \mathbf{V}_m(k)\, \mathbf{C}_m^{\text{odd}}(k) \tag{3.19} $$

where $\boldsymbol{\alpha}_m^{(d)}(k) = [\alpha_m^{(d)}(R_1, k), \alpha_m^{(d)}(R_2, k), \ldots, \alpha_m^{(d)}(R_Q, k)]^T$,

$$ \mathbf{C}_m^{\text{odd}}(k) = \begin{cases} [C_{(m+1)m}(k), C_{(m+3)m}(k), \ldots, C_{(L-1)m}(k)]^T, & \text{if } m \text{ and } L \text{ are both even or both odd}, \\ [C_{(m+1)m}(k), C_{(m+3)m}(k), \ldots, C_{Lm}(k)]^T, & \text{otherwise} \end{cases} \tag{3.20} $$

and

$$ \mathbf{V}_m(k) = \begin{bmatrix}
V_{(m+1)|m|}^{(1)} & V_{(m+3)|m|}^{(1)} & \cdots & V_{(L-1)|m|}^{(1)} \\
V_{(m+1)|m|}^{(2)} & V_{(m+3)|m|}^{(2)} & \cdots & V_{(L-1)|m|}^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
V_{(m+1)|m|}^{(Q)} & V_{(m+3)|m|}^{(Q)} & \cdots & V_{(L-1)|m|}^{(Q)}
\end{bmatrix}, \tag{3.21} $$

with

$$ V_{l|m|}^{(q)} = -\sqrt{\frac{(2l + 1)(l^2 - m^2)}{2l - 1}}\; j_l(kR_q)\, P_{(l-1)|m|}(0) \tag{3.22} $$

for the case where L and m are both odd or both even (otherwise replace L − 1 in
(3.21) by L).

We can estimate the odd harmonic coefficients from (3.19), provided V m (k) is
non-singular, as
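$$ \mathbf{C}_m^{\text{odd}}(k) = \mathbf{V}_m^{\dagger}(k)\, \boldsymbol{\alpha}_m^{(d)}(k) \tag{3.23} $$

where $\mathbf{V}_m^{\dagger} = (\mathbf{V}_m^T \mathbf{V}_m)^{-1} \mathbf{V}_m^T$ is the pseudo inverse of $\mathbf{V}_m$.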

Thus the complete set of sound field coefficients can be derived by solving
for the even and odd harmonic coefficients separately, using the signals received
from omni-directional microphones (3.12) and differential microphones (3.19),
respectively.

Cardioid or general first order microphone arrays

Alternatively, the even and odd harmonic coefficients may be calculated together in
one matrix operation. This method is especially suitable for planar arrays that
utilize cardioid (or general first order) microphones instead of differential microphones.
According to (3.7), a first order (e.g., cardioid) microphone placed on the x-y plane
picks up both the even and odd components of the sound field. For a finite set of
circular arrays of first order microphones with different radii placed on the x-y plane, we can write
a matrix equation using (3.7) and following similar steps as in the previous two
subsections:

$$ \boldsymbol{\alpha}_m^{(f)}(k) = \beta\, \mathbf{U}_m(k)\, \mathbf{C}_m^{\text{even}}(k) + (1 - \beta)\, \mathbf{V}_m(k)\, \mathbf{C}_m^{\text{odd}}(k) \tag{3.24} $$

where $\boldsymbol{\alpha}_m^{(f)}(k) = [\alpha_m^{(f)}(R_1, k), \alpha_m^{(f)}(R_2, k), \ldots, \alpha_m^{(f)}(R_Q, k)]^T$ with

$$ \alpha_m^{(f)}(R_q, k) \triangleq \int_0^{2\pi} P_c(R_q, \tfrac{\pi}{2}, \phi, k)\, E_m(-\phi)\, d\phi, \tag{3.25} $$

and $\mathbf{C}_m^{\text{even}}(k)$, $\mathbf{U}_m(k)$, $\mathbf{C}_m^{\text{odd}}(k)$ and $\mathbf{V}_m(k)$ are given by (3.13), (3.14), (3.20) and
(3.21), respectively.
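If we have both omni-directional and first order circular arrays of microphones,
then we can combine (3.12) and (3.24) to obtain

$$ \begin{bmatrix} \boldsymbol{\alpha}_m(k) \\ \boldsymbol{\alpha}_m^{(f)}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{U}_m(k) & \mathbf{0} \\ \beta\, \mathbf{U}_m(k) & (1 - \beta)\, \mathbf{V}_m(k) \end{bmatrix} \begin{bmatrix} \mathbf{C}_m^{\text{even}}(k) \\ \mathbf{C}_m^{\text{odd}}(k) \end{bmatrix}. \tag{3.26} $$

Equation (3.26) can be solved to calculate both the even and odd harmonic
coefficients, given by $\mathbf{C}_m^{\text{even}}(k)$ and $\mathbf{C}_m^{\text{odd}}(k)$.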
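
A possible way to assemble and solve the block system (3.26) is sketched below. It assumes NumPy and SciPy, the same assumed normalization of $P_{l|m|}$ as $\sqrt{\frac{2l+1}{2}\frac{(l-|m|)!}{(l+|m|)!}}\,P_l^{|m|}$, cardioid microphones (β = 0.5), and hypothetical radii, wave number and coefficients. The omni-directional and first order rings are placed at the same radii purely for brevity.

```python
import math
import numpy as np
from scipy.special import lpmv, spherical_jn

def norm_legendre_at_zero(l, m):
    """Normalized associated Legendre function P_{l|m|}(0) (assumed normalization)."""
    m = abs(m)
    scale = math.sqrt((2 * l + 1) / 2 * math.factorial(l - m) / math.factorial(l + m))
    return scale * lpmv(m, l, 0.0)

def build_U(m, L, k, radii):
    """Even-mode matrix of (3.14)."""
    orders = range(abs(m), L + 1, 2)
    return np.array([[spherical_jn(l, k * R) * norm_legendre_at_zero(l, m)
                      for l in orders] for R in radii])

def build_V(m, L, k, radii):
    """Odd-mode matrix of (3.21), with entries given by (3.22)."""
    orders = range(abs(m) + 1, L + 1, 2)
    def entry(l, R):
        root = math.sqrt((2 * l + 1) * (l * l - m * m) / (2 * l - 1))
        return -root * spherical_jn(l, k * R) * norm_legendre_at_zero(l - 1, m)
    return np.array([[entry(l, R) for l in orders] for R in radii])

# Hypothetical parameters for illustration only.
k, beta = 7.5, 0.5
radii = [0.1, 0.2, 0.3, 0.4]
m, L = 0, 4

U = build_U(m, L, k, radii)    # 4 rings x 3 even unknowns (l = 0, 2, 4)
V = build_V(m, L, k, radii)    # 4 rings x 2 odd unknowns (l = 1, 3)

C_even_true = np.array([1.0 + 0.0j, -0.5 + 0.2j, 0.3j])
C_odd_true = np.array([0.7j, -0.4 + 0.0j])

alpha_omni = U @ C_even_true                                          # (3.12)
alpha_first = beta * U @ C_even_true + (1 - beta) * V @ C_odd_true    # (3.24)

# Block system of (3.26), solved in the least-squares sense.
M = np.block([[U, np.zeros_like(V)],
              [beta * U, (1 - beta) * V]]).astype(complex)
rhs = np.concatenate([alpha_omni, alpha_first])
solution, *_ = np.linalg.lstsq(M, rhs, rcond=None)
C_even_est = solution[:U.shape[1]]
C_odd_est = solution[U.shape[1]:]
```

The least-squares solve is used here instead of an explicit pseudo inverse only as a design choice; both give the same result when the block matrix has full column rank.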

3.3.2 Discrete sensor placement: sampling of continuous aperture

In the previous subsection, we assumed that the pressure $P(R_q, \pi/2, \phi, k)$, the pressure
gradient $P_\theta(R_q, \pi/2, \phi, k)$ and the first order microphone output $P_c(R_q, \pi/2, \phi, k)$
are readily available over a continuous circular aperture in (3.10), (3.17) and (3.25),
respectively. However, in practice we only have a finite set of microphones, and
hence a discrete set of samples on the circular aperture. Thus, for equally spaced
microphone arrays, we approximate the integrals in (3.10), (3.17) and (3.25) by
summations:

$$ \alpha_m(R_q, k) \approx \frac{2\pi}{N_q} \sum_{s=1}^{N_q} P(R_q, \tfrac{\pi}{2}, \phi_s, k)\, E_m(-\phi_s) \tag{3.27} $$

$$ \alpha_m^{(d)}(R_q, k) \approx \frac{2\pi}{N_q} \sum_{s=1}^{N_q} P_\theta(R_q, \tfrac{\pi}{2}, \phi_s, k)\, E_m(-\phi_s) \tag{3.28} $$

$$ \alpha_m^{(f)}(R_q, k) \approx \frac{2\pi}{N_q} \sum_{s=1}^{N_q} P_c(R_q, \tfrac{\pi}{2}, \phi_s, k)\, E_m(-\phi_s) \tag{3.29} $$

where $N_q$ is the number of microphones placed on the $q$th circle and $\phi_s$ denotes the
azimuth angle of the location of the $s$th microphone.
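
The discrete sum (3.27) can be sketched in a few lines of Python. The example assumes the normalized azimuthal function $E_m(\phi) = e^{im\phi}/\sqrt{2\pi}$ (an assumption about the convention behind (2.1)) and uses a synthetic ring pressure with randomly chosen azimuthal harmonics; all names are hypothetical.

```python
import numpy as np

def E(m, phi):
    """Assumed normalized azimuthal function E_m(phi) = exp(i m phi) / sqrt(2 pi)."""
    return np.exp(1j * m * phi) / np.sqrt(2 * np.pi)

def ring_harmonic(samples, phi_s, m):
    """Discrete approximation (3.27) of the azimuthal integral (3.10)."""
    n_mics = len(phi_s)
    return 2 * np.pi / n_mics * np.sum(samples * E(m, -phi_s))

M_max = 3                       # highest azimuthal mode present on the ring
n_mics = 2 * M_max + 1          # equally spaced microphones
phi_s = 2 * np.pi * np.arange(n_mics) / n_mics

# Synthetic ring pressure built from known azimuthal harmonics.
rng = np.random.default_rng(1)
a_true = {m: complex(rng.standard_normal(), rng.standard_normal())
          for m in range(-M_max, M_max + 1)}
pressure = sum(a * E(m, phi_s) for m, a in a_true.items())

a_est = {m: ring_harmonic(pressure, phi_s, m) for m in a_true}
print(max(abs(a_est[m] - a_true[m]) for m in a_true))
```

For a ring pressure that contains no modes above M_max, the sum with 2 M_max + 1 equally spaced samples reproduces the integral up to rounding error.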

Number of sensors per circle

Due to the spatial sampling of the sound field, one can only extract a limited number
of harmonic orders with each array. In order to sample a set of circular harmonics of
maximum order L, the number of microphones required is given by $n_{\text{mic}} \geq 2L + 1$,
and L is determined using $L \leq \lceil ekR/2 \rceil$, where k is the wave number and R is the
radius of the region of interest [53]. The exact number of microphones to be used
for each circular array thus depends on the radius of the array as well as the target
frequency band.
The truncation of spherical harmonics leads to errors, which will be discussed in
Section 3.4. The “rule of thumb” $L \leq \lceil ekR/2 \rceil$ gives a sufficiently high precision for
most applications [53]. For applications that require less accuracy, an alternative
truncation number is given by $L \leq \lceil kR \rceil$ [46], which truncates the order to a lower
value, hence reducing system complexity at the cost of accuracy. The former rule is
used in this work for higher accuracy.
Since the number of microphones on each circular array is directly linked to
the wave number k, which can then be translated into the wavelength λ, the num-
ber of microphones needed can be easily derived from the target frequency of the
application as

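$$ n_{\text{mic}} = 2L + 1 = 2\left\lceil \frac{ekR}{2} \right\rceil + 1 = 2\left\lceil \frac{e\pi R}{\lambda} \right\rceil + 1 = 2\left\lceil \frac{e\pi f R}{c} \right\rceil + 1, \tag{3.30} $$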

where c is the speed of wave propagation; in the case of sound, c = 340 m/s. Thus
one can directly calculate the number of sampling points (microphones) for a given
array radius and a target frequency band. For example, a circular array of 0.4 m
radius, designed for audio signals up to 1500 Hz, would need 33 microphones.

Figure 3.1: Example of omnidirectional (dot) and first order (triangle) microphone
arrangement on a 2D plane for 3D sound field analysis.
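
The microphone count in (3.30) is simple to evaluate programmatically. The short Python sketch below uses hypothetical function names and the speed of sound c = 340 m/s stated in the text, and evaluates the 0.4 m, 1500 Hz example.

```python
import math

def max_order(frequency_hz, radius_m, c=340.0):
    """Truncation order L = ceil(e k R / 2), with k = 2 pi f / c."""
    wavenumber = 2 * math.pi * frequency_hz / c
    return math.ceil(math.e * wavenumber * radius_m / 2)

def microphones_per_ring(frequency_hz, radius_m, c=340.0):
    """Minimum number of microphones on a ring, n_mic = 2 L + 1, as in (3.30)."""
    return 2 * max_order(frequency_hz, radius_m, c) + 1

n_mic = microphones_per_ring(1500.0, 0.4)
print(max_order(1500.0, 0.4), n_mic)
```

Here e k R / 2 is approximately 15.07, so the order is rounded up to 16, which gives the 33 microphones quoted above.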

Configuration(s)

The array system can be configured to have multiple circular microphone arrays
placed on a plane, with half of the arrays using omni-directional microphones and the
other half using first order microphones placed perpendicular to the plane. The
number of microphones on each array is decided by the target wave number and the
radius of the array; therefore, smaller arrays may have fewer microphones.
Figure 3.1 illustrates such a configuration.
An alternative configuration is to use closely placed omni-directional microphone
pairs to realize differential microphones. In this way, each microphone pair is used
in two different ways: the difference of the two microphone output signals is taken to create
the bi-directional pick-up pattern, which is used for calculation of the odd
coefficients; meanwhile, one of the two microphone outputs is used to
calculate the even coefficients. Figure 3.2 shows an example of such an array
arrangement.
The two microphone array configurations require the same number of
microphones for the same design target, although the second option uses half the number
of circular arrays. However, it should be noted that the distance between the two
microphones in each microphone pair should be small compared to the array radius,
so as to best approximate $P_\theta(r, \theta, \phi, k)$ in (3.1).

Figure 3.2: Example of omni-directional microphone pair arrangement on a 2D plane
for 3D sound field analysis.

3.3.3 Array design procedure

A general guidance for designing the planar array is provided in this section. This
procedure illustrates the basic steps in setting the parameters of the microphone
array.

Step 1: Determine the desired frequency band and the radius R of the region of
interest.

Step 2: Calculate the maximum order of the sound field using $L = \lceil ekR/2 \rceil$.

Step 3: Based on the maximum order L, decide the number of circular arrays to be
implemented. For the first order microphone configuration, at least $L_{\text{omni}} = \lceil L/2 \rceil$
omnidirectional sensor arrays and $L_{\text{first}} = L - L_{\text{omni}}$ first order arrays are
needed. For the differential microphone configuration, no fewer than $L_{\text{diff}} = \lceil L/2 \rceil$
arrays of microphone pairs are required.

Step 4: Determine the radius of each circular array. Choose the radii such that
the spherical Bessel zeros for the target frequency band are avoided. Ensure
that the radii of the circular arrays have a good diversity.

Step 5: For each circular array, decide the maximum spherical harmonic order $L_i$,
and estimate the number of microphones to be placed on the array, based on
$n_{\text{mic}} = 2L_i + 1$.

After settling on a design, the parameters for sound field calculation can then
be set based on the dimensions of the array.
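
Steps 2, 3 and 5 of the procedure above can be sketched as a small Python helper. The function name and return structure are hypothetical; the input values (850 Hz, a 0.46 m region and seven ring radii) are taken from the hypothetical design example of Section 3.5.1, and the ring radii of Step 4 are supplied by the user rather than computed.

```python
import math

def design_planar_array(f_max_hz, region_radius_m, ring_radii_m, c=340.0):
    """Sketch of Steps 2, 3 and 5 of the design procedure."""
    k = 2 * math.pi * f_max_hz / c
    L = math.ceil(math.e * k * region_radius_m / 2)        # Step 2
    n_omni = math.ceil(L / 2)                              # Step 3, first order configuration
    n_first = L - n_omni
    n_pairs = math.ceil(L / 2)                             # Step 3, differential configuration
    rings = []
    for R in ring_radii_m:                                 # Step 5
        L_i = math.ceil(math.e * k * R / 2)
        rings.append({"radius": R, "order": L_i, "n_mic": 2 * L_i + 1})
    return {"L": L, "n_omni": n_omni, "n_first": n_first,
            "n_pairs": n_pairs, "rings": rings}

design = design_planar_array(
    f_max_hz=850.0,
    region_radius_m=0.46,
    ring_radii_m=[0.46, 0.40, 0.34, 0.28, 0.22, 0.16, 0.10],
)
mic_counts = [ring["n_mic"] for ring in design["rings"]]
print(design["L"], mic_counts)
```

The helper only reports minimum counts; checking the chosen radii against the spherical Bessel zeros (Step 4) is left as a separate step.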

3.3.4 Comments
We make the following comments and observations about the proposed array:

1. The even spherical harmonics are symmetric about the z = 0 (x-y) plane,
while the odd modes are not. A planar microphone array comprising only
omnidirectional microphones cannot distinguish between waves impinging
from either side of the plane. This fact explains why this type of array
is not capable of detecting the full 3D sound field.

2. First order cardioid microphones that are placed perpendicular to the array
plane can pick up a combination of even and odd mode harmonics, but are un-
able to separate the two components. However, if the even mode harmonic co-
efficients are known (which can be provided by an omnidirectional microphone
array), then it becomes easy to solve for the remaining odd mode coefficients.
Thus a hybrid array of both omnidirectional and first order microphones is
crucial for detecting full 3D sound field using a planar array aperture.

3. The zeros in the spherical Bessel functions cause certain spherical harmonics
to be “invisible” at some radius and frequency, which limits an array’s wide
band capabilities. The proposed array aperture samples the sound field at
multiple radii, thus improving the array’s redundancy against zero points in
the spherical Bessel functions. However, the user should carefully design the
array such that at each frequency, a sufficient number of circular arrays are
unaffected by the Bessel zeros and are available for calculating the coefficients.
In general, a properly designed planar array can avoid the Bessel zero problem
for all frequencies and thus has wideband capabilities; this is shown in
Section 3.5 using a hypothetical design example.

4. Although the proposed array has a planar geometry, the free space assumption
still applies to our array system, which requires that no sound source or
scatterer should exist within the spherical region enveloping the planar array. For
this reason, the array cannot be directly placed on walls or tables to capture
the surrounding sound. However, a work-around to this problem is to place an
appropriate sound absorbing material between the rigid surface (wall, table)
and the planar array, which eliminates all reflections from the surface, so that
the setup no longer violates the free-space assumption. Furthermore, if the
reflection characteristics of the surface are known, it is possible to compensate
for the reflection in the calculation. However, this is beyond the scope of this
chapter, and we will investigate this in future work.
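
The role of multiple radii in comment 3 can be illustrated with the zeroth-order spherical Bessel function, $j_0(x) = \sin(x)/x$, whose first zero lies at $x = \pi$. The Python sketch below is a simplified illustration restricted to the l = 0 mode; the ring radii are those of the hypothetical design example in Section 3.5.1, and c = 340 m/s is the value used in the text.

```python
import math

def j0(x):
    """Zeroth-order spherical Bessel function, j_0(x) = sin(x) / x."""
    return math.sin(x) / x if x != 0.0 else 1.0

c = 340.0
radii = [0.46, 0.40, 0.34, 0.28, 0.22, 0.16, 0.10]

# The first zero of j_0(kR) occurs at kR = pi, i.e. at f = c / (2 R).
first_zero_hz = {R: c / (2 * R) for R in radii}

# At the frequency where the outer ring is "blind" to l = 0,
# evaluate how strongly the other rings still observe that mode.
f_blind = first_zero_hz[0.46]
k_blind = 2 * math.pi * f_blind / c
response = {R: abs(j0(k_blind * R)) for R in radii}

for R in radii:
    print(R, round(first_zero_hz[R], 1), round(response[R], 3))
```

Because each ring has its first zero at a different frequency, the inner rings still observe the l = 0 mode at the frequency where the outer ring cannot. A full design check would repeat this for every order l up to L.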

3.4 Error analysis

In this section we discuss two primary sources of error, and the impact they have
on the acquisition accuracy of different sound field coefficients.

3.4.1 Differential microphone approximation

As was mentioned in Section 3.3, a differential microphone can be realized using a
pair of closely placed omni-directional microphones. However, this implementation
only approximates the ideal velocity sensor, using the approximation

$$ \frac{P(x + dx) - P(x)}{dx} \approx \frac{\partial P(x)}{\partial x} = V(x). \tag{3.31} $$

By choosing a sufficiently small value of dx, the error of the approximation can be
minimized. However, due to implementation constraints such as the physical dimensions
of the microphone units, a very good approximation of (3.31) may not be achievable.
We recommend choosing $dx \approx 0.1/k_{\max}$, where $k_{\max}$ is the wave number
corresponding to the maximum operating frequency of the microphone array, so as to minimize
the error due to the approximation.
Since this approximation only exists for the sampling of the odd coefficients,
the accuracy of the calculated odd coefficients is expected to be slightly worse than
that of the even coefficients when the differential microphone approximation is used
to implement the array. This phenomenon is observed in the hypothetical design
example.
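
To give a feeling for the size of the approximation error in (3.31), the following Python sketch evaluates the forward difference for a plane wave $e^{ikx}$ travelling along the axis of the microphone pair, which is an assumed worst-case test signal rather than a case taken from the thesis. The maximum frequency of 850 Hz is borrowed from the hypothetical design example of Section 3.5.1.

```python
import cmath
import math

c = 340.0
f_max = 850.0
k_max = 2 * math.pi * f_max / c
dx = 0.1 / k_max                   # recommended microphone spacing in metres

def finite_difference_error(k, spacing):
    """Relative error of the forward difference (3.31) for P(x) = exp(i k x) at x = 0."""
    exact = 1j * k
    approx = (cmath.exp(1j * k * spacing) - 1.0) / spacing
    return abs(approx - exact) / abs(exact)

error_at_f_max = finite_difference_error(k_max, dx)
error_at_half_band = finite_difference_error(k_max / 2, dx)
print(dx, error_at_f_max, error_at_half_band)
```

Under these assumptions the spacing is a few millimetres and the relative error is roughly k dx / 2, so it shrinks in proportion to frequency below the upper band edge.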

3.4.2 Spatial sampling and spatial aliasing

One major source of error in the proposed array system is spatial sampling. By
comparing (3.10) and its discrete approximation, (3.27), the error on each harmonic
mode due to spatial sampling can be defined as

$$ \Delta E_{\text{mode}} \triangleq \int_0^{2\pi} P(r, \theta, \phi, k)\, E_m(-\phi)\, d\phi - \frac{2\pi}{n_{\text{mic}}} \sum_{u=1}^{n_{\text{mic}}} P(r, \theta(u), \phi(u), k)\, E_m(-\phi(u)). \tag{3.32} $$

The same approximation error can be defined for (3.17) and (3.25). Generally
speaking, this error is small as long as the Nyquist sampling criterion is met; however,
using extra microphones on each circular array can help to improve the accuracy of
the system.
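
The effect of violating the sampling criterion can be sketched numerically. The example below assumes the normalized azimuthal function $E_m(\phi) = e^{im\phi}/\sqrt{2\pi}$ and a synthetic ring pressure containing a single azimuthal mode; the mode numbers and microphone counts are hypothetical.

```python
import numpy as np

def E(m, phi):
    """Assumed normalized azimuthal function E_m(phi) = exp(i m phi) / sqrt(2 pi)."""
    return np.exp(1j * m * phi) / np.sqrt(2 * np.pi)

def ring_harmonic(samples, phi_s, m):
    """Discrete azimuthal sum, as in (3.27)."""
    n_mics = len(phi_s)
    return 2 * np.pi / n_mics * np.sum(samples * E(m, -phi_s))

def estimate(n_mics, m_field, m_target):
    """Estimate mode m_target when the ring pressure contains only mode m_field."""
    phi_s = 2 * np.pi * np.arange(n_mics) / n_mics
    pressure = E(m_field, phi_s)
    return ring_harmonic(pressure, phi_s, m_target)

aliased = estimate(n_mics=7, m_field=8, m_target=1)     # too few microphones
resolved = estimate(n_mics=17, m_field=8, m_target=1)   # 2 * 8 + 1 microphones
correct = estimate(n_mics=17, m_field=8, m_target=8)
print(abs(aliased), abs(resolved), abs(correct))
```

With 7 microphones the m = 8 component is indistinguishable from m = 1 and appears there at full strength, whereas 17 microphones keep the two modes separate.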
The truncation of spherical harmonic modes mentioned in Section 3.3 also leads
to errors, as the energy of the truncated higher order harmonics is aliased into the
observed harmonics during calculation. The truncation error can be expressed as

$$ \Delta E_{\text{trunc}} \triangleq \sum_{l=|m|}^{\infty} C_{lm}\, j_l(kr)\, P_{l|m|}(0) - \sum_{l=|m|}^{L} C_{lm}\, j_l(kr)\, P_{l|m|}(0) = \sum_{l=L+1}^{\infty} C_{lm}\, j_l(kr)\, P_{l|m|}(0). \tag{3.33} $$

Using the “rule of thumb” given in [53], the error is on the order of 1 percent. It
should be noted that the truncation error will only be aliased into coefficients of the
highest order, due to inherent properties of the spherical Bessel functions.
Due to the structure of the proposed design example and the nature of the
spherical Bessel functions, the lower order spherical harmonic modes are sampled
by multiple circular arrays, whereas the highest order ones are only visible to one or
two circular arrays. As a result, when solving for the sound field coefficients using
(3.12) and (3.19), the lower order coefficients are less affected by the approximation
and aliasing errors than the higher order coefficients. This trend is shown in Fig. 3.6.

3.5 Design examples

In this section we describe (i) a hypothetical design example and (ii) an actual
implementation of the proposed array. The purpose of the hypothetical example
is to illustrate the procedures to design an array and to theoretically evaluate the
array’s capabilities. Then the implemented array is used to validate the technique
through lab experiments.

3.5.1 Hypothetical design example

We consider the case of recording the sound field in a spherical region with a
diameter of approximately 1 m; the target frequency band is 50-850 Hz. This design
example illustrates the use of pairs of omni-directional microphones to realize
differential microphones in the array. We chose this array configuration because its
accuracy is worse than that of the design using both omni-directional and first
order microphones, due to the differential pattern approximation error
mentioned in Section 3.4. The radius of the array is chosen to be 0.46 m, which
is close to the size of the region of interest. Thus, for the maximum frequency of
850 Hz and a radius of 0.46 m, the array can pick up sound field harmonics up to
the order

$$ L = \left\lceil \frac{ekr}{2} \right\rceil = 10, \tag{3.34} $$

which means that the outer ring of the array should have at least 2L + 1 = 21
microphone pairs. In the same manner, we place a series of circular arrays of different
radii inside the outer circle. Following the design procedure given in Section 3.3, the
radii of the rings are set to 0.46 m, 0.4 m, 0.34 m, 0.28 m, 0.22 m, 0.16 m and
0.1 m. Thus, the numbers of microphone pairs on the rings are 21, 19, 17, 13, 11, 9
and 7, respectively.
To evaluate the performance of the proposed array system, we place a single
point source with frequencies of 150-1150 Hz at (R, θ, φ) = (1.6 m, 60°, 90°). We use
the array to estimate the spherical harmonic coefficients and then reconstruct the
sound field. We compare the reconstructed sound field to the original sound field
and calculate the overall reproduction error of the system. Figure 3.3 depicts the
error for different frequencies. Note that the error is small when the frequency is
below 850 Hz, which is the desired maximum frequency for the array. Beyond this
upper frequency, the error percentage increases dramatically. The reason is that
as the frequency increases, the order of active spherical harmonics also grows. At
frequencies above 850 Hz, the number of microphones needed to estimate the higher
frequency components is greater than the number of microphones on the array,
thus causing aliasing. Also, the total number of coefficients for each mode m exceeds
the number of circular arrays available; as a result, the matrix inversion problems
shown in (3.12) and (3.19) become under-determined, resulting in significant errors.

[Plot: error percentage (%) versus frequency (Hz)]
Figure 3.3: Reproduction error percentage for a point source of frequencies
150-1150 Hz, located at (1.6 m, 60°, 90°).

We plot the original and reconstructed (using captured spherical harmonic
coefficients) sound fields in Fig. 3.4, where plots (a) and (c) are the actual sound field
at planes z = 0 and z = 0.2 m, and (b) and (d) are the recorded and reconstructed
sound field at these two planes, respectively. We observe that the captured sound
field over the region of interest in both planes is similar to the actual sound field
in the same area.
To evaluate the array performance for different impinging angles, we move a
plane wave source at frequency 850 Hz through elevation angles in [0, 180°],
and the corresponding reproduction error is given in Fig. 3.5. As seen from Fig. 3.5,
the error is less than 1.8% over all elevation angles. Due to the symmetry of the
array over the azimuth angles, the performance is almost constant over different
azimuth angles.
To examine the array accuracy in terms of sound field coefficients, we move a
plane wave source at frequency 500 Hz over different elevation angles in the range
of [0, 180°] and calculate the average error for each coefficient, where the theoretical

Figure 3.4: Actual (a,c) and recorded (b,d) sound field due to a 850 Hz point source
located at θ = 45°, R = 1.6 m, reconstructed at z = 0 (a,b) and z = 0.2 m (c,d)
plane. [Four panels (a)-(d); axes x (m) and y (m).]

Figure 3.5: Reproduction error percentage for a plane wave source at 850 Hz, moving
from θ = 0 to θ = 180°. [Plot: error percentage (%) versus impinging elevation angle (degree).]

Figure 3.6: Average coefficient error due to a 500 Hz plane wave impinging from
different elevation angles. [Plot: normalized average error versus sound field
coefficient index; the legend distinguishes orders 0 to 6 and even-mode from
odd-mode coefficients.]

Table 3.1: Condition number of matrix U_m of the hypothetical design example for
frequencies 100 Hz, 200 Hz, 400 Hz and 800 Hz.

          m=0      m=1     m=2     m=3     m=4     m=5     m=6     m=7    m=8    m=9    m=10
100 Hz    5.76     1.00    1.00    /       /       /       /       /      /      /      /
200 Hz    13.25    4.57    1.00    1.00    /       /       /       /      /      /      /
400 Hz    46.30    19.38   15.97   6.33    1.00    1.00    /       /      /      /      /
800 Hz    181.35   21.80   110.9   13.24   54.40   41.80   10.88   4.20   7.88   1.00   1.00

coefficient response to a plane wave impinging from (ϑ, ϕ) is given by [74]

$$C_{lm} = \sqrt{4\pi}\, i^{l}\, Y_{lm}(\vartheta, \phi)^{*}. \tag{3.35}$$

Fig. 3.6 plots the normalized average error for each coefficient. It can be observed
that the lower order coefficients are more accurately measured compared to the
higher order ones; also, the even mode coefficients are more accurate compared to
the odd mode coefficients.
Table 3.1 shows the condition number of the matrix U_m of the designed array
for various frequencies. Due to the separation of the even and odd mode harmonic
coefficients, the coefficients C_{L,±L}, C_{L,±(L−1)} and C_{L−1,±(L−1)} are solved uniquely;
therefore the matrices U_L and U_{L−1} are in fact vectors, whose condition numbers are
equal to 1. The size of U_m grows as the frequency increases, and the condition numbers
for lower modes increase correspondingly. The design example consists of the minimum
number of circular arrays. We expect the condition numbers to be lower should
additional circular arrays be used in the system. Also, for high order systems (L ≥
5), regularization should be applied when inverting the matrix U_m.

In general, we can see from the simulations that the design example offers good
accuracy, with its error on the order of 1 percent. This is comparable to the
performance of spherical microphone arrays [43] and other previously proposed array
configurations such as the multiple circular microphone array [13] and the double
sided cone array [71] of the same order, assuming that a similar number of
microphones is used in each array configuration.

3.5.2 Array implementation

In order to experimentally test the proposed array design and the associated
algorithms, we built a physical array of omni-directional microphones (see Fig. 3.7). The
microphones used are Panasonic WM-61B electret microphones, which have a flat
frequency response over the whole audible frequency band, and a sensitivity tolerance
of ±4 dB. Due to hardware limitations, we only use 16 microphones to build the
array. Therefore, the array is designed to detect sound fields up to the 2nd order at
frequencies up to 1000 Hz. Based on the proposed design procedure, the system is
built to have two co-centered circular arrays. The outer ring has a radius of 10 cm
and consists of 5 omnidirectional microphone pairs, while the inner ring is 4 cm in
radius and consists of 3 microphone pairs.
Testing of the microphone array was conducted in our acoustic lab. A series
of factors contribute to the errors in the test results. First of all, although most
rigid surfaces in the lab are covered by acoustic foams to reduce reverberation, the
acoustic foams are relatively thin and thus reverberations still exist. Secondly, the
microphone capsules used have a sensitivity variation of approximately 6 dB, and the
calibration process could not guarantee high uniformity among all the microphone
units. This factor has a significant impact on the performance of the differential
microphone pairs. Furthermore, the position of each microphone unit has a deviation
of 1-2 mm, which also leads to errors in the acquired data.
In our experiment, the impinging sound fields are due to two loudspeakers that
play 850 Hz sine waves. The loudspeakers were placed at (R, θ, φ) = (1.64 m, 45°, 100°)
and (1.5 m, 90°, 225°), respectively. In order to evaluate the results of the
experiments, the same loudspeaker-microphone array setup is simulated using MATLAB.
Figure 3.8 plots the recorded sound field (a) and the simulated sound field (b) due
to a point source located at (R, θ, φ) = (1.64 m, 45°, 100°). It can be seen from the
figure that the recorded sound field is very similar to the simulated result.
Table 3.2 lists the spherical harmonic coefficients calculated from the recorded

Figure 3.7: Implemented planar microphone array, using omni-directional
microphone pairs.

Figure 3.8: Comparison of (a) recorded and (b) simulated sound field for a 850 Hz
source at (R, θ, φ) = (1.64 m, 45°, 100°), reconstructed at the z = 0.05 m plane.
Microphone locations are marked with “*”. [Two panels (a), (b); axes x (m) and y (m).]

Table 3.2: Sound field coefficient comparison between simulation and experimental
results; the sound fields are due to a point source located at (R, θ, φ) =
(1.64 m, 45°, 100°) and (1.5 m, 90°, 225°), respectively.

Recording 1   C(0,0)    C(1,−1)   C(1,0)    C(1,1)    C(2,−2)   C(2,−1)   C(2,0)    C(2,1)    C(2,2)
Recorded      1.5413    0.9608    0.9936    0.8351    0.9566    0.7312    1.0255    1.3735    0.5825
Simulated     1.1079    0.9569    1.1892    0.9626    0.8198    1.4140    0.5095    1.4513    0.6788
Mag. Error    0.5142    0.0040    0.1645    0.1325    0.1668    0.4829    1.1026    0.0536    0.1418
Phase Error   0.0009    0.0226    0.1510    0.0888    0.4383    0.3350    0.0145    0.1566    0.3904

Recording 2   C(0,0)    C(1,−1)   C(1,0)    C(1,1)    C(2,−2)   C(2,−1)   C(2,0)    C(2,1)    C(2,2)
Recorded      1.7838    1.5167    0.0968    1.5957    1.7211    0.1066    0.6772    0.1121    1.3902
Simulated     1.2380    1.4368    0         1.4137    1.6001    0         1.3127    0         1.8995
Mag. Error    0.5457    0.0798    /         0.1820    0.1209    /         −0.6355   /         −0.5094
Phase Error   −0.0515   0.1143    /         −0.1372   −0.1365   /         −3.9421   /         −0.4036

data as well as those acquired from the simulation results. It can be seen that al-
though rather significant errors occur with some coefficients, the general patterns
match very well. The microphone data used are raw recordings processed by mi-
crophone calibration data which was acquired before assembling the array, therefore
all the errors mentioned previously are present and have an impact on the recorded
coefficients. Further calibration of the system, including microphone gain
calibration, array geometry adjustments and modification of algorithm parameters, can be
expected to greatly improve the accuracy of the system.
We would like to point out that our array system utilizes 16 microphones to
capture a 2nd order sound field, whereas in theory, the minimum number of microphones
required to capture a second order sound field is 9. Therefore, the proposed array
system does not reduce the number of microphones required to sample the sound
field. The highlight of our proposed array structure is that it reduces the
physical dimension of a higher order microphone array system without compromising its
functionality.
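
As a quick check of these counts, assuming the usual total of $(L+1)^2$ spherical harmonic coefficients up to order $L$ and the ring layout of Section 3.5.2 (five microphone pairs on the outer ring and three on the inner ring):

$$(L+1)^2 \Big|_{L=2} = 9, \qquad 2 \times (5 + 3) = 16.$$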

3.6 Summary
This chapter first introduces a method of measuring complete 3D sound field infor-
mation on a 2D plane, through the combined use of omnidirectional microphones
and first order microphones. Two options are provided for planar microphone ar-
ray implementation based on the proposed sound field measuring method. Both
array configurations consist of multiple co-centered circular arrays, with one option
using both omni-directional microphones and first order microphones, while the
other option using omni-directional microphones only. The associated algorithms
to calculate sound field coefficients are also given in the chapter. We show in the
simulation example that the proposed 2D microphone array system has good accu-
racy within its designed operating frequency band, and both even and odd sound
field coefficients can be accurately calculated. We also built an experimental planar
microphone array to further validate the proposed theory.

3.7 Related patents and publications

The following patent is related to the work in this chapter.

H. Chen, T. D. Abhayapala, and W. Zhang, “Planar sensor array”,
International (PCT) Patent Application No. PCT/AU2015/000413.

This chapter’s work has been published in the following journal paper [75]:

H. Chen, T. D. Abhayapala, and W. Zhang, “Theory and design of compact
hybrid microphone arrays on two-dimensional planes for three-dimensional
soundfield analysis,” The Journal of the Acoustical Society of America, vol.
138, no. 5, pp. 3081–3092, 2015.

Chapter 4

3D sound field analysis using circular higher order microphone array

Overview: This chapter proposes the theory and design of circular higher-order
microphone arrays for 3D sound field analysis using spherical harmonics. Through
employing the spherical harmonic translation theorem, the local spatial sound fields
recorded by each higher-order microphone placed in the circular arrays are combined
to form the sound field information of a large global spherical region. The proposed
design reduces the number of the required sampling points and the geometrical com-
plexity of microphone arrays. We develop a two-step method to calculate sound field
coefficients using the proposed array structure, i) analytically combine local sound
field coefficients on each circular array and ii) solve for global sound field coeffi-
cients using data from the first step. Simulation and experimental results show that
the proposed array is capable of acquiring the full 3D sound field information over a
relatively large spherical region with decent accuracy and computational simplicity,
hence suitable for spatial ANC applications especially over large regions.

4.1 Introduction
A higher-order microphone is capable of measuring the local sound field within
its proximity, and extracting the sound field coefficients up to a certain spherical
harmonics order. It has been shown that the sound field over a large region can
be recorded using a number of higher order microphones in a spherical geometry
[76]. Compared to using omnidirectional microphones for the same purpose, the
higher order microphone array proposed in [76] requires significantly fewer
individual microphone units, thereby reducing the complexity of system deployment
especially for spatial sound recording over a large region.
In Chapter 3 we introduced a planar microphone array geometry consisting of
differential microphone pairs, which is capable of recording 3D spatial sound. A
differential microphone pair can also be seen as a special kind of higher-order
microphone, since the sound pressure and pressure gradient it captures are related to
the 0th order and 1st order spherical harmonic coefficients as shown in Theorem
1. Intuitively, if differential microphone arrays arranged on a plane can capture a 3D
sound field, then general higher-order microphones should also have this capability.
In this chapter, we present an algorithm to capture a 3D sound field using circular
arrays of higher order microphones, placed on a 2D plane. Compared to [76], this
method requires a simpler microphone geometry, thus reducing the implementation
difficulty of higher order microphone arrays for the purpose of large area sound field
recording. This method can be seen as a generalization of the algorithm discussed
in Chapter 3.

4.2 Sound field model

For clarity, in this section, we refer to the sound field with origin O as the global
sound field, which can be expressed in terms of spherical harmonics using (2.1); the
corresponding coefficients Clm are considered as the global sound field coefficients.
In addition, we define a local origin Oq whose position with respect to O is
Rq = (Rq , θq , φq ), then the sound pressure at a point r = (r, ϑ, ϕ) with respect to
Oq can be expressed by
$$P(r, \vartheta, \phi) = \sum_{\nu=0}^{\infty} \sum_{\mu=-\nu}^{\nu} B_{\nu\mu}(k)\, j_{\nu}(kr)\, Y_{\nu\mu}(\vartheta, \phi), \tag{4.1}$$

where Bνµ (k) represent the sound field coefficients with respect to the local origin
Oq . The sound field with respect to Oq is called the local sound field.
Using the spherical harmonic addition theorem (2.12), the relationship between
Bνµ and Clm can be written as

$$B_{\nu\mu} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}\, \hat{S}_{lm}^{\nu\mu}(\mathbf{R}_q). \tag{4.2}$$

In (4.2), Bνµ are the local sound field coefficients in (4.1) and Clm are the global
sound field coefficients in (2.1).

4.3 Higher-order microphone array

4.3.1 Higher-order microphone

A higher-order microphone is capable of measuring the local sound field within
its proximity, and extracting the sound field coefficients up to a certain spherical
harmonics order. Thus if a higher-order microphone of order V is placed at a local
origin Oq, the sound pressure at a point close to Oq can be expressed by a limited
summation of spherical harmonics

$$P_q(r, \vartheta, \phi) = \sum_{\nu=0}^{V} \sum_{\mu=-\nu}^{\nu} B_{\nu\mu}\, j_{\nu}(kr)\, Y_{\nu\mu}(\vartheta, \phi). \tag{4.3}$$

A total of $(V + 1)^2$ spherical harmonics and their respective weighting coefficients
are present in the summation.

4.3.2 Continuous circular higher-order microphone array

The concept of a continuous circular microphone array has been proposed in [13]. In
this work, this concept is extended to the higher-order microphone case.
Consider a continuous distribution of V th order microphones placed along a
circle (Rs , ϑs ). Each higher-order microphone, at a particular azimuth angle ϕ,
is then able to detect its local sound field coefficients, denoted as Bνµ (ϕ). The relationship
between Bνµ (ϕ) and Clm is given by the following theorem:

Theorem 2. Given a set of local sound field coefficients $B_{\nu\mu}(\phi)$ which are
measured along a circle, and an integer $m'$, their relationship with the global sound field
coefficients can be given by

$$\int_{0}^{2\pi} B_{\nu\mu}(\phi)\, E_{m'}(\phi)\, d\phi = \sum_{l=|\mu-m'|}^{\infty} C_{l(\mu-m')}\, H_{l(\mu-m')}^{\nu\mu}(R_s, \vartheta_s), \tag{4.4}$$

where

$$H_{lm}^{\nu\mu}(R_s, \vartheta_s) = 4\pi i^{\nu-l} \sum_{\ell=|\mu-m|}^{l+\nu+1} i^{\ell} (-1)^{2m-\mu}\, j_{\ell}(kR_s)\, P_{\ell|\mu-m|}(\vartheta_s)\, W, \tag{4.5}$$

with the definition of $W$ given by (2.14).

Proof. Using (2.2), (2.13) can be rewritten with $\mathbf{R}_s = (R_s, \vartheta_s, \phi)$,

$$\hat{S}_{lm}^{\nu\mu}(R_s, \vartheta_s, \phi) = H_{lm}^{\nu\mu}(R_s, \vartheta_s)\, E_{(m-\mu)}(\phi), \tag{4.6}$$

where $H_{lm}^{\nu\mu}(R_s, \vartheta_s)$ is given by (4.5). Substituting (4.6) into (2.12) yields

$$B_{\nu\mu}(\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}\, H_{lm}^{\nu\mu}(R_s, \vartheta_s)\, E_{(m-\mu)}(\phi). \tag{4.7}$$

Multiplying both sides of (4.7) with $E_{m'}(\phi)$ and integrating with respect to $\phi$ over
$[0, 2\pi)$, due to the orthogonality property of complex exponential functions

$$\int_{0}^{2\pi} E_{(m-\mu)}(\phi)\, E_{m'}(\phi)^{*}\, d\phi = \delta_{m-\mu,\, m'}, \tag{4.8}$$

the integration $\int_{0}^{2\pi} C_{lm}\, H_{lm}^{\nu\mu}(R_s, \vartheta_s)\, E_{(m-\mu)}(\phi)\, E_{m'}(\phi)\, d\phi$ is non-zero only when
$m = \mu - m'$, thus (4.7) reduces to (4.4), which completes the proof.

By replacing $B_{\nu\mu}(\phi)$ with $B_{\nu\mu}(\phi_q)$, the discrete form of (4.4) can be written as

$$\frac{1}{Q} \sum_{q=1}^{Q} B_{\nu\mu}(\phi_q)\, E_{\mu-m}(\phi_q) \approx \sum_{l=|m|}^{\infty} C_{lm}\, H_{lm}^{\nu\mu}(R_s, \vartheta_s), \tag{4.9}$$

where $Q$ is the number of sampling points evenly distributed on the circle. In (4.9),
the variable $m'$ has been replaced by $(\mu - m)$ to illustrate the direct relationship
between $B_{\nu\mu}$ and $C_{lm}$. Due to the spatial sampling, an upper bound for the range
of $(\mu - m)$ that can be evaluated is given by

$$|\mu - m| \le \left\lfloor \frac{Q - 1}{2} \right\rfloor. \tag{4.10}$$
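
The origin of the bound (4.10) can be illustrated with a short numerical sketch. It assumes that the azimuthal basis function is a plain complex exponential, E_n(ϕ) = exp(inϕ); the actual definition and normalization are given in (2.2), which is not reproduced here, and the function name is hypothetical. On Q evenly spaced points, the sampled average of exp(inϕ) equals 1 whenever n is a multiple of Q and 0 otherwise, so indices that differ by Q cannot be told apart.

```python
import cmath
import math

def sampled_average(n, Q):
    """(1/Q) * sum over q of exp(i*n*phi_q) on Q evenly spaced azimuth angles."""
    total = 0j
    for q in range(Q):
        phi_q = 2.0 * math.pi * q / Q
        total += cmath.exp(1j * n * phi_q)
    return total / Q

Q = 5
bound = (Q - 1) // 2  # right-hand side of (4.10)
for n in range(-Q, Q + 1):
    # Non-zero only for n = -Q, 0, Q: the index n = Q aliases onto n = 0.
    print(n, round(abs(sampled_average(n, Q)), 6))
print("largest unambiguous |mu - m|:", bound)
```

Restricting |µ − m| to at most ⌊(Q − 1)/2⌋ keeps every index that is evaluated inside a single alias-free window of width Q.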

4.3.3 Solving for global coefficients

A method for calculating the global sound field coefficients Clm up to order L using
the local coefficients Bνµ (ϕq ) can be formulated based on (4.9).

Step 1 of the method is to evaluate the summation on the left hand side of
(4.9). For each existing global sound field mode $m$, evaluate the summation for all
combinations of $B_{\nu\mu}(\phi_q)$ and $m$ that satisfy (4.10). Denote the summation as
$\alpha_{\nu\mu}^{m}$, then

$$\alpha_{\nu\mu}^{m} = \frac{1}{Q} \sum_{q=1}^{Q} B_{\nu\mu}(\phi_q)\, E_{(m-\mu)}(\phi_q). \tag{4.11}$$

The second step is to solve a matrix inversion problem to find $C_{lm}$. Using (4.9) and
(4.11), the relationship between $C_{lm}$ and $\alpha_{\nu\mu}^{m}$ can be represented in matrix form as

$$\boldsymbol{\alpha}_m = \mathbf{H}_m \mathbf{C}_m, \tag{4.12}$$

where $\boldsymbol{\alpha}_m = [\alpha_{00}^{m}\ \alpha_{1(-1)}^{m}\ \alpha_{10}^{m}\ \dots\ \alpha_{\nu\mu}^{m}]^{T}$, and
$\mathbf{C}_m = [C_{|m|m}\ C_{(|m|+1)m}\ \dots\ C_{Lm}]^{T}$ is the set of global coefficients of mode $m$.

$$\mathbf{H}_m = \begin{bmatrix}
H_{|m|m}^{00} & H_{(|m|+1)m}^{00} & \cdots & H_{Lm}^{00} \\
H_{|m|m}^{1(-1)} & H_{(|m|+1)m}^{1(-1)} & \cdots & H_{Lm}^{1(-1)} \\
\vdots & \vdots & \ddots & \vdots \\
H_{|m|m}^{\nu\mu} & H_{(|m|+1)m}^{\nu\mu} & \cdots & H_{Lm}^{\nu\mu}
\end{bmatrix}$$

is the matrix that contains the weights for spherical harmonics translation. A
solution for $\mathbf{C}_m$ can be found by calculating the Moore-Penrose pseudoinverse of $\mathbf{H}_m$.
The size of $\mathbf{H}_m$ is $(V + 1)^2$ by $(L - |m| + 1)$, which is significantly smaller than
the $(L + 1)^2$-by-$(L + 1)^2$ matrix inversion proposed in [58]; thus both the
computational simplicity and the condition of the matrix inversion are significantly better
compared to the method in [58].

The complete set of global sound field coefficients is found by solving (4.12) for
m = [−L : L], where L is the maximum order of the global sound field.

Implementing multiple circular higher-order microphone arrays in the global
region can improve the robustness and precision of the microphone system. Assuming
a total number of $K$ circular arrays are implemented, then in order to calculate the
global coefficients, one needs to formulate (4.12) for each circular array, denoted as
$\boldsymbol{\alpha}_{m;k} = \mathbf{H}_{m;k} \mathbf{C}_m$ for $k = 1, \dots, K$; the solution for $\mathbf{C}_m$ can then be expressed as

$$\mathbf{C}_m = (\hat{\mathbf{H}}_m^{H} \hat{\mathbf{H}}_m + \lambda \mathbf{I})^{-1} \hat{\mathbf{H}}_m^{H} \hat{\boldsymbol{\alpha}}_m, \tag{4.13}$$

where $\hat{\mathbf{H}}_m = [\mathbf{H}_{m;1}^{T}\ \mathbf{H}_{m;2}^{T}\ \dots\ \mathbf{H}_{m;K}^{T}]^{T}$, $\lambda$ is the regularization parameter, and
$\hat{\boldsymbol{\alpha}}_m = [\boldsymbol{\alpha}_{m;1}^{T}\ \boldsymbol{\alpha}_{m;2}^{T}\ \dots\ \boldsymbol{\alpha}_{m;K}^{T}]^{T}$. Evaluating (4.13) for m = [−L : L] yields the complete
set of global sound field coefficients.
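
The stacked, regularized solve in (4.13) can be sketched in a few lines. This is an illustration only: the matrices below are small made-up numbers rather than translation weights computed from (4.5), all function names are hypothetical, and a plain Gaussian elimination stands in for whatever linear solver a real implementation would use.

```python
def conj_transpose(A):
    """Conjugate (Hermitian) transpose of a matrix stored as a list of rows."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[complex(v) for v in A[i]] + [complex(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def regularized_solve(H_blocks, alpha_blocks, lam):
    """Stack the per-array systems and evaluate (H^H H + lam*I)^(-1) H^H alpha."""
    H = [row for block in H_blocks for row in block]
    alpha = [a for block in alpha_blocks for a in block]
    Hh = conj_transpose(H)
    gram = matmul(Hh, H)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve_linear(gram, matvec(Hh, alpha))

# Made-up example with K = 2 arrays and two unknown coefficients.
C_true = [1 + 1j, 0.5 - 0.2j]
H1 = [[1.0, 0.5j], [0.2, 1.0]]
H2 = [[0.3, -1j], [1j, 0.4]]
alpha1, alpha2 = matvec(H1, C_true), matvec(H2, C_true)
C_est = regularized_solve([H1, H2], [alpha1, alpha2], lam=0.0)
print(C_est)
```

With noiseless data and λ = 0 the stacked system returns the coefficients used to generate it; a small positive λ trades a slight bias for robustness when the stacked matrix is poorly conditioned.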

4.3.4 Dimensionality analysis

Due to the nature of spherical Bessel functions, only a number of $j_l(kR_s)$ are active
within a certain radius. From (2.7) and the range of $\ell$ in (4.5), we can derive
the maximum global spherical harmonic order detectable by a circular $V$th order
microphone array

$$L = V + \left\lceil \frac{ekR_s}{2} \right\rceil, \tag{4.14}$$

where $L$ is the maximum global sound field order detectable and $R_s$ is the radius of
the circular array.
The minimum number of sampling points $Q$ on a circle can be derived from eqs
(4.10) and (2.7), using $\ell \ge |\mu - m|$,

$$Q \ge 2 \left\lceil \frac{ekR_s}{2} \right\rceil + 1. \tag{4.15}$$
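
Equations (4.14) and (4.15) are simple enough to evaluate directly. The sketch below assumes a speed of sound of 343 m/s, which is not stated in this section; the function names are hypothetical.

```python
import math

def bessel_truncation(freq_hz, radius_m, c=343.0):
    """ceil(e*k*R/2): number of active spherical Bessel orders at radius R.

    c is the speed of sound in m/s (an assumed value).
    """
    k = 2.0 * math.pi * freq_hz / c
    return math.ceil(math.e * k * radius_m / 2.0)

def max_global_order(V, freq_hz, ring_radius_m, c=343.0):
    """Maximum detectable global order L of Eq. (4.14) for a ring of V-th order microphones."""
    return V + bessel_truncation(freq_hz, ring_radius_m, c)

def min_sampling_points(freq_hz, ring_radius_m, c=343.0):
    """Minimum number of sampling points Q on the ring, Eq. (4.15)."""
    return 2 * bessel_truncation(freq_hz, ring_radius_m, c) + 1

# 4th order microphone on a ring of 0.1 m radius at 3500 Hz.
print(max_global_order(4, 3500.0, 0.10))
print(min_sampling_points(3500.0, 0.10))
```

For a 4th order microphone moved around a circle of 10 cm radius at 3500 Hz, the setup used in Section 4.5, this gives L = 13 and a minimum of 19 sampling points, which is consistent with the 13th order global coefficients and the 25 sampling points reported there.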

4.4 Simulation results

A series of simulations have been conducted to validate the performance of the
proposed array structure. Two instances of the proposed array structure are used
in the following simulations. Both array configurations are designed to capture
sound fields up to 700 Hz within a sphere of 0.5 m radius, with their dimensions
determined based on eqs (4.14) and (4.15). Multiple circular arrays are employed
in both cases to guarantee the quality of the matrix inversion in (4.13). One design
consists of first order microphones arranged into four circular arrays, positioned at
(Rs , ϑs ) = (0.4 m, 90°), (0.34 m, 72°), (0.28 m, 108°) and (0.22 m, 72°); the numbers of
first order microphones on these arrays are 17, 15, 13 and 11, respectively. The second
design utilizes only second order microphones, with two circular arrays located at
(Rs , ϑs ) = (0.4 m, 90°) and (0.2 m, 72°), carrying 17 and 9 second order microphones,
respectively. AWGN is added to the microphone input in all simulations at an
SNR of 40 dB. A point source is placed at (R, θ, φ) = (1.6 m, 60°, −60°) for all the
simulation setups.

Figure 4.1: Comparison of original and recorded sound field due to 700 Hz point
source, reconstructed at z=0 and z=0.2 m plane. Panels: (a) original, z = 0 m;
(b) recorded, z = 0 m; (c) original, z = 0.2 m; (d) recorded, z = 0.2 m.
[Axes: x (m) and y (m).]

Figure 4.1 shows the simulation result for the first order array configuration. In
this simulation, the sound field generated by the point source is recorded by the
array, and the resulting sound field coefficients are used to reconstruct the sound
field. The sound field is plotted for two layers: the z = 0 plane and z = 0.2 m
plane. Plots (a) and (c) show the original sound field at these two planes, and plots
(b) and (d) show the reconstruction of the sound field coefficients obtained from
the microphone array. The result shows that the microphone array is capable of
accurately capturing the sound field within its coverage (yellow circle).

Figure 4.2: Reproduction error at different frequencies for first and second order
array configurations. [Plot: relative error (log scale) versus frequency (Hz); curves
for the 1st order and 2nd order arrays.]

Figure 4.3: Reproduction error at different elevations and frequencies for first and
second order array configurations. [Plot: relative error (log scale) versus height of
reproduction layer (m); curves for the 1st and 2nd order arrays at 500 Hz, 600 Hz
and 700 Hz.]

Figure 4.2 depicts the error performance of the two array configurations
over a frequency range of 100 − 1000 Hz. For this figure, the error is calculated by
averaging the amplitude error over the entire region of interest, and normalizing
by the average sound pressure in the same region. Since both array configurations
are designed to operate at up to 700 Hz, it can be seen from Fig. 4.2 that the
reproduction error for both configurations is low for frequencies below 700 Hz,
and the error increases rapidly once the frequency becomes higher than the design
frequency.
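
The error measure described above can be written compactly. The sketch below is one possible reading of it, assuming that "amplitude error" means the magnitude of the complex pressure difference at each evaluation point and that the region is represented by a finite list of sample points; the function name is hypothetical.

```python
def relative_error(p_original, p_recorded):
    """Mean |p_recorded - p_original| over the sample points, divided by mean |p_original|."""
    if not p_original or len(p_original) != len(p_recorded):
        raise ValueError("both fields must be non-empty and sampled at the same points")
    n = len(p_original)
    mean_error = sum(abs(r - o) for o, r in zip(p_original, p_recorded)) / n
    mean_pressure = sum(abs(o) for o in p_original) / n
    return mean_error / mean_pressure

# Two sample points with a 10 % deviation each.
print(relative_error([1.0 + 0j, 1.0 + 0j], [1.1 + 0j, 0.9 + 0j]))
```

Restricting the list of sample points to a single horizontal plane gives the per-plane variant of the same measure.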
The reproduction error is also evaluated at different planes using the same
method, but with the region limited to horizontal planes within the spherical area.
The results are shown in Fig. 4.3. The recorded sound field is reconstructed on
planes of different heights, ranging from z = 0 to z = 0.3 m. The simulation shows
that the reproduction error is smaller around the equator compared to that near
the poles of the sphere, which is due to the fact that the microphones are clustered
around the equator plane.
4.5 Experimental results

In order to further validate the proposed method, we conducted an experiment
of recording a three-dimensional sound field using a higher order microphone. We
use a single Eigenmike as a 4th order microphone, which consists of 32 condenser
microphone capsules placed on a rigid sphere of 4.2 cm radius. The goal of the
experiment is to test the robustness of the algorithm in the presence of noise and
interference in a real-life system.
The region of interest is set to be a sphere of 25 cm diameter, and a loudspeaker
is placed at (R, θ, φ) = (1.5 m, 90°, 30°) with respect to the center of the region of
interest. The Eigenmike is placed on the equatorial plane of the spherical region
and is moved around a circle of 10 cm radius. A total of 25 sampling points are
evenly distributed along the circle. At each sampling point, the Eigenmike records
a sweeping signal played by the loudspeaker, which is then converted to a set of
4th order spherical harmonic coefficients. The 25 sets of local coefficients are then
combined using the proposed method to compute the 13th order global coefficients.
A visualization of the reconstructed sound field at 3500 Hz within the region of
interest is shown in Fig. 4.4.

Figure 4.4: Reconstructed sound field at 3500 Hz due to a loudspeaker placed at
(1.5 m, 90°, 30°), sampling points are indicated by blue circles. Panels: (a) real;
(b) imaginary. [Axes: x (m) and y (m).]

We have identified the primary causes of error to be the sensitivity variation
of the microphone capsules, and the reverberation inside the laboratory. Despite
said interferences, the Eigenmike was able to record the sound field with acceptable
accuracy.

We believe that the proposed spatial sampling method allows for easier imple-
mentation of sound field recording systems compared to spherical sampling methods,
especially when combined with the recording technique used in this experiment, and
for applications such as room response modelling over a large space.

4.6 Summary
In this chapter, we propose a circular higher-order microphone array structure and
an associated analytical algorithm for sound field analysis based on spherical har-
monics decomposition. This method can be seen as a generalization of the planar
microphone array proposed in Chapter 3. In this method, through employing the
spherical harmonic translation theorem, the local spatial sound fields recorded by
each higher-order microphone placed in the circular arrays are combined to form
the sound field information of a large global spherical region. The proposed design
reduces the number of the required sampling points and the geometrical complex-
ity of microphone arrays. Simulations and experiments show that the proposed
array architecture offers decent accuracy and robustness, and has the potential of
simplifying sound field recording systems in certain applications.

4.7 Related Publications

This chapter’s work has been published in the following conference proceedings [77]:

H. Chen, T. D. Abhayapala, and W. Zhang, “3D sound field analysis using
circular higher-order microphone array,” in Proc. 23rd European Signal
Processing Conference (EUSIPCO), Aug 2015, pp. 1153–1157.

Chapter 5

Direct-to-reverberant energy ratio estimation using a first order microphone

Overview: The Direct-to-Reverberant Ratio (DRR) is an important characterization
of a reverberant environment. In the context of spatial ANC, DRR helps to
determine the strength of reverberation within the noise field’s composition. This
chapter presents a novel blind DRR estimation method based on coherence function
of sound pressure and particle velocity. First, a general expression of coherence
function and DRR is derived in the spherical harmonic domain, without imposing
assumptions on the reverberation. In this work, DRR is expressed in terms of the
coherence function as well as two parameters which are related to statistical char-
acteristics of the reverberant environment. Then, a method to estimate the values
of these two parameters using a microphone system capable of capturing first order
spherical harmonics is proposed, under three assumptions which are more realistic
than the diffuse field model. Furthermore, a theoretical analysis on the use of plane
wave model for direct path signal and its effect on DRR estimation is presented, and
a rule of thumb is provided for determining whether the point source model should
be used for the direct path signal. Finally, the ACE Challenge Dataset is used to
validate the proposed DRR estimation method. The results show that the average
full band estimation error is within 2 dB, with no clear trend of biasing.

5.1 Introduction
The direct-to-reverberation energy ratio (DRR), defined as the energy ratio between
direct signal and its reverberations, is an important parameter to characterize a re-
verberant environment, along with other parameters such as reverberation time.
Since reverberation energy affects the speech signal’s clarity [78], the DRR has an
influence on the algorithms for various applications such as speech dereverbera-
tion [79], teleconferencing [80] and hearing aids [81], both in terms of algorithm
performance and strategy. The minimum audible difference in DRR has been inves-
tigated in [82]. In [83], DRR is utilized for parametric spatial audio coding. DRR
also finds application in the field of psychoacoustics, where it is believed that
DRR helps humans determine the distance of the sound source [78, 84, 85].
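
To make the definition concrete, the sketch below computes the DRR from a known room impulse response by splitting it into a direct-path part and a reverberant part. This is not the blind estimation method developed in this chapter; it only illustrates the energy ratio itself. The split index `direct_end` and the function name are assumptions chosen for the illustration.

```python
import math

def drr_db(impulse_response, direct_end):
    """DRR in dB: energy of samples [0, direct_end) over energy of the remaining samples."""
    direct = sum(x * x for x in impulse_response[:direct_end])
    reverberant = sum(x * x for x in impulse_response[direct_end:])
    if direct <= 0.0 or reverberant <= 0.0:
        raise ValueError("both parts of the impulse response must carry energy")
    return 10.0 * math.log10(direct / reverberant)

# Toy impulse response: a unit direct-path spike followed by two weaker reflections.
h = [1.0, 0.0, 0.5, 0.5]
print(drr_db(h, direct_end=1))
```

In the toy example the direct part carries twice the energy of the reflections, so the DRR is 10 log10(2), about 3 dB.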
DRR estimation methods based on estimating room impulse responses have been
presented by Larsen et al. [86] and Falk et al. [87]. However, pre-processing is
required for both methods. Mosayyebpour et al. [88] presented a method for blind
DRR estimation based on higher order statistics, where the inverse filter of the room
impulse response is estimated using the skewness of the speech signal. Parada et
al. presented a single channel DRR estimation method based on a neural network
learning algorithm [89].
Methods for blind DRR estimation using multiple sensors have also been pro-
posed in the literature. With the goal of estimating source distance, Lu [90] pre-
sented a DRR estimation algorithm using the equalization-cancellation method,
where a binaural microphone system is used to capture sound signal. The coherence
function framework was first introduced by Vesa [91] for estimating source distance
using binaural signals, where the coherence function of the two input signals was used
as a characterization of source distance. Later, the coherence function framework
was also used by Jeub [92] to develop a DRR estimation algorithm. In this work,
the DRR is estimated by comparing the coherence value computed from two microphone
inputs with theoretical coherence functions in a diffuse sound field. Thiergart [93]
also developed a DRR estimation algorithm based on the complex coherence function
of two omnidirectional microphones. In [94], a DRR estimation method based
on the spectral standard deviation of two microphones was proposed.
Directional or beamforming microphone arrays have also been used to estimate
DRR, such as the methods presented in [95] and [96]. In both of these works, the
power spectral density (PSD) of the reverberant field was used to estimate DRR.
Another method [97] uses a circular microphone array to estimate DRR and
relies on the spatial correlation matrix of the microphones’ received signals. The
reverberation is modelled as a diffuse field in this work, while the direct path is
assumed to be a plane wave. The DRR is solved using a least mean square method.
Kuster [98] presented a method based on the coherence function of sound pressure and
particle velocity at the receiver position, measured by a differential microphone
array.
In recent years, the use of higher order microphones and the technique of spher-
ical harmonic decomposition [10] have become popular in the field of room acoustic
analysis. Jarrett et al. [12] proposed a method to estimate Signal-to-Diffuse Ra-
tio (SDR, equivalent to DRR when assuming diffuse reverberation field) utilizing
spherical harmonic coefficients captured by a higher order microphone. It is shown
that this method minimizes the SDR estimation bias. In our previous work [99], we
implemented Kuster’s method [98] in spherical harmonic domain, utilizing the first
order spherical harmonic coefficients to estimate DRR.
In many of the previous works, such as [98], [12], [93] and our previous work [99],
the direct path signal is assumed to be a plane wave, and the reverberant sound field
is assumed to be a diffuse field. In real-life reverberant environments where these
assumptions may not hold, the DRR estimation accuracy of these algorithms may
degrade. For example, the DRR estimated using Kuster’s method tends to be higher
than the ground truth in reverberant rooms [98].
In this work, we first develop a general expression for DRR estimation using
the coherence function of sound pressure and particle velocity, using a point source
model for the direct path signal, and without applying any assumptions for the
reverberation field. Using the relationship between spherical harmonic coefficients
and acoustic particle velocity, we develop the framework in the spherical harmonics
domain. Then, for the direct path model, we provide a detailed analysis on the
error in DRR estimation results when using the plane wave model. We propose a
rule of thumb for determining whether the plane wave model can be used without
introducing significant error, based on the source-to-microphone distance and
target frequency. For the reverberant sound field, we show that the reverberation
characteristics related to DRR estimation can be expressed using two parameters,
and that under the diffuse field assumptions, the values of these parameters can
be determined theoretically, which results in the simplified DRR solutions in [98]
and [12]. We also provide a theoretical analysis of the two parameters, their
physical meanings, and their impact on the DRR estimation, which explains the positive
bias phenomenon of Kuster’s method [98]. Furthermore, we propose a method to
estimate these two parameters for a given reverberant environment, using a first
order microphone, under a number of assumptions on the reverberant field which
are less strict than the diffuse field model. The DRR can then be calculated using
the estimated parameters.
The performance of the proposed DRR estimation algorithm is verified using the
ACE Challenge Dataset [100]. It is shown that the results agree with the theoreti-
cal analysis, and that the proposed method addresses the positive bias problem of
Kuster’s method [98], and the mean DRR estimation error is less than 2 dB for all
recording scenes in the ACE Challenge Dataset.

5.2 DRR estimation based on coherence measurements

5.2.1 Representation of reverberant sound field

For convenience, the spherical coordinate system is defined such that its origin is at
the position of the microphone, and its positive z axis points towards the impinging
direction of the direct path signal. In many scenarios, the natural coordinate system
may have a different orientation than our definition. In such cases, the spherical
harmonics defined under a different coordinate system (with the same origin) can
be transformed into our desired coordinate system using the spherical harmonic
rotation, which is described in Chapter 2.2.3.
The sound pressure at a point (r, θ, φ) close to the origin can be decomposed
using (2.1). For the direct path, we have

$$P_D(r,\theta,\phi,k) = \sum_{l=0}^{1}\sum_{m=-l}^{l} B_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta,\phi), \tag{5.1}$$

where only the first order sound field is considered.

For the sound field due to reverberation, we have

$$P_R(r,\theta,\phi,k) = \sum_{l=0}^{1}\sum_{m=-l}^{l} \alpha_{lm}(k)\, j_l(kr)\, Y_{lm}(\theta,\phi), \tag{5.2}$$

where Blm (k) and αlm (k) represent the coefficients of the direct path and the rever-
berant sound field, respectively.

The following assumptions are made regarding the direct path sound:

1: The direct path is due to a point source located at (r0 , ϑ, ϕ).

2: The direct path signal PD (r, θ, φ, k) is uncorrelated with the reverberant sound
field PR (r, θ, φ, k). Using (5.1) and (5.2), this assumption can be expressed as

$$E\{B_{lm}\,\alpha_{l'm'}^{*}\} = 0, \quad \text{for all } l \text{ and } m, \tag{5.3}$$

where E{·} denotes the expectation operator.

Since the direct path signal is modelled as sound waves emitted by a point source,
Blm (k) can be written using the following expression [101]

$$B_{lm}(k) = A_D\, ik\, h_l^{(1)}(kr_0)\, Y_{lm}^{*}(\vartheta,\varphi), \tag{5.4}$$

where $A_D$ indicates the magnitude of the impinging sound, $h_l^{(1)}(kr_0)$ is the $l$th
order spherical Hankel function of the first kind, $r_0$ is the distance between the
point source and the microphone with $r_0 > r$, $(r_0, \vartheta, \varphi)$ denotes the position of the
point source, and $(\cdot)^{*}$ represents complex conjugate. Since the coordinate system is
defined such that $\vartheta = 0$, and due to the fact that $Y_{11}(0,\varphi) = Y_{1,-1}(0,\varphi) = 0$, we
have $B_{11}(k) = B_{1,-1}(k) = 0$. Thus the combined sound field coefficients $C_{lm}(k)$ can
be expressed as follows

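$$\begin{aligned}
C_{00}(k) &= A_D\, ik\, h_0^{(1)}(kr_0)\, Y_{00}^{*}(0,0) + \alpha_{00}(k), && (5.5)\\
C_{10}(k) &= A_D\, ik\, h_1^{(1)}(kr_0)\, Y_{10}^{*}(0,0) + \alpha_{10}(k), && (5.6)\\
C_{11}(k) &= \alpha_{11}(k), && (5.7)\\
C_{1,-1}(k) &= \alpha_{1,-1}(k). && (5.8)
\end{aligned}$$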

Equations (5.5)-(5.8) show that in the coordinate system defined in this section,
the direct path signal is only present in C00 (k) and C10 (k), but not in C11 (k) and
C1,−1 (k).
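
As an illustration of (5.4)-(5.8), the following minimal sketch evaluates the direct path coefficients for a point source on the positive z axis. It is not taken from the original work: the function names are hypothetical, the spherical harmonics are assumed to follow the common orthonormal convention with the Condon-Shortley phase (the convention of Chapter 2 may differ), and the Hankel functions are written in closed form for orders 0 and 1 only.

```python
import cmath
import math


def sph_hankel1(l, x):
    """Closed-form spherical Hankel function of the first kind, l = 0 or 1."""
    if l == 0:
        return -1j * cmath.exp(1j * x) / x
    if l == 1:
        return -cmath.exp(1j * x) * (x + 1j) / x**2
    raise ValueError("only l = 0 and l = 1 are implemented")


def sph_harm_first_order(l, m, theta, phi):
    """Y_lm for l <= 1 (assumed orthonormal, Condon-Shortley phase)."""
    if (l, m) == (0, 0):
        return complex(math.sqrt(1 / (4 * math.pi)))
    if (l, m) == (1, 0):
        return complex(math.sqrt(3 / (4 * math.pi)) * math.cos(theta))
    if l == 1 and m in (1, -1):
        return (-m * math.sqrt(3 / (8 * math.pi)) * math.sin(theta)
                * cmath.exp(1j * m * phi))
    raise ValueError("only l <= 1 is implemented")


def direct_path_coeff(l, m, k, r0, theta_s=0.0, phi_s=0.0, a_d=1.0):
    """B_lm(k) of (5.4) for a point source at (r0, theta_s, phi_s)."""
    y = sph_harm_first_order(l, m, theta_s, phi_s)
    return a_d * 1j * k * sph_hankel1(l, k * r0) * y.conjugate()
```

With the default source direction (ϑ = 0), `direct_path_coeff(1, 1, k, r0)` and `direct_path_coeff(1, -1, k, r0)` evaluate to zero, while the (0, 0) and (1, 0) coefficients do not, which is the property used in (5.5)-(5.8).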
We note that the four coefficients C00 (k), C10 (k), C11 (k) and C1,−1 (k) can be
captured by a first order microphone. Although in the general sense, microphones
with certain directional beam patterns, such as cardioid microphones and differential
microphones are commonly referred to as first order microphones, in the context of
this section, a first order microphone is a microphone system which is capable of
acquiring the 0th and 1st order spherical harmonic coefficients of its surrounding
sound field. Specific directionalities can be realized by applying beamforming
algorithms to the 0th and 1st order coefficients.

5.2.2 Representation of DRR using coherence function

The coherence function between the sound pressure P (0, k) and particle velocity
Vz (0, k) along the z direction can be defined as [98],

$$\gamma^2 \triangleq \frac{|E\{P(0,k)\,V_z(0,k)^{*}\}|^2}{E\{|P(0,k)|^2\}\,E\{|V_z(0,k)|^2\}}. \tag{5.9}$$
Note that $P(0,k) = C_{00}Y_{00}(0,0) = C_{00}/(2\sqrt{\pi})$ and $V_z(0,k)$ is proportional to $C_{10}\cdot i$
in (2.22) (see footnote 1). Substituting $P(0,k)$ and (2.22) into (5.9), and applying (5.5) and (5.6), we
have

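$$\begin{aligned}
\gamma^2 &= \frac{|E\{C_{00}(C_{10}\cdot i)^{*}\}|^2}{E\{|C_{00}|^2\}\,E\{|C_{10}|^2\}} && (5.10)\\
&= \frac{|E\{H_0(H_1 i)^{*}\} + E\{\alpha_{00}(\alpha_{10} i)^{*}\}|^2}{(E\{|H_0|^2\} + E\{|\alpha_{00}|^2\})(E\{|H_1|^2\} + E\{|\alpha_{10}|^2\})}, && (5.11)
\end{aligned}$$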

where the assumption that direct path is uncorrelated with the reverberations (5.3)
is used, and we denote

$$\begin{aligned}
H_0 &\triangleq A_D\, ik\, h_0^{(1)}(kr_0)\, Y_{00}^{*}, && (5.12)\\
H_1 &\triangleq A_D\, ik\, h_1^{(1)}(kr_0)\, Y_{10}^{*}. && (5.13)
\end{aligned}$$

Note that the angle arguments ($\vartheta = 0$, $\varphi = 0$) of $Y_{lm}(\vartheta, \varphi)$ and the frequency
arguments ($k$) of $C_{lm}$ and $\alpha_{lm}$ have been omitted for simplicity.
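
In practice the expectations in (5.10) have to be approximated from data. The sketch below is one possible estimator, assuming (this is not stated in the text) that the expectation is replaced by an average over time frames of the coefficients at a single frequency bin; the function name is hypothetical.

```python
def coherence_sq(c00_frames, c10_frames):
    """Estimate gamma^2 of (5.10) for one frequency bin.

    c00_frames, c10_frames: equal-length sequences of complex coefficients
    C00 and C10, one value per analysis frame. The expectation E{.} is
    approximated by the sample mean over frames (an assumption).
    """
    n = len(c00_frames)
    cross = sum(a * (b * 1j).conjugate()
                for a, b in zip(c00_frames, c10_frames)) / n
    p00 = sum(abs(a) ** 2 for a in c00_frames) / n
    p10 = sum(abs(b) ** 2 for b in c10_frames) / n
    return abs(cross) ** 2 / (p00 * p10)
```

When C10 is a fixed multiple of C00 in every frame the estimate equals 1, and it drops towards 0 as the two coefficients become uncorrelated across frames.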
The linear scale direct-to-reverberant energy ratio is defined here as the ratio
of the measured acoustic energy at the position of measurement due to the direct path
to that due to the reverberation. Since $P(0) = C_{00}Y_{00}$, we have

$$\mathrm{DRR} = \frac{E\{|P_D(0)|^2\}}{E\{|P_R(0)|^2\}} = \frac{E\{|B_{00}|^2\}}{E\{|\alpha_{00}|^2\}}. \tag{5.14}$$

Footnote 1: Although removing the imaginary unit i here does not affect γ², we keep i for the
derivation of further expressions.

Using (5.4) to express B00 in (5.14), we have

$$\mathrm{DRR} = \frac{E\{|A_D\, ik\, h_0^{(1)}(kr_0)\, Y_{00}^{*}|^2\}}{E\{|\alpha_{00}|^2\}} = \frac{E\{|H_0|^2\}}{E\{|\alpha_{00}|^2\}}. \tag{5.15}$$

Substituting (5.15) into (5.10) yields

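$$\gamma^2 = \frac{\left| -\mathrm{DRR}\cdot i \left(\dfrac{h_1^{(1)}(kr_0)\,Y_{10}^{*}}{h_0^{(1)}(kr_0)\,Y_{00}^{*}}\right)^{*} + \dfrac{E\{\alpha_{00}(\alpha_{10} i)^{*}\}}{E\{|\alpha_{00}|^2\}} \right|^2}{(\mathrm{DRR}+1)\left(\mathrm{DRR}\left|\dfrac{h_1^{(1)}(kr_0)\,Y_{10}}{h_0^{(1)}(kr_0)\,Y_{00}}\right|^2 + \dfrac{E\{|\alpha_{10}|^2\}}{E\{|\alpha_{00}|^2\}}\right)}, \tag{5.16}$$

which relates the coherence value $\gamma^2$ to the DRR of the room.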

For convenience, we define

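$$\begin{aligned}
R_1 &\triangleq \frac{E\{\alpha_{00}(\alpha_{10}\cdot i)^{*}\}}{E\{|\alpha_{00}|^2\}}\,\frac{Y_{00}}{Y_{10}}, && (5.17)\\
R_2 &\triangleq \frac{E\{|\alpha_{10}|^2\}}{E\{|\alpha_{00}|^2\}}\,\frac{Y_{00}^2}{Y_{10}^2} && (5.18)
\end{aligned}$$

as the reverberation parameters, and

$$H \triangleq \left(\frac{h_1^{(1)}(kr_0)}{h_0^{(1)}(kr_0)}\right)^{*}. \tag{5.19}$$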

Then (5.16) can be simplified as

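$$\begin{aligned}
\gamma^2 &= \frac{|-\mathrm{DRR}\cdot i\cdot H + R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR}\,|H|^2 + R_2)} && (5.20)\\
&= \frac{|\mathrm{DRR}|^2|H|^2 + 2\,\mathrm{DRR}\cdot\mathrm{Im}\{HR_1^{*}\} + |R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR}\,|H|^2 + R_2)}, && (5.21)
\end{aligned}$$

where $\mathrm{Im}\{\cdot\}$ denotes the imaginary part of the argument. From (5.20) it can be seen
that the characteristics of the reverberation which affect DRR estimation using the
coherence method can be expressed using two parameters, $R_1$ and $R_2$.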

5.2.3 Assumptions for the reverberant sound field

Plane wave assumption for the direct path

In previous works, the direct path signal is often assumed to be a plane wave [12, 98].
Under this assumption, the following approximation can be applied (see Appendix
5.7 for the proof)

$$\lim_{r_0\to\infty} \frac{h_1^{(1)}(kr_0)}{h_0^{(1)}(kr_0)} \approx -i, \tag{5.22}$$

and (5.20) can be simplified into

$$\begin{aligned}
\gamma^2 &= \frac{|\mathrm{DRR} + R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR} + R_2)} && (5.23)\\
&= \frac{\mathrm{DRR}^2 + 2\,\mathrm{DRR}\cdot\mathrm{Re}\{R_1\} + |R_1|^2}{(\mathrm{DRR}+1)(\mathrm{DRR} + R_2)}, && (5.24)
\end{aligned}$$

where Re{·} denotes real part of the argument. The plane wave assumption leads
to bias in the DRR estimation, primarily for lower frequencies and smaller values of
r0 , which is shown in Section 5.3.2.
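
The relation between the point source form (5.20) and the plane wave form (5.23) can be checked numerically. The following sketch is illustrative only: it uses the closed-form expressions of the order-0 and order-1 spherical Hankel functions of the first kind, and the function names are hypothetical.

```python
import cmath


def hankel_ratio_conj(kr0):
    """H of (5.19): conjugate of h1^(1)(kr0) / h0^(1)(kr0)."""
    h0 = -1j * cmath.exp(1j * kr0) / kr0
    h1 = -cmath.exp(1j * kr0) * (kr0 + 1j) / kr0**2
    return (h1 / h0).conjugate()


def gamma2_point(drr, kr0, r1, r2):
    """Coherence predicted by the point source model (5.20)."""
    h = hankel_ratio_conj(kr0)
    return (abs(-drr * 1j * h + r1) ** 2
            / ((drr + 1) * (drr * abs(h) ** 2 + r2)))


def gamma2_plane(drr, r1, r2):
    """Coherence predicted by the plane wave model (5.23)."""
    return abs(drr + r1) ** 2 / ((drr + 1) * (drr + r2))
```

Analytically the closed forms give H = 1/(kr0) + i, so H tends to i and |H|² to 1 as kr0 grows, and `gamma2_point` converges to `gamma2_plane`; for small kr0 the term 1/(kr0) is no longer negligible, which is the source of the bias mentioned above.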

Diffuse reverberation assumptions in previous works

In many previous works, the sound field due to reverberation is modelled
as a diffuse field [12, 98], although the exact definition of a diffuse field may vary.
In [12], the diffuse field is defined as an infinite number of uncorrelated plane waves
impinging uniformly from the sphere. Under this assumption, it is shown that
$E\{\alpha_{00}\alpha_{10}^{*}\} = 0$ and $E\{|\alpha_{lm}|^2\} = E\{|\alpha_{l'm'}|^2\}$ for all values of $l$ and $m$ [12]. In
this case, $R_1 = 0$, $R_2 = |Y_{00}|^2/|Y_{10}|^2 = 1/3$, and (5.24) becomes equivalent to the
magnitude-squared version of Eq. (18) in [12] (see footnote 2).
In the case of Kuster’s work [98], the reverberant field is assumed to be plane
waves whose impinging directions distribute uniformly over θin ∈ [0, 2π), where θin
is the angle between direct path and the plane wave impinging direction. This
assumption differs from the reverberant field model used in [12], where plane waves
are distributed uniformly over the sphere; this assumption can be fulfilled if the
plane waves impinge uniformly over a circle. Under this assumption, Kuster has
derived an expression for $\gamma^2$ which takes the same form as (5.24), but with $R_1 = 0$,
and R2 = 0.5 [98].
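
The two values of R2 quoted above can be reproduced with a short numerical check. The sketch below rests on assumptions that are not spelled out in the text: each reverberant plane wave is taken to contribute coefficients proportional to i^l Y_lm^* evaluated at its arrival direction (the standard plane wave expansion), and the waves are uncorrelated with equal power, in which case (5.18) reduces to the mean of cos²θ over the arrival directions. The names are hypothetical.

```python
import math


def r2_from_directions(cos_theta_samples):
    """R2 of (5.18) for equal-power, uncorrelated plane waves (assumed model)."""
    return sum(c * c for c in cos_theta_samples) / len(cos_theta_samples)


n = 100000
# Uniform over the sphere: cos(theta) is uniformly distributed on [-1, 1].
sphere = [-1 + (2 * i + 1) / n for i in range(n)]
# Uniform over a circle containing the z axis: theta is uniform on [0, 2*pi).
circle = [math.cos(2 * math.pi * (i + 0.5) / n) for i in range(n)]

r2_sphere = r2_from_directions(sphere)
r2_circle = r2_from_directions(circle)
```

Under these assumptions `r2_sphere` approaches 1/3 (the model of [12]) and `r2_circle` approaches 1/2 (the model of [98]).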

Footnote 2: For $c_{00}$ and $c_{10}$, with $\Omega_{\mathrm{dir}} = (0, 0)$.

Assumptions on reverberation used in this work

In many real acoustic environments, the diffuse field assumptions for the reverberant
field made in [12] and [98] often cannot be met, which may lead to inaccuracies in
the DRR estimation result. In this work, in order to improve the accuracy of DRR
estimation, we relax some assumptions made on the reverberant sound field. In
particular, we assume that the reverberant field satisfies the following conditions:

1: The average sound intensity (product of sound pressure and particle velocity)
[102] of the reverberant field has the same magnitude in the x, y and z directions,

$$|E\{P^{r} V_z^{r*}\}| = |E\{P^{r} V_x^{r*}\}| = |E\{P^{r} V_y^{r*}\}|, \tag{5.25}$$

where $P^{r}$ and $V^{r}$ denote sound pressure and particle velocity due to reverberation,
respectively.

2: The expected energy of the reverberant field particle velocity is the same in the x, y
and z directions,

$$E\{|V_x^{r}|^2\} = E\{|V_y^{r}|^2\} = E\{|V_z^{r}|^2\}. \tag{5.26}$$

3: The reverberant field sound intensity is zero mean when averaged over a frequency
band,

$$\int_{k_1}^{k_2} E\{P^{r}(k)\,V_z^{r}(k)^{*}\}\,dk = 0, \tag{5.27}$$

where $k_1$ and $k_2$ represent the boundaries of a frequency band.

In (5.27), the real part of $P^{r} V_z^{r*}$ is often referred to as the active sound intensity,
which represents the coherent flow of sound energy in the z direction [102]. The
imaginary part of the sound intensity, on the other hand, is referred to as the
reactive sound intensity, which represents the coherent, but non-propagating, “standing
wave” sound energy. A detailed justification of the assumption (5.27) is given in
Section 5.3.1.
In a diffuse sound field, both active and reactive components of the sound
intensity are equal to zero since the phase of particle velocity varies randomly. The
energy of particle velocity can be analytically computed [12, 98]. Applying these
results to (5.23) leads to simplified expressions of $\gamma^2$ as shown in [12] and [98].
However, this work does not assume a diffuse field. Hence the expected energy of
particle velocity and the sound intensity cannot be directly computed without
knowledge of the reverberant field. Therefore, a method to estimate these
characteristics is needed to compute the DRR. The following subsection describes one such
method, using measurements from a first (or higher) order microphone system.

5.2.4 Reverberant field estimation

From (5.7) and (5.8), it can be observed that the spherical harmonic coefficients $C_{11}$
and $C_{1,-1}$ do not contain the direct path signal. In fact, $C_{11}$ and $C_{1,-1}$ collectively
represent the particle velocity of the reverberation in the directions orthogonal to
the direct path. The assumptions on the reverberation (5.25), (5.26) and (5.27) can
be expressed using spherical harmonic coefficients as

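$$\left|\frac{E\{\alpha_{00}(\alpha_{10} i)^{*}\}}{-i\sqrt{12\pi}}\right| = \left|\frac{E\{\alpha_{00}(\alpha_{11} i + \alpha_{1,-1} i)^{*}\}}{-i\sqrt{24\pi}}\right| = \left|\frac{E\{\alpha_{00}(\alpha_{11} i - \alpha_{1,-1} i)^{*}\}}{-\sqrt{24\pi}}\right|, \tag{5.28}$$

$$2\cdot E\{|\alpha_{10}|^2\} = E\{|\alpha_{11} + \alpha_{1,-1}|^2\} = E\{|\alpha_{11} - \alpha_{1,-1}|^2\}, \tag{5.29}$$

and

$$\int_{k_1}^{k_2} E\{\alpha_{00}(k)\,(\alpha_{10}(k)\cdot i)^{*}\}\,dk = 0. \tag{5.30}$$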

Since it is assumed that the direct path signal is uncorrelated with the reverber-
ation signal, substituting (5.3), (5.5), (5.7) and (5.8) into (5.28), we can write
$$\sqrt{2}\,|E\{\alpha_{00}(\alpha_{10} i)^{*}\}| = |E\{C_{00}(C_{11} i + C_{1,-1} i)^{*}\}| = |E\{C_{00}(C_{11} i - C_{1,-1} i)^{*}\}|, \tag{5.31}$$

which illustrates a way to indirectly estimate the value of |R1 | in (5.24). Using (5.5)
(5.6), the energy of the reverberation can be approximated by

$$E\{|\alpha_{00}|^2\} = E\{|C_{00}|^2\} - \frac{Y_{00}^2}{Y_{10}^2\,|H|^2}\left(E\{|C_{10}|^2\} - E\{|\alpha_{10}|^2\}\right). \tag{5.32}$$

If the plane wave model is used for the direct path, (5.32) can be simplified using
(5.22), as
$$E\{|\alpha_{00}|^2\} \approx \left(\frac{E\{|C_{00}|^2\}}{Y_{00}^2} - \frac{E\{|C_{10}|^2\}}{Y_{10}^2} + \frac{E\{|\alpha_{10}|^2\}}{Y_{10}^2}\right) Y_{00}^2, \tag{5.33}$$

where $E\{|\alpha_{10}|^2\}$ can be estimated using (5.29). Substituting (5.29), (5.31) and
(5.32) into (5.17), the estimation expression for $|R_1|$ can be written as

$$|R_1| \approx \frac{1}{2\sqrt{2}} \cdot \frac{|E\{C_{00}(C_{11} i + C_{1,-1} i)^{*}\}| + |E\{C_{00}(C_{11} i - C_{1,-1} i)^{*}\}|}{E\{|C_{00}|^2\} - E\{|C_{10}|^2\}\dfrac{Y_{00}^2}{Y_{10}^2 |H|^2} + M_{pwr}\dfrac{Y_{00}^2}{Y_{10}^2 |H|^2}}, \tag{5.34}$$
where we define

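$$M_{pwr} \triangleq \frac{1}{2}\left(E\{|C_{11} + C_{1,-1}|^2\} + E\{|C_{11} - C_{1,-1}|^2\}\right). \tag{5.35}$$

Similarly, by substituting (5.7), (5.8) and (5.32) into (5.18), $R_2$ can be written as

$$R_2 \approx \frac{M_{pwr}}{E\{|C_{00}|^2\} - E\{|C_{10}|^2\}\dfrac{Y_{00}^2}{Y_{10}^2 |H|^2} + M_{pwr}\dfrac{Y_{00}^2}{Y_{10}^2 |H|^2}}. \tag{5.36}$$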

It can be seen that all the coefficients required for the calculation can be acquired
by a first order microphone array directly. The estimated values of |R1 | and R2 can
be directly substituted into (5.21) or (5.24) for estimation of DRR using $\gamma^2$.

5.2.5 DRR estimation procedure

Assuming that the value of DRR is positive, the solution for DRR can be found by
solving (5.20) or (5.24). For the plane wave model, the solution can be derived as
$$\mathrm{DRR} = \frac{\gamma^2 + R_2\gamma^2 + \sqrt{4|R_1|^2(\gamma^2 - 1) + \gamma^4(R_2 - 1)^2 + 4R_2\gamma^2}}{2 - 2\gamma^2}, \tag{5.37}$$

where the assumption (5.27) is used, which leads to $\mathrm{Re}\{R_1\} = 0$. The calculated
DRR is in linear scale, and the more commonly used log-scale $\mathrm{DRR}_{log}$ is defined as

$$\mathrm{DRR}_{log} = 10\log_{10}\mathrm{DRR}. \tag{5.38}$$
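
Equations (5.37) and (5.38) translate directly into code. The following is a minimal sketch with hypothetical function names; it assumes 0 ≤ γ² < 1 and raises an error when the term under the square root is negative, a case the text does not discuss.

```python
import math


def drr_from_coherence(gamma2, r1_abs, r2):
    """Linear-scale DRR from (5.37), valid under Re{R1} = 0.

    gamma2: measured coherence gamma^2 (assumed to satisfy 0 <= gamma2 < 1)
    r1_abs: |R1|, r2: R2 (the reverberation parameters)
    """
    disc = (4 * r1_abs**2 * (gamma2 - 1)
            + gamma2**2 * (r2 - 1) ** 2
            + 4 * r2 * gamma2)
    if disc < 0:
        raise ValueError("no real DRR solution for these parameters")
    return (gamma2 + r2 * gamma2 + math.sqrt(disc)) / (2 - 2 * gamma2)


def to_db(drr):
    """Log-scale DRR of (5.38)."""
    return 10 * math.log10(drr)
```

A simple consistency check is to pick a DRR, compute γ² from (5.24) with a purely imaginary R1, and confirm that `drr_from_coherence` recovers the chosen value.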

From our experience in testing the algorithm using the ACE Challenge Development
Dataset [100], the estimation of $|R_1|$ and $R_2$ at a single frequency is often
unstable. However, for typical room environments, one can assume that the
characteristics of reverberation do not vary rapidly over frequency, since sound waves of
similar wavelength are likely to have similar propagation modes. Therefore, $|R_1|$ and
$R_2$ can be regarded as constant if the frequency band of interest is sufficiently narrow,
and one can use the average values of $|R_1|$ and $R_2$ over a particular frequency band
for the calculation of DRR for this frequency band.

For subband and full band DRR estimation, the results are obtained by taking
the average of the single frequency DRR estimations within the band, then the
values are converted to log scale for convenience.
We recommend the following procedure to estimate the DRR of a particular
frequency band from a recording:

Step 1 Determine the direct path impinging direction using a suitable
Direction-of-Arrival (DOA) algorithm, which can be done using the
signal received by the first order (or higher order) microphone.
Step 2 Use an appropriate algorithm to detect the frames of the recording
that contain speech signal, and calculate the 0th and 1st order
spherical harmonic coefficients for each frequency bin within the
frequency band.
Step 3 Rotate the spherical harmonics using the method in Chapter 2.2.3,
such that the z-axis is aligned with the direct path.
Step 4 Calculate $|R_1|$ and $R_2$ for each frequency bin using (5.34) and
(5.36), then average over all the frequency bins to obtain an
estimate for the whole frequency band.
Step 5 Calculate $\gamma^2$ for each frequency bin, then use (5.20) or (5.37) with the
averaged $|R_1|$ and $R_2$ to estimate the DRR for each frequency bin.
Step 6 Average the DRR estimates calculated from each frequency bin
to obtain the subband or full band DRR estimate. Convert the
result to log scale.
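
The averaging in Steps 4 to 6 can be sketched as follows for the plane wave model. This is an illustrative outline rather than the implementation used for the results: the function name is hypothetical, the per-bin inputs are assumed to have been computed already (Steps 1 to 3 and the per-bin part of Steps 4 and 5), every γ² is assumed to be below 1, and the clamping of a negative term under the square root to zero is an added assumption.

```python
import math


def band_drr_db(gamma2_bins, r1_abs_bins, r2_bins):
    """Band DRR estimate in dB from per-bin gamma^2, |R1| and R2 values."""
    # Step 4: average the reverberation parameters over the band.
    r1_avg = sum(r1_abs_bins) / len(r1_abs_bins)
    r2_avg = sum(r2_bins) / len(r2_bins)

    # Step 5: per-bin DRR from (5.37) using the band-averaged parameters.
    drr_bins = []
    for g2 in gamma2_bins:
        disc = (4 * r1_avg**2 * (g2 - 1)
                + g2**2 * (r2_avg - 1) ** 2
                + 4 * r2_avg * g2)
        disc = max(disc, 0.0)  # assumption: clamp small negative values
        drr_bins.append((g2 * (1 + r2_avg) + math.sqrt(disc)) / (2 - 2 * g2))

    # Step 6: average in linear scale, then convert to log scale (5.38).
    return 10 * math.log10(sum(drr_bins) / len(drr_bins))
```

Note that the averaging in Step 6 is done on the linear-scale values before the conversion to dB, following the order given in the procedure.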

A disadvantage of the original coherence method for DRR estimation is that the
angle between the direct path and the particle velocity measurement direction is
generally unknown, and in a real measurement, the microphone has to be pointed
towards the direct path [98]. In our improved method, since we use a first order
microphone for measurement, which records the complete sound field, it is possible
to derive the velocity measurement in any direction, through rotation of the spherical
harmonic coefficients. In addition, the data acquired by the microphone can be used
to perform Direction-of-Arrival (DOA) estimation for the direct path, therefore there
is no special requirement for positioning the microphone during measurements.

5.3 Impact of parameters on DRR estimation

5.3.1 Reverberation parameter

In order to illustrate the impact of $R_1$ and $R_2$ on the estimated DRR, we plot the
theoretical DRR against $\gamma^2$ using (5.24), with the diffuse field parameter settings
proposed by Kuster [98] ($R_1 = 0$, $R_2 = 0.5$) and Jarrett [12] ($R_1 = 0$, $R_2 = 1/3$), as well
as a number of other values that were commonly found in our experiment
(($R_1$, $R_2$) = (0, 0.28), (0.15, 0.28), (−0.15, 0.28) and (0.25i, 0.28), respectively), as
shown in Fig. 5.1. We note that the assumption (5.27) is not applied here, in order to
illustrate the impact of $R_1$ on the DRR estimation. It can be seen from Fig. 5.1 that
depending on the values of $R_1$ and $R_2$, a deviation of ±3 dB in estimated DRR can be
observed for low values of $\gamma^2$.

[Figure 5.1: Theoretical $\gamma^2$ versus estimated direct-to-reverberant ratio (DRR)
calculated using (5.37), under various reverberation parameter settings. Horizontal
axis: Direct-to-Reverberant Ratio (dB), from −10 to 20; vertical axis: $\gamma^2$. Curves:
($R_1$, $R_2$) = (0, 0.5), (0, 0.33), (0, 0.28), (−0.15, 0.28), (0.15, 0.28) and (0.25i, 0.28).]
From (5.17), it can be seen that $R_1$ is equivalent to the sound intensity in the
z direction with certain normalization. Since all normalization factors are real,
the real and imaginary parts of $R_1$ correspond to the active and reactive sound
intensity, respectively. When $\mathrm{Re}\{R_1\} > 0$, the net energy flow of the
reverberation coincides with the direct path signal; the reverberation is then
“added” to the direct path and contributes to the coherence function
$\gamma^2$ positively. On the other hand, if $\mathrm{Re}\{R_1\} < 0$, the net reverberation energy flow
in the z direction opposes the direct path, essentially cancelling part of the direct
path sound intensity, and therefore contributes to $\gamma^2$ negatively. As a result,
as can be seen in Fig. 5.1, for the same value of $\gamma^2$, a positive $\mathrm{Re}\{R_1\}$ corresponds
to a lower value of DRR, and vice versa.
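
The effect of the sign of Re{R1} can be reproduced by solving (5.24) for DRR without setting Re{R1} to zero. Rearranging (5.24) gives the quadratic (1 − γ²)·DRR² + (2Re{R1} − γ²(1 + R2))·DRR + (|R1|² − γ²R2) = 0. The sketch below, with a hypothetical function name, returns its larger root, which is the only positive root whenever |R1|² < γ²R2; other cases are not handled.

```python
import math


def drr_plane_general(gamma2, r1, r2):
    """Larger root of (5.24) solved for DRR, keeping a complex R1."""
    r1 = complex(r1)
    a = 1 - gamma2
    b = 2 * r1.real - gamma2 * (1 + r2)
    c = abs(r1) ** 2 - gamma2 * r2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

For example, with γ² = 0.4 and R2 = 0.28, the values R1 = 0.15, 0 and −0.15 give DRR estimates of roughly 0.60, 1.03 and 1.46 in linear scale, so a positive Re{R1} lowers the DRR obtained for a given γ² and a negative one raises it.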
The absolute value of $R_1$ represents the overall coherence of the reverberant
field in the z direction. This includes the reactive part of $R_1$, which corresponds
to the resonating reverberation energy. It can be seen from (5.24) that $|R_1|$ always
contributes to $\gamma^2$ positively. Therefore, as seen in Fig. 5.1, a non-zero value of $|R_1|$
results in a lower value of DRR for the same $\gamma^2$; this is especially significant at lower
values of $\gamma^2$.
Using a first order microphone, it is possible to estimate $|R_1|$ for each frequency
bin, if it is assumed that the reverberant sound intensity is uniform in each
direction. Unfortunately, the sign of $\mathrm{Re}\{R_1\}$, which indicates the direction of energy
flow, cannot be determined through observation of the sound field in its orthogonal
directions. However, by observing the reverberant sound field from the ACE Challenge
Development Dataset [100], it was found that both the active and reactive sound
intensity of the reverberation in the x and y directions have zero mean when
averaged over each 1/3 octave subband, indicating that the energy flow of reverberation
changes randomly and rapidly with frequency. Therefore it is reasonable to assume
that $P V_z^{*}$ is also zero mean when observed at multiple frequencies. As a result,
when averaging the estimated DRR over each subband, the impact of $\mathrm{Re}\{R_1\}$ (and
$\mathrm{Im}\{HR_1^{*}\}$ in (5.21)) on each frequency bin will be cancelled out, and the term can be
removed in the derivation of (5.37), provided that appropriate frequency averaging
is performed after calculating DRR for each frequency bin.
As can be seen from Fig. 5.1, R2 does not affect the estimated DRR as strongly
as R1 , and a lower value of R2 results in a slightly lower estimation of DRR. From
(5.18) it can be seen that R2 reflects the expected energy ratio between sound
pressure and particle velocity. In Jarrett’s diffuse field model [12], the value of R2
is lower (R2 = 1/3), therefore, we expect Jarrett’s method to yield a slightly lower
estimation of DRR compared to Kuster’s. From our analysis of the ACE Challenge
Development Dataset, the value of $R_2$ typically varies between 0.25 and 0.33, which is
close to Jarrett’s model (see Table 5.2).

5.3.2 Nearfield sound source

In order to analyze the DRR estimation error due to using a plane wave to
approximate the direct path sound field, we compute the difference in the estimated DRR
using (5.20) and (5.24) ($\Delta\mathrm{DRR} \triangleq 10\log_{10}(\mathrm{DRR}_{plane}/\mathrm{DRR}_{point})$). It can be seen by
observing (5.20) that the calculated $\mathrm{DRR}_{point}$ depends on the product $kr_0$. Fig. 5.2
plots $\Delta\mathrm{DRR}$ as a function of $kr_0$, for various values of $\gamma^2$. In this figure, for
simplicity, we assume that $R_1 = 0$, $R_2 = 0.5$. The selected values of $\gamma^2$ (0.86, 0.65, 0.33 and
0.19) correspond to $\mathrm{DRR}_{plane}$ = 10 dB, 5 dB, 0 dB and −2.5 dB, respectively, using the
parameter settings described above.
From Fig. 5.2 we can see that the plane wave model results in higher DRR
estimates than the point source model for smaller values of $kr_0$, where
$\Delta\mathrm{DRR}$ is approximately 2 to 4 dB as $kr_0$ approaches 0, depending on the value of $\gamma^2$.
At higher frequencies and larger source-microphone distances (higher $kr_0$), the
difference between the two methods reduces rapidly; for $kr_0 > 3$, the difference in the
calculated DRR using the two models becomes negligible.

[Figure 5.2: Plot of theoretical DRR versus $kr_0$ using plane wave model (5.24) and
point source model (5.20) with $\gamma^2$ = 0.86, 0.65, 0.33 and 0.19. Horizontal axis:
$k \cdot r_0$, from 0.5 to 3; vertical axis: $\Delta\mathrm{DRR}$ (dB).]
Comparing the curves corresponding to each value of γ 2 , it can be seen that the
estimation error of the plane wave model is smaller when γ 2 is larger, corresponding
to higher values of DRR. The user may select the appropriate model for their ap-
plications, based on the target frequency band and expected source distance. Here,
we propose a rule of thumb for determining whether to use the point source model
or the plane wave model. When kr0 > 2, the error caused by plane wave model is
less than 0.5 dB for all values of γ 2 , as can be observed in Fig. 5.2. For kr0 < 2, the
use of point source model is recommended for improving DRR estimation accuracy.
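
The curves of Fig. 5.2 and the rule of thumb can be reproduced approximately with the sketch below. It assumes R1 = 0 as in the figure, uses |H|² = 1 + 1/(kr0)², which follows from the closed forms of the order-0 and order-1 spherical Hankel functions, and solves (5.20) and (5.24) for the positive DRR root. The function names are hypothetical.

```python
import math


def _positive_root(a, b, c):
    """Larger root of a*x^2 + b*x + c = 0 (positive here because c < 0 < a)."""
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


def drr_point(gamma2, kr0, r2):
    """DRR solving the point source model (5.20) with R1 = 0."""
    h2 = 1 + 1 / kr0**2  # |H|^2 for the point source model
    return _positive_root(h2 * (1 - gamma2), -gamma2 * (h2 + r2), -gamma2 * r2)


def drr_plane(gamma2, r2):
    """DRR solving the plane wave model (5.24) with R1 = 0."""
    return _positive_root(1 - gamma2, -gamma2 * (1 + r2), -gamma2 * r2)


def delta_drr_db(gamma2, kr0, r2=0.5):
    """Delta-DRR in dB: plane wave estimate relative to point source estimate."""
    return 10 * math.log10(drr_plane(gamma2, r2) / drr_point(gamma2, kr0, r2))
```

Evaluating `delta_drr_db` at kr0 = 2 for the four values of γ² used in Fig. 5.2 gives differences of about 0.3 to 0.4 dB, in line with the 0.5 dB bound stated above, while for very small kr0 the difference approaches about 3 dB at γ² = 0.33.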

5.4 Validation using ACE Challenge Database

5.4.1 The ACE Challenge Database

The ACE Challenge Database is used to validate our algorithm [100]. The database
consists of two datasets: the Evaluation dataset and the Development dataset. The
Development dataset is provided to the ACE Challenge participants as a training
database, with which the participants can train and fine-tune their algorithms. The
Evaluation dataset is used to evaluate the performance of fine-tuned algorithms.

Table 5.1: Room dimensions (approx.) and minimum/maximum DRR for each room
recording configuration

Room Name              Lecture Room 1  Lecture Room 2  Meeting Room 1  Meeting Room 2  Office 2
Length (m)             6.9             13.4            6.6             10.3            5.1
Width (m)              9.7             9.2             4.7             9.2             3.2
Height (m)             3.0             2.9             3.0             2.6             2.9
Volume (m^3)           200             360             92              250             48
Setup A min DRR (dB)   -0.82           -0.37           -2.0            -2.6            -0.44
Setup A max DRR (dB)   15              13              11              11              13
Setup B min DRR (dB)   0.87            -3.7            -3.1            1.1             -2.3
Setup B max DRR (dB)   7.9             6.4             7.6             12              9.5
The Evaluation dataset consists of 4500 synthesized recordings of various con-
figurations. A total of 5 rooms are used to record the room impulse responses, with
two recording setups (positions) for each room. The room details are summarized in
Table 5.1. We note that although the impulse responses of 7 rooms were recorded
according to [100], only 5 of them are used to create the Evaluation dataset; the other
two rooms were used to create the Development dataset. The speech and noise setup
for the Development dataset differs from that of the Evaluation dataset; therefore,
in this work, the Development dataset is only used for developing the DRR algorithm,
and the results presented in this section are all generated using the Evaluation dataset.
The impulse responses are recorded using an Eigenmike, and the reverberant
speech recordings are synthesized by convolving the impulse responses with anechoic
speech recordings [100]. The speech recordings consist of voices of 10 talkers, 5
female and 5 male, with 5 separate utterance recordings for each talker. Three
different types of noise (“Ambient”, “Fan” and “Babble”) are recorded separately
under the same room setup and mixed into the reverberant speech recordings, each
with three SNR settings (−1 dB, 12 dB and 18 dB).
The ground truths for both full band and subband DRR have been provided.
For subband DRR, the central frequencies for all bands have been chosen according
to the ISO standard [100].

5.4.2 Algorithm setup

Since the ground truth for direct path DOA is not given, we have to estimate the
DOA for each of the ten scene setups. This is done by segmenting each speech
recording into multiple short frames, and selecting the frames that correspond to
the beginning of each utterance (where the impinging signal is almost purely due to
the direct path). To find the frames containing speech, a simple speech detection
algorithm calculates the average signal energy of each frame and selects the frames
with higher energy, which are considered to contain the speech signal. If the energy of
a frame is significantly higher than that of the previous one, then this frame is considered
to contain the beginning of an utterance. We then calculate the spherical harmonic
coefficients for each selected frame and for frequencies between 200 and 2000 Hz, and
perform a frequency-averaged MUSIC DOA estimation in the spherical harmonic
domain [11, 103]. The estimated DOA is used for further calculations.
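
The energy-based frame selection described above could look like the following sketch. The text does not give the thresholds, so the choices here (a frame counts as speech when its energy exceeds the mean frame energy, and as an utterance onset when its energy also exceeds four times that of the previous frame) are purely illustrative assumptions, as are the function and parameter names.

```python
def frame_energies(signal, frame_len):
    """Average energy of consecutive non-overlapping frames of a signal."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def detect_frames(energies, energy_factor=1.0, onset_ratio=4.0):
    """Return (speech frame indices, utterance onset frame indices).

    energy_factor and onset_ratio are hypothetical tuning parameters.
    """
    mean_e = sum(energies) / len(energies)
    speech = [i for i, e in enumerate(energies) if e > energy_factor * mean_e]
    onsets = [i for i in speech
              if i > 0 and energies[i] > onset_ratio * energies[i - 1]]
    return speech, onsets
```

The onset frames would then be passed to the DOA estimator, while the full set of speech frames would be used for the DRR analysis.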

In order to maintain the highest possible frequency resolution while avoiding
violation of the assumption that the direct path signal and reverberation
are uncorrelated, we choose the analysis window length to be 10 ms. When
fine-tuning our algorithm using the ACE Development Dataset, it was found that a
window length shorter than 10 ms does not reduce the average value of $\gamma^2$; therefore
we assume that the chosen window length is appropriate.

For each speech recording, only the windows that contain the speech signal are
used for analysis. For each frequency subband, we calculate the 0th and 1st order
spherical harmonic coefficients for each selected window and for all the frequency
bins within each subband. We then follow steps 3 through 6 in Section 5.2.5 to
estimate DRR for each subband.

Although the ground truth for subband DRR is given for all frequency bands
between 20 Hz and 20 kHz [100], the recorded speech signal does not cover the
complete spectrum. Therefore, we focus on the subbands with central frequency
between 199.52 Hz and 2511.89 Hz, where there is sufficient energy in the speech
recordings for DRR estimation. For this reason, we cannot estimate the full band
DRR in the complete sense; instead, we calculate the average DRR over the selected
subbands, which is used as the full band estimation. The full band ground truth
DRR used for comparison is also calculated by averaging the corresponding subband
ground truths, instead of using the full band DRR provided by the database.
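
The two centre frequencies quoted above coincide with the base-10 one-third-octave series 10^(n/10) Hz for n = 23 and n = 34. Assuming the selected subbands follow that series (an inference from the quoted values, not a statement from the database documentation), the centre frequencies and nominal band edges can be listed as follows; the function names are hypothetical.

```python
def third_octave_centres(n_first=23, n_last=34):
    """Base-10 one-third-octave centre frequencies 10^(n/10) in Hz."""
    return [10 ** (n / 10) for n in range(n_first, n_last + 1)]


def band_edges(centre_hz):
    """Lower and upper edge of a base-10 one-third-octave band."""
    return centre_hz * 10 ** (-1 / 20), centre_hz * 10 ** (1 / 20)


centres = third_octave_centres()
```

Under this assumption there are 12 subbands, from about 199.5 Hz to about 2511.9 Hz, and the full band estimate is the average over these 12 subband estimates.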

The exact source-to-microphone distance is not provided by the database. However,
according to the organizer of the ACE Challenge, the microphones are placed
no less than 1 m away from the source for all recording scenarios. Since we only
focus on frequencies above 199.52 Hz, using the rule of thumb proposed in Section 5.3.2,
kr0 ≈ 3.66 > 2; the plane wave model is therefore sufficiently accurate and is used
for DRR estimation.
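As a quick check of the plane wave condition, the sketch below evaluates kr0 at the lowest subband centre frequency and at the minimum source distance of 1 m. The speed of sound (343 m/s) and the helper name `wavenumber_distance_product` are assumptions introduced for illustration; with this value of c the product is about 3.65, close to the figure quoted above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value (not stated in this section)


def wavenumber_distance_product(frequency_hz, distance_m, c=SPEED_OF_SOUND):
    """Return k * r0, where k = 2*pi*f / c is the wavenumber."""
    return 2.0 * math.pi * frequency_hz / c * distance_m


# Lowest subband centre frequency and minimum source-to-microphone distance.
kr0 = wavenumber_distance_product(199.52, 1.0)

# Rule of thumb referred to in the text: plane wave model acceptable if kr0 > 2.
plane_wave_ok = kr0 > 2.0
print(f"kr0 = {kr0:.2f}, plane wave model acceptable: {plane_wave_ok}")
```

Since kr0 grows linearly with both frequency and distance, every higher subband and every larger source distance satisfies the same condition.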

The error of estimated DRR is defined as

\[ \mathrm{DRR}_{\mathrm{err}} = 10 \log_{10} \frac{\mathrm{DRR}_{\mathrm{est}}}{\mathrm{DRR}_{\mathrm{truth}}}. \tag{5.39} \]

The mean and standard deviation of the DRR estimation error are then calculated using
DRRerr from each recording.
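The error measure in (5.39) and its statistics can be sketched as follows. The recording values are made-up placeholders, the function name `drr_error_db` is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def drr_error_db(drr_est, drr_truth):
    """Estimation error in dB as in (5.39); both inputs are linear energy ratios."""
    return 10.0 * math.log10(drr_est / drr_truth)


# Hypothetical (estimate, ground truth) pairs, one per recording.
recordings = [(2.0, 1.0), (0.8, 1.0), (1.5, 1.2)]

errors = [drr_error_db(est, truth) for est, truth in recordings]
mean_error = statistics.mean(errors)
std_error = statistics.stdev(errors)  # sample standard deviation (assumed)

print(f"mean error = {mean_error:.2f} dB, standard deviation = {std_error:.2f} dB")
```

A positive error corresponds to overestimation of DRR and a negative error to underestimation, which is the sign convention used in the discussion of the results.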

5.4.3 Full band results

The full band DRR estimation results for the ACE Database are shown in Fig. 5.3.
In this figure, we plot the mean and standard deviation of the DRR estimations for
each of the 10 room configurations. Only the recordings with 18 dB SNR are used for
this analysis. In order to better evaluate the performance of the proposed method,
both Kuster’s [98] and Jarrett’s [12] methods were implemented for comparison. In
the case of Kuster’s method, since the algorithm requires a pair of omnidirectional
microphones placed very close to each other for recording, which is not available in
the ACE challenge (other microphone array setups used in the ACE Challenge have
a minimum spacing of 60 mm [100], which is too large for accurate measurement
of particle velocity), we use the 0th and 1st order spherical harmonics in place
of the sound pressure and particle velocity in calculation. Since it is shown that
the spherical harmonics are equivalent to sound pressure and particle velocity, this
implementation is expected to be representative of Kuster’s method.
It can be observed from Fig. 5.3 that all three methods yield less than 3 dB mean
error for all of the 10 room setups. The method proposed by Jarrett et al. shows
a similar trend to that proposed by Kuster, but with a slightly lower estimation of
DRR in most setups, as can be expected from Fig. 5.1. The proposed method, on
the other hand, results in 1-3 dB lower estimated DRR for most configurations.
A clear trend of DRR overestimation (estimated DRR higher than ground truth)
can be observed for both methods that assume a diffuse reverberant field. This is
consistent with Kuster’s observations from his experiments, where his method tends
to overestimate DRR in real-life recording setups. The proposed method does not
show any clear tendency of overestimation or underestimation, with 5 of the setups
having positive mean error and the other 5 setups having negative mean error.
In terms of standard deviation, one would expect that the proposed method
would yield higher standard deviation compared to Kuster’s method, since in the
proposed algorithm, both |R1| and R2 need to be estimated for each frequency
band, which would add uncertainty to the distribution of estimated DRR. However,
from Fig. 5.3 it can be seen that the proposed algorithm yields almost the same
standard deviation as Kuster’s method, which indicates that the primary contributor
to the standard deviation is the coherence function γ², which is common to both the
proposed method and Kuster’s method.

[Figure 5.3 plot: vertical axis, estimated DRR (dB); horizontal axis, locations A and B in Lecture Rm 1, Lecture Rm 2, Meeting Rm 1, Meeting Rm 2 and Office 2; series: Proposed, Kuster, Jarrett et al.]

Figure 5.3: Mean and standard deviation of estimated DRR using the proposed
method (blue), Kuster’s method (red) and Jarrett’s method (pink) for all 5 rooms
and 2 locations (A and B) in each room, with 18 dB SNR, averaged over 3 noise
types. Dashed lines indicate ground truth DRR.
On the other hand, Jarrett’s method results in the lowest error standard deviation
for all scenarios. The reason for this is that in the other two methods, only the
first order spherical harmonics are used to calculate the coherence γ², while Jarrett’s
method utilizes all of the available spherical harmonic coefficients to reach a more
consistent estimation of γ², which reduces its deviation due to random interference
and other sources of error.

5.4.4 Subband results

The subband estimation results are shown in Fig. 5.4. In this figure, we plot the mean
and standard deviation of the subband DRR estimation error using the proposed
method as well as the two baseline methods. The error mean and standard deviation
are averaged over the results from all 10 room configurations, and once again only
the 18 dB SNR recordings are used for the analysis. Only the DRR for the subbands
with central frequency between 199 Hz and 2511 Hz is calculated.

[Figure 5.4 plot: vertical axis, DRR estimation error (dB); horizontal axis, central frequency (Hz) at 199, 251, 316, 398, 501, 631, 794, 1000, 1258, 1584, 1995 and 2511 Hz; series: Proposed, Kuster, Jarrett et al.]

Figure 5.4: Mean and standard deviation of subband DRR estimation error for
all rooms and configurations with 18 dB SNR, using the proposed method (blue),
Kuster’s method (red) and Jarrett’s method (pink).

From Fig. 5.4 it can be seen that in general, the mean error of the proposed
method falls within 1 dB of the ground truth for all frequency bands. Furthermore,
the subband results below 1000 Hz show a different pattern from those
above 1000 Hz. Below 1000 Hz, the mean errors are all positive, indicating a slight
overestimation of DRR; the error standard deviation is approximately 3 dB for
these subbands. For frequency bands above 1000 Hz, the mean
error becomes negative; the standard deviation of the estimation error reduces to 2
dB at 1000 Hz, and decreases further at higher frequencies. In contrast,
both Kuster’s and Jarrett’s methods show a clear trend of overestimation, which is
especially significant for Kuster’s method at lower frequencies. Jarrett’s method
yields lower DRR estimations compared to Kuster’s and, in most frequency bands,
has the lowest standard deviation.
Due to the geometry of the Eigenmike, only the 1st order spherical harmonics
can be reliably captured for frequencies below 1000 Hz [53]. Below 1000 Hz, the
2nd order spherical harmonics are aliased onto the 1st order coefficients, and the
aliasing error increases with frequency; at 1000 Hz and above, our algorithm begins
to calculate the 2nd order coefficients, which removes the aliasing and improves
the accuracy of the 1st order coefficients. Furthermore, at higher frequencies, the
wavelength of the sound becomes closer to the dimension of the Eigenmike (8.4
cm diameter), which further increases the accuracy of 1st order spherical harmonic
acquisition. This explains why the error standard deviation decreases gradually at
higher frequencies.

[Figure 5.5 plot: vertical axis, DRR estimation error (dB); horizontal axis, frequency ranges 199-398 Hz, 501-1000 Hz and 1258-2511 Hz; series: SNR = 18 dB, SNR = 12 dB, SNR = -1 dB.]

Figure 5.5: Mean and standard deviation of DRR estimation error with 18 dB, 12 dB
and −1 dB SNR.

Overall it can be seen that compared to the two baseline algorithms, the proposed
method produces an unbiased DRR estimation. The standard deviation of the
proposed algorithm is on par with Kuster’s method, but slightly higher than Jarrett’s
method.

5.4.5 Impact of noise on DRR estimation

In order to examine the impact of noise (interference) on the result of DRR estimation,
the algorithm is run on the Evaluation Dataset recordings of each SNR setting
(18 dB, 12 dB and −1 dB), and the mean and standard deviation of the estimation
error are calculated for each SNR setting. The results are shown in Fig. 5.5. In this
figure, the subband results are separated into three frequency ranges: low (199-398 Hz),
medium (501-1000 Hz) and high (1258-2511 Hz). Each frequency range covers four
subbands, and the subband results are averaged within each frequency range in order
to simplify the data representation.
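The grouping of the twelve subbands into three ranges can be sketched as below. The centre frequencies are generated as 10^(n/10), a base-10 third-octave convention that is inferred from the quoted band limits (it reproduces 199.52 Hz and 2511.89 Hz up to rounding) rather than stated in the text; the per-subband error values and the helper name `average_in_groups` are hypothetical.

```python
import statistics

# Assumed convention: base-10 third-octave centres f = 10**(n/10).
# n = 23..34 gives twelve bands from about 199.5 Hz to about 2511.9 Hz.
centres_hz = [10 ** (n / 10) for n in range(23, 35)]


def average_in_groups(values, size=4):
    """Average consecutive groups of `size` per-subband values."""
    return [statistics.mean(values[i:i + size]) for i in range(0, len(values), size)]


# Hypothetical per-subband mean errors in dB, one per centre frequency.
subband_error_db = [0.9, 0.7, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1, -0.2, -0.4, -0.5, -0.6]

low, medium, high = average_in_groups(subband_error_db)
print(f"low: {low:.2f} dB, medium: {medium:.2f} dB, high: {high:.2f} dB")
```

The first four centres fall in the low range, the next four in the medium range and the last four in the high range, matching the three ranges listed above.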
It can be seen from Fig. 5.5 that the difference between the estimation results
with 18 dB and 12 dB SNR is less than 1 dB. At −1 dB SNR, however, the DRR
estimation becomes strongly biased towards underestimation. The cause of this
phenomenon is that the interference/acoustic noise, which does not have the same
impinging direction as the direct path signal, will reduce the coherence between
sound pressure and gradient (particle velocity), resulting in a lower value of γ²,
thereby lowering the estimated DRR.
The other impact of high interference level is the increased error standard deviation.
When developing and testing our algorithm using the ACE Development
Dataset, we noticed that our frequency averaged MUSIC DOA algorithm became
much less reliable at −1 dB SNR, compared to 18 dB and 12 dB SNR. A direct result
of inaccurate DOA estimation is the decreased consistency of DRR estimations
at different utterance/interference configurations in the same room setup, which
is reflected by a higher error standard deviation. It is expected that if a more
interference-robust DOA algorithm is applied, or if the DOA information can be
measured directly, the proposed algorithm would produce more consistent estimations
at low SNR.

[Figure 5.6 plot: vertical axis, DRR estimation error (dB); horizontal axis, frequency ranges 199-398 Hz, 501-1000 Hz and 1258-2511 Hz; series: Ambient, Babble, Fan.]

Figure 5.6: Mean and standard deviation of DRR estimation error in multiple noisy
environments with −1 dB SNR.

How different types of interference affect the performance of the DRR estimation
is also investigated. The three noise types mixed into the recordings each have
different spectral characteristics, and therefore their effects on the subband DRR
estimation vary. This is illustrated in Fig. 5.6, which plots the estimation results for
the low, medium and high frequency ranges and for each of the three noise types.
The SNR of all recordings used in this analysis is −1 dB.
From Fig. 5.6 it can be seen that the “Ambient” noise type has the least effect on
DRR estimation accuracy, causing only a small bias towards underestimation, while
the “Babble” noise results in more than 3 dB of underestimation for all frequency
ranges. The “Fan” noise has slightly more impact than the “Ambient” noise type,
but less than that of the “Babble” noise. This result is due to both the
spectral and spatial characteristics of the different noise types.
Fig. 5.7 plots the normalized power spectrum of the three noise types; the spectra
are acquired by manually selecting the sections of recordings that contain purely
noise signal. It can be seen that the “Ambient” noise consists of primarily low
frequency signals that do not overlap with the speech signal spectrum. Therefore,
the subbands of interest are most likely to have higher SNR than the full band SNR
of −1 dB. As a result, the ambient noise has the least effect on the accuracy of DRR
estimation. On the other hand, the “Babble” noise is essentially a speech recording
by itself; therefore it almost completely overlaps with the spectrum of the speech
of the talker, resulting in the lowest SNR in the speech spectrum of the three noise
types. The “Fan” noise has spectral characteristics very similar to those of the
“Ambient” noise type, although its higher frequency components have more energy than
those of the “Ambient” noise, which leads to slightly more impact on DRR estimation.

[Figure 5.7 plot: vertical axis, energy (dB); horizontal axis, frequency (Hz) on a logarithmic scale from 10^1 to 10^4; series: Ambient, Babble, Fan.]

Figure 5.7: Normalized power spectrum of the “Ambient”, “Babble” and “Fan”
noises in the ACE Evaluation Dataset.

According to the ACE Challenge description [100], the “Fan” noise is generated
using one or two fans inside the recording environment, while the “Babble” noise
records the voices of up to 7 people talking around the recording location. The
“Ambient” noise is a recording of the ambient noise within the room. Due to the
larger number of uncorrelated sources, each with a different DOA, the “Babble” noise
is likely to have a lower coherence level than that of the “Fan” noise. Therefore when
mixed into the speech recording, the “Babble” noise would lower γ² further than
the “Fan” noise. Although the nature of the “Ambient” noise is unclear, in typical
room environments its source is likely to be AC vents or windows, both of which
can be considered as localized sources, thus creating a more coherent sound field
than the “Babble” noise. In addition, due to its spectral characteristics, its impact
on DRR estimation is the smallest of all three noise types.

Table 5.2: Mean of estimated parameters in each room configuration and frequency
range

                            |R1|                     R2
Room            Setup   Low    Med    High    Low    Med    High
Lecture Room 1  A       0.280  0.194  0.219   0.288  0.251  0.265
Lecture Room 1  B       0.277  0.293  0.331   0.290  0.293  0.332
Lecture Room 2  A       0.232  0.201  0.146   0.316  0.268  0.290
Lecture Room 2  B       0.239  0.189  0.232   0.337  0.277  0.314
Meeting Room 1  A       0.191  0.120  0.157   0.294  0.248  0.253
Meeting Room 1  B       0.239  0.279  0.118   0.297  0.273  0.291
Meeting Room 2  A       0.226  0.265  0.278   0.201  0.329  0.321
Meeting Room 2  B       0.211  0.215  0.225   0.241  0.255  0.281
Office 2        A       0.268  0.167  0.174   0.252  0.269  0.286
Office 2        B       0.199  0.213  0.193   0.263  0.282  0.278

5.4.6 Estimated parameters from the ACE Evaluation Dataset

The parameters |R1| and R2 estimated for each subband of every speech recording
in the ACE Evaluation Dataset have been recorded and are presented in Table 5.2,
where we have taken the average values of |R1| and R2 for the low, medium and
high frequency ranges and for all the recordings from each room configuration. Only
the data from recordings with 18 dB SNR are used for this calculation.

As can be seen from Table 5.2, although the values of |R1| and R2 vary for each
room configuration and frequency range, in general, |R1| falls within the range of
0.15-0.25, while R2 lies between 0.25 and 0.33 in the majority of cases. From Fig. 5.1,
it can be seen that the values of |R1| and R2 shown in Table 5.2 would lead to our
proposed algorithm yielding lower DRR estimations than assuming R1 = 0, R2 =
0.5, which is indeed the case in our estimation results.

From the above results, we believe that setting |R1 | = 0.2 and R2 = 0.28 pro-
vides a more reasonable and accurate model for a general reverberant sound field
within room environments, compared to the diffuse model where it is assumed that
R1 = 0, and R2 = 1/2 or 1/3. It is sometimes easier to acquire or implement differ-
ential microphone pairs than complete first order microphone systems (such as the
Eigenmike), and when a differential array is to be used to estimate room DRR, we
suggest using (5.20) or (5.37) to calculate DRR, and assume |R1 | = 0.2, R2 = 0.28,
which is likely to yield more accurate estimation results.
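As a sanity check on the suggested values, the sketch below takes the unweighted average of all entries of Table 5.2. The dictionary layout and variable names are illustrative choices, and a plain average over table cells is an assumption (the text does not say how the suggested values were derived); it gives roughly 0.22 for |R1| and 0.28 for R2.

```python
import statistics

# Values copied from Table 5.2: ((|R1| low, med, high), (R2 low, med, high)).
table_5_2 = {
    ("Lecture Room 1", "A"): ((0.280, 0.194, 0.219), (0.288, 0.251, 0.265)),
    ("Lecture Room 1", "B"): ((0.277, 0.293, 0.331), (0.290, 0.293, 0.332)),
    ("Lecture Room 2", "A"): ((0.232, 0.201, 0.146), (0.316, 0.268, 0.290)),
    ("Lecture Room 2", "B"): ((0.239, 0.189, 0.232), (0.337, 0.277, 0.314)),
    ("Meeting Room 1", "A"): ((0.191, 0.120, 0.157), (0.294, 0.248, 0.253)),
    ("Meeting Room 1", "B"): ((0.239, 0.279, 0.118), (0.297, 0.273, 0.291)),
    ("Meeting Room 2", "A"): ((0.226, 0.265, 0.278), (0.201, 0.329, 0.321)),
    ("Meeting Room 2", "B"): ((0.211, 0.215, 0.225), (0.241, 0.255, 0.281)),
    ("Office 2", "A"): ((0.268, 0.167, 0.174), (0.252, 0.269, 0.286)),
    ("Office 2", "B"): ((0.199, 0.213, 0.193), (0.263, 0.282, 0.278)),
}

# Unweighted means over all room configurations and frequency ranges.
r1_mean = statistics.mean(v for r1, _ in table_5_2.values() for v in r1)
r2_mean = statistics.mean(v for _, r2 in table_5_2.values() for v in r2)

print(f"mean |R1| = {r1_mean:.3f}, mean R2 = {r2_mean:.3f}")
```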

5.5 Summary
In this work, we present a novel algorithm for estimating DRR using a first order
microphone system. We show that the proposed algorithm is a generalization of
previous DRR estimation methods based on the sound pressure-particle velocity coherence
function. Using the proposed algorithm, it is possible to estimate the characteristics
of a reverberant sound field which are relevant to DRR estimation, thereby improving
the estimation accuracy of the method. We also show that at low frequency
and small source-to-microphone distance, using the plane wave model for the direct
path signal can result in a positive bias on the estimated DRR. Through validating
the proposed algorithm using the ACE Challenge Dataset, it was found that the
proposed algorithm achieves a mean estimation error within ±2 dB for the frequency
range of human speech (200-2500 Hz), and shows no obvious bias.

5.6 Related Publications

This chapter’s work has been published in the following journal paper and conference
proceeding [104] [99]:

H. Chen, T. D. Abhayapala, P. N. Samarasinghe, and W. Zhang, “Direct-to-reverberant
energy ratio estimation using a first order microphone,” IEEE/ACM
Transactions on Audio, Speech, and Language Processing, vol. 25, no. 2, pp.
226–237, Feb 2017.

H. Chen, P. N. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Estimation
of the direct-to-reverberant energy ratio using a spherical microphone array,”
in Proc. ACE Challenge Workshop, a satellite event of WASPAA, New Paltz,
NY, USA, Oct 2015.

5.7 Proof of Equation (5.22)

The closed form expression of spherical Hankel functions of the first kind is [61]

\[ h_l^{(1)}(z) = i^{-l-1} z^{-1} e^{iz} \sum_{k=0}^{l} \left(l + \tfrac{1}{2}, k\right) (-2iz)^{-k}. \tag{5.40} \]

The expression of $h_0^{(1)}(z)$ and $h_1^{(1)}(z)$ can then be written as

\[ h_0^{(1)}(z) = -\frac{1}{z}\, i e^{iz} \tag{5.41} \]
\[ h_1^{(1)}(z) = -e^{iz}\, \frac{z + i}{z^{2}}, \tag{5.42} \]

substituting (5.41) and (5.42) into (5.22) yields

\[ \lim_{r_0 \to \infty} \frac{h_1^{(1)}(kr_0)}{h_0^{(1)}(kr_0)} = \lim_{r_0 \to \infty} \frac{kr_0 + i}{i k r_0} = \lim_{r_0 \to \infty} \left( -i + \frac{1}{kr_0} \right) = -i, \tag{5.43} \]

which completes the proof.
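The steps of this proof can be checked numerically with a short sketch. It evaluates the series (5.40) directly, with the symbol (l + 1/2, k) taken as (l + k)! / (k! (l − k)!), and compares the result with (5.41), (5.42) and the limit (5.43). The function names are hypothetical and only the Python standard library is used.

```python
import cmath
from math import factorial


def hankel_symbol(l, k):
    """Hankel symbol (l + 1/2, k) = (l + k)! / (k! (l - k)!)."""
    return factorial(l + k) // (factorial(k) * factorial(l - k))


def spherical_hankel1(l, z):
    """Spherical Hankel function of the first kind via the closed form (5.40)."""
    z = complex(z)
    series = sum(hankel_symbol(l, k) * (-2j * z) ** (-k) for k in range(l + 1))
    return (1j) ** (-l - 1) * cmath.exp(1j * z) / z * series


def hankel_ratio(z):
    """Ratio h_1(z) / h_0(z), which equals -i + 1/z by (5.41) and (5.42)."""
    return spherical_hankel1(1, z) / spherical_hankel1(0, z)


for z in (3.66, 1e2, 1e6):
    print(z, hankel_ratio(z))  # the ratio approaches -i as z grows
```

At kr0 ≈ 3.66 the ratio still carries a real part of about 0.27, which illustrates why the plane wave limit is only approximate at short source distances and low frequencies.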

Chapter 6

Methods for spatial ANC performance evaluation and optimization

Overview: The use of spherical harmonic expansion to model noise fields enables
in-depth analysis and manipulation of the sound field. In this chapter, we introduce a
number of techniques for improving the spatial ANC performance. In Section 6.2, we
propose an improved sound field synthesis method based on spherical harmonic mode
matching. Through the use of the spherical harmonic addition theorem, this method
allows the user to define a number of high priority regions within the quiet zone,
where greater noise attenuation can be achieved compared to the rest of the quiet
zone. In Section 6.3, we propose a new metric for measuring noise energy over a
spherical region, and use the new metric to evaluate the ANC performance of an
experimental ANC system. Finally, in Section 6.4, we use this metric to develop
a method for estimating optimum noise cancellation performance for a given noise
environment, and use this method to estimate the ANC performance in a passenger
car.

6.1 Introduction
The goal of spatial ANC is to minimize the noise level inside a certain quiet zone.
However, the exact “optimal” loudspeaker driving signals that would yield the best
noise reduction depend on the loudspeaker setup as well as the characteristics of
the quiet zone. For example, some regions within the quiet zone may be more
important than the others, because the users are more likely to stay within these
regions. In such a case, it would be beneficial to focus the ANC resources toward
these “more important” regions, which would result in a better overall ANC quality
than attenuating the noise evenly within the quiet zone.
Furthermore, the definition of “optimal” noise attenuation often depends on
the method employed to measure the noise level. For the noise level at a single
point in the space, one microphone is enough to pick up the sound pressure of the
noise; however, for a spatial region, measurement of the average sound pressure level
becomes much more complicated, as sampling the noise field at a few points within
the quiet zone cannot accurately represent the overall noise level inside the whole
quiet zone.
In addition, the ability to estimate, or predict the potentially achievable “op-
timal” spatial noise attenuation would be greatly beneficial to the design process
of spatial ANC systems. The designer would be able to find out the amount of
hardware that is necessary to achieve the desired noise attenuation, or to deter-
mine whether the available loudspeaker setup is sufficient for the ANC task, before
physically implementing a complete ANC system.
In this chapter, we utilize the spherical harmonic analysis technique to develop
a set of algorithms and tools to address the above problems. We show that char-
acterization and control of the noise field can be conveniently done by appropriate
manipulation of the spherical harmonic coefficients of the noise field. This chapter
is organized as follows:
In Section 6.2, we introduce a spatial single zone sound field reproduction technique
which allows for higher reproduction accuracy within certain sub-zones while
maintaining a reasonable reproduction accuracy in the global region. By applying
the spherical harmonics addition theorem, we connect the spherical harmonic
coefficients of the global region with those of the local sub-zones, and use a weighting
method to enhance the reproduction quality at the sub-zones. This technique is
particularly useful when the available loudspeakers cannot provide very good sound
reproduction for the whole region, but a high accuracy is desired for at least some
sub-zones, as in some spatial ANC scenarios.
In Section 6.3, we propose a new metric for the measurement of average noise
level over a region. It is formulated in terms of the spherical harmonic decomposi-
tion of sound fields. Through a series of experiments, we show that the proposed
metric provides superior characterization of the noise level within the control region,
compared to existing methods where a number of microphones are placed around
the control region to sample the noise level. This metric is particularly suitable for
environments with irregular geometry and a fixed control region with moderate size,
such as vehicle and aircraft cabins.
In Section 6.4, utilizing the noise level metric developed in Section 6.3, we
evaluate the noise cancelling capabilities of a passenger car’s integrated loudspeakers by
analyzing the in-car noise field and the loudspeaker responses. Our proposed analysis
method decomposes the noise field into a number of basis sound patterns,
evaluates the loudspeakers’ capability at reproducing these basis patterns, and then
calculates the expected overall noise reduction based on these results. Our results show
that the noise field inside a vehicle cabin has a sparse nature, and that the car’s
loudspeakers are capable of cancelling the noise around the passengers’ head positions.

6.2 Enhanced sound field reproduction within prioritized control region
This section proposes an enhanced method for synthesizing the sound field using
a relatively small number of secondary sources, which allows improved synthesis
accuracy for certain sub-regions of the zone of interest. We introduce the spherical
harmonic translation into the mode matching algorithm to acquire a uniform modal-domain
representation of the sound fields within different sub-regions. Then, by
changing the weighting of each region, the least mean squares solution can be easily
controlled to cater for certain prioritized reproduction requirements. This method
is shown to be especially effective in situations where the number of secondary
sources is limited.

6.2.1 Background
In 3D sound field synthesis, a fundamental problem arises which makes implementation
very difficult: the synthesis quality is strongly related to the number and
position of the loudspeakers [105–107]. The ideal placement of the loudspeakers for
the mode-matching technique is to have the loudspeakers evenly distributed on a
sphere surrounding the region of interest [44], but such a structure is impractical in
reality. To solve this problem, an array configuration for 3D sound field synthesis using
multiple circular loudspeaker arrays was proposed by Zhang and Abhayapala [108];
this method uses a functional analysis based algorithm to derive the driving signals.

Still, the trade-off between the number of loudspeakers and the size of the
reproduction zone and the frequency of the sound field exists. The reproduction quality
degrades rapidly as the number of loudspeakers falls below the minimum required number.
In the case that the region of interest can be separated and reduced into a few
smaller regions, it is possible to control the sound field in these small regions through
spatial multizone reproduction techniques [109]. However, the calculation involves
matrix inversion, and if done without proper regularization, the results may be
highly unstable.
The goal of this section is to introduce a spatial single zone sound field reproduction
technique which allows for higher reproduction accuracy within certain sub-zones
while maintaining a reasonable reproduction accuracy in the global region.
This can be achieved by balancing between the single zone reproduction and the
spatial multizone reproduction techniques. Through the use of spherical harmonic
translation, the mode-matching method can be simultaneously applied to both the
global zone of interest and certain sub-regions within it (referred to as high priority
regions), and by adjusting the weighting factors in the LMS solution, one can easily
control the reproduction quality of different regions. This technique is particularly
useful when the reproduction region is large but an insufficient number of loudspeakers
is available, and/or in applications where a high reproduction accuracy
is required for certain sub-zones, such as active noise cancellation.

6.2.2 Problem formulation

A common way of deriving loudspeaker driving signals to produce a certain desired
sound field is by pressure matching in the modal domain, which has been briefly
reviewed in Section 2.3.3. Given $V$ loudspeakers and spherical harmonic coefficients up
to order $L$ representing the sound field, the channel information between the $v$th
loudspeaker and the $n$th mode can be denoted $H_{vn}$, and the channel matrix is then
expressed as $\mathbf{H}$, where

\[ \mathbf{H} = \begin{bmatrix} H_{00}^{1} & H_{00}^{2} & \cdots & H_{00}^{V} \\ H_{11}^{1} & H_{11}^{2} & \cdots & H_{11}^{V} \\ \vdots & \vdots & \ddots & \vdots \\ H_{LL}^{1} & H_{LL}^{2} & \cdots & H_{LL}^{V} \end{bmatrix}, \tag{6.1} \]

where $H_{lm}^{v}$ is the spherical harmonic coefficient of order $l$ and mode $m$ due to the
$v$th loudspeaker playing a unit signal. The total number of coefficients is given by
$N = (L + 1)^2$. A desired sound field of the same order can be expressed as a column
vector of spherical harmonic coefficients

\[ \mathbf{Q} = \left[ Q_{00}, Q_{11}, Q_{10}, \ldots, Q_{LL} \right]^{T}. \tag{6.2} \]

The least mean square solution for the driving signals $\mathbf{D}$ can be written as

\[ \mathbf{D} = \mathbf{H}^{-1} \mathbf{Q}, \tag{6.3} \]

where $[\cdot]^{-1}$ denotes the pseudoinverse of the matrix. This minimizes the cost function

\[ \mathcal{L} = (\mathbf{Q} - \mathbf{H}\mathbf{D})^{H} (\mathbf{Q} - \mathbf{H}\mathbf{D}). \tag{6.4} \]

We now consider a sub-region $O_q$ within the global reproduction region. We
denote the spherical harmonic coefficients of the desired sound field with respect to
$O_q$ as $\mathbf{Q}_q$, and denote the channel matrix for this sub-region as $\mathbf{H}_q$. The driving
signals that minimize the reproduction error within the sub-region can be written
as

\[ \mathbf{D} = \mathbf{H}_q^{-1} \mathbf{Q}_q, \tag{6.5} \]

which minimizes the cost function

\[ \mathcal{L}_q = (\mathbf{Q}_q - \mathbf{H}_q \mathbf{D})^{H} (\mathbf{Q}_q - \mathbf{H}_q \mathbf{D}). \tag{6.6} \]

Our goal is to find a driving function which minimizes a new cost function that
contains both $\mathcal{L}$ and $\mathcal{L}_q$, and also has a weighting factor which could further enhance
the reproduction accuracy within the sub-region. This can be expressed as

\[ \min \{ \mathcal{L} + \alpha \mathcal{L}_q \}, \tag{6.7} \]

where $\alpha$ is a weighting factor.

6.2.3 Combined Least Mean Square Solution for sound field reproduction
Using the spherical harmonics addition theorem (2.16), the sound field coefficients
$\mathbf{C}_q$ at $O_q$ can be expressed using the global coefficients $\mathbf{C}$ and a translation matrix
$\hat{\mathbf{S}}_q$, as $\mathbf{C}_q = \hat{\mathbf{S}}_q \mathbf{C}$. The channel matrices also have a similar relationship,
$\mathbf{H}_q = \hat{\mathbf{S}}_q \mathbf{H}$. Then, an LMS solution for synthesizing the desired sound field only
within the sub-region $O_q$ can be found by solving

\[ \hat{\mathbf{S}}_q \mathbf{Q} = \hat{\mathbf{S}}_q \mathbf{H} \mathbf{D}. \tag{6.8} \]

Thus $\mathbf{D}$ can be expressed as

\[ \mathbf{D} = (\hat{\mathbf{S}}_q \mathbf{H})^{-1} \hat{\mathbf{S}}_q \mathbf{Q}. \tag{6.9} \]

It should be noted that the solution provided by (6.9) normally requires regularization,
since the matrix $\hat{\mathbf{S}}_q \mathbf{H}$ may be ill-conditioned, which may result in very large
driving signals for the loudspeakers. This is especially true when the sub-region $O_q$
is small.
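The local cost function $\mathcal{L}_q$ can be expressed as

\[ \mathcal{L}_q = (\hat{\mathbf{S}}_q \mathbf{Q} - \hat{\mathbf{S}}_q \mathbf{H} \mathbf{D})^{H} (\hat{\mathbf{S}}_q \mathbf{Q} - \hat{\mathbf{S}}_q \mathbf{H} \mathbf{D}), \tag{6.10} \]

which corresponds to the sum of the squared errors in all local spherical harmonic
coefficients in the sub-region $O_q$.
The combined cost function (6.7) can then be written as

\[ \mathcal{L}_{\mathrm{all}} = \mathbf{E}_G^{H} \mathbf{E}_G + \alpha \mathbf{E}_Q^{H} \mathbf{E}_Q, \tag{6.11} \]

where $\mathbf{E}_G = \mathbf{Q} - \mathbf{H}\mathbf{D}$ is the global error vector, and
$\mathbf{E}_Q = \hat{\mathbf{S}}_q \mathbf{Q} - \hat{\mathbf{S}}_q \mathbf{H} \mathbf{D}$ is the local error vector, so that the two terms of (6.11)
coincide with (6.4) and (6.10). $\alpha$ controls the relative importance
of the reproduction accuracy at region $O_q$.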
An LMS solution that minimizes (6.11) can be derived from

\[ \begin{bmatrix} \alpha \hat{\mathbf{S}}_q \mathbf{Q} \\ \mathbf{Q} \end{bmatrix} = \begin{bmatrix} \alpha \hat{\mathbf{S}}_q \mathbf{H} \\ \mathbf{H} \end{bmatrix} \mathbf{D}, \tag{6.12} \]

whose solution for $\mathbf{D}$ is

\[ \mathbf{D} = \begin{bmatrix} \alpha \hat{\mathbf{S}}_q \mathbf{H} \\ \mathbf{H} \end{bmatrix}^{-1} \begin{bmatrix} \alpha \hat{\mathbf{S}}_q \mathbf{Q} \\ \mathbf{Q} \end{bmatrix}. \tag{6.13} \]

The solution (6.13) considers not only the set of global coefficients, but also a linear
mapping of these coefficients which corresponds to the sound field in a sub-region $O_q$
within the zone of interest. By adding a weighting factor $\alpha$, the priority of the
sub-region can be controlled. When $\alpha = 0$, the sub-region is ignored and the solution
becomes identical to (6.3); if $\alpha = 10$, the local reproduction accuracy becomes
10 times more significant than the global accuracy, and as a result the driving
signals $\mathbf{D}$ would construct a sound field where the reproduction error within $O_q$ is
approximately 10 times smaller than the global average.
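The stacked solution (6.13) can be illustrated with a minimal NumPy sketch. All matrices below are random stand-ins: in particular, `S_q` is not a real spherical harmonic translation matrix, and the dimensions and names are hypothetical. One detail worth noting is that scaling the local rows by α, as printed in (6.13), weights the squared local error by α² in the least squares cost; scaling by the square root of α instead would reproduce the weighting in (6.7) exactly.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_complex(shape):
    """Random complex stand-in data (not a physical sound field)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


N, V, Nq = 16, 6, 4            # global modes, loudspeakers, local modes (hypothetical)
H = random_complex((N, V))     # stand-in channel matrix
Q = random_complex(N)          # stand-in desired field coefficients
S_q = random_complex((Nq, N))  # stand-in for the translation matrix


def prioritized_lms(H, Q, S_q, alpha):
    """Solve (6.13): alpha-weighted local rows stacked on top of the global rows."""
    A = np.vstack([alpha * (S_q @ H), H])
    b = np.concatenate([alpha * (S_q @ Q), Q])
    return np.linalg.pinv(A) @ b


def local_error(D):
    """Norm of the local error vector S_q (Q - H D)."""
    return np.linalg.norm(S_q @ (Q - H @ D))


D_global = prioritized_lms(H, Q, S_q, alpha=0.0)     # reduces to (6.3)
D_priority = prioritized_lms(H, Q, S_q, alpha=10.0)  # favours the sub-region

print("local error, alpha = 0 :", local_error(D_global))
print("local error, alpha = 10:", local_error(D_priority))
```

With more modes than loudspeakers, as in this stand-in setup, raising α lowers the local error at the cost of a slightly larger global error, which is the trade-off described above.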
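Equation (6.13) can be extended to the multiple sub-region case, with each region
controlled by a separate weighting factor

\[ \mathbf{D} = \begin{bmatrix} \alpha_1 \hat{\mathbf{S}}_{q_1} \mathbf{H} \\ \alpha_2 \hat{\mathbf{S}}_{q_2} \mathbf{H} \\ \vdots \\ \alpha_n \hat{\mathbf{S}}_{q_n} \mathbf{H} \\ \beta \mathbf{H} \end{bmatrix}^{-1} \begin{bmatrix} \alpha_1 \hat{\mathbf{S}}_{q_1} \mathbf{Q} \\ \alpha_2 \hat{\mathbf{S}}_{q_2} \mathbf{Q} \\ \vdots \\ \alpha_n \hat{\mathbf{S}}_{q_n} \mathbf{Q} \\ \beta \mathbf{Q} \end{bmatrix}. \tag{6.14} \]

It can be seen that by setting $\alpha_n = 1, \forall n$ and $\beta = 0$, (6.14) becomes a solution for
spatial multizone sound field reproduction.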
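A hedged sketch of (6.14) follows, again with random stand-in matrices rather than real translation or channel matrices; the function name `multi_region_lms` and all dimensions are hypothetical. It also checks the special case mentioned above, where unit sub-region weights and β = 0 leave only the sub-region rows.

```python
import numpy as np


def multi_region_lms(H, Q, translations, alphas, beta=1.0):
    """Solve (6.14) for several sub-regions.

    translations: list of (stand-in) translation matrices, one per sub-region
    alphas:       matching list of sub-region weights
    beta:         weight of the global rows (beta = 0 drops them)
    """
    rows_A = [a * (S @ H) for a, S in zip(alphas, translations)] + [beta * H]
    rows_b = [a * (S @ Q) for a, S in zip(alphas, translations)] + [beta * Q]
    return np.linalg.pinv(np.vstack(rows_A)) @ np.concatenate(rows_b)


rng = np.random.default_rng(1)
H = rng.standard_normal((16, 6)) + 1j * rng.standard_normal((16, 6))
Q = rng.standard_normal(16) + 1j * rng.standard_normal(16)
S1 = rng.standard_normal((4, 16))  # stand-in for the first sub-region
S2 = rng.standard_normal((4, 16))  # stand-in for the second sub-region

# Balanced solution: two prioritized sub-regions plus the global region.
D_balanced = multi_region_lms(H, Q, [S1, S2], alphas=[5.0, 5.0], beta=1.0)

# Multizone special case: unit sub-region weights and no global rows.
D_multizone = multi_region_lms(H, Q, [S1, S2], alphas=[1.0, 1.0], beta=0.0)
```

Keeping β > 0 retains the global rows in the stacked system, which is what distinguishes the balanced solution from the pure multizone one.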
Compared to the spatial single zone sound field reproduction technique, this
method allows more accurate reconstruction within some more critical areas. Compared
to the multizone reproduction method [109], which provides an optimal reconstruction
result within a few smaller regions but has no guarantee on the sound
field in between these regions, this approach offers a balanced solution, where the
whole zone of interest is reproduced while a few sub-regions are given higher
priorities. This method is also more stable and more predictable than the multizone
method, since the global channel matrix $\mathbf{H}$ improves the conditioning of the matrix
inversion in (6.14); as a result, the derived driving signal $\mathbf{D}$ normally has limited
power, and further regularization is normally unnecessary.
Another merit of this technique appears when only a relatively small number of
loudspeakers is accessible. Specifically, when the total number of loudspeakers is
insufficient for producing the desired sound field for the whole zone of interest, or
the placement of the loudspeaker array disallows synthesizing sound waves impinging
from certain directions, the normal LMS algorithm finds the best solution that
gives the minimal average error across the whole zone. However, if the application
has very high requirements on reproduction accuracy, for example active noise
cancellation, the normal LMS algorithm may fail to meet the demand, especially when
insufficient loudspeakers are available. Using this new technique, it is possible to utilize
the limited number of loudspeakers to reproduce the sound field accurately in certain
critical sub-zones, such as the area where the listener is most likely positioned, while
only slightly degrading the performance in the rest of the zone.

6.2.4 Simulation Results

The simulations are set up to synthesize a given sound field based on microphone
array recordings of that field. First, one or more point sources are set up as
primary sources, and the sound field they produce in the region of interest is
captured by a microphone array; this captured sound field becomes
the desired field Q. Then, the sound field due to each secondary loudspeaker is
recorded separately, which forms the channel matrix H. Next, different algorithms,
including LMS mode matching, the multizone method and the proposed prioritized LMS
mode matching, are used to derive the loudspeaker driving signals D that best
synthesize the sound field. For the latter two algorithms, one or more sub-regions
within the zone of interest are selected as high priority areas. Finally, the
performance of the three methods is compared by plotting the synthesized sound
fields on multiple elevations.
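The step of deriving D from H and Q can be illustrated with a short Python sketch that solves a prioritized least-squares problem by stacking the weighted global and sub-region equations. It is a minimal sketch under assumptions, not the thesis implementation: the cost β‖HD − Q‖² + Σ αᵢ‖HᵢD − Qᵢ‖² is assumed to be the form behind (6.14), the names prioritized_ls, crandn, H1 and Q1 are hypothetical, and the random matrices are placeholders for measured data, so they only exercise the algebra.

```python
import numpy as np

def prioritized_ls(H, Q, zones=(), beta=1.0):
    """Minimize beta*||H D - Q||^2 + sum_i alpha_i*||H_i D - Q_i||^2 over D.

    H, Q  : global channel matrix and desired coefficients.
    zones : iterable of (alpha_i, H_i, Q_i) for each priority sub-region
            (assumed form; not taken from the source).
    """
    # Scaling each block by the square root of its weight turns the weighted
    # cost into one ordinary least-squares problem.
    A = [np.sqrt(beta) * H]
    b = [np.sqrt(beta) * Q]
    for alpha, H_i, Q_i in zones:
        A.append(np.sqrt(alpha) * H_i)
        b.append(np.sqrt(alpha) * Q_i)
    D, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
    return D

def crandn(rng, *shape):
    """Complex Gaussian placeholder data (synthetic, not a measured channel)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(0)
n_spk = 36
H, Q = crandn(rng, 121, n_spk), crandn(rng, 121)   # global: order 10 -> 121 coefficients
H1, Q1 = crandn(rng, 16, n_spk), crandn(rng, 16)   # sub-region: order 3 -> 16 coefficients

D_plain = prioritized_ls(H, Q)                     # no priority zone
D_prio = prioritized_ls(H, Q, [(10.0, H1, Q1)])    # alpha = 10, beta = 1

local_err_plain = np.linalg.norm(H1 @ D_plain - Q1)
local_err_prio = np.linalg.norm(H1 @ D_prio - Q1)
global_err_plain = np.linalg.norm(H @ D_plain - Q)
global_err_prio = np.linalg.norm(H @ D_prio - Q)
print(local_err_plain, local_err_prio, global_err_plain, global_err_prio)
```

By construction, the prioritized solution cannot have a larger sub-region error than the unweighted one and cannot have a smaller global error, which is the trade-off examined in the simulations.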
Figure 6.1 plots the case where 36 loudspeakers are arranged into three
semi-circles placed 1 meter away from the origin. Each semi-circular array consists of
12 loudspeakers, spanning from φ = 0 to φ = π. The three arrays have elevation
angles θ = 3π/8, π/2, and 5π/8, respectively. A single point source is placed at
(R, θ, φ) = (6 m, 2π/5, π/2), which generates a 540 Hz sine wave. The region of
interest is a sphere of 0.7 m radius centered at the origin. In addition, a priority
sub-region is chosen to be a sphere of 0.4 m radius, centered at (0.2 m, π, 0).
Figure 6.1 (a) plots the real sound field produced by the primary source; (b)
shows the synthesis result using the LMS method; (c) plots the result
using the prioritized LMS method; and finally (d) plots the LMS result for the high
priority region only. It can be seen that both the conventional LMS method and the
proposed method yield acceptable reproduction results. Closer observation
shows that the proposed method gives a more accurate reproduction of the sound
field within the central circle, which represents the high priority zone. On the other
hand, in the case of (d), although the number of loudspeakers provided is sufficient
for an approximate reproduction in the high priority zone, the relatively long
distance between the secondary sources and the reproduction area causes the algorithm
to produce very large signal power, even with regularization parameters inserted.
[Figure 6.1: four panels, (a) to (d); both axes in m, spanning −1 to 1.]

Figure 6.1: Comparison of three methods for sound field reproduction on θ = π/2
plane. (a) plots the desired sound field; (b) plots the synthesized sound field using
LMS mode matching for global region; (c) plots the synthesized sound field using
proposed prioritized region LMS method, and (d) plots the synthesized sound field
using LMS mode matching for the high priority region only.
The proposed prioritized LMS algorithm, however, managed to synthesize the
sound field with decent accuracy within the high priority zone, while also giving a
reasonable reconstruction result for the entire reproduction region. Furthermore,
the driving signals for the secondary sources remain limited, because the global
coefficients also act as regularization parameters for the matrix solution. Clearly,
in this case, the proposed method yields the best result among the three approaches.
When multiple priority regions are defined, the proposed algorithm optimizes
the result for all of these regions according to the weighting factors given to
them. A simulation result using the same setup as Fig. 6.1 is shown in Fig. 6.2.
The only difference is that in Fig. 6.2, two priority sub-regions are defined; they
are centered at (0.3 m, π/2, 0) and (0.4 m, π/2, 0.8π), with radii of
0.4 m and 0.3 m, respectively. The weighting factor for both regions is set to
α = 10, and the weighting for the global coefficients is β = 1. We note that
only the proposed method was able to reconstruct the sound field. Although there
are inevitable errors, the synthesized sound field is a good approximation of the
desired one, especially within the two prioritized sub-regions.
It has been mentioned in Section 3 that the weighting of each priority zone can be
adjusted independently to change the solution of the LMS algorithm, so as to further
improve the reproduction accuracy of a certain priority area. This is demonstrated
in Fig. 6.3, where two priority zones of 0.3 m radius were chosen. Plot (a) shows
the desired sound field, which is reconstructed using the secondary sources, and the
reconstruction errors are shown in plots (b) and (c). In (b), both regions were set to
have the same weighting α = 10 while the global sound field has a weighting β = 1.
In (c), however, the upper right priority zone has its weighting increased to α1 = 30,
while the other priority zone's weighting is reduced to α2 = 5. It can be observed that
compared to (b), (c) gives a more accurate reproduction in the upper right region,
while the region on the left has a slightly worse accuracy. The global reproduction
accuracy also degrades slightly, due to the increased weighting for priority zone Q1.
Table 6.1 lists a series of simulated data. The normalized mean square error and
average error percentage are calculated for different scenarios, covering LMS-based
sound field synthesis with 0, 1, 2 and 3 high priority sub-regions. In order to acquire
more accurate synthesis results, a total of 60 loudspeakers are placed in a
semi-circle one meter away from the origin. Despite the increased number of secondary
sources, there are still too few of them to synthesize the sound field completely.
This is intended to show the advantage of the proposed algorithm in cases where
insufficient secondary loudspeakers are available.
[Figure 6.2: four panels, (a) to (d); both axes in m, spanning −1 to 1.]

Figure 6.2: Comparison of three methods for sound field reproduction with 2 high
priority zones. (a) plots the desired sound field; (b) plots the synthesized sound field
using LMS mode matching for global region; (c) plots the synthesized sound field
using proposed method with two high priority regions, and (d) plots the synthesized
sound field using LMS mode matching for the high priority regions only.

[Figure 6.3: three panels, (a) to (c); both axes in m, spanning −1 to 1. Color scale of (a) from −0.1 to 0.1; color scales of (b) and (c) from 0 to 5 × 10^−3.]

Figure 6.3: Reproduction error plots for different high priority zone weight
settings; (a) desired sound field, (b) reproduction error with α1 = α2 = 10, and (c)
reproduction error with α1 = 30, α2 = 10.
Region   Max Order   Coefficient No.   Weight   Mean Square Error   Error %

Global   10          121               1        100%                7.24%

Global   10          121               1        138.53%             7.75%
Zone 1   3           16                10       0.26%               0.48%

Global   10          121               1        131.49%             7.41%
Zone 1   4           20                5        5.43%               1.93%

Global   10          121               1        185.24%             8.53%
Zone 1   4           20                4        7.13%               2.22%
Zone 2   3           16                12       0.59%               0.70%

Global   10          121               1        135.38%             8.91%
Zone 1   3           16                3        12.75%              3.85%
Zone 2   2           9                 10       0.43%               0.83%
Zone 3   4           20                2        37.21%              7.67%

Table 6.1: Reproduction accuracy for different priority zone settings

The first row in Table 6.1 shows the performance of the normal LMS algorithm
without prioritized control. Its mean square error is used as the reference (100%)
for the mean square errors of the other setups. It
can be seen that an average error of 7.24% is observed over a total of 121 sound
field coefficients. The next four rows show the simulation results for one priority
zone, with its maximum order set to 3 and 4, respectively. In both situations, an
increase in the global mean square error is observed; the global error percentage
also sees a slight increase. Most importantly, the error percentages in the priority
zones (0.48% and 1.93%) are much smaller than the global error percentage. The error
in row 5 of Table 6.1 is greater because of the larger size of the priority region,
as well as the lower weight applied to the priority zone.
The rest of Table 6.1 shows the simulation results for 2 and 3 high priority
regions; in both cases each sub-region is given a different weighting factor. The
effect of the weighting factors can be easily seen, as the sub-region with the largest
weighting always results in the lowest error percentage, while the regions
with low weights and large radii only see a slight improvement over the global zone.
Another observation is that the global synthesis precision degrades more
when a large weight is given to the high priority zone. Therefore, in practice, we
recommend choosing the weightings and the radii of the sub-regions according to
the needs, rather than simply using overly large values.
In order to investigate the impact of the weighting factor on the prioritized
sub-region and the global sound field, a series of simulations are carried out. The
simulations evaluate the reproduction error of a sub-region, located at (R, θ, φ) =
(0.1 m, π/2, π) with a maximum order of 3, within the global region of interest
whose maximum order is 10. The global region is given the weight β = 1 while
the weighting factor for the sub-region is varied from 0 to 10. The resulting error
percentages are plotted in Figure 6.4.

[Figure 6.4: line plot of Error Percentage (0 to 0.12) against High Priority Zone Weight (0 to 10); curves: Global Error, Local Error.]

Figure 6.4: Effect of the weighting factor on the local and global error percentage

The exponentially decaying curve in Figure 6.4 is the sub-region error
percentage. As the weighting factor changes from 0 to 10, the average
reproduction error within this sub-region falls from 11% to 1.2%.
Meanwhile, the global error grows slowly from 7.2% to 8.6%. Further investigation
shows that the rate at which the global error increases tends to reduce as the local
weighting becomes larger.

6.2.5 Observations and insights

We can observe that when using the proposed method to synthesize a sound field,
the global reproduction accuracy is slightly reduced, in exchange for greatly
improved reproduction quality in the prioritized sub-region. However, the trade-off
between the local and global regions depends on the ratio of their sizes:
if the local region is very small compared to the global region, then the reproduction
accuracy within it can be improved significantly, with minimal loss of global
accuracy; if the local region is large, the accuracy gain may become smaller. In
practice, Figure 6.4 can be used as trade-off guidance when choosing the appropriate
weighting factors for each region.

6.3 Evaluation of spatial active noise cancellation performance using acoustic potential energy

This section presents a novel metric to evaluate the performance of spatial active
noise cancellation (ANC) systems. We show that the acoustic potential energy
within a spherical region can be expressed by a weighted squared sum of spherical
harmonic coefficients. This metric allows convenient evaluation of spatial ANC
performance using a spherical microphone array.
In order to demonstrate the usefulness of the proposed metric, we carried out
experiments using a MIMO system with 5 error microphones and 2 secondary loud-
speakers attempting to cancel a noise field caused by a single primary source. The
microphone recordings required to evaluate the spatial performance metric were
obtained using a 32 channel Eigenmike [110].

6.3.1 Background
For the successful development of an ANC system, it is important to accurately
measure its noise reduction capability over space, especially at the design stage.
At present, the performance of ANC systems is analyzed in terms of (i) sound
pressure at the error microphones or (ii) recordings from a secondary microphone(s)
in the cancellation region. The first approach is widely used in theory, where noise
reduction performance is characterized by the average noise reduction in decibels at
the error microphones [111]. The second approach is mostly used with human head
shaped mannequins with 2 microphones placed at the ear locations to interpret
the noise reduction levels experienced by humans [112, 113]. While both of the
above methods are adequate to obtain an acceptable measure of the ANC system
performance, their accuracy in terms of spatial coverage is limited due to the limited
number of measurement points and the sparse nature of the spatial sampling.
Here, we propose an improved metric to evaluate the noise reduction in spatial
regions. It is formulated in terms of the spherical harmonic decomposition of sound
fields and requires the measurements from a secondary microphone array distributed
over a spherical surface, preferably enclosing the center of the region of interest. The
spatial metric is defined as the acoustic potential energy inside a spherical region,
and we formulate it in terms of the aforementioned microphone array recordings.
A similar spatial metric was introduced in [114] for rectangular enclosures, where
the acoustic potential energy was described in terms of room modes. However, the
results were limited to simulations, and the extraction of room modes is difficult in
practical applications, where the natural modes of a room depend greatly on the
geometry of the room.
The proposed potential energy method calculates the average noise level in the
entire spatial region, therefore there is no need to take multiple samplings of the con-
trol region, which simplifies the process of ANC performance evaluation. Compared
to [114], our method represents the potential noise using spherical harmonic coef-
ficients rather than room modes, which can be conveniently captured by spherical
microphone arrays. The main advantage of this approach is its applicability to any
arbitrary enclosure and its independence from the ANC system in use. Therefore
our method is particularly suitable for environments with irregular geometry and a
fixed control region with moderate size, such as vehicle and aircraft cabins.

6.3.2 Calculation of the acoustic potential energy

Our goal is to find a metric that best represents the average noise energy level
within a spatial region. A sensible measure of the average noise level inside a region
is the acoustic potential energy [114], which is defined as the integral of squared
sound pressure over the entire region. In a spherical region of interest, it can be
represented by

\[ E_p(k) = \frac{1}{4\rho_0 c^2} \int_S |P(r, \theta, \phi, k)|^2 \, dS \tag{6.15} \]

where $\int_S dS = \int_0^R \int_0^{\pi} \int_0^{2\pi} r^2 \, dr \, \sin(\theta) \, d\theta \, d\phi$ denotes the integral over a sphere. Using
(2.1), we can decompose the integral of sound energy as [115]
\begin{align}
& \int_S |P(r, \theta, \phi, k)|^2 \, dS \tag{6.16} \\
&= \int_0^R \int_0^{\pi} \int_0^{2\pi} P(r, \theta, \phi, k) P^*(r, \theta, \phi, k) \, r^2 \, dr \, \sin(\theta) \, d\theta \, d\phi \tag{6.17} \\
&= \sum_{l,m} C_{lm}(k) C_{lm}^*(k) \int_0^R j_l^2(kr) \, r^2 \, dr, \tag{6.18}
\end{align}

where the orthogonal property of the spherical harmonics (2.5) was used. Therefore
(6.15) can be expressed using the spherical harmonic coefficients as

\[ E_p(k) = \frac{1}{4\rho_0 c^2} \sum_{l,m} |C_{lm}(k) W_l(k)|^2, \tag{6.19} \]

where ρ0 denotes the density of the medium and c is the speed of sound, and we define
\[ W_l(k) \triangleq \left( \int_0^R j_l(kr)^2 \, r^2 \, dr \right)^{1/2}. \tag{6.20} \]

The above result shows that the acoustic potential energy within a spherical
region is given by a sum of squared spherical harmonic coefficients with the weighting
Wl (k).
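
The weighting in (6.20) and the energy in (6.19) can be evaluated numerically. The following Python sketch is illustrative only: it assumes SciPy is available, uses assumed values ρ0 = 1.2 kg/m³ and c = 343 m/s, and the helper names radial_weight and potential_energy as well as the example coefficients are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import spherical_jn

def radial_weight(l, k, R):
    """W_l(k) from (6.20): sqrt of the integral of j_l(kr)^2 r^2 over [0, R]."""
    value, _ = quad(lambda r: spherical_jn(l, k * r) ** 2 * r ** 2, 0.0, R)
    return np.sqrt(value)

def potential_energy(coeffs, k, R, rho0=1.2, c=343.0):
    """E_p(k) from (6.19); coeffs maps (l, m) -> complex coefficient C_lm(k)."""
    total = sum(abs(C_lm * radial_weight(l, k, R)) ** 2
                for (l, m), C_lm in coeffs.items())
    return total / (4.0 * rho0 * c ** 2)

f, R, c = 500.0, 0.1, 343.0   # example frequency (Hz), region radius (m), assumed speed of sound (m/s)
k = 2.0 * np.pi * f / c       # wavenumber
coeffs = {(0, 0): 1.0, (1, -1): 0.5j, (1, 0): 0.2, (1, 1): -0.5j}  # made-up coefficients
E_p = potential_energy(coeffs, k, R)
print(E_p)
```

For l = 0 the integral has the closed form (R/2 − sin(2kR)/(4k))/k², which offers a simple check of the numerical weight.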

The commonly used criterion for ANC performance evaluation measures the
attenuation of the noise energy at microphone positions, where the microphones are
either the error microphones themselves or additional microphones placed within
the region of interest. In the former case, it is difficult to gain any insight into the
spatial ANC performance of the system, due to lack of sampling of the noise level
inside the control region. When additional microphones are utilized to measure the
noise level inside the control region, it is necessary to sample the control region at
multiple locations in order to have a complete evaluation of the noise attenuation.

On the other hand, the proposed potential energy method calculates the average
noise level in the entire spatial region, therefore there is no need to take multiple
samplings of the control region, which simplifies the process of ANC performance
evaluation. In [114], the potential energy criterion is applied to evaluate the noise
level inside rectangular cabins. However, in practical scenarios, the natural modes of
a room depend greatly on the geometry of the room, and measuring these modes in
a practical environment is very difficult. Our method, on the other hand, represents
the potential noise using spherical harmonic coefficients, which can be conveniently
captured by spherical microphone arrays. Therefore our method is particularly
suitable for environments with irregular geometry and a fixed control region with
moderate size, such as vehicle and aircraft cabins.
[Figure 6.5: layout diagram showing the primary source, secondary sources 1 and 2, and error microphones Mic 1 to Mic 5 around the origin O; labelled dimensions: 0.95, 0.1, 1.35, 0.2, 0.95.]

Figure 6.5: Loudspeaker and microphone placement for the experiment.

6.3.3 Performance evaluation

Experiment setup

Figure 6.5 shows the hardware configuration of the system, where the control region
is defined as a spherical area with 0.1 m radius, which approximately covers the size
of a human head. Five AKG CK92 omnidirectional microphones are placed evenly
on the horizontal plane boundary of the region, which act as the error sensors.
In order to investigate the differences in ANC performance due to different error
microphone setups, we vary the radius of the error microphone array, as well as the
number of active microphones in the array. The array radius is varied between 10
cm and 20 cm, and the microphones used in each experiment are either (i) all five
microphones, or (ii) “Mic 2” and “Mic 5” shown in Fig. 6.5 only. This results in a
total of four combinations of array radius and microphone number.
Three TANNOY 600 loudspeakers are used as the primary and secondary sources.
The two secondary speakers are placed on either side of the primary source, forming
an angle of 72 degrees.
The error microphone signals are transmitted to a PC, which performs the
adaptive ANC algorithm and generates the secondary loudspeaker driving signals in real
time. Since the focus of this experiment is not on the performance of the MIMO
ANC algorithm itself, the reference signal is obtained directly from the electronic
signal path of the primary source, rather than using a separate reference
microphone. This eliminates the feedback signal path from the secondary sources to the
reference sensor, which may affect the ANC performance.
An Eigenmike is placed at the center to monitor the noise field within the control
region. Although the Eigenmike is capable of capturing spatial sounds up to 4th
order at 4 kHz, we only focus on the lower frequency sounds (up to 800 Hz and 1st
order). This is because at higher frequencies, the second order spherical harmonics
begin to have a higher contribution towards the sound energy close to the boundary
of the control region, but the Eigenmike is unable to capture the second order sound
field at those frequencies, due to its smaller radius (4.2 cm) compared to the control
region.
A separate computer is used to process the audio signal recorded by the Eigen-
mike and calculate the potential energy while the ANC system is running. The
Eigenmike is not involved in the signal path of the ANC system in any way.
Both narrow-band and wide-band signals are used as the primary noise for the
experiments. The narrow-band signals are sine waves with frequencies of 100–800
Hz; the wide-band signal is generated by filtering an in-car noise recording through
a 100–800 Hz bandpass filter.
For narrow-band experiments, a sine wave is played through the primary speaker,
then we calculate the average sound energy recorded by each error microphone with
and without ANC, and calculate the attenuation of the noise energy due to ANC.
The attenuation of the average sound energy within the control region is measured
in the same way.
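The attenuation computation described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the sign convention (negative dB when ANC reduces the noise) is chosen to match the attenuation axis of Fig. 6.7, the function name attenuation_db and the sampling rate are hypothetical, and the signals are synthetic tones rather than recordings.

```python
import numpy as np

def attenuation_db(x_off, x_on):
    """Energy attenuation in dB; negative values mean ANC reduced the noise."""
    e_off = np.mean(np.abs(x_off) ** 2)   # average energy without ANC
    e_on = np.mean(np.abs(x_on) ** 2)     # average energy with ANC
    return 10.0 * np.log10(e_on / e_off)

fs = 8000                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs                     # one second of samples
x_off = np.sin(2 * np.pi * 200 * t)        # synthetic 200 Hz tone without ANC
x_on = 0.1 * x_off                         # synthetic residual: amplitude reduced tenfold
att = attenuation_db(x_off, x_on)
print(att)
```

A tenfold amplitude reduction corresponds to a hundredfold energy reduction, so this example evaluates to −20 dB.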
For wide-band experiments, we play the wide-band noise and record a section of
signal from the error microphones as well as the Eigenmike while the ANC system is
not active, then repeat the recording with ANC active and fully converged. We then
calculate the average frequency spectrum of each recorded section. The playback
and recordings are synchronized such that the same section of noise signal is recorded
each time.
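One way to obtain such an averaged spectrum is Welch's method, sketched below in Python. The use of scipy.signal.welch, the segment length, the sampling rate and the function name average_spectrum_db are assumptions made for illustration; the source does not state which estimator was used, and the signals here are synthetic noise rather than the recorded sections.

```python
import numpy as np
from scipy.signal import welch

def average_spectrum_db(x, fs, nperseg=1024):
    """Averaged power spectrum in dB using Welch's method (an assumed choice)."""
    freqs, pxx = welch(x, fs=fs, nperseg=nperseg)
    return freqs, 10.0 * np.log10(pxx)

fs = 8000                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
x_off = rng.standard_normal(10 * fs)        # synthetic stand-in for the section without ANC
x_on = 0.5 * x_off                          # synthetic stand-in for the section with ANC
freqs, s_off = average_spectrum_db(x_off, fs)
_, s_on = average_spectrum_db(x_on, fs)
band = (freqs >= 100) & (freqs <= 800)      # band of interest
reduction_db = s_on[band] - s_off[band]     # negative where ANC lowers the level
print(reduction_db.mean())
```

Using the same section of the noise signal for both recordings, as described above, is what makes a bin-by-bin difference of the two spectra meaningful.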

6.3.4 Result analysis

Effect of microphone array radius

Figure 6.6: Picture of the experiment setup. The small loudspeakers in the
background are not used in the experiment.

In order to investigate the effect of microphone array radius on the performance
of the ANC system at different frequencies, as well as the differences between the
potential energy criterion and the microphone pressure criterion, we conduct the
narrow-band ANC experiments for microphone array radii r = 10 cm and r = 20 cm,
utilizing all 5 microphones. The energy attenuation of both the potential energy and
the microphone received signals is shown in Fig. 6.7 (a).
It can be seen from Fig. 6.7 (a) that for all configurations, overall the noise
attenuation becomes smaller as the frequency increases. A potential energy attenu-
ation of over 10 dB can be achieved for frequencies up to 450 Hz if the microphones
are placed on the boundary of the control region; when the microphones are placed
further away, the attenuations are worse by 5 − 10 dB for frequencies above 300 Hz.
This result is intuitive because when the microphones are placed further away, the
correlation between the sound pressure inside the control region and the sound pres-
sure at microphone positions becomes smaller, and this is especially true for higher
frequencies, where the wavelength is shorter. However, it can be seen that at very
low frequencies (below 200 Hz), both radii result in very similar ANC performance
in terms of potential energy attenuation.
Comparing the curves corresponding to microphone signal attenuation with those
corresponding to potential energy attenuation, it can be found that when the
microphones are placed at the boundary of the region, the microphone signals are a
good indication of the potential energy inside the region; when the microphones
are placed further away, however, the microphone signal attenuations become much
smaller than the noise attenuation observed inside the region. In this case, the
microphone signals are no longer a good indication of the ANC system's performance.

[Figure 6.7: two panels, (a) and (b), plotting Attenuation (dB) against Frequency (Hz) from 100 to 800 Hz; curves: Mic average, R = 10 cm; Potential energy, R = 10 cm; Mic average, R = 20 cm; Potential energy, R = 20 cm.]

Figure 6.7: Average narrow-band noise energy attenuation at control region and
microphone locations using 5 error microphones (a) and 2 error microphones (b).
Legend of (b) is the same as (a).

Effect of microphone number

The effect of the number of error microphones is also investigated. For this purpose,
we repeated the narrow-band experiments with only two error microphones active,
and the noise attenuation results are shown in Fig. 6.7 (b). From this figure, it can
be observed that both the microphone signal attenuations and the potential energy
attenuations become very different from the case where 5 microphones are used. In
particular, the microphone signals can achieve more than 10 dB attenuation for all
frequencies, and the attenuation does not decay with increased frequency; on the
other hand, the potential energy attenuation is significantly worse compared to the
5 channel case, and the value even became positive (higher noise level with ANC
active) at some higher frequencies.

[Figure 6.8: two panels, (a) and (b), plotting Sound Energy (dB) against Frequency (Hz) from 100 to 800 Hz; curves: Without ANC, 10 cm radius, 20 cm radius.]

Figure 6.8: Spectrum of potential energy at control region (a) and microphone
locations (b) using wide-band noise signal.

The cause of this phenomenon is that the number of error microphones is equal
to the number of secondary sources; therefore a solution always exists that
significantly reduces the sound pressure at the microphone positions, but often at the
cost of very high secondary source driving signals. Since the two microphones do not
provide a complete coverage of the control region, the potential energy inside the
region becomes less controllable compared to the 5-channel case, and in some extreme
cases, this results in positive attenuation values (a noise increase) inside the
region. In this case, the microphone signals are
not a good indication of the ANC performance inside the control region, except at
very low frequencies (below 150 Hz), as can be seen in Fig. 6.7 (b).
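The algebraic reason why two microphones and two secondary sources can always null the microphone pressures can be shown with a small Python sketch. It is a toy illustration under assumptions: the secondary paths and primary pressures are random complex placeholders at a single frequency, and the names crandn and mic_residual are hypothetical. It says nothing about the energy inside the region, only about the residual at the microphones.

```python
import numpy as np

def crandn(rng, *shape):
    """Complex Gaussian placeholder data (synthetic, not measured paths)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def mic_residual(rng, n_mics, n_src=2):
    """Relative residual at the microphones after least-squares cancellation."""
    G = crandn(rng, n_mics, n_src)               # secondary paths: sources -> microphones
    p = crandn(rng, n_mics)                      # primary noise pressure at the microphones
    d, *_ = np.linalg.lstsq(G, -p, rcond=None)   # driving signals minimizing ||p + G d||
    return np.linalg.norm(p + G @ d) / np.linalg.norm(p)

rng = np.random.default_rng(1)
res_2_mics = mic_residual(rng, n_mics=2)   # square system: as many microphones as sources
res_5_mics = mic_residual(rng, n_mics=5)   # overdetermined system: more microphones than sources
print(res_2_mics, res_5_mics)
```

With two microphones the system is square, so the residual at the microphones vanishes up to rounding regardless of what happens elsewhere; with five microphones the sources must compromise across all measurement points.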

Wide-band performance

In order to investigate the system's wide-band performance, we compute the average
spectrum of the microphone signals with and without ANC, as well as the spectrum
of the observed potential energy within the control region. All five error microphones
are used in this experiment. The spectra are shown in Fig. 6.8, where Fig. 6.8
(a) plots the potential energy, and Fig. 6.8 (b) plots the spectrum of the microphone
signals. The difference between the low frequency spectra in Fig. 6.8 (a) and (b)
is due to the presence of high-pass filters in the error microphones' amplification
circuitry.
From Fig. 6.8 (a) it can be seen that overall the wide-band ANC performance
agrees with the corresponding narrow-band results shown in Fig. 6.7 (a), where an
attenuation of 10 dB or more can be achieved for most frequencies below 400 Hz,
while at higher frequencies, the attenuation gradually reduces to 3–5 dB. Comparing
the curves corresponding to 10 cm and 20 cm microphone array radii, it can be seen
that the low frequency ANC performance of the two configurations is very similar,
with the 10 cm configuration being superior in certain frequency ranges. At higher
frequencies, the 10 cm configuration yields consistently better attenuation than the
20 cm configuration, which also agrees with Fig. 6.7 (a), although the attenuation
is slightly worse for the 10 cm radius case.
In the case of Fig. 6.8 (b), the attenuation observed at the microphone positions
differs greatly between the two radius configurations. For the 10 cm radius, the
attenuation is greater than 5 dB for nearly all frequencies between 100 and 800 Hz,
while for the 20 cm setup, the attenuation becomes negligible above 475 Hz. Upon
closer observation, it can be seen that the microphone attenuation at 10 cm radius is
2–3 dB greater than that of the potential energy inside the control region.
Therefore, neither of the two microphone position configurations truthfully reflects
the ANC performance within the control region, although the result is more accurate
when the microphones are placed closer to the control region.

6.3.5 Observations and insights

Overall, we can see that the error microphone signals cannot be used to reliably
measure the noise attenuation inside the control region, especially when the number
of error microphones is small, or when the microphones are not placed close to the
control region. On the other hand, using the proposed method, we can use a spherical
microphone array placed in the center of the region to conveniently and reliably
monitor the potential energy within the region.
In addition, in terms of the performance of the ANC system, we notice that
its wide-band performance largely agrees with the narrow-band performance, which
confirms the capability of the implemented ANC algorithm. Also, we notice that
at higher frequencies, although sometimes an attenuation of the noise cannot be
observed by the microphones, the noise level within the control region still sees a
small (around 3 dB) reduction. We note that this is because the wavelength of the
noise is still on par with the radius of the array; if the array had an even larger
radius, this phenomenon might not be observed.

6.4 In car spatial ANC performance analysis

In this section, we evaluate the noise cancelling capabilities of a passenger car's
integrated loudspeakers by analyzing the in-car noise field and the loudspeaker
responses. We show that the noise field can be decomposed into several basis noise
patterns. Noise field measurements are carried out for both single-seat and
multiple-seat scenarios, and a series of analyses is performed to estimate the ANC
capabilities of the in-car loudspeakers.

6.4.1 Background

The application of noise cancellation methods to minimize interior cabin noise has
been a key topic of research in the automobile industry for the last 15 − 20 years
[116]. Initially, this problem was approached via passive noise cancellation methods,
which use acoustic treatments such as structural damping and acoustic absorption.
However, with the growing need to improve fuel efficiency, there has been an
increasing preference for lighter bodies and smaller engines, which has significantly
increased the structural vibration and consequent interior noise, predominantly at
low frequencies (e.g. 0–500 Hz) [117]. As passive methods are least effective against
low frequency noise, active methods were developed in which secondary loudspeakers
are used to attenuate the measured noise inside the cabin [6, 117–120]. With modern
in-car entertainment systems providing 4–6 built-in loudspeakers, the addition of an
active noise cancellation system is considered to involve no greater cost [7].
To the best of our knowledge, the existing in-car MIMO controllers are con-
strained to a set of arbitrary observation points. As a result, spatial control over
continuous regions is limited and made worse with increased frequency. Addressing
this issue, we focus this work on modeling vehicle-interior noise over a continuous
spatial region such that noise control can be achieved over a region of a size
similar to a human head for frequencies up to f = 500 Hz. We also derive the maximum
attenuation levels for a given speaker arrangement so that industrial designers can
investigate the potential noise cancellation capability of a given loudspeaker system
for various noise sources and driving conditions. All of the analyses we perform are
based on acoustic measurements taken in a real in-car environment.

6.4.2 Problem Formulation

Denote the unwanted noise pressure at a point x as $P_n(x)$, and the sound
pressure due to the loudspeakers as $P_c(x)$. The average residual noise energy within
the region of interest S can then be expressed as
\[ \int_S |P_r(x)|^2 \, dS = \int_S |P_n(x) + P_c(x)|^2 \, dS. \tag{6.21} \]

A complete in-car active noise cancellation system consists of many components,
each of which may have an impact on the system's overall performance. In this
section, we aim to evaluate the potential performance of in-car loudspeakers in ANC
applications by estimating the minimum values of $\int_S |P_r(x)|^2 \, dS$ for various
frequencies, based on information about the in-car noise field and the acoustic
characteristics of the car's integrated loudspeakers, so as to see whether the
integrated loudspeakers are a potential bottleneck for in-car ANC systems.
Using the acoustic potential energy as the metric for spatial sound level, the
residual noise field $P_r(x)$ in (6.21) has, in the active noise cancellation
scenario, the average energy
$$\int_S |P_r(x)|^2 \, dS = \int_S |P_n(x) + P_c(x)|^2 \, dS \tag{6.22}$$
$$= \sum_{l,m} \left| \left( C_{lm}^{(n)} + C_{lm}^{(c)} \right) W_l \right|^2, \tag{6.23}$$
where the expression (6.19) is used, and $C_{lm}^{(n)}$ and $C_{lm}^{(c)}$ are the
spherical harmonic coefficients representing the noise field and the loudspeaker
anti-noise field, respectively.
We then move on to derive an estimate of $\int_S |P_r(x)|^2 \, dS$ by analyzing the
noise field and loudspeaker channel characteristics.

6.4.3 Noise field characterization

For a given driving condition, we assume that the random noise field within S can
be seen as a weighted combination of a number of fixed basis noise patterns, or
noise modes [121]; each driving condition may have a different set of basis
patterns. The noise field pressure within S at any time under a fixed driving
condition can then be decomposed as
$$P_n(x) = \sum_i g_i P_i(x), \tag{6.24}$$

where $P_i(x)$ denotes the ith basis noise pattern at x, and $g_i$ are random
weighting factors for each noise pattern. Theoretically, an infinite number of modes
is needed to fully describe an arbitrary noise field; however, for a relatively small
region and low frequencies, only a small number of noise modes are required for a
good approximation of the noise field [121]. Using the spherical harmonic
decomposition (2.1) to decompose the noise field $P_n(x)$ and the basis patterns
$P_i(x)$, we can express each noise field coefficient $C_{lm}^{(n)}$ using the
corresponding coefficient $C_{lm}^i$ of every basis pattern,
$$C_{lm}^{(n)} = \sum_i g_i C_{lm}^i. \tag{6.25}$$

We can write all the coefficients in vector form such that
$C = [C_{00}^{(n)}, C_{11}^{(n)}, \dots]^T$ and $C^i = [C_{00}^i, C_{11}^i, \dots]^T$;
then from (6.25) we have
$$C = \sum_i g_i C^i. \tag{6.26}$$

In order to reflect the relative impact of each spherical harmonic coefficient on
the overall noise level within S, we define the weighted coefficient vector
$c_i = [C_{00}^i W_0, C_{11}^i W_1, \dots]^T$, and by multiplying both sides of (6.26)
by a diagonal matrix $W$ with $\mathrm{diag}\{W\} = [W_0, W_1, W_1, W_1, W_2, \dots]^T$,
we have
$$c = \sum_i g_i c_i, \tag{6.27}$$
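As a minimal sketch of the weighting step (an illustration added here, not code from the original experiments), the diagonal of $W$ can be built by repeating each per-order weight $W_l$ once for each of the $2l + 1$ modes of order $l$. The function names and the numerical values below are hypothetical placeholders; the actual $W_l$ follow from (6.19), which is not reproduced in this sketch.

```python
import numpy as np

def order_weights_to_diag(w_per_order):
    """Expand per-order weights [W_0, ..., W_L] into the diagonal of W.

    Order l has 2l + 1 modes, so W_l is repeated 2l + 1 times:
    [W_0, W_1, W_1, W_1, W_2, ...].
    """
    w_per_order = np.asarray(w_per_order, dtype=float)
    repeats = 2 * np.arange(len(w_per_order)) + 1
    return np.repeat(w_per_order, repeats)

def weight_coefficients(coeffs, w_per_order):
    """Return the weighted vector c = W C for (L + 1)^2 coefficients."""
    diag_w = order_weights_to_diag(w_per_order)
    coeffs = np.asarray(coeffs, dtype=complex)
    if coeffs.shape[0] != diag_w.shape[0]:
        raise ValueError("expected (L + 1)^2 coefficients")
    return diag_w * coeffs

# Hypothetical first-order example (placeholder values, not measured data).
w = [1.0, 0.5]
C = np.array([1 + 1j, 2.0, -1j, 0.5])
c = weight_coefficients(C, w)
noise_energy = np.sum(np.abs(c) ** 2)  # ||c||^2, the weighted energy
```

Storing only the diagonal avoids forming the full matrix $W$, which is sufficient here because $W$ is diagonal by definition.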

where $c$ represents the random noise field in S and
$\|c\|^2 = \int_S |P_n(x)|^2 \, dS$. Similar to the modal domain MUSIC DOA algorithm
[122], we can find a set of $c_i$ by calculating the autocorrelation matrix
$E\{cc^H\}$, and then decomposing $E\{cc^H\}$ to acquire a set of orthonormal
eigenvectors and their corresponding eigenvalues. Unlike the MUSIC DOA method, which
utilizes the noise subspace eigenvectors, we select the signal subspace eigenvectors
to be $c_i$, which correspond to the eigenvalues $\lambda_i$ whose values are
significant. The eigenvalues indicate the energy distribution of the overall noise
field among the basis noise patterns, and $E\{|g_i|\} = \lambda_i$.
Therefore the expected average noise power within S can be calculated as
$$E\left\{ \int_S |P_n(x)|^2 \, dS \right\} = E\{\|c\|^2\} = \sum_i \|\lambda_i c_i\|^2. \tag{6.28}$$
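The eigen-decomposition described above can be sketched as follows. This is an illustrative NumPy outline under stated assumptions, not the code used for the measurements: the function name `noise_modes` is hypothetical, the expectation $E\{cc^H\}$ is replaced by a sample average over snapshots, and the 99% energy criterion used to decide which eigenvalues are "significant" is an assumption introduced only for this example.

```python
import numpy as np

def noise_modes(snapshots, energy_fraction=0.99):
    """Estimate basis noise patterns from weighted coefficient snapshots.

    snapshots: array of shape (num_snapshots, num_coeffs); each row is one
               weighted coefficient vector c at a single frequency bin.
    Returns the eigenvalues (descending), the eigenvectors as columns, and
    the number of modes needed to reach `energy_fraction` of the eigenvalue
    sum (an assumed selection rule, not one stated in the text).
    """
    X = np.asarray(snapshots, dtype=complex)
    # Sample estimate of E{c c^H}: R[a, b] = mean_n c_n[a] * conj(c_n[b])
    R = X.T @ X.conj() / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order, R is Hermitian
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    num_modes = int(np.searchsorted(cumulative, energy_fraction) + 1)
    return eigvals, eigvecs, min(num_modes, len(eigvals))
```

`numpy.linalg.eigh` is used rather than a general eigen-solver because the sample covariance is Hermitian, which guarantees real eigenvalues and orthonormal eigenvectors.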

Through decomposing the noise field into basis noise patterns, we gain more insight
into the dimensionality/sparsity of the noise field. A noise field of high order may
have a compact representation using (6.24). Furthermore, additional signal analysis
methods such as direction-of-arrival (DOA) estimation may be applied to the basis
noise patterns to identify principal noise sources, which helps in determining optimal
loudspeaker placement for ANC purposes when designing the vehicle.

6.4.4 Residual noise level estimation

We model the characteristics of the loudspeakers at each frequency bin through the
wave domain channel matrix $H$, which describes the loudspeakers’ response over
the entire region of interest,
$$H = \begin{bmatrix}
H_{00}^{1} & H_{00}^{2} & \cdots & H_{00}^{V} \\
H_{11}^{1} & H_{11}^{2} & \cdots & H_{11}^{V} \\
\vdots & \vdots & \ddots & \vdots \\
H_{LL}^{1} & H_{LL}^{2} & \cdots & H_{LL}^{V}
\end{bmatrix} \tag{6.29}$$
with $H_{lm}^{v}$ being the spherical harmonic coefficient of order l and mode m,
defined in the region of interest S, due to the vth loudspeaker playing a unit signal
at one frequency. For an Lth order region S and an array of V independent
loudspeakers, the size of $H$ is $(L + 1)^2$-by-$V$.
Since the noise field can be completely described by its eigenvectors
$c_1, c_2, \dots$, we can estimate the noise cancellation performance by comparing the
eigenvectors with the loudspeaker channels. In particular, we define the weighted
channel matrix $T = WH$, where $W$ is the diagonal matrix defined in Section 6.4.3.
Then we can solve for the loudspeaker driving signal solution that minimizes
(6.22) for each basis noise field pattern defined by $c_i$, which can be derived as
$$D = -(T^H T)^{-1} T^H c_i. \tag{6.30}$$

The residual error vector is then
$$e_i = c_i + TD = (I - T (T^H T)^{-1} T^H) c_i. \tag{6.31}$$

The driving signal solution (6.30) is essentially the Least Mean-Square Error (LMS)
solution over the continuous space S, instead of the LMS solution based on a number
of discrete spatial sampling points which is commonly used in existing car ANC
systems.
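The least-squares step in (6.30) and (6.31) can be sketched numerically as below. This is an illustrative outline only: the function name `continuous_region_ls` is hypothetical, and `numpy.linalg.lstsq` is used in place of the explicit inverse, which gives the same $D$ whenever $T$ has full column rank.

```python
import numpy as np

def continuous_region_ls(T, c_i):
    """Driving signals D minimizing ||c_i + T D||^2, and the residual e_i.

    T:   weighted channel matrix W H, shape ((L + 1)^2, V)
    c_i: weighted coefficient vector of one basis noise pattern
    """
    T = np.asarray(T, dtype=complex)
    c_i = np.asarray(c_i, dtype=complex)
    # lstsq solves min ||T D - (-c_i)||, i.e. D = -(T^H T)^{-1} T^H c_i
    # for full-column-rank T, without forming the inverse explicitly.
    D, *_ = np.linalg.lstsq(T, -c_i, rcond=None)
    e_i = c_i + T @ D
    return D, e_i
```

The residual returned by this sketch is orthogonal to the columns of $T$, which is the part of the noise pattern that the loudspeaker channels cannot reproduce.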
We use the eigenvalues $\lambda_i$ as well as the original and residual noise field
vectors, $c_i$ and $e_i$, respectively, to express the noise cancelling performance,
and the overall expected noise power reduction ratio can be given using (6.28) as
$$e = \frac{E\{\int_S |P_r(x)|^2 \, dS\}}{E\{\int_S |P_n(x)|^2 \, dS\}}
= \frac{\sum_i \|\lambda_i e_i\|^2}{\sum_i \|\lambda_i c_i\|^2}, \tag{6.32}$$
where the term $c_i$ in (6.32) can be omitted since the $c_i$ are orthonormal.
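For illustration, the ratio in (6.32) can be evaluated and expressed in decibels as in the sketch below. The function name is hypothetical, and the eigenvalue weighting follows (6.32) exactly as it is written in the text; the conversion to dB is added here for convenience.

```python
import numpy as np

def expected_attenuation_db(eigvals, C, E):
    """Power reduction ratio of (6.32), as written in the text, in dB.

    eigvals: eigenvalues lambda_i, shape (I,)
    C, E:    arrays whose columns are the patterns c_i and residuals e_i
    """
    lam = np.asarray(eigvals, dtype=float)
    # Broadcasting scales column i of each matrix by lambda_i.
    num = np.sum(np.linalg.norm(lam * np.asarray(E), axis=0) ** 2)
    den = np.sum(np.linalg.norm(lam * np.asarray(C), axis=0) ** 2)
    return 10.0 * np.log10(num / den)
```

With this convention a more negative value means stronger attenuation, which matches the sign used on the vertical axis of the attenuation plots.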

6.4.5 Experiment on a single passenger seat

Experiment setup

In this experiment, we use the method developed in the previous sections to analyze
the potential noise cancellation performance of the loudspeakers installed in a car
(2005 Ford Falcon XR6 sedan).
We use an Eigenmike to measure the in-car noise field; the region of interest
is chosen to be a spherical region with 10 cm radius, located at the head position
of the front passenger seat. Because the radius of the region is larger than that of
the Eigenmike (4.2 cm), we only analyze the sound field for frequencies below
500 Hz. Within this frequency range, only the 0th and 1st order sound field harmonics
are active inside the region of interest [53], and these can be reliably measured by
the Eigenmike placed at the center of the region. Also, spectral analysis of the
in-car noise indicates that the majority of the noise power lies below 500 Hz (an
example of the noise spectrum is shown in Fig. 6.11); thus the noise cancelling
performance within this frequency band is indicative of the overall cancelling quality.
The vehicle has four full-band loudspeakers installed: one is integrated in each of
the front doors, and one is placed behind each rear seat. Unfortunately, the car’s
audio system can only play stereo signals, which means the two loudspeakers on each
side cannot be driven separately and always play the same signal.

Figure 6.9: Picture of the Eigenmike installed in a Ford Falcon XR6.

In order to characterize the noise field, we recorded the in-car noise under various
driving conditions. We also recorded the noise fields due to the engine and the air
conditioner while the car was stationary. For each driving condition, a
10-second-long recording is separated into 100 snapshots; we then calculate the sound
field coefficients for each snapshot at every frequency bin, and finally calculate
the coefficient covariance matrix over all 100 snapshots. The covariance matrix is
used as the estimate of $E\{cc^H\}$ in the subsequent data analysis.

The loudspeaker channel matrix is obtained by using the Eigenmike to measure the
spatial response at the region of interest due to the left channel and the right
channel separately; the sound field coefficients for each frequency bin are calculated
in the same way as for the noise field samples. The 1st order sound field and the
stereo speaker system result in a 4-by-2 channel matrix for each frequency bin.

Table 6.2: Noise field eigenvalues for the freeway driving condition and pure
engine noise

100 km/h       100 Hz    200 Hz    300 Hz    400 Hz    500 Hz
λ1             137.7     29.58     18.43     15.02     9.360
λ2             6.578     1.049     1.286     1.270     1.134
λ3             2.610     0.651     0.814     0.800     0.679
λ4             1.475     0.418     0.645     0.596     0.414

Engine Only    100 Hz    200 Hz    300 Hz    400 Hz    500 Hz
λ1             74.50     50.97     17.65     8.697     5.029
λ2             2.217     1.807     0.930     0.997     0.578
λ3             0.862     0.706     0.496     0.574     0.442
λ4             0.557     0.400     0.341     0.369     0.172

When calculating the residual noise field vector $e_i$, we include a small
regularization parameter β, such that (6.31) becomes
$$e_i = (I - T (T^H T + \beta I)^{-1} T^H) c_i, \tag{6.33}$$
with β = 0.01. The regularization prevents severe ill-conditioning of the matrix
inversion, thereby preventing the occurrence of very high secondary loudspeaker
volumes.
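A minimal numerical sketch of the regularized residual in (6.33) is given below. The function name is hypothetical; the default β = 0.01 is the value quoted above, and a linear solve is used instead of an explicit matrix inverse.

```python
import numpy as np

def regularized_residual(T, c_i, beta=0.01):
    """Regularized driving signals and the residual e_i of (6.33).

    D   = -(T^H T + beta I)^{-1} T^H c_i
    e_i = c_i + T D = (I - T (T^H T + beta I)^{-1} T^H) c_i
    """
    T = np.asarray(T, dtype=complex)
    c_i = np.asarray(c_i, dtype=complex)
    gram = T.conj().T @ T + beta * np.eye(T.shape[1])
    D = -np.linalg.solve(gram, T.conj().T @ c_i)
    e_i = c_i + T @ D
    return D, e_i
```

When two loudspeaker channels are nearly identical, the unregularized solution can demand very large and nearly opposite driving signals; a positive β trades a slightly larger residual for bounded loudspeaker effort.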

Experimental data analysis

By diagonalizing the estimated coefficient covariance matrices acquired from the
recordings under various driving conditions, we obtained the eigenvalues for every
case and each frequency bin. The eigenvalues are given in Table 6.2 for the freeway
driving condition and a pure engine noise recording. For the freeway recording, the
car was driven on a freeway at 100 km/h, with air conditioning turned to low; for
the engine noise recording, the car was parked in a quiet place with air conditioning
switched off, and the engine ran at various rpm during the recording.
From Table 6.2 we see that the eigenvalues decay quickly as the frequency increases,
which reflects the shape of the noise spectrum, where the lower frequencies are more
dominant.
Furthermore, one may notice that the first eigenvalue for both cases and each
frequency bin is much larger than the other 3 eigenvalues. This is particularly
noticeable at lower frequencies, and the same phenomenon is observed in all the
other driving scenarios. In Section 6.4.3 we showed that each eigenvector of the
covariance matrix corresponds to a specific spatial sound field pattern, with the
relative importance of each pattern indicated by its eigenvalue. This result shows
that there is one dominant noise pattern in the region of interest for each frequency
bin. Therefore we expect that the lower frequency noise fields can be seen as sparse;
thus controlling such sound fields may require only a small number of well-placed
loudspeakers which can accurately reproduce the dominant noise pattern.

[Plot: Noise Power Attenuation (dB) versus Frequency (Hz), 50–500 Hz; curves: Busy Road, Engine Only, AC Only, Freeway]
Figure 6.10: Noise power spectrum attenuation for 4 different driving conditions.

Figure 6.10 plots the noise power attenuation for four different driving conditions,
with the values calculated using (6.32). In addition to the freeway recording and
the engine noise recording, the “Busy Road” recording was taken while driving on
a 3-lane road at moderate speed with multiple vehicles passing by, while the “AC
Only” recording was taken with the car parked in a quiet place, the engine idling,
and the air conditioning turned to maximum.
Figure 6.10 indicates that for most cases, the noise cancelling performance is
relatively consistent, with the attenuation reducing gradually from 30–35 dB at 50
Hz to 15–20 dB at 500 Hz. This frequency-dependent performance is expected, since
the noise field becomes more complicated and harder to reproduce/cancel as the
wavelength gets shorter. We also notice that the noise field due to air
conditioning is particularly difficult to cancel at 50–100 Hz, compared to other
scenarios. We expect this is because the noise field due to AC is less similar to
that of the loudspeakers, compared to other noise sources. One may also notice the
common peak in all cases at 470 Hz; clearly, at this frequency, the loudspeakers are
unable to reproduce the noise fields very well.

[Plot: Normalized Power Spectrum (dB) versus Frequency (Hz), logarithmic axis from 10^2 to 10^4 Hz; curves: Measured Noise, Attenuation Residual]
Figure 6.11: Comparison of average noise field power spectrum before and after
cancellation.

We also include Fig. 6.11 which depicts the overall noise spectrum without at-
tenuation, and the expected residual noise spectrum if the in-car loudspeakers are
employed to cancel the noise field. The original noise spectrum is recorded while
driving at 70 km/h with air conditioning at minimum. The attenuation is cut off at
500 Hz. We can see from the figure that the most dominant noise frequencies can
be effectively cancelled by the integrated loudspeakers, resulting in a much quieter
sound field within the region of interest.
In general, we can conclude that the integrated loudspeakers are capable of can-
celling the noise field within our defined region of interest at the front passenger seat.
However, we would expect the performance to degrade should the noise cancellation
be carried out for multiple seats. Nevertheless, a proper in-car ANC system would
be able to drive the four loudspeakers separately, which provides extra degrees of
freedom for the loudspeaker channels, thereby improving the overall performance of
the system.

6.4.6 Experiment with multiple passenger seats and limited loudspeaker output power

The theory developed in Section 6.4.3 can be easily extended to the multi-zone case.
Assume that a number of adjacent regions are defined inside the car cabin. Then,
considering one of the control regions $S_j$, we can use the spherical harmonic
decomposition (2.1) to decompose the noise field $P_n(x)$, $x \in S_j$, as well as the
basis patterns $P_i(x)$, $x \in S_j$. We can then express the noise field
coefficients $C_{lm}^{j}$ belonging to the jth control region using the corresponding
coefficient $C_{lm}^{j,i}$ of every basis pattern,
$$C_{lm}^{j} = \sum_i g_i C_{lm}^{j,i}. \tag{6.34}$$

We have shown that the average energy of a noise field is related to the spherical
harmonic coefficients that represent the noise field through $W_l$. Substituting
$c_{lm} = C_{lm} W_l$ into (6.34), we have
$$c_{lm}^{j} = \sum_i g_i c_{lm}^{j,i}. \tag{6.35}$$

Since we are considering the overall noise field over all of the control regions, it
is convenient to write the coefficients of all regions in vector form, such that
$c = [c_{00}^{1}, c_{11}^{1}, \dots, c_{00}^{2}, \dots, c_{LL}^{j}]^T$ and
$c_i = [c_{00}^{1,i}, c_{11}^{1,i}, \dots, c_{11}^{2,i}, \dots, c_{LL}^{j,i}]^T$.
Then, from (6.35) and combining the coefficients of all control regions, we have the
vector representation
$$c = \sum_i g_i c_i. \tag{6.36}$$

A limitation of the mode matching method for deriving loudspeaker driving signals
is that the amplitude of the loudspeaker driving signal is unbounded. Although
regularization can be added to the matrix inversion in (6.30) to avoid extremely
high driving signals, there is no strict upper bound on the loudspeaker output power.
From a practical point of view, driving a loudspeaker beyond its linear operating
range would result in harmonic distortion, which introduces additional noise in the
control regions. In order to avoid this problem, we define the optimization problem
$$\min f(D) = \|c + TD\|, \quad \text{subject to } |D_i| \le K, \; i = 1, 2, \dots \tag{6.37}$$
where $D_i$ are the elements of $D$ and represent the driving signal for the ith
loudspeaker, $\|\cdot\|$ denotes the $\ell_2$-norm, and $K$ is a constant which sets
the volume upper bound for each loudspeaker. The noise energy attenuation can be
represented as
$$A = \frac{\int_{S_1, S_2, \dots} |P_r(x)|^2 \, dx}{\int_{S_1, S_2, \dots} |P_n(x)|^2 \, dx}
= \frac{\|c + TD\|^2}{\|c\|^2} \tag{6.38}$$
where $D$ is the solution to (6.37).
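The text does not state which solver was used for (6.37). As one possible illustration, the sketch below applies projected gradient descent, a swapped-in generic technique rather than the method of the original experiments: each iteration takes a gradient step on $\|c + TD\|^2$ and then scales any element with $|D_i| > K$ back onto the disc of radius $K$. The function name, iteration count and step-size rule are assumptions made for this example.

```python
import numpy as np

def bounded_driving_signals(T, c, K, num_iters=2000):
    """Projected-gradient sketch for min ||c + T D|| s.t. |D_i| <= K."""
    T = np.asarray(T, dtype=complex)
    c = np.asarray(c, dtype=complex)
    D = np.zeros(T.shape[1], dtype=complex)
    # Step 1 / ||T||_2^2 (inverse Lipschitz constant of the gradient).
    step = 1.0 / (np.linalg.norm(T, 2) ** 2)
    for _ in range(num_iters):
        grad = T.conj().T @ (c + T @ D)  # gradient of 0.5 ||c + T D||^2
        D = D - step * grad
        # Project each element onto the complex disc of radius K.
        mag = np.maximum(np.abs(D), 1e-300)
        D = D * np.minimum(1.0, K / mag)
    return D

# Hypothetical 4-coefficient, 2-loudspeaker example (placeholder values).
T_demo = np.array([[1, 0], [0, 1], [1, 1], [1, -1]], dtype=complex)
c_demo = np.array([1, 2, 3, 4], dtype=complex)
D_loose = bounded_driving_signals(T_demo, c_demo, K=10.0)
D_tight = bounded_driving_signals(T_demo, c_demo, K=1.0)
```

Because the feasible set is a product of discs and the cost is convex, the projection step is cheap and the iteration does not need a general-purpose constrained solver; a dedicated convex optimization package would be an equally valid choice.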

The loudspeaker channel matrix can be expressed in a similar manner, as
$$H = \begin{bmatrix}
H_{00}^{1,1} & H_{00}^{1,2} & H_{00}^{1,3} & \cdots \\
H_{11}^{1,1} & H_{11}^{1,2} & H_{11}^{1,3} & \cdots \\
\vdots & \vdots & \vdots & \cdots \\
H_{LL}^{1,1} & H_{LL}^{1,2} & H_{LL}^{1,3} & \cdots \\
H_{00}^{2,1} & H_{00}^{2,2} & H_{00}^{2,3} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix} \tag{6.39}$$
where $H_{lm}^{j,v}$ is the spherical harmonic coefficient of order l and mode m,
associated with the jth control region, due to the vth loudspeaker playing a unit
signal.

Experiment Setup

In this experiment, we aim to investigate the noise field complexity within a 2005
Ford Falcon XR6 sedan under various driving conditions, as well as to examine the
noise cancelling potential of the multimedia loudspeakers installed in the car. The
regions of interest are chosen to be spherical regions located at the head position of
each of the four seats. The radius of each region is set to 10 cm, which covers the
size of a human head.
For this experiment, we focus on the noise below 200 Hz. Using (6.19), we can
calculate the relative contribution of each spherical harmonic mode towards the total
noise energy within the control regions. At f = 200 Hz we have
$$\frac{\int_S |P_{00}(x)|^2 \, dS}{\int_S |P(x)|^2 \, dS}
= \frac{|\alpha_{00} W_0|^2}{\sum_{l,m} |\alpha_{lm} W_l|^2} \approx 0.972; \tag{6.40}$$
thus the 0th order spherical harmonic accounts for the vast majority of the noise
energy within the control regions. For frequencies below 200 Hz, the contribution of
the 0th mode is even higher (99.3% at 100 Hz). Therefore, in our experiments,
we only monitor the 0th order spherical harmonic for each control region, which
can be done by placing a single omnidirectional microphone at the center of each
region. We note that we measure only the 0th mode spherical harmonic because
at low frequencies the 0th mode contributes the majority of the noise energy,
not because we believe the noise field is isotropic. For noise field analysis of
larger regions and higher frequencies, higher-order microphones, such as the
Eigenmike, are required.
The recording system we use consists of four AKG CK92 omnidirectional condenser
microphones, connected to a TubeFire 8 audio interface via four AKG SE300B
microphone pre-amps. The synchronous audio streams are recorded using a MacBook,
which is connected to the TubeFire 8 via FireWire.
We record the noise field at the four control regions simultaneously for various
driving conditions, including the pure engine noise recording, where the car is parked
in a relatively quiet place and the engine runs at 2000 rpm. For each driving
condition, we record the noise for 10 seconds. The recording is then split into 100
frames and transformed into spherical harmonic coefficients $\alpha_{00}^{j}(k)$ at
different frequency bins for further analysis.
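The framing step can be sketched as follows. This is an assumed processing chain, not the original analysis code: the function name, the Hann window and the use of non-overlapping frames are illustrative choices. For an omnidirectional microphone at the centre of a region, the resulting spectrum is proportional to the 0th order coefficient, up to a constant scale factor that depends on the normalization used in (2.1).

```python
import numpy as np

def frame_spectra(recording, sample_rate, num_frames=100):
    """Split a multichannel recording into frames and FFT each frame.

    recording: array of shape (num_samples, num_mics)
    Returns (freqs, spectra), where spectra has shape
    (num_frames, num_bins, num_mics).
    """
    x = np.asarray(recording, dtype=float)
    frame_len = x.shape[0] // num_frames
    x = x[: frame_len * num_frames]          # drop any leftover samples
    frames = x.reshape(num_frames, frame_len, x.shape[1])
    window = np.hanning(frame_len)[None, :, None]
    spectra = np.fft.rfft(frames * window, axis=1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, spectra
```

With a 10-second recording and 100 frames, each frame is 0.1 s long, so the frequency bins are spaced 10 Hz apart regardless of the sample rate.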
The Ford sedan has four full-band loudspeakers installed, two of which are in-
tegrated at the bottom of either of the front doors, while the other two are placed
behind each rear seat. However, the car’s audio playback system only supports
stereo signals, which means the two loudspeakers on the left side simultaneously
play the left channel of the stereo signal, and the same goes for the right channel.
We obtain the loudspeaker channel matrix by measuring the impulse response
at the region of interest due to the left channel and right channel separately, and
then calculating the corresponding sound field coefficients for each frequency bin, in
the same way as we obtain the noise field measurements. The channel matrix takes
the form of (6.39). The 0th order sound fields at 4 regions and the stereo speaker
system result in a 4-by-2 channel matrix for each frequency bin.
In order to estimate the noise cancellation capability of the in-car loudspeakers
in each driving condition, we solve (6.37) for each of the 100 snapshots in every
recording, and calculate the expected residual noise energy for each snapshot. The
value of K is chosen such that the sound energy at the regions of interest due to each
loudspeaker is no more than 3 times that due to the noise. We then calculate
the average noise energy attenuation using
$$A = \frac{\sum_{n=1}^{100} \|c_n + T D_n\|^2}{\sum_{n=1}^{100} \|c_n\|^2}, \tag{6.41}$$
where $c_n$ and $D_n$ are the weighted coefficient vectors and the optimal driving
signals for the snapshots in each recording, respectively.

Table 6.3: Noise field eigenvalues for freeway driving condition and pure engine noise

100 km/h       40 Hz     80 Hz     120 Hz    160 Hz    200 Hz
λ1             1.000     1.000     1.000     1.000     1.000
λ2             0.292     0.282     0.498     0.476     0.292
λ3             0.062     0.207     0.181     0.372     0.102
λ4             0.007     0.139     0.049     0.092     0.053

Engine Only    40 Hz     80 Hz     120 Hz    160 Hz    200 Hz
λ1             1.000     1.000     1.000     1.000     1.000
λ2             0.033     0.315     0.108     0.293     0.042
λ3             0.005     0.095     0.018     0.106     0.031
λ4             0.000     0.018     0.003     0.045     0.015
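The snapshot-averaged attenuation in (6.41) can be sketched as below. The function name is hypothetical, and the driving signals are assumed to have been computed beforehand by whichever solver is used for (6.37); the result is converted to dB for comparison with the attenuation plots.

```python
import numpy as np

def average_attenuation_db(T, c_snapshots, D_snapshots):
    """Average noise energy attenuation of (6.41), in dB.

    T:           weighted channel matrix, shape (num_coeffs, V)
    c_snapshots: weighted noise coefficients, shape (N, num_coeffs)
    D_snapshots: driving signals per snapshot, shape (N, V)
    """
    T = np.asarray(T, dtype=complex)
    c = np.asarray(c_snapshots, dtype=complex)
    D = np.asarray(D_snapshots, dtype=complex)
    residual = c + D @ T.T            # row n equals c_n + T D_n
    ratio = np.sum(np.abs(residual) ** 2) / np.sum(np.abs(c) ** 2)
    return 10.0 * np.log10(ratio)
```

Summing the energies before taking the ratio, rather than averaging per-snapshot ratios, means that louder snapshots contribute more to the result, which is what (6.41) specifies.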

Data Analysis

We first investigate the dimensionality of the combined noise field over the four
control regions by observing the eigenvalues of the estimated covariance matrix of
the spherical harmonic coefficients. We normalize the eigenvalues and sort them
from largest to smallest; the results for pure engine noise and for the noise
when driving at 100 km/h are shown in Table 6.3. We can see from Table 6.3 that
the eigenvalues of the engine noise are almost always smaller than the corresponding
eigenvalues of the freeway driving condition (100 km/h). In the case of engine noise,
the fourth eigenvalue is on the order of 0.01 for most frequencies; therefore the
noise field may be modelled using 3 noise modes in (6.25) without significant loss of
accuracy. As a result, in order to effectively cancel the engine noise over the four
control regions simultaneously, a minimum of 3 loudspeakers would be sufficient,
assuming that the loudspeaker channels have sufficient diversity.
On the other hand, the noise field of the freeway driving condition is more
complicated: the fourth eigenvalues are above 0.01 for all frequencies above 40 Hz.
Therefore at least four independent loudspeakers are required to effectively cancel
the noise within the control regions simultaneously.
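As a small illustration of this reasoning, the number of significant noise modes can be counted by comparing the normalized eigenvalues against a cut-off. The function name is hypothetical and the strict threshold of 0.01 is an assumption made for this sketch; in practice the cut-off is a judgment call, and the text treats values on the order of 0.01 as negligible. The eigenvalues used below are taken from Table 6.3.

```python
import numpy as np

def significant_modes(normalized_eigvals, threshold=0.01):
    """Count normalized eigenvalues that exceed the chosen cut-off."""
    vals = np.asarray(normalized_eigvals, dtype=float)
    return int(np.count_nonzero(vals > threshold))

# Normalized eigenvalues from Table 6.3.
engine_40hz = [1.000, 0.033, 0.005, 0.000]
freeway_80hz = [1.000, 0.282, 0.207, 0.139]

engine_modes = significant_modes(engine_40hz)    # 2 modes above the cut-off
freeway_modes = significant_modes(freeway_80hz)  # 4 modes above the cut-off
```

Under this cut-off the engine noise at 40 Hz has two significant modes, which a stereo system could in principle address, whereas the freeway noise at 80 Hz has four.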
Since the car’s loudspeakers can only play stereo signals, and the combined
noise fields require no fewer than 4 independent loudspeaker channels for effective
control, we do not expect a high noise energy attenuation over 3 or 4 seats. However,
we expect the loudspeakers to simultaneously cancel the noise over two control
regions with good results. In order to validate our expectations, we use (6.41)
to calculate the expected noise attenuation for simultaneous noise cancellation at
2, 3 and 4 seats; the results are shown in Figs. 6.12-6.15. The noise cancellation
performance for the two front seats only is shown in Fig. 6.12. The attenuations are
calculated for frequencies from 40 Hz to 200 Hz, and for driving speeds of 60 km/h,
80 km/h and 100 km/h. The attenuation for the engine noise is also included in the
figure. We can see from Fig. 6.12 that the attenuation for all three driving speeds
is very similar. The residual noise level is highest at 40 Hz, and gradually reduces
to around -40 dB for all three driving speeds. The engine noise, on the other hand,
can be effectively cancelled at most frequency bins. We believe this is because of
the low dimensionality of the engine noise field, as shown in Table 6.3.

[Plot: Average Attenuation (dB) versus Frequency (Hz), 40–200 Hz; curves: 60km/h, 80km/h, 100km/h, Engine Noise]
Figure 6.12: Expected noise power attenuation after noise cancellation in the two
front seats only.

Since we are only considering the 0th order coefficients in our calculations, while
ignoring the other coefficients which contribute to approximately 1 percent of total
noise energy, the upper bound of actual achievable attenuation would be around 20
dB, depending on the loudspeakers’ ability to attenuate the higher order coefficients.
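The 20 dB figure follows from expressing the neglected energy fraction in decibels: if a fraction p of the noise energy is left uncontrolled, the residual cannot fall below 10 log10(p) dB relative to the original level. A short sketch of this arithmetic is given below; the function name is hypothetical, and the fractions are the ones quoted in the text (about 1 percent in general, 1 − 0.972 at 200 Hz and 1 − 0.993 at 100 Hz).

```python
import math

def attenuation_floor_db(neglected_fraction):
    """Residual level in dB if only the neglected energy remains."""
    return 10.0 * math.log10(neglected_fraction)

floor_1_percent = attenuation_floor_db(0.01)       # about -20 dB
floor_200hz = attenuation_floor_db(1.0 - 0.972)    # about -15.5 dB
floor_100hz = attenuation_floor_db(1.0 - 0.993)    # about -21.5 dB
```

These values bound the attenuation only under the assumption that the higher order components are left entirely uncancelled; any incidental attenuation of those components by the loudspeakers would lower the floor.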

Figure 6.13 shows the results for simultaneous noise control for the two right
side seats. A trend similar to that in Fig. 6.12 can be observed. We believe that
the increase in attenuation with frequency is due to the impact of wavelength on the
loudspeaker channels: at low frequencies, the sound pressure at two different seats
due to one particular loudspeaker is very similar. Therefore the loudspeaker channel
matrix is highly coupled at low frequencies, resulting in less noise attenuation
under the same output power constraint.

[Plot: Average Attenuation (dB) versus Frequency (Hz), 40–200 Hz; curves: 60km/h, 80km/h, 100km/h, Engine Noise]
Figure 6.13: Expected noise power attenuation after noise cancellation in the two
right side seats only.

Figure 6.14 illustrates the expected ANC performance for simultaneous 3-seat noise
control (two front seats and left passenger seat). As expected, the noise energy
reduction is significantly worse than in the two-seat cases, with around 10 dB
reduction across all frequency bins of interest. We also notice that the engine noise
is no longer easier to cancel than the other noise fields, apart from a few frequency
bands (40-60 Hz). This is consistent with Table 6.3, where the third and fourth
eigenvalues of engine noise at 40 Hz are very small, indicating a sparse noise field
with 2 degrees of freedom; therefore the noise field can be controlled by a stereo
system. We also include Fig. 6.15, which depicts the four-seat ANC performance.
Compared to Fig. 6.14, the attenuation is even smaller, at around 6-7 dB. However,
the ANC performance is once again consistent over different driving speeds. From this
observation, we estimate that the noise fields at different driving speeds are
similar, and that a loudspeaker array’s capability of controlling in-car noise does
not vary greatly across driving speeds.
The attenuation of the engine noise is often lower than that of the noise fields under
various driving conditions. However, from our subjective tests, the majority of the
noise in the car cabin came from the tires and suspension, and the engine noise
plays only a small part in the overall perceived noise. Therefore, it is
understandable that the overall noise reduction is different from the engine noise
suppression under the same conditions.

[Plot: Average Attenuation (dB) versus Frequency (Hz), 40–200 Hz; curves: 60km/h, 80km/h, 100km/h, Engine Noise]
Figure 6.14: Expected noise power attenuation after noise cancellation in the two
front seats and the left passenger seat.

We would like to point out that the analysis of multiple-seat ANC performance
is limited to 200 Hz and 0th order only because of the limitations in the hardware
setup, more specifically, the lack of synchronized higher order microphones. In order
to obtain the analysis results for higher frequency and/or larger control regions,
it is necessary to replace the omni-directional microphones that are used in this
experiment with suitable higher order microphones, so that sound field components
of higher orders can be captured.

6.4.7 Observations and insights

In general, we can conclude that the integrated loudspeakers, when used as a stereo
system, are capable of cancelling the in-car noise field at the head position of a
single seat for frequencies up to 500 Hz, or of simultaneously cancelling the noise
fields at two seats for frequencies up to 200 Hz. It can be seen that when the number
of control regions exceeds the number of loudspeaker channels, the ANC performance
reduces significantly. In order to control the noise over more regions or at higher
frequencies, additional independent loudspeakers are required. We expect the
multi-zone ANC performance of the four integrated loudspeakers to improve
significantly if they could be driven separately.

[Plot: Average Attenuation (dB) versus Frequency (Hz), 40–200 Hz; curves: 60km/h, 80km/h, 100km/h, Engine Noise]
Figure 6.15: Expected noise power attenuation after noise cancellation in all four
seats.

6.5 Summary
In this chapter, we proposed a method to enhance the sound field reproduction
quality over a large region by prioritizing the reproduction in some smaller
sub-zones. This method improves the sound field reproduction accuracy in the smaller
sub-zones at the cost of slightly worse overall reproduction accuracy. This method
is especially useful when an insufficient number of loudspeakers is available for
sound field reproduction.
We also proposed a new metric for measuring the average noise level within a region.
It is shown that this metric is more robust and accurate than the commonly used
method in which the noise level is determined by averaging the noise pressure measured
by microphones.
This metric is then utilized to develop a method to estimate the potential
performance of a spatial ANC system. We use this method to evaluate the in-car
loudspeakers’ capability of cancelling the in-car noise at the passengers’ head
positions; it was shown that the loudspeakers are capable of attenuating the noise
level at lower frequencies for the given region of interest.

6.6 Related Publications

This chapter’s work has been published in the following conference proceedings [115]
[123] [124] [125]:

H. Chen, P. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Spatial noise
cancellation inside cars: Performance analysis and experimental results,” in
Proc. 2015 IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics (WASPAA), Oct 2015, pp. 1–5.

H. Chen, P. Samarasinghe, and T. D. Abhayapala, “In-car noise field analysis
and multi-zone noise cancellation quality estimation,” in Proc. 2015 Asia-
Pacific Signal and Information Processing Association Annual Summit and
Conference (APSIPA), Dec 2015, pp. 773–778.

H. Chen, J. Zhang, P. N. Samarasinghe, and T. D. Abhayapala, “Evaluation
of spatial active noise cancellation performance using spherical harmonic
analysis,” in Proc. 2016 IEEE International Workshop on Acoustic Signal
Enhancement (IWAENC), Sept 2016, pp. 1–5.

H. Chen, T. D. Abhayapala, and W. Zhang, “Enhanced sound field reproduction
within prioritized control region,” in INTER-NOISE and NOISE-CON
Congress and Conference Proceedings 2014, vol. 249, no. 3, pp. 4055–4064,
Nov 2014.
Chapter 7

Spatial active noise cancellation system architectures

Overview: The adaptive filtering algorithm is a critical component of an ANC
system. For spatial ANC, we believe that incorporating spatial signal processing
techniques into the adaptive filtering algorithm would yield improved performance. In
this chapter, we propose an adaptive filtering algorithm developed for spatial ANC
applications. This algorithm is based on the existing multi-channel feed-forward
adaptive algorithm, but with the input signals transformed into the spherical harmonic
domain before entering the adaptive filtering process. Both frequency domain and
time domain architectures are proposed for the adaptive algorithm. We also implement
an experimental spatial ANC system using the proposed time domain adaptive
algorithm, and investigate the impact of loudspeaker number and placement on the
performance of the spatial ANC system.

7.1 Introduction
Active noise cancellation (ANC) over space has been a hot topic of research in
the last two decades. Typically, ANC systems targeting spatial noise reduction
are realized by Multi-Input Multi-Output (MIMO) systems [126] employing a feed-
forward or feedback control algorithm [6]. Most popular applications of such systems
include aircraft cabin noise reduction [127] and automobile noise reduction [7, 128].
The most widely used MIMO ANC systems employ a number of microphones,
placed within the spatial area where noise attenuation is desired. The algorithms
are designed to minimize the average noise level captured by the error microphones

in the least-mean-square sense [6], through playing back counter-noise signals from a
number of loudspeakers. Using these algorithms, the noise attenuation is only max-
imized at the positions of the error microphones, while the overall noise attenuation
quality within the region of interest cannot be guaranteed.

A spatial ANC algorithm utilizing circular harmonics analysis has been proposed in [31]. In this work, both the noise field and the secondary loudspeaker driving signals are converted into the circular harmonics domain, and it was shown
that this method reduces computational complexity of massive multichannel ANC
systems [31]. However, one drawback of this method is that the method requires
circular arrangement of the secondary loudspeaker array, and that the algorithm
only performs well in 2D sound fields. In [30] a feedback spatial ANC algorithm
was proposed, and it was shown that this algorithm has a faster convergence speed
than existing MIMO algorithms. Both of these algorithms were only validated in
simulations.

In this chapter, we propose an improved spatial feed-forward ANC algorithm.

It is formulated in terms of the spherical harmonic decomposition of the noise field
and requires the measurements from a secondary microphone array distributed over
the boundary of a spherical region of interest. After converting the noise signal into
the spherical harmonic domain, an additional weighting is applied to each spherical harmonic coefficient. The weighted coefficients are then used to update an adaptive filter through an LMS adaptive algorithm. We show that through the spherical harmonic transform
and the weighting process, the ANC algorithm is able to maximize the average noise
attenuation over the entire region.

We develop the algorithm in the frequency domain, and since most ANC al-
gorithms are implemented in the time domain, we also present the time domain
equivalent of the algorithm, which can be realized through time domain filtering of
microphone signals. In order to validate the proposed algorithm, a prototype
spatial ANC system is built inside our laboratory. The system is used to investigate
the spatial ANC performance of the proposed algorithm under various hardware
configurations.

7.2 Background theory

7.2.1 Time domain multi-channel feed-forward ANC architecture
The time domain FxLMS algorithm, which is the most popular adaptation algorithm in ANC applications, is briefly described here.
The system diagram of the multi-channel Filtered-X LMS algorithm is shown in
Fig. 7.1. The wide arrows represent vectors of signals (acoustic or electrical). $H(z)$ represents the primary channel. $S(z)$ and $\hat{S}(z)$ represent the secondary channels from the loudspeakers to the error microphones and their estimates, respectively.
Let $n$ be the time index, and let the system consist of $U$ reference signals, $V$ secondary loudspeakers and $Z$ error microphones. The reference input signals are $x_u(n)$, $u = 1, \ldots, U$, and the instantaneous error microphone measurements are $e_z(n)$, $z = 1, \ldots, Z$. The primary noise field and the sound field due to the secondary loudspeakers at the error microphone positions are represented by $d_z(n)$ and $y'_z(n)$, respectively. The error signal in our system can be written as

\[ e_z(n) = d_z(n) + y'_z(n), \tag{7.1} \]

where

\[ y'_z(n) = \sum_{v} y_v(n) * s_{vz}, \quad \text{for } z = 1, 2, \ldots, Z, \tag{7.2} \]

$y_v(n)$ is the driving signal for the $v$th loudspeaker, $s_{vz}$ is the secondary channel between the $v$th loudspeaker and the $z$th error microphone, and “$*$” denotes linear convolution.
The secondary source driving signals in each iteration can be represented by

\[ y_v(n) = \sum_{u} \mathbf{w}_{uv}^T(n)\,\mathbf{x}_u(n), \quad \text{for } v = 1, 2, \ldots, V, \tag{7.3} \]

where $\mathbf{w}_{uv}(n) = [w_{uv,0}(n), w_{uv,1}(n), \ldots, w_{uv,L-1}(n)]^T$ are the adaptive filter coefficients in the $n$th iteration, $\mathbf{x}_u(n)$ is the vector of the latest $L$ samples of the $u$th reference signal, and $L$ is the length of the FIR adaptive filters.
The update equation of the multi-channel FxLMS algorithm is given by

\[ \mathbf{w}_{uv}(n+1) = \mathbf{w}_{uv}(n) - \mu \sum_{z} \mathbf{x}'_{uvz}(n)\, e_z(n), \quad \text{for } v = 1, 2, \ldots, V, \text{ and } u = 1, \ldots, U, \tag{7.4} \]

Figure 7.1: Block diagram of the time domain feedforward ANC system.

where $\mu$ is the step size, $\mathbf{x}'_{uvz}(n) = [x'_{uvz}(n), \ldots, x'_{uvz}(n-L+1)]^T$ is the vector of the latest $L$ filtered reference signals, and the filtered reference signals can be obtained by

\[ x'_{uvz}(n) = x_u(n) * \hat{s}_{vz}. \tag{7.5} \]

Filtering the reference signal $x_u(n)$ by the secondary channel estimate helps to improve the convergence speed of the adaptive algorithm, especially when the secondary path has a long delay [6].
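To make the data flow of (7.1)-(7.5) concrete, the following is a minimal NumPy sketch of the multi-channel FxLMS loop. It is an illustration under stated assumptions, not the implementation used in this thesis: the function name `fxlms`, the array layouts, the toy secondary paths and the step size are all hypothetical, the secondary path estimate is taken to be exact, and the primary noise is constructed so that perfect cancellation is attainable.

```python
import numpy as np

def fxlms(x, d, s, s_hat, L, mu):
    """Sketch of multi-channel FxLMS, (7.1)-(7.5).

    x: (U, T) reference signals, d: (Z, T) primary noise at the error mics,
    s, s_hat: (V, Z, M) true and estimated secondary path impulse responses.
    """
    U, T = x.shape
    V, Z, M = s.shape
    w = np.zeros((U, V, L))                 # adaptive FIR filters w_uv
    x_hist = np.zeros((U, max(L, M)))       # newest reference sample at index 0
    y_hist = np.zeros((V, M))               # newest loudspeaker sample at index 0
    fx_hist = np.zeros((U, V, Z, L))        # latest L filtered-reference samples
    e = np.zeros((Z, T))
    for n in range(T):
        x_hist = np.roll(x_hist, 1, axis=1)
        x_hist[:, 0] = x[:, n]
        y = np.einsum("uvl,ul->v", w, x_hist[:, :L])           # (7.3)
        y_hist = np.roll(y_hist, 1, axis=1)
        y_hist[:, 0] = y
        e[:, n] = d[:, n] + np.einsum("vzm,vm->z", s, y_hist)  # (7.1), (7.2)
        fx = np.einsum("vzm,um->uvz", s_hat, x_hist[:, :M])    # (7.5)
        fx_hist = np.roll(fx_hist, 1, axis=3)
        fx_hist[..., 0] = fx
        w -= mu * np.einsum("uvzl,z->uvl", fx_hist, e[:, n])   # (7.4), sum over z
    return e, w

# Toy scenario (all values are assumptions chosen for a well-conditioned demo).
rng = np.random.default_rng(0)
U, V, Z, L, T = 1, 2, 2, 16, 8000
x = rng.standard_normal((U, T))
s = np.zeros((V, Z, 3))
s[0, 0] = [0.0, 1.0, 0.3]
s[0, 1] = [0.0, 0.2, 0.1]
s[1, 0] = [0.0, 0.2, -0.1]
s[1, 1] = [0.0, 1.0, -0.3]
w_opt = 0.3 * rng.standard_normal((U, V, L))
# Primary noise built so that the filters w_opt cancel it exactly.
d = np.zeros((Z, T))
for z in range(Z):
    for v in range(V):
        for u in range(U):
            drive = np.convolve(x[u], w_opt[u, v])[:T]
            d[z] -= np.convolve(drive, s[v, z])[:T]

e, w = fxlms(x, d, s, s_hat=s.copy(), L=L, mu=0.005)
power_before = np.mean(d ** 2)
power_after = np.mean(e[:, -1000:] ** 2)
print(10 * np.log10(power_before / power_after), "dB reduction at the error mics")
```

In this sketch the step size is chosen well below the stability limit for the toy paths; with measured secondary paths it would have to be tuned to the filtered reference power.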

7.2.2 Frequency domain feed-forward ANC architecture

The ANC algorithm can be efficiently implemented by computing the time-domain linear convolutions using FFT techniques, which results in an equivalent system with much lower computational cost. A single-channel frequency
domain implementation has been introduced in [129]. The data flow is shown in
Fig. 7.2.
In the frequency domain implementation proposed in [129], the data is processed
frame by frame, rather than sample by sample as in the time domain architecture.
Assuming the frame size to be $N$, the frequency domain reference signal is calculated by taking the $2N$-point FFT of two consecutive input data frames, i.e.,

\[ \mathbf{X}(j+1) = \mathcal{F}_{2N} \begin{bmatrix} \mathbf{x}(j) \\ \mathbf{x}(j+1) \end{bmatrix}, \tag{7.6} \]

Figure 7.2: Block diagram of the Frequency Domain feedforward ANC system.

where $\mathcal{F}_{2N}$ denotes the $2N$-point FFT, $j$ is the frame index, and $\mathbf{x}(j)$ and $\mathbf{x}(j+1)$ denote the previous and current frames of input data, respectively.
The cancelling signal is generated by filtering the reference signal with the adaptive filter $\mathbf{w}(j)$ in the frequency domain, and discarding the first $N$ samples of the IFFT output, which can be expressed as

\[ \mathbf{y}(j+1) = [\mathbf{O}_N \;\; \mathbf{I}_N]\, \mathcal{F}_{2N}^{-1} [\mathbf{X}(j+1) \otimes \mathbf{W}(j)], \tag{7.7} \]

where “$\otimes$” denotes element-wise multiplication of two vectors or matrices, $\mathbf{y}(j+1)$ is the latest frame of the secondary source driving signal, and $\mathbf{O}_N$ and $\mathbf{I}_N$ denote the $N$-by-$N$ zero matrix and identity matrix, respectively. $\mathbf{W}(j)$ is the $2N$-point FFT of the time domain adaptive filter $\mathbf{w}(j)$ with zero padding, expressed as

\[ \mathbf{W}(j) = \mathcal{F}_{2N} \begin{bmatrix} \mathbf{I}_N \\ \mathbf{O}_N \end{bmatrix} \mathbf{w}(j). \tag{7.8} \]
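The frame-based filtering in (7.6)-(7.8) is the overlap-save method. The short NumPy sketch below checks, for a fixed filter, that keeping the last $N$ samples of each $2N$-point IFFT reproduces ordinary linear convolution. The function name and the test signal are illustrative assumptions.

```python
import numpy as np

def overlap_save_filter(x, w, N):
    """Filter x with a length-N FIR filter w, frame by frame, via (7.6)-(7.8)."""
    assert len(w) == N and len(x) % N == 0
    W = np.fft.fft(np.concatenate([w, np.zeros(N)]))   # (7.8): zero-padded 2N-point FFT
    prev = np.zeros(N)                                 # frame before the first one is silent
    frames = []
    for start in range(0, len(x), N):
        cur = x[start:start + N]
        X = np.fft.fft(np.concatenate([prev, cur]))    # (7.6): previous and current frame
        y_2n = np.fft.ifft(X * W).real                 # circular convolution of length 2N
        frames.append(y_2n[N:])                        # (7.7): discard the first N samples
        prev = cur
    return np.concatenate(frames)

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(5 * N)
w = rng.standard_normal(N)
y_fft = overlap_save_filter(x, w, N)
y_ref = np.convolve(x, w)[:len(x)]                     # time domain linear convolution
print(np.max(np.abs(y_fft - y_ref)))
```

The first $N$ output samples of each IFFT are contaminated by circular wrap-around, which is why the selection matrix $[\mathbf{O}_N \;\; \mathbf{I}_N]$ discards them.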

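The filtering of the reference signal with the secondary path estimate is implemented in a similar manner. Denoting the estimated secondary path impulse response of length $N$ as $\hat{\mathbf{s}}$, the frequency domain filtered reference signal $\mathbf{X}'(j+1)$ can be calculated as

\[ \mathbf{X}'(j+1) = \mathcal{F}_{2N} \begin{bmatrix} \mathbf{x}(j) \\ \mathbf{x}(j+1) \end{bmatrix} \otimes \hat{\mathbf{S}}, \tag{7.9} \]

where $\hat{\mathbf{S}}$ is the $2N$-point FFT of $\hat{\mathbf{s}}$,

\[ \hat{\mathbf{S}} = \mathcal{F}_{2N} \begin{bmatrix} \mathbf{I}_N \\ \mathbf{O}_N \end{bmatrix} \hat{\mathbf{s}}. \tag{7.10} \]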

The adaptive filter $\mathbf{w}(j)$ is implemented in the time domain, in the form of a vector of length $N$. The filter is updated when a new frame of reference signal and error signal is available, and the update equation can be written as

\[ \mathbf{w}(j+1) = \mathbf{w}(j) + 2\mu\,\mathbf{c}(j+1), \tag{7.11} \]

where $\mu$ represents the step size, and

\[ \mathbf{c}(j+1) = [\mathbf{O}_N \;\; \mathbf{T}_N]\, \mathcal{F}_{2N}^{-1} [\mathbf{X}'(j+1) \otimes \mathbf{E}(j+1)], \tag{7.12} \]

with $\mathbf{T}_N$ being a time reversal matrix, with its secondary diagonal equal to 1, and other entries equal to 0. The frequency domain error signal $\mathbf{E}(j+1)$ is obtained by taking the $2N$-point FFT of the latest frame of the time domain error signal, expressed as

\[ \mathbf{E}(j+1) = \mathcal{F}_{2N} \begin{bmatrix} \mathbf{I}_N \\ \mathbf{O}_N \end{bmatrix} \mathbf{e}(j+1). \tag{7.13} \]

The algorithm is guaranteed to converge if [129]

\[ 0 \le \mu \le \frac{1}{N \lambda_{\max}}, \tag{7.14} \]

where $\lambda_{\max}$ is the maximum eigenvalue of the autocorrelation matrix of the input signal, and

\[ E[\mathbf{X}'^{T}(k)\,\mathbf{X}'(j)] = 0, \quad k \neq j. \tag{7.15} \]

The single channel frequency domain implementation can be easily extended to a multi-channel system, with $U$ reference sources, $V$ secondary sources and $Z$ error microphones. In this case, the size of the data matrices would be $2N \times U$ for $\mathbf{X}(k)$, $2N \times U \times V \times Z$ for $\mathbf{X}'(k)$, $2N \times Z$ for $\mathbf{E}(k)$, and $N \times U \times V$ for $\mathbf{w}(k)$. The adaptive filter $\mathbf{w}(k)$ needs to be updated for each $u$ and $v$, i.e.,

\[ \mathbf{w}_{uv}(j+1) = \mathbf{w}_{uv}(j) + 2\mu \sum_{z=1}^{Z} \mathbf{c}_{uvz}(j+1), \tag{7.16} \]

where

\[ \mathbf{c}_{uvz}(j+1) = [\mathbf{O}_N \;\; \mathbf{T}_N]\, \mathcal{F}_{2N}^{-1} [\mathbf{X}'_{uvz}(j+1) \otimes \mathbf{E}_z(j+1)], \tag{7.17} \]

and the driving signal for each loudspeaker is generated as

\[ \mathbf{y}_v(j+1) = \sum_{u=1}^{U} [\mathbf{O}_N \;\; \mathbf{I}_N]\, \mathcal{F}_{2N}^{-1} [\mathbf{X}_u(j+1) \otimes \mathbf{W}_{uv}(j)]. \tag{7.18} \]

It can be seen that the computational complexity grows quickly as the number of reference signals, loudspeakers and error microphones increases.

7.3 Frequency domain feed-forward architecture for spatial ANC systems

7.3.1 Existing spatial ANC system based on circular harmonic transform

Spatial ANC systems using circular harmonic transform have been proposed in [31].
The overall data flow in this method is similar to the multi-channel ANC algorithm.
However, in this method, the error microphones form a circular array, which is
surrounded by the secondary loudspeaker array, also taking a circular geometry.
The reference microphones surround the loudspeaker array, and form a third circular
array. The extensive use of circular array geometries allows for transforming both
the reference and error signals into circular harmonic coefficients; in addition, the
secondary loudspeaker channels $H_v(k)$ are also transformed into the circular harmonic
domain, under the assumption that the loudspeakers are point sources.

In this method, the filtered reference signals are generated by filtering the refer-
ence circular harmonic coefficients through the secondary channel circular harmonic
coefficients, therefore $\mathbf{X}'(k)$ is also in the circular harmonic domain. Furthermore, since the error signals are also transformed into circular harmonics, the adaptive algorithm, which takes the same form as in the MIMO adaptive algorithm, operates in the circular harmonic domain, and updates the adaptive filter $\mathbf{W}(k)$, which contains circular harmonic coefficients that mimic the primary channel.

The loudspeaker driving signals are first generated by filtering the reference coef-
ficients through the adaptive filter, also in the form of circular harmonic coefficients.
Then, an inverse circular harmonic transform maps the coefficients to each individual
loudspeaker, and produces the final output signal for each speaker.

Due to the use of the circular harmonic transform, this method is able to attenuate the
noise within the 2D space covered by the error microphone array, while significantly
reducing the computational complexity, compared to a multi-channel algorithm us-
ing the same loudspeaker-microphone setup [31]. However, one disadvantage of this
method is that in order to express the secondary sources using circular harmonics,
the loudspeakers have to be arranged as a circular array.

7.3.2 Proposed spatial ANC system based on spherical harmonic transform

The adaptive algorithm proposed in [31] can be extended to 3D space by replacing the circular harmonic transforms with spherical harmonic transforms. However,
this would require a hardware setup consisting of three concentric spherical arrays
surrounding the quiet zone, which makes deployment difficult, especially for real-life
applications.
In this section, we propose an alternative adaptive algorithm based on spherical
harmonic transform. In order to maintain the flexibility of the MIMO adaptive
algorithm, we only require that the error microphone array has the capability of
capturing the 3D spatial sound field in the form of spherical harmonics, but do not require any specific geometry for the reference microphones and secondary loudspeakers.
Since no spherical reference microphone array is used, we do not perform the spherical harmonic transform on the reference signal. However, both the error signal and the secondary loudspeaker channels can be transformed into the spherical harmonic domain. Consequently, the filtered reference signal $\mathbf{X}'(k)$ can also be expressed as spherical harmonic coefficients. This allows the adaptive algorithm to operate in the spherical harmonic domain, since all of its inputs are spherical harmonic coefficients. The inverse transform to derive loudspeaker driving signals can be omitted, since no spherical loudspeaker array geometry is used.
Figure 7.3 illustrates the structure of the proposed frequency domain spatial
ANC architecture. Compared to Fig. 7.2, it can be seen that the structure of the
spatial ANC system is similar to that of the multi-channel system, except that
the filtered reference signals and the error signals are transformed into spherical
harmonic domain before being passed to the adaptive filter.
The spherical harmonics transform operates on each frequency bin of all the error microphone signals. The operation may be expressed in matrix form as

\[ \boldsymbol{\beta}(k, j) = \mathbf{T}(k)\,\mathbf{E}(k, j), \tag{7.19} \]

where $\boldsymbol{\beta}(k, j)$ is a vector of length $(L+1)^2$, containing all the spherical harmonic coefficients for the error signals at frequency $k$ and frame index $j$, and $\mathbf{E}(k, j)$ is a vector of length $Z$, containing the signals of all error microphones at frequency $k$ and frame index $j$. $\mathbf{T}(k)$ is the transformation matrix specific for the frequency bin
and microphone array geometry.

Figure 7.3: Block diagram of the frequency domain feedforward spatial ANC system.

For a uniform spherical error microphone array of radius $R$, $\mathbf{T}(k)$ takes the form

\[
\mathbf{T}(k) =
\begin{bmatrix}
\dfrac{Y^*_{00}(\theta_1,\phi_1)}{j_0(kR)} & \dfrac{Y^*_{00}(\theta_2,\phi_2)}{j_0(kR)} & \cdots & \dfrac{Y^*_{00}(\theta_Z,\phi_Z)}{j_0(kR)} \\
\dfrac{Y^*_{11}(\theta_1,\phi_1)}{j_1(kR)} & \dfrac{Y^*_{11}(\theta_2,\phi_2)}{j_1(kR)} & \cdots & \dfrac{Y^*_{11}(\theta_Z,\phi_Z)}{j_1(kR)} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{Y^*_{LL}(\theta_1,\phi_1)}{j_L(kR)} & \dfrac{Y^*_{LL}(\theta_2,\phi_2)}{j_L(kR)} & \cdots & \dfrac{Y^*_{LL}(\theta_Z,\phi_Z)}{j_L(kR)}
\end{bmatrix}. \tag{7.20}
\]

The spherical harmonics transform for the filtered reference signals can be defined similarly as

\[ \boldsymbol{\alpha}_{uv}(k, j) = \mathbf{T}(k)\,\mathbf{X}'_{uv}(k, j), \tag{7.21} \]

where $\boldsymbol{\alpha}_{uv}(k, j)$ are the spherical harmonic coefficients of reference signal $u$ filtered through the channel responses of secondary source $v$.
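As an illustration of how the matrix in (7.20) might be assembled, the sketch below builds $\mathbf{T}(k)$ up to first order and applies it to one frequency bin of the error signals as in (7.19). Several choices here are assumptions made for the example rather than statements about the thesis implementation: the closed-form first-order spherical harmonics with the usual orthonormal convention, the row ordering $(0,0), (1,1), (1,0), (1,-1)$, the speed of sound of 343 m/s, and the six-microphone geometry and 14.5 cm radius borrowed from Section 7.5.1.

```python
import numpy as np

def spherical_bessel_01(x):
    """Closed forms of j_0(x) and j_1(x), valid for x > 0."""
    j0 = np.sin(x) / x
    j1 = np.sin(x) / x**2 - np.cos(x) / x
    return j0, j1

def complex_sh_order1(theta, phi):
    """Complex spherical harmonics up to order 1 (theta: polar, phi: azimuth).
    Rows are ordered (0,0), (1,1), (1,0), (1,-1); this ordering is an assumption."""
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    y00 = np.full(theta.shape, 0.5 / np.sqrt(np.pi), dtype=complex)
    y11 = -0.5 * np.sqrt(1.5 / np.pi) * np.sin(theta) * np.exp(1j * phi)
    y10 = 0.5 * np.sqrt(3.0 / np.pi) * np.cos(theta) + 0j
    y1m1 = 0.5 * np.sqrt(1.5 / np.pi) * np.sin(theta) * np.exp(-1j * phi)
    return np.vstack([y00, y11, y10, y1m1])

def transform_matrix(k, R, theta, phi):
    """T(k) of (7.20) for L = 1: conj(Y_lm(theta_z, phi_z)) / j_l(kR)."""
    Y = complex_sh_order1(theta, phi)            # shape (4, Z)
    j0, j1 = spherical_bessel_01(k * R)
    radial = np.array([j0, j1, j1, j1])          # one j_l per (l, m) row
    return np.conj(Y) / radial[:, None]

c, f, R = 343.0, 300.0, 0.145                    # assumed speed of sound, test frequency, radius
k = 2 * np.pi * f / c
theta = np.deg2rad([60.0, 60.0, 60.0, 120.0, 120.0, 120.0])
phi = np.deg2rad([0.0, 120.0, 240.0, 60.0, 180.0, 300.0])
T_k = transform_matrix(k, R, theta, phi)

E_bin = np.ones(6, dtype=complex)                # hypothetical error spectrum at this bin
beta = T_k @ E_bin                               # (7.19)
print(T_k.shape, beta.shape)
```

As in (7.20), no quadrature weights are included, so the result is only defined up to the scaling implied by the array sampling scheme.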
For a microphone array suitable for spherical harmonics analysis, the transformation of the signal to the spherical harmonic domain changes the number of channels from $Z$ to $(L+1)^2$, and since $Z \ge (L+1)^2$ when no spatial aliasing occurs, the transform reduces the complexity of the adaptive algorithm by a factor of approximately $(L+1)^2/Z$.
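As a quick numerical illustration of this channel reduction, the following sketch counts the adaptive algorithm input channels before and after the transform. The helper names and the $(L, Z)$ pairs are illustrative assumptions; only the first-order, six-microphone case corresponds to the experimental array described in Section 7.5.

```python
def sh_channel_count(order):
    """Number of spherical harmonic coefficients up to a given order, (L + 1)^2."""
    return (order + 1) ** 2

def complexity_ratio(order, num_mics):
    """Approximate complexity factor (L + 1)^2 / Z after the transform."""
    if num_mics < sh_channel_count(order):
        raise ValueError("Z must be at least (L + 1)^2 to avoid spatial aliasing")
    return sh_channel_count(order) / num_mics

# Hypothetical array sizes, apart from (1, 6).
for order, num_mics in [(1, 6), (2, 12), (4, 32)]:
    print(order, num_mics, sh_channel_count(order),
          round(complexity_ratio(order, num_mics), 3))
```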
The Least-Mean-Square algorithm employed in the system aims to minimize the mean square of all the error inputs. For a multi-channel ANC system without spherical harmonics transformation, the optimization goal is

\[ \min\Big\{ \sum_{z} |E_z(k, j)|^2 \Big\}, \tag{7.22} \]

which minimizes the average signal energy at the error microphones. After applying the spherical harmonic transform to both $\mathbf{X}'(k)$ and $\mathbf{E}(k)$, the LMS minimization criterion becomes

\[ \min\Big\{ \sum_{l} \sum_{m} |\beta_{lm}(k, j)|^2 \Big\}, \tag{7.23} \]

which approximately reduces the noise level within a sphere, rather than at a finite number of points. However, in order to achieve minimum residual acoustic potential energy, an additional weighting needs to be applied to each of the obtained spherical harmonic coefficients, such that

\[ \min\Big\{ \sum_{l} \sum_{m} |W_l\, \beta_{lm}(k, j)|^2 \Big\} = \min\Big\{ \int_{S} P(r, \theta, \phi, k)\, dS \Big\}, \tag{7.24} \]
where the expression of the weights $W_l$ is given in (6.20). The same weighting needs to be applied to both the filtered reference signal coefficients and the error signal coefficients, as shown in Fig. 7.3. The weighting procedure can be combined with the spherical harmonics transform, and the weighted transform matrix can be expressed as

\[ \mathbf{T}_W(k) = \mathbf{W}(k)\,\mathbf{T}(k), \tag{7.25} \]

where $\mathbf{W}(k)$ is a $(L+1)^2$-by-$(L+1)^2$ diagonal matrix, with its diagonal elements arranged as $\mathrm{diag}\{\mathbf{W}(k)\} = [W_0(k), W_1(k), W_1(k), \ldots, W_L(k)]^T$.

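The adaptive filter $\mathbf{w}_{uv}$ for the $u$th reference signal and $v$th secondary source is updated according to

\[ \mathbf{w}_{uv}(j+1) = \mathbf{w}_{uv}(j) + 2\mu \sum_{l,m} \mathbf{c}^{uv}_{lm}(j+1), \tag{7.26} \]

where $\mathbf{c}^{uv}_{lm}(j+1)$ is generated by taking the IFFT of the product of $\alpha^{uv}_{lm}(k, j+1)$ and $\beta_{lm}(k, j+1)$ for all $k$,

\[ \mathbf{c}^{uv}_{lm}(j+1) = [\mathbf{O}_N \;\; \mathbf{T}_N]\, \mathcal{F}_{2N}^{-1} [\boldsymbol{\alpha}^{uv}_{lm}(j+1) \otimes \boldsymbol{\beta}_{lm}(j+1)]. \tag{7.27} \]

The driving signal for the $v$th secondary source is generated as

\[ \mathbf{y}_v(j+1) = \sum_{u} [\mathbf{O}_N \;\; \mathbf{I}_N]\, \mathcal{F}_{2N}^{-1} [\mathbf{X}_u(j+1) \otimes \mathbf{W}_{uv}(j)], \tag{7.28} \]

where $\mathbf{W}_{uv}(j)$ is the FFT of $\mathbf{w}_{uv}(j)$ using (7.8).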

It can be seen that due to the frame-based data processing scheme, generation of the secondary driving signals as well as updating of the adaptive filter are delayed by at least $N$ samples, which is undesirable for ANC applications. One way to reduce the delay of the secondary path signal is to implement (7.28) in the time domain, i.e., to use time domain convolution to generate the secondary signals [129]. In this way, the latency of the secondary path signal can be reduced to the same level as that of time domain ANC algorithms, at the cost of computational efficiency. However, updating of the adaptive filter still needs to be processed frame by frame, therefore the updating latency of the adaptive filter is always higher than in time domain implementations.

7.4 Time domain feed-forward architecture for spatial ANC systems
Although the frequency domain implementation is straightforward and computationally efficient due to its use of FFT, it is sometimes desirable to implement ANC
systems in the time domain for lower latency and easier realization in existing em-
bedded systems. In this section, we propose a time domain spatial ANC architecture
based on spherical harmonics analysis.

7.4.1 Time domain spherical harmonics representation of sound field
The sound pressure on the surface of a spherical region of radius $r$ at a certain frequency may be represented using spherical harmonics as

\[ P(r, \theta, \phi, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \tau_{lm}(k, r)\, Y_{lm}(\theta, \phi). \tag{7.29} \]

By comparing (7.29) with (2.1), it can be seen that $\tau_{lm}(k, r)$ is related to the commonly used spherical harmonic coefficients by

\[ \tau_{lm}(k, r) = C_{lm}(k)\, j_l(kr). \tag{7.30} \]

Taking the inverse Fourier transform of (7.29), we have

\[ p(r, \theta, \phi, n) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \hat{\tau}_{lm}(n, r)\, Y_{lm}(\theta, \phi), \tag{7.31} \]

where $p(r, \theta, \phi, n)$ is the sound pressure at a discrete time index $n$, and $\hat{\tau}_{lm}(n, r)$ is a set of coefficients defined in the time domain. The physical meaning of (7.31)
is that the sound pressure on the surface of a spherical region at a certain time
instant can be represented by an infinite summation of spherical harmonics. We note
that p(r, θ, φ, n) is the sum of spatial sound of all frequencies, and since the higher
frequency sound has shorter wave length and hence more complicated pressure field,
it is necessary to use the higher order spherical harmonics to represent the higher
frequency spatial sound components.
However, since for a given frequency, only a finite number of spherical harmonics
are needed to represent the sound field [53], if the sound signal is band limited, the
infinite summation in (7.31) may be truncated to a finite summation up to order L
given by (2.7). When sampling the sound field using a finite number of microphones,
in order to avoid spatial aliasing, it is necessary to low pass filter the input signal
before applying the time domain spherical harmonic transform, so that the higher
order spherical harmonics associated with higher frequency sound components would
be removed from the input signal.
Rearranging (7.30) and taking its inverse Fourier transform, we have

\[ \hat{C}_{lm}(n) = \hat{\tau}_{lm}(n, r) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\}, \tag{7.32} \]

where $\hat{C}_{lm}(n) = \mathcal{F}^{-1}\{C_{lm}\}$. From (7.32), it can be seen that the time domain spherical harmonic coefficients $\hat{C}_{lm}(n)$ can be obtained by filtering the corresponding $\hat{\tau}_{lm}(n, r)$ with a filter whose frequency response is equal to $1/j_l(kr)$.
Since time domain sound pressure signals are real-valued, it is sufficient to use real-valued spherical harmonics (2.32) as the basis functions. The sound pressure at a certain location $(r', \theta', \phi')$ and time index $n$ can be obtained by

\[ p(r', \theta', \phi', n) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \hat{C}_{lm}(n) * \mathcal{F}^{-1}\{ j_l(kr') \}\, Y^R_{lm}(\theta', \phi'). \tag{7.33} \]

In order to realize the minimization criterion (7.24), an additional filter is required to obtain the weighted coefficients. This can be expressed as

\[ \tilde{C}_{lm}(n) = \hat{C}_{lm}(n) * \mathcal{F}^{-1}\{ W_l(k) \}. \tag{7.34} \]

We note that the filters corresponding to $1/j_l(kr)$ and $W_l(k)$ can be designed such that their frequency responses are accurate only within the frequency band of interest, so as to reduce the difficulty and complexity of the filter design problem.
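One possible way to obtain such a band-limited filter is frequency sampling, sketched below for the $1/j_l(kr)$ response. This is only an illustration and not necessarily the design method used in this work: the function names, the 4096-tap length, the brick-wall band edges and the speed of sound of 343 m/s are assumptions, and a practical design would likely use a smoother transition band or a least-squares fit to limit ringing.

```python
import numpy as np

def spherical_bessel(l, x):
    """j_0 and j_1 in closed form (x > 0); higher orders are omitted in this sketch."""
    if l == 0:
        return np.sin(x) / x
    if l == 1:
        return np.sin(x) / x**2 - np.cos(x) / x
    raise ValueError("only orders 0 and 1 are implemented in this sketch")

def inverse_radial_fir(l, r, fs, n_taps, f_lo, f_hi, c=343.0):
    """Linear-phase FIR whose magnitude matches 1/j_l(kr) on the FFT grid
    inside [f_lo, f_hi] and is zero elsewhere (frequency sampling design)."""
    freqs = np.fft.rfftfreq(n_taps, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    H = np.zeros(freqs.shape)
    H[band] = 1.0 / spherical_bessel(l, 2 * np.pi * freqs[band] / c * r)
    h = np.fft.irfft(H, n=n_taps)        # zero-phase (circularly symmetric) response
    h = np.roll(h, n_taps // 2)          # shift by n_taps/2 samples to make it causal
    return h, freqs, band

fs, n_taps, r = 44100, 4096, 0.145
h1, freqs, band = inverse_radial_fir(1, r, fs, n_taps, 200.0, 500.0)
H1 = np.fft.rfft(h1)
target = 1.0 / spherical_bessel(1, 2 * np.pi * freqs[band] / 343.0 * r)
print(np.max(np.abs(np.abs(H1[band]) - target)))
```

The circular shift adds a pure delay of half the filter length, which the same filter applied to both the filtered reference and the error path would share.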

7.4.2 Spatial ANC architecture using time domain spherical harmonics analysis

We propose a time domain ANC system based on the time domain multichannel
ANC architecture and the time domain spherical harmonic analysis techniques. The
system structure is illustrated in Fig. 7.4.

Figure 7.4: Block diagram of the time domain feedforward spatial ANC system.

In the proposed system, the filtered reference signals $x'_{uvz}(n)$ are obtained by

\[ x'_{uvz}(n) = x_u(n) * \hat{s}_{vz}(n). \tag{7.35} \]

The filtered reference signal is then passed through a low pass filter, whose cutoff frequency equals the maximum operating frequency of the ANC system. The maximum frequency needs to agree with the capability of the error microphone array, i.e., the array should be able to capture the spatial sound field at this frequency without spatial aliasing.
For a uniform spherical error microphone array, the time domain coefficients $\hat{\tau}^{uv}_{lm}(n)$ of order $l$ and mode $m$, due to reference signal $u$ and secondary source $v$, are obtained by

\[ \hat{\tau}^{uv}_{lm}(n) = \sum_{z} x'_{uvz}(n)\, Y^R_{lm}(\theta_z, \phi_z), \tag{7.36} \]
z

where $(\theta_z, \phi_z)$ denote the angular position of the $z$th error microphone. For spherical microphone arrays that do not have a uniform spatial sampling scheme, the alternative option is to solve for the coefficients in a least-squares manner, which can be expressed by

\[ \hat{\boldsymbol{\tau}}^{uv}(n) = (\mathbf{Y}^R)^{-1}\, \mathbf{x}'_{uv}(n), \tag{7.37} \]

where $\hat{\boldsymbol{\tau}}^{uv}(n) = [\hat{\tau}^{uv}_{00}(n), \hat{\tau}^{uv}_{11}(n), \hat{\tau}^{uv}_{10}(n), \ldots]^T$ is the vector of all the coefficients $\hat{\tau}^{uv}_{lm}$ at time instant $n$, $\mathbf{x}'_{uv}(n) = [x'_{uv1}(n), x'_{uv2}(n), \ldots]^T$ is the vector containing the filtered reference signals for all the error microphones at time instant $n$, and $(\mathbf{Y}^R)^{-1}$ denotes the Moore-Penrose pseudo-inverse of the matrix $\mathbf{Y}^R$, which is given by

\[
\mathbf{Y}^R =
\begin{bmatrix}
Y^R_{00}(\theta_1, \phi_1) & Y^R_{11}(\theta_1, \phi_1) & Y^R_{10}(\theta_1, \phi_1) & \cdots \\
Y^R_{00}(\theta_2, \phi_2) & Y^R_{11}(\theta_2, \phi_2) & Y^R_{10}(\theta_2, \phi_2) & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
Y^R_{00}(\theta_Z, \phi_Z) & Y^R_{11}(\theta_Z, \phi_Z) & Y^R_{10}(\theta_Z, \phi_Z) & \cdots
\end{bmatrix}. \tag{7.38}
\]
The pseudo-inverse of the matrix $\mathbf{Y}^R$ can be obtained offline, therefore it does not increase the computational complexity of the algorithm.
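The offline computation in (7.37)-(7.38) can be sketched in a few lines of NumPy. The example below uses the six error microphone positions of Section 7.5.1 and first-order real spherical harmonics; the closed-form expressions, their normalization and the column ordering $(0,0), (1,1), (1,0), (1,-1)$ are assumptions standing in for the definition in (2.32).

```python
import numpy as np

def real_sh_order1(theta, phi):
    """Real spherical harmonics up to order 1, one row per microphone.
    Columns are ordered (0,0), (1,1), (1,0), (1,-1); orthonormal convention assumed."""
    a = np.sqrt(3.0 / (4.0 * np.pi))
    y00 = np.full_like(theta, 0.5 / np.sqrt(np.pi))
    y11 = a * np.sin(theta) * np.cos(phi)
    y10 = a * np.cos(theta)
    y1m1 = a * np.sin(theta) * np.sin(phi)
    return np.column_stack([y00, y11, y10, y1m1])

# Microphone angles (polar, azimuth) as listed in Section 7.5.1.
theta = np.deg2rad([60.0, 60.0, 60.0, 120.0, 120.0, 120.0])
phi = np.deg2rad([0.0, 120.0, 240.0, 60.0, 180.0, 300.0])

Y_R = real_sh_order1(theta, phi)        # (7.38), shape Z x (L+1)^2 = 6 x 4
Y_R_pinv = np.linalg.pinv(Y_R)          # computed once, offline

# Per-sample use of (7.37): a matrix-vector product with the stored pseudo-inverse.
tau_true = np.array([1.0, -0.5, 0.25, 2.0])   # hypothetical coefficients
x_mics = Y_R @ tau_true                        # noiseless microphone samples
tau_est = Y_R_pinv @ x_mics
print(tau_est)
```

For this geometry the four columns of $\mathbf{Y}^R$ are mutually orthogonal, so the first-order coefficients are recovered exactly from noiseless samples.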
For clarity, instead of using $\hat{C}_{lm}(n)$, we use $\hat{\alpha}^{uv}_{lm}(n)$ and $\hat{\beta}_{lm}(n)$ to represent the time domain spherical harmonic coefficients for the filtered reference signals and error signals, respectively. We note that both $\hat{\alpha}^{uv}_{lm}(n)$ and $\hat{\beta}_{lm}(n)$ are real-valued coefficients, due to the use of real-valued spherical harmonics. The time domain spherical harmonic coefficients can be obtained according to (7.32). Next, the weighted spherical harmonic coefficients $\tilde{\alpha}^{uv}_{lm}(n)$ are obtained by passing the coefficients $\hat{\alpha}^{uv}_{lm}(n)$ through the weighting filter according to (7.34).
As can be seen in Fig. 7.4, the error signals from the microphone array are processed in the same way as the filtered reference signal, and it is necessary that the same low pass filter is used for both the reference signal and the error signal. We denote the weighted error coefficients as $\tilde{\beta}_{lm}(n)$.
Since the inputs to the Least-Mean-Square algorithm are spherical harmonic coefficients, the LMS algorithm operates in the spherical harmonic domain. The adaptive filter bank $\mathbf{w}(n)$ has the size $U \times V \times L$. At each time instant, $\mathbf{w}(n)$ is updated using the equation

\[ \mathbf{w}_{uv}(n+1) = \mathbf{w}_{uv}(n) + \mu \sum_{l} \sum_{m} \tilde{\boldsymbol{\alpha}}^{uv}_{lm}(n) \otimes \tilde{\boldsymbol{\beta}}_{lm}(n), \tag{7.39} \]

where $\tilde{\boldsymbol{\alpha}}^{uv}_{lm}(n) = [\tilde{\alpha}^{uv}_{lm}(n), \tilde{\alpha}^{uv}_{lm}(n-1), \tilde{\alpha}^{uv}_{lm}(n-2), \ldots, \tilde{\alpha}^{uv}_{lm}(n-L+1)]$ is the vector of the latest $L$ samples of the reference signal coefficients, and $\tilde{\boldsymbol{\beta}}_{lm}(n) = [\tilde{\beta}_{lm}(n), \tilde{\beta}_{lm}(n-1), \tilde{\beta}_{lm}(n-2), \ldots, \tilde{\beta}_{lm}(n-L+1)]$ is the vector of the latest $L$ samples of the error coefficients.
The driving signal for the $v$th loudspeaker is generated in the same way as in the existing multi-channel algorithm, given by

\[ y_v(n) = \sum_{u} \mathbf{w}_{uv}(n)\, \mathbf{x}_u(n)^T. \tag{7.40} \]

The steps of filtering the reference signal and converting the filtered signal into the spherical harmonic domain may be simplified. Consider the equation

\begin{align}
\hat{\alpha}^{uv}_{lm}(n) &= \Big( \sum_{z} (x_u * \hat{s}_{vz})\, Y^R_{lm}(\theta_z, \phi_z) \Big) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\} \tag{7.41} \\
&= x_u * \Big( \sum_{z} \hat{s}_{vz}\, Y^R_{lm}(\theta_z, \phi_z) \Big) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\}. \tag{7.42}
\end{align}

We may define the spherical harmonic domain secondary channel impulse response as

\[ \hat{S}^{v}_{lm} \triangleq \sum_{z} \hat{s}_{vz}\, Y^R_{lm}(\theta_z, \phi_z) * \mathcal{F}^{-1}\Big\{ \frac{1}{j_l(kr)} \Big\}, \tag{7.43} \]

which essentially transforms the secondary channel impulse responses into the spherical harmonic domain, and $\hat{S}^{v}_{lm}$ represents the spherical harmonic impulse response of order $l$ and mode $m$, due to secondary source $v$. Since $\hat{S}^{v}_{lm}$ can be calculated offline without knowledge of the reference signal, the computational cost of obtaining $\hat{\alpha}^{uv}_{lm}(n)$ can be greatly reduced by directly filtering $x_u(n)$ by $\hat{S}^{v}_{lm}$. However, it should be noted that $x_u(n)$ needs to be low pass filtered first in order to avoid spatial aliasing.

Figure 7.5: Setup of the experimental ANC system.
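The equivalence of (7.41) and (7.42) follows from the linearity and associativity of convolution, and can be checked numerically. In the sketch below, random vectors stand in for the estimated secondary paths, for the values $Y^R_{lm}(\theta_z, \phi_z)$ of one $(l, m)$ pair, and for the impulse response of the $1/j_l(kr)$ filter; all of these are placeholders rather than measured data.

```python
import numpy as np

rng = np.random.default_rng(1)
Z, M, T = 6, 32, 200
x_u = rng.standard_normal(T)              # reference signal
s_hat = rng.standard_normal((Z, M))       # placeholder secondary path estimates s_vz
Y_lm = rng.standard_normal(Z)             # placeholder Y^R_lm(theta_z, phi_z)
g_l = rng.standard_normal(16)             # placeholder impulse response of 1/j_l(kr)

# (7.41): filter the reference through every path, combine, then apply the radial filter.
combined = sum(np.convolve(x_u, s_hat[z]) * Y_lm[z] for z in range(Z))
alpha_online = np.convolve(combined, g_l)

# (7.43): spherical harmonic domain secondary path, computed once offline.
S_lm = np.convolve(sum(s_hat[z] * Y_lm[z] for z in range(Z)), g_l)

# (7.42): a single convolution per (l, m) at run time.
alpha_offline = np.convolve(x_u, S_lm)

print(np.max(np.abs(alpha_online - alpha_offline)))
```

At run time the first form needs $Z + 1$ convolutions per $(l, m)$ pair, while the second needs only one.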

7.5 Experiment validation

7.5.1 System setup

An experimental ANC system is set up in the laboratory to validate the proposed
spatial ANC algorithm. Fig. 7.5 shows the hardware setup of the ANC system.
In this system, the ANC control region is defined as a spherical area of 29 cm
diameter, located at the center of a dodecahedron loudspeaker array system. Two
Tannoy 600 loudspeakers are used as the primary sources, and are placed 1.5 meters away from the center of the control region, outside of the dodecahedron array.
Although the dodecahedron array consists of 30 loudspeakers, only 6 of them are uti-
lized in this experiment as secondary loudspeakers, with their respective numbering
shown in Fig. 7.5.
In order to monitor the noise field within the control region, 6 omni-directional
microphones are placed on the boundary of the region. The angular positions of
the microphones are (60°, 0°), (60°, 120°), (60°, 240°), (120°, 60°), (120°, 180°) and (120°, 300°), respectively. An additional microphone is placed at the center of the
array, however this microphone is only used for monitoring purposes and is not part
of the ANC system.
The audio playback / record as well as real time signal processing are handled
by a desktop PC, with the ANC algorithm implemented using MATLAB R2016. The
proposed time domain feed-forward spatial ANC system is implemented; we also
implemented the time domain MIMO ANC algorithm for comparison. The spatial
ANC algorithm is implemented in a frame based manner, with a frame size of 384
samples at a sampling rate of 44100 Hz. At each frame time, the program receives
audio input from the microphones, performs the ANC algorithm and generates the
loudspeaker driving signals to be played during the next frame. The noise signal is
also generated by the program, in order to create a controlled experimental environ-
ment.
Normally, in a feed-forward ANC system, the reference noise is picked up by
a reference microphone / sensor, sometimes attached to the primary noise source.
The digital signal processing system then processes the reference noise and generates
the anti-noise signals while the noise sound propagates towards the control region.
Ideally, the anti-noise sound is played by the secondary speakers before the noise
reaches the control region. In order to achieve this, the signal processing latency
must be smaller than the propagation time of the noise from the reference microphone to the control region. Unfortunately, the signal path round trip latency of our system is more than 2000 samples or 45 ms due to the buffering in the computer’s data path, which means that the primary sources would need to be placed more than 15 meters away from the control region, should a reference microphone be used, which is not possible in our lab conditions¹.
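The figures quoted above follow from simple arithmetic, reproduced in the sketch below. The speed of sound of 343 m/s is an assumed nominal value, and the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value at room temperature

def latency_seconds(latency_samples, fs):
    """Convert a latency in samples to seconds."""
    return latency_samples / fs

def min_source_distance(latency_samples, fs, c=SPEED_OF_SOUND):
    """Distance the noise must travel for its propagation time to cover the latency."""
    return c * latency_seconds(latency_samples, fs)

latency_ms = 1000.0 * latency_seconds(2000, 44100)
distance_m = min_source_distance(2000, 44100)
print(f"{latency_ms:.1f} ms, {distance_m:.1f} m")
```

With a round trip latency of 2000 samples at 44100 Hz, the propagation path would have to exceed roughly 15.5 m.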
Due to this reason, the reference noise is directly picked up in the electronic
path instead of being captured by a microphone, which eliminates the delay in the
capture of reference signal completely. In addition, the feedback from secondary
sources to the reference microphone is also avoided.
¹ Using embedded signal processing systems such as microcontrollers, DSPs or FPGAs to implement the adaptive algorithm and AD/DA conversion would significantly reduce the round trip latency, down to only a few milliseconds or less, in which case primary noise source distance would not be a problem.

The aim of the experiment is to evaluate the performance of the proposed ANC system under various system configurations. The target frequency band is 200−500 Hz. This frequency band is chosen because the typical noise energy inside cars was found to lie below 500 Hz, and because the loudspeakers used as secondary sources have limited low frequency capabilities. Within this frequency band, only the 0th and 1st order spherical harmonic modes exist within the noise field, and the microphone array can reliably pick up the noise field.
In order to evaluate the system’s performance at each frequency, for each experiment, a sine wave of a certain frequency is played through one or two of the primary noise sources; after a short period of time, the ANC algorithm begins to function and gradually cancels the noise. The sound pressure received by the microphones is recorded throughout each experiment, and the level of noise attenuation at the microphone positions is calculated by

\[ A_{\mathrm{mic}} = 10 \log_{10} \frac{\sum_{z} E\{ |e^{\mathrm{bef}}_z(n)|^2 \}}{\sum_{z} E\{ |e^{\mathrm{aft}}_z(n)|^2 \}}, \tag{7.44} \]

where $e^{\mathrm{bef}}_z(n)$ and $e^{\mathrm{aft}}_z(n)$ denote the sound pressure received by the $z$th error microphone before ANC begins and after the ANC algorithm converges, respectively.
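A direct way to evaluate (7.44) from recorded signals is sketched below, with the expectation replaced by a time average over each recording. The function name and the synthetic test signals are assumptions used only to exercise the formula.

```python
import numpy as np

def mic_attenuation_db(e_before, e_after):
    """Evaluate (7.44). Inputs have shape (Z, num_samples); the expectation
    is approximated by the time average of each microphone signal."""
    power_before = np.sum(np.mean(np.abs(e_before) ** 2, axis=1))
    power_after = np.sum(np.mean(np.abs(e_after) ** 2, axis=1))
    return 10.0 * np.log10(power_before / power_after)

# Synthetic check: a 300 Hz tone at 6 microphones, reduced to one tenth in amplitude.
fs = 44100
t = np.arange(fs) / fs
phases = np.linspace(0.0, np.pi, 6)[:, None]
e_before = np.sin(2 * np.pi * 300.0 * t + phases)
e_after = 0.1 * e_before
a_mic = mic_attenuation_db(e_before, e_after)
print(a_mic)
```

A tenfold amplitude reduction at every microphone corresponds to 20 dB under this definition.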
Compared to the noise attenuation at the microphone positions, the attenuation of the average noise level within the control region provides a better characterization of the ANC performance. In order to calculate this, the noise signals received by the error microphones are converted to spherical harmonic coefficients using (7.37), and then filtered through the weighting filter $W_l$ to obtain $\tilde{\beta}_{lm}(n)$. The attenuation of the average noise energy within the control region is then defined as

\[ A_{\mathrm{avg}} = 10 \log_{10} \frac{\sum_{l,m} E\{ |\tilde{\beta}^{\mathrm{bef}}_{lm}(n)|^2 \}}{\sum_{l,m} E\{ |\tilde{\beta}^{\mathrm{aft}}_{lm}(n)|^2 \}}, \tag{7.45} \]

where $\tilde{\beta}^{\mathrm{bef}}_{lm}(n)$ and $\tilde{\beta}^{\mathrm{aft}}_{lm}(n)$ represent the weighted error signal coefficients before ANC begins and after the ANC algorithm fully converges, respectively.

7.5.2 Experiment results

Comparison between MIMO ANC algorithm and spatial ANC algorithm

[Figure 7.6 plot: attenuation (dB) versus frequency (Hz), 200 to 500 Hz; curves: MIMO, Harmonic.]

Figure 7.6: Comparison of spatial noise attenuation using MIMO algorithm (blue) and the proposed spatial ANC algorithm (red).

In this experiment, we investigate the differences in performance between the proposed spatial ANC algorithm and the existing MIMO ANC algorithm. For the MIMO algorithm, all of the 6 microphones on the boundary of the control region are utilized as the error microphones. For both algorithms, only noise source 1 is activated, and we only use speaker 1 (in Fig. 7.5) as the secondary source.
Figure 7.6 plots the average noise attenuation within the control region, using
the existing MIMO ANC method as well as the proposed spatial ANC method.
It can be seen that, overall, the two methods result in similar noise attenuation,
and the attenuation at lower frequencies is better than that at higher frequencies.
Since the 6 microphones are placed evenly over the spherical boundary of the
control region (in order to capture the spherical harmonic coefficients), and their
spacing is much smaller than the wavelength of the noise, they provide a very
good representation of the noise level within the field. As a result, the proposed
method does not show a clear advantage over the MIMO method.
However, it can be seen from Fig. 7.6 that at higher frequencies (460 Hz and
above), the spatial ANC method begins to yield consistently better attenuation
than the MIMO method. The reason for this is that as the wavelength shortens,
sampling of noise pressure on the boundary of the control region begins to have
less correlation with the noise field inside the region. As a result, minimizing the
noise pressure at the microphone positions no longer guarantees minimization of the
noise level inside the region. The proposed method, on the other hand, is able to
control the entire region through converting the microphone signals into the spherical
harmonic domain. Should the region size be larger, this phenomenon would be more
pronounced.
Of particular note is the peak at 300 Hz in Fig. 7.6, where the attenuation is 0
dB for both the MIMO algorithm and the spatial algorithm. A careful investigation
reveals that at this frequency, a standing wave is formed between the noise source
and the walls of the lab, with the control region located at a “minimum” point of
the standing wave, i.e., a point where the amplitude of the standing wave is very
small. Therefore, the noise field essentially cancels itself without the need for the
secondary loudspeakers, which results in minimal attenuation gain for the ANC system.

Figure 7.7: Spatial noise attenuation using primary noise source 1 and various
numbers of secondary loudspeakers. [Plot: attenuation (dB) versus frequency (Hz),
200 to 500 Hz; legend: Speaker 1; Speakers 1 & 2; Speakers 1,2,4,5,6.]

Impact of secondary source number on spatial ANC performance

In a multichannel ANC system, more than one secondary source may be employed
in order to improve the attenuation level of the system. When the control region
is large, or the target frequency band is high, it is expected that a larger number
of secondary sources is required to achieve sufficient noise attenuation, due to
the increased complexity of the noise field. In this experiment, we validate this
expectation using the experimental ANC system.
First, we use only one primary noise source (noise source 1 in Fig. 7.5), and
perform the ANC experiment using the proposed algorithm with (i) speaker 1, (ii)
speakers 1 & 2, and (iii) speakers 1,2,4,5,6. The average noise attenuation for the
three cases at frequencies from 200-500 Hz is shown in Fig. 7.7.
From Fig. 7.7, we can see that the noise attenuation is indeed greater when
more secondary sources are active. Using only speaker 1, the system is able to
achieve around 10 dB attenuation for most frequencies, and overall, the attenuation
is better at lower frequencies (below 300 Hz) than at higher frequencies (above 400
Hz). When speaker 2 is added to the system, a much better attenuation is observed
at the lowest frequencies, while a smaller performance gain is achieved at the higher
frequencies. Neither of the two configurations was able to attenuate the noise at
300 Hz.

Figure 7.8: Spatial noise attenuation using both primary noise sources and various
numbers of secondary loudspeakers. [Plot: attenuation (dB) versus frequency (Hz),
200 to 500 Hz; legend: Speaker 2; Speakers 1 & 3; Speakers 1 - 6.]

When five secondary sources are used in the ANC system, the noise attenuation
from 200 Hz to 240 Hz is almost identical to that of the case with only speakers
1 and 2. As frequency increases, the difference between the two configurations
becomes more significant. At 440 Hz and above, the 5-speaker configuration results
in approximately 10 dB higher attenuation than the 2-speaker case. Furthermore,
unlike the other two cases, the 5-speaker setup is able to maintain more than 15 dB
noise reduction throughout the whole frequency band, with the only exception at
300 Hz, where only 1 dB of noise reduction is observed.
The experiment is repeated with both primary noise sources active, each
playing a sine wave at the same frequency but with a different phase. This time, the
secondary sources activated are (i) speaker 2, (ii) speakers 1 & 3, and finally
(iii) speakers 1-6. We plot the noise attenuation in Fig. 7.8.
The overall trend shown in Fig. 7.8 is similar to that of Fig. 7.7: the single
secondary source case results in the least noise attenuation, and the case where
all 6 speakers are used has the most attenuation. The overall trend of decreasing
ANC performance with increasing frequency is also observed for the single and dual
secondary source cases. At 420 Hz and above, using two speakers (speakers 1 and 3)
does not provide significant improvement over using speaker 2 alone. However, when
all 6 speakers are used, the noise attenuation can be further improved by 5 to 10 dB.
We also note that when both primary sources are active, the system no longer
experiences difficulty at 300 Hz. This is because the speakers are capable of cancelling
the noise field due to the second noise source, whose noise field does not exhibit the
self-cancelling behavior of noise source 1.
It can be seen from both Fig. 7.7 and Fig. 7.8 that at lower frequencies, a small
number of secondary sources is sufficient to provide more than 15 dB of noise
attenuation, and the benefit of adding more secondary sources is only marginal.
This is because given the radius of our control region (0.145 m), at around 200 Hz,
the 0th order spherical harmonic mode is dominant within the control region. Since
the 0th mode is uniform and isotropic, a secondary source located in any direction
is capable of producing this mode within the control region. Therefore, only one or
two secondary speakers are sufficient to reproduce the noise field generated by the
noise source, resulting in high attenuation of the noise energy.
On the other hand, at higher frequencies, the 1st order spherical harmonic modes
begin to have more impact on the sound field. Since 1st order modes are directional,
a single secondary source can produce good results only if the sound field it pro-
duces within the control region is very similar to that of the primary noise source.
Because this is very unlikely to be the case, using a single or a small number of
secondary sources generally cannot provide high attenuation at higher frequencies.
The combined use of multiple secondary sources, however, can substantially improve
the spatial ANC system’s sound field reproduction capability, and hence leads to better
results.
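The relative weight of the 0th and 1st order modes can be sketched numerically. The example below assumes a plane-wave-like incident field, for which the order-l contribution at radius r scales with the spherical Bessel function j_l(kr), and an assumed speed of sound of 343 m/s; both are illustrative assumptions rather than measured properties of the lab. Under these assumptions, the ratio |j_1(kr)/j_0(kr)| at the boundary of the 0.145 m region is roughly 0.18 at 200 Hz and roughly 0.5 at 500 Hz.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
RADIUS = 0.145           # m, control region radius quoted in the text

def sph_j0(x):
    """Spherical Bessel function of the first kind, order 0."""
    return math.sin(x) / x

def sph_j1(x):
    """Spherical Bessel function of the first kind, order 1."""
    return math.sin(x) / x ** 2 - math.cos(x) / x

def kr(freq_hz):
    """Dimensionless product of wavenumber and region radius."""
    return 2 * math.pi * freq_hz / SPEED_OF_SOUND * RADIUS

def mode_ratio(freq_hz):
    """|j1(kr) / j0(kr)| evaluated at the boundary of the control region."""
    x = kr(freq_hz)
    return abs(sph_j1(x) / sph_j0(x))

for f in (200, 300, 400, 500):
    print("%d Hz: kr = %.2f, |j1/j0| = %.2f" % (f, kr(f), mode_ratio(f)))
```

The growth of this ratio with frequency is consistent with the observation that directional 1st order modes matter more toward the upper end of the 200-500 Hz band.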

Impact of secondary source position on spatial ANC performance

In Section 7.5.2, it was stated that the noise attenuation is related to the secondary
source’s ability to reproduce the noise field, which in turn is related to the placement
of secondary sources in relation to the primary noise. This is investigated in more
detail in this section.
In this experiment, we first use primary source 1 to generate the noise field, and
compare the ANC performance using secondary speaker 1 and secondary speaker 3.
We plot the experiment results in Fig. 7.9.

Figure 7.9: Spatial noise attenuation using primary noise source 1 with secondary
speaker 1 (red) and secondary speaker 3 (blue). [Plot: attenuation (dB) versus
frequency (Hz), 200 to 500 Hz; legend: Speaker 1, Speaker 3.]

It can be observed from Fig. 7.9 that the overall noise attenuation using speaker
1 is superior to that using speaker 3. Below 250 Hz, the difference between the two
is not very significant; however, at higher frequencies, speaker 1 begins to consistently
outperform speaker 3.
From Fig. 7.5, it can be seen that if viewed from the center of the control region,
secondary speaker 1 lies approximately in the same direction as primary source 1;
on the other hand, secondary speaker 3 forms a 45 degree angle with primary source
1. Since the loudspeakers employed in this experiment (both primary sources and
secondary sources) can be approximately seen as point sources, it can be expected
that the sound field produced by secondary speaker 1 would be very close to that
of primary source 1. Secondary speaker 3, on the other hand, will produce a very
different sound field due to its different impinging direction.
We note that in the case of speaker 3, the ANC system fails to converge at 280
Hz, in addition to 300 Hz. The failure at 280 Hz is not due to a standing wave of
the noise field, but likely due to the secondary channel being very different from the
primary channel, thus causing difficulty with the convergence of the algorithm.
The experiment is also repeated with multiple secondary sources. Instead of
changing the position of the secondary sources, we examine the impact of primary source
position on ANC performance, given a fixed set of secondary loudspeakers. The
secondary sources used in this experiment are speakers 1, 2, 4, 5 and 6. We plot the
noise attenuation results achieved using primary source 1 and primary source 2 in
Fig. 7.10.

Figure 7.10: Spatial noise attenuation using secondary speakers 1,2,4,5,6 with
primary source 1 (blue) and primary source 2 (red). [Plot: attenuation (dB) versus
frequency (Hz), 200 to 500 Hz; legend: Noise Source 1, Noise Source 2.]

In Fig. 7.10 we can see that the noise attenuation for the two noise sources is very
similar at lower frequencies; at 400 Hz and above, however, the noise attenuation for
primary source 1 becomes significantly better than that of primary source 2, with
more than 5 dB extra attenuation at some frequencies. Also, we note that primary
source 2 does not produce a standing wave at 300 Hz as primary source 1 does,
and therefore our system is able to yield a noise attenuation consistent with other
frequencies.
It can be seen from Fig. 7.5 that the five selected secondary sources essentially
“surround” primary source 1, if viewed from the control region, with secondary
speakers 1 and 2 being the closest to the noise source. The other three secondary
speakers have a very different elevation angle from the noise source; however, their
elevation angles approximately coincide with the directions of the waves reflected from
the ceiling and the floor, emitted by primary source 1. Therefore, the secondary source setup
provides a very good coverage of the noise field generated by primary source 1,
hence the high, consistent ANC performance across the whole frequency band. In
the case of primary source 2, due to its different azimuth angle, the secondary
array’s capability to reproduce its noise field is limited and degrades gradually as
the frequency increases and the sound field becomes more complex, which results in
a slowly decaying attenuation level.
Overall, it can be concluded that the performance of the spatial ANC system
is affected by both the number of secondary sources and the relative position be-
tween secondary sources and primary sources. For a small control region and low
frequencies, even a small number of secondary sources is sufficient to provide ad-
equate noise attenuation, and the system performance is not very sensitive to the
placement of secondary sources. As the frequency increases, the system’s perfor-
mance becomes more sensitive to the number and position of secondary sources. If
the secondary sources are placed such that they cover the impinging directions of
the primary noises, it is possible to achieve consistent noise attenuation of over 15
dB for frequencies up to 500 Hz within a spherical region of 0.145 m radius, using
our proposed spatial ANC system.

7.6 Summary
In this chapter we propose a spatial active noise cancellation algorithm based on
spherical harmonics decomposition of the noise field. Both the frequency domain
implementation and the equivalent time domain implementation of the algorithm are
discussed. The proposed algorithm allows flexible placement of secondary sources,
and is able to optimize the noise attenuation for a given spherical region. Through
a series of experiments with the proposed ANC system, we show that the spatial
noise cancelling quality of the system depends on both the number and the
placement of secondary sources, and that an average noise attenuation of over 15 dB
is achievable for a spherical region of 0.145 m radius, over a frequency range of
200-500 Hz, using the proposed spatial ANC system.
Chapter 8

Conclusion and future works

8.1 Conclusion
For the purpose of effectively attenuating the noise level inside a spatial region in
a practical environment, using active noise control methods, a number of problems
and difficulties have to be addressed. These include accurate acquisition of noise
field information, as well as generation of optimal noise cancelling signals. The
goal of this thesis was to address these problems by proposing a number of signal
processing algorithms, which include algorithms for spatial sound recording, noise
environment modelling, and spatial adaptive ANC system architecture.
Spherical harmonic analysis has been shown to be an efficient and accurate tool
for representing spatial sound fields. However, existing microphone array layouts
suitable for spherical harmonic analysis all have 3D geometries. In Chapter 3, we
proposed a 2D planar microphone array layout which is capable of capturing
a 3D spatial sound field. Through the use of vertically placed first order microphone
units, the proposed planar array is able to detect sound field components that are
“invisible” on a plane. The proposed array geometry is shown to have the same
capability as a spherical array of the same radius. However, it is desirable to use
high precision microphones to implement the proposed array, since its robustness is
inferior to that of spherical arrays. In Chapter 4, we also proposed a generalization of
this method, which allows the use of planar higher order microphone arrays to sample
a 3D sound field. This method reduces the total number of microphones required for
sound field recording, and hence improves the feasibility of spherical harmonic analysis
in real-life ANC applications.
In a reverberant environment, the noise field due to a single source can be
reflected multiple times, thereby creating a more complex noise field. Such a
noise field is harder to control, due to its wide range of impinging directions. It
is therefore critical to have a method to estimate the level of reverberation in a
given environment, so as to aid the design of the ANC system. In Chapter 5, we
developed an algorithm for direct-to-reverberant ratio (DRR) estimation. Compared to
existing methods for DRR estimation, the proposed method models the reverberant
field more accurately, and therefore yields a more accurate estimate of the DRR.
The sound field information required by the algorithm can be captured
by a first order microphone system.
In order to develop spatial noise cancellation techniques, one first needs a metric
to measure the average noise level inside a spatial area. We proposed one such
metric in Chapter 6, which can be calculated by taking the weighted squared sum
of the spherical harmonic coefficients of the sound field. Using this metric, we
developed a method to predict the optimum spatial ANC performance in a given
noise environment, before physically implementing an ANC system. This method
is then applied to estimate the performance of in-car loudspeakers for the purpose
of cancelling the car cabin noise under various driving conditions. In this chapter,
we also show that by appropriately weighting the spherical harmonic coefficients, it
is possible to optimize the ANC performance at a number of sub-regions within the
desired quiet zone, at only a small cost to the global noise reduction performance.
This technique is shown to be especially useful when the number of secondary sources
is insufficient.
In Chapter 7, we developed an adaptive ANC algorithm based on the spherical
harmonic analysis technique. Through transforming the microphone signals into the
spherical harmonics domain, the algorithm is able to optimize the noise attenuation
within a spherical region. Both frequency domain and time domain implementations
are discussed. An experimental spatial ANC system based on the proposed algo-
rithm has been implemented, and we used this system to investigate the performance
of the ANC system under various system configurations.
Overall, it can be concluded that, despite the challenges that remain unsolved,
the spatial active noise cancellation technique has been developed to the
point where it no longer exists only in theory and simulations, but has become
feasible and practical to deploy in many applications to solve real-world problems.
Further development of the algorithms related to spatial ANC would surely lead
to improvements in the performance of spatial ANC systems, as well as to the
identification of more potential applications for the technique.

8.2 Future works

A number of problems that arose from the work presented in this thesis are listed
below.

Planar microphone array for mounting on rigid surfaces

The planar microphone array aperture described in this work reduces the space
requirement for capturing 3D spatial sound using a microphone array. However, the
proposed array is developed under the free field assumption, i.e., there should not be
sound sources or reflectors within the spherical region which the array covers. This
means that the planar array cannot be mounted on walls or ceilings for convenient
deployment. A possible approach to address this problem is to model the sound
field in the proximity of a planar reflector, and incorporate this sound field model
into the calculation of the spherical harmonic coefficients. Since the sound
impinging direction in this case is limited to a hemisphere, the complexity
(dimensionality) of the sound field is reduced, and the total number of microphones
may be reduced accordingly.

Compressive sampling of noise field

In Chapter 6, we have shown that the noise field inside a vehicle cabin tends to
be sparse, in that a small number of basis functions is sufficient to represent
the noise field at any time instant. This is likely to be the case especially
when the number of independent noise sources is small. In such cases, a relatively
small number of well placed microphones may be sufficient to monitor the noise
field, given prior knowledge of the basis functions that could describe the noise field.
This would allow much more flexible placement of error microphones for a spatial
ANC system.

Secondary channel transformation

The adaptive ANC algorithm described in this work only transforms the error mi-
crophone signals to spherical harmonics domain; the secondary loudspeaker model
used here is the same as in MIMO ANC algorithms. Since the convergence speed
of the algorithm depends on the secondary channel information, convergence becomes
slower when the channels of multiple loudspeakers are strongly correlated.
In a previous work, the secondary driving signals are transformed into circular
harmonics, which overcomes this problem. The drawback of this method is that
it requires a circular loudspeaker array, which is often impractical. For non-circular
speaker arrays, it may be possible to define a different transformation which does
not require specific loudspeaker placement, thus improving the convergence speed
of the adaptive algorithm.

Online secondary channel estimation

The adaptive algorithms presented in this work all require the secondary channel
information. Although this information can be obtained offline by playing a sweep
signal from each secondary loudspeaker, it is still desirable to be able to update the
secondary path while the ANC system is functioning. This is because the secondary
path changes with the slightest movement of objects inside and around the control
region; when the pre-recorded secondary path deviates too far from the current
secondary path, the adaptive system becomes unstable. With online secondary
channel estimation, the ANC system can keep track of changes in the secondary
path, thus improving both system stability and the noise attenuation level.
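To make the role of the secondary path estimate concrete, the following is a minimal single-channel filtered-x LMS sketch. It is a textbook single-channel scheme substituted here for illustration, not the spatial algorithm of this thesis, and every path, gain and step size in it (`s_true`, `s_hat`, `primary_delay`, `primary_gain`, `mu`) is a hypothetical value. The reference signal is filtered by the stored estimate `s_hat` before it enters the weight update, which is where a stale offline estimate would affect the adaptation.

```python
import math

def fir(coeffs, history):
    """Output of an FIR filter; history[0] is the newest input sample."""
    return sum(c * h for c, h in zip(coeffs, history))

def run_fxlms(n_samples=8000, fs=8000, f0=300.0, mu=0.01, n_taps=8):
    s_true = [0.0, 0.6, 0.3]               # hypothetical secondary path (speaker to mic)
    s_hat = list(s_true)                   # offline estimate, assumed perfect here
    primary_delay, primary_gain = 5, 0.8   # hypothetical primary path

    w = [0.0] * n_taps                     # adaptive controller weights
    x_hist = [0.0] * max(n_taps, len(s_hat), primary_delay + 1)
    fx_hist = [0.0] * n_taps               # reference filtered by s_hat
    y_hist = [0.0] * len(s_true)           # controller output history
    noise, error = [], []

    for n in range(n_samples):
        x = math.sin(2 * math.pi * f0 * n / fs)      # tonal reference signal
        x_hist = [x] + x_hist[:-1]
        d = primary_gain * x_hist[primary_delay]     # noise at the error microphone
        y = fir(w, x_hist)                           # anti-noise drive signal
        y_hist = [y] + y_hist[:-1]
        e = d + fir(s_true, y_hist)                  # residual at the error microphone
        fx_hist = [fir(s_hat, x_hist)] + fx_hist[:-1]
        # LMS step using the filtered reference instead of the raw reference
        w = [wi - mu * e * fxi for wi, fxi in zip(w, fx_hist)]
        noise.append(d)
        error.append(e)
    return noise, error

def power(sig):
    """Mean squared value of a real signal."""
    return sum(v * v for v in sig) / len(sig)

noise, error = run_fxlms()
tail = 1000
# Guard against log of zero in this idealized, noise-free simulation.
atten_db = 10 * math.log10(power(noise[-tail:]) / max(power(error[-tail:]), 1e-30))
print("attenuation over last %d samples: %.1f dB" % (tail, atten_db))
```

Since the simulation is noise-free and the estimate is exact, the attenuation it reports is far higher than anything achievable in practice and is not comparable to the measured results; the sketch only shows where the secondary path estimate enters the update.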

Feedback adaptive spatial ANC algorithm

The feedback ANC algorithm has a number of advantages over its feed-forward
counterpart, such as the omission of reference microphones. Incorporating the
spatial noise control techniques developed in this work into feedback ANC systems
would certainly bring benefits such as a simplified system structure. However, the
limitations of feedback algorithms would also apply, which include their inability to
cancel non-periodic noise and their inferior stability compared to feed-forward ANC
algorithms. Furthermore, the feedback algorithm has a higher computational cost
than the feed-forward algorithm, due to its need to synthesize reference signals.
These challenges should be investigated to determine the feasibility of a spatial
feedback ANC system in real-life applications.

Active Noise Control in reverberant environments

In a reverberant environment, the sound field due to a simple sound source can
become very complex due to the reverberations. The long impulse response also
poses a challenge to the active noise control systems deployed in such environments.
It is therefore worthwhile to investigate how severely reverberation can affect
ANC performance, and what modifications can be made to the ANC algorithm
to improve its performance in such environments. The DRR estimation method
proposed in this thesis would be very useful for this purpose, because it allows
the ANC system to identify the strength of the reverberation, so that the algorithm
can adjust accordingly to optimize noise attenuation.
Bibliography

[1] Sharon G. Kujawa and M. Charles Liberman, “Adding insult to injury:

Cochlear nerve degeneration after “temporary” noise-induced hearing loss,”
vol. 29, no. 45, pp. 14077–14085, 2009.

[2] S. J. Elliott and P. A. Nelson, “Active noise control,” IEEE Signal Processing
Magazine, vol. 10, no. 4, pp. 12–35, Oct 1993.

[3] S. M. Kuo, S. Mitra, and Woon-Seng Gan, “Active noise control system for
headphone applications,” IEEE Transactions on Control Systems Technology,
vol. 14, no. 2, pp. 331–335, March 2006.

[4] S. M. Kuo and S. Mitra, “Design of noise reduction headphone,” in Proc. 2006
Digest of Technical Papers International Conference on Consumer Electronics,
Jan 2006, pp. 457–458.

[5] M. Guldenschuh, A. Sontacchi, M. Perkmann, and M. Opitz, “Assessment of

active noise cancelling headphones,” in Proc. 2012 IEEE Second International
Conference on Consumer Electronics - Berlin (ICCE-Berlin), Sept 2012, pp.
299–303.

[6] S. M. Kuo and D. R. Morgan, “Active noise control: a tutorial review,”

Proceedings of the IEEE, vol. 87, no. 6, pp. 943–973, 1999.

[7] H. Sano, T. Inoue, A. Takahashi, K. Terai, and Y. Nakamura, “Active control

system for low-frequency road noise combined with an audio system,” IEEE
Transactions on Speech and Audio Processing, vol. 9, no. 7, pp. 755–763, 2001.

[8] R. J. Bernhard, “Active control of road noise inside automobiles,” in Inter-

noise and Noise-con Congress and Conference Proceedings. Institute of Noise
Control Engineering, 1995, vol. 1995, pp. 21–32.

163
164 Bibliography

[9] A. Barkefors, S. Berthilsson, and M. Sternad, “Extending the area silenced

by active noise control using multiple loudspeakers,” in Proc. IEEE Inter-
national Conference on Acoustics, Speech and Signal Processing (ICASSP).
IEEE, 2012, pp. 325–328.

[10] T. D. Abhayapala and D. B. Ward, “Theory and design of high order sound
field microphones using spherical microphone array,” in Proc. 2002 IEEE In-
ternational Conference on Acoustics, Speech, and Signal Processing (ICASSP),
2002, vol. 2, pp. II–1949–II–1952.

[11] D. Khaykin and B. Rafaely, “Coherent signals direction-of-arrival estimation

using a spherical microphone array: Frequency smoothing approach,” in Proc.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
2009, pp. 221–224.

[12] D. P. Jarrett, O. Thiergart, E. A. P. Habets, and P. A. Naylor, “Coherence-

based diffuseness estimation in the spherical harmonic domain,” in Proc. 2012
IEEE 27th Convention of Electrical Electronics Engineers in Israel (IEEEI),,
Nov 2012, pp. 1–5.

[13] T. D. Abhayapala and A. Gupta, “Spherical harmonic analysis of wavefields

using multiple circular sensor arrays,” IEEE Transactions on Audio, Speech,
and Language Processing, vol. 18, no. 6, pp. 1655–1666, 2010.

[14] G. W. Elko and J. M. Meyer, “Using a higher-order spherical microphone

array to assess spatial and temporal distribution of sound in rooms,” The
Journal of the Acoustical Society of America, vol. 132, no. 3, pp. 1912–1912,
2012.

[15] B. Rafaely, “The spherical-shell microphone array,” IEEE Transactions on

Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 740–747, 2008.

[16] C. T. Jin, N. Epain, and A. Parthy, “Design, optimization and evaluation of a

dual-radius spherical microphone array,” IEEE/ACM Transactions on Audio,
Speech, and Language Processing, vol. 22, no. 1, pp. 193–204, Jan 2014.

[17] T. D. Abhayapala and C. T. Chan, “Limitation and errior analysis of spher-

ical microphone arrays,” in Proc. 14th International Congress on Sound and
Vibration (ICSV14), 2007.
Bibliography 165

[18] M. Chan, “Theory and design of higher order sound field recording,” Depart-
ment of Engineering, FEIT, ANU, Honours Thesis, 2003.

[19] Rishabh RANJAN, Jianjun HE, Tatsuya MURAO, Lam BHAN, and
Woon Seng GAN, “Selective active noise control system for open windows
using sound classification,” in Proc. Inter.noise 2016, Nov 2016.

[20] Chuang Shi, Tatsuya Murao, Dongyuan Shi, Bhan Lam, and Woon-Seng Gan,
“Open loop active control of noise through open windows,” The Journal of
the Acoustical Society of America, vol. 140, no. 4, pp. 3313–3313, 2016.

[21] Tatsuya Murao, Chuang Shi, Woon-Seng Gan, and Masaharu Nishimura,
“Mixed-error approach for multi-channel active noise control of open win-
dows,” Applied Acoustics, vol. 127, pp. 305 – 315, 2017.

[22] Jordan Cheer and Stephen J. Elliott, “Multichannel control systems for the
attenuation of interior road noise in vehicles,” Mechanical Systems and Signal
Processing, vol. 6061, pp. 753 – 769, 2015.

[23] S. J. Elliott, W. Jung, and J. Cheer, “The spatial properties and local active
control of road noise,” in Proc. of Euro-noise, 2015, pp. 2189–2194.

[24] J. Cheer and S. J. Elliott, “Mutlichannel feedback control of interior road

noise,” in Proceedings of Meetings on Acoustics ICA2013. ASA, 2013, vol. 19,
p. 030118.

[25] Kosuke Sakamoto and Toshio Inoue, “Development of feedback-based active

road noise control technology for noise in multiple narrow-frequency bands
and integration with booming noise active noise control system,” SAE Inter-
national Journal of Passenger Cars-Mechanical Systems, vol. 8, no. 2015-01-
0660, pp. 1–7, 2015.

[26] Akira Takahashi, Toshio Inoue, Kosuke Sakamoto, and Yasunori Kobayashi,
“Integrated active noise control system for low-frequency noise in automo-
biles,” in INTER-NOISE and NOISE-CON Congress and Conference Proceed-
ings. Institute of Noise Control Engineering, 2011, vol. 2011, pp. 2105–2113.

[27] J. Cheer and S. J. Elliott, “Active noise control of a diesel generator in a

luxury yacht,” Applied Acoustics, vol. 105, pp. 209–214, 2016.
166 Bibliography

[28] J. Cheer and S. Daley, “An investigation of delayless subband adaptive filtering
for multi-input multi-output active noise control applications,” IEEE/ACM
Transactions on Audio, Speech, and Language Processing, vol. 25, no. 2, pp.
359–373, Feb 2017.

[29] Tongwei Wang, Woon-Seng Gan, and Sen M. Kuo, “New feedback active
noise control system with improved performance,” in Proc. 2014 IEEE Inter-
national Conference on Acoustics, Speech and Signal Processing (ICASSP),.
IEEE, 2014, pp. 6662–6666.

[30] J. Zhang, W. Zhang, and T. D. Abhayapala, “Noise cancellation over spatial

regions using adaptive wave domain processing,” in Proc. 2015 IEEE Work-
shop on Applications of Signal Processing to Audio and Acoustics (WASPAA),
Oct 2015, pp. 1–5.

[31] S. Spors and H. Buchner, “Efficient massive multichannel active noise control
using wave-domain adaptive filtering,” in Proc. 3rd International Symposium
on Communications, Control and Signal Processing, March 2008, pp. 1480–
1485.

[32] T. D. Abhayapala, Modal Analysis and Synthesis of Broadband Nearfield

Beamforming Arrays, Ph.D. thesis, The Australian National University and
Telecommunications Engineering Group, 12 1999.

[33] E. Tiana-Roig, F. Jacobsen, and E. Fernandez-Grande, “Beamforming with a

circular array of microphones mounted on a rigid sphere,” The Journal of the
Acoustical Society of America, vol. 130, no. 3, pp. 1095–1098, 2011.

[34] Shefeng Yan, Haohai Sun, U. P. Svensson, Xiaochuan Ma, and J. M. Hovem,
“Optimal modal beamforming for spherical microphone arrays,” IEEE Trans-
actions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 361–371,
2011.

[35] C. Lai, S. Nordholm, and Y. Leung, “Design of steerable spherical broad-

band beamformers with flexible sensor configurations,” IEEE Transactions
on Audio, Speech, and Language Processing, vol. 21, no. 2, pp. 427–438, 2013.

[36] T. D. Abhayapala, R. A. Kennedy, and R. C. Williamson, “Nearfield broad-

band array design using a radially invariant modal expansion,” The Journal
of the Acoustical Society of America, vol. 107, no. 1, pp. 392–403, 2000.
Bibliography 167

[37] R. A. Kennedy, T. D. Abhayapala, D. B. Ward, and R. C. Williamson,

“Nearfield broadband frequency invariant beamforming,” in Proc. 1996
IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP-96). IEEE, 1996, vol. 2, pp. 905–908.

[38] T. D. Abhayapala, R. A. Kennedy, R. C. Williamson, and D. B. Ward,

“Nearfield broadband adaptive beamforming,” in Proc. of the Fifth Inter-
national Symposium on Signal Processing and Its Applications (ISSPA’99).
IEEE, 1999, vol. 2, pp. 839–842.

[39] H. Sun, E. Mabande, K. Kowalczyk, and W. Kellermann, “Localization of

distinct reflections in rooms using spherical microphone array eigenbeam pro-
cessing,” The Journal of the Acoustical Society of America, vol. 131, no. 4,
pp. 2828–2840, 2012.

[40] D. Khaykin and B. Rafaely, “Acoustic analysis by spherical microphone array

processing of room impulse responses,” The Journal of the Acoustical Society
of America, vol. 132, no. 1, pp. 261–270, 2012.

[41] P. N. Samarasinghe, T. D. Abhayapala, M. A. Poletti, and T Betlehem, “An

efficient parameterization of the room transfer function,” IEEE/ACM Trans-
actions on Audio, Speech, and Language Processing, vol. 23, no. 12, pp. 2217–
2227, 2015.

[42] P. N. Samarasinghe, T. D. Abhayapala, M. A. Polettfi, and T. Betlehem, “On

room impulse response between arbitrary points: An efficient parameteriza-
tion,” in Proc. 6th International Symposium on Communications, Control and
Signal Processing (ISCCSP). IEEE, 2014, pp. 153–156.

[43] B. Rafaely, “Analysis and design of spherical microphone arrays,” IEEE
Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135–143,
2005.

[44] M. A. Poletti, “Three-dimensional surround sound systems based on spherical
harmonics,” Journal of the Audio Engineering Society, vol. 53, no. 11, pp.
1004–1025, 2005.

[45] D. B. Ward and T. D. Abhayapala, “Performance bounds on sound field
reproduction using a loudspeaker array,” in Proc. of Workshop on App. of
Sig. Proc. to Audio and Acoustics, Mohonk, 2001.

[46] D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave sound field
using an array of loudspeakers,” IEEE Trans. Speech Audio Process., vol. 9,
no. 6, pp. 697–707, September 2001.

[47] H. Teutsch and W. Kellermann, “Detection and localization of multiple wideband
acoustic sources based on wavefield decomposition using spherical apertures,”
in Proc. IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP) 2008, 2008, pp. 5276–5279.

[48] Y. Peled and B. Rafaely, “Method for dereverberation and noise reduction
using spherical microphone arrays,” in Proc. 2010 IEEE International Confer-
ence on Acoustics Speech and Signal Processing (ICASSP), 2010, pp. 113–116.

[49] D. P. Jarrett and E. A. P. Habets, “On the noise reduction performance of
a spherical harmonic domain tradeoff beamformer,” IEEE Signal Processing
Letters, vol. 19, no. 11, pp. 773–776, 2012.

[50] E. de Witte, H. D. Griffiths, and P. V. Brennan, “Phase mode processing for
spherical antenna arrays,” Electronics Letters, vol. 39, no. 20, pp. 1430–1431,
2003.

[51] A. J. Berkhout, D. de Vries, and P. Vogel, “Acoustic control by wave field
synthesis,” J. Acoust. Soc. Amer., vol. 93, no. 5, pp. 2764–2778, 1993.

[52] Boaz Rafaely, “Plane-wave decomposition of the sound field on a sphere by
spherical convolution,” The Journal of the Acoustical Society of America, vol.
116, no. 4, pp. 2149–2157, 2004.

[53] R. A. Kennedy, P. Sadeghi, T. D. Abhayapala, and H. M. Jones, “Intrinsic
limits of dimensionality and richness in random multipath fields,” IEEE
Transactions on Signal Processing, vol. 55, no. 6, pp. 2542–2556, 2007.

[54] H. M. Jones, R. A. Kennedy, and T. D. Abhayapala, “On dimensionality of
multipath fields: Spatial extent and richness,” in Proc. IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE,
2002, vol. 3, pp. III–2837.

[55] T. D. Abhayapala, T. S. Pollock, and R. A. Kennedy, “Spatial decomposition
of MIMO wireless channels,” in Proc. Seventh International Symposium on
Signal Processing and Its Applications. IEEE, 2003, vol. 1, pp. 309–312.

[56] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, p.
955, Academic Press, 2000.

[57] P. A. Martin, Multiple scattering: interaction of time harmonic waves with N
obstacles, Cambridge Univ., 2006.

[58] P. N. Samarasinghe, T. D. Abhayapala, and M. A. Poletti, “3D spatial soundfield
recording over large regions,” in Proc. International Workshop on Acoustic
Signal Enhancement (IWAENC), Sep. 2012, pp. 1–4.

[59] J. Kautz, J. Snyder, and P. J. Sloan, “Fast arbitrary BRDF shading for low-
frequency lighting using spherical harmonics,” Rendering Techniques, vol. 2,
pp. 291–296, 2002.

[60] R. Rabenstein and A. Kuntz, “Cardioid pattern optimization for a virtual circular
microphone array,” in Proc. of the EAA Symposium on Auralization, Espoo,
Finland, 2009.

[61] M. Abramowitz and I. A. Stegun, Handbook of mathematical functions: with
formulas, graphs, and mathematical tables, p. 439, Number 55. Courier Corporation,
1964.

[62] Frederik J. Simons, “Slepian functions and their use in signal estimation and
spectral analysis,” in Handbook of Geomathematics, Willi Freeden, M. Zuhair
Nashed, and Thomas Sonar, Eds., pp. 891–923. Springer Berlin Heidelberg,
2010.

[63] R. H. Rapp and N. K. Pavlis, “The development and analysis of geopotential
coefficient models to spherical harmonic degree 360,” Journal of Geophysical
Research: Solid Earth, vol. 95, no. B13, pp. 21885–21911, 1990.

[64] J. Meyer and G. Elko, “A highly scalable spherical microphone array based on
an orthonormal decomposition of the soundfield,” in Proc. IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2002,
vol. 2, pp. II–1781–II–1784.

[65] E. Mabande, K. Kowalczyk, H. Sun, and W. Kellermann, “Room geometry
inference based on spherical microphone array eigenbeam processing,” The
Journal of the Acoustical Society of America, vol. 134, no. 4, pp. 2773–2789,
2013.

[66] F. Jacobsen, G. Moreno-Pescador, E. Fernandez-Grande, and Jørgen Hald,
“Near field acoustic holography with microphones on a rigid sphere,” The
Journal of the Acoustical Society of America, vol. 129, no. 6, pp. 3461–3464,
2011.

[67] I. Balmages and B. Rafaely, “Open-sphere designs for spherical microphone
arrays,” IEEE Transactions on Audio, Speech, and Language Processing, vol.
15, no. 2, pp. 727–732, 2007.

[68] T. D. Abhayapala and M. C. T. Chan, “Limitation and error analysis of spherical
microphone arrays,” in Proc. 14th International Congress on Sound and
Vibration (ICSV14), Cairns, Australia, July 2007.

[69] C. Jin, A. Parthy, and A. Van Schaik, “Optimisation of co-centred rigid and
open spherical microphone arrays,” in Proc. 120th Audio Engineering Society
Convention, Paris, France, May 2006, p. 6 pages, Audio Engineering Society.

[70] Z. Li and R. Duraiswami, “Flexible and optimal design of spherical microphone
arrays for beamforming,” IEEE Transactions on Audio, Speech, and Language
Processing, vol. 15, no. 2, pp. 702–714, 2007.

[71] A. Gupta and T. D. Abhayapala, “Double sided cone array for spherical
harmonic analysis of wavefields,” in Proc. IEEE International Conference on
Acoustics Speech and Signal Processing (ICASSP), March 2010, pp. 77–80.

[72] T. D. Abhayapala, A. Gupta, et al., “Non-spherical microphone array structures
for 3D beamforming and spherical harmonic analysis,” in Proc. of the
11th International Workshop on Acoustic Echo and Noise Control, 2008.

[73] J. M. Meyer and G. W. Elko, “Spherical harmonic modal beamforming for an
augmented circular microphone array,” in Proc. IEEE International Conference
on Acoustics, Speech and Signal Processing, 2008, pp. 5280–5283.

[74] H. Teutsch, Modal Array Signal Processing: Principles and Applications of
Acoustic Wavefield Decomposition, chapter 3, pp. 53–54, Springer, Mar. 2007.

[75] H. Chen, T. D. Abhayapala, and W. Zhang, “Theory and design of compact
hybrid microphone arrays on two-dimensional planes for three-dimensional
soundfield analysis,” The Journal of the Acoustical Society of America, vol.
138, no. 5, pp. 3081–3092, 2015.

[76] P. N. Samarasinghe, T. D. Abhayapala, and M. A. Poletti, “Wavefield analysis
over large areas using distributed higher order microphones,” IEEE/ACM
Trans. Audio, Speech and Lang. Proc., vol. 22, no. 3, pp. 647–658, Mar. 2014.

[77] H. Chen, T. D. Abhayapala, and W. Zhang, “3D sound field analysis using
circular higher-order microphone array,” in Proc. 23rd European Signal
Processing Conference (EUSIPCO), Aug 2015, pp. 1153–1157.

[78] D. Griesinger, “The importance of the direct to reverberant ratio in the per-
ception of distance, localization, clarity, and envelopment,” in Proc. Audio
Engineering Society Convention 126. Audio Engineering Society, 2009.

[79] K. Lebart, J. M. Boucher, and P. N. Denbigh, “A new method based on
spectral subtraction for speech dereverberation,” Acta Acustica united with
Acustica, vol. 87, no. 3, pp. 359–366, 2001.

[80] C. Marro, Y. Mahieux, and K. U. Simmer, “Analysis of noise reduction and
dereverberation techniques based on microphone arrays with postfiltering,”
IEEE Trans. on Speech and Audio Processing, vol. 6, no. 3, pp. 240–259,
1998.

[81] D. B. Hawkins and W. S. Yacullo, “Signal-to-noise ratio advantage of binaural
hearing aids and directional microphones under different levels of reverberation,”
Journal of Speech and Hearing Disorders, vol. 49, no. 3, pp. 278–286,
1984.

[82] E. Larsen, N. Iyer, C. R. Lansing, and A. S. Feng, “On the minimum audible
difference in direct-to-reverberant energy ratio,” The Journal of the Acoustical
Society of America, vol. 124, no. 1, pp. 450–461, 2008.

[83] M. Laitinen and V. Pulkki, “Utilizing instantaneous direct-to-reverberant
ratio in parametric spatial audio coding,” in Proc. Audio Engineering Society
Convention 133. Audio Engineering Society, Oct 2012.

[84] P. Zahorik, D. S. Brungart, and A. W. Bronkhorst, “Auditory distance perception
in humans: A summary of past and present research,” Acta Acustica
united with Acustica, vol. 91, no. 3, pp. 409–420, 2005.

[85] A. J. Kolarik, S. Cirstea, and S. Pardhan, “Evidence for enhanced discrimination
of virtual auditory distance among blind listeners using level and
direct-to-reverberant cues,” Experimental Brain Research, vol. 224, no. 4, pp. 623–633,
2013.

[86] E. Larsen, C. D. Schmitz, C. R. Lansing, W. D. O’Brien Jr, B. C. Wheeler,
and A. S. Feng, “Acoustic scene analysis using estimated impulse responses,”
in Proc. IEEE Thirty-Seventh Asilomar Conference on Signals, Systems and
Computers, 2003, vol. 1, pp. 725–729.

[87] T. H. Falk and W. Chan, “Temporal dynamics for blind measurement of room
acoustical parameters,” IEEE Trans. on Instrumentation and Measurement,
vol. 59, no. 4, pp. 978–989, 2010.

[88] S. Mosayyebpour, H. Sheikhzadeh, T. A. Gulliver, and M. Esmaeili, “Single-microphone
LP residual skewness-based inverse filtering of the room impulse
response,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 20,
no. 5, pp. 1617–1632, 2012.

[89] P. P. Parada, D. Sharma, T. Waterschoot, and P. A. Naylor, “Evaluating the
non-intrusive room acoustics algorithm with the ACE challenge,” in Proc.
ACE Challenge Workshop, a satellite event of WASPAA, New Paltz, NY,
USA, Oct 2015.

[90] Y. Lu and M. Cooke, “Binaural estimation of sound source distance via the
direct-to-reverberant energy ratio for static and moving sources,” IEEE Trans.
on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1793–1805,
2010.

[91] S. Vesa, “Sound source distance learning based on binaural signals,” in Proc.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
Oct 2007, pp. 271–274.

[92] M. Jeub, C. Nelke, C. Beaugeant, and P. Vary, “Blind estimation of the
coherent-to-diffuse energy ratio from noisy speech signals,” in Proc. 19th European
Signal Processing Conference, Aug 2011, pp. 1347–1351.

[93] O. Thiergart, G. Del Galdo, and E. A. P. Habets, “Signal-to-reverberant ratio
estimation based on the complex spatial coherence between omnidirectional
microphones,” in Proc. International Conference on Acoustics, Speech and
Signal Processing, 2012, pp. 309–312.

[94] E. Georganti, J. Mourjopoulos, and S. van de Par, “Room statistics and
direct-to-reverberant ratio estimation from dual-channel signals,” in Proc.
2014 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), May 2014, pp. 4713–4717.

[95] O. Thiergart, T. Ascherl, and E. A. P. Habets, “Power-based signal-to-diffuse
ratio estimation using noisy directional microphones,” in Proc. IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), May
2014, pp. 7440–7444.

[96] Y. Hioka and K. Niwa, “PSD estimation in beamspace for estimating direct-to-
reverberant ratio from a reverberant speech signal,” in Proc. ACE Challenge
Workshop, a satellite event of WASPAA, New Paltz, NY, USA, Oct 2015.

[97] Y. Hioka, K. Niwa, S. Sakauchi, K. Furuya, and Y. Haneda, “Estimating
direct-to-reverberant energy ratio using D/R spatial correlation matrix
model,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 19, no.
8, pp. 2374–2384, 2011.

[98] M. Kuster, “Estimating the direct-to-reverberant energy ratio from the coher-
ence between coincident pressure and particle velocity,” The Journal of the
Acoustical Society of America, vol. 130, no. 6, pp. 3781–3787, 2011.

[99] H. Chen, P. N. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Estimation
of the direct-to-reverberant energy ratio using a spherical microphone array,”
in Proc. ACE Challenge Workshop, a satellite event of WASPAA, New Paltz,
NY, USA, Oct 2015.

[100] J. Eaton, A. H. Moore, N. D. Gaubitch, and P. A. Naylor, “The ACE challenge
- corpus description and performance evaluation,” in Proc. IEEE Workshop
on Applications of Signal Processing to Audio and Acoustics (WASPAA), New
Paltz, NY, USA, Oct 2015.

[101] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical
Holography, USA: Academic, 1999.

[102] F. J. Fahy, Sound Intensity, Elsevier Applied Science, London, 1989.

[103] T. D. Abhayapala and H. Bhatta, “Coherent broadband source localization by
modal space processing,” in Proc. 10th International Conference on Telecommunications
(ICT 2003), 2003, vol. 2, pp. 1617–1623.

[104] H. Chen, T. D. Abhayapala, P. Samarasinghe, and W. Zhang, “Direct-to-reverberant
energy ratio estimation using a first order microphone,”
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.
PP, no. 99, pp. 1–1, 2016.

[105] N. Epain and E. Friot, “Active control of sound inside a sphere via control of
the acoustic pressure at the boundary surface,” J. Sound Vibr., vol. 299, no.
3, pp. 587–604, 2007.

[106] M. Naoe, T. Kimura, and M. Katumoto, “Performance evaluation of 3D sound
field reproduction system using a few loudspeakers and wave field synthesis,”
in Proc. 2nd Int. Symp. Universal Commun., Osaka, Japan, 2008.

[107] F. M. Fazi, P. A. Nelson, J. E. N. Christensen, and J. Seo, “Surround system
based on three-dimensional sound field reconstruction,” in Proc. 125th Conv.
Audio Eng. Soc., San Francisco, CA, USA, 2008.

[108] W. Zhang and T. D. Abhayapala, “Three dimensional sound field reproduction
using multiple circular loudspeaker arrays: Functional analysis guided
approach,” IEEE Trans. Audio, Speech, Lang. Process, vol. 22, no. 7, pp.
1184–1194, 2014.

[109] Y. J. Wu and T. D. Abhayapala, “Spatial multizone sound field reproduction:
Theory and design,” IEEE Trans. Audio, Speech, Lang. Process, vol. 19, no.
6, pp. 1711–1715, 2011.

[110] “em32 eigenmike microphone array release notes,”
www.mhacoustics.com/sites/default/files/ReleaseNotes.pdf, 2013.

[111] S. C. Douglas, “Fast implementations of the filtered-X LMS and LMS algo-
rithms for multichannel active noise control,” IEEE Transactions on Speech
and Audio Processing, vol. 7, no. 4, pp. 454–465, Jul 1999.

[112] M. de Diego, A. Gonzalez, M. Ferrer, and G. Pinero, “An adaptive algorithms
comparison for real multichannel active noise control,” in Proc. 12th European
Signal Processing Conference, Sep 2004, pp. 925–928.

[113] M. de Diego, A. Gonzalez, M. Ferrer, and G. Pinero, “Multichannel active
noise control system for local spectral reshaping of multifrequency noise,”
Journal of Sound and Vibration, vol. 274, no. 1–2, pp. 249–271, 2004.

[114] A. Montazeri, J. Poshtan, and M. H. Kahaei, “Analysis of the global reduction
of broadband noise in a telephone kiosk using a MIMO modal ANC system,”
International Journal of Engineering Science, vol. 45, no. 2–8, pp. 679–697,
2007.

[115] H. Chen, P. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Spatial noise
cancellation inside cars: Performance analysis and experimental results,” in
Proc. 2015 IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics (WASPAA), Oct 2015, pp. 1–5.

[116] P. N. Samarasinghe, W. Zhang, and T. D. Abhayapala, “Recent advances in
active noise control inside automobile cabins: Toward quieter cars,” IEEE
Signal Processing Magazine, vol. 33, no. 6, pp. 61–73, 2016.

[117] P. A. Nelson and S. J. Elliott, Active control of sound, Academic press, 1991.

[118] S. Hasegawa, T. Tabata, A. Kinoshita, and H. Hyodo, “The development of
an active noise control system for automobiles,” Tech. Rep., SAE Technical
Paper, 1992.

[119] C. Bohn, A. Cortabarria, V. Härtel, and K. Kowalczyk, “Active control of
engine-induced vibrations in automotive vehicles using disturbance observer
gain scheduling,” Control Engineering Practice, vol. 12, no. 8, pp. 1029–1039,
2004.

[120] S. J. Elliott and P. A. Nelson, “The active control of sound,” Electronics &
communication engineering journal, vol. 2, no. 4, pp. 127–136, 1990.

[121] S. J. Elliott, “A review of active noise and vibration control in road vehicles,”
Technical Report 981, ISVR Technical Memorandum, 2008.

[122] Xuan Li, Shefeng Yan, Xiaochuan Ma, and Chaohuan Hou, “Spherical harmonics
MUSIC versus conventional MUSIC,” Applied Acoustics, vol. 72, no.
9, pp. 646–652, 2011.

[123] H. Chen, P. Samarasinghe, and T. D. Abhayapala, “In-car noise field analysis
and multi-zone noise cancellation quality estimation,” in Proc. 2015 Asia-Pacific
Signal and Information Processing Association Annual Summit and
Conference (APSIPA), Dec 2015, pp. 773–778.

[124] H. Chen, J. Zhang, P. N. Samarasinghe, and T. D. Abhayapala, “Evaluation
of spatial active noise cancellation performance using spherical harmonic
analysis,” in Proc. 2016 IEEE International Workshop on Acoustic Signal
Enhancement (IWAENC), Sept 2016, pp. 1–5.

[125] H. Chen, T. D. Abhayapala, and W. Zhang, “Enhanced sound field reproduction
within prioritized control region,” in Proc. Inter.noise 2014, Nov 2014,
p. 596.

[126] Y. Kajikawa, W. S. Gan, and S. M. Kuo, “Recent advances on active noise
control: Open issues and innovative applications,” APSIPA Transactions on
Signal and Information Processing, vol. 1, pp. 21, Apr 2012.

[127] S. J. Elliott, P. A. Nelson, I. M. Stothers, and C. C. Boucher, “In-flight experiments
on the active control of propeller-induced cabin noise,” Journal of
Sound and Vibration, vol. 140, no. 2, pp. 219–238, 1990.

[128] J. Cheer, Active control of the acoustic environment in an automobile cabin,
Ph.D. thesis, University of Southampton, 2012.

[129] D. P. Das, G. Panda, and S. M. Kuo, “New block filtered-X LMS algorithms for
active noise control systems,” IET Signal Processing, vol. 1, no. 2, pp. 73–81,
June 2007.

TheoryandDesignofSpatialActiveNoiseControlSystems

Uploaded by

TheoryandDesignofSpatialActiveNoiseControlSystems

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Theory and Design of Spatial Active Noise Control Systems

Thesis · January 2017

The user has requested enhancement of the downloaded file.

Bachelor of Engineering (Hons 1)

©Hanchi Chen 2017

• H. Chen, T. D. Abhayapala, and W. Zhang, “Theory and design of compact

• H. Chen, T. D. Abhayapala, P. N. Samarasinghe, and W. Zhang, “Direct-to-

• P. N. Samarasinghe, T. D. Abhayapala, and H. Chen, “Estimating the Direct-

• H. Chen, T. D. Abhayapala, and W. Zhang, “3D sound field analysis us-

• H. Chen, P. N. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Estimation

in Proc. ACE Challenge Workshop, a satellite event of WASPAA, New Paltz,

• H. Chen, P. N. Samarasinghe, T. D. Abhayapala, and W. Zhang, “Spatial noise

• H. Chen, P. N. Samarasinghe, and T. D. Abhayapala, “In-car noise field

• H. Chen, J. Zhang, P. N. Samarasinghe, and T. D. Abhayapala, “Evalua-

• H. Chen, T. D. Abhayapala, and W. Zhang, “Enhanced sound field reproduc-

• G. Dickins, H. Chen, and W. Zhang, “Soundfield control for consumer de-

Research School of Engineering

• Dr. Prasanga Samarasinghe, who had provided suggestions on many research

• My fellow students in the Applied Signal Processing Group, especially Jing,

• My parents for sending me to Australia in the first place, and supporting my

• Finally, my girlfriend Mendy, for accompanying me throughout my PhD study

Notations and Symbols

2 Background: Spherical harmonic analysis and synthesis of sound

3 Planar microphone array apertures for 3D spatial sound field anal-

4 3D sound field analysis using circular higher order microphone ar-

5 Direct-to-reverberant energy ratio estimation using a first order

6 Methods for spatial ANC performance evaluation and optimization

6.3.3 Performance evaluation

7 Spatial active noise cancellation system architectures

8 Conclusion and future works

8.2 Future works

⌈·⌉ ceiling operator

1.1 Motivation and scope

Figure 1.1: Passive noise control system.

Figure 1.2: Active noise control system.

Figure 1.3: Structure of feed-forward active noise cancelling headphones.

Figure 1.4: Feed-forward MIMO active noise cancelling system.

1.2 Problem description

Figure 1.5: Breakdown of the spatial active noise cancellation problem.

adaptive algorithm enables an ANC system to quickly respond to changes in the

1.3 Recent advancements in spatial ANC

1.4 Thesis outline

Chapter 2 - Background: Spherical harmonic analysis and

Chapter 3 - Planar microphone array apertures for 3D spatial

Chapter 4 - 3D sound field analysis using circular higher or-

Chapter 5 - Direct-to-reverberant energy ratio estimation us-

Chapter 6 - Methods for spatial ANC performance evaluation

In Chapter 6, we develop a series of methods to estimate and optimize spatial noise

Chapter 7 - Spatial Active Noise Cancellation System Archi-

Chapter 8 - Conclusion and future works

Background: Spherical harmonic

2.1 Spherical harmonic expansion of a sound field

Ylm (ϑ, ϕ) has the orthogonal property

A rule of thumb for determining the upper bound L is given by [53–55]

2.2 Properties of the spherical harmonic expan-

2.2.1 Recurrent property of associated Legendre functions

$P'_{l|m|}(0) = (|m| + l)\, P_{(l-1),|m|}(0)$, (2.9)

which illustrates a relationship between the normalized associated Legendre functions

2.2.2 Addition theorem

Equation (2.12) can be conveniently represented in matrix form, as

2.2.3 Rotation of spherical harmonics

where R denotes the rotation matrix for the spherical coordinates.

2.2.4 Relationship between first order spherical harmonics

Proof. The particle velocity Vx (x0 , k) at position x0 , in the direction x, is related

and the fact that 

It can be shown that

In addition, Y10 (π/2, 0) = 0. Therefore from (2.25) we have

In the case of (2.22), we consider the partial derivative of sound pressure at

2.2.5 Real-valued spherical harmonics

The real-valued spherical harmonics can be defined as [62]

The real-valued spherical harmonics have the orthogonal property

R Ylm (θ, φ) + Yl,−m (θ, φ)

2.3 Spatial sound recording and synthesis using

2.3.1 Spatial sound recording using spherical microphone

H. Chen, T.D. Abhayapala, and W. Zhang, “Planar sensor array”, Interna-
