
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 4, APRIL 2013

Analytical Approach to Wave Field Reconstruction Filtering in Spatio-Temporal Frequency Domain

Shoichi Koyama, Member, IEEE, Ken'ichi Furuya, Senior Member, IEEE, Yusuke Hiwasaki, Member, IEEE, and Yoichi Haneda, Senior Member, IEEE

Abstract—For transmission of a physical sound field in a large area, it is necessary to transform received signals of a microphone array into driving signals of a loudspeaker array to reproduce the sound field. We propose a method for transforming these signals by using planar or linear arrays of microphones and loudspeakers. A continuous transform equation is analytically derived based on the physical equation of wave propagation in the spatio-temporal frequency domain. By introducing spatial sampling, the uniquely determined transform filter, called a wave field reconstruction filter (WFR filter), is derived. Numerical simulations show that the WFR filter can achieve the same performance as that obtained using the conventional least squares (LS) method. However, since the proposed WFR filter is represented as a spatial convolution, it has many advantages in filter design, filter size, computational cost, and filter stability over the transform filter designed by the LS method.

Index Terms—Fourier transform, sound field reproduction, spatio-temporal frequency, wave field reconstruction filter, wave field synthesis.

I. INTRODUCTION

THE way that sound is recorded, transmitted, and reproduced has been investigated as a fundamental problem in acoustical signal processing. In order to transmit a physical sound field in a large area, recording and reproduction of signals requires the use of many microphones and loudspeakers, as well as the transformation of these signals. The authors have focused on methods for transforming signals received by microphones into driving signals of loudspeakers in order to reproduce sound fields when the alignments of the microphones and loudspeakers are planar or linear.

A number of sound field reproduction methods have been proposed that make it possible to calculate driving signals of loudspeakers for reproducing a desired sound field. In order to achieve real-time recording and reproducing systems that use arrays of microphones and loudspeakers, for example, telecommunication systems, it is preferable to calculate the driving signals from a sound pressure distribution obtained by the microphone array. This is because any parameters of the desired sound field other than the sound pressures, for example, source positions, directions, and original signals, are unknown and difficult to obtain. Therefore, methods for directly transforming the sound pressures into the driving signals are necessary. We name this type of transformation sound-pressure-to-driving-signal (SP-DS) conversion.

The methods based on spherical harmonics expansion, e.g., Ambisonics [1], can be regarded as SP-DS conversion methods through their encoding and decoding process [2]–[6]. However, these methods cannot be used for planar or linear arrays of microphones and loudspeakers.

Wave field synthesis (WFS) [7] is a sound field reproduction method based on the Kirchhoff-Helmholtz integral or the Rayleigh integral, a physical equation of wave propagation. Even though several formulations of WFS have been proposed, most of them cannot be employed for SP-DS conversion. The original formulation of WFS introduces a stationary phase approximation into the Rayleigh I integral [8]–[10]. As a result, the parameters of the sound sources to be reproduced, the primary sources, are included in the function of the driving signals. Another formulation, proposed by Spors et al. [11], includes a planar or linear distribution of the sound pressure gradient in the function of the driving signals. Because such gradients are difficult to obtain by using an ordinary microphone array, this method cannot be used for SP-DS conversion. The spectral division method for a planar or linear array of loudspeakers was proposed by Ahrens and Spors [12]. This method is not suitable for SP-DS conversion for reasons discussed in detail in Section II. Another method proposed by Ahrens and Spors [13] can be used as a means of SP-DS conversion when a spherical harmonics expansion of the desired sound field can be estimated. Although the method based on wave field analysis [14] and the method previously proposed by the authors [15] can be applied as SP-DS conversion methods derived by extending WFS, an extrapolation of the sound field under the free-field assumption must be introduced.

Methods based on numerical algorithms can also be applied to SP-DS conversion [16]–[23]. In these methods, sound pressures at discrete points are controlled to correspond with the desired ones. Therefore, an inverse of a known transfer function matrix

Manuscript received June 14, 2012; revised August 30, 2012 and November 09, 2012; accepted November 13, 2012. Date of publication November 27, 2012; date of current version January 11, 2013. Part of this work was presented at the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Woon-Seng Gan.
S. Koyama and Y. Hiwasaki are with NTT Media Intelligence Laboratories, Nippon Telegraph and Telephone Corporation, Tokyo 180-8585 Japan (e-mail: [email protected]; [email protected]).
K. Furuya was with NTT Media Intelligence Laboratories, Nippon Telegraph and Telephone Corporation, Tokyo 180-8585 Japan. He is now with the Department of Computer Science and Intelligent Systems, Faculty of Engineering, Oita University, Oita 870-1192, Japan (e-mail: [email protected]).
Y. Haneda was with NTT Media Intelligence Laboratories, Nippon Telegraph and Telephone Corporation, Tokyo 180-8585 Japan. He is now with the Graduate School of Informatics and Engineering, the University of Electro-Communications, Tokyo 182-8585 Japan (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TASL.2012.2229985

1558-7916/$31.00 © 2012 IEEE


Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on April 10,2024 at 12:32:06 UTC from IEEE Xplore. Restrictions apply.
between loudspeakers and control points is numerically calculated to obtain the driving signals of the loudspeakers. Related works can be found in the context of room compensation for WFS [24]–[26]. We selected a method based on a least squares (LS) algorithm for planar or linear arrays of microphones and loudspeakers in the temporal frequency domain¹ for comparison in Section V [21]. However, this method, extended from the multi-point control technique, causes several difficulties in designing and applying LS-based filters.

We propose an SP-DS conversion method for planar or linear arrays of microphones and loudspeakers. For a simple procedure of designing and applying a transform filter, we apply the concepts of continuous distributions of receivers and secondary sources, i.e., microphones and loudspeakers, and physical properties of wave propagation. These concepts enable us to derive a continuous transform equation that relates the sound pressure distribution and the driving signals for reproducing sound fields in the spatio-temporal frequency domain [27], [28]. This transform equation is analytically derived by simultaneously solving the sound field synthesized by secondary sources and the desired sound field defined as the Rayleigh I integral; it is defined as the wave field reconstruction equation (WFR equation). By discretization, the transform filter that converts the received signals of the microphone array into driving signals of the loudspeaker array is uniquely determined. This transform filter is defined as a wave field reconstruction filter (WFR filter). Therefore, signals obtained using only a planar or linear omni-directional microphone array are all that is required to calculate the driving signals of a planar or linear loudspeaker array. Since the WFR filter is represented as a spatial convolution in addition to its analytical derivation, it has many advantages in terms of filter design, filter size, computational cost, and filter stability compared to the conventional LS method.

This paper is organized as follows. In Section II, we present the spectral division method, which is the basis of our proposed method. The WFR equation derived by using continuous distributions of receivers and secondary sources is presented in Section III. In Section IV, spatial sampling is introduced, and the WFR filter as a discrete transform filter is described. The derivation of the LS method is presented, and a comparison of the properties between the proposed and LS methods is given in Section V. Section VI reports on simulation experiments that compare the proposed method with the LS method. Finally, Section VII concludes this paper.

II. SPECTRAL DIVISION METHOD

Our study is based on the spectral division method for a planar or linear distribution of secondary sources [12]. This method is derived under the assumption that secondary sources are continuously distributed. However, this method cannot be applied as an SP-DS conversion in its original form.

The following is a formulation of the spectral division method for a planar secondary source distribution. As shown in Fig. 1, the sound field in the half-space $y > 0$ of the target area must be reproduced to coincide with the sound field created in the half-space $y > 0$ of the source area. We call these two areas the target and source, respectively. Primary sources are located in the source area. When secondary sources are continuously distributed on the $x$–$z$-plane at $y = 0$, the sound field synthesized by the secondary sources in the target area is given by

\[ P_{\mathrm{syn}}(\mathbf{r}, \omega) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} D(\mathbf{r}_0, \omega)\, G(\mathbf{r} - \mathbf{r}_0, \omega)\, \mathrm{d}x_0\, \mathrm{d}z_0 \]  (1)

where $\mathbf{r} = (x, y, z)$ is the position vector in the target area, $\mathbf{r}_0 = (x_0, 0, z_0)$ is the position vector on the $x$–$z$-plane at $y = 0$, $P_{\mathrm{syn}}(\mathbf{r}, \omega)$ is the synthesized sound pressure of temporal frequency $\omega$ at $\mathbf{r}$, $D(\mathbf{r}_0, \omega)$ is the driving signal of the secondary source at $\mathbf{r}_0$ for the planar case, and $G(\mathbf{r} - \mathbf{r}_0, \omega)$ is the transfer function between $\mathbf{r}$ and $\mathbf{r}_0$ excited by the secondary source at $\mathbf{r}_0$. Equation (1) can be considered as a two-dimensional convolution of $D$ and $G$ with respect to $x$ and $z$. Based on the convolution theorem, the spatial Fourier transform of (1) with respect to $x$ and $z$ is represented as

\[ \tilde{P}_{\mathrm{syn}}(k_x, y, k_z, \omega) = \tilde{D}(k_x, k_z, \omega)\, \tilde{G}(k_x, y, k_z, \omega) \]  (2)

where $k_x$ and $k_z$ denote the spatial frequency in the directions of $x$ and $z$, respectively. The variables in the spatial frequency domain are hereafter indicated by tildes. In this context, the spatial Fourier transform is defined as

\[ \tilde{P}(k_x, y, k_z, \omega) = \mathcal{F}_x \mathcal{F}_z \left[ P(\mathbf{r}, \omega) \right] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} P(\mathbf{r}, \omega)\, e^{j k_x x}\, e^{j k_z z}\, \mathrm{d}x\, \mathrm{d}z \]  (3)

where $\mathcal{F}_x$ and $\mathcal{F}_z$ denote Fourier transform operators with respect to $x$ and $z$, respectively. The idea of the spectral division method is to directly solve (2) to calculate the driving signals of the secondary sources:

\[ \tilde{D}(k_x, k_z, \omega) = \frac{\tilde{P}_{\mathrm{des}}(k_x, y_{\mathrm{ref}}, k_z, \omega)}{\tilde{G}(k_x, y_{\mathrm{ref}}, k_z, \omega)} \]  (4)

The driving signals are derived by substituting analytical representations of virtual sound fields in $\tilde{P}_{\mathrm{des}}$ [12], [29]. To implement an SP-DS conversion, it is necessary to somehow relate the received signals to the driving signals.

III. DERIVATION OF WFR EQUATION

The problem is how to calculate the driving signals for reproducing the sound field when only the sound pressure distribution on the $x$–$z$-plane at $y = 0$ is known. As shown in Fig. 1(a), the sound pressure distribution is captured on the receiving plane in the source area. The secondary sources are driven by signals transformed from the captured sound pressure distribution in order to reproduce the sound field in the target area (Fig. 1(b)). The strategy of our method is that the sound field synthesized by the secondary sources, described by (2), and the desired sound field, described by the physical equation of wave propagation, are

¹"Frequency" is described as "temporal frequency" to distinguish it from spatial frequency.
Fig. 1. Sound pressure distribution at receiving plane is obtained in source area. In target area, sound field is reproduced by using planar distribution of secondary sources. (a) Source area; (b) target area.

simultaneously solved in the spatio-temporal frequency domain. That leads to the WFR equation, which relates the sound pressure distribution on the $x$–$z$-plane at $y = 0$ and the driving signals of the secondary sources. We derive the WFR equation when the distributions of receivers and secondary sources are planar or linear.

A. Planar Receiver and Secondary Source Distributions

As shown in Fig. 1, planar distributions of receivers and secondary sources are assumed. We introduce the Rayleigh I integral in three dimensions as a representation of the desired sound field in the target area. This equation represents the sound field in a half-space of a planar boundary based on the planar distribution of the sound pressure gradient on the boundary as follows [30]:

\[ P_{\mathrm{des}}(\mathbf{r}, \omega) = -2 \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left. \frac{\partial P(\mathbf{r}', \omega)}{\partial y'} \right|_{y'=0} G(\mathbf{r} - \mathbf{r}', \omega)\, \mathrm{d}x'\, \mathrm{d}z' \]  (5)

Here, $P_{\mathrm{des}}(\mathbf{r}, \omega)$ is the desired sound pressure, and $G(\mathbf{r}, \omega)$ is the free-field Green function in three dimensions defined as

\[ G(\mathbf{r}, \omega) = \frac{e^{-jk|\mathbf{r}|}}{4\pi|\mathbf{r}|} \]  (6)

where $k = \omega/c$ is the wave number, and $c$ is the sound velocity. The abbreviated notation $\partial P/\partial y'$ is the directional gradient in the direction of $y$ at $\mathbf{r}'$.

Even though the conventional WFS is based on (5), the driving signal results in a function that includes the sound pressure gradient at $\mathbf{r}'$ in the direction of $y$, which is derived under the assumption that secondary sources can be approximated as monopoles [11], [31]. However, the distribution of the sound pressure gradient in the normal direction of the receiving plane is difficult to obtain using an ordinary microphone array. Therefore, it is difficult to apply conventional WFS to an SP-DS conversion [9], [10].

Now, we consider the spatial Fourier transform of (5) with respect to $x$ and $z$. The distribution of the sound pressure gradient can be estimated from the spatial frequency spectrum on the receiving plane in a similar manner to that given in [30]:

\[ \left. \frac{\partial \tilde{P}(k_x, y, k_z, \omega)}{\partial y} \right|_{y=0} = -j k_y\, \tilde{P}(k_x, 0, k_z, \omega) \]  (7)

where

\[ k_y = \sqrt{k^2 - k_x^2 - k_z^2} \]  (8)

Therefore, the spatial Fourier transform of (5) with respect to $x$ and $z$ is represented as

\[ \tilde{P}_{\mathrm{des}}(k_x, y, k_z, \omega) = e^{-j k_y y}\, \tilde{P}(k_x, 0, k_z, \omega) \]  (9)

This equation represents the desired sound field in the spatio-temporal frequency domain.

The synthesized and desired sound fields are represented as (2) and (9), respectively, in the spatio-temporal frequency domain. When these equations are simultaneously solved, i.e., $\tilde{P}_{\mathrm{syn}} = \tilde{P}_{\mathrm{des}}$, the WFR equation that relates the spatial frequency spectrum of the received signals and that of the driving signals is derived. For simplicity, we assume that each secondary source can be approximated as a monopole, $\tilde{G}(k_x, y, k_z, \omega) = -j\, e^{-j k_y y} / (2 k_y)$. Therefore,

\[ \tilde{D}(k_x, k_z, \omega) = \tilde{F}(k_x, k_z, \omega)\, \tilde{P}(k_x, 0, k_z, \omega) \]  (10)

where

\[ \tilde{F}(k_x, k_z, \omega) = 2 j \sqrt{k^2 - k_x^2 - k_z^2} \]  (11)
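The planar WFR relation (10)–(11) can be checked numerically: driving the monopole secondary-source spectrum with $\tilde{D} = 2j k_y \tilde{P}$ reproduces exactly the desired propagated spectrum (9). The sketch below is a minimal numerical verification on an assumed propagating-region grid (frequency, grid extents, and the random received spectrum are illustrative values, not taken from the paper's simulations).

```python
import numpy as np

rng = np.random.default_rng(1)
c, f, y = 343.0, 1000.0, 1.5          # assumed sound speed, frequency, depth
k = 2 * np.pi * f / c                 # wave number k = w/c, eq. (6)

# Spatial-frequency grid restricted to the propagating region kx^2 + kz^2 < k^2,
# so ky = sqrt(k^2 - kx^2 - kz^2) in eq. (8) is real and positive.
kx = np.linspace(-0.9 * k, 0.9 * k, 41)
kz = np.linspace(-0.3 * k, 0.3 * k, 21)
KX, KZ = np.meshgrid(kx, kz, indexing="ij")
ky = np.sqrt(k**2 - KX**2 - KZ**2)

# Arbitrary received spectrum on the receiving plane (stand-in for P~(kx,0,kz,w)).
P = rng.standard_normal(KX.shape) + 1j * rng.standard_normal(KX.shape)

D = 2j * ky * P                               # WFR equation (10)-(11)
G = -1j * np.exp(-1j * ky * y) / (2 * ky)     # monopole spectrum, as in Sec. III-A
P_syn = D * G                                 # synthesized spectrum, eq. (2)
P_des = np.exp(-1j * ky * y) * P              # desired spectrum, eq. (9)

assert np.allclose(P_syn, P_des)              # synthesis matches the desired field
```

The identity holds exactly for every propagating bin, which is why the planar filter (11) needs no free parameters.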
Fig. 2. Sound pressure distribution at receiving line is obtained in source area. In target area, sound field is reproduced by using linear distribution of secondary sources. (a) Source area; (b) target area.

Only the sound pressure distribution on the receiving plane, $\tilde{P}(k_x, 0, k_z, \omega)$, is required to calculate (10) because (11) is determined without setting any parameters.

B. Linear Receiver and Secondary Source Distributions

Because the sound field in the horizontal ear-plane is the most important from a perception viewpoint [32], only a sound field reproduced at a constant height is discussed. For reproduction on the $x$–$y$-plane at $z = 0$ only, the linear receiver and secondary source distributions along the $x$-axis are approximately applicable (Fig. 2). The WFR equation for the linear distributions is derived in a similar way as in the planar case.

Since the secondary sources are assumed to be continuously distributed along the $x$-axis in the target area, the synthesized sound field is described as follows [12]:

\[ P_{\mathrm{syn}}(\mathbf{r}, \omega) = \int_{-\infty}^{\infty} D(x_0, \omega)\, G(\mathbf{r} - \mathbf{r}_0, \omega)\, \mathrm{d}x_0 \]  (12)

where $D(x_0, \omega)$ denotes the driving signals for the linear case. Here, we focus only on the $x$–$y$-plane at $z = 0$. The spatial Fourier transform of (12) with respect to $x$ is represented as

\[ \tilde{P}_{\mathrm{syn}}(k_x, y, 0, \omega) = \tilde{D}(k_x, \omega)\, \tilde{G}(k_x, y, 0, \omega) \]  (13)

As in the planar case, $G$ in (12) is the transfer function between $\mathbf{r}$ and $\mathbf{r}_0$ in three dimensions because dimensionality does not depend on the array configuration. Since only the $x$–$y$-plane at $z = 0$ is focused on, the $z$-coordinate in (13) is set as 0.

Now, it is assumed that the desired sound field follows the Rayleigh I integral in two dimensions. This equation represents the two-dimensional sound field in the half-plane of a linear boundary based on the linear distribution of the sound pressure gradient on the boundary as follows [30]:

\[ P_{\mathrm{des}}(\mathbf{r}, \omega) = -2 \int_{-\infty}^{\infty} \left. \frac{\partial P(\mathbf{r}', \omega)}{\partial y'} \right|_{y'=0} G_{\mathrm{2D}}(\mathbf{r} - \mathbf{r}', \omega)\, \mathrm{d}x' \]  (14)

Here, $G_{\mathrm{2D}}(\mathbf{r}, \omega)$ is the free-field Green function in two dimensions defined as follows [30]:

\[ G_{\mathrm{2D}}(\mathbf{r}, \omega) = -\frac{j}{4}\, H_0^{(2)}(k|\mathbf{r}|) \]  (15)

where $H_0^{(2)}(\cdot)$ is the 0-th order Hankel function of the second kind. Although the sound fields in the source and target areas are in three dimensions, the desired sound field is assumed to be two dimensional. This assumption means that the captured and reproduced sound fields are invariant with regard to changes along the $z$-axis in the source and target areas, respectively. Obviously, this assumption is difficult to meet in practical situations; therefore, it leads to some artifacts, as discussed further below.

The spatial Fourier transform of (14) with respect to $x$ can be derived by using an estimation of the sound pressure gradient in the spatio-temporal frequency domain. This estimation can be derived in a similar way as in the planar case:

\[ \left. \frac{\partial \tilde{P}(k_x, y, \omega)}{\partial y} \right|_{y=0} = -j k_y\, \tilde{P}(k_x, 0, \omega) \]  (16)

where, for the linear case,

\[ k_y = \sqrt{k^2 - k_x^2} \]  (17)

As mentioned before, it is assumed that the desired sound field is invariant with regard to changes along the $z$-axis, i.e., two dimensional. Therefore, (16) is only valid when the primary sources are on the $x$–$y$-plane at $z = 0$ only in the three-dimensional space. Based on the convolution theorem, the spatial Fourier transform of (14) with respect to $x$ is represented as

\[ \tilde{P}_{\mathrm{des}}(k_x, y, \omega) = e^{-j k_y y}\, \tilde{P}(k_x, 0, \omega) \]  (18)

This equation represents the desired sound field on the $x$–$y$-plane at $z = 0$ in the spatio-temporal frequency domain.

When the equations of the synthesized and desired sound fields, (13) and (18), are simultaneously solved, i.e., $\tilde{P}_{\mathrm{syn}} = \tilde{P}_{\mathrm{des}}$, the WFR equation for linear distributions is derived. The three-dimensional synthesized and two-dimensional desired sound fields are combined in this derivation, which is also utilized in the spectral division method [12]. As in the planar case, it is assumed that each secondary source can be approximated as a monopole, $\tilde{G}(k_x, y, 0, \omega) = -(j/4)\, H_0^{(2)}(k_y y)$, for simplicity. Therefore,

\[ \tilde{D}(k_x, \omega) = \tilde{F}(k_x, \omega)\, \tilde{P}(k_x, 0, \omega) \]  (19)
where

\[ \tilde{F}(k_x, \omega) = \frac{4 j\, e^{-j \sqrt{k^2 - k_x^2}\, y_{\mathrm{ref}}}}{H_0^{(2)}\!\left(\sqrt{k^2 - k_x^2}\; y_{\mathrm{ref}}\right)} \]  (20)

The analytical solutions for the spatial Fourier transforms of (6) and (15) with respect to $x$ are used to derive (20). These are described in the Appendix.

Equation (20) depends on the position $y = y_{\mathrm{ref}}$ in the target area. This means that (13) and (18) can be equivalent only on a line parallel to the $x$-axis. Therefore, the reference line $y = y_{\mathrm{ref}}$ must be set, and it leads to faster amplitude decay than desired [5], [13]. This artifact comes from the mismatch between the two-dimensional assumption of the desired sound field and the three-dimensional synthesized sound field. If $\tilde{G}$ in (13) had the same characteristic as a line source, i.e., if it were invariant with regard to changes along the $z$-axis, this artifact may not appear. Since $\tilde{G}$ is assumed to have a monopole characteristic in order to derive (20), the synthesized sound field propagates axisymmetrically with a central axis on the secondary source line, and the amplitude decay becomes faster.

Similar to the planar case, only the sound pressure distribution along the $x$-axis is needed to calculate (19) because (20) is determined without setting any parameters except for $y_{\mathrm{ref}}$.

As mentioned before, the desired sound field is assumed to be two-dimensional in the source area, which only requires the primary sources to be on the $x$–$y$-plane at $z = 0$. Since the actual sound field in the source area is three dimensional, primary sources and reflected image sources may not exist on the $x$–$y$-plane at $z = 0$. Therefore, all primary sources along the axisymmetric position with a central axis on the receiving line are projected onto a two-dimensional plane.

For computational simplicity, we derive a simplified form of (20). The Hankel function can be approximated for large arguments as

\[ H_0^{(2)}(\zeta) \approx \sqrt{\frac{2}{\pi \zeta}}\, e^{-j(\zeta - \pi/4)} \]  (21)

if $\zeta \gg 1$. Therefore, (20) can be approximated for $\sqrt{k^2 - k_x^2}\, y_{\mathrm{ref}} \gg 1$ as

\[ \tilde{F}(k_x, \omega) \approx \sqrt{\,8 \pi j\, y_{\mathrm{ref}} \sqrt{k^2 - k_x^2}\,} \]  (22)

This equation is simpler than (20) to calculate numerically.

If the planar receiver and linear secondary source distributions are combined, i.e., Figs. 1(a) and 2(b), the desired sound field in the source area is regarded as three dimensional. Since the distribution of the sound pressure gradient can be estimated based on (7), the primary sources are not required to be only on the $x$–$y$-plane at $z = 0$. The WFR filter for the planar receiver and linear secondary source distributions is derived by making the corresponding replacement of the spatial frequency variables in (20) or (21). In the target area, the sound field in the source area on the $x$–$y$-plane at the arbitrarily chosen constant $z$ is extracted and reproduced.

IV. WFR FILTER FOR LINEAR ARRAYS

In a practical implementation, in order to use microphone and loudspeaker arrays, the receiver and secondary source distributions must be discretized and truncated. The transformation of the received signals into the driving signals is also processed as a digital filter, i.e., the WFR filter as a transform filter. A block diagram of the proposed method using linear arrays is depicted in Fig. 3, where the WFR filter is designed as a finite impulse response (FIR) filter. The sound pressure distribution in the source area is obtained by using an equally spaced omni-directional microphone array. A loudspeaker array is arranged at the coinciding position with the microphone array in the target area, and each loudspeaker is assumed to be omni-directional. The numbers of microphones and loudspeakers are the same and are denoted as $M$. The received signals of the microphone array and the driving signals of the loudspeaker array in the temporal frequency domain are respectively denoted as the $M$-dimensional vectors $\mathbf{p}(\omega)$ and $\mathbf{d}(\omega)$.

The spatial discrete Fourier transform (DFT) is applied with a proper amount of zero-padding, and the number of samples for the spatial DFT is denoted as $M_{\mathrm{DFT}}$. The spatial frequency spectra of $\mathbf{p}$ and $\mathbf{d}$ are respectively denoted as $\tilde{\mathbf{p}}(\omega)$ and $\tilde{\mathbf{d}}(\omega)$.

According to (19) and (20), $\tilde{D}_m$, the $m$-th element of $\tilde{\mathbf{d}}$, can be calculated as

\[ \tilde{D}_m(\omega) = \tilde{F}_m(\omega)\, \tilde{P}_m(\omega) \]  (23)

where

\[ \tilde{F}_m(\omega) = \frac{4 j\, e^{-j \sqrt{k^2 - k_{x,m}^2}\, y_{\mathrm{ref}}}}{H_0^{(2)}\!\left(\sqrt{k^2 - k_{x,m}^2}\; y_{\mathrm{ref}}\right)} \]  (24)

Here, $k_{x,m}$ denotes the spatial frequency in the $m$-th bin. Therefore, the matrix representation of the transformation can be described as

\[ \tilde{\mathbf{d}}(\omega) = \tilde{\mathbf{F}}(\omega)\, \tilde{\mathbf{p}}(\omega) \]  (25)

where

\[ \tilde{\mathbf{F}}(\omega) = \mathrm{diag}\!\left[ \tilde{F}_0(\omega),\, \tilde{F}_1(\omega),\, \ldots,\, \tilde{F}_{M_{\mathrm{DFT}}-1}(\omega) \right] \]  (26)

Equation (26) is the WFR filter in the spatio-temporal frequency domain. The WFR filter in the spatial domain, denoted by $\mathbf{f}(\omega)$, needs to be trimmed to a length $M_{\mathrm{f}}$ that is sufficient for reproduction accuracy because this filter is designed as an FIR filter. Although it may be possible to design it as an infinite impulse response (IIR) filter, FIR filters are easier to design than
Fig. 3. Block diagram of proposed method for linear arrays. Received signals of microphone array are transformed into driving signals of loudspeaker array. WFR filter is processed in spatio-temporal frequency domain.

IIR filters, and furthermore, maintaining the stability of IIR filters is not entirely straightforward. The longer the trimmed length $M_{\mathrm{f}}$ is, the more precise the WFR filter becomes. The spatial DFT size $M_{\mathrm{DFT}}$ must be larger than or equal to $M + M_{\mathrm{f}} - 1$. This is because (25) can be considered as the convolution of the $M$-size vector $\mathbf{p}$ and the $M_{\mathrm{f}}$-size vector $\mathbf{f}$ in the spatial domain. The driving signal $\mathbf{d}$ is obtained from the spatial inverse DFT (IDFT) of $\tilde{\mathbf{d}}$.

The elements of the WFR filter (24) are determined only by setting the position of the reference line $y_{\mathrm{ref}}$. The Hankel function can be calculated numerically, e.g., as shown in [33], or the discrete form of (21) can be used as the WFR filter without the Hankel function.

The discretization and truncation of the receiver and secondary source distributions lead to several artifacts. The properties of these artifacts are the same as those of other methods using a planar or linear loudspeaker array [12], [34]. Because of discretization, spatial aliasing errors occur above the spatial Nyquist frequency, which is defined as

\[ f_{\mathrm{nyq}} = \frac{c}{2 \Delta x} \]  (27)

where $\Delta x$ denotes the interval of the loudspeakers. The truncation of the secondary source distribution causes unnecessary reflections from the edge. Furthermore, the steep truncation of the received signals causes severe error at the edge of the driving signals. Therefore, a tapering window is applied to reduce the truncation errors [8], [12].

V. COMPARISON WITH LS METHOD

When only sound pressures at discrete points in the target area are given, methods based on the LS algorithm can be applied [16]–[21], [23]. These methods are derived as extensions of multi-point control of sound pressure by using inverse filtering [35]. In this context, sound pressures at discrete control points aligned in front of the loudspeaker array are controlled in order to correspond to the sound pressures obtained by the microphone array in the source area, as in [21]. We describe the LS method in this context in order to compare its properties with the proposed method.

A. Transform Filter Based on LS Method for Linear Arrays

As shown in Fig. 4, the control points are arranged along a line parallel to the loudspeaker array, which is called the control line, and the position of the control line is denoted as $y = y_{\mathrm{ctl}}$. The number of control points, which is equivalent to that of the microphones, is denoted as $M$. The synthesized and desired sound pressures at the control points in the temporal frequency domain are respectively denoted as the $M$-dimensional vectors $\mathbf{p}_{\mathrm{syn}}(\omega)$ and $\mathbf{p}_{\mathrm{des}}(\omega)$. The signals received by the microphone array in the source area are applied as $\mathbf{p}_{\mathrm{des}}$. Here, the transfer function matrix is denoted as $\mathbf{H}(\omega) \in \mathbb{C}^{M \times M}$, which has the transfer function between each loudspeaker and control point in each element. These elements are assumed to have monopole characteristics (6). Therefore,

\[ \mathbf{p}_{\mathrm{syn}}(\omega) = \mathbf{H}(\omega)\, \mathbf{d}(\omega) \]  (28)

The objective of the LS method is to solve the following minimization problem at temporal frequency $\omega$:

\[ \min_{\mathbf{d}} \left\| \mathbf{p}_{\mathrm{des}}(\omega) - \mathbf{H}(\omega)\, \mathbf{d}(\omega) \right\|^2 \]  (29)

Under the assumption that the sound field is uniquely determined when the sound pressure distribution on the linear boundary is determined, the sound field in the target area should be reproduced [19], [21], [30]. Therefore, (29) is uniquely solved as

\[ \mathbf{d}(\omega) = \mathbf{H}(\omega)^{-1}\, \mathbf{p}_{\mathrm{des}}(\omega) \]  (30)

where $(\cdot)^{-1}$ denotes the inverse of a matrix. Therefore, the transform filter based on the LS method can be described as

\[ \mathbf{E}_{\mathrm{LS}}(\omega) = \mathbf{H}(\omega)^{-1} \]  (31)

If the derivation of (31) is ill-conditioned, which can be evaluated by the condition number defined as the ratio of the largest and smallest singular values of $\mathbf{H}$ [16], [36], a power-regularizing term is generally added to the cost function (29). This method is referred to as the regularized LS method, and the transform filter based on this method is described as [20]:

\[ \mathbf{E}_{\mathrm{RLS}}(\omega) = \left( \mathbf{H}^{\mathsf{H}} \mathbf{H} + \delta \mathbf{I} \right)^{-1} \mathbf{H}^{\mathsf{H}} \]  (32)
where $\delta$ denotes a pre-selected regularization parameter, and $\mathbf{I}$ is a unit matrix. It is not necessary to apply any spatial window in the LS methods, because these methods are based on optimization of the sound pressures on the finite-length control line.

Fig. 4. Geometry of loudspeakers and control points for LS method. Control points are arranged along line parallel to loudspeaker array.

B. Property Comparison Between the WFR and LS-Based Filters

As is clear from the derivations so far, the concept of the LS and regularized LS methods differs greatly from that of the proposed method. The LS methods are based on controlling sound pressure at discrete points by using an inverse filter. Therefore, the physical properties, such as the alignments of the loudspeakers and the characteristics of wave propagation, are included in the known transfer function matrix $\mathbf{H}$, and its inverse is numerically solved in the least square error sense. In contrast, the proposed method is based on the continuous WFR equation, which is analytically derived based on the physical equation of wave propagation. These facts give rise to many differences between the properties of these transform filters. We remark on several of the differences in the properties. Note that only the spatial aspects of the transform filters are discussed. The number of microphones and loudspeakers is assumed to be the same, i.e., $M$, in the LS methods.

A) Filter design: In the LS and regularized LS methods, it is possible to use either a modeled or measured transfer function matrix for $\mathbf{H}$. However, because the inverse of $\mathbf{H}$ becomes unstable in many cases, heuristic processes for stabilization, such as adjustment of the regularization parameter $\delta$, are required. Several systematic methods for determining $\delta$ have been proposed, such as the L-curve [37] and generalized cross validation methods [38].

The proposed WFR filter is analytically determined; therefore, it has the advantage that no heuristic processes for designing the filter are needed.

B) Filter size: The size of the LS-based filters, $\mathbf{E}_{\mathrm{LS}}$ and $\mathbf{E}_{\mathrm{RLS}}$, is $M \times M$. Because $M$ is assumed to be a very large number in order to avoid spatial aliasing artifacts, the LS-based filters become large. To make them smaller, it is necessary to reduce the number of either loudspeakers or control points; however, this leads to severe errors.

The size of the proposed WFR filter, $\tilde{\mathbf{F}}$, is $M_{\mathrm{DFT}}$, and it is flexible because the filter is designed as an FIR filter, so resizing is a matter of trimming. The longer $M_{\mathrm{f}}$ is, the more precise the WFR filter becomes. Making $M_{\mathrm{DFT}}$ smaller reduces the spatial resolution of $\tilde{\mathbf{F}}$. However, the energy of the spatial-domain filter $\mathbf{f}$ is concentrated near the origin. Therefore, a small size $M_{\mathrm{f}}$ is usually sufficient for obtaining $\mathbf{f}$, which leads to computational efficiency. As a result, the size of $\mathbf{f}$, $M_{\mathrm{f}}$, can also be a much smaller number than the size of $\mathbf{E}_{\mathrm{LS}}$ and $\mathbf{E}_{\mathrm{RLS}}$, $M \times M$.

C) Computational cost: The computational cost for calculating the driving signals in the temporal frequency domain by using the LS methods is represented as $O(M^2)$ because the multiplication by the $M \times M$ matrix is included.

In the proposed method, the computational cost of (25) is $O(M_{\mathrm{DFT}})$ because it indicates the element-wise multiplication of $M_{\mathrm{DFT}}$-size vectors in the spatio-temporal frequency domain. The computational cost of the spatial DFT is $O(M_{\mathrm{DFT}} \log M_{\mathrm{DFT}})$ when the fast Fourier transform (FFT) algorithm is applied. As a result, the computational cost of the proposed method in the temporal frequency domain is represented as $O(M_{\mathrm{DFT}} \log M_{\mathrm{DFT}})$. In most cases, the computational cost of the proposed method is much smaller than that of the LS methods.

D) Filter stability: The stability of $\mathbf{E}_{\mathrm{LS}}$ can be evaluated by the condition number of $\mathbf{H}$, which depends on the linear independence of the transfer functions [36]. Therefore, with the constraints that the alignments of the loudspeakers and control points are linear, $\mathbf{E}_{\mathrm{LS}}$ becomes very unstable. Although it is necessary to apply regularization and to adjust $\delta$ to stabilize $\mathbf{E}_{\mathrm{LS}}$, this is difficult with such a large matrix. Making $\delta$ larger increases reproduction errors.

Because the proposed WFR filter is a uniquely determined vector defined in the orthogonal space, it is much more stable than that of the LS methods. Additionally, the more stable the transform filter is, the shorter the filter length that can be achieved in the time domain.

These advantages of the proposed method derive from the fact that the WFR filter is represented in convolution form in the spatial domain and is analytically determined. These advantages are closely related to the advantages of acoustical holography based on the angular spectrum representation in the context of sound field analysis [30], [39]. The differences described above are investigated and quantified in Section VI.

VI. EXPERIMENTS

Numerical simulations of reproducing point sources were performed by using linear arrays of microphones and loudspeakers under the free-field assumption. The proposed method is compared with the LS and regularized LS methods presented in Section V. Fig. 5 shows the simulation setup. Identical linear
large number, e.g., , and gen- microphone and loudspeaker arrays were located at ,
erally become very large matrices in order to avoid the with 64 channels in each array. These positions were assumed
Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on April 10,2024 at 12:32:06 UTC from IEEE Xplore. Restrictions apply.
692 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 4, APRIL 2013

Fig. 5. Numerical simulation setup. Linear microphone and loudspeaker arrays were located along the x-axis, with 64 channels in each array. The array elements were equally spaced 6 cm apart. (a) Original sound field in source area; (b) target sound field in target area.
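The geometry in the caption fixes the spatial sampling limits quoted in Section VI (arrays about 3.8 m long, spatial Nyquist frequency of about 2.8 kHz). A quick numerical check; the speed of sound of 343 m/s is an assumed value, not stated in the text:

```python
# Check of the array geometry: 64 elements spaced 6 cm apart.
c = 343.0          # speed of sound [m/s] (assumption)
num_elements = 64
d = 0.06           # element spacing [m]

array_length = (num_elements - 1) * d   # aperture from first to last element
f_nyq = c / (2.0 * d)                   # spatial Nyquist frequency, cf. (27)

print(array_length)  # 3.78 m, rounded to 3.8 m in the text
print(f_nyq)         # about 2858 Hz, i.e., about 2.8 kHz
```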

for simplicity because the proposed WFR filter (26) is derived under the assumption that the microphones and loudspeakers are set at coinciding positions. If their positions are not coincident, the original and reproduced sound fields would be distorted or misaligned. For loudspeaker intervals different from the microphone ones, the spatial DFT or IDFT must be calculated accordingly. The directivity of the array elements was assumed to be omni-directional. The array elements were equally spaced 6 cm apart, so the array lengths were 3.8 m. Therefore, the spatial Nyquist frequency defined in (27) was about 2.8 kHz. The point sources serving as primary sources, located in the source area, were observed with the microphone array. The observed signals were transformed into the driving signals of the loudspeaker array by using each transform filter. In the proposed method, (26) was used as the transform filter, with its reference distance set at 1.0 m. The Tukey window function was applied as a tapering window whose sides tapered by 10%. In the LS and regularized LS methods, (31) and (32) were used as the transform filters, respectively. Since the control line must be set in front of the loudspeaker array, the location of the microphone array was shifted to match the target sound field with that of the proposed method. The control line was set at 0.5 m. The regularization parameter in (32) was determined by using the L-curve method [37] at each temporal frequency; therefore, it differed from frequency to frequency. The original and target sound fields were simulated in 3.6 m × 3.6 m regions (shaded regions in Fig. 5) at every 1.5 cm. The reproduced sound pressure distributions in the target area were normalized at the center of the simulated region. The sampling frequency was 48 kHz.

We evaluated the reproduced sound fields by using the time-averaged squared error at every simulated discrete position, defined as

Err(x, y) = \frac{1}{N_t} \sum_{n=0}^{N_t - 1} \left| p_{\mathrm{rep}}(x, y, n) - p_{\mathrm{org}}(x, y, n) \right|^2,  (33)

where p_{\mathrm{rep}}(x, y, n) and p_{\mathrm{org}}(x, y, n) are the reproduced and original sound pressure distributions in the time domain, respectively, and n denotes the discrete time. The total number of time samples N_t corresponded to 10 ms, i.e., 480 samples. As the ratio of the original sound pressure distribution to the error of the reproduced sound pressure distribution, we define the signal-to-distortion ratio (SDR), written as

SDR = 10 \log_{10} \frac{\sum_{(x, y)} \sum_{n} \left| p_{\mathrm{org}}(x, y, n) \right|^2}{\sum_{(x, y)} \sum_{n} \left| p_{\mathrm{rep}}(x, y, n) - p_{\mathrm{org}}(x, y, n) \right|^2}.  (34)

The SDR was calculated in the region bounded by the dashed line in Fig. 5(b).

Fig. 6 shows the simulation results when the point source was located at (−0.4 m, 1.0 m). The source signal was a 1-kHz sinusoidal wave. Fig. 6(a) shows the original sound pressure distribution in the source area. Fig. 6(b) and (c) show the reproduced sound pressure distributions using the proposed and regularized LS methods, respectively; the filter size of the proposed method was held fixed. Fig. 7 shows the time-averaged squared error of the sound pressure distribution. In both methods, the reproduction accuracy was distinctly high along the line parallel to the loudspeaker array. The accuracy in the region off this line is lower because of the faster amplitude decay. The SDRs of both the proposed and regularized LS methods were 19.8 dB; almost the same performance as the regularized LS method was achieved with the proposed method.

The results when the source signal was a 4-kHz sinusoidal wave are shown in Figs. 8 and 9. Because the frequency of the source signal was above the spatial Nyquist frequency, severe errors can be seen for both methods. In the proposed method, the errors appear to derive from the replication of the spatial frequency spectra. On the contrary, in the regularized LS method, the synthesized sound pressure distribution is more complicated. The SDRs of the proposed and regularized LS methods were 6.6 and 5.7 dB, respectively.

Fig. 10 plots the relation between the SDRs and the frequency of the source signal. The position of the point source was (−0.4 m, 1.0 m). The results of the proposed method are shown for three filter sizes, the smaller two being 32 and 8. The SDRs of the proposed method using the shorter filter sizes were lower than those of the LS and regularized LS methods, especially at low frequencies. By using the proposed method with the largest filter size, the SDRs were almost the same as those of the LS methods above 400 Hz. The longer the filter, the higher the reproduction accuracy becomes
KOYAMA et al.: ANALYTICAL APPROACH TO WAVE FIELD RECONSTRUCTION FILTERING 693

Fig. 6. Simulation results of original and reproduced sound pressure distribution by using proposed and regularized LS methods when source signal was 1-kHz sinusoidal wave. (a) Original; (b) proposed; (c) regularized LS.

Fig. 8. Simulation results of original and reproduced sound pressure distribution by using proposed and regularized LS methods when source signal was 4-kHz sinusoidal wave. (a) Original; (b) proposed; (c) regularized LS.
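The reproduction quality shown in these figures is scored with the time-averaged squared error and the SDR of (33) and (34). A small sketch of how such scores can be computed; since the printed equations did not survive extraction, the exact normalization used below (error energy relative to original energy) is an assumption:

```python
import numpy as np

def time_avg_sq_error_db(p_rep, p_org):
    # Per-position error energy relative to the original energy, in dB
    # (one plausible reading of (33); the normalization is assumed).
    num = np.sum(np.abs(p_rep - p_org) ** 2, axis=-1)
    den = np.sum(np.abs(p_org) ** 2, axis=-1)
    return 10.0 * np.log10(num / den)

def sdr_db(p_rep, p_org):
    # Signal-to-distortion ratio over the whole evaluation region, cf. (34).
    err = np.sum(np.abs(p_rep - p_org) ** 2)
    sig = np.sum(np.abs(p_org) ** 2)
    return 10.0 * np.log10(sig / err)

# Toy check: 10 ms at 48 kHz (480 samples) on a 5-point grid;
# "reproduction" = original plus 1% additive noise.
t = np.arange(480) / 48000.0
p_org = np.tile(np.sin(2 * np.pi * 1000.0 * t), (5, 1))
p_rep = p_org + 0.01 * np.random.default_rng(1).standard_normal(p_org.shape)
print(sdr_db(p_rep, p_org))  # roughly 37 dB for this noise level
```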

Fig. 7. Time-averaged squared error of sound pressure distribution when source signal was 1-kHz sinusoidal wave. (a) Proposed; (b) regularized LS.

Fig. 9. Time-averaged squared error of sound pressure distribution when source signal was 4-kHz sinusoidal wave. (a) Proposed; (b) regularized LS.

at low frequency. The SDRs of all the methods were very low above the spatial Nyquist frequency. The reproduction accuracies of the proposed method were almost the same as those of the LS methods at most frequencies.

Fig. 10. Relation between SDR and frequency of source signal.

The relation between the condition number in dB and the frequency of the source signal is shown in Fig. 11. Although a condition number is not originally defined for the proposed method, we calculated one analogously as the ratio of the largest and smallest eigenvalues. The condition numbers of the LS and regularized LS methods were calculated for the matrices inverted in (31) and (32), respectively. A smaller condition number means that the filter is more stable. Note that these values do not depend on the input signals. The condition numbers of the LS method were distinctly higher than those of the proposed method. Although the condition numbers of the regularized LS method were lower than those of the LS method, they were still higher than those of the proposed method. Almost the same performance as the LS and regularized LS methods was achieved by using the proposed method, whereas the filter stability of the proposed method was higher than that of these LS methods. These results are summarized in Table I. Obviously, the filter size and computational cost of

TABLE I
SUMMARIZED RESULTS OF EXPERIMENTS
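The cost entries summarized here reflect the structural difference described in Section V-B: a dense square matrix applied per temporal frequency versus an element-wise product between spatial spectra. A sketch with random placeholder filters (neither F nor H below is the actual LS or WFR filter):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64  # number of channels, as in the simulations

p = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # one frequency bin

# LS-style conversion: dense M x M matrix-vector product -> O(M^2).
F = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
d_ls = F @ p

# WFR-style conversion: spatial FFT, element-wise product, inverse FFT
# -> O(M log M); the filter acts as a diagonal in the spectral domain.
H = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d_wfr = np.fft.ifft(H * np.fft.fft(p))

# The element-wise product is exactly a circular convolution with the
# spatial filter h = IDFT(H), i.e., a convolution-form filter.
h = np.fft.ifft(H)
d_conv = np.array([np.sum(h[(n - np.arange(M)) % M] * p) for n in range(M)])
assert np.allclose(d_conv, d_wfr)
```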

Fig. 12. Amplitude of driving signals in spatio-temporal frequency domain when primary source is point source. (a) Proposed; (b) regularized LS.
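The axes of Fig. 12 and its propagating/evanescent boundary can be reproduced from the setup values (6-cm spacing, 256-point spatial DFT); the speed of sound of 343 m/s is an assumption:

```python
import numpy as np

c = 343.0     # speed of sound [m/s] (assumption)
d = 0.06      # array spacing [m]
M_dft = 256   # size of the spatial DFT, as stated for Fig. 12

# Spatial frequency axis in rad/m; its extremes are the spatial
# Nyquist wavenumbers +/- pi/d.
kx = 2.0 * np.pi * np.fft.fftfreq(M_dft, d)

def evanescent_mask(f_hz):
    # Components with |kx| greater than omega/c are evanescent; the
    # boundary is the red line described in the text.
    return np.abs(kx) > 2.0 * np.pi * f_hz / c

print(np.abs(kx).max())                         # about 52.4 rad/m (= pi/d)
print(np.count_nonzero(evanescent_mask(1000)))  # evanescent bins at 1 kHz
```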
Fig. 11. Relation between condition number in dB and frequency of source
signal.
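The condition numbers plotted in Fig. 11 follow the definitions recalled in the text: the ratio of the largest to smallest eigenvalue (or singular value) of the matrix being inverted. A sketch with a random placeholder matrix; the actual transfer-function matrix of a near-collinear layout would be far more ill-conditioned:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64
# Random placeholder for the transfer-function matrix (illustrative only).
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

# Condition number as the ratio of largest to smallest singular value.
s = np.linalg.svd(G, compute_uv=False)
assert np.isclose(s[0] / s[-1], np.linalg.cond(G))

# Tikhonov regularization shifts every eigenvalue of G^H G up by delta,
# so the matrix actually inverted in the regularized LS method is better
# conditioned than G^H G itself.
delta = 1.0
A = G.conj().T @ G + delta * np.eye(M)
assert np.isclose(np.linalg.cond(A),
                  (s[0] ** 2 + delta) / (s[-1] ** 2 + delta), rtol=1e-4)
assert np.linalg.cond(A) < np.linalg.cond(G.conj().T @ G)
```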
the proposed method were much smaller than those of the LS methods.

Fig. 12(a) and (b) show the amplitude of the driving signals of the proposed and regularized LS methods in the spatio-temporal frequency domain, respectively. The horizontal and vertical axes indicate the spatial frequency in rad/m and the temporal frequency in Hz, respectively. The amplitudes were calculated with the spatial DFT, whose size was 256, and were normalized at 2000 Hz. The red line indicates the boundary between the propagating and evanescent components; the region below this line represents the evanescent components. Although the amplitude of the regularized LS method appears slightly concentrated compared to that of the proposed method in the region below the spatial Nyquist frequency, this is because the microphone array position of the regularized LS method was closer to the primary source than that of the proposed method. In the region above the spatial Nyquist frequency, the aliasing noise of the proposed method appears to be a simple replication of the spatial frequency spectra. On the contrary, that of the regularized LS method was very different from a simple replication of the spatial frequency spectra. This difference can be considered an effect of the off-diagonal components of the transform filter.

VII. CONCLUSION

An SP-DS conversion method for sound field reproduction was proposed. The transform filter that converts the received signals of the microphone array into the driving signals of the loudspeaker array, the WFR filter, was analytically derived based on the continuous WFR equation in the spatio-temporal frequency domain. The proposed WFR filter has many advantages in filter design, filter size, computational cost, and filter stability compared to the conventional LS-based transform filters. Numerical simulations were conducted to compare the proposed method with the LS and regularized LS methods. Almost the same reproduction performance as that of the LS methods was achieved by using the proposed method, but with higher filter stability. Moreover, the proposed method allows a variety of filter sizes to be used.

APPENDIX

The spatial Fourier transform of (6) with respect to the spatial coordinate is calculated by using [40, (3.876-1) and (3.876-2)]:

(35)

In a similar way, the spatial Fourier transform of (15) with respect to the spatial coordinate is calculated by using [40, (6.677-3) and (6.677-4)]:

(36)

REFERENCES

[1] M. A. Gerzon, “Periphony: With-height sound field reproduction,” J. Audio Eng. Soc., vol. 21, pp. 2–10, Jan. 1973.
[2] D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave sound field using an array of loudspeakers,” IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 697–707, Aug. 2001.

[3] J. Daniel, “Spatial sound encoding including near field effect: Introducing distance coding filters and a viable, new ambisonics format,” in Proc. 23rd Conf. AES, Copenhagen, Denmark, May 2003.
[4] M. Poletti, “Three-dimensional surround sound systems based on spherical harmonics,” J. Audio Eng. Soc., vol. 56, no. 11, pp. 1004–1025, 2005.
[5] J. Ahrens and S. Spors, “An analytical approach to sound field reproduction using circular and spherical loudspeaker distributions,” Acta Acustica united with Acustica, vol. 94, no. 6, pp. 988–999, 2008.
[6] Y. J. Wu and T. D. Abhayapala, “Theory and design of soundfield reproduction using continuous loudspeaker concept,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 107–116, 2009.
[7] A. J. Berkhout, D. D. Vries, and P. Vogel, “Acoustic control by wave field synthesis,” J. Acoust. Soc. Amer., vol. 93, no. 5, pp. 2764–2778, 1993.
[8] E. Verheijen, “Sound field reproduction by wave field synthesis,” Ph.D. dissertation, Delft Univ. of Technol., Delft, The Netherlands, 1997.
[9] D. D. Vries, Wave Field Synthesis, ser. AES Monograph. Audio Eng. Soc., 2009.
[10] S. Spors, H. Teutsch, A. Kuntz, and R. Rabenstein, “Sound field synthesis,” in Audio Signal Processing for Next-Generation Multimedia Communication Systems, Y. Huang and J. Benesty, Eds. Norwell, MA: Kluwer, 2004, ch. 12.
[11] S. Spors, R. Rabenstein, and J. Ahrens, “The theory of wave field synthesis revisited,” in Proc. 124th Conv. AES, Amsterdam, The Netherlands, Oct. 2008.
[12] J. Ahrens and S. Spors, “Sound field reproduction using planar and linear arrays of loudspeakers,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2038–2050, Nov. 2010.
[13] J. Ahrens and S. Spors, “Wave field synthesis of a sound field described by spherical harmonics expansion coefficients,” J. Acoust. Soc. Amer., vol. 131, no. 3, pp. 2190–2199, 2012.
[14] E. Hulsebos, D. Vries, and E. Bourdillat, “Improved microphone array configuration for auralization of sound fields by wave field synthesis,” J. Audio Eng. Soc., vol. 50, no. 10, pp. 779–790, 2002.
[15] S. Koyama, K. Furuya, Y. Hiwasaki, and Y. Haneda, “Reproducing virtual sound sources in front of a loudspeaker array using inverse wave propagator,” IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 6, pp. 1746–1758, 2012.
[16] O. Kirkeby and P. A. Nelson, “Reproduction of plane wave sound fields,” J. Acoust. Soc. Amer., vol. 94, no. 5, pp. 2992–3000, 1993.
[17] P. A. Nelson, “Active control of acoustic fields and the reproduction of sound,” J. Sound Vibr., vol. 177, no. 4, pp. 447–477, 1993.
[18] O. Kirkeby, P. A. Nelson, F. O. Bustamante, and H. Hamada, “Local sound field reproduction using digital signal processing,” J. Acoust. Soc. Amer., vol. 100, no. 3, pp. 1584–1593, 1996.
[19] S. Ise, “A principle of sound field control based on the Kirchhoff-Helmholtz integral equation and the theory of inverse systems,” Acta Acustica united with Acustica, vol. 85, no. 1, pp. 78–87, 1999.
[20] P. A. Gauthier and A. Berry, “Sound-field reproduction in-room using optimal control techniques: Simulations in the frequency domain,” J. Acoust. Soc. Amer., vol. 117, no. 2, pp. 662–678, 2005.
[21] N. Kamado, H. Hokari, S. Shimada, H. Saruwatari, and K. Shikano, “Sound field reproduction by wavefront synthesis using directly aligned multi point control,” in Proc. 40th Conf. AES, Tokyo, Japan, Oct. 2010.
[22] G. N. Lilis, D. Angelosante, and G. B. Giannakis, “Sound field reproduction using the lasso,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 1902–1912, Nov. 2010.
[23] M. Kolundžija, C. Faller, and M. Vetterli, “Reproducing sound fields using MIMO acoustic channel inversion,” J. Audio Eng. Soc., vol. 59, no. 10, pp. 721–734, 2011.
[24] J. J. López, A. González, and L. Fuster, “Room compensation in wave field synthesis by means of multichannel inversion,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), New Paltz, NY, Oct. 2005, pp. 146–149.
[25] E. Corteel, “Equalization in an extended area using multichannel inversion and wave field synthesis,” J. Audio Eng. Soc., vol. 54, pp. 1140–1161, 2006.
[26] S. Spors, H. Buchner, and R. Rabenstein, “Active listening room compensation for massive multichannel sound reproduction systems using wave-domain adaptive filtering,” J. Acoust. Soc. Amer., vol. 54, no. 12, pp. 354–369, 2007.
[27] S. Koyama, K. Furuya, Y. Hiwasaki, and Y. Haneda, “Design of transform filter for sound field reproduction using microphone array and loudspeaker array,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), New Paltz, NY, Oct. 2011, pp. 5–8.
[28] S. Koyama, K. Furuya, Y. Hiwasaki, and Y. Haneda, “Sound field recording and reproduction using transform filter designed in spatio-temporal frequency domain,” in Proc. 131st Conv. AES, New York, Oct. 2011.
[29] J. Ahrens, Analytic Methods of Sound Field Synthesis. New York: Springer-Verlag, 2010.
[30] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. New York: Academic, 1999.
[31] F. Fazi, P. Nelson, and R. Potthast, “Analogies and differences between three methods for sound field reproduction,” in Proc. Ambisonics Symp., Graz, Austria, Jun. 25–27, 2009.
[32] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA: MIT Press, 1996.
[33] D. E. Amos, “A portable package for Bessel functions of a complex argument and nonnegative order,” ACM Trans. Math. Software, vol. 12, no. 3, pp. 265–273, 1986.
[34] S. Spors and J. Ahrens, “A comparison of wave field synthesis and higher order ambisonics with respect to physical properties and spatial sampling,” in Proc. 125th Conv. AES, San Francisco, CA, Oct. 2008.
[35] M. Miyoshi and Y. Kaneda, “Inverse filtering of room acoustics,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 2, pp. 145–152, Feb. 1988.
[36] F. Asano, Y. Suzuki, and T. Sone, “Sound equalization using derivative constraints,” Acta Acustica united with Acustica, vol. 82, no. 2, pp. 311–320, 1996.
[37] P. C. Hansen and D. P. O’Leary, “The use of the L-curve in the regularization of discrete ill-posed problems,” SIAM J. Sci. Comput., vol. 14, no. 6, pp. 1487–1503, 1993.
[38] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009.
[39] O. K. Ersoy, Diffraction, Fourier Optics and Imaging. New York: Wiley, 2006.
[40] I. Gradshteyn and I. Ryzhik, Table of Integrals, Series, and Products. New York: Academic, 2007.

Shoichi Koyama (M’10) received the B.E. and M.E. degrees in mathematical engineering and information physics from the University of Tokyo, Tokyo, Japan, in 2007 and 2009, respectively.
He joined Nippon Telegraph and Telephone Corporation (NTT) in 2009. He is now a Researcher at NTT Media Intelligence Laboratories, Tokyo, Japan. His research interests include acoustic signal processing, and sound field analysis and reproduction.
Mr. Koyama is a member of the Audio Engineering Society (AES) and the Acoustical Society of Japan (ASJ). He was awarded the Young Researcher Award on Measurement Division by the Society of Instrument and Control Engineers (SICE) in 2009, the Best Young Researcher Paper Award on Sensors and Micromachines Society by the Institute of Electrical Engineers of Japan (IEEJ) in 2010, and the Awaya Prize Young Researcher Award by ASJ in 2011.

Ken’ichi Furuya (M’96–SM’10) received his B.E. and M.E. degrees in acoustic design from Kyushu Institute of Design, Fukuoka, Japan, in 1985 and 1987, and his Ph.D. degree from Kyushu University, Japan, in 2005.
From 1987 to 2012, he was with the laboratories of Nippon Telegraph and Telephone Corporation (NTT), Tokyo, Japan. In 2012, he joined the Department of Computer Science and Intelligent Systems of Oita University, Oita, Japan, where he is currently a Professor. His current research interests include signal processing in acoustic engineering.
Dr. Furuya was awarded the Sato Prize by the Acoustical Society of Japan (ASJ) in 1991. He is also a member of the Acoustical Society of Japan, the Acoustical Society of America, and IEICE.

Yusuke Hiwasaki (M’96) obtained B.E. in instrumentation engineering, M.E., and Ph.D. degrees in computer science from Keio University, Yokohama, Japan, in 1993, 1995, and 2006, respectively.
Since joining NTT in 1995, he has been engaged in the research field of low bit-rate speech coding and voice-over-IP telephony. From 2001 to 2002, he was a guest researcher at Kungliga Tekniska Högskolan (Royal Institute of Technology) in Sweden. He now works as a senior research engineer, supervisor, at NTT. Since 2007, he has been active in standardization of speech coding, especially at ITU-T Study Group 16 (SG16), and acted as the editor of Recommendation ITU-T G.711.1. From 2009, he was Associate Rapporteur of ITU-T SG16 Q.10, a question on speech coding matters, and then became Rapporteur in 2011.
Dr. Hiwasaki received the Technology Development Award from the Acoustical Society of Japan, the Best Paper Award from the IEICE Communications Society, the IEICE Achievement Award, and the Teishin Association Maejima Award in 2006, 2006, 2009, and 2010, respectively. He is a member of IEEE, IEICE, and the Acoustical Society of Japan.

Yoichi Haneda (M’97–SM’06) received his B.S., M.S., and Ph.D. degrees from Tohoku University, Sendai, in 1987, 1989, and 1999.
He is a Professor at the University of Electro-Communications, Tokyo, Japan. From 1989 to 2012, he was with the Nippon Telegraph and Telephone Corporation (NTT), Japan. In 2012, he joined the University of Electro-Communications. His research interests include the modeling of acoustic transfer functions, microphone arrays, loudspeaker arrays, and acoustic echo cancellers.
He received paper awards from the Acoustical Society of Japan (ASJ) and from the Institute of Electronics, Information, and Communication Engineers (IEICE) of Japan in 2002. Dr. Haneda is a senior member of IEICE and a member of the Acoustical Society of Japan.
