Improvement in Outdoor Sound Source Detection Using A Quadrotor-Embedded Microphone Array

This document describes a study that used a quadrotor equipped with a microphone array to improve outdoor sound source detection. The researchers proposed two new methods: 1) iGSVD-MUSIC, which uses incremental generalized singular value decomposition to detect sound sources in real-time with low computational cost by adapting to dynamically changing noise. 2) Correlation Matrix Scaling (CMS), which softens the whitening process to be more robust to errors from imperfect noise correlation matrix estimation, reducing missed and extra detections of sound sources. Experiments showed the combination of these two new methods significantly improved sound source detection performance compared to previous methods.

Uploaded by

OSCARIN1001

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

67 views

Improvement in Outdoor Sound Source Detection Using A Quadrotor-Embedded Microphone Array

Uploaded by

OSCARIN1001

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

2014 IEEE/RSJ International Conference on

Intelligent Robots and Systems (IROS 2014)

September 14-18, 2014, Chicago, IL, USA

Improvement in Outdoor Sound Source Detection

Using a Quadrotor-Embedded Microphone Array
Takuma Ohata, Keisuke Nakamura, Takeshi Mizumoto, Tezuka Taiki, and Kazuhiro Nakadai

Abstract— This paper addresses sound source detection in array which consists of a small and lightweight microphone
an outdoor environment using a quadrotor with a microphone and multichannel A/D converter [5]. Since dynamically-
array. Since the previously reported method has a high compu- changing high power noise is generated due to the rotation
tational cost, we proposed a sound source detection algorithm
called MUltiple SIgnal Classification based on incremental of propellers and wind, they proposed incremental Gen-
Generalized Singular Value Decomposition (iGSVD-MUSIC), eralized EigenValue Decomposition-based Multiple Signal
which detects sound source location and temporal activity with Classification (iGEVD-MUSIC). It originated from GEVD-
low computational cost. In addition, to relax an over-esitimation MUSIC proposed by Nakamura et al. [6] as an extension of
problem of noise correlation matrix which is used in iGSVD- a well-known adaptive beamforming algorithm, MUSIC [7].
MUSIC, we proposed Correlation Matrix Scaling (CMS), which
realizes soft whitening of noise. The protptype system based GEVD-MUSIC can localize sound sources by whitening high
on the proposed methods were evaluated with two types of power noise using a noise correlation matrix which is esti-
microphone arrays in an outdoor environment. Experimental mated in advance, and thus it shows high noise robustness.
results showed that the combination of iGSVD-MUSIC and On the other hand, such a pre-estimated noise correlation
CMS improves sound source detection performance drastically matrix results in a lack of robustness for dynamically-
and achieves real-time processing.
Index Terms— robot audition, speech detection, sound source changing noise. Since iGEVD-MUSIC can incrementally and
localization, sound source separation adaptively estimate the noise correlation matrix, it is robust
for dynamically-changing high power noise.
I. I NTRODUCTION Since iGEVD-MUSIC simply used sound signals recorded
a few seconds before the current processing time for noise
Computational Auditory Scene Analysis (CASA) in an
correlation matrix estimation, Furukawa et al. improved it
outdoor environment has been actively studied in the past
using motion information obtained from Inertial Measure-
few years. This is caused by the fact that real-time signal and
ment Unit (IMU) of their quadrotor [8].
speech processing for a robot in an indoor environment was However, these studies still have two issues as follows:
established for over ten years of robot audition research1 ,
• They assume off-line processing, and computational
and that software environments for robot technology became
costs of GEVD is too expensive to perform real-time
available such as HARK (HRI-JP Audition for Robots with
processing.
Kyoto Univ.) [1] and ROS2 . For instance, Sasaki et al. re-
• Since perfect estimation of a noise correlation matrix
ported sound source identification using a mobile robot with
is difficult, it is inevitable that the whitening process
a 32 ch microphone array for anomaly sound detection [2].
in GEVD produces errors, which leads to performance
Bando et al. studied posture estimation of a hose-type robot
degradation.
by using robot’s embedded microphones and speakers in a
For the first issue, we propose MUltiple SIgnal Classi-
SLAM manner, which is an essential function to determine
fication based on incremental Generalized Singular Value
the position of any person buried in rubble [3].
Decomposition (iGSVD-MUSIC), which is an extension
Sound Source Detection (SSD), which localizes sound
of GSVD-MUSIC [9]. Likewise GEVD-MUSIC, GSVD-
sources with sound activity detection using an Unmanned
MUSIC can whiten noise using a noise correlation matrix,
Aerial Vehicle (UAV) is also reported. It has been studied
and drastically reduce the computational cost while main-
for military use utilizing expensive sensors such as Acoustic
taining the performance of sound source localization.
Vector Sensor (AVS) [4], and thus focused only on detection
For the second issue, we propose Correlation Matrix
of high power sound sources such as tanks and airplanes.
Scaling (CMS) which realizes soft whitening by raising the
Another approach, Okutani et al. recently reported SSD
singular values obtained from the noise correlation matrix to
using a Parrot AR.Drone by installing an 8 ch microphone
the power of α(0 ≤ α ≤ 1). The noise correlation matrix is
T. Ohata, T. Taiki and K. Nakadai are with Graduate School sometimes over-estimated, which degrades the performance
of Information Science and Engineering, Tokyo Institute of Tech- of localization drastically. For instance, the over-estimated
nology, 2-12-1, O-okayama, Meguro-ku, Tokyo, 152-8552, JAPAN.
[email protected] noise correlation matrix whitens noise sources as well as
K. Nakamura, T. Mizumoto, K. Nakadai are with Honda Research target sound sources, and generates ghost sound sources,
Insititute Japan Co., Ltd., 8-1 Honcho, Wako, Saitama 351-0114, JAPAN. which causes miss and extra-detection of sound sources.
{keisuke,t.mizumoto,nakadai}@jp.honda-ri.com
1 Organized sessions on robot audition have been continuously held at Because CMS makes the whitening effect softer using the
IROS 2004-2013 scaling parameter α, it avoids the performance degradation
2 http://www.ros.org even when over-estimation occurs.

978-1-4799-6934-0/14/$31.00 ©2014 IEEE 1902

We integrated iGSVD-MUSIC and CMS into a prototype The MUSIC spatial spectrum P (ω, ψ, f ) is calculated
SSD system, and the prototype system showed that the SSD using steering vector G(ω, ψ) as
performance drastically improved, and its processing speed |G∗ (ω, ψ)G(ω, ψ)|
was fast enough to realize real-time processing. P (ω, ψ, f ) = ∑M , (4)
The rest of the paper is organized as follows: Sections m=L+1 |G∗ (ω, ψ)em (ω, ψ)|
II and III propose iGSVD-MUSIC and CMS, respectively. where ψ is a sound source direction, and L is the number
Section IV constructs a prototype system of sound source of target sound sources. em shows the m-th singular vector
detection. Section V evaluates the proposed methods through included in El .
the constructed prototype system. The last section concludes Since noise is whitened, singular values corresponding
the paper. to the target sound sources have bigger values than those
II. S OUND S OURCE L OCALIZATION USING corresponding to noise sources (see [6] for more details).
I GSVD-MUSIC
Thus, the first L steering vectors in El correspond to target
sound sources, and others are related to noise sources. When
This section first introduces iGSVD-MUSIC to deal with ψ shows a sound source direction, an inner product of
dynamically-changing noise. We also compare iGSVD- G∗ (ω, ψ) and em goes to 0 for every m ≥ L + 1 because
MUSIC and iGEVD-MUSIC in terms of their computational signal and noise vectors are orthogonal to each other in the
costs and performance. As mentioned, iGSVD-MUSIC is the signal space spanned by El . This means that the denominator
incremental processing version of GSVD-MUSIC, which has of Eq. (4) becomes 0, and thus a sharp peak is formed for
low computational cost and comparable performance with the sound source direction ψ in P (ω, ψ, f ).
iGEVD-MUSIC, which is our previously-proposed method. After that, P (ω, ψ, f ) is averaged on ω denoted as
A. iGSVD-MUSIC 1 ∑
ωH

Let X(ω, f ) be the observed signal vector at the f -th P̄ (ψ, f ) = P (ω, ψ, f ), (5)
ωH − ωL + 1 ω=ω
frame at the ω-th frequency bin. The correlation matrix L

R(ω, f ) is obtained from X(ω, f ) by where ωH , and ωL the highest and lowest boundary of the
R −1
frequency bin, respectively.
1 ∑
f +T
R(ω, f ) = X(ω, τ )X ∗ (ω, τ ), (1) Sound source direction is estimated as ψ which has a peak
TR exceeding a threshold Pth in P̄ (ψ, f ).
τ =f

where TR is the number of frames for temporal integration B. Comparison of iGSVD-MUSIC and iGEVD-MUSIC
to calculate the correlation matrix. The computational cost and the performance between
For the f -th frame signal, iGSVD-MUSIC assumes that iGSVD-MUSIC and iGEVD-MUSIC are discussed. GEVD-
TN frames with the boundary of f − fs − TN to f − fs MUSIC uses generalized eigenvalue decomposition instead
include only noise signals, and the noise correlation matrix of generalized singular value decomposition in Eq. (3).
K(ω, f ) is defined by, However, it is not guaranteed that K −1 (ω, f )R(ω, f ) is a
−fs
f∑ Hermitian matrix. This means that eigen vectors obtained
1
K(ω, f ) = X(ω, τ )X ∗ (ω, τ ) , (2) by GEVD are not always orthogonal to each other. This
TN leads to performance degradation because MUSIC-based
τ =f −fs −TN
algorithms assume that eigen (or singular) vectors are or-
The problem with the original GSVD-MUSIC is that it
thogonal as mentioned above. iGEVD-MUSIC, therefore,
uses the fixed noise correlation matrix obtained from the
replaces K −1 (ω, f )R(ω, f ) with K − 2 RK − 2 to solve this
1 1

pre-defined noise signals, and thus it has a difficulty in

problem. However, the computational cost to calculate K − 2
1

coping with dynamically changing noise. Likewise iGEVD-

is quite expensive, and it is difficult to achieve real-time
MUSIC, it has an incremental mechanizm to estimate the
processing.
noise correlation matrix, and it is expected to be able to deal
On the other hand, GSVD guarantees that singular vectors
with dynamically changing noise. The assumption that the
are mutually orthogonal for a non-Hermitian matrix, and
TN frames include only noise signals has a risk that these
there is no problem to use K −1 (ω, f )R(ω, f ) directly.
frames also include target signals. This risk can be minimized
Because it is not necessary for GSVD to calculate K − 2 and
1

by controlling two parameters fs and TN . The optimization

the computational cost of GSVD is slightly lower than that of
of these parameters are discussed with iGEVD-MUSIC [5].
GEVD, iGSVD-MUSIC drastically reduces its computational
The whitening process is performed by multiplying R by
cost compared to iGEVD-MUSIC.
an inverse matrix of K. GSVD is performed for the whitened
Let us consider that the observed signal vector X is
correlation matrix, K −1 (ω, f )R(ω, f ) defined by,
defined by
K −1 (ω, f )R(ω, f ) = El (ω, f )Λ(ω, f )Er∗ (ω, f ), (3) X = AS + N (6)
where Λ(ω, f ) is a matrix whose diagonal elements are where A is a transfer function between a sound source and a
singular values in descending order. El (ω, f ), and Er (ω, f ) microphone array, S is a signal from a sound source, and N
are matrices consisting of singular vectors. is a noise vector. We assume that S and N are uncorrelated

1903
in this model, which is a common assumption in the field of where Λ is a diagonal matrix consisting of eigenvalues, and
signal processing. ω and f are omitted for simplicity. R is E is an eigen vector matrix.
represented by Λ specifies the power of each eigen vectors, and E does
the direction of each eigen vectors in the noise subspace.
R = XX ∗ = ASS ∗ A∗ + N N ∗ (7) To fulfill the above requirements, K α is defined by by
= Γ+K (8) controlling Λ.
In iGEVD-MUSIC, the noise is whitened as I. Kα = EΛα E ∗ , (16)
K − 12
RK − 21
= K − 12
(Γ + K) K − 12
(9) Λα = diag(λα α
1 , ..., λM ) (17)
− 12 − 12
= K ΓK + I. (10) where we call α (0 ≤ α ≤ 1) the scaling parameter of CMS.
CMS replaces K with K α in Eq. (3). When α is 1, K α is
In iGSVD-MUSIC, Eq. (3) is transformed by multiplying
identical to K, which is the same as iGSVD-MUSIC without
(K −1 R)∗ as
CMS. When α is 0, K α becomes I, which means no whiten-
K −1 R(K −1 R)∗ = El ΛEr∗ (El ΛEr∗ )∗ , (11) ing is performed like SEVD. By changing α, the level of
K −1 2
R K −1
= El Λ 2
El∗ . (12) soft whitening can be controlled. Since SEVD is performed
every time after estimating K, higher computational cost is
This is equivalent to GEVD of R2 , and thus El is regarded as necessary. This issue is discussed in the later sections.
an eigen vector matrix obtained from GEVD of R2 . Eq. (12) Note that CMS can be applied to any whitening-based
is able to be re-written using Eq. (8) as MUSIC such as iGSVD-MUSIC, iGEVD-MUSIC, GSVD-
∗ MUSIC, and GEVD-MUSIC.
K −1 R2 K −1 = K −1 (Γ + K) (Γ + K) K −1 (13)
= K −1 Γ2 K −1 + K −1 Γ IV. S YSTEM A RCHITECTURE
+ΓK −1 + I (14) Fig. 1 shows the system architecture for SSD based
on the proposed method. It consists of a quadrotor with
The noise factor K is successfully whitened as I ac- microphones and a laptop PC where SSD is running. They
cording to the fourth term in the right side of Eq. (14) , are described in the following sections.
but the second and third terms remain, which degrades the
performance. Therefore, the performance of iGSVD-MUSIC A. Quadrotor and Microphone Layout
should be poorer than that of iGEVD-MUSIC. The effect We used AscTech Pelican for our quadrotor, which has a
of these terms for degradation should be investigated in the minimum take-off weight and maximum payload of 630 g
later sections. and 650 g respectively. It can fly up to 15 min with a
In this paper, we call the original MUSIC Standard Eigen- full payload. Since we can control it and obtain sensor
Value Decomposition MUSIC (SEVD-MUSIC). It directly information of IMU, and GPS using ROS, it is convenient for
uses R(ω, f ) to perform SEVD without whitening, and thus research purposes. We installed a microphone array on our
the noise robustness is lower than GEVD and GSVD. In Pelican, sensory data from IMU and GPS, and 16 ch audio
particular, when the noise is of higher power than the target are recorded synchronously and all data is sent to a remote
signal, the assumption that larger eigenvalues correspond to laptop PC via a wireless network.
the target sound sources does not hold, and sound source We constructed two types of microphone arrays shown
localization deteriorates. in Fig. 2. Both consist of a multi-channel A/D converter
RASP and 16 MEMS microphones developed by Systems in
III. C ORRELATION M ATRIX S CALING Frontier Inc. on our quadrotor.
A key factor for iGSVD-MUSIC is estimation perfor- Fig. 2a) shows a microphone array with a semi-sphere
mance of the noise correlation matrix. Since it is impossible shape which is made of Styrofoam and 16 microphones are
to have perfect estimation in a real environment, the more attached to the surface at the positions of the black hair
precisely we estimate it, the more risk we have for over- shown in Fig. 2b). This layout is designed mainly to detect
estimation. sound sources under the quadrotor. We think that this layout
The idea of CMS is to soften the whitening effect to avoid is beneficial in practical use, because the distance between
errors caused by over-estimation of the noise correlation the quadrotor and such sound sources are relatively short,
matrix. To do this, we have to control the noise subspace and thus Signal-to-Noise Ratio (SNR) is expected to be high.
spanned by noise singular (or eigen) vectors so that only Fig. 2c) shows another microphone array which has a
the level of whitening can be changing without modifying circular layout. In signal processing, it is known that the
changing the direction of the noise singular vectors. larger microphone array gives a sharper mainlobe, i.e., better
Let K be the estimated noise correlation matrix in resolution in sound source localization. We designed this
iGSVD-MUSIC. Since K is an Hermitian matrix, Standard microphone array to have a large diameter so that the
EigenValue Decomposition (SEVD) is performed by resolution can be as high as possible.
Both microphone arrays are used in an outdoor environ-
K = EΛE ∗ , (15) ment, where wind noise is a big problem. To solve this

1904
Wireless Network Sound Source Detection
IMU, GPS Peak
Correlation Sound
16 ch audio data GSVD
Matrix Estimation Extraction Tracking
16 ch audio Frequency
Analysis (STFT) Incremental Correlation
AscTech Pelican Noise Correlation Matrix Scaling sound source diretion
Matrix Estimation (CMS) utterance start time
utterance duration

ROS HARK
Fig. 1. System Architecture for Sound Source Detection Using Quadrotorn

a) hemispherical layout(profile) b) hemispherical layout (underside) c) circular layout

Fig. 2. Microphone Array Layout

problem, we put wind protection at the position of each of the extracted peaks. The basic algorithm of sound tracking
microphone. Although various materials such as sponge is simple; two soccessive peaks with the same direction are
rubber and cloth are used for wind protection, we selected a connected.
hair-type material. It is known as one of the most effective
V. E VALUATION
materials because the wind power is absorbed by hair waving
motions. In addition, the motion of the hairs does not produce We evaluated the proposed methods using the constructed
acoustic noise. prototype SSD system. Two types of microphone arrays
are used to investigate general applicability of the proposed
B. Software Implementation methods, and also to explore the best design of the micro-
We used HARK3 [1] for software implementation of phone array. Evaluation tasks are sound source detection of
acoustic signal processing. Since HARK is able to be in- speech sources in an outdoor environment.
tegrated with ROS-based systems in a seamless manner, we Fig. 3 shows the situation of the experiments. The height
easily integrated the data recording module developed with of the quadrotor (h) was fixed to one of 0 m, 1 m, 2.7 m, 4 m,
ROS and the acoustic signal processing module developed and the distance to the speaker (d) was fixed to one of 1 m,
with HARK. Since we focus on sound source detection in 2 m, 3 m for each condition. Since two microphone arrays
this paper, Fig. 1 illustrates only sound source detection. were also investigated, in total 24 conditions (4 × 3 × 2)
IMU and GPS information can be used for the coordinate were recorded. In one condition, 7-9 trials were included, i.e.,
conversion for our future research purposes. a speaker made 7-9 utterances. In each condition, we con-
Sound source detection consists of seven modules such as ducted five SSD methods, that is, SEVD-MUSIC, iGEVD-
frequency analysis, correlation matrix estimation, incremen- MUSIC, iGEVD-MUSIC with CMS iGSVD-MUSIC, and
tal noise correlation matrix estimation, CMS, GSVD, peak iGSVD-MUSIC with CMS.
extraction and sound tracking. The frequency analysis mod- As for metrics, we used Localization Correct Rate (LCR)
ule simply performs short time Fourier transformation with and Localization Accuracy Rate (LAR) which were utterance-
a 32 ms window and a 10 ms window shift for 16 ch audio based metrics proposed by Okutani et al. [5]. The LCR and
data sampled at 16 kHz. The correlation matrix estimation LAR are defined by
module calculates the correlation matrix for the current time Nt − NS − ND − NI
frame according to Eq. (1). The incremental noise correlation LAR = (18)
Nt
matrix is described as Eq. (2), which is followed by the CMS Nt − NS − ND
module defined in Eq. (16). The peak extraction module LCR = (19)
Nt
extracts peaks from theMUSIC spectrum defined by Eq. (5)
where Nt is the total number of utterances, NS is the number
with threshold parameter Pth and sends them to the sound
of detected but mis-localized utterances (substitution). ND
tracking module. The sound tracking module forms a sound
and NI are the number of undetected utterances (deletion)
stream corresponding to an utterance as a temporal sequence
and extra-detected utterances (insertion). Note that we de-
3 http://www.hark.jp/ cided that an utterance is successfully-detected when its

1905
TABLE I
R ESULT OF SSD P ERFORMANCE
a) SEVD-MUSIC
semi-sphere circular
LCR [%] h [m] h [m]
0 1.0 2.7 4.0 0 1.0 2.7 4.0
1.0 50 100 44 11 63 50 67 33
d 2.0 75 13 33 33 63 100 89 44
[m] 3.0 63 13 22 11 88 25 67 56
LAR [%] h [m] h [m]
0 1.0 2.7 4.0 0 1.0 2.7 4.0
1.0 0 100 44 -33 38 25 44 -44
d 2.0 25 -100 -56 -67 25 100 44 -12
[m] 3.0 13 -50 -22 -33 75 -25 33 -12
b) iGEVD-MUSIC
semi-sphere circular
LCR [%] h [m] h [m]
0 1.0 2.7 4.0 0 1.0 2.7 4.0
1.0 100 100 66 33 100 63 100 44
d 2.0 88 100 88 89 100 100 67 67
[m] 3.0 88 100 100 100 100 87 78 67
Fig. 3. Saturation of Experiments LAR [%] h [m] h [m]
0 1.0 2.7 4.0 0 1.0 2.7 4.0
temporal error is within 0.5 s, and localization error is within 1.0 100 100 56 33 100 50 100 33
d 2.0 75 87 78 89 100 100 67 56
10◦ compared to the corresponding reference data. [m] 3.0 75 100 100 100 100 75 78 56
We also used Real Time Factor (RTF) to compare the c) iGSVD-MUSIC
computational costs of these methods. It is defined by semi-sphere circular
LCR [%] h [m] h [m]
0 1.0 2.7 4.0 0 1.0 2.7 4.0
1 ∑
Nt
Tp (t) 1.0 13 13 33 22 100 25 78 33
RT F = (20) d 2.0 13 50 33 67 100 100 33 44
Nt t Ta (t) [m] 3.0 13 25 44 44 100 63 44 33
LAR [%] h [m] h [m]
where Tp and Ta are the processed time for the recorded file 0 1.0 2.7 4.0 0 1.0 2.7 4.0
1.0 13 -138 11 -11 100 0 78 -22
at the t-th condition and the whole length of the file. Nt is d 2.0 -25 25 22 44 100 100 11 33
the number of conditions. [m] 3.0 -50 25 11 -56 100 50 44 0
Real-time processing is achieved for RTF ≤ 1. In this pa- d) iGEVD-MUSIC w/ CMS
semi-sphere circular
per, we used a laptop PC with Intel(R) Core(TM) i7-2920XM LCR [%] h [m] h [m]
(2.5GHz – 3.5GHz) and 16GB of memory. Although it has 0 1.0 2.7 4.0 0 1.0 2.7 4.0
1.0 100 100 100 89 100 63 100 44
4 cores with hyper-threading, we just used one core without d 2.0 100 100 88 56 100 100 89 56
hyper-threading for RTF measurement. [m] 3.0 100 100 100 67 100 100 78 89
LAR [%] h [m] h [m]
There are several system parameters which needed to be 0 1.0 2.7 4.0 0 1.0 2.7 4.0
set in advance. TR in Eq. (1), TN and fs in Eq. (2) were 1.0 100 100 100 89 100 63 100 33
d 2.0 100 100 67 11 100 100 78 56
empirically set to 500 ms, 900 ms, and 1400 ms, respectively [m] 3.0 88 100 89 -44 100 63 78 67
since these values show the best performance in our rough e) iGSVD-MUSIC w/ CMS
parameter search. MUSIC-based algorithms require the num- semi-sphere circular
LCR [%] h [m] h [m]
ber of sound sources denoted by L in Eq. (4). Since only 0 1.0 2.7 4.0 0 1.0 2.7 4.0
one person uttered in every condition, L was set to 1. CMS 1.0 100 100 89 67 100 63 100 33
d 2.0 100 100 89 67 100 100 89 56
also has a scaling parameter α. We empirically set this [m] 3.0 100 100 100 78 100 88 78 89
parameter to be 0.5, which can avoid most errors caused by LAR [%] h [m] h [m]
0 1.0 2.7 4.0 0 1.0 2.7 4.0
over-estimation of the noise correlation matrix. Automatic 1.0 100 100 89 67 100 63 100 11
optimization of this parameter is future work. d 2.0 100 100 67 33 100 100 78 56
[m] 3.0 88 100 100 -44 100 63 78 56
A. Results
TABLE II
Tab. I shows SSD performance in terms of LAR and
R EAL T IME FACTOR (RTF)
LCR for 24 conditions. Tab. I a)-e) shows results with method RTF
SEVD-MUSIC, iGEVD-MUSIC, iGEVD-MUSIC with CMS SEVD-MUSIC 0.48
iGSVD-MUSIC, and iGSVD-MUSIC with CMS, respec- iGEVD-MUSIC 1.02
iGEVD-MUSIC w/ CMS 1.56
tively. To understand the results visually, Fig. 4 depicts SSD iGSVD-MUSIC 0.37
results with the semi-sphere microphone array under the iGSVD-MUSIC w/ CMS 0.95
condition of h = 1, and d = 2. The vertical axis shows
sound source direction in the quadrotor coordinates, and the In Tab. I a), it is interesting to know SEVD-MUSIC
horizontal axis shows the time in each frame. Fig. 4a) shows showed good performance when sound sources are close to
reference data which is regarded as correct SSD. Black lines the microphone array, but it deteriorates when the distance
in Fig. 4b)-d) show the detected utterances, and the color is over 1.0 m, which is also seen in Fig. 4b). Tab. I b)
maps visualizes P̄ (ψ, f ) in Eq. (5). Tab. II shows RTFs and Fig. 4c) show that iGEVD-MUSIC improves the SSD
averaged over all conditions for these five methods. performance. Tab. I c) shows that iGSVD-MUSIC is not as

1906
MUSIC is light weight enough to achieve real-time process-
ing. Even when CMS is used on the top of iGSVD-MUSIC,
it still maintain real-time processing. Since its performance
is comparable with iGEVD-MUSIC with CMS, as a whole
it is the best method among these five methods for outdoor
SSD.
VI. C ONCLUSION
a) reference
This paper presents sound source detection using a quadro-
tor with a microphone array in an outdoor environment.
To reduce computational cost of microphone-array-based
sound source detection while maintaining performance, we
proposed MUltiple SIgnal Classification based on incre-
mental Generalized Singular Value Decomposition (iGSVD-
MUSIC). In addition, we proposed Correlation Matrix Scal-
ing (CMS) to improve SSD performance by using soft
whitening. The effectiveness of these proposed methods
b) SEVD-MUSIC are proved using sound data recorded with two types of
microphone arrays embedded in a quadrotor. Since the
system is fast enough to perform real-time processing and the
evaluation was done in an offline manner, we will implement
a real-time and online SSD system and evaluate it in a
realistic scenario in the near future.
ACKNOWLEDGMENTS
This research was partially supported by Grant-in-Aid for
c) iGEVD-MUSIC Scientific Research No. 24118702, and No. 24220006
R EFERENCES
[1] K. Nakadai, H. G. Okuno, H. Nakajima, Y. Hasegawa, and H. Tsu-
jino, “Design and implementation of robot audition system HARK,”
Advanced Robotics, vol. 24, no. 9, pp. 739–761, 2009.
[2] Y. Sasaki, N. Hatao, K. Yoshii, and S. Kagami, “Nested igmm
recognition and multiple hypothesis tracking of moving sound sources
for mobile robot audition,” in Proc. of the IEEE/RSJ International
Conference on Robots and Intelligent Systems (IROS), 2013, pp. 3930–
3936.
d) iGSVD-MUSIC with CMS [3] Y. Bando, T. Mizumoto, K. Itoyama, K. Nakadai, and H. G. Okuno,
“Posture estimation of hose-shaped robot using microphone array
Fig. 4. An example of SSD result (semi-sphere, h = 1, d = 2) localization,” in Proc. of the IEEE/RSJ International Conference on
Robots and Intelligent Systems (IROS), 2013, pp. 3446–3451.
effective as iGEVD-MUSIC, but slightly better than SEVD- [4] B. Kaushik, D. Nance, and K. K. Ahuj, “A review of the role of acoustic
sensors in the modern battlefield,” in 11th AIAA/CEAS Aeroacoustics
MUSIC. This is caused by the over-estimation of the noise Conference (26th AIAA Aeroacoustics Conference), 2005, pp. 1–13.
correlation matrix and cross term effect in Eq. (14). Tab. I d) [5] K. Okutani, T. Yoshida, K. Nakamura, and K. Nakadai, “Outdoor
shows the best performance among these five methods, which auditory scene analysis using a moving microphone array embedded
in a quadrocopter,” in IEEE/RSJ IROS, 2012, pp. 3288–3293.
means that CMS is quite a powerful method to improve [6] K. Nakamura, K. Nakadai, F. Asano, Y. Hasegawa, and H. Tsujino,
performance. Tab. I e) shows comparable performance to “Intelligent sound source localization for dynamic environments,” in
iGEVD-MUSIC with CMS (see also Fig. 4d)). CMS drasti- Proc. of IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems (IROS
2009), 2009, pp. 664–669.
cally improves the performance compared to iGSVD-MUSIC [7] R. Schmidt, “Multiple emitter location and signal parameter estimation,”
in Tab. I c). This suggests that the over-estimation problem IEEE Trans. on Antennas and Propagation, vol. 34, no. 3, pp. 276–280,
was severer than the cross term effect in iGSVD-MUSIC, and 1986.
[8] K. Furukawa, K. Okutani, K. Nagira, T. Otsuka, K. itoyama,
these problems were overcome by CMS. As for microphone K. Nakadai, and H. G. Okuno, “Noise correlation matrix estimation
layouts, they have different characteristics, but the same for improving sound source localization by multirotor uav,” in Proc.
tendency was observed between the five methods. of the IEEE/RSJ International Conference on Robots and Intelligent
Systems (IROS). IEEE, 2013, pp. 3943–3948.
As for RTF, Tab. II proves that iGEVD-MUSIC is com- [9] K. Nakamura, K. Nakadai, and G. Ince, “Real-time super-resolution
putationally expensive, and it is difficult to attain real-time sound source localization for robots,” in Proceedings of 2012 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS
processing although it shows good performance. Since CMS 2012). IEEE, Otc. 2012, pp. 694–699.
performs SEVD, we were afraid that it also requires high
computational power. Indeed RTF becomes larger, but it was
less critical than we expected. On the other hand, iGSVD-

1907

Microsoft PowerPoint - 3. ELMAG - 1 - Coordinate Systems and Transformation
No ratings yet
Microsoft PowerPoint - 3. ELMAG - 1 - Coordinate Systems and Transformation
44 pages
Maths Cheat Sheet
No ratings yet
Maths Cheat Sheet
1 page
Wang&Cavallaro (2018) Acoustic Sensing From A Multi-Rotor Drone
No ratings yet
Wang&Cavallaro (2018) Acoustic Sensing From A Multi-Rotor Drone
13 pages
Histogram of Gradients of Time-Frequency Representations For Audio Scene Detection
No ratings yet
Histogram of Gradients of Time-Frequency Representations For Audio Scene Detection
15 pages
Urban Sound Classification
No ratings yet
Urban Sound Classification
6 pages
08078461
No ratings yet
08078461
8 pages
Deep Convolutional Neural Networks For Environmental Sound Classification
No ratings yet
Deep Convolutional Neural Networks For Environmental Sound Classification
7 pages
Sound Event Detection For Human Safety and Security in Noisy Environments
No ratings yet
Sound Event Detection For Human Safety and Security in Noisy Environments
11 pages
Miron Et Al. - 2013 - An Open-Source Drum Transcription System For Pure Data and Max MSP
No ratings yet
Miron Et Al. - 2013 - An Open-Source Drum Transcription System For Pure Data and Max MSP
5 pages
Noise Control Paper
No ratings yet
Noise Control Paper
5 pages
Underwater Acoustic Target Classification Based On Dense Convolutional Neural Network
No ratings yet
Underwater Acoustic Target Classification Based On Dense Convolutional Neural Network
5 pages
A Comprehensive Analysis of Voice Activity Detection Algorithms For Robust Speech Recognition System Under Different Noisy Environment
No ratings yet
A Comprehensive Analysis of Voice Activity Detection Algorithms For Robust Speech Recognition System Under Different Noisy Environment
6 pages
Gravitational Search Algorithm For IIR Filter-Based Audio Equalization
No ratings yet
Gravitational Search Algorithm For IIR Filter-Based Audio Equalization
5 pages
3) Doa Est Array Geometry
No ratings yet
3) Doa Est Array Geometry
13 pages
FP Sinyal Fix
No ratings yet
FP Sinyal Fix
13 pages
AES 122 Paper
No ratings yet
AES 122 Paper
7 pages
SPARTA McCormackPolitis2019SpartaCompass
No ratings yet
SPARTA McCormackPolitis2019SpartaCompass
13 pages
Audio Fingerprinting Based On Multiple
No ratings yet
Audio Fingerprinting Based On Multiple
4 pages
Robotics Echolocation Test Platform
No ratings yet
Robotics Echolocation Test Platform
5 pages
Fujipress - JRM 29 1 4
No ratings yet
Fujipress - JRM 29 1 4
13 pages
Wang 2010
No ratings yet
Wang 2010
6 pages
Ray_acoustics_using_computer_graphics_technology
No ratings yet
Ray_acoustics_using_computer_graphics_technology
9 pages
A_Gated_Recurrent_Unit_Based_Robust_Voic
No ratings yet
A_Gated_Recurrent_Unit_Based_Robust_Voic
6 pages
Direction of Radio Finding Via MUSIC (Multiple Signal Classification) Algorithm For Hardware Design System
No ratings yet
Direction of Radio Finding Via MUSIC (Multiple Signal Classification) Algorithm For Hardware Design System
6 pages
Background CHI2020 - SAW Sensing CR
No ratings yet
Background CHI2020 - SAW Sensing CR
2 pages
FB Farmani Etal Feb2023 ML
No ratings yet
FB Farmani Etal Feb2023 ML
7 pages
An Experimental SDIF Sampler in MaxMSP
No ratings yet
An Experimental SDIF Sampler in MaxMSP
4 pages
Optical Measurement of Acoustic Drum Strike Locations: Janis Sokolovskis Andrew P. Mcpherson
No ratings yet
Optical Measurement of Acoustic Drum Strike Locations: Janis Sokolovskis Andrew P. Mcpherson
4 pages
ICASSP 2025 ImmersDiffusion Immersive Audio Latent Diffusion Generative Model
No ratings yet
ICASSP 2025 ImmersDiffusion Immersive Audio Latent Diffusion Generative Model
5 pages
Performance Analysis of Direction of Arrival Estimation Algorithms For Smart Antenna
No ratings yet
Performance Analysis of Direction of Arrival Estimation Algorithms For Smart Antenna
6 pages
Wild_Bird_Species_Identification_Based_on_a_Lightw
No ratings yet
Wild_Bird_Species_Identification_Based_on_a_Lightw
12 pages
Percussive Audio Mixing With Wave-U-Nets
No ratings yet
Percussive Audio Mixing With Wave-U-Nets
12 pages
3D Microphone Array Comparison: Objective Measurements: AES Fellow, and Dale Johnson
No ratings yet
3D Microphone Array Comparison: Objective Measurements: AES Fellow, and Dale Johnson
17 pages
Audio-Visual Cross-Attention Network For Robotic Speaker Tracking
No ratings yet
Audio-Visual Cross-Attention Network For Robotic Speaker Tracking
13 pages
Flicker Noise Mitigation in Direct-Conversion Receivers for OFDM Systems
No ratings yet
Flicker Noise Mitigation in Direct-Conversion Receivers for OFDM Systems
5 pages
ICAD Proceedings 15-Perez-Lopez
No ratings yet
ICAD Proceedings 15-Perez-Lopez
8 pages
Image de Noising
No ratings yet
Image de Noising
12 pages
Spectrogram Transformers for Audio Classification
No ratings yet
Spectrogram Transformers for Audio Classification
7 pages
On Pilot Symbol Assisted Carrier Synchro
No ratings yet
On Pilot Symbol Assisted Carrier Synchro
8 pages
A feasibility study for a hand-held acoustic imaging camera
No ratings yet
A feasibility study for a hand-held acoustic imaging camera
26 pages
Nietjet 0602S 2018 003
No ratings yet
Nietjet 0602S 2018 003
5 pages
Sensors: Design and Implementation of A Real-Time Multi-Beam Sonar System Based On FPGA and DSP
No ratings yet
Sensors: Design and Implementation of A Real-Time Multi-Beam Sonar System Based On FPGA and DSP
23 pages
Scream and Gunshot Detection and Localization For Audio-Surveillance Systems
No ratings yet
Scream and Gunshot Detection and Localization For Audio-Surveillance Systems
6 pages
1 s2.0 S0957417424006341 Main
No ratings yet
1 s2.0 S0957417424006341 Main
10 pages
AES136IRIS
No ratings yet
AES136IRIS
8 pages
Acoustic Predictions of High Power Sound Systems
No ratings yet
Acoustic Predictions of High Power Sound Systems
7 pages
Transform Wave Flaw WangYou
No ratings yet
Transform Wave Flaw WangYou
8 pages
Parameter Estimation With Bistatic MIMO Radar
No ratings yet
Parameter Estimation With Bistatic MIMO Radar
16 pages
Infocom16 Qian
No ratings yet
Infocom16 Qian
9 pages
Conference Paper: Audio Engineering Society
No ratings yet
Conference Paper: Audio Engineering Society
8 pages
Politecnico Di Torino Porto Institutional Repository
No ratings yet
Politecnico Di Torino Porto Institutional Repository
13 pages
2208.03688
No ratings yet
2208.03688
2 pages
Acoustic Deep Learning PDF
No ratings yet
Acoustic Deep Learning PDF
16 pages
An Object-Based Audio System For Interactive Broadcasting
No ratings yet
An Object-Based Audio System For Interactive Broadcasting
10 pages
Rudrich-TMT16-Efficient_Spatial_Ambisonic_Effects
No ratings yet
Rudrich-TMT16-Efficient_Spatial_Ambisonic_Effects
5 pages
Securing Audio Watermarking System Using Discrete Fourier Transform For Copyright Protection
No ratings yet
Securing Audio Watermarking System Using Discrete Fourier Transform For Copyright Protection
5 pages
968_1_10.0009350
No ratings yet
968_1_10.0009350
15 pages
Algo For DOA
No ratings yet
Algo For DOA
4 pages
Electronics_Letters_2024_Niu_Acoustic_echo_cancellation_based_on
No ratings yet
Electronics_Letters_2024_Niu_Acoustic_echo_cancellation_based_on
5 pages
Acoustic Signal Based Fault Detection On Belt Conveyor Idlers Using Machine Learning
No ratings yet
Acoustic Signal Based Fault Detection On Belt Conveyor Idlers Using Machine Learning
10 pages
Dataset For Uav PDF
No ratings yet
Dataset For Uav PDF
9 pages
Audio Visual Speech Recognition: Advancements, Applications, and Insights
From Everand
Audio Visual Speech Recognition: Advancements, Applications, and Insights
Fouad Sabry
No ratings yet
Cartesian Coordinates
No ratings yet
Cartesian Coordinates
13 pages
Matrix Algebra
No ratings yet
Matrix Algebra
22 pages
Convolution: 8.1. Linear Translation Invariant (LTI or LSI) Operators
No ratings yet
Convolution: 8.1. Linear Translation Invariant (LTI or LSI) Operators
9 pages
VI Sem. B.sc. Mathematics - Core Course - Number Theory and Linear Algebra
0% (1)
VI Sem. B.sc. Mathematics - Core Course - Number Theory and Linear Algebra
13 pages
Unit-6.PDF Engg Math
No ratings yet
Unit-6.PDF Engg Math
56 pages
Adaptive Regularized Zero-Forcing Beamforming in M
No ratings yet
Adaptive Regularized Zero-Forcing Beamforming in M
14 pages
Assignment - 1
No ratings yet
Assignment - 1
2 pages
MTK Bahasa Inggris
No ratings yet
MTK Bahasa Inggris
2 pages
Question Paper Code:: Reg. No.
No ratings yet
Question Paper Code:: Reg. No.
3 pages
General Physics 1 VECTOR
100% (1)
General Physics 1 VECTOR
57 pages
Partial Least Squares
No ratings yet
Partial Least Squares
21 pages
Lesson 3. Application of Matrices and Determinants
No ratings yet
Lesson 3. Application of Matrices and Determinants
63 pages
Mathmatical Physics
No ratings yet
Mathmatical Physics
7 pages
UNIT - IV Fourier Transforms
No ratings yet
UNIT - IV Fourier Transforms
66 pages
Divergence and Curl of A Vector Field: F 3x I +5xy J +xyz K
No ratings yet
Divergence and Curl of A Vector Field: F 3x I +5xy J +xyz K
3 pages
Chapter 10
No ratings yet
Chapter 10
36 pages
I Integral Equations and Operator Theory
No ratings yet
I Integral Equations and Operator Theory
20 pages
Introduction To GNU Octave: Updated To Current Octave Version by Thomas L. Scofield
No ratings yet
Introduction To GNU Octave: Updated To Current Octave Version by Thomas L. Scofield
18 pages
Math-Ii Question Bank
No ratings yet
Math-Ii Question Bank
11 pages
Differential Equations Problems and Solutions
No ratings yet
Differential Equations Problems and Solutions
19 pages
10 Hadamard Matrices
No ratings yet
10 Hadamard Matrices
4 pages
Continue
No ratings yet
Continue
3 pages
[2023.03.07] Vector Analysis
No ratings yet
[2023.03.07] Vector Analysis
11 pages
Linear Trans
100% (1)
Linear Trans
83 pages
Lecture 9
No ratings yet
Lecture 9
8 pages
TurboTDDFT 2.0-Hybrid Functionals and New Algorithms Within
No ratings yet
TurboTDDFT 2.0-Hybrid Functionals and New Algorithms Within
10 pages
How To Crack DSE Without Coaching by Ravit Thukral
No ratings yet
How To Crack DSE Without Coaching by Ravit Thukral
21 pages

Improvement in Outdoor Sound Source Detection Using A Quadrotor-Embedded Microphone Array

Uploaded by

Improvement in Outdoor Sound Source Detection Using A Quadrotor-Embedded Microphone Array

Uploaded by

2014 IEEE/RSJ International Conference on

Intelligent Robots and Systems (IROS 2014)

Improvement in Outdoor Sound Source Detection

978-1-4799-6934-0/14/$31.00 ©2014 IEEE 1902

pre-defined noise signals, and thus it has a difficulty in

coping with dynamically changing noise. Likewise iGEVD-

by controlling two parameters fs and TN . The optimization

a) hemispherical layout(profile) b) hemispherical layout (underside) c) circular layout

You might also like