Improvement in Outdoor Sound Source Detection Using a Quadrotor-Embedded Microphone Array

Abstract— This paper addresses sound source detection in an outdoor environment using a quadrotor with a microphone array. Since the previously reported method has a high computational cost, we propose a sound source detection algorithm called MUltiple SIgnal Classification based on incremental Generalized Singular Value Decomposition (iGSVD-MUSIC), which detects sound source location and temporal activity with low computational cost. In addition, to relax an over-estimation problem of the noise correlation matrix used in iGSVD-MUSIC, we propose Correlation Matrix Scaling (CMS), which realizes soft whitening of noise. A prototype system based on the proposed methods was evaluated with two types of microphone arrays in an outdoor environment. Experimental results showed that the combination of iGSVD-MUSIC and CMS improves sound source detection performance drastically and achieves real-time processing.

Index Terms— robot audition, speech detection, sound source localization, sound source separation

T. Ohata, T. Taiki and K. Nakadai are with the Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1, O-okayama, Meguro-ku, Tokyo, 152-8552, JAPAN. [email protected]
K. Nakamura, T. Mizumoto, and K. Nakadai are with Honda Research Institute Japan Co., Ltd., 8-1 Honcho, Wako, Saitama 351-0114, JAPAN. {keisuke,t.mizumoto,nakadai}@jp.honda-ri.com

I. INTRODUCTION

Computational Auditory Scene Analysis (CASA) in an outdoor environment has been actively studied in the past few years. This is because real-time signal and speech processing for a robot in an indoor environment has been established through more than ten years of robot audition research^1, and because software environments for robot technology such as HARK (HRI-JP Audition for Robots with Kyoto Univ.) [1] and ROS^2 have become available. For instance, Sasaki et al. reported sound source identification using a mobile robot with a 32 ch microphone array for anomaly sound detection [2]. Bando et al. studied posture estimation of a hose-type robot using the robot's embedded microphones and speakers in a SLAM manner, which is an essential function to determine the position of a person buried in rubble [3].

^1 Organized sessions on robot audition have been continuously held at IROS 2004-2013.
^2 http://www.ros.org

Sound Source Detection (SSD), which localizes sound sources and detects their temporal activity using an Unmanned Aerial Vehicle (UAV), has also been reported. It has been studied for military use utilizing expensive sensors such as the Acoustic Vector Sensor (AVS) [4], and thus focused only on the detection of high power sound sources such as tanks and airplanes. As another approach, Okutani et al. recently reported SSD using a Parrot AR.Drone with an 8 ch microphone array consisting of small and lightweight microphones and a multichannel A/D converter [5]. Since dynamically-changing high power noise is generated by the rotation of the propellers and by wind, they proposed incremental Generalized EigenValue Decomposition-based Multiple Signal Classification (iGEVD-MUSIC). It originated from GEVD-MUSIC, proposed by Nakamura et al. [6] as an extension of the well-known adaptive beamforming algorithm MUSIC [7]. GEVD-MUSIC can localize sound sources by whitening high power noise using a noise correlation matrix estimated in advance, and thus it shows high noise robustness. On the other hand, such a pre-estimated noise correlation matrix results in a lack of robustness against dynamically-changing noise. Since iGEVD-MUSIC incrementally and adaptively estimates the noise correlation matrix, it is robust against dynamically-changing high power noise.

Since iGEVD-MUSIC simply used sound signals recorded a few seconds before the current processing time for noise correlation matrix estimation, Furukawa et al. improved it using motion information obtained from the Inertial Measurement Unit (IMU) of their quadrotor [8].

However, these studies still have two issues:
• They assume off-line processing, and the computational cost of GEVD is too high to perform real-time processing.
• Since perfect estimation of a noise correlation matrix is difficult, it is inevitable that the whitening process in GEVD produces errors, which leads to performance degradation.

For the first issue, we propose MUltiple SIgnal Classification based on incremental Generalized Singular Value Decomposition (iGSVD-MUSIC), which is an extension of GSVD-MUSIC [9]. Like GEVD-MUSIC, GSVD-MUSIC can whiten noise using a noise correlation matrix, and it drastically reduces the computational cost while maintaining the performance of sound source localization.

For the second issue, we propose Correlation Matrix Scaling (CMS), which realizes soft whitening by raising the singular values obtained from the noise correlation matrix to the power of α (0 ≤ α ≤ 1). The noise correlation matrix is sometimes over-estimated, which degrades the localization performance drastically. For instance, an over-estimated noise correlation matrix whitens not only noise but also the target sound sources, and it generates ghost sound sources, which causes missed and extra detections of sound sources. Because CMS makes the whitening effect softer using the scaling parameter α, it avoids this performance degradation even when over-estimation occurs.
Let X(ω, f) be the observed signal vector at the f-th frame at the ω-th frequency bin. The correlation matrix R(ω, f) is obtained from X(ω, f) by

R(\omega, f) = \frac{1}{T_R} \sum_{\tau=f}^{f+T_R-1} X(\omega, \tau) X^{*}(\omega, \tau),    (1)

where T_R is the number of frames for temporal integration to calculate the correlation matrix.

For the f-th frame signal, iGSVD-MUSIC assumes that the T_N frames with the boundary of f − f_s − T_N to f − f_s include only noise signals, and the noise correlation matrix K(ω, f) is defined by

K(\omega, f) = \frac{1}{T_N} \sum_{\tau=f-f_s-T_N}^{f-f_s} X(\omega, \tau) X^{*}(\omega, \tau).    (2)
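A minimal sketch of Eqs. (1) and (2), assuming the multichannel STFT is stored as a complex array X of shape (frames, bins, channels); the function and variable names are illustrative, not those of the authors' implementation:

import numpy as np

def correlation_matrix(X, f, T_R):
    # Eq. (1): average the outer products X(w,t) X(w,t)^H over frames f .. f+T_R-1, per bin
    frames = X[f:f + T_R]                                   # (T_R, bins, channels)
    return np.einsum('tbm,tbn->bmn', frames, frames.conj()) / T_R

def noise_correlation_matrix(X, f, T_N, f_s):
    # Eq. (2): the same average over the presumed noise-only frames ending f_s frames before f
    frames = X[f - f_s - T_N:f - f_s]                       # (T_N, bins, channels)
    return np.einsum('tbm,tbn->bmn', frames, frames.conj()) / T_N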
The problem with the original GSVD-MUSIC is that it uses a fixed noise correlation matrix obtained from a noise period measured in advance, and thus it cannot follow dynamically-changing noise.

\bar{P}(\psi, f) = \frac{1}{\omega_H - \omega_L + 1} \sum_{\omega=\omega_L}^{\omega_H} P(\omega, \psi, f),    (5)

where ω_H and ω_L are the highest and lowest boundaries of the frequency bins, respectively. A sound source direction is estimated as a ψ which has a peak exceeding a threshold P_th in P̄(ψ, f).
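As an illustrative sketch of Eq. (5) and the peak search (with assumed details, not the authors' implementation), the narrow-band spectra P(ω, ψ, f) for one frame can be stacked into an array P of shape (bins, directions) and processed as follows:

import numpy as np

def broadband_spectrum(P, w_L, w_H):
    # Eq. (5): average the narrow-band MUSIC spectra over bins w_L .. w_H
    return P[w_L:w_H + 1].mean(axis=0)

def detect_peaks(P_bar, P_th):
    # directions whose averaged spectrum is a local maximum and exceeds the threshold P_th
    left, right = np.roll(P_bar, 1), np.roll(P_bar, -1)
    is_peak = (P_bar > left) & (P_bar > right) & (P_bar > P_th)
    return np.nonzero(is_peak)[0]        # indices of candidate directions psi (azimuth wrap-around assumed)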
B. Comparison of iGSVD-MUSIC and iGEVD-MUSIC

The computational cost and the performance of iGSVD-MUSIC and iGEVD-MUSIC are discussed here. GEVD-MUSIC uses generalized eigenvalue decomposition instead of generalized singular value decomposition in Eq. (3). However, it is not guaranteed that K^{-1}(ω, f)R(ω, f) is a Hermitian matrix. This means that the eigenvectors obtained by GEVD are not always orthogonal to each other. This leads to performance degradation because MUSIC-based algorithms assume that the eigen (or singular) vectors are orthogonal, as mentioned above. iGEVD-MUSIC, therefore, replaces K^{-1}(ω, f)R(ω, f) with K^{-1/2}RK^{-1/2} to solve this problem.
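This property is easy to check numerically; the sketch below (illustrative only, using randomly generated Hermitian positive-definite matrices) shows that K^{-1}R is generally not Hermitian while K^{-1/2}RK^{-1/2} is:

import numpy as np

rng = np.random.default_rng(0)
M = 8

def random_hpd(m):
    # random Hermitian positive-definite matrix
    A = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    return A @ A.conj().T + m * np.eye(m)

R, K = random_hpd(M), random_hpd(M)

P1 = np.linalg.inv(K) @ R                       # K^{-1} R
w, E = np.linalg.eigh(K)                        # K = E diag(w) E^H
K_inv_sqrt = (E * w ** -0.5) @ E.conj().T       # K^{-1/2}
P2 = K_inv_sqrt @ R @ K_inv_sqrt                # K^{-1/2} R K^{-1/2}

print(np.allclose(P1, P1.conj().T))             # False in general: not Hermitian
print(np.allclose(P2, P2.conj().T))             # True: Hermitian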
Let the observed signal be modeled as X = AS + N, where A is the transfer function matrix, S is the target source spectrum, and N is noise; the noise is assumed to be uncorrelated with the target signal in this model, which is a common assumption in the field of signal processing. ω and f are omitted for simplicity. R is represented by

R = XX^{*} = ASS^{*}A^{*} + NN^{*}    (7)
  = \Gamma + K,    (8)

where Γ = ASS^{*}A^{*} is the target-signal term and K = NN^{*} is the noise term. In iGEVD-MUSIC, the noise is whitened as I:

K^{-1/2} R K^{-1/2} = K^{-1/2} (\Gamma + K) K^{-1/2}    (9)
                    = K^{-1/2} \Gamma K^{-1/2} + I.    (10)

In iGSVD-MUSIC, Eq. (3) is transformed by multiplying (K^{-1}R)^{*} as

K^{-1}R (K^{-1}R)^{*} = E_l \Lambda E_r^{*} (E_l \Lambda E_r^{*})^{*},    (11)
K^{-1} R^{2} K^{-1} = E_l \Lambda^{2} E_l^{*}.    (12)

This is equivalent to GEVD of R^2, and thus E_l is regarded as an eigenvector matrix obtained from GEVD of R^2. Eq. (12) can be re-written using Eq. (8) as

K^{-1} R^{2} K^{-1} = K^{-1} (\Gamma + K)(\Gamma + K)^{*} K^{-1}    (13)
                    = K^{-1} \Gamma^{2} K^{-1} + K^{-1} \Gamma + \Gamma K^{-1} + I.    (14)
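The step from Eq. (13) to Eq. (14) expands the product and uses the fact that Γ and K are Hermitian (Γ^{*} = Γ, K^{*} = K):

K^{-1}(\Gamma+K)(\Gamma+K)^{*}K^{-1} = K^{-1}\Gamma\Gamma^{*}K^{-1} + K^{-1}\Gamma K^{*}K^{-1} + K^{-1}K\Gamma^{*}K^{-1} + K^{-1}KK^{*}K^{-1}
                                     = K^{-1}\Gamma^{2}K^{-1} + K^{-1}\Gamma + \Gamma K^{-1} + I.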
on the proposed method. It consists of a quadrotor with
The noise factor K is successfully whitened as I ac- microphones and a laptop PC where SSD is running. They
cording to the fourth term in the right side of Eq. (14) , are described in the following sections.
but the second and third terms remain, which degrades the
performance. Therefore, the performance of iGSVD-MUSIC A. Quadrotor and Microphone Layout
should be poorer than that of iGEVD-MUSIC. The effect We used AscTech Pelican for our quadrotor, which has a
of these terms for degradation should be investigated in the minimum take-off weight and maximum payload of 630 g
later sections. and 650 g respectively. It can fly up to 15 min with a
In this paper, we call the original MUSIC Standard Eigen- full payload. Since we can control it and obtain sensor
Value Decomposition MUSIC (SEVD-MUSIC). It directly information of IMU, and GPS using ROS, it is convenient for
uses R(ω, f ) to perform SEVD without whitening, and thus research purposes. We installed a microphone array on our
the noise robustness is lower than GEVD and GSVD. In Pelican, sensory data from IMU and GPS, and 16 ch audio
particular, when the noise is of higher power than the target are recorded synchronously and all data is sent to a remote
signal, the assumption that larger eigenvalues correspond to laptop PC via a wireless network.
the target sound sources does not hold, and sound source We constructed two types of microphone arrays shown
localization deteriorates. in Fig. 2. Both consist of a multi-channel A/D converter
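To make the contrast concrete, the sketch below (an illustration under assumptions, not the authors' code) forms the noise subspace either directly from R, as in SEVD-MUSIC, or from the left singular vectors of K^{-1}R, as in iGSVD-MUSIC, and evaluates a generic MUSIC-style pseudospectrum for hypothetical steering vectors a(ψ); the exact spectrum P(ω, ψ, f) of Eq. (4) is not reproduced here.

import numpy as np

def noise_subspace_sevd(R, L):
    # SEVD-MUSIC: eigenvectors of R beyond the L largest eigenvalues (eigh sorts ascending)
    w, E = np.linalg.eigh(R)
    return E[:, :R.shape[0] - L]

def noise_subspace_igsvd(R, K, L):
    # iGSVD-MUSIC (sketch): left singular vectors of K^{-1} R beyond the L largest singular values
    U, s, Vh = np.linalg.svd(np.linalg.inv(K) @ R)
    return U[:, L:]                      # singular values are sorted in descending order

def music_spectrum(E_n, steering):
    # generic MUSIC pseudospectrum |a|^2 / |E_n^H a|^2 for steering vectors a(psi) as columns
    num = np.einsum('mp,mp->p', steering.conj(), steering).real
    proj = E_n.conj().T @ steering
    den = np.einsum('np,np->p', proj.conj(), proj).real
    return num / den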
III. CORRELATION MATRIX SCALING

A key factor for iGSVD-MUSIC is the estimation performance of the noise correlation matrix. Since it is impossible to obtain a perfect estimate in a real environment, the more precisely we try to estimate it, the more risk we have of over-estimation.

The idea of CMS is to soften the whitening effect to avoid errors caused by over-estimation of the noise correlation matrix. To do this, we have to control the noise subspace spanned by the noise singular (or eigen) vectors so that only the level of whitening is changed without modifying the directions of the noise singular vectors.

Let K be the estimated noise correlation matrix in iGSVD-MUSIC. Since K is a Hermitian matrix, Standard EigenValue Decomposition (SEVD) can be performed:

K = E \Lambda E^{*},    (15)

where Λ is a diagonal matrix consisting of eigenvalues, and E is an eigenvector matrix. Λ specifies the power of each eigenvector, and E specifies the direction of each eigenvector in the noise subspace. To fulfill the above requirements, K^α is defined by controlling Λ:

K^{\alpha} = E \Lambda^{\alpha} E^{*},    (16)
\Lambda^{\alpha} = \mathrm{diag}(\lambda_{1}^{\alpha}, ..., \lambda_{M}^{\alpha}),    (17)

where we call α (0 ≤ α ≤ 1) the scaling parameter of CMS. CMS replaces K with K^α in Eq. (3). When α is 1, K^α is identical to K, which is the same as iGSVD-MUSIC without CMS. When α is 0, K^α becomes I, which means that no whitening is performed, as in SEVD. By changing α, the level of soft whitening can be controlled. Since SEVD is performed every time after K is estimated, a higher computational cost is necessary. This issue is discussed in the later sections.

Note that CMS can be applied to any whitening-based MUSIC such as iGSVD-MUSIC, iGEVD-MUSIC, GSVD-MUSIC, and GEVD-MUSIC.
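A minimal sketch of CMS under Eqs. (15)-(17); the function name and the numerical guard are illustrative:

import numpy as np

def correlation_matrix_scaling(K, alpha=0.5):
    # soft whitening: raise the eigenvalues of the Hermitian matrix K to the power alpha
    w, E = np.linalg.eigh(K)                 # K = E diag(w) E^H
    w = np.clip(w, 1e-12, None)              # guard against tiny negative values from round-off
    return (E * w ** alpha) @ E.conj().T     # K^alpha = E diag(w^alpha) E^H

# alpha = 1 reproduces K (iGSVD-MUSIC without CMS); alpha = 0 gives the identity (no whitening).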
IV. SYSTEM ARCHITECTURE

Fig. 1 shows the system architecture for SSD based on the proposed method. It consists of a quadrotor with microphones and a laptop PC where SSD is running. They are described in the following sections.

A. Quadrotor and Microphone Layout

We used the AscTech Pelican for our quadrotor; it has a minimum take-off weight of 630 g and a maximum payload of 650 g, and it can fly for up to 15 min with a full payload. Since we can control it and obtain IMU and GPS sensor information using ROS, it is convenient for research purposes. We installed a microphone array on our Pelican; sensory data from the IMU and GPS and 16 ch audio are recorded synchronously, and all data is sent to a remote laptop PC via a wireless network.

We constructed two types of microphone arrays, shown in Fig. 2. Both consist of a multi-channel A/D converter (RASP) and 16 MEMS microphones developed by Systems in Frontier Inc., mounted on our quadrotor.

Fig. 2a) shows a microphone array with a semi-sphere shape made of Styrofoam; 16 microphones are attached to the surface at the positions of the black hair shown in Fig. 2b). This layout is designed mainly to detect sound sources under the quadrotor. We think that this layout is beneficial in practical use, because the distance between the quadrotor and such sound sources is relatively short, and thus the Signal-to-Noise Ratio (SNR) is expected to be high.

Fig. 2c) shows another microphone array, which has a circular layout. In signal processing, it is known that a larger microphone array gives a sharper mainlobe, i.e., better resolution in sound source localization. We designed this microphone array to have a large diameter so that the resolution can be as high as possible.

Both microphone arrays are used in an outdoor environment, where wind noise is a big problem.
Fig. 1. System Architecture for Sound Source Detection Using a Quadrotor (block diagram: 16 ch audio and IMU/GPS data from the AscTech Pelican sent over a wireless network via ROS; HARK-side frequency analysis (STFT), correlation matrix estimation, incremental noise correlation matrix estimation, Correlation Matrix Scaling (CMS), GSVD, peak extraction, and sound tracking, outputting sound source direction, utterance start time, and utterance duration)
To solve this problem, we put wind protection at the position of each microphone. Although various materials such as sponge rubber and cloth are used for wind protection, we selected a hair-type material. It is known as one of the most effective materials because the wind power is absorbed by the waving motion of the hairs. In addition, the motion of the hairs does not produce acoustic noise.

B. Software Implementation

We used HARK^3 [1] for the software implementation of acoustic signal processing. Since HARK can be integrated with ROS-based systems in a seamless manner, we easily integrated the data recording module developed with ROS and the acoustic signal processing module developed with HARK. Since we focus on sound source detection in this paper, Fig. 1 illustrates only sound source detection. IMU and GPS information can be used for coordinate conversion for our future research purposes.

^3 http://www.hark.jp/

Sound source detection consists of seven modules: frequency analysis, correlation matrix estimation, incremental noise correlation matrix estimation, CMS, GSVD, peak extraction, and sound tracking. The frequency analysis module simply performs a short-time Fourier transform with a 32 ms window and a 10 ms window shift on 16 ch audio data sampled at 16 kHz. The correlation matrix estimation module calculates the correlation matrix for the current time frame according to Eq. (1). The incremental noise correlation matrix estimation module is described by Eq. (2), and it is followed by the CMS module defined in Eq. (16). The peak extraction module extracts peaks from the MUSIC spectrum defined by Eq. (5) with the threshold parameter P_th and sends them to the sound tracking module. The sound tracking module forms a sound stream corresponding to an utterance as a temporal sequence of the extracted peaks. The basic algorithm of sound tracking is simple; two successive peaks with the same direction are connected.
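The tracking rule can be sketched as follows (illustrative Python, not the HARK module itself); peaks in consecutive frames whose directions agree within a small tolerance, assumed here to be 10 degrees, are connected into one stream whose first and last frames give the utterance start time and duration:

def track_sources(peaks_per_frame, frame_shift=0.010, tol_deg=10.0):
    # peaks_per_frame: list over frames, each entry a list of peak directions in degrees
    finished, active = [], []                  # streams: dicts with direction, start frame, last frame
    for f, peaks in enumerate(peaks_per_frame):
        still_active, unmatched = [], list(peaks)
        for s in active:
            cand = [p for p in unmatched if abs(p - s['dir']) <= tol_deg]
            if cand:                           # extend the stream with a successive peak
                s['dir'], s['last'] = cand[0], f
                unmatched.remove(cand[0])
                still_active.append(s)
            else:                              # no successive peak: the stream ends here
                finished.append(s)
        for p in unmatched:                    # remaining peaks start new streams
            still_active.append({'dir': p, 'start': f, 'last': f})
        active = still_active
    finished += active
    return [{'direction': s['dir'],
             'start_time': s['start'] * frame_shift,
             'duration': (s['last'] - s['start'] + 1) * frame_shift} for s in finished]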
V. EVALUATION

We evaluated the proposed methods using the constructed prototype SSD system. Two types of microphone arrays were used to investigate the general applicability of the proposed methods and also to explore the best design of the microphone array. The evaluation task is sound source detection of speech sources in an outdoor environment.

Fig. 3 shows the situation of the experiments. The height of the quadrotor (h) was fixed to one of 0 m, 1 m, 2.7 m, and 4 m, and the distance to the speaker (d) was fixed to one of 1 m, 2 m, and 3 m for each condition. Since two microphone arrays were also investigated, in total 24 conditions (4 × 3 × 2) were recorded. Each condition included 7-9 trials, i.e., a speaker made 7-9 utterances. For each condition, we applied five SSD methods, that is, SEVD-MUSIC, iGEVD-MUSIC, iGEVD-MUSIC with CMS, iGSVD-MUSIC, and iGSVD-MUSIC with CMS.

As for metrics, we used the Localization Correct Rate (LCR) and the Localization Accuracy Rate (LAR), which are utterance-based metrics proposed by Okutani et al. [5]. LAR and LCR are defined by

LAR = \frac{N_t - N_S - N_D - N_I}{N_t},    (18)
LCR = \frac{N_t - N_S - N_D}{N_t},    (19)

where N_t is the total number of utterances and N_S is the number of detected but mis-localized utterances (substitutions). N_D and N_I are the numbers of undetected utterances (deletions) and extra-detected utterances (insertions), respectively. Note that we decided that an utterance is successfully detected when its temporal error is within 0.5 s and its localization error is within 10° compared to the corresponding reference data.
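Eqs. (18) and (19) translate directly into code; the helper below is illustrative only, with a hypothetical example in the comments.

def localization_rates(N_t, N_S, N_D, N_I):
    # N_t: total utterances, N_S: substitutions (mis-localized),
    # N_D: deletions (undetected), N_I: insertions (extra detections)
    LAR = (N_t - N_S - N_D - N_I) / N_t      # Eq. (18); can be negative with many insertions (cf. Tab. I)
    LCR = (N_t - N_S - N_D) / N_t            # Eq. (19)
    return LCR, LAR

# e.g. 9 utterances, 1 mis-localized, 0 missed, 2 extra detections:
# LCR = (9 - 1 - 0) / 9 ≈ 0.89, LAR = (9 - 1 - 0 - 2) / 9 ≈ 0.67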
Fig. 3. Situation of Experiments

We also used the Real Time Factor (RTF) to compare the computational costs of these methods. It is defined by

RTF = \frac{1}{N_t} \sum_{t=1}^{N_t} \frac{T_p(t)}{T_a(t)},    (20)

where T_p(t) and T_a(t) are the processing time for the recorded file of the t-th condition and the whole length of that file, respectively, and N_t is the number of conditions. Real-time processing is achieved for RTF ≤ 1. In this paper, we used a laptop PC with an Intel(R) Core(TM) i7-2920XM (2.5 GHz-3.5 GHz) and 16 GB of memory. Although it has 4 cores with hyper-threading, we used only one core without hyper-threading for the RTF measurement.

There are several system parameters which needed to be set in advance. T_R in Eq. (1) and T_N and f_s in Eq. (2) were empirically set to 500 ms, 900 ms, and 1400 ms, respectively, since these values showed the best performance in our rough parameter search. MUSIC-based algorithms require the number of sound sources, denoted by L in Eq. (4). Since only one person uttered in every condition, L was set to 1. CMS also has the scaling parameter α. We empirically set this parameter to 0.5, which can avoid most errors caused by over-estimation of the noise correlation matrix. Automatic optimization of this parameter is future work.
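For reference, assuming these time constants are converted to frame counts via the 10 ms frame shift of Sec. IV-B, the windows of Eqs. (1) and (2) become:

frame_shift_ms = 10               # STFT window shift from Sec. IV-B
T_R = 500 // frame_shift_ms       # 50 frames averaged in Eq. (1)
T_N = 900 // frame_shift_ms       # 90 noise frames averaged in Eq. (2)
f_s = 1400 // frame_shift_ms      # 140 frames between the noise window and the current frame f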
A. Results

Tab. I shows the SSD performance in terms of LCR and LAR for the 24 conditions. Tab. I a)-e) shows the results with SEVD-MUSIC, iGEVD-MUSIC, iGSVD-MUSIC, iGEVD-MUSIC with CMS, and iGSVD-MUSIC with CMS, respectively. To understand the results visually, Fig. 4 depicts the SSD results with the semi-sphere microphone array under the condition of h = 1 and d = 2. The vertical axis shows the sound source direction in the quadrotor coordinates, and the horizontal axis shows the time in frames. Fig. 4a) shows the reference data, which is regarded as correct SSD. The black lines in Fig. 4b)-d) show the detected utterances, and the color maps visualize P̄(ψ, f) of Eq. (5). Tab. II shows the RTFs averaged over all conditions for these five methods.

TABLE I
RESULT OF SSD PERFORMANCE

a) SEVD-MUSIC
                 semi-sphere, h [m]        circular, h [m]
  LCR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0    50    100    44    11     63     50    67    33
         2.0    75     13    33    33     63    100    89    44
         3.0    63     13    22    11     88     25    67    56
  LAR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0     0    100    44   -33     38     25    44   -44
         2.0    25   -100   -56   -67     25    100    44   -12
         3.0    13    -50   -22   -33     75    -25    33   -12

b) iGEVD-MUSIC
                 semi-sphere, h [m]        circular, h [m]
  LCR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0   100    100    66    33    100     63   100    44
         2.0    88    100    88    89    100    100    67    67
         3.0    88    100   100   100    100     87    78    67
  LAR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0   100    100    56    33    100     50   100    33
         2.0    75     87    78    89    100    100    67    56
         3.0    75    100   100   100    100     75    78    56

c) iGSVD-MUSIC
                 semi-sphere, h [m]        circular, h [m]
  LCR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0    13     13    33    22    100     25    78    33
         2.0    13     50    33    67    100    100    33    44
         3.0    13     25    44    44    100     63    44    33
  LAR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0    13   -138    11   -11    100      0    78   -22
         2.0   -25     25    22    44    100    100    11    33
         3.0   -50     25    11   -56    100     50    44     0

d) iGEVD-MUSIC w/ CMS
                 semi-sphere, h [m]        circular, h [m]
  LCR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0   100    100   100    89    100     63   100    44
         2.0   100    100    88    56    100    100    89    56
         3.0   100    100   100    67    100    100    78    89
  LAR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0   100    100   100    89    100     63   100    33
         2.0   100    100    67    11    100    100    78    56
         3.0    88    100    89   -44    100     63    78    67

e) iGSVD-MUSIC w/ CMS
                 semi-sphere, h [m]        circular, h [m]
  LCR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0   100    100    89    67    100     63   100    33
         2.0   100    100    89    67    100    100    89    56
         3.0   100    100   100    78    100     88    78    89
  LAR [%]        0    1.0   2.7   4.0      0    1.0   2.7   4.0
  d [m]  1.0   100    100    89    67    100     63   100    11
         2.0   100    100    67    33    100    100    78    56
         3.0    88    100   100   -44    100     63    78    56

TABLE II
REAL TIME FACTOR (RTF)
  method                    RTF
  SEVD-MUSIC                0.48
  iGEVD-MUSIC               1.02
  iGEVD-MUSIC w/ CMS        1.56
  iGSVD-MUSIC               0.37
  iGSVD-MUSIC w/ CMS        0.95

In Tab. I a), it is interesting that SEVD-MUSIC showed good performance when the sound sources were close to the microphone array, but it deteriorates when the distance is over 1.0 m, which is also seen in Fig. 4b). Tab. I b) and Fig. 4c) show that iGEVD-MUSIC improves the SSD performance.
Fig. 4. An example of SSD results (semi-sphere, h = 1, d = 2): a) reference, b) SEVD-MUSIC, c) iGEVD-MUSIC, d) iGSVD-MUSIC with CMS

Tab. I c) shows that iGSVD-MUSIC is not as effective as iGEVD-MUSIC, but slightly better than SEVD-MUSIC. This is caused by the over-estimation of the noise correlation matrix and the cross-term effect in Eq. (14). Tab. I d) shows the best performance among these five methods, which means that CMS is quite a powerful method for improving performance. Tab. I e) shows comparable performance to iGEVD-MUSIC with CMS (see also Fig. 4d)). CMS drastically improves the performance compared to iGSVD-MUSIC in Tab. I c). This suggests that the over-estimation problem was more severe than the cross-term effect in iGSVD-MUSIC, and that these problems were overcome by CMS. As for the microphone layouts, they have different characteristics, but the same tendency was observed among the five methods.

As for the RTF, Tab. II shows that iGEVD-MUSIC is computationally expensive, and it is difficult to attain real-time processing although it shows good performance. Since CMS performs SEVD, we were concerned that it would also require high computational power. Indeed, the RTF becomes larger, but the increase was less critical than we expected. On the other hand, iGSVD-MUSIC is lightweight enough to achieve real-time processing. Even when CMS is used on top of iGSVD-MUSIC, it still maintains real-time processing. Since its performance is comparable with that of iGEVD-MUSIC with CMS, as a whole it is the best method among these five methods for outdoor SSD.

VI. CONCLUSION

This paper presented sound source detection using a quadrotor with a microphone array in an outdoor environment. To reduce the computational cost of microphone-array-based sound source detection while maintaining performance, we proposed MUltiple SIgnal Classification based on incremental Generalized Singular Value Decomposition (iGSVD-MUSIC). In addition, we proposed Correlation Matrix Scaling (CMS) to improve SSD performance by using soft whitening. The effectiveness of these proposed methods was demonstrated using sound data recorded with two types of microphone arrays embedded in a quadrotor. Since the system is fast enough to perform real-time processing and the evaluation was done in an offline manner, we will implement a real-time and online SSD system and evaluate it in a realistic scenario in the near future.

ACKNOWLEDGMENTS

This research was partially supported by Grant-in-Aid for Scientific Research No. 24118702 and No. 24220006.

REFERENCES

[1] K. Nakadai, H. G. Okuno, H. Nakajima, Y. Hasegawa, and H. Tsujino, "Design and implementation of robot audition system HARK," Advanced Robotics, vol. 24, no. 9, pp. 739-761, 2009.
[2] Y. Sasaki, N. Hatao, K. Yoshii, and S. Kagami, "Nested iGMM recognition and multiple hypothesis tracking of moving sound sources for mobile robot audition," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 3930-3936.
[3] Y. Bando, T. Mizumoto, K. Itoyama, K. Nakadai, and H. G. Okuno, "Posture estimation of hose-shaped robot using microphone array localization," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 3446-3451.
[4] B. Kaushik, D. Nance, and K. K. Ahuj, "A review of the role of acoustic sensors in the modern battlefield," in 11th AIAA/CEAS Aeroacoustics Conference (26th AIAA Aeroacoustics Conference), 2005, pp. 1-13.
[5] K. Okutani, T. Yoshida, K. Nakamura, and K. Nakadai, "Outdoor auditory scene analysis using a moving microphone array embedded in a quadrocopter," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, pp. 3288-3293.
[6] K. Nakamura, K. Nakadai, F. Asano, Y. Hasegawa, and H. Tsujino, "Intelligent sound source localization for dynamic environments," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009, pp. 664-669.
[7] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. on Antennas and Propagation, vol. 34, no. 3, pp. 276-280, 1986.
[8] K. Furukawa, K. Okutani, K. Nagira, T. Otsuka, K. Itoyama, K. Nakadai, and H. G. Okuno, "Noise correlation matrix estimation for improving sound source localization by multirotor UAV," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013, pp. 3943-3948.
[9] K. Nakamura, K. Nakadai, and G. Ince, "Real-time super-resolution sound source localization for robots," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2012, pp. 694-699.