
International Journal of Speech Technology

https://doi.org/10.1007/s10772-018-9530-9

MIMO beamforming system for speech enhancement in realistic environment with multiple noise sources
Jafar Ramadhan Mohammed¹

Received: 9 March 2018 / Accepted: 1 July 2018


© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract
Multiple noise sources in a realistic environment severely degrade the quality and intelligibility of the desired speech signal, which poses a serious problem for many speech applications. Several noise reduction algorithms have been proposed to solve this problem. However, their performance is severely impaired in realistic environments with multiple noise sources. In this paper, the author treats the noise cancellation system as a multiple-input multiple-output (MIMO) beamformer system. The proposed approach consists of two steps. First, the noise signals are modeled as the outputs of a MIMO autoregressive (AR) system driven by white noise sources. Then, the noisy microphone signals are processed sequentially by multi-channel linear prediction error filters (MCLPEFs) and multi-channel adaptive noise estimation filters (MCANEFs) in the lower path of the proposed beamformer. The MCLPEFs whiten the input signals, while the MCANEFs act as a MIMO system identifier that models the noise signals. Finally, the noise estimates from the lower path are subtracted from the signal in the upper path to recover the enhanced speech signal. Moreover, the performance of the proposed MIMO approach was validated in a realistic environment with real noise sources.

Keywords  Adaptive beamforming · Multiple-input multiple-output · Microphone array · Noise estimation · Multi-noise
reduction

* Jafar Ramadhan Mohammed
¹ College of Electronic Engineering, Ninevah University, Mosul, Iraq

1 Introduction

Multiple-noise reduction under actual environmental conditions in speech communications is still an unsolved problem in the signal processing community. The problem is that the desired speech signal and the noise signals may overlap in both the time and the frequency domains. Spatial filtering with a beamforming microphone array may allow the signals to be separated without distorting the desired speech signal. The simplest and most efficient implementation of an adaptive beamformer is the generalized sidelobe canceller (GSC) structure (Griffiths and Jim 1982). However, in practical situations GSC beamformers are sensitive to reverberation of the desired speech signal and to steering errors. Therefore, the desired speech signal may be distorted, and the noise reduction performance of GSC beamformers may be severely degraded in reverberant environments. In this context, Bitzer et al. (1999) analyzed the performance of the GSC beamformer and showed that the noise reduction can be large only when the noise source is directional. They also showed that its performance degrades severely in realistic environments with a reverberation time above 100 ms. To overcome this problem while still benefiting from the simplicity of the GSC, many variants have been derived from this structure (Rashmirekha and Mohanty 2018; Djendi 2018; Mohammed et al. 2017).

In Mohammed (2009a, b), the author proposed a combined system for noise reduction and acoustic echo cancellation to enhance the desired speech signal. That system uses a new adaptive blocking matrix based on linear prediction error filters, a new multi-channel noise canceller based on adaptive noise estimation filters, and an acoustic echo canceller based on an IIR-RLS filter. In Mohammed (2017), the author investigated the simultaneous cancellation of two types of noise, narrowband and wideband, by suitably modifying the standard two-input adaptive noise canceller. The modifications
included the addition of two adaptive line enhancers (ALEs) in the primary and reference channels of the standard system, so that the narrowband noise is efficiently eliminated by the ALEs in the first stage, while the standard system concentrates on wideband noise cancellation in the second stage. That system was shown to reduce both narrowband and wideband noise signals, and its performance was better than that of the standard system.

In this paper, the author treats the microphone array system as a multiple-input multiple-output (MIMO) optimum filter. This view provides new insight into a beamformer that accounts for reverberation of the input signals by modeling the propagation with room impulse responses, so that the reverberation is cancelled at the output of the linear predictor part of the proposed system. In addition, this paper deals with speech enhancement under conditions of multiple and various noise sources in a realistic environment. The proposed MIMO system consists of two steps. In the first step, the noisy microphone signals are whitened by multi-channel linear prediction filters in the lower path of the proposed MIMO system. In the second step, the background noise signals are reconstructed from the prediction error signals by estimating the transfer functions of the noise generation systems. These estimates are obtained by the adaptive noise estimation filters in the multi-channel noise canceller. The enhanced speech signal is obtained by subtracting the reconstructed background noise signals from the noisy signal in the upper path. The rest of this paper is organized as follows. Section 2 reviews the research background, and Sect. 3 introduces the problem formulation and notation. Section 4 describes the principle of the proposed MIMO system for multiple-noise cancellation and dereverberation. Section 5 presents simulation results and discussions that demonstrate the performance of the proposed approach for dereverberation and multi-noise reduction; it also discusses the performance of the proposed MIMO system for a variety of actual noise sources in an actual room acoustic environment. Conclusions are given in Sect. 6.

2 Research background

In Faucon et al. (1989), a two-channel system was proposed for enhancing speech corrupted by additive noise. It was suggested to learn the filter of the blocking matrix during an initialization phase without noise and to keep the filters fixed while the system is running. This, of course, limits its use to situations with time-invariant propagation characteristics with respect to the desired signal.

In Van Compernolle (1990), the system of Faucon et al. (1989) is extended to an arbitrary number of microphones. Instead of choosing the filters of the quiescent weight vector independently of the filters of the blocking matrix, the two sets of filters are the same. The steering of the array toward the position of the desired source is implicitly included in the quiescent weight vector. The problem of adapting the blocking matrix and the multi-channel noise canceller is addressed by switching between adaptation of the blocking matrix and adaptation of the multi-channel noise canceller, depending on the presence of the desired signal and the presence of noise, respectively. The decision is made by a full-band energy-based voice activity detector.

The way the filters for the blocking matrix and for the quiescent weight vector are computed is altered in Linhui et al. (2017). The determination of the optimum filters for the blocking matrix can be seen from a system identification point of view. Using this interpretation, Linhui et al. (2017) suggest estimating the blocking matrix with a system identification technique that exploits the non-stationarity of the desired signal. As in Van Compernolle (1990), speech pauses are required to determine the blocking matrix filters, so the tracking problem for the sidelobe-cancelling path is not resolved. Other researchers have examined improved GSCs for speech enhancement in reverberant environments (Hoshuyama et al. 1999; Herbordt and Kellermann 2003; Herbordt et al. 2007). Hoshuyama et al. (1999) used a three-block structure similar to the GSC, but with a blocking matrix modified to operate adaptively. To limit leakage of the desired speech signal, which is responsible for distortion in the output signal, a quadratic constraint is imposed on the norm of the noise canceller coefficients; alternatively, use of the leaky LMS algorithm has been suggested. In Herbordt and Kellermann (2003), the relation between the robust GSC of Hoshuyama et al. (1999) and wideband linearly constrained least-squares error (LCLSE) beamforming in the time domain with spatio-temporal constraints was established. A frequency-domain implementation of the robust GSC beamformer was introduced to accelerate the convergence rate (Herbordt et al. 2007).

3 Problem formulation

The problem considered in this paper is depicted in Fig. 1: J sources are present in the sound field, and N microphones capture their signals. The output of the nth microphone, $y_n(k)$, is given by

$$ y_n(k) = \sum_{j=1}^{J} h_{n,j}(k) \otimes s_j(k) + v_n(k) = \sum_{j=1}^{J} \sum_{i=0}^{L-1} h_{n,j}(i)\, s_j(k-i) + v_n(k) \tag{1} $$
where $h_{n,j}$ is the acoustic channel impulse response from source j to microphone n, ⊗ denotes convolution, $s_j(k)$ is the jth source signal (we assume that the first source, $s_1(k)$, is the desired speech source, while the other source signals $s_j(k)$, j = 2, 3, ..., J, are the noise sources), $v_n(k)$ is the additive noise observed at the nth microphone, and L is the number of taps of the room impulse response. For ease of analysis, we neglect the noise terms $v_n(k)$ in (1). Using vector/matrix notation, the signal model can be rewritten as

$$ \mathbf{y}(k) = \mathbf{H}^T \mathbf{s}(k) \tag{2} $$

where

$$ \mathbf{y}(k) = [\mathbf{y}_1^T(k), \mathbf{y}_2^T(k), \ldots, \mathbf{y}_N^T(k)]^T \tag{3} $$

$$ \mathbf{y}_n(k) = [y_n(k), y_n(k-1), \ldots, y_n(k-(M-1))]^T \tag{4} $$

$$ \mathbf{s}(k) = [\mathbf{s}_1^T(k), \mathbf{s}_2^T(k), \ldots, \mathbf{s}_J^T(k)]^T \tag{5} $$

$$ \mathbf{s}_j(k) = [s_j(k), s_j(k-1), \ldots, s_j(k-(L+M-2))]^T \tag{6} $$

and $\mathbf{H}$ is the $J(L+M-1) \times NM$ convolution matrix

$$ \mathbf{H} = \begin{bmatrix} \mathbf{H}_{1,1} & \mathbf{H}_{1,2} & \cdots & \mathbf{H}_{1,N} \\ \mathbf{H}_{2,1} & \mathbf{H}_{2,2} & \cdots & \mathbf{H}_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{H}_{J,1} & \mathbf{H}_{J,2} & \cdots & \mathbf{H}_{J,N} \end{bmatrix} \tag{7} $$

$$ \mathbf{H}_{j,n} = \begin{bmatrix} \mathbf{h}_{j,n} & 0 & \cdots & 0 \\ 0 & \mathbf{h}_{j,n} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \mathbf{h}_{j,n} \end{bmatrix} \tag{8} $$

$$ \mathbf{h}_{j,n} = [h_{j,n}(0), h_{j,n}(1), \ldots, h_{j,n}(L-1)]^T, \quad j = 1, 2, \ldots, J, \quad n = 1, 2, \ldots, N \tag{9} $$

so that each block $\mathbf{H}_{j,n}$ is an $(L+M-1) \times M$ matrix whose mth column contains $\mathbf{h}_{j,n}$ shifted down by m − 1 samples.

Fig. 1  Acoustic system model

Given this signal model, our aim is to estimate the desired speech signal $s_1(k)$ from the observed microphone signals $y_n(k)$, n = 1, 2, ..., N. This involves two processing operations, dereverberation and multi-noise cancellation, which are explained in the following section.

4 The proposed MIMO beamformer system

The schematic diagram of the proposed MIMO system is shown in Fig. 2. It consists of three components: (1) fixed beamformer weights, (2) multi-channel linear prediction error filters (MCLPEFs), and (3) multi-channel adaptive noise estimation filters (MCANEFs).

Fig. 2  Schematic diagram of the proposed MIMO system (upper path: fixed beamformer $\mathbf{W}_{FB}(k)$; lower path: $\mathbf{W}_{LPEF}(k)$ followed by $\mathbf{W}_{ANEF}(k)$, whose noise estimates $\hat{s}_j(k)$ are subtracted from the upper-path signal to give $e_{out}(k)$)

4.1 Fixed beamformer weights $\mathbf{W}_{FB}(k)$

It is desirable to impose constraints that depend on both the spatial and the temporal characteristics of the desired signal. This can be written as

$$ \mathbf{C}_s^T(k)\, \mathbf{w}_s(k) = \mathbf{c}_s(k) \tag{10} $$

with the $NM^2 \times 1$ beamformer weight vector $\mathbf{w}_s(k)$, the $NM^2 \times CM^2$ constraint matrix $\mathbf{C}_s(k)$, and the $CM^2 \times 1$ constraint vector $\mathbf{c}_s(k)$, given respectively by

$$ \mathbf{w}_s(k) = [\mathbf{w}^T(k), \mathbf{w}^T(k-1), \ldots, \mathbf{w}^T(k-M+1)]^T \tag{11} $$

$$ \mathbf{C}_s(k) = \mathrm{diag}\{\mathbf{C}(k), \mathbf{C}(k-1), \ldots, \mathbf{C}(k-M+1)\} \tag{12} $$

$$ \mathbf{c}_s(k) = [\mathbf{c}^T(k), \mathbf{c}^T(k-1), \ldots, \mathbf{c}^T(k-M+1)]^T \tag{13} $$

The constraint (10) repeats the spatial constraints $\mathbf{C}^T(k-m)\mathbf{w}(k-m) = \mathbf{c}(k-m)$ for M successive time instants m = 0, 1, ..., M − 1. We require that the desired speech signal processed by the fixed beamformer weight vector $\mathbf{W}_{FB}(k)$ is not distorted at the output of the proposed approach over a data block of M successive samples. With the vector $\mathbf{x}(k)$ of the desired signal at the microphones,

$$ \mathbf{x}(k) = [\mathbf{x}_1^T(k), \mathbf{x}_2^T(k), \ldots, \mathbf{x}_N^T(k)]^T \tag{14} $$
$$ \mathbf{x}_n(k) = [x_n(k), x_n(k-1), \ldots, x_n(k-(M-1))]^T \tag{15} $$

such a constraint can be put into the form of (10) with C = 1 as follows:

$$ \mathbf{C}_s(k) = \mathrm{diag}\{\mathbf{x}(k), \mathbf{x}(k-1), \ldots, \mathbf{x}(k-M+1)\} \tag{16} $$

$$ \mathbf{c}_s(k) = \mathbf{C}_s^T(k)\, \mathbf{W}_{FB}(k) \tag{17} $$

where

$$ \mathbf{W}_{FB}(k) = [\mathbf{W}_{FB}^T(k), \mathbf{W}_{FB}^T(k-1), \ldots, \mathbf{W}_{FB}^T(k-M+1)]^T \tag{18} $$

Note that the constraint vector $\mathbf{c}_s(k)$ depends on the fixed beamformer weight vector $\mathbf{W}_{FB}(k)$, so the design of $\mathbf{c}_s(k)$ is replaced by an appropriate choice of the weight vector $\mathbf{W}_{FB}(k)$. This has the advantage that any beamformer design can be used to specify the constraints or, equivalently, the beamformer response with respect to the desired signal. The fixed beamformer weight vector may be realized as a delay-and-sum (DS) beamformer (as assumed in this paper) if the position of the desired source varies only slightly within a given interval of directions of arrival. When designing the fixed beamformer, one has to ensure that its mainlobe is wide enough not to distort the desired signal. Robustness against array perturbations can be obtained with appropriate fixed beamformer designs.

4.2 MCLPEF weights $\mathbf{W}_{LPEF}(k)$

As can be seen from the structure of the proposed system (Fig. 2), there are two signal paths. The first is the upper path, which contains the desired speech and residual noise after the fixed beamformer weights are applied. The second is the lower path, which consists of the multi-channel linear prediction error filters (MCLPEFs) and the multi-channel adaptive noise estimation filters (MCANEFs). The input signals of the MCLPEFs are the noisy microphone signals $y_n(k)$, n = 1, 2, ..., N. In matrix form, the output signal of the MCLPEFs can be expressed as

$$ \hat{\mathbf{e}}(k) = \mathbf{y}(k) - \mathbf{y}^T(k-1)\, \mathbf{W}_{LPEF} = \mathbf{s}^T(k)\mathbf{H} - \mathbf{s}^T(k-1)\mathbf{H}\mathbf{W}_{LPEF} \tag{19} $$

where

$$ \hat{\mathbf{e}}(k) = [\hat{\mathbf{e}}_1^T(k), \hat{\mathbf{e}}_2^T(k), \ldots, \hat{\mathbf{e}}_N^T(k)]^T \tag{20} $$

$$ \hat{\mathbf{e}}_n(k) = [\hat{e}_n(k), \hat{e}_n(k-1), \ldots, \hat{e}_n(k-(M_{LPEF}-1))]^T, \quad n = 1, 2, \ldots, N \tag{21} $$

$$ \mathbf{W}_{LPEF}(k) = [\mathbf{w}_1^T(k), \mathbf{w}_2^T(k), \ldots, \mathbf{w}_N^T(k)]^T \tag{22} $$

$$ \mathbf{w}_n(k) = [w_n(0), w_n(1), \ldots, w_n(M_{LPEF}-1)]^T, \quad n = 1, 2, \ldots, N \tag{23} $$

are the tap coefficients of the nth LPEF, which are updated by the normalized least mean square (NLMS) adaptive algorithm (Haykin and Kailath 2002). By minimizing the mean-square value of the prediction error signal set $\hat{\mathbf{e}}(k)$, we obtain

$$ \mathbf{W}_{LPEF} = \left[\mathbf{H}^T E\{\mathbf{s}(k-1)\mathbf{s}^T(k-1)\}\mathbf{H}\right]^{+} \mathbf{H}^T E\{\mathbf{s}(k-1)\mathbf{s}^T(k)\}\mathbf{H} \tag{24} $$

where $\mathbf{A}^{+}$ is the Moore–Penrose generalized inverse of a matrix $\mathbf{A}$ (Harville 1997) and $E\{\cdot\}$ is the expectation operator. Assuming that the source signals $\mathbf{s}_j(k)$ can be generated by a MIMO AR process, we can write $\mathbf{s}_j(k)$ as (Kailath et al. 2000)

$$ \mathbf{s}_j(k) = \mathbf{C}_j^T \mathbf{s}_j(k-1) + \mathbf{e}_j(k) \tag{25} $$

where $\mathbf{e}_j(k) = [e_j(k), 0, \ldots, 0]^T$ and $\mathbf{C}_j$ is the $B \times B$ companion matrix

$$ \mathbf{C}_j = \begin{bmatrix} a_{j,1} & 1 & 0 & \cdots & 0 \\ a_{j,2} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ a_{j,B-1} & 0 & \cdots & 0 & 1 \\ a_{j,B} & 0 & \cdots & 0 & 0 \end{bmatrix}, \quad j = 2, 3, \ldots, J \tag{26} $$

built from the coefficients of $a_j(z)$, the z-transform of an AR polynomial of order B:

$$ a_j(z) = 1 - \left\{a_{j,1} z^{-1} + \cdots + a_{j,B} z^{-B}\right\} \tag{27} $$

We then have

$$ E\{\mathbf{s}(k-1)\mathbf{s}^T(k)\} = E\{\mathbf{s}(k-1)\mathbf{s}^T(k-1)\}\,\mathbf{C}_j \tag{28} $$

where $\mathbf{s}(k-1)$ and $\mathbf{e}(k)$ are assumed to be orthogonal. Assuming that $E\{\mathbf{s}(k-1)\mathbf{s}^T(k-1)\}$ is positive definite, we can replace it with $\mathbf{X}^T\mathbf{X}$ for some matrix $\mathbf{X}$. Thus, the linear prediction error filters can be expressed as (Delcroix et al. 2005)

$$ \mathbf{W}_{LPEF} = \mathbf{H}^T (\mathbf{H}\mathbf{H}^T)^{-1} \mathbf{C}_j \mathbf{h}_1 \tag{29} $$

where $\mathbf{h}_1 = [\mathbf{h}_{1,1}^T, \mathbf{h}_{2,1}^T, \ldots, \mathbf{h}_{J,1}^T]^T$ is the first column of $\mathbf{H}$. Using (2), (19), and (29), the prediction error signal $\hat{\mathbf{e}}(k)$ becomes

$$ \begin{aligned} \hat{\mathbf{e}}(k) &= \mathbf{s}^T(k)\mathbf{H} - \mathbf{s}^T(k-1)\mathbf{H}\mathbf{W}_{LPEF} \\ &= \mathbf{s}^T(k)\mathbf{H} - \mathbf{s}^T(k-1)\mathbf{H}\mathbf{H}^T(\mathbf{H}\mathbf{H}^T)^{-1}\mathbf{C}_j\mathbf{h}_1 \\ &= [\mathbf{s}^T(k) - \mathbf{s}^T(k-1)\mathbf{C}_j]\,\mathbf{h}_1 \\ &= \mathbf{e}^T(k)\,\mathbf{h}_1 \end{aligned} \tag{30} $$

Equation (30) shows that the prediction error signals $\hat{\mathbf{e}}(k)$ are proportional to the white noise sources $\mathbf{e}(k)$.
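To make the whitening step of Sect. 4.2 concrete, the sketch below generates one AR noise channel from white noise (the single-channel case of Eq. (25) and of the noise generation system in Fig. 3) and whitens it with a linear prediction error filter adapted by NLMS, the update named for the taps of Eq. (23). It is a minimal illustration under stated assumptions, not the author's implementation: a single channel is used instead of N, and the AR(2) coefficients (1.2, −0.5), the step size mu = 0.01, and the function names `generate_ar_noise` and `nlms_lpef` are hypothetical choices.

```python
import random


def generate_ar_noise(ar_coeffs, n_samples, seed=0):
    """Drive an all-pole (AR) filter with unit-variance white Gaussian noise.

    Uses the sign convention of Eq. (27), a(z) = 1 - (a_1 z^-1 + ... + a_B z^-B),
    i.e. s(k) = a_1 s(k-1) + ... + a_B s(k-B) + e(k).
    Returns the coloured noise s and its white excitation e.
    """
    rng = random.Random(seed)
    s, e = [], []
    for k in range(n_samples):
        ek = rng.gauss(0.0, 1.0)
        past = sum(a * s[k - 1 - b] for b, a in enumerate(ar_coeffs) if k - 1 - b >= 0)
        s.append(past + ek)
        e.append(ek)
    return s, e


def nlms_lpef(x, n_taps, mu=0.01, eps=1e-8):
    """Single-channel linear prediction error filter adapted with NLMS.

    Predicts x(k) from x(k-1), ..., x(k-n_taps); returns the prediction
    error sequence (the whitened output) and the final tap vector.
    """
    w = [0.0] * n_taps
    err = []
    for k in range(len(x)):
        # Regressor of past samples; zeros before the signal starts.
        u = [x[k - 1 - i] if k - 1 - i >= 0 else 0.0 for i in range(n_taps)]
        ek = x[k] - sum(wi * ui for wi, ui in zip(w, u))
        # Normalized step: the correction is scaled by the regressor energy.
        step = mu * ek / (eps + sum(ui * ui for ui in u))
        w = [wi + step * ui for wi, ui in zip(w, u)]
        err.append(ek)
    return err, w


# Hypothetical AR(2) noise channel: a_1 = 1.2, a_2 = -0.5 (a stable choice).
noise, excitation = generate_ar_noise([1.2, -0.5], 20_000, seed=1)
whitened, taps = nlms_lpef(noise, n_taps=2, mu=0.01)
```

After convergence, the taps approach the AR coefficients and the prediction error approaches the white excitation, which is the behaviour Eq. (30) describes for the multi-channel case.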

The MCLPEFs take reverberation (multipath propagation of the input source signals) into account by modeling the propagation with room impulse responses, so that the reflected signal paths are cancelled at the output of the MCLPEFs. As a consequence, however, the prediction error signals become white (Haykin and Kailath 2002). In the MCANEFs, we therefore estimate the MIMO AR polynomials of the noise generation systems in order to reconstruct the noise signals precisely.

4.3 MCANEF weights $\mathbf{W}_{ANEF}(k)$

In this part, we consider the reconstruction of the noise signals from the whitened signals (the output of the LPEFs). Assume that the noise sources are generated by exciting a finite-order MIMO AR system with white noise sources, as shown in Fig. 3. The principle of the MIMO system identification model is shown in Fig. 4. The MCANEFs are placed in parallel with the MIMO noise generation system. The outputs $\hat{s}_j(k)$ of the modeling system are subtracted from the outputs $s_j(k)$, j = 1, 2, ..., J, of the generation system, which yields the error signals $e_{out}(k)$. Minimizing $e_{out}(k)$ according to a least-squares error (LSE) cost function allows the MIMO noise generation system to be identified by the modeling system $\mathbf{W}_{ANEF}(k)$. Note that, in this identification setting, the noise signals $s_j(k)$, j = 2, ..., J, play the role of desired signals and the speech $s_1(k)$ plays the role of a disturbance.

Fig. 3  MIMO noise generation system (white noise signals $e_j(k)$ drive an AR noise system whose outputs are the noise sources $s_j(k)$, j = 2, ..., J)

Fig. 4  MIMO system identification (the model $\mathbf{W}_{ANEF}(k)$ runs in parallel with the MIMO noise generation system; its outputs $\hat{s}_j(k)$ are compared with $s_j(k)$ to form $e_{out}(k)$)

The MCANEFs cannot estimate the desired speech signal $s_1(k)$, because the residual speech components are whitened by the MCLPEFs in the lower path, and the whitened residual speech components are uncorrelated with the desired speech components $s_1(k)$ in the upper path. As a result, the output of the MCANEFs represents only the reconstructed noise signals $\hat{s}_j(k)$, j = 2, 3, ..., J. Noise cancellation is achieved by subtracting the reconstructed noise signals $\hat{s}_j(k)$ from the signals in the upper path. Referring to Fig. 2, the MIMO system (MCANEFs) with N input channels is described by $\mathbf{W}_{ANEF}(k)$, which comprises N column vectors $\mathbf{W}_n$ of length $M_{ANEF}$ with filter coefficients $w_{m,n}(k)$, m = 1, 2, ..., $M_{ANEF}$, n = 1, 2, ..., N, i.e.,

$$ \mathbf{W}_{ANEF}(k) = [\mathbf{W}_1^T(k), \mathbf{W}_2^T(k), \ldots, \mathbf{W}_N^T(k)] \tag{31} $$

The system $\mathbf{W}_{ANEF}(k)$ is driven by the N input signals $\hat{\mathbf{e}}(k)$ defined in (20). The output signals of the MIMO system, $\hat{s}_j(k) = \mathbf{W}_{ANEF}^T(k)\,\hat{\mathbf{e}}(k)$, j = 2, 3, ..., J, are subtracted from the reference signals $s_j(k)$, j = 1, 2, ..., J, which yields the error signal $e_{out}(k) = s_j(k) - \hat{s}_j(k)$. This error signal is used to formulate the cost function from which $\mathbf{W}_{ANEF}(k)$ is determined according to the least-squares (LS) criterion. Using the singular value decomposition (SVD) to solve the LS problem, the solution can be computed as (Golub and Van Loan 1989)

$$ \mathbf{W}_{ANEF}(k) = [\hat{\mathbf{e}}^T(k)]^{+}\, \mathbf{s}_j(k), \quad j = 1, 2, \ldots, J \tag{32} $$

The important observation from Fig. 4 is that $\mathbf{W}_{ANEF}(k)$ performs very well even when the number of input signals J (or, equivalently, the number of noise sources) equals the number of microphones N. This is the unique feature of the proposed system, whereas the good performance of existing techniques is severely impaired when J ≥ N. The influence of these parameters (J, N, and the filter length $M_{ANEF}$ of the adaptive noise estimation filters) on the performance of the proposed MIMO system is discussed in the following section.

5 Simulation results

5.1 Simulation environments

The simulated room has dimensions 6 m × 3 m × 2.5 m. It contains a microphone array, a desired speech source $s_1(k)$, and multiple white Gaussian noise sources. The speech source is located 1 m from the center of the microphone array in the broadside direction, and the noise sources are located 2 m from the array center at directions $\theta_n$ = 0°, ±30°, ±45°, 90°, and 180°. Unless otherwise indicated, only the noise source at $\theta_n$ = 45° is used (in Sects. 5.3 and 5.4, multiple noise sources are used). All signals are sampled at 16 kHz with 16-bit resolution.
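Section 4.1 realizes the fixed beamformer $\mathbf{W}_{FB}(k)$ as a delay-and-sum (DS) beamformer. The sketch below computes integer-sample steering delays toward a point source and applies DS to a set of channels, using the source positions stated above. Several details are assumptions, not taken from the paper: a four-microphone uniform linear array with 5 cm spacing (the spacing of the experiment in Sect. 5.4) lying on the x axis, angles measured from the array axis so that broadside is 90°, a speed of sound of 343 m/s, and rounding to whole samples instead of fractional-delay filtering. The names `MICS`, `SPEECH_POS`, `NOISE_POS`, `steering_delays`, and `delay_and_sum` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
FS = 16_000             # Hz, sampling rate from Sect. 5.1

# Hypothetical geometry: 4-mic ULA on the x axis, 5 cm spacing, centred at the origin.
MICS = [(-0.075, 0.0), (-0.025, 0.0), (0.025, 0.0), (0.075, 0.0)]
SPEECH_POS = (0.0, 1.0)  # 1 m at broadside (90 degrees)
NOISE_POS = (2.0 * math.cos(math.radians(45)), 2.0 * math.sin(math.radians(45)))  # 2 m at 45 degrees


def steering_delays(mic_positions, source_position, fs=FS, c=SPEED_OF_SOUND):
    """Integer delays (in samples) that time-align a point source across the array.

    The microphone farthest from the source gets delay 0, so all delays are >= 0.
    """
    dists = [math.dist(m, source_position) for m in mic_positions]
    d_max = max(dists)
    return [round((d_max - d) / c * fs) for d in dists]


def delay_and_sum(channels, delays):
    """Delay each channel by its integer delay and average across channels."""
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for k in range(d, n):
            out[k] += x[k - d]
    return [v / len(channels) for v in out]


speech_delays = steering_delays(MICS, SPEECH_POS)
noise_delays = steering_delays(MICS, NOISE_POS)
```

Under these assumptions, the speech source 1 m away at broadside needs no whole-sample delay at 16 kHz, while steering toward the 45° noise position needs delays of a few samples; a practical DS design would use fractional delays to avoid the rounding.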

The acoustic room impulse responses are calculated using the image method described by Allen and Berkley (1979) for different reverberation times T60. The T60 can be expressed as a function of the reflection coefficient γ of the walls (here we assume equal γ for all walls of the simulated room) according to Sabine's formula (Neubauer 2001):

$$ T_{60} = \frac{0.163\, V}{-S \log(1 - \gamma)} \tag{33} $$

where V is the volume of the room and S is its total surface area. To validate the simulated image method, we plot the room impulse response from $s_1(k)$ to the first microphone for different values of T60; these impulse responses are shown in Fig. 5. The performance of the proposed system is assessed in terms of speech distortion and noise reduction. The speech distortion (in dB) is defined as the ratio, on a logarithmic scale, between the variance of the desired speech signal at the microphones and the variance of the desired speech signal at the output of the proposed system. Likewise, the noise reduction (in dB) is defined as the ratio, on a logarithmic scale, between the variance of the noise signals at the microphones and the variance of the noise signals at the output of the proposed system.

5.2 Noise reduction performance for a single noise source

In this part, simulation results are presented by comparing the performance of the proposed system with that of the DS and GSC beamformers. Figure 6 shows the noise reduction performance as a function of the reverberation time T60. As expected, the GSC performs much better for small T60 than for higher T60; similar results for the GSC algorithm were obtained by Bitzer et al. (1999). These results support the validity of the simulations carried out with the image method. Unlike the GSC, the proposed system performs well for high T60: for all reverberation times, it achieves better noise reduction than the GSC algorithm.

Figures 7 and 8 show the noise reduction and speech distortion of the proposed algorithm and the GSC as functions of the number of microphones N, the reverberation time T60, and the filter length $M_{ANEF}$ of the adaptive noise estimation filters. The filter length $M_{LPEF}$ of the linear prediction error filters is 32 taps. The desired speech and the noise source are located 1 m and 2 m from the array center at directions 90° and 45°, respectively. As can be seen from Fig. 7, the noise reduction of both algorithms increases with the number of microphones N and with the filter length. Note also that, for all N and for both values of T60, the proposed algorithm achieves better noise reduction than the GSC. Comparing Fig. 8a with Fig. 8b, more speech distortion occurs for the higher reverberation time, i.e., T60 = 200 ms. The speech distortion of the GSC decreases as the number of spatial and temporal degrees of freedom (i.e., N) increases. The speech distortion of the proposed algorithm is much lower than that of the GSC, for the reason explained in Sect. 4: the linear prediction filters take the multipath propagation of the desired speech into account, so that both the direct and the secondary signal paths are blocked (cancelled) at the input of $\mathbf{W}_{ANEF}(k)$.

Fig. 5  Room impulse responses generated by the image method: (a) T60 = 9 ms, (b) T60 = 200 ms, (c) T60 = 1.1 s

Fig. 6  Noise reduction performance as a function of the reverberation time (N = 4 microphones); curves: DS, GSC, and proposed; axes: reverberation time vs. noise reduction [dB]
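Eq. (33) links the wall coefficient γ to T60 for a given room. The sketch below evaluates it and inverts it for the 6 m × 3 m × 2.5 m room of Sect. 5.1 (V = 45 m³, S = 81 m²), for instance to choose γ for the T60 values used in Figs. 7 and 8. Assumptions: "Log" in Eq. (33) is taken as the natural logarithm, and the formula is applied exactly as printed. Note that, as printed, T60 decreases when γ increases, which is how the absorption coefficient behaves in the Eyring form of this formula. The function names are hypothetical.

```python
import math

# Room of Sect. 5.1 (metres).
LX, LY, LZ = 6.0, 3.0, 2.5
VOLUME = LX * LY * LZ                           # 45 m^3
SURFACE = 2.0 * (LX * LY + LX * LZ + LY * LZ)   # 81 m^2


def t60_from_gamma(gamma, volume=VOLUME, surface=SURFACE):
    """Eq. (33): T60 = 0.163 V / (-S * ln(1 - gamma)), natural log assumed."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return 0.163 * volume / (-surface * math.log(1.0 - gamma))


def gamma_from_t60(t60, volume=VOLUME, surface=SURFACE):
    """Invert Eq. (33): gamma = 1 - exp(-0.163 V / (S * T60))."""
    if t60 <= 0.0:
        raise ValueError("T60 must be positive")
    return 1.0 - math.exp(-0.163 * volume / (surface * t60))


for target in (0.064, 0.2):  # T60 values (s) used in Figs. 7 and 8
    g = gamma_from_t60(target)
    print(f"T60 = {target:.3f} s -> gamma = {g:.3f}")
```

The inverse is closed-form because Eq. (33) depends on γ only through log(1 − γ).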

Fig. 7  Noise reduction performance for T60 = 64 ms (a) and T60 = 200 ms (b) over the filter length $M_{ANEF}$ and as a function of the number of microphones N; curves: proposed and GSC for N = 2, 4, 6; axes: filter length vs. noise reduction [dB]

Fig. 8  Speech distortion performance for T60 = 64 ms (a) and T60 = 200 ms (b) over the filter length $M_{ANEF}$ and as a function of the number of microphones N; curves: proposed and GSC for N = 2, 4, 6; axes: filter length vs. speech distortion [dB]
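Figures 6, 7, and 8 report the two measures defined in Sect. 5.1. The sketch below computes them from separately available speech and noise components, which is possible in simulation. Assumptions not stated in the paper: the microphone-side variance is averaged over the N microphones, and the output-side components are obtained by passing speech-only and noise-only inputs through the same (frozen) filters. The function names are hypothetical.

```python
import math
from statistics import pvariance


def _log_variance_ratio_db(components_at_mics, component_at_output):
    """10*log10(mean variance over the microphones / variance at the output)."""
    mean_in = sum(pvariance(x) for x in components_at_mics) / len(components_at_mics)
    return 10.0 * math.log10(mean_in / pvariance(component_at_output))


def noise_reduction_db(noise_at_mics, noise_at_output):
    """Noise reduction: input noise variance over output noise variance, in dB."""
    return _log_variance_ratio_db(noise_at_mics, noise_at_output)


def speech_distortion_db(speech_at_mics, speech_at_output):
    """Speech distortion: input speech variance over output speech variance, in dB."""
    return _log_variance_ratio_db(speech_at_mics, speech_at_output)


# Toy check: noise attenuated by a factor of 10 in amplitude, speech passed unchanged.
noise_in = [[1.0, -1.0] * 50, [-1.0, 1.0] * 50]
noise_out = [0.1, -0.1] * 50
speech = [0.5, 0.25, -0.5, -0.25] * 25
nr = noise_reduction_db(noise_in, noise_out)
sd = speech_distortion_db([speech, speech], speech)
```

A tenfold amplitude reduction of the noise gives 20 dB of noise reduction, and a speech component that reaches the output at its original power gives 0 dB of speech distortion.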

5.3 Noise reduction performance for multiple noise sources

In this part, we consider all the noise sources given in Sect. 5.1. The T60 of the simulated room is 200 ms. Table 1 summarizes the simulation results. Several observations and interdependencies described earlier in Sect. 4 can clearly be seen in this table. First, as the number of noise sources increases, the noise reduction ability of the GSC decreases, particularly when the number of noise sources is equal to or greater than the number of microphones. For example, with four noise sources and N = 4, the noise reduction of the GSC (5.541 dB) is almost equal to that of the DS beamformer (5.451 dB). This means that the noise reduction of the GSC is then performed only by the DS beamformer in its reference path, and no degrees of freedom remain for noise reduction in the multi-channel noise canceller. Second, the proposed system maintains a high noise reduction even when the number of noise sources equals N. For example, with four noise sources and N = 4, the noise reduction of the proposed system is 20.211 dB.

Table 1  Noise reduction (dB) of the different algorithms; T60 = 200 ms, $M_{LPEF}$ = 32 taps, $M_{ANEF}$ = 256 taps

Noise     Two microphones (N = 2)     Four microphones (N = 4)    Six microphones (N = 6)
sources   DS      GSC     Proposed    DS      GSC     Proposed    DS      GSC     Proposed
1         2.302   4.192   17.500      4.562   9.772   20.660      5.722   13.032  22.280
2         2.605   3.355   17.594      4.895   6.515   20.894      6.035   8.20    22.754
3         2.822   3.072   17.633      5.922   5.642   20.593      6.412   6.772   22.553
4         3.011   3.101   17.461      5.451   5.541   20.211      6.721   6.681   21.751
5         3.243   3.233   16.142      5.883   5.843   17.852      7.323   7.033   18.762
6         3.194   3.115   14.145      5.834   5.574   15.075      7.414   6.774   15.425
7         3.080   2.890   12.502      5.910   5.360   13.112      7.620   6.570   13.392
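The margins quoted in this section follow directly from Table 1. The short sketch below recomputes, for N = 4, the margin of the proposed system over the GSC and of the GSC over DS for each number of noise sources. The dictionaries simply transcribe the N = 4 columns of Table 1; the names are hypothetical.

```python
# N = 4 columns of Table 1 (noise reduction in dB), keyed by number of noise sources.
DS_N4 = {1: 4.562, 2: 4.895, 3: 5.922, 4: 5.451, 5: 5.883, 6: 5.834, 7: 5.910}
GSC_N4 = {1: 9.772, 2: 6.515, 3: 5.642, 4: 5.541, 5: 5.843, 6: 5.574, 7: 5.360}
PROPOSED_N4 = {1: 20.660, 2: 20.894, 3: 20.593, 4: 20.211, 5: 17.852, 6: 15.075, 7: 13.112}


def margin_db(better, worse):
    """Per-row difference better - worse, rounded to the table's 3 decimals."""
    return {k: round(better[k] - worse[k], 3) for k in better}


proposed_vs_gsc = margin_db(PROPOSED_N4, GSC_N4)
gsc_vs_ds = margin_db(GSC_N4, DS_N4)

for sources in sorted(proposed_vs_gsc):
    print(f"{sources} noise source(s): proposed - GSC = {proposed_vs_gsc[sources]:6.3f} dB, "
          f"GSC - DS = {gsc_vs_ds[sources]:6.3f} dB")
```

At four noise sources, the proposed-over-GSC margin is 14.67 dB and the GSC-over-DS margin is only 0.09 dB, which is why the GSC is described as performing no better than its DS path at that point.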

The noise reduction improvement with respect to the GSC is about 14.67 dB. This improvement is also depicted in Fig. 9, and the reason behind it was explained in Sect. 4. Finally, the performance of the proposed system decreases as the number of noise sources increases, particularly when it exceeds the number of adaptive noise estimation filters $\mathbf{W}_{ANEF}(k)$.

5.4 Performance validation with actual noise signals

In this experiment, four microphones (N = 4) arranged in a linear array with 5 cm spacing are used. The microphones are omnidirectional electret condenser microphones of type ATP-20M. One desired speech source $s_1(k)$ and four noise sources, $s_2(k)$, $s_3(k)$, $s_4(k)$, and $s_5(k)$, are considered. These noises are white Gaussian noises played back by loudspeakers. LabVIEW (2005) with an appropriate data acquisition card is used to implement the proposed MIMO system. For comparison, the GSC system is also implemented and tested. Unless mentioned otherwise, all parameters are the same as those in the simulated acoustic environment. To show the overlap in the time and frequency domains, spectrograms are used to display the results. Figure 10a shows the spectrogram of the total signal received by the first microphone. Note that, to show a realistic result, we have not applied a high-pass filter, which could be used to remove the undesired low-frequency components. Figures 10b and 10c show the spectrograms of the enhanced speech for the GSC and the proposed MIMO system, respectively; both spectrograms contain some of these low-frequency components. Comparing the three results reveals that the proposed system achieves almost perfect multi-noise cancellation, whereas the output of the GSC still contains a significant amount of noise. The GSC is less effective than the proposed system in terms of multi-noise suppression because it uses only the channel information from the desired speech to the microphones, whereas the proposed system uses all channel information, including that from the noise sources. Figure 11 shows further results for a moving car, where actual car noise signals were recorded inside the moving car and then played back by loudspeakers $s_2$ and $s_3$. The proposed system shows the potential to reduce actual car noise. Note also that the undesired cancellation of the desired speech signal is high with the GSC and smaller with the proposed system. To compare the multi-noise reduction ability of the DS, the GSC, and the proposed system for a variety of actual noise sources, the SNR improvements were calculated; they are summarized in Table 2, which shows that the proposed system outperforms the GSC for every noise type, by margins ranging from about 0.5 dB (train) to about 3 dB (car).

Fig. 9  Multi-noise reduction performance: (a) microphone signals, (b) output of the GSC, (c) output of the proposed algorithm
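Section 5.4 presents its results as spectrograms (Fig. 10). The dependency-free sketch below shows one way such a magnitude spectrogram in dB can be computed: frame the signal, apply a Hann window, and take a DFT of each frame. It is illustrative only. The frame length, hop size, and function names are hypothetical (the paper does not state its spectrogram settings), and the direct O(n_fft²) DFT stands in for the FFT a practical implementation would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def spectrogram_db(x, n_fft=256, hop=128, floor_db=-120.0):
    """Magnitude spectrogram in dB using a direct DFT per frame.

    Returns one list per frame with n_fft // 2 + 1 bins (0 Hz to fs / 2);
    bin b corresponds to b * fs / n_fft Hz.
    """
    win = hann(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * win[i] for i in range(n_fft)]
        row = []
        for b in range(n_fft // 2 + 1):
            acc = sum(seg[i] * cmath.exp(-2j * math.pi * b * i / n_fft) for i in range(n_fft))
            mag = abs(acc)
            row.append(20.0 * math.log10(mag) if mag > 0.0 else floor_db)
        frames.append(row)
    return frames


# Example: a 2 kHz tone sampled at 16 kHz, analysed with 64-sample frames (250 Hz bins).
fs = 16_000
tone = [math.sin(2.0 * math.pi * 2000.0 * k / fs) for k in range(640)]
spec = spectrogram_db(tone, n_fft=64, hop=32)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Every frame peaks at bin 8, i.e. 8 × 16000 / 64 = 2000 Hz, as expected for the tone.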

13
International Journal of Speech Technology

Fig. 10  Spectrograms of the noisy signal, enhanced speech using Fig. 11  Practical results for actual car noises
GSC, and proposed system

Table 2 SNR improvement for a variety of actual noise sources

Noise source    SNR improvement (dB)
                DS       GSC      Proposed
Car             0.317    4.140    7.132
Train           0.591    4.080    4.537
Street          0.798    5.634    7.729
Airport         0.776    4.370    5.248
Restaurant      0.776    4.370    5.2533

6 Conclusions

The proposed MIMO system can be applied to enhance a speech signal corrupted by multiple noise sources in a complicated acoustic environment. The proposed system consists of two signal paths, namely an upper and a lower signal path. The multi-channel linear prediction error filters (MCLPEFs) in the lower path suppress the room reverberation effect; however, they also whiten the noise signals. The noise signals can then be reconstructed by the multi-channel adaptive noise estimation filters (MCANEFs), which perform MIMO system identification. The simulations demonstrate that the proposed MIMO system achieves better multi-noise reduction and lower speech distortion than standard fixed and adaptive beamforming techniques for all the reverberation times considered. This is because the channel information of all input signals is taken into account in the proposed approach. Because the proposed approach loses no degrees of freedom for multi-noise cancellation, it can be implemented with a small number of microphones. The performance of the proposed MIMO system was validated in a realistic environment with real noise sources.

References

Allen, J., & Berkley, D. A. (1979). Image method for efficiently simulating small room acoustics. Journal of the Acoustical Society of America, 66, 943–950.
Bitzer, J., Simmer, K. U., & Kammeyer, K. D. (1999). Theoretical noise reduction limits of the generalized sidelobe canceller (GSC) for speech enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (Vol. 5, pp. 2965–2968).
Delcroix, M., Hikichi, T., & Miyoshi, M. (2005). Blind dereverberation algorithm for speech signals based on multi-channel linear prediction. Acoustical Science and Technology, 26(5), 432–439.
Djendi, M. (2018). An efficient wavelet-based adaptive filtering algorithm for automatic blind speech enhancement. International Journal of Speech Technology, 21(2), 355–367.
Faucon, G., Mezalek, S. T., & Le Bouquin, R. (1989). Study and comparison of three structures for enhancement of noisy speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 385–388).
Golub, G. H., & Van Loan, C. F. (1989). Matrix computations (2nd ed.). Baltimore: Johns Hopkins University Press.
Griffiths, L. J., & Jim, C. W. (1982). An alternative approach to linearly constrained adaptive beamforming. IEEE Transactions on Antennas and Propagation, 30(1), 27–34.
Harville, D. A. (1997). Matrix algebra from a statistician's perspective. New York: Springer.
Haykin, S., & Kailath, T. (2002). Adaptive filter theory (4th ed.). Upper Saddle River, NJ: Prentice-Hall, Pearson Education, Inc.
Herbordt, W., Buchner, H., Nakamura, S., & Kellermann, W. (2007). Multichannel bin-wise robust frequency-domain adaptive filtering and its application to adaptive beamforming. IEEE Transactions on Audio, Speech, and Language Processing, 15(4), 1340–1351.
Herbordt, W., & Kellermann, W. (2003). Adaptive beamforming for audio signal acquisition. In J. Benesty & Y. Huang (Eds.), Adaptive signal processing: Applications to real-world problems (pp. 155–194). Berlin: Springer.
Hoshuyama, O., Sugiyama, A., & Hirano, A. (1999). A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters. IEEE Transactions on Signal Processing, 47(10), 2677–2684.
Kailath, T., Sayed, A. H., & Hassibi, B. (2000). Linear estimation. Upper Saddle River, NJ: Prentice-Hall.
Linhui, Sun, Min, & Yang, SuZ. (2017). An adaptive speech endpoint detection method in low SNR environments. International Journal of Speech Technology, 20(3), 651–658.
Mohammed, H. S., Rihan, A. M., NassarAdel, M. A., El-Fishawy, S., & Abd El-Samie, E. (2017). Efficient compression and reconstruction of speech signals using compressed sensing. International Journal of Speech Technology, 20(4), 851–857.
Mohammed, J. R. (2009a). An efficient method for combining adaptive echo and noise canceller in hands-free systems. International Journal of Adaptive Control and Signal Processing, 23, 278–292.
Mohammed, J. R. (2009b). Adaptive noise reduction and acoustic echo cancellation using adaptive filters in hands-free communication systems. PhD thesis, Punjab Engineering College, Panjab University, Chandigarh, India, 20 November 2009.
Mohammed, J. R. (2017). Development of two-input adaptive noise canceller with ability to cancel wideband and narrowband noise signals. International Journal of Speech Technology, 20(3), 741–751.
National Instruments. (2005). LabVIEW fundamentals, Version 8.0 (User manual, Part Number 324029A-01, August 2005).
Neubauer, R. O. (2001). Existing reverberation time formulae: a comparison with computer simulated reverberation times. In 8th International Congress on Sound and Vibration, Hong Kong, July 2001 (pp. 805–812).
Rashmirekha, R., & Mohanty, M. N. (2018). Performance analysis of adaptive variational mode decomposition approach for speech enhancement. International Journal of Speech Technology, 21(2), 369–381.
Van Compernolle, D. (1990). Switching adaptive filters for enhancing noisy reverberant speech from microphone array recordings. In Proceedings of the IEEE ICASSP, Albuquerque, NM, April 1990 (Vol. 2, pp. 833–836).
