Speech Enhancement in Modulation Domain Using Codebook-Based Speech and Noise Estimation

Abstract—Conventional single-channel speech enhancement methods implement the analysis-modification-synthesis (AMS) framework in the acoustic frequency domain. In recent years, it has been shown that extending this framework to the modulation frequency domain may result in better noise suppression. However, this conclusion has been reached by relying on a minimum statistics approach for the required noise power spectral density (PSD) estimation, which is known to introduce a time-frame lag when the noise is non-stationary. In this paper, to avoid this problem, we perform noise suppression in the modulation domain with speech and noise power spectra obtained from a codebook-based estimation approach. The PSD estimates derived from the codebook approach are used to obtain a minimum mean square error (MMSE) estimate of the clean speech modulation magnitude spectrum, which is combined with the phase spectrum of the noisy speech to recover the enhanced speech signal. Results of objective evaluations indicate improvement in noise suppression with the proposed codebook-based speech enhancement approach, particularly in cases of non-stationary noise.¹

Index Terms—Speech enhancement, modulation domain, MMSE estimation, LPC codebooks

¹ Funding for this work was provided by a CRD grant from the Natural Sciences and Engineering Research Council of Canada with sponsorship from Microsemi Corporation (Ottawa, Canada).

I. INTRODUCTION

Speech enhancement involves the suppression of background noise from a desired speech signal while ensuring that the incurred distortion remains within a tolerable limit. Some of the most commonly used single-channel speech enhancement methods include spectral subtraction [1], [2], Wiener filtering [3], and MMSE short-time spectral amplitude (STSA) estimation [4], [5]. These methods typically implement the following three-stage framework known as AMS [6], [7]: (1) Analysis, in which the short-time Fourier transform (STFT) is applied to successive frames of the noisy speech signal; (2) Modification, where the spectrum of the noisy speech is altered to achieve noise suppression; and (3) Synthesis, where the enhanced speech is recovered via inverse STFT and overlap-add (OLA) synthesis.

In past years, research has shown that extending this framework into the modulation domain may result in improved noise suppression and better speech quality [8], [9]. For instance, in the case of spectral subtraction, musical noise distortion is less severe when the subtraction is performed in the modulation domain than in the conventional frequency domain [8]. Extension of the MMSE-STSA estimator to the modulation domain, in the form of the modulation magnitude estimator (MME) [9], has also shown positive results. Interest in this framework extension is further motivated by physiological evidence [10]–[12], which underlines the significance of modulation domain information in speech analysis.

Most speech enhancement algorithms, including those operating in the modulation domain, require an estimate of the background noise PSD, which is typically obtained via a minimum statistics [13] approach. Minimum statistics and its offshoots [14], [15] assume that the background noise exhibits semi-stationary behaviour (i.e., slowly changing statistics) while performing the estimation. This may not be the case in acoustic environments with a rapidly changing background, e.g., a street intersection with passing vehicles or a busy airport terminal. In such cases, the noise PSD cannot be tracked properly and speech enhancement algorithms may perform poorly.

Codebook-based approaches [16]–[20], which fit under the general category of unsupervised learning [21], try to overcome this limitation by estimating the noise parameters based on a priori knowledge about different speech and noise types. In these approaches, joint estimation of the speech and noise PSDs is performed on a frame-by-frame basis by exploiting a priori information stored in the form of trained codebooks of short-time parameter vectors. Examples of such parameters include gain-normalized linear predictive (LP) coefficients [16]–[19] and cepstral coefficients [20].

The use of these codebook methods in the acoustic AMS framework has shown promising results in the enhancement of speech corrupted by non-stationary noise. However, to the best of our knowledge, they have not yet been applied to the modulation domain framework. In this work, we conjecture that codebook methods can indeed bring similar benefits to the enhancement of noisy speech in the modulation domain by providing more accurate estimation of the noise PSD in non-stationary environments, and we validate this hypothesis experimentally.

Specifically, the new speech enhancement method that we propose in this paper incorporates codebook-assisted noise and speech PSD estimation into the modulation domain framework. We use codebooks of linear prediction coefficients and gains obtained by training with the Linde-Buzo-Gray (LBG) algorithm [22]. The PSD estimates derived from the codebook approach are used to calculate a gain function based on the MMSE criterion [9], which is applied to the modulation magnitude spectrum of the noisy speech in order to suppress noise. Results of objective evaluations indicate improvement in noise suppression with the proposed codebook-based speech enhancement method, especially in cases of non-stationary noise.

II. ACOUSTIC VERSUS MODULATION DOMAIN PROCESSING

A. AMS in the Acoustic Frequency Domain

Conventional speech enhancement methods implement the AMS framework in the acoustic frequency domain, where the acoustic frequency spectrum of a speech signal is defined by its STFT. To
this end, an additive noise model is assumed, i.e.,

x[n] = s[n] + d[n],    (1)

where x[n], s[n] and d[n] refer to the noisy speech, clean speech and noise signals respectively, while n ∈ Z is the discrete-time index. STFT analysis of (1) results in

X(ν, k) = S(ν, k) + D(ν, k)    (2)

where X(ν, k), S(ν, k) and D(ν, k) refer to the STFTs of the noisy speech, clean speech and noise signals, respectively, and where k is the discrete acoustic frequency index. The STFT X(ν, k) is obtained from

X(ν, k) = Σ_{l=−∞}^{∞} x(l) w(νF − l) e^{−2jklπ/N}    (3)

where w(l) is a windowing function of duration N samples, and F is the frame advance. In this work, the Hamming window is used for this purpose [7]. The STFT of a signal is represented by its acoustic magnitude and phase spectra as

X(ν, k) = |X(ν, k)| e^{j∠X(ν,k)}    (4)

Speech enhancement methods, such as spectral subtraction [1] or MMSE-STSA [4], implement the modification part of the AMS framework by modifying the noisy magnitude spectrum whilst retaining the phase spectrum. Synthesis of the enhanced signal is performed by inverse STFT followed by OLA synthesis.

B. Modulation Domain Enhancement

The calculation of the short-time modulation spectrum involves performing STFT analysis on the time trajectories of the individual acoustic frequency components of the signal STFT. The magnitude spectrum of the noisy speech in each acoustic frequency bin, i.e. |X(ν, k)|, is first windowed and then Fourier transformed again, resulting in

Z(t, k, m) = Σ_{ν=−∞}^{∞} |X(ν, k)| w_M(tF_M − ν) e^{−2jνmπ/M}    (5)

where w_M(ν) is the so-called modulation window of length N_M, m ∈ {0, ..., M − 1} is the modulation frequency index, t is the modulation time-frame index, and F_M is the frame advance in the modulation domain. The resulting modulation spectrum can be expressed in polar form as

Z(t, k, m) = |Z(t, k, m)| e^{j∠Z(t,k,m)}    (6)

where |Z(t, k, m)| is the modulation magnitude spectrum and ∠Z(t, k, m) is the modulation phase spectrum.

Speech enhancement in the modulation domain involves spectral modification of the modulation magnitude spectrum while retaining the phase spectrum,

Ŝ(t, k, m) = G(t, k, m) Z(t, k, m)    (7)

where G(t, k, m) > 0 is a processing gain. Following this operation, the enhanced time-domain signal is recovered by applying inverse STFT and OLA operations twice. Previous works [8], [9] suggest that enhancement approaches applied in the modulation domain perform better than their traditional acoustic domain counterparts. In this work, the MMSE estimator of the modulation magnitude spectrum, also known as the MME [9], will be used as the basis for developing the proposed codebook-based speech enhancement method.

III. CODEBOOK-BASED SPEECH AND NOISE ESTIMATION

A. Overview

Various noise estimation algorithms are available in the literature to estimate the background noise PSD, which is needed to perform noise suppression in speech enhancement. In algorithms based on minimum statistics [13], [14], which are widely applied, the noise PSD is updated by tracking the minima of a smoothed version of |X(ν, k)|² within a finite window. Tracking the minimum power in this way results in a frame lag in the estimated PSD. This lag can lead to highly inaccurate results in the case of non-stationary noise.

The basis for the codebook-based speech and noise PSD estimation approach in [17]–[20] is the observation that the spectra of speech and of different noise classes can be approximately described by a few representative model spectra. These spectra are stored in finite codebooks as quantized vectors of short-time parameters (e.g., LP coefficients) and serve as the a priori knowledge of the respective signals. The use of a priori information about noise eliminates the dependence on buffers of past data. This makes the estimation robust to spectral variations in non-stationary noise conditions [16].

B. PSD Model

For the additive noise model (1), under the assumption of uncorrelated speech and noise signals, the PSD of the noisy speech can be represented as

P_xx(ω) = P_ss(ω) + P_dd(ω),  ω ∈ [0, 2π)    (8)

where P_ss(ω) and P_dd(ω) are the clean speech and background noise PSDs, respectively, and ω ∈ [0, 2π) is the normalized angular frequency. The PSD shape of a signal y[n], where y ∈ {s, d} stands for either the speech or the noise, can be modelled in terms of its LP coefficients and corresponding excitation variance as

P_yy(ω) = g_y P̄_yy(ω)    (9)

where P̄_yy(ω) is the gain-normalized spectral envelope and g_y is the excitation gain (or variance). The former is given by

P̄_yy(ω) = |1 + Σ_{k=1}^{p} a_{y,k} e^{−jωk}|^{−2}    (10)

where {a_{y,k}}_{k=1}^{p} are the LP coefficients, represented here by the vector θ_y = [a_{y,1}, ..., a_{y,p}], and p is the chosen model order.

C. Codebook Generation

In this work, two different codebooks of short-time spectral parameters, one for the speech and the other for the noise, are generated from training data comprised of multiple speaker signals and different noise types. The codebook generation comprises the following steps: segmentation of the training speech and noise data into frames of 20-40 ms duration; computation of the LP coefficients {a_{y,k}}_{k=1}^{p} for each frame; and vector quantization of the LP coefficient vectors θ_y using the LBG algorithm [22] to obtain the required codebook. The LBG algorithm forms a set of median cluster vectors which best represent the given input set of LP coefficient vectors. Optimal values have to be chosen empirically for the sizes of the speech and noise codebooks, considering the trade-off between PSD estimation accuracy and complexity. In the sequel, we shall represent the speech and noise codebooks so obtained as {θ_s^i}_{i=1}^{N_s} and {θ_d^j}_{j=1}^{N_d}, where the vectors θ_s^i and θ_d^j are the corresponding i-th and j-th codebook entries, and N_s and N_d are the codebook sizes, respectively.
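To make the training procedure concrete, the sketch below implements the three codebook-generation steps described above (framing, LP analysis via Levinson-Durbin, and LBG quantization) in Python with NumPy, together with the gain-normalized envelope of Eq. (10). This is an illustrative simplification under assumptions, not the authors' implementation: the function names are ours, a plain Euclidean distortion stands in for the spectral distortion measures usually paired with LP parameters, and the toy settings are far smaller than the codebooks trained in the paper.

```python
import numpy as np

def autocorrelation(frame, p):
    """First p+1 autocorrelation lags of a windowed frame."""
    r = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1
    return r[mid:mid + p + 1]

def levinson_durbin(r, p):
    """LP coefficients a_1..a_p from autocorrelation lags r[0..p]."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a[:i].copy()
        a[1:i] = prev[1:] + k * prev[:0:-1]  # reflection update of a_1..a_{i-1}
        a[i] = k
        err *= (1.0 - k * k)
    return a[1:]

def lbg(vectors, bits, passes=20, eps=1e-3):
    """LBG: binary splitting of centroids, each followed by k-means refinement."""
    codebook = vectors.mean(axis=0, keepdims=True)
    for _ in range(bits):                      # double the codebook size each round
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(passes):
            dist = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dist.argmin(axis=1)
            for c in range(len(codebook)):     # move each centroid to its cluster mean
                members = vectors[nearest == c]
                if len(members):
                    codebook[c] = members.mean(axis=0)
    return codebook

def train_lp_codebook(signal, bits, frame_len=512, hop=256, p=10):
    """Frame the training data, extract LP vectors, and quantize them with LBG."""
    window = np.hamming(frame_len)
    feats = np.array([
        levinson_durbin(autocorrelation(signal[s:s + frame_len] * window, p), p)
        for s in range(0, len(signal) - frame_len, hop)
    ])
    return lbg(feats, bits)

def lp_envelope(theta, n_freq=256):
    """Gain-normalized spectral envelope of Eq. (10) for one codebook entry."""
    w = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(1, len(theta) + 1)
    A = 1.0 + np.exp(-1j * np.outer(w, k)) @ theta
    return 1.0 / np.abs(A) ** 2
```

A b-bit codebook trained this way contains 2^b entries; the scaled noise PSD of Eq. (9) is then `g * lp_envelope(theta)` for a codebook entry `theta` and excitation gain `g`.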
2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
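For readers who want to trace the processing chain of Section II-B, the following Python/SciPy sketch performs the double STFT analysis of Eq. (5), modifies the modulation magnitude spectrum while retaining the modulation phase as in Eq. (7), and resynthesizes the signal with two inverse STFT/OLA stages. It is a schematic illustration under assumed settings: the paper's actual gain is the MMSE modulation magnitude estimator of [9], which is not reproduced here, so a pass-through unity gain is substituted as a placeholder.

```python
import numpy as np
from scipy.signal import stft, istft

def modulation_domain_ams(x, fs=16000, gain_fn=None,
                          nperseg=512, hop=64,          # 32 ms frames, 4 ms advance at 16 kHz
                          mod_nperseg=80, mod_hop=2):   # N_M = 80 frames, F_M = 8 ms (2 frames)
    """Analysis-modification-synthesis in the modulation domain (schematic)."""
    if gain_fn is None:
        gain_fn = lambda zmag: np.ones_like(zmag)  # placeholder for the MMSE gain of [9]

    # 1) Acoustic STFT; the noisy acoustic phase is kept for resynthesis.
    _, _, X = stft(x, fs, window="hamming", nperseg=nperseg, noverlap=nperseg - hop)
    mag, phase = np.abs(X), np.angle(X)

    # 2) Second STFT along the time trajectory of each acoustic frequency bin (Eq. 5).
    _, _, Z = stft(mag, window="hamming", nperseg=mod_nperseg,
                   noverlap=mod_nperseg - mod_hop, axis=-1)
    zmag, zphase = np.abs(Z), np.angle(Z)

    # 3) Modify the modulation magnitude, retain the modulation phase (Eq. 7).
    S = gain_fn(zmag) * zmag * np.exp(1j * zphase)

    # 4) First inverse STFT/OLA stage: back to acoustic magnitude trajectories.
    _, mag_hat = istft(S, window="hamming", nperseg=mod_nperseg,
                       noverlap=mod_nperseg - mod_hop)
    mag_hat = np.maximum(mag_hat[:, :mag.shape[1]], 0.0)  # trim padding, clip negatives

    # 5) Second inverse STFT/OLA stage: back to the time domain.
    _, x_hat = istft(mag_hat * np.exp(1j * phase), fs, window="hamming",
                     nperseg=nperseg, noverlap=nperseg - hop)
    return x_hat
```

With the unity gain the chain is a near-identity, which is a useful sanity check before plugging in an actual suppression rule as `gain_fn`.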
A. Methodology

Speech utterances of two male and two female speakers from the TSP [23] and TIMIT databases were used for conducting the experiments, along with different types of noise samples from the NoiseX92 [24] and Sound Jay [25] databases, including babble, street and restaurant noise. In addition, a non-stationary (i.e., amplitude-modulated) Gaussian white noise was also considered. All the speech and noise files were uniformly sampled at a rate of 16 kHz. The LP coefficient order p was set to 10 for both the speech and noise codebooks. A 7-bit speech codebook was trained with 7.5 minutes of clean speech from the above-mentioned sources (i.e., 55 short sentences for each speaker). A 4-bit noise codebook was trained using over 1 minute of noise data from the available databases (i.e., about 15 s for each noise type). For the testing, i.e., the objective evaluation of the various algorithms, noisy speech files were generated by adding scaled segments of noise to the clean speech. For each speaker, 3 sentences were selected and combined with the four different types of noise, properly scaled to obtain the desired SNR values of 0 and 5 dB. The speech and noise samples used for testing were different from those used to train the two codebooks.

Fine tuning of the parameters is crucial for the performance of the proposed enhancement method. The acoustic frame duration was chosen to be 32 ms, while the values of the other analysis parameters were chosen empirically as follows: acoustic frame advance F = 4 ms, modulation frame duration N_M = 80, modulation frame advance F_M = 8 ms and control factor α = 0.95.

For the objective evaluation of the enhanced speech, we used the perceptual evaluation of speech quality (PESQ) and the segmental SNR (SegSNR) as performance measures. PESQ [26] is widely used for automated assessment of speech quality as experienced by a listener, where higher PESQ values indicate better speech quality. SegSNR is defined as the average SNR calculated over

B. Results & Discussion

The PESQ and SegSNR results for the different noises at SNRs of 0 and 5 dB are reported in Tables I and II, respectively. It can be seen that the proposed CB-MME method performs better than the MME and MMSE methods for both performance metrics under consideration. Results for other SNRs and noise types (not shown) show a similar trend. Informal listening tests concur with the objective results. The proposed CB-MME method seems to suppress non-stationary elements of the background noise better than MMSE and MME, at the expense of some slight distortion in the enhanced speech. This is mainly due to the use of a codebook-based approach, which performs on-line noise PSD estimation on a frame-by-frame basis from the current observation, as opposed to the MS approach used in the MMSE and MME algorithms, which relies on a long buffer of past frames. The slight distortion could be caused by the spectral mismatch between the codebook-based speech PSD estimate and the actual one, which remains a topic for future study.

VI. CONCLUSION

In this paper, we have proposed a new speech enhancement method that performs noise suppression in the modulation domain with speech and noise PSDs obtained from a codebook-based estimation approach. We use codebooks of linear prediction coefficients and gains obtained by training with the LBG algorithm. The PSD estimates derived from the codebooks were used to calculate an MMSE gain function, which was applied to the modulation magnitude spectrum of the noisy speech in order to suppress noise. Results of the objective evaluation showed improvements in the suppression of non-stationary noise with the proposed CB-MME approach.