Speech Enhancement Through Elimination of Impulsive Disturbance Using Log MMSE Filtering
in
International Journal Of Engineering And Computer Science ISSN:2319-7242
Volume 2 Issue 12, Dec.2013 Page No. 3435-3438
Abstract:
This project presents an enhancement of the speech signal by removing impulsive disturbance from noisy speech using a log minimum mean-square error (log-MMSE) filtering approach. Impulsive noise can degrade the performance and reliability of a speech signal. To recover the speech component from the impulsive disturbance, we apply pre-emphasis, signal segmentation and log-MMSE filtering. Preprocessing of the audio signal starts with pre-emphasis, a process that increases the magnitude of some frequencies relative to the magnitude of other frequencies in order to improve the overall signal-to-noise ratio. The signal samples are then segmented into a fixed number of frames, and the samples of each frame are weighted with Hamming window coefficients. The minimum mean-square error log-spectral amplitude (log-MMSE) estimator, which minimizes the mean-square error of the log-spectra, is obtained as a weighted geometric mean of the gains associated with the speech signal. The performance of the filtering is measured with signal-to-noise ratio, Perceptual Evaluation of Speech Quality (PESQ), Correlation
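The preprocessing chain and the gain rule named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-emphasis coefficient `alpha = 0.97`, the frame length and hop size, and all function names are assumptions chosen for illustration. The gain shown is the closed-form log-spectral amplitude gain of Ephraim and Malah, G = xi/(1 + xi) * exp(0.5 * E1(v)) with v = xi * gamma/(1 + xi), where xi and gamma denote the a priori and a posteriori SNR and E1 is the exponential integral.

```python
import math


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(length):
    """Hamming window coefficients w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(x, frame_len=256, hop=128):
    """Split x into fixed-length overlapping frames and window each frame."""
    w = hamming(frame_len)
    return [[x[s + n] * w[n] for n in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]


def exp_integral_e1(v, max_terms=200):
    """E1(v) from its power series for 0 < v <= 20; taken as 0 beyond that
    point, where E1 is already of the order of 1e-10."""
    if v > 20.0:
        return 0.0
    total, term = 0.0, 1.0
    for k in range(1, max_terms + 1):
        term *= -v / k          # term = (-v)^k / k!
        total -= term / k       # adds (-1)^(k+1) * v^k / (k * k!)
        if abs(term) < 1e-17:
            break
    return -0.5772156649015329 - math.log(v) + total


def log_mmse_gain(xi, gamma):
    """Log-spectral amplitude gain; xi and gamma must both be positive."""
    v = xi / (1.0 + xi) * gamma
    return xi / (1.0 + xi) * math.exp(0.5 * exp_integral_e1(v))
```

In a complete enhancer this gain would be applied per frequency bin to the magnitude spectrum of each windowed frame, with xi and gamma obtained from a noise power estimate; those steps are outside this sketch.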
D. Koti Reddy, IJECS Volume 2 Issue 12, Dec. 2013, Page No. 3435-3438
research topics such as speech enhancement and speech recognition in noisy environments have arisen. For speech enhancement, that is, the extraction of a signal buried in noise, adaptive noise cancellation (ANC) provides a good solution. In contrast to other enhancement techniques, its great strength is that no a priori knowledge of the signal or the noise is required. This advantage is gained with the aid of a secondary input that measures the noise source. The cancellation operation is based on the following principle: since the desired signal is corrupted by the noise, if the noise can be estimated from the noise source, the estimated noise can be subtracted from the primary channel, leaving the desired signal. Traditionally, this task is done by linear filtering. In real situations, the corrupting noise is a nonlinearly distorted version of the source noise, so a nonlinear filter should be a better choice.

In typical speech enhancement methods based on the STFT, only the magnitude spectrum is modified and the phase spectrum is kept unchanged. It was believed that the magnitude spectrum carries most of the information in speech and that the phase spectrum carries little. Furthermore, the human auditory system is phase deaf. For these reasons, typical speech enhancement algorithms such as spectral subtraction (SS), MMSE-STSA or the MAP algorithm operate on the spectral magnitude component only and keep the phase component unchanged.

II. WAVELET BASED DENOISING

Wavelets are mathematical functions, defined over a finite interval and having an average value of zero, that transform data into different frequency components, representing each component with a resolution matched to its scale. The basic idea of the wavelet transform is to represent an arbitrary function as a superposition of a set of such wavelets or basis functions. These basis functions, or baby wavelets, are obtained from a single prototype wavelet called the mother wavelet by dilations or contractions (scaling) and translations (shifts). They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. Many new wavelet applications, such as image compression, turbulence, human vision, radar, and earthquake prediction, have been developed in recent years.

Fig. 1 Mother wavelet w(t)

Normally the mother wavelet starts at time t = 0 and ends at t = T. The shifted wavelet w(t - m) starts at t = m and ends at t = m + T. The scaled wavelets w(2^k t) start at t = 0 and end at t = T/2^k. Their graphs are w(t) compressed by a factor of 2^k, as shown in Fig. 2. For example, when k = 1, the wavelet is shown in Fig. 2 (a). For k = 2 and k = 3, the wavelets are shown in (b) and (c), respectively.

Fig. 2 Scaled wavelets: (a) w(2t), (b) w(4t), (c) w(8t)

The wavelets are called orthogonal when their inner products are zero. The smaller the scaling factor is, the wider the wavelet is. Wide wavelets are comparable to low-frequency sinusoids and narrow wavelets are comparable to high-frequency sinusoids. The reconstruction of the signal is achieved by the inverse discrete wavelet transform (IDWT). The values are first up-sampled and then passed to the filters.

Fig. 3 Wavelet Reconstruction
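The shifting, scaling and orthogonality properties of wavelets described in this section can be checked numerically. The sketch below assumes the Haar wavelet as the mother wavelet w(t) with T = 1, since the wavelet plotted in the original figures is not recoverable; the function names and the midpoint-rule integration are illustrative choices as well.

```python
def haar(t, T=1.0):
    """Haar mother wavelet on [0, T): +1 on the first half, -1 on the second."""
    if 0.0 <= t < T / 2:
        return 1.0
    if T / 2 <= t < T:
        return -1.0
    return 0.0


def wavelet(t, k=0, m=0.0, T=1.0):
    """Scaled and shifted wavelet w(2^k (t - m))."""
    return haar((2 ** k) * (t - m), T)


def support(k=0, m=0.0, T=1.0):
    """Interval where w(2^k (t - m)) is nonzero: starts at m, width T / 2^k."""
    return (m, m + T / (2 ** k))


def inner_product(f, g, a=0.0, b=1.0, steps=1000):
    """Midpoint-rule approximation of the integral of f(t) * g(t) over [a, b]."""
    dt = (b - a) / steps
    return sum(f(a + (i + 0.5) * dt) * g(a + (i + 0.5) * dt)
               for i in range(steps)) * dt


print(support(m=2.0))   # shifted wavelet w(t - 2): starts at 2, ends at 2 + T
print(support(k=3))     # scaled wavelet w(8t): starts at 0, ends at T / 8
print(inner_product(lambda t: wavelet(t), lambda t: wavelet(t, k=1)))
```

The last line evaluates the inner product of w(t) and w(2t), which vanishes for the Haar family, in line with the orthogonality condition stated above.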
Fig. 4 Reconstruction using up sampling.
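Reconstruction by up-sampling followed by filtering can be sketched for a single decomposition level. The example assumes a Haar filter pair and an even-length input; these choices and the function names are illustrative and are not taken from the paper.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_analysis(x):
    """One-level Haar DWT: approximation and detail coefficients per sample pair."""
    approx = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx, detail


def upsample(c):
    """Insert a zero after every coefficient (up-sampling by 2)."""
    out = []
    for value in c:
        out.extend([value, 0.0])
    return out


def fir(u, h):
    """Causal FIR filter y[n] = sum_k h[k] * u[n - k]."""
    return [sum(h[k] * u[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(u))]


def haar_synthesis(approx, detail):
    """One-level inverse Haar DWT: up-sample each branch, filter, and add."""
    low = fir(upsample(approx), [S, S])      # low-pass reconstruction filter
    high = fir(upsample(detail), [S, -S])    # high-pass reconstruction filter
    return [a + b for a, b in zip(low, high)]


x = [4.0, 2.0, -1.0, 3.0, 0.5, 0.5]
approx, detail = haar_analysis(x)
print(haar_synthesis(approx, detail))  # reproduces x up to rounding error
```

A denoising scheme would modify the detail coefficients (for example by thresholding) between analysis and synthesis; that step is omitted here because the surviving text does not specify it.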
means). For the case of stochastic signals, these notes look at the derivation of the correlation values required for a minimum mean-square error solution. We also examine systems which involve cyclostationary signals (an interpolation filter, for instance).
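How correlation values lead to a minimum mean-square error filter can be shown with a small sketch. It assumes a two-tap FIR filter, correlations estimated by time averaging, and a synthetic desired signal; none of these specifics come from the paper, and the function names are hypothetical.

```python
import random


def correlations(x, d):
    """Time-averaged correlations for a two-tap filter, averaged over n = 1 .. N-1."""
    n_terms = len(x) - 1
    r = [[0.0, 0.0], [0.0, 0.0]]   # r[i][j] estimates E{x[n-i] x[n-j]}
    p = [0.0, 0.0]                 # p[i] estimates E{x[n-i] d[n]}
    for n in range(1, len(x)):
        taps = (x[n], x[n - 1])
        for i in range(2):
            p[i] += taps[i] * d[n] / n_terms
            for j in range(2):
                r[i][j] += taps[i] * taps[j] / n_terms
    return r, p


def mmse_two_tap(r, p):
    """Solve the 2x2 normal equations R w = p by Cramer's rule."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    w0 = (p[0] * r[1][1] - r[0][1] * p[1]) / det
    w1 = (r[0][0] * p[1] - r[1][0] * p[0]) / det
    return [w0, w1]


# Synthetic check: the desired signal is an exact two-tap filtering of x.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
d = [0.0] + [0.5 * x[n] - 0.25 * x[n - 1] for n in range(1, len(x))]
w = mmse_two_tap(*correlations(x, d))
print(w)  # weights close to 0.5 and -0.25
```

Because the desired signal here is an exact linear function of the filter inputs, the solution of the normal equations recovers the generating weights; with noisy data it would instead give the weights that minimize the averaged squared error.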
study, the averaging operation will remove the dependency on n. For the case of wide-sense stationary processes, results are reviewed in Appendix B.

The error is,

IV. SIMULATION RESULTS

We analyzed the performance of the proposed method within a variety of noise scenarios. Results were compared to four established reference techniques. Two of these reference techniques were log-MMSE enhancers after Ephraim and Malah. One of these methods, referred to as log-MMSE(MS), employed the Minimum Statistics technique developed by Martin [11] to estimate the underlying noise power. The other method, referred to as log-MMSE(RA), employed the same VAD-supported Recursive Averaging that was also used in the “log-MMSE Filter” block of Fig. 5. The performance gains between the log-MMSE(RA) method and the proposed method are, therefore, directly attributable to the inventory search and the subsequent cepstral smoothing. As a third reference method we chose the Multiband Spectral Subtraction (MBSS) technique proposed by Kamath and Loizou [2], and lastly we also implemented a slightly modified version of the inventory-style baseline system.

Fig. 7 Denoised signal

V. CONCLUSION

This project presented an enhancement of the speech signal by removal of impulsive disturbance, based on a log-spectral gain filtering approach. The minimum mean-square error log-spectral amplitude estimator, which minimizes the mean-square error of the log-spectra, is obtained as a weighted geometric mean of the gains associated with the speech signal. It provided better results than prior methods in terms of performance parameters, processing time and speech signal quality. In future work, the system will be enhanced with a modified filtering method to restore signals with better accuracy than the log-spectral approach.

VI. REFERENCES

[1] P. Vary and R. Martin, Digital Speech Transmission: Enhancement, Coding and Error Concealment. Chichester, U.K.: Wiley, 2006.
[2] P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, FL, USA: CRC, Taylor and Francis, 2007.
[3] X. Xiao and R. M. Nickel, “Speech enhancement with inventory style speech resynthesis,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 6, pp. 1243–1257, Aug. 2010.
[4] J. Ming, R. Srinivasan, and D. Crookes, “A corpus-based approach to speech enhancement from nonstationary noise,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 822–836, May 2011.