
Full single-type deep learning models with multihead attention for speech enhancement

Noel Zacarias-Morales, José Adán Hernández-Nolasco & Pablo Pancardo

Applied Intelligence (2023)

Published: 15 April 2023
Abstract
Artificial neural network (ANN) models with attention mechanisms for removing noise from audio signals, known as speech enhancement models, have proven effective. However, their architectures become complex, deep, and computationally demanding as they pursue higher levels of efficiency. Given this situation, we selected and evaluated simple, less resource-demanding models, using the same training parameters and performance metrics to conduct a fair comparison among the four selected models. Our purpose was to demonstrate that simple neural network models with multihead attention are efficient when implemented on devices with conventional computational resources, since they provide results competitive with those of hybrid, complex, and resource-demanding models. We experimentally evaluated the efficiency of multilayer perceptron (MLP), one-dimensional and two-dimensional convolutional neural network (CNN), and gated recurrent unit (GRU) deep learning models, each with and without multihead attention, and we also analyzed the generalization capability of each model. The results showed that although these architectures were composed of only one type of ANN, multihead attention increased the efficiency of the speech enhancement process, yielding results competitive with those of complex models. This study therefore serves as a reference for building simple and efficient single-type ANN models with attention.
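
To make the single-type-plus-attention idea concrete, the sketch below builds a minimal GRU model with multihead self-attention for spectral-mask speech enhancement. This is not the authors' exact architecture: the TensorFlow/Keras framework, the layer sizes, the 257-bin spectrogram input, and the mean-squared-error objective are assumptions chosen only for illustration.

import tensorflow as tf

FREQ_BINS = 257  # e.g., a 512-point STFT yields 257 magnitude bins (assumed)

def build_gru_attention_enhancer(freq_bins=FREQ_BINS):
    # Noisy magnitude spectrogram: (time_frames, freq_bins), variable length.
    noisy = tf.keras.Input(shape=(None, freq_bins), name="noisy_spectrogram")

    # Single-type recurrent encoder: stacked GRU layers over time frames.
    x = tf.keras.layers.GRU(256, return_sequences=True)(noisy)
    x = tf.keras.layers.GRU(256, return_sequences=True)(x)

    # Multihead self-attention across time, with a residual connection.
    attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(x, x)
    x = tf.keras.layers.Add()([x, attn])
    x = tf.keras.layers.LayerNormalization()(x)

    # Frame-wise mask in [0, 1], applied to the noisy magnitude spectrogram.
    mask = tf.keras.layers.Dense(freq_bins, activation="sigmoid")(x)
    enhanced = tf.keras.layers.Multiply(name="enhanced_spectrogram")([noisy, mask])

    model = tf.keras.Model(noisy, enhanced)
    model.compile(optimizer="adam", loss="mse")  # MSE on magnitudes (assumed)
    return model

model = build_gru_attention_enhancer()
model.summary()

The same pattern applies to the other single-type architectures discussed above: the GRU layers can be swapped for 1D/2D convolutional or dense layers while the multihead attention and masking stages stay unchanged.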
Data Availability
The datasets generated and analyzed during the current study are available
from the corresponding author upon reasonable request.