Full single-type deep learning models with multihead attention for speech enhancement
15 April 2023
Noel Zacarias-Morales, José Adán Hernández-Nolasco & Pablo Pancardo
Applied Intelligence (2023)
Abstract
Artificial neural network (ANN) models with attention mechanisms for removing noise from audio signals, known as speech enhancement models, have proven effective. However, their architectures become complex, deep, and computationally demanding as they pursue higher efficiency. Given this situation, we selected four simple, less resource-demanding models and evaluated them with the same training parameters and performance metrics to ensure a fair comparison. Our purpose was to demonstrate that simple neural network models with multihead attention are efficient when implemented on devices with conventional computational resources, since they deliver results competitive with those of hybrid, complex, and resource-demanding models. We experimentally evaluated multilayer perceptron (MLP), one-dimensional and two-dimensional convolutional neural network (CNN), and gated recurrent unit (GRU) deep learning models, each with and without multihead attention, and analyzed the generalization capability of each model. The results showed that although these architectures consist of only one type of ANN, multihead attention increased the efficiency of the speech enhancement process, yielding results competitive with those of complex models. This study therefore serves as a reference for building simple, efficient single-type ANN models with attention.
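As a concrete illustration of the single-type-plus-attention idea summarized above, the sketch below pairs a GRU encoder (one ANN type) with a multihead self-attention layer that predicts a time-frequency mask for a noisy magnitude spectrogram. This is a minimal PyTorch sketch, not the authors' exact architecture: the class name, layer sizes, number of heads, the mask-based output, and the 257-bin input are illustrative assumptions.

import torch
import torch.nn as nn

class GRUWithMultiheadAttention(nn.Module):
    """Hypothetical single-type (GRU) enhancer with multihead self-attention."""
    def __init__(self, n_freq_bins=257, hidden_size=256, num_heads=4):
        super().__init__()
        # Single ANN type: a stacked GRU over spectrogram frames.
        self.gru = nn.GRU(n_freq_bins, hidden_size, num_layers=2,
                          batch_first=True)
        # Multihead self-attention over the GRU outputs.
        self.attn = nn.MultiheadAttention(hidden_size, num_heads,
                                          batch_first=True)
        # Predict a [0, 1] time-frequency mask for the noisy magnitudes.
        self.mask = nn.Sequential(nn.Linear(hidden_size, n_freq_bins),
                                  nn.Sigmoid())

    def forward(self, noisy_mag):
        # noisy_mag: (batch, frames, n_freq_bins) magnitude spectrogram
        h, _ = self.gru(noisy_mag)
        h, _ = self.attn(h, h, h)        # self-attention: query = key = value
        return noisy_mag * self.mask(h)  # enhanced magnitude estimate

# Usage: a batch of 8 utterances, 100 frames, 257 frequency bins.
x = torch.rand(8, 100, 257)
print(GRUWithMultiheadAttention()(x).shape)  # torch.Size([8, 100, 257])

Replacing the GRU with an MLP or a 1D/2D CNN front end, while keeping the attention and mask layers, would sketch the other single-type variants the abstract lists.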
Data Availability
The datasets generated and analyzed during the current study are available
from the corresponding author upon reasonable request.