Autoencoders and Their Applications in Machine Learning
https://doi.org/10.1007/s10462-023-10662-6
Abstract
Autoencoders have become a heavily researched topic in unsupervised learning due to their
ability to learn data features and act as a dimensionality reduction method. Despite the rapid
evolution of autoencoder methods, there has yet to be a complete study that provides a
full roadmap of autoencoders, both to stimulate technical improvements and to orient
researchers who are new to the field. In this paper, we present a comprehensive survey of
autoencoders, starting with an explanation of the principle of the conventional autoencoder and
its primary development process. We then provide a taxonomy of autoencoders based
on their structures and principles and thoroughly analyze and discuss the related models.
Furthermore, we review the applications of autoencoders in various fields, including
machine vision, natural language processing, complex networks, recommender systems,
speech processing, anomaly detection, and others. Lastly, we summarize the limitations of
current autoencoder algorithms and discuss the future directions of the field.
* Fatemeh Daneshfar
Kamal Berahmand
Elaheh Sadat Salehi
Yuefeng Li
Yue Xu
1 School of Computer Science, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Australia
2 Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
3 Department of Electrical and Computer Engineering, University of Shiraz, Shiraz, Iran
28 Page 2 of 52 K. Berahmand et al.
List of symbols
X The input
X ′ The reconstructed output
X̂′ The noisy input
Z The hidden representation of the input data
L The graph Laplacian matrix
W Non-negative matrices (basis vectors)
H Non-negative matrices (coefficients or activations)
We The encoder weight matrix
Wd The decoder weight matrix
D The distances matrix between neighbors
N The number of data points
E The expectation operator
𝜆 The regularization parameter
KL(.||.) The Kullback–Leibler divergence
p(.) The probability distribution
q(.) The approximate probability distribution of p(.)
f(.) The encoder function
g(.) The decoder function
tr(.) The trace of the matrix
D(.) The discriminator’s output for a real data point
G(.) The generator’s output for the latent variable
‖.‖ The 2-norm of a vector
‖.‖F The Frobenius norm
‖X − X′‖²F The reconstruction loss
Abbreviations
AA Adversarial Autoencoder
AAE Adversarial Autoencoder
AE Autoencoder
AGAE Adversarial Graph Autoencoder
BAE Bayesian Autoencoder
BCE Binary Cross-Entropy
BiRNNAE Bidirectional Autoencoder
CAE Contractive Autoencoder
CAE Convolutional Autoencoder
CNN Convolutional Neural Network
CVAE Convolutional Variational Autoencoder
CSAE Convolutional Sparse Autoencoder
DAE Denoising Autoencoder
DVAE Disentangled Variational Autoencoder
GAE Graph Autoencoder
GAAE Graph Attentional Autoencoder
GCN Graph Convolution Network
GMAE Graph Masked Autoencoder
GPU Graphics Processing Unit
GRUAE GRU Autoencoder
ISOMAP Isometric Feature Mapping
Autoencoders and their applications in machine learning: a… Page 3 of 52 28
1 Introduction
Dimension reduction is crucial in machine learning for simplifying complex data sets
(Van Der Maaten et al. 2009), reducing computational complexity (Ray et al. 2021),
and mitigating the curse of dimensionality (Talpur et al. 2023), ultimately improving
model performance and interpretability. Dimension reduction encompasses two primary
approaches: feature selection (Solorio-Fernández et al. 2022), which chooses a
subset of the most informative features from the original dataset to reduce dimensionality
while maintaining interpretability; and feature extraction (Li et al. 2022), which derives
new, lower-dimensional features from the original data to capture essential
patterns and relationships.
Feature extraction comprises both linear and nonlinear techniques that transform the
original data into a lower-dimensional representation. Linear feature extraction methods, such as
Factor Analysis (FA) (Garson 2022), Linear Discriminant Analysis (LDA) (Balakrishnama
and Ganapathiraju 1998), Principal Component Analysis (PCA) (Abdi and Williams
2010) and Non-negative Matrix Factorization (NMF) (Lee and Seung 2000), transform
the input data into a new set of features using linear combinations of the
original input features (Wang et al. 2023).
Linear methods are relatively straightforward and computationally efficient. They
often provide interpretable results, making it easier to understand the importance
of each feature, and are effective when the underlying relationships in the data
are approximately linear. However, they capture global correlations, and result in
Fig. 1 Categorization of feature extraction methods into linear and non-linear approaches
Table 1 Methods for dimensionality reduction
Method type | Method | Loss function L | Description
Linear | FA (Shrestha 2021) | $\min\left(-0.5\log|\Psi| + \mathrm{tr}(S\Psi^{-1}) + 0.5\,dp\log(2\pi)\right)$ | Explains patterns of correlations among observed variables by uncovering underlying latent factors
Linear | PCA (Hasan and Abdulazeez 2021) | $\max_{W^{T}W=I} \mathrm{trace}(W^{T}AW)$ | Optimizes the projection of data onto its principal components by maximizing the variance along those components
Linear | LDA (Li et al. 2020) | $\max \left(\frac{w^{T}S_{b}w}{w^{T}S_{w}w}\right)$ | Maximizes the separation between classes while minimizing the variance within each class
Linear | NMF (Wang et al. 2023) | $\min_{W,H\geq 0} \|X - WH\|_F^2$ | Decomposes a non-negative matrix into two lower-dimensional non-negative matrices
Nonlinear | LLE (Miao et al. 2022) | $\min \left(\sum_{i} \left\|x_i - \sum_{j} w_{ij}x_j\right\|^2\right)$ | Seeks to preserve the local linear relationships between data points in a lower-dimensional space
Nonlinear | ISOMAP (Ding et al. 2022) | $\min \left(\|D - \hat{D}\|^2\right)$ | Constructs a low-dimensional representation of data while preserving the geodesic distances between data points on a manifold-like structure
Nonlinear | t-SNE (Meyer et al. 2022) | $\min \left(\sum_{i}\sum_{j} p_{ij}\log\frac{p_{ij}}{q_{ij}}\right)$ | Preserves the pairwise similarity relationships between data points in a lower-dimensional space
Nonlinear | AE (Bank et al. 2023) | $\min \left(\|X - X'\|_F^2\right)$ | Aims to encode and subsequently decode data, facilitating dimensionality reduction and feature extraction
Nonlinear | RNN (Shi et al. 2022) | – | Captures temporal dependencies from sequential data passed recursively through hidden layers
Nonlinear | CNN (Molaei et al. 2022) | – | Processes structured grid data by applying convolutional layers to automatically extract features
Fig. 2 All published papers in Google Scholar, Web of Science and arXiv since 2012 with the keywords
"Autoencoders" and "Machine Learning"
Although AEs offer a powerful set of capabilities, they also come with certain drawbacks
that should be considered. One of the main drawbacks of using AEs is that they are sensitive
to the choice of hyperparameters, such as the number and size of layers, the learning rate,
the loss function, and the regularization. These hyperparameters affect the performance
and quality of the autoencoder, and finding good values may require trial and error or a grid
search (Bank et al. 2020). Another common concern with AEs is their lack of
robustness. They can be sensitive to noisy data, outliers, and variations in input, which can
lead to suboptimal representations and reconstructions (Singh and Ogunfunmi 2022). AEs
can be prone to overfitting, especially when trained on limited data. Additionally, they may
not inherently preserve the spatial or temporal locality of data during training. This can
be problematic for tasks where preserving the local structure is essential, such as image
segmentation or sequence modeling (Liu et al. 2023). Furthermore, AEs tend to capture
lower-order features and may struggle to represent complex, higher-order relationships in
the data. This limitation can impact their performance on tasks that require understanding
intricate dependencies (Miuccio et al. 2022).
In recent years, substantial research efforts have been dedicated to addressing these
drawbacks through advancements in deep learning and AE techniques. The
architectures proposed in this area include regularized AEs, robust AEs, generative
AEs, convolutional AEs, recurrent AEs, semi-supervised AEs, graph AEs and masked AEs.
As a result of these improvements, interest in autoencoder
algorithms in machine learning has grown over the years, as shown in Fig. 2. The
graph shows the trend of papers published in the field of "autoencoder" and "machine
learning" since 2012, revealing that over 90% of all indexed papers were published
between 2018 and 2023.
Although this is an important area of research, comprehensive studies exploring the
applications of AE algorithms in machine learning on a wide scale are currently lacking.
Existing review papers have examined specific themes, but none offers a
comprehensive review. In Table 2, we compare our contribution in
this paper to the descriptions of existing review papers in the field.
To address this knowledge gap, our review focuses on three key research
questions:
• What are the different types of AE algorithms that have been developed and utilized
in machine learning applications?
• What are the main methodological frameworks and the latest achievements in the
application of AE algorithms?
• What are the gaps and future directions in this field, and how can they be addressed
to enhance the effectiveness of AE algorithms in machine learning applications?
Table 2 Comparison of our article with the previous review or survey articles
Paper | Year | Brief description | Aspects not considered
Sagha et al. (2017) | 2017 | The article provides a comprehensive review of existing literature and studies that have utilized stacked denoising autoencoders for … | Categorization of autoencoder taxonomies and applications. Comprehensiveness
Pratella et al. (2021) | 2021 | The review discusses several types of autoencoders, the advantages and disadvantages of each algorithm, and provides examples of how they can be applied to rare disease diagnosis | Autoencoder applications in ML
Song et al. (2021) | 2021 | It proposes the use of autoencoders as a technique for network intrusion detection. The authors conduct experiments on a dataset of network traffic, comparing the performance of autoencoders to traditional anomaly detection techniques | Applications of autoencoder techniques in ML
Qian et al. (2022) | 2022 | The article provides an overview of fault detection and diagnosis, and then discusses the use of autoencoders for feature extraction in industrial processes. It covers different types of autoencoders, and how they can be used for fault detection and diagnosis | Comprehensiveness. Autoencoder techniques in ML
Shankar and Parsana (2022) | 2022 | The paper provides an overview and empirical comparison of different NLP models and introduces and empirically applies autoencoder models in the marketing domain | Categorization of autoencoder taxonomies and applications. Comprehensiveness
Singh and Ogunfunmi (2022) | 2022 | The paper provides an overview of VAEs and their applications … | Categorization of autoencoder taxonomies. Autoencoder …
them into distinct categories based on their architecture.
This paper is organized as follows. Section 2 provides a concise overview of the structure
and hyperparameters of AEs. Section 3 discusses various taxonomies of AEs that have
been proposed in the literature. In Sect. 4, we review previous applications of AEs in the
machine learning domain, categorizing them according to the task they were used for. In
Sect. 5, we explore publicly available software and platforms that can be used to
construct and develop AEs and review the performance of various autoencoders. Section 6 is dedicated
to discussing future directions in the field. Finally, in Sect. 7, we present our conclusions
based on the insights gathered from our analysis.
2 Background of autoencoder
The AE is a fundamental building block that can be used hierarchically to create deep models.
AEs organize, compress, and extract high-level features, allowing unsupervised learning
and the extraction of non-linear features (Chen and Guo 2023). Autoencoders have
advantages over Restricted Boltzmann Machines (RBMs) as they can learn more complex
data representations. RBMs are widely used for generating various data types, including
images (Hinton et al. 2006). RBMs are a type of Boltzmann Machine (BM) that learns a
probability distribution from inputs (Chen and Guo 2023). The main difference between
Autoencoders, RBMs, and BMs lies in their architectures. AEs have an encoder and a
decoder, while RBMs consist of visible and hidden layers. Boltzmann Machines (BMs)
are more general and fully connected, making them less tractable compared to RBMs.
AEs are feed-forward neural networks, allowing information to flow in one direction. In
contrast, RBMs and BMs are generative models capable of generating new samples from
the learned distribution.
2.1 Vanilla autoencoder
During the encoding step, an AE maps an input vector X to a code vector Z using an
encoding function f𝜃 . In the decoding step, it maps the code vector Z back to the output
vector X ′, aiming to reconstruct the input data using a decoding function g𝜃 . AEs adjust
the network’s weights (W ) through fine-tuning, achieved by minimizing the reconstruction
error L between X and the reconstructed data X ′. This reconstruction error acts as a loss
function used to optimize the network’s parameters (Chai et al. 2019). The objective
function of an AE can be written as:
\[
\min_{\theta} J_{AE}(\theta) = \min_{\theta} \sum_{i=1}^{n} l(x_i, x_i') = \min_{\theta} \sum_{i=1}^{n} l\big(x_i, g_{\theta}(f_{\theta}(x_i))\big) \tag{1}
\]
where xi represents the i-th training sample, xi′ represents the corresponding output,
and n is the total number of training samples. The term l refers to the
reconstruction error between the input and output, defined as:
\[
L(X, X') = \sum_{i=1}^{n} \| X_i - X_i' \|^2 \tag{2}
\]
The encoder and decoder mapping functions are Z = f𝜃(X) = s(WX + b) and
X′ = g𝜃(Z) = s(W′Z + b′), where s is a non-linear activation function such as sigmoid
or ReLU, W and W′ are weight matrices, and b and b′ are bias vectors. During training,
the weights and biases of the autoencoder are adjusted to minimize the reconstruction
error using an optimization algorithm such as stochastic gradient descent. Once trained, the
encoding function can create low-dimensional representations of new input data (Z),
while the decoding function can reconstruct the original data from the low-dimensional
representation (X′).
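The encoding, decoding, and reconstruction-error steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited work: the toy data, layer sizes, learning rate, and helper names (`encode`, `decode`, `train`) are hypothetical, s is taken to be the sigmoid, and plain full-batch gradient descent stands in for stochastic gradient descent.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(X, W, b):
    # Z = f_theta(X) = s(W X + b); each row of X is one sample
    return sigmoid(X @ W.T + b)

def decode(Z, W_p, b_p):
    # X' = g_theta(Z) = s(W' Z + b')
    return sigmoid(Z @ W_p.T + b_p)

def reconstruction_loss(X, X_rec):
    # Eq. (2): sum over samples of the squared 2-norm of the residual
    return float(np.sum((X - X_rec) ** 2))

def train(X, code_dim, steps=300, lr=0.5, seed=0):
    # Hypothetical training loop: full-batch gradient descent on Eq. (2) / n
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (code_dim, d)); b = np.zeros(code_dim)
    W_p = rng.normal(0.0, 0.1, (d, code_dim)); b_p = np.zeros(d)
    history = []
    for _ in range(steps):
        Z = encode(X, W, b)
        X_rec = decode(Z, W_p, b_p)
        history.append(reconstruction_loss(X, X_rec))
        # Backpropagate the squared error through both sigmoid layers
        g_out = 2.0 * (X_rec - X) * X_rec * (1.0 - X_rec) / n
        g_hid = (g_out @ W_p) * Z * (1.0 - Z)
        W_p -= lr * g_out.T @ Z
        b_p -= lr * g_out.sum(axis=0)
        W -= lr * g_hid.T @ X
        b -= lr * g_hid.sum(axis=0)
    return (W, b, W_p, b_p), history

# Toy data (assumption): 64 samples with 8 features driven by 2 latent factors
rng = np.random.default_rng(1)
X = sigmoid(rng.normal(size=(64, 2)) @ rng.normal(size=(2, 8)))
(W, b, W_p, b_p), history = train(X, code_dim=3)
Z = encode(X, W, b)            # low-dimensional codes
X_rec = decode(Z, W_p, b_p)    # reconstructions
```

After training, `encode` alone yields the low-dimensional representation of new inputs, which is the usual way a trained AE is used for dimensionality reduction.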
2.2 Stack autoencoder
The Stacked Autoencoder follows a layer-wise training approach (Hoang and Kang 2019; Hinton et al.
2006). After layer 1 is trained, its output serves as the input for training layer 2. The
reconstruction loss of layer 2 is therefore assessed relative to the output of layer 1 rather than the input layer. The
encoding process can be mathematically represented as follows:
in which k represents the k-th autoencoder, ak represents the encoding outcome of the k-th
autoencoder, and when k = 1, a0 = x denotes the input data. The decoding process can be
mathematically represented as follows:
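A minimal sketch of the layer-wise scheme, assuming each encoder has the same form as in Sect. 2.1, that is a_k = s(W_k a_{k−1} + b_k) with a_0 = x, and that decoding mirrors the stack in reverse order. The layer sizes and the names `stack_encode` and `stack_decode` are illustrative only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def stack_encode(x, encoders):
    # a_0 = x; a_k = s(W_k a_{k-1} + b_k) for k = 1..K (assumed form)
    activations = [x]
    for W_k, b_k in encoders:
        activations.append(sigmoid(W_k @ activations[-1] + b_k))
    return activations

def stack_decode(code, decoders):
    # Walk back from the deepest code to the input dimension
    out = code
    for W_k, b_k in decoders:
        out = sigmoid(W_k @ out + b_k)
    return out

rng = np.random.default_rng(0)
sizes = [10, 6, 3]                      # input -> layer 1 -> layer 2
rev = sizes[::-1]
encoders = [(rng.normal(0, 0.1, (m, n)), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]
decoders = [(rng.normal(0, 0.1, (m, n)), np.zeros(m))
            for n, m in zip(rev[:-1], rev[1:])]

x = rng.random(10)
activations = stack_encode(x, encoders)
x_rec = stack_decode(activations[-1], decoders)
```

In greedy layer-wise training, the k-th autoencoder would be fitted to reconstruct `activations[k - 1]`, not `x`.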
2.3 Hyperparameters in autoencoder
Autoencoders come with various hyperparameters that must be defined prior to training,
and their values can significantly influence the model’s performance. It’s crucial to
understand that certain hyperparameters are usually set before training and remain
constant, while others can be dynamically tuned during training to optimize the model’s
performance. Selecting and adjusting hyperparameters often involves experimentation
and validation to achieve the best results for a particular task. The following outlines the
most common hyperparameters in autoencoders:
• Number of Hidden Layers: The quantity of hidden layers within the autoencoder
defines its network depth and its capacity to capture intricate data patterns. This
parameter is configured before training. While adding more hidden layers can
enhance the model’s representational power, it may also introduce optimization
challenges and elevate the risk of overfitting.
• Number of Neurons in Each Layer: The number of neurons in each layer governs
the network’s data representation capacity and is typically set before training. A
higher count of neurons can amplify the network’s capacity but might also elevate
the risk of overfitting and complicate the optimization process.
• Size of Latent Space: Adjusting the size of the bottleneck layer permits fine-tuning
the balance between model complexity and performance. This parameter is set prior
to training.
• Activation Function: The activation function utilized in the bottleneck layer plays
a pivotal role in the autoencoder’s performance. To optimize the autoencoder’s
performance, the bottleneck layer activation function should be tailored before
training. These functions determine the network’s nonlinearity and its ability to
learn intricate data patterns. Common activation functions employed in bottleneck
layers encompass sigmoid, tanh, ReLU, and SELU. Further details, including their
equations, outputs, and output curves, are outlined in Table 3.
Table 3 (fragment) SELU: f(x) = 𝜆 · { x if x > 0; 𝛼e^x − 𝛼 if x ≤ 0 }, output range [−2, ∞]
• Objective Function: The objective function, also known as the loss function, is a
critical element of an autoencoder, serving to train the network by minimizing the
distinction between input and output data. It gauges the dissimilarity between the
input and output data, and the autoencoder is trained to diminish this dissimilarity.
The selection of the objective function hinges on the data type and the specific
application and is generally determined before training. Common objective functions
used in autoencoders include:
– Mean Squared Error (MSE): This is the predominant objective function in
autoencoders, measuring the average squared difference between input and output
data. MSE is defined by Eq. (4):
\[
L_{AE}(X, X') = \min \left( \| X - X' \|_F^2 \right) \tag{4}
\]
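The objective of Eq. (4), its averaged form, and BCE (Binary Cross-Entropy, as listed in the abbreviations) can be sketched as follows. The function names and the clipping constant `eps` are assumptions made for illustration; clipping is one common way to avoid evaluating log(0).

```python
import numpy as np

def frobenius_loss(X, X_rec):
    # Eq. (4): squared Frobenius norm of the residual
    return float(np.sum((X - X_rec) ** 2))

def mse_loss(X, X_rec):
    # Same quantity averaged over all entries
    return float(np.mean((X - X_rec) ** 2))

def bce_loss(X, X_rec, eps=1e-7):
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    P = np.clip(X_rec, eps, 1.0 - eps)
    return float(-np.mean(X * np.log(P) + (1.0 - X) * np.log(1.0 - P)))

X = np.array([[0.0, 1.0], [1.0, 0.0]])
X_rec = np.array([[0.0, 1.0], [0.5, 0.5]])
fro = frobenius_loss(X, X_rec)
mse = mse_loss(X, X_rec)
bce = bce_loss(X, X_rec)
```

The two squared-error forms differ only by the constant factor `X.size`, so they share the same minimizer; the choice mainly affects the scale of the learning rate.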
When choosing an autoencoder loss function, consider the problem's unique needs.
MSE suits regression-style reconstruction of continuous data, but it is sensitive to outliers and to data
scaling. BCE suits binary-valued targets but can be numerically unstable when predicted probabilities approach 0 or 1.
The choice depends on the problem and task requirements. MSE is the
3 Autoencoder taxonomy
3.1 Regularized autoencoder
3.1.1 Sparse autoencoder
This combined penalty term encourages the model to acquire a sparse representation,
wherein only a limited number of neurons are active for each input.
3.1.2 Contractive autoencoder
Contractive Autoencoder (CAE) (Rifai et al. 2011) is an autoencoder that aims to produce
similar representations for similar input data by adding a penalty term to the loss function.
This penalty term, based on the Frobenius norm of the Jacobian matrix of the encoder
with respect to the input data, encourages local stability in the learned representation. The
primary objective of the CAE is to minimize the difference between the input data and
the reconstructed data while taking the penalty term into account, promoting similarity
in representations for similar input data. The overall loss function of CAE includes the
reconstruction loss and a penalty term as follows:
\[
L_{CAE}(X, X') = \min \left( \| X - X' \|_F^2 + \lambda \| J_F(X) \|_F^2 \right) \tag{8}
\]
where ‖JF(X)‖²F represents the squared Frobenius norm of the Jacobian matrix of the
encoded representation with respect to the input data. This norm measures the sensitivity of
the encoded representation to small variations in the input, calculated as:
\[
\| J_F(X) \|_F^2 = \sum_{i,j} \left( \frac{\partial h_j(X)}{\partial X_i} \right)^2 \tag{9}
\]
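For a single sigmoid encoder layer h = s(Wx + b), the penalty of Eq. (9) has a closed form, because ∂h_j/∂x_i = h_j(1 − h_j)W_ji. The sketch below assumes that encoder form (the source does not fix one) and compares the closed form against a finite-difference Jacobian; all names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encoder(x, W, b):
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    # ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2 for a sigmoid layer
    h = encoder(x, W, b)
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))

def numerical_jacobian(x, W, b, eps=1e-6):
    # Central differences, used only to check the closed form
    J = np.zeros((W.shape[0], x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (encoder(x + step, W, b) - encoder(x - step, W, b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
b = rng.normal(size=4)
x = rng.normal(size=6)
penalty = contractive_penalty(x, W, b)
penalty_numeric = float(np.sum(numerical_jacobian(x, W, b) ** 2))
```

The closed form avoids building the Jacobian explicitly, which matters when the penalty is evaluated for every sample in a batch.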
3.1.3 Laplacian autoencoder
The standard Autoencoder may not emphasize the relationships between nearby data
points during its learning process, which can lead to extracted features lacking crucial
information about the data’s internal structure. In contrast, the Laplacian Autoencoder
prioritizes preserving the distances between neighboring data points, effectively capturing
the significant internal structure within the data. Inspired by this concept, the Laplacian
Autoencoder (LAE) (Jia et al. 2015) was introduced to facilitate the generation of lower-
dimensional representations for Autoencoders. This approach ensures that the learned
representations incorporate essential local structural information, enhancing their
suitability for specific data analysis tasks. The loss function for the Laplacian Autoencoder
is defined as follows:
\[
L_{LAE}(X, X') = \min \left( \| X - X' \|_F^2 + \lambda\, \mathrm{tr}(Z^{T} L Z) \right) \tag{10}
\]
where the matrix L, known as the graph Laplacian, is calculated from how similar pairs of
data points are in the latent space. This calculation typically uses k-nearest
neighbor graphs or Gaussian kernels.
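The regularizer tr(ZᵀLZ) can be sketched with an unnormalized Laplacian L = D − A built from a symmetric k-nearest-neighbor graph. The choice of the unnormalized Laplacian, the 0/1 edge weights, and the helper names are assumptions; the array passed to `knn_adjacency` can be whichever set of points the neighborhood graph is meant to be built from.

```python
import numpy as np

def knn_adjacency(points, k):
    # Symmetric 0/1 adjacency: i and j are linked if either is among
    # the k nearest neighbours of the other
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # exclude self-matches
    nearest = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((len(points), len(points)))
    rows = np.repeat(np.arange(len(points)), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)

def graph_laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def laplacian_penalty(Z, L):
    # tr(Z^T L Z), with one row of Z per data point
    return float(np.trace(Z.T @ L @ Z))

rng = np.random.default_rng(0)
points = rng.normal(size=(12, 5))
Z = rng.normal(size=(12, 2))
A = knn_adjacency(points, k=3)
L = graph_laplacian(A)
penalty = laplacian_penalty(Z, L)

# Equivalent pairwise form: 0.5 * sum_ij A_ij ||z_i - z_j||^2
pairwise = 0.5 * float(np.sum(A * ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)))
```

The pairwise form makes the effect visible: the penalty grows when neighboring points are mapped far apart, so minimizing it pulls neighbors together in Z.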
3.1.4 Orthogonal autoencoder
where I is the identity matrix, Z T represents the transpose of the compressed representation
Z , and 𝜆 is a penalization parameter. Notably, setting 𝜆 to zero yields a conventional
autoencoder.
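A minimal sketch of an orthogonality penalty, assuming it takes the common form 𝜆‖ZᵀZ − I‖²F suggested by the symbols defined above; the exact expression used by the Orthogonal Autoencoder may differ, and the function name is illustrative.

```python
import numpy as np

def orthogonality_penalty(Z):
    # ||Z^T Z - I||_F^2, with one row of Z per data point
    d = Z.shape[1]
    gram = Z.T @ Z
    return float(np.sum((gram - np.eye(d)) ** 2))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(20, 4)))   # columns of Q are orthonormal
penalty_orthonormal = orthogonality_penalty(Q)
penalty_scaled = orthogonality_penalty(2.0 * Q)  # Z^T Z = 4I, so the penalty is 4 * 3^2
```

The penalty vanishes exactly when the latent dimensions are orthonormal across the data set, and multiplying it by 𝜆 = 0 recovers the plain reconstruction objective.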
3.2 Robust autoencoder
3.2.1 Denoising autoencoder
where X represents the clean input data, and X̂′ denotes the noisy input data.
where W signifies the learned transformation matrix, and m represents the total number of
input examples.
The M-DAE seeks the best solution for W, which can be expressed mathematically
as:
L2,1 Robust Autoencoder (L2,1-RAE) (Li et al. 2018) is a modified version of the Robust
Autoencoder (RAE) designed to enhance the autoencoder's resilience when dealing
with noisy or corrupted input data. This enhancement is achieved through the use of
a specific type of regularization known as L2,1 regularization. L2,1 regularization
encourages the learned features to possess specific properties. Notably, it promotes
feature sparsity, meaning that most features consist of zeros, and robustness, enabling
them to handle scenarios with data outliers or noise. The mathematical expression of
the L2,1-RAE loss function is given as follows:
\[
L_{2,1\text{-}RAE}(X, X') = \min \left( \| X - X' \|_F^2 + \lambda \cdot \| Z \|_{2,1} \right) \tag{15}
\]
where ‖Z‖2,1 represents the L2,1-norm of the latent representations, which emphasizes both
sparsity and robustness in these learned features.
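The L2,1-norm can be computed as the sum of the 2-norms of the rows of Z. Whether rows or columns are grouped depends on how Z is laid out; the sketch assumes one row per group, and the function name is illustrative.

```python
import numpy as np

def l21_norm(Z):
    # Sum over rows of the Euclidean norm of each row
    return float(np.sum(np.sqrt(np.sum(Z ** 2, axis=1))))

Z = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, -2.0]])
value = l21_norm(Z)   # row norms are 5, 0 and 2
```

Because the norm adds up unsquared row norms, shrinking it tends to drive entire rows to zero, which is the source of the group sparsity described above.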
3.3 Generative autoencoder
3.3.1 Variational autoencoder
Variational Autoencoder (VAE) (An and Cho 2015) is a type of autoencoder that learns
to represent data in a lower-dimensional latent space and generate new data samples that
resemble the input. Unlike traditional autoencoders, VAEs are generative models that
can capture the underlying distribution of input data. In a VAE, the encoder maps input
data to a posterior distribution q(Z|X) instead of a fixed latent representation Z. During
reconstruction, Z is sampled from this distribution and passed through a decoder. The
regularization loss in VAE encourages q(Z|X) to match a specific distribution, often a
standard Gaussian. The VAE loss function is defined as:
\[
L_{VAE} = -\mathbb{E}_{q(Z|X)}\left[ \log p(X|Z) \right] + KL\big(q(Z|X)\,\|\,p(Z)\big) \tag{16}
\]
The first term, the expected negative log-likelihood under p(X|Z), measures the difference
between the original input data and the data reconstructed by the decoder. The second term,
a regularization component, quantifies the
KL divergence between q(Z|X) and p(Z), typically a standard Gaussian distribution. This
loss function guides VAE training to balance accurate data reconstruction with a structured
latent space for generative purposes.
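When q(Z|X) is a diagonal Gaussian with mean μ and log-variance log σ², and p(Z) is a standard Gaussian, the KL term of Eq. (16) has the closed form −½ Σ (1 + log σ² − μ² − σ²). The sketch below assumes that parameterization and, for the first term, a squared-error reconstruction (a Gaussian decoder with fixed variance, up to constants). The function names are illustrative.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def reparameterize(mu, log_var, rng):
    # Z = mu + sigma * eps with eps ~ N(0, I), so Z stays differentiable in mu, sigma
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(X, X_rec, mu, log_var):
    # Squared-error reconstruction term plus KL regularizer, averaged over samples
    recon = np.sum((X - X_rec) ** 2, axis=-1)
    return float(np.mean(recon + kl_to_standard_normal(mu, log_var)))

rng = np.random.default_rng(0)
kl_zero = kl_to_standard_normal(np.zeros((2, 3)), np.zeros((2, 3)))
kl_shifted = kl_to_standard_normal(np.array([[1.0]]), np.array([[0.0]]))
mu = rng.normal(size=(4, 3))
z_tight = reparameterize(mu, np.full((4, 3), -50.0), rng)  # near-zero variance
```

The KL term is zero only when the encoder outputs exactly the prior, which is why it pulls the latent codes toward a standard Gaussian while the reconstruction term pulls them apart.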
3.3.2 Adversarial autoencoder
\[
L_{AAE}(X, X') = \min \left( \| X - X' \|_F^2 + \log(D(X)) + \log(1 - D(G(Z))) \right) \tag{17}
\]
where G(Z) is the decoder function that converts the latent representation back to the
original input data, and D(X) represents the discriminator’s output for the original input
data. The term log(1 − D(G(Z))) reflects the discriminator’s output for data generated by
the decoder.
3.3.3 Bayesian autoencoder
Bayesian Autoencoder (BAE) (Yong and Brintrup 2022) is a probabilistic AE that models
all parameters, in contrast to the Variational Autoencoder (VAE) that mainly models the
latent layer. BAE combines a Gaussian likelihood for data reconstruction with an isotropic
Gaussian prior for parameter uncertainty. The loss function maximizes data likelihood and
minimizes model complexity. The BAE loss function is defined as:
\[
\log p(x|\theta) = -\left( \frac{1}{D} \sum_{i=1}^{D} \frac{1}{2\sigma_i^2} (x_i - x_i')^2 + \frac{1}{2} \log \sigma_i^2 \right) \tag{18}
\]
where 𝜎i2 is the variance of the Gaussian distribution, and log p(x|𝜃) represents the log-
likelihood of observing the original data x given the model parameters 𝜃 . It quantifies data
reconstruction through squared errors and variances while promoting model simplicity.
The training objective is to maximize this log-likelihood while minimizing regularization
to find optimal parameters 𝜃 for effective data pattern and uncertainty capture.
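Eq. (18) can be evaluated directly. The sketch reads the sum as covering both the squared-error and the log-variance terms, which is the usual Gaussian log-likelihood up to an additive constant; that reading and the function name are assumptions.

```python
import numpy as np

def gaussian_log_likelihood(x, x_rec, sigma2):
    # -(1/D) * sum_i [ (x_i - x'_i)^2 / (2 sigma_i^2) + 0.5 * log sigma_i^2 ]
    x, x_rec, sigma2 = (np.asarray(a, dtype=float) for a in (x, x_rec, sigma2))
    per_dim = (x - x_rec) ** 2 / (2.0 * sigma2) + 0.5 * np.log(sigma2)
    return float(-np.mean(per_dim))

ll_unit = gaussian_log_likelihood([1.0, 2.0], [0.0, 2.0], [1.0, 1.0])
ll_perfect = gaussian_log_likelihood([1.0, 2.0], [1.0, 2.0], [1.0, 1.0])
```

With unit variances the expression reduces to minus half the mean squared error, so maximizing it is then equivalent to minimizing MSE; learned variances let the model down-weight dimensions it reconstructs poorly, at the cost of the log σ² term.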
3.3.4 Diffusion autoencoder
3.4 Convolutional autoencoder
data, as they excel at capturing spatial dependencies, which refer to the patterns and
relationships among pixels or locations within individual images or data frames. They find
wide-ranging applications in tasks such as image denoising, inpainting, segmentation, and
super-resolution.
the first term measures the difference between the original image and its reconstruction by
the decoder, while the second term encourages the latent representation q(Z|X) to follow a
standard Gaussian distribution through KL divergence regularization, ensuring a structured
latent space for effective generative capabilities.
where N is the number of spatial rows in the data, M is the number of spatial columns in
the data, T is the number of time steps in the sequence, Xtij represents the ground truth
value at spatial location (i, j) at time step t, and X′tij represents the predicted value at spatial
location (i, j) at time step t.
includes a sparsifying module designed to create sparse feature maps. This module retains
the highest value and its corresponding position within each local subregion before
performing unpooling, primarily through max pooling. The loss function used in CSAE,
which quantifies the disparities between the original input and the reconstructed output,
relies on the Frobenius norm and is defined as follows:
\[
L_{CSAE}(X, X') = \min \sum_{l=1}^{L} \left\| X^{(l)} - X'^{(l)} \right\|_F^2 \tag{22}
\]
\[
X'^{(l)} = \sum_{i=1}^{d} \left( \mathrm{rot}(W_i, 180) * Z_i^{l} + c_i \right) \tag{23}
\]
\[
Z^{l} = G_{p,s}\left( Z_i^{(l)} \right) = G_{p,s}\left( f(W_i \cdot X^{(l)} + b_i) \right) \tag{24}
\]
where L is the number of layers, X(l) represents the original input at layer l, X′(l) represents
the reconstructed output at layer l, d is the number of feature channels, Zil is the ith
sparsified feature map, and Gp,s(X) represents the sparsifying operator, involving
max-pooling and unpooling operations to create sparse feature maps.
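The sparsifying step (keep the largest value in each local subregion at its original position, zero everything else) can be sketched for a single 2-D feature map. The sketch assumes non-overlapping p × p windows, i.e. a stride equal to the pooling size; the function name is illustrative.

```python
import numpy as np

def sparsify(feature_map, p):
    # Max-pool each p x p window, then unpool by writing the maximum back
    # to the position it came from; all other entries become zero
    rows, cols = feature_map.shape
    if rows % p or cols % p:
        raise ValueError("feature map size must be a multiple of p")
    out = np.zeros_like(feature_map)
    for i in range(0, rows, p):
        for j in range(0, cols, p):
            window = feature_map[i:i + p, j:j + p]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            out[i + r, j + c] = window[r, c]
    return out

feature_map = np.arange(16, dtype=float).reshape(4, 4)
sparse_map = sparsify(feature_map, p=2)
```

For this 4 × 4 input with p = 2, one entry survives per window, so three quarters of the map is zeroed while the positions of the maxima are preserved for the decoder.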
3.5 Recurrent autoencoder
RNNs (Medsker and Jain 2001) are designed for processing sequential data, such as time
series, where the current state (ht) relies on the previous state (ht−1). Vanilla RNNs have
a limitation of short-term memory, leading to gradient problems in long sequences. To
address this, LSTM networks, equipped with three gates (forget gate, input gate, and output gate),
and GRU networks, which consist of two gates (update gate and reset gate), were introduced.
These architectures incorporate self-loops to effectively manage gradients over extended
sequences, addressing the vanishing or exploding gradient issue. Recurrent Autoencoder is
an autoencoder that incorporates recurrent layers, such as LSTM or GRU, within both the
encoder and decoder components.
where X represents the clean input sequence and X ′ represents the reconstructed output
sequence.
GRU Autoencoder (GRUAE) (Dehghan et al. 2014) employs GRU units in both the
encoder and decoder parts. Unlike LSTM, GRU has a simpler architecture with only two
gates: the update and reset gates. This architectural simplicity can lead to easier training
and faster processing while still capturing long-term dependencies in input sequences.
The formulation of a GRU Autoencoder is similar to that of an LSTM Autoencoder,
making it flexible and effective for modeling sequential data:
\[
L_{GRUAE}(X, X') = \min \left( \| X - X' \|_F^2 \right) \tag{26}
\]
where X represents the clean input sequence and X ′ represents the reconstructed output
sequence.
3.5.3 Bidirectional autoencoder
where T is the sequence length, Xt represents the input at time step t, and Xt′ represents the
reconstructed output at time step t.
3.6 Semi‑supervised autoencoder
in which the first term represents the expectation of the conditional log-likelihood of the
latent variable z, the second term denotes the log-likelihood associated with y, and the third
term quantifies the Kullback–Leibler divergence between the prior distribution p(z) and the
posterior distribution q𝜙 (z|x, y).
Label and Sparse Regularized Autoencoder (LSRAE) (Chai et al. 2019) is a novel
approach that combines label and sparse regularizations with autoencoders to create a
semi-supervised learning method. This method effectively leverages the strengths of
both unsupervised and supervised learning processes. On one hand, sparse regularization
selectively activates a subset of neurons, enhancing the extraction of localized and
informative features. This unsupervised learning process helps uncover underlying data
concepts, improving generalization. On the other hand, label regularization enforces the
where the first term ensures precise data reconstruction, the second term promotes sparsity
within the hidden layer, facilitating efficient feature extraction. The third term acts as a
safeguard against overfitting by penalizing excessive weights. Lastly, the fourth term
enhances classification accuracy by quantifying the label error. Here, L denotes the actual
label, and T represents the desired label.
3.7 Graph autoencoder
Graph Autoencoder (GAE) (Pan et al. 2018) is a powerful method for reducing the
dimensionality of graph data, enhancing efficiency in graph analytics. It takes a graph
as input and outputs a condensed vector representation that captures its essential features.
Within GAE, the encoder converts the input graph into a lower-dimensional vector,
which the decoder uses to recreate the original graph. The model aims to minimize the
dissimilarity between input and output graphs while capturing essential graph features. The
loss function for GAE is defined as:
\[
L_{GAE}(X, X') = \min \left( \| X - X' \|_F^2 \right) \tag{31}
\]
where X ′ is computed from the inner product of the hidden representation Z and its
transpose Z T using the logistic sigmoid function 𝜎(ZZ T ). Z = GCN(F, X), obtained through
the Graph Convolutional Network (GCN) applied to the node features matrix F , is based
on the input data X .
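The GAE forward pass described above, Z = GCN(F, X) followed by X′ = 𝜎(ZZᵀ), can be sketched as follows. The two-layer GCN with symmetric normalization and self-loops, the ReLU hidden activation, the layer sizes, and the random untrained weights are assumptions made for illustration; X is treated as the adjacency matrix of the input graph and F as the node feature matrix.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def normalize_adjacency(X):
    # Symmetric normalization with self-loops: D^{-1/2} (X + I) D^{-1/2}
    X_hat = X + np.eye(X.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(X_hat.sum(axis=1))
    return X_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(F, X, W0, W1):
    # Two graph-convolution layers: Z = X_norm ReLU(X_norm F W0) W1
    X_norm = normalize_adjacency(X)
    hidden = np.maximum(X_norm @ F @ W0, 0.0)
    return X_norm @ hidden @ W1

def decode(Z):
    # Inner-product decoder: edge probabilities sigma(Z Z^T)
    return sigmoid(Z @ Z.T)

def gae_loss(X, X_rec):
    # Eq. (31): squared Frobenius norm between input and reconstructed graph
    return float(np.sum((X - X_rec) ** 2))

rng = np.random.default_rng(0)
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # 4-node ring graph
F = np.eye(4)                               # featureless nodes: one-hot identity
W0 = rng.normal(0.0, 0.5, (4, 8))
W1 = rng.normal(0.0, 0.5, (8, 2))
Z = gcn_encode(F, X, W0, W1)
X_rec = decode(Z)
loss = gae_loss(X, X_rec)
```

Because the decoder is an inner product, the reconstruction is always symmetric, which suits undirected graphs; directed graphs would need a different decoder.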
Variational Graph Autoencoder (VGAE) (Kipf and Welling 2016) is a framework for
learning interpretable latent representations of graph-structured data. It employs a
probabilistic approach to encode graph information effectively. VGAE consists of two
essential components: an encoder and a decoder. The encoder utilizes a Graph Convolutional
Network (GCN) to transform graph nodes into a lower-dimensional latent space. It
generates latent variables zi for each node by sampling from Gaussian distributions. These
latent variables capture crucial structural information of the graph. The decoder functions
as a generative model, aiming to reconstruct the original graph structure using the latent
variables zi . It estimates the likelihood of connections (edges) between nodes based on
their corresponding latent vectors. The VGAE loss function combines a reconstruction term
and a regularization term to guide the learning process effectively:
$$L_{VGAE} = -\mathbb{E}_{q(Z \mid F, X)}[\log p(X \mid Z)] + KL(q(Z \mid F, X) \parallel p(Z)) \tag{32}$$
where q(Z|F, X) represents the encoding distribution, p(X|Z) models the likelihood of
the adjacency matrix given the latent variables, and KL(q(Z|F, X)||p(Z)) quantifies the
divergence between the encoding distribution and the prior distribution governing the
latent variables Z.
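To make the two terms of equation (32) concrete, the following sketch evaluates them for given Gaussian parameters. It rests on several assumptions not stated in the text: the encoder outputs `mu` and `log_var` are passed in directly rather than produced by a GCN, the prior p(Z) is a standard normal, the decoder is the sigmoid of an inner product, and the expectation is approximated with a single reparameterized sample.

```python
import numpy as np

def vgae_loss(X, mu, log_var, rng, eps=1e-10):
    """Single-sample estimate of equation (32) for an adjacency matrix X.

    mu, log_var: per-node Gaussian parameters (assumed to come from an encoder).
    """
    # Reparameterization: z_i = mu_i + sigma_i * noise
    Z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    probs = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))   # edge probabilities
    # Reconstruction term: negative Bernoulli log-likelihood of the adjacency matrix
    recon = -np.sum(X * np.log(probs + eps) + (1 - X) * np.log(1 - probs + eps))
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I)
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl, recon, kl

rng = np.random.default_rng(0)
X = np.array([[0, 1], [1, 0]], dtype=float)
total, recon, kl = vgae_loss(X, np.zeros((2, 3)), np.zeros((2, 3)), rng)
print(float(total), float(recon), float(kl))
```

When the encoding distribution equals the prior (zero mean, unit variance) the KL term vanishes, which is a quick sanity check on the regularizer.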
Adversarial Graph Autoencoder (AGAE) (Pan et al. 2018) leverages adversarial training
to acquire a lower-dimensional representation of the input graph. It employs an encoder to
map graph nodes to this lower-dimensional space and a decoder to reconstruct the original
graph. AGAE integrates an adversarial component, akin to a discriminator, to ensure the
learned embeddings preserve the graph structure. This unsupervised model combines
autoencoder-based reconstruction with adversarial training to generate high-quality graph
representations. The AGAE loss function is defined as follows:
$$L_{AGAE} = \mathbb{E}_{H \sim p_z}[\log D(Z)] + \mathbb{E}_{X}[\log(1 - D(G(F, X)))] \tag{33}$$
where G(⋅) represents the generator, and D(⋅) signifies the discriminator. The
discriminator’s role is to distinguish samples drawn from the prior distribution pz from the
latent representations G(F, X) produced by the generator.
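The adversarial term in equation (33) can be evaluated numerically as follows. This is only a sketch under stated assumptions: the discriminator is a hypothetical linear-logistic model with parameters `w` and `b`, it is applied to latent codes (samples from the prior p_z and codes standing in for G(F, X)), and both expectations are replaced by sample means.

```python
import numpy as np

def discriminator(Z, w, b):
    """Hypothetical linear-logistic discriminator D(z) = sigmoid(z . w + b)."""
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))

def adversarial_objective(Z_prior, Z_encoded, w, b, eps=1e-10):
    """Monte Carlo estimate of E[log D(z)] + E[log(1 - D(G(F, X)))]."""
    real_term = np.mean(np.log(discriminator(Z_prior, w, b) + eps))
    fake_term = np.mean(np.log(1.0 - discriminator(Z_encoded, w, b) + eps))
    return real_term + fake_term

rng = np.random.default_rng(0)
Z_prior = rng.standard_normal((8, 2))          # samples from the prior p_z
Z_encoded = rng.standard_normal((8, 2)) + 2.0  # placeholder for encoder outputs G(F, X)
w, b = np.zeros(2), 0.0                        # untrained discriminator: D = 0.5 everywhere
value = adversarial_objective(Z_prior, Z_encoded, w, b)
print(float(value))
```

In the usual adversarial setup the discriminator is updated to increase this quantity while the generator is updated to decrease it; an untrained discriminator that outputs 0.5 everywhere gives 2 log 0.5.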
Graph Attentional Autoencoder (GAAE) (Salehi and Davulcu 2019) is a variant of graph
autoencoders that combines Graph Attention Network (GAT) with GAE. It employs
attention mechanisms to weigh the importance of neighboring nodes and edges during the
reconstruction process. In essence, GAAE aims to learn a low-dimensional representation
of a graph while preserving its structural information using attention mechanisms. The
GAAE loss function is defined as follows:
$$L_{GAAE} = \min \left( \lVert X - \mathrm{Sigmoid}(ZZ^T) \rVert_F^2 \right) \tag{34}$$
in which $Z_i$ represents the hidden layer representation of node $v_i$. The calculation of $Z_i^{(l)}$ is
based on the formula:
$$Z_i^{(l)} = \sigma\left( \sum_{j \in N_i} a_{ij} W^{(l-1)} Z_j^{(l-1)} \right) \tag{35}$$
where $N_i$ denotes the set of neighbors of node $v_i$, and $W^{(l-1)}$ represents the learnable
parameter matrix. The attention coefficient $a_{ij}$ is computed using the following formula:
3.8 Masked autoencoders
generate coherent and contextually appropriate text or videos, making them valuable for
tasks like text completion (Zhang et al. 2022), text generation (Zhang et al. 2023), language
modeling, image captioning (Alzu’bi et al. 2021) and data augmentation (Xu et al. 2022).
Graph Masked Autoencoder (GMAE) (Hou et al. 2022) is a simplified and cost-effective
approach for self-supervised graph representation learning. Unlike most GAEs that focus
on reconstructing graph structures, GMAE’s core emphasis is on feature reconstruction
through masking. Additionally, GMAE departs from using MSE, opting for the cosine
error, which benefits cases where feature magnitudes vary, common in graph node
attributes. The primary objective of GMAE is to reconstruct the masked features of nodes,
V′ ⊂ V, given the partially observed node signals. Formally, the GMAE loss function,
averaged over all masked nodes, is as follows:
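$$L_{GMAE} = \min \left( \frac{1}{\lvert V' \rvert} \sum_{v_i \in V'} \left( 1 - \frac{x_i^T z_i}{\lVert x_i \rVert \cdot \lVert z_i \rVert} \right)^{\gamma} \right), \quad \gamma \geq 1 \tag{37}$$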
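The cosine error of equation (37) is straightforward to compute. The sketch below is illustrative only: the function name, the default `gamma=2.0`, and the small `eps` added for numerical stability are assumptions, and the rows of `x` and `z` stand for the original and reconstructed features of the masked nodes.

```python
import numpy as np

def scaled_cosine_error(x, z, gamma=2.0, eps=1e-12):
    """Mean of (1 - cosine similarity)^gamma over masked nodes, as in equation (37).

    x: original features of the masked nodes, shape (n, d)
    z: reconstructed features of the same nodes, shape (n, d)
    """
    cos = np.sum(x * z, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(z, axis=1) + eps
    )
    return np.mean((1.0 - cos) ** gamma)

x = np.array([[1.0, 0.0], [0.0, 2.0]])
same_direction = scaled_cosine_error(x, 3.0 * x)   # rescaled copies of x
orthogonal = scaled_cosine_error(x, x[:, ::-1])    # features swapped per node
print(float(same_direction), float(orthogonal))
```

Because the error depends only on the angle between vectors, rescaling a reconstruction leaves the loss unchanged, which matches the stated motivation of handling varying feature magnitudes. Larger values of gamma down-weight nodes that are already well reconstructed.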
$$L_{CMAE} = \min \left( \lVert Y_m - Y'_m \rVert_F^2 - \lambda \log \frac{\exp(\rho^{+}/\tau)}{\exp(\rho^{+}/\tau) + \sum_{j=1}^{K} \exp(\rho_j^{-}/\tau)} \right) \tag{38}$$
Table 4 Various autoencoder methods including details on their respective improvements and utilized loss functions

| Method | Improvement | Loss function |
|---|---|---|
| SAE | Learns a more compact and informative representation of the data | $\min\left(\lVert X - X' \rVert_F^2 + \lambda\, KL(p \parallel q)\right)$ |
| CAE | Learns a mapping that is robust to small input variations | $\min\left(\lVert X - X' \rVert_F^2 + \lambda \lVert J_F(X) \rVert_F^2\right)$ |
| LAE | Learns a low-dimensional data representation while preserving the local structure | $\min\left(\lVert X - X' \rVert_F^2 + \lambda\, \mathrm{tr}(Z^T L Z)\right)$ |
| OAE | Enforces orthogonality among latent features, enhancing class discriminability | $\min\left(\lVert X - X' \rVert_F^2 + \lambda \lVert Z^T Z - I \rVert_F^2\right)$ |
| DAE | Introduces noise to input and reconstructs the output from the original clean input | $\min\left(\lVert X - \hat{X}' \rVert_F^2\right)$ |
| M-DAE | Reconstructs clean data from noisy data where some of the features are missing | $\min\left(\frac{1}{m}\sum_{i=1}^{m} \lVert X - \hat{X}' W \rVert_F^2\right)$ |
| L2,1-RAE | Enhances resilience to noisy data using L2,1 regularization, encouraging feature sparsity and robustness | $\min\left(\lVert X - X' \rVert_F^2 + \lambda \cdot \lVert Z \rVert_{2,1}\right)$ |
| VAE | Learns the input data distribution and generates new data points from this distribution | $-\mathbb{E}_{q(Z \mid X)}[\log p(X \mid Z)] + KL(q(Z \mid X) \parallel p(Z))$ |
| AAE | Learns the input data structure and generates new data points similar to them | $\min\left(\lVert X - X' \rVert_F^2 + \log(D(X)) + \log(1 - D(G(Z)))\right)$ |
| BAE | Combines Gaussian likelihood and isotropic Gaussian prior for effective data pattern and uncertainty capture | $-\frac{1}{D}\sum_{i=1}^{D}\left(\frac{1}{2\sigma_i^2}(x_i - x'_i)^2 + \frac{1}{2}\log \sigma_i^2\right)$ |
| DiffusionAE | A specialized generative model, employing the Diffusion Probabilistic Loss for training | $-\log P(X \mid X')$ |
| CVAE | Integrates convolutional layers and probabilistic modeling, using a Gaussian latent variable and KL divergence regularization | $-\mathbb{E}_{q(Z \mid X)}[\log p(X \mid Z)] + KL(q(Z \mid X) \parallel p(Z))$ |
| ConvLSTM | Combines convolution and recurrent layers for spatiotemporal data | $\min\left(\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{t=1}^{T} \lVert X_{ijt} - X'_{ijt} \rVert_F^2\right)$ |
| CSAE | Combines the convolutional layers of a CNN with the sparsity constraint of a SAE | $\min\left(\sum_{l=1}^{L} \lVert X^{(l)} - X'^{(l)} \rVert_F^2\right)$ |
| LSTMAE | Uses LSTM units in the encoder and decoder parts of the network | $\min\left(\lVert X - X' \rVert_F^2\right)$ |
| GRUAE | Uses GRU units in the encoder and decoder parts of the network | $\min\left(\lVert X - X' \rVert_F^2\right)$ |
| BiRNNAE | Uses bidirectional RNNs to minimize squared reconstruction error with an MSE loss for sequential data | $\min\left(\frac{1}{T}\sum_{t=1}^{T} \lVert X_t - X'_t \rVert_F^2\right)$ |
| SSVAE | Combines log-likelihood terms for latent variables and Kullback–Leibler divergence | $-\mathbb{E}_{q_\phi(z \mid x, y)}[\log p_\theta(x \mid y, z)] - \log p_\theta(y) + KL(q_\phi(z \mid x, y) \parallel p(z))$ |
| DVAE | A unique loss function for capturing complex data patterns and relationships between … | $\mathbb{E}_{q(y, z \mid x)}\left(\log p(x \mid y, z) + \log p(y) + \log p(z) - \log q(y \mid x, z) - \log q(z \mid x)\right)$ |
| LSRAE | Combines sparse and label regularizations with autoencoders to improve feature extraction and categorization accuracy | $\min \lVert X - X' \rVert_F^2 + KL(p \parallel q) + \sum_{i=1}\sum_{j=1}(W_{ij})^2 + \sum_{i=1} \lVert L - T \rVert$ |
| VGAE | Uses a probabilistic approach, combining an encoder and a decoder guided by a loss function with a reconstruction term | $-\mathbb{E}_{q(Z \mid F, X)}[\log p(X \mid Z)] + KL(q(Z \mid F, X) \parallel p(Z))$ |
| AGAE | Uses adversarial training with encoder and decoder components to create compact graph representations | $\mathbb{E}_{H \sim p_z}[\log D(Z)] + \mathbb{E}_{X}[\log(1 - D(G(F, X)))]$ |
| GAAE | Uses attention mechanisms to reconstruct graphs effectively. Its loss emphasizes preserving structural information | $\min\left(\lVert X - \mathrm{Sigmoid}(ZZ^T) \rVert_F^2\right)$ |
| GMAE | Prioritizes feature reconstruction through masking and employs cosine error | $\min\left(\frac{1}{\lvert V' \rvert}\sum_{v_i \in V'}\left(1 - \frac{x_i^T z_i}{\lVert x_i \rVert \cdot \lVert z_i \rVert}\right)^{\gamma}\right),\ \gamma \geq 1$ |
| CMAE | Improves vision representations with online and target branches; the online encoder reconstructs masked images, using a cosine similarity loss | $\min\left(\lVert Y_m - Y'_m \rVert_F^2 - \lambda \log \frac{\exp(\rho^{+}/\tau)}{\exp(\rho^{+}/\tau) + \sum_{j=1}^{K}\exp(\rho_j^{-}/\tau)}\right)$ |
| SDMAE | Utilizes student and teacher branches to reconstruct missing information | $\min\left(\log q_\psi(\hat{x} \mid \tilde{x})\right)$ |
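$$L_{SDMAE} = \min\left(\log q_{\psi}(\hat{x} \mid \tilde{x})\right) \approx \min \left( \frac{\sum_{i=1}^{n} m_i f_{\phi}(x_i) f_{\theta}(\hat{x})}{\sqrt{\sum_{i=1}^{n} m_i (f_{\phi}(x_i))^2} \sqrt{\sum_{i=1}^{n} m_i (f_{\theta}(\hat{x}))^2}} \right) \tag{39}$$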
4 Applications of autoencoders
AEs have been widely used in various domains, including computer vision, natural
language processing, complex network analysis, recommenders, anomaly detection,
speech recognition, and more. Different types of autoencoder architectures have been
proposed to address specific challenges and improve performance in these domains.
For example, convolutional autoencoders are commonly used in image processing
tasks, while recurrent autoencoders are well-suited for sequential data processing. In
addition, variational autoencoders have been developed for generating new data samples
and improving model generalization. Although each architecture has its own advantages
and limitations, it is important to consider the specific requirements of the application
domain when selecting an appropriate architecture. Figure 5 provides an overview of
the applications of autoencoders in various domains, which can be used as a starting
point for selecting an appropriate architecture. However, further research is needed to
investigate which architectures are more suitable for which application categories and
which architectures are more popular in specific domains.
[Figure 5 diagram: "Application of Autoencoder" with branches including "Object Detection" and "3D Shape"]
Fig. 5 The process of creating the consensus matrix, including the generation of random walks of different
lengths and their combination
4.1 Machine vision
Machine vision utilizes computer algorithms and software to analyze and interpret
images or video data, aiming to enable machines to understand and interact with
the visual world (Jain et al. 1995). AEs play a vital role in various machine vision
applications by learning to extract meaningful image features and reducing data
dimensionality. These applications encompass tasks such as image classification
(Vincent et al. 2010), image clustering (Guo et al. 2017), image segmentation
(Myronenko 2019), image inpainting (Bertalmio et al. 2000), image generation (Vahdat
and Kautz 2020), object detection (Liang et al. 2018), and 3D shape analysis (Todd
2004).
AEs are instrumental in image classification. Methods like Semi-supervised stacked
distance autoencoder (Hou et al. 2020) enhance feature representation by incorporating
semi-supervised learning, utilizing both labeled and unlabeled data to learn inter-data
point distances. Deep Convolutional Autoencoders (DCAE) aid in semi-supervised
classification, as seen in Geng et al. (2015), where they pre-train on unlabeled Synthetic
Aperture Radar (SAR) images and fine-tune using labeled data for high-resolution SAR
image classification.
AEs are also valuable in image clustering, where they learn compressed image
representations for grouping similar images in the latent space. This technique involves
training a clustering algorithm like K-means on the latent space, as described in
references Song et al. (2013) and Yang et al. (2017). Additionally, AEs can be used for
unsupervised image clustering, making them suitable for scenarios with limited labeled
data.
AEs are instrumental in image segmentation, with a wide array of applications that
enhance the precision and efficiency of this critical computer vision task. By learning
meaningful feature representations from image data, AEs provide a valuable foundation
for distinguishing objects and boundaries in images. Their capability for dimensionality
reduction streamlines the processing of high-resolution images, making segmentation
algorithms computationally more tractable (Zhang et al. 2019). AEs also excel in noise
reduction, eliminating unwanted artifacts from images, which is pivotal for accurate
segmentation (Tripathi 2021). They are integral in semantic segmentation (Ohgushi
et al. 2020), where they classify each pixel in an image, and instance segmentation (Lin
et al. 2020), distinguishing individual object instances. Furthermore, AEs contribute
to medical image segmentation (Ma et al. 2022), aiding in the precise identification
of structures and anomalies in healthcare images. Overall, AEs substantially elevate
the accuracy and efficiency of image segmentation tasks, encompassing a range of
applications that extend from object recognition to medical diagnosis.
AEs find significant applications in the domain of image inpainting, a process
of reconstructing missing or corrupted parts of an image. They excel at capturing
complex patterns and textures within images, making them invaluable for this task.
AEs, particularly VAEs, often combined with GANs, offer high-quality inpainting results by learning to
generate realistic and coherent content to fill in the gaps (Tian et al. 2023; Han and
Wang 2021). They effectively model the underlying structures and features of images,
ensuring that the inpainted regions seamlessly blend with the surrounding content.
AEs find versatile applications in image generation tasks, contributing to the creation
of high-quality and diverse visual content. They serve as a foundational component
in generative models such as VAEs and GANs, enabling the synthesis of realistic and novel
images (Huang and Jafari 2023). AEs are essential in encoding and decoding operations,
effectively generating images with specific features, styles, and content (Xu et al. 2019).
They also play a vital role in style transfer, where they transform images to adopt the
artistic characteristics of other images or styles (Kim et al. 2021).
AEs play a role in object detection by extracting valuable features from images or video
frames, improving detection accuracy. Convolutional AEs are used to learn compressed
image representations that enhance the performance of object detection algorithms, such
as Region-based Convolutional Neural Networks (R-CNN) (Ding et al. 2019). VAE further
enhances object detection accuracy, as seen in the integration of VAE with You Only Look
Once (YOLO) (Redmon et al. 2016).
In the domain of 3D shape analysis, AEs learn compressed representations for tasks like
shape generation, completion, and retrieval. Achieving a disentangled latent representation
that separates various factors of variation is a challenge. Recent research introduces
methods like Split-AE (Saha et al. 2022) and 3D Shape Variational Autoencoder Latent
Disentanglement (Foti et al. 2022), addressing this challenge. Other approaches employ
deep learning features for 3D shape retrieval by projecting 3D shapes into 2D space and
utilizing AEs for feature learning (Zhu et al. 2016). Additionally, architectures like point-
cloud AEs combined with VAEs are explored to partition the latent space and enhance 3D
shape analysis (Aumentado-Armstrong et al. 2019).
While AEs offer valuable capabilities in various machine vision applications, their
effectiveness often depends on the specific task and dataset characteristics, and they may
be complemented by specialized models in certain scenarios.
4.2 NLP
NLP is a field that explores how computers can understand and work with human
language in speech or text form to perform useful tasks (Chowdhary and Chowdhary
2020). This area mainly concentrates on methods for handling text data, including tasks
like categorizing text (text classification) (Kowsari et al. 2019), grouping similar texts
together (text clustering) (Aggarwal and Zhai 2012), generating new text (text generation)
(McKeown 1992), and assessing the sentiment expressed in text (sentiment analysis)
(Medhat et al. 2014). To tackle the complexities of working with textual data, researchers
have developed advanced models, often incorporating AEs. These models have proven
effective in addressing the challenges associated with processing text data (Li et al. 2023).
AEs play a versatile role in text classification tasks, offering feature learning to capture
crucial patterns in text data (Guo et al. 2023; Ye et al. 2022), dimensionality reduction
for efficient processing of high-dimensional text features (Le et al. 2023; Che et al. 2020),
noise reduction to clean and enhance noisy text (García-Mendoza et al. 2022; Che et al.
2020), and semi-supervised learning for improved classification using limited labeled
data (Wu et al. 2019; Xu et al. 2017). They also excel in topic modeling by uncovering
underlying themes within text documents (Paul et al. 2023; Smatana and Butka 2019),
aid in anomaly detection to identify unusual patterns (Gorokhov et al. 2023; Bursic
et al. 2019), and enable coherent text generation (Semeniuta et al. 2017; Zhao et al.
2021). Their adaptability and versatility make them indispensable tools in NLP and text
analysis, enhancing various aspects of text classification.
Another application of AEs in the field of NLP is text clustering. In this context, AEs
have been applied to organize text
documents into meaningful groups. One approach utilizes stacked AEs, combining them
with k-means clustering to effectively group text documents into meaningful clusters
(Hosseini and Varzaneh 2022). In Deep Embedded Clustering (DEC), AEs play a pivotal
role by initializing feature representations of data points and serving as the foundation for
similarity computations during the clustering process. The embeddings learned by AEs
are jointly optimized with cluster assignments, thereby enhancing the overall quality of
clustering results (Xie et al. 2016; Daneshfar et al. 2023). AEs also provide a solution
to the challenges of short text clustering. They address the sparsity problem in short text
representations by employing low-dimensional continuous representations or embeddings
like Smooth Inverse Frequency (SIF) embeddings. Here, the encoder maps the input
short texts to a lower-dimensional continuous representation, and the decoder strives to
reconstruct the input from this representation. AEs are used to encode and reconstruct
these SIF embeddings, resulting in improved short text clustering quality (Hadifar et al.
2019).
4.3 Complex network
while preserving pairwise topology (Fan et al. 2021). Bayesian deep generative
frameworks are used to learn deep latent representations, improving link prediction in
HINs. Another method (Salha et al. 2019) inspired by Newtonian gravity extends the graph
autoencoder and VAE frameworks to address link prediction in directed graphs, effectively
reconstructing directed graphs from node embeddings. Lastly, the Multi-Scale Variational
Graph Autoencoder (MSVGAE) introduces a novel graph embedding framework that
leverages graph attribute information through self-supervised learning (Guo et al. 2022).
In conclusion, autoencoders are versatile tools for intricate network analysis,
contributing significantly to tasks such as network embedding, deep clustering, and link
prediction by capturing complex patterns, enhancing representations, and enabling precise
predictions.
4.4 Recommender system
and Neural Collaborative Autoencoder (NCAE) (He et al. 2017). HCCAE combines the
learned representations with other recommendation models, while NCAE utilizes a neural
network to generate recommendations directly from the learned representations. These
models leverage additional information such as content features, social relationships, or
visual data to enhance their recommendations. Each model possesses unique characteristics
and objectives, making them suitable for addressing various challenges like cold start
problems, sequential data, semantic information, or visual styles.
4.5 Anomaly detection
Because AEs can learn complex patterns in data and detect anomalies that are
not easily identifiable, they have been widely used in the field of anomaly detection (Pang et al.
2021). An anomaly detection model can be used to detect fraudulent transactions or to handle
other highly imbalanced supervised tasks (Chandola et al. 2009). AEs can be used in supervised
(Alsadhan 2023), unsupervised (Lopes et al. 2022), and semi-supervised (Akcay et al.
2018; Ruff et al. 2019) anomaly detection tasks.
In supervised anomaly detection, AEs are trained on both normal and anomalous
data. The AE is first trained on normal data to learn the underlying patterns and features
of normal data. Then, the AE is fine-tuned on the combined normal and anomalous
data to capture the difference between normal and anomalous data. During training, the
objective is to minimize the reconstruction error between the input and the output of the
AE. After training, the reconstruction error of the test data is compared to a threshold. If
the reconstruction error is above the threshold, the input data is classified as anomalous
(Pang et al. 2021). This approach combines the feature learning capabilities of AEs with
the discriminative power of supervised classifiers, enhancing the accuracy of anomaly
detection in real-world applications, including fraud detection (Alsadhan 2023; Debener
et al. 2023; Fanai and Abbasimehr 2023), network security (Ghorbani and Fakhrahmad
2022; Lopes et al. 2022), and fault detection (Ding et al. 2022; Ying et al. 2023) in
industrial processes.
In unsupervised tasks, the idea is to train AEs only on samples from one class
(the majority class), so that the network learns to reconstruct such inputs with low
reconstruction loss. If a sample from another class is then passed through the AE, it
yields a comparatively larger reconstruction loss. A threshold on the reconstruction loss
(the anomaly score) can therefore be chosen, and any sample whose loss exceeds it is considered
an anomaly (Sakurada and Yairi 2014). This inherent ability to capture complex data
representations without labeled anomalies makes AEs effective in detecting anomalies,
whether in cyber-security for identifying network intrusions (Lopes et al. 2022; An
et al. 2022; Lewandowski and Paffenroth 2022), in manufacturing for spotting defects
(Papananias et al. 2023; Sudo et al. 2021), or in finance for fraud detection (Du et al. 2022;
Jiang et al. 2023; Kennedy et al. 2023). The versatility of AEs and their capacity to adapt
to diverse data types contribute to their widespread use in unsupervised anomaly detection
scenarios, enhancing system security and reliability.
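The thresholding scheme described for the unsupervised setting can be sketched in a few lines. To keep the example short and deterministic, a linear autoencoder fitted in closed form via the singular value decomposition stands in for a trained deep AE; this substitution, the synthetic data, and the 99th-percentile threshold are all assumptions made for illustration rather than choices taken from the cited works.

```python
import numpy as np

def fit_linear_autoencoder(X_train, k):
    """Fit a k-dimensional linear autoencoder in closed form (equivalent to PCA)."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k].T                      # tied encoder/decoder weights, shape (d, k)

def reconstruction_error(X, mean, W):
    """Per-sample squared reconstruction error, used as the anomaly score."""
    Z = (X - mean) @ W                         # encode
    X_rec = Z @ W.T + mean                     # decode
    return np.sum((X - X_rec) ** 2, axis=1)

rng = np.random.default_rng(0)
# Synthetic "normal" class: points close to a 2-D subspace of a 5-D space.
basis = rng.standard_normal((2, 5))
X_normal = rng.standard_normal((500, 2)) @ basis + 0.05 * rng.standard_normal((500, 5))
# Synthetic anomalies: points scattered over the whole 5-D space.
X_anomalous = 3.0 * rng.standard_normal((20, 5))

mean, W = fit_linear_autoencoder(X_normal, k=2)
train_err = reconstruction_error(X_normal, mean, W)
threshold = np.percentile(train_err, 99)       # tolerate about 1% false alarms on normal data
is_anomaly = reconstruction_error(X_anomalous, mean, W) > threshold
print(float(threshold), float(is_anomaly.mean()))
```

The choice of threshold trades false alarms against missed anomalies; a percentile of the training error is one simple option when no labeled anomalies are available.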
AEs have been employed effectively in semi-supervised anomaly detection by
capitalizing on their capacity to learn rich data representations (Zhou et al. 2023). In this
context, a portion of the training data is labeled as normal, while the majority remains
unlabeled. The AE is trained to reconstruct the normal data accurately, and during this
process, it learns to capture the underlying structure and features of the normal class.
When presented with new, unlabeled data, the AE endeavors to reconstruct it (Ruff et al.
2019). Anomalies, which deviate significantly from the learned normal patterns, result
in high reconstruction errors. By setting a suitable threshold on the reconstruction error,
anomalies can be effectively detected. This semi-supervised approach minimizes the need
for extensive labeled anomaly data and has proven effective in various domains, including
fraud detection (Charitou et al. 2020; DeLise 2023; Dzakiyullah et al. 2021), network
security (Dong et al. 2022; Hara and Shiomoto 2020; Hoang and Kim 2022; Thai et al.
2022), and quality control (Cacciarelli et al. 2022; Sae-Ang et al. 2022), where labeled
anomalies are often scarce.
4.6 Speech processing
4.7 Other
4.7.1 Fault diagnosis
4.7.2 Intrusion detection
Autoencoders can play a significant role in automatic feature extraction for intrusion
detection systems. Kunang et al. (2018) propose a method in which an autoencoder is
employed to extract relevant features from raw network traffic data. These extracted
features are then used as input for a classifier, such as a Support Vector Machine
(SVM), to distinguish between normal and malicious traffic. Compared to traditional
rule-based or signature-based methods, autoencoders have the potential to enhance the
accuracy and efficiency of intrusion detection systems (Ieracitano et al. 2020).
4.7.3 Hyperspectral imaging
AEs find wide-ranging applications in hyperspectral image analysis due to their ability
to learn concise representations of high-dimensional data. Hyperspectral imaging is a
potent technique for capturing detailed spectral information about objects or scenes. It
involves multi-dimensional data where each pixel contains a spectrum of reflectance
or radiance values across numerous narrow, contiguous spectral bands (Jaiswal et al.
2023).
AEs are employed for various tasks in managing hyperspectral data, including
hyperspectral data compression (Minkin et al. 2021), hyperspectral unmixing (Książek
et al. 2022), blind hyperspectral unmixing (Palsson et al. 2022), and dimensionality
reduction (Zabalza et al. 2016). In data compression, AEs condense hyperspectral data
while retaining crucial information, facilitating subsequent analysis and processing.
Hyperspectral unmixing entails decomposing a hyperspectral image into its constituent
parts, referred to as endmembers. AEs play a pivotal role in reconstructing the spectral
profiles of these identified components (endmembers) and determining their proportional
mixing amounts (abundances). This is indispensable for enhancing the efficiency of
hyperspectral analysis and classification tasks (Su et al. 2019). Blind hyperspectral
unmixing involves deconstructing the recorded spectrum of a pixel into a mixture of
endmembers while simultaneously discerning the proportions or fractions of these
endmembers within the pixel. Training an AE on hyperspectral images results in a lower-
dimensional representation of the data, rendering it more manageable for subsequent
analysis (Petersson et al. 2016).
The development and availability of open-source libraries for various versions of AEs
have greatly facilitated research in this field. Three popular libraries that are widely
used for building and training autoencoder models are TensorFlow, PyTorch, and
Keras. Each of these libraries has its strengths and is preferred by different segments of
the machine learning and deep learning community. Table 5 presented in this section
provides a comprehensive overview of the source code for our proposed category of AE
variants. Researchers can access these code repositories to implement and test different
versions of AEs, and to compare their performance on various tasks. For instance, one
could use the available code to train a variational AE for image reconstruction or a graph
attention AE for node embedding. These libraries are not only useful for research but
also for practical applications, as they enable practitioners to easily deploy pre-trained
models on their own datasets.
Table 5 AE Models and their corresponding years of publication, programming languages, and code
repositories
Subsection Model Year Language Code Repository
Table 6 presents a comprehensive overview of various AE
models and their diverse applications in machine learning. Each model is associated with
specific applications, datasets, methodology, evaluation metrics, and performance results.
Notable applications include feature learning, dimensionality reduction, graph-based data
representation, generative modeling, anomaly detection, and sequential data analysis. The
evaluation metrics vary depending on the application but commonly include error rates,
accuracy, precision, recall, F1 score, Area Under the Curve (AUC), and more. These AEs
demonstrate their effectiveness in tasks ranging from image classification and sentiment
analysis to graph representation learning and acoustic novelty detection, showcasing
their versatility in addressing a wide array of machine learning challenges across various
domains.
Table 6 AE Models and their corresponding applications

| AE model | Application | Methodology | Dataset | Performance |
|---|---|---|---|---|
| SAE (Ng 2011) | Sparse and discriminative feature learning | Image classification | MNIST | Error rate = 1.35 |
| | | Fault diagnosis | CWRU | ACC = 100 |
| CAE (Rifai et al. 2011) | Feature extraction and dimensionality reduction | Feature extraction and classification | CIFAR | Error rate = 47.86 |
| | | | MNIST | Error rate = 1.14 |
| LAE (Jia et al. 2015) | Graph-based data representation learning | Manifold generalization | MNIST | Error rate = 0.98 |
| | | | CIFAR-10 | Error rate = 45.41 |
| OAE (Wang et al. 2019) | Discriminative and diverse feature representations | Data clustering | MNIST | ACC = 95.4, NMI = 90 |
| DAE (Vincent et al. 2010) | Robust feature extraction | Data classification | MNIST | Error rate = 1.21 |
| M-DAE (Chen et al. 2012) | Anomaly detection | Sentiment analysis | Amazon reviews | Transfer rate = 1.1 |
| L2,1-RAE (Li et al. 2018) | Outlier detection | Unsupervised feature learning | MNIST | ACC = 97.66 |
| | | | Reuters-21578 | ACC = 82.92 |
| VAE (An and Cho 2015) | Generative modeling | Anomaly detection | MNIST | AUC ROC = 91.7 |
| LSTMAE (Nguyen et al. 2021) | Capture representations from sequential data | Forecasting and anomaly detection | C-MAPSS | ACC = 98.36, F-score = 96.98 |
| GRUAE (Dehghan et al. 2014) | Sequential data reconstruction | Determining Parent-Offspring Resemblance | Family 101 | Precision = 81.5 |
| | | | KinFaceW-II | Precision = 74.5 |
| BiRNNAE (Marchi et al. 2015) | Capture contextual information from both of sequence directions | Acoustic novelty detection | PASCAL CHiME | Precision = 94.7, Recall = 92.0 |
| SSVAE (Xu et al. 2017) | Data representation | Text classification | IMDB | Error rate = 7.6 |
| | | | AGNews | Error rate = 7.68 |
| DVAE (Higgins et al. 2016) | Disentanglement representation learning in complex data | Unsupervised disentanglement representations | celebA | ACC = 83.9 |
| LSRAE (Chai et al. 2019) | Extract the potential features to improve classification | Image classification | MNIST | ACC = 98.33 |
| VGAE (Kipf and Welling 2016) | Graph-based generative modeling | Link prediction | Cora | ACC = 63.8, NMI = 45 |
| AGAE (Pan et al. 2018) | Graph-based anomaly detection | Link prediction | Cora | AUC = 92.4, AP = 92.6 |
| GAAE (Salehi and Davulcu 2019) | Graph representation learning | Node classification | Cora | ACC = 83.2 |
| GMAE (Hou et al. 2022) | Sequence modeling and text generation | Node classification | Cora | Micro-f = 84.2 |
| CMAE (Huang et al. 2022) | Data augmentation | Image classification, data augmentation | ImageNet-1k | ACC = 85.3 |
| SDMAE (Chen et al. 2022) | Generate high descriptive capability for MAE | Image classification | ImageNet-1k | ACC = 84.1 |
6 Future directions
Despite in-depth research on autoencoders and their improved algorithms in recent years,
the following issues still need to be addressed.
6.2 Hypergraph autoencoder
Autoencoders have proven effective in preserving the non-linear structure of data due to
their deep learning capabilities. However, they face a challenge in preserving higher-order
neighbors in complex datasets. While autoencoders can address the former concern, they
may not inherently handle the latter. To bridge this gap, integrating hypergraph-based
representations of data into the autoencoder framework emerges as a potential solution. By
transforming the data into a hypergraph and feeding it as input to the autoencoder, it may
be possible to preserve the critical high-order neighbor relationships. This approach holds
promise for enhancing the utility of autoencoders in scenarios where preserving intricate
data dependencies is crucial, potentially leading to improved performance across various
applications.
Constructing an autoencoder involves crucial decisions about hyperparameters, such as the
number of hidden layers and the number of nodes per layer, which significantly influence the
model's final performance. Although this selection is essential, identifying the most suitable
configuration is challenging. Some recent work has explored combining reinforcement learning
with autoencoder construction in order to search for these hyperparameters efficiently and
thereby improve model performance. Integrating reinforcement learning into hyperparameter
tuning remains an open research direction that holds promise for automating and improving the
autoencoder design process.
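The following sketch illustrates the general idea with the simplest reinforcement-learning
setting, an epsilon-greedy multi-armed bandit that chooses the number of hidden units. It is a
toy example under stated assumptions, not the method of any work cited here: the autoencoder is
linear (so truncated SVD can stand in for gradient training, since the optimal linear
autoencoder under squared error spans the top principal directions), and the reward, the
`size_cost` penalty, and all function names are hypothetical.

```python
import numpy as np


def reconstruction_error(X_train, X_val, n_hidden):
    """Validation MSE of a linear autoencoder with n_hidden units.

    Truncated SVD replaces gradient training (an assumption that holds
    only for linear autoencoders trained with squared error).
    """
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = vt[:n_hidden].T                 # tied encoder/decoder weights
    Z = (X_val - mean) @ W              # encode
    X_rec = Z @ W.T + mean              # decode
    return float(np.mean((X_val - X_rec) ** 2))


def epsilon_greedy_search(X_train, X_val, candidates,
                          n_steps=60, eps=0.2, size_cost=0.01, seed=0):
    """Pick a hidden size with an epsilon-greedy bandit.

    Reward = -(validation error on a random half of X_val) - size_cost * size,
    so larger codes must earn their extra capacity.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(len(candidates))               # running mean reward per arm
    counts = np.zeros(len(candidates), dtype=int)
    for t in range(n_steps):
        if t < len(candidates):
            a = t                               # try every candidate once
        elif rng.random() < eps:
            a = int(rng.integers(len(candidates)))  # explore
        else:
            a = int(np.argmax(q))               # exploit
        idx = rng.choice(len(X_val), size=len(X_val) // 2, replace=False)
        err = reconstruction_error(X_train, X_val[idx], candidates[a])
        reward = -err - size_cost * candidates[a]
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]     # incremental mean update
    return candidates[int(np.argmax(q))], q
```

A full reinforcement-learning controller would usually search over several hyperparameters
jointly (depth, width, regularization strength) and train a non-linear autoencoder for each
trial; the bandit above only shows how a reward signal can drive the selection.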
7 Conclusion
Autoencoders have become a focal point in unsupervised learning because of their ability to
uncover data features and to serve as a dimensionality reduction tool. This paper has examined
autoencoders in detail, covering their fundamental principles and a taxonomy of models based on
their structures and principles. We have also reviewed their use in various areas, from computer
vision to natural language processing, highlighting their adaptability. In the course of this
study, we have identified both the advantages and the drawbacks of autoencoders. By classifying
and summarizing these models according to their characteristics, we have outlined possible
directions for future enhancements and innovations in the field.
In summary, autoencoders play an important role in machine learning, and their
significance continues to grow. Their ability to extract informative representations
from data can benefit a wide range of application areas. We expect continued progress
in the field of autoencoders, leading to more powerful models, and we expect
autoencoders to keep fostering innovation in machine learning.
Author contributions KB and FD have made a substantial contribution to the concept of the article and
drafted the article, ES has made an analysis of the article data, and YL and YX have revised the article.
Data availability The data that support the findings of this study are available from the corresponding author
upon reasonable request.
Declarations
Conflict of interest The authors declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons licence, and indicate if changes were made. The images or other third party material in this article
are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Abdi H, Williams LJ (2010) Principal component analysis. Wiley interdisciplinary reviews:
computational statistics 2(4):433–459
Aggarwal CC, Zhai C (2012) A survey of text clustering algorithms. Mining Text Data, 77–128
Akcay S, Atapour-Abarghouei A, Breckon TP (2018) Ganomaly: Semi-supervised anomaly detection via
adversarial training. In: Computer Vision-ACCV 2018: 14th Asian conference on computer vision,
Perth, Australia, December 2-6, 2018, Revised Selected Papers, Part III 14, Springer, pp 622–637
Alex SB, Mary L (2023) Variational autoencoder for prosody-based speaker recognition. ETRI J
45(4):678–689
Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K (2018) Deep learning approach combining sparse
autoencoder with SVM for network intrusion detection. IEEE Access 6:52843–52856
Alsadhan N (2023) A multi-module machine learning approach to detect tax fraud. Comput Syst Sci Eng
46(1):241–253
Alzu’bi A, Albalas F, Al-Hadhrami T, Younis LB, Bashayreh A (2021) Masked face recognition using
deep learning: a review. Electronics 10(21):2666
An J, Cho S (2015) Variational autoencoder based anomaly detection using reconstruction probability.
Special Lecture IE 2(1):1–18
An P, Wang Z, Zhang C (2022) Ensemble unsupervised autoencoders and gaussian mixture model for
cyberattack detection. Inform Process Manag 59(2):102844
Aumentado-Armstrong T, Tsogkas S, Jepson A, Dickinson S (2019) Geometric disentanglement for
generative latent shape models. In: Proceedings of the IEEE/CVF international conference on
computer vision, pp 8181–8190
Azarang A, Kehtarnavaz N (2020) A review of multi-objective deep learning speech denoising methods.
Speech Commun 122:1–10
Balakrishnama S, Ganapathiraju A (1998) Linear discriminant analysis-a brief tutorial. Inst Signal
Inform Process 18(1998):1–8
Bank D, Koenigstein N, Giryes R (2020) Autoencoders. arXiv preprint arXiv:2003.05991
Bank D, Koenigstein N, Giryes R (2023) Autoencoders. Machine Learning for Data Science Handbook:
Data Mining and Knowledge Discovery Handbook 353–374
Berahmand K, Li Y, Xu Y (2023) DAC-HPP: deep attributed clustering with high-order proximity
preserve. Neural Comput Appl pp 1–19
Bertalmio M, Sapiro G, Caselles V, Ballester C (2000) Image inpainting. In: Proceedings of the 27th
annual conference on computer graphics and interactive techniques, pp 417–424
Bhangale KB, Kothandaraman M (2022) Survey of deep learning paradigms for speech processing.
Wireless Pers Commun 125(2):1913–1949
Bursic S, Cuculo V, D’Amelio A (2019) Anomaly detection from log files using unsupervised deep
learning. In: International symposium on formal methods, Springer, pp 200–207
Cacciarelli D, Kulahci M, Tyssedal J (2022) Online active learning for soft sensor development using
semi-supervised autoencoders. arXiv preprint arXiv:2212.13067
Cao S, Lu W, Xu Q (2016) Deep neural networks for learning graph representations. In: Proceedings of
the AAAI conference on artificial intelligence, vol. 30
Chai Z, Song W, Wang H, Liu F (2019) A semi-supervised auto-encoder using label and sparse
regularizations for classification. Appl Soft Comput 77:205–217
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1–58
Charitou C, Garcez Ad, Dragicevic S (2020) Semi-supervised gans for fraud detection. In: 2020
international joint conference on neural networks (IJCNN), IEEE, pp 1–8
Charte D, Charte F, García S, del Jesus MJ, Herrera F (2018) A practical tutorial on autoencoders for
nonlinear feature fusion: taxonomy, models, software and guidelines. Inform Fus 44:78–96
Che L, Yang X, Wang L (2020) Text feature extraction based on stacked variational autoencoder.
Microprocess Microsyst 76:103063
Chen S, Guo W (2023) Auto-encoders in deep learning-a review with new perspectives. Mathematics
11(8):1777
Chen Y, Liu Y, Jiang D, Zhang X, Dai W, Xiong H, Tian Q (2022) Sdae: Self-distillated masked
autoencoder. In: European conference on computer vision, Springer, pp 108–124
Chen M, Xu Z, Weinberger K, Sha F (2012) Marginalized denoising autoencoders for domain adaptation.
arXiv preprint arXiv:1206.4683
Chowdhary K, Chowdhary K (2020) Natural language processing. Fundamentals of artificial
intelligence, pp 603–649
Cui P, Wang X, Pei J, Zhu W (2018) A survey on network embedding. IEEE Trans Knowl Data Eng
31(5):833–852
Daneshfar F, Soleymanbaigi S, Nafisi A, Yamini P (2023) Elastic deep autoencoder for text embedding
clustering by an improved graph regularization. Expert Syst Appl 121780
Debener J, Heinke V, Kriebel J (2023) Detecting insurance fraud using supervised and unsupervised
machine learning. J Risk Insurance
Dehghan A, Ortiz EG, Villegas R, Shah M (2014) Who do i look like? determining parent-offspring
resemblance via gated autoencoders. In: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 1757–1764
DeLise T (2023) Deep semi-supervised anomaly detection for finding fraud in the futures market. arXiv
preprint arXiv:2309.00088
Ding L, Liu G-W, Zhao B-C, Zhou Y-P, Li S, Zhang Z-D, Guo Y-T, Li A-Q, Lu Y, Yao H-W et al
(2019) Artificial intelligence system of faster region-based convolutional neural network
surpassing senior radiologists in evaluation of metastatic lymph nodes of rectal cancer. Chin Med
J 132(04):379–387
Ding S, Keal CA, Zhao L, Yu D (2022) Dimensionality reduction and classification for hyperspectral
image based on robust supervised Isomap. J Ind Prod Eng 39(1):19–29
Ding Y, Zhuang J, Ding P, Jia M (2022) Self-supervised pretraining via contrast learning for intelligent
incipient fault detection of bearings. Reliab Eng Syst Saf 218:108126
Dong Y, Chen K, Peng Y, Ma Z (2022) Comparative study on supervised versus semi-supervised machine
learning for anomaly detection of in-vehicle can network. In: 2022 IEEE 25th international conference
on intelligent transportation systems (ITSC), IEEE, pp 2914–2919
Du X, Yu J, Chu Z, Jin L, Chen J (2022) Graph autoencoder-based unsupervised outlier detection. Inf Sci
608:532–550
Dutt A, Gader P (2023) Wavelet multiresolution analysis based speech emotion recognition system using 1d
CNN LSTM networks. In: IEEE/ACM Transactions on audio, speech, and language processing
Dzakiyullah NR, Pramuntadi A, Fauziyyah AK (2021) Semi-supervised classification on credit card fraud
detection using autoencoders. J Appl Data Sci 2(1):01–07
Fan H, Zhang F, Wei Y, Li Z, Zou C, Gao Y, Dai Q (2021) Heterogeneous hypergraph variational
autoencoder for link prediction. IEEE Trans Pattern Anal Mach Intell 44(8):4125–4138
Fanai H, Abbasimehr H (2023) A novel combined approach based on deep autoencoder and deep classifiers
for credit card fraud detection. Expert Syst Appl 217:119562
Fan S, Wang X, Shi C, Lu E, Lin K, Wang B (2020) One2multi graph autoencoder for multi-view graph
clustering. In: Proceedings of the web conference 2020, pp 3070–3076
Farahnakian F, Heikkonen J (2018) A deep auto-encoder based approach for intrusion detection system.
In: 2018 20th international conference on advanced communication technology (ICACT), IEEE, pp
178–183
Foti S, Koo B, Stoyanov D, Clarkson MJ (2022) 3d shape variational autoencoder latent disentanglement
via mini-batch feature swapping for bodies and faces. In: Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition, pp 18730–18739
Gaikwad SK, Gawali BW, Yannawar P (2010) A review on speech recognition technique. Int J Comput Appl
10(3):16–24
Gao Z, Cecati C, Ding SX (2015) A survey of fault diagnosis and fault-tolerant techniques-part I: fault
diagnosis with model-based and signal-based approaches. IEEE Trans Ind Electron 62(6):3757–3767
Gao Y, Wang L, Liu J, Dang J, Okada S (2023) Adversarial domain generalized transformer for cross-corpus
speech emotion recognition. IEEE Trans Affect Comput. https://doi.org/10.1109/TAFFC.2023.3290795
García-Mendoza J-L, Villaseñor-Pineda L, Orihuela-Espina F, Bustio-Martínez L (2022) An autoencoder-
based representation for noise reduction in distant supervision of relation extraction. J Intell Fuzzy
Syst 42(5):4523–4529
Garson GD (2022) Factor analysis and dimension reduction in R: a social Scientist’s Toolkit. Taylor &
Francis, New York
Geng J, Fan J, Wang H, Ma X, Li B, Chen F (2015) High-resolution SAR image classification via deep
convolutional autoencoders. IEEE Geosci Remote Sens Lett 12(11):2351–2355
Ghorbani A, Fakhrahmad SM (2022) A deep learning approach to network intrusion detection using
a proposed supervised sparse auto-encoder and SVM. Iran J Sci Technol Trans Electr Eng
46(3):829–846
Girin L, Leglaive S, Bie X, Diard J, Hueber T, Alameda-Pineda X (2020) Dynamical variational
autoencoders: a comprehensive review. arXiv preprint arXiv:2008.12595
Gorokhov O, Petrovskiy M, Mashechkin I, Kazachuk M (2023) Fuzzy CNN autoencoder for unsupervised
anomaly detection in log data. Mathematics 11(18):3995
Guo X, Liu X, Zhu E, Yin J (2017) Deep clustering with convolutional autoencoders. In: Neural information
processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14-18,
2017, Proceedings, Part II 24, Springer, pp 373–382
Guo Z, Wang F, Yao K, Liang J, Wang Z (2022) Multi-scale variational graph autoencoder for link
prediction. In: Proceedings of the Fifteenth ACM international conference on web search and data
mining, pp 334–342
Guo Y, Zhou D, Ruan X, Cao J (2023) Variational gated autoencoder-based feature extraction model for
inferring disease-miRNA associations based on multiview features. Neural Netw
Hadifar A, Sterckx L, Demeester T, Develder C (2019) A self-training approach for short text clustering. In:
Proceedings of the 4th workshop on representation learning for NLP (RepL4NLP-2019), pp 194–199
Han C, Wang J (2021) Face image inpainting with evolutionary generators. IEEE Signal Process Lett
28:190–193
Hara K, Shiomoto K (2022) Intrusion detection system using semi-supervised learning with adversarial
auto-encoder. In: NOMS 2020-2020 IEEE/IFIP network operations and management symposium,
IEEE, pp 1–8
Hasan BMS, Abdulazeez AM (2021) A review of principal component analysis algorithm for dimensionality
reduction. J Soft Comput Data Min 2(1):20–30
He X, Liao L, Zhang H, Nie L, Hu X, Chua T-S (2017) Neural collaborative filtering. In: Proceedings of the
26th international conference on world wide web, pp 173–182
Hickok G, Poeppel D (2007) The cortical organization of speech processing. Nat Rev Neurosci 8(5):393–402
Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2016) beta-vae:
Learning basic visual concepts with a constrained variational framework. In: International conference
on learning representations
Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput
18(7):1527–1554
Hoang D-T, Kang H-J (2019) A survey on deep learning based bearing fault diagnosis. Neurocomputing
335:327–335
Hoang T-N, Kim D (2022) Detecting in-vehicle intrusion via semi-supervised learning-based convolutional
adversarial autoencoders. Veh Commun 38:100520
Hosseini S, Varzaneh ZA (2022) Deep text clustering using stacked autoencoder. Multimedia tools and
applications 81(8):10861–10881
Hosseini M, Celotti L, Plourde E (2021) Speaker-independent brain enhanced speech denoising. In: ICASSP
2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP),
IEEE, pp 1310–1314
Hou L, Luo X-Y, Wang Z-Y, Liang J (2020) Representation learning via a semi-supervised stacked distance
autoencoder for image classification. Front Inform Technol Electron Eng 21(7):1005–1018
Hou Z, Liu X, Cen Y, Dong Y, Yang H, Wang C, Tang J (2022) Graphmae: Self-supervised masked graph
autoencoders. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and
data mining, pp 594–604
Huang G, Jafari AH (2023) Enhanced balancing GAN: minority-class image generation. Neural Comput
Appl 35(7):5145–5154
Huang Z, Jin X, Lu C, Hou Q, Cheng M-M, Fu D, Shen X, Feng J (2022) Contrastive masked autoencoders
are stronger vision learners. arXiv preprint arXiv:2207.13532
Ieracitano C, Adeel A, Morabito FC, Hussain A (2020) A novel statistical analysis and autoencoder driven
intelligent intrusion detection approach. Neurocomputing 387:51–62
Jain R, Kasturi R, Schunck BG et al (1995) Machine vision, vol 5. McGraw-hill New York, New York
Jaiswal G, Rani R, Mangotra H, Sharma A (2023) Integration of hyperspectral imaging and autoencoders:
benefits, applications, hyperparameter tunning and challenges. Comput Sci Rev 50:100584
Jha S, Shah S, Ghamsani R, Sanghavi P, Shekokar NM (2023) Analysis of RNNs and different ML and
DL classifiers on speech-based emotion recognition system using linear and nonlinear features. CRC
Press, Boca Raton, pp 109–126
Jia K, Sun L, Gao S, Song Z, Shi BE (2015) Laplacian auto-encoders: an explicit learning of nonlinear data
manifold. Neurocomputing 160:250–260
Jiang S, Dong R, Wang J, Xia M (2023) Credit card fraud detection based on unsupervised attentional
anomaly detection network. Systems 11(6):305
Kennedy RK, Salekshahrezaee Z, Villanustre F, Khoshgoftaar TM (2023) Iterative cleaning and learning of
big highly-imbalanced fraud data using unsupervised learning. J Big Data 10(1):106
Kim S, Jang H, Hong S, Hong YS, Bae WC, Kim S, Hwang D (2021) Fat-saturated image generation from
multi-contrast MRIs using generative adversarial networks with Bloch equation-based autoencoder
regularization. Med Image Anal 73:102198
Kipf TN, Welling M (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308
Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification
algorithms: a survey. Information 10(4):150
Książek K, Głomb P, Romaszewski M, Cholewa M, Grabowski B, Búza K (2022) Improving autoencoder
training performance for hyperspectral unmixing with network reinitialisation. In: International
conference on image analysis and processing, Springer, pp 391–403
Kumar S, Rath SP, Pandey A (2022) Improved far-field speech recognition using joint variational
autoencoder. arXiv preprint arXiv:2204.11286
Kunang YN, Nurmaini S, Stiawan D, Zarkasi A, et al (2018) Automatic features extraction using
autoencoder in intrusion detection system. In: 2018 international conference on electrical engineering
and computer science (ICECOS), IEEE, pp 219–224
Le T-D, Noumeir R, Rambaud J, Sans G, Jouvet P (2023) Adaptation of autoencoder for sparsity reduction
from clinical notes representation learning. IEEE J Trans Eng Health Med
Lee J-w, Lee J (2017) Idae: Imputation-boosted denoising autoencoder for collaborative filtering.
In: Proceedings of the 2017 ACM on conference on information and knowledge management,
pp 2143–2146
Lee D, Seung HS (2000) Algorithms for non-negative matrix factorization. Adv Neural Inform Process Syst
13
Lei Y, Yang B, Jiang X, Jia F, Li N, Nandi AK (2020) Applications of machine learning to machine fault
diagnosis: a review and roadmap. Mech Syst Signal Process 138:106587
Lewandowski B, Paffenroth R (2022) Autoencoder feature residuals for network intrusion detection:
Unsupervised pre-training for improved performance. In: 2022 21st IEEE international conference on
machine learning and applications (ICMLA), IEEE, pp 1334–1341
Li Y-J, Wang S-S, Tsao Y, Su B (2021) Mimo speech compression and enhancement based on convolutional
denoising autoencoder. In: 2021 Asia-pacific signal and information processing association annual
summit and conference (APSIPA ASC), IEEE, pp 1245–1250
Li F, Zurada JM, Wu W (2018) Sparse representation learning of data by autoencoders with L1/2
regularization. Neural Netw World 28(2):133–147
Li H, Zhang L, Huang B, Zhou X (2020) Cost-sensitive dual-bidirectional linear discriminant analysis. Inf
Sci 510:283–303
Li Z, Huang H, Zhang Z, Shi G (2022) Manifold-based multi-deep belief network for feature extraction of
hyperspectral image. Remote Sens 14(6):1484
Li X, Li C, Rahaman MM, Sun H, Li X, Wu J, Yao Y, Grzegorzek M (2022) A comprehensive review
of computer-aided whole-slide image analysis: from datasets to feature extraction, segmentation,
classification and detection approaches. Artif Intell Rev 55(6):4809–4878.
https://doi.org/10.1007/s10462-021-10121-0
Liang D, Krishnan RG, Hoffman MD, Jebara T (2018) Variational autoencoders for collaborative filtering.
In: Proceedings of the 2018 World Wide Web Conference, pp 689–698
Liao L, Cheng G, Ruan H, Chen K, Lu J (2022) Multichannel variational autoencoder-based speech
separation in designated speaker order. Symmetry 14(12):2514
Lin C-C, Hung Y, Feris R, He L (2020) Video instance segmentation tracking with a modified vae
architecture. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp 13147–13157
Li P, Pei Y, Li J (2023) A comprehensive survey on design and application of autoencoder in deep learning.
Appl Soft Comput 110176
Liu Y, Ponce C, Brunton SL, Kutz JN (2023) Multiresolution convolutional autoencoders. J Comput Phys
474:111801
Lopes IO, Zou D, Abdulqadder IH, Ruambo FA, Yuan B, Jin H (2022) Effective network intrusion detection
via representation learning: a denoising autoencoder approach. Comput Commun 194:55–65
Luo W, Li J, Yang J, Xu W, Zhang J (2017) Convolutional sparse autoencoders for image classification.
IEEE Trans Neural Netw Learn Syst 29(7):3289–3294
Luo W, Liu W, Gao S (2017) Remembering history with convolutional lstm for anomaly detection. In: 2017
IEEE international conference on multimedia and expo (ICME), IEEE pp 439–444
Ma M, Sun C, Chen X (2018) Deep coupling autoencoder for fault diagnosis with multimodal sensory data.
IEEE Trans Ind Inf 14(3):1137–1145
Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B (2015) Adversarial autoencoders. arXiv preprint
arXiv:1511.05644
Ma S, Li X, Tang J, Guo F (2022) Eaa-net: Rethinking the autoencoder architecture with intra-class features
for medical image segmentation. arXiv preprint arXiv:2208.09197
Marchi E, Vesperini F, Eyben F, Squartini S, Schuller B (2015) A novel approach for automatic acoustic
novelty detection using a denoising autoencoder with bidirectional lstm neural networks. In: 2015
IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 1996–2000
Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. ACM Comput
Surv 49(4):1–33
McConville R, Santos-Rodriguez R, Piechocki RJ, Craddock I (2021) N2d:(not too) deep clustering via
clustering the local manifold of an autoencoded embedding. In: 2020 25th international conference on
pattern recognition (ICPR), IEEE, pp 5145–5152
McKeown K (1992) Text generation. Cambridge University Press, Cambridge
Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain
Shams Eng J 5(4):1093–1113
Medsker LR, Jain L (2001) Recurrent neural networks. Design Appl 5(64–67):2
Meyer BH, Pozo ATR, Zola WMN (2022) Global and local structure preserving GPU t-SNE methods for
large-scale applications. Expert Syst Appl 201:116918
Miao J, Yang T, Sun L, Fei X, Niu L, Shi Y (2022) Graph regularized locally linear embedding for
unsupervised feature selection. Pattern Recogn 122:108299
Minkin A (2021) The application of autoencoders for hyperspectral data compression. In: 2021 international
conference on information technology and nanotechnology (ITNT), IEEE, pp 1–4
Miuccio L, Panno D, Riolo S (2022) A wasserstein GAN autoencoder for SCMA networks. IEEE Wireless
Commun Lett 11(6):1298–1302
Molaei S, Ghorbani N, Dashtiahangar F, Peivandi M, Pourasad Y, Esmaeili M (2022) Fdcnet: presentation
of the fuzzy CNN and fractal feature extraction for detection and classification of tumors. Comput
Intell Neurosci 2022
Myronenko A (2019) 3d mri brain tumor segmentation using autoencoder regularization. In: Brainlesion:
Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop,
BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018,
Revised Selected Papers, Part II 4, Springer, pp 311–320
Ng A et al (2011) Sparse autoencoder. CS294A Lecture Notes 72(2011):1–19
Nguyen HD, Tran KP, Thomassey S, Hamad M (2021) Forecasting and anomaly detection approaches using
LSTM and LSTM autoencoder techniques with the applications in supply chain management. Int J Inf
Manage 57:102282
Ohgushi T, Horiguchi K, Yamanaka M (2020) Road obstacle detection method based on an autoencoder
with semantic segmentation. In: Proceedings of the Asian conference on computer vision
Palaz D, Collobert R (2015) Analysis of CNN-based speech recognition system using raw speech as input.
Report, Idiap
Palsson B, Sveinsson JR, Ulfarsson MO (2022) Blind hyperspectral unmixing using autoencoders: a critical
comparison. IEEE J Sel Topics Appl Earth Observ Remote Sens 15:1340–1372
Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput
Surv 54(2):1–38
Pan S, Hu R, Long G, Jiang J, Yao L, Zhang C (2018) Adversarially regularized graph autoencoder for
graph embedding. arXiv preprint arXiv:1802.04407
Papananias M, McLeay TE, Mahfouf M, Kadirkamanathan V (2023) A probabilistic framework for product
health monitoring in multistage manufacturing using unsupervised artificial neural networks and
gaussian processes. Proc Inst Mech Eng Part B: J Eng Manufact 237(9):1295–1310
Paul D, Chakdar D, Saha S, Mathew J (2023) Online research topic modeling and recommendation utilizing
multiview autoencoder-based approach. IEEE Trans Comput Soc Syst
Pereira RC, Santos MS, Rodrigues PP, Abreu PH (2020) Reviewing autoencoders for missing data
imputation: technical trends, applications and outcomes. J Artif Intell Res 69:1255–1285
Petersson H, Gustafsson D, Bergstrom D (2016) Hyperspectral image analysis using deep learning-a review.
In: 2016 sixth international conference on image processing theory, tools and applications (IPTA),
IEEE, pp 1–6
Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S (2021) A survey of
autoencoder algorithms to pave the diagnosis of rare diseases. Int J Mol Sci 22(19):10891
Preechakul K, Chatthee N, Wizadwongsa S, Suwajanakorn S (2022) Diffusion autoencoders: Toward a
meaningful and decodable representation. In: Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition, pp 10619–10629
Qian J, Song Z, Yao Y, Zhu Z, Zhang X (2022) A review on autoencoder based representation learning for
fault detection and diagnosis in industrial processes. Chemometrics Intell Lab Syst, 104711
Ray P, Reddy SS, Banerjee T (2021) Various dimension reduction techniques for high dimensional data
analysis: a review. Artif Intell Rev 54(5):3473–3515. https://doi.org/10.1007/s10462-020-09928-0
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection.
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: Explicit invariance
during feature extraction. In: Proceedings of the 28th international conference on international
conference on machine learning, pp 833–840
Rituerto-González E, Peláez-Moreno C (2021) End-to-end recurrent denoising autoencoder embeddings for
speaker identification. Neural Comput Appl 33(21):14429–14439
Ruff L, Vandermeulen RA, Görnitz N, Binder A, Müller E, Müller K-R, Kloft M (2019) Deep semi-
supervised anomaly detection. arXiv preprint arXiv:1906.02694
Rumelhart DE, Hinton GE, Williams RJ, et al (1985) Learning internal representations by error propagation.
Institute for Cognitive Science, University of California, San Diego La
Rusnac A-L, Grigore O (2022) CNN architectures and feature extraction methods for EEG imaginary
speech recognition. Sensors 22(13):4679
Sae-Ang B-I, Kumwilaisak W, Kaewtrakulpong P (2022) Semi-supervised learning for defect segmentation
with autoencoder auxiliary module. Sensors 22(8):2915
Sagha H, Cummins N, Schuller B (2017) Stacked denoising autoencoders for sentiment analysis: a review.
Wiley Interdiscip Rev Data Min Knowl Discov 7(5):1212
Saha S, Minku LL, Yao X, Sendhoff B, Menzel S (2022) Split-ae: An autoencoder-based disentanglement
framework for 3d shape-to-shape feature transfer. In: 2022 international joint conference on neural
networks (IJCNN), IEEE, pp 1–9
Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with nonlinear dimensionality
reduction. In: Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data
analysis, pp 4–11
Salehi A, Davulcu H (2019) Graph attention auto-encoders. arXiv preprint arXiv:1905.10715
Salha G, Limnios S, Hennequin R, Tran V-A, Vazirgiannis M (2019) Gravity-inspired graph autoencoders
for directed link prediction. In: Proceedings of the 28th ACM international conference on information
and knowledge management, pp 589–598
Sayed HM, ElDeeb HE, Taie SA (2023) Bimodal variational autoencoder for audiovisual speech
recognition. Mach Learn 112(4):1201–1226
Seki S, Kameoka H, Tanaka K, Kaneko T (2023) Jsv-vc: Jointly trained speaker verification and voice
conversion models. In: ICASSP 2023-2023 IEEE international conference on acoustics, speech and
signal processing (ICASSP), IEEE, pp 1–5
Semeniuta S, Severyn A, Barth E (2017) A hybrid convolutional variational autoencoder for text generation.
arXiv preprint arXiv:1702.02390
Seyfioğlu MS, Özbayoğlu AM, Gürbüz SZ (2018) Deep convolutional autoencoder for radar-based
classification of similar aided and unaided human activities. IEEE Trans Aerosp Electron Syst
54(4):1709–1723
Shankar V, Parsana S (2022) An overview and empirical comparison of natural language processing (NLP)
models and an introduction to and empirical application of autoencoder models in marketing. J Acad
Mark Sci 50(6):1324–1350
Shi D, Zhao C, Wang Y, Yang H, Wang G, Jiang H, Xue C, Yang S, Zhang Y (2022) Multi actor hierarchical
attention critic with RNN-based feature extraction. Neurocomputing 471:79–93
Shixin P, Kai C, Tian T, Jingying C (2022) An autoencoder-based feature level fusion for speech emotion
recognition. Digital Commun Netw
Shrestha N (2021) Factor analysis as a tool for survey analysis. Am J Appl Math Stat 9(1):4–11
Singh A, Ogunfunmi T (2022) An overview of variational autoencoders for source separation, finance, and
bio-signal applications. Entropy 24(1):55
Smatana M, Butka P (2019) Topicae: a topic modeling autoencoder. Acta Polytechnica Hungarica
16(4):67–86
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2022) A survey on feature
selection methods for mixed data. Artif Intell Rev 55(4):2821–2846.
https://doi.org/10.1007/s10462-021-10072-6
Song Y, Hyun S, Cheong Y-G (2021) Analysis of autoencoders for network intrusion detection. Sensors
21(13):4294
Song C, Liu F, Huang Y, Wang L, Tan T (2013) Auto-encoder based data clustering. In: Progress in Pattern
Recognition, Image Analysis, Computer Vision, and Applications: 18th Iberoamerican Congress,
CIARP 2013, Havana, Cuba, November 20-23, 2013, Proceedings, Part I 18, Springer, pp 117–124
Srikotr T (2022) The improved speech spectral envelope compression based on VQ-VAE with adversarial
technique. Thesis
Strub F, Mary J, Gaudel R (2016) Hybrid collaborative filtering with autoencoders. arXiv preprint
arXiv:1603.00806
Strub F, Mary J, Philippe P (2015) Collaborative filtering with stacked denoising autoencoders and sparse
inputs. In: NIPS workshop on machine learning for ecommerce
Su Y, Li J, Plaza A, Marinoni A, Gamba P, Chakravortty S (2019) DAEN: deep autoencoder networks for
hyperspectral unmixing. IEEE Trans Geosci Remote Sens 57(7):4309–4321
Sudo T, Kanishima Y, Yanagihashi H (2021) A study of anomalous sound detection using autoencoder for
quality determination and condition diagnosis. IEICE Tech. Rep. 121(284):20–25
Talpur N, Abdulkadir SJ, Alhussian H, Hasan MH, Aziz N, Bamhdi A (2023) Deep neuro-fuzzy system
application trends, challenges, and future perspectives: a systematic survey. Artif Intell Rev
56(2):865–913. https://doi.org/10.1007/s10462-022-10188-3
Tanveer M, Rastogi A, Paliwal V, Ganaie M, Malik A, Del Ser J, Lin C-T (2023) Ensemble deep learning in
speech signal tasks: a review. Neurocomputing 126436
Thai HH, Hieu ND, Van Tho N, Do Hoang H, Duy PT, Pham V-H (2022) Adversarial autoencoder and generative
adversarial networks for semi-supervised learning intrusion detection system. In: 2022 RIVF international
conference on computing and communication technologies (RIVF), IEEE, pp 584–589
Tian Y, Xu Y, Zhu Q-X, He Y-L (2022) Novel stacked input-enhanced supervised autoencoder integrated
with gated recurrent unit for soft sensing. IEEE Trans Instrum Meas 71:1–9
Tian H, Zhang L, Li S, Yao M, Pan G (2023) Pyramid-VAE-GAN: transferring hierarchical latent
variables for image inpainting. Comput Visual Med pp 1–15
Todd JT (2004) The visual perception of 3d shape. Trends Cogn Sci 8(3):115–121
Tripathi M (2021) Facial image denoising using autoencoder and UNET. Herit Sustain Dev 3(2):89–96
Vahdat A, Kautz J (2020) Nvae: a deep hierarchical variational autoencoder. Adv Neural Inf Process
Syst 33:19667–19679
Van den Oord A, Dieleman S, Schrauwen B (2013) Deep content-based music recommendation. Adv
Neural Inform Process Syst 26
Van Der Maaten L, Postma EO, van den Herik HJ et al (2009) Dimensionality reduction: a comparative
review. J Mach Learn Res 10(66–71):13
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A, Bottou L (2010) Stacked denoising
autoencoders: Learning useful representations in a deep network with a local denoising criterion.
J Mach Learn Res 11(12)
Wang W, Yang D, Chen F, Pang Y, Huang S, Ge Y (2019) Clustering with orthogonal autoencoder.
IEEE Access 7:62421–62432
Wang G, Karnan L, Hassan FM (2022) Face feature point detection based on nonlinear high-dimensional
space. Int J Syst Assurance Eng Manag 13(Suppl 1):312–321
Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM
SIGKDD international conference on knowledge discovery and data mining, pp 1225–1234
Wang D, Li T, Deng P, Zhang F, Huang W, Zhang P, Liu J (2023) A generalized deep learning clustering
algorithm based on non-negative matrix factorization. ACM Trans Knowledge Discovery Data
Wang C, Pan S, Long G, Zhu X, Jiang J (2017) Mgae: Marginalized graph autoencoder for graph
clustering. In: Proceedings of the 2017 ACM on conference on information and knowledge
management, pp 889–898
Wang H, Wang N, Yeung D-Y (2015) Collaborative deep learning for recommender systems. In:
Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data
mining, pp 1235–1244
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.