VLSI Architecture Design for Compact Shortcut Denoising Autoencoder Neural Network of ECG Signal

The document presents a Compact Shortcut Denoising Autoencoder (CS-DAE) neural network designed for effective noise reduction in Electrocardiogram (ECG) signals, which is crucial for cardiovascular disease evaluation. The CS-DAE architecture improves noise reduction while minimizing memory and computational requirements, achieving a significant improvement in Signal-to-Noise Ratio (SNR) and a low hardware cost of 1.65W. The proposed VLSI architecture is tailored for efficient implementation on FPGA platforms, enhancing the performance of ECG signal processing.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 72, NO. 4, APRIL 2025, p. 1621

VLSI Architecture Design for Compact Shortcut Denoising Autoencoder Neural Network of ECG Signal
Shin-Chi Lai, Member, IEEE, Szu-Ting Wang, Member, IEEE,
S. M. Salahuddin Morsalin, Graduate Student Member, IEEE, Jia-He Lin, Shih-Chang Hsia, Member, IEEE,
Chuan-Yu Chang, Senior Member, IEEE, and Ming-Hwa Sheu, Member, IEEE

Abstract: The electrocardiogram (ECG) test detects and records the cardiac-related electrical activity of the heart. The use of ECG signals for cardiovascular disease nursing, as a crucial component of preoperative evaluation, is increasing. Because ECG recordings are contaminated by numerous noise sources, they must be denoised before they can be displayed as a clear waveform. We introduce the Compact Shortcut Denoising Auto-encoder (CS-DAE) neural network, which reduces the noise in ECG signals. The Compact Shortcut approach compresses the features passed through the shortcut layers, which lowers the memory needs of the operation and improves the noise reduction effect. In addition, the encoder and decoder apply Pixel-Unshuffle and Pixel-Shuffle operations, which effectively mitigate the feature loss caused by down-sampling and up-sampling. As a result, the CS-DAE algorithm decreases the computation and required memory size while maintaining higher accuracy. We used the MITDB and NSTDB datasets for training and testing the proposed CS-DAE model, obtaining an average Percentage of Root Mean Square Difference (PRD) of 46.30% and an improvement of Signal-to-Noise Ratio (SNRimp) of 10.50 dB. In addition, we designed a VLSI architecture for the proposed CS-DAE neural network to achieve low hardware cost and less computation. The Verilog code of the VLSI architecture runs on the TUL PYNQ-Z2 development platform with a power consumption of only 1.65 W.

Index Terms: Electrocardiogram, compact shortcut, denoising autoencoder, neural network, ECG signals, shortcut layers, pixel-unshuffled and pixel-shuffled, VLSI architecture, hardware design.

Received 14 March 2024; revised 4 October 2024; accepted 18 January 2025. Date of publication 29 January 2025; date of current version 31 March 2025. This work was supported in part by the National Science and Technology Council, Taiwan, under Grant NSTC 113-2221-E-150-002 and in part by the National Formosa University, Yunlin, Taiwan. This article was recommended by Associate Editor F. Z. Z. Rokhani. (Corresponding author: Ming-Hwa Sheu.)
Shin-Chi Lai is with the Department of Automation Engineering and the Smart Machinery and Intelligent Manufacturing Research Center, National Formosa University, Yunlin County, Huwei 632301, Taiwan (e-mail: [email protected]).
Szu-Ting Wang is with the Program in Smart Industry Technology Research and Development, National Formosa University, Huwei 632301, Taiwan (e-mail: [email protected]).
S. M. Salahuddin Morsalin, Jia-He Lin, Shih-Chang Hsia, and Ming-Hwa Sheu are with the Department of Electronic Engineering, National Yunlin University of Science and Technology, Yunlin County, Douliu 64002, Taiwan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Chuan-Yu Chang is with the Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Yunlin County, Douliu 64002, Taiwan (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCSI.2025.3533544
1549-8328 © 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

The ECG test records the heart's rhythm and electrical behavior to help prevent heart disease. Electrode and wire leads are placed on the skin of the chest, arms, and legs to measure the potential changes produced by the heart contraction. The leads are attached to the ECG measuring machine (electrocardiograph), which records the electrical activity of the heart muscle and displays it on a screen or monitor. Any irregularity in the heart rhythm, or damage to the heart muscle, can change the normal electrical behavior of the heart. Accurate ECG signal measurement is therefore an important and critical task for diagnosing heart disease. During ECG recording, the signal is easily mixed with noise interference due to many factors, such as poor adhesion between the electrode patch, wire, and skin, body movement, breathing vibration, and so on. Common noise interference includes Baseline Wander (BW) [1], a low-frequency noise of around 0.5 to 0.6 Hz; Muscle Artifact (MA) [2], electrocardiographic alterations not related to cardiac electrical activity; and Electrode Motion (EM) [3], the noise that results from the motion of the electrode relative to the patient's skin. These noises interfere when doctors examine the trace and look for specific features of different heart conditions.

In recent years, many Denoising Auto-Encoder (DAE) [4], [5], [6], [7], [8] methods for ECG signal noise reduction have been proposed. Most DAEs follow an encoder-decoder design and take a noisy ECG signal as the network input. The encoder compresses the input signal and extracts essential features, and the decoder reconstructs the compressed data into a clean ECG signal. A Deep Neural Network based DAE (DNN-DAE) [9] was proposed for noise reduction with multi-level feature extraction. A fully convolutional neural network (FCN) [10] was implemented to improve the signal-to-noise ratio. Although those models reduced computation, their noise reduction effect is still not ideal. A fully connected denoising autoencoder
(FCN-DAE) [11] was proposed to reconstruct clean data from noisy input signals. The experiment showed that it has fewer parameters and a better noise reduction effect than DNN-DAE. The CNN-DAE [12] and LSTM-DAE [13] networks combine a convolutional denoising autoencoder (CDAE) with Long Short-Term Memory (LSTM) for ECG signal compression. They add an LSTM at the end of the encoder section of the CDAE, which helps reduce trainable parameters and models the temporal relationship of key features through the LSTM to improve the noise reduction effect. The Low Memory Shortcut Connection DAE (LMSC-DAE) [14] removes electrode motion (EM) noise, and the Channel-wise Average Pooling and Pixel Shuffle DAE (CP-DAE) [15] is used to preserve the key features generated by each layer in both the encoder and decoder. Both add a shortcut structure that transfers some encoder features to the decoder to strengthen the decoder's data restoration ability. Furthermore, the LMSC-DAE method adds residual connections and point-wise addition. Moreover, the above two methods add channel-wise average pooling and point-wise convolution to improve accuracy. Although many methods [16], [17], [18], [19] have been proposed, most of them add complex calculations or stack network layers to reduce noise and improve signal quality. Those models [20], [21] are quite unfriendly for Field Programmable Gate Array (FPGA) or ASIC embedded platforms. Moreover, the hardware design, computation, and system implementation become complex and unsuitable for edge systems.

In this work, we use a Compact Shortcut Denoising Auto-encoder (CS-DAE) based neural network, which is more suitable for VLSI hardware design and implementation and improves on existing neural network models with higher noise reduction capability. The significant contributions of this work are as follows.
❖ The proposed CS-DAE neural network architecture has three operation modules in the encoder and decoder: the Pixel-Shuffle/Pixel-Unshuffle function, the convolutional operation, and channel-wise concatenation. Together they reduce the parameters, the computation, and the VLSI architecture design complexity.
❖ A Compact Shortcut layer is inserted between the encoder and decoder to compress and transfer essential features, which effectively reduces the memory space and improves noise reduction performance.
❖ We have developed a VLSI hardware architecture for 8-bit integer inference based on the proposed CS-DAE neural network. The design is implemented on an FPGA-embedded platform, and the whole chip consumes only 1.65 W for real-time operation.

The improvements outlined in this research focus on enhancing the denoising of ECG signals for preoperative evaluation in cardiovascular disease nursing. The CS-DAE neural network addresses the need for efficient noise reduction while preserving important features in ECG waveforms. By compressing features and mitigating feature loss during signal processing, the CS-DAE algorithm achieves clearer ECG representations. Additionally, the algorithm reduces computation and memory requirements, as validated through evaluation on benchmark datasets. Some of the data in the collection are clean ECG signals, while others are noisy ECG signals. According to the experimental findings, the proposed technique has a low hardware cost, a low MAC count, fewer parameters, and improved noise reduction capability.

II. PROPOSED CS-DAE NETWORK ARCHITECTURE

The Compact Shortcut Denoising Autoencoder is well suited to ECG noise reduction, to reducing the AI model's computation, and to low-cost hardware implementation. Therefore, we have designed the CS-DAE neural network model and a VLSI architecture for its hardware implementation. Figure 1 shows the block diagram of the CS-DAE neural network architecture, which contains three convolutional layers, six encoders, six decoders, and six Compact Shortcut (CS) layers. The noisy ECG input has a size of 1 × 1024, and the first convolutional layer expands the feature channels to 12. Each encoder has 24 output channels and halves the feature size, so after the 6th encoder layer the feature size is reduced from 1024 to 16. The decoder reverses the encoder process: each layer doubles the feature size, and after the 6th decoder layer the feature size is restored to 1024. The output size of the last decoder layer is 12 × 1024. The Feature Integration and Block Smoothing [22] technique then reconstructs a feature map of the same size as the input signal. In addition, to enhance the noise reduction performance of the CS-DAE neural network, the proposed CS operation is inserted between the encoder and the decoder to transfer more feature information, which helps the decoder reconstruct the noise-free ECG signal.

Fig. 1. Proposed CS-DAE network architecture.

A. The Encoder Layer and Operation Process
The architectural layout of the proposed encoder is displayed in Figure 2, and the operational process is listed in Table I. The encoder module consists of a convolutional layer, a ReLU activation function, and a Pixel-Unshuffle operation. The symbols Ch_i, N_i,
and Ch_O denote the input channel number, the feature size, and the output channel number, respectively. In each module, the convolution first produces half of the Ch_O channels while keeping the feature length constant. The Pixel-Unshuffle operation then doubles the number of channels and halves the feature length. The six encoder layers reduce the input feature size to 24 × 16, and a final convolutional layer adjusts the feature information. This convolutional layer amplifies the input feature and fuses the feature maps but does not change the kernel size or the feature size.

Fig. 2. Encoder layer structure and operation process.

TABLE I
ENCODER'S OPERATIONAL FUNCTION

B. The Shortcut Layer Structure and Operation Process
The CS layer is placed between the encoder and decoder to reduce the number of feature channels. The CS method lowers the memory requirement when passing essential feature data from the encoder to the decoder. An appropriate channel reduction effectively decreases the memory required for decoder operations while improving the noise reduction performance. This study uses a transposed convolution [23] layer for channel number reduction, even though there are many other approaches to lowering the channel number, reducing the operation's parameters, and improving the hardware reuse rate. Figure 3 shows the operation of the Compact Shortcut layer between the encoder and decoder, and Table II lists the kernel size, parameters, MACs, input size, and output size of each Compact Shortcut layer.

Fig. 3. Compact shortcut layers operation.

TABLE II
COMPACT SHORTCUT OPERATION FUNCTION

C. The Decoder Layer Structure and Operation Process
The decoder reverses the encoder process and gradually enlarges the condensed features until their length matches that of the input features. Figure 4 shows the decoder architecture and operation, and Table III lists the kernel size, parameters, MACs, and output size of each decoder layer. The decoder module has two input features: the output feature of the previous layer and the output feature of the CS layer. First, the decoder module applies a Pixel-Shuffle function, which doubles the feature length and halves the number of channels relative to the preceding layer. As shown in Figure 5, channel-wise concatenation (Concat.) then combines both features while keeping the feature length unchanged. After concatenation, a convolution fuses the feature map and changes the number of output channels.

Fig. 4. Decoder layer structure and operation process.

TABLE III
DECODER STRUCTURE AND OPERATION FUNCTION

Fig. 5. Concatenation operation.

D. The CS-DAE Experimental Results
1) Performance Evaluation Criteria: In this research, we use two evaluation metrics to assess the noise reduction effect: the Percentage Root Mean Square Difference (PRD) and the improvement of Signal-to-Noise Ratio (SNRimp). Equation (1) gives the PRD, which is computed from the difference between the clean signal and the signal after noise reduction. N is the total length of the ECG signal, x_i is the clean ECG signal, x̂_i is the ECG signal after noise reduction, and x̃_i is the ECG signal with noise. A smaller PRD means the denoised signal is closer to the clean signal.

$$\mathrm{PRD}(\%) = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{N} x_i^2}} \times 100\% \tag{1}$$

SNRimp is defined through the signal-to-noise ratios in equations (2)-(4). SNRin is calculated from the clean and noisy signals, and SNRout is measured after the noise reduction process. SNRimp is the difference between the two, that is, the change from before to after noise reduction. A negative SNRimp means that noise reduction made the signal worse; a positive SNRimp means that noise reduction was effective. The larger SNRimp is, the greater the improvement in noise reduction performance.

$$\mathrm{SNR}_{in} = \frac{\sum_{i=1}^{N} x_i^2}{\sum_{i=1}^{N} (x_i - \tilde{x}_i)^2} \tag{2}$$

$$\mathrm{SNR}_{out} = \frac{\sum_{i=1}^{N} x_i^2}{\sum_{i=1}^{N} (x_i - \hat{x}_i)^2} \tag{3}$$

$$\mathrm{SNR}_{imp} = \mathrm{SNR}_{out} - \mathrm{SNR}_{in} \tag{4}$$

2) Experimental Dataset: We used two publicly available benchmark datasets to evaluate the model's performance: the MIT-BIH Arrhythmia Database (MITDB) [24] and the Noise Stress Test Database (NSTDB) [25]. The MITDB provides clean ECG signal data, whereas the NSTDB provides noisy ECG signal data. The NSTDB contains twelve half-hour ECG recordings and three half-hour recordings of noise typical in ambulatory ECG recordings. In this study, we use the first channel, primarily a modified limb lead II (MLII), which is obtained by placing electrodes on the chest for ambulatory ECG recording. The noise was added beginning after the first five minutes of each record, in two-minute segments alternating with two-minute clean segments. The noise recordings were made by selecting intervals that contain predominantly baseline wander (BW), muscle artifact (MA), and electrode motion artifact (EM). The noisy ECG records were created from two clean recordings, 118 and 119, of the MIT-BIH Arrhythmia Database. There are six levels of EM noise: −6 dB, 0 dB, 6 dB, 12 dB, 18 dB, and 24 dB. The arrhythmia evaluation in wearable ECG devices [26] evaluated more than 80% of the training and test data sets for ventricular ectopic beat and atrial fibrillation. In addition, a lossless electrocardiogram compression system [27] achieved high-ratio ECG signal compression with low computational complexity. For training, the noisy fragments in NSTDB serve as the network's input features, while MITDB serves as the ground truth against which the network's denoised output is corrected. The data are split into two groups: 80% for training and 20% for validation.

3) Experimental Results: We keep the proposed model's input size at 1 × 1024 to assess the model's feasibility. Ten fully connected layers make up the proposed CS-DAE neural network, of which five levels work
for encoding and another five layers for decoding operation. The six Compact Shortcut layers apply a one-dimensional (1D) convolution and an activation function, and their outputs are concatenated into the decoder. The last CS layer connects the encoder and decoder in its entirety. Model parameters and layers were selected based on the literature, empirical evaluation, and the need to balance denoising performance with hardware efficiency. We referenced architectures like LMSC to optimize ECG denoising for VLSI design. To evaluate and optimize the model's performance, we compared the PRD and SNRimp for different channel configurations of the CS-DAE model, as shown in Table IV and Table V. For example, when the number of channels in the CS model was set to 16 (the same as in LMSC), the configuration significantly reduced the number of parameters and the computation load, but the average PRD was only 49.87% and SNRimp was 9.57 dB. When we increased the number of channels to 24, the PRD improved to 46.30% and SNRimp increased to 10.50 dB, indicating better denoising performance than LMSC with a substantially lower parameter count and computation load. Based on these findings and LMSC's proven effectiveness in real-world heart rate noise removal, we selected the 24-channel configuration for the CS-DAE model.

TABLE IV
PRD COMPARISON OF DIFFERENT CHANNELS

TABLE V
SNRimp COMPARISON OF DIFFERENT CHANNELS

Table VI displays the parameter and Multiplication-Addition-Calculation (MAC) comparison. The DNN, CNN, and CNN-LSTM designs each have more than a million parameters: 1.39M, 1.11M, and 10.69M, respectively. The CNN-LSTM architecture has an enormous number of fully connected layers at the input nodes, which raises its parameter count to 10.69M. The FCN, LMSC, and CP topologies use only convolutional layers, resulting in lower parameter counts of 78.44K, 63.62K, and 55.51K, respectively. The proposed CS-DAE architecture, which uses CS and fully connected layers, has the fewest parameters at just 34.03K. The DNN requires only 2.8M MACs because this architecture has only fully connected layers. The proposed CS architecture has the second-lowest MAC count at 8.32M. The other models, CNN (13.27M), LMSC (12.36M), CP (14.69M), FCN (25.08M), and CNN-LSTM (46.69M), require more MACs. All of those neural networks contain more parameters and require more computation, yet their noise reduction effect is still lower than that of the proposed CS-DAE architecture. The LMSC and CP-DAE networks come closest to the proposed CS-DAE in noise reduction, but the CS-DAE needs only about 0.53 times the parameters of LMSC and 0.61 times those of CP-DAE.

TABLE VI
DAE ARCHITECTURE PARAMETER, MACS COMPARISON TABLE

Table VII compares the Percentage of Root Mean Square Difference (PRD) values for input SNRs of 24 dB, 18 dB, 12 dB, 6 dB, 0 dB, and −6 dB. The proposed CS-DAE neural network achieves the lowest PRD, and therefore the best performance among the compared models, at input SNRs of 24 dB, 12 dB, 6 dB, and −6 dB. At 18 dB it records the second-lowest PRD, and its difference from the LMSC model is just 0.24%. At 0 dB input SNR, the CNN, the CP network, and the proposed CS-DAE yield 58.03, 58.17, and 58.52, respectively, whereas LMSC has the lowest PRD. Overall, the proposed CS-DAE achieves the lowest average PRD, 46.30%, and better results than the existing work.
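
The PRD and SNRimp figures compared above follow equations (1)-(4), and a minimal sketch of how they could be computed is given here. It is an illustration rather than the authors' evaluation script. The function names are hypothetical, and the conversion of the power ratio to decibels with 10·log10 is an assumption: the equations as printed show only the ratio, while the reported SNR values are in dB.

```python
import math


def prd_percent(clean, denoised):
    """Equation (1): percentage root mean square difference."""
    error_energy = sum((x - y) ** 2 for x, y in zip(clean, denoised))
    signal_energy = sum(x ** 2 for x in clean)
    return math.sqrt(error_energy / signal_energy) * 100.0


def snr_db(clean, other):
    """Equations (2)/(3): signal energy over error energy.

    Assumption: the ratio is expressed in dB via 10*log10.
    """
    signal_energy = sum(x ** 2 for x in clean)
    error_energy = sum((x - y) ** 2 for x, y in zip(clean, other))
    return 10.0 * math.log10(signal_energy / error_energy)


def snr_improvement_db(clean, noisy, denoised):
    """Equation (4): SNRimp = SNRout - SNRin."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    noisy = [1.5, -0.5, 1.5, -0.5]      # toy data: error of 0.5 per sample
    denoised = [1.1, -0.9, 1.1, -0.9]   # toy data: error of 0.1 per sample
    print(round(prd_percent(clean, denoised), 2))
    print(round(snr_improvement_db(clean, noisy, denoised), 2))
```

A positive result from `snr_improvement_db` corresponds to effective denoising, and a lower `prd_percent` means the output is closer to the clean reference, matching the interpretation given in the evaluation criteria.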

TABLE VII
PRD COMPARISON WITH CS-DAE ARCHITECTURE

The experiments also compare the level of the desired signal with the amount of background noise. Table VIII shows the improvement in Signal-to-Noise Ratio (SNRimp) for input SNRs of 24 dB, 18 dB, 12 dB, 6 dB, 0 dB, and −6 dB. The proposed CS-DAE network performs best at input SNRs of 18 dB, 12 dB, 6 dB, and −6 dB, where it has the highest SNRimp. For 24 dB input data, the proposed CS-DAE network records the second-highest SNRimp value; it was 0.49, barely 0.21% higher than the LMSC's SNRimp value. At 0 dB input SNR, the CNN network obtained the highest SNRimp value of 16.64, while the proposed CS-DAE network achieved a very similar 16.55, a difference of only 0.09 dB. However, the proposed CS-DAE network achieves the highest average SNRimp, 10.50 dB, which is its best noise reduction result. As shown in Figure 6, with an input SNR of 6 dB, the denoising of DNN, CNN, and CNN-LSTM is not particularly effective. With FCN, the QRS complex disappears after denoising, while LMSC shows a slight decrease in the amplitudes of some P and T waves. Similar effects are observed with CS, albeit to a lesser extent.

TABLE VIII
SNRimp COMPARISON WITH CS-DAE ARCHITECTURE

Fig. 6. Signal-to-noise ratio result by proposed CS-DAE and other models.

III. VLSI HARDWARE ARCHITECTURE DESIGN

A physical system must obtain its data from a device that converts the analog signal from the hardware into digital data for processing by a computer. A Deep Neural Network [28] cardiac arrhythmia classifier was proposed to classify ECG beats into normal and different types of arrhythmia beats. In addition, an end-to-end edge-enabled machine learning-based VLSI architecture with low power consumption [29] was developed for class-oriented classification and subject-oriented classification. The DNN architecture is implemented in 180-nm bulk CMOS technology, noting that abnormal atrial activity is confined to the low-frequency range. The VLSI implementations [30], [31] of an ECG compression algorithm for low-power devices use bit-shifting operations as a replacement for the different arithmetic operations. Those algorithms save memory storage space and reduce transmission time. A parallel multi-scale one-dimensional residual network [32] gives stable results on different datasets. It is more suitable for matching tasks in identity recognition and also performs well in classification tasks.

The HDWT [33] is a low-complexity pre-processing filter, presented as an approximate HDWT hardware architecture for ECG processing at very high energy efficiency. That work also investigates a truncation technique to improve energy efficiency by observing the evolution of the signal-to-noise ratio and the ultimate impact on the ECG peak-detection application. The BLSTM convolutional neural network [34] reduces parameter precision and computation, and weight quantization is applied to the BLSTM design to further minimize hardware resources and energy consumption. In addition, a neural network compression and low power dissipation VLSI hardware architecture [35] decreases the energy consumption of the classification algorithm and is adopted in the inference procedure to save hardware resources.

The ELM-based [36] low-power architecture is a real-time reconfigurable inference engine fabricated in 40-nm CMOS technology for robust ECG signal performance and robust classification under noisy conditions. The resource-optimal and energy-efficient VLSI architecture [37] avoids compute-intensive floating-point arithmetic by using comparators, shifters, and adders, leading to efficient hardware resource utilization. An abnormal ECG detection chip architecture [38] was designed in TSMC 90-nm CMOS technology to monitor the heart rate, which can help mediators with immediate
the boundaries in continuous ECG signals. The schedule


controller is the central control switch of the PL control
section workflow chart shown in Figure 10. The program
module receives calculation commands from PS unit using the
command register, and then command register communicates
with the PS unit through the AXI Lite transmission protocol.
After the system initialization, the module becomes idle until
the signal enables the command register to write data, and
the process control module starts to read the commands in the
command register. This module controls the weight mover and
feature mover to transfer the input data to the PE Array and
begin calculation, and then reorder the data through Pixel-
Shuffle/ Pixel-Unshuffled and return the data to the DDR
section.
Fig. 7. VLSI system hardware architecture.
medical attention. Previously, we created an intelligent electronic scale system [39] that uses deep-learning object detection and an auto-scale process, among other things, to accelerate AI machine learning performance. An FPGA-based system [40] demonstrates QRS complex detection on single-lead electrocardiogram signals. Figure 7 shows the block diagram of the proposed CS-DAE network implementation. We have built the VLSI system architecture as an integrated software and hardware design.
The sky-blue boxes are the IP designed for this architecture, including the schedule controller, weight mover, feature mover, 1D Conv, FAQR, and Pixel-Shuffle/Pixel-Unshuffled operation. The white boxes are the soft IP provided by Xilinx. The weight RAM, feature RAM, and the RAM inside each PE are implemented with block RAM, and the command register is implemented with block RAM with an AXI interface. There are four groups of PEs in the PE Array, and every PE consists of 1D Conv, Feature Accumulation & Quantization & ReLU (FAQR), and RAM, as shown in Figure 8.
Each operation transfers the layer's input features and weight parameters from the Double Data Rate (DDR) memory to the weight RAM and feature RAM inside the Programmable Logic (PL) through AXI DMA. Then the Processing System (PS) writes the convolution operation instructions and related parameters (such as the number of input features, the size of input features, the mode of Pixel-Shuffle/Pixel-Unshuffled, the number of output features, the memory location for the returned output features, etc.) to the command register through the AXI Lite transmission protocol. After the features, weights, commands, and related parameters have been transmitted, the schedule controller directs the other IPs to start the calculations and send the results back to DDR through AXI DMA to complete this layer's convolution operation. The following sections explain the functions of each module. Figure 9 illustrates the integer value transmission process through the weight mover, feature mover, and 1D convolution module.

A. Schedule Controller Module
A low-power analog front end with an efficient automatic gain control mechanism [41] formulated for finding

B. 1-D Convolutional Module
CNN hardware accelerators [42] achieve lower power consumption than mixed pooling without increasing the computational complexity. The 1D convolution kernel size is 1 × 7, so the module consists of seven digital signal processor (DSP) units and two adders; a 4-level pipeline is inserted on the critical path to avoid timing problems, as shown in Figure 11. Before each calculation begins, the schedule controller directs the weight mover to send the weight parameters to the 1D convolution module. The feature mover then supplies one 8-bit input to the 1D convolution module every cycle. There are five shift registers inside the 1D convolution module to store the input features, so the output latency becomes nine cycles. Figure 12 shows the DSP operation: Xilinx's DSP48E1 performs the multiply-accumulate (MAC) function, and each DSP comprises a multiplier, an adder, two temporary registers, and two multiplexers.
The floating-point 1D convolution is calculated as follows:

$$\mathrm{out}_R[c_o][n] = \sum_{c_i=1}^{Ch_i} \sum_{i=1}^{K} \left( \mathrm{in}_R[c_i][n+i] \times w_R[c_o][c_i][i] \right) + \mathrm{bias}_R[c_o], \tag{5}$$

where $Ch_i$ is the input channel size, $Ch_O$ is the output channel size, $N$ is the feature length, $K$ is the kernel size, and $c_o = 1, \ldots, Ch_O$. The subscript $R$ denotes the Real_value (float32).
Parameter quantization of the input, weight, and output to 8-bit integer values [43] is calculated as shown in equations (6)-(8):

$$\mathrm{Real\_value} = (\mathrm{int8\_value} - \mathrm{Zero\_point}) \times \mathrm{Scale} \tag{6}$$

$$\mathrm{Zero\_point} = \max(\mathrm{int8\_value}) - \frac{\max(\mathrm{Real\_value})}{\mathrm{Scale}} \tag{7}$$

$$\mathrm{Scale} = \frac{\max(\mathrm{Real\_value}) - \min(\mathrm{Real\_value})}{\max(\mathrm{int8\_value}) - \min(\mathrm{int8\_value})} \tag{8}$$

By equation (6), an 8-bit integer is adjusted to approximate a 32-bit floating-point number. The formulas for calculating the zero point and scale are given in (7) and (8).
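As a minimal sketch of equations (6)-(8), the following computes a scale and zero point from a real-valued range and maps values to and from 8-bit integers. The signed range [-128, 127], the rounding of the zero point to the nearest integer, and the function names are assumptions made for illustration; the paper only specifies "8-bit integer values".

```python
INT8_MIN, INT8_MAX = -128, 127  # assumed signed 8-bit range


def quant_params(real_min, real_max):
    """Return (scale, zero_point) for a real-valued range."""
    scale = (real_max - real_min) / (INT8_MAX - INT8_MIN)  # equation (8)
    zero_point = round(INT8_MAX - real_max / scale)        # equation (7), rounded
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Invert equation (6) and clamp the result to the int8 range."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q, scale, zero_point):
    """Equation (6): recover the approximate real value."""
    return (q - zero_point) * scale
```

For a range of [-1.0, 3.0], `quant_params` gives a scale of 4/255 and a zero point of -64, so 3.0 maps to 127 and -1.0 maps to -128. The round-trip error of `dequantize(quantize(x, ...), ...)` stays within half a scale step for values inside the range.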


Fig. 8. Feature mover & weight mover operation.

Fig. 9. Weight mover and feature mover transmission.

Fig. 10. Schedule controller system flow chart.

After parameter quantization, $\mathrm{out}_R$, $\mathrm{in}_R$, $w_R$, and $\mathrm{bias}_R$ are converted from the original 32-bit floating-point values to approximate 8-bit integers, where the weight zero_point is set to 0.

$$\mathrm{out}_R = (\mathrm{out}_Q - \mathrm{out}_Z) \times \mathrm{out}_S \tag{9}$$

$$\mathrm{in}_R = (\mathrm{in}_Q - \mathrm{in}_Z) \times \mathrm{in}_S \tag{10}$$

$$w_R = (w_Q - 0) \times w_S \tag{11}$$

For high precision, the bias values are quantized from 32-bit floating-point numbers to 32-bit integers using the weight and input scales, where Q is the quantized value, Z is the Zero_point, and S is the Scale.

$$\mathrm{bias}_R = (\mathrm{bias}_Q - \mathrm{bias}_Z) \times (\mathrm{in}_S \times w_S), \quad \mathrm{bias}_Z = 0 \tag{12}$$

Substituting (9)-(12) into (5) gives (13):

$$\mathrm{out}_Q[c_o][n] = \mathrm{out}_Z + \frac{\mathrm{in}_S \times w_S[c_o]}{\mathrm{out}_S} \times \left( \sum_{c_i=1}^{Ch_i} \sum_{i=1}^{K} \big( (\mathrm{in}_Q[c_i][n+i] - \mathrm{in}_Z) \times w_Q[c_o][c_i][i] \big) + \mathrm{bias}_Q[c_o] \right), \tag{13}$$

where $0 < \dfrac{\mathrm{in}_S \times w_S[c_o]}{\mathrm{out}_S} < 1$.
To implement the hardware with integer arithmetic, floating-point values need to be converted into integers as shown in (14), where "≪" represents a left bit shift and "≫" represents a right bit shift. The choice of 25 bits stems from the fact that it aligns with the maximum bit width supported by the FPGA's DSP48E1, which is 18 × 25.
The FPGA implementation is calculated as follows:

$$\mathrm{out}_Q[c_o][n] = \mathrm{out}_Z + \left( \left( \frac{\mathrm{in}_S \times w_S[c_o]}{\mathrm{out}_S} \ll 25 \right) \times \left( \sum_{c_i=1}^{Ch_i} \sum_{i=1}^{K} \big( (\mathrm{in}_Q[c_i][n+i] - \mathrm{in}_Z) \times w_Q[c_o][c_i][i] \big) + \mathrm{bias}_Q[c_o] \right) \right) \gg 25 \tag{14}$$

Fig. 11. The architecture of 1D (1 × 7) convolution module.

Fig. 12. DSP operation architecture for MAC logic function.

Fig. 13. Feature accumulation and FAQR quantization.

The suggested quantization approach enables integer-arithmetic-only
inference, which can be implemented more efficiently than
floating-point inference on readily available integer-only hardware.
It also offers a better balance between accuracy and on-device
latency. In addition, we jointly design a training strategy to
preserve the end-to-end accuracy of the model after quantization.
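The integer-only computation of equation (14) can be sketched in software and compared against the floating-point convolution of equation (5). This is an illustrative model under stated assumptions rather than the hardware implementation: indices are 0-based, the convolution uses no padding, the fixed-point multiplier is obtained by rounding, and the final clamp to the signed 8-bit range is an added assumption. The function and parameter names are hypothetical.

```python
SHIFT = 25  # number of fractional bits used in equation (14)


def conv1d_float(in_r, w_r, bias_r):
    """Equation (5) with in_r[ci][n], w_r[co][ci][k], bias_r[co] (no padding)."""
    n_out = len(in_r[0]) - len(w_r[0][0]) + 1
    return [
        [
            sum(in_r[ci][n + k] * w_r[co][ci][k]
                for ci in range(len(in_r))
                for k in range(len(w_r[co][ci]))) + bias_r[co]
            for n in range(n_out)
        ]
        for co in range(len(w_r))
    ]


def conv1d_int(in_q, in_z, w_q, bias_q, multipliers, out_z, shift=SHIFT):
    """Equation (14): integer MACs, fixed-point rescale, then right shift.

    multipliers[co] is in_S * w_S[co] / out_S, a value between 0 and 1.
    """
    n_out = len(in_q[0]) - len(w_q[0][0]) + 1
    out = []
    for co in range(len(w_q)):
        # "multiplier << 25": scale by 2**25 and keep an integer (rounding assumed).
        m_fixed = round(multipliers[co] * (1 << shift))
        row = []
        for n in range(n_out):
            acc = bias_q[co] + sum(
                (in_q[ci][n + k] - in_z) * w_q[co][ci][k]
                for ci in range(len(in_q))
                for k in range(len(w_q[co][ci])))
            y = out_z + ((m_fixed * acc) >> shift)
            row.append(max(-128, min(127, y)))  # assumed clamp to int8
        out.append(row)
    return out
```

As a small check with exactly representable scales (in_S = 0.5, w_S = 0.25, out_S = 0.5, so the multiplier is 0.25), the integer path reproduces the quantized version of the floating-point result: `conv1d_int([[2, 4, 6, 8]], 1, [[[2, 2]]], [4], [0.25], -1)` corresponds to `conv1d_float([[0.5, 1.5, 2.5, 3.5]], [[[0.5, 0.5]]], [0.5])` after applying equation (9) to the integer outputs.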

Fig. 14. Pixel-shuffle/pixel-un-shuffled operation.


C. Feature Accumulate, Quantization & ReLU (FAQR) Module
The network parameters are translated from 32-bit floating point to 8-bit integers following the quantization and training scheme for integer-arithmetic-only inference [43]. A floating-point convolutional network is fine-tuned and quantized [44] to obtain an integer-valued convolutional network. Although some noise reduction performance is lost, this cuts RAM storage by 75%. The FAQR module gathers the 1D convolution output and saves it in a dedicated RAM. Each 8 × 8-bit multiplication produces a 16-bit output, which grows to a 32-bit output after accumulating across channels and adding the bias. As a result, the 32-bit output must be converted into 8 bits by the Quantization module, which is depicted in Figure 13. Following accumulation, the schedule controller delivers the bias, scale, and zero point to the quantization module for quantization processing.

D. Pixel-Shuffle / Pixel-Unshuffled Operation
PDBSNet [45] is an efficient pixel-shuffle downsampling blind-spot network for background reconstruction that operates with self-supervised learning. The function of Pixel-Shuffle and Pixel-Un-Shuffle is to reorder the data, rearranging one channel into two channels or two channels into one channel. Therefore, two RAMs are treated as a group, and the output sequence of the two RAMs is adjusted by an internal controller to reorder the data. In addition, an external group controller, working with the schedule controller, regulates the output sequence of the groups. The functional block diagram of the Pixel-Shuffle and Pixel-Un-Shuffle operation is shown in Figure 14.
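The reordering performed by the 1D Pixel-Un-Shuffle and Pixel-Shuffle operations can be sketched as follows. The text only states that one channel is rearranged into two channels and vice versa; the even/odd sample assignment used here follows the common pixel-unshuffle convention and is an assumption, as are the function names.

```python
def pixel_unshuffle_1d(channel):
    """One channel of length 2N -> two channels of length N.

    Assumed layout: even-indexed samples go to the first channel,
    odd-indexed samples go to the second channel.
    """
    return [channel[0::2], channel[1::2]]


def pixel_shuffle_1d(ch_a, ch_b):
    """Two channels of length N -> one channel of length 2N (interleaved)."""
    out = []
    for a, b in zip(ch_a, ch_b):
        out.extend((a, b))
    return out
```

Because the two functions are exact inverses, applying `pixel_shuffle_1d` to the output of `pixel_unshuffle_1d` returns the original sequence, which illustrates why the reordering itself discards no samples.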


TABLE IX
RESOURCE UTILIZATION

TABLE X
POWER CONSUMPTION FOR DIFFERENT HARDWARE PLATFORMS

TABLE XI
PRD ACCURACY COMPARISON BETWEEN FLOAT32 AND INT8

Fig. 15. The diagram of the system implemented on FPGA.

Fig. 16. System implementation and demonstration on FPGA.

E. VLSI Hardware Experimental Results
The TUL PYNQ™-Z2 development platform has been used to design and build the VLSI system architecture, which is depicted in Figure 15. The TUL PYNQ™-Z2 platform is linked to a PC by a network connection. In addition, the SD card of the PYNQ stores the 167 MHz system frequency setting together with copies of the quantized parameters, the NSTDB dataset, and the FPGA bitstream. The Python application uses the AXI DMA function to move the feature and weight data from the SD card to the BRAM in the FPGA before starting the FPGA operation. The noise reduction results are shown on the Jupyter Notebook interface on the PC when all layer calculations are completed. Figure 16 depicts our demonstration.
Table IX lists the hardware implementation results for the TUL PYNQ™-Z2 platform, with BRAM accounting for around 37.50% of the total and DSP for about 16.82%. Table X compares the power consumption of the FPGA with that of other platforms. The other processors run the float32 CS-DAE model, whereas the PYNQ Z2 runs the int8 CS-DAE model. The network inference power consumption of the GPU, AGX Xavier, and Raspberry Pi 4 is, respectively, 24.85 times, 12.4 times, and 4.32 times greater than that of the PYNQ Z2, which is a significant difference. Inference time refers to the time taken for the model to process 1,024 sampled points for each run across the testing datasets. Multiple inference runs were performed, and the results represent the average inference time for each platform. The RTX 2070 achieved 5.2 ms, the AGX Xavier 9.56 ms, the Raspberry Pi 4 16.2 ms, and the PYNQ Z2, running an int8 model, 10.8 ms. Despite its lower power consumption and clock speed, the PYNQ Z2's performance is competitive with the AGX Xavier due to its efficient hardware acceleration. Table XI compares the PRD between PC and FPGA. Because this work applies quantization after model training, some degradation occurs: since the quantized network parameters are 8-bit integers, the inferred PRD typically decreases by 2.53%. This noise reduction result is still superior to that of CP with DNN, CNN, CNN-LSTM, and FCN. Although LMSC has 1.81% less PRD than the quantized CS-DAE, the LMSC parameters use 7.5 times more RAM than the quantized CS-DAE.

IV. CONCLUSION
In this work, we have designed VLSI hardware accelerators for the proposed CS-DAE neural network architecture to reduce ECG signal noise. It retains good noise reduction quality while having affordable hardware costs, low computation requirements, and few parameters. The proposed Compact Shortcut structure significantly improves the noise reduction efficiency with limited parameters and computations. Furthermore, the encoder and decoder architectures mitigate feature loss during sampling by using the Pixel-Un-Shuffle and Pixel-Shuffle operations. This method improves noise reduction quality without further calculation or regulatory processes. The hardware design complexity decreases with fewer regulation procedures, which can also increase the hardware reuse rate.
The experimental findings demonstrate that the proposed CS-DAE network in this study employs NSTDB datasets as


the ECG input signal with noise for validation and MITDB datasets as the ECG input signal without noise. The average PRD is 46.3%, and the average SNRimp is 10.50 in this experiment. Compared with the LMSC model, the proposed CS-DAE network reduces the computation by 32.69% and the parameters by 46.51%. Besides, the PRD is increased by 0.72, and SNRimp is increased by 0.13. The CS-DAE uses two quantization methods to quantize the float32 parameters to int8; although the PRD is 1.81% lower than that of LMSC, the parameters of the quantized CS-DAE require 7.5 times less memory than the LMSC model. Therefore, CS-DAE is better suited to VLSI designs and embedded systems with fewer hardware resources. Finally, we have implemented the proposed VLSI design on the TUL PYNQ™-Z2 development board, with a power consumption of only 1.65 W.
However, there are some limitations. The quantization from 32-bit floating point to 8-bit integers, while reducing memory and computational complexity, slightly decreases accuracy by 1.81%. Additionally, the architecture's low power and reduced complexity may limit its flexibility for other types of signal processing. Lastly, the model's focus on the MLII lead restricts its generalizability, and incorporating more diverse datasets could improve its applicability in broader clinical settings. Future work could explore expanding the model's capabilities to handle additional ECG leads and adapting the architecture for more versatile signal processing tasks.

REFERENCES
[1] X. Wang, Y. Zhou, M. Shu, Y. Wang, and A. Dong, “ECG baseline wander correction and denoising based on sparsity,” IEEE Access, vol. 7, pp. 31573–31585, 2019, doi: 10.1109/ACCESS.2019.2902616.
[2] J. S. Paul, M. R. Reddy, and V. J. Kumar, “A transform domain SVD filter for suppression of muscle noise artefacts in exercise ECG’s,” IEEE Trans. Biomed. Eng., vol. 47, no. 5, pp. 654–663, May 2000, doi: 10.1109/10.841337.
[3] Y. Bu, M. F. U. Hassan, and D. Lai, “The embedding of flexible conductive silver-coated electrodes into ECG monitoring garment for minimizing motion artefacts,” IEEE Sensors J., vol. 21, no. 13, pp. 14454–14465, Jul. 2021, doi: 10.1109/JSEN.2020.3001295.
[4] H. Cao and L. Peyrodie, “Variational mode decomposition-based simultaneous R peak detection and noise suppression for automatic ECG analysis,” IEEE Sensors J., vol. 23, no. 8, pp. 8703–8713, Apr. 2023, doi: 10.1109/JSEN.2023.3257332.
[5] X. Wang et al., “An ECG signal denoising method using conditional generative adversarial net,” IEEE J. Biomed. Health Informat., vol. 26, no. 7, pp. 2929–2940, Jul. 2022, doi: 10.1109/JBHI.2022.3169325.
[6] P. Bing, W. Liu, and Z. Zhang, “DeepCEDNet: An efficient deep convolutional encoder–decoder networks for ECG signal enhancement,” IEEE Access, vol. 9, pp. 56699–56708, 2021, doi: 10.1109/ACCESS.2021.3072640.
[7] C. Li, Y. Wu, H. Lin, J. Li, F. Zhang, and Y. Yang, “ECG denoising method based on an improved VMD algorithm,” IEEE Sensors J., vol. 22, no. 23, pp. 22725–22733, Dec. 2022, doi: 10.1109/JSEN.2022.3214239.
[8] P. Singh and A. Sharma, “Attention-based convolutional denoising autoencoder for two-lead ECG denoising and arrhythmia classification,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–10, 2022, doi: 10.1109/TIM.2022.3197757.
[9] P. Xiong, H. Wang, M. Liu, and X. Liu, “Denoising autoencoder for eletrocardiogram signal enhancement,” J. Med. Imag. Health Informat., vol. 5, no. 8, pp. 1804–1810, Dec. 2015, doi: 10.1166/jmihi.2015.1649.
[10] H. Lin, R. Liu, and Z. Liu, “ECG signal denoising method based on disentangled autoencoder,” Electronics, vol. 12, no. 7, p. 1606, Mar. 2023, doi: 10.3390/electronics12071606.
[11] H. Chiang, Y. Hsieh, S. Fu, K. Hung, Y. Tsao, and S. Chien, “Noise reduction in ECG signals using fully convolutional denoising autoencoders,” IEEE Access, vol. 7, pp. 60806–60813, 2019, doi: 10.1109/ACCESS.2019.2912036.
[12] T. Yoon and D. Kang, “Bimodal CNN for cardiovascular disease classification by co-training ECG grayscale images and scalograms,” Sci. Rep., vol. 13, no. 1, p. 2937, Feb. 2023, doi: 10.1038/s41598-023-30208-8.
[13] E. Dasan and I. Panneerselvam, “A novel dimensionality reduction approach for ECG signal via convolutional denoising autoencoder with LSTM,” Biomed. Signal Process. Control, vol. 63, Jan. 2021, Art. no. 102225, doi: 10.1016/j.bspc.2020.102225.
[14] Y.-S. Jhang, S.-T. Wang, M.-H. Sheu, S.-H. Wang, and S.-C. Lai, “Integration design of portable ECG signal acquisition with deep-learning based electrode motion artifact removal on an embedded system,” IEEE Access, vol. 10, pp. 57555–57564, 2022, doi: 10.1109/ACCESS.2022.3178847.
[15] Y.-S. Jhang, S.-T. Wang, M.-H. Sheu, S.-H. Wang, and S.-C. Lai, “Channel-wise average pooling and 1D pixel-shuffle denoising autoencoder for electrode motion artifact removal in ECG,” Appl. Sci., vol. 12, no. 14, p. 6957, Jul. 2022, doi: 10.3390/app12146957.
[16] D. Lee, S. Lee, S. Oh, and D. Park, “Energy-efficient FPGA accelerator with fidelity-controllable sliding-region signal processing unit for abnormal ECG diagnosis on IoT edge devices,” IEEE Access, vol. 9, pp. 122789–122800, 2021, doi: 10.1109/ACCESS.2021.3109875.
[17] W. Jiang and G. Seong Kong, “Block-based neural networks for personalized ECG signal classification,” IEEE Trans. Neural Netw., vol. 18, no. 6, pp. 1750–1761, Nov. 2007, doi: 10.1109/TNN.2007.900239.
[18] X. Wang, Y. Zhu, Y. Ha, M. Qiu, and T. Huang, “An FPGA-based cloud system for massive ECG data analysis,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 64, no. 3, pp. 309–313, Mar. 2017, doi: 10.1109/TCSII.2016.2556861.
[19] S. M. Noor, E. John, and M. Panday, “Design and implementation of an ultralow-energy FFT ASIC for processing ECG in cardiac pacemakers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 983–987, Apr. 2019, doi: 10.1109/TVLSI.2018.2883642.
[20] H. Ghonchi and V. Abolghasemi, “A dual attention-based autoencoder model for fetal ECG extraction from abdominal signals,” IEEE Sensors J., vol. 22, no. 23, pp. 22908–22918, Dec. 2022, doi: 10.1109/JSEN.2022.3213586.
[21] E. K. Wang, X. Zhang, and L. Pan, “Automatic classification of CAD ECG signals with SDAE and bidirectional long short-term network,” IEEE Access, vol. 7, pp. 182873–182880, 2019, doi: 10.1109/ACCESS.2019.2936525.
[22] M.-H. Sheu, S. M. S. Morsalin, S.-H. Wang, Y.-T. Shen, S.-C. Hsia, and C.-Y. Chang, “FIBS-Unet: Feature integration and block smoothing network for single image dehazing,” IEEE Access, vol. 10, pp. 71764–71776, 2022, doi: 10.1109/ACCESS.2022.3188860.
[23] M. Sheu, S. M. S. Morsalin, C. Hsu, S. Lai, S. Wang, and C. Chang, “Improvement of human pose estimation and processing with the intensive feature consistency network,” IEEE Access, vol. 11, pp. 28045–28059, 2023, doi: 10.1109/ACCESS.2023.3258417.
[24] G. B. Moody and R. G. Mark, “The impact of the MIT-BIH arrhythmia database,” IEEE Eng. Med. Biol. Mag., vol. 20, no. 3, pp. 45–50, May 2001, doi: 10.1109/51.932724.
[25] G. B. Moody, W. Muldrow, and R. G. Mark, “A noise stress test for arrhythmia detectors,” Comput. Cardiol., vol. 11, no. 3, pp. 381–384, 1984. [Online]. Available: https://physionet.org/content/nstdb/1.0.0/
[26] M. Sadrawi et al., “Arrhythmia evaluation in wearable ECG devices,” Sensors, vol. 17, no. 11, p. 2445, Oct. 2017, doi: 10.3390/s17112445.
[27] M. Jia, F. Li, Y. Pu, and Z. Chen, “A lossless electrocardiogram compression system based on dual-mode prediction and error modeling,” IEEE Access, vol. 8, pp. 101153–101162, 2020, doi: 10.1109/ACCESS.2020.2998608.
[28] M. Janveja, R. Parmar, M. Tantuway, and G. Trivedi, “A DNN-based low power ECG co-processor architecture to classify cardiac arrhythmia for wearable devices,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 4, pp. 2281–2285, Apr. 2022, doi: 10.1109/TCSII.2022.3146036.


[29] R. Parmar, M. Janveja, J. Pidanic, and G. Trivedi, “Design of DNN-based low-power VLSI architecture to classify atrial fibrillation for wearable devices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 31, no. 3, pp. 320–330, Mar. 2023, doi: 10.1109/TVLSI.2023.3236530.
[30] T.-H. Tsai and M. A. Hussain, “VLSI implementation of lossless ECG compression algorithm for low power devices,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 12, pp. 3317–3321, Dec. 2020, doi: 10.1109/TCSII.2020.2978554.
[31] T.-H. Tsai, N.-C. Tung, and D.-B. Lin, “VLSI implementation of multi-channel ECG lossless compression system,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 8, pp. 2962–2966, Aug. 2021, doi: 10.1109/TCSII.2021.3071757.
[32] Y. Chu, H. Shen, and K. Huang, “ECG authentication method based on parallel multi-scale one-dimensional residual network with center and margin loss,” IEEE Access, vol. 7, pp. 51598–51607, 2019, doi: 10.1109/ACCESS.2019.2912519.
[33] H. B. Seidel, M. M. A. da Rosa, G. Paim, E. A. C. da Costa, S. J. M. Almeida, and S. Bampi, “Approximate pruned and truncated Haar discrete wavelet transform VLSI hardware for energy-efficient ECG signal processing,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 5, pp. 1814–1826, May 2021, doi: 10.1109/TCSI.2021.3057584.
[34] J. Wu, F. Li, Z. Chen, Y. Pu, and M. Zhan, “A neural network-based ECG classification processor with exploitation of heartbeat similarity,” IEEE Access, vol. 7, pp. 172774–172782, 2019, doi: 10.1109/ACCESS.2019.2956179.
[35] Y. Chuang, Y. Chen, H. Li, and A. A. Wu, “An arbitrarily reconfigurable extreme learning machine inference engine for robust ECG anomaly detection,” IEEE Open J. Circuits Syst., vol. 2, pp. 196–209, 2021, doi: 10.1109/OJCAS.2020.3039993.
[36] M. Da Rosa, P. Da Costa, E. Da Costa, S. Almeida, G. Paim, and S. Bampi, “A robust and power-efficient power line interference canceling VLSI design,” in Proc. 34th SBC/SBMicro/IEEE/ACM Symp. Integr. Circuits Syst. Design (SBCCI), Campinas, Brazil, Aug. 2021, pp. 1–6, doi: 10.1109/SBCCI53441.2021.9529983.
[37] M. Janveja, R. Parmar, G. Trivedi, P. Jan, and Z. Nemec, “An energy efficient and resource optimal VLSI architecture for ECG feature extraction for wearable healthcare applications,” in Proc. 32nd Int. Conf. Radioelektronika (RADIOELEKTRONIKA), Kosice, Slovakia, Apr. 2022, pp. 1–6, doi: 10.1109/RADIOELEKTRONIKA54537.2022.9764910.
[38] K.-F. Chang and Y.-H. Chen, “High accuracy abnormal ECG detection chip using a simple neural network,” in Proc. 19th Int. SoC Design Conf. (ISOCC), Oct. 2022, pp. 177–178, doi: 10.1109/ISOCC56007.2022.10031526.
[39] W.-Y. Zhu, W.-K. Wong, S. Morsalin, S.-H. Wang, and M.-H. Sheu, “Software and hardware integration system design with fruit identification for smart electronic scale applications,” in Proc. IEEE Int. Conf. Consum. Electron. (ICCE-TW), Penghu, Taiwan, Sep. 2021, pp. 1–2, doi: 10.1109/ICCE-TW52618.2021.9603151.
[40] C. Dong, C. I. Ieong, M. I. Vai, P. U. Mak, P. I. Mak, and F. Wan, “A real-time heart beat detector and quantitative investigation based on FPGA,” in Proc. Asia–Pacific Conf. Postgraduate Res. Microelectron. Electron., Macao, China, Oct. 2011, pp. 65–69, doi: 10.1109/PrimeAsia.2011.6075072.
[41] N. Vemishetty et al., “Low power personalized ECG based system design methodology for remote cardiac health monitoring,” IEEE Access, vol. 4, pp. 8407–8417, 2016, doi: 10.1109/ACCESS.2016.2629486.
[42] K. Khalil, O. Eldash, A. Kumar, and M. Bayoumi, “Designing novel AAD pooling in hardware for a convolutional neural network accelerator,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 30, no. 3, pp. 303–314, Mar. 2022, doi: 10.1109/TVLSI.2021.3139904.
[43] B. Jacob et al., “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, Jun. 2018, pp. 2704–2713, doi: 10.1109/CVPR.2018.00286.
[44] H. Zhao, D. Liu, and H. Li, “Efficient integer-arithmetic-only convolutional networks with bounded ReLU,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Daegu, South Korea, May 2021, pp. 1–5, doi: 10.1109/ISCAS51556.2021.9401448.
[45] D. Wang, L. Zhuang, L. Gao, X. Sun, M. Huang, and A. J. Plaza, “PDBSNet: Pixel-shuffle downsampling blind-spot reconstruction network for hyperspectral anomaly detection,” IEEE Trans. Geosci. Remote Sens., vol. 61, May 2023, Art. no. 5511914, doi: 10.1109/TGRS.2023.3276175.

Shin-Chi Lai (Member, IEEE) received the B.S. degree in electronic engineering from Chienkuo Technology University, Changhua, Taiwan, in 2002, the M.S. degree in electronic engineering from the National Yunlin University of Science and Technology, Yunlin County, Taiwan, in 2005, and the Ph.D. degree from National Cheng Kung University, Tainan, Taiwan, in 2011. From October 2011 to July 2013, he was an Assistant Research Fellow with the Department of Electrical Engineering, National Cheng Kung University. From August 2013 to July 2016, he was an Assistant Professor with the Department of Computer Science and Information Engineering, Nanhua University, Chiayi, Taiwan. From August 2016 to July 2019, he was an Associate Professor with the Department of Computer Science and Information Engineering. From August 2019 to July 2021, he was a Full Professor with the Department of Computer Science and Information Engineering, Nanhua University. He is currently a Full Professor with the Department of Automation Engineering, National Formosa University. His main research interests include signal processing and its circuit design, especially for speech, audio, biomedical, and multimedia applications.

Szu-Ting Wang (Member, IEEE) received the B.S. degree from the Department of Computer Science and Information Management, Providence University, Taichung, Taiwan, and the M.S. degree from the Department of Information Engineering and Computer Science, Feng Chia University, Taichung. She is currently pursuing the Ph.D. degree with the Program of Smart Industry Technology Research and Design, National Formosa University, Yunlin County, Taiwan. Her research interests include image processing, digital signal processing, and deep learning.

S. M. Salahuddin Morsalin (Graduate Student Member, IEEE) received the B.Sc. degree in electrical and electronic engineering from Daffodil International University, Bangladesh, in 2015, the M.Sc. degree in green technology for sustainability (major in electronics) from Nanhua University, Taiwan, in 2020, and the Ph.D. degree from the Department of Electronic Engineering, National Yunlin University of Science and Technology, Yunlin County, Taiwan, in 2023. From 2020 to 2022, he worked as a Lecturer with the Department of Computer Science and Information Engineering, Nanhua University, Chiayi, Taiwan. In addition, he also worked as a Server Product Design Engineer with the Hardware Research and Development Department, Wiwynn Corporation Ltd., Taiwan. Currently, he is a Post-Doctoral Researcher with the National Yunlin University of Science and Technology. His research interests include image and video processing, big data analytics, deep learning, bio-medical image processing, analysis, edge AI system designs, digital signal processing, and VLSI architecture design.

Jia-He Lin received the B.S. degree from the Department of Electronic Engineering, Oriental Institute of Technology, New Taipei City, Taiwan, and the M.S. degree from the Department of Electronic Engineering, National Yunlin University of Science and Technology, Yunlin County, Taiwan, in 2023. His research interests include digital signal processing, VLSI architecture design, FPGA application, embedded systems, design deep learning, and image and video processing.


Shih-Chang Hsia (Member, IEEE) received the Ph.D. degree from the Department of Electrical Engineering, National Cheng Kung University, Taiwan, in 1996. From 1986 to 1989, he was an Engineer with the Research and Development Department, Microtek International Inc. He was an Instructor and an Associate Professor with the Department of Electronic Engineering, Chung Chou Institute of Technology, from 1991 to 1998. He worked as a Professor with the Department of Computer and Communication Engineering and the Department of Electronics Engineering, National Kaohsiung First University of Science and Technology, Kaohsiung, from 1998 to 2010. He was elected as the Chairperson with the Department of Electronics Engineering in 2007. He is currently a Professor with the Department of Electronics Engineering, National Yunlin University of Science and Technology. His research interests include VLSI/SOC designs, video/image processing, HDTV/Stereo TV systems, LED lighting systems, and electrical sensors.

Chuan-Yu Chang (Senior Member, IEEE) received the Ph.D. degree in electrical engineering from National Cheng Kung University, Taiwan, in 2000. He is currently the Deputy General Director of the Service Systems Technology Center, Industrial Technology Research Institute, Taiwan. He was the Chair of the Department of Computer Science and Information Engineering from 2009 to 2011. From 2011 to 2019, he was the Dean of Research and Development and the Director of the Incubation Center for Academia-Industry Collaboration and Intellectual Property. He is currently a Distinguished Professor with the Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Taiwan. His current research interests include computational intelligence and their applications to medical image processing, automated optical inspection, emotion recognition, and pattern recognition. In the above areas, he has more than 200 publications in journals and conference proceedings. He served as the Program Co-Chair for TAAI 2007, CVGIP 2009, the 2010–2019 International Workshop on Intelligent Sensors and Smart Environments, and the third International Conference on Robot, Vision and Signal Processing. He is an IET Fellow and a Life Member of IPPR and TAAI. He served as the General Co-Chair for the 2012 International Conference on Information Security and Intelligent Control, the 2011–2013 Workshop on Digital Life Technologies, CVGIP2017, WIC2018, ICS2018, and WIC2019. From 2015 to 2017, he was the Chair of the IEEE Signal Processing Society Tainan Chapter and the Representative for Region 10 of the IEEE SPS Chapters Committee. He is the President of Taiwan Association for Web Intelligence Consortium.

Ming-Hwa Sheu (Member, IEEE) received the M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University, Taiwan, in 1989 and 1993, respectively. From 2015 to 2018, he was a Supervisor of Taiwan IC Design Association. He was the Committee Chair of E.E. course planning for Technical High School, Ministry of Education, Taiwan. He was a review committee of the Engineering Department, Ministry of Science & Technology (MOST). From 2008 to 2011, he was the Chairperson of the Department of Electronic Engineering. He is currently a Full Professor with the Department of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan. His research interests include CAD/VLSI, digital signal process, algorithm analysis, edge AI, and embedded systems.
