A Deep Biometric Recognition and Diagnosis Network
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3016938, IEEE Access
Corresponding authors: Xiaoguang Zhou, Xiangdong Xu, and Xingxiang Tao
This work was supported in part by the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) under Grant SKLNST-2018-1-18, and in part by the Foundation of Jiading Health Science under Grant 2018-KY-04.
ABSTRACT Arrhythmia is one of the most persistent chronic heart diseases in the elderly and is associated with high morbidity and mortality from conditions such as stroke, cardiac failure, and coronary artery disease. Automatically detecting and classifying arrhythmic heartbeats from electrocardiogram (ECG) signals is therefore important for patients with arrhythmias. In this paper, we develop three robust deep convolutional neural network (DCNN) models, a plain-CNN network and two multi-scale fusion CNN (MSF-CNN) architectures (A and B), to improve feature extraction for arrhythmia detection and thus improve the performance metrics. The proposed models are trained and tested on five types of signals from the public MIT-BIH arrhythmia database. Six groups of ablation experiments are conducted to analyze the performance of the models. The accuracy, sensitivity, and specificity obtained from MSF-CNN architecture A are higher than those from the plain-CNN model, demonstrating that the parallel group convolution blocks (1 × 3, 1 × 5, and 1 × 7) improve the model's performance. Additionally, the best model, MSF-CNN architecture B, achieves an average accuracy, sensitivity, and specificity of 98.00%, 96.17%, and 96.38%, respectively. This illustrates that residual learning combined with concatenation group convolution blocks has a profound effect on the feature learning of the model. The results of the ablation experiments show that our proposed biometric recognition and diagnosis network with residual learning (MSF-CNN B) provides a rapid and reliable approach to ECG signal classification, with the potential for introduction into clinical practice as a tool for aiding cardiologists in reading ECG heartbeat signals.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
equipment must be able to analyze the morphological characteristics of ECG signals [6] as well as the correlation between heartbeats, and finally detect abnormal heartbeats and determine their types.
According to the standard from the Association for the Advancement of Medical Instrumentation (AAMI) [7], ECG signals can be divided into five categories: normal beat (N), supraventricular ectopic beat (S), ventricular ectopic beat (V), fusion beat (F), and unknown beat (Q). The AAMI standard focuses on the detection of ventricular ectopic beats (VEBs) and non-VEBs, and each category includes several types of heartbeats. The specific classification is shown in Table 1. In Table 1, each heartbeat type represents a different cardiac activity pattern. Under different cardiac activity states, each ECG signal has a different implication and requires a different targeted treatment [8]. At present, visual evaluation by cardiologists is an important standard of diagnosis. It requires numerous well-trained specialists to correctly identify the type of signal, which not only leads to deviation between subjective judgment and the actual situation [9], but also consumes considerable time and energy. Therefore, it is of utmost importance for cardiologists that abnormal heart rhythms be identified automatically before clinical treatment.
TABLE 1. MAPPING OF THE MIT-BIH ARRHYTHMIA DATABASE HEARTBEAT TYPES TO THE AAMI STANDARD
AAMI heartbeat types    | N                  | S   | V   | F   | Q
MIT-BIH heartbeat types | NOR                | AP  | PVC | fVN | P
                        | LBBB               | aAP | VE  |     | fPN
                        | AE                 | NP  |     |     | U
                        | RBBB               | SP  |     |     |
                        | Nodal (junctional) |     |     |     |
Abbreviations: Heartbeat types: N: Any heartbeat not in the S, V, F, Q classes; S: Supraventricular ectopic beat; V: Ventricular ectopic beat; F: Fusion beat; Q: Unknown beat; NOR: Normal beat; LBBB: Left bundle branch block beat; AE: Atrial escape beat; RBBB: Right bundle branch block beat; AP: Atrial premature beat; aAP: Aberrated atrial premature beat; PAC: Premature atrial contraction beat; NP: Nodal (junctional) premature beat; SP: Supraventricular premature beat; PVC: Premature ventricular contraction; VE: Ventricular escape beat; fVN: Fusion of ventricular and normal beat; P: Paced beat; fPN: Fusion of paced and normal beat; U: Unclassified beat.
Over the past decades, ECG signal recognition and classification have become an established technique that can effectively assist physicians in clinical diagnosis [4]. The relevant automatic recognition models mainly rely on traditional pattern matching methods. These methods have achieved great progress, but the complex feature extraction process consumes considerable computing resources [4]. In recent years, deep learning has become a mainstream pattern recognition method. It is an end-to-end learning approach that does not require a complex process of hand-crafted feature extraction. Moreover, great achievements have been obtained in the fields of image classification [10-14], object detection [15-17], and image segmentation [18-21]. Therefore, in this paper, we introduce deep learning into the study of one-dimensional signals and propose a more accurate, rapid, and robust discriminant model for the classification of ECG signals.
This paper is organized as follows: Section II introduces literature related to the classification of ECG signals, including data pre-processing, machine learning methods, and deep learning methods. The database is then described in Section III. In Section IV, we propose a plain-CNN network and two MSF-CNN architectures (A and B) and analyze in depth the configuration parameters of the three network architectures. In Section V, the experimental results are shown in detail, and the performance is compared with recent popular algorithms. Finally, we conclude our work and propose future research directions in Section VI.

II. RELATED WORK
In this section, we survey related literature on traditional machine learning approaches and recent popular deep learning methods for the detection and classification of ECG signals. In general, traditional machine learning methods consist of three main steps for the classification of arrhythmias: data pre-processing, feature extraction and selection, and feature classification. By contrast, the deep learning approach is an end-to-end model, which has the capacity to self-learn from the input ECG signal segments.

A. DATA PRE-PROCESSING
The pre-processing of ECG signals mainly includes denoising and segmentation. First, ECG signals are contaminated by various kinds of noise and artefacts [22]. Because ECG signals are low-amplitude, low-frequency signals, diverse noise can lead physicians to an incorrect assessment and reduce the accuracy of diagnosis. Therefore, the denoising of ECG signals is a significant baseline step [23] of data pre-processing. The goal is to reduce noise and artefacts and determine the points of interest, which helps extract effective waveform features from ECG signals. Many scholars have proposed different pre-processing methods. In general, they can be divided into four categories: filtering methods, transformation filtering methods, statistical methods, and combinations of these methods [24-28]. Additionally, ECG signal segmentation is also necessary: it divides the whole signal record into a large number of heartbeats or RR intervals, and the heartbeats or RR intervals belonging to the same class are grouped together according to the annotations of the expert.
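The grouping of heartbeats by expert annotation described above can be sketched as a simple lookup built from Table 1. This is a minimal illustration, not the authors' code: the dictionary keys are the abbreviations used in Table 1, and the input format (pairs of beat type and segment) and the function name `group_beats_by_aami` are assumptions made for the example.

```python
from collections import defaultdict

# Table 1 as a lookup: MIT-BIH beat-type abbreviation -> AAMI class.
AAMI_CLASS_OF = {
    "NOR": "N", "LBBB": "N", "AE": "N", "RBBB": "N", "Nodal (junctional)": "N",
    "AP": "S", "aAP": "S", "NP": "S", "SP": "S",
    "PVC": "V", "VE": "V",
    "fVN": "F",
    "P": "Q", "fPN": "Q", "U": "Q",
}


def group_beats_by_aami(beats):
    """Group heartbeat segments by AAMI class.

    `beats` is an iterable of (beat_type, segment) pairs, where beat_type is a
    Table 1 abbreviation (hypothetical input format chosen for this sketch).
    """
    groups = defaultdict(list)
    for beat_type, segment in beats:
        groups[AAMI_CLASS_OF[beat_type]].append(segment)
    return dict(groups)


# Toy annotated beats (made-up sample values, not MIT-BIH data).
example = [
    ("NOR", [0.1, 0.9, 0.2]),
    ("PVC", [0.0, 1.4, -0.3]),
    ("LBBB", [0.2, 0.8, 0.1]),
]
grouped = group_beats_by_aami(example)
```

With the toy input, the two N-class beats (NOR and LBBB) end up in one group and the PVC beat in the V group; real annotation files use their own symbol set, which would need its own mapping.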
B. MACHINE LEARNING METHODS
In recent decades, traditional machine learning algorithms have been widely used in the classification of arrhythmia signals and have made remarkable achievements. Machine learning methods involve the complex processing steps of feature extraction, feature selection, and feature learning.
1) FEATURE EXTRACTION AND SELECTION
Feature extraction and selection are a pivotal part of ECG signal classification in traditional machine learning methods; they help obtain the most essential features of the signals and provide accurate features for the final classification. The main features of ECG signals include time-domain features (also known as waveform features), frequency-domain features, and statistical features [22].
Time-domain features mainly refer to physical parameters reflecting the activity regularity of the ECG signal, including the frequency and amplitude of each waveform, such as the P-wave, Q-wave, R-wave, S-wave, and T-wave, and interval information, such as the PR-interval, QT-interval, and RR-interval. The QRS-complex and RR-interval features are significant in the time domain; they mainly reflect the position, duration, amplitude, and shape of a specific waveform or deflection in the signals [29-30]. In addition, digital filters [31], neural networks [32], high-order moments [33], and phasor transforms [34] have also been used to detect the QRS-complex.
Frequency-based approaches are among the most popular feature extraction techniques for representing ECG signals [22]. Many researchers claim that the wavelet transform is the best approach for feature extraction and selection from ECG signals [35]. Among wavelet transforms, the discrete wavelet transform (DWT) is the most widely used in ECG signal classification. In addition to the DWT, the continuous wavelet transform (CWT) is also used to extract features from ECG signals; it overcomes the representation coarseness and instability of the DWT [36].
The main statistical features are the expectation, variance, maximum, minimum, standard deviation, and high-order moments of the ECG signal [24]. In general, these features provide an effective means of analyzing the complexity and distribution of waves in any time series. Therefore, for ECG recordings, these features help distinguish the variation process of particular patients and diseases [22].
In general, the above feature extraction and selection methods are implemented in machine learning classification algorithms. In this work, we introduce the deep learning approach into 1-D ECG signal classification. It is an end-to-end model with self-learning. The features are automatically extracted from the ECG signals by the convolutional neural network, so the hand-crafted feature extraction and selection process is unnecessary.
2) FEATURE LEARNING METHODS
These methods are summarized according to different types of classifiers, including statistical methods [37], decision tree classification models [38-39], neural network methods [39-40], and support vector machines (SVMs) [40].
For example, Li and Zhou [38] presented an approach to classify ECG signals using wavelet packet entropy (WPE) and random forests (RF) following the recommendations from AAMI. The experimental results showed that the WPE and RF methods are superior to several state-of-the-art competing methods. A. M. Alqudah [37] introduced a novel method to model cardiac-related biological signals (ECG and PPG) based on Gaussian mixture waves. The proposed method was applied to the MICIC and MIT-BIH arrhythmia databases.
Moreover, A. M. Alqudah et al. [39] used two classifier techniques, the probabilistic neural network (PNN) algorithm and the random forest (RF) algorithm, with Gaussian mixture and wavelet features to classify ECG beats into six classes: normal beat (N), left bundle branch block beat (LBBB), right bundle branch block beat (RBBB), premature ventricular contraction (PVC), atrial premature beat (APB), and aberrated atrial premature beat (AAP).
Hammad et al. [40] employed four support vector machines (SVMs), two neural networks (NNs), and a k-nearest neighbor (KNN) classifier to classify ECG signals. These algorithms extracted 13 features from each ECG segment and used them as the input of the proposed classifier. All the records of the MIT-BIH arrhythmia database were used to validate these algorithms.
In general, although the above methods have shown favorable classification performance, they also have numerous shortcomings. First, these automatic ECG signal classification models mainly depend on machine learning and pattern recognition. In the process, ECG signal segments are regarded as a sequence of stochastic patterns, and the hand-crafted feature extraction process requires burdensome computational resources and time. Second, in terms of classification algorithms and training datasets, the robustness of the classification models is still limited because they fail to handle large intra-class variations. In addition, the above algorithms are often subject to overfitting and show poor performance when validated on different datasets. Furthermore, the classifiers do not perform well in practical applications with varied ECG signals from different patients; a common disadvantage is inconsistent performance when classifying a new ECG record. This makes them less reliable clinically or in practice. Finally, recent ECG monitoring models still require experienced cardiologists for diagnosis, which also consumes a lot of time and energy.

C. DEEP LEARNING METHODS
Deep learning is a new technology that has become the mainstream in computer vision and pattern recognition. In the past few years, deep learning has been widely used in the fields of image classification [10-14], object detection [15-17], and image segmentation [18-21]. In recent years, deep learning-
based methods have been successfully applied to analyze ECG signals, overcoming the challenges of traditional machine learning-based methods.
For example, Kiranyaz et al. [41] presented a fast and accurate patient-specific ECG classification system for recognizing two types of signals: supraventricular ectopic beats (S) and ventricular ectopic beats (V). The model used three convolutional layers and two multi-layer perceptron layers to obtain the experimental result.
In addition, Jun et al. [42] proposed a deep neural network for the classification of premature ventricular contraction (PVC) beats. Acharya et al. [43] developed a 9-layer CNN model to automatically classify five classes of heartbeats. Murugesan et al. [6] also implemented three robust deep neural networks (DNNs) (CNN, LSTM, and CNN-LSTM) to detect two types of beats, premature ventricular contraction (PVC) and premature atrial contraction (PAC). The results showcased the potential of the network as a feature extractor for ECG signal classification.
Moreover, in [44], a CNN was transferred to carry out automatic ECG arrhythmia diagnostics after applying higher-order spectral algorithms. Transfer learning strategies were applied to pre-trained convolutional neural networks, namely AlexNet and GoogLeNet, to carry out the final classification.
Compared with traditional machine learning methods, the most critical feature of deep learning is that it does not require separate feature extraction and feature selection steps. Deep learning approaches have the ability to self-learn from input signals. In other words, the feature extraction and selection processes of machine learning are embedded in the deep learning model, which continuously learns features from the input data. However, the above deep learning methods also have some imperfections. The studies in [41], [42], and [6] addressed a two-class problem, which is simpler than the five-class problem in this work. In addition, [37] and [40] presented plain CNN models to extract features from ECG signals; the structure of a plain model is not conducive to extracting features from deep layers. Moreover, [43] proposed a 9-layer model, which is deep enough for feature extraction, but the model did not fully consider the imbalance between data classes, which may lead to overfitting. Additionally, the influence of different input signal lengths and of class imbalance in the original data on the model's performance has not been fully considered.
Broadly speaking, the fundamental disadvantages and challenges of existing machine learning methods for ECG signal detection and classification lie in hand-crafted feature extraction, which not only greatly affects the accuracy of the algorithm but also consumes considerable computation time and cost. The deep convolutional neural network is essentially realized by stacking automatic encoders. Its considerable feature representational power effectively reveals unknown abstract features of the input signals, and it achieves self-learning through end-to-end model design. Meanwhile, the fundamental problem of both kinds of methods is that they focus only on proposing a better model and do not pay attention to data processing issues such as data denoising, data augmentation, and multi-scale data training and testing. The data pre-processing of signals deserves attention because signals and images are different data types.
Hence, in this work, inspired by these previous efforts, a more accurate, comprehensive, and robust method based on deep learning is proposed to identify five different types of arrhythmia signals. The proposed method not only pays attention to the quality of the model design but also emphasizes the importance of data processing. The final results also show that ECG signal classification using the convolutional neural network is reliable. The deep learning architecture outperforms the hand-crafted feature extractors used by machine learning models in terms of classification accuracy, sensitivity, specificity, and confusion matrix.
The contributions of this work are as follows:
(1) We propose an end-to-end plain-CNN architecture and two MSF-CNN architectures (A and B) to replace the separate hand-crafted feature extraction, selection, and classification steps of machine learning methods. The plain-CNN is a baseline model, and MSF-CNN A and B are built on this baseline network. The approach significantly improves performance over recent state-of-the-art studies.
(2) Moreover, the signal processing problems are fully considered. We first design multi-scale input signals, with 251 samples (named set A) and 361 samples (named set B). This design can improve the generalization ability of the model by extracting multi-scale signal features. Signal denoising and data augmentation are also implemented in this paper. The data augmentation strategy is a major innovation of this paper; this problem has not received much attention in most previous ECG signal research.
(3) In particular, we present six sets of detailed ablation experiments on ECG signal classification and achieve excellent performance metrics. We also compare the results from our model with recent state-of-the-art methods. Additionally, detailed analysis and comparison are presented in this paper.

III. ECG DATABASE DESCRIPTION AND PRE-PROCESSING
Acquiring and processing the research data is crucial in our work. In this section, we first introduce the MIT-BIH Arrhythmia Database in detail, and then we describe the data pre-processing, including denoising, data segmentation, and data augmentation.

A. THE DESCRIPTION OF DATABASE
The MIT-BIH Arrhythmia Database (MITDB) [45] is an open-source PhysioBank database that is widely used to research the detection and classification of ECG signals. The
database consists of 48 half-hour ECG records obtained from 47 subjects, and each ECG record contains two leads (lead II and lead V) originating from different electrodes. Figure 1 shows an example of signals from the MITDB. Each ECG record lasts approximately 30 minutes, and the signal sampling frequency is 360 Hz. The subjects comprise 25 males aged 32 to 89 and 22 females aged 23 to 89. The Arrhythmia database is divided into 25 subjects with normal ECG recordings and 23 subjects with abnormal ECG recordings.
Figure 1. A 10 s signal example of MLII and V5 from MITDB. Each ECG record is approximately 30 minutes long and includes two leads. MLII in Figure 1 denotes the signal of lead II, and V5 denotes lead V.
In this paper, two-lead signals (lead II or MLII) are used to train, validate, and test the algorithm. In addition, all the signal records are independently annotated by at least two cardiologists. A total of 109,454 heartbeats are extracted in this work (shown in Table 2). The data directory contains the entire MIT-BIH arrhythmia data, which uses a custom format to save file length and storage space. An ECG record consists of three parts: a header file (.hea), a data file (.dat), and an annotation file (.atr).

B. DATA PRE-PROCESSING
We process the original raw data from the MIT-BIH arrhythmia database through a series of steps, namely denoising, data segmentation, and data augmentation, to form the new data sets, and finally train a network with stronger robustness and better generalization ability. The specific processes are as follows:
1) DENOISING
Denoising mainly eliminates power-line interference and the baseline wander caused by patient respiration or movement, both of which cause problems in detecting heart diseases. Baseline wander is a low-frequency noise signal, and median filtering is adopted to remove it. Power-line interference is an interfering voltage at an integer multiple of 50 Hz that completely masks the ECG waveform [4]. Power-line interference and high-frequency noise are usually removed by a low-pass filter. Considering this characteristic, the wavelet transform multi-resolution theory is first leveraged to decompose the noisy signal. Then, we take advantage of the different distributions of signal and noise in the spectrum to directly remove the detail components at the wavelet decomposition scales corresponding to the noise. Finally, the inverse wavelet transform is used to reconstruct the signal, which effectively removes the noise from it.
2) DATA SEGMENTATION
The denoised ECG signals are divided into 5 classes, normal (N), supraventricular ectopic beat (S), ventricular ectopic beat (V), fusion beat (F), and unknown beat (Q), according to the annotations from cardiologists, and these signals are fed into the classification network. A complete normal heartbeat is shown in Figure 2, covering an integrated rhythm from P-wave onset to T-wave offset (or U-wave onset). Because ECG signals of different lengths contain different amounts of feature information, data segmentation follows two strategies: 251 samples and 361 samples. The denoised ECG signals are segmented into a large number of heartbeats centered around the R-peak, excluding the first and last heartbeats of each record. In the first strategy, each heartbeat consists of 251 samples (60 samples before the R-peak and 190 samples after the R-peak), including an integrated P-, Q-, R-, S-, and T-peak. We refer to these 251-sample signals as set A. Likewise, the denoised signals are also segmented into heartbeats of 361 samples (120 samples before the R-peak and 240 samples after the R-peak). We refer to these 361-sample signals as set B.
3) DATA AUGMENTATION
Data augmentation is an important part of this work. Its main purpose is to balance the number of samples across the five classes (N, S, V, F, Q), which is more conducive to feature learning in deep neural networks. A total
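The two segmentation strategies described in the data segmentation step (60 + 1 + 190 = 251 samples for set A and 120 + 1 + 240 = 361 samples for set B, with the first and last beats left out) can be sketched as follows. This is an illustrative sketch only: the function name `segment_beats`, the synthetic signal, and the rule of dropping windows that run past the record are assumptions, and real R-peak positions would come from the database annotation files.

```python
def segment_beats(signal, r_peaks, before, after):
    """Cut one fixed-length window per R-peak.

    Each window holds `before` samples, the R-peak sample itself, and
    `after` samples, so its length is before + 1 + after.
    """
    segments = []
    # Skip the first and last annotated beats, as the text describes.
    for r in r_peaks[1:-1]:
        start, stop = r - before, r + after + 1
        if start < 0 or stop > len(signal):
            continue  # assumption: drop windows that run past the record
        segments.append(signal[start:stop])
    return segments


# Synthetic stand-in for a denoised record (not real MITDB data):
# a flat line with a unit spike at each hypothetical R-peak position.
signal = [0.0] * 2000
r_peaks = [100, 400, 700, 1000, 1300, 1600, 1900]
for r in r_peaks:
    signal[r] = 1.0

set_a = segment_beats(signal, r_peaks, before=60, after=190)    # 251-sample beats
set_b = segment_beats(signal, r_peaks, before=120, after=240)   # 361-sample beats
```

In each set-A window the R-peak sits at index 60, and in each set-B window at index 120, which keeps the peak at a fixed position for the network input.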
Equation (5) is the representation of residual learning when upgraded network based on the plain-CNN to verify the
the number of feature maps from the input and output is the processing ability of three parallel convolution kernels for
same. If the number of feature maps from the input and output ECG signals. As shown in Figure4 (b), the network mainly
is different, the convolution of 1 × 1 will be leveraged to includes one parallel group convolutional block, three
increase the dimension or decrease dimension. convolution layers, two max-pooling layers, one global
𝐻(𝑥) = 𝐹(𝑥) + ℎ(𝑥) (6) average-pooling layer, two full convolutional layers, and the
where ℎ(𝑥) is a convolution operation of 1 × 1 added in the corresponding BN, ReLU, and dropout. The datasets are first
shortcut connection. divided into two subsets (set A and set B) according to the
In addition to solving the degradation problem by different length of ECG signals and fed into three different
optimizing the residual function, residual learning can also parallel convolution kernels (1×7, 1×5, 1×3). The three
effectively reduce gradient dispersion. outputs are then concatenated. This strategy can enable the
When the layer of network becomes deep, the gradient back network model to learn the hierarchical feature information
propagation is as follows. from different spaces, and finally obtain more continuous and
𝜕𝐿𝑜𝑠𝑠 𝜕𝐹𝑁 (𝑋𝐿𝑁 ,𝑊𝐿𝑁 ,𝑏𝐿𝑁 ) 𝜕𝐹2 (𝑋𝐿2 ,𝑊𝐿2 ,𝑏𝐿2 )
better representation. Then it is followed by the BN and ReLU
= ∗⋯∗ (7) layers. The trick of BN relieves overfitting, and ReLU
𝜕𝑥1 𝜕𝑋𝐿 𝜕𝑋1
During the backpropagation of this gradient value, if 𝑁 is increases nonlinear expression. The first two convolutional
large, the gradient value will decrease as it propagates to the blocks contain a convolutional layer, max-pooling, BN and
first few layers, and the gradient may disappear when it is ReLU, and the last convolutional blocks are connected to a
deeper in the deep neural network. However, residual learning global max-pooling layer. The two fully connected layers are
solves this problem at the level of the neural network structure. followed by BN, ReLU, and dropout operations. The MSF-
The gradient back propagation is as follows when the residual CNN A is mainly introduced three parallel convolution
learning is utilized in the model. kernels to fully extract the feature from set A and set B.
Finally, we design another multi-scale fusion CNN
𝜕𝐿𝑜𝑠𝑠
=
𝜕𝑋𝐿 +𝐹(𝑋𝐿 ,𝑊𝐿 ,𝑏𝐿)
=1+
𝜕𝐹2 (𝑋𝐿 ,𝑊𝐿 ,𝑏𝐿 )
(8) architecture B (MSF-CNN B, in Figure 4 (c)) based on the
𝜕𝑥1 𝜕𝑋𝐿 𝜕𝑋𝐿 MSF-CNN A, which is inspired by VGGNets [50] and ResNet
Hence, even with deep network layers, gradient dispersion [10]. The MFS-CNN B is upgraded network based on the
will be effectively contained. MFS-CNN A to verify processing ability of the concatenation
group convolution blocks and residual learning blocks for
C. THE PROPOSED NETWORK ARCHITECTURE
ECG signals. The architecture includes one parallel group
The design of the network mainly relies on the six computing units mentioned above. In this work, we design three network architectures (plain-CNN, MSF-CNN A, and MSF-CNN B) built from highly modularized blocks and inspired by VGG, which was published as a conference paper at ICLR 2015 [50]. VGG is a mature deep neural network that has been proven to effectively solve various problems in the field of computer vision.
As shown in Figure 4 (a), the plain-CNN, a baseline network, is a simple CNN architecture used to verify the processing ability of a 1-D CNN for ECG signals. It includes three convolution layers, two fully connected layers, and the corresponding nonparametric layers (pooling layer, batch normalization layer, ReLU layer, and softmax layer). The input signals of set A and set B are fed directly into the first convolution layer. Each of the first two convolution layers is followed by a max-pooling layer, a batch normalization (BN) layer, and a ReLU layer. The last convolution layer is followed by global average pooling. The fully connected layer is followed by a BN layer, a ReLU layer, and a dropout layer. The plain-CNN is an ordinary multi-layer convolution network.
In addition, we propose a multi-scale fusion CNN architecture A (MSF-CNN A, in Figure 4 (b)) that integrates different spatial features by using one parallel group convolutional block (1×7, 1×5, and 1×3). The MSF-CNN A is [...]
[...] convolutional block (1×7, 1×5, and 1×3) as the MSF-CNN A, 7 convolution layers, two residual learning blocks, two max-pooling layers, one global average pooling layer, and two fully connected layers. The parallel group convolution block is the same as that of the MSF-CNN A. The difference between networks A and B is that, in the deep layers of MSF-CNN B, two or three convolutional layers sharing the same number of filters are grouped together (named the concatenation group convolution block), and the concatenation group convolution blocks are separated by max-pooling layers. Therefore, one parallel group convolutional block and two concatenation group convolutional blocks constitute the entire convolutional part of MSF-CNN B, and the global average pooling layer follows the third concatenation group convolutional block. Most importantly, we implement the residual learning block to avoid the degradation problem described above. The concatenation group convolution blocks and residual learning blocks are a vital innovation of this model.
In training, the fully connected layer is replaced by a fully convolutional layer in the network. Because the output of the convolutional layer maintains the spatial locality between the feature signals, the input size of the ECG signals is not limited. Additionally, this conversion greatly reduces the number of parameters that need to be trained, and it can also provide a better effect. The corresponding function is shown in equation (9).
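The parallel group convolutional block described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes a single input channel, one filter per branch, zero padding so that every branch keeps the input length, and hypothetical moving-average kernels in place of learned 1×3, 1×5, and 1×7 filters. The names `conv1d_same`, `parallel_group_block`, and `KERNELS` are introduced here for illustration only.

```python
from typing import List, Sequence

Signal = List[float]  # one channel, one value per time step


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> Signal:
    """1-D cross-correlation with zero padding so the output keeps len(x)."""
    k = len(kernel)
    pad = k // 2  # odd kernel sizes (3, 5, 7) preserve the length
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(x))
    ]


def parallel_group_block(
    x: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[Signal]:
    """Apply every kernel to the same input and stack the results as channels."""
    return [conv1d_same(x, kernel) for kernel in kernels]


# Assumption: moving-average kernels stand in for learned 1x3, 1x5, 1x7 filters.
KERNELS = [[1.0 / size] * size for size in (3, 5, 7)]

# Toy impulse, not real ECG data.
toy_beat = [0.0] * 5 + [1.0] + [0.0] * 5
toy_features = parallel_group_block(toy_beat, KERNELS)  # 3 channels x 11 samples
```

Each branch sees the same input at a different receptive field, and the concatenated channels are what a following layer would consume.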
Figure 4. Example network architecture. (a): the plain network as a reference. (b): the MSF-CNN architecture A. (c): the MSF-CNN architecture B. In each panel, the ECG signal passes through denoising, pre-processing, and data segmentation before it is fed into the network. Table 2 shows more details and other variants.
$w_i$ describes the weight of the $i$-th layer.
B. EVALUATION METRICS
For the evaluation, four standard metrics, namely accuracy, sensitivity (also known as recall), specificity (also known as the true negative rate), and the confusion matrix, are used to evaluate the classification performance of the plain-CNN, MSF-CNN A, and MSF-CNN B. Accuracy is defined as the ratio of the number of correct predictions (positive samples classified as positive and negative samples classified as negative) to the total number of predictions. Sensitivity is the proportion of all positive cases that are identified as positive, and it measures the model's ability to detect positives. Specificity is the proportion of all negative cases that are identified as negative, and it measures the model's ability to detect negatives. Sensitivity and specificity are two commonly used criteria in medical classification tasks. These metrics are defined in equations (12), (13), and (14):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (12)$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (13)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (14)$$
TP (true positive) is the number of positive samples correctly identified as positive, TN (true negative) is the number of negative samples correctly identified as negative, FP (false positive) is the number of negative samples mistaken for positive samples, and FN (false negative) is the number of positive samples mistaken for negative samples. Because the categories differ greatly in size, sensitivity and specificity are more relevant performance criteria in arrhythmia detection than accuracy.
In addition, the confusion matrix, an important standard for judging the performance of a multi-class model, is used to validate the performance of the proposed model. In the confusion matrix, the greater the number of true positive and true negative cases, the better the model's performance. Likewise, the fewer the false positive and false negative cases, the better the overall performance of the model.
C. PERFORMANCE COMPARISON AND DISCUSSION
In this section, we implement six groups of ablation experiments to analyze the performance of the models. First, we carry out a set of experiments to compare the effects of different signal lengths (set A and set B) on our models' performance. Second, we show how performance changes when the data augmentation method is used in the training process. Third, we conduct a set of experiments to demonstrate the effect of denoising in data pre-processing. Fourth, we design an experiment to verify the effect of the residual learning network. Fifth, a convergence analysis experiment validates our models' convergence ability. Finally, the confusion matrix is used to analyze the performance on each class of signals. The six groups of experiments are discussed in detail as follows.
1) SET A VS. SET B
We design a set of experiments to verify the effect of set A and set B on the three models in the first phase. Every heartbeat includes 251 samples in set A and 361 samples in set B. Figure 5 presents the performance trends of the two datasets on the three models. According to Figure 5, the accuracy curves of the three models (plain-CNN, MSF-CNN A, MSF-CNN B) indicate that the accuracy on set B is slightly better than on set A, mainly because each heartbeat from set B includes more samples than in set A, so the models can learn richer feature information. The overall average classification performances (accuracy, sensitivity, and specificity) for set A and set B on the three models are shown in Table 4. In set A, the average accuracies of the three networks are 83.15%, 86.40%, and 89.17%, respectively. The result of MSF-CNN A is 3.25 percentage points higher than that of the plain-CNN in set A. Additionally, the result of MSF-CNN B without residual learning is 2.77 percentage points higher than that of MSF-CNN A in set A. In set B, the accuracies of the three models likewise differ by 4.42 and 2.78 percentage points, respectively. In addition, a sensitivity of 75.90% and a specificity of 87.64% are obtained in this experiment from set B. These values are lower than the metrics from the plain-CNN and MSF-CNN A in set B without residual learning; however, they are higher than the metrics from the three models in set A. Data imbalance may cause this problem: as Table 2 shows, the number of instances in each category differs considerably without data augmentation. Overall, the results also suggest that the parallel group convolutional block in MSF-CNN A and B and the concatenation group convolution block in MSF-CNN B without residual learning have an important effect on the performance improvement of the proposed models. In theory, longer ECG records cover more heartbeat rhythm information, which leads to better classification performance. Thus, in the following experiments, we use the data from set B for the ablation analysis.
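To make the difference between the two heartbeat lengths concrete, the following is a minimal sketch of fixed-length segmentation. Only the window lengths (251 and 361 samples) come from the text. How the window is positioned is an assumption here: the sketch centres each window on a given R-peak index and skips beats whose window would cross the record boundary. The function name `segment_beats` and its parameters are hypothetical.

```python
from typing import List, Sequence

SET_A_LENGTH = 251  # samples per heartbeat in set A (from the text)
SET_B_LENGTH = 361  # samples per heartbeat in set B (from the text)


def segment_beats(
    signal: Sequence[float], r_peaks: Sequence[int], length: int
) -> List[List[float]]:
    """Cut one fixed-length window per R peak.

    Assumption: the window is centred on the R peak, and beats whose
    window would run past either end of the record are skipped.
    """
    half = length // 2  # odd lengths give half samples on each side of the peak
    beats = []
    for peak in r_peaks:
        start, stop = peak - half, peak + half + 1
        if start >= 0 and stop <= len(signal):
            beats.append(list(signal[start:stop]))
    return beats
```

Under these assumptions, a set B window covers 55 more samples on each side of the peak than a set A window, which is the extra context the paragraph above credits for the higher accuracy.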
2) DATA AUGMENTATION VS. WITHOUT DATA AUGMENTATION
In the second phase, we set up a set of experiments to analyze the impact of data augmentation on the models. The data used in this experiment are from set B. The data augmentation strategy is implemented as described in Section III.B, and the total number of heartbeats increases to 331,055 after data augmentation (shown in Table 2). In Figure 6, we compare the performances of the three proposed network architectures with and without data augmentation in set B. As seen in Figure 6, the models with data augmentation perform dramatically better than the models without it. Table 5 shows detailed evaluation metrics of the model predictions. The average accuracies on set B are 92.81%, 95.48%, and 95.96% with data augmentation for the three models. These results are 7.58, 5.83, and 3.53 percentage points higher than those of the three models without data augmentation. In addition, with data augmentation, the independent performance assessment of MSF-CNN B without residual learning yields a sensitivity and specificity of 96.58% and 92.67%, respectively. These are better than the metrics from the plain-CNN and MSF-CNN A with data augmentation, and superior to the results of the three models without data augmentation. The experiment confirms that data augmentation dramatically improves the classification performance on ECG signals and also helps to balance the dataset. Therefore, we adopt set B with data augmentation to perform the following experiments.
Figure 6. The accuracy plot of set B. (a): the result of the plain-CNN. (b): the result of MSF-CNN A. (c): the result of MSF-CNN B (w/o residual learning).
TABLE 5. THE AVERAGE CLASSIFICATION RESULTS FOR SET B WITH DATA AUGMENTATION ON THE PROPOSED THREE MODELS
Network | Set B (361 samples, without data augmentation): Acc. (%) | Se. (%) | Sp. (%) | Set B (361 samples, with data augmentation): Acc. (%) | Se. (%) | Sp. (%)
3) DENOISING VS. WITHOUT DENOISING
In this experiment, we analyze the impact of denoising on the models. The data used are from set B with data augmentation. As shown in Figure 7, the models with denoising perform slightly better than the models without denoising. The detailed classification measures are reported in Table 6. The average accuracies on set B are 93.41%, 96.38%, and 97.03% with denoising for the three models without residual learning, respectively. These results are 0.6, 0.9, and 1.07 percentage points higher than those of the three models without denoising. Moreover, compared with all the other models, very high sensitivity (94.43%) and specificity (96.41%) are obtained in this experiment. It is necessary to emphasize that the data augmentation strategy is implemented in this experiment. It is clear that the denoising technique has an influence on the performance of the models.
TABLE 6 THE AVERAGE CLASSIFICATION RESULTS FOR SET B WITH DATA AUGMENTATION AND DENOISING ON THE PROPOSED THREE MODELS
4) RESIDUAL LEARNING VS. WITHOUT RESIDUAL LEARNING
Next, we evaluate the effect of the residual learning block on MSF-CNN B with augmentation and denoising on set B. The baseline network is the MSF-CNN B above, without the residual learning block. The MSF-CNN B with residual learning adds a shortcut connection to each pair of 1×3 convolutions, as in Figure 4 (c). We make two major observations from Table 6 (the last row) and Figure 8. First, accuracy improves with residual learning: the MSF-CNN B with residual learning outperforms the version without it by 0.97 percentage points. Second, and most importantly, the sensitivity and specificity are also excellent and stable. This indicates that the residual learning block dramatically enhances the optimization efficiency by providing faster convergence at the early stage.
Figure 8. The accuracy plot of set B with denoising. The blue solid line denotes MSF-CNN B without residual learning, and the red solid line denotes MSF-CNN B with residual learning.
5) CONVERGENCE ANALYSIS
Then, we obtain the loss details during the training and validation processes. Figure 9 illustrates the loss curve of set B on MSF-CNN B without the residual learning block, and Figure 10 shows the result of set B on MSF-CNN B with the residual learning block. As shown in Figures 9 and 10, the model with residual learning converges better than the model without residual learning. In addition, these results show that the model converges after between 60 and 100 epochs during training and between 80 and 100 epochs during validation. Hence, 100 epochs are used in this experiment to ensure full convergence of the model and reduce overfitting. Moreover, the model with residual learning converges faster.
Figure 9. Training and validation loss function of set B on MSF-CNN B without residual learning over the epochs.
Figure 10. Training and validation loss function of set B on MSF-CNN B with residual learning over the epochs.
6) CONFUSION MATRIX ANALYSIS
Finally, in addition to evaluating the per-class performance of the model with the residual learning block, we also assessed confusion matrices of ECG heartbeats (Tables 7 and 8). They show the accuracy, sensitivity, and specificity of each class. Table 7 shows the confusion matrix from the MSF-CNN B without a residual learning block. Table 8 shows the confusion matrix from the model with the residual learning block. According to Table 7, on average less than 1.12% of the ECG heartbeats are wrongly classified across all 10 folds when the model does not use a residual learning block. Likewise, for the model with the residual learning block, less than 1.00% of the ECG heartbeats are wrongly classified across all 10 folds. The minimal sensitivities recorded for both models are attributed to the detection of class F and are 92.25% and 92.32%, respectively. The minimal specificity for the model without the residual learning block is attributed to the detection of class Q and is 95.33%. The minimal specificity for the model with the residual learning block is 96.81%, attributed to the detection of class V. The results also demonstrate that the residual learning block has a positive impact on the performance of the model.
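The shortcut connection evaluated in the residual learning experiment can be sketched in a few lines of pure Python. This is an illustrative simplification rather than the authors' network: it assumes a single channel, omits batch normalization, uses zero padding so the shortcut and the convolution output have the same length, and takes the two 1×3 kernels as plain lists. The names `relu`, `conv1x3`, and `residual_pair` are introduced here for illustration.

```python
from typing import List, Sequence


def relu(x: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def conv1x3(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1x3 cross-correlation; the output length equals len(x)."""
    padded = [0.0] + list(x) + [0.0]
    return [
        sum(padded[i + j] * kernel[j] for j in range(3))
        for i in range(len(x))
    ]


def residual_pair(
    x: Sequence[float], k1: Sequence[float], k2: Sequence[float]
) -> List[float]:
    """Two 1x3 convolutions with an identity shortcut added before the last ReLU."""
    out = relu(conv1x3(x, k1))
    out = conv1x3(out, k2)
    # Shortcut: add the block input to the output of the second convolution.
    return relu([a + b for a, b in zip(out, x)])
```

With all-zero kernels the block reduces to `relu(x)`, which shows why a residual pair can fall back to (nearly) an identity mapping instead of degrading the signal.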
TABLE 7 A CONFUSION MATRIX OF ECG HEARTBEATS WITHOUT RESIDUAL LEARNING BLOCK ACROSS ALL 10-FOLDS
True \ Predicted | N | S | V | F | Q | Acc (%) | Sen (%) | Spe (%)
N | 87904 | 511 | 1036 | 926 | 218 | 98.49 | 97.03 | 98.05
S | 36 | 54579 | 427 | 105 | 473 | 99.98 | 98.13 | 97.24
TABLE 8 A CONFUSION MATRIX OF ECG HEARTBEATS WITH RESIDUAL LEARNING BLOCK ACROSS ALL 10-FOLDS
True \ Predicted | N | S | V | F | Q | Acc (%) | Sen (%) | Spe (%)
N | 88837 | 267 | 48 | 526 | 917 | 99.46 | 98.06 | 97.68
S | 43 | 54930 | 26 | 497 | 124 | 97.52 | 98.76 | 99.68
V | 544 | 103 | 70903 | 267 | 533 | 99.41 | 98.00 | 96.81
F | 97 | 233 | 307 | 31438 | 5 | 99.32 | 92.32 | 99.46
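Per-class accuracy, sensitivity, and specificity can be derived from a multi-class confusion matrix by treating each class in turn as the positive class (one-vs-rest) and applying equations (12) to (14). The sketch below assumes that rows are true classes and columns are predicted classes, and that every class has at least one true sample and at least one sample from another class. The function name `per_class_metrics` and the toy matrix are hypothetical and are not data from the paper.

```python
from typing import Dict, List, Sequence


def per_class_metrics(cm: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """One-vs-rest metrics; cm[i][j] counts true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                 # class k predicted as another class
        fp = sum(row[k] for row in cm) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total,   # equation (12)
            "sensitivity": tp / (tp + fn),   # equation (13)
            "specificity": tn / (tn + fp),   # equation (14)
        })
    return results


# Toy 3-class matrix for illustration only.
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
toy_metrics = per_class_metrics(toy_cm)
```

As a check against the reported numbers, the N row of Table 7 gives a sensitivity of 87904 / (87904 + 511 + 1036 + 926 + 218) ≈ 97.03%, which matches the value listed in that row.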
different types of ECG heartbeats. The plain-CNN is a baseline network with multiple convolution layers, a simple CNN architecture used to verify the processing ability of a 1-D CNN for ECG signals. The MSF-CNN A is proposed to improve the learning ability of the plain-CNN. It is an upgraded network based on the baseline that adds a parallel group convolution block (three different convolution kernels of 1×7, 1×5, and 1×3) to verify the processing ability of three parallel convolution kernels for ECG signals. Finally, the MSF-CNN B improves on the MSF-CNN A by implementing a residual learning block with three concatenation group convolution blocks to promote the performance of the model. It is an upgraded network based on the MSF-CNN A to verify the processing ability of the concatenation group convolution blocks and residual learning blocks for ECG signals.
The three proposed models are trained and tested with the public MIT-BIH arrhythmia database on five types of signals: N, S, V, F, and Q. Six groups of ablation experiments are also conducted to analyze the performances of these models. The best model, MSF-CNN B with residual learning and group convolution blocks (including the parallel and concatenation group convolution blocks), achieves an average accuracy, sensitivity, and specificity of 98.00%, 96.17%, and 96.38% in set B. In addition, multi-scale data, data augmentation, and denoising also have an important effect on the training of the three models in our experiments.
Therefore, our proposed deep neural network algorithm (MSF-CNN B) shows the potential of a deep learning-based approach for feature extraction on the MIT-BIH arrhythmia database. As is evident from these results, the proposed approach is an efficient automatic cardiac arrhythmia classification method and provides a reliable recognition system based on well-established CNN architectures instead of training a deep CNN from scratch. It has the potential to provide accurate ECG signal classification in clinical practice.
In future work, we would like to introduce more clinical diagnosis data to test the proposed model. Additionally, the temporal (heartbeats) and spatial (spectrogram) signal features will be combined to improve the performance metrics of the models. We would also like to determine the severity grades of patients with chronic heart diseases by the detection and classification of ECG signals, which may represent normal, abnormal, and potentially life-threatening cardiac electrical activity conditions.
Specifically, compared with the self-organizing structural size method [63-65], it is difficult to quickly determine the optimal structure of a deep convolutional neural network for a specific application. Hence, we will propose a new method that combines self-organizing maps and convolutional neural networks for ECG signal research in future work. Moreover, we will try to propose a new method that combines optimization approaches [66-68] with convolutional neural networks for ECG signal research. This new method will focus on the following aspects:
(1) Real-world constraints must be considered in the new model. We will apply theoretical research results to a specific field or a specific product.
(2) It is worth designing an adaptive parameter system to improve the robustness of the optimization model.
(3) We will consider the imbalanced data classification problem and sufficient prior knowledge. The dendritic neuron model [69] and the evolutionary cost-sensitive approach [70] will provide new ideas for future work.
TABLE 9 A SUMMARY OF SELECTED WORKS FOR AUTOMATIC ARRHYTHMIA CLASSIFICATION OF ECG SIGNALS FROM THE DATABASE OF MIT-BIH ARRHYTHMIA
Literature and Time | Main Work | Database | Approach | Accuracy (Correct recognition rate) (%) | Sensitivity (Recall) (%) | Specificity (positive predictivity) (%)
2017 [51] | Five classification (N, S, V, F, Q) | MIT-BIH arrhythmia database | ML and DL: MFSWT; DNN | 97.50 | / | /
2018 [52] | Two classification (S, V) | MIT-BIH arrhythmia database | ML: PSO optimized least-square twin SVM | 89.90 | 80.80(S), 82.20(V) | 96.70(S), 99.00(V)
2019 [53] | Ten classification | Chinese Cardiovascular Disease Database | ML: PPNN | 74.16 | 75.23 | 73.92
2019 [54] | The ventricular ectopic beat detection | MIT-BIH arrhythmia database | DL: 1D-CNN | 95.50 | 85.80 | 64.50
2019 [55] | Five classification (N, S, V, F, Q) | MIT-BIH arrhythmia database | DL: DRNNs based on BGRU | 98.40 | / | /
2019 [56] | Seven classification | Chinese Cardiovascular Disease Database | DL: Parallel GRU RNN | 95.98 | / | /
2019 [57] | Six classification (Normal, L, R, V, A, P) | MIT-BIH arrhythmia database | ML: KNN | 97.70 | / | /
2019 [58] | Two-classification | MIT-BIH arrhythmia database | DL: CNN | 94.70 | 77.30(S), 93.70(V) | 97.70(S), 98.80(V)
2019 [59] | Five classification | MIT-BIH arrhythmia database | DL: CNN of STFT-Based Spectrogram | 99.00 | / | /
2020 [60] | Multi-classification | Chinese Cardiovascular Disease Database | DL: MTGBi-LSTM | 88.86 | 94.19 | /
2020 [61] | Two-classification | Personal Wearable Devices | DL: LSTM-RNN | 99.20(VEB), 98.30(SVEB) | 93.00(VEB), 66.90(SVEB) | 99.80(VEB), 99.80(SVEB)
2020 [62] | Five classification | MIT-BIH arrhythmia database | ML and DL: LSTM, SVM | 99.45 | 98.63 | 99.66
This paper | Five classification (N, S, V, F, Q) | MIT-BIH arrhythmia database | DL: Plain-CNN, MSF-CNN A, MSF-CNN B | 93.41(Plain-CNN), 96.38(MSF-CNN A), 98.00(MSF-CNN B) | 87.61(Plain-CNN), 91.82(MSF-CNN A), 96.17(MSF-CNN B) | 89.73(Plain-CNN), 92.58(MSF-CNN A), 96.38(MSF-CNN B)
Abbreviations:
Heartbeat types: S: Supraventricular ectopic beat; V: Ventricular ectopic beat; F: Fusion beat; Q: Unknown beat; N: any heartbeat not in the S, V, F, Q classes or normal beat; PVC: Premature ventricular contraction beat; PAC: Premature atrial contraction beat; L: Left bundled branch blocks; R: Right bundled branch blocks; V: Premature ventricular contractions; A: Atrial premature beats; P: Paced beats; VEB: Ventricular ectopic beats; SVEB: Supraventricular ectopic beats.
Approaches: ML: Machine learning; DL: Deep learning; SVM: Support vector machine; DNN: Deep neural network; CNN: Convolutional neural network; MFSWT: Slice wavelet transform; PSO: Particle swarm optimization; PPNN: Probabilistic process neural network; DRNNs: Deep recurrent neural networks; BGRU: Bidirectional gated recurrent unit; KNN: k-Nearest Neighbor; MTG: Multi-Task Group.
REFERENCES
[1] National Heart Lung and Blood Institute, Types of Arrhythmias, 2011 [Online]. Available: https://www.nhlbi.nih.gov/health/health-topics/topics/arr/types. (Accessed 5 July 2017).
[2] U.R. Acharya, J.S. Suri, J.A.E. Spaan, S.M. Krishnan, Advances in Cardiac Signal Processing, 2007.
[3] Arrhythmia irregular heartbeat center, Heart Disease and Abnormal Heart Rhythm (Arrhythmia), 2017. [Online]. Available: https://www.medicinenet.com/arrhythmia_irregular_heartbeat/article.htm.
[4] S.M. Mathews, K. Chandra, K.E. Barner, “A novel application of deep learning for single-lead ECG classification,” Computers in Biology and Medicine, vol. 99, pp. 53–62, Jun. 2018.
[5] S. Preejith, R. Dhinesh, J. Joseph, and M. Sivaprakasam, “Wearable ECG platform for continuous cardiac monitoring,” in Engineering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual International Conference of the. IEEE, pp. 623–626, Oct. 2016.
[6] B. Murugesan, V. Ravichandran, and K. Ram, “ECGNet: Deep Network for Arrhythmia Classification,” IEEE Instrumentation and Measurement Society, pp. 623–626, Jun. 2018.
[7] American National Standards Institute, Testing and Reporting Performance Results of Cardiac Rhythm and ST Segment Measurement Algorithms, 2012.
[8] R.J. Martis, U.R. Acharya, H. Adeli, “Current methods in electrocardiogram characterization,” Comput. Biol. Med, vol. 48, no. 1, pp. 133-149, May 2014.
[9] U.R. Acharya, S.L. Oh, Y. Hagiwara, “A Deep Convolutional Neural Network Model to Classify Heartbeats,” Computers in Biology and Medicine, vol. 89, no. 1, pp. 389-396, Oct. 2017.
[10] K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun, “Deep Residual Learning for Image Recognition,” In CVPR, 2017.
[11] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. Presented at CVPR, 2015. [Online]. Available: arxiv.org/abs/1409.1556.
[12] Gao Huang, Zhuang Liu, Kilian Q. Weinberger. Densely Connected Convolutional Networks. Presented at CVPR, 2016. [Online]. Available: arxiv.org/abs/1608.06993.
[13] G. Cai, Y. Wang, L. He, and M. Zhou, “Unsupervised Domain Adaptation with Adversarial Residual Transform Networks,” IEEE Transactions on Neural Networks and Learning Systems, DOI: TNNLS.2019.2935384, online 2019.
[14] T. D. Pham, K. Wardell, A. Eklund and G. Salerud, “Classification of short time series in early Parkinson's disease with deep learning of fuzzy recurrence plots,” IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 6, pp. 1306-1317, November 2019.
[15] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Presented at CVPR, 2015. [Online]. Available: arXiv:1506.01497.
[16] J. Dai, Y. Li, K. He, and J. Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. Presented at NIPS, 2016. [Online]. Available: arXiv:1605.06409.
[17] Y. Tian, X. Li, K. Wang and F. Wang, “Training and testing object detectors with virtual images,” IEEE/CAA Journal of Automatica Sinica, vol. 5, no. 2, pp. 539-546, Mar. 2018.
[18] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., Wei, Y. Deformable Convolutional Networks. Presented at CVPR, 2017. [Online]. Available: arXiv:1703.06211.
[19] Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. Rethinking Atrous Convolution for Semantic Image Segmentation. Presented at CVPR, 2017. [Online]. Available: arXiv:1706.05587.
[20] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia. Pyramid Scene Parsing Network. Presented at CVPR, 2017. [Online]. Available: arXiv:1612.01105, 2017.
[21] K. He, G. Gkioxari, P. Dollar, and R. Girshick. Mask R-CNN. Presented at ICCV, 2017. [Online]. Available: arXiv:1703.06870.
[22] S.K. Berkaya, A. K. Uysal, E.S. Gunal, “A survey on ECG analysis,” Biomedical Signal Processing and Control, vol. 43, pp. 216-235, May 2018.
[23] M.A. Awal, S.S. Mostafa, M. Ahmad, M.A. Rashid, “An adaptive level dependent wavelet thresholding for ECG denoising,” Biocybern Biomed. Eng, vol. 34, no. 4, pp. 238-249, Mar. 2014.
[24] L.Q. Wang, “Research on ECG Waveform Detection and Arrhythmia Classification[D],” Hebei University of Technology, Tianjin, 2014.
[25] N. Sen, C. Chandrakar, “Development of a Novel ECG signal Denoising System Using Extended Kalman Filter,” IJAREEIE, vol. 3, pp. 6896-6901, 2014.
[26] B.S. Gayal, F.I. Shaikh, “Denoising of ECG signal using undecimated wavelet transform,” IJAREEIE, vol. 3, pp. 7200-7208, 2014.
[27] R. Rodrigues, P. Couto, “A Neural Network Approach to ECG Denoising,” Available online: arXiv:1212.5217, 2012.
[28] Md. A. Kabir, C. Shahnaz, “Denoising of ECG signals based on noise reduction algorithms in EMD and wavelet domains,” Biomedical Signal Processing and Control, vol. 7, pp. 481-489, 2012.
[29] Y.C. Yeh, W.J. Wang, C.W. Chiou, “Cardiac arrhythmia diagnosis method using linear discriminant analysis on ECG signals,” Measurement, vol. 42, no. 5, pp. 778-789, Jun. 2009.
[30] Y.C. Yeh, W.J. Wang, C.W. Chiou, “Feature selection algorithm for ECG signals using Range-Overlaps Method,” Expert Syst. Appl, vol. 37, no. 4, pp. 3499-3512, Apr. 2010.
[31] V.X. Afonso, W.J. Tompkins, T.Q. Nguyen, S. Luo, “ECG beat detection using filter banks,” IEEE Trans. Biomed. Eng., vol. 46, pp. 192–202, 1999.
[32] B. Abibullaev, H.D. Seo, “A new QRS detection method using wavelets
[36] E. J. da S. Luz, W. R. Schwartz, G. C. Chavez, et al., “ECG-based heartbeat classification for arrhythmia detection: A survey,” Computer Methods and Programs in Biomedicine, vol. 127, pp. 144-164, 2015.
[37] A. M. Alqudah, “An enhanced method for real-time modelling of cardiac related biosignals using Gaussian mixtures,” Journal of medical engineering & technology, vol. 41, no. 8, pp. 600-611, Oct. 2017.
[38] T.Y. Li, and M. Zhou, “ECG Classification Using Wavelet Packet Entropy and Random Forests,” Entropy, vol. 18, no. 8, pp. 285-300, Aug. 2016.
[39] A. M. Alqudah, I. Abuqasmieh, A. Badarneh and H. Alquran, “Developing of robust and high accurate ECG beat classification by combining Gaussian mixtures and wavelets features,” Australasian physical & engineering sciences in medicine, vol. 42, no. 1, pp. 149-157, Jan. 2019.
[40] M. Hammad, A. Maher, K. Q. Wang, F. Jiang, and M. Amrani, “Detection of Abnormal Heart Conditions Based on Characteristics of ECG Signals,” Measurement, vol. 125, pp. 634-644, Sep. 2018.
[41] S. Kiranyaz, T. Ince, M. Gabbouj, “Real-time patient-specific ECG classification by 1-D convolutional neural networks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, Mar. 2016.
[42] T. J. Jun, H. J. Park, Y. H. Kim, “Premature ventricular contraction beat detection with deep neural networks,” in 15th IEEE International Conference on Machine Learning and Applications, pp. 859–864, 2016.
[43] U. R. Acharya, S. L. Oh, Y. Hagiwara, et al., “A Deep Convolutional Neural Network Model to Classify Heartbeats,” Computers in Biology and Medicine, vol. 89, pp. 389-396, 2017.
[44] H. Alquran, A. M. Alqudah, I. Abu-Qasmieh, Al-Badarneh, S. Almashaqbeh, “ECG classification using higher order spectral estimation and deep learning techniques,” Neural Network World, vol. 29, no. 4, pp. 207-219, Aug. 2019.
[45] A.L. Goldberger, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals,” Circulation 101 (23) (2000) e215–e220.
[46] J. Schmidhuber, “Deep Learning in neural networks: an overview,” Neural Netw, vol. 61, pp. 85-117, Jan. 2015.
[47] A. Krizhevsky, I. Sutskever, Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Network,” NIPS Curran Associates Inc, 2012.
[48] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” Proceedings of the 27th international conference on machine learning (ICML-10), pp. 807–814, 2010.
[49] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” In CVPR, 2015.
[50] K. Simonyan, A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” In ICLR, 2015.
[51] K. Luo, J. Li, Z. Wang, A. Cuschieri, “Patient-specific deep architectural model for ECG classification,” J. Healthcare Eng., vol. 2017, May 2017.
[52] S. Raj, K. C. Ray, “Sparse representation of ECG signals for automated recognition of cardiac arrhythmias,” Expert Systems with Applications, vol. 105, pp. 49–64, Sep. 2018.
[53] N. D. Feng, S. H. Xu, Y. Q. Liang, K. Liu, “A Probabilistic Process Neural Network and Its Application in ECG Classification,” IEEE Access, vol. 7, pp. 50431-50439, Apr. 2019.
[54] A. A. S. León, J. R. N. Alvarez, “1D Convolutional Neural Network for Detecting Ventricular Heartbeats,” IEEE Latin America Transactions, vol. 17, no. 12, pp. 1970-1977, Dec. 2019.
[55] H. M. Lynn, S. B. Pan, P. Kim, “A Deep Bidirectional GRU Network Model for Biometric Electrocardiogram Classification Based on Recurrent Neural Networks,” IEEE Access, vol. 7, pp. 145395-145405, Sep. 2019.
[56] S. H. Xu, J. J. Li, K. Liu, L. Wu, “A Parallel GRU Recurrent Network Model and Its Application to Multi-Channel Time-Varying Signal Classification,” IEEE Access, vol. 7, pp. 118739-118748, Sep. 2019.
and artificial neural networks,” J. Med. Syst. vol. 35, pp. 683–691, 2011. [57] H. Yang, Z. Q. Wei, “Arrhythmia Recognition and Classification Using
[33] M. Korurek, B. Dogan, “ECG beat classification using particle swarm Combined Parametric and Visual Pattern Features of ECG Morphology,”
optimization and radial basis function neural network,” Expert Syst. Appl., vol. IEEE Access, vol. 8, pp. 47103 - 47117, Mar. 2019.
37, pp. 7563–7569, 2010. [58] S. S. S. Xu, M. W. Mak, C. C. Cheung, “Towards End-to-End ECG
[34] A. Martínez, R. Alcaraz, J.J.Rieta, “Application of the phasor transform Classification with Raw Signal Extraction and Deep Neural Networks,” IEEE
for automatic delineation of single-lead ECG fiducial points,” Physiol. Meas., Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1574 - 1584,
vol. 31, pp. 1467–1485, 2010. Jul. 2019.
[35] Y. Kutlu, D. Kuntalp, “Feature extraction for ECG heartbeats using higher [59] J. Huang, B. Chen, B. Yao and W. He, “ECG Arrhythmia Classification
order statistics of WPD coefficients,” Comput. Method Program Biomed., Using STFT-Based Spectrogram and Convolutional Neural Network,” IEEE
vol.105, no. 3, pp. 257–267, 2012. Access, vol. 7, pp. 92871-92880, 2019.
[60] Q. J. Lv, H. Y. Chen, W. B. Zhong, Y. Y. Wang, J. Y. Song, S. D. Guo,
L. X. Qi, C. Y.C. Chen, “A Multi-Task Group Bi-LSTM Networks
Application on Electrocardiogram Classification,” IEEE Journal of Translational Engineering in Health and Medicine, vol. 8, pp. 1900111–1900121, Feb. 2020.
[61] S. Saadatnejad, M. Oveisi, M. Hashemi, “LSTM-Based ECG Classification for Continuous Monitoring on Personal Wearable Devices,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 2, pp. 515–523, Feb. 2020.
[62] B. Hou, J. Yang, P. Wang and R. Yan, “LSTM-Based Auto-Encoder Model for ECG Arrhythmias Classification,” IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 4, pp. 1232–1240, Apr. 2020.
[63] G. M. Wang, J. F. Qiao, J. Bi, W. J. Li, M. C. Zhou, “TL-GDBN: Growing deep belief network with transfer learning,” IEEE Transactions on Automation Science and Engineering, vol. 16, no. 2, pp. 874–885, 2019.
[64] W. A. Khan, S. H. Chung, H. L. Ma, et al., “A novel self-organizing constructive neural network for estimating aircraft trip fuel consumption,” Transportation Research Part E: Logistics and Transportation Review, vol. 132, pp. 72–96, 2019.
[65] E. J. Palomo, E. López-Rubio, “The growing hierarchical neural gas self-organizing neural network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 9, pp. 2000–2009, 2016.
[66] Y. Yu, S. Gao, Y. Wang, and Y. Todo, “Global optimum-based search differential evolution,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 379–394, Mar. 2019.
[67] Q. Kang, X. Song, M. Zhou, and L. Li, “A Collaborative Resource Allocation Strategy for Decomposition-Based Multiobjective Evolutionary Algorithms,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 12, pp. 2416–2423, Dec. 2019.
[68] K. Z. Gao, Z. G. Cao, L. Zhang, Z. H. Chen, Y. Y. Han, and Q. K. Pan, “A review on swarm intelligence and evolutionary algorithms for solving flexible job shop scheduling problems,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 875–887, Jul. 2019.
[69] S. Gao, M. Zhou, Y. Wang, J. Cheng, H. Yachi, and J. Wang, “Dendritic neuron model with effective learning algorithms for classification, approximation and prediction,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 2, pp. 601–614, Feb. 2019.
[70] G. S. Hong, “A Cost-Sensitive Deep Belief Network for Imbalanced Classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 1, pp. 109–122, Jan. 2019.

HAO DANG received the M.S. degree in pattern recognition and intelligent systems from the Henan University of Technology, Zhengzhou, China, in 2016. He is currently pursuing the Ph.D. degree in control science and engineering with the School of Automation, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include pattern recognition, intelligent systems, and machine learning.

interests include machine learning, computer vision, neural networks, and so on.

DANQUN XIONG received the B.S. degree from the Department of Clinical Medicine, Nanchang University, Nanchang, China, in 2013, and the M.S. degree in internal medicine from the School of Medicine, Tongji University, Shanghai, China, in 2016. He is currently an attending doctor in the Department of Cardiology of Jiading District Central Hospital Affiliated Shanghai University of Medical and Health Sciences. His research focuses on the detection and diagnosis of arrhythmia.

XIANGDONG XU received the B.S. degree from the Department of Clinical Medicine, Nanchang University, Nanchang, China, in 1997. He is currently a Professor, a Master Supervisor, and a director in the Department of Cardiology of Jiading District Central Hospital Affiliated Shanghai University of Medical and Health Sciences. He has published more than 20 articles. His research focuses on the usage and challenges of innovative technology in general practice medicine, such as machine learning, sequencing, and big data.

XIAOGUANG ZHOU received the M.S. degree from the Department of Precision Instrument, Tsinghua University, in 1984, and the Ph.D. degree in engineering from the Tokyo University of Agriculture and Technology, Japan. He was a Visiting Professor with the Tokyo University of Agriculture and Technology from 2001 to 2002, and a JSPS Researcher with Tokyo University from 2013 to 2014. He is currently a Professor and a Doctoral Supervisor with the School of Automation, Beijing University of Posts and Telecommunications. He also serves as the Director of the Engineering Research Center of Information Networks, Ministry of Education. He is the author of over 10 books, over 100 articles, and over 16 inventions. His research interests include control theory and its application in engineering, deep learning, computer vision, Internet of Things and automated logistics systems, and mechatronics technology. He is a permanent member of the Chinese Association of Automation/Manufacturing Technology Committee and the China Institute of Communications/Equipment Manufacturing Technical Committee.