Semantic ECG Interval Segmentation Using Autoencoders
[10]. Tafreshi et al. introduced an amplitude-based method to identify QRS-complexes specifically [8].
Furthermore, depending on the extracted ECG features, various algorithms have been developed for ECG cardiac wave classification. In particular, algorithms based on Neural Network (NN) [11], Support Vector Machine (SVM) [12], Naive Bayes [13], Hidden Markov Model (HMM) [14], logistic regression [15], and rule-based [16] methods can be found in the research literature.
Recently, Deep Learning (DL) methods have also been applied in ECG signal processing and analysis [17], [18], [19]. In [17], DL methods were used to classify ECG signals directly into normal ECG and abnormal ECG related to certain heart disease symptoms. In [19], Abrishami et al. used convolutional networks to localize the P-wave, QRS-complex, and T-wave in a single cardiac complex and utilized a secondary algorithm to extract cardiac complexes. Then, in [20], Abrishami et al. used local derivative-based features and LSTM networks to capture the temporal attributes of ECG signals. Following these works, this paper presents a comprehensive hybrid deep NN, called the Hybrid-ECG-SegNet, which is capable of learning features automatically and capturing the temporal attributes of the extracted features simultaneously, in order to segment real-time ECG signals.

3. Methodology
Deep convolutional autoencoders have shown their advantages in segmentation [2]. They are able to extract or compress the essential information from the input and reconstruct the desired output based on the extracted information. In addition, LSTM networks can capture long-term and short-term temporal dependencies in the input.
The proposed Hybrid-ECG-SegNet scheme consists of four components. The first component is the input layer for the normalized ECG signals. The second component is a deep convolutional autoencoder, which generates feature vectors for the next layer, the LSTM layer. The third component is a sequence learner, which includes two Bidirectional LSTM (BLSTM) layers. A BLSTM has two different hidden LSTM layers, one forward hidden layer and one backward hidden layer, to receive inputs in both forward and backward directions [21]. The fourth and last component forms the output layer, which classifies the data point at every timestamp into one of four categories, namely, Neutral, P-wave, QRS-complex, and T-wave.
Section 3 contains several subsections, organized as follows. Subsection 3.1 discusses the order and attributes of ECG intervals. Subsection 3.2 introduces the data preparation and the dataset. Subsections 3.3, 3.4, and 3.5 introduce the novel architecture of the Hybrid-ECG-SegNet. The network is capable of extracting the hierarchical ECG structure, long temporal dependencies, and short temporal dependencies of the ECG signal. Finally, Subsection 3.6 describes experiments and the convergence of the Hybrid-ECG-SegNet. The result demonstrates the strength of Hybrid-ECG-SegNet compared to other sequence learners, such as HMM, for the same task of ECG interval segmentation using raw ECG signals.

3.1 ECG Intervals
Normal wave complexes contain well-known patterns that can be characterized by features such as shape formation, interval duration, and amplitudes. However, many different shapes can be found in ECG data. Normal QRS-complexes can have nine different shapes [8], and P-waves and T-waves can also appear in different forms and amplitudes. When clinicians understand the key values of the wave parameters and the information that they carry, then, as abnormalities are observed, physicians can make decisions that lead to differential diagnoses of cardiac diseases. Cardiologists do not make diagnoses based solely on single features in wave complexes, such as ST depression, T-inversion, and long QT-interval; they also identify cardiac wave components in relation to each other. For example, the wave following a QRS-complex location is probably an S-wave. Thus, knowledge of the prior wave's location is essential to predicting the subsequent wave. Likewise, each individual wave's formation also affects the formation of other waves. Therefore, a method that is capable of keeping persistent memory, i.e., a Recurrent Neural Network (RNN), can be a viable solution to this type of time-dependent problem. The recurrent neural network creates a loop to pass information from one timestamp to another, which allows it to learn a time series [22].

3.2 Data Preparation and Data Set
The data preparation step consists of preprocessing a given data set of ECG signals to get the raw data ready for NN training. The data used for this study is the QT database (QTDB). QTDB was produced by PhysioNet [23] and has a large collection of recorded physiological signals sampled at 250 Hz. This database includes over 105 two-channel ECG recordings, each 15 minutes in duration, chosen to include a broad variety of P, QRS, ST, and T pattern morphologies [23]. This dataset thus allows researchers to perform research on ECG signal delineation.
The preprocessing of raw ECG signals takes care of two problems: the wander drift baseline caused by sampling devices, and the lack of a defined unit for measuring amplitude. Therefore, in this step, the ECG wander drift baseline is removed, and the range of ECG amplitudes is brought to the interval (−1, 1). Dohare et al. [24] proposed a successful median filtering approach to remove the wander drift baseline from ECG. This approach applies two median filters, each with a window of half the sampling frequency ($f_s$), to the ECG signal. The result of applying these two
filters is the wander drift baseline signal. Thus, a wander-baseline-free ECG signal is obtained by subtracting the wander drift baseline from the ECG signal. The mathematical representation of removing the wander drift is

\[ X_{WF} = E_R - M(M(E_R)) \tag{1} \]

where $X_{WF}$ is the wander-free ECG, $E_R$ is the raw ECG from QTDB, and $M(\cdot)$ denotes applying a median filter to a signal, with a median filter size of $f_s/2$. Since QTDB is sampled at 250 Hz, $f_s = 250$ Hz.

max-pooling, and up-sampling layers. While convolutional and max-pooling layers are used for encoding, convolutional and up-sampling layers are used for decoding. The decoder design follows the architecture of the Super-Resolution Convolutional Neural Network (SRCNN) [2]. SRCNN upsamples the signal to a higher resolution from a convolution layer and has demonstrated its strength in reconstructing a high-resolution image from a low-resolution image without any obvious image artifacts [2].
The second task of data preparation is to normalize the amplitudes to the range of (−1, 1), after removing the spikes. For this task, Eq. 2 is used. Because Eq. 2 is a standardization, some amplitudes can still be above one, but they do not affect the learning process.

\[ X_{Norm} = \frac{X_{WF} - \mu}{\sigma} \tag{2} \]

where $\mu$ is the mean of $X_{WF}$, $\sigma$ is the standard deviation of $X_{WF}$, and $X_{Norm}$ is the normalized ECG signal.
After performing these signal preprocessing steps, the
ECG signals are ready for generating a dataset.
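As an illustration of Eqs. 1 and 2, the following minimal Python sketch removes the baseline with two cascaded median filters of width $f_s/2$ and then standardizes the result. The function names, the truncated window at the signal borders, and the use of the population standard deviation are assumptions made for this sketch; the text does not specify them.

```python
from statistics import mean, median, pstdev


def median_filter(signal, size):
    """Sliding median over an odd window of about `size` samples.

    Border handling is an assumption: the window is truncated at the edges.
    """
    half = size // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def remove_baseline_wander(raw, fs=250):
    """Eq. 1: X_WF = E_R - M(M(E_R)), with median filters of width fs/2."""
    size = fs // 2  # 125 samples for QTDB (fs = 250 Hz)
    baseline = median_filter(median_filter(raw, size), size)
    return [x - b for x, b in zip(raw, baseline)]


def standardize(x_wf):
    """Eq. 2: X_Norm = (X_WF - mu) / sigma (population std is an assumption)."""
    mu = mean(x_wf)
    sigma = pstdev(x_wf)
    if sigma == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(v - mu) / sigma for v in x_wf]
```

Calling `standardize(remove_baseline_wander(raw))` on one channel of a recording would give a signal with zero mean and unit standard deviation; as noted for Eq. 2, individual samples can still fall outside (−1, 1).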
In this study, every recording is divided into segments of 1,000 data points sampled at 250 Hz, i.e., 4 seconds of signal per segment. Within every segment, one or more cardiac complexes can be found, which makes the segmentation task more challenging. The 1,000 ECG data points are the inputs to the Hybrid-ECG-SegNet model, and the corresponding annotations are its output targets. In total, 46,690 segments of 1,000 ECG data points have been extracted from the QTDB to be used as input/output pairs. More specifically, for our experiments, three different sets (training, validation, and test) have been created using all the extracted ECG segments. These sets are mutually exclusive, meaning that no segment appears in more than one set, and they are subject independent. Table 1 summarizes the training, validation, and test data sets.
Table 1: Dataset
Dataset         Number of samples  Percentage
Training set    28,014             60%
Validation set  4,669              10%
Testing set     14,007             30%
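The segmentation and data split described above could be sketched as follows. Non-overlapping windows, a split at the recording level, and the helper names `make_segments` and `split_records` are assumptions of this sketch; the text states only the segment length, the 60/10/30 proportions, and that the sets are subject independent.

```python
import random


def make_segments(signal, labels, length=1000):
    """Cut one recording into non-overlapping (signal, labels) windows.

    A trailing window shorter than `length` is dropped (an assumption).
    """
    count = len(signal) // length
    return [(signal[i * length:(i + 1) * length],
             labels[i * length:(i + 1) * length])
            for i in range(count)]


def split_records(record_ids, fractions=(0.6, 0.1, 0.3), seed=0):
    """Assign whole recordings to train/validation/test sets.

    Splitting by recording keeps the sets subject independent, so the
    segment-level proportions are only approximately 60/10/30.
    """
    ids = sorted(record_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

Segments would then be generated per recording with `make_segments` and collected into whichever set `split_records` assigned that recording to, so that no subject contributes to more than one set.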
uses LSTM cells and computes both forward and backward hidden sequences.
As mentioned earlier, it is useful to find the QRS-complex using prior sample points, such as P-wave samples, and future data points, such as S-wave samples. Thus, BLSTM becomes a very viable approach to explore for the ECG segmentation task. Given the rationale of the components, we define a new neural network architecture in the next section.

3.5 New Hybrid-ECG-SegNet Architecture Model

Table 2: Hybrid-ECG-SegNet Layers
Layer    Category          Description of the layer             Size
Input    Input             ECG raw signal                       1000 x 1
Layer 1  Autoencoder       Convolutional layer with 16 filters  1000 x 16
Layer 1  Autoencoder       Max Pooling of Size 2 x 1            500 x 16
Layer 2  Autoencoder       Convolutional layer with 32 filters  500 x 32
Layer 2  Autoencoder       Max Pooling of Size 2 x 1            250 x 32
Layer 3  Autoencoder       Convolutional layer with 32 filters  250 x 32
Layer 3  Autoencoder       Upsampling of Size 2 x 1             500 x 32
Layer 4  Autoencoder       Conv. layer with 16 filters          500 x 32
Layer 4  Autoencoder       Upsampling of Size 2 x 1             1000 x 16
Layer 4  Autoencoder       Time Distributed Dense Layer         1000 x 16
Layer 5  Sequence Learner  BDLSTM Layer                         150
Layer 6  Sequence Learner  BDLSTM Layer                         75
Layer 7  Output            Time Distributed Dense Layer         1000 x 4
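To make the sizes in Table 2 easier to follow, the sketch below traces the (time steps, channels) shape through the convolution, max-pooling, and up-sampling stages of the autoencoder. It assumes stride-1 convolutions that preserve the sequence length, which the table's sizes suggest but the text does not state; `trace_shapes` is a name introduced here for illustration. Under these assumptions the 16-filter convolution in Layer 4 yields 500 x 16, whereas Table 2 lists 500 x 32 for that row.

```python
def trace_shapes(length=1000):
    """Follow (time steps, channels) through the autoencoder stages of Table 2."""
    stages = [
        ("conv", 16), ("pool", 2),   # Layer 1
        ("conv", 32), ("pool", 2),   # Layer 2
        ("conv", 32), ("up", 2),     # Layer 3
        ("conv", 16), ("up", 2),     # Layer 4
    ]
    shape = (length, 1)
    trace = [("input", shape)]
    for kind, arg in stages:
        steps, channels = shape
        if kind == "conv":
            # Length-preserving convolution (assumed): channels = filter count.
            shape = (steps, arg)
        elif kind == "pool":
            # Max pooling of size `arg` divides the number of time steps.
            shape = (steps // arg, channels)
        else:
            # Up-sampling of size `arg` multiplies the number of time steps.
            shape = (steps * arg, channels)
        trace.append((kind + str(arg), shape))
    return trace


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(name, shape)
```

The encoder halves the sequence twice (1000 to 250 time steps) and the decoder restores it to 1000 time steps, so the sequence learner receives one 16-channel feature vector per input sample.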
1,000×4 matrix, which is the annotated class obtained from QTDB. If a data point belongs to the first class, Neutral, the output at that timestamp is the vector [1, 0, 0, 0], which assigns a probability of 1 to the first class and 0 to the rest. The Hybrid-ECG-SegNet is trained with the Root Mean Square Propagation (RMSProp) optimizer [29] for 47 epochs using mini-batches of size 75. The training stopped after 47 epochs because the validation set error had not improved for 10 epochs, which is the early stopping policy. RMSProp maintains a per-parameter learning rate based on the average of the recent magnitudes of the gradients for that parameter (weight). This approach is recommended for non-stationary signals [29]. After 68 epochs of training, the results showed 94.73% accuracy for the training set, 93.58% accuracy for the validation set, and 93.99% accuracy for the test set.

Table 4: Segmentation Accuracy Comparison
Method                           P (%)  QRS (%)  T (%)  Overall (%)
Hybrid-ECG-SegNet                91.0   94.0     93.0   93.99
HMM on raw ECG [14]              5.5    79.0     83.6   56.03
HMM on wavelet encoded ECG [14]  74.2   94.4     96.1   88.23

The majority of research focuses on finding the fiducial points of the cardiac complex rather than segmenting ECG data points. Even though the Hybrid-ECG-SegNet task is different from finding the locations of ECG cardiac waves, it provides competitive accuracy in finding cardiac wave locations. The accuracies of finding waves regardless of segmentation for the P-wave, QRS-complex, and T-wave are 96%, 99%, and 98%, respectively.
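The optimizer and stopping rule used in training can be illustrated with a short sketch. The first function is the standard RMSProp update, which scales each parameter's step by a running average of its squared gradients; the second returns the epoch at which training would stop under a patience-based early stopping rule. The hyperparameter values (`lr`, `rho`, `eps`) are common defaults chosen for illustration and are not taken from the paper, and both function names are hypothetical.

```python
import math


def rmsprop_step(weights, grads, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    """Apply one RMSProp update; returns (new_weights, new_avg_sq)."""
    new_weights, new_avg_sq = [], []
    for w, g, a in zip(weights, grads, avg_sq):
        # Running average of the squared gradient for this parameter.
        a = rho * a + (1.0 - rho) * g * g
        # Per-parameter step: larger recent gradients give smaller steps.
        w = w - lr * g / (math.sqrt(a) + eps)
        new_weights.append(w)
        new_avg_sq.append(a)
    return new_weights, new_avg_sq


def early_stopping_epoch(val_errors, patience=10):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation error has not improved for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch, error in enumerate(val_errors, start=1):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_errors)
```

With `patience=10`, a run whose validation error last improved at some epoch k would stop at epoch k + 10, which is the behavior the early stopping policy in the text describes.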