
2024 IEEE International Conference on Pervasive Computing and Communications (PerCom)

DeepApnea: Deep Learning Based Sleep Apnea Detection Using Smartwatches
Zida Liu, Xianda Chen, Fenglong Ma, Julio Fernandez-Mendoza, and Guohong Cao
2024 IEEE International Conference on Pervasive Computing and Communications (PerCom) | 979-8-3503-2603-1/24/$31.00 ©2024 IEEE | DOI: 10.1109/PERCOM59722.2024.10494473

The Pennsylvania State University


E-mail: {zjl5310, xuc23, fenglong, jfmendoza, gcao}@psu.edu

Abstract—Sleep apnea is a serious sleep disorder where patients have multiple extended pauses in breath during sleep. Although some portable or contactless sleep apnea detection systems have been proposed, none of them can achieve fine-grained sleep apnea detection without strict requirements on the device or environmental settings. To address this problem, we present DeepApnea, a deep learning based sleep apnea detection system that leverages patients' wrist movement data collected by smartwatches to identify different types of sleep apnea events (i.e., central apneas, obstructive apneas, and hypopneas). Through a clinical study, we identify some special characteristics associated with different types of sleep apnea captured by the smartwatch. However, there are many technical challenges, such as how to extract informative apnea features from the noisy data and how to leverage features extracted from the multi-axis sensing data. To address these challenges, we first propose signal preprocessing methods to filter the raw accelerometer (ACC) data, smoothing away noise while preserving the respiratory signal and potential features for identifying sleep apnea. Then, we design a deep learning architecture to extract features from three ACC axes collaboratively, where self attention and cross-axis correlation techniques are leveraged to improve the classification accuracy. We have implemented DeepApnea on smartwatches and performed a clinical study. Evaluation results demonstrate that DeepApnea can significantly outperform existing work on identifying different types of sleep apnea.

Index Terms—Apnea Detection, Deep Learning, Smartwatch

I. INTRODUCTION

Sleep apnea is a serious sleep disorder where patients have multiple extended pauses in breath during sleep. Sleep apnea is linked to many diseases, such as high blood pressure, chronic heart failure, depression, obesity, and daytime fatigue [1]. It is estimated that more than 22 million Americans suffer from sleep apnea [2]. Although the US government spends more than $150 billion on sleep apnea [3] every year, about 75% of people with moderate or severe apnea are still undiagnosed [4].

To diagnose sleep apnea, the commonly used method is the polysomnography (PSG) test, which requires the subject to wear more than 20 wired sensors, including a pulse oximeter, pressure transducer, thermocouple, and electrodes placed at different parts of the body. It is uncomfortable for many patients and can even affect their sleep and the diagnosis results [5]. Moreover, such an in-lab PSG test is expensive, cumbersome, and time-consuming, and thus many potential patients cannot be diagnosed in time, endangering their health.

In order to overcome these shortcomings of the PSG test, many contactless or wearable systems have been proposed for sleep monitoring and apnea detection. For example, with the wide deployment of wireless technology, many researchers [6] [7] [8] [9] leverage sound waves, WiFi, or radio frequency signals to measure the chest movements during patients' sleep. The chest movement due to breathing can be identified by analysing the properties of the wireless signal, i.e., the channel state information or the shift in carrier frequency. Although wireless technologies can extract breathing information for monitoring sleep, they either require customized hardware or impose strict environmental restrictions, and hence cannot be widely deployed, or they cannot detect abnormal breathing signals (i.e., sleep apnea).

Compared to these systems based on wireless technology, the wristband-based methods [10] [11] [12] and the geophone-based methods [13] [14] can measure the respiration signal with widely adopted wearable devices such as smartwatches or through multiple geophone sensors. However, they can only provide some coarse-grained sleep data such as the respiratory rate, and they are not capable of detecting sleep apnea. ApneaDet [15] is the first smartwatch-based system which exploits the built-in sensors in a smartwatch to detect sleep apnea. Specifically, it leverages the accelerometer (ACC) to monitor the wrist movements and then extracts respiratory information from the ACC data for apnea detection. However, it was designed for binary classification, i.e., differentiating sleep apnea from normal sleep, which limits its general application.

There are three different kinds of respiratory events associated with sleep apnea (i.e., central apneas, obstructive apneas, and hypopneas), and distinguishing these different kinds of respiratory events is very important. This is because different respiratory events have different etiology (e.g., central apneas are caused by the brain stopping the breathing process, while obstructive apneas are caused by local collapsibility of the upper airway) and they have links to different diseases. Thus, identifying all three types of sleep apnea can help clinicians provide better diagnosis and treatment [5].

There are many technical challenges in identifying the three different types of sleep apnea. First, the wrist movement generated by breath or lung movement is very subtle. The raw ACC data recorded by the smartwatch contains a large amount of noise, which makes it harder to extract the respiratory information. Second, existing machine learning features used for binary classification do not work well for multi-class classification, i.e., identifying three types of sleep apnea. This is because it is


relatively easier to differentiate normal sleep from abnormal sleep apnea, but it is much harder to identify different kinds of sleep apnea events. Third, based on the sleeping posture and the wrist position, the three ACC axes may carry different amounts of respiratory information. How to leverage such information to identify sleep apnea remains a challenge.

In this paper, to address these challenges, we propose a smartwatch-based system named DeepApnea, which can detect different types of sleep apnea. We first propose signal preprocessing methods to filter the raw ACC data, smoothing away noise while preserving the respiratory signal and potential features for identifying sleep apnea. Then, we design a deep learning architecture to extract features from three ACC axes collaboratively. Specifically, we apply a self attention technique to accentuate more significant features and a cross-axis correlation technique to exploit the correlations among different axes. The extracted deep features and the correlation information are merged through aggregated classification to further improve the classification accuracy.

The main contributions of the paper are as follows.
• To the best of our knowledge, this is the first work to identify three types of sleep apnea (hypopneas, obstructive apneas, central apneas) using only wrist-worn ACC data.
• We propose signal preprocessing techniques to extract accurate representations of the respiratory signal from the raw noisy ACC data.
• We design a deep learning model to automatically extract informative apnea features from three ACC axes and wisely fuse these features to improve performance.
• We have implemented DeepApnea on smartwatches and performed a clinical study. Evaluation results show that DeepApnea significantly outperforms existing work on identifying three types of sleep apnea.

II. BACKGROUND AND MOTIVATION

There are three types of respiratory events associated with sleep apnea [16]. A Central Apnea occurs when the subject holds his/her breath for a long period of time, typically 10 to 30 seconds. During central apnea, the human brain fails to provide the signal to inhale, resulting in the absence of breathing effort. A Hypopnea occurs when the subject's breathing becomes shallow. Specifically, patients will lose 30% to 90% of normal airflow. This procedure usually lasts more than ten seconds. An Obstructive Apnea occurs when there is a complete or partial blockage of the upper airway during sleep. The subject makes an effort to pull air into the lungs; however, the air does not reach the lungs because of the blockage. Fig. 1 shows the airflow measured by the nasal pressure sensor for the different sleep apnea types.

Fig. 1. The airflow (nasal pressure): (a) hypopnea, (b) obstructive apnea, (c) central apnea.

Based on our previous clinical study [15], we demonstrated the feasibility of apnea detection with a smartwatch. This clinical study was conducted with twenty subjects at Penn State Hershey Sleep Research & Treatment Center. Each subject wears a smartwatch to collect the ACC data of the wrist movement during a regular PSG test.

Fig. 2. The raw ACC data (x axis) for (a) hypopnea, (b) obstructive apnea, and (c) central apnea, which corresponds to the subfigures in Fig. 1.

Fig. 2 shows the raw ACC data collected using smartwatches. In general, respiration leads to the periodic subtle movement of the chest, abdomen, arms, and wrists, and these movements can be recorded by the ACC in the smartwatch. Fig. 2(a) shows the ACC data corresponding to Fig. 1(a). Between the 12th and 36th seconds, labeled by the technician, a hypopnea happens. During this time, the respiration becomes shallow, so the amplitude change of the airflow decreases, and a similar change is reflected in the ACC data. Fig. 2(b) (corresponding to Fig. 1(b)) presents the ACC data during an obstructive apnea. In an obstructive apnea, after a respiratory blockage of several seconds, the subject is likely to make one or several intense breaths before returning to normal breathing, leading to the signal spike around the 42nd second. In Fig. 2(c), since the subject holds his/her breath during a central apnea, the ACC data are flat and there is no intense spike after this holding.

Fig. 3. The infeasibility of using hand-crafted features to differentiate three types of sleep apnea (scatter plot of number of peaks vs. distance of peaks for Central Apnea, Hypopnea, Obstructive Apnea, and Normal events).

Based on this study, we can see that different types of sleep apnea lead to different patterns in the ACC data. With machine learning techniques, sleep apnea can be identified. Unfortunately, since the ACC data is very noisy and the wrist movement is very subtle, it is hard to use simple machine learning techniques to identify different types of sleep apnea (i.e., multi-class classification), although it is possible to differentiate sleep apnea events from normal sleep (i.e., binary classification), which was the design goal of [15]. For example, the number of respiration peaks and the maximum distance between two consecutive respiration peaks are commonly used as features [15] [17] for sleep apnea detection. Fig. 3 visualizes different types of apnea events using a two-dimensional scatter plot, where the horizontal dimension represents one feature (the number of peaks) and the vertical dimension represents another feature


Fig. 4. The ACC data along three axes in an obstructive apnea: (a) X axis, (b) Y axis, (c) Z axis.

(maximum distance of peaks). Although normal sleep events can be easily distinguished from apnea events, the different types of apnea events overlap with each other and there is no obvious decision boundary to distinguish them.

To address this problem, we propose deep learning techniques to identify different types of sleep apnea. Although deep learning has been proved to help extract more representative features than traditional machine learning methods in many areas, simply applying widely used deep learning models such as CNN and LSTM to our problem may not work. This is because they only treat the triaxial data as one single input feature, without considering the heterogeneity among different axes. In practice, based on the sleeping posture and the wrist position, the collected ACC data along each axis may be different. Instead of only using the data from one axis, this multi-dimensional data can help us obtain more information. However, if the ACC data is not processed properly, it may have adverse effects. For example, Fig. 4 shows the raw data recorded from one obstructive apnea event. The X axis and Y axis have a similar data pattern (e.g., they both contain a signal spike near the 29th second) whereas the Z axis does not. Simply fusing the data from three axes through basic deep learning operations (e.g., average pooling) may result in large errors.

To deal with this problem, we consider the data reliability of each axis. For example, we can assign higher weights to the more informative axes X and Y and a lower weight to the less informative axis Z in Fig. 4. However, in practice, it is hard to manually determine such weights due to the data heterogeneity. Since the ACC data from the three axes jointly represents the wrist movement, there exist correlations between different axes' ACC data (e.g., Y is like an inverse of X in Fig. 4, whereas Z is less correlated with X and Y). Based on existing research [18], the learning performance can be improved by leveraging the correlations between different data sources or feature subsets. Since the data from each axis can be treated as a single data source of the patient's wrist movement, the detection accuracy can be improved by exploiting the correlations among different axes. Thus, we propose the cross-axis correlation technique (Section V-C) to explore correlations between different axes and automatically assign different weights to different axes.

Additionally, within a single data segment collected from one axis, certain parts of the data may contain more valuable information than others. For example, a signal spike might be more representative for recognizing obstructive apnea. Thus, it is crucial to focus more on the informative parts. To achieve this objective, we employ self-attention techniques (Section V-B), which assign higher weights to the informative parts, thereby further enhancing the detection accuracy.

III. SYSTEM OVERVIEW

The overall design of DeepApnea is shown in Fig. 5. During sleep, the smartwatch on the subject's wrist records data generated by the accelerometer sensor. When the subject wakes up, the smartwatch stops recording and the collected data can be forwarded to the subject's smartphone through Bluetooth for further analysis. The raw acceleration data is preprocessed by the signal preprocessing module and then forwarded to a deep learning module, which extracts representative features and classifies each segment into one of four sleep events: normal sleep, hypopnea, obstructive apnea, and central apnea.

Fig. 5. System architecture of DeepApnea.

• Signal preprocessing module: The raw ACC data collected by the smartwatch contains a large amount of electronic and mechanical noise, which makes it harder for the deep neural network to extract useful features from the time-series data. To deal with these problems, we have the following three steps for signal preprocessing: 1) signal resampling: resample the raw data at a fixed rate to mitigate fluctuations of the actual sampling rate caused by system operations; 2) signal denoising: utilize a signal filter to remove unnecessary noise while preserving the breathing signal; 3) signal normalization: normalize the data to mitigate high variations introduced by varying sleep poses and wrist positions. The details of these steps will be presented in Section IV.
• Deep learning module: In order to extract informative features from the preprocessed data and effectively leverage signals from three axes, the deep learning module has the following steps. First, the preprocessed data of each axis is fed to its specific CNN-based feature extractor to obtain the corresponding deep features. Second, Self attention and Cross-axis correlation techniques are applied to these deep features to obtain the weighted deep features of each axis and the correlation information between any two axes. Lastly, both the weighted deep features


and correlation information are merged with Aggregated classification to classify the sleep event, i.e., normal, hypopnea, obstructive apnea, or central apnea. The details of these steps will be presented in Section V.

IV. SIGNAL PREPROCESSING

A. Signal Resampling

To collect data with the smartwatch, we first set up a sampling rate higher than the respiration rate so that all useful signal information can be preserved. However, in most commercial systems, the real sampling rate may fluctuate around the expected sampling rate due to many uncontrollable system operations. For example, we use a smartwatch (Huawei Watch 2) to collect the sleep data. Although the sampling rate was set to SENSOR_DELAY_GAME (i.e., 50 Hz) through the Android API, the real sampling rate varies from 40 Hz to 60 Hz. This creates problems for running the deep learning model, which requires the input data to have a fixed size. Therefore, we need to resample the collected data at a certain sampling rate before sending it to the deep learning model.

The Fourier method [19] is adopted to resample the raw data, because it can avoid information distortion (e.g., aliasing) during resampling and well preserve the information of the original signal. The Fourier method first leverages the Discrete-time Fourier Transform (DTFT) to transform the accelerometer signal into the frequency domain. Then, during the Inverse Discrete Fourier Transform (IDFT), we can eliminate aliasing by limiting the highest frequency to half of the sampling rate and obtain resampled data points with the same time interval. The procedure is expressed in the following equations:

X̂[k] = Σ_{n=0}^{N−1} e^{−j2πnk/N} x[n]  and  ẋ[m] = (1/M) Σ_{k=0}^{M/2−1} e^{j2πmk/M} X̂[k].

The first equation represents the DTFT, where j is the imaginary unit, N is the number of data points of the raw signal, and x[n] denotes the nth raw data point. Through it, we can get the different frequency components X̂[k], where k = 0, 1, 2, ..., N − 1. The second equation represents the IDFT, where M is the number of data points after resampling and ẋ[m] denotes the mth data point of the resampled signal. When calculating ẋ[m], we only consider the 0th to (M/2 − 1)th frequency components to avoid aliasing. Finally, we obtain the resampled signal ẋ[m] from the raw signal x[n].

B. Signal Denoising

The raw signal collected by the smartwatch contains a large amount of electronic and mechanical noise, which makes it harder for the deep neural network to extract useful features from the time-series data. Therefore, we need to design an effective filter to filter out the noise.

The moving average filter [20] is a widely-used filter for denoising. Since this method simply averages different subsequences of the signal, it also eliminates potentially useful information for apnea classification. For example, Fig. 6(a) shows the raw ACC signal representing an obstructive apnea event. There is a signal spike when the subject tries to make an intense breath after an obstructive apnea event, and such a signal spike can serve as the feature for distinguishing obstructive apnea from other apnea classes. However, as shown in Fig. 6(b), the moving average filter smooths away the signal spike.

Fig. 6. The raw and filtered ACC data with different denoising methods: (a) raw data, (b) moving average, (c) TV filter, (d) ADA.

To preserve all useful signal information, e.g., the signal spike, most existing research [15] relies on the Total Variation filter (TV filter) [21] for signal denoising. Although the TV filter can preserve the respiratory spikes well, it cannot remove the low-amplitude noise, as shown in Fig. 6(c). This is because the TV filter only aims to minimize the sum of the variation between two adjacent signal values over the whole signal sequence, without considering that the denoised signal should be locally smooth. As a result, the TV filter eliminates the high-amplitude noise, but low-amplitude noise still remains (e.g., the 20th to 25th, and the 48th to 50th seconds in Fig. 6(c)).

To keep the periodic respiratory information and remove all unnecessary noise, we design an adaptive denoising algorithm based on [22]. Our goal is to not only eliminate the high-amplitude noise but also achieve local smoothness. To achieve this goal, we first divide the raw signal segment into partially overlapped subsegments. Then, each subsegment is denoised separately based on its own signal trend, and the subsegments are merged together by using a linear weighting mechanism.

The algorithm is shown in Algorithm 1. The input signal is divided into m subsegments. Each subsegment contains 2n + 1 data points, where adjacent subsegments overlap by n + 1 points. For each subsegment, a polynomial function of order K is used to fit its data to extract the respiratory signal trend and eliminate noise. Then the denoised subsegments are concatenated together by leveraging the overlapping area, where a weighted sum is used to recalculate the data points in the overlapping area. The weighted sum ensures symmetry and effectively eliminates any jumps or discontinuities around the boundaries of neighboring subsegments, and the subsegment polynomial fitting ensures the local smoothness. Fig. 6(d) shows that the designed adaptive denoising algorithm (ADA) can preserve the useful signal and filter out the noise.

The algorithm contains two important parameters: K and n. As shown in Fig. 7, choosing different K and n can lead to different denoising results. Since a low polynomial order lacks the ability to represent complex signal trends, setting K to a small value may filter out useful information. For example, as shown in Fig. 7(a), the periodic respiratory movement signal is also filtered out. On the other hand, if K is too large,


as shown in Fig. 7(c), overfitting happens and there is still too much noise.

With a small n, as shown in Fig. 7(d), the subsegment is too small and there is not enough information to fit the polynomial to filter out the noise. With a large n, as shown in Fig. 7(f), the subsegment is too long and the fitted polynomial lacks the capability to represent all the data variations implying useful respiratory information. We experimentally determined the combinations of K and n and found that when K = 4 and n = 10, the noise is eliminated and the periodic respiratory movement is preserved well, as shown in Fig. 7(b)(e). Thus, we use this setting in the rest of the paper.

Algorithm 1: Adaptive Denoising Algorithm
1  Input: (1) raw ACC data D; (2) half length of subsegment n; (3) polynomial order K;
2  Output: denoised accelerometer data array P_denoised;
3  Function ADA(D, n, K):
4    Initialization: an empty array P;
5    D → {d^(1), d^(2), d^(3), ..., d^(m)};
6    for each data segment d^(i) do
7      f ← polyfit(d^(i)(x), K);
8      p^(i)(x) ← f(x);
9    end for
10   P.append(p^(1)(x)), where x = 1, ..., n;
11   for every overlap part of two adjacent denoised subsegments do
12     p_overlap^(j,j+1)(x) ← w1 · p^(j)(x + n) + w2 · p^(j+1)(x), where x = 1, ..., n + 1, w1 = 1 − (x − 1)/n, w2 = (x − 1)/n;
13     P.append(p_overlap^(j,j+1)(x));
14   end for
15   P.append(p^(m)(x)), where x = n + 1, ..., end;
16   P_denoised ← P;
17   Return: P_denoised;

Fig. 7. The filtered ACC data (along the X axis) with different ADA parameter settings: (a) n = 10, K = 1; (b) n = 10, K = 4; (c) n = 10, K = 8; (d) n = 2, K = 4; (e) n = 10, K = 4; (f) n = 30, K = 4.

V. DEEP LEARNING ARCHITECTURE

In order to extract informative features from the preprocessed data and effectively leverage signals from three axes, we propose a deep learning architecture as shown in Fig. 8. First, the preprocessed data of each axis is fed to its specific CNN-based feature extractor to obtain the corresponding deep features. Second, these deep features are sent to two modules in parallel - Self attention and Cross-axis correlation - to obtain the weighted deep features of each axis and the correlation information between any two axes. Lastly, both the weighted deep features and the correlation information are merged in the Aggregated classification module to classify the sleep event. The rest of this section presents the details of these four modules.

Fig. 8. The architecture of the proposed DeepApnea model.

A. CNN-based Feature Extractor

Traditional machine learning methods can only extract apnea features based on the designers' domain knowledge, without exploiting unknown apnea features, hindering their capability of distinguishing different types of apnea events. To solve this problem, we build a CNN to automatically extract apnea features from the collected data.

The CNN consists of four convolutional layers and two max pooling layers. Each layer is a non-linear operation. The multi-layer non-linear operations make the obtained features more sensitive to different apnea types, and less sensitive to irrelevant variations coming from other factors such as the physical device, patient, environment, etc. The parameters of each layer are shown in Table I. In addition, we adopt a batch normalization layer after each convolutional layer and a dropout layer after each pooling layer, respectively, to prevent over-fitting.

Since different axes record wrist movements in different directions, we prepare a separate feature extractor for each axis. Let X = {x_i^x, x_i^y, x_i^z}_{i=1}^n represent all the samples of input data, where x_i^x denotes the time-series signal along the X axis of the ith sample. The feature extractor is denoted as F(x, θ_F), where θ_F represents the trainable parameters of the extractor. After feature extraction, we can obtain the deep features D_i^x, D_i^y, and D_i^z along the three axes:

D_i^x = F(x_i^x, θ_F^x),  D_i^y = F(x_i^y, θ_F^y),  D_i^z = F(x_i^z, θ_F^z)   (1)

Specifically, D_i^x, D_i^y, D_i^z ∈ R^{T̃×F}, where T̃ represents the spatial dimension and F denotes the feature dimension. After obtaining the deep feature of each axis, they will be sent to the Self attention and Cross-axis correlation modules for further processing.
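As a concrete illustration of the per-axis extraction in Eq. (1), the sketch below builds one stand-alone feature extractor per axis in plain numpy. It is a hedged stand-in, not the paper's model: a single valid convolution plus max pooling replaces the four-conv/two-pool stack of Table I, the weights are random and untrained, and the filter width, stride, and channel count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(h, w, b, stride=1):
    # h: (L, Cin), w: (k, Cin, Cout); valid 1-D convolution followed by ReLU
    k, cin, cout = w.shape
    L = (h.shape[0] - k) // stride + 1
    out = np.empty((L, cout))
    for i in range(L):
        out[i] = np.tensordot(h[i*stride:i*stride+k], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def max_pool(h, size=2):
    # non-overlapping max pooling along the spatial dimension
    L = h.shape[0] // size
    return h[:L*size].reshape(L, size, -1).max(axis=1)

def make_theta(k=24, cout=8):
    # random, untrained parameters; a separate theta is drawn for each axis
    return rng.normal(0, 0.1, (k, 1, cout)), np.zeros(cout)

def extract(x, theta):
    """F(x, theta): deep features D of shape (T~, F') for one axis, cf. Eq. (1)."""
    w, b = theta
    return max_pool(conv1d(x[:, None], w, b, stride=6))

# one preprocessed 584-sample segment (the input length of Table I), 3 axes
theta_x, theta_y, theta_z = make_theta(), make_theta(), make_theta()
seg = rng.normal(size=(584, 3))
Dx = extract(seg[:, 0], theta_x)
Dy = extract(seg[:, 1], theta_y)
Dz = extract(seg[:, 2], theta_z)
```

Keeping a separate parameter set per axis mirrors the design choice above: each axis records wrist movement in a different direction, so the extractors are not shared.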


TABLE I
THE ARCHITECTURE OF THE CNN-BASED FEATURE EXTRACTOR.

Layer | Size In   | Size Out  | Filter
conv1 | 584 × 1   | 98 × 64   | 24, 6
conv2 | 98 × 64   | 49 × 64   | 8, 2
pool1 | 49 × 64   | 24 × 64   | 2, 2
conv3 | 24 × 64   | 24 × 128  | 4, 1
conv4 | 24 × 128  | 24 × 128  | 4, 1
pool2 | 24 × 128  | 12 × 128  | 2, 2

B. Self Attention

The deep features from different axes may carry different amounts of respiratory information due to various reasons, e.g., the pose of the patient's wrist and hence the smartwatch. It is natural to assign different weights to the data collected from different axes to measure each axis's quality or reliability. Moreover, even within a single data segment collected from one axis, some parts of the data may contain more useful information than other parts. For example, the signal spike part is more representative for recognizing obstructive apnea. Therefore, in addition to setting different weights for different axes, we also set different weights for different parts of the data segment.

To achieve this goal, we leverage the self-attention mechanism. It is a weighted aggregation method to obtain better representations of the signal, and it has been successfully applied to many deep learning applications such as sentence embedding [23], speech and activity recognition [24], and disease diagnosis [25]. Self-attention mimics cognitive attention. For a single data segment, it enhances some parts while diminishing other parts. Specifically, it forces the deep neural network to devote more focus to the small but important parts of the input data. After extracting the deep features from each axis, we apply the self-attention mechanism on them separately to assign intra-axis weights. More specifically,

A_i^x = σ(W^x ⊙ D_i^x + b^x),  S_i^x = A_i^x · D_i^x
A_i^y = σ(W^y ⊙ D_i^y + b^y),  S_i^y = A_i^y · D_i^y   (2)
A_i^z = σ(W^z ⊙ D_i^z + b^z),  S_i^z = A_i^z · D_i^z

where ⊙ is the convolution operation and · is the dot product. W and b are trainable parameters of a one-layer convolution operation. A_i ∈ R^{T̃×F} is the self-attention weight, which is controlled by the signal sequence itself. With such a mechanism, our model can learn to focus more on the informative locations of each axis. Finally, we obtain the weighted deep features S_i^x, S_i^y, and S_i^z.

C. Cross-axis Correlation

Although the self-attention module can automatically assign weights for each axis to obtain weighted features, it only considers the signal from each axis independently, without leveraging the correlations among them. Since the ACC data […] axis. High similarity suggests that the two axes convey similar apnea-related information. When such high similarity is identified, our model puts more weight on the information from these axes. In deep learning, the similarity of two feature vectors is often assessed using the element-wise difference [26] [27]. Thus, we employ this method to quantify the correlation. Taking the X-axis as an example, the cross-axis correlation vector is computed as follows:

D_i^{x|y} = D_i^x − D_i^y,  D_i^{x|z} = D_i^x − D_i^z
C_i^x = σ(W^{x|yz} ⊙ (D_i^{x|y} ⊕ D_i^{x|z}) + b^{x|yz})   (3)

where ⊙ represents the convolution operation and ⊕ represents the concatenation operation. W^{x|yz} and b^{x|yz} are trainable parameters. The superscript denotes the axis relationship, e.g., D_i^{x|y} means the correlation between the X-axis and the Y-axis. After obtaining the correlation vectors D_i^{x|y} and D_i^{x|z}, we concatenate them together and use one fully connected layer to extract more information to represent the X-axis correlation with the other two axes, which is C_i^x. The cross-axis correlations C_i^y for the Y-axis and C_i^z for the Z-axis are calculated in the same way.

After capturing the correlation vectors C_i^x, C_i^y, C_i^z, they are sent to the Aggregated classification module and serve as the cross-axis weights. By incorporating all the mutual correlations among the three axes, our deep learning model can assign higher weights to more informative axes and lower weights to less informative axes.

D. Aggregated Classification

To capitalize on the valuable features from all axes and allocate suitable weights to each axis, we conduct a dot-multiplication between the output vectors of the self-attention and cross-axis correlation modules. This enables our framework to emphasize the informative parts within a single data segment of each axis and also to prioritize the more informative axes. Specifically, we have:

H_i^x = S_i^x ⊙ C_i^x,  H_i^y = S_i^y ⊙ C_i^y,  H_i^z = S_i^z ⊙ C_i^z
H_i^{xyz} = H_i^x ⊕ H_i^y ⊕ H_i^z,  F_i^{xyz} = Pooling(H_i^{xyz})   (4)

where H_i^x, H_i^y, H_i^z ∈ R^{T̃×F} are the final features of each axis, combining the results of the self-attention module and the cross-axis correlation module. Then, the fused feature H_i^{xyz} is obtained by concatenating H_i^x, H_i^y, H_i^z along the feature dimension. After applying an average pooling layer on H_i^{xyz}, the final fused feature F_i^{xyz} is calculated. Finally, the hybrid fused feature is fed into a 2-layer fully-connected network. The first layer is a fully-connected layer activated by the ReLU function, while the second layer is a softmax layer that calculates the probability of the four types of sleep events.
from all three axes can record the wrists’ movement informa- of sleep events. The class with the maximum probability will
tion collaboratively during patients’ sleep, as mentioned at the be considered as the classification result.
end of Section II, we should leverage the correlation between VI. EVALUATIONS
different axes to assign appropriate weight for each axis, which
can further improve the performance. A. Clinical Study
We assess the correlation between different axes by an- We conducted a clinical sleep study at at Penn State Milton
alyzing the similarity of deep features extracted from each S. Hershey Medical Center, with approval by our Institutional
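As a concrete summary, the per-axis attention of Eq. (2) and the fusion of Eq. (4) can be sketched in numpy as follows. This is an illustrative sketch only: the shapes, the kernel size, the random parameters, and the stubbed cross-axis weights C are assumptions, not the trained DeepApnea model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def axis_attention(D, w, b):
    """Eq. (2) for one axis: A = sigmoid(conv(D) + b), S = A * D."""
    pad = w.shape[0] // 2
    Dp = np.pad(D, ((pad, pad), (0, 0)))
    # depthwise 1-D convolution over time, one kernel column per feature
    conv = np.stack([np.convolve(Dp[:, f], w[:, f], mode="valid")
                     for f in range(D.shape[1])], axis=1)
    A = sigmoid(conv + b)      # self-attention weights in (0, 1)
    return A * D               # weighted deep features S

T, F = 16, 8                   # toy sizes for the per-axis deep features D
D = {ax: rng.normal(size=(T, F)) for ax in "xyz"}
w, b = 0.1 * rng.normal(size=(3, F)), np.zeros(F)
# cross-axis weights C from the correlation module, stubbed as random here
C = {ax: sigmoid(rng.normal(size=(T, F))) for ax in "xyz"}

# Eq. (4): H = S * C per axis, concatenate along features, average-pool
H = np.concatenate([axis_attention(D[ax], w, b) * C[ax] for ax in "xyz"], axis=1)
pooled = H.mean(axis=0)
# 2-layer head: fully-connected + ReLU, then softmax over the 4 event types
hidden = np.maximum(pooled @ (0.1 * rng.normal(size=(3 * F, 4))), 0.0)
logits = hidden @ (0.1 * rng.normal(size=(4, 4)))
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.shape)
```

Because the attention weights pass through a sigmoid, each feature of S is the corresponding feature of D scaled by a value in (0, 1), which is what lets the model suppress uninformative parts of a segment.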

Authorized licensed use limited to: SHANDONG UNIVERSITY. Downloaded on November 12,2024 at 14:27:50 UTC from IEEE Xplore. Restrictions apply.
211
2024 IEEE International Conference on Pervasive Computing and Communications (PerCom)

VI. EVALUATIONS

A. Clinical Study

We conducted a clinical sleep study at Penn State Milton S. Hershey Medical Center, with approval by our Institutional Review Board (IRB); the details of the clinical study were presented in [15]. The study includes twenty subjects (eight males and twelve females), whose ages vary from 36 to 72, with an average of 59.3. The subjects presented certain diversity in terms of the severity of sleep apnea, and they were prescribed to undergo a regular polysomnography (PSG) study without receiving continuous positive airway pressure therapy. During the PSG study, all the patients were required to wear a Huawei Watch 2 to collect the ACC data. The smartwatch was fully charged before recording the sensor data, to make sure that it could record for the whole night, which is around eight hours. The smartwatch and the PSG equipment were time-synchronized at the beginning of recording so that we can obtain the corresponding period of the sensor data when apnea events occur. The sleep apnea events were labeled by the sleep physician as the ground truth according to the patients' PSG test. In total, with the window size set to 60 seconds, we recorded 2822 normal sleep events, 1018 obstructive apneas, 125 central apneas, and 818 hypopneas. Fig. 9 shows the number of sleep apnea events of each subject. Some subjects suffer from severe obstructive apnea but mild hypopnea, such as subjects 3 and 19, whereas others have more hypopnea but less obstructive apnea. In general, the number of central apneas is much smaller than that of obstructive apneas and hypopneas.

Fig. 9. The number of central apnea, hypopnea, and obstructive apnea events for each subject.

B. Comparing Methods

Since there is no existing work for identifying four types of sleep apnea events using purely wrist-worn ACC, we choose the following relevant methods as baselines.

1) Traditional machine learning methods: Many traditional machine learning approaches, such as naive Bayes (NB), decision tree (DT), random forest (RF), AdaBoost (ABT), and support vector machine (SVM), have been widely used for recognizing time-series signals. For example, [28] successfully identifies fingerprints by recognizing fingerprint-induced sonic waves through LR, SVM, and RF. ApneaDet [15] applies RF to hand-crafted features of the ACC signal (e.g., peak distance, peak number, and peak amplitude) and achieves high recognition accuracy on the binary sleep apnea classification task. In our evaluation, we compare with NB, DT, SVM, RF, and ABT. For fair comparison, we use the same signal processing pipeline introduced in ApneaDet to extract hand-crafted features, and use the same hand-crafted features, such as peak distance, peak number, and peak amplitude, as input features. Note that ApneaDet has the same performance as RF since it uses RF as the classifier.

2) Deep learning methods: Compared to traditional machine learning methods, deep learning based methods have proved to be more effective for analysing time-series signals in many applications. AHF-CNN [29] adopts a 6-layer convolutional neural network to automatically extract features based on the ACC data collected by IoT devices for human fall detection. However, this method only considers the triaxial ACC data as one single input feature with three dimensions, without considering the different modalities of the three inputs. For fair comparison, we also compare to the following two methods, which consider multi-modal or multi-view data as their inputs. The first is MM-CNN [30], which designs a multi-channel CNN model for learning the features from different types of polysomnography signals (e.g., EEG, EMG, and EOG) to distinguish sleep stages. Although, for each type of PSG data, MM-CNN adopts a separate CNN channel to learn the signal's temporal context information, it treats all the channels equally and does not exploit the consensual and complementary information between them. The other one is DeepSense [31], a deep learning framework for analysing the signals from different mobile sensors. It first converts the original signals of different mobile sensors into the frequency domain, and then leverages CNN and RNN to take advantage of the interactions among different input modalities. Although DeepSense has been demonstrated to be effective in multiple challenging tasks by learning the correlation between different types of input signals, the quality of these inputs is not well considered. In DeepApnea, our self-attention module is designed to address this problem.

C. Experiment Setup

All the collected raw ACC data are preprocessed based on the techniques introduced in Section IV, i.e., resampling at 8 Hz, signal denoising, and normalization. Then, we train and examine the proposed DeepApnea model based on the processed data.

To train our DeepApnea model, we use categorical cross-entropy to calculate the training loss and adopt the Adam optimizer for updating the model's parameters. All the parameters are initialized using the HeNormal initializer, and we train the model for 500 epochs with an initial learning rate of 0.005. Specifically, we apply 3-fold cross-validation for apnea classification. The final results are calculated as the mean values of these cross-validation experiments. The performance is measured in terms of accuracy and F1-score.
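The 3-fold protocol above can be sketched as follows. The fold split is done once over window indices; the majority-class "model" is a trivial placeholder standing in for the actual classifier, and the sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
y = rng.integers(0, 4, size=n)          # four sleep-event classes per window

def three_fold_indices(n, seed=0):
    """Shuffle the window indices once, then cut into 3 near-equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 3)

folds = three_fold_indices(n)
accs = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # placeholder "model": predict the majority class of the training folds
    majority = int(np.bincount(y[train_idx]).argmax())
    accs.append(float(np.mean(y[test_idx] == majority)))

final_score = float(np.mean(accs))      # reported result = mean over folds
print(len(folds))
```

Each window appears in exactly one test fold, so the mean over folds uses every labeled event exactly once for evaluation.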

D. Overall Performance Comparison

We compare the overall performance of DeepApnea with the different machine learning methods introduced in Section VI-B, and the results are shown in Fig. 10. As can be seen from the figure, DeepApnea achieves the best performance in terms of accuracy and Macro-F1 score. By comparing Fig. 10 (a) and (b), we can see that the performance improvement of DeepApnea is much higher when the Macro-F1 score instead of accuracy is used as the performance metric. As explained in the last section, accuracy is a good metric to show the overall classification result. However, due to the imbalance of the data set, the result is dominated by the classification of normal events, and it cannot accurately reflect the classification results of the different types of sleep apnea events. By treating different sleep events equally, the macro-F1 score can better reflect the classification results of the different types of sleep apnea events.

Fig. 10. Overall performance comparison: (a) accuracy; (b) Macro-F1 score.

Traditional machine learning methods do not perform well at distinguishing different types of apneas. For instance, while ApneaDet [15] can achieve a high F1-score on the binary classification task (Fig. 11 (a)), its F1-score drops to around 55% when applied to this multi-classification task. Compared to traditional feature-based machine learning methods, deep learning methods perform better. As shown in Fig. 10 (a), AHF-CNN, MM-CNN, DeepSense, and DeepApnea have higher accuracy than NB, DT, SVM, ApneaDet, and ABT. A similar advantage can also be seen in Fig. 10 (b). This is because the traditional feature-based machine learning methods only consider the properties of respiratory peaks as the hand-crafted features, which ignores the signal trend and other potentially useful apnea characteristics. On the other hand, deep learning-based methods do not adopt hand-crafted features and automatically learn the appropriate features for sleep apnea classification.

Compared to the other deep learning based methods, our DeepApnea model achieves much better performance. Specifically, compared to AHF-CNN, our model improves the accuracy by 5.2% and the macro-F1 score by 18.9%. This is because AHF-CNN only considers the triaxial ACC data as one single feature without considering the different modalities among the three inputs. As a result, the information from different axes cannot be effectively leveraged, and hence it underperforms our model. Compared to MM-CNN, our model improves the accuracy by 4.9% and the macro-F1 score by 17.9%. Although MM-CNN considers the ACC signal from each axis, it treats data from each axis equally and does not take advantage of the correlation between different axes. Compared to DeepSense, our model improves the accuracy by 3.8% and the macro-F1 score by 16.7%. Although DeepSense extracts the correlation information between different axes, it does not consider the quality of the input signal, and hence underperforms our model.

E. Per-class Performance

In this subsection, we compare the performance of the different methods on classifying sleep events into four types: normal, hypopnea, obstructive apnea, and central apnea. For normal sleep events, as shown in Fig. 11 (a), traditional machine learning methods can achieve almost perfect performance (i.e., about 97% using DT, RF, and ABT) based on the hand-crafted features introduced in ApneaDet [15]. This is consistent with the results in [15], which only focuses on classifying sleep events into normal and sleep apnea.

Fig. 11. Per-class performance comparisons: (a) normal; (b) hypopnea; (c) obstructive apnea; (d) central apnea.

For hypopnea, obstructive, and central apnea, as shown in Fig. 11 (b), (c), and (d), deep learning-based methods outperform the traditional methods. In Fig. 11 (d), we can see that SVM and ApneaDet underperform the other traditional methods. This is attributed to their need to map simple hand-crafted features into a highly dimensional space, leading to potential overfitting with the limited central apnea data (the Hughes phenomenon). Different from AHF-CNN, MM-CNN, and DeepSense, DeepApnea leverages the self-attention and cross-axis correlation modules to obtain better features and adopts several techniques for preventing overfitting, such as batch-normalization layers and dropout layers. Thus, DeepApnea achieves the highest F1 score on central apnea.

Overall, our DeepApnea model achieves 99.3%, 71.6%, 82.8%, and 68.2% F1 score for normal sleep, hypopnea, obstructive apnea, and central apnea, respectively.

F. Axis Data Fusion Study

In DeepApnea, data from three axes are leveraged. In this section, we demonstrate why such data fusion is necessary for improving performance. We compare DeepApnea with the following simplified models.

• DeepApnea-X (DeepApnea-Y/DeepApnea-Z): It only takes the ACC data from the X-axis (Y-axis/Z-axis) as the input. Since there is only data from a single axis, there is no cross-axis correlation.
• DeepApnea-XY (DeepApnea-YZ/DeepApnea-XZ): It takes the ACC data from the X-axis and Y-axis (Y-axis and Z-axis)/(X-axis and Z-axis) as the input. Since data from two axes are used, the cross-axis correlation and self-attention techniques are also applied.
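For reference, the two metrics used throughout this section, accuracy and macro-F1, can be computed as below. The toy labels mimic the class imbalance of the dataset (normal events dominate); the always-normal "classifier" is a hypothetical illustration of why accuracy alone is misleading here.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def macro_f1(y_true, y_pred, n_classes=4):
    """Unweighted mean of per-class F1, so a rare class (e.g. central
    apnea) counts as much as the dominant normal class."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

# toy imbalanced labels: class 0 dominates, like normal sleep events
y_true = np.array([0] * 16 + [1] * 2 + [2] * 1 + [3] * 1)
y_pred = np.zeros_like(y_true)          # classifier that always says "normal"
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

The always-normal predictor reaches 80% accuracy on these labels but only about 0.22 macro-F1, which is exactly the gap between Fig. 10 (a) and (b) that the text discusses.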

Fig. 12. Performance comparisons for data fusion models leveraging inputs from various numbers of axes: (a) overall; (b) normal; (c) hypopnea; (d) obstructive apnea; (e) central apnea.

Fig. 12 (a) shows the overall performance of these simplified models. As can be seen, DeepApnea significantly outperforms the three two-axis fusion models (DeepApnea-XY, DeepApnea-YZ, DeepApnea-XZ), which in turn outperform the one-axis fusion models (DeepApnea-X, DeepApnea-Y, DeepApnea-Z). This demonstrates the benefits of leveraging data from three axes.

Fig. 12 (b)(c)(d)(e) show the per-class performance of these simplified models. Similar to the results in Section VI-E, for normal sleep events, all models achieve almost perfect performance (i.e., above 98%). For hypopnea and obstructive apnea, as shown in Fig. 12 (c) and (d), DeepApnea significantly outperforms the three two-axis fusion models (DeepApnea-XY, DeepApnea-YZ, DeepApnea-XZ), which outperform the one-axis fusion models (DeepApnea-X, DeepApnea-Y, DeepApnea-Z).

For central apnea, as shown in Fig. 12 (e), DeepApnea significantly outperforms both the three two-axis fusion models and the three one-axis fusion models. Because the number of central apnea events is very small compared to the other sleep apnea events in our dataset, the single-axis and two-axis fusion models do not perform well. DeepApnea leverages information from all axes and considers their correlations, and hence obtains more informative and representative features, outperforming the simplified models.

G. Ablation Study

In this subsection, we validate the effectiveness of the proposed self-attention and cross-correlation modules by comparing with the following two models: DeepApnea-SelAtt, which only contains the CNN-based feature extractor and the self-attention module without the cross-correlation module, and DeepApnea-CroCor, which only contains the CNN-based feature extractor and the cross-correlation module without the self-attention module.

TABLE II
ABLATION STUDY

Method              Normal    Hypopnea   Obstructive   Central
DeepApnea-SelAtt    98.75%    66.02%     76.95%        38.30%
DeepApnea-CroCor    98.63%    64.64%     75.72%        35.09%
DeepApnea           99.53%    71.58%     82.86%        68.29%

According to Table II, DeepApnea clearly outperforms the other two models, which illustrates that both the self-attention and cross-correlation modules help extract informative features from the raw ACC data. Specifically, for normal sleep events, there is little improvement when using the full DeepApnea model compared to DeepApnea-SelAtt and DeepApnea-CroCor. This is because the breathing pattern of normal sleep is significantly different from that of sleep apnea, so even the incomplete models can achieve an almost perfect F1 score. For hypopnea, compared to DeepApnea-SelAtt and DeepApnea-CroCor, the improvements are 5.56% and 6.93%, respectively. For obstructive apnea, the improvements are 5.9% and 7.14%. Finally, for central apnea, the improvements are 29.9% and 33.2%, respectively, which demonstrates that the self-attention and cross-correlation modules can significantly improve performance, especially when the training dataset is small.

H. Comparison of the Denoising Methods

In this section, we compare how different denoising methods, i.e., ADA, moving average, and TV filter, affect the performance of the proposed DeepApnea model.

TABLE III
COMPARISON OF DIFFERENT DENOISING METHODS

Denoising Method   Normal    Hypopnea   Obstructive   Central
Moving average     98.51%    42.55%     69.8%         24.02%
TV filter          99.05%    65.05%     78.57%        38.10%
ADA                99.53%    71.58%     82.86%        68.29%

As shown in Table III, DeepApnea achieves the best classification results when ADA is used for denoising. The moving average filter has the worst performance because it may remove potentially useful apnea information by simply averaging different sub-sequences of the raw signal. Specifically, the F1 score when using the moving average filter is 42.55%, 69.8%, and 24.02% for hypopnea, obstructive apnea, and central apnea, respectively. Although the TV filter can preserve the useful signal information, it cannot eliminate low-amplitude noise. As discussed in Section IV-B, our ADA method not only keeps the periodic respiratory information but also removes the unnecessary noise. Therefore, compared to the TV filter, using ADA further improves the F1 score by 6.7%, 4.3%, and 30.19% on classifying hypopnea, obstructive apnea, and central apnea, respectively.

I. System Profiling

When running DeepApnea, the accelerometer data from the smartwatch can be moved to the smartphone through Bluetooth. Based on the clinical study introduced in Section VI-A, the total size of the raw ACC data from a single subject during an 8-hour measurement is about 45 MB, which can be transferred from a smartwatch to a smartphone within three seconds through Bluetooth 5.0.

In order to measure the running time of the proposed DeepApnea model on modern smartphones, we implemented DeepApnea on several smartphones based on TensorFlow Lite.
As shown in Table IV, it only takes two or three seconds to generate the classification results from the data of a whole-night measurement (e.g., eight hours). We also consider the energy consumption of DeepApnea on the smartwatch. When running DeepApnea, the smartwatch has to constantly record the real-time ACC data and write it into the watch's external storage (e.g., SD card).

TABLE IV
THE RUNNING TIME OF DEEPAPNEA ON SMARTPHONES

Smart Phone         Processing Unit   Preprocessing (seconds)   Deep Learning Inference (seconds)
Google Pixel 3      Snapdragon 845    1.38                      2.36
Huawei Mate30 Pro   Kirin 990         0.83                      1.22
Google Pixel 6      Google Tensor     0.45                      1.01

We measured two modern commercial smartwatches: (1) Huawei Watch 2 and (2) Apple Series 6. During the measurement, we close all irrelevant applications and turn off the watches' screens to make sure that the power usage comes from the operating system and DeepApnea. Table V shows their battery usage when running or not running DeepApnea for a whole night. As can be seen, running DeepApnea for eight hours on the Huawei Watch 2 drains 55% of the battery. For the more advanced Apple Series 6, the battery only drains 23%. This shows that modern smartwatches can support DeepApnea running for at least one night.

TABLE V
THE BATTERY USAGE (mAh)

Smart Watch                  w/o DeepApnea   w/ DeepApnea
Huawei Watch 2 (420 mAh)     49.2 mAh        229.8 mAh
Apple Series 6 (303.8 mAh)   24.6 mAh        69.6 mAh

VII. DISCUSSIONS

In this paper, we focus on identifying various types of sleep apnea using the accelerometers in smartwatches. While the evaluations demonstrate the superior performance of DeepApnea, it can be further improved by considering other sensors besides the accelerometer. Sleep apnea not only disrupts typical respiratory patterns, affecting the accelerometer signal on smartwatches, but also induces cardiovascular variations, resulting in fluctuations in oxygen saturation (SpO2) and heart rate. In recent years, new physiological sensors have been integrated into smartwatches, such as the oximeter sensor and the photoplethysmography (PPG) sensor, which enable the measurement of oxygen saturation and heart rate. By incorporating data from these new sensors, we could achieve more accurate and robust apnea detection through smartwatches.

One limitation of our dataset is its small size, comprising data from only 20 subjects. Although the dataset has plenty of obstructive apneas and hypopneas, the number of central apneas is limited, with the majority occurring in subjects 10 and 13. Moreover, several subjects do not have central apneas, such as subjects 4, 9, and 12. As a result, in our 3-fold cross-validation, the 125 central apnea events are split into training and testing sets without checking whether they are from the same subject. From Fig. 9, we can see that the apnea distribution varies significantly across different subjects. Consequently, the limited dataset may impact the generalizability of DeepApnea when applied to new users, potentially leading to large variations in predictions. To provide a more comprehensive evaluation of our model, we plan to conduct a larger clinical study in the future.

VIII. CONCLUSION

In this paper, we presented DeepApnea, a deep learning based sleep apnea detection system that leverages patients' wrist movement data collected by smartwatches to identify different types of sleep apnea. We first proposed signal preprocessing methods to filter the raw ACC data, smoothing away noise while preserving the respiratory signal and potential features for identifying sleep apnea. Then, we designed a deep learning architecture to extract features from the three ACC axes collaboratively. Specifically, we apply a self-attention technique to accentuate the more significant features and a cross-axis correlation technique to exploit the correlations among different axes. The extracted deep features and the correlation information are merged through aggregated classification to further improve the classification accuracy. Through a clinical study, we demonstrate that DeepApnea outperforms existing solutions on multiclass classification. More specifically, DeepApnea can detect different sleep apnea events with high F1-scores, i.e., normal sleep (99.5%), obstructive apnea (82.9%), hypopnea (71.6%), and central apnea (68.3%). Finally, by profiling DeepApnea on different commodity devices, we demonstrate that it is practical to run our system on modern smartwatches and smartphones.

REFERENCES

[1] J. M. Parish, "Sleep-Related Problems in Common Medical Conditions," Chest Journal, 2009.
[2] A. S. Association, "Sleep and Sleep Disorder Statistics," April 2021, https://www.sleepassociation.org/about-sleep/sleep-statistics/.
[3] N. F. Watson, "Health care savings: the economic value of diagnostic and therapeutic care for obstructive sleep apnea," Journal of Clinical Sleep Medicine, 2016.
[4] S. L. Appleton, A. Vakulin, R. D. McEvoy, A. Vincent, S. A. Martin, J. F. Grant, A. W. Taylor, N. A. Antic, P. G. Catcheside, G. A. Wittert et al., "Undiagnosed obstructive sleep apnea is independently associated with reductions in quality of life in middle-aged, but not elderly men of a population cohort," Sleep and Breathing, 2015.
[5] A. Sabil, C. Marien, M. LeVaillant, G. Baffet, N. Meslier, and F. Gagnadoux, "Diagnosis of sleep apnea without sensors on the patient's face," Journal of Clinical Sleep Medicine, 2020.
[6] T. Rahman, A. T. Adams, R. V. Ravichandran, M. Zhang, S. N. Patel, J. A. Kientz, and T. Choudhury, "Dopplesleep: A contactless unobtrusive sleep sensing system using short-range doppler radar," in IEEE International Conference on Pervasive Computing and Communications (PerCom), 2015.
[7] L. Chen, J. Xiong, X. Chen, S. I. Lee, D. Zhang, T. Yan, and D. Fang, "Lungtrack: Towards contactless and zero dead-zone respiration monitoring with commodity rfids," ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2019.
[8] S. Yue, Y. Yang, H. Wang, H. Rahul, and D. Katabi, "Bodycompass: Monitoring sleep posture with wireless signals," ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2020.
[9] Q. S. Xue, D. Shin, A. Pathak, J. Garrison, J. Hsu, M. Malhotra, and S. Patel, "Luckychirp: Opportunistic respiration sensing using cascaded sonar on commodity devices," in IEEE International Conference on Pervasive Computing and Communications (PerCom), 2022.

[10] D. Liaqat, M. Abdalla, P. Abed-Esfahani, M. Gabel, T. Son, R. Wu,
A. Gershon, F. Rudzicz, and E. D. Lara, "WearBreathing: Real World
Respiratory Rate Monitoring Using Smartwatches,” ACM on Interactive,
Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2019.
[11] X. Sun, L. Qiu, Y. Wu, Y. Tang, and G. Cao, “SleepMonitor: Monitoring
Respiratory Rate and Body Position During Sleep Using Smartwatch,”
ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
(IMWUT), 2017.
[12] L. Zhao, F. Zhang, H. Zhang, Y. Liang, A. Zhou, and H. Ma, “Robust
respiratory rate monitoring using smartwatch photoplethysmography,”
IEEE Internet of Things Journal, pp. 4830–4844, 2022.
[13] Z. Jia, A. Bonde, S. Li, C. Xu, J. Wang, Y. Zhang, R. E. Howard, and
P. Zhang, “Monitoring a person’s heart rate and respiratory rate on a
shared bed using geophones,” in ACM SenSys, 2017.
[14] J. Clemente, M. Valero, F. Li, C. Wang, and W. Song, “Helena: Real-
time contact-free monitoring of sleep activities and events around the
bed,” in IEEE International Conference on Pervasive Computing and
Communications (PerCom), 2020.
[15] X. Chen, Y. Xiao, Y. Tang, J. Fernandez-Mendoza, and G. Cao,
“Apneadetector: Detecting sleep apnea with smartwatches,” ACM on
Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT),
2021.
[16] M. Shokoueinejad, C. Fernandez, E. Carroll, F. Wang, J. Levin, S. Rusk,
N. Glattard, A. Mulchrone, X. Zhang, A. Xie et al., “Sleep apnea: a
review of diagnostic sensors, algorithms, and therapies,” Physiological
measurement, 2017.
[17] R. Nandakumar, S. Gollakota, and N. Watson, “Contactless Sleep Apnea
Detection on Smartphones,” in ACM MobiSys, 2015.
[18] C. Xu, D. Tao, and C. Xu, “A survey on multi-view learning,” arXiv
preprint arXiv:1304.5634, 2013.
[19] A. V. Oppenheim, A. S. Willsky, S. H. Nawab, G. M. Hernández et al.,
Signals & systems. Pearson Educación, 1997.
[20] S. W. Smith, “The Moving Average Filter.” April 2021,
https://www.analog.com/media/en/technical-documentation/dsp-
book/dsp book Ch15.pdf.
[21] I. W. Selesnick and I. Bayram, “Total Variation Filtering. White paper.”
2010.
[22] J. Gao, H. Sultan, J. Hu, and W.-W. Tung, “Denoising nonlinear time
series by adaptive filtering and wavelet shrinkage: a comparison,” IEEE
signal processing letters, 2009.
[23] Z. Lin, M. Feng, C. N. d. Santos, M. Yu, B. Xiang, B. Zhou, and
Y. Bengio, “A structured self-attentive sentence embedding,” arXiv
preprint arXiv:1703.03130, 2017.
[24] Y. Zhang, L. Wang, H. Chen, A. Tian, S. Zhou, and Y. Guo, “If-
convtransformer: A framework for human activity recognition using imu
fusion and convtransformer,” ACM on Interactive, Mobile, Wearable and
Ubiquitous Technologies (IMWUT), 2022.
[25] H. Ren, J. Wang, W. X. Zhao, and N. Wu, “Rapt: Pre-training of time-
aware transformer for learning robust healthcare representation,” in ACM
SIGKDD, 2021.
[26] H. Xue, W. Jiang, C. Miao, Y. Yuan, F. Ma, X. Ma, Y. Wang, S. Yao,
W. Xu, A. Zhang et al., “Deepfusion: A deep learning framework for
the fusion of heterogeneous sensory data,” in ACM MobiHoc, 2019.
[27] L. Mou, R. Men, G. Li, Y. Xu, L. Zhang, R. Yan, and Z. Jin, “Natural
language inference by tree-based convolution and heuristic matching,”
arXiv preprint arXiv:1512.08422, 2015.
[28] A. S. Rathore, W. Zhu, A. Daiyan, C. Xu, K. Wang, F. Lin, K. Ren,
and W. Xu, “Sonicprint: a generally adoptable and secure fingerprint
biometrics in smart devices,” in ACM MobiSys, 2020.
[29] G. L. Santos, P. T. Endo, K. H. d. C. Monteiro, E. d. S. Rocha,
I. Silva, and T. Lynn, “Accelerometer-based human fall detection using
convolutional neural networks,” Sensors, 2019.
[30] S. Chambon, M. N. Galtier, P. J. Arnal, G. Wainrib, and A. Gramfort, “A
deep learning architecture for temporal sleep stage classification using
multivariate and multimodal time series,” IEEE Transactions on Neural
Systems and Rehabilitation Engineering, 2018.
[31] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, “Deepsense:
A unified deep learning framework for time-series mobile sensing data
processing,” in Proceedings of the 26th International Conference on
World Wide Web, 2017.
