Triple Cross-Domain Attention On Human Activity Recognition Using Wearable Sensors
Abstract—Efficiently identifying activities of daily living (ADL) provides very important contextual information that can improve the effectiveness of various sports tracking and healthcare applications. Recently, attention mechanisms that selectively focus on time series signals have been widely adopted in sensor-based human activity recognition (HAR), since they can enhance the interesting target activity and ignore irrelevant background activity. Several attention mechanisms have been investigated and achieve remarkable performance in HAR scenarios. Despite their success, these prior attention methods ignore the cross-interaction between different dimensions. In this paper, to avoid this shortcoming, we present a triplet cross-dimension attention for the sensor-based activity recognition task, where three attention branches are built to capture the cross-interaction between the sensor dimension, temporal dimension and channel dimension. The effectiveness of the triplet attention method is validated through extensive experiments on four public HAR datasets, namely UCI-HAR, PAMAP2, WISDM and UNIMIB-SHAR, as well as a weakly labeled HAR dataset. Extensive experiments show consistent improvements in classification performance with various backbone models such as a plain CNN and ResNet, demonstrating the good generalization ability of triplet attention. Visualization analysis is provided to support our conclusions, and an actual implementation is evaluated on a Raspberry Pi platform.

Index Terms—Activity recognition, attention, weakly supervised learning, wearable sensors, convolutional neural networks.

Manuscript received 15 July 2021; revised 2 November 2021; accepted 22 November 2021. Date of publication 5 January 2022; date of current version 23 September 2022. This work was supported in part by the National Science Foundation of China under Grant 61971228 and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20191371. (Corresponding author: Lei Zhang.)

Yin Tang, Lei Zhang, Qi Teng, and Fuhong Min are with the School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing 210023, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Aiguo Song is with the School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TETCI.2021.3136642

I. INTRODUCTION

DURING recent years, human activity recognition (HAR) using various motion sensors embedded in smartphones or other wearable devices has become a new research hotspot in ubiquitous and mobile computing, owing to the rapid growth of application demands in domains such as health care, life assistance and exercise monitoring. The sensor-based HAR task [1]–[3] can be regarded as a multi-channel time series classification problem, in which a fixed-length sliding window is used to split the time series signal into equal segments. Various traditional machine learning approaches such as Logistic Regression, Decision Trees, Random Forest and naive Bayesian methods have been widely adopted in the HAR area [4], [5] and have achieved remarkable performance. However, these shallow learning methods often require handcrafted feature extraction from the data, which depends heavily on expert knowledge of the specific domain [6]. Such handcrafted feature engineering inevitably restricts the practicability of a HAR model when the task is transferred from one domain to another.

Lately, deep learning techniques [7]–[9] have broken through the limits of shallow learning methods, enabling richer feature representations to be learned automatically without domain-specific knowledge. In particular, compared with shallow learning methods whose handcrafted features can only recognize low-level or simple activities, convolutional neural networks (CNNs) [7] are more suitable for recognizing complex activities thanks to their local dependency and scale invariance. CNNs have significantly pushed state-of-the-art performance in HAR scenarios given their rich representation ability. Despite this effectiveness, deep HAR still faces many key challenges, one of which is ground truth annotation [10]. In a supervised learning setting, the use of deep CNNs relies heavily on strictly labeled activity sensor data for training. Nevertheless, compared with HAR that uses video data (e.g., from a GoPro motion camera), the high-dimensional time series data from motion sensors such as accelerometers is much harder to interpret and annotate, which makes annotation for HAR cumbersome and arduous.

Such challenges can be tackled by utilizing an attention mechanism [11], [12], which has shown great potential in a large variety of computer vision and natural language processing tasks. The learning of attention weights can help the model focus on the target object, thereby improving recognition accuracy. On the other hand, for an annotator in charge of recording sensor data, it is much simpler to identify whether a target activity occurs in a long sensor sequence than to mark its exact extent. If a specific activity can be recognized from coarse or weak labels, it will significantly ease the burden of manual labeling. Intuitively, the attention mechanism is capable of telling where or what to focus on by selectively enhancing the interesting target activity while weakening redundant or even irrelevant information. It therefore deserves further research whether the attention mechanism can promote the state-of-the-art performance of HAR by consciously improving the output feature maps of a convolutional network.

Recently, hard attention [13] and soft attention [14] have been proposed for the weakly supervised learning scenario,
2471-285X © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: MAULANA AZAD NATIONAL INSTITUTE OF TECHNOLOGY. Downloaded on October 04,2023 at 05:23:17 UTC from IEEE Xplore. Restrictions apply.
1168 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 6, NO. 5, OCTOBER 2022
in which sensor data does not need to be strictly labeled. One only needs to know which kind of activity has occurred in a long sensor sequence, without the specific location of the target activity. The learned attention weights can help to focus on the target activity within a long background sequence. However, these two attention mechanisms can only tell us where to focus, ignoring channel information, which plays an important role in deciding what to focus on. The dual attention network [15] for weakly supervised HAR applications has demonstrated the advantages of computing multiple attentions. Although the dual attention mechanism provides significant performance improvements in the HAR scenario, it does not account for the importance of capturing cross-dimension interaction, which has been shown to have a favorable impact in computer vision tasks.

In this paper, we propose a novel triplet attention network for the HAR scenario, which blends three attention branches. Given a standard convolutional layer, consider its input tensor with shape C × T × S, in which C, T and S are the channel, temporal and sensor-modality dimensions respectively. Each branch is responsible for capturing the cross-dimension interaction between the spatial dimensions (T × S) and the channel dimension (C) of the sensor input. We conduct extensive experiments to evaluate the triplet attention network on several public benchmark HAR datasets, consisting of the UCI-HAR dataset [16], PAMAP2 dataset [17], WISDM dataset [18] and UNIMIB-SHAR dataset [19], as well as the weakly labeled HAR dataset. The experimental results show that triplet attention performs better than single or dual attention. The main contributions of this work are summarized as follows:

• Firstly, we propose a new architecture relying on a triple attention mechanism for the HAR task, which helps extract richer activity feature representations by building three attention branches to capture the cross-interaction between the sensor dimension, temporal dimension, and channel dimension.

• Second, the triple attention strengthens the importance of cross-dimension interaction, and is superior to its predecessors, i.e., single or dual attention.

• Finally, extensive experiments are conducted on several public HAR datasets, and several key hyperparameters are analyzed in detail. We also examine an actual implementation on a Raspberry Pi platform with an ARM-based computing core. The experimental results show that the triplet attention method provides competitive results at a negligible computational cost.

The rest of the paper is organized as follows. Section II introduces related works on attention-based HAR methods. Section III presents the overall architecture of the proposed triplet attention. In Section IV and Section V, we detail experimental results obtained on four public HAR datasets and the weakly labeled HAR dataset, which are compared with existing state-of-the-art methods. Moreover, several ablation studies on the triplet attention method are provided. Section VI summarizes our conclusion.

II. RELATED WORKS

Attention in human perception is everywhere: it selectively focuses on interesting parts while suppressing other irrelevant or even misleading information. During the past few years, the attention mechanism has been widely incorporated into various deep CNN architectures, where it can significantly improve performance on large-scale computer vision tasks. Several attention mechanisms related to our work are introduced as follows. Hu et al. first proposed the Squeeze-and-Excitation Networks (SENet) [20], which successfully utilize global average-pooled features to compute channel attention in an efficient way. This was followed by the introduction of the Convolutional Block Attention Module (CBAM) [21], in which the combination of channel attention and spatial attention leads to significant performance improvement. Global-Context Networks (GC-Net) [22] proposed a novel NL-block, which takes into account global context modeling and lightweight modular design. More recently, Misra et al. [23] adopted a triplet attention mechanism for a variety of computer vision tasks, which concentrates on cross-dimension interaction. However, the attention mechanism has rarely been explored in the sensor-based HAR scenario.

Due to the popularity of the attention mechanism in deep learning, a surge of research has emerged that utilizes attention for handling HAR tasks. Recently, Ma et al. [24] proposed a novel AttnSense for HAR, which incorporates the attention mechanism into a Gated Recurrent Units (GRU) subnet to capture the dependencies of sensor signals in both the spatial and temporal domains. Zeng et al. [25] highlighted the important parts of different time series and sensor modalities by designing temporal attention and sensor attention with Long Short-Term Memory (LSTM). Compared to recurrent neural networks, CNNs have a better ability for feature extraction. In recent works, two mainstream attention mechanisms, hard attention [13] and soft attention [14], have been incorporated into convolutional architectures to perform weakly supervised HAR tasks, but they ignore the importance of sensor channels. Gao et al. [15] proposed a novel dual attention method for HAR that blends channel attention and spatial attention, demonstrating obvious superiority in handling the multimodal HAR task. In order to capture the cross-domain interaction of sensor signals, we for the first time propose a new triple attention network for the HAR task, which is able to extract meaningful cross-dimensional features by building three main attention branches.

III. MODEL

The channel attention [20] typically computes a singular weight, i.e., a scalar for each channel of the input sensor tensor, which is used to scale the feature maps and thereby generate the attention effect. Although this lightweight channel attention is very effective, there is an obvious shortcoming in its computing process. Usually, in order to produce these singular weights for each channel, one has to use global average pooling to spatially subsample the input sensor tensor along each channel, which inevitably leads to a significant loss in
TANG et al.: TRIPLE CROSS-DOMAIN ATTENTION ON HUMAN ACTIVITY RECOGNITION USING WEARABLE SENSORS 1169
Fig. 1. The overview of our proposed triplet attention (TA) module for the HAR system. It depicts the three pipelines: data collection and preprocessing, model training, and activity recognition. T&S, T&C and C&S represent temporal-sensor interaction, temporal-channel interaction, and channel-sensor interaction, respectively.
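The TA module of Fig. 1 can be condensed into a short PyTorch sketch. This is a minimal illustration under stated assumptions, not the paper's exact implementation: Z-Pooling is taken to be concatenated max- and average-pooling (consistent with the 2-channel shapes given in Section III), the convolution kernel is fixed at 3 × 1, and the learnable branch weights of Eq. (5) are initialized to 1/3 so that training starts from the plain average of Eq. (4).

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    # Paper's Z_Pooling, assumed here to concatenate max- and
    # average-pooled features along dim 1, yielding 2 channels.
    def forward(self, x):
        return torch.cat([x.max(dim=1, keepdim=True)[0],
                          x.mean(dim=1, keepdim=True)], dim=1)

class AttentionGate(nn.Module):
    # Z-Pool -> k x 1 conv -> BN -> sigmoid, applied multiplicatively.
    def __init__(self, k=3):
        super().__init__()
        self.pool = ZPool()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=(k, 1), padding=(k // 2, 0), bias=False),
            nn.BatchNorm2d(1),
        )

    def forward(self, x):
        return x * torch.sigmoid(self.conv(self.pool(x)))

class TripletAttention(nn.Module):
    """Three branches combined as in Eq. (5); the learnable weights
    alpha are initialized to 1/3 (an assumption) so the module starts
    from the simple average of Eq. (4)."""
    def __init__(self, k=3):
        super().__init__()
        self.branch_tc = AttentionGate(k)  # temporal-channel interaction
        self.branch_cs = AttentionGate(k)  # channel-sensor interaction
        self.branch_ts = AttentionGate(k)  # temporal-sensor interaction
        self.alpha = nn.Parameter(torch.full((3,), 1.0 / 3.0))

    def forward(self, x):                  # x: (N, C, T, S)
        # branch 1: rotate along the T axis -> (N, S, T, C), then back
        y1 = self.branch_tc(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        # branch 2: rotate along the S axis -> (N, T, C, S), then back
        y2 = self.branch_cs(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        # branch 3: no rotation, attends over the (T, S) plane
        y3 = self.branch_ts(x)
        return self.alpha[0] * y1 + self.alpha[1] * y2 + self.alpha[2] * y3
```

Because the gates only contain a 2-to-1 convolution and a batch norm, the module adds almost no parameters, matching the "negligible cost" claim.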
The first branch is in charge of calculating the cross-interaction between the temporal dimension and the channel dimension. Firstly, the tensor χ with input shape (C × T × S) is rotated 90° counter-clockwise along the T axis to generate a new tensor χ1 with the shape (S × T × C). χ1 is then fed into Z-Pooling, which generates a tensor χ1* with the shape (2 × T × C). In the third stage, χ1* is passed through a standard convolution with a k × 1 kernel (e.g., 3 × 1 or 5 × 1), followed by batch normalization, which results in an intermediate output of shape (1 × T × C). After passing through a sigmoid activation, the intermediate output is turned into the attention weights ω1, which are applied to χ1; the result is then rotated 90° clockwise along the T axis to keep the shape of the input χ.

In the second branch, the cross-interaction between the sensor dimension and the channel dimension is computed in a similar way. The tensor χ with input shape (C × T × S) is rotated 90° counter-clockwise along the S axis, which provides a new tensor χ2 with the shape (T × C × S). χ2 is then passed through the Z-Pooling layer, which generates a tensor χ2* with the shape (2 × C × S). In the third stage, χ2* is passed through a standard convolution with a k × 1 kernel (e.g., 3 × 1 or 5 × 1), followed by batch normalization, which results in an intermediate output of shape (1 × C × S). After passing through a sigmoid activation, the intermediate output is turned into the attention weights ω2, which are applied to χ2; the result is then rotated 90° clockwise along the S axis to maintain the shape of the input χ.

For the third branch, the channels of the input tensor χ are reduced to two by the Z-Pooling operation, which provides the tensor χ3 with the shape (2 × T × S). χ3 is then fed into a standard convolution with a k × 1 kernel (e.g., 3 × 1 or 5 × 1), followed by batch normalization, which results in an intermediate output. The output is then fed into a sigmoid activation, which generates the attention weights ω3 with shape (1 × T × S). The attention weights ω3 are then applied to the input χ.

Finally, the refined tensors from the three branches are aggregated. In the simplest case, this is a plain average:

Y = (1/3)(R(ω1 χ1)) + (1/3)(R(ω2 χ2)) + (1/3)(ω3 χ3),   (4)

where ω1, ω2 and ω3 are the three cross-dimensional attention weights, and χ1 and χ2 represent the rotated tensors obtained by rotating the input tensor χ 90° counter-clockwise along the T axis and S axis respectively. R denotes the corresponding 90° clockwise rotation. Compared with the simple averaging above, model performance can be further improved by introducing a combination of three learnable weight parameters, which can be formulated as:

Y = α1(R(ω1 χ1)) + α2(R(ω2 χ2)) + α3(ω3 χ3),   (5)

which will be detailed in Section V.B.

IV. EXPERIMENT

In the following, we describe the experimental setup and main results in detail. All the experiments are divided into three parts. Firstly, to demonstrate the superiority of the proposed triplet attention method, we compare classification results on four publicly available HAR datasets: UCI-HAR, WISDM, PAMAP2 and UNIMIB-SHAR. All datasets were recorded by various sensors such as accelerometers and gyroscopes, and reflect human activities in different scenarios. Secondly, detailed ablation experiments are provided to analyze the impact of several hyperparameters. Finally, we evaluate the performance of triplet attention on the weakly supervised activity recognition task, which uses the weakly labeled dataset collected by He et al. [26]. The impact of the different cross-dimension attentions for HAR is also explored.

A. Training Details

Our model is trained by minimizing the cross-entropy (CE) loss using mini-batch gradient descent, where the batch size is set to 200. An Adam optimizer with a dynamic learning rate is used. The initial learning rate is set to 0.001 and is reduced by a factor of 0.1 after every 100 epochs. All the experiments are implemented in Python using the PyTorch framework on a server with an Intel i7-6850K CPU, 64 GB RAM and an NVIDIA RTX 3090 GPU. Since the classes in various naturalistic activity datasets are highly imbalanced, class weights need to be reconsidered according to their sample proportions. Thus, the mean F1 score [27] is used as the metric to evaluate final performance.

B. Datasets

A comprehensive evaluation of the proposed method is conducted using four popular HAR datasets that include both high-dimensional and low-dimensional sensor modalities. The sensor data is segmented using the sliding window technique with different window sizes and step lengths, which has an important influence on the recognition system's practical performance. We select the same window size and step length adopted in previous successful cases [15], [27] to ensure fair comparison.

• UCI-HAR [16]: This dataset was collected from 30 recruited volunteers. Each was required to wear a Samsung Galaxy S II smartphone around the waist and perform six simple daily activities: "Walking," "Going upstairs," "Going downstairs," "Sitting," "Standing," and "Laying". Three-axis accelerometer and gyroscope signals were recorded at a fixed frequency of 50 Hz. The raw data is first preprocessed by a noise filter and then segmented by a sliding window with a fixed length of 128 and 50% overlap. Finally, the whole dataset is randomly split into two parts, with 70% for training and 30% for test.

• PAMAP2 [17]: The Physical Activity Monitoring for Aging People 2 dataset was collected from 9 participants performing 12 daily activities ("Walking," "Lying down," "Standing," etc.) and exercises ("Watching TV," "Computer work," "Car driving," etc.). Three inertial measurement units (IMUs) were placed on the hand, chest, and ankle of each subject to collect raw sensor data from an accelerometer, gyroscope, magnetometer, and heart rate monitor. At a 100 Hz sampling rate, the collection process lasted around 10 hours. To perform fair comparisons with previous works [27], the sensor signal is down-sampled to 33.3 Hz and segmented with a 5.12 s sliding window and 78% overlap. Generally, this
dataset is randomly divided into two parts, in which 80% is used for training and 20% for test.

• WISDM [18]: The WISDM samples were collected from 29 volunteer subjects who performed 6 discriminative human activities ("Walking," "Jogging," "Sitting," "Standing," "Going downstairs" and "Going upstairs") while carrying mobile phones with the Android operating system in a front leg pocket. It contains 1,098,213 samples recorded at a rate of 20 Hz from a triaxial accelerometer. Accordingly, the accelerometer data is preprocessed with a sliding window of 10 seconds and 95% overlap (200 readings/window). This dataset is split into two parts, with 80% for training and 20% for test.

• UNIMIB-SHAR [19]: This dataset includes 11,771 samples from 30 test subjects, intended for human pose estimation and fall detection. During data collection, a Samsung Galaxy Nexus I9250 smartphone embedded with a Bosch BMA220 3D accelerometer measured sensor signals at a frequency of 50 Hz. The dataset consists of 17 fine-grained categories, further split into 9 classes of activities of daily living and 8 classes of falls. Accordingly, the sliding windows are produced with a size T = 151 (151 readings/window). Our experiment divides this dataset into two parts, with 70% for training and the rest for test.

C. Comparison Algorithms

The triplet attention mechanism can be used to update existing network architectures at a negligible cost. Extensive experiments are conducted to evaluate the performance gain brought by the triplet attention part. To demonstrate the generalization ability of the triplet attention and analyze how it influences the classification results, we use a standard CNN and an equally-sized ResNet [28] as our backbones, introduced as follows. Table I presents their detailed architectures.

• Standard CNN: The baseline CNN consists of three standard convolution layers. Batch normalization and ReLU activation are applied after each convolutional layer.

V. DISCUSSION

The proposed method is compared with both baselines on the four public HAR datasets. We have three major observations from Table II. Firstly, it can be seen that ResNet outperforms the original CNN due to its strong feature extraction ability; for instance, ResNet outperforms the standard CNN by 0.21% in terms of accuracy on the UCI-HAR dataset. Secondly, the results indicate that our triplet attention can further improve performance by clear gains over these baselines. From Table II, it can easily be seen that the proposed method achieves 1.35% and 0.62% performance gains on the PAMAP2 dataset when using CNN and ResNet as backbones, respectively. Similar results are also observed on the WISDM dataset. Meanwhile, triplet attention with almost the same complexity is superior to the original CNN and the equally-sized ResNet by 0.96% and 1.47% in terms of accuracy on the UNIMIB-SHAR dataset, respectively. This comparison consistently verifies the effectiveness of our model on different baselines. That is to say, it can significantly boost the accuracy of baselines, demonstrating that it generalizes well across various models on HAR datasets. Lastly, we note that the triplet attention introduces no extra parameters compared to the plain counterparts, which motivates us to build new lightweight networks by applying our proposed module.

In addition, the triplet attention method is compared with other state-of-the-art algorithms [15], [31], [32], [35]. Table II summarizes the main experimental results. Compared with recent state-of-the-art methods, it obtains better or competitive results without increasing model complexity. As shown in Table II, the integration of triplet attention with ResNet is superior by 0.44% to Xiao et al.'s result [31], which uses a federated learning method, on the UCI-HAR dataset. Compared with Teng et al.'s result [27] using a local loss method, the triplet attention achieves a 0.23% performance gain in terms of accuracy on the PAMAP2 dataset. On the WISDM dataset, our method is also able to beat Janarthanan et al.'s result [35] by 1.11%. Finally, the triplet attention also achieves very competitive accuracy on the UNIMIB-SHAR dataset, outperforming all previous results [15], [19], [27], [36]. In particular, as mentioned above, this indicates that the triplet attention can be used to update existing network architectures.

A. Visualization Analysis

To evaluate whether the cross-dimensional interaction provided by triplet attention can capture richer internal representations of sensor signals, we provide sample visualizations to better understand the cross-dimensional interaction between the sensor dimension, temporal dimension and channel dimension on the PAMAP2 dataset. The results show that our triplet attention
TABLE II
THE CLASSIFICATION PERFORMANCE ON FOUR HAR DATASETS
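The training configuration described in Section IV-A (cross-entropy loss, batch size 200, Adam starting at 0.001 and decayed by 0.1 every 100 epochs) can be sketched as follows; the tiny linear model and random mini-batch are placeholders standing in for the backbones of Table I, not the paper's actual code.

```python
import torch
import torch.nn as nn

# Placeholder model: a flattened (128 x 9) segment -> 6 activity classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(128 * 9, 6))

# Optimizer and schedule from Section IV-A: Adam at 1e-3, decayed
# by a factor of 0.1 every 100 epochs; CE loss; batch size 200.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(200, 128, 9)        # one mini-batch of sensor segments
y = torch.randint(0, 6, (200,))     # activity labels

for epoch in range(2):              # shortened; training runs far longer
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                # lr would drop to 1e-4 at epoch 100
```

When class imbalance matters, `nn.CrossEntropyLoss(weight=...)` accepts per-class weights, which is one way to realize the reweighting by sample proportion mentioned in the text.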
TABLE III
PERFORMANCE FOR DIFFERENT TRIPLET ATTENTION BRANCHES
Fig. 6. The test mean F1 (%) score at different sliding window sizes.
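The sliding-window segmentation whose window size is varied in Fig. 6 can be sketched in a few lines; the function name and the example sizes below are illustrative, with the UCI-HAR setting (window 128, 50% overlap) used as the example.

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Split a (T, S) multi-channel time series into fixed-length
    segments, as in the preprocessing stage of Fig. 1.
    window: segment length in samples; step: hop between segments
    (e.g., 50% overlap on UCI-HAR -> window=128, step=64)."""
    starts = range(0, signal.shape[0] - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

# e.g., a 10 s recording at 50 Hz with 9 sensor channels:
x = np.random.randn(500, 9)
segments = sliding_windows(x, window=128, step=64)
# segments.shape == (6, 128, 9)
```

Each row of `segments` then becomes one classification sample, which is why the window size trades off temporal context against label granularity.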
TABLE V
THE MEAN F1 (%) SCORE OF LEAVE-ONE-SUBJECT-OUT
EXPERIMENT ON PAMAP2 DATASET
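The mean F1 score reported in tables such as Table V can be computed as a macro average of per-class F1 values, so minority classes count as much as majority ones; this is a plain NumPy sketch, and the exact averaging convention of [27] may differ.

```python
import numpy as np

def mean_f1(y_true, y_pred, num_classes):
    # Macro-averaged F1: per-class precision/recall folded into F1,
    # then averaged with equal weight across classes.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

# Three-class toy example:
score = mean_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
# score == (2/3 + 4/5 + 1) / 3, approximately 0.8222
```

Because every class contributes 1/num_classes to the average, this metric penalizes a classifier that ignores rare activities, which plain accuracy would not.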
in Table VI. Respectively, the proposed method achieves 2.88%, 2.24% and 2.92% performance gains over all baselines using CNN, VGGNet and ResNet as backbones. At the same time, our method is also superior to DeepConvLSTM [37] by a large margin of 2.91%. Compared with Wang et al.'s work [38], the triplet attention achieves a 0.3% performance gain. The results show that cross-dimensional attention is also conducive to enhancing the feature representation of weakly supervised learning.

In the final step, visualization analysis is provided to identify which part of the target signal is the most important along the temporal dimension. In the weakly labeled dataset, every signal window often contains the target activity and the background activity that submerges it, such as "walking," which differs from a strictly labeled HAR dataset. Four sensor signal windows, roughly labeled as "jogging," "jumping," "going downstairs" and "going upstairs," are shown in Fig. 12. Because our triplet attention method can focus on only the interesting part of the target activity and weaken the background activities, it is beneficial for ground truth data annotation.

Fig. 12. Some examples of the location of the target activity in the weakly labeled sensor data.

REFERENCES

[1] Z. Wang, M. Jiang, Y. Hu, and H. Li, "An incremental learning method based on probabilistic neural networks and adjustable fuzzy clustering for human activity recognition by using wearable sensors," IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 4, pp. 691–699, Jul. 2012.
[2] M. A. Alsheikh, A. Selim, D. Niyato, L. Doyle, S. Lin, and H. P. Tan, "Deep activity recognition models with triaxial accelerometers," in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 8–13.
[3] A. Akbari and R. Jafari, "Personalizing activity recognition models through quantifying different types of uncertainty using wearable sensors," IEEE Trans. Biomed. Eng., vol. 67, no. 9, pp. 2530–2541, Sep. 2020.
[4] Z. Wang, D. Wu, J. Chen, A. Ghoneim, and M. A. Hossain, "A triaxial accelerometer-based human activity recognition via EEMD-based features and game-theory-based feature selection," IEEE Sensors J., vol. 16, no. 9, pp. 3198–3207, May 2016.
[5] Z. Chen, Q. Zhu, Y. C. Soh, and L. Zhang, "Robust human activity recognition using smartphone sensors via CT-PCA and online SVM," IEEE Trans. Ind. Informat., vol. 13, no. 6, pp. 3070–3080, Dec. 2017.
[6] A. Bulling, U. Blanke, and B. Schiele, "A tutorial on human activity recognition using body-worn inertial sensors," ACM Comput. Surv., vol. 46, no. 3, pp. 1–33, 2014.
[7] M. Zeng et al., "Convolutional neural networks for human activity recognition using mobile sensors," in Proc. 6th Int. Conf. Mobile Comput. Appl. Serv., 2014, pp. 197–205.
[8] B. Meng, X. Liu, and X. Wang, "Human action recognition based on quaternion spatial-temporal convolutional neural network and LSTM in RGB videos," Multimedia Tools Appl., vol. 77, no. 20, pp. 26901–26918, 2018.
[9] X. Li, Y. Wang, B. Zhang, and J. Ma, "PSDRNN: An efficient and effective HAR scheme based on feature extraction and deep learning," IEEE Trans. Ind. Informat., vol. 16, no. 10, pp. 6703–6713, Oct. 2020.
[10] A. Joulin, L. van der Maaten, A. Jabri, and N. Vasilache, "Learning visual features from large weakly supervised data," in Proc. Eur. Conf. Comput. Vis., New York, NY, USA: Springer, 2016, pp. 67–84.
[11] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[12] Y. Chen, Y. Kalantidis, J. Li, S. Yan, and J. Feng, "A2-Nets: Double attention networks," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 352–361.
[13] K. Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Mach. Learn., 2015, pp. 2048–2057.
[14] S. Sharma, R. Kiros, and R. Salakhutdinov, "Action recognition using visual attention," in Proc. Neural Inf. Process. Syst. Time Ser. Workshop, 2015.
[15] W. Gao, L. Zhang, Q. Teng, J. He, and H. Wu, "DanHAR: Dual attention network for multimodal human activity recognition using wearable sensors," Appl. Soft Comput., vol. 111, 2021, Art. no. 107728.
[16] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "A public domain dataset for human activity recognition using smartphones," in Proc. 21st Eur. Symp. Artif. Neural Netw. Comput. Intell. Mach. Learn., 2013, pp. 437–442.
[17] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity monitoring," in Proc. 16th Int. Symp. Wearable Comput., 2012, pp. 108–109.
[18] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SigKDD Explorations Newslett., vol. 12, no. 2, pp. 74–82, 2011.
[19] D. Micucci, M. Mobilio, and P. Napoletano, "UniMiB SHAR: A dataset for human activity recognition using acceleration data from smartphones," Appl. Sci., vol. 7, no. 10, 2017, Art. no. 1101.
[20] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7132–7141.
[21] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 3–19.
[22] Y. Cao, J. Xu, S. Lin, F. Wei, and H. Hu, "GCNet: Non-local networks meet squeeze-excitation networks and beyond," in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2019.
[23] D. Misra, T. Nalamada, A. U. Arasanipalai, and Q. Hou, "Rotate to attend: Convolutional triplet attention module," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2021, pp. 3139–3148.
[24] H. Ma, W. Li, X. Zhang, S. Gao, and S. Lu, "AttnSense: Multi-level attention mechanism for multimodal human activity recognition," in Proc. 28th Int. Joint Conf. Artif. Intell., 2019, pp. 3109–3115.
[25] M. Zeng et al., "Understanding and improving recurrent networks for human activity recognition by continuous attention," in Proc. ACM Int. Symp. Wearable Comput., 2018, pp. 56–63.
[26] J. He, Q. Zhang, L. Wang, and L. Pei, "Weakly supervised human activity recognition from wearable sensors by recurrent attention learning," IEEE Sensors J., vol. 19, no. 6, pp. 2287–2297, Mar. 2019.
[27] Q. Teng, K. Wang, L. Zhang, and J. He, "The layer-wise training convolutional neural networks using local loss for sensor based human activity recognition," IEEE Sensors J., vol. 20, no. 13, pp. 7265–7274, Jul. 2020.
[28] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[29] Z. N. Khan and J. Ahmad, "Attention induced multi-head convolutional neural network for human activity recognition," Appl. Soft Comput., vol. 110, 2021, Art. no. 107671.
[30] A. Ignatov, "Real-time human activity recognition from accelerometer data using convolutional neural networks," Appl. Soft Comput., vol. 62, pp. 915–922, 2018.
[31] Z. Xiao, X. Xu, H. Xing, F. Song, X. Wang, and B. Zhao, "A federated learning system with enhanced feature extraction for human activity recognition," Knowl.-Based Syst., vol. 229, 2021, Art. no. 107338.
[32] S. Wan, L. Qi, X. Xu, C. Tong, and Z. Gu, "Deep learning models for real-time human activity recognition with smartphones," Mobile Netw. Appl., vol. 25, no. 2, pp. 743–755, 2020.
[33] K. Walse, R. Dharaskar, and V. Thakare, "Performance evaluation of classifiers on WISDM dataset for human activity recognition," in Proc. Second Int. Conf. Inf. Commun. Technol. Competitive Strategies, 2016, pp. 1–7.
[34] D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, "Deep learning for human activity recognition: A resource efficient implementation on low-power devices," in Proc. IEEE 13th Int. Conf. Wearable Implantable Body Sensor Netw., 2016, pp. 71–76.
[35] R. Janarthanan, S. Doss, and S. Baskar, "Optimized unsupervised deep learning assisted reconstructed coder in the on-nodule wearable sensor for human activity recognition," Measurement, vol. 164, 2020, Art. no. 108050.
[36] T. Liu, S. Wang, Y. Liu, W. Quan, and L. Zhang, "A lightweight neural network framework using linear grouped convolution for human activity recognition on mobile devices," J. Supercomput., pp. 1–21, 2021.
[37] F. J. Ordóñez and D. Roggen, "Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition," Sensors, vol. 16, no. 1, p. 115, 2016.
[38] K. Wang, J. He, and L. Zhang, "Sequential weakly labeled multiactivity localization and recognition on wearable sensors using recurrent attention networks," IEEE Trans. Hum.-Mach. Syst., vol. 51, no. 4, pp. 355–364, Aug. 2021.

Yin Tang received the B.S. degree from the Hunan University of Engineering, Xiangtan, China, in 2018. He is currently working toward the M.S. degree with Nanjing Normal University, Nanjing, China. His research interests include activity recognition, computer vision, and machine learning.

Lei Zhang received the B.Sc. degree in computer science from Zhengzhou University, Zhengzhou, China, the M.S. degree in pattern recognition and intelligent system from the Chinese Academy of Sciences, Beijing, China, and the Ph.D. degree from Southeast University, Nanjing, China, in 2011. In 2008, he was a Research Fellow with IPAM, UCLA. He is currently an Associate Professor with the School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China. His research interests include machine learning, human activity recognition, and computer vision.

Qi Teng received the B.S. degree from the Henan University of Engineering, Zhengzhou, China, in 2017. He is currently working toward the M.S. degree with Nanjing Normal University, Nanjing, China. His research interests include activity recognition, computer vision, and machine learning.

Fuhong Min received the master's degree from the School of Communication and Control Engineering, Jiangnan University, Wuxi, China, in 2003, and the Ph.D. degree from the School of Automation, Nanjing University of Science and Technology, Nanjing, China, in 2007. From 2009 to 2010, she was a Postdoctoral Fellow with the School of Mechanical Engineering, University of Southern Illinois, Carbondale, IL, USA. She is currently a Professor with the School of Electrical and Automation Engineering, Nanjing Normal University, Nanjing, China. Her research interests include circuits and signal processing.

Aiguo Song (Senior Member, IEEE) received the B.S. degree in automatic control and the M.S. degree in measurement and control from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 1990 and 1993, respectively, and the Ph.D. degree in measurement and control from Southeast University, Nanjing, China, in 1998. He was an Associate Researcher with the Intelligent Information Processing Laboratory, Southeast University. From 1998 to 2000, he was an Associate Professor with the Department of Instrument Science and Engineering, Southeast University. From 2000 to 2003, he was the Director of the Robot Sensor and Control Laboratory, Southeast University. From April 2003 to April 2004, he was a Visiting Scientist with the Laboratory for Intelligent Mechanical Systems, Northwestern University, Evanston, IL, USA. He is currently a Professor with the School of Instrument Science and Engineering, Southeast University. His research interests include teleoperation control, haptic display, Internet telerobotics, and distributed measurement systems.