
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 8, NO. 5, OCTOBER 2024

Energy-Efficient and Interpretable Multisensor Human Activity Recognition via Deep Fused Lasso Net
Yu Zhou, Senior Member, IEEE, Jingtao Xie, Xiao Zhang, Member, IEEE, Wenhui Wu, Member, IEEE, and Sam Kwong, Fellow, IEEE

Abstract—Utilizing data acquired by multiple wearable sensors can usually guarantee more accurate recognition for deep learning based human activity recognition. However, an increased number of sensors brings high processing cost, influencing real-time activity monitoring. Besides, existing methods rarely consider the interpretability of the recognition model in terms of both the importance of the sensors and the features, causing a gap between deep learning and its extendability in real-world scenarios. In this paper, we cast the classical fused lasso model into a deep neural network, proposing a deep fused Lasso net (dfLasso-Net), which can perform sensor selection, feature selection and HAR in one end-to-end structure. Specifically, a two-level weight computing module (TLWCM) consisting of a sensor weight net and a feature weight net is designed to measure the importance of sensors and features. In the sensor weight net, spatial smoothness between physical channels within each sensor is considered to maximize the usage of selected sensors. And the feature weight net is able to maintain the physical meaning of the hand-crafted features through feature selection inside the sensors. By combining with the learning module for classification, HAR can be performed. We test dfLasso-Net on three multi-sensor based HAR datasets, demonstrating that dfLasso-Net achieves better recognition accuracy with the least number of sensors and provides good model interpretability by visualizing the weights of the sensors and features. Last but not least, dfLasso-Net can be used as an effective filter-based feature selection approach with much flexibility.

Index Terms—Sparse optimization, human activity recognition, deep fused lasso, feature selection, sensor selectivity.

Manuscript received 16 April 2024; revised 30 May 2024; accepted 25 June 2024. Date of publication 23 July 2024; date of current version 3 October 2024. This work was supported in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012485, in part by Shenzhen Fundamental Research Program under Grant JCYJ20220810112354002, in part by Shenzhen Science and Technology Program under Grant KJZD20230923114111021, in part by the National Natural Science Foundation of China under Grant 62376162, in part by the Fund for Academic Innovation Teams and Research Platform of South-Central Minzu University under Grant XTZ24003 and Grant PTZ24001, in part by the Knowledge Innovation Program of Wuhan-Basic Research through Project 2023010201010151, in part by the Research Start-up Funds of South-Central Minzu University under Grant YZZ18006, and in part by the Spring Sunshine Program of Ministry of Education of the People's Republic of China under Grant HZKY20220331. (Corresponding author: Xiao Zhang.)

Yu Zhou and Jingtao Xie are with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China (e-mail: [email protected]; [email protected]). Xiao Zhang is with the Department of Computer Science, South-Central Minzu University, Wuhan 430074, China, and also with the School of Computer Science and Technology, University of Science and Technology of China, Hefei 230052, China (e-mail: [email protected]). Wenhui Wu is with the College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China (e-mail: [email protected]). Sam Kwong is with the Department of Computing and Decision Science, Lingnan University, Hong Kong (e-mail: [email protected]).

Recommended for acceptance by J. Liu. Digital Object Identifier 10.1109/TETCI.2024.3430008

I. INTRODUCTION

Human activity recognition (HAR) [1] has been widely used in various real-world applications such as healthcare [2] and sports [3]. With the popularity of ubiquitous computing and Internet-of-Things (IoT) technology, HAR with wearable sensors has received extensive attention due to its high portability, low power consumption and privacy protection compared with video based methods [4], [5].

Due to the high complexity and diversity of human activities, using a single sensor for classification in HAR tasks is usually unsatisfactory because action changes are sensitive to sensor location, which requires analysis of different actions and deployment at the right locations to achieve good recognition results [6]. In addition, sensors with different modalities are complementary to each other so that more comprehensive data are available [7], [8]. Thus, using multiple sensors on the human body, known as body area networks (BANs), can greatly improve the recognition accuracy [9], [10], where different modalities, numbers and locations of sensors in HAR systems are taken into consideration. However, as suggested by previous studies on IoT wireless networks [11], [12], using multiple sensors greatly increases the computational complexity and processing costs, which is not adapted to the increasing demand for less time consumption and lower energy requirements in ubiquitous computing. Moreover, the data or features collected by additional sensors are likely to introduce redundant information and confuse the learning algorithm, which negatively affects the classification performance. Therefore, it is desirable to identify the effective sensors out of all the sensors and build a more compact and energy-efficient HAR system by removing redundant sensors.

For multisensor HAR, each sensor often contains multiple physical channels to process different directional components of the captured motion signals (e.g., the x, y and z axes), so multi-channel signal processing and feature extraction are key prerequisites.

In recent years, various deep neural network (DNN) models have been introduced in HAR, such as the Long Short-Term Memory network (LSTM) [13], the Recurrent Neural Network (RNN) [14] and the Graph Neural Network (GNN) [10], where the signal in each physical channel is processed based on time series modeling approaches [15]. Some recent works on HAR explore feature extraction or learning within each physical channel and perform ensemble learning through attention mechanisms in various deep neural networks [16], [17], [18]. It is also investigated in [18] that for multisensor HAR, within each sensor, the physical channels usually have significantly different importance, where not all the channels are relevant. The independent channel-wise processing brings one issue: the important channels may be scattered across different sensors, resulting in difficulty in sensor selection. Moreover, in this case the energy consumption is not reduced, since in real-world scenarios the signal transmission cannot be turned off per channel but only per sensor.

Another line of research regards the multi-channel time series data jointly as a 2D or 3D image, and a Convolutional Neural Network (CNN) [19] is used to extract and learn the deep features [20], [21], [22], [23], [24], [25], [26]. The interpretability of the deep learning model is very important for HAR in real-world applications. For example, in a smart sports training system, users would like to know how the incorrect action is generated and which parts of the body are more important for performing accurate recognition [27]. The hierarchy of increasing complexity and the fusion of multi-channel signals can boost the predictive capacity of such models, but lowers their interpretability in terms of preserving physical meanings. On one hand, the learned deep features across the channels or sensors are hard to resolve and trace back to their original physical channels; on the other hand, treating the time-series signals as an image is not rational enough, since simply stacking the sensor signals cannot satisfy the local structure similarity assumption in image processing tasks.

To tackle the above issues, we consider developing an energy-efficient HAR model with a reduced number of sensors which can maintain feature interpretability in the respective physical channels while exploring the correlation across different physical channels. Specifically, we cast the classical fused lasso model into a deep neural network, proposing a deep fused Lasso net (dfLasso-Net), which can perform sensor selection, feature weighting and HAR in one end-to-end structure. In dfLasso-Net, a two-level weight computing module (TLWCM) consisting of a sensor weight net and a feature weight net is designed to measure the importance of sensors and features simultaneously so that useful sensors and informative features can be retained. In the sensor weight net, spatial smoothness between physical channels within each sensor is considered to maximize the usage of selected sensors. And the feature weight net, inspired by the feature weighting methods in interpretable machine learning (IML) such as LIME [28] or SHAP [29], is able to maintain the physical meaning of the hand-crafted features through feature selection (FS) to enhance the model interpretability at the feature level. Different FS methods have been used to reduce redundant features so as to improve the generalization ability for high-dimensional data [30], [31], [32], [33]. It is also worth noting that the proposed feature weight net is detachable: it can be regarded as an embedded FS method, or used as a filter-based FS method when removing the classifier module. Experimental studies on three public multisensor based human activity recognition datasets demonstrate that dfLasso-Net can achieve better recognition performance with the least number of sensors and has the potential to provide good model interpretability by visualizing the weights of both the sensors and features. In short, our main contributions are as follows:

- We propose an energy-efficient and interpretable deep neural network for multisensor based HAR inspired by the classical sparse learning method, fused Lasso, which is able to perform sensor selection and feature selection with learned weights and accurate HAR in one end-to-end structure.
- A two-level weight computing module (TLWCM) consisting of a sensor weight net and a feature weight net is designed to measure the importance of sensors and features simultaneously, where spatial smoothness between physical channels within each sensor is considered, which not only correlates the signals in different channels but also maximizes the usage of each selected sensor.
- Alternative feature selection mechanism: the proposed feature weight net can be used not only as an embedded method but also as a filter-based method for feature selection, which can increase the model interpretability from the perspective of the feature level.

The rest of this paper is organized as follows. Section II provides an overview of related research. In Section III, we introduce the details of dfLasso-Net. Experimental settings and results are shown in Section IV. Finally, the conclusion and a discussion of our future work are provided in Section V.

II. RELATED WORK

A. Deep Learning Based Multi-Channel Feature Extraction in Human Activity Recognition

With the rapid development of artificial intelligence technology, deep learning based activity recognition models [34] have been actively researched in recent years, and they outperform traditional machine learning models such as the support vector machine (SVM). Currently, CNNs are most widely used in HAR tasks due to their local dependence and scale invariance, where the multi-physical-channel signals are usually treated and processed as images, considering both the temporal and spatial dimensions. Mohmed et al. [20] proposed the use of a deep convolutional neural network for human activity recognition using binary ambient sensors, which can visualise binary string data as a greyscale image. Mekruksavanich et al. [21] proposed a hybrid HAR framework that employs spatial-temporal features automatically extracted from data obtained from smartwatch sensors, which eliminates the need for manual extraction of features. Zhang et al. [22] employed the structure of U-Net, where a 2D image of the raw motion signal was used as the input data. Mohamed et al. [23] considered a complex IoT scenario consisting of heterogeneous sensors, where the sensed time-series data are converted into an image as the input for the deep learning model. Teng et al. [24] applied a deep residual network, which was trained in a block-wise manner, to recognize different activities through multi-channel signals as an image.


Tang et al. [25] combined CNN and transformer to deal with multi-channel time-series data, where both the local and global correlations between the signals were explored. Although good recognition accuracy can be obtained by using CNN to deal with the multi-channel signal, the rationality of treating the time-series signals as an image still needs further investigation, since simply stacking the sensor signals cannot satisfy the local structure similarity assumption in image processing tasks. More importantly, using CNN to fuse different local information brings difficulty in keeping the model interpretability at the feature level. Therefore, it is desirable for a deep neural network to maintain the features in the respective physical channels while performing HAR.

B. Channel Attention Mechanism in Human Activity Recognition

In addition to feature learning from multiple physical channels, CNN feature channel attention is taken into consideration, which can further reduce the feature redundancy and realize channel selectivity (feature selection) to improve the recognition accuracy. Huang et al. [35] considered shallow CNNs with channel-selectivity for HAR and further improved the performance without any extra cost. Huang et al. [36] proposed a channel equalization mechanism to solve the "channel collapse" phenomenon in HAR tasks: by performing a whitening or decorrelation operation, it reactivates the inhibited channels and compels them to contribute to the feature representation. Tang et al. [37] developed a triple cross-domain attention mechanism, concerning sensor modality attention, CNN channel attention and temporal attention simultaneously, which could associate the specific action with relevant features in the corresponding sensors. Wang et al. [38] also considered the spatial, temporal and feature-level attention together, where a novel attention-based deep HAR model was proposed. Han et al. [39] used a channel attention mechanism to enhance the interpretability of the deep learning model, where extracted CNN features were selected in a channel-wise manner.

Although the above works employ channel attention to remove redundant and irrelevant features, most of them only consider CNN feature channel attention, not the physical channels of the sensors, and the attended CNN channels can hardly be resolved back to the features in the physical channels. Several key challenges remain: how to select key sensors; how to handle the fact that there may be similar trends between channels within the same sensor that affect the recognition performance; and how to account for the contribution of different physical channels within a sensor to the task. By using a deep fused lasso to smooth out differences between neighboring channels and highlight sensors that are useful for the HAR task, our approach is able to achieve high recognition performance. Moreover, few deep learning works focus on the energy-efficiency issue of multisensor HAR.

C. Fused Lasso

In machine learning, sparse optimization models such as the Lasso are often used to solve ill-posed underdetermined equations, where the dimension of the variable space is higher than that of the sample space. The lasso [40] method uses $l_1$ regularization to constrain the model solution as follows:

$\min_{w} \; \|y - Xw\|_2^2 + \lambda \|w\|_1$   (1)

where $y \in \mathbb{R}^n$ is the vector of observations, $X \in \mathbb{R}^{n \times p}$ is the design matrix containing $p$ features for each observation, and $w \in \mathbb{R}^p$ is a vector of $p$ weights mapping the features to the observations. As an extension of the lasso, the fused lasso applies the $l_1$-norm penalty to both the coefficients and their successive differences to make the local coefficients homogeneous, thus achieving local smoothness, which can be expressed as:

$\min_{w} \; \|y - Xw\|_2^2 + \lambda_1 \|w\|_1 + \lambda_2 \sum_{i=2}^{M} |w_i - w_{i-1}|$   (2)

where $|w_i - w_{i-1}|$ is the absolute value of the difference between the weights of adjacent features, $M$ represents the total number of feature weights and $w_i$ denotes the $i$th weight. Recently, the fused lasso [41] has been widely used in signal processing and biomedical engineering. The fused lasso model is usually solved by complex iterative methods, where many parameters have to be set manually and the solution process is time consuming for large-scale problems.

III. PROPOSED MODEL

A. Motivation

In our multi-sensor HAR problem, an increasing number of sensors incurs high processing costs and affects real-time activity monitoring. In our method, we concentrate on the physical channel level, where we consider the sparsity of the sensor weight accumulated from the channel weights and the similarity between the weights of these channels, which matches the philosophy of the fused Lasso in (2). On one hand, the sparsity of the sensor weights can be used to perform sensor selection; on the other hand, considering the similarity of the channel weights within one sensor can achieve the maximum usage of each sensor and minimize its energy cost. Both of these two terms contribute to saving energy in HAR. In addition, in our problem the sparsity of the features within each channel should also be considered. By considering both sensor selection and feature selection (sensor weights and feature weights), it is convenient to identify which sensor is more important and which feature is important to specific actions in HAR, thus increasing the model interpretability at the feature level.

The detailed structure of our proposed dfLasso-Net is shown in Fig. 1. The entire network consists of two parts: the two-level weight computing module (TLWCM), which contains the sensor weight net and the feature weight net, and the classification net. The sensor weight net and the feature weight net are used to calculate the sensor weights and the feature weights to measure the importance of different sensors and features, respectively. The classification net is used to perform classification given the learned features.
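To make the objectives in (1)-(2) concrete, the following is a minimal NumPy sketch of the lasso and fused lasso objective values for a given weight vector; the function names and the synthetic data are illustrative only and are not part of the original paper.

```python
import numpy as np

def lasso_objective(y, X, w, lam):
    """Value of the lasso objective in (1): squared error plus an l1 penalty."""
    residual = y - X @ w
    return residual @ residual + lam * np.sum(np.abs(w))

def fused_lasso_objective(y, X, w, lam1, lam2):
    """Value of the fused lasso objective in (2): l1 penalty on the weights
    plus an l1 penalty on successive weight differences (local smoothness)."""
    residual = y - X @ w
    sparsity = lam1 * np.sum(np.abs(w))
    smoothness = lam2 * np.sum(np.abs(np.diff(w)))
    return residual @ residual + sparsity + smoothness

# Toy check with random data (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
w = rng.normal(size=8)
y = X @ w + 0.1 * rng.normal(size=20)
print(lasso_objective(y, X, w, lam=0.1))
print(fused_lasso_objective(y, X, w, lam1=0.1, lam2=0.5))
```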
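As a reading aid for Fig. 1, here is a hedged, high-level sketch of how the two weight nets and the classification net could be composed in a single forward pass. The class name, layer sizes and the exact wiring of the sub-nets are assumptions made for illustration, not the authors' released implementation; only the weighting scheme (channel weights summed into sensor weights, features rescaled by both weights) follows the description in the text.

```python
import torch
import torch.nn as nn

class DfLassoNetSketch(nn.Module):
    """Illustrative composition only: sensor weights and per-channel feature
    weights rescale the hand-crafted feature tensor before classification."""
    def __init__(self, n_sensors, n_channels, n_feats, n_classes):
        super().__init__()
        # Stand-ins for the sensor weight net and feature weight net (TLWCM).
        self.sensor_weight_net = nn.Sequential(
            nn.Linear(n_channels * n_feats, 32), nn.ReLU(), nn.Linear(32, n_channels))
        self.feature_weight_net = nn.Sequential(
            nn.Linear(n_channels * n_feats, n_channels * n_feats), nn.Sigmoid())
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(n_sensors * n_channels * n_feats, n_classes))

    def forward(self, x):                      # x: (batch, sensors, channels, feats)
        b, s, c, f = x.shape
        flat = x.reshape(b, s, c * f)
        chan_logits = self.sensor_weight_net(flat)                       # (b, s, c)
        chan_w = torch.softmax(chan_logits.reshape(b, s * c), dim=-1).reshape(b, s, c)
        sensor_w = chan_w.sum(dim=-1, keepdim=True)                      # S_k = sum_i c_{k,i}
        feat_w = self.feature_weight_net(flat).reshape(b, s, c, f)       # per-feature weights
        weighted = x * feat_w * sensor_w.unsqueeze(-1)                   # w_{k,i,p} = S_k * f_{i,p}
        return self.classifier(weighted)

logits = DfLassoNetSketch(n_sensors=5, n_channels=3, n_feats=11, n_classes=4)(
    torch.randn(2, 5, 3, 11))
print(logits.shape)  # torch.Size([2, 4])
```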

Fig. 1. The structure of our proposed dfLasso-Net, which consists of a two-level weight computing module and classification net.

Fig. 2. Details of weight generation net and fusion optimization net in sensor weight net.

B. Sensor Weight Net


As shown in Fig. 1, the sensor weight net is mainly composed of the weight generation net and the fusion optimization net.

1) Weight Generation Net: Following the philosophy of fused lasso, the weight generation net is used to calculate sensor weights that measure the priority of the sensors. At the sensor level, the features in all the channels should be used to learn the weight. As shown in Fig. 2, we integrate the feature vectors of the different channels within all the sensors and extract their mean and maximum values to merge into two vectors, respectively, inspired by the spirit of the pooling operation, which can obtain representative features. Then we put them into the same multilayer perceptron (MLP), which contains one hidden layer, to get the vector outputs. The corresponding vectors are summed up and the softmax activation function is used to balance the weights of each channel. In the sensor weight net, since the input data is channel-wise, the weight generation net calculates the weight of each sensor based on the accumulation of its multiple channels as follows:

$S_k = \sum_{i=1}^{d} c_{k,i}$   (3)

where $S_k$ is the $k$th sensor's weight, $c_{k,i}$ is the $i$th channel weight in the $k$th sensor and $d$ is the number of physical channels in each sensor. In the testing phase, after the sensor weights are obtained, they are sorted in descending order. We select the sensors with higher weights as the retained sensors and set the small weight values to zero. The internal feature weights of each sensor are further calculated by means of the feature weight net.

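A minimal sketch of the weight generation net described above, assuming a PyTorch implementation: mean- and max-pooled channel descriptors pass through a shared one-hidden-layer MLP, their outputs are summed, a softmax balances the channel weights, and each sensor weight $S_k$ in (3) is the sum of its channel weights. The class name, layer sizes and the exact pooling granularity are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeightGenerationNetSketch(nn.Module):
    """Channel weights from mean/max channel descriptors via a shared MLP,
    then sensor weights as the sum of their channel weights, as in (3)."""
    def __init__(self, n_sensors, n_channels, hidden=32):
        super().__init__()
        total = n_sensors * n_channels
        self.mlp = nn.Sequential(nn.Linear(total, hidden), nn.ReLU(),
                                 nn.Linear(hidden, total))   # one hidden layer

    def forward(self, x):                        # x: (batch, sensors, channels, feats)
        b, s, c, _ = x.shape
        mean_vec = x.mean(dim=-1).reshape(b, s * c)   # per-channel mean descriptor
        max_vec = x.amax(dim=-1).reshape(b, s * c)    # per-channel max descriptor
        summed = self.mlp(mean_vec) + self.mlp(max_vec)
        channel_w = torch.softmax(summed, dim=-1).reshape(b, s, c)
        sensor_w = channel_w.sum(dim=-1)              # S_k = sum_i c_{k,i}
        return channel_w, sensor_w

channel_w, sensor_w = WeightGenerationNetSketch(5, 3)(torch.randn(2, 5, 3, 11))
print(sensor_w.shape)  # torch.Size([2, 5])
```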

Fig. 3. Optimization sub-net in fusion optimization net.

2) Fusion Optimization Net: Fig. 3 presents the fusion optimization net, which aims to explore the correlation between the internal physical channels of each sensor and consists of multiple optimization sub-nets, each working for one sensor. As mentioned above, we obtain the weights of all the internal channels of the sensors by means of the weight generation net. To maximize the usage of each sensor, it is desirable to equalize the contribution of its physical channels. Thus, we extend the fused lasso model to constrain the weights of the internal channels of each sensor to be as similar as possible, using the weight difference matrix as input. For the multisensor HAR task, considering $d$ physical channels within each sensor, the fused lasso can be extended to a more general case:

$\min_{c} \; l(c) + \lambda_1 \sum_{i=1}^{d} |c_i| + \lambda_2 \sum_{(i,j) \in (1,\ldots,d)} |c_i - c_j|$   (4)

where $c$ denotes the channel weight vector, $c_i$, $i = 1, \ldots, d$, is the weight of the $i$th channel, and $l(c)$ can be any general loss term. Equation (4) is developed based on the generalized fused lasso [42], which is suitable for classification problems. Contrasting with (1), we add the new component $|c_i - c_j|$, which constrains the channels within the same sensor. This addresses the fact that there may be similar trends between channels within the same sensor that affect the recognition performance, and it maximizes the use of each sensor.

After generating all the internal channel weights of the sensors, we calculate the differences between the weights of the internal channels of the same sensor and generate a weight difference matrix. Specifically, for the $d$ channels in each sensor, a $d \times d$ symmetric matrix $D$ is obtained, where $D_{i,j} = |c_i - c_j|$ records the weight difference between the $i$th channel and the $j$th channel. We also use the mean value and maximum value vectors of each sensor obtained in the weight generation net and input them into the optimization sub-net, where the mean value difference matrix and the maximum value difference matrix are calculated in the same way as $D$ above. Then, we pass them into two CNNs, respectively, to further extract deep features and obtain the corresponding deep feature vectors. Both CNNs have the same structure, consisting of three convolutional layers: Layer 1 with one convolutional kernel (size of d × d, stride = d, padding = d), Layer 2 with one convolutional kernel (size of 3 × 3, stride = 3, padding = 3) and Layer 3 with d convolutional kernels (size of 3 × 3, stride = 1, no padding). Finally, we add the two resultant vectors and perform a matrix multiplication with the weight difference matrix to obtain the difference output, which has the same length as the number of physical channels $d$. The output is split into individual elements and fed back into the input feature vectors of the weight generation net. Through the interaction of the weight generation net and the fusion optimization net, the sensor weights are continuously optimized and exhibit a reasonable distribution.

Fig. 4. Feature sub-net in feature weight net.

C. Feature Weight Net

The feature weight net is used to learn the feature weights within each physical channel to remove redundant features. As shown in Fig. 1, the feature weight net consists of several feature sub-nets. For each sensor, we use a feature sub-net to calculate the weights within that sensor. The structure of the feature sub-net is based on an improvement of the non-local network [43], as shown in Fig. 4, and invokes the attention mechanism of self-attention GAN to obtain the key motion features. We combine the feature vectors from the different channels within the same sensor into one feature vector and then convert the features into different feature spaces by 1D convolution. The size of the 1D convolution is 1 × n_f, where n_f denotes the number of features in each sensor. The final 1 × 1 convolution is used for feature channel adjustment. We regard the output as the feature weights within each sensor, rather than the final feature weights, as the priority of the different sensors also needs to be considered.

For each feature, the final feature weight is determined by the product of the sensor weight and the internal feature weight of the sensor. As mentioned above, in the testing phase the internal feature weights of discarded sensors are set to 0, so their final weights are also 0. For retained sensors, the final weight is calculated as:

$w_{k,i,p} = S_k \ast f_{i,p}$   (5)

where $S_k$ denotes the weight of the $k$th sensor and $f_{i,p}$ represents the feature weight of the $p$th feature in the $i$th channel calculated by the feature weight net.

At last, as shown in Fig. 1, the feature subset is obtained by sorting the final feature weights in descending order and selecting the features with higher weights. Since the feature weights within the discarded sensors are always 0, the feature selection process always gives preference to the features in the retained sensors. Thus, we can complete the sensor selection and feature selection processes simultaneously.

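The weight-difference matrix $D$ and the extended penalty in (4) can be written down directly; the sketch below computes $D_{i,j} = |c_i - c_j|$ and the corresponding fused-lasso-style penalty for one sensor. It only illustrates the quantities the fusion optimization net operates on, not the CNN layers themselves; the function names are illustrative.

```python
import numpy as np

def weight_difference_matrix(c):
    """D_{i,j} = |c_i - c_j| for the d channel weights of one sensor."""
    c = np.asarray(c, dtype=float)
    return np.abs(c[:, None] - c[None, :])        # symmetric d x d matrix

def channel_penalty(c, lam1, lam2):
    """Penalty terms of (4): channel sparsity plus pairwise smoothness."""
    D = weight_difference_matrix(c)
    sparsity = lam1 * np.sum(np.abs(c))
    smoothness = lam2 * np.sum(D) / 2.0           # each |c_i - c_j| counted once
    return sparsity + smoothness

c = np.array([0.40, 0.35, 0.25])                  # example channel weights (sum to 1)
print(weight_difference_matrix(c))
print(channel_penalty(c, lam1=1e-2, lam2=1e-2))
```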
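Following (5), the final weight of each feature is the product of its sensor weight and its internal feature weight, and the feature subset is obtained by keeping the largest final weights. A small NumPy sketch of this step; the array shapes and the top-K routine are illustrative assumptions.

```python
import numpy as np

def final_feature_weights(sensor_w, feature_w):
    """w_{k,i,p} = S_k * f_{i,p}; discarded sensors carry weight 0 (cf. (5))."""
    # sensor_w: (n_sensors,), feature_w: (n_sensors, n_channels, n_feats)
    return sensor_w[:, None, None] * feature_w

def select_top_k(final_w, k):
    """Indices (sensor, channel, feature) of the k largest final weights."""
    flat_order = np.argsort(final_w, axis=None)[::-1][:k]
    return np.stack(np.unravel_index(flat_order, final_w.shape), axis=1)

sensor_w = np.array([0.0, 0.7, 0.3])              # sensor 0 discarded at test time
feature_w = np.random.default_rng(1).random((3, 3, 11))
final_w = final_feature_weights(sensor_w, feature_w)
print(select_top_k(final_w, k=5))                  # selected features come only from kept sensors
```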

D. Classification Net

The classification net is mainly used to optimize the correlation between the weighted features and the labels. When combined with TLWCM, an end-to-end structure for classification is established, where the feature subset, as an intermediate result, can be determined based on the feature weights. The classification net can be replaced with other pre-trained network models to optimise the weighted features for different task requirements; in this case, an embedded feature selection can be performed. Besides, since the feature weights and sensor weights are optimized continuously during the training process, the obtained feature subset can also be combined with other classifiers for downstream classification tasks, which can be regarded as a filter feature selection method. Theoretically, our proposed TLWCM is flexible enough to realize deep feature selection and classification tasks.

E. Differentiable Loss Function

In view of the above characteristics of our proposed dfLasso-Net, which can perform sensor selection, feature selection and classification simultaneously, the loss function can be expressed as follows:

$(\hat{c}, \hat{w}) = \arg\min \; CE(c, w) + \lambda_1 \sum_{k} \sum_{i} \sum_{p} |w_{k,i,p}| + \lambda_2 \sum_{k} \sum_{i} \sum_{j} |c_{k,i} - c_{k,j}| + \lambda_3 \sum_{k} \sum_{i} |c_{k,i}|$   (6)

where $CE(\cdot)$ is the cross-entropy loss function, $c$ is the channel weight vector, $w_{k,i,p}$ is the weight of the $p$th feature of the $i$th channel of the $k$th sensor and $c_{k,i}$ is the weight of the $i$th channel of the $k$th sensor. Since the weights $c$ and $w$ are functions of the network parameters $\Theta$, $c$ and $w$ can be used as variables in the loss function, where $\sum_{k}\sum_{i} |c_{k,i}|$ guarantees the channel-wise sparsity, $|c_{k,i} - c_{k,j}|$ is used to restrain the internal channels within the same sensor, which can maximize the usage of each sensor, and $\sum_{k}\sum_{i}\sum_{p} |w_{k,i,p}|$ is used to perform feature selection (through calculating the feature weights) within each channel of the corresponding sensor.

However, the penalty in the above formula is non-differentiable, and it is usually difficult to obtain a truly sparse solution via back propagation. Thus, we consider applying an approximated, differentiable function instead of directly optimizing the $l_1$ norm.

As the $l_1$-norm penalty is not completely differentiable, we use the hyperbolic tangent function [44] to approximate the $l_1$ norm. The hyperbolic tangent function has an adjustable slope at the origin and is bounded by the lines $y = \pm 1$, making it a suitable surrogate function for the $l_1$ norm, which can be expressed as:

$|x| \approx f(x) = a\,x\tanh(\gamma x)$   (7)

where $a$ and $\gamma$ are parameters, and $\gamma \gg 1$. When $a = 1$, a larger $\gamma$ provides a closer approximation of the absolute value function. Then, the loss function in (6) can be updated as:

$(\hat{c}, \hat{w}) = \arg\min \; CE(c, w) + \lambda_1 \sum_{k} \sum_{i} \sum_{p} w_{k,i,p} \tanh(\gamma w_{k,i,p}) + \lambda_2 \sum_{k} \sum_{i} \sum_{j} (c_{k,i} - c_{k,j}) \tanh(\gamma (c_{k,i} - c_{k,j})) + \lambda_3 \sum_{k} \sum_{i} c_{k,i} \tanh(\gamma c_{k,i})$   (8)

In our work, we set $\gamma = 100$, since the resulting surrogate is very close to $|x|$ and the approximation works better than in [44], where $\gamma = 50$ is used to approximate the norm.

TABLE I
DATA SEGMENTATION FOR THREE DATASETS

IV. EXPERIMENT SETTINGS AND RESULTS

A. Datasets

Three benchmark multisensor HAR datasets are used to verify the effectiveness of our proposed dfLasso-Net, including the Daily and Sports Activities Data Set (DSA) [45] and the PAMAP2 Physical Activity Monitoring dataset (PAMAP2) [46] from the UCI Machine Learning Repository, and one real-world dataset named Skoda Mini Checkpoint (Skoda) [47]. Table I shows the data segmentation for the three datasets, which is suggested by their corresponding literature. A brief introduction to the datasets is as follows:

1) DSA: The DSA dataset is collected from 5 sensor units (each with three modalities: accelerometer, gyroscope and magnetometer) placed on the left arm, right arm, left leg, right leg and torso. Therefore, 3 × 5 = 15 sensors are considered in our problem. In each sensor, 3 physical channels (x, y and z directional components) are included.

2) PAMAP2: The PAMAP2 dataset is collected from 3 wearable measurement units (each with three modalities: accelerometer, gyroscope and magnetometer). Therefore, 3 × 3 = 9 wearable sensors are considered in our problem, where each sensor consists of 3 physical channels (x, y and z directional components).

3) Skoda: The Skoda dataset is sampled by 2 × 10 USB sensors placed on the left and right upper and lower arms. Each sensor contains two types of data with 6 physical channels. In our work, the left arm's and right arm's sensors are used separately in the experiments, where the Skoda (left) and Skoda (right) datasets are tested.

B. Feature Extraction

Considering that hand-crafted features contain domain knowledge and can reflect physical meanings in real-world scenarios, we choose the minimum, the maximum, the mean value, the variance, the skewness, the kurtosis and five peaks of the discrete Fourier transform (DFT) as the feature construction and extraction criteria, as suggested by [6]. Also, min-max normalization is used to reduce the range differences between features. Therefore, a total of 11 × 15 × 3 = 495 features are obtained for the DSA dataset, the PAMAP2 dataset has 11 × 9 × 3 = 297 features, and 11 × 10 × 6 = 660 features are obtained for the Skoda (left) and Skoda (right) datasets, respectively.
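A hedged sketch of the per-channel hand-crafted features listed above (minimum, maximum, mean, variance, skewness, kurtosis and five DFT peaks, i.e., 11 values per channel window). The exact peak-picking rule and windowing used by the authors are not specified, so the version below is only one plausible reading and relies on SciPy for the skewness and kurtosis statistics.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def channel_features(signal, n_peaks=5):
    """11 hand-crafted features for one channel window: 6 statistics + 5 DFT peaks."""
    signal = np.asarray(signal, dtype=float)
    stats = [signal.min(), signal.max(), signal.mean(), signal.var(),
             skew(signal), kurtosis(signal)]
    spectrum = np.abs(np.fft.rfft(signal))
    peaks = np.sort(spectrum)[-n_peaks:][::-1]          # 5 largest DFT magnitudes
    return np.array(stats + list(peaks))

def min_max_normalize(features):
    """Column-wise min-max normalization across windows, as used in the paper."""
    f_min, f_max = features.min(axis=0), features.max(axis=0)
    return (features - f_min) / np.maximum(f_max - f_min, 1e-12)

window = np.sin(np.linspace(0, 8 * np.pi, 125)) + 0.1 * np.random.default_rng(0).normal(size=125)
print(channel_features(window).shape)   # (11,)
```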
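Looking back at Section III-E, the differentiable surrogate in (7) and its use inside the regularizer of (8) can be reproduced in a few lines. A minimal sketch assuming $a = 1$ and $\gamma = 100$ as in the paper; the helper names are illustrative.

```python
import torch

GAMMA = 100.0   # slope parameter; the paper uses gamma = 100 with a = 1

def smooth_abs(x, gamma=GAMMA):
    """Differentiable surrogate for |x| from (7): x * tanh(gamma * x)."""
    return x * torch.tanh(gamma * x)

def dflasso_regularizer(channel_w, feature_w, lam1, lam2, lam3):
    """Penalty terms of (8): feature sparsity, channel smoothness, channel sparsity.
    channel_w: (sensors, channels); feature_w: (sensors, channels, feats)."""
    feat_term = lam1 * smooth_abs(feature_w).sum()
    diff = channel_w.unsqueeze(-1) - channel_w.unsqueeze(-2)   # pairwise c_{k,i} - c_{k,j}
    smooth_term = lam2 * smooth_abs(diff).sum() / 2.0
    chan_term = lam3 * smooth_abs(channel_w).sum()
    return feat_term + smooth_term + chan_term

x = torch.linspace(-1, 1, 5)
print(smooth_abs(x))           # close to |x| away from the origin
```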

TABLE II
COMPARISON AS EMBEDDED METHODS ON THREE DATASETS

C. Training Details

In our work, we use repeated random sub-sampling (RRSS) for the three datasets, where 80% of the samples from each dataset are used as the training set and the remaining 20% as the testing set. The model parameters are optimized by stochastic gradient descent (SGD) with batch size 64, and the weights of the regularizers are 1e-4. We also set the initial learning rate according to the dataset (DSA: 0.05, PAMAP2: 0.02, Skoda: 0.03), and the number of training steps is set to 40. We conducted experiments on a server equipped with 8 Tesla P100-PCIE-16 GB GPUs. Following the parameter settings suggested in the fused group lasso method [48], we test different settings of (λ1, λ2, λ3) chosen from the set {1e-5, 2e-5, 5e-5, ..., 1e-1, 2e-1, 5e-1, 1, 2, 5}, and the one that yields the smallest number of misclassified labels is used. The training and testing process is repeated 10 times and the average value is reported as the final result.

For the classification net, we use a pre-trained ResNet18 [49] due to its popularity. Furthermore, two metrics are used for evaluation, including the accuracy and the weighted F1-score (WeightF):

$Accuracy = \frac{1}{L} \sum_{i=1}^{L} \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i}$   (9)

where $i$ denotes the class index and $L$ is the number of classes, and

$WeightF = \sum_{i} 2 \, s_i \, \frac{TP_i}{FN_i + FP_i + 2\,TP_i}$   (10)

where $s_i$ denotes the proportion of samples of class $i$.

D. Comparison of Overall Performance

DfLasso-Net can perform sensor selection and HAR in an end-to-end structure, where embedded feature selection is conducted. We compare dfLasso-Net with state-of-the-art HAR methods involving either sensor selection or feature selection modules, including the convolutional neural network based group lasso (I-CNN) [50], the discriminant sensor pruning (MSF-EP) method [6], shallow convolutional neural networks with channel-selectivity (ResNet+SelectConv) [35], the deep multifeature attention network (DMEFAM) [38] and triple cross-domain attention based ResNet (ResNet+TA) [37]. MSF-EP considers sensor pruning and HAR in multisensor HAR and I-CNN considers identifying the important sensors, while ResNet+SelectConv, DMEFAM and ResNet+TA apply channel attention mechanisms to perform feature selection in HAR. A statistical Wilcoxon significance test [51] with 5% significance level is applied, where the best results are bold and the second best are underlined.

As can be seen in Table II, dfLasso-Net achieves the best recognition performance on DSA and Skoda (left & right) in terms of accuracy, WeightF and AUC, while the accuracy on PAMAP2 is slightly worse than that of ResNet+TA. The proposed dfLasso-Net can reduce the number of unnecessary sensors through the sensor weight net and the redundant features within each sensor through the feature weight net, respectively. Compared with existing works that use only sensor selection or only feature selection, dfLasso-Net can better overcome the overfitting issue in large-scale networks and maintain good generalization ability. This two-level data selection mechanism also enhances the interpretability of the model for HAR.

E. Feature Selection

DfLasso-Net can be used as a filter-based feature selection method for HAR. To validate the effectiveness of dfLasso-Net on feature selection, three Lasso based methods (I-CNN, traditional sparse group lasso, and fused lasso solved with the Alternating Direction Method of Multipliers (ADMM)) and one ensemble method, MSF-EP, are chosen for comparison. Since I-CNN and MSF-EP only measure the importance of sensors without any feature selection process, we adapted these methods for a fair comparison. Specifically, we perform feature selection on the internal features of the retained sensors and obtain the feature subsets. This process is implemented through the feature selection module of the scikit-learn machine learning library (sklearn): we use the SelectKBest module with "mutual_info_classif" (default parameters) as the score function, where K stands for the number of selected features.

For a fair comparison, all the methods use the same neural network (NN) classifier [52]. In addition, we compute the average testing results of the different methods over a given range of the number of selected features K. It is indicated in Table III that dfLasso-Net achieves the best average recognition performance with the same number of selected features. I-CNN is worse than dfLasso-Net, but outperforms MSF-EP on the three datasets. As traditional methods, sgLasso and fused lasso perform closely on Skoda, but very differently on DSA and PAMAP2. This is due to the fact that the numbers of actions on DSA and PAMAP2 are larger than on Skoda, so they contain more highly related actions to identify. Experimental results verify that dfLasso-Net can select more important and relevant features for HAR tasks.

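For the filter-based baselines above, the paper uses scikit-learn's SelectKBest with mutual_info_classif as the score function; a minimal usage sketch (the synthetic data and K value are illustrative):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))          # e.g., hand-crafted features of retained sensors
y = rng.integers(0, 4, size=200)        # activity labels

K = 20                                   # number of selected features
selector = SelectKBest(score_func=mutual_info_classif, k=K)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape, selector.get_support(indices=True)[:5])
```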
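As a side note, the two evaluation metrics defined in (9) and (10) can be computed directly from per-class counts; a small sketch assuming integer class labels (the function name is illustrative):

```python
import numpy as np

def accuracy_and_weight_f1(y_true, y_pred, n_classes):
    """Per-class-averaged accuracy from (9) and support-weighted F1 from (10)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    acc_terms, f1_terms = [], []
    for i in range(n_classes):
        tp = np.sum((y_pred == i) & (y_true == i))
        fp = np.sum((y_pred == i) & (y_true != i))
        fn = np.sum((y_pred != i) & (y_true == i))
        tn = n - tp - fp - fn
        acc_terms.append((tp + tn) / (tp + fn + fp + tn))
        s_i = np.mean(y_true == i)                      # class proportion s_i
        f1_terms.append(2.0 * s_i * tp / max(fn + fp + 2 * tp, 1))
    return np.mean(acc_terms), np.sum(f1_terms)

acc, wf1 = accuracy_and_weight_f1([0, 1, 2, 2, 1], [0, 1, 1, 2, 1], n_classes=3)
print(round(acc, 3), round(wf1, 3))
```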

TABLE III
COMPARISON AS FILTER METHODS ON THREE DATASETS

Fig. 5. Comparisons among different methods on the Skoda dataset (using the left arm's data, where the total number of features is 11 × 6 × 10 = 660).

To verify the effectiveness and stability of dfLasso-Net, the recognition results under different numbers of selected features are given in Fig. 5. With a small K, the testing results of all the methods improve significantly as K increases, indicating that effective features are more important for recognition performance when the number of retained sensors is small. When K increases to a certain value, the improvement of the testing results of all methods becomes smaller, suggesting that there is a threshold for the number of effective sensors in HAR tasks and that too many sensors and internal features do not improve the recognition performance much. Undoubtedly, dfLasso-Net not only achieves the best performance but is also the most stable on Skoda when using the left arm's data.

F. Sensor Selection for Energy Saving

Fig. 6. Top N most important sensors selected by dfLasso-Net on three datasets. (a) On PAMAP2. (b) On DSA. (c) On Skoda (right arm). (d) On Skoda (left arm). Skoda contains calibrated and raw acceleration data.

We also mark the positions of the top N important sensors selected by dfLasso-Net on the three datasets (where N = 4 for PAMAP2, N = 6 for DSA and N = 5 for Skoda (left) and Skoda (right), respectively) in Fig. 6. It can be seen that on PAMAP2, the triaxial accelerometer of the right hand is mainly selected, which is beneficial for recognizing more arm movements. In addition, dfLasso-Net also selects the triaxial accelerometer of the left ankle to recognize motor tasks with foot movements (e.g., walking and rope jumping). On DSA, since the sensor units are placed on the limbs and torso, dfLasso-Net prefers to select the sensors of both limbs to complete the recognition, as this dataset contains many motor tasks related to foot movements. As for Skoda, the sensor positions are basically located on the upper and lower arms. It can be seen that whether the sensor data of the left arm or the right arm are used for testing, the positions of the selected sensors obtained by dfLasso-Net are relatively scattered, without the scenario that most of the selected sensors are located only in the upper or lower arm. This is because the action classes on Skoda require simultaneous movement of the upper and lower arms of both hands to complete. Therefore, our proposed dfLasso-Net is capable of removing the redundant sensors while performing effective sensor selection.


TABLE IV
AVERAGE RESULTS OF ABLATION ANALYSIS

TABLE V
COMPARISON OF THE NUMBER OF SELECTED SENSORS

Fig. 7. Feature weight distribution within sensors on the Skoda dataset.
Fig. 8. Feature weight distribution within sensors on PAMAP2.
Fig. 9. Feature weight distribution within sensors on DSA.

We also compare the sensor selection results, where the number of selected sensors is recorded in Table V. It is concluded that dfLasso-Net selects the least number of sensors as important sensors compared with I-CNN, MSF-EP, sgLasso and fused Lasso. In our problem, we assume that each sensor has the same working power p, so the total energy consumption is proportional to the number of selected sensors. Thus, the number of selected sensors can represent the trend of the energy cost: the fewer the sensors used, the lower the energy cost.


Fig. 10. Confusion matrices.

G. Visualization of Confusion Matrices and Feature Weights

We visualize the confusion matrices of activity recognition on the three datasets in Fig. 10. On PAMAP2, "house cleaning", "vacuum cleaning" and "folding laundry" are often misclassified as each other, because their actions are very similar. The same situation is also seen on DSA, where "walking on treadmill (flat)" and "walking on treadmill (15 deg inclined position)" are easily misclassified, which shows that it is difficult to detect the effect of the treadmill's inclination changes on human action. On Skoda, different results are obtained with the data of different arms. In the case of testing with the left arm's data, the classification errors mainly appear on "open and close hood", while "open and close left front door" is misclassified on the right arm's data, because these actions are highly correlated with each other and easily confused.

We also observe the distribution of feature weights across different channels and sensors. As discussed previously, dfLasso-Net smooths the weights of the channels within each sensor. Figs. 7 to 9 present the distributions of feature weights on the three datasets, where it can be seen that the distribution of important features is closely related to the priority of the sensors selected by dfLasso-Net. With the smoothing effect of fused lasso, the feature weights within the same sensor are closer to each other and clearly differentiated from those of other sensors. Moreover, the selection of features within each channel also varies among sensors. We can observe that, within each channel, the differences of the feature weights are more significant for Skoda than for PAMAP2 and DSA. Since only 11 features are extracted, we can rank the features in each channel according to the weights and then accumulate the channel-wise feature rankings to get the corresponding feature rankings in a sensor. Then, it is easy to tell which feature is more important and to perform sensor-level feature selection, providing good guidelines for studying the feature extraction criteria of different datasets.


Fig. 11. Comparison of the weight distribution obtained with the hyperbolic tangent function and that of the original l1 norm.

Furthermore, we compare the difference in weight distribution and running time obtained by the approximated loss function and the original l1 norm, as shown in Fig. 11. We can see that using the hyperbolic tangent to approximate the l1 norm yields a wider weight distribution, because the weight differences are more significant. By recording the average running time, it is found that the time cost of using the approximation function is 1.35 times that of using the original l1 norm. The results demonstrate that although optimizing the approximated l1 norm increases the time complexity of training to some extent, it obtains a better weight distribution that facilitates the sensor selection process.

H. Ablation Study

To verify the effectiveness of the different modules in the proposed method, we conduct further tests. Specifically, we remove the entire sensor weight net (marked as sennet), the fusion optimization net alone (marked as optnet) and the feature weight net (marked as feanet), and then test the corresponding results after each removal. All the testing results are shown in Table IV. Since the network cannot calculate feature weights after removing the feature weight net, in that case we compute the average testing results using different numbers of sensors; in the other cases, we compute the average testing results with a given range of selected features. It can be seen that when the sensor weight net is removed, the network only calculates feature weights in all sensors without performing sensor selection, and the results are all lowered, which shows that in HAR tasks it is necessary to select the useful sensors. The results obtained by removing the fusion optimization net alone are also lower than those of the original network, indicating that the fusion optimization net based on fused lasso indeed improves the performance of our method. When the feature weight net is removed, the results on PAMAP2, DSA and Skoda are closer to those of the original network, because only the function of sensor selection is retained and the corresponding results are based on the use of all the features within the different numbers of sensors. The proposed method retains the effective features in the process of feature selection and avoids the influence of redundant features on the recognition performance.

V. CONCLUSION AND FUTURE WORK

In this paper, we introduce a novel deep network model called dfLasso-Net for sensor and feature selection. This model addresses sensor and internal feature redundancy to enable energy-efficient and interpretable multisensor HAR tasks. Specifically, by considering the spatial smoothness between the internal channels of sensors, we extend the fused lasso to constrain the weights between channels and fuse the information between sensors so as to select the sensors and features with higher priority. By using hand-crafted features as input, dfLasso-Net can identify the sensors and features relevant to specific activities. Besides, by maintaining the physical meanings of the features in a channel-wise manner, the proposed model has better interpretability compared with CNN-based feature learning and fusion. Extensive experimental results show that the proposed dfLasso-Net achieves the best recognition performance and stability on three public human activity recognition datasets.

In future work, on one hand, we consider extending our method to more complex real-world scenarios, where different sensors have different working power and energy costs. Therefore, to achieve the goal of energy-efficient HAR, minimizing the total energy consumption while considering sensor utility is of good significance. On the other hand, more effort will be made on enhancing the model interpretability from the raw signals or deep features, which is promising for obtaining better classification performance.

APPENDIX
ANALYSIS OF ENERGY-EFFICIENT HAR THROUGH THE SENSOR WEIGHT NET

The sensor weight net has a recursive learning structure consisting of two core components, the weight generation net and the fusion optimization net. As shown in Fig. 2, the weight generation net indeed considers the correlation among the channels of all the sensors, where the MLP extracts the global information from all the sensors. In the meantime, the channels of each individual sensor are input into the fusion optimization net to explore the local correlation among the channels. This global and local information exploration mechanism ensures that the weights (importance) of the sensors are learned appropriately.

To implement energy-efficient multisensor HAR, suppose there are M identical body sensors, each consisting of d physical channels, and that the computing unit cost in terms of electric current for each sensor is I(t). For the d channels in each sensor, the weights output by the MLP and the softmax function satisfy $\sum_{i=1}^{d} c_i = 1$. First, the average current for each sensor can be obtained:

$\bar{I} = \frac{1}{T} \int_{0}^{T} I(t)\,dt$   (11)

According to the weight assignment of the channels, the corresponding magnitude of the current in channel i of sensor k is $c_i \bar{I}$. As we know, the energy cost of channel i satisfies $E_i \propto (c_i \bar{I})^2$. Thus, minimizing the energy consumption of each sensor is accumulated as follows:

$\min \; \sum_{i=1}^{d} c_i^2 \bar{I}^2, \quad \text{s.t.} \; \sum_{i=1}^{d} c_i = 1$   (12)

By the Cauchy–Schwarz (quadratic mean–arithmetic mean) inequality, it is easy to obtain that when $c_i = 1/d$, $i = 1, \ldots, d$, the minimum energy cost is $\bar{I}^2/d$. This conclusion is consistent with the spirit of our fusion optimization net, where the weight differences between channels are minimized and smoothed. This ensures that the selected sensors can be maximally utilized while maintaining a minimum energy cost. Since the sparsity of the channels across all the sensors is also considered in the weight generation net, for a channel with weight equal to zero, the other channels of the same sensor will also be set to zero, which means the sensor will be discarded. Therefore, by using the sensor weight net, sensor selection and physical channel equalization can be achieved, where energy-efficient HAR can be implicitly performed.
performance. minimizing energy consumption for each sensor is accumulated

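The claim that the per-sensor energy in (12) is minimized by equal channel weights follows from a one-line Cauchy–Schwarz argument; a short derivation for completeness:

```latex
\text{Given } \sum_{i=1}^{d} c_i = 1,\; c_i \ge 0:\qquad
\sum_{i=1}^{d} c_i^2\,\bar{I}^2
  = \bar{I}^2 \sum_{i=1}^{d} c_i^2
  \;\ge\; \bar{I}^2 \cdot \frac{\bigl(\sum_{i=1}^{d} c_i\bigr)^2}{d}
  = \frac{\bar{I}^2}{d},
```

with equality exactly when $c_1 = \cdots = c_d = 1/d$, which recovers the minimum energy cost $\bar{I}^2/d$ stated above.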

as follows: [15] Q. Xiao, L. Wu, X. Wu, and M. Rätsch, “Simulating temporally and
spatially correlated wind speed time series by spectral representation

d 
d method,” Complex Syst. Model. Simul., vol. 3, no. 2, pp. 157–168, 2023.
2 2 [16] K. Hirooka, M. A. M. Hasan, J. Shin, and A.Y. Srizon, “Ensembled
min ci ∗ I , s.t. ci = 1 (12) transfer learning based multichannel attention networks for human activity
i=1 i=1 recognition in still images,” IEEE Access, vol. 10, pp. 47051–47062, 2022.
[17] T. Hasegawa and K. Kondo, “Easy ensemble: Simple deep ensemble learn-
Based on the inequality of arithmetic and geometric means, it ing for sensor-based human activity recognition,” IEEE Internet Things J.,
is easy to obtain that when ci = 1/d, i = 1, . . . , d, the minimum vol. 10, no. 6, pp. 5506–5518, Mar. 2023.
[18] Y. Zhou, Z. Yang, X. Zhang, and Y. Wang, “A hybrid attention-based deep
energy cost is I 2 /d. This conclusion is consistent to the spirit neural network for simultaneous multi-sensor pruning and human activity
of our fusion optimization net, where the weight differences recognition,” IEEE Internet Things J., vol. 9, no. 24, pp. 25363–25372,
between channels are minimized and smoothed. This can ensure Dec. 2022.
[19] A. Bevilacqua, K. MacDonald, A. Rangarej, V. Widjaya, B. Caulfield,
that the selected sensors can be maximum utilized and mean- and T. Kechadi, “Human activity recognition with convolutional neu-
while maintaining a minimum energy cost. Since the sparsity of ral networks,” in Proc. Mach. Learn. Knowl. Discov. Databases, 2019,
the channels across all the sensors is also concerned in weight pp. 541–552.
[20] G. Mohmed, A. Lotfi, and A. Pourabdollah, “Employing a deep con-
generation net, for the channel with weight equal to zero, the volutional neural network for human activity recognition based on bi-
others channels will also be set to zero, which means the sensor nary ambient sensor data,” in Proc. 13th ACM Int. Conf. PErvasive
will be discarded. Therefore, by using sensor weight net, sensor Technol. Related Assistive Environ., 2020, pp. 1–7. [Online]. Available:
https://doi.org/10.1145/3389189.3397991
selection and physical channel equalization can be achieved, [21] S. Mekruksavanich and A. Jitpattanakul, “Smartwatch-based human ac-
where energy-efficient HAR can be implicitly performed. tivity recognition using hybrid LSTM network,” in Proc. IEEE SENSORS,
2020, pp. 1–4.
Yu Zhou (Senior Member, IEEE) received the B.Sc. degree in electronics and information engineering and the M.Sc. degree in circuits and systems from Xidian University, Xi'an, China, in 2009 and 2012, respectively, and the Ph.D. degree in computer science from the City University of Hong Kong, Hong Kong, in 2017. He is currently a tenured Associate Professor with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include computational intelligence, machine learning, and intelligent information processing.

Jingtao Xie received the B.S. degree in Internet of Things engineering from the Guangdong University of Technology, Guangzhou, China, in 2022. He is currently working toward the M.S. degree with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research focuses on multimodal human action recognition.

Xiao Zhang (Member, IEEE) received the B.Eng. and M.Eng. degrees from the South-Central University for Nationalities, Wuhan, China, in 2009 and 2011, respectively, and the Ph.D. degree from the Department of Computer Science, City University of Hong Kong, Hong Kong, in 2016. In 2015, he was a Visiting Scholar with Utah State University, Logan, UT, USA. During 2016–2019, he was a Postdoctoral Research Fellow with the Singapore University of Technology and Design, Singapore. He is currently an Associate Professor with the College of Computer Science, South-Central University for Nationalities. His research interests include algorithm design and analysis, combinatorial optimization, and wireless and UAV networking.

Wenhui Wu (Member, IEEE) received the B.S. and M.S. degrees from Xidian University, Xi'an, China, in 2012 and 2015, respectively, and the Ph.D. degree in computer science from the City University of Hong Kong, Hong Kong, China, in 2019. She is currently an Associate Professor with the College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China. Her research interests include machine learning, image enhancement, and community detection.

Sam Kwong (Fellow, IEEE) is currently the Chair Professor of computational intelligence and, concurrently, Associate Vice-President (Strategic Research) of Lingnan University, Hong Kong. He is a Distinguished Scholar of evolutionary computation, artificial intelligence (AI) solutions, and image/video processing, with a strong record of scientific innovations and real-world impacts. He is the Chair Professor of computer science with Lingnan University. He has a prolific publication record with more than 400 journal articles and 160 conference papers, with an h-index of 80 based on Google Scholar. He was listed among the top 2% of the world's most cited scientists according to the Stanford University report, and among the top 1% of the world's most cited scientists by Clarivate in 2022. He has also been actively engaged in knowledge transfer between academia and industry. He was elevated to IEEE Fellow in 2014 for his contributions to optimization techniques in cybernetics and video coding. He is a Fellow of the US National Academy of Inventors and the Hong Kong Academy of Engineering and Science. He was the President of the IEEE Systems, Man, and Cybernetics Society (SMCS) from 2021 to 2023. He is an Associate Editor for a number of leading IEEE Transactions journals.