Energy-Efficient and Interpretable Multisensor Human Activity Recognition via Deep Fused Lasso Net
Abstract—Utilizing data acquired by multiple wearable sensors can usually guarantee more accurate recognition for deep learning based human activity recognition. However, an increased number of sensors brings high processing cost, influencing real-time activity monitoring. Besides, existing methods rarely consider the interpretability of the recognition model in terms of both the importance of the sensors and the features, causing a gap between deep learning and its extendability in real-world scenarios. In this paper, we cast the classical fused lasso model into a deep neural network, proposing a deep fused Lasso net (dfLasso-Net), which can perform sensor selection, feature selection and HAR in one end-to-end structure. Specifically, a two-level weight computing module (TLWCM) consisting of a sensor weight net and a feature weight net is designed to measure the importance of sensors and features. In the sensor weight net, spatial smoothness between physical channels within each sensor is considered to maximize the usage of selected sensors. And the feature weight net is able to maintain the physical meaning of the hand-crafted features through feature selection inside the sensors. By combining with the learning module for classification, HAR can be performed. We test dfLasso-Net on three multi-sensor based HAR datasets, demonstrating that dfLasso-Net achieves better recognition accuracy with the least number of sensors and provides good model interpretability by visualizing the weights of the sensors and features. Last but not least, dfLasso-Net can be used as an effective filter-based feature selection approach with much flexibility.

Index Terms—Sparse optimization, human activity recognition, deep fused lasso, feature selection, sensor selectivity.
Manuscript received 16 April 2024; revised 30 May 2024; accepted 25 June 2024. Date of publication 23 July 2024; date of current version 3 October 2024. This work was supported in part by Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515012485, in part by Shenzhen Fundamental Research Program under Grant JCYJ20220810112354002, in part by Shenzhen Science and Technology Program under Grant KJZD20230923114111021, in part by the National Natural Science Foundation of China under Grant 62376162, in part by the Fund for Academic Innovation Teams and Research Platform of South-Central Minzu University under Grant XTZ24003 and Grant PTZ24001, in part by the Knowledge Innovation Program of Wuhan-Basic Research through Project 2023010201010151, in part by the Research Start-up Funds of South-Central Minzu University under Grant YZZ18006, and in part by the Spring Sunshine Program of Ministry of Education of the People's Republic of China under Grant HZKY20220331. (Corresponding author: Xiao Zhang.)
Yu Zhou and Jingtao Xie are with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China (e-mail: [email protected]; [email protected]).
Xiao Zhang is with the Department of Computer Science, South-Central Minzu University, Wuhan 430074, China, and also with the School of Computer Science and Technology, University of Science and Technology of China, Hefei 230052, China (e-mail: [email protected]).
Wenhui Wu is with the College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China (e-mail: [email protected]).
Sam Kwong is with the Department of Computing and Decision Science, Lingnan University, Hong Kong (e-mail: [email protected]).
Recommended for acceptance by J. Liu.
Digital Object Identifier 10.1109/TETCI.2024.3430008
I. INTRODUCTION

HUMAN activity recognition (HAR) [1] has been widely used in various real-world applications such as healthcare [2] and sports [3]. With the popularity of ubiquitous computing and Internet-of-Things (IoT) technology, HAR with wearable sensors has received extensive attention due to its high portability, low power consumption and privacy protection compared with video-based methods [4], [5].

Due to the high complexity and diversity of human activities, using a single sensor for classification in HAR tasks is usually unsatisfactory, because action changes are sensitive to sensor location, which requires analyzing different actions and deploying sensors at the right locations to achieve good recognition results [6]. In addition, sensors with different modalities are complementary to each other, so that more comprehensive data are available [7], [8]. Thus, using multiple sensors on the human body, known as body area networks (BANs), can greatly improve the recognition accuracy [9], [10], where different modalities, numbers and locations of sensors in HAR systems are taken into consideration. However, as suggested by previous studies on IoT wireless networks [11], [12], using multiple sensors greatly increases the computational complexity and processing costs, which conflicts with the increasing demand for lower time consumption and energy requirements in ubiquitous computing. Moreover, the data or features collected by additional sensors are likely to introduce redundant information and confuse the learning algorithm, which negatively affects the classification performance. Therefore, it is desirable to identify the effective sensors out of all the sensors and build a more compact and energy-efficient HAR system by removing redundant sensors.

For multisensor HAR, each sensor often contains multiple physical channels to process different directional components of the captured motion signals (e.g., the x, y and z axes), so multi-channel signal processing and feature extraction are key prerequisites. In recent years, various deep neural network (DNN) models have been introduced in HAR, such as the Long Short-Term Memory network (LSTM) [13], the Recurrent Neural Network (RNN) [14] and the Graph Neural Network (GNN) [10], where the signal in each physical channel is processed based on time-series modeling approaches [15]. Some recent works on HAR explore feature extraction or learning within each physical channel and perform ensemble learning through attention mechanisms in various deep neural networks [16], [17], [18]. It is also shown in [18] that for multisensor HAR, the physical channels within each sensor usually have significantly different importance, and not all the channels are relevant. Independent channel-wise processing brings one issue: the important channels may be scattered across different sensors, resulting in difficulty in sensor selection. Moreover, in this case the energy consumption is not reduced, since in real-world scenarios signal transmission cannot be turned off per channel, only per sensor.

Another line of research regards the multi-channel time-series data jointly as a 2D or 3D image, and a Convolutional Neural Network (CNN) [19] is used to extract and learn deep features [20], [21], [22], [23], [24], [25], [26]. The interpretability of a deep learning model is very important for HAR in real-world applications. For example, in a smart sports training system, users would like to know how an incorrect action is generated and which parts of the body are more important for accurate recognition [27]. The hierarchy of increasing complexity and the fusion of multi-channel signals can boost predictive capacity, but lower interpretability in terms of preserving physical meanings. On the one hand, the deep features learned across channels or sensors are hard to resolve and trace back to their original physical channels; on the other hand, treating the time-series signals as an image is not rational enough, since simply stacking the sensor signals cannot satisfy the local structure similarity assumption in image processing tasks.

To tackle the above issues, we consider developing an energy-efficient HAR system with a reduced number of sensors which can maintain feature interpretability in each physical channel while exploring the correlation across different physical channels. Specifically, we cast the classical fused lasso model into a deep neural network, proposing a deep fused Lasso net (dfLasso-Net), which can perform sensor selection, feature weighting and HAR in one end-to-end structure. In dfLasso-Net, a two-level weight computing module (TLWCM) consisting of a sensor weight net and a feature weight net is designed to measure the importance of sensors and features simultaneously, so that useful sensors and informative features can be retained. In the sensor weight net, spatial smoothness between physical channels within each sensor is considered to maximize the usage of selected sensors. The feature weight net, inspired by feature weighting methods in interpretable machine learning (IML) such as LIME [28] or SHAP [29], is able to maintain the physical meaning of the hand-crafted features through feature selection (FS), which enhances the model interpretability at the feature level. Different FS methods have been used to reduce redundant features so as to improve the generalization ability on high-dimensional data [30], [31], [32], [33]. It is also worth noting that the proposed feature weight net is detachable: it can be regarded as an embedded FS method, or used as a filter-based FS method when the classifier module is removed. Experimental studies on three public multisensor-based human activity recognition datasets demonstrate that dfLasso-Net can achieve better recognition performance with the least number of sensors and can provide good model interpretability by visualizing the weights of both the sensors and the features. In short, our main contributions are as follows:
• We propose an energy-efficient and interpretable deep neural network for multisensor-based HAR inspired by the classical sparse learning method, fused lasso, which is able to perform sensor selection and feature selection with learned weights and accurate HAR in one end-to-end structure.
• A two-level weight computing module (TLWCM) consisting of a sensor weight net and a feature weight net is designed to measure the importance of sensors and features simultaneously, where spatial smoothness between physical channels within each sensor is considered, which not only correlates the signals in different channels but also maximizes the usage of each selected sensor.
• Alternative feature selection mechanism: the proposed feature weight net can be used not only as an embedded method but also as a filter-based method for feature selection, which can increase the model interpretability from the perspective of the feature level.
The rest of this paper is organized as follows. Section II provides an overview of related research. In Section III, we introduce the details of dfLasso-Net. Experimental settings and results are shown in Section IV. Finally, the conclusion and a discussion of our future work are provided in Section V.

II. RELATED WORK

A. Deep Learning Based Multi-Channel Feature Extraction in Human Activity Recognition

With the rapid development of artificial intelligence technology, deep learning based activity recognition models [34] have been actively researched in recent years; they outperform traditional machine learning models such as the support vector machine (SVM). Currently, CNNs are the most widely used models in HAR tasks due to their local dependence and scale invariance, where the multi-physical-channel signals are usually treated and processed as images, considering both the temporal and spatial dimensions. Mohmed et al. [20] proposed the use of a deep convolutional neural network for human activity recognition using binary ambient sensors, which can visualise binary string data as a greyscale image. Mekruksavanich et al. [21] proposed a hybrid HAR framework that employs spatial-temporal features automatically extracted from data obtained from smartwatch sensors, which eliminates the need for manual feature extraction. Zhang et al. [22] employed the structure of U-Net, where a 2D image of the raw motion signal was used as the input data. Mohamed et al. [23] considered a complex IoT scenario consisting of heterogeneous sensors, where the sensed time-series data are converted into an image as the input for the deep learning model. Teng et al. [24] applied a deep residual network, trained in a block-wise manner, to recognize different activities through
multi-channel signals as an image. Tang et al. [25] combined CNN and transformer to deal with multi-channel time-series data, where both the local and global correlations between the signals were explored. Although good recognition accuracy can be obtained by using CNN to deal with the multi-channel signal, the rationality of treating the time-series signals as an image still needs further investigation, since simply stacking the sensor signals cannot satisfy the local structure similarity assumption in image processing tasks. More importantly, using CNN to fuse different local information brings difficulty in keeping the model interpretability at the feature level. Therefore, it is desirable for a deep neural network to maintain the features in their respective physical channels while performing HAR.

… than that of the sample place. The lasso [40] method uses l1 regularization to constrain the model solution as follows:

\min_{\mathbf{w}} \|\mathbf{y} - X\mathbf{w}\|_2^2 + \lambda \|\mathbf{w}\|_1 \qquad (1)

where y ∈ R^n is the vector of observations, X ∈ R^{n×p} is the design matrix containing p features for each observation, and w ∈ R^p is a vector of p weights mapping the features to the observations. As an extension of lasso, the fused lasso applies the l1-norm penalty to both the coefficients and their successive differences to make the local coefficients homogeneous, thus achieving local smoothness, which can be expressed as:
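For reference, the fused lasso objective of [41] for a coefficient vector with ordered entries takes the following standard form (the specific formulation used later in this paper may be parameterized differently):

\min_{\mathbf{w}} \|\mathbf{y} - X\mathbf{w}\|_2^2 + \lambda_1 \sum_{j=1}^{p} |w_j| + \lambda_2 \sum_{j=2}^{p} |w_j - w_{j-1}|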
Fig. 1. The structure of our proposed dfLasso-Net, which consists of a two-level weight computing module and classification net.
Fig. 2. Details of weight generation net and fusion optimization net in sensor weight net.
\min_{\mathbf{c}} \; l(\mathbf{c}) + \lambda_1 \sum_{i=1}^{d} |c_i| + \lambda_2 \sum_{(i,j)\in\{1,\ldots,d\}} |c_i - c_j| \qquad (4)

where c denotes the channel weight vector, c_i, i = 1, ..., d, is the weight of the ith channel, and l(c) can be applied as a general loss term. Equation (4) is developed based on the generalized fused lasso [42], which is suitable for classification problems. In contrast to (1), we add the new component |c_i − c_j|, which constrains the channels within the same sensor. It addresses the fact that there may be similar trends between channels within the same sensor that have an impact on the final recognition performance, maximizing the use of each sensor.
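As an illustrative sketch (not the authors' implementation), the two regularization terms of (4) for one sensor's channel-weight vector can be computed in PyTorch as follows; the pairwise matrix diff also corresponds to the weight difference matrix D described next:

```python
import torch

def channel_fused_penalty(c: torch.Tensor, lam1: float, lam2: float) -> torch.Tensor:
    """Regularizer of (4) for one sensor: sparsity on the channel weights c (shape [d])
    plus smoothness over channel pairs within the sensor."""
    sparsity = lam1 * c.abs().sum()                          # lambda_1 * sum_i |c_i|
    diff = (c.unsqueeze(0) - c.unsqueeze(1)).abs()           # d x d matrix with entries |c_i - c_j|
    smoothness = lam2 * torch.triu(diff, diagonal=1).sum()   # sum over unordered channel pairs (i, j)
    return sparsity + smoothness
```

In training, such a term would simply be added to the task loss that plays the role of l(c).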
After generating all the internal channel weights of the sensors, we calculate the differences between the weights of the internal channels in the same sensor and generate a weight difference matrix. Specifically, for d channels in each sensor, a d × d symmetric matrix D is obtained, where D_{i,j} = |c_i − c_j| records the weight difference between the ith and jth channels. We also take the mean-value and maximum-value vectors of each sensor obtained in the weight generation net and input them into the optimization sub-net, where the mean-value difference matrix and the maximum-value difference matrix are calculated in the same way as D. Then, we pass them into two CNNs, respectively, to further extract deep features and obtain the corresponding deep feature vectors. Both CNNs have the same structure, consisting of three convolutional layers: Layer 1 with one convolutional kernel (size d × d, stride = d, padding = d), Layer 2 with one convolutional kernel (size 3 × 3, stride = 3, padding = 3), and Layer 3 with d convolutional kernels (size 3 × 3, stride = 1, no padding). Finally, we add the two resultant vectors and perform a matrix multiplication with the weight difference matrix to obtain the difference output, which has the same length as the number of physical channels d. The output is split into individual elements and fed back into the input feature vectors of the weight generation net. Through the interaction of the weight generation net and the fusion optimization net, the sensor weights are continuously optimized and exhibit a reasonable distribution.

C. Feature Weight Net

Feature weight net is used to learn the feature weights within each physical channel to remove redundant features. As shown in Fig. 1, feature weight net consists of several feature sub-nets. For each sensor, we use a feature sub-net to calculate the weights within that sensor. The structure of the feature sub-net is based on an improved non-local network [43], shown in Fig. 4, which invokes the attention mechanism of self-attention GAN to obtain the key motion features. We combine the feature vectors from different channels within the same sensor into one feature vector and then convert the features into different feature spaces by 1D convolution. The size of the 1D convolution is 1 × n_f, where n_f denotes the number of features in each sensor. The 1 × 1 convolution is used for feature channel adjustment. We regard the output as the feature weights within each sensor, rather than the final feature weights, as the priority of different sensors also needs to be considered.

Fig. 4. Feature sub-net in feature weight net.

For each feature, the final feature weight is determined by the product of the sensor weight and the internal feature weight of the sensor. As mentioned above, in the testing phase the internal feature weights of discarded sensors are set to 0, so their final weights are also 0. For retained sensors, the final weight is calculated as:

w_{k,i,p} = S_k \cdot f_{i,p} \qquad (5)

where S_k denotes the weight of the kth sensor and f_{i,p} represents the feature weight of the pth feature in the ith channel calculated by the feature weight net.

At last, as shown in Fig. 1, the feature subset is obtained by sorting the final feature weights in descending order and selecting the features with higher weights. Since the feature weights within the discarded sensors are always 0, the feature selection process always gives preference to the features in the retained sensors. Thus, we can complete the sensor selection and feature selection processes simultaneously.
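A minimal NumPy sketch of this selection step, under the assumption that the learned weights are arranged as a sensor × channel × feature array (names and shapes are illustrative):

```python
import numpy as np

def select_features(sensor_w: np.ndarray, feat_w: np.ndarray, k: int):
    """Final weights as in (5): w[s, i, p] = S_s * f[s, i, p]; discarded sensors carry S_s = 0.
    Returns the (sensor, channel, feature) indices of the k largest final weights."""
    final_w = sensor_w[:, None, None] * feat_w       # broadcast each sensor weight over its features
    order = np.argsort(final_w.ravel())[::-1][:k]    # descending sort, keep the top-k entries
    return np.unravel_index(order, final_w.shape)
```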
D. Classification Net

Classification net is mainly used to optimize the correlation between the weighted features and the labels. When combined with TLWCM, an end-to-end structure for classification is established, where the feature subset, as an intermediate result, can be determined based on the feature weights. Classification net can be replaced with other pre-trained network models to optimise the weighted features in case of different task requirements. In this case, an embedded feature selection can be performed. Besides, since the feature weights and sensor weights are optimized continuously during the training process, the obtained feature subset can be combined with other different classifiers for the
In our work, we set γ = 100 since it is very close to |x| and the approximation works better in comparison to [44], where γ = 50 is used to approximate the norm.

C. Training Details

In our work, we use repeated random sub-sampling (RRSS) for the three datasets, where 80% of the samples from each
dataset are used as the training set and the remaining 20% as the testing set. The model parameters are optimized by stochastic gradient descent (SGD) with batch size 64, and the weight of the regularizer is 1e-4. We also set the initial learning rate according to the different datasets (DSA: 0.05, PAMAP2: 0.02, Skoda: 0.03), and the number of training steps is set to 40. We conducted experiments on a server equipped with 8 Tesla P100-PCIE-16 GB GPUs. According to the parameter settings suggested in the fused group lasso method [48], we test different cases of (λ1, λ2, λ3) chosen from the set {1e-5, 2e-5, 5e-5, ..., 1e-1, 2e-1, 5e-1, 1, 2, 5}. Then, the one that yields the smallest number of misclassified labels is used. The training and testing process is repeated 10 times and the average value is reported as the final result.

For the classification net, we use a pre-trained ResNet18 [49] due to its popularity. Furthermore, two metrics are used for evaluation, including accuracy and weighted F1-score (WeightF):

\text{Accuracy} = \frac{1}{L} \sum_{i=1}^{L} \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i} \qquad (9)

where i denotes the class index and L is the number of classes.

\text{WeightF} = \sum_{i} 2 \cdot s_i \cdot \frac{TP_i}{FN_i + FP_i + 2 \cdot TP_i} \qquad (10)

where s_i denotes the proportion of samples of class i, i denotes the class index, and L is the number of classes.
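Both metrics can be computed from a confusion matrix; the following minimal sketch (using scikit-learn only to build the confusion matrix, with illustrative names) mirrors (9) and (10):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def accuracy_and_weightf(y_true, y_pred):
    """One-vs-rest accuracy averaged over the L classes, per (9), and weighted F1, per (10)."""
    cm = confusion_matrix(y_true, y_pred)      # L x L matrix of counts
    n, tp = cm.sum(), np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = n - tp - fp - fn
    accuracy = np.mean((tp + tn) / (tp + fn + fp + tn))
    s = cm.sum(axis=1) / n                     # class proportions s_i
    weightf = np.sum(2 * s * tp / (fn + fp + 2 * tp))
    return accuracy, weightf
```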
D. Comparison of Overall Performance

DfLasso-Net can perform sensor selection and HAR in an end-to-end structure, where embedded feature selection is conducted. We compare dfLasso-Net with state-of-the-art HAR methods involving either sensor selection or feature selection modules, including convolutional neural network based group lasso (I-CNN) [50], the discriminant sensor pruning (MSF-EP) method [6], shallow convolutional neural networks with channel-selectivity (ResNet+SelectConv) [35], the deep multifeature attention network (DMEFAM) [38] and triple cross-domain attention based ResNet (ResNet+TA) [37]. MSF-EP considers sensor pruning and HAR in multisensor HAR and I-CNN considers identifying the important sensors, while ResNet+SelectConv, DMEFAM and ResNet+TA apply channel attention mechanisms to perform feature selection in HAR. A statistical Wilcoxon significance test [51] with 5% significance level is applied, where the best results are in bold and the second best are underlined.

TABLE II
COMPARISON AS EMBEDDED METHODS ON THREE DATASETS

As can be seen in Table II, dfLasso-Net achieves the best recognition performance on DSA and Skoda (left & right) in terms of accuracy, WeightF and AUC, while its accuracy on PAMAP2 is slightly worse than that of ResNet+TA. The proposed dfLasso-Net can reduce the number of unnecessary sensors through the sensor weight net and the redundant features within each sensor through the feature weight net, respectively. Compared with existing works with only sensor selection or feature selection, dfLasso-Net can better overcome the overfitting issue in large-scale networks and maintain good generalization ability. This two-level data selection mechanism also enhances the interpretability of the model for HAR.

E. Feature Selection

DfLasso-Net can be used as a filter-based feature selection method for HAR. To validate the effectiveness of dfLasso-Net on feature selection, three lasso-based methods (I-CNN, traditional sparse group lasso, and fused lasso with the Alternating Direction Method of Multipliers (ADMM)) and one ensemble method, MSF-EP, are chosen for comparison. Since I-CNN and MSF-EP only measure the importance of sensors without any feature selection process, we adapt these methods for a fair comparison. Specifically, we perform feature selection on the internal features of the retained sensors and obtain the feature subsets. This process is implemented through the feature selection module in the scikit-learn machine learning library (sklearn). Specifically, we use the SelectKBest module and set "mutual_info_classif" with default parameters as the score function. K stands for the number of selected features.

TABLE III
COMPARISON AS FILTER METHODS ON THREE DATASETS

For fair comparison, all the methods use the same neural network (NN) classifier [52]. In addition, we compute the average testing results of the different methods over a given range of the number of selected features K. It is indicated from Table III that dfLasso-Net achieves the best average recognition performance with the same number of selected features. I-CNN is worse than dfLasso-Net, but outperforms MSF-EP on the three datasets. As traditional methods, sgLasso and fused lasso perform closely on Skoda, but very differently on DSA and PAMAP2. This is due to the fact that DSA and PAMAP2 contain more actions than Skoda, with more highly related actions to identify. Experimental results verify that dfLasso-Net can select more important and relevant features for HAR tasks.

To verify the effectiveness and stability of dfLasso-Net, the recognition results under different numbers of selected features
are given in Fig. 5. With a small K, the testing results of all the methods improve significantly as K increases, indicating that effective features are more important for recognition performance when the number of retained sensors is small. When K increases to a certain value, the improvement of the testing results for all methods becomes smaller, suggesting that there is a threshold for the number of effective sensors in HAR tasks and that too many sensors and internal features do not improve recognition performance much. Undoubtedly, dfLasso-Net not only achieves the best performance but is also the most stable on Skoda when using the left arm's data.

Fig. 5. Comparisons among different methods on the Skoda dataset (using left arm's data, where the number of total features is 11 × 6 × 10 = 660).
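As a concrete illustration of the filter-based adaptation of I-CNN and MSF-EP described above, the SelectKBest step with mutual-information scoring can be sketched as follows (data preparation and the retained-sensor masking are assumed to happen elsewhere):

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def filter_topk_features(X_train, y_train, X_test, k):
    """Score the features of the retained sensors with mutual information and keep the K best."""
    selector = SelectKBest(score_func=mutual_info_classif, k=k)
    X_train_sel = selector.fit_transform(X_train, y_train)
    X_test_sel = selector.transform(X_test)
    return X_train_sel, X_test_sel, selector.get_support(indices=True)
```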
TABLE IV
AVERAGE RESULTS OF ABLATION ANALYSIS
as follows:

\min_{\mathbf{c}} \sum_{i=1}^{d} c_i^2 \cdot I^2, \quad \text{s.t.} \quad \sum_{i=1}^{d} c_i = 1 \qquad (12)

Based on the inequality of arithmetic and geometric means, it is easy to obtain that when c_i = 1/d, i = 1, ..., d, the minimum energy cost is I^2/d. This conclusion is consistent with the spirit of our fusion optimization net, where the weight differences between channels are minimized and smoothed. This can ensure that the selected sensors are maximally utilized while maintaining a minimum energy cost. Since the sparsity of the channels across all the sensors is also considered in the weight generation net, for a channel with weight equal to zero, the other channels will also be set to zero, which means the sensor will be discarded. Therefore, by using the sensor weight net, sensor selection and physical channel equalization can be achieved, where energy-efficient HAR can be implicitly performed.
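One way to verify this minimizer is via the Cauchy–Schwarz inequality:

\sum_{i=1}^{d} c_i^2 \;\ge\; \frac{1}{d}\Big(\sum_{i=1}^{d} c_i\Big)^2 = \frac{1}{d}, \qquad \text{hence} \qquad \sum_{i=1}^{d} c_i^2 \, I^2 \ge \frac{I^2}{d},

with equality exactly when c_1 = \cdots = c_d = 1/d.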
REFERENCES

[1] F. Gu, M.-H. Chung, M. Chignell, S. Valaee, B. Zhou, and X. Liu, "A survey on deep learning for human activity recognition," ACM Comput. Surv., vol. 54, no. 8, pp. 1–35, 2021, doi: 10.1145/3472290.
[2] X. Zhou, W. Liang, K. I.-K. Wang, H. Wang, L. T. Yang, and Q. Jin, "Deep-learning-enhanced human activity recognition for internet of healthcare things," IEEE Internet Things J., vol. 7, no. 7, pp. 6429–6438, Jul. 2020.
[3] Y. Zhou, R. Wang, Y. Wang, S. Sun, J. Chen, and X. Zhang, "A swarm intelligence assisted IoT-based activity recognition system for basketball rookies," IEEE Trans. Emerg. Topics Comput. Intell., vol. 8, no. 1, pp. 82–94, Feb. 2024.
[4] Y. Guo et al., "Evolutionary dual-ensemble class imbalance learning for human activity recognition," IEEE Trans. Emerg. Topics Comput. Intell., vol. 6, no. 4, pp. 728–739, Aug. 2022.
[5] D. Tao, L. Jin, Y. Yuan, and Y. Xue, "Ensemble manifold rank preserving for acceleration-based human activity recognition," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 6, pp. 1392–1404, Jun. 2016.
[6] J. Cao, W. Li, C. Ma, and Z. Tao, "Optimizing multi-sensor deployment via ensemble pruning for wearable activity recognition," Inf. Fusion, vol. 41, pp. 68–79, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253517304803
[7] M. M. Islam and T. Iqbal, "MuMu: Cooperative multitask learning-based guided multimodal fusion," in Proc. AAAI Conf. Artif. Intell., 2022, vol. 36, pp. 1043–1051.
[8] M. M. Islam and T. Iqbal, "Hamlet: A hierarchical multimodal attention-based human activity recognition algorithm," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2020, pp. 10285–10292.
[9] Y. Zhou, C. Xie, S. Sun, X. Zhang, and Y. Wang, "A self-supervised human activity recognition approach via body sensor networks in smart city," IEEE Sensors J., vol. 24, no. 5, pp. 5476–5485, Mar. 2024.
[10] Y. Zhu, H. Luo, R. Chen, and F. Zhao, "DiamondNet: A neural-network-based heterogeneous sensor attentive fusion for human activity recognition," IEEE Trans. Neural Netw. Learn. Syst., early access, Jul. 04, 2023, doi: 10.1109/TNNLS.2023.3285547.
[11] E. Tong et al., "A hierarchical energy-efficient service selection approach with QoS constraints for Internet of Things," IEEE Trans. Green Commun. Netw., vol. 5, no. 2, pp. 645–657, Jun. 2021.
[12] T. Zhao, X. Chen, Q. Sun, and J. Zhang, "Energy-efficient federated learning over cell-free IoT networks: Modeling and optimization," IEEE Internet Things J., vol. 10, no. 19, pp. 17436–17449, Oct. 2023.
[13] D. Tao, Y. Wen, and R. Hong, "Multicolumn bidirectional long short-term memory for mobile devices-based human activity recognition," IEEE Internet Things J., vol. 3, no. 6, pp. 1124–1134, Dec. 2016.
[14] A. Anagnostis, L. Benos, D. Tsaopoulos, A. Tagarakis, N. Tsolakis, and D. Bochtis, "Human activity recognition through recurrent neural networks for human–robot interaction in agriculture," Appl. Sci., vol. 11, no. 5, 2021, Art. no. 2188. [Online]. Available: https://www.mdpi.com/2076-3417/11/5/2188
[15] Q. Xiao, L. Wu, X. Wu, and M. Rätsch, "Simulating temporally and spatially correlated wind speed time series by spectral representation method," Complex Syst. Model. Simul., vol. 3, no. 2, pp. 157–168, 2023.
[16] K. Hirooka, M. A. M. Hasan, J. Shin, and A. Y. Srizon, "Ensembled transfer learning based multichannel attention networks for human activity recognition in still images," IEEE Access, vol. 10, pp. 47051–47062, 2022.
[17] T. Hasegawa and K. Kondo, "Easy ensemble: Simple deep ensemble learning for sensor-based human activity recognition," IEEE Internet Things J., vol. 10, no. 6, pp. 5506–5518, Mar. 2023.
[18] Y. Zhou, Z. Yang, X. Zhang, and Y. Wang, "A hybrid attention-based deep neural network for simultaneous multi-sensor pruning and human activity recognition," IEEE Internet Things J., vol. 9, no. 24, pp. 25363–25372, Dec. 2022.
[19] A. Bevilacqua, K. MacDonald, A. Rangarej, V. Widjaya, B. Caulfield, and T. Kechadi, "Human activity recognition with convolutional neural networks," in Proc. Mach. Learn. Knowl. Discov. Databases, 2019, pp. 541–552.
[20] G. Mohmed, A. Lotfi, and A. Pourabdollah, "Employing a deep convolutional neural network for human activity recognition based on binary ambient sensor data," in Proc. 13th ACM Int. Conf. PErvasive Technol. Related Assistive Environ., 2020, pp. 1–7. [Online]. Available: https://doi.org/10.1145/3389189.3397991
[21] S. Mekruksavanich and A. Jitpattanakul, "Smartwatch-based human activity recognition using hybrid LSTM network," in Proc. IEEE SENSORS, 2020, pp. 1–4.
[22] Y. Zhang, Z. Zhang, Y. Zhang, J. Bao, Y. Zhang, and H. Deng, "Human activity recognition based on motion sensor using U-Net," IEEE Access, vol. 7, pp. 75213–75226, 2019.
[23] M. Abdel-Basset, H. Hawash, V. Chang, R. K. Chakrabortty, and M. Ryan, "Deep learning for heterogeneous human activity recognition in complex IoT applications," IEEE Internet Things J., vol. 9, no. 8, pp. 5653–5665, Apr. 2022.
[24] Q. Teng, L. Zhang, Y. Tang, S. Song, X. Wang, and J. He, "Block-wise training residual networks on multi-channel time series for human activity recognition," IEEE Sensors J., vol. 21, no. 16, pp. 18063–18074, Aug. 2021.
[25] Y. Tang, L. Zhang, H. Wu, J. He, and A. Song, "Dual-branch interactive networks on multichannel time series for human activity recognition," IEEE J. Biomed. Health Inform., vol. 26, no. 10, pp. 5223–5234, Oct. 2022.
[26] K. Chen, L. Yao, D. Zhang, X. Wang, X. Chang, and F. Nie, "A semisupervised recurrent convolutional attention model for human activity recognition," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 5, pp. 1747–1756, May 2020.
[27] J. Wang et al., "Tac-trainer: A visual analytics system for IoT-based racket sports training," IEEE Trans. Vis. Comput. Graph., vol. 29, no. 1, pp. 951–961, Jan. 2023.
[28] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 1135–1144.
[29] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Proc. Adv. Neural Inf. Process. Syst., 2017, vol. 30, pp. 4768–4777.
[30] Y. Zhou, W. Zhang, J. Kang, X. Zhang, and X. Wang, "A problem-specific non-dominated sorting genetic algorithm for supervised feature selection," Inf. Sci., vol. 547, pp. 841–859, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0020025520308549
[31] J. Wang, H. Ouyang, Z. Zhou, and S. Li, "Harmony search algorithm based on dual-memory dynamic search and its application on data clustering," Complex Syst. Model. Simul., vol. 3, no. 4, pp. 261–281, 2023.
[32] Y. Zhou, Y. Qiu, and S. Kwong, "Region purity-based local feature selection: A multiobjective perspective," IEEE Trans. Evol. Comput., vol. 27, no. 4, pp. 787–801, Aug. 2023.
[33] Y. Zhou, N. Yang, X. Huang, J. Lee, and S. Kwong, "A novel multiobjective genetic programming approach to high-dimensional data classification," IEEE Trans. Cybern., early access, Mar. 18, 2024, doi: 10.1109/TCYB.2024.3372070.
[34] E. Sansano, R. Montoliu, and B. Fernández, "A study of deep neural networks for human activity recognition," Comput. Intell., vol. 36, no. 3, pp. 1113–1139, 2020, doi: 10.1111/coin.12318.
[35] W. Huang, L. Zhang, Q. Teng, C. Song, and J. He, "The convolutional neural networks training with channel-selectivity for human activity recognition based on sensors," IEEE J. Biomed. Health Inform., vol. 25, no. 10, pp. 3834–3843, Oct. 2021.
[36] W. Huang, L. Zhang, H. Wu, F. Min, and A. Song, "Channel-equalization-HAR: A light-weight convolutional neural network for wearable sensor based human activity recognition," IEEE Trans. Mobile Comput., vol. 22, no. 9, pp. 5064–5077, Sep. 2023.
[37] Y. Tang, L. Zhang, Q. Teng, F. Min, and A. Song, "Triple cross-domain attention on human activity recognition using wearable sensors," IEEE Trans. Emerg. Topics Comput. Intell., vol. 6, no. 5, pp. 1167–1176, Oct. 2022.
[38] Y. Wang et al., "A novel deep multifeature extraction framework based on attention mechanism using wearable sensor data for human activity recognition," IEEE Sensors J., vol. 23, no. 7, pp. 7188–7198, Apr. 2023.
[39] C. Han et al., "Understanding and improving channel attention for human activity recognition by temporal-aware and modality-aware embedding," IEEE Trans. Instrum. Meas., vol. 71, 2022, Art. no. 2513612.
[40] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Roy. Stat. Soc., Ser. B (Methodological), vol. 58, no. 1, pp. 267–288, 1996. [Online]. Available: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1996.tb02080.x
[41] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, "Sparsity and smoothness via the fused lasso," J. Roy. Stat. Soc., Ser. B (Statistical Methodol.), vol. 67, no. 1, pp. 91–108, 2005, doi: 10.1111/j.1467-9868.2005.00490.x.
[42] B. Xin, Y. Kawahara, Y. Wang, L. Hu, and W. Gao, "Efficient generalized fused lasso and its applications," ACM Trans. Intell. Syst. Technol., vol. 7, no. 4, pp. 1–22, 2016, doi: 10.1145/2847421.
[43] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7794–7803.
[44] J. Shah, I. Qureshi, Y. Deng, and K. Kadir, "Reconstruction of sparse signals and compressively sampled images based on smooth l1-norm approximation," J. Signal Process. Syst., vol. 88, no. 3, pp. 333–344, 2017, doi: 10.1007/s11265-016-1168-8.
[45] B. Barshan and M. C. Yüksek, "Recognizing daily and sports activities in two open source machine learning environments using body-worn sensor units," Comput. J., vol. 57, no. 11, pp. 1649–1667, 2014.
[46] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity monitoring," in Proc. IEEE 16th Int. Symp. Wearable Comput., 2012, pp. 108–109.
[47] P. Zappi et al., "Activity recognition from on-body sensors: Accuracy-power trade-off by dynamic sensor selection," in Proc. Wireless Sensor Netw., 2008, pp. 17–33.
[48] S. Zhang, Z. Zhu, B. Zhang, B. Feng, T. Yu, and Z. Li, "Fused group lasso: A new EEG classification model with spatial smooth constraint for motor imagery-based brain–computer interface," IEEE Sensors J., vol. 21, no. 2, pp. 1764–1778, Jan. 2021.
[49] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[50] E. Kim, "Interpretable and accurate convolutional neural networks for human activity recognition," IEEE Trans. Ind. Inform., vol. 16, no. 11, pp. 7190–7198, Nov. 2020.
[51] J. Derrac, S. García, D. Molina, and F. Herrera, "A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms," Swarm Evol. Comput., vol. 1, no. 1, pp. 3–18, 2011.
[52] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.

Yu Zhou (Senior Member, IEEE) received the B.Sc. degree in electronics and information engineering and the M.Sc. degree in circuits and systems from Xidian University, Xi'an, China, in 2009 and 2012, respectively, and the Ph.D. degree in computer science from the City University of Hong Kong, Hong Kong, in 2017. He is currently a tenured Associate Professor with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include computational intelligence, machine learning, and intelligent information processing.

Jingtao Xie received the B.S. degree in Internet of Things engineering from the Guangdong University of Technology, Guangzhou, China, in 2022. He is currently working toward the M.S. degree with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research focuses on multimodal human action recognition.

Xiao Zhang (Member, IEEE) received the B.Eng. and M.Eng. degrees from the South-Central University for Nationalities, Wuhan, China, in 2009 and 2011, respectively, and the Ph.D. degree from the Department of Computer Science, City University of Hong Kong, Hong Kong, in 2016. In 2015, he was a Visiting Scholar with Utah State University, Logan, UT, USA. During 2016–2019, he was a Postdoc Research Fellow with the Singapore University of Technology and Design, Singapore. He is currently an Associate Professor with the College of Computer Science, South-Central University for Nationalities. His research interests include algorithms design and analysis, combinatorial optimization, and wireless and UAV networking.

Wenhui Wu (Member, IEEE) received the B.S. and M.S. degrees from Xidian University, Xi'an, China, in 2012 and 2015, respectively, and the Ph.D. degree in computer science from the City University of Hong Kong, Hong Kong, China, in 2019. She is currently an Associate Professor with the College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China. Her research interests include machine learning, image enhancement, and community detection.

Sam Kwong (Fellow, IEEE) is currently the Chair Professor of computational intelligence and concurrently the Associate Vice-President (Strategic Research) of Lingnan University, Hong Kong. He is a Distinguished Scholar of evolutionary computation, artificial intelligence (AI) solutions, and image/video processing, with a strong record of scientific innovations and real-world impacts. He is the Chair Professor of computer science with Lingnan University. He has a prolific publication record with more than 400 journal articles and 160 conference papers, with an h-index of 80 based on Google Scholar. He was listed as one of the top 2% of the world's most cited scientists according to the Stanford University report, and as one of the top 1% of the world's most cited scientists by Clarivate in 2022. He has also been actively engaged in knowledge transfer between academia and industry. He was elevated to IEEE Fellow in 2014 for his contributions to optimization techniques in cybernetics and video coding. He is a Fellow of the US National Academy of Inventors and the Hong Kong Academy of Engineering and Science. He was the President of the IEEE Systems, Man, and Cybernetics Society (SMCS) from 2021 to 2023. He is an Associate Editor for a number of leading IEEE Transactions journals.