Human Activity Recognition
https://doi.org/10.1007/s40860-021-00147-0
REVIEW
Received: 15 April 2021 / Accepted: 24 June 2021 / Published online: 3 July 2021
© The Author(s) 2021
Abstract
Recognizing human activities and monitoring population behavior are fundamental needs of our society. Population security, crowd surveillance, healthcare support and living assistance, and lifestyle and behavior tracking are some of the main applications that require the recognition of human activities. Over the past few decades, researchers have investigated techniques that can automatically recognize human activities. This line of research is commonly known as Human Activity Recognition (HAR). HAR involves many tasks, from signal acquisition to activity classification. The tasks involved are not simple and often require dedicated hardware, sophisticated engineering, and computational and statistical techniques for data preprocessing and analysis. Over the years, different techniques have been tested and different solutions have been proposed to achieve a classification process that provides reliable results. This survey presents the most recent solutions proposed for each task in the human activity classification process, that is, acquisition, preprocessing, data segmentation, feature extraction, and classification. Solutions are analyzed by emphasizing their strengths and weaknesses. For completeness, the survey also presents the metrics commonly used to evaluate the goodness of a classifier and the datasets of inertial signals from smartphones that are most used in the evaluation phase.
Keywords ADL · Human activity recognition · Machine learning · Deep learning · Smartphone
[Figure: overview of the Activity Recognition Process — Data Acquisition, Preprocessing, Data Segmentation, and Feature Extraction transform the raw sensor data into the input data for classification.]
focus is on techniques and methods that have been experimented with and proposed for smartphones. Therefore, this review does not include other types of devices used in HAR. The choice to consider only smartphones is due to the increasing attention paid to these devices by the scientific community as a result of their valuable equipment and their wide diffusion.

The paper also provides an overview of the most used datasets for evaluating HAR techniques. Since this review is focused on smartphones, the datasets included are those of inertial signals collected using smartphones.

The analysis of the state of the art encompasses scientific articles selected based on the following criteria and keywords:

Sect. 4 describes the preprocessing activity that is normally performed on the raw data as acquired by the sensors. Sections 5 and 6 describe the commonly used segmentation strategies and features, respectively. Section 7 introduces the most recent classification methods, their strengths, and their weaknesses; moreover, it discusses personalization and why it is important for improving the overall classification performance. Given the importance of datasets in the evaluation process of techniques and methods, Sect. 8 discusses the characteristics of a set of publicly available datasets often used in the evaluation of classifiers. Section 9 summarizes the lessons learned and provides some guidance on where the research should focus. Finally, Sect. 10 sketches the conclusions.
The potentially recognizable activities vary in complexity: walking, jogging, sitting, and standing are examples of the simplest ones; preparing a meal, shopping, taking a bus, and driving a car are examples of the most complex ones. Depending on the complexity, different techniques and types of signals are employed. We are interested in activities that belong to the category of the simplest ones.

When the wearable device is a smartphone, the most commonly used sensors are the accelerometer, gyroscope, and magnetometer. Therefore, the first step of the Activity Recognition Process (ARP) introduced in Sect. 1 (Data Acquisition) requires being able to interface with the sensors and to acquire the signals at the required frequencies. This step is detailed in Sect. 3.

As the signals are acquired, they undergo an elaboration process whose purpose is to remove the noise caused by the user and the sensors. Generally, high-pass filters, low-pass filters, and average smoothing methods are applied. This corresponds to the second step (Preprocessing) of the ARP, which is detailed in Sect. 4.

The continuous pre-processed data stream is then split into segments whose dimensions and overlaps may vary according to several factors, such as the classification technique, the type of activity to be detected, and the type of signals to be processed. This corresponds to the third step (Segmentation) of the ARP, which is detailed in Sect. 5.

The segments of pre-processed signals are then elaborated to extract significant features. This step (Feature Extraction in the ARP) is crucial for the performance of the final recognition. Two main types of features are commonly used: hand-crafted features (which are divided into time-domain and frequency-domain features) and learned features, which are automatically discovered. Feature extraction is detailed in Sect. 6.

The last step of the ARP is Classification. For many years, this step was accomplished through traditional machine learning techniques. More recently, following promising results in the field of video signal processing, deep learning techniques have also been used. In addition, due to the problem known as population diversity [6] (the natural heterogeneity among users in terms of data), researchers have applied recognition techniques based on personalization to obtain better results. Classification is detailed in Sect. 7.

3 Data acquisition

Historically, human activity recognition techniques exploited both environmental devices and ad hoc devices worn by subjects [7]. Commonly used environmental devices include cameras [8–11] and other sensors such as RFID [12], acoustic sensors [13], and WiFi [14]. The ad hoc devices were worn by people on different parts of their bodies and typically included inertial sensors [7].

Over the past decade, considerable progress in hardware and software technologies has modified the habits of the entire population and of businesses. On the one hand, micro-electro-mechanical systems (MEMS) have reduced the size, cost, and power needs of sensors, while their capacity, precision, and accuracy have increased. On the other hand, the Internet of Things (IoT) has enabled the spread of easy and fast connections between devices, objects, and environments. The pervasiveness and reliability of these new technologies enable the acquisition and storage of a large amount of multimodal data [15].

Thanks to these technological advances, smartphones, smartwatches, home assistants, and drones are used daily and represent essential instruments for many businesses, such as remote healthcare, merchandise delivery, and agriculture [16]. These new technologies, together with the large availability of data, gained the attention of the research communities, including HAR.

The goal of this section is to present the wearable devices most used for data acquisition in HAR, which are a consequence of the technological advances discussed above. Wearable devices encompass all accessories attached to the person's body or clothing that incorporate computer technologies, such as smart clothing and ear-worn devices [17]. They enable the capture of attributes of interest such as motion, location, temperature, and ECG, among others.

Nowadays, smartphones and smartwatches are the wearable devices most used among the population. In particular, the smartphone is one of the most used devices in people's daily lives, and it has been stated that it is the first thing people reach for after waking up in the morning [18,19].

The smartphone's pervasiveness over the last years is mostly due to the fact that it provides the opportunity to connect with people, to play games, to read emails, and, in general, to access almost all the online services a user needs. In particular, their high diffusion is a crucial aspect: the more users, the more data are available; the more data are available, the more information and the greater the possibility of creating robust models.

At the same time, smartphones are preferable over other wearables because a large number of sensors and software applications are already installed and permit the acquisition of many kinds of data, potentially all day long.

The choice of the sensors plays an important role in activity recognition performance [20]. Accelerometers, gyroscopes, and magnetometers are the sensors most used for HAR tasks and classification.

– Accelerometer. The accelerometer is an electromechanical sensor that captures the rate of change of the velocity
of an object over a time lapse, that is, the acceleration. It is composed of many other components, including microscopic crystal structures that become stressed by accelerative forces. The accelerometer interprets the voltage coming from the crystals to determine how fast the device is moving and in which direction it is pointing. A smartphone records three-dimensional acceleration along the device's reference axes; thus, a trivariate time series is produced. The measuring unit is meters per second squared (m/s²) or g-forces.
– Gyroscope. The gyroscope measures tri-axial angular velocity. Its unit of measure is degrees per second (degrees/s).
– Magnetometer. A magnetometer measures the change of a magnetic field at a particular location. The measurement unit is the Tesla (T), and it is usually recorded on the three axes.

In addition to accelerometers, gyroscopes, and magnetometers, other less common sensors are used in HAR. For example, Garcia-Ceja and Brena use a barometer to classify vertical activities, such as ascending and descending stairs [21]. Cheng et al. [22] and Foubert et al. [23] use pressure sensor arrays to detect activities and lying/sitting transitions, respectively. Other researchers use biometric sensors. For example, Zia et al. use electromyography (EMG) for fine-grained motion detection [24], and Liu et al. use electrocardiography (ECG) in conjunction with an accelerometer to recognize activities [25].

The accelerometer is the most popular sensor in HAR because it measures the directional movement of a subject's motion status over time [26–31]. Nevertheless, it struggles to resolve lateral orientation or tilt and to find out the location of the user, which are precious pieces of information for activity recognition.

For these reasons, some sensor combinations have been proposed as valid solutions in HAR. In most cases, accelerometer and gyroscope are used jointly, both to acquire more information about the device movements and to possibly infer the device position [32–36]. Moreover, Shoaib et al. demonstrated that gyroscope-based classification achieves better results than the accelerometer for specific activities, such as walking downstairs and upstairs [35]. Furthermore, as aforementioned, gyroscope data permit inferring the device position, which drastically impacts recognition performance [37,38].

Other studies combined accelerometer and magnetometer [39], accelerometer and gyroscope with magnetometer [40,41], accelerometer with microphone and GPS [6], and other combinations [42].

An important factor to consider in the acquisition step is the sampling rate, which influences the number of samples available for the classification step. The sampling rate is defined as the number of data points recorded in a second and is expressed in Hertz. For instance, a sampling rate of 50 Hz means that 50 values per second are recorded. This parameter is normally set during the acquisition phase.

In the literature, different sampling rates have been considered. For instance, in [43] the sampling rate is set at 50 Hz, in [44] at 45 Hz, and from 30 to 32 Hz in [32]. Although the choice is not unanimous in the literature, 50 Hz is a suitable sampling rate that properly permits modeling human activities [45].

4 Preprocessing

In a classification pipeline, data preprocessing is a fundamental step to prepare raw data for the further steps. Raw data coming from sensors often present artifacts due to the instruments, such as electronic fluctuations or sensor calibration, or to the physical activity itself. Data have to be cleaned to exclude these artifacts from the signals.

Moreover, the accelerometer signal combines the linear acceleration due to body motion with the acceleration due to gravity. The gravity component is a bias that can influence the accuracy of the classifier; thus, it is common practice to remove it from the raw signal, for example with a filter such as the one sketched below.
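As an illustration, gravity is often separated from body motion with a low-pass filter, since gravity occupies the very low frequencies. The following is a minimal sketch using SciPy; the 0.3 Hz cutoff and the 50 Hz sampling rate are common choices in the literature, not values prescribed by a specific cited work.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def remove_gravity(acc, fs=50.0, cutoff=0.3, order=3):
    """Split a raw acceleration signal into body and gravity components.

    acc: array of shape (n_samples,) for one axis, sampled at fs Hz.
    A low-pass Butterworth filter estimates the slowly varying gravity
    component; subtracting it leaves the linear (body) acceleration.
    """
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    gravity = filtfilt(b, a, acc)   # zero-phase filtering avoids time shifts
    body = acc - gravity
    return body, gravity

# Usage on a synthetic signal: a 1 g offset plus a 2 Hz "walking" oscillation.
t = np.arange(0, 10, 1 / 50.0)
raw = 9.81 + 2.0 * np.sin(2 * np.pi * 2.0 * t)
body, gravity = remove_gravity(raw)
```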
Three main types of windowing are mainly used in HAR: activity-defined windows, event-defined windows, and sliding windows [53].

In activity-defined windowing, the initial and end points of each window are selected by detecting patterns of the activity changes.

In event-defined windowing, the window is created around a detected event. In some studies, it is also referred to as windows around peak [43].

In sliding windowing, data are split into windows of fixed size, without gaps between two consecutive windows, and, in some cases, overlapped, as shown in Fig. 4. Sliding windowing is the most widely employed segmentation technique in activity recognition, especially for periodic and static activities [54].

Another parameter to consider is the percentage of overlap among consecutive windows. Sliding windows are often overlapped, which means that a percentage of a window is repeated in the subsequent window. This leads to two main advantages: it avoids noise due to the truncation of data during the windowing process, and it increases the performance by increasing the number of data points. Generally, the higher the number of data points, the higher the classification performance. For these reasons, overlapped sliding windows are the most common choice in the literature; a minimal implementation is sketched below.

Figure 5 shows the distribution of the percentage overlap in the state-of-the-art. In more than 50% of the proposals we selected, an overlap of 50% has been chosen. Some approaches avoid any overlap [29,32,44,57], claiming faster
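To make the segmentation step concrete, here is a minimal sketch of fixed-size sliding windows with a configurable overlap; the window length and the 50% overlap in the example are illustrative choices matching the most common setting reported above.

```python
import numpy as np

def sliding_windows(signal, window_size, overlap=0.5):
    """Split a (n_samples, n_channels) signal into fixed-size windows.

    overlap is the fraction of each window repeated in the next one
    (0.0 = no overlap, 0.5 = the common 50% choice). Returns an array
    of shape (n_windows, window_size, n_channels); a trailing segment
    shorter than window_size is discarded.
    """
    step = max(1, int(window_size * (1.0 - overlap)))
    windows = [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, step)
    ]
    return np.stack(windows)

# Example: 10 s of tri-axial data at 50 Hz, 2.56 s windows (128 samples).
data = np.random.randn(500, 3)
segments = sliding_windows(data, window_size=128, overlap=0.5)
print(segments.shape)  # (6, 128, 3): a new window starts every 64 samples
```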
Table 1 Hand-crafted features in the time domain and frequency domain. FFT(f) denotes the Fourier transform of the signal f. [Table rows (feature name, formula, description) not recovered from the source.]
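As an illustration of typical entries in such a table, a minimal sketch computing a few widely used time-domain and frequency-domain features on a single window follows. The exact feature set of Table 1 is not recoverable here, so these are common examples from the HAR literature, not the table's contents.

```python
import numpy as np

def handcrafted_features(window, fs=50.0):
    """Compute common hand-crafted features for one 1-D signal window."""
    window = np.asarray(window, dtype=float)
    feats = {
        # time-domain features
        "mean": np.mean(window),
        "std": np.std(window),
        "min": np.min(window),
        "max": np.max(window),
        "rms": np.sqrt(np.mean(window ** 2)),
    }
    # frequency-domain features, from the magnitude spectrum FFT(f)
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    feats["spectral_energy"] = np.sum(spectrum ** 2) / len(window)
    feats["dominant_freq"] = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    return feats
```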
6.2 Learned features

The goal of feature learning is to automatically discover meaningful representations of the raw data to be analyzed [65]. According to [66], the main feature learning methods for sensor data are the following:

– Codebooks [67,68] consider each sensor data window as a sequence, from which subsequences are extracted and grouped into clusters. Each cluster center is a codeword. Then, each sequence is encoded with a bag-of-words approach, using the codewords as features.
– Principal Component Analysis (PCA) [69] is a multivariate technique, commonly used for dimensionality reduction. The main goal of PCA is the extraction of a set of orthogonal features, called principal components, which are linear combinations of the original data such that the variance extracted from the data is maximal. It is also used for feature selection; a minimal sketch appears below.
– Deep Learning uses Neural Network engines to learn patterns from data. Neural Networks are composed of a set of layers. In each layer, the input data are transformed through combinations of filters and topological maps. The output of each layer becomes the input of the following layer, and so on. At the end of this procedure, the result is a set of features that are more or less abstract depending on the number of layers: the higher the number of layers, the more abstract the features. These features can be used for classification. Different deep learning methods for feature extraction have been used for time series analysis [70].

Feature learning techniques avoid the issue of manually defining and selecting the features. Recently, promising results are leading the research community to exploit learned features in their analyses.

7 Classification

Over the last years, hardware and software development has increased the capability of wearable devices to face complex applications and tasks. For instance, smartphones are nowadays able to acquire, store, share, and elaborate huge amounts of data in a very short time. As a consequence of this technological development, new instruments related to data availability, data processing, and data analysis have been born.

The capability of a simple smartphone to accomplish some complex tasks (e.g., step counting and lifestyle monitoring) is the result of very recent scientific changes regarding methods and techniques.

In general, more traditional data analysis methods, based on model-driven paradigms, have been largely substituted by more flexible techniques, developed in recent years, based on data-driven paradigms. The main difference between these two approaches lies in the a priori assumptions about the relationship between the independent and response variables. Thus, given a classification model y = f(x), model-driven approaches state that f is (or can be) determined by assumptions on the distribution of the underlying stochastic process that generates x; f is built through a set of rules, or algorithms, whose choices depend on data with an unknown distribution. On the contrary, in data-driven paradigms, f is unknown and depends directly on the data and on the choice of the algorithm.

The strength and success of data-driven approaches are due to their capability to manage and analyze the large number of variables that characterize a phenomenon without assuming any a priori relation between the independent and response variables. From a certain point of view, this flexibility can be a weakness, because the lack of a well-known relation can also be interpreted as a lack of cause–effect knowledge.

In model-driven approaches, in contrast, the cause–effect relation is known by definition. However, model-driven approaches lose in performance when estimating high-dimensionality relations.

In the activity recognition context, model-driven approaches are less powerful, and data-driven approaches are preferred [71].

Among data-driven algorithms, Artificial Intelligence (AI) techniques have produced very promising results over the last years and have been largely used for data analysis, information extraction, and classification tasks. AI encompasses machine learning which, in turn, encompasses deep learning methods.

Machine learning uses statistical exploration techniques to enable the machine to learn and improve with experience without being explicitly programmed. Deep learning emulates the human neural system to analyze and extract features from data. In this survey, we focus on machine learning and deep learning algorithms.

The choice of the classification algorithm drastically influences the classification performance; however, up to now, there is no evidence of a best classifier, and its choice still remains a challenging task for the HAR community.

In particular, machine learning and deep learning methods struggle to achieve good performance for new unseen users. This loss of performance is mostly caused by subject variability, also called population diversity [6], which is related to the natural heterogeneity of users in terms of data. The following sections present both traditional state-of-the-art machine learning and deep learning techniques, and personalized machine learning and deep learning techniques as solutions to overcome the population diversity problem.
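Before moving on, here is a minimal sketch of the PCA-based feature learning described in Sect. 6.2; the window dimensions and the number of components are illustrative assumptions, not values from a cited work.

```python
import numpy as np
from sklearn.decomposition import PCA

# Feature learning with PCA: each row is a flattened sensor window
# (e.g., 128 samples x 3 axes = 384 raw values per window).
rng = np.random.default_rng(0)
windows = rng.standard_normal((1000, 384))  # synthetic stand-in for real windows

pca = PCA(n_components=20)         # keep 20 orthogonal principal components
learned = pca.fit_transform(windows)

print(learned.shape)                         # (1000, 20): learned feature vectors
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```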
Table 3 Distance metrics in k-nearest neighbor

Distance     Formula
Euclidean    $\sqrt{\sum_{i=1}^{n} (x_i - x_j)^2}$
City Block   $\sum_{i=1}^{n} |x_i - x_j|$
Chebychev    $\max_{i=1,\ldots,n} |x_i - x_j|$
Cosine       $1 - \frac{x_i x_j^T}{\sqrt{(x_i x_i^T)(x_j x_j^T)}}$

The construction of a tree involves determining the split criterion, the stopping criterion, and the class assignment rule [82]. J48 and C4.5 are the most used decision trees in HAR [29,30,77,81].

Random Forest (RF) is a classifier consisting of a collection of tree-structured classifiers $\{h(x, \Theta_k), k = 1, \ldots\}$, where the $\{\Theta_k\}$ are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input x.
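As a worked companion to Table 3, here is a minimal sketch of the four distance metrics between two feature vectors (pure NumPy, without any k-NN machinery):

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def city_block(a, b):
    return np.sum(np.abs(a - b))

def chebychev(a, b):
    return np.max(np.abs(a - b))

def cosine(a, b):
    # 1 - cosine similarity, as in Table 3
    return 1.0 - np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 0.0, 3.0])
for dist in (euclidean, city_block, chebychev, cosine):
    print(dist.__name__, dist(a, b))
```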
The decision rule is the maximum a posteriori (MAP), given by

$\hat{y} = \arg\max_{y} P(y \mid x_1, \ldots, x_n) = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y).$

Naive Bayes has been applied in activity recognition because of the simple assumption on the likelihood, even though this assumption is usually violated in practice [29,77,81,86].

Adaboost belongs to the classifier ensembles. Classifier ensembles encompass all algorithms that combine different classifiers together. The combination of classifiers is meant in two ways: either using the same classifier with different parameter settings (e.g., random forests with different lengths), or using different classifiers together (e.g., random forest, support vector machines, and k-NN).

Ensemble classifiers encompass bagging, stacking, blending, and boosting. In bagging, n samplings are generated from the training set and a model is created on each; the final output is a combination of each model's predictions (normally, either the average or a quantile is used). In stacking, the whole training dataset is given to the multiple classifiers, which are trained using k-fold cross-validation; after training, they are combined for the final prediction. In blending, the same procedure as stacking is performed, but instead of cross-validation, the dataset is divided into training and validation sets. Finally, in boosting, the final classifier is composed of a weighted combination of models; the weights are initially equal for each model and are iteratively updated based on the models' performance, as in Adaboost [6,7,87].

7.1.2 Traditional deep learning

Generally, the relation between input data and labels is very complex and mostly non-linear. Among Artificial Intelligence algorithms, Artificial Neural Networks (ANN) are a set of supervised machine learning techniques that emulate the human neural system with the aim of extracting non-linear relations from data for classification.

The human neural system is composed of neurons (about 86 billion), which are connected by synapses (around $10^{14}$). Neurons receive input signals from the outside (e.g., visual or olfactory) and, based on the synaptic strength, they fire and produce output signals to be transmitted to other neurons. Artificial Neural Networks are based on the same neuron and synapse concept.

In a traditional ANN, each input data value is associated with a neuron, and its synaptic strength is measured by a functional combination of the input data x and randomly chosen weights w. This value is passed to an activation function σ, which is responsible for determining the synapse strength and eventually firing the neuron. The output of the activation function is given by $y = \sigma(w^T x)$. If it fires, the output becomes the next neuron's input. Table 4 provides more details about activation functions.

A set of neurons is called a layer. A set of layers and synapses is called a network. The input data x are passed from the first layer to the last layer (called, respectively, the input layer and the output layer) through intermediary layers, called hidden layers. The term Deep Learning comes from the network's depth, that is, when the number of hidden layers grows.

Neurons belonging to the same layer do not communicate with each other, while neurons belonging to different layers are connected and share the information passed through the activation function. If each neuron of the previous layer is connected to all neurons of the next layer, the former is called a fully connected or dense layer. The output layer (also called the classification layer in the case of a classification task, or the regression layer in the case of continuous estimation) is responsible for estimating the predicted value ŷ of the labels y. Once the last output is computed, the feed-forward procedure is completed.

Thereafter, an iterative procedure is computed to minimize the loss function. This procedure is called back propagation and is responsible for minimizing the loss function with respect to the weights $w_i$. The weight values, indeed, represent how strong the relation between neurons belonging to different layers is, and how far the input information has to be transferred through the network. The minimization procedure is based on gradient descent algorithms, which iteratively search for weights that reduce the value of the gradient of the loss until it meets the global minimum or a stopping criterion. In general, a greedy-wise tuning procedure over the hyper-parameters is performed with the aim of achieving the best network configuration. The most important hyper-parameters are: the number of layers, the number and size of the kernels, the pooling size, and the regularization parameters, such as the learning rate.

According to Fig. 6, the most used deep learning algorithms are described in the following.

Multi-layer Perceptron (MLP) is the most widely used Artificial Neural Network (ANN). It is a collection of neurons organized in a layered structure, connected in an acyclic graph. Each neuron belonging to a layer produces an output which becomes the input of the neurons of the next adjacent layer. The most common layer type is the fully connected layer, where each neuron shares its output with each neuron of the adjacent layer, while neurons of the same layer are not connected. An MLP is made up of the input layer, one or more hidden layers, and the output layer [88]. Used in HAR as a baseline for deep learning techniques, it has often been compared with machine learning techniques such as SVM [48,89], RF [48], k-NN [89], and DT [89], and with deep learning techniques such as LSTM [90] and CNN [89,90]. A minimal sketch of the feed-forward pass is given below.
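To make the feed-forward procedure concrete, here is a minimal NumPy sketch of a fully connected network computing $\hat{y} = \sigma(Wx)$ layer by layer; the layer sizes and the use of sigmoid/softmax are illustrative choices, not a configuration from a cited work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / np.sum(e)

def mlp_forward(x, layers):
    """Feed-forward pass: each layer is a (W, b) pair; hidden layers use
    the sigmoid, the output layer uses softmax to give class probabilities."""
    a = x
    for W, b in layers[:-1]:
        a = sigmoid(W @ a + b)   # y = sigma(W x + b), passed to the next layer
    W_out, b_out = layers[-1]
    return softmax(W_out @ a + b_out)

rng = np.random.default_rng(0)
# 384 inputs (a flattened window) -> 64 hidden units -> 6 activity classes
layers = [(rng.standard_normal((64, 384)) * 0.1, np.zeros(64)),
          (rng.standard_normal((6, 64)) * 0.1, np.zeros(6))]
probs = mlp_forward(rng.standard_normal(384), layers)
print(probs.shape, probs.sum())  # (6,) and 1.0
```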
Convolutional Neural Networks (ConvNet or CNN) are a class of ANN based on convolution products between kernels and small patches of the input data of the layer. The input data are organized in channels if needed (e.g., in tri-axial accelerometer data, each axis is represented by one channel), and normally the convolution is performed independently on each channel. The convolutional function is computed by sliding a convolutional kernel of size m × m over the input of the layer. That is, the calculation of the l-th convolutional layer is given by

$x_i^{l,j} = f\left(\sum_{a=1}^{m} w_a^{j} \cdot x_{i+a-1}^{l-1,j} + b_j\right),$

where m is the kernel size, $x_i^{l,j}$ is the j-th kernel on the i-th unit of the convolutional layer l, $w_a^{j}$ is the convolutional kernel matrix, and $b_j$ is the bias of the convolutional kernel. This value is mapped through the activation function σ. Thereafter, a pooling layer is responsible for computing the maximum or average value on a patch of size r × r of the resulting activation output. Mathematically, a local output after the max pooling or the average pooling process is given by

$\text{max pooling: } x_i^{l,j} = \max_{a,b=1}^{r} (x_{a,b})$

$\text{average pooling: } x_i^{l,j} = \frac{1}{r^2} \sum_{a,b=1}^{r} (x_{a,b}).$

The pooling layer is responsible for extracting important features and reducing the data dimension. This convolution-activation-pooling block can be repeated many times if necessary; the number of repetitions determines the depth of the network. Generally, between the last block and the output layer, one (or more) fully connected layers are added to perform a fusion of the information extracted from all sensor channels [88]. After the feed-forward procedure has ended, back propagation is performed on the convolutional weights until convergence to the global minimum or until a stopping criterion is met. Figure 7 depicts a CNN example in HAR, with six channels, corresponding to xyz-acceleration and xyz-angular velocity data, two convolution-activation-max pooling layers, one fully connected layer, and a soft-max layer which computes the class probability given the input data.

CNN is a robust model under many aspects: in terms of local dependency due to the signal correlation, in terms of scale invariance for different paces or frequencies, and in terms of sensor position [31,71]. For these reasons, CNNs have been largely studied in HAR [91].

Additionally, CNNs have been compared to other techniques. CNN outperforms SVM in [78] and a baseline Random Forest in [27]. Ronao et al. demonstrate that CNN outperforms state-of-the-art techniques, which all use hand-crafted features [92]. More recently, an ensemble classification algorithm with CNN-2 and CNN-7 showed better performance when compared with machine learning (random forest and boosting) and a traditional CNN-7 [40].

Residual Neural Networks (ResNet) are particular convolutional neural networks composed of blocks and skip connections, which permit increasing the number of layers in the network. The success of deep neural networks has been credited to the additional layers, but He et al. empirically showed that there exists a maximum threshold for the network's depth beyond which vanishing/exploding gradient issues arise [93].

In Residual Neural Networks, the output $x_{t-1}$ is both passed as an input to the next convolution-activation-pooling block and directly added to the output of the block $f(x_{t-1})$; the latter addition is called a shortcut connection. The resulting output is $x_t = f(x_{t-1}) + x_{t-1}$. This procedure is repeated many times and permits deepening the network without adding either extra parameters or computational complexity. Figure 8 shows an example of ResNet. Bianco et al. state that ResNet represents the most performing network in the state of the art [94], while Ferrari et al. demonstrated that ResNet outperforms traditional machine learning techniques [59,95].

Long Short-Term Memory Networks (LSTM) are a variant of the Recurrent Neural Network which enables storing information over time in an internal memory, overcoming the gradient vanishing issue. Given a sequence of inputs $x = \{x_1, x_2, \ldots, x_n\}$, the LSTM's external inputs are its previous cell state $c_{t-1}$, the previous hidden state $h_{t-1}$, and the current input vector $x_t$. The LSTM associates each time step with an input gate, a forget gate, and an output gate, denoted, respectively, as $i_t$, $f_t$, and $o_t$, which are all computed by applying an activation function to a linear combination of weights, input $x_t$, and hidden state $h_{t-1}$. An intermediate (candidate) state $\tilde{c}_t$ is also computed through the tanh of a linear combination of weights, input $x_t$, and hidden state $h_{t-1}$. Finally, the cell and hidden states are updated as

$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t$

$h_t = o_t \cdot \tanh(c_t).$

The forget gate decides how much of the previous information is going to be forgotten. The input gate decides how to update the state vector using the information from the current input. Finally, the output gate decides what information to output at the current time step [30]. Figure 9 represents the network schema; a minimal sketch of one cell update follows.
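To make the cell update concrete, here is a minimal NumPy sketch of a single LSTM step implementing the gate equations above; the stacked parameter layout and the shapes are illustrative assumptions, not a specific published configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the input (i),
    forget (f), output (o), and candidate (g) gates.
    Shapes (hypothetical): x_t (d,), h_prev/c_prev (k,), W (4k, d), U (4k, k), b (4k,)."""
    z = W @ x_t + U @ h_prev + b          # linear combination of input and hidden state
    k = h_prev.shape[0]
    i_t = sigmoid(z[0*k:1*k])             # input gate
    f_t = sigmoid(z[1*k:2*k])             # forget gate
    o_t = sigmoid(z[2*k:3*k])             # output gate
    c_tilde = np.tanh(z[3*k:4*k])         # intermediate (candidate) state
    c_t = f_t * c_prev + i_t * c_tilde    # forget part of the old state, add the new
    h_t = o_t * np.tanh(c_t)              # hidden state update
    return h_t, c_t

# Example: run 10 time steps of a 6-channel input through one cell.
k, d = 8, 6
rng = np.random.default_rng(0)
W, U, b = rng.standard_normal((4*k, d)), rng.standard_normal((4*k, k)), np.zeros(4*k)
h, c = np.zeros(k), np.zeros(k)
for x_t in rng.standard_normal((10, d)):
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```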
Although LSTM is a very powerful technique when temporal dependencies in the data have to be considered during classification, it takes into account only past information. Bidirectional LSTM (BLSTM) offers the possibility to consider both past and future information. Hammerla et al. illustrate how their results based on LSTM and BLSTM, verified on a large benchmark dataset, are the state of the art [96].

7.1.3 Traditional machine learning vs traditional deep learning

Machine learning techniques have been demonstrated to perform well even with low amounts of labeled data, and they are low time-consumption methods. Nevertheless, machine learning techniques remain highly expertise-dependent algorithms. The input data feeding machine learning algorithms are normally features, a processed version of the data. Features permit reducing data dimensionality and computational time. However, features are hand-crafted and are expert-knowledge and task dependent.

Furthermore, engineered features cannot represent the salient features of complex activities, and they involve time-consuming feature selection techniques to select the best features [97,98]. Additionally, approaches using hand-crafted features make it very difficult to compare different algorithms due to different experimental grounds, and they encounter difficulty in discriminating very similar activities [40].

In recent years, deep learning techniques are becoming more and more attractive in human activity recognition. First applied to 3D and 2D contexts, in particular in the vision computing domain [99,100], deep learning methods have been shown to be valid methods also adapted to the 1D case, that is, for time series classification [101], such as HAR.

Deep learning techniques have shown many advantages over machine learning, among them the capability to automatically extract features. In particular, depending on the depth of the algorithm, it is possible to achieve a very high abstraction level for the features, unlike machine learning techniques [71]. In these terms, deep learning techniques are considered valid algorithms to overcome the machine learning dependency on the feature extraction procedure, and they show crucial advantages in algorithm performance.
Fig. 10 A graphical representation of the three main classification models
The following sections discuss state-of-the-art results related to the population diversity issue, based on the personalization of machine learning and deep learning algorithms.

7.2.1 Personalized machine learning

To achieve generalizable activity recognition models based on machine learning algorithms, three approaches are mainly adopted in the literature:

– Data-based approaches encompass three data split configurations: subject-independent, subject-dependent, and hybrid (a minimal sketch of these splits is given below). The subject-independent (also called impersonal) model does not use the end user's data for the development of the activity recognition model. It is based on the definition of a single activity recognition model that must be flexible enough to generalize over the diversity between users, and it should be able to perform well once a new user is to be classified.
The subject-dependent (also called personal) model only uses the end user's data for the development of the activity recognition model. The specific model, being built with the data of the final user, is able to capture her/his peculiarities, and thus it should generalize well in the real context. The flaw is that it must be implemented for each end user [108].
The hybrid model uses both the end user's data and the data of the other users for the development of the activity recognition model. In other words, the classification model is trained both on the data of other users and partially on data from the final user. The idea is that the classifier should more easily recognize the activities performed by the final user.
Figure 10 shows a graphical depiction of the three models to better clarify their differences. Tapia et al. [109] introduced the subject-independent and subject-dependent models, and later Weiss et al. [29] the hybrid model. The models were compared by different researchers and also extended to achieve better performance.
Medrano et al. [110] demonstrated that the subject-dependent approach achieves higher performance than the subject-independent approach for fall detection, called respectively personal and generic fall detector.
Shen et al. [111] achieved similar results for activity recognition and came to the conclusion that the subject-dependent (termed personalized) model tends to perform better than the subject-independent (termed generalized) one, because user training data carry her/his personalized activity information.
Lara et al. [112] consider the subject-independent approach more challenging because, in practice, a real-time activity recognition system should be able to fit any individual, and they consider it inconvenient in many cases to train the activity model for each subject.
Weiss et al. [29] and Lockhart et al. [61] compared the subject-independent and the subject-dependent (termed impersonal and personal, respectively) with the hybrid model. They concluded that the models built on the subject-dependent and the hybrid approaches achieve the same performance and outperform the model based on the subject-independent approach.
Similar conclusions were reached by Lane et al. [6], who compare subject-dependent and subject-independent (respectively named isolated and single) models with another model called multi-naive. In this case, the subject-dependent approach outperformed the other two approaches as the amount of available data increases.
Chen et al. [75] compared the subject-independent, subject-dependent, and hybrid (respectively called rest-to-one, one-to-one, and all-to-one) models, and once again the subject-dependent model outperforms the subject-independent model, whereas the hybrid model achieves the best performance. The authors also classify subject-independent and hybrid models as generalized models, while the subject-dependent model falls into the category of the personalized models.
The same results have been achieved by Vaizman et al. [113], who compared the subject-independent, subject-dependent, and hybrid (respectively called universal, individual, and adapted) models. Furthermore, they introduced context-based information by exploiting many sensors, such as location, audio, and phone-state sensors.
– Similarity-based approaches consider the similarity between users as a crucial factor for obtaining a classification model able to adapt to new situations.
Sztyler et al. [114,115] proposed a personalized variant of the hybrid model. The classification model is trained using the data of those users that are similar to the final user based on signal pattern similarity. They found that people with the same fitness level also have similar acceleration patterns for the running activity, whereas gender and physique could characterize the walking activity. The heterogeneity of the data is not eliminated, but it is managed in the classification procedure.
A similar approach is presented by Lane et al. [6]. The proposed approach consists in exploiting the similarity between users to weight the collected data. The similarities are calculated based on signal pattern data, on physical data (e.g., age and height), or on a lifestyle index. The value of the similarity is used as a weight: the higher the weight, the more similar two users are and the more the signals from those users are used for classification.
Garcia-Ceja et al. [116,117] exploited inter-class similarity instead of the similarity between subjects (called inter-user similarity) presented by Lane et al. [6]. The final model is trained using only the instances that are similar to the target user for each class.
– Classifier-based approaches obtain generalization from several combinations of activity recognition models.
Hong et al. [105] proposed a solution where the generalization is obtained by a combination of activity recognition models (trained with a subject-dependent approach). This combination permits achieving better activity recognition performance for the final user.
Reiss et al. [118] proposed a model that consists of a set of weighted classifiers (experts). Initially, all the weights have the same values. The classifiers are adapted to a new user by considering a new set of suitable weights that better fit the labeled data of the new user.

Ferrari et al. have recently proposed a similarity-based approach that does not fall into the above classification [70]. The proposed approach is a combination of data-based and similarity-based approaches. The authors trained the algorithms by exploiting the similarity between subjects and different data splits. They stated that hybrid models combined with similarity achieve the best performance with respect to the state-of-the-art algorithms.

7.2.2 Personalized deep learning

Personalized deep learning techniques have been explored in the literature and mainly refer to two main approaches:

– Incremental learning refers to recognition methods that can learn from streaming data and adapt to the new moving style of a new unseen person without retraining [119]. Yu et al. [120] exploited the hybrid model and compared it to a new model called the incremental hybrid model. The latter is trained first with the subject-independent approach, and then it is incrementally updated based on personal data from a specific user. The difference from the hybrid model is that the incremental hybrid model gives more weight to personal data during training.
Similarly, Siirtola et al. [41] proposed an incremental learning method. The method initially uses a subject-independent model, which is updated with a two-step feature extraction method from the test subject data. Afterwards, the same authors proposed a four-step subject-dependent model [39]. The proposed method initially uses a subject-independent model, collects and labels the data from the user based on the subject-independent
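Picking up the data split configurations introduced in Sect. 7.2.1, here is a minimal sketch of how the subject-independent, subject-dependent, and hybrid training sets can be built from per-window user labels; the 30% personal fraction and the function shape are illustrative assumptions, not a protocol from the cited works.

```python
import numpy as np

def split_indices(user_ids, target_user, mode, personal_fraction=0.3, seed=0):
    """Return train/test indices for the three data split configurations.

    subject-independent (impersonal): train on all users except the target.
    subject-dependent (personal): train only on part of the target's data.
    hybrid: train on the other users plus part of the target's data.
    The remaining target data are always the test set.
    """
    user_ids = np.asarray(user_ids)
    rng = np.random.default_rng(seed)
    target = np.where(user_ids == target_user)[0]
    others = np.where(user_ids != target_user)[0]
    personal = rng.choice(target, size=int(len(target) * personal_fraction),
                          replace=False)
    test = np.setdiff1d(target, personal)
    if mode == "subject-independent":
        train = others
    elif mode == "subject-dependent":
        train = personal
    elif mode == "hybrid":
        train = np.concatenate([others, personal])
    else:
        raise ValueError(mode)
    return train, test

# Example: windows from 5 users, evaluating user 3 under the hybrid split.
user_ids = np.repeat([1, 2, 3, 4, 5], 100)
train_idx, test_idx = split_indices(user_ids, target_user=3, mode="hybrid")
```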
(false negative) with respect to the cases where it recognizes a normal behavior as a fall (false positive).

An appropriate metric for this case is the $F_\beta$-score. It is defined as a function of recall and precision.

Recall is also called sensitivity or true positive rate and is calculated as the number of correct positive predictions divided by the total number of positives; the best value corresponds to 1, the worst to 0.

Precision is also called positive predictive value and is calculated as the number of correct positive predictions divided by the total number of positive predictions; the best precision is 1, whereas the worst is 0.

The formulas are given by

$\text{precision} = \frac{TP}{TP + FP}$

$\text{recall} = \frac{TP}{TP + FN}$

$F_\beta = \frac{(1+\beta^2) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$

If β = 1, the $F_1$-score is the harmonic mean of the precision and the recall.

The specificity, also called true negative rate, is calculated as the number of correct negative predictions divided by the total number of negatives. The best value corresponds to 1, while the worst is 0. Together with the sensitivity, the specificity helps to determine the best parameter value when a tuning procedure is computed. A common practice is to calculate the area under the curve (AUC) created by plotting the values of the sensitivity against 1-specificity. This curve is called the Receiver-Operating Characteristic (ROC) curve. The value of the parameter which maximizes the classification performance corresponds to the point on the ROC curve where the AUC is maximal. A minimal sketch of these metrics is given after the dataset discussion below.

8 Datasets

In recent years, the spread of wearable devices has led to a huge availability of physical activity data. Smartphones and smartwatches have become more and more pervasive and ubiquitous in our everyday life. This high diffusion and portability of wearable devices has enabled researchers to easily produce plenty of labeled raw data for human activity recognition.

Several public datasets are open to the HAR community and are freely accessible on the web; see, for instance, the UC Irvine Machine Learning Repository [127]. Table 7 shows the main characteristics of the most used datasets in the state-of-the-art.

Most of the datasets used contain signals recorded by smartphones. Some datasets also contain signals from both smartphones and IMUs, and from both smartphones and smartwatches (datasets D03, D10, D11, and D16).

In Table 7, each dataset is assigned an ID (column ID). Columns Dataset and Reference specify the official name and the bibliographic reference of each dataset, respectively. Column # Activities specifies the number of ADLs present in the dataset. Usually, each dataset contains 6–10 ADLs; in some cases, both ADLs and falls data are considered, as in datasets D08, D09, and D11.

Column # Subjects reports the number of subjects that performed the activities. Considering a restricted number of subjects in the analysis does not just impact the quality and robustness of the classification, but also the ability to evaluate the consistency of the results across subjects [107]. In other words, the number of subjects included in the training set of the algorithm is crucial in terms of the generalization capabilities of the model to classify a new unseen instance. Nevertheless, the difference between people, also called population diversity, can lead to poor classification, as largely discussed in [6]. Unfortunately, most of the datasets are limited in terms of the number of subjects.

To overcome this issue, several HAR research groups have recently implemented strategies for merging datasets [102,134]. Other techniques, such as transfer learning and personalization, have been investigated for robustness of results [61,123,135].

Column Devices reports the typologies and number of devices that have been used to collect the data. In particular, datasets D03, D04, D05, D06, D11, and D12 collected data from several wearable devices at the same time, for the following reasons. First, the device position influences the performance of the classification. Several works investigated which position leads to the best classification [35,136]. Furthermore, it is also challenging to investigate device fusion, which has a non-negligible positive effect on the classification performance and reflects realistic situations where users employ multiple smart devices at once [30,56,63,114].

Position-aware and position-unaware scenarios have been presented in [35]. In position-aware scenarios, the recognition accuracy on different positions is evaluated individually, while in position-unaware scenarios, the classification performance of the combination of device positions is measured. It is shown that the latter approach highly improves the classification performance for some activities, such as walking, walking upstairs, and walking downstairs. Almaslukh et al. exploited a deep learning technique for classification and demonstrated its capability to produce an effective position-independent HAR.

Column Sensors lists the sensors exploited in data collection. The tri-axial acceleration sensor (A) is the most exploited inertial sensor in the literature [7]. Datasets D9, D14, and D15 even collected just acceleration data. Acceleration is very popular because it both directly captures the subject's physiological motion status and consumes low energy [137].
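As a worked companion to the evaluation metrics defined above, here is a minimal sketch computed from raw prediction counts (scikit-learn's fbeta_score offers an equivalent computation from label arrays):

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-beta from prediction counts.

    beta > 1 weights recall more (useful when missing a fall is worse
    than a false alarm); beta < 1 weights precision more.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

# Example: a fall detector with 80 true positives, 10 false alarms,
# and 20 missed falls; beta = 2 favors recall over precision.
print(precision_recall_fbeta(tp=80, fp=10, fn=20, beta=2.0))
```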
Table 7 Public HAR dataset collection: inertial signals recorded from smartphones. Columns: ID, Dataset, # Activities, # Subjects, # Devices, Sensors, Sampling rate (Hz), Metadata, Reference. [Table rows not recovered from the source.]
Acceleration has been combined with other sensors, such as gyroscope, magnetometer, GPS, and biosensors, with the aim of improving activity classification performance. In general, data captured from several sensors carry additional information about the activity and about the device settings. For instance, information derived from the gyroscope is used to maintain the reference direction in the motion system and permits determining the orientation of the smartphone [32,51].

Performance comparisons between the gyroscope, acceleration, and their combination for human activity recognition have been explored in many studies. For example, Ferrari et al. showed that the accelerometer performs better than the gyroscope, and that their combination leads to an overall improvement of about 10% [36]. Shoaib et al. stated that in situations where the accelerometer and gyroscope individually perform with low accuracy, their combination improves the overall performance, while when one of the sensors performs with higher accuracy, the performance is not impacted by the combination of the sensors [35].

Column Sampling Rate shows the frequency at which the data are acquired. The sampling rate has to be high enough to capture the most significant behavior of the data. In HAR, the most commonly used sampling rate is 50 Hz when recording inertial data (see Table 7).

Column Metadata lists characteristics regarding the subjects that performed the activities. In D07–11 and D15, physical characteristics are annotated. In D15, environmental characteristics have also been stored, such as the kind of shoes worn, floor characteristics, and places where activities have been performed. As discussed in Sect. 7, metadata are precious additional information, which helps to overcome the population diversity problem.

9 Lessons learned and future directions

In this study, we covered the main steps of the Activity Recognition Process (ARP). For each phase of the ARP pipeline, we highlighted the key aspects that are being considered and that are most challenging in the field of HAR.

Specifically, when considering the Data Acquisition phase, we noted that the number and kinds of available devices are constantly increasing, and new devices and sensors are introduced every day. To take advantage of this aspect, new sensors should be experimented with in HAR applications to determine whether or not they can be employed to recognize actions. Moreover, new combinations (data fusion) are possible, which may again increase the ability of the data to represent the performed activities.

This increase in sensor numbers and types, while ensuring the availability of more data sources, may pose a challenge in terms of heterogeneity, as not all the devices and sensors share the exact same specifications. As an example, some accelerometers may output signals including the low frequencies of the gravity acceleration, while others may exclude it internally. For this reason, the preprocessing phase is of paramount importance to reduce signal differences due to heterogeneous sources and improve the consistency between the in vitro training (usually performed with specific devices and sensors) and real-world use, where the devices and sensors may be similar, but not equal, to the ones used when the models have been trained.

Moreover, we covered the fact that the way the signal is segmented and fed to the classification model may have a significant impact on the results. In the literature, sliding windows with a 50% overlap are the most common choice.

Another aspect we highlighted during this study is the importance of the features used to train the model, as they have a significant impact on the performance of the classifiers. Specifically, hand-crafted features may better model some already known traits of the signals, but automatically extracted features are free of bias and may uncover unforeseen patterns and characteristics.

New and improved features that are able to better represent the peculiar traits of human activities are needed: ideally, they would combine the domain knowledge of the experts given in the hand-crafted features and the lack of bias provided by the automatically generated features.

Finally, regarding the Classification phase, we highlighted how model-driven approaches are being replaced by data-driven approaches, as the latter usually perform better. Among the data-driven approaches, we find that both traditional ML approaches and more modern DL techniques can be applied to HAR problems. Specifically, we learned that while DL methods outperform traditional ML most of the time and are able to automatically extract the features, they require significantly more computational power and data than traditional ML techniques, which makes the latter still a good fit for many use cases.

Regardless of the classification method, we discussed how population diversity may impact the performance of HAR applications. To alleviate this problem, we mentioned some recent trends regarding the personalization of the models. Personalizing a classification model means identifying only a portion of the population that is similar to the current subject under some perspective, and then using only this subset to train the classifier. The resulting model should better fit the end user. This, however, may exacerbate the issue of data scarcity, since only small portions of the full datasets may be used to train the model for that specific user.

To solve this issue, more large-scale data collection campaigns are needed, as well as further studies in the field of dataset combination and preprocessing pipelines to effectively combine and reduce differences among data acquired from different sources.
classification of hand motions with deep learning techniques. Sensors 18(8):2497
25. Liu J, Chen J, Jiang H, Jia W, Lin Q, Wang Z (2018) Activity recognition in wearable ECG monitoring aided by accelerometer data. In: 2018 IEEE international symposium on circuits and systems (ISCAS), IEEE, pp 1–4
26. Bao L, Intille SS (2004) Activity recognition from user-annotated acceleration data. In: International conference on pervasive computing, Springer, New York, pp 1–17
27. Lee SM, Yoon SM, Cho H (2017) Human activity recognition from accelerometer data using convolutional neural network. In: 2017 IEEE international conference on big data and smart computing (BigComp), IEEE, pp 131–134
28. Shakya SR, Zhang C, Zhou Z (2018) Comparative study of machine learning and deep learning architecture for human activity recognition using accelerometer data. Int J Mach Learn Comput 8:577
29. Weiss GM, Lockhart JW (2012) The impact of personalization on smartphone-based activity recognition. In: Proceedings of the AAAI workshop on activity context representation: techniques and languages
30. Milenkoski M, Trivodaliev K, Kalajdziski S, Jovanov M, Stojkoska BR (2018) Real time human activity recognition on smartphones using LSTM networks. In: 2018 41st international convention on information and communication technology, electronics and microelectronics (MIPRO), IEEE, pp 1126–1131
31. Almaslukh B, Artoli AM, Al-Muhtadi J (2018) A robust deep learning approach for position-independent smartphone-based human activity recognition. Sensors 18(11):3726
32. Alruban A, Alobaidi H, Clarke N, Li F (2019) Physical activity recognition by utilising smartphone sensor signals. In: 8th international conference on pattern recognition applications and methods, SciTePress, pp 342–351
33. Hernández F, Suárez LF, Villamizar J, Altuve M (2019) Human activity recognition on smartphones using a bidirectional LSTM network. In: 2019 XXII symposium on image, signal processing and artificial vision (STSIVA), IEEE, pp 1–5
34. Hassan MM, Uddin MZ, Mohamed A, Almogren A (2018) A robust human activity recognition system using smartphone sensors and deep learning. Fut Gen Comput Syst 81:307
35. Shoaib M, Bosch S, Incel OD, Scholten H, Havinga PJ (2014) Fusion of smartphone motion sensors for physical activity recognition. Sensors 14(6):10146
36. Ferrari A, Micucci D, Mobilio M, Napoletano P (2019) Human activities recognition using accelerometer and gyroscope. In: European conference on ambient intelligence, Springer, New York, pp 357–362
37. Sztyler T, Stuckenschmidt H (2016) On-body localization of wearable devices: an investigation of position-aware activity recognition. In: 2016 IEEE international conference on pervasive computing and communications (PerCom), IEEE, pp 1–9
38. Bharti P, De D, Chellappan S, Das SK (2018) HuMAn: complex activity recognition with multi-modal multi-positional body sensing. IEEE Trans Mob Comput 18(4):857
39. Siirtola P, Koskimäki H, Röning J (2019) From user-independent to personal human activity recognition models exploiting the sensors of a smartphone. arXiv:1905.12285
40. Zhu R, Xiao Z, Li Y, Yang M, Tan Y, Zhou L, Lin S, Wen H (2019) Efficient human activity recognition solving the confusing activities via deep ensemble learning. IEEE Access 7:75490
42. Li F, Shirahama K, Nisar MA, Köping L, Grzegorzek M (2018) Comparison of feature learning methods for human activity recognition using wearable sensors. Sensors 18(2):679
43. Micucci D, Mobilio M, Napoletano P (2017) UniMiB SHAR: a dataset for human activity recognition using acceleration data from smartphones. Appl Sci 7(10):1101
44. Khan AM, Lee YK, Lee SY, Kim TS (2010) Human activity recognition via an accelerometer-enabled-smartphone using kernel discriminant analysis. In: 2010 5th international conference on future information technology, IEEE, pp 1–6
45. Ravi N, Dandekar N, Mysore P, Littman ML (2005) Activity recognition from accelerometer data. In: Proceedings of the conference on innovative applications of artificial intelligence (IAAI)
46. Lester J, Choudhury T, Borriello G (2006) A practical approach to recognizing physical activities. In: International conference on pervasive computing, Springer, New York, pp 1–16
47. Gyllensten IC, Bonomi AG (2011) Identifying types of physical activity with a single accelerometer: evaluating laboratory-trained algorithms in daily life. IEEE Trans Biomed Eng 58(9):2656
48. Bayat A, Pomplun M, Tran DA (2014) A study on human activity recognition using accelerometer data from smartphones. Proc Comput Sci 34:450
49. Anguita D, Ghio A, Oneto L, Parra X, Reyes-Ortiz JL (2013) A public domain dataset for human activity recognition using smartphones. In: Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning (ESANN13)
50. Bo X, Huebner A, Poellabauer C, O'Brien MK, Mummidisetty CK, Jayaraman A (2017) Evaluation of sensing and processing parameters for human action recognition. In: 2017 IEEE international conference on healthcare informatics (ICHI), IEEE, pp 541–546
51. Su X, Tong H, Ji P (2014) Activity recognition with smartphone sensors. Tsinghua Sci Technol 19(3):235
52. Antonsson EK, Mann RW (1985) The frequency content of gait. J Biomech 18(1):39
53. Quigley B, Donnelly M, Moore G, Galway L (2018) A comparative analysis of windowing approaches in dense sensing environments. In: Multidisciplinary Digital Publishing Institute Proceedings, vol 2, p 1245
54. Banos O, Galvez JM, Damas M, Pomares H, Rojas I (2014) Window size impact in human activity recognition. Sensors 14(4):6474
55. Chen K, Zhang D, Yao L, Guo B, Yu Z, Liu Y (2020) Deep learning for sensor-based human activity recognition: overview, challenges and opportunities. arXiv:2001.07416
56. Janidarmian M, Roshan Fekr A, Radecka K, Zilic Z (2017) A comprehensive analysis on wearable acceleration sensors in human activity recognition. Sensors 17(3):529
57. Capela NA, Lemaire ED, Baddour N (2015) Improving classification of sit, stand, and lie in a smartphone human activity recognition system. In: 2015 IEEE international symposium on medical measurements and applications (MeMeA) proceedings, IEEE, pp 473–478
58. Langley P (1996) Elements of machine learning. Morgan Kaufmann, New York
59. Ferrari A, Micucci D, Marco M, Napoletano P (2019) Hand-crafted features vs residual networks for human activities recognition using accelerometer. In: Proceedings of the IEEE international symposium on consumer technologies (ISCT)
60. Liu H, Motoda H (1998) Feature extraction, construction and selection: a data mining perspective, vol 453. Springer, New York
41. Siirtola P, Koskimäki H, Röning J (2019) Personalizing 61. Lockhart JW, Weiss GM (2014) The benefits of personalized
human activity recognition models using incremental learning. smartphone-based activity recognition models. In: Proceedings of
arXiv:1905.12628 the 2014 SIAM international conference on data mining (SIAM,
2014), pp 614–622
123
Journal of Reliable Intelligent Environments (2021) 7:189–213 211
62. Kwapisz JR, Weiss GM, Moore SA (2011) Activity recognition 82. Rokach L, Maimon OZ (2008) Data mining with decision trees:
using cell phone accelerometers. ACM SIGKDD Explor Newsl theory and applications. Data mining with decision trees: theory
12(2):74 and applications, vol. 69, World scientific, Singapore
63. Altun K, Barshan B, Tunçel O (2010) Comparative study on 83. Breiman L (1999) 1 RANDOM FORESTS–RANDOM FEA-
classifying human activities with miniature inertial and magnetic TURES
sensors. Pattern Recogn 43(10):3605 84. Polu SK (2018) Human activity recognition on smartphones
64. Sani S, Massie S, Wiratunga N, Cooper K (2017) Learning using machine learning algorithms. Int J Innovat Res Sci Technol
deep and shallow features for human activity recognition. In: 5(6):31
International conference on knowledge science, engineering and 85. Bansal A, Shukla A, Rastogi S, Mittal S (2018) Micro activity
management, Springer, New York, pp 469–482 recognition of mobile phone users using inbuilt sensors. In: 2018
65. Plötz T, Hammerla NY, Olivier PL (2011) Feature learning for 8th international conference on cloud computing, data science &
activity recognition in ubiquitous computing. In: Twenty-second engineering (confluence), IEEE, pp 225–230
international joint conference on artificial intelligence 86. Antal P (1998) Construction of a classifier with prior domain
66. Lago P, Inoue S (2019) Comparing Feature Learning Methods for knowledge formalised as bayesian network. In IECON’98. Pro-
Human Activity Recognition: Performance study in new user sce- ceedings of the 24th Annual Conference of the IEEE Industrial
nario. In: 2019 Joint 8th International Conference on Informatics, Electronics Society (Cat. No. 98CH36200), vol 4, IEEE, pp 2527–
Electronics & Vision (ICIEV) and 2019 3rd International Confer- 2531
ence on Imaging, Vision & Pattern Recognition (icIVPR) (IEEE, 87. Nguyen H, Tran KP, Zeng X, Koehl L, Tartare G (2019) Wear-
2019), pp 118–123 able sensor data based human activity recognition using machine
67. Wang J, Liu P, She MF, Nahavandi S, Kouzani A (2013) Bag- learning: a new approach. arXiv:1905.03809
of-words representation for biomedical time series classification. 88. Yu T, Chen J, Yan N, Liu X (2018) A multi-layer parallel
Biomed Signal Process Control 8(6):634 LSTM Network for Human Activity Recognition with Smart-
68. Shirahama K, Grzegorzek M (2017) On the generality of code- phone Sensors. In: 2018 10th International conference on wireless
book approach for sensor-based human activity recognition. communications and signal processing (WCSP), IEEE, pp 1–6
Electronics 6(2):44 89. Suto J, Oniga S, Lung C, Orha I (2018) Comparison of offline
69. Abdi H, Williams LJ (2010) Principal component analysis. Wiley and real-time human activity recognition results using machine
Interdiscip Rev Comput Stat 2(4):433 learning techniques. In: Neural computing and applications, pp
70. Ferrari A, Micucci D, Mobilio M, Napoletano P (2020) On the 1–14
personalization of classification models for human activity recog- 90. Nair N, Thomas C, Jayagopi DB (2018) Human activity recogni-
nition. IEEE Access 8:32066 tion using temporal convolutional network. In: Proceedings of the
71. Wang J, Chen Y, Hao S, Peng X, Hu L (2019) Deep learning for 5th international workshop on sensor-based activity recognition
sensor-based activity recognition: a survey. Pattern Recogn Lett and interaction, pp 1–8
119:3 91. Demrozi F, Pravadelli G, Bihorac A, Rashidi P (2020) Human
72. Zhang W, Yang G, Lin G, Ji C, Gupta MM (2018) On definition activity recognition using inertial, physiological and environmen-
of deep learning. In: 2018 World Automation Congress (WAC), tal sensors: a comprehensive survey. arXiv:2004.08821
IEEE, pp 1–5 92. Ronao CA, Cho SB (2015) Deep convolutional neural networks
73. Lin Y, Zhang W (2004) Towards a novel interface design frame- for human activity recognition with smartphone sensors. In: Inter-
work: function-behavior-state paradigm. Int J Hum Comput Stud national conference on neural information processing, Springer,
61(3):259 pp 46–53
74. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 93. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for
20(3):273 image recognition. In: Proceedings of the IEEE conference on
75. Chen Y, Shen C (2017) Performance analysis of smartphone- computer vision and pattern recognition (CVPR), pp 770–778
sensor behavior for human activity recognition. IEEE Access 94. Bianco S, Cadene R, Celona L, Napoletano P (2018) Benchmark
5:3095 analysis of representative deep neural network architectures. IEEE
76. Amezzane I, Fakhri Y, El Aroussi M, Bakhouya M (2018) Towards Access 6:64270
an efficient implementation of human activity recognition for 95. Ferrari A, Micucci D, Mobilio M, Napoletano P (2019) Hand-
mobile devices. EAI Endorsed Trans Context-Aware Syst Appl crafted features vs residual networks for human activities recog-
4(13) nition using accelerometer. In: 2019 IEEE 23rd international
77. Vaughn A, Biocco P, Liu Y, Anwar M (2018) Activity detection symposium on consumer technologies (ISCT), IEEE, pp 153–156
and analysis using smartphone sensors. In: 2018 IEEE Interna- 96. Hammerla NY, Halloran S, Plötz T (2016) Deep, convolutional,
tional Conference on Information Reuse and Integration (IRI), and recurrent models for human activity recognition using wear-
IEEE, pp 102–107 ables. arXiv:1604.08880
78. Xu W, Pang Y, Yang Y, Liu Y (2018) Human activity recog- 97. Friday NH, Al-garadi MA, Mujtaba G, Alo UR, Waqas A (2018)
nition based on convolutional neural network. In: 2018 24th Deep learning fusion conceptual frameworks for complex human
International conference on pattern recognition (ICPR), IEEE, pp activity recognition using mobile and wearable sensors. In: 2018
165–170 International conference on computing, mathematics and engi-
79. Jalal A, Quaid MAK, Hasan AS (2018) Wearable sensor-based neering technologies (iCoMET), IEEE, pp 1–7
human behavior understanding and recognition in daily life for 98. Yang J, Nguyen MN, San PP, Li XL, Krishnaswamy S (2015) Deep
smart environments. In: 2018 International conference on fron- convolutional neural networks on multichannel time series for
tiers of information technology (FIT), IEEE, pp 105–110 human activity recognition. In: Proceedings of the international
80. Witten IH, Frank E, Hall MA (2005) Practical machine learning joint conference on artificial intelligence (IJCAI 15)
tools and techniques. Morgan Kaufmann, pp 578 99. Coşar S, Donatiello G, Bogorny V, Garate C, Alvares LO, Bré-
81. Shoaib M, Bosch S, Incel OD, Scholten H, Havinga PJ (2016) mond F (2016) Toward abnormal trajectory and event detection in
Complex human activity recognition using smartphone and wrist- video surveillance. IEEE Trans Circ Syst Video Technol 27(3):683
worn motion sensors. Sensors 16(4):426
123
212 Journal of Reliable Intelligent Environments (2021) 7:189–213
100. Mabrouk AB, Zagrouba E (2018) Abnormal behavior recognition 119. Siirtola P, Röning J (2019) Incremental learning to personalize
for intelligent video surveillance systems: a review. Expert Syst human activity recognition models: the importance of human AI
Appl 91:480 collaboration. Sensors 19(23):5151
101. LeCun Y, Bengio Y et al (1995) Convolutional networks for 120. Yu T, Zhuang Y, Mengshoel OJ, Yagan O (2016) Hybridizing
images, speech, and time series. Handb Brain Theory Neural Netw personal and impersonal machine learning models for activity
3361(10):1995 recognition on mobile devices. In: Proceedings of the EAI interna-
102. Siirtola P, Koskimäki H, Röning J (2018) OpenHAR: A Matlab tional conference on mobile computing, applications and services
toolbox for easy access to publicly open human activity data sets. (MobiCASE)
In: Proceedings of the ACM international joint conference and 121. Vo QV, Hoang MT, Choi D (2013) Personalization in mobile activ-
international symposium on pervasive and ubiquitous computing ity recognition system using K-medoids clustering algorithm. Int
and wearable computers (UbiComp18) J Distrib Sens Netw 9(7):315841
103. Bianchi V, Bassoli M, Lombardo G, Fornacciari P, Mordonini M, 122. Abdallah ZS, Gaber MM, Srinivasan B, Krishnaswamy S (2015)
De Munari I (2019) IoT wearable sensor and deep learning: an Adaptive mobile activity recognition system with evolving data
integrated approach for personalized human activity recognition streams. Neurocomputing 150:304
in a smart home environment. IEEE Internet of Things J 6(5):8553 123. Rokni SA, Nourollahi M, Ghasemzadeh H (2018) Personalized
104. Burns DM, Whyne CM (2020) Personalized activity recognition human activity recognition using convolutional neural networks.
with deep triplet embeddings. arXiv:2001.05517 In: Thirty-second AAAI conference on artificial intelligence
105. Hong JH, Ramos J, Dey AK (2016) Toward personalized activity 124. Ferrari A, Micucci D, Mobilio M, Napoletano P (2020) On the
recognition systems with a semipopulation approach. IEEE Trans personalization of classification models for human activity recog-
Hum-Mach Syst 46(1):101–112 nition. arXiv:2009.00268 (2020)
106. Igual R, Medrano C, Plaza I (2015) A comparison of public 125. Ronao CA, Cho SB (2014) Human activity recognition using
datasets for acceleration-based fall detection. Med Eng Phys smartphone sensors with two-stage continuous hidden Markov
37(9):870 models. In: 2014 10th International conference on natural com-
107. Lockhart JW, Weiss GM (2014) Limitations with activity recogni- putation (ICNC), IEEE, pp 681–686
tion methodology & data sets. In: Proceedings of the 2014 ACM 126. Su X, Tong H, Ji P (2014) Accelerometer-based activity recog-
international joint conference on pervasive and ubiquitous com- nition on smartphone. In: Proceedings of the 23rd ACM interna-
puting: adjunct publication, pp 747–756 tional conference on conference on information and knowledge
108. Berchtold M, Budde M, Schmidtke HR, Beigl M (2010) An exten- management, pp 2021–2023
sible modular recognition concept that makes activity recognition 127. Bay SD, Kibler D, Pazzani MJ, Smyth P (2000) The UCI KDD
practical. In: Annual conference on artificial intelligence (AAAI) archive of large data sets for data mining research and experimen-
109. Tapia EM, Intille SS, Haskell W, Larson K, Wright J, King A, tation. ACM SIGKDD Explor Newsl 2(2):81
Friedman R (2007) Real-time recognition of physical activities 128. Stisen A, Blunck H, Bhattacharya S, Prentow TS, Kjaergaard MB,
and their intensities using wireless accelerometers and a heart rate Dey A, Sonne T, Jensen MM (2015) Smart devices are different:
monitor. In: Proceeding of the IEEE international symposium on assessing and mitigating mobile sensing heterogeneities for activ-
wearable computers (ISWC) ity recognition. In: Proceedings of the 13th ACM conference on
110. Medrano C, Igual R, Plaza I, Castro M (2014) Detecting falls embedded networked sensor systems, pp 127–140
as novelties in acceleration patterns acquired with smartphones. 129. Malekzadeh M, Clegg RG, Cavallaro A, Haddadi H (2018) Pro-
PLoS One 9(4):e94811 tecting sensory data against sensitive inferences. In: Proceedings
111. Shen C, Chen Y, Yang G (2016) On motion-sensor behavior analy- of the workshop on privacy by design in distributed systems (W-
sis for human-activity recognition via smartphones. In: 2016 Ieee P2DS18)
International Conference on Identity, Security and Behavior Anal- 130. Vavoulas G, Chatzaki C, Malliotakis T, Pediaditis M, Tsik-
ysis (Isba), IEEE, pp 1–6 nakis M (2016) The MobiAct dataset: recognition of activities
112. Lara OD, Pérez AJ, Labrador MA, Posada JD (2012) Centinela: of daily living using smartphones. In: Proceedings of Information
a human activity recognition system based on acceleration and and Communication Technologies for Ageing Well and e-Health
vital sign data. Pervasiv Mob Comput 8(5):717 (ICT4AgeingWell16)
113. Vaizman Y, Ellis K, Lanckriet G (2017) Recognizing detailed 131. Casilari E, Santoyo-Ramón JA, Cano-García JM (2017)
human context in the wild from smartphones and smartwatches. UMAFall: a multisensor dataset for the research on automatic
IEEE Pervasive Comput 16(4):62 fall detection. Procedia Comput Sci 110:32
114. Sztyler T, Stuckenschmidt H (2017) Online personalization of 132. Siirtola P, Röning J (2012) Recognizing human activities user-
cross-subjects based activity recognition models on wearable independently on smartphones based on accelerometer data.
devices. In: Proceedings of the IEEE international conference on IJIMAI 1(5):38
pervasive computing and communications (PerCom) 133. Kawaguchi N, Watanabe H, Yang T, Ogawa N, Iwasaki Y, Kaji K,
115. Sztyler T, Stuckenschmidt H, Petrich W (2017) Position-aware Terada T, Murao K, Hada H, Inoue S et al (2012) Hasc2012corpus:
activity recognition with wearable devices. Pervasiv Mob Comput large scale human activity corpus and its application. In: Proceed-
38:281 ings of the second international workshop of mobile sensing: from
116. Garcia-Ceja E, Brena R (2015) Building personalized activity smartphones and wearables to big data, pp 10–14
recognition models with scarce labeled data based on class simi- 134. Ferrari A, Mobilio M, Micucci D, Napoletano P (2019) On the
larities. In: International conference on ubiquitous computing and homogenization of heterogeneous inertial-based databases for
ambient intelligence, Springer, New York, pp 265–276 human activity recognition. In: 2019 IEEE world congress on
117. Garcia-Ceja E, Brena R (2016) Activity recognition using com- services (SERVICES), IEEE, pp 295–300
munity data to complement small amounts of labeled instances. 135. Ferrari A, Micucci D, Marco M, Napoletano P (2019) On the
Sensors 16(6):877 homogenization of heterogeneous inertial-based databases for
118. Reiss A, Stricker D (2013) Personalized mobile physical activity human activity recognition. In: Proceedings of IEEE services
recognition. In: Proceeding of the IEEE international symposium workshop on big data for public health policy making
on wearable computers (ISWC) 136. Krupitzer C, Sztyler T, Edinger J, Breitbach M, Stuckenschmidt
H, Becker C (2018) Hips do lie! a position-aware mobile fall detec-
123
Journal of Reliable Intelligent Environments (2021) 7:189–213 213
tion system. In: 2018 IEEE international conference on pervasive Publisher’s Note Springer Nature remains neutral with regard to juris-
computing and communications (PerCom), IEEE, pp 1–10 dictional claims in published maps and institutional affiliations.
137. Huynh DTG (2008) Human activity recognition with wearable
sensors, Human activity recognition with wearable sensors. Ph.D.
thesis, Technische Universitat
123