
Information Fusion 60 (2020) 41–64


Information Fusion
journal homepage: www.elsevier.com/locate/inffus

Fusing wearable and remote sensing data streams by fast incremental learning with swarm decision table for human activity recognition

Tengyue Li a, Simon Fong a,∗, Kelvin K.L. Wong b,∗∗, Ying Wu c, Xin-She Yang d, Xuqi Li e
a Department of Computer and Information Science, University of Macau, Macau SAR, China
b School of Electrical and Electronic Engineering, The University of Adelaide, SA 5000, Australia
c School of Nursing, Capital Medical University, Beijing, China
d Department of Design Engineering and Mathematics, Middlesex University, London, UK
e School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK

Keywords: Kinect depth sensor; Wearable sensor; Data mining; Classification model; Feature selection

Abstract

Human activity recognition (HAR) by machine learning finds wide applications, ranging from posture monitoring for healthcare and rehabilitation to the detection of suspicious or dangerous actions for security surveillance. Infrared cameras such as the Microsoft Kinect and wearable sensors have been the two most widely adopted devices for collecting data on bodily movements. These two types of sensors are generally categorized as contactless sensing and contact sensing, respectively. Each has its inherent hardware limitations. The most common problem associated with contactless sensing such as Kinect is the distance and indirect angle between the camera and the subject; a wearable sensor, in turn, is limited in recognizing complex human activities. In this paper, a novel data fusion framework is proposed for combining the data collected from both sensors, with the aim of enhancing HAR accuracy. Kinect is able to capture details of bodily movements in complex activities, but its accuracy depends heavily on the angle of view; a wearable sensor is relatively primitive in gathering spatial data but reliable for detecting basic movements. Fusing the data from the two sensor types lets each complement the other's unique strengths. In particular, a new scheme using incremental learning with a decision table, coupled with swarm-based feature selection, is proposed in our framework for achieving fast and accurate HAR from the fused data. Our experimental results show that HAR accuracy can be improved from 23.51% to 68.35% in a case of an almost 90-degree slanted view of the Kinect sensor when a wearable sensor is used at the same time. Swarm feature selection is in general shown to enhance HAR performance compared with a standard feature selection method. The results reported here contribute to the possibilities of using hybridized sensors from the machine learning perspective.

1. Introduction

During the past decade, there has been exceptional development in microelectronics and computer systems, enabling sensors and mobile devices with unprecedented characteristics [1]. Their high computational power, small size, and low cost allow people to interact with these devices as part of their daily living, making it possible to detect human activity and emotions at any time [28]. Nowadays, human activity recognition (HAR) is a prevalent topic in computer vision research. Its applications are extensive, mainly in intelligent video surveillance, patient monitoring, human-computer interaction, virtual reality, smart homes, intelligent security, and athlete-assisted training. In the medical area especially, HAR technology underpins systems that help doctors and nurses monitor patients remotely. Such systems can not only relieve the shortage of medical staff, but also give patients round-the-clock care. By recording and analyzing a patient's daily activities, the doctor can be provided with additional information on the treatment, avoiding treatment decisions that rely only on subjective experience. HAR systems can also be used in nursing homes for elderly people. By detecting the actions of elderly people living alone, it is possible to determine whether they are in danger: if an elderly person has been sitting still or staying in one position for a long time, something abnormal may have happened, and the HAR system can then send a warning to the corresponding people in time.

A HAR system has to collect behaviour information through a sensing device. There are mainly two ways: using external (remote) sensors or wearable sensors (via wireless body sensor networks) [32,33]. In the


∗ Corresponding author.
∗∗ Co-corresponding author.
E-mail addresses: [email protected] (S. Fong), [email protected] (K.K.L. Wong).

https://doi.org/10.1016/j.inffus.2020.02.001
Received 13 April 2019; Received in revised form 20 November 2019; Accepted 1 February 2020
Available online 14 February 2020
1566-2535/© 2020 Elsevier B.V. All rights reserved.

former, the devices are fixed in a predetermined location of interest; for example, the sensor could be installed at the doorway or mounted on the wall. In the latter, the devices are attached to the body of the user. Owing to hardware constraints, each of the two sensor types has its inherent limitations. A wearable sensor is limited in recognizing complex human activities. Among external sensors, the Microsoft Kinect depth camera is one of the most cost-effective remote sensors and is favoured by developers for supporting HAR applications. The core function of the Kinect is its capability to capture depth data of the target with skeletal point tracking. Currently, it can track 20 joint points, and one device can simultaneously position the joints of up to six people. Depending on the chosen function, it can detect complex human movement very well when the subject is directly in front of the device. However, the Kinect is known to have a limited sensing distance, and an indirect angle between the camera and the subject strongly degrades accuracy. In this paper, we propose to exploit the complementary advantages of the two types of sensors to achieve accurate HAR. The challenge is to combine the two data streams for training a machine learning model. We therefore propose a novel data fusion framework for combining the data collected from both the Kinect and a wearable sensor. The Kinect captures details of body movement in complex activities, but its accuracy depends heavily on the angle of view; a wearable sensor is relatively primitive in gathering spatial data of a particular body part, but it is reliable in detecting basic movement. Consequently, fusing these two types of data lets each complement the other's strengths. Two computational challenges exist in this direction: how to fuse the two datasets collected from different sensors, and, after fusion, how to effectively use the data to train a classification model.

A well-known definition of data fusion is provided by Hall and Llinas [2]: "data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone." Based on the relations of the sources (as shown in Fig. 1), Durrant-Whyte [3] proposed the following classification criteria:

(1) Complementary: the information provided by the input sources represents different parts of the scene and can thus be used to obtain more complete global information. For example, in a visual sensor network, the information on the same target provided by two cameras with different fields of view is considered complementary;
(2) Redundant: two or more input sources provide information about the same target and can thus be fused to increase confidence. For example, data coming from overlapping areas in a visual sensor network are considered redundant;
(3) Cooperative: the provided information is combined into new information that is typically more complex than the original information. For example, multi-modal (audio and video) data fusion is considered cooperative.

Fig. 1. Three classification criteria based on the relations of the sources. Image courtesy of [3].

In our case, detecting human activities by combining Kinect data with wearable sensor data belongs to the first category. Ours is a complementary approach that uses the wearable sensor data to assist machine learning from the Kinect data. A practical challenge [30], however, is how to fuse two data feeds from two types of sensors whose data formats are quite different; the formats need to be modified before fusion can take place for Kinect HAR [4]. Once the two types of data are combined, another problem is how to effectively tune the new dataset to ensure its suitability for machine learning. Since the aim of HAR is to detect danger in smart-home environments, data collection and analysis should be done in real time, imposing requirements of both high accuracy and high speed. For this reason, traditional batch-mode data mining may be less relevant than data stream mining. The core of data stream mining [5] is extracting knowledge structures from continuous, rapid data records by incrementally learning a predictive model. During incremental learning, the model is "refreshed" whenever a new segment of data arrives through a sliding window mechanism. The model is thus progressively updated: learning proceeds as a sequence of testing (using the most recently induced model) and then training (updating the model with only the latest portion of data), so data stream mining is ongoing. This approach suits continuous HAR without excessive memory use or latency, so users of a HAR application receive feedback in a very short time. Furthermore, a quick pre-processing function, implemented using a correlation-based feature selection algorithm coupled with swarm optimization search methods [6], is integrated into the data mining algorithms for HAR. The remainder of this paper is organized as follows. The proposed framework for fusing the two types of sensing data is presented in detail in Section 2. An experiment using empirical sensor data is described in Section 3 to verify the usefulness of the data fusion framework. The results are analyzed and discussed in Section 4. Section 5 concludes the paper.

2. Proposed framework

A computational framework is proposed for realizing the fusion of data streams from a Kinect depth sensor and a wearable sensor, without requiring any special hardware. In the context of machine learning, the collected data are formatted in such a way that they can be loaded into a predictive model-building process. The formatted data stream then becomes suitable for machine learning, which infers a model that recognizes the non-linear relations between the features (attributes) describing the data and the target labels during model training. Essentially, for the purpose of machine learning, the data records that arrive as a stream, and the features, in some form of rows and columns, have to be reconciled when two separate data streams come from different sources. See the system framework in Fig. 2. The data collected from the sensors can be stored either locally or in the cloud [29,31].

Just like any other multi-sensor data fusion approach [7], the raw data have to pass through a sequence of processing: pre-processing, time registration, spatial registration and data merging. In our framework, this sequence is mainly composed of two steps: data expansion (DE) and data contraction (DC). The raw data are assumed to be properly labelled with time-stamps, and they arrive in regular sequence. If they are training data, activity labels are attached as the last column of the dataset; the label indicates that the values of the features contained in that instance give rise to the said activity. Let two data streams S1 and S2 be lists of data instances, so S1, S2 ∈ S, where S = (<t1, d1(a1, a2, ..., am)>, <t2, d2(a1, a2, ..., am)>, ..., <ti, di(a1, a2, ..., am)>), where i is the index of the most recent data instance that has arrived so far, and i → ∞ for an unbounded data stream.
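The stream notation above maps naturally onto code. Below is a minimal sketch (the `Instance` and `make_stream` names are ours, purely illustrative), assuming each instance is a time-stamp plus a fixed-length feature vector arriving at a regular rate:

```python
from typing import List, NamedTuple

class Instance(NamedTuple):
    """One timestamped data instance <t_i, d_i(a_1, ..., a_m)>."""
    t: float          # arrival time-stamp
    d: List[float]    # feature values a_1 ... a_m

def make_stream(readings, t0=0.0, rate_hz=30.0):
    """Wrap raw sensor readings as a regularly timed stream S.

    Instances arrive in regular sequence at `rate_hz`
    (30 fps for the Kinect depth sensor in this setting).
    """
    return [Instance(t0 + i / rate_hz, list(d)) for i, d in enumerate(readings)]

# A tiny 3-feature stream with two instances:
S1 = make_stream([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

In practice S1 and S2 would be unbounded generators rather than lists; a list suffices to show the record layout.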


Fig. 2. System framework.

In the DE phase, the incoming data S are assumed to be pre-processed, so noise and missing values have been handled. The data expansion applies to both the features (the columns) and the volume (the rows). Firstly, the features of the Kinect depth data are doubled by adding the shadow features [8]. Shadow features are replicates of the original features from previous instances. Embracing the additional features gives rise to a stereotype effect that adds more dimensions of information to the data, making machine learning more effective. The shadow features generated in this model come from the filter called TimeSeriesTranslate [10], an instance filter that assumes the instances form time-series data and adds new attributes holding the attribute values of some previous instance, taken a fixed number of instances backward. During pre-processing, the two data streams are adjusted to be synchronized, so there is no need to worry about time registration. A sliding window mechanism is deployed, synchronized with the arrival speed, which is 30 fps for both the Kinect depth sensor and the wearable sensor data feeds.

The features, comprising the original features and the shadow features, are then combined with the features of the data collected from the wearable sensor. Usually the data from a wearable sensor possess fewer features than the Kinect data. Let m be the original number of features in the Kinect data and m′ the number of features in the wearable sensor data, where m′ ≤ m. The simplest form of wearable sensor is an accelerometer, which records the 3D spatial coordinates of a moving object in time series and has only three attributes (x, y, z). More sophisticated wearable sensors may include readings from a built-in ECG sensor and from a gyroscope, as well as the acceleration rates from the accelerometer. The depth sensor contains a monochrome CMOS sensor and an infrared projector that together create the 3D imagery of the room [10]. It also measures the distance of each point of the user's body by transmitting invisible near-infrared light and measuring its "time of flight" after it reflects off objects. It can track 48 different points on each user's body, repeated 30 times every second. Kinect depth data therefore typically include 3D information for each joint, which amounts to 3 × 48 features with x, y and z values for each body point. With the shadow features added on, the total number of features after the DE is 2 × m + m′.

With all the original and shadow features from the Kinect depth data and the wearable sensor data in place, the two data sources can be merged physically into one data format. Let Sm be the merged data, where Sm = S1 + S2 with attributes (a1, a2, ..., am, b1, b2, ..., bm′). Within the same data format, the data from the Kinect and the wearable sensor fill the values under their corresponding attributes. In any row, if the data are unavailable for the features belonging to the other sensor, those fields are padded with zeros. The data from the wearable sensor are added to the Kinect data progressively in bulk, at most the size of the sliding window each time. However, not all the data from the wearable sensors are added directly; only the subsets of wearable sensor data whose activity label is equivalent to that of the Kinect sensor are added. In other words, data are merged only when the Kinect sensor data and the wearable sensor data have common target labels. For example, most primitive activities, such as standing, sitting, lying down and walking, are common across the two sensors, so the data for such activity labels can be merged. Otherwise, data for activity labels that are not common across the two sensors can neither be shared nor merged. Intuitively, using more training samples from both sensor sources helps enhance model learning in classification model building, as long as both data sources contribute to the same recognition target. By this design, a computer program can be written to automate the selective data addition when the fusion mechanism is implemented in practice. The prerequisite is that the meanings of the activity labels must first be understood, so that it is known whether the label of the wearable sensor and that of the Kinect describe the same activity. This implies that human expert knowledge is needed for this role.

Alternatively, some advanced machine learning would be required to learn the data patterns from the two sensors and estimate whether the data characterize the same activity. Such label learning is, however, beyond the scope of this work, though it is a feasible direction for our future work.
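The shadow-feature expansion described above can be sketched as follows. This is an illustrative re-implementation of the idea behind the TimeSeriesTranslate filter, not its actual API; the function name and the zero-padding policy for the first `lag` rows are our assumptions:

```python
def add_shadow_features(rows, lag=1):
    """Double the feature set: append to each row the attribute
    values of the instance `lag` steps earlier (the 'shadow' copy).
    Rows with insufficient history are padded with zeros, so every
    row grows from m to 2*m values."""
    m = len(rows[0])
    out = []
    for i, row in enumerate(rows):
        shadow = rows[i - lag] if i >= lag else [0.0] * m
        out.append(list(row) + list(shadow))
    return out

expanded = add_shadow_features([[1, 2], [3, 4], [5, 6]], lag=1)
# each expanded row now carries 2 * m = 4 values
```

Appending the wearable sensor's m′ attributes to such rows yields the 2 × m + m′ layout used during the DE phase.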


The final volume of the merged data will nevertheless be smaller than the sum of the data from the two sensors, because not all the target labels of the Kinect and wearable sensors are perfectly equivalent; data merging occurs only when the target labels of the two sensors refer to the same activities. After the data from the two sensors are merged, they undergo a data contraction stage, which occurs during early calibration and periodically thereafter. During data contraction, columns of data are trimmed via a feature selection process: out of all the available features (2 × m + m′) fused from the two data sources, only the significant features that contribute to the predictive power of the classification model are retained.

Usually, feature selection operates only once at the beginning. Once the most significant features are found, the data format is confirmed, and data generation and fusion follow. If classification performance declines because the underlying patterns of the new sensor data have changed, feature selection may need to be applied again. The flow of data fusion then continues for training and testing in HAR. The full data fusion process proposed in this methodology is depicted in Fig. 3. The advantage of this methodology is its simplicity: fusion is conducted only at the data level. Metaheuristic search is chosen for feature selection because of its speed and efficiency. Data from the two sensors are combined by examining the target labels, which is relatively fast, making the whole process suitable for real-time HAR applications.

3. Experiment

The experiment is designed to validate the efficacy of the proposed framework, with the aim of solving the off-field-of-view problems that lead to poor HAR accuracy. By specification, the horizontal fields of view are 58 and 71 degrees for the first and second generations of Kinect sensors, respectively. HAR is tested at the limit of the hardware capabilities: the subject was deliberately placed at the boundary of the detection range, and the positions of the camera were shifted away from the central line of sight.

3.1. Data collection setup

To simulate harsh conditions in which most machine learning for HAR may fail, the following setup was arranged for collecting the data. Firstly, the Kinect version 1.0 camera was placed approximately 1.5 m above the ground in an indoor laboratory with ambient ceiling light. The volunteer (subject) performed a range of 30 activities of different body movements at approximately 3 m from the camera. The effective high-accuracy detection range for Kinect version 1.0 is at most 3 m; this detection range was selected based on the Kinect specification guide published on the product blog by Microsoft [27]. The data were initially collected for smart-home projects in which the elderly were monitored at home to automatically detect falls or other dangerous happenings [11]. The data stream is a vector of twenty body-joint positions, representative of the human body, captured by the Kinect camera via the Microsoft Kinect Software Development Kit [12]. The collected data were examined by a human expert, who assigned the activity labels used to train a classification model. The five positions of the camera, in front of which the subject performed the activities, are shown in Fig. 4.

As a result, a total of six datasets were generated. Two contain the data created when the camera was facing the participant directly: one file for training a machine learning model and the other for testing. The other four files, namely P1, P2, P3 and P4, are also used for testing. These four files test the accuracy of machine learning to its extreme, because the camera was not positioned directly facing the participant. In practical smart-home scenarios, a resident who moves around the house would seldom be facing the Kinect camera directly. To simulate the practical use of such Kinect applications, activity data are therefore collected from several angles. Consequently, the accuracy of activity monitoring, be it fall detection or more precise movement monitoring in healthcare [13], declines as the participant moves away from the camera's central line of sight (Fig. 4).

To alleviate this problem of angle-of-view deviation, a wearable sensor is proposed to compensate for the inaccuracy caused by the shortfalls of imperfect Kinect data from slanted views. The participant wears a portable sensor that records his/her activity, generating another activity data stream. The two independent data streams could be analyzed and used separately. However, the wearable sensor data are meant to supplement the Kinect data, because a wearable sensor, unless detailed markers are placed all over the body, falls short of offering as detailed a view as the Kinect in capturing the user's activity. In the experiment, the wearable sensor was the Shimmer version 2. Several Shimmers were strapped on three different parts of the participant's body: chest, wrist and ankle. As the Shimmers take only a supplementary role in data collection here, we deploy them only at the most prominent positions. Common sense suggests that the chest, wrist and ankle are the positions most influential for the major activities of daily life; the aim is that the minimum number of Shimmers covering the most prominent body parts gives the greatest possible coverage in measuring human activities. For example, standing, sitting, lying, climbing and raising the arms most often involve these three body positions. Data are recorded by measuring the motion experienced by the wearable sensors. The inertial motion is defined by the Cartesian values of three signals: the acceleration, the magnetic field orientation, and the speed of rotation. By the default hardware setting, the maximum sampling rate of the Shimmer is 50 signals per second, faster than the Kinect's 30 signals per second. The effective sampling rate was therefore scaled down to 30 signals per second to synchronize it with the Kinect data; a software buffer approach was used in the processing unit. Reservoir sampling [9] was applied as post-processing for synchronizing the wearable sensor data with the Kinect data streams. As shown in Fig. 6, a sliding window function helps fuse two sensing data streams generated at different frequencies. It randomly extracts from the high-sampling-frequency sensor dataset an amount of data equal to the amount of low-sampling-frequency sensor data; in our case, it randomly samples 30 of the 50 frames per second from the wearable sensor to match the KDS frequency. The sliding window then combines these two sets of data, now at equal frequency, for fusion, as shown in Fig. 6. We set the size of the sliding window to 90, since a sampling resolution of three seconds is assumed to be sufficient to detect most bodily movements in most circumstances. After fusing the WS and KDS data, the number of attributes of the dataset increases to 'sum of #WS + sum of #KDS'. A simple notation is used in the caption of Fig. 6: WS_x-y and KDS_x-y denote the portions of WS data and KDS data extracted from the xth to the yth instance of the corresponding data streams, to be placed within the space of a sliding window. The wearable sensors provide data that describe generalized actions, such as running, standing, sitting, and relaxing, and their intensities. The locations of the measurements from the two sensors are depicted in Fig. 5.

In addition to validating the framework of fusing Kinect and wearable data for effective machine learning, the experiment has two objectives from the machine learning perspective. First, we compare how the original Kinect data stream and the fused Kinect-plus-wearable data stream can be used to induce a classification model for HAR using two different types of incremental data stream mining algorithms, one decision tree-based and the other decision table-based. The second objective is to test the effect of metaheuristic search [14] (also known as swarm search) versus conventional heuristic search in feature selection for reducing the dimensions of the sensor datasets.
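The reservoir-sampling step for reconciling the two sampling rates might look like the sketch below. Function names and the fixed seed are ours, and re-sorting the kept indices to preserve temporal order is our assumption, as the paper does not spell that detail out:

```python
import random

def reservoir_sample(items, k, rng):
    """Classic one-pass reservoir sampling: keep k items drawn
    uniformly at random from a sequence of arbitrary length."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)   # replace a slot with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

def downsample_ws(ws_frames, kds_rate=30, seed=42):
    """Scale one second of wearable-sensor frames (50 Hz) down to the
    Kinect rate (30 Hz); kept frames are re-ordered by arrival index
    so the stream stays in temporal order."""
    rng = random.Random(seed)
    kept = sorted(reservoir_sample(range(len(ws_frames)), kds_rate, rng))
    return [ws_frames[i] for i in kept]

matched = downsample_ws(list(range(50)))   # 50 WS frames -> 30 frames
```

Three one-second batches like `matched`, paired with the corresponding 30-frame Kinect batches, fill one 90-instance sliding window.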


Fig. 3. HAR machine learning processes by fusing two different sensors.

3.2. Experiment setup

The requirement for sensor-based monitoring applications is real time: any detection of a cautionary event needs to be recognized accurately and quickly. To address the challenge of fast and accurate multi-class classification over the high-dimensional fused Kinect dataset, a dual process of feature selection and data stream mining model induction, by tree-based and table-based algorithms, is conducted.

For this experiment, the fused Kinect data stream contains 17,216 instances, and each data instance involves up to 150 attributes gathered from the two types of sensors that measure the motions experienced by various joint points of the body. The last attribute in the archive is 'class', which takes one of the 30 distinct human activities. This dataset generalizes a case of healthcare or security monitoring in a smart home [15-17] or office environment [18].
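A fused record of this kind is produced by the zero-padded, label-gated merge described in Section 2, sketched below as a toy illustration. The function name, the row layout with the class label last, and the `None` return for non-matching labels are our assumptions:

```python
def fuse_rows(kds_row, ws_row, kds_width, ws_width):
    """Merge one Kinect (KDS) row and one wearable (WS) row into the
    fused attribute layout (a_1..a_m, b_1..b_m'), class label last.
    Rows are merged only when both carry the same activity label;
    if one side is absent, its fields are zero-padded."""
    k_label = kds_row[-1] if kds_row else None
    w_label = ws_row[-1] if ws_row else None
    if kds_row and ws_row and k_label != w_label:
        return None                        # labels disagree: not mergeable
    label = k_label if k_label is not None else w_label
    k_vals = list(kds_row[:-1]) if kds_row else [0.0] * kds_width
    w_vals = list(ws_row[:-1]) if ws_row else [0.0] * ws_width
    return k_vals + w_vals + [label]

# Two Kinect attributes plus one wearable attribute, same label:
fused = fuse_rows([1.0, 2.0, "sitting"], [9.0, "sitting"], 2, 1)
```

A driver loop applying `fuse_rows` window by window yields the 150-attribute fused records described above.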


Fig. 4. Positions of the Kinect camera.

Given that the experimental training dataset has 150 features that characterize the human activity patterns [19,20], feature selection becomes mandatory for shortening the model training time to meet real-time HAR application requirements. Pre-processing by feature selection in this experiment is carried out by the BestFirst search method [21] and ten metaheuristic search methods, coined swarm search methods for short. The swarm search methods are coupled with an evaluation function called correlation-based feature subset (cfs) evaluation. Hence there are feature selection methods based on swarm search [14], namely cfs-Ant, cfs-Bat, cfs-Bee, cfs-Cuckoo, cfs-Elephant, cfs-Flower, cfs-Genetic, cfs-Harmony, cfs-PSO and cfs-Wolf.

The baseline feature selection method, based on the popular non-swarm search method called BestFirst search, is known as cfs-bestfirst. BestFirst search explores a graph by expanding the most promising node chosen according to an evaluation function. Among the swarm search methods, PSO is a classical swarming metaheuristic in which each particle follows a randomized local velocity as well as the global velocity of the group. The other metaphor types of swarm search methods belong to the contemporary family of metaheuristics that mimic the behaviour of animals, insects or natural phenomena in evolution. The other swarm search methods used in the experimentation here are described in [22].

The incremental learning algorithms used in the experiment are the Hoeffding tree (HT) and the Swarm Decision Table (SDT), which simulate fast data analytics in a real-time computing environment. The SDT model attempts to construct a suitable decision table for modelling the data. The three essential steps of its implementation are: first, to select efficient features (attributes), which can be done by BestFirst or any of the other metaheuristic methods (Ant, Bat, Bee, Cuckoo, Elephant, Flower, Genetic, Harmony, PSO and Wolf in our case); second, to select efficient rules; and third, to construct a matrix of decision rules inferred from the training instances. As one of the experimentation objectives, we tested the ten search methods in terms of reducing the number of features and forming an optimal feature subset, an essential step in making the decision table efficient and effective. A compact feature subset speeds up the learning process, and if the selected features are significant, the model is prevented from being overfitted or underfitted. To establish the optimal attribute subset that can meet the target performance level, we experimented with different swarm feature selection approaches based on a simple incremental-learning decision table algorithm. The design of the decision table algorithm is similar to k-nearest neighbour, in contrast with the Very Fast Decision Tree algorithm, which is another name for the Hoeffding tree algorithm. In this experiment, we evaluated ten types of swarm feature selection algorithms for their suitability for fast incremental learning in HAR. The full details of the formulation of the SDT are presented in Appendix A.

A prequential evaluation scheme is used in the experiment, which is equivalent to an interleaved test-then-train strategy. It works by testing the current model with each instance that freshly arrives; should a new conditional test arise because sufficient samples hint that a new test is necessary, the tree expands (is trained) or a new rule in the decision table is created (hence the name test-then-train). A sliding window is assumed to function in our case, collecting a certain amount of samples from the data stream one segment at a time. The window size is set at the default value of 1000 in the experiment.

Fig. 5. The joint positions where Kinect and wearable sensors are collecting data from.

46
T. Li, S. Fong and K.K.L. Wong et al. Information Fusion 60 (2020) 41–64

Fig. 6. Sliding window for fusing WS data with KDS data.

classification performance is measured from the start of the prequential ing 0 s as 1 s and 1 s as 0 s. And when ROC is 0.5, it means the model
evaluation. has no class separation capacity whatsoever.
As the real-time performance is of concern in HAR scenarios, the There are two prime objectives in the experimentation. One is
performance of inferring up a model for classification in this experi- to compare the performance of HT and SDT vis-à-vis in the case of
ment is mainly evaluated using the three criteria of accuracy, kappa using fused Kinect data and only Kinect data by incremental learn-
statistics and ROC. Accuracy is the percentage of correct classifications. ing. The other prime objective is to investigate the effects of vari-
Kappa statistics measures how reliable the model is, by evaluating how ous swarm search methods in feature selection which is necessary for
much a trained model can be generalized and show its usefulness when shrinking the data dimensions after combining the features from Kinect
used across different testing datasets. ROC stands for Receiver Operating and wearable sensors. Three performance indicators are used to com-
Characteristics which is also known as Area-Under-The-Curve (AUC), pare HT and SDT, before and after the data fusions. The results are
the normalized ROC rate goes between 0 and 1. ROC curve is a per- tabulated in Table 1, followed by the graphical representations from
formance measurement for classification problem at various thresholds Figs. 7–18.
settings. ROC is a probability curve and it represents degree or measure The longitudinal views of HT and SDT operations in actions, under
of separability. It tells how much the model is capable of distinguishing the effects of some selected swarm search methods with respect to ac-
between classes. Higher the ROC rate, better the model is at predicting curacy and kappa are shown in Figs. 31 and 32 respectively. Individu-
0 s as 0 s and 1 s as 1 s. By analogy, higher the ROC, better the model ally under the effect of each of the 10 swarm searches (Ant, Bat, Bee,
is at distinguishing between human activities. An excellent model has Cuckoo, Elephant, Flower, Genetic, Harmony, PSO and Wolf) as well as
ROC near to the 1 which means it has good measure of separability. A the feature selection with standard best-first search and original opera-
poor model has ROC near to the 0 which means it has worst measure of tion without feature selection are plotted for side-by-side comparisons
separability. In fact, it means it is reciprocating the result. It is predict- from Figs. 19–30 respectively.
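The prequential (interleaved test-then-train) scheme and the accuracy/kappa measures described above can be sketched as follows. The stream, the `MajorityClass` baseline learner and all function names here are illustrative stand-ins, not the models used in the paper:

```python
import random

def kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(len(cm))) / n  # observed agreement
    pe = sum(sum(cm[i]) * sum(r[i] for r in cm) for i in range(len(cm))) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

def prequential(stream, model, labels):
    """Interleaved test-then-train: every arriving instance first tests the
    current model, then trains it; a confusion matrix is accumulated."""
    cm = [[0] * len(labels) for _ in labels]
    for t, (x, y) in enumerate(stream):
        if t:  # skip the very first instance, when the model is still untrained
            cm[labels.index(y)][labels.index(model.predict(x))] += 1
        model.learn(x, y)  # then train on the same instance
    acc = sum(cm[i][i] for i in range(len(labels))) / sum(map(sum, cm))
    return acc, kappa(cm)

class MajorityClass:
    """Toy incremental learner: predicts the majority label seen so far."""
    def __init__(self):
        self.counts = {}
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, x):
        return max(self.counts, key=self.counts.get)

random.seed(7)
stream = [((random.random(),), random.choice("AAB")) for _ in range(1000)]
acc, k = prequential(stream, MajorityClass(), labels=["A", "B"])
print(round(acc, 2), round(k, 2))
```

The majority-class baseline scores a seemingly reasonable accuracy on this skewed toy stream but a kappa near zero, which is exactly why the kappa statistic is reported alongside accuracy as a reliability measure.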


Fig. 7. Comparison of accuracy levels by Hoeffding tree at various Kinect camera positions.

Fig. 8. Comparison of kappa statistics by Hoeffding tree at various Kinect camera positions.


Table 1
Full comparison of classification performance at camera position 5.

Accuracy % Kappa ROC

Algorithm Hoeffding tree Swarm decision table Hoeffding tree Swarm decision table Hoeffding tree Swarm decision table

Before After Before After Before After Before After Before After Before After

original 35.55 72.0609 37.125 77.0562 0.3332 0.6836 0.3493 0.7396 0.902 0.904 0.931 0.991
bestfirst 35.725 66.8 59.6875 69.4703 0.3351 0.621 0.5829 0.6532 0.91 0.788 0.98 0.983
ant 36.25 75.85 58.725 81.5 0.3404 0.7261 0.5729 0.7895 0.906 0.946 0.976 0.995
bat 33.95 76.45 58.7125 82.85 0.3166 0.7329 0.5728 0.8049 0.899 0.943 0.973 0.995
bee 32.8375 67.65 58.4125 77.7 0.3052 0.633 0.5697 0.7462 0.901 0.858 0.977 0.993
cuckoo 36.725 68.1 57.8625 80.75 0.3454 0.6356 0.564 0.7809 0.908 0.883 0.976 0.993
elephant 35.1625 74.7 58.2875 82.1 0.3292 0.7131 0.5684 0.7963 0.9 0.918 0.973 0.995
firefly 35.05 75.55 64.125 81.5 0.3281 0.7228 0.6288 0.7895 0.902 0.942 0.98 0.995
flower 36.7375 76.15 64.3625 80.4 0.3455 0.7295 0.6313 0.777 0.91 0.956 0.981 0.996
genetic 37.7 71.3 57.5 86.25 0.3554 0.6744 0.5603 0.8435 0.908 0.855 0.975 0.996
harmony 36.7 63.6 56.0625 72 0.3451 0.5872 0.5453 0.6814 0.906 0.938 0.972 0.99
PSO 37.0875 75.15 59.4125 82.85 0.3491 0.7182 0.58 0.8049 0.908 0.935 0.976 0.995
wolf 37.975 76.1 63.6375 84.3 0.3583 0.729 0.6238 0.8213 0.912 0.942 0.981 0.997
Maximum 37.975 76.45 64.3625 86.25 0.3583 0.7329 0.6313 0.8435 0.912 0.956 0.981 0.997

Bold numbers are the best value in each scenario.
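The "Maximum" row of Table 1 is simply the column-wise maximum over the per-algorithm rows. As a quick check for one column (SDT accuracy after fusion, values transcribed from the table above):

```python
# SDT accuracy (%) after data fusion, per feature selection method (Table 1)
sdt_acc_after = {
    "original": 77.0562, "bestfirst": 69.4703, "ant": 81.5, "bat": 82.85,
    "bee": 77.7, "cuckoo": 80.75, "elephant": 82.1, "firefly": 81.5,
    "flower": 80.4, "genetic": 86.25, "harmony": 72.0, "PSO": 82.85, "wolf": 84.3,
}
best = max(sdt_acc_after, key=sdt_acc_after.get)
print(best, sdt_acc_after[best])  # genetic 86.25, matching the Maximum row
```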

Fig. 9. Comparison of ROC rates by Hoeffding tree at various Kinect camera positions.

4. Observations and results analysis

The incremental learning performance in terms of accuracy and reliability is crucial in a real-time HAR application environment; the latency in decision making should be kept as short as possible, so that the learning speed is fast enough to support rapid HAR when new activities need to be learnt on the go. The experiment was conducted over the fused Kinect dataset in an attempt to find out the impact of the fusion scheme on the classification model versus using only the original Kinect data, in a multi-class classification problem, under the effects of various swarm methods in feature selection. The results from our multi-class classification experiment, as summarized in Tables 1 and 2, are interesting.

In both Tables, it is apparent that there is significant improvement in all the performance indicators after the data are fused. Table 1 shows the results of the experiment which was conducted using the testing dataset from camera position 5, which is directly frontal. Table 2 shows the same from camera position 1, which is one of the most extreme off-the-mark angles. In terms of accuracy, in the frontal position, the accuracy and kappa values doubled using HT, while SDT has about a 30% increase. But when the experiment was tested at camera position 1, which represents a kind of extreme case, the improvements in accuracy and kappa quadrupled and quintupled respectively using HT, while they only tripled using SDT. The ROC rates in all these cases increased by up to about 10%. The positive results are encouraging, as it is proven that fusing Kinect with wearable sensors at the data level indeed offers significant improvement. Especially in the case of camera position 1, the original accuracy levels for HT are as low as 13% odds; by applying the proposed fusion method, the accuracy levels rise up to 66.45% for HT and 68.35% for SDT. That implies the data fusion method can support HAR monitoring of a moving subject in a smart home environment with reasonable accuracy. Through simple data fusion by software enhancement, the useful range of the field-of-view could be extended.

The radar charts in Figs. 7–18 essentially show that all the performance indicators drop in value from camera position 5 to 1, as expected. However, comparing the two groups of charts in Figs. 7–12, HT suffers severely in performance when the camera positions drift away from the direct front. SDT is more resilient in learning and testing, with sustainable performance in slanted camera positions. After data fusion is applied, the performance of the classification model improves not only greatly but consistently over various swarm feature selection methods and different camera positions. This shows that sufficient data, as supplied by data fusion, and appropriate features are the key to good performance in a machine learning model, as shown in Figs. 16–18.

Fig. 10. Comparison of accuracy levels by swarm decision table at various Kinect camera positions.

Fig. 11. Comparison of kappa statistics by swarm decision table at various Kinect camera positions.
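As a sanity check, the "quadrupled" and "quintupled" multiples quoted for HT at camera position 1 can be recomputed by simple arithmetic from the "original" row of Table 2:

```python
# Values transcribed from the "original" row of Table 2 (camera position 1),
# Hoeffding tree (HT) before and after data fusion
ht_acc_before, ht_acc_after = 13.5625, 62.3838
ht_kappa_before, ht_kappa_after = 0.1057, 0.5740

acc_gain = ht_acc_after / ht_acc_before      # ~4.6x: accuracy roughly quadrupled
kappa_gain = ht_kappa_after / ht_kappa_before  # ~5.4x: kappa roughly quintupled
print(round(acc_gain, 1), round(kappa_gain, 1))
```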


Fig. 12. Comparison of ROC rates by swarm decision table at various Kinect camera positions.

Fig. 13. Comparison of accuracy levels by Hoeffding tree at various Kinect camera positions after the data are merged.

Nevertheless, not all feature selection methods achieved the same results. When the training is done at the proper camera position, Firefly, Flower and Wolf, as well as bestfirst, which is the baseline feature selection method, perform almost equally well and better than the rest of the search methods. At the extreme camera position, only Bee outperforms the standard bestfirst; the rest of the swarm methods have no edge in improvement. This phenomenon is somewhat reversed when data fusion is applied: all swarm methods show significant improvement over the original and bestfirst, except the Harmony method. The improvements in general are significant after data fusion. It is supposed that, by the random and stochastic nature of the swarm feature selection methods' search, more features and larger volumes of data enable the search methods to work effectively. From Tables 1 and 2, the winning swarm methods are largely Wolf and Flower when the camera was placed in the proper position. In the extreme position, the Bee search method showed good performance, though Wolf managed to achieve the highest possible accuracy and the most model reliability, pairing with SDT and data fusion.

Fig. 14. Comparison of kappa statistics by Hoeffding tree at various Kinect camera positions after the data are merged.

Fig. 15. Comparison of ROC rates by Hoeffding tree at various Kinect camera positions after the data are merged.

From the experiment results, it is observed that different bio-inspired algorithms perform differently as per data fusion and angle. For example, Wolf and Flower perform well at camera position 1, with Bee also performing well at such extreme angles, and Wolf having the best accuracy and the highest model reliability. The difference in performance is due to the varied nature of such algorithms. The Wolf algorithm could achieve such good performance in this extreme position because of the semi-swarming nature of its group movement, in comparison to PSO. PSO, as well as its similar counterparts, has a central moving force which guides the whole group towards a particular direction during the search. Wolf, however, does not have such centrally guided movement; instead, each Wolf agent is programmed with an inclination to stick with the others when they are within some predefined close distance. Specifically, each Wolf is treated as an independent searching agent, yet has some flocking characteristics when the wolves are near each other as a pack. Wolf [25] has displayed unique advantages in efficiency because each searching agent simultaneously performs autonomous solution searching and merging. Each Wolf is empowered by memory caches that store the previously visited positions, so that each searching agent will move to a better-performing position without stepping back to any previous position. The most unique feature in Wolf is the implementation of hunters that appear at random chances corresponding to each Wolf. When a Wolf agent meets its hunter, it will jump far out of the hunter's visual range to avoid being trapped in a local optimum. By this hunter-avoidance method, Wolf is efficient in finding a global optimum.

Fig. 16. Comparison of accuracy levels by swarm decision table at various Kinect camera positions after the data are merged.

Fig. 17. Comparison of kappa statistics by swarm decision table at various Kinect camera positions after the data are merged.

Similarly, why the Flower [26] algorithm is efficient can be explained twofold: long-distance pollinators and Flower consistency. Pollinators such as insects can travel long distances, and thus they introduce into the algorithm the ability to escape any local landscape and subsequently explore a larger search space; this acts as the exploration move. On the other hand, Flower consistency ensures that the same species of Flowers (thus similar solutions) are chosen more frequently and thus guarantees quicker convergence; this step is essentially an exploitation step. What have in common between Wolf algorithm


Fig. 18. Comparison of ROC rates by swarm decision table at various Kinect camera positions after the data are merged.

Fig. 19. Comparison of accuracy levels by different swarm search feature selection methods at position 5.

Fig. 20. Comparison of kappa statistics by different swarm search feature selection methods at position 5.


Fig. 21. Comparison of ROC rates by different swarm search feature selection methods at position 5.

Fig. 22. Comparison of accuracy levels by different swarm search feature selection methods at position 1.

Fig. 23. Comparison of kappa statistics by different swarm search feature selection methods at position 1.


Fig. 24. Comparison of ROC rates by different swarm search feature selection methods at position 1.

Fig. 25. Comparison of accuracy levels by different swarm search feature selection methods at position 5 after the data are fused.

Fig. 26. Comparison of kappa statistics by different swarm search feature selection methods at position 5 after the data are fused.


Fig. 27. Comparison of ROC rates by different swarm search feature selection methods at position 5 after the data are fused.

Fig. 28. Comparison of accuracy levels by different swarm search feature selection methods at position 1 after the data are fused.

Fig. 29. Comparison of kappa statistics by different swarm search feature selection methods at position 1 after the data are fused.
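The Wolf search behaviour described above — independent agents that remember visited positions, only move to better ones, and take a long "hunter"-triggered escape jump out of local optima — can be sketched roughly as follows. This is a simplified toy version for illustration, not the authors' exact algorithm [25]:

```python
import random

def wolf_search(fitness, dim, n_wolves=5, steps=200, visual=0.5,
                escape_prob=0.05, seed=0):
    """Minimise `fitness` over [0,1]^dim with a simplified Wolf Search:
    each wolf moves independently, only accepts better positions (so it
    never steps back), and occasionally meets a 'hunter', jumping far
    away to escape a potential local optimum."""
    rng = random.Random(seed)
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    for _ in range(steps):
        for wolf in wolves:
            if rng.random() < escape_prob:
                # hunter appears: jump out of the current visual range entirely
                cand = [rng.random() for _ in range(dim)]
            else:
                # local prey-seeking move within the wolf's visual range
                cand = [min(1.0, max(0.0, x + rng.uniform(-visual, visual)))
                        for x in wolf]
            if fitness(cand) < fitness(wolf):  # move only to better positions
                wolf[:] = cand
        best = min([best] + [w[:] for w in wolves], key=fitness)
    return best

# Toy objective: squared distance to a hypothetical optimum at (0.3, 0.7)
f = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2
b = wolf_search(f, dim=2)
print(round(f(b), 4))  # close to 0
```

The occasional uniform re-draw is the "escape" behaviour that gives the method its global exploration capability on multimodal landscapes.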

and Flower algorithm is their efficient capability to avoid local optima and their active exploration for the global optimum. The search agents can stay far apart most of the time; this formation is suitable for finding a global optimum. Therefore, they performed well under the extreme situation of position 1, which places the camera at the most slanted angle. The data generated from such an extreme angle would be the most chaotic and the most difficult for inducing an effective machine learning model. The chaotic data patterns give rise to very non-linear and noisy relations between the many features and the prediction target, which makes the search space challenging for swarm algorithms. Swarm search algorithms that have considerable global exploration capability could overcome such a difficult search space in finding a relatively best feature subset that leads to the best classification performance.

Figs. 31 and 32 show the longitudinal views of the performance of the classification model. They clearly show that without data fusion the


Fig. 30. Comparison of ROC rates by different swarm search feature selection methods at position 1 after the data are fused.

Fig. 31. Comparison of accuracy levels of various algorithms in incremental learning environment.

performance degrades badly when Kinect is being used alone. With data fused, the curve called "expanded" increased in performance several-fold. Furthermore, with swarm feature selection the performance can be improved even more. The curves also show that after data fusion the performance increases rapidly along the learning process, until a steady state is reached some time later as the learning matures.

5. Conclusion and future works

The Kinect depth camera recently became popular for its low cost and versatility in many useful applications, especially health-related ones. Its usefulness includes contactless structured-light sensing, being able to capture a user's body movement. The sensed body-movement data can be used to train a classification model, recognizing the activity of the user from processing the numeric data points perceived by the sensor. One weakness of the Kinect camera, just like any other remote sensing device, is the physical limitation of its field-of-view. When the subject moves close to or beyond the boundary of the field-of-view, the detection becomes less reliable. Hence, when the perceived data are delivered to the machine learning model, the accuracy of the model declines. One strategy that is proposed and reported in this paper is to fuse the Kinect sensor data with supplementary data that come from another sensing


Fig. 32. Comparison of kappa statistics of various algorithms in incremental learning environment.

Table 2
Full comparison of classification performance at camera position 1.

Accuracy % Kappa ROC

Algorithm Hoeffding tree Swarm decision table Hoeffding tree Swarm decision table Hoeffding tree Swarm decision table

Before After Before After Before After Before After Before After Before After

original 13.5625 62.3838 15.325 64.0102 0.1057 0.574 0.1239 0.5919 0.8 0.831 0.757 0.926
bestfirst 14.3875 62.05 22.025 61.6229 0.1143 0.5669 0.1931 0.5641 0.828 0.753 0.82 0.931
ant 15.025 66.05 22.2625 67.05 0.1208 0.6152 0.1956 0.6252 0.845 0.892 0.832 0.959
bat 14.0875 66.45 21.15 65.1 0.1111 0.6198 0.1841 0.6031 0.818 0.878 0.798 0.921
bee 16.3125 61.35 23.5125 64.1 0.1342 0.5625 0.2084 0.5916 0.837 0.824 0.872 0.923
cuckoo 13.7125 63.4 19.1125 65.45 0.1072 0.5815 0.163 0.607 0.824 0.854 0.827 0.949
elephant 13.3 64.65 19.25 67.7 0.103 0.5994 0.1645 0.6327 0.801 0.854 0.761 0.95
firefly 13.8 66.4 19.8375 67.7 0.1082 0.6192 0.1706 0.6327 0.82 0.886 0.784 0.954
flower 13 64.55 21.0125 67.1 0.0999 0.5982 0.1827 0.6258 0.83 0.91 0.793 0.944
genetic 13.1625 63.35 21.2125 66.95 0.1015 0.5851 0.1848 0.6241 0.822 0.807 0.795 0.942
harmony 13.5625 58.6 17.4875 62.85 0.1057 0.5309 0.1463 0.5775 0.814 0.914 0.777 0.904
PSO 14.375 64.85 19.1125 65.5 0.1141 0.6016 0.1631 0.6076 0.822 0.875 0.783 0.919
wolf 14.2125 65.15 18.3875 68.35 0.1125 0.605 0.1555 0.64 0.832 0.874 0.776 0.944
Maximum 16.3125 66.45 23.5125 68.35 0.1342 0.6198 0.2084 0.64 0.845 0.914 0.872 0.959

device. In this strategy, wearable sensors are proposed because of their prevalence and affordable prices. A simple data fusion framework is therefore formulated that works purely at the data level, in software, without requiring extra hardware. The strategy basically repacks the Kinect data by adding relevant data from the wearable devices under common target labels (some call this context-aware). Then we run a fast feature selection to retain only the strongly correlated features over the combined features from both Kinect and wearable sensors. Experiment results ascertain that the proposed data fusion method improves the performance of the classification model using empirical data feeds. Furthermore, paired with incremental learning algorithms, swarm feature selection enhances the performance of the machine learning models up to 5 times over using the Kinect data alone without feature selection. As future work, this fusion model could be extended in two directions. One is to extend the testing scenarios to incorporate more different angles of field-of-view, including aerial views. This would have implications for human activity recognition using sensors installed near the ceiling at some corner of a room, or even on a flying drone; the current work considered only horizontal angles at the same eye level. The second direction is the use of other machine learning algorithms for benchmarking the performance. The present work considered the Hoeffding tree, which is popular among data stream mining algorithms, and the swarm decision table, which is proposed here.


Declaration of Competing Interest

The author(s) declare(s) that there is no conflict of interest.

CRediT authorship contribution statement

Tengyue Li: Writing - review & editing, Data curation, Software, Validation, Visualization. Simon Fong: Writing - original draft. Kelvin K.L. Wong: Project administration. Ying Wu: Conceptualization. Xin-she Yang: Supervision. Xuqi Li: Formal analysis.

Acknowledgements

The authors are thankful for the financial support from the research grants (1) MYRG2016-00069, entitled 'Nature-Inspired Computing and Metaheuristics Algorithms for Optimizing Data Stream Mining Performance', offered by RDAO/FST, the University of Macau and the Macau SAR government; and (2) 'A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel', Grant no. FDCT/126/2014/A3, by FDCT Macau.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.inffus.2020.02.001.

Appendix A. Formulation of swarm decision table

Swarm decision table

Suppose that there is a dataset S = {x_1, ..., x_t, ...} where x_t ∈ S is any of the instances inside S and t = 1...∞. Each instance has z attributes, which means the data are z-dimensional; the attribute space, which contains all kinds of combinations of attributes, is denoted by A. In order to make a decision, we divide the attribute set A into two disjoint subsets C and D, namely A = C ∪ D and C ∩ D = ∅. C is called the condition attribute set, which means the attributes in C will form the rules for decision, while D is called the decision attribute set, which holds the class of the instance; in the common case there is only one attribute in this set. For example, if "age" and "income" are in C and "quality" is in the decision set, then the logical combinations of "age" and "income" will form several rules, and each rule will output a value of the "quality" attribute for decision measurement. With this notation, the definition of a decision table is given below.

Definition 1. Decision table

A decision table [22] is an ordered quintuple DT = <S, R, A*, V, f> where S is a dataset, R is a non-empty finite set of rules, and A* is a subset of A; A* can also be divided into C* and D, where C* is a condition attribute set and D is the decision attribute set. Therefore there is a mapping function Ru from A*, S to R, namely Ru: S × A* → R, which means regarding the elements in the sub-attribute set A* as rule elements. Based on the specific dataset S we find the exact number of attribute values (normally continuous data will be transferred into discrete data), and finally use combination and rule reduction to generate a set of rules in which the rules are disjoint. A function f is an information function, f: S × R × A* → V, which inputs the dataset into the table and then allocates the instances according to the rules. It then forms a sub-table for each decision attribute value, namely for every class, and counts the number of instances of each class which belong to a specific rule. Finally, it completes the decision table. V is a collection of all kinds of unions of attribute domains.

In order to formulate the prediction process of the decision table clearly, a formulation of the process follows the general definition. First of all, the first function Ru is set in the algorithm as the default rule mapping, which guarantees that the rules are disjoint and non-empty throughout the whole process. The rule set is R = {r_1, r_2, r_3, ...} and all the elements inside R are logical combinations of the elements inside A*. For example, r_1 = "Age > 40" ∧ "income > 15,000". Then, for the second function f, how to construct a sub-table for each decision attribute will be discussed next. Here the sub-table is named the rule matrix.

Example:

              r_1             r_2              r_3  ...  r_{z-2}  r_{z-1}  r_z
A1 = age      A1 > 40         A1 > 40          ...
A2 = income   0 < A2 < 50     50 < A2 < 100    ...
D = Good      100             25               ...
D = Average   2               162              ...
D = Poor      41              0                ...

Definition 2. Rule matrix

Re^S is a real matrix with Re ∈ IR^{z × n_rule}, where z is the number of classes and n_rule is the number of rules based on the specific dataset S. Then the elements of the matrix are counted as:

Re^S_{i,j} = \sum_{x \in S} 1_{x \in \delta_{i,j}}, \quad \text{where } 1_{x \in \delta_{i,j}} = \begin{cases} 1 & \text{if } x \in \delta_{i,j} \\ 0 & \text{otherwise} \end{cases} \quad (1)

where δ_{i,j} = {x ∈ S | class(x) = i and m(x) = r_j} is the collection of instances which match the rule r_j while belonging to the ith class; the function class() maps the decision attribute value of x into the real number set. Without loss of generality, the classes of the dataset are numerically ordered beforehand for the convenience of computing. The function m() is used to map a data instance to the rule set. With the definitions of decision table and rule matrix, the prediction mechanism of the decision table is given below.

Definition 3. Decision table algorithm

DTM [23] is defined as the prediction function of a decision table, DTM: A* × S × T → IR, where A* is a sub-attribute set of A, S is a training dataset and T is a testing dataset. This function predicts a test instance's class using the decision table formed by S. If the test instance can be mapped into a rule of the decision table formed by S, then it outputs the majority class of the corresponding column of the rule matrix; otherwise, it outputs the majority class in the dataset. Ru(S, A*) = R.

pre_x = DTM(A^*, S, x) = \begin{cases} \sum_{r_p \in Ru(S, A^*)} 1_{m(x) = r_p} \cdot \arg\max_{i \in [1, z]} Re^S_{i,p} & \text{if } m(x) \in R \\ \arg\max_{i \in [1, z]} \frac{|\{y \in S \mid class(y) = i\}|}{|S|} & \text{if } m(x) \notin R \end{cases} \quad (2)

Then, after the prediction, the method for computing the accuracy of this algorithm is formulated below.

Definition 4. Accuracy

The accuracy function Acc is defined for computing the accuracy after the whole prediction, Acc: A* × S × T → IR. In this accuracy computation, the zero-one loss function L is involved.

acc_T = Acc(A^*, S, T) = 1 - \frac{\sum_{x \in T} L(DTM(A^*, S, x), class(x))}{|S|}

where

L(DTM(A^*, S, x), class(x)) = \begin{cases} 0 & \text{if } DTM(A^*, S, x) = class(x) \\ 1 & \text{otherwise} \end{cases} \quad (3)

In the decision table, there are three essential steps to construct the table. The first is to select efficient features (attributes), the second is to select the efficient rules, and the third is to construct the rule matrix from the training instances. In this paper, we focus on the first one, using Binary Particle Swarm Optimization (BPSO), a variant of PSO that is simpler in design, to do the feature selection which will find an optimal set of attributes A^* = \arg\max_{A' \subseteq A} \{Acc(A', S, T)\}.
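Definitions 1–3 can be illustrated with a small sketch: rules are predicates over the condition attributes, the rule matrix counts class hits per rule as in Eq. (1), and prediction follows Eq. (2) — the majority class of the matched rule's column, with a fall-back to the dataset's overall majority class. All names and data here are illustrative:

```python
from collections import Counter

# Illustrative rules over the (age, income) condition attributes
rules = [
    lambda x: x["age"] > 40 and 0 < x["income"] < 50,
    lambda x: x["age"] > 40 and 50 <= x["income"] < 100,
]

def match(x):
    """m(x): index of the first rule that x satisfies, or None."""
    return next((j for j, r in enumerate(rules) if r(x)), None)

def build_rule_matrix(data, classes):
    """Re[i][j] = number of training instances of class i matching rule j (Eq. (1))."""
    re = [[0] * len(rules) for _ in classes]
    for x, y in data:
        j = match(x)
        if j is not None:
            re[classes.index(y)][j] += 1
    return re

def predict(x, re, data, classes):
    """DTM (Eq. (2)): majority class of the matched rule's column,
    otherwise the overall majority class of the dataset."""
    j = match(x)
    if j is not None and any(re[i][j] for i in range(len(classes))):
        return classes[max(range(len(classes)), key=lambda i: re[i][j])]
    return Counter(y for _, y in data).most_common(1)[0][0]

data = [({"age": 45, "income": 30}, "Good"),
        ({"age": 50, "income": 60}, "Average"),
        ({"age": 52, "income": 70}, "Average"),
        ({"age": 30, "income": 20}, "Poor")]
classes = ["Good", "Average", "Poor"]
re = build_rule_matrix(data, classes)
print(predict({"age": 48, "income": 65}, re, data, classes))  # Average
```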


Theoretically, the PSO in this searching part of feature selection can be replaced by any other swarm search method.

Swarm decision table (with an example of PSO)

In order to optimize the feature set, firstly this paper transfers the attribute set A' ⊆ A into a binary vector, which means A' ∈ B^z, where z is the number of attributes in dataset S. Without loss of generality, these attributes are ordered and every element in A' is either 0 or 1, namely

(A')_i = \begin{cases} 1 & \text{if the } i\text{th attribute is included in the selection set} \\ 0 & \text{if the } i\text{th attribute is not included in the selection set} \end{cases} \quad (4)

Therefore, the target of this optimization problem can be formulated as

fit: B^z \times S \to IR \quad (5)

\min fit(A', S) = -\alpha_1 \cdot Acc(A', S, S) + \alpha_2 \cdot \frac{1}{n_{feature}} + \alpha_3 \cdot \frac{1}{time(A', S)} \quad (6)

where the number of attributes in the sub-attribute set A' is n_{feature} = \sum_{j \in A'} 1_{j \in \{1\}}, Acc(A', S, S) is the accuracy function, and time(A', S) calculates the running time. In other words, the influence factors we consider are the accuracy, the number of selected features and the running time. They are weighted by the weights α_1, α_2 and α_3, which can be set by the user to deal with different demands.

To construct an efficient decision table, reducing the number of features and thereby forming an optimal attribute subset is an important approach to making the decision table efficient.

In order to find the optimal attribute set A* which meets the target function, we use Binary Particle Swarm Optimization (BPSO) to approach the best. The algorithm sets up k particles for searching and initializes the locations {A^0_i}^k_{i=1}, A^0_i ∈ B^z, and velocities {v^0_i}^k_{i=1}, v^0_i ∈ B^z, where A^0_i = (A^0_{i,1}, ..., A^0_{i,d}, ..., A^0_{i,z}) and v^0_i = (v^0_{i,1}, ..., v^0_{i,d}, ..., v^0_{i,z}) for i = 1, ..., k. Then, for the ith particle, the personal best is P^0_i = A^0_i = (P^0_{i,1}, ..., P^0_{i,d}, ..., P^0_{i,z}) and the global best is G^0 = \arg\min_{A' \in \{P^0_i\}^k_{i=1}} \{fit(A', S)\}. To search for the best attribute set, we need the update functions ve_{i,d}(t) and At_{i,d}(t) to update the velocity and location of the dth attribute status of the ith particle.

ve_{i,d}: Z \to IR
v^t_{i,d} = ve_{i,d}(t) = w \cdot v^{t-1}_{i,d} + c_1 r_1 (P^{t-1}_{i,d} - A^{t-1}_{i,d}) + c_2 r_2 (G^{t-1}_d - A^{t-1}_{i,d}) \quad (7)

where c_1, c_2 are learning coefficients and r_1, r_2 are random variables, r_1, r_2 ∈ [0, 1]. At the same time, the weight w can be an adaptive function; w_{min} and w_{max} are set by the user, and fit_{min} and fit_{avg} are the minimal and average fitness values in the current search.

w = \begin{cases} w_{min} + \frac{(w_{max} - w_{min})(fit(P^{t-1}_i, S) - fit_{min})}{fit_{avg} - fit_{min}} & \text{if } fit(P^{t-1}_i, S) \le fit_{avg} \\ w_{max} & \text{otherwise} \end{cases} \quad (8)

After we update the velocity, the next step is to update the location. The location should be influenced by the searching velocity; however, because this is a binary problem, the velocity should be adjusted to fit the algorithm, so a series of sigmoid functions is chosen:

\{\{S_{i,d}(t)\}^z_{d=1}\}^k_{i=1}, \quad S_{i,d}: Z \to [0, 1] \quad (9)

S^t_{i,d} = S_{i,d}(t) = \frac{1}{1 + \exp(-v^t_{i,d})} \quad (10)

Based on the sigmoid functions, we get the location update functions:

At_{i,d}: Z \to \{0, 1\} \quad (11)

A^t_{i,d} = At_{i,d}(t) = \begin{cases} 1 & \text{if } rd^t_{i,d} > S^t_{i,d} \\ 0 & \text{otherwise} \end{cases} \quad (12)

where rd^t_{i,d} is a random variable and rd^t_{i,d} ∈ [0, 1]. Therefore, for each turn of updating, A^t_i = (A^t_{i,1}, ..., A^t_{i,d}, ..., A^t_{i,z}) for i = 1, ..., k; then the personal best is

Pe_i: Z \to B^z \quad (13)

P^t_i = Pe_i(t) = \begin{cases} A^t_i & \text{if } fit(A^t_i, S) < fit(P^{t-1}_i, S) \\ P^{t-1}_i & \text{otherwise} \end{cases} \quad (14)

And the global best is defined by:

Go: Z \to B^z \quad (15)

G^t = Go(t) = \arg\min_{A' \in \{P^t_i\}^k_{i=1}} \{fit(A', S)\} \quad (16)

The update time t and the other coefficients c_1, c_2 are set by the user, and r_1, r_2 are random variables. Finally, based on these definitions and notation, the BPSO function for feature selection can be formulated and defined.

Definition 5. Binary particle swarm optimization

Given the two learning coefficients c_1, c_2 as well as the iteration time tt, a Binary Particle Swarm Optimization function BPSO() is defined on a training dataset S, an attribute set A (according to S) and a fixed fit function, as follows, and finally maps to the optimal attribute set A*.

fit(A', S) = -\alpha_1 \cdot Acc(A', S, S) + \alpha_2 \cdot \frac{1}{n_{feature}} + \alpha_3 \cdot \frac{1}{time(A', S)} \quad (17)

BPSO_{fit}: S \times A \times IR \times IR \times Z \to A \quad (18)

A^* = BPSO_{fit}(S, A, c_1, c_2, tt) = \arg\min_{A' \in \{P^{tt}_i\}^k_{i=1}} \{fit(A', S)\} \quad (19)

With Definitions 3 and 5, a new algorithm is invented, using the decision table for prediction while applying BPSO as one of the many possible metaheuristic methods for feature selection.

Definition 6. A swarm decision table called BPSO-DTM

The prediction function of the BPSO-DTM algorithm is defined on a training dataset S, a test dataset T, an attribute set A, learning coefficients c_1, c_2 and a maximum iteration time tt, with the fit function

fit(A', S) = -\alpha_1 \cdot Acc(A', S, S) + \alpha_2 \cdot \frac{1}{n_{feature}} + \alpha_3 \cdot \frac{1}{time(A', S)}. \quad (20)

After the process of feature selection, rule mapping and rule matrix construction, the BPSO-DTM model is constructed and used for prediction. Let μ be:

\mu = \sum_{r_p \in Ru(S,\, BPSO_{fit}(S, A, c_1, c_2, tt))} 1_{m(x) = r_p} \cdot \arg\max_{i \in [1, z]} \frac{Re^S_{i,p}}{\sum_{k \in [1, z]} Re^S_{k,p}} \quad \text{if } m(x) \in R \quad (21)

BPSO_{fit}\text{-}DTM(S, A, c_1, c_2, tt, x) = \begin{cases} \mu & \text{if } m(x) \in R \\ \arg\max_{i \in [1, z]} \frac{|\{y \in S \mid class(y) = i\}|}{|S|} & \text{otherwise} \end{cases} \quad (22)

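Taken together, Eqs. (7)–(10) and (19) describe one full BPSO search loop. The following sketch is an illustrative reconstruction rather than the authors' implementation: the `fit` callback stands in for the composite fitness of Eq. (20) (any function mapping a bit mask to a score to be minimised will do), and the `seed_mask` parameter is our own addition mirroring the warm-start idea of the incremental BPSO described later in the paper.

```python
import math
import random

def bpso_feature_select(fit, z, k=10, tt=30, c1=2.0, c2=2.0,
                        w_min=0.4, w_max=0.9, seed_mask=None, rng=None):
    """Binary PSO over z attribute bits, following Eqs. (7)-(10), (19).

    fit(mask) -> float is MINIMISED (a stand-in for Eq. (20)); seed_mask
    optionally warm-starts one particle with a previous optimal mask.
    """
    rng = rng or random.Random(1)
    pos = [[rng.randint(0, 1) for _ in range(z)] for _ in range(k)]
    if seed_mask is not None:
        pos[0] = list(seed_mask)          # warm start (incremental BPSO)
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(z)] for _ in range(k)]
    pbest = [p[:] for p in pos]           # personal bests P_i
    pfit = [fit(p) for p in pos]
    g = min(range(k), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]    # global best G
    for _ in range(tt):
        f_min, f_avg = min(pfit), sum(pfit) / k
        for i in range(k):
            # adaptive inertia weight, Eq. (8)
            if pfit[i] <= f_avg and f_avg > f_min:
                w = w_min + (w_max - w_min) * (pfit[i] - f_min) / (f_avg - f_min)
            else:
                w = w_max
            for d in range(z):
                r1, r2 = rng.random(), rng.random()
                # velocity update, Eq. (7)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # sigmoid transfer, Eqs. (9)-(10), then stochastic bit assignment
                s = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < s else 0
            f = fit(pos[i])
            if f < pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f < gfit:
                    gbest, gfit = pos[i][:], f    # arg min of Eq. (19)
    return gbest, gfit
```

Seeding the swarm with the previous optimum can only keep or improve the best score, which is precisely why the incremental version can converge in fewer iterations.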
T. Li, S. Fong and K.K.L. Wong et al. Information Fusion 60 (2020) 41–64
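At prediction time, Eqs. (21) and (22) reduce to a table lookup with a majority-class fallback: because the denominator Σ_k Res_{k,p} is constant within a rule, the arg max over the normalised scores equals the arg max over the raw counts. A minimal sketch follows; the container names are ours, not the paper's.

```python
def dtm_predict(rule_counts, class_counts, x_rule):
    """Decision-table prediction following Eqs. (21)-(22).

    rule_counts: rule id -> list of per-class counts Res_{i,p} for that rule;
    class_counts: per-class counts over the whole training set S;
    x_rule: rule id that m(x) maps the instance to, or None when x
    matches no rule in R.
    """
    if x_rule in rule_counts:
        # Eq. (21): majority class of the matched rule (normalising by
        # the per-rule total does not change the arg max)
        counts = rule_counts[x_rule]
        return max(range(len(counts)), key=counts.__getitem__)
    # Eq. (22): fall back to the global majority class of S
    return max(range(len(class_counts)), key=class_counts.__getitem__)
```

For example, an instance matching a rule whose counts are [2, 5] is assigned class 1, while an unmatched instance receives the majority class of the whole training set.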

Long-term and short-term swarm decision table

With the formulation of the swarm decision table, we extend this model to an incremental learning model suitable for data stream mining. Being an incremental learning model means that the training data of this decision table become a stream of sequential data. When training data are generated every second or less, without end, the prediction model cannot be static, because the information and the underlying prediction rules of a data stream may vary with time. Therefore, three issues arise when implementing the traditional swarm decision table in an incremental version.

Before introducing these problems, it is essential to describe how the new model predicts over a data stream. Because a data stream is potentially infinite, it is unrealistic to train the model on the whole data. For this reason, a temporary buffer is constructed to collect a portion of the data for prediction. The buffer picks up that portion of data from the stream and stores it for training the model. When new data arrive, some space in the buffer is freed up for the incoming data, and the prediction model is then updated based on the current data in the buffer. This is the basic process of the incremental swarm decision table.

However, some issues need to be tackled. It is common sense that the quantity and quality of training data matter much more for training a prediction model in supervised learning than in unsupervised learning. Therefore, the first challenge is how to decide the length of the buffer. Secondly, the strategy for clearing data from the buffer is also important. Thirdly, how do we make the swarm decision table algorithm incremental; in other words, how do we make the feature selection algorithm incremental so that it stays synchronized with the SDT framework? These three questions are explored in the following sections.

In this algorithm, the data in the buffer change with time, a process much like forgetting and updating memory according to some strategy. Like a human brain, we keep the more important and special memories, update the memories that are redefined, and forget those memories or experiences that are no longer valid. That is why we choose to update the buffer based on such strategies. Because this is a long-term updating process, the buffer is also called the long-term memory buffer. However, the long-term memory updating process cannot work independently: the principles of updating and forgetting are closely related to the decision table. This is why, in the flow chart, two arrows connect to the long-term buffer updating process.

After explaining the necessity of updating the long-term buffer, the next question is whether the bound (size) of the buffer is fixed. In our case, the window size is set at 1000. Sequential data in the buffer are ordered by time, so there is a high probability of a working cycle in them. For example, if the data are collected from city traffic, the cycle may be 1 day or 1 week. With this knowledge, if we pick 1 week or 1 day of data for training the model each time, the prediction accuracy and efficiency can be much better than with a randomly assigned buffer size. Meanwhile, because stream data are generated over time, the measured object may change its habits or working patterns with time. If we can detect the cycle of the data in an efficient way, then the size of the buffer can be adjusted accordingly. In other words, the size of the bound is meant to simulate the memory cycle.

If the two questions above can be solved, this new incremental technique is formulated. However, the swarm decision table, the main prediction model, used to work on a static training dataset. In this paper, we make it applicable to incremental training data. Because the core idea of the decision table is swarm feature selection, if we want to make the decision table incremental [24], then inventing incremental swarm feature selection is the core task. The advantage of applying incremental feature selection is that it reduces the computing time of table construction. Otherwise, once the data are updated, the old decision table would be computed again from scratch, which wastes much time. Instead, the incremental swarm feature selection inherits the last optimal feature set as the initial starting feature set for the current feature search. Because the previous and current datasets have many similar instances, the optimal feature sets of the two datasets may be very close. Based on this assumption, if in the incremental feature selection we set the last optimal feature set as one of the initial feature sets, it has a much higher probability of reaching the optimal feature set in fewer iterations. This makes the algorithm more efficient.

Besides the feature selection algorithm, counting the major class of the rules is the other essential step of constructing a decision table. In the previous counting part of the decision table, every instance in the training dataset is regarded as equally important, which means a count of 1 for every instance. But in this new incremental swarm decision table, new data are more vital than old data. So we set up a depreciating rate to depreciate the counting of old data, and also set up a depreciating coefficient for each rule to record the long-term change; this coefficient helps to control the process of forgetting objects.

With these implementations in place, the swarm decision table becomes an intelligent algorithm which is not only applicable to data streams but also updates long-term memory, adjusts the memory cycle and increases the computing speed compared to the previous version. In the following sections, the details of the solutions to the three problems are introduced.

Bound adjustment

The adjustment of the buffer's size is meant to simulate the memory cycle of the detected object. In order to find the cycle that best represents the status of a person, the SVD method is applied to analyze the data; the decomposition yields the several most important features. These important features are thought to describe the detected object from the best angle. Therefore, the cycle problem reduces to analyzing the cycle of these important features. The formulation of this decomposition is given below.

Suppose there is a sequential dataset A which excludes all discrete attributes, and A is transferred into matrix format, A ∈ IR^{m×n}, where m is the number of instances in the dataset and n is the number of attributes of A. Then, for singular value decomposition,

A_{m×n} = U_{m×k} Σ_{k×k} V_{k×n}    (23)

where the columns of U are the left singular vectors of A, the rows of V are the right singular vectors of A, and Σ is the diagonal matrix whose diagonal elements are the singular values of A. Here U = [u1 | u2 | … | uk] and uk is a left singular vector. In this design, these singular vectors are the features obtained from SVD. Since the values of the elements in Σ represent the importance of the column features, the most important r features (r < k) are picked from U to find the cycle.

Incremental BPSO

In this algorithm, once the decision table has been established in the beginning step, the rule table is reused in the next run. However, this situation does not last forever. Once the prediction accuracy is found to be not good enough, or the rule matrix becomes confused, the feature selection needs to be redone. In this design, incremental BPSO is used as an example to find the new optimal attribute set; other metaheuristics could replace it for this task of incremental feature selection. Compared to BPSO, the incremental version inherits the last optimal feature set as the initial starting feature set for the current feature search. This process helps save computation time.
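The SVD-based bound adjustment described above can be sketched as follows. The paper decomposes the data (Eq. (23)) and reads the memory cycle off the most important features, but does not spell out the cycle estimator; here we use the first local maximum of the leading left singular vector's autocorrelation, so both the function name and that estimator are our own assumptions.

```python
import numpy as np

def detect_cycle_length(A):
    """Estimate a dominant cycle from sequential data via SVD (Eq. (23)).

    A: (m x n) matrix of numeric attributes, rows ordered by time.
    The leading left singular vector summarises the instances; its period
    is read off the first local maximum of its autocorrelation, and the
    result can then serve as the size (bound) of the buffer.
    """
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    u = U[:, 0] - U[:, 0].mean()          # most important feature, centred
    ac = np.correlate(u, u, mode="full")[u.size - 1:]  # lags 0..m-1
    # walk past the initial decay to the first local maximum (the period)
    lag = 1
    while lag + 1 < ac.size and not (ac[lag] >= ac[lag - 1] and ac[lag] >= ac[lag + 1]):
        lag += 1
    return lag
```

The returned lag would then replace the fixed window size of 1000 as the buffer bound.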
Deprecated counting

When long-term data and short-term data are used together to construct a rule matrix, different weights are given when counting long-term data and short-term data. Here the depreciated counting is introduced. Since the feature selection may not be carried out on every arrival of short-term data, the rule table may remain static for several terms. Therefore, based on a temporarily static rule table, we can perform two kinds of operations. First of all, a row of depreciated coefficients can be added as one of the attributes. The depreciated coefficient is defined below.

Definition 7. Accumulated depreciated coefficient

Suppose there is a decision table with selected features and a rules table. This decision table may stay static for several turns of incremental data, which means that datasets S1, S2, …, ST share the same decision table. Each time, a different amount of data is classified under specific rules. Then the accumulated depreciated coefficient of rule i at the turn of dataset ST is defined as

dep_i^T = γ × Σ_{x∈S_1} 1_{m(x)=r_i}    if T = 1
dep_i^T = γ × (dep_i^{T−1} + Σ_{x∈S_T} 1_{m(x)=r_i})    otherwise    (24)

where γ is the depreciated rate.

Definition 8. Depreciated counting

Suppose there is a decision table available; long-term data are denoted as LS_T and short-term data as SS_T. The whole training dataset is S_T = LS_T ∪ SS_T. Then the rule matrix in the incremental swarm decision table is defined as

Re_{i,j}^{S_T} = γ × Σ_{x∈LS_T} 1_{x∈δ_{i,j}} + Σ_{x∈SS_T} 1_{x∈δ_{i,j}}    (25)

Here γ is the depreciated rate. Depreciated counting is a way to weigh the importance of the instances between long term and short term. γ is always lower than 1, because an event that happened in the past may not have such a strong influence, but it still has some impact.

Long term buffer updating

The long-term buffer grows incrementally. If new data arrive and form a new rule, we keep these data in the long-term buffer. If the size of the buffer has grown larger than the total number of instances in the original buffer plus the short-term data, we simply add the new data without forgetting. Under this principle, something similar to the accumulated depreciated coefficient is defined, called the accumulated major class depreciated coefficient.

Definition 9. Accumulated major class depreciated coefficient

Suppose there is a decision table with selected features and a rules table. This decision table may stay static for several turns of incremental data, which means that datasets S1, S2, …, ST share the same decision table. Each time, a different amount of data is classified under specific rules and belongs to specific classes. Then the accumulated major class depreciated coefficient of rule i at the turn of dataset ST is defined as:

mc_dep_i^T = γ × Σ_{x∈S_T} 1_{x∈Φ_i}    if the major class of rule i in turn T−1 is not the same
mc_dep_i^T = γ × (mc_dep_i^{T−1} + Σ_{x∈S_T} 1_{x∈Φ_i})    otherwise    (26)

where γ is the depreciated rate and Φi is the collection of the major class of rule i. If short-term data arrive and change the major class of a specific rule, which means the count of the new major class is larger than the accumulated major class depreciated coefficient, then the long-term buffer keeps the data from the major class and forgets the other data. If the buffer is not large enough to hold all the data after these operations, it forgets the same proportion of the data associated with each rule until the capacity is met.

References

[1] O.D. Lara, M.A. Labrador, A survey on human activity recognition using wearable sensors, IEEE Commun. Surv. Tutor. 15 (3) (2012) 1192–1209.
[2] D.L. Hall, J. Llinas, An introduction to multisensor data fusion, Proc. IEEE 85 (1) (1997) 6–23.
[3] H.F. Durrant-Whyte, Sensor models and multisensor integration, Int. J. Robot. Res. 7 (6) (1990) 97–113.
[4] H. Du, Y. Zhao, J. Han, Z. Wang, G. Song, Data fusion of multiple Kinect sensors for a rehabilitation system, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016, pp. 4869–4972.
[5] S. Fong, W. Song, R. Wong, C. Bhatt, D. Korzun, Framework of temporal data stream mining by using incrementally optimized very fast decision forest, in: Internet of Things and Big Data Analytics Toward Next Generation Intelligence, Studies in Big Data, vol. 30, Springer, 2018. ISBN 978-3-319-60434-3.
[6] B. Ma, S. Fong, Y. Zhuang, R.C. Millham, Data stream mining in fog computing environment with feature selection using ensemble of swarm search algorithms, in: 2018 Conference on Information Communications Technology and Society (ICTAS), 2018.
[7] P. Ghamisi, et al., Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art, IEEE Geosci. Remote Sens. Mag. 7 (1) (2019) 6–39.
[8] S. Fong, W. Song, K. Cho, R. Wong, K.K.L. Wong, Training classifiers with shadow features for sensor-based human activity recognition, Sensors 17 (3) (2017) 476.
[9] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed., Elsevier, 2002.
[10] A. Kadambi, A. Bhandari, R. Raskar, 3D depth cameras in vision: benefits and limitations of the hardware, in: L. Shao (Ed.), Computer Vision and Machine Learning with RGB-D Sensors, Advances in Computer Vision and Pattern Recognition, Springer, 2014.
[11] O. Patsadu, C. Nukoolkit, B. Watanapa, Human gesture recognition using Kinect camera, in: Ninth International Joint Conference on Computer Science and Software Engineering (JCSSE), 2012, pp. 28–32.
[12] B. Bonnechère, B. Jansen, P. Salvia, H. Bouzahouene, L. Omelina, J. Cornelis, M. Rooze, S. Van Sint Jan, What are the current limits of the Kinect sensor, in: Proc. 9th Intl Conf. Disability, Virtual Reality & Associated Technologies, Laval, France, 2012, pp. 287–294.
[13] M. Naeemabadi, B. Dinesen, O. Andersen, S. Najafi, J. Hansen, Evaluating accuracy and usability of Microsoft Kinect sensors and wearable sensor for tele knee rehabilitation after knee operation, in: Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018), Volume 1: BIODEVICES, pp. 128–135.
[14] S. Fong, R.P. Biuk-Aghai, R.C. Millham, Swarm search methods in Weka for data mining, in: 10th International Conference on Machine Learning and Computing (ICMLC 2018), Macau, China, 2018, pp. 122–127.
[15] A.B.H. Mohamed, T. Val, L. Andrieux, A. Kachouri, Assisting people with disabilities through Kinect sensors into a smart house, in: 2013 International Conference on Computer Medical Applications (ICCMA), 2013.
[16] H.M. Hondori, M. Khademi, C.V. Lopes, Monitoring intake gestures using sensor fusion (Microsoft Kinect and inertial sensors) for smart home tele-rehab setting, in: 1st Annual IEEE Healthcare Innovation Conference of the IEEE EMBS, 2012, pp. 1–4.
[17] W. Zhao, et al., A privacy-aware Kinect-based system for healthcare professionals, in: 2016 IEEE International Conference on Electro Information Technology (EIT), 2016, pp. 205–210.
[18] P. Paliyawan, C. Nukoolkit, P. Mongkolnam, Office workers syndrome monitoring using Kinect, in: The 20th Asia-Pacific Conference on Communication (APCC2014), 2014, pp. 58–63.
[19] S. Sukreep, P. Mongkolnam, C. Nukoolkit, Detect the daily activities and in-house locations using smartphone, Recent Adv. Inf. Commun. Technol. (2015) 215–225.
[20] S. Sukreep, K. Elgazzar, H. Chu, P. Mongkolnam, C. Nukoolkit, iWatch: a fall and activity recognition system using smart devices, Int. J. Comput. Commun. Eng. 8 (1) (2019) 18–31.
[21] Best-first search method, https://blog.csdn.net/highkit/article/details/7326167 (last accessed 14 March 2019).
[22] R. Kohavi, The power of decision tables, ECML (1995), doi:10.1007/3-540-59286-5_57.
[23] R. Kohavi, D. Sommerfield, J. Dougherty, Data mining using MLC++, a machine learning library in C++, Int. J. Artif. Intell. Tools 06 (04) (1997) 537–566.
[24] L. Guan, An incremental updating algorithm of attribute reduction set in decision tables, in: Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009, pp. 421–425.
[25] R. Tang, S. Fong, Wolf search algorithm with ephemeral memory, in: Seventh International Conference on Digital Information Management (ICDIM 2012), 2012, doi:10.1109/ICDIM.2012.6360147.
[26] X.-S. Yang, Flower pollination algorithm for global optimization, in: Unconventional Computation and Natural Computation, UCNC 2012, Lecture Notes in Computer Science, vol. 7445, Springer, Berlin, Heidelberg, doi:10.1007/978-3-642-32894-7_27.
[27] Microsoft Kinect hardware specification, https://blogs.msdn.microsoft.com/kinectforwindows/2012/01/20/near-mode-what-it-is-and-isnt/ (last accessed 10 October 2019).
[28] M.M. Hassan, Md.G.R. Alam, Md.Z. Uddin, Md.S. Huda, A. Almogren, G. Fortino, Human emotion recognition using deep belief network architecture, Inf. Fusion 51 (2019) 10–18.
[29] R. Gravina, C. Ma, P. Pace, G. Aloi, W. Russo, W. Li, G. Fortino, Cloud-based Activity-aaService cyber-physical framework for human activity monitoring in mobility, Future Gener. Comp. Syst. 75, 158.
[30] R. Gravina, P. Alinia, H. Ghasemzadeh, G. Fortino, Multi-sensor fusion in body sensor networks: state-of-the-art and research challenges, Inf. Fusion 35 (2017) 68–80.
[31] G. Fortino, D. Parisi, V. Pirrone, G.D. Fatta, BodyCloud: a SaaS approach for community body sensor networks, Future Gener. Comp. Syst. 35 (2014) 62–79.
[32] S. Iyengar, F.T. Bonda, R. Gravina, A. Guerrieri, G. Fortino, A.L. Sangiovanni-Vincentelli, A framework for creating healthcare monitoring applications using wireless body sensor networks, BODYNETS 8 (2008).
[33] G. Fortino, A. Guerrieri, F.L. Bellifemine, R. Giannantonio, SPINE2: developing BSN applications on heterogeneous sensor nodes, SIES (2009) 128–131.