IMPROVING SUPERVISED CLASSIFICATION OF
DAILY ACTIVITIES LIVING USING NEW COST
SENSITIVE CRITERION FOR C-SVM
M’hamed Bilal Abidine, Belkacem Fergani
Speech Communication & Signal Processing Laboratory.
Faculty of Electronics and Computer Sciences
USTHB, Algiers, Algeria
abidineb@hotmail.com, bfergani@gmail.com

ABSTRACT
The growing population of elderly people in society calls for a new approach to caregiving. By
inferring which activities elderly people perform in their homes, it is possible to assess their
physical and cognitive capabilities. In this paper we show the potential of important
discriminative classifiers, namely the Soft-Support Vector Machines (C-SVM), Conditional
Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor
patterns in a smart home environment. We also address the class imbalance problem in the activity
recognition field, which is known to hinder the learning performance of classifiers. Cost-sensitive
learning is attractive under most imbalanced circumstances, but it is difficult to
determine the precise misclassification costs in practice. We introduce a new criterion for
selecting a suitable cost parameter C for the C-SVM method. Through our evaluation on four
real-world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed
criterion outperforms the state-of-the-art discriminative methods in activity recognition.

KEYWORDS
Activity Recognition, C-SVM, Wireless Sensor Networks, Machine Learning, Imbalanced Data

1. INTRODUCTION
In 2030, nearly one out of two households will include someone who needs help performing basic
Activities of Daily Living (ADL) [1] such as cooking, brushing, dressing, toileting, bathing and
so on. For their comfort, and because the healthcare infrastructure will not be able to handle this
growth, it is suggested to assist sick or elderly people at home. Sensor-based technologies in the
home are the key to this problem. The collected sensor data often need to be analysed using data
mining and machine learning techniques to build activity models and perform further
pattern recognition [2, 3]. Such models are usually learned in a supervised manner
(human labelling), which requires large annotated datasets recorded in different settings.
Recognizing a predefined set of activities is a classification task: features are extracted from
the signals gathered by the sensors within a time window and then used to infer the activity. The
classification algorithm has to be trained using a set of samples representing the activities that
have to be recognized.

Sundarapandian et al. (Eds) : ICAITA, SAI, SEAS, CDKP, CMCA-2013
pp. 415–424, 2013. © CS & IT-CSCP 2013

DOI : 10.5121/csit.2013.3834
Computer Science & Information Technology (CS & IT)


State-of-the-art methods for recognizing activities can be divided into two main categories: the
so-called generative models and discriminative models [5-8]. Generative methods perform
well but require modelling the data, are marred by generic optimization criteria and are generally
time consuming. Discriminative methods have received the most attention in the literature because
of their simple models and good performance. Therefore, in this paper we study different
discriminative classification methods.
However, activity recognition datasets are generally imbalanced, meaning that certain activities occur
more frequently than others (e.g. sleeping is generally done once a day, while toileting is done
several times a day). The learning system may then have difficulty learning the concept
related to the minority class, and ignoring this class imbalance yields an
evaluation that may lead to disastrous consequences for the elderly person. Recently, the class
imbalance problem has been recognized as a crucial problem in machine learning [9-12]. Most
classifiers assume a balanced distribution of classes and equal misclassification costs for each
class; therefore, they perform poorly in predicting the minority class for imbalanced data [13].
They optimize the overall classification accuracy and hence sacrifice the prediction performance
on the minority classes. Compared with other standard classifiers, SVM is more accurate on
moderately imbalanced data. The reason is that only support vectors are used for classification,
and many majority samples far from the decision boundary can be removed without affecting
classification [3]. However, it has been shown that the separating hyperplane of an SVM
model trained on an imbalanced dataset can be skewed towards the minority class [14], and
this skewness can degrade the performance of the model with respect to the minority class.
Previous research has aimed to improve the effectiveness of SVM on imbalanced classification
[14-16], and some good results have been reported [10]. Approaches for addressing the
imbalanced training-data problem can be categorized into two main divisions: the data processing
approach and the algorithmic approach. At the data level, the solutions are
oversampling [14] (in which new samples are created for the minority class), undersampling [14]
(in which samples are eliminated from the majority class), or some combination of the two.
Vilarino et al. used Synthetic Minority Oversampling TEchnique (SMOTE) [17]
oversampling. At the algorithmic level, the solutions include adjusting the costs associated with
misclassification so as to improve performance [18, 19], adjusting the probabilistic estimate at the
tree leaf (when working with decision trees), adjusting the decision threshold, and
recognition-based (i.e., learning from one class) rather than discrimination-based (two-class) learning [14].
Akbani et al. proposed the SMOTE with Different Costs algorithm (SDC) [14]. SDC conducts
SMOTE oversampling on the minority class with different error costs. Wu et al. proposed the
Kernel Boundary Alignment algorithm (KBA), which adjusts the boundary toward the majority class
by modifying the kernel matrix [15]. In addition to the naturally occurring class imbalance
problem, the imbalanced data situation may also occur in the one-against-rest scheme in multiclass
classification. Therefore, even when the training data is balanced, issues related to the class
imbalance problem can frequently surface.
Our objective is to deal with the class imbalance problem in order to perform automatic recognition of
activities from binary sensor patterns in a smart home. The main contribution of our work is
twofold. Firstly, we propose a new criterion to select the cost parameter C for the discriminative
method Soft-Support Vector Machines (C-SVM) [3, 7], to appropriately tackle the
class imbalance of activity datasets. Secondly, this method is compared with
Conditional Random Fields (CRF) [5], k-Nearest Neighbors (k-NN) [2] and the traditional
SVM, which are used as reference methods. In particular, CRF is a discriminative probabilistic
model that has recently gained popularity and works well in the
activity recognition field [5].
The remainder of this paper is organized as follows, Section 2 describes the different
discriminative methods and the weighted C-SVM method combined with our proposed criterion
for parameter C setting. Then, Section 3 presents the setup and discusses the results acquired
through a series of experiments using different datasets. Finally, we conclude in Section 4.

2. DISCRIMINATIVE METHODS FOR ACTIVITY RECOGNITION
2.1. Conditional Random Fields (CRF)
Conditional Random Fields (CRF) use an exponential model (1) for the conditional probability
of the entire sequence of labels Y given an input observation sequence X. A CRF is defined by a
weighted sum of K feature functions $f_i$, each of which returns 0 or 1 depending on the values of the
input variables and therefore determines whether a potential should be included in the calculation.
Each feature function carries a weight $\lambda_i$ that gives its strength for the proposed label. These
weights are the parameters we want to find when learning the model. CRF model parameters can
be learned using an iterative gradient method by maximizing the conditional probability
distribution defined as
$$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{K} \lambda_i f_i(y_t, y_{t-1}, x_t) \right) \tag{1}$$

with

$$Z(X) = \sum_{y} \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{K} \lambda_i f_i(y_t, y_{t-1}, x_t) \right) \tag{2}$$

One of the main consequences of this choice is that while learning the parameters of a CRF we
avoid modelling the distribution of the observations, p(x). As a result, we can only use CRF to
perform inference (and not to generate data), which is a characteristic of the discriminative
models. To find the label y for new observed features, we take the maximum of the conditional
probability.
$$\hat{y}(x) = \arg\max_{y} \, p(y \mid x) \tag{3}$$
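
Equations (1) to (3) can be illustrated on a toy problem. The following is a minimal sketch, not the implementation used in this paper: the label set, the three feature functions and their weights are hypothetical, and the partition function Z(X) is computed by enumerating every label sequence, which is only feasible for very short sequences (practical CRF software uses dynamic programming instead).

```python
from itertools import product
from math import exp

LABELS = ["sleep", "toilet"]  # hypothetical label set

# Hypothetical binary feature functions f_i(y_t, y_{t-1}, x_t) with weights lambda_i.
FEATURES = [
    (1.5, lambda y, y_prev, x: int(y == "sleep" and x == "bed")),
    (2.0, lambda y, y_prev, x: int(y == "toilet" and x == "flush")),
    (0.5, lambda y, y_prev, x: int(y == y_prev)),  # transition: stay in the same activity
]


def score(labels, observations):
    """Sum over t and i of lambda_i * f_i(y_t, y_{t-1}, x_t)."""
    total = 0.0
    prev = None  # there is no previous label at t = 1
    for y, x in zip(labels, observations):
        total += sum(weight * f(y, prev, x) for weight, f in FEATURES)
        prev = y
    return total


def conditional_probabilities(observations):
    """Return P(Y | X) for every label sequence Y by explicit enumeration."""
    sequences = list(product(LABELS, repeat=len(observations)))
    unnormalised = {seq: exp(score(seq, observations)) for seq in sequences}
    z = sum(unnormalised.values())  # partition function Z(X), equation (2)
    return {seq: value / z for seq, value in unnormalised.items()}


observations = ["bed", "bed", "flush"]
probs = conditional_probabilities(observations)
best = max(probs, key=probs.get)  # equation (3): most probable label sequence
print(best, round(probs[best], 3))
```

Because the model is normalised over label sequences only, nothing in this sketch describes how the observations themselves are distributed.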

2.2. k-Nearest Neighbors (k-NN)
The k-Nearest Neighbors (k-NN) algorithm is amongst the simplest of all machine learning
algorithms [2], and is therefore easy to implement. The m training instances $x \in R^n$ are vectors in an
n-dimensional feature space, each with a class label. A new query is assigned the class held by
the majority of its k nearest neighbours. The classifier does not fit any model and
relies only on memory to store the feature vectors and class labels. The k nearest neighbours are
the training instances with the smallest distance to the unlabelled vector (a test point).
The positive integer k is a user-defined constant. Usually the Euclidean distance is used as
the distance metric.
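
The procedure described above can be sketched in a few lines. This is an illustrative sketch only: the function name `knn_predict` and the toy data are hypothetical, and ties in the vote are resolved by whichever label is encountered first among the nearest neighbours, a choice the text does not specify.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Rank all stored training instances by Euclidean distance to the query.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["Sleeping", "Sleeping", "Sleeping", "Leaving", "Leaving", "Leaving"]
print(knn_predict(train_x, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_x, train_y, (5.5, 5.0), k=3))
```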

2.3. C-Support Vector Machines (C-SVM)
SVM classifies data by determining a separating hyperplane in a higher-dimensional space (feature space)
[3]. For a two-class problem, we assume that we have a training set $\{(x_i, y_i)\}_{i=1}^{m}$, where
$x_i \in R^n$ are the observations and $y_i \in \{+1, -1\}$ are the class labels. The primal formulation of the
soft-margin SVM simultaneously maximizes the margin $2/\|w\|$ between the two classes, where
$\|w\|^2 = K(w,w)$, and minimizes the total amount of misclassification (training errors) $\xi_i$ by
solving the following optimization problem:

$$\min_{w,b,\xi} \ \frac{1}{2} K(w,w) + C \sum_{i=1}^{m} \xi_i \tag{4}$$

subject to $y_i (w^T \varphi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, m$

where w is the normal to the hyperplane, b is the offset of the hyperplane from the origin
and $\varphi(\cdot)$ is a non-linear function which maps the input space into a feature space defined by
the kernel $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$.

Figure 1. C-SVM classification problem: The classes are linearly separated in a feature space

We choose the popular Radial Basis Function (RBF) kernel:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$$

where σ is the width parameter. It is a reasonable first choice for the classification of
nonlinear datasets, as it has few parameters. The construction of such functions is described by
the Mercer conditions [20]. The regularization parameter C controls the trade-off
between maximizing the margin width and minimizing the number of training errors on
nonseparable samples, in order to avoid overfitting [2]. A small value of C will
increase the number of training errors, while a large C will lead to a behavior similar to that of a
hard-margin SVM. In practice the parameters (σ and C) are varied through a wide range of
values and the optimal performance is assessed using a cross-validation technique
on the training set only [20].
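
To make the role of the width parameter concrete, the following minimal sketch evaluates the RBF kernel and the resulting kernel matrix for a few values of σ taken from the range explored later in the experiments. The helper names and the three sample vectors are hypothetical.

```python
from math import exp


def rbf_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(points, sigma=1.0):
    """Kernel matrix with entries K(x_i, x_j) for all pairs of points."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


points = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]  # hypothetical binary sensor vectors
for sigma in (0.1, 1.0, 2.0):
    first_row = gram_matrix(points, sigma)[0]
    print(sigma, [round(value, 3) for value in first_row])
```

With a small σ the kernel value between distinct points is close to 0, so each sample is similar only to itself; with a larger σ distinct points become more similar and the decision boundary becomes smoother.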
The dual formulation of the soft margin SVM can be solved by representing it as a Lagrangian
optimization problem as follows [3] :
$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{5}$$

subject to $\sum_{i=1}^{m} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$

Solving (5) for α gives the decision function in the original space for classifying a test point
$x \in R^n$ [3]:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{m_{sv}} \alpha_i y_i K(x, x_i) + b \right) \tag{6}$$

where $m_{sv}$ is the number of support vectors $x_i \in R^n$ and $\alpha_i$ are the Lagrange multipliers. The
training samples for which $\alpha_i > 0$ are called support vectors.
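
As an illustration of the decision function (6), the sketch below classifies two test points with a hand-written model. The support vectors, multipliers and bias are hypothetical values chosen for readability rather than the result of solving (5), and the convention of mapping sgn(0) to +1 is an assumption.

```python
from math import exp


def rbf(x, z, sigma=1.0):
    """RBF kernel between two vectors."""
    return exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def decide(x, support_vectors, alphas, labels, b, kernel=rbf):
    """Equation (6): sign of sum_i alpha_i * y_i * K(x, x_i) + b over the support vectors."""
    activation = sum(
        a * y * kernel(x, sv) for sv, a, y in zip(support_vectors, alphas, labels)
    )
    return 1 if activation + b >= 0 else -1  # assumption: sgn(0) is mapped to +1


# Hypothetical model: two support vectors with equal multipliers.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]  # satisfies sum_i alpha_i * y_i = 0
labels = [+1, -1]
b = 0.0
print(decide((0.2, 0.1), support_vectors, alphas, labels, b))
print(decide((1.9, 2.2), support_vectors, alphas, labels, b))
```

Only the support vectors enter the sum, which is why training samples far from the boundary do not affect the prediction.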

In this study, the software package LIBSVM [21] was used to implement the multiclass classifier.
It uses the One-vs-One method [3]. Although SVMs often produce effective solutions
for balanced datasets, they are sensitive to imbalanced training datasets and produce sub-optimal
models, because the constraint $\sum_i \alpha_i y_i = 0$ in (5) imposes equal total influence from the positive and negative
support vectors. To cope with imbalanced sample sets, we choose the weighted C-SVM
formulation [3] and we propose a new criterion for tuning the parameter C.
2.3.1. Weighted SVM

In this method, the SVM soft-margin objective function is modified to assign two different
penalty parameters $C_+$ and $C_-$ to the majority (positive) and minority (negative) classes respectively, as given in the
quadratic optimization problem below
$$\min_{w,b,\xi} \ \frac{1}{2} K(w,w) + C_+ \sum_{i \mid y_i = +1} \xi_i + C_- \sum_{i \mid y_i = -1} \xi_i \tag{7}$$

subject to $y_i (w^T \varphi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, m$

The SVM dual formulation gives the same Lagrangian as in the original soft-margin SVM in (5),
but with different constraints on $\alpha_i$ as follows:

$$\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{8}$$

subject to $0 \le \alpha_i \le C_+$ if $y_i = +1$, and $0 \le \alpha_i \le C_-$ if $y_i = -1$

In the construction of a cost-sensitive SVM, the cost parameter plays an indispensable role.
Some authors [18, 19] have proposed using different penalty parameters for
different classes of data, which effectively improves the low classification accuracy caused by
imbalanced samples. For example, it is easy to achieve a high classification accuracy
by simply assigning all samples to the class with the majority of samples (positive observations),
so that all training errors fall on the minority class (negative observations). Veropoulos et al. [19]
propose to increase the tradeoff associated with the minority class (i.e., $C_- > C_+$) to eliminate the
imbalance effect. However, they do not suggest any guidelines for deciding what the
relative ratio of the positive to negative cost factors should be.
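
The effect of unequal penalties in (7) can be seen on a small example. The sketch below is illustrative only: it assumes a linear kernel, so that K(w,w) is the dot product of w with itself, it takes each slack as the smallest value allowed by the constraints, and the data, weight vector and function name are hypothetical.

```python
def weighted_objective(w, b, samples, labels, c_pos, c_neg):
    """Value of objective (7) for a linear kernel, where K(w, w) = w . w.

    Each slack is the smallest value satisfying the constraints:
    xi_i = max(0, 1 - y_i * (w . x_i + b)).
    """
    margin_term = 0.5 * sum(wk * wk for wk in w)
    slack_pos = slack_neg = 0.0
    for x, y in zip(samples, labels):
        output = sum(wk * xk for wk, xk in zip(w, x)) + b
        slack = max(0.0, 1.0 - y * output)
        if y == 1:
            slack_pos += slack
        else:
            slack_neg += slack
    return margin_term + c_pos * slack_pos + c_neg * slack_neg


# Hypothetical 1-D data: three majority (+1) samples and one minority (-1) sample.
samples = [(2.0,), (3.0,), (4.0,), (0.5,)]
labels = [1, 1, 1, -1]
w, b = (1.0,), 0.0  # a boundary at x = 0 that misclassifies the minority sample

print(weighted_objective(w, b, samples, labels, c_pos=1, c_neg=1))
print(weighted_objective(w, b, samples, labels, c_pos=1, c_neg=3))
```

Raising the minority penalty makes this boundary more expensive, so the optimizer is pushed towards solutions that classify the minority sample correctly.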
2.3.1.1. Proposed Criterion

Our proposed criterion selects the regularization parameter $C_i$ analytically for each class i of an
N-class problem, directly from the training data, on the basis of the
proportion of class data. This criterion follows the reasoning of Veropoulos: the
tradeoff $C_-$ associated with the smallest class is large in order to improve the low classification
accuracy caused by imbalanced samples. It allows the user to set individual weights for individual
training examples, which are then used in C-SVM training. We define the cost value $C_i$ as a
function of $m_+$, the number of samples of the majority class, and $m_i$, the number of samples of class i; it is
given by:
$$C_i = [m_+ / m_i] \tag{9}$$

where $[\,\cdot\,]$ is the integer function and $C_i \in \{1, \dots, m_+/m_i\}$, $i = 1, \dots, N$
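
Criterion (9) can be evaluated directly from class counts. The sketch below applies it to the House A(1) counts listed in Table 1. Two points are assumptions: the integer function is taken as rounding to the nearest integer, and the counts are those of the whole dataset rather than of a training fold, so the values may differ slightly from those reported in Table 3 (for instance, this sketch yields 60 for 'Toileting' where Table 3 lists 59).

```python
def cost_per_class(class_counts):
    """C_i = [m_plus / m_i], with [.] taken here as rounding to the nearest integer."""
    m_plus = max(class_counts.values())  # size of the largest (majority) class
    return {name: max(1, round(m_plus / m_i)) for name, m_i in class_counts.items()}


# Observation counts for House A(1), copied from Table 1.
house_a1 = {
    "Idle": 4627, "Leaving": 22617, "Toileting": 380, "Showering": 265,
    "Sleeping": 11601, "Breakfast": 109, "Dinner": 348, "Drink": 59,
}
for activity, cost in cost_per_class(house_a1).items():
    print(activity, cost)
```

In LIBSVM, per-class penalties of this kind are usually supplied as class weights that multiply a base value of C.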

For the two-class training problem, the primal optimization problem of the soft-margin SVM
can be constructed via this criterion and becomes:

$$\min_{w,b,\xi} \ \frac{1}{2} K(w,w) + \sum_{i \mid y_i = +1} \xi_i + \left[ \frac{m_+}{m_-} \right] \sum_{i \mid y_i = -1} \xi_i \tag{10}$$

subject to $y_i (w^T \varphi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, m$

The SVM dual formulation gives the same Lagrangian as in the weighted SVM in (8) with
$C_+ = 1$ and $C_- = [m_+ / m_-]$.

3. EXPERIMENTAL RESULTS AND DISCUSSION
3.1. Datasets
Experiments were performed using datasets gathered from three houses having different layouts
and different numbers of sensors [5, 22]. Each sensor is attached to a wireless sensor network
node. Each house has a single male occupant, and the activities performed differ from house to
house. Data are collected using binary sensors such as reed switches to determine the open-close
states of doors and cupboards; pressure mats to identify sitting on a couch or lying in bed;
mercury contacts to detect the movement of objects like drawers; passive infrared (PIR) sensors
to detect motion in a specific area; and float sensors to detect the toilet being flushed. Time slices
for which no annotation is available are collected in a separate activity labelled ‘Idle’. The data
were collected by a base station and labelled using a wireless Bluetooth headset combined with
speech recognition software, or a handwritten diary for house C.
Table 1. Overview of activities and the number of observations for each house [5, 22].
| House A(1) | House A(2) | House B | House C |
|---|---|---|---|
| Idle (4627) | Idle (6031) | Idle (5598) | Idle (2732) |
| Leaving (22617) | Leaving (16856) | Leaving (10835) | Leaving (11993) |
| Toileting (380) | Toileting (382) | Toileting (75) | Eating (376) |
| Showering (265) | Showering (264) | Showering (112) | Toileting (243) |
| Sleeping (11601) | Brush teeth (39) | Brush teeth (41) | Showering (191) |
| Breakfast (109) | Sleeping (11592) | Sleeping (6057) | Brush teeth (102) |
| Dinner (348) | Breakfast (93) | Dressing (46) | Shaving (67) |
| Drink (59) | Dinner (330) | Prep. Breakfast (81) | Sleeping (7738) |
| | Snack (47) | Prep. Dinner (90) | Dressing (112) |
| | Drink (53) | Drink (12) | Medication (16) |
| | | Dishes (34) | Breakfast (73) |
| | | Eat Dinner (54) | Lunch (62) |
| | | Eat Breakfast (143) | Dinner (291) |
| | | Play piano (492) | Snack (24) |
| | | | Drink (34) |
| | | | Relax (2435) |

3.2. Setup and Performance Measures
We separate the data into a test set and a training set using a “leave one day out cross validation”
approach. Sensor outputs are binary and represented in a feature space which is used by the
model to recognize the activities performed. We do not use the raw sensor data representation as
observations; instead we use the “Change point” and “Last” representations, which have been
shown to give much better results in activity recognition [5]. The raw sensor representation gives
a 1 when the sensor is firing and a 0 otherwise. The “change point” representation gives a 1 when
the sensor reading changes, while the “last” sensor representation continues to assign a 1 to the last
sensor that changed state until a new sensor changes state.
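
The two representations can be derived from the raw binary readings as in the following minimal sketch. The function names and the toy readings are hypothetical, and two details are assumptions not fixed by the text: all sensors are taken to be off before the first time slice, and when several sensors change in the same slice the one with the highest index is treated as the last.

```python
def change_point(raw):
    """1 at time t for every sensor whose raw reading differs from t - 1."""
    previous = [0] * len(raw[0])  # assumption: all sensors are off before t = 0
    out = []
    for row in raw:
        out.append([int(a != b) for a, b in zip(row, previous)])
        previous = row
    return out


def last_sensor(raw):
    """1 for the sensor that changed state most recently, 0 for all others."""
    last = None  # index of the most recently changed sensor
    out = []
    for row in change_point(raw):
        changed = [i for i, flag in enumerate(row) if flag]
        if changed:
            last = changed[-1]  # assumption: ties go to the highest sensor index
        out.append([int(i == last) for i in range(len(row))])
    return out


# Rows are time slices, columns are two hypothetical sensors.
raw = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1]]
cp = change_point(raw)
ls = last_sensor(raw)
# Concatenated "Changepoint+Last" feature vector for each time slice.
features = [c + l for c, l in zip(cp, ls)]
for row in features:
    print(row)
```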
As the activity instances are imbalanced between classes, we evaluate the performance of our
models by two measures, the accuracy and the class accuracy. The accuracy is the percentage
of correctly classified instances, which is highly affected by the sample distribution across activity
classes. The class accuracy takes the class imbalance into account and is the average percentage
of correctly classified instances per class:
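$$\text{Accuracy} = \frac{\sum_{i=1}^{m} [\text{inferred}(i) = \text{true}(i)]}{m} \tag{11}$$

$$\text{Class} = \frac{1}{N} \sum_{c=1}^{N} \left\{ \frac{\sum_{i=1}^{m_c} [\text{inferred}_c(i) = \text{true}_c(i)]}{m_c} \right\} \tag{12}$$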

in which [a = b] is a binary indicator giving 1 when true and 0 when false. m is the total number
of samples, N is the number of classes and mc the total number of samples for class c.
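
The difference between the two measures is easiest to see on a degenerate classifier. The following sketch implements (11) and (12) and applies them to a hypothetical imbalanced label sequence; the function names and the data are illustrative.

```python
from collections import defaultdict


def accuracy(inferred, true):
    """Equation (11): fraction of time slices labelled correctly."""
    return sum(p == t for p, t in zip(inferred, true)) / len(true)


def class_accuracy(inferred, true):
    """Equation (12): per-class accuracy averaged over the N classes."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, t in zip(inferred, true):
        total[t] += 1
        correct[t] += int(p == t)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced example: 8 'Leaving' slices and 2 'Drink' slices.
true = ["Leaving"] * 8 + ["Drink"] * 2
inferred = ["Leaving"] * 10  # a classifier that always predicts the majority class
print(accuracy(inferred, true))
print(class_accuracy(inferred, true))
```

Always predicting the majority class scores well on accuracy but only 50% on class accuracy here, since the minority class is never recognized.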

3.3. Results
We compared the performance of CRF, k-NN and C-SVM on the imbalanced dataset of
house A(1), in which the minority classes are all classes that appear at most 1% of the time, while the others
are the majority classes, which typically have a longer duration (e.g. leaving and sleeping). These
algorithms were tested in the MATLAB environment, and the SVM algorithm was tested with the
LIBSVM implementation [21].
In our experiments, the C-SVM hyper-parameters (σ, C) were optimized in the ranges (0.1-2)
and (0.1-10000) respectively to maximize the class accuracy under leave-one-day-out cross
validation. The best parameter pair (σopt, Copt) = (1, 5) is used, see Table 2. Then, we
determined the penalty parameters Cadaptive(class) adapted to the different classes by using our
criterion, see Table 3.
Table 2. Selection of parameter Copt with the cross validation for C-SVM.

| C | 0.1 | 5 | 50 | 500 | 1000 | 5000 | 10000 |
|---|---|---|---|---|---|---|---|
| Class (%) | 51.7 | 61 | 61 | 61 | 61 | 61 | 61 |

Our empirical results in table 2 suggest that the value of regularization parameter C has
negligible effect on the generalization performance (as long as C is larger than a certain threshold
analytically determined from the training data (C =5)).
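Table 3. Selection of parameter Copt adapted for each class with our criterion for C-SVM.

| ADL | Idle | Leaving | Toileting | Showering | Sleeping | Breakfast | Dinner | Drink |
|---|---|---|---|---|---|---|---|---|
| Copt | 5 | 1 | 59 | 85 | 2 | 207 | 65 | 383 |
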
We see in Table 3 that the minority classes require a large value of C compared with the majority
classes. This biases the classifier so as to give more importance to the minority classes.
The accuracy and the class accuracy obtained with the concatenated
“Changepoint+Last” feature matrix for CRF, k-NN, C-SVM using cross-validation search and weighted
C-SVM using our criterion are summarized in Table 4. This table shows that C-SVM+our criterion
performs better in terms of class accuracy, while the other methods perform better in terms of
accuracy.
Table 4. Accuracy and class accuracy for CRF, k-NN, C-SVM+cross validation search and C-SVM+our criterion.

| Methods | Feature representation | Accuracy | Class |
|---|---|---|---|
| CRF [5] | Changepoint+Last | 95.6% | 70.8% |
| k-NN | Changepoint+Last | 94.4% | 67.9% |
| C-SVM+CV | Changepoint+Last | 95.4% | 61% |
| C-SVM+Our criterion | Changepoint+Last | 92.5% | 72.4% |

We report in figure 2 the classification results in terms of accuracy measure for each class with
CRF, k-NN, C-SVM+CV and C-SVM+our criterion methods. CRF, k-NN and C-SVM+CV
perform better for the majority activities, while C-SVM+our criterion performs better for minority
activities (other classes).

Figure 2. Comparison of accuracy of classification between CRF, k-NN, C-SVM+CV and C-SVM+our
criterion for different activities

Finally, all results are presented compactly in a single table (Table 5), allowing a
quick comparison between CRF, k-NN, C-SVM+CV and C-SVM+our criterion on
three real-world datasets recorded in three different houses: A(2), B and C. We use
leave-one-day-out cross validation to select the width parameter. We found σopt=1, σopt=1
and σopt=2 for these datasets respectively. Our results give early experimental evidence that
C-SVM combined with our proposed criterion works better for classification; it
consistently outperforms the other methods in terms of class accuracy on all datasets.

Table 5. Accuracy and class accuracy for CRF, k-NN, C-SVM+cross validation search and C-SVM+our criterion with three houses datasets.

| House | Model | Class (%) | Accuracy (%) |
|---|---|---|---|
| A(2) | CRF [22] | 57 | 91 |
| A(2) | k-NN (k=7) | 55.9 | 90.5 |
| A(2) | C-SVM+CV (C=5) | 50.3 | 92.1 |
| A(2) | C-SVM+our criterion | 62 | 88 |
| B | CRF [22] | 46 | 92 |
| B | k-NN (k=9) | 31.3 | 67.7 |
| B | C-SVM+CV (C=5) | 39.3 | 85.5 |
| B | C-SVM+our criterion | 46.4 | 62.7 |
| C | CRF [22] | 30 | 78 |
| C | k-NN (k=1) | 35.7 | 78.4 |
| C | C-SVM+CV (C=500) | 35.6 | 80.7 |
| C | C-SVM+our criterion | 37.2 | 76.8 |

3.4. Discussion
Using experiments on three large real-world datasets, we showed that the class accuracy obtained for
house C is lower than for the other houses for nearly all recognition methods (the exception in
Table 5 being k-NN on house B). We suspect that the use
of a handwritten diary for annotation (used in house C) results in less accurate annotation than
the Bluetooth headset method (used in houses A and B).
In the rest of this section, we explain the difference in performance between CRF, k-NN,
C-SVM+CV and C-SVM+our criterion for house A(1). The CRF does not model each
action class individually, but uses a single model for all classes. As a result, classes that are
dominant in the data have a bigger weight in the CRF optimisation. This is why CRF
performs better for the majority activities (‘Idle’, ‘Leaving’ and ‘Sleeping’). In the k-NN method, the
class with more frequent samples tends to dominate the neighbourhood of a test instance regardless of
the distance measurements, which leads to suboptimal classification performance on the minority classes. A
multiclass C-SVM+CV trains several binary classifiers to differentiate the classes according to
the class labels and optimises a single parameter C for all classes. Not considering the weights
in the C-SVM formulation affects the performance of the classifiers and favours the classification of
the majority classes. C-SVM+our criterion, which sets the parameter C individually for each
class, makes C-SVM more robust for classifying the rare activities.
The recognition of the three kitchen activities ‘Breakfast’, ‘Dinner’ and ‘Drink’ is lower than
that of the other activities for all methods. ‘Idle’ is one of the most frequent
activities in all datasets but is usually not a very important activity to recognize. It might therefore
be useful to give this activity less weight. The kitchen activities are food-related tasks; they are the worst
recognized by all methods because most instances of these activities were performed in the
same location (the kitchen) using the same set of sensors. In contrast, ‘Toileting’ and ‘Showering’
are more separable because they take place in two different rooms, which makes the information from the
door sensors sufficient to separate the two activities. Therefore the location of the sensors is of great
importance for the performance of the recognition system.

4. CONCLUSION
This paper introduces a simple criterion that effectively controls the cost parameter of the
C-SVM learning machine when dealing with imbalanced activity recognition datasets. We demonstrate
that our proposed strategy classifies multiclass sensory data more effectively than common techniques
such as CRF, k-NN and C-SVM with equal misclassification costs. The usual method for choosing
a classifier's parameters, based on grid search using cross validation, becomes intractable as soon as
the number of parameters exceeds two. Our criterion, which uses different penalty parameters in the
weighted C-SVM formulation, improves the low classification accuracy caused by imbalanced
activity recognition datasets.

REFERENCES
[1] M. Wallace, “Best practices in nursing care to older adults,” in Try This, issue no. 2, Hartford Institute for Geriatric Nursing, 2007.
[2] C. Bishop, Pattern Recognition and Machine Learning, Springer, New York, ISBN: 978-0-387-31073-2, 2006.
[3] V.N. Vapnik, The Nature of Statistical Learning Theory (Statistics for Engineering and Information Science), Springer Verlag, 2nd ed., 2000.
[5] T. van Kasteren, A. Noulas, G. Englebienne, and B. Krose, “Accurate activity recognition in a home setting,” in UbiComp ’08, New York, NY, USA: ACM, 2008, pp. 1-9.
[6] A. Fleury, M. Vacher, N. Noury, “SVM-Based Multi-Modal Classification of Activities of Daily Living in Health Smart Homes: Sensors, Algorithms and First Experimental Results,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 274-283, March 2010.
[7] M.B. Abidine and B. Fergani, “Evaluating C-SVM, CRF and LDA classification for daily activity recognition,” in Proc. of IEEE Int. Conf. on Multimedia Computing and Systems (ICMCS), pp. 272–277, Tangier, Morocco, May 10-12, 2012.
[8] M.B. Abidine and B. Fergani, “Evaluating a new classification method using PCA to human activity recognition,” in Proc. of IEEE Int. Conf. on Computer Medical Applications (ICCMA), pp. 1-4, Sousse, Tunisia, January 20-22, 2013.
[9] N. Chawla, “Data mining for imbalanced datasets: An overview,” Data Mining and Knowledge Discovery Handbook, pp. 875-886, 2010.
[10] N. V. Chawla, N. Japkowicz, and A. Kolcz, “Editorial: special issue on learning from imbalanced data sets,” SIGKDD Explorations, vol. 6, no. 1, pp. 1–6, 2004.
[11] N. Chawla, N. Japkowicz, and A. Kolcz (Eds.), Proceedings of the ICML’2003 Workshop on Learning from Imbalanced Data Sets, 2003.
[12] G. M. Weiss, “Mining with rarity: a unifying framework,” SIGKDD Explorations, vol. 6, no. 1, pp. 7–19, 2004.
[13] G.M. Weiss and F. Provost, “Learning when training data are costly: the effect of class distribution on tree induction,” Journal of Artificial Intelligence Research, vol. 19, pp. 315-354, 2003.
[14] R. Akbani, S. Kwek, and N. Japkowicz, “Applying support vector machines to imbalanced datasets,” in Proc. of the 15th European Conference on Machine Learning (ECML 2004), pp. 39–50, 2004.
[15] G. Wu and E. Y. Chang, “KBA: Kernel boundary alignment considering imbalanced data distribution,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 786–795, 2005.
[16] X. Chen, B. Gerlach, and D. Casasent, “Pruning support vectors for imbalanced data classification,” in Proc. of International Joint Conference on Neural Networks, pp. 1883-88, 2005.
[17] N.V. Chawla, K.W. Bowyer, L.O. Hall, and W.P. Kegelmeyer, “SMOTE: Synthetic Minority Oversampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[18] N. Thai-Nghe, “Cost-Sensitive Learning Methods for Imbalanced Data,” Intl. Joint Conf. on Neural Networks, 2010.
[19] K. Veropoulos, C. Campbell and N. Cristianini, “Controlling the sensitivity of support vector machines,” Proceedings of the International Joint Conference on AI, 1999, pp. 55-60.
[20] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, p. 220, 2004.
[21] C. C. Chang and C. J. Lin, LIBSVM. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[22] T.L.M. van Kasteren, H. Alemdar and C. Ersoy, “Effective Performance Metrics for Evaluating Activity Recognition Methods,” ARCS 2011 Workshop on Context-Systems Design, Evaluation and Optimisation, Italy, 2011.

csandit
 
Ad

Similar to IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST SENSITIVE CRITERION FOR C-SVM (20)

PDF
IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST...
cscpconf
 
PDF
A survey of modified support vector machine using particle of swarm optimizat...
Editor Jacotech
 
PDF
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
irjes
 
PDF
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
ijtsrd
 
PDF
6145-Article Text-9370-1-10-20200513.pdf
chalachew5
 
PDF
Application of combined support vector machines in process fault diagnosis
Dr.Pooja Jain
 
PDF
Most Cited Articles in Academia ---International Journal of Data Mining & Kno...
IJDKP
 
PDF
thesis
Roemer Vlasveld
 
PPTX
Statistical Machine Learning unit4 lecture notes
SureshK256753
 
PDF
Hypothesis on Different Data Mining Algorithms
IJERA Editor
 
PDF
A Hybrid Theory Of Power Theft Detection
Camella Taylor
 
PDF
Analysis of Imbalanced Classification Algorithms A Perspective View
ijtsrd
 
PDF
A SURVEY OF METHODS FOR HANDLING DISK DATA IMBALANCE
IJCI JOURNAL
 
PPTX
super vector machines algorithms using deep
KNaveenKumarECE
 
PPT
SVM (2).ppt
NoorUlHaq47
 
PDF
Buddi health class imbalance based deep learning
Ram Swaminathan
 
PPT
i i believe is is enviromntbelieve is is enviromnt7.ppt
hirahelen
 
PPT
SVM.ppt
SrikanthK799073
 
PDF
Classification Techniques: A Review
IOSRjournaljce
 
PDF
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
cscpconf
 
IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST...
cscpconf
 
A survey of modified support vector machine using particle of swarm optimizat...
Editor Jacotech
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
irjes
 
Adaptive Classification of Imbalanced Data using ANN with Particle of Swarm O...
ijtsrd
 
6145-Article Text-9370-1-10-20200513.pdf
chalachew5
 
Application of combined support vector machines in process fault diagnosis
Dr.Pooja Jain
 
Most Cited Articles in Academia ---International Journal of Data Mining & Kno...
IJDKP
 
Statistical Machine Learning unit4 lecture notes
SureshK256753
 
Hypothesis on Different Data Mining Algorithms
IJERA Editor
 
A Hybrid Theory Of Power Theft Detection
Camella Taylor
 
Analysis of Imbalanced Classification Algorithms A Perspective View
ijtsrd
 
A SURVEY OF METHODS FOR HANDLING DISK DATA IMBALANCE
IJCI JOURNAL
 
super vector machines algorithms using deep
KNaveenKumarECE
 
SVM (2).ppt
NoorUlHaq47
 
Buddi health class imbalance based deep learning
Ram Swaminathan
 
i i believe is is enviromntbelieve is is enviromnt7.ppt
hirahelen
 
Classification Techniques: A Review
IOSRjournaljce
 
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
cscpconf
 
Ad

Recently uploaded (20)

PPTX
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Poster...
Michele Kryston
 
PPTX
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
 
PDF
ArcGIS Utility Network Migration - The Hunter Water Story
Safe Software
 
PDF
GDG Cloud Southlake #44: Eyal Bukchin: Tightening the Kubernetes Feedback Loo...
James Anderson
 
PPTX
CapCut Pro PC Crack Latest Version Free Free
josanj305
 
PDF
My Journey from CAD to BIM: A True Underdog Story
Safe Software
 
PDF
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
PDF
Understanding The True Cost of DynamoDB Webinar
ScyllaDB
 
PPTX
Mastering Authorization: Integrating Authentication and Authorization Data in...
Hitachi, Ltd. OSS Solution Center.
 
PDF
How to Comply With Saudi Arabia’s National Cybersecurity Regulations.pdf
Bluechip Advanced Technologies
 
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
Edge AI and Vision Alliance
 
PDF
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
PDF
Automating the Geo-Referencing of Historic Aerial Photography in Flanders
Safe Software
 
PDF
Hello I'm "AI" Your New _________________
Dr. Tathagat Varma
 
PDF
5 Things to Consider When Deploying AI in Your Enterprise
Safe Software
 
PDF
Pipeline Industry IoT - Real Time Data Monitoring
Safe Software
 
PDF
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
PDF
How to Visualize the ​Spatio-Temporal Data Using CesiumJS​
SANGHEE SHIN
 
PDF
99 Bottles of Trust on the Wall — Operational Principles for Trust in Cyber C...
treyka
 
PDF
''Taming Explosive Growth: Building Resilience in a Hyper-Scaled Financial Pl...
Fwdays
 
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Poster...
Michele Kryston
 
Smarter Governance with AI: What Every Board Needs to Know
OnBoard
 
ArcGIS Utility Network Migration - The Hunter Water Story
Safe Software
 
GDG Cloud Southlake #44: Eyal Bukchin: Tightening the Kubernetes Feedback Loo...
James Anderson
 
CapCut Pro PC Crack Latest Version Free Free
josanj305
 
My Journey from CAD to BIM: A True Underdog Story
Safe Software
 
FME as an Orchestration Tool with Principles From Data Gravity
Safe Software
 
Understanding The True Cost of DynamoDB Webinar
ScyllaDB
 
Mastering Authorization: Integrating Authentication and Authorization Data in...
Hitachi, Ltd. OSS Solution Center.
 
How to Comply With Saudi Arabia’s National Cybersecurity Regulations.pdf
Bluechip Advanced Technologies
 
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
Edge AI and Vision Alliance
 
Hyderabad MuleSoft In-Person Meetup (June 21, 2025) Slides
Ravi Tamada
 
Automating the Geo-Referencing of Historic Aerial Photography in Flanders
Safe Software
 
Hello I'm "AI" Your New _________________
Dr. Tathagat Varma
 
5 Things to Consider When Deploying AI in Your Enterprise
Safe Software
 
Pipeline Industry IoT - Real Time Data Monitoring
Safe Software
 
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
How to Visualize the ​Spatio-Temporal Data Using CesiumJS​
SANGHEE SHIN
 
99 Bottles of Trust on the Wall — Operational Principles for Trust in Cyber C...
treyka
 
''Taming Explosive Growth: Building Resilience in a Hyper-Scaled Financial Pl...
Fwdays
 

IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST SENSITIVE CRITERION FOR C-SVM

IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST SENSITIVE CRITERION FOR C-SVM

M'hamed Bilal Abidine, Belkacem Fergani
Speech Communication & Signal Processing Laboratory.
Faculty of Electronics and Computer Sciences
USTHB, Algiers, Algeria
abidineb@hotmail.com, bfergani@gmail.com

ABSTRACT
The growing population of elders in society calls for a new approach to caregiving. By inferring which activities elderly people perform in their houses, it is possible to determine their physical and cognitive capabilities. In this paper we show the potential of important discriminative classifiers, namely the Soft-Support Vector Machines (C-SVM), Conditional Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor patterns in a smart home environment. We also address the class imbalance problem in the activity recognition field, which is known to hinder the learning performance of classifiers. Cost sensitive learning is attractive under most imbalanced circumstances, but it is difficult to determine the precise misclassification costs in practice. We introduce a new criterion for selecting a suitable cost parameter C for the C-SVM method. Through our evaluation on four real world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed criterion outperforms the state-of-the-art discriminative methods in activity recognition.

KEYWORDS
Activity Recognition, C-SVM, Wireless Sensor Networks, Machine Learning, Imbalanced Data

1. INTRODUCTION
In 2030, nearly one out of two households will include someone who needs help performing basic Activities of Daily Living (ADL) [1] such as cooking, brushing, dressing, toileting, bathing and so on. For their comfort, and because the healthcare infrastructure will not be able to handle this growth, it is suggested to assist sick or elderly people at home. Sensor-based technologies in the home are the key to this problem.
Sensor data collected often need to be analysed using data mining and machine learning techniques to build activity models and perform further pattern recognition [2, 3]. The learning of such models is usually done in a supervised manner (human labelling) and requires large annotated datasets recorded in different settings. Recognizing a predefined set of activities is a classification task: features are extracted from signals gathered by the sensors within a time window and then used to infer the activity. The classification algorithm has to be trained using a set of samples representing the activities that have to be recognized.

Sundarapandian et al. (Eds): ICAITA, SAI, SEAS, CDKP, CMCA-2013, pp. 415–424, 2013. © CS & IT-CSCP 2013. DOI: 10.5121/csit.2013.3834
Computer Science & Information Technology (CS & IT)

State-of-the-art methods used for recognizing activities can be divided into two main categories: the so-called generative models and discriminative models [5-8]. The generative methods perform well but require data modelling, are marred by generic optimization criteria and are generally time consuming. Discriminative methods have received the most attention in the literature for their model simplicity and good performance. Therefore, we study different discriminative classification methods in this paper.

However, activity recognition datasets are generally imbalanced, meaning certain activities occur more frequently than others (e.g. sleeping is generally done once a day, while toileting is done several times a day). As a result, the learning system may have difficulty learning the concept related to the minority class, and ignoring this class imbalance in the evaluation may lead to disastrous consequences for the elderly person. Recently, the class imbalance problem has been recognized as a crucial problem in machine learning [9-12]. Most classifiers assume a balanced distribution of classes and equal misclassification costs for each class, and therefore they perform poorly in predicting the minority class for imbalanced data [13]. They optimize the overall classification accuracy and hence sacrifice the prediction performance on the minority classes.

Compared with other standard classifiers, SVM is more accurate on moderately imbalanced data. The reason is that only support vectors are used for classification, and many majority samples far from the decision boundary can be removed without affecting classification [3]. However, it has been identified that the separating hyperplane of an SVM model developed with an imbalanced dataset can be skewed towards the minority class [14], and this skewness can degrade the performance of that model with respect to the minority class.
Previous research has aimed to improve the effectiveness of SVM on imbalanced classification [14-16], and some good results have been reported [10]. Approaches for addressing the imbalanced training-data problem can be categorized into two main divisions: the data processing approach and the algorithmic approach.

At the data level, the solutions can be divided into oversampling [14] (in which new samples are created for the minority class), undersampling [14] (in which samples are eliminated from the majority class), or some combination of the two. Vilarino et al. used Synthetic Minority Oversampling TEchnique (SMOTE) [17] oversampling.

At the algorithmic level, the solutions include adjusting the costs associated with misclassification so as to improve performance [18, 19], adjusting the probabilistic estimate at the tree leaf (when working with decision trees), adjusting the decision threshold, and recognition-based (i.e., learning from one class) rather than discrimination-based (two-class) learning [14]. Akbani et al. proposed the SMOTE with Different Costs algorithm (SDC) [14]. SDC conducts SMOTE oversampling on the minority class with different error costs. Wu et al. proposed the Kernel Boundary Alignment algorithm (KBA), which adjusts the boundary toward the majority class by modifying the kernel matrix [15].

In addition to the naturally occurring class imbalance problem, the imbalanced data situation may also occur in the one-against-rest schema in multiclass classification. Therefore, even when the training data is balanced, issues related to the class imbalance problem can frequently surface.

Our objective is to deal with the class imbalance problem in order to perform automatic recognition of activities from binary sensor patterns in a smart home. The main contribution of our work is twofold.
Firstly, we propose a new criterion to select the cost parameter C for the discriminative method Soft-Support Vector Machines (C-SVM) [3, 7], to appropriately tackle the problem of class imbalance caused by imbalanced activity datasets. Secondly, this method is compared with Conditional Random Fields (CRF) [5], k-Nearest Neighbors (k-NN) [2] and the traditional SVM, which are used as reference methods. In particular, CRF is a discriminative probabilistic model that has recently gained popularity and works well in the activity recognition field [5].

The remainder of this paper is organized as follows. Section 2 describes the different discriminative methods and the weighted C-SVM method combined with our proposed criterion
for parameter C setting. Then, Section 3 presents the setup and discusses the results acquired through a series of experiments using different datasets. Finally, we conclude in Section 4.

2. DISCRIMINATIVE METHODS FOR ACTIVITY RECOGNITION

2.1. Conditional Random Fields (CRF)
Conditional Random Fields (CRF) have an exponential model for the conditional probability (1) of the entire sequence of labels Y given an input observation sequence X. CRF is defined by a weighted sum of K feature functions $f_i$ that return a 0 or 1 depending on the values of the input variables and therefore determine whether a potential should be included in the calculation. Each feature function carries a weight $\lambda_i$ that gives its strength to the proposed label. These weights are the parameters we want to find when learning the model. CRF model parameters can be learned using an iterative gradient method by maximizing the conditional probability distribution defined as

\[ P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{K} \lambda_i f_i(y_t, y_{t-1}, x_t) \right) \tag{1} \]

with

\[ Z(X) = \sum_{y} \exp\left( \sum_{t=1}^{T} \sum_{i=1}^{K} \lambda_i f_i(y_t, y_{t-1}, x_t) \right) \tag{2} \]

One of the main consequences of this choice is that, while learning the parameters of a CRF, we avoid modelling the distribution of the observations, p(x). As a result, we can only use CRF to perform inference (and not to generate data), which is a characteristic of the discriminative models. To find the label y for new observed features, we take the maximum of the conditional probability:

\[ \hat{y}(x) = \arg\max_{y} \, p(y \mid x) \tag{3} \]

2.2. k-Nearest Neighbors (k-NN)
The k-Nearest Neighbors (k-NN) algorithm is amongst the simplest of all machine learning algorithms [2], and is therefore easy to implement. The m training instances $x \in \mathbb{R}^n$ are vectors in an n-dimensional feature space, each with a class label. A new query is classified according to the majority class among its k nearest neighbours.
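As a hedged illustration of the majority vote just described, the following minimal sketch classifies a query vector from a handful of toy binary sensor vectors. The data, the function names `euclidean` and `knn_predict`, and the choice of Euclidean distance are assumptions made for this example; they are not taken from the authors' MATLAB implementation.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy binary sensor vectors (hypothetical data, not from the paper).
train_x = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1)]
train_y = ["Sleeping", "Sleeping", "Toileting", "Toileting", "Sleeping"]

print(knn_predict(train_x, train_y, (1, 0, 0), k=3))
```

Because every neighbour gets one vote, a class with many more training samples is more likely to fill the neighbourhood of a query, which is the imbalance effect the paper returns to in its discussion.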
The classifier does not fit any model; it relies only on memory to store the feature vectors and class labels. It determines the k nearest neighbours by computing the distance from an unlabelled vector (a test point) to the training instances and keeping the closest ones. The positive integer k is a user-defined constant. Usually the Euclidean distance is used as the distance metric.

2.3. C-Support Vector Machines (C-SVM)
SVM classifies data by determining a separating hyperplane in a higher-dimensional space (feature space) [3]. For a two-class problem, we assume that we have a training set $\{(x_i, y_i)\}_{i=1}^{m}$, where $x_i \in \mathbb{R}^n$ are the observations and $y_i$ are class labels, either 1 or -1. The primal formulation of the soft-margin SVM maximizes the margin $2/K(w,w)$ between the two classes and simultaneously minimizes the total amount of misclassification (training errors) $\xi_i$ by solving the following optimization problem:
\[ \min_{w,b,\xi} \; \frac{1}{2} K(w,w) + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, m \tag{4} \]

where w is normal to the hyperplane, b is the translation factor of the hyperplane with respect to the origin, and $\varphi(\cdot)$ is a non-linear function which maps the input space into a feature space defined by $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$, the kernel matrix of the input space.

Figure 1. C-SVM classification problem: the classes are linearly separated in a feature space.

We choose the popular Radial Basis Function (RBF) kernel:

\[ K(x_i, x_j) = \exp\left( - \frac{\| x_i - x_j \|^2}{2\sigma^2} \right) \]

where $\sigma$ is the width parameter. It is a reasonable first choice for the classification of nonlinear datasets, as it has fewer parameters. The construction of such functions is described by the Mercer conditions [20].

The regularization parameter C controls the trade-off between maximizing the margin width and minimizing the number of training errors on nonseparable samples, in order to avoid overfitting [2]. A small value of C will increase the number of training errors, while a large C will lead to a behavior similar to that of a hard-margin SVM. In practice the parameters ($\sigma$ and C) are varied through a wide range of values, and the optimal performance is assessed using a cross-validation technique that verifies performance using only the training set [20].

The dual formulation of the soft-margin SVM can be solved by representing it as a Lagrangian optimization problem as follows [3]:

\[ \max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \;\; \text{and} \;\; 0 \le \alpha_i \le C \tag{5} \]

Solving (5) for $\alpha$ gives a decision function in the original space for classifying a test point $x \in \mathbb{R}^n$ [3]:

\[ f(x) = \operatorname{sgn}\left( \sum_{i=1}^{m_{sv}} \alpha_i y_i K(x, x_i) + b \right) \tag{6} \]

where $m_{sv}$ is the number of support vectors $x_i \in \mathbb{R}^n$ and the $\alpha_i > 0$ are Lagrange multipliers.
The training samples for which $\alpha_i > 0$ are called support vectors.
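To make the RBF kernel and the decision function (6) concrete, here is a small sketch in plain Python. It only evaluates an already trained model: the support vectors, multipliers, labels and bias below are hypothetical values chosen for illustration (they satisfy the dual constraint that the sum of the products of multipliers and labels is zero), and the names `rbf_kernel` and `decision` are not part of LIBSVM.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision(x, support_vectors, alphas, labels, b, sigma=1.0):
    """Sign of the kernel expansion in Eq. (6); returns +1 or -1."""
    s = sum(a * y * rbf_kernel(x, sv, sigma)
            for a, y, sv in zip(alphas, labels, support_vectors))
    # Assumption: a score of exactly zero is assigned to the positive class.
    return 1 if s + b >= 0 else -1


# Hypothetical trained model with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
bias = 0.0

print(decision((0.1, 0.2), support_vectors, alphas, labels, bias))
print(decision((1.9, 2.1), support_vectors, alphas, labels, bias))
```

Only the support vectors enter the sum, which is why majority samples far from the boundary can be dropped without changing the prediction.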
In this study, the LIBSVM software package [21] was used to implement the multiclass classifier. It uses the One-vs-One method [3]. Although SVMs often produce effective solutions for balanced datasets, they are sensitive to imbalanced training datasets and produce sub-optimal models, because the constraint in (4) imposes equal total influence from the positive and negative support vectors. To cope with imbalanced sample sets, we choose the weighted C-SVM formulation [3] and we propose a new criterion for tuning the parameter C.

2.3.1. Weighted SVM
In this method, the SVM soft-margin objective function is modified to assign two different penalty constants $C_+$ and $C_-$ to the majority (positive) and minority (negative) classes respectively, as given in the quadratic optimization problem below:

\[ \min_{w,b,\xi} \; \frac{1}{2} K(w,w) + C_+ \sum_{y_i = 1} \xi_i + C_- \sum_{y_i = -1} \xi_i \quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, m \tag{7} \]

The SVM dual formulation gives the same Lagrangian as in the original soft-margin SVM in (5), but with different constraints on $\alpha_i$:

\[ \max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C_+ \;\text{if}\; y_i = +1, \quad 0 \le \alpha_i \le C_- \;\text{if}\; y_i = -1 \tag{8} \]

In the construction of a cost-sensitive SVM, the cost parameter plays an indispensable role. Regarding the cost information, some authors [18, 19] have proposed using different penalty parameters for different classes of data, which effectively improves the low classification accuracy caused by imbalanced samples. For example, it is quite possible to achieve a high classification accuracy by simply assigning all samples to the class with the majority of samples (positive observations), so that the minority class (negative observations) accounts for all the training errors. Veropoulos et al. [19] propose to increase the tradeoff associated with the minority class (i.e., $C_- > C_+$) to eliminate the imbalance effect. Veropoulos et al.
have not suggested any guidelines for deciding what the relative ratios of the positive to negative cost factors should be.

2.3.1.1. Proposed Criterion
Our proposed criterion advocates an analytic selection of the regularization parameter $C_i$ for each class i of an N-class problem, directly from the training data, on the basis of the proportion of class data. This criterion follows the reasoning of Veropoulos, namely that the tradeoff $C_-$ associated with the smallest class should be large in order to improve the low classification accuracy caused by imbalanced samples. It allows the user to set individual weights for individual training examples, which are then used in C-SVM training. We give the cost value $C_i$ as a function of $m_+$, the number of samples of the majority class, and $m_i$, the number of samples of class i:

\[ C_i = \left[ m_+ / m_i \right] \tag{9} \]

where $[\,\cdot\,]$ is the integer function and $C_i \in \{1, \ldots, m_+/m_i\}$, $i = 1, \ldots, N$.
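The criterion in (9) is simple enough to sketch directly. The example below applies it to the House A(1) observation counts listed in Table 1. Two assumptions are made here: the "integer function" is read as the floor (integer division), and the counts are those of the full dataset rather than of a particular training fold, so the output need not match the per-class values the authors report. The name `class_costs` is hypothetical.

```python
def class_costs(counts):
    """Per-class cost C_i = floor(m_plus / m_i), with m_plus the largest class size."""
    m_plus = max(counts.values())
    # Assumption: the integer function [ ] in Eq. (9) is taken as the floor.
    return {label: m_plus // m_i for label, m_i in counts.items()}


# Observation counts for House A(1), as listed in Table 1.
house_a1 = {
    "Idle": 4627, "Leaving": 22617, "Toileting": 380, "Showering": 265,
    "Sleeping": 11601, "Breakfast": 109, "Dinner": 348, "Drink": 59,
}

costs = class_costs(house_a1)
for label, cost in costs.items():
    print(f"{label}: C = {cost}")
```

The majority class always receives a cost of 1, and the cost grows as a class becomes rarer. With LIBSVM, per-class penalties of this kind are typically passed as class weights that multiply the base parameter C.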
For the two-class training problem, the primal optimization problem of the soft-margin SVM can be constructed via this criterion and becomes:

\[ \min_{w,b,\xi} \; \frac{1}{2} K(w,w) + \sum_{y_i = 1} \xi_i + \left[ m_+ / m_- \right] \sum_{y_i = -1} \xi_i \quad \text{subject to} \quad y_i \left( w^T \varphi(x_i) + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, m \tag{10} \]

The SVM dual formulation gives the same Lagrangian as in the soft-margin SVM in (8), with $C_+ = 1$ and $C_- = [m_+ / m_-]$.

3. EXPERIMENTAL RESULTS AND DISCUSSION

3.1. Datasets
Experiments were performed using datasets gathered from three houses having different layouts and different numbers of sensors [5, 22]. Each sensor is attached to a wireless sensor network node. Each house has a single male occupant, and the activities performed differ from house to house. Data are collected using binary sensors: reed switches to determine the open-close states of doors and cupboards; pressure mats to identify sitting on a couch or lying in bed; mercury contacts to detect the movement of objects such as drawers; passive infrared (PIR) sensors to detect motion in a specific area; and float sensors to detect the toilet being flushed. Time slices for which no annotation is available are collected in a separate activity labelled 'Idle'. The data were collected by a base station and labelled using a wireless Bluetooth headset combined with speech recognition software, or a handwritten diary for house C.

Table 1. Overview of activities and the number of observations for each house [5, 22].
House A(1): Idle (4627), Leaving (22617), Toileting (380), Showering (265), Sleeping (11601), Breakfast (109), Dinner (348), Drink (59)
House A(2): Idle (6031), Leaving (16856), Toileting (382), Showering (264), Brush teeth (39), Sleeping (11592), Breakfast (93), Dinner (330), Snack (47), Drink (53)
House B: Idle (5598), Leaving (10835), Toileting (75), Showering (112), Brush teeth (41), Sleeping (6057), Dressing (46), Prep. Breakfast (81), Prep. Dinner (90), Drink (12), Dishes (34), Eat Dinner (54), Eat Breakfast (143), Play piano (492)
House C: Idle (2732), Leaving (11993), Eating (376), Toileting (243), Showering (191), Brush teeth (102), Shaving (67), Sleeping (7738), Dressing (112), Medication (16), Breakfast (73), Lunch (62), Dinner (291), Snack (24), Drink (34), Relax (2435)

3.2. Setup and Performance Measures
We separate the data into a test set and a training set using a "leave one day out cross validation" approach. Sensor outputs are binary and represented in a feature space which is used by the model to recognize the activities performed. We do not use the raw sensor data representation as observations; instead we use the "Change point" and "Last" representations, which have been shown to give much better results in activity recognition [5]. The raw sensor representation gives a 1 when the sensor is firing and a 0 otherwise. The "change point" representation gives a 1 when
the sensor reading changes, while the "last" representation continues to assign a 1 to the last sensor that changed state until a new sensor changes state.

As the activity instances were imbalanced between classes, we evaluate the performance of our models by two measures, the accuracy and the class accuracy. The accuracy shows the percentage of correctly classified instances and is highly affected by the sample distribution across activity classes. The class accuracy takes the class imbalance into account and shows the average percentage of correctly classified instances per class:

\[ \text{Accuracy} = \frac{\sum_{i=1}^{m} \left[ \text{inferred}(i) = \text{true}(i) \right]}{m} \tag{11} \]

\[ \text{Class} = \frac{1}{N} \sum_{c=1}^{N} \left\{ \frac{\sum_{i=1}^{m_c} \left[ \text{inferred}_c(i) = \text{true}_c(i) \right]}{m_c} \right\} \tag{12} \]

in which $[a = b]$ is a binary indicator giving 1 when true and 0 when false, m is the total number of samples, N is the number of classes and $m_c$ is the total number of samples for class c.

3.3. Results
We compared the performance of CRF, k-NN and C-SVM on the imbalanced dataset of house A(1), in which the minority classes are all classes that appear at most 1% of the time, while the others are the majority classes, which typically have a longer duration (e.g. leaving and sleeping). These algorithms were tested in the MATLAB environment, and the SVM algorithm was tested with the LIBSVM implementation [21].

In our experiments, the C-SVM hyper-parameters ($\sigma$, C) were optimized in the ranges (0.1-2) and (0.1-10000) respectively to maximize the class accuracy under leave-one-day-out cross validation. The best parameter pair ($\sigma_{opt}$, $C_{opt}$) = (1, 5) is used; see Table 2. Then, we determined the penalty parameters $C_{adaptive}$(class) adapted to the different classes by using our criterion; see Table 3.

Table 2. Selection of parameter Copt with the cross validation for C-SVM.
Copt        0.1    5    50   500   1000   5000   10000
Class (%)   51.7   61   61   61    61     61     61

Our empirical results in Table 2 suggest that the value of the regularization parameter C has a negligible effect on the generalization performance, as long as C is larger than a certain threshold determined from the training data (C = 5).

Table 3. Selection of parameter Copt adapted for each class with our criterion for C-SVM.

ADL    Id   Le   To   Sh   Sl   Br    Di   Dr
Copt   5    1    59   85   2    207   65   383

We see in Table 3 that the minority classes require a large value of C compared with the majority classes. This biases the classifier so as to give more importance to the minority classes.

The accuracy and the class accuracy obtained with the concatenation matrix of "Changepoint+Last" for CRF, k-NN, C-SVM using cross validation search and weighted C-SVM using our criterion are summarized in Table 4. This table shows that C-SVM+our criterion
performs better in terms of class accuracy, while the other methods perform better in terms of accuracy.

Table 4. Accuracy and class accuracy for CRF, k-NN, C-SVM+cross validation search and C-SVM+our criterion.

Methods                Feature representation   Accuracy   Class
CRF [5]                Changepoint+Last         95.6%      70.8%
k-NN                   Changepoint+Last         94.4%      67.9%
C-SVM+CV               Changepoint+Last         95.4%      61%
C-SVM+Our criterion    Changepoint+Last         92.5%      72.4%

We report in Figure 2 the classification results in terms of accuracy for each class with the CRF, k-NN, C-SVM+CV and C-SVM+our criterion methods. CRF, k-NN and C-SVM+CV perform better for the majority activities, while C-SVM+our criterion performs better for the minority activities (the other classes).

Figure 2. Comparison of accuracy of classification between CRF, k-NN, C-SVM+CV and C-SVM+our criterion for different activities.

Finally, all results are presented compactly in a single table (Table 5), allowing a quick comparison between CRF, k-NN, C-SVM+CV and C-SVM+our criterion on three real world datasets recorded in three different houses, A(2), B and C. We use the leave-one-day-out cross validation technique for the selection of the width parameter. We found $\sigma_{opt} = 1$, $\sigma_{opt} = 1$ and $\sigma_{opt} = 2$ for these datasets respectively. Our results give early experimental evidence that C-SVM combined with our proposed criterion works better for classification; it consistently outperforms the other methods in terms of class accuracy on all datasets.
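The gap between the two measures defined in (11) and (12) is easy to see on a toy example. The sketch below is a plain-Python reading of those two formulas; the label sequences are hypothetical and the function names `accuracy` and `class_accuracy` are chosen for this illustration only.

```python
from collections import defaultdict


def accuracy(true, inferred):
    """Fraction of time slices whose inferred label equals the true label, Eq. (11)."""
    return sum(t == p for t, p in zip(true, inferred)) / len(true)


def class_accuracy(true, inferred):
    """Mean over classes of the per-class fraction of correct labels, Eq. (12)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(true, inferred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical ground truth with one frequent and one rare activity, and a
# classifier that always predicts the frequent activity.
true = ["Leaving"] * 8 + ["Drink"] * 2
inferred = ["Leaving"] * 10

print(accuracy(true, inferred))        # 0.8
print(class_accuracy(true, inferred))  # 0.5
```

A classifier that ignores the rare activity still scores 0.8 on accuracy but only 0.5 on class accuracy, which is why the paper treats class accuracy as the more informative measure on imbalanced data.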
Table 5. Accuracy and class accuracy for CRF, k-NN, C-SVM+cross validation search and C-SVM+our criterion with three houses datasets.

House   Model                  Class (%)   Accuracy (%)
A(2)    CRF [22]               57          91
A(2)    k-NN (k=7)             55.9        90.5
A(2)    C-SVM+CV (C=5)         50.3        92.1
A(2)    C-SVM+our criterion    62          88
B       CRF [22]               46          92
B       k-NN (k=9)             31.3        67.7
B       C-SVM+CV (C=5)         39.3        85.5
B       C-SVM+our criterion    46.4        62.7
C       CRF [22]               30          78
C       k-NN (k=1)             35.7        78.4
C       C-SVM+CV (C=500)       35.6        80.7
C       C-SVM+our criterion    37.2        76.8

3.4. Discussion
Using experiments on three large real world datasets, we showed that the class accuracy obtained for house C is lower than for the other houses with all recognition methods. We suspect that the use of a handwritten diary for annotation (used in house C) results in less accurate annotation than the Bluetooth headset method (used in houses A and B).

In the rest of this section, we explain the difference in performance between CRF, k-NN, C-SVM+CV and C-SVM+our criterion for house A(1). The CRF model does not model each action class individually, but uses a single model for all classes. As a result, classes that are dominantly present in the data have a bigger weight in the CRF optimisation. This is why CRF performs better for the majority activities ('Idle', 'Leaving' and 'Sleeping'). In the k-NN method, the class with more frequent samples tends to dominate the neighbourhood of a test instance regardless of the distance measurements, which leads to suboptimal classification performance on the minority class. A multiclass C-SVM+CV trains several binary classifiers to differentiate the classes according to the class labels and optimises a single parameter C for all classes. When the weights are not considered in the C-SVM formulation, the performance of the classifiers is affected and the classification favours the majority class.
C-SVM+our criterion, which sets the parameter C for each class separately, makes C-SVM more robust for classifying the rare activities.

The recognition of the three kitchen activities ‘Breakfast’, ‘Dinner’ and ‘Drink’ is lower than that of the other activities for all methods. In addition, ‘Idle’ is one of the most frequent activities in all datasets, but it is usually not a very important activity to recognize. It might therefore be useful to give this activity less weight. The kitchen activities are food-related tasks; they are the worst recognized by all methods because most instances of these activities were performed in the same location (the kitchen) using the same set of sensors. In contrast, ‘Toileting’ and ‘Showering’ are more separable because they take place in two different rooms, which makes the information from the door sensors sufficient to separate the two activities. The location of the sensors is therefore of great importance for the performance of the recognition system.

4. CONCLUSION

This paper introduces a simple criterion that effectively controls the misclassification cost of the C-SVM learning machine when dealing with imbalanced activity recognition datasets. We demonstrate that, in terms of class accuracy, our proposed strategy classifies multiclass sensor data more effectively than common techniques such as CRF, k-NN and C-SVM with an equal misclassification cost. The usual method for choosing
classifier parameters, based on grid search with cross validation, becomes intractable as soon as the number of parameters exceeds two. Our criterion, which uses different penalty parameters in the weighted C-SVM formulation, improves the low classification accuracy on minority classes caused by imbalanced activity recognition datasets.

REFERENCES

[1] M. Wallace, “Best practices in nursing care to older adults,” in Try This, issue no. 2, Hartford Institute for Geriatric Nursing, 2007.
[2] C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, ISBN: 978-0-387-31073-2, 2006.
[3] V. N. Vapnik, The Nature of Statistical Learning Theory (Statistics for Engineering and Information Science), 2nd ed. Springer Verlag, 2000.
[5] T. van Kasteren, A. Noulas, G. Englebienne, and B. Krose, “Accurate activity recognition in a home setting,” in UbiComp ’08. New York, NY, USA: ACM, 2008, pp. 1-9.
[6] A. Fleury, M. Vacher, and N. Noury, “SVM-based multi-modal classification of activities of daily living in health smart homes: sensors, algorithms and first experimental results,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 274-283, March 2010.
[7] M. B. Abidine and B. Fergani, “Evaluating C-SVM, CRF and LDA classification for daily activity recognition,” in Proc. of IEEE Int. Conf. on Multimedia Computing and Systems (ICMCS), pp. 272-277, Tangier, Morocco, May 10-12, 2012.
[8] M. B. Abidine and B. Fergani, “Evaluating a new classification method using PCA to human activity recognition,” in Proc. of IEEE Int. Conf. on Computer Medical Applications (ICCMA), pp. 1-4, Sousse, Tunisia, January 20-22, 2013.
[9] N. Chawla, “Data mining for imbalanced datasets: an overview,” Data Mining and Knowledge Discovery Handbook, pp. 875-886, 2010.
[10] N. V. Chawla, N. Japkowicz, and A. Kolcz, “Editorial: special issue on learning from imbalanced data sets,” SIGKDD Explorations, vol. 6, no. 1, pp. 1-6, 2004.
[11] N. Chawla, N. Japkowicz, and A. Kolcz, Eds., Proceedings of the ICML’2003 Workshop on Learning from Imbalanced Data Sets, 2003.
[12] G. M. Weiss, “Mining with rarity: a unifying framework,” SIGKDD Explorations, vol. 6, no. 1, pp. 7-19, 2004.
[13] G. M. Weiss and F. Provost, “Learning when training data are costly: the effect of class distribution on tree induction,” Journal of Artificial Intelligence Research, vol. 19, pp. 315-354, 2003.
[14] R. Akbani, S. Kwek, and N. Japkowicz, “Applying support vector machines to imbalanced datasets,” in Proc. of the 15th European Conference on Machine Learning (ECML 2004), pp. 39-50, 2004.
[15] G. Wu and E. Y. Chang, “KBA: Kernel boundary alignment considering imbalanced data distribution,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 786-795, 2005.
[16] X. Chen, B. Gerlach, and D. Casasent, “Pruning support vectors for imbalanced data classification,” in Proc. of International Joint Conference on Neural Networks, pp. 1883-1888, 2005.
[17] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
[18] N. Thai-Nghe, “Cost-sensitive learning methods for imbalanced data,” in Intl. Joint Conf. on Neural Networks, 2010.
[19] K. Veropoulos, C. Campbell, and N. Cristianini, “Controlling the sensitivity of support vector machines,” in Proceedings of the International Joint Conference on AI, 1999, pp. 55-60.
[20] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge University Press, p. 220, 2004.
[21] C. C. Chang and C. J. Lin, LIBSVM. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[22] T. L. M. van Kasteren, H. Alemdar, and C. Ersoy, “Effective performance metrics for evaluating activity recognition methods,” in ARCS 2011 Workshop on Context-Systems Design, Evaluation and Optimisation, Italy, 2011.