CALDS: Context-Aware Learning from Data Streams
João Bártolo Gomes∗
Universidad Politécnica de Madrid, Facultad de Informática
Madrid, Spain
[email protected]

Ernestina Menasalvas†
Universidad Politécnica de Madrid, Facultad de Informática
Madrid, Spain
[email protected]

Pedro A. C. Sousa
Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Lisboa, Portugal
[email protected]
ABSTRACT
Drift detection methods in data streams can detect changes in incoming data so that learned models can be used to represent the underlying population. In many real-world scenarios context information is available and could be exploited to improve existing approaches, by detecting or even anticipating recurring concepts in the underlying population. Several applications, among them health-care or recommender systems, lend themselves to the use of such information, as data from sensors is available but is not being used. Nevertheless, new challenges arise when integrating context with drift detection methods. Modeling and comparing context information, representing the context-concepts history and storing previously learned concepts for reuse are some of the critical problems. In this work, we propose the Context-Aware Learning from Data Streams (CALDS) system to improve existing drift detection methods by exploiting available context information. Our enhancement is seamless: we use the association between context information and learned concepts to improve detection and adaptation to drift when concepts reappear. We present and discuss our preliminary experimental results with synthetic and real datasets.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—Data Mining

Keywords
Data Stream Mining, Concept Drift, Recurring Concepts, Context-Awareness

∗The work of J. P. Bártolo Gomes is supported by a PhD grant from the Portuguese Foundation for Science and Technology (FCT).
†The research is partially financed by project TIN2008-05924 of the Spanish Ministry of Science and Innovation.

1. INTRODUCTION
Learning from data streams presents many challenging issues. One of the most prominent is the problem of concept drift [17], where changes in the underlying concept require the current decision model to be revised. The problem of concept drift is observed across several application domains, such as health assistance, sensor networks, space observations, intelligent vehicles, weather prediction, and the detection of network intrusions or credit card fraud. In such domains concepts are context-dependent, which means that changes in concept are mostly due to changes in context. Still, this causal relation between context and concepts, referred to in the literature as hidden context [20, 19, 9], is not known. It is also common in such domains for the underlying concepts to recur [22, 12, 6].

Technological advances have made context information more widely available in many learning environments, particularly in ubiquitous computing, through the information gathered from various sensors [15, 14, 11].

Context representation in information systems is a problem studied by many researchers as they attempt to formally define the notion of context. Schmidt et al. [15] defined context as the knowledge about the user's and device's state, and the well-known definition from Dey [4] states that "Context is any information that can be used to characterize the situation of an entity". In contrast, Brézillon and Pomerol [3] argue that there is no particular knowledge that can be objectively called context, as context is in the eye of the beholder, stating that "knowledge that can be qualified as 'contextual' depends on the context!". Padovitz et al. [14] proposed a general approach, called Context Spaces, that models context using geometrical spaces which allow for the inference of situations in context-aware environments.

In situations where such context information is linked to the underlying concepts, this relation could be exploited to detect and adapt to recurring concepts. Nevertheless, such relations are not known a priori, and in many real-world problems the available context does not explain all global concept changes. Still, there are many cases where partial explanations can be obtained from such information. For example, in weather prediction, one could expect concepts to reappear with a periodicity depending on the season, but there are also other context circumstances that can cause the same concept to reappear even when the season is not the appropriate one, as in the event of natural disasters (e.g. a flood, tornado or volcanic eruption). Consequently, associating context and concepts is not trivial, although the association can be used to predict when a recurring concept will reappear.
Drift detection methods [7, 2, 12] detect concept drift by monitoring the performance of the learning algorithm, enabling adaptation to the new underlying concept after a change is detected. These methods do not consider context information or the possibility of concept recurrence. New challenges arise when extending such methods to integrate context and concept recurrence, such as:

• representing context information, the context history and its integration with learned concepts;

• storing previously learned concept representations;

• comparing contexts and learned concept representations;

• anticipating recurring concepts and adapting to drift.

In this work we propose a Context-Aware Learning system for Data Streams (CALDS) that extends an existing drift detection method [7] by integrating context information and explicitly addressing the problem of recurring concepts.

As in most of the existing approaches [7, 22], we assume that an increase in the error rate of the learning process is the observable effect of a change in the underlying concept. We also assume an on-line learning process working on a stream of labeled records, where the records are read only once and the target concepts are context-dependent [19]. However, the relations between context and concepts are not explicitly known (hidden context [20, 19, 9]).

In our approach the context-concepts relation is learned from the history of the learning process. CALDS combines two strategies in order to improve an existing drift detection method [7]: on the one hand it uses the error rate of the learning algorithm, and on the other it exploits learned relations between context and recurring concepts. CALDS deals with recurring concepts by keeping previously learned models and anticipating recurring concept changes. The main contribution presented in this work is the integration of context in the process of detecting and adapting to drift. In order to achieve our aim the following challenges are also addressed: i) how to represent context; ii) how to learn relations between context and concepts (i.e. whether the observed context is related to the learned concepts and their recurrence); iii) how to compare context and concept representations; iv) how to discard models in situations of memory scarcity.

We also show with experiments how the proposed extension is able to improve detection of and adaptation to concept drift, and thus the overall accuracy of the learning process in cases of recurring concepts.

The rest of the paper is organized as follows: Section 2 presents the related work, where we discuss the various methods reported in the literature to adapt to changes in the underlying concept. In Section 3 we present our proposed system, CALDS: Context-Aware Learning from Data Streams, and discuss its components and its detection of and adaptation to recurring concepts. In Section 4 we evaluate the behavior and accuracy of the CALDS system by comparing it with an existing method, using synthetic and real datasets. Finally, Section 5 presents the main conclusions and an outline of future research work.

2. RELATED WORK
A general review of the literature related to the problem of concept drift can be found in [17]. Here we review works that deal with concept drift and recurrence, as those are the ones most related to our own. These works can be divided by how they address these problems:

1. Using a window of records or a decay function, where old records are forgotten and the decision model is updated with the most recent data as it appears in the window [20, 13]. This is a very simple and efficient approach; however, the main drawback lies in determining the size of the window. The FLORA learning system [20] adjusts the window size dynamically, using a heuristic based on the prediction accuracy and concept descriptions. It also handles recurrence by storing concept descriptions. In [13] the authors monitor the values of performance indicators (accuracy, recall and precision) over time. The key idea is to automatically determine and adjust the window size so that the estimated generalization error on new examples is minimized.

2. Building an ensemble of classification models learned from fixed-size sequential chunks of the data stream [16, 18, 12] and adapting the weights of the classifiers according to the underlying concept. In this approach, the challenge is in how to determine such weights and the size of the chunks.

3. Using a drift detection method [7, 22, 2, 19], which is able to signal when drift occurs. It is assumed that periods of stable concepts are followed by a change into another stable concept period. The approaches of Gama et al. [7] and Yang et al. [22] monitor the error rate of the learning algorithm to find drift events. In [2] the distance between the errors is used, while the approach proposed by Widmer [19] focuses on learning context-concept relations. Yang et al. [22] propose to learn from the history of concept occurrences, seen as states in a Markov chain, and to use this information to proactively select the appropriate model in situations of periodically recurring concepts. In Gama and Kosina [6], referees are used to choose the most appropriate model from history. Whenever drift is detected (the method of [7] is used for drift detection), the classifier and its referee are stored in a pool for further use. The model is reused when the percentage of votes of its referee exceeds some given threshold; otherwise a new classifier is learned. Our approach is similar to these because it also learns from the history of changes, in our case the context-concept relations.

As context-awareness is also a part of our system, we review in more depth the works similar to our own because of their focus on context and concept recurrence. The idea of using context to track changes is not new and has been explored in existing works [19, 9]. In these works, it is assumed that learning systems should be able to detect and adapt to concept changes without explicitly being informed about those changes, using only the available contextual features and the performance of the base learner as a measure to detect change in the underlying concept. This assumption is shared by our approach.

The approach presented in [19] exploits what the author refers to as contextual clues and proposes a meta-learning method to identify such clues. These are context-defining attributes or combinations of attributes whose values are
characteristic of the underlying concept. When more or less systematic changes in their values are observed, this might indicate a change in the target concept. The method automatically detects contextual clues on-line. When a potential context change is signaled, the knowledge about the recognized context clues is used to adapt the learning process in some appropriate way. This idea is explored in CALDS by learning a context-concepts history.

The approach of conceptual clustering proposed by Harries [9] identifies stable hidden contexts by clustering the instances, assuming that similarity of context is reflected by the degree to which instances are well classified by the same concept. A set of models is constructed based on the identified clusters. This idea is similar to our approach; however, we present an on-line learning system and we do not require the dataset to be partitioned into batches, because our concepts are of arbitrary size as determined by the drift detection method.

Figure 1: CALDS - Learning Process
The factors that made us choose the incremental Naive Bayes algorithm are: i) the possibility to handle both nominal and continuous attributes; ii) a particular model similarity measure can be used [12]; iii) the good results reported [12, 6, 19] for the algorithm as base learner.

3.1.2 Description of the drift detection method
The proposed approach is based on the one presented in [7]; consequently we present a summary of the method, and for further details we refer the reader to [7]. In Section 3.5 we present the proposed extension that integrates context information into this concept drift detection method.

The drift detection method (DDM) [7] assumes that periods of stable concepts (i.e. where the record distribution is stationary) are followed by changes leading to a new period of stability with a different underlying concept. It considers the error rate (i.e. the rate of false predictions) of the learning algorithm to be a random variable from Bernoulli trials. The binomial distribution gives the general form of the probability of observing an error. For each record i in the sequence being sampled, the error rate is the probability of misclassification p_i = F/i, where F is the number of misclassifications, with standard deviation

s_i = \sqrt{p_i (1 - p_i) / i}

It is assumed that p_i will decrease as i increases if the distribution of the examples is stationary. A significant increase in p_i indicates that the class distribution is changing. The values of p_i and s_i are calculated incrementally, and their minimum values (p_min, s_min) are recorded when p_i + s_i reaches its minimum value. A warning level and a drift level, which represent confidence levels, are defined using p_i, s_i, p_min and s_min. The levels and the adaptation strategies for each one are defined as follows:

• p_i + s_i ≥ p_min + 2 · s_min for the warning level (95% confidence). Beyond this level, the incoming records are stored in anticipation of a possible change in concept.

• p_i + s_i ≥ p_min + 3 · s_min for the drift level (99% confidence). Beyond this level the concept drift is considered to be real; the adaptation strategy consists in resetting the model induced by the learning method and using the records stored during the warning period to learn a new model that reflects the current target concept. The values of p_min and s_min are also reset.

We then use the described DDM (see Section 3.5), into which we integrate context to anticipate the drift level and improve the adaptation to change when concepts recur.

3.2 Model similarity
In our approach we need a model similarity measure to check whether a particular model represents a new concept or a recurrent one. Several approaches can be found in the literature [22, 12, 18] for this purpose. We discuss three different measures we explored in CALDS, explaining their advantages and disadvantages according to possible execution scenarios.

• Mean square error (MSE) has been used to estimate the expected prediction error of a decision model on a sample of records S_n [18]. Whenever the MSE calculated for two models is low, the models are considered to represent the same concept and consequently are said to be similar. The main advantage of this measure is that it is easy to compute. On the other hand, the measure can only decide similarity based on a threshold value that is not always easy to define. Besides, defining the size of S_n is not straightforward, and for the purposes of our approach its records must belong to the same underlying concept.

• Conceptual equivalence, proposed by Yang et al. [22], is a measure that, given two decision models M1 and M2 and a sample set S_n of n records, calculates the degree of equivalence between those models. It returns a conceptual equivalence (ce) value in the range [-1, 1]; the bigger the output value, the higher the degree of conceptual equivalence. It compares the predictions of M1 and M2 for the records in S_n. The authors [22] argue that the accuracy and the conceptual equivalence degree are not necessarily positively correlated. The reasoning is that, even if M1 and M2 classify the records of S_n with low accuracy, their equivalence degree can be very high if they agree on the class of many instances, even when both misclassify. Moreover, matching accuracies do not imply conceptual equivalence, as both models can achieve the same accuracy and yet misclassify different parts of the attribute space. This is the main advantage of this measure; nevertheless, the drawbacks associated with the definition of the sample S_n also apply to it.

• Conceptual distance, proposed in [12], uses the Naive Bayes algorithm to represent concepts as conceptual vectors; models whose frequency tables are similar are taken as belonging to the same or a similar concept, and the Euclidean distance is used to compare the conceptual vectors. As in our approach we also use the Naive Bayes algorithm as base learner, this measure can be used to compare models. However, it has to be adapted, as in our approach the models are learned from an arbitrary number of records, in contrast with [12], where batches of pre-defined size are assumed. Moreover, for nominal attributes we normalized the attribute frequency values so that they could be compared under the conditions of our approach. The main advantage of this measure is that it does not require a sample set of records to measure concept similarity; however, it is only applicable to a particular concept representation.

In any case, the main problem associated with model similarity is that a threshold has to be defined that determines when two models are taken as similar. Even though the most appropriate measure appeared to be the conceptual distance, in the experiments we present in Section 4 conceptual equivalence turned out to be the best one for that problem.

3.3 Context representation
The literature contains a large number of studies that attempt to formally define context [15, 3, 4, 14]. We base our context representation on the Context Spaces model [14], where a context state is represented as an object in a multidimensional Euclidean space. A context state C_i is defined as a tuple of N context attribute-values,

C_i = (a_{i1}, ..., a_{iN})

where a_{in} represents the value of context attribute a_n for the i-th context state C_i.
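For illustration only, such a context state can be held as a plain attribute-value tuple. The following minimal Java sketch assumes a mixed nominal/numeric representation; the class and method names are ours, not part of CALDS:

```java
/** Minimal sketch of a context state C_i = (a_i1, ..., a_iN). */
public class ContextState {
    // Position k holds the value of context attribute a_k; each value is
    // either numeric (a Double) or nominal (a String).
    private final Object[] values;

    public ContextState(Object... values) {
        this.values = values.clone();
    }

    public int size() {
        return values.length;
    }

    public Object get(int k) {
        return values[k];
    }
}
```

For example, new ContextState(21.5, "summer") would pair a numeric temperature reading with a nominal season label.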
The available context information depends on the learning environment and the data mining problem. Context information can represent simple low-level sensors (e.g. temperature, humidity) or a high-level context (e.g. season, location, running) defined by domain experts or inferred by other means beyond the scope of the problem discussed in this work.

3.3.1 Context similarity
Context similarity is not a trivial problem [14, 1]. For the purposes of this work we define the degree of similarity between context states C_i and C_j using the Euclidean distance:

|C_i - C_j| = \sqrt{\sum_{k=1}^{N} dist(a_{ik}, a_{jk})}

where a_{ik} represents the k-th attribute-value in context state C_i. For numerical attributes the distance is defined as:

dist(a_{ik}, a_{jk}) = (a_{ik} - a_{jk})^2 / s^2

where s is the estimated standard deviation of a_k. For nominal attributes the distance is defined as:

dist(a_{ik}, a_{jk}) = 0 if a_{ik} = a_{jk}, and 1 otherwise.

We consider two context states C_i, C_j to be similar if the distance between them is below a predefined threshold ε:

similar(C_i, C_j) = true if |C_i - C_j| ≤ ε, and false if |C_i - C_j| > ε.

The definition of ε depends on the context space being represented and must be chosen according to the problem.

3.3.2 Integration of context with concepts
The main assumption behind our approach is that when a concept reappears, the context previously associated with it normally also reappears. We take advantage of this fact to anticipate the detection of recurring concepts. Note that in those cases where there are no recurring concepts, the proposed method behaves with the same performance as a method that does not integrate context.

The usage of a context history to address recurrence has been explored in [22, 6]. To represent the concept history and better adapt to recurrence, here we propose to associate the concepts (i.e. the stored decision models) with context information.

The method we propose consequently has an associated memory cost, which has to be taken into account in a data stream scenario. In what follows we present how the context is integrated with concepts and how we deal with memory issues. The main idea underlying the approach is that the most frequent context for a given model is calculated and then associated and stored together with that model.

Let M_j be the model learned in period j (i.e. used to classify unlabeled records at a certain moment) and C = {C_1, C_2, ..., C_n} a sequence of n context state records associated with M_j. We aggregate the different context values of C into one context state that we call the frequent context freqC, where each attribute value of freqC is the most frequent value that attribute takes in C. The model M_j is stored together with its frequent context freqC before the concept drift.

One of the main goals of the presented approach is to use the context information to anticipate the drift when a recurring concept is about to appear. To solve this problem we propose to use the Naive Bayes algorithm to associate context with concepts in the following way: we incrementally learn from the frequent context associated with the model in use, training the Naive Bayes to find the model M_k, used in period k, that matches a given context. As a consequence, with the Naive Bayes algorithm we can keep an approximate representation of the context history, and of its relation with the models, without keeping the context records. This alleviates the associated memory cost.

The Naive Bayes algorithm makes it possible to expand the frequency tables to include new models that are learned incrementally. It also allows us to roughly estimate the probability that a certain model M_k represents the current underlying concept given a certain context state C_i. We denote this probability estimate by h(M_k|C_i); it represents the context-concept history and makes it possible to anticipate recurring concepts. This information is also used when deciding which model to retrieve, as we will see in subsection 3.4.2.

3.4 Model management
One of the main assumptions behind CALDS is that concepts reappear. We take this as the basis of our approach to improve the learning process. The main disadvantage is the memory consumed, as models have to be stored so they can be reused when their concepts reappear. There is an accuracy-efficiency trade-off in storing learned decision models, as reusing a model can increase the classification accuracy and save the computational costs associated with relearning a recurrent concept. Consequently, we propose a strategy for discarding models to deal with the storage challenge. In what follows we describe how we store and retrieve learned decision models. We also briefly discuss the proposed strategy to discard models.

3.4.1 Model storage
In CALDS, storing a decision model in the model repository R means storing:

• k, the period when the model M_k was used.

• The model representation: as the Naive Bayes algorithm is used, this means storing the conceptual vectors of model M_k, which we call CV_k.

• The most frequent context state associated with model M_k, denoted freqC_k.

• The accuracy: Acc(M_k) refers to the accuracy value obtained during the period k in which M_k was used. Let numCRecords_{M_k} be the number of records correctly classified by M_k and numRecords_{M_k} the total number of records classified during the period in which M_k was used to classify unlabeled records. The accuracy Acc(M_k) is defined as:

Acc(M_k) = numCRecords_{M_k} / numRecords_{M_k}

The accuracy value is updated if M_k is reused, as in the sketch below.
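A minimal Java sketch of one repository entry, reusing the ContextState sketch from section 3.3 and assuming the Naive Bayes conceptual-vector representation (field and method names are ours, for illustration only):

```java
/** Sketch of one entry of the model repository R (names are illustrative). */
public class StoredModel {
    final int period;                // k, the period in which the model was used
    final double[] conceptualVector; // CV_k, flattened Naive Bayes frequency tables
    final ContextState freqC;        // most frequent context state seen with M_k
    private long correct = 0, total = 0; // counters behind Acc(M_k)

    public StoredModel(int period, double[] conceptualVector, ContextState freqC) {
        this.period = period;
        this.conceptualVector = conceptualVector;
        this.freqC = freqC;
    }

    /** Called for each record classified while M_k is in use (or reused). */
    public void recordPrediction(boolean correctlyClassified) {
        total++;
        if (correctlyClassified) correct++;
    }

    /** Acc(M_k) = numCRecords_{M_k} / numRecords_{M_k}. */
    public double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```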
Consequently, each decision model M_k stored in the model repository is defined as the tuple:

M_k = {CV_k, freqC_k, Acc(M_k), k}

This information is used in the decision function to discard less relevant models as a way to decrease the memory consumption of CALDS. The approach simply maximizes context heterogeneity among the models kept in R, keeping the more accurate ones for each context. Note that if a model is reused and is not able to represent the underlying concept, its accuracy is affected. However, the concept representation chosen, as described in section 3.1.1, is very compact and light in terms of resource consumption, and a manageable number of recurring concepts is expected. We do not explore in depth the issue of model removal because resource-awareness is not the focus of this work; the analysis of memory consumption in devices with severe resource limitations has to be the subject of a deeper study.

3.4.2 Model retrieval
The main objective of the model retrieval procedure is to find the model M in R that best represents the current underlying concept.

Let M_k be a stored model associated with freqC as its most frequent context, taken from the context-concept history h(M_k|freqC), and let MSE_k be the mean square error of M_k calculated on a sample S_n of n records. Let w_m and w_c be the weights assigned to the mean square error and context factors of a utility function. We define the utility function as:

u(MSE_k, h(M_k|freqC)) = w_m \cdot (-MSE_k) + w_c \cdot h(M_k|freqC)

The utility function is calculated for all the models associated with a certain context freqC, and the model that gives the highest utility value is retrieved.

3.5 Integration with drift detection method
The drift detection method we propose is based on the one by Gama et al. [7], described in section 3.1.2. CALDS extends this approach to detect and adapt to recurring concepts using the information learned from context. We consider, as in [7], that the DDM is in one of the following levels: stable, warning or drift, as defined in section 3.1.2. The following descriptions represent the actions executed by CALDS at each of these levels:

• stable: the error rate is below the pre-defined warning and drift levels. In this situation no adaptation is needed, independently of changes in context.

• warning: could represent a potential false alarm (i.e. the drift level is not reached and the error rate decreases back to its normal level). In this situation we:

– prepare a new instantiation of the base learner to represent the new underlying concept, in case the error rate continues to increase and drift is detected in the near future;

– if the statistics collected from the context-concept history are sufficient (i.e. after some pre-defined period considered for training) and h(M_k|C_i) for a certain M_k is above a given threshold, anticipate the recurring concept by using M_k to classify unlabeled records;

– two situations can then happen: i) the model M_k that we use to anticipate represents the current underlying concept and the error rate decreases (i.e. a recurrent concept); ii) the warning level continues, in which case the process waits for n records and executes the same adaptation strategy as at the drift level.

• drift: the current decision model is replaced with the model M_new from the new base learner, and the process waits until n records are processed. If M_new represents a new concept (i.e. there is no similar model in the repository), it continues to be incrementally updated. Otherwise the model retrieval procedure is used to obtain from the repository the model that best represents the recurrent concept.

3.6 CALDS - Learning Process
The continuous learning process represented in figure 2 consists of the following steps:

1. process the incoming records from the data stream using an incremental learning algorithm (the base learner) to obtain a decision model M_k capable of representing the underlying concept, and classify unlabeled records;

2. the drift detection method monitors the error rate of the learning algorithm;

3. the context records are associated with the current M_k in the context-concept learner;

4. when the error rate goes up, the change detection mechanism indicates:

• warning level: prepare a new learner that processes incoming records while the warning level is signaled, and try to anticipate a recurring concept using the context-concept history;

• drift level: replace the current decision model M_k with one from the repository R representing the new underlying concept, in case it is a recurring concept; otherwise use the learner prepared during the warning window to represent the new underlying concept;

5. after warning or drift is signaled there is a stability period of n records, during which the new base learner updates its decision model. When the stability period finishes, the new base learner's model is compared, in terms of conceptual equivalence, with the models in the repository R to determine whether the underlying concept is recurrent or new, and the adequate adaptation strategy is executed, as proposed in 3.4.2 and 3.5.

4. EVALUATION
To test CALDS, an implementation was developed in Java, using the MOA [10] environment as a test-bed. The MOA evaluation features have been used to record the accuracy of the learning process over time. The SingleClassifierDrift class, which implements the drift detection method [7], has been extended into CALDS. We used synthetic and real-world datasets to evaluate the behavior and accuracy of the proposed approach.
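As a rough illustration of how accuracy over time is recorded, the following sketch shows a test-then-train loop over a labeled stream; the LabeledRecord and Classifier interfaces are hypothetical stand-ins of our own, not MOA's actual API:

```java
import java.util.Iterator;

/** Hedged sketch of test-then-train accuracy recording over a labeled stream. */
public class EvaluationSketch {

    interface LabeledRecord { double[] features(); int label(); }

    interface Classifier {
        int predict(double[] features);
        void train(double[] features, int label);
    }

    /** Prints the running accuracy every `reportEvery` records. */
    static void run(Iterator<LabeledRecord> stream, Classifier model, int reportEvery) {
        long correct = 0, total = 0;
        while (stream.hasNext()) {
            LabeledRecord r = stream.next();
            if (model.predict(r.features()) == r.label()) correct++; // test first ...
            model.train(r.features(), r.label());                    // ... then train
            if (++total % reportEvery == 0) {
                System.out.printf("%d records, accuracy %.3f%n",
                        total, (double) correct / total);
            }
        }
    }
}
```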
Figure 2: CALDS - States of the Learning Process

4.1 Datasets

4.1.1 Synthetic Dataset
As synthetic dataset we used SEA Concepts [16], with MOA [10] as the stream generator. SEA Concepts is a benchmark data stream that uses different functions to simulate concept drift, allowing control over the target concepts and their recurrence in our experiment. The SEA Concepts dataset has two classes {class0, class1} and three features with values between 0 and 10, of which only the first two are relevant. The target concept function classifies a record as class1 if f1 + f2 ≤ θ and otherwise as class0, where f1 and f2 are the two relevant features and θ is the threshold value between the two classes. Four target concept functions, as proposed in [16], are used, defined by the threshold values 8, 9, 7 and 9.5.

4.1.2 Real World Dataset
As real-world dataset we used the Electricity Market dataset [8]. The data was collected from the Australian New South Wales Electricity Market, where the electricity prices are not stationary and are affected by market supply and demand. The market demand is influenced by context such as season, weather, time of day and central business district population density. The supply is influenced primarily by the number of on-line generators. An influencing factor for the price evolution of the electricity market is time. During the time period described in the dataset, the electricity market was expanded with the inclusion of adjacent areas (the state of Victoria), which led to more elaborate management of the supply, as oversupply in one area could be sold interstate. The ELEC2 dataset contains 45312 records obtained from 7 May 1996 to 5 December 1998, with one record for each half hour (i.e. there are 48 instances for each one-day time period). Each record has 5 attributes (the day of week, the time period, the NSW demand, the Victoria demand and the scheduled electricity transfer between states) plus the class label. The class label identifies the change of the price relative to a moving average of the last 24 hours; it only reflects deviations of the price from a one-day average and removes the impact of longer-term price trends. As shown in [8], the dataset exhibits substantial seasonality and is influenced by changes in context. This motivates its use as a real-world dataset in our evaluation.

4.2 Context and recurrence settings
As context for the SEA dataset we used a numerical context feature space with two features, a1 and a2, with values between 1 and 4. It was generated independently as a context stream, where the context attribute a1 is equal to the number of the target concept function and a2 takes a random value, which introduces noise in the context stream. We generated 250000 records and changed the underlying concept every 15000 records. We tested two recurrence situations: periodic, where the order of concepts is repeated periodically (i.e. 1, 2, 3, 4), and random, where the concept changes to a different concept chosen randomly; note that in the latter case the context attribute a1 still represents the underlying concept. The test was executed with a 10% noise value, as in the original paper [16]; this means the class value of the training record is wrong in 10% of the records, testing how sensitive the base learner is to noise.

For the Electricity Market dataset we considered the classification problem of predicting the change in prices relative to the next half hour, using as predictive attributes the time period, the NSW demand, the Victoria demand and the scheduled electricity transfer. As context we used the day-of-week attribute; in the experiments of [8], using it led to 10 different contextual clusters. We expect the association of this context with the stored models to achieve good accuracy results when compared with the results of the original paper, which uses the SPLICE-2 algorithm [9]. However, one drawback of a real-world dataset is that we do not know for sure what the actual hidden context is and when changes occur, which makes it more difficult to evaluate the obtained results; nevertheless, this is the learning scenario that motivates our approach and that is common in real-world problems. This dataset was also used in [7] to test the drift detection method on real-world problems, achieving good performance results.

4.3 Experiments
For both datasets the approach proposed in this paper is compared, in terms of accuracy, with the SingleClassifierDrift implemented in MOA [10]. We monitored the records where change occurs and observed whether the adaptation to change is as expected and whether the approach is able to learn the relations between concepts and context. In the case of the synthetic dataset we monitored whether the mechanism is able to predict the underlying concept after change is detected, by recording its accuracy. The SingleClassifierDrift approach also uses the Naive Bayes algorithm and detects drift using the drift detection method [7], which does not consider recurrence. For the real-world dataset we also compare results with an incremental Naive Bayes algorithm [5] (without any mechanism to adapt to drift), again to be used as a reference.

The parameter values presented were tested for the different datasets so that adequate values could be defined during the experiments.

We tested the three concept similarity measures discussed in 3.2; they achieve similar results in the situations observed in the experiments, when the thresholds are defined in accordance with each measure and its range. The conceptual equivalence [22] measure led to slightly better results; it was designed for the problem of concept similarity and its threshold is easy to define. For these reasons, in the results presented we chose to use it, with 0.3 as the threshold value.

For the experiments, 100 was the number of records used in the sample S_n to compare the models (i.e. the number of records after which we consider the learned concept stable), and the weights assigned to the utility function were 0.5 for each factor. We used a training period for the concept history of 60000 records for the SEA dataset and 10000 records for the Electricity Market dataset.
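To make the synthetic setup of sections 4.1.1 and 4.2 concrete, the following sketch generates the SEA stream and its attached context stream under our reading of that description; the emission of records is left as a comment and all names are illustrative:

```java
import java.util.Random;

/** Sketch of the SEA stream plus context stream described in 4.1.1 and 4.2. */
public class SeaWithContextSketch {
    // Threshold values defining the four SEA target concept functions [16].
    static final double[] THETA = {8.0, 9.0, 7.0, 9.5};

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int concept = 0;      // index of the active target concept function
        long class1Count = 0; // just so the sketch reports something
        for (int i = 0; i < 250_000; i++) {
            // periodic recurrence: concepts cycle 1,2,3,4 every 15000 records
            if (i > 0 && i % 15_000 == 0) concept = (concept + 1) % 4;
            double f1 = rnd.nextDouble() * 10;
            double f2 = rnd.nextDouble() * 10;
            double f3 = rnd.nextDouble() * 10;              // irrelevant feature
            int label = (f1 + f2 <= THETA[concept]) ? 1 : 0;
            if (rnd.nextDouble() < 0.10) label = 1 - label; // 10% class noise
            int a1 = concept + 1;        // context attribute tied to the concept (1..4)
            int a2 = 1 + rnd.nextInt(4); // random context attribute (context noise)
            // here (f1, f2, f3, label) would go to the data stream
            // and (a1, a2) to the separate context stream
            if (label == 1) class1Count++;
        }
        System.out.println("class1 records: " + class1Count);
    }
}
```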
Figure 5: Comparison of accuracy between the proposed approach (Context), a single classifier with drift detection (Single) and an incremental Naive Bayes (NB) using the Elec2 dataset.

this work, we plan to further analyze the accuracy-efficiency trade-off of the approach; consequently, we are currently analyzing the overhead cost, both in resource consumption and in response time, to enable the execution of CALDS on devices with different resource constraints.

6. REFERENCES
[1] C. Anagnostopoulos, Y. Ntarladimas, and S. Hadjiefthymiades. Reasoning about situation similarity. In Intelligent Systems, 2006 3rd International IEEE Conference on, pages 109–114, 2006.
[2] M. Baena-García, J. del Campo-Ávila, R. Fidalgo, A. Bifet, R. Gavaldà, and R. Morales-Bueno. Early drift detection method. In Fourth International Workshop on Knowledge Discovery from Data Streams, pages 77–86, 2006.
[3] P. Brézillon and J. Pomerol. Contextual knowledge sharing and cooperation in intelligent assistant systems. Le Travail Humain, 62:223–246, 1999.
[4] A. Dey, G. Abowd, and D. Salber. A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction, 16(2):97–166, 2001.
[5] J. Gama and M. Gaber. Learning from Data Streams: Processing Techniques in Sensor Networks. Springer-Verlag New York Inc., 2007.
[6] J. Gama and P. Kosina. Tracking recurring concepts with meta-learners. In Progress in Artificial Intelligence: 14th Portuguese Conference on Artificial Intelligence, EPIA 2009, Aveiro, Portugal, October 12-15, 2009, Proceedings, page 423. Springer, 2009.
[7] J. Gama, P. Medas, G. Castillo, and P. Rodrigues. Learning with drift detection. Lecture Notes in Computer Science, pages 286–295, 2004.
[8] M. Harries. SPLICE-2 comparative evaluation: Electricity pricing. Technical report, The University of New South Wales, 1999.
[9] M. Harries, C. Sammut, and K. Horn. Extracting hidden context. Machine Learning, 32(2):101–126, 1998.
[10] G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive Online Analysis, 2007. http://sourceforge.net/projects/moa-datastream/.
[11] J. Indre and M. Pechenizkiy. Towards context aware food sales prediction. In 2009 IEEE International Conference on Data Mining Workshops, pages 94–99. IEEE, 2009.
[12] I. Katakis, G. Tsoumakas, and I. Vlahavas. Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowledge and Information Systems, pages 1–21.
[13] R. Klinkenberg and T. Joachims. Detecting concept drift with support vector machines. In Proceedings of the Seventeenth International Conference on Machine Learning, page 494. Morgan Kaufmann Publishers Inc., 2000.
[14] A. Padovitz, S. Loke, and A. Zaslavsky. Towards a theory of context spaces. In Pervasive Computing and Communications Workshops, 2004. Proceedings of the Second IEEE Annual Conference on, pages 38–42, 2004.
[15] A. Schmidt, M. Beigl, and H. Gellersen. There is more to context than location. Computers & Graphics, 23(6):893–901, 1999.
[16] W. Street and Y. Kim. A streaming ensemble algorithm (SEA) for large-scale classification. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 377–382. ACM, 2001.
[17] A. Tsymbal. The problem of concept drift: definitions and related work. Technical report, Computer Science Department, Trinity College Dublin, 2004.
[18] H. Wang, W. Fan, P. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 226–235. ACM, 2003.
[19] G. Widmer. Tracking context changes through meta-learning. Machine Learning, 27(3):259–286, 1997.
[20] G. Widmer and M. Kubat. Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1):69–101, 1996.
[21] I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
[22] Y. Yang, X. Wu, and X. Zhu. Mining in anticipation for concept change: Proactive-reactive prediction in data streams. Data Mining and Knowledge Discovery, 13(3):261–289, 2006.