Artificial Intelligence Applications and Innovations III 2009
IFIP was founded in 1960 under the auspices of UNESCO, following the First World
Computer Congress held in Paris the previous year. An umbrella organization for
societies working in information processing, IFIP's aim is two-fold: to support
information processing within its member countries and to encourage technology
transfer to developing nations. As its mission statement clearly states,
The flagship event is the IFIP World Computer Congress, at which both invited and
contributed papers are presented. Contributed papers are rigorously refereed and the
rejection rate is high.
As with the Congress, participation in the open conferences is open to all and papers
may be invited or submitted. Again, submitted papers are stringently refereed.
The working conferences are structured differently. They are usually run by a
working group and attendance is small and by invitation only. Their purpose is to
create an atmosphere conducive to innovation and development. Refereeing is less
rigorous and papers are subjected to extensive group discussion.
Publications arising from IFIP events vary. The papers presented at the IFIP World
Computer Congress and at open conferences are published as conference
proceedings, while the results of the working conferences are often published as
collections of selected and edited papers.
Any national society whose primary activity is in information may apply to become
a full member of IFIP, although full membership is restricted to one society per
country. Full members are entitled to vote at the annual General Assembly. National
societies preferring a less committed involvement may apply for associate or
corresponding membership. Associate members enjoy the same benefits as full
members, but without voting rights. Corresponding members are not represented in
IFIP bodies. Affiliated membership is open to non-national societies, and individual
and honorary membership schemes are also offered.
ARTIFICIAL INTELLIGENCE
APPLICATIONS AND INNOVATIONS III
Edited by
Lazaros Iliadis
Democritus University of Thrace
Greece
Ilias Maglogiannis
University of Central Greece
Greece
Grigorios Tsoumakas
Aristotle University of Thessaloniki
Greece
Ioannis Vlahavas
Aristotle University of Thessaloniki
Greece
Max Bramer
University of Portsmouth
United Kingdom
Library of Congress Control Number: 2009921831
Preface
The ever-expanding abundance of information and computing power enables researchers and users to tackle highly interesting issues, such as applications providing personalized access and interactivity to multimodal information based on user preferences and semantic concepts, or human-machine interface systems utilizing information on the affective state of the user. The general focus of the AIAI conference is to provide insights on how AI can be implemented in real-world applications.
This volume contains the papers selected for presentation at the 5th IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI 2009), held from 23 to 25 April 2009 in Thessaloniki, Greece. The IFIP AIAI 2009 conference is co-organized by the Aristotle University of Thessaloniki, the University of Macedonia, Thessaloniki, and the Democritus University of Thrace. AIAI 2009 is the official conference of the WG12.5 "Artificial Intelligence Applications" working group of IFIP TC12, the International Federation for Information Processing Technical Committee on Artificial Intelligence (AI).
The conference keeps growing while maintaining high standards of quality. The purpose of the 5th IFIP AIAI Conference is to bring together researchers, engineers and practitioners interested in the technical advances and business / industrial applications of intelligent systems. AIAI 2009 is not only focused on providing insights into how AI can be implemented in real-world applications, but also covers innovative methods, tools and ideas of AI at the architectural and algorithmic level.
The response to the 'Call for Papers' was overwhelming, resulting in the submission of 113 high-quality full papers. All contributions were reviewed by two independent academic referees; a third referee was consulted in cases of conflicting reviews after the reviewing phase was officially over. Finally, 30 full papers and 32 short papers were accepted, which amounts to an acceptance rate of 27% for full papers and 28% for short ones. The authors of the accepted papers come from 19 countries from all over the world. The papers included in the proceedings offer stimulating insights into emerging applications of AI and describe advanced prototypes, systems, tools and techniques. The 2009 AIAI Proceedings will interest not only academics and researchers, but also IT professionals and consultants, by examining technologies and applications of demonstrable value.
Two keynote speakers have been invited to give presentations on innovative and state-of-the-art aspects of AI:
1. Professor Nikolaos Bourbakis, Associate Dean for Engineering Research, Dis-
tinguished Professor of Information Technology and Director of the ATR Cen-
ter at Wright State University will talk about “Synergies of AI Methods for Ro-
botic Planning & Grabbing, Facial Expressions Recognition, and Blind's
Navigation”.
2. Professor Dominic Palmer-Brown, Dean, London Metropolitan University,
UK, will talk about “Neural Networks for Modal and Virtual Learning”.
We would like to express our thanks to the Program Committee chair, Associ-
ate Professor L. Iliadis, to the Workshop chair, Assistant Professor N. Bassiliades,
and to the Organizing Committee chair Professor Yannis Manolopoulos, for their
crucial help in organizing this event. Special thanks are also due to the co-editors
of the proceedings, Assistant Professor Ilias Maglogiannis and Lecturer Gregory
Tsoumakas.
The AIAI 2009 conference comprises the following seven main thematic sessions:
• Machine Learning and Classification
• Knowledge Engineering and Decision Support Systems
• Ontologies and Knowledge Representation
• AI in Medical Informatics & Biomedical Engineering
• Signal and Image Processing for Knowledge Extraction
• Artificial Intelligence Applications
• Intelligent Environments and HCI
Workshops on various specific AI application areas, such as Software Engineering, Bioinformatics and Medicine, Learning, and the Environment, have also been scheduled.
The wide range of topics and the high level of the contributions will surely guarantee a very successful conference. We express our special thanks to all who have contributed to the organization and scientific content of this conference, first and foremost to the authors and reviewers of the papers, as well as to the members of the Program and Organizing Committees.
Ioannis Vlahavas
Max Bramer
Organization of the AIAI’2009 Conference
Workshop Chair
Proceedings co-Editors
Organizing Committee
Invited Papers
Short papers
A Fuzzy Knowledge-based Decision Support System for Tender Call
Evaluation………………………………………………………….. 51
Panos Alexopoulos, Manolis Wallace, Konstantinos Kafentzis,
Aristodimos Thomopoulos
Extended CNP Framework for the Dynamic Pickup and Delivery
Problem solving……………………………………………………......... 61
Zoulel Kouki, Besma Fayech chaar, Mekki Ksouri
Short papers
OntoLife: an Ontology for Semantically Managing Personal
Information…………………………………………………….……... 127
Eleni Kargioti, Efstratios Kontopoulos, Nick Bassiliades
Short papers
MEDICAL_MAS: an Agent-Based System for Medical Diagnosis…... 225
Mihaela Oprea
Short papers
Two Levels Similarity Modelling: a novel Content Based Image
Clustering Concept……………………………………...……………. 277
Amar Djouak, Hichem Maaref
Short papers
A Genetic Algorithm for the Classification of Earthquake Damages
in Buildings……………………………….………………………….. 341
Peter-Fotios Alvanitopoulos, Ioannis Andreadis, Anaxagoras Elenas
Short papers
Defining a Task's Temporal Domain for Intelligent Calendar
Applications………………………………………………………….. 399
Anastasios Alexiadis, Ioannis Refanidis
Short papers
A Lazy Approach for Machine Learning Algorithms……………....... 517
Ines M. Galvan, Jose M. Valls, Nicolas Lecomte, Pedro Isasi
TELIOS: A Tool for the Automatic Generation of Logic
Programming Machines …………………………………………….. 523
Alexandros Dimopoulos, Christos Pavlatos, George
Papakonstantinou
Nikolaos G. Bourbakis
Bourbakis, N.G., 2009, in IFIP International Federation for Information Processing, Volume 296;
Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 1–1.
Neural Networks for Modal and Virtual
Learning
Dominic Palmer-Brown
Abstract This talk will explore the integration of learning modes into a single
neural network structure in order to overcome the inherent limitations of any given
mode (for example some modes memorize specific features, others average across
features and both approaches may be relevant according to the circumstances). In-
spiration comes from neuroscience, cognitive science and human learning, where
it is impossible to build a serious model of learning without consideration of mul-
tiple modes; and motivation also comes from non-stationary input data, or time
variant learning objectives, where the optimal mode is a function of time. Several
modal learning ideas will be presented, including the Snap-Drift Neural Network
which toggles its learning (across the network or on a neuron-by-neuron basis) be-
tween two modes, either unsupervised or guided by performance feedback (rein-
forcement) and an adaptive function Neural Network (ADFUNN) in which adap-
tion applies simultaneously to both the weights and the individual neuron
activation functions. The talk will also focus on a virtual learning environment ex-
ample that involves the modal learning Neural Network, identifying patterns of
student learning that can be used to target diagnostic feedback that guides the
learner towards increased states of knowledge.
Palmer-Brown, D., 2009, in IFIP International Federation for Information Processing, Volume 296;
Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 2–2.
A Hybrid Technology for Operational Decision
Support in Pervasive Environments
St.Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
(SPIIRAS), 39, 14th line, St.Petersburg, 199178, Russia
{smir, oleg, nick, alexey}@iias.spb.su
Abstract The paper addresses the issue of development of a technology for opera-
tional decision support in a pervasive environment. The technology is built around
the idea of using Web-services for self-organization of heterogeneous resources of
the environment for decision support purposes. The approach focuses on three
types of resources to be organized: information, problem-solving, and acting. The
final purpose of the resource self-organization is to form an ad-hoc collaborative
environment, members of which cooperate with the aim to serve the current needs
according to the decision situation. The hybrid technology proposed in the paper
integrates technologies of ontology management, context management, constraint
satisfaction, Web-services, and intelligent agents. The application of the technol-
ogy is illustrated by response to a traffic accident.
1 Introduction
Smirnov, A., Levashova, T., Shilov, N. and Kashevnik, A., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 3–12.
The idea behind the approach is to use Web-services as mediators between the pervasive environment and the surrounding resources. It is proposed to represent the resources by sets of Web-services: the set of Web-services representing a resource implements the functionality of this resource. This makes it possible to replace the self-organization of resources with self-organization of the Web-services. Under this replacement, the resource collaborative environment corresponds to an ad-hoc service network.
The decision situation is modeled at two levels: abstract and operational. At the
abstract level the decision situation is represented by the abstract context that is an
ontology-based model of this situation expressed by constraints. At the operational
level the decision situation is represented by the operational context that is an in-
stantiated abstract context. The operational context is produced by the self-
organized service network representing resources to be collaborated.
The decision support system (DSS) built upon the hybrid technology is based
on service-oriented architecture. The architecture enables interaction with the het-
erogeneous resources using the ad-hoc Web-service network and Web-service
communications using an agent-based service model [1].
2 Hybrid Technology
Objectives | Techniques | Technology | Result in terms of OOCN
AO building | Integration of existing ontologies, knowledge formalisation | Ontology engineering, ontology management | OOCN without variable values
Representation of resource functionalities | Alignment of the AO and Web-service descriptions | Ontology engineering, ontology management, Web-services | OOCN without variable values
Extraction and integration of relevant knowledge | Abstract context composition | Ontology management | General problem model
Self-organization of Web-services | Agent interactions | Intelligent agents | Instantiated problem model
Gathering and processing of relevant information | Operational context producing | Context management, Web-services | Instantiated problem model
Search for a solution | Problem solving | Constraint satisfaction, Web-services | A set of feasible solutions
User preferences revealing | Context-based accumulation of made decisions | Profiling | A set of user constraints
Based on the type of the situation, the DSS extracts knowledge relevant to this type from the AO and integrates it into the abstract context, which is an ontology-based model of the situation. The knowledge is extracted along with the Web-services whose descriptions are aligned with this knowledge. Ontology management methods are applied to extract and integrate the knowledge.
The abstract context is the basis for the self-organization of the Web-services included in it into a service network. The purpose of the service network is the organization of a resource collaborative environment for producing an operational context and for taking the joint actions required in the situation. The operational context is the instantiated abstract context, i.e. an instantiated model of the decision situation. The operational context is interpreted as a constraint satisfaction problem (CSP) by the service network using constraint satisfaction technology.
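The paper does not spell out the CSP encoding itself. Purely as an illustration, the following Python sketch (variables, domains and the single constraint are invented for a traffic-accident response scenario) shows how an instantiated operational context could be treated as a CSP and solved by naive backtracking.

# Hypothetical sketch: an instantiated operational context viewed as a CSP
# and solved by naive backtracking; the variables, domains and constraint
# below are illustrative only, not taken from the paper.

def backtrack(variables, domains, constraints, assignment=None):
    """Return one feasible assignment or None if the CSP has no solution."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # keep the value only if no constraint is violated so far
        if all(c(assignment) for c in constraints):
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

variables = ["hospital", "route"]
domains = {"hospital": ["H1", "H2"], "route": ["R1", "R2", "R3"]}
constraints = [
    # hypothetical availability constraint: hospital H2 is reachable only via R3
    lambda a: ("hospital" not in a or "route" not in a
               or a["hospital"] != "H2" or a["route"] == "R3"),
]
print(backtrack(variables, domains, constraints))  # e.g. {'hospital': 'H1', 'route': 'R1'}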
Self-organization of the Web-services is carried out through negotiation of their needs and possibilities. To make the Web-services active components capable of self-organizing, an agent-based service model is used. Intelligent agents negotiate the services' needs and possibilities in terms of the AO, i.e. the input (service needs) and output (service possibilities) arguments of the functions that the Web-services implement. Producing the operational context involves technologies of context management, Web-services, and intelligent agents.
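The negotiation itself is agent-based; purely as an illustration of the underlying input/output matching (service names and ontology concepts below are invented, not taken from the paper), an ad-hoc service network could be assembled as follows.

# Hypothetical sketch: linking Web-services whose outputs (possibilities)
# satisfy the input needs of other services; all names are invented.

services = {
    "WeatherService":  {"needs": set(),               "offers": {"weather"}},
    "RoadService":     {"needs": set(),               "offers": {"road"}},
    "RoutingService":  {"needs": {"weather", "road"}, "offers": {"route"}},
    "HospitalService": {"needs": {"route"},           "offers": {"hospital"}},
}

def build_network(services):
    """Return directed links (provider, consumer, concept) for every satisfied need."""
    links = []
    for consumer, c in services.items():
        for need in c["needs"]:
            for provider, p in services.items():
                if provider != consumer and need in p["offers"]:
                    links.append((provider, consumer, need))
    return links

for provider, consumer, concept in build_network(services):
    print(f"{provider} -> {consumer} ({concept})")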
The decision situation (the operational context) and a set of solutions for tasks
represented in this context are presented to the decision maker. The solution cho-
sen by the decision maker is considered to be the decision. The abstract and opera-
tional contexts, the set of solutions, and the decision are saved. The DSS uses
them for revealing user preferences. This is the focus of the profiling technology.
3 Service-Oriented Architecture
In the architecture (Fig. 1) of the DSS intended for functioning in a pervasive environment, two types of Web-services are distinguished: core Web-services and operational Web-services.
The core Web-services are intended to support the DSS users and the abstract
context creation. These Web-services comprise:
• UserProfileService creates, modifies, and updates the user profile; provides access to this profile; collects information about the user; accumulates information about the decisions made, in a context-based way; and reveals user preferences;
• UserInteractionsService is responsible for the interactions of the DSS with its users. It mediates between the DSS and its users, providing DSS messages, context-sensitive help, pictures of decision situations, and results of problem solving, and delivering information from the users to the DSS;
• AOAccessService provides access to the AO;
• AbstractContextService creates, stores, and reuses abstract contexts;
• ManagementService manages Web-services to create the abstract context. It
operates with the service registry where the core services are registered.
The operational Web-services self-organize a Web-service network.
[Fig. 1: DSS service-oriented architecture; the operational Web-services supply information such as weather conditions, route availability, and road locations.]
… locations are read from the healthcare infrastructure database, and hospital free capacities are provided by hospital administration systems.
The common model of a Web-service implementing information resource functions is illustrated by the example of the Web-service responsible for receiving information about the emergency and police teams and firefighter brigades available in the region. This Web-service queries the database storing information about the emergency and police teams and firefighter brigades, and returns a list of such teams and brigades. The list contains the identifiers of the teams and brigades, the URIs of their Web-services (these Web-services are used to receive additional information about the teams and brigades, e.g. the current brigade location), and the types of vehicles used by these teams and brigades. The Web-service being illustrated is implemented in PHP [5]. Its key steps are as follows:
...
// Connection to the database storing information about the emergency and
// police teams, and firefighter brigades (credentials are omitted in the
// paper; the placeholders below are added here)
$conn = odbc_connect("brigades", $db_user, $db_password);

// Query to the database:
//   id                    - identifier of the brigade
//   brigade_WebserviceURI - URI of the Web service of the brigade
//   brigade_Type          - type of the vehicle the brigade uses
//   brigade_WorkType      - type of the brigade (emergency team,
//                           firefighter brigade or police team)
$sql = "SELECT id, brigade_Description, brigade_WebserviceURI,
        brigade_Type, brigade_WorkType FROM brigades";

// GetData converts the query result from the database format into an
// OOCN-compatible format and returns it to the Web service
$brigades = GetData($conn, $sql);
...
// Web service output - a list of brigades with their characteristics
return $brigades;
5 Related Research
Acknowledgements The research described in this paper was carried out as part of the project funded by grant 08-07-00264 of the Russian Foundation for Basic Research, and of project 213 of the research program "Intelligent information technologies, mathematical modelling, system analysis and automation" of the Russian Academy of Sciences.
References
Flavia Cristina Bernardini and Ana Cristina Bicharra Garcia and Inhaúma
Neves Ferraz
Bernardini, F.C., Garcia, A.C.B. and Ferraz, I.N., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp.13–20.
1 Introduction
Motor pumps, due to the rotating nature of their internal pieces, produce vibrations. Accelerometers strategically placed at points next to the motor and the pump allow the acceleration of the machine over time to be measured, thus generating a signal of the vibration level. Fig. 1 shows a typical positioning configuration of accelerometers on the equipment. In general, the orientations of the sensors follow the three main axes of the machine, i.e. vertical, horizontal and axial.
The presence of any type of machine fault causes changes in the mechanical and electrical forces acting in the machine [10]. The degree of change depends upon the nature and intensity of the fault, and manifests itself in the machine vibration as the excitation of some of the vibration harmonics. Some machine faults can be directly related to specific vibration harmonics. Table 6.0, "Illustrated Vibration Diagnostic Chart", in [7], shows how to analyze signals, searching for mechanical and electrical faults. In what follows, electrical and mechanical faults are briefly described.
Until the late '80s, the most popular approach to classification problems was a knowledge engineering one, consisting in manually defining a set of rules encoding expert knowledge on how to classify documents under the given categories. In the '90s, this approach increasingly lost popularity in favor of the machine learning paradigm, according to which a general inductive process automatically builds a general hypothesis to classify new instances, based on instances previously labeled by some domain expert [9,8]. However, in some problems the labels attributed to the instances are not guaranteed to be true, or the classes are unbalanced, which makes the model induction process difficult [1]. In these cases, it is interesting to construct an expert system, containing the knowledge of the domain expert represented in a parametric net, to (a) classify new instances with a set of labels; and (b) validate the available instances.
Parametric nets are used to infer logical facts, supporting decision making. In a parametric net, the parameters represent the problem features, domain properties, or decisions that must be made during the reasoning process. The various parameters of a knowledge base are inter-connected. These connections are directed, because they represent the dependencies between parameters and define the logical precedence of parameter instantiation. The parameter values represent the actual state of the problem being solved.
In its basic version, proposed in [3] for Active Document Design (ADD) and illustrated in Fig. 2, the parameters belong to one of three categories: primitive, derived or decision. Primitive parameters normally represent the problem requisites; in general, their values are supplied by the user during the reasoning process. Values of derived parameters are calculated based on the values of other parameters. A value is chosen for a decision parameter from a set of alternatives of the attribute. The set of alternatives is filtered by constraints that represent conditions to be satisfied by the values coming from the parameters connected to the decision parameter. The constraints are represented by rules. The rules have the form "if <body> then update weight wk with (positive or negative) value", where <body> is a set of conditions of the form primitive <operator> value, and <operator> may be >, <, ≤, ≥ or =. At the end of the reasoning process, all the alternatives are compared, and one alternative is chosen as the answer to the problem being solved. A common way to decide which is the best alternative is to weight each alternative: each evaluated criterion contributes a value to be added to an alternative, and the alternative with the maximum weight at the end of the evaluation is selected as the best alternative.
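A minimal sketch of this rule-based weighting is given below; the parameters, rules and weights are invented for illustration and do not reproduce the authors' actual ADD model.

# Hypothetical sketch of parametric-net rule evaluation: each rule tests
# conditions on primitive parameters and adds a (positive or negative)
# weight to one alternative.

import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}

# rule = (alternative, [(primitive, op, value), ...], weight)
RULES = [
    ("unbalance",    [("vib_1X", ">", 5.0), ("radial", "=", True)], +3),
    ("misalignment", [("vib_1X", ">", 5.0), ("axial",  "=", True)], +2),
    ("looseness",    [("vib_2X", ">", 2.0)],                        +1),
]

def evaluate(rules, parameters):
    """Accumulate rule weights per alternative for one signal."""
    weights = {}
    for alternative, body, weight in rules:
        if all(OPS[op](parameters[p], v) for p, op, v in body):
            weights[alternative] = weights.get(alternative, 0) + weight
    return weights

signal = {"vib_1X": 7.2, "vib_2X": 1.1, "radial": True, "axial": False}
weights = evaluate(RULES, signal)
print(weights, "->", max(weights, key=weights.get))  # {'unbalance': 3} -> unbalance

Positive weights accumulated over all signals would then be normalized to the 0-1 range, as described in the Model Application paragraph below.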
Model Construction: The type of motor pump considered in our study has the
following characteristics: horizontal centrifuge with one stage (one rotor), direct
coupling without gear box, and actuated by AC induction squirrel cage motor. The
faults considered in our study are unbalance, misalignment, electric, hydraulic, cavitation, turbulence, bearing faults, looseness and resonance. Fig. 1 shows the points on the motor pump where specific vibrations are captured. At each point, a time signal is captured by an accelerometer, and signal operators are applied to obtain acceleration, velocity and envelope signals. Each signal is important for detecting groups of faults, or specific faults. These pieces of information about motor pump
        AF1         AF2         AF3         ...   AFM
        (1XRF)      (2XRF)      (3XRF)            (...)
FV1     x1          x2          x3          ...   xM       (1H-VxF)
FV2     xM+1        xM+2        xM+3        ...   x2M      (1V-AxF)
FV3     x2M+1       x2M+2       x2M+3       ...   x3M      (1A-ExF)
...     ...         ...         ...         ...   ...
FVT     x(T-1)M+1   x(T-1)M+2   x(T-1)M+3   ...   xTM      ...

Fig. 3. Instances of abstract features. The texts in parentheses give an example of abstract features: here the abstract features are vibration values at harmonic frequencies, e.g. 1XRF means the vibration value at 1x the rotation frequency, 2XRF the vibration value at 2x the rotation frequency, and so on. The feature instances are signals (VxF - velocity per frequency signal; AxF - acceleration per frequency signal; ExF - envelope per frequency signal) captured at the motor pump points (1, 2, 3 and 4) in different directions (H - horizontal, V - vertical or A - axial).
diagnosis through vibration are extracted from Table 6.0, "Illustrated Vibration Diagnostic Chart", in [7], and were explained and detailed by domain experts. All of this knowledge was used to construct the constraints of the failure decision parameter. The parametric net model that aims to classify signal sets into a set of classes has one decision parameter and many primitive parameters. Each fault is an alternative of the failure decision parameter of the parametric net.
Since vibration harmonics are what influence each alternative, each one is a primitive parameter. The primitives considered are: vibration values at harmonics (1X, 2X, ...) and inter-harmonics (0.5X, 1.5X, ...) of the rotational frequency in r.p.s.; RMS values calculated at harmonic and inter-harmonic frequencies; BPFO, BPFI, BSF and RHF frequencies; the electrical frequency; and the pole frequency. There are also primitives that give the model characteristics of the capturing position: velocity, acceleration, envelope, radial, axial, motor and pump, and they are set to true or false depending on the signal. E.g., if the signal is captured at position 1V and it is the velocity signal that is being analyzed, then velocity is set to true, whereas acceleration and envelope are set to false; radial is set to true, whereas axial is set to false; and motor is set to true, whereas pump is set to false.
Model Application: To analyze a motor pump, ten acceleration signals in the frequency domain are captured (one signal per point). Applying the mentioned operators, 30 signals are obtained. The model has all alternative weights initialized to 0. Each velocity, acceleration and envelope signal of each point is shown to the model, which may increment the weight of each alternative. At the end of this process, the weights of all alternatives with positive weights (greater than zero) are normalized to the range 0-1 and shown to the analyst.
A Case Study: We implemented a computational system, called ADDRPD, to
help the analyst in all of the analysis process. Time signals of a specific motor
pump are imported to the system. All transformations are applied and the resulting
signals are shown to the user. One instance was labeled by the expert having only
Fig. 4. Velocity (RMS/s) per rotational frequency harmonics signals, showing high vibration and lower peaks at rotational frequency harmonics: (a) captured at 1H and (b) captured at 2A.
Both signals shown in Fig. 4 are velocity per frequency signals; however, Fig. 4(a) was captured in the radial direction, whereas Fig. 4(b) was captured in the axial direction. Since both have a high vibration peak at 1X, unbalance is the most representative failure in the motor pump. But the signal in Fig. 4(b) is from the axial direction, which strongly indicates a misalignment failure, and the lower peaks at the harmonic frequencies weakly indicate looseness. The analysis shows that the parametric net joined with visual tools is an efficient way of analyzing motor pumps to diagnose their failures.
In this work we propose a method to assist fault diagnosis using parametric nets to represent the expert knowledge, based on vibration analysis. To this end, we propose a parametric net for multi-label problems. We present a model we developed for a special type of motor pump: horizontal centrifuge with one stage (one rotor), direct coupling without gear box, and actuated by an AC induction squirrel cage motor. For a preliminary evaluation of the proposed model, we present a case study using signals captured from a motor pump in real-world use. We could notice that the model can assist the expert interpretation of the signals.
Our method was implemented in a computational system called ADDRPD. The system will help to classify new instances, with which we intend to improve our method. Ongoing work includes evaluating our method using recall and precision measures for each class, as a multi-label problem [12]. After validating (part of) the dataset, the parametric model will be updated, and machine learning algorithms will be used to induce models to compare with the method proposed in this work.
References
1. Batista, G.; Prati, R. C.; Monard, M. C. "A study of the behavior of several methods for balancing machine learning training data". SIGKDD Explorations, 6(1):20-29, 2004.
2. Brinker, K.; Hüllermeier, E. "Case-Based Multilabel Ranking". In: Proceedings 20th International Joint Conference on Artificial Intelligence (IJCAI '07), pp. 702-707 (2007).
3. Garcia, A. C. B. "Active Design Documents: A New Approach for Supporting Documentation in Preliminary Routine Design", PhD thesis, Stanford University (1992).
4. Kowalski, C. T.; Orlowska-Kowalska, T. "Neural networks application for induction motor faults diagnosis". Mathematics and Computers in Simulation, 63:435-448, 2003.
5. Li, B.; Chow, M.; Tipsuwan, Y.; Hung, J. C. "Neural-Network-Based Motor Rolling Bearing Fault Diagnosis". IEEE Transactions on Industrial Electronics, 47(5), 2000.
6. Mendel, E.; Mariano, L. Z.; Drago, I.; Loureiro, S.; Rauber, T. W.; Varejão, F. M.; Batista, R. J. "Automatic Bearing Fault Pattern Recognition Using Vibration Signal Analysis". In: Proceedings IEEE International Symposium on Industrial Electronics (ISIE'08), Cambridge, pp. 955-960 (2008).
7. Mitchell, J. S. Introduction to Machinery Analysis and Monitoring, PennWell Books, Tulsa (1993).
8. Mitchell, T. Machine Learning. McGraw Hill (1997).
9. Sebastiani, F. "Machine learning in automated text categorization". ACM Computing Surveys, 34(1):1-47, 2002.
10. Singh, G. K.; Kazzaz, S. A. S. A. "Induction machine drive condition monitoring and diagnostic research - a survey". Electric Power Systems Research, 64:145-158, 2003.
11. Schapire, R. E.; Singer, Y. "BoosTexter: A boosting-based system for text categorization". Machine Learning, 39(2/3):135-168, 2000.
12. Shen, X.; Boutell, M.; Luo, J.; Brown, C. "Multi-label Machine Learning and its Application to Semantic Scene Classification". In: Proceedings of the 2004 International Symposium on Electronic Imaging (EI 2004), pp. 18-22 (2004).
13. Zhang, S.; Ganesan, R.; Xistris, G. D. "Self-Organizing Neural Networks for Automated Machinery Monitoring Systems". Mechanical Systems and Signal Processing, 10(5):517-532, 1996.
Providing Assistance during Decision-making
Problem Solving in an Educational Modelling
Environment
1 Introduction
Politis, P., Partsakoulakis, I., Vouros, G. and Fidas, C., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 21–29.
ing environment that supports the expression of different kinds of models (semi-quantitative models, quantitative models, and decision-making models), mostly for students 11-16 years old. In this paper we present the decision-making module (Partsakoulakis & Vouros, 2002) of ModelsCreator (Dimitracopoulou et al., 1997; Dimitracopoulou et al., 1999).
The ModelsCreator decision-making module supports students in constructing models with or without doubts (expressed by probabilities). The module provides generic techniques for model validation, to help students discover mistaken features in their models and reach an agreement with the instructor. The models meet the requirements of many curriculum subject matters, permitting interdisciplinary use of the modelling process. ModelsCreator puts great emphasis on the visualization of the modelling entities, their properties and their relations. Visualization is crucial in supporting the reasoning development of young students and favours the transition from reasoning over objects to reasoning with abstract concepts (Teodoro 1997). This feature extends also to the simulation of executable models, allowing their validation through a visual representation of the phenomenon itself rather than an abstract one, as is usually the case.
[Fig. 1 The entities follow the COM standard and the XML specification. (Diagram: the Entities Editor, the libraries of entities, the user and ModelsCreator, connected via XML and COM.)]
Models for decision making are typically qualitative models. Each decision-making model has precisely one hypothesis part, precisely one decision part and at most one counter-decision part. The student selects certain properties/attributes, sets the desired values and relates them with the appropriate logical connective. For instance, the IF connective applies to an AND expression, while the AND connective applies to an OR expression and to a property of an entity.
Using such an environment, one may construct fully parenthesised expressions of arbitrary complexity that conform to the following formal grammar:
Expression = if Construct then Construct
           | if Construct then Construct else Construct

Construct  = (Construct and Construct)
           | (Construct or Construct)
           | not(Construct)
           | Entity_Property_or_Attribute
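As an illustration only (the environment itself is graphical, and the class and parameter names below are invented), such fully parenthesised expressions could be represented and evaluated as a small abstract syntax tree:

# Hypothetical sketch: an AST for the decision-model grammar and a simple
# evaluator over a dictionary of entity property/attribute truth values.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Prop:
    """Leaf node: Entity_Property_or_Attribute, looked up in a truth table."""
    name: str
    def eval(self, state):
        return state[self.name]

@dataclass
class And:
    left: object
    right: object
    def eval(self, state):
        return self.left.eval(state) and self.right.eval(state)

@dataclass
class Or:
    left: object
    right: object
    def eval(self, state):
        return self.left.eval(state) or self.right.eval(state)

@dataclass
class Not:
    arg: object
    def eval(self, state):
        return not self.arg.eval(state)

@dataclass
class IfThenElse:
    """Expression = if Construct then Construct [else Construct]."""
    cond: object
    then: object
    other: Optional[object] = None
    def eval(self, state):
        if self.cond.eval(state):
            return self.then.eval(state)
        return self.other.eval(state) if self.other is not None else None

# Hypothetical model: if (cloudy and windy) then stay_indoors
model = IfThenElse(And(Prop("cloudy"), Prop("windy")), Prop("stay_indoors"))
print(model.eval({"cloudy": True, "windy": True, "stay_indoors": True}))  # True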
3 Testing a Model
Testing a model means finding out whether it is correct. However, sometimes a solution to a decision-making problem might not be entirely true or false. Furthermore, there may be alternative correct solutions to a decision problem. The model-checking mechanism of the decision-making module aims to facilitate the cooperative process between students and instructors in reaching an agreement about the situations in which a decision is valid.
To overcome this problem, a reference model has been specified for each logical domain (any curriculum subject). Thus, for each logical domain there exists a reference model, which consists of a number of alternative correct models specified by the domain owner. In this way a better evaluation can be achieved while testing the correctness of the user model.
To evaluate his model, the user first has to specify the logical domain. The logical domain module informs the translation module to set the right reference model. Then the model module informs the translation module, which starts preparing the interface for the Prolog module. This interface consists of the reference model and the user model in ASCII format. The Prolog module checks the reference model against the user's model and provides the appropriate feedback to the user.
The model-checking mechanism consists of four major steps. First, the two models (student and reference model) are converted, using equivalence-preserving transformations, into equivalent Conjunctive Normal Forms (CNF). After that, the two models are compared and the results of the comparison are recorded in an intermediate form that can be visualized as a comparison table. Mistaken aspects of the student's model may be diagnosed by inspecting the comparison table. Finally, the mechanism decides on the appropriate feedback message that should be given to the student.
To sustain the fundamental relation between the hypothesis and the decision part of the model, the student model and the reference model are treated independently. Each part is then converted, using tautologies, into CNF. A sentence in CNF is a conjunction of a set of disjunctive formulas (Fig. 2). Each disjunctive formula consists only of atomic formulas.
(A) ∧ ... ∧ (Z)
(a1 ∨ a2 ∨ ... ∨ ana) ∧ ... ∧ (z1 ∨ z2 ∨ ... ∨ znz)
Fig. 2: The Conjunctive Normal Form
Before comparing the two models, the mechanism simplifies the sentences by removing redundant elements (atomic formulas that appear more than once in the same disjunctive formula, and disjunctive formulas that are already implied by the rest of the whole formula).
3.1.2 Comparing student model and reference model
The two models are equivalent if each disjunctive formula of one model is implied by the other model. To compare the two models in CNF, each disjunctive formula is therefore converted to a set of atomic formulas. For example, the formula a ∨ b ∨ c ∨ d is converted to the set {a, b, c, d}. In other words, each model is converted to a set of sets, where each inner set corresponds to a disjunctive formula and contains atomic formulas.
The comparison process performed by the model-checking mechanism results in the recording of the atomic formulas that are missing or surplus in each disjunctive formula of both models. The comparison table reflects the result of the comparison process. Entries of this table correspond to pairs of disjunctive formulas: each entry (i,j) contains a sub-table with the missing and surplus elements of the i-th disjunctive formula of the student model compared with the j-th disjunctive formula of the reference model.
If in each row and column of the comparison table there is a sub-table with no missing or surplus atomic formulas, then for each disjunctive formula of the student's model there exists a matching formula in the reference model, so the student model and the reference model are equivalent. If the two models are not equivalent, the model-checking mechanism can diagnose different situations by inspecting the comparison table.
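A toy sketch of this comparison, with invented clauses and a simplified equivalence test, is given below:

# Hypothetical sketch: each CNF model is a list of clauses (frozensets of
# atomic formulas); the comparison table records missing and surplus atoms
# for every pair of clauses.

def compare(student, reference):
    table = {}
    for i, s_clause in enumerate(student):
        for j, r_clause in enumerate(reference):
            table[(i, j)] = {
                "missing": r_clause - s_clause,   # atoms the student omitted
                "surplus": s_clause - r_clause,   # atoms the student added
            }
    return table

def equivalent(table, n_student, n_reference):
    def clean(i, j):
        entry = table[(i, j)]
        return not entry["missing"] and not entry["surplus"]
    rows_ok = all(any(clean(i, j) for j in range(n_reference)) for i in range(n_student))
    cols_ok = all(any(clean(i, j) for i in range(n_student)) for j in range(n_reference))
    return rows_ok and cols_ok

student   = [frozenset({"a", "b"}), frozenset({"c"})]
reference = [frozenset({"a", "b"}), frozenset({"c", "d"})]
table = compare(student, reference)
print(table[(1, 1)])   # atom 'd' is missing, no surplus
print(equivalent(table, len(student), len(reference)))   # False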
If the student has over-specified the situations in which a decision can be formed, or has specified the right properties/attributes for the right entities but has not assigned the proper values to (at least) one of these properties/attributes, this means that the student has missed at least one atomic formula. This formula may correspond to an entity participating in the model, to a property, to an attribute, or to a value assigned to a property/attribute of a participating entity.
If the student has under-specified the situations in which a decision can be formed, this means that at least one atomic formula in the student model is surplus. Again, this formula may correspond to an entity participating in the model, to a property, to an attribute, or to a value assigned to a property/attribute of a participating entity.
If the student has related two entities with a wrong logical connective, the diagnosis is also based on the atomic formulas that are missing and surplus.
Not all of the above cases are weighted the same. A total score is computed, which determines the feedback provided to the student. The student model is thus assigned a score between 0 and 100 (Table 1).
Table 1. Situation and score assigned during student model’s judgment
Score Situation
0 Non recognizable error
10 Surplus entity
20 Entity missing
30 Surplus property
40 Property missing
50 Wrong value in property
60 Wrong probability value
70 Connective misuse
100 Equivalent to reference model
The score assigned during the judgment of the student model determines the feedback message provided by the environment. One or more messages are assigned to each score level, ordered by preference; messages with lower preference are equally or more detailed.
The purpose of the system is to help the student make his model equivalent to the reference model. The feedback messages should give suitable assistance so that students construct valid models. To attain this aim, the system creates and displays messages of increasing detail.
If the system has diagnosed that the student over-specified the situations for making a decision, e.g. by specifying a surplus entity, it proposes that the student check the entities in the model. If the student does not improve his score, the checking mechanism provides a suggestion by prompting the student to check whether there are any surplus entities in the student model. After that, if the student persists in the same invalid situation, the checking mechanism provides more detailed assistance by stating that the specific entity is not related to the situation being modelled.
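A simplified sketch of this escalation scheme is shown below; the message texts and the attempt counter are hypothetical, and only the score levels follow Table 1.

# Hypothetical sketch: mapping a score level from Table 1 to feedback
# messages of increasing detail as the same error persists.

FEEDBACK = {
    10: ["Check the entities in your model.",
         "Are there any surplus entities in your model?",
         "This entity is not related to the situation being modelled."],
    70: ["Check the logical connectives.",
         "One connective relates the wrong parts of the model."],
}

def feedback(score, attempts):
    """Return a more detailed message the longer the same error persists."""
    messages = FEEDBACK.get(score, ["Please revise your model."])
    return messages[min(attempts, len(messages) - 1)]

print(feedback(10, 0))   # first, least detailed hint
print(feedback(10, 2))   # most detailed hint after repeated attempts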
4 Conclusions
References
1. Beckett, L., Boohan, R.: Computer Modelling for the Young - and not so Young - Scientist. In: Microcomputer Based Labs: Educational Research and Standards, R. Tinker (ed.), Springer Verlag, ASI Series, Vol. 156 (1995) 227-238
2. Dimitracopoulou, A., Vosniadou, S., Ioannides, C.: Exploring and modelling the real world through designed environments for young children. In: Proceedings 7th European Conference for Research on Learning and Instruction (EARLI), Athens, Greece (1997)
3. Dimitracopoulou, A., Komis, V., Apostolopoulos, P., Politis, P.: Design Principles of a New Modelling Environment Supporting Various Types of Reasoning and Interdisciplinary Approaches. In: Proceedings 9th International Conference on Artificial Intelligence in Education, IOS Press, Ohmsha (1999) 109-120
4. Fidas, C., Komis, V., Avouris, N., Dimitracopoulou, A.: Collaborative Problem Solving using an Open Modelling Environment. In: G. Stahl (ed.), Computer Support for Collaborative Learning: Foundations for a CSCL Community, Proceedings of CSCL 2002, Boulder, Colorado, Lawrence Erlbaum Associates, Inc. (2002) 654-655
5. Partsakoulakis, I., Vouros, G.: Helping Young Students Reach Valid Decisions through Model Checking. In: Proceedings 3rd Hellenic Conference on Technology of Information and Communication in Education, Rhodes, Greece (2002) 669-678
6. Teodoro, V. D.: Learning with Computer-Based Exploratory Environments in Science and Mathematics. In: S. Vosniadou, E. De Corte, H. Mandl (eds.), Technology-Based Learning Environments: Psychological and Educational Foundations, NATO ASI Series, Vol. 137, Berlin: Springer Verlag (1994) 179-186
7. Teodoro, V. D.: Modellus: Using a Computational Tool to Change the Teaching and Learning of Mathematics and Science. In: "New Technologies and the Role of the Teacher", Open University, Milton Keynes, UK (1997)
Appendix A
Appendix B
<AttributeState>
  <Values AttributeStateID="xx" max="xxx" min="xxx" default="xxxx"/>
</AttributeState>
<AttributeState>
  <Values AttributeStateID="xx" max="xxx" min="xxx" default="xxxx"/>
</AttributeState>
<AttributeState> ...
</Attribute>
<Attribute ...
<EntityState id="xxx">
  <Icon Filename="xxx"/>
  <Attribute ID="xxx" AttributeStateID="xx"/>
  <Attribute ID="xxx" AttributeStateID="xx"/>
  <Attribute ...
</EntityState>
<EntityState> ...
</Entity>
Alternative Strategies for Conflict Resolution in
Multi-Context Systems
1
Institute of Computer Science, FO.R.T.H., Greece, [email protected]
2
Institute of Computer Science, FO.R.T.H., Greece, [email protected]
3
Department of Computer Science, Athens University of Economics and Business,
[email protected]
Bikakis, A., Antoniou, G. and Hassapis, P., 2009, in IFIP International Federation for Information
Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L.,
Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 31–40.
he argued that the combination of non-monotonic reasoning and contextual rea-
soning would constitute an adequate solution to this problem. Since then, two
main formalizations have been proposed to formalize context: the propositional
logic of context (PLC [7], [11]), and the Multi-Context Systems introduced in [9],
which later became associated with the Local Model Semantics proposed in [8].
MCS have been argued to be most adequate with respect to the three dimensions
of contextual reasoning, as these were formulated in [5] (partiality, approxima-
tion, and proximity), and have been shown to be technically more general than
PLC [13]. Multi-Context Systems have also been the basis of two recent studies
that were the first to deploy non-monotonic reasoning methods in MCS: (a) the
non-monotonic rule-based MCS framework of [12], which supports default nega-
tion in the mapping rules allowing to reason based on the absence of context in-
formation; and (b) the multi-context variant of Default Logic proposed in [6],
which models bridge relations between different contexts as default rules, han-
dling cases of inconsistency in the imported knowledge. However, none of these
approaches includes the notion of priority or preference, which could be poten-
tially used for conflict resolution.
This paper focuses on the problem of global conflicts in Multi-Context Sys-
tems; namely the inconsistencies that may arise when importing conflicting infor-
mation from two or more different contexts. Even if all context theories are locally
consistent, we cannot assume consistency in the global knowledge base. The uni-
fication of local theories may result in inconsistencies caused by the mappings.
For example, a context theory A may import context knowledge from two differ-
ent contexts B and C, through two competing mapping rules. In this case, even if
the three different contexts are locally consistent, their unification through the
mappings defined by A may contain inconsistencies. In previous work [3], we
proposed a reasoning model that represents contexts as local rule theories and
mappings as defeasible rules, and a distributed algorithm for query evaluation,
which exploits context and preference information from the system contexts to re-
solve such conflicts. In this paper, we describe three alternative strategies for con-
flict resolution, which differ in the extent of context knowledge that system con-
texts exchange in order to be able to resolve the potential conflicts.
The rest of the paper is organized as follows. Section 2 describes the proposed
representation model. Section 3 describes the four alternative strategies for global
conflict resolution, and how they are implemented in four different versions of a
distributed algorithm for query evaluation. Section 4 presents the results of simu-
lation-based experiments that we conducted on the four strategies using a proto-
typical implementation of the algorithms in a Java-based P2P system. Finally, the
last section summarizes and discusses the plans of our future work.
2 Representation Model
In this system, there are six context theories and a query about literal x1 is is-
sued to context P1. To compute the truth value of x1, P1 has to import knowledge
from P2, P3 and P4. In case the three system contexts return positive truth values
for a2, a3 and a4 respectively, there will be a conflict about the truth value of a1
caused by the two conflicting mapping rules r12 and r13.
Single Answers requires each context to return only the truth value of the queried
literal. When a context receives conflicting answers from two different contexts, it
resolves the conflict using its preference order. The version of the distributed algo-
rithm that implements this strategy (P2P_DRSA) is called by a context Pi when it
receives a query about one of its local literals (say xi) and proceeds as follows:
1. In the first step, it determines whether the queried literal, xi or its negation ¬xi
derive from Pi's local rules, returning a positive or a negative truth value re-
spectively.
2. If Step 1 fails, the algorithm collects, in the second step, the local and mapping
rules that support xi (as their conclusion). For each such rule, it checks the truth
value of the literals in its body, by issuing similar queries (recursive calls of the
algorithm) to Pi or to the appropriate contexts. To avoid cycles, before each
new query, it checks if the same query has been issued before, during the same
algorithm call. If the algorithm receives positive answers for all literals in the
body of a rule, it determines that this rule is applicable and builds its Supportive Set SSri. This derives from the union of the set of the foreign literals contained in the body of ri with the Supportive Sets of the local literals in the body
of ri. In the end, in case there is no applicable supportive rule, the algorithm re-
turns a negative answer for xi and terminates. Otherwise, it computes the Sup-
portive Set of xi, SSxi, as the strongest of the Supportive Sets of the applicable
rules that support xi, and proceeds to the next step. To compute the strength of a
set of literals, P2P_DRSA uses the preference order Ti. A literal ak is considered
stronger than literal bl if Pk precedes Pl in Ti. The strength of a set is determined
by the weakest element in the set.
3. In the third step, the algorithm collects and checks the applicability of the rules
that contradict xi (rules with conclusion ¬xi). If there is no such applicable rule,
it terminates by returning a positive answer for xi. Otherwise, it computes the
Conflicting Set of xi, CSxi, as the strongest of the Supportive Sets of the appli-
cable rules that contradict xi.
4. In its last step, P2P_DRSA compares the strength of SSxi and CSxi using Ti to de-
termine the truth value of xi. If SSxi is stronger, the algorithm returns a positive
truth value. Otherwise, it returns a negative one.
In the system of Fig. 1, P2P_DRSA fails to produce a local answer for x1. In the
second step, it attempts to use P1’s mapping rules. The algorithm eventually re-
ceives positive answers for a2, a3 and a4, and resolves the conflict for a1 by com-
paring the strength of the Supportive Sets of the two conflicting rules, r12 and r13.
Assuming that T1=[P4,P2,P6,P3,P5], it determines that SSr12={a2} is stronger than SSr13={a3, a4} and returns a positive answer for a1 and eventually for x1.
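The strength comparison of this example can be sketched as follows; the mapping from literal names to their origin contexts (a2 comes from P2, and so on) is an assumption that holds for the Fig. 1 example but is not part of the algorithm itself.

# Hypothetical sketch of the strength comparison used in P2P_DRSA.

T1 = ["P4", "P2", "P6", "P3", "P5"]   # preference order of context P1

def rank(literal, order):
    """Rank of the literal's origin context; smaller rank = stronger."""
    return order.index("P" + literal[1:])

def set_strength(literals, order):
    """A set is as strong as its weakest (least preferred) element."""
    return max(rank(l, order) for l in literals)

SS_r12 = {"a2"}
SS_r13 = {"a3", "a4"}
stronger = SS_r12 if set_strength(SS_r12, T1) < set_strength(SS_r13, T1) else SS_r13
print(stronger)   # {'a2'}: P2 outranks P3, the weakest context backing SS_r13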
An analytical description of P2P_DRSA is available in [3]. The same paper pre-
sents some formal properties of the algorithm regarding (a) its termination; (b) re-
quired number of messages (O(n2l), where n stands for the total number of con-
texts, whereas l stands for the number of literals a local vocabulary may contain);
(c) computational complexity (O(n2l2r), where r stands for the number of rules a
context theory may contain); and (d) the existence of a unified defeasible theory
that produces the same results with the distributed algorithm under the proof the-
ory of Defeasible Logic [1].
3.2 Strength of Answers
The Strength of Answers strategy requires the queried context to return, along
with the truth value of the queried literal, information about whether this value de-
rives from its local theory or from the combination of the local theory and its map-
pings. To support this feature, the second version of the algorithm, P2P_DRSWA,
supports two types of positive answers: (a) a strict answer indicates that a positive
truth value derives from local rules only; (b) a weak answer indicates that a posi-
tive truth value derives from a combination of local and mapping rules. The query-
ing context evaluates the answer based not only on the context that returns it but
also on the type of the answer. This version follows the four main steps of
P2P_DRSA but with the following modifications:
a) A Supportive/Conflicting Set (of a rule or of a literal) is not a set of literals,
but a set of the answers returned for these literals.
b) The strength of an element in a Supportive/Conflicting Set is determined pri-
marily by the type of answer computed by the algorithm (strict answers are
considered stronger than weak ones); and secondly by the rank of the queried
context in the preference order of the querying context.
Given these differences, the execution of P2P_DRSWA in the system depicted in
Figure 1, produces the following results: The Supportive Sets of rules r12 and r13
are respectively: SSr12 ={weaka2}, SSr13 ={stricta3, stricta4} (the truth values of a3
and a4 derive from the local theories of P3 and P4 respectively, while P2 has to use
its mappings to compute the truth value of a2) and SSr13 is computed to be stronger
than SSr12. Eventually, the algorithm computes negative truth values for a1 and x1.
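A sketch of the modified ordering, again with an assumed literal-to-context mapping, reproduces the outcome of this example:

# Hypothetical sketch of the ordering used by P2P_DRSWA: strict answers
# outrank weak ones, and ties are broken by the querying context's
# preference order.

T1 = ["P4", "P2", "P6", "P3", "P5"]

def answer_rank(answer, order):
    """answer = (literal, origin_context, kind); smaller tuple = stronger."""
    literal, origin, kind = answer
    return (0 if kind == "strict" else 1, order.index(origin))

def set_strength(answers, order):
    # a set is as strong as its weakest element, i.e. the largest rank tuple
    return max(answer_rank(a, order) for a in answers)

SS_r12 = [("a2", "P2", "weak")]
SS_r13 = [("a3", "P3", "strict"), ("a4", "P4", "strict")]
stronger = SS_r12 if set_strength(SS_r12, T1) < set_strength(SS_r13, T1) else SS_r13
print([a[0] for a in stronger])   # ['a3', 'a4']: strict answers beat the weak one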
The main feature of Propagating Supportive Sets is that along with the truth value
of the queried literal, the queried context returns also its Supportive Set. The algo-
rithm that implements this strategy, P2P_DRPS, differs from P2P_DRSA only in the
construction of a Supportive Set; in this case, the Supportive Set of a rule derives
from the union of the Supportive Sets of all (local and foreign) literals in its body.
In the MCS depicted in Figure 1, P2P_DRPS when called by P2 to compute the
truth value of a2, and assuming that T2=[P5,P6], returns a positive value and its
Supportive Set SSa2 = {b5}. The answers returned for literals a3 and a4 are both
positive values and empty Supportive Sets (they are locally proved), and
P2P_DRPS called by P1 computes SSr12 = {a2,b5} and SSr13 = {a3,a4}. Using
T1=[P4,P2,P6,P3,P5], P2P_DRPS determines that SSr13 is stronger than SSr12 (as both
P3 and P4 precede P5 in Ti), and eventually computes negative values for a1 and x1.
3.4 Complex Supportive Sets
Complex Supportive Sets, similarly with Propagating Supportive Sets, requires the
queried context to return the Supportive Set of the queried literal along with its
truth value. In the case of Propagating Supportive Sets, the Supportive Set is a set
of literals that describe the most preferred (by the queried context) chain of rea-
soning that leads to the derived truth value. In the case of Complex Supportive
Sets, the Supportive Set is actually a set of sets of literals; each different set de-
scribes a different chain of reasoning that leads to the computed truth value. The
context that resolves the conflict determines the most preferred chain of reasoning
using its own preference order. P2P_DRCS, differs from P2P_DRSA as follows:
a) The Supportive Set of a rule derives from the product of the Supportive Sets
of the literals in the body of the rule.
b) The Supportive Set of a literal derives from the union of the Supportive Sets
of the applicable rules that support it.
c) Comparing the Supportive Set and the Conflicting Set of a literal requires
comparing the strongest sets of literals contained in the two sets.
In the system of Figure 1, P2P_DRCS called by P2 computes a positive truth
value for a2 and SSa2 = {{b5},{b6}}. When called by P3 and P4, P2P_DRCS returns
positive truth values and empty Supportive Sets for a3 and a4 respectively, while
when called by P1, it computes SSr12={{a2,b5},{a2,b6}} and SSr13={{a3,a4}}. Using
T1=[P4,P2,P6,P3,P5], P2P_DRCS determines that {a2,b6} is the strongest set in SSr12,
and is also stronger than {a3,a4} (as P6 precedes P3 in T1). Consequently, it returns
a positive answer for a1 and eventually a positive answer for x1 as well.
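The construction of a Complex Supportive Set from the Supportive Sets of the body literals can be sketched as a Cartesian product; the sets below reproduce the Fig. 1 example for rule r12.

# Hypothetical sketch: the Supportive Set of a rule in P2P_DRCS as the
# Cartesian product of the Supportive Sets of its body literals.

from itertools import product

def rule_supportive_set(body_sets):
    """Merge each combination of chains (one per body literal) into one chain."""
    chains = set()
    for combination in product(*body_sets):
        chains.add(frozenset().union(*combination))
    return chains

SS_a2 = {frozenset({"b5"}), frozenset({"b6"})}   # returned by P2 for a2
body_sets = [{frozenset({"a2"})}, SS_a2]         # the foreign literal a2 joins every chain
SS_r12 = rule_supportive_set(body_sets)
print(sorted(map(sorted, SS_r12)))   # [['a2', 'b5'], ['a2', 'b6']]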
Analytical descriptions of the three latter algorithms, as well as some results on
their formal properties are omitted due to space limitations. Briefly, the three algo-
rithms share the same properties regarding termination, number of messages, and
the existence of an equivalent unified defeasible theory with P2P_DRSA. The com-
putational complexity of second and third strategy is similar to P2P_DRSA; the
fourth strategy imposes a much heavier computational overhead (exponential to
the number of literals defined in the system).
The goal of the experiments was to compare the four strategies in terms of the actual computational time spent by a peer to evaluate the answer to a single query. Using a tool that we built for the needs of the experiments, we created theories that correspond to the worst case, in which the computation of a single query requires computing the truth value of all literals from all system nodes. The test theories that we created have the following form:
rm1 : a2, a3, …, an ⇒ a0
…
rmn/2 : a1, …, an/2−1, an/2+1, …, an ⇒ a0
rmn/2+1 : a1, …, an/2, an/2+2, …, an ⇒ ¬a0
…
rmn : a1, a2, …, an−1 ⇒ ¬a0
The above mapping rules are defined by P0 and associate the truth value of a0
with the truth values of the literals from n other system peers. Half of them support
a0 as their conclusion, while the remaining rules contradict a0. In case the truth
values returned for all foreign literals a1, a2, …, an are all positive, then all mapping
rules are applicable and are involved in the computation of the truth value of a0.
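For illustration, such worst-case test theories could be generated as in the following sketch; the textual rule syntax and the function name are assumptions, since the generation tool itself is not described in detail in the paper.

def worst_case_theory(n):
    # n mapping rules of P0 over the foreign literals a1..an: rule i omits ai
    # from its body; the first n/2 rules support a0, the rest support ~a0
    rules = []
    for i in range(1, n + 1):
        body = ",".join("a%d" % j for j in range(1, n + 1) if j != i)
        head = "a0" if i <= n // 2 else "~a0"
        rules.append("rm%d: %s => %s" % (i, body, head))
    return rules

for rule in worst_case_theory(4):
    print(rule)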
To exclude the communication overhead from the total time spent by P0 to
evaluate the truth value of a0, we filled a local cache of P0 with appropriate an-
swers for all the foreign literals. Specifically, for all strategies we used positive
truth values for all foreign literals. For the second strategy (Strength of Answers),
the type of positive answer (strict or weak) was chosen randomly for each literal,
and for the last two strategies we used Supportive Sets that involve all literals.
For each version of the algorithm we ran six experiments with a varying number
of system peers (n): 10, 20, 40, 60, 80, and 100. The test machine was an
Intel Celeron M at 1.4 GHz with 512 MB of RAM.
4.3 Results
Table 1 shows the computation time, in msec, for each version of P2P_DR. In the
case of P2P_DRCS, we were able to measure the computation time only for the
cases n = 10, 20, 40; in the other cases the test machine ran out of memory.
The computation time of the first three strategies is proportional to the square
of the number of system peers. The fourth strategy requires much more memory
space and computation time (almost exponential in the number of peers), which
makes it inapplicable to very dense systems. The results also highlight the
tradeoff between the computational complexity and the extent of context information
that each strategy exploits to evaluate the confidence in the returned answers.
In this paper, we proposed a totally distributed approach for reasoning with mutu-
ally inconsistent rule theories in Multi-Context Systems. The proposed model uses
rule theories to express local context knowledge, defeasible rules for the definition
of mappings, and a preference order to express confidence in the imported context
information. We also described four strategies that use context and preference in-
formation for conflict resolution, and which differ in the extent of context infor-
mation exchanged between the system contexts, and described how each strategy
is implemented in a different version of a distributed algorithm for query evalua-
tion in Multi-Context Systems. Finally, we described the implementation of the
four strategies in a simulated peer-to-peer environment, which we used to evaluate
the strategies with respect to their computational requirements. The obtained results
highlight the tradeoff between the extent of context information exchanged between
the contexts to evaluate the quality of the imported context information and the
computational load of the algorithms that implement the four strategies.
Part of our ongoing work includes: (a) Implementing the algorithms in Logic
Programming, using the equivalence with Defeasible Logic [3], and the well-
studied translation of defeasible knowledge into logic programs under Well-
Founded Semantics [2]; (b) Adding non-monotonic features in the local context
theories to express uncertainty in the local knowledge; (c) Extending the model to
support overlapping vocabularies, which will enable different contexts to use ele-
ments of common vocabularies (e.g. URIs); and (d) Implementing real-world ap-
plications of our approach in the Ambient Intelligence and Semantic Web do-
mains. Some initial results regarding the application of our approach in Ambient
Intelligence are already available in [4].
References
1. Antoniou G., Billington D., Governatori G., Maher M.J.: Representation results for de-
feasible logic. ACM Transactions on Computational Logic 2(2):255-287, 2001.
2. Antoniou G., Billington D., Governatori G., Maher M.J.: Embedding defeasible logic
into logic programming. Theory and Practice of Logic Programming 6(6):703-735,
2006.
3. Bikakis A., Antoniou G.: Distributed Defeasible Reasoning in Multi-Context Systems.
In NMR'08, pp. 200-206, (2008)
4. Bikakis A., Antoniou G.: Distributed Defeasible Contextual Reasoning in Ambient
Computing. In AmI'08 European Conference on Ambient Intelligence, pp. 258-375,
(2008)
5. Benerecetti M., Bouquet P., Ghidini C.: Contextual reasoning distilled. JETAI 12(3):
279-305, 2000.
6. Brewka G., Roelofsen F., Serafini L.: Contextual Default Reasoning. In: IJCAI, pp.
268-273 (2007)
7. Buvac, S, Mason I.A.: Propositional Logic of Context. In AAAI, pp. 412-419, (1993).
8. Ghidini C., Giunchiglia F.: Local Models Semantics, or contextual reasoning = local-
ity + compatibility. Artificial Intelligence, 127(2):221-259, 2001.
9. Giunchiglia F., Serafini L.: Multilanguage hierarchical logics, or: how we can do
without modal logics. Artificial Intelligence, 65(1), 1994.
10. McCarthy J.: Generality in Artificial Intelligence. Communications of the ACM,
30(12):1030-1035, 1987.
11. McCarthy J., Buvac S.: Formalizing Context (Expanded Notes). Aliseda A., van
Glabbeek R., Westerstahl D. (eds.) Computing Natural Language, pp. 13-50. CSLI
Publications, Stanford (1998)
12. Roelofsen F, Serafini L.: Minimal and Absent Information in Contexts. In IJCAI, pp.
558-563, (2005).
13. Serafini L., Bouquet P.: Comparing formal theories of context in AI. Artificial Intelli-
gence, 155(1-2):41-67, 2004.
Certified Trust Model
1Vanderson Botêlho, 1Fabrício Enembreck, 1Bráulio C. Ávila, 2Hilton de Azevedo, 1Edson E. Scalabrin
1 PUCPR, Pontifical Catholic University of Paraná
PPGIA, Graduate Program on Applied Computer Science
R. Imaculada Conceição, 1155
Curitiba, PR, Brazil
{vanderson, fabricio, avila, scalabrin}@ppgia.pucpr.br
2 UTFPR, Federal Technological University of Paraná
PPGTE, Graduate Program on Technology
Av. 7 de Setembro, 3165
Curitiba, PR, Brazil
[email protected]
Abstract This paper presents a certified confidence model which aims to ensure
credibility for information exchanged among agents which inhabit an open envi-
ronment. Generally speaking, the proposed environment involves a supplier agent b
which delivers a service to a customer agent a. Agent a returns to b an encrypted
evaluation r of the service delivered. Agent b will employ r as a testimonial
when requested to perform the same task for a distinct customer agent. Our
hypotheses are: (i) control over testimonials can be distributed as they are locally
stored by the assessed agents, i.e., each assessed agent is the owner of its testimo-
nials; and (ii) testimonials, provided by supplier agents on their services, can be
considered reliable since they are encapsulated with public key cryptography. This
approach reduces the limitations of confidence models based, respectively, on the
experience resulting from direct interaction between agents (direct confidence) and
on the indirect experience obtained from reports of witnesses (propagated confidence).
Direct confidence is a poor-quality measure because a customer agent a hardly
has enough opportunities to interact with a supplier agent b so as to grow a useful
knowledge base. Propagated confidence depends on the willingness of witnesses
to share their experiences. The model was tested empirically in a multiagent system
applied to the stock market, where supplier agents provide recommendations for
buying or selling assets and customer agents then choose suppliers based on their
reputations. Results demonstrate that the confidence model proposed enables the
agents to more efficiently choose partners.
1 Introduction
Basically, a trust model involves an agent a that quantifies the trust it has
regarding an agent b [2]; agent a is the evaluator and agent b is
the target. A rating is calculated based on past experiences regarding the quality
of a service provided by one agent to the other. Every rating is represented by a tuple
r=(a,b,i,v,c), where a and b are agents that participate in an interaction i and v is
the assessment made by a of b about a given term c. Every assessment is stored
locally by the service agent that was evaluated, so, when asked by a client agent,
it can report the assessment results it received before. The term c gives the trust
model the capability to assess every agent in different contexts; moreover, every
evaluation is given at a specific time. The trust of a in b about
the term c is denoted T(a,b,c). Quantifying trust requires a set of relevant assessments;
this set is denoted R(a,b,c) and is the basis for the certified trust model we propose.
T(a, b, c) = Σ_{ri∈R(a,b,c)} ω(ri)·vi / Σ_{ri∈R(a,b,c)} ω(ri)    (1)

ωRe(ri) = e^(−Δt(ri)/λ)    (2)
Here ωRe(ri) is the time-related coefficient of rating ri, computed from the time variation Δt(ri),
i.e. the time elapsed between the moment of the request and the
moment the rating was created. Finally, λ is the factor that determines how fast
the coefficient decreases with time.
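A minimal sketch of equations (1) and (2), assuming a simple list-of-tuples representation for the ratings and an illustrative value for λ:

import math

def trust(ratings, now, lam=30.0):
    # ratings: (value vi, creation time) pairs for one (a, b, c) triple;
    # each value is weighted by the decay coefficient of equation (2)
    weights = [math.exp(-(now - created) / lam) for _, created in ratings]
    values = [v for v, _ in ratings]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(trust([(0.9, 10.0), (0.4, 95.0)], now=100.0))  # recent ratings dominate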
We note that there is no guarantee that agents are honest in their assessments,
or that their capability to assess service agents is accurate and precise.
Our certified trust model reduces this problem by introducing the
credibility of the evaluator agent as another element that determines the
relevance of a specific rating inside the trust calculation. Credibility expresses how
reliable an evaluator is and can be calculated when customers evaluate their personal
interactions and compare them with the ratings received. The credibility of
an assessment agent w, as calculated by another assessment agent a, is denoted
TRCr(a,w) ∈ [−1,+1]. A rating weight combines the time coefficient ωRe(ri) and
the credibility coefficient ωRCr(ri).
When ωRCr(ri) is negative, the assessment agent has no credibility and its rating
weight is adjusted to zero.
So, vk receives a positive value if the difference between vw and va stays below
the limit ι; otherwise, the credibility is negative, i.e. the evaluator agent cannot be
trusted.
The honesty of provider agents when they convey their ratings is guaranteed by a
digital signature scheme based on asymmetric keys, composed of a
private and a public key. With this method, a System Administrator agent creates an
encoding key for every kind of service c; this key is sent to all agents and is a public
key. The System Administrator agent then creates a second key that is used only
for decoding; this key is sent only to the evaluator agents.
Every time an evaluator/client agent sends a rating to a provider agent, the public
key for service c is used to encrypt the value of v. Since v can be decrypted only
with the private key that belongs to the client agents, no provider agent can know
the value of v related to the rating r. This prevents a provider agent from selecting
a subset of relevant ratings R. In our experiments we used the Pretty Good Privacy
(PGP) algorithm [16] to encrypt and decrypt the values of the ratings.
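A hedged sketch of this mechanism, using RSA-OAEP from the Python cryptography package as a stand-in for PGP; the key size, the rating value and the variable names are illustrative assumptions.

from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# System Administrator: one key pair per kind of service c
decoding_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
encoding_key = decoding_key.public_key()      # distributed to all agents

# client agent: encrypt the rating value v before sending it to the provider
sealed_rating = encoding_key.encrypt(b"0.83", oaep)

# the provider only stores sealed_rating; a client holding the decoding key
# can later recover the value
assert decoding_key.decrypt(sealed_rating, oaep) == b"0.83"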
3 Experiment
We defined four behavior groups for the provider agents: good providers (which
use an analysis method that gives a high level of success to their recommendations),
bad providers (with a low level of success in their recommendations), ordinary providers
(with a level of recommendation success around the average, i.e. a moving
average) and malicious providers (these provider agents use the same method
as the third group but sort their ratings so as to send
only the best ones, making it difficult to differentiate between good and bad
service provider agents).
Client agents are organized in four groups: No_Trust (the ones that do not have
any trust model); Direct_Trust (the ones that implement a direct trust model);
Cr_Trust (the ones that implement a certified trust model based on certified reputation);
and Cryp_Trust (the ones that implement the proposed certified trust model). Client
agents interact with different kinds of service provider agents and, according to the
trust model they have, they select the service provider agent that seems to maximize
their interests.
Every client agent starts by consulting several service provider agents, with whom
it performs as many stock buying/selling orders as necessary. The evaluation
of the trust model is made by measuring the performance of every service agent's
portfolio. Every agent receives the same amount of money to invest. At the end of
every working day, the percentage growth of every service agent's portfolio is observed.
The agents acted over real historical data from the Bovespa stock market [15].
For this experiment we considered only one kind of stock quoted at Bovespa, from
January 2nd, 2006 to December 18th, 2007, totalling 473 working days. Data
regarding 2006 were used for training. At the end of 2006, the portfolios were
restarted; nevertheless, the agents kept the experience acquired during 2006.
Then, during 2007, every client agent (investor) evaluated the performance
of its service provider agents (market experts) using its trust model.
Every experiment started with the creation of client and service agents. Service
agents had only one strategy to perform financial analysis, and client agents had
only one trust model. A client agent's utility gain, named UG, represents the utility
gain of its trust model: at the end of every working day, the utility values of the
client agents were summed per trust model, and the average of those
values represented the utility gain of the trust model.
Four scenarios were set up in order to evaluate the behavior of the trust models.
Scenario I has service agents that are honest; although they select their ratings,
that selection does not affect their real performance, because all service provider
agents use the same analysis technique throughout the scenario (Table 1 presents
the variables used).
In Scenario II, service provider agents have different performance because
their analysis techniques change during the scenario. By doing so, a service
provider agent that was using a very good analysis technique may have its performance
decreased because it starts using a worse one, and vice-versa. The parameters
defined in Table 2 ensure that scenarios I and II have service provider agents
with rational behavior and constant performance, due to the absence of agents from
the malicious service provider group. Here, all service provider agents used
the same financial analysis technique.
[Plots omitted: utility curves for the series active, no_trust, direct_trust, cr_trust and cryp_trust over periods 3 to 12.]
Fig. 1. Honest context without changes. Fig. 2. Honest context with changes.
In scenarios III and IV, we introduce malicious service providers that select
and send only their best ratings in order to influence the trust calculations made by the
client agents. Similarly to scenarios I and II, in Scenario III service provider
agents have constant performance, while in Scenario IV their performance
varies. Table 3 shows the configuration used.
Figure 3 shows a simulation where service provider agents do not make
changes in their financial analysis techniques, thus keeping their performance
constant. The cr_trust model had the worst result (-58%). The reason is that the
client agent is deceived by the malicious service agents, which send arbitrarily
selected ratings. On the other hand, the cryp_trust model, with encrypted ratings, prevents
malicious service agents from selecting their best ratings. As a consequence, the cryp_trust
model's performance remains similar to the scenario where there are no malicious
agents.
[Plots omitted: utility curves for the series action, no_trust, direct_trust, cr_trust and cryp_trust over periods 3 to 12.]
Fig. 3. Dishonest context without changes. Fig. 4. Dishonest context with changes.
Scenario IV presents the worst situation: the existence of malicious service provider
agents and the variation of their performance due to changes, at runtime, in their
financial strategies. Figure 4 shows that the cr_trust model presents much lower
performance than the others (-55%). Even though it also shows a drop in performance
compared with Scenario III, the cryp_trust model keeps a positive performance
(+20%). The great difference between the two models is due to the existence
of malicious service providers and the changes of financial strategies at
runtime.
5 Conclusion
References
1 Introduction
knowledge. As such they can be a very effective tool for enhancing the productivity
of knowledge workers who are, after all, some of the most important knowledge
assets of an enterprise.
In this paper we describe a knowledge-based system that is used by consulting
firms for tackling the problem of the evaluation of tender calls. A tender call is an
open request made by some organization for a written offer concerning the procure-
ment of goods or services at a specified cost or rate. The evaluation of a (consult-
ing related) tender call by a consulting company refers to the process of deciding
whether the company should devote resources for preparing a competitive tender in
order to be awarded the bid.
In our approach, we formulate the above problem as a decision making problem
in which the decision to be taken is whether the company should write the tender
or not. The evaluation criterion is the probability that the company will be assigned
the call’s project. This is usually estimated by some consultant who takes into account a
number of different (and often conflicting) factors (criteria) that, according to his/her
judgement and experience, influence it. These factors can be classified into objective
and subjective but, most importantly, the whole process of evaluating and combining
all those factors is knowledge-intensive, because the consultant utilizes
during this process aspects of the company’s organizational knowledge as well as
knowledge derived from his/her own expertise.
For that reason, the key characteristic of our system is that it stores and utilizes all the
knowledge that the consultant needs for taking an informed decision and
performs reasoning that is very close to his/her way of thinking. Furthermore,
both the reasoning and the underlying knowledge have fuzzy characteristics and
that is because many aspects of the knowledge the consultant utilizes for his/her
evaluation are inherently imprecise.
The structure of the rest of the paper is as follows: In section 2 we outline some
of the key criteria that the consultant uses for evaluating the partial factors that
influence the company’s chances to win the tender should a proposal be written.
For each criterion, we highlight the knowledge that is required for its evaluation.
In section 3 we describe the system’s knowledge base which consists of a fuzzy
domain ontology while in section 4 we present the reasoning mechanism in the form
of fuzzy reasoning procedures for each criterion. Finally, in section 5 we provide a
representative use case of the system and in section 6 we conclude by addressing
issues for future work.
The tender call evaluation process that our system implements comprises the evalu-
ation of a number of partial criteria such as the call’s budget, the call’s coverage by
the consulting company’s expertise and experience and the potential competition.
Due to space limitations, we describe here only two of the above criteria which,
nevertheless, are quite representative of our approach. Along with the descriptions,
we highlight the kind of knowledge required for the evaluation of each criterion.
The knowledge base of the system is implemented as a Fuzzy Ontology [1]. A Fuzzy
Ontology is a tuple OF = {E, R} where E is a set of semantic entities (or concepts)
and R is a set of fuzzy binary semantic relations. Each element of R is a function
R : E × E → [0, 1].
In particular, R = T ∪ NT where T is the set of taxonomic relations and NT is
the set of non-taxonomic relations. Fuzziness in a taxonomic relation R ∈ T has the
following meaning: High values of R(a, b), where a, b ∈ E, imply that b’s meaning
approaches that of a’s while low values suggest that b’s meaning becomes “nar-
rower” than that of a’s.
On the other hand, a non-taxonomic relation has an ad-hoc meaning defined by
the ontology engineer. Fuzziness in this case is needed when such a relation repre-
sents a concept for which there is no exact definition. In that case fuzziness reflects
the degree at which the relation can be considered as true.
In our case, the structure of the system’s ontology is dictated by the knowledge
requirements identified in the previous section. Thus, the ontology consists of three
groups of concepts, namely Companies, Projects and ConsultingAreas. The re-
lation between different consulting areas is represented by means of a taxonomical
relation called hasSubArea. The areas at which a company considers itself expert,
are captured through the fuzzy non-taxonomical relation isExpertAt while the areas
a project is relevant to are represented through the fuzzy non-taxonomical relation
isRelevantTo. Finally, the projects a company has implemented or participated in are
denoted by the (non-fuzzy) relation hasImplemented. A snapshot of the consulting
areas taxonomy is shown in figure 1.
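As an illustration, the fuzzy relations of the ontology can be represented as mappings from entity pairs to membership degrees; the entities and degrees below are made up for the example and are not taken from the deployed knowledge base.

has_sub_area = {("Consulting", "Information Technology"): 1.0}
is_expert_at = {("CompanyA", "Information Technology"): 0.8}
is_relevant_to = {("ProjectP", "Information Technology"): 0.6}
has_implemented = {("CompanyA", "ProjectP"): 1.0}   # non-fuzzy: degrees are 0 or 1

def degree(relation, a, b):
    # a fuzzy binary relation R : E x E -> [0, 1]; missing pairs have degree 0
    return relation.get((a, b), 0.0)

print(degree(is_expert_at, "CompanyA", "Information Technology"))  # 0.8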
4 Evaluation of Criteria
1. The call’s consulting areas are extracted into a fuzzy set as explained in paragraph
4.1
2. The company’s implemented projects are retrieved from the ontology as fuzzy
sets.
3. The relevance of each past project to the call is calculated, also as explained in
paragraph 4.1
4. Past projects whose relevance is less than a specific threshold (defined by the con-
sultant) are discarded. For the remaining projects we calculate their accumulated
relevance to the call.
5. The overall relevance is calculated by applying to the accumulated relevance a fuzzy
triangular number of the form (0, a, ∞).
6. The parameter a denotes the threshold over which additional past projects do not
further influence the overall similarity (a sketch of steps 2-6 is given below).
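A sketch of steps 2-6 under stated assumptions: the relevance values are taken as given (in the system they come from the DTC algorithm), and the threshold and the parameter a are illustrative.

def experience_score(project_relevances, threshold=0.3, a=2.0):
    # step 4: discard weakly relevant past projects and accumulate the rest
    accumulated = sum(r for r in project_relevances if r >= threshold)
    # steps 5-6: fuzzy triangular number (0, a, inf): the score grows linearly
    # up to a, after which additional projects no longer increase it
    return min(accumulated / a, 1.0)

print(experience_score([0.77, 0.25, 0.10]))  # 0.385 with these assumed parameters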
For the overall evaluation of a tender call we use a technique based on the notion
of the ordered weighted averaging operator (OWA), first introduced by Yager in [8].
An OWA operator of dimension n is a mapping F : R^n → R that has an associated
n-dimensional vector w = (w1, w2, ..., wn)^T such that wi ∈ [0, 1], 1 ≤ i ≤ n, and ∑_{i=1}^{n} wi = 1.
Given this operator, the aggregated value of a number of decision criteria ratings
is F(a1, a2, ..., an) = ∑_{j=1}^{n} wj·bj, where bj is the j-th largest element of the bag
⟨a1, a2, ..., an⟩.
The fundamental aspect of OWA operators is the fact that an aggregate ai is not
associated with a particular weight wi but rather a weight is associated with a partic-
ular ordered position of the aggregate. This, in our case, is useful as most consultants
are not able to define a-priori weights to criteria since, as they say, the low rating of
a criterion might be compensated by the high rating of another criterion.
OWA operators can provide any level of criteria compensation lying between
logical conjunction (and) and disjunction (or). Full compensation (or) is implemented through the operator
w = (1, 0, ..., 0)^T while zero compensation through the operator w = (0, 0, ..., 1)^T. In
our case the compensation level is somewhat above average. This was determined
with the help of domain expert consultants after several trials of the system.
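The following sketch illustrates the OWA aggregation and the two extreme compensation levels; the intermediate weight vector is only an example and not the one calibrated with the consultants.

def owa(ratings, weights):
    # the j-th weight is applied to the j-th largest rating, not to a fixed criterion
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * b for w, b in zip(weights, sorted(ratings, reverse=True)))

print(owa([0.77, 0.25], [1.0, 0.0]))  # full compensation (or): 0.77
print(owa([0.77, 0.25], [0.0, 1.0]))  # zero compensation (and): 0.25
print(owa([0.77, 0.25], [0.6, 0.4]))  # somewhat above average: 0.562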
5 Use Case
The testing and evaluation of the effectiveness of our system was conducted by DI-
ADIKASIA S.A., a Greek consulting firm that provides a wide range of specialized
services to organizations and companies of both the public and private sector. The
whole process comprised two steps, namely the population of the system’s ontol-
ogy with company specific knowledge and the system’s usage by the company’s
consultants for evaluating real tender calls.
For the first step, instances of the relations isExpertAt and hasImplemented were
generated by the consulting company while the population of the relation isRele-
vantTo was performed automatically by applying the DTC algorithm to the com-
pany’s past projects’ descriptions. Thus, considering the areas taxonomy of figure
1, a part of the ontology for DIADIKASIA was as follows:
For the second step, the company’s consultants used the system for evaluating
real tender calls. Figure 2 illustrates an evaluation scenario for a tender call
entitled “Requirement Analysis for the ERP System of company X”. As can be
seen in the figure, the system identifies, through the DTC algorithm, the call’s area
as “Information Technology” at a degree of 0.77 and evaluates the expertise and
the experience of the company with scores 77% and 25% respectively. The overall
evaluation given these two criteria and through the OWA operator is 60%.
References
3. Klir, G., Yuan, B. (1995) Fuzzy Sets and Fuzzy Logic, Theory and Applications. Prentice
Hall.
4. Mentzas, G., Apostolou, D., Abecker, A., Young, R., 2002, “Knowledge Asset Management:
Beyond the Process-centred and Product-centred Approaches”, Series: Advanced Information
and Knowledge Processing, Springer London.
5. Wallace, M., Avrithis, Y., “Fuzzy Relational Knowledge Representation and Context in the
Service of Semantic Information Retrieval”, Proceedings of the IEEE International Confer-
ence on Fuzzy Systems (FUZZ-IEEE), Budapest, Hungary, July 2004
6. Wallace, M., Mylonas, Ph., Akrivas, G., Avrithis, Y. & Kollias, S., “Automatic thematic cat-
egorization of multimedia documents using ontological information and fuzzy algebra”, in
Ma, Z. (Ed.), Studies in Fuzziness and Soft Computing, Soft Computing in Ontologies and
Semantic Web, Springer, Vol. 204, 2006.
7. Yager, R.R. “On ordered weighted averaging aggregation operators in multi-criteria decision
making”, IEEE Transactions on Systems, Man and Cybernetics 18(1988) 183-190.
Extended CNP Framework for the Dynamic
Pickup and Delivery Problem Solving
Abstract In this paper, we investigate the applicability of Contract Net Protocol
(CNP) negotiation in the field of dynamic transportation. We address the
optimization of the Dynamic Pickup and Delivery Problem with Time Windows,
also called DPDPTW. This problem is a variant of the Vehicle Routing Problem
(VRP) that may be described as the problem of finding the least possible cost of
dispatching requests for picking up some quantity of goods at a pick-up location
and delivering it to a delivery location, while most of the requests occur continuously
during the day. The use of contract nets in dynamic and uncertain domains such as ours has
been proved to be more fruitful than the use of centralized problem solving [9]. We
provide a new automated negotiation based on the CNP. The negotiation process is
adjusted to deal intelligently with the uncertainty present in the addressed problem.
1 Introduction
In this paper, we deal with the DPDPTW problem, which is NP-hard since it is a
variant of the well-known NP-hard combinatorial optimization problem VRP [7]. It is
made harder by the real-time occurrence of requests and the mandatory
precedence between the pick-up and delivery customer locations [5], [8].
We propose a multi-agent approach based on CNP negotiation. The assignment
of new requests to vehicles is done according to the rules of the CNP,
and vehicle agents are responsible for their own routing. Thus, proper pricing
strategies are needed to help the system achieve the minimum transportation
and delay costs.
The remainder of the paper is structured as follows: In Section 2, we show a
literature review illustrating uses of the CNP in the VRP variants solving. Section
3 gives a detailed description of the DPDPTW. The Extended CNP framework is
then globally presented in section 4. Section 5 and section 6 deal with the details
of the insertion and optimization processes of the framework. In section 7 we dis-
cuss some implementation driven results. Final concluding remarks follow in Sec-
tion 8.
2 Related Literature
Most techniques and models used in transportation planning, scheduling and routing
use centralized approaches. Several techniques and parallel computation
methods were also proposed to solve models using the data that are known at a
certain point in time, and to re-optimize as soon as new data become available [3],
[7], [8], [12]. Psaraftis describes routing and scheduling in dynamic environments
as producing not a set of routes, but rather a policy that prescribes how the
routes should evolve as a function of the inputs evolving in real time [7], [8].
3 Problem Description
3.1 Notations
In the DPDPTW, a set of customers call the dispatch center during the current day,
before a fixed call-for-service deadline, asking for the transportation of some load
qv from a pick-up location Ov to a delivery location Dv. The occurrence of these
requests is considered the single source of the problem’s uncertainty. They are called
immediate requests and should be scheduled for the same day.
The dispatch center has at its disposal M vehicles moving at a desired fixed velocity
and having a maximum capacity Q which should never be exceeded. Each
vehicle starts and ends its route Rk at the central depot v0, at its route start and end
times respectively, and starts its route with an empty load, qk(t = 0) = 0.
Let N be the set of transportation requests, of cardinality n, and let
N+ = ∪_{i∈N} Oi and N− = ∪_{i∈N} Di be the sets of pick-up and delivery locations, respectively.
3.2 Hypotheses
• The vehicle is not allowed to skip its next service location, once it is travelling
towards it [5][3] .
• We consider the “Wait First” waiting strategy. Once the pickup or delivery
service of some location is finished, the vehicle should wait for its next departure
time as long as it is feasible, so that it reaches its next destination after its time
window starts [10].
3.3 Constraints
• The service should be made within the time window, and never begin before
ev . A penalty is incurred in the objective whenever lv is exceeded.
• The precedence constraint between the pick-up and the delivery should be re-
spected.
The objective is to minimize the transportation costs. The global cost function
at time t is denoted by C_DPDPTW(t) and can be written as in [5], [3]:

C_DPDPTW(t) = Σ_{0≤k≤M} Tk + α Σ_{v∈N+∪N−} max(0, tv − lv) + β Σ_{0≤k≤M} max(0, tk − l0)    (1)

α and β are weight parameters and Tk is the travel time of Rk.
Σ_{0≤k≤M} Tk is the total travel time over all vehicles.
Σ_{v∈N+∪N−} max(0, tv − lv) is the penalty for violating the time windows of all the customers
of V \ {v0} = N+ ∪ N−.
Σ_{0≤k≤M} max(0, tk − l0) is the sum of the overtime of all vehicles.
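For illustration, the cost function (1) can be computed as in the following sketch; the data layout (per-vehicle travel and completion times, per-customer arrival times and deadlines) is an assumption.

def dpdptw_cost(travel_times, vehicle_end_times, arrival, deadline, l0, alpha, beta):
    # travel_times and vehicle_end_times are indexed by vehicle; arrival and
    # deadline are dictionaries indexed by customer location (excluding v0)
    total_travel = sum(travel_times)
    lateness = sum(max(0.0, arrival[v] - deadline[v]) for v in arrival)
    overtime = sum(max(0.0, t - l0) for t in vehicle_end_times)
    return total_travel + alpha * lateness + beta * overtime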
The main task of our model is to find the best possible solution to the DPDPTW.
This is done by finding each time the best routing and scheduling of the set of
available requests. We define routing as the act of determining an ordered se-
quence of locations on each vehicle route and scheduling as the act of determining
arrival and departure times for each route location.
The dispatch center is represented by the dispatch agent who interacts with the
vehicle agents representing the vehicles. Negotiation concerns either the insertion
of the new requests or the optimizing of requests insertions into the planned
routes.
In fact, the dispatch agent acts only as the manager of new request insertion
negotiations, while vehicle agents may act either as managers of the optimizing
negotiations or as participants in both types of negotiations. However, they
may not take both roles at the same time.
In the original CNP, several managers may be involved and announcements are
allowed to be simultaneous. The available participants evaluate task announce-
ments made by the managers and submit bids on those they think convenient, then
managers evaluate the bids and contracts are awarded to the most appropriate bid-
ders [11].
However, our model considers negotiations to be held one by one, because of
the complicated nature of the addressed problem. The negotiation process effec-
tively handles the real time events by going through only feasible solutions. Only
vehicle agents that are effectively able to carry on the requests without exceeding
the vehicles capacities and violating the customers’ time windows constraints are
involved, and bids are binding so that each vehicle is obliged to honour its
awarded tasks. At the end of a negotiation, the task is awarded to the vehicle agent
that submitted the lowest-price bid and which is then responsible for improving the
routing and scheduling of its not-yet-served locations.
In order to reduce the uncertainty introduced by the occurrence of new requests, we
assume that vehicle agents are completely aware of their environment. All the information
about available unassigned requests is accessible to the vehicle agents.
A global description of our multi-agent system (MAS) behaviour is presented
in figure 2.
Fig 2. Global description of our DPDPTW- Extended CNP framework.
We consider calls for service to arrive one-by-one and to be also announced one
by one. The Dispatch agent is the manager of requests insertion tasks. While the
set of available requests is not empty, the dispatch agent waits for the system to
reach a global equilibrium and then establishes a new negotiation contract dealing
with the most impending request. It issues a call for proposals act which specifies
the task by giving all the details about the request, as well as the constraints and
the conditions placed upon this task's execution. Vehicle agents selected as potential
participants in the negotiation receive the call for proposals. Their responses,
referred to as bids or submissions, indicate the least possible price of inserting the
request [2], [4].
4.3 Optimizing Process
The second process aims to improve planned routes; it concerns moving requests
from one route to another. This class of negotiations involves only vehicle
agents. Actually, in addition to bidding for new requests, each vehicle agent
is also responsible for planning and scheduling its pickup and delivery services.
Thus, a vehicle agent may sell its own requests in order to reduce penalties put upon
its objective function, to remove expensive requests, or to be ready to accept a
coming request when its insertion seems to be more beneficial.
Moreover, in order to avoid already visited solutions, vehicle agents use their feedback
from past negotiations to make a good selection of the requests to be announced and
of the vehicles to bid for the announced request.
5 Insertion Process
When at least one new request is available, the dispatch agent establishes a new
contract to negotiate the request insertion possibilities with the interested vehicles.
When several requests are available, they are sorted according to their pick up
deadlines, and then announced one by one.
For each announcement, selection of bidders is based on the request insertion
feasibility chances. The dispatch agent selects vehicles that are likely to offer fea-
sible routes.
We assume that, each available request has at least one feasible insertion in the
current routes, so that for each announcement, at least one vehicle is eligible for
the negotiation. Eligible vehicles are those verifying both constraints of capacity
and time window.
• Capacity constraint check
For each eligible vehicle agent, there exists at least one pair of possible positions
i, j in the ordered sequence of service locations of the route Rk such that v+
may be inserted just next to i and v− next to j, and that verify:

max_{(tD)i ≤ t ≤ (tD)j} {qk(t)} ≤ Q − dv    (3)

In fact the insertion of the request v is possible only while (3) is maintained
true during the period of time between the departure time (tD)i from the i-th location
of the route towards the pickup location v+ and the departure time (tD)j
from the j-th location of the route towards the delivery location v− (a small sketch of this test follows).
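A minimal sketch of test (3), assuming that load[t] gives the vehicle load right after its t-th scheduled departure:

def capacity_feasible(load, i, j, Q, d_v):
    # the load must leave room for the request's demand d_v between the
    # departure from position i and the departure from position j
    return max(load[t] for t in range(i, j + 1)) <= Q - d_v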
• Time Window constraint check
Let us divide the vehicle route into two portions: the first portion goes from
time t = 0 until the current time, and the second portion starts at the current
time and finishes at the time the vehicle ends its route; only the settings of the
second portion of the route may be modified.
We define a possible routing block of a request v in the route Rk as a block
that starts at the i-th location and ends at the j-th location of the route, i.e. v+ may
be inserted just next to i and v− just before j:

i < position(v+) < position(v−) < j

i and j should verify that the insertion of v+ and v− in the possible routing
block bounded by i and j is a feasible insertion that respects the time window
constraints of v+ and v−.
ev+ + c ≥ (tA)i + tv+,i ≥ ev+ − c
lv− − c ≤ (tA)j − tv−,j ≤ lv− + c

c is a parameter specified according to the overall minimum request time window
and tv+,j is the travel cost between the pickup location v+ and the j-th location
of the route.
In the original CNP, an agent could have multiple bids concerning different
contracts pending concurrently in order to speed up the operation of the system
[1], [11]. This was proved to be beneficial when the addressed tasks are independ-
ent and the price calculation processes are independent of the agent’s assigned
tasks and independent of any other tasks being negotiated at the same time.
However, in our model, negotiation concerns tasks that could be more or less
inter-related and related with the already assigned tasks.
In fact, at a given time t, for each vehicle agent, because of the capacity constraint,
the bidding decision depends jointly on the vehicle’s load, which is the
sum of the loads of its assigned locations, and on the new request’s load.
Besides, considering the problem’s precedence and temporal constraints, the insertion
of some requests may be considered to have priority over others. Locations of new and
assigned requests may be close in space or in time, and this makes their insertion
into the same vehicle’s route more beneficial. Thus, the pricing of new requests and
the bidding decisions are closely dependent on the earlier awarded requests.
Now, assuming that bids are binding, let us imagine that vehicle agents may
bid on several announcements at a time. In that case, tasks may be awarded
to a vehicle while it is still bidding on other tasks. The resulting changes to its
local settings should be considered by the bidder's pricing functions;
otherwise the agent may submit wrong bids that it may not be able to honour,
and the resulting solution is infeasible.
Considering all those reasons, we opt for the negotiations over only one con-
tract at a time.
The key issue to be discussed in this section is how to make the pricing of request
insertions as accurate as possible. The pricing mechanism is a quasi-true valuation:
we assume that the vehicle agent bids its true valuation, so that the
price of the task does not depend on the value other bidders attach to the task [].
However, because the problem is NP-hard, evaluating the cost of the announced
request’s insertion depends on the calculation of the true optimizing
function, which requires the search for an optimal routing and the calculation of
departure and arrival times for every location included in the route, which is computationally
demanding. Thus we consider a fast approximation Cadd(v) of the cost of
adding the announced request v to the routing solution. The vehicle agent first
determines the possible routing block of the request and, while the bidding time limit
is not reached, it iterates calculating Cadd(v) for all the feasible pairs of positions
i, j included in the possible routing block and selects the best insertion (a sketch follows
the formulas below). Cadd(v) is the sum of the additional travel cost and the lateness
eventually caused to the customers expected to be served after the insertion positions:
- The additional travel cost is given by:

(cadd_travel(v+))k = ti,v+ + tv+,i+1 − ti,i+1
(cadd_travel(v−))k = tj,v− + tv−,j+1 − tj,j+1
(cadd_travel(v))k = (cadd_travel(v+))k + (cadd_travel(v−))k

- The penalties caused by time window violations are given by:

(cadd_lateness(v))k = Σ_{p>i} max(0, tA(p) + (cadd_travel(v+))k − lp)
                    + Σ_{p>j} max(0, tA(p) + (cadd_travel(v−))k − lp)
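A hedged Python sketch of this approximation: the route is a list of locations, tA[p] and deadline[p] give the current arrival time and deadline at the p-th route position, travel(x, y) is an assumed travel-time function, and the lateness terms follow the reconstruction of the formulas above.

def added_travel(route, pos, loc, travel):
    # extra travel time caused by inserting loc between route[pos] and route[pos+1]
    i, nxt = route[pos], route[pos + 1]
    return travel(i, loc) + travel(loc, nxt) - travel(i, nxt)

def insertion_cost(route, i, j, v_pick, v_del, travel, tA, deadline):
    extra_p = added_travel(route, i, v_pick, travel)
    extra_d = added_travel(route, j, v_del, travel)
    # lateness eventually caused to customers served after the insertion points
    lateness = sum(max(0.0, tA[p] + extra_p - deadline[p])
                   for p in range(i + 1, len(route)))
    lateness += sum(max(0.0, tA[p] + extra_d - deadline[p])
                    for p in range(j + 1, len(route)))
    return extra_p + extra_d + lateness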
Awarding a contract means assigning the service request to the successful bidder.
Request insertion is performed considering the solution proposed in the bid.
The calculation of departure and arrival times of the different locations is made immediately
once the contract is awarded to the agent, according to the rules of the wait
first waiting strategy, except for the possible routing block.
The agent then uses its local optimization heuristics to improve the routing
and scheduling of its assigned locations. Information about available requests is
useful in the management of waiting times, in order to make the insertion of
coming requests possible and appropriate.
Some assigned requests may result in large delays for other services of the same
vehicle.
Negotiations for post-optimization are based on the idea of controlling cycles
by avoiding the repetition of visited tours. Some moves, bids, announcements or
awards may have the status tabu; this helps to avoid congestion of the negotiation
network: when an announcement is tabu for a bidder, the bidder is not eligible
for this announcement.
In order to avoid infinite post-optimization negotiations, an initiator could be
made tabu, in order to give others the chance to sell their requests.
An agent may not announce a task it has just acquired in the last negotiation; at
least one change should be performed on its route before it becomes authorized to
announce the request.
When a request v is announced, the vehicle that had it in its route in the past
and that announced it in the nth negotiation is eligible for bidding on it only if at
least one change has been performed in its route or in the current manager’s route.
7 Experimental Results
The purpose of the experiments was to validate the application of CNP multi-agent
negotiation in dynamic, constraint-subjected domains.
Our CNP based solution is implemented using the Jadex BDI agent-oriented
reasoning engine realised as an extension to the widely used JADE middleware
platform [14].
We used the test beds of Mitrovic-Minic [13]. We examined 10 problem instances
with one depot and, respectively, 100 and 500 requests. The service area
is 60 × 60 km², and the vehicle speed is 60 km/h. In all instances, requests occur during
the service period according to a continuous uniform distribution, and no requests
are known in advance. The service period is 10 hours. Experiments were
performed at a simulation speed of 30, which means that one hour of real-life operations
is simulated in 1/30 hours of computer time.
Preliminary experiments were performed to determine convenient selection
and bidding parameter values.
At the end of the experiments, we observed that the agents produce improved results:
considering future and past information made the solutions more accurate. Table
1 provides the average experimental results obtained with different initial
fleets of vehicles, ranging from five to twenty vehicles for the 100-request
instances and from 20 to 40 for the 500-request instances, in three settings: when no near-future or past
information is considered, when only future information is considered, and when
both future and past information are considered.
The table reports the total distance travelled and the number of vehicles used (m).
8 CONCLUSION
We proposed an automatic negotiation approach based on the CNP for DPDPTW
optimization. Negotiation in our system concerned the insertion of new requests and the
optimization of planned routes. It was performed in real time and was adjusted to deal
with the special features and constraints of the addressed problem. We considered the
use of one-by-one negotiation in order to carry out precise insertions. We also considered
agents to be entirely cooperative, since their objectives match the
global system objective of reducing the dispatching costs.
We proposed that agents be well informed about their past negotiations and
use their knowledge about the tasks about to be negotiated.
Congestion of the negotiation network was avoided by a thoughtful
selection among the tasks to be announced and the bidders. Experimental results showed
that CNP negotiations dealt successfully with the dynamic pick-up and delivery
problem.
REFERENCES
1 Introduction
The satisfiability problem in propositional logic (SAT) is the task to decide for a
given propositional formula in conjunctive normal form (CNF) whether it has a
model. More formally, let C = {C1 ,C2 , . . . ,Cm } be a set of m clauses that involve n
Boolean variables x1, x2, . . . , xn. A literal is either a variable xi or its negation ¬xi.
Each clause Ci is a disjunction of ni literals, Ci = ∨_{j=1}^{ni} lij. The SAT problem asks to
decide whether the propositional formula Φ = ∧_{i=1}^{m} Ci is satisfiable. SAT is the first
problem shown to be NP-complete [2]. The NP-completeness concept deals
with the idea of a polynomial transformation from a problem Pi to Pj where the essential
results are preserved: if Pi returns “yes”, then Pj returns “yes” under the same
problem input. MAX-SAT (or unweighted MAX-SAT) is the optimization variation
of SAT. It asks to find a variable assignment that maximizes the number of satisfied
clauses. In weighted MAX-SAT (or only MAX-SAT), a weight wi is assigned to
each clause Ci (notation: Ciwi ), and the objective is to maximize the total weight of
satisfied clauses ∑_{i=1}^{m} wi · I(Ci), where I(Ci) is one if and only if Ci is satisfied and
otherwise zero. Partial MAX-SAT (PMSAT) involves two weighted CNF formulas
fA and fB . The objective is to find a variable assignment that satisfies all clauses of
fA (non-relaxable or hard clauses) together with the maximum clauses in fB (relax-
able or soft clauses). SAT has seen many successful applications in various fields
such as planning, scheduling, and Electronic Design Automation. Encoding combi-
natorial problems as SAT problems has been mostly motivated by the simplicity of
SAT formulation, and the recent advances in SAT solvers. Indeed, new solvers are
capable of solving very large and very hard real world SAT instances. Optimization
problems that involve hard and soft constraints can be cast as a PMSAT, e.g. uni-
versity course scheduling and FPGA routing. In 1995, Jiang et al. [6] proposed the
first heuristic local search algorithm to solve this problem as a MAX-SAT. In 1997,
Cha et al. [1] proposed another local search technique to solve the PMSAT prob-
lem. In 2005, Menai et al. [9] proposed a coevolutionary heuristic search algorithm
to solve the PMSAT. In 2006, Fu and Malik [4] proposed two approaches based on
a state-of-the-art SAT solver to solve the PMSAT.
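To make the objective defined above concrete, the following sketch evaluates the weighted MAX-SAT objective over a clause set; the integer-literal encoding (a negative integer denotes a negated variable) is a common convention, and all names are illustrative. A PMSAT solution must additionally satisfy every hard clause of fA.

def maxsat_value(weighted_clauses, assignment):
    # weighted_clauses: (weight wi, clause Ci) pairs, each clause a list of
    # non-zero integers; assignment maps variable index -> bool
    def satisfied(clause):
        return any(assignment[abs(l)] == (l > 0) for l in clause)
    return sum(w for w, clause in weighted_clauses if satisfied(clause))

print(maxsat_value([(2, [1, -2]), (3, [2])], {1: False, 2: True}))  # 3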
The Steiner tree problem (STP) in graphs is a classic combinatorial problem.
It can be defined as follows. Given an arbitrary undirected weighted graph G =
(V, E, w), where V is the set of nodes, E denotes the set of edges and w : E → R+ is
a non-negative weight function associated with its edges. Any tree T in G spanning
a given subset S ⊆ V of terminal nodes is called a Steiner tree. Note that T may con-
tain non-terminal nodes referred to as Steiner nodes. The cost of a tree is defined to
be the sum of its edge weights. The STP asks for a minimum-cost Steiner tree. The
decision version of STP has been shown to be NP-complete by Karp [8]. STP has
found uses across a wide array of applications including network routing [10] and
VLSI design [7]. Several implementations of metaheuristics have been proposed for
the approximate solution of STP or its variations, such as Simulated Annealing [11],
Tabu Search [5], and Genetic Algorithms [3]. In this paper we are interested in solv-
ing the STP as a PMSAT problem. We show how to encode the STP into PMSAT
and propose a practical approach to solve it based on one of the best known SAT
solver WalkSAT [12] with certain extensions. Indeed, the success of WalkSAT and
its variations has led to the paradigm of SAT encoding and solving difficult prob-
lems from other problem domains. Our approach is based on exploiting problem
structural information, backbone in particular, to guide the search algorithm towards
promising regions of the search space. We show empirically that this method is ef-
fective by comparing our results to those obtained with specialized Steiner heuristic
algorithms. The rest of the paper is structured as follows. In the next section, we
explain how to encode the STP into PMSAT. In Section 3, we describe a heuris-
tic algorithm for PMSAT using backbone guided search. Computational results are
reported in Section 4. Concluding remarks are drawn in the last section.
Jiang et al. [6] suggested to encode STP as a weighted MAX-SAT instance and
to solve it using a MAX-SAT solver. However, a solution for the MAX-SAT in-
stance may violate some clauses whose satisfiability is required for the feasibility
of the STP solution. We propose to encode STP as a PMSAT instance to formu-
late independently hard and soft constraints and to solve it using a PMSAT solver.
Let G = (V, E, w) be a weighted graph of n nodes v1 , v2 , . . . , vn , and S ⊆ V a set of
terminal nodes.
1. For each edge ei j , 1 ≤ i ≤ n, 1 ≤ j ≤ n, connecting nodes i and j of the graph,
introduce a boolean variable xi j . I(xi j ) = 1 if ei j is chosen as part of the Steiner
tree.
2. For each variable xij, construct the clause cij = (¬xij)^wij to minimize the cost of
including the edge eij in the tree. fB = ∧ cij are soft clauses.
3. List terminal nodes in an arbitrary order. For some fixed l, generate the possible
k(k ≤ l) shortest paths between successive pairs of nodes using Dijkstra’s algo-
rithm. If no path exists between two terminal nodes, then return no solution.
Variables p1i j , p2i j , . . . , pkij denote the k shortest paths between terminal nodes i
and j. The reduction is an approximation of the original instance, since only the
k shortest paths are generated between pairs of nodes.
4. A solution to STP must contain a path between each pair of terminal nodes.
Namely, for each (vi , v j ) ∈ S × S, construct a clause (p1i j ∨ p2i j ∨ · · · ∨ pkij ).
fA1 = ∧ (p1ij ∨ p2ij ∨ · · · ∨ pkij) are hard clauses.
5. Each path must contain all its edges. Namely, for each path pkij containing edges
eil , elm , . . . , er j , construct clauses (pkij ⊃ xil ) ∧ (pkij ⊃ xlm ) ∧ · · · ∧ (pkij ⊃ xr j ) which
are equivalent to (¬pkij ∨ xil ) ∧ (¬pkij ∨ xlm ) ∧ · · · ∧ (¬pkij ∨ xr j ).
fA2 = ∧ ((¬pkij ∨ xil) ∧ (¬pkij ∨ xlm) ∧ · · · ∧ (¬pkij ∨ xrj)) are hard clauses.
6. Let fA = fA1 ∧ fA2. Then f = fA ∧ fB is the PMSAT instance yielded.
The number of variables is |E| + k(|S| − 1). The total number of clauses is
O(|E| + kL(|S| − 1)), where L is the maximum number of edges in pre-computed
paths. The reduction is linearly dependent on the number of edges. The reduction is
sound as any PMSAT solution yields a valid Steiner tree. Since all hard clauses fA
are satisfied, a path exists between each pair of terminal nodes in the obtained set
of nodes. The reduction is incomplete, since the PMSAT instance will not yield a
solution if there is no Steiner tree using the k paths generated in step 3.
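An illustrative sketch of this reduction, not the authors' implementation: k_shortest_paths is an assumed helper that returns up to k paths between two terminals, each path given as a list of (i, j) edge pairs, and clauses are emitted as (weight, literal list) pairs with negative integers for negated variables; hard clauses get the weight None.

def stp_to_pmsat(edges, terminals, k_shortest_paths, k=10):
    var = {}                                  # edge/path name -> variable index
    def v(name):
        if name not in var:
            var[name] = len(var) + 1
        return var[name]
    soft, hard = [], []
    # step 2: one soft unit clause (not x_ij) per edge, weighted by its cost
    for (i, j, w) in edges:
        soft.append((w, [-v(("x", i, j))]))
    # steps 3-5: connect successive terminal pairs through k candidate paths
    for s, t in zip(terminals, terminals[1:]):
        paths = k_shortest_paths(s, t, k)
        if not paths:
            return None                       # no solution (step 3)
        hard.append((None, [v(("p", s, t, idx)) for idx in range(len(paths))]))
        for idx, path in enumerate(paths):
            for (i, j) in path:
                # p_st^idx implies x_ij, i.e. (not p or x)
                hard.append((None, [-v(("p", s, t, idx)), v(("x", i, j))]))
    return hard, soft, var                    # step 6: fA (hard) and fB (soft)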
Backbone variables are a set of literals which are true in every model of a SAT in-
stance. The backbone of a PMSAT instance is the set of assignments of values to
variables which are the same in every possible optimal solution. They are proven to
procedure BB PMSAT
input: A formula F = fA ∧ fB in CNF containing n variables x1 , . . . , xn ,
MaxTries1, MaxTries2, MaxSteps.
output: A solution A for F, or “Not found” if fA is not satisfiable.
begin
for t = 0 to |Ω | − 1 do
A ← WalkSAT MAXSAT(F, X(F),MaxTries=1,MaxSteps);
If A satisfies F then return A;
Ω [t] ← A;
end for
A ← BB WalkSAT MAXSAT(F, X(F), Ω , MaxTries1, MaxSteps);
if A satisfies F then return A;
if (∃ fASAT , fAUNSAT | fA = fASAT ∧ fAUNSAT ) and (A satisfies fASAT )
and (X( fASAT ) ∩ X( fAUNSAT ) = ∅)
then f ← fAUNSAT , X( f ) ← X( fAUNSAT );
else f ← fA , X( f ) ← X( fA );
end if
A f ← BB WalkSAT( f , X( f ), Ω , MaxTries2, MaxSteps);
if A f satisfies f then update A and return A;
return “Not found”;
end
updated at each time a new local minimum is encountered. The second phase of the
algorithm is performed if the best assignment found in the previous phase does not
satisfy fA (a PMSAT instance F is satisfied iff fA is satisfied). In such case, it is
recycled to try to satisfy fA using BB WalkSAT guided by the information in Ω . If
the best assignment found does not satisfy fA , then it is recycled to a model of fA
using Ω .
procedure BB WalkSAT
input: A formula F in CNF containing n variables x1 , . . . , xn ,
Ω , MaxTries, MaxFlips.
output: A satisfying assignment A for F, or “Not found”.
begin
for try = 1 to MaxTries do
Calculate pi , (i = 1, n) using Ω ;
A ← best assignment for n variables among t randomly
created assignments in Ω using pi ;
for f lip = 1 to MaxFlips do
if A satisfies F then return A;
c ← an unsatisfied clause chosen at random;
if there exists a variable x in c with break value = 0
then
A ← A with x flipped;
else
with probability p
x ← a variable in c chosen at random;
with probability (1 − p)
x ← a variable in c with smallest break value;
A ← A with x flipped;
end if
end for
if (A ∉ Ω) and (∃ Ω[k] | C(Ω[k]) < C(A)) then Ω[k] ← A;
end for
return “Not found”
end
4 Computational Experience
The computing platform used to perform the experiments is a 3.40 GHz Intel Pen-
tium D Processor with 1 GB of RAM running Linux. Programs are coded in C
language. We compared the BB PMSAT results with the optimal solutions of a set of
test problems from the OR-Library (series D and E). Series D consists of 20 problems
with 1000 nodes, arcs varying from 1250 to 25000, and terminals from 5 to 500.
Series E consists of 20 problems of 2500 nodes, arcs varying from 3250 to 62500,
and terminals from 5 to 1250. In order to test the effectiveness of the proposed ap-
proach, we compared BB PMSAT results with those obtained with the Tabu Search
method called Full Tabusteiner (F-Tabu) from Gendreau et al. [5], which is one of
the best heuristic approaches for the STP in terms of solution quality. BB PMSAT was
also compared to one of the best Genetic Algorithms (GA-E) that has solved STP,
which is due to Esbensen [3].
PMSAT and MAX-SAT instances were generated from STP instances using the
reduction described in Section 2. The number k of pre-computed paths between
each pair of nodes was fixed to 10. The total number of tries for each run of
BB PMSAT was shared between its two phases. Let r be the first phase length
ratio of the total run length and pb the ratio of pseudo-backbone size to the num-
ber of variables n. BB PMSAT was tested using the following parameter settings:
r = 0.6, pb = 0.5 (values of r and pb are recommended in [9]), MaxFlips = 10^5,
and MaxTries = 100 (shared between MaxTries1 and MaxTries2). WalkSAT was
tested using a noise parameter p = 0.5 (recommended by the authors [12]) and the
same values of MaxFlips and MaxTries used in BB PMSAT.
Table 1 shows the mean results in terms of solution quality (in error percent-
age w.r.t. the optimum) for the series D and E of STP and their comparison with
the Tabu Search method F-Tabu [5] and the Genetic Algorithm GA-E [3]. The re-
sults reported for WalkSAT and BB PMSAT include average CPU time required
over 10 runs. CPU times of the methods GA-E and F-Tabu are omitted because
of a difference in the evaluation of the processing times. BB PMSAT and F-Tabu
were the best approaches in 28 cases and gave clearly better solutions than GA-E
(18 cases) and WalkSAT (16 cases). In terms of solution quality, the average results
given by BB PMSAT and F-Tabu for series D were comparable. However, for se-
ries E, F-Tabu outperformed BB PMSAT. We expect that greater exploration of the
parameters of BB PMSAT may yield still better results.
Overall, BB PMSAT found more optimal solutions than WalkSAT on all in-
stances in less average CPU time. Indeed, the average CPU time achieved by
BB PMSAT and WalkSAT on all the problems is 5.71 secs and 7.58 secs, respec-
tively. These positive results can demonstrate the superiority of the PMSAT encod-
ing and the use of BB PMSAT search procedure in comparison to the MAX-SAT
encoding and the use of WalkSAT procedure for STP. BB PMSAT’s overall perfor-
mance is comparable to the Tabu Search method F-Tabu.
5 Conclusions
In this paper we have examined a logic-based method to solve STP. We have consid-
ered MAX-SAT and Partial MAX-SAT encodings of STP. Empirical evaluation has
been conducted on these encodings using two heuristic algorithms: BB PMSAT and
WalkSAT. BB PMSAT relies on a pseudo-backbone sampling to guide the search
trajectory through near-optimal solutions. We have reported some computational
results showing that BB PMSAT can solve large instances of STP. It appears that
solving STP as a PMSAT using BB PMSAT is more effective than solving it as a
MAX-SAT using WalkSAT. Results are compared to those of specialized STP al-
gorithms (F-Tabu and GA-E). The performance of BB PMSAT is better than that
of GA-E and close to that of F-Tabu in terms of solution quality. We have tested
larger STP instances and obtained good results. However, the lack of space prevents
us from presenting them and from discussing the choice of k, the number of precomputed
shortest paths. We can conclude that the reduction of STP into PMSAT and the use of
BB PMSAT represent an effective means of solving STP.
References
1. Cha, B., Iwama, K., Kambayashi, Y., Miyazaki, S.: Local search algorithms for Partial
MAXSAT. In Proc. AAAI-97, (1997), 263–268
2. Cook, S.A.: The complexity of theorem proving procedures. In Proc. 3rd ACM Symposium
on Theory of Computing, (1971) 263–268
3. Esbensen, H.: Computing near-optimal solutions to the Steiner problem in graphs using a
genetic algorithm. Networks 26, (1995) 173–185
4. Fu, Z., Malik, S.: On solving the Partial MAX-SAT problem. In Proc. SAT’06, LNCS 4121,
(2006) 252–265
5. Gendreau, M., Larochelle, J.-F., Sansò, B.: A tabu search heuristic for the Steiner tree problem.
Networks 34(2), (1999) 162–172
6. Jiang, Y., Kautz, H.A., Selman, B.: Solving problems with hard and soft constraints using a
stochastic algorithm for MAX-SAT. In Proc. 1st Inter. Joint Workshop on Artificial Intelli-
gence and Operations Research, (1995)
7. Kahng, A.B., Robins, G.: On optimal interconnections for VLSI. Kluwer Publishers, (1995)
8. Karp, R.M.: Reducibility among combinatorial problems. In E. Miller and J.W. Thatcher, eds,
Complexity of Computer Computations, Plenum Press, (1972) 85–103
9. Menaï, M.B., Batouche, M.: A backbone-based co-evolutionary heuristic for Partial MAX-
SAT. In Proc. EA-2005, LNCS 3871, (2006) 155–166, Springer-Verlag
10. Nguyen, U.T.: On multicast routing in wireless mesh networks. Computer Communications
31(7), (2008), 1385–1399
11. Osborne, L.J., Gillett, B.E.: A comparison of two simulated annealing algorithms applied to
the directed Steiner problem on networks. ORSA Journal on Computing 3, (1991), 213–225
12. Selman, B., Kautz, H.A., Cohen, B.: Noise strategies for improving local search. In Proc.
AAAI-94, (1994) 337–343
13. Slaney, J., Walsh, T.: Backbones in optimization and approximation. In Proc. IJCAI-01, (2001)
254–259
14. Telelis, O., Stamatopoulos, P.: Heuristic backbone sampling for maximum satisfiability. In
Proc. 2nd Hellenic Conference on AI, (2002) 129–139
An Argumentation Agent Models Evaluative
Criteria
John Debenham
1 Introduction
This paper is based in rhetorical argumentation [1] and is in the area labelled:
information-based agency [3]. An information-based agent has an identity, values,
needs, plans and strategies all of which are expressed using a fixed ontology in
probabilistic logic for internal representation. All of the forgoing is represented in
the agent’s deliberative machinery. [2] describes a rhetorical argumentation frame-
work that supports argumentative negotiation. It does this by taking into account:
the relative information gain of a new utterance and the relative semantic distance
between an utterance and the dialogue history. Then [4] considered the effect that
argumentative dialogues have on the on-going relationship between a pair of nego-
tiating agents.
This paper is written from the point of view of an agent α that is engaged in
argumentative interaction with agent β . The history of all argumentative exchanges
is the agents’ relationship. We assume that their utterances, u, can be organised
into distinct dialogues, Ψ t . We assume that α and β are negotiating with the mutual
aim of signing a contract, where the contract will be an instantiation of the mutually-
understood object o(Ψ t ). An argumentation agent has to perform two key functions:
to understand incoming utterances and to generate responses.
John Debenham
University of Technology, Sydney, Australia, e-mail: [email protected]
Debenham, J., 2009, in IFIP International Federation for Information Processing, Volume 296;
Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 81–86.
2 Assessing a Contract
No matter what interaction strategy an agent uses, and no matter whether the com-
munication language is that of simple bargaining or rich argumentation, a negoti-
ation agent will have to decide whether or not to sign each contract on the table.
An agent’s preferences may be uncertain. In that case, we ask the question: “how
certain am I that δ = (φ , ϕ ) is a good contract to sign?” — under realistic con-
ditions this may be easy to estimate. Pt (sign(α , β , χ , δ )) estimates the certainty,
expressed as a probability, that α should sign proposal δ in satisfaction of her
need χ , where in (φ , ϕ ) φ is α ’s commitment and ϕ is β ’s. α will accept δ if:
Pt (sign(α , β , χ , δ )) > c, for some level of certainty c.
To estimate Pt (sign(α , β , χ , δ )), α will be concerned about what will occur if
contract δ is signed. If agent α receives a commitment from β , α will be interested
in any variation between β's commitment, ϕ, and what is actually observed as the
enactment, ϕ′. We denote the relationship between commitment and enactment:
Pt(Observe(α, ϕ′) | Commit(β, α, ϕ))
A general measure of whether q(s) is more interesting than p is: K(q(s) ‖ D(X)) >
K(p ‖ D(X)), where K(x ‖ y) = ∑_j x_j log(x_j / y_j) is the Kullback-Leibler distance between
two probability distributions x and y, and D(X) is the expected distribution in the
absence of any observations — D(X) could be the maximum entropy distribution.
Finally, Pt+1 (X) = Pt (X(s) ). This procedure deals with integrity decay, and with two
probabilities: first, the probability z in the rating of the sequence s that was intended
to achieve τ , and second β ’s weighting Rt (π , τ , s) of the significance of τ as an
indicator of the true value of X. Equation 4 is intended to prevent weak information
from decreasing the certainty of Pt+1 (X). For example if the current distribution
is (0.1, 0.7, 0.1, 0.1), indicating an “acceptable” rating, then weak evidence P(X =
acceptable) = 0.25 is discarded.
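As an illustration of the Kullback-Leibler test above, the following sketch compares a candidate distribution q(s) and the current distribution p against the expected distribution D(X), here taken to be the maximum entropy (uniform) distribution; the numbers reuse the rating example above and are otherwise arbitrary.

from math import log

def kl(x, y):
    # Kullback-Leibler distance K(x || y) = sum_j x_j * log(x_j / y_j);
    # terms with x_j = 0 contribute nothing.
    return sum(xj * log(xj / yj) for xj, yj in zip(x, y) if xj > 0)

def more_interesting(q_s, p, d):
    # q(s) is preferred over p when it lies further from the expected
    # distribution D(X), i.e. K(q(s) || D(X)) > K(p || D(X)).
    return kl(q_s, d) > kl(p, d)

d = [0.25, 0.25, 0.25, 0.25]          # maximum entropy distribution over 4 ratings
p = [0.10, 0.70, 0.10, 0.10]          # current distribution P^t(X)
q_s = [0.05, 0.80, 0.10, 0.05]        # candidate posterior q(s)
print(more_interesting(q_s, p, d))    # True: q(s) carries more information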
In this Section we consider how the agent models its partner’s contract acceptance
logic in an argumentative context. In Section 2 we discussed modelling contract
acceptance, but there is much more to be done.
Estimating β's evaluative criteria. α's world model, M^t, contains probability distributions
that model the agent's belief in the world, including the state of β. In particular,
for every criterion c ∈ C, α associates with c a random variable C with probability
mass function Pt(C = ei).
The distributions that relate object to criteria may be learned from prior experi-
ence. If Pt (C = e|O = o) is the prior distribution for criteria C over an evaluation
space given that the object is o, then given evidence from a completed negotiation
with object o we use the standard update procedure described in Section 2. For
example, given evidence that α believes with probability p that C = ei in a negoti-
ation with object o then Pt+1 (C = e|O = o) is the result of applying the constraint
P(C = ei |O = o) = p with minimum relative entropy inference as described pre-
viously, where the result of the process is protected by Equation 4 to ensure that
weak evidence does not override prior estimates. In the absence of evidence of the
form described above, the distributions, Pt (C = e|O = o), should gradually tend to
ignorance. If a decay-limit distribution [2] is known they should tend to it otherwise
they should tend to the maximum entropy distribution.
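The text does not spell out the decay update here; one simple way to realise this drift toward ignorance — an assumption of ours, not the formula from [2] — is to mix each distribution at every time step with its decay-limit distribution, using the uniform (maximum entropy) distribution when no decay limit is known.

def decay(p_t, decay_limit=None, nu=0.05):
    # Move P^t(C = e | O = o) a small step towards its decay-limit distribution;
    # with no decay limit known, drift towards the maximum entropy distribution.
    n = len(p_t)
    d = decay_limit if decay_limit is not None else [1.0 / n] * n
    return [(1 - nu) * p + nu * q for p, q in zip(p_t, d)]

p = [0.05, 0.80, 0.10, 0.05]
for _ in range(50):                   # with no fresh evidence the estimate
    p = decay(p)                      # tends to the uniform distribution
print([round(x, 3) for x in p])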
In a multiagent system, this approach can be strengthened in repeated negotia-
tions by including the agent’s identity, Pt (C = e|(O = o, Agent = β )) and exploiting
a similarity measure across the ontology. Two methods for propagating estimates
across the world model by exploiting the Sim(·) measure are described in [2]. An
extension of the Sim(·) measure to sets of concepts is straightforward, we will note
it as Sim∗(·).
Disposition: shaping the stance. Agent β's disposition is the underlying rationale
that he has for a dialogue. α is concerned with the confidence she has in her beliefs
about β's disposition, as this affects the certainty with which she believes she knows
β's key criteria. Gauging disposition in human discourse is not easy, but it is certainly
not impossible. We form expectations about what will be said next; when those
expectations are challenged we may well believe that there is a shift in the rationale.
α's model of β's disposition is DC = Pt(C = e|O = o) for every criterion in the
ontology, where o is the object of the negotiation. α's confidence in β's disposition
is the confidence she has in these distributions. Given a negotiation object o, confidence
will be aggregated from H(C = e|O = o) for every criterion in the ontology.
4 Strategies
A^t_(y,x) = ρ × A^(t−1)_(y,x) + (1 − ρ) × I(u) × ∆(u, x)
for any x, where ρ is the discount rate, and I(u) is the information¹ in u. The balance
of α's relationship with βi, B^t, is the element-by-element numeric difference of A^t
and α's estimate of β's intimacy on α.
Given the needs model, υ, α's relationship model (Relate(·)) determines the target
intimacy, A*t_i, and target balance, B*t_i, for each agent i in the known set of agents
Agents. That is, {(A*t_i, B*t_i)}_{i=1}^{|Agents|} = Relate(υ, X, Y, Z), where X_i is the trust model,
Y_i is the honour model and Z_i is the reliability model as described in [2]. As noted
before, the values for intimacy and balance are not simple numbers but are structured
sets of values over Y × V.
When a need fires, α first selects an agent β_i to negotiate with — the social
model of trust, honour and reliability provides input to this decision, i.e. β_i =
Select(χ, X, Y, Z). We assume that in her social model, α has medium-term intentions
for the state of the relationship that she desires with each of the available
agents — these intentions are represented as the target intimacy, A*t_i, and target
balance, B*t_i, for each agent β_i. These medium-term intentions are then distilled into
short-term targets for the intimacy, A**t_i, and balance, B**t_i, to be achieved in the
current dialogue Ψ^t, i.e. (A**t_i, B**t_i) = Set(χ, A*t_i, B*t_i). In particular, if the balance
1 Information is measured in the Shannon sense: if at time t, α receives an utterance u that may
alter this world model, then the (Shannon) information in u with respect to the distributions in M^t
is I(u) = H(M^t) − H(M^(t+1)).
References
1. Rahwan, I., Ramchurn, S., Jennings, N., McBurney, P., Parsons, S., Sonenberg, E.:
Argumentation-based negotiation. Knowledge Engineering Review 18(4), 343–375 (2003)
2. Sierra, C., Debenham, J.: Trust and honour in information-based agency. In: P. Stone, G. Weiss
(eds.) Proceedings 5th International Conference on Autonomous Agents and Multi Agent Sys-
tems AAMAS-2006, pp. 1225 – 1232. ACM Press, New York, Hakodate, Japan (2006)
3. Sierra, C., Debenham, J.: Information-based agency. In: Proceedings 12th International Joint
Conference on Artificial Intelligence IJCAI-07, pp. 1513–1518. Hyderabad, India (2007)
4. Sierra, C., Debenham, J.: The LOGIC Negotiation Model. In: Proceedings 6th International
Conference on Autonomous Agents and Multi Agent Systems AAMAS-2007, pp. 1026–1033.
Honolulu, Hawai’i (2007)
ASIC Design Project Management Supported by
Multi Agent Simulation
1 Introduction
Jana Blaschke
Robert Bosch GmbH, Tübinger Straße 123, 72762 Tübingen, Germany
e-mail: [email protected]
Blaschke, J., Sebeke, C. and Rosenstiel, W., 2009, in IFIP International Federation for Information
Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L.,
Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 87–93.
projects is enabled. This neither allows what-if analysis of projects nor a reasonable
planning for investments and resources.
To overcome these disadvantages we developed a flexible approach that allows
status-analysis and an efficient planning of project courses. The approach was de-
veloped within the context of a public enhanced research project called PRODUK-
TIV+. The objective of PRODUKTIV+ is the development of a comprehensible
model and reference system to measure and assess the productivity and performance
of design processes [7].
2 The Concept
The approach needs to be able to make predictions for real and suggestions for op-
timal project courses. It should give status analysis of ongoing and finished projects
and has to be flexible enough to handle and compare heterogeneous designs.
Fig. 1 Our dynamic approach to assess the design process in order to make it analysable and
plannable. Inputs into the system are design tasks generated by a machine learning model. The
core of the model consists of a multi agent simulation, processing the tasks. The simulation is
organized by a scheduling algorithm. An interference module introduces dynamics to the system.
We use a Multi Agent System (MAS) to describe the design process. MAS accom-
plish time-dependent simulations of complex interactions within a group of agents.
This is a very important characteristic for our purpose, because it allows an inspec-
tion and analysis of the design process at any point of time.
The structure of the MAS depicts the design process of ASICs. Therefore the
agents, their interactions and their organization have to describe a design team, the
design environment and the design structure. Every agent runs in an independent thread. Basic
design tasks that have to be accomplished during the design process determine
the structure of a simulation run. The duration and sector of the tasks, as well as their
dependencies on other tasks, account for a large part of the simulation organisation.
We defined four different agent types. The designer agent resembles the human
designer. At their initialization characteristics have to be defined that assign working
areas and properties to the designer. A designer can access a tool-pool that it can use
to accomplish tasks of a specified working sector. The tool agent specifies a design
tool. Its properties denote the design tasks that can be executed with the tool. The
management and alignment of tasks to agents is done by an administration agent.
Dynamics are introduced to the simulation by an interference module. It accounts
for a realistic simulation.
The simulation structure is determined by the tasks and the agent interactions.
Designers can only execute a task belonging to their working area using an appro-
priate tool. A designer can accomplish only one task at a time. A tool license can
only be used by one designer at a time. If a designer has acquired a free tool license,
he sends a task request to the administration agent. If an executable task matching
the designer's capabilities exists, the administration agent will provide it. For
implementation we used the agent platform A-globe [6].
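The task-assignment interaction described above can be sketched in plain Python; this is an illustration, not A-globe agent code, and the Task fields and helper names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    sector: str                          # working sector, e.g. "analogue frontend"
    depends_on: list = field(default_factory=list)
    done: bool = False

class AdministrationAgent:
    # Manages the task pool and answers task requests from designers.
    def __init__(self, tasks):
        self.tasks = tasks

    def request_task(self, sector):
        # Return an executable task matching the designer's capabilities.
        for task in self.tasks:
            if (not task.done and task.sector == sector
                    and all(dep.done for dep in task.depends_on)):
                return task
        return None

class DesignerAgent:
    # A designer works in one sector and needs a free tool licence.
    def __init__(self, name, sector):
        self.name, self.sector = name, sector

    def work_step(self, admin, free_licences):
        if free_licences[self.sector] == 0:
            return None                  # no free tool licence available
        task = admin.request_task(self.sector)
        if task is not None:
            task.done = True             # duration handling omitted in this sketch
        return task

spec = Task("spec", "digital frontend")
rtl = Task("rtl", "digital frontend", depends_on=[spec])
admin = AdministrationAgent([spec, rtl])
designer = DesignerAgent("D1", "digital frontend")
print(designer.work_step(admin, {"digital frontend": 1}).name)   # "spec"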
To obtain a simulation that reflects reality as closely as possible, dynamics are introduced
to the MAS. This is accomplished by an interference module. We identified
four main factors that have a strong influence on the course of the design process.
The first factor is the agent availability, restricted by illness and holidays. If an
agent is out of office, he is not able to take and process a task. If he is already working
on a task, the task duration will be prolonged or the task has to be passed to
another agent. The second factor is the occurrence of unforeseen events, such as long-term
resource drop-outs or the introduction of new workload. The recursion of tasks
represents a third factor. There can be intense recursions between front- and backend
design. The recursion probabilities were extracted from historical data and are inserted
into a Markov chain. As the tasks that are passed from the ML models to the
MAS have averaged durations, a fourth factor is the task duration deviation.
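A minimal sketch of how such an interference module could sample these disturbances is given below; the transition probabilities, deviation range and absence rate are placeholders, not values extracted from the project data.

import random

# Recursion probabilities between design stages, extracted from historical data
# and stored as a Markov chain (placeholder numbers).
RECURSION = {
    "frontend": {"frontend": 0.10, "backend": 0.90},
    "backend":  {"frontend": 0.25, "backend": 0.75},
}

def next_stage(current):
    # Sample the next design stage from the Markov chain.
    stages, probs = zip(*RECURSION[current].items())
    return random.choices(stages, weights=probs)[0]

def disturbed_duration(avg_duration, deviation=0.2):
    # Task durations from the ML models are averages; apply a random deviation.
    return avg_duration * random.uniform(1 - deviation, 1 + deviation)

def available(p_absent=0.05):
    # Agent availability restricted by illness and holidays.
    return random.random() > p_absent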
The administration agent has to manage the design tasks and to assign them to the
designers. A natural way of doing this is to use a task schedule. As the duration and
efficiency of the whole design process strongly depend on the task arrangement,
schedule optimization is very important. This problem is known as the precedence-
and resource-constrained scheduling problem, which is NP-complete [3]. An efficient,
heuristic way to address this problem is the formulation of a zero-one ILP [4].
Fig. 2 ILP schedule of a real design subproject. It was optimized in terms of resource and time
constraints and data dependencies between tasks. The constraints were formulated for three differ-
ent regions, the analogue frontend (blue) and backend (red) and the digital frontend (yellow).
To ensure that the number of rtype resources, here designers or tools, is not exceeded
at any point in time, the issue constraint is formulated for every resource type:
∑_{i=1}^{n} x_{ij} ≤ r_type ,     (2)
The precedence constraint makes sure that dependencies between tasks are met, if
a task k depends on a task i, i has to be scheduled before k:
∑_{j=1}^{m} j · x_{kj} + L_{ki} ≤ ∑_{j=1}^{m} j · x_{ij} ,     (3)
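To make the zero-one formulation concrete, the following sketch expresses the issue and precedence constraints with the PuLP modeller, where x[i][j] = 1 means task i is scheduled in time slot j. The task data, lag value and objective are hypothetical, the precedence direction follows the prose ("i has to be scheduled before k"), and the model deliberately ignores tasks spanning several slots; it is not the tool chain used in the project.

from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum

tasks = {0: "analogue frontend", 1: "analogue frontend", 2: "digital frontend"}
resources = {"analogue frontend": 2, "digital frontend": 1}   # r_type per sector
precedence = [(0, 1)]          # task 0 must be scheduled before task 1
lag = 1                        # L: minimum slot distance between the two tasks
n, m = len(tasks), 6           # n tasks, m time slots

prob = LpProblem("asic_schedule", LpMinimize)
x = [[LpVariable(f"x_{i}_{j}", cat=LpBinary) for j in range(m)] for i in range(n)]

# Objective (a simple proxy): minimise the sum of start slots.
prob += lpSum(j * x[i][j] for i in range(n) for j in range(m))

# Each task is scheduled in exactly one slot.
for i in range(n):
    prob += lpSum(x[i][j] for j in range(m)) == 1

# Issue constraint (2): per slot and resource type, do not exceed r_type.
for j in range(m):
    for rtype, limit in resources.items():
        prob += lpSum(x[i][j] for i in range(n) if tasks[i] == rtype) <= limit

# Precedence constraint (3): start slot of i plus lag precedes start slot of k.
for i, k in precedence:
    prob += (lpSum(j * x[i][j] for j in range(m)) + lag
             <= lpSum(j * x[k][j] for j in range(m)))

prob.solve()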
We used part of a real ASIC design project to evaluate our approach. The design
task durations were averaged durations obtained from machine learning models. We
initialized a MAS with one designer and one tool licence for the digital frontend,
three designers and two tool licences for the analogue frontend, and 1.5 analogue
layouters (one is part-time) and two tool licences. We created a schedule by solving
the ILP for these tasks.
At first we switched the interference module off and defined very tight degrees of
freedom to obtain a validation of the MAS. Our expectation that the MAS executes
the schedule exactly was met. The ILP schedule, see figure 2, and the result of the
simulation run, shown in figure 3(a), exhibit exactly the same project course.
The validation of the MAS was only a sanity check. To get a realistic simulation
the interference module was switched on. Several runs gave an estimation for the
Fig. 3 (a) Simulation of the schedule provided by the ILP. The interference module is turned off.
This run gives a simulation of the MAS. (b) Simulation of the schedule provided by the ILP. The
interference module is switched on.
average runtime and the deviation of the project. Figure 3(b) shows an average run.
The deviation of the runtime is 5 weeks. The simulation results reflect the real
course of the project. They give a good picture of reality and allow an analysis of
the project as well as suggestions on how to arrange the tasks in a good manner.
The results of our simulation are quite satisfying. The MAS provides a good means
to simulate the complex and dynamic ASIC design process and to make it
assessable. A realistic simulation is achieved by the introduction of an interference
module. To guide and improve the simulation, a schedule optimization is performed
that offers ideal task arrangements and resource allocations.
Up to 20 tasks (average duration of 5 weeks per task) can be handled efficiently
by the ILP. Because of its polynomial runtime, bigger problems have very long
runtimes. Therefore we are working on a scheduling optimization based on genetic
algorithms to overcome these limitations.
References
[1] http://sourceforge.net/projects/lpsolve.
[2] http://www.numetrics.com/homepage.jsp.
Ioannis Katakis and Georgios Meditskos and Grigorios Tsoumakas and Nick
Bassiliades and Ioannis Vlahavas
Abstract Semantic Web services have emerged as the solution to the need for au-
tomating several aspects related to service-oriented architectures, such as service
discovery and composition, and they are realized by combining Semantic Web tech-
nologies and Web service standards. In the present paper, we tackle the problem of
automated classification of Web services according to their application domain tak-
ing into account both the textual description and the semantic annotations of OWL-S
advertisements. We present results that we obtained by applying machine learning
algorithms on textual and semantic descriptions separately and we propose methods
for increasing the overall classification accuracy through an extended feature vector
and an ensemble of classifiers.
1 Introduction
Semantic Web services (SWSs) aim at making Web services (WSs) machine un-
derstandable and use-apparent, utilizing Semantic Web technologies (e.g. OWL-S1 ,
WSMO2 , SAWSDL [11]) and tools (e.g. Description Logic (DL) reasoners [2]) for
service annotation and processing.
The increasing number of available WSs has raised the need for their automated
and accurate classification in domain categories that can be beneficial for several
tasks related to WSs, such as:
Ioannis Katakis · Georgios Meditskos · Grigorios Tsoumakas · Nick Bassiliades · Ioannis Vlahavas
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece,
e-mail: {katak, gmeditsk, greg, nbassili, vlahavas}@csd.auth.gr
1 http://www.daml.org/services/owl-s/
2 http://www.wsmo.org/
Katakis, I., Meditskos, G., Tsoumakas, G., Bassiliades, N. and Vlahavas, I., 2009, in IFIP International
Federation for Information Processing, Volume 296; Artificial Intelligence Applications and
Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 95–104.
3 http://www.oasis-open.org/committees/uddi-spec
4 http://www.w3.org/Submission/SWRL/
2 Related Work
In recent years, a considerable effort has been made to develop automatic or
semi-automatic methods for classifying WSs into their application domain. In [3],
WSDL5 text descriptions are used in order to perform automatic classification of
WSs using Support Vector Machines (SVMs) [21]. Many approaches [7, 13, 18, 6]
use structured text elements from various WSDL components (e.g. operations) as
input to various classification methods like naive Bayes [7, 13], SVMs [7], decision
trees [18] or even ensemble of classifiers [6, 18]. The main disadvantage of such
approaches is that no semantic information is taken into account that, as we discuss
in this paper, can be considerably beneficial for classification.
In [5], the classification of WSs is based on OWL-S advertisements and it is
achieved by calculating the similarities of I/O annotation concepts between the un-
classified WS and a set of preclassified WSs for each class. The predicted class is
the one with the greatest overall similarity. The main disadvantage of this approach
is that the representation is not flexible enough in order to be used with any machine
learning algorithm and that the text of the description is ignored. We provide evalu-
ation results that prove the utility of even short textual descriptions that may appear
in the description of the WS advertisement.
A similar task to classification is SWS matchmaking. In this case a query WS
description is given in order to find a set of similar WSs [9, 10].
This section describes a number of approaches for representing the OWL-S adver-
tisement of a WS as a feature vector. Given a collection of labeled WSs, the corre-
sponding feature vectors along with the labels will constitute the training examples
for the machine learning algorithm.
5 www.w3.org/TR/wsdl
that the human entered textual description will contain words that will discriminate
one category from another.
Algorithm 1: semSigVector
Input: The ontology concept vocabulary VC, the WS description i and the DL reasoner R
Output: The weighted vector Si
1  Set inouts ← i.inputs ∪ i.outputs;
2  Si ← [0, .., 0];
3  forall j ∈ inouts do
4    Si[VC.index(j)] ← 1;
5    forall k ∈ VC do
6      if R(j ⊓ k ⊑ ⊥) then
7        continue;
8      if R(j ≡ k) ∨ R(j ⊑ k) ∨ R(k ⊑ j) then
9        Si[VC.index(k)] ← 1
10 return Si
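A Python rendering of Algorithm 1 could look as follows; the reasoner interface (disjoint, equivalent, subsumes) is a hypothetical wrapper standing in for calls to a DL reasoner such as Pellet, not an actual reasoner API.

def sem_sig_vector(vocabulary, ws_inputs, ws_outputs, reasoner):
    # vocabulary: ordered list of ontology concepts (V_C)
    # ws_inputs / ws_outputs: sets of annotation concepts of the advertisement
    # reasoner: object exposing disjoint(a, b), equivalent(a, b), subsumes(a, b),
    #           where subsumes(a, b) means b is subsumed by a (b ⊑ a)
    index = {concept: pos for pos, concept in enumerate(vocabulary)}
    s = [0] * len(vocabulary)
    for j in ws_inputs | ws_outputs:
        s[index[j]] = 1
        for k in vocabulary:
            if reasoner.disjoint(j, k):
                continue
            if (reasoner.equivalent(j, k) or reasoner.subsumes(k, j)
                    or reasoner.subsumes(j, k)):
                s[index[k]] = 1
    return s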
In this case we merge the textual and the syntactic / semantic vector into one, expecting
the classifier to learn relationships between textual features, syntactic / semantic
features and categories. We denote the vector that represents the combination of the
textual description (T) and the syntactic signature (N) as TextSynSig, and analogously
TextSemSig for the combination of the textual description and the semantic signature.
Many machine learning algorithms can output not only the predicted category for a
given instance but also the probability that the instance will belong to each category.
Having two classifiers trained, one on textual features HT (d, λ ) → [0, 1] and one on
semantic features HS (d, λ ) → [0, 1] that output the probability that the WS d will
belong to category λ , we define two different decision schemas. If L is the set of
all categories then let hT = arg maxλ ∈L HT (d, λ ) and hS = arg maxλ ∈L HS (d, λ ) be
the decisions of HT and HS respectively and hE the decision of the ensemble. The
first schema (Emax ) just selects the decision of the most confident classifier. In other
words, hE = hT if HT (d, hT ) ≥ HS (d, hS ) or hE = hS otherwise. The second schema
(Eavg ) averages the probabilities over both classifiers for a given category and then
selects the category with the maximum average:
hE = arg max_{λ∈L} ( (HT(d, λ) + HS(d, λ)) / 2 )     (3)
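Both combination schemes reduce to a few lines once each classifier exposes per-category probabilities; the sketch below represents each classifier's output as a dictionary mapping categories to probabilities (an illustration, not the evaluation code).

def e_max(h_text, h_sem):
    # Select the decision of the most confident of the two classifiers.
    cat_t = max(h_text, key=h_text.get)
    cat_s = max(h_sem, key=h_sem.get)
    return cat_t if h_text[cat_t] >= h_sem[cat_s] else cat_s

def e_avg(h_text, h_sem):
    # Average the per-category probabilities and pick the best category.
    return max(h_text, key=lambda c: (h_text[c] + h_sem[c]) / 2)

h_text = {"Travel": 0.55, "Education": 0.30, "Medical": 0.15}
h_sem  = {"Travel": 0.20, "Education": 0.70, "Medical": 0.10}
print(e_max(h_text, h_sem))   # Education (the semantic classifier is more confident)
print(e_avg(h_text, h_sem))   # Education ((0.30 + 0.70) / 2 beats (0.55 + 0.20) / 2)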
5 Evaluation
We used the OWLS-TC ver. 2.2 collection6 that consists of 1007 OWL-S adver-
tisements, without any additional modification. The advertisements define profile
instances with simple atomic processes, without pointing to physical WSDL de-
scriptions. Therefore, in our experiments we did not incorporate any WSDL con-
struct. The textual description of each advertisement consists of the service name
and a short service description. The WS I/O parameters are annotated with concepts
from a set of 23(= |Vo|) ontologies that the collection provides. The advertisements
are also preclassified in seven categories, namely Travel, Education, Weapon, Food,
Economy, Communication, and Medical. Please note that this collection is an ar-
tificial one. However, it is the only publicly available collection with a relatively
large number of advertisements, and it has been used in many research efforts. After
preprocessing the collection we obtained |Vc| = 395 and |VT| = 456. All
different versions of the resulting dataset are available online in Weka format at
http://mlkd.csd.auth.gr/ws.html.
In all of the experiments we used the 10-fold cross-validation procedure
and the Pellet DL reasoner [19] in order to compute the subsumption hierarchies of
the imported ontologies. In order to obtain classifier-independent results, we have
tested all of the approaches discussed in the previous section with 5 different classi-
fiers: 1) the Naive Bayes (NB) [8] classifier, 2) the Support Vector Machine (SVM)
(SMO Implementation [16]), 3) the k Nearest Neighbor (kNN) classifier [1], 4) the
RIPPER rule learner [4] and 5) the C4.5 decision tree classifier [17]. We used algo-
rithms from different learning paradigms, in order to cover a variety of real-world
application requirements. We used the Weka [22] implementations of all algorithms
with their default settings. The kNN algorithm was executed with k = 3. Emax and
Eavg are implemented by training two classifiers of the same type (one from Text
and one from SemSig representation) and using the combination schemes described
in Section 4. It would be interesting to study the combination of models of different
type but we consider this study out of the scope of this paper.
Table 1 presents the predictive accuracy for all methods and classifiers. With bold
typeface we highlight which method performs best for a specific classifier while
we underline the accuracy of the best performing classifier for each method. We
first notice the high performance of the SVM which achieves the best predictive
accuracy for almost all cases. The second best performance is achieved by C4.5.
Considering the different representation methods we first observe that the ac-
curacy of the Text representation reaches high levels (outperforming SynSig and
OntImp) even with this small amount of text from the OWL-S textDescription
6 http://projects.semwebcentral.org/projects/owls-tc/
property. This is probably due to the existence of characteristic words for each
category. The OntImp vector-based representation performs the worst, mainly because
there are general-purpose ontologies in the collection that are imported by
advertisements of unrelated domains. Moreover, the SynSig approach, despite its
simplicity (without the use of Pellet), achieves a decent performance. However, the
better performance of SemSig over SynSig stresses the importance of the inferencing
mechanism. By employing a reasoner, we are able to deduce more semantic relationships
among the annotation concepts, beyond simple keyword matches, such as
equivalent (≡) or subsumed (⊑) concepts.
By studying the results of the enhanced representations TextSynSig and TextSemSig
we observe that both approaches outperform their corresponding basic representations
(Text and SynSig for the former and Text and SemSig for the latter). This fact
is an indication that the classifier successfully takes advantage of both textual and
syntactic / semantic features.
Another fact that stresses the importance of combining text and semantics is the
accuracy of the two ensemble methods Emax and Eavg, which present the best overall
performance. Emax and Eavg outperform TextSemSig probably because they build
two experts (one from text and one from semantics) while TextSemSig builds one
model that learns to combine both sets of features.
racy. Note that our methodology can be extended to other SWS standards, such
as SAWSDL.
Our classification approach can be extended in two directions. Firstly, the SemSig
representation can be extended to also incorporate non-binary vectors, using
as weights the similarities that are computed by concept similarity measures [12].
In that way, we will be able to define different degrees of relaxation in the represen-
tation. Secondly, it would be interesting to experiment with multilabel classification
methods [20] for collections of SWSs that belong to more than one category.
Acknowledgements This work was partially supported by a PENED program (EPAN M.8.3.1,
No. 03E∆73), jointly funded by EU and the General Secretariat of Research and Technology.
References
1. Aha, D., Kibler, D.: Instance-based learning algorithms. Machine Learning 6, 37–66 (1991)
2. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Description
Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press
(2003)
3. Bruno, M., Canfora, G., Penta, M.D., Scognamiglio, R.: An approach to support web ser-
vice classification and annotation. In: Proceedings IEEE International Conference on e-
Technology, e-Commerce and e-Service, pp. 138–143. Washington, DC (2005). DOI
http://dx.doi.org/10.1109/EEE.2005.31
4. Cohen, W.W.: Fast effective rule induction. In: Proceedings 12th International Conference on
Machine Learning, pp. 115–123 (1995)
5. Corella, M., Castells, P.: Semi-automatic semantic-based web service classification. In:
J. Eder, S. Dustdar (eds.) Business Process Mangement Workshops, Springer Verlag Lecture
Notes in Computer Science, vol. 4103, pp. 459–470. Vienna, Austria (2006)
6. Heß, A., Johnston, E., Kushmerick, N.: ASSAM: A tool for semi-automatically annotating
semantic web services. In: Proceedings 3rd International Semantic Web Conference (2004)
7. Hess, A., Kushmerick, N.: Learning to attach semantic metadata to web services. In: Proceed-
ings International Semantic Web Conference (ISWC’03), pp. 258–273 (2003)
8. John, G.H., Langley, P.: Estimating continuous distributions in bayesian classifiers. In: Pro-
ceedings 11th Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan
Kaufmann, San Mateo (1995)
9. Kiefer, C., Bernstein, A.: The creation and evaluation of isparql strategies for match-
making. In: M. Hauswirth, M. Koubarakis, S. Bechhofer (eds.) Proceedings 5th Euro-
pean Semantic Web Conference, LNCS. Springer Verlag, Berlin, Heidelberg (2008). URL
http://data.semanticweb.org/conference/eswc/2008/papers/133
10. Klusch, M., Kapahnke, P., Fries, B.: Hybrid semantic web service retrieval: A
case study with OWLS-MX. In: International Conference on Semantic Comput-
ing, pp. 323–330. IEEE Computer Society, Los Alamitos, CA (2008). DOI
http://doi.ieeecomputersociety.org/10.1109/ICSC.2008.20
11. Kopecký, J., Vitvar, T., Bournez, C., Farrell, J.: Sawsdl: Semantic annotations for
wsdl and xml schema. IEEE Internet Computing 11(6), 60–67 (2007). DOI
http://dx.doi.org/10.1109/MIC.2007.134
12. Meditskos, G., Bassiliades, N.: Object-oriented similarity measures for semantic web ser-
vice matchmaking. In: Proceedings 5th IEEE European Conference on Web Services
(ECOWS’07), pp. 57–66. Halle (Saale), Germany (2007)
13. Oldham, N., Thomas, C., Sheth, A., Verma, K.: Meteor-s web service annotation framework
with machine learning classification. In: Proceedings 1st International Workshop on Semantic
Web Services and Web Process Composition (SWSWPC’04), pp. 137–146 (2005)
14. Paolucci, M., Kawamura, T., Payne, T.R., Sycara, K.P.: Importing the semantic web in uddi.
In: Revised Papers from the International Workshop on Web Services, E-Business, and the
Semantic Web (CAiSE’02/WES’02), pp. 225–236. Springer-Verlag, London, UK (2002)
15. Paolucci, M., Kawamura, T., Payne, T.R., Sycara, K.P.: Semantic matching of web services
capabilities. In: Proceedings 1st International Semantic Web Conference on The Semantic
Web (ISWC’02), pp. 333–347. Springer-Verlag, London, UK (2002)
16. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In:
B. Schoelkopf, C. Burges, A. Smola (eds.) Advances in Kernel Methods - Support Vector Learning.
MIT Press (1998). URL http://research.microsoft.com/~jplatt/smo.html
17. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Ma-
teo, CA (1993)
18. Saha, S., Murthy, C.A., Pal, S.K.: Classification of web services using tensor space model and
rough ensemble classifier. In: Proceedings 17th International Symposium on Foundations of
Intelligent Systems (ISMIS’08), pp. 508–513. Toronto, Canada (2008)
19. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A practical owl-dl reasoner.
Web Semant. 5(2), 51–53 (2007). DOI http://dx.doi.org/10.1016/j.websem.2007.03.004
20. Tsoumakas, G., Katakis, I.: Multi-label classification: An overview. International Journal of
Data Warehousing and Mining 3(3), 1–13 (2007)
21. Vapnik, V.N.: The nature of statistical learning theory. Springer-Verlag, NY, USA (1995)
22. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd
Edition. Morgan Kaufmann Publishers Inc., San Francisco, CA (2005)
Revealing Paths of Relevant Information in
Web Graphs
1 University of the Aegean, Department of Financial and Management Engineering,
Department of Information and Communications Systems Engineering
{gkouzas, janag}@aegean.gr
2 National Technical University of Athens, School of Electrical and Computer Engineering
[email protected], [email protected]
Abstract In this paper we propose a web search methodology based on the Ant
Colony Optimization (ACO) algorithm, which aims to enhance the amount of the
relevant information with respect to a user’s query. The algorithm aims to trace
routes between hyperlinks, which connect two or more relevant information nodes
of a web graph, with the minimum possible cost. The methodology uses the Ant-
Seeker algorithm, where agents in the web paradigm are considered as ants capa-
ble of generating routing paths of relevant information through a web graph. The
paper provides the implementation details of the web search methodology pro-
posed, along with its initial assessment, which presents quite promising results.
1 Introduction
In this paper, a new web search methodology based on the ant colony algorithm is
proposed. In more detail, we suggest an ant colony algorithm approach which is
capable of tracing relevant information on the Internet. Based on [1][2], the Ant Colony
Optimization algorithm can be applied to a connected graph Gp = (P,L), where P is the
set of nodes and L the set of links between the nodes, which represents a problem
definition. Every route in this graph represents a solution of the initial problem. ACO
converges to an optimal solution by tracing routes in the graph. In our approach, we
consider the world-wide web as a graph G. Although our methodology maintains most of the
Kouzas, G., Kolias, V., Anagnostopoulos, I., and Kayafas, E., 2009, in IFIP International Federation
for Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 105–112.
ant colony algorithm characteristics [3][4], its uniqueness lies in the fact that it
applies in an environment whose structure is not pre-defined. In addition, some
search techniques for locating and evaluating the relevant information are used,
based on web page similarity [5], [6], [7] as well as web page clustering [8].
The aim of this paper is to propose a methodology that is able to trace routes between
hyperlinks which connect two or more relevant information units (web pages) with the
minimum possible cost. Initially we consider a web information unit (web page) which is
relevant to the user's request. This web page is considered as the starting point of the
search. The search is based on the principle that when some information relevant to the
user's request exists on a point (node) of the world-wide web, then another point (node)
at a “close distance” is highly likely to contain similar (and thus relevant)
information [9]. We define the hyperlink as the basic distance unit in the web universe.
The “distance” between two nodes is defined as the number of subsequent hops needed in
order to be transferred from one node to the other and vice versa.
The methodology consists of three phases. In the first phase the starting points of the
search are defined. These could be either user-defined web pages or results of search
engines [10][11][12]. In phase two, the suggested search algorithm takes place. The
algorithm runs iteratively, and each time it converges to an information unit, it
specifies a new starting point. The third and last phase is to group the results
according to how relevant their contents are.
During the pre-processing of the hypertext documents the textual information is
extracted from the HTML format. Depending on the tags, we consider three levels of
importance (that is, High, Medium and Low). Then, the outgoing links are extracted in
order to construct the search graph. The final step is the calculation of the web page
similarities according to [13][14], in which the document hyperlink structure is taken
into account; it consists of the pre-processing phase and the similarity estimation
phase. For document similarity, we consider that sentences and phrases also carry
significant information regarding the textual content. Accordingly, we used a comparison
measure based not only on the similarity of individual terms but also on the similarity
of sentences and phrases [15]. The similarity between two documents, d1 and d2, is
computed according to Equation 1. In Equation 1, g(li) is a function that marks the
length of the common phrase, while s_j1 and s_k2 represent the initial lengths of the d1
and d2 document sentences respectively. Function g(li) is proportional to the ratio of
the common sentence portion length to the total sentence length, as defined in
Equation 2, where |msi| is the matching phrase length and γ is a sentence partitioning
factor greater than or equal to 1. In parallel with this procedure, the term-based
similarity of the tested web pages is computed using the Vector Space Model (VSM)
[16], [17], as defined in Equation 3. However, the inverse document frequency between
terms is not taken
into consideration, since the estimated similarity value concerns two web pages
and not a collection of web pages as required by the VSM. Therefore, the final
document similarity value SIMi is given by Equation 4.
Result grouping is used to present the search results better. The acquired web pages
are analyzed and presented in clusters. Each cluster contains a set of high-importance
information units. The cluster creation is based on the methodology proposed in [13][14]
and the similarity histogram analysis. For specifying similarity, the classic vector
space model is used [16], [17].
S_p(d1, d2) = ( ∑_i [ g(li) · (f_i1 w_i1 + f_i2 w_i2) ] )² / ( ∑_j |s_j1| · w_j1 + ∑_k |s_k2| · w_k2 )     (1)

g(li) = ( |msi| / |si| )^γ     (2)

sim(di, dj) = (di · dj) / (|di| × |dj|) = ∑_k (w_k,i × w_k,j) / √( ∑_k w²_k,i × ∑_k w²_k,j )     (3)

SIMi = S(d1, d2) = 0.5 S_p(d1, d2) + 0.5 S_t(d1, d2)     (4)
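The combined measure of Equation 4 can be sketched as follows. The phrase-based part assumes that the matching phrases between the two documents have already been identified (each match carrying its g(li) value and combined term weight), which in the full method comes from the document index graph of [13][14]; the function and variable names are ours.

from math import sqrt

def term_similarity(w1, w2):
    # Equation 3: cosine similarity of the two term-weight dictionaries.
    common = set(w1) & set(w2)
    num = sum(w1[t] * w2[t] for t in common)
    den = sqrt(sum(v * v for v in w1.values())) * sqrt(sum(v * v for v in w2.values()))
    return num / den if den else 0.0

def g(matching_len, sentence_len, gamma=1.2):
    # Equation 2: (|ms_i| / |s_i|) ** gamma, with gamma >= 1.
    return (matching_len / sentence_len) ** gamma

def phrase_similarity(matches, sentences1, sentences2):
    # Equation 1 (simplified): `matches` is a list of (g_value, f1*w1 + f2*w2)
    # pairs for the common phrases; sentencesX is a list of (length, weight)
    # pairs for the sentences of document X.
    num = sum(g_val * fw for g_val, fw in matches) ** 2
    den = sum(l * w for l, w in sentences1) + sum(l * w for l, w in sentences2)
    return num / den if den else 0.0

def combined_similarity(sim_p, sim_t):
    # Equation 4: SIM_i = 0.5 * S_p + 0.5 * S_t.
    return 0.5 * sim_p + 0.5 * sim_t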
4 Ant Seeker
In this section we present the proposed search algorithm, which is based on the
theoretical model of the ant colony algorithm [1]. The proposed algorithm deals
with the world-wide web information search problem. The structure of the world-
wide web consists of a set of information units (web pages) and a set of links (hy-
perlinks) between them. Thus, assuming a graph Gp = (P,L), where P are the
nodes, and L the link set between the nodes, we can consider the world-wide web
as a graph G with infinite dimensions.
The proposed algorithm is a slight modification of the ant colony algorithm [9],
[18] and therefore it adopts most of the basic characteristics of the colony algo-
rithms family [1]. However, several modifications were made in order to apply the
proposed algorithm in the particular World Wide Web paradigm. Thus:
• Each artificial ant can visit a predefined maximum set of nodes.
• All artificial ants start from the start node. In addition, when the algorithm
starts there is no further information about the graph structure in which the search
will be applied. In this way, the harvesting procedure of real ants is simulated.
• The process of recognising nodes whose content is relevant to the initial node
is based on the web page similarity mechanism described in the previous section.
The search begins from the initial start node, which is given by the user, and in
each step of the algorithm each ant-agent moves from node i to node j. Assuming
that at node j the pheromone value at time t is τj(t), the ant visits node j
through node i according to the pheromone function. The process is repeated until
each ant has visited the predefined maximum number of nodes. After the creation of the
candidate routes, the best route is extracted and the node pheromone values are
updated. This process is then repeated, and it ends when convergence to a specific
route is found. The node of the best route with the maximum similarity value in
comparison with the starting node is assigned as the starting node for the
next search. During the initialization phase, three variables are defined: the total
number of ant-agents NoA, the initial pheromone value IPV assigned to each new
node, and the maximum number of nodes Nmax that an ant can visit.
For every new node that is added, its content similarity to the initial one is checked.
Thus, each node is characterized by a similarity value, which indicates the quality
of the node (Equation 4). In order to specify the quality of a node, apart from its
content similarity to the initial node, a second value that captures its ability to lead
to a node with high-quality content is also specified. Therefore, points (nodes)
with low similarity values increase their significance when they lead to points of
high frequency. The calculation of the quality value is given by the heuristic function
according to Equation 5, where d is the path of an ant-agent in which node i is
included (0 < d < NoA), SIMi is the similarity value of node i as defined in
Equation 4, and SIM_dj stands for the similarity value of node j, which belongs
to the route d right after the previous visit to node i (i < j < Nmax). As mentioned
before, in the initial phase, each node inserted in the graph has a pheromone
value IPV. Every time a complete iteration of the algorithm occurs, the pheromone
values are updated. More specifically, the nodes used as intermediate or final points
in the ant routes are updated. This procedure is given by Equations 6 and 7. In
Equation 6, hi is the heuristic function given by Equation 5, while k is the number
of ants that used node i for route creation. According to Equation 7, the
nodes that lead to high-frequency routes increase their pheromone values substantially
over the algorithm iterations. To avoid infinite assignment of pheromone
values to certain nodes, the pheromone value is normalized between zero and one
as defined by Equation 8, where τmax(t+1) is the maximum pheromone value in
the current iteration. Each time an ant is at a node i, it must choose the next node j.
The nodes considered to be visited are the directly connected nodes. This defines the
accessibility value of each web page, given by Equation 9. In order to avoid endless
loops, accessibility excludes the nodes which have already contributed to the creation
of the route. The algorithm utilizes the classic probability model of the ant colony
algorithms, given by Equation 10. In every algorithmic iteration, ants construct a route
based on the pheromone value and the quality of the nodes in the search graph. The nodes
with the highest quality values increase their pheromone values, and thus they have
higher probabilities of being chosen. A candidate solution consists of a chosen route
(set of nodes) and not a single node. As the solution we define the node of the final
route which presents the largest similarity value. This node is then added to the list
of solutions.
∆τ_i = k h_i     (6)

τ'_i(t + 1) = τ_i(t) + ∆τ_i     (7)

τ_i(t + 1) = τ'_i(t + 1) / τ_max(t + 1)     (8)

η_ij = 1 if node j is directly linked from node i, 0 otherwise     (9)

P_ij = τ_j · η_ij / ∑_{k ∈ allowed_k} τ_k · η_ik     (10)
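The probabilistic node choice of Equation 10, together with the pheromone update and normalization of Equations 6–8, can be sketched as follows; this is an illustration with plain dictionaries, not the Ant-Seeker implementation.

import random

def choose_next(current, pheromone, neighbours, visited):
    # Equation 10: P_ij = tau_j * eta_ij / sum_k tau_k * eta_ik, where
    # eta_ij = 1 only for directly linked, not-yet-used nodes (Equation 9).
    allowed = [j for j in neighbours[current] if j not in visited]
    if not allowed:
        return None
    weights = [pheromone[j] for j in allowed]
    return random.choices(allowed, weights=weights)[0]

def update_pheromone(pheromone, route_nodes, heuristic, ants_per_node):
    # Equations 6-8: tau'_i(t+1) = tau_i(t) + k * h_i, then normalise all
    # values by the maximum pheromone of the current iteration.
    for i in route_nodes:
        pheromone[i] += ants_per_node[i] * heuristic[i]
    tau_max = max(pheromone.values())
    for i in pheromone:
        pheromone[i] /= tau_max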
5 Results - Evaluation
This section presents the results of the proposed methodology. The evaluation
took place in two experimental phases. In the first phase we evaluate the performance of
the proposed Ant-Seeker algorithm [9], [18] by applying the algorithm to three
different queries. In the second phase we evaluate the introduction of clustering
techniques to the methodology for grouping the results. The search procedure
was examined by querying different parts of the World Wide Web three
times. During this procedure we used only web pages with English content. In order
to apply and evaluate the algorithm, we followed three steps. The first step
involves the preprocessing of the web pages; the second involves the balancing of the
variables NoA, Nmax and IPV, while the third step involves the algorithm execution.
For all experiments the variable values were chosen to be NoA=10,
Nmax=3, NC=100 and IPV=0.4. For each search, the set of returned results is
equal to the number of algorithm iterations. This reduces the result quality, but
it is important especially in cases where the algorithm does not manage to converge
to a solution relevant to the query during the initial search stages. The results
of clustering on the three algorithm evaluation sets appear in Table 6. By applying
clustering methods to the returned results, the percentage of related documents
retrieved decreases (by 2% to 5%), but at the same time their quality increases
(from 30-50% to 80-90%). This is an expected behavior, because the
nodes (pages) used only for continuing the search are cut off due to their low
similarity value with the initial document. However, a small part of the correct results
is not ranked correctly during the clustering. The explanation is that the similarity
calculation model during algorithm execution is given by Equation 4, whereas
the similarity calculation model during clustering is the vector
space model [17]. We use a different function to calculate the similarity because
the final collection of web pages is unknown during the search, so we use
Equation 4, which defines similarity between a pair of pages; during clustering, on
the contrary, the total set of returned results virtually defines the collection. In Table 2
the results of applying the methodology to 6 random queries on the world-wide web
appear. The proposed algorithm's ability to seek and extract information relevant
to the query from the world-wide web is outlined in the experimental results.
However, during the experiments we faced a set of constraints.
However during the experiments we created a set of constraints.
The most important constraint is the scale of the world-wide web which did not
allow applying the algorithm in a wider scale search. The sample size was of value
200.000. The proper evaluation of the algorithm requires full definition of the
samples as concerning their informational relativity to the reference node. The
classification of all the samples based on similarity gives an estimation of the rela-
tion of the documents and therefore a classification measure but still remains a
mechanical classification method which cannot replace the human factor. For the
evaluation some machine learning techniques could be used as in neural networks
[10], [11] but a set of already classified documents is required in order to extract
the relative documents.
The second constraint that virtually is a result of the previous constraint is the
overlapping between searches. The algorithm includes an overlapping search pre-
vention mechanism in order to avoid creating cyclic search routes. However, in a
limited portion of the world-wide web forbidding backtracking would result in a
search termination in only a few steps (2 to 5 searches per sample). For this rea-
son, the only constraint assigned for route creation was blocking adding a node,
which belongs to the current set of solutions of the algorithm.
6 Conclusion
References
1 M. Dorigo and T. Stützle. Ant Colony Optimization. The MIT Press, 2004.
2 Dorigo M., and Caro G.D., 1999, “Ant Algorithms Optimization. Artificial Life”, 5(3):137-
172.
3 Dorigo M., and Maniezzo V., 1996, “The ant system: optimization by a colony of cooperat-
ing agents”. IEEE Transactions on Systems, Man and Cybernetics, 26(1):1-13.
4 Dorigo M. and Caro G.D., 1999, “The Ant Colony Optimization Meta-heuristic” in New
Ideas in Optimization, D. Corne, M. Dorigo, and F. Glover (Eds.), London: McGraw-Hill, pp.
11-32
5 Pokorny J (2004) Web searching and information retrieval. Computing in Science & Engi-
neering. 6(4):43-48.
6 Oyama S, Kokubo T, Ishida T (2004) Domain-specific Web search with keyword spices.
IEEE Transactions on Knowledge and Data Engineering. 16(1):17-27.
7 Pokorny J (2004) Web searching and information retrieval. Computing in Science & Engi-
neering. 6(4):43-48.
8 Broder A, Glassman S, Manasse M, Zweig G. Syntactic clustering of the Web. Proceedings
6th International World Wide Web Conference, April 1997; 391-404.
9 G. Kouzas, E. Kayafas, V. Loumos: “Ant Seeker: An algorithm for enhanced web search”,
Proceedings 3rd IFIP Conference on Artificial Intelligence Applications and Innovations
(AIAI) 2006, June 2006, Athens, Greece. IFIP 204 Springer 2006, pp 649-656.
10 I. Anagnostopoulos, C. Anagnostopoulos, G. Kouzas and D. Vergados, “A Generalised Re-
gression algorithm for web page categorisation”, Neural Computing & Applications journal,
Springer-Verlag, 13(3):229-236, 2004.
11 I. Anagnostopoulos, C. Anagnostopoulos, Vassili Loumos, Eleftherios Kayafas, “Classifying
Web Pages employing a Probabilistic Neural Network Classifier”, IEE Proceedings – Soft-
ware, 151(03):139-150, March 2004.
12 Anagnostopoulos I., Psoroulas I., Loumos V. and Kayafas E., “Implementing a customized
meta-search interface for user query personalization”, Proceedings 24th International Confer-
ence on Information Technology Interfaces (ITI’2002), pp. 79-84, June 2002, Cav-
tat/Dubrovnik, Croatia.
13 K.M. Hammouda, M. S. Kamel,”Phrase-based Document Similarity Based on an Index
Graph Model”, Proceedings IEEE International Conference on Data Mining (ICDM’2002),
December 2002, Maebashi City, Japan. IEEE Computer Society 2002, pp. 203-210.
14 K.M. Hammouda, M. S. Kamel, “Incremental Document Clustering Using Cluster Similarity
Histograms”, Proceedings WIC International Conference on Web Intelligence (WI 2003),
October 2003, Halifax, Canada. IEEE Computer Society 2003, pp. 597-601
15 J. D. Isaacs and J. A. Aslam. “Investigating measures for pairwise document similarity.”
Technical Report PCS-TR99-357, Dartmouth College, Computer Science, Hanover, NH,
June 1999
16 G. Salton, M. E. Lesk. Computer evaluation of indexing and text processing, Journal of the
ACM, 15(1):8-36, 1968.
17 G. Salton. The SMART Retrieval System – Experiments in Automatic Document Processing.
Prentice Hall Inc., 1971.
18 Kouzas G., E. Kayafas, V. Loumos “Web Similarity Measurements using Ant – Based Search
Algorithm”, Proceedings XVIII IMEKO WORLD CONGRESS Metrology for a Sustainable
Development September 2006, Rio de Janeiro, Brazil.
Preferential Infinitesimals for Information
Retrieval
1 Introduction
Chowdhury, M., Thomo, A. and Wadge, W.W., 2009, in IFIP International Federation for Information
Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L.,
Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 113–125.
It is interesting to observe that if the user specifies these keywords in Google, then
she gets a list of only three low-quality pages. What happens is that the true,
highly informative pages about “music-information-retrieval” are lost (or insignifi-
cantly ranked) in the quest of trying to serve the “google-search” and “google-
ranking” keywords. Unfortunately, in Google and other search engines, the user
cannot explicitly specify her real preferences among the specified keywords. In this
example, what the user needs is a mechanism for saying that “music-information-
retrieval” is of primary importance or infinitely more important than “google-search”
and “google-ranking,” and thus, an informative page about “music-information-
retrieval” should be retrieved and highly ranked even if it does not relate to Google
technologies.
Structural Preferences. The other facet of using preferential weights is for system
administrators to annotate structural parts of the documents in a given corpus. In
practice, most of the documents are structured, and often, certain parts of them are
more important than others. While our proposed ideas can be applied on any corpus
of structured documents, due to the wide spread of XML as a standard for repre-
senting documents, we consider in this paper XML documents which conform to
a given schema (DTD). In the same spirit as for keyword preferences, we will use
hyperreal weights to denote the importance of different elements in the schema and
documents.
To illustrate preferences on structural parts of documents, suppose that we have
a corpus of documents representing research papers, and a user is searching for a
specific keyword. Now, suppose that the keyword occurs in the title element of one
paper and in the references element of another paper. Intuitively, the paper having
the keyword in the title should be ranked higher than the paper containing the key-
word in the references element as the title of a paper usually bears more represen-
tative and concise information about the paper than the reference entries do. In fact,
one could say that terms in the title (and abstract) are infinitely more important than
terms in the references entries, as the latter might be there completely incidentally.
While weighting of certain parts of documents has been considered and advo-
cated in the folklore (cf. [5, 8]), to the best of our knowledge there is no work deal-
ing with inferring a consistent weighting scheme for nested XML elements based
on the weights that a system administrator gives to DTD elements. As we explain in
Section 4, there are tradeoffs to be considered and we present a solution that prop-
erly normalizes the element weights producing values which are consistent among
sibling elements and never greater than the normalized weight of the parent element,
thus respecting the XML hierarchy.
Contributions. Specifically, our contributions in this paper are as follows.
1. We propose using hyperreal numbers (see [6, 7]) to capture both “quantitative”
and “qualitative” user preferences on search keywords. The set of hyperreal num-
bers includes the real numbers which can be used for expressing “quantitative”
preferences such as, say “A is twice more preferred than B,” as well as infinites-
imal numbers, which can be used to express “qualitative” preferences such as,
say “A is infinitely more preferred than B.” We argue that without such qualita-
tive preferences there is no guarantee that an IR system would not override user
preferences in favor of other measures that the system might use.
2. We extend the ideas of using hyperreal numbers to annotating XML (DTD)
schemas. This allows system administrators to preferentially weight structural
elements in XML documents of a given corpus. We present a normalization
method which produces consistent preferential weights for the elements of any
XML document that complies with an annotated DTD schema.
3. We adapt the well-known TF-IDF ranking of IR systems to take into consider-
ation the preferential weights that the search keywords and XML elements can
have. Our extensions are based on symbolic computations which can be effec-
tively performed on expressions containing hyperreal numbers.
4. We present (in the appendix) illustrative practical examples which demonstrate
the usefulness of our proposed preference framework. Namely, we use a full
collection of speeches from the Shakespeare plays, and a diverse XML collec-
tion from INEX ([13]). In both these collections, we observed a clear advantage
of our preferential ranking over the ranking produced by the classical TF-IDF
method. We believe that these results encourage incorporating both quantitative
and (especially) qualitative preferences into other ranking methods as well.
Organization. The rest of the paper is organized as follows. In Section 2, we give
an overview of hyperreal numbers and their properties. In Section 3, we present
hyperreal preferences for annotating search keywords. In Section 4, we propose an-
notated DTDs for XML documents and address two problems for consistent weight-
ing of document elements. In Section 5, we show how to extend the TF-IDF ranking
scheme to take into consideration the hyperreal weights present in the search key-
words and document elements. In Appendix, we present experimental results.
2 Hyperreal Numbers
Transfer Principle. Every real statement that holds for one or more particular real
functions holds for the hyperreal natural extensions of these functions.
In short, the Extension Principle gives the hyperreal numbers and the Transfer
Principle enables carrying out computation on them. The Extension Principle says
that there does exist an infinitesimal number, for example ε. Other examples of
hyperreal numbers, created using ε, are: ε^3, 100ε^2 + 51ε, ε/300.
For a, b, r, s ∈ R+ and r < s, we have aε^r < bε^s, regardless of the relationship
between a and b.
If aε^r and bε^s are used, for example, to denote two preference weights, then aε^r
is “infinitely better” than bε^s even though a might be much bigger than b, i.e. co-
efficients a and b are insignificant when the powers of ε are different. On the other
hand, when comparing two preferential weights of the same power, as for exam-
ple aε^r and bε^r, the magnitudes of coefficients a and b become important. Namely,
aε^r ≤ bε^r (aε^r > bε^r) iff a ≤ b (a > b).
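To make the ordering above concrete, the following is a minimal sketch (not from the paper; the class and names are ours) that stores a single-term weight aε^r as a coefficient/power pair and compares two weights by the rules just stated: a lower power of ε dominates regardless of the coefficients, and coefficients decide only at equal powers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EpsTerm:
    """A single hyperreal term a * eps**r, with a > 0; r = 0 means an ordinary real weight."""
    coeff: float
    power: int

    def __gt__(self, other: "EpsTerm") -> bool:
        # A lower power of eps is "infinitely larger", so it wins regardless of coefficients.
        if self.power != other.power:
            return self.power < other.power
        # Equal powers of eps: the coefficients decide.
        return self.coeff > other.coeff

print(EpsTerm(1, 1) > EpsTerm(1000, 2))  # True: eps dominates 1000*eps^2
print(EpsTerm(2, 1) > EpsTerm(3, 1))     # False: same power, so 2 < 3 decides
```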
3 Keyword Preferences
We propose a framework where the user can preferentially annotate the keywords
by hyperreal numbers.
Using hyperreal annotations is essential for reasoning in terms of “infinitely more
important,” which is crucially needed in a scenario with numerous documents. This
is because preference specification using only real numbers suffers from the possi-
bility of producing senseless results as those preferences can get easily absorbed by
other measures used by search engines. For instance, continuing the example given
in the Introduction,
suppose that the user, dismayed of the poor result from Google, containing only
three low quality pages, changes the query into1
1 This second query style corresponds more closely than the first to what is known in the folklore
as the popular “free text query:” a query in which the terms of the query are typed freeform into
the search interface (cf. [5, 8]).
Definition 1. An annotated free text query is simply a set of keywords (terms) with
preference weights which are polynomials of ε .
For all our practical purposes it suffices to consider only polynomials with coef-
ficients in R+. For example, 3 + 2ε + 4ε^2.
By making this restriction we are able to perform symbolic (algorithmic) com-
putations on expressions using ε . All such expressions translate into operations on
polynomials with real coefficients for which efficient algorithms are known (we will
namely need to perform polynomial additions, multiplications and divisions2 ).
Let us illustrate our annotated queries by continuing the above example. The user
can now give
2 The division is performed by first factoring the highest power of ε . For example, (6 + 3ε +
to express that she wants to find documents on Music Information Retrieval and
she is interested in the Google technology for retrieving and ranking music. How-
ever, by leaving intact the music-information-retrieval and annotating google-search
by ε and google-ranking by ε^2, the user makes her intention explicit that a docu-
ment on music-information-retrieval is infinitely more important than any docu-
ment on simply google-search or google-ranking. Furthermore, in accord with the
above user expression, documents on music-information-retrieval and/or google-
search are infinitely more important than documents on simply google-ranking. Of
course, among documents on Music Information Retrieval, those which are relevant
to Google search and Google ranking are more important.
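Purely as an illustration of this example (the weights mirror the annotations described above, but the dictionary layout and scoring rule are our own sketch, not the paper's ranking method), the annotated query can be kept as a map from keywords to polynomials in ε, stored as coefficients indexed by the power of ε; comparing accumulated scores lexicographically from the lowest power upward yields the intended ordering.

```python
# Preference weights are polynomials in eps, stored as {power of eps: coefficient}.
query = {
    "music-information-retrieval": {0: 1.0},  # left intact: primary importance
    "google-search":               {1: 1.0},  # annotated by eps
    "google-ranking":              {2: 1.0},  # annotated by eps^2
}

def score(doc_terms, max_power=3):
    """Sum the preference weights of the query keywords contained in a document."""
    total = [0.0] * max_power                 # index = power of eps
    for term in doc_terms:
        for power, coeff in query.get(term, {}).items():
            total[power] += coeff
    return total                              # lists compare lexicographically

doc_a = {"music-information-retrieval"}       # score [1, 0, 0]
doc_b = {"google-search", "google-ranking"}   # score [0, 1, 1]
print(score(doc_a) > score(doc_b))            # True: doc_a is "infinitely" preferred
```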
We note that our framework also allows the user to specify “soft” preference
levels. For example, suppose that the user changes her mind and prefers to have
both google-search and google-ranking in the same “hard” preference level as deter-
mined by the power of infinitesimal ε . However, she still prefers, say “twice more,”
google-search over google-ranking. In this case, the user gives
In this section, we consider the problem of weighting the structural elements of doc-
uments in a corpus with the purpose of influencing an information retrieval system
to take into account the importance of different elements during the process of doc-
ument ranking. Due to the wide spread of XML as a standard for representing docu-
ments, we consider in this paper XML documents which conform to a given schema
(DTD). In the same spirit as in the previous section, we will use hyperreal weights
to denote the importance of different elements in the schema and documents.
While the idea of weighting the document elements is old and by now part of
the folklore (cf. [8]), to the best of our knowledge, there is no work that system-
atically studies the problem of weighting XML elements. The problem becomes
challenging when elements can possibly be nested inside other elements which can
be weighted as well, and one wants to achieve a consistent weight normalization
reflecting the true preferences of a system administrator. Another challenging prob-
lem, as we explain in Subsection 4.4, is determining the right mapping of weights
from the elements of a DTD schema into the elements of XML documents.
In our framework, the system administrator is enabled to set the importance of vari-
ous XML elements/sections in a DTD schema. For example, she can specify that the
keywords element of documents in an XML corpus, with “research activities” as the
main theme, is more important than a section, say, on related work. Intuitively,
an occurrence of a search term in the keywords section is far more important than
an occurrence in the related work section, as the occurrence in the latter might be
completely incidental or only loosely related to the main thrust of the document.
Thus, in our framework, we allow the annotation of XML elements by weights
being, as in the previous section, polynomials of a (fixed) infinitesimal ε .
4.2 DTDs
Let Σ be the (finite) tag alphabet of a given XML collection, i.e. each tag is an
element of Σ . Then, a DTD D is a pair (d, r) where d is a function mapping Σ -
symbols to regular expressions on Σ and r is the root symbol (cf. [2]).
A valid XML document complying with a DTD D = (d, r) can be viewed as a tree,
whose root is labeled by r and every node labeled, say by a, has a sequence of
children whose label concatenation, say bc . . . x, is in L(d(a)).
A simple example of a DTD defining the structure of some XML research docu-
ments is the following:
where ‘+’ implies “one or more,” ‘∗’ implies “zero or more” and ‘?’ implies “zero
or one” occurrences of an element.
In essence, a DTD D is an extended context-free grammar, and a valid XML
document with respect to D is a parse tree for D.
To illustrate annotated DTDs, let us suppose that the system administrator wants
to express that, in the body element, the introduction is twice as important as
a section, and both are infinitely more important than related-work and references,
with the latter being infinitely less important than the former. To express this, we
would annotate the rule for body as follows:
Further annotations, expressing for example that the preamble element is three
times more important than the body element, and in the preamble, the keywords
element is 5 times more important than title and 10 times more important than the
rest, would lead to having the following annotated DTD:
Since an annotated element can be nested inside other elements, which can be
annotated as well, the natural question that now arises is: how do we compute the
actual weight of an element in a DTD? One might be tempted to think that the actual
weight of an element should be obtained by multiplying its (annotation) weight by the
weights of all its ancestors. However, by doing that we could get strange results,
such as importance weights that possibly increase as we go deeper down the
XML element hierarchy.
What we want here is “an element to never be more important than its parent.”
For this, we propose normalizing the importance weights assigned to DTD elements.
There are two ways for doing this. Either divide the weights of a rule by the sum of
the rule’s weights, or divide them by the maximum weight of the rule. In the first
way, the weight of the parent will be divided among the children. On the other hand,
in the second way, the weight of the most important child will be equal to the weight
of the parent.
The drawback of the first approach is that the more children there are, the smaller
their weights become. Thus, we opt for the second way of weight normalization as it better
corresponds to the intuition that nesting in XML documents is for adding structure
to text rather than hierarchically dividing the importance of elements.
For example, in the above DTD, for the children of preamble, we normalize
dividing by the greatest weight of the rule, which is 10. Normalizing in this way the
weights of all the rules, we get
paper → (preamble : 1) (body : 1/3)
preamble → (title : 1/5) (author : 1/10)+ (abstract : 1/10) (keywords : 1)
body → (introduction : 1) (section : 1/2)∗ (related-work : ε/2)? (references : ε^2/2).
After such normalization, for determining the actual weight of an element, we mul-
tiply its DTD weight by the weights of all its ancestors. For example, the weight of
a section element is (1/3) · (1/2).
As mentioned earlier, under this weighting scheme, the most important child of
a parent has the same importance as the parent itself. Thus, for instance, element
introduction has the same importance (1/3) as its parent body. Note that the weight
normalization can of course be automatically done by the system, while we annotate
using numbers that are more comfortable to write.
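The normalization by the maximum weight of each rule, and the multiplication along the path to the root, can be sketched as follows. The rule table re-encodes the annotations of the running example (the un-normalized values are inferred from the text and the normalized rules above), occurrence indicators such as +, *, ? are ignored, single-term ε-weights are stored as (coefficient, power) pairs, and all names are ours.

```python
from fractions import Fraction

# A weight a * eps**r is stored as (a, r); r = 0 is an ordinary real weight.
def normalize_rule(rule):
    """Divide every child weight of a rule by the rule's maximum weight."""
    a_max, r_max = max(rule.values(), key=lambda w: (-w[1], w[0]))  # lowest eps-power, then largest coefficient
    return {child: (a / a_max, r - r_max) for child, (a, r) in rule.items()}

# Annotated DTD of the running example (our re-encoding).
dtd = {
    "paper":    {"preamble": (Fraction(3), 0), "body": (Fraction(1), 0)},
    "preamble": {"title": (Fraction(2), 0), "author": (Fraction(1), 0),
                 "abstract": (Fraction(1), 0), "keywords": (Fraction(10), 0)},
    "body":     {"introduction": (Fraction(2), 0), "section": (Fraction(1), 0),
                 "related-work": (Fraction(1), 1), "references": (Fraction(1), 2)},
}
normalized = {parent: normalize_rule(rule) for parent, rule in dtd.items()}

def element_weight(path):
    """Multiply the normalized weights along a root-to-element path."""
    a, r = Fraction(1), 0
    for parent, child in zip(path, path[1:]):
        ca, cr = normalized[parent][child]
        a, r = a * ca, r + cr
    return a, r

print(normalized["body"])                            # introduction: 1, section: 1/2, related-work: eps/2, references: eps^2/2
print(element_weight(["paper", "body", "section"]))  # (Fraction(1, 6), 0), i.e. (1/3) * (1/2)
```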
The same element might occur differently nested in different valid XML documents.
For example, if we had an additional rule, section → (title : 1) (text : 1/2), in our
annotated DTD, then, given a valid XML document, the weight of a title element
depends on the particular nesting of this element. Namely, if the nesting is
then the normalized weight of the title element is 1/5. On the other hand, if the
nesting is
then the normalized weight of the title element is (1/3) · (1/2) · 1 = 1/6.
In general, in order to derive the correct weight of an element in an XML docu-
ment, we need to first build the element tree of the document. This will be a parse
tree for the context-free grammar corresponding to the DTD. For each node a of this
tree with children bc . . . x, there is a unique rule a → r in the DTD such that word
bc . . . x is in L(r).
Naturally, we want to assign weights to a’s children b, c, . . . , x based on the
weights in annotated expression r. Thus, the question becomes how to map the
weights assigned to the symbols of r to the symbols of word bc . . . x.
Since b, c, . . . , x occur in r, this might seem a straightforward matter. However,
there is a subtlety here arising from the possibility of ambiguity in the regular expres-
sion. For example, suppose the (annotated) expression r is (b : 1 + c : 1)∗ (b : 2)(b :
3)∗, and element a has three children labeled by b. Surely, bbb is in L(r), but what
weight should we assign to each of the b’s? There are three different ways of assigning
weights to these b’s: (b : 1)(b : 1)(b : 2), (b : 1)(b : 2)(b : 3), and (b : 2)(b : 3)(b : 3).
However, according to the SGML standard (cf. [3]), the only allowed regular
expressions in the DTD rules are those for which we can uniquely determine the
correspondence between the symbols of an input word and the symbols of the regu-
lar expression.
These expressions are called “1-unambiguous” in [3].
For such an expression r, given a word bc . . . x in L(r), there is a unique mapping
of word symbols b, c, . . . , x to expression symbols. Thus, when r is annotated with
symbol weights, we can uniquely determine the weights for each of the b, c, . . . , x
word symbols.
Based on all the above, we can state the following theorem.
Theorem 1. If T is a valid XML tree with respect to an annotated DTD D, then
based on the weight annotations of D, there is a unique weight assignment to each
node of T .
Now, given an XML document, since there is a unique path from the root of an
XML document to a particular element, we have that
Corollary 1. Each element of a valid XML document is assigned a unique weight.
The unique weight of an element is obtained by multiplying its local node weight
with the weights of the ancestor nodes on the unique path connecting the element
with the document root.3
tf_ij = f_ij / max{f_1j, . . . , f_mj},
where the maximum is in fact computed over the terms that appear in document d_j.
Considering now XML documents whose elements are weighted based on an-
notated DTDs, we have that not all occurrences of a term “are created equal.” For
instance, continuing the example in Section 4, an occurrence of a term ti in the key-
words element of a document is 5 times more important than an occurrence (of ti )
in the title, and infinitely more important than an occurrence in the related-work
element.
Hence, we refine the TF measure to take the importance of XML elements into
account. When an XML document conforms to an annotated DTD, each element e_k
will be accordingly weighted, say by w_k.
Suppose that term t_i occurs f_ijk times in element e_k of document d_j. Now, we
define the normalized term frequency of t_i in d_j as
tf_ij = (∑_k w_k f_ijk) / max{∑_k w_k f_1jk, . . . , ∑_k w_k f_mjk}.
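The element-weighted term frequency above can be sketched as follows; element weights are taken as plain real numbers for simplicity (an ε-aware version would carry coefficient vectors as in the earlier sketches), the function names are ours, and the idf helper only assumes the usual idf = log(n / n_i) definition, since the paper's own IDF formula is not reproduced in this excerpt.

```python
import math
from collections import defaultdict

def weighted_tf(doc, element_weights):
    """doc: {element: {term: raw frequency}}. Returns the normalized TF of each term,
    i.e. its weighted count divided by the maximum weighted count over all terms in the document."""
    totals = defaultdict(float)
    for element, freqs in doc.items():
        w = element_weights.get(element, 0.0)
        for term, f in freqs.items():
            totals[term] += w * f
    m = max(totals.values())
    return {term: v / m for term, v in totals.items()}

def idf(term, collection):
    """Assumed standard inverse document frequency: log(n / n_term)."""
    n = len(collection)
    n_term = sum(1 for doc in collection if any(term in freqs for freqs in doc.values()))
    return math.log(n / n_term) if n_term else 0.0

# Toy document with the weights of the running example (eps-weighted elements treated as ~0 here).
doc = {"keywords": {"xml": 1}, "title": {"ranking": 1}, "related-work": {"xml": 3}}
weights = {"keywords": 1.0, "title": 0.2, "related-work": 0.0}
print(weighted_tf(doc, weights))  # {'xml': 1.0, 'ranking': 0.2}: 'xml' dominates via the keywords element
```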
The other popular measure used in Information Retrieval is the inverse document
frequency (IDF) which is used jointly with the TF measure. IDF is based on the
fraction of documents which contain a query term. The intuition behind IDF is that
a query term that occurs in numerous documents is not a good discriminator, or does
not bear much information, and thus should be given a smaller weight than
other terms occurring in few documents. The weighting scheme known as TF*IDF,
which multiplies the TF measure by the IDF measure, has proved to be a powerful
heuristic for document ranking, making it the most popular weighting scheme in
Information Retrieval (cf. [10, 5, 8]).
Formally, suppose that term t_i occurs n_i times in a collection of n elements. Then,
the inverse document frequency of t_i is defined to be
6 Experiments
same importance, and thus, only search keyword preferences are in fact relevant
for this corpus in influencing the ranking process.
Corpus II An INEX (INitiative for the Evaluation of XML retrieval) (cf. [13]) cor-
pus. INEX is a collaborative initiative that provides reference collections (cor-
pora). For evaluating our method, we have chosen a collection named “topic-
collection” with numerous XML documents of moderate size. The topics of doc-
uments vary from climate change to space exploration. We preferentially anno-
tated the DTD of this collection and gave many preferentially annotated search
queries.
Due to space constraints, we do not show our results here, but we point the reader
to the full version of this paper [4].
OntoLife: an Ontology for Semantically managing Personal Information
Abstract Personal knowledge management has been studied from various angles,
one of which is the Semantic Web. Ontologies, the primary knowledge representa-
tion tool for the Semantic Web, can play a significant role in semantically manag-
ing personal knowledge. This paper focuses on addressing the issue
of effective personal knowledge management by proposing an ontology for mod-
elling the domain of biographical events. The proposed ontology also undergoes a
thorough evaluation, based on specific criteria presented in the literature.
1 Introduction
The latest technological developments and the WWW expose users to a great vol-
ume of information. A new perspective in Knowledge Management (KM) is essen-
tial that will filter out irrelevant information and increase knowledge quality, by
utilizing the underlying semantic relationships. This requirement is also present in
Personal Knowledge Management (PKM).
The first step towards PKM is to organize personal information. Various tools
and applications are used (e.g. task managers, spreadsheet applications), but often
comprise isolated solutions, revealing the need for a unified way of managing per-
sonal information, so that it becomes knowledge. Ontologies can assist towards
this direction. They are a key factor towards realizing the Semantic Web vision
[1], which promises to structure and semantically annotate raw information, to al-
low its interoperability, reuse and effective search by non-human agents.
This paper focuses on the issue of semantically managing the great volume of
personal information by the use of an appropriately defined ontology. More spe-
cifically, an ontology called OntoLife is proposed for describing a person’s bio-
graphical events and personal information. The ontology underwent a thorough
evaluation that indicates its suitability for the designated purpose.
The rest of the paper is organized as follows: Section 2 describes related work
paradigms, while the next section focuses on the presentation of the proposed
ontology, accompanied by its evaluation. The paper concludes with final remarks
and directions for future work.
Kargioti, E., Kontopoulos, E. and Bassiliades, N., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 127–133.
Ontologies are the primary knowledge representation tool in the Semantic Web
[1]. An ontology is a structured representational formalism of a domain, including
a set of domain concepts and the relationships between them. The concepts de-
scribe classes of objects, while the relationships describe hierarchical dependen-
cies among the concepts.
Regarding the domain of “life”, the authors are not aware of an existing appro-
priate ontology. The FOAF1 ontology is relevant, yet not wide enough for our pur-
poses. ResumeRDF2 is another ontology for representing Curriculum Vitae infor-
mation about work and academic experience, skills, etc. Finally, another paradigm
is HR-XML3, a library of XML schemas that a variety of business processes re-
lated to human resource management. Nevertheless, none of the above (or other)
ontologies and schemas can cover so broadly all the aspects of a person’s bio-
graphical events as OntoLife.
3 Proposed Ontology
The scope of the proposed ontology is to model life by describing the person’s
characteristics, relationships and experiences. Since the domain is broad, an at-
tempt to model it in detail would produce a huge and cumbersome ontology.
Thus, the domain is modelled in a non-exhaustive yet sufficient way, adopting the
definition of generic entities that can easily be extended.
3.1 Description
The backbone of OntoLife is the Person entity. The entire ontology is built upon
and around a Person, by a set of properties that relate the Person with the rest of
the entities, as shown in Fig. 1. At the same time, many auxiliary entities and
properties are defined to further describe the domain.
When designing the ontology, the idea of reusing commonly accepted ontolo-
gies was always considered. Thus, the Person entity of the FOAF ontology
4 Proposed by the Organization for the Advancement of Structured Information Standards
5 Proposed by the UMBC eBiquity Research Group of the University of Maryland, Baltimore
6 http://vocab.org/relationship/
7 http://users.auth.gr/~elkar/thesis/FamilyTree.owl
tions, scope and possible periodicity are described. Direct subclasses are Pur-
chaseEvent, MedicalExaminationEvent and FamilyEvent. Further properties are
defined with the subclasses of the Period and Event entities as their domain, enabling a more
precise annotation of the related content. Fig. 2 describes these subclasses
and properties in detail.
3.2 Evaluation
The increasing number of ontologies on the web has led researchers to define meth-
ods and measures for ontology evaluation. A popular method, also used in this work, is the
criteria-based evaluation [3]. Consistency, completeness, expandability, minimal
ontological commitment, etc. are some of the criteria listed in the literature.
The adopted evaluation methodology includes the definition of specific re-
quirements that the ontology needs to satisfy and the mapping of each requirement
to a criterion [4]. Suitable measures are then selected to quantitatively assess each
requirement. The main requirement is that annotating content based on the ontology
should be easy and intuitive. Towards this, more specific requirements need to be met.
1. The terms used for the class names need to be close to real life terminology.
2. The classes should have a balanced number of subclasses: enough to
facilitate effective annotation, but not so many as to confuse the user.
3. The ontology should be rich, concerning attributes and relationships.
4. Cycles and other errors in the ontology structure should be avoided.
The Semantic Quality [5] criterion is mapped to the first requirement and the
following measures: Interpretability, which is the percentage of class names that
have a definition listed in WordNet8, and Concept Paths [6], which is the percent-
age of class hierarchies that are depicted in WordNet through term hyponyms. Ex-
pandability/Coverage [7] are the criteria mapped to the second requirement. Re-
lated measures are: class tree depth, breadth and branching factor. For a broad
ontology like OntoLife, a shallower tree with a low branching factor is preferred.
The third requirement is mapped to the ontology richness criterion [8], assessed by
the attribute and relationship richness. The last requirement is mapped to the
Minimal Ontological commitment criterion [9] and ontology validators are used to
exclude circularity and other types of errors.
To assess semantic quality, certain assumptions are made. Class names that
consist of more than one word written in CamelCase or separated by underscore
were considered listed in WordNet, if all included words were listed (e.g. Certifi-
cate_Diploma) or a phrase with these words made sense (e.g. ForeignLanguage).
Also, class names taken from the HR-XML Candidate specification were consid-
ered interpretable. Finally, a concept path (the path from a parent class to a leaf
subclass in a class tree) may be fully or partially depicted in WordNet, if all or
some subclasses are listed as parent class hyponyms. To measure Interpretability
and Concept Paths, a weighted average was calculated (Table 1). To assess ontol-
ogy expandability/coverage and attribute and relationship richness, the metrics of-
fered by the SWOOP9 ontology editor were used (Table 2). Finally, to assess the
last criterion the Vowlidator10 and WonderWeb online validator11 were used. The
ontology was identified as OWL Full compatible, while no errors were indicated.
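A rough sketch of how the interpretability check could be automated with NLTK's WordNet interface is shown below; the CamelCase/underscore splitting mirrors the assumption described above, but the authors' exact procedure (and the weighting used for their average) may differ, so this is illustrative only.

```python
import re
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download('wordnet')

def words_of(class_name):
    """Split a class name on underscores and CamelCase boundaries, e.g. 'ForeignLanguage' -> ['Foreign', 'Language']."""
    words = []
    for part in class_name.split("_"):
        words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part)
    return words

def interpretable(class_name):
    """A class name counts as interpretable if every component word has a WordNet entry."""
    words = words_of(class_name)
    return bool(words) and all(wn.synsets(w.lower()) for w in words)

classes = ["Person", "ForeignLanguage", "Certificate_Diploma", "PurchaseEvent"]
ratio = sum(interpretable(c) for c in classes) / len(classes)
print(f"Interpretability: {ratio:.0%}")
```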
Table 2. Calculation of measures for the ontology’s expandability/coverage criteria and the at-
tributes and relationship richness criterion
Tree Depth
Max. Depth of Class Tree: 4, Min. Depth of Class Tree: 1, Avg. Depth of Class Tree: 1.9
Tree Breadth
Max. Breadth of Class Tree: 33, Min. Breadth of Class Tree: 1, Avg. Breadth of Class Tree: 25
Tree Branching factor
Max. Branching Factor of Class Tree: 47, Min. Branching Factor of Class Tree: 1
Avg. Branching Factor of Class Tree: 6.6
Attribute richness
No. Attributes in all classes / No. classes = 85%
Relationships richness
No. Relations / (No. Subclasses+No. Relations) = 68%
References
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American,
284(5):34-43, 2001.
2. Staab, S., Studer, R.: Handbook on Ontologies. International Handbooks on Informa-
tion Systems, Springer Verlag (2004)
3. Gomez-Perez, A.: Evaluation of Ontologies. International Journal of Intelligent Sys-
tems, 16(3):391-409, 2001.
4. Yu, J., Thom, A.A., Tam, A.: Ontology Evaluation Using Wikipedia Categories for
Browsing. In Proceedings 16th Conference on Information and Knowledge Manage-
ment (CIKM), pp. 223-232 (2007)
5. Burton-Jones, et al: A Semiotic Metrics Suite for Assessing the Quality of Ontologies.
Data Knowledge Engineering, 55(1):84-102, 2005.
6. Sleeman, D., Reul, Q. H.: CleanONTO: Evaluating Taxonomic Relationships in On-
tologies. In Proceedings 4th International Workshop on Evaluation of Ontologies for
the Web (EON), Edinburgh, Scotland (2006)
7. Gangemi, A. et al: A Theoretical Framework for Ontology Evaluation and Validation.
In Proceedings 2nd Italian Semantic Web Workshop (SWAP), Trento, Italy (2005)
8. Tartir, S. et al: OntoQA: Metric-Based Ontology Quality Analysis. IEEE ICDM Work-
shop on Knowledge Acquisition from Distributed, Autonomous, Semantically Hetero-
geneous Data and Knowledge Sources, Houston, TX (2005)
9. Yu, J., Thom, A.A., Tam, A.: Evaluating Ontology Criteria for Requirements in a Geo-
graphic Travel Domain. In Proceedings International Conference on Ontologies, Data-
bases and Applications of Semantics (2005)
10. Leuf, B., Cunningham, W.: The Wiki Way: Collaboration and Sharing on the Internet.
Addison Wesley, Reading, Massachusetts (2001)
AIR_POLLUTION_Onto: an Ontology for Air
Pollution Analysis and Control
Mihaela M. Oprea
Abstract The paper describes an ontology for air pollution analysis and control,
AIR_POLLUTION_Onto, and presents its use in two case studies, an expert sys-
tem, and a multiagent system, both dedicated to monitoring and control of air pol-
lution in urban regions.
1 Introduction
The last decade has registered a strong challenge concerning the improvement of envi-
ronmental quality, under the international research framework of durable and sus-
tainable development of the environment. The main concern of this challenge is to
ensure a healthy environment (air, water and soil) that allows the protection of eco-
systems and human health. One of the key aspects of this challenge is air pol-
lution control in urban regions with industrial activity [9]. In this context, more ef-
ficient tools have to be developed to solve the current environmental problems.
Artificial intelligence provides several techniques that can efficiently solve such
problems, which have a high degree of uncertainty (see e.g. [1] and [11]). The
knowledge-based approach and the multi-agent systems (MAS) approach ([17])
offer some of the best solutions to environmental problems, as they reduce their
complexity by structuring the domain knowledge from different sources in knowl-
edge bases [10]. We have used the expert system approach [14], as the main
sources of knowledge are human experts as well as the heuristic
rules generated through machine learning techniques. The expert system
DIAGNOZA_MEDIU was developed for the air pollution state diagnosis and con-
trol in urban regions with industrial activity. The MAS solution was recently
adopted by our research group in a postdoctoral research project running at our
university. These types of approaches need to use an ontology specific to the ex-
pertise domain. Thus, we have developed an ontology dedicated to air pollution
analysis and control, AIR_POLLUTION_Onto.
Oprea, M.M., 2009, in IFIP International Federation for Information Processing, Volume 296;
Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 135–143.
2 Air Pollution Analysis and Control
The main air pollutants are carbon dioxide (CO2), carbon monoxide (CO), nitro-
gen dioxide (NO2) and nitrogen oxides (NOx), suspended particulates (particulate
matters: respirable PM10, and fine PM2.5), sulfur dioxide (SO2), ozone (O3), lead
(Pb), volatile organic compounds (VOC) etc. The concentrations of the air pollut-
ants are measured at specific sites and compared to the standard values, according
to national and international regulations. The air pollutants have different
dispersion models, and several mathematical models are used for the description
of the relationships between environmental protection and meteorological factors
(see e.g. an analysis of NO2 and PM concentrations contribution to roadside air
pollution [7]). Moreover, there are many unpredictable factors that may influ-
ence the degree of air pollution, and it is quite difficult to establish with certainty
the causes of an increase or decrease in an air pollutant's concentra-
tion. The inclusion of most of these factors with their associated uncertainty degrees
would increase the complexity of the mathematical models too much, making
them inefficient for solving real-time problems. The solution of a knowledge-based
approach is an alternative to the mathematical models, as it allows the integration
of multiple sources of knowledge in a knowledge base used by an inference en-
gine that can also deal with uncertainty [12].
Prevention is an important step towards air pollution control, and includes different
measures specific to each type of air pollutant and source of pollution. Some air
pollution control strategies include emission abatement equipment (e.g. wet and
dry scrubbers, cyclones, bag filters), a policy of air pollution dispersion and dilu-
tion (e.g. a chimney of adequate height so that the pollution returned to ground
level poses no risk to health), changing the process technology (e.g. fuel change,
combined heat and power plant), changing the operating patterns (e.g. altering the time
at which a process causes peak emissions), and relocation (e.g. changing the location of the
process to have less impact on the urban and rural region).
[Figure: fragment of the AIR_POLLUTION_Onto class hierarchy — the ontology root has POLLUTANT with subclass (AKO) AIR POLLUTANT, whose subclasses include SO2, NOx, CO2, PM and VOC, with PM further specialized (AKO) into PM2.5 and PM10]
4 Case Studies
4.1 DIAGNOZA_MEDIU
[Figure: architecture of the DIAGNOZA_MEDIU expert system — user interface, inference engine, explanation module, knowledge base with rules base, knowledge acquisition module, and knowledge sources (forecasting knowledge, human experts, national & international air quality standards, other sources)]
where DT is the duration (in days) of air temperatures greater than 37°C, T is the minimum value of
the maximum temperatures measured in the last seven days, MF represents a global parameter that re-
fers to the evolution of the meteorological factors (wind, rainfalls, etc) in the next period of time (i.e.
next 2 days minimum), TPRED is the symbolic value of the predicted temperature, IP is the predicted
value for an air pollutant indicator (e.g. the concentration level), and MAC is the maximum admissible
value for an air pollutant indicator.
The three rules given above are applied to specific sites; thus the chemical
plants and other air pollution sources are known, and the warning as well as the
prevention and counter measures specified in the generic rule DPM_7 are directly
related to them. The instances specific to the urban region where
DIAGNOZA_MEDIU is applied are included in the ontology.
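Since the rule bodies themselves are not reproduced in this excerpt, the following is only a hypothetical illustration of how a heuristic check over the parameters defined above (DT, T, MF, TPRED, IP, MAC) might be encoded; the actual thresholds, symbolic values and conclusions of the DIAGNOZA_MEDIU rules are assumptions here, not the system's real rules.

```python
def pollution_check(DT, T, MF, TPRED, IP, MAC):
    """Hypothetical heuristic in the spirit of the DPM_* rules (all thresholds are illustrative).

    DT: days with air temperatures above 37 C; T: minimum of the maximum temperatures of the last 7 days;
    MF: symbolic evolution of the meteorological factors; TPRED: symbolic predicted temperature;
    IP: predicted value of an air pollutant indicator; MAC: maximum admissible value of that indicator.
    """
    exceeded = IP > MAC                                    # predicted concentration above the admissible limit
    heat_episode = DT >= 3 and T > 30 and TPRED == "high"  # assumed heat-wave condition
    unfavourable = MF == "stagnant"                        # assumed: no wind/rain to disperse pollutants
    if exceeded and (heat_episode or unfavourable):
        return "issue warning and apply prevention/counter measures"
    return "no action"

print(pollution_check(DT=4, T=32, MF="stagnant", TPRED="high", IP=1.3, MAC=1.0))
```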
4.2 MAS_AirPollution
[Figure: the MAS_AirPollution multiagent system — a set of Agent-IMC agents (numbered 2 to 6) and a central Agent-SIA]
In case one of the agents discovers during a conversation that some concepts from
the current message are unknown, it will generate a particular feedback message
to establish a mapping of its private ontology with that of the sender agent. Usu-
ally, only a part of the ontology is mapped.
References
1. Buisson, L., Martin-Clonaire, R., Vieu, L., Wybo, J.-L.: Artificial Intelligence and Envi-
ronmental Protection: A Survey of Selected Applications in France, Information Proc-
essing 92, vol. II, Elsevier (1992) 635-644.
2. Ehrig, M., Staab, S.: Efficiency of Ontology Mapping Approaches, University of
Karlsruhe, research report (2004).
3. Godo, L., Lopez de Mantaras, R., Sierra, C., Verdaquer, A.: MILORD: The architecture
and the management of linguistically expressed uncertainty, International Journal of In-
telligent Systems, 4 (1989) 471-501.
4. Gruber, T.: Towards principles for the design of ontologies used for knowledge sharing,
International Journal of Human-Computer Studies, 43(5/6) (1995) 907-928.
5. Guarino, N., Giaretta, P.: Ontologies and Knowledge Bases: Towards a Terminological
Clarification, in Towards Very Large Knowledge Bases: Knowledge Building and
Knowledge Sharing, N. Mars (Ed), IOS Press (1995) 25-32.
6. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art, The Knowledge
Engineering Review, 18(1) (2003) 1-31.
7. Lam, G.C.K., Leung, D.Y.C., Niewiadomski, M., Pang, S.W., Lee, A.W.F., Louie,
P.K.K.: Street-level concentrations of nitrogen dioxide and suspended particulate matter
in Hong Kong, Atmospheric Environment, 33(1) (1999) 1-11.
8. Li, J.: LOM: A Lexicon-based Ontology Mapping Tool, Teknowledge Corporation, re-
search report (2004).
9. Moussiopoulos, N. (Ed): Air Quality in Cities, Springer, Berlin (2003).
10. Oprea, M.: A case study of knowledge modelling in an air pollution control decision
support system, AI Communications, IOS Press, 18(4) (2005) 293-303.
11. Oprea, M., Sànchez-Marrè, M. (Eds): Proceedings 16th ECAI 2004 Workshop Binding
Environmental Sciences and Artificial Intelligence (BESAI-4) (2004).
12. Oprea, M.: Modelling an Environmental Protection System as a Knowledge-Based Sys-
tem, International Journal of Modelling and Simulation, ACTA Press, 24(1) (2004) 37-
41.
13. Oprea, M.: Some Ecological Phenomena Forecasting by using an Artificial Neural Net-
work, Proceedings 16th IASTED International Conference on Applied Informatics AI98,
Garmisch-Partenkirchen, ACTA Press (1998) 30-32.
14. Page, B.: An Analysis of Environmental Expert Systems Applications, Environmental
Software, 5(4) (1990) 177-198.
15. Protégé-2000: http://protege.stanford.edu
16. Uschold, M., King, M.: Towards a Methodology for Building Ontologies, research re-
port AIAI-TR-183, University of Edinburgh (1995).
17. Wooldridge, M.: Introduction to Multiagent Systems, John Wiley and Sons, New York,
(2002).
Experimental Evaluation of Multi-Agent
Ontology Mapping Framework
For the Semantic Web vision to become reality, several difficulties
have to be resolved, such as ontology mapping, which makes it possible to interpret and
align heterogeneous and distributed ontologies in this environment. For ontology
mapping in the context of Question Answering over heterogeneous sources we pro-
pose a multi-agent architecture [2], because as a particular domain becomes larger
and more complex, open and distributed, a set of cooperating agents is necessary
in order to address the ontology mapping task effectively. In real scenarios, ontology
mapping can be carried out on domains with a large number of classes and properties.
Without the multi-agent architecture the response time of the system can increase
exponentially when the number of concepts to map increases.
Miklos Nagy
Knowledge Media Institute, The Open University, UK, e-mail: [email protected]
Maria Vargas-Vera
Computing Department, The Open University, UK, e-mail: [email protected]
Nagy, M. and Vargas-Vera, M., 2009, in IFIP International Federation for Information Processing,
Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I.,
Bramer, M.; (Boston: Springer), pp. 145–150.
An overview of our system is depicted in Fig. 1. The two real-world ontologies1,2
describe BibTeX publications from the University of Maryland, Baltimore County
(UMBC) and from the Massachusetts Institute of Technology (MIT). The AQUA
[7] system and the answer composition component are described just to provide the
context of our work (our overall framework) but these are not our major target in
this paper. The user poses a natural language query to the AQUA system, which con-
verts it into FOL (First Order Logic) terms. The main components of the system and
their functions are as follows. First, the broker agent receives the FOL term, decomposes it
(in case more than one concept appears in the query) and distributes the sub-queries to the
mapping agents. Mapping agents retrieve sub-query class and property hypernyms
from WordNet, and retrieve ontology fragments from the external ontologies, which
are candidate mappings for the received sub-queries. Mapping agents use WordNet
as background knowledge in order to enhance their beliefs on the possible mean-
ing of the concepts or properties in the particular context. At this point mapping
agents build up coherent beliefs by combining all possible beliefs over the similar-
ities of the sub-queries and ontology fragments. Mapping agents utilize both syn-
tactic and semantic similarity algorithms to build their beliefs over the correctness of
the mapping. After this step the broker agent passes to the an-
swer composition component, for each sub-query, the ontology fragment mapping
in which the belief function has the highest value. In the last step the answer com-
position component retrieves the concrete instances from the external ontologies or
data sources, which will be included in the answer, and creates an answer to the
user’s question.
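A highly simplified sketch of the broker/mapping-agent flow just described is given below; the data structures, the agents and the belief values are placeholders (the real agents combine syntactic and semantic similarities with WordNet background knowledge), so this only illustrates the control flow.

```python
def broker(fol_concepts, mapping_agents):
    """Distribute one sub-query per concept and keep, for each, the candidate
    ontology fragment whose combined belief is highest."""
    best = {}
    for concept in fol_concepts:
        candidates = []
        for agent in mapping_agents:
            candidates += agent(concept)          # each agent returns (fragment, belief) pairs
        if candidates:
            best[concept] = max(candidates, key=lambda c: c[1])
    return best

# Placeholder agents: real ones would consult WordNet and the external ontologies.
def syntactic_agent(concept):
    return [("umbc:" + concept.lower(), 0.7)]

def semantic_agent(concept):
    return [("mit:" + concept.lower(), 0.4)]

print(broker(["Publication", "Author"], [syntactic_agent, semantic_agent]))
```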
The organisation of this paper is as follows. In section 2 we analyse related sys-
tems which have participated in more than 3 OAEI tracks. In section 3 we present
our experimental evaluation on the benchmarks, anatomy and library tracks. Finally,
in section 4 we draw the conclusions of our evaluation.
1 http://ebiquity.umbc.edu/ontology/publication.owl
2 http://visus.mit.edu/bibtex/0.01/bibtex.owl
Several ontology mapping systems have been proposed to address the semantic data
integration problem of different domains independently. In this paper we consider
only those systems which have participated in the OAEI competitions and have taken
part in more than two tracks. There are other proposed systems as well; however,
as an experimental comparison cannot be achieved, we do not include them in the
scope of our analysis. Lily [8] is an ontology mapping system with different purposes,
ranging from generic ontology matching to mapping debugging. It uses different
syntactic and semantic similarity measures and combines them with experiential
weights. Further, it applies a similarity propagation matcher with a strong propagation
condition, and the matching algorithm utilises the results of literal matching to pro-
duce more alignments. In order to assess when to use similarity propagation, Lily
uses different strategies, which prevent the algorithm from producing more incor-
to facilitate the integration of heterogeneous systems, using their data source ontolo-
gies. It uses different matchers and generates similarity matrices between concepts,
properties, and individuals, including mappings from object properties to datatype
properties. It does not combine the similarities but uses the best values to create a pre-
alignment, which is then semantically validated. Mappings that pass the
semantic validation are added to the final alignment. ASMOV can use different
background knowledge, e.g. WordNet or the UMLS Metathesaurus (medical background
knowledge), for the assessment of the similarity measures. RiMOM [6] is an auto-
matic ontology mapping system, which models the ontology mapping problem as
making decisions over entities with minimal risk. It uses the Bayesian theory to
model decision making under uncertainty where observations are all entities in the
two ontologies. Further it implements different matching strategies where each de-
fined strategy is based on one kind of ontological information. RiMOM includes
different methods for choosing appropriate strategies (or strategy combination) ac-
cording to the available information in the ontologies. The strategy combination is
conducted by a linear-interpolation method. In addition to the different strategies
RiMOM uses similarity propagation process to refine the existing alignments and
to find new alignments that cannot be found using other strategies. RiMOM is the
only system other than DSSim in the OAEI contest that considers the uncertain na-
ture of the mapping process; however, it models uncertainty differently from DSSim.
RiMOM appeared for the first time in OAEI-2007, whilst DSSim appeared in
OAEI-2006.
3 Experimental Analysis
and as a response to this need the Ontology Alignment Evaluation Initiative3 has
been set up in 2004. The evaluation was measured with recall, precision and F-
measure, which are useful measures that have a fixed range and meaningful from
the mapping point of view. The experiments were carried out to assess the efficiency
of the mapping algorithms themselves. The experiments of the question answering
(AQUA) using our mappings algorithms are out of the scope of this paper. Our main
objective was to compare our system and algorithms to existing approaches on the
same basis and to allow drawing constructive conclusions.
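For reference, the measures used in the evaluation are the standard ones; the sketch below is generic (it is not the OAEI evaluation code) and treats an alignment as a set of correspondences compared against a reference alignment.

```python
def precision_recall_f(found, reference):
    """found, reference: sets of correspondences, e.g. (entity1, entity2) pairs."""
    tp = len(found & reference)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(reference) if reference else 0.0
    f_value = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_value

print(precision_recall_f({("a", "x"), ("b", "y")}, {("a", "x"), ("c", "z")}))  # (0.5, 0.5, 0.5)
```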
3.1 Benchmarks
The OAEI benchmark contains tests which were systematically generated, starting
from some reference ontology and discarding various pieces of information, in order to
evaluate how the algorithms behave when this information is lacking. The biblio-
graphic reference ontology (different classifications of publications) contained 33
named classes, 24 object properties, 40 data properties. Further each generated on-
tology was aligned with the reference ontology. The benchmark tests were created
and grouped by the following criteria. Group 1xx are simple tests such as compar-
ing the reference ontology with itself, with another irrelevant ontology or the same
ontology in its restriction to OWL-Lite. Group 2xx are systematic tests that were
obtained by discarding some features from the reference ontology, e.g. names of
entities replaced by random strings or synonyms, or names following different conventions.
Group 3xx contain four real-life ontologies of bibliographic references that were
found on the web, e.g. BibTeX/MIT, BibTeX/UMBC. Figure 2 shows the 6 best per-
forming systems out of 13 participants. We have ordered the systems based on
the F-value of the H-mean, because the H-mean unifies all results for the test
and the F-value represents both precision and recall.
Fig. 2 Best performing systems in the benchmarks track based on H-mean and F-value
3 http://oaei.ontologymatching.org/
In the benchmark test we have performed in the upper mid range compared to
other systems. Depending on the group of tests our system compares differently
to other solutions. For the Group 1xx our results are nearly identical to the other
systems. In the group 2xx tests, where syntactic similarity can determine the map-
ping outcome, our system is comparable to other systems. However, where semantic
similarity is the only way to provide mappings, our system provides fewer mappings
compared to the other systems in the best six. For the tests in group 3xx, consider-
ing the F-value, only 3 systems (SAMBO, RiMOM and Lily) performed better than
DSSim. The weakness of our system in providing good mappings when only semantic
similarity can be exploited is a direct consequence of our mapping architecture. At
the moment we are using four mapping agents, where 3 carry out syntactic similar-
ity comparisons and only 1 is specialised in semantics. However, it is worth noting
that our approach seems to be stable compared to last year’s performance, as our
precision and recall values were similar in spite of the fact that more and more diffi-
cult tests have been introduced this year. As our architecture is easily expandable
by adding more mapping agents, it is possible to enhance our semantic mapping
performance in the future.
3.2 Library
The objective of this track was to align two Dutch thesauri used to index books from
two collections held by the National Library of the Netherlands. Each collection is
described according to its own indexing system and conceptual vocabulary. On the
one hand, the Scientific Collection is described using the GTT, a huge vocabulary
containing 35,000 general concepts ranging from Wolkenkrabbers (Sky-scrapers) to
Verzorging (Care). On the other hand, the books contained in the Deposit Collec-
tion are mainly indexed against the Brinkman thesaurus, containing a large set of
headings (more than 5,000) that are expected to serve as global subjects of books.
Both thesauri have similar coverage (there are more than 2,000 concepts having ex-
actly the same label) but differ in granularity. For each concept, the thesauri provide
the usual lexical and semantic information: preferred labels, synonyms and notes,
broader and related concepts, etc. The language of both thesauri is Dutch, but a quite
substantial part of the Brinkman concepts (around 60%) come with English labels. For
the purpose of the alignment, the two thesauri have been represented according to
the SKOS model, which provides all these features.
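As a rough illustration of why exact labels already give substantial coverage here (the track description notes more than 2,000 concepts sharing the same label), a baseline matcher could simply align concepts whose preferred labels coincide; this sketch is not DSSim's actual algorithm, and the simple dictionaries stand in for the SKOS data.

```python
def exact_label_matches(gtt_labels, brinkman_labels):
    """gtt_labels / brinkman_labels: {concept URI: preferred label}; align concepts with identical labels."""
    by_label = {}
    for uri, label in brinkman_labels.items():
        by_label.setdefault(label.strip().lower(), []).append(uri)
    alignment = []
    for uri, label in gtt_labels.items():
        for target in by_label.get(label.strip().lower(), []):
            alignment.append((uri, target, 1.0))   # confidence 1.0 for an exact label match
    return alignment

gtt = {"gtt:1": "Wolkenkrabbers", "gtt:2": "Verzorging"}
brinkman = {"br:10": "wolkenkrabbers", "br:11": "Economie"}
print(exact_label_matches(gtt, brinkman))  # [('gtt:1', 'br:10', 1.0)]
```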
In the library track DSSim performed the best out of the 3 participating sys-
tems. The track is difficult partly because of its relatively large size and because of its
multilingual representation. However, these ontologies contain related and broader
terms; therefore the mapping can be carried out without consulting multilingual
background knowledge. This year the organisers also provided the instances as a sepa-
rate ontology; however, we did not make use of it for creating our final map-
pings. For further improvements in recall and precision we will need to consider
these additional instances in the future.
4 Conclusions
In this paper we have analysed two different experimental tests that were carried
out in order to evaluate our integrated ontology mapping solution. We have shown
that our solution DSSim, which is the core ontology mapping component of our
proposed multi-agent ontology mapping framework, performs well compared
to other solutions. The analysis of other OAEI 2008 tracks in which we have partic-
ipated is out of the scope of this paper; however, a detailed description of the other
tracks can be found in [5]. Nevertheless, we continuously evaluate the performance
of our system through the OAEI competitions [3, 4, 5], which allow us to improve, eval-
uate and validate our solution compared to other state-of-the-art systems. So far our
qualitative results are encouraging; therefore we aim to investigate further the belief
combination optimisation, compound noun processing and agent communication
strategies for uncertain reasoning in the future.
References
1. Yves R. Jean-Mary and Mansur R. Kabuka. ASMOV: Ontology alignment with semantic vali-
dation. In Joint SWDB-ODBIS Workshop, 2007.
2. Miklos Nagy, Maria Vargas-Vera, and Enrico Motta. Multi agent ontology mapping framework
in the AQUA question answering system. In MICAI 2005: Advances in Artificial Intelligence, 4th
Mexican International Conference on Artificial Intelligence, pages 70–79, 2005.
3. Miklos Nagy, Maria Vargas-Vera, and Enrico Motta. DSSim - ontology mapping with uncertainty.
In The 1st International Workshop on Ontology Matching, 2006.
4. Miklos Nagy, Maria Vargas-Vera, and Enrico Motta. DSSim - managing uncertainty on the
Semantic Web. In The 2nd International Workshop on Ontology Matching, 2007.
5. Miklos Nagy, Maria Vargas-Vera, and Piotr Stolarski. DSSim results for OAEI 2008. In The 3rd
International Workshop on Ontology Matching, 2008.
6. Jie Tang, Juanzi Li, Bangyong Liang, Xiaotong Huang, Yi Li, and Kehong Wang. Using
Bayesian decision for ontology mapping. Web Semantics, 2006.
7. Maria Vargas-Vera and Enrico Motta. AQUA - ontology-based question answering system. In
Third International Mexican Conference on Artificial Intelligence (MICAI-2004), 2004.
8. Peng Wang and Baowen Xu. Lily: Ontology alignment results for OAEI 2008. In The 3rd
International Workshop on Ontology Matching, 2008.
Visualizing RDF Documents
1 Dept. of Business Administration, Univ. of Macedonia, GR-54006, Thessaloniki, Greece, [email protected]
2 Dept. of Informatics, Aristotle Univ. of Thessaloniki, GR-54124 Thessaloniki, Greece, {skontopo, nbassili}@csd.auth.gr
Abstract The Semantic Web (SW) is an extension to the current Web, enhancing
the available information with semantics. RDF, one of the most prominent stan-
dards for representing meaning in the SW, offers a data model for referring to ob-
jects and their interrelations. Managing RDF documents, however, is a task that
demands experience and expert understanding. Tools have been developed that al-
leviate this drawback and offer an interactive graphical visualization environment.
This paper studies the visualization of RDF documents, a domain that exhibits
many applications. The most prominent approaches are presented and a novel
graph-based visualization software application is also demonstrated.
1 Introduction
The Semantic Web [1] (SW) attempts to improve the current Web, by making Web
content “understandable” not only to humans but to machines as well. One of the
fundamental SW technologies is XML (eXtensible Markup Language) that allows
the representation of structured documents via custom-defined vocabulary. How-
ever, since XML cannot semantically describe the meaning of information, RDF
[2] (Resource Description Framework), an XML-based statement model, was in-
troduced that captures the semantics of data through metadata representation.
The management of XML-based RDF documents is a task readily handled by
machines, which can easily process large volumes of structured data. For humans,
however, the same objective is highly cumbersome and demands experience and
expert understanding [3]. Software tools have been developed that alleviate this
drawback, hiding the technical low-level syntactical and structural details and of-
fering an interactive graphical visualization environment. This way, a human user
can easily create new documents or modify their structure and content.
Athanassiades, A., Kontopoulos, E. and Bassiliades, N., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 151–156.
The most substantial requirement for these software tools is the efficient visu-
alization of RDF metadata [4]. The three most prominent RDF visualization ap-
proaches are: display-at-once, where the graph representing the document is dis-
played all at once, navigational-centric, where a chosen resource serves as the
start-point for the rest of the graph, and centric-graph-at-once, a combination of
the previous two. This paper studies these approaches thoroughly and demonstrates RDFViz++, a novel graph-based visualization tool. It offers an alternative visualization approach that addresses needs left unsatisfied by the available tools.
In the rest of the paper, section 2 gives some insight on RDF, followed by a
section that focuses on visualizing RDF documents, presenting the three dominant
visualization approaches. Section 4 presents RDFViz++, elaborating on its most
distinctive features as well as its visualization algorithm. The paper is concluded
with the final remarks and directions for future work.
Since RDF is based on XML, human interaction with RDF documents becomes
cumbersome, especially in rich, detailed domains with vast numbers of statements.
Dedicated software utilities bring the solution: statement visualization through
simple, two-dimensional shapes. A graph is usually the final result, where nodes
represent resources and arrows represent predicates. Visualizing the whole docu-
ment, nevertheless, is more complicated, as many resources, properties and values
must be combined in one display [5]. Also, each RDF document demands a different visualization approach.
4 RDFViz++
The most important factors in visualizing RDF are document size and complexity. As a result, the same application performs differently on different documents – every tool follows a particular, inflexible algorithm that does not adapt to document characteristics. RDFViz++ is an alternative RDF visualization approach that addresses this weakness; instead of enforcing one graph style, it com-
bines the three previous visualization techniques, preserving the advantages from
each. The software offers various layouts, but even when none of them proves to
be efficient enough, a random algorithmic graph layout can be executed as many
times as needed, until the final result is acceptable.
The interface consists of the toolbar, the subjects list, the display panel and the
status bar. Almost all functions can be executed from the toolbar at the top of the
window. The central node of the graph is chosen from the subjects list on the left
that contains all the subject resources of the RDF document. The rest of the screen
is used for displaying the graph, except for a narrow strip at the bottom, which
serves as the status bar. A snapshot of RDFViz++ is shown in Fig. 1.
The central node resource is passed as a parameter to a procedure that loads all RDF statements for which this resource is the subject. For each statement, its predicate and object are isolated and drawn as an arrow and a new node, respectively. After visualizing each statement, the system prevents re-expansion of another instance of the same resource. Nevertheless, a resource may appear more than once in a graph, under a strict constraint: it is displayed only once as a subject and as many times as needed as an object.
The above process accepts a node and draws all adjacent nodes. If the expan-
sion range is set to 1, then a single execution of this process gives the resulting
graph. If the range is set to a greater value, then the procedure calls itself recur-
sively. Every object that comes up from expanding the initial node becomes the
procedure parameter and another call is performed. If any of the objects that arise
has already been expanded as a subject before, then it is just omitted. Finally, if
the range is set to 0, recursion occurs until no more objects are able to expand.
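The expansion logic lends itself to a short sketch. The following Python fragment is an illustrative reconstruction of the procedure described above, not code from RDFViz++ itself; the triple representation and the function name are assumptions made for the example.

```python
# Illustrative sketch of the expansion procedure (not RDFViz++ code).
# A document is modelled as a list of (subject, predicate, object) triples.

def expand(node, triples, graph, expanded, depth, expansion_range):
    """Draw all statements whose subject is `node`, then recurse.

    expansion_range = 1 -> a single level, N > 1 -> N levels,
    expansion_range = 0 -> recurse until no object can be expanded further.
    """
    if node in expanded:          # a resource is expanded as a subject only once
        return
    expanded.add(node)
    for subj, pred, obj in triples:
        if subj != node:
            continue
        graph.setdefault(node, []).append((pred, obj))   # arrow + new node
        more_levels = expansion_range == 0 or depth + 1 < expansion_range
        if more_levels and obj not in expanded:
            expand(obj, triples, graph, expanded, depth + 1, expansion_range)

if __name__ == "__main__":
    triples = [
        ("doc1", "dc:creator", "alice"),
        ("alice", "foaf:knows", "bob"),
        ("bob", "foaf:knows", "alice"),     # already expanded -> omitted
    ]
    graph, expanded = {}, set()
    expand("doc1", triples, graph, expanded, depth=0, expansion_range=0)
    print(graph)   # {'doc1': [...], 'alice': [...], 'bob': [...]}
```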
Apart from automatic graph generation, RDFViz++ also provides manual
expansion via user interaction. When the initial graph is built, any visible object
can be expanded, unless it has been already expanded as a subject at previous lev-
els. Also, the RDF statements, where the resource participates as subject, must be
available. If these constraints are satisfied, then the selected object is passed as a
parameter to the main procedure, which is executed recursively. The number of
recursive executions is equal to the number of objects that arise from the levels, which in turn are defined by the expansion range.
Document complexity does not depend only on the number of statements. One of
the most significant characteristics is concentration, namely, the phenomenon of
having only a few specific resources participate repeatedly in a vast number of
statements. RDFViz++ provides a variety of graph layouts; the most appropriate
can be chosen, according to the document visualization requirements. The layout
can even be dynamically modified – the whole graph is simply redrawn with the chosen layout. Thus, experimentation can often lead to the best-suited configura-
tion for each document. The available layouts are:
• West/East/North/South Tree: West Tree is one of the most efficient layouts, po-
sitioning the central node at the leftmost part and maintaining a left-to-right
flow. In East Tree the flow is inversed. North and South Tree layouts have the
same arrangement, but start deployment from the top and bottom, respectively.
• North/West Compact: Variations of the North and West Tree layouts that im-
prove node placement, aiming at efficiently distributing the available space.
• Radial Tree: The initial centric point is the center for all levels, which are drawn as concentric circles with a greater radius than the previous levels (a positioning sketch is given after this list).
• Organic: Uses a randomized algorithm for calculating positions.
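As referenced in the Radial Tree item above, the concentric-circle placement can be made concrete with a small sketch. The radius step and the even angular spacing below are illustrative choices, not the layout parameters RDFViz++ actually uses.

```python
import math

def radial_tree_positions(levels, radius_step=100.0, center=(0.0, 0.0)):
    """Place level 0 (the central node) at `center` and every further level
    on a concentric circle whose radius grows by `radius_step` per level.

    `levels` is a list of lists: levels[k] holds the node ids of level k.
    Returns {node_id: (x, y)}.
    """
    cx, cy = center
    positions = {}
    for k, nodes in enumerate(levels):
        if k == 0:
            for node in nodes:
                positions[node] = (cx, cy)
            continue
        radius = k * radius_step
        step = 2 * math.pi / max(len(nodes), 1)    # spread nodes evenly
        for i, node in enumerate(nodes):
            angle = i * step
            positions[node] = (cx + radius * math.cos(angle),
                               cy + radius * math.sin(angle))
    return positions

print(radial_tree_positions([["doc1"], ["alice", "bob"], ["carol"]]))
```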
The paper reported on RDF document visualization, presenting the three most
prominent visualization approaches. Each is suitable for specific document types,
while no single methodology can handle all documents. This was the primary mo-
tivation behind RDFViz++, the RDF visualization tool presented in this work. The
software adjusts to the peculiarities of the document to be visualized, offering an
adequate array of available layouts and providing the possibility of choosing the
most suitable approach each time. Overall, the software offers a more inclusive RDF visualization: expansion range, customizable level and node distances, and the various graph layouts add up to a flexible, interactive application.
As for future work, the software could be enhanced with various controls like
zoom-in/zoom-out, inversing the flow of the arrows (from objects to subjects),
overview controls etc. Furthermore, it could be enhanced with authoring capabili-
ties; the potential of introducing, modifying or removing statements from an RDF
document would transform the tool into an integrated RDF development environ-
ment. Finally, the software could also be extended with RDF Schema representa-
tion and authoring capabilities, becoming, thus, an RDF Schema ontology editor.
References
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American,
284(5), pp. 34-43 (2001)
2. Herman, I., Swick, R., Brickley, R.: Resource Description Framework (RDF).
http://www.w3.org/RDF/, last accessed: 4 November 2008
3. DeFanti, T. A., Brown, M. D., McCormick, B. H.: Visualization: Expanding Scientific
and Engineering Research Opportunities. IEEE Computer, 22 (8), pp. 12-25 (1989)
4. Deligiannidis, L., Kochut, K. J., Sheth, A. P.: RDF Data Exploration and Visualization.
Proc. ACM First Workshop on Cyberinfrastructure: Information Management in E-
Science (CIMS '07), Lisbon, Portugal, ACM, New York, pp. 39-46 (2007)
5. Frasincar, F., Telea, A., Houben, G. J.: Adapting Graph Visualization Techniques for
the Visualization of RDF Data. Visualizing the Semantic Web, pp. 154-171 (2006)
6. Pietriga, E.: IsaViz: A Visual Environment for Browsing and Authoring RDF Models.
Proc. 11th World Wide Web Conference (Developer’s day), Hawaii, USA (2002)
7. Sayers, C.: Node-Centric RDF Graph Visualization. Technical Report HPL-2004-60,
HP Laboratories, Palo Alto (2004)
8. Fallenstein, B.: Fentwine: A Navigational RDF Browser and Editor. Proc. 1st Work-
shop on Friend of a Friend, Social Networking and the Semantic Web, Galway (2004)
A Knowledge-based System for Translating
FOL Formulas into NL Sentences
Abstract In this paper, we present a system that translates first order logic (FOL)
formulas into natural language (NL) sentences. The motivation comes from an in-
telligent tutoring system teaching logic as a knowledge representation language,
where it is used as a means for feedback to the users. FOL to NL conversion is
achieved by using a rule-based approach, where we exploit the pattern matching
capabilities of rules. So, the system consists of a rule-based component and a lexi-
con. The rule-based unit implements the conversion process, which is based on a
linguistic analysis of a FOL sentence, and the lexicon provides lexical and gram-
matical information that helps in producing the NL sentences. The whole system
is implemented in Jess, a Java-based expert system shell. The conversion process
currently covers a restricted set of FOL formulas.
1 Introduction
Mpagouli, A. and Hatzilygeroudis, I., 2009, in IFIP International Federation for Information
Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L.,
Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 157–163.
2 Related Work
Our work can be considered as belonging to the field of Natural Language Gen-
eration [3], since it generates NL sentences from some source of information,
which are FOL formulas. In the existing literature, we could not trace any directly similar effort, i.e. one that translates FOL sentences into natural language sentences. However, we traced a number of indirectly related efforts, which translate some kind of natural language expression into some kind of FOL one.
In [4] an application of Natural Language Processing (NLP) is presented. It is an educational tool, implemented in Prolog, that translates certain types of Spanish sentences into FOL. This effort gave us a first inspiration about the form of the lexicon we use in our FOLtoNL system.
In [5], ACE (Attempto Controlled English), a structured subset of the English language, is presented. ACE has been designed as a substitute for formal notations, like FOL, in the input of some systems, making the input easier for users to understand and write.
Finally, in [6], a Controlled English to Logic Translation system, called CELT,
allows users to give sentences of a restricted English grammar as input. The sys-
tem analyses those sentences and turns them into FOL. What is interesting about it
is the use of a PhraseBank, a selection of phrases, to deal with the ambiguities of
some frequently used words in English like have, do, make, take, give etc.
Our FOLtoNL conversion algorithm takes as input FOL formulas [1] of the fol-
lowing form (in a BNF notation, where ‘[ ]’ denotes optional and ‘< >’ non-
terminal symbols): [<quant-expr>] [<stmt1> =>] <stmt2>, where <quant-expr>
denotes the expression of quantifiers in the formula, ‘=>’ denotes implication and
<stmt1> and <stmt2> denote the antecedent and the consequent statements of the
implication. These statements do not contain quantifiers. So, the input formula is
in its Prenex Normal Form [1, 2]. Furthermore, <stmt1> and <stmt2> cannot contain implications. Hence, the system currently focuses on the translation of simple
FOL implications or FOL expressions that do not contain implications at all. For
typing convenience, we use the following symbols in our FOL formulas: ‘~’ (ne-
gation), ‘&’ (conjunction), ‘V’ (disjunction), ‘=>’ (implication), ‘forall’ (universal
quantifier), ‘exists’ (existential quantifier).
The key idea of our conversion method, based on a FOL implication, is that
when both the antecedent and the consequent statements exist, the consequent can
give us the Basic Structure (BS) of that implication’s NL translation. BS may con-
tain variable symbols. In that case, the antecedent of the implication can help us to
define the entities represented by those variable symbols. In other words, we can
find NL substitutes for those variables and then use them instead of variable sym-
bols in BS to provide the final NL translation of the implication.
If the FOL expression does not contain variables, the translation is simpler: “if
<ant-translation> then <con-translation>”, where <ant-translation> and <con-
translation> consist of appropriately combined interpretations of atoms and con-
nectives. In case we have only <stmt2>, i.e. an expression without implications,
we use the same method with one difference: variable NL substitutes emerge from
the expression of quantifiers, since there is no antecedent. Of course, there is a
special case in which we choose some atoms of the expression for the estimation
of NL substitutes and the rest of them for the BS and we work as if we had an im-
plication.
The basic steps of our algorithm are the following (a code sketch of this control flow is given after the list):
1. Scan the user input and determine <quant-expr>, <stmt1> and <stmt2>.
Gather information for each variable (symbol, quantifier etc). Each atom
represents a statement. Analyze each atom in the three parts of its corre-
sponding statement: subject-part, verb-part and object-part.
2. If <stmt1> ≠ ∅,
2.1 Find the basic structure (BS) of the final sentence based on <stmt2>.
2.2 For each variable symbol in BS specify the corresponding NL sub-
stitute based on <stmt1>. If there are no variables, then BS is in NL.
In that case, find also the Antecedent Translation (AT) based on
<stmt1>.
3. If <stmt1> = ∅,
3.1 Find BS based on all or some of the atoms of <stmt2>.
3.2 For each variable symbol in BS specify the corresponding NL sub-
stitute based on the information of quantifiers or particular atoms of
<stmt2>. If there are no variables, then BS is built from all the atoms
of <stmt2> and is in NL.
4. Substitute each variable symbol in BS for the corresponding NL substitute
and give the resulting sentence as output. If there are no variables, distin-
guish two cases: If the initial FOL sentence was an implication then return:
“If <AT> then <BS>”. Otherwise, return BS.
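As announced above, the control flow of steps 1–4 can be outlined in a Python-style sketch. The actual system is a set of Jess rules; every helper passed into the function below (parse_formula, build_basic_structure, find_substitutes, antecedent_translation) is a placeholder standing in for a group of those rules and is an assumption of this sketch, not a real API.

```python
def fol_to_nl(formula, parse_formula, build_basic_structure,
              find_substitutes, antecedent_translation):
    """Steps 1-4 of the conversion as plain control flow (illustrative only)."""
    # Step 1: split the input into quantifier expression, antecedent, consequent
    # and the list of variable symbols; atom analysis is assumed to happen here.
    quant_expr, stmt1, stmt2, variables = parse_formula(formula)

    # Step 2.1 / 3.1: basic structure (BS) derived from the consequent
    bs = build_basic_structure(stmt2)

    tokens = bs.replace(".", " ").split()         # toy check for variable symbols
    present = [v for v in variables if v in tokens]
    if present:
        # Step 2.2 / 3.2: NL substitutes from the antecedent, or from the
        # quantifiers when there is no antecedent
        source = stmt1 if stmt1 else quant_expr
        for var, nl in find_substitutes(present, source, stmt2).items():
            bs = bs.replace(var, nl)              # step 4: substitute in BS
        return bs
    # Step 4, second case: no variables in BS
    if stmt1:
        return "If " + antecedent_translation(stmt1) + " then " + bs
    return bs
```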
In the sequel, we explain steps 2 and 3 of our algorithm, which are quite similar.
In this subsection, steps 2.1 and 3.1 are analyzed. First of all, we find the atoms in
<stmt2> that can aggregate, i.e. atoms that have the same subject-part and verb-
part but different object-parts, or the same subject-part but different verb-parts and
object-parts or different subject-parts but the same verb-part and the same object-
part. Atoms that can aggregate are combined to form a new sentence which is
called a sub-sentence. If an atom cannot be aggregated, then it becomes a sub-
sentence itself. This process ends up with a set of sub-sentences, which cannot be
further aggregated and which, when separated by commas, give BS. Let us consider the
following input sentences as examples:
(i) (forall x) (exists y) human(x) & human(y) => loves(x,y)
(ii) (exists x) cat(x) & likes(Kate,x)
(iii) (forall x) (exists y) (exists z) dog(x) & master(y,x) & town(z) &
lives(y,z) => lives(x,z) & loves(x,y)
(iv) (forall x) bat(x) => loves(x,dampness) & loves(x,darkness) & small(x)
& lives(x,caves)
(v) (forall x) bird(x) & big(x) & swims(x) & ~flies(x) => penguin(x)
The basic structures produced for these input sentences are the following (note
the aggregation in (iii) and (iv) and the exclusion of the atom ‘cat(x)’ from BS in
(ii)):
(i) x loves y.
(ii) Kate likes x.
(iii) x lives in z and loves y.
(iv) x loves dampness and darkness and x is small and lives in caves.
(v) x is a penguin.
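The aggregation of atoms into sub-sentences can be sketched as follows. The (subject, verb, object) representation of an analysed atom and the joining strings are assumptions; only the first two aggregation cases are shown, and the output for the atoms of example (iv) is close to, but not identical with, the BS given above.

```python
def aggregate(atoms):
    """Aggregate analysed atoms into sub-sentences (simplified).

    `atoms` is a list of (subject, verb, object) parts, e.g. loves(x, dampness)
    analysed as ("x", "loves", "dampness"). Case 1 (same subject and verb parts,
    different objects) and case 2 (same subject, different verb/object parts)
    are sketched; case 3 (same verb and object, different subjects) is analogous.
    """
    # Case 1: join object-parts of atoms sharing subject- and verb-part with "and"
    by_subj_verb = {}
    for subj, verb, obj in atoms:
        by_subj_verb.setdefault((subj, verb), []).append(obj)
    # Case 2: join the resulting verb phrases of the same subject with "and"
    by_subj = {}
    for (subj, verb), objs in by_subj_verb.items():
        by_subj.setdefault(subj, []).append(verb + " " + " and ".join(objs))
    return [subj + " " + " and ".join(phrases) for subj, phrases in by_subj.items()]

atoms_iv = [("x", "loves", "dampness"), ("x", "loves", "darkness"),
            ("x", "is", "small"), ("x", "lives in", "caves")]
print(", ".join(aggregate(atoms_iv)))
# -> "x loves dampness and darkness and is small and lives in caves"
```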
The next stage is the specification of NL substitutes for the variable symbols.
4 Implementation Aspects
The FOLtoNL process has been implemented in Jess [7]. Jess is a rule-based ex-
pert system shell written in Java, which however offers adequate general pro-
gramming capabilities, such as definition and use of functions. The system in-
cludes two Jess modules, MAIN and LEX. Each Jess module has its own rule base
and its own facts and can work independently from the rest of Jess modules. Focus
is passed from one module to the other to execute its rules. MAIN is the basic
module of the system, whereas LEX is the system’s lexicon.
The lexicon consists of a large number of facts concerning words, called word-
facts. Each word-fact is an instance of the following template: (word ?type ?gen
?form ?past ?exp ?stem ?lem), where ‘word’ declares that it is a fact describing
the word ?lem and the rest are variables representing the fields that describe the
word (part of speech, gender, number, special syntax, stem).
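Purely to illustrate the fields of the word template, the following Python mirror of a word-fact is given; the real lexicon consists of Jess facts, and the concrete values in the sample entry below are made up.

```python
from dataclasses import dataclass

@dataclass
class WordFact:
    """Python mirror of the Jess template (word ?type ?gen ?form ?past ?exp ?stem ?lem)."""
    type: str    # part of speech
    gen: str     # gender
    form: str    # number (singular/plural)
    past: str    # past form, where applicable
    exp: str     # special syntax / expression information
    stem: str    # stem of the word
    lem: str     # the lemma being described

# Hypothetical entry for the verb "love" (illustrative values only)
love = WordFact(type="verb", gen="-", form="singular",
                past="loved", exp="-", stem="lov", lem="love")
print(love)
```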
In this paper, we present an approach for translating FOL formulas into NL sen-
tences, called the FOLtoNL algorithm. The whole system is implemented in Jess
and consists of a rule-based system that implements the conversion algorithm and
a lexicon. Of course there are some restrictions that are challenges for further
work. One problem is the interpretation of sentences which are entirely in the
scope of a negation. Another constraint concerns the use of ‘=>’, which can occur only once in the input sentence. A further restriction is that currently we
do not take into consideration the order of quantifiers in the user input. Finally, the
lexicon at the moment contains a limited number of words. It should be further ex-
tended. All the above problems constitute our next research goals, concerning our
algorithm and system.
References
Background Extraction in Electron Microscope Images
MIPS laboratory, University of Haute Alsace, 4 rue des Frères Lumière, F-68093 Mulhouse
Cedex, France
1 Introduction
Karathanou, A., Buessler, J.-L., Kihl, H. and Urban, J.-P., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 165–173.
image acquisition, and on-line image processing. Low and medium magnification
image processing aims to globally characterize the sample and determine the re-
gions of interest to be explored. Membranes will be finally examined at high mag-
nification to assess the success of the crystallization. We present here certain as-
pects of the processing of TEM images acquired at medium magnification (×5000).
TEM images at medium magnification appear globally within a wide gray level
range as they can contain dark objects (protein or membrane aggregates, staining
artifacts) within a bright background with gray level fluctuations. However, inter-
esting non-superposed membranes are low contrasted, slightly darker than the
background with borders that often mark membrane boundaries. The low contrast
of the objects of interest makes image processing all the more difficult as TEM
images appear particularly noisy.
The principal objective of our image analysis at medium magnification is to
identify interesting membranes to provide certain statistical characteristics (such
as quantity, size, shape, etc.) and to trigger a new acquisition at high magnification.
This paper deals with background–foreground object recognition within medium magnification images. For membrane detection, a multiresolution seg-
mentation approach has been proposed and is briefly discussed in section 2. This
segmentation algorithm leads to an image partition, without specifying which re-
gions belong to the foreground objects or the background. A region, by itself, does not have enough characteristics to allow its classification. Membrane object identification therefore requires the background extraction. The segmentation often
splits the background into smaller regions. Our approach, described in section 3, eliminates the false contours that cause this over-segmentation. In this way, the background can be regarded as a large and bright region that can be simply extracted based on these two hypotheses.
In this section we will discuss the first part of the membrane identification proc-
ess: the edge segmentation algorithm applied to membrane detection.
Low contrasted membrane boundaries are not always characterized by a suffi-
cient gradient; their gradient amplitude and/or direction varies along the contour.
A multiresolution mechanism was therefore employed. Multiresolution gradient
analysis, as proposed in [2], overcomes problems that common edge techniques
face in TEM images. This method employs coarser resolutions to enhance edge
detection. An automatic, adapted thresholding is embedded in this gradient analysis. The threshold for each resolution is determined automatically based on a his-
togram analysis that results in a confidence threshold of 2%. In this way, at each
resolution, a reasonable amount of noisy gradients is retained.
This analysis results in a set of binary images that are combined to form a re-
constructed gradient-like (RGL) image. The value of each pixel in the RGL image indicates the best scale at which it has been thresholded. The RGL image provides a better compromise between edge detection and localization precision. Compared
to a gradient image, the RGL image is an edge map almost noise-free and is suit-
able for the watershed transform.
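A minimal sketch of how the per-resolution binary edge maps could be combined into an RGL-like image is given below; the encoding chosen here (each pixel stores the index of the finest scale at which it was retained, 0 if never retained) is an assumption, not necessarily the authors' exact definition.

```python
import numpy as np

def reconstructed_gradient_like(binary_edge_maps):
    """Combine binary edge maps, ordered from finest to coarsest resolution
    (all already upsampled to the original image size), into an RGL-like map.

    Each pixel of the result holds the index (1 = finest) of the first scale
    at which the pixel was retained by the thresholded gradient, or 0 if it
    was never retained.
    """
    h, w = binary_edge_maps[0].shape
    rgl = np.zeros((h, w), dtype=np.uint8)
    for scale, edges in enumerate(binary_edge_maps, start=1):
        newly_detected = (edges > 0) & (rgl == 0)
        rgl[newly_detected] = scale
    return rgl

# Toy example with two 4x4 "scales"
fine = np.array([[0,1,0,0],[0,1,0,0],[0,0,0,0],[0,0,0,0]], dtype=np.uint8)
coarse = np.array([[0,1,0,0],[0,1,1,0],[0,0,1,0],[0,0,0,0]], dtype=np.uint8)
print(reconstructed_gradient_like([fine, coarse]))
```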
The watershed transform is then applied to this gradient image, producing the segmented image. This algorithm progressively floods the regions starting from the minima and marks the merging of two basins with a watershed line [3]. The resulting watershed lines, situated along the edge ridges, partition the image into regions bounded by 1-pixel-wide closed contours, thus providing a convenient partition of the image.
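Assuming the RGL image is used directly as the relief to be flooded, the watershed step can be reproduced with scikit-image's implementation; the toy input below is only for illustration.

```python
import numpy as np
from skimage.segmentation import watershed

def segment_from_rgl(rgl_image):
    """Apply the watershed transform to an RGL-like edge map.

    Flooding starts from the minima of the relief; watershed_line=True keeps
    the 1-pixel-wide watershed lines separating adjacent basins, so the result
    is a label image whose regions are bounded by closed contours.
    """
    relief = np.asarray(rgl_image, dtype=float)
    return watershed(relief, watershed_line=True)

# Toy relief with two shallow basins separated by a ridge
toy = np.array([[1, 1, 5, 1, 1],
                [1, 0, 5, 0, 1],
                [1, 1, 5, 1, 1]], dtype=float)
print(segment_from_rgl(toy))   # two labelled basins, 0 on the watershed line
```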
Results showed that TEM images are segmented satisfactorily. All important
low contrasted membrane contours are extracted in various TEM images. We no-
ticed that all membrane regions are detected, with an over-segmentation disturbing mostly the background of the image.
3 Background Extraction
This section deals with the second part of the membrane recognition process. Af-
ter having partitioned the image into regions, we need to define the objects that are
present in our images. As a direct definition of object characteristics is difficult to give, we consider the recognition problem differently: in gradual steps, beginning by differentiating the background from the foreground objects and extracting it.
The problem of background extraction arises principally from the absence of a simple criterion for foreground-background separation. Common
methods are based on global or local threshold techniques. However, in complex
images, such methods cannot be considered. Examples of background extraction
can be found in color natural images where the size, the position and the color of
the background [9] are used as hypotheses for its extraction. In text document im-
ages, background is discriminated from text based on local statistical properties of
predefined regions (connected components) [8]. Others refine or adapt segmenta-
tion results in order to detect the objects of interest and background [6,7].
Fig. 1. Left: Initial TEM image, Center: Zoom of the white window of the initial image, Right:
Segmentation of the zoomed region where, 1: membrane region with average gray level of 9180,
2: background region with average gray level of 9250, 3: background region with average gray
level of 9470, 4: membrane region with average gray level of 9310
We propose a method specific to our images. We assume that the background region is characterized by: a) a large and continuous image region with no specific shape, b) a high average gray level, sometimes presenting important local fluctuations combined with strong noise, c) almost no structured gradient.
Fig. 2. Perpendicular extraction of profile transitions for a given contour segment separating re-
gions A, B for gradient calculation
The gradient perpendicular to the contour makes it possible to detect and then eliminate the false contours. This gradient measure was computed for each contour pixel by extracting the 1D transition profile perpendicularly, as illustrated in Fig. 2, and then correlating this profile with a gradient reference filter.
The importance of noise and the assumptions concerning its nature made a statistical hypothesis test an essential step. The gradient measure is therefore assessed by means of a hypothesis test. In order to set an optimal threshold for our decision, the average correlation measure was computed for the whole segment; the latter is defined as the group of edge pixels that separate two adjacent regions. This measure was reinforced by taking into account the gradient’s amplitude and direction along the contour. Finally, a segment is validated if there exists a statistically significant gradient perpendicular to the contour.
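The per-segment validation can be sketched as follows. The step-shaped reference filter, the normalised correlation and the acceptance threshold are illustrative assumptions; the paper's actual test also weighs in gradient amplitude and direction along the contour.

```python
import numpy as np

def validate_segment(profiles, reference=None, threshold=0.3):
    """Decide whether a contour segment corresponds to a real edge.

    `profiles` is an array of shape (n_pixels, profile_len): one 1D gray-level
    transition profile per contour pixel, sampled perpendicular to the contour.
    Each profile is correlated with a step-like gradient reference filter and
    the correlations are averaged over the segment (illustrative threshold).
    """
    profiles = np.asarray(profiles, dtype=float)
    n, length = profiles.shape
    if reference is None:
        reference = np.concatenate([-np.ones(length // 2),
                                    np.ones(length - length // 2)])
    # normalised correlation of every profile with the reference filter
    p = profiles - profiles.mean(axis=1, keepdims=True)
    r = reference - reference.mean()
    denom = np.linalg.norm(p, axis=1) * np.linalg.norm(r) + 1e-12
    correlations = (p @ r) / denom
    return correlations.mean() > threshold, correlations.mean()

# Toy segment: 3 profiles with a clear dark-to-bright transition
segment = [[10, 11, 12, 40, 41, 42], [9, 10, 11, 39, 40, 41], [10, 10, 12, 38, 40, 42]]
print(validate_segment(segment))   # (True, high average correlation)
```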
However, for false contour elimination, an iterative solution was chosen, as in [7]. In order to obtain meaningful regions, we searched for the most appropriate fusion for each one. Results showed that this elimination is efficient concerning the background, which is not significantly disturbed by spurious contours, facilitating its extraction.
After false contour elimination, the background can be considered free from over-segmentation. It is no longer divided into small regions but appears as a large region. This characteristic is verified in all our images. This background region ap-
pears globally bright even though it presents some gray level fluctuations related
to the acquisition conditions (non-uniform illumination, etc.).
We propose a background extraction technique that is composed of three steps:
1) For each region Ri whose size is greater than a threshold Ts:
   a) compute the average gray level G(Ri) of this region;
   b) if G(Ri) is greater than a threshold Ti, region Ri is retained as a background region;
We introduce supplementary tests to avoid detecting a large membrane region
as background, using the fact that this region neighbors the background.
2) For each background region detected, find all neighbors;
3) If two neighboring regions are selected as background, use the gradient di-
rection of the contour segments validated during the contour validation step (sec-
tion 3.2) to retain as background only the brightest region.
Thresholds Ts and Ti are set empirically as they are highly image dependent.
As an example, Ts is set to 10% of the image size, and Ti to 70% of the maximal
region average gray level value. These thresholds were tested for a large number
of TEM images providing a satisfactory background selection.
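A simplified sketch of steps 1–3 is given below, assuming a label image and a dict of region adjacencies; the tie-break between two neighbouring candidates is reduced here to keeping the brighter region, whereas the paper uses the validated gradient direction for that step.

```python
import numpy as np

def select_background(labels, gray, neighbours, ts_fraction=0.10, ti_fraction=0.70):
    """Steps 1-3 of the background selection, in simplified form.

    `labels`     : 2-D array of region ids from the segmentation
    `gray`       : 2-D array of gray levels
    `neighbours` : dict region_id -> set of adjacent region ids
    A region is kept if its size exceeds Ts (10% of the image) and its average
    gray level exceeds Ti (70% of the maximal region average). If two adjacent
    regions are both kept, only the brighter one is retained.
    """
    ts = ts_fraction * labels.size
    means = {r: gray[labels == r].mean() for r in np.unique(labels)}
    sizes = {r: (labels == r).sum() for r in means}
    ti = ti_fraction * max(means.values())

    background = {r for r in means if sizes[r] > ts and means[r] > ti}   # steps 1a-1b
    for r in list(background):                                           # steps 2-3
        for n in neighbours.get(r, ()):
            if n in background and means[n] < means[r]:
                background.discard(n)
    return background
```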
Fig. 3. a) Initial image, b) Initial multiresolution segmentation with the white arrows indicating
false contours over segmenting the background, c) False contour detection (in white), d) Final
segmentation after elimination of false segments, e) Background extraction (in white), f) Back-
ground (in white) with segmented membrane regions (in black)
4 Results
The efficiency of the whole proposed method has been systematically assessed on
various series of images acquired with different TEMs taken under standard ac-
quisition conditions of illumination and exposure time. These images contain
membranes of different types (such as sheet-like membranes or vesicles) and sizes, where our algorithm was able to extract the regions of interest satisfactorily
and finally identify the image background.
More specifically, the proposed segmentation and background extraction
scheme has been tested on 45 representative TEM images. The quantitative
evaluation of Table 1 was established according to an expert’s analysis. Our tech-
nique extracted and suitably selected the important contours segmenting all fore-
ground objects.
We consider the background well-detected even if small background regions
within the image are not identified. They represent less than 4% of the total image
size. Table 1 shows that the background in 87% of the images has been well-
detected, among them 13% contained small undetected regions (Fig. 4). A back-
ground is considered partially detected when a background region of a more im-
portant size is not identified (representing 4% to 8% of the total image size).
Complementary algorithms are currently implemented to improve the detection of
this kind of regions. In the case of a background misclassification, at least 50% of
the background surface is not detected.
On the other hand, foreground objects are globally properly classified as such.
Table 1. Quantitative performance measures of the background extraction algorithm for 45 repre-
sentative images
Background misclassification: 2%
Fig. 4. Left: TEM image, Right: An example of the background extraction (in white)
Fig. 3a illustrates a typical TEM image part. On the left-hand side of the image, a huge vesicle stack can be observed. Fig. 3b shows the initial segmentation of the image, where the background over-segmentation can be clearly noticed. This is resolved by the detection and elimination of false contours, as shown in the next step of the process (Fig. 3c, 3d). The background is then extracted, allowing a clear distinction between the two classes, objects and background.
5 Conclusion
We described a chain process for the extraction of the background in gray level
TEM images. This process starts with a multiresolution edge extraction technique
that segmented low contrasted membrane regions. False contours were then eliminated by means of a statistical validation technique. The latter enables a proper false-true edge classification and therefore a correct background-object distinction. The background appears large and bright, characteristics that allow its extraction. This technique is implemented for object recognition in electron microscope images.
Acknowledgments This work was supported by the EU 6th framework project HT3DEM (LSHG-CT-2005-018811), in collaboration with the Biozentrum of Basel and the FEI company, who provided the TEM images.
References
Confidence Predictions for the Diagnosis of Acute Abdominal Pain
Abstract Most current machine learning systems for medical decision support do
not produce any indication of how reliable each of their predictions is. However, an
indication of this kind is highly desirable especially in the medical field. This paper
deals with this problem by applying a recently developed technique for assigning
confidence measures to predictions, called conformal prediction, to the problem of
acute abdominal pain diagnosis. The data used consist of a large number of hospital
records of patients who suffered acute abdominal pain. Each record is described
by 33 symptoms and is assigned to one of nine diagnostic groups. The proposed
method is based on Neural Networks and for each patient it can produce either the
most likely diagnosis together with an associated confidence measure, or the set of
all possible diagnoses needed to satisfy a given level of confidence.
1 Introduction
Machine learning techniques have been applied successfully to many medical de-
cision support problems [7, 8] and good results have been achieved. The re-
sulting systems learn to predict the diagnosis of a new patient based on past history
of patients with known diagnoses. Most such systems produce as their prediction
only the most likely diagnosis of the new patient, without giving any confidence
Harris Papadopoulos
Computer Science and Engineering Department, Frederick University, 7 Y. Frederickou St., Palou-
riotisa, Nicosia 1036, Cyprus. e-mail: [email protected]
Alex Gammerman
Department of Computer Science, Royal Holloway, University of London, Egham Hill, Egham,
Surrey TW20 0EX, England. e-mail: [email protected]
Volodya Vovk
Department of Computer Science, Royal Holloway, University of London, Egham Hill, Egham,
Surrey TW20 0EX, England. e-mail: [email protected]
Papadopoulos, H., Gammerman, A. and Vovk, V., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 175–184.
2 Conformal Prediction
In this section we give a brief description of the idea behind CP, for more details
see [24]. We are given a training set {z1, ..., zl} of examples, where each zi ∈ Z is
a pair (xi, yi); xi ∈ IR^d is the vector of attributes for example i and yi ∈ {Y1, ..., Yc}
is the classification of that example. We are also given a new unclassified example
xl+1 and our task is to state something about our confidence in each possible
classification of xl+1. To this end, each possible classification Yj is assumed in turn
for xl+1, giving the extended sets

(x1, y1), ..., (xl, yl), (xl+1, Yj),   j = 1, ..., c.   (1)

CP is based on measuring how likely it is for each extended set of examples
to have been generated independently from the same probability distribution. First
we measure how strange, or non-conforming, each example in (1) is for the rest
of the examples in the same set. We use what is called a non-conformity measure
which is based on a traditional machine learning algorithm, called the underlying
algorithm of the CP. This measure assigns a numerical score αi to each example
(xi , yi ) indicating how different it is from all other examples in (1). In effect we
train the underlying algorithm using (1) as training set and we measure the degree
of disagreement between its prediction for xi and the actual label yi ; in the case of
xl+1 we use the assumed label Y j in the place of yl+1 .
The non-conformity score αl+1^(Yj) of (xl+1, Yj) on its own does not really give us
any information, it is just a numeric value. However, we can find out how unusual
(xl+1, Yj) is according to our non-conformity measure by comparing αl+1^(Yj) with all
other non-conformity scores. This comparison can be performed with the function

p((x1, y1), ..., (xl, yl), (xl+1, Yj)) = #{i = 1, ..., l+1 : αi ≥ αl+1^(Yj)} / (l + 1).   (2)

We call the output of this function, which lies between 1/(l+1) and 1, the p-value of Yj,
also denoted as p(Y j ), as that is the only part of (1) we were not given. An important
property of (2) is that ∀δ ∈ [0, 1] and for all probability distributions P on Z,
P{(z1, ..., zl+1) : p(z1, ..., zl+1) ≤ δ} ≤ δ;   (3)

for a proof see [12]. As a result, if the p-value of a given label is under some very
low threshold, say 0.05, this would mean that this label is highly unlikely as such
sequences will only be generated at most 5% of the time by any i.i.d. process.
After calculating the p-value of every possible label Y j , as described above, we
are able to exclude all labels that have a p-value under some very low threshold
(or significance level) δ and have at most δ chance of being wrong. Consequently,
given a confidence level 1 − δ a CP outputs the set

{Yj : p(Yj) > δ}.   (4)
Alternatively the CP can predict the most likely classification together with a con-
fidence and a credibility measure in this prediction. In this case it predicts the clas-
sification with the largest p-value, outputs one minus the second largest p-value as
confidence to this prediction and as credibility it outputs the p-value of the predicted
classification, i.e. the largest p-value.
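A small sketch of how the p-values of (2) and the forced prediction with its confidence and credibility measures can be computed from non-conformity scores is given below; the array-based interface is an assumption of the sketch.

```python
import numpy as np

def p_values(known_alphas, candidate_alphas):
    """p-value of every candidate label of a new example, as in (2).

    `known_alphas`     : non-conformity scores alpha_1..alpha_l of the given examples
    `candidate_alphas` : one score per candidate label Yj of the new example
    """
    known_alphas = np.asarray(known_alphas, dtype=float)
    l = len(known_alphas)
    return np.array([(np.sum(known_alphas >= a) + 1) / (l + 1)   # +1 for the new example
                     for a in candidate_alphas])

def forced_prediction(pvals, labels):
    """Most likely label, with confidence = 1 - second largest p-value and
    credibility = largest p-value."""
    order = np.argsort(pvals)[::-1]
    best, second = order[0], order[1]
    return labels[best], 1.0 - pvals[second], pvals[best]

pv = p_values([0.2, 0.5, 0.9, 1.3], candidate_alphas=[0.1, 1.5])
print(pv)                                              # [1.  0.2]
print(forced_prediction(pv, ["label A", "label B"]))   # ('label A', 0.8, 1.0)
```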
The original CP technique requires training the underlying algorithm once for each
possible classification of every new test example. This means that if our problem
has 9 possible classifications and we have to classify 2000 test examples, as is the
case in this study, the training process will be repeated 9 × 2000 = 18000 times.
This makes it very computationally inefficient especially for algorithms that require
long training times such as Neural Networks.
Inductive Conformal Predictors (ICPs) are based on the same general idea de-
scribed above, but follow a different approach which allows them to train their un-
derlying algorithm just once. This is achieved by splitting the training set (of size l)
into two smaller sets, the proper training set with m < l examples and the calibra-
tion set with q := l − m examples. The proper training set is used for training the
underlying algorithm and only the examples in the calibration set are used for cal-
culating the p-value of each possible classification of the new test example. More
specifically, we calculate the p-value of each possible classification Yj of xl+1 as

p(Yj) = #{i = m + 1, ..., m + q, l + 1 : αi ≥ αl+1^(Yj)} / (q + 1),   (5)

where αm+1, ..., αm+q are the non-conformity scores of the examples in the calibration set and αl+1^(Yj) is the non-conformity score of (xl+1, Yj).
In this section we analyse the Neural Networks ICP (NN-ICP) algorithm. We first
describe the typical output encoding for Neural Networks (NNs) and then, based on
this description, we define two non-conformity measures for NNs. Finally, we detail
the complete NN-ICP algorithm.
or as

αi = max{ oj^i : j = 1, ..., c, j ≠ u } / (ou^i + γ),   (7)
where the parameter γ ≥ 0 in the second definition enables us to adjust the sensitivity
of our measure to small changes of ou^i depending on the data in question. We added
this parameter in order to gain control over which category of outputs will be more
important in determining the resulting non-conformity scores; by increasing γ one
reduces the importance of ou^i and consequently increases the importance of all other
outputs.
We can now use the non-conformity measure (6) or (7) to compute the non-
conformity score of each example in the calibration set and each test set pair
(xl+g ,Yu ). These can then be fed into the p-value function (5), giving us the p-value
for each classification Yu . The exact steps the Neural Networks ICP follows for a
training set {z1, ..., zl} and a test set {xl+1, ..., xl+r} are the following (a code sketch is given after the list):
• Split the training set into the proper training set with m < l examples and the
calibration set with q := l − m examples.
• Use the proper training set to train the Neural Network.
• For each example zm+t = (xm+t , ym+t ), t = 1, . . . , q in the calibration set:
– supply the input pattern xm+t to the trained network to obtain the output values o1^{m+t}, ..., oc^{m+t}, and
– calculate the non-conformity score αm+t of the pair (xm+t , ym+t ) by applying
(6) or (7) to these values.
• For each test pattern xl+g , g = 1, . . . , r:
– supply the input pattern xl+g to the trained network to obtain the output values o1^{l+g}, ..., oc^{l+g},
– consider each possible classification Yu , u = 1, . . . , c and:
· compute the non-conformity score αl+g = αl+g^(Yu) of the pair (xl+g, Yu) by
applying (6) or (7) to the outputs of the network,
· calculate the p-value p(Yu) of the pair (xl+g, Yu) by applying (5) to the non-
conformity scores of the calibration examples and αl+g^(Yu):

p(Yu) = #{i = m + 1, ..., m + q, l + g : αi ≥ αl+g^(Yu)} / (q + 1),
· predict the classification with the largest p-value (in case of a tie choose
the one with the smallest non-conformity score) and output one minus the
second largest p-value as confidence to this prediction and the p-value of
the output classification as its credibility,
· or given a confidence level 1 − δ output the prediction set (4).
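As announced above, these steps can be condensed into the following sketch, built around any already trained classifier exposing per-class output scores; the predict_scores interface is an assumption of the sketch, and non-conformity measure (7) is used throughout.

```python
import numpy as np

def nonconformity(outputs, u, gamma=0.1):
    """Non-conformity measure (7): alpha = max_{j != u} o_j / (o_u + gamma)."""
    outputs = np.asarray(outputs, dtype=float)
    others = np.delete(outputs, u)
    return others.max() / (outputs[u] + gamma)   # gamma > 0 guards small o_u

def icp_p_values(predict_scores, calib_X, calib_y, test_X, gamma=0.1):
    """Inductive CP around an already trained classifier.

    `predict_scores(X)` must return an (n_samples, n_classes) array of output
    scores (e.g. the softmax outputs of the trained network); this interface is
    an assumption of the sketch. `calib_y` holds the class indices of the
    calibration examples. Returns an (n_test, n_classes) array of p-values (5).
    """
    calib_scores = np.asarray(predict_scores(calib_X), dtype=float)
    calib_alphas = np.array([nonconformity(s, y, gamma)
                             for s, y in zip(calib_scores, calib_y)])
    q = len(calib_alphas)

    test_scores = np.asarray(predict_scores(test_X), dtype=float)
    n_classes = test_scores.shape[1]
    pvals = np.zeros((len(test_scores), n_classes))
    for g, scores in enumerate(test_scores):
        for u in range(n_classes):
            a = nonconformity(scores, u, gamma)
            pvals[g, u] = (np.sum(calib_alphas >= a) + 1) / (q + 1)   # (5)
    return pvals

def prediction_set(pvals_row, delta):
    """Prediction set (4) for confidence level 1 - delta."""
    return [u for u, p in enumerate(pvals_row) if p > delta]
```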
The acute abdominal pain database used in this study was originally used in [5],
where a more detailed description of the data can be found. The data consist of 6387
records of patients who were admitted to hospital suffering from acute abdominal
pain. During the examination of each patient 33 symptoms were recorded, each of
which had a number of different discrete values. For example, one of the symptoms
is “Progress of Pain” which has the possible values: “Getting Better”, “No Change”,
“Getting Worse”. In total there are 135 values describing the 33 symptoms. These
values compose the attribute vector for each patient in the form of 135 binary at-
tributes that indicate the absence (0) or presence (1) of the corresponding value. It
is worth mentioning that there are symptoms which have more than one value or no
value at all in many of the records.
There are nine diseases or diagnostic groups in which the patients were allocated
according to all information available after their initial examination, including the results of subsequent investigations. The distribution of the records among the nine groups is given in Table 1.
Table 1. Number of records per diagnostic group (columns 1-9) and in total
              1     2     3     4     5     6     7     8     9    All
Training Set  585   108    88  1941   372   290    65   326   612  4387
Test Set      259    35    42   894   200   127    31   147   265  2000
Total         844   143   130  2835   572   417    96   473   877  6387
The NN used in our experiments was a 2-layer fully connected feed-forward net-
work, with sigmoid hidden units and softmax output units. It consisted of 135 input,
35 hidden and 9 output units. The number of hidden units was selected by following
a cross validation scheme on the training set and trying out the values: 20, 25, 30,
35, 40, 45, 50, 55, 60. More specifically, the training set was split into five parts
of almost equal size and five sets of experiments were performed, each time us-
ing one of these parts for evaluating the NNs trained on the examples in the other
four parts. For each of the five test parts, a further 10-fold cross validation process
was performed to divide the examples into training and validation sets, so as to use
the validation examples for determining when to stop the training process. Training
was performed with the backpropagation algorithm minimizing a cross-entropy loss
function.
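The described 135-35-9 architecture can be approximated, for instance, with scikit-learn's MLPClassifier, which uses a softmax output and a cross-entropy objective for multi-class problems; note that its solvers and stopping rules only approximate the plain backpropagation and validation-based stopping scheme described above.

```python
from sklearn.neural_network import MLPClassifier

# Approximation of the described 135-35-9 network: 'logistic' gives sigmoid
# hidden units, and early_stopping reserves a validation fraction, standing in
# for the paper's validation-based stopping of training.
net = MLPClassifier(hidden_layer_sizes=(35,),
                    activation="logistic",
                    solver="sgd",
                    early_stopping=True,
                    validation_fraction=0.1,
                    max_iter=1000,
                    random_state=0)
# net.fit(X_train, y_train) with X_train of shape (n_patients, 135) and
# y_train holding one of the nine diagnostic-group labels per patient.
```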
The results reported here were obtained by following a 10-fold cross validation
procedure on the training set in order to divide it into training and validation exam-
ples. To create the calibration set of the ICP, 299 examples were removed from the
training set before generating the 10 splits. This experiment was repeated 10 times
with random permutations of the training examples. Here we report the mean values
of all 100 runs.
Table 2 reports the accuracy of the NN-ICP and original NN methods and com-
pares them to that of the Simple Bayes, Proper Bayes and CART methods as re-
ported in [5]. Additionally it compares them to the accuracy of the preliminary di-
agnoses of the hospital physicians, also reported in [5]. Both the original NN and
NN-ICP outperform the other three methods and are almost as accurate as the hos-
pital physicians. As was expected the original NN performs slightly better than the
ICP due to the removal of the calibration examples from the training set, however
the difference between the two is negligible. This is a very small price to pay con-
sidering the advantage of obtaining a confidence measure for each prediction.
Table 3 lists the results of the NN-ICP when producing set predictions for the
99%, 95%, 90% and 80% confidence levels. More specifically it reports the per-
centage of examples for which the set output by the ICP consisted of only one label,
of more than one label or was empty. It also reports in the last column the per-
centage of errors made by the ICP, i.e. the percentage of sets that did not include
the true classification of the example. The values reported here reflect the difficulty
in discriminating between the 9 diseases. Nevertheless, the set predictions output
by the NN-ICP can be very useful in practice since they pinpoint the cases where
more attention must be given and the diagnostic groups that should be considered
for each one. Bearing in mind the difficulty of the task and the 76% accuracy of the
preliminary diagnoses of physicians, achieving 95% accuracy by considering
more than one possible diagnosis for only about half the patients is arguably a good
result.
Acknowledgements This work was supported by the Cyprus Research Promotion Foundation
through research contract PLHRO/0506/22 (“Development of New Conformal Prediction Methods
with Applications in Medical Diagnosis”).
References
1. Anagnostou, T., Remzi, M., Djavan, B.: Artificial neural networks for decision-making in
urologic oncology. Review in Urology 5(1), 15–21 (2003)
2. Anastassopoulos, G.C., Iliadis, L.S.: ANN for prognosis of abdominal pain in childhood: Use
of fuzzy modelling for convergence estimation. In: Proceedings 1st International Workshop
on Combinations of Intelligent Methods and Applications, pp. 1–5 (2008)
3. Blazadonakis, M., Moustakis, V., Charissis, G.: Deep assessment of machine learning tech-
niques using patient treatment in acute abdominal pain in children. Artificial Intelligence in
Medicine 8(6), 527–542 (1996)
4. Christoyianni, I., Koutras, A., Dermatas, E., Kokkinakis, G.: Computer aided diagnosis of
breast cancer in digitized mammograms. Computerized Medical Imaging and Graphics 26(5),
309–319 (2002)
5. Gammerman, A., Thatcher, A.: Bayesian diagnostic probabilities without assuming indepen-
dence of symptoms. Methods of Information in Medicine 30(1), 15–22 (1991)
6. Holst, H., Ohlsson, M., Peterson, C., Edenbrandt, L.: Intelligent computer reporting ‘lack of
experience’: a confidence measure for decision support systems. Clinical Physiology 18(2),
139–147 (1998)
7. Kononenko, I.: Machine learning for medical diagnosis: History, state of the art and perspec-
tive. Artificial Intelligence in Medicine 23(1), 89–109 (2001)
8. Lisboa, P.: A review of evidence of health benefit from artificial neural networks in medical
intervention. Neural Networks 15(1), 11–39 (2002)
9. Mantzaris, D., Anastassopoulos, G., Adamopoulos, A., Gardikis, S.: A non-symbolic imple-
mentation of abdominal pain estimation in childhood. Information Sciences 178(20), 3860–
3866 (2008)
10. Melluish, T., Saunders, C., Nouretdinov, I., Vovk, V.: Comparing the Bayes and Typicalness
frameworks. In: Proceedings 12th European Conference on Machine Learning (ECML’01),
Lecture Notes in Computer Science, vol. 2167, pp. 360–371. Springer (2001)
11. Nouretdinov, I., Melluish, T., Vovk, V.: Ridge regression confidence machine. In: Proceed-
ings 18th International Conference on Machine Learning (ICML’01), pp. 385–392. Morgan
Kaufmann, San Francisco, CA (2001)
12. Nouretdinov, I., Vovk, V., Vyugin, M.V., Gammerman, A.: Pattern recognition and density
estimation under the general i.i.d. assumption. In: Proceedings 14th Annual Conference on
Computational Learning Theory and 5th European Conference on Computational Learning
Theory, Lecture Notes in Computer Science, vol. 2111, pp. 337–353. Springer (2001)
13. Ohmann, C., Moustakis, V., Yang, Q., Lang, K.: Evaluation of automatic knowledge acquisi-
tion techniques in the diagnosis of acute abdominal pain. Artificial Intelligence in Medicine
8(1), 23–36 (1996)
14. Papadopoulos, H.: Tools in Artificial Intelligence, chap. 18. Inductive Conformal Prediction:
Theory and Application to Neural Networks, pp. 315–330. I-Tech, Vienna, Austria (2008).
URL http://intechweb.org/downloadpdf.php?id=5294
15. Papadopoulos, H., Gammerman, A., Vovk, V.: Normalized nonconformity measures for re-
gression conformal prediction. In: Proceedings IASTED International Conference on Artifi-
cial Intelligence and Applications (AIA 2008), pp. 64–69. ACTA Press (2008)
16. Papadopoulos, H., Proedrou, K., Vovk, V., Gammerman, A.: Inductive confidence machines
for regression. In: Proceedings 13th European Conference on Machine Learning (ECML’02),
Lecture Notes in Computer Science, vol. 2430, pp. 345–356. Springer (2002)
17. Papadopoulos, H., Vovk, V., Gammerman, A.: Qualified predictions for large data sets in the
case of pattern recognition. In: Proceedings 2002 International Conference on Machine Learn-
ing and Applications (ICMLA’02), pp. 159–163. CSREA Press (2002)
18. Papadopoulos, H., Vovk, V., Gammerman, A.: Conformal prediction with neural networks.
In: Proceedings 19th IEEE International Conference on Tools with Artificial Intelligence (IC-
TAI’07), vol. 2, pp. 388–395. IEEE Computer Society (2007)
19. Pattichis, C., Christodoulou, C., Kyriacou, E., Pattichis, M.: Artificial neural networks in med-
ical imaging systems. In: Proceedings 1st MEDINF International Conference on Medical
Informatics and Engineering, pp. 83–91 (2003)
20. Pesonen, E., Eskelinen, M., Juhola, M.: Comparison of different neural network algorithms in
the diagnosis of acute appendicitis. International Journal of Bio-Medical Computing 40(3),
227–233 (1996)
21. Proedrou, K., Nouretdinov, I., Vovk, V., Gammerman, A.: Transductive confidence machines
for pattern recognition. In: Proceedings of the 13th European Conference on Machine Learn-
ing (ECML’02), Lecture Notes in Computer Science, vol. 2430, pp. 381–390. Springer (2002)
22. Saunders, C., Gammerman, A., Vovk, V.: Transduction with confidence and credibility. In:
Proceedings of the 16th International Joint Conference on Artificial Intelligence, vol. 2, pp.
722–726. Morgan Kaufmann, Los Altos, CA (1999)
23. Saunders, C., Gammerman, A., Vovk, V.: Computationally efficient transductive machines.
In: Proceedings of the Eleventh International Conference on Algorithmic Learning Theory
(ALT’00), Lecture Notes in Artificial Intelligence, vol. 1968, pp. 325–333. Springer, Berlin
(2000)
24. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer,
New York (2005)
25. Zorman, M., Eich, H.P., Kokol, P., Ohmann, C.: Comparison of three databases with a decision
tree approach in the medical field of acute appendicitis. Studies in Health Technology and
Informatics 84(2), 1414–1418 (2001)
Enhanced Human Body Fall Detection Utilizing
Advanced Classification of Video and Motion
Perceptual Components
Abstract The monitoring of human physiological data, in both normal and ab-
normal situations of activity, is interesting for the purpose of emergency event de-
tection, especially in the case of elderly people living on their own. Several tech-
niques have been proposed for identifying such distress situations using either
motion, audio or video data from the monitored subject and the surrounding envi-
ronment. This paper aims to present an integrated patient fall detection platform
that may be used for patient activity recognition and emergency treatment. Both
visual data captured from the user’s environment and motion data collected from
the subject’s body are utilized. Visual information is acquired using overhead
cameras, while motion data is collected from on-body sensors. Appropriate track-
ing techniques are applied to the aforementioned visual perceptual component
enabling the trajectory tracking of the subjects. Acceleration data from the sensors
can indicate a fall incident. Trajectory information and subject’s visual location
can verify the fall and indicate an emergency event. A Support Vector Machine (SVM) classification methodology has been evaluated using the latter acceleration and
visual trajectory data. The performance of the classifier has been assessed in terms
of accuracy and efficiency and results are presented.
1 Introduction
Doukas, C., Maglogiannis, I., Katsarakis, N. and Pneumatikakis, A., 2009, in IFIP International
Federation for Information Processing, Volume 296; Artificial Intelligence Applications and
Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 185–193.
incidents such as a fall, or a long period of inactivity in a part of their area. Several
techniques have been proposed for identifying such distress situations using either
motion, audio or video data from the monitored subject and the surrounding envi-
ronment. This paper presents a human body fall detection platform based on both motion and visual perceptual components. A number of on-body sensors collect the
movement data and transmit them wirelessly to the monitoring unit, while over-
head cameras track the trajectory and shape of the body and provide information
regarding the patient’s position and activity. Appropriate classification of the mo-
tion data can give an indication of a fall. Combining the latter with an unusual change of the body's shape followed by inactivity, an alarm can be triggered and more information regarding the severity of the incident can be obtained; in case the patient remains still after the fall, or moves but the body is detected on the ground, the patient requires immediate assistance.
The rest of the paper is organized as follows; Section 2 discusses related work
in the context of patient activity and fall detection. Section 3 describes the pro-
posed system architecture and Sections 4 and 5 describe the acquisition of the pa-
tient movement and visual data using sensors and overhead cameras respectively.
Section 6 presents the data classification using Support Vector Machines and cor-
responding evaluation results and finally Section 7 concludes the paper.
2 Related Work
Although the concept of patient activity recognition with focus on fall detection is
relatively new, there exists related research work, which may be retrieved from
the literature ([1]-[9]). Information regarding the patient movement and activity is
frequently acquired through visual tracking of the patient’s position. In [5] over-
head tracking through cameras provides the movement trajectory of the patient
and gives information about user activity on predetermined monitored areas. Un-
usual inactivity (e.g., continuous tracking of the patient on the floor) is interpreted
as a fall. Similarly, in [8] omni-camera images are used to determine the horizontal
placement of the patient’s silhouettes on the floor (case of fall). Success rate for
fall detection is declared at 81% for the latter work. A different approach for col-
lecting patient activity information is the use of sensors that integrate devices like
accelerometers, gyroscopes and contact sensors. The latter approach is less dependent on the patient and environmental information and can be used for a variety
of applications for user activity recognition ([1], [3], [7]). Regarding fall detection,
authors in [2], [6], [9] use accelerometers, gyroscopes and tilt sensors for move-
ment tracking. Collected data from the accelerometers (i.e., usually rotation angle
or acceleration in the X, Y and Z axis) is used to verify the placement of the pa-
tient and time occupation in rooms and detect abrupt movement that could be as-
sociated with fall. Detection is performed using predefined thresholds [1], [3], [4],
[6] and association between current position, movement and acceleration [2], [9].
To the best of our knowledge, there is no work in the literature that combines both visual and sensor information for a more complete and robust estimation of a patient’s
fall and can provide some information regarding the severity of the incident (e.g.
patient has gotten up right after the fall, patient is inactive, etc.).
Fig. 1. Platform Architecture and Data interaction between the movement capturing tools and
monitoring node.
4 Patient Movement Data Acquisition
This section provides information on the acquisition and pre-processing of the pa-
tient movement data. The Sentilla Perk [10] sensor kit has been utilized in our sys-
tem. The latter contains two 2.4 GHz wireless data transceivers (nodes, see Fig. 2)
using the IEEE 802.15.4 (ZigBee) protocol. It also includes a USB port for inter-
face with a personal computer acting as the monitoring unit. Each node has a low-
power, low-voltage MCU (MicroController Unit), one 3D Accelerometer for X, Y
and Z axis and additional analog and digital input pins for adding more sensors.
The Perk nodes are provided in a plastic robust small-sized enclosure (6x3x1.5cm)
making them more suitable for placing on patient’s body and tolerating falls.
(a) (b)
Fig. 2. The Sentilla Perk node containing a 3D accelerometer that can be attached on user and
send motion data through the ZigBee wireless protocol. The plastic enclosure can protect the
node from falls and makes it more suitable for carrying it on patient’s body. A) Actual photo of
the node, b) illustration indicating two analog-to-digital converter ports for the addition of alter-
native sensors.
Two Perk nodes can be placed on the patient's body. Preferable positions are close to the user's chest and belt, or lower at the user's foot. Based on conducted experiments, these positions have proven appropriate for distinguishing the rapid acceleration on one of the three axes that is generated during a fall.
Appropriate J2ME [17] code has been developed and deployed on the nodes for reading the accelerometer values and transmitting them wirelessly to the monitoring unit. At the latter, a Java application built using the Sentilla IDE [10] receives the movement data and performs further processing as described in the following sections. An example of motion data as received by the two sensor nodes is illustrated in Fig. 3. The X, Y and Z acceleration values from both sensors are interlaced.
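To illustrate the handling of this interlaced stream at the monitoring unit, the following minimal Python sketch (used here for illustration instead of the authors' Java/J2ME implementation) separates the incoming samples per node; the node identifiers and the sample format are hypothetical assumptions.

# Illustrative sketch (not the authors' code): separate interlaced X, Y, Z
# acceleration samples arriving from two sensor nodes.
from collections import defaultdict

def deinterlace(samples):
    # samples: iterable of (node_id, x, y, z) tuples in arrival order.
    # Returns a dict mapping node_id -> list of (x, y, z) samples.
    per_node = defaultdict(list)
    for node_id, x, y, z in samples:
        per_node[node_id].append((x, y, z))
    return per_node

# Hypothetical interlaced stream from two nodes placed at the chest and the belt.
stream = [("chest", 0.02, -0.98, 0.05), ("belt", 0.01, -1.01, 0.03),
          ("chest", 1.45, -0.20, 0.60), ("belt", 1.30, -0.10, 0.70)]
print(deinterlace(stream)["chest"])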
The goal of the developed body video tracker is to provide across time the frame
regions occupied by human bodies. The tracker is built around a dynamic fore-
ground segmentation algorithm [12] that utilizes adaptive background modeling.
This is based on Stauffer’s algorithm [13] to provide the foreground pixels.
Stauffer’s algorithm models the different colors every pixel can receive in a video
sequence by Gaussian Mixture Models (GMM). One GMM corresponds to every
pixel at given coordinates across time. The Gaussians are three-dimensional, cor-
responding to the red, green and blue components of the pixel color. Their weight
is proportional to the time for which a particular Gaussian best models the color of the pixel. Hence the weight of a given Gaussian is increased as long as the color of the pixel can be described by that Gaussian with higher probability than any other Gaussian
in the GMM can, and that probability is above a threshold. As a result, a map can
be built in which every pixel is represented by the weight of the Gaussian from its
GMM that best describes its current color. This is the Pixel Persistence Map
(PPM): Regions of the map with large values correspond to pixels that have colors
that appear there for a long time, hence they belong to background. On the con-
trary, regions with small values correspond to pixels that have colors that appear
there for a short time, hence they are foreground. This is true as long as the fore-
ground objects have distinct colors from the background.
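As a concrete illustration of how such a Pixel Persistence Map can be derived from the per-pixel mixture weights, the following Python sketch updates a simplified per-pixel model (means and weights only) and extracts the map; the number of Gaussians, learning rate and matching threshold are illustrative assumptions, not the parameters of the tracker described here.

import numpy as np

def update_gmm(frame, means, weights, lr=0.05, thresh=30.0):
    # Simplified per-pixel GMM update for one frame.
    # frame:   H x W x 3 float array (RGB colors).
    # means:   H x W x K x 3 array of Gaussian means per pixel.
    # weights: H x W x K array of Gaussian weights per pixel.
    # Returns the updated (means, weights) and the Pixel Persistence Map,
    # i.e. the weight of the Gaussian that currently matches each pixel.
    dist = np.linalg.norm(means - frame[:, :, None, :], axis=-1)          # H x W x K
    best = np.argmin(dist, axis=-1)                                       # H x W
    matched = np.take_along_axis(dist, best[..., None], -1)[..., 0] < thresh

    # Increase the weight of the matching Gaussian, decay the others.
    onehot = np.eye(weights.shape[-1])[best] * matched[..., None]
    weights = (1 - lr) * weights + lr * onehot
    weights /= weights.sum(axis=-1, keepdims=True)

    # Pull the matching Gaussian's mean towards the observed color.
    means = means + lr * onehot[..., None] * (frame[:, :, None, :] - means)

    ppm = np.take_along_axis(weights, best[..., None], -1)[..., 0]        # persistence map
    return means, weights, ppm

# Tiny usage example with random data.
H, W, K = 4, 4, 3
means = np.random.rand(H, W, K, 3) * 255
weights = np.full((H, W, K), 1.0 / K)
frame = np.random.rand(H, W, 3) * 255
means, weights, ppm = update_gmm(frame, means, weights)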
Fig. 3. Illustration of the interlaced acceleration data from both sensors in the X, Y and Z axes. The Y axis represents the acceleration value (range between -2 and 2) and the X axis the number of samples acquired.
The problem of Stauffer’s algorithm is with foreground objects that stop mov-
ing. In its original implementation, targets/objects that stop moving are learnt into
the background. This happens as the weights of the Gaussians of the GMM of pix-
els describing the foreground colors and corresponding to immobile foreground
objects increase with time. To avoid this, the learning rates of the adaptation that increases the weights of the Gaussians are constant neither across space nor across time. Instead, they are spatiotemporally controlled by the states of Kalman
filters [11]. Every foreground area corresponds to a target being tracked by a Kal-
man filter. The foreground pixels are combined into body evidence blobs, used for
the measurement update stage of the Kalman filters. The states are used to obtain
the position, size and mobility of each target, the latter being a combination of
translation and size change. This information is fed back to the adaptive back-
ground modeling module to adapt the learning rate in the vicinity of each target:
frame regions that at a specific time have a slow-moving target have smaller learn-
ing rates. The block diagram of the body tracker is shown in Fig. 4.
With the feedback configuration of the tracker, the learning of the slow moving
foreground objects into the background is slowed down long enough for the in-
tended application, i.e. tracking people moving indoors and possibly falling down.
The results of the tracker when applied to the visual feed of an overhead camera are illustrated in Fig. 5.
Fig. 4. Block diagram of the body video tracker. Kalman filters spatiotemporally adapt the learn-
ing rates of the adaptive background algorithm, effectively avoiding learning of immobile fore-
ground objects into the background.
Fig. 5. Visualization of video tracking performance. The tracker detects the movement of the
body and correlates it with the movement of a rectangular blob within the visual domain. Upper
left X, Y coordinates and respective width and height of the blob are reported for each visual
frame. Frame A corresponds to normal walking, Frame B to captured movement during fall and
Frame C illustrates detection of body in horizontal position after fall.
Tracking through overhead cameras has been selected because it provides a better visual representation of the monitored area and allows the tracker to obtain a better estimation of the body shape when the subject moves, falls and lies still after a fall. The presented tracker creates and tracks a rectangular blob around the detected moving body within the frames and reports the upper left corner coordinates and the respective width and height of the blob. As indicated in Fig. 5, the size of the blob changes during the fall and after it.
6 The System in Practice: Classification of Motion and Visual
Perceptual Components
This section provides information regarding the classification method used and reports the accuracy of the system in the detection of a patient fall. According to our previous research [14], [15], the SVM (Support Vector Machine) classification method has been proven to obtain high accuracy in the detection of fall incidents based on movement data. More particularly, accuracy rates for the distinction of falls from other movement types can reach 98.2%. In previous experiments the training model was built using only acceleration data, whereas in the proposed system the training model also contains visual information as described in Section 5. The WEKA tool [16] has been used for the development and evaluation of the SVM model. Classification data are provided in the following form:
Fall_ID X Y Z BBx BBy BBWidth BBHeight
where X, Y and Z are the acceleration data as retrieved from the sensors, BBx and BBy are the upper left coordinates of the bounding box that tracks the patient's body, and BBWidth and BBHeight are the width and height of the bounding box, respectively. Fall_ID represents whether the case is a fall incident (true or false).
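As an illustration of how such feature vectors can feed an SVM classifier, the sketch below uses the scikit-learn library in Python in place of the WEKA toolkit employed by the authors; the toy samples, kernel choice and parameters are assumptions for demonstration only.

# Illustrative sketch: train an SVM on combined acceleration and bounding-box
# features of the form (X, Y, Z, BBx, BBy, BBWidth, BBHeight) -> Fall_ID.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Hypothetical samples; in the described system these come from the sensors and the tracker.
X = np.array([
    [0.05, -0.98, 0.02, 120,  80,  60, 150],   # walking
    [1.80, -0.30, 1.20, 115, 260, 170,  70],   # fall
    [0.02, -1.01, 0.01, 130,  85,  62, 148],   # walking
    [1.60, -0.10, 1.40, 110, 250, 165,  75],   # fall
] * 5)
y = np.array([0, 1, 0, 1] * 5)                 # Fall_ID: 1 = fall, 0 = no fall

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=10)     # 10-fold cross-validation, as in the paper
print("mean accuracy: %.3f" % scores.mean())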
To evaluate the efficiency and accuracy of the presented platform in the context of detecting patient falls, a number of experiments were conducted; a volunteer wearing the sensor devices described in Section 4 was recorded walking and falling in different locations and in different ways, while an overhead camera was capturing visual frames. Motion data and body shape features are utilized for creating the classification models. The 10-fold cross-validation methodology has been used to verify each model's accuracy and performance.
Apart from the detection of a fall, the system is also capable of estimating the severity of the incident: when a fall has been estimated based on the sensor and visual data, the standard deviation of the accelerometer values and of the visual bounding box values is calculated for the next 15 seconds. A specific threshold has been determined for each value, which determines the severity of the incident according to the following table:
Table 1. Decision matrix for the severity of a fall incident based on standard deviations of
movement data and body bounding box coordinates after a fall has occurred.
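A minimal sketch of this severity estimation step is given below; the 15-second window follows the text, but the threshold values and the two severity labels are hypothetical, since the actual decision matrix of Table 1 is not reproduced here.

import numpy as np

# Hypothetical thresholds; Table 1 defines the actual decision matrix.
ACC_STD_THRESH = 0.15    # below this, the accelerometers show almost no movement
BBOX_STD_THRESH = 5.0    # below this, the tracked bounding box is almost static

def fall_severity(acc_window, bbox_window):
    # acc_window:  N x 3 acceleration samples over the 15 s after the detected fall.
    # bbox_window: M x 4 bounding-box samples (x, y, width, height) over the same window.
    # Returns 'severe' when both modalities indicate inactivity, 'mild' otherwise.
    acc_std = np.std(acc_window, axis=0).mean()
    bbox_std = np.std(bbox_window, axis=0).mean()
    if acc_std < ACC_STD_THRESH and bbox_std < BBOX_STD_THRESH:
        return "severe"   # patient appears inactive after the fall
    return "mild"         # patient moved or got up after the fall

# Example: an almost motionless patient after a detected fall.
acc = np.random.normal(0.0, 0.01, size=(150, 3))
bbox = np.tile([110, 250, 165, 75], (150, 1)) + np.random.normal(0, 0.5, (150, 4))
print(fall_severity(acc, bbox))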
7 Conclusions
In this paper an enhanced patient fall detection system has been proposed that combines both motion and visual information. Accelerometer data obtained through wireless sensors, in conjunction with body shape features acquired by visual tracking, are evaluated through an SVM model, and a detection of a fall incident is then generated. In addition, by combining the motion data and the visually obtained movement of the body after the fall, the severity of the fall can also be estimated, alerting treatment personnel appropriately.
References
1. Noury N., Herve T., Rialle V., Virone G., Mercier E., Morey G., Moro A., Porcheron
T., “Monitoring behavior in home using a smart fall sensor and position sensors”, In
Proc. 1st Annual International Conference on Microtechnologies in Medicine and Biol-
ogy, pp. 607-610, Oct. 2000.
2. Noury N., “A smart sensor for the remote follow up of activity and fall detection of the
elderly”, In Proc. 2nd Annual International Conference on Microtechnologies in Medi-
cine and Biology, pp. 314-317, May 2002.
3. Prado M., Reina-Tosina J., Roa L., “Distributed intelligent architecture for falling detec-
tion and physical activity analysis in the elderly”, In Proc. 24th Annual IEEE EMBS
Conference, pp. 1910-1911, Oct. 2002.
4. Fukaya K., “Fall detection sensor for fall protection airbag”, In Proc. 41st SICE Annual
Conference, pp. 419-420, Aug. 2002.
5. Nait-Charif, H. McKenna, S.J., “Activity summarisation and fall detection in a suppor-
tive home environment”, In Proc. 17th International Conference on Pattern Recognition
ICPR 2004, pp. 323-236, Aug. 2004.
6. Hwang, J.Y. Kang, J.M. Jang, Y.W. Kim, H.C., “Development of novel algorithm and
real-time monitoring ambulatory system using Bluetooth module for fall detection in
the elderly”, In Proc. 26th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, pp. 2204-2207, 2004.
7. Shuangquan Wang, Jie Yang, Ningjiang Chen, Xin Chen, Qinfeng Zhang, “Human ac-
tivity recognition with user-free accelerometers in the sensor networks”, In Proc. Inter-
national Conference on Neural Networks and Brain, pp. 1212-1217, Oct. 2005.
8. S.-G. Miaou, Pei-Hsu Sung, Chia-Yuan Huang, “A Customized Human Fall Detection
System Using Omni-Camera Images and Personal Information”, In Proc. 1st
Transdisciplinary Conference on Distributed Diagnosis and Home Healthcare, pp.39-
42, 2006.
9. Allen, F.R. Ambikairajah, E. Lovell, N.H. Celler, B.G., “An Adapted Gaussian Mixture
Model Approach to Accelerometry-Based Movement Classification Using Time-
Domain Features”, In Proc. 28th Annual International Conference of the IEEE Engi-
neering in Medicine and Biology Society, pp. 3600-3603, Aug. 2006.
10. The Sentilla Perk Pervasive Computing Kit, http://www.sentilla.com/perk.html
11. R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems”, Trans-
actions of the ASME – Journal of Basic Engineering, Vol.82, Series D, pp.35-45, 1960.
12. Pnevmatikakis and L. Polymenakos, “Robust Estimation of Background for Fixed Cam-
eras,” International Conference on Computing (CIC2006), Mexico City, Mexico, 2006.
13. Stauffer and W. E. L. Grimson, “Learning patterns of activity using real-time tracking,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp.
747-757, 2000.
14. Charalampos Doukas, Ilias Maglogiannis, Philippos Tragkas, Dimitris Liapis, Gregory
Yovanof, “Patient Fall Detection using Support Vector Machines”, In Proc. 4th IFIP
Conference on Artificial Intelligence Applications & Innovations (AIAI), Sept. 19-21,
Athens, Greece.
15. Charalampos Doukas, Ilias Maglogiannis, “Advanced Patient or Elder Fall Detection
based on Movement and Sound Data”, presented at 2nd International Conference on
Pervasive Computing Technologies for Healthcare 2008.
16. Ian H. Witten and Eibe Frank (2005) "Data Mining: Practical machine learning tools
and techniques", 2nd Edition, Morgan Kaufmann, San Francisco, 2005.
17. The JAVA ME Platform, http://java.sun.com/javame/index.jspf
An Evolutionary Technique for Medical
Diagnostic Risk Factors Selection
1 Medical Informatics Laboratory, Democritus University of Thrace, GR-68100, Alexandroupolis, Hellas
[email protected] [email protected]
2 Hellenic Open University, GR-26222, Patras, Greece
3 Department of Forestry & Management of the Environment and Natural Resources, Democritus University of Thrace, GR-68200, Orestiada, Hellas
[email protected]
4 Medical Physics Laboratory, Democritus University of Thrace, GR-68100, Alexandroupolis, Hellas
[email protected]
Abstract This study proposes an Artificial Neural Network (ANN) and Genetic Algorithm model for diagnostic risk factor selection in medicine. A medical disease prediction may be viewed as a pattern classification problem based on a set of clinical and laboratory parameters. Probabilistic Neural Networks (PNNs) were used to address medical disease prediction, and a Genetic Algorithm (GA) was used for pruning the PNN. The implemented GA searched for the optimal subset of factors that fed the PNN, so as to minimize the number of neurons in the ANN input layer and the Mean Square Error (MSE) of the trained ANN at the testing phase. Moreover, the available data was processed with Receiver Operating Characteristic (ROC) analysis to assess the contribution of each factor to medical diagnosis prediction. The obtained results of the proposed model are in accordance with the ROC analysis, so a number of diagnostic factors in the patient's record can be omitted without any loss in clinical assessment validity.
1 Introduction
Mantzaris, D., Anastassopoulos, G., Iliadis, L. and Adamopoulos, A., 2009, in IFIP International
Federation for Information Processing, Volume 296; Artificial Intelligence Applications and
Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 195–203.
Artificial Neural Networks (ANNs), Genetic Algorithms (GAs) and Fuzzy Logic are non-symbolic approaches of AI.
ANNs have proven to be a powerful tool for solving a variety of problems [1, 2]. The problem categories where ANNs have been applied include bioengineering [3], signal processing [4], environmental subjects [5, 6] and other fields.
While the results of medical statistical models are satisfactory, there are non-linear models that may contribute to the enhancement of medical decision support. In particular, ANNs have the ability to correlate input data to corresponding output data. ANNs have effectively contributed to disease diagnosis [7-14] in fields such as oncology [9], pediatrics [10], urology [10, 11], pediatric surgery [12], orthopedics [14], etc.
In the ANN design stage, the implementation of the best ANN architecture to solve a real-world problem is relatively complex. A neural network with few neurons implies inadequate learned knowledge, while a large one leads to poor generalization ability, presenting overfitting [15]. In early works the investigation of an appropriate ANN structure was carried out by a trial and error method. However, in the last few years, more efficient methods for automatically designing ANN architectures have been developed.
This study presents a GA for ANN pruning and for the detection of the smallest essential set of diagnostic factors for ANN training. The obtained ANN architecture uses a diminished number of diagnostic factors and has an evolved structure, without any loss in terms of its performance and functionality.
Apart from the traditional methods (clinical, laboratory, imaging), abdominal pain diagnosis can also be supported by numerical scoring systems, fuzzy logic techniques, etc. [16]. The aim of the implemented method was the detection of the essential diagnostic factors for the construction of PNNs that estimate abdominal pain in childhood.
The obtained results were compared with the Receiver Operating Characteristic (ROC) analysis outcome. There was a high level of convergence between the two methods: some of the diagnostic factors proved to be essential for clinical evaluation and prognosis, whereas some other factors could be excluded during clinical estimation without any loss in the ANNs' prognostic accuracy. From a technical point of view, the detection of the essential diagnostic factors enables the design of an ANN with a simpler structure and improved performance, because ANN training is based on smaller data sets.
Both ANNs and GAs are inspired by biological processes. However, ANNs' learning is based on individuals (phenotypic learning), while GAs adapt a population to a changing environment (genotypic learning). In recent years, a large body of literature has appeared on combining GAs and ANNs to produce evolutionary ANNs with improved performance and simplified architectures [17].
The trial and error method, which is used for the implementation of ANNs, is computationally complex and does not ensure that the proposed architecture is the best one. These restrictions led to the development of more efficient methods, which are divided into constructive and pruning (destructive) algorithms [18, 19]. Constructive algorithms start with a minimum number of neurons and dynamically add neurons, generating more complex ANNs, in order to achieve a satisfactory solution. On the other hand, a pruning algorithm starts from a maximal ANN and cuts nodes, layers and synaptic connections during training based on the already collected data.
The constructive and destructive algorithms may investigate restricted topological subsets instead of the complete space of ANN architectures [20]. GAs are another approach for solving the problem of ANN implementation. GAs are an optimization tool suited to search spaces of great complexity and large size. The determination of the optimal ANN structure that solves a specific problem can be achieved by a GA search [15, 21, 22].
The Area Under the Curve (AUC) of the ROC is computed as:

$$\mathrm{AUC} = \frac{1}{n_{+}\, n_{-}} \left( \sum_{x_{+} > x_{-}} 1 \;+\; \frac{1}{2} \sum_{x_{+} = x_{-}} 1 \right), \qquad x \in \{\text{set of all test results}\} \qquad (3)$$
where x+ is each value of a diagnostic factor for cases with positive actual states, x− is each value of the same diagnostic factor for cases with negative actual states, n+ is the sample size of the data set (D+) containing cases with positive actual states, and n− is the sample size of the data set (D−) containing cases with negative actual states.
The AUC can be utilized as an estimator of the discriminatory performance of the diagnostic factors of a system. The AUC for a system without resolving power equals 0.5, while the AUC for a system with perfect discrimination equals 1. It is clear that the AUC for a system with satisfactory resolving power lies between 0.5 and 1. The greater the AUC, the better the discrimination ability of the system [24].
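For illustration, the pairwise form of the AUC above can be computed directly, as in the following Python sketch; the toy factor values and class labels are assumptions and not data from the study.

import numpy as np

def auc(pos_values, neg_values):
    # Area under the ROC curve via pairwise comparison: count the pairs in which
    # the positive case scores higher than the negative one; ties count 1/2.
    pos = np.asarray(pos_values, dtype=float)
    neg = np.asarray(neg_values, dtype=float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical values of one diagnostic factor for positive and negative cases.
print(auc([7.2, 8.1, 6.9, 9.0], [5.5, 6.9, 4.8]))   # value between 0 and 1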
4 Data Collection
The abdominal pain data was obtained from the Pediatric Surgery Clinical Infor-
mation System of Alexandroupolis’ University Hospital, Greece. The appendicitis
diagnosis is based on 15 clinical and biochemical factors which are sex, age, relig-
ion, demographic data, duration of pain, vomitus, diarrhea, anorexia, tenderness,
rebound, leucocytosis, neutrophilia, urinalysis, temperature and constipation. The
possible diagnosis stages are discharge, observation, no findings, focal appendicitis, phlegmonous or suppurative appendicitis, gangrenous appendicitis and peritonitis. These factors and the diagnosis stages are described in detail in Table 2 and Table 1 of [12], respectively. As presented in [12], the possible stages of abdominal pain examination are seven, of which four demand operative treatment and three are referred for conservative treatment.
The present study is based on a data set consisting of 516 cases, of which 422 (81.78%) were normal and 94 (18.22%) underwent operative treatment. The pruned data set used in the proposed model was divided into a set of 400 (77.52%) records for the construction of the PNNs and another set of 116 (22.48%) records for the assessment of the PNNs' performance.
5 Evolutionary PNN Architecture
The aim of the present study is the reduction of the diagnostic factors for abdominal pain in childhood and the determination of the essential diagnostic factors for an evolved ANN implementation.
A Probabilistic Neural Network (PNN) was the ANN architecture selected among a great variety of ANN topologies for use in this study. A PNN, which is based on Parzen's Probability Density Function (PDF) estimator, is a three-layer feed-forward network consisting of an input layer, a radial basis layer and a competitive layer [25].
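To make the PNN structure concrete, the following Python sketch implements a minimal Parzen-window classifier of this kind; the spread value and the toy data are illustrative assumptions and not the settings used in the study.

import numpy as np

class PNN:
    # Minimal Probabilistic Neural Network: the radial basis layer stores the
    # training patterns, and the competitive layer picks the class with the
    # largest summed Gaussian activation.
    def __init__(self, spread=1.0):
        self.spread = spread

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)   # squared distances
        act = np.exp(-d2 / (2.0 * self.spread ** 2))               # radial basis layer
        scores = np.stack([act[:, self.y == c].sum(1) for c in self.classes], 1)
        return self.classes[np.argmax(scores, 1)]                  # competitive layer

# Toy example with two diagnostic factors and two diagnosis classes.
X = [[1, 0], [1, 1], [0, 0], [5, 4], [6, 5], [5, 5]]
y = [0, 0, 0, 1, 1, 1]
print(PNN(spread=1.0).fit(X, y).predict([[0.5, 0.5], [5.5, 4.5]]))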
As mentioned in Section 4, the number of abdominal pain diagnostic factors equals 15. The possible combinations of input data subsets are given by the following mathematical notation:

$$C = \frac{N!}{k!\,(N-k)!} \qquad (4)$$

where N is the maximum number of diagnostic factors (in this problem N = 15) and k is an integer denoting the number of diagnostic factors of each input data subset. The values of k range from 1 to 15.
The chromosome length of each individual was equal to the total number of diagnostic factors, so that the population of the GA consisted of binary strings of 15 bits. The GA used is two-objective; it has to search for diagnostic data sets that at the same time: (a) minimize the number of diagnostic factors used during the training phase, and therefore the number of nodes in the ANN input and hidden layers, and (b) minimize the Mean Square Error of the testing phase. For this purpose, the following fitness function was used:

$$f = \mathrm{MSE} + \frac{I}{N} \qquad (5)$$

where I is the number of ANN input nodes and N is the maximum number of diagnostic factors in the original, full-sized training data set.
To find the essential diagnostic factors that can be used for the evolutionary PNN, different experiments were performed using the scattered, single-point, two-point and uniform crossover operators, as well as the Gaussian and uniform mutation binary GA operators.
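The following Python sketch illustrates the kind of binary-chromosome GA described above, using the fitness function of Eq. (5); the population size, the operators shown and the stand-in MSE evaluation are illustrative assumptions (a real run would train and test a PNN for each chromosome).

import random

N = 15                                  # total number of diagnostic factors

def evaluate_mse(mask):
    # Placeholder for training and testing a PNN on the selected factors;
    # here an MSE value is only simulated for the sake of the example.
    selected = sum(mask)
    return 0.01 * max(1, selected) * random.random()

def fitness(mask):
    I = sum(mask)                       # number of PNN input nodes
    return evaluate_mse(mask) + I / N   # Eq. (5): f = MSE + I/N

def crossover(a, b):                    # single-point crossover
    p = random.randrange(1, N)
    return a[:p] + b[p:]

def mutate(mask, rate=0.05):            # uniform bit-flip mutation
    return [1 - bit if random.random() < rate else bit for bit in mask]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(30)]
for _ in range(50):                     # generations
    pop.sort(key=fitness)
    parents = pop[:10]                  # simple truncation selection
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(20)]
best = min(pop, key=fitness)
print("selected factors:", [i for i, bit in enumerate(best) if bit])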
6 Experimental Results
The aim of the present study is the reduction of the abdominal pain diagnostic factors based on a GA search for the essential and optimal combination of factors necessary for PNN construction.
Whereas the PNN architecture is constrained by the available features of the specific problem, the width of the Gaussian curve calculated for each probability density function has to be defined. In the present study, this spread factor varied from 0.1 to 100.
An extensive investigation was performed to assess the PNNs' performance for training and testing with the full-sized data set consisting of all 15 diagnostic factors. The obtained results for the best-implemented PNNs are presented in Table 1. The radbas and compet transfer functions were used for the hidden and output layers, respectively.
The number of neurons in the input and hidden layers of the PNNs was specified by the number of diagnostic parameters and the number of cases in the training set, respectively. The number of neurons in the PNNs' output layer is seven and is based on the coding of the possible diagnoses according to [12]. The values of spread and the MSE for the PNNs with the best performance are recorded in the 1st and 2nd columns of Table 1, respectively.
Table 1. Spread and MSE for the best implemented PNNs with full-sized data set.
Spread MSE
0.1 0.0025
1.0 0.0025
10.0 0.14
100 0.5375
The results obtained by executing the GA are presented in Table 2. The 2nd column of this table depicts the fitness value of the optimal individual, the 3rd column the MSE of the optimal individual, and the 4th column the independent diagnostic factors that were used as inputs for PNN construction. The forenamed values were recorded for the same values of spread as in Table 1.
As presented in Table 2, the GA managed to converge to PNNs that used 7 or 8 of the 15 diagnostic factors. At the same time, the MSE of the PNNs that were trained with the pruned input data sets is significantly decreased compared with the MSE of the PNNs trained with all 15 diagnostic factors. Consequently, the diagnostic ability of the evolved PNNs is improved in comparison with the fully-trained PNNs. The decrease for the genetically trained PNNs is of the order of 3.15% to 18.9%, depending on the value of spread. The diagnostic factors which are most effective in PNN training and testing are Demographic Data, Duration of Pain, Leucocytosis, Neutrophilia and Temperature.
7 ROC Curves
The available data set of appendicitis records was processed by ROC analysis. The aim of this processing was to evaluate the importance of each diagnostic factor for appendicitis estimation. The obtained results are summarized in Table 3. The diagnostic factors are presented in the 2nd column of Table 3, while the Area Under the Curve of the ROC for each factor is recorded in the 3rd column. Each PNN was evolved genetically using different crossover and mutation operators, so Table 2 records two different results for each spread value.
Table 2. MSE of PNNs trained with pruned sets of diagnostic input factors

Spread   Fitness Value   MSE          Diagnostic Factors
0.1      0.03196126      0.00242131   Demographic data, Vomitus, Rebound, Leucocytosis, Neutrophilia, Temperature, Constipation
0.1      0.03753027      0.00000000   Demographic data, Duration of pain, Anorexia, Tenderness, Leucocytosis, Temperature, Constipation
1.0      0.00823245      0.00242131   Age, Duration of pain, Diarrhea, Anorexia, Tenderness, Leucocytosis, Neutrophilia, Temperature
1.0      0.00435835      0.00000000   Age, Religion, Duration of pain, Tenderness, Leucocytosis, Neutrophilia, Temperature, Constipation
10.0     0.16319613      0.11380145   Age, Demographic data, Duration of pain, Diarrhea, Leucocytosis, Neutrophilia, Temperature
100      1.75484262      0.43583535   Sex, Age, Demographic data, Duration of pain, Tenderness, Leucocytosis, Urinalysis
100      1.58002421      0.47457627   Sex, Demographic data, Vomitus, Anorexia, Tenderness, Leucocytosis, Urinalysis
The AUC is an important statistic of ROC analysis. An area value larger than 0.5 indicates the importance of a diagnostic factor for appendicitis estimation. As shown in Table 3, the most important diagnostic factors are Religion, Demographic Data, Vomitus, Anorexia, Tenderness, Rebound, Leucocytosis, Neutrophilia and Temperature.
Sex, Duration of Pain and Constipation are parameters without significant contribution to appendicitis prediction, as their AUC values are equal to 0.5. Age and Urinalysis are diagnostic factors that do not have the ability to discriminate true positive patients, as their values are 0.357 and 0.073, respectively, which are smaller than 0.5.
The results obtained by the genetically evolved PNNs and by the ROC analysis were further processed. The aim of this processing was to investigate the convergence of the two approaches in terms of the proposed diagnostic factors for appendicitis estimation. It is concluded that Tenderness, Leucocytosis, Neutrophilia and Temperature are essential factors for appendicitis prediction, so it is strongly recommended that these parameters be recorded for each patient.
8 Discussion
This study presents a specific GA for evolving the subsets of patients' data that are used as inputs for PNN construction and testing. After an adequate number of steps of genetic evolution, the GA converged to diagnostic factor subsets that consisted of 7 or 8 out of a total of 15 diagnostic factors. The evolved PNNs outperformed the fully-trained PNNs in terms of the MSE. Consequently, the implementation of PNNs based on specifically selected diagnostic factors, instead of all of them, resulted in an increase of the PNNs' performance and prognostic ability, while at the same time the training procedure was sped up. The comparison of the genetically evolved PNNs' outcomes with those of the ROC analysis leads to the conclusion that the medical diagnostic factors present a high level of redundancy and overlap. Therefore, a number of the diagnostic factors for appendicitis estimation may be omitted with no compromise to the fidelity of the clinical evaluation.
Acknowledgment We are grateful to the Pediatric Surgeon of the Pediatric Surgery Clinic of the
University Hospital of Alexandroupolis, Greece for their valuable contribution.
References
1. Dayhoff J., and DeLeo J., “Artificial Neural Networks Opening the Black Box”, CANCER
Supplement, 2001, Vol. 91, No. 8, pp. 1615-1635.
2. Huang D., “A Constructive Approach for Finding Arbitrary Roots of Polynomials by Neural
Networks”, IEEE Transactions on Neural Networks, 2004, Vol. 15, No. 2, pp. 477 - 491.
3. Levano M., and Nowak H., “Application of New Algorithm, Iterative SOM, to the Detection
of Gene Expressions”, Proceedings 10th International Conference on Engineering Applica-
tions of Neural Networks, 2007, Thessaloniki, Greece, pp. 141-147.
4. Zaknick A., “Introduction to the Modified Probabilistic Neural Network for General Signal
Processing”, IEEE Transactions on Signal Processing, 1998, Vol. 46, No.7, pp. 1980-1990.
5. Iliadis L., "An Intelligent Artificial Neural Network Evaluation System Using Fuzzy Set
Hedges: Application in Wood Industry" Proceedings 19th IEEE Annual International Confer-
ence on Tools with Artificial Intelligence (ICTA), pp. 366-370.
6. Paschalidou A., Iliadis L., Kassomenos P., and Bezirtzoglou C., “Neural Modelling of the
Tropospheric Ozone Concentrations in an Urban Site”, Proc. 10th International Conference
on Engineering Applications of Neural Networks, 2007 Thessaloniki, Greece, pp. 436-445.
7. Keogan M., Lo J., Freed K., Raptopoulos V., Blake S., Kamel I., Weisinger K., Rosen M.,
and Nelson R., “Outcome Analysis of Patients with Acute Pancreatitis by Using an Artificial
Neural Network”, Academic Radiology, 2002, Vol. 9, No. 4, pp. 410-419.
8. Brause R., Hanisch E., Paetz J., and Arlt B., “Neural Networks for Sepsis Prediction - the
MEDAN-Project1”, Journal für Anästhesie und Intensivbehandlung, 2004, Vol. 11, No. 1,
pp. 40-43.
9. Gómez-Ruiz J., Jerez-Aragonés J., Muñoz-Pérez J., and Alba-Conejo E., “A Neural Network
Based Model for Prognosis of Early Breast Cancer”, Applied Intelligence, 2004, Vol. 20, No.
3, pp. 231-238.
10. Mantzaris D., Anastassopoulos G., Tsalkidis A., and Adamopoulos A., “Intelligent Prediction
of Vesicoureteral Reflux Disease”, WSEAS Transactions on Systems, 2005, Vol. 4, Issue 9,
pp. 1440-1449.
11. Anagnostou T., Remzi M., and Djavan B., “Artificial Neural Networks for Decision-Making
in Urologic Oncology”, Reviews In Urology, 2003, Vol. 5, No. 1, pp.15-21.
12. Mantzaris D., Anastassopoulos G., Adamopoulos A., and Gardikis S., “A Non-Symbolic Im-
plementation of Abdominal Pain Estimation in Childhood”, Information Science, 2008, Vol.
178, pp. 3860-3866.
13. Economou G., Lymperopoulos D., Karavatselou E., and Chassomeris C., "A New Concept
Toward Computer-Aided Medical Diagnosis - A Prototype Implementation Addressing Pul-
monary Diseases" IEEE Transactions in Information Technology in Biomedicine, 2001 Vol.
5, Issue 1, pp. 55-66.
14. Mantzaris D., Anastassopoulos C., and Lymperopoulos K., “Medical Disease Prediction Us-
ing Artificial Neural Networks”, Proceedings IEEE International Conference on BioInformat-
ics and BioEngineering, 2008, Athens, Greece
15. Georgopoulos E., Likothanassis S., and Adamopoulos A., “Evolving Artificial Neural Net-
works Using Genetic Algorithms”, Neural Network World, 2000, Vol. 4, pp.565-574
16. Blazadonakis M., Moustakis V., and Charissis G., “Deep Assessment of Machine Learning
Techniques Using Patient Treatment in Acute Abdominal Pain in Children”, Artificial Intelli-
gence in Medicine, 1996, Vol. 8, pp. 527-542
17. Branke J., "Evolutionary Algorithms for Neural Network Design and Training", Proceedings
1st Nordic Workshop on Genetic Algorithms and its Applications, 1995, Vaasa, Finland.
18. Yao X., “Evolving Artificial Neural Networks”, Proceedings of the IEEE, 1999, Vol. 87, No.
9, pp. 1423-1447.
19. Burgess Ν., “A Constructive Algorithm that Converges for Real-Valued Input Patterns”, In-
ternational Journal on Neural Systems, 1994 Vol. 5, No. 1, pp. 59-66.
20. Angeline P., Sauders G., and Pollack J., “An Evolutionary Algorithm that Constructs Recur-
rent Neural Networks”, IEEE Transaction on Neural Networks, 1994 Vol. 5, pp. 54-65.
21. Adamopoulos A., Georgopoulos E., Manioudakis G., and Likothanassis S., “An Evolutionary
Method for System Structure Identification Using Neural Networks”, Neural Computation
’98, 1998
22. Billings S., and Zheng G., “Radial Basis Function Network Configuration Using Genetic Al-
gorithms”, Neural Networks, 1995, Vol. 8, pp. 877-890.
23. Swets J., "Signal Detection Theory and Roc Analysis in Psychology and Diagnostics: Col-
lected Papers", Lawrence Erlbaum Associates, 1996, Mahwah NJ.
24. Streiner D., and Cairney J., “What’s Under the ROC? An Introduction to Receiver Operating
Characteristics Curves”, The Canadian Journal of Psychiatry, 2007, Vol. 52, No. 2, pp. 121-
128.
25. Parzen E., “On Estimation of a Probability Density Function and Mode”, Annals of Mathe-
matical Statistics, 1962, Vol. 33, No.3, pp. 1065-1076.
Mining Patterns of Lung Infections in Chest
Radiographs
Abstract Chest radiography is a reference standard and the initial diagnostic test
performed in patients who present with signs and symptoms suggesting a pulmo-
nary infection. The most common radiographic manifestation of bacterial pulmo-
nary infections is foci of consolidation. These are visible as bright shadows inter-
fering with the interior lung intensities. The discovery and assessment of bacterial infections in chest radiographs is a challenging computational task. It has been addressed only to a limited extent, as it is subject to image quality variability, content diversity, and deformability of the depicted anatomic structures. In this paper, we pro-
pose a novel approach to the discovery of consolidation patterns in chest radio-
graphs. The proposed approach is based on non-negative matrix factorization
(NMF) of statistical intensity signatures characterizing the densities of the de-
picted anatomic structures. Its experimental evaluation demonstrates its capability
to recover semantically meaningful information from chest radiographs of patients
with bacterial pulmonary infections. Moreover, the results reveal its comparative
advantage over the baseline fuzzy C-means clustering approach.
1 Introduction
Tsevas, S., Iakovidis, D.K. and Papamichalis, G., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 205–214.
The early detection of such infections as well as the choice of the appropriate an-
tibiotic treatment can be life-saving especially for the critically ill patients. To this
end, a computational approach that would be capable of automatically discovering
patterns of infections and antibiotic prescription from patients’ health records
would constitute a valuable tool to the community [3].
Patients’ health records may include both structured and unstructured data,
digital signals and images. In the case of chest patients, chest radiographs provide
substantial indications on the presence of a pulmonary infection. The most com-
mon radiographic manifestation of bacterial pulmonary infections is foci of con-
solidation. These are visible as bright shadows interfering with the interior lung
intensities, which include intensities of the lung parenchyma and intensities of superimposed structures of the thoracic cavity such as the ribs and the mediastinum.
The diversity and the complexity of the visual content of the lung fields as well as
the quality variability induced by the variable parameters of the radiation expo-
sure, make its medical interpretation a challenging task. This task has motivated
many researchers to develop computational methods for automatic lung field de-
tection and analysis [4][5].
Current lung field analysis methods include size measurements of structures of
the thoracic cavity [6], detection of the ribs [7], lung nodule detection [8], whereas
fewer methods have been proposed for mining radiographic patterns associated
with the presence of pulmonary infections [9]. Mining patterns of pneumonia and
severe acute respiratory syndrome (SARS) has also been in the scope of contem-
porary research. In [10] a supervised approach using intensity histograms and second-order statistical features has been proposed for mining pneumonia and SARS patterns, whereas most recently the use of wavelet-based features has proved useful for the detection of radiographic patterns of childhood pneumonia under a supervised classification framework [11].
In contrast to the former methods this paper proposes an unsupervised ap-
proach to the discovery of consolidation patterns associated with bacterial pulmo-
nary infections. Such an approach does not take into account any information ex-
tracted from previous images, thus avoiding the need for feature normalization
between images. We use statistical intensity signatures characterizing the densities
of the several anatomic structures depicted in chest radiographs to recover seman-
tically meaningful information regarding the consolidation patterns. This is
achieved by a clustering approach based on Non-negative Matrix Factorization
(NMF) that involves cluster merging. The results obtained are compared with
those obtained with the fuzzy C-means (FCM) clustering approach [12].
The rest of this paper consists of three sections. Section 2 provides a descrip-
tion of the proposed methodology, section 3 presents the results of its experimen-
tal application on a set of high-resolution chest radiographs, and section 4 summa-
rizes the conclusions that can be derived from this study.
2 Methodology
The proposed approach approximates the non-negative data matrix V (whose columns are the sub-image signatures) by the product of two non-negative matrices W and H:

$$V \approx \tilde{V} = W \times H \qquad (1)$$

$$\min_{W,H} \; \|V - WH\|_F^2, \qquad W, H \ge 0 \qquad (2a)$$

The minimization is performed with the multiplicative update rules [14]:

$$W_{ir} \leftarrow W_{ir}\,\frac{(V H^T)_{ir}}{(W H H^T)_{ir}} \qquad (2b)$$

$$H_{rj} \leftarrow H_{rj}\,\frac{(W^T V)_{rj}}{(W^T W H)_{rj}} \qquad (2c)$$

The factors are subsequently normalized as

$$\tilde{H}^T = H^T D_H^{-1} \qquad (4b)$$

$$S = D_W D_H \qquad (4c)$$

where $D_W$ and $D_H$ are diagonal matrices whose diagonal elements are given by the $L_p$-norm:

$$(D_W)_{rr} = \|\tilde{c}_r\|_p, \qquad (D_H)_{rr} = \|\tilde{h}_r\|_p \qquad (5)$$

For the Euclidean distance case ($L_2$-norm), $\|\tilde{c}_r\|_2 = 1$ and $\|\tilde{h}_r\|_2 = 1$, and due to the non-negativity of the data this is just the condition that the columns sum to 1. Thus, $D_W$ contains the column sums of W, and $D_H$ contains the column sums of $H^T$.
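A minimal NumPy sketch of these multiplicative updates and of the column normalization is given below; the matrix sizes, iteration count and random initialization are illustrative assumptions.

import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    # Basic NMF with the multiplicative updates of Eqs. (2b)-(2c),
    # followed by the column normalization of Eqs. (4b)-(4c).
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        W *= (V @ H.T) / (W @ H @ H.T + eps)    # Eq. (2b)
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # Eq. (2c)
    d_w = W.sum(axis=0)                         # column sums of W
    d_h = H.sum(axis=1)                         # column sums of H^T
    W_norm = W / d_w                            # columns of W sum to 1
    H_norm = (H.T / d_h).T                      # columns of H^T sum to 1
    S = np.diag(d_w * d_h)                      # S = D_W D_H
    return W_norm, H_norm, S

# Toy example: 20 histogram signatures of length 16 factorized into r = 4 bases.
V = np.abs(np.random.default_rng(1).random((16, 20)))
W, H, S = nmf(V, r=4)
print(W.shape, H.shape)                         # bases and membership probabilities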
In this paper, we apply NMF as a clustering technique to extract consolidation
patterns from radiographic images. A radiographic image is divided into a set of
non-overlapping sub-images which are subsequently clustered into an even number r of clusters. Some of these clusters will correspond to patterns of normal lung
parenchyma and the rest will correspond to consolidation patterns. The sub-
images are represented by intensity histogram signatures characterizing the densi-
ties of the anatomic structures depicted in a chest radiograph [21]. Finally, the r
clusters are dyadically merged down to two clusters based on the similarity of
their centroids. Considering that the consolidations are dense foci in the lungs
which are normally filled with air, we assume that the cluster with the lower inten-
sity centroid corresponds to the patterns of the normal lung field parenchyma, and
that the cluster with the higher intensity centroid corresponds to the consolidation
patterns.
For the evaluation of the proposed approach, we used a collection of chest radio-
graphs from twenty patients. The radiographic images were 8-bit grayscale with a
size of 2816×2112 pixels. The lung fields were isolated using the methodology
proposed in [5] and were divided into 32×32 sub-images. From each sub-image an
intensity histogram signature was calculated so as to build the data matrix that was
used as input to the NMF. A representative chest radiograph along with the iso-
lated lung fields are illustrated in Fig. 1.
Fig. 1. (a) A chest radiographic image, (b) the lung fields isolated from (a). The magnified area
illustrates the sub-images considered. Consolidation areas are visible as bright shadows within
the lung fields.
Each column in the initial data matrix V corresponds to the histogram information of one window. The resulting non-negative matrices, W and H, represent the feature bases and their membership probabilities, respectively. Both W and H were normalized by following the procedure described in the methodology section. Such a normalization allows, on the one hand, the bases to be easily compared with the initial feature vectors and, on the other hand, leads to an easier interpretation of the probability of a signature belonging to a certain cluster or category.
To evaluate the performance of the proposed approach we applied the pro-
posed as well as the conventional NMF-based approach on each radiographic im-
age. Prior to the application of the algorithms, images were annotated by an expert
so as to provide us with the necessary ground truth information. The results ob-
tained are compared with the performance obtained with the fuzzy c-means
(FCM) algorithm which is considered as a baseline method [12].
The performance measures considered in this study are sensitivity, specificity and accuracy [23]:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (6)$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (7)$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (8)$$

where the TP (true positive), TN (true negative), FP (false positive) and FN (false negative) counts are estimated by comparing the cluster assignments of the sub-images with the expert ground-truth annotations.
Fig. 2. Mining patterns of infections with the conventional NMF (left) and FCM (right) ap-
proaches using two clusters. (a) NMF first cluster, (b) FCM first cluster, (c) NMF second cluster,
(d) FCM second cluster.
The formation of the clusters from the dataset derived from the image in Fig. 1 is illustrated in Fig. 2 for the two-cluster case, for both NMF (on the left) and FCM (on the right). The figure shows that NMF achieves better separation of the consolidated areas (top left image), in contrast to FCM, which fails to separate the consolidated areas from the normal ones. However, the separation of the consolidated from the normal areas is not always feasible using two clusters. An example is provided in Fig. 3, where the clustering of the lung fields of Fig. 3(a) into two clusters results in an accuracy that does not exceed 40%. To cope with this problem, clustering into more than two clusters followed by a cluster merging scheme is proposed.
According to this approach the image signatures are initially clustered into an even number of clusters. Considering that the NMF bases actually represent the cluster centroids, the clusters are dyadically merged down to two based on the similarity of their centroids. Since the signatures are intensity histograms, the similarity is evaluated by the histogram intersection metric [22].
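The sketch below illustrates this dyadic merging step using the histogram intersection similarity; the centroids are assumed to be the normalized NMF bases, and the weighted averaging of merged centroids is a design choice of the sketch rather than necessarily the authors' exact procedure.

import numpy as np

def histogram_intersection(h1, h2):
    # Similarity between two normalized intensity histograms [22].
    return np.minimum(h1, h2).sum()

def merge_clusters(centroids, members):
    # Repeatedly merge the two most similar clusters until two remain.
    # centroids: list of normalized histograms (one per cluster).
    # members:   list of lists of sub-image indices belonging to each cluster.
    centroids = [np.asarray(c, float) for c in centroids]
    members = [list(m) for m in members]
    while len(centroids) > 2:
        best, pair = -1.0, (0, 1)
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                s = histogram_intersection(centroids[i], centroids[j])
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        w_i, w_j = len(members[i]), len(members[j])
        merged = (w_i * centroids[i] + w_j * centroids[j]) / (w_i + w_j)
        merged_members = members[i] + members[j]
        centroids = [c for k, c in enumerate(centroids) if k not in pair] + [merged]
        members = [m for k, m in enumerate(members) if k not in pair] + [merged_members]
    return centroids, members

# Toy example with four clusters of 8-bin histogram centroids.
cents = [np.full(8, 1/8),
         np.r_[np.full(4, 0.20), np.full(4, 0.05)],
         np.r_[np.full(4, 0.05), np.full(4, 0.20)],
         np.r_[np.full(4, 0.22), np.full(4, 0.03)]]
mems = [[0, 1], [2, 3], [4], [5, 6]]
merged_cents, merged_members = merge_clusters(cents, mems)
print(len(merged_cents), merged_members)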
Fig. 3. Mining patterns of infections with the conventional NMF clustering approach using two
clusters. (a) The lung fields to be clustered, (b) first cluster (consolidation areas), c) second clus-
ter (normal lung field parenchyma).
Fig. 4. Mining patterns of infections with the proposed approach. The resulting NMF bases after clustering the dataset derived from Fig. 3(a) into 4 clusters (top row), the formation of the 4 clusters (middle row) and the resulting merged clusters (bottom row).
The average results estimated from the application of the proposed approach
on the whole dataset are summarized in Fig. 5. The average accuracy achieved by
the NMF followed by cluster merging is 75%, whereas the accuracy achieved by
the direct NMF clustering into two clusters is significantly lower, reaching only 35%. It can be noticed that the average accuracy obtained with the FCM is poorer. Although its sensitivity is much higher than that obtained with the NMF after cluster merging, NMF provides much higher specificity and accuracy, leading to an overall better performance. As illustrated in the figure, the accuracy obtained with the FCM is about 29% in the direct clustering case and 61% for the
cluster merging case. Comparing the results of the proposed approach with the re-
sults of the FCM clustering with and without cluster merging as illustrated in Fig.
5, it becomes evident that the proposed approach is more suitable than the FCM
for the particular clustering task.
Fig. 5. Performance of the proposed cluster merging method in terms of sensitivity, specificity
and accuracy.
References
1. Irene M. Mullins, Mir S. Siadaty, Jason Lyman, Ken Scully, Carleton T. Garrett, W.
Greg Miller, Rudy Muller, Barry Robson, Chid Apte, Sholom Weiss, Isidore Rigoutsos,
Daniel Platt, Simona Cohen, William A. Knaus, Data mining and clinical data reposito-
ries: Insights from a 667,000 patient data set, Computers in Biology and Medicine, Volume 36, Issue 12, December 2006, Pages 1351-1377
2. D.L. Smith, J. Dushof, E.N. Perencevich, A.D. Harris, S.A. Levin, “Persistent Coloni-
zation and the spread of antibiotic resistance in nosocomial pathogens: Resistance is a
regional problem,” PNAS, vol. 101, no. 10, pp. 3709-3714, Mar. 2004
3. C. Lovis, D. Colaert, V.N. Stroetmann, “DebugIT for Patient Safety - Improving the
Treatment with Antibiotics through Multimedia Data Mining of Heterogeneous Clinical
Data,” Stud Health Technol. Inform., vol. 136, 641-646, 2008
4. B.V. Ginneken, B.T.H. Romeny, and M.A. Viergever, “Computer-Aided Diagnosis in
Chest Radiography: A Survey,” IEEE Transactions Medical Imaging, vol. 20, no. 12,
pp. 1228-1241, Dec. 2001
5. D.K. Iakovidis, and G. Papamichalis, “Automatic Segmentation of the Lung Fields in
Portable Chest Radiographs Based on Bézier Interpolation of Salient Control Points,”
in Proceedings IEEE International Conference on Imaging Systems and Techniques,
Chania, Greece, 2008, pp. 82-87
6. I.C. Mehta, Z.J. Khan, and R.R. Khotpa, Volumetric Measurement of Heart Using PA
and Lateral View of Chest Radiograph, S. Manandhar et al. (Eds.): AACC 2004, LNCS
3285, pp. 34–40, 2004
7. M. Loog, B.van Ginneken: Segmentation of the posterior ribs in chest radiographs using
iterated contextual pixel classification. IEEE Transactions Medical Imaging 25(5): 602-
611 (2006)
8. Giuseppe Coppini, Stefano Diciotti, Massimo Falchini, N. Villari, Guido Valli: Neural
networks for computer-aided diagnosis: detection of lung nodules in chest radiograms.
IEEE Transactions on Information Technology in Biomedicine 7(4): 344-357 (2003)
9. B.V. Ginneken, S. Katsuragawa, B.T.H. Romeny, K. Doi, and M.A. Viergever, “Auto-
matic Detection of Abnormalities in Chest Radiographs Using Local Texture Analy-
sis”, IEEE Transactions Medical Imaging, vol. 21, no. 2, pp. 139-149, Feb. 2002
10. X. Xie, X. Li, S. Wan, and Y. Gong, Mining X-Ray Images of SARS Patients, G.J.
Williams and S.J. Simoff (Eds.): Data Mining, LNAI 3755, pp. 282–294, 2006
11. L.L.G. Oliveira, S. Almeida e Silva, L.H. Vilela Ribeiro, R. Maurício de Oliveira,
C.J. Coelho and A.L.S.S. Andrade, "Computer-Aided Diagnosis in Chest Radiography
for Detection of Childhood Pneumonia", International Journal of Medical Informatics,
vol. 77, no. 8, pp. 555-564, 2007.
12. C. Bezdek, J. Keller, R. Krisnapuram, and N.R. Pal. Fuzzy Models and Algorithms for
Pattern Recognition and Image Processing. Kluwer Academic Publishers, 1999
13. Paatero and U. Tapper. Positive matrix factorization: a nonnegative factor model with
optimal utilization of error estimates of data values. Environmetrics, 5(1):111–126,
1994
14. D.D. Lee and H.S. Seung, “Algorithms for non-negative matrix factorization,” Ad-
vanced Neural Information Processing Systems, 13, 2000, pp. 556–562
15. Chris D, XiaoFeng H, Horst D.S. On the equivalence of nonnegative matrix factoriza-
tion and spectral clustering, Proceedings SIAM International Conference on Data Min-
ing (SDM’05), 2005: 606–610
16. C. Ding, X. He, H.D. Simon, “On the equivalence of nonnegative matrix factorization
and spectral clustering,” Proceedings SIAM International Conference on Data Mining,
Newport Beach, CA, April 2005, pp. 606–610
17. Xiong H.L, Chen X.W. Kernel-based distance metric learning for microarray data clas-
sification, BMC Bioinformatics, 2006, 7
18. Xu W, Liu X, Gong Y. Document clustering based on non-negative matrix factoriza-
tion, Proceedings ACM Conference Research Development in Information Retrieval,
2003: 267–273
19. Yuan G, George C. Improving molecular cancer class discovery through sparse non-
negative matrix factorization, Bioinformatics, 2005, vol 21, no.21:3970–3975
20. Ding, C., Li, T., Peng, W. On the equivalence between Non-negative Matrix Factoriza-
tion and Probabilistic Latent Semantic Indexing (2008) Computational Statistics and
Data Analysis, 52 (8), pp. 3913-3927.
21. Novelline, R.A. (1997) Squires’s Fundamentals of Radiology. Cambridge: Harvard
University Press
22. M.J. Swain, D.H. Ballard, Color Indexing. Int. J. Computer Vision, Vol. 7, No. 1, pp.
11–32, Nov. 1991
23. Han, J., Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann
Publishers.
Computational Modeling of Visual Selective
Attention Based on Correlation and
Synchronization of Neural Activity.
1 Department of Computer Science, 2 Department of Psychology, University of Cyprus, 75 Kallipoleos, 1678, P.O. Box 20537, Nicosia, Cyprus.
1 Introduction
Due to the great amount of sensory stimulation that a person experiences at any given point of conscious life, it is practically impossible to integrate all information that is available to the senses into a single perceptual event. This implies that a mechanism must be present in the brain to selectively focus its resources on specific information. This mechanism, known as attention, can be described as the
Neokleous, K.C., Avraamides, M.N. and Schizas, C.N., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 215–223.
process by which information is passed on to a higher level of processing either
through relative amplification of the neural activity that represents the “to be at-
tended” stimuli or by suppression of the distracting stimuli, or both.
Attention can be guided by top-down and bottom-up processing, as cognition can be regarded as a balance between internal motivations and external stimulation. Volitional shifts of attention, or endogenous attention, result from "top-down" signals originating in the prefrontal cortex, while exogenous attention is guided by salient stimuli through "bottom-up" signals in the visual cortex (Corbetta and Shulman, 2002).
Previous literature on attention suggests that the attention selection mechanism
functions in two hierarchical stages: An early stage of parallel processing across
the entire visual field that operates without capacity limitation, and a later limited-
capacity stage that deals with selected information in a sequential manner. When
items pass from the first to the second stage of processing, they are typically considered as selected (Treisman and Gelade, 1980).
Previous research suggests that attention is based on two processes. The first is
known as “biased competition” (Moran and Desimone, 1985) and it is supported
by findings from studies with single-cell recordings. These studies have shown
that attention enhances the firing rates of the neurons that represent the attended
stimuli and suppresses the firing rates of the neurons that encode the unattended
stimuli. The second process, which refers to the synchronization of neural activity
during the deployment of attention, is supported by studies showing that neurons
selected by attention have enhanced gamma-frequency synchronization (Gruber et
al., 1999; Steinmetz et al., 2000; Fries et al., 2001). For example, in a study by
Fries et al. (2001) the activity in area V4 of the brain of macaque monkeys was re-
corded while the macaques attended relevant stimuli. Results showed increased
gamma frequency synchronization for attended stimuli compared to the activity
elicited by distractors. A recent study by Buehlmann and Deco (2008) provided
evidence that attention is affected by both biased competition and the synchroniza-
tion of neural activity.
A computational model for biased competition has been proposed by Deco and
Rolls (2005). In this model Deco and Rolls have shown that competition between
pools of neurons combined with top-down biasing of this competition gives rise to
a process that can be identified with attention. However, it should be pointed out
that this model only considered rate effects while gamma synchronization was not
addressed.
In the present report, we propose a computational model for endogenous and
exogenous visual attention that is based on both the rate and the synchronization
of neural activity. The basic functionality of the model relies on the assumption
that the incoming visual stimulus will be manipulated by the model based on the
rate and temporal coding of its associated neural activity. The rate associated with
a visual stimulus is crucial in the case of exogenous attention since this type of at-
tention is mainly affected by the different features of the visual stimuli. Stimuli
with more salient features gain an advantage for passing through the second stage
of processing and subsequently for accessing working memory. On the other hand,
endogenous or top-down attention is mainly affected by the synchronization of in-
coming stimuli with the goals that guide the execution of a task. These goals are
most likely maintained in the prefrontal cortex of the brain. The presence of a close link between endogenous attention and synchronization is supported by many recent studies (Niebur et al 2002, Gross et al 2004). For example, Saalmann
et al (2007) recorded neural activity simultaneously from the posterior parietal
cortex as well as an earlier area in the visual pathway of the brain of macaques
during the execution of a visual matching task. Findings revealed that there was
synchronization of the timing activities in the two regions when the monkeys se-
lectively attended to a location. Thus, it seems that parietal neurons which pre-
sumably represent neural activity of the endogenous goals may selectively in-
crease activity in earlier sensory areas. In addition, the adaptive resonance theory
by Grossberg (1999) implies that temporal patterning of activities could be ideally
suited to achieve matching of top–down predictions with bottom–up inputs, while
Engel et al (2001) in their review have noted that “If top–down effects induce a
particular pattern of subthreshold fluctuations in dendrites of the target population,
these could be ‘compared’ with temporal patterns arising from peripheral input by
virtue of the fact that phase-shifted fluctuations will cancel each other, whereas in-
phase signals will summate and amplify in a highly nonlinear way, leading to a sa-
lient postsynaptic signal” (p.714). Finally, it should be noted that Hebbian learning
suggests that action potentials that arrive synchronously at a neuron summate to
evoke larger postsynaptic potentials than do action potentials that arrive asynchro-
nously; thus, synchronous action potentials have a greater effect at the next proc-
essing stage than do asynchronous action potentials.
A mechanism for selective attention based on the rate and synchronization of
the neural activity for incoming stimuli is thus used in the proposed model. The
model has been implemented computationally to simulate the typical data from
"the attentional blink" phenomenon (Raymond and Shapiro, 1992).
The Attentional Blink (AB) is a phenomenon observed using the rapid serial
visual presentation (RSVP) paradigm. In the original experiment by Raymond and
Shapiro (1992), participants were requested to identify two letter targets T1 and
T2 among digit distractors with each stimulus appearing for about 100ms (Figure
1.a). Results revealed that the correct identification of T1 impaired the identifica-
tion of T2 when T2 appeared within a brief temporal window of 200-500 ms after
T1. When T2 appeared outside this time window it could be identified normally
(Figure 1.b series 1.).
Another important finding from the AB task is that when T1 is not followed by a mask/distractor, the AB effect is significantly reduced. That is, if the incoming stimuli at t = 200 ms (lag 2) and/or at lag 3 (t = 300 ms) are replaced by a blank, then the AB curve takes the form shown by series 2 and 3 in Figure 1.
Figure 1. Presentation of the RSVP for the "attentional blink" experiment (Figure 1.a) and the typical attentional blink curve with no blanks (red series), with a blank at lag 1 (green series) and a blank at lag 2 (black series), based on the data of Raymond and Shapiro (1992) (Figure 1.b).
Event-related potentials (ERPs) are signals that measure the electrical activity of
neuronal firing in the brain relative to events such as the presentation of stimuli.
Over the years a number of ERP components related to attention have been identi-
fied in the literature.
The first distinguishable physiological signals are observed around 130-150 ms post-stimulus (the P1/N1 signals). Most likely, these signals correspond to the initial processing in the visual cortex and reflect early pre-frontal activation by the incoming visual stimuli. At about 180-240 ms post-stimulus the P2/N2 signals are observed, which have become clearer in recent years with the use of MEG (Ioannides and Taylor, 2003). These signals have been proposed as control signals for the movement of attention (Hopf et al., 2000, Taylor 2002). More specifically,
the CODAM model of attention that is proposed by Taylor (2002) follows a con-
trol theory approach and uses the N2 signal as the signal from the controller that
modulates the direction of the focus of attention. Moreover, in Bowman and Wy-
ble’s (2007) Simultaneous Type Serial Token (ST2) model, when the visual sys-
tem detects a task-relevant item, a spatially specific Transient Attentional En-
hancement (TAE), called the blaster, is triggered. In the ST2 model the presence of
a correlation between the blaster and a component of the P2/N2 signal is also hy-
pothesized. The P300 ERP component which is present at about 350–600 ms post-
stimulus is taken to be an index of the availability for report of the attention-
amplified input arriving from earlier sensory cortices to the associated working
memory sensory buffer site. Thus, access to the working memory sensory site is
expected to occur in the specific time window. Finally, the N4 component which
is recorded at around 400 ms is related to semantic processing indicating percep-
tual awareness.
The chronometric analysis of the ERPs occurring during the attentional blink has revealed some important observations. Most importantly, in the case where the second target was not perceived, the P1/N1 components and the N400 component (the latter considered an index of semantic processing) were still obtained, even though the N2 and P300 were no longer observed (Sergent et al 2005). Thus, one possible explanation for the classic U-shaped curve of Figure 1.b (series 1), in which identification of the second target reaches a minimum at around 300 ms, is that an early attention-processing component of the second target (possibly the N2 of T2) is inhibited by a late component of the first target (the P300 of T1) (Vogel et al 1998, Fell et al 2001).
4 Proposed Model
The proposed model is a two-stage model that, in contrast to other computational models, contains a correlation control module (Figure 3). That is, in the case of endogenous attention tasks, the functioning of the model is based on the synchronization of incoming stimuli with information held in the endogenous goals module, which has probably been initialized by information from long-term memory (Engel et al 2001).
In the conducted simulations, each stimulus that enters the visual field is coded by determining the rate of the related neuron spikes (relatively enhanced by the salience filters) as well as the exact timing of the spikes. This means that both of these characteristics are considered in the race between the different visual stimuli to access working memory, as initially implemented in a computational model by Niebur and Koch (1994).
As shown in Figure 3, a visual stimulus initially moves from the inputs module
into the first stage of parallel processing. In this stage, competition among all sti-
muli, implemented as lateral inhibition, exerts the first impact on each of the neu-
ral responses. Following that, as the neural activity continues up through the visual
hierarchy, the information from the visual stimuli passes through the semantic cor-
relation control module. During this stage of processing, a coincidence detection mechanism similar to the procedure discussed by Mikula and Niebur (2008) measures the degree of correlation between the visual stimuli and the endogenous goals (in the case of top-down attention).
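As a rough illustration of such a coincidence-detection step (a minimal sketch under our own assumptions, not the authors' implementation nor the exact mechanism of Mikula and Niebur), the fragment below scores how closely the spike times of an incoming stimulus match the spike times held as the endogenous-goal template; the window width is arbitrary.

```python
import numpy as np

def coincidence_score(stim_spikes_ms, goal_spikes_ms, window_ms=5.0):
    """Fraction of stimulus spikes that fall within +/- window_ms of a goal spike."""
    stim = np.asarray(stim_spikes_ms, dtype=float)
    goal = np.asarray(goal_spikes_ms, dtype=float)
    if stim.size == 0 or goal.size == 0:
        return 0.0
    # for every stimulus spike, distance to the nearest goal spike
    nearest = np.min(np.abs(stim[:, None] - goal[None, :]), axis=1)
    return float(np.mean(nearest <= window_ms))

# a target-like stimulus locks to the goal template, a distractor does not
goal = [10, 30, 50, 70, 90]
print(coincidence_score([11, 29, 52, 71, 88], goal))  # high score -> amplify
print(coincidence_score([17, 38, 61, 79, 97], goal))  # low score  -> no boost
```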
This procedure provides an advantage (in the case of amplification) to the selected neural activity for accessing working memory. However, the initiation of a signal by the correlation control module (which may be related to the N2pc signal, a component of the N2/P2 complex) can be represented by the combined firing of a neural network. Thus, it is appropriate to consider a relative refractory period each time the correlation control module "fires" or activates the specific signal for amplification or inhibition. Consequently, the refractory period of the correlation control module, combined with the lateral inhibition between the RSVP items, causes the attenuation of the attentional blink in the case in which the distractors are replaced by blanks; both of these mechanisms are incorporated in the proposed model (series 2 and 3 in Figure 1.b).
Finally, after the processing of the neural activity of each incoming stimulus, a specific working memory node is excited and produces inhibition towards the other working memory nodes. After a specific threshold is passed, the working memory node fires an action potential, simulating the initiation of the P300 signal, which represents perceptual awareness of the specific visual stimulus as well as inhibition of the following signal from the correlation control module (possibly the N2/P2 signals of the following stimulus) if it appears within that specific time window.
It should also be noted that even stimuli with no correlation at all to the endogenous goals could gain access to working memory sites, provided that their response has been enhanced sufficiently by the salience filters at the first stages of processing. Thus, the model allows for exogenous shifts of attention.
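A schematic sketch of this working-memory race is given below (our own illustration with hypothetical parameter values; the real model's dynamics are richer): each node integrates its attention-weighted drive, inhibits the others, and the first threshold crossing stands in for the P300-like event.

```python
import numpy as np

def wm_race(inputs, threshold=1.0, inhibition=0.3, dt=0.01, t_max=2.0):
    """Return the index and time of the first working-memory node to reach threshold.

    inputs: drive to each node (already scaled by salience / correlation gain).
    """
    drive = np.asarray(inputs, dtype=float)
    act = np.zeros_like(drive)
    for step in range(int(t_max / dt)):
        # each node integrates its own drive minus inhibition from the other nodes
        act += dt * (drive - inhibition * (act.sum() - act))
        act = np.clip(act, 0.0, None)
        winner = int(np.argmax(act))
        if act[winner] >= threshold:
            return winner, step * dt   # crossing time ~ simulated P300 onset
    return None, None

print(wm_race([1.4, 0.9, 0.6]))   # the strongly amplified target wins the race
```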
5 Simulations and Results
Inside the endogenous goals module, the pattern representing the targets is stored. Therefore, when a visual stimulus enters, a coincidence detector mechanism measures the degree of correlation and fires a corresponding signal. For the simulations, T1 was always presented at time t = 0 and T2 at each of the following time lags. For each time lag at which T2 was presented, the simulations were run 50 times for the three different cases: when distractors occupy all the available positions, masking the targets; with a blank at lag 1; and with a blank at lag 2. The simulation results compared to the experimental results can be seen in Figure 5 below.
[Figure 5: percent report of T2 given T1 as a function of stimulus onset asynchrony (each lag = 100 ms), for the no-blank, blank-at-lag-1 and blank-at-lag-2 conditions; (a) attentional blink simulation data, (b) attentional blink curve from real data.]
Figure 5. Comparison between simulation data (5.a) and experimental data (5.b).
6 Discussion
References
1. Bowman H., Wyble B. (2007). "The Simultaneous Type, Serial Token Model of Temporal Attention and Working Memory." Psychological Review, Vol. 114
2. Buehlmann A., Deco G (2008). “The Neuronal Basis of Attention: Rate versus Synchro-
nization Modulation”. The Jour. of Neuros. 28(30)
3. Corbetta, M., Shulman, G.L. (2002).”Control of goal-directed and stimulus-driven atten-
tion in the brain”. Nature R. Neuroscience 3:201-215.
4. Deco G, Rolls ET (2005). “Neurodynamics of biased competition and cooperation for at-
tention: a model with spiking neurons”. J. Neurophysi.94
5. Engel A. K., Fries P., Singer W.(2001) “Dynamic predictions: Oscillations and synchrony
in top-down processing." Nature Reviews Neuroscience, Volume 2, pp. 704-716
6. Fries P, Reynolds JH, Rorie AE, Desimone R (2001). “Modulation of oscillatory neuronal
synchronization by selective visual attention”. Science 291:1560-1563.
7. Grossberg, S. (1999). “The link between brain learning, attention, and consciousness”.
Conscious. Cogn 8, 1-44
8. Gross J., Schmitz F., Schnitzler I. et al (2004). “Modulation of long-range neural syn-
chrony reflects temporal limitations of visual attention in humans.” PNAS August 31,
2004 vol. 101 no. 35 pp13050–13055
9. Gruber T, Muller MM, Keil A, Elbert T (1999). “Selective visual-spatial attention alters
induced gamma band responses in the human EEG”. Clin Neurophysiol 110:2074-2085.
10. Hopf, J.-M., Luck, S.J., Girelli, M., Hagner, T., Mangun, G.R., Scheich, H., Heinze, H.-
J., (2000). “Neural sources of focused attention in visual Search”. Cereb. Cortex 10,
1233–1241.
11. Ioannides, A.A., Taylor, J.G., (2003). “Testing models of attention with MEG”. In: Pro-
ceedings IJCNN’03. pp. 287–297.
12. Mikula S., Niebur E., (2008). “Exact Solutions for Rate and Synchrony in Recurrent
Networks of Coincidence Detectors.” Neural Computation.20
13. Moran J, Desimone R (1985). “Selective attention gates visual processing in the extrastri-
ate cortex”. Science 229:782-784.
14. Niebur E., Hsiao S.S., Johnson K.O., (2002) “Synchrony: a neuronal mechanism for at-
tentional selection?” Cur.Op. in Neurobio., 12:190-194
15. Niebur E, Koch C (1994). “A Model for the Neuronal Implementation of Selective Visual
Attention Based on Temporal Correlation Among Neurons”. Journal of Computational
Neuroscience 1, 141-158.
16. Raymond JE, Shapiro KL, Arnell KM (1992). "Temporary suppression of visual processing in an RSVP task: an attentional blink?" Journal of Experimental Psychology: Human Perception and Performance 18 (3): 849-860
17. Saalmann Y.B., Pigarev I.N., et al. (2007). “Neural Mechanisms of Visual Attention:
How Top-Down Feedback Highlights Relevant Locations” Science 316 1612
18. Sergent C., Baillet S. & Dehaene S. (2005). "Timing of the brain events underlying access to consciousness during the attentional blink." Nat Neurosci, Volume 8, Number 10, pp. 1391-1400.
19. Steinmetz PN, Roy A, et al.(2000). “Attention modulates synchronized neuronal firing in
primate somatosensory Cortex”. Nature 404:187-190.
20. Taylor J.G., Rogers M. (2002). “A control model of the movement of attention”. Neural
Networks 15:309-326
21. Treisman, A., & Gelade, G. (1980). “A feature-integration theory of attention”. Cognitive
Psychology, 12, 97-136.
22. Vogel E.K., Luck S.J., Shapiro K.L., (1998). “Electrophysiological evidence for a post-
perceptual locus of suppression during the attentional blink.” J. Exp. Psychol. Hum. Per-
cept. Perform. 24 pp.1656-1674.
MEDICAL_MAS: an Agent-Based System for
Medical Diagnosis
Mihaela Oprea
1 Introduction
This paper presents an application of the multi-agent approach in the medical domain. MEDICAL_MAS is an agent-based system under development for medical diagnosis, whose main purpose is to act as a decision support instrument for physicians during the diagnosis of their patients. The pa-
per is organized as follows. Section 2 briefly presents the agent-based approach
and some applications in the medical domain. The architecture of the agent-based
system MEDICAL_MAS for diagnosis and treatment is described in section 3. A
case study in the cardiology area is shown in section 4. The last section concludes
the paper and highlights the future work.
The agent-based approach can significantly enhance our ability to model, design and build complex (distributed) software systems [7]. An agent can be viewed as a hardware or software entity that has properties such as autonomy and flexibility (social ability, reactivity, pro-activity). From the viewpoint of Artificial Intelligence (AI), an agent is a computer system that, apart from the above-mentioned properties, is either conceptualized or implemented using concepts that are more usually applied to humans (e.g. knowledge, belief, desire, intention, obligation, learning, locality, adaptation, believability, emotion) [8]. Other properties that an agent might have are mobility, veracity, benevolence, rationality, etc. Also, an important remark is that an agent is embedded in an environment in which it lives and interacts with other entities (e.g. agents, legacy software). So, an agent can be viewed as living in a society in which it has to respect
the rules of that society. A multi-agent system (MAS) is a particular type of dis-
tributed intelligent system in which autonomous agents inhabit a world with no
global control or globally consistent knowledge. The characteristics of multi-agent
systems include: autonomy, communication, coordination, cooperation, security
and privacy, mobility, openness, concurrency, distribution. Most of these charac-
teristics need special approaches when designing and implementing a MAS. Usu-
ally, the MAS is based on a multi-agent infrastructure that enables and rules inter-
actions, and acts as a middleware layer that support communication and
coordination activities. A major characteristic of the agent technology is the high
heterogeneity that means agent model heterogeneity, language heterogeneity, and
application heterogeneity. This characteristic has to be manageable by using ap-
propriate models and software toolkits.
Several projects are developing agent-based systems in the medical domain.
One type of application is the agent-based modelling of a hospital, viewed as an
electronic organization. An example of such framework for modelling a hospital
as a virtual multi-agent organization is given in [6]. In some research projects in-
telligent agents are used for the distribution of human tissues or for the assignment
of transplantable organs (see e.g. [6]). A MAS architecture for monitoring medical
protocols was developed under the SMASH research project [9]. The architecture of a MAS for monitoring medical protocols is presented in [3]. Another application that could be modelled by the multi-agent approach is the automated clinical guideline in critical care (e.g. an extension of the SmartCare system described in [10], which uses a knowledge-based approach). Other potential applications that we have identified include the development of personal agents that assist physicians during their work, the management of a hospital (viewed as a virtual organisation), patient monitoring and control, wireless applications for ambulance assistance, and tutoring systems (e.g. in surgery). Practically, the main cate-
gories of applications are monitoring and control, diagnosis and treatment, plan-
ning and scheduling, tutoring. In this paper we shall focus on a multi-agent system
dedicated to medical diagnosis and treatment, MEDICAL_MAS, that could act as
a decision support instrument for physicians.
[Figure: the MEDICAL_MAS architecture. The physician, patient and nurse in the real world interact with agents A, B, C and D in a virtual world, which access relational databases (PDB, DDB with DiseasesInfo, MDB with MedicationsInfo) and exchange messages such as Ask(consultation, P), Ask(HistoryData, P), Search_Get(HistoryData, P), Ask(symptoms, P), Get(symptoms, P), Receive(HistoryData, P), Ask(MoreInfo, P), Receive(MoreInfo, P), Inform(diagnostic, P) and Update(HealthState, P).]
The system is developed in Zeus [11], and so far, we have implemented the ontol-
ogy (Fig. 3 shows a screenshot with a part of the cardiology ontology), and we are
currently developing a first version of the multi-agent system.
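For illustration only, the toy sketch below mimics the message flow suggested by the architecture figure; the agent names, performatives and database keys are our own assumptions and do not correspond to the actual Zeus implementation of MEDICAL_MAS.

```python
# Hypothetical, simplified message flow between agents, not the Zeus code.

class Agent:
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def send(self, other, performative, content):
        other.inbox.append((self.name, performative, content))

# hypothetical databases: patients (PDB), diseases (DDB), medications (MDB)
PDB = {"P": {"history": ["hypertension"], "symptoms": []}}

physician_assistant = Agent("A")   # interface agent of the physician
diagnosis_agent = Agent("B")       # reasoning agent
db_agent = Agent("D")              # wrapper of the relational databases

# Ask(consultation, P) -> Ask(HistoryData, P) -> Receive(HistoryData, P)
physician_assistant.send(diagnosis_agent, "Ask", ("consultation", "P"))
diagnosis_agent.send(db_agent, "Ask", ("HistoryData", "P"))
db_agent.send(diagnosis_agent, "Receive", ("HistoryData", PDB["P"]["history"]))

# Inform(diagnostic, P) closes the loop towards the physician
diagnosis_agent.send(physician_assistant, "Inform", ("diagnostic", "to be confirmed"))
print(physician_assistant.inbox)
```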
[Figure: case study interaction involving agents B, C and the databases, with the finding EKG: Tachycardia.]
In the example we have taken the case of a patient who has no symptoms but has problems with blood pressure. After the examination made by the physician with the use of a sphygmomanometer, the collected data are the following:
The agent-based approach is one of the most efficient AI technologies that could be applied with success in the medical field, especially given the expansion of Internet and web-based applications in this area. The paper presented a multi-agent system for medical diagnosis, MEDICAL_MAS, which is under development and was designed to act as a decision support instrument for physicians. So far, we have completed the analysis and design steps of the system development. Also, a first preliminary implementation in Zeus was made, and we have studied its use in different scenarios. As future work, we will complete the implementation phase and experiment with the system in additional case studies.
The main advantage of using MEDICAL_MAS is the possibility of making the best decision regarding the diagnosis of a particular patient, as well as giving the best medication in the specific context of the patient's health state. We have to note that, despite the fact that other artificial intelligence technologies (e.g. expert systems) have offered quite similar benefits in the medical domain, multi-agent systems technology has the advantages (naming just a few of them) of using distributed expert knowledge (searched and collected on line by the agents), of being pro-active (e.g. taking the initiative to do some search on the web), and of being capable of communication. As time is an important resource for everyone, especially for physicians, who are usually overloaded with work, MEDICAL_MAS could offer them an efficient decision-making instrument that can assist their work.
References
1 Introduction
Brain tumors are the second most common cancer of childhood, and comprise ap-
proximately 25% of all pediatric cancers. Over 3,400 children are diagnosed in the
U.S. each year; of that, about 2,600 will be under the age of 15. Brain tumors are
the leading cause of solid tumor cancer death in children; they are the third leading
cause of cancer death in young adults ages 20-39. Many researchers are looking for efficient and reliable ways to diagnose brain tumor types early and to detect related biomarkers through different biomedical images or biological data. Machine learning algorithms play a central role in the analysis of such heterogeneous biomedical/biological datasets for classifying different brain tumor types and detecting biomarkers.
Magnetic resonance spectroscopic (MRS) studies of brain biomarkers can pro-
vide statistically significant biomarkers for tumor grade differentiation and im-
proved predictors of cancer patient survival [1]. Instead of selecting biomarkers based on microscopic histology and tumor morphology, the introduction of microarray technology improves the discovery rates of different types of cancers by monitoring thousands of gene expressions in a parallel, rapid and efficient manner [23][8]. Because genes are aberrantly expressed in tumor cells, researchers can use their aberrant expression as biomarkers that correspond to and facilitate precise diagnoses and/or therapy outcomes of malignant transformation.
Different data sources are likely to contain different and partly independent in-
formation about the brain tumor. Combining those complementary pieces of in-
formation can be expected to enhance the brain tumor diagnosis and biomarkers
detection. Recently, several studies have attempted to correlate imaging findings
with molecular markers, but no consistent associations have emerged and many of
the imaging features that characterize tumors currently lack biological or molecu-
lar correlates [7][6]. Much of the information encoded within neuroimaging stud-
ies therefore remains unaccounted for and incompletely characterized at the mo-
lecular level [4]. This paper presents a computational and machine learning based
framework for integrating heterogeneous genome-scale gene expression and MRS
data to classify the different brain tumor types and detect biomarkers. We employ a wrapper method to integrate the feature selection process of both gene expression and MRS. Three popular feature selection methods, Relief-F (RF), Information Gain (IG) and χ2-statistic (χ2), are used to filter out the redundant features in both datasets. The experimental results show that our framework, using the combination of the two datasets, outperforms any individual dataset in sample classification accuracy, which is the standard validation criterion in cancer classification and biomarker detection. Our data fusion framework exhibits great potential for heterogeneous data fusion between biomedical image and biological datasets, and it could be extended to studies of other cancers.
2 Methodology
Advancements in the diagnosis and prognosis of brain tumor patients, and thus in
their survival and quality of life, can be achieved using biomarkers that facilitate
improved tumor typing. In our research, we apply state-of-the-art, high-resolution
magic angle spinning (HRMAS) proton (1H) MRS and gene transcriptome profil-
ing to intact brain tumor biopsies, to evaluate the discrimination accuracy for tu-
mor typing of each of the above methods separately and in combination. We used 46 samples of normal (control) and brain tumor biopsies, from which we obtained ex vivo HRMAS 1H MRS and gene expression data. The samples came from tissue biopsies taken from 16 different people. Out of the forty-six biopsies that were analyzed, 9 were control biopsies from epileptic surgeries and the remaining 37 were brain tumor biopsies. The tumor biopsies belonged to 5 different categories: 11 glioblastoma multiforme (GBM); 8 anaplastic astrocytoma (AA); 7 meningioma; 7 schwannoma; and 5 adenocarcinoma.
HRMAS 1H MRS. Magnetic resonance spectroscopic (MRS) studies of brain bio-
markers can provide statistically significant biomarkers for tumor grade differen-
tiation and improved predictors of cancer patient survival [1]. Ex vivo high-
resolution magic angle spinning (HRMAS) proton (1H) MRS of unprocessed tis-
sue samples can help interpret in vivo 1H MRS results, to improve the analysis of
micro-heterogeneity in high-grade tumors [3]. Furthermore, two-dimensional
HRMAS 1H MRS enables more detailed and unequivocal assignments of biologi-
cally important metabolites in intact tissue samples [16]. In Fig. 1, an ex vivo HRMAS 1H MR spectrum of a 1.9 mg anaplastic ganglioglioma tissue biopsy is shown, together with the metabolite values that correspond to each frequency of the spectrum. More detailed information can be found in [29].
Combining MRS and genomic data. While several studies have utilized MRS data
or genomic data to promote cancer classification, to date these two methods have
not been combined and cross-validated to analyze the same cancer samples.
Herein, we implement a combined quantitative biochemical and molecular ap-
proach to identify diagnostic biomarker profiles for tumor fingerprinting that can
facilitate the efficient monitoring of anticancer therapies and improve the survival
and quality of life of cancer patients. The MRS and genomic data strongly correlate, further demonstrating the biological relevance of MRS for tumor typing [21]. Also, the levels of specific metabolites, such as choline-containing metabolites, are altered in tumor tissue, and these changes correspond to the differential expression of Kennedy cycle genes, which are responsible for the biosynthesis of choline phospholipids (such as phosphatidylcholine) and are suggested to be altered with malignant transformation [18]. These data demonstrate the validity of our combined approach to produce and utilize MRS/genomic biomarker profiles to type brain tumor tissue.
Classification aims to build an efficient and effective model for predicting class
labels of unknown data. In our case the aim is to build a model that will be able to
discriminate between different tumor types given a set of gene expression values
or MRS metabolite values or a combination of them. Classification techniques
have been widely used in microarray analysis to predict sample phenotypes based
on gene expression patterns. Li et al. have performed a comparative study of mul-
ticlass classification methods for tissue classification based on gene expression
[12]. They have conducted comprehensive experiments using various classifica-
tion methods including SVM [22] with different multiclass decomposition tech-
niques, Naive Bayes [14], K-nearest neighbor and decision trees [20].
Since the main purpose of this study is not to assess the classification perform-
ance of different classification algorithms but to evaluate the potential gain of
combining more than one type of data for tumor typing, we only experimented
with Naïve Bayes (NB) and Support Vector Machines (SVM) with RBF kernel.
Another related task is feature selection, which selects a small subset of discriminative features. Feature selection has several advantages, especially for gene expression data. First, it reduces the risk of overfitting by removing noisy features, thereby improving the predictive accuracy. Second, the important features found can potentially reveal that specific chromosomal regions are consistently aberrant for particular cancers. There is biological support that a few key genetic alterations correspond to the malignant transformation of a cell [19]. Determining these regions from gene expression datasets allows high-resolution global gene expression analysis of the genes in these regions and thereby helps focus investigative efforts for understanding cancer on them.
Existing feature selection methods broadly fall into two categories, wrapper
and filter methods. Wrapper methods use the predictive accuracy of predetermined
classification algorithms, such as SVM, as the criteria to determine the goodness
of a subset of features [9]. Filter methods select features based on discriminant criteria that rely on the characteristics of the data, independent of any classification algorithm [5]. Filter methods are limited in scoring the predictive power of combined features, and have thus been shown to be less powerful in predictive accuracy than wrapper methods [2]. In our experiments we used feature selection methods from both major categories. We experimented with the Relief-F (RF), Information Gain (IG) and χ2-statistic (χ2) filter methods, and we also used wrapper feature selection for each of the two classification algorithms.
The basic idea of Relief-F [11] is to draw instances at random, compute their
nearest neighbors, and adjust a feature weighting vector to give more weight to
features that discriminate the instance from neighbors of different classes. Specifi-
cally, it tries to find a good estimate of the following probability to assign as the
weight for each feature f.
wf = P(different value of f | different class) - P(different value of f | same class)
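A compact sketch of this weighting scheme follows (a simplified, single-neighbour variant written for illustration; it assumes numeric features on comparable scales, unlike the full Relief-F with k neighbours):

```python
import numpy as np

def relief_weights(X, y, n_trials=100, rng=np.random.default_rng(0)):
    """Simplified Relief: one nearest hit and one nearest miss per sampled instance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.zeros(X.shape[1])
    for _ in range(n_trials):
        i = rng.integers(len(X))
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf
        same, diff = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(diff, dist, np.inf))   # nearest other-class instance
        # features that separate classes gain weight, features that vary within a class lose it
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_trials

X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.7], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])
print(relief_weights(X, y))   # the first feature should receive the larger weight
```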
Information Gain (IG) [15] measures the number of bits of information ob-
tained for class prediction by knowing the value of a feature. Let $\{c_i\}_{i=1}^{m}$ denote the set of classes and let $V$ be the set of possible values for feature $f$. The information gain of a feature $f$ is defined as

$$G(f) = -\sum_{i=1}^{m} P(c_i)\log P(c_i) + \sum_{\upsilon \in V}\sum_{i=1}^{m} P(f=\upsilon)\,P(c_i \mid f=\upsilon)\log P(c_i \mid f=\upsilon)$$

The χ2-statistic (χ2) [13] measures the lack of independence between $f$ and $c$. It is defined as follows:

$$\chi^2(f) = \sum_{\upsilon \in V}\sum_{i=1}^{m} \frac{\big(A_i(f=\upsilon) - E_i(f=\upsilon)\big)^2}{E_i(f=\upsilon)}$$

where $V$ is the set of possible values for feature $f$, $A_i(f=\upsilon)$ is the number of instances in class $c_i$ with $f=\upsilon$, and $E_i(f=\upsilon)$ is the expected value of $A_i(f=\upsilon)$, computed as $E_i(f=\upsilon) = P(f=\upsilon)\,P(c_i)\,N$, where $N$ is the total number of instances.
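Both filter criteria can be computed directly from contingency counts over a discretized feature, as in the following sketch (ours, mirroring the formulas above; the toy feature and labels are invented):

```python
import numpy as np

def info_gain(f_values, labels):
    """G(f) = H(C) - H(C | f) for a discrete feature."""
    labels = np.asarray(labels)
    def entropy(y):
        p = np.bincount(y) / len(y)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    f_values = np.asarray(f_values)
    h_c_given_f = 0.0
    for v in np.unique(f_values):
        mask = f_values == v
        h_c_given_f += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_c_given_f

def chi2_score(f_values, labels):
    """Chi-square statistic between a discrete feature and the class labels."""
    f_values, labels = np.asarray(f_values), np.asarray(labels)
    n = len(labels)
    score = 0.0
    for v in np.unique(f_values):
        for c in np.unique(labels):
            observed = np.sum((f_values == v) & (labels == c))
            expected = np.mean(f_values == v) * np.mean(labels == c) * n
            score += (observed - expected) ** 2 / expected
    return score

f = [0, 0, 1, 1, 1, 0]
y = [0, 0, 1, 1, 1, 0]
print(info_gain(f, y), chi2_score(f, y))   # perfectly informative feature
```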
3 Experimental Results
Initially we aimed at evaluating how well the classifiers would perform when ap-
plying them to each of our datasets separately. For that purpose we performed 10-
fold cross validation over our 46 samples by using a combination of feature selec-
tion and classification methods.
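A scikit-learn style sketch of this evaluation protocol is given below; the data matrix, class labels and parameter values are placeholders, not the study's actual dataset or settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# placeholder data: 46 samples, 16 metabolite features, 6 classes (not the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(46, 16))
y = rng.integers(0, 6, size=46)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in [("NB", GaussianNB()),
                  ("SVM-RBF", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")))]:
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {acc.mean():.3f}")
```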
Table 1 shows the classification accuracy of Naïve Bayes (NB) and SVM clas-
sifiers when using all 16 metabolites and when using a feature selection method.
Clearly, the wrapper feature selection method gives the best accuracy across all classifiers, followed by the case where we use all metabolites for classification. The SVM classifier using the RBF kernel consistently shows the best performance on this type of data. The decision to keep the top 6 metabolites when using the filter feature selection methods was based on the fact that this was the number of features selected by the wrapper feature selection method for each classification algorithm.
Table 1: Classification accuracy for the 6-class problem using MRS data only.
                     NB        SVM
All metabolites      70.21 %   72.34 %
χ2 (top 6)           46.81 %   51.06 %
IG (top 6)           46.81 %   51.06 %
RF (top 6)           63.83 %   68.09 %
Wrapper              72.34 %   78.72 %

Table 2: Classification accuracy using gene expression data only.
                     NB        SVM
χ2 + wrapper         82.98 %   46.81 %
IG + wrapper         80.85 %   61.70 %
RF + wrapper         61.70 %   57.44 %
[Bar charts of the classification accuracy of NB and SVM for each feature selection method, visualizing the results of Tables 1 and 2.]
For the problem of multiclass classification using gene expression data only, we followed a hybrid feature selection method combining filter and wrapper approaches. Using a wrapper approach to select a few top genes starting from an initial number of thousands of genes is computationally prohibitive, and using a filter approach to select fewer than 100 genes does not give good classification accuracy, because the final set of selected genes contains genes that are highly correlated to each other, thus giving a redundant set of genes. In our approach, we first selected the top 100 genes using filter feature selection and then used wrapper feature selection to further reduce the number of genes, usually resulting in a number between 5 and 15 genes.
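The following sketch illustrates this hybrid filter-plus-wrapper scheme under our own simplifying assumptions (a χ2 filter followed by greedy forward selection with Naïve Bayes; the data, thresholds and stopping rule are placeholders, not the study's exact procedure):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score, KFold
from sklearn.naive_bayes import GaussianNB

def hybrid_select(X, y, n_filter=100, max_genes=15):
    """Filter to n_filter genes, then greedy forward wrapper selection with NB."""
    # chi2 requires non-negative values, e.g. expression intensities
    keep = SelectKBest(chi2, k=min(n_filter, X.shape[1])).fit(X, y).get_support(indices=True)
    cv = KFold(n_splits=3, shuffle=True, random_state=0)
    selected, best_acc, improved = [], 0.0, True
    while improved and len(selected) < max_genes:
        improved = False
        for g in keep:
            if g in selected:
                continue
            acc = cross_val_score(GaussianNB(), X[:, selected + [g]], y, cv=cv).mean()
            if acc > best_acc:
                best_acc, best_g, improved = acc, g, True
        if improved:
            selected.append(best_g)
    return selected, best_acc

X = np.abs(np.random.default_rng(1).normal(size=(46, 500)))   # placeholder expression matrix
y = np.random.default_rng(2).integers(0, 6, size=46)
print(hybrid_select(X, y, n_filter=20, max_genes=5))
```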
The experimental results (Table 2) show that on this type of data Naïve Bayes was by far the best classification algorithm, obtaining a maximum accuracy of 82.98% when combined with χ2 and wrapper feature selection.
Finally, we tested the classification accuracy of our methods by using a combi-
nation of features from both gene expression and MRS data. For the MRS data we
tested the wrapper feature selection method which performed best in our previous
experiments. For the gene expression data we used the feature selection method
that we described above, i.e. combination of filter and wrapper feature selection.
After completing the feature selection stage separately for each of the datasets we
combined the selected features by putting them in the same feature vector space
and using that space for classification. Table 3 shows the classification accuracy results of our experiments. In most cases, the combination of feature sets from the two datasets yields significantly better accuracy than each of them separately.
In general Naïve Bayes gives the best performance with a maximum accuracy of
87.23% when using wrapper feature selection for metabolites and a combination
of Information Gain and wrapper feature selection for genes.
Table 3: Classification accuracy using a combination of features from gene expression and MRS datasets.
[Bar chart of NB and SVM classification accuracy (65%-90%) for the three combined settings: Metab.: wrapper / Genes: χ2+wrapper, Metab.: wrapper / Genes: IG+wrapper, and Metab.: wrapper / Genes: RF+wrapper.]
4 Conclusion
In this paper, we propose a machine learning based data fusion framework which
integrates heterogeneous data sources to type different brain tumors. Our method
employs real biomedical/biological MRS and genomic data and applies a combi-
nation of popular feature selection and classification methods to evaluate the tu-
mor type discrimination capabilities of the two datasets separately and together.
The feature selection process identifies a number of biomarkers from each dataset
which are subsequently used as features for the classification process. The experimental results show that our data fusion framework outperforms each individual dataset in the brain tumor multi-class classification problem. Since our framework is a general method, it can also be applied to the fusion of other biomedical and biological data for sample classification and biomarker detection.
References
Luciana S.G. Kobus, Fabrício Enembreck, Edson Emílio Scalabrin, João da Silva Dias, Sandra Honorato da Silva
1 Introduction
The public and supplementary Brazilian health programs are strongly based on a health practice focused on curing. This fact leads to a high degree of complexity of procedures, to high costs of health care, to failure to attend to the real needs of health clients for health promotion and disease prevention, and to difficulties in their access to health services. Such contradictory aspects interfere with the management of health organizations.
Case management is an opportunity to improve the population's health condition. This model could be improved by applying precise prediction models to identify health system users who could become high risk and high cost. The monitoring of these users could be the foundation for providing the needed health care at the
right time, in an efficient way, in an attempt to avoid the development or worsen-
ing of a disease.
In this paper, two automatic knowledge discovery techniques were used on pa-
tient data. The first aimed to find rules that could show important relationships
among variables that describe health care events. The second aimed to find precise
prediction models of high risk and high cost patients. To improve the accuracy of the generated prediction model and to diminish the negative impact of sampling techniques, ensembles of classifiers were used. These combining techniques were necessary because of the large amount of data. Combining classifiers generated from samples of the same database can significantly improve prediction precision, even when the samples are very small [10, 11]. Results from both methods are discussed to show that the generated patterns could be useful for the development of a high cost patient eligibility protocol, as well as for the definition of an efficient and individualized case management model for the population of the study. This paper is organized as follows. Section 2 presents the particularities of case management. Sections 3 and 4 present, respectively, the theoretical basis of symbolic machine learning and the methods. Finally, we present the conclusions of this study in Section 5.
2 Case Management
The primary goal of case management is the search for benefits for the health system user and his family, as well as for the health care providers and payers. These goals could be achieved by (a) the search for quality health care, such that the health care provided is appropriate and beneficial to the population; (b) the management of inpatient length of stay; (c) the control of the utilization of support resources through information systems based on protocols and decision support techniques; and (d) health care cost control that assures efficient results [6].
High risk patients correspond to almost 1% of a health care system's population. However, this small group of users accounts for 30% of the utilization of available resources ([3, 4, 5]). The users whose health care generates high costs for the health systems are those with the most complex profiles from both the clinical and the psychosocial points of view. Nearly 45% of these patients have five or more diagnoses describing their chronic condition, and each of these diagnoses could be the focus of a specific case management program [4]. However, the precise identification of a high cost patient is not a simple task, especially if it is done without the help of suitable computational tools. This is why we chose to apply different symbolic machine learning techniques.
3 Symbolic Machine Learning and Meta-Learning
Symbolic learning systems are used in situations where the obtained model should assume a comprehensible form. The induction of decision trees by the ID3 system [7] and the generation of production rules from decision trees [8] were important contributions of this field to Knowledge Discovery from Databases. More efficient versions of these algorithms were later developed, such as C4.5 and C5.0 [9].
Symbolic representations based on association rules are a powerful formalism that enables the discovery of items that occur simultaneously and frequently in a database. Each rule has a support; in [1] the support is defined as the relative number of cases in which the rule can be applied. In this study we do not explain how these methods work, because they are well known in the literature.
Some combining techniques allow very precise prediction models to be built by combining classifiers generated from samples of the training data sets. Such techniques usually use some heuristics to select examples and partition the dataset. Some studies show that combining classifiers generated from many database samples can significantly improve the precision of the prediction, even when the size of these samples is very small ([10, 11]). Two well-known ensemble techniques are Bagging and Boosting. The reader is invited to consult ([2, 10, 11]) for more details about them.
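For readers who want to experiment, the sketch below contrasts a single decision tree with Bagging and Boosting on synthetic data (scikit-learn stand-ins for the J48/C4.5-style learners used in the study; the data and parameters are illustrative, not those of the ICS database):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for a small sample of the patient database
X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=cv).mean().round(3))
```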
4 Method
The population of the study was the data from the users of the Curitiba Health Institute (ICS), which is responsible for the health care of the Curitiba (Paraná, Brazil) City Hall employees and their families1. Initially, data related to the period from 2001 to 2005 were analyzed, for a total of 55,814 users and 1,168,983 entries. Considering that the ICS epidemiological profile is congruent with that of the Curitiba area, and the operational need to decrease the number of entries from the original database, two criteria of data selection were defined: (i) users aged 40 years or above; and (ii) users that had, in their health care entries, at least one registration related to the cardiovascular diseases group of the International Classification of Diseases, version 10 (ICD-10).
After we applied the criteria, the initial sample decreased to 401,041 entries, referring to 8,457 users and 1,799 disease codes. These last two values define, respectively, the numbers of rows and columns of the database that will be used below for the generation of association rules and prediction models.
The first phase aims to discover associations among procedures that generate a pattern indicative of high cost and high complexity, and to learn from these associations to detect similar cases in the future. The Apriori algorithm [1] was applied to discover such association rules.
1 Data utilization was authorized by both the Institute and the Pontifical Catholic University of Paraná Ethics on Research Committee,
register number 924.
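The support and confidence that underlie such rules can be illustrated with a few lines of plain Python (the item names and user histories below are invented for the example, not taken from the ICS data):

```python
# each set is one user's history of procedures/events (illustrative item names)
histories = [
    {"MYOCARDIC_REVASCULARIZATION", "EMERGENCY_CONSULTATION"},
    {"MYOCARDIC_REVASCULARIZATION", "EMERGENCY_CONSULTATION", "GLYCATED_HAEMOGLOBIN"},
    {"GLYCATED_HAEMOGLOBIN"},
    {"EMERGENCY_CONSULTATION"},
]

def support(itemset):
    """Fraction of users whose history contains the whole itemset."""
    return sum(itemset <= h for h in histories) / len(histories)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

rule_lhs = {"MYOCARDIC_REVASCULARIZATION"}
rule_rhs = {"EMERGENCY_CONSULTATION"}
print("support:", support(rule_lhs | rule_rhs))
print("confidence:", confidence(rule_lhs, rule_rhs))
```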
The number of relevant rules after both analysis processes (subjective and confidence analysis) is 18. Table 1 shows only 2 of them. The high association of cardiovascular procedures with emergency consultations could be related either to the serious condition of the user's health or to the poor monitoring of users with cardiovascular problems, since few procedures or exams are requested on a frequent basis for the cardiovascular diagnosis.
The fact that an "emergency consultation" event is associated with a heart procedure shows the importance of establishing a monitoring protocol for users who underwent cardiovascular procedures. Outpatient follow-up after heart procedures should be a priority not only for the user's health, but also for the proper management of health care providers and payers.
Table 1: Relevant rules after subjective and confidence analysis of the significant events.
ID    Association Rules
R01   10.5% of the users presented in their history the procedure MYOCARDIC REVASCULARIZATION; among these, 75% are likely to present an association with an EMERGENCY CONSULTATION.
R10   11.8% of the users presented in their history the fact of being male associated with the GLYCATED HAEMOGLOBIN procedure; among these, 100% are likely to present an association with referential values of MYOCARDIC REVASCULARIZATION.
Rule R10 was considered relevant because it indicates a relationship between male users, the GLYCATED HAEMOGLOBIN procedure, and the MYOCARDIC REVASCULARIZATION procedure. According to the same specialists, this is important information for establishing educational and preventive programs for male users. These 18 rules compose the first part of the decision support system and help to improve the characterization of a user of the health system who should be managed.
The second phase, the preparation of the training database for obtaining the prediction models, uses as input the same data that were used in the first analysis and the same selection criteria. However, some attributes were removed by a filtering process and other attributes were included by derivation (Table 2). The next section shows the application of different machine learning techniques to obtain prediction models for high cost patients.
Results from the Discovered Classification Models: Table 3 illustrates the accuracy of the algorithms. One can notice that the standard deviation is quite small. This shows that the distribution of the samples has a statistical correspondence to the original data. From the results shown in Table 3 we can observe that the Bagging and Boosting methods improved the percentage of properly classified examples, with the latter being the best algorithm in the experiments performed. From Table 3, we can state with statistical significance that Bagging is better than the J48 algorithm and that Boosting is better than Bagging for the present problem. This holds for both samples. It was also possible to observe that the larger the sample, the better the prediction rate. We therefore conclude that ensemble-of-classifiers techniques can be effective in situations where the available data correspond to only a small part of the tuple space.
Subjective Evaluation of Interesting Patterns: The subjective evaluation of the obtained patterns is part of the quality evaluation of the obtained rules, according to the specialists' points of view. Table 4 presents only the first three rules obtained, to make the explanation easier given the space limitations of this paper. For now, we can observe that all rules indicate that a patient considered high cost (CC5) is one who had a high number of procedures within a period of 30 months.
The analysis is coherent with the results that arose from the rules. This is confirmed by the first rule (R1), where the high cost patient presents high procedure utilization within a short period of time. In general, we observed that frequent utilization of health care services leads to high cost. This was easy to observe, because the number of procedures was high in the first 20 rules. The lowest absolute number of procedures was 78 (R18), which is still a high number, and even more so considering the short time in which these procedures occur: in the case of rule R18, the period is less than 20 months. This set of rules is the second part of the decision support system to identify the health system user to be managed, and it helps to predict whether a patient will become high cost or not.
5 Conclusions
It is fundamental that health care providers include intervention protocols and case management in their practice. Then, health care personnel responsible for the patients' orientation can do their jobs in the most cost-effective manner possible, focusing on quality aspects of health care for users already identified with some degree of risk. In this paper, machine learning and data mining techniques were used to help with this task. It was observed that ensembles of classifiers can increase the trustworthiness of the generated prediction model and decrease the negative impact of the use of sampling techniques, generating a high cost patient prediction model with an accuracy of 90%. This model is very useful for the development of an eligibility protocol for high cost patients, as well as for the improvement of an efficient and individualized management model for the population. On the other hand, the discovered associations between procedures and pathologies allow management protocols to improve health care and to direct resources toward prevention and education, decreasing the number of high cost patients in the medium and long term.
References
1. Agrawal R, Imielinski T, Swami A, Mining association rules between sets of items in large
databases, In Proceedings ACM International Conference on Management of Data
(SIGMOD), 1993, pp. 207-216.
2. Breiman L, Bagging Predictors, Machine Learning, n. 2, vol. 24, pp. 123-140, 1996.
3. Crooks P, Managing high-risk, high cost patients: the Southern California Kaiser Permanente experience in the Medicare ESRD Demonstration Project, The Permanente Journal, v. 9, n. 2, 2005.
4. Forman S, Targeting the Highest-Risk Population to Complement Disease Management, Health Management Technology, v. 25, n. 7, July, 2004.
5. Knabel T, Louwers J, Intervenability: another measure of health risk, Health Management
Technology. v.25, n.7, July, 2004.
6. May CA, Schraeder C, Britt T, Managed care and case management: roles for Professional
nursing, Washington: American Nurses Publishing; 1996.
7. Quinlan JR, Induction of decision trees, Machine Learning, vol. 1, Kluwer Academic Publishers, pp. 81-106, Netherlands, 1986.
8. Quinlan JR, Generating Production Rules from Decision Trees, In Proceedings International
Joint Conference on Artificial Intelligence (IJCAI), pp. 304-307, 1987.
9. Quinlan JR, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
10. Schapire RE, The Boosting Approach to Machine Learning: An Overview, MSRI Workshop on Nonlinear Estimation and Classification, 2002.
11. Ting KM, Witten IH, Stacking Bagged and Dagged Models, Proceedings International Con-
ference on Machine Learning (ICML), 1997: 367-375.
12. Witten IH, Frank E, Data Mining, Morgan Kaufmann Publishers, San Francisco, USA, 2000.
Combining Gaussian Mixture Models and
Support Vector Machines for Relevance
Feedback in Content Based Image Retrieval
Abstract A relevance feedback (RF) approach for content based image retrieval
(CBIR) is proposed, which combines Support Vector Machines (SVMs) with
Gaussian Mixture (GM) models. Specifically, it constructs GM models of the im-
age features distribution to describe the image content and trains an SVM classi-
fier to distinguish between the relevant and irrelevant images according to the
preferences of the user. The method is based on distance measures between prob-
ability density functions (pdfs), which can be computed in closed form for GM
models. In particular, these distance measures are used to define a new SVM ker-
nel function expressing the similarity between the corresponding images modeled
as GMs. Using this kernel function and the user provided feedback examples, an
SVM classifier is trained in each RF round, resulting in an updated ranking of the
database images. Numerical experiments are presented that demonstrate the merits
of the proposed relevance feedback methodology and the advantages of using
GMs for image modeling in the RF framework.
1 Introduction
GM models have been used extensively in many data modeling applications. Fur-
thermore, they have already been used in CBIR as probability density models of
the features that are used to describe images, e.g. [2], [9], [13]. In this framework,
each image is described as a bag of feature vectors which are computed locally
(e.g. a feature vector for each pixel or region of the image). This bag of feature
vectors is subsequently used to train, in a maximum likelihood manner, a GM that
models the probability density of the image features in the feature space. A GM
model for the image feature vectors $x \in \mathbb{R}^d$ is defined as

$$p(x) = \sum_{j=1}^{K} \pi_j \, \phi(x \mid \theta_j) \qquad (1)$$

$$\theta_j = (\mu_j, \Sigma_j) \qquad (2)$$

$$\phi(x \mid \theta_j) = N(x \mid \theta_j) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_j|}} \exp\!\Big(-\tfrac{1}{2}(x-\mu_j)^{\mathrm T} \Sigma_j^{-1}(x-\mu_j)\Big) \qquad (3)$$

where $K$ is the number of Gaussian components in the model, $0 \le \pi_j \le 1$ are the mixing probabilities with $\sum_{j=1}^{K} \pi_j = 1$, and $\phi(x \mid \theta_j)$ is a Gaussian pdf with mean $\mu_j$ and covariance $\Sigma_j$.
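Equations (1)-(3) translate directly into code; the sketch below (ours, using SciPy) evaluates such a mixture density at a point for a toy two-component model with arbitrary parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covs):
    """p(x) = sum_j pi_j * N(x | mu_j, Sigma_j), cf. equations (1)-(3)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# toy 2-component model in d = 3 (e.g. a colour-only feature space)
weights = [0.6, 0.4]
means = [np.zeros(3), np.ones(3)]
covs = [np.eye(3), 0.5 * np.eye(3)]
print(gmm_pdf(np.array([0.2, 0.1, 0.0]), weights, means, covs))
```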
In order to retrieve images from an image database, a distance measure between
the image models is needed. The KL divergence cannot be computed analytically
for two GMs. Thus, for efficient retrieval using GM models, one has to resort to
approximations such as those discussed next.
A distance measure between images represented as GMs, which will be used for
CBIR, must have good separation properties and must allow fast computation.
This imposes the requirement that the distance can be defined in closed form for
the case of GMs, which is not easy to achieve. In this spirit, several distance
measures have been proposed, with the aim to address these requirements.
The distance measure introduced in [18] is adapted to the case of mixture mod-
els. It is known that the KL divergence between two Gaussian pdfs can be com-
puted in closed form. In particular,
$$\mathrm{KL}\big(\phi(x \mid \theta_1) \,\|\, \phi(x \mid \theta_2)\big) = \frac{1}{2}\Big(\log\frac{|\Sigma_2|}{|\Sigma_1|} + \mathrm{tr}\big(\Sigma_2^{-1}\Sigma_1\big) - d + (\mu_1-\mu_2)^{\mathrm T}\Sigma_2^{-1}(\mu_1-\mu_2)\Big) \qquad (4)$$
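Equation (4) in code (a small sketch of ours; the example means and covariances are arbitrary):

```python
import numpy as np

def gauss_kl(mu1, S1, mu2, S2):
    """KL( N(mu1, S1) || N(mu2, S2) ), cf. equation (4)."""
    d = len(mu1)
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(S2) / np.linalg.det(S1))
                  + np.trace(S2_inv @ S1) - d
                  + diff @ S2_inv @ diff)

mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([1.0, 0.0]), 2.0 * np.eye(2)
print(gauss_kl(mu1, S1, mu2, S2))   # 0.5 * (log 4 + 1 - 2 + 0.5) ~= 0.443
```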
$$\mathrm{ALA}(p_1 \,\|\, p_2) = \sum_{j=1}^{K_1} \pi_{1j}\Big(\log\pi_{2\beta(j)} + \log\phi\big(\mu_{1j} \mid \theta_{2\beta(j)}\big) - \tfrac{1}{2}\,\mathrm{tr}\big(\Sigma_{2\beta(j)}^{-1}\Sigma_{1j}\big)\Big) \qquad (7)$$

$$\|x-\mu\|_{\Sigma}^{2} = (x-\mu)^{\mathrm T}\Sigma^{-1}(x-\mu) \qquad (8)$$

$$\beta(j) = k \;\Leftrightarrow\; \|\mu_{1j}-\mu_{2k}\|_{\Sigma_{2k}}^{2} - \log\pi_{2k} < \|\mu_{1j}-\mu_{2l}\|_{\Sigma_{2l}}^{2} - \log\pi_{2l}, \quad \forall\, l \neq k \qquad (9)$$
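A direct transcription of equations (7)-(9) follows (our sketch with toy two-component mixtures; not the authors' code):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mahalanobis_sq(x, mu, S):
    d = x - mu
    return d @ np.linalg.inv(S) @ d            # equation (8)

def ala(pi1, mu1, S1, pi2, mu2, S2):
    """ALA(p1 || p2) of equation (7); beta(j) matches components as in (9)."""
    total = 0.0
    for j in range(len(pi1)):
        # component of p2 closest to component j of p1 under criterion (9)
        scores = [mahalanobis_sq(mu1[j], mu2[k], S2[k]) - np.log(pi2[k])
                  for k in range(len(pi2))]
        k = int(np.argmin(scores))
        total += pi1[j] * (np.log(pi2[k])
                           + multivariate_normal.logpdf(mu1[j], mean=mu2[k], cov=S2[k])
                           - 0.5 * np.trace(np.linalg.inv(S2[k]) @ S1[j]))
    return total

# two toy 2-component GMs in 2-D
pi = [0.5, 0.5]
mus_a = [np.zeros(2), 3 * np.ones(2)]
mus_b = [0.2 * np.ones(2), 2.8 * np.ones(2)]
covs = [np.eye(2), np.eye(2)]
print(ala(pi, mus_a, covs, pi, mus_b, covs))
```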
Consider the binary classification problem $\{(x_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{-1, +1\}$ and $x_i$ the labeled patterns based on which we want to train the SVM classifier. The patterns are mapped to a new space, called the kernel space, which can be non-linear and of much higher dimension than the initial one, using a transformation $x \mapsto \phi(x)$. Then a linear decision boundary is computed in the kernel space. The
SVM methodology addresses the problem of classification by maximizing the
margin, which is defined as the smallest distance in the kernel space between the
decision boundary and any of the samples. This can be achieved by solving a
quadratic programming problem:
$$\max_{a=(a_1,\ldots,a_N)^{\mathrm T}} \;\sum_{i=1}^{N} a_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j y_i y_j \, k(x_i, x_j) \qquad (12)$$

$$\text{s.t.}\quad 0 \le a_i \le C \quad\text{and}\quad \sum_{i=1}^{N} a_i y_i = 0 \qquad (13)$$

where

$$k(x_i, x_j) = \phi^{\mathrm T}(x_i)\,\phi(x_j) \qquad (14)$$

is the kernel function and $C$ is a parameter controlling the trade-off between training error and model complexity. Then, the decision function for a new pattern $x$ is defined by

$$y(x) = \sum_{i=1}^{N} a_i y_i \, k(x, x_i) + b \qquad (15)$$
where $b$ is a bias parameter the value of which can be easily determined after the solution of the optimization problem (see [17]). After training, the value $y(x)$ can be regarded as a measure of confidence about the class of $x$, with large positive values (small negative values) strongly indicating that $x$ belongs to the class denoted by "+1" ("-1").
It is obvious that the patterns under classification need not be in vectorial form,
but they can be any data objects for which an appropriate kernel function express-
ing their pair-wise similarity can be defined.
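For instance, with scikit-learn one can pass such pair-wise similarities to an SVM as a precomputed Gram matrix; in the sketch below the kernel values are random placeholders standing in for the GM-based kernel of the paper.

```python
import numpy as np
from sklearn.svm import SVC

# placeholder Gram matrix: K[i, j] = kernel(image_i, image_j),
# e.g. exp(-gamma * distance(GM_i, GM_j)) computed from the closed-form measures
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))
K_train = A @ A.T + 8 * np.eye(8)          # any symmetric positive-definite matrix will do
y_train = np.array([1, 1, 1, 1, -1, -1, -1, -1])

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_train, y_train)

# at query time, a row of kernel values between new images and the training images suffices
K_test = K_train[:2]                        # here: reuse two training rows as a stand-in
print(clf.decision_function(K_test))        # signed confidence used to re-rank the database
```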
In the framework of CBIR with RF, and assuming that we model each image using a GM, in each round of RF we have a number of images, represented as GMs, which correspond to the feedback examples provided by the user so far.
6 Experiments
In order to test the validity of the proposed method, an image set containing 3740
images from the image database in [19] is used. These images are classified in 17
semantic categories. This categorization corresponds to the ground truth.
To model each image, several features are extracted including position, color
and texture information. As position features we use the pixel coordinates, as color
features we use the 3 color coordinates (L*,a*,b*) in the CIE-Lab color space and
as texture features we use the contrast (c), the product of anisotropy with contrast
(ac) and the product of polarity with contrast (pc) as described in [9].
Consequently, for each image a set of feature vectors is extracted, which is sub-
sequently used as input to the Greedy EM algorithm [11] to produce a GM model
of the image features distribution. For all GM models, 10 Gaussian components are adopted, and each Gaussian component is assumed to have a full covariance matrix.
For reasons of comparison, we also applied the SVM-RF approach using the
same image feature sets but the standard Gaussian RBF function, which is the
most commonly used kernel function for SVMs. This kernel function requires a
global vectorial representation of the images. Thus, in this case, we represent each
image by the joint position-color and position-texture histogram. The position-
color histogram consists of 3x3x4x8x8 (x-y-L*-a*-b*) bins, whereas the position-
texture histogram consists of 3x3x4x4x4 (x-y-ac-pc-c) bins.
In order to quantify the performance of the compared methods, we implemented
an RF simulation scheme. As a measure of performance we use Precision, which is the ratio of relevant images among the top N retrieved images. An image is assessed to be
relevant or irrelevant according to the ground truth categorization of the image da-
tabase. In this simulation scheme, 1000 database images are used once as initial
queries. For each initial query, we simulated 6 rounds of RF. In each RF round, at
most 3 relevant and 3 irrelevant images are selected randomly from the first 100
images of the ranking. These images are used in combination with the examples
provided in the previous RF rounds to train a new SVM classifier. Based on this
new classifier, the ranking of the database images is updated.
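A condensed sketch of this simulation loop is given below (our reconstruction under simplifying assumptions: a placeholder kernel matrix and labels, and deterministic rather than random selection of the feedback examples):

```python
import numpy as np
from sklearn.svm import SVC

def simulate_rf(K, labels, query, rounds=6, per_round=3, scope=100):
    """Simulated relevance feedback: K is the precomputed image-image kernel matrix,
    labels the ground-truth categories, query the index of the initial query image."""
    ranking = np.argsort(-K[query])                       # initial ranking by similarity to the query
    fb_idx, fb_y = [query], [+1]
    for _ in range(rounds):
        top = [i for i in ranking[:scope] if i not in fb_idx]
        rel = [i for i in top if labels[i] == labels[query]][:per_round]
        irr = [i for i in top if labels[i] != labels[query]][:per_round]
        fb_idx += rel + irr
        fb_y += [+1] * len(rel) + [-1] * len(irr)
        clf = SVC(kernel="precomputed", C=1.0)
        clf.fit(K[np.ix_(fb_idx, fb_idx)], fb_y)          # train on all feedback collected so far
        scores = clf.decision_function(K[:, fb_idx])      # score every database image
        ranking = np.argsort(-scores)                     # updated ranking for the next round
    return ranking

# placeholder kernel and labels, standing in for the GM-based kernel and the 17 categories
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 10))
K = np.exp(-0.1 * ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1))
labels = rng.integers(0, 17, size=200)
print(simulate_rf(K, labels, query=0)[:10])
```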
For the experiments presented below, average Precision in scope N = 10, 20, 30
is shown. The values of the SVM parameter C and of the kernel parameter γ are
empirically chosen for each method so as to obtain the best performance. As SVM
implementation we used the one provided in [20].
In Figures 1-3 we can see that the SVM-RF method based on GMs consistently outperforms the common SVM-RF method which uses histograms and the Gaussian RBF kernel function. Moreover, it can be observed that the method based on the distance measure defined in [18] results in slightly superior performance compared to that obtained by the method which uses the ALA-based distance measure.
[Figures 1-3: average Precision at scope N = 10, 20 and 30, respectively, versus feedback iteration (0-6) for SVM+GMM+SKLgkl, SVM+GMM+SKLala and SVM+Hist+Gaussian RBF.]
A new relevance feedback approach for CBIR is presented in this paper. This ap-
proach uses GMs to model the image content and SVM classifiers to distinguish
between the classes of relevant and irrelevant images. To combine these method-
ologies, a new SVM kernel function is introduced based on distance measures be-
tween GMs which can be computed efficiently, i.e. in closed form. The main ad-
vantages of the proposed methodology are accuracy as indicated by our experi-
mental results, speed, due to the distance measures used, and flexibility. As
indicated by our experiments, very promising results can be obtained using GMs
as SVM patterns, even if we are forced to use an approximation and not the exact
KL divergence. In particular, for the two KL approximations tested, the perform-
ance does not differ significantly. However, the distance measure introduced in
[18] gives slightly better results.
In the future, we would like to adapt and test our method using other efficiently
computable distance measures for GMs. Moreover, we aim to use more sophisti-
cated image features to represent the image content. In addition, we plan to gener-
alize our RF scheme to support region-based image descriptions. Furthermore, we
aim to apply techniques for automatic determination of the appropriate number of
components for each GM. Finally, we would like to test the scalability of the pro-
posed method using even larger image databases.
Acknowledgement This work was supported by Public Funds under the PENED 2003 Project
co-funded by the European Social Fund (80%) and National Resources (20%) from the Hellenic
Ministry of Development - General Secretariat for Research and Technology.
References
1 Introduction
Spoken language dialogue systems considerably improve driver safety and the user-friendliness of human-machine interfaces, owing to their similarity to conversation with another human: a parallel activity that the driver is used to and that allows him to concentrate on the main activity, driving itself. Driving quality, stress and strain situations and user acceptance when using speech and manual commands to acquire certain information on the route have previously been studied [1], and the results have shown that, with speech input, the feeling of being distracted from driving is smaller and road safety is improved, especially in the case of complex tasks. Moreover, assessment of user requirements from multimodal interfaces in a car environment has shown that when the car is moving the system should switch to the "speech-only" interaction mode, as any other safety risks (e.g. driver distraction from the driving task by gesture input or graphical output) must be avoided [2].
The performance of speech-based interfaces, although reliable enough in con-
trolled environments to support speaker and device independence, degrades sub-
Mporas, I., Ganchev, T., Kocsis, O. and Fakotakis, N., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 259–266.
stantially in a mobile environment, when used on the road. There are various types
and sources of noise interfering with the speech signal, starting with the acoustic
environment (vibrations, road/fan/wind noise, engine noise, traffic, etc.) to
changes in the speaker’s voice due to task stress, distributed attention, etc. In the
integration of speech-based interfaces within vehicle environments the research is
conducted in two directions: (i) addition of front-end speech enhancement systems
to improve the quality of the recorded signal, and (ii) training the speech models
of the recognizer engine on noisy, real-life, speech databases.
In this study, the front-end speech enhancement system for a motorcycle-on-the-move environment is investigated. The speech-based interface, as presented in this study, is part of a multi-modal and multi-sensor interface developed in the context of the MoveOn project. The performance of various speech enhancement algorithms in the non-stationary conditions of a motorcycle on the move is assessed. The performance of the assessed algorithms is ranked in terms of the improvement they contribute to the speech recognition rate, when compared to the baseline results (i.e. without speech enhancement). In the following, a short overview of the MoveOn system, the enhancement methods evaluated, and the experimental setup and results are presented.
2 System Description
The MoveOn project aims at the creation of a multi-modal and multi-sensor, zero-distraction interface for motorcyclists. This interface provides the means for hands-free operation of a command and control system that enables information support of police officers on the move. The MoveOn information support system is a wearable solution, which consists of a purposely designed helmet, waist unit and gloves. The helmet incorporates microphones, headphones, visual feedback, a miniature camera and some supporting local-processing electronics. It has a USB connection to the waist unit that provides the power supply and the data and control interfaces. The waist unit incorporates the main processing power, storage repository, TETRA communication equipment and power capacity of the wearable system, as well as a number of sensors, an LCD display, and some vibration feedback actuators. Among the sensors deployed on the waist unit are acceleration and inclination sensors, and a GPS device, which provide the means for the context awareness of the system. An auxiliary microphone and headphone are integrated in the upper part of the waist unit, at the front side near the collar, to guarantee spoken interaction and communication capabilities when the helmet is off.
The multimodal user interface developed for the MoveOn application consists
of audio and haptic inputs, and audio, visual and vibration feedbacks to the user.
Due to the specifics of the MoveOn application, involving hands-busy and eyes-
busy motorcyclists, speech is the dominating interaction modality.
The spoken interface consists of multi-sensor speech acquisition equipment,
speech pre-processing, speech enhancement, speech recognition, and text-to-
speech synthesis components, which are integrated into the multimodal dialogue
interaction framework based on Olympus/RavenClaw [3, 4], but extended for the
needs of multimodal interaction. Each component in the system is a server in its own right, i.e. ASR, TTS, speech preprocessing, speech enhancement, etc. are separate servers, communicating either directly with each other or through a central hub, which provides synchronization.
Since the noisy motorcycle environment constitutes a great challenge to the
spoken dialogue interaction, a special effort is required to guarantee high speech
recognition performance, as it proved to be the most crucial element for the over-
all success of interaction.
The speech front-end described in Section 1.2 was tested with each of the speech
enhancement techniques outlined in Section 1.3. Different environmental condi-
tions and configuration settings of the speech recognition engine were evaluated.
In the following, we describe the speech data, the speech recognition engine and
the experimental protocol utilized in the present evaluation. Finally, we provide
the experimental results.
The evaluation of the front-end was carried out on the speech and noise data-
base, created during the MoveOn project [13]. The database consists of approxi-
mately 40 hours of annotated recordings, most of which were recorded in three
audio channels fed by different sensors, plus one channel for the audio prompts.
Thirty professional motorcyclists, members of the operational police force of the UK, were recorded while riding their motorcycles. Each participant was asked to repeat
a number of domain-specific commands and expressions or to provide a spontane-
ous answer to questions related to time, current location, speed, etc. Motorcycles
and helmets from various vendors were used, and the trace of road differed among
sessions. The database includes outdoor recordings (city driving, highway, tun-
nels, suburbs, etc) as well as indoor (studio) recordings with the same hardware.
The database was recorded at 44.1 kHz, with resolution 16 bits. Later on, all re-
cordings were downsampled to 8 kHz for the needs of the present application.
The Julius [14] speech recognition engine was employed for the present evalua-
tion. The decoder of the recognition engine utilizes a general purpose acoustic
model and an application-dependent language model. The acoustic model was
built from telephone speech recordings of the British SpeechDat(II) database [15],
by means of the HTK toolkit [16]. It consists of three-state left-to-right HMMs,
without skipping transitions, one for each phone of the British SpeechDat(II)
phone set. Each state is modelled by a mixture of eight continuous Gaussian dis-
tributions. The state distributions were trained from parametric speech vectors extracted from the speech waveforms after pre-processing and feature extraction. The pre-processing of the speech signals, sampled at 8 kHz, consisted of frame blocking with a frame length of 25 milliseconds and a step of 10 milliseconds, and pre-emphasis with a coefficient equal to 0.97. The speech parameterization consisted of the com-
putation of twelve Mel frequency cepstral coefficients [17], computed through a
filter-bank of 26 channels, and the energy of each frame. The speech feature vec-
tor was of dimensionality equal to 39, since the first and second derivatives were
appended to the static parameters. All HMMs were trained through the Baum-
Welch algorithm [18], with convergence ratio equal to 0.001.
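The parameterisation described above can be reproduced approximately with the python_speech_features package, which is used here only as a stand-in for the HTK front-end.

```python
# Sketch of the acoustic front-end: 25 ms frames with a 10 ms step,
# pre-emphasis 0.97, a 26-channel mel filter-bank, 12 cepstral coefficients
# plus energy, and first/second derivatives (39 dimensions in total).
# python_speech_features stands in for the HTK tools actually used.
import numpy as np
from python_speech_features import mfcc, delta

def front_end(signal, sample_rate=8000):
    static = mfcc(signal, samplerate=sample_rate,
                  winlen=0.025, winstep=0.010,
                  numcep=13, nfilt=26, preemph=0.97,
                  appendEnergy=True)               # c1..c12 + energy -> 13 dims
    d1 = delta(static, 2)                          # first derivatives
    d2 = delta(d1, 2)                              # second derivatives
    return np.hstack([static, d1, d2])             # (n_frames, 39)
```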
The language models were built by utilizing the CMU Cambridge Statistical
Language Modeling (SLM) Toolkit [19]. Specifically, we used the transcriptions
of the responses of the MoveOn end-user to the system [20] to build bi-gram and
tri-gram word models. Words included in the application dictionary but not in the
list of n-grams were assigned as out-of-vocabulary words.
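As a toy illustration of the kind of model built here (the actual models were estimated with the CMU-Cambridge SLM toolkit [19], which additionally applies discounting and back-off), a maximum-likelihood bigram with OOV mapping can be sketched as follows.

```python
# Illustrative sketch only: maximum-likelihood bigram estimation with an
# out-of-vocabulary mapping. This is not a reimplementation of the
# CMU-Cambridge SLM toolkit used in the paper.
from collections import Counter

def train_bigram(sentences, vocabulary):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ['<s>'] + [w if w in vocabulary else '<oov>' for w in sent] + ['</s>']
        unigrams.update(words)
        bigrams.update(zip(words[:-1], words[1:]))
    # conditional probability P(w2 | w1) for every observed bigram
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}
```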
The performance of different enhancement methods, implemented as in [22],
was assessed by evaluating their effect on the speech recognition results. Two dif-
ferent experimental setups were considered: (i) indoors and (ii) outdoors condi-
tions. The performance of each enhancement method in the indoors condition was
used as a reference, while the outdoors condition is the environment of interest. In contrast to previous work [21], where the performance of enhancement algorithms was investigated on the basis of objective tests on the enhanced signals, here we
examine directly the operational functionality of the system by measuring the
speech recognition performance. Specifically, the percentage of correctly recog-
nized words (CRW) and the word recognition rates (WRRs) obtained in the speech
recognition process after applying each enhancement method were measured. The
CRW indicates the ability of the front-end to recognize the uttered message from
the end-user, while the WRR additionally penalizes insertions of non-uttered words, together with the word deletions and substitutions that the CRW measures. In terms
of these performance measures we assess the practical worth of each algorithm
and its usefulness with respect to overall system performance. These results are
compared against the quality measures obtained in earlier work [21].
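The exact formulas are not spelled out in the text; under the usual HTK-style interpretation (CRW ignores insertions, WRR also penalises them), the two scores can be computed as follows.

```python
# Sketch of the two scores under the assumed HTK-style definitions
# (%Correct and %Accuracy). N is the number of reference words, D/S/I the
# word deletions, substitutions and insertions from the alignment.
def crw(n_ref, deletions, substitutions):
    return 100.0 * (n_ref - deletions - substitutions) / n_ref

def wrr(n_ref, deletions, substitutions, insertions):
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref
```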
We evaluated the speech recognition performance for each speech enhance-
ment method in the indoors and outdoors conditions, with bi-gram and tri-gram
language models. Table 1 presents the performance for the indoor experiments, in
terms of WRR and CRW in percentages.
Table 1. Performance (WRR and CRW in percentages) for various speech en-
hancement techniques for the indoors recordings.
As can be seen in Table 1, for the indoor recordings the best performance was obtained with the Log-MMSE method and with the non-enhanced speech inputs. All remaining methods decreased the speech recognition performance. This is due to the distortion that these speech enhancement methods introduce into the clean speech signal. Obviously, indoors, i.e. on noise-free speech, the general purpose acoustic model performs better without speech enhancement pre-processing.
As Table 1 shows, the speech recognition performance for the bi-gram language model was better than that for the tri-gram language model. This is due to the limited amount of data available for training the language models: the data were sufficient for training the bi-gram model but not for the tri-gram model.
In Table 2 we present the speech recognition performance in percentages for
the outdoors scenario, in terms of WRR and CRW, for both the bi-gram and tri-
gram language models.
In contrast to the indoors scenario, the speech enhancement in the noisy out-
doors scenario (motorcycles on the move) improved the speech recognition per-
Table 2. Performance (WRR and CRW in percentages) for various speech enhancement tech-
niques for the outdoors recordings.
5 Conclusions
Acknowledgments This work was supported by the MoveOn project (IST-2005-034753), which
is partially funded by the European Commission.
References
1. Gartner, U., Konig, W., Wittig, T. (2001). Evaluation of Manual vs. Speech input when
using a driver information system in real traffic. Driving Assessment 2001: 1st Interna-
tional Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, pp. 7-13, CO.
2. Berton, A., Buhler, D., Minker, W. (2006). SmartKom-Mobile Car: User Interaction with
Mobile Services in a Car Environment. In SmartKom: Foundations of Multimodal Dia-
logue Systems, Wolfgang Wahlster (Ed.). pp. 523-537, Springer.
3. Bohus, D., Rudnicky, A.I. (2003). RavenClaw: Dialog Management Using Hierarchical
Task Decomposition and an Expectation Agenda. Proceedings European Conference on
Speech Communication and Technology (EUROSPEECH):597-600.
4. Bohus, D., Raux, A., Harris, T.K., Eskenazi, M., Rudnicky, A.I. (2007). Olympus: an
open-source framework for conversational spoken language interface research, Bridging
the Gap: Academic and Industrial Research in Dialog Technology workshop at
HLT/NAACL 2007.
5. Berouti, M., Schwartz, R., Makhoul, J. (1979). Enhancement of speech corrupted by
acoustic noise. In Proceedings IEEE ICASSP’79:208-211.
6. Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing
and minimum statistics. IEEE Transactions on Speech and Audio Processing 9(5):504-
512.
7. Kamath, S., Loizou, P. (2002). A multi-band spectral subtraction method for enhancing
speech corrupted by colored noise. Proceedings ICASSP’02.
8. Ephraim, Y., Malah, D. (1985). Speech enhancement using a minimum mean square error
log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, Signal Proc-
essing 33:443-445.
9. Loizou, P. (2005). Speech enhancement based on perceptually motivated Bayesian esti-
mators of the speech magnitude spectrum. IEEE Transactions on Speech and Audio Proc-
essing 13(5):857-869.
10. Hu, Y., Loizou, P. (2003). A generalized subspace approach for enhancing speech cor-
rupted by coloured noise. IEEE Transactions on Speech and Audio Processing 11:334-
341.
11. Jabloun, F., Champagne, B. (2003). Incorporating the human hearing properties in the
signal subspace approach for speech enhancement. IEEE Transactions on Speech and
Audio Processing 11(6):700-708.
12. Hu, Y., Loizou, P. (2004). Speech enhancement based on wavelet thresholding the multi-
taper spectrum. IEEE Transactions on Speech and Audio Processing 12(1):59-67.
13. Winkler, T., Kostoulas, T., Adderley, R., Bonkowski, C., Ganchev, T., Kohler, J., Fako-
takis N. (2008). The MoveOn Motorcycle Speech Corpus. Proceedings of LREC’2008.
14. Lee, A., Kawahara, T., Shikano, K. (2001). Julius -- an open source real-time large vo-
cabulary recognition engine. Proceedings European Conference on Speech Communica-
tion and Technology (EUROSPEECH):1691-1694.
15. Hoge, H., Draxler, C., Van den Heuvel, H., Johansen, F.T., Sanders, E., Tropf, H.S.
(1999). SpeechDat Multilingual Speech Databases for Teleservices: Across the Finish
Line. Proceedings 6th European Conference on Speech Communication and Technology
(EUROSPEECH):2699-2702.
16. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Moore, G., Odell, J., Ol-
lason, D., Povey, D., Valtchev, V., Woodland, P. (2005). The HTK Book (for HTK Ver-
sion 3.3). Cambridge University.
17. Davis, S.B., Mermelstein, P. (1980). Comparison of parametric representations for mono-
syllabic word recognition in continuously spoken sentences. IEEE Transactions on
Acoustics, Speech and Signal Processing 28(4):357-366.
18. Baum, L.E., Petrie, T., Soules, G., Weiss, N. (1970). A maximization technique occurring
in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathe-
matical Statistics 41(1):164–171.
19. Clarkson, P.R., Rosenfeld, R. (1997). Statistical Language Modeling Using the CMU-
Cambridge Toolkit. Proceedings 5th European Conference on Speech Communication
and Technology (EUROSPEECH): 2707-2710.
20. Winkler, T., Ganchev, T., Kostoulas, T., Mporas, I., Lazaridis, A., Ntalampiras, S., Badii,
A., Adderley, R., Bonkowski, C. (2007). MoveOn Deliverable D.5: Report on Audio da-
tabases, Noise processing environment, ASR and TTS modules.
21. Ntalampiras, S., Ganchev, T., Potamitis, I., Fakotakis, N. (2008). Objective comparison
of speech enhancement algorithms under real world conditions. Proceedings PETRA
2008:34.
22. Loizou, P. (2007). Speech Enhancement: Theory and Practice. CRC Press.
Model Identification in Wavelet Neural
Networks Framework
E-mail: zapranis,[email protected]
Abstract The scope of this study is to present a complete statistical framework for model identification of wavelet neural networks (WN). At each step of WN construction we test various methods already proposed in the literature. In the first part we compare four different methods for the initialization and construction of the WN. Next, various information criteria as well as sampling techniques proposed in previous works are compared in order to derive an algorithm for selecting the correct topology of a WN. Finally, for variable significance testing, the performance of various sensitivity and model-fitness criteria is examined and an algorithm for selecting the significant explanatory variables is presented.
1 Introduction
This study presents a complete statistical wavelet neural network (WN) model identification framework. Model identification can be separated into two parts: model selection and variable significance testing. Wavelet analysis (WA) has proved to be a valuable tool for analyzing a wide range of time-series and has already been used with success in image processing, signal de-noising, density estimation, signal and image compression and time-scale decomposition.
In [1] it was demonstrated that it is possible to construct a theoretical description of feedforward NN in terms of wavelet decompositions. WN were proposed by [2] as an alternative to feedforward NN, hoping to alleviate the weaknesses of each method. The WN is a generalization of radial basis function networks (RBFN). WNs are one-hidden-layer networks that use a wavelet as an activation function instead of the classic sigmoid function. The families of multidimensional wavelets preserve the universal approximation property that characterizes neural networks. In [3] various reasons are presented why wavelets should be used instead of other transfer functions.
Wavelet networks have been used in a variety of applications so far: with great success in short-term load forecasting [4], in time series prediction [5], signal classification and compression [6], static, dynamic [1] and nonlinear modeling [7], and nonlinear static function approximation [8]. Fi-
Zapranis, A. and Alexandridis, A., 2009, in IFIP International Federation for Information Processing,
Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I.,
Bramer, M.; (Boston: Springer), pp. 267–276.
nally, [9] proposed WN as a multivariate calibration method for the simultaneous determination of copper, iron, and aluminum in test samples.
In contrast to sigmoid neural networks, wavelet networks allow constructive procedures that efficiently initialize the parameters of the network. Using wavelet decomposition, a wavelet library can be constructed, and each wavelon can be constructed using the best wavelet of the wavelet library. These procedures allow the wavelet network to converge to a global minimum of the cost function. Also, starting the network training very close to the solution leads to smaller training times. Finally, wavelet networks provide information about the contribution of each wavelon to the approximation and about the dynamics of the generating process.
The rest of the paper is organized as follows. In section 2 we present the WN,
we describe the structure of a WN and we compare different initialization meth-
ods. In section 3 we present a statistical framework in WN model selection and
different methods are compared. Various sensitivity criteria of the input variables
are presented in section 4 and a variable selection scheme is presented. Finally, in
section 5 we conclude.
In [10] and [11] we give a concise treatment of wavelet theory. Here the emphasis is on presenting the theory and mathematics of wavelet neural networks. So far, various structures of a WN have been proposed in the literature [5], [7], [8], [9], [12], [13]. In this study we use a multidimensional wavelet neural network with a linear connection of the wavelons to the output. Moreover, for the model to perform well in linear cases, we use direct connections from the input layer to the output layer. A network with zero hidden units (HU) reduces to the linear model.
The network output is given by the following expression:
$$\hat{y}(x) = w_{\lambda+1}^{[2]} + \sum_{j=1}^{\lambda} w_j^{[2]}\,\Psi_j(x) + \sum_{i=1}^{m} w_i^{[0]}\,x_i, \qquad \Psi_j(x) = \prod_{i=1}^{m} \psi(z_{ij}),$$
$$w = \left(w_i^{[0]},\; w_j^{[2]},\; w_{\lambda+1}^{[2]},\; w_{(\xi)ij}^{[1]},\; w_{(\zeta)ij}^{[1]}\right)$$
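A minimal sketch of this network output is given below; the mother wavelet and the exact form of z_ij are not specified in this excerpt, so a Mexican-hat wavelet and the usual translation/dilation form z_ij = (x_i - w_(xi)ij) / w_(zeta)ij are assumed.

```python
# Sketch of the network output in the equation above. Assumptions: Mexican
# hat mother wavelet; z_ij = (x_i - w_xi[i, j]) / w_zeta[i, j].
import numpy as np

def mexican_hat(z):
    return (1.0 - z ** 2) * np.exp(-0.5 * z ** 2)

def wn_output(x, w0, w2, w2_bias, w_xi, w_zeta):
    """x: (m,) input; w0: (m,) direct-connection weights; w2: (lambda,)
    wavelon weights; w_xi, w_zeta: (m, lambda) translation and dilation."""
    z = (x[:, None] - w_xi) / w_zeta               # z_ij, shape (m, lambda)
    psi = np.prod(mexican_hat(z), axis=0)          # Psi_j(x), shape (lambda,)
    return w2_bias + w2 @ psi + w0 @ x
```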
In the first example, x is equally spaced in [0,1] and the noise $\varepsilon_1(x)$ follows a normal distribution with mean zero and a decreasing variance:
$$\sigma_{\varepsilon}^{2}(x) = 0.05^{2} + 0.1\,(1 - x^{2})$$
Figure 1 shows the initialization of all four algorithms for the first example. The network uses 2 hidden units with learning rate 0.1 and momentum 0. The use of a large learning rate or momentum might lead to oscillation between two points. As a result the WN would not be able to find the minimum of the loss function, or it would be trapped in a local minimum of the loss function. It is clear that the starting approximations of the BE and SSO algorithms are very close to the target function f(x). As a result, fewer iterations and less training time are needed. To compare the previous methods we use the heuristic method to train 100 networks with different initial conditions of the direct connections $w_i^{[0]}$ and weights $w_j^{[2]}$ to find the global minimum. We find that the smallest mean square error (MSE) is 0.031332. Using the RBS algorithm the MSE is 0.031438 and is found after 717 iterations. The MSE between the underlying function f(x) and the network approximation is 0.000676. The SSO needs 4 iterations and the MSE is 0.031332, while the MSE between the underlying function f(x) and the network approximation is only 0.000121. The same results are achieved by the BE method. Finally, one implementation of the heuristic method needed 1501 iterations.
From the previous examples it seems that the SSO and the BE algorithms give the same results and outperform both the heuristic and the RBS algorithm. To get a clearer view we introduce a more complex example where
and $\varepsilon_2(x)$ follows a Cauchy distribution with location 0 and scale 0.05, and x is equally spaced in [-6,6]. While the first example is very simple, the second one,
proposed by [22], incorporates large outliers in the output space. The sensitivity of the proposed WN to the presence of outliers will thus be tested.
The results for the second example are similar; however, the BE algorithm is 10% faster than the SSO. Using the RBS, SSO and BE algorithms the MSE is 0.004758, 0.004392 and 0.04395, found after 2763, 1851 and 1597 iterations respectively. The MSE between the underlying function g(x) and the network approximation is 0.000609, 0.000073 and 0.000057 respectively.
One can observe in Figure 2 that the WN approximation was not affected by
the presence of large outliers in contrast to the findings of [22]. In this study 8
hidden units were used for the network topology proposed by ν-fold cross-
validation while in [22] 10 hidden units were proposed by the FPE criterion. As it
is shown in the next section the FPE criterion does not perform as well as sam-
pling techniques and should not be used.
The previous examples indicate that SSO and BE perform similarly, whereas BE outperforms SSO in complex problems. On the other hand, BE needs the calculation of the inverse of the wavelet matrix, whose columns might be linearly dependent [14]. In that case the SSO must be used. However, since the wavelets come from a wavelet frame, this rarely happens [14].
3 Model Selection
In this section we describe the model selection procedure. One of the most crucial
steps is to identify the correct topology of the network. A network with fewer HU than needed is not able to learn the underlying function, while selecting more HU than needed will result in an overfitting model. Several criteria exist for model se-
lection, such as Generalized Prediction Error, Akaike’s Information Criterion, Fi-
nal Prediction Error (FPE), Network Information Criterion and Generalized Cross-
Validation (GCV). These criteria are based on assumptions that are not necessarily
true in the neural network framework. Alternatively we suggest the use of sam-
pling methods such as bootstrap and cross-validation. The only assumption made
by sampling methods is that the data are a sequence of independent and identically
distributed variables. However, sampling methods are computationally very de-
manding. In this study we will test the FPE proposed by [14], the GCV proposed
by [14], the bootstrap (BS) and the v-fold cross-validation (CV) methods proposed
by [23] and [24]. These criteria will be tested with and without training of the
network.
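A sketch of the v-fold cross-validation estimate of the prediction risk used for topology selection is given below; the training routine `fit` is passed in as a parameter rather than assumed.

```python
# Sketch of v-fold cross-validation for selecting the number of hidden
# units. `fit(x, y, hidden_units)` is any training routine (e.g. the wavelet
# network of Section 2) returning a predict(x) callable.
import numpy as np

def cv_prediction_risk(fit, x, y, hidden_units, v=10, seed=0):
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, v)
    risks = []
    for k in range(v):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(v) if i != k])
        model = fit(x[train], y[train], hidden_units)
        risks.append(np.mean((model(x[test]) - y[test]) ** 2))
    return np.mean(risks)

def select_topology(fit, x, y, candidates=range(1, 16), v=10):
    risks = {hu: cv_prediction_risk(fit, x, y, hu, v) for hu in candidates}
    return min(risks, key=risks.get), risks        # best HU count and all risks
```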
In both examples BS, FPE and CV propose similar models. In the first example 2 HU were needed to model the underlying function f(x). On the other hand, GCV suggests 3 hidden units. The MSE between the underlying function f(x) and the approximation of the WN using 3 HU is 0.000271, while using 2 HU it is only 0.000121, indicating that the GCV suggested a more complex model than needed. In the second example BS and CV propose the same network topology (8 HU), while using the FPE criterion the prediction risk is minimized at 7 HU and using the GCV criterion it is minimized at 14 HU. To compare the performance of each cri-
terion the MSE between the underlying function g(x) and the approximation of the
WN is calculated. The MSE is 0.000079, 0.000073 and 0.000101 for 7, 8 and 14
HU. Again the BS and CV gave correct results while the FPE performs satisfacto-
rily.
[Figure 1: Initialization of the algorithms for the first example (panels: "Initialization Using The Heuristic Method", "Initialization Using The RBS Method"). Figure 2: Data and WN fit for the second example over x in [-6, 6].]
To significantly reduce the training times, [14] proposes that, since the initialization is very close to the underlying function, the prediction risk can be calculated directly after the initialization. In the first example all information criteria gave the same results as in the previous case. However, in the second example all criteria required more than 14 HU, showing that early stopping techniques do not perform satisfactorily.
Since sampling techniques are computationally very expensive, the FPE criterion can be used initially. Then BS or CV can be used within +/-5 HU around the number of HU proposed by FPE to define the best network topology.
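The two-stage search suggested above can be sketched as follows (it reuses `select_topology` from the previous sketch; `fpe_risk` is a placeholder for the FPE estimate of [14]).

```python
# Sketch of the two-stage search: a cheap FPE scan first, then
# cross-validation only in a +/-5 HU window around the FPE optimum.
# `fpe_risk(fit, x, y, hu)` is a placeholder for the FPE estimate of [14].
def two_stage_selection(fit, fpe_risk, x, y, max_hu=30, window=5, v=10):
    fpe = {hu: fpe_risk(fit, x, y, hu) for hu in range(1, max_hu + 1)}
    centre = min(fpe, key=fpe.get)
    candidates = range(max(1, centre - window), centre + window + 1)
    return select_topology(fit, x, y, candidates=candidates, v=v)
```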
4 Model Fitness and Sensitivity Criteria
5 Conclusions
This study presents a statistical framework for wavelet network model identifica-
tion. To our knowledge this is the first time that a complete statistical framework
for the use of WNs is presented. Several methodologies were tested in wavelet
network construction, initialization, model selection and variable significant test-
ing. We propose a multidimensional wavelet neural network with a linear connec-
tion of the wavelons to the output and direct connections from the input layer to
the output layer. The training is performed by the classic back-propagation algo-
rithm. Next four different methods were tested in wavelet network initialization.
Using the BE and SSO the training times were reduced significantly while the
network converged to the global minimum of the loss function.
Model selection is a very important step. Four techniques were tested, with the sampling techniques giving more stable results than the other alternatives. BS and CV found the correct network topology in both examples. Although FPE and GCV are extensively used in the WN framework, due to the linear relation of the wavelets and the original signal, it was shown that both criteria should not be used in complex problems. Moreover, using early stopping techniques in complex problems was found to be inappropriate.
A variable selection method was presented. Various sensitivity and model fit-
ness criteria were tested. While sensitivity criteria are application dependent, MFS
criteria are much better suited for testing the significance of the explanatory vari-
ables. The SBP correctly identified the insignificant variables, while their removal reduced the prediction risk and increased the adjusted R², implying the correctness of this decision.
Finally, the partial derivatives with respect to the weights of the network and to the dilation and translation parameters, as well as the derivative with respect to each input variable, are presented. The construction of confidence and prediction intervals, as well as a model adequacy testing scheme, is left as future work.
[Table: sensitivity criteria (MaxD, MinD, MaxDM, MinDM, AvgD, AvgDM, AvgL, AvgLM) and the SBP for the direct-connection weights wi[0] of the full model with two variables.]
References
1. Pati, Y., Krishnaprasad, P.: Analysis and Synthesis of Feedforward Neural Networks Using
Discrete Affine Wavelet Transforms. IEEE Trans. on Neural Networks 4(1), 73-85 (1993)
2. Zhang, Q., Benveniste, A.: Wavelet Networks. IEEE Trans. on Neural Networks 3(6), 889-
898 (1992)
3. Bernard, C., Mallat, S., Slotine, J.-J.: Wavelet Interpolation Networks. In Proc. ESANN '98,
47-52 (1998)
4. Benaouda, D., Murtagh, G., Starck, J.-L., Renaud, O.: Wavelet-Based Nonlinear Multiscale
Decomposition Model for Electricity Load Forecasting. Neurocomputing 70, 139-154 (2006)
5. Chen, Y., Yang, B., Dong, J.: Time-Series Prediction Using a Local Linear Wavelet Neural Network. Neurocomputing 69, 449-465 (2006)
6. Kadambe, S., Srinivasan, P.: Adaptive Wavelets for Signal Classification and Compression.
International Journal of Electronics and Communications 60, 45-55 (2006)
7. Billings, S., Wei, H.-L.: A New Class of Wavelet Networks for Nonlinear System
Identification. IEEE Trans. on Neural Networks 16(4), 862-874 (2005)
8. Jiao, L., Pan, J., Fang, Y.: Multiwavelet Neural Network and Its Approximation Properties.
IEEE Trans. on Neural Networks 12(5), 1060-1066 (2001)
9. Khayamian, T., Ensafi, A., Tabaraki, R., Esteki, M.: Principal Component-Wavelet Net-
works as a New Multivariate Calibration Model. Analytical Letters 38(9), 1447-1489 (2005)
10. Zapranis, A., Alexandridis, A.: Modelling Temperature Time Dependent Speed of Mean
Reversion in the Context of Weather Derivative Pricing. Applied Mathematical Finance
15(4), 355 - 386 (2008)
11. Zapranis, A., Alexandridis, A.: Weather Derivatives Pricing: Modelling the Seasonal
Residuals Variance of an Ornstein-Uhlenbeck Temperature Process with Neural Networks.
Neurocomputing (accepted, to appear) (2007)
12. Becerikli, Y.: On Three Intelligent Systems: Dynamic Neural, Fuzzy and Wavelet Networks
for Training Trajectory. Neural Computation and Applications 13, 339-351 (2004)
13. Zhao, J., Chen, B., Shen, J.: Multidimensional Non-Orthogonal Wavelet-Sigmoid Basis
Function Neural Network for Dynamic Process Fault Diagnosis. Computers and Chemical
Engineering 23, 83-92 (1998)
14. Zhang, Q.: Using Wavelet Network in Nonparametric Estimation. IEEE Trans. on Neural
Networks 8(2), 227-236 (1997)
15. Oussar, Y., Rivals, I., Personnaz, L., Dreyfus, G.: Training Wavelet Networks for Nonlinear
Dynamic Input Output Modelling. Neurocomputing 20, 173-188 (1998)
16. Postalcioglu, S., Becerikli, Y.: Wavelet Networks for Nonlinear System Modelling. Neural
Computing & Applications 16, 434-441 (2007)
17. Oussar, Y., Dreyfus, G.: Initialization by Selection for Wavelet Network Training.
Neurocomputing 34, 131-143 (2000)
18. Xu, J., Ho, D.: A Basis Selection Algorithm for Wavelet Neural Networks. Neurocomputing
48, 681-689 (2002)
19. Gao, R., Tsoukalas, H.: Neural-wavelet Methodology for Load Forecasting. Journal of
Intelligent & Robotic Systems 31, 149-157 (2001)
20. Xu, J., Ho, D.: A Constructive Algorithm for Wavelet Neural Networks. Lecture Notes in
Computer Science(3610), 730-739 (2005)
21. Kan, K.-C., Wong, K.: Self-construction algorithm for synthesis of wavelet networks.
Electronics Letters 34, 1953-1955 (1998)
22. Li, S., Chen, S.-C.: Function Approximation using Robust Wavelet Neural Networks. In
Proc. ICTAI '02, 483-488 (2002)
23. Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman & Hall, USA (1993)
24. Zapranis, A., Refenes, A.: Principles of Neural Model Identification, Selection and
Adequacy: With Applications to Financial Econometrics. Springer-Verlag (1999)
Two Levels Similarity Modelling: a Novel
Content Based Image Clustering Concept
1 Agriculture high institute (ISA) (computer science and statistics laboratory)- Catholic Lille
University. 48, boulevard Vauban. 59000. Lille. France.
E-mail: [email protected]
2 IBISC Laboratory (CNRS FRE 3190) 40 Rue du Pelvoux, 91025 EVRY Cedex. France.
Phone: +33169477555, Fax: +33169470306,
E-mail: [email protected]
1 Introduction
The aim of any unsupervised classification method applied to content based image retrieval is to group together images that are considered to be similar. In this case, algorithms treat the data in only one direction, rows or columns, but not both at the same time. Contrary to one-dimensional clustering, co-clustering (also called bi-clustering) processes the data tables by taking into account the rows and the columns simultaneously. This implies considering the existing correlation between the data expressed in rows and columns. Thus, co-clustering offers a more complete view of the data since it includes a new concept, the mutual information, which is a link between the random variables representing the clusters. An optimal co-clustering is the one that minimizes the difference (the loss) between the mutual information of the original random variables and the mutual information of the cluster random variables. Several co-clustering structures have been proposed in the literature [1],[2],[3]. The choice of one of these structures is directly related to the considered application, and also to the relational complexity between the row and column elements. Among the co-clustering approaches in use, in [4] Qiu proposed a new approach dedicated to content based image categorization us-
Djouak, A. and Maaref, H., 2009, in IFIP International Federation for Information Processing, Volume
296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 277–282.
ing bipartite graphs to simultaneously model the images and their features. Indeed, the first set of the bipartite graph is associated with the images and the second set with their features. The links between the two sets characterize the degrees of correspondence between the images and their features. This method is very promising and could open the way for adapting co-clustering techniques to image recognition and retrieval applications. In this work, in order to propose one of the first co-clustering based approaches to content based image recognition and search, our images are described by feature vectors.
They form a two-dimensional table (rows/columns) such that the rows are the database images and the columns are the features which describe them. We then propose in this paper a new co-clustering model which introduces a two-level similarity concept. The aim of this method is to improve the image retrieval accuracy and to optimize processing time. In addition, we use one of the classical co-clustering approaches in order to compare its performance with that obtained with our method. The choice of the BIVISU system [5], initially applied to gene expression data, is justified by its great conceptual simplicity.
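The co-clustering objective mentioned above (the loss of mutual information when rows and columns are replaced by their clusters) can be sketched as follows for a non-negative data table; this is only an illustration of the criterion, not of the BIVISU algorithm itself.

```python
# Sketch of the information-theoretic co-clustering objective: the loss of
# mutual information incurred when the joint distribution over rows
# (images) and columns (features) is replaced by the joint distribution
# over row clusters and column clusters. Assumes a non-negative table
# (e.g. normalised feature values).
import numpy as np

def mutual_information(p):
    p = p / p.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask])))

def mi_loss(table, row_clusters, col_clusters):
    p = np.asarray(table, dtype=float)
    p /= p.sum()
    r, c = np.max(row_clusters) + 1, np.max(col_clusters) + 1
    p_hat = np.zeros((r, c))
    for i, ri in enumerate(row_clusters):
        for j, cj in enumerate(col_clusters):
            p_hat[ri, cj] += p[i, j]
    return mutual_information(p) - mutual_information(p_hat)
```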
This work is organized as follows: section 2 is devoted to the BIVISU co-clustering system. Section 3 introduces the two-level association concept and models the general architecture of the proposed approach. In section 4, different experimental results are presented and commented on. Finally, we summarize the presented work and outline future directions.
The used co-clustering algorithm (the BIVISU system) can detect several co-cluster forms (constant, constant-row, constant-column, additive and multiplicative co-clusters, as defined in [5]). This method also uses parallel coordinate (PC) plots to visualize high-dimensional data in a 2D plane. Besides visualization, this representation has been exploited to re-formulate the co-clustering problem [5].
To cluster the rows and columns simultaneously, the algorithm first performs clustering of the rows for each pair of columns. Further columns are then merged to form a bigger co-cluster. The "merge and split" approach, i.e. merging paired columns and splitting in rows, is then performed to obtain the final co-clusters. The "merge and split" process is repeated for each column pair until either no significantly large co-cluster is found or the same co-clusters are detected; only three co-clusters are obtained for our example. This algorithm is based on a clear formalism and provides good quality co-clustering results. In the following section, we extend it by introducing the two-level similarity concept.
3 Two Levels Similarity Model
Fig. 1. Two levels similarity method algorithm
Finally, one could introduce a precision/recall test to validate (or not) the obtained result. The result would be evaluated by comparing the obtained precision/recall values with minimal pre-fixed thresholds set according to the application requirements. If the result is judged sufficiently good (stability of the obtained error between two successive iterations), the processing stops. If not, the number of co-clusters is gradually increased in order to obtain a finer granularity in the development of the associations and thus a more precise result. The number of co-clusters is then increased iteratively until the desired result is obtained.
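This control loop can be sketched as follows; `cocluster` and `evaluate` are placeholders for the co-clustering step (e.g. BIVISU) and the precision/recall test, and the thresholds shown are arbitrary.

```python
# Sketch of the iterative refinement loop: co-cluster, measure retrieval
# precision/recall against pre-fixed thresholds, and increase the number of
# co-clusters until the error stabilises between two successive iterations.
def refine_coclustering(cocluster, evaluate, table, start=3, max_clusters=30,
                        min_precision=0.6, min_recall=0.4, tol=1e-3):
    previous_error = None
    clusters = cocluster(table, start)
    for k in range(start, max_clusters + 1):
        clusters = cocluster(table, k)
        precision, recall = evaluate(clusters)
        error = 1.0 - precision
        good_enough = precision >= min_precision and recall >= min_recall
        stable = previous_error is not None and abs(previous_error - error) < tol
        if good_enough and stable:
            return clusters, k
        previous_error = error
    return clusters, max_clusters
```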
4 Experimental Results
In this section, we experiment with the two-level similarity concept and compare the co-clustering results with those obtained with the BIVISU system (initially used for gene expression data applications).
The tests carried out consist of introducing a features table of the processed images (200 rows representing the images and 26 columns representing 26 features [6]: classical low-level features, color histogram features, wavelet transform features and, finally, rotation, translation and scaling invariant features obtained by the Trace transform), then introducing the feature vector of each query image and retaining the associations generated by the obtained co-clusters in order to determine the subset of images most similar to each query image. Figure 2 shows a sample of the used database images. Thus, for each introduced query image, the initially obtained co-cluster structure is modified, and this modifies its local similarity with one or more co-clusters.
Then, the images in the co-clusters associated with the query image are returned in an order based on the number of features present in each associated co-cluster.
Generally, one notices an acceptable image search quality, given the heterogeneity of the images and the difficulty of formalizing the adjustment of the BIVISU system parameters (the maximum row and column dimensions per co-cluster, etc.). This difficulty generates some coarse image search errors, which leads us to say that an interactive use of this system in the content based image recognition field would be more beneficial in terms of precision.
Precision/recall diagrams for a sample of 24 images (from figure 2) are given in figure 3 for the classical co-clustering method and for the TLSM method. We can easily observe the added value and the superiority of our approach for the chosen images.
Finally, in spite of these first encouraging results, a more thorough experimentation will be needed to confirm the potential of the two-level similarity model and thus to give a solid experimental validation of this method.
Fig. 3. Precision/recall diagrams (Precision versus Recall) for the classical co-clustering method and the TLSM method
5 Conclusion
References
Abstract Beamforming remains one of the most common methods for estimat-
ing the Direction Of Arrival (DOA) of an acoustic source. Beamformers operate
using at least two sensors that look among a set of geometrical directions for the
one that maximizes received signal power. In this paper we consider a two-sensor
beamformer that estimates the DOA of a single source by scanning the broadside for
the direction that maximizes the mutual information between the two microphones.
This alternative approach exhibits robust behavior even under heavily reverberant
conditions where traditional power-based systems fail to distinguish between the
true DOA and that of a dominant reflection. Performance is demonstrated for both
algorithms with sets of simulations and experiments as a function of different en-
vironmental variables. The results indicate that the newly proposed beamforming
scheme can accurately estimate the DOA of an acoustic source.
1 Introduction
Alrabadi, O.N., Talantzis, F. and Constantinides, A.G., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 283–291.
The basic component of direct methods is a beamformer that scans a set of can-
didate directions for the one that exhibits the maximum power [4]. This process is
known as estimation of the Direction Of Arrival (DOA). Tuning the beamformer to
scan different directions refers to simply delaying the outputs of its microphones by
a different amount and then multiplying each of them by a set of appropriate coef-
ficients. In the presence of noise and reverberation, though, the provided DOA estimate could be spurious due to ensuing reflections and noise. Methods to overcome these
effects have been presented [5, 6] but still suffer significantly in heavily reverberant
environments.
In the present work we present a new criterion for choosing the direction from
which the acoustic source emits. We extend the work that was presented in [1] for
TDE and use a two-microphone array to look for the DOA that maximizes the
marginal Mutual Information (MI) at the output of the beamformer. Information
theory concepts in beamforming have been used before [7] but have no mechanisms
to deal with reverberation. The approach presented in this paper involves a frame-
work that takes into account the effects of the spreading of information into samples neighboring the one that maximizes the MI comparison function. Through
experiments and extensive simulations we demonstrate that this novel MI based
beamformer resolves to a great degree the reverberation problem and generates ro-
bust DOA estimations. To verify our mathematical framework we test and compare
it with the traditional power-based approach for a set of different environmental variables.
The rest of the paper is organized as follows. In Section II we formulate the DOA
estimation problem under the beamformer constraint and present the typical power-
based method which is used at a later stage for comparison purposes. The MI based
alternative is presented in Section III. Section IV examines the performance of the
two systems under different criteria such as reverberation level, array geometry and
other requirements imposed by real-time systems. Section V discusses briefly the
conclusions of this study.
2 System Model
convolution. The length of hm (k), and thus the number of reflections, is a function
of the reverberation time T60 (defined as the time in seconds for the reverberation
level to decay to 60 dB below the initial level) of the room and expresses the main
problem when attempting to track an acoustic source. Data for DOA estimation is
collected over frames of L samples, which for the t-th frame we denote as $x_{tm} = [x_{tm}(0)\,\ldots\,x_{tm}(L-1)]$, with $x_{tm}(k) = x_m(L(t-1)+k)$.
Estimating the DOA using a traditional beamformer involves scanning a set of
geometrical directions and choosing the one that maximizes the beamformer output
power. Typically this is performed in the frequency domain. As in the time-domain,
processing is performed in frames with the use of an L-point Short Time Fourier
Transform (STFT) over a set of discrete frequencies ω . Thus, the output of the
beamformer at frame t and frequency ω is:
$$Y_t(\theta,\omega) = \frac{1}{2}\sum_{m=1}^{2} H_{tm}(\theta,\omega)\,X_{tm}(\omega) \qquad (2)$$
where Xtm (ω ) is the ω th element of frame Xtm i.e. the STFT of xtm . Htm (θ , ω ) is
the weight applied to the mth microphone when the beamformer is steered toward
direction θ . The beamformer weights are calculated as:
$$H_{tm}(\theta,\omega) = e^{-\frac{j\omega d_m \sin\theta}{c}} \qquad (3)$$
where dm is the Euclidean distance of the mth microphone from the origin. Without
loss of generality we can consider r1 as the origin i.e. d1 = 0 and d2 = d. Thus, in
the case of the power-based beamforming the estimated direction $\theta_s^{[P]}$ from which the source emits at frame t can be estimated as:
$$\theta_s^{[P]} = \arg\max_{\theta}\, |\hat{Y}_t(\theta)|^2 \qquad (4)$$
where $|\hat{Y}_t(\theta)|^2 = \sum_{\omega} W(\omega)\,|Y_t(\theta,\omega)|^2$ is the average beamformer output power over
the L discrete frequencies ω . W (ω ) denotes any frequency weighting that is used. In
a reverberant environment though, the true source location is not always the global
maximum of the power function and thus the above approach often generates wrong
estimates.
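Equations (2)-(4) can be sketched as follows for the two-microphone case, assuming uniform frequency weighting W(ω) = 1.

```python
# Sketch of the power-based scan of equations (2)-(4): steer the two-sensor
# beamformer over a grid of angles and return the one with the highest
# average output power. X1, X2 are the L-point STFTs of one frame from each
# microphone; d1 = 0, d2 = d as in the text.
import numpy as np

def power_doa(X1, X2, d, fs, c=343.0, angles=np.arange(-90, 91, 3)):
    L = len(X1)
    omega = 2 * np.pi * fs * np.fft.fftfreq(L)     # bin frequencies in rad/s
    powers = []
    for theta in np.deg2rad(angles):
        tau = d * np.sin(theta) / c                # steering delay of mic 2 (s)
        Y = 0.5 * (X1 + np.exp(-1j * omega * tau) * X2)   # eq. (2)-(3)
        powers.append(np.sum(np.abs(Y) ** 2))      # output power, W(omega)=1
    return angles[int(np.argmax(powers))]          # eq. (4)
```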
$$I_N = -\frac{1}{2}\,\ln\frac{\det[C(\theta)]}{\det[C_{11}]\,\det[C_{22}]} \qquad (5)$$
$$C(\theta) \approx \Re\left\{
\begin{bmatrix}
X_1 \\ D(X_1,1) \\ \vdots \\ D(X_1,N) \\
D\!\left(X_2,\tfrac{d\sin\theta}{c} f_s\right) \\ D\!\left(X_2,\tfrac{d\sin\theta}{c} f_s + 1\right) \\ \vdots \\ D\!\left(X_2,\tfrac{d\sin\theta}{c} f_s + N\right)
\end{bmatrix}
\begin{bmatrix}
X_1 \\ D(X_1,1) \\ \vdots \\ D(X_1,N) \\
D\!\left(X_2,\tfrac{d\sin\theta}{c} f_s\right) \\ D\!\left(X_2,\tfrac{d\sin\theta}{c} f_s + 1\right) \\ \vdots \\ D\!\left(X_2,\tfrac{d\sin\theta}{c} f_s + N\right)
\end{bmatrix}^{H}
\right\}
= \begin{bmatrix} C_{11} & C_{12}(\theta) \\ C_{21}(\theta) & C_{22} \end{bmatrix} \qquad (6)$$
where the ℜ{.} operation returns only the real part of its argument. Function
D(A, n) shifts the frequency components contained in frame A by n samples. This
is typically implemented by using an exponential with an appropriate complex ar-
gument.
If N is chosen to be greater than zero the elements of C(θ ) are themselves matri-
ces. In fact for any value of θ , the size of C(θ ) is always 2(N + 1) × 2(N + 1). We
call N the order of the beamforming system. N is really the parameter that controls
the robustness of the beamformer against reverberation. In the above equations and
in order to estimate the information between the microphone signals, we actually
use the marginal MI that considers jointly N neighboring samples (thus the inclu-
sion of delayed versions of the microphone signals). This way function (5) takes into
account the spreading of information due to reverberation and returns more accurate
estimates.
The estimated DOA $\theta_s^{[MI]}$ is then obtained as the angle that maximizes (5), i.e.
$$\theta_s^{[MI]} = \arg\max_{\theta}\,\{I_N\} \qquad (7)$$
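A sketch of the criterion of equations (5)-(7) is given below; approximating the expectation in C(θ) by inner products over the frequency bins of a single frame is a simplification of this sketch, not part of the original formulation.

```python
# Sketch of the marginal-MI criterion of equations (5)-(7). D(A, n), the
# shift of frame A by n samples, is implemented in the frequency domain by
# a complex exponential, as described in the text.
import numpy as np

def shift(X, n):
    L = len(X)
    return X * np.exp(-2j * np.pi * np.fft.fftfreq(L) * n)   # delay by n samples

def mi_criterion(X1, X2, theta_deg, d, fs, N=4, c=343.0):
    n0 = d * np.sin(np.deg2rad(theta_deg)) / c * fs           # delay in samples
    V = np.array([shift(X1, k) for k in range(N + 1)] +
                 [shift(X2, n0 + k) for k in range(N + 1)])
    C = np.real(V @ V.conj().T) / V.shape[1]                  # 2(N+1) x 2(N+1)
    C11, C22 = C[:N + 1, :N + 1], C[N + 1:, N + 1:]
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (logdet - np.linalg.slogdet(C11)[1]
                   - np.linalg.slogdet(C22)[1])               # eq. (5)

def mi_doa(X1, X2, d, fs, N=4, angles=np.arange(-90, 91, 3)):
    scores = [mi_criterion(X1, X2, a, d, fs, N) for a in angles]
    return angles[int(np.argmax(scores))]                     # eq. (7)
```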
4 Performance Analysis
fs = 44.1 kHz, which was broken into overlapped frames using a Hamming window
and an overlap factor of 1/2. The source was placed at the geometrical angles of θs =
−60o , −30o , 0o , 30o , 60o (so as to validate the performance under different arrivals),
and at a distance Ro = 2 m, from the mid-point between the two microphones. The
test scenario involves scanning the broadside of the array i.e. from −90o to +90o in
steps of 3o and looking for the values that maximize functions (4) and (7). For each
frame of data processed, the beamforming systems return a different DOA estimate.
The squared error for frame t is then computed as:
First we look into a set of real experiments performed in a typical reverberant room
of size [5, 3.67, 2.58] m equipped with a speaker playing the test signal and a mi-
crophone array in which we can change the microphone distances. We repeated
the playback of the test signal for 30 random displacements of the overall relative
geometry between the source and microphone array inside the room. For each of
these displacements we examined the performance of the system for three different
inter-microphone distances. The reverberation time of the room was measured to be
approximately 0.3 s. In the figures to follow we present the average RMSE over all
30 experiments. It’s also worth noting that experiments are conducted in presence
of ambient noise from both air-conditioning and personal computers, estimated to
be 15 dB.
Fig. 1(a) shows the average RMSE of the beamforming systems for different
distances d between the sensors. Effectively, changing the inter-microphone dis-
tance changes the resolution of the array. It is evident that the MI based beamformer
remains more robust in estimating the correct DOA for all distances. The improve-
ment of performance for both beamforming systems as the spacing decreases can
prove misleading since it is caused by the decreased resolution. Safe conclusions
were drawn by observing the comparative performance of the two systems for each
spacing.
Fig. 1 Average RMSE for the two beamforming systems (MI beamformer vs. power beamformer) during (a) experiments and (b) simulations. Values are shown for three different inter-microphone distances d (0.06, 0.14 and 0.30 m). L = 0.5 × T60 fs and N = 4.
4.2 Simulations
The most limiting factor in designing a robust beamformer is the effect of rever-
beration. As someone might expect, as the room becomes more reverberant the
performance of the estimating systems degrades because reflections enforce the
power or the MI at a wrong DOA. Fig. 3 summarizes the effect for the case when
L = 0.5 × T60 fs , N = 4. The MI beamformer exhibits a more robust behavior in all
environments when compared to the power-based beamformer of the corresponding
order.
Fig. 2 RMSE of the MI system with increasing order N (0-8) for different values of T60 (0.15, 0.30 and 0.50 s). L = 0.5 × T60 fs. Microphone spacing is 0.30 m.
Fig. 3 RMSE of the MI and power systems for varying T60 (0-0.5 s). L = 0.5 × T60 fs. Shown for a microphone spacing of 0.30 m.
We also investigate the effect of changing the distance between the microphones
for T60 = 0.30 sec, in order to compare the simulation results with those of the ex-
periments in Fig. 1.(a). Fig. 1.(b) shows the resulting RMSE as the distance of the
microphones increases. The MI system remains better for any spacing. The values
between Fig. 1(b) and Fig.1(a) are not identical but their differences remain small.
These can be explained by noting that the experimental room is far from the ide-
alized version of the simulations. In reality, the experimental environment contains
furniture and walls of different textures and materials that explain to a great degree
the differences. Additionally, the image model used in the simulations is subject to
a set of assumptions [9].
Fig. 4 RMSE of the MI and power systems for varying block size L (in multiples of T60 fs). Shown for T60 = 0.15 sec and T60 = 0.30 sec. Microphone spacing is 0.30 m.
5 Conclusions
In this paper a novel beamforming system has been introduced that detects the pres-
ence of an acoustic source based on information theory concepts. We demonstrated
that such an approach can take into account information about reverberation and
thus return DOA estimations that are more robust. This was demonstrated by a set of simulations and experiments.
Acknowledgment
This work has been partly sponsored by the European Union, under the FP7 project
HERMES.
References
Abstract Color vision deficiency (CVD) is quite common since 8%-12% of the
male and 0.5% of the female European population seem to be color-blind to some
extent. Therefore there is great research interest regarding the development of meth-
ods that modify digital color images in order to enhance the color perception by the
impaired viewers. These methods are known as daltonization techniques. This pa-
per describes a novel daltonization method that targets a specific type of color vision
deficiency, namely protanopia. First we divide the whole set of pixels into a smaller
group of clusters. Subsequently we split the clusters into two main categories: colors
that protanopes (persons with protanopia) perceive in a similar way as the general
population, and colors that protanopes perceive differently. The color clusters of the
latter category are adapted in order to improve perception, while ensuring that the
adapted colors do not conflict with colors in the first category. Our experiments in-
clude results of the implementation of the proposed method on digitized paintings,
demonstrating the effectiveness of our algorithm.
1 Introduction
Color vision deficiency (CVD) is quite common since 8%-12% of the male and
0.5% of the female European population seem to be color-blind to some extent.
There is still no known medical treatment for this kind of problem. People suffering
from any type of color vision deficiency are not considered to be seriously disabled.
However there are certain cases in the daily activities of those people where the
different perception of colors could lead to experiences that range from simply an-
Doliotis, P., Tsekouras, G., Anagnostopoulos, C.-N. and Athitsos, V., 2009, in IFIP International
Federation for Information Processing, Volume 296; Artificial Intelligence Applications and
Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 293–301.
noying (e.g., when viewing artwork or browsing websites) to really dangerous (e.g.,
traffic signalling).
Several techniques have been proposed that modify digital color images in order
to enhance the color perception and reduce the confusion by viewers with CVD.
These methods are known as daltonization techniques. In this paper we describe a
daltonization technique that targets a specific type of color vision deficiency, namely
protanopia. In our method, we first divide the set of image colors into a smaller
group of clusters, reducing the amount of distinct colors. Subsequently we split the
clusters into two main categories: a category of colors that protanopes can perceive
in the same way as the general population, and a category of colors that protanopes
perceive differently. The color clusters belonging to the latter category are adapted
according to some initial daltonization parameters. Additionally we employ a color
checking module to make sure that there will not be any color confusion between
the two aforementioned categories. If there is color confusion the daltonization pa-
rameters are being iteratively modified until the confusion diminishes.
Many researchers have worked on appropriately modeling the visual
perception of persons with CVD. In [4], a daltonization technique is presented based
on the work published in [11] in order to modify a digital image so that it is more
visible to people with CVD. The former work is also compared to the online results
that are obtained by visiting the Vischeck site [3].
The problem of color adaptation according to the user’s perception is also ad-
dressed in [9, 12]. In [9], one of the issues addressed was the problem of tailor-
ing visual content within the MPEG-21 Digital Item Adaption (DIA) framework
to meet the user’s visual perception characteristics. The image retrieval aspect for
people with CVD was discussed in [7]. A physiologically motivated human color
visual system model which represents visual information with one brightness com-
ponent and two chromatic components was proposed for testing the color perception
of people suffering from CVD [8]. In [1] a method was proposed for automatically
adapting the daltonization parameters for each image, to ensure that there is no loss
of image structure due to conflict among colors that were daltonized and colors that
remained intact. The method described in this paper builds on top of [1], introducing
a color clustering step in order to drastically improve the efficiency of color conflict
detection.
2 Image Daltonization
The method is based on the LMS system, which specifies colors in terms of the rela-
tive excitations of the longwave sensitive (L), the middlewave sensitive (M), and the
shortwave sensitive (S) cones. As dichromats lack one class of cone photopigment,
they confuse colors that differ only in the excitation of the missing class of photopig-
ment. In contrast to the case of the trichromatic observer, who perceives three color
components, two components are sufficient to specify color for the dichromat. The
color perception of dichromats can be modeled using simple color transformations.
As in [3], the transformation from RGB to LMS color is obtained using a matrix
T1 , defined as follows:
$$T_1 = \begin{pmatrix} 17.8824 & 43.5161 & 4.1193 \\ 3.4557 & 27.1554 & 3.8671 \\ 0.02996 & 0.18431 & 1.4670 \end{pmatrix} \qquad (1)$$
$$[L_p\ M_p\ S_p]^t = T_2\,[L\ M\ S]^t \qquad (4)$$
$$E_R = |R - R_p| \qquad (7)$$
$$E_G = |G - G_p| \qquad (8)$$
$$E_B = |B - B_p| \qquad (9)$$
Fig. 1 (a) Original artwork (“Prism”, modern abstract painting in acrylics by Bruce Gray [6]), (b)
Simulation of the artwork as perceived by a protanope. Note the confusion of red and black areas
in the artwork. www.brucegray.com/images/prism.jpg, (c) original image daltonized with Emod , (d)
protanope’s perception of (c)
Following [4], these errors are added back to the original image, but in such
a way that the error values are redistributed to the blue side of the spectrum,
so as to improve the perception of that color by a protanope. In particular, if
ER (i, j), EG (i, j), EB (i, j) are the error values at pixel (i, j), these error values are
converted to values ER,mod (i, j), EG,mod (i, j), EB,mod (i, j), which represent a shift of
color (ER (i, j), EG (i, j), EB (i, j)) towards the blue side of the spectrum. This conver-
sion is done using a matrix M defined as follows:
$$M = \begin{pmatrix} 0 & 0 & 0 \\ 0.7 & 1 & 0 \\ 0.7 & 0 & 1 \end{pmatrix} \qquad (10)$$
Using M, $E_{R,mod}(i,j)$, $E_{G,mod}(i,j)$, $E_{B,mod}(i,j)$ are obtained as:
$$[E_{R,mod}(i,j),\ E_{G,mod}(i,j),\ E_{B,mod}(i,j)]^t = M\,[E_R(i,j),\ E_G(i,j),\ E_B(i,j)]^t. \qquad (11)$$
Then, the daltonized color is obtained by adding the modified error back to the
original color. At position (i, j), given the color R(i, j), G(i, j), B(i, j) of the origi-
nal image, the color Rd (i, j), Gd (i, j), Bd (i, j) in the daltonized image is defined as
follows:
$$[R_d(i,j),\ G_d(i,j),\ B_d(i,j)] = [R(i,j),\ G(i,j),\ B(i,j)] + [E_{R,mod}(i,j),\ E_{G,mod}(i,j),\ E_{B,mod}(i,j)]. \qquad (12)$$
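The error-redistribution step of Equations 7 to 12 can be written down compactly. The following minimal NumPy sketch is not the authors' implementation: it assumes a simulate_protanope function implementing the protanope simulation of Equation 6, which is not reproduced in this excerpt, and it leaves clipping of the result to the valid RGB range to the caller.

```python
import numpy as np

# Error-redistribution matrix M of Equation 10 (shifts the error towards blue).
M = np.array([[0.0, 0.0, 0.0],
              [0.7, 1.0, 0.0],
              [0.7, 0.0, 1.0]])

def daltonize(rgb, simulate_protanope, M=M):
    """Equations 7-12 applied to an H x W x 3 float RGB image.

    `simulate_protanope` is assumed to map an RGB image to the RGB image
    a protanope would perceive (Equation 6); it is treated as given here.
    """
    rgb = np.asarray(rgb, dtype=float)
    rgb_p = simulate_protanope(rgb)          # (R_p, G_p, B_p)
    error = np.abs(rgb - rgb_p)              # Equations 7-9: per-channel error
    error_mod = error @ M.T                  # Equation 11: shift towards blue
    return rgb + error_mod                   # Equation 12: add the error back
```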
Using the above method, good results may be achieved in the majority of color
images, as shown in Figures 1c and 1d. However, there is still room for
improvement. The most obvious issue is that the adaptation parameters of
Equation 10 are chosen manually. In addition, there is always the possibility that,
in the daltonized image, some of the modified colors may still be confused with
other colors in the image by a protanope, as for example in Figure 2. As a result, in
addition to the loss of color information there can also be a loss of information about the
image structure.
For some colors RGB, the corresponding colors R p G p B p obtained from Equation 6
are very close to the original RGB colors. We define the set Ccorrect to be the set of
colors RGB that are present in the image and for which the corresponding R p G p B p
is within 1% of RGB. Note that 1% is a threshold which can be changed according
to experiments. We define the set Cincorrect to simply be the complement of Ccorrect
among all colors appearing in the image. In our daltonization method we want to
achieve three goals:
1. Colors in Ccorrect should not be changed.
2. Colors in Cincorrect must be daltonized.
3. No color in Cincorrect should be daltonized to a color that a protanope would per-
ceive as similar to a color from Ccorrect .
Consequently, if the colors that we obtain from Equations 11 and 12 using matrix M
violate the third of the above requirements, we use an iterative algorithm, in which
M is repeatedly modified, until the third requirement is satisfied.
In order to specify the third requirement in a quantitative way, we define a pred-
icate conflict(R1 G1 B1 , R2 G2 B2 ) as follows:
$$\mathrm{conflict}(R_1G_1B_1, R_2G_2B_2) = \begin{cases} \text{true} & \text{if } |R_1 - R_2| < d,\ |G_1 - G_2| < d,\ |B_1 - B_2| < d, \\ \text{false} & \text{otherwise.} \end{cases} \qquad (13)$$
where d is an appropriately chosen threshold (in our experiments, d = 10).
If C1 and C2 are sets of colors, we use notation setconflict(C1 ,C2 ) for the pred-
icate denoting whether there is a conflict between any color in C1 and a color in
C2 :
$$\mathrm{setconflict}(C_1, C_2) = \begin{cases} \text{true} & \text{if } \exists\, R_1G_1B_1 \in C_1,\ R_2G_2B_2 \in C_2 : \mathrm{conflict}(R_1G_1B_1, R_2G_2B_2), \\ \text{false} & \text{otherwise.} \end{cases} \qquad (14)$$
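Predicates (13) and (14) translate almost literally into code. A small sketch, assuming colors are plain (R, G, B) tuples:

```python
def conflict(c1, c2, d=10):
    """Equation 13: two colors conflict if they differ by less than d in every channel."""
    return all(abs(a - b) < d for a, b in zip(c1, c2))

def setconflict(colors1, colors2, d=10):
    """Equation 14: true if some color of the first set conflicts with some color of the second."""
    return any(conflict(c1, c2, d) for c1 in colors1 for c2 in colors2)
```

Since the check compares every color of one set against every color of the other, reducing the palette first, as described below, is what keeps it affordable.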
Given a matrix M and using Equations 11 and 12 we daltonize the colors of
Cincorrect . We define Cdalton to be the set of colors we obtain by daltonizing the colors
of Cincorrect . Furthermore we define Cprotanope to be the set of colors we obtain by
applying Equation 6 to the colors of Cdalton . Given the above definitions, our goal is
to prevent any conflicts between colors in Cprotanope and Ccorrect .
However, if the image contains a large number of distinct colors, checking for
conflicts can be too time consuming. Thus, we use clustering-based color quanti-
zation in order to reduce the number of colors we need to consider, thus obtaining
significant speedups in the overall running time. Each color can be regarded as a
point in a three dimensional space (e.g., RGB color space). Consequently, an image
can be regarded as set (or a “cloud”) of points in that space. Our goal is to create
groups of points such that:
1. Points belonging to the same group must minimize a given distance function.
2. Points belonging to different groups must maximize a given distance function.
We achieve this clustering by using the Fuzzy-C-means algorithm [13, 5, 2, 10]. An
essential parameter for Fuzzy-C-means is parameter C, which is the number of clus-
ters. In our problem, that is the number of colors with which we can describe more
efficiently our image. The more colors we use the more accurate the image becomes
but at the cost of increased running time. A good value for C in our experiments was
defined empirically (C = 100).
Next follows our pseudocode:
1. Read an image and run Fuzzy-C-means. We name cluster centers the matrix con-
taining our clusters’ centers.
2. Classify each color from cluster centers, as belonging to Ccorrect or Cincorrect .
3. Apply color daltonization to every color in Cincorrect , as described in Equation 12
and name the resulting matrix Cdalton .
4. Run protanope simulation on every color in Cdalton , as described in Equation 6
and name the resulting matrix Cprotanope .
5. If setconflict(Ccorrect ,Cprotanope ) is false, go to step 6. Otherwise go back to step
2, after modifying Matrix M appropriately as described in Equation 16.
6. Produce the result image by replacing, in the original image, every color in
Cincorrect with the corresponding color in Cdalton .
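The six steps above can be sketched as follows. This is only an illustration, not the paper's code: scikit-learn's KMeans is used as a convenient stand-in for the Fuzzy-C-means clustering, the step s of Equation 16 is an assumed value, the 1% test defined above is interpreted per channel, and the conflict/setconflict helpers are the ones sketched earlier.

```python
import numpy as np
from sklearn.cluster import KMeans   # stand-in for the Fuzzy-C-means step

def iterative_daltonize_palette(pixels, simulate_protanope,
                                C=100, s=0.1, d=10, max_iter=20):
    """Steps 1-5 of the pseudocode on an N x 3 array of pixel colors.

    Returns (c_correct, c_incorrect, c_dalton) so that step 6 (replacing the
    colors in the original image) can be performed by the caller.
    """
    # Step 1: color quantization (KMeans used here instead of Fuzzy-C-means).
    centers = KMeans(n_clusters=C, n_init=4).fit(pixels).cluster_centers_

    # Step 2: split into C_correct / C_incorrect using the 1% criterion.
    sim = simulate_protanope(centers)
    close = np.all(np.abs(sim - centers) <= 0.01 * np.abs(centers) + 1e-9, axis=1)
    c_correct, c_incorrect = centers[close], centers[~close]

    M = np.array([[-1.0, 0.0, 0.0],      # initial matrix M0 of Equation 15
                  [ 1.0, 1.0, 0.0],
                  [ 1.0, 0.0, 1.0]])
    c_dalton = c_incorrect
    for _ in range(max_iter):
        # Step 3: daltonize C_incorrect with the current M (Equations 11-12).
        error = np.abs(c_incorrect - simulate_protanope(c_incorrect))
        c_dalton = c_incorrect + error @ M.T
        # Step 4: protanope simulation of the daltonized colors (Equation 6).
        c_protanope = simulate_protanope(c_dalton)
        # Step 5: stop once no daltonized color is confused with C_correct.
        if not setconflict(c_correct, c_protanope, d):
            break
        M[1, 0] -= s                      # Equation 16: m4 decreases,
        M[2, 0] += s                      #              m7 increases.
    return c_correct, c_incorrect, c_dalton
```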
The initial value M0 given for matrix M of Equation 12 (used in step 3) is defined
as follows:
$$M_0 = \begin{pmatrix} m_1 & m_2 & m_3 \\ m_{4,0} & m_5 & m_6 \\ m_{7,0} & m_8 & m_9 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \qquad (15)$$
When we execute step 3 for the first time we use matrix M0 . At the t-th iteration,
matrix Mt is obtained from Mt−1 as follows:
$$M_t = \begin{pmatrix} m_1 & m_2 & m_3 \\ m_{4,t} & m_5 & m_6 \\ m_{7,t} & m_8 & m_9 \end{pmatrix} = \begin{pmatrix} m_1 & m_2 & m_3 \\ m_{4,t-1} - s & m_5 & m_6 \\ m_{7,t-1} + s & m_8 & m_9 \end{pmatrix} \qquad (16)$$
Fig. 2 (a) Original image, where A(255,51,204), B(73,73,203) and C(193,193,255), (b) protanope
perception of (a). Note that left “1” is not visible, (c) First iteration of our algorithm: note that now
a protanope can’t perceive right “1” , (d) Second iteration: note that the right “1” still isn’t clear
enough for a protanope, (e) Third iteration: note that now right “1” is visible to the protanope.
Fig. 3 (a) Original artwork “Prism”, modern abstract painting in acrylics by Bruce Gray [6] , (b)
Protanope perception of (a), (c) Protanope perception after running our algorithm.
Fig. 4 Examples for Paul Gauguin’s painting “Market Day”: (a) Original image, (b) Protanope
vision of (a) , (c) Protanope vision of (a) after running our algorithm
4 Experiments
First we test our algorithm on the image shown in Figure 2a. Note that in 2b, which
shows how the image is perceived by a protanope, the left “1” is not visible. In
Figure 2c, which shows the result after the first iteration, we are missing the right
“1”, whereas in Figure 2d, which shows the result of the second iteration, the right
“1” is visible, but barely. Finally in Figure 2e, which shows the result (as perceived
by a protanope) after the third iteration, both the left and the right “1” are visible.
In Figure 3a we show a modern abstract painting by Bruce Gray [6], called
“Prism”. Figure 3b shows the protanope’s perception of the painting. We should
note that red color is perceived as black, resulting in the protanope perceiving sev-
eral pairs of adjacent red and black regions as single regions. In Figure 3c we show
the result (as perceived by a protanope) after running our algorithm. We can see that
the red and black regions that appeared merged in 3b now have distinct colors.
Finally, in Figure 4a we show the painting “Market Day” by Paul Gauguin. Figure
4b shows the protanope’s perception of that painting. In Figure 4c we show the re-
sulting image (as perceived by a protanope) after running our algorithm. We should
note that several parts of the image structure are easier to perceive in Figure 4c
compared to Figure 4b, including, e.g., the contrast in the bottom part of the image
between the red and green colors that are shown in Figure 4a.
5 Conclusions
In this paper, a daltonization algorithm for people suffering from protanopia is pro-
posed. More specifically, an intelligent iterative technique is described for the se-
lection of the adaptation parameters. Computational time is drastically reduced due
to the use of color quantization. One of our method’s main advantages is that with
minor modifications, it can be applied to other types of Color Vision Deficiency,
such as deuteranopia, widely known as daltonism. An interesting future direction is
extending the proposed method to handle video content in addition to static images.
References
8. Curtis E. Martin, J. O. Keller, Steven K. Rogers, and Matthew Kabrisky. Color blindness and
a color human visual system model. IEEE Transactions on Systems, Man, and Cybernetics,
Part A, 30(4):494–500, 2000.
9. Jeho Nam, Yong Man Ro, Youngsik Huh, and Munchurl Kim. Visual content adaptation
according to user perception characteristics. IEEE Transactions On Multimedia, 7(3):435–
445, 2005.
10. N. R. Pal and J. C. Bezdek. On clustering validity for the fuzzy c-means model. IEEE
Transactions on Fuzzy Systems, 3:370–379, 1995.
11. F. Vinot, H. Brettel, and J. D. Mollon. Digital video colourmaps for checking the legibility of
displays by dichromats. Color Research and Application, 24(4):243–252, 1999.
12. Seungji Yang and Yong Man Ro. Visual contents adaptation for color vision deficiency. In
International Conference on Image Processing, volume 1, pages 453–456, 2003.
13. Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8(3):338–353, 1965.
An intelligent Fuzzy Inference System for Risk
Estimation Using Matlab Platform: the Case of
Forest Fires in Greece
1 PhD candidate, Democritus University of Thrace, Greece, [email protected]
2 Associate Professor, Democritus University of Thrace, Greece, [email protected]
3 Professor, Democritus University of Thrace, Greece, [email protected]
Abstract This paper aims at the design of an intelligent Fuzzy Inference System
that evaluates risk due to natural disasters. Though its basic framework can be
easily adjusted to any type of natural hazard, it has been specifically
designed to be applied to forest fire risk over the Greek terrain.
Its purpose is to create a descending list of the areas under study, according to
their degree of risk. This will provide important aid towards the task of properly
distributing fire-fighting resources. It is designed and implemented in Matlab’s
integrated Fuzzy Logic Toolbox. It estimates two basic kinds of risk indices,
namely the man caused risk and the natural one. The fuzzy membership functions
used in this project are the Triangular and the Semi-Triangular.
1 Introduction
Forest fire risk estimation is a major issue. The necessity for more efficient methods
of fire-fighting resource allocation becomes more and more urgent. This paper
aims at the design of a new intelligent decision support system that performs
ranking of the areas under consideration according to their forest fire risk. It is de-
signed and implemented in Matlab and it uses fuzzy logic and fuzzy sets. The sys-
tem assigns a degree of forest fire risk (DFFR) to each area by using Matlab’s
fuzzy toolbox and its integrated functions. The whole model that has been devel-
oped for this purpose consists of three distinct parts.
The first part is related to the determination of the main n risk factors (RF) affect-
ing the specific risk problem. Three fuzzy sets (FS) were formed for each RF:
1. S̃1 = {(µj(Aj), Xi) | forest departments Aj of small risk, j = 1…N, i = 1…M}
2. S̃2 = {(κj(Aj), Xi) | forest departments Aj of average risk, j = 1…N, i = 1…M}
3. S̃3 = {(λj(Aj), Xi) | forest departments Aj of high risk, j = 1…N, i = 1…M}
Tsataltzinos, T., Iliadis, L. and Spartalis, S., 2009, in IFIP International Federation for Information
Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L.,
Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 303–310.
[Figure: triangular membership functions over the points a–f, defining the small, average and high risk fuzzy sets of each risk factor i; membership values range from 0 to 1.]
The risk factors are distinguished into two basic categories: Human factors and
Natural ones (Kailidis, 1990). Each one of these general risk types consists of sev-
eral sub-factors that influence in their own way the final risk degree (RD).
The second part was the design of the system’s main rule set that would per-
form the unification of the partial degrees of risk and the output of the unified risk
index (URI). These rules are distinct for each risk factor and most of them are
specified in bibliography (Kailidis, 1990). The greater the number of factors is, the
greater the number of rules required. This is the typical problem of combinatorial
explosion in the development of rule based knowledge systems.
To avoid the use of a huge number of rules, so that the project retains its sim-
plicity, the factors were divided into smaller subgroups according to their nature.
Decision tables were created and used for each subgroup. In this way the number
of rules was minimized significantly.
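The effect of this grouping on the rule count can be illustrated with a small calculation; the three-way split below is only an assumption for illustration, the paper's actual grouping ends up with 73 rules.

```python
# Rules needed with three linguistic levels (low / medium / high) per factor.
levels, factors = 3, 9
flat_rules = levels ** factors                    # one rule per input combination
subgroups = [3, 3, 3]                             # assumed split of the nine factors
grouped_rules = sum(levels ** k for k in subgroups) + levels ** len(subgroups)
print(flat_rules, grouped_rules)                  # 19683 versus 108
```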
The third part of the development process was the application of the rule set for
the production of the URI. The URI can be produced by applying various types of
fuzzy relations to perform fuzzy AND, fuzzy OR operations between the fuzzy
sets (and consequently between partial risk indices). The functions for the con-
junction are called T-norms and for the union T-conorms or S-norms (Kandel A.,
1992).
The system was developed using Matlab’s integrated Fuzzy Logic Toolbox. The
raw data was input into an MS Access database and extracted into MS Excel data-
sheets. Next, each column of the data was extracted into a separate Excel file to
form an input variable for Matlab. Using Matlab’s xlsread and xlswrite commands,
the final results were also exported into an Excel file. The triangular fuzzy membership
function was implemented by the triamf function of the
fuzzy toolbox. This project applied Matlab’s integrated Mamdani Inference
method, which operates in a forward-chaining mode. The Mamdani inference sys-
tem comprises five parts:
1. Fuzzification of the input with the triangular membership function (Function 1)
2. Application of the fuzzy operators: the OR operation is performed by $\mu(x) = \max(x_1, \dots, x_n)$, while the AND operation uses the algebraic product $\mu(x) = x_1 x_2 \cdots x_n$
3. Application of the implication method (min): $\mu(x) = \min(x_1, \dots, x_n)$
4. Aggregation of the output values with the use of the max function: $\mu(x) = \max(x_1, \dots, x_n)$
5. Defuzzification of the output with the centroid method: $\mu(\chi) = \dfrac{\int_{\chi} x f(x)\,dx}{\int_{\chi} f(x)\,dx}$
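As an illustration of these five steps (and not of the paper's actual rule base), the sketch below runs one toy rule per output set through the same operators: algebraic product for AND, min for implication, max for aggregation and the centroid for defuzzification. All membership parameters and rule combinations here are invented for the example.

```python
import numpy as np

def triamf(x, a, c, b):
    """Triangular membership function of Function 1: support [a, b], peak at c."""
    x = np.asarray(x, dtype=float)
    rising = np.clip((x - a) / (c - a), 0.0, 1.0)
    falling = np.clip((b - x) / (b - c), 0.0, 1.0)
    return np.minimum(rising, falling)

def mamdani_danger(temp_deg, wind_deg, hum_deg, out_params, x_out):
    """One pass through the five Mamdani steps for already fuzzified inputs.

    temp_deg / wind_deg / hum_deg map the levels "L", "M", "H" to membership
    degrees; out_params maps each output set to its (a, c, b) triangle.
    The three rules below are illustrative only.
    """
    x_out = np.asarray(x_out, dtype=float)
    firing = {                                   # step 2: AND = algebraic product
        "low":    temp_deg["L"] * wind_deg["L"] * hum_deg["H"],
        "medium": temp_deg["M"] * wind_deg["M"] * hum_deg["M"],
        "high":   temp_deg["H"] * wind_deg["H"] * hum_deg["L"],
    }
    aggregated = np.zeros_like(x_out)
    for name, strength in firing.items():
        clipped = np.minimum(strength, triamf(x_out, *out_params[name]))  # step 3: min
        aggregated = np.maximum(aggregated, clipped)                      # step 4: max
    if aggregated.sum() == 0.0:
        return float("nan")
    return float((x_out * aggregated).sum() / aggregated.sum())          # step 5: centroid
```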
In the case of the human risk factors, the population density and the tourism
data was gathered from the General Secretariat of National Statistical Service of
Greece. The land value was estimated with the use of the previous two: the bigger
the population density of a forest department and the greater its tourist development,
the higher its land value. The value is represented by pure numbers from 1
to 10. The fourth factor is input to exploit the experience and the intuition of a for-
est fire expert on the risk degree of an area.
In the case of natural factors, the Average Annual Temperature, Humidity and
Wind Speed were used. For better results, the above three factors’ data were sepa-
rated into seasons or months, because the risk has a seasonal nature. Yet the sys-
tem is capable of using even daily updates of these data to produce risk analysis
on a more frequent basis. The percentage of forest cover does not include the kind
of vegetation of each forest department due to the fact that this is a pilot effort.
The system also uses the Average Altitude of every data point as a risk factor.
The DSS was applied in all of the Greek territory. Meteorological and morpho-
logical data was gathered from Greek public services. Population density data was
gathered from General Secretariat of National Statistical Service of Greece. The
forest fire data used cover the period between 1983 and 1994, and the population
census of 1991. The results were extracted into different MS Excel files. This is a
pilot application just to indicate the performance validity of the prototype.
Fuzzy Logic (FL) and Fuzzy Sets (FS) can provide aid towards modeling hu-
man knowledge and real world concepts (Leondes, 1998). For example the model-
ing of the concept “Hot area” in terms of average temperature, is both subjective
and imprecise so it can be considered as a fuzzy set (FS). It is clear that real world
situations can be described with the use of proper linguistics, each one defined by
a corresponding FS. For every FS there exists a degree of membership (DOM)
µs(X) that is mapped on [0,1]. For example every forest department belongs to the
FS “fire risky forest department” with a different degree of membership (Kandel,
1992). The functions used to define the DOM are called fuzzy membership func-
tions (FMF) and in this project the triangular FMF (TRIAMF) and the semi-
triangular FMF (semi-TRIAMF) were applied (Iliadis L. 2005). Functions 1 and 2
below represent the TRIAMF and the semi-TRIAMF respectively:
Function 1 (TRIAMF): $$\mu_s(X) = \begin{cases} 0 & \text{if } X < a \\ (X - a)/(c - a) & \text{if } X \in [a, c] \\ (b - X)/(b - c) & \text{if } X \in [c, b] \\ 0 & \text{if } X > b \end{cases}$$
Function 2 (semi-TRIAMF): $$\mu_s(X) = \begin{cases} 0 & \text{if } X < a \\ (X - a)/(b - a) & \text{if } X \in [a, b] \end{cases}$$
Singleton functions were used to determine the boundaries of the membership
functions. The system assigns each forest department three Partial Risk Indices
(PRI), for every one of the nine factors that are taken into consideration, as is
shown below:
An Intelligence Fuzzy Inference System for Risk Estimation 307
1. Low Danger due to each factor
2. Medium Danger due to each factor
3. High Danger due to each factor
Table 2. Values of the Singleton Fuzzy membership functions min and max
For each factor the minimum and maximum boundaries of its fuzzy member-
ship function are shown in table 2 above. This method allows the use of any kind
of data and does not need specific metrics for every factor. Due to this fact, there
was no need to make any changes to the raw data provided by the Greek national
services. The above steps resulted in 27 different PRIs. The more detailed
the linguistics become the greater the number of PRIs. Those 27 PRIs are too
many and not quite helpful. The next step was to unify them in one Unified Risk
Index (URI). To do this, this project had to take into consideration the human ex-
perience and to apply the rules that a human expert would use. For example if it is
known that an area has great population and tourism (which results in great land
value), it is near the sea (which means low altitude) and it has great forest cover,
then it definitely is a very dangerous area and needs to have our attention. In this
example, four parameters were used.
Fig. 3 Structure of the fuzzy inference tool (Mamdani type): inputs human.factor and natural.factor, output danger.
Following the logic of the above structure (Fig. 3) the number of rules was
reduced significantly making the operation of the system much simpler. Combined
with the proper decision tables (Table 3) the total number of rules was reduced to
73.
This model and its corresponding intelligent information system provide a de-
scending ranking of the forest departments in Greece according to their forest fire
risk. The final membership values of all forest departments can only be compared
between each other. The bigger the difference between the final values of two de-
partments is, the bigger the difference in actual risk they have. To check the re-
sults validity, each year’s data was processed separately. The resulting descending
An Intelligence Fuzzy Inference System for Risk Estimation 309
Table 3. Decision Table sample
Temperature L L L L L L L L L M M M M M M M M M H H H H H H H H H
Wind L L L M M M H H H L L L M M M H H H L L L M M M H H H
Humidity L M H L M H L M H L M H L M H L M H L M H L M H L M H
Low Danger X X X X X X X
Medium Danger X X X X X X X X X X
High Danger X X X X X X X X X X
list produced from every year’s data was compared to the list obtained from the
ranking of the departments on their actual annual number of forest fires of the fol-
lowing year. The compatibility of this method to the actual annual forest fire situa-
tion varied from 52% to 70% (Table 4). In some cases, forest departments used to
mark intentionally set agricultural fires as “forest fires”, which makes the actual
logic of the rule set less efficient. However, in a future effort this type of data
should be removed from the fire database.
Table 4. Compatibility with the following year's actual ranking for the risky area fuzzy set.
Period:        83-84 84-85 85-86 86-87 87-88 88-89 89-90 90-91 91-92 92-93
Compatibility:  52%   58%   62%   62%   66%   64%   62%   70%   54%   58%
The final ranking of all the forest departments remains almost the same despite
the use of other fuzzy membership functions. The system was also tested with the
use of Trapezoidal, semi-Trapezoidal and Sigmoid membership functions and the
differentiation in the results was not significant. Nevertheless, even if 52% to 70%
may not seem an impressively reliable performance from the statistical point of
view, it is actually a performance offering very good practical value. Obviously,
many governments would be very happy if they could know a year in advance
52%-70% of the areas that are seriously threatened by forest fires.
Testing showed that the years with a lack of detailed data for many forest de-
partments resulted in low compatibility, while on the other hand the results were
quite impressive when there was enough data for all the departments. This was also
a first attempt to use detailed data for the human factors. The first tests in-
cluded only the “population density” and “tourism” factors. These tests resulted in
a maximum compatibility of 70%.
The more detailed the human-factor data becomes, and with the help of a proper
human expert, the better the accuracy of the system will become. Forest
fires can occur due to a great number of factors. Many of those factors are ex-
tremely unpredictable and immeasurable. These facts make fire risk estimation a
complicated problem that can be studied with the use of fuzzy logic. This system
uses an alternative way of thinking and offers a different approach. The fact that it
can use any kind of data available and that it can produce results as soon as the da-
ta is inserted, makes it a valuable tool for estimating which forest department is in
danger. On the other hand, due to the fact that human behavior is pretty unpredict-
able, the expert’s opinion is necessary to enable the production of better results or
even to perform various scenarios.
The system has shown that it is quite useful and that it can improve its per-
formance if more data is gathered. It will also be expanded towards the estimation
of the daily forest fire risk which can be seen as the problem of having favorable
forest fire ignition and acceleration conditions.
References
1. Tsataltzinos T. (2007) “A fuzzy decision support system evaluating qualitative attributes
towards forest fire risk estimation”, Proceedings 10th International Conference on Engi-
neering Applications of Neural Networks, Thessaloniki, Hellas, August 2007.
2. Iliadis L. (2005) “A decision support system applying an integrated Fuzzy model for long
- term forest fire risk estimation” Environmental Modelling and Software, Elsevier Sci-
ence, Vol.20, No.5, pp.613-621, May 2005.
3. Iliadis L., Maris F., Tsataltzinos T. (2005). “An innovative Decision Support System us-
ing Fuzzy Reasoning for the Estimation of Mountainous Watersheds Torrential Risk: The
case of Lakes Koroneia and Vovli”, Proceedings IUFRO Conference “Sustainable For-
estry in theory and practice: recent advances in inventory and monitoring statistics and
modeling information and knowledge management and policy science” Pacific Northwest
Research Station GTR-PNW-688, University of Edinburgh, UK,
4. Kandel A., 1992, Fuzzy Expert Systems. CRC Press. USA.
5. Kecman V., 2001, Learning and Soft Computing. MIT Press. London England.
6. Leondes C.T., 1998, “Fuzzy Logic and Expert Systems Applications”, Academic Press.
California USA.
7. Iliadis L., Spartalis S., Maris F., Marinos D. 2004 “A Decision Support System Unifying
Trapezoidal Function Membership Values using T-Norms”. Proceedings International
Conference in Numerical Analysis and Applied Mathematics (ICNAAM), J. Wiley-VCH
Verlag GmbH Publishing co., Weinheim Germany.
8. Zhang J.X., Huang C.F., 2005. “Cartographic Representation of the Uncertainty related to
natural disaster risk: overview and state of the art”, LNAI Vol. 3327, pp. 213-220
9. Nguyen H., Walker E., 2000. “A First Course in Fuzzy Logic”, Chapman and Hall, Li-
brary of the Congress, USA
10. Cox E., 2005. Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration,
Elsevier Science, USA
11. E A Johnson, Kiyoko Miyanishi, 2001, “Forest Fire: Behavior and Ecological Effects”
12. J Kahlert, H Frank, 1994, “Fuzzy-Logik und Fuzzy-Control”
13. Mamdani, E.H. and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic
controller," International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.
14. Kailidis D. 1990, “Forest Fires”
MSRS: Critique on its Usability via a Path
Planning Algorithm Implementation
1 Introduction
In the last few years there has been an increasing interest in the unification of arti-
ficial intelligence and robotics platforms. This has led to the creation and use of an
expanding number of robotics software platforms, with a significant amount of
undergraduate classes making use of the new technologies by creating rather ad-
vanced robotics projects within one or two semester courses [1, 23, 26]. In 2006
Microsoft entered the robotics field with its own robotics platform, named Micro-
soft Robotics Studio, competing against already widespread platforms such as the
Player Project.
In this paper we implement a path planning algorithm in a simulated robotics
environment, which is able to change its topology and the number of ob-
stacles it contains during the agent’s movement. The robotics platform that
will be used is Microsoft’s Robotics Studio, due to the fact that its introduction
has caused extensive discussion and controversy as to whether or not it is suited
for academic research, or educational and industrial purposes [3, 24, 25]. We will
address this mixture of skepticism and enthusiasm by giving Microsoft’s Robotics
Studio’s features, ease of use and learnability a thorough critique, through the
implementation of the aforementioned algorithm.
Markou, G. and Refanidis, I., 2009, in IFIP International Federation for Information Processing,
Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I.,
Bramer, M.; (Boston: Springer), pp. 311–320.
Due to the nature of the simulated environment in which the agent will move,
the path planning algorithm that we will implement will have to be able to create a
new plan or adapt an existing one every time the environment’s topology changes.
Koenig et al. [7] suggested that in systems where an agent has to constantly adapt
its plans due to changes in its knowledge of the world, an incremental search
method could be very beneficial as it can solve problems potentially faster than
solving each search problem from scratch. They combined such a method with a
heuristic one, which finds shortest paths for path-planning problems faster than
uninformed search methods. This led to the creation of the algorithm we will im-
plement, Lifelong Planning A* (LPA*) [8], which produces a plan whose quality
remains consistently as good as one achieved by planning from scratch.
The remainder of the paper is organized as follows: In Section 2 we review
works related to our own research, while in Section 3 we compare Microsoft’s
Robotics Studio to other prominent robotics platforms. Section 4 focuses on the
theoretical aspects of the LPA* algorithm. In Section 5 we discuss the domain that
was created in Robotics Studio, both in regard to the simulated maze and to the
robot that was used. Section 6 presents the experiments we implemented, and Sec-
tion 7 concludes the paper and poses directions for future work.
2 Related Work
For Microsoft Robotics Studio (MSRS) to become the standard robotics develop-
ment platform, it has to achieve mainly two different goals: First, partnerships
within the robotic industry, as well as with academia. Secondly, the program
itself needs to be able to offer advantages in comparison with other platforms. The
first goal has been fulfilled to a point, as several companies, universities and re-
search institutes opted to support and use MSRS, such as Kuka, Robosoft,
fischertechnik and Parallax, Inc. [13]. Additionally, it is currently available for
free download and use by anyone using it for noncommercial purposes.
As to the second goal, in [6] the author concluded that MSRS offers a wide
range of technological solutions to problems common in the robotic field, by pro-
viding features such as visual programming or its combined system of concur-
rency control with efficient distributed message passing. However, he admits that
there are still evident limitations to the program, like its integration with low level
processors. The former opinion is shared by Tsai et al in [23] who used MSRS in
an effort to design a service oriented computer course for high schools. They con-
cluded that there are several disadvantages in the structure of the program, mainly
that the visual programming language that is used in MSRS requires detailed
knowledge of an imperative programming language, and that the loop structures
which are used in it are implemented by “Goto”, instead of by structured constructs.
Also, they pointed out that some of the service oriented features that Microsoft
had promised to provide were not available.
Others, however, are far more positive towards MSRS. Workman and Elzer in
[26] used the program in an upper-level undergraduate robotics elective to docu-
ment its usefulness in such an academic environment. They found that MSRS pro-
vided a great link between the language syntax already known to students and un-
familiar robotics semantics and highly recommended its use, adding that they were
quite satisfied with the available features of the program and the support it pro-
vided for different hardware. Tick in [22] goes even further to suggest that the in-
troduction of MSRS in the robotic market shows the future direction for pro-
gramming for Autonomous Mobile Research Robots and could possibly determine
the evolution of these systems as its own features will force other platforms to de-
velop their competitive products so as to offer similar capabilities.
In conclusion, based on the related bibliography up-to-date it still remains un-
clear whether MSRS will evolve to be the industry’s standard, as other Micro-
soft’s programs have achieved in the past. On the other hand, it is quite definitive
that it has a lot of useful features to offer, especially in the educational field, as
well as that it is already at least a simple starting point for anyone who wants to
become involved with a field as complex as robotics.
Before we present the domain we created in MSRS we briefly discuss the similari-
ties and differences of it in comparison to some of the most prevalent robotics
platforms. Although MSRS is available as a free download for researchers or hob-
byists, it is not open source, and it is also not free of charge if intended for com-
mercial use, whereas several platforms like the Player Project are both. Moreover,
MSRS is the only platform in our comparison that can only be used in one operat-
ing system, while most are compatible with at least two, typically both Windows
and Linux operating systems. The Player Project and the Orocos Project do not na-
tively support Windows, but the former can run on Linux, Solaris, Mac OSX and
*BSD, whereas the latter is aimed at Linux systems, but has also been ported to
Mac OSX. One other major difference of Microsoft’s robotics platform in contrast
to its antagonists is that it does not provide a complete robotics intelligence system
so that the robots it supports can be made autonomous, but relies on the program-
mers to implement such behaviours.
Its advantages over the competition, however, are also significant. It is one of
the few major robotics platforms - along with Gostai’s and Cyberbotics’ collabo-
rative platform Urbi for Webots - to provide a visual programming environment,
and its architecture is based on distributed services, with these services being able
to be constructed in reusable blocks. Furthermore, the platform enjoys the finan-
cial and technological support of one of the largest corporations in the world. In
Table 1 there is a comparison of some of the available characteristics of six of the
most widely used robotics platforms today.
Table 1. Features of several of the most prominent robotics platforms [2, 4, 20].
4 Lifelong Planning A*
It is very common for artificial intelligence systems to try and solve path-planning
problems in one shot, without considering that the domain in which they operate
might change, thus forcing them to adapt the plan that they have already calcu-
lated. Solving the new path-planning problem independently might suffice if the
domain is sufficiently small and the changes in it are infrequent, but this is not
usually the case.
Koenig et al in [8] developed the Lifelong Planning algorithm to be able to re-
peatedly find a shortest path between two given vertexes faster than executing a
complete recalculation of it, in cases where this would be considered a waste of
computational resources and time. It combines properties of a heuristic algorithm,
namely A* [5], and an incremental one, DynamicSWSF-FP [16]. The first search
LPA* executes is identical to a search by a version of A* that breaks ties in favour
of vertices with smaller g-values. The rest of its searches, which take place when a
change in the domain happens, however, are significantly faster. This is achieved
by using techniques which allow the algorithm to recognize the parts of the search
tree which remain unchanged in the new one.
Properties of A* are used to focus the search on parts of the tree that are more
likely to be part of the shortest path and determine which start distances should not
be computed at all, while DynamicSWSF-FP is used to decide whether certain dis-
tances remain the same and should not be recomputed. The combination of these
techniques can be very efficient in reducing the necessary time to recalculate a
new path if the differences between the old and the new domain are not signifi-
cant, and the changes were close to the goal. Finally, it is noteworthy that our im-
plementation does not follow the original LPA* algorithm. Instead we opted to
implement the backwards version presented in [9] which continuously calculates a
new shortest path from the goal vertex to the agent’s current position, and not, as
it originally was, from the start vertex to the goal.
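To make the algorithm concrete, the sketch below implements the core of LPA* on a 4-connected grid with unit edge costs. It follows the forward formulation summarized above (the paper's own implementation is the backwards variant of [9], which simply swaps the roles of the two endpoints), uses a lazily pruned binary heap instead of the textbook priority queue, and a Manhattan-distance heuristic; the grid representation and the set_blocked helper are assumptions of this sketch, not MSRS code.

```python
import heapq
from math import inf

class LPAStar:
    """Minimal LPA* on a 4-connected grid; grid maps (x, y) -> True if blocked."""

    def __init__(self, grid, start, goal):
        self.grid, self.start, self.goal = grid, start, goal
        self.g, self.rhs = {}, {start: 0.0}
        self.U = [(self.key(start), start)]        # priority queue (lazy deletion)

    def h(self, s):                                # Manhattan-distance heuristic
        return abs(s[0] - self.goal[0]) + abs(s[1] - self.goal[1])

    def key(self, s):
        m = min(self.g.get(s, inf), self.rhs.get(s, inf))
        return (m + self.h(s), m)

    def neighbours(self, s):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (s[0] + dx, s[1] + dy)
            if not self.grid.get(n, True):         # inside the grid and not blocked
                yield n

    def update_vertex(self, u):
        if u != self.start:
            if self.grid.get(u, True):
                self.rhs[u] = inf                  # a blocked cell is unreachable
            else:                                  # one-step lookahead value
                self.rhs[u] = min((self.g.get(n, inf) + 1 for n in self.neighbours(u)),
                                  default=inf)
        if self.g.get(u, inf) != self.rhs.get(u, inf):
            heapq.heappush(self.U, (self.key(u), u))

    def compute_shortest_path(self):
        while self.U:
            k_old, u = self.U[0]
            if k_old >= self.key(self.goal) and \
               self.g.get(self.goal, inf) == self.rhs.get(self.goal, inf):
                break                              # the goal's value is final
            heapq.heappop(self.U)
            if self.g.get(u, inf) == self.rhs.get(u, inf):
                continue                           # stale entry of a consistent vertex
            if k_old < self.key(u):
                heapq.heappush(self.U, (self.key(u), u))   # key grew: requeue
            elif self.g.get(u, inf) > self.rhs.get(u, inf):
                self.g[u] = self.rhs[u]            # locally overconsistent
                for n in self.neighbours(u):
                    self.update_vertex(n)
            else:
                self.g[u] = inf                    # locally underconsistent
                for n in list(self.neighbours(u)) + [u]:
                    self.update_vertex(n)

    def set_blocked(self, cell, blocked):
        """Report a topology change, repair the affected vertices and replan."""
        self.grid[cell] = blocked
        for n in list(self.neighbours(cell)) + [cell]:
            self.update_vertex(n)
        self.compute_shortest_path()
```

After the maze changes, only set_blocked needs to be called for the affected cells; just the vertices whose one-step lookahead values were invalidated are re-expanded, which is where the speed-up over planning from scratch comes from.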
5 Maze Domain
The entire simulation domain was created using Microsoft Robotics Studio 1.5
Refresh, which was the current version of the program when we started working
on this paper. Subsequently, as Microsoft released a new version of the platform
Microsoft Robotics Developer Studio (MRDS) 2008 we migrated our project to
the newest version of the program. The platform allows the creation of new user-
defined entities, which can be associated with a mesh, making the entity appear
more realistic. As a three-dimensional mesh can be created and imported into the
MSRS’ simulations environment from most 3D graphical editing programs [15],
the resulting simulation can reflect almost any real situation.
Although creating a particularly realistic environment is not suited for a novice
user as it can be a very complex procedure, several lifelike environments exist as
built-in samples in Microsoft’s Visual Simulation Environment in MRDS 2008.
They have been developed by SimplySim, a French company that provides profes-
sional quality real time 3D simulations, and depict environments ranging from ur-
ban sceneries and apartments to a forest [19].
The environment for our experiments is a much simpler one, based on the
“MazeSimulator” project, a program which allows users to create labyrinths based
on a bitmap image. It was created by Trevor Taylor [21], who in turn used ele-
ments from previous work done by Ben Axelrod. The maze environment we simu-
lated is explained in further detail in Section 5.1.
5.2 Robot
Microsoft Robotics Studio 1.5 Refresh supported - with built-in services - a wide
variety of robots, ranging from simple and affordable hobbyist robots, such as the
iRobot Create, to sophisticated humanoid robots capable of performing fighting
and acrobatics, like the Kondo KHR-1. The list also includes the Lego Mind-
storms NXT, MobileRobots’ Pioneer 3DX, the Boe-Bot Robot from Parallax and
fischertechnik’s ROBO Interface [14]. All the aforementioned robots are also sup-
ported in MRDS 2008, with the exception of the Parallax Boe-Bot. The robot used
in our experiments was a Pioneer 3DX with a SICK laser range finder mounted on
top of it, as at the time it was one of the most widely used in various MSRS tuto-
rials and projects.
We have defined the movement of the robot to consist of three parts. First, the
robot moves in a straight line for a distance equal to the length of a node. Then, it
decides, based on the plan created by LPA*, whether or not it is required to make
a turn, and finally it executes the turn, rotating by angles which are multiples of 90
degrees. Using the laser range finder, the robot builds a tri-color map of the envi-
ronment, in which white color symbolizes free space that the robot has explored.
Black color is drawn at the points of the map where the laser hit an obstacle, and
the rest of the map – the part of the environment that the robot has not explored, is
shown in grey color. Each time the robot moves through a specific location, the
part of the map that corresponds to that region will be overwritten by the new data
that the robot collects. In essence, we build a simple occupancy grid map, with
each cell of it containing a value that represents the possibility that it is occupied.
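A minimal sketch of such a tri-color occupancy grid, with the unknown/free/occupied encoding and the overwrite-on-revisit behaviour described above (the 0 / 0.5 / 1 cell values and the indexing scheme are assumptions of this sketch, not the MSRS service interface):

```python
import numpy as np

class OccupancyGrid:
    """Tri-color map: grey (unknown) cells are overwritten by the newest laser data."""

    FREE, UNKNOWN, OCCUPIED = 0.0, 0.5, 1.0

    def __init__(self, width, height):
        self.cells = np.full((height, width), self.UNKNOWN)

    def update(self, free_cells, hit_cells):
        """free_cells / hit_cells: (row, col) indices derived from one laser scan."""
        for r, c in free_cells:
            self.cells[r, c] = self.FREE        # white: explored free space
        for r, c in hit_cells:
            self.cells[r, c] = self.OCCUPIED    # black: the laser hit an obstacle
```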
6 Experiments
We created three different experiments, all with the same initial maze settings, but
each one changing in a different way after the robot had reached a certain node of
the domain. In two of the experiments the changes were known beforehand, whe-
reas in the last one they were random. In each one, however, the changes were mi-
nimal, blocking / unblocking a maximum of two nodes.
We implemented LPA* in MSRS without having to study the program in great
depth or learn a new programming language, since the support for multiple lan-
guages gave us the opportunity to work in one related to our previous knowledge,
in our case C#. An inexperienced user however, has the option to use a graphical
“drag-and-drop” programming language provided by Microsoft, which is designed
on a dataflow-based model. Microsoft’s Visual Programming Language (VPL) al-
lows users to create their program by simply “orchestrating activities”, that is,
connecting them to other activity blocks. An activity is a block with inputs and
outputs that can represent pre-built services, data-flow control, a function, or
even a composition of multiple activities.
Initially, it was our intention to make use of the visual programming environ-
ment that Microsoft developed to implement our project, so as to additionally
document the strengths and weakness of the new programming language as well
as MSRS. However, the task proved to be extremely difficult, if not impossible,
due to obvious deficiencies of VPL: First of all its diagrams tend to become ex-
ceedingly large as the program’s complexity increases. Moreover, VPL has lim-
ited support for arbitrary user-defined data types and does not support a generic
object which, naturally, is an important restriction to a programmer's tools. What
is more, the only type of control flow and collection of items that have built-in
support in VPL are “if statements” and lists respectively; that is, recursion and ar-
rays are not natively supported at the moment.
Thus, expert programmers will likely prefer to write in an imperative pro-
gramming language, although they can still find VPL useful as a tool, especially if
they are not familiar with MSRS’ environment, as it can easily be used for creat-
ing the skeleton of a basic program by wiring activities to each other and auto-
matically generating the consequent C# code through it. The opinion we formed
through our experience though, is that VPL is best suited for novice users who on-
ly have a basic understanding of programming concepts such as variables, and
might enjoy the easiness of not writing any code.
One element of the platform that is especially helpful to the programming
process is the Concurrency and Coordination Runtime (CCR), a programming
model that facilitates the development of programs that handle asynchronous be-
havior. Instead of writing complex multithreaded code to coordinate the available
sensors and motors functioning at the same time on a robot, the CCR handles the
required messaging and orchestration efficiently as its function is to “manage
asynchronous operations, exploit parallel hardware and deal with concurrency and
partial failure” [12]. Furthermore, it has been proven to be not only useful as a part
of MSRS, but in non-robotics development processes [11, 17].
As aforementioned, we implemented LPA* so that it can work backwards. The
reason behind this choice was that in this way we were able to calculate a new
shortest path for the part of the maze we were interested in, i.e., from the goal
node to the robot. Had we used the original version of LPA*, the algorithm would
calculate an entirely new shortest path from the start node to the goal. The al-
gorithm was applied successfully in the MSRS domain we created
and performed as one would expect from the theoretical properties of LPA* described
in [7, 8, 9].
The simulation environment was aesthetically appealing and served our func-
tional needs. Based on the robot’s interaction with it and in particular while the
robot followed the course through the maze depicted in Fig. 2 (b), the laser range
finder built the occupancy grid map that is shown in Fig. 2 (a).
Fig. 2.a (Left) Occupancy grid map of the maze’s final state. 2.b (Center) Ground plan of the
maze. The robot’s course is shown in blue. 2.c (Right) Final state of the simulated domain.
References
1. Blank D, Kumar D, Marshall J & Meeden L (2007) Advanced robotics projects for under-
graduate students. AAAI Spring Symposium: Robots and Robot Venues: Resources for AI
Education: 10-15
2. Bruyninckx H (2001) Open robot control software: the OROCOS project. Proceedings IEEE
International Conference on Robotics and Automation (3): 2523-2528
3. Bruyninckx H (2007) Microsoft Robotics Studio: Expected impact, Challenges & Alterna-
tives. Panel presentation at the IEEE International Conference on Robotics and Automation
4. Gerkey B (2005) The Player Robot Device Interface - Player utilities. http://playerstage.
sourceforge.net/doc/Player-2.0.0/player/group__utils.html. Accessed 15 January 2009
5. Hart P E, Nilsson N J, Raphael B (1968) A Formal Basis for the Heuristic Determination of
Minimum Cost Paths. IEEE Transactions on Systems Science & Cybernetics, 4(2):100-107
6. Jackson J (2007) Microsoft Robotics Studio: A Technical Introduction. IEEE Robotics & Au-
tomation Magazine 14(4):82-87
7. Koenig S, Likhachev M & Furcy D (2004) Lifelong Planning A*. Artificial Intelligence, 155
(1-2):93-146
8. Koenig S, Likhachev M, Liu Y & Furcy D (2004) Incremental Heuristic Search in Artificial
Intelligence. AI Magazine, 25(2):99-112
9. Likhachev M & Koenig S (2005) A Generalized Framework for Lifelong Planning A*. Pro-
ceedings International Conference on Automated Planning and Scheduling: 99-108
10. Michael N, Fink J & Kumar V (2008) Experimental Testbed for Large Multirobot Teams.
IEEE Robots and Automation Magazine, 15(1):53-61
11. Microsoft Corporation (2008) Microsoft CCR and DSS Toolkit 2008: Tyco Case Study.
http://go.microsoft.com/fwlink/?LinkId=130995. Accessed 13 January 2009
12. Microsoft Corporation (2008) Microsoft Robotics Developer Studio: CCR Introduction.
http://msdn.microsoft.com/en-us/library/bb648752.aspx. Accessed 12 January 2009
13. Microsoft Corporation (2008) Microsoft Robotics Studio Partners. http://msdn.micrsoft.com
/en-us/robotics/bb383566.aspx. Accessed 15 October 2008
14. Morgan S (2008) Programming Microsoft Robotics Studio, Microsoft Press
15. Morgan S (2008) Robotics: Simulating the World with Microsoft Robotics Studio.
http://msdn.microsoft.com/en-us/magazine/cc546547.aspx. Accessed 13 January 2009
16. Ramalingam G & Reps T (1996) An incremental algorithm for a generalization of the short-
est-path problem. Journal of Algorithms, 21:267-305
17. Richter J (2006) Concurrent Affairs: Concurrency and Coordination Runtime. http://msdn.
microsoft.com/en-us/magazine/cc163556.aspx. Accessed 13 January 2009
18. RoboCupRescue (2006) Rescue Simulation Leagues. http://www.robocuprescue.org
/simleagues.html. Accessed 25 October 2008
19. SimplySim (2008) Generic Environment. http://www.simplysim.net/index.php?scr=scrAc-
cueil&idcategorie=1. Accessed 12 January 2009
20. Somby M (2008) Software Platforms for Service Robotics http://linuxdevices.com
/articles/AT9631072539.html. Accessed 18 October 2008
21. Taylor T (2008) MSRS Maze Simulator. http://www.soft-tech.com.au/MSRS/MazeSimulator
/MazeSimulator.htm. Accessed 23 September 2008
22. Tick J (2006) Convergence of Programming Development Tools for Autonomous Mobile
Research Robots. Proceedings Serbian-Hungarian Joint Symposium on Intelligent Systems:
375-382
23. Tsai W T, Chen Y, Sun X, et al. (2007) Designing a Service-Oriented Computing Course for
High Schools. Proceedings IEEE International Conference on e-Business Engineering: 686-
693
24. Turner D (2006) Microsoft Moves into Robotics. http://www.technologyreview.com
/computing/17419/page2/. Accessed 21 October 2008
25. Ulanoff L (2006) Rivals Skeptical of Microsoft's New Robot Software. http://www.pcmag.
com/article2/0,1895,1979617,00.asp. Accessed 21 October 2008
26. Workman K & Elzer S (2009) Utilizing Microsoft robotics studio in undergraduate robotics.
Journal of Computing Sciences in Colleges 24(3):65-71
Automated Product Pricing Using
Argumentation
1 Department of Sciences, Technical University of Crete, [email protected]
2 Department of Mathematics and Computer Science, Paris Descartes University, {pavlos, nikolaos.spanoudakis}@mi.parisdescartes.fr
1 Introduction
Spanoudakis, N. and Moraitis, P., 2009, in IFIP International Federation for Information Processing,
Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I.,
Bramer, M.; (Boston: Springer), pp. 321–330.
sparse works provide solutions, the retail business sector. Argumentation re-
sponded well to our requirements, which demanded a system that would be able
to apply a pricing policy adjusted to the market context, while at the same time
reflecting the points of view of diverse decision makers.
This product pricing agent was developed in the context of MARKET-MINER
project that was co-funded by the Greek government. After evaluation, its results
have been considered successful and are expected to have an important im-
pact on the firm’s business intelligence software suite over the next four to five years.
In what follows we firstly present the basics of the used argumentation frame-
work in section 2 and then, in section 3, we discuss how we modeled the knowl-
edge of the particular application domain. Subsequently, we present the product
pricing agent, including information on how we conceived and modeled the sys-
tem using the Agent Systems Engineering Methodology (ASEME), in section 4,
followed by the presentation of the evaluation results in section 5. Finally, in sec-
tion 6, we discuss related work and conclude.
Decision makers, be they artificial or human, need to make decisions under com-
plex preference policies that take into account different factors. In general, these
policies have a dynamic nature and are influenced by the particular state of the en-
vironment in which the agent finds himself. The agent's decision process needs to
be able to synthesize together different aspects of his preference policy and to
adapt to new input from the current environment. We model the product pricing
decision maker as such an agent.
To address requirements like the above, Kakas and Moraitis [5] proposed an
argumentation based framework to support an agent's self deliberation process for
drawing conclusions under a given policy. The following definitions present the
basic elements of this framework:
Definition 1. A theory is a pair (T, P) whose sentences are formulae in the
background monotonic logic (L, ⊢ ) of the form L←L1,…,Ln, where L, L1, …, Ln
are positive or negative ground literals. For rules in P the head L refers to an (irre-
flexive) higher priority relation, i.e. L has the general form L = h_p(rule1, rule2).
The derivability relation, ⊢ , of the background logic is given by the simple infer-
ence rule of modus ponens.
An argument for a literal L in a theory (T, P) is any subset, T, of this theory
that derives L, T ⊢ L, under the background logic. A part of the theory T0 ⊂ T, is
the background theory that is considered as a non defeasible part (the indisput-
able facts). An important notion in argumentation is that of attack. In the current
framework an argument attacks (or is a counter argument of) another when they
derive a contrary conclusion. Another notion is that of admissibility. An argument
(from T) is admissible if it counter-attacks all the attacks it receives. For this it
needs to take along priority arguments (from P) and make itself at least as strong
as its counter-arguments.
Definition 2. An agent’s argumentative policy theory or theory, T, is a tuple
T = (T, PR, PC) where the rules in T do not refer to h_p, all the rules in PR are prior-
ity rules with head h_p(r1, r2) s.t. r1, r2 ∈ T and all rules in PC are priority rules
with head h_p(R1, R2) s.t. R1, R2 ∈ PR ∪ PC.
Thus, in defining the decision maker’s theory three levels are used. The first
level (T) defines the rules that refer directly to the subject domain, the second
level defines priorities over the first-level rules, and the third level defines
priorities over the rules of the previous level.
Gorgias (http://www.cs.ucy.ac.cy/~nkd/gorgias/), a Prolog implementation of the
framework presented above, defines a specific language for the object-level rules
and the priority rules of the second and third levels. A negative literal is a term
of the form neg(L). The language for representing the theories is given by rules
with the syntax rule(Signature, Head, Body), where Head is a literal, Body is a list of
literals and Signature is a compound term composed of the rule name with selected
variables from the Head and Body of the rule. The predicate prefer/2 is used to cap-
ture the higher priority relation (h_p) defined in the theoretical framework. It
should only be used as the head of a rule. Using the previously defined syntax we
can write the rule rule(Signature, prefer(Sig1, Sig2), Body), which means that the rule
with signature Sig1 has higher priority than the rule with signature Sig2, provided
that the preconditions in the Body hold. If the modeler needs to express that two
predicates are conflicting, this is done with the statement conflict(Sig1, Sig2),
which indicates that the rules with signatures Sig1 and Sig2 are conflicting. A lit-
eral's negation is considered by default as conflicting with the literal itself.
Firstly, we gathered the domain knowledge in free-text format by questioning the
decision makers that participate in the product pricing procedure. They were offi-
cers in the Financial, Marketing and Production departments of firms in the retail
business as well as in manufacturing. Then, we processed their statements,
aiming to discover, on the one hand, the domain ontology and, on the other hand,
the decision-making rules.
We used the Protégé (http://protege.stanford.edu/) open source ontology editor
for defining the domain concepts and their properties and relations. In Figure 1,
the Product concept and its properties are presented. The reader can see the prop-
erties identified previously hasPrice and isAccompaniedBy. Price is defined as a
real number (Float) and isAccompaniedBy relates the product to multiple other in-
stances of products that accompany it in the consumer’s cart. In the figure, we also
present the firm strategy concept and its properties that are all Boolean and repre-
sent the different strategies that the firm can have activated at a given time. For
example, the hitCompetition property is set to true if the firm’s strategy is to re-
duce the sales of its competitors. The property retail_business characterizes the
firm as one in the retail business sector.
For our knowledge base definition we used Prolog. To use the concepts and
their properties as they were defined in Protégé, we specified that a Boolean prop-
erty is encoded as a unary predicate; for example, the advertisedByUs property of
the Product concept is encoded as advertisedByUs(ProductInstance). A property
with a string, numerical, or any concept instance value is encoded as a binary
predicate, for example the hasPrice property of the Product concept is encoded as
hasPrice(ProductInstance, FloatValue). A property with a string, numerical, or
any concept instance value with multiple cardinality is encoded as a binary predi-
cate. However the encoding of the property to predicate can be done in two ways.
The first possibility is for the second term of the predicate to be a list. Thus, the
isAccompaniedBy property of the Product concept is encoded as isAccompa-
niedBy(ProductInstance, [ProductInstance1, ProductInstance2, …]), where prod-
uct instances must not refer to the same product. A second possibility is to create
multiple predicates for the property. For example the hasProductType property of
the Product concept is encoded as hasProductType(ProductInstance, Pro-
ductTypeInstance). In the case that a product has more than one product type, one
such predicate is created for each product type.
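The encoding convention just described can be sketched as a small script that emits Prolog-style facts from an ontology instance. The following is only an illustration under assumptions: the product record, its instance names and the concrete values below are invented, not taken from the project's knowledge base.

```python
# Sketch of the property-to-predicate encoding convention described above,
# applied to a hypothetical product instance. The emitted strings follow the
# Prolog fact syntax of the knowledge base; all values are invented.

product = {
    "name": "lcd_tv_32_inches",
    "advertisedByUs": True,                                  # Boolean -> unary predicate
    "hasPrice": 499.0,                                       # single value -> binary predicate
    "isAccompaniedBy": ["jacket_XXL", "hdmi_cable"],         # multi-valued -> list argument
    "hasProductType": ["electrical_domestic_appliances"],    # multi-valued -> one fact per value
}

facts = []
if product["advertisedByUs"]:
    facts.append(f"advertisedByUs({product['name']}).")
facts.append(f"hasPrice({product['name']}, {product['hasPrice']}).")
facts.append(f"isAccompaniedBy({product['name']}, [{', '.join(product['isAccompaniedBy'])}]).")
for ptype in product["hasProductType"]:
    facts.append(f"hasProductType({product['name']}, {ptype}).")

print("\n".join(facts))
```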
Then, we used the Gorgias framework for writing the rules. The goal of the
knowledge base is to decide whether a product should be priced high, low or
normally; thus emerged the hasPricePolicy property of the Product concept. After
this decision we could write the object-level rules, each having as head the
predicate hasPricePolicy(Product, Value), where Value can be low, high or normal –
the relevant restriction for this predicate is also defined in the ontology (see the
hasPricePolicy property of the Product concept in Figure 1). Then, we
defined the different policies as conflicting, thus only one policy was acceptable
per product. To resolve conflicts we consulted with the firm (executive) officers
and defined priorities over the conflicting object rules. Consider, for example, two
such object-level rules, r1_2_2 and r2_3 (in the rule base, variables start with a
capital letter, as in Prolog). They are conflicting if they are both activated for the
same product: the first states that a product should be priced low if the firm wants
to hit the competition for its product type, while the second states that a new
technology product that is an advertised invention should be priced high. To resolve
the conflict we add the priority rule pr1_2_6, which states that r1_2_2 is preferred
to r2_3.
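Since the actual rules are written in the Gorgias syntax introduced above, the snippet below is only a small Python analogue of how the priority rule pr1_2_6 resolves the conflict between r1_2_2 and r2_3. The fact and predicate spellings are assumptions made for illustration; the real system delegates this reasoning to Gorgias.

```python
# Minimal Python analogue of the argumentation step described above. It is NOT
# the Gorgias implementation: rule bodies are plain predicates over a fact set,
# and preference resolution is reduced to "drop the conclusion of a conflicting
# rule that is defeated by a higher-priority activated rule".

facts = {  # illustrative facts for one product
    "newTechnology(lcd_tv_32_inches)", "advertisedInvention(lcd_tv_32_inches)",
    "hasProductType(lcd_tv_32_inches, electrical_domestic_appliances)",
    "hitCompetition(electrical_domestic_appliances)",
}

object_rules = {  # name -> (condition over the facts, concluded price policy)
    "r1_2_2": (lambda f: "hitCompetition(electrical_domestic_appliances)" in f
                         and "hasProductType(lcd_tv_32_inches, electrical_domestic_appliances)" in f,
               "hasPricePolicy(lcd_tv_32_inches, low)"),
    "r2_3":   (lambda f: "newTechnology(lcd_tv_32_inches)" in f
                         and "advertisedInvention(lcd_tv_32_inches)" in f,
               "hasPricePolicy(lcd_tv_32_inches, high)"),
}

priorities = {"pr1_2_6": ("r1_2_2", "r2_3")}   # r1_2_2 is preferred to r2_3

def price_policy(facts):
    """Return conclusions of activated rules not defeated by a preferred rival."""
    activated = {name: concl for name, (cond, concl) in object_rules.items() if cond(facts)}
    defeated = {weaker for (stronger, weaker) in priorities.values()
                if stronger in activated and weaker in activated}
    return [concl for name, concl in activated.items() if name not in defeated]

print(price_policy(facts))   # ['hasPricePolicy(lcd_tv_32_inches, low)']
```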
In this section we firstly describe the Market-mIner product Pricing Agent (also
referred to as MIPA) development process and then we focus on two important as-
pects of it: the decision-making module and human-computer interaction.
We designed our agent using the Agent Systems Engineering Methodology
(ASEME) [10]. During the analysis phase we identified the actors and the use
cases related to our agent system (see Figure 2). Note that the Agent Modeling
Language [9] (AMOLA), which is used by ASEME for modeling the agent-based
system, allows for actors to be included in the system box, thus indicating an
agent-based system. The system actor is MIPA, while the external actors that par-
ticipate in the system’s environment are the user, external systems of competitors,
weather report systems (as the weather forecast influences product demand as in
the case of umbrellas) and municipality systems (as local events like concerts,
sports, etc, also influence consumer demand). We started by identifying general
use cases (like interact with user) and then we elaborated them in more specific
ones (like present information to the user and update firm policy) using the <<in-
clude>> relation.
Then, we completed the roles model, as presented in Figure 3(a). This model
defines the dynamic aspect of the system: general use cases are transformed to
capabilities, while the more specific ones are transformed to activities. We used the Gaia
operators ([14]) for creating liveness formulas that define the dynamic aspect of
the agent system, i.e. what happens and when it happens. A.B means that activity B is
executed after activity A, Aω means that activity A is executed forever (when it fin-
ishes it restarts), A|B means that either activity A or activity B is executed, and A||B
means that activity A is executed in parallel with activity B.
The next step was to associate each activity to a functionality, i.e. the technol-
ogy that will be used for its implementation. In Figure 3(b) the reader can observe
the capabilities, the activities that they decompose to and the functionality associ-
ated with each activity. The choice of these technologies is greatly influenced by
non-functional requirements. For example the system will need to connect on di-
verse firm databases. Thus, we selected the JDBC technology (http://java.sun.com/
javase/technologies/database/) that is database provider independent.
The last step, before implementation, is to extract from the roles model the sta-
techart that represents the agent. This is achieved by transforming the liveness
formula to a statechart in a straightforward process that uses templates to trans-
form activities and Gaia operators to states and transitions (see [9] for more de-
tails). The resulting statechart, i.e. the intra-agent control (as it is called in
ASEME) is depicted in Figure 7. The statechart can then be easily transformed to
a computer program.
The decision making capability includes four activities:
1. wait for new period activity: It waits for the next pricing period
2. get products information activity: It accesses a corporate database to collect the
data needed for inference,
3. determine pricing policy activity: It reasons on the price category of each prod-
uct, and,
4. fix prices activity: Based on the previous activity’s results, it defines the final
product price.
Role: Product Pricing Agent
Liveness:
  product pricing agent = (decide on pricing policy)ω || (interact with user)ω || [(get market information)ω]
  decide on pricing policy = wait for new period. get products information. determine pricing policy. fix prices
  interact with user = (present information to the user | update firm policy)+
  get market information = get weather information. get local information. get competition information

Fig. 3. MIPA Role Model (a) and the relation between Capabilities, Activities and Functionalities (b).
The determine pricing policy activity invokes the Prolog rule base presented in
§3, which includes 274 rules, 31 of which are object rules and 243 priority
rules. The fix prices activity's algorithm aims to produce a final price for each
product. The algorithm's inputs are a) the procurement/manufacture cost of a
product, or its price in the market, b) the outcome of the reasoning process (the
price policy for each product), c) the default profit ratio for the firm, d) a step for
raising the default profit ratio, e) a step for lowering this ratio, and f) the lowest
profit ratio that the firm would accept for any product. The pricing algorithm also
takes into account the number of arguments that are admissible for choosing a
specific price policy, strengthening the application of the policy.
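The paper does not spell out the exact pricing formula, so the sketch below is only one possible reading of the fix prices step under stated assumptions: the profit ratio starts at the firm default and is moved up or down per admissible argument, never dropping below the accepted minimum. All parameter values are invented.

```python
# One possible reading of the fix-prices heuristic; the adjustment rule and the
# default parameter values below are assumptions, not the published algorithm.

def fix_price(cost, policy, n_arguments,
              default_ratio=0.20, step_up=0.05, step_down=0.05, min_ratio=0.02):
    """Return a final price: start from the default profit ratio and move it
    up or down per admissible argument supporting the chosen price policy."""
    ratio = default_ratio
    if policy == "high":
        ratio += n_arguments * step_up
    elif policy == "low":
        ratio -= n_arguments * step_down
    ratio = max(ratio, min_ratio)          # never drop below the accepted minimum ratio
    return round(cost * (1.0 + ratio), 2)

# e.g. a product costing 400.0, priced low with two admissible arguments:
print(fix_price(400.0, "low", 2))          # 440.0 with the illustrative parameters
```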
A screenshot from the human-machine interface is presented in Figure 5. In the
figure we present the pricing results to the application user for some sample prod-
ucts, based on the facts inserted into our rule base for this instance.
The reader should notice the application of the rules presented in §3 for the
lcd_tv_32_inches product: it is a new technology product and an advertised in-
vention, but it is priced with a low policy because its product type (electri-
cal_domestic_appliances) has been marked by the firm as a market where it
should hit competition. Moreover, the firm has also decided that it wants to pene-
trate the electrical_domestic_appliances market, therefore there are two argu-
ments for pricing the lcd_tv_32_inches product low. In Figure 5, these reasons are
explained to the user in human-readable format and the final price is also com-
puted. The human-readable format is generated automatically from default
associations of the predicates to free text. The t_shirt_XXL and jacket_XXL prod-
ucts are clothes that have a normal pricing policy. However, the jacket_XXL
product accompanies the lcd_tv_32_inches product in the consumer's basket and
is therefore priced high, according to the high_low_strategy of the firm.
The product pricing agent application was evaluated by SingularLogic SA, the
largest Greek software vendor for SMEs. The MARKET-MINER project included
an exploitation plan [12]. The application evaluation goals were to measure the
overall satisfaction of its users. In the evaluation report [13] three user categories
were identified, System Administrators, Consultants and Data Analysts. The crite-
ria C1) Performance, C2) Usability, C3) Interoperability, and C4) Security and
Trust were used for measuring user satisfaction. The users expressed their views
in a relevant questionnaire and they marked their experience on a scale of one
(dissatisfied) to five (completely satisfied) and their evaluation of the importance
of the criterion on a scale of one (irrelevant) to five (very important).
The Process of Evaluation of Software Products [2] (MEDE-PROS) was used
for our set of criteria. The results of the evaluation are presented in Table 1 and
they have been characterized as “very satisfactory” by the SingularLogic research
and development software assessment unit. MARKET-MINER has been deemed
worthy of recommendation for commercialization and addition to the firm's
software products suite.
Table 1. MARKET-MINER evaluation results. The rows with white background are those of the
consultants, while those with grey background represent the evaluation of the system administra-
tors (see [13] for more details).
Criterion    Criterion performance    Criterion importance
C1           86%                      0.78
C2           83%                      0.88
C3           91%                      0.88
C4           83%                      0.64
C3           86%                      0.92
C4           61%                      0.92
This paper presented a novel application of autonomous agents for automating the
product pricing process. This issue has never been tackled before at this scale. A
patent merely provided some guidelines on an architecture for such a system, exclu-
sively for supermarket chains [1]. Earlier work proposed support for the product
pricing process in the retail business sector but did not provide an automated de-
cision mechanism [7]. In this paper we used argumentation, which allows for ex-
pressing conflicting views on the subject and provides a mechanism for resolving
these conflicts. Moreover, with argumentation it is possible to provide an explanation
of the decisions that the agent makes. This is also the main technical difference from
existing work in the agent technology literature, where product pricing agents
have been referred to as economic agents, as price bots, or simply as seller agents
(see e.g. [6], [3] and [4]) and their responsibility is to adjust prices automatically
on the seller’s behalf in response to changing market conditions [6].
All these existing solutions focus on negotiation over a selected product rather than
on bundles of products (as in the retail business sector). The MARKET-MINER
product pricing agent borrows interesting features from these works, i.e. it resets
prices at regular intervals and can employ different pricing strategies depending
on market conditions. The added value of the MARKET-MINER product pric-
ing agent with respect to these approaches is the capability to model human knowledge
and apply human-generated strategies to automate product pricing, with the possi-
bility of providing logical explanations to decision makers, if needed.
The presented application’s results were evaluated according to a widely used
process (MEDE-PROS [2]) and they were proposed by the SingularLogic research
and development department for commercialization by the firm.
References
User Recommendations based on Tensor Dimensionality Reduction

Panagiotis Symeonidis
Abstract Social Tagging is the process by which many users add metadata
in the form of keywords, to annotate and categorize items (songs, pictures,
web links, products etc.). Social tagging systems (STSs) can recommend to a user
other users with common social interests, based on common tags on similar items. How-
ever, users may have different interests for an item, and items may have
multiple facets. In contrast to the current recommendation algorithms, our
approach develops a model to capture the three types of entities that exist
in a social tagging system: users, items, and tags. These data are represented
by a 3-order tensor, on which latent semantic analysis and dimensionality re-
duction is performed using the Higher Order Singular Value Decomposition
(HOSVD) method. We perform experimental comparison of the proposed
method against a baseline user recommendation algorithm with a real data
set (BibSonomy), attaining significant improvements.
1 Introduction
Social tagging is the process by which many users add metadata in the form
of keywords, to annotate and categorize songs, pictures, products, etc. Social
tagging is associated to the “Web 2.0” technologies and has already become
an important source of information for recommender systems. For example,
music recommender systems such as Last.fm and MyStrands allow users to
tag artists, songs, or albums. In e-commerce sites such as Amazon, users tag
products to easily discover common interests with other users. Moreover,
social media sites, such as Flickr and YouTube use tags for annotating their
content. All these systems can further exploit these social tags to improve
the search mechanisms and personalized recommendations. Social tags carry
useful information not only about the items they label, but also about the
users who tagged. Thus, social tags are a powerful mechanism that reveal
3-dimensional correlations between users, tags, and items.
Several social tagging systems (STSs), e.g., Last.fm, Amazon, etc., rec-
ommend interesting users to a target user, aiming at connecting people with
common interests and encouraging people to contribute and share more con-
tent. With the term interesting users, we mean those users who have a profile
similar to that of the target user. If a set of tags is frequently used by many users,
then these users spontaneously form a community of interest, even though
they may not have any physical or online connections. The tags represent the
web content of common interest to this community. For example, Amazon
recommends to a user who used a specific tag other users as interesting ones,
ranking them based on how frequently they used the specific tag.
In this paper, we develop a model based on the three dimensions, i.e., items,
tags, users. The 3-dimensional data are represented by 3-dimensional ma-
trices, which are called 3-order tensors. We avoid splitting the 3-dimensional
correlations and we handle all dimensions equally. To reveal latent semantics,
we perform 3-mode analysis, using the Higher Order Singular Value Decom-
position (HOSVD) [4]. Our method reveals latent relations among objects of
the same type, as well as among objects of different types.
The contributions of our approach are summarized as follows:
• We use a 3-order tensor to model the three types of entities (user, item,
and tag) that exist in social sites.
• We apply dimensionality reduction (HOSVD) in 3-order tensors, to reveal
the latent semantic associations between users, items, and tags.
• For the first time to our knowledge we recommend interesting users to
other users.
The rest of this paper is organized as follows. Section 2 summarizes the
related work. The proposed approach is described in Section 3. Experimental
results are given in Section 4. Finally, Section 5 concludes this paper.
2 Related Work
In the area of discovering shared interests in social networks there are two
kinds of existing approaches [5]. One is user-centric, which focuses on detect-
ing social interests based on the on-line connections among users; the other
is object-centric, which detects common interests based on the common ob-
jects fetched by users in a social community. In the user-centric approach,
recently Ali-Hasan and Adamic [1] analyzed users' online connections to dis-
cover users with particular interests for a given user. In contrast to this kind
of approach, we aim to find the people who share the same interest no mat-
ter whether they are connected by a social graph or not. In the object-centric
approach, recently Guo et al. [6] explored the common interests among users
based on the common items they fetched in peer-to-peer networks. However,
they cannot differentiate the various social interests on the same items, due
to the fact that users may have different interests for an information item and
an item may have multiple facets. In contrast, our approach focuses on di-
rectly detecting social interests and recommending users by taking advantage
of social tagging, by utilizing users’ tags.
Differently from existing approaches, our method develops a unified frame-
work to concurrently model the three dimensions. Usage data are represented
by a 3-order tensor, on which latent semantic analysis is performed using the
Higher Order Singular Value Decomposition (HOSVD), which has been in-
troduced in [4].
HOSVD is a generalization of singular value decomposition and has been
successfully applied in several areas. In particular, Wang and Ahuja [8]
present a novel multi-linear algebra based approach to reduced dimensional-
ity representation of multidimensional data, such as image ensembles, video
sequences and volume data. In the area of Data Clustering, Chen et al. [2]
used also a high-order tensor. However, they transform the initial tensor
(through Clique Expansion algorithm) into lower dimensional spaces, so that
clustering algorithms (such as k-means) can be applied. Finally, in the area
of Personalized Web Search, Sun et al. proposed CubeSVD [7] to improve
Web Search. They claimed that as the competition of Web Search increases,
there is a high demand for personalized Web search. Therefore based on their
CubeSVD analysis, Web Search activities can be carried out more efficiently.
In the next section, we provide more information on HOSVD.
Since we focus on 3-order tensors, n ∈ {1, 2, 3}, we use 1-mode, 2-mode, and
3-mode products.
Our Tensor Reduction algorithm initially constructs a tensor, based on
usage data triplets {u, t, i} of users, tags and items. The motivation is to use
all three entities that interact inside a social tagging system. Consequently,
we proceed to the unfolding of A, where we build three new matrices. Then,
we apply SVD in each new matrix. Finally, we build the core tensor S and
the resulting tensor Â. All these can be summarized in 5 steps, as follows.
From the usage data triplets (user, tag, item), we construct an initial 3-order
tensor A ∈ R^(u×t×i), where u, t, i are the numbers of users, tags and items,
respectively. Each tensor element measures the preference of a (user u, tag t)
pair on an item i.
The core tensor S governs the interactions among the user, item and tag entities.
From the initial tensor A we proceed to the construction of the core
tensor S, as follows:

S = A ×1 (Uc1^(1))^T ×2 (Uc2^(2))^T ×3 (Uc3^(3))^T    (2)
Finally, tensor Â is built as the product of the core tensor S and the mode
products of the three matrices Uc1^(1), Uc2^(2) and Uc3^(3), as follows:

Â = S ×1 Uc1^(1) ×2 Uc2^(2) ×3 Uc3^(3)    (3)
The reconstructed tensor Â measures the associations among the users, tags
and items, so that each element of Â represents a quadruplet {u, t, i, p}, where
p is the likelihood that user u will tag item i with tag t. On this basis, users
can be recommended to u according to their weights associated with the {i, t}
pair.
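A minimal numpy sketch of these five steps may help. It assumes a particular mode-n unfolding convention and hypothetical truncation ranks c1, c2, c3, and it is not the implementation used in the paper; the toy tensor is invented.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows indexed by the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product tensor x_n matrix (the matrix multiplies the n-th mode)."""
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)

def tensor_reduction(A, ranks):
    """HOSVD-style reduction: truncated SVD of each unfolding, core tensor S,
    and the reconstructed tensor A_hat, mirroring equations (2) and (3)."""
    U = []
    for mode, c in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        U.append(u[:, :c])                       # keep the c leading left singular vectors
    S = A
    for mode, u in enumerate(U):                 # S = A x1 U1^T x2 U2^T x3 U3^T
        S = mode_product(S, u.T, mode)
    A_hat = S
    for mode, u in enumerate(U):                 # A_hat = S x1 U1 x2 U2 x3 U3
        A_hat = mode_product(A_hat, u, mode)
    return A_hat

# Toy usage: a tiny user x tag x item tensor with 1 marking observed postings.
A = np.zeros((3, 3, 2))
A[0, 0, 0] = A[1, 0, 0] = A[1, 1, 1] = A[2, 2, 1] = 1.0
A_hat = tensor_reduction(A, ranks=(2, 2, 2))
print(np.round(A_hat, 2))                        # smoothed user/tag/item associations
```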
4 Experimental Performance
For a user u, I(u) denotes the items tagged by u. ACSN evaluates
the tightness or looseness of each Neighborhood of recommended users.
For each of the algorithms of our evaluation we will now describe briefly
the specific settings used to run them.
4.1 Results
We compute the similarity between two web sites with the inner
product, i.e., the cosine similarity of their TF/IDF keyword term vectors [5].
For each user's neighborhood, we compute the Average Cosine Simi-
larity (ACS) of all web site pairs inside the neighborhood, called intra-
Neighborhood similarity. We also randomly select 20 neighborhood pairs
among the 105 user neighborhoods and compute the average pairwise web
site similarity between every two neighborhoods, called inter-Neighborhood
similarity. In the resulting figures, the x-axis is the rank of the neighborhoods,
sorted in descending order of their intra-Neighborhood similarities; the y-axis
shows the intra-Neighborhood similarity of each neighborhood and the correspond-
ing average inter-Neighborhood similarity of this neighborhood with 20 other
randomly selected neighborhoods.
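As an illustration of the ACS computation, the sketch below uses scikit-learn's TF/IDF vectoriser and cosine similarity on hypothetical web-site texts; the exact term weighting used in the study may differ.

```python
# Sketch of the intra-Neighborhood ACS computation, assuming each web site is
# available as a text document; the sample texts are invented.
from itertools import combinations
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def average_cosine_similarity(site_texts):
    """Average pairwise cosine similarity of the TF/IDF vectors of the web
    sites inside one neighborhood of recommended users."""
    tfidf = TfidfVectorizer().fit_transform(site_texts)
    sims = cosine_similarity(tfidf)
    pairs = list(combinations(range(len(site_texts)), 2))
    return float(np.mean([sims[i, j] for i, j in pairs])) if pairs else 0.0

neighborhood = ["tensor factorization tutorial", "svd dimensionality reduction", "cooking recipes"]
print(round(average_cosine_similarity(neighborhood), 3))
```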
Figure 1a shows the comparison between the intra-Neighborhood and the
inter-Neighborhood similarity of our Tensor Reduction Algorithm.
[Figure residue: two pairs of plots of ACS (y-axis) against Neighborhood Rank (x-axis), each comparing intra-Neighborhood and inter-Neighborhood similarity, with accompanying bar charts of the corresponding average ACS values.]
5 Conclusions
STSs provide recommendations to users based on what tags other users have
used on items. In this paper, we developed a unified framework to model the
three types of entities that exist in a social tagging system: users, items, and
tags. We applied dimensionality reduction in a 3-order tensor, to reveal the
latent semantic associations between users, items, and tags. The latent se-
mantic analysis and dimensionality reduction are performed using the Higher
Order Singular Value Decomposition (HOSVD) method. Our approach im-
proves user recommendations by capturing users' multimodal perception of
items and tags. Moreover, for the first time to our knowledge, we provide user rec-
ommendations. We also performed an experimental comparison of the proposed
method against a baseline user recommendation algorithm. Our results show
significant improvements in terms of effectiveness, measured through intra-
and inter-neighborhood similarity.
As future work, we intend to examine the following topics:
• To examine different methods for extending SVD to high-order tensors.
Another approach for multi-dimensional decompositions is Parallel Factor
Analysis (PARAFAC).
• To apply different weighting methods for the initial construction of a ten-
sor. A different weighting policy for the tensor's initial values could im-
prove the overall performance of our approach.
• To adjust our Tensor Reduction framework so that it can handle on-line
the newly emerging objects (new users, new items and new tags) at the
time they are inserted in a Social Tagging System. This may result in
4-dimensional tensors, where time represents the additional dimension.
References
1. N. Ali-Hasan and A. Adamic. Expressing social relationships on the blog through links
and comments. In Proceedings ICWSM Conference, 2007.
2. S. Chen, F. Wang, and C. Zhang. Simultaneous heterogeneous data clustering based on
higher order relationships. In Proceedings Workshop on Mining Graphs and Complex
Structures (MGCS’07), in conjunction with ICDM’07, pages 387–392.
A Genetic Algorithm for the Classification of Earthquake Damages

P.F. Alvanitopoulos, I. Andreadis and A. Elenas

Faculty of Engineering,
Democritus University of Thrace, Xanthi, Greece.
{palvanit, iandread}@ee.duth.gr, [email protected]
1 Introduction
In addition, a satisfactory number of damage indices have been used to estimate the
earthquake damages in structures. Previous works show that there is a correlation
between the damage indices and the aforementioned intensity parameters [3, 4].
At the second stage of processing a GA has been used to reduce the number of
the seismic parameters and find the subset that maximizes the classification rates.
The GA starts the feature extraction process using an initial population of indi-
viduals (combinations of seismic parameters) and, after a specific number of gen-
erations, produces a single optimal solution.
To select the optimum representation of seismic signals different kinds of clas-
sifiers have been used. Previous studies proposed artificial neural networks and ar-
tificial neuro-fuzzy inference systems for the classification of earthquake damages
[5]. The classification accuracy of these systems has been used to evaluate the fit-
ness value of the individuals.
The last part of the research was the investigation of the classification perform-
ance. The classifiers have been trained and simulated using the optimal subset of
the intensity parameters. Classification results prove the effectiveness of this me-
thod.
2 Genetic Algorithms
3 Proposed Method
A GA was used to find the optimal feature set that produces the best classification
accuracy for the proposed classifiers. First, several subsets of seismic parameters
have been examined. The classifiers have been trained according to these features.
The fitness function of these subsets has been evaluated and the optimal set of
seismic parameters has been extracted.
Let L=20 (twenty seismic parameters) be the number of feature descriptors.
Assume a population of N individuals. In this research a population size of N=20
individuals has been used. A chromosome of L genes is an individual which repre-
sents the subset of seismic parameters. In the initial population p = {x1, . . . , xN}
the first sample x1 has all its genes equal to 1. The genes were allowed to take
the values 0 or 1. A value of 1 implied that the corresponding parameter would be
included in the feature subset; the seismic parameter would be excluded from the
feature subset if its gene value was set to 0. In our method the fitness function of
a subset is the negative classification accuracy of the classifiers. We use the
negative classification accuracy because the algorithm selects as elite in-
dividuals the subsets which have the lowest fitness value. The GA was allowed to
run for a maximum of 100 generations.
The GA creates three types of children for the next generation. The first type is
the elite children: the best individuals of the previous generation, which are
guaranteed to survive to the next generation. In this approach the elite children
parameter was set to 2. Besides elite children, the algorithm creates crossover
and mutation children. The crossover operation recombines genes from different
individuals to produce a superior child. After the crossover, the mutation step
was used to search through a larger area of the search space for the best solution.
In each generation 80% of the individuals in the population, excluding the elite
children, were created through the crossover operation and the remaining 20%
were generated through mutation. Using these parameters it is clear that for a
population equal to 20 there are 2 elite children from the previous generation, 14
crossover and 4 mutation children.
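A compact sketch of this GA configuration is given below. The classifier_accuracy function is a synthetic stand-in for training and evaluating the classifier on the selected parameters, and the parent-selection scheme is an assumption, since the paper does not specify it.

```python
# Illustrative GA with the parameters described above: population 20, 20-bit
# individuals, 100 generations, 2 elite children, 14 crossover and 4 mutation
# children, fitness = negative classification accuracy.
import random
random.seed(0)

L, N, GENERATIONS, ELITE = 20, 20, 100, 2

def classifier_accuracy(individual):
    # Synthetic stand-in for training/evaluating the classifier on the
    # parameters selected by this bit string (invented scoring function).
    return 1.0 - abs(sum(individual) - 13) / 20.0

def fitness(individual):
    return -classifier_accuracy(individual)      # negative accuracy, as in the paper

def crossover(a, b):
    point = random.randint(1, L - 1)             # single-point crossover
    return a[:point] + b[point:]

def mutate(a):
    i = random.randrange(L)
    return a[:i] + [1 - a[i]] + a[i + 1:]        # flip one randomly chosen gene

population = [[1] * L] + [[random.randint(0, 1) for _ in range(L)] for _ in range(N - 1)]
for _ in range(GENERATIONS):
    population.sort(key=fitness)                 # lowest fitness (highest accuracy) first
    elite = population[:ELITE]                   # 2 elite children survive unchanged
    parents = population[:N // 2]                # assumed selection of the better half
    n_cross = int(0.8 * (N - ELITE))             # 14 crossover children
    children = [crossover(*random.sample(parents, 2)) for _ in range(n_cross)]
    children += [mutate(random.choice(parents)) for _ in range(N - ELITE - n_cross)]  # 4 mutation children
    population = elite + children

best = min(population, key=fitness)
print(sum(best), "of the 20 intensity parameters selected")
```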
In the present study an ANN was used for the classification of seismic signals.
This network consists of one input layer, a hidden layer of 17 neurons and one
output layer. The proposed classifier is a supervised feed-forward ANN with a
hyperbolic tangent sigmoid activation function. The first layer presents the inputs
to the network. The number of inputs to the first layer is not fixed: all the indi-
viduals from the GA are passed through the ANN to estimate their classification
accuracy and hence their fitness, and each time the number of inputs equals the
number of genes whose value is set to 1. The number of output units is fixed
to four, since there are four categories of possible damage. During the training of
the ANN a set of representative vector samples has been used. Then the ANN
was simulated using the entire set of seismic signals to evaluate the classification
performance. Each time, a seismic signal was represented in the ANN with the set
of seismic parameters dictated by the corresponding individual of the GA. During
the supervised training process, whenever a training vector is presented, the output
of the neuron that represents the class to which the input belongs is set to 1 and
all the other outputs are set to 0. The training algorithm for the network is
Levenberg-Marquardt (LM), described in more detail in [7].
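For illustration only, a rough scikit-learn analogue of this classifier is sketched below; note that scikit-learn does not provide Levenberg-Marquardt training, so 'lbfgs' is used here merely as a stand-in, and the data is synthetic.

```python
# Rough analogue of the described classifier: one hidden layer of 17 tanh units
# and four output classes. Data and labels are random placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(450, 13))        # 450 accelerograms, 13 selected intensity parameters
y = rng.integers(0, 4, size=450)      # four damage categories

clf = MLPClassifier(hidden_layer_sizes=(17,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```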
4 Results
After the nonlinear dynamic analysis of the structure, for the entire set of artificial
accelerograms, three damage indices (DI), namely the DI of Park/Ang, the DI of
DiPasquale/Cakmak and the maximum inter-storey drift ratio (MISDR), have been
computed. According to the damage indices, the damages caused by the seismic
signals were classified into four classes. In this experiment a total set of 450 artifi-
cial accelerograms has been used. The representation of the artificial accelero-
grams has been studied using different subsets (individuals) of the twenty intensity
parameters. Each individual is a 1×20 bit string.
Due to the bit-string form of the individuals, the total number of possible candi-
date solutions is 2^20. A GA with a population of 20 individuals was employed and
executed for a maximum number of 100 generations. This means that the GA
searches for the optimal feature selection and tests up to 2000 possible solutions.
Using only the selection process in the GA, without the crossover and mutation
steps, would have a negative effect on convergence; on the other hand, using
mutation alone is similar to a random search. The GA has been used once for
each of the damage indices. Two types of classifiers have been used to estimate
the fitness function of the GA. Tables 1 and 2 show the classification rates for the
three damage indicators. Fig. 1 presents the total best individuals for the represen-
tation of seismic signals using the MISDR and the DI of DiPasquale/Cakmak.
                                      MISDR     DI of DiPasquale/Cakmak    DI of Park/Ang
Number of unknown samples             450       450                        450
Number of intensity parameters        13        13                         13
Number of well recognized samples     417       410                        408
Total % of the recognized vectors     92.60%    91.10%                     90.66%

                                      MISDR     DI of DiPasquale/Cakmak    DI of Park/Ang
Number of unknown samples             450       450                        450
Number of intensity parameters        13        13                         13
Number of well recognized samples     415       404                        410
Total % of the recognized vectors     90.21%    89.70%                     91.10%
[Fig. 1 residue: "Current best individual" bar plots over the 20 variables (Number of variables (20)), showing which intensity parameters are selected.]
GAs are a popular tool in Artificial Intelligence applications, and it is well known
that they can be used for feature extraction. Using GAs, this approach examined
structural seismic damages in buildings. A training set of 450 artificial
accelerograms with known damage effects was used to derive the parameters
which are able to describe the seismic intensity. The proposed algorithm was
based on a set of seismic parameters. The advantage of this approach is that the
proposed algorithm was able to produce high classification accuracy using only a
subset of the seismic features: the number of intensity parameters was reduced from
20 to 13. The experimental results show that the classification rates are better than
those of previous studies [8, 9]. It was demonstrated that the algorithm developed
herein achieves classification rates of up to 92%. The results prove the effectiveness
of the proposed algorithm. Until today, damage surveys have been performed by
on-site examination by expert engineers. With the proposed technique engineers
will have an additional tool which can guide them to a faster and more confident
estimation of the structural adequacy of constructions.
References
1. E. DiPasquale and A.S. Cakmak, On the relation between local and global damage indices,
Technical Report NCEER-89-0034, State University of New York at Buffalo, 1989.
2. Y.J. Park and A.H.S. Ang, Mechanistic seismic damage model for reinforced concrete,
Journal of Structural Engineering 111, 1985, 722-739.
3. A. Elenas and K. Meskouris, Correlation Study between Seismic Acceleration Parameters and
Damage Indices of Structures, Engineering Structures 23, 2001, 698-704.
4. A. Elenas, Correlation between Seismic Acceleration Parameters and Overall Structural Dam-
age Indices of Buildings, Soil Dynamics and Earthquake Engineering 20, 2000, 93-100.
5. P. Alvanitopoulos, I. Andreadis and A. Elenas, A New Algorithm for the Classification of
Earthquake damages in Structures, IASTED Int. Conf. on Signal Processing, Pattern Rec-
ognition and Applications, Innsbruck, Austria, February 2008, CD ROM Proceedings, Pa-
per No. 599-062.
6. S.N. Sivanandam and S.N. Deepa, Introduction to Genetic Algorithms, Springer Verlag,
Germany, January 2008.
7. Hagan, M.T., and M. Menhaj, Training feed-forward networks with the Marquardt algo-
rithm, IEEE Transactions on Neural Networks, Vol. 5, No. 6, 1994, 989-993.
8. Y. Tsiftzis, I. Andreadis and A. Elenas, A Fuzzy System for Seismic Signal Classification, IEE
Proc. Vision, Image & Signal Processing 153, 2006, 109-114.
9. I. Andreadis, Y. Tsiftzis and A. Elenas, Intelligent Seismic Acceleration Signal Processing
for Structural Damage Classification, IEEE Transactions on Instrumentation and Meas-
urement 56, 2007, 1555-1564.
Mining Retail Transaction Data for Targeting
Customers with Headroom - A Case Study
1 Introduction
Recommender systems have recently gained a lot of attention both in industry and
academia. In this paper, we focus on the applications and utility of recommender
systems for brick-and-mortar retailers. We address the problem of identifying shop-
pers with high potential spending ability and presenting them with relevant of-
fers/promotions that they would most likely participate in. The key to successfully
answering this question is a system that, based on a shopper’s historical spending
behavior and shopping behaviors of others who have a similar shopping profile, can
predict the product categories and amounts that the shopper would spend in the fu-
ture. We present a case study of a project that we completed for a large retail chain.
The goal of the project was to mine the transaction data to understand shopping be-
havior and target customers who exhibit headroom - the unmet spending potential
of a shopper in a given retailer.
The paper is organized as follows. Section 2 presents an overview of the project
and Section 3 provides a mathematical formulation of the problem.
Data from every transaction from over 350 stores of a large retail chain gathered over
a period of 16 months was provided to us. Data is restricted to transactions of regular
shoppers who used a “loyalty card” that could track them across multiple purchases.
For every transaction completed at the checkout, we had the following information:
date and time of sale, the receipt number (ticket number), loyalty-card number of the
shopper (shopper number), the product purchased (given by the product number),
product quantity and the total amount, and the store (identified by the store number)
where the transaction took place. A single shopping trip by a customer at a particular
store would correspond to several records with the same shopper number, the same
store number, and the same ticket number, with each record corresponding to a
different product in the shopping cart. There were 1,888,814 distinct shoppers who
shopped in all the stores in the period of data collection.
Along with the transaction data, we were also given a Product Description Hi-
erarchy (PDH). The PDH is a tree structure with 7 distinct levels. At level 7, each
leaf corresponds to an individual product item. Level 0 corresponds to the root-node
containing all 296,387 items. The number of categories at the intermediate levels, 1
through 6, were 9, 50, 277, 1137, 3074 and 7528 respectively. All analysis referred
to in this paper was performed at level 3 (denoted as L3 henceforth).
There were two main aspects in the project. The first was to identify those shop-
pers who were not spending enough to reflect their spending potential. These could
be shoppers who have significant disposable income and who could be persuaded
to spend more or regular shoppers who use the retail chain to fulfill only a part of
their shopping needs. In both cases, the customers have headroom, i.e. unrealized
spending potential.
Once headroom customers have been identified, the next logical step is to find
out product categories that would most likely interest them and to target promotions
at this group. This is the problem of filling holes in the baskets, by motivating them
to buy additional products that they are not currently shopping for. In many respects
this is similar to a movie recommender problem, where instead of movie watching
history and movie ratings of each person, we have the shopping history and spends.
3 Mathematical Formulation
In this section, we introduce mathematical notation and formulate the problem. Let
Scpm denote the amount spent by shopper c in the product category p during month
m, and ncpm denote the number of items bought in that product category. For the
purposes of clarity and simplicity, let us denote the indices {c, p, m} by the variable
τ . In other words, each different value taken by τ corresponds to a different value of
the triplet {c, p, m}. Let us define the quantity Spend Per Item (SPI) as Iτ = (Sτ /nτ ).
The above quantities can be represented as 3-dimensional matrices S, n and I
respectively, where the three dimensions correspond to shoppers, product categories
and months. These matrices are highly sparse with entries missing for those values
of τ = {c, p, m} that correspond to no data in the data set (i.e. items that were not
bought by shoppers). Let τ0 represent the set of values of {c, p, m} for which there
is no data and let τ1 represent the set of values of {c, p, m} for which data is present,
i.e. τ = {τ0 ∪ τ1 }.
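A tiny numpy illustration of this notation, with made-up numbers, may make the sparse structure concrete.

```python
# S (spend), n (item counts) and I = S/n are sparse shopper x category x month
# arrays; tau_1 indexes the observed cells and tau_0 the missing ones.
import numpy as np

S = np.full((2, 3, 2), np.nan)        # shoppers x product categories x months
n = np.full((2, 3, 2), np.nan)
S[0, 1, 0], n[0, 1, 0] = 24.0, 3      # shopper 0: 3 items for 24.0 in category 1, month 0
S[1, 2, 1], n[1, 2, 1] = 10.0, 2

I = S / n                             # Spend Per Item, defined only on tau_1
tau_1 = ~np.isnan(S)                  # observed entries
tau_0 = np.isnan(S)                   # entries to be estimated
print(I[tau_1], tau_0.sum(), "missing cells")
```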
The first problem is to estimate each shopping household’s unrealized spending
potential in product categories that they haven’t bought. This information can then
be used for targeting and promotions. Mathematically, the problem is to estimate
Sτ0 given the values in Sτ1 .
The second problem is to identify a set of shoppers who have headroom. Al-
though subjective, the most common usage of this term refers to customers who
have additional spending potential or who are not using a retailer to fill shopping
needs that could be met there. There are many possible proxy measures of head-
room, each focusing on different aspects of shopping behavior. We chose to derive
four of these headroom metrics1 - (a) total actual spend, (b) total actual SPI, (c)
residue between model spend and actual spend, and (d) frequency of shopping.
For ease of comparison between the metrics and for the purpose of consolidat-
ing them later, we express each metric for all the shoppers in probabilistic terms.
Our first three metrics are well suited for representation as standard z-scores. The
frequency metric requires a mapping to express it as a standard z-score. For every
metric, we choose a value range and define shoppers with z-scores in this range as
exhibiting headroom. Section 5 details how the scores are consolidated.
The elements of s are the singular values and the columns of U and V are the left and
right singular vectors, respectively. The matrices are typically arranged such that the
diagonal entries of S are non-negative and in decreasing order. The M-dimensional
columns of U, {u1, u2, . . . , uM}, form an orthonormal matrix and correspond to a
linear basis for X's columns (they span the column space of X). Similarly, the
N-dimensional columns of V form an orthonormal basis for X's rows (they span the
row space of X).
1 These metrics can be measured separately for each product category if desired.
Several well-known recommender systems are based on SVD and the related
method of eigenvalue decomposition (e.g., [6, 2]). Let the matrix X represent a ma-
trix of consumer spends over a given period. Each of the N columns represents a
different shopper and each of the M rows represents a different product. Xmn, the
mn-th entry of the matrix, represents how much shopper n spent on product m. Con-
sider the rank-k SVD U0 S0 V0^T ≈ X. The subspace spanned by the columns of U0 can
be interpreted as the k most important types of "shopping profiles", and a location
can be computed for each shopper in this shopping profile space. The relationship
between xn, the n-th column of X representing the spends of the n-th shopper, and
his/her location in the shopping profile space, given by a k-dimensional vector pn, is
pn = U0^T xn. It is easy to show that pn is given by the n-th row of the matrix
V0 S0. This vector pn underlies all SVD-based recommender systems; the idea is to
estimate pn and thus obtain imputed values for missing data in xn . This also enables
one to identify shoppers with similar shopping profiles by measuring the distance
between their locations in the shopping profile space [6, 7]. Similarly, the prod-
uct space given by columns of V0 show how much each product is liked/disliked
by shoppers belonging to the various shopping profiles. These subspaces are very
useful for subsequent analysis such as clustering and visualization.
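The shopping-profile construction can be sketched in a few lines of numpy; the spend matrix below is random and only illustrates the relation pn = U0^T xn.

```python
# Columns of X are shoppers, rows are products; each shopper's location in the
# "shopping profile space" is the corresponding row of V0 S0.
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((6, 5))                     # 6 products x 5 shoppers (random stand-in)
k = 2                                      # number of retained shopping profiles

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U0, S0, V0 = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

P = V0 @ S0                                # one k-dimensional location per shopper
print(np.allclose(P, X.T @ U0))            # True: p_n = U0^T x_n, row by row
```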
Despite its advantages, the main practical impediment to using a thin SVD with
large data sets is the cost of computing it. Most standard implementations are based
on Lanczos or Rayleigh-Ritz iterations that do not scale well with large data sets.
Such methods require multiple passes through the entire data set to converge. Sev-
eral methods have been proposed to overcome this problem for fast and efficient
SVD computations [3, 5]. In this paper, we use the iterative incremental SVD im-
plementation (IISVD) [2, 1] which can handle large data sets with missing values.
Details of the implementation are beyond the scope of this paper.
5 Methodology
Data Preprocessing
For our retail sales data, the assumption of a log-normal distribution of spend and
spend per item holds very well, both for each L3 product category and for the overall
data. There are always issues of customers shopping across stores, customers buying
for large communities and other anomalous sales points, and we first eliminate these
outliers from further analysis. We screen shoppers based on four variables - the total
spend amount, the number of shopping trips, the total number of items bought, and
the total number of distinct products bought. The log-distribution of each variable
showed us that the distributions were close to normal, but contained significant
outlier tails corresponding to roughly 5% of the data on either end of the distribu-
tion. All the shoppers who fall in the extreme 5% tails are eliminated. This process
reduces the number of shoppers from 1,888,814 to 1,291,114.
The remaining data is log-normalized and centered. The relatively small diver-
gence from normality at this point is acceptable for the justification of using the
SVD method (which assumes Gaussian data) to model the data.
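A possible sketch of this screening and normalisation step, assuming a per-shopper summary table with the four screening variables as columns; the column names are assumptions.

```python
# Drop shoppers in the extreme 5% tails of each log-transformed screening
# variable, then log-normalise and centre the remaining spend data.
import numpy as np
import pandas as pd

def screen_and_normalise(shoppers: pd.DataFrame) -> pd.DataFrame:
    screen_cols = ["total_spend", "n_trips", "n_items", "n_distinct_products"]
    logs = np.log(shoppers[screen_cols])
    keep = pd.Series(True, index=shoppers.index)
    for col in screen_cols:                       # trim the extreme 5% tails per variable
        lo, hi = logs[col].quantile([0.05, 0.95])
        keep &= logs[col].between(lo, hi)
    kept = shoppers[keep].copy()
    spend = np.log(kept["total_spend"])           # log-normalise and centre the spend
    kept["log_spend_centered"] = spend - spend.mean()
    return kept
```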
Clustering
A key to accurate modeling of retail data sets of this size is the ability to break the
data into subsegments with differing characteristics. Modeling each segment sep-
arately and aggregating the smaller models gives significant gains in accuracy. In
previous work [4] we utilized demographic, firmographic and store layout informa-
tion to aid in segmentation. In this project, we derived our segments solely from
shopping behavior profiles.
Shopping behavior profiles are generated by expressing each shopper’s cumula-
tive spend for each L3 product category in percentage terms. The main reason for
considering percentage spends is that it masks the effect of the shopper household
size on the magnitude of spend and focuses on the relative spend in different L3
categories. For example, shoppers with large families spend more compared to a
shopper who is single and lives alone. We believe that this approach produces infor-
mation more useful for discriminating between consumer lifestyles.
We begin by creating a 150 × 1 vector2 of percent spends per L3 category. A
dense matrix X containing one column for each shopper is constructed. We generate
the SVD decomposition, X = USV^T, using IISVD.
From experience we know that we can assume a noise content in retail data of
more than 15 percent. Using this as a rough threshold, we keep only as many sin-
gular vectors as needed for their cumulative variance to account for roughly 85
percent of the overall data variance. In other words, the rank we choose for the
approximation U0 S0 V0^T is the minimum value of k such that
(∑_{i=1}^{k} s_i^2) / (∑_{i=1}^{150} s_i^2) ≥ 0.85.
2 We were asked to analyze only a subset of 150 from among the total 277 categories.
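In code, this rank-selection rule might look as follows (a minimal sketch):

```python
# Keep the smallest k whose singular values account for at least 85% of the
# total variance of the data.
import numpy as np

def choose_rank(singular_values, threshold=0.85):
    energy = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    return int(np.searchsorted(energy, threshold) + 1)

s = np.array([10.0, 6.0, 3.0, 1.0, 0.5])
print(choose_rank(s))   # 2, since (100 + 36) / 146.25 is about 0.93 >= 0.85
```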
Consider S, the 3-D matrix of spends by all shoppers across product categories
and across months. We unroll the dimensions along p (product categories) and m
(months) into a single dimension of size |p| × |m|.
We now consider each cluster separately. Let X refer to the resulting sparse 2-D
matrix of spends of shoppers within a cluster. Let τ1 and τ0 represent indices corre-
sponding to known and unknown values respectively. The non-zero data values of
the matrix, Xτ1 , have normal distributions that make SVD very suitable for model-
ing. Depending on the goal of the analysis, the unknown values Xτ0 can be viewed
as data points with zeros or as missing values.
Treating these as missing values and imputing values using the SVD gives us an
estimate that can be interpreted as how much a customer would buy if they chose
to meet that shopping need. Again, this is analogous to the approach taken in movie
recommender problems. We use IISVD for the imputation and compute full-rank
decompositions. Given a sparse matrix X, IISVD begins by reordering the rows and
columns of an initial sample of data to form the largest dense matrix possible. SVD
is performed on this dense data and is then iteratively “grown” step by step as more
rows and columns are added. IISVD provides efficient and accurate algorithms for
performing rank-1 modifications to an existing SVD such as updating, downdating,
revising and recentering. It also provides efficient ways to compute approximate
fixed-rank updates, see [2] for a more detailed treatment. This process is repeated for
each of the 23 clusters of shoppers separately and imputed values X̂τ0 are obtained.
The imputed spends for a shopper are equivalent to a linear mixture of the spends
of all shoppers within the cluster, weighted by the correlations between their spends
and spends of the current shopper.
Note that if we now use this filled data set and impute the value of a known data
point, the imputed values of the missing data have no effect on the solution because
they already lie on the SVD regression hyperplane.
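The project relied on the IISVD implementation of [2, 1]; as a rough, self-contained stand-in, the sketch below imputes missing spends by alternating a rank-k SVD fit with refilling the unknown cells. It is only an analogue of the idea, not the incremental algorithm itself.

```python
# Simple iterative SVD imputation (NOT IISVD): fit a rank-k SVD, refill the
# unknown cells with the reconstruction, and repeat while keeping the known
# values fixed.
import numpy as np

def svd_impute(X, k, n_iter=50):
    """X: shoppers-by-(category, month) matrix with NaNs at the unknown cells."""
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X), X)     # crude initial fill
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :k] * s[:k]) @ Vt[:k, :]      # rank-k reconstruction
        filled = np.where(missing, approx, X)        # known values stay untouched
    return filled
```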
Headroom Model

The headroom model measures how much each shopper's spend differs from the
expected value given by the SVD imputation of non-zero data.
Using the filled shopper-L3 spend data X̂τ0 , we remove 10% of the known data
values and impute new values using a thin SVD algorithm [2]. The thin SVD method
we use determines the optimal rank for minimizing the error of imputation by way
of a cross validation method. Because of this, the reduced rank used to model each
cluster and even differing portions of the same cluster can vary. A result of this, as
shown in [4], is that a model aggregated from these parts has a much lower error
than one modeling all of the data at the same rank. This process is repeated for all
known spend values Xτ1 and model estimates X̂τ1 are calculated.
The residues, differences between known spends Xτ1 and modeled spends X̂τ1 ,
are normally distributed and hence can be expressed as Z-scores. Figure 1 illustrates
the normality for percent errors in a given cluster. Figure 2 shows the normality plot
of the data. The preponderance of points exhibit a normal distribution while the tail
extremes diverge from normality.
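The conversion of residues into z-scores is straightforward; a minimal sketch follows, with X_known and X_model standing for the held-out known spends and their model estimates within one cluster.

```python
# Residues between known and modeled spends, expressed as standard z-scores.
import numpy as np

def residue_zscores(X_known, X_model):
    residues = X_known - X_model
    return (residues - residues.mean()) / residues.std()

z = residue_zscores(np.array([3.1, 2.0, 4.2, 5.0]), np.array([2.9, 2.5, 4.0, 4.1]))
print(np.round(z, 2))
```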
Fig. 1 Histogram of Percent Errors. The difference between the imputed spend values Ŝτ1 and
known spend values Sτ1 is expressed in terms of percentages. Figure shows that distribution is
close to a Gaussian.
Observing the root-mean-squared error of the imputed known values X̂τ1 for each
L3 category gives a clear quantification of the relative confidence we can have across
L3 product categories.
The model z-scores generated in this process can be interpreted as a direct proba-
bilistic measure of shopper over-spending/under-spending for each L3 category. One
can do a similar analysis for SPI data and obtain z-scores in the same fashion. How-
ever, it can be shown that the SPI z-scores obtained will be identical to the spend
z-scores that we have calculated3.

[Fig. 2: Normal probability plot of the residues.]
3 We model the spend in categories that a customer has shopped in by making a reasonable as-
sumption that the number of items of different products bought will not change. Thus any in-
crease/decrease in total spend is equivalent in percentage terms to the increase/decrease in SPI.
For each of the HPMs, we select the shoppers corresponding to the top 30% of the
z-score values. The union of these sets is identified as the screened set of customers
with the greatest likelihood of exhibiting headroom.
The Consolidated Headroom Metric for each shopper is created by a weighted
sum across each of our four HPMs. In this project, we chose to apply equal weights
and computed the CHM as the mean of the four HPMs. However, one could subjec-
tively choose to emphasize different HPMs depending on the analysis goals.
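A short sketch of this screening and consolidation step, assuming a table with one z-score column per HPM; the column and index names are hypothetical.

```python
# Select the top 30% of shoppers on each HPM, keep the union as the screened
# set, and compute the CHM as the (equally weighted) mean of the four z-scores.
import pandas as pd

def consolidate_headroom(hpm: pd.DataFrame, top_fraction=0.30) -> pd.DataFrame:
    """hpm: one row per shopper, one z-score column per headroom proxy metric."""
    screened = set()
    for col in hpm.columns:
        cutoff = hpm[col].quantile(1 - top_fraction)
        screened |= set(hpm.index[hpm[col] >= cutoff])
    chm = hpm.loc[sorted(screened)].mean(axis=1).rename("CHM")   # equal weights
    return chm.sort_values(ascending=False).to_frame()
```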
6 Conclusions
In this paper, we presented the case-study of a retail data mining project. The goal
of the project was to identify shoppers who exhibit headroom and target them
with product categories they would most likely spend on. We described details of
the SVD-based recommender system and showed how to identify customers with
high potential spending ability. Due to the customers’ demand for mass customiza-
tion in recent years, it has become increasingly critical for retailers to be a step
ahead by better understanding consumer needs and by being able to offer promo-
tions/products that would interest them. The system proposed in this paper is a first
step in that endeavor. Based on the results of a similar highly successful project that
we completed for another retailer previously [4], we believe that with a few itera-
tions of this process and fine tuning based on feedback from sales and promotions
performance, it can be developed into a sophisticated and valuable retail tool.
Acknowledgements The authors would like to thank Gil Jeffer for his database expertise and
valuable assistance in data exploration.
References
1. M. Brand. Incremental Singular Value Decomposition of Uncertain Data with Missing Values.
In European Conf on Computer Vision, 2002.
2. M. Brand. Fast Online SVD Revisions for Lightweight Recommender Systems. In SIAM Intl
Conf on Data Mining, 2003.
3. S. Chandrasekaran, B. Manjunath, Y. Wang, J. Winkeler, and H. Zhang. An Eigenspace Update
Algorithm for Image Analysis. Graphical Models and Image Proc., 59(5):321–332, 1997.
4. M. Giering. Retail sales prediction and item recommendations using customer demographics
at store level. In KDD 2008 (submitted).
5. M. Gu and S. Eisenstat. A Stable and Fast Algorithm for Updating the Singular Value Decom-
position. Technical Report YALEU/DCS/RR-966, Yale University, 1994.
6. D. Gupta and K. Goldberg. Jester 2.0: A linear time collaborative filtering algorithm applied to
jokes. In Proc. of SIGIR, 1999.
7. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in
recommender system - a case study. In Web Mining for ECommerce, 2000.
Adaptive Electronic Institutions for
Negotiations
Abstract The expansion of web technologies pushes human activities towards meth-
odologies and software that could ease interactions by means of software transac-
tions. The distribution of human and software agents over the web and their operation
under dynamically changing conditions necessitates dynamic intelligent
environments. Electronic institutions can play an “umbrella” role for agents’
transactions, where the institutions’ norms could protect and support the moves and
decisions made through negotiations. However, dynamic information provision
may force changes in structures and behaviors, driving electronic institutions’ ad-
aptation to changing needs. Viewing negotiation structures as electronic institu-
tions, this paper investigates the impact of a dynamically changing environment on
negotiations’ electronic institutions.
1 Introduction
it cannot be utilized effectively in real time: the dossier of the negotiating parties,
with all the available “movements” and rules, is almost fixed. It is the aim of this
paper to investigate the incorporation of dynamically provided information and its
effects on negotiations’ structure and function by means of electronic institutions’
constructs.
In our case study we deal with a traditional chartering task, where Shipowners
and Cargo owners have to reach an agreement, at the best price and under certain
conditions and terms, on a contract for the transfer of cargoes. Let us consider
five Shipowners’ brokers, which have started negotiating with a specific cargo
owner in order to conclude a contract [4]. During the negotiation procedure one
Shipowner was informed that its vessel had stopped operating and that this could
delay its arrival at the cargo port. The remaining Shipowners’ brokers continued to
negotiate with the cargo owner, until a market change or an exceptional event
near the cargo destination pushed them to restart the whole procedure in the
light of the new conditions. Such conditions affect the negotiation proce-
dures either by changing the participants’ strategic decisions and their related actions,
or by militating against their goals. In the worst case a participant could leave the
process and search for a new negotiation place. This generic scenario occurs many
times during negotiations in the maritime sector, and most of the time the
negotiating partners are not in a position to control and filter external
information/news that could affect their decisions in real time. Motivated by this
real-life problem, we investigate the use of a framework that could support
solving this type of negotiation problem by means of adaptive environments.
The paper is structured as follows: Section 2 analyzes electronic negotiations and their missing adaptability. Section 3 proposes eIs as a solution for adaptive negotiations and presents the adaptation that structured eIs can offer. Finally, Section 4 concludes the paper and gives future research topics.
2 Electronic Negotiations
In electronic negotiations, software agents prepare bids and evaluate offers on behalf of the parties they represent, aiming to obtain the maximum benefit for their owners while following specific negotiation strategies. When building autonomous agents capable of sophisticated and flexible negotiation, the following areas should be considered [4]: (a) the negotiation protocol and model to be adopted, (b) the issues over which negotiation will take place, (c) the events that affect the negotiation process and drive adaptability, and (d) the negotiation strategies employed by the agents, under what conditions, and how they will be implemented and adapted to changing circumstances. Given the wide variety of possibilities for negotiations, there is no universally best approach or technique for supporting automated negotiations [8]. Protocols, models and strategies need to be set according to the prevailing situations and to adapt accordingly based on new information. A change in negotiation conditions can move the whole negotiation back to its starting point, possibly causing the involved parties to adopt a new negotiation protocol or strategies.
We consider a generic negotiation environment, covering multi-issue contracts and multi-party situations, where negotiators face strict deadlines. Moreover, we deal with a highly dynamic environment, in the sense that its variables, attributes and objectives may change over time. The triggers of this change are time and the influence on chartering markets of external factors, including catastrophes, political crises, environmental disasters and aid programmes. Dynamic changes of the variables and conditions that affect negotiations cannot easily be incorporated in human negotiation transactions. This paper concentrates on the incorporation of these changes in adaptive electronic negotiations in business-to-business (B2B) marketplaces through eIs. The negotiating agents may be divided into Buyer_Agents, Seller_Agents and Information_Provision_Agents. The Buyer_Agents (BA) and the Seller_Agents (SA) are considered to be self-interested, aiming to maximize their owners’ profit. The Information_Provision_Agents (IPA) signal new events and changing conditions (e.g. world news, market changes, etc.) that may affect the negotiation procedure or the participation of the negotiating agents.
The proposed infrastructure for using eIs for the modeling of adaptive negotiation structures is depicted in Fig. 1. Negotiations may adapt as a function of Time and News Information. Adaptation applies to negotiation areas (NA) and results in a new negotiation area. In the initialization of the negotiation phase (NA 2), the negotiation involves five buyer agents (BA) and one seller agent (SA). Some of the BAs are also connected with their information provision agents (IPA); it is not necessary for all BA and SA agents to be connected with an IPA agent. Each NA is specified to be an eI. As the conditions change, NAs adapt to new structures, resulting in a different institution structure: from (NA 2) the negotiating procedures are moved into (NA 2.1), where different eI(i,j,…,n) structures control the negotiation conditions and rules.
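To make the adaptation step concrete, the following minimal Python sketch (all names and the adaptation rule are hypothetical illustrations, not part of the proposed infrastructure) models a negotiation area as a set of BA, SA and IPA agents and shows how an IPA-signalled event can yield an adapted negotiation area such as NA 2.1:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    role: str                        # "BA", "SA" or "IPA"

@dataclass
class NegotiationArea:
    """A negotiation area (NA), specified as an electronic institution."""
    label: str
    buyers: list = field(default_factory=list)
    sellers: list = field(default_factory=list)
    ipas: dict = field(default_factory=dict)   # optional IPA attached to a BA/SA

    def adapt(self, event: str) -> "NegotiationArea":
        # An IPA-signalled event (e.g. a market change) yields a new NA with
        # possibly fewer participants and new institutional rules.
        remaining = [b for b in self.buyers if not affected(b, event)]
        return NegotiationArea(self.label + ".1", remaining, list(self.sellers), dict(self.ipas))

def affected(agent: Agent, event: str) -> bool:
    # Placeholder rule: in the chartering scenario, a vessel breakdown would
    # make the corresponding Shipowner's broker leave the negotiation.
    return event == "vessel_breakdown" and agent.name == "BA_1"

na2 = NegotiationArea("NA 2",
                      buyers=[Agent(f"BA_{i}", "BA") for i in range(1, 6)],
                      sellers=[Agent("SA_1", "SA")])
na21 = na2.adapt("vessel_breakdown")     # NA 2 -> NA 2.1 after new information arrives
print(len(na21.buyers))                  # 4 remaining Shipowners' brokers
```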
This section describes how the different constituents of an eI may change due to the eI’s adaptation to newly provided information.
3.4 Scenes
The negotiation procedure comprises phases that can be modeled by scenes in an
eI. A scene is a pattern of multi-agent interaction. A scene protocol is specified by
a finite state oriented graph, where the nodes represent the different states and ori-
ented arcs are labeled with illocution schemes or timeouts. Scenes allow agents ei-
ther to enter or to leave a scene at some particular states of an ongoing conversa-
tion and can substantiate a negotiation procedure by splitting it in more than one
scene. Its negotiation protocol has a defined scenes structure. The negotiation in-
frastructure that related agents follow is translated through the eI in a set of de-
fined scenes graph. Scenes are the key points for eI’s adaptability, as the AIs con-
ditions affect the structure of the scenes graph.
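As an illustration only (the state names, illocution schemes and access points below are made up and do not come from the paper), a scene protocol of this kind can be encoded as a labelled directed graph:

```python
# A minimal representation of a scene protocol: nodes are conversation
# states, arcs are labelled with illocution schemes or timeouts.
scene_negotiation = {
    "states": ["s_0", "s_1", "s_2", "s_3"],
    "initial": "s_0",
    "final": ["s_3"],
    # (from_state, label, to_state); labels are illocution schemes or timeouts
    "arcs": [
        ("s_0", "request(BA, SA, offer)", "s_1"),
        ("s_1", "inform(SA, BA, counter_offer)", "s_2"),
        ("s_2", "accept(BA, SA, contract)", "s_3"),
        ("s_2", "timeout(30)", "s_0"),
    ],
    # access points: states at which agents playing a role may enter or leave
    "entry": {"BA": ["s_0"], "SA": ["s_0"]},
    "exit":  {"BA": ["s_3"], "SA": ["s_2", "s_3"]},
}

def next_states(scene, state, label):
    """States reachable from `state` by an arc carrying the given label."""
    return [t for (s, l, t) in scene["arcs"] if s == state and l == label]

print(next_states(scene_negotiation, "s_2", "accept(BA, SA, contract)"))  # ['s_3']
```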
The illocutions on arcs 2 and 5 are the transitions for closing the negotiation scene. The move from state s_1 to state s_3 indicates a positive negotiation result that will be used as an input to a new scene_l of the performative structure. The transition arc 4 expresses the decision of an SA agent to leave (role_SA live) or stop a negotiation when the conditions reported by the role_IPA_SA_market are not satisfactory with respect to its market and profit intentions.
From the above specific example of negotiations through the eI structure, it is clear that adaptation can be incorporated as scenes come round, i.e., at scene transitions.
4 Concluding Remarks
This paper proposes an infrastructure for adaptive negotiations in the context of eIs. It analyzes the negotiation aspects and, based on a maritime case study, examines the eI aspects that could bring external information into the negotiation area. In the context of this paper, we investigated the issue of the different eI structures that should support the negotiation areas. There is a need for a mechanism responsible for the creation of an eI that links different eI structures and negotiation areas. Agents, according to their profile characteristics and the negotiation market domain, should be forwarded into specific NAs that, according to the market constraints, will be supported by one or more eI structures. Adaptability to external information and news, achieved with multi-agent systems and through eIs, is an add-on for electronic negotiations. The design and creation of a prototype of the proposed infrastructure, using technologies that support this adaptability, such as Jadex and XML, is our future objective.
References
Abstract This paper focuses on improving load-balancing algorithms in grid environments by means of multi-agent systems. The goal is to endow the environment with efficient scheduling, taking into account not only the computational capabilities of the resources but also the task requirements and resource configurations at a given moment. Task delivery makes use of a Collaborative/Cooperative Awareness Management model (CAM), which provides information about the environment; a Simulated Annealing based method (SAGE), which optimizes the process assignment; and a historical database, which stores information about previous cooperations/collaborations in the environment, aiming to learn from experience and to infer more suitable future cooperations/collaborations. The integration of these three elements allows agents to define a system that covers all the aspects of the load-balancing problem in collaborative grid environments.
Paletta, M. and Herrero, P., 2009, in IFIP International Federation for Information Processing, Volume
296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 365–371.
niques [10]; 3) to manage the situation in overloaded conditions. Therefore, a cooperative and dynamic load-balancing strategy that takes these aspects into consideration becomes highly difficult to design effectively.
This paper presents a new MAS-based approach to deal with the necessity mentioned above. The proposal is built from the following components: 1) the awareness management concepts defined in the CAM (Collaborative/Cooperative Awareness Management) model [8]; 2) a heuristic technique used to optimize the required resource-process assignments, named SAGE (Simulated Annealing to cover dynamic load balancing in Grid Environments) [12]; 3) a Radial Basis Function Network (RBFN) based learning strategy [11] used to obtain more suitable future collaborations/cooperations based on experience; and 4) a SOA-based framework for implementing Intelligent Agents (IAs) in grid environments, called SOFIA [13].
These components complement each other: CAM manages resource interaction by maintaining information about the environment, SAGE distributes the load dynamically in the environment, and SOFIA together with the learning strategy allows the system to achieve more suitable cooperation.
The rest of the paper is organized as follows. Section 2 reviews the technical background of previous research. The MAS-based system proposed in this paper is presented in Section 3. Section 4 presents some implementation and evaluation aspects. Finally, Section 5 presents the conclusions as well as the future work related to this research.
2 Theoretical Background
Our approach integrates the SOFIA framework with the CAM model by adapting the key CAM concepts to the objectives to be achieved, as follows (see Fig. 3.1-b). In this approach, the IA-SV agent manages the Focus and Nimbus of each resource (as “abilities”). The IA-EA agent (“body”) manages the InteractivePool of the collaborative grid environment. The load-balancing process is performed by the IA-RA agent (“brain”) using the SAGE method as well as the CoB-ForeSeer strategy (see details below).
Fig. 3.1. a) The SOFIA general architecture; b) SOFIA-based system for balancing the load in
collaborative grid environments.
On the other hand, in overloaded conditions (for example when AwareInt is Peripheral or Null), it is necessary to try to extend the Focus or Nimbus of one of the nodes so that the AwareInt can change to Full. This role is also performed by IA-RA, through a negotiation process, or dialogue, that takes place between IA-RA and IA-SV. In this regard, the IA-FA agent (“facilitator”) is responsible for managing this negotiation process (see details below).
The first thing IA-RA does when a load-balancing process is required in the grid environment is to obtain the corresponding TaskResolution “scores” sj of each pj by using CoB-ForeSeer. Once the answer is obtained, and depending on its content, IA-RA takes one of three possible decisions, using a set of rules as well as the current information associated with IA-EA (InteractivePool) and IA-SV (Focus and Nimbus):
1) Accept the processes-resources distribution given by CoB-ForeSeer.
2) Decline the answer because it is not consistent with the current situation. One of the reasons why this may happen is that the RBFN in CoB-ForeSeer is not sufficiently trained. In this case, IA-RA uses SAGE to find a better answer.
3) Decline the answer because the current conditions of the grid environment are overloaded. In this case IA-RA initiates the negotiation process, aiming to change the current conditions of the environment and obtain a new acceptable answer.
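A rough sketch of this three-way decision is given below; the predicates, the call structure and the SAGE stub are assumptions made purely for illustration and do not reproduce the actual CoB-ForeSeer or SAGE implementations:

```python
def ia_ra_decide(scores, interactive_pool, focus, nimbus, acceptable, overloaded):
    """IA-RA: accept the CoB-ForeSeer proposal, fall back to SAGE, or negotiate.
    `acceptable` and `overloaded` are caller-supplied predicates (assumptions)."""
    if overloaded(interactive_pool, focus, nimbus):
        # Decision 3: the grid environment is overloaded -> start the negotiation process
        return ("negotiate", None)
    if acceptable(scores, interactive_pool):
        # Decision 1: keep the processes-resources distribution proposed by CoB-ForeSeer
        return ("accept", scores)
    # Decision 2: the RBFN answer does not fit the current situation -> search with SAGE
    return ("sage", run_sage(interactive_pool, focus, nimbus))

def run_sage(interactive_pool, focus, nimbus):
    # Placeholder for the simulated-annealing assignment search (SAGE)
    return {}

decision, plan = ia_ra_decide(scores={"p1": 0.7}, interactive_pool=["n1"], focus={}, nimbus={},
                              acceptable=lambda s, ip: max(s.values()) > 0.5,
                              overloaded=lambda ip, f, n: False)
print(decision)   # 'accept'
```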
The negotiation process is performed by using a protocol defined as part of this
proposal. This protocol is used by IA-RA, IA-FA and IA-SV and consists of the
following dialogue:
• REQUEST (IA-RA → IA-FA): The load balancer in IA-RA becomes aware of an overload in one of the processes-resources assignments and decides, with this message, to negotiate a relief option with some node.
• REQUEST (IA-FA → IA-SV): Once IA-FA receives the request from IA-RA, and since IA-FA knows what the current “abilities” (Focus and Nimbus) are, it asks for help, aiming to find some node (IA-SV) that could change its abilities.
• CONFIRM (IA-SV → IA-FA): A node confirms that it has changed its abilities and informs IA-FA of its new Focus and Nimbus.
• DISCONFIRM (IA-SV → IA-FA): A node states that it cannot, or is not interested in, changing its abilities.
• INFORM (IA-FA → IA-RA): Once IA-FA receives the confirmations/disconfirmations from the nodes and has updated all the information related to their Nimbus and Focus, it sends this updated information to IA-RA.
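The following toy, synchronous rendering of this dialogue (hypothetical data structures and an arbitrary willingness rule; the real system exchanges asynchronous agent messages) illustrates how IA-FA collects CONFIRM/DISCONFIRM replies and returns the updated Focus/Nimbus information to IA-RA:

```python
# A toy, synchronous rendering of the IA-RA / IA-FA / IA-SV dialogue.
def ia_fa_handle_request(nodes, abilities):
    """IA-FA: forward the relief REQUEST to every IA-SV node and collect replies."""
    updated = dict(abilities)
    for node in nodes:
        reply = ia_sv_consider(node, abilities[node])
        if reply["performative"] == "CONFIRM":
            updated[node] = reply["abilities"]        # the node's new Focus/Nimbus
        # on DISCONFIRM the node keeps its current abilities
    return {"performative": "INFORM", "abilities": updated}   # sent back to IA-RA

def ia_sv_consider(node, current_abilities):
    """IA-SV: decide whether to extend its Focus/Nimbus (arbitrary willingness rule)."""
    focus, nimbus = current_abilities
    if nimbus < 10:
        return {"performative": "CONFIRM", "abilities": (focus, nimbus + 1)}
    return {"performative": "DISCONFIRM"}

abilities = {"n1": (3, 9), "n2": (5, 10)}
print(ia_fa_handle_request(["n1", "n2"], abilities)["abilities"])
# {'n1': (3, 10), 'n2': (5, 10)}
```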
The next section presents some aspects related to the implementation and evaluation of the model proposed and defined above.
SOFIA was implemented using JADE [1]. The JADE behaviour models associated with the IA-RA, IA-EA, IA-FA and IA-SV agents were implemented, and the protocol for the negotiation process was implemented using the JADE ACL message class.
The evaluation of the model was carried out by generating different random scenarios in a simulated grid environment under overload conditions. The tests focused mainly on the negotiation process, aiming to quantify the ability of the model to resolve these specific situations. In addition, different configurations of the grid environment were used, from 5 to 25 nodes in steps of 5 per test block, aiming to evaluate the model’s capability to handle growth in the grid environment.
The results indicate that the proposed model has an 84% success rate in negotiating a way to resolve overload conditions. In the remaining cases (16%) the negotiation was not successful because none of the nodes (IA-SV) would or could modify its Focus/Nimbus (abilities) under the current grid environment configuration. Depending on the number of nodes in the system, the negotiation process requires different execution times (from less than 1 second to about 82 seconds), some of which may not be acceptable given the dynamics of the load-balancing process.
References
1. Bellifemine F., Poggi A., Rimassa G. (1999) JADE – A FIPA-compliant agent framework,
Telecom Italia internal technical report, in Proc. International Conference on Practical Appli-
cations of Agents and Multi-Agent Systems (PAAM'99), 97–108.
2. Berman F (1999) High-performance schedulers, in The Grid: Blueprint for a New Computing
Infrastructure, Ian Foster and Carl Kesselman, (Eds.), Morgan Kaufmann, San Francisco, CA,
279–309.
3. Cao J, Spooner DP, Jarvis SA, Nudd GR (2005) Grid load balancing using intelligent agents,
Future Generation Computer Systems, Vol. 21, No. 1, 135–149.
4. Fangpeng D, Selim GA (2006) Scheduling Algorithms for Grid Computing: State of the Art and Open Problems, Technical Report No. 2006-504, Queen's University, Canada, 55 pages, http://www.cs.queensu.ca/TechReports/Reports/2006-504.pdf.
5. Fidanova S, Durchova M (2006) Ant Algorithm for Grid Scheduling Problem, Lecture Notes in Computer Science, VIII Distributed Numerical Methods and Algorithms for Grid Computing, 10.1007/11666806, ISSN: 0302-9743, ISBN: 978-3-540-31994-8, Vol. 3743, 405–412.
6. Foundation for Intelligent Physical Agents (2002) FIPA Abstract Architecture Specification,
SC00001, Geneva, Switzerland. http://www.fipa.org/specs/fipa00001/index.html.
7. Herrero P, Bosque JL, Pérez MS (2007) An Agents-Based Cooperative Awareness Model to
Cover Load Balancing Delivery in Grid Environments, Lecture notes in computer science
2536, On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, ISSN:
0302-9743, ISBN: 978-3-540-76887-6, Springer Verlag, Vol. 4805, 64–74.
8. Herrero P, Bosque J L, Pérez MS (2007) Managing Dynamic Virtual Organizations to get Ef-
fective Cooperation in Collaborative Grid Environments, Lecture notes in computer science
2536, On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, ISBN:
978-3-540-76835-7, Springer Verlag, Vol. 4804, 1435–1452.
9. Jin R., Chen W., Simpson T.W. (2001) Comparative Studies of Metamodelling Techniques
under Multiple Modeling Criteria, Struct Multidiscip Optim, Vol. 23, 1–13.
10. McMullan P, McCollum B (2007) Dynamic Job Scheduling on the Grid Environment Using
the Great Deluge Algorithm, Lecture Notes in Computer Science, ISSN: 0302-9743, ISBN:
978-3-540-73939-5, Vol. 4671, 10.1007/978-3-540-73940-1, 283–292.
11. Paletta M., Herrero P. (2008) Learning Cooperation in Collaborative Grid Environments to
Improve Cover Load Balancing Delivery, in Proc. IEEE/WIC/ACM Joint Conferences on
Web Intelligence and Intelligent Agent Technology, IEEE Computer Society E3496, ISBN:
978-0-7695-3496-1, 399–402.
12. Paletta M., Herrero P. (2008) Simulated Annealing Method to Cover Dynamic Load Balanc-
ing in Grid Environment, in Proc. International Symposium on Distributed Computing and
Artificial Intelligence 2008 (DCAI 08), Advances in Soft Computing, J.M. Corchado et al.
(Eds.), Vol. 50/2009, Springer, ISBN: 978-3-540-85862-1, 1–10.
13. Paletta M., Herrero P. (2008) Towards Fraud Detection Support using Grid Technology, ac-
cepted for publication in a Special Issue at Multiagent and Grid Systems - An International
Journal. (To be published).
Backing-up Fuzzy Control of a Truck-trailer
Equipped with a Kingpin Sliding Mechanism
Abstract For the articulated vehicles met in the robotics and transportation fields, backing up usually leads to jack-knifing, even for an experienced operator. This paper presents a fuzzy logic controller for back-driving a truck-trailer vehicle in a predefined parking task. The truck-trailer link system is equipped with a kingpin sliding mechanism acting as an anti-jackknife excitation input. By applying fuzzy logic control techniques, a precise system model is not required. The developed controller, with thirty-four rules, works well: the presented simulation results demonstrate the avoidance of jack-knifing and the accuracy of the backing-up technique.
1 Introduction
Control of the backward movement of a truck and trailer vehicle, called the docking task, is known to be a typical nonlinear control problem. The difficulty of the control system design is caused not only by the nonlinear dynamics, but also by the inherent physical limitations of the system, such as the jackknife phenomenon, a mathematical model of which is discussed in (Fossum and Lewis 1981). The control system under investigation is not only nonlinear but also nonholonomic. Backward movement control of computer-simulated truck-trailers using various types of intelligent control, e.g. fuzzy control, neural control and neurofuzzy or genetic algorithm-based control (Yang et al. 2006, Kiyuma et al. 2004, Riid and Rustern 2001), has been reported. The backing-up control of a truck-trailer is considered in (Park et al. 2007) as a system with time delay. A fuzzy knowledge-based control for backing multi-trailer systems is also considered in (Riid et al. 2007). The kingpin sliding mechanism, used in the current work for backward stabilization, has also been used for off-tracking elimination in multi-articulated vehicles (Manesis et al. 2003). In this paper the kingpin sliding mechanism is used mainly as an active anti-jackknife steering mechanism for backward motion. The mathematical model of the considered system is different from and more
Siamantas, G. and Manesis, S., 2009, in IFIP International Federation for Information Processing,
Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I.,
Bramer, M.; (Boston: Springer), pp. 373–378.
complex than that of a truck-trailer without a kingpin sliding mechanism. The precise mathematical description offers only initial guidance for writing the fuzzy rules of the controller.
\begin{align}
\dot{x}_0 &= U_1 \cos\theta_0 \nonumber\\
\dot{y}_0 &= U_1 \sin\theta_0 \nonumber\\
\dot{\theta}_0 &= \frac{U_1}{l_0}\tan\varphi \nonumber\\
\dot{\theta}_1 &= \frac{1}{l_1}\left(U_1 + S\,\frac{U_1}{l_0}\tan\varphi\right)\sin(\theta_0 - \theta_1) - \sin^{-1}\!\frac{S}{l_1} \tag{1}\\
\dot{S} &= U_2 \nonumber\\
x_1 &= x_0 - l_1\cos\theta_1 + S\sin\theta_0 \nonumber\\
y_1 &= y_0 - l_1\sin\theta_1 - S\cos\theta_0 \nonumber
\end{align}
The −sin⁻¹(S/l1) term was added to take into account the change of the angle θ1 in the case where the velocity U1 is zero while the rate of change of the kingpin sliding U2 is not zero. The above system model is used in the evaluation of the designed fuzzy logic controller’s performance through simulations.
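For illustration, the model of Eq. (1) can be integrated numerically with a simple forward-Euler step; the parameter values, time step, units of the initial angles and the treatment of the last term are assumptions for this sketch, not values taken from the paper:

```python
import math

def step(state, U1, U2, phi, l0=3.0, l1=6.0, dt=0.01):
    """One forward-Euler step of the kinematic model of Eq. (1).
    state = (x0, y0, th0, th1, S); angles in radians; l0, l1, dt are assumed values."""
    x0, y0, th0, th1, S = state
    dx0 = U1 * math.cos(th0)
    dy0 = U1 * math.sin(th0)
    dth0 = (U1 / l0) * math.tan(phi)
    # hitch-angle rate, including the kingpin-sliding correction term of Eq. (1)
    dth1 = (1.0 / l1) * (U1 + S * (U1 / l0) * math.tan(phi)) * math.sin(th0 - th1) \
           - math.asin(S / l1)
    dS = U2
    return (x0 + dt * dx0, y0 + dt * dy0, th0 + dt * dth0, th1 + dt * dth1, S + dt * dS)

def trailer_pose(state, l1=6.0):
    """Trailer axle position (x1, y1) according to Eq. (1)."""
    x0, y0, th0, th1, S = state
    return (x0 - l1 * math.cos(th1) + S * math.sin(th0),
            y0 - l1 * math.sin(th1) - S * math.cos(th0))

# Back up (negative velocity) from an assumed initial condition.
state = (60.0, -10.0, math.radians(45), math.radians(-45), 0.0)
for _ in range(100):
    state = step(state, U1=-1.0, U2=0.0, phi=0.1)
print(trailer_pose(state))
```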
The main objective of this Fuzzy Logic Controller (FLC) is the backward motion control of the truck and trailer system so that it follows a target line, a procedure similar to parking. In the design of the fuzzy logic controller we used four inputs (α − θ1, θ0 − θ1, d, S) and two outputs (ϕ, Ṡ). Of the four inputs, the first three are needed to achieve the control objective and the fourth one (S) is needed to make S zero when the first two inputs become zero. The input and output variables were defined to have five membership functions each: NL: Negative Large, NS: Negative Small, ZE: Zero, PS: Positive Small, PL: Positive Large. The normalized membership function diagrams for each input and output variable are shown in Fig. 2. For the input variables the NS, ZE and PS membership functions are triangular and the NL and PL ones are trapezoidal. All five membership functions of the output variables are triangular.
The fuzzy logic controller has been developed by defining 34 logical rules, shown in Fig. 3. These rules are consistent with the following control strategies: (1) the truck’s steering wheels must turn in such a way as to make the system move towards the direction that makes the angle α − θ1 zero; (2) if the angles α − θ1 and θ0 − θ1 belong to the ZE degree, i.e. the truck-trailer system is moving towards the axes origin and the angle between the truck and the trailer is close to zero, then we can turn the truck’s steering wheels in such a way as to minimize the distance d from the horizontal axis; (3) if the angle θ0 − θ1 is large (NL, PL degrees), then, to avoid jackknifing and regardless of the values of all other variables, the steering wheels should be turned in such a way as to minimize the angle θ0 − θ1; (4) the kingpin sliding should have a direction opposite to the centrifugal direction and must be proportional to the angle θ0 − θ1; this action provides a greater correction margin to the steering wheels to avoid jackknifing; (5) if the angles α − θ1 and θ0 − θ1 belong to the ZE degree, i.e. the truck-trailer system is moving towards the axes origin and the angle between the truck and the trailer is close to zero, then we can drive the kingpin sliding distance to zero by applying Ṡ opposite and proportional to S. The linguistic rules that were used have the general form:
IF [INP1 is MFINP1] AND [INP2 is MFINP2] THEN [OUT1 is MFOUT1]
The fuzzy AND operator and the implication method were defined as the minimum. The centroid defuzzification method was used, and the output aggregation method was defined as the sum. We chose the sum method instead of the maximum method because it fits better with the cumulative control approach imposed by strategy no. 2, which controls the distance d on top of the control of the α − θ1 angle.
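A minimal Mamdani-style sketch of this inference scheme is given below; the rule contents, normalized universes and three-label subset of membership functions are invented for illustration and do not reproduce the 34-rule controller of Fig. 3:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Normalized universes and a three-label subset of the membership functions
u = np.linspace(-1.0, 1.0, 201)                        # output universe (steering command)
out_mf = {"NS": tri(u, -1.0, -0.5, 0.0),
          "ZE": tri(u, -0.5,  0.0, 0.5),
          "PS": tri(u,  0.0,  0.5, 1.0)}
in_params = {"NS": (-1.0, -0.5, 0.0), "ZE": (-0.5, 0.0, 0.5), "PS": (0.0, 0.5, 1.0)}

def in_mf(label, x):
    """Degree of membership of a normalized crisp input x."""
    a, b, c = in_params[label]
    return float(tri(np.array([x]), a, b, c)[0])

# Two invented rules of the form IF [e1 is A] AND [e2 is B] THEN [out is C]
rules = [(("PS", "ZE"), "NS"),
         (("ZE", "ZE"), "ZE")]

def infer(e1, e2):
    agg = np.zeros_like(u)
    for (a, b), c in rules:
        w = min(in_mf(a, e1), in_mf(b, e2))   # AND and implication: minimum
        agg += np.minimum(w, out_mf[c])       # aggregation: sum of clipped consequents
    return 0.0 if agg.sum() == 0.0 else float((u * agg).sum() / agg.sum())  # centroid

print(infer(0.5, 0.0))   # about -0.5: centroid of the clipped NS consequent
```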
Fig. 4. System response from initial conditions (60, -10, 45, -45). Without sliding control (left),
with sliding control (right)
Fig. 5. System response from initial conditions (60, 20, 45, -45) and (-35°, 35°) steering wheel
angle limit. Without sliding control (left - jackknife), with sliding control (right)
4 Conclusions
A fuzzy logic controller was designed for the backward motion of a truck-trailer system with on-axle kingpin sliding. Various tests were conducted to assess the controller’s effectiveness. The controller showed good performance with the system starting from various initial conditions. The use of kingpin sliding towards the centripetal direction has no adverse effect on the system response, while it reduces the maximum turn of the truck’s steering wheels when the system is maneuvering backwards. This gives the truck a greater steering wheel turn margin to avoid jackknifing.
References
1. Fossum TV., Lewis GN. A mathematical model for trailer-truck jackknife. SIAM Review
(1981); vol.23, no.1: 95-99.
2. Kiyuma A., Kinjo H., Nakazono K., Yamamoto T. Backward control of multitrailer sys-
tems using neurocontrollers evolved by a genetic algorithm. Proceedings 8th International
Symposium on Artificial Life and Robotics (2004); pp.9-13.
3. Manesis S., Koussoulas N., Davrazos G., On the suppression of off-tracking in multi-
articulated vehicles through a movable junction technique. Journal of Intelligent and Ro-
botic Systems (2003); vol.37: pp.399-414.
4. Park C.W., Kim B.S., Lee J., Digital stabilization of fuzzy systems with time-delay and its
application to backing up control of a truck-trailer. International Journal of Fuzzy Systems,
(2007), vol.9, pp.14-21.
5. Riid A., Ketola J., Rustern E., Fuzzy knowledge-based control for backing multi-trailer
systems. Proceedings IEEE Intelligent Vehicles Symposium, (2007), pp.498-504.
6. Riid A., Rustern E. Fuzzy logic in control: Track backer-upper problem revisited. Proceed-
ings 10th IEEE International Conference on Fuzzy Systems (2001); vol.1: pp.513-516.
7. Yang X., Yuan J., Yu F. Backing up a truck and trailer using variable universe based fuzzy
controller. Proceedings IEEE International Conference on Mechatronics and Automation
(2006); pp. 734-739.
Sensing Inertial and Continuously-Changing
World Features
Abstract Knowledge and causality play an essential role in the attempt to achieve commonsense reasoning in cognitive robotics. As agents usually operate in dynamic and uncertain environments, they need to acquire information through sensing both inertial aspects, such as the state of a door, and continuously changing aspects, such as the location of a moving object. In this paper, we extend an Event Calculus-based knowledge framework with a method for sensing world features of different types in a uniform manner that is transparent to the agent. The approach results in the modeling of agents that remember and forget, a cognitive skill particularly suitable for the implementation of real-world applications.
1 Introduction
Patkos, T. and Plexousakis, D., 2009, in IFIP International Federation for Information Processing,
Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I.,
Bramer, M.; (Boston: Springer), pp. 379–388.
significance of reasoning about what agents know or do not know about the current
world state and how this knowledge evolves over time has been acknowledged as
highly critical for real-life implementations [1, 11].
In previous work we have developed a unified formal framework for reasoning
about knowledge, action and time within the Event Calculus for dynamic and un-
certain environments [9]. The current paper extends the above framework with an
account of knowledge-producing actions for both inertial and continuously chang-
ing world features in a uniform manner. For instance, a moving robot may be able to
sense parameters, such as the state of doors, the number of persons around it or its
current position. Knowledge about door states can be preserved in its memory until
some relevant event changes it, while knowledge about the other features should be
considered invalid after a few moments or even at the next time instant. Still, it is important that the act of sensing treats the different contingencies in a fashion that is transparent to the robot.
The contribution of this study is of both theoretical and practical interest. The
proposed approach allows sensing of inertial and continuously changing properties
of dynamic and uncertain domains in a uniform style, providing a level of abstrac-
tion to the design of an agent’s cognitive behavior. Moreover, it results in the devel-
opment of agents that are able not only to remember, but also to forget information,
either to preserve consistency between the actual state of the world and the view
they maintain in their KB or due to restrictions, such as limited resources, which
pose critical constraints when considering real-life scenarios. Finally, the approach
is based on a computationally feasible formal framework.
The paper proceeds as follows. We first provide an overview of relevant ap-
proaches and background material about the Event Calculus and the knowledge
theory. In Section 3 we describe the sensing methodology for inertial relations and
in Section 4 we explain how this scheme can also support continuously changing
aspects. Section 5 illustrates its application on more complex domains. The paper
concludes with remarks in Section 6.
Scherl and Levesque [12] developed a theory of action and sensing within the Sit-
uation Calculus, providing a solution to the frame problem by adapting the standard
possible worlds specification of epistemic logic to action theories, an approach first
proposed by Moore [7]. The significance of an explicit representation of time has
been acknowledged in [11], where the formalism has been extended with a treatment
of concurrent actions and temporal knowledge. Working on the Fluent Calculus,
Thielscher [15] provided a solution to the inferential frame problem for knowledge,
along with an elaborate introduction of the notion of ability for an agent to achieve a
goal. Nevertheless, in both frameworks, once knowledge is acquired it is preserved
persistently in memory; it can be modified by relevant actions, but is never lost.
The action language Ak [5] permits information to be retracted from the set of facts
known by an agent, as a result of actions that affect the world non-deterministically.
Still, sensing continuously changing world aspects is not considered.
Moreover, these frameworks are computationally problematic, due to their de-
pendence on the possible worlds model. Many recent approaches adopt alternative
representations of knowledge that permit tractable reasoning in less expressive do-
mains. Petrick and Levesque [10], for instance, define a combined action theory in
the Situation Calculus for expressing knowledge of first-order formulae, based on
the notion of knowledge fluents presented in [2] that treated knowledge change at
a syntactical level. To achieve efficient reasoning, knowledge of disjunctions is as-
sumed decomposable into knowledge of the individual components. Still, sensing is
again limited to inertial parameters and knowledge is never retracted.
Our approach is based on a knowledge theory that treats knowledge fluents in a
style similar to [10], which uses the Event Calculus as the underlying formalism.
The Event Calculus is a widely adopted formalism for reasoning about action and
change. It is a first-order calculus, which uses events to indicate changes in the en-
vironment and fluents to denote any time-varying property. Time is explicitly repre-
sented and reified in the language propositions. The formalism applies the principle
of inertia to solve the frame problem, which captures the property that things tend
to persist over time unless affected by some event; when released from inertia, a
fluent may have a fluctuating truth value at each time instant (we further elucidate
these concepts in successor sections). It also uses circumscription [4] to support non-
monotonic reasoning. The Event Calculus defines predicates for expressing which fluents hold at which timepoints (HoldsAt), what events happen (Happens), what their effects are (Initiates, Terminates, Releases) and whether a fluent is subject to the law of inertia or released from it (ReleasedAt).
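As a hedged illustration (this axiom is not part of the excerpted text), an effect axiom for the door of the running example could be written as

Initiates(CloseDoor(d), Closed(d), t)

Together with an event occurrence Happens(CloseDoor(D1), 3), the discrete Event Calculus axioms then yield HoldsAt(Closed(D1), t) for every t > 3, until some event terminates or releases Closed(D1).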
A number of different dialects for the Event Calculus have been proposed sum-
marized in [13] and [6]. For our proposed knowledge theory we have employed and
extended the discrete time axiomatization (DEC), thoroughly described in [8]. The
knowledge theory can be applied to domains involving incomplete knowledge about
the initial state, knowledge-producing actions, actions that cause loss of knowl-
edge and actions with context-dependent effects. The axiomatization is restricted
to reasoning about fluent literals, assuming that disjunctive knowledge can be bro-
ken apart into the individual components. To simplify the presentation, we assume
perfect sensors for the agent, i.e., the result of sensing is always correct.
Fig. 1 The DEC Knowledge Theory axiomatization, where e denotes an arbitrary event, fpos, fneg, frel are positive, negative and non-deterministic effects respectively, and fi, fj, fk are the corresponding effect's preconditions.
As a running example for the rest of the paper, we assume a robot named Rob
either standing or moving with constant velocity v and able to sense the state of
doors, the number of persons near him and its current location, represented by
fluents Closed(door), PersonsNear(robot, num) and Position(robot, pos), respec-
tively. Initially, Rob is unaware of any relevant information, i.e., ¬∃d HoldsAt(Kw(Closed(d)), 0), ¬∃n HoldsAt(Kw(PersonsNear(Rob, n)), 0) and ¬∃p HoldsAt(Kw(Position(Rob, p)), 0).
Happens(sense(f), t) ⇒ Happens(remember(f), t) ∧ Happens(forget(f), t + T(f))   (KT4.1)
Most current logic-based approaches that study the interaction of knowledge and
time focus on sensing and obtaining knowledge about inertial fluents. Still, this is
hardly the case when reasoning in dynamically changing worlds. Next, we show
how the aforementioned approach can also be applied to broader classes of situa-
tions. First, we elaborate on the characteristics of such situations.
In addition to inertial fluents there are also fluents that change their truth value in
an arbitrary fashion at each time instant. The number of persons entering a building
or the mails arriving daily at a mailbox are typical examples. Such fluents are usually
applied in order to introduce uncertainty, as they give rise to several possible models,
and can be defined as follows:
Definition 2. A fluent is called non-inertial if its truth value may change at each
timepoint, regardless of occurring events.
A non-inertial fluent is always released from inertia and is represented in the Event
Calculus by the predicate ReleasedAt( f ,t). A particular use for non-inertial fluents
has been proposed by Shanahan [13] as random value generators in problems, such
as tossing a coin, rolling a dice etc, naming them determining fluents, as they deter-
mine non-deterministically the value of other world aspects.
For the purposes of epistemic reasoning, sensing non-inertial fluents provides
temporal knowledge that is only valid for one time unit, i.e., it only reflects what is
known at the time of sensing, but not what will be known afterwards. Whenever Rob
needs to reason about the number of persons around him, it must necessarily perform
a new sense action to acquire this information; any previously obtained knowledge
may not reflect the current situation. Still, there is a class of non-inertial fluents that
is far more interesting, because it expresses continuous change that follows a well-
defined pattern. Such fluents are utilized to denote gradual change (or processes,
according to Thielscher [16]), for instance to represent the height of a falling object,
the position of a moving robot, the patience of a waiting person etc. We call this
class of fluents functional non-inertial fluents:
Definition 3. A non-inertial fluent is called functional if its value changes gradually
over time, following a predefined function.
In order to represent gradual change in the Event Calculus, we first need to re-
lease the involved fluent from inertia, so that its value is allowed to fluctuate, and
then we can apply a state constraint to restrain the fluctuation, so that the fluent can
exhibit a functional behavior. For example, to express the change in Rob’s location
(on a single axis) while moving with constant velocity, we apply the following state
constraint (SC) concerning the Position(robot, pos) fluent:

HoldsAt(Position(Rob, pos), t1) ∧ t > 0 ⇒ HoldsAt(Position(Rob, pos + (v ∗ t)), t1 + t)   (SC)

Knowledge about a constraint of this form is propagated by the knowledge distribution axiom of the theory:

HoldsAt(Knows(f1 ⇒ f2), t) ⇒ (HoldsAt(Knows(f1), t) ⇒ HoldsAt(Knows(f2), t))   (K)
Example 1 The previous discussion illustrates how the problem of sensing the two
non-inertial fluents PersonsNear(robot, num) and Position(robot, pos) can be ad-
dressed. Imagine that Rob performs Happens(sense(PersonsNear(Rob, num)), 0)
and Happens(sense(Position(Rob, pos)), 0) at timepoint 0. By forming the parallel
circumscription of the example's domain theory (no initial knowledge and the two event occurrences) along with the Event Calculus, Knowledge Theory and uniqueness-of-names axioms, we can prove several propositions. First, two events internal to the robot will be triggered for each fluent: a remember event at timepoint 0 and a forget event at timepoint 1. For the PersonsNear fluent it can also be proved that due to (KT4.2) and (KT2) at timepoint 0 and (KT4.3), (KT2) and (KT7) at timepoint 1. The case is different for the robot's position:
for all t > 0. This holds true because, although the forget action results in ¬HoldsAt(KPw(Position(Rob, pos)), t) for t ≥ 1, axiom (K) transforms (SC) into

HoldsAt(Knows(Position(Rob, pos)), t1) ∧ t > 0 ⇒ HoldsAt(Knows(Position(Rob, pos + (v ∗ t))), t1 + t)
Consequently, once Rob senses his position at some timepoint, he can infer future
positions, without the need to perform further sense actions. The state constraint
provides all future derivations, affecting knowledge through (KT7).
5 Context-dependent Inertia
We can now formalize complex domains that capture our commonsense knowledge
of changing worlds, where fluents behave in an inertial or non-inertial manner ac-
cording to context. For instance, a robot’s location is regarded as a continuously
changing entity only while the robot is moving; when it stands still, the location is
subject to inertia. As a result only while the robot knows that it is not moving can
knowledge about its location be stored persistently in its KB in the style described
in Section 3. In general, for any fluent that presents such dual behavior, there usually
is some other fluent (or conjunction of fluents) that regulates its compliance to the
law of inertia at each time instant. For the Position(robot, pos) fluent, for instance,
there can be a Moving(robot) fluent that determines the robot’s motion state. Such
regulatory fluents appear in the body of state constraints to ensure that inconsistency
does not arise when inertia is restored. According to their truth state, the fluent that
they regulate can either be subject to inertia and maintain its value or released from
it in order to be subject to a state constraint. To integrate regulatory fluents in the
theory, (KT4.3) must be extended to ensure that the agent does not forget a fluent
when it knows that the latter is inertial and should be kept in the KB:
where frglr is f's regulatory fluent. Notice that even when the agent is not aware of f's inertia state, i.e., ¬HoldsAt(Kw(frglr), t), the axiom fires and knowledge about f is lost, to avoid preserving knowledge that does not reflect the actual state.
Example 2 Imagine that Rob’s movement is controlled by actions Start(robot) and
Stop(robot) with effect axioms:
¬Happens(sense(Position(Rob, pos)), t) ∧ ¬HoldsAt(Knows(¬Moving(Rob)), t) ⇒ Terminates(forget(Position(Rob, pos)), KPw(Position(Rob, pos)), t)   (5.6)
In brief, (5.6) states that the position should be stored if Rob knows that he is not
moving. If, on the other hand, the robot does not possess such knowledge (even
if he is unaware of his current mobility state, due to a potential malfunction), the
information acquired will be retracted one time instant after the sense action. In this
case, future knowledge can be inferred only if some state constraint is available.
Moreover, it can be proved that if Rob knows initially whether he is moving,
a single sense action is sufficient to provide knowledge about all future locations,
regardless of any narrative of Start and Stop actions before or after sensing. And,
most importantly, Rob does not need to consider his current state when sensing; the
knowledge theory abstracts the reasoning process of determining knowledge evolu-
tion, regardless of whether the sensed fluent is inertial or continuously changing.
6 Conclusions
Acknowledgements The authors wish to thank Dr. Nick Bassiliades for stimulating comments
and interesting discussions.
References
1. Chittaro, L., Montanari, A.: Temporal Representation and Reasoning in Artificial Intelligence:
Issues and Approaches. Annals of Mathematics and Artificial Intelligence 28(1-4), 47–106
(2000)
2. Demolombe, R., Pozos-Parra, M.: A simple and tractable extension of situation calculus to
epistemic logic. 12th International Symposium on Methodologies for Intelligent Systems
(ISMIS-00) pp. 515–524
3. Fritz, C., Baier, J.A., McIlraith, S.A.: ConGolog, Sin Trans: Compiling ConGolog into Ba-
sic Action Theories for Planning and Beyond. In: Proceedings International Conference on
Principles of Knowledge Representation and Reasoning (KR), pp. 600–610. Australia (2008)
4. Lifschitz, V.: Circumscription. Handbook of Logic in Artificial Intelligence and Logic Pro-
gramming 3, 297–352 (1994)
5. Lobo, J., Mendez, G., Taylor, S.R.: Knowledge and the Action Description Language A. The-
ory and Practice of Logic Programming (TPLP) 1(2), 129–184 (2001)
6. Miller, R., Shanahan, M.: Some Alternative Formulations of the Event Calculus. In: Compu-
tational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski,
Part II, pp. 452–490. Springer-Verlag, London, UK (2002)
7. Moore, R.C.: A formal theory of knowledge and action. In: Formal Theories of the Common-
sense World, pp. 319–358. J. Hobbs, R. Moore (Eds.) (1985)
8. Mueller, E.: Commonsense Reasoning, 1st edn. Morgan Kaufmann (2006)
9. Patkos, T., Plexousakis, D.: A Theory of Action, Knowledge and Time in the Event Calculus.
In: SETN ’08: Proceedings 5th Hellenic Conference on Artificial Intelligence, pp. 226–238.
Springer-Verlag, Berlin, Heidelberg (2008)
10. Petrick, R.P.A., Levesque, H.J.: Knowledge Equivalence in Combined Action Theories. In:
KR, pp. 303–314 (2002)
11. Scherl, R.: Reasoning about the Interaction of Knowledge, Time and Concurrent Actions in the
Situation Calculus. In: Proceedings 18th International Conference on Artificial Intelligence
(IJCAI), pp. 1091–1098 (2003)
12. Scherl, R.B., Levesque, H.J.: Knowledge, Action, and the Frame Problem. Artificial Intelli-
gence 144(1-2), 1–39 (2003)
13. Shanahan, M.: The Event Calculus Explained. Artificial Intelligence Today 1600, 409–430
(1999)
14. Shanahan, M., Witkowski, M.: High-Level Robot Control through Logic. In: ATAL ’00: Pro-
ceedings 7th International Workshop on Intelligent Agents VII. Agent Theories Architectures
and Languages, pp. 104–121. Springer-Verlag, London, UK (2001)
15. Thielscher, M.: Representing the Knowledge of a Robot. In: A. Cohn, F. Giunchiglia, B. Sel-
man (eds.) Proceedings International Conference on Principles of Knowledge Representation
and Reasoning (KR), pp. 109–120. Morgan Kaufmann, Breckenridge, CO (2000)
16. Thielscher, M.: The Concurrent, Continuous Fluent Calculus. Studia Logica 67(3), 315–331
(2001)
17. Thielscher, M.: FLUX: A Logic Programming Method for Reasoning Agents. Theory and
Practice of Logic Programming 5(4–5), 533–565 (2005)
MobiAct: Supporting Personalized Interaction
with Mobile Context-aware Applications
1 Introduction
People need to access information and resources where and when they are per-
forming their activities to achieve the multitude of daily tasks in a satisfactory
manner. Today mobile devices are used widely to provide access to information
and services associated with tangible objects through various technologies (RFID
tags, two dimensional optical codes, Bluetooth etc.). As a result, an increasing
number of context aware applications exist in public environments which provide
various services and information to the public when and where they are needed.
Many institutions that have responsibility for places of public interest (libraries [1], museums [2], showrooms [5], schools [4], supermarkets [6] etc.) introduce in their environments extensions for mobile applications to harness the new potential that technology brings. The physical space owned by these institutions is gradually enriched by and interwoven with a digital information space. Users (visitors, clients, readers etc.) need to interact with both spaces to fully benefit from all the
Stoica, A. and Avouris, N., 2009, in IFIP International Federation for Information Processing, Volume
296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 389–397.
services offered to them, and consequently they need devices that can link the physical and digital spaces.
Personalization and adaptation of these services to the users are very relevant in
this context, however a number of issues mostly related to security and privacy [7]
need to be addressed.
Mobile applications can exploit user profiles as valuable information that can dramatically improve their quality and relevance. User profiles are based on user traces, logs, user selections (small surveys completed, search terms etc.) and the frequency of use of various features. While this information can be used to improve the quality and the efficiency of the services, there is a risk that it will be used against the user. People are concerned about privacy: they wish to control their own data and they do not like to feel that they are being followed. Usually users want the benefits of an adaptive system that gives them relevant feedback but, on the other hand, they do not like the system gathering data about them.
The system design must take into account that systems and applications must
support and facilitate the tasks and the activities of their users. Users must feel that
they are in control of the system and they should fully understand the benefits and
the trade-offs of using it.
The user profile can be shared across several applications by using an external user modelling server that allows interoperability [3]. This user profile, which is composed of the traces that the user leaves in the visited places, is enriched and updated according to the selected security and privacy options.
Middleware for context-aware applications has been developed to allow fast and consistent deployment of such applications. Existing middleware approaches tackle issues like context management [8,14,15], privacy and security [11], collaboration and social interaction [10], data sharing [9], service discovery and so on.
For context-aware mobile applications to step out of the labs and be widely used by the general public, solutions must be found for the consistent management of service providers and consumers, for user profiles, and for the heterogeneity of mobile device platforms.
The key concepts of our MobiAct framework are: hybrid space, physical hyper-
link, hybrid space interaction device, the context and the dynamic service binding.
Hybrid Space is the space obtained from the fusion of physical and virtual spaces that are interwoven. The need to access the virtual space comes from the actual tasks the users have to perform in a certain physical space. While knowledge of the physical space can be accessed using perceptual mechanisms, the virtual space requires a mobile device to fetch and present information in a form suitable for human processing.
Almost any physical artefact has associated information/services from different providers. The relevance of the information depends to some extent on the relation that the provider has with the physical space where the artefact is found – e.g. ownership. The physical-world artefacts and the virtual space items are linked together by an unequal correspondence (e.g. a very small physical object can have many digital objects linked to it).
To instantiate the link between the physical and virtual artefacts, we need means of interacting with the real world [12,13]. From the need for this kind of hybrid interaction, the concept of physical hyperlink, or object hyperlinking, emerged.
A Physical Hyperlink is a means to connect a physical artefact with the digital information/services associated with it; it uniquely identifies the physical artefact or a class of equivalent artefacts. A physical hyperlink can be implemented using several modes:
• Human readable visual cue (numbers, letters)
• Machine readable visual cue (2D, 1D barcodes)
• RF tag (RFID, Bluetooth)
The different implementations of physical hyperlinks have various aspects that match the characteristics of different mobile devices. Considering this issue, we propose that a standardized multi-modal approach be adopted for implementing physical hyperlinks. Including a human-readable visual cue will ensure mass accessibility, given that most devices support text entry. In figure 1 the book's physical hyperlink includes modalities for barcode scanners, camera-equipped devices and human visual perception.
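Conceptually, the different modalities are just alternative identifiers resolving to the same artefact record; the registry and identifiers in the Python sketch below are hypothetical and serve only to illustrate the idea:

```python
from typing import Optional

# One artefact, three equivalent physical-hyperlink modalities (invented values).
ARTEFACT_REGISTRY = {
    ("numeric", "4521"):              "urn:library:book:4521",   # human readable cue
    ("ean13",   "9789604561234"):     "urn:library:book:4521",   # machine readable barcode
    ("rfid",    "04:A2:19:7C:66:80"): "urn:library:book:4521",   # RF tag identifier
}

def resolve_hyperlink(modality: str, value: str) -> Optional[str]:
    """Map a captured physical-hyperlink value to the artefact it identifies."""
    return ARTEFACT_REGISTRY.get((modality, value))

# A text-entry phone and a camera-equipped phone reach the same digital artefact.
assert resolve_hyperlink("numeric", "4521") == resolve_hyperlink("ean13", "9789604561234")
```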
3 MobiAct Architecture
Based on the MobiAct framework we have designed an architecture that takes into account, among other things, roaming among contexts and spaces, personalization, anonymity, privacy and security.
People move every day through a succession of public, semi-public and private spaces according to their goals and activities. The identity of the people should sometimes be known, to grant them access to certain spaces, while at other times it should be hidden, to protect user privacy.
The relevance of accessing mobile services highly depends on the user's goals and the nature of the tasks she is performing. The utility a mobile service provides to a user depends on the user's interests, goals and lifestyle.
Figure 2 depicts the proposed MobiAct architecture. The user is in a hybrid-enabled space and she performs a certain task. In the physical space, physical artefacts are present which allow user interactions through physical hyperlinks.
The main entities in our architecture are: user, user agency, user agent, physical
space administrator/owner, semantics of space service, broker.
Figure 2 focuses on a certain physical space area at a certain moment. The left side presents the physical space, while the right side presents the virtual space elements; the two are separated by a dashed line for the sake of clarity. The entities "User Agency" and "Broker" are represented as single instances for the simplicity of the schema. A different user could use another agency, or she could use several agencies. In the same way, there can be several brokers competing with one another on the quality of the services they provide. However, at a certain moment of the interaction, the user is associated with one "User Agency" and deals with one "Broker".
The user is represented in the virtual space by a "User Agent" that mediates interaction with other entities in the virtual space. The user initiates interaction with an artefact and a typical sequence of actions takes place. Let us examine the scenario of interacting with the hybrid space.
There are several phases of interaction. First there is the initiation phase, where the user starts the interaction with an artefact. The user agent has to select a Broker suitable for the space, making use of the user's previous interaction trails, in order to propose a set of available service providers to the user.
After selecting the desired service provider, the interaction advances to the second phase, service consumption. During her interaction with a service provider the user might change her goals or she might realize that this service is not what she needed. As a result she might go back to the initiation phase.
We have identified the following typical sequence of events during the initial phase of interaction with a hybrid space:
• The user utilizes her mobile device to interact with the physical hyperlink of an object (physical artefact);
• The device software connects with the user agent in order to request services for the selected object – there are two possibilities: 1) the user connects by means of an independent network, such as a 3G provider, or 2) the user connects using a network provided by the space administrator/owner.
• The user agent selects a relevant broker (by means of a directory service) and issues a request for service providers, based on the user profile, the actual task performed and possibly on information from the semantics of space provider.
• The broker then issues a request to the available service providers registered for that space. Based on the offers from the service providers, it supplies the user agent with a set of service providers.
• The user agent utilizes the user profile and information from the semantics of space provider to filter the results and to send them to the user.
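The initiation phase described above can be summarized, under heavy simplification, by the following Python sketch; every function, threshold and data value is a hypothetical stand-in for the corresponding MobiAct entity:

```python
# Hypothetical, simplified rendering of the initiation phase described above.
def select_broker(artefact, trails):
    # Directory-service lookup stub: pick a broker registered for the space.
    return "broker_museum"

def request_providers(broker, artefact, profile, task):
    # Broker stub: providers that registered offers for this artefact's space.
    return [{"provider": "museum_info", "rating": 0.9},
            {"provider": "gift_shop",   "rating": 0.4}]

def filter_and_rank(offers, profile):
    # User-agent stub: keep providers above the user's relevance threshold.
    return sorted((o for o in offers if o["rating"] >= profile["min_rating"]),
                  key=lambda o: o["rating"], reverse=True)

profile = {"min_rating": 0.5, "language": "en"}
artefact = "urn:museum:exhibit:17"
broker = select_broker(artefact, trails=[])
offers = request_providers(broker, artefact, profile, task="visit")
print(filter_and_rank(offers, profile))   # [{'provider': 'museum_info', 'rating': 0.9}]
```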
In MobiAct there are two levels of personalization and adaptation: intra-space and across spaces. The key element is the "User Agency", a trusted entity which the user herself selects. The "User Agency" provides the user with a ubiquitously accessible profile and also allows anonymous use of certain services.
There are three types of profiles that play a role in the MobiAct architecture: user profiles, hybrid space profiles and service provider profiles. During user interaction with a hybrid space, trails of interaction are generated that enrich these profiles. Besides interaction trails, profiles contain identity, an interest ontology, generic user information (e.g. language(s), age, sex etc.) and privacy preferences for the user, as well as rankings for hybrid spaces, content and service providers.
At the intra-space level, the user experience is improved within a certain hybrid space by filtering and ranking content and service providers according to the user profile, community-produced ratings, providers' and content's metadata and the collective profile. Both the user profile and the providers' metadata are built incrementally.
The relations between the service provider and the physical space the user is immersed in influence the degree of relevance – e.g. in a museum, the information provided by the museum's service provider (information about exhibits) should take precedence over that of other service providers.
At the across-spaces level, the architecture uses collective profiles and statistical methods to filter and rank content and service providers in new spaces, based on trails from other spaces that are richer in trails and have a more complete profile. The user agency has access to a multitude of user profiles in a multitude of spaces.
The adaptation of the services to the users is done by combining the personal model and the collective model with different weights, according to the richness of the personal user model in the specific space.
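One plausible reading of this combination rule is a convex mixture whose weight on the personal model grows with the richness of that model in the specific space; the formula and constant below are illustrative assumptions, not the exact scheme used in MobiAct:

```python
def adapted_score(personal_score, collective_score, n_trails, saturation=20):
    """Blend personal and collective relevance scores; the weight on the
    personal model grows with the number of interaction trails the user
    has left in this space (illustrative saturation constant)."""
    w = min(n_trails / saturation, 1.0)          # richness of the personal model
    return w * personal_score + (1.0 - w) * collective_score

# A newcomer relies mostly on the collective model; a frequent visitor on her own.
print(adapted_score(0.9, 0.4, n_trails=2))       # 0.45
print(adapted_score(0.9, 0.4, n_trails=40))      # 0.9
```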
In figure 3 a structured representation of the information in the user profile database is shown. When a user visits a new space, the system can examine the specific context to find popular artefacts, information or activities. Also, to match interests across spaces, it can examine the interests already depicted in other contexts and match them against the other users that also have interests defined for our current user's new context.

Fig. 3. Simplified ERD of user profile
The sum of the trails of interaction over time builds up the history of the user, of the space and of the services. Each interaction generates trails on the side of every participating entity, and using these historical data each entity can improve its performance. The visits of the various users in a certain space build up the specific space model. This model has a twofold use: on the one hand it provides a basis for a collective statistical user model for new users, and on the other hand it provides information for restructuring both the physical and the virtual space.
5 Conclusion
Acknowledgments Special thanks are due to the Hybrid Libraries Project, funded under the
PENED 2003 Program, Grant number 03ED791 by the General Secretariat of Research and
Technology for financial support and to the Project of Supporting visitors of Solomos Museum,
funded by the Museum of Solomos and Prominent Zakynthians, under the Information Society
Program, for providing us with application requirements for personalization and adaptation of the
developed services.
References
1. Aittola M, Ryhänen T & Ojala T (2003) SmartLibrary - Location-aware mobile library ser-
vice. Proc. 5th International Symposium on Human Computer Interaction with Mobile De-
vices and Services, Udine, Italy, 411-416.
2. Exploratorium (2005), Electronic Guidebook forum Report, San Francisco, Available at:
www.exploratorium.edu/guidebook/eguides_forum2005.pdf Accessed: April 2006
3. Heckmann D., Schwartz T., Brandherm B., Schmitz M., and von Wilamowitz-
Moellendorff M., GUMO – The General User Model Ontology, L. Ardissono, P. Brna, and
A. Mitrovic (Eds.): UM 2005, LNAI 3538, pp. 428-432, 2005.
4. Liang J.-K., Liu T.-C., Wang H.-Y., Chang B., Deng Y.-C., Yang J.-C., Chouz C.-Y., Ko
H.-W., Yang S., & Chan T.-W., A few design perspectives on one-on-one digital class-
room environment, Journal of Computer Assisted Learning 21, pp181-189
5. mEXRESS Project Homepage, available at http://mexpress.intranet.gr/index.htm, last accessed October 2006
6. Roussos G., Koukara L., Kourouthanasis P, Tuominen J.O., Seppala O., Giaglis G. and
Frissaer J., 2002, "A case study in pervasive retail", ACM MOBICOM WMC02, pp. 90-94.
7. SWAMI Project (2006), Dark scenarios in ambient intelligence: Highlighting risks and
vulnerabilities, Deliverable D2, Available: http://swami.jrc.es/pages/documents/SWAMI_D2_scenarios_Final_ESvf_003.pdf Accessed: April 2006
8. Dey, A.K., Salber, D., Abowd, G.: A Conceptual Framework and a Toolkit for Supporting
the Rapid Prototyping of Context-Aware Applications. Human-Computer Interaction 16
(2001) 97-166
9. Boulkenafed M, Issarny V, A Middleware Service for Mobile Ad Hoc Data Sharing, En-
hancing Data Availability, Middleware 2003 (2003), pp. 493-511.
10. Kern S, Braun P, Rossak W, MobiSoft: An Agent-Based Middleware for Social-Mobile
Applications, On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops
(2006), pp. 984-993.
11. Heckmann D, Ubiquitous User Modeling, Vol. 297, Dissertations in Artificial Intelligence,
IOS Press, Amsterdam, NL, 2006
12. E. Rukzio, M. Paolucci, T. Finin, P. Wisner, T. Payne (Eds.) , Proceedings Workshop Mo-
bile Interaction with the Real World (MIRW 2006), available at http://www.hcilab.org/events/mirw2006/pdf/mirw2006_proceedings.pdf, retrieved on 11th of April 2008
13. G. Broll, A. De Luca, E. Rukzio, C. Noda, P. Wisner (Eds.) , Proceedings Workshop Mo-
bile Interaction with the Real World (MIRW 2007), available at http://www.medien.ifi.lmu.de/pubdb/publications/pub/broll2007mirwmguidesTR/broll2007mirwmguidesTR.pdf, retrieved on 11th of April 2008
14. Riva O, Contory: A Middleware for the Provisioning of Context Information on Smart
Phones, Middleware 2006 (2006), pp. 219-239.
15. Zimmermann A., Specht M. and Lorenz A., Personalization and Context Management,
User Modeling and User-Adapted Interaction Special Issue on User Modeling in Ubiqui-
tous Computing, Vol. 15, No. 3-4. (August 2005), pp. 275-302.
Defining a Task’s Temporal Domain for
Intelligent Calendar Applications
Anastasios Alexiadis and Ioannis Refanidis
Abstract Intelligent calendar assistants have long attracted researchers from the areas of scheduling, machine learning and human-computer interaction. However, all efforts have concentrated on automating the meeting scheduling process, leaving personal tasks to be decided manually by the user. Recently, an
attempt to automate scheduling personal tasks within an electronic calendar appli-
cation resulted in the deployment of a system called SELFPLANNER. The system
allows the user to define tasks with duration, temporal domain and other attributes,
and then automatically accommodates them within her schedule by employing
constraint satisfaction algorithms. Both at the design phase and while using the
system, it has been made clear that the main bottleneck in its use is the definition
of a task’s temporal domain. To alleviate this problem, a new approach based on a
combination of template application and manual editing has been designed. This
paper presents the design choices underlying temporal domain definition in
SELFPLANNER and some computational problems that we had to deal with.
1 Introduction
attempt to put them on the user’s calendar [2], SELFPLANNER puts them into the
user’s calendar, taking into account several types of constraints and preferences.
Perhaps the most important attribute of a task is its temporal domain, i.e. when
the task can be executed. However, defining a temporal domain is also the most
cumbersome part of a task’s definition. Having recognized that from the system’s
design phase, we devised several mechanisms to facilitate domain definition. They
are mainly based on a combination of template application and manual editing. So,
this paper concentrates on the domain definition issue both from a human-
computer interaction and from an algorithmic point of view.
The rest of the paper is structured as follows: Section 2 highlights the key fea-
tures of the SELFPLANNER application. Section 3 presents the internal representa-
tion of task domains in SELFPLANNER, whereas Section 4 discusses the algo-
rithmic issues incurred by the way domains are represented. Finally, Section 5
concludes the paper and poses future research directions.
2 SELFPLANNER Overview
3 Domain Representation
From a theoretical point of view, the temporal domain of a task consists of a set of
intervals. For practical reasons we consider only integer domains. In the context of
the application, a unit corresponds to the quantum of time that, as in most elec-
tronic calendars, is 30 minutes. For reasons of clarity, in the following we will use
a notation of the form 〈DD/MM/YY HH:MM〉 to denote time points. Depending
on the context, several parts of the time stamp will be omitted or altered.
Using intervals to represent domains is however problematic both from a com-
putational and from a user’s experience point of view. Suppose for example that a
user wants to schedule a task of 2 hours duration and this task has to be performed during office hours next week. Assuming a 5-day working week, this might result in five intervals of the form, say:
[〈27/10/08 09:00〉, 〈27/10/08 17:00〉] … [〈31/10/08 09:00〉, 〈31/10/08 17:00〉]
Imagine now what happens if the same task has a deadline after a month or a year: storing and retrieving the domain of this task would be a time- and space-consuming process. Even worse, having the user define this domain would be an inhibitory factor to using the system at all. To overcome these deficiencies, we chose to avoid using the interval representation for temporal domains.
A template is a pattern with specific duration and with no absolute time reference.
SELFPLANNER supports three types of templates: Daily, Weekly and Monthly.
Each template consists of a set of intervals covering the entire pattern’s period and
denoting which slots are allowed for a task’s execution (the remaining are not).
For example, a daily template of lunch hours would comprise a single interval, say
[〈12:30〉, 〈15:00〉]. Similarly, a weekly template of office hours would comprise
five intervals of the form [〈Mo 09:00〉, 〈Mo 17:00〉] to [〈Fri 09:00〉, 〈Fri 17:00〉].
Note that the daily template’s interval does not have any day reference, whereas
the weekly template’s intervals have a relative reference of the week’s day.
Templates can be used to define domains. The simplest way is to combine a
template with a release date and a deadline. However, to increase flexibility in
domain definition through templates, we distinguish four different ways in apply-
ing a template. These are the following:
• Add included: the time slots identified by the template are added to the task’s temporal domain.
• Remove excluded: the time slots not identified by the template are removed from the task’s temporal domain.
• Add excluded: the time slots not identified by the template are added to the task’s temporal domain.
• Remove included: the time slots identified by the template are removed from the task’s temporal domain.
As an example, consider again the daily lunch hours template, named Lunch. If we want to include lunch hours in a task’s domain, we apply Lunch in the Add included mode. If we want to say that a task is to be executed only during lunch hours, we apply Lunch in the Remove excluded mode. If we want to exclude lunch hours from a task’s domain, we apply Lunch in the Remove included mode.
4 Computational Issues
Using lists of domain actions to represent temporal domains gives rise to interesting computational problems, such as deciding whether a time slot is included in the domain or not, transforming the domain into the traditional representation as a list of intervals or, finally, simplifying the list of domain actions. These issues are treated in the following subsections.
Knowing whether a time slot is included in a task’s temporal domain or not is im-
portant, among others, when graphically displaying parts of the domain on the
screen. The following algorithm answers this question:
Algorithm 1. GetTimeSlotStatus
Inputs: A domain represented by a list of domain actions and a time slot T.
Output: Either of the included or excluded values.
1. If T is before the release date or after the deadline, return excluded.
2. Let D be the last domain action. If no such action exists, then D is NULL.
3. While D≠NULL
a. If D adds T, return included.
b. If D removes T, return excluded.
c. Let D be the previous domain action. If no such action exists, D is NULL.
4. Return included.
Algorithm GetTimeSlotStatus is very fast. Indeed, suppose that the action list has N entries and each template has at most M intervals; then the worst-case complexity is O(N·M), with N and M usually taking small values.
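A compact Python sketch of this check is given below. For simplicity we assume that each domain action has already been expanded into the concrete slots it adds and removes (a pair of integer sets), which abstracts away the template machinery of the actual system:

def get_time_slot_status(actions, slot, release, deadline):
    """Sketch of Algorithm 1 under the pre-expanded-action assumption.
    'actions' is the ordered domain-action list; each action is a pair
    (added_slots, removed_slots) of integer slot sets."""
    if slot < release or slot > deadline:          # step 1
        return "excluded"
    for adds, removes in reversed(actions):        # steps 2-3: last action first
        if slot in adds:
            return "included"
        if slot in removes:
            return "excluded"
    return "included"                              # step 4: untouched slots

# Office hours (slots 2..5) added first, lunch slot 3 removed by a later action.
actions = [({2, 3, 4, 5}, set()), (set(), {3})]
print(get_time_slot_status(actions, 3, release=0, deadline=10))   # 'excluded'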
Algorithm 2. GetIntervals
Inputs: A domain represented by a list of domain actions.
Output: A list of intervals.
1. Let A be a table of integers, whose size equals the number of time slots between the domain’s release date and the deadline. Let S be this size. Initialize A with zeroes. Let C=0.
2. Let D be the last domain action. If no such action exists, then D is NULL.
3. While D≠NULL and C<S.
a. For each time slot T added by D
i. If A[T] is 0, then set A[T] to 1 and increase C by 1.
b. For each time slot T removed by D
i. If A[T] is 0, then set A[T] to -1 and increase C by 1.
c. Let D be the previous domain action. If no such action exists, D is NULL.
4. If C<S
a. For each time slot T such that A[T]=0, set A[T]=1.
5. Create a list of intervals by joining consecutive time slots having A[T]≥0.
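Under the same simplifying assumption (pre-expanded add/remove slot sets per action, slots being integer offsets from the release date), Algorithm 2 can be sketched as follows:

def get_intervals(actions, num_slots):
    """Sketch of Algorithm 2 (GetIntervals): each action is a pair
    (added_slots, removed_slots) of integer slot sets."""
    A = [0] * num_slots                      # 0 = undecided, 1 = in, -1 = out
    decided = 0
    for adds, removes in reversed(actions):  # step 3: most recent action first
        if decided >= num_slots:
            break
        for t in adds:
            if 0 <= t < num_slots and A[t] == 0:
                A[t], decided = 1, decided + 1
        for t in removes:
            if 0 <= t < num_slots and A[t] == 0:
                A[t], decided = -1, decided + 1
    A = [1 if a == 0 else a for a in A]      # step 4: untouched slots included
    intervals, start = [], None              # step 5: join consecutive slots
    for t, a in enumerate(A):
        if a > 0 and start is None:
            start = t
        elif a < 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, num_slots - 1))
    return intervals

# Office hours (slots 2..5) added, lunch slot 3 removed by a later action.
print(get_intervals([({2, 3, 4, 5}, set()), (set(), {3})], num_slots=8))
# -> [(0, 2), (4, 7)]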
There are cases where several domain actions can be removed from the list without any change in the resulting domain. For example, suppose that a domain action applying the OfficeHours template exists in the list which both adds to the domain all the template’s included intervals and, at the same time, removes all the excluded ones. Furthermore, this domain action is not temporally constrained, so it covers the entire domain. In this case, any domain action occurring before this one would not have any effect on the domain and thus could be safely removed.
Detecting domain actions that can be safely removed from the action list requires a simple change in Algorithm 2. In particular, in step 3 we should check, for each domain action D, whether the domain action has increased C or not. If it has not, the domain action does not affect the temporal domain and can be removed. However, from an application point of view, this removal should (and does) not occur without prior confirmation from the user, since the user might intend to remove or modify some of the subsequent domain actions, which could result in ineffective domain actions becoming effective.
We are also working on defining and integrating a task ontology, where each class of tasks has its own default, partially defined temporal domain. To conclude, we believe that intelligent calendar assistants will play a significant role in organizing our lives in the future, but a crucial factor for their adoption is their usability. The work presented in this paper is, we hope, a step in that direction.
References
Abstract In the scientific literature, it is generally assumed that models can be completely established before the diagnosis analysis. However, in actual maintenance problems, such models are difficult to obtain in one step: it is indeed difficult to formalize a whole complex system. Usually, understanding, modelling and diagnosis are interactive processes in which systems are partially depicted and some parts are refined step by step. Therefore, a diagnosis analysis that manages different abstraction levels and partly modelled components would be relevant to actual needs. This paper proposes a diagnosis tool managing different modelling abstraction levels and partly depicted systems.
1 Introduction
2 Problem statement
For instance, in the previous example, mode4 ∨ mode5 ∨ mode2 and mode3 are
the monomials of the m-proposition.
The concept of partial behavioral abstraction can then be introduced.
In actual physical systems, fault propagation models the fact that a fault (or failure) mode of an item induces fault modes of other items. Fault propagation is usually represented by a logical implication, e.g. mode(itemi) → mode′(itemj). To take fault propagations into account, logical implications are transformed into logical disjunctions: a logical implication A → B is equivalent to ¬A ∨ B, hence mode(itemi) → mode′(itemj) is equivalent to ¬mode(itemi) ∨ mode′(itemj).
Let us summarize the elements that can appear in the statement of a complete diagnostic problem:
1. the list of items and possible modes for each item.
2. the partial behavioral abstractions inferred from expert’s knowledge.
3. the modes implied in inconsistent tests, modelled by disjunctive m-propositions.
4. the fault propagations, modelled by disjunctive m-propositions.
Let us now detail the diagnosis process based on interactive decompositions (top-down method). It is an interactive process between a diagnosis tool (a machine) and an expert. The diagnosis process begins when a malfunction is detected. Fault isolation usually starts with the tests that check the global function of a system. At each interaction, the expert performs tests, collects new data and continues the process. According to the monotony principle, the diagnosis tool provides more and more detailed diagnoses as new results arrive. Step by step, it locates the subsystems or components which are in a faulty mode. This diagnosis process is depicted in figure 1.
Note that the solving process is the same at each interaction. Let us focus now on what happens between two interactions. The diagnosis process between two interactions can be decomposed into two parts. The first one, called transformation, transforms the expert problem with partial behavioral abstractions into a solvable problem. The second one is based on an MHS-Tree algorithm which computes and provides diagnoses from the solvable problem.
3.1 Transformation
During the transformation step, the initial knowledge about the system (symptoms, decomposition model and fault propagations) can be transformed into an m-proposition by:
1. introducing a complementary fault mode for each known item;
2. introducing virtual complementary items in order to transform partial behavioral abstractions into complete behavioral abstractions, formalizing all the implications from the conjunction of child modes to each parent mode in order to compute the corresponding equivalent m-propositions;
3. transforming logical implications coming from fault propagation into disjunctive propositions (see 2.3);
4. replacing the abstract modes by their equivalent m-propositions for points (3) and (4) in section 2.4;
5. developing the m-propositions into a conjunctive normal form and splitting the resulting proposition into a set of monomials.
Finally, after these transformations, the diagnosis problem to be solved may be formulated as an m-proposition whose monomials are provided to the solving algorithm to compute diagnoses.
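As an illustration of steps 3 and 5, the following sketch uses the sympy library to turn a fault-propagation implication and a symptom into conjunctive normal form and to split the result into clauses; the mode names are hypothetical, and the real transformation operates on m-propositions rather than plain Boolean symbols:

from sympy import symbols
from sympy.logic.boolalg import And, Implies, Not, to_cnf

# Hypothetical modes: a short-circuit of item i propagates a fault to item j.
short_i, fault_j, ok_j = symbols('short_i fault_j ok_j')

propagation = Implies(short_i, fault_j)   # fault propagation (see 2.3)
symptom = Not(ok_j)                       # an inconsistent test involving item j

# Implications become disjunctions; the whole problem is developed into a
# conjunctive normal form whose clauses play the role of the monomials
# handed to the solving algorithm.
problem = to_cnf(And(propagation, symptom))
clauses = problem.args if isinstance(problem, And) else (problem,)
for clause in clauses:
    print(clause)   # prints the clauses, e.g. ~ok_j and fault_j | ~short_i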
When items contain multiple modes, the standard HS-tree algorithm (a tree whose nodes are hitting sets [8]) may lead to diagnoses that contain several behavioral modes of the same item. However, such diagnoses are impossible because an item may be in only one mode at a time.
In addition to standard HS-tree approaches, the multi-mode context has to be taken into account. It is not a new problem: some solving approaches have for instance been proposed in [5, 9]. Based on ATMS [4], a model of faults is integrated in GDE+ [9] to analyse whether the faultiness of the components would really explain the observations. In the multi-mode context, Sherlock [5] was developed from GDE to compute automatically conflict sets and diagnostic hypotheses; it focuses reasoning on the most probable candidates first in an attempt to control the combinatorics. Without the constraint propagation technique, an HS-Tree-based algorithm [8] is preferred in this section to manage multiple modes. The path from a node to the root node of an HS-Tree clearly shows all the elements involved in a temporary diagnostic result during the construction of the HS-Tree. It is then easy to avoid the presence of two or more modes of an item in a diagnostic result. Moreover, in comparison with the original HS-Tree algorithm, which is based on a set of conflicts, MHS-Tree is extended to compute hitting sets over a set of disjunctive propositions. Each disjunctive proposition can correspond either to an inconsistent test or to a transformed fault propagation.
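The following sketch illustrates the constraint handled by MHS-Tree, namely computing minimal hitting sets over disjunctive propositions while rejecting candidates that contain two modes of the same item; it uses naive enumeration instead of the actual tree construction, and the item and mode names are only illustrative:

from itertools import product

def minimal_hitting_sets(propositions):
    """Brute-force sketch of the property enforced by MHS-Tree: minimal
    hitting sets over disjunctive propositions, discarding any candidate
    that assigns two different modes to the same item. Each proposition is
    a set of (item, mode) pairs."""
    candidates = set()
    for choice in product(*[sorted(p) for p in propositions]):
        hs = frozenset(choice)
        items = [item for item, _ in hs]
        if len(items) != len(set(items)):     # two modes of the same item
            continue
        candidates.add(hs)
    return [set(c) for c in candidates        # keep only minimal hitting sets
            if not any(other < c for other in candidates)]

# Two inconsistent tests expressed as disjunctions over (item, mode) pairs.
tests = [{('EPR', 'cfm'), ('ESS', 'cfm'), ('SD', 'cfm')},
         {('EPR', 'cfm'), ('SP', 'cfm'), ('IC', 'cfm')}]
for diagnosis in minimal_hitting_sets(tests):
    print(sorted(diagnosis))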
In order to keep the reasoning sound, a consistent test is not taken into account to compute diagnoses unless it is fully checked. However, the results of normal consistent tests are useful for ranking the diagnoses. In [7], an approach based on a distance between theoretical and effective signatures has been proposed. Here, it is extended to the multi-mode context.
Let T = (ti) be an ordered list of tests and M = (mi) be a set of fault modes; the theoretical signature of M in T is denoted by σT(M). In its definition, ∏mode(ti) corresponds to the set of modes involved in the test ti, whereas its complement ∏̄mode(ti) corresponds to the union of the complementary modes of each mode involved in the test ti:
∏̄mode(ti) = ⋃ { Modes(I) \ {m(I)} : m(I) ∈ ∏mode(ti) }
Let T = (ti) be an ordered list of tests. At a given instant, the effective signature in T, denoted by σ*T, is given by:
∀i, (σ*T)i = 1 ↔ ti is inconsistent, and (σ*T)i = 0 ↔ ti is consistent   (2)
The coincidence measurement attempts to measure the similarity between the effective signature and the theoretical signature of a diagnosis [7]. Let T = (ti) be an ordered list of tests and D = (di) be a set of diagnoses. The coincidence measurement is given by:
4 Application example
In order to illustrate how the proposed approach fits iterative diagnosis with consecutive decompositions, let us consider a faulty car studied by a car mechanic. Firstly, the car mechanic notes that the car does not start up. At this step, the resulting symptom, which is also a trivial diagnosis, is cfm(car). It is very general and does not point to the next step: almost every failure is possible. Implicitly, the possible modes for the car are:
Then, the expert turns on the key to test whether the starting drive is operating: it
corresponds to a new test. Since he hears the starting drive cranking, he infers from
test 1 that:
∃OBS/ok(EPR) ∧ ok(ESS) ∧ ok(SD) (6)
The consistency test can be used to sort the diagnoses using the coincidence
measurement. The observed symptoms are now:
cfm(car) (7)
∃OBS/ok(EPR) ∧ ok(ESS) ∧ ok(SD) (8)
Expression (8) means that there exists at least one observation such that the test given by (6) is consistent.
The problem is fully defined by (4), (5), (6), (7) and (8). Let us transform this problem into a solvable problem. In order to obtain a complete behavioral abstraction, complementary fault modes and a virtual item are introduced. The virtual item is named VI1 = car \ {EPR, ESS, SD}. The new transformed set of modes coming from (4) is:
Using the MHS-tree algorithm, the diagnosis of the transformed problem can be
computed. It leads to:
Diagnoses can now be sorted. A signature table (Table 1) can be obtained from (6), (7) and (8); for test T1 the theoretical signature entries are 1, 1, 1 and 0.
cfm(VI1) (15)
¬ok(EPR) ∨ ¬ok(SP) ∨ ¬ok(IC) (16)
The new problem to be solved is given by (4), (5), (6), (7), (8), (13), (14), (15) and (16). The problem is transformed by adding a virtual item VI2 = VI1 \ {SP, IC}, which is equal to car \ {EPR, ESS, SD, SP, IC}.
The new transformed set of modes is given by (4), (9) and:
{(SP) = (ok, cfm); (IC) = (ok, cfm); (VI2) = (ok, cfm)} (17)
Using the MHS-tree algorithm, the diagnosis of the transformed problem can be
computed:
{cfm(EPR)}; {cfm(SP)}; {cfm(IC)} (20)
From (6), (7) and (8), a signature table is obtained; for test T1 the entries are 1, 1, 1, 0, 0, 0 and for test T2 they are 1, 0, 0, 1, 1, 0.
5 Conclusion
References
1 Introduction
Log sequences consist of log entries. A log entry e is a triplet (t, E, s) consisting of
a time t, an event type E and a source s. Frequent episodes are collections of event
types occurring frequently within a given time w in the entries of a log sequence [6].
The concept of frequent episodes is a derivative of frequent sets [1]. Frequent
sets are sets of items that frequently occur together in the records of a database.
The APRIORI algorithm for mining frequent sets [1] can be modified to compute
frequent unordered episodes [7]. In this paper, we use unordered episodes.
To mine frequent episodes, we divide the log entry sequence into consecutive
non-overlapping time windows of maximal width w. In addition, we require that as
soon as a log entry with an event type equal to some event type already included in
the window is encountered again, the current window is terminated and a new win-
dow started. Thus, each event type can occur only once within each time window.
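One possible reading of this windowing rule is sketched below; the exact boundary handling of the original system is not specified here, so the width test should be taken as an assumption:

def split_windows(entries, w):
    """Split a time-ordered list of (time, event_type, source) log entries
    into consecutive, non-overlapping windows of maximal width w, closing a
    window early whenever an event type would occur in it a second time."""
    windows, current, seen, start = [], [], set(), None
    for t, event_type, source in entries:
        if current and (t - start > w or event_type in seen):
            windows.append(current)
            current, seen, start = [], set(), None
        if start is None:
            start = t
        current.append((t, event_type, source))
        seen.add(event_type)
    if current:
        windows.append(current)
    return windows

log = [(0, 'A', 's1'), (10, 'B', 's1'), (20, 'A', 's2'), (4000, 'C', 's1')]
print(len(split_windows(log, w=3600)))   # 3 windows: {A, B}, {A}, {C}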
We use a set of closed frequent episodes instead of the set of all frequent episodes. The closure of an episode is its largest super-episode that shares the same frequency, and a closed episode is a frequent episode that is equal to its closure. The set of all closed episodes effectively encodes information about all frequent episodes and can be used to simplify processing without losing information about the occurrences of the frequent episodes. Closed frequent episodes are a derivative of so-called closed frequent sets [8, 2].
In the following, we present algorithms that can be used for identifying anomalies
in frequent episodes in a set of analysed log data. The aim of the algorithms is to
identify new or modified patterns from a set of analysed data.
Let E be the set {E1 , E2 , ..., En } of all possible event types Ei that appear in the
log data. We denote with C the set of all closed frequent episodes that appear in
the analysed log L, i.e. C = { f ⊆ E | f is a closed frequent episode in L}. For each
for all f ∈ C do
  for all p ∈ C s.t. f ⊂ p do
    if f.freq − p.freq ≤ Δf then
      A ← A ∪ {(p, f)}
return A
The algorithm has one parameter, Δf: the maximum allowed difference between the frequencies of the sub- and super-episode for them to be considered an anomalous pair.
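A transcription of algorithm 1 into Python could look as follows; the encoding of episodes as frozensets of event types mapped to their frequencies is our own choice:

def anomalous_pairs(closed_episodes, delta_f):
    """Sketch of algorithm 1: collect (super-episode, sub-episode) pairs
    whose frequencies differ by at most delta_f. closed_episodes maps each
    closed frequent episode (a frozenset of event types) to its frequency."""
    A = set()
    for f, f_freq in closed_episodes.items():
        for p, p_freq in closed_episodes.items():
            if f < p and f_freq - p_freq <= delta_f:
                A.add((p, f))
    return A

C = {frozenset('AB'): 40, frozenset('ABC'): 39, frozenset('AD'): 30}
print(anomalous_pairs(C, delta_f=1))
# {(frozenset({'A', 'B', 'C'}), frozenset({'A', 'B'}))}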
The output of algorithm 1 can be used for analysing the input log and identifying those windows that are anomalous. Algorithm 2 below marks as anomalous those windows of the log data which match any of the episode pairs in the set A of anomalous episode pairs.
repeat
  w ← getFirstWindow(L)
  L ← L \ w
  if ∃(p, f) ∈ A s.t. f ⊂ w ∧ p ⊄ w then
    w.anomalybody ← f
    w.anomalymissing ← p \ f
    W ← W ∪ {w}
until L = ∅
return W
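A corresponding sketch of algorithm 2, with each window reduced to the set of event types it contains (a simplification of the (t, E, s) triplets defined above):

def mark_anomalous_windows(windows, pairs):
    """Sketch of algorithm 2: a window is anomalous if it contains the event
    types of a sub-episode f but not all event types of its super-episode p.
    Here each window is simply the set of event types it contains."""
    marked = []
    for w in windows:
        for p, f in pairs:
            if f <= w and not p <= w:
                marked.append({'window': w, 'body': f, 'missing': p - f})
                break
    return marked

pairs = {(frozenset('ABC'), frozenset('AB'))}
windows = [frozenset('ABX'), frozenset('ABC'), frozenset('XY')]
print(mark_anomalous_windows(windows, pairs))   # only the first window, missing {'C'}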
We denote with P the episode profile that has been calculated based on a history H of log data. P constitutes the model of normal behaviour and it contains all closed frequent episodes f that appear in H, i.e. P = { f ⊆ E | f is a closed frequent episode in H}.
The algorithm compares the profile episodes in P with the closed frequent
episodes C found from the analysed log data L. Such novel frequent episodes are
potentially interesting because they may indicate completely new types of activity
in the log data.
N ← C \ P
for all n ∈ N do
  if ∃p ∈ P s.t. n ⊂ p then
    N ← N \ {n}
return N
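And a sketch of algorithm 3, again over frozensets of event types:

def novel_episodes(C, P):
    """Sketch of algorithm 3: keep the closed frequent episodes of the
    analysed log that are neither in the profile P nor sub-episodes of
    some profile episode."""
    N = set(C) - set(P)
    return {n for n in N if not any(n < p for p in P)}

profile = {frozenset('ABC'), frozenset('DE')}
current = {frozenset('AB'), frozenset('DF'), frozenset('DE')}
print(novel_episodes(current, profile))   # {frozenset({'D', 'F'})}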
3 Tests
We tested the anomaly detection algorithms on several types of logs obtained from
a telecommunications network, covering a continuous period of 42 days. The data
were divided into data sequences and closed frequent episodes were mined for each
data sequence, using a frequency threshold of 5 occurrences and limiting the maxi-
mum window length for episodes to 3600 seconds.
We wanted to know how large a fraction of the input data would be considered anomalous by our methods. We first measured the relation between the log entries marked as anomalous and the total number of log entries present in the analysed log. For each data sequence, we calculated the number of log entries in the windows W covered by the set of anomalous episodes obtained from algorithm 2, and divided it by the total number of log entries in the sequence.
Figure 1 shows the results of our tests on data from the application log and the
system log of the network management system. The fraction of anomalous log en-
tries for the system log varies between ca. 5% and 10%, whereas the fraction of
anomalous log entries stays below 2% for the application log. One can see that with
the exception of a few observations, the measures maintain the same order of mag-
nitude within the same log type.
The second property we measured is the number of novel episode patterns detected from the analysed log data. Here, the profile contained the closed frequent episodes that occurred in the five days preceding the analysed data sequence. We then executed algorithm 3 on the analysed data sequence and counted the number of novel frequent episodes. The counts are shown in Figure 2.
Fig. 1 Fraction of daily log entries marked as anomalous by algorithm 2. The number of daily
entries in the application log was between 3284 and 5357 (average 5097). Reported anomalies
varied between 0 and 31 (average 9) for ∆ f = 1 and between 6 and 64 (average 28) for ∆ f = 3. The
system log contained 223 to 546 entries (average 280), for which 0 to 30 (average 14) anomalies
were reported for ∆ f = 1 and 0 to 75 (average 25) anomalies for ∆ f = 3.
Fig. 2 Daily amounts of new episode patterns in the application and system logs
4 Discussion
Figure 1 shows a clear difference between the analysed log types. The application
log contains a large amount of routinely recorded event records. The percentage of
anomalous entries remains low due to the large overall record mass of the appli-
cation log. The system log on the other hand monitors the operation of the basic
system components and records any errors and deviations occurring in the system.
The amount of log entries is smaller and the relative likelihood of error occurrences
higher.
The novel episode pattern measure in Figure 2 shows that the appearance of
entirely new episodes is rather exceptional for both shown log types. The number of
reported daily novel episodes remains so small that they could be easily inspected
on a daily basis by a human monitoring officer.
The results suggest that algorithms 1 and 2 can be used to filter out log entries that deviate from the usual frequent behaviour. Such pre-filtering would enable an expert system or even human analysts to focus the subsequent analysis on log entries that are known to be anomalous with regard to the bulk of the data. The filtering seems to be more effective (> 98% in our application log example) for log types with higher entry volumes, where obviously abnormal activities do not dominate the data set. However, significant filtering efficiency can also be achieved for log types showing more volatile behaviour (ca. 90–95% in our system log example).
5 Summary
References
1. R. Agrawal et al. Fast discovery of association rules. In U.M. Fayyad et al., editors, Adv. in
knowl. discovery and data mining, pages 307 – 328. AAAI, Menlo Park, CA, USA, 1996.
2. J. Boulicaut and A. Bykowski. Frequent closures as a concise representation for binary data
mining. In Proc. PAKDD’00, volume 1805 of LNAI, pages 62–73, Kyoto, Japan, April 2000.
Springer.
3. S. Forrest et al. Self-nonself discrimination in a computer. In Proc. of the 1994 IEEE Symp. on
Research in Security and Privacy, Los Alamos, CA, pages 202–212. IEEE Computer Society
Press, 1994.
4. C. Ko et al. Execution monitoring of security-critical programs in distributed systems: a
specification-based approach. 1997 IEEE Symp. on Security and Privacy, 00:175–187, 1997.
5. T. Lane and C.E. Brodley. Sequence matching and learning in anomaly detection for computer
security. In AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, pages
43–49, July 1997.
6. H. Mannila et al. Discovering frequent episodes in sequences. In Proc. of the First Int. Conf. on
Knowledge Discovery and Data Mining (KDD’95), pages 210–215, Montreal, Canada, August
1995. AAAI Press.
7. H. Mannila and H. Toivonen. Discovering generalized episodes using minimal occurrences. In
E. Simoudis et al., editors, Proc. of the Second Int. Conf. on Knowledge Discovery and Data
Mining (KDD’96), pages 146–151, Portland, Oregon, August 1996. AAAI Press.
8. N. Pasquier et al. Discovering frequent closed itemsets for association rules. LNCS, 1540:398–
416, 1999.
Semi-tacit Adaptation of Intelligent
Environments
1The Institute of Information Technology, Ulm University, Ulm, 89081 Germany (phone:
0049-731-50-26265; fax: 0049-731-50-26259; e-mail: [email protected])
2The Hellenic Open University and DAISy research unit at the Computer Technology
Institute, both in Patras, Hellas
3The Computational Intelligence Centre, Department of Computing and Electronic Systems,
University of Essex, Wivenhoe Park, Colchester, CO43SQ, UK
4The National Center for Scientific Research (LIMSI-CNRS) BP 133, 91403, Orsay cedex,
France and the Paris-South University
Abstract This paper presents a semi-tacit adaptation system for implementing and
configuring a new generation of intelligent environments referred to as adaptive
ambient ecologies. These are highly distributed systems, which require new ways
of communication and collaboration to support the realization of people’s tasks.
Semi-tacit adaptation is based on a mixed initiative approach in human-system
dialogue management and is supported by three types of intelligent agents: Fuzzy
Task Agent, Planning Agent and Interaction Agent. These agents use an ontology
as a common repository of knowledge and information about the services and state
of the ambient ecology.
1 Introduction
or some other may cease functioning. While successful execution of tasks will de-
pend on the quality of interactions among artefacts and among people and arte-
facts, it is important that task execution remains possible despite changes in the ambient ecology. Thus, the realization of mechanisms that achieve adaptation of the system to changing context is necessary. In this paper, we present an adaptation
mechanism that uses specialized intelligent agents (for task, plan and interaction
adaptation) and a common repository of ecology knowledge and information in
the form of an ontology, which is formed by matching the meta-data and self-
descriptions of the members of the ambient ecology. More specifically, we shall
focus on a mixed initiative approach in human-system dialogue management that
we call semi-tacit adaptation.
The remainder of this paper is structured as follows. Section 2 presents a sce-
nario illustrating our concepts. Section 3 gives an overview of the basic modelling
of an AE and presents the applications necessary to realize semi-tacit adaptation.
The paper concludes in Section 4.
In this section, we shall present a scenario based on the imaginary life of a user
(Suki) who just moved to a home that is characterized by being intelligent and
adaptive. The scenario will help to illustrate the concepts presented in the paper.
To reference the different parts of the scenario in the other sections we use
SP1…SPX as text marks.
Suki has been living in this new adaptive home for the past 10 months. Suki’s
living room has embedded in the walls and ceiling a number of sensors reading in-
side temperature and brightness; some more sensors of these types are embedded
in the outside wall of the house. A touch screen mounted near the room entrance is
used as the main control point. Suki uses an air-conditioning unit as the main heating/cooling device. The windows are equipped with automated blinds, which can be
turned to dim or brighten the room. For the same purpose Suki can use the two
lamps hanging from the ceiling. Finally, Suki has brought some hi-tech devices in
the room: a digital flat screen TV set and a 9.1 sound system.
Suki’s goal is to feel comfortable in his living room, no matter what the season
or the outside weather conditions are. After careful thinking, he concluded that for
him comfort involved the adjustment of temperature and brightness, the selection
of his favourite TV channel and the adjustment of volume level, depending on the
programme (SP1). Regarding the latter, the smart home system had observed
Suki’s choices over the past months and has drawn the conclusion that he tends to
increase the volume when music or English speaking movies are shown, except
when it’s late at night; he keeps the volume low when movies have subtitles, or
when guests are around (SP2). Nevertheless, the system does not have enough
data to deduce Suki’s favourite lighting and temperature conditions as the seasons
change. Initially, the system will combine information in Suki’s personal profile,
the environmental conditions, the weather forecast and anything else that may
matter, to tacitly adapt to the values that Suki might want. In case of a doubt, it
will engage in dialogue with Suki about specific conditions. Of course, Suki can
always set directly the values he desires by manipulating the devices that affect
them; the system will monitor such activity and tacitly will adjust its rules.
For the past few days, as the weather has grown warmer, Suki has gone into a
spring time mood; the system in his smart home has read the changing context and
is trying to adapt, by decreasing the time that the heating system is on and by leav-
ing the windows open for longer time intervals, during the sunny days (SP3). Suki
still thinks that the living room is too warm and instructs the house to lower the
temperature even further (SP4); the system, noticing that the day is sunny (it is
early afternoon) and no rain or wind is foreseen for today, asks if Suki would pre-
fer to open the windows, as well. Suki agrees, so the system opens the windows
and lowers the thermostat only slightly. At the same time, it decreases slightly the
volume of the TV set, as it may disturb the neighbours.
An hour later, Joan, Suki’s friend arrives; she and Suki have arranged to go to a
concert in the evening. Already the temperature has fallen and Suki asks the sys-
tem to close the windows and lower the blinds; as a consequence, the system turns
on the room lights, but Suki immediately switches them off. The system is puzzled
and asks Suki if he wants some light or no light at all (SP5). Suki turns on a floor
lamp that he bought only yesterday, but didn’t have the chance to use until now
(SP6). The system registers the new source of light and then asks Suki if this will
be a new permanent brightness level, but Suki declines. After a while, Suki and
Joan leave for the concert; the system shuts down all light and sound sources, but
maintains the temperature until Suki’s return.
An AE offers both physical properties and digital services and acts as a container
for “activity spheres” (AS, see Fig. 1) [3]. An AS consists of passive entities (sen-
sors, actuators, users, services, devices, etc.) and active entities namely Fuzzy
Task Agent (FTA), Interaction Agent (IA) and Planning Agent (PA). It is inten-
tionally created by an actor (human or agent) to support the realization of a spe-
cific goal. The sphere is deployed over an AE and uses its resources (artefacts,
networks, services). The goal is described as a set of interrelated tasks; the sphere
contains models of these tasks and their interaction. An AS is considered as a dis-
tributed yet integrated system that is formed on demand to support people’s activi-
ties. It is adaptive in the sense that it can be instantiated within different environ-
ments and adaptively pursue its goals. An AS is realized as a composition of
configurations between the artefacts and the provided services into the AE.
The configuration and the adaptation of a sphere can be realized in three ways: explicit, tacit and semi-tacit. In the explicit mode, people configure spheres by explicitly composing artefact affordances, based on the visualized descriptions
of the artefact properties, capabilities and services [4]. In a highly dynamic system, such as an AS, explicit configuration is useful only for setting up the initial values or for modelling a default profile of the AS (see Fig. 2), because the huge number of interactions involved would impose a heavy cognitive load on the user, should he become aware of all of them. The tacit mode operates completely transparently to the user and is based on the system observing the user’s interactions with the sphere and actions within the sphere.
Agents in the intelligent environment can monitor user actions and record, store
and process information about them [5], [6]. The sphere can learn user preferences
and adapt to them, as it can adapt to the configuration of any new AE that the user
enters. Tacit adaptation may achieve the opposite effect than the one it aims for, as
people might feel that they have lost control over system’s operation, which will
appear incomprehensible and untrustworthy.
The semi-tacit mode realizes a third way of configuring and adapting the sphere by combining the explicit and the tacit modes. The user interacts with the system, for example, using speech or screen-based dialogues. The user does not have to explicitly indicate the resources to be used to program task models; instead, he can provide only basic information regarding his goals and objectives. The system, at the same time, attempts to tacitly resolve abstract tasks into concrete tasks and realize them, while monitoring the user’s actions with the resources involved. In case of rules that cannot be resolved, the system pro-actively engages in an adaptive dialogue with the user.
The system uses the sphere ontology and three types of agents: the Planning Agent (PA), the Fuzzy Task Agent (FTA) and the Interaction Agent (IA). Each AS is composed of heterogeneous artefacts, each of which contains local descriptions of its services, which can be regarded as independent ontologies of varying complexity. To achieve efficient communication between the artefacts in the context of a task, we propose the application of ontology matching, to make the ontologies of the interacting artefacts semantically interoperable. Ontology matching is the process of finding relationships or correspondences between entities of different ontologies.
This set of correspondences is called an alignment [7]. So, by applying alignment algorithms to the local ontologies of the AE members, the user profile and the local agent ontologies, the sphere ontology is formed, which at each given moment represents the collective knowledge required to realize a given task, as well as the state of the AS that supports this realization. Consequently, any adaptation mechanism needs to access the sphere ontology, not only for obtaining access to the correct AE resources, but also for obtaining the linguistic descriptions of the resources’ state.
To realize semi-tacit adaptation, and thanks to the descriptive power of speech, we assume that spoken dialogue interaction is well suited for asking the user for further information to enhance planning or, if necessary, for resolving conflicts that may occur by negotiating with the user. One part of the IA is a speech dialogue manager (SDM) that receives problem descriptions from the PA and the FTA whenever semi-tacit adaptation is needed. The SDM then tries to generate dialogues to react adequately.
The PA’s main task is to find out which tasks must be realized to support the user’s activity and how they can be combined with the resources of the AE. During initialisation, the PA can use predefined rules and default domain models described in the task model. This initial information can either be pre-defined explicitly or provided in a semi-tacit way by utilising the IA to generate a (spoken) dialogue that retrieves general information from the user to personalize, for example, a default profile (see Fig. 2 and SP1). Then the system will form the sphere ontology by aligning the local ontologies of the members of the ecology, the agents and the user profile. But planning must also be done during the lifetime of the AS: the task model can be affected when conflicts arise (Suki behaves contrary to the system’s conclusions – SP5) or when a new device enters the ecology (SP6). In the first case it is usually not possible for the system to act completely tacitly, so it tries to involve the user in a dialogue for conflict resolution. The latter case can normally be handled tacitly, by re-aligning the sphere ontology.
The FTA will start to build an initial fuzzy-logic-based model of Suki’s preferences and behaviour to realise the given tasks of maintaining the temperature, light levels and entertainment systems at the desired levels (SP2). To build the initial model, the system will collect timestamped data (i.e. containing the related time and date) of the environment status (given temperature, light levels, user context (which room, activity, etc.), the weather forecast, etc.) together with the user actions for such an environment status (air-conditioning settings, blind settings, entertainment system settings, etc.). From the collected data, the FTA learns the fuzzy logic membership functions and rules needed to build the fuzzy logic model of Suki’s preferences for realising the given task. The system then begins to adapt the generated fuzzy models over short time intervals to account for any environment or user behaviour changes. Over the long term, the fuzzy logic system will need to be adapted to Suki’s change of preferences associated with seasonal variation (SP3). The FTA will then adapt its fuzzy logic system’s rules and membership functions to accommodate the encountered uncertainties, for which the FTA will employ type-2 fuzzy logic systems [6]. The FTA acts in a completely tacit way, similarly to the PA, as long as there are no conflicts or situations where the user is not satisfied with the adaptation (SP4).
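To make the idea of such a learned preference model concrete, the following deliberately simplified sketch evaluates two invented rules with a type-1, zero-order Takagi-Sugeno scheme; the actual FTA learns its membership functions and rules from the collected data and employs type-2 fuzzy sets:

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Two invented rules standing in for what the FTA might have learned:
#   IF the room is warm THEN cooling power 30 %; IF it is hot THEN 80 %.
rules = [
    (lambda t: tri(t, 22.0, 26.0, 30.0), 0.30),   # "warm"
    (lambda t: tri(t, 27.0, 32.0, 38.0), 0.80),   # "hot"
]

def cooling_power(temperature):
    """Zero-order Takagi-Sugeno inference: firing-strength-weighted average."""
    firing = [(mu(temperature), out) for mu, out in rules]
    total = sum(f for f, _ in firing)
    return sum(f * out for f, out in firing) / total if total else 0.0

print(round(cooling_power(28.5), 2))   # about 0.52 with these made-up numbers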
4 Conclusions
Acknowledgement The research leading to these results has received funding from the European
Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n°
216837 as part of the ATRACO Project (www.atraco.org).
References
Suphot Chunwiphat†, Patrick Reignier‡ and Augustin Lux‡
† Department of Electronic Engineering Technology, College of Industrial Technology, King Mongkut’s University of Technology North Bangkok, 1518 Pibulsongkram Road, Bangsue, Bangkok 10800, Thailand
‡ LIG – PRIMA – INRIA Rhône-Alpes, 655 avenue de l’Europe, Montbonnot, 38334 Saint Ismier cedex, France
[email protected], {Patrick.Reignier, Augustin.Lux}@inrialpes.fr
Abstract This paper focuses on the problem of human activity representation and
automatic recognition. We first describe an approach for human activity represen-
tation. We define the concepts of roles, relations, situations and temporal graph of
situations (the context model). This context model is transformed into a Fuzzy
Petri Net which naturally expresses the smooth changes of activity states from one
state to another with gradual and continuous membership functions. Afterward, we
present an algorithm for recognizing human activities observed in a scene. The
recognition algorithm is a hierarchical fusion model based on fuzzy measures and
fuzzy integrals. The fusion process nonlinearly combines events, produced by an
activity representation model, based on an assumption that all occurred events
support the appearance of a modeled scenario. The goal is to determine, from an
observed sequence, the confidence factor that each modeled scenario (predefined
in a library) is indeed describing this sequence. We have successfully evaluated
our approach on the video sequences taken from the European CAVIAR project1.
1 Introduction
As one of the most active research areas in computer vision, human activity analysis is currently receiving great interest in the research community. This is due to its promising applications in many areas such as visual surveillance, human-machine interaction, content-based image storage and retrieval, video conferencing, etc. One of the major problems in such systems is how the system can
produce the high-level semantic interpretation of human activities from the low-
level numerical pixel data.
This paper focuses on the representation and recognition of human activities for
a generic human activity interpretation system. We propose formalism for context
aware observation to describe and model human activities. Then the activity mod-
els will be transformed into a graphical model, a fuzzy Petri net, for analyzing the
activities in a mathematical way. Finally, we present a hierarchical fusion model,
based on fuzzy measures, for recognizing the human activities.
The rest of this paper is organized as follows. Related work is discussed in Sec-
tion 2. Section 3 describes architecture for human activity modeling. Section 4
presents our technique for representing human activities. A hierarchical fusion
model for activity recognition is proposed in Section 5. Section 6 presents experi-
mental results. Section 7 summarises the paper and discusses future work.
2 Related Work
The core component of our architecture is a situation model [2]. This framework organizes the observation of interaction using a hierarchy of concepts: scenario (context), situation, role and relation. A scenario is a composition of situations that share the same set of roles and relations. A role is an agent that performs a certain action, while a relation describes a connection among the objects that play the roles in the situation. Thus, situations are a form of state defined over observations.
As an example, we consider a simple video, called “Browsing”, from the
CAVIAR project. The situation model for this video is described by an occurrence
of three situations and the related roles and relations as follows:
Situation 1 (s1): A person walks toward an information desk.
Role: Walker (anyone walking in the scene)
Relation: Toward (Walker heading toward an information desk)
Relation: Close (small distance between the walker and the information desk)
Relation: Slow (speed)
Situation 2 (s2): The person stops to read some information at the information desk.
Role: Browser (anyone immobile)
Relation: Toward (direction heading toward an information desk)
Relation: Very close (distance being very close to the information desk)
Relation: Very slow (speed)
Situation 3 (s3): The immobile person starts to walk away from the information desk.
Role: Walker (anyone walking in the scene)
Relation: Away (direction heading away from the information desk)
Relation: Close (distance being close to the information desk)
Relation: Slow (speed)
In the next section, we propose to transform the situation model into a Petri net.
As described above, the transition firing in Petri Nets is the instantaneous change
from one state to another. However, in real world situations, the change of human
activities from one state to another is not binary. In the next section, a method for
relaxing this binary character of the transition firing in Petri nets is presented.
4 Fuzzy Petri Nets Representation
Let us consider a fuzzy set A in Fig. 2(a). This fuzzy set can be interpreted as a condition, called a fuzzy condition, used to describe the concept of “the sensor value is close to 7”. When the fuzzy condition is assigned to a transition in a Petri net, the transition firing will be defined by a duration according to the membership function. The duration of a transition firing can be described by the support of the fuzzy set, as shown in Fig. 2(a). In Fig. 2(b), the occurrence of an event “x” on the transition t1 is associated to the fuzzy condition represented by the fuzzy set A. The firing of t1 will begin as soon as the support of the condition is reached and it terminates when the event has crossed this support completely. During this firing, we will consider that the token is on both the input and the output place. The functions on the input and output places will proceed simultaneously: the two corresponding situations are simultaneously active.
Fig. 2. (a) A Concept “the sensor value is close to 7”. (b) An Occurred event “x” at t1 .
To represent a situation model with a Fuzzy Petri Net, situations are represented by places and transitions by fuzzy conditions. Situations are defined with roles and relations. A role is an acceptance test; we have used a Support Vector Machines approach to automatically learn the persons’ roles from the objects’ properties (we have used the LIBSVM library from Chang and Lin, whose software is available at http://www.csie.ntu.edu.tw/~cjlin/libsvm). A relation is a predicate on entities selected by roles.
In Fig. 3, places s1, s2 and s3 represent three situations of the scenario “Browsing” (see Sect. 3). The signs → represent the changes of relation truth values from one situation to another. The fuzzy condition function that describes, for example, the change of speed between s1 and s2 can be constructed by creating a link between
“Slow” and “Very slow” of the speed relation, and “Close” and “Very close” of the distance relation. The transition occurs when the two membership functions are true simultaneously: “Slow” AND “Very slow”. The logical connective AND can be implemented by the intersection operator (∩) of fuzzy sets. Examples of membership functions representing the speed relations “Very slow” and “Slow” in situations s2 and s1 are shown in Fig. 4(a) and (b). The shaded area shown in Fig. 4(c) represents the transition between “Slow” AND “Very slow”.
Fig. 4. (a) Fuzzy term “very slow”. (b) Fuzzy term “slow”. (c) Fuzzy condition attached to t2.
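As a small illustration of such a fuzzy condition, the sketch below combines two triangular membership functions with the minimum operator; the speed terms and their breakpoints are invented for the example and are not those of the CAVIAR experiments:

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

slow      = lambda v: tri(v, 0.2, 0.6, 1.0)    # invented breakpoints
very_slow = lambda v: tri(v, 0.0, 0.1, 0.4)

def transition_degree(speed):
    """Degree of the fuzzy condition "Slow AND Very slow": the intersection
    of the two membership functions, implemented as their minimum."""
    return min(slow(speed), very_slow(speed))

print(transition_degree(0.3))   # non-zero only where the two terms overlap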
The fuzzy condition describing the transition between “Close” and “Very
close” for the distance relation is built using the same approach. Finally, we can
show the firing rule attached to transition t2 in Fig. 3 as follows:
A Fuzzy Petri Net is a scenario model. When using real perception data, not all the situations might be perceived. Each time a situation of a scenario model is recognized, it can be interpreted as new evidence that the corresponding scenario is recognized. The problem now is how to fuse all these pieces of evidence to estimate how well the scenario corresponds to what is currently observed.
Let X = {x1, x2,…, xn} be a finite set and let P(X) denote the power set of X. A fuzzy measure on a set X is a function g: P(X) → [0, 1] such that
g(∅) = 0, g(X) = 1 and g(A) ≤ g(B) if A ⊂ B, with A, B ∈ P(X).
Following this definition, Sugeno [9] introduced the so-called gλ-fuzzy measure satisfying the following additional property:
g(A ∪ B) = g(A) + g(B) + λ·g(A)·g(B)   (1)
for all A, B ⊂ X with A ∩ B = ∅, and for some fixed λ > −1. The value of λ can be found from the boundary condition g(X) = 1 by solving the following equation:
λ + 1 = ∏i=1..n (1 + λ·gi)   (2)
where gi is called a fuzzy density value. Let Ai = {xi, xi+1,…, xn}. When g is a gλ-fuzzy measure, the values of g(Ai) can be computed recursively [7]. Murofushi [7] proposed the so-called Choquet fuzzy integral, which can be determined in the following form:
e = Σi=1..n [h(xi) − h(xi−1)]·g(Ai),  with h(x0) = 0 and h(x1) ≤ h(x2) ≤ … ≤ h(xn)   (3)
Consider a scenario composed of three situation sources, i.e. S = {s1, s2, s3}, together with density values (degrees of importance of the situations) g1 = 0.14, g2 = 0.45 and g3 = 0.12. The fuzzy measure on the power set of S can be calculated by Eq. (1) and is shown in Table 1.
(Figure: hierarchical fusion model with middle-level nodes for the subsets {g1, g2}, {g1, g3}, {g2, g3} and {g1, g2, g3}.)
A simple sequence of observed situations and their confidence degrees coming from the situation sources is shown in Fig. 6: sstart, s1 (h1 = 0.6), s2 (h2 = 0.8), s3 (h3 = 0.7), s2 (h2 = 0.5), s3 (h3 = 0.4), s1 (h1 = 0.9), sstart (End). The nodes at the middle level will be deter-
mined as the degrees of importance of combined situations using the values of
fuzzy measure in Table 2. We reduce the degree of importance of the sequence
{s1, s2, s3} from 1.0 to 0.9 in order to prevent the effect of source that is over con-
fident. To combine the evidence, a sequence of occurred situations, si, and its con-
fidence values, hi, are first separated according to the subset of the power set of S
(see Fig. 6). Then the separated group of hi will be aggregated by using Eq. (3) at
the bottom level of the model. Afterward the aggregated values will be conveyed
up to the nodes at the middle level and aggregated by Eq. (3) again. Finally, the
aggregated value at the middle level will be propagated to the top level for repre-
senting the confidence value of the scenario, e.
6 Experimental Results
We have created four different models of activities for the European CAVIAR project:
“Browsing”, “Leaving bag behind”, “Two people fighting” and “People
meeting walking together and splitting up”. Table 2 only shows the results produced
by the “Browsing” model when all sample videos are fed as input.
The model produces the best confidence value when the most relevant video is
presented to it. We can also classify the videos pertinent to the “Browsing”
scenario from the confidence value that the model produces on each video.
The overall confidence value is rather high on the video “Person leaving bag but
then picking it up again” because the activities contained in this video can also be
interpreted as “Browsing” (the person leaves a bag near a desk, stays for a while
and then goes away with the bag). Other videos may contain activities that partly
support the “Browsing” scenario.
In this paper, we have first defined the concepts and terms for human activity
modelling. Then, we have presented a fuzzy Petri net based model in which
human activities are represented and analyzed in a graphical way. With fuzzy
transition functions, the change of activity situations proceeds gradually and
smoothly. To recognize an observed scenario, we have presented a hierarchical fusion
model that nonlinearly combines evidence based on fuzzy measures. The experimental
results confirm that our proposed framework is well suited to representing
and recognizing human activities, for the following reasons: 1) our representation
model allows a flexible and extendable representation of human activities; 2)
the importance of information sources is taken into account, which makes the process
of evidence combination more consistent with real world situations. In the future,
we plan to develop sophisticated temporal constraints in order to build a more
generic model for representing scenarios in complicated situations.
*There is more than one person whose actions provoke the evolution of situations in the “Brows-
ing” model.
References
1 Introduction
HCI has a long history, during which various interfaces have been developed, currently
aiming at a more natural interaction involving 3D gesture recognition and
speech-based interfaces. Achieving naturalness involves progress from
command- or menu-based (system-driven) to user-driven dialog management. System
intelligence that allows adaptation to environment/context changes and user
preferences is considered a must. The games domain has a special position in the
area of HCI, holding a leading role in the research on attractive interfaces and
interaction modes. Game development has become complex, expensive and burdened
with a long development cycle, thus creating barriers to independent game
developers and inhibiting the introduction of innovative games or new game genres,
such as serious games or games accessible to communities with special needs.
Serious games (SGs), or persuasive games, are computer and video games used
as educational technology or as a vehicle for presenting or promoting a point of
view. They can be similar to educational games, but are often intended for an audience
outside of primary or secondary education. SGs can be of any genre and
many of them can be considered a kind of edutainment, intended to provide an
engaging, self-reinforcing context in which to motivate and educate players about
non-game events or processes.
PlayMancer, a European Commission co-funded project, aims to implement a
platform for serious games which allows: (i) augmenting the gaming experience
with innovative modes of interaction between the player and the game world, (ii)
a shorter and more cost-effective game production chain, and (iii) the evolution of
Universally Accessible Games principles for application to action-based 3D games.
In this work in progress, the PlayMancer concept and the architectural model
of the multimodal platform are presented. The proposed platform architecture
integrates a series of existing open source systems, such as a game engine, a spoken
dialog management system, and spoken interface components (speech recognition,
understanding and synthesis). The existing components are augmented to support
multimodality and to be adaptable to context changes, to user preferences/needs, and
to game tasks. New interaction modes are provided by newly developed components,
such as emotion recognition from speech audio data and motion tracking.
One of the most important features of the proposed architecture is its mixed-initiative
dialogue strategy, enabled through the dynamic generation of task-related interaction
data by coupling dialog and interactive 3D graphics objects at the design phase.
Fast development of new games and adaptation to specific game scenarios or user
needs is facilitated by a configuration toolbox.
The general architecture of the PlayMancer platform, illustrated in Fig. 1, has been
designed taking into account: (i) functional and technical specifications derived
from generic and domain-specific user requirements, and (ii) the main technological
challenge of the project, namely rendering an open source game engine multimodal.
In particular, multimodality is achieved by developing an enhanced multimodal
dialogue interaction platform, based on the RavenClaw/Olympus architecture [1],
which is further extended to support modalities other than speech.
The platform relies on a modular architecture, where the components processing
the data streams from the individual modalities and input sources interact through
the central hub of Olympus, which allows synchronous or alternative use of
input/output modalities. This way, in addition to the traditional inputs/outputs used in
games (joystick, keyboard, mouse, display), the PlayMancer architecture also
integrates speech, touch, biosensors, and motion tracking. These additional mo-
Acknowledgment This work was supported by the PlayMancer project (FP7 215839), which is
partially funded by the European Commission.
References
1 Introduction
2 Problem Formulation
In this section we first provide some basic definitions that are necessary for the
understanding of the proposed methodology, and then we introduce the problem
statement.
3 Solution Methodology
1 Reconstruction-based approaches for knowledge hiding generate the sanitized dataset D′ from
scratch instead of directly modifying the transactions of the original dataset D.
Our proposed methodology, called the Least Supported Attribute (LSA) modification
algorithm, uses the nonsensitive rules that are mined from the original database
D to reconstruct its sanitized counterpart D′. As a first step, LSA identifies the supporting
transactions for each nonsensitive rule Ri ∈ R in the original dataset. Then,
among the supporting transactions for this rule, it selects the ones that also support
at least one sensitive rule. Fig. 1 provides an example of such a scenario, with rules
generated by a sequential covering algorithm like RIPPER [3] to discriminate between
the positive and the negative examples of a two-class classification problem.
As one can notice, the rule generation process allows the transactions of the database
to support more than one rule, as is the case for rules R1 and R2.²
For each transaction that supports both a nonsensitive and a sensitive classification
rule, LSA modifies it appropriately so that it no longer supports the sensitive
rule. The proposed modification affects only one attribute-value pair of the transaction,
which is selected to be the one having the least support in D among the
attribute-value pairs of the supported sensitive rule. Furthermore, to minimize the
side-effects in the sanitized outcome, the new value that is assigned to the selected
attribute is the one that is supported the least by the transactions of the
original dataset (different from the current value in the sensitive rule). By altering
the value of the attribute to equal the one that is least supported in D, we manage
to moderate the increase in the support of some attribute-value pairs and thus to
minimize the probability of producing rules in D′ that were nonexistent in D. To
make this possible, we employ a data structure that keeps track of the number of
times each attribute-value pair appears in the transactions of D, as shown in Fig.
2. As one can notice, the proposed data structure L is a list of lists, the latter of which
holds the attribute-value pairs sorted in descending order of support in dataset D.
Example An example will allow us to better demonstrate how this operation works. Consider the
database of Table 1 that consists of four attributes A1 , A2 , A3 , A4 and a class attribute C with labels
0 and 1. We assume that in the given dataset, the counters for the various attribute–value pairs (as
updated based on the support of the attribute-value pairs in the dataset) are provided in Table 1. Let
T1 = (a1, b1, c1, d1, 1), T2 = (a1, b2, c2, d1, 1), T3 = (a1, b1, c2, d1, 1) and T4 = (a1, b2, c3, d1, 1)
be four transactions that support the rule (A1 = a1) ∧ (A4 = d1) −→ (C = 1). Among the four
transactions, T4 also supports the sensitive rule (A2 = b2) ∧ (A3 = c3) −→ (C = 1). In order to
modify T4 in such a way that it no longer supports the sensitive rule, LSA will choose to replace
2 However, due to the rule ordering scheme that is enforced by the rule generation algorithm, only
the first rule in the rule set that is supported by a new transaction is used for its classification.
the value of attribute A3 in T4 , since (A3 = c3) is less supported in D than (A2 = b2). The new
value of A3 in T4 will be the one from L that is minimally supported in the dataset; that is c1.
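A hedged sketch of this modification step is given below; the counter structure is represented as nested dictionaries rather than the sorted list of lists L of Fig. 2, and the support counts are illustrative, not the ones of Table 1.

def modify_transaction(t, sensitive_rule, L):
    # t: dict attr -> value; sensitive_rule: the rule's attribute-value pairs
    # pick the attribute-value pair of the sensitive rule with the least support in D
    attr, cur = min(sensitive_rule.items(), key=lambda av: L[av[0]][av[1]])
    # replace it with the least supported value of that attribute (different from the
    # current one), so that the transaction no longer supports the sensitive rule
    new_val = min((v for v in L[attr] if v != cur), key=lambda v: L[attr][v])
    t[attr] = new_val
    return t

# the example above: T4 supports (A2 = b2) AND (A3 = c3) -> (C = 1); counters are illustrative
L = {"A2": {"b1": 5, "b2": 4}, "A3": {"c1": 1, "c2": 6, "c3": 2}}
T4 = {"A1": "a1", "A2": "b2", "A3": "c3", "A4": "d1", "C": 1}
print(modify_transaction(T4, {"A2": "b2", "A3": "c3"}, L))   # A3 becomes c1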
The rationale behind the modification of the transactions that support sensitive
rules in D is as follows. In LSA (as in most of the currently proposed methodologies
for classification rule hiding), we consider that the sanitized dataset D′ will consist
of the same number of transactions N as dataset D. However, since the
sensitive rules cover a set of transactions that are not covered by the nonsensitive
rules, and since D′ is formulated only from the transactions supporting the
nonsensitive rules, it is reasonable to expect that the transactions that support the
nonsensitive rules in D are fewer than N. Thus, LSA uses the transactions supporting
the nonsensitive rules in a round-robin fashion in order to construct the sanitized
outcome. We note that LSA ensures that the representation of
the nonsensitive rules in D′ is proportional to their representation in D. Algorithm 1
provides the details of our implementation.
The first variation, called NLSA, differs from LSA in the way it selects the attribute-value pair of a transaction that supports
a sensitive rule to facilitate knowledge hiding. Specifically, when a transaction is
found to support both a nonsensitive and a sensitive rule, NLSA randomly selects
an attribute-value pair of the supported sensitive rule and modifies the value of this
attribute based on the counters in L.
The second variation, called TR-A (Transaction Removal for All transactions
supporting sensitive rules) discards, instead of modifying, all the transactions that
support both a nonsensitive and a sensitive rule, while the third variation, called
TR-S (Transaction Removal for Selected transactions) is a combination of LSA and
TR-A. For every nonsensitive rule, TR-S retrieves its supporting transactions in D.
If some of these transactions also support a sensitive rule, then (i) if the number of
transactions supporting the nonsensitive rule is greater than the number of instances
that have to be generated for this rule in the sanitized dataset D′, then any addi-
tional transactions from the ones supporting the sensitive rule are removed, and (ii)
the remaining transactions that also support the sensitive rule are modified as LSA
dictates. Otherwise, the algorithm operates the same way as LSA.
Datasets used in the evaluation: Mushroom (8,124 transactions, 22 attributes; 9, 8) and Vote (435 transactions, 16 attributes; 4, 10).
4 Experimental Evaluation
5 Related Work
In the last decade, there has been a lot of active research in the field of privacy pre-
serving data sharing. Vaidya et al. [8] tackle the problem of multiparty data sharing
by proposing a distributed privacy preserving version of ID3. The proposed strategy
assumes a vertical partitioning of the data where every attribute (including the class)
has to be known only by one party. A distributed version of ID3 that is suitable in
the case of a horizontal data partitioning scheme can be found in [9].
Chang and Moskowitz [1] were the first to address the inference problem caused
by the downgrading of data in the context of classification rules. Through a
blocking technique called parsimonious downgrading, the authors block the inference
channels that lead to the identification of the sensitive rules by selectively sanitizing
transactions so that missing values appear in the released dataset. An immediate
consequence is that the confidence of an attacker regarding the holding of the
sensitive rules is lowered.
Chen and Liu [2] present a random rotation perturbation technique for privacy
preserving data classification. The proposed methodology preserves the multi-dimensional
geometric characteristics of the dataset with respect to task-specific
information. As an effect, the sensitive knowledge in the sanitized dataset is protected
against disclosure, while the utility of the data is preserved to a large extent.
Natwichai et al. [6] propose a reconstruction algorithm for classification rules
hiding. The proposed algorithm uses the nonsensitive rules to build a decision tree
from which the sanitized dataset will be generated. To produce the sanitized dataset,
the algorithm traverses the paths of the decision tree that correspond to the same rule
and repeatedly generates transactions that support this rule in the sanitized outcome.
In [6] the decision tree is built based on the gain ratio of the various attributes. An
alternative approach that builds the decision tree by using the least common attribute
measure is presented in [5]. Finally, in [7] a data reduction approach is proposed,
which is suitable for the hiding of a specific type of classification rules, known as
canonical associative classification rules.
Finally, Islam and Brankovic [4] present a noise addition framework for hiding
sensitive classification rules. The suggested framework manages to protect
the sensitive patterns from disclosure, while preserving all the nonsensitive statistical
information of the dataset.
6 Conclusions
In this paper, we presented a novel approach to classification rules hiding that guar-
antees the privacy of the sensitive knowledge, while minimizing the side–effects
introduced by the sanitization process. Through a series of experiments, we demon-
strated that our approach yields good results in terms of side–effects, while it keeps
the computational complexity within reasonable bounds.
Acknowledgements We would like to thank Prof. William W. Cohen from Carnegie Mellon
University for providing us with the implementation of the RIPPER algorithm.
References
1. Chang, L., Moskowitz, I.S.: Parsimonious downgrading and decision trees applied to the infer-
ence problem. In: Proceedings 1998 Workshop on New Security Paradigms, pp. 82–89 (1998)
2. Chen, K., Liu, L.: Privacy preserving data classification with rotation perturbation. In: Proceed-
ings 5th IEEE International Conference on Data Mining, pp. 589–592 (2005)
3. Cohen, W.W.: Fast effective rule induction. In: Proceedings 12th International Conference on
Machine Learning, pp. 115–123 (1995)
4. Islam, M.Z., Brankovic, L.: A framework for privacy preserving classification in data mining.
In: Proceedings 22nd Workshop on Australasian Information Security, Data Mining and Web
Intelligence, and Software Internationalisation, pp. 163–168 (2004)
5. Natwichai, J., Li, X., Orlowska, M.: Hiding classification rules for data sharing with privacy
preservation. In: Proceedings 7th International Conference on Data Warehousing and Knowl-
edge Discovery, pp. 468–467 (2005)
6. Natwichai, J., Li, X., Orlowska, M.E.: A reconstruction-based algorithm for classification rules
hiding. In: Proceedings 17th Australasian Database Conference, pp. 49–58 (2006)
7. Natwichai, J., Sun, X., Li, X.: Data reduction approach for sensitive associative classification
rule hiding. In: Proceedings 19th Australian Conference on Databases (2007)
8. Vaidya, J., Clifton, C., Kantarcioglu, M., Patterson, A.S.: Privacy-preserving decision trees over
vertically partitioned data. ACM Trans. Knowl. Discov. Data 2(3) (2008)
9. Xiao, M.J., Huang, L.S., Luo, Y.L., Shen, H.: Privacy preserving ID3 algorithm over horizon-
tally partitioned data. In: Proceedings 6th International Conference on Parallel and Distributed
Computing Applications and Technologies, pp. 239–243 (2005)
Learning Rules from User Behaviour
Domenico Corapi, Oliver Ray, Alessandra Russo, Arosha Bandara, and Emil Lupu
1 Introduction
modular enforcement through policy frameworks [1, 18] and principled representa-
tions of space and time [23]. Logic programming is an ideal choice for knowledge
representation from a computational point of view and it also benefits from Induc-
tive Logic Programming (ILP) [17] tools that permit the learning of logic programs
from examples.
Learning rules of user behaviour through inductive reasoning poses several chal-
lenges. Learning must be incremental: as examples of user behaviour are continu-
ously added, the system must permit periodic revision of the rules and knowledge
learnt. Moreover, the system must cater for temporal aspects, expressing both persis-
tence and change through time, and exceptions to previously learnt rules. For this,
the system must be capable of non-monotonic reasoning [13]. The system must
reason with partial information whilst providing fine grained control of the reason-
ing process to satisfy appropriate user-defined language and search biases (such as
minimising the changes made to the initial theory).
This paper presents an algorithm for learning and revising models of user be-
haviour which makes use of a non-monotonic ILP system, called XHAIL (eXtended
Hybrid Abductive Inductive Learning) [19], that is capable of learning normal logic
programs from a given set of examples and background knowledge. The contribu-
tion of this paper is twofold. First a novel algorithm is presented that is able to
perform general theory revision by supporting the automatic computation of new
theories T′ that are not necessarily extensions of the original theories T, to correctly account
for newly acquired examples E. Second, an application of the algorithm to learning
rules describing behaviour of mobile phone users is presented by means of a simpli-
fied example consisting of learning the circumstances in which users accept, reject
or ignore calls. Once learnt, these rules can be periodically reviewed and amended
by the user and enacted automatically on the device avoiding user intervention. This
work is part of a larger project [2] that seeks to exploit the proposed approach in the
context of privacy policies.
The paper is structured as follows. Section 2 summarises relevant background
material on ILP. Section 3 describes the main features of the approach by intro-
ducing basic concepts, presenting a learning-based theory revision algorithm and
illustrating its application to the example. Section 4 relates our approach with other
existing techniques for theory revision. Section 5 concludes the paper with a sum-
mary and some remarks about future work.
2 Background
Inductive Logic Programming (ILP) [17] is concerned with the computation of hy-
potheses H that generalise a set of (positive and negative) examples E with respect
to a prior background knowledge B. In this paper, we consider the case when B and
H are normal logic programs [11], E is a set of ground literals (with positive and
negative ground literals representing positive and negative examples, respectively)
and H satisfies the condition B ∪ H ⊨ E under the credulous stable model semantics
[9]. As formalised in Definition 1 below, it is usual to further restrict the clauses in
H to a set of clauses S called the hypothesis space.
Definition 1. Given a normal logic program B, a set of ground literals E, and a set of
clauses S, the task of ILP is to find a normal logic program H ⊆ S, consistent with B,
such that B ∪ H |= E. In this case, H is called an inductive generalisation of E wrt.
B and S.
We use the XHAIL system that, in a three-phase approach [21], constructs and
generalises a set of ground hypotheses K, called a Kernel Set of B and E. This
can be regarded as a non-monotonic multi-clause generalisation of the Bottom Set
concept [16] used in several well-known monotonic ILP systems. Like most ILP
systems, XHAIL heavily exploits a language bias, specified by a set M of so called
mode declarations [16], to bound the ILP hypothesis space when constructing and
generalising a Kernel Set. A mode declaration m ∈ M is either a head declaration
of the form modeh(s) or a body declaration of the form modeb(s) where s is a
ground literal, called scheme, containing placemarker terms of the form +t, −t and
#t which must be replaced by input variables, output variables, and constants of
type t, respectively. For example, modeb(in_group(+contact, #contact_list_group))
allows rules in H to contain literals with the predicate in_group; the first argument, of
type contact, must be an input variable (i.e. an input variable in the head or an output
variable in some preceding body literal), and the second, of type contact_list_group, must
be a ground term. The three phases of the XHAIL approach are implemented using
a non-monotonic answer set solver. In the first phase, the head declarations are used
to abduce a set of ground atoms ∆ such that T ∪ ∆ ⊨ E. Atoms in ∆ are the head
atoms of the Kernel Set. In the second phase, the body atoms of the Kernel Set are
computed as successful instances of queries obtained from the body declarations in
M. In the third phase, the hypothesis is computed by searching for a compressive
theory H that subsumes the Kernel Set, is consistent with the background knowl-
edge, covers the examples and falls within the hypothesis space.
To represent and reason about dynamic systems, we use the Event Calculus
(EC) formalism [23]. EC normal logic programs include core domain-independent
rules describing general principles for inferring when properties (i.e. fluents) are
true (resp. not true) at particular time-points, denoted as holdsAt(F, T) (resp.
notHoldsAt(F, T)), based on which events have previously occurred (denoted as
happens(E, T)). In addition, the program includes a collection of domain-dependent
rules describing the effects of events (using the predicates initiates(E, F, T) and
terminates(E, F, T)), as well as the time-points at which events occur (using the
predicate happens).
The algorithm consists of three phases: a pre-processing phase that “transforms”
the rules of the user theory U into “defeasible” rules with exceptions, a learning
phase that computes exception rules (if any), and a post-processing phase that
“re-factors” the defeasible rules into revised non-defeasible rules based on the
exception rules learnt in the second phase. Informally, exception rules learned by
XHAIL are prescriptions for changes in the current user theory U in order to cover
new examples of user actions. These changes can be the addition or deletion of entire
rules, and/or the addition or deletion of literals in the body of existing rules.
Input: B background theory; U user agent theory; E set of current examples; M mode declarations
Output: U′ revised theory according to the current examples
/* Pre-processing phase */
Ũ = ∅;
foreach rule α_i ← δ_i,1, ..., δ_i,n ∈ U do
  S = {δ_i,1, ..., δ_i,n};
  α*_i denotes the schema in the modeh declaration referring to α_i;
  Ũ = Ũ ∪ {α_i ← try(i, 1, δ_i,1), ..., try(i, n, δ_i,n), ¬exception(i, α_i)};
  M = M ∪ {modeh(exception(#int, α*_i))};
  foreach δ_i,j ∈ S do
    Ũ = Ũ ∪ {try(i, j, δ_i,j) ← use(i, j), δ_i,j} ∪ {try(i, j, δ_i,j) ← ¬use(i, j)};
  end
end
Ũ = Ũ ∪ {use(I, J) ← ¬del(I, J)};
M = M ∪ {modeh(del(#int, #int))};
/* Learning phase */
H = XHAIL(B ∪ Ũ, E, M);
/* Post-processing phase */
U′ = theory_refactoring(U, H);
return U′;
The algorithm shows how XHAIL, which would normally be used to learn rules
from scratch, can be used to discover a minimal set of revisions to an initial set of
rules as well as new rules. The inputs are a set of mode declarations M, a background
knowledge B, a user theory U and a set of examples E. The mode declarations define
the atoms that are allowed to be heads of new rules and to appear in the body of rules. For
instance, body literals can be restricted so as not to contain conditions about GPS location
but to refer instead to higher-level location information (e.g. work, home), thus defining a
more appropriate hypothesis space. The background knowledge B, expressed in EC,
defines both static and dynamic domain-specific properties of the device and its
environment, in addition to the EC domain-independent axioms. The body of these
rules includes conditions expressed in terms of happens and holdsAt. The set E of
current examples is a set of ground do literals. The output is a revised user theory
U′ that, together with B, covers the current examples E.
Pre-processing phase: During this phase the given user theory U is rewritten into a
normal logic program Ũ suitable for learning exceptions. This consists of the following
two syntactic transformations. First, for every rule in U, every body literal
δ_i,j is replaced by the atom try(i, j, δ_i,j), where i is the index of the rule, j is the index
of the body literal in the rule and the third argument is a reified term for the
literal δ_i,j. Furthermore, the literal ¬exception(i, α_i) is added to the body of the rule,
where i is the index of the rule and α_i is the reified term for the head of the rule.
Intuitively, this transformation lifts the standard ILP process of learning hypotheses
about examples up to the (meta-)process of learning hypotheses about the rules and
their exception cases. Second, for each try(i, j, δ_i,j) introduced in the program, the
rules try(i, j, δ_i,j) ← use(i, j), δ_i,j and try(i, j, δ_i,j) ← ¬use(i, j) are added to the program,
together with the definition use(I, J) ← ¬del(I, J) of the predicate use. Head
mode declarations for exception and del are added to M. This sets the learning task
to compute exception cases for rules in the current user theory U and instances of
body literals that need to be deleted.
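A rough sketch of this transformation, operating on rules represented as plain strings (the call-acceptance rule used as input is a hypothetical stand-in for the user theory U, not taken from the paper), could look as follows.

def preprocess(rules):
    # rules: list of (head, [body literals]) given as plain strings
    defeasible, mode_decls = ["use(I, J) :- not del(I, J)"], ["modeh(del(#int, #int))"]
    for i, (head, body) in enumerate(rules, start=1):
        trys = [f"try({i}, {j}, {lit})" for j, lit in enumerate(body, start=1)]
        defeasible.append(f"{head} :- " + ", ".join(trys + [f"not exception({i}, {head})"]))
        mode_decls.append(f"modeh(exception(#int, {head}))")
        for j, lit in enumerate(body, start=1):
            defeasible.append(f"try({i}, {j}, {lit}) :- use({i}, {j}), {lit}")
            defeasible.append(f"try({i}, {j}, {lit}) :- not use({i}, {j})")
    return defeasible, mode_decls

rules = [("do(accept)", ["holdsAt(at(work), T)", "happens(call(college), T)"])]
for clause in preprocess(rules)[0]:
    print(clause)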
Learning phase: This phase uses the XHAIL system. In the first phase of XHAIL, a
set ∆ of ground atoms of the form exception(i, α_i) and del(i, j) is computed so that
B ∪ Ũ ∪ ∆ ⊨ E. This set indicates (i) the predicates α_i whose definitions need
exceptions, and (ii) the literals (with index j) in the rules i that need to be deleted for
the given set of examples E to be entailed by the user theory U. In the second phase,
XHAIL computes the body of the exception rules as instances of body declaration
predicates (e.g. holdsAt) that are derivable from U. In the third phase, the search
for a maximally compressed hypothesis H (minimum number of literals in the hypothesis [15])
such that E is true in a partial stable model [10, 22] of B ∪ Ũ ∪ H
guarantees minimal induced revisions of U.
Post-processing phase: The post-processing phase generates a revised theory U′
semantically equivalent to Ũ ∪ H (and thus consistent with E). The algorithm is
computationally simple. Informally, for each del(i, j) fact in H the corresponding
condition j of rule i in U is deleted. For each exception rule in H of the form
exception(i, α_i) ← c_1, ..., c_n, the corresponding rule i in U is substituted with n new
rules, one for each condition c_h, 1 ≤ h ≤ n. Each of these rules will have the
predicate α_i in the head and, in the body, all conditions present in the original rule i in
U plus the additional condition ¬c_h. An exception with an empty body results in the
original rule i being deleted.
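The refactoring step can be sketched in the same string-based representation; the handling of multiple exception rules per original rule is simplified here, and the example revision (a hypothetical silent_mode fluent) is our own invention.

def refactor(U, dels, exceptions):
    # U: list of (head, [body literals]), rules indexed from 1
    # dels: set of (i, j) pairs -> delete body literal j of rule i
    # exceptions: dict i -> [c1, ..., cn]  (an empty list deletes rule i outright)
    revised = []
    for i, (head, body) in enumerate(U, start=1):
        body = [lit for j, lit in enumerate(body, start=1) if (i, j) not in dels]
        if i not in exceptions:
            revised.append((head, body))
        elif exceptions[i]:                      # one new rule per exception condition
            for c in exceptions[i]:
                revised.append((head, body + [f"not {c}"]))
    return revised

U = [("do(accept)", ["holdsAt(at(work), T)", "happens(call(college), T)"])]
print(refactor(U, dels=set(), exceptions={1: ["holdsAt(silent_mode, T)"]}))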
3.2 Example
This section illustrates Algorithm 3.1 with a simple case study where we aim to
learn rules that define the context in which a user accepts incoming calls on a mobile
phone. This is part of a larger test case in which user actions and contextual data are
derived from real data on mobile phone usage collected in the Cityware project [4].
We used data collected over three days, running a revision step at the end of each
day. Due to space limitations only the outcome of the third day is shown here.
A user theory U is revised up to the end of the second day:
The set of examples collected in the third day is diagrammatically presented in Fig-
ure 1.
Fig. 1 Example scenario (C, H and F denote incoming calls from the user’s college, home, and
friends contact lists respectively. Refused calls are marked with a “ ”)
U must be revised, since two calls from contacts not included in the College contact
list are answered while at Imperial, but no rule in U covers this case. The pre-processing
phase transforms the user theory U into the following theory Ũ:
Given Ũ, it is only possible to prove the example set E by abducing the predicates
exception and del. The former is abduced to explain calls rejected or ignored
by the user, and the latter is abduced to explain calls the user accepts (currently
not covered by U). Since XHAIL computes a minimal number of exception and del
clauses, U will be minimally revised. Thus the learning phase at the end of the third
day gives the hypothesis:
The post-processing phase will then give the following revised user theory
Note that the choice between learning a new rule or revising an existing one, when
both solutions are acceptable, is driven by minimality. In this way, we can preserve
much of the knowledge learnt from previous examples. In the above case study, each
revision computed by XHAIL took a couple of seconds on a Pentium laptop PC.
Although statistical techniques [6] may be necessary to process, classify and aggre-
gate raw sensor data upstream, the core logical methodology described in this paper
is well suited to learning user rules. From the application point of view, it enables the
References
1. Lupu, E., et al.: AMUSe: Autonomic Management of Ubiquitous Systems for e-health. J. Conc.
and Comp.: Practice and Experience 20(3), 277–295 (2008)
2. Bandara, A., Nuseibeh, B., Price, B., Rogers, Y., Dulay, N., et al.: Privacy rights management
for mobile applications. In: 4th Int. Symposium on Usable Privacy and Security. Pittsburgh
(2008)
3. Brodie, C., Karat, C., Karat, J., Feng, J.: Usable security and privacy: a case study of devel-
oping privacy management tools. In: SOUPS ’05: Proc. of the 2005 symp. on Usable privacy
and security, pp. 35–43. ACM, New York, NY, USA (2005)
4. Cityware: Urban design and pervasive systems. http://www.cityware.org.uk/
5. De Raedt, L., Thomas, G., Getoor, L., Kersting, K., Muggleton, S. (eds.): Probabilistic,
Logical and Relational Learning - A Further Synthesis, 15.04. - 20.04.2007. IBFI, Schloss
Dagstuhl, Germany (2008)
6. Eagle, N., Pentland, A.: Reality mining: sensing complex social systems. Personal and Ubiq-
uitous Computing 10(4), 255–268 (2006)
7. Esposito, F., Ferilli, S., Fanizzi, N., Basile, T., Di Mauro, N.: Incremental learning and concept
drift in INTHELEX. Intell. Data Anal. 8(3), 213–237 (2004)
8. Esposito, F., Semeraro, G., Fanizzi, N., Ferilli, S.: Multistrategy theory revision: Induction and
Abduction in INTHELEX. Mach. Learn. 38(1-2), 133–156 (2000)
9. Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: R. Kowal-
ski, K. Bowen (eds.) Logic Programming, pp. 1070–1080. MIT Press (1988)
10. Kakas, A., Kowalski, R., Toni, F.: Abductive logic programming. J. Log. Comput. 2(6), 719–
770 (1992)
11. Lloyd, J.: Foundations of Logic Programming, 2nd Edition. Springer (1987)
12. Ma, J., Russo, A., Broda, K., Clark, K.: DARE: a system for Distributed Abductive REasoning.
J. Autonomous Agents and Multi-Agent Systems 16, 271–297 (2008)
13. Minker, J.: An overview of nonmonotonic reasoning and logic programming. Tech. Rep.
UMIACS-TR-91-112, CS-TR-2736, University of Maryland, College Park, Maryland 20742
(August 1991)
14. Moyle, S.: An investigation into theory completion techniques in inductive logic. Ph.D. thesis,
University of Oxford (2003)
15. Muggleton, S.: Inverse entailment and Progol. New Generation Comput. J. 13, 245–286 (1995)
16. Muggleton, S.: Learning from positive data. In: 6th Int. Workshop on Inductive Logic Pro-
gramming, pp. 358–376. Springer Verlag, London, U.K. (1996)
17. Muggleton, S., De Raedt, L.: Inductive logic programming: Theory and methods. J. of Logic
Programming 19/20, 629–679 (1994)
18. Ponder2: The ponder2 policy environment. www.ponder2.net
19. Ray, O.: Nonmonotonic abductive inductive learning. In: Journal of Applied Logic. (Elsevier,
in press) (2008)
20. Richards, B., Mooney, R.J.: Automated refinement of first-order horn-clause domain theories.
Machine Learning 19(2), 95–131 (1995)
21. Ray, O., Broda, K., Russo, A.: A hybrid abductive inductive proof procedure. Logic J. of the
IGPL 12(5), 371–397 (2004)
22. Saccà, D., Zaniolo, C.: Stable models and non-determinism in logic programs with negation. In: Proceedings 9th ACM Symposium on Principles of Database Systems (1990)
23. Shanahan, M.: The event calculus explained. In: Artificial Intelligence Today, pp. 409–430
(1999)
24. Widmer, G.: Learning in the presence of concept drift and hidden contexts. In: Machine
Learning, pp. 69–101 (1996)
25. Wogulis, J., Pazzani, M.: A methodology for evaluating theory revision systems: Results with
Audrey II. In: 13th IJCAI, pp. 1128–1134 (1993)
Behaviour Recognition using the Event Calculus
1 Introduction
holdsAt( F = V, T ) ←
initially( F = V ), (1)
not broken( F = V, 0, T )
holdsAt( F = V, T ) ←
  happens( Act, T′ ),
  T′ < T,      (2)
  initiates( Act, F = V, T′ ),
  not broken( F = V, T′, T )
According to axiom (1) a fluent holds at time T if it held initially (time 0) and has not
been ‘broken’ in the meantime, that is, terminated between times 0 and T. Axiom
(2) specifies that a fluent holds at a time T if it was initiated at some earlier time T′
and has not been terminated between T′ and T. ‘not’ represents ‘negation by failure’
[3]. The domain-independent predicate broken is defined as follows:
[3]. The domain-independent predicate broken is defined as follows:
broken( F = V, T1 , T3 ) ←
  happens( Act, T2 ),
  T1 ≤ T2 , T2 < T3 ,      (3)
  terminates( Act, F = V, T2 )
F =V is ‘broken’ between T1 and T3 if an event takes place in that interval that ter-
minates F =V . A fluent cannot have more than one value at any time. The following
domain-independent axiom captures this feature:
terminates( Act, F = V, T ) ←
  initiates( Act, F = V′, T ),      (4)
  V ≠ V′
Axiom (4) states that if an action Act initiates F = V′ then Act also terminates F = V,
for all other possible values V of the fluent F. We do not insist that a fluent must
have a value at every time-point. In this version of EC, therefore, there is a difference
between initiating a Boolean fluent F = false and terminating F = true: the first
implies, but is not implied by, the second.
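For concreteness, a toy re-implementation of axioms (1)–(4) for ground queries is sketched below; this is not the LTBR source code, and the light-switch narrative and effect rules are invented for illustration.

VALUES = {"light": ["on", "off"]}                 # possible values of each fluent (assumed)
initially = {("light", "off")}                    # what holds at time 0
happens = {1: ["switch_on"], 5: ["switch_off"]}   # event narrative: time -> events

def initiates(act, fv):
    return (act, fv) in {("switch_on", ("light", "on")),
                         ("switch_off", ("light", "off"))}

def terminates(act, fv):
    # axiom (4): initiating F = V' terminates F = V for every other value V
    f, v = fv
    return any(initiates(act, (f, v2)) for v2 in VALUES[f] if v2 != v)

def broken(fv, t1, t3):                           # axiom (3)
    return any(terminates(a, fv)
               for t2, acts in happens.items() if t1 <= t2 < t3 for a in acts)

def holds_at(fv, t):                              # axioms (1) and (2)
    if fv in initially and not broken(fv, 0, t):
        return True
    return any(initiates(a, fv) and not broken(fv, t0, t)
               for t0, acts in happens.items() if t0 < t for a in acts)

print(holds_at(("light", "on"), 3))               # True: initiated at 1, not yet broken
print(holds_at(("light", "on"), 6))               # False: terminated at 5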
We make the following further comments regarding this version of EC. First, the
domain-independent EC axioms (1)–(4) specify that a fluent does not hold at the
time that it was initiated but holds at the time it was terminated. Second, in addition to
the presented domain-independent definitions, the holdsAt and terminates predicates
may be defined in a domain-dependent manner. The happens, initially and initiates
predicates are defined only in a domain-dependent manner. Third, in addition to
axioms (1)–(4), the domain-independent axioms of EC include those defining the
holdsFor predicate, that is, the predicate for computing the intervals in which a fluent
holds. To save space we do not present here the definition of holdsFor; the interested
reader is referred to the source code of the long-term behaviour recognition (LTBR)
system, which is available upon request.
1 http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/
Person close to Object at T (in a sense to be specified below), and Person has ap-
peared at some time earlier than T . The appearance fluent records the times in
which an object/person ‘appears’ and ‘disappears’. The close(A, B, D) fluent is true
when the distance between A and B is at most D. The distance between two tracked
objects/people is computed given their coordinates. Based on our empirical analysis
the distance between a person leaving an object and the object is at most 30.
An object exhibits only inactive short-term behaviour. Any other type of short-
term behaviour would imply that what is tracked is not an object. Therefore, the
short-term behaviours active, walking and running do not initiate the leaving object
fluent. In the CAVIAR videos an object carried by a person is not tracked — only
the person that carries it is tracked. The object will be tracked, that is, ‘appear’,
if and only if the person leaves it somewhere. Consequently, given axiom (5), the
leaving object behaviour will be recognised only when a person leaves an object
(see the second line of axiom (5)), not when a person carries an object.
Axiom (6) expresses the conditions in which a leaving object behaviour ceases
to be recognised. In brief, leaving object is terminated when the object in question
is picked up. exit(A) is an event that takes place when appearance(A) = disappear.
An object that is picked up by someone is no longer tracked — it ‘disappears’ —
triggering an exit event which in turn terminates leaving object.
The long-term behaviour immobile was defined in order to signify that a person
is resting in a chair or on the floor, or has fallen on the floor (fainted, for example).
Below is (a simplified version of) an axiom of the immobile definition:
active for more than 54 frames. We insist that Person in immobile(Person) has been
active or walking before being inactive in order to distinguish a left object,
which is inactive from the first time it is tracked, from an immobile person.
immobile(Person) is terminated when Person starts walking, running or ‘disap-
pears’ — see axioms (8)–(10) below:
According to axiom (11) moving is initiated when two people are walking and are
close to each other (their distance is at most 34). moving is terminated when the
people walk away from each other, that is, their distance becomes greater than 34
(see axiom (12)), when they stop moving, that is, become active (see axiom (13))
or inactive, when one of them starts running (see axiom (14)), or when one of them
‘disappears’ (see axiom (15)).
The following axioms express the conditions in which meeting is recognised:
meeting is initiated when two people ‘interact’: at least one of them is active or
inactive, the other is not running, and the distance between them is at most 25.
This interaction phase can be seen as some form of greeting (for example, a hand-
shake). meeting is terminated when the two people walk away from each other, or
one of them starts running or ‘disappears’. The axioms representing the termina-
tion of meeting are similar to axioms (12), (14) and (15). Note that meeting may
overlap with moving: two people interact and then start moving, that is, walk while
being close to each other. In general, however, there is no fixed relationship between
meeting and moving.
The axioms below present the conditions in which fighting is initiated:
Two people are assumed to be fighting if at least one of them is active or running, the
other is not inactive, and the distance between them is at most 24. We have specified
that running initiates fighting because, in the CAVIAR dataset, moving abruptly,
which is what happens during a fight, is often classified as running. fighting is ter-
minated when one of the people walks or runs away from the other, or ‘disappears’
— see axioms (20)–(22) below:
Under certain circumstances LTBR recognises both fighting and meeting — this
happens when two people are active and the distance between them is at most 24.
This problem would be resolved if the CAVIAR dataset included a short-term be-
haviour for abrupt motion, which would be used (instead of the short-term behaviour
active) to initiate fighting, but would not be used to initiate meeting.
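The initiation conditions just described can be paraphrased as simple predicates over the short-term behaviours and the distance of two tracked people. The sketch below is our own simplification, not the LTBR axioms; it uses the thresholds from the text (25 for meeting, 24 for fighting) and reproduces the meeting/fighting overlap noted above.

def initiates_meeting(beh1, beh2, dist):
    # at least one person is active or inactive, nobody runs, distance at most 25
    interacting = ("active" in (beh1, beh2)) or ("inactive" in (beh1, beh2))
    return interacting and "running" not in (beh1, beh2) and dist <= 25

def initiates_fighting(beh1, beh2, dist):
    # at least one person is active or running, nobody is inactive, distance at most 24
    abrupt = ("active" in (beh1, beh2)) or ("running" in (beh1, beh2))
    return abrupt and "inactive" not in (beh1, beh2) and dist <= 24

# two active people close to each other trigger both behaviours (the overlap above)
print(initiates_meeting("active", "active", 20), initiates_fighting("active", "active", 20))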
5 Experimental Results
LTBR recognised 9 meeting behaviours, 6 of which took place and 3 did not take
place. 2 FP concerned fighting behaviours realised by people being active and close
to each other. As mentioned in the previous section, in these cases LTBR recog-
nises both meeting and fighting. The third FP was due to the fact that two people
were active and close to each other, but were not interacting. LTBR did not recog-
nise 3 meeting behaviours. 2 FN were due to the fact that the distance between the
people in the meeting was greater than the threshold we have specified. If we in-
creased that threshold LTBR would correctly recognise these 2 meeting behaviours.
However, the number of FP for meeting would substantially increase. Therefore we
chose not to increase the threshold distance. The third FN was due to the fact that
the short-term behaviours of the people interacting — handshaking — were clas-
sified as walking (although one of them was actually active). We chose to specify
that walking does not initiate a meeting in order to avoid incorrectly recognising
meetings when people simply walk close to each other.
Regarding fighting we had 4 TP, 8 FP and 2 FN. The FP were mainly due to the
fact that when a meeting takes place LTBR often recognises the long-term behaviour
fighting (as well as meeting). LTBR did not recognise 2 fighting behaviours because
in these two cases the short-term behaviours of the people fighting were classified
as walking (recall the discussion on the recognition of moving). We chose to specify
that walking does not initiate fighting. Allowing walking to initiate fighting (pro-
vided, of course, that two people are close to each other) would substantially in-
crease the number of FP for fighting, because fighting would be recognised every
time a person walked close to another person.
6 Discussion
2 http://crs.elibel.tm.fr/
Acknowledgements We would like to thank Anastasios Skarlatidis for converting the XML rep-
resentation of the CAVIAR dataset into an Event Calculus representation.
References
1. D. Alrajeh, O. Ray, A. Russo, and S. Uchitel. Extracting requirements from scenarios with
ILP. In Inductive Logic Programming, volume LNAI 4455. Springer, 2007.
2. A. Artikis, M. Sergot, and J. Pitt. Specifying norm-governed computational societies. ACM
Transactions on Computational Logic, 10(1), 2009.
3. K. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Databases,
pages 293–322. Plenum Press, 1978.
4. C. Dousson and P. Le Maigat. Chronicle recognition improvement using temporal focusing
and hierarchisation. In Proceedings International Joint Conference on Artificial Intelligence
(IJCAI), pages 324–329, 2007.
5. M. Ghallab. On chronicles: Representation, on-line recognition and learning. In Proceedings
Conference on Principles of Knowledge Representation and Reasoning, pages 597–606, 1996.
6. R. Kowalski and M. Sergot. A logic-based calculus of events. New Generation Computing,
4(1):67–96, 1986.
7. D. Luckham. The Power of Events: An Introduction to Complex Event Processing in Dis-
tributed Enterprise Systems. Addison-Wesley, 2002.
8. R. Miller and M. Shanahan. The event calculus in a classical logic — alternative axiomatiza-
tions. Journal of Electronic Transactions on Artificial Intelligence, 4(16), 2000.
9. M. Shanahan. The event calculus explained. In M. Wooldridge and M. Veloso, editors, Artifi-
cial Intelligence Today, LNAI 1600, pages 409–430. Springer, 1999.
10. V.-T. Vu. Temporal Scenarios for Automatic Video Interpretation. PhD thesis, Université de
Nice — Sophia Antipolis, 2004.
11. V.-T. Vu, F. Brémond, and M. Thonnat. Automatic video interpretation: A novel algorithm
for temporal scenario recognition. In Proceedings International Joint Conference on Artificial
Intelligence, pages 1295–1302, 2003.
Multi-Source Causal Analysis:
Learning Bayesian Networks from
Multiple Datasets
1 Introduction
Unlike humans, who continuously and synthetically learn from their observations,
modern data-analysis fields for the most part approach learning as
single, isolated, and independent tasks. The data analyzed form a relatively
have yet to fully address, or even focus on, the problem of synthesizing such
information.
The current practice is instead for humans to serve as the means of
integrating the extracted knowledge. Researchers read the scientific litera-
ture and form inside their heads a (causal, arguably) model of the working
mechanisms of the entity they study. A conscious effort of manual knowl-
edge synthesis in biology is for example the KEGG PATHWAY database
at http://www.genome.jp/kegg/pathway.html defined as ”... a collection of
manually drawn pathway maps representing our knowledge on the molecu-
lar interaction and reaction networks ”. Obviously, the manual synthesis of
information is severely limited by our mental capacities.
At first glance, it may seem impossible that the above datasets can be
analyzed simultaneously. However, modern theories of causality are gradu-
ally making this possible. We argue that the concept of causality is funda-
mental in achieving an automated, or semi-automated, combined analysis
of different data-sources as in the above scenario. This is because causality
(a) can model the effects of actions, such as setting different experimental
conditions and sampling methodology and (b) it can be inferred by tests of
conditional independence that can be performed on different datasets. We
introduce the general problem and conceptual framework of Multi-Source
Causal Analysis (MSCA) defined as the problem of inferring and inducing
causal knowledge from multiple sources of data and knowledge. We consider
as the main components of MSCA (i) a formal representation and model-
ing of causal knowledge, (ii) algorithms for inducing causality from multiple
sources and for making causal inferences, and (iii) the ability to justify and
explain the causal inferences to human experts. The vision of MSCA is to enable
data and knowledge synthesis on a grand scale, where thousands of studies
are simultaneously analyzed to produce causal models involving large parts
of human concepts. We now present algorithms that allow the simultaneous
inference of causal knowledge from the above datasets.
We assume the readers’ familiarity with the standard Pearl [5] and Spirtes et
al. [6] causality framework based on the concept of Causal Bayesian Network
and only briefly review it. We consider the standard notions of probabilistic
causality for “X is causing Y ” and for defining direct causality. Let us con-
sider a set of random variables V. We represent the causal structure among
the variables in V with a directed acyclic graph (DAG) G with the vertexes
corresponding to the variables V; an edge X → Y exists in G if and only if
X directly causes Y relatively to V. We define a Causal Bayesian Network
(CBN) as the tuple ⟨G, P⟩, where G is a causal structure over V and P is
the joint probability distribution of the variables V. We assume that for a CBN
⟨G, P⟩ the Causal Markov Condition (CMC) holds: every variable X is probabilistically
independent of any subset of its non-effects (direct or indirect)
given its direct causes. A causal graph G is depicted in Figure 1(a).
We denote the independence of X with Y given Z as I(X; Y |Z). We also
denote the d-separation of two nodes X and Y by a subset Z as Dsep(X; Y |Z)
(see [5] for a formal definition). The d-separation criterion is a graphical
criterion that determines all the independencies in the distribution P that
are entailed by the graph and the CMC: Dsep(X; Y |Z) ⇒ I(X; Y |Z). If
¬Dsep(X; Y |Z) we say that X is d-connected to Y given Z. For a broad
class of distributions, called Faithfull distributions, the converse also holds,
i.e., I(X; Y |Z) ⇔ Dsep(X; Y |Z) named as the Faithfulness Condition (FC).
The name faithful stems from the fact that the graph faithfully represents
all and only the independencies of the distribution; another equivalent way
of expressing faithfulness is that the independencies are a function only of
the causal structure and not accidental properties derived by a fine tuning
of the distribution parameters. In Pearl’s terminology, faithful distributions
and corresponding CBNs are called stable: under small perturbations of the
distribution, the set of independencies remains the same.
A large class of causal discovery algorithms performs statistical tests in
the data to determine whether I(X; Y |Z); subsequently, since in faithful dis-
tributions this is equivalent to Dsep(X; Y |Z), the result of the test imposes
a constraint on the data-generating graph. By combining and propagating
these constraints these algorithms, named constraint-based, can determine
the causal graphs that exactly encode (are consistent with) the independen-
cies observed in the data distribution.
In practice, there are typically many latent variables. We can think of
the variables V partitioned into observed variables O and hidden variables
H: V = O ∪ H, O ∩ H = ∅. The data are sampled from the marginal PO of
the observed variables only, i.e., we can only test independencies involving
variables in O. A prototypical, constraint-based causal discovery algorithm
is the FCI [6]. The output of FCI is what is called a Partial Ancestral Graph
(PAG) containing common features of all causal graphs (including ones with
hidden variables) that could faithfully capture the marginal data distribution
PO . A PAG is shown in Figure 1(b). The edges have the following semantics¹:
• A → B means that A is a direct cause2 of B relatively to O.
• A ↔ B means that neither A directly causes B relatively to O nor vice-
versa, but A and B have a common latent cause.
1 Due to space limitations and for clarity of presentation, we do not discuss here the
possibility of selection bias in sampling the data that can be addressed with the FCI
algorithm.
2 This is a simplification for purposes of removing some technical details from the presen-
tation, not necessary for conveying the main ideas. The exact semantics of an edge is that
there is an inducing path from A into B relative to O, where the concept of inducing path
is defined in [6].
• A □−□ B, with the □ denoting the fact that there is at least one causal
graph consistent with the data where □ is replaced by an arrowhead and
at least one graph where there is no arrowhead (e.g., an edge A □−□ B
means that A → B, A ← B, and A ↔ B are all possible).
We now argue that causality could be used to make inductions about the
data-generating process of samples obtained under different experimental
conditions. This is because a causal model directly encodes the effects of ma-
nipulations of the system. Assume for example that G represents the causal
structure of the system without any intervention. Now assume that in i.i.d.
datasets {Di } a set of variables Mi is being manipulated, i.e., obtains values
set by an external agent performing an experiment. Then, the causal graph
GMi of the system under manipulations Mi is derived from G by removing
all incoming edges into any Vj ∈ Mi . The intuitive explanation is that the
value of Vj now depends only on the external agent and is subject to no other causal
influence [5, 6]. Assuming an unmanipulated graph G and known performed
manipulations {Mi }, graphs {Gi } can be constructed; the fitness of each one
to the corresponding data Di can be estimated. This in turn allows us to
estimate the overall fitness of the assumed model G to the set of datasets
{Di }. For example, the algorithm in [3] greedily searches the space of CBNs
to find the best-fitting graph G to the set of datasets. The key-point is that
unlike causality-based formalisms, correlation-based ones do not allow us to
predict the effect of manipulations Mi in order to fit standard predictive or
diagnostic models or even to perform feature selection. The algorithm in [3]
could jointly analyze Datasets 1 and 2 of the scenario in the introduction.
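A small sketch of this graph surgery, using NetworkX and an invented variable set, removes every incoming edge of a manipulated variable; the function and variable names are our own assumptions.

import networkx as nx

G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D"), ("E", "D")])   # unmanipulated G

def manipulate(G, manipulated):
    Gm = G.copy()
    for v in manipulated:
        Gm.remove_edges_from(list(Gm.in_edges(v)))   # v is now set by the experimenter
    return Gm

G_M = manipulate(G, {"C"})
print(sorted(G_M.edges()))   # [('C', 'D'), ('E', 'D')]: C no longer has causes in G_M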
are no further inferences about the joint P (A, B, C) other than the observed
correlations. Is there anything more to infer from the above datasets? Unlike
pairwise correlations, pairwise causal relations are transitive: if A is causing
B and B is causing C, then A is causing C. These and other more compli-
cated inferences allow us in some cases to induce more causal knowledge from
the combined data than from each dataset individually.
We will present an example of such inferences on the structure of the union
set of variables V = V1 ∪ V2 . The variables of each dataset are by definition
latent when we infer structure from the other dataset. Thus, we will use the
FCI algorithm. Let us assume that the true (unknown) causal structure is
the one shown in Figure 1. From the dataset over V1 we expect to observe the independencies I(A; B|∅), I(A; D|C), I(A; D|C, B), I(B; D|C), I(B; D|C, A) and only these (because the graph is assumed faithful), and from the dataset over V2 we will observe I(D; F|∅) and only this. Running FCI on each dataset independently (assuming a sample large enough that our statistical decisions about conditional independence are correct) will identify these independencies and obtain the two PAGs shown in Figure 1(b) and (c), named G1 and G2. The edges with a square at one of their end-points denote that an arrowhead may or may not substitute that end-point. For example, in Figure 1(b) the edge A □→ C denotes the fact that FCI cannot determine whether the true edge is A → C (i.e., A directly causes C) or A ↔ C (i.e., there is no direct causation between A and C, but the observed dependencies and independencies are explained by the existence of at least one common hidden ancestor H, a.k.a. a confounder, A ← H → C).
The models are informally combined as shown in Figure 1(d). An algorithm
that formalizes and automates the procedure of combining the PAGs stemming
from different datasets is presented in [7]; it was discovered independently at the
same time our group was designing a version of such an algorithm. Due
to space limitations, we only present some key inferences to illustrate our
argument. There is nothing we can infer about the causation between E and
{A, B, C}. We have no evidence for or against the existence of such causal
relations, so the corresponding edges are shown dashed in the figure. In
addition, there are CBNs that are compatible with any possible direction
of such edges. Similarly, there is no evidence for or against edges between
F and {A, B}. However, one can rule out the case that A → F because
then at least one of the paths D ← C ← A → F or D ← C ← H →
A → F (if there is a latent variable H between C and A) would exist in
the data-generating DAG. That is, there would be a d-connecting path from
F to D given the empty set, which contradicts the observed independence
I(D, F |∅). With a similar reasoning, the possibility B → F is ruled out.
Even more impressive is the inference that there is no edge between F and
C, a pair of variables that we have never measured together in the available
data! No matter what kind of edge we insert in the graph, it would create
a d-connecting path between F and D. For example, if we insert an edge F ↔ C, it would imply the path F ← H → C → D, which contradicts the observed independence I(D; F |∅).
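This style of reasoning can be checked mechanically: with an empty conditioning set, a path is d-connecting exactly when it contains no collider. The sketch below is only an illustration; the edge set standing in for Figure 1(a) is an assumption consistent with the independencies reported above.

# Sketch: does adding a candidate edge create a d-connecting path between
# F and D given the empty set?  We search for a collider-free path.
def d_connected_given_empty(edges, x, y):
    """True if x and y are d-connected given {} in the DAG `edges`
    (a set of (parent, child) pairs)."""
    nodes = {n for e in edges for n in e}
    nbrs = {n: set() for n in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)

    def is_collider(a, b, c):
        # b is a collider on the path segment a *-> b <-* c
        return (a, b) in edges and (c, b) in edges

    def dfs(path):
        last = path[-1]
        if last == y:
            return True
        for nxt in nbrs[last] - set(path):
            if len(path) >= 2 and is_collider(path[-2], last, nxt):
                continue        # a collider blocks the path when Z is empty
            if dfs(path + [nxt]):
                return True
        return False

    return dfs([x])

# Hypothetical edge set consistent with the reported independencies:
# A -> C <- B, C -> D, D -> E <- F.
true_dag = {("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "E")}
assert not d_connected_given_empty(true_dag, "F", "D")              # I(D;F|{}) holds
assert d_connected_given_empty(true_dag | {("A", "F")}, "F", "D")   # A -> F would break it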
Fig. 1 (a) Presumed true, unknown, causal structure among variables {A, B, C, D, E, F }.
(b)-(c) The causal structure identified by FCI when run on any large-enough dataset over
{A, B, C, D} and {D, E, F } respectively. (d) Informally combining all the causal knowl-
edge together and inferring new knowledge. Dashed edges are possible but have no direct evidence. In particular, notice that F cannot be a cause of A or B and that there is no edge between C and F even though they are never measured together; this is because such edges would lead to a contradiction with the observed independencies in the data of (b) and (c).
It is often the case that a common set of variables (i.e., variables that seman-
tically correspond to the same quantity) is observed in different datasets, but
for technical reasons the data cannot be pooled together in one dataset. For
example, the variables may be measured by different equipment and so it may
be hard to translate the values from all datasets to a common scale. Such
a situation is typical in gene-expression studies: for various technical rea-
sons measurements corresponding to the gene-expression of a specific gene
are not directly comparable among different studies [4]. In psychology and
social sciences different and incomparable methods may be used to measure
a quantity, such as social-economical status, degree of depression, or mental
capacity of patients. When constructing a predictive model it seems difficult
to combine the data together, without first finding a way to translate the
values to a common scale. This rules out most machine-learning methods.
However, certain inferences in constraint-based causal discovery (as in the
previous sections) are possible using only tests of conditional independence.
We now develop a multi-source test of conditional independence that em-
ploys all available data without the need to translate them first. We denote
with T (X; Y |Z) the test of conditional independence of X with Y given
Z. T(X; Y |Z) returns a p-value under the null hypothesis of independence. Constraint-based algorithms then use a threshold t, rejecting the independence if T(X; Y |Z) < t (i.e., they accept ¬I(X, Y |Z)) and accepting the independence I(X, Y |Z) otherwise. Since we cannot pool all the data together, we perform the test of independence T(X; Y |Z) individually in each available dataset Di, obtaining the p-values {pi}. Fisher's Inverse χ² test can then be used to compute a combined statistic S = −2 ∑i log pi. S follows a χ²
distribution with 2n degrees of freedom, where n is the number of datasets
contributing data to the test, from which we can obtain the combined p-
value p∗ for the test T (X; Y |Z) employing all available data. Other methods
to combine p-values exist too [4].
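A minimal sketch of this combination step (assuming SciPy is available; the function below is illustrative, not the implementation used for the experiments):

import math
from scipy.stats import chi2   # chi-square survival function

def fisher_combined_pvalue(pvalues):
    """Fisher's Inverse chi-square method: S = -2 * sum(log p_i) is compared
    against a chi-square distribution with 2n degrees of freedom."""
    s = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2.sf(s, df=2 * len(pvalues))

# p-values of the same test T(X;Y|Z) computed in three datasets (made-up numbers):
print(fisher_combined_pvalue([0.04, 0.20, 0.11]))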
An important detail of the implementation of the test is the following.
Constraint-based methods do not perform a test T (X; Y |Z) if there is not
enough statistical power. The statistical power is heuristically assumed ad-
equate when there are at least k available samples per parameter to be es-
timated in the test (typically k equals 5 or 10 [8] in single-source analysis;
in our experiments it was set to 15). When combining multiple datasets,
each dataset alone may not have enough samples to perform the test, but their combination could. So, we implemented a new rule: the test T(X; Y |Z) is performed when the total average number of samples per parameter exceeds k, i.e., ∑i ni/mi ≥ k, where ni is the number of samples of dataset i and mi the number of parameters estimated by the specific test in dataset i (these may be different if, for example, a variable takes 3 possible values in one dataset and 4 in another). When all
datasets have the same number of parameters for the test, the rule results in
testing whether n/m ≥ k as in the single-dataset case. The new multi-source
test T (X; Y |Z) can be used in any constraint-based algorithm for combining
data from different sources provided: X, Y and Z are measured simultane-
ously and the data are sampled under the same causal structure, experimental
conditions, and sampling conditions (e.g., case-control data cannot be com-
bined with i.i.d. data using this test).
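A minimal sketch of the power rule above (the function name is illustrative; k = 15 matches the experimental setting mentioned earlier):

def enough_power(samples, params, k=15):
    """samples[i] and params[i] are the sample size and the number of parameters
    the test would estimate in dataset i; the test is attempted only if True."""
    return sum(n / m for n, m in zip(samples, params)) >= k

# Neither dataset reaches k on its own, but their combination does:
print(enough_power(samples=[300, 450], params=[36, 48]))   # about 8.3 + 9.4 >= 15 -> True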
Fig. 2 Left: the Structural Hamming Distance (SHD) versus the number of datasets, when
each dataset has 500 sample cases. Right: the SHD versus the number of datasets, assuming
a fixed total sample size of 5000. The line with the circles in each graph corresponds to
datasets having no difference in the measurement of their variables. The line with the
diamonds in each graph stems from combining datasets where some non-binary variables
have been binarized by grouping together consecutive values.
We show that the concept of causality, causal theories, and recently developed algorithms allow one to combine data sources under different experimental conditions, different variable sets, or semantically similar variables to infer new knowledge about the causal structure of the domain. Omitted due to space limitations is a similar discussion about data obtained under different
sampling methods (e.g., case-control vs. i.i.d. data) and selection bias (see
[6] for a discussion). If one examines closely the basic assumptions of the al-
gorithms mentioned, causal induction as presented is based on the following
assumptions: the Causal Markov Condition, the Faithfulness Condition and
acyclicity of causal relations. Even though debatable, these assumptions are
broad, reasonable, and non-parametric (see [5, 6] for a discussion). In addition, they could be replaced by other sets of assumptions depending on the domain; e.g., if one is willing to accept linearity and multivariate normality, Structural Equation Models that deal with cyclic (a.k.a. non-recursive) networks can be employed for modeling causality.
Other work in data analysis for merging datasets exists. This includes the
field of meta-analysis [4] and multi-task learning [2]. The former mainly fo-
cuses on identifying correlations and effect sizes. It does not model causation
so it can only deal with datasets sampled using the same method over the
same experimental conditions and variables. Multi-task learning is limited to
building predictive models for different tasks with a shared representation
and input space (predictor variables) in order to extract useful common fea-
tures [2]. It cannot deal with the range of different data sources for which
causality provides the potential.
References
Abstract In this paper we present a hybrid filtering algorithm that attempts to deal
with low prediction Coverage, a problem especially present in sparse datasets. We
focus on Item HyCov, an implementation of the proposed approach that incorpo-
rates an additional User-based step to the base Item-based algorithm, in order to take
into account the possible contribution of users similar to the active user. A series of
experiments were executed, aiming to evaluate the proposed approach in terms of
Coverage and Accuracy. The results show that Item HyCov significantly improves
both performance measures, requiring no additional data and minimal modification
of existing filtering systems.
1 Introduction
caused by the fact that similar items may have different names and cannot be easily
associated. Furthermore, a filtering algorithm should be able to generate accurate
predictions, as measured by the appropriate metric of choice.
The fraction of items for which predictions can be formed over the total number
of rated items is measured by prediction Coverage [4]. It is a common occurrence that an RS will not be able to provide a prediction for specific users on specific items
because of either the sparsity in the data or other parameter restrictions, which are
set during the system’s execution. Systems with low Coverage are less valuable to
users, since they will be limited in the decisions they are able to help with. On the
other hand, RSs with high Coverage, combined with a good accuracy measure, will
correspond better to user needs.
Hybrid systems [3] combine different filtering techniques in order to produce im-
proved recommendations. Content-Boosted Collaborative Filtering [6] improves on
user-based predictions by enhancing the initial matrix of ratings through the appli-
cation of a content-based predictor. Wang et al. [12] formulate a generative prob-
abilistic framework and merge user-based predictions, item-based predictions and
predictions based on data from other but similar users rating other but similar items.
Jin et al. [5] propose a Web recommendation system which integrates collaborative
and content features under the maximum entropy principle.
In this paper, we present a hybrid approach that increases the percentage of items
for which predictions can be generated, while it can potentially improve the sys-
tem’s accuracy. The proposed algorithm combines Item-based and User-based CF
implementations in an attempt to effectively deal with the prediction coverage prob-
lem. A series of experiments were executed in order to evaluate the performance of
the proposed approach.
The rest of this paper is organized as follows: In Section 2 we describe in brief
two existing CF algorithms we built our work upon. In Section 3 we sketch the out-
line of Item HyCov, a hybrid approach. The results of two different sets of experi-
ments that compare the proposed approach with Item-based filtering are discussed
in Section 4. Finally, in Section 5 we draw the conclusions from the outcome of our
experiments and present the future work.
In this section we discuss in brief the two filtering algorithms which will be
utilized by the proposed approach, User-based Collaborative Filtering (UbCF) and
Item-based Collaborative Filtering (IbCF).
The inspiration for UbCF methods comes from the fact that people who agreed
in their subjective evaluation of past items are likely to agree again in the future [7].
The execution steps of the algorithm are (a) Data Representation of the ratings pro-
vided by m users on n items, (b) Neighborhood Formation, where the application
of the selected similarity metric leads to the construction of the active user’s neigh-
borhood, and (c) Prediction Generation, where, based on this neighborhood, pre-
dictions for items rated by the active user are produced.
IbCF is also based on the creation of neighborhoods. Yet, unlike the User-based filtering approach, these neighborhoods consist of similar items rather than similar users [8].
In this section, we present a hybrid algorithm that keeps the core implementations
of existing recommender systems and enhances them by adding a way to increase
the percentage of items for which a filtering algorithm can generate predictions. In
the following paragraphs we will describe how this general approach can be ap-
plied in the case of Item-based Collaborative Filtering, improving the coverage of
their predictions, and, depending on the various parameter settings, leading to more
accurate recommendations.
The Item HyCov Implementation
Let R be the m × n user-item ratings matrix, where element rij denotes the rating that user ui (row i of matrix R) gave to item ij (column j of matrix R).
• Step 1: Item Neighborhood Formation. The basic idea in that step is to isolate
couples of items, i j and ik , which have been rated by a common user, and apply
an appropriate metric to determine their similarity. We utilized the Adjusted Co-
sine Similarity approach, which, as shown in previous experiments [8], performs
better than Cosine-based Similarity or Correlation-based Similarity.
The formula for the Adjusted Cosine Similarity of items ij and ik is the following:

sim(ij, ik) = ∑i (rij − r̄i)(rik − r̄i) / ( √(∑i (rij − r̄i)²) · √(∑i (rik − r̄i)²) )

where rij and rik are the ratings that items ij and ik have received from user ui, while r̄i is the average of user ui's ratings. The summations over i are calculated only for those of the m users who have expressed their opinions on both items.
Based on the calculated similarities, we form item neighborhood IN, which in-
cludes the l items which share the greatest similarity with item ij. Finally, we require that a possibly high correlation between the active item and an arbitrary second item be based on an adequate number of commonly rating users, a requirement known as the Common User Threshold.
• Step 2: User Neighborhood Formation. User Neighborhood Formation is not
part of the base algorithm of IbCF. It is implemented in the proposed approach
for reasons that are explained in the following step of the procedure. The main
purpose is to create a neighborhood of users most similar to the selected active
user, ua . We achieve that by simply applying Pearson Correlation Similarity as
follows:
where simjk is the Adjusted Cosine Similarity between the active item, ij, and an item, ik, from its neighborhood, while rak is the rating awarded by the active user to ik.
However, it is probable, especially when the dataset is sparse, that the active
user hasn’t rated any of the active item’s neighbors. When that happens, the base
algorithm is unable to generate a prediction for the item in question.
The idea behind the Item HyCov algorithm is that, instead of ignoring the specific
item, and, consequently, accepting a reduction in the achieved Coverage, we can
take into consideration what users similar to the active user, as expressed by be-
longing to his neighborhood, are thinking about item i j . The proposed algorithm
implements this idea by checking whether one or more neighbors of the active
user, as calculated in the User Neighborhood Formation step, have expressed
their opinion on the neighbor items. After identifying which user neighbors have
rated the required items, the Item HyCov algorithm will utilize their ratings and
generate a prediction for the active user, ua , on the active item, i j , by applying
the following equation:
praj = ∑_{k=1}^{l} ∑_{i=1}^{p} simjk · rik / ∑_{k=1}^{l} ∑_{i=1}^{p} |simjk|   (4)
As shown in Equation 4, we generate a prediction for the active user ua by summing up the ratings of one or more of its p neighbors (∑_{i=1}^{p}) on the l items as taken from the active item's ij neighborhood (∑_{k=1}^{l}). The summations over l are calculated only for those items which have been rated by at least one of the p
neighbor users. Of course, users with zero correlation with the active user are excluded. The user ratings are weighted by the corresponding similarity, simjk, between the active item ij and the neighbor item ik, with k = 1, 2, ..., l.
The pseudo-code of the prediction step of Item HyCov is given in Figure 1; a brief code sketch is also given after this list of steps.
• Step 4: Measures of Performance. Finally, two evaluation metrics are calcu-
lated, Mean Absolute Error and Coverage [9]. Mean Absolute Error (MAE) mea-
sures the deviation of predictions generated by the RS from the true rating values,
as they were specified by the user. Coverage is computed as the fraction of items
for which a prediction was generated over the total number of items that all avail-
able users have rated in the initial user-item matrix.
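A minimal sketch of the prediction step discussed above, assuming a dict-of-dicts rating matrix and precomputed item similarities and user correlations (the names and data are illustrative and not the authors' implementation):

def hycov_predict(active_user, item_nbrs, user_nbrs, ratings):
    """Equation 4: similarity-weighted sum of neighbor-user ratings on the
    l neighbor items of the active item."""
    num = den = 0.0
    neighbors = [u for u, corr in user_nbrs[active_user] if corr > 0.0]
    for item_k, sim_jk in item_nbrs:                 # the l neighbors of the active item
        raters = [u for u in neighbors if item_k in ratings.get(u, {})]
        for u in raters:                             # neighbor users who rated item k
            num += sim_jk * ratings[u][item_k]
            den += abs(sim_jk)
    return num / den if den else None                # None: still no prediction possible

# Toy example: predict for active user "a" on an item with neighbors k1 and k2.
ratings = {"u1": {"k1": 4, "k2": 5}, "u2": {"k2": 3}}
item_nbrs = [("k1", 0.9), ("k2", 0.6)]               # adjusted-cosine similarities sim_jk
user_nbrs = {"a": [("u1", 0.8), ("u2", 0.5)]}        # Pearson correlations with user "a"
print(hycov_predict("a", item_nbrs, user_nbrs, ratings))   # 8.4 / 2.1 = 4.0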
A similar approach could be followed in the case of User-based filtering. The main
difference is that when the system cannot provide a prediction for the active item,
its neighborhood is formulated through Item-based CF, and the ratings of neighbor
users on these items are used for Prediction Generation.
4 Experimental Results
In this section we will evaluate the utility of the HyCov method. We will provide
a brief description of the various experiments we executed and then we will present
the results of these experiments.
For the execution of the subsequent experiments we utilized MovieLens, the
dataset publicly available from the GroupLens research group [1]. The MovieLens
dataset, used by several researchers [10, 2], consists of 100,000 ratings which were assigned by 943 users to 1682 movies. Ratings follow the 1 (bad) to 5 (excellent) numerical scale. The sparsity of the data set is high, at a value of 93.7%. Starting
from the initial data set, a distinct split of training (80%) and test (20%) data was
generated.
At this point, it is necessary to note that while User-based and Item-based CF
each had a couple of changing parameters (size of the user/item neighborhood and
Fig. 2 Comparing Item HyCov to IbCF for varying item neighborhood sizes in terms of (a) MAE and (b) Coverage.
Fig. 3 Comparing Item HyCov to IbCF for varying user neighborhood sizes in terms of (a) MAE and (b) Coverage.
The best MAE achieved by the hybrid approach was equal to 0.8211 for a user neighborhood of 12,
while IbCF MAE for the same parameter settings was slightly worse, at 0.8222.
5 Conclusions
This paper presented a filtering algorithm which combines the strengths of two
popular CF approaches, IbCF and UbCF, into a feature combination hybrid. Item
HyCov attempts to deal with low prediction Coverage, a problem especially present
in sparse datasets. The proposed approach was tested using the Movielens dataset
and was compared with plain Item-based CF.
The experimental results indicated that the proposed approach significantly in-
creases the prediction Coverage, with a simultaneous significant improvement of
accuracy in terms of MAE. Another advantage of the present approach is that it re-
quires no additional data and minimal additional implementation or modification of
existing CF recommender systems. However, it must be noted that the application of
the hybrid approach may increase the computational cost of the prediction process
up to O(p²l), for p neighbor users and l neighbor items. This cost may be trans-
ferred to an off-line phase if item or user data change infrequently and therefore
there is no practical need to perform these computations at prediction time.
Further research issues include the incorporation of dimensionality reduction
methods as a preprocessing step, in order to deal with scalability problems (large
number of users and/or items). Moreover, item or user demographic data could
be utilized in the neighborhood formation stage [11]. Finally, further experiments
could be carried out in order to investigate how datasets with different characteris-
tics would affect the performance of the proposed algorithm.
References
1 Introduction
Answer Set Programming (ASP) [1,2] is a powerful and elegant way for incorpo-
rating non-monotonic reasoning into logic programming (LP). Many powerful and
efficient ASP solvers such as Smodels [3,4], DLV, Cmodels, ASSAT, and NoMoRe have been successfully developed. However, these ASP solvers are re-
stricted to “grounded version of a range-restricted function-free normal programs”
since they adopt a “bottom-up” evaluation-strategy with heuristics [2]. Before an
answer set program containing predicates can be executed, it must be “grounded”;
this is usually achieved with the help of a front-end grounding tool such as Lparse
[5] which transforms a predicate ASP program into a grounded (propositional) one. Thus,
all of the current ASP solvers and their solution-strategies, in essence, work for
only propositional programs. These solution strategies are bottom-up (rather than
top-down or goal-directed) and employ intelligent heuristics (enumeration,
branch-and-bound or tableau) to reduce the search space. It was widely believed
that it is not possible to develop a goal-driven, top-down ASP Solver (i.e., similar
to a query driven Prolog engine). However, recent techniques such as Coinductive
Logic Programming (Co-LP) [6,7] have shown great promise in developing a top-
down, goal-directed strategy. In this paper, we present a goal-directed or query-
driven approach to computing the stable model of an ASP program that is based
on co-LP and coinductive SLDNF resolution [8]. We term this ASP Solver coin-
ductive ASP solver (co-ASP Solver). Our method eliminates the need for ground-
ing, allows functions, and effectively handles a large class of (possibly infinite)
answer set programs. Note that while the performance of our prototype implemen-
tation is not comparable to that of systems such as Smodels, our work is a first
step towards developing a complete method for computing queries for predicate
ASP in a top-down, goal driven manner.
The rest of the paper is organized as follows: we first give a brief overview of
Answer Set Programming, followed by an overview of coinductive logic pro-
gramming and co-SLDNF resolution (i.e., SLDNF resolution extended with coin-
duction). Next we discuss how predicate ASP can be realized using co-SLDNF.
Finally, we present some examples and results from our initial implementation.
Answer Set Programming (ASP) and its stable model semantics [1-4] has been
successfully applied to elegantly solving many problems in nonmonotonic reason-
ing and planning. Answer Set Programming (A-Prolog [1] or AnsProlog [2]) is a
declarative logic programming language. Its basic syntax is of the form:
L0 :- L1, … , Lm, not Lm+1, …, not Ln. (1)
where Li is a literal and n ≥ 0 and n ≥ m. This rule states that L0 holds if L1, … ,
Lm all hold and none of Lm+1, …, Ln hold. In the answer set interpretation [2],
these rules are interpreted to be specifying a set S of propositions called the an-
swer set. In this interpretation, rule (1) states that L0 must be in the answer set S if L1 through Lm are in S and Lm+1 through Ln are not in S. If L0 = ⊥ (or null), then the
rule-head is null (i.e., false) which forces its body to be false (a constraint rule [3]
or a headless-rule). Such a constraint rule is written as follows.
:- L1, … , Lm, not Lm+1, …, not Ln. (2)
This constraint rule forbids an answer set from simultaneously containing all of
the positive literals of the body and not containing any of the negated literals. A
constraint can also be expressed in the form:
L0 :- not L0, L1, … , Lm, not Lm+1, …, not Ln. (3)
A little thought will reveal that (3) can hold only if L0 is false, which is only possi-
ble if the conjunction L1, … , Lm, not Lm+1, …, not Ln is false. Thus, one can ob-
serve that (2) and (3) specify the same constraint.
The (stable) models of an answer set program are traditionally computed using the
Gelfond-Lifschitz method [1,2]; Smodels, NoMoRe, and DLV are some of the
well-known implementations of the Gelfond-Lifschitz method. The main diffi-
culty in the execution of answer set programs is caused by the constraint rules (of
the form (2) and (3) above). Such constraint rules force one or more of the literals
L1, … , Lm, to be false or one or more literals “Lm+1, …, Ln” to be true. Note that
“not L0” may be reached indirectly through other calls when the above rule is in-
voked in response to the call L0. Such rules are said to contain an odd-cycle in the
predicate dependency graph [9,10]. The predicate dependency graph of an answer
set program is a directed graph consisting of the nodes (the predicate symbols) and
the signed (positive or negative) edges between nodes, where using clause (1) for
illustration, a positive edge is formed from each node corresponding to Li (where
1 ≤ i ≤ m) in the body of clause (1) to its head node L0, and a negative edge is
formed from each node Lj (where m+1 ≤ j ≤ n) in the body of clause (1) to its head
node L0. Li depends evenly (oddly, resp.) on Lj if there is a path in the predicate
dependency graph from Li to Lj with an even (odd, resp.) number of negative
edges. A predicate ASP program is call-consistent if no node depends oddly on it-
self. The atom dependency graph is very similar to the predicate dependency
graph except that it uses the ground instance of the program: its nodes are the
ground atoms and its positive and negative edges are defined with the ground in-
stances of the program. A predicate ASP program is order-consistent if the de-
pendency relation of its atom dependency graph is well-founded (that is, finite
and acyclic).
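To make these notions concrete, a small illustrative sketch (not the authors' implementation) that builds the signed predicate dependency graph and tests call-consistency could look as follows:

def call_consistent(rules):
    """`rules` is a list of (head, positive_body, negative_body) triples of
    predicate symbols; returns True if no predicate depends oddly on itself."""
    edges = set()                                    # (source, target, sign), sign 1 = negative
    for head, pos, neg in rules:
        edges |= {(b, head, 0) for b in pos}
        edges |= {(b, head, 1) for b in neg}
    nodes = {n for (a, b, _) in edges for n in (a, b)}
    for start in nodes:
        seen = {(start, 0)}
        frontier = [(start, 0)]
        while frontier:                              # explore (predicate, parity) states
            node, parity = frontier.pop()
            for a, b, sign in edges:
                if a != node:
                    continue
                state = (b, (parity + sign) % 2)
                if b == start and state[1] == 1:
                    return False                     # start depends oddly on itself
                if state not in seen:
                    seen.add(state)
                    frontier.append(state)
    return True

# The "move-win" rule used later:  win(X) :- move(X,Y), not win(Y).
print(call_consistent([("win", {"move"}, {"win"})]))   # False: not call-consistent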
Note that due to the non-deterministic choice of a clause in steps (5) and (6) of co-SLDNF resolution (Definition 3.1) there may be many successful derivations for a goal G. Thus a co-SLDNF derivation may involve repeatedly expanding the goal with a program clause, starting from the initial goal G = G0 and the initial state (G0, E0, χ0+, χ0-) = (G, ∅, ∅, ∅), with Ei+1 = Eiθi+1 (and so on), and may look as follows:
Our current work is an extension of our previous work, discussed in [7], on a grounded (propositional) ASP solver to the predicate case. Our approach pos-
sesses the following advantages: First, it works with answer set programs contain-
ing first order predicates with no restrictions placed on them. Second, it eliminates
the preprocessing requirement of grounding, i.e., it directly executes the predicates
in the manner of Prolog. Our method constitutes a top-down/goal-directed/query-
oriented paradigm for executing answer set programs, a radically different alterna-
tive to current ASP solvers. We term an ASP solver realized via co-induction a coinductive ASP Solver (co-ASP Solver). The co-ASP solver’s strategy is first to
transform an ASP program into a coinductive ASP (co-ASP) program and use the
following solution-strategy:
(1) Compute the completion of the program and then execute the query goal
using co-SLDNF resolution on the completed program (this may yield a
partial model).
(2) Avoid loop-positive solution (e.g., p derived coinductively from rules
such as { p :- p. }) during co-SLDNF resolution: This is achieved during
execution by ensuring that coinductive success is allowed while exercis-
ing the coinductive hypothesis rule only if there is at least one interven-
ing call to ‘not’ in between the current call and the matching ancestor
call.
(3) Perform an integrity check on the partial model generated to account for
the constraints: Given an odd-cycle rule of the form { p :- body, not p. },
this integrity check, termed nmr_check is crafted as follows: if p is in
the answer set, then this odd-cycle rule is to be discarded. If p is not in
the answer set, then body must be false. This can be synthesized as the
condition: p ∨ not body must hold true. The integrity check (nmr_chk) synthesizes this condition for all odd-cycle rules and is appended to the query as a preprocessing step (see the sketch after this list).
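The paper reports below that nmr_check was hand-produced for the examples and lists its automatic generation as future work; purely to illustrate the condition of step (3), a generator could be sketched as follows (an assumption, not the authors' method):

def synthesize_nmr_check(odd_cycle_rules):
    """`odd_cycle_rules`: (head, body_literals) pairs taken from rules of the
    form  head :- body, not head.  Returns the nmr_check clauses as strings."""
    checks = []
    for i, (head, body) in enumerate(odd_cycle_rules, start=1):
        # nmr_chk_i succeeds exactly when rule i is violated: body holds, head does not.
        checks.append("nmr_chk%d :- %s, not %s." % (i, ", ".join(body), head))
    top = "nmr_chk :- " + ", ".join(
        "not nmr_chk%d" % i for i in range(1, len(odd_cycle_rules) + 1)) + "."
    return [top] + checks

for clause in synthesize_nmr_check([("p", ["q", "r"])]):
    print(clause)
# prints:  nmr_chk :- not nmr_chk1.
#          nmr_chk1 :- q, r, not p.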
The solution strategy outlined above has been implemented and preliminary re-
sults are reported below. Our current prototype implementation is a first attempt at
a top-down predicate ASP solver, and thus is not as efficient as current optimized
ASP solvers, SAT solvers, or Constraint Logic Programming in solving practical
problems. However, we are confident that further research will result in much
greater efficiency; indeed our future research efforts are focused on this aspect.
The main contribution of our paper is to demonstrate that top-down execution of
predicate ASP is possible with reasonable efficiency.
Theorem 4.1 (Soundness of co-ASP Solver for a program which is call-consistent
or order-consistent): Let P be a general ASP program which is call-consistent or
order-consistent. If a query Q has a successful co-ASP solution, then Q is a subset
of an answer set.
Theorem 4.2 (Completeness of co-ASP Solver for a program with a stable
model): If P is a general ASP program with a stable model M in the rational Her-
brand base of P, then a query Q consistent with M has a successful co-ASP solu-
tion (i.e., the query Q is present in the answer set corresponding to the stable
model).
We next illustrate our top-down system via some example programs and queries.
Most of the small ASP examples¹ and their queries run very fast, usually under 0.0001 CPU seconds. Our test environment is implemented on top of YAP Prolog² running under Linux in a shared environment on a dual-core AMD Opteron Processor 275 at 2 GHz with 8 GB of memory.
Our first example is “move-win,” a program that computes the winning path
in a simple game, tested successfully with various test queries (Fig 5.1). Note that
in all cases the nmr_check integrity constraint is hand-produced.
[Fig. 5.1: graph of the “move-win” rule, with nodes win(X) and move(X,Y) and a negative (not) edge.]
The “move-win” program consists of two parts: (1) facts of the form move(x,y), to allow a move from x to y, and (2) a rule { win(X) :- move(X,Y), not win(Y). } to
infer X to be a winner if there is a move from X to Y, and Y is not a winner. This
is a predicate ASP program which is not call-consistent but order-consistent, and
has two answer sets: { win(a), win(c), win(e) } and { win(b), win(c), win(e) }. Ex-
isting solvers will operate by first grounding the program using the move predi-
cates. However, our system executes the query without grounding (since the pro-
gram is order consistent, the nmr_check integrity constraint is null). Thus, in
response to the query above, we’ll get the answer set { win(a), win(c), win(e) }.
The second example is the Schur number problem for NxB (for N numbers with B
boxes). The problem is to find a combination of N numbers (consecutive integers
from 1 to N) for B boxes (consecutive integers from 1 to B) with one rule and two
constraints. The first rule states that a number X should be paired with one and
only one box Y. The first constraint states that if a number X is paired with a box
B, then double its value, X+X, should not be paired with box B. The second con-
¹ More examples and performance data can be found in our Technical Report, available from: http://www.utdallas.edu/~rkm010300/research/co-ASP.pdf
² http://www.dcc.fc.up.pt/~vsc/Yap/
straint states that if two numbers, X and Y, are paired with a box B, then their
sum, X+Y, should not be paired with the box B.
%% rules
in(X,B) :- num(X), box(B), not not_in(X,B).
not_in(X,B) :- num(X),box(B),box(BB),B ≠ BB,in(X,BB).
%% constraint rules
:- num(X), box(B), in(X,B), in(X+X,B).
:- num(X), num(Y), box(B), in(X,B), in(Y,B), in(X+Y,B).
The ASP program is then transformed to a co-ASP program (with its completed
definitions added for execution efficiency); the headless rules are transformed to
craft the nmr_check.
%% rules
in(X,B) :- num(X), box(B), not not_in(X,B).
nt(in(X,B)) :- num(X), box(B), not_in(X,B).
not_in(X,B) :- num(X),box(B),box(BB),B\==BB, in(X,BB).
nt(not_in(X,B)) :- num(X), box(B), in(X,B).
%% constraints
nmr_chk :- not nmr_chk1, not nmr_chk2.
nmr_chk1 :- num(X),box(B),in(X,B),(Y is X+X),num(Y),in(Y,B).
nmr_chk2 :- num(X),num(Y),box(B),in(X,B),in(Y,B),
(Z is X+Y), num(Z), in(Z,B).
%% query template
answer :- in(1,B1), in(2,B2), in(3,B3), in(4,B4),
in(5,B5), in(6,B6), in(7,B7), in(8,B8), in(9,B9),
in(10,B10), in(11,B11), in(12,B12).
%% Sample query: ?- answer, nmr_chk.
First, Schur 5x12 is tested with various queries which include partial solutions of various lengths I (Fig. 5.2; Table 5.1). That is, if I = 12, then the query is a test: all
12 numbers have been placed in the 5 boxes and we are merely checking that the
constraints are met. If I = 0, then the co-ASP Solver searches for solutions from
scratch (i.e., it will guess the placement of all 12 numbers in the 5 boxes provided
subject to constraints). The second case (Fig. 5.3; Table 5.2) is the general Schur BxN problem with I=0, where N ranges from 10 to 18 with B=5.
Fig. 5.2 Schur 5x12 (I=Size of the query). Fig. 5.3 Schur BxN (Query size=0).
Table 5.1 Schur 5x12 problem (I = size of the query).
Schur 5x12   I=12   I=11   I=10   I=9    I=8    I=7    I=6    I=5    I=4
CPU sec.     0.01   0.01   0.19   0.23   0.17   0.44   0.43   0.41   0.43
Table 5.2 Schur BxN problem (B=box, N=number). Query size=0, with a minor tuning.
Schur BxN 5x10 5x11 5x12 5x13 5x14 5x15 5x16 5x17 5x18
CPU sec. 0.13 0.14 0.75 0.80 0.48 4.38 23.17 24.31 130
The performance data of the current prototype system is promising but still in
need of improvement if we compare it with the performance of other existing solvers
(even after taking the cost of grounding the program into account). Our main strat-
egy for improving the performance of our current co-ASP solver is to interleave
the execution of candidate answer set generation and nmr_check. Given the query
?- goal, nmr_check, the call to goal will act as the generator of candidate answer
sets while nmr_check will act as a tester of legitimacy of the answer set. This
generation and testing has to be interleaved in the manner of constraint logic pro-
gramming to reduce the search space. Additional improvements can also be made
by improving the representation and look-up of positive and negative hypothesis
tables during co-SLDNF (e.g., using a hash table, or a trie data-structure).
In this paper we presented an execution strategy for answer set programming ex-
tended with predicates. Our execution strategy is goal-directed, in that it starts
with a query goal G and computes the (partial) answer set containing G in a man-
ner similar to SLD resolution. Our strategy is based on the recent discovery of
coinductive logic programming extended with negation as failure. We also pre-
sented results from a preliminary implementation of our top-down scheme. Our
future work is directed towards making the implementation more efficient so as to
be competitive with the state-of-the-art solvers for ASP. We are also investigating
automatic generation of the nmr_check integrity constraint. In many cases, the in-
tegrity constraint can be dynamically generated during execution when the ne-
gated call nt(p) is reached from a call p through an odd cycle.
References
1. Gelfond M, Lifschitz V (1988). The stable model semantics for logic programming. Proc.
of International Logic Programming Conference and Symposium. 1070-1080.
2. Baral C (2003). Knowledge Representation, Reasoning and Declarative Problem Solving.
Cambridge University Press.
3. Niemelä I, Simons P (1996). Efficient implementation of the well-founded and stable model semantics. Proc. JICSLP. 289-303. The MIT Press.
4. Simons P, Niemelä I, Soininen T (2002). Extending and implementing the stable model semantics. Artificial Intelligence 138(1-2):181-234.
5. Simons P, Syrjanen T (2003). SMODELS (version 2.27) and LPARSE (version 1.0.13).
http://www.tcs.hut.fi/Software/smodels/
6. Simon L, Mallya A, Bansal A, Gupta G (2006). Coinductive Logic Programming. ICLP'06.
Springer Verlag.
7. Gupta G, Bansal A, Min R et al (2007). Coinductive logic programming and its applica-
tions. Proc. ICLP'07. Springer Verlag.
8. Min R, Gupta G (2008). Negation in Coinductive Logic Programming. Technical Report.
Department of Computer Science. University of Texas at Dallas.
http://www.utdallas.edu/~rkm010300/research/co-SLDNF.pdf
9. Fages F (1994). Consistency of Clark's completion and existence of stable models. Journal
of Methods of Logic in Computer Science 1:51-60.
10. Sato T (1990). Completed logic programs and their consistency. J Logic Prog 9:33-44.
11. Kripke S (1975). Outline of a Theory of Truth. Journal of Philosophy 72:690-716.
12. Fitting, M (1985). A Kripke-Kleene semantics for logic programs. Journal of Logic Pro-
gramming 2:295-312.
13. Simon L, Bansal A, Mallya A et al (2007). Co-Logic Programming. ICALP'07.
14. Colmerauer A (1978). Prolog and Infinite Trees. In: Clark KL, Tarnlund S-A (eds) Logic
Programming. Prenum Press, New York.
15. Maher, MJ (1988). Complete Axiomatizations of the Algebras of Finite, Rational and Infi-
nite Trees. Proc. 3rd Logic in Computer Science Conference. Edinburgh, UK.
An Adaptive Resource Allocating Neuro-Fuzzy
Inference System with Sensitivity Analysis
Resource Control
1 Introduction
The guiding principle of soft computing is to exploit the tolerance for imprecision
by devising methods of computation that lead to an acceptable solution at low cost
[1]. The presented paper, following the same philosophy, proposes a methodology
that aims to reduce complexity as well as computational cost by applying a novel
resource control-via-evaluation technique. This technique leads to a dynamic and
robust network structure that abides by the demands of real time operation in non-
stationary environments and by modern requirements concerning energy saving.
Structure identification in neurofuzzy modeling is a relatively underestimated
field, in contrast to system parameter adjustment [2]. In most system design ap-
proaches, the structure, which usually implies the number of input and rule nodes,
is presumed and only parameter identification is performed to obtain the coeffi-
cients (e.g. weights) of the functional system. Even though the literature offers
some heuristic but practical and systematic methodologies, the problem of struc-
ture determination in fuzzy modeling is yet to be solved [3]. There are many is-
sues in practice that remain to be addressed; the “curse of dimensionality”, which
leads to high computational cost, and finding the optimum number of rules being
the most significant.
A common approach which addresses these problems is node pruning. Node
pruning aims to simplify a network by keeping only those nodes that contribute
the most to the general solution of the problem at hand. Various such methods
have been proposed in literature based on genetic algorithms, reinforcement learn-
ing or other soft computing techniques [4-5]. However, more often than not, most
of these concepts eliminate the less desired nodes from the network (either input
or rule nodes). This action usually leads to reduced computational cost but, on the
other hand, it sacrifices bits of information that can affect performance [6].
Our methodology attempts to tackle the problem from a different perspective.
On one hand, we propose an updated version of a resource allocating neurofuzzy
system [7], which expands its rule base dynamically during training to accommo-
date more rules to capture the given problem efficiently. On the other hand, we
apply a novel sensitivity analysis technique which aims to control and restrict rule
usage in real-time. Therefore, we combine efficiently two techniques with oppos-
ing philosophies to exploit their advantages and eliminate their individual draw-
backs through a hybrid product. The outcome is a well-balanced and possibly op-
timal rule structure. Experiments on various benchmark classification tasks show
that this dynamic and robust approach offers almost the same accuracy as a com-
plete rule base, but with a significantly lower amount of computation and less execution time.
The paper is divided in the following sections: Section 2 summarizes the archi-
tecture and functionality of the resource allocating subsethood-product neurofuzzy
model we employ, whereas in Section 3 we present our proposed methodology of
rule base usage control, which relies on the statistical tool of sensitivity analysis.
Experimental results can be found in Section 4, while useful deductions and plans
for future research conclude the paper in Section 5.
where m is the number of output nodes and yk for iteration t is given by:
yk(t) = ∑_{j=1}^{q(t)} zj ( v^c_jk + (v^σr_jk − v^σl_jk)/π ) ( v^σl_jk + v^σr_jk ) / ∑_{j=1}^{q(t)} zj ( v^σl_jk + v^σr_jk )   (3)
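A minimal NumPy sketch of this defuzzification, mirroring Eq. (3) as printed above (the function and variable names are illustrative, not from the paper):

import numpy as np

def aranfis_output(z, v_c, v_sl, v_sr):
    """z, v_c, v_sl, v_sr: arrays of shape (q,) for a single output node k."""
    width = v_sl + v_sr                              # v^sl_jk + v^sr_jk
    centre = v_c + (v_sr - v_sl) / np.pi             # v^c_jk + (v^sr_jk - v^sl_jk)/pi
    return np.sum(z * centre * width) / np.sum(z * width)

# Three rules feeding one output node (toy numbers only).
z = np.array([0.7, 0.2, 0.1])
print(aranfis_output(z, v_c=np.array([0.3, 0.6, 0.9]),
                     v_sl=np.array([0.1, 0.2, 0.1]),
                     v_sr=np.array([0.2, 0.2, 0.3])))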
Epsilon and delta define the two thresholds upon which the rule insertion con-
ditions are based. Their respective values are usually determined according to the
complexity of the given task.
Sensitivity analysis is a statistical tool that provides us with a way to estimate the
influence of a slight change in the value of one variable X on another variable Y.
In the field of neural networks it has been widely used as a method to extract the
cause and effect relationship between the inputs and outputs of a trained network
[10-12]. The input nodes that produce low sensitivity values can be regarded as in-
significant and can be pruned from the system.
Obtaining the Jacobian matrix by the calculation of the partial derivatives of
the output yk with respect to the input xi, that is ∂yk/∂xi, constitutes the analytical
version of sensitivity analysis [10]. Nevertheless, this method assumes that all in-
put variables are numeric and continuous. When the input variables are discrete or
symbolic, which is not rare in systems involving fuzzy attributes, the partial de-
rivative cannot be deemed of practical significance.
Our methodology suggests the use of the activation strength zj of rule j as our
indicator of sensitivity towards the output variable yk of the output node k. We opt
to do this for two main reasons: First, the evaluation of the system rules is as im-
portant as the evaluation of the input features and, secondly, the activation strength of a
rule is a quantitative and differentiable variable for most modern neurofuzzy sys-
tems and thus, we can apply sensitivity analysis on a wide variety of algorithms,
regardless of the data set and the nature of the input features. In this paper, we ap-
ply this concept on a trained ARANFIS network model with its fully grown rule
base.
We define Rule Sensitivity RSkj over a trained rule-output pair p as:
RS^(p)_kj = ∂yk / ∂zj   (4)
Using equation (3), Rule Sensitivity for ARANFIS becomes:
RS^(p)_kj = ( v^σl_jk + v^σr_jk ) ( v^c_jk + (v^σr_jk − v^σl_jk)/π − yk ) / ∑_{j=1}^{q(t)} zj ( v^σl_jk + v^σr_jk )   (5)
Since each training pair p produces a different sensitivity matrix, we evaluate
RS over the entire training set and obtain the average matrix:
RS_kj,avg = (1/N) ∑_{p=1}^{N} [ RS^(p)_kj ]²   (6)
where N is the total number of patterns of the given data set. To allow accurate
comparison among the variables, it is necessary to scale activation values and out-
puts to the same range using:
RS_kj,avg ← RS_kj,avg · ( max_{p=1,...,N}{ z_j^(p) } − min_{p=1,...,N}{ z_j^(p) } ) / ( max_{p=1,...,N}{ y_k^(p) } − min_{p=1,...,N}{ y_k^(p) } )   (7)
The RSavg matrix of the trained network shows, ultimately, the relation of each
rule towards each output node. However, we need the relation of each rule to-
wards the general outcome and so we define the significance Φj of rule zj over all
outputs as:
Φj = max_{k=1,...,K} { RS_kj,avg }   (8)
Our dynamic rule evaluation uses the significance Φj as a measure to classify
the rules into two subsets based on a rather heuristic approach: If the significance
Φj of a rule is larger than or equal to 1 (Φj ≥ 1), this rule belongs to the main set of rules.
Otherwise, if the significance Φj of a rule is less than 1 (Φj <1), this rule belongs
to the secondary subset of rules. During real-time operation the system first em-
ploys the main rule set which carries the most information. If the output activation
is below a confidence threshold, we add the rules of the second set, thus expand-
ing our pool of knowledge as shown in figure 2.
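A sketch of this evaluation under stated assumptions (not the authors' code; the exact form of the range scaling in Eq. (7) is assumed, and the array shapes are illustrative):

import numpy as np

def split_rule_base(rs, z, y):
    """rs: (N, K, Q) per-pattern sensitivities from Eq. (5); z: (N, Q) rule
    activations; y: (N, K) outputs.  Returns the main and secondary rule indices."""
    rs_avg = np.mean(rs ** 2, axis=0)                          # Eq. (6), shape (K, Q)
    scale = (z.max(0) - z.min(0))[None, :] / (y.max(0) - y.min(0))[:, None]
    rs_avg = rs_avg * scale                                    # Eq. (7) range scaling (assumed form)
    phi = rs_avg.max(axis=0)                                   # Eq. (8): significance of each rule
    return np.where(phi >= 1.0)[0], np.where(phi < 1.0)[0]     # main set, secondary set

# Toy shapes only: N = 50 patterns, K = 2 outputs, Q = 6 rules, random numbers.
rng = np.random.default_rng(0)
rs, z, y = rng.normal(size=(50, 2, 6)), rng.random((50, 6)), rng.random((50, 2))
print(split_rule_base(rs, z, y))

At prediction time the main set is evaluated first, and the secondary set is added only when the output activation falls below the confidence threshold.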
4 Experimental Results
5 Conclusions
In this paper we demonstrate the efficiency that can be obtained when one combines two different techniques of opposing philosophies. A novel neurofuzzy system which
relies on a well-balanced and possibly optimal network structure is proposed. Our
system is able to control its resource (rule) usage dynamically, through a confi-
dence measure, which leads to lower computational cost and a flexible structure.
The experimental study showed an improvement in resource usage of more than
30% with a minimal loss in network performance. It should also be noted that the
number of main rules, selected by our method, could be an efficient indicator con-
cerning the optimum number of hidden nodes required for a network and for a
given task.
Nevertheless, our approach in its current form can be applied only on classifi-
cation tasks due to its confidence measure. Thus, the need for an alternative crite-
rion to address regression problems is an important issue under consideration. An
interesting research topic of structure identification would also be the parallel
evaluation of input and rule nodes with an analogous dynamic control mechanism,
but this attempt certainly hides significant risks.
References
1. Zadeh L A (1994) Fuzzy logic, neural networks, and soft computing. Communica-
tions of the ACM 37:77-84
2. Jang JS R, Sun C-T, Mizutani E (1997) Neuro-Fuzzy and Soft Computing: a compu-
tational approach to learning and machine intelligence. Prentice-Hall Inc., NJ
3. Jang JS R (1994) Structure determination in fuzzy modeling: a fuzzy CART ap-
proach. Proceedings IEEE 3rd International Conf on Fuzzy Systems 1:480-485
4. Pal T, Pal NR (2003) SOGARG: A Self-Organized Genetic Algorithm-based Rule
Generation Scheme for Fuzzy Controllers. IEEE Transactions on Evolutionary Com-
putation, 7(4):397-415
5. Mitra S, Hayashi Y (2000) Neuro-Fuzzy Rule Generation: Survey in Soft Computing
Framework. IEEE Transactions on Neural Networks 11:748-768
6. Nurnberger A, Klose A, Kruse R (2000) Effects of Antecedent Pruning in Fuzzy Clas-
sification Systems. Proceedings 4th International Conference on Knowledge-Based
Intelligent Engineering Systems and Allied Technologies 1:154-157
7. Pertselakis M, Tsapatsoulis N, Kollias S, Stafylopatis A (2003) An adaptive resource
allocating neural fuzzy inference system. Proceedings IEEE 12th Intelligent Systems
Application to Power Systems (electronic proceedings)
8. Platt J (1991) A resource-allocating network for function interpolation. Neural Com-
puting 3(2):213-225
9. Velayutham CS, Kumar S (2005) Asymmetric Subsethood-Product Fuzzy Neural In-
ference System (ASuPFuNIS). IEEE Transactions on Neural Networks 16(1):160-174
10. Engelbrecht P, Cloete I, Zurada JM (1995) Determining the significance of input pa-
rameters using sensitivity analysis. In: Miral J, Sandoval F (eds) From Natural to Arti-
ficial Neural Computing, LNCS, 930:382-388
11. Zurada JM, Malinowski A, Cloete I (1994) Sensitivity Analysis for Minimization of
Input Data Dimension for Feedforward Neural Network. Proceedings IEEE Sympo-
sium on Circuits and Systems, 447-450
12. Howes P, Crook N (1999) Using Input Parameter Influences to Support the Decisions
of Feedforward Neural Networks. Neurocomputing 24(1-3):191-206
13. Cordella LP, Foggia P, Sansone C, Tortorella F, Vento M (1999) Reliability Parame-
ters to Improve Combination Strategies in Multi-Expert Systems. Pattern Analysis
and Application 2:205-214
A Lazy Approach for Machine Learning
Algorithms
Abstract Most machine learning algorithms are eager methods in the sense that a
model is generated with the complete training data set and, afterwards, this model
is used to generalize the new test instances. In this work we study the performance
of different machine learning algorithms when they are learned using a lazy ap-
proach. The idea is to build a classification model once the test instance is received
and this model will be learned using only a selection of training patterns, the most relevant ones for the test instance. The method presented here incorporates a dynamic selection of
training patterns using a weighting function. The lazy approach is applied to ma-
chine learning algorithms based on different paradigms and is validated in different
classification domains.
1 Introduction
Lazy learning methods [1, 2, 9] defer the decision of how to generalize or classify
until a new query is encountered. When the query instance is received, a set of
similar related patterns is retrieved from the available training patterns set and it is
used to classify the new instance. To select these similar patterns, a distance measure
is used having nearby points higher relevance. Lazy methods generally work by
selecting the k nearest input patterns to the query points, in terms of the Euclidean
The general idea consists of learning a classification model for each query instance
using only a selection of training patterns. A key issue of this method is to weight
the examples in relation to their distance to the query instance in such a way that the
closest examples have the highest weight. The selected examples are included one
or more times in the resulting training subset.
Next, we describe the steps of the lazy approach. Let us consider q an arbitrary
testing pattern described by an n-dimensional vector. Let X = {(xk, yk), k = 1, ..., N}
be the whole available training data set, where xk are the input attributes and yk the
corresponding class. For each new pattern q, the steps are the following:
1. The standard Euclidean distances dk from the pattern q to each input training
pattern are calculated.
2. In order to make the method independent of the magnitude of the distances, relative
distances must be used. Thus, a relative distance drk is calculated for each training
pattern: drk = dk /dmax , where dmax is the distance from the novel input pattern to
the furthest training pattern.
3. A weighting function or kernel function is used to calculate a weight for each
training pattern from its distance to the test pattern. This function is the inverse
of the relative distance drk :
K(xk) = 1 / drk ,   k = 1, ..., N   (1)
4. These values K(xk ) are normalized in such a way that the sum of them equals the
number of training patterns in X, that is:
KN(xk) = ( N / ∑_{k=1}^{N} K(xk) ) · K(xk)   (2)
5. Both the relative distance drk and the normalized weight KN(xk) are used to decide whether the k-th training pattern is selected and, in that case, how many times it is included in the training subset. They are used to generate a natural number, nk, following the next rule (a sketch of the overall selection procedure is given after this list):
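A sketch of the selection procedure, under stated assumptions: the replication rule of step 5 is not reproduced in this extract, so the radius cut-off and the rounding used below are assumptions, and the fall-back to the complete training set follows the remark in Section 3.

import numpy as np

def select_training_subset(X, y, q, radius=0.1):
    d = np.linalg.norm(X - q, axis=1)                # step 1: Euclidean distances
    dr = d / d.max()                                 # step 2: relative distances
    k = 1.0 / np.maximum(dr, 1e-12)                  # step 3: kernel K(x_k) = 1/dr_k
    kn = len(X) * k / k.sum()                        # step 4: normalised weights K_N(x_k)
    n = np.where(dr <= radius, np.round(kn), 0).astype(int)   # assumed step-5 rule
    rows = np.repeat(np.arange(len(X)), n)           # pattern k is replicated n_k times
    if len(rows) == 0:                               # nothing selected: use all data
        return X, y
    return X[rows], y[rows]

# Toy data: 20 random 2-D training patterns and one query point.
rng = np.random.default_rng(1)
X, y = rng.random((20, 2)), rng.integers(0, 2, 20)
Xs, ys = select_training_subset(X, y, q=np.array([0.5, 0.5]), radius=0.3)
print(len(Xs), "training patterns selected (with repetitions)")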
3 Experimental Results
In this paper we have applied the proposed lazy method to five domains from the UCI Machine Learning Repository¹: Bupa, Diabetes, Glass, Vehicle, and Balance.
1 http://archive.ics.uci.edu/ml/
All of them are classification domains with numerical attributes, although discrete
attributes could also be used with the appropriate distance. Also, different MLAs
have been chosen as the base algorithm. Although the lazy method can be applied
to any MLA, in this work we have used an algorithm based on trees, C4.5 [7]; an al-
gorithm based on rules, PART [7]; an algorithm based on functions approximations,
Support Vector Machines [8]; and an algorithm based on probabilities, NaiveBayes
[6].
The experiments were performed using the WEKA software package [10] that in-
cludes implementations of the classifiers mentioned before: J48 (a variant of C4.5),
PART, SMO (an implementation of SVM) and NaiveBayes algorithm. The results
for eager or traditional versions of MLAs are obtained directly with WEKA using
for each classifier the default parameters provided by the tool.
The lazy method studied in this paper is implemented and incorporated in the
WEKA Software. Thus, the comparison of eager and lazy versions is possible be-
cause the implementation and parameters of the base algorithms are identical in both
eager and lazy approaches.
In all the experiments the attributes values have been normalized to the [0, 1]
interval. For every domain and every MLA we performed 10 runs using 10-fold
cross-validation, which involves a total of 100 runs. The success rate on validation
data is averaged over the total number of runs.
When the lazy approach is applied, the relative radius is set as a parameter. In the cases where no training patterns are selected, due to the specific characteristics of the data space and the value of the radius, the lazy approach uses the complete training data.
Table 1 displays, for the different classification domains, the average success rate on validation data of the classifiers using the traditional (eager) approach and the lazy approach studied in this work. In most domains and with most MLAs, the lazy approach is better than the eager version of the algorithms. Only in a few cases is the performance of the lazy approach similar to that of the eager version, but it is never worse. For instance, in the Diabetes domain the performance of the lazy approach is equal to that of the eager one for all the classification algorithms. This also happens for some classifiers in the other domains (the Glass domain using J48, the Vehicle domain using PART, Balance using NaiveBayes). However, in most cases the lazy approach provides a very important improvement.
When the performance of the MLA is poor, the lazy approach achieves an improvement of more than 10%. For instance, this can be observed for the lazy versions of SVM and NaiveBayes in the Bupa and Glass domains, or for the lazy version of J48 in the Balance domain.
Comparing both the eager and the lazy versions of all the algorithms, it is interesting to note that the best result in the Bupa, Glass and Vehicle domains is obtained by the lazy version of one of the algorithms. In Table 1 the best classification rate for each domain is marked in bold. For the Bupa domain the best result is 68.90%, for the Glass domain 74.20%, for Vehicle 77.78%, all of them obtained by the lazy version of one of the algorithms. For the Diabetes and Balance domains, the results obtained by both the eager and the lazy approaches are the same.
For the lazy version of the MLAs we have carried out experiments with different radius values for each domain and each classification algorithm. The classification rates displayed in the second column of Table 1 correspond to the radius value that provided the best performance. We have observed that each domain could need a different radius value, because it depends on how the data are distributed in the input space. We have also observed that in some domains (Diabetes, Glass and Vehicle) the most appropriate radius value is the same, independently of the MLA used as the base algorithm. However, in the Bupa domain the most appropriate radius for J48 and PART is 0.05, whereas for SVM and NaiveBayes it is 0.2. This also happens in the Balance domain, where the best result obtained by the lazy versions of J48 and PART corresponds to a radius value of 0.2; conversely, the SVM and NaiveBayes algorithms need a value of 0.1 to obtain the best rate. Certainly, the radius value is a parameter of the method. Each MLA might require a different number of training examples (which implies a different radius value) due to the different paradigms these methods are based on.
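A hedged sketch of how such a per-domain, per-classifier radius could be selected is given below; evaluate_lazy is a hypothetical helper that would return the cross-validated success rate of the lazy version for a given radius, and the candidate values merely echo those mentioned above.

def best_radius(evaluate_lazy, candidates=(0.05, 0.1, 0.2, 0.3)):
    """Return the radius with the highest validation success rate, plus all scores."""
    scores = {r: evaluate_lazy(r) for r in candidates}
    return max(scores, key=scores.get), scores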
4 Conclusions
Most MLAs are eager learning methods because they build a model using the whole
training data set and then this model is used to classify all the new query instances.
The built model is completely independent of the new query instances. Lazy meth-
ods work in a different way: when a new query instance needs to be classified, a set of similar patterns from the available pattern set is selected. The selected patterns are used to classify the new instance. Sometimes eager approaches can lead to poor generalization because the training data are not evenly distributed in the input space, and a lazy approach could improve the generalization results.
In this paper, we present a lazy method that can be applied to any MLA. In order to validate the method, we have applied it to some well-known UCI domains (Bupa, Diabetes, Glass, Vehicle and Balance Scale) using classification algorithms based on different paradigms, specifically the C4.5, PART, Support Vector Machines and NaiveBayes algorithms. The results show that a lazy approach can reach better generalization properties. It is interesting to note that the lazy approaches are never outperformed by the eager versions of the algorithms. In the Bupa, Glass and Vehicle domains the best results are obtained by the lazy version of one of the algorithms. In the Diabetes and Balance domains the best results are obtained by both the eager and the lazy version of a specific algorithm. In some cases, when the eager versions of the algorithms have poor performance, the lazy versions obtain a significant improvement.
Acknowledgements This article has been financed by the Spanish MEC-funded research projects OPLINK:UC3M (Ref: TIN2005-08818-C04-02) and MSTAR:UC3M (Ref: TIN2008-06491-C04-03).
References
1. Aha D.W., Kibler D., Albert M.: Instance-based learning algorithms. Machine Learning,
6:37–66 (1991).
2. Atkeson C.G., Moore A.W., Schaal S.: Locally weighted learning. Artificial Intelligence
Review, 11:11–73 (1997).
3. Dasarathy, B.: Nearest neighbour (NN) norms: NN pattern classification techniques. IEEE
Computer Society Press (1991).
4. Galvan I.M., Isasi P. , Aler R., Valls, J.M.: A selective learning method to improve the gener-
alization of multilayer feedforward neural networks. International Journal of Neural Systems,
11:167–157 (2001).
5. Valls J.M., Galvan I.M., Isasi P.: Lrbnn: A lazy radial basis neural network model. Journal
AI Communications, 20(2):71–86 (2007).
6. Langley P., Iba W., Thompson, K.: An analysis of bayesian classifiers. In National Conference
on Artificial Intelligence (1992).
7. Quinlan R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993).
8. Vapnik V.: Statistical Learning Theory. John Wiley and Sons (1998).
9. Wettschereck D., Aha D.W., Mohri T.: A review and empirical evaluation of feature weighting
methods for a class of lazy learning algorithms. Artificial Intelligence Review, 11:273–314
(1997).
10. Witten I., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan
Kaufmann (2005)
TELIOS: A Tool for the Automatic Generation
of Logic Programming Machines
Abstract In this paper the tool TELIOS is presented for the automatic generation of a hardware machine corresponding to a given logic program. The machine is implemented on an FPGA, where a corresponding inference machine is created in application-specific hardware, based on a BNF parser, to carry out the inference mechanism. The unification mechanism is based on actions embedded between the non-terminal symbols and implemented using special modules on the FPGA.
1 Introduction
Dimopoulos, A.C., Pavlatos, C. and Papakonstantinou, G., 2009, in IFIP International Federation for
Information Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds.
Iliadis, L., Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 523–528.
which are considerably more efficient compared to the traditional ones. Some efforts have been made in the past towards this direction [4], [11], [6]. In [4] a hardware parser was presented based on the CYK parsing algorithm. In [11] another hardware parser was presented, based on Earley's parallel algorithm [7]. Both parsers have been implemented using FPGAs. In [6] an approach similar to the one proposed here was presented. Nevertheless, the unification mechanism was implemented using softcore general-purpose on-chip processors, hence drastically reducing the speed-up obtained by using the hardware parser.
In this paper the tool TELIOS (Tool for the automatic gEneration of LogIc prOgramming machineS) is presented. The user describes his logic program in a subset of PROLOG and the system generates the necessary code to be downloaded to an FPGA (Field Programmable Gate Array). This FPGA is the machine for this specific logic program. The proposed implementation follows the architecture shown in Fig. 1. The given logic program can be transformed into an equivalent grammar, which feeds the proposed architecture so that the different components can be constructed.
The contributions of this paper are:
1. The modification of the hardware parser of [11], so that it can be used for logic programming applications. It is noted that the parser of [11] is two orders of magnitude faster than the one used in [6].
2. The (automatic) mapping of the unification mechanism to actions, which are easily implementable in FPGAs. To the best of the authors' knowledge, this is the first effort to implement logic programs on FPGAs without the use of an external real processor or a softcore one on the same chip.
2 Theoretical Background
Attribute Grammars (AG) [8] have been extensively used for logic programming
applications [10], [5], [9]. The basic concepts for transforming a logic program to
an equivalent AG are the following: Every inference rule in the initial logic program
can be transformed to an equivalent syntax rule consisting solely of non-terminal
symbols. Obviously, parsing is degenerate since there are no terminal symbols. For
every variable existing in the initial predicates, two attributes are attached to the
corresponding node of the syntax tree, one synthesized and one inherited. Those
attributes assist in the unification process of the inference engine. For more details
the reader is referred to [10], [9]. The computing power required for the transformation of logic programs to AGs is that of L-attributed AGs [8]. In these grammars the attributes can be evaluated by traversing the parse tree from left to right in one pass.
In this paper it is shown that L-attributed AGs are equivalent to "action" grammars, which are introduced in this paper because of their easy implementation in hardware. Hence, we can transform a logic program into an equivalent action grammar. Action grammars are defined in this paper as BNF grammars augmented with "actions". Actions are routines which are executed before and after the recognition of an input substring corresponding to a non-terminal. In the rule <NT> ::= ... [Ai] <NTi> ... <NTj> [Aj], the actions to be taken are the execution of the routine Ai before recognizing the non-terminal NTi and the execution of the routine Aj after the recognition of the non-terminal NTj. The execution of Ai and Aj takes place after the generation of all possible parse trees. In the case of Earley's algorithm this is done in parallel, so that at the end of the parsing process all possible parse trees are available.
As stated before, it will be shown here that action grammars are equivalent to L-attributed grammars. For this purpose, some rules must be applied: 1) For each attribute (synthesized or inherited) a stack is defined, having the same name as the attribute. 2) At the end of each rule, the synthesized attributes of the descendants (children nodes) of the non-terminal on the left-hand side of the rule (parent node) are unstacked. These synthesized attributes are at the top of the stack. The synthesized attribute of the parent node is calculated according to the corresponding semantic rule and is pushed onto the appropriate stack, as shown in Fig. 2a. In this way it is guaranteed that the synthesized attributes of the children nodes of the parent (up to the corresponding child) are placed in sequence at the top of the stack. 3) Regarding the inherited attributes: a) A push is done on the corresponding stack of the inherited attribute the first time it is evaluated (produced). A pop is done at the time the inherited attribute is needed (consumed), as in Fig. 2b. b) If in a rule an inherited attribute is used in more than one child non-terminal (as in Fig. 2b), then the same number of pushes of that attribute should be done. c) If a value transfer semantic rule (for the same attribute) is needed in the AG, then no action is required for inherited and synthesized attributes (as in Fig. 2b). In Fig. 2, i is an inherited attribute, s a synthesized one, xi are auxiliary (temporary) variables, and the arrows indicate attribute dependencies.
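A minimal Python sketch of this stack discipline is given below, assuming one synthesized attribute s and one inherited attribute i; the function names and the combine argument are illustrative only, since in TELIOS the actions are generated automatically and executed by hardware modules.

s_stack, i_stack = [], []                  # rule 1: one stack per attribute

def produce_inherited(value):
    i_stack.append(value)                  # rule 3a: push when the inherited attribute is produced

def consume_inherited():
    return i_stack.pop()                   # rule 3a: pop when it is consumed

def end_of_rule(n_children, combine):
    # rule 2: pop the children's synthesized attributes (topmost on the stack),
    # evaluate the parent's synthesized attribute and push it back.
    children = [s_stack.pop() for _ in range(n_children)]
    s_stack.append(combine(children))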
The rules described above will be further clarified with the example that follows.
3 An Illustrative Example
A7 , A2 for the two parse trees) will leave at the top of the stack stack2s the values
“j”, “b” respectively. The predicate names have been abbreviated.
4 Implementation
Chiang & Fu [3] parallelized Earley's parsing algorithm [7], introducing a new operator ⊗, and proposed a new architecture which requires n·(n+1)/2 processing elements (PEs) for computing the parse table. A new combinational circuit was proposed in [11] for the implementation of the ⊗ operator. In this paper the parsing algorithm of [11] has been modified in order to compute the elements of the parse table PT using only n processing elements, each of which handles the cells belonging to the same column of the PT.
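The following sketch is given only to illustrate the counting and the column-wise assignment (the real design is a hardware circuit); it maps every cell (i, j) of the upper-triangular parse table to the processing element of its column:

def cells_per_pe(n):
    """One PE per column j handles the j+1 cells (0, j) .. (j, j) of that column."""
    assignment = {pe: [] for pe in range(n)}
    for j in range(n):                     # column index doubles as PE index
        for i in range(j + 1):
            assignment[j].append((i, j))
    return assignment

# For n = 4 the table has n*(n+1)/2 = 10 cells, but only n = 4 PEs are needed.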
It is obvious that since parsing is top-down, when recursion occurs and no input string is used (the empty string is the input string), we may have infinite creation of dotted recursion rules in the boxes. Hence, we have to predefine the maximum recursion depth as well as the maximum number of input characters (d characters) in the input string, as installation parameters. The unification mechanism has been implemented through actions. The parse trees are constructed from the information provided by the parser. Actions are identified in the Action Identification module and executed in the Action Execution module (Fig. 1).
The system TELIOS has been implemented in synthesizable Verilog in the XILINX ISE 8.21 environment, while the generated source has been simulated for validation, synthesized and tested on a Xilinx SPARTAN 3E FPGA. Furthermore, it has been tested against the hardware examples we could find in the bibliography, and in all cases our system runs faster. In the case of the well-documented "Wumpus World Game" and of finding paths in a directed acyclic graph [6], our system was two orders of magnitude faster than that of [6].
The system is very useful in cases where rapid development of small-scale intelligent embedded hardware is needed in special-purpose applications, locally in dangerous areas, in robotics, in intelligent sensor networks, etc. The system in its present form accepts a subset of PROLOG, e.g. only variables and constants as parameters of the predicates. Nevertheless, since we have shown the equivalence of L-attributed grammars with action grammars, and L-attributed grammars can cover many other characteristics of PROLOG [9] (e.g. functors), it is straightforward to extend our system. Future work aims at: 1) Combining the modules of parsing and action execution into one module, so that parsing will be completely semantically driven. This will solve the recursion problem in a more efficient way. 2) Extending the power of the grammar from L-attributed grammars to multi-pass ones. 3) Applying the tool in medical applications. 4) Extending the PROLOG subset used in this paper.
Acknowledgements This work has been funded by the project PENED 2003. This project is
part of the OPERATIONAL PROGRAMME ”COMPETITIVENESS” and is co-funded by the
European Social Fund (80%) and National Resources (20%).
References
1 Introduction
Tsikolidaki, V., Pelekis, N. and Theodoridis, Y., 2009, in IFIP International Federation for Information
Processing, Volume 296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L.,
Vlahavas, I., Bramer, M.; (Boston: Springer), pp. 529–534.
for the automatic production of the knowledge base of an FRBS, which encodes the expert knowledge in the form of fuzzy rules.
In this paper we propose GF-Miner, a genetic fuzzy classifier that is based on a fuzzy rule-based system for numerical data, namely Fuzzy Miner [4]. Fuzzy Miner implements a heuristic fuzzy method for the classification of numerical data. GF-Miner adopts an improved unsupervised clustering method to achieve a more natural fuzzy partitioning of the input space. Furthermore, a genetic process is devised that removes needless rules from the initial set of rules and at the same time refines the rulebase by eliminating unnecessary terms from the fuzzy if-then rules. As such, GF-Miner optimizes not only the number of the produced rules but also their size, constructing small rules of different sizes which are more comprehensible to the user and obtain higher classification accuracy.
The two most closely related works to the proposed approach are the following. In [2] the authors create a rule base and optimize it by using two GAs. The first GA constructs a rulebase of a fixed, user-defined size. The number of fuzzy sets is static for every variable. The second GA reduces the fuzzy rule base by using the following fitness function: FitnessFunction(Ci) = NPC(Ci) * (L − NR(Ci)), where L is the number of rules generated in the previous stage, NPC(Ci) is the number of Patterns Correctly Classified, and NR(Ci) is the number of active rules.
In [3] the authors propose the construction of chromosomes from already constructed fuzzy rules. The variables are also partitioned into a predefined number of fuzzy sets. The genetic algorithm encodes the weights of the attributes in the fuzzy rules; every chromosome consists of real numbers, namely the weights of the attributes in the fuzzy rules. The fitness function is the classification accuracy rate.
In contrast to the above approaches, our work differs in three major points. First, GF-Miner allows the analyst to specify a different number of fuzzy sets per input variable. Second, the number of generated rules is optimized by the GA and is not predetermined by the user. Third, a more efficient fitness function is adopted which, as shown in the experiments, obtains better results in terms of classification accuracy while maintaining a smaller rulebase.
Outlining the major issues that will be addressed in this paper, our main contri-
butions are: a) Fuzzy Miner [4] is improved by proposing a new fuzzy partitioning
method of the input space utilizing the Fuzzy C-Means (FCM) clustering algo-
rithm [5], b) a genetic algorithm (i.e. GASR) is appropriately devised for the reduc-
tion of the size of the rules that are produced by the improved Fuzzy Miner, c) the
number of the rules in the rulebase is reduced by the use of a second genetic algo-
rithm (i.e. GANR) and d) the efficiency of our approach is demonstrated through an
extensive experimental evaluation using the IRIS benchmark dataset.
The rest of the paper is structured as follows. Section 2 introduces the GF-
Miner, describes its architectural aspects and the proposed fuzzy partitioning
method. The proposed genetic process is described in Section 3, while Section 4
presents the results of our experimental study and evaluates our system. Finally,
Section 5 provides the conclusions of the paper and some interesting research di-
rections.
2 GF-Miner as an Extension of Fuzzy Miner
GF-Miner is based on Fuzzy Miner [4]. More specifically, it extends Fuzzy Miner
by incorporating a more flexible and unsupervised way to partition the input space
into fuzzy sets and improves it by optimizing its output with the use of GA.
Fuzzy Miner is composed of four principal components: a fuzzification interface, a
knowledge base, a decision-making logic and a defuzzification interface. In GF-
Miner we adapt the fuzzy partition in the database, and we optimize the rulebase
of Fuzzy Miner (i.e. 1st level rulebase) in two different ways (i.e. 2nd and 3rd level
rulebase). The architecture of GF-Miner is shown in Figure 1, while we elaborate
on each of the components in the current and the following section.
(Figure 1: the architecture of GF-Miner: the non-fuzzy input enters the fuzzification interface; the knowledge base holds the database and the 1st level rulebase, which is refined by GASR into the 2nd level rulebase and by GANR into the 3rd level rulebase; the defuzzification interface yields the non-fuzzy output.)
The fuzzification interface performs a mapping that converts crisp values of in-
put variables into fuzzy singletons. On the other end, the defuzzification interface
performs a mapping from the fuzzy output of a FRBS to a crisp output.
Knowledge base: The knowledge base consists of a database and a rulebase.
Database - There are two factors that determine a database, i.e., a fuzzy parti-
tion of the input space and the membership functions of antecedent fuzzy sets.
GF-Miner supports two types of membership functions, i.e. triangular and trape-
zoidal. The Fuzzy Partition divides the input and output spaces into a sequence of fuzzy sets. In GF-Miner we use an unsupervised way to define the membership functions, by means of the FCM clustering algorithm [5]. In Table 1 we show how we use the cluster centroids Vi = {V1, V2, ..., VKi}, where Ki is the number of fuzzy sets for the i-th input variable xi, to determine the parameters for every fuzzy set:
Table 1. The next fuzzy sets: for j = 1 to Ki − 2, the centroids Vj, Vj+1, Vj+2 give the parameters Vj, Vj+1 − (Vj+1 − Vj)/3, Vj+1 + (Vj+2 − Vj+1)/3, Vj+2.
Figure 2 depicts schematically the above fuzzy partition for a single variable.
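A small Python sketch of the rule in Table 1 is given below for the intermediate fuzzy sets of one variable; the handling of the first and last fuzzy sets, and the FCM run that produces the sorted centroids, are assumed to happen elsewhere and are not part of this sketch.

def intermediate_fuzzy_sets(V):
    """V: sorted FCM centroids V1..VKi of one input variable.
    Returns (a, b, c, d) parameter tuples for the intermediate fuzzy sets."""
    sets = []
    for j in range(len(V) - 2):            # corresponds to j = 1 .. Ki-2 in Table 1
        a = V[j]
        b = V[j + 1] - (V[j + 1] - V[j]) / 3.0
        c = V[j + 1] + (V[j + 2] - V[j + 1]) / 3.0
        d = V[j + 2]
        sets.append((a, b, c, d))
    return sets

# Example: intermediate_fuzzy_sets([0.1, 0.4, 0.7, 0.9]) yields two parameter tuples.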
The rule base and the decision-making logic are the same as in Fuzzy Miner [4].
In this section we present two GAs that we devise with the goal of optimizing the rules created so far. In detail, the GAs are used to reduce the number and the size of the rules. GASR reduces the size of the rules in the initial rulebase without eliminating any of them, while GANR reduces the number of the rules that remain after the GASR algorithm has been applied.
The GASR algorithm
Individual representation: The chromosome of an individual represents the antecedent part of the rules. The consequent part of a rule does not need to be coded and is the same as the consequent part of the corresponding rule. Let k be the number of rules generated in the previous stage. Then a chromosome is composed of k genes, where each gene corresponds to a rule. Each gene i is partitioned into n binary fields, where n is the number of input variables. A field is set to 0 when the specific input variable is not important for the rule.
The GASR proceeds as follows:
Initial Population: It is generated randomly, and we additionally introduce a chromosome that represents all rules in full, that is, all genes receive the value 1.
Fitness Function: It is the number of patterns correctly classified by the fuzzy rule base coded in the corresponding chromosome Ci:
FitnessFunction(Ci) = NPC(Ci), where NPC(Ci) is the number of Patterns Correctly Classified.
Genetic Operators: For selection we use tournament selection and non-overlapping populations. Furthermore, we use uniform crossover, because this crossover operation does not take into account the position of every gene and the changes are made randomly. The mutation is done randomly according to the mutation probability and transforms 0 to 1 or 1 to 0. As stopping condition we use a maximum number of generations m. The new rule base is represented by the chromosome with the best fitness value in the last generation.
Consequently, this GA removes rule conditions that are not important, so that their absence does not affect the number of patterns correctly classified.
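A hedged Python sketch of the GASR encoding and fitness follows (the paper's implementation uses GAlib in C++; the rule representation and the classify_with_rules helper below are hypothetical):

import random

def random_individual(k_rules, n_vars):
    """k genes, one per rule; each gene holds n binary fields (1 = keep the condition)."""
    return [[random.randint(0, 1) for _ in range(n_vars)] for _ in range(k_rules)]

def gasr_fitness(individual, rulebase, X_train, y_train, classify_with_rules):
    """Fitness = NPC(Ci): patterns correctly classified by the masked rulebase."""
    masked = [
        {var: cond for var, cond in rule.items() if individual[r][var] == 1}
        for r, rule in enumerate(rulebase)          # rules as {variable index: condition} dicts
    ]
    predicted = classify_with_rules(masked, X_train)
    return sum(p == t for p, t in zip(predicted, y_train))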
The GANR algorithm
The fuzzy rule base created so far may contain redundant and/or unnecessary rules; the aim of GANR is to eliminate some of them, keeping in mind that besides compactness the final rule base should continue to give high classification rates. In this genetic algorithm each individual encodes a set of prediction rules. The chromosomes are coded as sequences of binary digits with the same length as the number of rules generated in the previous stage. Each gene is associated with one rule. If the binary digit is 1 the rule is active and the rule associated with this gene will be in the final rule base; otherwise it will not. The crossover, mutation and stopping condition are the same as those used in GASR. We use tournament selection and overlapping populations to ensure the best possible result, as in this case the best chromosomes of every generation are carried over to the next generation. As fitness function we use the following:
FitnessFunction(Ci) = CR(Ci)^2 * (L − NR(Ci) + 1), where CR(Ci) is the Classification Rate, L is the number of rules generated in the previous stage and NR(Ci) is the number of active rules. As a result we make sure that we keep a high Classification Rate while decreasing the number of active rules.
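A short sketch of this fitness, under the same hypothetical rule representation as above and with classification_rate as an assumed helper returning CR(Ci), could read:

def ganr_fitness(chromosome, rulebase, X_train, y_train, classification_rate):
    """Fitness = CR(Ci)^2 * (L - NR(Ci) + 1)."""
    active = [rule for gene, rule in zip(chromosome, rulebase) if gene == 1]
    L = len(rulebase)                      # rules generated in the previous stage
    NR = len(active)                       # number of active rules
    CR = classification_rate(active, X_train, y_train)
    return CR ** 2 * (L - NR + 1)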
4 Evaluation of GF-MINER
We implemented the proposed method in C++ using Microsoft Visual Studio 6.0. The GAs were implemented using GAlib, an object-oriented library of GA components [6]. The aim of our evaluation is twofold: on the one hand, we compare the classification accuracy of GF-Miner with that of the initial FRBS Fuzzy Miner [4], while on the other hand, we compare it with two state-of-the-art genetically optimized approaches, namely [2] and [3], which are the most closely related to our approach. We used the IRIS dataset obtained from the UCI repository of machine learning databases [7], which consists of 50 samples from each of 3 species of Iris flowers (setosa, virginica and versicolor); 4 features were measured from each sample: the length and the width of the sepal and the petal.
The data set was partitioned randomly into training and test subsets in two ways:
Partition A (70%): 105 instances for training and 45 instances for test.
Partition B (50%): 75 instances for training and 75 instances for test.
We used three fuzzy sets for every variable with triangular membership functions, in order to be comparable with [3] and [2], where the authors also use triangular membership functions with three fuzzy sets. The classification rate and the number of rules are calculated as averages over 10 runs. We also show the minimum and maximum values achieved for each partition.
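A rough sketch of this protocol is shown below; scikit-learn is used only to load and split the data, the stratified split is our assumption, and classify stands for a hypothetical wrapper around GF-Miner returning predictions for the test set.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

def average_rate(train_fraction, classify, runs=10):
    rates = []
    for run in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_fraction, stratify=y, random_state=run)
        rates.append(np.mean(classify(X_tr, y_tr, X_te) == y_te))
    return np.mean(rates), np.min(rates), np.max(rates)

# Partition A: average_rate(0.70, classify); Partition B: average_rate(0.50, classify)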
The experimental results in Table 2 show that our approach presents better results than the other approaches, as we achieve a higher classification rate while still generating few rules. We observe that Fuzzy Miner has a high classification rate but constructs many rules. As an improvement we see that GF-Miner
increases the classification rate and at the same time dramatically reduces the number of rules.
Table 2. Experimental Results
In this paper we propose GF-Miner, a genetic fuzzy classifier that improves Fuzzy Miner [4], a recently proposed state-of-the-art FRBS for numerical data. More specifically, we used the FCM clustering algorithm to achieve a more natural definition of the membership functions of the fuzzy partition, while the extracted rules are optimized, as far as the volume of the rulebase and the size of each rule are concerned, using two appropriately designed genetic algorithms. As future work we plan to evaluate GF-Miner using high-dimensional datasets. Another direction will be to further improve the genetic algorithms to minimize their computational cost.
References
1. Cordón O, Gomide F, Herrera F, Hoffmann F, Magdalena L (2004) Ten years of genetic
fuzzy systems: Current framework and new trends, Fuzzy Sets and Systems 41:5-31
2. Castro P, Camargo H (2005) Improving the genetic optimization of fuzzy rule base by im-
posing a constraint condition on the number of rules, V Artificial Intelligence National
Meeting (ENIA), São Leopoldo, Rio Grande de Sul 972-981
3. Chen SM, Lin HL (2006) Generating weighted fuzzy rules from training instances using
genetic algorithms to handle the Iris data classification problem. Journal of Information
Science and Engineering 22: 175-188
4. Pelekis N, Theodoulidis B, Kopanakis I, Theodoridis Y (2005) Fuzzy Miner: Extracting
Fuzzy Rules from Numerical Patterns. International Journal of Data Warehousing and Min-
ing 57-81
5. Bezdek JC (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum
Press, New York
6. Wall M (1996) GAlib: A C++ Library of Genetic Algorithm Components, version 2.4,
Documentation Revision B, Massachusetts Institute of Technology.
http://lancet.mit.edu/ga/. Accessed January 19 2009
7. Blake CL, Merz CJ, (1998) UCI Repository of machine learning databases, Irvine, Univer-
sity of California, Department of Information and Computer Science.
http://www.ics.uci.edu/~mlearn/MLRepository.html. Accessed January 19 2009
Fuzzy Dependencies between Preparedness and
Learning Outcome
S. Encheva¹, S. Tumin²
¹ Stord/Haugesund University College, Bjørnsonsg. 45, 5528 Haugesund, Norway, [email protected]
² IT Dept., University of Bergen, PO Box 7800, 5020 Bergen, Norway, [email protected]
1 Introduction
Learning objects are the core concept in an approach to learning content in which content is broken down into "bite-size" chunks. These chunks can be reused, independently created and maintained, and pulled apart and stuck together like so many Legos [25].
Learning technology systems and interoperability standards that provide reuse of learning objects and interoperability of content across delivery systems are developed by [21], [22], and [23].
SCORM [22] provides technical standards that enable web-based learning sys-
tems to find, import, share, reuse, and export learning content in a standardized
way. However, SCORM is written for toolmakers who know what they need to do
to their products to conform with SCORM technically.
Encheva, S. and Tumin, S., 2009, in IFIP International Federation for Information Processing, Volume
296; Artificial Intelligence Applications and Innovations III; Eds. Iliadis, L., Vlahavas, I., Bramer, M.;
(Boston: Springer), pp. 535–540.
IEEE Learning Object Metadata [21] defines a set of Resource Description Framework constructs that facilitate the introduction of educational metadata into the Semantic Web.
HarvestRoad Hive [23] is an independent, federated digital repository system.
It enables the collection, management, discovery, sharing and reuse of LOs used
in the delivery of online courses within higher education.
A lot of work has been done on creating, storing, classifying and filtering learning objects with respect to a specific subject. A considerable amount of research focuses on facilitating the process of reusing already available learning objects. This work is devoted to the study of a decision-making process related to recommending the most appropriate learning objects to a particular student.
2 The Model
A concept is characterized by its extent and its intent: the extent consists of all objects belonging to the concept, while the intent is the collection of all attributes shared by the objects [19]. A context is a triple (G, M, I) where G and M are sets and I ⊂ G × M. The elements of G and M are called objects and attributes respec-
tively. The set of all concepts of the context (G, M, I) is a complete lattice and it is
known as the concept lattice of the context (G, M, I).
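As a small illustration only (a brute-force enumeration, not the method of the paper), the concepts of a finite context (G, M, I) can be derived by closing each object subset under the two derivation operators:

from itertools import combinations

def concepts(G, M, I):
    """I: set of (object, attribute) pairs. Returns all (extent, intent) concepts."""
    def intent(A):                          # attributes shared by all objects in A
        return frozenset(m for m in M if all((g, m) in I for g in A))
    def extent(B):                          # objects having all attributes in B
        return frozenset(g for g in G if all((g, m) in I for m in B))
    found = set()
    for size in range(len(G) + 1):          # brute force over all object subsets
        for A in combinations(sorted(G), size):
            B = intent(A)
            found.add((extent(B), B))       # (B', B) is always a concept
    return found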
Fuzzy reasoning methods [20] are often applied in intelligent systems, decision
making and fuzzy control.
A prediction method in [4] applies formal concept analysis and fuzzy inference. In particular it shows how to calculate the value of a membership function of an object if the object belongs to a particular concept. The sum-of-1 criterion states that ∑_{i=1}^{k} µi(x) = 1, ∀ x ∈ χ, where µi, i = 1, ..., k, are the membership functions of all possible membership terms {Mi, i = 1, ..., k} of a fuzzy variable in some universe of discourse χ.
An affiliation value to a concept represents the relative extent to which an object belongs to this concept or an attribute is common to all objects in the concept. Membership values above a threshold are regarded as significant; the threshold is obtained by computing the arithmetic mean of all entries within a column.
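A minimal sketch of these two ingredients, assuming the membership values and affiliation values are held in NumPy arrays with one column per term or concept, might look as follows:

import numpy as np

def satisfies_sum_of_1(memberships, tol=1e-9):
    """memberships: array of shape (objects, terms); each row should sum to 1."""
    return np.allclose(memberships.sum(axis=1), 1.0, atol=tol)

def significant_entries(affiliation_table):
    """An entry is significant when it reaches its column's arithmetic mean."""
    thresholds = affiliation_table.mean(axis=0)     # one threshold per column
    return thresholds, affiliation_table >= thresholds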
In this scenario all students within a particular subject are asked to take a web-based test at the beginning of a semester. The test results indicate lack of knowledge and skills, lack of understanding of certain concepts, or misconceptions concerning the prerequisites for studying that subject. Based on the test results, students are placed in different groups. Suitable learning objects are later suggested to each student based on her group membership.
At the initial stage group types are formed based on previous teaching experience. If such experience is missing, the groups can be formed according to the lecturer's assumptions. Group types are further tuned when more experience is obtained.
The theory of concept lattices is applied in establishing relationships among groups of learners and the subject content. The process of assigning a student to a particular group is based on fuzzy functions. Such functions allow partial group membership, i.e. a particular individual may belong to some extent to more than one group. This is in contrast to classical set theory, where an element either belongs to a set or does not [24]. This makes the approach much more dynamic, flexible and easy to adapt to the individual needs of each student.
Learning objects are first collected in a database. Metadata is attached to each
learning object, describing content, size, purpose and recommended educational
level. AHP methods are applied for assigning a learning object to a group and
consequently to a student.
3 System
4 Conclusion
While most efforts aim at providing a technology to access and share existing learning objects, much less is known about how to assign the most suitable learning objects to a student.
The proposed method can be used to determine the learning effect of using learning objects in a subject as well as the qualities of a single learning object. Learning styles and learning preferences can be further employed in the process of choosing the most appropriate learning object for each student. In addition, the application of fuzzy functions allows partial group membership, i.e. a particular individual may belong to several groups. This makes the approach much more dynamic, flexible and easy to adapt to the individual needs of each student.
References