Progress in Artificial Intelligence
17th Portuguese Conference on Artificial Intelligence, EPIA 2015
Coimbra, Portugal, September 8–11, 2015, Proceedings
Lecture Notes in Artificial Intelligence 9273
Editors
Francisco Pereira, ISEC - Coimbra Institute of Engineering, Polytechnic Institute of Coimbra, Coimbra, Portugal
Ernesto Costa, CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
Penousal Machado, CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
Amílcar Cardoso, CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
University of Coimbra, the Polytechnic Institute of Coimbra, and the Centre for
Informatics and Systems of the University of Coimbra. We would also like to thank the
members of the Organizing Committee: Anabela Simões, António Leitão, João Correia,
Jorge Ávila, Nuno Lourenço, and Pedro Martins. Acknowledgment is due to SISCOG
– Sistemas Cognitivos S.A., Feedzai S.A., Thinkware S.A., iClio, FBA., and FCT –
Fundação para a Ciência e a Tecnologia for their financial support. A final word goes to
EasyChair, which greatly simplified the management of submissions, reviews, and
proceedings preparation, and to Springer for its assistance in publishing this volume.
Conference Co-chairs
Francisco Pereira Polytechnic Institute of Coimbra, Portugal
Penousal Machado University of Coimbra, Portugal
Program Co-chairs
Ernesto Costa University of Coimbra, Portugal
Francisco Pereira Polytechnic Institute of Coimbra, Portugal
Organization Co-chairs
Amílcar Cardoso University of Coimbra, Portugal
Penousal Machado University of Coimbra, Portugal
Proceedings Co-chairs
João Correia University of Coimbra, Portugal
Pedro Martins University of Coimbra, Portugal
Program Committee
AmIA Track Chairs
Paulo Novais University of Minho, Portugal
Goreti Marreiros Polytechnic of Porto, Portugal
Ana Almeida Polytechnic of Porto, Portugal
Sara Rodriguez Gonzalez University of Salamanca, Spain
Defining Agents' Behaviour for Negotiation Contexts
1 Introduction
Over time, game-theoretic and heuristic-based approaches evolved and became more complex, and with this development they came to be used in a wide range of applications. However, they share some limitations. In most game-theoretic and heuristic models, agents exchange proposals, but these proposals are limited: agents are not allowed to exchange any information beyond what is expressed in the proposal itself. This can be problematic, for example, in situations where agents have limited information about the environment, or where their rational choices depend on those of other agents. Another important limitation is that the agents' utilities or preferences are usually assumed to be completely characterized prior to the interaction. To overcome these limitations, argumentation-based negotiation emerged and became one of the most popular approaches to negotiation [4]; it has been extensively investigated and studied, as witnessed by many publications [5-7]. The main idea of argumentation-based negotiation is the ability to support offers with justifications and explanations, which play a key role in negotiation settings. It allows the participants in the negotiation to exchange not only offers but also the reasons and justifications that support them, in order to mutually influence their preference relations over the set of offers and, consequently, the outcome of the dialogue.
The parallel between this approach and group decision-making is easy to see: a group of agents can exchange arguments in order to reach, for instance, a consensus, and thereby support groups in the decision-making process [8]. However, the complexity of this process must not be underestimated when considering a scenario where an agent seeks to defend the interests of whom it represents while, at the same time, being part of a group that aims to reach a collective decision on a problem for its organization [9, 10]. Not only are these agents simultaneously competitive and cooperative, they also represent human beings. Establishing some form of dialogue, as well as the different types of arguments that agents can exchange, is only the first step towards solving the problem. An agent that represents a decision-maker involved in a group decision-making process may show different levels of experience and knowledge about the situation and should behave accordingly. The literature includes work on this subject [10-13]; however, some of the proposed models show flaws in terms of real-world applicability. Some require high configuration costs that do not suit the different types of users they are built for, and others show flaws that, in our opinion, are enough to compromise the success of a Group Decision Support System (GDSS).
This work presents the most relevant models for inferring or configuring a behaviour style in a group decision-making context. It also proposes a set of rules that a behaviour model must follow without jeopardizing the entire GDSS and, finally, it proposes an approach based on adapting an existing model to the context of GDSS.
The rest of the paper is organized as follows: the next section presents the literature review. Section 3 presents our approach, in which we identify different types of behaviour, defined with the use of an existing model, and present the set of rules that we believe are the most important for defining agent behaviour types in a way that does not compromise the system. Section 4 discusses how our approach can be applied in the context of GDSS and how it differs from other existing approaches. Finally, Section 5 presents some conclusions, along with the work to be done hereafter.
2 Literature Review
The concern with identifying and understanding particular behavioural attitudes has led to many studies over the last decades, with emphasis on proposing models and behaviour styles that can be related to the personality of the negotiator. Carl Jung (1921) was the first to specify a model for studying different psychological personality types, based on four functions of consciousness (sensation, intuition, thinking and feeling) that could be combined with two types of attitude (extraversion and introversion), thereby identifying eight primary psychological types [14].
In 1962, Isabel Briggs Myers developed a personality indicator (the Myers-Briggs Type Indicator) based on Jung's theories [15]. This indicator is used as a psychometric questionnaire and helps people understand the world around them and how they behave and make decisions based on their preferences [16]. This model was useful for identifying different styles of leadership, which were later specified in Keirsey and Bates's publication [17], in 1984, as four styles of leadership:
• Stabilizer: tends to be very clear and precise when defining objectives and when organizing and planning tasks in order to achieve them. Stabilizer leaders are also reliable and trustworthy because they show concern for other workers' needs and problems. They are able to increase the motivation of their workers by setting tradition and organization as an example of success;
• Catalyst: the main focus is to develop the quality of their own work and of the work provided by their staff. They take on the facilitator's role by bringing out the best in other people, and motivate other workers with their own enthusiasm and potential;
• Trouble-shooter: as the name suggests, trouble-shooters focus on dealing with and solving problems. They show great aptitude for solving urgent problems by being practical and immediate. They bring people together as a team by analysing what needs to be done and stating exactly what to do as quickly as possible;
• Visionary: visionaries act based on their own intuition and perception of the problems in order to make decisions. Their minds are projected towards the future, and they plan idealistic scenarios and objectives which may not always be achievable.
• Social: social individuals have high social aptitude, preferring social relationships and helping other people solve their problems. They prefer working with people over things;
• Enterprising: enterprising individuals show great communication and leadership skills, and are usually concerned with exerting direct influence on other people. They prefer dealing with people and ideas over things;
• Conventional: conventional individuals value order and efficiency. They show administrative and organizational skills. They prefer dealing with numbers and words over people and ideas.
Fig. 2. Thomas and Kilmann’s model for interpersonal conflict-handling behaviour, adapted
from [19]
In 1992, Costa and McCrae [22] proposed a set of thirty traits extending the five-factor model of personality (the OCEAN model), with six facets for each of the factors. These traits were used in a study by Howard and Howard [23] to help separate different kinds of behaviour styles and identify corresponding themes. A theme is defined as "a trait which is attributable to the combined effect of two or more separate traits". Those styles and themes are based on common sense and general research, and some of them have already been mentioned in this literature review; however, it is also important to refer to other relevant styles that were suggested, such as the Decision and Learning styles. The Decision style includes the Autocratic, Bureaucratic, Diplomat and Consensus themes, while the Learning style includes the Classroom, Tutorial, Correspondence and Independent themes.
In 1995, Rahim and Magner [24] created a meta-model of styles for handling interpersonal conflict based on two dimensions: concern for self and concern for others. This was the basis for the five management styles identified as obliging, avoiding, dominating, integrating and compromising, which will be explained in detail in Section 3.
3 Methods
It is very important to define the agent's behaviour correctly so as not to jeopardize the validity of the entire GDSS. In this area of research there is sometimes an excessive concern with finding a better result, and because of that other variables may be overlooked, which can make a certain approach impossible to use in practice. For example, does it make sense for a decision-maker or a manager of a large company, with a very busy schedule, to have the patience or time to answer (seriously) a 44-question questionnaire such as "the Big Five Inventory" just so that his agent can be modelled with his personality? For reasons like this we have defined a list of considerations to take into account when defining types of behaviour for agents in the context presented here. The definition of behaviour should:
1. Enhance the capabilities of the agents, i.e., make the process more intelligent, more human and less sequential; even if this is not visible in the conceptual model, it must not be possible for the programmer to anticipate the sequence of interactions just by reading the code;
2. Be easy to configure (usability) or require no configuration at all from the user (decision-maker);
3. Represent the interests of the decision-makers (the strategy used), so that the agent's way of acting meets the interests defined by the user (whenever possible);
4. Not be the reason for decision-makers to give up using the application, i.e., in a hypothetical situation, a decision-maker should not "win" more decisions just because he knows how to manipulate or configure the system better;
5. Be usable by everyone, so that all can benefit from it. Obviously, decision-makers face meetings in different ways; their interest in and knowledge of each topic is not always the same. Sometimes it may be in their interest to let others speak first and, only after gathering all the information, form a final opinion on the matter. At other times it may be important to control the entire conversation and try to convince the other participants to accept our opinion straightaway.
Taking all these points into account, we propose in this article a behaviour model for the decision-making context based on the conflict styles defined by Rahim and Magner (1995) [24]. The styles, presented in Fig. 3, have been adapted to our problem. Rahim and Magner recognize the existence of five conflict styles: integrating, obliging, dominating, avoiding and compromising. In their work, they suggested these styles to describe different ways of behaving in conflict situations, and defined them according to the level of concern a person has for reaching his own goals and for reaching other people's objectives. This definition matches exactly what we consider agents operating in a GDSS context should be when we say that they are simultaneously cooperative and competitive. The model therefore describes five conflict styles that support what we think is required for agents to behave appropriately in this context. It also has the advantage of being easy to understand and to use.
In our approach, the configuration of the agent's behaviour by the decision-maker is done through the selection of one conflict style. The main idea is to configure the agent with the participant's interests and strategies; for that, the definition of each conflict style should be clear and understandable to the decision-maker. The decision-maker can assign different conflict styles to his agent throughout the process. For example, a decision-maker included in a decision process may have little or even no knowledge about the problem during the early stage of the discussion. In that situation he may prefer to use an "avoiding" style and learn from what other people say, gathering arguments and information that support the different options and thereby learning more about the problem. At a later stage, when the decision-maker already has more information and knowledge about the problem, he may opt for a more active and dominating style in order to convince others of his opinion. As mentioned before, there are many factors that can make a decision-maker face a meeting in different ways: interest in a topic, lack of knowledge about a topic, recognition of the participation of more experienced people in the discussion, and so on.
The different types of behaviour that can be defined for and used by the agents are the five conflict styles described above: integrating, obliging, dominating, avoiding and compromising.
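To make this configuration concrete, the following sketch shows one possible way of encoding the five conflict styles as a pair of concern levels (for self and for others) that a negotiating agent could consult. It is only a minimal illustration under our own assumptions: the class, the attribute names and the numeric levels are ours, not part of the model of Rahim and Magner or of the prototype described here.

```python
from dataclasses import dataclass
from enum import Enum


class ConflictStyle(Enum):
    """The five conflict-handling styles of Rahim and Magner (1995)."""
    INTEGRATING = "integrating"
    OBLIGING = "obliging"
    DOMINATING = "dominating"
    AVOIDING = "avoiding"
    COMPROMISING = "compromising"


# Illustrative mapping of each style onto the two dimensions of the model,
# expressed as assumed levels in [0, 1]: (concern for self, concern for others).
CONCERN_LEVELS = {
    ConflictStyle.INTEGRATING:  (1.0, 1.0),   # high concern for self and for others
    ConflictStyle.OBLIGING:     (0.0, 1.0),   # low for self, high for others
    ConflictStyle.DOMINATING:   (1.0, 0.0),   # high for self, low for others
    ConflictStyle.AVOIDING:     (0.0, 0.0),   # low on both dimensions
    ConflictStyle.COMPROMISING: (0.5, 0.5),   # intermediate on both dimensions
}


@dataclass
class NegotiationAgent:
    """Hypothetical agent representing one decision-maker in the GDSS."""
    decision_maker: str
    style: ConflictStyle = ConflictStyle.COMPROMISING  # default if nothing is configured

    def concern_for_self(self) -> float:
        return CONCERN_LEVELS[self.style][0]

    def concern_for_others(self) -> float:
        return CONCERN_LEVELS[self.style][1]

    def set_style(self, style: ConflictStyle) -> None:
        # The decision-maker may switch style at any stage of the process,
        # e.g. from AVOIDING early on to DOMINATING once better informed.
        self.style = style
```

For example, an agent created with `NegotiationAgent("participant A", ConflictStyle.AVOIDING)` would report a low concern on both dimensions until its decision-maker switches it to another style.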
4 Discussion
Many approaches have been suggested in the literature that define or model agents with characteristics that differentiate them from each other and, as a result, make them operate in different ways [11-13, 25-27]. However, even if many of those publications are interesting in an academic context, they still show some issues that must be addressed. The issues we analyze here are related to the context of supporting group decision-making and to competitive agents that represent real individuals. There are several approaches in the literature for (1) agents that are modeled according to the personality of the real participant (decision-maker) they represent and (2) agents modeled with different intelligence levels (abilities) [10, 12]. One of the most used techniques in the literature is "The Big Five Inventory" questionnaire, which allows values to be obtained for each of the personality traits defined in the "Big Five" model (openness, conscientiousness, extraversion, agreeableness and neuroticism) [11, 26, 28]. In theory, one may think that an agent that operates in a way similar to the real participant, because it is modeled with "the same personality", is perfect. However, defining an agent with a conflict style based on the values of personality traits may not be the right way to characterize the decision-maker. What makes a human act in a particular way is the result of much more than just personality; it is a set of factors such as personality, emotions, mood, knowledge and the body (the physical part), and other factors such as sensations and the spiritual part can also be considered [29]. Another relevant issue is the fact that this type of approach allows certain agents to have an advantage over others. Many may consider this correct because, as in real life, some decision-makers are more apt and therefore have an advantage over other decision-makers. However, the questions that arise are the following: Would a product like this be used by decision-makers who knew they would be at a disadvantage by using it? Would it be possible to sell a product that does not guarantee equality between its future users? It is also important to discuss another relevant point, which is the fact that this type of approach may, in some situations, provide less intelligent and more sequential outputs.
The study of different types of behaviour in agents is represented in the literature by a reasonable number of contributions. However, it is a subject that often poses validation problems. Although there are proposals with case studies that aim to validate the work, that validation is usually somewhat subjective, and even when the problem is formulated mathematically so that it becomes scientifically "proven", the proof may often feel forced. A reflection of this problem is the difference between the approaches practiced in the social and in the exact sciences. It is clear to us, as computer science researchers, that it is not our goal to elaborate a model of behaviour definition for specific scenarios. Instead, we use a model defined and theoretically validated by others who work in areas that give them the skills to do so. However, the inclusion of intelligence in certain systems is growing at a blistering pace, and some of those systems would not make sense, nor would they succeed, without it. This means that it is becoming common practice to adapt models that were not designed specifically for the context in which they will be used, and because of that the evolution of the presented approaches will happen in an empirical way.
Another relevant issue is that most of the works focus on very specific topics, which may prevent a more pragmatic comparison of the various approaches. Even if in some situations the use of a specific technique (such as "The Big Five Inventory") might make sense, in others, even though it may scientifically provide a case study with brilliant results, it can be responsible for jeopardizing the success of the system. Our work aims to support each participant (decision-maker) in the process of group decision-making. It is especially targeted at decision support in ubiquitous scenarios where the participants are people with a very fast pace of life, for whom every second counts (top managers and executives). In our context the system notifies the participant whenever he is added to a decision process (for instance, by email); after that, every participant can access the system and model his agent according to his preferences (classification of alternatives and attributes), as well as according to how he plans to face that decision process (informing the agent about the type of behaviour to adopt), knowing that there are no required fields in the agent setup. This gives the user the freedom to configure his agent in detail or with no detail at all, depending on his interest and time. As can be seen, in this context (and as noted previously) the agents must be both cooperative and competitive. They are cooperative because they all seek a solution for the organization they belong to, and competitive because each agent seeks to defend the interests of its participant and to persuade the other agents to accept his preferred alternative. For us this means that if an agent is both cooperative and competitive then it cannot exhibit behaviour in which it is concerned only with achieving its own objectives, and vice versa.
Our approach intends to provide a more perceptible and concrete way for the decision-maker to understand the five types of behaviour that can be used to model the agent in a group decision-making support context where each agent represents a decision-maker. We believe that with our approach it will be simpler for agents to reach or suggest solutions, since they are modeled with behaviours according to what the decision-maker wants. This makes it easier to reflect in the agent the concern with achieving the decision-maker's objectives or the objectives of the other participants in the decision process. With this approach the agents follow one defined type of behaviour, which also works as a strategy that can be adopted by each of the decision-makers.
As future work, we will address the specific definition of each type of behaviour identified in this work. We intend to describe behaviours according to certain facets proposed in the Five Factor Model and also to study the tendency of each type of behaviour to ask questions, make statements, and make requests. At a later stage we will integrate this model into the prototype of a group decision support system that we are developing.
References
1. Rahwan, I., Ramchurn, S.D., Jennings, N.R., Mcburney, P., Parsons, S., Sonenberg, L.:
Argumentation-based negotiation. The Knowledge Engineering Review 18, 343–375
(2003)
2. Hadidi, N., Dimopoulos, Y., Moraitis, P.: Argumentative alternating offers. In: McBurney,
P., Rahwan, I., Parsons, S. (eds.) ArgMAS 2010. LNCS, vol. 6614, pp. 105–122. Springer,
Heidelberg (2011)
3. El-Sisi, A.B., Mousa, H.M.: Argumentation based negotiation in multiagent system. In: 2012 Seventh International Conference on Computer Engineering & Systems (ICCES), pp. 261–266. IEEE (2012)
4. Marey, O., Bentahar, J., Asl, E.K., Mbarki, M., Dssouli, R.: Agents’ Uncertainty in Argu-
mentation-based Negotiation: Classification and Implementation. Procedia Computer Sci-
ence 32, 61–68 (2014)
5. Mbarki, M., Bentahar, J., Moulin, B.: Specification and complexity of strategic-based rea-
soning using argumentation. In: Maudet, N., Parsons, S., Rahwan, I. (eds.) ArgMAS 2006.
LNCS (LNAI), vol. 4766, pp. 142–160. Springer, Heidelberg (2007)
6. Amgoud, L., Vesic, S.: A formal analysis of the outcomes of argumentation-based negotia-
tions. In: The 10th International Conference on Autonomous Agents and Multiagent
Systems, vol. 3, pp. 1237–1238. International Foundation for Autonomous Agents and
Multiagent Systems (2011)
7. Bonzon, E., Dimopoulos, Y., Moraitis, P.: Knowing each other in argumentation-based
negotiation. In: Proceedings of the 11th International Conference on Autonomous Agents
and Multiagent Systems, vol. 3, pp. 1413–1414. International Foundation for Autonomous
Agents and Multiagent Systems (2012)
8. Kraus, S., Sycara, K., Evenchik, A.: Reaching agreements through argumentation: a logi-
cal model and implementation. Artificial Intelligence 104, 1–69 (1998)
9. Faratin, P., Sierra, C., Jennings, N.R.: Negotiation decision functions for autonomous
agents. Robotics and Autonomous Systems 24, 159–182 (1998)
10. Rahwan, I., Kowalczyk, R., Pham, H.H.: Intelligent agents for automated one-to-many
e-commerce negotiation. In: Australian Computer Science Communications, pp. 197–204.
Australian Computer Society Inc. (2002)
11. Santos, R., Marreiros, G., Ramos, C., Neves, J., Bulas-Cruz, J.: Personality, emotion, and
mood in agent-based group decision making (2011)
12. Kakas, A., Moraitis, P.: Argumentation based decision making for autonomous agents. In:
Proceedings of the Second International Joint Conference on Autonomous Agents and
Multiagent Systems, pp. 883–890. ACM (2003)
13. Zamfirescu, C.-B.: An agent-oriented approach for supporting Self-facilitation for group
decisions. Studies in Informatics and control 12, 137–148 (2003)
14. Jung, C.G.: Psychological types. The collected works of CG Jung 6(18), 169–170 (1971).
Princeton University Press
15. Myers-Briggs, I.: The Myers-Briggs type indicator manual. Educational Testing Service, Princeton (1962)
16. Myers, I.B., Myers, P.B.: Gifts differing: Understanding personality type. Davies-Black
Pub. (1980)
17. Bates, M., Keirsey, D.: Please Understand Me: Character and Temperament Types. Prome-
theus Nemesis Book Co., Del Mar (1984)
18. Holland, J.L.: Making vocational choices: A theory of vocational personalities and work
environments. Psychological Assessment Resources (1997)
19. Kilmann, R.H., Thomas, K.W.: Interpersonal conflict-handling behavior as reflections of
Jungian personality dimensions. Psychological reports 37, 971–980 (1975)
20. Blake, R.R., Mouton, J.S.: The new managerial grid: strategic new insights into a proven
system for increasing organization productivity and individual effectiveness, plus a reveal-
ing examination of how your managerial style can affect your mental and physical health.
Gulf Pub. Co. (1964)
21. Walton, R.E., McKersie, R.B.: A behavioral theory of labor negotiations: An analysis of a
social interaction system. Cornell University Press (1991)
22. Costa, P.T., McCrae, R.R.: Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO FFI): Professional Manual. Psychological Assessment Resources (1992)
23. Howard, P.J., Howard, J.M.: The big five quickstart: An introduction to the five-factor
model of personality for human resource professionals. ERIC Clearinghouse (1995)
24. Rahim, M.A., Magner, N.R.: Confirmatory factor analysis of the styles of handling inter-
personal conflict: First-order factor model and its invariance across groups. Journal of
Applied Psychology 80, 122 (1995)
25. Allbeck, J., Badler, N.: Toward representing agent behaviors modified by personality and
emotion. Embodied Conversational Agents at AAMAS 2, 15–19 (2002)
26. Badler, N., Allbeck, J., Zhao, L., Byun, M.: Representing and parameterizing agent behav-
iors. In: Proceedings of Computer Animation, 2002, pp. 133–143. IEEE (2002)
27. Velásquez, J.D.: Modeling emotions and other motivations in synthetic agents. In:
AAAI/IAAI, pp. 10–15. Citeseer (1997)
28. Durupinar, F., Allbeck, J., Pelechano, N., Badler, N.: Creating crowd variation with the
ocean personality model. In: Proceedings of the 7th International Joint Conference on Au-
tonomous Agents and Multiagent Systems, vol. 3, pp. 1217–1220. International Founda-
tion for Autonomous Agents and Multiagent Systems (2008)
29. Pasquali, L.: Os tipos humanos: A teoria da personalidade. Differences 7, 359–378 (2000)
Improving User Privacy and the Accuracy
of User Identification in Behavioral Biometrics
1 Introduction
In recent years there has been a significant increase in jobs that are mentally stressful or fatiguing, at the expense of traditionally physically demanding jobs [1]. Workers nowadays face not only more mentally demanding jobs but also demanding work conditions (e.g. positions of high responsibility, competition, risk of unemployment, shift work, working extra hours). This has resulted in the recent emergence of stress and mental fatigue as some of the most serious epidemics of the twenty-first century [2,3]. In terms of workplace indicators, this has an impact on human error, productivity and the quality of work and of the workplace. In terms of social or personal indicators, it has an impact on quality of life, health and personal development. Moreover, there is an increase in the loss of focus that leads people to be unaware of risks, thus lowering the security threshold.
Recent studies show the negative impact of working extra hours on produc-
tivity [4,5]: people work more but produce less. Stressful milieus just add to the
problem. The question is thus how to create the optimal conditions to meet productivity requirements while respecting people's well-being and health. Since
each worker is different, what procedures need to be implemented to measure the
level of stress or burnout of each individual worker? And their level of produc-
tivity? The mere observation of these indicators using traditional invasive means
may change the worker’s behavior, leading to biased results that do not reflect
his actual state. Directly asking, through questionnaires or similar instruments,
can also lead to biased results as workers are often unwilling to share feelings
concerning their workplace with their coworkers.
Recent approaches for assessing and managing fatigue have been developed that look at one's interaction patterns with technological devices to assess one's state (e.g. we type at a lower pace when fatigued). Moreover, the same approaches can be used to identify users (e.g. each individual types in a different manner). This field is known as behavioral biometrics. In this paper we present a framework for transparently collecting interaction data from users, which allows the tasks commonly associated with behavioral biometrics to be performed. Moreover, this framework respects user privacy. Finally, we show how including the user's state and information about the interaction context may improve the accuracy of user identification. The main objective of this work is to define a reliable and non-intrusive user identification system for access control.
2 Security Systems
device. Just as each individual has a particular way of walking, talking, laughing or doing anything else, each of us also has our own interaction patterns with technological devices. Moreover, most of the applications we interact with have a specific flow of operation or require a particular type of interaction, restricting or conditioning the user's possible behaviors to a smaller set. Maintaining a behavioral profile of authorized users may therefore make it possible to identify uncommon behaviors of the current user that may indicate a possible unauthorized user. This is even more likely to work when behavioral information for particular applications is used.
Such systems are known as behavioural biometrics: they rely on the users' behavioral profiles to establish the behavior of authorized users. Whenever, when analyzing the behavior of the current user, a moderate deviation from the known profiles is detected, the system may take action such as logging the user off, notifying the administrator or using an alternative method of authentication.
Such systems can also include behaviors other than those originating from keyboard and mouse interaction patterns [10]. In fact, any action performed on the technological device can be used for threat detection. For example, if the system console is started and the authorized user of the device has never used the console before, a potential intrusion may be taking place. Similar actions can be taken for other applications or even for specific commands (e.g. it is unlikely that a user with a non-expert profile suddenly starts using advanced commands on the console).
To implement behavioral biometrics, distinct procedures can be adopted, such as:
– Biometric Sketch: this method uses the user's drawings as templates for comparison [11,12]. The system collects patterns from the user's drawing and compares them to others in a database. Uniqueness is assured by the number of possible combinations. The downside is that the drawings must be very precise, which in most cases is quite difficult, even more so when drawing with a standard mouse.
– GUI Interaction: this technique uses the user's interaction with the visual interfaces of applications and compares it to the model present in the database. For every application that the user interacts with, a model must exist; thus, both the model and the application must be stored in the database. This method requires that every action in every application is saved, resulting in a large amount of information to be maintained. Moreover, each new application or update must be trained and modeled. This method is therefore very strict and complex to implement and maintain.
– Keystroke Dynamics: this method uses the keyboard as input and is based on the user's typing patterns. It captures the keys pressed, measuring timing and pressing patterns and extracting several features of the typing behavior. This is a well-established method; as it relies solely on the user's interaction, it allows simple and usable models to be created.
– Mouse Dynamics: this method consists in capturing the mouse movements and translating them into a model. All interactions are considered, such as movement and clicking. This method is similar to the GUI Interaction method but simpler, although it suffers from the same context problem.
In this work, keystroke dynamics and mouse dynamics are chosen as inputs. Their broad features and their availability suit the aim of the intended system: they are application and operating system independent, and the keyboard and mouse are nowadays the most common input devices when interacting with computers.
3 The Framework
The framework developed in the context of this work is a unified system composed of two main modules: fatigue monitoring and security. The monitoring process is implemented by an application that captures the keyboard and mouse inputs transparently. The features used are the same in both modules and are defined in more detail in [13]. Among them, the features extracted from the mouse include the following (a minimal computation sketch is given after the list):
– Double click duration: the time between two clicks in a double click event
– Absolute sum of angles: the quantification of how much the mouse turns,
regardless of the direction of the turn, between each two consecutive clicks
– Signed sum of angles: the quantification of how much the mouse turns, con-
sidering the direction of the turn, between each two consecutive clicks
– Distance between clicks: the distance traveled between each two consecutive
clicks
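As an illustration only, the sketch below computes two of the mouse features named above (the distance between clicks and the absolute and signed sums of angles) from a list of (x, y) cursor positions recorded between two consecutive clicks. The data format and the function names are assumptions made for this example, not the framework's actual API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distance_between_clicks(path: List[Point]) -> float:
    """Total distance travelled by the cursor between two consecutive clicks."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def sum_of_angles(path: List[Point]) -> Tuple[float, float]:
    """Return the (absolute, signed) sums of the turning angles along the path,
    quantifying how much the mouse turns regardless of / considering direction."""
    absolute, signed = 0.0, 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(path, path[1:], path[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)          # heading of the first segment
        a2 = math.atan2(y2 - y1, x2 - x1)          # heading of the second segment
        turn = math.atan2(math.sin(a2 - a1), math.cos(a2 - a1))  # wrapped to (-pi, pi]
        absolute += abs(turn)
        signed += turn
    return absolute, signed
```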
The data gathered may be processed differently to extract the information related to each scope of the framework (fatigue monitoring and security). An integrated system is beneficial due to the joint nature of the data and to the fact that only one application is present locally, thus having a low footprint on computer resources. Furthermore, these are two areas that are intrinsically connected, and one can affect the other; their joint analysis is fundamental to the achievement of the proposed objectives.
In this approach, an encrypted key replaces the information about the specific key pressed. It is therefore still possible, while hiding what the user wrote, to extract the previously mentioned features, thus guaranteeing the user's privacy. The encryption of the pressed keys is carried out by generating random key encryptions at different times. This is done as depicted in Algorithm 1, which exemplifies the developed approach for the case of the key down time feature. An example of the result of the algorithm is shown in Table 1, where a record with and without encryption is presented for different keys.
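Algorithm 1 itself is not reproduced here; the sketch below only illustrates the underlying idea under stated assumptions: each physical key is replaced by a random token generated for the current session (and regenerated at random times), so that a timing feature such as key down time can still be computed while the typed text remains hidden. All names and the logging format are hypothetical.

```python
import secrets
from typing import Dict, List, Tuple


class AnonymisedKeyLogger:
    """Records (token, key down time) pairs instead of the actual keys pressed."""

    def __init__(self) -> None:
        self._token_for_key: Dict[str, str] = {}    # key -> random token (rotated periodically)
        self._down_since: Dict[str, float] = {}     # key -> timestamp of the key-down event
        self.records: List[Tuple[str, float]] = []  # (encrypted key, down time in ms)

    def _token(self, key: str) -> str:
        if key not in self._token_for_key:
            self._token_for_key[key] = secrets.token_hex(8)
        return self._token_for_key[key]

    def rotate_tokens(self) -> None:
        # Called at random times so the mapping key -> token cannot be recovered later.
        self._token_for_key.clear()

    def key_down(self, key: str, timestamp_ms: float) -> None:
        self._down_since[key] = timestamp_ms

    def key_up(self, key: str, timestamp_ms: float) -> None:
        if key in self._down_since:
            down_time = timestamp_ms - self._down_since.pop(key)
            self.records.append((self._token(key), down_time))
```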
Hiding user input is just one part of the solution to the issue of user security; the other is to prevent intrusions. In this scope, behavioral biometrics security systems can run in two different modes [14]: identification mode and verification mode. In this system we use the identification mode instead of the verification mode to ensure continuous user identification in the monitoring system.
The identification mode is the process of trying to discover the identity of a person by examining a biometric pattern calculated from that person's biometric data. In this mode the user is identified based on information previously collected from the keystroke dynamics profiles of all users. For each user, a biometric profile is built in a training phase. In the running phase, the usage pattern being created in real time is compared to every known model, producing either a score or a distance that describes the similarity between the pattern and the model. The system assigns the pattern to the user with the most similar biometric model; thus, the user is identified without the need for extra information.
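The paper does not fix a particular classifier for this step (several are compared in Section 4), so the following is only a minimal sketch of the identification mode, assuming that each user profile is stored as a mean feature vector and that a simple Euclidean distance is used as the similarity score.

```python
import math
from typing import Dict, List


def identify_user(sample: List[float], profiles: Dict[str, List[float]]) -> str:
    """Assign the observed usage pattern to the user with the closest biometric model."""
    def distance(a: List[float], b: List[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(profiles, key=lambda user: distance(sample, profiles[user]))


# Hypothetical example: two enrolled users and one freshly observed pattern.
profiles = {"user_a": [120.0, 0.8, 310.0], "user_b": [95.0, 1.4, 270.0]}
print(identify_user([118.0, 0.9, 305.0], profiles))  # closest profile: "user_a"
```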
4 Case Study
The system was analyzed and tested in four different ways. As a first step, we used the records of 40 users registered in the monitoring system to train different models. The created models were then validated using 150 random system usage records taken from the system. In a second step, models were created using the type of task being performed in addition to the biometric information, and in a third step, models were trained using the user's fatigue state as well. Finally, we created models that, in addition to the biometric data, used both the type of task and the user's mental state at the time of registration. Both the type of task and the user's mental state are provided by the monitoring system.
The participants, forty in total (36 men, 4 women), were registered in the monitoring system. Their ages ranged between 18 and 45. The following requirements were established to select, among all the volunteers, the ones that participated: (1) familiarity and proficiency with the use of computers; (2) use of the computer on a daily basis and throughout the day; (3) owning at least one personal computer.
After training different models (Naive Bayes, KNN, SVM and Random Forest) with data from the different users of the system, different degrees of accuracy were obtained in user identification, as depicted in Figure 1. It is also possible to observe that the type of task and the level of fatigue influence the process of identifying the user, since these factors effectively influence interaction patterns. Taking this information into consideration allows the creation of more accurate models.
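For illustration, a comparison of the four model families mentioned above could be run with scikit-learn roughly as follows. The feature matrix X (interaction features, optionally augmented with task type and fatigue level) and the label vector y of user identities are assumed inputs, and the file names are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# X: one row of features per interaction sample; y: the user who produced it.
X = np.load("interaction_features.npy")  # hypothetical file
y = np.load("user_labels.npy")           # hypothetical file

models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```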
The type of task being carried out during the monitoring of the interaction patterns is particularly important, mainly due to the very nature of the task as well as to the set of tools available to perform it. Figure 2 shows the values of the features Key Down Time and Average Excess of Distance for five different types of applications: Chat, Leisure, Office, Reading and Programming. The way each application conditions the interaction behavior is evident. Such information must therefore be considered when developing a behavioral biometrics system based on input behavior. Table 2 further supports this claim, showing that data collected in different types of applications has statistically significant differences for most of the features. The same happens for different levels of fatigue.
Fig. 1. Accuracy of the different algorithms and inputs considered for user authentication.
Table 2. p-values of the Kruskal-Wallis test when comparing the data organized according to the types of application and to the level of fatigue. In the vast majority of the cases, the differences between the different groups are statistically significant.
Another extremely important aspect of user identification is the influence of mental states on interaction patterns. Previous studies by our research team [8,15] show that individuals under different states of stress, fatigue, high or low mental workload, or even different moods evidence significant behavior changes that impact their interaction patterns with devices. They consequently influence behavioral biometric features. Figure 3 depicts this influence for two interaction features.
Fig. 2. Differences in the distributions of the data when comparing interaction patterns with different applications, for two interaction features.
Fig. 3. Effects of different levels of fatigue on the interaction patterns, depicted for two different features.
Numbers in the y-axis represent the level of fatigue as self-reported by the indi-
vidual using the seven-point USAFSAM Mental Fatigue Scale questionnaire [16].
Each value represents the following state:
1. Fully alert. Wide awake. Extremely peppy.
2. Very lively. Responsive, but not at peak.
3. Okay. Somewhat fresh.
4. A little tired. Less than fresh.
5. Moderately tired. Let down.
6. Extremely tired. Very difficult to concentrate.
7. Completely exhausted. Unable to function effectively. Ready to drop.
It is therefore possible to see how increased levels of fatigue result in generally less efficient interactions of the participants with the computer. For example, a higher value of Mouse Acceleration indicates a more efficient interaction behavior while the user is moving the mouse. The same happens with Key Down Time, where a shorter time corresponds to more efficient use of the keyboard.
This conclusion justifies the need to include mental states in behavioral biometrics approaches and explains the increased accuracy of the presented approach to user identification when all modalities are used jointly: interaction patterns, user state and type of application, as depicted previously in Figure 1.
References
1. Tanabe, S., Nishihara, N.: Productivity and fatigue. Indoor Air 14(s7), 126–133
(2004)
2. Miller, J.C.: Cognitive Performance Research at Brooks Air Force Base, Texas,
1960–2009. Smashwords, March 2013
3. Wainwright, D., Calnan, M.: Work stress: the making of a modern epidemic.
McGraw-Hill International (2002)
4. Folkard, S., Tucker, P.: Shift work, safety and productivity. Occupational Medicine
53(2), 95–101 (2003)
5. Rosekind, M.R.: Underestimating the societal costs of impaired alertness: safety,
health and productivity risks. Sleep Medicine 6, S21–S25 (2005)
6. Beauvisage, T.: Computer usage in daily life. In: Proceedings of the SIGCHI con-
ference on Human Factors in Computing Systems, pp. 575–584. ACM (2009)
7. Pantic, M., Rothkrantz, L.J.: Toward an affect-sensitive multimodal human-
computer interaction. Proceedings of the IEEE 91(9), 1370–1390 (2003)
8. Pimenta, A., Carneiro, D., Novais, P., Neves, J.: Monitoring mental fatigue through
the analysis of keyboard and mouse interaction patterns. In: Pan, J.-S., Polycarpou,
M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds.)
HAIS 2013. LNCS, vol. 8073, pp. 222–231. Springer, Heidelberg (2013)
9. Carneiro, D., Castillo, J.C., Novais, P., Fernández-Caballero, A., Neves, J.: Multi-
modal behavioral analysis for non-invasive stress detection. Expert Systems with
Applications 39(18), 13376–13389 (2012)
10. Lee, P.M., Chen, L.Y., Tsui, W.H., Hsiao, T.C.: Will user authentication using keystroke dynamics biometrics be interfered by emotions? NCTU-15 affective keyboard typing dataset for hypothesis testing
11. Al-Zubi, S., Brömme, A., Tönnies, K.D.: Using an active shape structural model
for biometric sketch recognition. In: Michaelis, B., Krell, G. (eds.) DAGM 2003.
LNCS, vol. 2781, pp. 187–195. Springer, Heidelberg (2003)
12. Brömme, A., Al-Zubi, S.: Multifactor biometric sketch authentication. In: BIOSIG,
pp. 81–90 (2003)
13. Pimenta, A., Carneiro, D., Novais, P., Neves, J.: Analysis of human performance
as a measure of mental fatigue. In: Polycarpou, M., de Carvalho, A.C.P.L.F.,
Pan, J.-S., Woźniak, M., Quintian, H., Corchado, E. (eds.) HAIS 2014. LNCS,
vol. 8480, pp. 389–401. Springer, Heidelberg (2014)
14. Shanmugapriya, D., Padmavathi, G.: A survey of biometric keystroke dynamics:
Approaches, security and challenges (2009). arXiv preprint arXiv:0910.0817
15. Rodrigues, M., Gonçalves, S., Carneiro, D., Novais, P., Fdez-Riverola, F.:
Keystrokes and clicks: measuring stress on E-learning students. In: Casillas, J.,
Martı́nez-López, F.J., Vicari, R., De la Prieta, F. (eds.) Management Intelligent
Systems. AISC, vol. 220, pp. 119–126. Springer, Heidelberg (2013)
16. Samn, S.W., Perelli, L.P.: Estimating aircrew fatigue: a technique with application
to airlift operations. Technical report, DTIC Document (1982)
Including Emotion in Learning Process
Abstract. The purpose of this paper is to propose a new architecture that includes the student's learning preferences, personality traits and emotions in order to adapt the user interface and the learning path to the student's needs and requirements. This aims to reduce the difficulty and emotional strain that students encounter while interacting with learning platforms.
1 Introduction
and behaviour. In a learning platform this feedback process does not take place in real time and sometimes it is not what the student requires to overcome the problem at hand. Over time this can become a major problem and hinder the student's learning process. A possible solution could be the addition of mechanisms to learning platforms that enable computers to detect when the student requires help or motivation to complete a task and to intervene accordingly. The major difficulties of this work are the detection of these situations and deciding how to intervene. The detection method cannot be too intrusive, because that would affect the student's behaviour in a negative way and damage his learning process. Another important issue is the selection of the variables to monitor; these can include the capture of emotions, behaviour or learning results, among others. Finally, it must be determined what the computer's intervention will be when a help situation is detected, in order to reverse it.
2 EmotionTest Prototype
In order to show that emotion can influence the learning process, a prototype (EmotionTest) was developed: a learning platform that takes into account the emotional aspect, the learning style and the personality traits, adapting the course (content and context) to the student's needs. The architecture proposed for this prototype is composed of four major models: the Application Model, the Emotive Pedagogical Model, the Student Model, and the Emotional Model [4], as shown in the figure below.
Fig. 1. Architecture
The student model consists of the user's information and characteristics. This includes personal information (name, email, telephone, etc.), demographic data (gender, race, age, etc.), knowledge, deficiencies, learning styles, emotion profile, personality traits, etc. This information is used by the student model to better adapt the prototype to the student [4].
The emotion model gathers all the information from the facial emotion recognition software and from the students' feedback. Facial expression recognition performs video analysis of images in order to recognise an emotion. This type of emotion recognition was chosen because it is the least intrusive with respect to the student's activities. The emotion recognition is achieved by making use of an API entitled ReKognition [5]. This API allows detection of the face, eyes, nose and mouth and of whether the eyes and mouth are open or closed. In addition, it provides the gender of the individual and an estimate of age and emotion. At each moment a group of three emotions is captured, and each emotion is given a number that indicates the confidence level of the capture.
The application model is composed of a series of modules containing different subjects. A subject consists of a number of steps that the student has to complete in order to finish his learning program. Usually each subject is composed of a placement test, used to assess and update the student's level of knowledge, followed by the subject content, in which the subject is explained, and then by the subject exercises and a final test. The first step is the subject's Placement Test (PT), which can be optional. It is designed to give students and teachers a quick way of assessing the approximate level of the student's knowledge. The result of the PT is a percentage Ksp (the sum of the scores of the placement-test exercises) that is added to the student's knowledge (Ks) of a particular subject and places the student at one of the five levels of knowledge. If the PT is not performed, Ksp will be equal to zero and the student will start without an assigned level of knowledge. The Subject Content (SC) contains the subject explanation, which depends on the stereotype. Each explanation has a practice exercise; these exercises allow the student to accumulate points (TotalKsc) towards the final test of the subject. The student needs to reach 80% of TotalKsc to undertake the subject test. The Subject Test (ST) is the assessment of the learned subject and gives a final value Kst (the sum of the scores of the test exercises) that represents the student's knowledge of the subject. Only if Kst is higher than 50% can it be concluded that the student has successfully completed the subject. In that case the values of Ksp and Kst are compared to see whether there was an effective improvement in the student's knowledge. This is represented in the following diagram [6].
[Flow diagram: the Placement Test updates Ksp; the student then works through Subject Content 1, 2, 3, ... with exercises whose scores (Ksc1, Ksc2, ...) accumulate into TotalKsc; if TotalKsc exceeds 80% the student takes the Subject Test (Kst); if Kst exceeds 50% the subject is completed and Ksp is compared with Kst to check for learning improvement; otherwise the student returns to the subject content.]
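Under the thresholds stated above (80% of the exercise points to unlock the subject test, more than 50% in the test to pass, and a comparison of Kst with Ksp to check for learning improvement), the flow of a subject could be sketched as follows. The function, the variable names and the way TotalKsc is aggregated (here, the average exercise score) are our own assumptions, not taken from the prototype.

```python
def subject_outcome(ksp: float, exercise_scores: list, kst: float) -> str:
    """Evaluate one subject given the placement test score (ksp, in %),
    the practice exercise scores and the subject test score (kst, in %)."""
    # TotalKsc taken here as the average exercise score: one possible reading of the text.
    total_ksc = sum(exercise_scores) / len(exercise_scores) if exercise_scores else 0.0

    if total_ksc < 80.0:
        return "keep practising: the subject test is not yet unlocked"
    if kst <= 50.0:
        return "failed the subject test: go back to the subject content"
    if kst > ksp:
        return "subject completed with learning improvement"
    return "subject completed, no measurable improvement"


# Hypothetical student: placement test 40%, exercises averaging 85%, final test 70%.
print(subject_outcome(40.0, [80.0, 90.0, 85.0], 70.0))
```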
The last model is the emotive pedagogical model, which is composed of three sub-models: the rules of emotion adaptability, the emotional interaction mechanisms and the graph of concepts in case of failure.
The rules of emotion adaptability manage the way the subject content is presented. The subject content and subject exercises are presented according to the student's learning preferences and personality; in this way, information and exercises are presented in a manner more agreeable to the student, helping him to comprehend the subject at hand.
The emotional interaction mechanisms consist in triggering an emotional interaction whenever an emotion that needs to be counteracted is captured, in order to facilitate the learning process. The emotions to be counteracted are anger, sadness, confusion and disgust. The interaction can depend on the personality and on the learning style of the student.
Finally, the graph of concepts in case of failure indicates the steps to be taken when a student fails to pass a subject. To be approved in a subject, all the tasks must be completed, and only with a subject completed is it possible to pass to the next one. Within a subject the student has to complete the placement test, the subject content plus exercises with a grade equal to or higher than 80%, and the subject test with a grade higher than 50%. In case of failure he has to go back to the subject content and repeat all the steps [6].
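A minimal sketch of the emotional interaction mechanism described above, assuming the recogniser returns (emotion, confidence) pairs and that an intervention is triggered when one of the emotions to be counteracted is detected with sufficient confidence. The threshold, the message and all names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

EMOTIONS_TO_COUNTERACT = {"anger", "sadness", "confusion", "disgust"}


def choose_intervention(readings: List[Tuple[str, float]],
                        learning_style: str,
                        personality: str,
                        threshold: float = 0.6) -> Optional[str]:
    """Return an intervention message if a negative emotion is detected, else None."""
    for emotion, confidence in readings:
        if emotion in EMOTIONS_TO_COUNTERACT and confidence >= threshold:
            # The actual content would be adapted to the student's learning style
            # and personality; here only the trigger rule is illustrated.
            return f"motivational hint for a {learning_style}/{personality} student ({emotion})"
    return None


print(choose_intervention([("calm", 0.9), ("confusion", 0.7)], "visual", "introvert"))
```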
3 Data Analysis
To test the performance of the developed prototype, some experiments were carried out with students from two ISEP Engineering courses: Informatics Engineering and Systems Engineering. The total number of students involved in these tests was 115, with ages between 17 and 42 years old. This group was composed of 20% female (n=23) and 80% male (n=92) participants, mainly from the districts of Oporto, Aveiro and Braga.
To assess the validity of the prototype the students were divided into two groups, v1 and v2. Group v1 tested the prototype with emotional interaction and group v2 without any emotional interaction.
Group v1 had to complete a diagnostic test (on paper) to help grade the students' initial knowledge level, followed by the evaluation of the prototype with the emotional interaction and learning style. This included logging into the prototype, the moment at which the initial data for the student model begins to be collected. By accessing the school's Lightweight Directory Access Protocol (LDAP) directory it was possible to gather the students' generic information (such as name, email and others). After login, the students were required to answer two questionnaires (TIPI, VARK) built into the prototype, which allows the prototype to know the student's personality traits and learning preferences. Afterwards the students could access the learning materials and exercises. From the moment a student logs in, his emotional state is monitored and saved, and every time an emotion that triggers an intervention is detected, the intervention appears on the screen. After this evaluation the students had to complete a final test (on paper) to help grade their final knowledge level [7].
Group v2 had to complete a diagnostic test (on paper) to help grade the students' initial knowledge level, followed by the evaluation of the prototype without any emotional interaction. This evaluation was in all respects similar to that of group v1, with one big difference: even though the emotional state was monitored, when an emotion that would trigger an intervention was detected, the intervention did not appear on the screen. After this test the students had to complete a final test (on paper) to help grade their final knowledge level [7].
Analyzing the evaluation test data, group v1 had a mean of 45.7% (SD = 40.3) in the diagnostic test and a mean of 85.7% (SD = 12.2) in the final test, while group v2 had a mean of 37.1% (SD = 29.2) in the diagnostic test and a mean of 61.4% (SD = 33.7) in the final test. The data gathered did not have a normal distribution, so the two groups were compared using the non-parametric Mann-Whitney test. The diagnostic test has a Mann-Whitney U = 83.0 for a sample size of 14 students; the P value of 0.479 indicates no statistically significant difference, which is understandable because it was assumed that all students had more or less the same initial level of knowledge. The final test has a Mann-Whitney U = 54.0 for an equal sample size; the P value of 0.029 indicates that in this case the observed differences are statistically significant. In addition, a series of tests was made to compare the mean values of the students by group and learning preference, by group and personality, and by group and emotional state. The objective of these tests was to find out whether learning preference, personality and emotional state had any influence on the outcome of the final test. For the first two factors, learning preference and personality, no statistically significant differences were found in the data, so it cannot be concluded that learning preference and personality had any influence on the final test outcome in either group; a larger sample size would be needed to verify this. For the question of whether the emotional state had any influence on the final test, the observed differences were statistically significant [7].
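The group comparison reported above can be reproduced on any two score samples with SciPy; the arrays below are placeholders, not the study data.

```python
from scipy.stats import mannwhitneyu

# Placeholder final-test scores (%) for the groups with (v1) and without (v2) emotional interaction.
final_v1 = [90, 85, 80, 95, 75, 88, 92]
final_v2 = [60, 55, 70, 40, 65, 58, 72]

u_stat, p_value = mannwhitneyu(final_v1, final_v2, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```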
4 Conclusion
In conclusion, this work attempted to answer several questions. The central question that guided the work was: "Does a learning platform that takes into account the student's emotions, learning preferences and personality improve the student's learning results?"
The data gathered from the tests showed that there is a statistical difference between students' learning results when using the two learning platforms: one that takes the student's emotional state into account and one that does not. This gives an indication that, by introducing the emotional component, the students' learning results can possibly be improved. Another question was: "Does affective computing technology help improve a student's learning process?"
By answering the central question positively, this question is partly answered. As the results showed, the students' learning results can be improved by adding an emotional component to a learning platform, and the use of Affective Computing technology to capture emotion can enhance this improvement. Affective Computing allows the student's emotions to be captured using techniques that do not inhibit the student's actions. Moreover, one or more techniques can be used simultaneously to help verify the accuracy of the emotional capture. The last question was: "What are the stimuli that can be used to induce or change the student's state of mind in order to improve the learning process?"
First, the results indicate that the platform with an emotional component elicited an overall more positive set of emotions than the platform without this component. This shows that the stimuli produced in the platform with an emotional component were able to keep the students in a positive emotional state and motivated to do the tasks at hand, which did not happen in the platform without this component.
Second, the results demonstrated that the platform with an emotional component not only elicited more positive emotions among the students, but also led to an improvement in the students' learning results.
Acknowledgments. This work is supported by FEDER Funds through the “Programa Opera-
cional Factores de Competitividade - COMPETE” program and by National Funds through
FCT “Fundação para a Ciência e a Tecnologia” under the project: FCOMP-01-0124-FEDER-
PEst-OE/EEI/UI0760/2014.
References
1. Picard, R.W., Papert, S., Bender, W., Blumberg, B., Breazeal, C., Cavallo, D., Machover,
T., Resnick, M., Roy, D., Strohecker, C.: Affective learning - a manifesto. BT Technol. J.
22(4), 253–268 (2004)
2. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)
3. Kort, B., Reilly, R., Picard, R.W.: An affective model of interplay between emotions and
learning: re-engineering educational pedagogy-building a learning companion. In: Proc. -
IEEE Int. Conf. Adv. Learn. Technol. ICALT 2001, pp. 43–46 (2001)
4. Faria, A., Almeida, A., Martins, C., Lobo, C., Gonçalves, R.: Emotional Interaction Model
For Learning. In: INTED 2015 Proc., pp. 7114–7120 (2015)
5. orbe.us | ReKognition - Welcome to Rekognition.com. http://rekognition.com/index.php/demo/face (accessed 26 July 2014)
6. Faria, A.R., Almeida, A., Martins, C., Gonçalves, R.: Emotional adaptive platform for learn-
ing. In: Mascio, T.D., Gennari, R., Vittorini, P., de la Prieta, F. (eds.) Methodologies and In-
telligent Systems for Technology Enhanced Learning. AISC, vol. 374, pp. 9–16. Springer,
Heidelberg (2015)
7. Faria, R., Almeida, A., Martins, C., Gonçalves, R.: Learning Platform. In: 10th Iberian
Conference on Information Systems and Technologies – CISTI 2015 (2015)
Ambient Intelligence:
Experiments on Sustainability Awareness
1 Introduction
The advent of computer science and its evolution led to the availability of computa-
tional resources that can better assess and execute more complex reasoning and moni-
toring of sustainability attributes. This led to the creation of the field of computational
sustainability (Gomes, 2011). Coupled with sustainability is energy efficiency, which is directly affected by human behaviour and by social aspects such as human comfort. Fundamentally, efficiency is about the best strategy to reach the objectives that are set; however, when the concept of sustainability is added, several efficient plans may be deemed unsustainable because they cannot be maintained in the future. While efficiency focuses on optimization, sustainability is mostly concerned with restrictions put in place to ensure that the devised solution does not impair the future. Not only does context harden the problem, but information may also be missing due to some unforeseen event that jeopardizes an efficient solution. To tackle such events, computational systems can maintain sensor networks over physical environments to acquire contextual information, validate the conditions for efficient planning and, as a last resort, act upon the physical environment.
2 Related Work
3 Platform Engine
The focus of this project is, more than developing new procedures or algorithms for solving problems, putting these innovations in the hands of the user, with a clear purpose: these innovative tools should be designed to assist people in the context of energy sustainability.
index (Fanger, 1970), for instance. Another application is the definition of sustainable indicators according to custom mathematical formulae in the platform, which processes selected system attributes to compute them.
The configuration of data fusion steps and the selection of sensors and data streams are performed in the initial step of the system by the local administrator. For each area of interest, and with specialized knowledge obtained from experts, it is possible to monitor relevant information to build sustainable indicators.
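As an illustration of how such a custom indicator might be expressed over fused sensor streams, the sketch below computes a hypothetical per-room energy indicator from consumption and presence readings. The attribute names and the formula are assumptions for illustration, not part of the PHESS platform.

```python
# Hedged sketch: a custom sustainability indicator computed from fused
# sensor attributes. Attribute names and the formula are illustrative
# assumptions, not the platform's actual indicator definitions.

def wasted_energy_indicator(samples):
    """Fraction of electrical consumption measured while the room is empty.

    `samples` is a list of dicts with the (assumed) fused attributes
    'consumption_w' (instantaneous consumption, watts) and
    'presence' (True when a user was detected in the room).
    """
    total = sum(s["consumption_w"] for s in samples)
    unattended = sum(s["consumption_w"] for s in samples if not s["presence"])
    return unattended / total if total > 0 else 0.0

# Example usage with made-up readings for one room over a few samples.
readings = [
    {"consumption_w": 120.0, "presence": True},
    {"consumption_w": 115.0, "presence": False},
    {"consumption_w": 30.0,  "presence": False},
]
print(f"wasted-energy indicator: {wasted_energy_indicator(readings):.2f}")
```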
As a case study, results from five days in an environment are presented: in this case, a home environment with a limited set of sensors and a smartphone as the user detection mechanism. User notifications are issued by actuator modules, which push notifications to users to alert them according to notification schemes and personal rules. The sensors measure electrical consumption, temperature, humidity, luminosity and presence (sensed through smartphones), and the data feed an indicator based on the PMV thermal sensation index used in thermal comfort studies (Rana, Kusy, Jurdak, Wall, & Hu, 2013).
The indicators are designed in the platform in order to assess energy efficiency, and as such the case scenario uses electricity data for this analysis. Therefore, a list of sample indicators was defined using the data fusion available through the PHESS modules. Four sample indicators were defined, and their expressions are as follows:
5 Conclusions
References
1. Aztiria, A., Augusto, J.C., Basagoiti, R., Izaguirre, A., Cook, D.J.: Discovering frequent
user-environment interactions in intelligent environments. Personal and Ubiquitous
Computing 16(1), 91–103 (2012)
2. Fanger, P.O.: Thermal comfort: Analysis and applications in environmental engineering.
Danish Technical Press (1970)
3. Gomes, C.P.: Computational sustainability. In: Gama, J., Bradley, E., Hollmén, J. (eds.)
IDA 2011. LNCS, vol. 7014, p. 8. Springer, Heidelberg (2011).
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.158.2293&rep=rep1&type=pdf
4. Hagras, H., Doctor, F., Callaghan, V., Lopez, A.: An Incremental Adaptive Life Long
Learning Approach for Type-2 Fuzzy Embedded Agents in Ambient Intelligent Environ-
ments. IEEE Transactions on Fuzzy Systems 15(1), 41–55 (2007)
5. Rana, R., Kusy, B., Jurdak, R., Wall, J., Hu, W.: Feasibility analysis of using humidex as an
indoor thermal comfort predictor. Energy and Buildings 64, 17–25 (2013).
doi:10.1016/j.enbuild.2013.04.019
6. Costa, A., Novais, P., Simões, R.: A Caregiver Support Platform within the Scope of an
AAL Ecosystem. Sensors 14(3), 5654–5676 (2014)
Artificial Intelligence in Medicine
Reasoning with Uncertainty
in Biomedical Models
1 Introduction
Mathematical models are extensively used in many biomedical domains for sup-
porting rational decisions. A mathematical model describes a system by a set
of variables and constraints that establish relations between them. Uncertainty
and nonlinearity play a major role in modeling most real-world continuous sys-
tems. A competitive framework for decision support in continuous domains must
provide an expressive mathematical model to represent the system behavior and
be able to perform sound reasoning that accounts for the uncertainty and the
effect of nonlinearity.
Given the uncertainty, there are two opposite attitudes for reasoning with
scenarios consistent with the mathematical model. Stochastic approaches [1]
reason on approximations of the most likely scenarios. They associate a proba-
bilistic model to the problem thus characterizing the likelihood of the different
scenarios. In contrast, constraint programming approaches [2] reason on safe
enclosures of all consistent scenarios. Rather than associate approximate values
to real variables, intervals are used to include all their possible values. Model-
based reasoning and what-if scenarios are adequately supported through safe
constraint propagation techniques, which only eliminate combinations of values
that definitely do not satisfy model constraints.
In this work we use a probabilistic constraint approach that combines a
stochastic representation of uncertainty on the parameter values with a reliable
constraint framework robust to nonlinearity. Similarly to stochastic approaches
it associates an explicit probabilistic model to the problem, and similarly to
constraint approaches it assumes reliable bounds for the model parameters. The
approach computes conditional probability distributions of the model parame-
ters, given the uncertainty and the constraints.
The potential of our approach to support clinical practice is illustrated in
a real world problem from the obesity research field. The impact of obesity on
health, at both individual and public levels, is widely documented [3–5]. Despite
this fact, and the availability of nutritional recommendations and guidelines to
the general audience, the prevalence of overweight and obesity in adults and
children increased dramatically in the last 30 years [6]. According to the World Health Organization, the main cause of the "obesity pandemic" is the energy imbalance caused by an increased calorie intake combined with a lower energy expenditure resulting from a sedentary lifestyle.
Many biomedical models use the energy balance approach to simulate indi-
vidual body weight dynamics, e.g. [7,8]. Change of body weight over time is
modeled as the rate of energy stored (or lost), which is a function of the energy
intake (from food) and the energy expended. However, the exact amount of calo-
ries ingested, or energy intake, is difficult to ascertain as it is usually obtained
through methods that underestimate its real value, such as self-reported diet
records [9].
The inability to rigorously assess the energy intake is considered by [10]
the “fundamental flaw in obesity research”. This fact hinders the success and
adherence to individual weight control interventions [11]. Therefore the correct
evaluation of such interventions will be highly dependent on the precision of
energy intake estimates and the assessment of the uncertainty inherent to those
estimates. In this paper we show how the probabilistic constraint framework can
be used in clinical practice to correctly characterize such uncertainty given the
uncertainty of the underlying biomedical model.
The next section overviews the energy intake problem and introduces a biomedi-
cal model used in clinical practice. Section 3 addresses constraint programming
and its extensions to differential equations and probabilistic reasoning. Section
4 shows how the problem is cast into the probabilistic constraint framework.
Section 5 discusses the experimental results and the last section summarizes the
main conclusions.
R = I − E    (1)
where R is the energy stored or lost (kcal/d), I is the energy intake (kcal/d) and
E is the energy expended (kcal/d).
Several models have been applied to provide estimates of individual energy
intake [12,13]. Our paper focuses on the work of [12], which developed a compu-
tational model to determine individual energy intake during weight loss. This
model, herein designated EI model, calculates the energy intake based on the
following differential equation:
c_f \frac{dF}{dt} + c_l \frac{dFF}{dt} = I - (DIT + PA + RMR + SPA) \qquad (2)
The left-hand side of equation (2) represents the change in the body's energy stores, R in equation (1), and is modeled through the weighted sum of the changes in Fat mass (F), the body's long-term energy storage mechanism, and Fat Free mass (FF), a proxy for the protein content used for energy purposes.
Differently from other models, which express the relationship between F and FF using a logarithmic equation (eq. (3)) [14] or linear approximations [15], the EI model uses a fourth-order polynomial to estimate FF as a function of F, the age of the subject a, and his or her height h (eq. (4)).
The rate of energy expended, E in equation (1), is the total amount of energy spent in several physiological processes: Diet Induced Thermogenesis (DIT), the energy required to digest and absorb food; Physical Activity (PA), the energy spent in volitional activities; Resting Metabolic Rate (RMR), the minimal amount of energy used to sustain life; and Spontaneous Physical Activity (SPA), the energy spent in spontaneous activities.
The EI model uses data from the 24-week CALERIE phase I study [16], in
particular body weight for one female subject of the caloric restriction group.
During the experiment, participants had their weight monitored every two weeks.
Those weight measures are used to estimate the real energy intake for that
particular individual.
3 Constraint Programming
A constraint satisfaction problem is a classical artificial intelligence paradigm
characterized by a set of variables and a set of constraints that specify relations
among subsets of these variables. Solutions are assignments of values to all vari-
ables that satisfy all the constraints. Constraint programming [2] is a form of
declarative programming which must provide a set of constraint reasoning algo-
rithms that take advantage of constraints to reduce the search space, avoiding
y' = f(p, y, t)    (5)
is a restriction on the sequence of values that y can take over t. A solution, for
a time interval T , is a function that satisfies equation (5) for all values of t ∈ T .
Since (5) does not fully determine a single solution (but rather a family of
solutions), initial conditions are usually provided with a complete specification
of y at some time point t. An Initial Value Problem (IVP) is characterized by
an ODE system together with the initial condition y(t0 ) = y0 . A solution of the
IVP with respect to an interval of time T is the unique function that is a solution
of (5) and satisfies the initial condition.
Parametric ODEs are expressive mathematical means to model system
dynamics. Notwithstanding its expressive power, reasoning with such models
may be quite difficult, given their complexity. Analytical solutions are available
only for the simplest models. Alternative numerical simulations require precise
numerical values for the parameters involved, often impossible to gather given
the uncertainty on available data. This may be an important drawback since
small differences on input values may cause important differences on the output
produced.
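To make this concrete, the sketch below numerically solves a simple parametric IVP for two nearby parameter values and compares the trajectories; the ODE and the parameter values are illustrative assumptions, not part of the EI model.

```python
# Hedged sketch: a numerical IVP solution is only as good as the parameter
# estimate it is given. The ODE below (exponential growth) and its parameter
# values are illustrative assumptions, not the EI model.
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y, p):
    # Parametric ODE y' = f(p, y, t); here simply y' = p * y.
    return p * y

t_span, y0 = (0.0, 10.0), [1.0]
t_eval = np.linspace(*t_span, 11)

sol_a = solve_ivp(f, t_span, y0, t_eval=t_eval, args=(0.30,))
sol_b = solve_ivp(f, t_span, y0, t_eval=t_eval, args=(0.33,))  # +10% on p

# A small difference in the parameter grows into a large difference in the
# trajectory, illustrating why point estimates of uncertain parameters can
# be misleading.
print(f"y(10) with p=0.30: {sol_a.y[0, -1]:.2f}")
print(f"y(10) with p=0.33: {sol_b.y[0, -1]:.2f}")
```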
Interval methods for solving differential equations with initial conditions [20]
do verify the existence of unique solutions and produce guaranteed error bounds
for the solution trajectory along an interval of time T . They use interval arith-
metic to compute safe enclosures for the trajectory, explicitly keeping the error
term within safe bounds.
Several extensions to constraint programming [21] were proposed for handling
differential equations based on interval methods for solving IVPs. An approach
that integrates other conditions of interest was proposed in [22] and successfully
applied to support safe decisions based on deep biomedical models [23].
In this paper we use an approach similar to [21] that allows the integration of
IVPs with the standard numerical constraints. The idea is to consider an IVP as a function Φ whose first argument is the parameter vector p, whose second argument is the initial condition that must be verified at time point t0 (the third argument), and whose last argument is a time point t ∈ T. A relation between the values at two time points t0 and t1 along the trajectory is represented by the equation:
The advantages of this close collaboration between constraint pruning and random sampling were previously illustrated in ocean color remote sensing studies [27], where this approach achieved quite accurate results even with small sampling rates.
sampling rates. The success of this technique relies on the reduction of the sam-
pling space where a pure non-naive Monte Carlo (adaptive) method is not only
hard to tune but also impractical in small error settings.
Recall that solving the above CCSP means finding the values for F0 and the
variables Fi , Ii (1 ≤ i ≤ n) that satisfy the above set of constraints.
FF(a, h, Fi) = FFM(a, h, Fi) + εi
and we may rewrite the set of bi constraints of the CCSP model as follows,
bi ≡ wi = FFM(a, h, Fi) + εi + Fi
where fi and hi are the normal distributions associated with the errors εi and δi, respectively. The deviations are introduced in the model by considering the constraints δi = εi − εi−1 (1 ≤ i ≤ n), which determine their values from the εi values.
A naive approximate algorithm for solving both alternative CCSP models could be simply to perform Monte Carlo sampling in the space defined by D(Fj) × D(I1) × · · · × D(In), with j ∈ {1, . . . , n}. Note that, given the constraints in the model, each sampled point determines the values of all variables Fi and εi (and δi). From the values assigned to εi (and δi), eq. (11) (or (12)) can be used to compute an estimate of its probability, as shown in (8).
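A heavily simplified sketch of such naive Monte Carlo sampling is given below: points are drawn from the sampling domain, points violating the constraints are discarded, and the error densities of the surviving points are used to estimate the conditional distribution. The domain bounds, the constraint, and the error model are illustrative assumptions, not the CCSP of the EI model.

```python
# Hedged sketch: naive Monte Carlo estimation of a conditional probability
# under constraints. Domains, the constraint and the error model below are
# illustrative assumptions, not the EI-model CCSP.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 3.35            # assumed std. dev. of the model error
w_obs = 70.0            # an observed weight (kg), made up for illustration

def ffm(f):
    # Placeholder fat-free-mass model; stands in for FFM(a, h, F).
    return 50.0 - 0.2 * f

n_samples = 100_000
f_samples = rng.uniform(10.0, 40.0, n_samples)      # D(F): assumed bounds
eps = w_obs - (ffm(f_samples) + f_samples)            # error implied by w_obs
inside = np.abs(eps) <= 3 * sigma                     # bounding constraint

weights = norm.pdf(eps[inside], scale=sigma)          # error density
# Normalized weights approximate P(F | w_obs) on the sampled points;
# here we just report the most probable sampled F value.
best = f_samples[inside][np.argmax(weights)]
print(f"kept {inside.mean():.1%} of samples; most probable F ≈ {best:.1f} kg")
```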
With this approach, accurate results are hard to obtain for an increasing number of observations due to the huge size of the sampling space, O(|D|^{n+1}). Instead, we
developed an improved technique that is able to drastically reduce both the
exponent n and the base |D| of this expression, as described in the following
section.
4.3 Method
The main idea is to avoid considering all variables simultaneously but instead to
reason only with a small subset that changes incrementally over time. For each
observation i, we can compute the probability distributions of the variables of
interest given the past knowledge already accumulated.
We start by computing the probability distribution of F0 given the initial
weight w0, subject to the constraint b0 and the bounding constraints for ε0. This
5 Experimental Results
This section demonstrates how the previously described method may be used
to improve the applicability of the EI model by complementing its predictions
with measures of confidence. The algorithm was implemented in C++ and used
for obtaining the probability distribution approximations P (Fi , Ii ) at each
observation i ∈ {1, . . . , 12} of a 45 year-old woman over the course of the 24-
week trial (CALERIE Study phase I). The runtime was about 2 minutes per
observation on an Intel Core i7 @ 2.4 GHz.
Fat Free mass is estimated using two distinct models: FFpoly (eq. 4) and FFlog (eq. 3). Both of these models were initially fit to a set of 7278 North American women, resulting in the corresponding standard deviations of the error, σpoly = 3.35 and σlog = 5.04. This data set was collected during the NHANES surveys (1994 to 2004) and is available online at the Centers for Disease Control and Prevention website [28].
We also considered different assumptions regarding the independence of the error: the uncorrelated error model (11) and a correlated error model (12) with a small σδ = 0.5. Note that, due to current data access restrictions, the latter value is purely illustrative.
The following techniques could be used for assessing propagation of uncer-
tainty.
Fig. 1. Probabilities of fat mass (F ) and intake (I) on the first clinical observation
(t = 14). Top and bottom rows show results for different FF models. Left and right
columns correspond to different assumptions regarding independence of model errors.
In figure 1 we plot the obtained results for the first observation i = 1 for each combination of FF model and error correlation. The following is apparent from these plots: a) uncertainty on F is positively correlated with uncertainty on I; b) the assumption of independence between model errors on consecutive weeks drastically affects the predicted marginal distribution of I (compare horizontally); c) the improved accuracy of the FFpoly model (note that σpoly < σlog) is reflected in slightly sharper F estimates, but does not seem to impact the estimation of I (compare vertically).
To perceive the effect of the uncertainty on the estimated variables over time,
it is useful to marginalize the computed joint probability distributions. Figure
2 shows the estimated Fi and Ii over time, for each of the error correlation assumptions. Since the results concerning the FFpoly model are very similar to those obtained for FFlog, for reasons of space we focus only on the former.
Each box in these plots depicts the most probable value (marked in the center
of the box), the union hull of the 50% most probable values (the rectangle), and
the union hull of the 82% most probable values (the whiskers). Additionally,
each plot overlays the estimates obtained from the algorithm published by the
author of the EI model [7].
The presented results show that the previous conclusions for the case i = 1 extend to all remaining observations. Additionally, an interesting phenomenon occurs in the case of correlated error: the uncertainty in the estimation of F decreases slightly over time. This is most probably the consequence of having, at each new observation, an increasingly constrained problem whose solution space becomes correspondingly smaller. At least in the case of F, more information seems to lead to significantly better estimates. This cannot occur if the errors are independent, as is indeed confirmed in the plots.
Finally, our results show that in some cases the most probable values obtained by [7] are crude approximations to their own proposed model.
Fig. 2. Most probable intervals for the values of I (top row) and F (bottom row) over time using the FFpoly model. Left and right columns correspond to different assumptions regarding the independence of model errors. The continuous line plots the estimates obtained in [12].
5.3 Best-Fit
Although the presented algorithm is primarily intended for characterizing uncer-
tainty in model predictions, it is nevertheless a sound method for obtaining
the predictions themselves. Indeed, as the magnitude of the error in the model
parameters decreases (σ in our example), the obtained joint probability distri-
butions will converge to the correct solution of the model.
6 Conclusions
The standard practice for characterizing confidence on the predictions resulting
from a complex model is to perform controlled experiments. In the biomedi-
cal field, this often translates to closely monitoring distinct groups of subjects
over large periods of time, and assessing the fitness of the model statistically.
While the empirical approach has its own advantages, namely that it does not
require a complete understanding of the implications of the individual assump-
tions and approximations made in the model, it has some important shortcom-
ings. Depending on the medical field, controlled experiments are not always
practical, do not convey enough statistical significance, or have associated high
costs.
Contrary to the empirical, black-box approach, this paper proposes to charac-
terize the uncertainty on the model estimates by propagating the errors stemming
from each of its parts. The described technique extends constraint programming
to integrate probabilistic reasoning and constraints modeling dynamic behaviour,
offering a mathematically sound and efficient alternative.
The application field of the presented approach is quite broad: it targets
models which are themselves composed of other, possibly identically complex
(sub)models, for which there is a known characterization of the error. The
selected case-study is a good example: the EI model is a fairly complex model
including dynamic behaviour and nonlinear relations, and integrates various
(sub)models with associated uncertainty. The experimental section illustrated
how different choices for one of these (sub)models, the FF model, impact the error of the complete EI model.
Probabilistic constraint programming offers modeling and reasoning capabil-
ities that go beyond the traditional alternatives. This approach has the potential
to bridge the gap between theory and practice by supporting reliable conclusions
from complex biomedical models taking into account the underlying uncertainty.
References
1. Halpern, J.Y.: Reasoning about Uncertainty. MIT Press (2003)
2. Rossi, F., Beek, P.V., Walsh, T. (eds.): Handbook of Constraint Programming.
Foundations of Artificial Intelligence. Elsevier Science (2006)
3. Swinburn, B.A., et al.: The global obesity pandemic: shaped by global drivers and
local environments. The Lancet 378(9793), 804–814 (2011)
4. Leahy, S., Nolan, A., O’Connell, J., Kenny, R.A.: Obesity in an ageing society
implications for health, physical function and health service utilisation. Technical
report, The Irish Longitudinal Study on Ageing (2014)
5. Lehnert, T., Sonntag, D., Konnopka, A., Heller, S.R., König, H.: Economic costs of
overweight and obesity. Best Pract Res Clin Endoc. Metab. 27(2), 105–115 (2013)
6. Ng, M., et al.: Global, regional, and national prevalence of overweight and obesity in
children and adults during 1980–2013: a systematic analysis for the global burden
of disease study 2013. The Lancet 384(9945), 766–781 (2014)
7. Thomas, D., Martin, C., Heymsfield, S., Redman, L., Schoeller, D., Levine, J.:
A simple model predicting individual weight change in humans. J. Biol. Dyn. 5(6),
579–599 (2011)
8. Christiansen, E., Garby, L., Sørensen, T.I.: Quantitative analysis of the energy
requirements for development of obesity. J. Theor. Biol. 234(1), 99–106 (2005)
9. Hill, R., Davies, P.: The validity of self-reported energy intake as determined using
the doubly labelled water technique. Brit. J. Nut. 85, 415–430 (2001)
10. Winkler, J.T.: The fundamental flaw in obesity research. Obesity Reviews 6,
199–202 (2005)
11. Champagne, C.M., et al.: Validity of the remote food photography method for
estimating energy and nutrient intake in near real-time. Obesity 20(4), 891–899
(2012)
12. Thomas, D.M., Schoeller, D.A., Redman, L.M., Martin, C.K., Levine, J.A.,
Heymsfield, S.: A computational model to determine energy intake during weight
loss. Am. J. Clin. Nutr. 92(6), 1326–1331 (2010)
13. Hall, K.D., Chow, C.C.: Estimating changes in free-living energy intake and its
confidence interval. Am. J. Clin. Nutr. 94, 66–74 (2011)
14. Forbes, G.B.: Lean body mass-body fat interrelationships in humans. Nutr. Rev.
45, 225–231 (1987)
15. Thomas, D., Ciesla, A., Levine, J., Stevens, J., Martin, C.: A mathematical model
of weight change with adaptation. Math. Biosci. Eng. 6(4), 873–887 (2009)
16. Redman, L.M., et al.: Effect of calorie restriction with or without exercise on body
composition and fat distribution. J. Clin. Endocrinol. Metab. 92(3), 865–872 (2007)
17. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proc. of the 13th
IJCAI, pp. 232–238 (1993)
18. Benhamou, F., McAllester, D., van Hentenryck, P.: CLP(intervals) revisited. In:
ISLP, pp. 124–138. MIT Press (1994)
19. Hentenryck, P.V., Mcallester, D., Kapur, D.: Solving polynomial systems using a
branch and prune approach. SIAM J. Num. Analysis 34, 797–827 (1997)
20. Moore, R.: Interval Analysis. Prentice-Hall, Englewood Cliffs (1966)
21. Goldsztejn, A., Mullier, O., Eveillard, D., Hosobe, H.: Including ordinary differ-
ential equations based constraints in the standard CP framework. In: Cohen, D.
(ed.) CP 2010. LNCS, vol. 6308, pp. 221–235. Springer, Heidelberg (2010)
22. Cruz, J.: Constraint Reasoning for Differential Models. Frontiers in Artificial Intel-
ligence and Applications, vol.126. IOS Press (2005)
23. Cruz, J., Barahona, P.: Constraint reasoning in deep biomedical models. Artificial
Intelligence in Medicine 34(1), 77–88 (2005)
24. Nedialkov, N.: Vnode-lp a validated solver for initial value problems in ordinary
differential equations. Technical report, McMaster Univ., Hamilton, Canada (2006)
25. Carvalho, E.: Probabilistic Constraint Reasoning. PhD thesis, FCT/UNL (2012)
26. Hammersley, J., Handscomb, D.: Monte Carlo Methods. Methuen London (1964)
27. Carvalho, E., Cruz, J., Barahona, P.: Probabilistic constraints for nonlinear inverse
problems. Constraints 18(3), 344–376 (2013)
28. National Health and Nutrition Examination Survey. http://www.cdc.gov/nchs/nhanes.htm
Smart Environments and Context-Awareness
for Lifestyle Management in a Healthy Active
Ageing Framework*
Abstract. Health trends of the elderly in Europe motivate the need for technological solutions aimed at preventing the main causes of morbidity and premature mortality. In this framework, the DOREMI project addresses three important causes of morbidity and mortality in the elderly by devising ICT-based home care services for aging people to counter cognitive decline, sedentariness and unhealthy dietary habits. In this paper, we present the general architecture of DOREMI, focusing on its human activity recognition and reasoning aspects.
1 Introduction
According to the University College Dublin Institute of Food and Health, health promotion and disease prevention programs most notably target three main causes of morbidity and premature mortality: malnutrition, sedentariness, and cognitive decline, conditions that particularly affect the quality of life of elderly people and drive disease progression. These three areas are the targets of the DOREMI project. The project vision is to develop a systemic solution for the elderly, able to prolong functional and cognitive capacity by stimulating, and unobtrusively monitoring, daily activities according to well-defined "Active Ageing" lifestyle protocols. The project embraces a concept of prevention centered on the elderly, characterized by a unified vision of being elderly today, namely the promotion of health through a constructive interaction among mind, body, and social engagement.
This work has been funded in the framework of the FP7 project “Decrease of cOgnitive de-
cline, malnutRition and sedEntariness by elderly empowerment in lifestyle Management and
social Inclusion” (DOREMI), contract N.611650.
To fulfill these goals, food intake measurements, exergames associated with social interaction stimulation, and cognitive training programs (cognitive games) will be proposed to an elderly population enrolled in a pilot study. The DOREMI project goes beyond the current state of the art by developing, testing, and exploiting, with a short-term business model impact, a set of IT-based (Information Technology) services able to:
• Stimulate elderly people in modifying their dietary habits and physical activity according to the changes that come with age, through creative, personalized, and engaging solutions;
• Monitor parameters of the elderly to support the specialist in the daily verification of their compliance with the prescribed lifestyle protocol, in accordance with their response to physical and cognitive activities;
• Advise the specialist on different types and/or intensities of daily activity for improving the elderly person's health, based on the assessment of progress on the assigned protocol;
• Empower aging people by offering them knowledge about food and the effectiveness of physical activity, to let them become the main actors of their own health.
To reach these objectives, the project builds on interdisciplinary knowledge encompassing health and artificial intelligence, the latter covering aspects ranging from sensing and machine learning to human-machine interfaces and games. This paper focuses on the machine learning contribution of the project, which applies to the analysis of the sensor data with the purpose of identifying users' conditions (in terms of balance, calorie expenditure, etc.) and activities, detecting changes in the users' habits, and reasoning over such data. The ultimate goal of this data analysis is to support the user who is following the lifestyle protocol prescribed by the specialist, by giving him or her feedback through an appropriate interface, and to provide the specialist with information about the user's lifestyle. In particular, the paper gives a snapshot of the status of the project (which has just concluded its first year of activity) in the design of the activity recognition and reasoning components.
Exploratory data analysis (EDA) analyzes data sets to find their main features [1], beyond what can be found by formal modeling or hypothesis testing. When dealing with accelerometer data, features are classified into three categories: time domain, frequency domain, and spatial domain [2]. In the time domain, we use the standard deviation within a frame, which is indicative of the spread of the acceleration data and of the intensity of the movement during the activity. In the frequency domain, frequency-domain entropy helps distinguish activities with similar energy intensity by comparing their periodicities. This feature is computed as the information entropy of the normalized Power Spectral Density (PSD) of the input signal, excluding the DC component (the mean value of the waveform). The periodicity feature evaluates the periodicity of the signal, which helps to distinguish cyclic from non-cyclic activities. In the spatial domain, orientation variation is defined as the variation of the gravitational components on the three axes of the accelerometer sensor. This feature effectively shows how severe the posture change is during an activity.
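A minimal sketch of how such features might be extracted from a single accelerometer frame is shown below; the window length, sampling rate, and exact formulas are assumptions for illustration, not the project's implementation.

```python
# Hedged sketch: time- and frequency-domain features from one accelerometer
# frame. Window length, sampling rate and formulas are illustrative
# assumptions, not the DOREMI feature extraction pipeline.
import numpy as np

def frame_features(acc, fs=50.0):
    """acc: array of shape (n_samples, 3) with x, y, z acceleration."""
    magnitude = np.linalg.norm(acc, axis=1)

    # Time domain: standard deviation of the magnitude in the frame.
    std = magnitude.std()

    # Frequency domain: entropy of the normalized PSD without the DC term.
    spectrum = np.abs(np.fft.rfft(magnitude - magnitude.mean())) ** 2
    psd = spectrum[1:] / spectrum[1:].sum()
    entropy = -(psd * np.log2(psd + 1e-12)).sum()

    # Spatial domain: variation of the (low-pass) gravity components per axis.
    gravity = np.array([np.convolve(a, np.ones(25) / 25, mode="same")
                        for a in acc.T])
    orientation_variation = gravity.std(axis=1).mean()

    return {"std": std, "entropy": entropy,
            "orientation_variation": orientation_variation}

# Example on a synthetic 2-second frame at 50 Hz.
rng = np.random.default_rng(1)
frame = rng.normal(0.0, 1.0, size=(100, 3)) + np.array([0.0, 0.0, 9.81])
print(frame_features(frame))
```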
Other EDA tasks of the project concern the unsupervised detection of user habits aimed at finding behavioral anomalies, by mining heterogeneous and multivariate time series of sensor data collected over long periods. In the project, these tasks are unsupervised to avoid an obtrusive data collection campaign at the user's site. For this reason, we focus on motif search over the sensor data collected in the trials, exploiting results obtained in the field of time series motif discovery [3,4]. Time series motifs are approximately repeated patterns found within the data. The approach chosen is based on stigmergy; several works have used this technique to infer motifs in time series from different fields, from DNA and biological sequences [5,6] to intrusion detection systems [7].
Human activity recognition refers to the process of inferring human activities from raw sensor data [8], classifying or evaluating specific sections of the continuous sensor data stream into specific human activities, events or health parameter values. Recently, the need for adaptive processing of temporal data from potentially large amounts of sensor data has led to an increasing use of machine learning models in activity recognition systems (see [9] for a recent survey), especially due to their robustness and flexibility. Depending on the nature of the data, the specific scenario considered, and the admissible trade-off among efficiency, flexibility and performance, different supervised machine learning methods have been applied in this area.
Among others, neural networks for sequences, including Recurrent Neural Networks (RNNs) [10], are considered a class of learning models suitable for approaching tasks characterized by a sequential/temporal nature and able to deal with noisy and heterogeneous input data streams. Within the class of RNNs, the Reservoir Computing (RC) paradigm [11] in general, and the Echo State Network (ESN) model [12,13] in particular, represent an interesting and efficient approach to building adaptive nonlinear dynamical systems. ESNs provide predictive models for efficient learning in sequential/temporal domains from heterogeneous sources of noisy data, supported by theoretical studies [13,14] and by hundreds of relevant successful experimental studies reported in the literature [15]. Interestingly, ESNs have recently proved to be particularly suitable for processing noisy information streams originated by sensor networks, resulting in successful real-world applications in supervised computational tasks related to AAL (Ambient Assisted Living) and human activity recognition. This is also testified by some recent results [16,17,18,19,20], which may be considered a first preliminary experimental assessment of the feasibility of ESNs for the estimation of some relevant target human parameters, although obtained on different and broader AAL benchmarks.
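For readers unfamiliar with the model, the sketch below implements a bare-bones ESN with a fixed random reservoir and a readout trained by ridge regression; the reservoir size, spectral radius, regularization, and the toy task are arbitrary illustrative choices, not those used in the project.

```python
# Hedged sketch: a minimal Echo State Network with a fixed random reservoir
# and a linear readout trained by ridge regression. All hyperparameters are
# arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(42)
n_in, n_res = 3, 100                     # e.g. 3 accelerometer channels
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.uniform(-1.0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius

def run_reservoir(inputs):
    """Collect reservoir states for an input sequence of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

# Train the readout on a toy task: predict the smoothed input magnitude.
T = 500
U = rng.normal(size=(T, n_in))
y = np.convolve(np.linalg.norm(U, axis=1), np.ones(5) / 5, mode="same")

X = run_reservoir(U)
ridge = 1e-4
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)

pred = run_reservoir(U) @ W_out
print(f"training RMSE: {np.sqrt(np.mean((pred - y) ** 2)):.3f}")
```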
At the reasoning level, our interest is in hybrid approaches founded on static rules and probabilistic methods. Multiple-stage decisions refer to decision tasks that consist of a series of interdependent stages leading towards a final resolution. The decision-maker must decide at each stage what action to take next in order to optimize performance (usually utility). Some examples of this sort are working towards a degree, troubleshooting, medical treatment, budgeting, etc. Decision trees are a useful means for representing and analyzing multiple-stage decision tasks; they support decisions learned from data, and their terminal nodes represent possible consequences [21].
Other popular approaches, which have been used to implement medical expert systems, are Bayesian Networks [22] and Neural Networks [23], but they require a large amount of empirical data to train and are not well suited to manual adjustment. On the other hand, in our problem the decision process must be transparent and mainly requires static rules based on medical guidelines provided by the professionals. Thus, decision trees are the best solution, since they provide a very structured and easy-to-understand graphical representation. There also exist efficient and powerful algorithms for the automated learning of trees [24,25,26]. A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a class label (the decision taken after evaluating all attributes). A path from root to leaf represents a classification rule. Decision trees give a simple representation for classifying examples. In general, as for all machine learning algorithms, accuracy increases with the number of samples. In applications in which the number of samples is not large, a high number of decisions could lead to problems. In these cases, a possible solution is the use of a hybrid Decision Tree/Genetic Algorithm approach, as suggested in [27].
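To illustrate the kind of transparent, rule-like classifier favoured here, the sketch below trains a small decision tree on synthetic protocol-compliance data and prints the learned rules; the feature names, thresholds, and labels are invented for illustration, not clinical guidelines.

```python
# Hedged sketch: a small decision tree over synthetic "protocol compliance"
# data. Feature names, thresholds and labels are illustrative assumptions,
# not DOREMI clinical rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
n = 200
# Hypothetical daily features: steps (thousands) and calorie balance (kcal).
X = np.column_stack([rng.uniform(0, 12, n), rng.uniform(-800, 800, n)])
# Hypothetical labeling rule standing in for a specialist's judgement.
y = ((X[:, 0] >= 5) & (np.abs(X[:, 1]) <= 300)).astype(int)  # 1 = compliant

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["steps_k", "kcal_balance"]))
# The printed tree can be reviewed (and, if needed, adjusted) by the
# specialist, which is the transparency argument made above.
```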
Our main objective is to provide a solution for prolonging the functional and cogni-
tive capacity of the elderly by proposing an “Active Ageing” lifestyle protocol. Medi-
cal specialists monitor the progress of their patients daily through a dashboard and
modify the protocol for each user according to their capabilities. A set of mobile applications (social games, exergames, cognitive games, and a diet application) feeds back to the end user the protocol proposed by the specialist and the progress of the games. The monitoring of each user is achieved by means of a network of sensors, either wearable or environmental, and of applications running on personal mobile devices. The human activity recognition (HAR) module measures characteristics of the elderly lifestyle in the physical and social domains through non-invasive monitoring solutions based on the sensor data. Custom mobile applications cover the areas of diet and cognitive monitoring. In the rest of this section, we present the main requirements of the HAR.
By leveraging environmental sensors, such as PIRs (Passive InfraRed sensors), and a localization system, the HAR module profiles user habits in terms of the daily ratio of room occupancy and of indoor/outdoor living. The system is also able to detect changes in user habits that occur over the long term. By relying on accelerometer and heartbeat data from a wearable bracelet, the HAR module provides time-slotted estimates, in terms of calories, of the energy expenditure associated with the physical activities of the user. Energy consumption can result from everyday activities and from the physical exercises proposed by the protocol. The system also computes the daily outdoor distance covered and the daily number of steps, and detects periods of excessive physical stress using the accelerometer and heartbeat data from the bracelet. Finally, a smart carpet is used to measure the user's weight and balance skills, leveraging a machine learning classification model based on the BERG balance assessment test.
The HAR assesses the social interactions of the user both indoors and outdoors. In particular, in the indoor case, HAR estimates a quantitative measure of social interactions based on the occurrence and duration of daily social gatherings at the user's house. Regarding outdoor socialization, the system estimates the duration of encounters with other users by detecting the proximity of the users' devices.
The Reasoner uses the data produced by the HAR, the diet application, and the games applications to provide indicators of the user's protocol compliance and protocol progress in three areas: social life, physical activity and related diet, and cognitive status. These indicators, along with the measured daily metrics and aggregated data, support medical specialists in making periodic changes to the protocol (i.e., the set of physical activities, game challenges, and diet). The Reasoner is able to suggest changes to the user's protocol by means of specialist-defined rules.
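As a toy illustration of such specialist-defined rules, the sketch below checks one day of aggregated metrics against hypothetical thresholds and emits suggestions; the metric names and thresholds are assumptions, not the project's rule base.

```python
# Hedged sketch: a specialist-defined rule evaluated over one day of
# aggregated metrics. Metric names and thresholds are illustrative
# assumptions, not the DOREMI rule base.
from dataclasses import dataclass

@dataclass
class DailyMetrics:
    steps: int
    outdoor_minutes: int
    social_minutes: int

def suggest_protocol_changes(m: DailyMetrics, step_goal: int = 6000):
    suggestions = []
    if m.steps < step_goal:
        suggestions.append("propose a longer daily walk")
    if m.outdoor_minutes < 30 and m.social_minutes < 20:
        suggestions.append("propose an outdoor social activity")
    return suggestions  # forwarded to the specialist's dashboard for approval

print(suggest_protocol_changes(DailyMetrics(steps=3500,
                                            outdoor_minutes=10,
                                            social_minutes=5)))
```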
The HAR module and the Reasoner are, therefore, core system modules, bridging the gap between sensor data, medical specialists, and the end user.
4 An Applicative Scenario
We consider a woman in her 70s, still independent and living alone in her apartment
(for the sake of simplicity, we give her the name of Loredana). She is a bit overweight
and she has started forgetting things. Recently, the specialist told her that she is at risk for
cardiovascular disease, due to her overweight condition, and that she has a mild cog-
nitive impairment. For this reason, Loredana uses our system as technological support to monitor her life habits, keep herself healthier, and prevent chronic diseases. In a typical day, Loredana measures her weight and balance by means of a
smart carpet, which collects data for the evaluation of her BERG scale equilibrium
and her weight. The data concerning the balance is used to suggest a personalized
physical activity (PA) plan, while the data about the weight give indications about the
effectiveness of the intervention in terms of a personalized diet regimen and PA plan.
During the day, Loredana wears a special bracelet, which measures her heart rate and (by means of an accelerometer) how much she walked and how many movements she made during physical exercises. These data are used by the system developed in the
project, to assess her calorie expenditure and to monitor the execution of the pre-
scribed physical exercises. The bracelet is also used to localize her both indoors (also collecting information about the time spent in each room) and outdoors (collecting information about the distance covered). Furthermore, the bracelet detects the proximity of Loredana to other users wearing the same bracelet, while machine-learning classification models based on environmental sensors deployed at home (PIRs and door switches) detect the presence of other people in her apartment to give an indication of the number of visits received. These data are used as an indicator of her social life.
Loredana also uses a tablet to interact with the system, with which she performs
cognitive games and inserts data concerning her meals, which are converted by the system (under the supervision of her specialist) into daily kcal intake and food
composition. She is also guided through the daily physical exercises and games that are
selected by the system (under the supervision of her specialist) based on the evolution
of her conditions (in terms of balance, weight, physical exercises etc.). All the data
collected during the day are processed at night to produce a summary of Loredana's lifestyle, with the purpose of giving feedback to Loredana in terms of proposed physical activity and presenting her condition to the specialist on a daily basis.
(Figure 1 diagram residue: pilot site with outdoor/indoor sensors and middleware; remote server with Pre-processing, EDA, Activity Recognition and Task Configurator subsystems, the Reasoner, and the RAW, HOMER and KIOLA databases; physician practice.)
Fig. 1. High-level architecture and deployment of a typical installation, with data flowing from
a pilot site to the remote Activity Recognition and Reasoning system.
information, which the user himself asserts daily through an app on his tablet. Finally,
the sedentariness data flow exploits data produced by the bracelet (heart rate, user’s
localization, movements, and step count), the smart carpet, and data about the use of
the application that guides the user through the daily physical activity. The HAR sub-
system processes the smart carpet data to assess the user's balance according to
the BERG scale. The HAR and EDA subsystems also process data from the bracelet
to assess the intensity of the physical effort.
The Reasoner fuses all these data flows at a higher level than that of the HAR and EDA. In a first step, it exploits rules extracted from clinical guidelines to compute specific parameters for each of the three data flows. For example, it uses the physical activity estimation in the sedentariness data flow to assess the compliance of the user with the prescribed lifestyle protocol (as defined by the medical expert). In a second step, the Reasoner performs cross-domain reasoning on top of the first step, allowing deeper insight into the well-being of the patient. Note that the empirical rules needed to define the second-level reasoning protocol are not yet available, as they are the output of medical studies on the data collected from the on-site experimentation that will be concluded in the next year of project activity. Hence, at this stage of the project, the second-level reasoning is not yet implemented.
Note that the Reasoner subsystem operates at a different time-scale with respect to
the HAR and EDA subsystems. In fact, the aim of HAR and EDA is the recognition of
short-term activities of the user. These can be recognized from a sequence of input
sensor information (possibly pre-processed) in a limited (short-term) time window.
All short-term predictions generated across the day are then forwarded to the Reasoner
for information integration across medium/long time scales. Medium-term reasoning operates over 24-hour periods (for example, to assess the calorie intake/consumption balance in a day). Long-term reasoning, on the other hand, shows general trends by
aggregating information on the entire duration of the experimentation in the pilot sites
(for example to offer statistical data about the user, which the medical experts can use
to assess the overall user improvement during the experimentation).
(Figure diagram residue: RAW data, HOMER DB and KIOLA DB, DB/task-configuration/output interfaces, the A.R. pool of activity recognition components AR1 … ARL, and the A.R. scheduler.)
components (based on predictive learning models), one for each specific task. These
components implement the trained predictive learning model obtained from a prelim-
inary validation phase, and they produce their predictions by computing the outputs of
the supervised learning model in response to the input data.
Fig. 3. Architecture of the Reasoner and its relationship with the other system components
The reasoning module and the dashboard are integrated into the KIOLA modular platform, suitable for clinical data and therapy management. It is built on top of the open-source web framework Django (http://www.djangoproject.com) and uses PostgreSQL 9.4 as the primary data storage. KIOLA has two groups of components: core components (that provide data
models for receiving and storing external sensor data, rule-based reasoning on obser-
vations, and messaging services to communicate results of the reasoning to external
systems), and frontend components (a dashboard for the specialists, an administrative
interface, and a search engine for all data stored in KIOLA). In particular, the dash-
board provides specialists with the possibility to review and adjust clinical protocols
online, and it is designed for both mobile devices and computers. The dashboard can
provide either an overview of all end users to whom the specialist has access, or a detailed view of a specific end user. Charts are used to visualize all observations in the areas of social, physical-game, and dietary data (see Fig. 4). A task module on the dashboard is also used to notify specialists when the reasoning system suggests an adaptation of the clinical protocol. Here, the specialist can approve or disapprove the recommendation, and he or she can tune the parameters of the protocol.
6 Conclusions
The DOREMI project addresses three important causes of morbidity and mortality in
the elderly (malnutrition, sedentariness and cognitive decline), by designing a solution
aimed at promoting an active aging lifestyle protocol. It envisages providing ICT-based home care services for aging people to counter cognitive decline, sedentariness and unhealthy dietary habits. The proposed approach builds on activity recognition and reasoning subsystems, which are the scope of this paper. At the current stage of the project such components are being deployed, and they will be validated in the course of the year by means of an extensive data collection campaign aimed at obtaining annotated datasets. These datasets are currently being collected from a group of elderly volunteers in Pisa, in view of the experimentation in the pilot sites planned for the beginning of 2016.
References
1. Tukey, J.W.: Exploratory data analysis, pp. 2–3 (1977)
2. Long, X., Yin, B., Aarts, R.M.: Single-accelerometer-based daily physical activity classifi-
cation. In: Engineering in Medicine and Biology Society, EMBC 2009. Annual Interna-
tional Conference of the IEEE. IEEE (2009)
3. Fernández-Llatas, C., et al.: Process Mining for Individualized Behavior Modeling Using
Wireless Tracking in Nursing Homes. Sensors 13(11), 15434–15451 (2013)
4. van der Aalst, W.M.P., et al.: Workflow mining: A survey of issues and approaches. Data
& Knowledge Engineering 47(2), 237–267 (2003)
5. Yang, C.-H., Liu, Y.-T., Chuang, L.-Y.: DNA motif discovery based on ant colony optimi-
zation and expectation maximization. In: Proceedings of the International Multi Confe-
rence of Engineers and Computer Scientists, vol. 1 (2011)
6. Bouamama, S., Boukerram, A., Al-Badarneh, A.F.: Motif finding using ant colony optimi-
zation. In: Dorigo, M., Birattari, M., Di Caro, G.A., Doursat, R., Engelbrecht, A.P.,
Floreano, D., Gambardella, L.M., Groß, R., Şahin, E., Sayama, H., Stützle, T. (eds.) ANTS
2010. LNCS, vol. 6234, pp. 464–471. Springer, Heidelberg (2010)
7. Cui, X., et al.: Visual mining intrusion behaviors by using swarm technology. In: 2011
44th Hawaii International Conference on System Sciences (HICSS). IEEE (2011)
8. Bao, L., Intille, S.S.: Activity Recognition from User-Annotated Acceleration Data. In:
Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 1–17. Springer,
Heidelberg (2004)
9. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wearable sen-
sors. Communications Surveys & Tutorials, IEEE 15(3), 1192–1209 (2013)
10. Kolen, J., Kremer, S. (eds.): A Field Guide to Dynamical Recurrent Networks. IEEE Press
(2001)
11. Lukoševicius, M., Jaeger, H.: Reservoir computing approaches to recurrent neural network
training. Computer Science Review 3(3), 127–149 (2009)
12. Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving
energy in wireless communication. Science 304(5667), 78–80 (2004)
13. Gallicchio, C., Micheli, A.: Architectural and markovian factors of echo state networks.
Neural Networks 24(5), 440–456 (2011)
14. Tino, P., Hammer, B., Boden, M.: Markovian bias of neural based architectures with
feedback connections. In: Hammer, B., Hitzler, P. (eds.) Perspectives of neural-symbolic
integration. SCI, vol. 77, pp. 95–133. Springer-Verlag, Heidelberg (2007)
15. Lukoševičius, M., Jaeger, H., Schrauwen, B.: Reservoir Computing Trends. KI -
Künstliche Intelligenz 26(4), 365–371 (2012)
16. Bacciu, D., Barsocchi, P., Chessa, S., Gallicchio, C., Micheli, A.: An experimental charac-
terization of reservoir computing in ambient assisted living applications. Neural Compu-
ting and Applications 24(6), 1451–1464 (2014)
17. Chessa, S., et al.: Robot localization by echo state networks using RSS. In: Recent Ad-
vances of Neural Network Models and Applications. Smart Innovation, Systems and
Technologies, vol. 26, pp. 147–154. Springer (2014)
18. Palumbo, F., Barsocchi, P., Gallicchio, C., Chessa, S., Micheli, A.: Multisensor data fusion
for activity recognition based on reservoir computing. In: Botía, J.A., Álvarez-García, J.A.,
Fujinami, K., Barsocchi, P., Riedel, T. (eds.) EvAAL 2013. CCIS, vol. 386, pp. 24–35.
Springer, Heidelberg (2013)
19. Bacciu, D., Gallicchio, C., Micheli, A., Di Rocco, M., Saffiotti, A.: Learning context-
aware mobile robot navigation in home environments. In: 5th IEEE Int. Conf. on Informa-
tion, Intelligence, Systems and Applications (IISA) (2014)
20. Amato, G., Broxvall, M., Chessa, S., Dragone, M., Gennaro, C., López, R., Maguire, L.,
Mcginnity, T., Micheli, A., Renteria, A., O’Hare, G., Pecora, F.: Robotic UBIquitous
COgnitive network. In: Novais, P., Hallenborg, K., Tapia, D.I., Rodrìguez, J.M. (eds.)
Ambient Intelligence - Software and Applications. AISC, vol. 153, pp. 191–195. Springer,
Heidelberg (2012)
21. Lavrac, N., et al.: Intelligent data analysis in medicine. IJCAI 97, 1–13 (1997)
22. Chae, Y.M.: Expert Systems in Medicine. In: Liebowitz, J. (ed.) The Handbook of applied
expert systems, pp. 32.1–32.20. CRC Press (1998)
23. Gurgen, F.: Neuronal-Network-based decision making in diagnostic applications. IEEE
EMB Magazine 18(4), 89–93 (1999)
24. Anderson, J.R., Machine learning: An artificial intelligence approach. In: Michalski, R.S.,
Carbonell, J.G., Mitchell, T.M. (eds.) vol. 2. Morgan Kaufmann (1986)
25. Hastie, T., et al.: The elements of statistical learning, vol. 2(1). Springer (2009)
26. Murphy, K.P.: Machine learning: a probabilistic perspective. MIT Press (2012)
27. Carvalho, D.R., Freitas, A.A.: A hybrid decision tree/genetic algorithm method for data mining. Information Sciences 163(1), 13–35 (2004)
28. Fuxreiter, T., et al.: A modular platform for event recognition in smart homes. In: 12th
IEEE Int. Conf. on e-Health Networking Applications and Services (Healthcom), pp. 1–6
(2010)
29. Kreiner, K., et al.: Play up! A smart knowledge-based system using games for preventing
falls in elderly people. Health Informatics meets eHealth (eHealth 2013). In: Proceedings
of the eHealth 2013, OCG, Vienna, pp. 243–248 (2013). ISBN: 978-3-85403-293-9
Gradient: A User-Centric Lightweight
Smartphone Based Standalone Fall
Detection System
1 Introduction
Advances in health care services and technologies, together with decreasing fertility rates (especially in developed countries), are bringing major demographic changes in aging. Currently, around 10% of the world population is aged over 65, and it is estimated that this demographic group will increase tenfold in the next 50 years [1]. Falls are a major contributor to the growing mortality and morbidity rates of the aging population, and fall-induced complications also contribute to higher health care costs [2]. According to the U.S. Census Bureau, 13% of the population is over 65 years old; 40% of older adults living at home fall at least once a year, and 1 in 40 is hospitalised. Those who are hospitalised have only a 50% chance of being alive a year later. Falls are a major health
threat not only to independently living elders [3] but also to community-dwelling ones, for whom the fall rate is estimated to be 30% to 60% annually [4]. Elders who suffer from visual impairment, urinary incontinence, and functional limitations are at increased risk of recurrent falls [5]. Pervasive fall detection systems that meet the needs of the aging population are therefore a necessity, especially since the growth of the aging population and the estimated decrease in health care professionals call for technology-assisted intelligent health care solutions [6].
Considerable efforts have been made to design fall detection solutions for elderly populations; see [7] for a comprehensive list of solutions proposed by researchers. As more of society relies on smartphones for their ubiquitous internet connectivity and computing, smartphone-based fall detection with health monitoring and emergency response is highly desirable for the elderly population. In fact, recent trends suggest that smartphones are reducing the need to wear watches, the most common wearable gadget of the past century [8], making smartphones the most commonly carried device. Naturally, effective smartphone-based fall detection solutions that do not require any infrastructure external to the smartphone are better suited to the needs of the elderly population.
In this work, we aim to design a smartphone-based standalone fall detection system that is portable, cost-efficient, user-friendly, privacy-preserving [9], and requires only existing cellphone technology. In addition, such a solution must exhibit low memory and computational overhead, which is fitting since cellphones are constrained by limited energy and memory. Clearly, this motivates design approaches centred on the sensing technology already available in cellphones. Among the available sensors, the accelerometer is considered to be accurate [10] and is therefore a natural choice for the design of such a system. In fact, solutions combining accelerometer and orientation sensors have been proposed in past research; iFall [11], for example, is one of the most notable solutions that meets the design requirements mentioned above. However, through experimentation, we show that smartphone fall detection solutions that supplement accelerometer data with orientation sensor data are not very accurate. Therefore, to meet the above requirements, we propose a novel fall detection mechanism that utilizes gravity sensor data and accelerometer data. Experimental comparison shows that the proposed approach is superior to its peers. In our work, the Android operating system is used for experimentation and implementation.
This paper is organized as follows: in Section 2, we discuss recent and landmark research in this area. Section 3 presents an Android-based application for data acquisition and our proposed fall detection method. In Section 4, an alpha test of Gradient is performed using simulated data. Finally, a summary and possible future work are discussed in Section 5.
2 Related Work
Various studies have been carried out on fall detection using wearable sensors and mobile phones. We divide fall detection systems into two categories: those based on dedicated wearable sensors and those based on smartphones.
The first accelerometer-data-driven fall detection system was proposed in [12]. Their system detects a fall when there is a change in body orientation from upright to lying immediately after a large negative acceleration. This design later became a reference point for many accelerometer-based fall detection algorithms.
A popular Android application, iFall [11], was developed for fall monitoring and response. Data acquired from the accelerometer through the application is evaluated with several threshold-based algorithms to determine a fall. Basic body metrics such as height and weight, along with the level of activity, are used for estimating the threshold values. An alert notification system sends an SMS or places an emergency call in moderate-to-critical situations and emergencies.
A threshold-based fall detection algorithm is proposed in [13]. The algorithm works by detecting dynamic postures, followed by unintentional falls into lying postures. Thresholds calculated from the collected data are compared with the linear acceleration and angular velocity sensor data to detect a fall. After a fall is detected, a notification is sent to raise an alert. The authors used a gyroscope for data acquisition; however, gyroscope data is not effective in detecting falls accurately, which makes the proposed system a poor choice for fall monitoring. In [14], accelerometer and gyroscope sensors are used together to detect falls in the elderly population. Gravity and angular velocity are extracted from the data, so that not only the fall itself but also the posture of the body during the fall is detected. The authors evaluated the system to measure false positive and false negative detections, and false fall positions were used in the study to assess the impact and efficiency of the algorithm. However, the approach used to detect falls in their study is too simple to detect quick falls from accelerometer data; as a result, the system performs poorly in differentiating activities such as jumping into bed or falling against a wall into a seated posture.
In [15], a smartphone-based fall detection system is proposed that uses a threshold-based algorithm to distinguish between activities of daily living and falls in real time. Through a comparative analysis of acceleration threshold levels aimed at the best sensitivity and specificity, thresholds were determined for an early pre-impact alarm (4.5–5 m/s²) and for post-fall detection (21–28 m/s²) under experimental conditions. The experimentally calculated thresholds are helpful for further study in this area of research, but the accelerometer alone is not sufficient for detecting falls effectively.
The paper [16] describes a fall detection sensor that monitors the subject safely and accurately by integrating it into a large sensor network called SensorNet. Initial approaches, such as a conjoined angle-change and magnitude detection algorithm, were unsuccessful. Another drawback of this approach is its inability to detect
complex falls in which the user does not end up oriented horizontally with the ground. The fall-detection board designed was able to detect 90% of all falls with a 5% false positive rate.
The Ivy project [17] used a low-cost, low-power wearable accelerometer on a wireless sensor network to detect falls. Thresholds on the peak accelerometer values were estimated, along with orientation angle data, to detect a fall. The authors found that the intensity and acceleration of a fall differ substantially from those of other activities. The main drawback of this system is that it only works well in indoor environments because of its dependence on a fixed network to relay events. In the study [18], an accelerometer sensor is used to detect the wearer's posture, activity and falls in a wireless sensor network. Activity is determined from the alternating-current component and posture from the direct-current component of the accelerometer signal. The fall detection rate of the proposed system is 93.2%. The paper, however, does not explain the actual algorithm in detail, and the complexity and cost involved in designing the system make it less suitable for fall detection in real-life scenarios.
In [19], the authors proposed a fall detection system that uses two sets of sensors, one with an accelerometer and a gyroscope and the other with only an accelerometer. They used the sensor data to calculate angular data, and their system outperforms earlier systems by detecting falls with an average lead time of 700 milliseconds before the impact with the ground occurs. The major drawback of the system is that the subject has to wear torso and thigh sensors, which tends to be cumbersome. Moreover, the system is not capable of recording data in real-life situations.
3 Methods
In this section, we first discuss the principle behind previous research that relies on orientation-sensor-based design approaches; we then describe the gravity-sensor-based approach that forms the basis of our proposal. We further show that the gravity-sensor-based design is more accurate and is a natural choice for user-centric and device-friendly fall detection solutions.
Consider the example in Figure 1(b), where the axis system is rotated around the z axis by θ. The rotated axes, shown as X′ and Y′, are the axis coordinates relative to the phone. The contribution of this rotation to the X component of the fixed axes is given as $a_X = r_{xy}\cos(\theta + k)$, where $a_x$ and $a_y$ are the X and Y components of the acceleration vector and $k$ is the angle the acceleration vector makes with the X axis. Combining the effect of azimuth, roll and pitch, we obtain the following matrix:
$$
M = \begin{bmatrix}
\cos\psi\cos\phi & -\sin\psi\cos\theta & \sin\psi\sin\theta \\
\sin\psi\cos\phi & \cos\psi\cos\theta & -\cos\psi\sin\theta \\
-\sin\psi & \cos\phi\sin\theta & \cos\phi\cos\theta
\end{bmatrix}
+
\begin{bmatrix}
0 & \cos\psi\sin\phi\sin\theta & \cos\psi\sin\phi\cos\theta \\
0 & \sin\phi\sin\theta & \sin\psi\sin\phi\cos\theta \\
0 & 0 & \cos\phi\cos\theta
\end{bmatrix} \quad (1)
$$
The relation between the body (phone) axes [x′, y′, z′] and the earth axes [x, y, z] is then obtained by applying the rotation matrix M.
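For illustration, the following sketch composes elementary rotations for azimuth ψ (about z), pitch θ (about x) and roll φ (about y) with NumPy and uses the result to map a phone-frame acceleration reading into the earth frame. The z-x-y rotation order and the sample values are assumptions; the exact convention behind Eq. (1) may differ.

```python
import numpy as np

def rotation_matrix(psi, theta, phi):
    """Compose azimuth (z), pitch (x) and roll (y) rotations.

    The z-x-y order is assumed for illustration only; the convention
    of Eq. (1) in the paper may differ.
    """
    Rz = np.array([[np.cos(psi), -np.sin(psi), 0],
                   [np.sin(psi),  np.cos(psi), 0],
                   [0,            0,           1]])
    Rx = np.array([[1, 0,             0],
                   [0, np.cos(theta), -np.sin(theta)],
                   [0, np.sin(theta),  np.cos(theta)]])
    Ry = np.array([[ np.cos(phi), 0, np.sin(phi)],
                   [ 0,           1, 0],
                   [-np.sin(phi), 0, np.cos(phi)]])
    return Rz @ Rx @ Ry

# Map an acceleration vector measured in the phone frame to the earth frame.
a_phone = np.array([0.3, 0.1, 9.7])                      # example reading (m/s^2)
M = rotation_matrix(np.radians(30), np.radians(10), np.radians(5))
a_earth = M @ a_phone
print(a_earth)
```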
The gravity sensor returns only the influence of gravity. The gravity vector in the phone coordinate system is given as $\vec{G} = (g_x, g_y, g_z)$, and the acceleration in the same axis system is given as $\vec{A} = (a_x, a_y, a_z)$.
From Figure 2, since both vectors are expressed in the same (phone) coordinate system, the angle between the two can easily be found by a simple vector inner product, and thus the acceleration component of the cellphone parallel to the gravity vector is given as
$$\vec{A}_{par} = a_{z'} = (\vec{A} \cdot \vec{G})\,\frac{\vec{G}}{\|\vec{G}\|^{2}} \quad (3)$$
4.2 Algorithm
Gradient is based on the observation that when a user falls there is a sudden change in acceleration in the vertical downward direction, and therefore the downward acceleration component contains enough information to detect fall events. We ran several laboratory experiments to verify this hypothesis. Sensor data is collected on a mobile device using the Gradient application. The time derivative of the vertical downward component is then compared with a preset threshold value to identify fall events. Note that the name Gradient is derived from Gravity-differentiation, the driving idea behind the proposed solution. The detailed algorithm is presented below:
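A minimal Python sketch of the gravity-differentiation idea described above (not the authors' exact listing); the sampling rate, the threshold value and the synthetic data are assumptions used only to make the sketch runnable.

```python
import numpy as np

def gravity_parallel_component(a, g):
    """Signed component of acceleration a along the gravity direction g
    (cf. the projection in Eq. (3))."""
    return np.dot(a, g) / np.linalg.norm(g)

def detect_falls(acc, grav, fs=50.0, threshold=25.0):
    """Flag samples where the time derivative of the gravity-parallel
    acceleration exceeds a preset threshold.

    acc, grav : arrays of shape (n, 3) with accelerometer and gravity
                sensor readings; fs is the sampling rate in Hz.
    The threshold (in m/s^3) is an illustrative value only.
    """
    a_par = np.array([gravity_parallel_component(a, g)
                      for a, g in zip(acc, grav)])
    d_a_par = np.diff(a_par) * fs            # numerical differentiation in time
    return np.where(np.abs(d_a_par) > threshold)[0]

# Example with synthetic data: a sharp downward change around sample 100.
rng = np.random.default_rng(0)
grav = np.tile([0.0, 0.0, 9.81], (200, 1))
acc = rng.normal(0.0, 0.2, size=(200, 3)) + grav
acc[100] += [0.0, 0.0, -15.0]                # simulated impact
print(detect_falls(acc, grav))
```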
In this section, the performance of the proposed solution is discussed along with the experiment scenarios. We did not test our system on real fall datasets¹ because gravity sensor values are not available in them. However, we did our best to collect fall data in a controlled environment that matches real-life fall events.
The Gradient app was distributed among internal members of the research team (4 graduate students and 2 faculty members) to evaluate user acceptance of the app, gather feedback on its use, and assess its accuracy for data acquisition. Data was collected on a Samsung Galaxy S4, which has a quad-core 1.6 GHz processor and 2 GB of RAM and runs Android OS v4.2.2 (Jelly Bean). Members of the team carried the smartphone with the Gradient application during their daily routine of work and activities, across different time spans and numbers of days. Detailed logs of the times when they fell were maintained to check the app's results after the experiments. Feedback was collected from the members to improve every aspect of data acquisition in the app. Two of the experiments performed by team members are shown in Figures 4a and 4b, respectively. In the first experiment (Figure 4a), there is a fall at the end of the 6th minute of data acquisition, and a second fall is clearly visible at the end of the 7th minute. Similarly, in Figure 4b, falls can be seen in the middle of the 6th and 7th minutes, respectively.
¹ http://www.bmi.teicrete.gr/index.php/research/mobifall
Fig. 4. (a) Experiment 1; (b) Experiment 2.
The positive predictive value (86.96%) and sensitivity (90.91%) show the effectiveness of the proposed system in detecting falls.
We plotted the time series data in Figure 4. The first subplot is drawn from the magnitude of the accelerometer sensor data, which is calculated as
$$|\vec{A}| = \sqrt{a_x^2 + a_y^2 + a_z^2}$$
In this section, we compare our work with one of the most notable systems, iFall [11]. The experiment was performed by running iFall and Gradient concurrently, alongside a stopwatch used to measure the exact time of real falls. The comparison between iFall and our proposed design is presented in Figure 5. The upper numbered labels 1 through 9 represent falls detected by iFall, and the lower numbered labels 1 through 4 represent falls detected by Gradient.
Fig. 5. Comparison between iFall (red squares) and Gradient (blue squares).
In Figure 5, we observe that although iFall successfully detects fall events, it also outputs several false positives when no fall has occurred. We observe that if the device is in running motion or undergoes a jerky motion such as a shake, iFall records such events as falls. Clearly, Gradient shows better accuracy than iFall.
6 Conclusion
Falls are a major health risk among older people around the world. Fall detection using computational approaches has remained a challenging task, prompting researchers to propose various computational methods to detect the occurrence of falls; however, solutions that are user-centric and device-friendly remain elusive. In this paper, we proposed a novel approach to fall detection using the accelerometer and gravity sensors that are now integral components of smartphones. We designed an Android application to collect experimental data and applied our algorithm to test the accuracy of the system. Our initial results are very promising, and the proposed method has the potential to reduce the false positives that are a common problem with other popular user-centric and device-friendly systems. Furthermore, we believe that this system can help caretakers, health professionals, and medical practitioners better manage health hazards due to falls in elderly people. In the future, we plan to conduct a user study with a healthcare center and test our system on real fall datasets.
References
1. Haub, C.: World population aging: clocks illustrate growth in population under
age 5 and over age 65. Population Reference Bureau, June 18, 2013 (2011)
2. Fulks, J., Fallon, F., King, W., Shields, G., Beaumont, N., Ward-Lonergan, J.:
Accidents and falls in later life. Generations Review 12(3), 2–3 (2002)
3. Duthie Jr, E.: Falls. The Medical clinics of North America 73(6), 1321–1336 (1989)
4. Graafmans, W., Ooms, M., Hofstee, H., Bezemer, P., Bouter, L., Lips, P.: Falls in
the elderly: a prospective study of risk factors and risk profiles. American Journal
of Epidemiology 143(11), 1129–1136 (1996)
5. Tromp, A., Pluijm, S., Smit, J., Deeg, D., Bouter, L., Lips, P.: Fall-risk screening
test: a prospective study on predictors for falls in community-dwelling elderly.
Journal of Clinical Epidemiology 54(8), 837–844 (2001)
78 A. Bhatia et al.
6. Kleinberger, T., Becker, M., Ras, E., Holzinger, A., Müller, P.: Ambient intelligence
in assisted living: enable elderly people to handle future interfaces. In: Stephanidis,
C. (ed.) UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 103–112. Springer, Heidelberg
(2007)
7. Igual, R., Medrano, C., Plaza, I.: Challenges, issues and trends in fall detection
systems. BioMedical Engineering OnLine 12(1), 1–24 (2013)
8. Phones replacing wrist watches. http://today.yougov.com/news/2011/05/05/
brother-do-you-have-time/ (online accessed April 22, 2014)
9. Ziefle, M., Rocker, C., Holzinger, A.: Medical technology in smart homes: exploring
the user’s perspective on privacy, intimacy and trust. In: 2011 IEEE 35th Annual
Computer Software and Applications Conference Workshops (COMPSACW),
pp. 410–415. IEEE (2011)
10. Lindemann, U., Hock, A., Stuber, M., Keck, W., Becker, C.: Evaluation of a fall
detector based on accelerometers: A pilot study. Medical and Biological Engineer-
ing and Computing 43(5), 548–551 (2005)
11. Sposaro, F., Tyson, G.: iFall: An Android application for fall monitoring and
response. In: Annual International Conference of the IEEE Engineering in Medicine
and Biology Society, EMBC 2009, pp. 6119–6122. IEEE (2009)
12. Williams, G., Doughty, K., Cameron, K., Bradley, D.: A smart fall and activ-
ity monitor for telecare applications. In: Proceedings of the 20th Annual Interna-
tional Conference of the IEEE Engineering in Medicine and Biology Society, 1998,
vol. 3, pp. 1151–1154. IEEE (1998)
13. Wibisono, W., Arifin, D.N., Pratomo, B.A., Ahmad, T., Ijtihadie, R.M.: Falls
detection and notification system using tri-axial accelerometer and gyroscope sen-
sors of a smartphone. In: 2013 Conference on Technologies and Applications of
Artificial Intelligence (TAAI), pp. 382–385. IEEE (2013)
14. Li, Q., Stankovic, J.A., Hanson, M.A., Barth, A.T., Lach, J., Zhou, G.: Accurate,
fast fall detection using gyroscopes and accelerometer-derived posture informa-
tion. In: Sixth International Workshop on Wearable and Implantable Body Sensor
Networks, BSN 2009, pp. 138–143. IEEE (2009)
15. Mao, L., Liang, D., Ning, Y., Ma, Y., Gao, X., Zhao, G.: Pre-impact and impact
detection of falls using built-in tri-accelerometer of smartphone. In: Zhang, Y.,
Yao, G., He, J., Wang, L., Smalheiser, N.R., Yin, X. (eds.) HIS 2014. LNCS,
vol. 8423, pp. 167–174. Springer, Heidelberg (2014)
16. Brown, G.: An accelerometer based fall detector: development, experimentation,
and analysis. University of California, Berkeley (2005)
17. Chen, J., Kwong, K., Chang, D., Luk, J., Bajcsy, R.: Wearable sensors for reli-
able fall detection. In: 27th Annual International Conference of the Engineering in
Medicine and Biology Society, IEEE-EMBS 2005, pp. 3551–3554. IEEE (2006)
18. Lee, Y., Kim, J., Son, M., Lee, J.H.: Implementation of accelerometer sensor mod-
ule and fall detection monitoring system based on wireless sensor network. In: 29th
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, EMBS 2007, pp. 2315–2318. IEEE (2007)
19. Nyan, M., Tay, F.E., Murugasu, E.: A wearable system for pre-impact fall detection.
Journal of Biomechanics 41(16), 3475–3481 (2008)
20. Google Inc.: Android Gingerbread OS (2013). http://developer.android.com/about/versions/android-2.3-highlights.html (online; accessed April 04, 2014)
Towards Diet Management with Automatic
Reasoning and Persuasive
Natural Language Generation
1 Introduction
The daily diet is one of the most important factors influencing disease, in particular obesity. As highlighted by the World Health Organization, this is primarily due to recent changes in lifestyle [26]. The necessity of encouraging the world's population toward a healthy diet has been promoted by the FAO [20]. In addition, many states have specialised these guidelines by adopting strategies related to their food history (for instance, for the USA, http://www.choosemyplate.gov). In Italy, the Italian Society for Human Nutrition has recently produced a prototypical study with recommendations for the use of specialised operators [1].
This scenario suggests the possibility of integrating nutrition directives into people's daily diet by using multimedia tools on mobile devices. The smartphone can be considered a super-sense that creates new modalities of interaction with food. In recent years there has been growing interest in using multimedia applications on mobile devices as persuasive technologies [13].
Often a user is not able to carefully follow a diet, for a number of reasons. When a deviation occurs, it is useful to support the user in understanding the consequences of such a deviation and to dynamically adapt the rest of the diet in the upcoming meals so that the global Dietary Reference Values (henceforth DRVs)
fast food or restaurant chains, where the effort of deploying the system can be
rewarded by an increase in customer retention.
This paper is organized as follows: in Section 2 we describe the automatic
reasoning facilities, in Section 3 we describe the design of the persuasive NLG
based on different theories of persuasion and, finally, in Section 4 we draw some
conclusions.
Fig. 2. Example of DRVs for a week represented as an STP (for space reasons, the constraints for the meals are not shown).
From the user's weight, gender and age, using the Schofield equation [24], it is possible to estimate the basal metabolic rate; for example, a 40-year-old male who is 1.80 m tall and weighs 71.3 kg has an estimated basal metabolic rate of 1690 kcal/day. This value is then adjusted [1] by taking into account the energy expenditure related to the physical activity of the individual; for example, a sedentary lifestyle corresponds to a physical activity level of 1.45 and, since the physical activity level is a multiplicative factor, the person in the example has a total energy requirement of 2450 kcal/day. Moreover, it is recommended [1] that such energy be provided by appropriate amounts of the different macronutrients, e.g., 260 kcal/day of proteins, 735 kcal/day of lipids and 1455 kcal/day of carbohydrates. In this section we focus on the total energy requirement; the macronutrients can be dealt with separately in the same way.
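As a worked check of the figures above, the following sketch reproduces the 1690 kcal/day and roughly 2450 kcal/day estimates, assuming the weight-only Schofield equation for men aged 30–60 (BMR in MJ/day = 0.048·W + 3.653); whether the paper uses this variant or the weight-and-height one is an assumption.

```python
MJ_TO_KCAL = 239.006  # 1 MJ expressed in kcal

def schofield_bmr_male_30_60(weight_kg):
    """Basal metabolic rate (kcal/day), weight-only Schofield equation
    for men aged 30-60; the choice of this variant is an assumption."""
    return (0.048 * weight_kg + 3.653) * MJ_TO_KCAL

bmr = schofield_bmr_male_30_60(71.3)      # ~1691 kcal/day (paper: 1690)
total = bmr * 1.45                        # sedentary physical activity level
print(round(bmr), round(total))           # ~1691 ~2452 (paper: ~2450)
```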
We represent the DRVs as STPs; more precisely, we use an STP con-
straint to represent – instead of temporal distance between temporal points
– the admissible DRVs. Thus, e.g., a recommendation to eat a lunch of min-
imum 500 kcal and maximum 600 kcal is represented by the STP constraint
500 ≤ lunchE − lunchS ≤ 600, where lunchE and lunchS represent the end and
the start of the lunch, respectively.
Furthermore, we exploit the STP framework to allow a user to make small deviations with regard to the "ideal" diet and to know in advance what the consequences of such deviations are on the rest of the diet. Thus, we impose less strict constraints over the shortest periods (i.e., days or meals) and stricter constraints over the longest periods (i.e., weeks, months). For example, the recommended energy requirement of 2450 kcal/day, considered over a week, results in a constraint such as 2450 · 7 ≤ weekE − weekS ≤ 2450 · 7, and for the single days we allow the user to set, e.g., a deviation of 10%, thus resulting in the constraints 2450 − 10% ≤ SundayE − SundayS ≤ 2450 + 10%, . . . , 2450 − 10% ≤ SaturdayE − SaturdayS ≤ 2450 + 10% (see Fig. 2). For single meals we can further relax the constraints: for example, the user can decide to split the daily energy intake among the meals (e.g., 20% for breakfast and 40% for lunch and dinner) and to further relax the constraints (e.g., by 30%), thus resulting in a constraint such as 2450 · 20% − 30% ≤ Sunday breakfastE − Sunday breakfastS ≤ 2450 · 20% + 30%.
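For concreteness, a minimal sketch of how such DRV constraints could be encoded as STP edges (pairs of time points with lower/upper bounds); the variable names are illustrative, not taken from the authors' implementation. Constraint propagation over this kind of encoding is sketched further below.

```python
# Each STP constraint is (start_point, end_point, lower, upper), read as
# lower <= end_point - start_point <= upper.  Here the "distance" is kcal.
daily, tol = 2450, 0.10
week_days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

constraints = [("weekS", "weekE", daily * 7, daily * 7)]          # weekly total
constraints += [(f"{d}S", f"{d}E",
                 daily * (1 - tol), daily * (1 + tol))            # +/- 10% per day
                for d in week_days]
# A meal-level constraint, e.g. 20% of the day for breakfast, relaxed by 30%:
constraints.append(("Sun_breakfastS", "Sun_breakfastE",
                    daily * 0.20 * (1 - 0.30), daily * 0.20 * (1 + 0.30)))
```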
Representing and Reasoning on the Diet and the Food. Along these lines, it is possible to represent the dietary recommendations for a specific user. However, we wish to support such a user in taking advantage of the information regarding the actual meals s/he consumes. In this way, the user can learn what the consequences of eating a specific dish are on his/her diet, and s/he can use this information to make informed decisions about current or future meals. Therefore it is necessary to "integrate" the information about the eaten dishes with the dietary recommendations. We devise a system where the user inputs the data about the food s/he is eating using a mobile app, where the input is possibly supported by reading a QR code, and s/he can also specify the amount of food s/he has eaten. Thus, we allow some imprecision due to possible differences in the portions (in fact, the actual amount of food in a portion is not always the same and, furthermore, a user may not eat a whole portion) or in the composition of the dish [6]. We support this feature by using STP constraints also for representing the nutritional values of the eaten food.
The dietary recommendations can be considered constraints on classes, which can be instantiated several times when the user consumes his/her meals. Thus, the problem of checking whether a meal satisfies the constraints of the dietary recommendations corresponds to checking whether the constraints of the instances satisfy the constraints of the classes. This problem has been dealt with in [25] and [2]. In these works the authors considered the problem of "inheriting" the temporal constraints from classes of events to instances of events in the context of the STP framework, also taking into account problems deriving from correlation between events and from observability. Our setting is simpler, since correlation is known and observability is complete (even if possibly imprecise). Thus, we generate a new, provisional STP to which we add the STP constraints deriving from the meals that the user has consumed; the added constraints possibly restrict the values allowed by the constraints already in the STP. We then propagate the constraints in this new STP, determine whether they are consistent, and obtain the new minimal network with the implied relations. For example, let us suppose that on Sunday, Monday and Tuesday the user had an actual intake of 2690 kcal each day. This corresponds to adding to the STP the new constraints 2690 ≤ SundayE − SundayS ≤ 2690, . . . , 2690 ≤ TuesdayE − TuesdayS ≤ 2690. Then, propagating the constraints of the new STP (see Fig. 3), we discover that (i) the STP is consistent, and thus the intake is compatible with the diet, and (ii) on each remaining day of the week the user has to consume a minimum of 2205 kcal and a maximum of 2465 kcal.
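A self-contained sketch of this propagation step, assuming (as the example above implies) that the days of the week are chained so that their intakes sum to the weekly total; the point names and the code are illustrative, not the authors' implementation. Running it reproduces the [2205, 2465] kcal bounds for each remaining day:

```python
import itertools

INF = float("inf")

def minimal_network(points, constraints):
    """Floyd-Warshall on the STP distance graph: d[i][j] becomes the
    tightest implied upper bound on x_j - x_i.  Returns None if the
    network is inconsistent (a negative cycle exists)."""
    d = {i: {j: (0 if i == j else INF) for j in points} for i in points}
    for i, j, lo, hi in constraints:              # lo <= x_j - x_i <= hi
        d[i][j] = min(d[i][j], hi)
        d[j][i] = min(d[j][i], -lo)
    for k, i, j in itertools.product(points, points, points):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in points):
        return None
    return d

days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
points = ["P0"] + [f"end_{day}" for day in days]       # chained day boundaries
constraints = [("P0", "end_Sat", 2450 * 7, 2450 * 7)]  # weekly total
prev = "P0"
for day in days:
    constraints.append((prev, f"end_{day}", 2205, 2695))   # 2450 +/- 10%
    prev = f"end_{day}"
# Observed intakes: 2690 kcal on Sunday, Monday and Tuesday.
for a, b in [("P0", "end_Sun"), ("end_Sun", "end_Mon"), ("end_Mon", "end_Tue")]:
    constraints.append((a, b, 2690, 2690))

d = minimal_network(points, constraints)
wed_lo, wed_hi = -d["end_Wed"]["end_Tue"], d["end_Tue"]["end_Wed"]
print(wed_lo, wed_hi)    # 2205 2465 -> implied bounds for Wednesday (kcal)
```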
Although the information deriving from the STP is complete (and correct), in order to show the user meaningful feedback and to make it possible to interface the automatic reasoning module with the NLG module, it is useful
to interpret the results of the STP. In particular, we wish to provide the user with user-friendly information, not limited to a harsh "consistent/inconsistent" answer regarding the adequacy of a dish with regard to her/his diet. Therefore we consider the case where the user proposes a dish to our system: we obtain its nutritional values, translate them, along with the user's diet and past meals, into an STP and, by propagating the constraints, obtain the minimal network. By taking into account a single macronutrient (carbohydrates, lipids or proteins), the resulting STP allows us to classify the macronutrient in the proposed dish into one of the following five cases: permanently inconsistent (I.1), occasionally inconsistent (I.2), consistent and not balanced (C.1), consistent and well-balanced (C.2) and consistent and perfectly balanced (C.3).
In cases I.1 and I.2 the value of the macronutrient is inconsistent. In case I.1 the value of the nutrient is inconsistent with the DRVs as represented in the user's diet: the dish cannot be accepted, independently of the other food s/he may possibly eat. This case is detected by checking whether the macronutrient violates a constraint on the classes. In case I.2 the dish per se does not violate the DRVs, but, considering the past meals s/he has eaten, it would preclude consistency with the diet. Thus, it is inconsistent now, but it could become possible to choose it in the future, e.g., next week or month. This case is detected by determining whether the macronutrient, despite satisfying the constraints on the classes, is inconsistent with the propagated inherited STP.
In the cases C.1, C.2 and C.3 the value of the macronutrient is consistent
with the diet, also taking into account the other dishes that the user has already
eaten. It is possible to detect that the dish is consistent by exploiting the minimal
network of the STP: if the value of the macronutrient is included between the
lower and upper bounds of the relative constraint, then we are guaranteed that
the STP is consistent and that the dish is consistent with the diet. This can
be proven by using the property that in a minimal network every tuple in a
constraint can be extended to a solution [19]. A consistent but not balanced
choice of a dish will have consequences on the rest of the user’s diet because the
user will have to "recover" from it. Thus we distinguish three cases depending on how adequate the value of the macronutrient is for the diet. In order to discriminate between cases C.1, C.2 and C.3, we consider where the value of the macronutrient falls within the allowed range represented in the related STP constraint. We assume that the mean value is the "ideal" value according to the DRVs and we consider two parametric, user-adjustable thresholds relative to the mean: according to the deviation with respect to the mean, we classify the macronutrient as not balanced (C.1), well balanced (C.2) or perfectly balanced (C.3) (see Fig. 4). In particular, we distinguish between a lack or an excess of a specific macronutrient in a dish: if a macronutrient is lacking (in excess) with regard to the ideal value, we tag the dish with the keyword IPO (IPER). This information will be exploited in the generation of the messages.
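A minimal sketch of this classification step, assuming the implied [lower, upper] bounds come from the propagated minimal network and that the two thresholds are expressed as fractions of the half-range around the mean; the threshold values and function names are illustrative only.

```python
def classify_macronutrient(value, class_lo, class_hi, lo, hi,
                           well=0.50, perfect=0.10):
    """Classify a dish's macronutrient value against the diet.

    class_lo/class_hi : bounds of the class-level (DRV) constraint.
    lo/hi             : implied bounds from the propagated minimal network.
    well/perfect      : user-adjustable thresholds, as fractions of the
                        half-range around the mean (illustrative values).
    Returns (label, direction), direction being 'IPO', 'IPER' or None.
    """
    if not (class_lo <= value <= class_hi):
        label = "I.1"                        # permanently inconsistent
    elif not (lo <= value <= hi):
        label = "I.2"                        # occasionally inconsistent
    else:
        mean, half = (lo + hi) / 2, (hi - lo) / 2
        dev = abs(value - mean) / half if half else 0.0
        label = "C.3" if dev <= perfect else "C.2" if dev <= well else "C.1"
    direction = None
    if label != "C.3":
        direction = "IPO" if value < (lo + hi) / 2 else "IPER"
    return label, direction

# Example: implied daily bounds of [2205, 2465] kcal, class bounds 2205-2695.
print(classify_macronutrient(2600, 2205, 2695, 2205, 2465))  # ('I.2', 'IPER')
```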
Table 1. The persuasive message templates: the underline denotes the variable parts of
the template. The column C contains the classification produced by the STP reasoner,
while the column D contains the direction of the deviation: IPO (IPER) stands for
the information that the dish is poor (rich) in the value of the macronutrient.
interpreting the output of the reasoner (cf. Section 2.3) and possible suggestions that can guide the choices of the user in the next days. The suggestions can be obtained from a simple table that couples the excess (deficiency) of a macronutrient with a dish that could compensate for this excess (deficiency). In particular, for the reasoner's outputs I.1, I.2, C.1 and C.2, we need to distinguish the case of a dish poor in a macronutrient (IPO in Table 1) from the case of a dish rich in a macronutrient (IPER). If the dish is classified as IPO (IPER), we insert into the message a suggestion to consume in the next days a dish that contains a large (small) quantity of that specific macronutrient.
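A sketch of how such a template/suggestion lookup could work; the template wording and the dish table are invented placeholders, while the real templates live in Table 1 of the paper.

```python
# Illustrative lookup: which dish compensates for an excess/deficiency.
COMPENSATING_DISH = {
    ("carbohydrates", "IPO"): "a portion of pasta",   # placeholder dishes
    ("carbohydrates", "IPER"): "a green salad",
    ("proteins", "IPO"): "grilled fish",
    ("proteins", "IPER"): "a vegetable soup",
}

def build_message(macronutrient, label, direction):
    """Pick a suggestion for classifications I.1, I.2, C.1 and C.2;
    C.3 needs no corrective suggestion.  Wording is a placeholder."""
    if label == "C.3" or direction is None:
        return f"Your intake of {macronutrient} is perfectly balanced."
    amount = "rich" if direction == "IPO" else "poor"
    dish = COMPENSATING_DISH.get((macronutrient, direction), "a suitable dish")
    return (f"This dish is {'poor' if direction == 'IPO' else 'rich'} in "
            f"{macronutrient}; in the next days, consider {dish}, "
            f"which is {amount} in {macronutrient}.")

print(build_message("proteins", "C.1", "IPO"))
```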
For the sake of simplicity we do not describe the algorithm used in the generation module to combine the three distinct outputs of the reasoner on the three distinct macronutrients (i.e. proteins, lipids and carbohydrates). In short, the messages corresponding to each macronutrient need to be aggregated into a single message, and a number of constraints related to coordination and relative clauses need to be accounted for [22]. In the next section we describe the three theories of persuasion that influenced and motivated the design of the messages.
References
1. LARN - Livelli di Assunzione di Riferimento di Nutrienti ed energia per la popo-
lazione italiana - IV Revisione. SICS Editore, Milan (2014)
2. Anselma, L., Terenziani, P., Montani, S., Bottrighi, A.: Towards a comprehensive
treatment of repetitions, periodicity and temporal constraints in clinical guidelines.
Artificial Intelligence in Medicine 38(2), 171–195 (2006)
3. Balintfy, J.L.: Menu planning by computer. Commun. ACM 7(4), 255–259 (1964)
4. Barzilay, R., McCullough, D., Rambow, O., DeCristofaro, J., Korelsky, T., Lavoie, B.: A new approach to expert system explanations. In: 9th International Workshop on Natural Language Generation, pp. 78–87 (1998)
5. Bas, E.: A robust optimization approach to diet problem with overall glycemic
load as objective function. Applied Mathematical Modelling 38(19–20), 4926–4940
(2014)
6. Buisson, J.C.: Nutri-educ, a nutrition software application for balancing meals,
using fuzzy arithmetic and heuristic search algorithms. Artif. Intell. Med. 42(3),
213–227 (2008)
7. Cialdini, R.B.: Influence: science and practice. Pearson Education, Boston (2009)
8. Dechter, R., Meiri, I., Pearl, J.: Temporal constraint networks. Artif. Intell. 49(1–
3), 61–95 (1991)
9. Derks, D., Bos, A.E.R., von Grumbkow, J.: Emoticons in computer-mediated com-
munication: Social motives and social context. Cyberpsy., Behavior, and Soc. Net-
working 11(1), 99–101 (2008)
10. Fogg, B.: Persuasive Technology: Using computers to change what we think and
do. Morgan Kaufmann Publishers, Elsevier, San Francisco (2002)
11. Fogg, B.: The new rules of persuasion (2009). http://captology.stanford.edu/
resources/article-new-rules-of-persuasion.html
12. Guerini, M., Stock, O., Zancanaro, M.: A taxonomy of strategies for multimodal
persuasive message generation. Applied Artificial Intelligence 21(2), 99–136 (2007)
13. Holzinger, A., Dorner, S., Födinger, M., Valdez, A.C., Ziefle, M.: Chances of
increasing youth health awareness through mobile wellness applications. In: Leit-
ner, G., Hitz, M., Holzinger, A. (eds.) USAB 2010. LNCS, vol. 6389, pp. 71–81.
Springer, Heidelberg (2010)
14. Hovy, E.H.: Generating Natural Language Under Pragmatic Constraints. Lawrence
Erlbaum, Hillsdale (1988)
15. Iizuka, K., Okawada, T., Matsuyama, K., Kurihashi, S., Iizuka, Y.: Food menu
selection support system: considering constraint conditions for safe dietary life. In:
Proceedings of the ACM Multimedia 2012 Workshop on Multimedia for Cooking
and Eating Activities, CEA 2012, pp. 53–58. ACM, New York (2012)
16. Kaptein, M., de Ruyter, B.E.R., Markopoulos, P., Aarts, E.H.L.: Adaptive persua-
sive systems: A study of tailored persuasive text messages to reduce snacking. TiiS
2(2), 10 (2012)
17. Lacave, C., Diez, F.J.: A review of explanation methods for heuristic expert sys-
tems. Knowl. Eng. Rev. 19(2), 133–146 (2004)
18. Lancaster, L.M.: The history of the application of mathematical programming to
menu planning. European Journal of Operational Research 57(3), 339–347 (1992)
19. Montanari, U.: Networks of constraints: Fundamental properties and applications
to picture processing. Information Sciences 7, 95–132 (1974)
20. Nishida, C., Uauy, R., Kumanyika, S., Shetty, P.: The joint WHO/FAO expert
consultation on diet, nutrition and the prevention of chronic diseases: process,
product and policy implications. Public Health Nutrition 7, 245–250 (2004)
21. Reiter, E., Robertson, R., Osman, L.: Lessons from a Failure: Generating Tailored
Smoking Cessation Letters. Artificial Intelligence 144, 41–58 (2003)
22. Reiter, E., Dale, R.: Building Natural Language Generation Systems. Cambridge
University Press, New York (2000)
23. de Rosis, F., Grasso, F.: Affective natural language generation. In: Paiva, A. (ed.)
Affective Interactions. LNCS, vol. 1814, pp. 204–218. Springer, Heidelberg (2000)
24. Schofield, W.N.: Predicting basal metabolic rate, new standards and review of
previous work. Human Nutrition: Clinical Nutrition 39C, 5–41 (1985)
25. Terenziani, P., Anselma, L.: A knowledge server for reasoning about temporal con-
straints between classes and instances of events. International Journal of Intelligent
Systems 19(10), 919–947 (2004)
26. World Health Organization: Global strategy on diet, physical activity and health (WHA57.17). In: Fifty-seventh World Health Assembly (2004)
Predicting Within-24h Visualisation of Hospital
Clinical Reports Using Bayesian Networks
1 Introduction
Evidence-based medicine relies on three information sources: patient records,
published evidence and the patient itself [25]. Even though great improvements
and developments have been made over the years, on-demand access to clinical
information is still inadequate in many settings, leading to less efficiency as a
result of a duplication of effort, excess costs and adverse events [10]. Further-
more, a lot of distinct technological solutions coexist to integrate patient data,
using different standards and data architectures which may lead to difficulties
in further interoperability [7]. Nonetheless, a lot of patient information is now
accessible to health-care professionals at the point of care. But, in some cases, the
amount of information is becoming too large to be readily handled by humans or
to be efficiently managed by traditional storage algorithms. As more and more
patient information is stored, it is very important to efficiently select which one
is more likely to be useful [8].
The identification of clinically relevant information should enable an improve-
ment both in user interface design and in data management. However, it is dif-
ficult to identify what information is important in daily clinical care, and what
is used only occasionally. The main problem addressed here is how to estimate
the relevance of health care information in order to anticipate its usefulness at
a specific point of care. In particular, we want to estimate the probability of
a piece of information being accessed during a certain time interval (e.g. first
24 hours after creation), taking into account the type of data and the context
where it was generated and to use this probability to prioritise the information
(e.g. assigning clinical reports for secondary storage archiving or primary storage
access).
The next section presents background knowledge on electronic access to clinical data (2.1), assessment of clinical data relevance (2.2) and machine learning in health care research (2.3), setting the aim of this work (2.4). Section 3 then presents our methodology for data processing, model learning, and prediction of within-24h visualisation of clinical data, whose results are presented in Section 4. Finally, Section 5 concludes with a discussion and future directions.
2 Background
The practice of medicine has been described as being dominated by how well
information is collected, processed, retrieved, and communicated [2].
Currently in most hospitals there are great quantities of stored digital data
regarding patients, in administrative, clinical, lab or imaging systems. Although
it is widely accepted that full access to integrated electronic health records
(EHR) and instant access to up-to-date medical knowledge significantly reduces
faulty decision making resulting from lack of information [9], there is still very little evidence that life-long EHRs improve patient care [4]. Furthermore, their use is often disregarded. For example, studies have indicated that data generated before an emergency visit are accessed often, but by no means in a majority of
times (5% to 20% of the encounters), even when the user was notified of the
availability of such data [12].
One usual solution for data integration in hospitals is to consider a virtual
patient record (VPR), created by integrating all clinical records, which must
collect documents from distributed departmental HIS [3]. Integrated VPR of
central hospitals may gather millions of clinical documents, so accessing data
becomes an issue. A paradigmatic example of this burden to HIS is the amount of
digital data produced in the medical imaging departments, which has increased
rapidly in recent years due mainly to a greater use of additional diagnostic
procedures, and an increase in the quality of the examinations. The management
of information in these systems is usually implemented using Hierarchical Storage
Management (HSM) solutions. This type of solution enables the implementation
of various layers which use different technologies with different speeds of access,
corresponding to different associated costs. However, the solutions which are
currently implemented use simple rules for information management, based on
variables such as the time elapsed since the last access or the date of creation of
information, not taking into account the likely relevance of information in the
clinical environment [6].
In a quest to prioritise the data that should be readily available in HIS, several pilot studies have been conducted to analyse for how long clinical documents are useful to health professionals in a hospital environment, bearing in mind document content and the context of the information request. Globally, the results show that some clinical reports are still used one year after creation, regardless of the context in which they were created, although significant differences existed between reports created during distinct encounter types [8]. Other results show that half of all visualisations might be of reports more than 2 years old [20], although this visualisation distribution also varies across clinical department and time of production [21]. Thus, the usage of patients' past information (data from previous hospital encounters) varies significantly according to the health care setting and content, and is therefore not easy to prioritise.
a given time would be more efficient in managing the information that is stored
in fast memory and slow memory. A recent study from the same group addressed
other possibly relevant factors besides document age, including type of encounter
(i.e. emergency room, inpatient care, or outpatient consult), department where
the report was generated (e.g. gynaecology or internal medicine) and even type
of report in each department, but the possibility of modelling visualisations with
survival analysis proved to be extremely difficult [21].
Nonetheless, if we could, for instance, discriminate documents that will be needed in the next 24 hours from the remaining ones, we could efficiently decide which ones to store in a faster-accessible memory device. Furthermore, we could then rank documents according to their probability of visualisation in order to adjust the graphical user interface of the VPR and improve the system's usability. By applying regression methods or other modelling techniques it is possible to identify which factors are associated with the usage or relevance of patient data items. These factors and associations can then be used to estimate data relevance in a specific future time interval.
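As a simple illustration of how such estimated probabilities could drive both the ranking and the storage decision, a short sketch follows; the threshold and field names are invented placeholders, not part of the paper.

```python
# Reports with an estimated probability of being viewed within 24 hours.
reports = [
    {"id": "r1", "p_view_24h": 0.81},
    {"id": "r2", "p_view_24h": 0.07},
    {"id": "r3", "p_view_24h": 0.42},
]

FAST_STORAGE_THRESHOLD = 0.5   # illustrative decision boundary

# Rank for the user interface (most likely to be needed first) ...
ranked = sorted(reports, key=lambda r: r["p_view_24h"], reverse=True)
# ... and assign a storage tier for the archiving policy.
for r in ranked:
    r["tier"] = ("primary" if r["p_view_24h"] >= FAST_STORAGE_THRESHOLD
                 else "secondary")

print([(r["id"], r["tier"]) for r in ranked])
```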
The definition of clinical decision support systems (most of the time based on expert systems) is currently a major topic, since such systems may help with diagnosis, treatment selection, prognosis of mortality, prognosis of quality of life, etc. They can even be used for administrative tasks like the one addressed by this work. However, the complicated nature of real-world biomedical data has made it necessary to look beyond traditional biostatistics [14] without losing the necessary formality. For example, naive Bayesian approaches are closely related to logistic regression [22]. Hence, such systems could be implemented by applying
methods of machine learning [16], since new computational techniques are bet-
ter at detecting patterns hidden in biomedical data, and can better represent
and manipulate uncertainties [22]. In fact, the application of data mining and
machine learning techniques to medical knowledge discovery tasks is now a grow-
ing research area. These techniques vary widely and are based on data-driven
conceptualisations, model-based definitions or on a combination of data-based
knowledge with human-expert knowledge [14].
Bayesian approaches are extremely important for these problems, as they provide a quantitative perspective and have been successfully applied in health care domains [15]. One of their strengths is that Bayesian statistical methods allow prior knowledge to be taken into account when analysing data, turning the data analysis into a process of updating that prior knowledge with biomedical and health-care evidence [14]. However, only after the 1990s do we find evidence of large interest in these methods, namely in Bayesian networks, which offer a general and versatile approach to capturing and reasoning with uncertainty in medicine and health care [15]. They describe the probability distribution of a set of variables, making a two-fold analysis possible: a qualitative model and a quantitative model, presenting two types of information for each variable.
2.4 Aim
The aim of this work is the development of a decision support model for dis-
criminating between reports that are going to be useful in the next 24 hours and
reports which can be otherwise stored in slower storage devices, since they will
not be accessed in the next 24 hours, thus improving performance of the entire
virtual patient record system.
the curve (AUC). Furthermore, to assess the general structure and accuracy of
learned models, stratified 10-fold cross-validation was repeated 10 times, estimat-
ing accuracy, sensitivity, specificity, precision (positive and negative predictive
values) and the area under the ROC curve, for all compared models.
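The evaluation in the paper was carried out in R (see Section 3.3); purely to illustrate the repeated stratified cross-validation protocol described above, here is a Python sketch with scikit-learn, using a one-hot naive Bayes classifier as a stand-in for the compared models. The file name, column names and outcome coding (0/1) are placeholders, not the study's data.

```python
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Placeholder: a table with categorical predictors and a 0/1 outcome.
df = pd.read_csv("reports.csv")            # hypothetical file
X = df[["sex", "age_group", "department", "encounter", "weekday", "period"]]
y = df["visualised_24h"]

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), BernoulliNB())
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "roc_auc", "recall", "precision"])

for metric, values in scores.items():
    if metric.startswith("test_"):
        print(metric, round(values.mean(), 3))
```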
3.3 Software
Logistic regression was done with R package stats [18], Bayesian network struc-
ture was learned with R package bnlearn [23], Bayesian network parameters were
fitted with R package gRain [11], ROC curves were computed with R package
pROC [19], and odds ratios (OR) were computed with R package epitools [1].
4 Results
A total of 4975 reports were included in the analysis. The main characteristics of the reports, which were generated from patients with a mean (std dev) age of 55.5 (20.5) years, are shown in Table 1. Less than 23% of the reports were visualised in the 24 hours following their creation; these were nonetheless more often from female patients (almost 55%), with a 24h-visualisation OR of 1.51 (95% CI [1.32, 1.72]) for female-patient reports. Also significant was the context of report creation, with more
reports being created in inpatient care (44.4%) and outpatient consults (41.4%),
although compared with the latter context, 24-hour visualisations are more likely
for reports generated in inpatient care (OR=8.60 [7.04,10.59]) or in the emergency
room (OR=14.50 [11.22,18.83]). Regarding creation time, morning (OR=1.22
[1.05,1.41]), night (OR=1.82 [1.46,2.28]) and dawn (OR=2.88 [2.03,4.07]) have all
higher 24-hour visualisation likelihood than the afternoon period.
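The ORs above were computed with the epitools R package; for illustration, here is a Python sketch of the standard Wald odds ratio and 95% confidence interval from a 2×2 table (the counts used are invented, not the study's data).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Wald odds ratio and confidence interval for a 2x2 table:
    a = exposed & event, b = exposed & no event,
    c = unexposed & event, d = unexposed & no event."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Invented counts, just to show the call.
print(odds_ratio_ci(a=600, b=2100, c=530, d=2800))
```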
Figure 1 presents the qualitative model for the Tree-Augmented Naive Bayes net-
work, where interesting connections can be extracted from the resulting model.
First, patient’s data features are associated. Then, creation time data and con-
text data are also strongly related. However, the most interesting feature is prob-
ably the department that created the report, since this was chosen by the algo-
rithm as ancestor of patient’s age, time of report creation and type of encounter.
For a quantitative analysis, Figure 2 presents the in-sample ROC curves for
logistic regression (left), Naive Bayes (centre) and TAN (right). As expected,
increasing model complexity enhances the in-sample AUC (LR 88.6%, NB 86.9% and TAN 90.7%) but, globally, all models presented good discriminating power
towards the outcome.
Table 1. Basic characteristics of included reports: patient’s data (sex and age), report
creation context (department, encounter) and time (day of week, daily period) data.
                        Visualised in 24 hours
                        No             Yes            Total
Day-of-Week, n (%)
  Mon                   728 (18.9)     303 (26.8)     1031 (20.7)
  Tue                   671 (17.5)     291 (25.8)      962 (19.3)
  Wed                   743 (19.3)     208 (18.4)      951 (19.1)
  Thu                   804 (20.9)      35 (3.1)       839 (16.9)
  Fri                   673 (17.5)      92 (8.2)       765 (15.4)
  Sat                   122 (3.2)       99 (8.7)       221 (4.4)
  Sun                   105 (2.7)      101 (9.0)       206 (4.1)
Daily Period, n (%)
  Morning              1768 (46.0)     521 (46.2)     2289 (46.0)
  Afternoon            1661 (43.2)     402 (35.6)     2063 (41.5)
  Night                 331 (8.6)      146 (13.0)      477 (9.6)
  Dawn                   86 (2.2)       60 (5.3)       146 (2.9)
Fig. 1. Tree-Augmented Naive Bayes for predicting within 24h visualisation of clinical
reports in the virtual patient record.
Fig. 2. In-sample ROC curves for logistic regression (left), naive Bayes (centre) and
Tree-Augmented Naive Bayes (right).
In order to assess the ability of the models to generalise beyond the derivation cohort, cross-validation was performed. Table 2 presents the results of the 10-times-repeated stratified 10-fold cross-validation. Although the more complicated model loses in terms of AUC (85% vs 87%), it brings advantages for the precise problem of identifying reports that should be stored in secondary memory because they are less likely to be visualised in the next 24 hours, since it achieves a negative predictive value of 89% vs 88% (NB) and 84% (LR). Along with this result, it is much better at identifying reports that are going to be needed, as sensitivity rises from 41% (LR) to 64%. Future work should consider different threshold values for the decision boundary (here, 50%) in order to better suit the model to the sensitivity-specificity goals of the problem at hand.
Acknowledgments. The authors acknowledge the help of José Hilário Almeida dur-
ing the data gathering process.
References
1. Aragon, T.J.: epitools: Epidemiology Tools (2012)
2. Barnett, O.: Computers in medicine. JAMA: the Journal of the American Medical
Association 263(19), 2631 (1990)
3. Bloice, M.D., Simonic, K.M., Holzinger, A.: On the usage of health records for
the design of virtual patients: a systematic review. BMC Medical Informatics and
Decision Making 13(1), 103 (2013)
4. Clamp, S., Keen, J.: Electronic health records: Is the evidence base any use? Med-
ical Informatics and the Internet in Medicine 32(1), 5–10 (2007)
5. Cruz-Correia, R., Boldt, I., Lapão, L., Santos-Pereira, C., Rodrigues, P.P., Ferreira,
A.M., Freitas, A.: Analysis of the quality of hospital information systems audit
trails. BMC Medical Informatics and Decision Making 13(1), 84 (2013)
6. Cruz-Correia, R., Rodrigues, P.P., Freitas, A., Almeida, F., Chen, R., Costa-Pereira,
A.: Data quality and integration issues in electronic health records. In: Hristidis, V.
(ed.) Information Discovery on Electronic Health Records, chap. 4. Data Mining and
Knowledge Discovery Series, pp. 55–95. CRC Press (2009)
7. Cruz-Correia, R.J., Vieira-Marques, P.M., Ferreira, A.M., Almeida, F.C., Wyatt,
J.C., Costa-Pereira, A.M.: Reviewing the integration of patient data: how systems
are evolving in practice to meet patient needs. BMC Medical Informatics and Deci-
sion Making 7, 14 (2007)
8. Cruz-Correia, R.J., Wyatt, J.C., Dinis-Ribeiro, M., Costa-Pereira, A.: Determi-
nants of frequency and longevity of hospital encounters’ data use. BMC Medical
Informatics and Decision Making 10, 15 (2010)
9. Dick, R., Steen, E.: The Computer-based Patient Record: An Essential Technology
for HealthCare. National Academy Press (1997)
10. Feied, C.F., Handler, J.A., Smith, M.S., Gillam, M., Kanhouwa, M., Rothenhaus,
T., Conover, K., Shannon, T.: Clinical information systems: instant ubiquitous clin-
ical data for error reduction and improved clinical outcomes. Academic emergency
medicine 11(11), 1162–1169 (2004)
11. Højsgaard, S.: Graphical independence networks with the gRain package for R.
Journal of Statistical Software 46(10) (2012)
12. Hripcsak, G., Sengupta, S., Wilcox, A., Green, R.: Emergency department access
to a longitudinal medical record. Journal of the American Medical Informatics
Association 14(2), 235–238 (2007)
13. Lappenschaar, M., Hommersom, A., Lucas, P.J.F., Lagro, J., Visscher, S., Korevaar,
J.C., Schellevis, F.G.: Multilevel temporal Bayesian networks can model longitudinal
change in multimorbidity. Journal of Clinical Epidemiology 66, 1405–1416 (2013)
14. Lucas, P.: Bayesian analysis, pattern analysis, and data mining in health care.
Current Opinion in Critical Care 10(5), 399–403 (2004)
15. Lucas, P.J.F., van der Gaag, L.C., Abu-Hanna, A.: Bayesian networks in
biomedicine and health-care. Artificial Intelligence in Medicine 30(3), 201–214
(2004)
16. Mitchell, T.M.: Machine Learning. McGraw-Hill (1997)
17. Patriarca-Almeida, J.H., Santos, B., Cruz-Correia, R.: Using a clinical document
importance estimator to optimize an agent-based clinical report retrieval system.
In: Proceedings of the 26th IEEE International Symposium on Computer-Based
Medical Systems, pp. 469–472 (2013)
18. R Core Team: R: A Language and Environment for Statistical Computing (2013)
19. Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.C., Müller, M.:
pROC: an open-source package for R and S+ to analyze and compare ROC curves.
BMC Bioinformatics 12, 77 (2011)
20. Rodrigues, P.P., Dias, C.C., Cruz-Correia, R.: Improving clinical record visualiza-
tion recommendations with bayesian stream learning. In: Learning from Medical
Data Streams, vol. 765, p. paper4. CEUR-WS.org (2011)
21. Rodrigues, P.P., Dias, C.C., Rocha, D., Boldt, I., Teixeira-Pinto, A., Cruz-Correia,
R.: Predicting visualization of hospital clinical reports using survival analysis
of access logs from a virtual patient record. In: Proceedings of the 26th IEEE
International Symposium on Computer-Based Medical Systems, Porto, Portugal,
pp. 461–464 (2013)
22. Schurink, C.A.M., Lucas, P.J.F., Hoepelman, I.M., Bonten, M.J.M.: Computer-
assisted decision support for the diagnosis and treatment of infectious diseases in
intensive care units. The Lancet infectious diseases 5(5), 305–312 (2005)
23. Scutari, M.: Learning Bayesian Networks with the bnlearn R Package. Journal of
Statistical Software 35, 22 (2010)
24. Vieira-Marques, P.M., Cruz-Correia, R.J., Robles, S., Cucurull, J., Navarro, G.,
Marti, R.: Secure integration of distributed medical data using mobile agents.
Intelligent Systems 21(6), 47–54 (2006)
25. Wyatt, J.C., Wright, P.: Design should help use of patients’ data. Lancet
352(9137), 1375–1378 (1998)
On the Efficient Allocation of Diagnostic
Activities in Modern Imaging Departments
1 Introduction
In a modern Diagnostic Imaging Department, the allocation and re-allocation of exams is a complex, time-consuming task that is still done manually. On the one hand, it is fundamental to keep the waiting lists as short as possible, in order to meet the established waiting times; on the other hand, it is of critical importance to minimise expenses for the Department. Moreover, patient scheduling has to be balanced, in order to plan the best possible allocation according to the staff organisation and skills on the different modalities, i.e., computed tomography (CT), radiography (RX), magnetic resonance (MR) and ultrasound (US) equipment. To plan the best possible allocation, many available resources must be taken into account: staff (radiologists, nurses, etc.), equipment (US, CT, MR, etc.), the examinations performed (tagged by imaging modality and reimbursement rate, clustered by regions and/or pathologies) and staff characteristics (part-time, full-time, etc.).
The literature on medical appointment scheduling is extensive, but
approaches –either automated or in the form of formal guidelines– to deal with
diagnostic activities in radiology Departments are rare. Nevertheless, the impor-
tance of scheduling activities in hospital services is well-known [8]. Even though
a few techniques have been proposed for dealing with part of the allocation prob-
lem (see, e.g., [1,4–6]), a complete approach able to manage all the aspects of the
In this section we define the relevant entities involved in the allocation problem,
and describe the function used for evaluating the quality of allocation plans.
Entities
We define a set of entities that can easily fit into most of the Radiology Infor-
mation Systems (RIS) currently used in diagnostic Imaging Departments [2].
Specifically, the proposed elements can directly fit with Paris, provided by ATS-Teinos,
and with PRORAM from METRIKA. With some minor changes, they can
also fit with Estensa, by Esaote. We are confident the model can also be easily adopted
in other settings. The most important entities are the following:
Exam represents the diagnostic examination that can be performed, e.g., “CT
brain”.
Exam Group (or cluster) is a group of exams. In many cases it is useful, due
to some team specialisation, to group exams by the area of the body (e.g.,
“head and neck” or “abdominal”) or to group them in order to reflect which
Department the patient comes from (e.g., “CT from GPs”). The grouping is
done according to the habits of the Imaging Department, team and work-flow:
hybrid models can also be implemented.
Modality represents a medical device, such as an ultrasound or a CT scanner.
In our model, a modality corresponds to an actual room. This is reasonable
since the machinery used for exams is usually not moved between rooms. Such
a modelling choice leads to an independent agenda per modality. In principle, it
is possible to have the machines required for different sorts of exams in the same
room. This case, which is extremely rare since it leads to underused resources,
is not modelled.
Personnel represents the human resources (staff members) available. Each member
of staff has at least one role. Each Exam Group has a set of roles assigned; this
indicates the specific needs of that group of exams in terms of human resources.
For instance, some exam groups require several nurses to be present in the room,
while other exams require technicians to be available.
Time Slot represents the atomic unit of the agenda; we adopted an atomic time slot
of 5 minutes in a weekly calendar. The granularity of 5 minutes was chosen because
it is not too long, thus limiting wasted time, and not too short, so that small delays
do not affect the overall daily schedule.
Temporal horizon captures the requirements in terms of queue governance. It can
represent constraints like “the queue for ’Brain MRI’ must be shorter than 3
months for, at least, the next 12 months”. It should be noted that this is the
usual way in which queue governance requirements are expressed.
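As a rough sketch, the entities above could be represented with simple data structures such as the following (class and field names are illustrative assumptions, not the schema of any actual RIS):

```python
# Illustrative sketch (not the authors' implementation): the allocation
# entities expressed as simple Python dataclasses.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Exam:
    name: str                      # e.g. "CT brain"
    reimbursement: float           # reimbursement rate r_i
    weekly_requests: int           # requested amount q_i

@dataclass
class ExamGroup:
    name: str                      # e.g. "head and neck", "CT from GPs"
    exams: List[Exam]
    required_roles: List[str]      # e.g. ["radiologist", "nurse", "nurse"]

@dataclass
class Modality:
    name: str                      # e.g. "CT room 1"; one agenda per room
    agenda: Dict[int, str] = field(default_factory=dict)  # slot index -> exam group name

@dataclass
class StaffMember:
    name: str
    roles: List[str]               # e.g. ["nurse"], ["radiologist"]
    contract: str                  # "part-time" / "full-time"

# A weekly calendar of atomic 5-minute slots (7 days x 24 h x 12 slots per hour).
SLOTS_PER_WEEK = 7 * 24 * 12
```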
Objective Function and Constraints
The optimisation of the scheduling of exams has to deal with two main com-
ponents. On the one hand, it is important to maximise income for the Depart-
ment. This should result in the prioritisation of exams that are both frequently
requested and expensive. On the other hand, the Department is also providing
an important service to the community. Therefore, keeping all the queue lists
as short as possible is fundamental. In public hospitals there are strict upper
bounds for queues.
To consider both of the aforementioned aspects, we designed an objective
function (to be minimised) that combines the two perspectives. The adopted function
is depicted in Equation 1. In particular, n indicates the number of exams, r_i is
the cost of the i-th exam, and q_i is the amount of exams of group i that should be
performed. Δ_j represents the difference between the waiting queue and the desired
queue length for the j-th exam group, and W_j is the importance of the queue for the
j-th exam group. Finally, α and β indicate, respectively, the importance given to the
economic side and to the respect of the limits on queue lengths. Intuitively,
the function synthesises the points of view of the hospital administration (first
addend), focused on the economic side, and of the doctor (second addend),
focused on the quality of the service.
f = \alpha \sum_{i=1}^{n} (r_i \cdot q_i) + \beta \sum_{j=1}^{n} (\Delta_j \cdot W_j) \qquad (1)
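As a minimal illustration, the objective function of Equation 1 can be computed as follows (the values of alpha, beta and the input lists are made-up examples, not data from the study):

```python
# Minimal sketch of the objective function in Equation 1 (illustrative values only).
def objective(costs, quantities, queue_gaps, queue_weights, alpha=1.0, beta=1.0):
    """f = alpha * sum(r_i * q_i) + beta * sum(delta_j * W_j), to be minimised."""
    economic_term = sum(r * q for r, q in zip(costs, quantities))
    queue_term = sum(d * w for d, w in zip(queue_gaps, queue_weights))
    return alpha * economic_term + beta * queue_term

# Example with two exam groups and arbitrary weights.
print(objective(costs=[120.0, 80.0], quantities=[30, 50],
                queue_gaps=[10, -5], queue_weights=[2.0, 1.0],
                alpha=0.5, beta=2.0))
```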
1. The available personnel are fully assigned to the rooms. Most of the staff are
assigned to morning slots, since this is the period of the day when most exams
take place. Some heuristics are followed to reduce the spread of exams of
the same group across different rooms, or across very different time slots.
2. A random number of time slots is assigned to each cluster of examinations,
according to the hard constraints related to human resources.
3. After the allocation, free time slots or free human resources are analysed in
order to be exploited. To reduce fragmentation, the preferred solution
is to extend the time slot of exams allocated before/after the free slot. Fragmentation
leads to wasted time due to switching exam equipment between
modalities and personnel moving between rooms.
all the pairs of clusters. This is due, for example, to different requirements in
terms of personnel, equipment or time. The choice of the group of exams to be
substituted is made by ordering clusters according to the required resources and
the number of requests per week. Clusters that require many resources and are rarely
performed are suitable candidates for substitution. The selected cluster is substituted
with another that can fit in the released time slots.
If the new allocation plan has a better target function value than the current plan,
the new plan is saved; otherwise the algorithm restarts by considering a different
suitable cluster to substitute. The search stops when a specified number of
re-scheduling attempts, or the time limit, is reached. It should be noted that the
designed algorithm is able to provide several solutions of increasing quality.
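A hedged sketch of the improvement loop just described is given below (the helper callables, parameters and toy usage are hypothetical; this is not the authors' implementation):

```python
# Sketch of the plan-improvement loop: substitute one cluster at a time and keep
# the plan whenever the objective function (to be minimised) improves.
import random
import time

def improve_plan(plan, objective, candidate_substitutions, max_attempts=100, time_limit=60.0):
    best = plan
    best_score = objective(best)
    start = time.time()
    for _ in range(max_attempts):
        if time.time() - start > time_limit:
            break
        new_plan = random.choice(candidate_substitutions)(best)  # substitute one cluster
        new_score = objective(new_plan)
        if new_score < best_score:            # better target function -> save the new plan
            best, best_score = new_plan, new_score
    return best, best_score

# Toy usage: a plan is a list of cluster ids and the "objective" is just their sum.
toy_plan = [1, 1, 2, 3]
substitutions = [lambda p: p[:-1] + [random.randint(1, 3)]]
print(improve_plan(toy_plan, lambda p: sum(p), substitutions, max_attempts=20, time_limit=1.0))
```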
4 Experimental Analysis
large hospitals, where examination rooms are far from each other, frequent staff
movement results in a significant waste of time.
Data entry is time-expensive: changes in personnel, exams, instrumentation
or policies are quite frequent in a medium-to-large Radiology Department, and require
updating the data and re-planning. The best way to efficiently support an
operator would be to export, from the existing RIS, data regarding staff, modalities
and dates, in order to save time. Currently, HL7 [3] would probably be the
best standard for such an integration.
The proposed algorithm can provide useful information about the available
resources. In particular, it can be used for identifying the most limiting resource
(personnel, modalities, etc.) and evaluating the impact of new resources. Heads of
Department highlighted that it is currently very complex to estimate
the impact of a new modality, or of increased personnel. By using the proposed
algorithm, the impact of new resources can be easily assessed by comparing the
quality of plans with and without them, in different scenarios.
New modalities or new staff require an initial “training” period. In the case
of new modalities, the staff will initially require more time for performing exams.
Newly introduced staff usually need to be trained. The current approach is not
able to capture such situations. A possible way of dealing with this is to consider
a “penalty” for some instrumentation or personnel: a time slot becomes “longer”, by
a given penalty factor, when they are assigned to it. Penalties can be reduced
over time.
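A small sketch of this penalty idea, assuming an exponential decay of the penalty factor (the decay schedule and numbers are illustrative assumptions):

```python
# Sketch of a decaying "training penalty" on the effective slot duration.
def effective_slot_minutes(base_minutes=5, penalty=0.5, weeks_in_service=0, half_life_weeks=4):
    """A newly introduced modality/staff member makes a slot 'longer' by a penalty
    factor that decays over time as experience is gained."""
    decayed = penalty * (0.5 ** (weeks_in_service / half_life_weeks))
    return base_minutes * (1 + decayed)

print(effective_slot_minutes(weeks_in_service=0))   # 7.5 minutes at introduction
print(effective_slot_minutes(weeks_in_service=8))   # ~5.6 minutes after two half-lives
```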
The current approach does not have a yearly overview. Some exams are more
likely to be required in certain periods, and thus should have different priorities
with regard to their waiting-list requirements. Also, it is common practice that
hospital personnel are reduced in summer. A good integration with medical and
administrative databases would be useful for further refining the allocation
abilities.
5 Conclusion
In many Diagnostic Imaging Departments the allocation of exams is currently
done manually. As a result, it is time consuming, it is hard to assess its overall
quality, and no information about limiting resources is identified.
In this paper, we addressed the aforementioned issues by introducing: (i) a
formal model of the diagnostic activities allocation problem, and (ii) an efficient
algorithm for the automated scheduling of diagnostic activities. The proposed
model is general, and can therefore fit with any existing Imaging Department.
Moreover, a quantitative function for assessing the quality of allocation plans is
provided. The two-step algorithm allows the generation and/or improvement
of allocation plans. An experimental analysis showed that the approach is able to
efficiently provide useful and valid schedules for examinations. Feedback
received from experts confirms its usefulness, also for evaluating the impact of
new instrumentation or staff members.
This work can be seen as a pilot study, which can potentially lead to the
exploitation of more complex and sophisticated Artificial Intelligence techniques
References
1. Barbati, M., Bruno, G., Genovese, A.: Applications of agent-based models for opti-
mization problems: A literature review. Expert Systems with Applications 39(5),
6020–6028 (2012)
2. Boochever, S.S.: HIS/RIS/PACS integration: getting to the gold standard. Radiol
Manage 26(3), 16–24 (2004)
3. Dolin, R.H., Alschuler, L., Boyer, S., Beebe, C., Behlen, F.M., Biron, P.V., Shvo, A.S.:
HL7 Clinical Document Architecture, Release 2. Journal of the American Medical Infor-
matics Association 13(1), 30–39 (2006)
4. Eagen, B., Caron, R., Abdul-Kader, W.: An agent-based modelling tool (abmt)
for scheduling diagnostic imaging machines. Technology and Health Care 18(6),
409–415 (2010)
5. Falsini, D., Perugia, A., Schiraldi, M.: An operations management approach for
radiology services. In: Sustainable Development: Industrial Practice, Education and
Research (2010)
6. Macal, C.M., North, M.J.: Agent-based modeling and simulation. In: Winter Simu-
lation Conference, pp. 86–98 (2009)
7. Mazzini, N., Bonisoli, A., Ciccolella, M., Gatta, R., Cozzaglio, C., Castellano, M.,
Gerevini, A., Maroldi, R.: An innovative software agent to support efficient planning
and optimization of diagnostic activities in radiology departments. International
Journal of Computer Assisted Radiology and Surgery 7(1), 320–321 (2012)
8. Welch, J.D., Bailey, N.T.J.: Appointment systems in hospital outpatient departments. The
Lancet 259(6718), 1105–1108 (1952)
Ontology-Based Information Gathering System
for Patients with Chronic Diseases: Lifestyle
Questionnaire Design
1 Introduction
Computer-based questionnaires are a new form of data collection, designed to
offer several advantages over pen-and-paper questionnaires or oral interviewing [13].
They are less time-consuming and more efficient, offering more structure and more
detail than the classical methods [2]. Information Gathering Systems (IGSs) have
shown measurable benefits in reducing the omissions and errors that arise from
medical interviews [14]. The medical and health care domain is one of the most
active domains in using IGSs for gathering patient data [13].
Recently, various research works have been conducted to design and use IGSs
as part of clinical decision support systems (CDSSs). Among them, Bouamrane
et al. [2], [3] proposed a generic model for context-sensitive self-adaptation of
an IGS based on a questionnaire ontology. The proposed model is implemented as a
data collector module in [4] to collect patient medical history for preoperative
risk assessment. Sherimon et al. [5], [6], [15] proposed a questionnaire ontology
based on [2]. This ontology is used to gather patient medical history, which is
then integrated within a CDSS to predict the risk of hypertension. Farooq et
al. [7] proposed an ontology-based CDSS for chest pain risk assessment based
on [2]; the proposed CDSS integrates a data collector to collect patient medical
history. Alipour [13] proposed an approach to design an IGS based on the use of an
ontology-driven generic questionnaire and the Pellet inference engine for the
question selection process.
Although the IGSs presented in the literature permit gathering patient data
using ontologies, the created questionnaires are hard-coded for specific domains
and defined within the domain ontologies. This makes them less flexible,
more difficult to maintain, and harder to share and reuse.
Unlike previous approaches, our approach offers more flexibility by separating
the ontologies and by integrating a domain ontology to drive the creation
of the questionnaire. This gives meaning to the created questions and allows
configuring different models of questionnaires without coding, regardless of the
domain content. Therefore, many CDSSs can easily integrate and use
the proposed IGS for their specific needs.
Furthermore, the proposed approach permits collecting relevant information
by prompting all the significant questions related to the patient
profile. The previously collected answers are also taken into consideration in the
question selection process. This improves on the classical approach by customizing
the interview to each patient.
The proposed IGS is integrated within the E-care home health monitoring
platform [1], [8] for gathering lifestyle-related patient data.
E-care is a home health monitoring platform for patients with chronic diseases
such as diabetes, heart failure, high blood pressure, etc. [1] [8]. The aim is early
detection of any anomalies or dangerous situations by collecting relevant data
from the patient such as physiological data (heart rate, blood pressure, pulse,
temperature, weight, etc.) and lifestyle data (tobacco-use, eating habits, physical
activity, sleep, stress, etc.).
To improve the accuracy of anomaly detection, the platform needs relevant
information that describes as precisely as possible the patient’s health status and
his lifestyle changes (tobacco use, lack of physical activity, poor eating habits,
etc.). That is why the patient is invited daily to collect his physiological data
using medical sensors (Blood Pressure Monitor, Weighing Scale, Pulse Oximeter,
etc.) and to answer lifestyle questionnaires. These questionnaires are automatically
generated by the IGS, which permits gathering relevant information
about the patient’s lifestyle.
All collected data (physiological and lifestyle) are stored in the
patient profile ontology, which models the patient’s health status, and are then
analysed by the inference engine for anomaly detection.
Survey History Ontology (SHO): stores all the patient surveys. It includes
all the asked questions and the answers given by the patient. It is used in the
question selection process.
The Adaptive Engine (AE): interprets the properties asserted in the questionnaire
ontology and prompts the corresponding questions according to
the patient profile and the previously collected answers. The AE initially loads all
questions except the child questions. It prompts the first question and checks
whether the question is appropriate for the patient profile (e.g., the AE does not ask
questions about smoking habits if the patient is a non-smoker). If it is, the AE asks
the question and gets the answer from the UI; if it is not, the AE simply moves to
the next question.
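A simplified sketch of the AE's question-selection loop follows (the data structures and the applies_to predicate are illustrative assumptions; the actual AE reasons over the questionnaire and patient profile ontologies):

```python
# Illustrative sketch of the Adaptive Engine's question-selection loop.
def run_survey(questions, patient_profile, ask):
    """Iterate over non-child questions, skip those not applicable to the profile,
    and ask the applicable ones, collecting the answers."""
    answers = {}
    for q in questions:
        if q.get("parent"):                       # child questions are not loaded initially
            continue
        if not q["applies_to"](patient_profile):  # e.g. skip smoking questions for non-smokers
            continue
        answers[q["id"]] = ask(q["text"])
    return answers

profile = {"smoker": False}
qs = [
    {"id": "q1", "text": "Do you smoke?", "applies_to": lambda p: True},
    {"id": "q2", "text": "How many cigarettes per day?", "applies_to": lambda p: p["smoker"]},
]
print(run_survey(qs, profile, ask=lambda text: "no"))
```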
The User Interfaces (UI): consist of two parts, namely the Expert UI and the
Patient UI.
• Expert UI: permits the domain experts (clinicians) to configure the IGS
by defining questionnaires and to consult the survey history.
• Patient UI: permits starting/stopping the survey. It is designed in such a way
that the patient can respond to the questionnaire from anywhere using his
mobile device (tablet or smart phone).
The example illustrated in Figure 2 shows how the domain concepts can be
related to one another and how they are used to design lifestyle questionnaires.
Consider the smoking habit, which is characterized by a type of tobacco (e.g.,
cigarette, electronic cigarette, drugs, etc.), a time frequency (daily, weekly,
monthly, etc.), a smoking quantity, etc. Several questions can be created based on
the SmokingHabits concept, where each smoking-related question is related to
the SmokingHabits concept through the domain properties, while the potential answers
are related either to the DimensionsEntities or to the CatalogueProperties (see
Figure 2).
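To make the example concrete, the sketch below shows how questions could be linked to the SmokingHabits concept and to candidate answer entities. It is illustrative only: the property names (hasTimeFrequency, hasTobaccoType) and the answer values are assumptions, not taken from the actual ontology.

```python
# Hedged sketch: linking questions to the SmokingHabits domain concept and to
# candidate answers (property and entity names are illustrative assumptions).
frequency_question = {
    "text": "How often do you smoke?",
    "about_concept": "SmokingHabits",
    "domain_property": "hasTimeFrequency",
    "answer_entities": ["Daily", "Weekly", "Monthly"],        # DimensionsEntities
}
tobacco_question = {
    "text": "What type of tobacco do you use?",
    "about_concept": "SmokingHabits",
    "domain_property": "hasTobaccoType",
    "answer_entities": ["Cigarette", "ElectronicCigarette"],  # CatalogueProperties
}
for q in (frequency_question, tobacco_question):
    print(q["about_concept"], q["domain_property"], "->", q["answer_entities"])
```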
References
1. Benyahia, A.A., Hajjam, A., Hilaire, V., Hajjam, M.: E-care ontological archi-
tecture for telemonitoring and alerts detection. In: 5th IEEE International
Symposium on Monitoring & Surveillance Research (ISMSR): Healthcare-Safety-
Security (2012)
2. Bouamrane, M.-M., Rector, A.L., Hurrell, M.: Ontology-driven adaptive medical
information collection system. In: An, A., Matwin, S., Raś, Z.W., Ślezak, D.
(eds.) Foundations of Intelligent Systems. LNCS (LNAI), vol. 4994, pp. 574–584.
Springer, Heidelberg (2008)
3. Bouamrane, M.M., Rector, A., Hurrell, M.: Gathering precise patient medical his-
tory with an ontology-driven adaptive questionnaire. In: 21st IEEE International
Symposium on Computer-Based Medical Systems, CBMS 2008, June 17–19, 2008
4. Bouamrane, M.-M., Rector, A., Hurrell, M.: Using ontologies for an intelligent
patient modelling, adaptation and management system. In: Meersman, R., Tari, Z.
(eds.) OTM 2008, Part II. LNCS, vol. 5332, pp. 1458–1470. Springer, Heidelberg
(2008)
5. Sherimon, P.C., Vinu, P.V., Krishnan, R., Takroni, Y.: Ontology Based System
Architecture to Predict the Risk of Hypertension in Related Diseases. IJIPM:
International Journal of Information Processing and Management 4(4), 44–50
(2013)
6. Sherimon, P.C., Vinu, P.V., Krishnan, R., Takroni, Y., AlKaabi, Y., AlFars, Y.:
Adaptive questionnaire ontology in gathering patient medical history in diabetes
domain. In: Herawan, T., Deris, M.M., Abawajy, J. (eds.) DaEng-2013. LNEE,
vol. 285, pp. 453–460. Springer, Singapore (2014)
7. Farooq, K., Hussain, A., Leslie, S., Eckl, C., Slack, W.: Ontology driven cardiovas-
cular decision support system. In: 2011 5th International Conference on Pervasive
Computing Technologies for Healthcare (PervasiveHealth), May 23–26, 2011
8. Benyahia, A.A., Hajjam, A., Hilaire, V., Hajjam, M., Andres, E.: E-care telemon-
itoring system: extend the platform. In: 2013 Fourth International Conference on
Information, Intelligence, Systems and Applications (IISA), July 10–12, 2013
9. Saripalle, R.K.: Current status of ontologies in Biomedical and clinical Infor-
matics. University of Connecticut. http://www.engr.uconn.edu/steve/Cse300/saripalle.pdf
(retrieved January 16, 2014)
10. Gruber, T.R.: Toward principles for the design of ontologies used for knowl-
edge sharing? International Journal Human-Computer Studies 43(5–6), 907–928
(1995)
11. Guarino, N.: Formal Ontology and information systems. Formal ontology in infor-
mation systems. In: Proceedings of FOIS 1998, Trento, Italy, June 6–8, 1998
12. Noy, N.F., McGuinness, D.L.: Ontology Development 101: A Guide to Creating
Your First Ontology. Stanford University (2005)
13. Alipour-Aghdam, M.: Ontology-Driven Generic Questionnaire Design. Thesis for
the degree of Master of Science in Computer Science. Presented to The University
of Guelph, August 2014
14. Bachman, J.W.: The patient-computer interview: a neglected tool that can aid
the clinician. Mayo Clinic Proceedings 78, 67–78 (2003)
15. Sherimon, P.C., Vinu, P.V., Krishnan, R., Saad, Y.: Ontology driven analysis
and prediction of patient risk in diabetes. Canadian Journal of Pure and Applied
Sciences 8(3), 3043–3050 (2014). SENRA Academic Publishers, British Columbia
Predicting Preterm Birth in Maternity Care
by Means of Data Mining
Abstract. Worldwide, around 9% of children are born with less than 37
weeks of gestation, putting the premature child at risk, as it is not prepared to
develop a number of basic functions that begin soon after birth. In order to
ensure that those risk pregnancies are properly monitored by the obstetricians
in time to avoid those problems, Data Mining (DM) models were induced
in this study to predict preterm births in a real environment, using data from
3376 patients (women) admitted to the maternal and perinatal care unit of Centro
Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was
developed, assisting physicians in the decision-making process regarding the
patients’ observation. It was possible to obtain promising results, achieving sensitivity
and specificity values of 96% and 98%, respectively.
Keywords: Data mining · Preterm birth · Real data · Obstetrics care · Maternity
care
1 Introduction
Preterm birth represents a major challenge for maternal and perinatal care and it is a
leading cause of neonatal morbidity. The medical, educational, psychological and social
costs associated with preterm birth indicate the urgent need for developing preventive
strategies and diagnostic measures to improve access to effective obstetric and
neonatal care [1]. This may be achieved by exploiting the information provided by
the information systems and technologies increasingly used in healthcare services.
In Centro Hospitalar of Oporto (CHP), a Support Nursing Practice System focused
on nursing practices (SAPE) is implemented, producing clinical information. In addition,
patient data and their admission forms are recorded through the EHR (Electronic
Health Record) available in the Archive and Diffusion of Medical Information (AIDA)
platform. Both SAPE and the EHR are also used by the CHP maternal and perinatal care
unit, Centro Materno Infantil do Norte (CMIN). CMIN is prepared to provide medical
care and services for women and children. Therefore, using the obstetric and prenatal
information recorded in SAPE and the EHR, it is possible to extract new knowledge in the
context of preterm birth. This knowledge is obtained by means of Data Mining (DM)
techniques, enabling predictive models based on evidence. This study accomplished
DM models with sensitivity and specificity values of approximately 96% and 98%,
which will support the development of preventive strategies and diagnostic measures
to handle preterm birth.
Besides this introduction, the article includes a presentation of the concepts
and related work in Section 2, followed by the data mining process, described in
Section 3. The results are then discussed and a set of considerations made in
Section 4. Section 5 presents the conclusions and directions for future work.
3 Study Description
Table 2 shows statistical measures for the numerical variables age, gestation,
PG1, PG2 and BMI, while Table 3 presents the percentage of occurrences for some
of the variables used.
Table 2. Statistical measures of the age, PG1, PG2, weight, height and BMI variables.
Minimum Maximum Average Standard Deviation
Age 14 46 29.88 5.81
PG1 5 40 12.81 2.96
PG2 0 8 3.09 1.96
BMI 14.33 54.36 29.40 4.57
3.4 Modelling
A set of Data Mining models (DMM) were induced using the four DM techniques
(DMT) mentioned in Section 3: GLM, SVM, DT and NB. The developed models
used two sampling methods Holdout sampling (30% of data for testing) and Cross
Validation (all data for testing). Additionally there were implemented two different
approaches, one using the raw dataset (3376 entries) and another with oversampling.
Different combinations of variables were used, obtaining 5 different scenarios:
S1: {Age (A), Gestation (G), Programmed (P), PG1, PG2, Motive (M), Height (H), Weight (W), BMI,
Blood Type (B), Marital Status (MS), CTG, Streptococcus (S)}
S2: {A, H, W, BMI, B, MS, CTG, S}
S3: {G, P, PG1, PG2, M, CTG, S}
S4: {A, G, PG1, PG2, M, H, W, BMI, B, CTG, S}
S5: {A, G, P, M, H, W, BMI, B}
All the models were induced using the Oracle Data Miner with its default configu-
rations. For instance, GLM was induced with automatic preparation, with a confi-
dence level of 0.95 and a reference value of 1.
3.5 Evaluation
The study used the confusion matrix (CMX) to assess the induced DM models. Using
the CMX, the study estimated the following statistical metrics: sensitivity, specificity
and accuracy. Table 4 presents the best results achieved by each technique, sampling
method and approach. The best accuracy (93.00%) was accomplished with scenario 3 by
both the DT and NB techniques, using oversampling and 30% of the data for testing. The
best sensitivity (95.71%) was achieved by scenario 4 with oversampling, using the SVM
technique and all the data for testing. Regarding specificity, scenario 2 reached 97.52%
using SVM with oversampling and all the data for testing.
Table 4. Sensitivity, specificity and accuracy values for the best scenarios for each DMT,
approach and sampling method. Below, the best metric values for each DMT are highlighted.
DMT   Oversampling  Sampling  Scenario  Sensitivity  Specificity  Accuracy
DT    No            30%       3         0.8889       0.9303       0.9300
DT    No            All       1         0.2896       0.9723       0.8599
GLM   No            All       4         0.2896       0.9723       0.8599
GLM   Yes           All       4         0.8674       0.7126       0.7687
NB    No            30%       3         0.8889       0.9303       0.9300
NB    No            All       1         0.4868       0.9646       0.9271
SVM   No            All       2         0.1023       0.9752       0.4570
SVM   Yes           All       4         0.9571       0.6647       0.7410
In order to choose the best models, a threshold was established, considering
sensitivity, specificity and accuracy values higher than 85%. Table 5 shows the models
that fulfil the threshold.
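A minimal sketch of this evaluation step, computing the three metrics from a 2x2 confusion matrix and applying the 85% threshold (the counts used in the example are made up, not data from the study):

```python
# Metrics derived from a confusion matrix (TP, FP, TN, FN) and the 85% threshold.
def metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

def meets_threshold(sens, spec, acc, threshold=0.85):
    return min(sens, spec, acc) >= threshold

sens, spec, acc = metrics(tp=160, fp=14, tn=186, fn=20)   # made-up counts
print(round(sens, 4), round(spec, 4), round(acc, 4), meets_threshold(sens, spec, acc))
```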
4 Discussion
It should be noted that the best sensitivity (95.71%) and specificity (97.52%) were
reached by models that did not achieve the defined threshold, showing low values in
the remaining statistical measures used to evaluate the models. It can be concluded that
scenario 3 meets the defined threshold, presenting good results in terms of specificity
and sensitivity, as seen in Table 5. Thus, it appears that the most relevant factors
affecting the term of birth are the pregnancy variables, gestation and the physical
condition of the pregnant woman. From a clinical perspective, the achieved results will
enable the prediction of preterm birth with low uncertainty, allowing those responsible
better monitoring and resource management. In a real-time environment, physicians can
rely on the model to send a warning informing them that a specific patient has a risk
pregnancy and is in danger of preterm delivery. Consequently, the physician can be
observant and alert to these cases and can put the patients on special watch, saving
resources and time for the healthcare institution.
5 Conclusions
At the end of this work it is possible to assess the viability of using these variables
and classification DM models to predict preterm birth. The study was conducted
using real data. Promising results were achieved by inducing DT and NB, with
oversampling and 30% of the data for testing, in scenario 3, achieving approximately
89% sensitivity and 93% specificity, suitable for predicting preterm births. The
developed models support the decision-making process in maternity care by identifying
the pregnant patients in danger of preterm delivery, alerting to their monitoring and
close observation, preventing possible complications and, ultimately, avoiding preterm
birth.
In the future, new variables will be incorporated into the predictive models and other
types of data mining techniques will be applied. For instance, inducing clustering
techniques would create clusters with the variables most influential to preterm birth.
Acknowledgments. This work has been supported by FCT - Fundação para a Ciência e Tecno-
logia within the Project Scope UID/CEC/00319/2013.
References
1. Berghella, V. (ed.): Preterm birth: prevention and management. John Wiley & Sons (2010)
2. Spong, C.Y.: Defining “term” pregnancy: recommendations from the Defining “Term”
Pregnancy Workgroup. Jama 309(23), 2445–2446 (2013)
3. Beta, J., Akolekar, R., Ventura, W., Syngelaki, A., Nicolaides, K.H.: Prediction of sponta-
neous preterm delivery from maternal factors, obstetric history and placental perfusion and
function at 11–13 weeks. Prenatal diagnosis 31(1), 75–83 (2011)
4. Andersen, H.F., Nugent, C.E., Wanty, S.D., Hayashi, R.H.: Prediction of risk for preterm deli-
very by ultrasonographic measurement of cervical length. AJOG 163(3), 859–867 (1990)
5. Abelha, A., Analide, C., Machado, J., Neves, J., Santos, M., Novais, P.: Ambient intelli-
gence and simulation in health care virtual scenarios. In: Camarinha-Matos, L.M., Afsar-
manesh, H., Novais, P., Analide, C. (eds.) Establishing the Foundation of Collaborative
Networks. IFIP — The International Federation for Information Processing, vol. 243, pp.
461–468. Springer, US (2007)
6. McGuire, W., Fowlie, P.W. (eds.): ABC of preterm birth, vol. 95. John Wiley & Sons
(2009)
7. Brandão, A., Pereira, E., Portela, F., Santos, M.F., Abelha, A., Machado, J.: Managing volun-
tary interruption of pregnancy using data mining. Procedia Technology 16, 1297–1306
(2014)
8. Kaur, H., Wasan, S.K.: Empirical study on applications of data mining techniques in
healthcare. Journal of Computer Science 2(2), 194 (2006)
9. Brandão, A., Pereira, E., Portela, F., Santos, M.F., Abelha, A., Machado, J.: Predicting the risk
associated to pregnancy using data mining. In: ICAART 2015 Portugal. SciTePress (2015)
10. Maimon, O., Rokach, L.: Introduction to knowledge discovery in databases. Data Mining
and Knowledge Discovery Handbook, pp. 1–17. Springer, US (2005)
11. Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R.:
CRISP-DM 1.0 Step-by-step data mining guide (2000)
Clustering Barotrauma Patients in ICU–A Data Mining
Based Approach Using Ventilator Variables
1 Introduction
The Data Mining (DM) process provides not only the methodology but also the technology
to transform the collected data into useful knowledge for the decision-making process
[1]. In critical areas of medicine, some studies reveal that one of the respiratory
diseases with the highest incidence in patients is barotrauma [2]. Health professionals
have identified high levels of plateau pressure as a significant contributor to
the occurrence of barotrauma [3]. This study is part of the larger INTCare project. In
this work a clustering process was carried out in order to characterize patients with
barotrauma and analyze the similarity among ventilator variables. The best model
achieved a Davies-Bouldin Index of 0.64. The work was tested using data provided by
the Intensive Care Unit (ICU) of Centro Hospitalar do Porto (CHP).
This paper consists of four sections. The first section corresponds to the introduction
of the problem and related work. Aspects directly related to this study and supporting
technologies for knowledge discovery from databases are then addressed in
the second section. The third section formalizes the problem and presents the results
in terms of DM models, following the Cross Industry Standard Process for Data Mining
(CRISP-DM) methodology. In the fourth section some relevant conclusions are
drawn.
2 Background
2.3 INTCare
This work was carried out under the research project INTCare. INTCare is an Intelligent
Decision Support System (IDSS) [6] for Intensive Care which is constantly being
developed and tested. This intelligent system was deployed in the Intensive Care Unit (ICU)
of Centro Hospitalar do Porto (CHP). INTCare allows continuous monitoring of the
patient's condition and the prediction of clinical events using DM. One of the most
recently addressed goals is the identification of patients who may have barotrauma.
useful knowledge. The discovered knowledge may take various forms: business
rules, similarities, patterns or correlations [7].
This work is mainly focused on the development and analysis of clusters. Clustering is a
grouping process based on observing similarity or interconnection density; it
aims to discover groups in the data according to the distributions of the attributes
that make up the dataset [8]. To develop and assess the application of clustering
algorithms to the barotrauma dataset, the statistical system R was chosen.
3.4 Modelling
The k-means and k-medoids algorithms were used to create the clusters. This choice is
justified by the partitioning principle of both methods and by the difference in their
sensitivity to outliers.
The K-means algorithm is sensitive to outliers, because objects that lie far from the
majority can significantly influence the average value of the cluster. This effect
is particularly exacerbated by the use of the squared-error function [9].
K-medoids, on the other hand, instead of using the mean value of a cluster as a
reference point, uses actual objects to represent the clusters, one representative
object per cluster. The partitioning is then performed based on the principle of
summing the dissimilarities between each point p and its representative object
(intra-cluster distance), which is always >= 0 [9]. The K-medoids algorithm is
similar to K-means, except that the centroids must belong to the set of clustered data
[10]. Several configurations were attempted for each of the algorithms. In the
K-means algorithm the value of k (number of clusters) varied between 2 and 10. In order
to obtain the appropriate number of clusters, the sum of squared errors (SSE) was used.
Each dataset was executed 10 times.
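The study itself used R; the following Python sketch illustrates an equivalent procedure, running K-means for k between 2 and 10 and recording the SSE and the Davies-Bouldin index for each k (the data here are random stand-ins for the ventilator variables):

```python
# Illustrative equivalent of the described procedure using scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))           # stand-in for the ventilator variables

for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse = km.inertia_                    # sum of squared errors within clusters
    db = davies_bouldin_score(X, km.labels_)
    print(f"k={k:2d}  SSE={sse:8.1f}  Davies-Bouldin={db:.2f}")
```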
Each model belongs to an approach A and is composed of a set of fields F, a type
of variable TV and an algorithm AG: M1, M2, ..., M14.
This study being related to barotrauma and Plateau Pressure, all the models
included the Plateau Pressure variable (PPR). Some of the induced clusters are composed
of the group of variables defined in the first approach.
3.5 Evaluation
This is the last phase of the study. It focuses mainly on the analysis of the results
obtained through the application of the clustering algorithms (K-means and PAM).
The evaluation of the induced models was made using the Davies-Bouldin Index.
The models which presented the most satisfactory results were those obtained by means
of the K-means algorithm. In general, some models presented good results; however,
no model achieved optimal results (an index near 0). Table 2 presents the best
models and the corresponding results.
Model  Fields  Algorithm  Number of Clusters  Davies-Bouldin Index
…      …       …          2                   0.82
…      …       …          5                   0.86
M3     …       …          2                   0.64
…      …       …          6                   1.17
Model M3 proved to be the most capable of producing clusters with better distances.
The Davies-Bouldin Index can tend to +∞; however, model M3 has an index of 0.64.
This is not the optimal value, but it is the most satisfactory one because it is the
closest to 0.
Figure 1 presents M3 results.
[Fig. 1. Plot of the M3 clustering result: observations labelled by cluster (1 and 2) over the axes dc 1 and dc 2.]
Table 3 presents the minimum, maximum, average, standard deviation and coefficient
of variation of each variable used to compose the clusters in M3.
4 Conclusion
This study identified a set of variables that have a great similarity. These variables are
related with Plateau Pressure, the variable with the greatest influence on the occurrence
of barotrauma. The best result was achieved with model M3, which obtained a Davies-Bouldin
Index of 0.64, a value near the optimum (0).
It should be noted that most of the variables used presented some dispersion; however,
in one of the clusters the highest dispersion value is quite acceptable: 19.42. This
result was obtained with the implementation of the K-means algorithm. CDYN is
one of the variables that most influences the clustering, demonstrating a strong
relationship with the PPR and barotrauma. From the results shown in Table 3 it can
be noted that the field that best separates the clusters is CDYN, presenting only a few
intersecting values (minimum and maximum). This means that Cluster 1 has CDYN values
ranging within [0; 73] and Cluster 2 has CDYN values within [73; 200]. The
remaining fields have only a few intersections. Finally, this study demonstrated the
feasibility of creating clusters using only data monitored by ventilators and of analyzing
similar populations. These results motivate further studies in order to induce more
refined models, reliable for classification and clustering at the same time.
Acknowledgements. This work has been supported by FCT - Fundação para a Ciência e
Tecnologia within the Project Scope UID/CEC/00319/2013 and the contract PTDC/EEI-
SII/1302/2012 (INTCare II).
References
1. Koh, H., Tan, G.: Data mining applications in healthcare. J. Healthc. Inf. Manag. 19(2),
64–72 (2005)
2. Anzueto, A., Frutos-Vivar, F., Esteban, A., Alía, I., Brochard, L., Stewart, T., Benito, S.,
Tobin, M.J., Elizalde, J., Palizas, F., David, C.M., Pimentel, J., González, M., Soto, L.,
D’Empaire, G., Pelosi, P.: Incidence, risk factors and outcome of barotrauma in mechani-
cally ventilated patients. Intensive Care Med. 30(4), 612–619 (2004)
3. Al-Rawas, N., Banner, M.J., Euliano, N.R., Tams, C.G., Brown, J., Martin, A.D., Gabriel-
li, A.: Expiratory time constant for determinations of plateau pressure, respiratory system
compliance, and total resistance. Crit Care 17(1), R23 (2013)
4. Boussarsar, M., Thierry, G., Jaber, S., Roudot-Thoraval, F., Lemaire, F., Brochard, L.: Re-
lationship between ventilatory settings and barotrauma in the acute respiratory distress
syndrome. Intensive Care Med. 28(4), 406–413 (2002)
5. Oliveira, S., Portela, F., Santos, M.F., Machado, J., Abelha, A., Silva, A., Rua, F.: Predict-
ing plateau pressure in intensive medicine for ventilated patients. In: Rocha, A., Correia,
A.M., Costanzo, S., Reis, L.P. (eds.) New Contributions in Information Systems and
Technologies, Advances in Intelligent Systems and Computing 354. AISC, vol. 354, pp.
179–188. Springer, Heidelberg (2015)
6. Portela, F., Santos, M.F., Machado, J., Abelha, A., Silva, A., Rua, F.: Pervasive and intel-
ligent decision support in intensive medicine – the complete picture. In: Bursa, M., Khuri,
S., Renda, M. (eds.) ITBAM 2014. LNCS, vol. 8649, pp. 87–102. Springer, Heidelberg
(2014)
7. Turban, E., Sharda, R., Delen, D.: Decision Support and Business Intelligence Systems,
9th edn. Prentice Hall (2011)
8. Anderson, R.K.: Visual Data Mining: The VisMiner Approach, Chichester, West Sussex,
U.K., 1st edn. Wiley, Hoboken (2012)
9. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan
Kaufmann (2012)
10. Xindong, W., Vipin, K.: The Top Ten Algorithms in Data Mining. CRC Press–Taylor &
Francis Group (2009)
Clinical Decision Support for Active and Healthy Ageing:
An Intelligent Monitoring Approach
of Daily Living Activities
Abstract. Decision support concepts such as context awareness and trend anal-
ysis are employed in a sensor-enabled environment for monitoring Activities of
Daily Living and mobility patterns. Probabilistic Event Calculus is employed
for the former; statistical process control techniques are applied for the latter
case. The system is tested with real senior users within a lab as well as their
home settings. Accumulated results show that the implementation of the two
separate components, i.e. Sensor Data Fusion and Decision Support System,
works adequately well. Future work suggests ways to combine both compo-
nents so that more accurate inference results are achieved.
1 Introduction
Europe’s ageing population is drastically increasing in numbers [1], bringing with it
serious health concerns such as dementias or mental health disorders such as depression
[2]. Hence, the immediate need for early and accurate diagnoses becomes
apparent. Ambient-Assisted Living (AAL) technologies can provide support to this end [3].
However, most of these research efforts fail either to become easily acceptable by
end-users or to be useful at a practical level; the obtrusive nature of the utilized
technologies, invading the daily life of older adults, is probably to blame [4]. To
this end, the approach followed in this paper, which is also aligned with the major
objective of the USEFIL project [5], is to apply remote monitoring techniques within
an unobtrusive sensor-enabled intelligent monitoring system. The first part of the
intelligent monitoring system is an event-based sensor data fusion (SDF) module,
while the second part consists of two major components: i) the trend analysis component
and ii) a higher-level formal representation model based on Fuzzy Cognitive
Maps (FCMs) [5]. The aim of this paper is to present a feasibility study of the SDF and
Trend Analysis components in real-life settings and to evaluate their capability for
intelligent health monitoring.
Fig. 2 presents an event hierarchy for the transfer ADL, developed in USEFIL. The
leaves in the tree structure represent LLEs, obtained from sensor measurements, while
each node represents an HLE. According to this representation, to Barthel-score the
transfer ADL (root node), one should determine whether the user changed position and
whether help was received for this task, taking into account the ease and safety with which
the user performs it. Each of these indicators (position change, help offered, ease-
safety) is represented by an HLE, defined in terms of LLEs and other HLEs in lower
levels of the hierarchy. The reader is referred to [12] for a detailed account of the
implementation of such a hierarchy in the probabilistic Event Calculus.
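As a purely illustrative toy sketch (not the probabilistic Event Calculus implementation used in USEFIL), the snippet below shows the bottom-up idea: LLE detection probabilities are combined into the three HLE indicators, which are then mapped to a Barthel-style transfer score. All event names, probabilities and combination rules here are assumptions.

```python
# Toy bottom-up combination of LLEs into HLEs and a Barthel-style transfer score.
def noisy_or(probs):
    p = 1.0
    for q in probs:
        p *= (1.0 - q)
    return 1.0 - p

lle = {"sit_to_stand": 0.9, "bed_exit": 0.8, "carer_present": 0.1, "unsteady_gait": 0.2}

hle_position_change = noisy_or([lle["sit_to_stand"], lle["bed_exit"]])
hle_help_offered = lle["carer_present"]
hle_ease_safety = 1.0 - lle["unsteady_gait"]

# Map the three indicators to a 0-3 transfer score (illustrative thresholds).
if hle_position_change > 0.5 and hle_help_offered < 0.3 and hle_ease_safety > 0.7:
    score = 3   # independent
elif hle_position_change > 0.5 and hle_help_offered < 0.6:
    score = 2   # minor help
elif hle_position_change > 0.5:
    score = 1   # major help
else:
    score = 0   # unable
print(score)
```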
Time-series observations are divided into n overlapping windows. The mean (x̄) and the
standard deviation (σ) of each time window are computed. Then, the mean value and the
standard deviation of the entire process are estimated by averaging over the individual
windows:

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \bar{x}_i, \qquad \hat{\sigma} = \frac{1}{n}\sum_{i=1}^{n} \sigma_i \qquad (1)

The baseline profile also consists of a confidence interval for both the process mean
value and the standard deviation. These intervals are defined by the following limits:

\lim_{low} = \hat{\mu} - \frac{3\hat{\sigma}}{\sqrt{n}}, \qquad \lim_{up} = \hat{\mu} + \frac{3\hat{\sigma}}{\sqrt{n}} \qquad (2)
Ongoing monitoring of the process under consideration is facilitated through the
characterization of a further follow-up period, based on comparison against the control
limits of the baseline process. Single runs that fall outside the control limits are
considered acute events.
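A minimal sketch of this baseline/control-limit computation follows, assuming overlapping windows of fixed size and the 3-sigma multiplier used in the reconstructed Equation 2 (the window size, step and data are illustrative assumptions):

```python
# Baseline estimation from overlapping windows and flagging of out-of-limit runs.
import numpy as np

def baseline_limits(series, window=7, step=3, k=3.0):
    """Estimate the process mean and SD from overlapping windows and return
    (mean, sd, lower_limit, upper_limit)."""
    windows = [series[i:i + window] for i in range(0, len(series) - window + 1, step)]
    means = np.array([w.mean() for w in windows])
    sds = np.array([w.std(ddof=1) for w in windows])
    mu_hat, sigma_hat = means.mean(), sds.mean()
    half_width = k * sigma_hat / np.sqrt(len(windows))
    return mu_hat, sigma_hat, mu_hat - half_width, mu_hat + half_width

rng = np.random.default_rng(1)
baseline = rng.normal(1.2, 0.1, size=60)          # e.g. daily walking speed (m/s)
mu, sigma, low, up = baseline_limits(baseline)
follow_up = rng.normal(1.2, 0.1, size=14)
acute = [i for i, v in enumerate(follow_up) if v < low or v > up]
print(f"limits: [{low:.3f}, {up:.3f}]  acute follow-up days: {acute}")
```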
3 Data Collection
In the process of system integration and pilot setup, an e-home-like environment was
established, serving as an Active & Healthy Aging (AHA) Living Lab (see Fig. 3). A
total of five (5) senior women aged 65+ (mean 74.6±3.85 years) were recruited. All
users provided voluntary participation forms to denote that they chose to participate
in this trial voluntarily after being informed of the requirements of their participation.
The ability to live independently was assessed with the Barthel index. The real testing
and use of the environment took place over several days. Seniors executed several
activities of everyday life in a free-form manner, meaning that they were left to perform
activities without strict execution orders.
Apart from the lab environment, the system was also installed in the homes of
lone-living seniors for periods lasting from one to three months. Four (4) elderly
women aged 75.3±4.1 years provided their informed consent for their participation in
the home study. Recordings over several days in these seniors’ apartments measured,
among others, gait patterns, emotional fluctuations and clinical parameters.
4 Results
4.1 Short-Term Monitoring – Scoring of ADLs
In order to evaluate the SDF module, the Transfer ADL was extracted for each senior.
Carers examined the seniors and assessed all of them as totally independent, with a
Barthel score equal to “3”, which is the ground truth for all cases. Therefore, an
overall confusion matrix for all five seniors is built in Table 1. As shown, SDF several
times scores seniors as needing help with the Transfer activity (scores “1” and “2”),
although they are totally independent. This paradox can be attributed to the presence
of a facilitator during the monitoring sessions.
[Table 1. Confusion matrix of the SDF Transfer ADL scores for the five seniors (ground truth: transfer score 3).]
Fig. 4. Walking speed control chart. Yellow line: lower control limit, Red line: upper control
limit, Blue continuous line: baseline period, Dots: follow up days.
5 Discussion
In this paper, mechanisms towards a truly intelligent and unobtrusive monitoring
system for active and healthy aging were demonstrated together with a sample of the
first series of results. Short-term context awareness was tested with the ADL scenario,
Clinical Decision Support for Active and Healthy Ageing 133
while long-term trend analysis was tested with the gait patterns scenario. In the first
case, there were many false positives, due to the challenging, unconstrained nature of
the experiment. Personalized thresholds would help the SDF algorithm avoid scoring
Barthel equal to “1”. The baseline extraction and process control limits of the trend
analysis could possibly refine the latter inference results. On the other hand, long-term
analysis may benefit from the SDF output, since outliers found to be out of control could
possibly be annotated as logical “noise” through context awareness. In this way,
pathological values may be interpreted as normal based on a-priori knowledge of the context.
This whole notion is remarkably appealing, as it could lead to potential applications
where the synergy between the short-term SDF component and the long-term trend
analysis component proves pivotal. Further data collection from home environments
will be essential for successfully integrating the two components.
Acknowledgements. This research was partially funded by the European Union's Seventh
Framework Programme (FP7/2007-2013) under grant agreement no 288532. (www.usefil.eu).
The final part of this work was supported by the business exploitation scheme of LLM, namely,
LLM Care, which is a self-funded initiative at the Aristotle University of Thessaloniki
(www.llmcare.gr). A.S. Billis also holds a scholarship from Fanourakis Foundation
(http://www.fanourakisfoundation.org/).
References
1. Lutz, W., O’Neill, B.C., Scherbov, S.: Europe’s population at a turning point. Science
28(299), 1991–1992 (2003)
2. Murrell, S.A., Himmelfarb, S., Wright, K.: Prevalence of depression and its correlates in
older adults. Am. J. Epidimiology 117(2), 173–185 (1983)
3. Kleinberger, T., Becker, M., Ras, E., Holzinger, A., Müller, P.: Ambient intelligence in as-
sisted living: enable elderly people to handle future interfaces. In: Stephanidis, C. (ed.)
UAHCI 2007 (Part II). LNCS, vol. 4555, pp. 103–112. Springer, Heidelberg (2007)
4. Wild, K., Boise, L., Lundell, J., Foucek, A.: Unobtrusive in-home monitoring of cognitive
and physical health: Reactions and perceptions of Older Adults. Applied Gerontology
27(2), 181–200 (2008)
5. https://www.usefil.eu/. Retrieved from web at 29/05/2015
6. Billis, A.S., Papageorgiou, E.I., Frantzidis, C.A., Tsatali, M.S., Tsolaki, A.C., Bamidis,
P.D.: A Decision-Support Framework for Promoting Independent Living and Ageing Well.
IEEE J. Biomed. Heal. Informatics. 19, 199–209 (2015)
7. Etzion, O., Niblett, P.: Event Processing in Action. Manning Publications Co. (2010)
8. Kowalski, R., Sergot, M.: A logic-based calculus of events. In: Foundations of Knowledge
Base Management, pp. 23–55. Springer (1989)
9. Skarlatidis, A., Artikis, A., Filippou, J., Paliouras, G.: A probabilistic logic programming
event calculus. Journal of Theory and Practice of Logic Programming (TPLP) (2014)
10. Kimmig, A., Demoen, B., De Raedt, L., Santos Costa, V., Rocha, R.: On the implementa-
tion of the probabilistic logic programming language ProbLog. In: de la Banda, M.G., Pon-
telli, E. (eds.) Theory and Practice of Logic Programming, vol. 11, pp. 235–262 (2011)
11. Collin, C., Wade, D.T., Davies, S., Horne, V.: The barthel ADL index: a reliability study.
Disability & Rehabilitation 10(2), 61–63 (1988)
12. Katzouris, N., Artikis, A., Paliouras, G.: Event recognition for unobtrusive assisted living.
In: Likas, A., Blekas, K., Kalles, D. (eds.) SETN 2014. LNCS, vol. 8445, pp. 475–488.
Springer, Heidelberg (2014)
Discovering Interesting Trends in Real Medical
Data: A Study in Diabetic Retinopathy
1 Introduction
Knowledge discovery is the process of automatically analysing large volumes
of data, searching for patterns that can be considered knowledge about
the data [2]. In large real-world datasets it is possible to discover a large number
of rules and relations, but it may be difficult for the end user to identify
the interesting ones. Trend mining deals with the process of discovering hidden,
but noteworthy, trends in a large collection of temporal patterns. The number of
trends that may occur, especially in large medical databases, is huge. Therefore,
a methodology to distinguish interesting trends is imperative.
In this paper we report on a framework (SOMA) that is capable of perform-
ing trend mining in large databases and evaluating the interestingness of the
produced trends. It uses a three-step approach that: (i) exploits logic rules for
cleaning noisy data; (ii) mines the data and recognises trends, and (iii) evalu-
ates their interestingness. This work encompasses a previous preliminary study
of Somaraki et al. [10]. The temporal patterns of interest, in the context of this
work, are frequent patterns that feature some prescribed change in their fre-
quency between two or more “time stamps”. A time stamp is the sequential
patient consultation event number, i.e., the date in which some medical features
of the patient have been checked and registered. We tested SOMA on the diabetic
retinopathy screening data collected by The Royal Liverpool University Hospi-
tal, UK, which is a major referral centre for patients with Diabetic Retinopathy
(DR). DR is a critical complication of diabetes, and it is one of the most common
cause of blindness in working age people in the United Kingdom.1 It is a chronic
1
http://diabeticeye.screening.nhs.uk/diabetic-retinopathy
disease affecting patients with Diabetes Mellitus, and causes significant damage
to the retina.
The contribution of this paper is twofold. First, we provide an automatic app-
roach for evaluating the interestingness of temporal trends. Second, we test the
ability of the SOMA framework in automatically extracting interesting tempo-
ral trends from a large and complex medical database. The extracted interesting
trends have been checked by clinicians, who confirmed their interestingness and
potential utility for the early diagnosis of diabetic retinopathy.
2 Background
4 Experimental Analysis
In this work we considered the data of the Saint Paul’s Eye Clinic of the Royal
Liverpool University Hospital, UK. The data (anonymised in order to guarantee
patients’ privacy) was collected from a warehouse with 22,000 patients, 150,000
visits to the hospital, with attributes including demographic details, visual acuity
data, photographic grading results, data from biomicroscopy of the retina and
results from biochemistry investigations. Stored information had been collected
between 1991 and 2009. Data are noisy and longitudinal; they are repeatedly
sampled and collected over a period of time with respect to some set of subjects.
Typically, values for the same set of attributes are collected at each sample
point. The sample points are not necessarily evenly spaced. Similarly, the data
collection process for each subject need not necessarily commence at the
same time.
In our experimental analysis we considered all the 1420 patients who had
readings over 6 time stamps, 887 patients over 7 time stamps, and 546 patients
over 8 time stamps. The number of patients decreases as the time window gets
larger. This is due to the fact that not all patients are followed by the Clinic
for the same amount of time. For instance, a significant number of patients in
the database had only 2 visits; clearly, this does not allow us to derive any
meaningful information about general trends. The percentage of missing values
is 9.67% for the test with 6 time stamps, 13.16% for the test with 7 time stamps
and 18.26% for the test with 8 time stamps.
The medical experts working with us required that the experiments focused
on 7 medical features that they believed to be important: age at exam, treat-
ment of the patient, diabetes type, diabetes duration, age at diagnosis, presence
of cataract and presence of DR. Features that represent time are continuous,
and have been discretised in bands. In particular, age at exam and at diagno-
sis have been discretised in bands of approximately 10 years; duration features
are discretised in 5-year bands. This was done following advice from medical
experts. These medical features have some known relationships with regard to
the diagnosis of diabetic retinopathy. Therefore, selecting the aforementioned
features, following the medical experts' indications, allows us to validate the
SOMA framework in the following way: if the known interesting trends are identified
by SOMA, our confidence increases that the process is finding valid and
interesting knowledge. The interested reader can find more information about
the validation process in [9].
For each time stamp we used a support threshold of 15% and a confidence
threshold of 80%. The threshold for lift is 1.5, and for the other measures it is set
to 0.75. The overall score threshold is set to 80%, i.e., at least 24 of the total 30
entries must be 1. These thresholds were identified through discussion with medical
experts and some preliminary analysis on small subsets of the available data. It
should be noted that different thresholds can significantly affect the set of
identified interesting rules: lower thresholds lead to a larger number of potentially
less interesting rules, while higher thresholds result in a very small but highly
interesting set of rules. A major consideration in setting parameter values was
to guarantee that the configuration would reproduce already known associations
and trends, following the aforementioned validation approach.
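To make the scoring procedure concrete, the sketch below (our illustration in Python, not part of the SOMA implementation; all names are ours) computes the five rule-interestingness measures scored in Table 1 for a rule X ⇒ Y from the supports of X, Y and X ∪ Y, applies the thresholds listed above, and aggregates the binary scores over the time stamps. The exact set of five measures is our reading of the text and of Table 1.

```python
from math import sqrt

# Illustrative thresholds from the experiments: lift > 1.5, the remaining
# measures > 0.75; a clause is "interesting" if at least 80% of its
# (time stamp, measure) entries pass (24 of 30 for 6 time stamps).
THRESHOLDS = {"lift": 1.5, "all_conf": 0.75, "max_conf": 0.75,
              "kulczynski": 0.75, "cosine": 0.75}

def measures(supp_x, supp_y, supp_xy):
    """Standard interestingness measures for a rule X => Y, given the
    supports of X, Y and X union Y (as fractions of the records)."""
    conf_xy = supp_xy / supp_x          # confidence of X => Y
    conf_yx = supp_xy / supp_y          # confidence of the reverse rule Y => X
    return {
        "lift": supp_xy / (supp_x * supp_y),
        "all_conf": supp_xy / max(supp_x, supp_y),
        "max_conf": max(conf_xy, conf_yx),
        "kulczynski": 0.5 * (conf_xy + conf_yx),
        "cosine": supp_xy / sqrt(supp_x * supp_y),
    }

def clause_score(per_timestamp_supports, score_ratio=0.8):
    """per_timestamp_supports: one (supp_x, supp_y, supp_xy) triple per
    time stamp. Returns (score, max_score, interesting)."""
    score, total = 0, 0
    for sx, sy, sxy in per_timestamp_supports:
        for name, value in measures(sx, sy, sxy).items():
            total += 1
            score += int(value > THRESHOLDS[name])
    return score, total, score >= score_ratio * total
```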
Results
SOMA was implemented and executed in a MATLAB environment. On the
considered data, six interesting medical clauses regarding DR were discovered:
Table 1. Scores, with regard to the considered metrics, of the six medical clauses at
different time stamps (1–6). All Conf stands for the all-confidence criterion; Max Conf
indicates the max-confidence criterion.
4. If a patient suffers from diabetes type 1 it is very likely that this patient will
develop diabetic retinopathy.
5. If a patient suffers from diabetes type 2 and is on insulin treatment, this
patient is likely to suffer from diabetic retinopathy.
6. If a patient suffers from diabetes type 2 and the duration of diabetes is longer
than 20 years, this patient is likely to develop diabetic retinopathy.
Table 1 shows the value of the clauses, with regard to the considered criteria,
per time stamp. Given a row, it is possible to assess whether the corresponding crite-
rion changed its value over time. The maximum score that a clause can get by
considering 6 time stamps and 5 criteria is 30; therefore, given the threshold of
80%, at least 24 values should be 1. According to Table 1, only the first medical
clause achieves an overall score which is above the threshold. Therefore, clause 1
is deemed to be the most interesting. When using 7 or 8 time stamps, the overall
interestingness reduced respectively to 73% and 64%. This is possibly due to the
smaller number of considered patients, and to the different impact of missing
values on the different datasets. Interestingly, for every considered clause, the
lift value is well above the threshold; this means that, for rules of the form
X =⇒ Y, the X and Y of the medical clauses are positively correlated, and there
is an association between X and Y for all medical clauses. It should also be noted
that the confidence of the reverse rules is below the threshold: this explains
why the Kulczynski, cosine and all-confidence measures could not exceed the
threshold, and indicates that the reverse rules are not strongly relevant. In
contrast, SOMA revealed a very good confidence also for the inverse rule of clause
1. Ophthalmologists of the Saint Paul's Eye Unit confirm that this result is very
interesting, and that it highlights a cause-and-effect relationship between cataract
in diabetic patients and diabetic retinopathy.
It is well known that diabetes has many factors that affect its progress,
and not all of them will necessarily appear in databases. However, in general,
according to the clinicians of Saint Paul’s Eye Unit, the first two clauses appear
to provide new evidence of previously unknown relations, and are thus worth
investigating further. The other four clauses fit with accepted thinking and so,
while not constituting actionable knowledge, they provide validation of the
approach described in this work.
5 Conclusion
In this paper we described SOMA, a framework that is able to identify inter-
esting trends in large medical databases. Our approach has been empirically
evaluated on the data of the Saint Paul's Eye Clinic of the Royal Liverpool
University Hospital. SOMA is highly configurable; in order to set its parameter
values meaningfully, we involved medical experts in the process. In particular, we set
parameters in order to allow SOMA to find previously known interesting trends.
We used this as a heuristic to indicate that the previously unknown interest-
ing trends identified by SOMA within the same configuration may be valuable.
As clinicians confirmed, SOMA was able to identify suspected relations, and to
identify previously unknown causal relations, by evaluating the interestingness
of corresponding trends in the data.
Future work includes applying SOMA to other medical databases, and the
investigation of techniques for visualising trends and results.
Acknowledgments. The authors would like to thank Professor Simon Harding and
Professor Deborah Broadbent at St. Paul’s Eye Unit of Royal Liverpool University
Hospital for providing information and support.
References
1. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In:
Proc. 20th Int. Conf. Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
2. Frawley, W.J., Piatetsky-Shapiro, G., Matheus, C.J.: Knowledge discovery in
databases: An overview. AI Magazine 13(3), 57 (1992)
3. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: A survey. ACM
Computing Surveys (CSUR) 38(3), 9 (2006)
4. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques. The Morgan
Kaufmann Series in Data Management Systems. Morgan Kaufmann (2006)
5. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: A recent survey.
GESTS International Transactions on Computer Science and Engineering 32(1),
47–58 (2006)
6. Liu, B., Hsu, W., Chen, S.: Using general impressions to analyze discovered clas-
sification rules. In: KDD, pp. 31–36 (1997)
7. Liu, H., Hussain, F., Tan, C.L., Dash, M.: Discretization: An enabling technique.
Data Mining and Knowledge Discovery 6(4), 393–423 (2002)
8. Piatetsky-Shapiro, G.: Discovery, analysis and presentation of strong rules. In:
Knowledge Discovery in Databases, pp. 229–238 (1991)
9. Somaraki, V.: A framework for trend mining with application to medical data.
Ph.D. Thesis, University of Huddersfield (2013)
10. Somaraki, V., Broadbent, D., Coenen, F., Harding, S.: Finding temporal patterns
in noisy longitudinal data: a study in diabetic retinopathy. In: Perner, P. (ed.)
ICDM 2010. LNCS, vol. 6171, pp. 418–431. Springer, Heidelberg (2010)
11. Yuan, Y.B., Huang, T.Z.: A matrix algorithm for mining association rules. In:
Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644,
pp. 370–379. Springer, Heidelberg (2005)
Artificial Intelligence
in Transportation Systems
A Column Generation Based Heuristic
for a Bus Driver Rostering Problem
Abstract. The Bus Driver Rostering Problem (BDRP) aims at determining op-
timal work-schedules for the drivers of a bus company, covering all work du-
ties and respecting labor law and regulations, while minimizing company
costs. A new decomposition model for the BDRP was recently proposed and the
problem was addressed by a metaheuristic combining column generation and an
evolutionary algorithm. This paper proposes a new heuristic, integrated in the
column generation, that allows the generation of complete or partial rosters at
each iteration, instead of generating single individual work-schedules. The new
heuristic uses the dual solution of the restricted master problem to guide the
order by which duties are assigned to drivers. Knowledge about the problem
was used to propose a variation procedure which changes the order by which a
new driver is selected for the assignment of a new duty. Sequential and random
selection methods are proposed. The inclusion of the rotation procedure results
in the generation of rosters with a better distribution of work among drivers and
also affects the column generation performance. Computational tests assess the
proposed heuristic's ability to generate good quality rosters, and the impact of
the distinct variation procedures is discussed.
1 Introduction
In the next section the decomposition model for the BDRP is introduced. Section 3
introduces the column generation method, the improvements made by using a heuristic
to solve the subproblems, and the global heuristic used to solve all the subproblems
together. Section 4 presents the computational tests run on a set of BDRP instances,
using three configurations of the global heuristic. Section 5 provides some conclusions.
The adopted model for the BDRP is an integer programming formulation adapted
from the one proposed in [4]. The complete adapted compact model and the decom-
position model were presented in [9]. The model is only concerned with the rostering
stage, assuming that the construction of duties was previously done by joining trips
and rest times to obtain complete daily duties ready to assign to drivers.
In the decomposition model, for each driver, a set of feasible schedules is considered,
represented by the columns built from the subproblems' solutions. The set of
all possible valid columns can be so large that its full enumeration is impractical.
Therefore, only a restricted subset of valid columns is considered, leading to the
formulation of a restricted master problem (RMP) of the BDRP decomposition model.
RMP Formulation:

\min \sum_{v \in V} \sum_{j \in J^v} c_j^v \, \lambda_j^v \qquad (1)

Subject to:

\sum_{v \in V} \sum_{j \in J^v} \delta_{ih}^{jv} \, \lambda_j^v = 1, \quad \forall i \in I_h, \; h = 1, \ldots, 28, \qquad (2)

\sum_{j \in J^v} \lambda_j^v = 1, \quad \forall v \in V, \qquad (3)

\lambda_j^v \in \{0, 1\}, \quad \forall v \in V, \; \forall j \in J^v. \qquad (4)

Where:
– \lambda_j^v: binary variable associated with schedule j of driver v, from the set of drivers V;
– J^v: set of valid schedules for driver v (generated by subproblem v);
– c_j^v: cost of schedule j obtained from the subproblem of driver v;
– \delta_{ih}^{jv}: takes value 1 if duty i of day h is assigned in schedule j of driver v;
– I_h: set of work duties available on day h.

In this model, the valid subproblem solutions are represented as columns: c_j^v is the cost of the solution with index j of subproblem v, which assigns duty i on day h if \delta_{ih}^{jv} = 1.
The objective function (1) minimizes the total cost of the selected schedules. The
first set of constraints, the linking constraints (2), ensures that all duties of each
day are assigned to someone, and the last set of constraints, the convexity
constraints (3), ensures that a work-schedule is selected for each driver/subproblem.
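For illustration only, the RMP above can be stated with an off-the-shelf modelling library as in the following Python/PuLP sketch; this is not the authors' SearchCol++ implementation, and all names are ours. The variables are left continuous because column generation operates on the linear relaxation of the RMP; in the integer model (4) they are binary.

```python
import pulp

def build_rmp(drivers, schedules, cost, covers, duties_by_day, num_days=28):
    """Restricted master problem of the BDRP decomposition (sketch).
    drivers: iterable of driver ids v
    schedules[v]: list of schedule indices j available for driver v
    cost[v][j]: cost of schedule j of driver v
    covers[v][j]: set of (duty, day) pairs assigned in schedule j of driver v
    duties_by_day[h]: duties that must be covered on day h
    """
    rmp = pulp.LpProblem("BDRP_RMP", pulp.LpMinimize)
    # lambda[v, j] = 1 if schedule j is selected for driver v
    # (relaxed to [0, 1] for the column generation phase)
    lam = {(v, j): pulp.LpVariable(f"lam_{v}_{j}", lowBound=0, upBound=1)
           for v in drivers for j in schedules[v]}
    # (1) minimise the total cost of the selected schedules
    rmp += pulp.lpSum(cost[v][j] * lam[v, j]
                      for v in drivers for j in schedules[v])
    # (2) linking constraints: every duty of every day is covered exactly once
    for h in range(1, num_days + 1):
        for i in duties_by_day[h]:
            rmp += pulp.lpSum(lam[v, j] for v in drivers for j in schedules[v]
                              if (i, h) in covers[v][j]) == 1
    # (3) convexity constraints: one schedule per driver/subproblem
    for v in drivers:
        rmp += pulp.lpSum(lam[v, j] for j in schedules[v]) == 1
    return rmp, lam
```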
To give some context about the subproblem constraints for the next sections, we
describe below the constraints included in their formulation. For the complete model,
with the description of the variables and data, we recommend reading [9]. The
constraints are the following:
─ A group of constraints assures that, for each day of the rostering problem, a duty is
assigned to the driver (the day-off is also represented as a duty);
─ A group of constraints avoids the assignment of incompatible duties on consecutive
days (if a driver works a late duty on day h, the minimum rest time prevents the
assignment of an early duty on day h+1). A subset of these constraints considers
information about the last duty assigned in the previous roster, which is taken into
account in the first day's assignment;
─ A group of constraints avoids the assignment of sequences of work duties that do
not respect the maximum number of days without a day-off. A subset of these con-
straints also considers information from the last roster to force the assignment of
the first day-off considering the working days on the end of the previous rostering
period;
─ A group of constraints forces a minimum number of days-off in each week of the
rostering period and also a minimum number of days-off in a Sunday during the
rostering period;
─ Another group of constraints sets limits on the sum of the working time units each
driver can do in each week and in all the rostering period;
─ A constraint is used to apply a fixed cost whenever a driver is used (at least one
work duty is assigned in the driver's schedule).
The function TestAssignment used in the heuristic algorithm tests all the conditions
previously enumerated, which represent the constraints of the subproblem formulation.
If any of the conditions fails, the function returns false; only if all the conditions
are verified does it return true, allowing the assignment of the duty to the
schedule of the driver represented by the subproblem.
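The shape of such a test can be pictured as in the Python sketch below. It is a simplified illustration under our own assumptions: the `rules` object and all of its attributes and helper methods are hypothetical, and only the flavour of each condition group is shown, not the authors' actual TestAssignment code.

```python
def test_assignment(schedule, day, duty, rules):
    """Return True only if assigning `duty` on `day` keeps the driver's
    partial `schedule` (a dict day -> duty) feasible. `rules` bundles the
    problem data used by the subproblem constraints (illustrative)."""
    # 1) one duty (possibly the day-off) per day
    if day in schedule:
        return False
    # 2) rest time: no incompatible duty right after the previous day's duty,
    #    also checking the last duty of the previous rostering period on day 1
    prev = schedule.get(day - 1,
                        rules.last_duty_of_previous_roster if day == 1 else None)
    if prev is not None and rules.incompatible(prev, duty):
        return False
    # 3) maximum number of consecutive working days without a day-off
    if rules.consecutive_work_days(schedule, day, duty) > rules.max_days_without_day_off:
        return False
    # 4) minimum days-off per week and at least one free Sunday in the period
    if not rules.day_off_requirements_still_satisfiable(schedule, day, duty):
        return False
    # 5) weekly and overall working-time limits
    if rules.worked_time(schedule, duty) > rules.max_work_time:
        return False
    return True
```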
Having a heuristic to obtain solutions to the subproblems, the column generation
algorithm is changed to use it. The heuristic does not replace the exact optimization
solver, because its solutions are only valid, not necessarily optimal. The resulting
algorithm, detailing the column generation using the heuristic, is presented in Figure 2.
DO
   Optimize RMP;
   Update subproblems objective function with current dual solution of the RMP;
   FOR EACH subproblem
      Solve using heuristic;
      Add new columns into the RMP with subproblems attractive solutions;
   IF no new columns added THEN
      FOR EACH subproblem
         Solve using exact optimization solver;
         Add new columns into the RMP with subproblems attractive solutions;
WHILE new columns added > 0
Fig. 2. Column Generation with Subproblem Heuristic Algorithm
In the new configuration of the column generation cycle, the heuristic is used until
no new columns are added from the obtained solutions. At that point, the exact
optimization solver is used to obtain the optimal solutions of the subproblems and
possibly add new attractive columns. In the next iteration the heuristic is tried again.
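In plain code, this hybrid pricing loop amounts to something like the sketch below (illustrative Python; `rmp`, the subproblem objects and their methods are assumed interfaces of our own, not the SearchCol++ API).

```python
def column_generation(rmp, subproblems):
    """Hybrid column generation: try heuristic pricing first and fall back
    to the exact subproblem solver only when the heuristic produces no
    attractive (negative reduced cost) columns."""
    while True:
        rmp.optimize()                         # solve the restricted master problem
        duals = rmp.dual_solution()
        added = 0
        for sp in subproblems:                 # heuristic pricing
            sp.update_objective(duals)
            for col in sp.solve_heuristic():
                if col.reduced_cost < 0:       # attractive column
                    rmp.add_column(col)
                    added += 1
        if added == 0:                         # exact pricing as a safeguard
            for sp in subproblems:
                col = sp.solve_exact()
                if col.reduced_cost < 0:
                    rmp.add_column(col)
                    added += 1
        if added == 0:                         # linear RMP is optimal
            return rmp
```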
In the SearchCol++ framework, the algorithm presented in Figure 2 can have other
configurations. It is possible to solve only a single subproblem in each iteration,
optimize the RMP again and, in the following iteration, solve the next subproblem,
iterating over all the subproblems. This strategy results in fewer columns being added
to the RMP when the subproblems return similar solutions, allowing a faster
optimization of the RMP due to the reduced number of variables.
The heuristic presented in Figure 3 is able to build rosters by testing the assignment
of each of the available duties in the schedules of free drivers. Since in each iteration
of the column generation a new dual solution is used to update the costs of the duties
in the subproblems, the order in which the duties are assigned may vary from iteration
to iteration. The objective is that the dual solution of the RMP can guide the genera-
tion of distinct, and valid, rosters through the iterations.
When using the aggregated heuristic in the column generation algorithm in Figure 2,
the cycle solving the subproblems is replaced by a single call to the new heuristic,
which returns schedules for all subproblems/drivers. The exact solver continues to be
used when no new attractive columns are built from the heuristic solutions.
The BDRP model defines a cost for each unit of overtime, which may differ between
drivers. However, in our test instances, the drivers are split into a limited
number of categories, and all drivers in the same category have the same overtime
cost. This means that we still want to assign the duties with larger overtime first
to drivers from the category with the lowest overtime cost, if possible.
However, we want to distribute them among all the drivers of that category, avoiding
schedules with extra days-off caused by a large concentration of duties with overtime.
Despite the ability of the Roster Builder Heuristic to generate valid and distinct
rosters, preliminary tests showed that the schedules of the first drivers were filled with
the duties with the highest overtime. Even though we want to assign the duties with
higher overtime to the drivers with lower salary, who form the first group in the set of
all drivers, if the assignment always starts from the same driver, his/her schedule will
be filled with the duties with larger overtime, resulting in an unbalanced work distribution.
Given the existence of different driver categories with respect to the value paid for
overtime labor, drivers of the same category are grouped, and the dual solution values
of the convexity constraints are used to order them inside each group.
Because the dual values of the convexity constraints may not lead to the
desired diversity in the order of the drivers inside each group, we added an additional
procedure to select the first driver inside each ordered group. We start by considering
each group of drivers as a circular array. Then, two configurations were prepared
to define how a driver is selected when a new duty needs to be assigned.
By default, when a new duty is selected for assignment, the driver to select is the
one in the position 0 of the first group. We developed two configurations of the Ros-
ter Builder Heuristic with drivers’ rotation, namely the sequential and the random
configurations. In both, after the assignment of a duty, we rotate the drivers inside the
group: the first is removed and inserted at the end. In the sequential configuration, the
rotation is of a single position, and in the random configuration, the number of posi-
tions rotated is randomly selected between one and the number of drivers in the group
minus one, to avoid a complete rotation to the same position.
The inclusion of the rotation leads to a better distribution of the duties with
overtime among the drivers of each group. Figure 4 presents the algorithm of the roster
builder heuristic with drivers' rotation. The changes are the inclusion of the groups of
drivers and the selection of the configuration: 'normal' (no rotation); 'sequential'
(rotate one position, picking the drivers sequentially); 'random' (stochastic
selection, rotating the drivers inside the group by a random number of positions).
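The rotation itself reduces to a few lines; the following Python sketch (ours, with illustrative names) shows the three configurations applied to one category group treated as a circular list.

```python
import random

def rotate_group(group, mode):
    """Rotate one salary-category group (treated as a circular list) after a
    duty has been assigned to its first driver.
    mode: 'normal' (no rotation), 'sequential' (one position) or
    'random' (1 .. len(group)-1 positions, never a full turn)."""
    if mode == "normal" or len(group) < 2:
        return list(group)
    shift = 1 if mode == "sequential" else random.randint(1, len(group) - 1)
    return group[shift:] + group[:shift]

# Usage inside the roster builder (sketch): after assigning a duty to
# group[0], call group = rotate_group(group, mode) so that the next duty
# is offered first to a different driver of the same (cheapest) category.
```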
If a new roster built by the heuristic is better than the best found in previous iterations
of the metaheuristic, the best is updated accordingly. The schedules composing the roster
are saved in the pool of solutions whenever they are considered attractive by the column generation.
In the next section the computational tests and the results obtained using this new
heuristic (column generation with a heuristic solving all the subproblems in an
aggregated way) are presented.
4 Computational Tests
The decomposition model for the adopted BDRP was implemented in the computa-
tional framework SearchCol++ [10]. The BDRP test instances are the ones designated
as P80 in [4]. All the instances have 36 drivers available, distributed over four salary
categories in groups of equal size (9 drivers). All the tests ran on a computer with an
Intel Pentium G640 CPU at 2.80 GHz, 8 GB of RAM, the Windows 7 Professional
64-bit operating system, and IBM ILOG 12.5.1 (64-bit) installed. In all the test
configurations, only the column generation stage with the new heuristic was run.
This allows us to retrieve the lower bound given by the optimal (linear) solution, the
time consumed to obtain that solution, and the integer solution found by the global
heuristic (which solves all the subproblems in an aggregated way).
In both heuristic configurations where the rotation of the drivers is used, we set
the rotation not to be applied in 20% of the iterations (nearly double the probability
of a driver being selected randomly inside each group). In practice, in these
iterations the assignment starts with the first driver of the ordered group, keeping
the order defined by the dual values.
Table 1 presents the results obtained from running the three configurations of the
Roster Builder Heuristic in each instance, namely, the computational time used by
the column generation to achieve the optimal solution (Time) and the value (Value)
of the integer solution found. The lower bound (LB) provided by the CG (ceiling
of the optimal solution value) is included in the table (the value is the same for all
configurations, only the normal configuration was unable to obtain an optimal solu-
tion for the instance P80_6 in the time limit of two hours). For the random configura-
tion, each instance was solved 20 times. In addition to the best value and its computa-
tional time, the table also displays the average (Avg) and standard deviation (σ) of the
values and times of the runs.
Under the Time columns, the average time is presented. The “normal” configura-
tion is penalized by instance P80_6 where the time limit of two hours was reached
before obtaining the optimal solution. The best values (time and value) are displayed
in bold. Generally, the computational times of the rotation heuristics are better;
however, for P80_8 the time is considerably higher for both when compared with the
"normal" one. The configurations with rotation were able to reach the best solutions
for all instances, particularly the random configuration, which also reduces the
average computational time by 39% relative to the normal configuration.
The heuristic solutions were compared with the solution value of the optimization
of the compact model using the CPLEX solver with the time limit of 24 hours. Table
2 presents the gaps between the best heuristic solutions and the best known solutions.
Only for the instances where the gap is marked in bold was the optimal solution
found by the CPLEX solver before the time limit. The gap of the solutions found by
our heuristic is on average 3.2%.
Table 2. Gap of the best heuristic integer solution to the best known solution
Instance Solution Gap
P80_1 3601 2.0%
P80_2 2819 3.9%
P80_3 4694 2.6%
P80_4 3755 5.2%
P80_5 3608 3.0%
P80_6 3650 1.4%
P80_7 3840 3.7%
P80_8 4809 4.9%
P80_9 3594 1.9%
P80_10 4183 3.5%
The previous results show that all the configurations are able to obtain good quality
rosters for the BDRP instances tested, and that the separation of the drivers into
category groups, with the inclusion of the rotation procedure, has a significant impact
on the column generation optimization time. In addition, Table 3 shows the impact on
the roster of changing the configuration used. For all the configurations, the table
presents the average (Avg) units of overtime assigned to a member of the first group
(lower cost), the maximum difference (Δ) of overtime assigned between the drivers,
and the number of extra days-off counted in the schedules of the 9 members of the group.
With the rotation procedures more days-off are counted. However, it is observed
that, on average, additional units of overtime were assigned to the drivers, up to one
additional unit when comparing the sequential and the normal heuristics. The most
important change is observed in the uniformity of the overtime distribution: the
random configuration reduces the difference relative to the normal configuration by
5.6 units of time, while the sequential configuration reduces that value to less than half.
5 Conclusions
In this paper, we presented a new heuristic capable of building good quality rosters
for the BDRP. The heuristic is integrated with the column generation exact optimization
method, using the information from the dual solutions.
In the BDRP, the objective is to define the schedules for all the drivers in the
rostering period considered, assuring the assignment of all the duties and optimizing
the use of each driver, thus reducing bus company costs.
In the proposed method, a decomposition model is implemented in a framework
and column generation is used to optimize it. The standard optimization of the
subproblems in the column generation iterations is replaced by a global heuristic which
solves all the subproblems together. The heuristic is guided by the information from
the RMP solution, which sets the order by which duties are assigned and also the
order by which drivers are selected when a new duty is assigned. Three configurations
of this heuristic are presented: the normal configuration makes use of the dual
information to guide the assignment of all the duties; the sequential and the random
configurations group the drivers by category and implement a rotation of drivers
inside the groups (by one position and by a random number of positions, respectively).
The last two configurations aim to obtain rosters with a better distribution of work
among drivers and more diverse schedules.
Computational tests were run on a set of BDRP instances and the results presented.
The results show that the different configurations of the heuristic have an impact
on the performance of the column generation and also that good quality rosters are
obtained by all configurations. The quality of the obtained rosters is evaluated by
comparison with the best known integer solutions, yielding an average gap of 3.2%.
An evaluation of the schedules of the first group of drivers shows that the rotation
procedure has an impact on the distribution of overtime among drivers, particularly
when the sequential configuration is used. Besides the better distribution of overtime,
the rotation configurations were able to obtain better solutions by increasing the
average overtime units assigned to the drivers of the first group (with lower cost),
even with the additional days-off counted. The additional overtime assigned to drivers
of the first group compensates for the extra days-off assigned.
Our heuristic with the variation configurations seems to work well on most of the
BDRP instances tested; however, it is not guaranteed that the rotation improves the
performance of the CG or obtains better solutions, as in instance P80_8, where the
computational time increased greatly compared with the normal configuration. If
the solutions obtained by the heuristic do not include solutions attractive to the
column generation, the computational time can increase.
The proposed heuristic can be used with other problems, provided that there is a
heuristic to solve the subproblems and that it is possible to use it in an aggregated
way. The variation strategies used need to be tailored using knowledge about each
problem.
Future work will focus on tuning this heuristic to improve the column generation
performance and, if possible, obtain better integer solutions for the rostering problem.
We also intend to generate a search space composed of solutions provided by this
heuristic, so that the SearchCol concept can be followed and other metaheuristics can
explore the recombination of the obtained rosters (complete or partial) to get closer
to the optimal solutions. Application of the current approach to other rostering
problems is also being considered as future work, since only minor changes would be
needed to adapt the general metaheuristic, as well as the roster generation heuristic
proposed here.
Acknowledgments. This work is supported by National Funding from FCT - Fundação para a
Ciência e a Tecnologia, under the project: UID/MAT/04561/2013.
References
1. Ernst, A.T., Jiang, H., Krishnamoorthy, M., Sier, D.: Staff scheduling and rostering:
A review of applications, methods and models. European Journal of Operational Research
153, 3–27 (2004)
2. Van den Bergh, J., Beliën, J., De Bruecker, P., Demeulemeester, E., De Boeck, L.:
Personnel scheduling: A literature review. European Journal of Operational Research 226,
367–385 (2013)
3. Ernst, A.T., Jiang, H., Krishnamoorthy, M., Owens, B., Sier, D.: An Annotated Bibliogra-
phy of Personnel Scheduling and Rostering. Annals of Operations Research 127, 21–144
(2004)
4. Moz, M., Respício, A., Pato, M.: Bi-objective evolutionary heuristics for bus driver roster-
ing. Public Transport 1, 189–210 (2009)
5. Dorne, R.: Personnel shift scheduling and rostering. In: Voudouris, C., Lesaint, D., Owusu, G.
(eds.) Service Chain Management, pp. 125–138. Springer, Heidelberg (2008)
6. Burke, E.K., Kendall, G., Soubeiga, E.: A Tabu-Search Hyperheuristic for Timetabling
and Rostering. Journal of Heuristics 9, 451–470 (2003)
7. Respício, A., Moz, M., Vaz Pato, M.: Enhanced genetic algorithms for a bi-objective bus
driver rostering problem: a computational study. International Transactions in Operational
Research 20, 443–470 (2013)
8. Leone, R., Festa, P., Marchitto, E.: A Bus Driver Scheduling Problem: a new mathematical
model and a GRASP approximate solution. Journal of Heuristics 17, 441–466 (2011)
9. Barbosa, V., Respício, A., Alvelos, F.: A Hybrid Metaheuristic for the Bus Driver Rostering
Problem. In: Vitoriano, B., Valente, F. (eds.) ICORES 2013–2nd International Conference on
Operations Research and Enterprise Systems, pp. 32–42. SCITEPRESS, Barcelona (2013)
10. Alvelos, F., de Sousa, A., Santos, D.: Combining column generation and metaheuristics.
In: Talbi, E.-G. (ed.) Hybrid Metaheuristics, vol. 434, pp. 285–334. Springer,
Heidelberg (2013)
11. Lübbecke, M.E., Desrosiers, J.: Selected Topics in Column Generation. Oper. Res. 53,
1007–1023 (2005)
12. Puchinger, J., Raidl, G.R.: Combining metaheuristics and exact algorithms in combinatori-
al optimization: a survey and classification. In: Mira, J., Álvarez, J.R. (eds.) First Interna-
tional Work-Conference on the Interplay Between Natural and Artificial Computation.
Springer, Las Palmas (2005)
13. Nemhauser, G.L.: Column generation for linear and integer programming. Documenta
Mathematica Extra Volume: Optimization Stories, 65–73 (2012)
14. Dantzig, G.B., Wolfe, P.: Decomposition Principle for Linear Programs. Operations
Research 8, 101–111 (1960)
15. Cintra, G., Wakabayashi, Y.: Dynamic programming and column generation based ap-
proaches for two-dimensional guillotine cutting problems. In: Ribeiro, C.C., Martins, S.L.
(eds.) WEA 2004. LNCS, vol. 3059, pp. 175–190. Springer, Heidelberg (2004)
16. Yunes, T.H., Moura, A.V., de Souza, C.C.: Hybrid Column Generation Approaches for
Urban Transit Crew Management Problems. Transportation Science 39, 273–288 (2005)
17. dos Santos, A.G., Mateus, G.R.: General hybrid column generation algorithm for crew
scheduling problems using genetic algorithm. In: IEEE Congress on Evolutionary Compu-
tation. CEC 2009, pp. 1799–1806 (2009)
18. Barbosa, V., Respício, A., Alvelos, F.: Genetic Algorithms for the SearchCol++ framework:
application to drivers’ rostering. In: Oliveira, J.F., Vaz, C.B., Pereira, A.I. (eds.) IO2013 -
XVI Congresso da Associação Portuguesa de Investigação Operacional, pp. 38–47. Instituto
Politécnico de Bragança, Bragança (2013)
A Conceptual MAS Model for Real-Time Traffic Control
Abstract. This paper describes the various steps taken to analyze and design a
multi-agent system for real-time traffic control at isolated intersections. Control
strategies for traffic signals are a highly important topic, studied by many
researchers over the last decades, due to their impacts on the economy, the
environment and society, affecting both people and freight transport. The research
target is to develop an approach for controlling traffic signals that relies on
flexibility and a maximal level of freedom in control, where the system is updated
frequently to meet current traffic demand, taking different traffic users into
account. The proposed model was designed on the basis of the Gaia methodology,
introducing a new perspective in which each isolated intersection is a multi-agent
system in its own right.
1 Introduction
2 Literature Review
MASs have been suggested for many transportation problems such as traffic signal
control. Zheng, et al. [2] describe their autonomy, their collaboration, and their reac-
tivity as the most appealing characteristics for MAS application in traffic manage-
ment. The application of MASs to the traffic signal control problem is characterized
by decomposition of the system into multiple agents. Each agent tries to optimize its
own behavior and may be able to communicate with other agents. The communication
can also be seen as a negotiation in which agents, while optimizing their own goals,
can also take into account the goals of other agents. The final decision is usually a
trade-off between the agent’s own preferences against those of others. MAS control is
decentralized, meaning that there is not necessarily any central level of control and
that each agent operates individually and locally. The communication and negotiation
with other agents is usually limited to the neighborhood of the agent, increasing ro-
bustness [3]. Although there are many actors in a traffic network that can be
considered autonomous agents [4], such as drivers, pedestrians, traffic experts, traffic
lights and traffic signal controllers, the most common approach is the one in which
each agent represents an intersection controller [3]. A MAS might have additional attributes that
enable it to solve problems by itself, to understand information, to learn and to eva-
luate alternatives. This section reviews a number of broad approaches in previous
research that have been used to create intelligent traffic signal controllers using
MASs. In some work [5-8] it was argued that the communication capabilities of MAS
can be used to accomplish traffic signal coordination. However, there is no consensus
on the best configuration for a traffic-managing MAS and its protocol [7]. To solve
conflicts between agents, in addition to communication approaches, work has been
done on i) hierarchical structure, so that conflicts are resolved at an upper level,
ii) agents learning how to control, iii) agents being self-organized.
Many authors make use of a hierarchical structure in which higher-level agents are
able to monitor lower level agents and intervene whenever necessary. In some ap-
proaches [6, 8, 9] there is no communication between agents at the same level. The
higher-level agents have the task of resolving conflicts between lower-level agents
which they cannot resolve by themselves. In approaches (ii) and (iii), agents need
time to learn or self-organize, which may be incompatible with the dynamics of the
environment. Agents learning to control (ii) is a popular approach related to control-
ling traffic signals. One or more agents learn a policy for mapping states to actions by
observing the environment and selecting actions; the reinforcement learning technique
is the most popular method used [4, 10, 11]. The approach of self-organizing agents
(iii) is a progressive system in which agents interact to communicate information and
make decisions. Agent behavior is not imposed by hierarchical elements but is
achieved dynamically during agent interactions creating feedback to the system [12].
Dresner and Stone [13] view cars as an enormous MAS involving millions of hete-
rogeneous agents. Driver agents approaching the intersection request a reservation of a
"green time interval" from the intersection manager. The intersection manager
decides whether to accept or reject requested reservations according to an intersection
control policy. Vasirani and Ossowski [14, 15] extended Dresner and Stone's
approach to networks of intersections. The approach is market-based: driver
agents, i.e., buyers, trade with the infrastructure agents, i.e., sellers, in a virtual
marketplace, purchasing reservations to cross intersections. The drivers thus have an
incentive to choose alternatives to the shortest paths.
In summary, since the beginning of this century, interest in application of MAS to
traffic control has been increasing. Further, the promising results already achieved by
several authors have helped to establish that agent-based approaches are suitable to
traffic management control. Most reviewed MASs have focused their attention on
network controllers, with or without coordination, rather than on isolated intersec-
tions. Another issue is that traffic control approaches focus on the private vehicle as
the major component of traffic, and may be missing important aspects of urban traffic
such as public transport and soft modes (pedestrians, bicycles).
3 Methodological Approach
The development of a MAS conceptual model for real-time traffic control at an iso-
lated intersection followed a methodology for agent-oriented analysis and design. In
this section an increasingly detailed model is constructed using Gaia [16, 17] as the
main methodology, complemented by concepts introduced by Passos et al. [18].
The first step is an overview of the scenario description and system requirements. The
Gaia process starts with an analysis phase whose goal is to collect and establish the or-
ganization specifications. The output of this phase is the basis for the second phase,
namely the architectural design, and the third phase, which is a detailed design phase.
• At time X (e.g., each 5 min) or event Y (e.g., traffic conditions, new topology,
system failure), a request for a new traffic signal plan is created;
• All information about current topology and traffic conditions is updated to generate
new traffic data predictions for the movements of each traffic component. In this way
a new traffic signal plan is defined to meet the new intersection characteristics;
• During processing of the new traffic signal plan, if topology has changed, the stage
design is developed following the new topology;
• The traffic signal plan is selected based on criteria such as minimum delay; the
system saves the traffic plan information (design, times) and implements it;
• During monitoring, current traffic data are compared with traffic predictions, the
topology is verified and data are analyzed by the auditor, which computes the ac-
tual level of service and informs the advisor of the results. Depending on the re-
sults, the auditor decides if it should make a suggestion for the traffic streams such
as to terminate or to extend the current stage or if a new plan should be requested;
• Depending on the information received, traffic streams can continue with the traf-
fic signal plan or negotiate adjustments to it;
The system is responsible for defining and implementing a traffic signal plan as
well as deciding when to suspend it, in which case it initiates negotiation between
traffic streams to adjust the plan according to traffic flow fluctuations and characteris-
tics (e.g., traffic modes, priority vehicles), or even decides to design a new plan.
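As a rough illustration of this control cycle, the following Python sketch condenses the steps above into a single loop body; the `intersection` object and all of its methods are assumptions of ours, not part of the Gaia model.

```python
def control_step(intersection, period_s=300):
    """One simplified pass of the intersection control cycle described above."""
    # (Re)plan on a timer (e.g. every 5 min) or on an event such as a
    # topology change, a traffic-condition change or a system failure.
    if intersection.timer_expired(period_s) or intersection.event_detected():
        topology = intersection.current_topology()
        prediction = intersection.predict_traffic(topology)      # per movement
        candidates = intersection.generate_plans(topology, prediction)
        plan = min(candidates, key=lambda p: p.expected_delay)   # e.g. minimum delay
        intersection.save_and_apply(plan)

    # Monitoring: compare current data with predictions, verify the topology
    # and compute the actual level of service, then let the advisor decide.
    report = intersection.audit()
    action = intersection.advise(report)
    if action == "adjust_stage":               # extend or terminate the current stage
        intersection.negotiate_actuation()     # traffic-stream agents negotiate
    elif action == "new_plan":
        intersection.request_new_plan()
```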
As input, the Gaia methodology uses a collection of requirements. The require-
ments can be collected through analyzing and understanding the scenario in which the
organizations are identified, as well as the basic interactions between them to achieve
their goals. For early requirements collection, it uses the Tropos methodology [19], in
which relevant roles, their goals and intentions, as well as their inter-dependencies are
identified and modeled as social actors with dependencies.
[Fig. 1. Actor diagram of the traffic signal control scenario: the actors TrafficStreamProvider, TrafficDataProvider, TrafficPredictor, TrafficSignalPlanner, TrafficStream, Advisor and Monitor/Auditor, together with their goals, soft goals and mutual dependencies.]
The TrafficStreamProvider has the goal to design a traffic stream. Each traffic
stream is described by movements and by lanes assigned to each movement, including
information about traffic sensor locations. To achieve its goal, two soft goals were
defined: to respect the intersection topology and to keep the topology information
updated in case of topology changes (e.g. road works, accidents). The actor should
“provide all traffic stream information” to TrafficDataProvider.
The TrafficDataProvider has the main objective to collect information about traffic
data from sensors installed at a signalized intersection and aggregate data according to
the traffic stream information received. The goal is built upon four sub-goals: to keep
traffic data information updated, to minimize data processing time when dependent
actors are waiting for the information, and to assume traffic data if no sensor is installed
or if a sensor seems to behave strangely. TrafficPredictor requests recent traffic data from this
actor and makes its own traffic predictions. Monitor/Advisor also requests traffic data
from this actor and uses them for early detection of possible problems and improve-
ments at the intersection control.
The TrafficPredictor has the main goal to generate a traffic data prediction for
each movement; this is to optimize signal control for imminent demand rather than
being reactive to current flow. The strategy may include traffic measurements from a
past time period and the current time and uses them to estimate the near future. The
generated traffic prediction should be: reliable for future traffic and comprehensive,
with total values and splits into traffic modes. The actor requests recent traffic data
from TrafficDataProvider and makes its own traffic predictions. TrafficSignalPlanner
requests this actor for recent traffic predictions and uses them to optimize the traffic
plan.
The TrafficSignalPlanner has the following objectives. Generate stage design:
search possible signal group sets that can run concurrently respecting a set of safety
162 C. Vilarinho et al.
constraints; generate stage sequence: once possible stage designs are defined, this
step compiles strategic groupings of stages to have signal plans designed; determine
traffic signal times: for each traffic signal plan, the green-interval durations, inter-
green and cycle lengths are calculated; and choose traffic signal plan, based on a
criterion or a weighted combination. Two soft goals were defined for the objective:
traffic signal plan selection should be based on the best objective function, and plan
design and timing should be conducted respecting some operational constraints, such as
maximum and minimum cycle lengths. The actor requests TrafficPredictor for recent
traffic predictions and uses these to optimize its traffic signal plan. It provides the
selected plan to TrafficStream to be applied. Advisor asks for a new plan search if the
current plan is not adequate to remain active. Finally, Monitor/Auditor receives traffic
planner information such as traffic predictions and the objective function so it can
monitor independently.
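Seen as a pipeline, these four objectives can be sketched as follows (illustrative Python; the `planner` object and its methods are our own placeholders for the activities described in the text, not an actual implementation).

```python
def plan_signals(planner, topology, prediction, constraints):
    """Illustrative pipeline of the TrafficSignalPlanner objectives."""
    candidate_plans = []
    # 1) Stage design: sets of signal groups that may safely run concurrently
    for design in planner.stage_designs(topology, constraints.safety):
        # 2) Stage sequence: strategic orderings of the stages of a design
        for sequence in planner.stage_sequences(design):
            # 3) Signal times: green intervals, intergreen times, cycle length
            timing = planner.signal_times(sequence, prediction,
                                          constraints.min_cycle,
                                          constraints.max_cycle)
            candidate_plans.append(planner.make_plan(sequence, timing))
    # 4) Choose the plan optimising the objective function (e.g. minimum delay)
    return min(candidate_plans, key=lambda plan: plan.objective_value)
```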
The TrafficStream has three main goals. Apply a traffic signal plan: each traffic
stream assumes a signal state (red, yellow or green) according to the plan or to the
current actuation action, if one has been defined. Negotiate actuation: traffic streams
cooperate to find possible actuation actions following the advisor's suggestions.
Decide actuation: the traffic stream actors together decide which actuation action to implement. To
accomplish its goals, the actor intends to verify the transition to the next stage and to satisfy
user beliefs about the traffic light to prevent frustration. The actor receives the se-
lected traffic signal plan from TrafficSignalPlanner to apply it and actuation sugges-
tions from Advisor to guide the negotiation phase. If negotiations are needed, Traffic
Stream actors discuss these among themselves.
The Advisor's two main objectives are: to evaluate the future of the plan, choosing a
possible action depending on the information received from Monitor/Auditor (find a new
plan, adjust the current plan or continue the implementation); and to suggest an
actuation action: if it is decided to adjust the plan through actuation, the actor prepares
a recommendation to guide the actuation process. The latter objective has a soft
goal: to formulate a recommendation that restricts the solution space of the actuation
negotiation. The actor provides actuation suggestions to TrafficStream. It requests a
new plan search from TrafficSignalPlanner if the current plan is not adequate
quests a new plan search from TrafficSignalPlanner if the current plan is not adequate
to remain active. Monitor/Auditor sends monitor information to this actor.
The Monitor/Auditor's four main objectives are: verify topology (check if any topology
change occurred and report it to Advisor if so); observe traffic data, both "actual" and
predicted; observe the objective function; and calculate the level of service of the
intersection. The data acquired through monitoring are used to evaluate whether Advisor should be
asked for any plan change. The objectives are complemented with three sub-goals: data
collection to keep information updated, track system performance and assist timely
decision-making by Advisor to exploit every opportunity to improve the intersection
system. The actor requests recent traffic data from TrafficDataProvider and receives
them for early detection of possible problems and improvements at the intersection
control. It receives traffic planner information such as traffic predictions and the objec-
tive function from TrafficSignalPlanner. It sends monitor information to Advisor.
Modeling the environment is one of the agent-oriented methodologies’ major activi-
ties. The environment model can be viewed in its simplest form as a list of resources
that the MAS can exploit, control or consume when working towards the accomplish-
ment of its goal. The resources can be information (e.g., a database) or a physical entity
(e.g., a sensor). Six resources were defined for the proposed traffic signal control: topol-
ogy, traffic detector, traffic database, traffic prediction, traffic signal plan and traffic
light. The resources are identified by name and characterized by their types of actions.
A partial list of those resources is:
• Topology has the action to read and change when new topology is detected. The
resource contains information regarding intersection topology such as number of
traffic arms, their direction, number of approach lanes in each traffic arm, move-
ments assigned in each lane and traffic detector position;
• Traffic Detector is essential for the system because it contains all traffic data (read)
and also needs to be frequently updated (change) so it can correspond to the real
traffic demand. This makes it possible to know, for each detector: the current traffic
data in the lane, the number of users by type, vehicle occupancy, the traffic flow
distribution by movement, and lanes without a sensor or with equipment failure;
Complex scenarios such as this are very dynamic, so the approach presented by
Passos, Rossetti and Gabriel [18] extends the Gaia methodology with the Business
Process Model and Notation (BPMN) to capture the model dynamics. A Business
Process (BP) is a collection of related and structured activities that can be executed to satisfy a goal.
[Fig. 2. BPMN diagram of the scenario, with one lane per participant: TrafficStreamProvider, TrafficDataProvider, TrafficPredictor, TrafficSignalPlanner, TrafficStream, Advisor and Monitor/Auditor.]
The diagram in Fig. 2 shows the interactions between the seven participants (actors
of Fig. 1) with message exchanges and includes tasks within participants, providing a
detailed visualization of the scenario. Their interactions with resources are also
present in the diagram.
The actors and goals in the diagrams in Fig. 1 and Fig. 2 help to identify the roles that
will build up the final MAS organization. The preliminary roles model, defined first,
as the name implies, is not a complete configuration at this stage, but it is appropriate
to identify system characteristics that are likely to remain. It identifies the basic skills,
functionalities and competences required by the organization to achieve its goals. For
traffic signal control 13 preliminary roles were defined. A partial list of those roles is:
There are three types of relationships: “depends on”, “controls” and “peer”. “De-
pends on” is a dependency relationship that means one role relies on resources or
knowledge from the other. “Controls” is an association relationship usually meaning
that one role has an authoritative relationship with the other role, controlling its ac-
tions. “Peer” is a dependency relationship also and usually means that both roles are
at the same level and collaborate to solve problems.
After achieving the structural organization, the roles and interactions of the prelim-
inary model can be fulfilled. To complete the role model, it is necessary to include all
protocols and the liveness and safety responsibilities. In Table 1, one of the 13 roles is
described according to the role schema. For a complete definition of the interaction protocols,
they should be revised to respect the organizational structure. Table 2 shows the defi-
nition of the “InformPlanEvents" protocol using Gaia notation.
Table 1. Role schema for the RosterPlanMonitor role
Role: RosterPlanMonitor
Description: the role involves monitoring the traffic conditions and the traffic signal plan for
events such as: a substantial difference between the acceptable and the current value of the
objective function, or between the predicted and the current traffic flow (total and by traffic
mode); reaching a limit of some measure of effectiveness (such as the maximum queue length);
reaching the maximum number of plan repetitions; or a sensor system fault. After detecting one
of these events, the RosterPlanMonitor role requests the EvaluateFutureOfPlan role to evaluate
what should be done.
To better clarify the organization with its roles and interactions, a final diagram is
presented in Fig.4 with the full model, including all protocols and required services
that will be the basis for the roles the agents choose.
The detailed design phase is the last step, and it is responsible for the most important
output: the full agent model definition for helping the actual implementation of
agents. The agent model identifies the agents from role-interaction analysis. Moreo-
ver, it includes a service model. The model design should try to reduce the model
complexity without compromising the organization rules. To present the agent model,
the dependency relations between agents, roles and services are presented in Fig. 5.
The diagram in Fig. 5 should be read as "the Traffic Signal Planner agent is responsible
for performing the service Make Plan Static." Five types of agents were defined. "Traffic
Stream" has n agents, one for each traffic stream of the intersection, so it depends on
the intersection topology. This means that the agent class "Traffic Stream" is defined to play
the roles PlanApply, NegotiateTrafficActuation and UpdatePlan, and that there are between
one and n instances of this class in the MAS. After the completion of the design process, the
agent classes defined are ready to be implemented, according to the previous models.
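As a data structure, the agent model can be captured roughly as below (illustrative Python). Only the "Traffic Stream" role list and the "Make Plan Static" service are taken from the text; the role name attached to the Traffic Signal Planner agent and the omission of the remaining three agent classes are our simplifications.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentClass:
    name: str
    roles: List[str]                       # roles this agent class plays
    services: List[str] = field(default_factory=list)
    instances: str = "1"                   # "1" or "1..n" (topology dependent)

# Partial agent model; the other three agent classes are not reproduced here
# because their role lists do not appear in this excerpt.
AGENT_MODEL = [
    AgentClass("Traffic Stream",
               roles=["PlanApply", "NegotiateTrafficActuation", "UpdatePlan"],
               instances="1..n"),          # one agent per traffic stream
    AgentClass("Traffic Signal Planner",
               roles=["TrafficSignalPlanner"],   # illustrative role name
               services=["Make Plan Static"]),
]
```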
4 Conclusions
This paper presents the design of a conceptual model of a multi-agent architecture for
real-time signal control at isolated traffic intersections using Gaia as the main metho-
dology. The main idea is to make rational decisions about traffic stream lights such
that the control is autonomous and efficient under different conditions (e.g., topology,
traffic demand, traffic priority, and system failure). The traffic control of an isolated
intersection has the advantage that each intersection may have an independent control
not limited by its neighbors' control. This allows the control algorithm to be simpler than
one for coordinated intersections, and more flexible in defining the plan design and times.
Comparing the proposed strategy with traditional approaches using MAS, it is
possible to find several differences. Traditional traffic control methods rely on each
agent controlling an intersection within the traffic network. The system usually has a
traffic signal plan defined a priori and the system controls how to perform small ad-
justments such as decreasing, increasing or advancing the green time interval of a
traffic stage. Research is being conducted on using MAS to coordinate several neigh-
boring agent controllers, in either a centralized or distributed system. Another feature
shared by traditional approaches is agent decision making (action selection) based on
learning. From the result of each decision, the learning rule gives the probability with
which every action should be performed in the future.
As introduced before, the present approach is distinct from other works to the ex-
tent that each traffic stream is an agent and each signalized intersection builds upon
independent MASs. Thus, the multitude of agents designed for isolated intersections
create, manage and evolve their own traffic signal plans. Therefore, this proposed
multi-agent control brings the benefit of stage designs and sequences being formed
as needed instead of being established a priori. The system structure is flexible, and it
has the ability to adapt traffic control decisions to predictions and react to unexpected
traffic events.
The validation of this traffic control strategy will be performed using a state-of-the-
art microscopic traffic simulator such as, for instance, AIMSUN. The proposed model
was developed from scratch rather than by enhancing an existing model.
Finally, it is not our goal to present the process of designing and implementing a
MAS or to promote the use of Gaia; there is existing research that is much more
adequate for that purpose. However, the methodology applied is well suited to the problem.
Acknowledgment. This project has been partially supported by FCT, under grant
SFRH/BD/51977/2012.
References
1. Park, B., Schneeberger, J.D.: Evaluation of traffic signal timing optimization methods us-
ing a stochastic and microscopic simulation program. Virginia Transportation Research
Council (2003)
2. Zheng, H., Son, Y., Chiu, Y., Head, L., Feng, Y., Xi, H., Kim, S., Hickman, M.: A Primer
for Agent-Based Simulation and Modeling in Transportation Applications. FHWA (2013)
3. McKenney, D., White, T.: Distributed and adaptive traffic signal control within a realistic
traffic simulation. Engineering Applications of Artificial Intelligence 26, 574–583 (2013)
4. Bazzan, A.L.C.: Opportunities for multiagent systems and multiagent reinforcement learn-
ing in traffic control. Auton. Agent. Multi-Agent Syst. 18, 342–375 (2009)
5. Katwijk, R., Schutter, B., Hellendoorn, H.: Look-ahead traffic adaptive control of a single
intersection – A taxonomy and a new hybrid algorithm (2006)
6. Choy, M., Cheu, R., Srinivasan, D., Logi, F.: Real-Time Coordinated Signal Control
Through Use of Agents with Online Reinforcement Learning. Transportation Research
Record: Journal of the Transportation Research Board 1836, 64–75 (2003)
7. Bazzan, A.L.C., Klügl, F.: A review on agent-based technology for traffic and transporta-
tion. The Knowledge Engineering Review 29, 375–403 (2013)
8. Hernández, J., Cuena, J., Molina, M.: Real-time traffic management through knowledge-
based models: The TRYS approach. ERUDIT Tutorial on Intelligent Traffic Management
Models, Helsinki, Finland (1999)
9. Roozemond, D.A., Rogier, J.L.: Agent controlled traffic lights. In: ESIT 2000, European
Symposium on Intelligent Techniques. Citeseer (2000)
10. Bazzan, A.L.C., Oliveira, D., Silva, B.C.: Learning in groups of traffic signals. Engineer-
ing Applications of Artificial Intelligence 23, 560–568 (2010)
11. Wiering, M., Veenen, J., Vreeken, J., Koopman, A.: Intelligent Traffic Light Control (2004)
12. Oliveira, D., Bazzan, A.L.C.: Traffic lights control with adaptive group formation based on
swarm intelligence. In: Dorigo, M., Gambardella, L.M., Birattari, M., Martinoli, A., Poli,
R., Stützle, T. (eds.) ANTS 2006. LNCS, vol. 4150, pp. 520–521. Springer, Heidelberg
(2006)
13. Dresner, K., Stone, P.: A Multiagent Approach to Autonomous Intersection Management.
J. Artif. Intell. Res. (JAIR) 31, 591–656 (2008)
14. Vasirani, M., Ossowski, S.: A market-inspired approach to reservation-based urban road traffic
management. In: Proceedings of 8th International Conference on AAMAS, pp. 617–624.
International Foundation for AAMS (2009)
15. Vasirani, M., Ossowski, S.: A computational market for distributed control of urban road traf-
fic systems. IEEE Transactions on Intelligent Transportation Systems 12, 313–321 (2011)
16. Zambonelli, F., Jennings, N.R., Wooldridge, M.: Developing multiagent systems: The
Gaia methodology. ACM T. Softw. Eng. Meth. 12, 317–370 (2003)
17. Wooldridge, M., Jennings, N.R., Kinny, D.: The Gaia methodology for agent-oriented
analysis and design. Auton. Agent. Multi-Agent Syst. 3, 285–312 (2000)
18. Passos, L.S., Rossetti, R.J.F., Gabriel, J.: An agent methodology for processes, the envi-
ronment, and services. In: IEEE Int. C. Intell. Tr., pp. 2124–2129. IEEE (2011)
19. Bresciani, P., Perini, A., Giorgini, P., Giunchiglia, F., Mylopoulos, J.: Tropos: An agent-
oriented software development methodology. Auton. Agent. Multi-Agent Syst. 8, 203–236
(2004)
20. Castro, A., Oliveira, E.: The rationale behind the development of an airline operations
control centre using Gaia-based methodology. International Journal of Agent-Oriented
Software Engineering 2, 350–377 (2008)
Prediction of Journey Destination in Urban Public
Transport
Vera Costa1, Tânia Fontes1, Pedro Maurício Costa1, and Teresa Galvão Dias1,2
1 Department of Industrial Management, Faculty of Engineering,
University of Porto, Porto, Portugal
[email protected]
2 INESC-TEC, Porto, Portugal
1 Introduction
In the last decade, Urban Public Transport (UPT) systems have turned to Information
and Communication Technologies (ICT) for improving the efficiency of existing
transportation networks, rather than expanding their infrastructures [3, 7]. Public
transport providers make use of a wide range of ICT tools to adjust and optimise their
service, and plan for future development.
The adoption of smart cards, in particular, has not only enabled providers to access
detailed information about usage, mobility patterns and demand, but also contributed
significantly towards service improvement for travellers [21, 25]. For instance,
inferring journey transfers and destination based on historical smart card data has
allowed transportation providers to significantly improve their estimates of service
usage – otherwise based on surveys and other less reliable methods [8]. As a result,
UPT providers are able to adjust their service accordingly while reducing costs [2].
Furthermore, the combination of UPT and ICT has enabled the development of
Traveller Information Systems (TIS), with the goal of providing users with relevant
on-time information. Previous work has shown that TIS have a positive impact on
travellers. For instance, providing on-time information at bus stops can significantly
increase perception, loyalty and satisfaction [5].
The latest developments in ICT have paved the way for the emergence of ubiqui-
tous environments and ambient intelligence in UPT, largely supported by miniaturised
computer devices and pervasive communication networks. Such environments sim-
plify the collection and distribution of detailed real-time data that allow for richer
information and support the development of next-generation TIS [6, 20].
In this context, as transportation data is generated and demand for real-time infor-
mation increases, the need for contextual services arises for assisting travellers, identi-
fying possible disruptions and anticipating potential alternatives [20, 26].
A number of methods have been used for inferring journeys offline (e.g. [1,8,23]).
After the journeys are completed, the application of these methods can support different
analyses, such as patterns of behaviour (e.g. [13,14]) and traveller segmentation
(e.g. [11,12]). In contrast, little research has focused on real-time journey prediction (e.g.
[16]). Contextual services, however, require on-time prediction and, unless explicitly
stated by the user, the destination of a journey may not be known until alighting.
The prediction of journeys based on past data and mobility patterns is a pivotal
component of the next generation of TIS for providing relevant on-time contextual
information. Simultaneously, UPT providers benefit from up-to-date travelling infor-
mation, allowing them to monitor their infrastructures closely and take action.
An investigation of journey prediction is presented, based on a group of bus travel-
lers in Porto, Portugal. Specifically this research focuses on the following questions:
In order to predict the journey destinations for an individual traveller, three steps were
defined: (i) firstly, data from smart cards were collected and pre-processed from the
public transport network in Porto, Portugal (see section 2.1); (ii) secondly, four differ-
ent groups of users were defined, considering their travel patterns (see section 2.2);
(iii) finally, three different intelligent algorithms were assessed considering different
performance measures (see section 2.3). Figure 1 presents an overview of the
methodology applied to perform the simulations. While at this stage the analysis is
based on a set of simulations and historical data, the goal is to apply the method for
timely prediction of destinations, to be implemented in the scope of the Seamless
project [4]. Thus, the simulations presented in this paper enable the evaluation of the
importance of the groups of travellers and of the best algorithm to use for predicting
journey destinations in a real-world environment.
[Fig. 1. Overview of the methodology: data collection, data preprocessing, definition of groups, and model evaluation, iterated over all repetitions and groups.]
2.1 Data
The public transport network of the Metropolitan Area of Porto covers an area
of 1,575 km² and serves 1.75 million inhabitants [10]. The network is
composed of 126 bus lines (urban and regional), 6 metro lines, 1 cable line, 3 tram
lines, and 3 train lines [24]. This system is operated by 11 transport providers, of
which Metro do Porto and STCP are the largest.
The Porto network is based on an intermodal and flexible ticket system: the Andante.
Andante is an open zonal system, based on smart cards, that requires validation
only when boarding. A validated occasional ticket allows for unlimited travel within a
specified area and time period: 1 hour for the minimum 2-zone ticket, and longer as
the number of zones increases. Andante holders can use different lines and transport
modes with a single ticket.
In this work, two months of data, April and May of 2010, were used to perform the
simulations. Table 1 shows an example of the data collected for an individual traveller
during one week of April 2010. Journey ID is a unique identifier for each trip, sorted
in ascending order by the transaction time. For each traveller (i.e. for each Andante
smart card), the information related to the boarding time (first boarding on the route),
the line (or lines of each trip) and the stop (or stops of each trip) is available. Each
trip can have one or more stages. The first line of the table shows a trip with two
stages: the traveller boards at stop 1716 on line 303 at 11:24, followed by stop 3175
on line 302. Based on these data the route sequence can be rebuilt.
Table 1. Extracted trip chain information for an individual traveller during a week of April 2010.
Journey ID   Date         First boarding time of the route   Route sequence (Line ID)   Stop sequence (Stop ID)
1036866 10/04/2010 11:24 303 302 1716 3175
1036867 10/04/2010 16:27 203 1622
1036868 10/04/2010 23:14 200 1035
1036869 12/04/2010 09:05 402 1632
1036870 12/04/2010 12:42 203 1695
1036871 12/04/2010 13:44 203 1632
1036872 12/04/2010 19:45 303 1338
1036873 12/04/2010 22:29 400 400 1675 1689
1036874 13/04/2010 09:09 402 1632
1036875 13/04/2010 19:11 206 1338
1036876 14/04/2010 08:45 302 1632
1036877 14/04/2010 12:30 402 1338
1036878 14/04/2010 13:43 203 1632
1036879 14/04/2010 19:11 303 1338
1036880 15/04/2010 09:08 402 1632
1036881 15/04/2010 12:57 203 1695
1036882 15/04/2010 14:04 302 501 1632 1810
1036883 15/04/2010 20:52 303 1338
absence of information, the results obtained with the application of this algorithm
were partially restricted, since data from only one transport provider, the STCP
company, were available. Thus, in order to minimize the error, only data from users
with at least 80% of destinations successfully inferred and, on average, two or more
validations per day were used. As a result, the sample consists of 615,647 trips
corresponding to 6,865 different Andante cards.
The data set consists of a set of descriptive attributes, of which three were used.
The first attribute represents the code of the origin bus stop. The second attribute
identifies the date, which represents the day of the week of each validation. The third
represents the bus stop inferred as the destination.
2.3 Methods
In order to estimate the destination of each traveller, three different algorithms were
analysed: (i) decision trees (J48); (ii) Naïve Bayes (NB); and (iii) the Top-K
algorithm.
Decision trees represent a supervised approach to classification. These algorithms
are a tree-based knowledge representation methodology, which are used to represent
classification rules in a simple structure where non-terminal nodes represent tests on
one or more attributes and terminal nodes reflect decision outcomes. The decision tree
approach is usually the most useful in classification problems [17]. With this tech-
nique, a tree is built to model the classification process. J48 is an implementation of a
decision tree algorithm in the WEKA system, used to generate a decision tree model
to classify the destination based on the attribute values of the available training data.
In R software, the RWeka package was used.
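As an illustration of this step, the sketch below trains a decision-tree classifier on the two predictor attributes (origin stop and day of week) to predict the inferred destination. It is only a sketch: the column names, the example records and the use of scikit-learn in Python are assumptions for illustration, whereas the study itself used the J48 implementation from the RWeka package in R.

```python
# Minimal sketch: decision-tree prediction of the destination stop from the
# origin stop and the day of the week. Column names and records are
# hypothetical; the study used the J48 implementation from RWeka in R.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

trips = pd.DataFrame({
    "origin_stop": ["1716", "1622", "1632", "1716"],
    "weekday":     ["sat",  "sat",  "mon",  "sat"],
    "destination": ["3175", "1035", "1338", "3175"],
})

encoder = OrdinalEncoder()                          # encode categorical attributes
X = encoder.fit_transform(trips[["origin_stop", "weekday"]])
y = trips["destination"]

tree = DecisionTreeClassifier().fit(X, y)           # J48-like tree model

query = pd.DataFrame({"origin_stop": ["1716"], "weekday": ["sat"]})
print(tree.predict(encoder.transform(query))[0])    # most likely destination
```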
The Naïve Bayes algorithm is a simple probabilistic classifier that calculates a set
of probabilities by counting the frequency and combinations of values in a given data
set [19]. Each probability in this set corresponds to a specific feature value and is
calculated from the frequency of that value within each class of the training data set.
The training data set is the subset used to train the classifier, i.e. known values are
used to predict future, unknown values. The algorithm is based on Bayes' theorem
and assumes all attributes to be independent given the value of the class variable.
In this work the e1071 R package was used.
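The frequency-counting idea can be sketched as follows; the records and names are hypothetical, no smoothing is applied, and the sketch is not the e1071 implementation used by the authors.

```python
# Minimal sketch of the frequency-counting idea behind Naive Bayes: class
# priors and class-conditional feature probabilities are estimated from counts.
# Records are hypothetical and no smoothing is applied.
from collections import Counter, defaultdict

trips = [  # ((origin_stop, weekday), destination)
    (("1716", "sat"), "3175"),
    (("1622", "sat"), "1035"),
    (("1632", "mon"), "1338"),
    (("1716", "sat"), "3175"),
]

class_counts = Counter(dest for _, dest in trips)
feature_counts = defaultdict(Counter)               # (class, feature index) -> counts
for features, dest in trips:
    for i, value in enumerate(features):
        feature_counts[(dest, i)][value] += 1

def predict(features):
    scores = {}
    for dest, n in class_counts.items():
        p = n / len(trips)                          # prior P(class)
        for i, value in enumerate(features):
            p *= feature_counts[(dest, i)][value] / n   # P(feature value | class)
        scores[dest] = p
    return max(scores, key=scores.get)

print(predict(("1716", "sat")))                     # -> "3175"
```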
The Top-K algorithm enables finding the most frequent elements or item sets
based on an increment counter [15]. The method is generally divided into counter-
based and sketch-based techniques. Counter-based techniques keep an individual
counter for a subset of the elements in the dataset, guaranteeing their frequency.
Sketch-based techniques, on the other hand, provide an estimation of all elements,
with a less stringent guarantee of frequency. Metwally et al. [15] proposed the
Space-Saving algorithm, a counter-based version of the Top-K algorithm that targets performance
and efficiency for large-scale datasets. This version of the algorithm maintains partial
information of interest, with accurate estimates of significant elements supported by a
lightweight data structure, resulting in memory saving and efficient processing. It
focuses on the influential nodes and discards less connected ones [22]. The main idea
behind this method is to have a set of counters that keep the frequency of individual
elements. Drawing a parallel with social network analysis, the algorithm proposed
by Sarmento et al. [22] was adapted: a journey is considered to be an edge, i.e. a
connection between two nodes (stops) A and B. The algorithm counts occurrences of
journeys. For each traveller, if the new journey is already being monitored, its counter
is updated; otherwise, the algorithm adds the new journey to its Top-K list. If the
number of unique journeys exceeds 10·K monitored journeys, the algorithm applies
the Space-Saving replacement policy.
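The counter logic described above can be sketched as follows; the class and method names are illustrative, and the 10·K capacity and the replacement rule follow the textual description rather than the authors' actual implementation.

```python
# Minimal sketch of the Space-Saving Top-K counters described above.
# Journeys are keyed by (origin_stop, destination_stop); capacity is 10*K.
class SpaceSavingTopK:
    def __init__(self, k):
        self.capacity = 10 * k
        self.counts = {}                           # journey -> estimated frequency

    def update(self, journey):
        if journey in self.counts:
            self.counts[journey] += 1              # journey already monitored
        elif len(self.counts) < self.capacity:
            self.counts[journey] = 1               # room left: start a new counter
        else:
            # Space-Saving replacement: evict the least frequent journey and
            # reuse its (incremented) count as an over-estimate for the new one.
            victim = min(self.counts, key=self.counts.get)
            count = self.counts.pop(victim)
            self.counts[journey] = count + 1

    def predict_destination(self, origin):
        candidates = {j: c for j, c in self.counts.items() if j[0] == origin}
        return max(candidates, key=candidates.get)[1] if candidates else None

topk = SpaceSavingTopK(k=3)
for journey in [("1716", "3175"), ("1716", "3175"), ("1632", "1338")]:
    topk.update(journey)
print(topk.predict_destination("1716"))            # -> "3175"
```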
For each algorithm and group of travellers defined previously (see Section 2.2), 15
repetitions were performed. For each repetition, 30 travellers were randomly selected
from each group. Figure 2 shows the average number of journeys for the groups. In
each simulation, the test set always contains a single day, the day under evaluation i
(ntest = 1 day), while the training set grows continuously with i (ntrain = i − 1 day(s)).
Table 3 illustrates this procedure.
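A minimal sketch of this expanding-window procedure, assuming a day-indexed list of feature/label pairs and any classifier exposing fit/predict (all names are illustrative):

```python
# Minimal sketch of the expanding-window evaluation: for each day i, train on
# days 1..i-1 and test on day i. `days` is a hypothetical list of (X, y) pairs,
# one entry per day, and `make_model` builds a fresh classifier.
def expanding_window_accuracy(days, make_model):
    scores = []
    for i in range(1, len(days)):
        X_train = [x for X, _ in days[:i] for x in X]     # all previous days
        y_train = [y for _, Y in days[:i] for y in Y]
        X_test, y_test = days[i]                          # the day under evaluation
        model = make_model()
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores.append(sum(p == t for p, t in zip(pred, y_test)) / len(y_test))
    return scores
```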
[Fig. 2. Average number of journeys per group; asterisks mark atypical days: 2010-04-27 (STCP strike) and 2010-05-14 (papal visit to the city).]
To evaluate the performance of the different algorithms, the Accuracy measure (1),
which represents the proportion of correctly identified results, was used. The basis for
this approach is the confusion matrix, a two-way table that summarizes the performance
of the classifier. Considering one of the classes as the positive (P) class and the other as
the negative (N) class, four quantities may be defined: the true positives (TP), the true
negatives (TN), the false positives (FP) and the false negatives (FN). We then have:
    Accuracy = (TP + TN) / (TP + TN + FP + FN)                                (1)
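For reference, both measures can be computed as follows; the macro-averaging of the F-score over destination classes is an assumption, since the averaging scheme is not stated in the text.

```python
# Minimal sketch of the evaluation measures. Accuracy follows Eq. (1); the
# F-score is macro-averaged over destination classes, which is an assumption.
from sklearn.metrics import accuracy_score, f1_score

y_true = ["3175", "1035", "1338", "1338"]      # hypothetical inferred destinations
y_pred = ["3175", "1338", "1338", "1338"]      # hypothetical model predictions

print(accuracy_score(y_true, y_pred))                    # (TP + TN) / all
print(f1_score(y_true, y_pred, average="macro"))         # assumed macro F-score
```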
Table 4 shows the average Accuracy and F-score obtained for journey destination
prediction, by algorithm, group of travellers and day of the week (weekday vs.
weekend). The analysis of the results shows several similarities and differences
between the different groups of travellers analysed.
Regarding the comparison of journey destination prediction between weekdays
and weekends, the results show small differences in performance for the individuals
of Group 4. In this case the Accuracy and F-score for weekdays are around 2% higher
than on weekends. Nevertheless, for the remaining groups (G1, G2 and G3), a clear
difference is observed between these two periods. For example, while during weekdays
the Accuracy for Group 1 is, on average, 77-79% (excluding disruptive events in the
city, namely 2010-04-27, an STCP strike day, and 2010-05-14, a papal visit to the city,
which were removed), during the weekends the values fall to 47-49%. The same trend
is observed for the F-score (average values of 80-83% and 51-53%, respectively).
Therefore, high deviations are observed during the weekends, which suggests
uncertainty in predicting for these days, associated with the lack of travelling routines
for these groups (G1, G2 and G3). This discrepancy between weekdays and weekends
dissipates as mobility pattern characteristics become less strict. Simultaneously, as
weekdays and weekends become indiscernible, so does the average prediction
performance, with an average decrease varying between 6 and 40%.
Regarding the algorithms applied, similar results were found among them. For
this comparison, the first four weeks were disregarded to exclude the initial learning
period; thus, the last five weeks represent a more stable account of journey prediction.
On average, small differences in performance were found during weekdays
(Accuracy: G1=2%, G2=3%, G3=2%, and G4=4%; F-score: G1=2%, G2=3%,
G3=3%, and G4=3%). During weekends, these differences are generally higher
(Accuracy: G1=7%, G2=4%, G3=4%, and G4=4%; F-score: G1=6%, G2=4%,
G3=5%, and G4=4%).
Table 4. Accuracy and F-score (%) average (X) and standard deviation (SD) obtained for the
prediction of the journey destination, by algorithm (J48, NB and Top-K), day (weekdays and
weekends), and group of individual travellers (G1, G2, G3 and G4).
                        J48                      NB                       Top-K
                X±SD         min–max     X±SD         min–max     X±SD         min–max
Accuracy (%)
 Weekday   G1   78.3 ± 5.6   27.0–89.8   77.2 ± 6.0   39.8–88.7   79.6 ± 6.0   38.8–89.4
           G2   74.3 ± 6.9   21.0–85.7   72.5 ± 7.4   28.7–83.7   75.2 ± 7.1   25.5–85.0
           G3   63.5 ± 5.2   17.5–74.2   63.1 ± 5.5   24.3–72.3   62.9 ± 5.7   24.2–74.1
           G4   21.1 ± 5.0    6.9–29.4   18.8 ± 5.0    8.4–27.0   19.2 ± 5.0    4.9–29.0
 Weekend   G1   49.7 ± 21.9  30.2–62.6   47.3 ± 21.6  31.2–65.7   47.8 ± 22.4  34.2–63.8
           G2   65.0 ± 12.9  30.3–82.6   63.3 ± 13.2  40.2–76.7   65.3 ± 12.8  40.2–83.4
           G3   56.9 ± 15.1  27.8–73.9   59.0 ± 14.9  40.3–77.1   58.5 ± 15.6  39.1–72.5
           G4   19.5 ± 9.4    9.0–26.8   17.2 ± 8.8   10.8–25.9   15.8 ± 9.0    5.8–24.0
F-score (%)
 Weekday   G1   81.8 ± 5.2   33.9–90.9   80.8 ± 5.3   55.9–90.3   83.3 ± 5.3   55.8–90.3
           G2   77.6 ± 6.7   28.2–87.7   75.4 ± 7.2   43.6–86.0   78.3 ± 7.0   38.8–88.1
           G3   66.3 ± 5.5   18.4–76.4   65.9 ± 5.7   33.1–74.9   66.0 ± 5.9   33.0–76.2
           G4   22.2 ± 5.8    7.7–31.2   20.8 ± 5.8   10.6–27.5   20.9 ± 5.7    7.3–30.9
 Weekend   G1   52.9 ± 22.1  36.0–66.4   51.5 ± 21.9  32.6–70.0   52.4 ± 22.4  40.9–68.1
           G2   66.7 ± 12.9  37.2–81.5   65.4 ± 12.8  50.6–76.6   67.6 ± 12.6  50.9–81.7
           G3   58.4 ± 14.7  30.0–73.4   60.7 ± 14.5  38.9–76.7   60.2 ± 15.0  37.8–72.3
           G4   20.6 ± 10.2   8.8–30.3   18.9 ± 9.9   10.5–27.1   17.1 ± 9.3    5.5–26.2
A detailed analysis of Accuracy revealed that the three methods have different per-
formance levels related to the spatiotemporal characteristics of the groups. For Group
1, while Top-K shows better performance in the first five weeks, the J48 method
performs better for the last four weeks. NB was better than the other two in only
17% of the days. In Group 2, the Top-K method performs better in 46% of the days,
followed by the J48 with 41% and NB with 13%. In contrast to the previous two
groups, in Group 3 the NB method performs better in the first three weeks of predic-
tions and in 34% of the days overall, with the J48 method performing better in 46%
and the Top-K in 20%. Interestingly, the first three weeks are very similar in terms of
performance between the NB and Top-K, with J48 and NB in the remaining ones.
Similarly, in Group 4 the NB method performs best in the first two weeks, down to
17% overall. Top-K performs better in only 7% of the days, and J48 in 76% of them.
With the exception of Group 4, the Accuracy and F-score measures show similar
results. However, in Group 4, the F-score measure reveals that both NB and Top-K
perform better in 20% of the days, and J48 in 60%. In addition, the F-score tends to
show better performance for Top-K for Groups 2, 3 and 4, to the detriment of J48.
With the exception of the mentioned differences, the similarity between the Accuracy
and F-score measures indicates robustness in the results obtained.
Figure 3 shows the average F-score for one algorithm, Top-K. Whereas in the first
1-2 weeks of prediction the F-score values increase steeply, and almost double for
Groups 1 and 2, after this period they increase very slowly until weeks 5-6. The
exception is Group 4, which shows a slow growth tendency for the entire two-month period.
[Figure: daily F-score values (0–100%) across nine weeks, plotted separately for groups G1–G4.]
Fig. 3. Average F-score for the prediction of the journey destination, by group of individual
travellers, obtained with the application of the Top-K algorithm.
4 Conclusions
In this work, an investigation into journey prediction was performed, based on past
data and mobility patterns. Three different methods were used to predict journey
destinations for four groups of travellers with different spatiotemporal characteristics.
The main findings, described in the previous section, provide answers to the questions
originally formulated, as follows:
Even though the three methods present similar results overall, the analysis shows that
certain scenarios allow them to perform differently. The performance differences are
mainly related to the amount of historical data, the day of the week and the travelling
patterns. Thus, journey prediction is impacted by a number of factors that inform the
design and implementation of TIS.
Future work will compare the classifiers used with respect to processing time and
memory efficiency; these metrics were not addressed in this work due to space
restrictions. Further research is also needed on the characterisation of groups of
travellers, with additional analysis to enable the discovery of further typical traveller
profiles. We hope to identify and study these new profiles with a larger dataset of users.
Acknowledgements. This work was performed under the project "Seamless Mobility"
(FCOMP-01-0202-FEDER-038957), financed by European Regional Development Fund
(ERDF), through the Operational Programme for Competitiveness Factors (POFC) in the Na-
tional Strategic Reference Framework (NSRF), within the Incentive System for Technology
Research and Development. The authors would also like to acknowledge the bus transport
provider of Porto, STCP, which provided travel data for the project.
References
1. Bagchi, M., White, P.R.: The potential of public transport smart card data. Transport Poli-
cy 12(5), 464–474 (2005)
2. Bera, S., Rao, K.V.: Estimation of origin-destination matrix from traffic counts: the state
of the art, European Transport\Trasporti Europei, ISTIEE, Institute for the Study of Trans-
port within the European Economic Integration, vol. 49, pp. 2–23 (2011)
3. Caragliu, A., Bo, C.D., Nijkamp, P.: Smart Cities in Europe. J. of Urban Technology
18(2), 65–82 (2011)
4. Costa, P.M., Fontes, T., Nunes, A.N., Ferreira, M.C., Costa, V., Dias, T.G., Falcão e Cunha, J.:
Seamless Mobility: a disruptive solution for public urban transport. In: 22nd ITS World Con-
gress, 5-9/10, Bordeux (2015)
5. Dziekan, K., Kottenhoff, K.: Dynamic at-stop real-time information displays for public
transport: effects on customers. Transp. Research Part A 41(6), 489–501 (2007)
6. Foth, M., Schroeter, R., Ti, J.: Opportunities of public transport experience enhancements
with mobile services and urban screens. Int. J. of Ambient Computing and Intelligence
(IJACI) 5(1), 1–18 (2013)
7. Giannopoulos, G.A.: The application of information and communication technologies in
transport. European J. of Operational Research 152(2), 302–320 (2004)
8. Gordon, J.B., Koutsopoulos, H.N., Wilson, N.H.M., Attanucci, J.P.: Automated Inference
of Linked Transit Journeys in London Using Fare-Transaction and Vehicle Location Data.
Transp. Res. Record: J. of the Transportation Research Board 2343, 17–24 (2013)
9. He, H., Garcia, E.: Learning from imbalanced data. IEEE Transactions on Knowledge and
Data Engineering 21(9), 1263–1284 (2009)
10. INE (2013). https://www.ine.pt/. Instituto Nacional de Estatística I.P., Portugal
11. Kieu, L.M., Bhaskar, A., Chung, E.: Transit passenger segmentation using travel regularity
mined from Smart Card transactions data. In: Transportation Research Board 93rd Annual
Meeting. Washington, D.C., January 12–16, 2014
12. Krizek, J.J., El-Geneidy, A.: Segmenting preferences and habits of transit users and non-
users. Journal of Public Transportation 10(3), 71–94 (2007)
13. Kusakabe, T., Asakura, Y.: Behavioural data mining of transit smart card data: A data fu-
sion approach. Transp. Research Part C 46, 179–191 (2014)
14. Ma, X., Wu, Y.-J., Wanga, Y., Chen, F., Liu, J.: Mining smart card data for transit riders’
travel patterns. Transp. Research Part C 36, 1–12 (2013)
15. Metwally, A., Agrawal, D.P., El Abbadi, A.: Efficient computation of frequent and top-k
elements in data streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363,
pp. 398–412. Springer, Heidelberg (2005)
16. Mikluščák, T., Gregor, M., Janota, A.: Using neural networks for route and destination
prediction in intelligent transport systems. In: Mikulski, J. (ed.) TST 2012. CCIS, vol. 329,
pp. 380–387. Springer, Heidelberg (2012)
17. Nor Haizan, W., Mohamed,W., Salleh, M.N.M., Omar, A.H.: A Comparative Study of Re-
duced Error Pruning Method in Decision Tree Algorithms. In: International Conference on
Control System, Computing and Engineering (IEEE). Penang, Malaysia, November 25,
2012
18. Nunes, A., Dias, T.G., Cunha, J.F.: Passenger Journey Destination Estimation from Auto-
mated Fare Collection System Data Using Spatial Validation. IEEE Transactions on Intel-
ligent Transportation Systems. Forthcoming
19. Patil, T., Sherekar, S.: Performance Analysis of Naive Bayes and J48 Classification Algo-
rithm for Data Classification. Int. J. of Comp. Science and Applic. 5(2), 256–261 (2013)
20. Patterson, D.J., Liao, L., Gajos, K., Collier, M., Livic, N., Olson, K., Wang, S., Fox, D.,
Kautz, H.: Opportunity knocks: a system to provide cognitive assistance with transportation
services. In: Mynatt, E.D., Siio, I. (eds.) UbiComp 2004. LNCS, vol. 3205, pp. 433–450.
Springer, Heidelberg (2004)
21. Pelletier, M.P., Trépanier, M., Morency, C.: Smart card data use in public transit: A litera-
ture review. Transp Research Part C 19(4), 557–568 (2011)
22. Sarmento, R., Cordeiro, M., Gama, J.: Streaming network sampling using top-k networks.
In: Proceedings of the 17th International Conference on Enterprise Information Systems
(ICEIS 2015). INSTICC (2015, to appear)
23. Seaborn, C., Attanucci, J., Wilson, H.M.: Analyzing multimodal public transport journeys
in London with smart card fare payment data. Transp. Research Record: J. of the Transp.
Research Board 2121(1) (2009)
24. TIP (2015). http://www.linhandante.com/. Transportes Intermodais do Porto
25. Utsunomiya, M., Attanucci, J., Wilson, N.H.: Potential Uses of Transit Smart Card Regis-
tration and Transaction Data to Improve Transit Planning. Transp. Research Record: J. of
the Transp. Research Board 119–126 (2006)
26. Zito, P., Amato, G., Amoroso, S., Berrittella, M.: The effect of Advanced Traveller Infor-
mation Systems on public transport demand and its uncertainty. Transportmetrica 7(1),
31–43 (2011)
Demand Modelling for Responsive Transport
Systems Using Digital Footprints
1 Introduction
Transportation systems are a key factor for economic sustainability and social
welfare, but providing quality public transportation may be extremely expensive
when demand is low, variable and unpredictable, as it is during some periods of the
day in urban areas. Demand Responsive Transportation (DRT) services try to
address this problem with routes and frequencies that may vary according to
the actual observed demand. However, in terms of financial sustainability and
quality level, the design of this type of service may be complicated.
Anticipating demand by studying users' short-term destination choices can
improve the overall efficiency and sustainability of transport services. Tradi-
tionally, demand modelling focused on long-term socio-economic scenarios and
land use to estimate the required level of supply. However, the limited number
of transportation requests in DRT systems does not allow the application of
traditional models. DRT systems also require higher-resolution zoning, without
which unacceptable inaccuracies may arise. Information coming from various
sources should be used effectively in order to model demand for DRT trips.
The approach followed in this work analyses users' short-term destination
choice patterns, with a careful analysis of the available data coming from var-
ious sources, such as GPS traces and social networks. The theory of
3 Methodology
3.1 Data Gathering
We use GPS data traces provided by TU Delft from 80 individuals over the
course of four days, and also data collect from social networks, namely Twitter,
Instagram and Foursquare. The data obtained is cleared of personal values as to
ensure privacy. To get the geo-located points of interest, we use the FourSquare
API, extracting the 50 most popular venues, within a radius of 30 meters for
each given point, resulting in a total of 37506 venues, in 489 categories, with their
identification, geo-location and total number of check-ins made. The subscription
zone for Instagram had a radius of 5 kilometers from the city center. For Twitter,
we covered a bigger area in order to get Delft surroundings.
Friendship. To obtain friendship relations, we use the unique user identifier
from each post and request the users that the user follows and that follow him
back. The only friendships considered significant are those between users who
posted around Delft. The total number of friendships used is 35,457. The discrete
choice model also has to take into account the strength of ties between users. To
measure tie strength, mutuality, propinquity, mutual friends and multiplexity
factors are used.
Detecting Important Locations. To build the MNL we also need to know the
user's home and work locations, since we are only interested in the user's movement
patterns before and after work hours. Home and work are the starting points from
which the distance to the points of interest is measured. To obtain these locations,
we use a clustering algorithm, namely DBSCAN [11].
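A minimal sketch of this clustering step, assuming coordinates in degrees; the 200 m radius and the minimum cluster size are illustrative values, not the parameters used in the study.

```python
# Minimal sketch: cluster a user's geo-tagged posts with DBSCAN and take the
# densest cluster's centroid as an important location (e.g. home or work).
# The eps radius (200 m) and min_samples are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6371000.0
points_deg = np.array([                 # hypothetical (lat, lon) posts of one user
    [52.0116, 4.3571], [52.0118, 4.3569], [52.0117, 4.3573],
    [52.0022, 4.3736], [52.0023, 4.3735],
])

db = DBSCAN(eps=200.0 / EARTH_RADIUS_M, min_samples=2, metric="haversine")
labels = db.fit_predict(np.radians(points_deg))

# Centroid of the largest cluster (label -1 marks noise points).
largest = max(set(labels) - {-1}, key=list(labels).count)
print(points_deg[labels == largest].mean(axis=0))
```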
In the data set with the posts and associated venues, i.e., the choice set (CS),
there is a large amount of data of no use to us, as it does not provide useful
information (for instance, useless categories) or represents work or residential
places, for which the demand patterns are well established and can be met by
traditional transportation services. The data containing those specific categories
were erased from the choice set. Since the number of alternatives is quite large, we
grouped the venues into 6 main categories: Appointment (17%), Food (17%),
Bar (5%), Shop (24%), Entertainment (27%) and Travel (10%).
If we used these categories as the set of alternatives for the MNL model, we would
only obtain results concerning each of those 6 alternatives, which are quite generic.
However, we want to use the model to predict probabilities of destination choices
with a higher resolution, so we generated data for all the venues and then used those
categories only to filter unnecessary data.
Since we cannot directly extract users' personal information (e.g. age, gender),
our data do not contain individual-specific variables, and so the alternative-specific
variables have a generic coefficient, i.e., we consider that the coefficients of the
number of check-ins, distance and friendship are the same for all alternatives.
Choice takes the values yes and no, depending on whether or not the alternative was
chosen by the user. To estimate the MNL we used the R statistics system with the
mlogit package. The model was specified with choice as a function of the
alternative-specific variables, where choice is the variable that indicates the choice
made by each individual among the alternatives, and distance, friendship and
attractiveness are the alternative-specific variables with generic coefficients from
the choice set CS.
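A minimal sketch of such a conditional (multinomial) logit with generic coefficients for the alternative-specific variables, estimated by maximum likelihood; the data shapes and the use of scipy are illustrative, whereas the study used the R mlogit package.

```python
# Minimal sketch of a multinomial/conditional logit with generic coefficients
# for alternative-specific variables (distance, friendship, attractiveness).
# Data values are hypothetical; the authors used R's mlogit package.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_obs, n_alt = 91, 24                      # e.g. the hour-21 choice set in the text
X = rng.normal(size=(n_obs, n_alt, 3))     # [distance, friendship, attractiveness]
chosen = rng.integers(0, n_alt, size=n_obs)

def neg_log_likelihood(beta):
    utility = X @ beta                                   # (n_obs, n_alt)
    utility -= utility.max(axis=1, keepdims=True)        # numerical stability
    log_prob = utility - np.log(np.exp(utility).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n_obs), chosen].sum()

result = minimize(neg_log_likelihood, x0=np.zeros(3), method="BFGS")
print("estimated coefficients:", result.x)
```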
4 Results
We present the results and estimation parameter for one choice set, namely the
one representing the choices made at hour 21, which has 24 alternatives and 91
observations.
The model predictions are reasonably good when tested against the users'
observed choices. Table 1 presents the average probabilities returned by the
model against the observed frequencies. The results from the MNL model show
meaningful relationships between distance and attractiveness for all the different
alternatives, with distance being the most significant variable, i.e., longer distances
almost always reduce the attractiveness of a destination, all else being equal.
The same can be said for the attractiveness variable, but the friendship variable
does not have the same impact on the individual when choosing an alternative.
Table 2 illustrates these findings. To show the usefulness of the analyses
made, we feed the probabilities predicted by our model to a DRT simulator
developed in [12]. Figure 1 shows that most origins and destinations found for
the time period and travel objective considered lie outside the service areas of
the different public transport modes (dotted lines) and that DRT could satisfy this
demand (solid lines).
5 Conclusions
Network Analyses. The low frequency of posts with identified locations made it
difficult to derive a clear pattern for each user. Nevertheless, the results from the
model show meaningful relationships between distance and attractiveness for all
the different alternatives, with the variable distance being the most significant.
Since the analysis of the social network carried out in this work does not produce
individual characteristics, such as age, gender and socio-economic status, it would
be interesting for future work to include data mining algorithms to extract some of
those values from tweets, and to add features specific to each venue, in order to
better understand the motivation behind the choices made.
References
1. Cuff, D., Hansen, M., Kang, J.: Urban sensing: Out of the woods. Communications
of the ACM 51(3), 24–33 (2008)
2. Carrasco, J., Hogan, B., Wellman, B., Miller, E.: Collecting social network data to
study social activity-travel behaviour: an egocentric approach. Environment and
Planning B: Planning and Design 35, 961–980 (2008)
3. Chen, L., Mingqi, L., Chen, G.: A system for destination and future route predic-
tion based on trajectory mining. Pervasive and Mobile Computing 6(6), 657–676
(2010)
4. Axhausen, K.: Social networks, mobility biographies, and travel: survey challenges.
Environment and Planning B: Planning and Design 35, 981–996 (2008)
5. Páez, A., Scott, D.: Social influence on travel behavior: a simulation example of
the decision to telecommute. Environment and Planning A 39(3), 647–665 (2007)
6. Hasan, S., Ukkusuri, S.: Urban activity pattern classification using topic mod-
els from online geo-location data. Transportation Research Part C: Emerging
Technologies 44, 363–381 (2014)
7. Ben-Akiva, M., Lerman, S.: Discrete Choice Analysis: Theory and Application to
Travel Demand (1985)
8. Brock, W., Durlauf, S.: Discrete Choice with Social Interactions. Review of
Economic Studies 68(2), 235–260 (2001)
9. Brock, W., Durlauf, S.: A multinomial choice model with neighborhood effects.
American Economic Review 92, 298–303 (2002)
10. Zanni, A.M., Ryley, T.J.: Exploring the possibility of combining discrete choice
modelling and social networks analysis: an application to the analysis of weather-
related uncertainty in long-distance travel behaviour. In: International Choice
Modelling Conference, Leeds, pp. 1–22 (2011)
11. Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering
clusters in large spatial databases with noise. In: Proceedings of 2nd International
Conference on Knowledge Discovery and Data Mining, pp. 226–231. AAAI Press
(1996)
12. Gomes, R., Sousa, J.P., Galvao, T.: An integrated approach for the design of
Demand Responsive Transportation services. In: de Sousa, J.F., Rossi, R. (eds.)
Computer-based Modelling and Optimization in Transportation. AISC, vol. 232,
pp. 223–235. Springer, Heidelberg (2014)
Artificial Life and Evolutionary
Algorithms
A Case Study on the Scalability of Online
Evolution of Robotic Controllers
1 Introduction
Evolutionary computation has been widely studied and applied to synthesise
controllers for autonomous robots in the field of evolutionary robotics (ER).
In online ER approaches, an evolutionary algorithm (EA) is executed onboard
robots during task execution to continuously optimise behavioural control. The
main components of the EA (evaluation, selection, and reproduction) are per-
formed by the robots without any external supervision. Online evolution thus
enables addressing tasks that require online learning or online adaptation. For
instance, robots can evolve new controllers and modify their behaviour to
respond to unforeseen circumstances, such as changes in the task or in the envi-
ronment.
Research in online evolution started out with a study by Floreano and Mon-
dada [1], who conducted experiments on a real mobile robot. The authors success-
fully evolved navigation and obstacle avoidance behaviours for a Khepera robot.
The study was a significant breakthrough as it demonstrated the potential of
online evolution of controllers. Researchers then focused on how to mitigate the
issues posed by evolving controllers directly on real robots, especially the pro-
hibitively long time required [2]. Watson et al. [3] introduced an approach called
embodied evolution in which an online EA is distributed across a group of robots.
The main motivation behind the use of multirobot systems was to leverage the
potential speed-up of evolution due to robots that evolve controllers in parallel
and that exchange candidate solutions to the task.
Over the past decade, numerous approaches to online evolution in multirobot
systems have been developed. Examples include Bianco and Nolfi’s open-ended
approach for self-assembling robots [4], mEDEA by Bredeche et al. [5], and
odNEAT by Silva et al. [6]. When the online EA is decentralised and distributed
across a group of robots, one common assumption is that online evolution inher-
ently scales with the number of robots [3]. Generally, the idea is that the more
robots are available, the more evaluations can be performed in parallel, and the
the faster the evolutionary process [3]. The dynamics of the online EA itself, and
common issues that arise in EAs from population sizing such as convergence rates
and diversity [7] have, however, not been considered. Furthermore, besides ad-
hoc experiments with large groups of robots, see [5] for examples, there has been
no systematic study on the scalability properties of online EAs across different
tasks. Given the strikingly long time that online evolution requires to synthesise
solutions to any but the simplest of tasks, the approach remains infeasible on
real robots [8].
In this paper, we study the scalability properties of online evolution of robotic
controllers. The online EA used in this case study is odNEAT [9], which opti-
mises artificial neural network (ANN) controllers. One of the main advantages of
odNEAT is that it evolves both the weights and the topology of ANNs, thereby
bypassing the inherent limitations of fixed-topology algorithms [9]. odNEAT is
used here as a representative efficient algorithm that has been successfully used
in a number of simulation-based studies related to adaptation and learning in
robot systems, see [6,8–11] for examples. We assess the scalability properties and
performance of odNEAT in four tasks involving groups of up to 25 simulated e-
puck-like robots [12]: (i) an aggregation task, (ii) a dynamic phototaxis task, and
(iii, iv) two foraging tasks with differing complexity. Overall, our study shows
how online EAs can enable groups of different size to leverage their multiplicity
for higher performance, and for faster evolution in terms of evolution time and
number of evaluations required to evolve effective controllers.
for multirobot systems. The algorithm starts with minimal networks with no
hidden neurons, and with each input neuron connected to every output neuron.
Throughout evolution, topologies are gradually complexified by adding new neu-
rons and new connections through mutation. In this way, odNEAT is able to find
an appropriate degree of complexity for the current task, and a suitable ANN
topology is the result of a continuous evolutionary process [9].
odNEAT is distributed across multiple robots that exchange candidate solu-
tions to the task. The online evolutionary process is implemented according to a
physically distributed island model. Each robot optimises an internal population
of genomes (directly encoded ANNs) through intra-island variation, and genetic
information between two or more robots is exchanged through inter-island migra-
tion. In this way, each robot is potentially self-sufficient and the evolutionary pro-
cess opportunistically capitalises on the exchange of genetic information between
multiple robots for collective problem solving [9].
During task execution, each robot is controlled by an ANN that represents
a candidate solution to a given task. Controllers maintain a virtual energy level
reflecting their individual performance. The fitness value is defined as the mean
energy level. When the virtual energy level of a robot reaches a minimum thresh-
old, the current controller is considered unfit for the task. A new controller is
then created via selection of two parents from the internal population, crossover
of the parents’ genomes, and mutation of the offspring. Mutation is both struc-
tural and parametric, as it adds new neurons and new connections, and optimises
parameters such as connection weights and neuron bias values.
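A highly simplified sketch of this replacement loop is shown below; it uses fixed-length weight vectors and therefore omits odNEAT's structural mutation and speciation, and all operators, names and constants are illustrative placeholders rather than the algorithm's actual components.

```python
# Highly simplified sketch of the replacement loop described above: each robot
# keeps an internal population, and when the executing controller's virtual
# energy drops below a threshold it is replaced by the mutated offspring of two
# selected parents. Genomes, operators and constants are illustrative only.
import random

ENERGY_THRESHOLD = 10.0

def select(population):                       # population: list of (genome, fitness)
    a, b = random.sample(population, 2)       # binary tournament (illustrative)
    return a if a[1] >= b[1] else b

def crossover(g1, g2):
    return [random.choice(pair) for pair in zip(g1, g2)]

def mutate(genome, sigma=0.1):
    return [w + random.gauss(0.0, sigma) for w in genome]

def maybe_replace(controller, energy, population):
    """Return the controller to execute on the next control cycle."""
    if energy >= ENERGY_THRESHOLD:
        return controller                     # current controller still fit
    (p1, _), (p2, _) = select(population), select(population)
    return mutate(crossover(p1, p2))          # offspring becomes the new controller

population = [([0.1, -0.2, 0.3], 5.0), ([0.0, 0.4, -0.1], 7.5), ([0.2, 0.2, 0.2], 6.1)]
print(maybe_replace([0.5, 0.5, 0.5], energy=3.0, population=population))
```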
odNEAT has been successfully used in a number of simulation-based stud-
ies related to long-term self-adaptation in robot systems. Previous studies have
shown: (i) that odNEAT effectively evolves controllers for robots that oper-
ate in dynamic environments with changing task parameters [11], (ii) that the
controllers evolved are robust and can often adapt to changes in environmental
conditions without further evolution [9], (iii) that robots executing odNEAT can
display a high degree of fault tolerance as they are able to adapt and learn new
behaviours in the presence of faults in the sensors [9], (iv) how to extend the algo-
rithm to incorporate learning processes [11], and (v) how to evolve behavioural
building blocks prespecified by the human experimenter [8,10]. Given previous
results, odNEAT is therefore used in our study as a representative online EA.
The key research question of our study is if and how online EAs can enable
robots to leverage their multiplicity. That is, besides performance and robust-
ness criteria, we are interested in studying scalability with respect to the group
size, an important aspect when large groups of robots are considered.
3 Methods
    ΔE/Δt = α(t) + γ(t)                                                      (1)
where t is the current control cycle, α(t) is a reward proportional to the num-
ber n of different genomes received in the last P = 10 control cycles. Because
robots executing odNEAT exchange candidate solutions, the number of different
genomes received is used to estimate the number of robots nearby.¹ γ(t) is a factor
related to the quality of movement, computed as:

    γ(t) = { −1              if vl(t) · vr(t) < 0
           { Ωs(t) · ωs(t)   otherwise                                       (2)

where vl(t) and vr(t) are the left and right wheel speeds, Ωs(t) is the ratio between
the average and maximum speed, and ωs(t) = vl(t) · vr(t) rewards controllers that
move fast and straight at each control cycle.

¹ The original e-puck infrared range is 2-3 cm [12]. In real e-pucks, the liblrcom library
(see http://www.e-puck.org) extends the range up to 25 cm and multiplexes infrared
communication with proximity sensing.
Phototaxis Task. In a phototaxis task, robots have to search and move towards
a light source. Following [9], we use a dynamic version of the phototaxis task
in which the light source is periodically moved to a new random location. As a
result, robots have to continuously search for and reach the light source, which
eliminates controllers that find the light source by chance. The virtual energy
level E ∈ [0, 100] units, and controllers are assigned an initial value of 50 units.
At each control cycle, E is updated as follows:
            { Sr        if Sr > 0.5
    ΔE/Δt = { 0         if 0 < Sr ≤ 0.5                                      (3)
            { −0.01     if Sr = 0
where Sr is the maximum value of the readings from light sensors, between 0 (no
light) and 1 (brightest light). Light sensors have a range of 50 cm and robots are
therefore only rewarded if they are close to the light source. Remaining sensors
have a range of 25 cm.
Foraging Tasks. In a foraging task, robots have to search for and pick up
objects scattered in the environment. Foraging is a canonical testbed in cooper-
ative robotics domains, and is evocative of tasks such as toxic waste clean-up,
harvesting, and search and rescue [15].
We setup a foraging task with different types of resources that have to be
collected. Robots spend virtual energy at a constant rate and must learn to find
and collect resources. When a resource is collected by a robot, a new resource
of the same type is placed randomly in the environment so as to keep the number
of resources constant throughout the experiments. We experiment with two
variants of a foraging task: (i) one in which there are only type A resources,
henceforth called standard foraging task, and (ii) one in which there are both
type A and type B resources, henceforth called concurrent foraging task. In the
concurrent foraging task, resources A and B have to be consumed sequentially.
That is, besides learning the foraging aspects of the task, robots also have to
learn to collect resources in the correct order. The energy level of each controller
is initially set to 100 units, and limited to the range [0,1000]. At each control
cycle, E is updated as follows:
            { reward    if the right type of resource is collected
    ΔE/Δt = { penalty   if the wrong type of resource is collected           (4)
            { −0.02     if no resource is consumed
where reward = 10 and penalty = -10. The constant decrement of 0.02 means
that each controller will execute for a period of 500 seconds if no resource is
collected since it started operating. Note that the penalty component applies
only to the concurrent foraging task. To enable a meaningful comparison of
performance when groups of different size are considered, the number of resources
of each type is set to the number of robots multiplied by 10.
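A minimal sketch of these virtual-energy updates, transcribing Eqs. (3) and (4); the function names and the clamping example are illustrative.

```python
# Minimal sketch of the virtual-energy updates in Eqs. (3) and (4).
def phototaxis_delta(sr):
    """Eq. (3): sr is the maximum light-sensor reading in [0, 1]."""
    if sr > 0.5:
        return sr
    if sr > 0.0:
        return 0.0
    return -0.01

def foraging_delta(collected, right_type, reward=10.0, penalty=-10.0):
    """Eq. (4): reward/penalty on collection, constant decay otherwise."""
    if not collected:
        return -0.02
    return reward if right_type else penalty

# Example: one control cycle of a phototaxis controller starting at E = 50,
# with E clamped to the [0, 100] range stated in the text.
energy = 50.0
energy = min(100.0, max(0.0, energy + phototaxis_delta(sr=0.8)))
print(energy)
```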
Fig. 1. Distribution of the fitness score of the final controllers in: (a) aggregation task,
and (b) phototaxis task.
Table 2. Summary of the individual fitness score of final solutions in the two foraging
tasks.
outperformed by larger groups (ρ < 0.001, see Fig. 1). In the phototaxis task,
groups of 25 robots also perform significantly better than groups with 20 robots
(ρ < 0.01). Specifically, results suggest that a minimum of 10 robots are necessary
for high-performing controllers to be evolved in a consistent manner.
A summary of the results obtained in the two foraging tasks is shown in
Table 2. Given the dynamic nature of the task, especially as the number of robots
increases, the fitness score of the final controllers displays a high variance. The
results, however, further show that larger groups typically yield better performance
both in terms of the mean and of the maximum fitness scores, which is an indication
that decentralised online approaches such as odNEAT can indeed capitalise on
larger groups to evolve more effective solutions to the current task.
To quantify to what extent is a robot dependent on the candidate solutions
it receives from other robots, we analyse the origin of the information stored
in the population of each robot. In the phototaxis task, when capable solutions
have been evolved approximately 86.85% (5 robots) to 93.95% (25 robots) of
genomes maintained in each internal population originated from other robots,
whereas the remaining genomes stored were produced by the robots themselves
(analysis of the results obtained in the other tasks revealed a similar trend).
The final solutions executed by each robot to solve the task have on average
from 87.26% to 89.10% matching genes. Moreover, 39.73% (5 robots) to 47.70%
(25 robots) of these solutions have more than 90% of their genes in common. The
average weight difference between matching connection genes varies from 2.48 to
4.37, with each weight in [-10, 10], which indicates that solutions were refined by
the EA on the receiving robot. Local exchange of candidate controllers therefore
appears to be a crucial part in the evolutionary dynamics of decentralised online
EAs because it serves as a substrate for collective problem solving. In the fol-
lowing section, we analyse how the exchange of such information enables online
EAs to capitalise on increasingly larger groups of robots for faster evolution of
solutions to the task.
Fig. 2. Distribution of evaluations in: (a) aggregation task, (b) phototaxis task,
(c) standard foraging task, and (d) concurrent foraging task.
Fig. 3. Operation time of intermediate controllers in the concurrent foraging task. 67%
to 96% of intermediate controllers operate for few minutes before they fail (not shown
for better plot readability).
The speed up of evolution with the increase of group size also occurs in the
phototaxis task. The number of evaluations is significantly reduced (ρ < 0.001)
with the increase of the group size from 5 to 10 robots (mean number of evalua-
tions of 39 and 14, respectively). The mean evolution time is of 39.16 hours for
groups of 5 robots, 9.51 hours for 10 robots, 7.20 hours for 15 robots, 6.30 hours
for 20 robots, and 5.27 hours for 25 robots. Similarly to the number of evalua-
tions, the evolution time yields on average a 4-fold-decrease when the group is
enlarged from 5 to 10 robots (ρ < 0.001). Larger groups enable further improve-
ments (ρ < 0.001 for increases up to 20 robots, ρ < 0.01 when group size is
changed from 20 to 25 robots), but at comparatively smaller rates. Chiefly, the
results of the aggregation task and of the phototaxis task show quantitatively
distinct speed-ups of evolution when groups are enlarged.
With respect to the two foraging tasks, the distribution of the number of
evaluations shown in Fig. 2 is inversely proportional, with a gentle slope, to
the number of robots in the group. For both tasks, differences in the number
of evaluations are significant across all comparisons (ρ < 0.001). In effect, the
number of evaluations is reduced on average: (i) from 115 evaluations (5 robots)
to 15 evaluations (25 robots) in the standard foraging task, which corresponds
to a 7.67-fold decrease in terms of evaluations, and (ii) from 82 evaluations (5
robots) to 8 evaluations in the concurrent foraging task, which amounts to a
10.25-fold decrease. These results show that decentralised online evolution can
scale well in terms of evaluations, even when task complexity is increased.
Regarding the evolution time, results show a similar trend for both foraging
tasks. On average, the evolution time varies from approximately 35 and 36 hours
for groups of 5 robots to 21 and 23 hours for groups of 25 robots. That is, despite
significant improvements in terms of the number of evaluations, the evolution
time required to evolve the final controllers to the task is still prohibitively long.
This result is due to the controller evaluation policy. Online evolution approaches
References
1. Floreano, D., Mondada, F.: Automatic creation of an autonomous agent: Genetic
evolution of a neural-network driven robot. In: 3rd International Conference on
Simulation of Adaptive Behavior, pp. 421–430. MIT Press, Cambridge (1994)
2. Matarić, M., Cliff, D.: Challenges in evolving controllers for physical robots.
Robotics and Autonomous Systems 19(1), 67–83 (1996)
3. Watson, R.A., Ficici, S.G., Pollack, J.B.: Embodied evolution: Distributing an evo-
lutionary algorithm in a population of robots. Robotics and Autonomous Systems
39(1), 1–18 (2002)
4. Bianco, R., Nolfi, S.: Toward open-ended evolutionary robotics: evolving elemen-
tary robotic units able to self-assemble and self-reproduce. Connection Science
16(4), 227–248 (2004)
5. Bredeche, N., Montanier, J., Liu, W., Winfield, A.: Environment-driven distributed
evolutionary adaptation in a population of autonomous robotic agents. Mathemat-
ical and Computer Modelling of Dynamical Systems 18(1), 101–129 (2012)
6. Silva, F., Urbano, P., Oliveira, S., Christensen, A.L.: odNEAT: An algorithm for
distributed online, onboard evolution of robot behaviours. In: 13th International
Conference on the Simulation and Synthesis of Living Systems, pp. 251–258. MIT
Press, Cambridge (2012)
7. De Jong, K.A.: Evolutionary computation: a unified approach. MIT Press,
Cambridge (2006)
8. Silva, F., Duarte, M., Oliveira, S.M., Correia, L., Christensen, A.L.: The case for
engineering the evolution of robot controllers. In: 14th International Conference
on the Synthesis and Simulation of Living Systems, pp. 703–710. MIT Press,
Cambridge (2014)
9. Silva, F., Urbano, P., Correia, L., Christensen, A.L.: odNEAT: An algorithm
for decentralised online evolution of robotic controllers. Evolutionary Computa-
tion (2015) (in press). http://www.mitpressjournals.org/doi/pdf/10.1162/EVCO_a_00141
10. Silva, F., Correia, L., Christensen, A.L.: Speeding Up online evolution of robotic-
controllers with macro-neurons. In: Esparcia-Alcázar, A.I., Mora, A.M. (eds.)
EvoApplications 2014. LNCS, vol. 8602, pp. 765–776. Springer, Heidelberg (2014)
11. Silva, F., Urbano, P., Christensen, A.L.: Online evolution of adaptive robot
behaviour. International Journal of Natural Computing Research 4(2), 59–77
(2014)
12. Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat,
S., Zufferey, J., Floreano, D., Martinoli, A.: The e-puck, a robot designed for
education in engineering. In: 9th Conference on Autonomous Robot Systems and
Competitions, pp. 59–65, IPCB, Castelo Branco (2009)
13. Duarte, M., Silva, F., Rodrigues, T., Oliveira, S.M., Christensen, A.L.: JBotE-
volver: A versatile simulation platform for evolutionary robotics. In: 14th Interna-
tional Conference on the Synthesis and Simulation of Living Systems, pp. 210–211.
MIT Press, Cambridge (2014)
14. Groß, R., Dorigo, M.: Towards group transport by swarms of robots. International
Journal of Bio-Inspired Computation 1(1–2), 1–13 (2009)
15. Cao, Y., Fukunaga, A., Kahng, A.: Cooperative mobile robotics: Antecedents and
directions. Autonomous Robots 4(1), 1–23 (1997)
Spatial Complexity Measure for Characterising
Cellular Automata Generated 2D Patterns
Abstract. Cellular automata (CA) are known for their capacity to gen-
erate complex patterns through the local interaction of rules. Often the
generated patterns, especially with multi-state two-dimensional CA, can
exhibit interesting emergent behaviour. This paper addresses quanti-
tative evaluation of spatial characteristics of CA generated patterns.
It is suggested that the structural characteristics of two-dimensional
(2D) CA patterns can be measured using mean information gain. This
information-theoretic quantity, also known as conditional entropy, takes
into account conditional and joint probabilities of cell states in a 2D
plane. The effectiveness of the measure is shown in a series of experiments
for multi-state 2D patterns generated by CA. The results of the experi-
ments show that the measure is capable of distinguishing the structural
characteristics including symmetry and randomness of 2D CA patterns.
1 Introduction
Cellular automata (CA) are one of the early bio-inspired systems invented by von
Neumann and Ulam in the late 1940s to study the logic of self-reproduction in
a material-independent framework. CA are known to exhibit complex behaviour
from the iterative application of simple rules. The popularity of the Game of
Life drew the attention of a wider community of researchers to the unexplored
potential of CA applications and especially in their capacity to generate complex
behaviour. The formation of complex patterns from simple rules, sometimes with
high aesthetic qualities, has contributed to the creation of many digital
art works since the 1960s.
early computer generated animations [11], the digital art works of Struycken [10],
Brown [3] and evolutionary architecture of Frazer [5]. Furthermore, CA have been
used for music composition, for example, Xenakis [17] and Miranda [9].
Although classical one-dimensional CA with binary states can exhibit com-
plex behaviours, experiments with multi-state two-dimensional (2D) CA reveal
a very rich spectrum of symmetric and asymmetric patterns [6,7].
2 Cellular Automata
This section serves to specify the cellular automata considered in this paper, and
to define notation.
A cellular automaton A is specified by a quadruple ⟨L, S, N, f⟩, where:
with various structural characteristics is evaluated. Fig. 2a-b are patterns with
ordered structures and Fig. 2c is a pattern with a repeating three-element structure
over the plane. Fig. 2d is a fairly structureless pattern.
where sr, ss are the states at r and s. Since |S| = N, Gr,s is a value in [0, log2 N].
In particular, horizontal and vertical near neighbour pairs provide four MIGs,
G(i,j),(i+1,j) , G(i,j),(i−1,j) , G(i,j),(i,j+1) and G(i,j),(i,j−1) . In the interests of nota-
tional economy, we write Gs in place of Gr,s , and omit parentheses, so that,
for example, Gi+1,j ≡ G(i,j),(i+1,j) . The relative positions for non-edge cells are
given by matrix M:

\[
M = \begin{bmatrix} & (i,j+1) & \\ (i-1,j) & (i,j) & (i+1,j) \\ & (i,j-1) & \end{bmatrix}. \tag{7}
\]
Correlations between cells on opposing lattice edges are not considered. Fig. 4
provides an example. The depicted pattern is composed of four different symbols
S = {light-grey,grey,white,black }. The light-grey cell correlates with two neigh-
bouring white cells (i + 1, j) and (i, j − 1). On the other hand, the grey cell
has four neighbouring cells of which three are white and one is black. The result
of this edge condition is that Gi+1,j is not necessarily equal to Gi−1,j . Differ-
ences between the horizontal (vertical) mean information rates reveal left/right
(up/down) orientation.
The mean information gains of the sample patterns in Fig. 2 are presented
in Fig. 5. The merits of G in discriminating structurally different patterns rang-
ing from the structured and symmetrical (Fig. 5a-b), to the partially structured
(Fig. 5c) and the structureless and random (Fig. 5d), are clearly evident. The
cells in the columns of pattern (a) are completely correlated. However, knowledge of the cell state does not provide complete predictability in the horizontal direction and, as a consequence, the horizontal G is finite. Pattern (b) has non-zero and identical Gs, indicating a symmetry between horizontal and vertical directions,
and a lack of complete predictability. Analysis of pattern (c) is similar to (a)
except the roles of horizontal and vertical directions are interchanged. The four
Gs in the final pattern are all different, indicating a lack of vertical and hori-
zontal symmetry; the higher values show the increased randomness. Details of
calculations for a sample pattern are provided in the appendix.
Fig. 5. The comparison of H with measures of Gi,j for structurally different patterns.
The experiments are conducted with two different ICs: (1) all white cells
except for a single red cell and (2) a random configuration with 50% white
quiescent states (8320 cells), 25% red and 25% green. The experimental rule
has been iterated synchronously for 150 successive time steps. Fig. 6 and Fig. 7
illustrate the space-time diagrams for a sample of time steps starting from single
and random ICs.
Fig. 6. Space-time diagram of the experimental cellular automaton for sample time
steps starting from the single cell IC.
Fig. 7. Space-time diagram of the experimental cellular automaton for sample time
steps starting from the random IC.
ΔGi,j±1 and ΔGi±1,j for both ICs are plotted in Fig. 10. The structured but
asymmetrical patterns emerging from the random start are clearly distinguished
from the symmetrical patterns of the single cell IC.
Fig. 8. Measurements of H, Gi,j+1 , Gi,j−1 , Gi+1,j ,Gi−1,j for 150 time steps starting
from the single cell IC.
Fig. 9. Measurements of H, Gi,j+1 , Gi,j−1 , Gi+1,j ,Gi−1,j for 150 time steps starting
from the random IC.
Fig. 10. Plots of ΔGi,j±1 and ΔGi±1,j for two different ICs
Fig. 11. Comparison of the cellular automaton’s H with the four directional measures of Gi,j, ΔGi,j±1 and ΔGi±1,j, starting from single (a, b) and random (c, d) ICs.
5 Conclusion
Cellular automata (CA) are one of the early bio-inspired models of self-
replicating systems and, in 2D, are powerful tools for pattern generation.
Indeed, multi-state 2D CA can generate many interesting and complex patterns
with various structural characteristics. This paper considers an information-
theoretic classification of these patterns.
Entropy, which is a statistical measure of the distribution of cell states, is not
in general able to distinguish these patterns. However, mean information gain,
as proposed in [1,2,13], takes into account conditional and joint probabilities
between pairs of cells and, since it is based on correlations between cells, holds
promise for pattern classification.
This paper reports on a pair of experiments for two different initial con-
ditions of an outer-totalistic CA. The potential of mean information gain for
distinguishing multi-state 2D CA patterns is demonstrated. Indeed, the mea-
sure appears to be particularly good at distinguishing symmetric patterns from non-random, asymmetric ones.
Since CA are one of the generative tools in computer art, means of evaluating
the aesthetic qualities of CA generated patterns could contribute substantially to the further automation of CA art. This is the subject of on-going
research.
Appendix
In this example the pattern is composed of two different cells S = {white, black}
where the set of permutations with repetition is {ww, wb, bb, bw}. Considering
the mean information gain (Eq. 4) and given the positional matrix M (Eq. 7),
the calculations amount to estimating, for each direction in M, the joint and conditional probabilities of the neighbouring cell pairs and evaluating the corresponding conditional entropy.
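As an illustration of such a calculation, the following minimal Python sketch (our own, not the authors' code) estimates the four directional values. It assumes that the mean information gain of Eq. 4 is the conditional entropy −Σ P(a, b) log2 P(b | a) estimated from the relative frequencies of neighbouring cell pairs, and the two small two-state patterns are illustrative rather than the pattern shown in Fig. 4.

```python
import numpy as np
from collections import Counter

def mean_information_gain(grid, dr, dc):
    """Estimate G for one neighbour offset, e.g. (0, 1) ~ G_{i+1,j}.

    Pairs are formed only between cells that both lie inside the lattice,
    so opposing edges are not wrapped around, as stated in the text."""
    rows, cols = grid.shape
    pairs = Counter()
    for i in range(rows):
        for j in range(cols):
            r2, c2 = i + dr, j + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(int(grid[i, j]), int(grid[r2, c2]))] += 1
    total = sum(pairs.values())
    p_joint = {k: v / total for k, v in pairs.items()}
    p_first = Counter()
    for (a, _), p in p_joint.items():
        p_first[a] += p
    # conditional entropy H(second | first) = -sum_{a,b} P(a,b) log2 P(b|a)
    return -sum(p * np.log2(p / p_first[a]) for (a, b), p in p_joint.items())

# two illustrative two-state patterns (0 = white, 1 = black)
stripes = np.tile([0, 1], (8, 4))                         # ordered: alternating columns
noise = np.random.default_rng(0).integers(0, 2, (8, 8))   # structureless
offsets = {"G_i+1,j": (0, 1), "G_i-1,j": (0, -1),
           "G_i,j+1": (-1, 0), "G_i,j-1": (1, 0)}         # axis convention is arbitrary here
for name, grid in (("stripes", stripes), ("random", noise)):
    print(name, {k: round(mean_information_gain(grid, *off), 2)
                 for k, off in offsets.items()})
```

For the striped pattern all four values are zero (every neighbour is fully predictable), while for the random pattern they approach 1 bit, mirroring the ordered-versus-random distinction discussed above.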
References
1. Andrienko, Y.A., Brilliantov, N.V., Kurths, J.: Complexity of two-dimensional pat-
terns. Eur. Phys. J. B 15(3), 539–546 (2000)
2. Bates, J.E., Shepard, H.K.: Measuring complexity using information fluctuation.
Physics Letters A 172(6), 416–425 (1993)
3. Brown, P.: Stepping stones in the mist. In: Creative Evolutionary Systems,
pp. 387–407. Morgan Kaufmann Publishers Inc. (2001)
4. Cover, T.M., Thomas, J.A.: Elements of Information Theory (Wiley Series in
Telecommunications and Signal Processing). Wiley-Interscience (2006)
5. Frazer, J.: An evolutionary architecture. Architectural Association Publications,
Themes VII (1995)
6. Javaheri Javid, M.A., Al-Rifaie, M.M., Zimmer, R.: Detecting symmetry in cel-
lular automata generated patterns using swarm intelligence. In: Dediu, A.-H.,
Lozano, M., Martı́n-Vide, C. (eds.) TPNC 2014. LNCS, vol. 8890, pp. 83–94.
Springer, Heidelberg (2014)
7. Javaheri Javid, M.A., te Boekhorst, R.: Cell dormancy in cellular automata. In:
Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006.
LNCS, vol. 3993, pp. 367–374. Springer, Heidelberg (2006)
8. Langton, C.G.: Studying artificial life with cellular automata. Physica D: Nonlinear
Phenomena 22(1), 120–149 (1986)
9. Miranda, E.: Composing Music with Computers. No. 1 in Composing Music with
Computers. Focal Press (2001)
10. Scha, I.R.: Kunstmatige Kunst. De Connectie 2(1), 4–7 (2006)
11. Schwartz, L., Schwartz, L.: The Computer Artist’s Handbook: Concepts, Tech-
niques, and Applications. W W Norton & Company Incorporated (1992)
12. Shannon, C.: A mathematical theory of communication. The Bell System Technical
Journal 27, 379–423, 623–656 (1948)
13. Wackerbauer, R., Witt, A., Atmanspacher, H., Kurths, J., Scheingraber, H.:
A comparative classification of complexity measures. Chaos, Solitons & Fractals
4(1), 133–173 (1994)
14. Wolfram, S.: Statistical mechanics of cellular automata. Reviews of Modern Physics
55(3), 601–644 (1983)
15. Wolfram, S.: Universality and complexity in cellular automata. Physica D:
Nonlinear Phenomena 10(1), 1–35 (1984)
16. Wolfram, S.: A New Kind of Science. Wolfram Media Inc. (2002)
17. Xenakis, I.: Formalized music: thought and mathematics in composition.
Pendragon Press (1992)
Electricity Demand Modelling
with Genetic Programming
1 Introduction
Load forecasting is the task of predicting the electricity demand on different time
scales, such as minutes (very short-term), hours/days (short-term), and months
and years (long-term). This information is used to plan and schedule operations on power systems (dispatch, unit commitment, network analysis) so as to control the flow of electricity optimally with respect to various aspects (quality of service, reliability, costs, etc.). Accurate load forecasting has great benefits for electric utilities, and both negative and positive errors lead to increased operating costs [10]. Overestimating the load leads to unnecessary energy production or purchase while, on the contrary, underestimation causes unmet demand with a higher probability of failures and costly operations. Several
factors influence electricity demand: day of the week and holidays (the so-called
analog electrical circuits [14], antennas [18], mechanical systems [17], photonic
systems [22], optical lens systems [13] and sorting networks [15].
This section briefly introduces the techniques used in this paper: neural networks,
model trees, and genetic programming.
1 M5PrimeLab is an open source toolbox for MATLAB/Octave available at http://www.cs.rtu.lv/jekabsons/regression.html
4 Experiments
The aim of this forecasting task is to predict the load at time t (yt) using information available until day t − 1 (one-day-ahead forecasting), namely the past samples of the load
and the information provided by temperature. We built a data set with 9 input
variables: x0 , x1 , . . . , x6 representing the daily load for each past day. The vari-
able x0 refers to the load at time t − 7 while x6 to the load at time t − 1. The
value x7 is the daily average temperature (in degrees Celsius) at day t − 1, while x8 is the daily average temperature on the day of the forecast.
Temperature data have been obtained with an average of all the data avail-
able in Italy provided by the ECMWF (European Centre for Medium-Range
Weather Forecasts) ERA-Interim reanalysis [6]. For the variable x8 , we assume
to have a perfect forecast for the day t and hence we use the observed data.
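As an illustration, a minimal sketch of how such a design matrix could be assembled from a daily load series and a daily mean temperature series is given below; the function and array names are our own and the synthetic series are placeholders, not the Italian data used in this work.

```python
import numpy as np

def build_dataset(load, temp):
    """Assemble the 9-input design matrix described above.

    For each target day t:
      x0..x6 : daily load of the previous seven days (x0 = load at t-7, ..., x6 = load at t-1)
      x7     : daily average temperature at day t-1
      x8     : daily average temperature at day t (treated as a perfect forecast)
    The target y is the load at day t (one-day-ahead forecasting)."""
    X, y = [], []
    for t in range(7, len(load)):
        lags = [load[t - 7 + k] for k in range(7)]   # x0 .. x6
        X.append(lags + [temp[t - 1], temp[t]])      # x7, x8
        y.append(load[t])
    return np.array(X), np.array(y)

# placeholder series, only to show the shapes involved
rng = np.random.default_rng(1)
load = 30000 + 2000 * rng.standard_normal(365)
temp = 15 + 10 * np.sin(2 * np.pi * np.arange(365) / 365)
X, y = build_dataset(load, temp)
print(X.shape, y.shape)   # (358, 9) (358,)
```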
M5 Model Trees. In this case we performed a single execution for each dataset; in fact, the M5 algorithm is not stochastic, so there is no need to perform multiple runs. We set the minimum number of training data cases represented by each leaf to 10; this value was selected after a set of exploratory tests.
the four binary operators +, −, ∗, and / protected as in [21]. The terminal set
contained 9 variables and 100 random constants randomly generated in the range
[0, 40000]. This range has been chosen considering the magnitude of the values
at stake in the considered application. Because the cardinalities of the function
and terminal sets were so different, we have explicitly imposed functions and
terminals to have the same probability of being chosen when a random node is
needed. The reproduction (replication) rate was 0.1, meaning that each selected
parent has a 10% chance of being copied to the next generation instead of being
engaged in breeding. Standard tree mutation and standard crossover (with uni-
form selection of crossover and mutation points) were used with probabilities of
0.1 and 0.9, respectively. Recall that crossover and mutation are applied only to individuals not selected for replication. The new random branch
created for mutation has maximum depth 6. Selection for survival was elitist,
with the best individual preserved in the next generation. The selection method
used was tournament selection with size 6. The maximum tree depth is 17. This
depth value is considered the standard value for this parameter [11]. Despite
the higher number of parameters, parameter tuning in GP was not particularly
problematic with respect to the other methods since there are some general rules
(e.g., low mutation and high crossover rate) that provide a good starting point
for setting the parameters.
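A small sketch of these ingredients is shown below. The protected division guards against a zero denominator (returning 1 in that case, one common convention consistent with [21]); the parameter dictionary merely restates the settings listed above. None of this code is taken from the authors' implementation, and all names are ours.

```python
import random

def protected_div(a, b):
    """Protected division: return 1 when the denominator is zero."""
    return a / b if b != 0 else 1.0

FUNCTIONS = {"+": lambda a, b: a + b,
             "-": lambda a, b: a - b,
             "*": lambda a, b: a * b,
             "/": protected_div}

# terminal set: the 9 input variables plus 100 random constants in [0, 40000]
TERMINALS = [f"x{i}" for i in range(9)] + [random.uniform(0, 40000) for _ in range(100)]

def random_node():
    """Draw functions and terminals with equal probability (0.5 each),
    compensating for the very different sizes of the two sets."""
    if random.random() < 0.5:
        return random.choice(list(FUNCTIONS))
    return random.choice(TERMINALS)

GP_PARAMS = {
    "replication_rate": 0.1,         # parent copied unchanged
    "crossover_rate": 0.9,           # standard subtree crossover, uniform point choice
    "mutation_rate": 0.1,            # standard subtree mutation
    "mutation_branch_max_depth": 6,  # depth of the new random branch
    "tournament_size": 6,
    "max_tree_depth": 17,
    "elitism": 1,                    # best individual survives to the next generation
}
```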
4.2 Results
We outline here the results obtained from the experiments performed on the 50 different partitions of the dataset.
The objective of the learning process is to minimize the root mean squared
error (RMSE) between outputs and targets. For each partition of the dataset, we
collected the test-set RMSE of the best individual produced at the end of the
training process. Thus, we have 50 values for each partition and we considered the
median of these 50 values. The median was preferred over the arithmetic mean
due to its robustness to outliers. Repeating this process with all the considered
50 partitions results in a set of 50 values. Each value is the median of the test-set error at the end of the learning process for a specific partition of the dataset.
Table 1 reports median and standard deviation of all the median errors
achieved considering all the 50 partitions of the dataset for the considered tech-
niques. The same results are shown with a boxplot in Fig. 1(a). Denoting by IQR
the interquartile range, the ends of the whiskers represent the lowest datum still
within 1.5·IQR of the lower quartile, and the highest datum still within 1.5·IQR
of the upper quartile. Errors for each dataset are shown in Fig. 1(b) where it is
particularly visible that GP and M5 have similar performances.
GP is the best performer, considering both the median and the standard
deviation. To analyze the statistical significance of these results, a set of statis-
tical tests has been performed on the resulting median errors. The Kolmogorov-
Smirnov test shows that the data are not normally distributed; hence a rank-
based statistic has been used. The Mann Whitney rank-sum test for pairwise
data comparison is used under the alternative hypothesis that the samples do not have equal medians.

Table 1. Median and standard deviation of median test errors of the dataset's partitions for the considered techniques.

The p-values obtained are 3.9697 · 10−7 when GP is
compared to Neural Networks, 0.0129 when it is compared to a Neural Network
ensemble and 0.9862 when GP is compared to M5 trees. Therefore, when using
a significance level α = 0.05 with a Bonferroni correction for the value α, we
obtain that in the first two cases GP produces fitness values that are significantly
lower (i.e., better) than the other methods, but the same conclusion cannot be
reached when comparing to M5 (the p-value is equal to 0.9862). Results of Mann
Whitney test are summarized in Table 2.
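A hedged sketch of this testing procedure using SciPy is given below; the array names are placeholders (each array would hold the 50 per-partition median errors of one method), and this is not the authors' script.

```python
from scipy import stats

def gp_vs_other(gp_errors, other_errors, n_comparisons=3, alpha=0.05):
    """Two-sided Mann-Whitney rank-sum test between the per-partition median
    test errors of GP and another method, judged against a Bonferroni-corrected
    threshold alpha / n_comparisons."""
    _, p = stats.mannwhitneyu(gp_errors, other_errors, alternative="two-sided")
    return p, p < alpha / n_comparisons

# e.g. p_nn, significant_nn = gp_vs_other(gp_medians, nn_medians)
```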
For a better understanding of the dynamics of the evolutionary process for this
particular real-world problem, in Fig. 2 the median of the test fitness generation
by generation is reported.
Table 2. p-values of the Mann Whitney rank-sum test. In bold, values that cannot reject the null hypothesis at the 5% significance level, meaning that the difference in errors is not significant.
Fig. 1. Test RMSE of the four methods (NNs, NN Ens, GP, M5): (a) boxplots per method; (b) error on each dataset partition, sorted by RMSE.
Fig. 2. Median of test fitness generation by generation. 50 independent runs have been
considered.
Fig. 3. A sample model tree (splits on x1 ≤ 66267, x9 ≤ 22.76, x7 ≤ 111090 and x7 ≤ 83770, with leaf models L1–L5). In brackets the number of data samples represented by the specific linear model is given.
\[
P(x_7, x_8) = \frac{1}{50}\sum_{i=1}^{50} f_i(x_7, x_8) \tag{1}
\]
The bigger the difference between x7 and x8, the bigger the “correction”. This result matches our intuition regarding the energy demand pattern: if on a particular day D, with a temperature equal to T, the energy consumption is E, we expect that on day D + 1 the energy consumption will be greater than E
Fig. 4. Average correction considering the best individuals of all the considered
datasets. x7 is the temperature at time t − 1 while x8 is the temperature at time
t. The black points represent the pairs of temperatures present in the dataset.
5 Conclusions
Energy load forecasting can provide important information regarding future
energy consumption. In this work, a genetic programming based forecasting
method has been presented and a comparison with other machine learning tech-
niques has been performed. Experimental results show that GP and M5 perform
quite similar, with a difference that is not statistically significant. Neural net-
works and the ensemble method have returned results of poorer quality. GP,
differently from the other methods, produces solutions that explain data in a
simple and intuitively meaningful way and could be easily interpreted. Hence,
GP has been used in this work not only to build a model, but also to evaluate
the effect of the parameters of the model. In fact, for the considered problem GP
highlighted the relationship between the energy consumption and the external
temperature. Moreover, GP demonstrates the ability to perform feature selection
in an automatic and effective way, using a more compact set of variables than
the other techniques and always using the same limited set of variables in all
the returned solutions. When other machine learning techniques are considered,
the analysis of the solutions is much more difficult and less intuitive. Possible
future works may consider longer forecasting periods, also with the use of data
provided by temperature forecasts.
References
1. Adya, M., Collopy, F.: How effective are neural networks at forecasting and pre-
diction? A review and evaluation. Journal of Forecasting 17, 481–495 (1998)
2. Barnum, H., Bernstein, H.J., Spector, L.: Quantum circuits for OR and AND of
ORs. Journal of Physics A: Mathematical and General 33, 8047–8057 (2000)
3. Bhattacharya, M., Abraham, A., Nath, B.: A linear genetic programming approach
for modelling electricity demand prediction in Victoria. In: Hybrid Information
Systems, pp. 379–393. Springer (2002)
4. Box, G., Jenkins, G.M., Reinsel, G.: Time Series Analysis: Forecasting & Control,
3rd edn. Prentice Hall, February 1994
5. De Felice, M., Yao, X.: Neural networks ensembles for short-term load forecasting.
In: IEEE Symposium Series in Computational Intelligence (SSCI) (2011)
6. Dee, D., Uppala, S., Simmons, A., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U.,
Balmaseda, M., Balsamo, G., Bauer, P., et al.: The ERA-Interim reanalysis: con-
figuration and performance of the data assimilation system. Quarterly Journal of
the Royal Meteorological Society 137(656), 553–597 (2011)
7. Feinberg, E.A., Genethliou, D.: Load forecasting. In: Chow, J., Wu, F., Momoh, J.
(eds.) Applied Mathematics for Restructured Electric Power Systems: Optimiza-
tion, Control and Computational Intelligence, pp. 269–285. Springer (2005)
8. Hansen, J., Nelson, R.: Neural networks and traditional time series methods: a
synergistic combination in state economic forecasts. IEEE Transactions on Neural
Networks 8(4), 863–873 (1997)
9. Hippert, H., Pedreira, C., Souza, R.: Neural networks for short-term load forecast-
ing: a review and evaluation. IEEE Transactions on Power Systems 16(1), 44–55
(2001)
10. Hobbs, B., Jitprapaikulsarn, S., Konda, S., Chankong, V., Loparo, K.,
Maratukulam, D.: Analysis of the value for unit commitment of improved load
forecasts. IEEE Transactions on Power Systems 14(4), 1342–1348 (1999)
11. Koza, J.R.: Genetic programming: on the programming of computers by natural
selection. MIT Press, Cambridge (1992)
12. Koza, J.R.: Human-competitive results produced by genetic programming. Genetic
Programming and Evolvable Machines 11, 251–284 (2010)
13. Koza, J.R., Al-Sakran, S.H., Jones, L.W.: Automated ab initio synthesis of com-
plete designs of four patented optical lens systems by means of genetic program-
ming. Artif. Intell. Eng. Des. Anal. Manuf. 22(3), 249–273 (2008)
14. Koza, J.R., Andre, D., Bennett, F.H., Keane, M.A.: Genetic Programming III:
Darwinian Invention & Problem Solving, 1st edn. Morgan Kaufmann Publishers
Inc., San Francisco (1999)
15. Koza, J.R., Bade, S.L., Bennett, F.H.: Evolving sorting networks using genetic
programming and rapidly reconfigurable field-programmable gate arrays. In: Work-
shop on Evolvable Systems. International Joint Conference on Artificial Intelli-
gence, pp. 27–32. IEEE Press (1997)
16. Lee, D.G., Lee, B.W., Chang, S.H.: Genetic programming model for long-term fore-
casting of electric power demand. Electric Power Systems Research 40(1), 17–22
(1997)
17. Lipson, H.: Evolutionary synthesis of kinematic mechanisms. Artif. Intell. Eng. Des.
Anal. Manuf. 22(3), 195–205 (2008)
18. Lohn, J.D., Hornby, G.S., Linden, D.S.: Human-competitive evolved antennas.
Artif. Intell. Eng. Des. Anal. Manuf. 22(3), 235–247 (2008)
19. Troncoso Lora, A., Riquelme, J.C., Martı́nez Ramos, J.L., Riquelme Santos, J.M.,
Gómez Expósito, A.: Influence of kNN-based load forecasting errors on optimal
energy production. In: Pires, F.M., Abreu, S.P. (eds.) EPIA 2003. LNCS (LNAI),
vol. 2902, pp. 189–203. Springer, Heidelberg (2003)
20. Park, D., El-Sharkawi, M., Marks, R., Atlas, L., Damborg, M.: Electric load fore-
casting using an artificial neural network. IEEE Transactions on Power Systems 6,
442–449 (1991)
21. Poli, R., Langdon, W.B., Mcphee, N.F.: A field guide to genetic programming,
published via Lulu.com (2008). http://www.gp-field-guide.org.uk/
22. Preble, S., Lipson, M., Lipson, H.: Two-dimensional photonic crystals designed by
evolutionary algorithms. Applied Physics Letters 86(6) (2005)
23. Quinlan, R.J.: Learning with continuous classes. In: 5th Australian Joint Confer-
ence on Artificial Intelligence, pp. 343–348. World Scientific, Singapore (1992)
24. Solomatine, D., Xue, Y.: M5 model trees and neural networks: application to flood
forecasting in the upper reach of the Huai River in China. Journal of Hydrologic
Engineering 9(6), 491–501 (2004)
25. Štravs, L., Brilly, M.: Development of a low-flow forecasting model using the M5
machine learning method. Hydrological Sciences Journal 52(3), 466–477 (2007)
26. Wang, Y., Witten, I.: Inducing model trees for continuous classes. In: Proceedings
of the Ninth European Conference on Machine Learning, pp. 128–137 (1997)
The Optimization Ability of Evolved Strategies
1 Introduction
Hyper-Heuristics (HH) is a field of research that aims to automatically discover
effective and robust optimization algorithms [1]. HH frameworks can generate
metaheuristics for a given computational problem either by selecting/ combin-
ing low level heuristics or by designing a new method based on components
of existing ones. In [1], Burke et al. have presented a detailed discussion of
these HH categories, complemented with several representative examples. HH
are commonly divided into two sequential stages: Learning is where the strate-
gies are automatically created, whilst, in Validation, the most promising learned
solutions are applied to unseen and more challenging scenarios.
Evolutionary Algorithms (EAs) are regularly applied as HH search engines
to learn effective algorithmic strategies for a given problem or class of related
2 Hyper-Heuristics
algorithm which is then applied to solve the problem at hand. During this pro-
cess, the HH are usually guided by feedback obtained through the execution of
each candidate solution in simple instances of the problem under consideration.
Genetic Programming (GP), a branch of EAs, has been increasingly adopted
as the HH search engine to learn effective algorithmic strategies [10]. In
recent years, Grammatical Evolution (GE) [9], a linear form of GP, has received
increasing attention from the HH community since it allows for a straightforward
enforcement of semantic and syntactic restrictions, by means of a grammar.
2.1 Framework
In this work a two-phase architecture is adopted (see Fig. 1). In the first phase,
Learning, a GE-based HH will construct algorithmic strategies. GE is a GP
branch that decouples the genotype from the phenotype by encoding solutions
as a linear string of integers. The evaluation of an individual requires the appli-
cation of a genotype-phenotype mapping that decodes the linear string into a
syntactically legal algorithmic strategy, by means of a grammar. GE grammars
are composed of a set of production rules written in Backus-Naur form,
defining the general structure of the programs being evolved and also the com-
ponents that can be selected to build a given strategy (consult [9] for details
concerning GE algorithms).
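To make the mapping step concrete, the toy sketch below decodes a codon string with a deliberately small grammar; both the grammar and all names are illustrative only and bear no relation to the actual grammar used in this work.

```python
# Toy grammar: each non-terminal maps to a list of alternative productions.
GRAMMAR = {
    "<strategy>": [["<decision>", " + ", "<update>"]],
    "<decision>": [["random-proportional"], ["q-selection(q=", "<q>", ")"]],
    "<update>":   [["as-update"], ["elitist-update"], ["rank-update"]],
    "<q>":        [["0.1"], ["0.5"], ["0.9"]],
}

def map_genotype(codons, start="<strategy>"):
    """Genotype-phenotype mapping in the GE style: whenever a non-terminal with
    several productions is expanded, the next codon modulo the number of
    productions selects the alternative."""
    symbols, out, idx = [start], [], 0
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:            # terminal symbol: emit it
            out.append(sym)
            continue
        rules = GRAMMAR[sym]
        if len(rules) == 1:
            choice = rules[0]             # no codon consumed
        else:
            choice = rules[codons[idx % len(codons)] % len(rules)]
            idx += 1
        symbols = list(choice) + symbols  # leftmost derivation
    return "".join(out)

print(map_genotype([7, 2, 4, 1, 9]))  # -> "q-selection(q=0.9) + elitist-update"
```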
The quality of a strategy generated by the GE should reflect its ability to
solve a given problem. During evolution, each GE solution is applied to a pre-
determined problem instance and its fitness corresponds to the quality of the
best solution found. Given this modus operandi, the GE evaluation step is a
computationally intensive task. To prevent the learning process from taking an
excessive amount of time, some simple evaluation conditions are usually defined:
i) one single and small problem instance is used to assign fitness; ii) only one
run is performed; iii) the number of iterations is kept low. Clearly, the adoption
of such simple conditions might compromise the results by masking differences
between competing strategies, leading to an inaccurate assessment of the real
optimization ability of evolved solutions. The experiments described in section 4
aim to gain insight into this situation.
of the building of the solutions, pheromone trail update and daemon actions.
Each component contains several alternatives to implement a specific task. As an
example, the decision policy adopted by the ants to build a trail can be either the
random proportional rule used by AS methods or the q-selection pseudorandom
proportional rule introduced by the Ant Colony System (ACS) variant. If the
latter option is selected, the GE engine also defines a specific value for the q-value
parameter. The grammar allows the replication of all main ACO algorithms,
such as AS, ACS, Elitist Ant System (EAS), Rank-based Ant System (RAS),
and Max-Min Ant System (MMAS). Additionally, it can generate novel combi-
nations of blocks and settings that define alternative ACO algorithms. Results
presented in [14] show that the GE-HH framework is able to learn original ACO
architectures, different from standard strategies. Moreover, results obtained in
validation instances reveal that the evolved strategies generalize well and are
competitive with human-designed variants (consult the aforementioned reference
for a detailed analysis of the results).
4 Experimental Analysis
Experiments described in this section aim to gain insight into the capacity of
the GE-based HH to identify the most promising solutions during the learning
step. Concretely, we determine the relation between the quality of strategies as estimated by the GE and their optimization ability when applied to unseen and harder scenarios. Such a study will provide valuable information about the
capacity of the GE to build and identify strategies that are robust, i.e., highly
applicable and with small fallibility.
In practical terms, we take all strategies belonging to the last generation of
the GE and rank them by the fitness obtained in the learning evaluation instance.
Since the GE relies on a steady-state replacement method, the last generation
contains the best optimization strategies identified during the learning phase.
Then, these strategies are applied to unseen instances and ranked again based
on the new results achieved. The comparison of the ranks obtained in different
phases will provide relevant information in what concerns the generalization
ability of the evolved strategies.
Table 1. GE settings used in the learning phase.
Runs: 30
Population Size: 64
Generations: 40
Individual Size: 25
Wrapping: No
Crossover Operator: One-Point with a 0.7 rate
Mutation Operator: Integer-Flip with a 0.05 rate
Selection: Tournament with size 3
Replacement: Steady State
Learning Instances: pr76, ts225
The GE settings used in the experiments are depicted in Table 1. The pop-
ulation size is set to 64 individuals, each one composed of 25 integer codons,
which is an upper bound on the number of production rules needed to generate
an ACO strategy using the grammar from [14]. As this grammar does not contain
recursive production rules, it is possible to determine the maximum number of
values needed to create a complete phenotype. Also, wrapping is not necessary
since the mapping process never goes beyond the end of the integer string.
We selected several TSP instances from the TSPLIB1 for the experimental
analysis. Two different instances were selected to learn the ACO strategies: pr76
and ts225 (the numerical values represent how many cities the instance has).
Each ACO algorithm encoded in a GE solution is executed once for 100
iterations. The fitness assigned to this strategy corresponds to the best solution
found. The strategies encode all the required settings to run the ACO algorithm,
with the exception of the colony size, which is set to 10% of the number of cities
(truncated to the nearest integer).
Regarding the validation step, the best ACO strategies are applied to four different TSP instances: lin105, pr136, pr226, lin318. In this phase, all ACO algorithms are run 30 times and the number of iterations is increased to 5000. The size of the colony is determined in the same way (10% of the size of the instance being
optimised). Table 2 summarises the parameters used. In both phases, the results
are expressed as a normalised distance to the optimum.
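One natural reading of this normalised distance, which we assume here rather than take from the paper, is the relative excess over the known optimum of each TSPLIB instance:

\[
\delta(s) = \frac{\mathrm{cost}(s) - \mathrm{cost}_{\mathrm{opt}}}{\mathrm{cost}_{\mathrm{opt}}}
\]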
Table 2. Validation settings.
Runs: 30
Iterations: 5000
Colony Size: 10% of the Instance Size
Instances: lin105, pr136, pr226, lin318
Fig. 2 displays the ranking distributions of the best ACO strategies learned
with the pr76 instance. The 4 panels correspond to the 4 different validation
instances. Each solution from the last GE generation is identified using an integer
from 1 to 64, displayed on the horizontal axis. These solutions are ranked by
the fitness obtained in training (solution 1 is the best strategy from the last
generation, whilst solution 64 is the worst). The vertical axis corresponds to the
position in the rank. Small circles highlight the learning rank and, given the
ordering of the solutions from the GE last generation, we see a perfect diagonal
in all panels. The small triangles identify the ranking of the solutions achieved
in the 4 validation tasks (one on each panel). Ideally, these rankings should
be identical to the ones obtained in training, i.e., the most promising solutions
identified by the GE would be those that generalize better to unseen instances.
An inspection of the results reveals an evident correlation between the behav-
ior of the strategies in both phases. An almost perfect line of triangles is visible in
1 http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/
Fig. 2. Ranking distribution of the best ACO strategies discovered with the pr76 learn-
ing instance.
Fig. 3. Ranking distribution of the best ACO strategies discovered with the ts225
learning instance.
the 4 panels, confirming that the best strategies from training keep their good per-
formance in validation. This trend is visible across all the validation instances
and shows that, with the pr76 instance, training is accurately identifying the
more robust and effective ACO strategies.
Fig. 3 displays the ranking distributions of the best ACO strategies learned
with the ts225 instance. Although the general trend is maintained, a close inspec-
tion of the results reveals some interesting disagreements. The best ACO strate-
gies learned with ts225 tend to have a modest performance when applied to small
validation instances, such as lin105 and pr136. On the contrary, they behave well
on larger instances (see, e.g., the results obtained with the validation instance
from panel d)). This outcome confirms that the training conditions impact the
structure of the evolved algorithmic strategies, which is in agreement with other
findings reported in the literature [5,13]. The ts225 instance is considered a hard
TSP instance [8] and, given the results displayed in Fig. 3, it promotes the evo-
lution of ACO strategies particularly suited for TSP problems with a higher
number of cities. In the remainder of this section we present some additional
results that help gain insight into these findings.
To confirm the correlation between learning and validation we com-
puted the Pearson correlation coefficient between the rankings obtained in each
phase. This coefficient ranges between -1 and 1, where -1 identifies a completely
negative correlation and 1 highlights a total correlation (the best strategies in
learning are the best in validation). The results obtained are presented in Table 3.
Columns contain instances used in learning, whilst rows correspond to validation
instances. The values from the table confirm that there is always a clearly posi-
tive correlation between the two phases, i.e., the quality obtained by a solution
in learning is an accurate estimator of its optimization ability. The lowest val-
ues of the Pearson coefficient are obtained by strategies learned with the ts225
instance and validated in small TSP problems, confirming the visual inspection
of Fig. 3. In this correlation analysis we adopted a significance level of α = 0.05.
All the p-values obtained were smaller than α, thus confirming the statistical
significance of the study.
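A minimal sketch of this rank comparison is given below; the array names are placeholders (lower fitness, i.e. a shorter tour, is better), and the code is ours rather than the authors'.

```python
from scipy import stats

def rank_agreement(learning_fitness, validation_fitness, alpha=0.05):
    """Rank the 64 strategies by their learning fitness and by their validation
    fitness (rank 1 = best, i.e. lowest error) and correlate the two rankings
    with the Pearson coefficient, as reported in Table 3."""
    learn_rank = stats.rankdata(learning_fitness)
    valid_rank = stats.rankdata(validation_fitness)
    r, p = stats.pearsonr(learn_rank, valid_rank)
    return r, p, p < alpha

# e.g. r, p, significant = rank_agreement(fitness_pr76, fitness_lin105)
```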
To complement the analysis, we present in Fig. 4 the absolute performance
of the best learned ACO strategies in the 4 selected validation instances. Each
panel comprises one of the validation scenarios and contains a comparison
between the optimization performance of strategies evolved with different learn-
ing instances (black mean and error bars are from ACO strategies trained with
the pr76 instances, whilst the grey are from algorithms evolved with the ts225
instance). In general, for all panels and for strategies evolved with the two train-
ing instances, the deviation from the optimum increases with the training rank-
ing, confirming that the best algorithms from phase 1 are those that exhibit a
better optimization ability. However, the results reveal an interesting pattern
in what concerns the absolute behavior of the algorithms. For the smaller val-
idation instances (lin105 and pr136 in panels a) and b)), the ACO strategies evolved with the smaller learning instance (pr76) achieve a better performance. On the
contrary, ACO algorithms learned with the ts225 instance are better equipped to
handle the largest validation problem (lin318 in panel d)). This is another piece
of evidence that confirms the impact of the training conditions on the structure
of the evolved solutions. A detailed analysis of the algorithmic structure reveals
that the pr76 training instance promotes the appearance of extremely greedy
ACO algorithms (e.g., they tend to have very low evaporation levels), partic-
ularly suited for the quick optimization of simple instances. On the contrary,
strategies evolved with the ts225 training instance strongly rely on full evapo-
ration, thus promoting the appearance of methods with increased exploration
ability, particularly suited for larger and harder TSP problems.
Fig. 4. MBF of the best evolved ACO strategies in the 4 validation instances. Black
symbols identify results from strategies learned with the pr76 instance and grey symbols
correspond to results from strategies obtained with the ts225 instance.
Figs. 5 and 6 present the evolution of the Mean Best Fitness (MBF) during
the learning phase, respectively for the pr76 and ts225 instances. Both figures
contain two panels: panel a) exhibits the evolution of the MBF measured by
the learning instance, which corresponds to the value used to guide the GE
exploration; panel b) displays the MBF obtained with the testing instance and
it is only used to detect overfitting.
The results depicted in panels 5a and 6a show that the HH framework
gradually learns better strategies. A brief perusal of the MBF evolution reveals
a rapid decrease in the first generations, followed by a slower convergence. This
is explained by the fact that in the beginning of the evolutionary process the
GE combines different components provided by the grammar to build a robust
strategy, whilst at the end it tries to fine-tune the numeric parameters. The
search for a meaningful combination of components has a stronger impact on
fitness than modifying numeric values.
Overfitting occurs when the fitness of the learning strategies keeps improving,
whilst it deteriorates in testing. Panels 5b and 6b show the MBF for the testing
step. An inspection of the results shows that it tends to decrease throughout the
evolutionary run. This shows that the strategies being evolved are not becoming
overspecialized, i.e., they maintain the ability to solve instances different from
the ones used in training.
Fig. 5. Evolution of the MBF for the pr76 learning instance and the corresponding
eil76 testing instance.
Fig. 6. Evolution of the MBF for the ts225 learning instance and the corresponding
tsp225 testing instance.
5 Conclusions
HH is an area of research that aims to automate the design of algorithmic
strategies by combining low-level components of existing methods. Most of the
HH frameworks are divided into two phases. The first phase, Learning, is where the
strategies are built and evaluated. Afterwards, the robustness of the best solu-
tions is validated in unseen scenarios. Usually, researchers select the best learned
strategies based only on simple and somewhat inaccurate criteria. Given this situation, there is the risk of failing to identify the most effective learned
algorithmic strategies.
In this work we studied the correlation between the quality exhibited by
strategies during learning and their effective optimization ability when applied
to unseen scenarios. We relied on an existing GE-based HH to evolve full-fledged
ACO algorithms to perform the analysis. Results revealed a clear correlation
between the quality exhibited by the strategies in both phases. As a rule, the
most promising algorithms identified in learning generalize better to unseen val-
idation instances. This study provides valuable guidelines for HH practitioners,
as it suggests that the limited training conditions do not seriously compromise
the identification of the algorithmic strategies with the best optimization abil-
ity. The outcomes also confirmed the impact of the training conditions on the
structure of the evolved solutions. Training with small instances promotes the
appearance of greedy optimization strategies particularly suited for simple prob-
lems, whereas larger (and harder) training cases favor algorithmic solutions that
excel in more complicated scenarios. Finally, a preliminary investigation revealed
that training seems to be overfitting free, i.e., the strategies being learned are
not becoming overspecialized to the specific instance used in the evaluation.
There are several possible extensions to this work. In the near future we aim to validate the correlation study under alternative training evaluation conditions.
Also, a complete understanding of overfitting is still in progress and we will
extend this analysis to a wider range of scenarios (e.g., the size, structure and
the number of the instances used for testing might influence the results). Finally,
we will investigate if the main results hold for different HH frameworks.
References
1. Burke, E.K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., Qu, R.:
Hyper-heuristics: A survey of the state of the art. Journal of the Operational
Research Society 64(12), 1695–1724 (2013)
2. Dorigo, M., Stützle, T.: Ant Colony Optimization. Bradford Company, Scituate
(2004)
3. Eiben, A., Smit, S.: Parameter tuning for configuring and analyzing evolutionary
algorithms. Swarm and Evolutionary Computation 1(1), 19–31 (2011)
4. López-Ibáñez, M., Stützle, T.: Automatic configuration of multi-objective
ACO algorithms. In: Dorigo, M., Birattari, M., Di Caro, G.A., Doursat, R.,
Engelbrecht, A.P., Floreano, D., Gambardella, L.M., Groß, R., Şahin, E.,
Sayama, H., Stützle, T. (eds.) ANTS 2010. LNCS, vol. 6234, pp. 95–106. Springer,
Heidelberg (2010)
5. Lourenço, N., Pereira, F.B., Costa, E.: The importance of the learning conditions
in hyper-heuristics. In: Proceedings of the 15th Annual Conference on Genetic and
Evolutionary Computation, GECCO 2013, pp. 1525–1532 (2013)
6. Lourenço, N., Pereira, F., Costa, E.: Learning selection strategies for evolutionary
algorithms. In: Legrand, P., Corsini, M.-M., Hao, J.-K., Monmarché, N., Lutton, E.,
Schoenauer, M. (eds.) EA 2013. LNCS, vol. 8752, pp. 197–208. Springer, Heidelberg
(2014)
7. Martin, M.A., Tauritz, D.R.: A problem configuration study of the robustness of
a black-box search algorithm hyper-heuristic. In: Proceedings of the 2014 Confer-
ence Companion on Genetic and Evolutionary Computation Companion, GECCO
Comp. 2014, pp. 1389–1396 (2014)
8. Merz, P., Freisleben, B.: Memetic algorithms for the traveling salesman problem.
Complex Systems 13(4), 297–346 (2001)
9. O’Neill, M., Ryan, C.: Grammatical evolution: evolutionary automatic program-
ming in an arbitrary language, vol. 4. Springer Science (2003)
10. Pappa, G.L., Freitas, A.: Automating the Design of Data Mining Algorithms:
An Evolutionary Computation Approach, 1st edn. Springer Publishing Company,
Incorporated (2009)
11. Runka, A.: Evolving an edge selection formula for ant colony optimization. In:
Proceedings of the GECCO 2009, pp. 1075–1082 (2009)
12. de Sá, A.G.C., Pappa, G.L.: Towards a method for automatically evolving bayesian
network classifiers. In: Proceedings of the 15th Annual Conference Companion on
Genetic and Evolutionary Computation, pp. 1505–1512. ACM (2013)
13. Smit, S.K., Eiben, A.E.: Beating the world champion evolutionary algorithm
via revac tuning. In: IEEE Congress on Evolutionary Computation (CEC) 2010,
pp. 1–8. IEEE (2010)
14. Tavares, J., Pereira, F.B.: Automatic design of ant algorithms with grammatical
evolution. In: Moraglio, A., Silva, S., Krawiec, K., Machado, P., Cotta, C. (eds.)
EuroGP 2012. LNCS, vol. 7244, pp. 206–217. Springer, Heidelberg (2012)
Evolution of a Metaheuristic for Aggregating
Wisdom from Artificial Crowds
1 Introduction
Computing the optimal solution using an exhaustive search becomes intractable
as the size of the problem grows for computationally hard (NP-hard) prob-
lems [1]. Consequently, heuristics and stochastic search algorithms are commonly
used in an effort to find reasonable approximations to difficult problems in a poly-
nomial time [1,2]. These approximation algorithms are often incomplete and can
produce indeterminate results that vary when repeated on larger search spaces
[3,4]. The variance produced by these types of searches, assuming several search
attempts have been made, can be exploited in a collaborative effort to form better
2 Related Work
The concept of applying WoC to the TSP has shown promising results in several
works. Yi and Dry aggregated human-generated TSP responses and demonstrated
where c is the cost vector (i.e. array) that contains the fitness associated with
each individual crowd member and n represents the total number of crowd mem-
bers. Given the range of costs associated with the candidates, we can formulate
a cost-distance using
\[
d_i = \frac{c_i - \min(c)}{\max(c) - \min(c)} \tag{2}
\]
where di is the cost-distance ratio for the i-th member of the crowd. This calcu-
lation provides a ratio or a percentage of how far away a candidate’s solution
is from the best agent in the crowd with respect to the worst. This metric
varies from 0 to 1, and as individual fitness scores approach the best, the metric
approaches zero. Finally, this ratio is transformed into a weight that reflects the candidate’s proximity to the best agent:

\[
w_{p_i} = 1 - d_i \tag{3}
\]
BEGIN
1 C1,C2,...,Cn = gather_crowd(n) //Perform n independent searches
2 best_performer = MAX(C1,C2,...,Cn) //Most-fit crowd member
3 worst_performer = MIN(C1,C2,...,Cn) //Least-fit crowd member
4 FOR i = 1 to n // Iterate through all crowd members
5 d = calc_dist_ratio(C(i), best_performer, worst_performer)
// see eqn. (2)
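As a sketch (not the authors' code), the proximity weighting of Eqs. (2)-(3) could be computed for a whole crowd as follows; the exponential weighting shown alongside it in Fig. 1 is not reproduced here, and the example costs are illustrative.

```python
import numpy as np

def proximity_weights(costs):
    """Eqs. (2)-(3): the cost-distance ratio of each crowd member, relative to the
    best (min cost) and worst (max cost) members, is turned into a weight 1 - d.
    Assumes a non-degenerate crowd, i.e. max(c) > min(c)."""
    c = np.asarray(costs, dtype=float)
    d = (c - c.min()) / (c.max() - c.min())   # Eq. (2): 0 for the best, 1 for the worst
    return 1.0 - d                            # Eq. (3)

# e.g. tour costs of five converged GA crowd members (illustrative values)
print(proximity_weights([1000, 1010, 1025, 1050, 1100]))
# -> [1.   0.9  0.75 0.5  0.  ]
```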
4 Experimental Evaluation
4.1 Evaluation Overview
The modified WoAC algorithm was tested repeatedly to evaluate its performance. Every trial run of the algorithm consisted of the following two-step process. First, a crowd (i.e. pool) of approximations to a specific TSP was generated by instantiating 30 GA searches. After the searches converged, the post-
processing metaheuristic was executed according to the procedure outlined in
Section 3. The best aggregate solutions of each method, as well as their respec-
tive crowd sizes, were logged for statistical purposes.
Before reviewing the evaluation statistics, we will provide some preliminary
information about the testing environment. The evaluation process was repeated
on four different TSP datasets, and a total of 100 trials were executed on each.
The TSP datasets were randomly generated using Concorde [19], and the sizes
and optimal (i.e. best-known) costs of each are displayed in Table 1. The optimal
costs were obtained using the TSP solver in Concorde.
The parameters of the GA used on the TSP datasets are outlined in Table 2.
A total of 30 parallel instances of the GA were allowed to search and converge
before executing the WoAC algorithm. The GA generally produced crowd mem-
bers (i.e. candidate approximations) that were near, but slightly suboptimal to
the costs generated by Concorde. Therefore, there was opportunity for the WoAC
algorithm to improve upon the pool of candidate solutions and aggregate them
to form a new solution closer to the best-known optimum.
Table 2. GA parameter settings.
Population Size: 20
Parent Selection: Fitness Tournament (uniformly random among top 5)
Crossover Operator: Single-point (uniformly random)
Mutation Operator: Combination of two mutation steps:
  1. Uniformly random 1% mutation
  2. Greedy custom - adjacent node swap until improved
Fig. 1. The plot on the left shows the convergence of 30 independent instances of the
same GA searching for the optimum tour on a TSP. The plot on the right corresponds to
the exponential (dotted ) and percentage (square) weights assigned to the 30 converged
GA outcomes as part of the modified WoAC algorithm.
Table 4. Mean Costs (μ) and Standard Deviations (σ ) of Crowd Members and WoAC
Aggregates
The other dynamic investigated during the evaluation of the modified WoAC
algorithm was the concept of varying the crowd size and incrementally adding
new opinions (i.e. approximations) to the crowd one-at-a-time. In the original
WoAC algorithm [1,5], a fixed crowd size was determined a priori and all opin-
ions were aggregated only once after considering the votes from all contributors.
The success rate for the equal-weight technique in Table 3 was based on varying
the crowd size; however, if the crowd size was not varied and fixed at 30, then
the mean success rate of the equal-weight technique would drop to 1.8%, given
the results from all the TSP trials (i.e. 400 experiments). To better visualize the
impact of crowd size, Fig. 2 shows a histogram of the number of members in
the crowd when the equal-weight technique successfully surpassed the best GA
during the 400 trials. It indicates that the number of opinions needed to outper-
form the best GA is unpredictable and should not be fixed a priori; therefore,
varying the crowd size is effective at mitigating this challenge and improving the
success rate of the metaheuristic.
Fig. 2. The crowd size at the time when the equal-weight technique surpassed the best
GA. This statistic, which is based on 400 trials from all TSP datasets, is plotted as a
histogram.
References
1. Yampolskiy, R.V., El-Barkouky, A.: Wisdom of artificial crowds algorithm for solv-
ing NP-hard problems. International Journal of Bio-Inspired Computation 3(6),
358–369 (2011)
2. Collet, P., Rennard, J.-P.: Stochastic optimization algorithms (2007). arXiv
preprint arXiv:0704.3780
3. Hoos, H.H., Stützle, T.: Stochastic search algorithms, vol. 156. Springer (2007)
4. Kautz, H.A., Sabharwal, A., Selman, B.: Incomplete Algorithms. Handbook of
Satisfiability 185, 185–204 (2009)
5. Yampolskiy, R.V., Ashby, L., Hassan, L.: Wisdom of Artificial Crowds - A Meta-
heuristic Algorithm for Optimization. Journal of Intelligent Learning Systems and
Applications 4, 98 (2012)
6. Surowiecki, J.: The wisdom of crowds. Random House LLC (2005)
7. Yi, S.K.M., Steyvers, M., Lee, M.D., Dry, M.: Wisdom of the Crowds in Traveling
Salesman Problems. Memory and Cognition 39, 914–992 (2011)
8. Hoshen, Y., Ben-Artzi, G., Peleg, S.: Wisdom of the crowd in egocentric video
curation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pp. 587–593, June 23–28, 2014
9. Jiangbo, Y., Kian Hsiang, L., Oran, A., Jaillet, P.: Hierarchical Bayesian nonpara-
metric approach to modeling and learning the wisdom of crowds of urban traf-
fic route planning agents. In: 2012 IEEE/WIC/ACM International Conferences
on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 478–485,
December 4–7, 2012
10. Kittur, A., Kraut, R.E.: Harnessing the wisdom of crowds in wikipedia: quality
through coordination. Paper presented at the Proceedings of the 2008 ACM con-
ference on Computer supported cooperative work, San Diego, CA, USA
11. Moore, T., Clayton, R.C.: Evaluating the wisdom of crowds in assessing phishing
websites. In: Tsudik, G. (ed.) FC 2008. LNCS, vol. 5143, pp. 16–30. Springer,
Heidelberg (2008)
12. Velic, M., Grzinic, T., Padavic, I.: Wisdom of crowds algorithm for stock mar-
ket predictions. In: Proceedings of the International Conference on Information
Technology Interfaces, ITI, pp. 137–144 (2013)
13. Ashby, L.H., Yampolskiy, R.V.: Genetic algorithm and wisdom of artificial crowds
algorithm applied to light up. In: 2011 16th International Conference on Computer
Games (CGAMES), pp. 27–32, July 27–30, 2011
14. Hughes, R., Yampolskiy, R.V.: Solving Sudoku Puzzles with Wisdom of Artificial
Crowds. International Journal of Intelligent Games and Simulation 7(1), 6 (2013)
15. Khalifa, A.B., Yampolskiy, R.V.: GA with Wisdom of Artificial Crowds for Solving
Mastermind Satisfiability Problem. International Journal of Intelligent Games and
Simulation 6(2), 6 (2011)
16. Port, A.C., Yampolskiy, R.V.: Using a GA and wisdom of artificial crowds to solve
solitaire battleship puzzles. In: 2012 17th International Conference on Computer
Games (CGAMES), pp. 25–29, July 30, 2012-August 1, 2012
17. Puuronen, S., Terziyan, V., Tsymbal, A.: A dynamic integration algorithm for an
ensemble of classifiers. In: Ra, Z., Skowron, A. (eds.) Foundations of Intelligent
Systems. Lecture Notes in Computer Science, vol. 1609, pp. 592–600. Springer,
Berlin Heidelberg (1999)
18. Wagner, C., Ayoung, S.: The wisdom of crowds: impact of collective size and exper-
tise transfer on collective performance. In: 2014 47th Hawaii International Confer-
ence on System Sciences (HICSS), pp. 594–603, January 6–9, 2014
19. Concorde TSP Solver. http://www.math.uwaterloo.ca/tsp/concorde/index.html
The Influence of Topology in Coordinating
Collective Decision-Making in Bio-hybrid
Societies
1 Introduction
Social living is integral to organisms across many magnitudes of scale and com-
plexity, from bacterial biofilms [1] to primates [2], and such societies frequently
exhibit behaviours at the level of the collective, such as moving together by fol-
lowing a leader [3] or self-organised aggregation [4]. Many social animals and
behaviours have a substantial impact on humanity, both beneficial (e.g., pol-
lination) and detrimental (e.g., spread of disease). Since collective behaviours
can emerge from a combination of self-organised interactions, it can be prob-
lematic to understand what triggers, modulates, or suppresses their emergence.
One emerging methodology used to examine collective behaviours is to develop
bio-hybrid societies, in which robots are integrated into the animal society [5,6].
In so doing, such an approach allows direct testing of hypotheses regarding indi-
vidual behaviours and how they are modulated by the group context (for exam-
ple, confirming a hypothesised behaviour by showing that a collective behaviour
is not changed when some animals are substituted by robots [6]). Alternatively,
it becomes possible to use robot behaviours that can manipulate the overall
collective behaviours [5].
Our research aims to develop bio-hybrid societies, ultimately comprising mul-
tiple species that interact with robots, which thus form an interface between ani-
mals that need not naturally share a habitat. Interfacing in this manner has the
added advantage that we can monitor precisely what information is exchanged
between animal groups (and permits experiments that attenuate or amplify spe-
cific information types). To move towards addressing this overarching aim, here
we use a simplified system that comprises multiple populations of the same
species, and we examine this using individual-based simulation modelling.
In recent work, it has been shown that juvenile honeybees interacting with
robots can reach collective decisions jointly with those robots, and moreover,
that such collective decisions can be coordinated across multiple populations of
animals that reside in distinct habitats [7]. This work showed that robots using
cross-inhibition and local excitation led to high levels of collective decision-
making. However, it only compared ‘all-or-nothing’ coupling between the two
habitats. In this paper, we examine the sensitivity of decision-making and coor-
dination of those decisions across arenas, with respect to the inter-robot commu-
nication topology. We find that even a relatively sparse set of links between
habitats can be sufficient to coordinate outcomes across those habitats. These
findings improve our understanding about the interactions that are sufficient to
coordinate behaviours among separated groups of animals, and the limits that
can be tolerated.
Fig. 1. A preliminary experiment with a hybrid society comprising robots, real bees, virtual bees and simulated robots, yielding collective decisions among virtual and real bees.

Fig. 2. How we split and name the arenas into zones during the analysis.
Our recent work [7] has examined how collective decisions can be reached by
hybrid animal–robot societies, with individual-based simulation following pre-
liminary work that coupled real bees to simulated bees via physical and sim-
ulated robots (see Fig. 1). This work uses robots that are able to manipulate
key environmental variables for honeybees, including the temperature, light, and
vibration in the vicinity of the robot [13], as well as being able to detect the pres-
ence of bees nearby. Using two robots, when we introduced a positive feedback loop between the heat each robot emits and the presence of bees nearby, the animals made a collective decision by aggregating around one of the robots. There was nothing to discriminate between the robots at the outset, but the action of the bee population broke the symmetry – initially by chance, and then reinforced by one robot or the other. Moreover, we also showed that collective decisions made in two
separate arenas, each with a population of bees and two robots, can be coordi-
nated when the robots share task-specific information with another robot in the
other arena. The current paper builds on these results, examining the influence
of the inter-robot links used to couple two arenas of simulated honeybees.
2 Methods
We use a real-time platform for 2D robot simulation1 to simulate the inter-
action of bees and robots. We model both bees and robots as agents in this
world, making use of a basic motile robot for the bees, and use a fixed robot
with a customised model that corresponds to the bespoke robots designed in
our laboratory-based work [13]. While simulation modelling cannot fully replace
reality, it does allow us to explore relationships between key micro-level mecha-
nisms and how these can give rise to observed macro-level dynamics. The simu-
lator design enables execution of the exact same robot controllers in simulation and on the physical robots, adding substantial value to the resolution of models employed, within the larger cycle of modelling and empirical work.
1 Enki – an open source fast 2D robot simulator: http://home.gna.org/enki/
We model juvenile honeybee behaviour using Beeclust [14]. This is a social
model that results in aggregations in zones of highly favoured stimulus. This
model was developed based on observations of honeybees: specifically, that they
exhibit a preference to aggregate in regions with temperatures in the range
34◦ C–38◦ C; that groups of bees are able to identify optimal temperature zones,
but individual bees do not do so; and that specific inter-animal chemical cues
(e.g., pheromones) have not been shown to be important in this collective aggre-
gation [14]. It has previously been used to illustrate light-seeking behaviour in a
swarm of robots [9]. Here we simulate the bees in a thermal environment.
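To make the aggregation mechanism easier to follow, the sketch below gives a minimal Python reading of a Beeclust-style update step. The waiting-time curve, the helper names, and the agent representation are our own illustrative assumptions rather than the exact model of [14].

```python
import math
import random

def beeclust_step(bee, sense_temperature, detect_collision,
                  w_max=60.0, theta=25.0):
    """One update of a Beeclust-style agent (illustrative sketch only).

    The bee moves straight until it meets something. On meeting another bee it
    stops and waits for a time that grows with the local temperature, so clusters
    form and persist longer in warmer zones; on hitting a wall it simply turns.
    The saturating waiting-time function below is an assumption, not the exact
    mapping used in [14].
    """
    if bee["waiting"] > 0:                 # currently stopped next to other bees
        bee["waiting"] -= 1
        return
    hit = detect_collision(bee)            # expected to return None, "bee" or "wall"
    if hit == "bee":
        t = sense_temperature(bee)         # local stimulus at the meeting point
        bee["waiting"] = w_max * t ** 2 / (t ** 2 + theta ** 2)
        bee["heading"] = random.uniform(0.0, 360.0)
    elif hit == "wall":
        bee["heading"] = random.uniform(0.0, 360.0)
    else:
        # Straight-line motion along the current heading (degrees).
        bee["x"] += math.cos(math.radians(bee["heading"]))
        bee["y"] += math.sin(math.radians(bee["heading"]))
```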
The robot and bee models used in this work are the same as those in [7] and
for completeness we describe them fully in the remainder of this section.
where $d_{raw}$ is a raw estimate of local bee density in a given timestep, $\bar{m}$ is a time-averaged estimate computed as the mean value of the memory vector m, and $d_x$ is a vector of density estimates received from other robots in the interaction neighbourhood. In this paper, robots have zero, one, or two neighbours depending on the specific topology under test.
We use saturate(s) = min(4, s) in this study. The density-to-heat function maps the time-averaged detection count to an output temperature via a linear transformation, and is parameterised to allow for the different topologies examined. Each involved robot x makes a contribution $c_x = \frac{d_x}{4}(T_{max} - T_{min})$ to a robot's temperature. These are combined as a weighted sum:

$T_{new} = T_{min} + \sum_{x \in \{l,r,c\}} c_x w_x, \qquad (2)$
where the relative weights of each robot’s contribution depends on the specific
setup. Each robot can be influenced by the local environment cl , cross-inhibitory
signals from a competitor cc , and collaborative signals from a specific remote
robot cr . In this paper we use cross-inhibition wc = −0.5 throughout. When a
robot has an incoming collaborative link then we set wl = wr = 0.5, and otherwise
set wl = 1 and wr = 0. The topologies tested are shown below.
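As a concrete reading of this weighting scheme, the sketch below implements the contribution and the weighted sum of eq. (2) in Python; the temperature bounds and the final clamping are our own illustrative assumptions, not values taken from the robot controller.

```python
T_MIN, T_MAX = 26.0, 38.0   # assumed output temperature range of a robot

def contribution(density_estimate: float) -> float:
    """c_x = (d_x / 4) * (T_max - T_min), with the density saturated at 4."""
    d = min(4.0, density_estimate)
    return (d / 4.0) * (T_MAX - T_MIN)

def new_temperature(d_local: float, d_remote: float, d_competitor: float,
                    has_collab_link: bool) -> float:
    """Weighted combination of local, remote (collaborative) and competitor terms."""
    w_c = -0.5                                         # cross-inhibition weight
    w_l, w_r = (0.5, 0.5) if has_collab_link else (1.0, 0.0)
    c = {"l": contribution(d_local),
         "r": contribution(d_remote),
         "c": contribution(d_competitor)}
    w = {"l": w_l, "r": w_r, "c": w_c}
    t_new = T_MIN + sum(c[x] * w[x] for x in ("l", "r", "c"))
    return max(T_MIN, min(T_MAX, t_new))               # clamping is our assumption

# Example: a robot with an incoming collaborative link and many bees nearby.
print(new_temperature(d_local=4, d_remote=1, d_competitor=0, has_collab_link=True))
```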
3 Simulation Experiments
This paper aims to examine the sensitivity of coordinating collective decision-
making between arenas as a function of inter-arena communication links. To
address this aim, we examine a range of different topologies within the limits
examined in prior work, employing a basic setup that uses two identical arenas,
each comprising a population of bees and two robots. We vary the inter-arena
links, keeping the link weights positive where present. We use six different topolo-
gies that vary in the number and direction of coupling that they provide between
Fig. 3. Topologies tested in the multi-arena experiments. Solid lines indicate positive
contributions and dashed lines indicate negative contributions, with respect to the
receiving robot. From top-left to bottom-right, the two arenas become more loosely
coupled.
the two arenas. Fig. 3 shows these topologies. Moving from top-left to bottom-
right, t-a has the strongest coupling; t-b and t-c have some reciprocal paths;
t-d and t-e have links in one direction only. t-f is the other extreme without
any between-arena links, which we use to establish a baseline for the other out-
comes. These motifs give broad coverage of the space and while other topologies
are possible with more links or more classes of link, our motivation to understand
sparser networks is better served by these networks with fewer rather than more
inter-arena links.
Our prior work showed strong coordination under (a) and confirmed that
the absence of links (f) does not lead to coordination [7]. Intuitively, we expect
a weaker ability to coordinate as the links become sparser; however, we do not
know what the limits are or how gracefully the system will degrade.
We run 50 independent repeats for each of the six topologies, each experiment
lasting for 15 mins. Fig. 5 shows the frequency of statistically significant collective decisions made, for each of the topologies. Fig. 4 provides a slightly different view of the experiments by showing the mean percentage of bees that were present in the East zone during the last 120 s. All 50 repeats in each topology have a point plotted
in this graph, and while it is not always the case that the points in the extremes
correspond to a significant collective decision at the time of measurement, the
two views are strongly linked.
Considering the distributions shown in Fig. 5, we perform the following statis-
tical tests to identify the collective decision-making and coordination that arises
Fig. 4. Final states of each of the topologies, with one point shown per experiment.
When the system has coordinated the decisions made in each arena, the points only
appear in two of the four corners (indicating that the decisions are mutually con-
strained).
under each topology. Using a χ2 test with a null model of equal likelihood for each of the four possible decision pairings, topologies t-e and t-f do not deviate significantly from the null model (p > 0.05). The other four topologies all deviate significantly, i.e., they exhibit coordination between the two arenas. We also compare the overall rate of collective decision-making across different topologies. The three cases that use two links (t-b, t-c, t-d) have similar ability to induce coordinated collective decisions as t-a (binomial test, p > 0.05). Comparing the rate of collective decision within the two arenas separately (i.e., any of the four outcomes), t-b, t-c, t-e have significantly lower rates than t-a (binomial test, p < 0.05); however, although the t-d rate is lower, it is not significantly lower (p > 0.05). None of the topologies exhibit a bias towards either coordinated outcome (binomial test, p > 0.05), with the exception of t-d (p < 0.05).
Overall, these results show that: (i) All topologies with two or more links
coupling the two arenas are able to coordinate the decision-making. t-e, the
sparsest topology, is not able to reliably coordinate the decision-making (nor
is the unlinked case t-f but of course this is to be expected). (ii) Most of the
sparser topologies are less frequently able to induce collective decision-making
than the most tightly linked case t-a. t-d is a marginal exception in this regard.
(iii) t-d exhibits some anomalies regarding a bias towards the WW outcome over
the EE outcome. Given the absence of bias in the model or the topology, this is
somewhat surprising and requires further investigation to identify the source of
this bias.
Fig. 5. Frequency of runs with significant collective decisions, from 50 repeats. Values in
brackets indicate the frequency of coordinated collective decisions made, i.e., lumping
together both populations. Other values indicate assessment of the two populations
separately.
Fig. 6. Example trajectories showing the fraction of bees in the East half of each arena, annotated with thick, solid lines where the collective decisions are made.
References
1. Nadell, C.D., Xavier, J.B., Foster, K.R.: The sociobiology of biofilms. FEMS Micro-
biol. Rev. 33(1), 206–224 (2009)
2. King, A.J., Cowlishaw, G.: Leaders, followers, and group decision-making.
Commun. Integr. Biol. 2(2), 147–150 (2009)
3. Faria, J.J., Dyer, J.R., Clément, R.O., Couzin, I.D., Holt, N., Ward, A.J.,
Waters, D., Krause, J.: A novel method for investigating the collective behaviour
of fish: introducing ‘robofish’. Behav. Ecol. Sociobiol. 64(8), 1211–1218 (2010)
4. Parrish, J.K., Edelstein-Keshet, L.: Complexity, pattern, and evolutionary trade-
offs in animal aggregation. Science 284(5411), 99–101 (1999)
5. Halloy, J., Sempo, G., Caprari, G., Rivault, C., Asadpour, M., Tâche, F.,
Said, I., Durier, V., Canonge, S., Amé, J.M., et al.: Social integration of robots
into groups of cockroaches to control self-organized choices. Science 318(5853),
1155–1158 (2007)
6. De Schutter, G., Theraulaz, G., Deneubourg, J.L.: Animal-robots collective intel-
ligence. Ann. Math. Artif. Intel. 31(1–4), 223–238 (2001)
7. Mills, R., Zahadat, P., Silva, F., Miklic, D., Mariano, P., Schmickl, T., Correia, L.:
Coordination of collective behaviours in spatially separated agents. In: Procs.
ECAL (2015)
8. Webb, B.: Can robots make good models of biological behaviour? Behav. Brain.
Sci. 24(06), 1033–1050 (2001)
9. Kernbach, S., Thenius, R., Kernbach, O., Schmickl, T.: Re-embodiment of hon-
eybee aggregation behavior in an artificial micro-robotic system. Adapt. Behav.
17(3), 237–259 (2009)
10. Campo, A., Garnier, S., Dédriche, O., Zekkri, M., Dorigo, M.: Self-organized dis-
crimination of resources. PLoS ONE 6(5), e19888 (2011)
11. Grodzicki, P., Caputa, M.: Social versus individual behaviour: a comparative app-
roach to thermal behaviour of the honeybee (Apis mellifera L.) and the American
cockroach (Periplaneta americana L.). J. Insect. Physiol. 51(3), 315–322 (2005)
12. Szopek, M., Schmickl, T., Thenius, R., Radspieler, G., Crailsheim, K.: Dynamics
of collective decision making of honeybees in complex temperature fields. PLoS
ONE 8(10), e76250 (2013)
13. Griparic, K., Haus, T., Bogdan, S., Miklic, D.: Combined actuator sensor unit for
interaction with honeybees. In: Sensor Applications Symposium (2015)
14. Schmickl, T., Thenius, R., Moeslinger, C., Radspieler, G., Kernbach, S.,
Szymanski, M., Crailsheim, K.: Get in touch: cooperative decision making based on
robot-to-robot collisions. Auton. Agent. Multi. Agent. Syst. 18(1), 133–155 (2009)
15. Gautrais, J., Michelena, P., Sibbald, A., Bon, R., Deneubourg, J.L.: Allelomimetic
synchronization in merino sheep. Anim. Behav. 74(5), 1443–1454 (2007)
A Differential Evolution Algorithm
for Optimization Including Linear
Equality Constraints
1 Introduction
Constrained optimization problems are common in many areas, and due to the
growing complexity of the applications tackled, nature-inspired metaheuristics
in general, and evolutionary algorithms in particular, are becoming increasingly
popular. That is due to the fact that they can be readily applied to situations
where the objective function(s) and/or constraints are not known as explicit
functions of the decision variables, and when potentially expensive computer
models must be run in order to compute the objective function and/or check the
constraints every time a candidate solution needs to be evaluated.
As move operators are usually blind to the constraints (i.e. when operating
upon feasible individuals they do not necessarily generate feasible offspring) stan-
dard metaheuristics must be equipped with a constraint handling technique. In
simpler situations, repair techniques [18], special move operators [19], or special
decoders [9] can be designed to ensure that all candidate solutions are feasible.
We will not attempt to survey the current literature on constraint handling here,
and the reader is referred to [4], [13], and [5].
2 The Problem
E = {x ∈ Rn : Ex = c}
A vector d ∈ Rn is said to be a feasible direction at the point x ∈ E if x + d
is feasible: E(x + d) = c. It follows that the feasible direction d must satisfy
Ed = 0 or, alternatively, that any feasible direction belongs to the null space of
the matrix E
N (E) = {x ∈ Rn : Ex = 0}
Now, given two feasible vectors x1 and x2 it is easy to see that d = x1 − x2 is
a feasible direction, as E(x1 − x2 ) = 0. As a result, one can see that the standard
mutation formulae adopted within DE (see Section 3) would always generate a
feasible vector whenever the vectors involved in the differences are themselves
feasible. If crossover is avoided, one could start from a feasible random initial
population and proceed, always generating feasible individuals.
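As a quick numerical illustration of this closure property (our own sketch, not the authors' code), the snippet below builds three feasible vectors for a random linear system and checks that a standard DE/rand/1 mutant x1 + F(x2 − x3) still satisfies Ev = c:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, F = 6, 2, 0.7
E = rng.normal(size=(m, n))
c = rng.normal(size=m)

# Feasible parents: a particular solution of Ex = c plus null-space noise.
x0, *_ = np.linalg.lstsq(E, c, rcond=None)
P = np.eye(n) - E.T @ np.linalg.pinv(E @ E.T) @ E    # projector onto N(E)
x1, x2, x3 = (x0 + P @ rng.normal(size=n) for _ in range(3))

# DE/rand/1 mutation without crossover: the difference term lies in N(E).
v = x1 + F * (x2 - x3)
print(np.allclose(E @ v, c))    # True: the mutant remains feasible
```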
3 Differential Evolution
Differential evolution (DE) [20] is a simple and effective algorithm for global optimization, especially over continuous variables. Its basic operation adds to each design variable of a given candidate solution a term that is the scaled difference between the values of that variable in other candidate solutions of the population. The number of differences applied, the way in which the individuals are selected, and the type of recombination performed define the DE variant (also called the DE strategy). Although many DE variants can be found in the literature [17], the simplest one (DE/rand/1/bin) is adopted here:
where f (x) is the average of the objective function values in the current popu-
lation and vl (x) is the violation of the l-th constraint averaged over the current
population. The idea is that the values of the penalty coefficients should be dis-
tributed in a way that those constraints which are more difficult to be satisfied
should have a relatively higher penalty coefficient.
4 The Proposal
Differently from penalty or selection schemes, and from special decoders, the
proposed DE algorithm that satisfies linear equality constraints is classified as a
feasibility preserving approach.
In order to generate a feasible initial population of size NP one could think of
starting from a feasible vector x0 and proceed by moving from x0 along random
feasible directions di : xi = x0 + di , i = 1,2, . . . , NP. A random feasible direction
can be obtained by projecting a random vector onto the null space of E. The
projection matrix is given by [12]
$P_{N(E)} = I - E^T (E E^T)^{-1} E \qquad (1)$

where the superscript T denotes transposition. Random feasible candidate solutions can then be generated as

$x_i = x_0 + P_{N(E)} v_i, \quad i = 1, 2, \dots, NP,$

where $v_i \in R^n$ is randomly generated and $x_0$ is computed as $x_0 = E^T (E E^T)^{-1} c$. It is clear that $x_0$ is a feasible vector, as $E x_0 = E E^T (E E^T)^{-1} c = c$.
It should be mentioned that the matrix inversion in eq. 1 is not actually performed and, thus, $P_{N(E)}$ is never explicitly computed (see Algorithm 1).
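Since Algorithm 1 is not reproduced above, the following is only one plausible way to implement the feasible initialization: x0 and the projections are obtained by solving linear systems with E E^T instead of forming its inverse. Function and variable names are ours.

```python
import numpy as np

def feasible_initial_population(E, c, NP, scale=1.0, seed=0):
    """Generate NP vectors x_i satisfying E @ x_i = c without forming (E E^T)^{-1}."""
    rng = np.random.default_rng(seed)
    _, n = E.shape
    y = np.linalg.solve(E @ E.T, c)            # (E E^T) y = c
    x0 = E.T @ y                               # x0 = E^T (E E^T)^{-1} c
    pop = np.empty((NP, n))
    for i in range(NP):
        v = scale * rng.normal(size=n)
        w = np.linalg.solve(E @ E.T, E @ v)    # (E E^T) w = E v
        pop[i] = x0 + (v - E.T @ w)            # x0 + P_{N(E)} v
    return pop

# Constraints of test-problem 1: x1 + 3x2 = 4, x3 + x4 - 2x5 = 0, x2 - x5 = 0.
E = np.array([[1.0, 3.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, -2.0],
              [0.0, 1.0, 0.0, 0.0, -1.0]])
c = np.array([4.0, 0.0, 0.0])
pop0 = feasible_initial_population(E, c, NP=5)
print(np.allclose(pop0 @ E.T, c))   # every individual satisfies the constraints
```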
The Differential Evolution for Linear Equality Constraints (DELEqC) algo-
rithm is defined as DE/rand/1/bin, equipped with the feasible initial population
generation procedure in Algorithm 1, and running without crossover.
Notice that any additional non-linear equality or inequality constraint can
be dealt with via existing constraint handling techniques, such as those in
sections 3.1 and 3.2.
5 Computational Experiments
In order to test the proposal (DELEqC) and assess its performance, a set of test
problems with linear equality constraints was taken from the literature (their
descriptions are available in Appendix A). The results produced are then compared with those from alternative procedures available in the metaheuristics literature [15,16], as well as with DE runs that use established constraint handling techniques (Deb's selection scheme and an adaptive penalty method) to enforce the linear equality constraints. One hundred independent runs were executed in
all experiments.
Initially, computational experiments were performed aiming at selecting the
values for population size (NP) and F. The tested values here are NP ∈ {5, 10, 20,
30, . . . , 90, 100} and F ∈ {0.1, 0.2, . . . , 0.9, 1}. Due to the large number of combi-
nations, performance profiles [7] were used to identify the parameters which
generate the best results. We adopted the maximum budget allowed in [15,16]
as a stop criterion and the final objective function value as the quality metric.
The area under the performance profiles curves [1] indicates that the best per-
forming parameters according to these rules are NP = 50 and F = 0.7, and these
values were then used for all DEs in the computational experiments.
5.1 Results
First, we analyze how fast the proposed technique finds the best known solution of each test problem when compared with a DE using (i) Deb's selection scheme (DE+DSS) or (ii) an adaptive penalty method (DE+APM). The objective in this test case is to verify whether DELEqC is able to obtain the best known solutions using a similar number of objective function evaluations. We used CR = 0.9, NP = 50, and F = 0.7 for both DE+DSS and DE+APM. Statistical information (best, median, mean, standard deviation, worst) was obtained from 100 independent runs. The number of successful runs (sr) is also shown. A successful run is one in which the best known solution is found (absolute error below a prescribed tolerance).
Table 2. Results for the test-problems 7, 8, 9, 10, and 11 using the reference budget
(rb) and 2 × rb.
Table 3. Results for the test-problems 7, 8, 9, 10, and 11 using 3 × rb and 4 × rb.
6 Concluding Remarks
This paper presented DELEqC, a differential evolution algorithm which maintains feasibility with respect to the linear equality constraints along
the search process. Results from the computational experiments indicate that
DELEqC outperforms the few alternatives that could be found in the literature
and is a useful additional tool for the practitioner.
Further ongoing work concerns the extension of DELEqC so that linear
inequality constraints are also exactly satisfied, as well as the introduction of
a crossover operator that maintains feasibility with respect to the linear equality
constraints.
Acknowledgments. The authors would like to thank the reviewers for their com-
ments, which helped improve the paper, and the support provided by CNPq (grant
310778/2013-1), CAPES, and Pós-Graduação em Modelagem Computacional da Uni-
versidade Federal de Juiz de Fora (PGMC/UFJF).
References
1. Barbosa, H.J.C., Bernardino, H.S., Barreto, A.M.S.: Using performance profiles
to analyze the results of the 2006 CEC constrained optimization competition. In:
IEEE Congress on Evolutionary Computation, pp. 1–8 (2010)
2. Barbosa, H.J.C., Lemonge, A.C.C.: An adaptive penalty scheme in genetic algo-
rithms for constrained optimization problems. In: Langdon, W.B., et al. (ed.) Proc.
of the Genetic and Evolutionary Computation Conference. USA (2002)
3. Barbosa, H.J.C., Lemonge, A.C.C.: A new adaptive penalty scheme for genetic
algorithms. Information Sciences 156, 215–251 (2003)
4. Coello, C.A.C.: Theoretical and numerical constraint-handling techniques used
with evolutionary algorithms: a survey of the state of the art. Computer Methods
in Applied Mechanics and Engineering 191(11–12), 1245–1287 (2002)
5. Datta, R., Deb, K. (eds.): Evolutionary Constrained Optimization. Infosys Science
Foundation Series. Springer, India (2015)
6. Deb, K.: An efficient constraint handling method for genetic algorithms. Comput.
Methods Appl. Mech. Engrg 186, 311–338 (2000)
7. Dolan, E., Moré, J.J.: Benchmarking optimization software with performance pro-
files. Math. Programming 91(2), 201–213 (2002)
8. Hock, W., Schittkowski, K.: Test Examples for Nonlinear Programming Codes.
Springer-Verlag New York Inc., Secaucus (1981)
9. Koziel, S., Michalewicz, Z.: A decoder-based evolutionary algorithm for con-
strained parameter optimization problems. In: Eiben, A.E., Bäck, T., Schoenauer,
M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 231–240. Springer,
Heidelberg (1998)
10. Koziel, S., Michalewicz, Z.: Evolutionary algorithms, homomorphous mappings,
and constrained parameter optimization. Evol. Comput. 7(1), 19–44 (1999)
11. Lemonge, A.C.C., Barbosa, H.J.C.: An adaptive penalty scheme for genetic
algorithms in structural optimization. Intl. Journal for Numerical Methods in
Engineering 59(5), 703–736 (2004)
12. Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. Springer-Verlag
(2008)
13. Mezura-Montes, E., Coello, C.A.C.: Constraint-handling in nature-inspired numer-
ical optimization: Past, present and future. Swarm and Evolutionary Computation
1(4), 173–194 (2011)
14. Michalewicz, Z., Janikow, C.Z.: Genocop: A genetic algorithm for numerical opti-
mization problems with linear constraints. Commun. ACM 39(12es), 175–201
(1996)
15. Monson, C.K., Seppi, K.D.: Linear equality constraints and homomorphous map-
pings in PSO. IEEE Congress on Evolutionary Computation 1, 73–80 (2005)
16. Paquet, U., Engelbrecht, A.P.: Particle swarms for linearly constrained optimisa-
tion. Fundamenta Informaticae 76(1), 147–170 (2007)
17. Price, K.V.: An introduction to differential evolution. New Ideas in Optimization,
pp. 79–108 (1999)
18. Salcedo-Sanz, S.: A survey of repair methods used as constraint handling techniques
in evolutionary algorithms. Computer Science Review 3(3), 175–192 (2009)
19. Schoenauer, M., Michalewicz, Z.: Evolutionary computation at the edge of feasi-
bility. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.) PPSN
1996. LNCS, vol. 1141, pp. 245–254. Springer, Heidelberg (1996)
20. Storn, R., Price, K.V.: Differential evolution - a simple and efficient heuristic for
global optimization over continuous spaces. Journal of Global Optimization 11,
341–359 (1997)
Appendix A. Test-Problems
Test-problems 1 to 6 and 7 to 11 were taken from [8] and [15], respectively. Also
note that problems 7 to 11 are subject to the same set of linear constraints (2).
Problem 1 - The solution is $x^* = (1, 1, 1, 1, 1)^T$ with $f(x^*) = 0$.

$\min \; (x_1 - x_2)^2 + (x_2 + x_3 - 2)^2 + (x_4 - 1)^2 + (x_5 - 1)^2$
s.t. $x_1 + 3x_2 = 4$
     $x_3 + x_4 - 2x_5 = 0$
     $x_2 - x_5 = 0$
$\min_{x \in E} \; \frac{1}{4000}\sum_{i=1}^{10} x_i^{2} \;-\; \prod_{i=1}^{10} \cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1$
Multiobjective Firefly Algorithm for Variable
Selection in Multivariate Calibration
1 Introduction
Multivariate calibration may be considered as a procedure for constructing a
mathematical model that establishes the relationship between the properties
measured by an instrument and the concentration of a sample to be deter-
mined [3]. However, building a model from a subset of explanatory variables usually involves some conflicting objectives, such as extracting information from measured data with many possible independent variables. Thus, a tech-
nique called variable selection may be used [3]. In this sense, the development of
efficient algorithms for variable selection becomes important in order to deal with
large and complex data. Furthermore, the application of Multiobjective Opti-
mization (MOO) may significantly contribute to efficiently construct an accurate
model [8].
Previous works on multivariate calibration have demonstrated that, while the mono-objective formulation uses a larger number of variables, multiobjective algorithms can use fewer variables with a lower prediction error [2][1]. However, such works have used only genetic algorithms for exploiting MOO; applying MOO within other bio-inspired metaheuristics such as the Firefly Algorithm (FA) may be a better alternative in order to obtain a model with
a more appropriate prediction capacity [8]. In this sense, some works have used
FA to solve many types of problems. Regarding the multiobjective setting, Yang [8] was the first to present a multiobjective FA (MOFA) to solve
optimization problems and showed that MOFA has advantages in dealing with
multiobjective optimization.
As far as we know, MOO-based Firefly Algorithms are still not widely used, and there is no work in the literature that uses a multiobjective FA to select variables in multivariate calibration. Therefore, this paper presents
an implementation of a MOFA for variable selection in multivariate calibra-
tion models. Additionally, estimates from the proposed MOFA are compared
with predictions from the following traditional algorithms: Successive Projec-
tions Algorithm (SPA-MLR) [6], Genetic Algorithm (GA-MLR) [1] and Partial
Least Squares (PLS). Based on the results obtained, we concluded that our pro-
posed algorithm may be a more viable tool for variable selection in multivariate
calibration models.
Section 2 describes multivariate calibration and the original FA. The pro-
posed MOFA is presented in Section 3. Section 4 describes the material and
methods used in the experiments. Results are described in Section 5. Finally,
Section 6 shows the conclusions of the paper.
2 Background
2.1 Multivariate Calibration
The multivariate calibration model provides the value of a quantity y based on
values measured from a set of explanatory variables {x1 , x2 , . . . , xk }T [3]. The
model can be defined as:
y = β0 + β1 x1 + ... + βk xk + ε, (1)
where β0, β1, . . . , βk are the coefficients to be determined, and ε is a portion of random error. Equation (2) shows how the regression coefficients
may be calculated using the Moore-Penrose pseudoinverse [4]:
$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{e_i}{y_i}\right|, \qquad (4)$
where yi is the actual data at variable i, ŷi is the forecast at variable i, ei is the
forecast error at variable i, and N is the number of samples.
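For reference, both computations mentioned above can be written in a few lines of Python: the regression coefficients via the Moore–Penrose pseudoinverse (the standard least-squares form, since equation (2) is not reproduced here) and the MAPE of equation (4). The toy data are illustrative only.

```python
import numpy as np

def regression_coefficients(X, y):
    """Least-squares estimate of (beta_0, ..., beta_k) using the pseudoinverse.
    X has one row per sample; a column of ones is prepended for the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.pinv(X1) @ y

def mape(y_true, y_pred):
    """Mean absolute percentage error: (100/N) * sum(|(y_i - yhat_i) / y_i|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Toy example with two explanatory variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0]])
y = np.array([5.1, 4.0, 8.0, 7.9])
beta = regression_coefficients(X, y)
print(beta, mape(y, np.column_stack([np.ones(len(X)), X]) @ beta))
```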
3 Proposal
Previous works have shown that multiobjective algorithms can use fewer vari-
ables and obtain lower prediction error [2]. Thus, this paper presents a Multiob-
jective Firefly Algorithm (MOFA) for variable selection in multivariate calibra-
tion. In the multiobjective formulation of FA, the choice of the current best solution is based on two criteria: (i) the prediction error; and (ii) the number of variables selected. Among the non-dominated solutions, a multiobjective decision-maker method based on the Wilcoxon signed-rank test1 is applied to choose the final best one [2]. Algorithm 1 shows a pseudocode for the proposed MOFA. In line 9 of Algorithm 1, a firefly i dominates another firefly j when both its RMSEP/MAPE and its number of selected variables are lower.
1 The Wilcoxon signed-rank test is a nonparametric hypothesis test used when comparing two related samples to evaluate whether their population mean ranks differ [7].
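Algorithm 1 itself is not reproduced above, but the dominance test of line 9 can be sketched as follows; the Firefly record and the helper names are our own simplifications (each firefly is summarised only by its prediction error and its number of selected variables).

```python
from dataclasses import dataclass

@dataclass
class Firefly:
    error: float        # RMSEP (or MAPE) of the model encoded by the firefly
    n_variables: int    # number of selected variables

def dominates(i: Firefly, j: Firefly) -> bool:
    """Firefly i dominates j when both its error and its variable count are lower."""
    return i.error < j.error and i.n_variables < j.n_variables

def non_dominated(population):
    """Keep the fireflies that are not dominated by any other firefly."""
    return [f for f in population
            if not any(dominates(g, f) for g in population if g is not f)]

swarm = [Firefly(0.08, 90), Firefly(0.07, 110), Firefly(0.09, 80), Firefly(0.10, 120)]
print(non_dominated(swarm))   # the last firefly is dominated by the first one
```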
4 Experimental Results
The proposed MOFA was implemented using α = 0.2, γ = 1 and ω0 = 0.97. The number of fireflies and the number of generations were 200 and 100, respectively. For the RMSEP comparison we used three traditional methods for variable selection: SPA-MLR [6], GA-MLR [1] and PLS. The number of iterations was the same for all algorithms, and the multiobjective approach was not applied in these three traditional methods.
The dataset employed in this work consists of 775 NIR spectra of whole-kernel
wheat, which were used as shoot-out data in the 2008 International Diffuse
Reflectance Conference (http://www.idrc-chambersburg.org/shootout.html).
Protein content (%) was used as the y-property in the regression calculations.
All calculations were carried out by using a desktop computer with an Intel
Core i7 2600 (3.40 GHz), 8 GB of RAM memory and Windows 7 Professional.
The Matlab 8.1.0.604 (R2013a) software platform was employed throughout.
Regarding the outcomes, it is important to note that all of them were obtained
by averaging fifty executions.
Fig. 1. Behaviour of fireflies with: (a) randomly generated fireflies; (b) monoobjective formulation. (Axes: RMSEP vs. number of variables.)
[Plot: number of variables vs. RMSEP.]
Table 1. Results for the FA, MOFA, SPA-MLR, GA-MLR and PLS.
6 Conclusion
Acknowledgments. Authors thank the research agencies CAPES and FAPEG for
the support provided to this work.
References
1. Soares, A.S., de Lima, T.W., Soares, F.A.A.M.N., Coelho, C.J., Federson, F.M.,
Delbem, A.C.B., Van Baalen, J.: Mutation-based compact genetic algorithm for
spectroscopy variable selection in determining protein concentration in wheat grain.
Electronics Letters 50, 932–934 (2014)
2. Lucena, D.V., Soares, A.S., Soares, T.W., Coelho, C.J.: Multi-Objective Evolu-
tionary Algorithm NSGA-II for Variables Selection in Multivariate Calibration
Problems. International Journal of Natural Computing Research 3, 43–58 (2012)
3. Martens, H.: Multivariate Calibration. John Wiley & Sons (1991)
4. Paula, L.C.M., Soares, A.S., Soares, T.W., Delbem, A.C.B., Coelho, C.J., Filho,
A.R.G.: Parallelization of a Modified Firefly Algorithm using GPU for Variable
Selection in a Multivariate Calibration Problem. International Journal of Natural
Computing Research 4, 31–42 (2014)
5. Hibon, M., Makridakis, S.: Evaluating Accuracy (or Error) Measures. INSEAD
(1995)
6. Araújo, M.C.U., Saldanha, T.C., Galvão, R.K., Yoneyama, T.: The successive pro-
jections algorithm for variable selection in spectroscopic multicomponent analysis.
Chemometrics and Intelligent Laboratory Systems 57, 65–73 (2001)
7. Ramsey, P.H.: Significance probabilities of the wilcoxon signed-rank test. Journal of
Nonparametric Statistics 2, 133–153 (1993)
8. Yang, X.S.: Multiobjective firefly algorithm for continuous optimization. Engineer-
ing with Computers 29, 175–184 (2013)
2 It is worth noting that the Successive Projections Algorithm is composed of three phases, and its main objective is to select a subset of variables with low collinearity [6].
Semantic Learning Machine: A Feedforward
Neural Network Construction Algorithm
Inspired by Geometric Semantic Genetic
Programming
1 Introduction
Moraglio et al. [6] recently proposed a new Genetic Programming formulation
called Geometric Semantic Genetic Programming (GSGP). GSGP derives its
name from the fact that it is formulated under a geometric framework [5] and
from the fact that it operates directly in the space of the underlying semantics
of the individuals. In this context, semantics is defined as the outputs of an
individual over a set of data instances. The most interesting property of GSGP
is that the fitness landscape seen by its variation operators is always unimodal
with a linear slope (cone landscape) by construction. This implies that there
are no local optima, and consequently, that this type of landscape is easy to
search. When applied to multidimensional real-life datasets, GSGP has shown
competitive results in learning and generalization [3,7]. In this paper, we adapt
the two subtrees TR1 and TR2) results in significant improvements in terms of generalization ability [3]. In fact, if an unbounded mutation (equivalent to using a linear activation function) is used, there is a tendency for GSGP to greatly overfit the training data [3]. For this reason, it is recommended that the activation function for the neurons in the last hidden layer be a function with a relatively small codomain. In this work a modified logistic function (transforming the logistic function output to range in the interval [−1, 1]) is used for this purpose. In terms of generalization ability, it is also essential to use a small learning/mutation step [3]. If more than one hidden layer is used, the activation functions for the remaining neurons may be freely chosen.
2.2 Algorithm
The SLM algorithm is essentially a geometric semantic hill climber for feedfor-
ward neural networks. The idea is to perform a semantic sampling with a given
size (SLM parameter) by applying the mutation operator defined in the previ-
ous subsection. As is common in hill climbers, only one solution (in this case a
neural network) is kept along the run. At each iteration, the mentioned semantic
sampling is performed to produce N neighbors. At the end of the iteration, the
best individual from the previous best and the newly generated individuals is
kept. The process is repeated until a given number of iterations (SLM param-
eter) has been reached. As mentioned in the previous subsection, the mutation
operator always adds a new neuron to the last hidden layer, so the number of
neurons in the last hidden layer is at most the same as the number of iterations.
This number of neurons can be smaller than the number of iterations if in some
iterations it was not possible to generate an individual superior to the current
best.
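The overall control flow described above can be summarised by the hill-climbing sketch below; `mutate_add_neuron` and `fitness` are placeholders for the geometric semantic mutation and the RMSE evaluation, so this is a schematic of the procedure rather than the authors' implementation.

```python
import copy

def semantic_learning_machine(initial_network, fitness, mutate_add_neuron,
                              sample_size=100, iterations=500):
    """Geometric semantic hill climber: keep a single network, sample
    `sample_size` mutated neighbours per iteration, retain the best of the
    current network and its neighbours (lower RMSE is better)."""
    best = initial_network
    best_fit = fitness(best)
    for _ in range(iterations):
        parent = best
        for _ in range(sample_size):
            child = mutate_add_neuron(copy.deepcopy(parent))
            child_fit = fitness(child)
            if child_fit < best_fit:
                best, best_fit = child, child_fit
        # If no neighbour improves on the current best, no neuron is added in
        # this iteration, which is why the final hidden layer can end up with
        # fewer neurons than the number of iterations.
    return best, best_fit
```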
3 Experimental Setup
The experimental setup is based on the setup of Vanneschi et al. [7] and
Gonçalves et al. [3], since these works recently provided results for GSGP. Exper-
iments are run for 500 iterations/generations because that is where the statistical
comparisons were made in the mentioned works. 30 runs are conducted. Popu-
lation/sample size is 100. Training and testing set division is 70% - 30%. Fitness
is computed as the root mean squared error. Tree initialization is
performed with the ramped half-and-half method, with a maximum depth of 6.
Besides GSGP, the Semantic Stochastic Hill Climber (SSHC) [6] is also used as
baseline for comparison. The variation operators used are the variants defined
for real-value semantics [6]: SGXM crossover for GSGP, and SGMR mutation
for GSGP and SSHC. For GSGP a probability of 0.5 is used for both opera-
tors. The function set contains the four binary arithmetic operators: +, -, *,
and / (protected). No constants are used in the terminal set. Parent selection in
GSGP is based on tournaments of size 4. Also for GSGP, survivor selection is
elitist as the best individual always survives to the next generation. All claims
4 Experimental Study
Figure 1 presents the training and testing error evolution plots for SLM, GSGP
and SSHC. These evolution plots are constructed by taking the median over 30
runs of the training and testing error of the best individuals in the training data.
The mutation/learning step used was 1 for the the Bio and PPB datasets (as in
Vanneschi et al. [7] and Gonçalves et al. [3]), and 10 for the LD50 as it was found,
in preliminary testing, to be a suitable value (other values tested were: 0.1, 1, and
100). A consideration for the different initial values (at iteration/generation 0)
is in order. The SLM presents much higher errors than GSGP/SSHC after the
random initialization. This is explained by the fact that the weights for the SLM
are generated with uniform probability between -1.0 and 1.0, and consequently, the
amount of data fitting is clearly bounded. On the other hand, GSGP and SSHC
have no explicit bound on the random trees and therefore can provide a superior
initial explanation of the data. It is interesting to note that, despite this initial
disadvantage, the SLM compensates with a much higher learning rate. This higher
learning efficiency is confirmed by the statistically significant superiority found in
terms of training error across all datasets, against GSGP (p-values: Bio 2.872 ×
10−11 , PPB 2.872 × 10−11 and LD50 7.733 × 10−10 ), and against SSHC (p-values:
Bio 2.872 × 10−11 , PPB 2.872 × 10−11 and LD50 3.261 × 10−5 ).
This learning superiority is particularly interesting when considering that
the SLM and the SSHC use the exact same geometric semantic mutation oper-
ator. This raise the question: how can two methods with the same variation
operator, the same induced semantic landscape, and the same parametrizations
achieve such different outcomes? The answer lies in the different semantic dis-
tributions that result from the random initializations. Different representations
have different natural ways of being randomly initialized. This translates into
different semantic distributions and, consequently, to different offspring distri-
butions. From the results it is clear that the distribution induced by the random
initialization of a list of weights (used in SLM), is more well-behaved than the
initialization of a random tree (used in SSHC). In the original GSGP proposal,
Moraglio et al. [6] provided a discussion on whether syntax (representation)
matters in terms of search. They argued that, in abstract, the offspring distri-
butions may be affected by the different syntax initializations. In our work, we
Fig. 1. Bio (top), PPB (center) and LD50 (bottom) training and testing error evolution plots for SLM, GSGP and SSHC (training error left, testing error right; x-axis: iterations/generations).
can empirically see how different representations induce different offspring dis-
tributions and consequently reach considerably different outcomes. A possible research avenue lies in analyzing the semantic distributions induced by different tree initialization methods, and in proposing new tree initializations that are better behaved.
In terms of generalization, results show that all methods achieve similar
results. The only statistically significant difference shows that the SLM is supe-
rior to GSGP in the Bio dataset (p-value: 1.948 × 10−4 ). However, it seems
that in this case GSGP is still evolving and that in a few more generations it may reach a generalization similar to that of the SLM. On a final note, the evolution
plots also show that SSHC consistently learns the training data faster and better
than GSGP. This should be expected as the semantic space has no local optima
and consequently the search can be focused around the best individual in the
population. These differences are confirmed as statistically significant (p-values:
Bio 2.872 × 10−11 , PPB 2.872 × 10−11 and LD50 1.732 × 10−4 ). There are no
statistically significant differences in terms of generalization.
5 Conclusions
This work presented a novel feedforward Neural Network (NN) construction algo-
rithm, derived from Geometric Semantic Genetic Programming (GSGP). The
proposed algorithm shares the same fitness landscape as GSGP, which enables
an efficient search for any supervised learning problem. Results in regression
datasets show that the proposed NN construction algorithm is able to surpass
GSGP, with statistical significance, in terms of learning the training data. Gen-
eralization results are similar to those of GSGP. Future work involves extending
the experimental analysis to other regression datasets and to provide results for
classification tasks. Comparisons with other NN algorithms and other commonly
used supervised learning algorithms (e.g. Support Vector Machines) are also in
order.
References
1. Archetti, F., Lanzeni, S., Messina, E., Vanneschi, L.: Genetic programming for
computational pharmacokinetics in drug discovery and development. Genetic
Programming and Evolvable Machines 8(4), 413–432 (2007)
2. Gonçalves, I., Silva, S.: Balancing learning and overfitting in genetic program-
ming with interleaved sampling of training data. In: Krawiec, K., Moraglio, A.,
Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831, pp. 73–84.
Springer, Heidelberg (2013)
3. Gonçalves, I., Silva, S., Fonseca, C.M.: On the generalization ability of geometric
semantic genetic programming. In: Machado, P., Heywood, M.I., McDermott, J.,
Castelli, M., Garcı́a-Sánchez, P., Burelli, P., Risi, S., Sim, K. (eds.) Genetic Pro-
gramming. LNCS, vol. 9025, pp. 41–52. Springer, Heidelberg (2015)
4. Gonçalves, I., Silva, S., Melo, J.B., Carreiras, J.M.B.: Random sampling tech-
nique for overfitting control in genetic programming. In: Moraglio, A., Silva, S.,
Krawiec, K., Machado, P., Cotta, C. (eds.) EuroGP 2012. LNCS, vol. 7244,
pp. 218–229. Springer, Heidelberg (2012)
5. Moraglio, A.: Towards a Geometric Unification of Evolutionary Algorithms. Ph.D.
thesis, Department of Computer Science, University of Essex, UK, November 2007
6. Moraglio, A., Krawiec, K., Johnson, C.G.: Geometric semantic genetic program-
ming. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M.
(eds.) PPSN 2012, Part I. LNCS, vol. 7491, pp. 21–31. Springer, Heidelberg (2012)
7. Vanneschi, L., Castelli, M., Manzoni, L., Silva, S.: A new implementation of geo-
metric semantic GP and its application to problems in pharmacokinetics. In:
Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013.
LNCS, vol. 7831, pp. 205–216. Springer, Heidelberg (2013)
Eager Random Search for Differential Evolution
in Continuous Optimization
1 Introduction
Differential evolution [1] presents a class of metaheuristics [2] to solve real-parameter optimization tasks with nonlinear and multimodal objective functions. DE has been used as a very competitive alternative in many practical applications due to its simple and compact structure, its ease of use with few control parameters, as well as its high convergence speed in large problem spaces. However, the performance of DE is not always sufficient to ensure fast convergence to the global optimum. It can easily stagnate, resulting in low precision of the acquired results or even failure.
Hybridization of EAs with local search (LS) techniques can greatly improve
the efficiency of the search. EAs that are augmented with LS for self-refinement
are called Memetic Algorithms (MAs) [3]. Memetic computing has been used
with DE to refine individuals in their neighborhood. Noman and Iba [4] pro-
posed a crossover-based adaptive method to generate offspring in the vicinity of
parents. Many other works apply local search mechanisms to certain individuals
of every generation to obtain possibly even better solutions; see examples in [5], [6], [7]. Other researchers investigate the adaptation of DE control parameters to improve performance [8], [9].
2 Basic DE
DE is a stochastic and population based algorithm with Np individuals in the
population. Every individual in the population stands for a possible solution to
the problem. One of the Np individuals is represented by Xi,g, with i = 1, 2, . . . , Np, where g is the index of the generation. DE has three consecutive steps in every
iteration: mutation, recombination and selection. The explanation of these steps
is given below:
MUTATION. Np mutated individuals are generated using some individuals
of the population. The vector for the mutated solution is called mutant vec-
tor and it is represented by Vi,g . There are some ways to mutate the current
population, but the most common one is called random mutation strategy. This
mutation strategy will be explained below. The other mutation strategies and
their performance are given in [10].
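For completeness, the random mutation strategy (DE/rand/1) referred to above is commonly written as V_i = X_r1 + F(X_r2 − X_r3) with distinct random indices r1, r2, r3 ≠ i; the sketch below is this textbook form, not necessarily the authors' exact code.

```python
import numpy as np

def de_rand_1_mutation(population, i, F, rng):
    """Mutant vector V_i = X_r1 + F * (X_r2 - X_r3), with r1, r2, r3 distinct and != i."""
    Np = population.shape[0]
    candidates = [k for k in range(Np) if k != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return population[r1] + F * (population[r2] - population[r3])

rng = np.random.default_rng(1)
pop = rng.uniform(-5.0, 5.0, size=(60, 10))   # Np = 60 individuals, 10 dimensions
print(de_rand_1_mutation(pop, i=0, F=0.9, rng=rng))
```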
SELECTION. In this last step we compare the fitness of a trial vector with
the fitness of its parent in the population with the same index i, selecting the
individual with the better fitness to enter the next generation. So we compare
Normal Local Search (NLS). In Normal Local Search (NLS), we create a new
trial solution by disturbing the current solution in terms of a normal probability
distribution. This means that, if dimension k is selected for change, the value on
this dimension for trial solution X will be given by
Cauchy Local Search (CLS). In this third local search strategy, we apply
the Cauchy density function in creating trial solutions in the neighborhood. It is
called Cauchy Local search (CLS). A nice property of the Cauchy function is that
it is centered around its mean value while exhibiting a wider distribution than
the normal probability function. The value of trial solution X will be generated
as follows:
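Since the trial-solution formulas are referenced above but not reproduced, the sketch below gives one plausible reading of the three perturbation schemes on a single randomly chosen dimension k; the step-size parameters (and the uniform variant for RLS) are our own assumptions.

```python
import math
import random

def rls_trial(x, lower, upper, rng=random):
    """Random local search: redraw one random dimension uniformly within its bounds."""
    y = list(x)
    k = rng.randrange(len(x))
    y[k] = rng.uniform(lower[k], upper[k])
    return y

def nls_trial(x, sigma=0.1, rng=random):
    """Normal local search: perturb one random dimension with Gaussian noise."""
    y = list(x)
    k = rng.randrange(len(x))
    y[k] += rng.gauss(0.0, sigma)
    return y

def cls_trial(x, gamma=0.1, rng=random):
    """Cauchy local search: perturb one random dimension with Cauchy noise,
    whose heavier tails allow occasional large jumps."""
    y = list(x)
    k = rng.randrange(len(x))
    y[k] += gamma * math.tan(math.pi * (rng.random() - 0.5))   # standard Cauchy draw
    return y

x = [0.5, -1.2, 3.0]
print(nls_trial(x), cls_trial(x))
```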
DE has three main control parameters: population size (Np ), crossover rate (CR)
and the scaling factor (F ) for mutation. The following specification of these
parameters was used in the experiments: N p = 60, CR = 0.85 and F = 0.9. All
the algorithms were applied to the benchmark problems with the aim to find the
best solution for each of them. Every algorithm was executed 30 times on every
function to acquire a fair result for the comparison. The condition to finish the
execution of DE programs is that the error of the best result found is below 10e-8
with respect to the true minimum or the number of evaluations has exceeded
300,000. In DECLS, t = 0.2.
Table 1. Average error of the found solutions on the test problems with random
mutation strategy
We can see in Table 1 that DECLS is the best on all the unimodal functions except Function 4, where it is the second best. On the multimodal functions, DERLS is the best on Functions 8, 10 and 11. DECLS found the exact optimum every time on Functions 12 and 13. The basic DE performed the worst on the multimodal functions. According to the above analysis, we can say that DECLS considerably improves the performance of basic DE with the random mutation strategy, and we also found that DERLS is particularly good on multimodal functions, especially on Function 8, which is the most difficult function. Considering all the functions, the best algorithm is DECLS and the weakest one is the basic DE.
5 Conclusions
Three local search strategies – random (RLS), normal (NLS), and Cauchy (CLS) local search – are introduced and explained as instances of the general ERS method. The use
of different local search strategies from the ERS family leads to variants of the
proposed memetic DE algorithm, which are abbreviated as DERLS, DENLS and
DECLS respectively. The results of the experiments have demonstrated that the
overall ranking of DECLS is superior to the ranking of basic DE and other
memetic DE variants considering all the test functions. In addition, we found
out that DERLS is much better than the other counterparts in very difficult
multimodal functions.
References
1. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for
global optimization over continuous spaces. Journal of Global Optimization 11(4),
341–359 (1997)
2. Xiong, N., Molina, D., Leon, M., Herrera, F.: A walk into metaheuristics for engi-
neering optimization: Principles, methods, and recent trends. International Journal
of Computational Intelligence Systems 8(4), 606–636 (2015)
3. Krasnogor, N., Smith, J.: A tutorial for competent memetic algorithms: Model, tax-
onomy, and design issue. IEEE Transactions on Evolutionary Computation 9(5),
474–488 (2005)
4. Noman, N., Iba, H.: Accelerating differential evolution using an adaptive local search. IEEE Transactions on Evolutionary Computation 12, 107–125 (2008)
5. Ali, M., Pant, M., Nagar, A.: Two local search strategies for differential evolution.
In: Proc. 2010 IEEE Fifth International Conference on Bio-Inspired Computing:
Theories and Applications (BIC-TA), Changsha, China, pp. 1429–1435 (2010)
6. Dai, Z., Zhou, A.: A differential evolution with an orthogonal local search. In:
Proc. 2013 IEEE Congress on Evolutionary Computation (CEC), Cancun, Mexico,
pp. 2329–2336 (2013)
7. Leon, M., Xiong, N.: Using random local search helps in avoiding local optimum in
differential evolution. In: Proc. Artificial Intelligence and Applications, AIA2014,
Innsbruck, Austria, pp. 413–420 (2014)
8. Qin, A., Suganthan, P.: Self-adaptive differential evolution algorithm for numerical
optimization. In: The 2005 IEEE Congress on Evolutionary Computation, vol. 2,
pp. 1785–1791 (2005)
9. Leon, M., Xiong, N.: Greedy adaptation of control parameters in differential evo-
lution for global optimization problems. In: IEEE Conference on Evolutionary
Computation, CEC2015, Japan, pp. 385–392 (2015)
10. Leon, M., Xiong, N.: Investigation of mutation strategies in differential evolution for
solving global optimization problems. In: Rutkowski, L., Korytkowski, M., Scherer,
R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2014, Part I. LNCS,
vol. 8467, pp. 372–383. Springer, Heidelberg (2014)
Learning from Play: Facilitating Character
Design Through Genetic Programming
and Human Mimicry
1 Introduction
Designing intelligence is a sufficiently complex task that it can itself be aided by
the proper application of AI techniques. Here we present a system that mines
human behaviour to create better Game AI. We utilise genetic programming
(GP) to generalise from and improve upon human game play. More importantly,
the resulting representations are amenable to further authoring and develop-
ment. We introduce a GP system for evolving game characters by utilising
recorded human play. The system uses the platformerAI toolkit, detailed in
section 3, and the Java genetic algorithm and genetic programming package
(JGAP) [6]. JGAP provides a system to evolve agents when given a set of com-
mand genes, a fitness function, a genetic selector and an interface to the target
In practice, making a good game is achieved by a good concept and long iterative cycles of refining mechanics and visuals, a process which is resource-consuming. It requires a large number of human testers to evaluate the qualities of a game. Thus, analysing tester feedback and incrementally adapting games to achieve a better play experience is tedious and time consuming. This is where our approach comes into play, by trying to minimise development, manual adaptation and testing time while allowing the developer to remain in full control.
Agent design was initially no more than creating 2D shapes on the screen, e.g. the aliens in SpaceInvaders. Due to early hardware limitations, more complex approaches were not feasible. With more powerful computers it became feasible to integrate more complex approaches from science. In 2002 Isla introduced the BehaviourTree (BT) for the game Halo, later elaborated by Champandard [2]. The BT has become the dominant approach in the industry. BTs are a combination of a decision tree (DT) with a pre-defined set of node types. A related academic predecessor of the BT were the POSH dynamic plans of BOD [1,3].
Generative Approaches [4,7] build models to create better and more appeal-
ing agents. In turn, a generative agent uses machine learning techniques to
increase its capabilities. Using data derived from human interaction with a
game—referred to as human play traces—can allow the game to act on or re-
act to input created by the player. By training on such data it is possible to
derive models able to mimic certain characteristics of players. One obvious dis-
advantage of this approach is that the generated model only learns from the
behaviour exhibited in the data provided to it. Thus, interesting behaviours are
not accessible because they were never exhibited by a player.
In contrast to other generative agent approaches [7,9,15] our system combines
features which allow the generation and development of truly novel agents. The
first is the use of un-authored recorded player input as direct input into our
fitness function. This allows the specification of agents only by playing. The
second feature is that our agents are actual programs in the form of java code
which can be altered and modified after evolving into a desired state, creating
a white-box solution. While Stanley and Miikkulainen [13] use neural networks (NN) to create better agents and enhance games using neuroevolution, we utilise genetic programming [10] for the creation and evolution of artificial players in human-readable and modifiable form. The most comparable approach is that of Perez et al. [9], which uses grammar-based evolution to derive BTs given an
initial set and structure of subtrees. In contrast, we start with a clean slate to
evolve novel agents as directly executable programs.
An approach for finding a way through a level was given by Baumgarten [14] using the A∗ algorithm. This
approach produces agents which are extremely good at winning the level within
a minimum amount of time but at the same time are clearly distinguishable
from actual human players. For games and game designers a less distinguishable
approach is normally more appealing—based on our initial assumptions.
4 Fitness Function
References
1. Bryson, J.J., Stein, L.A.: Modularity and design in reactive intelligence. In: Pro-
ceedings of the 17th International Joint Conference on Artificial Intelligence,
pp. 1115–1120. Morgan Kaufmann, Seattle, August 2001
2. Champandard, A.J.: AI Game Development. New Riders Publishing (2003)
3. Gaudl, S.E., Davies, S., Bryson, J.J.: Behaviour oriented design for real-time-
strategy games - an approach on iterative development for starcraft ai. In: Proceed-
ings of the Foundations of Digital Games, pp. 198–205. Society for the Advance-
ment of Science of Digital Games (2013)
4. Holmgard, C., Liapis, A., Togelius, J., Yannakakis, G.: Evolving personas for player
decision modeling. In: 2014 IEEE Conference on Computational Intelligence and
Games (CIG), pp. 1–8, August 2014
5. Krawiec, K., O’Reilly, U.M.: Behavioral programming: a broader and more detailed
take on semantic gp. In: Proceedings of the 2014 Conference on Genetic and Evo-
lutionary Computation, pp. 935–942. ACM (2014)
6. Meffert, K., Rotstan, N., Knowles, C., Sangiorgi, U.: Jgap-java genetic algorithms
and genetic programming package, September 2000. http://jgap.sf.net (last viewed:
January 2015)
7. Ortega, J., Shaker, N., Togelius, J., Yannakakis, G.N.: Imitating human playing
styles in super mario bros. Entertainment Computing 4(2), 93–104 (2013)
8. Osborn, J.C., Mateas, M.: A game-independent play trace dissimilarity metric. In:
Proceedings of the Foundations of Digital Games. Society for the Advancement of
Science of Digital Games (2014)
9. Perez, D., Nicolau, M., O’Neill, M., Brabazon, A.: Evolving behaviour trees for
the mario ai competition using grammatical evolution. In: Di Chio, C., et al. (eds.)
EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 123–132. Springer, Heidelberg
(2011)
10. Poli, R., Langdon, W.B., McPhee, N.F., Koza, J.R.: A field guide to genetic pro-
gramming. Lulu. com (2008)
11. Schwefel, H.P.P.: Evolution and optimum seeking: the sixth generation. John Wiley
& Sons, Inc. (1993)
12. Smit, S.K., Eiben, A.E.: Comparing parameter tuning methods for evolution-
ary algorithms. In: IEEE Congress on Evolutionary Computation, CEC 2009,
pp. 399–406. IEEE (2009)
13. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting
topologies. Evolutionary Computation 10, 99–127 (2002)
14. Togelius, J., Karakovskiy, S., Baumgarten, R.: The 2009 Mario AI competition. In: 2010 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2010)
15. Togelius, J., Yannakakis, G., Karakovskiy, S., Shaker, N.: Assessing believability.
In: Hingston, P. (ed.) Believable Bots, pp. 215–230. Springer, Heidelberg (2012)
Memetic Algorithm for Solving the 0-1
Multidimensional Knapsack Problem
1 Introduction
The Multidimensional Knapsack Problem (MKP) is a strongly NP-hard combinatorial optimization problem [14]. The MKP has been extensively studied because of its theoretical importance and wide range of applications: many practical engineering design problems can be formulated as an MKP, such as the capital budgeting problem [17] and project selection [2].
The solutions for the MKP can be classified into exact, approximate and hybrid. Exact methods are used for problems of small size; branch and bound, branch and cut, and linear, dynamic and quadratic programming are the principal exact methods used for solving the MKP [13,21]. Approximate methods are used when the problem size is large, although they do not guarantee optimal results. They are mainly based on heuristics such as simulated annealing, tabu search, genetic algorithms, ant colony optimization, particle swarm optimization and harmony search [5,6,20]. Hybrid methods combine two or more exact and/or approximate approaches; they are the most widely used in the field of optimization, especially for the MKP [4,8–12,18].
Subject to: $\sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i \in \{1, \dots, m\}$  (2)
$x_j \in \{0, 1\}, \quad j \in \{1, \dots, n\}$
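To make the formulation above concrete, the following minimal Python sketch evaluates a candidate 0-1 vector against constraints (2). The profit vector p used for the objective is an assumption here, since the objective function (1) is not reproduced in this excerpt, and the instance data are made up.

# Minimal sketch of evaluating a candidate solution for the 0-1 MKP.
# The profit vector `p` (objective (1)) is assumed; constraints follow (2).
from typing import Sequence

def mkp_evaluate(x: Sequence[int], p: Sequence[float],
                 a: Sequence[Sequence[float]], b: Sequence[float]):
    """Return (feasible, total_profit) for a binary solution x."""
    assert all(v in (0, 1) for v in x), "x must be a 0-1 vector"
    feasible = all(
        sum(a[i][j] * x[j] for j in range(len(x))) <= b[i]
        for i in range(len(b))
    )
    profit = sum(p[j] * x[j] for j in range(len(x)))
    return feasible, profit

# Tiny illustrative instance (n = 3 items, m = 2 constraints), values assumed.
p = [10, 7, 4]
a = [[3, 2, 1],
     [2, 4, 2]]
b = [5, 6]
print(mkp_evaluate([1, 1, 0], p, a, b))   # (True, 17)
print(mkp_evaluate([1, 1, 1], p, a, b))   # (False, 21): both capacity rows violated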
4 The Experiments
GA, GA-SA and GA-SLS were implemented in C++ on 2 GHz Intel Core 2
Duo processor and 2 GB RAM. They were tested on the OR-Library [22] 54
benchmarks, with m = 2 to 30 and n = 6 to n = 105 and on the OR-Library
GK [22] with m = 15 to m = 50 and n = 100 to n = 1500. In all experiments the
parameters are chosen empirically such as: the number of iteration N I = 30000,
the population size P S = 100, the waiting time W T = 50, the number of crossing
bites N CB = 1/10. the initial temperature T0 = 50, the walk probability wp =
0.93, the number of local iteration N = 100 and the number of runs is 30.
Results for the SAC-94 Standard Instances. The average fitness (Result), the average gap (GAP), the best (Best) and worst (Worst) fitness, the number of successful runs (NSR), the number of successfully solved instances (NSI) and the rate of successful runs (RSR) were computed from the recorded fitness values, and the average CPU runtime (Time) was also measured. All the results and statistics obtained by GA, GA-SA and GA-SLS are reported in Tables 1-2. GA solved to optimality one instance out of 54, with an average gap of 4,454 %; GA-SA solved 35 instances, with a global gap of 0,093 %, and GA-SLS solved 39 instances, with a global gap of 0,0221 %. GA-SLS reached the optimum at least once in 50 instances, followed by GA-SA with 49 instances and GA with 18 instances. The RSR values show that GA-SLS completely solved the instances of the groups hp, pb
GA GA-SA GA-SLS
Dataset Opt Result GAP Result GAP Result GAP
hp 3418 3381,07 1,080 3418 0 3418 0
3186 3120,63 2,052 3186 0 3186 0
Average 3302 3250,85 1,566 3302 0 3302 0
3090 3060,27 0,962 3090 0 3090 0
3186 3139,13 1,471 3186 0 3186 0
pb 95168 93093,5 2,180 95168 0 95168 0
2139 2079,93 2,762 2139 0 2139 0
776 583,767 24,772 776 0 776 0
1035 1018,13 1,630 1035 0 1035 0
Average 17565,666 17162,454 5,629 17565,666 0 17565,666 0
87061 86760,1 0,346 87061 0 87061 0
4015 4015 0 4015 0 4015 0
pet 6120 6091 0,474 6120 0 6120 0
12400 12380,3 0,159 12400 0 12400 0
10618 10560,9 0,538 10609,1 0,084 10608,6 0,089
16537 16373,9 0,986 16528,1 0,054 16528,3 0,053
Average 22791,833 22696,866 0,417 22788,866 0,023 22788,816 0,024
sento 7772 7606,03 2,135 7772 0 7772 0
8722 8569,7 1,746 8721,2 0,009 8722 0
Average 8247 8087,865 1,941 8246,6 0,005 8247 0
141278 141263 0,011 141278 0 141278 0
130883 130857 0,020 130883 0 130883 0
95677 94496,2 1,234 95677 0 95677 0
weing 119337 118752 0,490 119337 0 119337 0
98796 97525,3 1,286 98796 0 98796 0
130623 130590 0,025 130623 0 130623 0
1095445 1086484,2 0,818 1094579,6 0,079 1095432,7 0,0011
624319 581683 6,829 623727 0,095 624319 0
Average 304545,375 297707,062 1,339 304362,625 0,022 304543,22 0,0001
4554 4530,03 0,526 4554 0 4554 0
4536 4506,77 0,644 4536 0 4536 0
4115 4009,37 2,567 4115 0 4115 0
4561 4131,07 9,426 4561 0 4561 0
4514 4159,73 7,848 4514 0 4514 0
5557 5491,73 1,175 5557 0 5557 0
5567 5428,37 2,490 5567 0 5567 0
5605 5509,43 1,705 5605 0 5605 0
5246 5104,5 2,697 5246 0 5246 0
6339 6014,23 5,123 6339 0 6339 0
5643 5234,33 7,242 5643 0 5643 0
6339 5916 6,673 6339 0 6339 0
6159 5769,5 6,324 6159 0 6159 0
6954 6495,6 6,592 6954 0 6954 0
weish 7486 6684,6 10,705 7486 0 7486 0
7289 6878,4 5,633 7289 0 7289 0
8633 8314,73 3,687 8629,5 0,041 8633 0
9580 9146,5 4,525 9559,63 0,213 9568,63 0,119
7698 7223,17 6,168 7698 0 7698 0
9450 8632,1 8,655 9448,63 0,014 9449,37 0,007
9074 8114,4 10,575 9073,23 0,008 9073,33 0,007
8947 8321,17 6,995 8926,73 0,227 8938,83 0,091
8344 7603,77 8,871 8321,97 0,264 8318,93 0,3
10220 9685,77 5,227 10152,9 0,657 10164,2 0,546
9939 9077,9 8,664 9900,07 0,392 9910,73 0,284
9584 8728,87 8,922 9539,4 0,465 9560,53 0,245
9819 8873,7 9,627 9777,9 0,419 9802,03 0,173
9492 8653,57 8,833 9423,87 0,718 9442,17 0,525
9410 8466,67 10,025 9359,5 0,537 9369,5 0,430
11191 10250,1 8,408 11106,3 0,757 11128,7 0,557
Average 7394,833 6898,536 6,218 7379,386 0,157 7386,772 0,109
Table 2. Results of NSR, RSR and Time parameters obtained by GA, GA-SA and
GA-SLS.
GA GA-SA GA-SLS
NSR RSR Time NSR RSR Time NSR RSR Time
hp 1 1,67 1,798 2 100 6,077 2 100 7,101
pb 4 3,33 1,811 6 100 3,443 6 100 4,681
pet 4 32,78 1,395 6 81,67 11,296 6 80,55 12,179
sento 0 0,00 2,616 2 66,67 46,584 2 100 24,277
weing 6 39,58 1,669 8 76,25 10,259 8 99,58 10,586
weish 3 4,55 1,620 25 66,89 17,352 26 76,88 17,146
Average 18 13,65 1,818 49 81,91 15,835 50 92,83 12,662
and sento, followed by GA-SA. GA-SLS obtained a better total RSR than GA-SA (92,83 % versus 81,91 %, respectively), and both widely surpass GA (13,65 %). The RSR values show that hybridizing GA with SLS improved the success rate by 79,18 % and hybridizing it with SA by 68,49 %. From Table 2, GA is the fastest, with a global average CPU time of 1,818 s.
Results for the Ten Large Instances. From the results on the GK instances shown in Table 3, GA-SA has the best Result and GAP values for instances 1, 3, 5, 6 and 8, while GA-SLS has the best values for instances 2, 4, 7, 9 and 10. Globally, GA-SLS has the best overall performance, with a total average GAP of 2.479 %. GA-SA has almost the same performance, with an average GAP of 2.484 %, and GA is not far behind, with a total average GAP of 2.632 %.
5 Conclusion
In this paper we addressed the multidimensional knapsack problem (MKP). We
proposed, compared and tested two combinations: GA-SLS and GA-SA. GA-SLS
combines the genetic algorithm and the stochastic local search (SLS) while GA-
SA uses the simulated annealing (SA) instead of SLS. The experiments have
shown the performance of our methods for MKP. Also, the hybridization of
GA with local search methods allows to greatly improving its performance. As
perspectives, we plan to study the impact of local search method when used with
other evolutionary approaches such as: harmony search and particle swarm.
References
1. Bean, J.C.: Genetics and random keys for sequencing and optimization. ORSA
Journal of Computing 6(2), 154–160 (1994)
2. Beaujon, G.J., Martin, S.P., McDonald, C.C.: Balancing and optimizing a portfolio
of R&D projects. Naval Research Logistics 48, 18–40 (2001)
3. Boughaci, D., Benhamou, B., Drias, H.: Local Search Methods for the Optimal
Winner Determination Problem in Combinatorial Auctions. Math. Model. Algor.
9(1), 165–180 (2010)
4. Chih, M., Lin, C.J., Chern, M.S., Ou, T.Y.: Particle swarm optimization with
time-varying acceleration coefficients for the multidimensional knapsack problem.
Applied Mathematical Modelling 38, 1338–1350 (2014)
5. Cho, J.H., Kim, Y.D.: A simulated annealing algorithm for resource-constrained
project scheduling problems. Operational Research Society 48, 736–744 (1997)
6. Chu, P., Beasley, J.: A Genetic Algorithm for the Multidimensional Knapsack
Problem. Heuristics 4, 63–86 (1998)
7. Cotta, C., Troya, J.: A Hybrid Genetic Algorithm for the 0–1 Multiple Knapsack
problem. Artificial Neural Nets and Genetic Algorithm 3, 250–254 (1994)
8. Deane, J., Agarwal, A.: Neural, Genetic, And Neurogenetic Approaches For Solv-
ing The 0–1 Multidimensional Knapsack Problem. Management & Information
Systems - First Quarter 2013 17(1) (2013)
9. Della Croce, F., Grosso, A.: Improved core problem based heuristics for the 0–1
multidimensional knapsack problem. Comp. & Oper. Res. 39, 27–31 (2012)
10. Djannaty, F., Doostdar, S.: A Hybrid Genetic Algorithm for the Multidimensional
Knapsack Problem. Contemp. Math. Sciences 3(9), 443–456 (2008)
11. Feng, L., Ke, Z., Ren, Z., Wei, X.: An ant colony optimization approach for the
multidimensional knapsack problem. Heuristics 16, 65–83 (2010)
12. Feng, Y., Jia, K., He, Y.: An Improved Hybrid Encoding Cuckoo Search Algo-
rithm for 0–1 Knapsack Problems. Computational Intelligence and Neuroscience,
ID 970456 (2014)
13. Fukunaga, A.S.: A branch-and-bound algorithm for hard multiple knapsack prob-
lems. Annals of Operations Research 184, 97–119 (2011)
14. Garey, M.R., Johnson, D.S.: Computers and intractability: A guide to the theory
of NP-completeness. W. H. Freeman & Co, New York (1979)
15. Khuri, S., Bäck, T., Heitkötter, J.: The zero-one multiple knapsack problem and
genetic algorithms. In: Proceedings of the ACM Symposium on Applied Comput-
ing, pp. 188–193 (1994)
16. Kirkpatrick, S., Gelatt, C.D., Vecchi, P.M.: Optimization By Simulated Annealing.
Science 220, 671–680 (1983)
17. Meier, H., Christofides, N., Salkin, G.: Capital budgeting under uncertainty-an
integrated approach using contingent claims analysis and integer programming.
Operations Research 49, 196–206 (2001)
18. Tuo, S., Yong, L., Deng, F.: A Novel Harmony Search Algorithm Based on
Teaching-Learning Strategies for 0–1 Knapsack Problems. The Scientific World
Journal Article ID 637412, 19 pages (2014)
19. Thiel, J., Voss, S.: Some Experiences on Solving Multiconstraint Zero-One Knap-
sack Problems with Genetic Algorithms. INFOR 32, 226–242 (1994)
20. Vasquez, M., Vimont, Y.: Improved results on the 0–1 multidimensional knapsack
problem. Eur. J. Oper. Res. 165, 70–81 (2005)
21. Yoon, Y., Kim, Y.H.: A Memetic Lagrangian Heuristic for the 0–1 Multidimen-
sional Knapsack Problem. Discrete Dynamics in Nature and Society, Article ID
474852, 10 pages (2013)
22. http://people.brunel.ac.uk/∼mastjjb/jeb/orlib/mknapinfo.html
Synthesis of In-Place Iterative Sorting
Algorithms Using GP: A Comparison Between
STGP, SFGP, G3P and GE
1 Introduction
This work compares four approaches, namely Strongly Typed Genetic Pro-
gramming (STGP), Strongly Formed Genetic Programming (SFGP), Grammar
Guided Genetic Programming (G3P) and Grammatical Evolution (GE), at evolv-
ing sorting algorithms. Special emphasis is given to their ability to scale well as the primitive set grows.
We restrict ourselves to iterative (non-recursive), in-place, comparison-based sorting algorithms, expecting quadratic running times (O(n²)) and constant (O(1)) additional memory. We make no assumptions about the stability and adaptability of the evolved algorithms. Bloat analysis and mitigation are left for future work.
scalability related to the primitive set size, therefore we will not delve into fine
tuning the choices made for operators and values (shown in Table 1).
The nonterminals used are shown in Table 2. In the first three columns, a number is attributed to every syntactic element to help understand the results, followed by the name and a description. The node data type and the required child data types, used by STGP and SFGP, and the node type and child node types, used by SFGP, fill the last four columns. The grammar-based approaches do not need to define data or node types, since the grammar itself specifies the requirements and restrictions on the data types and the syntactic form of the solutions. The strongly typed approaches were designed to achieve side effects, i.e. to change global variables; for that reason the Statement nonterminals return the data type Void. SFGP nonterminal nodes belong to only three supertypes, namely CodeBlock, Statement and Expression. The implementation uses polymorphism, so whenever an Expression is required, for example, any Expression subtype can be used. Only two terminals were used, the minimum for quadratic algorithms, which in practice work as indexes to array elements (the next section tests the use of more terminals). Using insight from human-made algorithms, the loops were restricted to a small set of widely used variants, such as looping from a specified position of the array, ascending or descending. To prevent infinite loops, a limit of 100 cycles per loop was set.
the particularity that the set of size 18 presents more useful constructs to evolve
sorting algorithms than the set of size 10.
Protection against out-of-bounds array accesses. A recurrent situation when using indexed data types, such as arrays, is that indexes can fall outside the bounds of the data structure when used inside some of the loops, causing run-time exceptions. In this experiment we protected against out-of-bounds array accesses by applying the % (mod) operator with the size of the array, ensuring that the index is always within the correct bounds; the results obtained are presented in Figure 2 and Table 3. The important positive impact that this tweak had on the performance and scalability of the approaches can be ascertained by comparing the lines named PAA (Protected Array Accesses) and UAA.
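As an illustration of the protection mechanism described above (the authors apply it inside evolved programs; the snippet below is only a minimal Python sketch of the same idea), the index is mapped back into the valid range with the mod operator:

# Sketch of a protected array access: any index expression is mapped back into
# the valid range with the % (mod) operator, avoiding run-time exceptions.
def protected_get(array, index):
    return array[index % len(array)]

def protected_set(array, index, value):
    array[index % len(array)] = value

data = [5, 3, 8, 1]
print(protected_get(data, 7))   # 7 % 4 == 3 -> returns 1 instead of raising
protected_set(data, -6, 9)      # -6 % 4 == 2 in Python -> data becomes [5, 3, 9, 1]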
Influence of grammar context insensitivity. Context-Free Grammars (CFG), used in G3P and GE, lack the expressiveness to describe semantic constraints¹ [6]. Their context insensitivity can have an appreciable negative impact on the size of the search space, especially in the presence of loops and swaps that repeatedly require the same index. As our system does not allow us to specify that certain terminal (index) assignments should be repeated within a given nonterminal, we obtained the same effect by changing the grammar so that the rules which require more than one index are split into two, one specifically for the index i and another specifically for the index j². This acts as a kind of context sensitivity, forcing these constructs to always correctly match the indexes. The results, presented in Figure 3 and in the lines PAACS (Protected Array Accesses with Context Sensitivity) of Table 3, attest that this has a huge positive impact on the performance of the grammar-guided approaches, even to the point that almost all runs produce a correct individual.
Performance and scalability with bigger terminal sets. The last experiment tests the performance and scalability of SFGP in the presence of bigger terminal sets, with the same number of terminals as nonterminals, between 4 and 18 of each. The terminals consist of integers of node type Variable. From Figure 4 and the line PAANT (Protected Array Accesses with N Terminals) of Table 3 one can see that the number of terminals has an important negative impact on performance, but SFGP nevertheless remains scalable, showing almost the same performance for sets of size 10 and 18. This gives us confidence in the introduction of more data types into the evolutionary process, such as trees, graphs, stacks, etc., with the important goal of evolving not-in-place algorithms.
¹ For example, to define a loop, the grammar can state
<for>::= for(<index> = 0; <index> < array.length; <index>++);
<index>::= i | j | k;
which can be evaluated as
for(i = 0; j < array.length; k++){}
or any other combination of indexes that gives an infinite loop. This situation could be overcome by the evolutionary system, indicating that the second and third indexes should be the same as the first.
² For example, the for loop was subdivided into one loop for each index:
<loop i>::= for(i = 0; i < array.length; i++){}
<loop j>::= for(j = 0; j < array.length; j++){}
Fig. 3. Context Free vs Context Sensitive index assignment.
Fig. 4. Use of n terminals in SFGP with Protected Array Accesses.
References
1. Otero, F., Castle, T., Johnson, C.: EpochX: genetic programming in Java with statistics and event monitoring. In: Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 93–100 (2012)
2. O'Neill, M., Nicolau, M., Agapitos, A.: Experiments in program synthesis with grammatical evolution: a focus on Integer Sorting. In: IEEE Congress on Evolutionary Computation (CEC), pp. 1504–1511 (2014)
3. Koza, J.R.: Genetic programming: on the programming of computers by means of
natural selection, vol. 1. MIT press (1992)
4. Christensen, S., Oppacher, F.: An analysis of koza’s computational effort statis-
tic for genetic programming. In: Foster, J.A., Lutton, E., Miller, J., Ryan, C.,
Tettamanzi, A.G.B. (eds.) EuroGP 2002. LNCS, vol. 2278, pp. 182–191. Springer,
Heidelberg (2002)
5. Walker, M., Edwards, H., Messom, C.H.: Confidence intervals for computational
effort comparisons. In: Ebner, M., O’Neill, M., Ekárt, A., Vanneschi, L., Esparcia-
Alcázar, A.I. (eds.) EuroGP 2007. LNCS, vol. 4445, pp. 23–32. Springer, Heidelberg
(2007)
6. Orlov, M., Sipper, M.: FINCH: a system for evolving Java (bytecode). In: Genetic
Programming Theory and Practice VIII, Springer, pp. 1–16 (2011)
Computational Methods in
Bioinformatics and Systems Biology
Variable Elimination Approaches
for Data-Noise Reduction in 3D QSAR Calculations
Abstract. In the last several decades, drug research has moved to involve various IT technologies in order to rationalize the design of novel bioactive chemical compounds. An important role among these computer-aided drug design (CADD) methods is played by a technique known as quantitative structure-activity relationship (QSAR). The approach is used to find a statistically significant model correlating the biological activity with data derived, to a greater or lesser extent, from the chemical structures. The present article deals with approaches for discriminating unimportant information in the data input within the three-dimensional variant of QSAR - 3D QSAR. Special attention is given to uninformative and iterative variable elimination (UVE/IVE) methods applicable in connection with partial least squares regression (PLS). Herein, we briefly introduce the 3D QSAR approach by analyzing 30 antituberculotics. The analysis is examined with four UVE/IVE-PLS based data-noise reduction methods.
1 Introduction
The core of the 3D QSAR method, originally named comparative molecular field analysis (CoMFA), was designed by Cramer et al. in 1988 as a four-step procedure: 1) superimposition of ligand molecules on a selected template structure, 2) representation of ligand molecules by molecular interaction fields (MIFs), 3) data analysis of MIFs and biological activities by PLS, utilizing cross-validation to select the most robust 3D QSAR model, 4) graphical explanation of the results through three-dimensional pseudo-βPLS coefficient contour plots.
Within the 3D QSAR analysis, a starting set of molecular models can be prepared
with any chemical software capable of creating and geometrically optimizing
chemical structures (e.g. HyperChem, Spartan, ChemBio3D Ultra, etc.). Usually, a
molecular dynamics method (e.g. simulated annealing, quenched molecular dynamics,
The potential energies (e.g. Lennard-Jones, van der Waals (VDW) and Coulomb
electrostatic potential energies (ESP)) experienced by unit charge or atom probe at
various points (xi, yj, zk) around the studied molecules are usually regressed by PLS on
the biological activities to reveal significant correlations. Generally, any supervised
learning method is applicable in 3D QSAR analysis instead of PLS, provided it is able
to process many thousands of “independent” x variables within a reasonable time [4].
A common MIF related to a 1.0 Å-spaced grid box of size 30.0 × 30.0 × 30.0 Å represents a chemical compound by 27 000 real numbers. Because not all points in MIFs are typically related to the observed biological activity, redundant information in the data input may bring about overlearning, which mostly causes unreliable predictions for compounds outside the training and test sets. As a rule, the raw input
data for 3D QSAR analysis must be pre-processed by data-noise reduction methods
prior to building the final model by PLS. The data-noise reduction techniques like
fractional factor design (FFD), uninformative variable elimination (UVE) or iterative
variable elimination (IVE) frequently utilize cross-validated coefficient of determina-
tion (Q2) as a cost-function for selecting the most robust 3D QSAR model [5]. The
final step in deciding whether the derived 3D QSAR model is trustworthy is statistical
validation. Such methods as progressive Y-scrambling, randomization, leave-two-out
(LTO) or multiple leave-many-out cross-validation (LMO), and, above all, external
validation are employed for these ends [6].
By 3D contour maps of pseudo-βPLS × SD coefficients one can disclose which molecular features are crucial for the observed biological activity. Accordingly, medicinal chemists can utilize the information to design novel drugs through their chemical intuition, or they can employ the found 3D QSAR model in ligand-based virtual screening of convenient drug databases (e.g. zinc.docking.org).
3 Variable Elimination as Data-Noise Reduction Method for 3D QSAR Analysis
models result via PLS regression when a proper selection of independent variables
from the original MIFs is neglected. This drawback, which challenges not only 3D
QSAR models, manifests especially in external prediction or exhaustive LMO cross-
validation. Lately, several variable selection algorithms have been developed to address the overlearning in PLS-based models. In the present study, special attention is turned to variable elimination methods applicable in 3D QSAR analysis.
$X = TP^{T} + E; \quad Y = UQ^{T} + F; \quad \beta_{PLS} = W\left(P^{T}W\right)^{-1}Q^{T}$  (2)
where $\beta_{PLS}$ means the pseudo-βPLS regression coefficients; W and Q denote the weight matrices; P means the loading matrix; T and U are the score matrices. The columns of T are orthogonal and are called the latent variables (LVs). The crucial
operation within PLS analysis consists in simplifying the complexity of the system by
selecting only few LVs to build the model. The number of involved LVs is often de-
termined by cross-validation. When Q2LOO starts dropping or the standard error of
prediction (SDEP) increases, the optimum number of latent variables has been ex-
ceeded (Fig. 3). Considerably more robust algorithms for latent variable selection
implement leave-many-out cross-validated Q2LMO or coefficient of determination for
external prediction R2ext as a control function.
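A small sketch of this latent-variable selection loop is given below. It assumes scikit-learn's PLSRegression as a generic stand-in for the PLS implementation and synthetic data in place of real MIFs; only the stopping logic (watch Q2LOO and SDEP) reflects the procedure described above.

# Sketch: choose the number of latent variables (LVs) by monitoring Q2_LOO and
# SDEP; scikit-learn's PLSRegression is used here only as a generic PLS stand-in.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))               # 30 compounds, 50 MIF variables (synthetic)
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=30)

for n_lv in range(1, 8):
    y_cv = cross_val_predict(PLSRegression(n_components=n_lv), X, y,
                             cv=LeaveOneOut()).ravel()
    press = np.sum((y - y_cv) ** 2)
    q2_loo = 1.0 - press / np.sum((y - y.mean()) ** 2)
    sdep = np.sqrt(press / len(y))
    print(f"LVs={n_lv}  Q2_LOO={q2_loo:.3f}  SDEP={sdep:.3f}")
# The optimum is the last n_lv before Q2_LOO starts dropping / SDEP starts rising.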
Since PLS is a PCA based method, it is highly sensitive to the variance of the val-
ues included in the input data. This problematic feature of PLS can lead to masking
significant information by data assuming greater values. For instance, the information
on weak hydrophobic interactions is suppressed by Coulombic interactions and hy-
drogen bonding that are stronger. However, it has become evident that hydrophobic
interactions play such an important role in drug binding to receptors that they cannot
be neglected in 3D QSAR calculations. To prevent discriminating variables with rela-
tively low values, the MIFs as well as the biological activities have to be column cen-
tered and normalized prior to PLS. In 3D QSAR analysis, different MIFs may be also
scaled as separated blocks by block unscaled weighting (BUW) to give each probe the
same significance in PLS [8]. PLS regression can be performed by a number of subtly differing algorithms (e.g. NIPALS, SIMPLS, Lanczos bidiagonalization) [9]. In case Y consists of only one column, the NIPALS algorithm may be simply illustrated as follows:
[X,Y] = autoscale(X, Y);          % Centering and normalization (column-wise)
X0 = X;                           % Keep the autoscaled X for the final prediction
T=[]; P=[]; W=[]; Q=[]; B=[];     % Initialization
for a=1:LV                        % Calculate the entered number of LVs
  w=(Y'*X)';                      % Calculate weighting vector
  w=w/sqrt(w'*w);                 % Normalize to unit length
  t=X*w;                          % Calculate X scores
  if a>1, t=t - T*(inv(T'*T)*(t'*T)'); end   % Orthogonalize against previous scores
  u=t/(t'*t);                     % Scale the score vector
  p=(u'*X)';                      % Calculate X loadings
  X=X - t*p';                     % Deflate X (X residuals)
  T=[T t];                        % Store X scores
  P=[P p];                        % Store X loadings
  W=[W w];                        % Store X weights
  Q=[Q; Y'*u];                    % Store Y loadings
  B=[B W*inv(P'*W)*Q];            % PLS coefficients using the first a LVs
end
Y_pred=X0*B(:,end);               % Internal prediction with all LV components
The above PLS algorithm derives the entered number of LVs and utilizes them in the internal prediction of the autoscaled dependent variable Y. The robustness of the resulting PLS model can be easily controlled by incorporating a cross-validation into the algorithm. Nonetheless, the prediction instability of the 3D QSAR models has to be solved mainly through preselecting the input data.
The main goal of 3D QSAR analysis is to develop a stable mathematical model which
can be used for prediction of unseen biological activities. From this somewhat nar-
rowed point of view, any model that is not able to prove itself in external prediction
must be rejected from further consideration. However, such refusal does not provide
any suggestion why different models are successful or fail in predictions and how to
boost the predictive ability.
The method introduced by Centner and Massart focuses on what is noisy and/or ir-
relevant information and how to discriminate it [10]. Compared to other variable
selection techniques like forward selection, stepwise selection, genetic and evolution-
al algorithms, uninformative variable elimination (UVE) does not attempt to find the
best subset of variables to build a statistically significant model but to remove such
variables that contain no useful information. Centner and Massart took their inspira-
tion from previously published studies which tried to eliminate those variables having
small loadings or pseudo βPLS coefficient in a model derived by PLS.
UVE-PLS method resembles stepwise variable selection used in MLR. The j-th va-
riable (i.e. a MIF vector) is eliminated from the original vector pool if its cj value is
lower than a certain cutoff level (Eq. 3)
$c_j = \dfrac{\mathrm{mean}(b_j)}{s(b_j)}$  (3)

$RMSEP = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$  (4)
Here, $y_i$ is the observed biological activity for the i-th compound, $\hat{y}_i$ stands for the predicted biological activity of the i-th compound, and n is the number of compounds in the set. The above-mentioned UVE algorithm can be transformed into a more robust variant by expressing $c_j$ as median($b_j$)/interquartile range($b_j$). The criterion for variable elimination may be substituted by a 90-95 % quantile of abs($c_j^{random}$).
Uninformative variable elimination was designed to improve the predictive power and the interpretability of 3D QSAR models by removing those parts of the MIFs which do not contain useful information in comparison to random noise introduced into the input data. The UVE method is based on the calculation of $c_j$ values indicating the ratio of the mean and standard deviation of the pseudo-βPLS coefficients related to the j-th vector of MIFs (Eq. 3). All MIF vectors with $c_j$ smaller than a cutoff derived from the c values of the random variables are removed from the matrix in a single-step procedure.
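The UVE idea can be sketched in a few lines of Python: random noise variables are appended to the X block, pseudo-regression coefficients are collected over leave-one-out submodels, cj = mean(bj)/s(bj) is computed, and original variables whose |cj| does not exceed a cutoff derived from the random variables are removed. scikit-learn's PLSRegression, the synthetic data, the fixed number of LVs and the chosen quantile are assumptions of this sketch, not the authors' setup.

# Sketch of UVE-PLS: eliminate variables whose coefficient stability c_j is not
# better than that of artificial random variables. Synthetic data, generic PLS.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 40))                  # 30 compounds, 40 MIF variables
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=30)

n, p = X.shape
X_aug = np.hstack([X, 1e-10 * rng.normal(size=(n, p))])   # append random variables

coefs = []
for i in range(n):                              # leave-one-out submodels
    mask = np.arange(n) != i
    pls = PLSRegression(n_components=3).fit(X_aug[mask], y[mask])
    coefs.append(pls.coef_.ravel())             # regression vector of length 2p
coefs = np.array(coefs)

c = coefs.mean(axis=0) / coefs.std(axis=0)      # stability c_j = mean / std
cutoff = np.quantile(np.abs(c[p:]), 0.95)       # 95 % quantile of the random part
keep = np.abs(c[:p]) > cutoff
X_reduced = X[:, keep]
print(f"kept {keep.sum()} of {p} variables")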
A modification of the UVE algorithm suggested by Polanski and Gieleciak revises this one-step elimination of MIF vectors with low $c_j$ values [11,12]. Their improved algorithm, named iterative variable elimination (IVE), does not remove the selected vectors in a single step but in a sequential manner. In the first version of IVE, the MIF vector having the lowest absolute pseudo-βPLS coefficient is eliminated and the
remaining matrix is regressed by PLS to evaluate the benefit. The iterative IVE
procedure can be described by the following protocol: 1) carry out PLS analysis with
a fixed number of LVs and estimate the performance by leave-one-out cross-
validation; 2) eliminate the matrix column with the lowest absolute value of pseu-
do βPLS coefficient; 3) carry out PLS analysis of the reduced matrix and
estimate the performance by leave-one-out cross-validation; 4) go to step 1 and repeat
until the maximal leave-one-out cross-validated coefficient of determination Q2LOO is
reached (Eq. 5).
$Q^{2}_{LOO} = 1 - \dfrac{\sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$  (5)
Here, $y_i$ means the observed biological activity for the i-th compound, $\hat{y}_{(i)}$ denotes the biological activity of the i-th compound predicted by the model derived without the i-th compound, and n is the number of compounds in the set. In the first version, the IVE procedure was based on the iterative elimination of MIF vectors with the lowest absolute values of pseudo-βPLS coefficients. In the next IVE variants, the criterion for MIF vector elimination was substituted by $c_j$ values obtained by leave-one-out cross-validation. The most robust form of IVE was proposed to involve optimization of the LV number and $c_j$ values defined as median($b_j$)/interquartile range($b_j$). It was proved by Polanski and Gieleciak that the robust IVE form surpassed the other variants and gave the most reliable 3D QSAR models in terms of the highest Q2LOO and sufficient resolution of the pseudo-βPLS coefficient contour maps.
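A minimal sketch of the first IVE variant (drop the column with the smallest absolute pseudo-coefficient, re-fit, and keep the subset giving the highest Q2LOO) is shown below; scikit-learn's PLSRegression, the fixed number of LVs and the synthetic data are again assumptions made only for illustration.

# Sketch of IVE-PLS: iteratively drop the variable with the smallest |coefficient|
# and keep the variable subset that maximizes Q2_LOO. Synthetic data, generic PLS.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def q2_loo(X, y, n_lv=3):
    y_cv = cross_val_predict(PLSRegression(n_components=n_lv), X, y,
                             cv=LeaveOneOut()).ravel()
    return 1.0 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 20))
y = X[:, :4].sum(axis=1) + 0.1 * rng.normal(size=30)

active = list(range(X.shape[1]))
best_q2, best_subset = q2_loo(X, y), list(active)
while len(active) > 4:                          # stop before the model degenerates
    b = PLSRegression(n_components=3).fit(X[:, active], y).coef_.ravel()
    active.pop(int(np.argmin(np.abs(b))))       # drop the least informative column
    q2 = q2_loo(X[:, active], y)
    if q2 > best_q2:
        best_q2, best_subset = q2, list(active)
print(f"best Q2_LOO={best_q2:.3f} with {len(best_subset)} variables")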
position in 3D QSAR analysis. SRD procedure aims to rearrange the unfolded MIFs
into group variables related to the same chemical regions (e.g. points around the same
atoms). These groups of neighboring variables are explicitly associated with chemical
structures and when treated as logical units, the resulting 3D QSAR models are less
prone to chance correlations and easier to interpret [14]. Since the SRD procedure
clusters similar MIF vectors into groups, the time consumed by UVE or IVE analyses
is shorter than in the standard processing of all individual MIF vectors. The SRD
algorithm involves three major operations: 1) selecting the most important MIF vec-
tors (seeds) having the highest PLS weights; 2) building 3D Voronoi polyhedra
around the seeds; 3) collapsing Voronoi polyhedra into larger regions.
The starting point of the SRD procedure is a PLS analysis of the whole matrix, which reveals the significant MIF vectors through the magnitude of their weights. Depending on the user's setting, a selection of important MIF vectors is denoted as seeds. In the next step, the remaining MIF vectors are assigned to the nearest seed according to a preset Euclidean distance. In case a MIF vector is too far from every seed, it is assigned to the "zero" region and eliminated from the matrix. After distributing the MIF vectors into Voronoi polyhedra, further variable absorption is performed. Neighboring Voronoi polyhedra are statistically analysed and, if found significantly correlated, they are merged into one larger Voronoi polyhedron. The cutoff distances for initially building the Voronoi polyhedra as well as for the subsequent collapsing are critical points which decide on the merit of SRD and thus have to be cautiously optimized.
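The seed-assignment part of SRD (steps 1-2) can be pictured with a short NumPy sketch: grid points are attached to the nearest seed if it lies within a cutoff distance, otherwise they fall into the "zero" region. The coordinates, the number of seeds and the cutoff below are made up, and the statistical collapsing of correlated polyhedra (step 3) is omitted.

# Geometric sketch of SRD steps 1-2: assign each MIF grid point to the nearest
# seed within a cutoff distance; points too far from every seed go to region 0.
import numpy as np

rng = np.random.default_rng(3)
grid_points = rng.uniform(0, 30, size=(1000, 3))   # (x, y, z) of MIF grid points
seeds = grid_points[rng.choice(1000, size=10, replace=False)]  # top-weight points
cutoff = 5.0                                        # Angstrom, illustrative value

# Distance of every grid point to every seed: shape (n_points, n_seeds)
d = np.linalg.norm(grid_points[:, None, :] - seeds[None, :, :], axis=2)
nearest = d.argmin(axis=1)                          # index of the closest seed
region = np.where(d.min(axis=1) <= cutoff, nearest + 1, 0)  # 0 = "zero" region

for r in range(seeds.shape[0] + 1):
    print(f"region {r}: {np.sum(region == r)} grid points")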
In order to practically evaluate the performance of the UVE and IVE based noise-
reduction methods, a 3D QSAR analysis has been carried out. We selected a group of
30 compounds, which are currently considered as potential antituberculotics [15, 16],
and analyzed them in Open3DAlign and Open3DQSAR programs [17, 18]. Since we
cannot provide a detailed description of all undertaken steps of the analysis in this
article, we will confine the present study only to the performance of the UVE and IVE
algorithms and their SRD hybridized variants. In the 3D QSAR analyses, the most
common or default setting was used.
First, the set of 30 compounds published in the literature was modeled in Hyper-
Chem 7.0 to prepare the initial molecular models. Then, the molecular ensemble was
submitted to quenched molecular dynamics and the resulting conformers were
processed by an aligning algorithm in Open3DAlign program to determine the optim-
al molecular superimposition. In Open3DQSAR program two MIFs were generated
(i.e. van der Waals MIF and Coulombic MIF) and processed by four 3D QSAR me-
thods: 1) UVE-PLS, 2) IVE-PLS, 3) UVE-SRD-PLS and 4) IVE-SRD-PLS (Fig. 4).
As a dependent variable, we used the published logMICs against Mycobacterium
tuberculosis. To evaluate the 3D QSAR model, the original set of compounds was
randomly divided into training and test sets in ratio 25 : 5.
From the above plots it follows that an efficient data-noise reduction is crucial to achieve a stable PLS model in cross-validations. The original dataset provided all Q2LOO/LTO/LMO with negative values, which indicates an overtrained PLS model. Significant improvement was reached even by unsupervised preselection of the matrix. UVE and SRD-UVE PLS models showed considerably better performance and exhibited nearly the same robustness in cross-validation. The top scoring model resulted when applying SRD/UVE-PLS with 6-7 LVs (Q2LMO = 0.7352 – 0.7361). The best 3D QSAR
model was obtained by IVE-PLS (5 LVs; Q2LMO = 0.7652). It is interesting that applica-
tion of SRD-IVE-PLS caused significant deterioration of the IVE-PLS model stability
in cross-validation (max(Q2LMO) = 0.4617; 6 LVs). It is likely a consequence of stepwise
elimination of larger groups of MIF vectors grouped by SRD. Without SRD, the IVE
algorithm iteratively investigates all MIF vectors and is better able to find the critical
portion of information to be eliminated from the input data. The numbers of MIF vectors eliminated by each data-noise reduction method are given in Table 1.
5 Conclusion
References
1. Brown, A.C., Fraser, T.R.: XX.—On the Connection between Chemical Constitution and
Physiological Action. Part II.—On the Physiological Action of the Ammonium Bases de-
rived from Atropia and Conia. Trans. Roy. Soc. Edinburgh 25, 693–739 (1869)
2. Hansch, C., Fujita, T.: ρ-σ-π Analysis. A method for the correlation of biological activity
and chemical structure. J. Am. Chem. Soc. 86, 1616–1626 (1964)
3. Free, S.M., Wilson, J.W.: A mathematical contribution to structure-activity studies. J.
Med. Chem. 7, 395–399 (1964)
4. Cramer III, R.D.: Partial least squares (PLS): its strengths and limitations. Perspect. Drug.
Discov. 1, 269–278 (1993)
5. Dolezal, R., Korabecny, J., Malinak, D., Honegr, J., Musilek, K., Kuca, K.: Ligand-based
3D QSAR analysis of reactivation potency of mono- and bis-pyridinium aldoximes toward
VX-inhibited rat acetylcholinesterase. J. Mol. Graph. Model. 56c, 113–129 (2014)
6. Tropsha, A., Gramatica, P., Gombar, V.K.: The importance of being earnest: validation is
the absolute essential for successful application and interpretation of QSPR models. QSAR
Comb. Sci. 22, 69–77 (2003)
7. Chuang, Y.C., Chang, C.H., Lin, J.T., Yang, C.N.: Molecular modelling studies of sirtuin
2 inhibitors using three-dimensional structure-activity relationship analysis and molecular
dynamics simulations. Mol. Biosyst. 11, 723–733 (2015)
8. Kastenholz, M.A., Pastor, M., Cruciani, G., Haaksma, E.E., Fox, T.: GRID/CPCA: a new
computational tool to design selective ligands. J. Med. Chem. 43, 3033–3044 (2000)
9. Bro, R., Elden, L.: PLS works. J. Chemometr. 23, 69–71 (2009)
10. Centner, V., Massart, D.L., de Noord, O.E., de Jong, S., Vandeginste, B.M., Sterna, C.: Elimi-
nation of uninformative variables for multivariate calibration. Anal. Chem. 68, 3851–3858
(1996)
11. Polanski, J., Gieleciak, R.: The comparative molecular surface analysis (CoMSA) with
modified uninformative variable elimination-PLS (UVE-PLS) method: application to the
steroids binding the aromatase enzyme. J. Chem. Inf. Comp. Sci. 43, 656–666 (2003)
12. Gieleciak, R., Polanski, J.: Modeling robust QSAR. 2. iterative variable elimination
schemes for CoMSA: application for modeling benzoic acid pKa values. J. Chem. Inf.
Model. 47, 547–556 (2007)
13. Mehmood, T., Liland, K.H., Snipen, L., Sæbø, S.: A review of variable selection methods
in partial least squares regression. Chemometr. Intel. Lab. 118, 62–69 (2012)
14. Pastor, M., Cruciani, G., Clementi, S.: Smart region definition: a new way to improve the
predictive ability and interpretability of three-dimensional quantitative structure-activity
relationships. J. Med. Chem. 40, 1455–1464 (1997)
15. Dolezal, R., Waisser, K., Petrlikova, E., Kunes, J., Kubicova, L., Machacek, M., Kaustova, J.,
Dahse, H.M.: N-Benzylsalicylthioamides: Highly Active Potential Antituberculotics. Arch.
Pharm. 342, 113–119 (2009)
16. Waisser, K., Matyk, J., Kunes, J., Dolezal, R., Kaustova, J., Dahse, H.M.: Highly Active
Potential Antituberculotics: 3-(4-Alkylphenyl)-4-thioxo-2H-1,3-benzoxazine-2(3H)-ones
and 3-(4-Alkylphenyl)-2H-1,3-benzoxazine-2,4(3H)-dihiones Substituted in Ring-B by
Halogen. Archiv Der Pharmazie 341, 800–803 (2008)
17. Tosco, P., Balle, T., Shiri, F.: Open3DALIGN: an open-source software aimed at unsuper-
vised ligand alignment. J. Comput. Aid. Mol. Des. 25, 777–783 (2011)
18. Tosco, P., Balle, T.: Open3DQSAR: a new open-source software aimed at high-throughput
chemometric analysis of molecular interaction fields. J. Mol. Model. 17, 201–208 (2011)
Pattern-Based Biclustering with Constraints
for Gene Expression Data Analysis
1 Introduction
Biclustering, the task of finding subsets of rows with a coherent pattern across
subsets of columns in real-valued matrices, has been largely used for expression
data analysis [9,11]. Biclustering algorithms based on pattern mining methods
[9,11,12,18,22,25], referred to in this work as pattern-based biclustering, are able to perform flexible and exhaustive searches. Initial attempts to use background knowledge for biclustering, based on user expectations [5,7,15] and knowledge-based repositories [18,20,26], show its key role in guiding the task and guaranteeing relevant solutions. In this context, two valuable synergies can be identified
based on these observations. First, the optimality and flexibility of pattern-based
biclustering provide an adequate basis upon which knowledge-driven constraints
can be incorporated. Contrasting with pattern-based biclustering, alternative
biclustering algorithms place restrictions on the structure (number, size and
positioning), coherency and quality of biclusters, which may prevent the incor-
poration of certain constraints [11,16]. Second, the effective use of background
knowledge to guide pattern mining searches has been largely researched in the
context of domain-driven pattern mining [4,23].
2 Background
Definition 1. Given a matrix, A=(X, Y ), with a set of rows X={x1 , .., xn }, a
set of columns Y ={y1 , .., ym }, and elements aij ∈R relating row i and column j:
the biclustering task aims to identify a set of biclusters B={B1 , .., Bm }, where
each bicluster Bk = (Ik , Jk ) is a submatrix of A (Ik ⊆ X and Jk ⊆ Y ) satisfying
specific criteria of homogeneity and significance [11].
A real-valued matrix can thus be described by a (multivariate) distribution
of background values and a structure of biclusters, where each bicluster satis-
fies specific criteria of homogeneity and significance. The structure is defined by
the number, size and positioning of biclusters. Flexible structures are character-
ized by an arbitrary-high set of (possibly overlapping) biclusters. The coherency
(homogeneity) of a bicluster is defined by the observed correlation of values (see
Definition 2). The quality of a bicluster is defined by the type and amount of
accommodated noise. The statistical significance of a bicluster determines the
deviation of its probability of occurrence from expectations.
Definition 2. Let the elements in a bicluster aij ∈ (I, J) have coherency across
rows given by aij =kj +γi +ηij , where kj is the expected value for column j, γi is
the adjustment for row i, and ηij is the noise factor [16]. For a given real-valued
matrix A and coherency strength δ: aij =kj +γi +ηij where ηij ∈ [−δ/2, δ/2].
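For intuition, the sketch below fits the additive model of Definition 2 to a candidate submatrix and checks whether all noise terms stay within [-δ/2, δ/2]. It is a literal translation of the definition with invented data, not code from BicPAM or BiC2PAM.

# Sketch: check the coherency of Definition 2 for a candidate bicluster (I, J):
# a_ij = k_j + gamma_i + eta_ij with |eta_ij| <= delta / 2.
import numpy as np

def is_coherent(A, rows, cols, delta):
    sub = A[np.ix_(rows, cols)]
    gamma = sub.mean(axis=1) - sub.mean()      # row adjustments
    k = sub.mean(axis=0)                       # expected column values
    eta = sub - k[None, :] - gamma[:, None]    # residual noise factors
    return np.all(np.abs(eta) <= delta / 2)

A = np.array([[1.0, 2.0, 4.0, 9.0],
              [2.1, 3.0, 5.0, 0.0],
              [0.0, 1.1, 3.0, 7.0]])
print(is_coherent(A, [0, 1, 2], [0, 1, 2], delta=0.3))   # True: rows are shifted copies
print(is_coherent(A, [0, 1, 2], [0, 1, 3], delta=0.3))   # False: column 3 breaks the pattern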
As motivated, the discovery of exhaustive and flexible structures of biclusters
satisfying certain homogeneity criteria (Definition 2) is a desirable condition to
effectively incorporate knowledge-driven constraints. However, due to the complexity of such a biclustering task, most of the existing algorithms are either based
on greedy or stochastic approaches, producing sub-optimal solutions and plac-
ing restrictions (e.g. fixed number of biclusters, non-overlapping structures, and
simplistic coherencies) that prevent the flexibility of the biclustering task [16].
Pattern-based biclustering appeared in recent years as one of various attempts
to address these limitations. As follows, we provide background on this class of
biclustering algorithms, as well as on constraint-based searches.
Pattern-Based Biclustering. Patterns are itemsets, rules, sequences or other
structures that appear in symbolic datasets with frequency above a specified
threshold. Patterns can be mapped as a bicluster with constant values across
rows (aij =cj ), and specific coherency strength determined by the number of
symbols in the dataset, δ=1/|L| where L is the alphabet of symbols. The rel-
evance of a pattern is primarily defined by its support (number of rows) and
length (number of columns). To allow this mapping, the pattern mining task
needs to output not only the patterns but also their supporting transactions
(full-patterns). Definitions 3 and 4 illustrate the paradigmatic mapping between
full-pattern mining and biclustering.
Given an illustrative symbolic matrix D={(t1 , {a, c, e}), (t2 , {a, b, d}),
(t3 , {a, c, e})}, we have Φ{a,c} ={t1 , t3 }, sup{a,c} =2. For a minimum support θ=2,
the full-pattern mining task over D returns the set of closed full-patterns,
{({a}, {t1 , t2 , t3 }), ({a, c, e}, {t1 , t3 })} (note that |Φ{a,c} |≤|Φ{a,c,e} |). Fig.1 illus-
trates how full-pattern mining can be used to derive constant biclusters1 .
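The toy dataset D above can be mined by brute force in a few lines of Python: enumerate itemsets, compute their supporting transactions and keep the closed full-patterns for θ=2. This is only meant to reproduce the example; it is not the F2G miner used later in the paper.

# Brute-force illustration of full-pattern mining on the toy dataset D above:
# enumerate itemsets, compute supporting transactions, keep closed patterns (theta=2).
from itertools import combinations

D = {"t1": {"a", "c", "e"}, "t2": {"a", "b", "d"}, "t3": {"a", "c", "e"}}
items = sorted(set().union(*D.values()))
theta = 2

patterns = {}
for size in range(1, len(items) + 1):
    for itemset in combinations(items, size):
        support_set = {t for t, row in D.items() if set(itemset) <= row}
        if len(support_set) >= theta:
            patterns[frozenset(itemset)] = frozenset(support_set)

# Closed: no proper superset with the same supporting transactions.
closed = {p: s for p, s in patterns.items()
          if not any(p < q and s == patterns[q] for q in patterns)}
for p, s in closed.items():
    print(sorted(p), "->", sorted(s))   # {a}:{t1,t2,t3} and {a,c,e}:{t1,t3}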
3 Related Work
Knowledge-Driven Biclustering. The use of background knowledge to guide
biclustering has been increasingly motivated since solutions with good homo-
are pushed as deeply as possible within the mining step for an optimal prun-
ing of the search space. The nice properties exhibited by constraints, such as
anti-monotone and succinct properties, have been initially seized by Apriori
methods [21] to affect the generation of candidates. Convertible constraints can hardly be pushed into Apriori but can be handled by FP-Growth approaches
[23]. FICA, FICM, and more recently MCFPTree, are FP-Growth extensions to
seize the properties of anti-monotone, succinct and convertible constraints [23].
The inclusion of monotone constraints is more complex. Filtering methods, such
as ExAnte, are able to combine anti-monotone and monotone pruning based on
reduction procedures [2]. Reductions are optimally handled in FP-Trees [3].
BicPAM [11], BicSPAM [12] and BiP [9] are the state-of-the-art algorithms
for pattern-based biclustering. They integrate the dispersed contributions of
previous pattern-based algorithms and extend them to discover non-constant
coherencies and to guarantee their robustness to discretization (by assigning
multi-items to a single element [11]), noise and missing values. In this section, we pro-
pose BiC2PAM (BiClustering with Constraints using PAttern Mining) to inte-
grate their contributions and adapt them to effectively incorporate constraints.
BiC2PAM is a composition of three major steps: 1) preprocessing to itemize real-
valued data; 2) mining step, corresponding to the application of full-pattern min-
ers; and 3) postprocessing to merge, reduce, extend and filter similar biclusters.
As follows, Section 4.1 lists native constraints supported by parameterizations
along these steps. Section 4.2 lists biologically meaningful constraints with prop-
erties of interest. Finally, we extend a pattern-growth search to seize efficiency
gains from succinct, (anti-)monotone and convertible constraints (Section 4.3 ).
– ranges of values (or symbols) to ignore from the input matrix, remove(S) where
S ⊆ R+ (or S ⊆ L). In gene expression, elements with default/non-differential
expression are generally less relevant and thus can be removed. This is achieved
by removing these elements from the transactions. Despite the simplicity of this
constraint, this option is not easily supported by peer biclustering algorithms [16].
– minimum coherency strength (or number of symbols) of the target biclusters: δ=1/|L|. Decreasing the coherency strength (increasing the number of symbols) reduces the noise-tolerance of the resulting set of biclusters and is often associated with solutions composed of a larger number of biclusters with smaller areas (see the sketch after this list).
– level of relaxation to handle noise by increasing the ηij noise range (Definition 2).
This constraint is used to adjust the behavior of BiC2PAM in the presence of noise
or discretization problems (values near a boundary of discretization). By default,
one symbol is associated with an element. Yet, this constraint gives the possibility
to assign an additional symbol to an element when its value is near a boundary
of discretization, or even a parameterizable number of symbols per element for a
high tolerance to noise (proof in [11]).
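The sketch announced in the list above shows how the first two native constraints can be realized during preprocessing: a real-valued expression row is itemized into |L| symbols (coherency strength δ=1/|L|) and the symbols marked for removal are dropped from the resulting transaction. The equal-width discretization and all names are assumptions made for illustration.

# Sketch: itemize a real-valued expression row into |L| symbols and drop the
# symbols in `remove` from the transaction (the remove(S) native constraint).
import numpy as np

def itemize_row(values, n_symbols, remove=frozenset()):
    """Equal-width discretization into symbols 1..|L|; returns (column, symbol) items."""
    lo, hi = float(np.min(values)), float(np.max(values))
    width = (hi - lo) / n_symbols or 1.0
    transaction = []
    for col, v in enumerate(values):
        symbol = min(int((v - lo) / width) + 1, n_symbols)
        if symbol not in remove:                 # e.g. drop non-differential expression
            transaction.append((col, symbol))
    return transaction

row = np.array([-2.9, -0.1, 0.2, 3.0, 1.4, -1.2])
print(itemize_row(row, n_symbols=6, remove={3, 4}))   # middle (default) symbols removed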
Fig. 2. Illustrative symbolic dataset and “price table” for expression data analysis.
and symbols to remove. We observe that the proposed F2G miner is the most
efficient option for denser data settings (looser coherency). Also, in contrast
with existing biclustering algorithms, BiC2PAM seizes large efficiency gains from
neglecting specific ranges of values (symbols) from the input matrix.
In order to test the ability of BiC2PAM to seize further efficiency gains in the
presence of non-trivial constraints, we fixed the 2000×200 setting with 6 sym-
bols/values {-3,-2,-1,1,2,3}. In the baseline performance, constraints were satis-
fied using post-filtering procedures. Fig.5 illustrates this analysis. As observed,
the use of constraints can significantly reduce the search complexity when they
are properly incorporated within the full-pattern mining method. In particular,
CFG principles [23] are used to seize efficiency gains from convertible constraints
and FP-Bonsai [3] to seize efficiency gains from monotonic constraints.
Results on Real Data. Fig.6 shows the (time and memory) efficiency of apply-
ing BiC2PAM on the yeast⁴ expression dataset with different pattern miners and
varying support thresholds for a desirable coherency strength of 10% (|L|=10).
The proposed F2G is the most efficient option in terms of time and, along with
Apriori, a competitive choice for efficient memory usage.
Finally, Figs. 7 and 8 show the impact of biologically meaningful constraints on the efficiency and effectiveness of BiC2PAM. For this purpose, we used the complete gasch dataset (6152×176) [6] with six levels of expression (|L|=6). The effect of the constraints on efficiency is shown in Fig. 7. This analysis supports their key role in providing opportunities to solve hard biomedical tasks.
⁴ http://www.upo.es/eps/bigs/datasets.html
Fig. 6. Computational time and memory of full-pattern miners for yeast (2884×17).
Fig. 7. Efficiency gains from using biological constraints for gasch (6152×176).
6 Conclusions
This work motivates the task of biclustering biological data in the presence of
constraints. To answer this task, we explore the synergies between pattern-based
References
1. Besson, J., Robardet, C., De Raedt, L., Boulicaut, J.-F.: Mining Bi-sets in numeri-
cal data. In: Džeroski, S., Struyf, J. (eds.) KDID 2006. LNCS, vol. 4747, pp. 11–23.
Springer, Heidelberg (2007)
2. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Exante: a preprocessing
method for frequent-pattern mining. IEEE Intel. Systems 20(3), 25–31 (2005)
3. Bonchi, F., Goethals, B.: FP-Bonsai: the art of growing and pruning small FP-
trees. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI),
vol. 3056, pp. 155–160. Springer, Heidelberg (2004)
4. Bonchi, F., Lucchese, C.: Extending the state-of-the-art of constraint-based pattern
discovery. Data Knowl. Eng. 60(2), 377–399 (2007)
5. Fang, G., Haznadar, M., Wang, W., Yu, H., Steinbach, M., Church, T.R.,
Oetting, W.S., Van Ness, B., Kumar, V.: High-Order SNP Combinations Associ-
ated with Complex Diseases: Efficient Discovery, Statistical Power and Functional
Interactions. Plos One 7 (2012)
6. Gasch, A.P., Werner-Washburne, M.: The genomics of yeast responses to environ-
mental stress and starvation. Functional & integrative genomics 2(4–5), 181–192
(2002)
7. Guerra, I., Cerf, L., Foscarini, J., Boaventura, M., Meira, W.: Constraint-based
search of straddling biclusters and discriminative patterns. JIDM 4(2), 114–123
(2013)
8. Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and
future directions. Data Min. Knowl. Discov. 15(1), 55–86 (2007)
9. Henriques, R., Madeira, S.: Biclustering with flexible plaid models to unravel interactions between biological processes. IEEE/ACM Trans. on Computational Biology and Bioinformatics (2015). doi:10.1109/TCBB.2014.2388206
10. Henriques, R., Antunes, C., Madeira, S.C.: Generative modeling of repositories
of health records for predictive tasks. Data Mining and Knowledge Discovery,
pp. 1–34 (2014)
11. Henriques, R., Madeira, S.: Bicpam: Pattern-based biclustering for biomedical data
analysis. Algorithms for Molecular Biology 9(1), 27 (2014)
12. Henriques, R., Madeira, S.: Bicspam: Flexible biclustering using sequential pat-
terns. BMC Bioinformatics 15, 130 (2014)
13. Henriques, R., Madeira, S.C., Antunes, C.: F2g: Efficient discovery of full-patterns.
In: ECML /PKDD IW on New Frontiers to Mine Complex Patterns. Springer-
Verlag, Prague, CR (2013)
14. Khiari, M., Boizumault, P., Crémilleux, B.: Constraint programming for mining
n-ary patterns. In: Cohen, D. (ed.) CP 2010. LNCS, vol. 6308, pp. 552–567.
Springer, Heidelberg (2010)
15. Kuznetsov, S.O., Poelmans, J.: Knowledge representation and processing with for-
mal concept analysis. Wiley Interdisc. Reviews: Data Mining and Knowledge Dis-
covery 3(3), 200–215 (2013)
16. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis:
A survey. IEEE/ACM Trans. Comput. Biol. Bioinformatics 1(1), 24–45 (2004)
17. Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D., Jacq, B.: Gotoolbox:
functional analysis of gene datasets based on gene ontology. Genome Biology (12),
101 (2004)
18. Martinez, R., Pasquier, C., Pasquier, N.: Genminer: Mining informative association
rules from genomic data. In: BIBM, pp. 15–22. IEEE CS (2007)
19. Mouhoubi, K., Létocart, L., Rouveirol, C.: A knowledge-driven bi-clustering
method for mining noisy datasets. In: Huang, T., Zeng, Z., Li, C., Leung, C.S.
(eds.) ICONIP 2012, Part III. LNCS, vol. 7665, pp. 585–593. Springer, Heidelberg
(2012)
20. Nepomuceno, J.A., Troncoso, A., Nepomuceno-Chamorro, I.A., Aguilar-Ruiz, J.S.:
Integrating biological knowledge based on functional annotations for biclustering
of gene expression data. Computer Methods and Programs in Biomedicine (2015)
21. Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning
optimizations of constrained associations rules. SIGMOD R. 27(2), 13–24 (1998)
22. Okada, Y., Fujibuchi, W., Horton, P.: A biclustering method for gene expression
module discovery using closed itemset enumeration algorithm. IPSJ T. on Bioinfo.
48(SIG5), 39–48 (2007)
23. Pei, J., Han, J.: Can we push more constraints into frequent pattern mining? In:
KDD. pp. 350–354. ACM, New York (2000)
24. Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view.
SIGKDD Explor. Newsl. 4(1), 31–39 (2002)
25. Serin, A., Vingron, M.: Debi: Discovering differentially expressed biclusters using
a frequent itemset approach. Algorithms for Molecular Biology 6, 1–12 (2011)
26. Visconti, A., Cordero, F., Pensa, R.G.: Leveraging additional knowledge to sup-
port coherent bicluster discovery in gene expression data. Intell. Data Anal. 18(5),
837–855 (2014)
A Critical Evaluation of Methods for the
Reconstruction of Tissue-Specific Models
1 Introduction
Over the last years, genome-scale metabolic models (GSMMs) for several organ-
isms have been developed, mainly for microbes with an interest in Biotechnology
[6,20]. These models have been used to predict cellular metabolism and promote
biological discovery [17], under constraint-based approaches such as Flux Bal-
ance Analysis (FBA) [18]. FBA finds a flux distribution that maximizes biomass
production, considering the knowledge of stoichiometry and reversibility of reac-
tions, and taking some simplifying assumptions, namely assuming quasi steady-
state conditions.
between distinct brain cells [14], and to find potential therapeutic targets for
the treatment of non-alcoholic steatohepatitis [15].
However, the aforementioned methods have not yet been critically and sys-
tematically evaluated on standardized case studies. Indeed, each of the methods
is proposed and validated with distinct cases and taking distinct omics data
sources as inputs. Thus, the impact of using different omics datasets on the final
results of those algorithms is a question that remains to be answered. Here, we
present a critical evaluation of the most important methods for the reconstruc-
tion of tissue-specific metabolic models published until now.
We have developed a framework where we implemented different methods
for the reconstruction of tissue-specific metabolic models. In this scenario, the
algorithms use sets of metabolites and/or maps of scores for each reaction as
input. So, in our framework the algorithms are independent from the omics data
source, and the separation of these two layers makes it possible to use different data sources
in each algorithm for the generation of tissue-specific metabolic models. As a
case study, to compare the three different approaches implemented, metabolic
models were reconstructed for hepatocytes, using the same set of data sources as
inputs for each algorithm. Moreover, distinct combinations of data sources are
evaluated to check their influence on the final results.
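One possible reading of this separation of layers is the minimal interface sketch below; the names and the toy thresholding method are ours, not those of the actual framework.

# Minimal sketch of the data/algorithm separation: every reconstruction method
# only sees reaction scores and metabolite evidence, never the raw omics source.
from dataclasses import dataclass
from typing import Callable, Dict, Set, List

@dataclass
class TissueEvidence:
    reaction_scores: Dict[str, float]   # e.g. derived from HPA or GEB
    core_metabolites: Set[str]          # metabolites with experimental support

# A reconstruction method maps (template reactions, evidence) -> kept reactions.
ReconstructionMethod = Callable[[List[str], TissueEvidence], List[str]]

def threshold_method(template: List[str], ev: TissueEvidence,
                     cutoff: float = 0.5) -> List[str]:
    """Toy stand-in for a real algorithm: keep reactions scored above a cutoff."""
    return [r for r in template if ev.reaction_scores.get(r, 0.0) >= cutoff]

evidence = TissueEvidence({"R1": 0.9, "R2": 0.1}, {"atp_c"})
print(threshold_method(["R1", "R2", "R3"], evidence))   # ['R1']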
Although there are several applications of the human GSMMs, the specificity of
cell types requires the reconstruction of tissue-specific metabolic models. Some
approaches have been proposed based on existing generic human models. Here,
we present three of the most well-known approaches for this task, which will be used in the remainder of this work.
INIT/ tINIT. The Integrative Network Inference for Tissues (INIT) [1] uses
the Human Protein Atlas (HPA) as its main source of evidence. Expression data
can be used when proteomic evidence is missing. It also allows the integration
of metabolomics data by imposing a positive net production of metabolites for
which there is experimental support, for instance in HMDB. The algorithm is
formulated using mixed integer-linear programming (MILP), so that the final
model contains reactions with high scores from HPA data. This algorithm does
not impose strict steady-state conditions for all internal metabolites, allowing a
small net accumulation rate. A couple of years later, a new version of this algo-
rithm was proposed, the Task-driven Integrative Network Inference for Tissues
(tINIT) [2], which reconstructs tissue-specific metabolic models based on protein
evidence from HPA and a set of metabolic tasks that the final context-specific model must perform. These tasks are used to test not only the production or uptake of external metabolites, but also the activation of pathways that occur in a specific tissue. Another improvement over the previous version is the addition of constraints to guarantee that irreversible reactions operate in one direction only.
rather the frequency of expressed states over several transcript profiles. Therefore, the expression data must first be binarized. Thus, it is possible to use data retrieved from the Gene Expression Barcode project, which already contains binary information on which genes are present or not in a specific tissue/cell type. Reactions from the non-core set are ranked according to expression scores, connectivity-based scores and confidence level-based scores. Then, sequentially, each reaction is removed and the consistency of the model is tested. The elimination only occurs if removing the reaction does not prevent the production of a key metabolite and the core consistency is preserved. Compared with the MBA algorithm, mCADRE presents two improvements: it allows the definition of key metabolites, i.e., metabolites with evidence of being produced in the context-specific model, and it relaxes the condition of including all core reactions in the final model.
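The pruning loop just described can be sketched as follows; this is an illustrative Python fragment, not the published mCADRE implementation, and the two test callables stand in for the actual model-based checks:

    def prune_non_core(reactions, non_core_scores,
                       keeps_core_consistent, produces_key_metabolites):
        """Illustrative mCADRE-style pruning loop (sketch only).

        reactions: set of reaction ids currently in the model.
        non_core_scores: dict mapping non-core reaction ids to a ranking score
            (lower score = weaker evidence, so those are tried first).
        keeps_core_consistent, produces_key_metabolites: callables testing a
            candidate reaction set; they stand in for the FBA-based checks.
        """
        model = set(reactions)
        # Visit non-core reactions from weakest to strongest evidence.
        for rxn in sorted(non_core_scores, key=non_core_scores.get):
            candidate = model - {rxn}
            # Accept the removal only if key metabolites remain producible
            # and the consistency of the core is preserved.
            if produces_key_metabolites(candidate) and keeps_core_consistent(candidate):
                model = candidate
        return model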
Table 1 shows the mathematical formulation and pseudocode for all algo-
rithms described above.
3 Results
To compare the metabolic models generated by the different algorithms and the
effects of distinct omics data sources, we chose the reconstruction of hepatocytes
metabolic models as our case study. Hepatocytes are the principal site of the
metabolic conversions underlying the diverse physiological functions of the liver
[10]. The hepatocytes metabolic models were generated using Recon2 as a tem-
plate model and the GEB, HPA and the sets CH and CM from [12] as input
data, for the three methods described in the previous section.
In the experiments, we seek to answer two main questions: Are omics data
consistent across different data sources? What is the overlap of the resulting
metabolic models obtained using different methods and different data sources? In
2010, a manually curated genome-scale metabolic network of human hepatocytes
was presented, the HepatoNet1 [8], used as a reference in the validation process.
Fig. 1. A) Number of genes present in Gene Expression Barcode and Human Protein
Atlas. In HPA, the number of genes with reliability “supportive” and “uncertain” is shown. B) Number of genes with evidence level “Low”, “Moderate” or “High” in HPA
and gene expression evidence higher than 0 in Gene Expression Barcode.
Fig. 2. A) Distribution of genes from Gene Expression Barcode project and Human
Protein Atlas across the evidence levels - “High”, “Moderate” and “Low”. The ranges
[0.9, 1], [0.5, 0.9[ and [0.1, 0.5[ were used to classify the data into “High”, “Moderate” and “Low” levels, respectively. B) Genes with no evidence of being present in hepatocytes from GEB,
but with evidence in the HPA. C) Genes with no evidence to be present in hepatocytes
from HPA, but with evidence in GEB.
Fig. 3. A) Reactions with evidence that support their inclusion in the hepatocytes
metabolic model. B) Number of reactions that have a high level of evidence of expres-
sion for each data source. C) Number of reactions that have a moderate evidence of
expression for each data source
on different sources can be observed. Considering all data sources, 3243 reactions
show some evidence that support their inclusion in the hepatocytes metabolic
model, but only 388 are supported by all sources. The numbers are further
dramatically reduced if we consider only moderate or high levels of evidence
(Figure 3 B-C).
Fig. 4. A-C) Distribution of reactions across the 50 models for each data type. Grey
bars show an histogram with the number of reactions present in a certain number of
models. Green bars show the reactions that are present in the final model. D) Results
from hierarchical clustering of the resulting nine models.
“High” and “Moderate” evidence levels, and from the CH and CM sets, are all considered as belonging to the core. Furthermore, the mean percentage of reactions that belong to all models of the same algorithm is around 45%. When the comparison is made by grouping models with the same input data, the variance between models is lower than when grouping by algorithm. Here, the mean percentage of reactions common to all models with the same data source is around 67% (Supplementary Table S3). Again, the variability of the final results seems to be dominated by the data source factor.
The quality of the metabolic models was further validated using the metabolic
functions that are known to occur in hepatocytes [8]. The generic Recon2 human
metabolic model, used as template in the reconstruction process, is able to satisfy
337 of the 408 metabolic functions available. Metabolic functions related with
disease or involving metabolites not present in Recon2 were removed from the
original list.
Fig. 5. Metabolic models reaction intersection considering: (A) the same algorithm;
(B) the same omics data source.
Table 2. Number of reactions and the percentage of liver metabolic functions that
each metabolic model performs when compared with the template model - Recon 2.
The results of this functional validation, also showing the number of reactions in each metabolic model, are given in Table 2. They show that the number of satisfied metabolic tasks is very low compared with the manually curated metabolic model HepatoNet1. The metabolic model that performs the highest number of metabolic tasks was obtained using the MBA algorithm with the HPA evidence. Nevertheless, the success percentage is less than 25% of the performance of the template metabolic model, Recon2.
4 Conclusions
In this work, we present a survey of the most important methods for the recon-
struction of tissue-specific metabolic models. Each method was proposed to use
Acknowledgments. S.C. thanks the FCT for the Ph.D. Grant SFRH/BD/
80925/2011. The authors thank the FCT Strategic Project of UID/BIO/04469/2013
unit, the project RECI/BBB-EBI/0179/2012 (FCOMP-01-0124-FEDER-027462) and
the project “BioInd - Biotechnology and Bioengineering for improved Industrial and
Agro-Food processes”, REF. NORTE-07-0124-FEDER-000028 Co-funded by the Pro-
grama Operacional Regional do Norte (ON.2 - O Novo Norte), QREN, FEDER.
References
1. Agren, R., Bordel, S., Mardinoglu, A., Pornputtapong, N., Nookaew, I., Nielsen, J.:
Reconstruction of Genome-Scale Active Metabolic Networks for 69 Human Cell
Types and 16 Cancer Types Using INIT. PLoS Computational Biology 8(5),
e1002518 (2012)
2. Agren, R., Mardinoglu, A., Asplund, A., Kampf, C., Uhlen, M., Nielsen, J.: Iden-
tification of anticancer drugs for hepatocellular carcinoma through personalized
genome-scale metabolic modeling. Molecular Systems Biology 10, 721 (2014)
3. Barrett, T., Troup, D.B., Wilhite, S.E., Ledoux, P., et al.: NCBI GEO: archive for
functional genomics data sets - 10 years on. Nucleic Acids Research 39(suppl 1),
D1005–D1010 (2011)
4. Carlson, M.: hgu133plus2.db: Affymetrix Human Genome U133 Plus 2.0 Array
annotation data (chip hgu133plus2) (2014). r package version 3.0.0
5. Duarte, N.C., Becker, S.A., Jamshidi, N., Thiele, I., Mo, M.L., Vo, T.D., Srivas, R.,
Palsson, B.O.: Global reconstruction of the human metabolic network based on
genomic and bibliomic data. Proceedings of the National Academy of Sciences of the
United States of America 104(6), 1777–1782 (2007)
6. Duarte, N.C., Herrgård, M.J., Palsson, B.O.: Reconstruction and validation of Sac-
charomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic
model. Genome Research 14(7), 1298–1309 (2004)
7. Flicek, P., Amode, M.R., Barrell, D., et al.: Ensembl 2014. Nucleic Acids Research
42(D1), D749–D755 (2014)
8. Gille, C., Bölling, C., Hoppe, A., et al.: HepatoNet1: a comprehensive metabolic
reconstruction of the human hepatocyte for the analysis of liver physiology. Molec-
ular Systems Biology 6(411), 411 (2010)
9. Hao, T., Ma, H.W., Zhao, X.M., Goryanin, I.: Compartmentalization of the Edin-
burgh Human Metabolic Network. BMC Bioinformatics 11, 393 (2010)
10. Ishibashi, H., Nakamura, M., Komori, A., Migita, K., Shimoda, S.: Liver architec-
ture, cell function, and disease. Seminars in Immunopathology 31(3) (2009)
11. Jerby, L., Ruppin, E.: Predicting Drug Targets and Biomarkers of Cancer via
Genome-Scale Metabolic Modeling. Clinical Cancer Research : An Official Journal
of the American Association for Cancer Research 18(20), 5572–5584 (2012)
12. Jerby, L., Shlomi, T., Ruppin, E.: Computational reconstruction of tissue-specific
metabolic models: application to human liver metabolism. Molecular Systems Biol-
ogy 6(401), 401 (2010)
13. Kaddurah-Daouk, R., Kristal, B., Weinshilboum, R.: Metabolomics: a global bio-
chemical approach to drug response and disease. Annu. Rev. Pharmacol. Toxicol.
48, 653–683 (2008)
14. Lewis, N.E., Schramm, G., Bordbar, A., Schellenberger, J., Andersen, M.P., Cheng,
J.K., Patel, N., Yee, A., Lewis, R.A., Eils, R., König, R., Palsson, B.O.: Large-scale
in silico modeling of metabolic interactions between cell types in the human brain.
Nature Biotechnology 28(12), 1279–1285 (2010)
15. Mardinoglu, A., Agren, R., Kampf, C., Asplund, A., Uhlen, M., Nielsen, J.:
Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in
patients with non-alcoholic fatty liver disease. Nature Communications 5 (2014)
16. McCall, M.N., Jaffee, H.A., Zelisko, S.J., Sinha, N., et al.: The Gene Expression
Barcode 3.0: improved data processing and mining tools. Nucleic Acids Research
42(D1), D938–D943 (2014)
17. Oberhardt, M.A., Palsson, B.O., Papin, J.A.: Applications of genome-scale
metabolic reconstructions. Molecular Systems Biology 5(320), 320 (2009)
18. Orth, J.D., Thiele, I., Palsson, B.O.: What is flux balance analysis? Nature Biotech-
nology 28(3), 245–248 (2010)
19. Parkinson, H., Sarkans, U., Shojatalab, M., Abeygunawardena, N., et al.:
ArrayExpress-a public repository for microarray gene expression data at the EBI.
Nucleic Acids Research 33(Database issue) (2005)
20. Reed, J.L., Vo, T.D., Schilling, C.H., Palsson, B.O.: An expanded genome-scale
model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biology 4(9), 1–12 (2003)
21. Sahoo, S., Franzson, L., Jonsson, J.J., Thiele, I.: A compendium of inborn errors
of metabolism mapped onto the human metabolic network. Mol. BioSyst. 8(10),
2545–2558 (2012)
22. Sahoo, S., Thiele, I.: Predicting the impact of diet and enzymopathies on human
small intestinal epithelial cells. Human Molecular Genetics 22(13), 2705–2722
(2013)
23. Shlomi, T., Benyamini, T., Gottlieb, E., Sharan, R., Ruppin, E.: Genome-scale
metabolic modeling elucidates the role of proliferative adaptation in causing the
Warburg effect. PLoS Computational Biology 7(3), e1002018 (2011)
24. Shlomi, T., Cabili, M.N., Ruppin, E.: Predicting metabolic biomarkers of human
inborn errors of metabolism. Molecular Systems Biology 5(263), 263 (2009)
25. Thiele, I., Swainston, N., Fleming, R.M.T., et al.: A community-driven global
reconstruction of human metabolism. Nature Biotechnology 31(5) (2013)
26. Uhlen, M., Oksvold, P., Fagerberg, L., Lundberg, E., et al.: Towards a knowledge-
based Human Protein Atlas. Nat Biotech 28(12), 1248–1250 (2010)
27. Wang, Y., Eddy, J.A., Price, N.D.: Reconstruction of genome-scale metabolic mod-
els for 126 human tissues using mCADRE. BMC Systems Biology 6(1), 153 (2012)
28. Wishart, D.S., Knox, C., Guo, A.C., Eisner, R., et al.: HMDB: a knowledgebase
for the human metabolome. Nucleic Acids Research 37(suppl 1) (2009)
29. Yizhak, K., Le Dévédec, S.E., Rogkoti, V.M.M., et al.: A computational study of
the Warburg effect identifies metabolic targets inhibiting cancer migration. Molec-
ular Systems Biology 10(8) (2014)
Fuzzy Clustering for Incomplete Short Time
Series Data
1 Introduction
bone metastatic patients. Due to the problems referred to before, these biomarker measurements are commonly short time series with missing data. Many methods are not able to deal, at the same time, with missing data and short time series, let alone with unevenly sampled time series. The proposed clustering algorithm is able to take into account both missing data and short time series, evenly or unevenly sampled. The approaches are unsupervised, since the objective of the marker study is to relate the outcome of the unsupervised clustering with the patients’ health outcomes.
V1 is the derivative of the data points with an interval of one point, V2 is the derivative with an interval of two points, and V3 is the derivative with an interval of three points. The vector V will be the input used in this method.
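A small sketch of how such a vector could be assembled for an evenly sampled series is given below (illustrative only, not the authors' code); missing values are kept as NaN so that they simply propagate into the corresponding derivative entries:

    import numpy as np

    def derivative_features(y, max_lag=3):
        """Build V = [V1, V2, V3]: differences of y over intervals of 1..max_lag points."""
        y = np.asarray(y, dtype=float)
        parts = []
        for lag in range(1, max_lag + 1):
            parts.append((y[lag:] - y[:-lag]) / lag)  # slope over `lag` sampling steps
        return np.concatenate(parts)

    # Example: a short series with one missing observation.
    v = derivative_features([1.0, 2.0, np.nan, 3.5, 4.0])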
Fig. 1. Time series classification. For each dataset, the 20 time series are divided into the 4 clusters, specified by the different contours. (Panels A–D plot the series values against Time for the four datasets.)
[Figure: mean number of misclassifications as a function of the percentage of missing data (%), for four datasets (panels A–D) and for the methods PDS, PDS - Slopes, PDS - Slopes Comb, OCS, OCS - Slopes, OCS - Slopes Comb, and STS - OCS.]
PDS performs better than the OCS, probably due to the bias arising from the
imputation. In all datasets, the PDS-Slopes Comb method performed the best.
In Table 1 the results for 40% of missing data are summarised. For each
dataset every method was computed with 500 runs, and the mean and standard
4 Conclusions
This work describes and compares several algorithms for the clustering of short
and incomplete time series data. As expected, all methods perform very well with
complete data and the performance decreases when the percentage of missing
data increases. Nevertheless it is possible to maintain a reasonable accuracy in
the final classification even for values as large as 40%. When the original time series have some sort of underlying pattern or evident trend, the methods using the combination of different slopes (Slopes Comb) with all possible lags are preferable. Overall, PDS with combined slopes achieves an excellent performance in the datasets tested, with an average of zero misclassified series when 10% of the values are missing. Interestingly, even for higher percentages of missing values, the performance of this method with combined derivatives or slopes is very high, with average misclassification much lower than what would be expected.
These algorithms can be applied directly in several areas of clinical stud-
ies, namely in oncology. One possible future application is the stratification of
patients based on biomarker evolution, which is expected to have a direct impact
on feature selection methods and survival analysis of oncological patients.
References
1. Cismondi, F., et al.: Missing data in medical databases: Impute, delete or classify?
Artificial Intelligence in Medicine 58(1), 63–72 (2013)
2. Westhoff, P.G., et al.: An Easy Tool to Predict Survival in Patients Receiving
Radiation Therapy for Painful Bone Metastases. International Journal of Radiation
Oncology*Biology*Physics 90(4), 739–747 (2014)
3. Harries, M., et al.: Incidence of bone metastases and survival after a diagnosis of
bone metastases in breast cancer patients. Cancer Epidemiology 38(4), 427–434
(2014)
4. Hathaway, R.J., Bezdek, J.C.: Fuzzy c-means clustering of incomplete data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 31(5), 735–744 (2001)
5. Dixon, J.K.: Pattern recognition with partly missing data. IEEE Transactions on
Systems, Man and Cybernetics 9(10), 617–621 (1979)
6. Warren Liao, T.: Clustering of time series data - A survey. Pattern Recognition
38(11), 1857–1874 (2005)
7. Möller-Levet, C.S., Klawonn, F., Cho, K.-H., Wolkenhauer, O.: Fuzzy clustering
of short time-series and unevenly distributed sampling points. In: Berthold, M.,
Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds.) IDA 2003. LNCS, vol. 2810,
pp. 330–340. Springer, Heidelberg (2003)
General Artificial Intelligence
Allowing Cyclic Dependencies in Modular Logic
Programming
1 Introduction
Over the last few years, answer set programming (ASP) [2,6,12,15,18] emerged
as one of the most important methods for declarative knowledge representation
and reasoning. Despite its declarative nature, developing ASP programs resem-
bles conventional programming: one often writes a series of gradually improving
programs for solving a particular problem, e.g., optimizing execution time and
space. Until recently, ASP programs were considered as integral entities, which
becomes problematic as programs become more complex, and their instances
grow. Even though modularity is extensively studied in logic programming, there
are only a few approaches on how to incorporate it into ASP [1,5,8,19] or other
module-based constraint modeling frameworks [11,22]. The research on modular systems of logic programs has followed two mainstreams [3]: one is programming-in-the-large, where compositional operators are defined in order to combine different modules [8,14,20]. These operators allow combining programs algebraically, which does not require an extension of the theory of logic programs. The other direction is programming-in-the-small [10,16], aiming at enhancing logic programming with scoping and abstraction mechanisms available in other
Modular aspects of ASP have been clarified in recent years, with authors describ-
ing how and when two program parts (modules) can be composed [5,11,19] under
the stable model semantics. In this paper, we will make use of Oikarinen and
Janhunen’s logic program modules defined in analogy to [8] which we review
after presenting the syntax of ASP.
Answer Set Programming Logic programs in the ASP paradigm are formed by
finite sets of rules r having the following syntax:
As observed by [19], the heads of choice rules possessing multiple atoms can be
freely split without affecting their semantics. When splitting such rules into n
different rules, one for each head atom Ai:
{Ai} ← B1, . . . , Bk, not C1, . . . , not Cm.
is safe, namely that a car is safe if it has an airbag; it is known that car c1 has
an airbag, c2 does not, and the choice rule states that car c3 may or may not
have an airbag.
Next, the SM semantics is generalized to cover modules by introducing a
generalization of the Gelfond-Lifschitz’s fixpoint definition. In addition to weakly
negated literals (i.e., not ), also literals involving input atoms are used in the
stability condition. In [19], the SMs of a module are defined as follows:
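Up to notation, the condition in [19] states that an interpretation M ⊆ At(P) is a stable model of a module P, with rules R and input atoms I, whenever it coincides with the least model of the reduct of its rules extended with the chosen input atoms as facts:

    M ∈ AS(P)  iff  M = LM( R^M ∪ { a. | a ∈ M ∩ I } ).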
Intuitively, the SMs of a module are obtained from the SMs of the rules part,
for each possible combination of the input atoms.
Example 3. Program modules PB, PC, and Pmg1 each have a single answer set: AS(PB) = {{exp(c2)}}, AS(PC) = {{exp(c3)}}, and AS(Pmg1) = {{safe(c1), airbag(c1)}}.
Module Pmg2 has two SMs, namely:
{safe(c1), car(c1), car(c2), car(c3), airbag(c1)}, and
{safe(c1), safe(c3), car(c1), car(c2), car(c3), airbag(c1), airbag(c3)}.
Alice’s ASP program module has 2^6 = 64 models, each corresponding to an input combination of safe and expensive atoms. Some of these models are:
However, the conditions given for ⊕ are not enough to guarantee composi-
tionality in the case of answer sets and as such they define a restricted form:
Natural join (⋈) on visible atoms is used in [19] to combine the stable models
of modules as follows:
Definition 4 (Join). Given modules P1 and P2 and sets of interpretations A1 ⊆ 2^At(P1) and A2 ⊆ 2^At(P2), the natural join of A1 and A2 is:
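Following [19], up to notation,

    A1 ⋈ A2 = { M1 ∪ M2 | M1 ∈ A1, M2 ∈ A2, and M1 ∩ Atv(P2) = M2 ∩ Atv(P1) },

where Atv(P) denotes the visible atoms of module P; that is, only pairwise compatible models, agreeing on the atoms visible to both modules, are joined.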
It is immediate to see that the module theorem holds in this case. The visible atoms of PA are safe/1, exp/1 and buy/1, and the visible atoms for Pmg1 are {safe(c1), safe(c2)}. The only model of Pmg1, {safe(c1)}, when naturally joined with the models of PA, results in eight possible models where safe(c1), not safe(c2), and not safe(c3) hold, and exp/1 varies. The final ASP program module Q is
{buy(X) ← car(X), safe(X), not exp(X).
car(c1). car(c2). car(c3). exp(c2). safe(c1).},
{exp(c1)},
{buy(c1), buy(c2), buy(c3), exp(c2), safe(c1), safe(c2), safe(c3)},
{car(c1), car(c2), car(c3)}
2.3 Shortcomings
The conditions imposed in these definitions bring about some shortcomings such
as the fact that the output signatures of two modules must be disjoint which dis-
allows many practical applications e.g., we are not able to combine the results of
program module Q with any of PC or Pmg2 , and thus it is impossible to obtain
the combination of the five modules. Also because of this, the module union operator is not reflexive. By trivially waiving this condition, we immediately get problems with conflicting modules. The compatibility criterion for the ⊕ operator also rules out the compositionality of mutually dependent modules, but allows positive loops inside modules, or negative loops in general. We illustrate this in Example 5, an issue that has been solved recently in [17], and the issue with positive loops between modules in Example 6.
Their SMs are AS(P1) = AS(P2) = {{}, {airbag, safe}}, while the single SM of the union P1 ⊕ P2 is the empty model {}. Therefore AS(P1 ⊕ P2) ≠ AS(P1) ⋈ AS(P2) = {{}, {airbag, safe}}, also invalidating the module theorem.
Example 7. Given modules P1 = {a ← b. ⊥ ← not b.}, {b}, {a}, {} with one
SM {a, b}, and P2 = {b ← a.}, {a}, {b}, {} with SMs {} and {a, b}, their
composition has no inputs and no intended SMs while their minimal join contains
{a, b}.
We present a model join operation that requires one to look at every model of
both modules being composed in order to check for minimality on models com-
parable on account of their inputs. However, this operation is able to distinguish
between atoms that are self supported through positive loops and atoms with
proper support, allowing one to lift the condition in Definition 3 disallowing
positive dependencies between modules.
Example 8 (Minimal Join). A car is safe if it has an airbag and it has an airbag
if it is safe and the airbag is an available option. This is captured by two modules,
namely: P1 = {airbag ← safe, available option.}, {safe, available option}, {airbag}, ∅ and P2 = {safe ← airbag.}, {airbag}, {safe}, ∅, which respectively have AS(P1) = {{}, {safe}, {available option}, {airbag, safe, available option}} and AS(P2) = {{}, {airbag, safe}}. The composition has {available option} as its input signature and therefore its answer set {airbag, safe, available option} is not minimal regarding the input signature of the composition, because {available option} is also a SM (and the only intended model among these two). Thus AS(P1 ⊕ P2) = AS(P1) ⋈_min AS(P2) = {{}, {available option}}.
This join operator allows us to lift the prohibition of composing mutually depen-
dent modules under certain situations. Integrity constraints containing only
input atoms in their body are still a problem with this approach as these would
exclude models that would otherwise be minimal in the presence of unsupported
loops.
Because the former operator is not general and forces us to compare each model with every other model for minimality (it is not local), we present next an alternative that requires adding annotations to models. We start by looking at positive cyclic dependencies (loops) that are formed by composition. It is known from the literature (e.g., [21]) that, in order to avoid looking at the rules of the program modules being composed, which in the MLP setting we assume we do not have access to, we need extra information incorporated into the models.
starting from the interpretation mapping every atom into the empty set. In order
to consider input atoms in modules we set I(a) = {{a}} for every a ∈ M ∩ I,
and {} otherwise.
Definition 11 (Modified Join). Given two compatible annotated (in the sense
of Definition 10) modules P1 , P2 , their composition is P1 ⊗P2 = P1 ⊕P2 provided
that (i) P1 ⊕ P2 is defined. This way, given modules P1 and P2 and sets of
annotated interpretations A^A_1 ⊆ 2^At(P1) and A^A_2 ⊆ 2^At(P2), the natural join of A^A_1 and A^A_2, denoted by A^A_1 ⋈^A A^A_2, is defined as follows for intersecting output atoms:
Example 10 (Cyclic Dependencies Revisited). Take again the two program mod-
ules in Example 6:
which respectively have annotated models AS^A(P1) = {{}, {airbag^{safe}, safe}} and AS^A(P2) = {{}, {airbag, safe^{airbag}}}, while AS^A(P1 ⊗ P2) = {{}, {airbag^{safe}, safe^{airbag}}}. Because of this, AS^A(P1 ⊗ P2) = AS^A(P1) ⋈^A AS^A(P2). Now, take P3 = {airbag.}, {}, {airbag}, ∅ and compose it with P1 ⊗ P2. We get AS^A(P1 ⊗ P2 ⊗ P3) = {{airbag, safe}}.
References
1. Babb, J., Lee, J.: Module theorem for the general theory of stable models. TPLP
12(4–5), 719–735 (2012)
2. Baral, C.: Knowledge Representation, Reasoning, and Declarative Problem Solv-
ing. Cambridge University Press (2003)
3. Bugliesi, M., Lamma, E., Mello, P.: Modularity in logic programming. J. Log.
Program. 19(20), 443–502 (1994)
4. Viegas Damásio, C., Moura, J.: Modularity of P-log programs. In: Delgrande, J.P.,
Faber, W. (eds.) LPNMR 2011. LNCS, vol. 6645, pp. 13–25. Springer, Heidelberg
(2011)
5. Dao-Tran, M., Eiter, T., Fink, M., Krennwallner, T.: Modular nonmonotonic logic
programming revisited. In: Hill, P.M., Warren, D.S. (eds.) ICLP 2009. LNCS,
vol. 5649, pp. 145–159. Springer, Heidelberg (2009)
6. Eiter, T., Faber, W., Leone, N., Pfeifer, G.: Computing preferred and weakly pre-
ferred answer sets bymeta-interpretation in answer set programming. In: Proceed-
ings AAAI 2001 Spring Symposium on Answer Set Programming, pp. 45–52. AAAI
Press (2001)
7. Ferraris, P., Lifschitz, V.: Weight constraints as nested expressions. TPLP 5(1–2),
45–74 (2005)
8. Gaifman, H., Shapiro, E.: Fully abstract compositional semantics for logic
programs. In: Symposium on Principles of Programming Languages, POPL,
pp. 134–142. ACM, New York (1989)
9. Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In:
Proceedings of the 5th International Conference on Logic Program. MIT Press
(1988)
10. Giordano, L., Martelli, A.: Structuring logic programs: a modal approach. The
Journal of Logic Programming 21(2), 59–94 (1994)
11. Järvisalo, M., Oikarinen, E., Janhunen, T., Niemelä, I.: A module-based framework
for multi-language constraint modeling. In: Erdem, E., Lin, F., Schaub, T. (eds.)
LPNMR 2009. LNCS, vol. 5753, pp. 155–168. Springer, Heidelberg (2009)
12. Lifschitz, V.: Answer set programming and plan generation. Artificial Intelligence
138(1–2), 39–54 (2002)
13. Lifschitz, V., Pearce, D., Valverde, A.: Strongly equivalent logic programs. ACM
Transactions on Computational Logic 2(4), 526–541 (2001)
14. Mancarella, P., Pedreschi, D.: An algebra of logic programs. In: ICLP/SLP,
pp. 1006–1023 (1988)
15. Marek, V.W., Truszczynski, M.: Stable models and an alternative logic program-
ming paradigm. In: The Logic Programming Paradigm: A 25-Year Perspective
(1999)
16. Miller, D.: A theory of modules for logic programming. In: Symp. Logic Pro-
gramming, pp. 106–114 (1986)
17. Moura, J., Damásio, C.V.: Generalising modular logic programs. In: 15th Interna-
tional Workshop on Non-Monotonic Reasoning (NMR 2014) (2014)
18. Niemelä, I.: Logic programs with stable model semantics as a constraint program-
ming paradigm. Annals of Mathematics and Artificial Intelligence 25, 72–79 (1998)
19. Oikarinen, E., Janhunen, T.: Achieving compositionality of the stable model
semantics for smodels programs. Theory Pract. Log. Program. 8(5–6), 717–761
(2008)
20. O’Keefe, R.A.: Towards an algebra for constructing logic programs. In: SLP,
pp. 152–160 (1985)
21. Slota, M., Leite, J.: Robust equivalence models for semantic updates of answer-set
programs. In: Brewka, G., Eiter, T., McIlraith, S.A. (eds.) Proc. of KR 2012. AAAI
Press (2012)
22. Tasharrofi, S., Ternovska, E.: A semantic account for modularity in multi-language
modelling of search problems. In: Tinelli, C., Sofronie-Stokkermans, V. (eds.) Fro-
CoS 2011. LNCS, vol. 6989, pp. 259–274. Springer, Heidelberg (2011)
Probabilistic Constraint Programming
for Parameters Optimisation
of Generative Models
Massimiliano Zanin(B) , Marco Correia, Pedro A.C. Sousa, and Jorge Cruz
Abstract. Complex networks theory has commonly been used for mod-
elling and understanding the interactions taking place between the ele-
ments composing complex systems. More recently, the use of generative
models has gained momentum, as they allow identifying which forces and
mechanisms are responsible for the appearance of given structural prop-
erties. In spite of this interest, several problems remain open, one of the
most important being the design of robust mechanisms for finding the
optimal parameters of a generative model, given a set of real networks. In
this contribution, we address this problem by means of Probabilistic Con-
straint Programming. By using as an example the reconstruction of net-
works representing brain dynamics, we show how this approach is superior
to other solutions, in that it allows a better characterisation of the param-
eters space, while requiring a significantly lower computational cost.
1 Introduction
The last decades have witnessed a revolution in science, thanks to the appearance
of the concept of complex systems: systems that are composed of a large number
of interacting elements, and whose interactions are as important as the elements
themselves [1]. In order to study the structures created by such relationships,
several tools have been developed, among which complex networks theory [2,3],
a statistical mechanics understanding of graph theory, stands out.
Complex networks have been used to characterise a large number of different
systems, from social [4] to transportation ones [5]. They have also been valu-
able in the study of brain dynamics, as one of the greatest challenges in modern
science is the characterisation of how the brain organises its activity to carry
out complex computations and tasks. Constructing a complete picture of the
computation performed by the brain requires specific mathematical, statistical
and computational techniques. As brain activity is usually complex, with differ-
ent regions coordinating and creating temporally multi-scale, spatially extended
networks, complex networks theory appears as the natural framework for its
characterisation.
When complex networks are applied to brain dynamics, nodes are associated
to sensors (e.g. measuring the electric and magnetic activity of neurons), thus
to specific brain locations, and links to some specific conditions. For instance,
brain functional networks are constructed such that pairs of nodes are connected
if some kind of synchronisation, or correlated activity, is detected in those nodes
- the rationale being that a coordinated dynamics is the result of some kind
of information sharing [6]. Once these networks are reconstructed, graph the-
ory allows endowing them with a great number of quantitative properties, thus
vastly enriching the set of objective descriptors of brain structure and function
at neuroscientists’ disposal. This has especially been fruitful in the characterisa-
tion of the differences between healthy (control) subjects and patients suffering
from neurologic pathologies [7].
Once the topology (or structure) of a network has been described, a further
question may be posed: can such topology be explained by a set of simple gen-
erative rules, like a higher connectivity of neighbouring regions, or the influence
of nodes physical position? When a set of rules (a generative model) has been
defined, it has to be optimised and validated: one ought to obtain the best set
of parameters, such that the networks yielded by the model are topologically
equivalent to the real ones. This usually requires maximising a function of the
p-values representing the differences between the characteristics of the synthetic
and real networks. In spite of being accepted as a standard strategy, this method
presents several drawbacks. First, its high computational complexity: large sets
of networks have to be created and analysed for every possible combination of
parameters; and second, its unfitness for assessing the presence of multiple local
minima.
In this contribution, we propose the use of probabilistic constraint program-
ming (PCP) for characterising the space created by the parameters of a gen-
erative model, i.e. a space representing the distance between the topological
characteristics of real and synthetic networks. We show how this approach allows
recovering a larger quantity of information about the relationship between model
parameters and network topology, with a fraction of the computational cost
required by other methods. Additionally, PCP can be applied to single subjects
(networks), thus avoiding the constraints associated with working with a large
and homogeneous population. We further validate the PCP approach by study-
ing a simple generative model, and by applying it to a data set of brain activity
of healthy people.
The remainder of the text is organised as follows. Besides this introduction,
Sections 2 and 2.1 respectively review the state of the art in constraint program-
ming and its probabilistic version. Afterwards, the application of PCP is pre-
sented in Section 3 for a data set of brain magneto-encephalographic recordings,
and the advantages of PCP are discussed in Section 4. Finally, some conclusions
are drawn in Section 5.
2 Constraint Programming
A constraint satisfaction problem [8] is a classical artificial intelligence paradigm
characterised by a set of variables and a set of constraints, the latter specifying
relations among subsets of these variables. Solutions are assignments of values
to all variables that satisfy all the constraints.
Constraint programming is a form of declarative programming, in the sense
that instead of specifying a sequence of steps to be executed, it relies on prop-
erties of the solutions to be found that are explicitly defined by the constraints.
A constraint programming framework must provide a set of constraint reasoning
algorithms that take advantage of constraints to reduce the search space, avoid-
ing regions inconsistent with the constraints. These algorithms are supported by
specialised techniques that explore the specificity of the constraint model, such
as the domain of its variables and the structure of its constraints.
Continuous constraint programming [9,10] has been widely used to model
safe reasoning in applications where uncertainty on the values of the variables is
modelled by intervals including all their possibilities. A Continuous Constraint
Satisfaction Problem (CCSP) is a triple ⟨X, D, C⟩, where X is a tuple of n real variables ⟨x1, . . . , xn⟩, D is a Cartesian product of intervals D(x1) × · · · × D(xn)
(a box), each D(xi ) being the domain of variable xi , and C is a set of numerical
constraints (equations or inequalities) on subsets of the variables in X. A solution
of the CCSP is a value assignment to all variables satisfying all the constraints
in C. The feasible space F is the set of all CCSP solutions within D.
Continuous constraint reasoning relies on branch-and-prune algorithms [11]
to obtain sets of boxes that cover exact solutions for the constraints (the feasible
space F ). These algorithms begin with an initial crude cover of the feasible space
(the initial search space, D) which is recursively refined by interleaving pruning
and branching steps until a stopping criterion is satisfied. The branching step
splits a box from the covering into sub-boxes (usually two). The pruning step
either eliminates a box from the covering or reduces it into a smaller (or equal)
box maintaining all the exact solutions. Pruning is achieved through an algo-
rithm [12] that combines constraint propagation and consistency techniques [13]:
each box is reduced through the consecutive application of narrowing operators
associated with the constraints, until a fixed-point is attained. These opera-
tors must be correct (do not eliminate solutions) and contracting (the obtained
box is contained in the original). To guarantee such properties, interval analysis
methods are used.
Interval analysis [14] is an extension of real analysis that allows computations
with intervals of reals instead of reals, where arithmetic operations and unary
functions are extended for interval operands. For instance, [1, 3] + [3, 7] results
in the interval [4, 10], which encloses all the results from a point-wise evaluation
of the real arithmetic operator on all the values of the operands. In practice these
extensions simply consider the bounds of the operands to compute the bounds
of the result, since the involved operations are monotonic. As such, the narrow-
ing operator Z ← Z ∩ (X + Y ) may be associated with constraint x + y = z
to prune the domain of variable z based on the domains of variables x and y.
Similarly, in solving the equation with respect to x and y, two additional narrow-
ing operators can be associated with the constraint, to safely narrow the domains
of these variables. With this technique, based on interval arithmetic, the obtained
narrowing operators are able to reduce a box X × Y × Z = [1, 3] × [3, 7] × [0, 5]
into [1, 2] × [3, 4] × [4, 5], with the guarantee that no possible solution is lost.
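This narrowing step can be sketched in a few lines of Python (illustrative only, not the solver of [12]); the example reproduces the reduction quoted above:

    def narrow_sum(X, Y, Z):
        """One round of narrowing for the constraint x + y = z over closed intervals (lo, hi)."""
        def add(a, b):  return (a[0] + b[0], a[1] + b[1])
        def sub(a, b):  return (a[0] - b[1], a[1] - b[0])
        def meet(a, b): return (max(a[0], b[0]), min(a[1], b[1]))
        Z = meet(Z, add(X, Y))  # Z <- Z intersect (X + Y)
        X = meet(X, sub(Z, Y))  # X <- X intersect (Z - Y)
        Y = meet(Y, sub(Z, X))  # Y <- Y intersect (Z - X)
        return X, Y, Z

    # The box [1,3] x [3,7] x [0,5] is reduced to [1,2] x [3,4] x [4,5].
    print(narrow_sum((1, 3), (3, 7), (0, 5)))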
Fig. 1. Schematic representation of the use of generative models for analysing functional networks. f and f̂ respectively represent real and synthetic topological features, such as the ones described in Sec. 3.2. Refer to Sec. 3 for a description of all steps of the analysis.
of the sampling space, where a pure non-naïve Monte Carlo (adaptive) method is not only hard to tune, but also impractical in small error settings.
In order to validate the use of PCP for analysing the parameters space of a
generative model, here we consider a set of magneto-encephalographic (MEG)
recordings. A series of preliminary steps are required, as shown in Fig. 1. First,
starting from the left, real brain data (or data representing any other real com-
plex system) have to be recorded and encoded in networks, then transformed
into a set of topological (structural) features. In parallel, as depicted in the right
part, a generative model has to be defined: this allows generating networks as a function of the model parameters, and extracting their topological features. Finally, both sets of features should be matched, i.e., the model parameters should be optimised to minimise the distance between the vectors of topological features of the synthetic and real networks.
[Figure: two panels with Link density (0.0–0.5) on the vertical axis and values ranging from 0.0 to 0.8 on the horizontal axis.]
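Following [23], the global efficiency can be written as

    E = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}} ,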
d_ij being the distance between nodes i and j, i.e., the number of jumps required to travel between them. A high value of E implies that all brain regions are connected by short paths.
It has to be noticed how these two measures are complementary, the cluster-
ing coefficient and efficiency respectively representing the segregation and inte-
gration of information [24,25]. Additionally, both C and E are here defined as
a function of the threshold τ applied to prune the networks - their evolution is
represented in Fig. 2.
k_i,j is the number of neighbours common to i and j, and d_i,j is the physical distance between the two nodes. This model thus includes two different forces that compete to create links. On one side, γ controls the appearance of triangles in the network, by positively biasing the connectivity between nodes having nearest neighbours in common; it thus shapes the clustering coefficient and the appearance of computational communities. On the other side, η accounts for the distance of the connection, such that long-range connections, which are biologically costly, are penalised.
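A connection-probability rule of the general form used in economical clustering models [26], and consistent with the description above (the exact functional form adopted in this work may differ), is

    P_{i,j} \propto (k_{i,j})^{\gamma} \, (d_{i,j})^{-\eta} ,

where a larger γ favours links that close triangles and a larger η more strongly penalises long connections.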
Fig. 3. (Left) Contour plot of the energy E (see Eq. 4) in the parameters space, for
a link density of 0.3. (Right) Energy contour plots for ten link densities, from 0.05
(bottom) to 0.5 (top); for the sake of clarity, only region outlines are visible.
and E = f̃_E(γ, η). Afterwards, each observed feature o_i is modelled as a function f_i of the model parameters plus an associated error term ε_i ∼ N(μ = 0, σ²):

    o_i = f_i(γ, η) + ε_i ,

3σ being chosen to keep the error within reasonable bounds, and the joint p.d.f. f,

    f(γ, η) = ∏_{i=1}^{n} g(o_i − f_i(γ, η))          (5)
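As a self-contained illustration only, Eq. (5) can be evaluated on a grid of (γ, η) values and normalised into a probability map such as those of Figs. 4 and 5; the fitted feature surfaces below are made-up stand-ins for f̃_C and f̃_E, and this brute-force loop does not reproduce the branch-and-prune PCP machinery:

    import numpy as np

    def probability_map(observed, feature_fns, sigma, gammas, etas):
        """Evaluate f(gamma, eta) = prod_i g(o_i - f_i(gamma, eta)) on a grid and normalise it."""
        g = lambda e: np.exp(-0.5 * (e / sigma) ** 2)  # unnormalised Gaussian kernel
        grid = np.zeros((len(gammas), len(etas)))
        for a, gamma in enumerate(gammas):
            for b, eta in enumerate(etas):
                errors = [o - f(gamma, eta) for o, f in zip(observed, feature_fns)]
                grid[a, b] = np.prod([g(e) for e in errors])
        return grid / grid.sum()

    # Two made-up feature surfaces standing in for the fitted f~_C and f~_E.
    f_C = lambda gamma, eta: 0.1 * gamma - 0.05 * eta
    f_E = lambda gamma, eta: 0.3 - 0.02 * gamma * eta
    pmap = probability_map([0.25, 0.28], [f_C, f_E], sigma=0.05,
                           gammas=np.linspace(-1, 11, 61), etas=np.linspace(0, 5, 26))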
Results presented in Sec. 3.4 and 3.5 allow comparing the p-value and PCP
methods, and highlight the advantages that the latter presents over the former.
The extremely high computational cost of analysing the parameters space
by means of K-S tests seldom allows a full characterisation of such space. This
is due to the fact that, for any set of parameters, a large number of networks
have to be created and characterised. Increasing the resolution of the analysis, or
enlarging the region of the space considered, increases the computational cost in
a linear way. This problem is far from trivial: for instance, the networks required to create Fig. 3 represent approximately 3 GB of information and several days of computation on a standard computer. Such a computational cost implies that it is easy to miss some important information. Let us consider, for instance, the result presented in Fig. 3 Left. The shape of the iso-lines suggests that the maximum is included in the region under analysis, and that no further exploration is required, while Figs. 4 and 5 prove otherwise.
Fig. 4. Contour plot of the parameters space, as obtained by the PCP method, for
the whole population of subjects and as a function of the link density. The colour of
each point represents the normalised probability of generating topologically equivalent
networks.
Fig. 5. (Left) Parameters space, as obtained with the PCP method, for a link density of 0.3 and for the whole studied population. (Right) Parameters space for six subjects. The scale of the right graphs is the same as that of the left one; the colour scale is the same as in Fig. 4.
On the other hand, estimating the functions f̃_C and f̃_E requires the creation and analysis of a constant number of networks, independently of the size of the parameters space. The total computational cost drops below one hour on a standard computer, implying a reduction of three orders of magnitude. This has important consequences on the kind of information one can obtain. Fig. 5 Left presents the same information as Fig. 3 Left, but calculated by means of PCP over a larger region. It is then clear that the maximum identified in Fig. 3 is just one of the two maxima present in the system.
The second important advantage is that, while the PCP can yield results for
just one network or subject, a p-value analysis requires a probability distribu-
tion. It is therefore not possible to characterise the parameters space for just
one subject, but only for a large population. Fig. 5 Right explores this issue, by
showing the probability evolution in the parameters space for six different sub-
jects. It is interesting to notice how subjects are characterised by different shapes
in the space. This allows a better description of subjects, aimed for instance at
detecting differences among them.
5 Conclusions
In this contribution, we have presented the use of Probabilistic Constraint Pro-
gramming for optimising the parameters of a generative model, aimed at describ-
ing the mechanisms responsible for the appearance of some given topological
structures in real complex networks. As a validation case, we have here presented
the results corresponding to functional networks of brain activity, as obtained
through MEG recordings of healthy people.
The advantages of this method over other customary solutions, e.g., the use of p-values obtained from Kolmogorov-Smirnov tests, have been discussed. First, the lower computational cost, and especially its independence of the size of the parameters space and of the resolution of the analysis. This allows a better characterisation of such space, reducing the risk of missing relevant results when multiple local minima are present. Second, the possibility of characterising the parameters space for single subjects, thus avoiding the need for data from a full population. This will in turn open new doors for understanding the differences between individuals, for instance, the identification of characteristics associated with specific diseases in diagnosis and prognosis tasks.
References
1. Anderson, P.W.: More is different. Science 177, 393–396 (1972)
2. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Reviews of
Modern Physics 74, 47 (2002)
3. Newman, M.E.: The structure and function of complex networks. SIAM Review
45, 167–256 (2003)
4. Costa, L.D.F., Oliveira Jr, O.N., Travieso, G., Rodrigues, F.A., Villas Boas, P.R.,
Antiqueira, L., Viana, M.P., Correa Rocha, L.E.: Analyzing and modeling real-
world phenomena with complex networks: a survey of applications. Advances in
Physics 60, 329–412 (2011)
5. Zanin, M., Lillo, F.: Modelling the air transport with complex networks: A short
review. The European Physical Journal Special Topics 215, 5–21 (2013)
6. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis
of structural and functional systems. Nature Reviews Neuroscience 10, 186–198
(2009)
7. Papo, D., Zanin, M., Pineda-Pardo, J.A., Boccaletti, S., Buldú, J.M.: Functional
brain networks: great expectations, hard times and the big leap forward. Philo-
sophical Transactions of the Royal Society of London B: Biological Sciences 369,
20130525 (2014)
8. Mackworth, A.K.: Consistency in networks of relations. Artificial Intelligence 8,
99–118 (1977)
9. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proc. of the 13th
IJCAI, pp. 232–238 (1993)
10. Benhamou, F., McAllester, D., van Hentenryck, P.: CLP(intervals) revisited. In:
ISLP, pp. 124–138 (1994)
11. Van Hentenryck, P., McAllester, D., Kapur, D.: Solving polynomial systems using
a branch and prune approach. SIAM Journal on Numerical Analysis 34, 797–827
(1997)
12. Granvilliers, L., Benhamou, F.: Algorithm 852: realpaver: an interval solver using
constraint satisfaction techniques. ACM Transactions on Mathematical Software
32, 138–156 (2006)
13. Benhamou, F., Goualard, F., Granvilliers, L., Puget, J.-F.: Revising hull and box
consistency. In: Procs. of ICLP, pp. 230–244 (1999)
14. Moore, R.: Interval analysis. Prentice-Hall, Englewood Cliffs (1966)
15. Carvalho, E.: Probabilistic constraint reasoning. PhD Thesis (2012)
16. Halpern, J.Y.: Reasoning about uncertainty. MIT, Cambridge (2003)
17. Hammersley, J.M., Handscomb, D.C.: Monte Carlo methods. Methuen, London
(1964)
18. Carvalho, E., Cruz, J., Barahona, P.: Probabilistic constraints for nonlinear inverse
problems. Constraints 18, 344–376 (2013)
19. Maestú, F., Fernández, A., Simos, P.G., Gil-Gregorio, P., Amo, C., Rodriguez, R.,
Arrazola, J., Ortiz, T.: Spatio-temporal patterns of brain magnetic activity during
a memory task in Alzheimer’s disease. Neuroreport 12, 3917–3922 (2001)
20. Stam, C.J., Van Dijk, B.W.: Synchronization likelihood: an unbiased measure
of generalized synchronization in multivariate data sets. Physica D: Nonlinear
Phenomena 163, 236–251 (2002)
21. Yang, S., Duan, C.: Generalized synchronization in chaotic systems. Chaos, Solitons
& Fractals 9, 1703–1707 (1998)
22. Newman, M.E.: Scientific collaboration networks. I. Network construction and fun-
damental results. Physical Review E 64, 016131 (2001)
23. Latora, V., Marchiori, M.: Efficient behavior of small-world networks. Physical
Review Letters 87, 198701 (2001)
24. Tononi, G., Sporns, O., Edelman, G.M.: A measure for brain complexity: relating
functional segregation and integration in the nervous system. Proceedings of the
National Academy of Sciences 91, 5033–5037 (1994)
25. Rad, A.A., Sendiña-Nadal, I., Papo, D., Zanin, M., Buldu, J.M., del Pozo, F.,
Boccaletti, S.: Topological measure locating the effective crossover between segre-
gation and integration in a modular network. Physical Review Letters 108, 228701
(2012)
26. Vértes, P.E., Alexander-Bloch, A.F., Gogtay, N., Giedd, J.N., Rapoport, J.L.,
Bullmore, E.T.: Simple models of human brain functional networks. Proceedings
of the National Academy of Sciences 109, 5868–5873 (2012)
27. Vértes, P.E., Alexander-Bloch, A., Bullmore, E.T.: Generative models of rich clubs
in Hebbian neuronal networks and large-scale human brain networks. Philosophical
Transactions of the Royal Society B: Biological Sciences 369, 20130531 (2014)
Reasoning over Ontologies
and Non-monotonic Rules
1 Introduction
Ontology languages in the form of Description Logics (DLs) [4] and non-
monotonic rule languages as known from Logic Programming (LP) [6] are both
well-known formalisms in knowledge representation and reasoning (KRR) each
with its own distinct benefits and features. This is also witnessed by the emer-
gence of the Web Ontology Language (OWL) [18] and the Rule Interchange
Format (RIF) [7] in the ongoing standardization of the Semantic Web driven by
the W3C (http://www.w3.org).
On the one hand, ontology languages have become widely used to represent and reason over taxonomic knowledge and, since DLs are (usually) decidable fragments of first-order logic, they are monotonic by nature, which means that once-drawn conclusions persist when new information is adopted. They also allow reasoning over abstract information, such as relations between classes of objects, even without knowing any concrete instances. A main theme inherited from DLs is the balance between expressiveness and complexity of reasoning. In fact, the very expressive general language OWL 2, with its high worst-case complexity, includes three tractable (polynomial) profiles [27], each with a different application purpose in mind.
2 http://www.ihtsdo.org/snomed-ct/
3 http://protege.stanford.edu
                         Syntax                 Semantics
atomic concept           A ∈ NC                 A^I ⊆ Δ^I
atomic role              R ∈ NR                 R^I ⊆ Δ^I × Δ^I
individual               a ∈ NI                 a^I ∈ Δ^I
top                      ⊤                      Δ^I
bottom                   ⊥                      ∅
conjunction              C ⊓ D                  C^I ∩ D^I
existential restriction  ∃R.C                   {x ∈ Δ^I | ∃y ∈ Δ^I : (x, y) ∈ R^I ∧ y ∈ C^I}
concept inclusion        C ⊑ D                  C^I ⊆ D^I
role inclusion           R ⊑ S                  R^I ⊆ S^I
role composition         R1 ◦ · · · ◦ Rk ⊑ S    (x1, x2) ∈ R1^I ∧ . . . ∧ (xk, y) ∈ Rk^I → (x1, y) ∈ S^I
features including the possibility to load and edit rule bases, and to define predicates with arbitrary arity; guaranteed termination of query answering, with a choice between one or many answers; and robustness w.r.t. inconsistencies between the ontology and the rule part. We demonstrate its effective usage on the application use-case combining EL+⊥ ontologies and non-monotonic rules outlined in the following, adapted from [29], as well as on an evaluation with the real ontology SNOMED CT, with over 300,000 concepts.
Example 1. The customs service for any developed country assesses imported
cargo for a variety of risk factors including terrorism, narcotics, food and con-
sumer safety, pest infestation, tariff violations, and intellectual property rights.
Assessing this risk, even at a preliminary level, involves extensive knowledge
about commodities, business entities, trade patterns, government policies and
trade agreements. Parts of this knowledge is ontological information and taxo-
nomic, such as the classification of commodities, while other parts require the
CWA and thus non-monotonic rules, such as the policies involving, e.g., already
known suspects. The overall task then is to access all the information and assess
whether some shipment should be inspected in full detail, under certain condi-
tions randomly, or not at all.
2 Preliminaries
2.1 Description Logic EL+⊥
knowledge base K is a pair (O, P). A rule r is DL-safe if all its variables occur
in at least one non-DL-atom Ai with 1 ≤ i ≤ n, and K is DL-safe if all its rules
are DL-safe. The ground instantiation of K is the KB KG = (O, PG ) where PG
is obtained from P by replacing each rule r of P with a set of rules substituting
each variable in r with constants from K in all possible ways.
DL-safety ensures decidability of reasoning with MKNF knowledge bases and can
be achieved by introducing a new predicate o, adding o(i) to P for all constants
i appearing in K and, for each rule r ∈ P, adding o(X) for each variable X
appearing in r to the body of r. Therefore, we only consider DL-safe MKNF
knowledge bases.
The semantics of K is based on a transformation of K into an MKNF formula
to which the MKNF semantics can be applied (see [22,26,28] for details). Instead
of spelling out the technical details of the original MKNF semantics [28] or its
three-valued counterpart [22], we focus on a compact representation of models
for which the computation of the well-founded MKNF model is defined5 . This
representation is based on a set of K-atoms and π(O), the translation of O into
first-order logic.
Definition 3. Let KG = (O, PG ) be a ground hybrid MKNF knowledge base.
The set of K-atoms of KG , written KA(KG ), is the smallest set that contains (i)
all ground atoms occurring in PG , and (ii) an atom ξ for each ground not-atom
not ξ occurring in PG. For a subset S of KA(KG), the objective knowledge of S
w.r.t. KG is the set of first-order formulas OBO,S = {π(O)} ∪ S.
The set KA(KG ) contains all atoms occurring in KG , only with not-atoms substi-
tuted by corresponding atoms, while OBO,S provides a first-order representation
of O together with a set of known/derived facts. In the three-valued MKNF
semantics, this set of K-atoms can be divided into true, undefined and false
atoms. Next, we recall operators from [22] that derive consequences based on
KG and a set of K-atoms that is considered to hold.
Definition 4. Let KG = (O, PG ) be a positive, ground hybrid MKNF knowledge
base. The operators RKG , DKG , and TKG are defined on subsets of KA(KG ):
The operator TKG is monotonic, and thus has a least fixpoint TKG ↑ ω. Trans-
formations can be defined that turn an arbitrary hybrid MKNF KB KG into a
positive one (respecting the given set S) to which TKG can be applied. To ensure
coherence, i.e., that classical negation in the DL enforces default negation in the
rules, two slightly different transformations are defined (see [22] for details).
5 Strictly speaking, this computation yields the so-called well-founded partition from which the well-founded MKNF model is defined (see [22] for details).
(Fig. 1 diagram: architecture of the NoHR plug-in, with components Knowledge Base, Translator, ELK, and XSB)
Based on these two antitonic operators [22], two sequences Pi and Ni are
defined, which correspond to the true and non-false derivations.
P0 = ∅                      N0 = KA(KG)
Pn+1 = ΓKG(Nn)              Nn+1 = Γ′KG(Pn)
Pω = ⋃i Pi                  Nω = ⋂i Ni
The fixpoints yield the well-founded MKNF model [22] (in polynomial time).
Fig. 2. NoHR Query tab with a query for TariffCharge(x, y) (see Sect. 4)
3 System Description
In this section, we briefly describe the architecture of the plug-in for Protégé
as shown in Fig. 1 and discuss some features of the implementation and how
querying is realized.
The input for the plug-in consists of an OWL file in the DL EL+ ⊥ as described
in Sect. 2.1, which can be manipulated as usual in Protégé, and a rule file. For
the latter, we provide a tab called NoHR Rules that allows us to load, save and
edit rule files in a text panel following standard Prolog conventions.
The NoHR Query tab (see Fig. 2) also allows for the visualization of the
rules, but its main purpose is to provide an interface for querying the combined
KB. Whenever the first query is posed by pushing “Execute”, a translator is started, initiating the ontology reasoner ELK [21], which is tailored for EL+⊥ and considerably faster than other reasoners in terms of classification time [21].
ELK is used to classify the ontology O and then return the inferred axioms to
the translator. It is also verified whether DisjointWith axioms appear in O, i.e., in EL+⊥ notation, axioms of the form C ⊓ D ⊑ ⊥ for arbitrary classes C and D, which determines whether inconsistencies may occur in the combined
hybrid knowledge base. Then the result of the classification is translated into
rules and joined with the already given non-monotonic rules in P, and the result
is conditionally further transformed if inconsistency detection is required.
The result is used as input for the top-down query engine XSB Prolog6
which realizes the well-founded semantics for logic programs [13]. To guarantee
6 http://xsb.sourceforge.net
full compatibility with XSB Prolog's more restrictive input syntax, the resulting joint rule set is further transformed such that all predicates and constants are encoded using MD5. The result is transferred to XSB via InterProlog [9]7, an open-source Java front-end that allows communication between Java and a Prolog engine.
Next, the query is sent via InterProlog to XSB, and answers are returned to
the query processor, which collects them and sets up a table showing for which
variable substitutions we obtain true, undefined, or inconsistent valuations (or
just shows the truth value for a ground query). The table itself is shown in the
Result tab (see Fig. 2) of the Output panel, while the Log tab shows measured
times of pre-processing the knowledge base and answering the query. XSB itself not only answers queries very efficiently in a top-down manner; with tabling, it also avoids infinite loops.
Once the query has been answered, the user may pose other queries, and the
system will simply send them directly without any repeated preprocessing. If
the user changes data in the ontology or in the rules, then the system offers the
option to recompile, but always restricted to the part that actually changed.
(example knowledge base listing, with an ontology part O and a rule part P, omitted)
The overall task then is to access all the information and assess whether some
shipment should be inspected in full detail, under certain conditions randomly,
or not at all. In fact, an inspection is considered if either a random inspection is
indicated, or some shipment is not compliant, i.e., there is a mismatch between
the filed cargo codes and the actually carried commodities, or some suspicious
cargo is observed, in this case tomatoes from Slovakia. In the first case, a potential
random inspection is indicated whenever certain exclusion conditions do not
hold. To ensure that one can distinguish between strictly required and random
inspections, a random inspection is assigned the truth value undefined based on
the rule Random(x) ← ShpmtCommod(x, y), not Random(x).
The result of querying this knowledge base for Inspection(x) reveals that of
the three shipments, s2 requires an inspection (due to mislabeling) while s1 may
be subject to a random inspection as it does not knowingly originate from the
EU. It can also be verified using the tool that preprocessing the knowledge base
can be handled within 300 ms and the query only takes 12 ms, which certainly suffices for an interactive response. Note also that the example indeed utilizes the features of both rules and ontologies: for example, exceptions to the potential random inspections can be expressed, but at the same time, taxonomic and non-closed knowledge is used, e.g., some shipment may in fact originate from the EU; this information is simply not available.
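For illustration only, the inspection logic just described can be sketched with rules along the following lines; apart from Inspection, Random, and ShpmtCommod, which appear above, the predicate names are hypothetical placeholders:
Inspection(x) ← Random(x).
Inspection(x) ← ShpmtCommod(x, y), not CompliantShpmt(x).
Inspection(x) ← SuspiciousCargo(x).
Random(x) ← ShpmtCommod(x, y), not Random(x).
Exclusion conditions for random inspections would appear as additional negated body atoms of the last rule.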
5 Evaluation
In this section, we present some tests showing that a) the huge EL+ ontology
SNOMED CT can be preprocessed for querying in a short period of time, b)
adding rules increases the time of the translation only linearly, and c) querying time is, in comparison to a) and b), generally negligible. We performed the tests on a MacBook Air 13 under Mac OS X 10.8.4 with a 1.8 GHz Intel Core i5 processor and 8 GB of 1600 MHz DDR3 memory. We ran all tests
in a terminal version and Java with the “-XX:+AggressiveHeap” option, and
test results are averages over 5 runs.
We considered SNOMED CT, freely available for research and evaluation9 ,
and added a varying number of non-monotonic rules. These rules were generated
arbitrarily, using predicates from the ontology and additional new predicates (up
to arity three), producing rules with a random number of body atoms varying
from 1 to 10 and facts (rules without body atoms) with a ratio of 1:10. Note
that, due to the translation of the DL part into rules, all atoms literally become
non-DL-atoms. So ensuring that each variable appearing in the rule is contained
in at least one non-negated body atom suffices to guarantee DL-safety for these
rules.
The results are shown in Fig. 4 (containing also a constant line for classifica-
tion of ELK alone and starting with the values for the case without additional
rules), and clearly show that a) preprocessing an ontology with over 300,000
concepts takes less than 70 sec. (time for translator+loading in XSB), b) the
9 http://www.ihtsdo.org/licensing/
(Fig. 4 plot: Time (s) of ELK, Translator, and XSB vs. the number of NM rules + facts (×1000), from 0 to 440)
time of the translator and of loading the file in XSB only grows linearly in the number of rules, with a small slope, in particular in the case of the translator, and c) even with up to 500,000 added rules the time for translating does not surpass ELK classification, which itself is really fast [21], by more than a factor of 2.5. All this data indicates that even with a very large ontology, preprocessing can be handled very efficiently.
Finally, we also tested the querying time. To this purpose, we randomly
generated and handcrafted several queries of different sizes and shapes using
SNOMED with a varying number of non-monotonic rules as described before. In all cases, we observed that the query response time is interactive; we observed longer reply times only when the number of replies is very high, because either the queried class contains many subclasses in the hierarchy or the arbitrarily generated rules create too many meaningless links, in the worst case requiring the entire model to be computed. Requesting only one solution avoids this problem.
Still, the question of generating realistic random rule bodies for testing querying time remains an issue for future work.
6 Conclusions
We have presented NoHR, the first plug-in for the ontology editor Protégé that
integrates non-monotonic rules and top-down queries with ontologies in the OWL
2 profile OWL 2 EL. We have discussed how this procedure is implemented as
a tool and shown how it can be used to implement a real use case on cargo
shipment inspections. We have also presented an evaluation which shows that
the tool is applicable to really huge ontologies, here SNOMED CT.
There are several relevant approaches discussed in the literature. Most closely
related are probably [15,23], because both build on the well-founded MKNF
semantics [22]. In fact, [15] is maybe closest in spirit to the original idea of
References
1. Alberti, M., Knorr, M., Gomes, A.S., Leite, J., Gonçalves, R., Slota, M.: Norma-
tive systems require hybrid knowledge bases. In: Procs. of AAMAS, IFAAMAS,
pp. 1425–1426 (2012)
2. Alferes, J.J., Knorr, M., Swift, T.: Query-driven procedures for hybrid MKNF
knowledge bases. ACM Trans. Comput. Log. 14(2), 1–43 (2013)
3. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
and relations. J. Artif. Intell. Res. (JAIR) 36, 1–69 (2009)
4. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.
(eds.): The Description Logic Handbook: Theory, Implementation, and Applica-
tions, 3rd edn. Cambridge University Press (2010)
5. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Procs. of IJCAI
(2005)
6. Baral, C., Gelfond, M.: Logic programming and knowledge representation. J. Log.
Program. 19(20), 73–148 (1994)
7. Boley, H., Kifer, M. (eds.): RIF Overview. W3C Recommendation, February 5,
2013 (2013). http://www.w3.org/TR/rif-overview/
8. Bonatti, P.A., Faella, M., Sauro, L.: EL with default attributes and overriding.
In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z.,
Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 64–79.
Springer, Heidelberg (2010)
9. Calejo, M.: InterProlog: towards a declarative embedding of logic programming
in java. In: Alferes, J.J., Leite, J. (eds.) JELIA 2004. LNCS (LNAI), vol. 3229,
pp. 714–717. Springer, Heidelberg (2004)
10. Drabent, W., Henriksson, J., Maluszynski, J.: Hd-rules: a hybrid system interfacing
prolog with dl-reasoners. In: Procs. of ALPSWS, vol. 287 (2007)
11. Drabent, W., Maluszynski, J.: Hybrid rules with well-founded semantics. Knowl.
Inf. Syst. 25(1), 137–168 (2010)
12. Eiter, T., Ianni, G., Lukasiewicz, T., Schindlauer, R., Tompits, H.: Combining
answer set programming with description logics for the semantic web. Artif. Intell.
172(12–13), 1495–1539 (2008)
13. Gelder, A.V., Ross, K.A., Schlipf, J.S.: The well-founded semantics for general
logic programs. J. ACM 38(3), 620–650 (1991)
14. Giordano, L., Gliozzi, V., Olivetti, N., Pozzato, G.L.: Reasoning about typicality
in low complexity dls: the logics EL⊥ Tmin and DL-Lite c Tmin . In: Procs. of IJCAI
(2011)
15. Gomes, A.S., Alferes, J.J., Swift, T.: Implementing query answering for hybrid
MKNF knowledge bases. In: Carro, M., Peña, R. (eds.) PADL 2010. LNCS,
vol. 5937, pp. 25–39. Springer, Heidelberg (2010)
16. Gonçalves, R., Alferes, J.J.: Parametrized logic programming. In: Janhunen, T.,
Niemelä, I. (eds.) JELIA 2010. LNCS, vol. 6341, pp. 182–194. Springer, Heidelberg
(2010)
17. Heymans, S., Eiter, T., Xiao, G.: Tractable reasoning with dl-programs over
datalog-rewritable description logics. In: Procs of ECAI, pp. 35–40. IOS Press
(2010)
18. Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P.F., Rudolph, S. (eds.):
OWL 2 Web Ontology Language: Primer (Second Edition). W3C Recommenda-
tion, December 11, 2012 (2012). http://www.w3.org/TR/owl2-primer/
19. Ivanov, V., Knorr, M., Leite, J.: A query tool for EL with non-monotonic rules.
In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo,
L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013, Part I. LNCS, vol. 8218,
pp. 216–231. Springer, Heidelberg (2013)
20. Kaminski, T., Knorr, M., Leite, J.: Efficient paraconsistent reasoning with ontolo-
gies and rules. In: Procs. of IJCAI. IJCAI/AAAI (2015)
21. Kazakov, Y., Krötzsch, M., Simančı́k, F.: The incredible ELK: From polynomial
procedures to efficient reasoning with EL ontologies. Journal of Automated Rea-
soning 53, 1–61 (2013)
22. Knorr, M., Alferes, J.J., Hitzler, P.: Local closed world reasoning with description
logics under the well-founded semantics. Artif. Intell. 175(9–10), 1528–1554 (2011)
23. Knorr, M., Alferes, J.J.: Querying OWL 2 QL and non-monotonic rules. In: Aroyo,
L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist,
E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 338–353. Springer, Heidelberg
(2011)
24. Knorr, M., Hitzler, P., Maier, F.: Reconciling OWL and non-monotonic rules for
the semantic web. In: Procs. of ECAI, pp. 474–479. IOS Press (2012)
25. Knorr, M., Slota, M., Leite, J., Homola, M.: What if no hybrid reasoner is available?
Hybrid MKNF in multi-context systems. J. Log. Comput. 24(6), 1279–1311 (2014)
26. Lifschitz, V.: Nonmonotonic databases and epistemic queries. In: Procs. of IJCAI
(1991)
27. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C. (eds.):
OWL 2 Web Ontology Language: Profiles. W3C Recommendation, February 5,
2013. http://www.w3.org/TR/owl2-profiles/
28. Motik, B., Rosati, R.: Reconciling description logics and rules. J. ACM 57(5)
(2010)
29. Slota, M., Leite, J., Swift, T.: Splitting and updating hybrid knowledge bases.
TPLP 11(4–5), 801–819 (2011)
30. Xiao, G., Eiter, T., Heymans, S.: The DReW system for nonmonotonic dl-
programs. In: Procs. of SWWS 2012. Springer Proceedings in Complexity. Springer
(2013)
On the Cognitive Surprise in Risk Management:
An Analysis of the Value-at-Risk (VaR)
Historical
1 Introduction
Financial markets such as stock markets are complex and dynamic environments
in which a variety of products are negotiated by a very large number of hetero-
geneous agents ([1], [2]). Agents, either human, artificial or hybrid, are hetero-
geneous in the sense that they have, for instance, different preferences, beliefs,
goals, and trading strategies (e.g., [3], [4]). Additionally, in such environments
agents need to cope with uncertainty and with different kinds of risks [5].
Generally speaking, in economic and financial systems, agents usually try
to assess in an objective or subjective way the risks they face. Ideally they would
2 Value-at-Risk (VaR)
The Value-at-Risk (VaR) tool is one of the most popular financial risk measures,
used by financial institutions all over the world [16]. The objective of the VaR is to measure the probability of a significant loss in a portfolio of financial assets [17]. Generally speaking, we can assume that for a given time horizon t and a confidence level p, VaR is the loss in market value over the time horizon t that is exceeded with probability 1 − p [18]. For example, for a one-day horizon (t = 1) and a confidence level p of 95%, VaR is the loss that is exceeded with a probability of 0.05, i.e., 5%. There are several different methods for calculating VaR. We briefly present two of them: the statistical and the historical approach [19].
The VaR statistical assumes that the historical returns respect the EMH. The EMH in turn assumes that the series of historical financial returns is Gaussian, with an average value μ of zero and a constant variance σ2, i.e., returns ∼ N(0, σ2). Based on the EMH assumptions and on the Gaussian characteristics, we can compute that the VaR statistical for confidence levels p of 95% and 99% is approximately −2σ and −3σ, respectively. For example, if a series of returns shows a standard deviation of 5%, the VaR statistical for confidence levels of 95% and 99% is −10% and −15%, respectively.
Unlike the VaR statistical method, an alternative way to calculate VaR is to rank the historical returns from the smallest to the highest, which is named VaR historical. Suppose that the series of T returns is r1, ..., rT; we say that this series of returns is ranked if r1 ≤ r2 ≤ ... ≤ rT. In this case, the VaR historical is the return at position integer((1 − p)T). For example, supposing a confidence level p of 99% and T of 250, the VaR historical would be r3. In the case of p of 95%, the VaR historical would be r5.
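For illustration only, a minimal Python sketch of this computation (not the authors' code; the ranked position is taken as the ceiling of (1 − p)T, one common convention, which reproduces the r3 example for p = 0.99 and T = 250):

import math

def var_historical(returns, p):
    # rank the T historical returns from smallest to highest: r1 <= r2 <= ... <= rT
    ranked = sorted(returns)
    # position integer((1 - p) * T), here taken as a 1-based ceiling position
    pos = max(math.ceil((1 - p) * len(ranked)), 1)
    return ranked[pos - 1]

# e.g., with a hypothetical list of 250 daily returns and p = 0.99,
# the estimate is the 3rd smallest return:
# estimate = var_historical(daily_returns, 0.99)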
We consider the VaR historical as the most appropriate method for this work
for several reasons. First, because the VaR historical method is widely used
by practitioners ([20], [21], [22]). Second, because it is an easy to understand
measure that computes the estimation based on historical data and a confidence
level (we will later explain in Section 3.2 the particular importance of confidence).
Last but not least, because unlike the VaR statistical, VaR historical is free of
the assumption about the distribution of the series of returns [19].
to how beliefs are stored in memory. Our semantic memory, i.e., our general
knowledge and concepts about the world, is assumed to be represented in mem-
ory through knowledge structures known as schemas (e.g., [27]). A schema is a
well-integrated chunk of knowledge or set of beliefs, whose main source of information is abstraction from repeated, personally experienced events or generalizations, i.e., from our episodic memory.
This study suggests that the intensity of surprise about an event Eg , from a
set of mutually exclusive events E1 , E2 , ..., Em , is a nonlinear function of the
difference, or contrast, between its probability/belief and the probability/belief
of the highest expected event (Eh ) in the set of mutually exclusive events E1 ,
E2 , ..., Em .
Formally, let (Ω, A, P ) be a probability space where Ω is the sample space
(i.e., the set of possible outcomes of the event), A = A1 , A2 , ..., An , is a σ-field of
subsets of Ω (also called the event space, i.e., all the possible events), and P is a
probability measure which assigns a real number P (F ) to every member F of the
σ-field A. Let E = E1 , E2 , ..., Em , Ei ∈ A, be a set of mutually exclusive events
m
in that probability space with probabilities P (Ei ) ≥ 0, such that P (Ei ) = 1.
i=1
Let Eh be the highest expected event from E. The intensity of surprise about an
event Eg , defined as S(Eg ), is calculated as S(Eg ) = log2 (1 + P (Eh ) − P (Eg ))
(Equation 1). In each set of mutually exclusive events, there is always at least
one event whose occurrence is unsurprising, namely Eh .
Fig. 1. SP500, daily close (left), daily return (center), and histogram of daily return
(right).
Fig. 2. SP500, daily close of calm period (left), daily close of crash period (center),
and histogram of daily return of calm period and crash period (right)
how humans use past experience in decision-making (e.g., [31]) which indicate
that in revising their beliefs, people tend to overweight recent information and
underweight prior information. The alpha set contains the 0.995, 0.99, 0.97, 0.94
values. A higher (lower) alpha implies a smaller (higher) level of forgetfulness.
Unlike the first approach, we opted to use just a confidence level of 0.99. The
reason is that we observed in our initial experiments that the combination of
a confidence level of 0.95 with the previous alpha values caused the agent to
“forget” too many returns, generating in the end a quite low and poor VaR
historical estimation. Therefore, in the second approach we have four treatments.
We conducted an experiment with these twelve different treatments.
The algorithm we used in our experiment works as follows. We first perform
initial adjustments to ensure that all iterations are actually carried out within a
specified period. For each simulation (1) for a window size value, windowk ∈ {50,
125, 250, 500}, (2) according to a uniform distribution function, we select a begin day di within the period (for the crash period we fixed di = 01-09-2008), (3)
we compute the VaR historical estimations VaR95 and VaR99 based on windowk
daily returns preceding the di as well as the confidence levels p ∈ {0.95, 0.99},
respectively, (4) for each day dk beginning in the day after di , from k = i + 1
to k = 210 (for the next 210 days), we check if daily return of dk ≤ VaR95 and
if daily return of dk ≤ VaR99 , (5) we advance the rolling window one day, i.e.,
i = i + 1, and (6) we go back to step (3). For the alpha approach the algorithm is similar, except for some minor adjustments: in step (1) the algorithm runs for each alpha value ∈ {0.995, 0.99, 0.97, 0.94}, and in step (3) we modify the preceding daily returns by applying the current alpha value and compute the VaR historical estimation with a confidence level p of 0.99. All other steps are exactly the same.
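A simplified Python sketch of this rolling-window procedure for the window-size approach follows; it is an illustration under assumptions (the names daily_returns, backtest, and the breach bookkeeping are ours, not the authors' implementation), and it presumes i ≥ windowk so that a full window of preceding returns exists.

import math

def backtest(daily_returns, i, window, horizon=210, levels=(0.95, 0.99)):
    # For each day d_k after the begin day d_i, estimate the historical VaR from the
    # `window` daily returns preceding d_k (step 3), check whether the day's return
    # breaches it (step 4), and advance the rolling window by one day (steps 5-6).
    breaches = []
    for k in range(i + 1, i + 1 + horizon):
        past = sorted(daily_returns[k - window:k])
        var = {p: past[max(math.ceil((1 - p) * len(past)), 1) - 1] for p in levels}
        breaches.append({p: daily_returns[k] <= var[p] for p in levels})
    return breaches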
Let us now describe how we address this problem in the context of the artifi-
cial surprise, presented in Section 3. We essentially applied the concepts, ideas,
and method presented by Baccan and Macedo [32]. This work can be thought
of as a continuation and expansion of their initial work to other contexts. We
assume, for the sake of the experiment and simplicity, the confidence levels p
(0.99 and 0.95) as the subjective belief of the agent in the accuracy of the VaR historical estimation. By making this assumption and considering a higher subjective belief, we are endowing the agent with a firm belief in the accuracy of the VaR historical estimation. So, consider an event Eg, the VaR historical estimation, that can result in two mutually exclusive events: it is either correct (E1), i.e., the daily return is not lower than the estimation, or incorrect (E2), i.e., the daily return is lower than the estimation.
The agent will either “feel” no surprise or a higher intensity of surprise, depending on whether what it considered more likely, i.e., correct (E1), or less likely, i.e., incorrect (E2),
happened. More precisely, for the confidence level of 0.99 the surprise about
event E2 would be 0.9855004, i.e., S(Eg ) = log2 (1 + 0.99 − 0.01). Similarly, for
the confidence level of 0.95 the surprise about event E2 would be 0.9259994,
i.e., S(Eg ) = log2 (1 + 0.95 − 0.05). For each day dk in which the VaR historical
estimation is tested in step (4) of the algorithm described above, if the daily
return of dk ≤ VaRp , then we compute the cognitive surprise, surprisek .
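For illustration, Equation 1 and the cumulative sum of daily surprises can be rendered in Python as follows (a sketch; the daily breach sequence shown is a hypothetical placeholder). It reproduces the two surprise values quoted above.

from itertools import accumulate
from math import log2

def surprise(p_h, p_g):
    # Equation 1: S(Eg) = log2(1 + P(Eh) - P(Eg))
    return log2(1 + p_h - p_g)

print(round(surprise(0.99, 0.01), 7))   # 0.9855004, confidence level p = 0.99
print(round(surprise(0.95, 0.05), 7))   # 0.9259994, confidence level p = 0.95

# cumulative sum over a (hypothetical) sequence of daily surprise values
daily = [surprise(0.99, 0.01) if breached else 0.0 for breached in (False, True, True)]
print(list(accumulate(daily)))          # partial sums surprise_1, surprise_1+surprise_2, ...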
In the end we have a sequence {surprise1 , ..., surprise210 }. Afterwards, for
each simulation, we compute the cumulative sum of the surprise. It means that
we generate a sequence of 210 elements as a result of the partial sums surprise1 ,
surprise1 + surprise2 , surprise1 + surprise2 + surprise3 , and so forth. In the
case of the calm period, we added the cumulative sum of the surprise of each
treatment and then averaged it over the number of simulations. The cumulative sum and the averaging make it easier both to observe surprise over time and to compare surprise between the calm and the crash period.
The average cumulative sum of the surprise for a given treatment is presented
in the next figures for the calm period.
We ran 104 independent simulations for each treatment for the calm period
and one simulation for each treatment for the crash period (since the begin day
di = 01-09-2008 is fixed).
Figures 3 and 4 present the behaviour of all treatments for the calm period
and crash period, respectively. Figure 5 presents a comparison between different rolling window treatments and alpha treatments, respectively, for both periods.
5 Discussion
First of all, it is important to bear in mind that financial markets are by their very
nature complex and dynamic systems. Such complexity significantly increases if
we take into account the sophisticated and complex human decision-making
mechanism. As a result, we believe that the task of risk management in finance
and economics is indeed quite difficult. Therefore, the goal of this work is not to provide evidence either in favor of or against a particular risk management tool or system. Instead, our goal is to compute, in a systematic and clear manner, the cognitive surprise “felt” by an agent relying on the VaR historical during different periods, a calm period and a crash period.
Let us then begin by analyzing some characteristics of the calm period in
comparison with the crash period. We can see in Figure 1 that daily returns
seem, as expected and in line with the existing literature, to reproduce some
statistical regularities that are often found in a large set of different assets and
markets known as stylized facts [33]. Specifically we can observe that returns
do not follow a Gaussian distribution (are not normally distributed), seem to
exhibit what is known as fat-tails, as well as to reproduce the volatility cluster-
ing fact, i.e., high-volatility events tend to cluster in time. Volatility clustering
resembles the concept of entropy used in a variety of areas such as information
and communication theory. We can see in Figure 2 that comparing the daily returns of the calm period with the daily returns of the crash period allows us to claim that the daily returns for the crash period seem to exhibit fat-tail distributions. We can also see that during the crash period the S&P 500
depreciated almost 50% in value in a short period of time.
We now turn our attention to the analysis of the cognitive surprise. For the
calm period, we can observe in Figure 3 that the cumulative surprise for alpha
treatment (right) is higher when compared to all window treatments (left). Addi-
tionally, the lower the alpha, the higher the cumulative surprise and, similarly,
the larger the window size, the lower the surprise. The cumulative surprise of the
window treatments with a confidence level p of 0.95, VaR95 is, in turn, higher
when compared to the cumulative surprise with a confidence level p of 0.99,
VaR99 . Interestingly, if we assume that alpha somewhat emulates the human
memory process of “forgetting” and considering that a lower alpha implies a
higher level of forgetfulness, we may argue that an agent should be careful in
forgetting the past, at least in the context of a stock market, since the cognitive
surprise “felt” by an agent under this treatment is significantly higher than under the window treatment.
For the crash period, we first observe in Figure 4 that, once again, the lower
the alpha, the higher the cumulative surprise (right). However, quite contrary
to the calm period, the lower the window size, the lower the surprise (left). The
cognitive surprise “felt” is higher for agents that rely on a larger window size. It
may be explained by the fact that the crash period is indeed a period in which
the volatility is high. Therefore, a VaR historical based on a larger window size
takes more time to adapt to this new and changing environment, in comparison
with a VaR historical based on a smaller window size.
When comparing in Figure 5 how treatments behave during the calm period
and the crash period, we can observe that, as expected, the cognitive surprise is
higher during the crash period.
Generally speaking, each day on which the agent “felt” surprise represents a failed VaR historical estimation. Therefore, the higher the surprise, the more wrong a given VaR historical treatment is. The analysis of the results indicates that it may be quite difficult for an agent not to “feel” surprise when relying on the VaR historical.
Indeed, there are several issues with models, like VaR historical, that take into
account historical data for their estimation. These models somewhat assume the
past is a good indicator of what may happen in the future, i.e., history repeats
itself. However, this inductive reasoning often underestimates the probability of
extreme returns and, consequently, underestimates the level of risk. Addition-
ally, an essential flaw of this kind of rationale is not to truly acknowledge that
“absence of evidence is not evidence of absence” and the existence of “unknown
unknowns” [34].
Consider, for instance, the turkey paradox ([35]). There is a butcher and a
turkey. Every day, for let us say 100 days, the butcher feeds the turkey. As time goes by, the turkey increases its belief that on the next day it will receive food from the butcher. However, one day, to the “shock” and “surprise” of the turkey, instead of feeding it, the butcher kills it. The
same analogy may be applied to the black swan scenario [6] as well as to other
complex and risky financial operations that provide small but regular gains, until
the day they blow all the gains, resulting in huge losses [15].
The main contribution of our work resides in the fact that we have applied in
a systematic, clear and easy to reproduce way the ideas, concepts and methods
described by Baccan and Macedo [32] to the context of risk, uncertainty, and
therefore risk management. Our interdisciplinary work is in line with those who
claim that there is a need for novel approaches so that complex and financial
systems may be improved (e.g., [12], [13], [14]). It is, as far as we know, one of
the first attempts to apply a cognitive science perspective to risk management.
In this paper we addressed the problem of risk management from the cognitive science perspective1. We computed the cognitive surprise “felt” by an agent relying on a popular risk management tool known as the Value-at-Risk (VaR) historical. We applied this approach to the S&P 500 stock market index from 26-11-1990 to 01-07-2009, and divided the series into two subperiods, a calm period and a crash period. We carried out an experiment with twelve different treatments and for each treatment we compared the intensity of surprise “felt” by the agent under these two different regimes. This interdisciplinary work is in line with a broader movement and contributes toward the true understanding and improvement of
1 This work was supported by FCT, Portugal, SFRH/BD/60700/2009, and by the TribeCA project, funded by FEDER through POCentro, Portugal.
References
1. Fama, E.F.: Efficient capital markets: A review of theory and empirical work. The
Journal of Finance 25(2), 383–417 (1970)
2. Fama, E.F.: Two pillars of asset pricing. American Economic Review 104(6),
1467–1485 (2014)
3. Lo, A.W.: Reconciling efficient markets with behavioral finance: The adaptive mar-
kets hypothesis. Journal of Investment Consulting 7, 21–44 (2005)
4. Treleaven, P., Galas, M., Lalchand, V.: Algorithmic trading review. Commun. ACM
56(11), 76–85 (2013)
5. Lo, A., Mueller, M.: WARNING: physics envy may be hazardous to your wealth!.
Journal of Investment Management 8, 13–63 (2010)
6. Taleb, N.N.: The Black Swan: The Impact of the Highly Improbable, 1st edn.
Random House, April 2007
7. Meder, B., Lec, F.L., Osman, M.: Decision making in uncertain times: what can
cognitive and decision sciences say about or learn from economic crises? Trends in
Cognitive Sciences 17(6), 257–260 (2013)
8. Markowitz, H.: Portfolio selection. Journal of Finance 7, 77–91 (1952)
9. Kahneman, D.: The myth of risk attitudes. The Journal of Portfolio Management
36(1), 1 (2009)
10. Chiodo, A., Guidolin, M., Owyang, M.T., Shimoji, M.: Subjective probabili-
ties: psychological evidence and economic applications. Technical report, Federal
Reserve Bank of St. Louis (2003)
11. Kahneman, D., Tversky, A.: Prospect theory: An analysis of decision under risk.
Econometrica 47(2), 263–291 (1979)
12. Farmer, J.D., Foley, D.: The economy needs agent-based modelling. Nature
460(7256), 685–686 (2009)
13. Bouchaud, J.: Economics needs a scientific revolution. Nature 455(7217), 1181
(2008)
14. Gatti, D., Gaffeo, E., Gallegati, M.: Complex agent-based macroeconomics: a man-
ifesto for a new paradigm. Journal of Economic Interaction and Coordination 5(2),
111–135 (2010)
15. Taleb, N.N.: Antifragile: Things That Gain from Disorder. Reprint edition edn.
Random House Trade Paperbacks, New York (2014)
16. Kawata, R., Kijima, M.: Value-at-risk in a market subject to regime switching.
Quantitative Finance 7(6), 609–619 (2007)
17. Alexander, C., Sarabia, J.M.: Quantile Uncertainty and Value-at-Risk Model Risk.
Risk Analysis 32(8), 1293–1308 (2012)
18. Duffie, D., Pan, J.: An Overview of Value at Risk. The Journal of Derivatives 4(3),
7–49 (1997)
19. Jorion, P.: Value at Risk: The New Benchmark for Managing Financial Risk, 3rd
edn. McGraw-Hill (2006)
20. Halbleib, R., Pohlmeier, W.: Improving the value at risk forecasts: Theory and
evidence from the financial crisis. Journal of Economic Dynamics and Control
36(8), 1212–1228 (2012)
21. David Cabedo, J., Moya, I.: Estimating oil price Value at Risk using the historical
simulation approach. Energy Economics 25(3), 239–253 (2003)
22. Hendricks, D.: Evaluation of value-at-risk models using historical data. Economic
Policy Review, 39–69, April 1996
23. Reisenzein, R., Hudlicka, E., Dastani, M., Gratch, J., Hindriks, K., Lorini, E.,
Meyer, J.J.: Computational modeling of emotion: Towards improving the inter-
and intradisciplinary exchange. IEEE Transactions on Affective Computing 99(1),
1 (2013)
24. Reisenzein, R.: Emotions as metarepresentational states of mind: Naturalizing the
belief-desire theory of emotion. Cognitive Systems Research 10(1), 6–20 (2009)
25. Meyer, W.U., Reisenzein, R., Schutzwohl, A.: Toward a process analysis of emo-
tions: The case of surprise. Motivation and Emotion 21, 251–274 (1997)
26. Macedo, L., Cardoso, A., Reisenzein, R., Lorini, E., Castelfranchi, C.: Artificial
surprise. In: Handbook of Research on Synthetic Emotions and Sociable Robotics:
New Applications in Affective Computing and Artificial Intelligence, pp. 267–291
(2009)
27. Baddeley, A., Eysenck, M., Anderson, M.C.: Memory, 1 edn. Psychology Press,
February 2009
28. Macedo, L., Reisenzein, R., Cardoso, A.: Modeling forms of surprise in artificial
agents: empirical and theoretical study of surprise functions. In: 26th Annual Con-
ference of the Cognitive Science Society, pp. 588–593 (2004)
29. Lorini, E., Castelfranchi, C.: The cognitive structure of surprise: looking for basic
principles. Topoi: An International Review of Philosophy 26, 133–149 (2007)
30. Ortony, A., Partridge, D.: Surprisingness and expectation failure: what’s the dif-
ference? In: Proceedings of the 10th International Joint Conference on Artificial
Intelligence, vol. 1, pp. 106–108. Morgan Kaufmann Publishers Inc., Milan (1987)
31. Griffin, D., Tversky, A.: The weighing of evidence and the determinants of confi-
dence. Cognitive Psychology 24(3), 411–435 (1992)
32. Baccan, D., Macedo, L., Sbruzzi, E.: Towards modeling surprise in economics and
finance: a cognitive science perspective. In: STAIRS 2014. Frontiers in Artificial
Intelligence and Applications, vol. 264, Prague, Czech Republic, pp. 31–40 (2014)
33. Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues.
Quantitative Finance 1(2), 223 (2001)
34. Rumsfeld, D.: DoD News Briefing (2002)
35. Taleb, N.N.: Fooled by Randomness: The Hidden Role of Chance in Life and in
the Markets, 2 edn. Random House, October 2008
Logic Programming Applied to Machine Ethics
1 Introduction
The need for systems or agents that can function in an ethically responsible
manner is becoming a pressing concern, as they become ever more autonomous
and act in groups, amidst populations of other agents, including humans. Its
importance has been emphasized as a research priority in AI with funding sup-
port [26]. Its field of enquiry, named machine ethics, is interdisciplinary, and is not just important for equipping agents with some capacity for moral decision-making, but also for helping to better understand morality, via the creation and testing
of computational models of ethical theories.
Several logic-based formalisms have been employed to model moral theories
or particular morality aspects, e.g., deontic logic in [2], non-monotonic reason-
ing in [6], and the use of Inductive Logic Programming (ILP) in [1]; some of
them only abstractly, whereas others also provide implementations (e.g., using
ILP-based systems [1], an interactive theorem prover [2], and answer set pro-
gramming (ASP) [6]). Despite the aforementioned logic-based formalisms, Logic
Programming (LP) itself is rather limitedly explored. The potential and suitabil-
ity of LP, and of computational logic in general, for machine ethics, is identified
and discussed at length in [11], on the heels of our work. LP permits declarative knowledge representation of moral cases with a sufficient level of detail to distinguish one moral case from other similar cases. It provides a logic-based programming paradigm with a number of practical Prolog systems, thus allowing morality issues to be addressed not only in an abstract logical formalism, but also via a Prolog implementation as a proof of concept and a testing ground for experimentation. Furthermore, LP is also equipped with various reasoning features,
In [17], moral permissibility is modeled through several cases of the classic trol-
ley problem [5], by emphasizing the use of ICs in abduction and preferences over
abductive scenarios. The cases, which include moral principles, are modeled in
order to deliver appropriate moral decisions that conform with those the major-
ity of people make, on the basis of empirical results in [9]. DDE [15] is utilized
in [9] to explain the consistency of judgments, shared by subjects from demo-
graphically diverse populations, on a series of trolley dilemmas. In addition to
DDE, we also consider DTE [10].
Each case of the trolley problem is modeled individually; their details being
referred to [17]. The key points of their modeling are as follows. The DDE and
DTE are modeled via a priori ICs and a posteriori preferences. Possible deci-
sions are modeled as abducibles, encoded in Acorda by even loops over default
negation. Moral decisions are therefore accomplished by satisfying a priori ICs,
computing abductive stable models from all possible abductive solutions, and
then appropriately preferring amongst them (by means of rules), a posteriori,
just some models, on the basis of their abductive solutions and consequences.
Such preferred models turn out to conform with the results reported in the
literature.
Capturing Deontological Judgment via a Priori ICs. In this application, ICs are
used for two purposes. First, they are utilized to force the goal in each case (like
in [9]), by observing the desired end goal resulting from each possible decision.
Such an IC thus enforces all available decisions to be abduced, together with
their consequences, from all possible observable hypothetical end goals. The
second purpose of ICs is for ruling out impermissible actions, viz., actions that
involve intentional killing in the process of reaching the goal, enforced by the
IC: false ← intentional killing. The definition of intentional killing depends
on rules in each case considered and whether DDE or DTE is to be upheld.
Since this IC serves as the first filter of abductive stable models, by ruling out
impermissible actions, it affords us with just those abductive stable models that
contain only permissible actions.
fact that the man on the bridge died on the side track and the agent was seen on
the bridge at the occasion. Is the agent guilty (beyond reasonable doubt), in the
sense of violating DDE, of shoving the man onto the track intentionally?
To answer it, abduction is enacted to reason about the verdict, given the avail-
able evidence. Considering the active goal judge, to judge the case, two abducibles
are available: verdict(guilty brd) and verdict(not guilty), where guilty brd
stands for ‘guilty beyond reasonable doubt’. Depending on how probable each verdict is (the value of which is determined by the probability pr int shove(P) of intentional shoving), a preferred verdict(guilty brd) or verdict(not guilty) is abduced as a solution.
The probability with which shoving is performed intentionally is causally influenced by the available evidence and its attending truth values. Two pieces of evidence are considered, viz., (1) whether the agent was running on the bridge in a hurry; and
(2) whether the bridge was slippery at the time. The probability pr int shove(P) of intentional shoving is therefore determined by the existence of evidence, expressed as dynamic predicates evd run/1 and evd slip/1, whose sole argument is true or false, standing for the evidence that the agent was running in a hurry and that the bridge was slippery, respectively.
Based on this representation, different judgments can be delivered, subject
to available (observed) evidences and their attending truth value. By considering
the standard probability of proof beyond reasonable doubt –here the value of
0.95 is adopted [16]– as a common ground for the probability of guilty verdicts to
be qualified as ‘beyond reasonable doubt’, a form of argumentation (à la Scanlon
contractualism [23]) may take place through presenting different evidence (via
updating of observed evidence atoms, e.g., evd run(true), evd slip(false), etc.)
as a consideration to justify an exception. Whether the newly available evidence
is accepted as a justification to an exception –defeating the judgment based
on the priorly presented evidence– depends on its influence on the probability
pr int shove(P) of intentional shoving, and thus eventually influences the final
verdict. That is, it depends on whether this probability is still within the agreed
standard of proof beyond reasonable doubt. The reader is referred to [8], which
details a scenario capturing this moral jurisprudence viewpoint.
Distinct from the two previous applications, Qualm emphasizes the interplay
between LP abduction, updating and counterfactuals, supported furthermore by
their joint tabling techniques.
Moral Updating. Moral updating (and evolution) concerns the adoption of new
(possibly overriding) moral rules on top of those an agent currently follows. Such
adoption often happens in the light of situations freshly faced by the agent, e.g.,
when an authority contextually imposes other moral rules, or due to some cul-
tural difference. In [12], moral updating is illustrated in an interactive storytelling
(using Acorda), where the robot must save the princess imprisoned in a castle,
by defeating either of two guards (a giant spider or a human ninja), while it
should also attempt to follow (possibly conflicting) moral rules that may change
dynamically as imposed by the princess (for the visual demo, see [13]).
The storytelling is reconstructed in this paper using Qualm, to particularly
demonstrate: (1) The direct use of LP updating so as to place a moral rule
into effect; and (2) The relevance of contextual abduction to rule out tabled but
incompatible abductive solutions, in case a goal is invoked by a non-empty initial
abductive context (the content of this context may be obtained already from
another agent, e.g., imposed by the princess). A simplified program modeling
the knowledge of the princess-savior robot in Qualm is shown below, where
fight/1 is an abducible predicate:
guard(spider).    guard(ninja).    human(ninja).
utilVal(spider, 0.4).    utilVal(ninja, 0.7).
survive from(G) ← utilVal(G, V), V > 0.6.
intend savePrincess ← guard(G), fight(G), survive from(G).
intend savePrincess ← guard(G), fight(G).
The first rule of intend savePrincess corresponds to a utilitarian moral rule (wrt. the robot’s survival), whereas the second one corresponds to a ‘knight’ moral, viz., to intend the goal of saving the princess at any cost (irrespective of the robot’s survival chance). Since each rule in Qualm is assigned a unique name in its transform (based on the rule name fluent in [21]), the name of each rule for intend savePrincess may serve as a unique moral rule identifier for updating, by toggling the rule’s name, say via the rule name fluents #rule(utilitarian) and #rule(knight), respectively. In the subsequent plots, the query ?- intend savePrincess is used, representing the robot’s intent to save the princess.
In the first plot, when both rule name fluents are retracted, the robot does not
adopt any moral rule to save the princess, i.e., the robot has no intent to save the
princess, and thus the princess is not saved. In the second (restart) plot, in order
to maximize its survival chance in saving the princess, the robot updates itself
with the utilitarian moral: the program is updated with #rule(utilitarian). The
robot thus abduces fight(ninja) so as to successfully defeat the ninja instead
of confronting the humongous spider.
The use of tabling in contextual abduction is demonstrated in the third
(start again) plot. Assuming that the truth of survive from(G) implies the robot’s success in defeating (killing) guard G, the princess argues that the robot should not kill the human ninja, as it violates the moral rule she follows, say a ‘Gandhi’ moral, expressed by the following rule in her knowledge (the first three facts in the robot’s knowledge are shared with the princess): follow gandhi ← guard(G), human(G), not fight(G). That is, the princess abduces not fight(ninja) and imposes this abductive solution as the initial (input) abductive context of the robot’s goal (viz., intend savePrincess). This input context is inconsistent with the tabled abductive solution fight(ninja),
and as a result, the query fails: the robot may argue that the imposed ‘Gandhi’
moral conflicts with its utilitarian rule (in the visual demo [13], the robot
reacts by aborting its mission). In the final plot, as the princess is not saved
yet, she further argues that she definitely has to be saved, by now addition-
ally imposing on the robot the ‘knight’ moral. This amounts to updating the
rule name fluent #rule(knight) so as to switch on the corresponding rule. As
the goal intend savePrincess is still invoked with the input abductive context not fight(ninja), the robot now abduces fight(spider) in the presence of the newly adopted ‘knight’ moral. Unfortunately, it fails to survive, as confirmed by the failure of the query ?- survive from(spider).
The plots in this story reflect a form of deliberative employment of moral
judgments within Scanlon’s contractualism. For instance, in the second plot,
the robot may justify its action to fight (and kill) the ninja due to the utilitar-
ian moral it adopts. This justification is counter-argued by the princess in the
subsequent plot, making an exception in saving her, by imposing the ‘Gandhi’
moral, disallowing the robot to kill a human guard. In this application, rather
than employing updating, this exception is expressed via contextual abduction
with tabling. The robot may justify its failing to save the princess (as the robot
leaving the scene) by arguing that the two moral rules it follows (viz., utilitarian
and ‘Gandhi’) are conflicting wrt. the situation it has to face. The argumenta-
tion proceeds, whereby the princess orders the robot to save her whatever risk
it takes, i.e., the robot should follow the ‘knight’ moral.
References
1. Anderson, M., Anderson, S.L.: EthEl: Toward a principled ethical eldercare robot.
In: Procs. AAAI Fall 2008 Symposium on AI in Eldercare (2008)
2. Bringsjord, S., Arkoudas, K., Bello, P.: Toward a general logicist methodology for
engineering ethically correct robots. IEEE Intelligent Systems 21(4), 38–44 (2006)
3. Cushman, F., Young, L., Greene, J.D.: Multi-system moral psychology. In: Doris,
J.M. (ed.) The Moral Psychology Handbook. Oxford University Press (2010)
4. Dell’Acqua, P., Pereira, L.M.: Preferential theory revision. Journal of Applied Logic
5(4), 586–601 (2007)
5. Foot, P.: The problem of abortion and the doctrine of double effect. Oxford Review
5, 5–15 (1967)
6. Ganascia, J.-G.: Modelling ethical rules of lying with answer set programming.
Ethics and Information Technology 9(1), 39–47 (2007)
7. Anh, H.T., Kencana Ramli, C.D.P., Damásio, C.V.: An implementation of
extended P-Log using XASP. In: Garcia de la Banda, M., Pontelli, E. (eds.) ICLP
2008. LNCS, vol. 5366, pp. 739–743. Springer, Heidelberg (2008)
8. Han, T.A., Saptawijaya, A., Pereira, L.M.: Moral reasoning under uncertainty. In:
Bjørner, N., Voronkov, A. (eds.) LPAR-18 2012. LNCS, vol. 7180, pp. 212–227.
Springer, Heidelberg (2012)
9. Hauser, M., Cushman, F., Young, L., Jin, R.K., Mikhail, J.: A dissociation between
moral judgments and justifications. Mind and Language 22(1), 1–21 (2007)
10. Kamm, F.M.: Intricate Ethics: Rights, Responsibilities, and Permissible Harm.
Oxford U. P (2006)
11. Kowalski, R.: Computational Logic and Human Thinking: How to be Artificially
Intelligent. Cambridge U. P (2011)
12. Lopes, G., Pereira, L.M.: Prospective storytelling agents. In: Carro, M., Peña, R.
(eds.) PADL 2010. LNCS, vol. 5937, pp. 294–296. Springer, Heidelberg (2010)
13. Lopes, G., Pereira, L.M.: Visual demo of “Princess-saviour Robot” (2010). http://
centria.di.fct.unl.pt/∼lmp/publications/slides/padl10/quick moral robot.avi
14. Mallon, R., Nichols, S.: Rules. In: Doris, J.M. (ed.) The Moral Psychology Hand-
book. Oxford University Press (2010)
15. McIntyre, A.: Doctrine of double effect. In: Zalta, E.N. (ed.) The Stanford Encyclo-
pedia of Philosophy. Center for the Study of Language and Information, Stanford
University, Fall 2011 edition (2004). http://plato.stanford.edu/archives/fall2011/
entries/double-effect/
16. Newman, J.O.: Quantifying the standard of proof beyond a reasonable doubt: a
comment on three comments. Law, Probability and Risk 5(3–4), 267–269 (2006)
17. Pereira, L.M., Saptawijaya, A.: Modelling morality with prospective logic. In:
Anderson, M., Anderson, S.L. (eds.) Machine Ethics, pp. 398–421. Cambridge U.
P (2011)
18. Pereira, L.M., Saptawijaya, A.: Abduction and beyond in logic programming with
application to morality. Accepted in “Frontiers of Abduction”, Special Issue in
IfCoLog Journal of Logics and their Applications (2015). http://goo.gl/yhmZzy
19. Pereira, L.M., Saptawijaya, A.: Bridging two realms of machine ethics. In: White,
J.B., Searle, R. (eds.) Rethinking Machine Ethics in the Age of Ubiquitous Tech-
nology. IGI Global (2015)
20. Pereira, L.M., Saptawijaya, A.: Counterfactuals in Logic Programming with Appli-
cations to Agent Morality. Accepted at a special volume of Logic, Argumentation
& Reasoning (2015). http://goo.gl/6ERgGG (preprint)
21. Saptawijaya, A., Pereira, L.M.: Incremental tabling for query-driven propagation
of logic program updates. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.)
LPAR-19 2013. LNCS, vol. 8312, pp. 694–709. Springer, Heidelberg (2013)
22. Saptawijaya, A., Pereira, L.M.: TABDUAL: a Tabled Abduction System for Logic
Programs. IfCoLog Journal of Logics and their Applications 2(1), 69–123 (2015)
23. Scanlon, T.M.: What We Owe to Each Other. Harvard University Press (1998)
24. Scanlon, T.M.: Moral Dimensions: Permissibility, Meaning, Blame. Harvard Uni-
versity Press (2008)
25. Swift, T.: Tabling for non-monotonic programming. Annals of Mathematics and
Artificial Intelligence 25(3–4), 201–240 (1999)
26. The Future of Life Institute. Research Priorities for Robust and Beneficial Arti-
ficial Intelligence (2015). http://futureoflife.org/static/data/documents/research
priorities.pdf
Intelligent Information Systems
Are Collaborative Filtering Methods Suitable for Student
Performance Prediction?
Hana Bydžovská()
CSU and KD Lab Faculty of Informatics, Masaryk University, Brno, Czech Republic
[email protected]
1 Introduction
Students have to accomplish all the study requirements defined by their university.
The most important is to pass all mandatory courses and to select elective and volun-
tary courses that they are able to pass. Masaryk University offers a vast number of courses to its students. Therefore, it is very difficult for students to make a good decision. It is important for us to understand students’ behavior to be able to guide them through their studies to graduation. Our goal is to design an intelligent module integrated into the Information System of Masaryk University that will help students with selecting suitable courses and warn them against too difficult ones. To realize the module, we need to be able to predict whether a student will succeed or fail in an investigated course. We need the information at the beginning of a
term when we have no information about students’ knowledge, skills or enthusiasm
for any particular course. We also do not want to obtain the information directly from
students using questionnaires. Since questionnaires tend to have a low response
rate, we use only verifiable data from the university information system.
We have drawn inspiration from techniques utilized in recommender systems. Nowadays, the usage of collaborative filtering (CF) methods [5] spreads over many areas, including the educational environment. Walker et al. [9] designed a system called Altered Vista that was specifically aimed at teachers and students who reviewed web resources targeted at education. The system implemented CF methods in order to recommend web resources to its users. Many researchers have aimed at e-learning; e.g., Loll et al. [6] designed a system that enables students to solve their exercises and to criticize their schoolmates’ solutions. Based on students’ answers, the system could reveal difficult tasks and recommend good solutions to enhance students’ knowledge.
In this paper, we report on the possibility of estimating student performance in particular courses based only on knowledge of students’ previously passed courses. We utilize different CF methods to estimate the final prediction. The preliminary work can be seen in [1]. Now we explore the most suitable settings of the approach in detail and compare the results with our previous approach using classification algorithms [2].
3 Experiment
Our aim was to predict the grades of students enrolled in the investigated courses
in the year 2012 based on the results of similar students enrolled in the same courses
in the years 2010 and 2011. Then we could verify the predictions against the real grades and evaluate the methods and the settings. Finally, we selected the most suitable method and verified it on data about students enrolled in the same courses in the year 2013.
Similarity of Students. For each student, we constructed four vectors of grades characterizing their knowledge. The values were computed with respect to the number of repetitions of each course. We consider only the last grade (NEWEST), the grades of all attempts in the last year (YEAR), only the last grade of each repetition (LAST), and
all grades (ALL). For example, a student failed a course in the first year using three
attempts and got the grades 444. The student had to repeat the course next year. Sup-
posing he or she got the grades 442, the student’s values for this particular course
were the following: NEWEST: 2, YEAR: 4+4+2, LAST: 4+2, ALL: 4+4+4+4+4+2.
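The four representations can be sketched in Python for the worked example as follows; the exact aggregation (lists of grades per vector) is our assumption, not necessarily the authors' encoding.

def grade_vectors(attempts_by_year):
    # attempts_by_year: one list of grades per year of enrolment, e.g. [[4, 4, 4], [4, 4, 2]]
    return {
        "NEWEST": attempts_by_year[-1][-1],                 # only the last grade
        "YEAR":   attempts_by_year[-1],                     # every attempt of the last year
        "LAST":   [year[-1] for year in attempts_by_year],  # last grade of each repetition
        "ALL":    [g for year in attempts_by_year for g in year],  # all grades
    }

print(grade_vectors([[4, 4, 4], [4, 4, 2]]))
# {'NEWEST': 2, 'YEAR': [4, 4, 2], 'LAST': [4, 2], 'ALL': [4, 4, 4, 4, 4, 2]}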
Vectors of grades were compared by five methods. Mean absolute difference (MAD) and root mean squared difference (RMSD) measure the mean difference between the investigated student’s grades and the grades of other students in their shared courses. The lower the value, the better the result is. The other methods return values near 1
for the best results. Cosine similarity (COS) and Pearson’s correlation coefficient
(PC) define the similarity of grades of shared courses. Jaccard’s coefficient (JC) de-
fines the ratio of shared and different courses. Supposing that students’ knowledge
can be represented with passed courses, it was very important to calculate the overlap
of students’ courses.
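The five measures, restricted to the courses two students share, can be sketched as follows (a simplified illustration that assumes at least one shared course and non-constant grade vectors; not the system's actual implementation):

from math import sqrt

def similarities(grades_a, grades_b):
    # grades_a, grades_b: dicts mapping course -> grade for two students
    shared = set(grades_a) & set(grades_b)
    xs = [grades_a[c] for c in shared]
    ys = [grades_b[c] for c in shared]
    n = len(shared)
    mad = sum(abs(x - y) for x, y in zip(xs, ys)) / n
    rmsd = sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / n)
    cos = (sum(x * y for x, y in zip(xs, ys))
           / (sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))))
    mx, my = sum(xs) / n, sum(ys) / n
    pc = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / (sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))))
    jc = n / len(set(grades_a) | set(grades_b))   # shared courses vs. all courses of both
    return {"MAD": mad, "RMSD": rmsd, "COS": cos, "PC": pc, "JC": jc}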
• Top x, where x ∈ [1; 50] with step 1 (the analysis [4] indicates that a neighborhood of 20 to 50 neighbors is usually optimal).
• More similar than the threshold y, where y ∈ [0; 1] with step 0.1.
• We also utilized the idea of a baseline user [8]. We selected into the neighborhood
only those students who were more similar to the investigated student than the
investigated student was to the baseline user. We decided to calculate two types of
baseline user (a sketch of this selection is shown after the list):
─ Average student – we characterized an average student by the average grades of the
courses in which the investigated student was enrolled.
─ Uniform student – we characterized a uniform student by grades with the value 2.5
(the average grade across all courses) for all courses in which the investigated
student was enrolled.
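A minimal sketch of the uniform-student baseline selection described in the last item, assuming a scalar similarity function where higher means more similar (for MAD and RMSD the comparison would be inverted); the 2.5 value comes from the text, the rest is illustrative.

```python
def baseline_neighborhood(target, candidates, sim, uniform_grade=2.5):
    """target: course -> grade dict of the investigated student; candidates:
    list of such dicts; sim: scalar similarity (higher = more similar)."""
    baseline = {course: uniform_grade for course in target}   # uniform student
    threshold = sim(target, baseline)
    return [c for c in candidates if sim(target, c) > threshold]
```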
4 Results
Mean absolute error (MAE) represents the size of the prediction error. Exact grade
prediction is very difficult, and even a coarser prediction can be sufficient.
Therefore, we also predicted the grades as good (1) / bad (2) / failure (4), or just
as success or failure. The results of the CF methods were compared with our previous
work described in Section 2, where the predictions were obtained using classification
algorithms (CA). We used a confusion matrix for calculating MAE. In the previous work,
we mined study-related data and data about the social behavior of students.
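As an illustration of deriving MAE from a confusion matrix, a small sketch follows; the example counts are invented for demonstration only, and the labels 1/2/4 correspond to the good/bad/failure encoding above.

```python
def mae_from_confusion(matrix, labels):
    """matrix[i][j]: count of instances with actual class i predicted as class j;
    labels: numeric values of the classes, e.g. [1, 2, 4] for good/bad/failure."""
    total = sum(sum(row) for row in matrix)
    error = sum(matrix[i][j] * abs(labels[i] - labels[j])
                for i in range(len(labels)) for j in range(len(labels)))
    return error / total

# invented example counts, for illustration only
conf = [[50, 10, 5],
        [12, 40, 8],
        [3, 9, 30]]
print(round(mae_from_confusion(conf, [1, 2, 4]), 2))
```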
The comparison of both approaches can be seen in Table 1. Although the two approaches
used different data from the information system and different processing, the results
showed that their performance was very similar. The only significant difference can be
seen in grade prediction, where the CF methods were slightly better. We consider the
accuracy of 78.5% for student success or failure prediction reliable enough,
considering that we did not know students’ skills or enthusiasm for courses. The MAE
of the good / bad / failure prediction was around 0.6. We consider an MAE of less than
one degree in the modified grade scale to be very satisfactory. Even in the grade
(1, 1.5, 2, 2.5, 3, and 4) prediction, the MAE was around 0.7, which means only
slightly more than one degree in the grade scale. In general, these results were
positive, but the grade prediction was still not trustworthy.
The advantage of the CF approach is that all information systems store the data about
students’ grades. Therefore, this approach can be used in all systems. Our previous
approach was based on mining data obtained from the information system, but not all
systems store data about the social behavior of students. We showed that such data
improve the accuracy of the results significantly [2].
5 Discussion
The settings of the CF approach that reached the best average results can be seen in
Table 2. As the results show, PC worked best in combination with the uniform student
for selecting a proper neighborhood, and with significance weighting extended with the
average grades of the compared students for the final prediction. On the other hand,
for MAD, the Top x function was the best option for selecting the neighborhood, and
the median for the final prediction. Both settings reached very similar results
in all tasks and we consider them to be trustworthy. We also investigated the most
suitable x for these tasks. We searched for the minimal x with the best possible results
and derived x = 25 to be the best choice generally for all methods and settings. The
most suitable classification algorithms were SMO and Random Forests (Table 3).
Table 2. The settings of the CF approach that reached the best average results

                     Sim. function   Neighborhood      Estimation approach
Grade                PC              Uniform student   Sig. weighting + average grades
Good/bad/failure     PC              Uniform student   Sig. weighting + average grades
Success/failure      MAD             Top 25            Median
Table 3. The settings of the classification algorithms that reached the best average results

                     Classification algorithm   Feature selection algorithm
Grade                SMO                        InfoGainAttributeEval
Good/bad/failure     SMO                        OneRAttributeEval
Success/failure      Random Forests             5 attributes selected by each FS algorithm for each course
6 Conclusion
In this paper, we used CF methods for student modeling. Our experiment provides
evidence that the CF approach is also suitable for student performance prediction. The
data set comprised 62 courses taught over 4 years, with 3,423 students and their
42,635 grades. We confirmed our hypothesis that students’ knowledge can be sufficiently
characterized by their previously passed courses, which should cover their knowledge
of the field of study. We processed data about students’ grades stored in the
Information System of Masaryk University to be able to estimate students’ interests,
enthusiasm and prerequisites for passing enrolled courses at the beginning of each
term. For each investigated student, we searched for the students enrolled in the same
courses in previous years who were the most similar to the investigated student. Based
on their study results, we predicted the student’s performance.
We compared the results with the results obtained by classification algorithms, which
researchers usually utilize for student performance prediction. The results were
almost the same. The main advantage of the CF approach is that all university
information systems store the data about students’ grades needed for the prediction.
On the other hand, this approach is not suitable if we have no information about the
history of the particular student. Now, we are able to predict student success or
failure with an accuracy of 78.5%, whether the grade will be good, bad, or failure with an MAE of
0.6, and the exact grade with an MAE of 0.7. We consider the results to be very
satisfactory, and the CF approach can be considered as expressive as the commonly used
classification algorithms.
Based on this approach, we can recommend suitable voluntary courses for each student
with respect to his or her interests and skills. We hope that this information will
also encourage students to study hard when they have to enroll in a mandatory course
that seems too difficult for them. Moreover, teachers can utilize this information to
identify potentially weak students and help them before they are at risk of failing
the course. This approach can also be beneficially used in an intelligent tutoring
system as a basic estimation of students’ potential before they start to work with the
system in the investigated course.
References
1. Bydžovská, H.: Student performance prediction using collaborative filtering methods. In:
Conati, C., Heffernan, N., Mitrovic, A., Verdejo, M. (eds.) AIED 2015. LNCS, vol. 9112,
pp. 550–553. Springer, Heidelberg (2015)
2. Bydžovská, H., Popelínský, L.: The Influence of social data on student success prediction.
In: Proceedings of the 18th International Database Engineering & Applications Sympo-
sium, pp. 374–375 (2014)
3. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for per-
forming collaborative filtering. In: Proceedings of the 22nd Annual International ACM
SIGIR Conference, pp. 230–237 (1999)
4. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommenda-
tions. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work,
pp. 241–250 (2000)
5. Jannach, D., Zanker, M., Felfernig, A., Friedrich, G.: Recommender Systems: An Intro-
duction. Cambridge University Press (2010)
6. Loll, F., Pinkwart, N.: Using collaborative filtering algorithms as e-learning tools. In:
Proceedings of the 42nd Hawaii International Conference on System Sciences (2009)
7. Marquez-Vera, C., Romero, C., Ventura, S.: Predicting school failure using data mining.
In: Pechenizkiy, M., et al. (eds.) EDM, pp. 271–276 (2011)
8. Matuszyk, P., Spiliopoulou, M.: Hoeffding-CF: neighbourhood-based recommendations
on reliably similar users. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P.,
Houben, G.-J. (eds.) UMAP 2014. LNCS, vol. 8538, pp. 146–157. Springer, Heidelberg
(2014)
9. Walker, A., Recker, M.M., Lawless, K., Wiley, D.: Collaborative Information Filtering:
A Review and an Educational Application. International Journal of Artificial Intelligence
in Education 14(1), 3–28 (2004)
10. Witten, I., Frank, E., Hall, M.: Data Mining: Practical Machine Learning Tools and
Techniques, 3rd edn. Morgan Kaufmann Publishers (2011)
Intelligent Robotics
A New Approach for Dynamic Strategic
Positioning in RoboCup Middle-Size League
1 Introduction
RoboCup (“Robot Soccer World Cup”) is a scientific initiative with an annual
international meeting and competition that started in 1997. Its aim is to promote
worldwide developments in Artificial Intelligence, Robotics and Multi-agent Systems.
Robot soccer represents one of the attractive domains promoted by RoboCup for the
development and testing of multi-agent collaboration techniques, computer vision
algorithms and artificial intelligence approaches, to name only a few.
In the RoboCup Middle Size League (MSL), autonomous mobile soccer robots must
coordinate and collaborate to play and win a game of soccer, similarly to human soccer
games. They have to assume dynamic roles on the field, share information about visible
objects of interest or obstacles, and position themselves on the field so that they
can score goals and prevent the opponent team from scoring. Decisions such as game
strategies, positioning and team coordination play a major role in MSL soccer games.
This paper introduces Utility Maps as a tool for the dynamic positioning of
soccer robots on the field and for opportunistic passing between robots, under
different situations that will be presented throughout the paper. As far as the
authors know, no previous work has been presented about the use of Utility
Maps in the Middle Size League of RoboCup.
The paper is structured into 8 sections, the first of them being this Introduction.
In Section 2 we present a summary of the work already done on strategic positioning.
Section 3 introduces the use of Utility Maps in the software structure of the CAMBADA
MSL team. In Section 4 we describe the construction of Utility Maps. Section 5
describes the use of Utility Maps for the positioning of the robots in defensive set
pieces. In Section 6 we present the use of Utility Maps in offensive set pieces, while
Section 7 presents their use in Free Play. Finally, Section 8 introduces some measures
of the impact of the Utility Maps on the performance of the team and discusses a
series of results that demonstrate the efficiency of the team strategy based on
Utility Maps.
2 Related Work
Strategic positioning is a topic of broad interest within the RoboCup community. As
the teams participating in the RoboCup Soccer competitions gradually managed to solve
the most basic tasks involved in a soccer game, such as locomotion, ball detection and
ball handling, the need for smarter and more efficient robotic soccer players arose.
Team coordination and strategic positioning are nowadays the key factors when it comes
to winning a robotic soccer game.
The first efforts for achieving coordination in multi-agent soccer teams were
presented in [2][3]. Strategic Positioning with Attraction and Repulsion (SPAR) takes
into account the positions of other agents as well as that of the ball. The following
forces are evaluated when taking a decision regarding the positioning of an agent:
repulsion from opponents and team members, attraction to the active team member and
the ball, and attraction to the opponents’ goal.
In the RoboCup Soccer Simulation domain, Situation Based Strategic Positioning (SBSP)
[4] is a well-known technique used for the positioning of the software agents. The
positioning of an agent only takes into consideration the ball position, as a focal
point, and does not consider other agents. However, if all agents are assumed to
always devote their attention to the ball position, then cooperative behavior can be
achieved indirectly. An agent defines its base strategic position based on the
analysis of the tactic and team formation. Its position is adjusted according to the
ball position and the game situation. This approach was adapted to the Middle Size
League constraints and presented in [5].
In [6] a method for Dynamic Positioning based on Voronoi Cells (DPVC) was introduced.
The robotic agents are placed based on attraction vectors. These vectors represent the
attraction of the players towards objects, depending on the current state of the game
and the players’ roles.
The Delaunay Triangulation formations (DT) [7] divide the soccer pitch into several
triangles based on given training data. Each training datum affects only the divided
region to which it belongs. A mapping is built from a focal point, such as the ball
position, to the positioning of the agents.
For more than 20 years, grid-based representations have been used in robotics to
represent different kinds of spatial information, allowing a more accurate and
simplified world perception and modelling [8][9]. Usually, this type of representation
is oriented to a specific goal. Utility functions have been presented before [10][11]
as a tool for role choosing within multi-agent systems.
Taking into account the successful use of this approach, a similar idea was applied to
the CAMBADA agents, but with different functionalities and purpose. The aim of the
proposed approach was to improve the collective behavior in some specific game
situations. Utility Maps have been developed as support tools for the positioning of
soccer robots in defensive and offensive set pieces, as well as in free play pass
situations.
When the game is in free play, a robot can assume one of two roles: Striker or
Midfielder, depending on the robot position relative to the ball.
The use of these Utility Maps allows the robots to easily take decisions regarding
their positioning by simply choosing the local maximum on the maps. The Utility Maps
are calculated locally on the robots and are part of their worldstate, so that they
are easily accessible by any behavior or role. In terms of implementation, the TCOD
library (http://roguecentral.org/doryen/libtcod/) has been used. The library provides
built-in toolkits for the management of height maps, which in the context of this work
are used as Utility Maps, and for field-of-view calculations. It takes, on average,
4 ms to update the necessary maps in each cycle of the agent software execution. The
robots currently work with a cycle of 20 ms, controlled by the vision process that
works at 50 frames per second [14].
The identified opponent robots lead to hills in the map at their positions, with some
persistence to improve the stability of the decisions based on the maps. It takes
5 agent cycles (100 milliseconds) for a new obstacle to reach the maximum cost level.
In the end, the map is normalized, thus always holding values between 0.0 and 1.0.
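A rough numpy sketch of this obstacle-hill mechanism is shown below; the grid size, the Gaussian hill shape and the class interface are our assumptions (the team itself relies on the TCOD height-map toolkit), but the five-cycle persistence ramp and the final normalization follow the description above.

```python
import numpy as np

CYCLES_TO_MAX = 5            # ~100 ms at the 20 ms agent cycle described above

class ObstacleHillMap:
    """Keeps a persistence counter per observed obstacle cell and renders a
    normalized map with one Gaussian hill per obstacle."""

    def __init__(self, shape=(180, 120), sigma=3.0):
        self.shape, self.sigma = shape, sigma
        self.persistence = {}                        # (row, col) -> counter

    def update(self, obstacle_cells):
        seen = set(obstacle_cells)
        for c in seen:                               # grow towards the maximum cost
            self.persistence[c] = min(CYCLES_TO_MAX, self.persistence.get(c, 0) + 1)
        for c in list(self.persistence):             # decay obstacles no longer seen
            if c not in seen:
                self.persistence[c] -= 1
                if self.persistence[c] <= 0:
                    del self.persistence[c]
        ys, xs = np.mgrid[0:self.shape[0], 0:self.shape[1]]
        hills = np.zeros(self.shape)
        for (cy, cx), w in self.persistence.items():
            hills += (w / CYCLES_TO_MAX) * np.exp(
                -((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * self.sigma ** 2))
        peak = hills.max()
        return hills / peak if peak > 0 else hills   # normalized to [0.0, 1.0]
```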
Fig. 1. On the left, the FOV calculated from robot number 3. The ball is represented by
a small circle and the robots by larger circles. The circles with numbers are team
mates. The red areas are considered visible from the point of view of the robot. On
the right, the configuration tool used for the positioning of our robots in opponent
set pieces and during free play (DT).
a team mate are ignored, unless that team mate sees it. The filtered information
is then used to build the map.
From each cluster of obstacles, a valley is carved in the direction of the ball. After
that, a predefined height map that defines the priorities of the positions is added to
the calculated map (see Fig. 2). This map takes into consideration that it is more
important to cover the opponent robots in the direction of our goal than in the
direction of their goal. Finally, all the restrictions (minimum distance to the ball,
positions inside the field and avoidance of the penalty areas) are added.
In Fig. 2 we can see a game situation where robots number 3, 4 and 5 are
configured to cover the opponent robots. The best position given by the Utility
Map for each robot is represented in red. As intended, these positions are between
the ball and the opponent robots. Robot 2 is in its base position provided by
the DT configuration.
The distance between our team robot and the opponent robot that it is trying to cover
can be configured (in the configuration file of the robot). The human coach can
specify, in the same configuration file, which robots are allowed to perform covering.
To configure our set pieces we use a graphical tool (Figure 3) that implements
an SBSP algorithm. The field is divided into 10 zones. Each zone defines a set of
positions for the Replacer (the role of the robot closer to the ball and responsible
for putting the ball in play) and Receivers (the role of the other robots, except
the Goalie). The default position to kick to, when no Receiver is available, can also
be configured. The position of the Receivers can be absolute or relative to the ball.
We can also define whether the Receiver needs a clear line between its position and
the ball, as well as an option to force the Receiver to be aligned with the goal.
Fig. 2. On the left, the cover priorities height map used to define the priority of the
cover positions in the field. Red is the highest priority and blue the lowest. On the
right, an example of a cover Utility Map. As we can see, the best positions for robots
number 3, 4 and 5 are between the ball and the robots of the opponent team (red color).
The priority of each Receiver is indicated as well, for the case where more than one
is configured and available for a specific region. The one with the highest priority
will be tested first for receiving a pass. The action to be performed by the Replacer
towards that specific Receiver is also configurable. This action can be a pass, a
cross or none; in the last case, the Receiver will never be considered as an option.
It is possible to configure each of the possible set pieces (corner, free kick, drop
ball, throw-in and kick-off) differently for each of the regions.
When the set pieces are configured using this tool, the opponent team is not taken
into account. After the opponent team has positioned itself, our configured positions
can turn out to be positions where the Receiver is not able to receive a pass. To deal
with these situations, an alternative reception position has to be calculated
dynamically, taking into account the opponent team. A Utility Map is used to calculate
this alternative position for the Receiver.
All the constraints imposed by the rules, namely the minimum distance to the ball
(2 m) and not entering the goal areas, are taken into account for the construction of
the Receiver Utility Map. The field is divided into two zones for the application of
different metrics to calculate the utility value of each position. On our side of the
field, only one metric is used: the distance to the halfway line. Three metrics are
used on the opponent side of the field. One is the free space between the pass line
and the closest obstacle. The second is a weighted average of the distance to the
ball, the distance to the opponent goal, the rotation angle for a shot on target and
the distance from the point on the map to the position of the robot. Finally, the
third metric is the angle of each map position to the opponent goal. The weights for
the second metric are easily configurable in the configuration file of the robot.
These metrics are only applied within a circle whose radius is also defined in the
configuration file. This circle is centered on the position of the ball in the set
piece, and only the positions that have FOV from the ball (positions where the ball
can be passed) are considered.
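The following sketch illustrates how such a utility value could be combined from the metrics above, assuming field coordinates with the x axis pointing towards the opponent goal and the halfway line at x = 0; the weights, the sign conventions and the way the three metrics are combined are illustrative assumptions, since in the real robot they are read from the configuration file.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def point_to_segment(q, a, b):
    """Distance from point q to the segment a-b (the pass line)."""
    ax, ay, bx, by, qx, qy = *a, *b, *q
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return dist(q, a)
    t = max(0.0, min(1.0, ((qx - ax) * dx + (qy - ay) * dy) / (dx * dx + dy * dy)))
    return dist(q, (ax + t * dx, ay + t * dy))

def receiver_utility(pos, ball, opp_goal, robot, obstacles,
                     weights=(0.4, 0.2, 0.2, 0.2)):
    """pos, ball, opp_goal, robot: (x, y) positions; obstacles: list of (x, y).
    Returns a value to be maximized over the candidate map positions."""
    ang = lambda p, q: math.atan2(q[1] - p[1], q[0] - p[0])
    if pos[0] <= 0.0:
        # our half: only the distance to the halfway line is used (sign assumed)
        return -abs(pos[0])
    # metric 1: free space between the pass line and the closest obstacle
    free = min((point_to_segment(o, ball, pos) for o in obstacles), default=5.0)
    # metric 2: weighted average of distance to ball, distance to the opponent
    # goal, rotation needed for a shot on target, and distance to the robot
    rotation = abs((ang(pos, opp_goal) - ang(ball, pos) + math.pi)
                   % (2 * math.pi) - math.pi)
    cost = (weights[0] * dist(pos, ball) + weights[1] * dist(pos, opp_goal)
            + weights[2] * rotation + weights[3] * dist(pos, robot))
    # metric 3: angle of the map position to the opponent goal
    goal_angle = abs(ang(pos, opp_goal))
    return free - cost - goal_angle
```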
Fig. 3. On the left, the configuration tool used for our set pieces. On the right, an
example of an alternative positioning Utility Map for the Receiver, calculated for
Robot number 2. CAMBADA is attacking towards the blue goal. The black line goes from
the ball to the alternative position indicated by Robot number 2 to receive the ball.
The robots move to the best position extracted from the Utility Map only after the
referee gives the start signal, to prevent the opponent robots from following them.
In Fig. 3, all the Receivers are sharing that they have a clear line to receive the
ball (lineClear is information associated with each robot). Robot number 4 is the
Replacer, already chosen to pass the ball to robot number 2. The pass line it intends
to make is represented by the black line. Robot number 2 will move to that position to
receive the ball.
7 Positioning in Freeplay
Fig. 4. On the left, the free play Utility Map calculated by the Midfielders. The best
positions on the field to receive the ball are in red. On the right, the free play
Utility Map used by the Striker to choose the best position to perform a pass or a
kick to the goal when dribbling the ball. The robot will dribble to the positions in red.
robot will be constantly adapting to the changes of the opponent formation and
the ball position.
The Striker, the robot holding the ball or closest to it, also uses a Utility Map for
selecting the best position to perform a pass or to kick towards the goal. The
calculated map (Figure 4) deals with the constraints regarding a generic dribble
behavior, complying with the current MSL rules, which do not allow a robot to dribble
for more than 3 meters. Some areas of the field have less utility, namely both penalty
areas, areas outside the field and areas outside a 3-meter-radius circle centered on
the point where the ball was grabbed. More priority is given to the areas close to the
limits of the field, since it is more advantageous to kick to the goal from there.
Table 1. Defensive set pieces cover efficiency during the last two games of
RoboCup 2014. According to the rules, the defending team has to be at least 3 meters
from the ball, so when the attacking team passes over a distance shorter than 3 m
(column <3m) it is impossible to intercept the ball.

Game         Opponent      <3m   Intercepted (Yes / No)   % Success
Semi-final   Tech United    11          10 / 3               77
3rd place    MRL            14           4 / 3               57
Total                       25          14 / 6               70
Table 2. Attacking set pieces efficiency during the last two games of RoboCup 2014.
We are considering the success of the ball reception after a pass.

Game         Opponent      Pass (Yes / No)   % Success
Semi-final   Tech United       10 / 8            56
3rd place    MRL               15 / 6            71
Total                          25 / 14           64
from the RoboCup games. This analysis reveals that the team reached a 70% success rate
in the interception of the ball in defensive set pieces (see Table 1), performed 64%
successful passes in offensive set pieces (see Table 2) and achieved a high percentage
of successful passes in free play, although these last situations are hard to analyze
due to the high dynamism of the games.
Looking at Table 1, and considering that, most of the time, a short pass by the
attacking team means that it was forced into it by not having another pass option, we
have a success rate of 70% in defensive set piece situations.
Looking further into the unsuccessful situations, the problem was clearly identified
and it is not related to the cover positions obtained from the Utility Maps. The
problem was rather the transition from the Barrier role into the Midfielder or Striker
role, situations where the cover positions are not used.
This is still an open issue to be addressed in the near future.
In the last two games of RoboCup 2014, the semi-final and the 3rd place game, which
were probably the most dynamic games, there was a total of 45 defensive set piece
situations. This is an average of about 22 defensive set pieces per game; in a game of
30 minutes, it means a defensive set piece situation every 1 minute and 20 seconds.
In Fig. 5 we can see two game situations where the CAMBADA robots are in strategic
positions. By being in those positions, the CAMBADA robots do not allow the attacking
team to perform a pass in a proper way.
In the same two games, there was a total of 39 offensive set piece situations. This is
an average of about 20 offensive set pieces per game; in a game of 30 minutes, it
means an offensive set piece situation every 1 minute and 30 seconds. In 64% of the
situations the robots were able to properly receive the ball. Looking further into the
unsuccessful situations, there were cases where the ball was passed to a position far
from the Receiver
Fig. 5. Defensive game situations during RoboCup 2014. The CAMBADA team has blue
markers and is the defending team.
Fig. 6. Offensive game situations during RoboCup 2014. The CAMBADA team has blue
markers and is the attacking team.
and it was lost, and some other cases where the reception was not properly done,
mainly due to misalignment of the Receiver. These situations were not due to a wrong
positioning given by the Utility Map. We counted a total of only 3 interceptions of
long passes by the opponent team.
In Fig. 6 we can see two game situations where the CAMBADA robots are in strategic
positions, which allow them to receive the ball with success.
Based upon this study, we are convinced that the use of Utility Maps is an
advantageous approach in extremely dynamic environments such as robotic soccer.
Without adding great complexity to the structure of the agents, as described in the
previous sections, it was possible to introduce the desired dynamism, which increased
the team's competitiveness and improved its overall performance.
References
1. Neves, A., Azevedo, J., Lau, N., Cunha, B., Silva, J., Santos, F., Corrente, G.,
Martins, D.A., Figueiredo, N., Pereira, A., Almeida, L., Lopes, L.S., Pedreiras, P.:
CAMBADA soccer team: from robot architecture to multiagent coordination,
chapter 2, pp. 19–45. I-Tech Education and Publishing, Vienna, January 2010
2. Veloso, M., Bowling, M., Achim, S., Han, K., Stone, P.: The cmunited-98 champion
small robot team (accessed February 27, 2014)
3. Stone, P.: Layered Learning in Multiagent Systems: A Winning Approach to
Robotic Soccer. MIT Press (2000)
4. Reis, L.P., Lau, N., Oliveira, E.C.: Situation based strategic positioning for coordi-
nating a team of homogeneous agents. In: Hannebauer, M., Wendler, J., Pagello, E.
(eds.) ECAI-WS 2000. LNCS (LNAI), vol. 2103, pp. 175–197. Springer, Heidelberg
(2001)
5. Lau, N., Lopes, L.S., Corrente, G.: CAMBADA: information sharing and team
coordination. In: Proc. of the 8th Conference on Autonomous Robot Systems and
Competitions, Portuguese Robotics Open - ROBOTICA 2008, pp. 27–32, Aveiro,
Portugal, April 2008
6. Dashti, H.A.T., Aghaeepour, N., Asadi, S., Bastani, M., Delafkar, Z., Disfani, F.M.,
Ghaderi, S.M., Kamali, S.: Dynamic positioning based on voronoi cells (DPVC).
In: Bredenfeld, A., Jacoff, A., Noda, I., Takahashi, Y. (eds.) RoboCup 2005. LNCS
(LNAI), vol. 4020, pp. 219–229. Springer, Heidelberg (2006)
7. Akiyama, H., Noda, I.: Multi-agent positioning mechanism in the dynamic envi-
ronment. In: Visser, U., Ribeiro, F., Ohashi, T., Dellaert, F. (eds.) RoboCup 2007:
Robot Soccer World Cup XI. LNCS (LNAI), vol. 5001, pp. 377–384. Springer,
Heidelberg (2008)
8. Elfes, A.: Using occupancy grids for mobile robot perception and navigation. Com-
puter 22(6), 46–57 (1989)
9. Rosenblatt, J.K.: Utility fusion: map-based planning in a behavior-based system.
In: Zelinsky, A. (ed.) Field and Service Robotics, pp. 411–418. Springer, London
(1998)
10. Chaimowicz, L., Campos, M.F.M., Kumar, V.: A paradigm for dynamic coordination of
multiple robots. Auton. Robots 17(1), 7–21 (2004)
11. Spaan, M.T.J., Groen, F.C.A.: Team coordination among robotic soccer players.
In: Kaminka, G.A., Lima, P.U., Rojas, R. (eds.) RoboCup 2002. LNCS (LNAI),
vol. 2752, pp. 409–416. Springer, Heidelberg (2003)
12. Santos, F., Almeida, L., Lopes, L.S., Azevedo, J.L., Cunha, M.B.: Communicating
among robots in the robocup middle-size league. In: Baltes, J., Lagoudakis, M.G.,
Naruse, T., Ghidary, S.S. (eds.) RoboCup 2009. LNCS, vol. 5949, pp. 320–331.
Springer, Heidelberg (2010)
13. Lau, N., Lopes, L.S., Corrente, G., Filipe, N., Sequeira, R.: Robot team coordi-
nation using dynamic role and positioning assignment and role based setplays.
Mechatronics 21(2), 445–454 (2011)
14. Trifan, A., Neves, A.J.R., Cunha, B., Azevedo, J.L.: UAVision: a modular time-
constrained vision library for soccer robots. In: Bianchi, R.A.C., Akin, H.L.,
Ramamoorthy, S., Sugiura, K. (eds.) RoboCup 2014. LNCS, vol. 8992, pp. 490–501.
Springer, Heidelberg (2015)
15. Silva, J., Lau, N., Neves, A.J.R., Azevedo, J.L.: World modeling on an MSL robotic
soccer team. Mechatronics 21(2), 411–422 (2011)
Intelligent Wheelchair Driving: Bridging the Gap
Between Virtual and Real Intelligent Wheelchairs
1 Introduction
Recently, virtual reality has attracted much interest in the field of motor
rehabilitation engineering [1]. Virtual reality has been applied to provide safe and
interesting training scenarios with near-realistic environments for subjects to
interact with [2]. The performance of subjects within these virtual environments
proved to be representative of their abilities in the real world, and their real-world
skills showed significant improvements following virtual reality training [3-5]. Until
now, electric powered wheelchair simulators have mainly been developed either to
facilitate patient training and skill assessment [6][7] or to assist in the testing
and development of semi-autonomous intelligent wheelchairs [8]. While in the training
simulators the focus is on user interaction and immersion, the main objective of the
robotics simulators is the accurate simulation of sensors and physical behaviour. The
simulator presented here addresses the need to combine these approaches. It has a
simple design that provides user training ability while supporting a number of sensors
and ensuring physically feasible simulation for intelligent wheelchair development.
The simulator is part of a larger project in which the IntellWheels prototype will
include all typical IW capabilities, such as commands based on facial expression
recognition, voice commands, sensor-based commands, advanced sensorial capabilities,
the use of computer vision as an aid for navigation, obstacle avoidance, intelligent
planning of high-level actions and communication with other devices.
The experiments with real wheelchair users allowed us to gather information about the
usability of the virtual intelligent wheelchair and the virtual environment. These
users, besides using a wheelchair to move around, are also potential users of an
intelligent wheelchair. In fact, they suffer from cerebral palsy, which is a group of
permanent disorders of the development of movement and posture.
This paper is organized into five sections, the first one being this introduction.
Section two presents an overview of the methodologies for wheelchair simulation and
the criteria used to select the platform to simulate the IW and the environment.
Special attention is given to USARSim, which was the platform chosen to produce the
simulation. In section three a brief description of the IntellWheels project is
presented. Section four presents the experiments and results, and finally the last
section presents the conclusions and future work.
The methodologies typically concerned with 2D and 3D rendering are known as graphics
engines. Usually they are aggregated inside game engines, using specific libraries for
rendering. Examples of graphics engines are OpenSceneGraph [15], Object-Oriented
Graphics Rendering Engine (OGRE) [16], jMonkey Engine [17] and Crystal Space [18].
Physics engines are software applications with the objective of simulating the
physical behaviour of objects and the world. Bullet [19], Havok [20], Open Dynamics
Engine (ODE) [21] and PhysX [22] are examples of physics engines. These engines also
contribute to robotics simulation through more realistic motion generation of the
robot. Game engines are software frameworks that developers use to create games. A
game engine normally includes a graphics engine and a physics engine. Collision
detection/response, sound, scripting, animation, artificial intelligence, networking,
streaming, memory management, localization support and scene graphs are also
functionalities included in this kind of engine. Examples of game engines are Unreal
Engine 3 [23], Blender Game Engine [24], HPL Engine [25] and Irrlicht Engine [26].
A robotics simulator is a platform to develop software for robot modelling and
behaviour simulation in a virtual environment. In several cases it is possible to
transfer the application developed in simulation to the real robots without any extra
modification. In the literature there are several commercial examples of robotics
simulators: AnyKode Marilou (for mobile robots, humanoids and articulated arms) [27];
Webots (for educational purposes; a large choice of simulated sensors and actuators is
available to prepare each robot) [28]; Microsoft Robotics Developer Studio (MRDS)
(allows easy access to simulated sensors and actuators) [29]; and Workspace 5 (a
Windows-based environment that allows the creation, manipulation and modification of
images in 3D CAD and several ways of communication) [30]. Non-commercial robotics
simulators are also available: SubSim [31]; SimRobot [32]; Gazebo [33]; USARSim [34];
Simbad [35] and SimTwo [36]. A comparison of the most used 3D robotics simulators
according to several criteria was presented by Petry et al. [37], who also presented
the requirements and characteristics for the simulation of intelligent wheelchairs,
and in particular for the IntellWheels prototype.
USARSim, an acronym for Unified System for Automation and Robot Simulation, is a
high-fidelity simulation of robots and environments based on the Unreal Tournament
game engine [34]. It was initially created as a research tool for the simulation of
Urban Search And Rescue (USAR) robots and environments, for the study of human-robot
interaction (HRI) and multi-robot coordination [37]. USARSim is the basis for the
RoboCup Rescue virtual robot competition as well as the IEEE Virtual Manufacturing
Automation Competition (VMAC) [34].
Nowadays, the simulator uses the Unreal Engine UDK and NVIDIA's PhysX physics engine.
The version used to develop IntellSim was Unreal Engine 2.5 with the Karma physics
engine (both integrated into the Unreal Tournament 2004 game), which maintain and
render the virtual environment and model the physical behaviour of its elements,
respectively.
3 IntellWheels Project
The IntellWheels project aims at providing a low-cost platform for electric
wheelchairs in order to transform them into intelligent wheelchairs. The simulated
environment allows testing and training potential users of the intelligent wheelchair,
and selecting the appropriate interface, among the available possibilities, for a
specific user. After the first set of experiments [38] it was necessary to improve the
realism of the simulated environment and the behaviour of the IW. For that reason, and
trying to maintain the principle of keeping the IntellWheels project as low cost as
possible, USARSim was chosen for the new simulator. There were other reasons to choose
USARSim, such as: advanced support for wheeled robots, allowing their independent
configuration; allowing the importation of objects and robots modelled in different
platforms, which facilitates, for instance, the wheelchair modelling; and the
possibility of programming robots and controlling them over the network, which can be
used in mixed reality [39].
One of the main objectives of the project is also the creation of a development
platform for intelligent wheelchairs [40], entitled IntellWheels Platform (IWP). The
project's main focus is the research and design of a multi-agent platform enabling
easy integration of different sensors, actuators, devices for extended interaction
with the user [41], navigation methods, planning techniques and methodologies for
intelligent cooperation to solve problems associated with intelligent wheelchairs [42].
The IntellWheels platform allows the system to work in real mode (the IW has a real
body), simulated mode (the body of the wheelchair is virtual) or mixed reality (a real
IW with perception of real and virtual objects). In real mode it is necessary to
connect the system (software) to the IW hardware. In simulated mode, the software is
connected to the IWP simulator. In mixed reality mode, the system is connected to both
(hardware and simulator). Several types of input devices were used in this project to
allow people with different disabilities to be able to drive the IW. The intention is
to offer the patient the freedom to choose the device they find most comfortable and
safe to drive the wheelchair. These devices range from traditional joysticks and
accelerometers to commands expressed by speech, facial expressions or a combination of
some of them. Moreover, these multiple inputs for interaction with the IW can be
integrated with a control system responsible for the decision of enabling or disabling
any kind of input in case of any observed conflict or dangerous situation. To compose
the set of hardware necessary to provide the wheelchair's ability to avoid obstacles,
follow walls, map the environment and detect holes and unevenness in the ground, two
side bars were designed, constructed and placed on the wheelchair. These bars
incorporate 16 sonars and a laser range finder. Two encoders were also included and
coupled to the wheels to allow odometry.
The virtual intelligent wheelchair was modelled using the program 3DStudioMax [44].
The visual part, which appears on the screen, was imported into the UnrealEditor as
separate static mesh (*.usx) files. The model was then added to USARSim by writing
appropriate UnrealScript classes and modifying the USARSim configuration file. The
physics properties of the model were described in the UnrealScript language, using a
file for each part of the robot. The model has fully autonomous caster wheels and two
differential steering wheels. In the simulation it is equipped with a camera, a front
sonar ring, an odometry sensor and encoders. Fig. 1 shows different perspectives of
the real and virtual wheelchair.
An important factor affecting the simulation of any model in UT2004 is its mass
distribution and associated inertial properties. These were calculated using estimated
masses of the different parts of the real chair with batteries (70 kg) and literature
values for average human body parameters (60 kg). The values obtained for the center
of mass and the inertia tensor were used to calculate the required torque for the two
simulated motors, using the manufacturer's product specification as a guideline. The
sensors used in the simulated wheelchair are the same as those used in the real IW. As
in the real prototype, 16 sonars and a laser range finder were placed in two side
bars. Two encoders were also included and coupled to the wheels to allow odometry.
These sensors provide the wheelchair's ability to avoid obstacles, follow walls, map
the environment and detect holes and unevenness in the ground. Using the simulator it
was also possible to model rooms with low illumination and noisy environments and test
the performance of users while driving the wheelchair. The map was created using the
Unreal Editor 3 and is similar to the place where the patients are used to moving
around. Several components in the map were modelled using 3DStudioMax. In order to
increase the realism of the virtual environment, several animations were implemented
using sequence scripts. The simulator runs on a dual-core PC with a gaming-standard
dual-view graphics card. Other supported input devices include keyboard, mouse, mouse
replacement devices and gaming joysticks.
The initial experiments were conducted to gather more information about the real
prototype, with the objective of being able to model and simulate the behaviour of
the real wheelchair more precisely in the virtual environment. The final experiments
involved real wheelchair users and potential users of the intelligent wheelchair.
Therefore the experiments were divided into two components: wheelchair technical
information, and the users' feelings about the simulated wheelchair and the modelled
virtual environment.
Fig. 2 shows some of the experiments done with the real intelligent wheelchair
prototype. The velocity and the time for a full rotation were analyzed with the new
adaptations applied to the real prototype. The results obtained using the real
prototype were taken into account in order to develop the virtual intelligent wheelchair.
Fig. 3. Overall circuit and snapshots of the first person view during the game
After the experiments, the users responded to the System Usability Scale [45], which
is a simple ten-item Likert scale giving a global view of individual assessments of
usability (with a score between 0 and 100), and to some more questions about safety
and control while managing the IW, whether it was easy to drive the IW in tight
places, and the attention needed to drive the IW.
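For reference, the standard SUS scoring procedure (a general property of the scale [45], not something specific to this study) can be written as follows: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to obtain a 0–100 score. The answers below are invented for illustration.

```python
def sus_score(responses):
    """responses: the ten answers on a 1-5 Likert scale, item 1 first."""
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)
                     for i, r in enumerate(responses)]
    return 2.5 * sum(contributions)

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))   # example answers -> 80.0
```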
illiterate, 16% just have elementary school, 16% have middle school, 37% have high
school and only 5% have a BSc. The dominant hand was distributed as follows: 12
left-handed, 6 right-handed and 1 did not answer. Another question concerned the
frequency of use of information and communication technologies: 27% did not answer;
42% answered rarely; 21% sometimes; 5% often and 5% always. Aspects related to the
experience of using manual and electric wheelchairs were also questioned. Table 2
shows the distribution of answers about autonomy and independence in using the
wheelchair and the constraints presented by these individuals.
Table 2. Experience using wheelchair, autonomy, independence and constraints of the cerebral
palsy users
The SUS mean score was satisfactory and all the users considered the usability
positive (higher than 57.5). Overall, the users were very satisfied with the
experience. The users completed the circuit with a median time of 9.5 minutes. The
best time, 5.6 minutes, was achieved by one user, and the worst time, 42.4 minutes, by
a user with severe difficulties and without autonomy or independence in driving a
wheelchair. The data from the logs allow plotting the circuits after the experiment,
which is a way of analysing the behaviour of the users (Fig. 4).
Fig. 4. Circuits performed by patients suffering from cerebral palsy with joystick
It is interesting to notice that the path is smooth both when using the joystick in
manual mode and when using IntellSim. These three individuals are autonomous and
independent in driving their own electric wheelchairs with a joystick (Level IV of the
GMF). The next three circuit examples (second row in Fig. 4) were executed by the
users who took the longest times. In the left example the user took 17.7 minutes to
collect 14 objects and finish the circuit. In the middle, the user took 22.35 minutes
to collect 12 objects, and on the right side is the circuit performed by the user who
took the longest time, 42.4 minutes, to collect only 9 objects. Although these three
examples are the worst in terms of performance, it is necessary to emphasize that
these users are classified in the most severe degree of the GMF and do not have the
autonomy and independence to drive a conventional wheelchair. However, IntellSim can
be used to train these users, and with appropriate methodologies, such as shared or
automatic controls, it becomes possible to drive the IW in an efficient and effective manner.
The attention given to the autonomy and independence of the individual is nowadays a
very relevant subject. The scientific community is concerned with developing and
presenting many prototypes, such as intelligent wheelchairs; however, most of them
only undergo experimental work in the labs, without real potential users. The virtual
reality wheelchair
References
1. Holden, M.K.: Virtual environments for motor rehabilitation: review. J. Cyberpsychol
Behav. 8(3), 187–211 (2005)
2. Boian, R.F., Burdea, G.C., Deutsch, J.E., Windter, S.H.: Street crossing using a virtual
environment mobility simulator. In: Proceedings of IWVR, Lausanne, Switzerland (2004)
3. Inman, D.P., Loge, K., Leavens, J.: VR education and rehabilitation. Commun. ACM
40(8), 53–58 (1997)
4. Harrison, A., Derwent, G., Enticknap, A., Rose, F.D., Attree, E.A.: The role of virtual real-
ity technology in the assessment and training of inexperienced powered wheelchair users.
Disabil Rehabil 24(8), 599–606 (2002)
5. Adelola, I.A., Cox, S.L., Rahman, A.: VEMS - training wheelchair drivers, Assistive
Technology, vol. 16, pp. 757–761. IOS Press (2005)
6. Desbonnet, M., Cox, S.L., Rahman, A.: Development and evaluation of a virtual reality
based training system for disabled children. In: Sharkey, P., Sharkeand, R., Lindström, J.-I.
(eds.) The Second European Conference on Disability, Virtual Reality and Associated
Technologies, Mount Billingen, Skvde, Sweden, pp. 177–182 (1998)
7. Niniss, H., Inoue, T.: Electric wheelchair simulator for rehabilitation of persons with motor
disability, Symp Virtual Reality VIII, Belém (PA). Brazilian Comp. Society (BSC) (2006)
8. Röfer, T.: Strategies for using a simulation in the development of the Bremen autonomous
wheelchair. In: 12th European Simulation Multi Conference 1998 Simulation: Past,
Present and Future, ESM 1998, Manchester, UK, pp. 460–464 (1998)
9. Tefft, D., Guerette, P., Furumasu, J.: Cognitive predictors of young children’s readiness
for powered mobility. Dev. Medicine and Child Neurology 41(10), 665–670 (1999)
10. Faria, B.M., Silva, A., Faias, J., Reis, L.P., Lau, N.: Intelligent wheelchair driving: a com-
parative study of cerebral palsy adults with distinct boccia experience. In: Rocha, Á.,
Correia, A.M., Tan, F., Stroetmann, K. (eds.) New Perspectives in Information Systems
and Technologies, Volume 2. AISC, vol. 276, pp. 329–340. Springer, Heidelberg (2014)
11. Palisano, R.J., Tieman, B.L., Walter, S.D., Bartlett, D.J., Rosenbaum, P.L., Russell, D., et
al.: Effect of environmental setting on mobility methods of children with cerebral palsy.
Developmental Medicine & Child Neurology 45(2), 113–120 (2003)
12. Wiart, L., Darrah, J.: Changing philosophical perspectives on the management of children
with physical disabilities-their effect on the use of powered mobility. Disability & Rehabil-
itation 24(9), 492–498 (2002)
13. Edlich, R.F., Nelson, K.P., Foley, M.L., Buschbacher, R.M., Long, W.B., Ma, E.K.: Tech-
nological advances in powered wheelchairs. Journal Of Long-Term Effects of Medical
Implants 14(2), 107–130 (2004)
14. Faria, B.M., Teixeira, S.C., Faias, J., Reis, L.P., Lau, N.: Intelligent wheelchair simulator
for users’ training: cerebral palsy children’s case study. In: 8th Iberian Conf. on Informa-
tion Systems and Technologies, vol. I, pp. 510–515 (2013)
15. Wang, R., Qian, X.: OpenSceneGraph 3 Cookbook. Packt Pub. Ltd., Birmingham (2012)
16. Koranne, S.: Handbook of Open Source Tools, West Linn. Springer, Oregon (2010)
17. jMonkeyEngine (2012). http://jmonkeyengine.com/ (current May 2014)
18. Space, C.: Crystal Space user manual, Copyright Crystal Space Team.
http://www.crystalspace3d.org/main/Documentation#Stable_Release_1.4.0 (current May 2014)
19. Gu, J., Duh, H.B.L.: Handbook of Augmented Reality. Springer, Florida (2011)
20. Havok, Havok (2012). http://havok.com/ (current May 2014)
21. Smith, R.: Open Dynamics Engine (2007). http://www.ode.org/ (current May 2014)
22. Rhodes, G.: Real-Time Game Physics, in Introduction to Game Development, Boston,
Course Technology, pp. 387–420 (2010)
23. Busby, J., Parrish, Z., Wilson, J.: Mastering Unreal Technology. Sams Publishing, Indian-
apolis (2010)
24. Flavell, L.: Beginning Blender - Open Source 3D Modeling, Animation and Game Design.
Springer, New York (2010)
25. Games, F.: Frictional Games (2010). http://www.frictionalgames.com/site/about
(current May 2014)
26. Kyaw, A.S.: Irrlicht 1.7 Realtime 3D Engine – Beginner’s Guide. Packt Publishing Ltd.,
Birmingham (2011)
27. Marilou, April 2012. http://doc.anykode.com/frames.html?frmname=topic&frmfile=index.html
(current May 2014)
28. Cyberbotics, Webots Reference Manual, April 2012. http://www.cyberbotics.com/reference.pdf
(current May 2014)
29. Johns, K., Taylor, T.: Microsoft Robotics Developer Studio. Wrox, Indiana (2008)
30. Workspace, Workspace Robot Simulation, WAT Solutions (2012). http://www.workspacelt.com/
(current May 2014)
31. Boeing, A., Braunl, T.: SubSim: an autonomous underwater vehicle simulation package.
In: 3rd Int. Symposium on A. Minirobots for Research and Edut, Fukui, Japan
32. Laue, T., Röfer, T.: SimRobot - development and applications. In: Proceedings of the
International Conference on Simulation, Modeling and Programming for Autonomous
Robots, Venice, Italy (2008)
33. Gazebo, Gazebo. http://gazebosim.org/ (current May 2014)
34. Carpin, S., Lewis, M., Wang, J., Balakirsky, S., Scrapper, C.: USARSim: a robot simulator
for research and education. In: Proceedings of the IEEE International Conference on
Robotics and Automation, Roma, Italy (2007)
35. Hugues, L., Bredeche, N.: Simbad Project Home, May 2011. http://simbad.sourceforge.net/
(current May 2014)
36. Costa, P.: SimTwo - A Realistic Simulator for Robotics, March 2012.
http://paginas.fe.up.pt/~paco/pmwiki/index.php?n=SimTwo.SimTwo (current May 2014)
37. Petry, M., Moreira, A.P., Reis, L.P., Rossetti, R.: Intelligent wheelchair simulation: re-
quirements and architectural issues. In: 11th International Conference on Mobile Robotics
and Competitions, Lisbon, pp. 102–107 (2011)
38. Faria, B.M., Vasconcelos, S., Reis, L.P., Lau, N.: Evaluation of Distinct Input Methods of
an Intelligent Wheelchair in Simulated and Real Environments: A Performance. The Offi-
cial Journal of RESNA (Rehabilitation Engineering and Assistive Technology Society of
North America) 25(2), 88–98 (2013). USA
39. Namee, B.M., Beaney, D., Dong, Q.: Motion in Augmented Reality Games: An engine for
creating plausible physical interactions in augmented reality games. International Journal
of Computer Games Technology (2010)
40. Braga, R., Petry, M., Moreira, A., Reis, L.P.: A development platform for intelligent
wheelchairs for disabled people. In: 5th Int. Conf Informatics in Control, Automation and
Robotics, vol. 1, pp. 115–121 (2008)
41. Reis, L.P., Braga, R.A., Sousa, M., Moreira, A.P.: IntellWheels MMI: a flexible interface
for an intelligent wheelchair. In: Baltes, J., Lagoudakis, M.G., Naruse, T., Ghidary, S.S.
(eds.) RoboCup 2009. LNCS, vol. 5949, pp. 296–307. Springer, Heidelberg (2010)
42. Braga, R., Petry, M., Moreira, A., Reis, L.P.: Platform for intelligent wheelchairs using
multi-level control and probabilistic motion model. In: 8th Portuguese Conf. Automatic
Control, pp. 833–838 (2008)
43. Braga, R.A., Malheiro, P., Reis, L.P.: Development of a realistic simulator for robotic
intelligent wheelchairs in a hospital environment. In: Baltes, J., Lagoudakis, M.G.,
Naruse, T., Ghidary, S.S. (eds.) RoboCup 2009. LNCS, vol. 5949, pp. 23–34.
Springer, Heidelberg (2010)
44. Murdock, K.L.: 3ds Max 2011 Bible. John Wiley & Sons, Indianapolis (2011)
45. Brooke, J.: SUS: A quick and dirty usability scale, in Usability evaluation in industry,
pp. 189–194. Taylor and Francis, London (1996)
46. Rosenbaum, P., Paneth, N., Leviton, A., Goldstein, M., Bax, M., Damiano, D., Dan, B.,
Jacobsson, B.: A report: the definition and classification of cerebral palsy April 2006.
Developmental Medicine & Child Neurology - Supplement 49(6), 8–14 (2007)
A Skill-Based Architecture for Pick
and Place Manipulation Tasks
1 Introduction
The motivation of this challenge is to bring a mobile robot onto the shop floor for
dexterous manipulation and logistics carrying tasks. Enabling an autonomous robot to
operate in an unstructured environment and establishing a safe and effective
human-robot interaction are two of the main research issues to be addressed.
The first stage consists of a sequence of tasks to be performed in a simulated
environment, with increasing difficulty in the problem to be solved. The Simulation
Environment consists of a Light-Weight Robot (LWR) with a two-jaw gripper, mounted on
moving XY axes on a table top, and a fixed mast with a Pan & Tilt (PT) actuator.
Additionally, a vision system made of a pair of RGB and depth cameras is installed on
the PT, and another on the Tool Center Point (TCP). The objects to be manipulated
assume a basic shape, i.e. a cylinder or a box, or a compound of basic shapes. An
overview is depicted in Fig. 1.
All tasks include a Pick & Place (P&P) scenario where objects (e.g. Fig. 2) have to be
picked from unknown locations and placed on target locations. The actions required to
accomplish a task include: perception, to locate the objects; manipulation, to pick
and place them; and planning, to move the arm to the target locations. To demonstrate
these problems, four different tasks are considered.
P&P. The goal is to pick all objects in the working space and place them in the proper
location, without any particular order. The task contains three objects of different
shape and color on the table (e.g. Figures 2b, 2c and 2d). The poses of the objects in
the environment are unknown, but their properties (color and shape composition) and
the corresponding place zones are given. The LWR base cannot move. Scoring is achieved
by picking an object and placing it in the correct zone.
P&P with Significant Errors and Noise. This is the same task as P&P, but with the
difference that there are significant errors in the robot precision and also
significant noise in all sensors. This implies operating the LWR in an imprecise and
uncalibrated environment.
Mobile P&P with Typical Errors and Noise. This is based on the P&P task, but with
typical calibration errors and sensor noise. The LWR can now use the XY axes to move.
This task introduces the concept of mobility, meaning that pick and place positions
may not be in range of the LWR when the additional axes are not considered.
Fig. 2. Examples of objects that appear in the simulated environment. The puzzle parts
are only examples of the set of valid shapes. Objects may vary in scale and color.
(Figure: block diagram of the proposed architecture, showing the Simulation
Environment, an Interface Node containing a Sensor Interface and an Effector
Interface, a Sensory Data Adapter, the Skills, the Solver and the Agent.)
4 Agent Design
The EuRoC project exposes a set of tasks with perception and manipulation problems to
be completed. To solve these tasks we need a proper agent, capable of perceiving the
environment through sensors and acting upon that environment using actuators [8]. To
design such an agent we have to analyze the properties of the environment, which in
this case is a well-defined Simulated Environment, in order to develop a solution as
generic as possible, capable of handling all tasks. Our proposal is summarized in
algorithm form in Fig. 4.
The set O of objects to be manipulated in each task, including their properties and
place zones, is known in advance. The algorithm starts by determining the order in
which the objects have to be manipulated. The order is restricted by a directed
acyclic graph (DAG), which represents a dependency graph between objects in terms of
order of manipulation. Leaves represent objects that need to be handled first. A dummy
object λ is added to represent the graph root.
Once an object is manipulated, it is removed from the graph. Thus, at any moment in
the task execution, the leaves represent the objects that can be manipulated. When the
dummy object is the only one left in the graph, the task is completed.
G ← BuildOrderGraph(O), S ← buildSearchSpace(G)
while leaves(G) ≠ {λ} do            ▷ λ is the dummy root
    L ← leaves(G)
    obj ← detectObject(L)
    while obj is not found do
        s ← next(S)
        move(s)
        obj ← detectObject(L)
    end while
    focusObject(obj)
    plan ← MakePlan(obj)
    success ← execute(plan)
    if success then
        removeLeaf(G, obj)
    end if
end while
move the TCP and place the object properly in the target position. These actions depend on an a priori calculation of the pick and place poses. The way the plan is calculated depends on the task being solved and on the disposition of the object and its target position. For instance, the target position may not be reachable from the top, in which case the object has to be picked from a different direction. If the execution of the plan succeeds, the leaf corresponding to the processed object is removed from the graph and the algorithm moves on to the next iteration.
In this design only two procedures are task dependent, BuildOrderGraph and MakePlan. This fosters the reuse of software code and facilitates the task-solving job, since the developer only has to focus on the creation of a plan. To aid the definition of a plan, a set of skills was defined and implemented.
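To make this division of responsibilities concrete, the following Python sketch shows how a task-independent control loop could delegate to task-specific implementations of the two procedures; all class and function names here are illustrative assumptions, not the authors' actual code.

# Illustrative sketch: only build_order_graph and make_plan are task specific,
# mirroring the two task-dependent procedures of the architecture.
def solve_task(objects, build_order_graph, make_plan, robot, perception, search_poses):
    graph = build_order_graph(objects)      # DAG with a dummy root object (lambda)
    poses = iter(search_poses)              # pre-computed search poses S
    while not graph.only_dummy_left():
        candidates = graph.leaves()         # objects with no pending dependency
        obj = perception.detect(candidates)
        while obj is None:                  # move the TCP until a candidate is seen
            robot.move(next(poses))
            obj = perception.detect(candidates)
        perception.focus(obj)               # refine the object pose estimate
        plan = make_plan(obj)               # task-specific pick/place/push plan
        if robot.execute(plan):             # on success, unlock dependent objects
            graph.remove_leaf(obj)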
Depth and RGB Integration. Before any processing takes place it is necessary to match every depth value to the corresponding RGB value [3]. The depth and the RGB images come from different cameras separated by an offset, thus they do not overlap correctly. To solve this issue, for each depth value in the depth image, the corresponding 3D position is calculated and then re-projected onto the image plane of the RGB camera. The registered depth is then transformed into an organized point cloud in the coordinate frame of the LWR.
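As an illustration of this registration step, the sketch below back-projects each depth pixel with the depth camera intrinsics, transforms it into the RGB camera frame and re-projects it with the RGB intrinsics; the calibration matrices K_d, K_rgb and the extrinsic (R, t) are assumed to be known, and the loop-based formulation favours clarity over speed.

import numpy as np

def register_depth(depth, K_d, K_rgb, R, t):
    # depth: H x W array of depth values (meters) from the depth camera.
    h, w = depth.shape
    registered = np.full((h, w), np.nan)
    fx, fy, cx, cy = K_d[0, 0], K_d[1, 1], K_d[0, 2], K_d[1, 2]
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if not np.isfinite(z) or z <= 0:
                continue
            p_d = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])  # 3D point, depth frame
            p_c = R @ p_d + t                                          # 3D point, RGB frame
            if p_c[2] <= 0:
                continue
            uv = K_rgb @ (p_c / p_c[2])                                # re-projection
            u_r, v_r = int(round(uv[0])), int(round(uv[1]))
            if 0 <= u_r < w and 0 <= v_r < h:
                registered[v_r, u_r] = p_c[2]                          # registered depth value
    return registered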
Height Filter. The purpose of the Height Filter is to reduce the search space for objects of interest, using the fact that they lie on top of a table (e.g. Fig. 6a). A filter is applied to the point cloud where positions with a z value below a threshold are set to not-a-number (NaN), identifying a non-searchable position. The output of this block is a mask, defined by the filtered point cloud, that identifies the searchable areas in the RGB image (Fig. 6b).
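A minimal sketch of this filter, assuming the registered, organized point cloud is an H x W x 3 array expressed in the LWR frame and that the table-top height z_min is known:

import numpy as np

def height_filter(cloud, z_min):
    filtered = cloud.copy()
    below = ~(filtered[:, :, 2] > z_min)            # z below threshold (or already NaN)
    filtered[below] = np.nan                        # mark non-searchable positions
    mask = (~below).astype(np.uint8) * 255          # searchable areas of the RGB image
    return filtered, mask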
Blob Extraction. At this stage, it is expected that the input mask defines one or more areas of undefined shape. More than one blob can occur if two or more objects share the same color. However, we are only interested in one object, thus only the blob with the biggest area is considered. This is a safe action because different objects have different shapes and can be disambiguated by matching the shape of the blob contours with the correct object. To extract a blob, the contours of the segmented areas are calculated by a contour detection function [11]. The output mask is a set of points that delimits the area of the blob.
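The contour-based extraction can be sketched with OpenCV as follows (assuming OpenCV 4, where findContours returns the contours and the hierarchy); only the largest-area blob is kept, as described above.

import cv2

def largest_blob(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                                  # nothing segmented in the mask
    return max(contours, key=cv2.contourArea)        # contour delimiting the biggest blob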
Pose Estimation. After the identification of the object blob in the image we calculate its position (x, y, z) and orientation θ. We start by calculating the rotated bounding box that best fits the selected blob (e.g. Fig. 6d). The center of the rectangle in the image provides enough information to extract the position, because we have a registered point cloud where each (u, v) coordinate from the RGB image has a corresponding (x, y, z). Furthermore, from the rotation of the rectangle we can extract the orientation of the object relative to the LWR.
Morphologic Extraction. The goal is to extract a useful shape from the blob (e.g. Fig. 6e). All objects are treated as polygons, even the cylinder, which is a circle when viewed from above – a circle can be approximated by a polygon. The shape is obtained by a function that approximates the blob to a polygon [2]. The polygon shape can then be used to disambiguate an object detection.
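The sketch below illustrates both of these steps with OpenCV primitives: the rotated bounding box provides the image-plane center and orientation, the registered point cloud maps that center to a 3D position, and a polygon approximation of the contour yields the shape used for disambiguation. The approximation tolerance is an illustrative value.

import cv2
import numpy as np

def estimate_pose_and_shape(blob_contour, cloud):
    (cx, cy), (bw, bh), angle = cv2.minAreaRect(blob_contour)   # rotated bounding box
    x, y, z = cloud[int(round(cy)), int(round(cx))]             # 3D position from registered cloud
    eps = 0.02 * cv2.arcLength(blob_contour, True)              # approximation tolerance
    polygon = cv2.approxPolyDP(blob_contour, eps, True)         # blob approximated as a polygon
    return (x, y, z, np.deg2rad(angle)), polygon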
bound to one actuator. For example, moving the LWR and the XY axis at the
same time can be considered a single skill.
To solve any task we need a set of skills that is sufficient for the tasks ahead, while at the same time keeping their number as low as possible. The idea is to create skills that can be used as atoms by more complex skills. To solve a task we need the capability to move the LWR so that its TCP is in the desired position, e.g. a search pose to perceive the environment. Objects need to be picked and placed, so these actions are also necessary. It may not always be possible to position an object with a place action; for example, the Gripper may hit a wall when placing an object right next to it. This can be solved by pushing the object to its rightful place.
Simple Move. This skill uses only the LWR actuator. Its objective is to put the LWR's TCP at the requested pose. This action first requires the calculation of the inverse kinematics, and then the control of the joints to the desired position.
Move XY Axis. The reach of the LWR can be increased by using the XY axis to move it. This skill only requires the x, y positions to move to. The values are directly applied to the joints.
Pick Object. The LWR and Gripper are used together to provide this skill. It expects as input a pick pose that must correspond to the tip of the Gripper end-effector. The skill performs the necessary calculation to transform the input pose into the corresponding TCP pose to move the LWR. Then it takes the proper actions to safely grab the object.
Place Object. This skill uses the LWR and Gripper to place the object at the requested pose. The input pose is adjusted to the TCP, and then a safe position is calculated to ensure that the object hovers above the floor before the actual placement. The object is then lowered at a reduced velocity to prevent any undesirable collisions due to higher accelerations.
In these tasks, the order by which the objects are picked and placed is not relevant, so graph G does not contain any order dependency. Thus, once an object is detected it can be picked and placed in its target location.
The agent assumes all objects are pickable. This implies that there is a part of the object the gripper is able to grab. Since the shapes of the objects are known in advance, the grab pose, defined in the object's frame of reference, can be pre-determined. The gripper's grabbing pose can be obtained by merging this information with the rotated bounding box (RBB) that encloses the object, estimated by the object detection module. For basic-shape objects, the grab pose coincides with the center of the RBB. For compound objects, the grab pose can be obtained from the center of the RBB by adding half of its width and subtracting half of its height. This approach worked well even in the presence of significant noise.
The preferred way to pick an object is to apply a top-down trajectory to the gripper (Fig. 7a). However, this may not be possible due to constraints on the freedom of motion of the LWR. In such cases, an angular adjustment is applied to make the pick operation possible (Fig. 7c). This drawback can be mitigated when the LWR's movement along the XY axes is available, since the LWR can be moved to a better position to pick the object of interest.
After an object is picked it must be placed in its target location. The most efficient move is to place it keeping the gripper orientation. But, again, this may not be possible. When this happens, a fake placement is done instead: an appropriate location is chosen, the object is placed there, and a failure signal is returned so that the object is picked again. As before, the availability of robot movement along the XY axes can avoid this drawback.
The assembly of the puzzle cannot be done by inserting the parts from above. The margin of error for the positioning is very thin, making it difficult to use an approach where the parts are placed from the top. Instead, using the puzzle fixture boundary and the parts already in place, a pushing approach can be used. To find the right order, we search all permutations of the objects' insertion order into the puzzle. A valid solution must comply with the following rule: the pushing of an object, horizontally or vertically, is not prevented by any other object. After a solution is found, the graph G is built.
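A brute-force sketch of this search, where can_push(part, placed) is a hypothetical geometric predicate that checks whether a part can be pushed horizontally or vertically given the fixture and the parts already placed:

from itertools import permutations

def find_insertion_order(parts, can_push):
    for order in permutations(parts):
        placed = []
        feasible = True
        for part in order:
            if not can_push(part, placed):   # pushing blocked by another part or the fixture
                feasible = False
                break
            placed.append(part)
        if feasible:
            return list(order)               # a valid order, used to build graph G
    return None

For the small number of puzzle parts involved, exhaustively enumerating the permutations remains cheap.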
The pick position is selected from one of the convex terminations of the part that has a width smaller than the Gripper's maximum width. The parts of the puzzle are always assembled from cubes with a width smaller than the gripper's maximum width, hence the pick position must consider a convex termination with the width of a cube. To calculate the pick position, the vertices of the polygon shape are used. The vertices v are properly ordered and ring-accessible by vi. The main idea is to search for an edge ei = (vi, vi+1) that is part of the frame of reference. An edge ei is a candidate when:
‖ei‖ ≈ ℓ  ∧  ∠(ei−1, ei) ≈ ∠(ei, ei+1) ≈ π/2 ,
where ℓ is the length of a cube edge. Approximate values are considered to handle
errors. Afterwards, all candidates go through a final validation. To recognize the object's frame of reference, which is needed to correctly place the object, we assume that the vertex vi is the origin; then, the numbers of blocks to the left, right, top and bottom are compared with the original object shape definition. Once an edge is selected, the pick position is given by the sum of the normalized edges ei and ei+1, and the orientation is given by the normal direction of ei. The XY axis is available for this task, thus any object can be picked in the preferred way.
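A sketch of the candidate test, taking the polygon vertices in order and accepting an edge whose length is close to the cube edge length ℓ and whose angles with the previous and next edges are close to π/2; the tolerances are illustrative, and the final validation against the object shape definition is omitted.

import numpy as np

def candidate_edges(vertices, cube_edge, len_tol=0.2, ang_tol=0.3):
    v = np.asarray(vertices, dtype=float)
    n = len(v)
    e = [v[(i + 1) % n] - v[i] for i in range(n)]        # ring-accessible edges e_i
    def angle(a, b):
        c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(c, -1.0, 1.0))
    candidates = []
    for i in range(n):
        if (abs(np.linalg.norm(e[i]) - cube_edge) < len_tol * cube_edge
                and abs(angle(e[i - 1], e[i]) - np.pi / 2) < ang_tol
                and abs(angle(e[i], e[(i + 1) % n]) - np.pi / 2) < ang_tol):
            candidates.append(i)                         # edge (v_i, v_i+1) is a candidate
    return candidates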
The next step is to define the set of actions to position the puzzle part in its rightful place. An offset is added to the final position p of the puzzle part in the puzzle fixture, po = p + offset, to prevent an overlap of parts. The offset takes into account how the parts are connected. After the part is placed at po, it has to be pushed towards p. The first push is towards the closest fixed axis and the next is against the supporting piece – or axis. A single pushing sequence may not be enough: the piece may get stuck for some reason, and therefore the sequence must be repeated. Detecting a stuck piece is simple, since the TCP reports the applied force, and when a force above a threshold is detected a termination is triggered.
6 Related Work
ROS has become the robot middleware of choice for researchers and industry. For example, MoveIt! [1] is a mobile manipulation software suite suitable for research and industry. In addition to manipulation, the creation of behaviors is also a topic of interest, e.g. ROSCo [5]. Task-level programming of robots is an important exercise for industrial applications. The authors of SkiROS [7] propose a paradigm based on a hierarchy of movement primitives, skills and planning. For P&P tasks, the authors of [9] propose a manipulation planner under continuous grasps and placements, while a decomposition of the tasks is proposed in [4].
7 Conclusion
The tasks to be accomplished were different. However, while studying the prob-
lems to be solved, we identified common subtasks. In order to take advantage
of that fact, a general system architecture to solve all tasks was developed.
Additionally, the architecture works seamlessly in the ROS infrastructure. This
solution allowed our team to achieve the 5th rank.
References
1. Chitta, S., Sucan, I., Cousins, S.: Moveit! [ros topics]. IEEE Robotics Automation
Magazine 19(1), 18–19 (2012)
2. Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points
required to represent a digitized line or its caricature. Cartographica: The Inter-
national Journal for Geographic Information and Geovisualization (1973)
3. Henry, P., Krainin, M., Herbst, E., Ren, X., Fox, D.: RGB-D mapping: Using
Kinect-style depth cameras for dense 3D modeling of indoor environments. I. J.
Robotic Res. 31(5), 647–663 (2012)
4. Lozano-Pérez, T., Jones, J.L., Mazer, E., O’Donnell, P.A.: Task-level planning of
pick-and-place robot motions. IEEE Computer 22(3), 21–29 (1989)
5. Nguyen, H., Ciocarlie, M., Hsiao, K., Kemp, C.: Ros commander (rosco): Behavior
creation for home robots. In: ICRA, pp. 467–474 (May 2013)
6. Quigley, M., Gerkey, M., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E.,
Wheeler, R., Ng, A.: ROS: An open-source robot operating system. In: ICRA
Workshop on Open Source Software, Kobe, Japan (May 2009)
7. Rovida, F., Chrysostomou, D., Schou, C., Bøgh, S., Madsen, O., Krüger, V., Ander-
sen, R.S., Pedersen, M.R., Grossmann, B., Damgaard, J.S.: Skiros: A four tiered
architecture for task-level programming of industrial mobile manipulators. In: 13th
International Conference on Intelligent Autonomous Systems, Padova (July 2013)
8. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn., Pren-
tice Hall (2010)
9. Simeon, T., Cortes, J., Sahbani, A., Laumond, J.P.: A manipulation planner for
pick and place operations under continuous grasps and placements. In: Proceedings
of the Robotics and Automation, ICRA 2002, vol. 2, pp. 2022–2027 (2002)
10. Sural, S., Qian, G., Pramanik, S.: Segmentation and histogram generation using
the hsv color space for image retrieval. In: Proceedings of the 2002 International
Conference on Image Processing 2002, vol. 2, pp. II-589–II-592 (2002)
11. Suzuki, S., Abe, K.: Topological structural analysis of digitized binary images by
border following. Computer Vision, Graphics and Image Processing 30(1), 32–46
(1985)
12. Urmson, C., Baker, C.R., Dolan, J.M., Rybski, P.E., Salesky, B., Whittaker, W.,
Ferguson, D., Darms, M.: Autonomous Driving in Traffic: Boss and the Urban
Challenge. AI Magazine 30(2), 17–28 (2009)
Adaptive Behavior of a Biped Robot
Using Dynamic Movement Primitives
Abstract. Over the past few years, several studies have suggested that adaptive
behavior of humanoid robots can arise based on phase resetting embedded in
pattern generators. In this paper, we propose a movement control approach that
provides adaptive behavior by combining the modulation of dynamic movement
primitives (DMP) and interlimb coordination with coupled phase oscillators.
Dynamic movement primitives (DMP) represent a powerful tool for motion
planning based on demonstration examples. This approach is currently used as a
compact policy representation well-suited for robot learning. The main goal is
to demonstrate and evaluate the role of phase resetting based on foot-contact in-
formation in order to increase the tolerance to external perturbations. In particu-
lar, we study the problem of optimal phase shift in a control system influenced
by delays in both sensory information and motor actions. The study is per-
formed using the V-REP simulator, including the adaptation of the humanoid
robot’s gait pattern to irregularities on the ground surface.
1 Introduction
The coordination within or between legs is an important element for legged systems, independently of their size, morphology and number of legs. Evidence from neurophysiology indicates that pattern generators in the spinal cord contribute to rhythmic movement behaviors and that sensory feedback modulates proper coordination dynamics [1], [2]. In this context, several authors have studied the role of phase shift and rhythm resetting. Phase resetting is a common strategy known to have several advantages in legged locomotion, namely by endowing the system with the capability to switch among different gait patterns or to restore coordinated patterns in the face of
attractor. The approach was originally proposed by Ijspeert et al. [8] and, since then,
other mathematical variants have been proposed [9].
In the case of rhythmic movement, the dynamical system is defined in the form of
a linear second order differential equation that defines the convergence to the goal g
(baseline, offset or center of oscillation) with an added nonlinear forcing term f that
defines the actual shape of the encoded trajectory. This model can be written in first-
order notation as follows:
τ ż = αz [βz (g − y) − z] + f
τ ẏ = z ,    (1)
where τ is a time constant and the parameters αz, βz > 0 are selected and kept fixed such that the system converges to the oscillations given by f around the goal g in a critically damped manner. The forcing function f (nonlinear term) can be defined as a normalized combination of fixed basis functions:
f(φ) = ( Σ_{i=1}^{N} ψi ωi / Σ_{i=1}^{N} ψi ) r ,    (2)
where ωi are adjustable weights, r characterizes the amplitude of the oscillator, ψi are von Mises basis functions, N is the number of periodic kernel functions, hi > 0 are the widths of the kernels and ci are equally spaced values from 0 to 2π in N steps (N, hi and ci are chosen a priori and kept fixed). The phase variable φ bypasses the explicit dependency on time by introducing periodicity through a rhythmic canonical system. This is a simple dynamical system that, in our case, is defined by a phase oscillator:
τ φ̇ = Ω ,    (3)
where Ω is the frequency of the canonical system. In short, there are two main components in this approach: one providing the shape of the trajectory patterns (the transformation system) and the other providing the synchronized timing signals (the canonical system). In order to encode a desired demonstration trajectory ydemo as a DMP, the weight vector has to be learned with, for example, statistical learning techniques such as locally weighted regression (LWR), given its suitability for online robot learning.
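A compact sketch of a rhythmic DMP following Eqs. (1)-(3) is given below. The exponential-of-cosine kernel and the per-kernel weighted least-squares fit are standard choices for the von Mises basis and LWR, not taken verbatim from the paper, and all parameter values are illustrative.

import numpy as np

class RhythmicDMP:
    def __init__(self, n_kernels=25, h=2.5, alpha_z=25.0, beta_z=6.25, tau=1.0, omega=2*np.pi):
        self.c = np.linspace(0.0, 2*np.pi, n_kernels, endpoint=False)  # kernel centers c_i
        self.h, self.alpha_z, self.beta_z = h, alpha_z, beta_z
        self.tau, self.omega = tau, omega
        self.w = np.zeros(n_kernels)                                   # adjustable weights
        self.g, self.r = 0.0, 1.0                                      # baseline and amplitude

    def _psi(self, phi):
        return np.exp(self.h * (np.cos(phi - self.c) - 1.0))           # von-Mises-like kernels

    def forcing(self, phi):                                            # Eq. (2)
        psi = self._psi(phi)
        return self.r * np.dot(psi, self.w) / (psi.sum() + 1e-10)

    def fit(self, y_demo, dt):                                         # LWR fit of the weights
        yd = np.gradient(y_demo, dt)
        ydd = np.gradient(yd, dt)
        self.g = float(np.mean(y_demo))                                # offset / center of oscillation
        phi = (self.omega / self.tau) * dt * np.arange(len(y_demo))
        # Target forcing from Eq. (1): f_d = tau^2*ydd - alpha_z*(beta_z*(g - y) - tau*yd)
        f_d = self.tau**2 * ydd - self.alpha_z * (self.beta_z * (self.g - y_demo) - self.tau * yd)
        psi = np.stack([self._psi(p) for p in phi])                    # T x N kernel activations
        self.w = psi.T @ f_d / (self.r * psi.sum(axis=0) + 1e-10)

    def step(self, y, z, phi, dt):                                     # Euler integration of Eqs. (1) and (3)
        zdot = (self.alpha_z * (self.beta_z * (self.g - y) - z) + self.forcing(phi)) / self.tau
        return y + (z / self.tau) * dt, z + zdot * dt, phi + (self.omega / self.tau) * dt

Modulating r, Ω and g at run time then changes the amplitude, frequency and baseline of the generated trajectory, which is how the task variables discussed below can be mapped onto the DMP parameters.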
system achieves the desired attractor dynamics for each individual DOF and the respective forcing terms modulate the shape of the produced trajectories.
The adaptation of learned motion primitives to new situations becomes difficult when the demonstrated trajectories are available in joint space. The problem occurs because, in general, a change in the primitive's parameters does not correspond to a meaningful effect on the given task. With this in mind, the proposed solution is to learn the DMP in task space and relate their parameters to task variables. To concretely formulate the dynamical model, a task coordinate system is fixed to the hip section, which serves as a reference frame where tasks are presented. The y-axis is aligned with the direction of movement, the z-axis is oriented downwards and the x-axis points towards the lateral side to form a direct system.
In line with this, a total of six DMP are learned to match the Cartesian trajectories of the lower extremities of both feet (end-effectors), using a single demonstration. It is worth noting that a DMP contains one independent dynamical system per dimension of the space in which it is learned. At the end, the outputs of these DMP are converted, through an inverse kinematics algorithm, to the desired joint trajectories used as reference input to a low-level feedback controller. Fig. 1 shows the close match between the reference signals (solid lines) and the learned ones (dashed lines) as defined in the reference frame. The gray shaded regions show the phases of double support.
Once the complete desired movement {y, ẏ, ÿ} is learned (i.e., encoded as a DMP), new trajectories with similar characteristics can be easily generated. In this work, the DMP parameters resulting from the previous formulation (i.e., amplitude, frequency and offset) are directly related to task variables, such as step length, hip height, foot clearance and forward velocity. For example, the frequency is used to speed up or slow down the motion, and the amplitudes of the DMP associated with the y- and z-coordinates are used to modify the step length and the hip height (or foot clearance) of the support leg (or swing leg), respectively.
Fig. 1. Result of learning the single demonstration: the task is specified by the x-, y- and z-coordinates of the robot's foot in the reference frame. Reference signals (solid lines) and trained signals (dashed lines) are superimposed. Gray shaded regions show double-support phases.
Fig. 2. Time courses of the rhythmic dynamical system with continuous learning (solid line) and the input reference signal (dashed line). The vertical dashed lines in the plot indicate, from left to right, the instants in which the learned signal doubles the amplitude r (at t = 4 s), the reference signal is changed (at t = 8 s) and the baseline g is modulated (at t = 12.5 s).
Fig. 3. View of the movement path of the robot's COG projected on the ground and the corresponding turning curve. The black box represents an obstacle placed on the path.
As a result, the dynamics of the phase oscillators in (3), for the left and the right
leg, are modified according to:
τ φ̇left = Ω − Kφ sin(φleft − φright − π) − (φleft − φ^contact) δ(t − t_left^contact − Δt) ,    (4)
τ φ̇right = Ω − Kφ sin(φright − φleft − π) − (φright − φ^contact) δ(t − t_right^contact − Δt)
where Kφ is the coupling strength parameter (Kφ > 0), φ^contact is the phase value to be reset when the foot touches the ground, δ(⋅) is the Dirac delta function, t_i^contact (i = left, right) is the time when the foot touches the ground, and Δt is a factor used to study the influence of delays in both sensory information and motor control.
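An illustrative discrete-time realization of Eq. (4) is sketched below: the sin term keeps the two oscillators in anti-phase, and the Dirac term contributes a jump of −(φ − φ^contact)/τ applied Δt after the detected foot contact (a full reset to φ^contact when τ = 1). The contact bookkeeping and parameter values are assumptions made for the sketch.

import numpy as np

def step_leg_phases(phi_l, phi_r, t, dt, omega, K_phi, tau,
                    phi_contact, t_contact_l, t_contact_r, delta_t):
    # Continuous part of Eq. (4), integrated with a simple Euler step.
    dphi_l = (omega - K_phi * np.sin(phi_l - phi_r - np.pi)) / tau
    dphi_r = (omega - K_phi * np.sin(phi_r - phi_l - np.pi)) / tau
    phi_l, phi_r = phi_l + dphi_l * dt, phi_r + dphi_r * dt
    # Dirac term: a phase jump applied delta_t after each foot-contact instant.
    if t_contact_l is not None and abs(t - (t_contact_l + delta_t)) < dt / 2:
        phi_l -= (phi_l - phi_contact) / tau
    if t_contact_r is not None and abs(t - (t_contact_r + delta_t)) < dt / 2:
        phi_r -= (phi_r - phi_contact) / tau
    return phi_l, phi_r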
4 Numerical Simulations
Fig. 4. Additional tolerance to perturbation forces applied at different instants of the movement
cycle when using phase resetting.
Fig. 5. Velocity of the COG in the direction of motion with a perturbation force applied at 11.5 s, without and with the use of phase resetting; the time course of the phase difference between the oscillators is represented on a different vertical axis.
irregularities up to 0.5 cm in height. Here, the proposed strategy is to change the phase of the canonical system to the value corresponding to the point of ground contact in the normal signal generation (when there is no early contact with the ground).
In the second experiment, it is examined how the phase reset of the canonical oscillator changes the DMP so that the robot can overcome a set of irregularities that resemble the steps of a small staircase. These consist of two consecutive steps up followed by two steps down, each one 2 cm high. Besides this, the robot is also supposed to receive visual information regarding the stairs' location and height in order to modify the basic gait pattern (foot clearance and step size). Fig. 7 shows the path the robot has to go through and the sequence of captured images of the robot stepping on the first step, followed by the second step and, after a few steps on this one, the first step down followed by the final step down that takes the robot back to ground level. As in the previous example, a phase reset is applied as soon as the robot senses that the foot has hit the ground sooner than expected.
Fig. 6. Snapshots of the robot walking on a level surface when it finds a small step 2 cm high
that disturbs its balance (top: without phase resetting; bottom: with phase resetting). Numerical
simulations performed in V-REP [10].
Fig. 7. Top: full view of the path the robot has to go through; center and bottom: sequence of
the robot walking through the path. Numerical simulations performed in V-REP [10] (a video
of this experiment is available at: http://www.youtube.com/watch?v=WjBq27hJAJE).
5 Conclusions
This paper presents a study in which online modulation of the DMP parameters and interlimb coordination through phase coupling provide adaptation of biped locomotion with improved tolerance to external perturbations. By using the DMP in task space, new tasks are easily accomplished by modifying simple DMP parameters that directly relate to the task, such as step length, velocity and foot clearance. By introducing coupling between limbs and using phase reset, we have shown that adaptation to irregularities on the terrain is successful. The phase resetting methodology also allowed increasing the tolerance to external perturbations, such as forces that push or pull the robot in the direction of the movement. Future work will address problems such as the role of phase resetting and of DMP parameter changes under sudden changes of the trunk mass, stepping up stairs, and climbing up and down ramps. Demonstrations of human behavior will be collected using a VICON system and used to train the DMP on new tasks.
Acknowledgements. This work is partially funded by FEDER through the Operational Pro-
gram Competitiveness Factors - COMPETE and by National Funds through FCT - Foundation
for Science and Technology in the context of the project FCOMP-01-0124-FEDER-022682
(FCT reference Pest-C/EEI/UI0127/ 2011).
References
1. Hultborn, H., Nielsen, J.: Spinal control of locomotion – from cat to man. Acta Physiologica
189(2), 111–121 (2007)
2. Grillner, S.: Locomotion in vertebrates: central mechanisms and reflex interaction.
Physiological Reviews 55(2), 247–304 (1975)
3. Aoi, S., Ogihara, N., Funato, T., Sugimoto, Y., Tsuchiya, K.: Evaluating the functional roles of
phase resetting in generation of adaptive human biped walking with a physiologically based
model of the spinal pattern generator. Biological Cybernetics 102, 373–387 (2010)
4. Yamasaki, T., Nomura, T., Sato, S.: Possible functional roles of phase resetting during
walking. Biological Cybernetics 88, 468–496 (2003)
5. Aoi, S., Tsuchiya, K.: Locomotion control of a biped robot using nonlinear oscillators.
Autonomous Robots 19(3), 219–232 (2005)
6. Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.: A framework
for learning biped locomotion with dynamical movement primitives. International Journal
of Humanoid Robots (2004)
7. Argall, B., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from dem-
onstration. Robotics and Autonomous Systems 57(5), 469–483 (2009)
8. Ijspeert, A., Nakanishi, J., Schaal, S.: Movement imitation with nonlinear dynamical sys-
tems in humanoid robots. In: Proceedings of the 2002 IEEE International Conference on
Robotics and Automation, pp. 1398–1403 (2002)
9. Ijspeert, A., Nakanishi, J., Hoffmann, H., Pastor, P., Schaal, S.: Dynamical movement primi-
tives: learning attractor models for motor behaviors. Neural Computation 25, 328–373 (2013)
10. Rohmer, E., Singh, S., Freese, M.: V-REP: A versatile and scalable robot simulation
framework. In: IEEE/RSJ International Conference on Intelligent Robots and Systems,
pp. 1321–1326 (2013)
Probabilistic Constraints for Robot Localization
1 Introduction
Uncertainty plays a major role in modeling most real-world continuous systems
and, in particular, robotic systems. A reliable framework for decision support
must provide an expressive mathematical model for a sound integration of the
system and uncertainty.
Stochastic approaches associate a probabilistic model to the problem and rea-
son on approximations of the most likely scenarios. In highly nonlinear problems
such approximations may miss relevant satisfactory scenarios leading to erro-
neous decisions. In contrast, constraint programming (CP) approaches reason
on safe enclosures of all consistent scenarios. Model-based reasoning and what-if
scenarios are supported through safe constraint techniques, which only eliminate
scenarios that do not satisfy the model constraints. However, safe reasoning based exclusively on consistency may be insufficient to reduce the space of possibilities in settings with large uncertainty.
This paper shows how probabilistic constraints can be used for solving global
localization problems providing a probabilistic characterization of the robot posi-
tions (consistent with the environment) given the uncertainty on the sensor
measurements.
2 Probabilistic Robotics
Probabilistic robotics [1] is a generic approach for dealing with hard robotic problems that relies on probability theory to reason with uncertainty in robot perception and action. The idea is to model uncertainty explicitly, representing information by probability distributions over the space of all possible hypotheses instead of relying on best estimates.
Probabilistic approaches are typically more robust in the face of sensor limitations and noise, and often scale much better to unstructured environments. However, the required algorithms are usually less efficient when compared with non-probabilistic algorithms, since entire probability densities are considered instead of best estimates. Moreover, the computation of probability densities requires working exclusively with parametric distributions or discretizing the probability space representation.
In global localization problems, a robot is placed somewhere in the environment and has to localize itself from local sensor data. The probabilistic paradigm maintains, over time, the robot's location estimate, which is represented by a probability density function over the space of all locations. Such an estimate is updated whenever new information is gathered from the sensors, taking into account its underlying uncertainty.
A generic algorithm known as the Bayes filter [2] is used for probability estimation. The Bayes filter is a recursive algorithm that computes a probability distribution at a given moment from the distribution at the previous moment, according to the newly gathered information. Two major strategies are usually adopted for the implementation of Bayes filters in continuous domains: Gaussian filters and nonparametric filters.
Gaussian techniques share the idea that probabilities are represented by multivariate normal distributions. Among these techniques the most popular are (Extended) Kalman Filters [3,4], which are computationally efficient but inadequate for problems where distributions are multimodal and subject to highly nonlinear constraints.
Nonparametric techniques [5,6] approximate continuous probabilities by a finite number of values. Representatives of these techniques for robot localization problems are the Grid and Monte Carlo Localization algorithms [7,8]. Both techniques do not make any assumptions on the shape of the probability distribution and have the property that the approximation error converges uniformly to zero as the number of values used to represent the probabilistic space goes to infinity. The computational cost is determined by the granularity of the approximation (the number of values considered), which is not easy to tune, depending both on the model constraints and on the underlying uncertainty.
3 Constraint Programming
Continuous constraint programming [9,10] has been widely used to model safe
reasoning in applications where uncertainty on the values of the variables is
modeled by intervals including all their possibilities. A Continuous Constraint
box cover S for the feasible space of the model constraints C (line 1). Function Branch&Prune (see [14] for details) is used with a grid-oriented parametrization, i.e., it splits the boxes in the grid and chooses to process only those boxes that are not yet inside a grid cell, stopping when there are no more eligible boxes. Matrix M is initialized to zero (line 2), as is the normalization factor P, which will contain, in the end, the overall sum of all non-normalized parcels (line 3). For each box B in the cover S (lines 4-7), the index of its corresponding matrix cell is identified (line 5) and its probability is computed by function MCIntegrate (which implements the Monte Carlo method to compute the contribution of B) and assigned to the value in that cell (line 6). The normalization factor is updated (line 7) and used in the end to normalize the computed probabilities (line 9).
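A hedged sketch of this procedure is shown below: each box of the cover contributes a Monte Carlo estimate of its (non-normalized) probability mass to the grid cell it falls in, and the matrix is normalized at the end. The helpers cell_index, box_volume, sample_in_box and pdf are assumed to be provided by the surrounding framework.

import numpy as np

def probability_grid(cover, grid_shape, cell_index, box_volume, sample_in_box, pdf, n_samples=100):
    M = np.zeros(grid_shape)                                       # matrix over the grid cells
    P = 0.0                                                        # overall normalization factor
    for box in cover:                                              # boxes returned by Branch&Prune
        samples = [sample_in_box(box) for _ in range(n_samples)]
        contrib = box_volume(box) * float(np.mean([pdf(s) for s in samples]))  # MC integral over the box
        M[cell_index(box)] += contrib                              # accumulated into its grid cell
        P += contrib
    return M / P if P > 0 else M                                   # normalized probabilities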
object in the direction αi (or δmax if that distance exceeds the maximum ladar range). The difference between the predicted distance and the distance recorded by the ladar is the measurement error, and its square is accumulated over all measurements and used to compute the pdf.
The specialized function that narrows a domain box according to the ladar measurements maintains a set of numerical constraints that can be enforced over the variables of the problem and then calls the generic interval arithmetic narrowing procedure [16]. The numerical constraints may result from each ladar measurement. Firstly, a geometric function is used to determine which of the walls in the map could be seen by a robot positioned in the box with an angle of vision within the range of the robot pose angle plus the relative angle of the ladar measurement. If no wall can be seen, the predicted distance is the maximum ladar range δmax and a constraint is added to enforce that the error between the prediction and the ladar measurement does not exceed a predefined threshold. If it is only possible to see a single wall, an adequate numerical constraint is enforced to restrict the error between the ladar measurement and the predicted distance for a pose x, y, α. Notice that whenever there is the possibility of seeing more than one wall it cannot be decided which constraint to enforce, and the algorithm proceeds without associating any constraint to that ladar measurement.
5 Experimental Results
The probabilistic constraint framework was applied to a set of global localization problems that cover different simulated environments and robot poses and are illustrative of the potential and limitations of the proposed approach. The algorithms were implemented in C++ over the RealPaver constraint solver [16] and the experiments were carried out on an Intel Core i7 CPU at 2.4 GHz.
Our grid approach adopted as reference a grid granularity commonly used in indoor environments [1]: 15 cm for the xy dimensions (10 units represents 1.5 cm), and 4 degrees for the rotational dimension. Based on previous experience with the hybridization of Monte Carlo techniques with constraint propagation, we adopted a small sampling size of N = 100.
In all our experiments, increasing the sampling size did not improve the quality of the results. Similarly, the reference value of 4 degrees for the grid size of the rotational dimension was kept fixed, since coarser grids prevented constraint pruning and finer grids increased the computation time without providing better results. In the following problems, to illustrate the effect of the resolution of the xy grid, we consider, apart from the reference grid size, a 4-times coarser grid (60 cm) and a 4-times finer grid (3.75 cm).
Fig. 1(left) illustrates a problem where, from the given input, the robot loca-
tion can be circumscribed to a unique compact region. The results obtained
with the coarse grid clearly identify a single cell enclosing the simulated robot
location. The obtained enclosure for the heading direction is guaranteed to be
between 40 and 48 degrees (not shown). With the reference and the fine-grained
grids the results were similar. The CPU time was 5s, 10s and 30s for increasing
grid resolutions.
Fig. 1. (above) environment and robot poses with the distance measurements cap-
tured by the robot sensors; (below) computed solutions given the environment and the
measurements.
6 Conclusions
In this paper we propose the application of probabilistic constraint programming
to probabilistic robotics. We show how the approach can be used to support
sound reasoning in global localization problems integrating prior knowledge on
the environment with the uncertainty information gathered by the robot sensors.
Preliminary experiments on a set of simulated problems highlighted the potential
and limitations of the approach.
In the future the authors aim to extend the approach to address kinematic
constraints and their underlying uncertainty. Probabilistic constraint reasoning
References
1. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press (2006)
2. Särkkä, S.: Bayesian filtering and smoothing. Cambridge University Press (2013)
3. Kalman, R.E.: A new approach to linear filtering and prediction problems. ASME
Journal of Basic Engineering (1960)
4. Julier, S.J., Uhlmann, J.K.: Unscented filtering and nonlinear estimation.
Proceedings of the IEEE, 401–422 (2004)
5. Kaplow, R., Atrash, A., Pineau, J.: Variable resolution decomposition for robotic
navigation under a pomdp framework. In: IEEE Robotics and Automation,
pp. 369–376 (2010)
6. Arulampalam, M., Maskell, S., Gordon, N.: A tutorial on particle filters for online
nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Proc. 50, 174–188
(2002)
7. Wang, Y., Wu, D., Seifzadeh, S., Chen, J.: A moving grid cell based mcl algorithm
for mobile robot localization. In: IEEE Robotics and Biomimetics, pp. 2445–2450
(2009)
8. Dellaert, F., Fox, D., Burgard, W., Thrun, S.: Monte carlo localization for mobile
robots. In: IEEE Robotics and Automation, pp. 1322–1328 (1999)
9. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proc. of the 13th IJCAI
(1993)
10. Benhamou, F., McAllester, D., van Hentenryck, P.: CLP(intervals) revisited. In:
ISLP, pp. 124–138. MIT Press (1994)
11. Hentenryck, P.V., Mcallester, D., Kapur, D.: Solving polynomial systems using a
branch and prune approach. SIAM Journal Numerical Analysis 34, 797–827 (1997)
12. Benhamou, F., Goualard, F., Granvilliers, L., Puget, J.F.: Revising hull and box
consistency. In: Procs. of ICLP, pp. 230–244. MIT (1999)
13. Moore, R.: Interval Analysis. Prentice-Hall, Englewood Cliffs (1966)
14. Carvalho, E.: Probabilistic Constraint Reasoning. PhD thesis, FCT/UNL (2012)
15. Hammersley, J., Handscomb, D.: Monte Carlo Methods. Methuen, London (1964)
16. Granvilliers, L., Benhamou, F.: Algorithm 852: Realpaver an interval solver using
constraint satisfaction techniques. ACM Trans. Mathematical Software 32(1),
138–156 (2006)
Detecting Motion Patterns in Dense Flow
Fields: Euclidean Versus Polar Space
1 Introduction
This work studies and compares techniques for motion analysis and segmentation using dense optical flow fields. Motion segmentation is the process of dividing an image into different regions using motion information, in a way that each region presents homogeneous characteristics. Two techniques are considered in this research, Expectation-Maximization (EM) and K-means. The performance and behavior of these techniques are well known in the scientific community, since they have been applied in countless machine learning applications; however, this paper presents their performance for clustering dense motion fields obtained in a realistic robotic application [5]. The optical flow technique [7] is used in this paper because it estimates dense flow fields in a short time, which makes it suitable for robotic applications without specialized computing devices; however, the quality of the obtained flow fields is lower when compared to the most recent methods.
Four different feature spaces are considered in this research: the motion vector
is represented in Cartesian space or Polar space, and the feature can have the
positional information of the image location. Mathematically, this is represented
by the following features: flc = (x̄, ȳ, ū, v̄), flp = (x̄, ȳ, m̄, φ̄), f c = (u, v) and f p = (m, φ), where (x, y) is the image location, (u, v) is the flow vector in Cartesian space and (m, φ) is the flow vector in Polar space (magnitude and angle). In most cases, a normalization is performed [3] due to the different physical meanings of the feature's components: v̄ = (v − mean(v))/σv, where σv is the standard deviation of the component v. As can be noticed, the influence of this normalization on the segmentation procedure is also analyzed.
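The four feature spaces can be sketched as below for a dense flow field stored as an H x W x 2 array of (u, v) vectors; the z-score normalization follows the mean/standard-deviation scheme just described, and the K-means call (scikit-learn) is only one possible choice of baseline clusterer.

import numpy as np
from sklearn.cluster import KMeans

def feature_spaces(flow):
    h, w, _ = flow.shape
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    yy, xx = np.mgrid[0:h, 0:w]
    x, y = xx.ravel().astype(float), yy.ravel().astype(float)
    m, phi = np.hypot(u, v), np.arctan2(v, u)                 # magnitude and angle
    z = lambda a: (a - a.mean()) / (a.std() + 1e-10)          # z-score normalization
    f_c = np.column_stack([u, v])                             # Cartesian, no position
    f_p = np.column_stack([m, phi])                           # Polar, no position
    f_lc = np.column_stack([z(x), z(y), z(u), z(v)])          # Cartesian with image location
    f_lp = np.column_stack([z(x), z(y), z(m), z(phi)])        # Polar with image location
    return f_c, f_p, f_lc, f_lp

# Example: pixel-wise segmentation into two motion models in the Polar space.
# labels = KMeans(n_clusters=2, n_init=10).fit_predict(feature_spaces(flow)[1]).reshape(flow.shape[:2])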
Therefore, the contributions of this article include: a study of motion analysis in dense optical flow fields for practical use on a mobile robot, where the goal is to segment different objects according to their motion coherence; a comparison of the most suitable feature spaces for clustering dense flow fields; and extensive qualitative and quantitative evaluations considering several baseline, pixel-wise clustering techniques, namely K-means and Expectation-Maximization.
2 Related Work
The work in [3] evaluates the performance of several clustering methods, namely K-means, self-tuning spectral clustering and nonlinear dimension reduction. The authors argue that one of the most important factors for clustering dense flow fields is the proper choice of the distance measure. They considered the feature space to be formed by the pixel coordinates and motion vectors, whose values are normalized by taking into consideration the mean and standard deviation of each feature. Results show the difficulty of segmenting dense flow fields, because no technique outperformed the others and, thereby, the choice of the most suitable clustering technique and distance metric must be investigated for a specific context and environment. An accurate segmentation technique that resorts to dense flow fields and uses long-term point trajectories is presented in [1]. By clustering trajectories over time, it is possible to use a metric that measures the distance between these trajectories. The work in [4] proposes a sparse approach for detecting salient regions in the sequences. Feature points are tracked over time in order to pursue saliency detection as a violation of co-visibility. The evaluation of the method shows that it cannot achieve real-time computational performance, since it took 32.6 seconds to process a single sequence. The work in [2] addresses the problem of motion detection and segmentation in dynamic scenes with small camera movements, resorting to a set of moving points. The authors use the Lucas-Kanade optical flow to compute a sparse flow field (features obtained by the Harris corner detector). Afterwards, these points are clustered using a variable-bandwidth mean-shift technique and, finally, the cluster segmentation is conducted using graph cuts.
3 Practical Results
motion analysis in a robotic and surveillance context [7]. The EM and the K-means are used in this research as baselines for segmenting dense optical flow fields, and they were implemented as standard functions. In the first experiment, the assessment was performed using an objective (quantitative) and a subjective (qualitative) evaluation. The objective metric, the F-score [6], provides a quantitative quality evaluation of the clustering results, since it weights the average of precision and recall and reaches its best value at 1. The baseline methods provide a pixel-wise segmentation, and factors such as the computational effort and the quality of the visual clustering are considered. Experiments were performed¹ considering four feature spaces: flc, flp, f c and f p.
The results start by demonstrating the segmentation performed by the EM and K-means in several testing sequences that capture a real (indoor) surveillance scenario. Figures 1(a), 1(b) and 1(c) depict three dense flow fields that were obtained from these sequences. Using the EM and the K-means for clustering the flow field represented in f p resulted in figures 2(d) to 2(f) and 3(d) to 3(f), respectively. Figures 2(a) to 2(c) and 3(a) to 3(c) depict the results for the Cartesian space.
As can be noticed, the segmentation conducted by the EM in f c does not originate a suitable segmentation, because the clusters of people appear larger and have spatially isolated regions that are meaningless (hereafter called clustering noise). This issue is most visible in figure 2(c); however, the same flow field segmented in Polar space originated a result that represents the person's movement more faithfully, since it is less affected by meaningless and isolated regions, see figure 2(f). On the other hand, the visual illustration of the motion segmentation conducted by the K-means in f p is similar to f c. A qualitative analysis of these results is, however, not conclusive; and independently of the feature space that is considered (f p or f c), the result of the K-means is better than that of the EM, since the person's movements are more faithfully depicted. In addition, the clustering noise of the K-means is lower than that of the EM for these two feature spaces. The K-means is simpler than the EM; however, it is a powerful technique to cluster the input dataset.
Fig. 1. Figures of the first row depict dense flow fields that were obtained from the
technique proposed in [7]. The HSV color space is used to represent the direction (color)
and magnitude (saturation) of the flow.
1
The results in this section were obtained using an I3-M350 2.2GHz and manually
annotated images.
Fig. 3. Comparison between the K-means in f c (first row) and K-means in f p (second
row). Motion segmentation for the flow fields represented in figures 1(a), 1(b) and 1(c).
EMp was 0.630 and 0.809; the K-meansc and K-meansp were 0.893 and 0.852 (on average). Therefore, the Polar feature space is a clear advantage for the EM technique, while the quality of the segmentation produced by the K-means is not so affected by the feature space. Both clustering techniques were able to characterize the two motion models present in each trial, although the EM technique produces clusters affected by a higher level of noise. This may be caused by a process that is more complex in nature, since it is an iterative scheme that computes the posterior probabilities and the log-likelihood. Therefore, it is less robust to noisy data relative to the K-means, which is a simpler technique.
In addition, table 1 proves that the quality of the segmentation obtained by the EM was substantially better when compared to the experiments with f c. In detail, the performance of the EM increased by 43.3%; however, the average performance of the K-means was similar to the result obtained in f c. Generally, flc makes it possible for the EM and K-means to achieve a better segmentation quality (disregarding the first sequence). These trials show that the EM and K-means are not suitable for the Polar feature space with information about the image location, flp, since the results are inconclusive (only some trials reported an improved quality).
convergence of the clustering in both techniques, since the processing time is substantially reduced, especially for the EM, whose processing time is reduced by 46.9%, while the processing time of the K-means is reduced by 14.1%.
4 Conclusion
This paper presented an important research topic for motion perception and analysis, because the segmentation produces poor results when the feature space is not properly adjusted. This compromises the ability of the mobile robot to understand its surrounding environment. An extensive set of experiments was conducted as part of this work and several factors were considered and studied, such as the space (Cartesian or Polar) and the dimensionality of the feature vector. Results prove that choosing a good feature space for the detection of motion patterns is not a trivial problem, since it influences the performance of the Expectation-Maximization and K-means. The latter technique, in Cartesian space, revealed the best performance for the motion segmentation of flow fields (with a resolution of 640×480). It originates a good visual segmentation (evaluated using the F-score metric) in a reduced period of time, since it took 0.122 seconds to compute.
This work was funded by the project FCOMP - 01-0124-FEDER-022701.
References
1. Brox, T., Malik, J.: Object segmentation by long term analysis of point trajectories.
In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS,
vol. 6315, pp. 282–295. Springer, Heidelberg (2010)
2. Bugeau, A., Pérez, P.: Detection and segmentation of moving objects in complex
scenes. Computer Vision and Image Understanding 113(4), 459–476 (2009)
3. Eibl, G., Brandle, N.: Evaluation of clustering methods for finding dominant optical
flow fields in crowded scenes. In: International Conference on Pattern Recognition,
pp. 1–4 (December 2008)
4. Georgiadis, G., Ayvaci, A., Soatto, S.: Actionable saliency detection: Independent
motion detection without independent motion estimation. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 646–653 (2012)
5. Pinto, A.M., Costa, P.G., Correia, M.V., Paulo Moreira, A.: Enhancing dynamic
videos for surveillance and robotic applications: The robust bilateral and temporal
filter. Signal Processing: Image Communication 29(1), 80–95 (2014)
6. Dan Melamed, I., Green, R., Turian, J.P.: Precision and recall of machine transla-
tion. In: Proceedings of the 2003 Conference of the North American Chapter of the
Association for Computational Linguistics on Human Language Technology: Com-
panion Volume of the Proceedings of HLT-NAACL, NAACL-Short 2003, pp. 61–63.
Association for Computational Linguistics, Stroudsburg (2003)
7. Pinto, A.M., Paulo Moreira, A., Correia, M.V., Costa, P.G.: A flow-based motion
perception technique for an autonomous robot system. Journal of Intelligent and
Robotic Systems, 1–25 (2013) (in press)
Swarm Robotics Obstacle Avoidance:
A Progressive Minimal Criteria Novelty
Search-Based Approach
Abstract. Swarm robots are required to explore and search large areas. In order to cover the largest possible area while keeping communications, the robots try to maintain a hexagonal formation while moving. Obstacle avoidance is an extremely important task for swarm robotics, as it saves robots from hitting objects and being damaged.
This paper introduces the novelty search evolutionary algorithm to the swarm robots' multi-objective obstacle avoidance problem in order to overcome deception and reach better solutions.
This work could teach robots how to move in different environments with 2.5% obstacle coverage while keeping their connectivity above 82%. The percentage of robots that reached the goal was more than 97% in 70% of the environments and more than 90% in the rest of the environments.
1 Introduction
Our main interest in this work is to teach swarm robots, using a novelty search evolutionary algorithm, how to reach a certain goal while maintaining formation and avoiding obstacles.
2 Related Work
2.1 Novelty Search
Lehman and Stanley proposed a new change to genetic algorithms [1]: instead of calculating a fitness function and selecting the individuals with the best fitness values, the individuals whose behavior is more novel than that of the other individuals are selected to be added to the new generation.
Novelty search can be easily implemented on top of most evolutionary algorithms. Basically, the fitness value is replaced with what is called a novelty metric. The novelty metric (the sparseness of an individual) is calculated as the
average distance between the behavior vector of the individual and those of its k nearest neighbors, drawn from the current population and an archive.
Pure novelty search in large search spaces is not enough to reach solutions, as the algorithm will spend a lot of time exploring behaviors that do not meet the goal. So, novelty search can overcome deception but cannot work alone, without the guidance of the fitness value.
In MCNS (Minimal Criteria Novelty Search), only individuals who have a fitness value greater than a minimal criteria are assigned their novelty score [2]; otherwise their novelty score is zero. Zero-novelty individuals are only used for reproduction if no other individuals meet the minimal criteria. It is clear that MCNS acts as random search until individuals that meet the minimal criteria are reproduced. So, MCNS should be seeded with an initial population that meets the minimal criteria.
Progressive minimal criteria novelty search (PMCNS) was proposed by Gomes et al. to overcome the need of seeding the MCNS algorithm with such an initial population [3]. The minimal criteria is a dynamic fitness threshold, initially set to zero. The fitness threshold progressively increases over the generations to keep the search from exploring irrelevant solutions. In each generation, the new criteria is found by determining the value of the P-th percentile of the fitness scores in the current population. This means that P percent of the fitness values fall under the minimal criteria.
3 Applying PMCNS
To apply PMCNS, we have several issues to handle. The penalty function of our problem is a minimizing function: it needs to minimize the penalty for not reaching the goal, the penalty for non-cohesion and the penalty for collisions. The PMCNS algorithm equations assume that the fitness function is a maximizing function: they allow individuals that have fitness values higher than the minimal criteria to be selected for the next generation. We have two options. The first option is to change all of the equations of the algorithm into a maximal criteria algorithm instead of a minimal criteria one.
The second option (which was adopted in this work) is to invert all the penalty values, changing the problem from a minimizing to a maximizing problem. The minimum value becomes the maximum value and the maximum value becomes the minimum value. Equation (2) shows how the penalty value is inverted:
inverted penalty(i) = max penalty − penalty(i) ,    (2)
where max penalty is the maximum penalty of the current generation. penalty(i) is subtracted from the current generation's maximum penalty so that the inverted penalty values have the same range as the penalty values.
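The sketch below ties these pieces together: penalties are inverted into a maximizing fitness as in Eq. (2), the progressive minimal criteria is the P-th percentile of the current fitness values, and only individuals at or above it keep their novelty score (the average distance to the k nearest behavior vectors). The archive used by novelty search is omitted here for brevity, and k and the percentile are illustrative values.

import numpy as np

def pmcns_scores(penalties, behaviors, percentile=50.0, k=15):
    penalties = np.asarray(penalties, dtype=float)
    fitness = penalties.max() - penalties                   # inverted penalty, Eq. (2)
    criteria = np.percentile(fitness, percentile)           # progressive minimal criteria
    B = np.asarray(behaviors, dtype=float)
    d = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                             # ignore distance to itself
    sparseness = np.sort(d, axis=1)[:, :k].mean(axis=1)     # novelty metric (k nearest neighbors)
    return np.where(fitness >= criteria, sparseness, 0.0)   # zero novelty below the criteria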
We also need to decide how to capture the behavior vector, and how to apply PMCNS to a multi-objective problem. To fill the behavior vector, we need to make the vector express how the controller behaves throughout the simulation. For two reasons, we chose the genomes of the individual to express the behavior vector. The first reason is that the genomes are the parameters of the three versions of the force law. Those parameters decide how the robot behaves with respect to other robots, obstacles, and the goal, so they express the behavior of the controller. The second reason is that Cuccu and Gomez stated that the simplest way to fill the behavior vector is to fill it with the individual's genomes [6].
Fig. 1. Minimum penalty of each generation for both objective-based and novelty
search evolutionary algorithms.
The next experiments were held to examine the changes that can be made to the evaluation module to reach better solutions. Since the previous experiment showed that PMCNS can perform better than objective-based search, we used PMCNS in the rest of the experiments. In these experiments, the PMCNS evolutionary algorithm with a percentile of 50% was used.
The target of these experiments is to train robots to move in environments with an obstacle coverage less than or equal to 2.5%. So, the environments used for training contained 40 robots and 50 obstacles. The diameter of an obstacle is 0.2 units, while the diameter of a robot is 0.02 units. The arena dimensions are 9 x 7 units, so the obstacles cover 2.5% of the environment. The robots are initially placed at the bottom left of the arena and the goal is located at the top right of the arena. Obstacles are randomly placed in the arena. There is an area around the nest where no obstacles are placed, to prevent proximity collisions.
A robot can sense the goal at any distance. A penalty is added at the end of the simulation (1500 time steps) if less than 80% of the robots reached the goal area. The goal area is 4R from the center of the goal. Each individual was evaluated 20 times. Most of these settings follow the experimental settings of Hettiarachchi and Spears [4]. For each experiment, the performance of the individual with the best penalty value was tested over 20, 40, 60, 80, and 100 robots moving in environments that contain 10, 20, 30, 40, and 50 obstacles, corresponding to obstacle coverages of 0.5, 1, 1.5, 2 and 2.5% of the environment. The total number of performance experiments is therefore 25. Each experiment was evaluated 50 times.
1. First Penalty Function (Penalty Experiment 1)
In this experiment, a robot can sense neighboring robots at a distance of 1.5R, where R is the desired separation between robots. A robot attracts its neighbors if they are 1.5R away and repulses them if they are closer than 1.5R. A penalty for non-cohesion is added if fewer or more than 6 robots are found within distance R of the robot.
Obstacles are sensed at a distance equal to twice the obstacle diameter, measured from the center of the robot to the center of the obstacle, and the robot starts to interact with an obstacle at that same distance. A penalty is added for a collision if the distance between the center of the robot and the center of the obstacle is less than the robot diameter.
2. Second Penalty Function (Penalty Experiment 2)
In this experiment, we changed the distances for acting and for adding the cohesion penalty (robot-to-robot interaction). A robot can sense neighboring robots at a distance of 1.5R, where R is the desired separation between robots. A robot attracts its neighbors if they are R away and repulses them if they are closer than R. A penalty for non-cohesion is added if fewer or more than 6 robots are found within distance R of the robot.
3. Applying a Harder Problem on the Third Penalty Function (Penalty Experiment 3)
In the last two experiments, we noticed that the results were worst in the 50-obstacle environments, so we ran a new experiment with the same settings as penalty experiment 2, but with training environments containing 60 obstacles instead of 50, to see whether training on harder environments would lead to better behavior in easier environments.
remained connected was over 90%. The number of collisions shows the total
number of collisions found in each experiment.
This work shows that progressive minimal criteria novelty search can behave differently from objective-based search: it can reach better solutions than objective-based search on a multi-objective task for swarm robots. However, since the task is deceptive, we believe that novelty search algorithms can still achieve better solutions and better evolutionary behavior than those reached in this work.
The purpose of the upcoming experiments is to identify the part of the fitness function that is affected by the deception of the multi-objective problem, describe it in the behavior vector, and apply novelty search to it. Otherwise, we shall show that the way the fitness value was computed in this work, using the multi-objective function, was already able to overcome the deception of the problem, in which case novelty search would add little value.
This work examined different changes to the evaluation module of the multi-objective genetic algorithm. Such changes can enhance one objective while another objective gets worse. It also showed that using a harder problem for learning during the evolutionary algorithm gives better solutions for easier problems.
References
1. Lehman, J., Stanley, K.: Improving evolvability through novelty search and self-
adaptation. In: Proceedings of the 2011 IEEE Congress on Evolutionary Computa-
tion (CEC), Piscataway, NJ, US (2011)
2. Lehman, J., Stanley, K.: Revising the evolutionary computation abstraction: mini-
mal criteria novelty search. In: Proceedings of the Genetic and Evolutionary Com-
putation Conference (GECCO), New York, US (2010)
3. Gomes, J., Urbano, P., Christensen, A.L.: Progressive minimal criteria nov-
elty search. In: Pavón, J., Duque-Méndez, N.D., Fuentes-Fernández, R. (eds.)
IBERAMIA 2012. LNCS, vol. 7637, pp. 281–290. Springer, Heidelberg (2012)
4. Hettiarachchi, S., Spears, W., Spears, D.: Physicomimetics, chapter 14, pp. 441–473.
Springer, Heidelberg (2011)
5. Prabhu, S., Li, W., McLurkin, J.: Hexagonal lattice formation in multi-robot sys-
tems. In: 11th International Conference on Autonomous Agents and Multiagent
Systems (AAMAS) (2012)
6. Cuccu, G., Gomez, F.: When novelty is not enough. In: Di Chio, C., Cagnoni, S., Cotta,
C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A.I., Merelo, J.J., Neri, F., Preuss, M.,
Richter, H., Togelius, J., Yannakakis, G.N. (eds.) EvoApplications 2011, Part I. LNCS,
vol. 6624, pp. 234–243. Springer, Heidelberg (2011)
7. Rezk, N., Alkabani, Y., Bedour, H., Hammad, S.: A distributed genetic algorithm
for swarm robots obstacle avoidance. In: IEEE 9th International Conference on
Computer Engineering and Systems (ICCES), Cairo, Egypt (2014)
Knowledge Discovery
and Business Intelligence
An Experimental Study on Predictive Models
Using Hierarchical Time Series
1 Introduction
Nowadays, with increasing competitiveness, it is important for companies to adopt management strategies that allow them to stand out against the competition. In the retail sector, in particular, there is an evident relationship among different time series. The problem presented here concerns a leading Portuguese retail company in the electronics area. As shown in Figure 1, the total sales of the company can be divided into five business units: Home Appliances (U51), Entertainment (U52), Wifi (U53), Image (U54) and Mobile (U55), and each unit can be further divided into 137 different stores.
In this paper, we propose a predictive model that estimates the monthly sales revenue for all stores of this company. We then compare this flat model, which ignores the hierarchy present in the time series, with three approaches from the literature that combine the obtained forecasts by exploiting their hierarchical structure (e.g. [1,2,3,4]): bottom-up, top-down and a combination of predictions made at different levels of the hierarchy. The experimental results
obtained with our case study confirm that taking advantage of the hierarchical structure present in the data improves the models' performance, as it reduces the forecast errors.
This paper is organized as follows: Section 2 presents the related work; Section 3 describes how the forecast model was built and how we compare the different hierarchical models; finally, Section 4 presents the conclusions and future work.
2 Related Work
Data mining techniques have increasingly been applied to time series analysis [5]. In the retail sector, where the time series display well-defined trend and seasonality components, learning algorithms such as Artificial Neural Networks (ANN) prove to be more effective than traditional methods, since they can capture the non-linear dynamics associated with these components and their interactions [6]. However, to apply these algorithms we must study the best way to present the data. Some studies show that, by default, ANNs are most effective when applied to time series with trend and seasonality correction [7]. In addition, in the retail sector there are normally several variables that can help explain fluctuations in sales. In [8], ANNs are applied to daily sales forecasting for a company in the shoe industry, using as explanatory variables: month of the year, day of the week, holidays, promotions or special events, sales period, weeks pre/post Christmas and Easter, the average temperature, the turnover index in retail sales of textiles, clothing, footwear and leather articles, and the daily sales of the previous seven days with trend and seasonality correction.
In [9], the predictive power of ANNs is compared with that of Support Vector Regression machines [10] (SVRs); for the latter, the linear kernel function is also compared with the Gaussian kernel function. In that work, these learning algorithms are applied to five different artificial time series: stationary, with additive seasonality, with linear trend, with linear trend and additive seasonality, and with linear trend and multiplicative seasonality. The results showed that SVRs with the Gaussian kernel function are the most effective at forecasting time series without trend. However, in series with trend, their predictions are disastrous, while the ANNs and SVRs with linear kernel
where $\tilde{Y}_t(h)$ represents the recalculated (reconciled) prediction for series h at time t, $\hat{Y}_t(h)$ represents the prediction obtained independently for the time series h at time t, and S is the matrix that represents the hierarchy of the time series.
The calculation of $S(S^{T}S)^{-1}S^{T}$ gives the weights needed to adjust the forecasts. Considering the matrix S (cf. Equation 1), we obtain the weights matrix (cf. Equation 3) corresponding to all series of the different hierarchy nodes.
\[
S(S^{T}S)^{-1}S^{T} =
\begin{pmatrix}
 0.58 &  0.30 &  0.28 &  0.10 &  0.10 &  0.10 &  0.14 &  0.14 \\
 0.31 &  0.51 & -0.20 &  0.17 &  0.17 &  0.17 & -0.10 & -0.10 \\
 0.27 & -0.21 &  0.48 & -0.07 & -0.07 &  0.07 &  0.24 &  0.24 \\
 0.10 &  0.18 & -0.08 &  0.72 & -0.27 & -0.27 & -0.04 & -0.04 \\
 0.10 &  0.18 & -0.08 & -0.27 &  0.72 & -0.27 & -0.04 & -0.04 \\
 0.10 &  0.18 & -0.08 & -0.27 & -0.27 &  0.72 & -0.04 & -0.04 \\
 0.15 & -0.09 &  0.24 & -0.03 & -0.03 & -0.03 &  0.62 & -0.38 \\
 0.15 & -0.09 &  0.24 & -0.03 & -0.03 & -0.03 & -0.38 &  0.62
\end{pmatrix}
\qquad (3)
\]
For example, the forecast value for the time series AA would be obtained using the weights in the fourth row of this weights matrix, as shown in Equation 4.
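Equation 4 itself falls outside the excerpt reproduced here; assuming the series are ordered Total, A, B, AA, AB, AC, BA, BB (so that AA corresponds to the fourth row of the matrix above), a plausible reconstruction is:

\[
\tilde{Y}_t(AA) = 0.10\,\hat{Y}_t(\mathrm{Total}) + 0.18\,\hat{Y}_t(A) - 0.08\,\hat{Y}_t(B) + 0.72\,\hat{Y}_t(AA) - 0.27\,\hat{Y}_t(AB) - 0.27\,\hat{Y}_t(AC) - 0.04\,\hat{Y}_t(BA) - 0.04\,\hat{Y}_t(BB) \qquad (4)
\]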
We should note that the negative weights are associated with the time series that do not directly influence the considered time series. These coefficients are negative, rather than null, because we want to remove the effect of those series from the series at the upper levels.
The same authors made available in R [11] the package hts [12], which implements an algorithm that automatically returns predictions for all hierarchical levels based on the idea described previously. However, it only allows the use of a linear model to predict each series.
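For reference, a minimal sketch of the hts workflow on simulated data follows; the hierarchy, series and function arguments are illustrative assumptions (hts version 4.x interface [12]), not the company's data or the exact configuration used in this paper.

```r
library(hts)

set.seed(123)
# Five simulated bottom-level monthly series: unit A with 3 stores, unit B with 2.
bts <- ts(matrix(rnorm(48 * 5, mean = 100, sd = 10), ncol = 5),
          start = c(2011, 1), frequency = 12)
y <- hts(bts, nodes = list(2, c(3, 2)))   # Total -> {A, B}; A -> 3 series, B -> 2

# Reconciled forecasts via the optimal combination approach (method = "comb");
# "bu" and "tdfp" would give bottom-up and top-down alternatives.
fc <- forecast(y, h = 6, method = "comb", fmethod = "ets")
aggts(fc)   # forecasts at every level of the hierarchy
```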
The challenge lies in building a model to forecast the monthly sales revenue of the company. Additionally, this forecast should be made by store - there are 137 stores across the country - and by business unit, i.e., the goal is not a forecast for the whole store, but for a particular set of products. In this particular business, there are five business units: Home Appliances (U51), Entertainment (U52), Wifi (U53), Image (U54) and Mobile (U55). The goal is, on the 15th day of each month, to forecast the sales revenue of the following month.
This is a regression problem - since it requires the forecast of a continuous variable - and the available data will be used to train the algorithms, i.e., we have a supervised learning process.
We have monthly aggregated data from January 2011 until December 2014.
We keep the last six months of 2014 to evaluate our models, using the remaining
data for training. Due to the reduced number of instances, we decided to use a
growing window with a time horizon of two months. This means that to predict
the sales of July 2014, we use as training window all the data until May 2014.
Then, the training window grows by incorporating the data of June 2014, to
predict the sales of August 2014, and so on.
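A minimal sketch of this growing-window scheme is shown below, on a simulated monthly series and with a seasonal naive forecaster standing in for the actual learners; the names and the model are placeholders for illustration only.

```r
library(forecast)

sales <- ts(rnorm(48, mean = 100, sd = 10), start = c(2011, 1), frequency = 12)

horizon     <- 2                 # predictions are made two months ahead
test_months <- 43:48             # July to December 2014
preds <- sapply(test_months, function(m) {
  train <- window(sales, end = time(sales)[m - horizon])  # data up to month m - 2
  fit   <- snaive(train, h = horizon)                     # placeholder learner
  tail(fit$mean, 1)                                       # forecast for month m
})
```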
A first analysis of our data showed that the sales evolution of each business unit in each store exhibits very different behaviours. Under these conditions, it is unfeasible to obtain a single model with good performance for every store and business unit. Thus, in order to avoid very large errors, we started by applying the k-means algorithm [13] to our stores, setting the number of clusters to three. Our aim was to cluster the stores by the three main areas of the country: north, center and south. The stores closest to the centroid of each cluster were used to tune the parameters of the learning methods for the time series corresponding to the total and to each business unit of those stores.
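The clustering step could look like the following R sketch, where `store_sales` is an assumed matrix with one row per store and monthly sales in the columns (placeholder data, not the company's):

```r
set.seed(42)
store_sales <- matrix(rnorm(137 * 36, mean = 100, sd = 20), nrow = 137)

km <- kmeans(store_sales, centers = 3)    # three clusters: north, center, south

# Store closest to each centroid, later used to tune the learners' parameters.
closest <- sapply(1:3, function(k) {
  members <- which(km$cluster == k)
  centred <- sweep(store_sales[members, , drop = FALSE], 2, km$centers[k, ])
  members[which.min(rowSums(centred^2))]
})
```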
Our flat modelling approach, described in the previous section, of predicting sales for each store, each business unit, and the total of the company showed some drawbacks: the sum of the forecasts of the lower-level series does not correspond exactly to the value of the upper level. In this context, we found it useful to explore the hierarchies present in the database, which also helps to build a model whose forecasts are more consistent across the different hierarchical levels. Therefore, based on the forecasts obtained with our base model, we considered the following four different modelling approaches.
Table 1. MAPE of sales forecast per business unit and total sales of the company by
four modelling approaches to combine predictions of the different hierarchical levels.
that TopDown has higher error rates, and we also know that, for this hierarchical level, the models NonHierarch and BottomUp are equal. Therefore, and since parametric tests are more powerful than non-parametric tests, we used the t-test for paired samples on the differences between the pairs of error rate observations. The null hypothesis is that the mean of these differences is zero, against the alternative hypothesis that the mean of the differences is greater than zero, which means that the errors obtained by the model HierarchComb are lower than those obtained by the model BottomUp. We obtained a p-value below 1%, so, at this level of significance, we reject the null hypothesis. We conclude that the model HierarchComb produces better forecasts than the model BottomUp, and that the differences are statistically significant. In fact, graphically, the model HierarchComb particularly reduces the larger errors, which have a great impact on the average.
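In R, this comparison amounts to a one-sided paired t-test on the error differences; the sketch below uses simulated placeholder MAPE vectors rather than the real per-store errors.

```r
set.seed(7)
mape_bottomup     <- runif(137, 5, 30)                 # placeholder errors
mape_hierarchcomb <- mape_bottomup - rnorm(137, 1, 2)  # placeholder errors

# H0: mean difference is zero; H1: BottomUp errors exceed HierarchComb errors.
t.test(mape_bottomup, mape_hierarchcomb, paired = TRUE, alternative = "greater")
```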
Fig. 3. Distribution of MAPE of sales forecast for all the stores in business units U51,
U52 and U53 by four modelling approaches: NonHierarch, BottomUp, TopDown and
HierarchComb
References
1. Ferreira, N., Gama, J.: Análise exploratória de hierarquias em base de dados
multidimensionais. Revista de Ciências da Computação 7, 24–42 (2012)
2. Fliedner, G.: Hierarchical forecasting: issues and use guidelines. Industrial
Management and Data Systems 101, 5–12 (2001)
3. Gross, C.W., Sohl, J.E.: Disaggregation methods to expedite product line forecast-
ing. Journal of Forecasting 9(3) (1990)
4. Hyndman, R.J., Ahmed, R.A., Athanasopoulos, G., Shang, H.L.: Optimal com-
bination forecasts for hierarchical time series. Computational Statistics & Data
Analysis 55, 2579–2589 (2011)
5. Azevedo, J.M., Almeida, R., Almeida, P.: Using data mining with time series data
in short-term stocks prediction: A literature review. International Journal of Intel-
ligence Science 2, 176 (2012)
6. Alon, I., Qi, M., Sadowski, R.J.: Forecasting aggregate retail sales: A comparison of artificial neural networks and traditional methods. Journal of Retailing and Consumer Services, 147–156 (2001)
7. Zhang, G.P.: Neural networks for retail sales forecasting. In: Encyclopedia of
Information Science and Technology (IV), pp. 2100–2104. Idea Group (2005)
8. Sousa, J.: Aplicação de redes neuronais na previsão de vendas para retalho.
Master’s thesis, Faculdade de Engenharia da Universidade do Porto (2011)
9. Crone, S.F., Guajardo, J., Weber, R.: A study on the ability of support vector
regression and neural networks to forecast basic time series patterns. In: Bramer,
M. (ed.) Artificial Intelligence in Theory and Practice. LNCS, vol. 217, pp. 149–158.
Springer, Boston (2006)
10. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A.J., Vapnik, V.: Support vector
regression machines. In: Advances in Neural Information Processing Systems 9,
December 2–5, NIPS, Denver, CO, USA, pp. 155–161 (1996)
11. R Core Team: R: A Language and Environment for Statistical Computing. R Foun-
dation for Statistical Computing, Vienna, Austria (2014)
12. Hyndman, R.J., Wang, E., with contributions from R.A. Ahmed and H.L. Shang to earlier versions of the package: hts: Hierarchical and grouped time series. R package version 4.4 (2014)
13. Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. JSTOR: Applied
Statistics 28(1), 100–108 (1979)
14. Torgo, L.: Data Mining with R, learning with case studies. Chapman and Hall/CRC
(2010)
15. Hyndman, R.J., with contributions from Athanasopoulos, G., Razbash, S., Schmidt, D., Zhou, Z., Khan, Y., Bergmeir, C., Wang, E.: forecast: Forecasting functions for time series and linear models. R package version 5.6 (2014)
16. Ripley, B.: nnet: Feed-forward Neural Networks and Multinomial Log-Linear Mod-
els R package version 7.3-8 (2014)
17. Kuhn, M.: caret: Classification and Regression Training. R package version 6.0-35
(2014)
18. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F.: e1071: Misc
Functions of the Department of Statistics (e1071), TU Wien. R package version
1.6-4 (2014)
19. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: kernlab - an S4 package for
kernel methods in R. Journal of Statistical Software 11(9), 1–20 (2004)
20. Breiman, L., Cutler, A., Liaw, A., Wiener, M.: Random forests for classification
and regression. R package version 4.6-10 (2014)
21. Hyndman, R.J., Athanasopoulos, G.: Optimally reconciling forecasts in a hierarchy.
Foresight: The International Journal of Applied Forecasting (35), 42–48 (2014)
Crime Prediction Using Regression
and Resources Optimization
1 Introduction
Violent crime is a severe problem in society. Its prediction can be useful for law enforcement agents to identify problematic regions to patrol. Additionally, it provides valuable information for optimizing the available resources ahead of time.
In the United States of America (USA), according to the Uniform Crime
Reports (UCR) published by the Federal Bureau of Investigation (FBI) [1], vio-
lent crimes imply the use of force or threat of using force, such as rape, murder,
2 Related Work
Crime prediction has been extensively studied throughout the literature due to
its relevance to society. These studies employ diverse machine learning techniques
to tackle the crime forecasting problem.
Nath [3] combined K-means clustering and a weighting algorithm, considering
a geographical approach, for the clustering of crimes according to their types.
Liu et al. [4] proposed a search engine for extracting, indexing, querying and
visualizing crime information using spatial, temporal, and textual information
and a scoring system to rank the data. Shah et al. [5] went a step further and pro-
posed CROWDSAFE for real-time and location-based crime incident searching
and reporting, taking into account Internet crowd sourcing and portable smart
devices. Automatic crime prediction events based on the extraction of Twitter
posts has also been reported [6].
Regarding the UCI data set used in this work, Iqbal et al. [7] compared Naive Bayes and decision tree methods by dividing the data set into three classes based on the risk level (low, medium and high). In that study, decision trees outperformed Naive Bayes, but the pre-processing procedures were rudimentary. Shojaee et al. [8] applied a more rigorous data processing methodology for a binary class problem and used two different feature selection methods with a wider range of learning algorithms (Naive Bayes, decision trees, support vector machines, neural networks and k-nearest neighbors). In these studies no class balancing methodologies were employed. Other approaches, such as fuzzy association rule mining [9] and case-based editing [10], have also been explored.
3.2 Prediction
We started by pre-processing the data set. Violent crime is our target variable, so we removed the other 17 possible target variables contained in the data set. We also eliminated all the examples with a missing value in our target variable and removed all the attributes with more than 80% of missing values. The data set contained four non-predictive attributes, which we also eliminated. Finally, we removed one more example that still had a missing value and normalized all the remaining attributes.
Although this problem was previously tackled as a classification task, we opted to address it as a regression task. This is an innovative aspect of our proposal, and the choice is also motivated by the fact that the numeric predictions are used to solve an optimization problem. Therefore, it makes sense to use a continuous variable throughout the work, instead of discretizing the target variable and later recovering a numeric value.
Another challenge of this data set is the high number of attributes. To address it, we applied the same feature selection scheme with two different percentages. The scheme applies a hierarchical clustering analysis using the Pearson correlation coefficient; this step removes a percentage of the features least correlated with the target variable. Then, a Random Forest (RF)
1 Available at the UCI repository: https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized
Given the predicted violent crime per 100k population, we propose to optimize the distribution of the available police officers by state. We present our proposal as a proof of concept, since more detailed data and insight into the problem would be needed to implement a more realistic solution. Given that the number of officers by state is an integer quantity, Integer Linear Programming is used, and the optimization problem is solved with the branch-and-bound algorithm.
\[
x_i \ge f_i H_i, \qquad c_i x_i \le B_i, \qquad x_i \in \mathbb{N}
\]
where i ∈ {1, ..., m} indexes each of the m states, with m = 46, xi is the number of officers to distribute to state i, si is the violent crime prediction for the state, Hi is the ideal number of officers for the state, fi is the fraction of the ideal number of officers that each state accepts as the minimum, ci is the cost that each state pays for each officer, and Bi is the available budget of each state.
The ideal number of officers was defined as a function of the violent crime prediction of the state and its population (number of citizens), since larger populations with more violent crime have higher demands for police officers. To this end, the violent crime predictions were scaled (ssi) to the interval [vl, vh], so that they act as a proportion of the population. However, since some populations have millions of citizens, this value was divided by 100 to obtain more realistic estimates for the ideal number of officers. So,
\[
H_i = \frac{ss_i \, p_i}{100} \qquad (1)
\]
where pi is the real population of the state i.
It was defined that the minimum number of officers should be a fraction of the ideal number, taking into account the crime predictions. Defining a lower (lb) and an upper (ub) bound for this fraction, the previously scaled violent crime predictions are linearly mapped to the interval [lb, ub]. Knowing that they lie in the interval [vl, vh], the fraction of the ideal number of officers is calculated as
\[
f_i = \frac{ss_i - v_l}{v_h - v_l}\,(u_b - l_b) + l_b \qquad (2)
\]
The budget was defined as a function of the population and its density. This definition is based on the intuition that a small and less dense population needs a smaller budget and fewer officers than a large and highly dense population. However, the population numbers are several orders of magnitude higher than the density values, which would make the effect of density negligible. So, we rescaled both population and density to the range [0, 100] (psi and dsi). Moreover, the budget for each state is a part of the total national budget (BT). So, Bi was calculated as
\[
B_i = \frac{ds_i + a \cdot ps_i}{\sum_{i=1}^{m} (ds_i + a \cdot ps_i)} \, B_T \qquad (3)
\]
where a > 0 is a parameter to tune the weight of the density and population
over the budget calculation.
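To make the formulation concrete, the sketch below solves a toy instance of the integer program with lpSolve's branch-and-bound solver. The constraint structure follows the text; the objective used here (maximizing crime-weighted officer assignment subject to a national total of officers) and all numeric values are assumptions for illustration, since the paper's objective function is not reproduced in this excerpt.

```r
library(lpSolve)

m   <- 4                                  # toy number of states
s   <- c(676.8, 276.1, 638.3, 893.1)      # violent crime predictions
H   <- c(18614, 15991, 7951, 49991)       # ideal number of officers
f   <- c(0.08, 0.08, 0.08, 0.09)          # minimum fraction of the ideal number
cst <- c(9.8, 8.9, 7.5, 12.3)             # cost per officer in each state
B   <- c(320390, 323803, 87077, 504836)   # state budgets
total_officers <- 120000                  # national total to distribute (assumed)

obj <- s                                  # assumed objective: maximize sum_i s_i * x_i
con <- rbind(diag(m),                     # x_i >= f_i * H_i
             diag(cst),                   # c_i * x_i <= B_i
             rep(1, m))                   # sum_i x_i <= total available officers
dir <- c(rep(">=", m), rep("<=", m), "<=")
rhs <- c(f * H, B, total_officers)

sol <- lp("max", obj, con, dir, rhs, all.int = TRUE)
sol$solution                              # officers assigned to each state
```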
4 Experimental Analysis
We divided our problem, and its analysis, into two sub-problems: prediction and optimization. In this section, we describe the tools, metrics and evaluation methodology for each sub-problem, and then focus on the results of each one.
Prediction. The main goal of our experiments is to select one of the two pre-
processed data sets, a smoteR variant (in case it has a positive impact) and a
model (among SVM, RF and MARS) to apply in the optimization task.
The experiments were conducted with R software. Table 1 summarizes the
learning algorithms that were used and the respective parameter variants. All
combinations of parameters were tried for the learning algorithms, which led to
4 SVM variants, 6 RF variants and 8 MARS variants.
We started by splitting each data set into train and test sets, corresponding approximately to 80% and 20% of the data. The test set was held apart to be used in the optimization, after predicting its crime severities. This set was randomly built with stratification and with the condition of including at least one example for each possible state of the USA.
In imbalanced domains, it is necessary to use adequate metrics, since traditional measures are not suitable for assessing the performance. Most of these specific metrics, such as precision and recall, exist for classification problems. The notions of precision and recall were adapted to regression problems with non-uniform relevance of the target values by Torgo and Ribeiro [19] and Ribeiro [18]. We use the framework proposed by these authors to evaluate and compare our results; more details on this formulation can be found in [18].
All the described alternatives were evaluated according to the F-measure
with β = 1, which means that the same importance was given to both precision
and recall scores. The values of F1 were estimated by means of 3 repetitions of a
10-fold Cross Validation process and the statistical significance of the observed
paired differences was measured using the non-parametric pairwise Wilcoxon
signed-rank test.
Prediction. We started by examining the results obtained with all the param-
eters selected for the two pre-processed data sets, the three types of learners and
the smoteR variants. All combinations of parameters were tested by means of 3
repetitions of a 10-fold cross validation process. Figure 1 shows these results.
We have also analysed the statistical significance of the differences observed
in the results. Table 2 contains the several p-values obtained when comparing the
SmoteR variants and the different learners, using the non-parametric pairwise
Wilcoxon signed rank test with Bonferroni correction for multiple testing.
The p-value for the differences between the two data sets (with 30% and 50% of the features) was 0.17. Therefore, we chose the data set with fewer features to continue to the optimization problem. This choice was mainly due to: i) the absence of statistically significant differences; and ii) the smaller size of the data (fewer features can explain the target variable well, so we chose the most efficient alternative).
Table 2. Pairwise Wilcoxon signed rank test with Bonferroni correction for the SmoteR
strategies (left) and the learning systems (right).
SmoteR strategies:
            none      S.o2.u0.5  S.o2.u1  S.o4.u0.5
S.o2.u0.5   1.3e-14   -          -        -
S.o2.u1     < 2e-16   1          -        -
S.o4.u0.5   2.3e-16   1          1        -
S.o4.u1     < 2e-16   0.18       1        1

Learning systems:
        svm       rf
rf      < 2e-16   -
mars    0.077     < 2e-16
Fig. 1. Results from 3 × 10-fold CV by learning system and SmoteR variant. (none-
original data; S-smoteR; ox-x × 100% over-sampling; uy-y × 100% under-sampling)
Regarding the SmoteR strategy, Figure 1 and Table 2 provide clear evidence of the advantages of this procedure. Moreover, we also observed that the differences between its several variants are not statistically significant. Therefore, we opted for the variant that leads to a smaller data set and, consequently, a lower run time: for the optimization sub-problem we chose the smoteR variant with a 200% over-sampling percentage and a 100% under-sampling percentage. The learning system that provides the best performance is clearly RF; with this learner, there are almost no differences among the several experimented variants.
Considering these results, we chose the following setting to generate a model
for the optimization sub-problem:
– Pre-processing to remove missing values and select 30% of the most relevant
features;
– Apply the smoteR strategy with parameters k=5; over-sampling percent-
age=200; under-sampling percentage=100;
– RF model with parameters: mtry=7; ntree=750.
After generating the model, we obtained the predictions for the test set which was held apart for the optimization sub-problem. These predictions were used as input to the optimization task.
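A compact sketch of this pipeline is shown below on placeholder data. The target column name follows the UCI data set, but `smoteR_resample` is a hypothetical stand-in for the SmoteR strategy of Torgo et al. [16] (k = 5, 200% over-sampling, 100% under-sampling), not a function of any package cited here.

```r
library(randomForest)

set.seed(2015)
crime <- data.frame(matrix(rnorm(1000 * 30), ncol = 30))   # placeholder data
names(crime)[30] <- "ViolentCrimesPerPop"

# Hypothetical placeholder: the real strategy over-samples rare extreme target
# values by interpolation and under-samples the remaining cases.
smoteR_resample <- function(data, target, k = 5, over = 200, under = 100) data

train_idx <- sample(nrow(crime), 0.8 * nrow(crime))
train <- smoteR_resample(crime[train_idx, ], "ViolentCrimesPerPop")
test  <- crime[-train_idx, ]

rf   <- randomForest(ViolentCrimesPerPop ~ ., data = train, mtry = 7, ntree = 750)
pred <- predict(rf, newdata = test)   # input for the optimization task
```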
this weight is decreased, those states lost officers, while, for instance, Vermont
obtained the ideal number, although the population is one of the lowest
Table 3 shows the results of distributing 500,000 police officers, with a budget of 8,000,000 and a = 1. Figure 2 shows the same results on a map of the USA, where brighter red is associated with higher criminality and the radius of each circle is proportional to the number of officers assigned to the state. The color of the circle indicates which restriction limited the number of officers: green means that the state received the ideal number, the minimum is represented in blue, yellow means that the budget of the state did not allow more
State Crime Prediction Budget Min. Off. Ideal Off. Dist. Off. Cost
NJ 676.8 320390.4 1597 18614 18614 182659.9
PA 276.1 323802.5 1279 15991 1279 11413.7
OR 638.3 87076.7 678 7951 7951 59607.0
NY 893.1 504835.7 4445 49991 41007 504830.6
MO 381.3 12559.3 123 1503 123 1280.8
MA 408.1 233862.6 842 10283 842 5457.7
IN 726.6 164404.1 1247 14420 14420 162921.4
TX 821.1 648307.0 5643 64214 64214 477378.9
CA 850.2 948629.7 8369 94781 68083 948620.3
KY 655.6 104572.9 769 8996 8996 66277.2
AR 928.2 64349.2 691 7726 5885 64346.6
CT 355.4 146317.7 413 5090 413 6109.4
OH 542.8 90044.9 587 6997 587 8778.9
NH 312.6 33606.1 142 1760 142 2027.5
FL 1313.2 503583.7 6432 67716 52441 503581.4
WA 557.3 168051.3 1089 12954 12954 120927.3
LA 1721.0 109854.6 1994 19765 11893 109847.5
WY 528.1 1846.3 87 1036 87 778.2
NC 1315.6 247220.2 3221 33900 27118 247214.5
MS 1089.4 65686.5 808 8801 7149 65677.9
VA 851.7 208745.0 1798 20363 14106 208734.1
SC 1171.5 119505.3 1397 15028 10543 119498.9
WI 325.7 136544.5 629 7793 629 5271.4
TN 712.7 160819.6 1219 14128 10940 160817.3
UT 794.8 61723.4 599 6850 4253 61723.1
OK 488.6 86322.2 545 6560 545 5960.0
ND 342.0 6056.3 83 1026 83 618.2
AZ 500.5 155514.1 962 11554 962 11214.4
CO 791.7 121546.9 1087 12432 12432 84710.8
WV 551.2 39327.5 283 3371 283 3485.0
RI 440.9 110983.5 138 1680 138 1327.7
AL 1452.9 113590.5 1738 17915 7786 113576.2
GA 1254.6 248026.3 3120 33144 26054 248017.1
ID 444.8 28553.6 216 2616 216 2020.0
ME 275.9 23475.0 133 1663 133 1942.2
KS 1286.4 60751.9 920 9724 9724 52508.6
SD 568.5 8853.4 133 1585 1403 8851.3
NV 920.6 58244.7 656 7350 7350 56186.7
IA 556.7 67617.0 479 5696 2243 15039.1
MD 1271.9 191051.4 1872 19832 13217 191045.6
MN 862.1 125640.6 1191 13465 13465 96440.4
NM 835.6 39199.9 443 5031 3950 39199.7
DE 1161.9 48882.5 268 2891 2891 16497.7
VT 517.2 8881.3 92 1097 92 548.1
AK 932.3 64349.2 694 7752 7752 45227.2
DC 3044.8 926793.2 553 4612 4612 67407.4
Fig. 2. Map of the USA representing the level of violent criminality by state, the
amount of police officers assigned, and the restriction that imposed that number. White
states are not represented in the data set.
officers, and white means that the state received an intermediate number of officers, less than the ideal or the maximum allowed by the budget, but higher than the minimum. It is possible to observe that ten states received the ideal number of officers. Some of them were associated with low or moderate levels of criminality, but their density or population was high, such as New Jersey or Texas. Others are less populated, such as Oregon, but their ideal number of officers was also lower than that of other states constrained by the budget. The violent crime rate was particularly important in Kansas: with a lower density and population, its budget allowed the state to receive the ideal number of officers. It is also possible to observe that the states with more violent criminality reached the number of officers allowed by their budget, such as Alabama or South Carolina. Accordingly, many states with less criminality received the minimum number of officers (North Dakota), or values between the minimum and the ideal, without being constrained by the budget (Iowa). This behaviour may be desirable, since having too many officers in states with less criminality may be a waste of resources. The influence of the crime severity may be perceived when comparing Arizona with Nevada: the former has more population, higher density and a larger budget than the latter, but received fewer officers because of its lower criminality rating.
According to the FBI [1], the region with the most violent crime incidents is the South, followed by the West, Midwest and Northeast. It is interesting to notice that, in Figure 2, more severe criminality was predicted for the southern states, and these were the states that received more police officers.
5 Conclusions
In this paper, we proposed a pipeline for predicting violent crime and a resources
optimization scheme. Prediction encompasses feature selection through correla-
tion and feature importance analysis, over-sampling of the rare extreme values
of the target variable and regression. Among the evaluated learning systems, RF
presented the best performance. This pipeline itself is one of the contributions
of this work, given that, to the best of our knowledge, this problem in this data
set was never approached as regression. Having the predictions, we propose a
decision support scheme through the optimization of police officers across states,
while taking into account the violent crime predictions, population, density and
budget of the states. This contribution is presented as a proof of concept, since
some of the parameters were synthesized and may not correspond to the real
scenario. Nevertheless, our results show a higher crime burden in states located in the southern part of the USA compared with the states in the north. For this reason, southern states tend to have a higher assignment of police officers.
These predictions are in accordance with some national reports, and although
some parameters of the optimization are not completely realistic, it seems to
work as expected.
This work, although limited to the United States, can be easily applied to
various other countries. So, as future work we consider that it would be inter-
esting to apply the proposed framework in other countries or regions.
References
1. FBI, Crime in the United States 2013 (2014). http://www.fbi.gov/about-us/cjis/
ucr/crime-in-the-u.s/2013/crime-in-the-u.s.-2013 (accessed: January 21, 2015)
2. Labor-Statistics, B.: United States Department of Labor - Bureau of Labor Statis-
tics: Police and detectives (2012). http://www.bls.gov/ooh/protective-service/
police-and-detectives.htmtab-1 (accessed: January 21, 2015)
3. Nath, S.V.: Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent Technology
Workshops, WI-IAT 2006 Workshops, pp. 41–44. IEEE (2006)
4. Liu, X., Jian, C., Lu, C.-T.: A spatio-temporal-textual crime search engine. In:
Proceedings of the 18th SIGSPATIAL International Conference on Advances in
Geographic Information Systems, pp. 528–529. ACM (2010)
5. Shah, S., Bao, F., Lu, C.-T., Chen, I.-R.: Crowdsafe: crowd sourcing of crime
incidents and safe routing on mobile devices. In: Proceedings of the 19th ACM
SIGSPATIAL International Conference on Advances in Geographic Information
Systems, pp. 521–524. ACM (2011)
6. Wang, X., Gerber, M.S., Brown, D.E.: Automatic crime prediction using events
extracted from twitter posts. In: Yang, S.J., Greenberg, A.M., Endsley, M. (eds.)
SBP 2012. LNCS, vol. 7227, pp. 231–238. Springer, Heidelberg (2012)
7. Iqbal, R., Murad, M.A.A., Mustapha, A., Panahy, P.H.S., Khanahmadliravi, N.:
An experimental study of classification algorithms for crime prediction. Indian
Journal of Science and Technology 6(3), 4219–4225 (2013)
8. Shojaee, S., Mustapha, A., Sidi, F., Jabar, M.A.: A study on classification learn-
ing algorithms to predict crime status. International Journal of Digital Content
Technology and its Applications 7(9), 361–369 (2013)
9. Buczak, A.L., Gifford, C.M.: Fuzzy association rule mining for community crime
pattern discovery. In: ACM SIGKDD Workshop on Intelligence and Security Infor-
matics, p. 2. ACM (2010)
10. Redmond, M.A., Highley, T.: Empirical analysis of case-editing approaches for
numeric prediction. In: Innovations in Computing Sciences and Software Engineer-
ing, pp. 79–84. Springer (2010)
11. Donovan, G., Rideout, D.: An integer programming model to optimize resource
allocation for wildfire containment. Forest Science 49(2), 331–335 (2003)
12. Caulkins, J., Hough, E., Mead, N., Osman, H.: Optimizing investments in security
countermeasures: a practical tool for fixed budgets. IEEE Security & Privacy 5(5),
57–60 (2007)
13. Mitchell, P.S.: Optimal selection of police patrol beats. The Journal of Criminal
Law, Criminology, and Police Science, 577–584 (1972)
14. Daskin, M.: A maximum expected covering location model: formulation, properties
and heuristic solution. Transportation Science 17(1), 48–70 (1983)
15. Li, L., Jiang, Z., Duan, N., Dong, W., Hu, K., Sun, W.: Police patrol service
optimization based on the spatial pattern of hotspots. 2011 IEEE International
Conference on in Service Operations, Logistics, and Informatics, pp. 45–50. IEEE
(2011)
16. Torgo, L., Ribeiro, R.P., Pfahringer, B., Branco, P.: SMOTE for regression.
In: Reis, L.P., Correia, L., Cascalho, J. (eds.) EPIA 2013. LNCS, vol. 8154,
pp. 378–389. Springer, Heidelberg (2013)
17. Torgo, L., Branco, P., Ribeiro, R.P., Pfahringer, B.: Resampling strategies for
regression. Expert Systems (2014)
18. Ribeiro, R.P.: Utility-based Regression. PhD thesis, Dep. Computer Science, Fac-
ulty of Sciences - University of Porto (2011)
19. Torgo, L., Ribeiro, R.: Precision and recall for regression. In: Gama, J., Costa,
V.S., Jorge, A.M., Brazdil, P.B. (eds.) DS 2009. LNCS, vol. 5808, pp. 332–346.
Springer, Heidelberg (2009)
20. Milborrow, S.: earth: Multivariate Adaptive Regression Spline Models. Derived
from mda:mars by Trevor Hastie and Rob Tibshirani (2012)
21. Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A.: e1071: Misc
Functions of the Department of Statistics (e1071), TU Wien (2011)
22. Liaw, A., Wiener, M.: Classification and regression by randomforest. R News 2(3),
18–22 (2002)
23. U.S.C. Bureau, Population Estimates (2012). http://www.census.gov/popest/
data/index.html (accessed: January 23, 2015)
Distance-Based Decision Tree Algorithms
for Label Ranking
1 Introduction
Label Ranking (LR) is an increasingly popular topic in the machine learning
literature [7, 8, 18, 19, 24]. LR studies a problem of learning a mapping from
instances to rankings over a finite number of predefined labels. It can be consid-
ered as a natural generalization of the conventional classification problem, where
only a single label is requested instead of a ranking of all labels [6]. In contrast
to a classification setting, where the objective is to assign examples to a specific
class, in LR we are interested in assigning a complete preference order of the
labels to every example.
There are two main approaches to the problem of LR: methods that trans-
form the ranking problem into multiple binary problems and methods that were
developed or adapted to treat the rankings as target objects, without any trans-
formation. An example of the former is the ranking by pairwise comparison of
[11]. Examples of algorithms that were adapted to deal with rankings as the
target objects include decision trees [6,23], naive Bayes [1] and k-Nearest Neighbor [3,6].
Some of the latter adaptations are based on the statistical distribution of rankings (e.g., [5]), while others are based on rank correlation measures (e.g., [19,23]). In this paper we carry out an empirical evaluation of decision tree approaches for LR based on correlation measures and compare them to distribution-based approaches.
2 Label Ranking
Splitting Criterion. The splitting criterion is a measure that quantifies the quality of a given partition of the data. It is usually applied to all the possible splits of the data that can be made based on individual tests of the attributes.
In RT, the goal is to obtain leaf nodes containing examples whose target rankings are as similar to each other as possible. To assess the similarity between the rankings of a set of training examples, we compute the mean correlation between them, using Spearman's correlation coefficient. The quality of a split is given by the weighted mean correlation of the values obtained for the subsets, where the weight is given by the number of examples in each subset.
The splitting criterion of ranking trees is illustrated both for nominal and
numerical attributes in Table 1. The nominal attribute x1 has three values
(a, b and c). Therefore, three binary splits are possible. For the numerical
attribute x2 , a split can be made in between every pair of consecutive values. In
this case, the best split is x1 = c, with a mean correlation of 0.5 for the training
examples that verify the test and a mean correlation of 0.2 for the remaining,
i.e., the training examples for which x1 = a or x1 = b.
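A minimal R sketch of this splitting criterion is given below: for a candidate split, the mean pairwise Spearman correlation of the rankings in each subset is weighted by the subset size. The rankings used are illustrative, not those of Table 1.

```r
mean_spearman <- function(rankings) {          # rankings: one ranking per row
  cors <- cor(t(rankings), method = "spearman")
  mean(cors[upper.tri(cors)])
}

split_quality <- function(rankings, in_left) { # weighted mean correlation
  left  <- rankings[in_left, , drop = FALSE]
  right <- rankings[!in_left, , drop = FALSE]
  (nrow(left) * mean_spearman(left) + nrow(right) * mean_spearman(right)) /
    nrow(rankings)
}

rankings <- rbind(c(1, 3, 2, 4), c(2, 3, 1, 4), c(4, 1, 3, 2), c(3, 1, 4, 2))
split_quality(rankings, in_left = c(TRUE, TRUE, FALSE, FALSE))
```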
λ1 λ2 λ3 λ4
π1 1 3 2 4
π2 2 1 4 3
π 1.5 2 3 3.5
π̂ 1 2 3 4
where K is the number of distinct rankings in S and kt (S) is the average nor-
malized Kendall τ distance in the subset S:
\[
k_t(S) = \frac{\sum_{i=1}^{K} \sum_{j=1}^{n} \frac{\tau(\pi_i, \pi_j) + 1}{2}}{K \times n_S}
\]
where K is the number of distinct target values in S.
As in Section 2.1, the leaves of the tree should not be forced to be pure. Instead, a stopping criterion should be used, to avoid overfitting and to be robust to noise in the rankings. As shown in [20], the MDLPC criterion can be used as a splitting criterion with the adapted version of entropy, Hranking. This entropy measure also works with partial orders; however, in this work we only use total orders.
One other ranking tree approach, based on the Gini impurity, which will not be presented in detail in this work, was proposed in [25].
3 Experimental Setup
The data sets used in this work were taken from the KEBI Data Repository of the Philipps University of Marburg [6] (Table 3). Two different transformation methods were
used to generate these datasets: (A) the target ranking is a permutation of the
classes of the original target attribute, derived from the probabilities generated
by a naive Bayes classifier; (B) the target ranking is derived for each example
from the order of the values of a set of numerical variables, which are no longer
used as independent variables. Although these are somewhat artificial datasets,
they are quite useful as benchmarks for LR algorithms.
The statistics of the datasets used in our experiments are presented in Table 3. Uπ is the proportion of distinct target rankings in a given dataset.
The code for all the examples in this paper has been written in R ([16]).
The performance of the LR methods was estimated using a methodology that has been used previously for this purpose [11]: ten-fold cross-validation, with Kendall's τ as the evaluation measure.
4 Results
RT uses a parameter, γ, that can affect the accuracy of the model. A value of γ ≥ 1 does not increase the purity of the nodes, while small γ values will rarely generate any nodes. We varied γ from 0.50 to 0.99 and measured the accuracy on several KEBI datasets.
To show to what extent γ affects the accuracy of RT, Figure 1 presents the results obtained for some of the datasets in Table 3. From Figure 1 it is clear that γ plays an important role in the accuracy of RT. The best values seem to lie between 0.95 and 0.98. We use γ = 0.98 for the Ranking Trees (RT).
Table 4 presents the results obtained by the two methods, in comparison with the results for Label Ranking Trees (LRT) reported in [6]. Even though LRT performs better in most of the cases presented, both RT and ERT obtain values close to it and thus give interesting results.
To compare the different ranking methods we use the approach proposed in [4], which combines Friedman's test with Dunn's multiple comparison procedure [14]. First we run Friedman's test to check whether the results are different or not, with the following hypotheses:
Table 4. Results obtained for Ranking Trees on KEBI datasets. (The mean accuracy
is represented in terms of Kendall’s tau, τ )
RT ERT LRT
authorship .879 .890 .882
bodyfat .104 .183 .117
calhousing .181 .292 .324
cpu-small .461 .437 .447
elevators .710 .758 .760
fried .796 .773 .890
glass .881 .854 .883
housing .773 .704 .797
iris .964 .853 .947
pendigits .055 .042 .935
segment .895 .902 .949
stock .854 .859 .895
vehicle .813 .786 .827
vowel .085 .054 .794
wine .899 .907 .882
wisconsin -.039 -.035 .343
        RT       ERT      LRT
RT      -        1.0000   0.2619
ERT     1.0000   -        0.1529
LRT     0.2619   0.1529   -
Using the friedman.test function from the stats package [16], we obtained a p-value below 1%, which shows strong evidence against H0.
Knowing that there are some differences between the 3 methods, we then test which methods differ from one another with Dunn's multiple comparison procedure [14]. Using the R package dunn.test [9] with a Bonferroni adjustment, as in [4] (a short sketch of this procedure is given after the hypotheses below), we tested the following hypotheses for each pair of methods a and b:
H0 . There is no difference in the mean average correlation coefficients between
a and b
H1 . There is some difference in the mean average correlation coefficients between
a and b
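The sketch below illustrates the two-step testing procedure on placeholder accuracy values; `acc` is an assumed matrix with one row per dataset and one column per method, not the values of Table 4.

```r
library(dunn.test)

set.seed(1)
acc <- cbind(RT  = runif(16),    # placeholder Kendall tau values per dataset
             ERT = runif(16),
             LRT = runif(16))

friedman.test(acc)               # H0: no difference between the three methods

scores  <- as.vector(acc)
methods <- rep(colnames(acc), each = nrow(acc))
dunn.test(scores, methods, method = "bonferroni")   # pairwise comparisons
```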
The p-values obtained are presented in Table 5. They indicate that there is no statistically strong evidence that the methods are different. Another conclusion is that RT and ERT are essentially equivalent approaches. While RT and ERT do not seem to outperform LRT in most of the cases studied, the statistical tests allow us to say that both approaches are competitive.
5 Conclusions
In this work we implemented a decision tree method for Label Ranking, Ranking Trees (RT), and proposed an alternative approach, Entropy-based Ranking Trees (ERT). We also presented an empirical evaluation, on several datasets, of the correlation-based methods RT and ERT, and compared them with the state-of-the-art distribution-based Label Ranking Trees (LRT). The results indicate that both RT and ERT are reliable LR methods.
Our implementation of Ranking Trees (RT) shows that the method is a competitive approach in the LR field. We showed that the input parameter γ can have a great impact on the accuracy of the method. The tests performed on the KEBI datasets indicate that the best results are obtained when 0.95 < γ < 1.
The method proposed in this paper, ERT, which uses IG as a splitting criterion, achieved results very similar to those of the RT presented in [17]. Statistical tests indicated that there is no strong evidence that the methods (RT, ERT and LRT) are significantly different. This means that both RT and ERT are valid approaches and, since they are correlation-based methods, we can also say that this kind of approach is worth pursuing.
References
1. Aiguzhinov, A., Soares, C., Serra, A.P.: A similarity-based adaptation of naive
bayes for label ranking: application to the metalearning problem of algorithm
recommendation. In: Pfahringer, B., Holmes, G., Hoffmann, A. (eds.) DS 2010.
LNCS, vol. 6332, pp. 16–26. Springer, Heidelberg (2010)
2. Blockeel, H., Raedt, L.D., Ramon, J.: Top-down induction of clustering trees.
CoRR cs.LG/0011032 (2000). http://arxiv.org/abs/cs.LG/0011032
3. Brazdil, P., Soares, C., Costa, J.: Ranking Learning Algorithms: Using IBL and
Meta-Learning on Accuracy and Time Results. Machine Learning 50(3), 251–277
(2003)
4. Brazdil, P., Soares, C., da Costa, J.P.: Ranking learning algorithms: Using IBL
and meta-learning on accuracy and time results. Machine Learning 50(3), 251–277
(2003). http://dx.doi.org/10.1023/A:1021713901879
5. Cheng, W., Dembczynski, K., Hüllermeier, E.: Label ranking methods based on
the plackett-luce model. In: ICML, pp. 215–222 (2010)
6. Cheng, W., Huhn, J.C., Hüllermeier, E.: Decision tree and instance-based learn-
ing for label ranking. In: Proceedings of the 26th Annual International Confer-
ence on Machine Learning, ICML 2009, June 14–18, Montreal, Quebec, Canada,
pp. 161–168 (2009)
7. Cheng, W., Hüllermeier, E.: Label ranking with abstention: Predicting partial
orders by thresholding probability distributions (extended abstract). Computing
Research Repository, CoRR abs/1112.0508 (2011). http://arxiv.org/abs/1112.0508
8. Cheng, W., Hüllermeier, E., Waegeman, W., Welker, V.: Label ranking with par-
tial abstention based on thresholded probabilistic models. In: Advances in Neural
Information Processing Systems 25: 26th Annual Conference on Neural Informa-
tion Processing Systems 2012. Proceedings of a meeting held December 3–6, Lake
Tahoe, Nevada, United States, pp. 2510–2518 (2012). http://books.nips.cc/papers/
files/nips25/NIPS2012 1200.pdf
9. Dinno, A.: dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums,
r package version 1.2.3 (2015). http://CRAN.R-project.org/package=dunn.test
10. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued
attributes for classification learning. In: Proceedings of the 13th International Joint
Conference on Artificial Intelligence, August 28-September 3, Chambéry, France,
pp. 1022–1029 (1993)
11. Hüllermeier, E., Fürnkranz, J., Cheng, W., Brinker, K.: Label ranking by learning
pairwise preferences. Artificial Intelligence 172(16–17), 1897–1916 (2008)
12. Kendall, M., Gibbons, J.: Rank correlation methods. Griffin London (1970)
13. Mitchell, T.: Machine Learning. McGraw-Hill (1997)
14. Neave, H., Worthington, P.: Distribution-free Tests. Routledge (1992). http://
books.google.nl/books?id=1Y1QcgAACAAJ
15. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81–106 (1986).
http://dx.doi.org/10.1023/A:1022643204877
16. R Development Core Team: R: A Language and Environment for Statistical Com-
puting. R Foundation for Statistical Computing, Vienna, Austria (2010). http://
www.R-project.org ISBN 3-900051-07-0
17. Rebelo, C., Soares, C., Costa, J.: Empirical evaluation of ranking trees on some
metalearning problems. In: Chomicki, J., Conitzer, V., Junker, U., Perny, P. (eds.)
Proceedings 4th AAAI Multidisciplinary Workshop on Advances in Preference
Handling (2008)
18. Ribeiro, G., Duivesteijn, W., Soares, C., Knobbe, A.: Multilayer perceptron for
label ranking. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds.)
ICANN 2012, Part II. LNCS, vol. 7553, pp. 25–32. Springer, Heidelberg (2012)
19. de Sá, C.R., Soares, C., Jorge, A.M., Azevedo, P., Costa, J.: Mining association
rules for label ranking. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011,
Part II. LNCS, vol. 6635, pp. 432–443. Springer, Heidelberg (2011)
20. de Sá, C.R., Soares, C., Knobbe, A.: Entropy-based discretization methods for
ranking data. Information Sciences in Press (2015) (in press)
21. de Sá, C.R., Soares, C., Knobbe, A., Azevedo, P., Jorge, A.M.: Multi-interval
discretization of continuous attributes for label ranking. In: Fürnkranz, J.,
Hüllermeier, E., Higuchi, T. (eds.) DS 2013. LNCS, vol. 8140, pp. 155–169.
Springer, Heidelberg (2013)
22. Spearman, C.: The proof and measurement of association between two things.
American Journal of Psychology 15, 72–101 (1904)
23. Todorovski, L., Blockeel, H., Džeroski, S.: Ranking with predictive clustering trees.
In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI),
vol. 2430, pp. 444–455. Springer, Heidelberg (2002)
24. Vembu, S., Gärtner, T.: Label ranking algorithms: A survey. In: Fürnkranz, J.,
Hüllermeier, E. (eds.) Preference Learning, pp. 45–64. Springer, Heidelberg (2010)
25. Xia, F., Zhang, W., Li, F., Yang, Y.: Ranking with decision tree. Knowl. Inf. Syst.
17(3), 381–395 (2008). http://dx.doi.org/10.1007/s10115-007-0118-y
A Proactive Intelligent Decision Support System
for Predicting the Popularity of Online News
1 Introduction
Decision Support Systems (DSS) were proposed in the mid-1960s and involve the
use of Information Technology to support decision-making. Due to advances in
this field (e.g., Data Mining, Metaheuristics), there has been a growing interest
in the development of Intelligent DSS (IDSS), which adopt Artificial Intelligence
techniques to decision support [1]. The concept of Adaptive Business Intelligence
(ABI) is a particular IDSS that was proposed in 2006 [2]. ABI systems combine
prediction and optimization, which are often treated separately by IDSS, in order
to support decisions more efficiently. The goal is to first use data-driven models
for predicting what is more likely to happen in the future, and then use modern
optimization methods to search for the best possible solution given what can be
currently known and predicted.
With the expansion of the Internet and Web 2.0, there has also been a growing interest in online news, which allow an easy and fast spread of information around the globe. Thus, predicting the popularity of online news is becoming a recent research trend (e.g., [3,4,5,6,7]). Popularity is often measured by considering the number of interactions on the Web and social networks (e.g., number of shares, likes and comments). Predicting such popularity is valuable for authors, content providers, advertisers and even activists/politicians (e.g., to understand or influence public opinion) [4]. According to Tatar et al. [8], there are two main popularity prediction approaches: those that use features only known after publication and those that do not use such features. The first approach is more common (e.g., [3,5,9,6,7]); since the prediction task is easier, higher prediction accuracies are often achieved. The latter approach is scarcer and, while a lower prediction performance might be expected, its predictions are more useful, allowing (as performed in this work) content to be improved prior to publication.
Using the second approach, Petrovic et al. [10] predicted the number of
retweets using features related with the tweet content (e.g., number of hash-
tags, mentions, URLs, length, words) and social features related to the author
(e.g., number of followers, friends, is the user verified). A total of 21 million
tweets were retrieved during October 2010. Using a binary task to discriminate
retweeted from not retweeted posts, a top F-1 score of 47% was achieved when
both tweet content and social features were used. Similarly, Bandari et al. [4]
focused on four types of features (news source, category of the article, subjec-
tivity language used and names mentioned in the article) to predict the number
of tweets that mention an article. The dataset was retrieved from Feedzilla and
related with one week of data. Four classification methods were tested to predict
three popularity classes (1 to 20 tweets, 20 to 100 tweets, more than 100; articles
with no tweets were discarded) and results ranged from 77% to 84% accuracy,
for Naı̈ve Bayes and Bagging, respectively. Finally, Hensinger et al. [11] tested
two binary classification tasks: popular/unpopular and appealing/non-appealing, the latter judged against other articles published on the same day. The data covered ten English news outlets over one year. Using
text features (e.g., bag of words of the title and description, keywords) and
other characteristics (e.g., date of publishing), combined with a Support Vector
Machine (SVM), the authors obtained better results for the appealing task when
compared with popular/unpopular task, achieving results ranging from 62% to
86% of accuracy for the former, and 51% to 62% for the latter.
In this paper, we propose a novel proactive IDSS that analyzes online news
prior to their publication. Assuming an ABI approach, the popularity of a can-
didate article is first estimated using a prediction module and then an optimiza-
tion module suggests changes in the article content and structure, in order to
maximize its expected popularity. To the best of our knowledge, there are no previous works that have addressed such a proactive ABI approach, combining prediction
and optimization for improving the news content. The prediction module uses a
large list of inputs that includes purely new features (when compared with the
literature [4,11,10]): digital media content (e.g., images, video); earlier popular-
ity of news referenced in the article; average number of shares of keywords prior
to publication; and natural language features (e.g., title polarity, Latent Dirich-
let Allocation topics). We adopt the common binary (popular/unpopular) task
and test five state of the art methods (e.g., Random Forest, Adaptive Boosting,
SVM), under a realistic rolling windows scheme. Moreover, we use the trendy Mashable
(mashable.com/) news content, which was not previously studied when predict-
ing popularity, and collect a recent and large dataset covering the last two years (a much larger time period when compared with the literature). Further-
more, we also optimize news content using a local search method (stochastic hill
climbing) that searches for enhancements in a partial set of features that can be
more easily changed by the user.
We extracted an extensive set of features (47 in total) from the HTML code in order to make the data suitable for learning models, as shown in Table 2.
In the table, the attribute types were classified into: number – integer value;
ratio – within [0, 1]; bool – ∈ {0, 1}; and nominal. Column Type shows within
brackets (#) the number of variables related with the attribute. Similarly to
what is executed in [6,7], we performed a logarithmic transformation to scale
the unbounded numeric features (e.g., number of words in article), while the
nominal attributes were transformed with the common 1-of-C encoding.
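A minimal sketch of this preprocessing step, assuming pandas/NumPy data and illustrative column names (not the authors' exact feature names); the ratio and Boolean attributes are left unchanged and are not shown.

```python
import numpy as np
import pandas as pd

def preprocess(df, numeric_cols, nominal_cols):
    """Log-scale unbounded numeric features and 1-of-C encode nominal ones."""
    out = pd.DataFrame(index=df.index)
    for col in numeric_cols:
        out[col] = np.log1p(df[col])           # log(1 + x) avoids log(0)
    dummies = pd.get_dummies(df[nominal_cols], prefix=nominal_cols)
    return pd.concat([out, dummies], axis=1)
```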
The data extraction and processing module is responsible for collecting the online articles and computing their respective features. The
prediction module first receives the processed data and splits it into training,
validation and test sets (data separation). Then, it tunes and fits the classifica-
tion models (model training and selection). Next, the best classification model
is stored and used to provide article success predictions (popularity estimation).
Finally, the optimization module searches for better combinations of a subset of
the current article content characteristics. During this search, there is a heavy use of the classification model (the oracle). Also, some of the newly searched feature combinations may require recomputing the respective features (e.g.,
average keyword minimum number of shares). In the figure, such dependency is
represented by the arrow between the feature extraction and optimization. Once
the optimization is finished, a list of article change suggestions is provided to
the user, allowing her/him to make a decision.
[Figure: the IDSS modules — Data Extraction and Processing, Data Separation, Model Training and Selection, Prediction, Optimization, and Decision.]
We adopted the scikit-learn [14] library for fitting the prediction models. Similarly to what is done in [10,4,11], we assume a binary classification task,
where an article is considered “popular” if the number of shares is higher than
a fixed decision threshold (D1 ), else it is considered “unpopular”.
In this paper, we tested five classification models: Random Forest (RF);
Adaptive Boosting (AdaBoost); SVM with a Radial Basis Function (RBF) ker-
nel; K-Nearest Neighbors (KNN) and Naı̈ve Bayes (NB). A grid search was used
to search for the best hyperparameters of: RF and AdaBoost (number of trees);
SVM (C trade-off parameter); and KNN (number of neighbors). During this grid
search, the training data was internally split into training (70%) and validation
sets (30%) by using a random holdout split. Once the best hyperparameter is selected, the model is fit to all training data.
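A minimal sketch of this tuning procedure with scikit-learn, shown for the Random Forest only; the 70/30 random holdout and the refit on all training data follow the text, while the use of AUC as the validation metric and the variable names are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def tune_random_forest(X_train, y_train, n_trees_grid=(10, 20, 50, 100, 200, 400)):
    """Grid search over one hyperparameter using a single 70/30 random holdout,
    then refit the selected configuration on the whole training window."""
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.30, random_state=0)
    best_auc, best_n = -1.0, n_trees_grid[0]
    for n in n_trees_grid:
        model = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_fit, y_fit)
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        if auc > best_auc:
            best_auc, best_n = auc, n
    return RandomForestClassifier(n_estimators=best_n, random_state=0).fit(X_train, y_train)
```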
2.4 Optimization
Local search optimizes a goal by searching within the neighborhood of an initial
solution. This type of search suits our IDSS optimization module, since it receives
an article (the initial solution) and then tries to increase its predicted popularity
probability by searching for possible article changes (within the neighborhood of
the initial solution). An example of a simple local search method is hill climbing, which iteratively searches within the neighborhood of the current solution and updates this solution when a better one is found, until a local optimum is reached or the method is stopped. In this paper, we used stochastic hill climbing [2], which works like pure hill climbing except that worse solutions can be accepted with a probability P. We tested several values of P, ranging
from P = 0 (hill climbing) to P = 1 (Monte-Carlo random search).
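A minimal sketch of the stochastic hill climbing variant just described; `neighbors` is assumed to generate the feature perturbations of Table 3 and `score` to return the classifier's predicted probability of the “popular” class (the oracle).

```python
import random

def stochastic_hill_climbing(x0, neighbors, score, p, max_iter=100):
    """Hill climbing that may also accept a worse neighbor with probability p
    (p = 0 -> pure hill climbing, p = 1 -> Monte-Carlo random search)."""
    current = best = x0
    for _ in range(max_iter):
        candidate = random.choice(neighbors(current))   # one random perturbation
        if score(candidate) > score(current) or random.random() < p:
            current = candidate
            if score(current) > score(best):
                best = current
    return best
```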
For evaluating the quality of the solutions, the local search maximizes the
probability for the “popular” class, as provided by the best classification model.
Moreover, the search is only performed over a subset of features that are more
suitable to be changed by the author (adaptation of content or change in day of
publication), as detailed in Table 3. In each iteration, the neighborhood search
space assumes small perturbations (increase or decrease) in the feature original
values. For instance, if the current number of words in the title is n = 5, then a
search is executed for a shorter (n = 4) or longer (n = 6) title. Since the day of
the week was represented as a nominal variable, a random selection for a different
day is assumed in the perturbation. Similarly, given that the set of keywords (K)
is not numeric, a different perturbation strategy is proposed. For a particular
article, we compute a list of suggested keywords K that includes words that
appear more than once in the text and that were used as keywords in previous
articles. To keep the problem computationally tractable, we only considered the
best five keywords in terms of their previous average shares. Then, we generate perturbations by adding one of the suggested keywords or by removing one of the current keywords.
Feature                               Perturbations
Number of words in the title (n)      n′ ∈ {n − 1, n + 1}, n′ ≥ 0 ∧ n′ ≠ n
Number of words in the content (n)    n′ ∈ {n − 1, n + 1}, n′ ≥ 0 ∧ n′ ≠ n
Number of images (n)                  n′ ∈ {n − 1, n + 1}, n′ ≥ 0 ∧ n′ ≠ n
Number of videos (n)                  n′ ∈ {n − 1, n + 1}, n′ ≥ 0 ∧ n′ ≠ n
Day of week (w)                       w′ ∈ [0..7), w′ ≠ w
Keywords (K)                          K′ ∈ {K ∪ {i}} ∪ {K \ {j}}, i ∉ K ∧ j ∈ K
For the prediction experiments, we adopted the rolling windows scheme with
a training window size of W = 10, 000 and performing L = 1, 000 predic-
tions at each iteration. Under this setup, each classification model is trained
29 times (iterations), producing 29 prediction sets (each of size L). For defining a popular class, we used a fixed value of D1 = 1,400 shares, which resulted in a balanced “popular”/“unpopular” class distribution in the first training set (first 10,000 articles). The selected grid search ranges for the hyperparameters were: RF and AdaBoost – number of trees ∈ {10, 20, 50, 100, 200, 400}; SVM – C ∈ {2^0, 2^1, ..., 2^6}; and KNN – number of neighbors ∈ {1, 3, 5, 10, 20}.
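A minimal sketch of this rolling windows evaluation, assuming the articles are stored in NumPy arrays ordered by publication date and that `fit_fn` encapsulates the tuning and fitting routine described above; with 39,000 articles, W = 10,000 and L = 1,000, the loop yields the 29 iterations mentioned here.

```python
def rolling_windows(X, y, fit_fn, W=10_000, L=1_000):
    """Train on the W most recent articles, predict the next L, slide by L."""
    predictions = []
    start = 0
    while start + W + L <= len(X):
        model = fit_fn(X[start:start + W], y[start:start + W])
        test = slice(start + W, start + W + L)
        predictions.append((test, model.predict_proba(X[test])[:, 1]))
        start += L
    return predictions
```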
Table 4 shows the obtained classification metrics, as computed over the union
of all 29 test sets. In the table, the models were ranked according to their per-
formance in terms of the AUC metric. The left of Figure 2 plots the ROC curves
of the best (RF), worst (NB) and baseline (diagonal line, corresponds to ran-
dom predictions) models. The plot confirms the RF superiority over the NB
model for all D2 thresholds, including more sensitive (x-axis values near zero,
D2 >> 0.5) or specific (x-axis near one, D2 << 0.5) trade-offs. For the best
model (RF), the right panel of Figure 2 shows the evolution of the AUC metric over time.
Table 4. Comparison of models for the rolling window evaluation (best values in bold).
Fig. 2. ROC curves (left) and AUC metric distribution over time for RF (right).
Table 5 shows the relative importance (column Rank shows ratio values,
# denotes the ranking of the feature), as measured by the RF algorithm when
trained with all data (39,000 articles). Due to space limitations, the table shows
the best 15 features and also the features that are used by the optimization
module. The keyword-related features have the strongest importance, followed by LDA-based features and shares of Mashable links. In particular, the features
that are optimized in the next section (with keywords subset) have a strong
importance (33%) in the RF model.
3.2 Optimization
For the optimization experiments, we used the best classification model (RF),
as trained during the last iteration of the rolling windows scheme. Then, we
selected all articles from the last test set (N = 1, 000) to evaluate the local
search methods. We tested six stochastic hill climbing probabilities (P ∈
{0.0, 0.2, 0.4, 0.6, 0.8, 1.0}). We also tested two feature optimization subsets
related with Table 3: using all features except the keywords (without keywords)
and using all features (with keywords). Each local search is stopped after 100 iter-
ations. During the search, we store the best results associated with the iterations
I ∈ {0, 1, 2, 4, 8, 10, 20, 40, 60, 80, 100}.
Figure 3 shows the final optimization performance (after 100 iterations) for
variations of the stochastic probability parameter P and when considering the
two feature perturbation subsets. The convergence of the local search (for differ-
ent values of P ) is also shown in Figure 3. The extreme values of P (0 – pure hill
climbing; 1 – random search) produce lower performances when compared with
their neighbor values. In particular, Figure 4 shows that the pure hill climbing
is too greedy, performing a fast initial convergence that quickly gets flat. When
using the without keywords subset, the best value of P is 0.2 for MG and 0.4
for CR metric. For the with keywords subset, the best value of P is 0.8 for both
optimization metrics. Furthermore, the inclusion of keywords-related suggestions
produces a substantial impact in the optimization, increasing the performance
in both metrics. For instance, the MG metric increases from 0.05 to 0.16 in the
best case (P = 0.8). Moreover, Figure 3 shows that the without keywords subset
optimization is an easier task when compared with the with keywords search.
As argued by Zhang and Dimitroff [17], metadata can play an important role in webpage visibility, which might explain the influence of the keywords both when predicting (Table 5) and when optimizing popularity
(Figure 3).
For demonstration purposes, Figure 5 shows an example of the interface
of the implemented IDSS prototype. A more recent article (from January 16
2015) was selected for this demonstration. The IDSS, in this case using the
without keywords subset, estimated an increase in the popularity probabil-
ity of 13 percentage points if several changes are executed, such as decreas-
ing the number of title words from 11 to 10. In another example (not shown in
the figure), using the with keywords subset, the IDSS advised a change from the keyword set {“television”, “showtime”, “uncategorized”, “entertainment”, “film”, “homeland”, “recaps”} to the set {“film”, “relationship”, “family”, “night”} for an article about the end of the “Homeland” TV show.
Fig. 3. Stochastic probability (P ) impact on the Mean Gain (left) and in the Conver-
sion Rate (right).
Fig. 4. Convergence of the local search under the without keywords (left) and with
keywords (right) feature subsets (y-axis denotes the Mean Gain and x-axis the number
of iterations).
4 Conclusions
With the expansion of the Web, there is a growing interest in predicting online
news popularity. In this work, we propose an Intelligent Decision Support System
(IDSS) that first extracts a broad set of features that are known prior to an article
publication, in order to predict its future popularity, under a binary classification
task. Then, it optimizes a subset of the article features (that are more suitable
to be changed by the author), in order to enhance its expected popularity.
Using a large and recent dataset, with 39,000 articles collected during a two-year
period from the popular Mashable news service, we performed a rolling win-
dows evaluation, testing five state of the art classification models under distinct
metrics. Overall, the best result was achieved by a Random Forest (RF), with
an overall area under the Receiver Operating Characteristic (ROC) curve of
73%, which corresponds to an acceptable discrimination. We also analyzed the
importance of the RF inputs, revealing the keyword based features as one of the
most important, followed by Natural Language Processing features and previ-
ous shares of Mashable links. Using the best prediction model as an oracle, we
explored several stochastic hill climbing search variants, aiming to increase the estimated article popularity probability by changing two subsets of the article features (e.g., number of words in the title). When optimizing 1,000 articles (from
the last rolling windows test set), we achieved 15 percentage points in terms of
the mean gain for the best local search setup. Considering the obtained results,
we believe that the proposed IDSS is quite valuable for Mashable authors.
In future work, we intend to explore more advanced features related to con-
tent, such as trends analysis. Also, we plan to perform tracking of articles over
time, allowing the usage of more sophisticated forecasting approaches.
References
1. Arnott, D., Pervan, G.: Eight key issues for the decision support systems discipline.
Decision Support Systems 44(3), 657–672 (2008)
2. Michalewicz, Z., Schmidt, M., Michalewicz, M., Chiriac, C.: Adaptive business
intelligence. Springer (2006)
3. Ahmed, M., Spagna, S., Huici, F., Niccolini, S.: A peek into the future: predicting
the evolution of popularity in user generated content. In: Proceedings of the sixth
ACM international conference on Web search and data mining, pp. 607–616. ACM
(2013)
4. Bandari, R., Asur, S., Huberman, B.A.: The pulse of news in social media: fore-
casting popularity. In: ICWSM (2012)
5. Kaltenbrunner, A., Gomez, V., Lopez, V.: Description and prediction of Slashdot activity. In: Latin American Web Conference (LA-WEB 2007), pp. 57–66. IEEE (2007)
6. Szabo, G., Huberman, B.A.: Predicting the popularity of online content. Commu-
nications of the ACM 53(8), 80–88 (2010)
7. Tatar, A., Antoniadis, P., De Amorim, M.D., Fdida, S.: From popularity prediction
to ranking online news. Social Network Analysis and Mining 4(1), 1–12 (2014)
8. Tatar, A., de Amorim, M.D., Fdida, S., Antoniadis, P.: A survey on predicting the
popularity of web content. Journal of Internet Services and Applications 5(1), 1–20
(2014)
9. Lee, J.G., Moon, S., Salamatian, K.: Modeling and predicting the popularity of
online contents with cox proportional hazard regression model. Neurocomputing
76(1), 134–145 (2012)
10. Petrovic, S., Osborne, M., Lavrenko, V.: RT to win! predicting message propagation
in twitter. In: Fifth International AAAI Conference on Weblogs and Social Media
(ICWSM), pp. 586–589 (2011)
11. Hensinger, E., Flaounas, I., Cristianini, N.: Modelling and predicting news
popularity. Pattern Analysis and Applications 16(4), 623–635 (2013)
12. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of Machine
Learning Research 3, 993–1022 (2003)
13. De Smedt, T., Nijs, L., Daelemans, W.: Creative web services with pattern. In:
Proceedings of the Fifth International Conference on Computational Creativity
(2014)
14. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine
learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
15. Fawcett, T.: An introduction to roc analysis. Pattern Recognition Letters 27(8),
861–874 (2006)
16. Tashman, L.J.: Out-of-sample tests of forecasting accuracy: an analysis and review.
International Journal of Forecasting 16(4), 437–450 (2000)
17. Zhang, J., Dimitroff, A.: The impact of metadata implementation on webpage
visibility in search engine results (part ii). Information Processing & Management
41(3), 691–715 (2005)
Periodic Episode Discovery Over Event Streams
1 Introduction
The main contributions of the paper are: a new frequent parallel episode min-
ing algorithm on data streams; and a heuristic for the online estimation of the
periodicity of the episodes. The rest of the paper is organized as follows: section 2
presents some prominent related work. Section 3 details our proposal for frequent periodic pattern mining and updating. Experiments (section 4) on two
real datasets illustrate the interest of this approach. Finally, some conclusions
are drawn, and ideas for future work are presented.
2 Related Work
Frequent episode mining has attracted a lot of attention since its introduction
by Mannila et al. [12]. The algorithms (e.g. [11–13,18]) differ from one another
by their target episodes (sequential or parallel), their search strategies (breadth
or depth first search), the considered occurrences (contiguous, minimal, overlap-
ping, etc.), and the way they count support. However, most algorithms consider
only static data. The formalism used in this paper (see section 3.1) is loosely inspired by the formalism used in [11] and [18].
With the rapidly increasing number of data-recording devices (network traffic monitoring, smart houses, sensor networks, ...), stream data mining has gained major attention. This evolution has led to paradigm shifts. For an extensive problem
statement and review of the current trends, see [6]. In particular, item set [3,17]
and episode [11,13,14] mining in streams have been investigated. The application
context of [14] is close to the behavior we are searching for: the focus is set on the
extraction of human activities from home automation sensor streams. However,
periodicity is not taken into account.
Due to their powerful descriptive and predictive capabilities, periodic pat-
terns are studied in several domains. For instance, Kiran and Reddy [8] discover
frequent and periodic patterns in transactional databases. Periodicity is also
defined and used with event sequences, for example with the study of parallel
episodes in home automation sensor data for the monitoring of elderly peo-
ple [7,15]. These three periodic pattern mining algorithms process only static
data.
To the best of our knowledge, few studies have focused on mining both fre-
quent and periodic episodes over data streams. One can however point out some
rather close studies: Li et al. [10] and Baratchi et al. [2] both use geo-spatial data
in order to detect areas of interest for an individual (respectively eagles and peo-
ple) and periodic movement patterns. They both also determine the period of the
discovered patterns. However, their periodicity descriptions are based on single
events, not episodes.
[Figure 1: example event stream — events a b c a c c b a c d b at timestamps 50 to 60.]
Fig. 2. Example of a histogram representing the observed occurrence times for some
daily habits of an elderly person living in a smart home
[Figure 3 diagram not reproduced: frequent episode lattice with per-node supports and time queues for episodes {a}, {b}, {c}, {d}, {a, b}, {a, c}, {b, c}; the legend marks frequent episodes that were recently observed and the elements newly added by the update with (b, 60).]
Fig. 3. Lattice corresponding to the example stream figure 1, when (b, 60) is the last
seen event. The update when event (b, 60) arrives is highlighted in boldface.
Episode Lattice. The frequent episodes and their time queues are stored in
a frequent episode lattice (FEL). The nodes in the FEL correspond either to
length-1 episodes, or to frequent episodes. Length-1 episodes are kept even if they are not frequent (yet), in order to build longer episodes when they become frequent. The parents
of a node (located at depth d) correspond to its sub-episodes of length d − 1, and
its children to its super-episodes of length d + 1. The edges linking two episodes
are indexed on the only event label that is present in the child episode but
not in the parent. Each node retains the TQ of the corresponding episode and
the GMM description that best fits the episode (see section 3.3). The episode
lattice corresponding to the example in figure 1 is given in figure 3. In spite of
its possibly big edge count, the lattice structure was chosen over the standard
prefix tree because it allows faster episode retrieval and update.
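A minimal sketch of how an FEL node could be represented, following the description above; the field names and types are illustrative, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class EpisodeNode:
    """One node of the frequent episode lattice (FEL)."""
    labels: frozenset                      # event labels forming the episode
    time_queue: List[Tuple[int, int]] = field(default_factory=list)  # minimal occurrences (start, end)
    gmm: object = None                     # periodicity description (see section 3.3)
    parents: Dict[str, "EpisodeNode"] = field(default_factory=dict)   # keyed by the missing label
    children: Dict[str, "EpisodeNode"] = field(default_factory=dict)  # keyed by the added label

    def parent(self, label):               # supports the N.parent(e').child(e)
        return self.parents[label]         # navigation used during updates
    def child(self, label):
        return self.children[label]
```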
Update with a New Event. We keep track of the recently modified nodes
(RMN, the nodes describing an episode that occurred recently, i.e. less than Tep ,
or TW , ago). Indeed (see observations 4a and 4b), the recent occurrences of these
episodes can be extended with new, incoming events to form longer episodes. The
RMN are stored in a collection of lists (nodes at depth 1, depth 2, etc). The TQ
of a newly frequent length-n episode is computed thanks to the time queues of a
length-(n-1) sub-episode and the length-1 episode containing the missing item,
using algorithm 1.
When a new event (e, t) arrives, it can be a new occurrence (and also a MO) of
the length-1 episode {e}. It can also form a new MO of an episode E′ = E ∪ {e}, where E is a recently observed episode. The lattice update follows these steps:
1. If label e is new: create a node for episode {e} and link it to the FEL root;
2. Update the time queue of episode {e};
3. If {e} is frequent:
(a) Add it to the RMN list;
(b) For each node NE in the RMN list, try to build a new occurrence of E ∪ {e}, following algorithm 2, which takes advantage of observations 1–4. If an episode E′ = E ∪ {e} becomes frequent, a new node NE′ is created, and is linked to its parents in the lattice. The parents are the nodes describing the episodes E′ \ {e′} for each e′ ∈ E′, and are accessible via NE.parent(e′).child(e), where NE is the node for the known subset E. Since the RMN list is layered, and explored by increasing node depth, NE.parent(e′).child(e) is always created before NE′ tries to access it.
The update process of the FEL is illustrated with the arrival of a new event
(b, 60) in figure 3. (b, 60) makes {b} a frequent episode. The nodes in the RMN list ({a}, {c} and {a, c}) are candidates for extension with the (frequent)
new event. This allows the investigation of episodes {a, b} (extension of {a}),
{b, c} (extension of {c}), and {a, b, c} (extension of {a, c}), which indeed become
frequent.
[Diagram: periodicity-update workflow — on each new occurrence, check whether a component matches, update the distribution or create a new component, remove empty components, merge close components, and run an EM iteration.]
Algorithm 3 presents the general workflow for the periodicity update. When
a new MO is detected for the episode, the position of the timestamp in the
period tr = timestamp modulo period is computed. If tr does not match any
of the existing components, i.e. for each component (μ, σ), |tr − μ| > σ, a new
component is added. When outdated data is removed, some components lose
their importance. When too rare, they are removed from the GMM. Finally, when
two components (μ1 , σ1 ), (μ2 , σ2 ) become close to one another, i.e if |μ1 − μ2 | <
a ∗ (σ1 + σ2 ) (with a = 1.5 in the experiments), the two components are merged.
In the general case, GMM updates do not change the model much. Thus, when
the number of components does not change, a single EM iteration is necessary
to update the characteristics of the components.
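A minimal sketch of these matching and merging rules; the initial standard deviation of a new component, the weights, and the merged parameters are assumptions, and both the removal of rare components and the single EM refinement iteration are omitted.

```python
def update_periodicity(components, timestamp, period, a=1.5):
    """components: list of (mu, sigma, weight) tuples describing the GMM over
    occurrence times modulo the period; returns the updated component list."""
    comps = list(components)
    tr = timestamp % period
    # If tr matches no existing component (|tr - mu| > sigma for all), add one.
    if all(abs(tr - mu) > sigma for mu, sigma, _ in comps):
        comps.append((tr, 0.05 * period, 1.0))           # initial sigma/weight are assumptions
    # Merge components that became close: |mu1 - mu2| < a * (sigma1 + sigma2).
    merged, used = [], set()
    for i, (m1, s1, w1) in enumerate(comps):
        if i in used:
            continue
        for j in range(i + 1, len(comps)):
            m2, s2, w2 = comps[j]
            if j not in used and abs(m1 - m2) < a * (s1 + s2):
                m1 = (w1 * m1 + w2 * m2) / (w1 + w2)     # weighted mean (assumption)
                s1, w1 = max(s1, s2), w1 + w2
                used.add(j)
        merged.append((m1, s1, w1))
    return merged
```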
The interest of this approach was evaluated on synthetic data, following known mixtures of Gaussian models evolving with time. The heuristics allow the detection of the main trends in the data: the emergence of new components, the disappearance of old and rare components, and shifts in the characteristics of the components.
4 Experimentation
A prototype was implemented in Python. It was also instrumented to record the
episodes and lattice updates. The instrumentation slows down the experiments: the execution times given in the next subsections are over-estimated.
[Figure 4: (a) event count in the window, (b) interesting episode count (frequent and periodic), and (c) cumulative execution time, plotted over time (December to June).]
The dataset was processed using a period of one day, a window TW of 3 weeks,
a minimal support Smin of 15, a maximal episode duration Tep of 30 minutes,
and an accuracy threshold of 70%. The parameter setting was reinforced by a
descriptive analysis of the data (e.g., it showed that most activities last less
than 30 minutes). The results obtained throughout the course of the execution
are given in figure 4. During the first 3 weeks, the sliding window fills with the
incoming events, and the first frequent and periodic episodes appear. Then, the
number of events in the window remains quite stable, but the behaviors keep
evolving. The execution time (figure 4c) shows the scalability of the approach
for this kind of application. The contents of the FEL in the last window were investigated; some of the periodic episodes with the highest accuracy A are:
– {Sleeping end}: 50 MO, 1 component, μ = 6:00, σ = 2 hours, A = 100%
– {Sleeping end, Meal Preparation begin, Meal Preparation end, Relax begin}:
26 MO, 1 component, μ =6:00, σ =1.45 hours, A = 82%
– {Enter Home begin, Enter Home end}: 61 MO, 1 component, μ =14:00, σ =3
hours, A = 88%
These patterns can be interpreted as habits: the person woke up every morn-
ing around 6:00, and also had breakfast in 82% of the mornings. The third
episode describes a movement pattern: the inhabitant usually leaves home at some time (captured by another episode) and comes back in the early afternoon.
Figure 5 presents the influence of the minimal support Smin , maximal episode
duration Tep , and window length TW on the maximal size of the FEL and the
execution time. In particular, it shows that the execution time is reasonable and
scalable. The duration of the episodes also has a large impact on the size of the
FEL.
[Figure 5, panels (e) and (f): frequent and periodic episode counts, and total execution time, as a function of the window duration (weeks).]
Fig. 5. Influence of the algorithm configuration on the frequent and periodic episode
counts, and on the execution time for the CASAS Aruba dataset
Fig. 6. Execution log for the Travian fr5 alliance membership dataset: (a) event count in the window, (b) interesting episode count, (c) cumulative execution time.
players alliance shifts: the event labels look like “Player P [joined|left] alliance
A”. 27674 such events are recorded, but most labels are rare (25985 labels).
The dataset was processed with a period of one week, a window TW = six
weeks, a minimal support Smin = 5 and a maximal episode duration Tep = 1 day.
Figure 6 presents the evolution of the window size, episode counts, and execution
time during the mining. The results are fairly different from those of the home
automation dataset, but were explained by a player (picturing a domain expert).
During the first 6 weeks, the window fills rapidly with events: new players register
onto the game, and the diplomacy begins. The players join or switch alliances.
After the 6 weeks, the event count in the window decreases with time. There are several explanations: (i) the opening of a new game round (on August 22nd) slowed down new player registrations (players tend to join the most recent game round); (ii) most players have found an alliance they like and stop changing alliances. Until October, few frequent and periodic patterns are detected, but their number increases rapidly after that. The periodic episodes discovered in the September 18th – October 30th window (maximal count of periodic episodes) notably contain:
5 Conclusion
Behavior pattern (episode) mining over event sequences is an important data
mining problem, with many applications, in particular for ambient assisted liv-
ing, or wildlife behavior monitoring. Several frequent episode mining algorithms
have been proposed for both static data and data streams. But while periodic-
ity can also be an interesting characteristic for the study of behaviors, very few
algorithms have addressed frequent and periodic patterns. We propose an effi-
cient algorithm to mine frequent periodic episodes in data streams. We briefly
illustrate the interest of this algorithm with two case studies. As a perspective of this work, the experiments could be extended and applied to other application domains. It would also be interesting to include a period-determination algorithm in order to automatically adapt the period to each pattern. Closed episodes and non-overlapping occurrences could also be investigated.
References
1. Amphawan, K., Lenca, P., Surarerks, A.: Efficient mining top-k regular-frequent
itemset using compressed tidsets. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S.,
Luo, J. (eds.) PAKDD Workshops 2011. LNCS, vol. 7104, pp. 124–135. Springer,
Heidelberg (2012)
2. Baratchi, M., Meratnia, N., Havinga, P.J.M.: Recognition of periodic behavioral
patterns from streaming mobility data. In: Stojmenovic, I., Cheng, Z., Guo, S. (eds.)
MOBIQUITOUS 2013. LNICST, vol. 131, pp. 102–115. Springer, Heidelberg (2014)
3. Calders, T., Dexters, N., Goethals, B.: Mining frequent itemsets in a stream. In:
ICDM, pp. 83–92 (2007)
4. Cook, D.J., Crandall, A.S., Thomas, B.L., Krishnan, N.C.: Casas: A smart home
in a box. IEEE Computer 46(7), 62–69 (2013)
5. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society. Series B (Method-
ological), 1–38 (1977)
6. Gama, J.: A survey on learning from data streams: current and future trends.
Progress in Artificial Intelligence 1(1), 45–55 (2012)
7. Heierman, E.O., Youngblood, G.M., Cook, D.J.: Mining temporal sequences to
discover interesting patterns. In: KDD Workshop on mining temporal and sequen-
tial data (2004)
8. Kiran, R.U., Reddy, P.K.: Mining periodic-frequent patterns with maximum
items’ support constraints. In: ACM COMPUTE Bangalore Conference, pp. 1–8
(2010)
9. Lahiri, M., Berger-Wolf, T.Y.: Mining periodic behavior in dynamic social net-
works. In: ICDM, pp. 373–382. IEEE Computer Society (2008)
10. Li, Z., Han, J., Ding, B., Kays, R.: Mining periodic behaviors of object movements
for animal and biological sustainability studies. Data Mining and Knowledge Dis-
covery 24(2), 355–386 (2012)
11. Lin, S., Qiao, J., Wang, Y.: Frequent episode mining within the latest time win-
dows over event streams. Appl. Intell. 40(1), 13–28 (2014)
12. Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering frequent episodes in
sequences. In: Fayyad, U.M., Uthurusamy, R. (eds.) KDD, pp. 210–215. AAAI
Press (1995)
13. Patnaik, D., Laxman, S., Chandramouli, B., Ramakrishnan, N.: Efficient episode
mining of dynamic event streams. In: ICDM, pp. 605–614 (2012)
14. Rashidi, P., Cook, D.J.: Mining sensor streams for discovering human activity
patterns over time. In: Webb, G.I., Liu, B., Zhang, C., Gunopulos, D., Wu, X.
(eds.) ICDM 2010, The 10th IEEE International Conference on Data Mining,
Sydney, Australia, December 14–17, 2010, pp. 431–440. IEEE Computer Society
(2010)
15. Soulas, J., Lenca, P., Thépaut, A.: Monitoring the habits of elderly people through
data mining from home automation devices data. In: Reis, L.P., Correia, L.,
Cascalho, J. (eds.) EPIA 2013. LNCS, vol. 8154, pp. 343–354. Springer, Heidelberg
(2013)
16. Surana, A., Kiran, R.U., Reddy, P.K.: An efficient approach to mine periodic-
frequent patterns in transactional databases. In: Cao, L., Huang, J.Z., Bailey, J.,
Koh, Y.S., Luo, J. (eds.) PAKDD Workshops 2011. LNCS, vol. 7104, pp. 254–266.
Springer, Heidelberg (2012)
17. Wong, R.W., Fu, A.C.: Mining top-k frequent itemsets from data streams. Data
Mining and Knowledge Discovery 13(2), 193–217 (2006)
18. Zhu, H., Wang, P., He, X., Li, Y., Wang, W., Shi, B.: Efficient episode mining
with minimal and non-overlapping occurrences. In: ICDM, pp. 1211–1216 (2010)
Forecasting the Correct Trading Actions
1 Introduction
of the variation of prices is a 2.5% increase then the correct decision is to buy
the asset as this will allow covering transaction costs and still have some profit.
Given the deterministic mapping from forecasted values into decisions we can
define the prediction task in two different ways. The first consists of obtaining a numeric prediction model that we can then use to obtain predictions of the
future variation of the prices which are then transformed (deterministically)
into trading decisions (e.g. [1], [2]). The second alternative consists of directly
forecasting the correct trading decisions (e.g. [3], [4], [5]). Which is the best
option in terms of the resulting financial results? To the best of our knowledge, no comparative study has been carried out to answer this question. This is the goal of
the current paper: to compare these two approaches and provide experimental
evidence of the advantages and disadvantages of each alternative.
2 Problem Formalization
The problem of decision making based on forecasts of a numerical (continuous)
value can be formalized as follows. We assume there is an unknown function
that maps the values of p predictor variables into the values of a certain numeric
variable Y . Let f be this unknown function that receives as input a vector x
with the values of the p predictors and returns the value of the target numeric
variable Y whose values are supposed to depend on these predictors,
f : R^p → R,  x → f(x).
We also assume that based on the values of this variable Y some decisions
need to be made. Let g be another function that given the values of this target
numeric variable transforms them into actions/decisions,
g : R → A = {a1, a2, a3, . . .},  Y → g(Y).
The decisions to open or close short/long positions are typically the result of
a deterministic mapping from the predicted prices variation.
In our experiments, we have used the asset prices of 12 companies. Each data set has a minimum of 7 years of daily data and a maximum of 30 years.
In order to simplify the study, we will be working with a one-day horizon, i.e. taking decisions based on the forecasts of the asset's price variation for one day ahead. Moreover, we will be working exclusively with the closing prices of each trading session, i.e. we assume trading decisions are made after the markets close.
The decision function for this application receives as input the forecast of the daily variation of the asset's closing prices and returns a trading action. We will be using the following function in our experiments:
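A plausible sketch of this decision function is given below; the exact thresholds and action labels are assumptions consistent with the 2% variation discussed next, not necessarily the authors' precise definition.

```python
def trading_decision(predicted_variation, threshold=0.02):
    """Map a forecasted one-day price variation into a trading action."""
    if predicted_variation > threshold:
        return "buy"
    if predicted_variation < -threshold:
        return "sell"
    return "hold"
```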
This means we are assuming that any variation above 2% will be sufficient
to cover the transaction costs and still obtain some profit. Concerning the data
that will be used as predictors for the forecasting models (either forecasting the
prices variation (Y ) or directly the trading action (A)) we have used the price
variations on recent days as well as some trading indicators, such as the annual
volatility, the Welles Wilder’s style moving average [6], the stop and reverse point
indicator developed by J. Welles Wilder [6], the usual moving average and others.
The goal of this selection of predictors is to provide the forecasting models with
useful information on the recent dynamics of the assets prices.
Regarding the performance metrics used to compare each approach, we rely on two metrics that capture important properties of the economic results of the trading decisions made by the alternative models. More specifically, we will
use the Sharpe Ratio as a measure of the risk (volatility) associated with the
decisions, and the percentage Total Return as a measure of the overall financial
results of these actions. To make our experiments more realistic we will consider
a transaction cost of 2% for each Buy or Sell decision a model may take.
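A minimal sketch of these two metrics under common conventions (compounded per-trade returns for the Total Return, mean over standard deviation for the Sharpe Ratio); the paper's exact computation details, such as annualization, are not stated and are therefore not assumed here.

```python
import numpy as np

def total_return(returns_pct):
    """Overall percentage return obtained by compounding per-trade returns (in %)."""
    return (np.prod(1 + np.asarray(returns_pct) / 100) - 1) * 100

def sharpe_ratio(returns_pct, risk_free=0.0):
    """Mean excess return divided by its standard deviation (no annualization applied)."""
    r = np.asarray(returns_pct) / 100
    return (r.mean() - risk_free) / r.std(ddof=1)
```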
At this stage it is important to remark that the prediction tasks we are facing have some characteristics that make them particularly challenging. One
of the main hurdles results from the fact that interesting events, from a trading
perspective, are rare in financial markets. In effect, large movements of prices are
not very frequent. This means that the data sets we will provide to the models
have clearly imbalanced distributions of the target variables (both the numeric
percentage variations and the trading actions). Making this imbalance problem harder, the situations that are more interesting from a trading perspective are precisely the rare ones, which creates difficulties for most modelling techniques. In the next section, we will describe some of the measures we have taken to alleviate
this problem.
Table 1. Regression models used for the experimental comparisons. SVM stands for Support Vector Machines, KNN for K-nearest neighbours, NNET for Neural Networks and MARS for Multivariate Adaptive Regression Spline models
Table 2. Classification models used for the experimental comparisons. SVM stands for Support Vector Machines, KNN for K-nearest neighbours, NNET for Neural Networks and MARS for Multivariate Adaptive Regression Spline models
The predictive tasks we are facing have two main difficulties: (i) the fact
that the distribution of the target variables is highly imbalanced, with the more
relevant values being less frequent; and (ii) the fact that there is an implicit order-
ing among the decisions. The first problem causes most modelling techniques to
focus on cases (the most frequent) that are not relevant for the application goals.
The second problem is specific to classification tasks as these algorithms do not
distinguish among the different types of errors, whilst in our target application confusing a Buy decision with a Hold decision is less serious than confusing it
with a Sell.
These two problems lead us to consider several alternatives to our base mod-
elling approaches described in Tables 1 and 2. For the first problem of imbalance
we have considered the hypothesis of using resampling to balance the distri-
bution of the target variable before obtaining the models. In order to do that,
we have used the Smote algorithm [7]. This method is well known for classi-
fication models, consisting basically of oversampling the minority classes and
under-sampling the majority ones. The goal is to modify the data set in order
to ensure that each class is similarly represented. Regarding the regression tasks
we have used the work by Torgo et. al [8], where a regression version of Smote
was presented. Essentially, the concept is the same as in classification, using a
method to try to balance the continuous distribution of the target variable by
oversampling and under-sampling different ranges of its domain.
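A minimal sketch of the classification-side resampling step, using imbalanced-learn's SMOTE as a stand-in for the method of [7]; note that, by default, it only oversamples the minority classes, whereas the text also mentions under-sampling the majority ones, and the regression variant of [8] is not shown.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

def balance_training_set(X_train, y_train, random_state=0):
    """Oversample the rare Buy/Sell classes with SMOTE before model fitting."""
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X_train, y_train)
    print("class counts before:", Counter(y_train), "after:", Counter(y_res))
    return X_res, y_res
```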
We have thoroughly tested the hypothesis that using resampling before obtaining the models would boost the performance of the different models we have considered for our tasks. Our experiments confirmed that resampling led the models to issue more Buy and Sell signals (the ones that are less frequent but more interesting). However, this increased number of signals was accompanied by an increased financial risk that frequently led to very poor economic results, with very few exceptions.
Regarding the second problem of the order among the classes we have also
considered a frequently used approach to this issue. Namely, we have used a
cost-benefit matrix that allows us to distinguish between the different types of
classification errors. Using this matrix, and given a probabilistic classifier, we
can predict for each test case the class that maximises the utility instead of the
class that has the highest probability.
We have used the following procedure to obtain the cost-benefit matrices for
our tasks. Correctly predicted buy/sell signals have a positive benefit estimated
as the average return of the buy/sell signals in the training set. On the other
hand, in the case of incorrectly predicting a true hold signal as buy (or sell ), we
assign it minus the average return of the buy (or sell) signals. Basically, the
benefit associated to correctly predicting one rare signal is entirely lost when the
model suggests an investment when the correct action would be doing nothing.
In the extreme case of confusing the buy and sell signals, the penalty will be
minus the sum of the average returns of each signal. Choosing such a high penalty for these cases pushes the model to be less likely to make this type of very dangerous mistake. Considering the case of incorrectly predicting
a true sell (or buy) signal as hold, we also charge for it, but in a less severe way.
Therefore, the average of the sell (or buy) signal is considered, but divided by
two. This division was our way of “teaching” the model that it is preferable to
miss an opportunity to earn money rather than making the investor lose money.
Finally, correctly predicting a hold signal gives neither penalty nor reward, since no money is won or lost. Table 3 shows an example of such a cost-benefit
matrix that was obtained with the data from 1981-01-05 to 2000-10-13 of Apple.
                Trues
                s       h       b
   Pred    s    0.49   -0.49   -0.82
           h   -0.24    0.00   -0.17
           b   -0.82   -0.33    0.33
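A minimal sketch of utility maximisation with such a matrix, assuming a scikit-learn-style probabilistic classifier; `benefit_matrix[pred][true]` follows the layout of Table 3 (rows: predicted, columns: true).

```python
import numpy as np

def predict_with_utility(model, X, benefit_matrix, classes=("s", "h", "b")):
    """Return, for each row of X, the class with the highest expected utility."""
    proba = model.predict_proba(X)                       # columns ordered as model.classes_
    order = [list(model.classes_).index(c) for c in classes]
    P = proba[:, order]                                  # reorder columns to (s, h, b)
    B = np.asarray(benefit_matrix)                       # shape (3 predictions, 3 true classes)
    expected_utility = P @ B.T                           # E[u(pred)] = sum_true P(true) * B[pred, true]
    return [classes[i] for i in expected_utility.argmax(axis=1)]
```

With the Apple matrix above, `benefit_matrix = [[0.49, -0.49, -0.82], [-0.24, 0.00, -0.17], [-0.82, -0.33, 0.33]]`.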
We have also thoroughly tested the hypothesis that using cost-benefit matri-
ces to implement utility maximisation would improve the performance of the
models. Our tests have shown that nearly half the model variants see their per-
formance boosted with this approach.
4 Experimental Results
This section presents the results of the experimental comparisons between the
two general approaches to making trading decisions based on forecasting models.
Fig. 1. Best classification variant against the best regression one for the Total Return
and Sharpe Ratio metrics (asterisks denote that the respective variant is significantly
better, according to a Wilcoxon test with α = 0.05).
a model obtains a very good result for one company but poor for all the others
(meaning that it was lucky in that specific company), its average ranking will be
low allowing the top average rankings to be populated by the true top models
that perform well across most companies.
Table 4 summarises the results in terms of Total Return. Since we could not
reject the Friedman’s null hypothesis, the post-hoc Nemenyi’s test was not per-
formed. This means that we can not say with 95% confidence that there is some
difference in terms of Total return between these modelling approaches. Never-
theless, there are some observations to remark. The model with the best average
ranking is a classification model using cost-benefit matrices. All the remain-
ing classification variants are in their original form (without using cost-benefit
matrices) and occupying mostly the last positions. Moreover, not a single variant obtained with Smote appears in the top 5 of either approach, which confirms that resampling does not seem to pay off for this class of applications due to the economic costs of making more risky decisions. Furthermore, another
very interesting remark is that all the top models use SVMs as the
base learning algorithm. Overall, we can not say that either of the two approaches (directly forecasting the trading actions using classification models, or first forecasting the price returns using regression models) is better than the other.
Table 5 shows the results of the same experiment in terms of Sharpe Ratio,
i.e. the risk exposure of the alternatives. The conclusions are quite similar to
Table 4. The average rank of the top 5 Classification and Regression models in terms
of Total Return. The Friedman test returned a p-value of 0.3113477, meaning that there
is no statistical difference between all the 10 variants compared. Note: BC means the
model was obtained using a benefit-cost matrix, while (p)/(l) means the SVM model
was obtained using a polynomial/linear kernel. The vx labels represent the different
parameter settings that were considered within each variant.
the Total Return metric. Once again, no significant differences were observed.
Still, one should note that the first 5 places are dominated by the classification
approaches. The best variant for the Total Return is also the best variant for the
Sharpe Ratio, which makes this variant unarguably the best one of our study
when considering the 12 different companies. Hence, ultimately we can state that
the most solid model belongs to the classification approach using an SVM with
cost-benefit matrices, since it obtained the highest returns with the lowest associated risk. Finally, unlike the results for Total Return, in this case we observe other
learning algorithms appearing in the top 5 best results.
Table 5. Top 5 average rankings of the Classification and Regression models for the
Sharpe Ratio. The Friedman test returned a p-value of 0.1037471, implying there is no
statistical difference between all the 10 variants tested.
In conclusion, we can not state that one approach performs definitely bet-
ter than the other in the context of financial trading decisions. The scientific
community typically puts more effort into the regression models, but this study
strongly suggests that both have at least the same potential. Actually, the most
consistent model we could obtain is a classification approach. Another interest-
ing conclusion is that, out of a considerably large set of different types of models, SVMs achieved better results both for classification and for regression tasks.
5 Conclusions
This paper presents a study of two different approaches to financial trading
decisions based on forecasting models. The first, and more conventional, app-
roach uses regression tools to forecast the future evolution of prices and then
uses some decision rules to choose the “correct” trading decision based on these
predictions. The second approach tries to directly forecast the “correct” trading
decision. Our study is a specific instance of the more general problem of mak-
ing decisions based on numerical forecasts. In this paper we have focused on
financial trading decisions because this is a specific domain that requires specific
trade-offs in terms of economic results. This means that our conclusions from
this study in this area should not be generalised to other application domains.
Overall, the main conclusion of this study is that, for this specific application domain, there seems to be no statistically significant difference between
these two approaches to decision making. Given the large set of classification and
regression models that were considered, as well as different approaches to the
learning task, we claim that this conclusion is supported by significant experi-
mental evidence.
The experiments carried out in this paper have also allowed us to draw some
other conclusions in terms of the applicability of resampling and cost-benefit
matrices in the context of financial forecasting. Namely, we have observed that
the application of resampling, although increasing the number of trading deci-
sions made by the models, would typically bring additional financial risks that
would make the models unattractive to traders. On the other hand, the use of cost-benefit matrices, in an effort to maximise the utility of the models' predictions, did bring some advantages to a high percentage of modelling variants.
As future work we plan to extend our comparisons of these two forms of
addressing decision making based on numeric forecasting, to other application
domains, in an effort to provide general guidelines to the community on how to
address these relevant real world tasks.
References
1. Lu, C.J., Lee, T.S., Chiu, C.C.: Financial time series forecasting using independent
component analysis and support vector regression. Decision Support Systems 47(2),
115–125 (2009)
2. Hellström, T.: Data snooping in the stock market. Theory of Stochastic Processes 21, 33–50 (1999)
3. Luo, L., Chen, X.: Integrating piecewise linear representation and weighted support
vector machine for stock trading signal prediction. Applied Soft Computing 13(2),
806–816 (2013)
4. Ma, G.Z., Song, E., Hung, C.C., Su, L., Huang, D.S.: Multiple costs based decision
making with back-propagation neural networks. Decision Support Systems 52(3),
657–663 (2012)
5. Teixeira, L.A., de Oliveira, A.L.I.: A method for automatic stock trading combining
technical analysis and nearest neighbor classification. Expert Systems with Appli-
cations 37(10), 6885–6890 (2010)
6. Wilder, J.: New Concepts in Technical Trading Systems. Trend Research (1978)
7. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research 16(1),
321–357 (2002)
8. Torgo, L., Branco, P., Ribeiro, R.P., Pfahringer, B.: Resampling strategies for regres-
sion. Expert Systems (2014)
9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. The Journal
of Machine Learning Research 7, 1–30 (2006)
CTCHAID: Extending the Application
of the Consolidation Methodology
1 Introduction
In some problems that make use of classification techniques, the reason of why
a decision is made is almost as important as the accuracy of the decision, thus
the classifier must be comprehensible. Decision trees are considered comprehen-
sible classifiers. The most common way of improving the discriminating capacity
of decision trees is to build ensemble classifiers. However, with ensembles, the explaining capacity that individual trees possess is lost. The consolidation of algo-
rithms is an alternative that resamples the training sample multiple times and
applies the ensemble voting process while the classifier is being built, so that
the final classifier is a single classifier (with explaining capacity) built using the
knowledge of multiple samples. The well-known C4.5 tree induction algorithm
[10] has successfully been consolidated in the past [9].
With the aim of studying the benefit of the consolidation process on other algo-
rithms, maintaining the explaining capacity of the classifier, in this work we apply
this methodology to a variation of the CHAID [7,8] algorithm (CHAID* [5]), one
of the first tree induction algorithms along with C4.5 and CART. We propose the
consolidation of CHAID* and, using tests for statistical significance [4], we compare its results in three different classification contexts (amounting to a total of 96 datasets) against 16 genetics-based and 7 classical algorithms, and also against the original CTC (Consolidated Tree Construction) algorithm.
The rest of the paper is organized as follows. Section 2 details the related
work. Section 3 explains the consolidated version of CHAID*, CTCHAID. Section 4 defines the experimental methodology. Section 5 lays out the obtained
results. Finally, section 6 gives this work’s conclusions.
3 CTCHAID
As explained in section 2 the changes made to CHAID* make it very similar to
the C4.5 algorithm, which makes the implementation of CTCHAID very similar
to the implementation of CTC45 (Consolidated C4.5) described in section 2.
Aside from the split function, the other main difference between the algorithms
is how discrete variables (nominal and ordinal) are handled. By default, when
splitting using a discrete variable C4.5 creates a branch for every possible value
for the attribute. On the other hand, CHAID* considers grouping more than
one value on each branch. In each node a contingency table is created for each
variable. Each of these tables describes the relationship between the values a
variable can take and how the examples with this value are distributed among
all possible classes. CHAID* uses Kass’ algorithm [7] on all contingency tables
to find the most significant variable and value-group to make the split.
When consolidating CHAID* the behavior is different depending on the type
of variable. First the contingency tables are built from each sample and processed
with Kass’ algorithm to find the most important grouping. From each subsample
a variable is proposed and voting takes place as with CTC45. If the voted variable
is continuous the median value of the proposed cut-point values will be used. For
categorical values, the contingency tables from each tree for the chosen variable
are averaged into a single table. This averaged table is processed with Kass’
algorithm to find the most significant combination of categories.
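A minimal sketch of this consolidation step; the data structures (one proposal per subsample, holding the voted variable, an optional cut-point and a contingency table) are illustrative, the contingency tables are assumed to share the same shape, and Kass' grouping on the averaged table is not shown.

```python
import numpy as np
from collections import Counter

def consolidate_split(proposals):
    """proposals: one (variable, cut_point_or_None, contingency_table) per subsample.
    Majority vote on the variable; median cut-point for a continuous winner,
    averaged contingency table for a categorical winner."""
    voted_var, _ = Counter(var for var, _, _ in proposals).most_common(1)[0]
    selected = [(cut, table) for var, cut, table in proposals if var == voted_var]
    cuts = [cut for cut, _ in selected if cut is not None]
    if cuts:                                             # continuous variable
        return voted_var, float(np.median(cuts)), None
    tables = np.stack([np.asarray(table, dtype=float) for _, table in selected])
    return voted_var, None, tables.mean(axis=0)          # categorical variable
```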
4 Experimental Methodology
This experiment follows a structure very similar to that of the works in [3] and [6], as we compare against the results published in those works. The same three classification
contexts are analyzed: 30 standard (mostly multi-class) datasets, 33 two-class
imbalanced datasets and the same 33 imbalanced datasets preprocessed with
SMOTE (Synthetic Minority Over-sampling Technique [2]) until the two classes
were balanced by oversampling the minority class. Fernández et al. [3] proposed
a taxonomy to classify genetics-based machine learning (GBML) algorithms for
rule induction. They listed 16 algorithms and classified them in 3 categories and
5 subcategories. They compared the performance of these algorithms with a set
of classical algorithms (CART, AQ, CN2, C4.5, C4.5-Rules and Ripper). In our
work, for each of the contexts, the winner for each of the 5 GBML categories, 7
classical algorithms (including CHAID*), CTC45 and CTCHAID are compared.
Finally, a global ranking is also computed. All algorithms used the same 5-run
× 5-fold cross-validation strategy and the same training/test partitions (found
in the KEEL repository1). The tables containing all the information have been
omitted from this article for space reasons and have been moved to the website
with the additional material for this paper2.
1 http://sci2s.ugr.es/keel/datasets.php
2 http://www.aldapa.eus/res/2015/ctchaid/
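As a rough illustration of the evaluation protocol for the imbalanced contexts, the sketch below combines SMOTE oversampling of the training folds with a 5-run × 5-fold cross-validation. It assumes the imbalanced-learn and scikit-learn packages, uses a plain decision tree as a stand-in for CHAID*/CTCHAID (which are not available in those libraries), and generates its own folds instead of reusing the fixed KEEL partitions employed in the paper.

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

def evaluate_smote_context(X, y, runs=5, folds=5, seed=0):
    """5-run x 5-fold evaluation with SMOTE applied to the training part only.

    X, y are assumed to be numpy arrays for a two-class imbalanced dataset.
    """
    cv = RepeatedStratifiedKFold(n_splits=folds, n_repeats=runs, random_state=seed)
    scores = []
    for train_idx, test_idx in cv.split(X, y):
        # Oversample the minority class until both classes are balanced.
        X_bal, y_bal = SMOTE(random_state=seed).fit_resample(X[train_idx], y[train_idx])
        clf = DecisionTreeClassifier(random_state=seed).fit(X_bal, y_bal)
        scores.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
    return sum(scores) / len(scores)
```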
For CTC45 and CTCHAID, following the conclusions of the latest work on
consolidation [6], the subsamples used in this work are balanced and the num-
ber of examples per class is the number of examples the least populous class
has in the original training sample. The number of samples for each dataset
has been determined using a coverage value of 99% based on the results of [6].
The tables detailing the number of samples for each dataset have been moved
to the additional material. The pruning used for C4.5, CHAID*, CTC45 and
CTCHAID was C4.5’s reduced-error pruning. However, when pruning resulted
in a tree with just the root node, the tree was kept unpruned. This is due to the
fact that a root node tree results in zero for most performance measures used in
this paper. Thus, the results shown for C4.5 are not those previously published
by Fernández et al. using the KEEL platform, but those obtained with Quinlan's
implementation of the algorithm.
5 Results
Fig. 1. Visual representation of Friedman Aligned Ranks for the three contexts.
Fig. 2. Visual representation of Friedman Aligned Ranks for the global ranking.
Acknowledgments. This work was funded by the University of the Basque Country
UPV/EHU (BAILab, grant UFI11/45); by the Department of Education, Universities
and Research and by the Department of Economic Development and Competitiveness
of the Basque Government (grant PRE-2013-1-887; BOPV/2013/128/3067, grant IT-
395-10, grant IE14-386); and by the Ministry of Economy and Competitiveness of the
Spanish Government (eGovernAbility, grant TIN2014-52665-C2-1-R).
References
1. Abbasian, H., Drummond, C., Japkowicz, N., Matwin, S.: Inner ensembles: using
ensemble methods inside the learning algorithm. In: Blockeel, H., Kersting, K.,
Nijssen, S., Železný, F. (eds.) ECML PKDD 2013, Part III. LNCS, vol. 8190,
pp. 33–48. Springer, Heidelberg (2013)
2. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic
minority over-sampling technique. Journal of Artificial Intelligence Research 16(1),
321–357 (2002)
3. Fernández, A., Garcia, S., Luengo, J., Bernadó-Mansilla, E., Herrera, F.: Genetics-
based machine learning for rule induction: State of the art, taxonomy, and com-
parative study. IEEE Transactions on Evolutionary Computation 14(6), 913–941
(2010)
4. García, S., Fernández, A., Luengo, J., Herrera, F.: Advanced nonparametric tests
for multiple comparisons in the design of experiments in computational intelligence
and data mining: Experimental analysis of power. Information Sciences 180(10),
2044–2064 (2010)
5. Ibarguren, I., Lasarguren, A., Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga,
I.: BFPART: Best-first PART. Submitted to Information Sciences
6. Ibarguren, I., Pérez, J.M., Muguerza, J., Gurrutxaga, I., Arbelaitz, O.: Coverage-
based resampling: Building robust consolidated decision trees. Knowledge-Based
Systems 79, 51–67 (2015)
7. Kass, G.V.: Significance testing in automatic interaction detection (a.i.d.). Journal
of the Royal Statistical Society. Series C (Applied Statistics) 24(2), 178–189 (1975)
8. Morgan, J.A., Sonquist, J.N.: Problems in the analysis of survey data, and a pro-
posal. J. Amer. Statistics Ass. 58, 415–434 (1963)
9. Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I.: Combining
multiple class distribution modified subsamples in a single tree. Pattern Recogni-
tion Letters 28(4), 414–422 (2007)
10. Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers
Inc., San Francisco (1993)
Towards Interactive Visualization of Time Series Data
to Support Knowledge Discovery
Jan Géryk
1 Introduction
methods are available for visualizing data with more than two dimensions (e.g. mo-
tion charts or parallel coordinates), as the logical mapping of the data dimension to
the screen dimension cannot be directly applied.
Although a snapshot of the data can be beneficial, presenting changes over time
can provide a more sophisticated perspective. Animations allow knowledge discovery
in complex data and make it easier to see meaningful characteristics of changes over
time. The dynamic nature of Motion Charts (MC) allows a better identification of
trends in the longitudinal multivariate data and enables visualization of more element
characteristics simultaneously, as presented in [2]. The authors also conducted an
experiment whose results showed that MC excels at data presentation. MC is a
dynamic and interactive visualization method that enables analysts to display complex
and quantitative data in an intelligible way.
In this paper, we show the benefits of animated data visualizations for the successful
understanding of complex and large data. In the next section, we describe our VA tool,
which implements visualization methods that make use of an enhanced MC design. Fur-
ther, we conduct an empirical study with 16 participants on their data comprehension
to compare the efficacy of various static data visualizations with our enhanced methods.
We then discuss the implications of our experiment results. Finally, we draw
the conclusion and outline future work.
3 Experiment
3.1 Hypotheses
We designed the experiment to address the following three hypotheses:
• H1. The MC methods will be more effective than both the static methods for all
datasets. That is, the subjects will be (a) faster and (b) make fewer errors when us-
ing MC.
• H2. The subjects will be more effective with the small datasets than with the large
datasets for all methods. That is, participants will be (a) faster and (b) make fewer
errors when working with small datasets.
In each trial, the participants completed 12 tasks, each with 1 to 3 required answers.
Each task had identification numbers of students or fields of study as the answer.
Several questions had more correct answers than requested. The participants selected
answers by selecting IDs in the legend box located to the upper right of the chart
area. In order to complete a task, two buttons could be used: either the “OK” button to
confirm the participant’s choice or the “Skip Question” button to proceed to the next task
without saving the answer. There was no time limit during the experiment. For each
task, the order of the datasets was fixed with the smaller ones first.
The participants were asked to proceed as quickly and accurately as possible. In
order to reduce learning effects, the participants were told to make use of as many
practice trials as they needed. The practice was followed by 12 tasks (6 small-dataset tasks and 6
large-dataset tasks, in this particular order). After that, the subjects completed a survey
with questions specific to the visualization. Each block lasted about 1.5 hours. The
subjects were screened to ensure that they were not color-blind and understood com-
mon data visualization methods. To test for significant effects, we conducted repeated
measures analysis of variance (RM-ANOVA). Post-hoc analyses were performed by
using the Bonferroni technique. Only significant results are reported.
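The statistical analysis just described can be reproduced with standard tooling. The sketch below assumes a long-format pandas DataFrame with hypothetical columns participant, method and accuracy, runs a repeated-measures ANOVA with statsmodels' AnovaRM, and applies a Bonferroni correction to pairwise paired t-tests for the post-hoc comparisons; it is not the analysis script used in the study, and it omits the sphericity correction behind the fractional degrees of freedom reported in the next section.

```python
import pandas as pd
from itertools import combinations
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multitest import multipletests

def analyse(df: pd.DataFrame) -> None:
    """df: long format with columns 'participant', 'method' (LC/SP/MC), 'accuracy'."""
    # Repeated-measures ANOVA with the visualization method as within factor.
    print(AnovaRM(df, depvar="accuracy", subject="participant",
                  within=["method"]).fit())

    # Bonferroni-corrected pairwise comparisons (post-hoc analysis).
    wide = df.pivot(index="participant", columns="method", values="accuracy")
    pairs = list(combinations(wide.columns, 2))
    pvals = [ttest_rel(wide[a], wide[b]).pvalue for a, b in pairs]
    rejected, adjusted, _, _ = multipletests(pvals, method="bonferroni")
    for (a, b), p, rej in zip(pairs, adjusted, rejected):
        print(f"{a} vs {b}: p = {p:.4f}" + (" (significant)" if rej else ""))
```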
3.2 Results
Accuracy. Since some of the tasks required multiple answers, accuracy was calcu-
lated as the percentage of correct answers given. Thus, when a subject selected only one
correct answer out of two, we counted the answer as 50% accurate rather than as an
incorrect answer. The analysis revealed several significant accuracy results at the .05
level. The type of visualization had a statistically significant effect on the accuracy for
large datasets (F(1.413, 21.194) = 20.700, p < 0.001). Pair-wise comparison of the
visualizations found significant differences showing that MC was significantly more
accurate than the LC (p < 0.001). MC was also more accurate than the SP (p < 0.001).
There was no statistically significant difference between the LC and the SP. For the
small datasets, visualizations were not statistically distinguishable. Second, the sub-
jects were more accurate with the small datasets (F(1, 15) = 50.668, p < 0.001). This
fact supports our hypothesis H2.b.
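One reading of the partial-credit scoring described above, in which each task's accuracy is the fraction of the required answers actually selected (extra selections are not penalised, which is an assumption on our part), is sketched below:

```python
def task_accuracy(selected, correct):
    """Fraction of the required answers that the participant selected.

    A single correct answer out of two required ones scores 0.5 rather
    than counting as an incorrect answer.
    """
    if not correct:
        raise ValueError("a task must have at least one correct answer")
    return len(set(selected) & set(correct)) / len(set(correct))

# Example: one of the two required student IDs was found.
assert task_accuracy({"s17"}, {"s17", "s42"}) == 0.5
```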
Subjective Preferences. For each experiment block, the subjects completed a survey
where the subjects assessed their preferences regarding analysis. The subjects rated
LC, SP, and MC on a five-point Likert scale (1 = strongly disagree, 5 = strongly
agree). Using RM-ANOVA, we revealed statistically significant effects (F(1.364,
20.453) = 4.672, p = 0.033). Post-hoc analysis found that MC was significantly more
helpful than LC (p = 0.046).
Mean ratings on the five-point Likert scale:
                                                           SP    LC    MC
The visualization was helpful when solving the tasks.     3.50  3.44  3.94
I found this visualization entertaining and interesting.  2.56  2.31  4.13
I prefer visualization for the small datasets.            3.88  4.00  2.63
I prefer visualization for the large datasets.            2.38  2.69  3.69
The significant differences indicate that MC was judged to be more helpful than
the static methods. The subjects preferred the static methods to MC for the small data-
sets. However, MC was judged to be more beneficial than static methods for the large
datasets (p < 0.001). The results also showed that MC was more entertaining and
interesting than the static methods (p < 0.001).
4 Discussion
Our first hypothesis (H1) was that MC would outperform both the static methods for all
dataset sizes, but the hypothesis was only partially confirmed. Contrary to the hypothe-
sis, the static methods proved to achieve better speed than the animated methods for the
small datasets. Moreover, the methods were not statistically distinguishable in terms of
accuracy. We also hypothesized that the accuracy would increase for the smaller data-
sets (H2). Hypothesis H2.a was supported, because the subjects were faster with the
small datasets. The mean time for the large datasets was 56.94 seconds and for the small
datasets was 36.69 seconds. Hypothesis H2.b was also supported, because the subjects
made fewer errors with the small datasets when compared with the large datasets. Accu-
racy is an issue for static visualizations when the large datasets are employed.
The study supports the intuition that using animations in analysis requires conve-
nient interactive tools to support effective use. The study suggests that MC leads to
fewer errors. Also, the subjects found MC method to be more entertaining and excit-
ing. The evidence from the study indicates that the animations were more effective at
building the subjects' comprehension of large datasets. However, the simplicity of
static methods was more effective for small datasets. These observations are consis-
tent with the verbal reports, in which the subjects were generally reluctant to abandon
the static visual methods. The results support the view that MC is not a replacement
for common static data visualizations but a powerful addition to them. The overall
accuracy in the study was quite low, with an average of about 75%. However, only one
question was skipped.
In the tool, we enhanced the MC design and expanded it to be more suitable for AA
analysis. We also developed an intuitive, yet powerful, interactive user interface that
provides analysts with instantaneous control of MC properties and data configuration,
along with several customization options to increase the efficacy of the exploration
process. We validated the usefulness and general applicability of the tool with an
experiment assessing the efficacy of the described methods.
The study suggests that animated methods lead to fewer errors for the large data-
sets. Also, the subjects found MC to be more entertaining and interesting. The enter-
tainment value probably contributes to the efficacy of the animation, because it serves
to hold the subjects' attention. This fact can be useful for the purpose of designing
methods in academic settings.
Despite the findings of the study, further investigation is required to evaluate the
general applicability of the animated methods. We also plan to combine our animated
interactive methods with common DM methods to follow the VA principle more pre-
cisely. We have already implemented a standalone method utilizing a decision tree
algorithm that provides an interactive visual representation. We prefer decision trees
because of their clarity and ease of comprehension. We will also finish the integration of the tool
with our university information system to allow university executives and administra-
tors easy access when analyzing AA and to better support decision making.
References
1. Goldstein, P.J.: Academic analytics: The uses of management information and technology
in higher education. Educause (2005)
2. Al-Aziz, J., Christou, N., Dinov, I.D.: SOCR Motion Charts: an efficient, open-source,
interactive and dynamic applet for visualizing longitudinal multivariate data. Journal of
Statistics Education 18(3), 1–29 (2010)
3. Grossenbacher, A.: The globalisation of statistical content. Statistical Journal of the IAOS:
Journal of the International Association for Official Statistics, 133–144 (2008)
4. Baldwin, J., Damian, D.: Tool usage within a globally distributed software development
course and implications for teaching. Collaborative Teaching of Globally Distributed
Software Development, 15–19 (2013)
5. Sultan, T., Khedr, A., Nasr, M., Abdou, R.: A Proposed Integrated Approach for BI and GIS
in Health Sector to Support Decision Makers. Editorial Preface (2013)
6. Vermylen, J.: Visualizing Energy Data Using Web-Based Applications. American Geophysical
Union (2008)
7. Géryk, J., Popelínský, L.: Visual analytics for increasing efficiency of higher education
institutions. In: Abramowicz, W., Kokkinaki, A. (eds.) BIS 2014 Workshops. LNBIP, vol.
183, pp. 117–127. Springer, Heidelberg (2014)
Ramex-Forum: Sequential Patterns of Prices
in the Petroleum Production Chain
1 Introduction
Petroleum is one of the most important resources to the developed world and
still is a major variable influencing the economy and markets. The price of
petroleum and its derivatives is not influenced simply by supply and demand;
taxes, speculation, wars, and refinement and transportation costs all contribute to
setting prices. Due to the lengthy refinement process, a significant increase in the
price of the source material is only reflected in the price of its derivatives after
the time it takes to refine it (usually within 3–4 weeks [1]). Moreover, due to its
high economic importance and cost, the price of crude oil should always be reflected
in the final price [2].
This work presents a study on a method to quantify how the price of the
crude oil (raw material) can influence the price of manufactured products, by
using Ramex-Forum. This paper departs from the work of [3], based on the original
Ramex-Forum proposal [4]. It analyses how this proposal can be improved and
then tuned for finding sequential patterns using the prices of petroleum and its
derivatives. Sect. 2 presents the basic method and introduces the main concepts,
and Sect. 3 presents an evaluation of how the prices of derivatives are influenced
by the price of crude oil (the source material). Finally, some conclusions are
presented.
Fig. 1. Financial product (normalized DJI index in black) and respective moving aver-
age (blue) and crossover starting Buy (green) and Sell (red) decision with confidence.
available historical values for a wide range of petroleum related products pro-
vided by the U.S. Energy Information Administration1 . The variations in the
prices of these products are also compared with the stock market value of eleven
corporations dedicated to extracting, processing, and selling of crude and crude
related products. The prices of the 55 products are separated into retail/bulk
price and spot price (for some items the price is taken from retail sellers and
for other items it is the security price on that day). The data is separated into
four categories of known benchmarks [6] for: crude oil (West Texas Intermedi-
ate as OklahomaWTI, European Brent, and the OPEC Basket); Refinery price
for Gasoline, RBOB Gasoline, Diesel, Kerosene Jet, Propane, and Heating Oil;
National, state, and city averages for regular gasoline and diesel; Corporation
stock values.
This paper studies the influence and best values for parameters δ, thresh-
olds, and moving average size. Focus will be put on the Buy comparison because
in the selected data the increase/decrease of prices is very asymmetrical with a
strong lean towards increases. Also, for better parameter comparison, an addi-
tional measure is used in our results, the average edge weight: the sum of the
edge weights divided by the number of edges in the graph:
AverageEdgeWeight(V, E) = ( Σ_{e ∈ E} weight(e) ) / |E|.
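Given the result graph as a weighted edge list, the average edge weight defined above is straightforward to compute; the tuple-based representation in the sketch is an assumption, not the Ramex-Forum data structure.

```python
def average_edge_weight(edges):
    """Average edge weight of a weighted graph given as (u, v, weight) tuples."""
    if not edges:
        return 0.0
    return sum(w for _, _, w in edges) / len(edges)

# Example with three edges of a pattern graph.
print(average_edge_weight([("WTI", "Brent", 0.8),
                           ("Brent", "DieselRetail", 0.5),
                           ("WTI", "GasRefinery", 0.7)]))
```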
Parameter δ was analyzed regarding its effect on the average edge weight
changes. The result can be seen in Fig. 2A. The chart shows the average edge
weight change for each increment in the value of δ. Each line represents the
results obtained using different moving average sizes. Several big spikes can be
seen every 5 days; this is because gas and diesel prices at the pump are only
registered on a weekly basis, so for each 5-day increase in δ the algorithm will
pick up another change in value. This makes the analysis somewhat harder, but it
is still useful, as changes in retail prices are now clearly identified. The first
thing to note is that in the first week there is already a noticeable increase in
the average edge weight; however, some of it is due to influences between retail
prices and not only from refinery to retail prices. Second, after the fourth week
the individual increases in δ barely produce a meaningful increase in value; still,
the cumulative increases are significant. The parameter δ was fixed at a value of
30 working days (around six weeks: two weeks more than expected).
1 The data used was downloaded from http://www.eia.gov/ in June 2014 and ranges
from January 2006 to June 2014.
Fig. 2. Graphs showing the change in: (A) average edge weight with each increment of
δ using the Buy comparison; (B) average edge weight and number of nodes with each
1% increment in the threshold interval for δ = 30 using the Buy comparison.
Moving Average Size, N_MA. The choices for available moving average sizes were
based on [4], and the graphs show that maximizing this parameter yields the
best results, which even raises the question of how further increases in the size
would fare. The user still needs to take into account what it means to increase
the moving average size: the bigger the moving average, the smoother the
curve will be, and thus it will behave like a noise filter (i.e., becoming less
and less sensitive to small changes in the behavior of the product). The values
overlap for small values and it is hard to read the effects of the first increments
on a linear scale, so Fig. 2B uses a logarithmic scale for representing values,
showing that the 240 and 120 moving average sizes have very similar behavior.
The average weight for a moving average of 120 days has a higher starting value
than that of 240 days; this means that for a buy signal the best parameters are:
threshold = 1% ∧ δ = 30 ∧ N_MA = 120. Fig. 3 shows that it is possible to find more than
just sequential patterns with these parameters.
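The smoothing role of N_MA and the crossover-based Buy decision illustrated in Fig. 1 can be sketched with a generic moving-average crossover detector; this is not the Ramex-Forum implementation, and the 120-day window merely mirrors the best value reported above.

```python
import pandas as pd

def buy_signals(prices: pd.Series, n_ma: int = 120) -> pd.Series:
    """Return True on the days where the price crosses above its moving average.

    A larger n_ma gives a smoother average and therefore a less sensitive
    (more noise-tolerant) signal, as discussed in the text.
    """
    ma = prices.rolling(window=n_ma).mean()
    above = prices > ma
    return above & ~above.shift(1, fill_value=False)

# Example with a synthetic daily price series.
prices = pd.Series(range(200), dtype=float) * 0.1 + 50.0
print(buy_signals(prices, n_ma=120).sum())
```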
Fig. 3. Part of the graph showing the resulting Buy tree after applying Ramex Forum
on the data with the selected parameters.
In the complete result graph (available in [3]) colors were added to each node
according to their product type. These colors show a clear grouping of prod-
uct types, with same color nodes mostly close to each other. This was expected
for gas to gas and diesel to diesel influences. However even the stock, refinery,
and reference benchmark prices tend to group together at least in pairs. Fur-
thermore, refineries are almost exclusively related to the same type of product,
gas producing refineries are connected to retail gas prices and diesel producing
refineries are connected to diesel retail prices. The Gulf Coast GAS refinery node
(Fig. 2) does not exactly meet the previous observation, as it is shown influencing
some diesel products; even so, this might be a positive thing, as it will alert an
attentive analyst to the weight behind the Gulf Coast refinery gas prices. After
further analysis, Gulf Coast GAS is identified as the most influential node, as it
has at least one detected event for all other products and its average edge weight
is the highest by a margin of 5%; probably due to the huge oil production in this
area, it is mostly the start of the oil production chain. In [3], it was also observed
that specificities of gas usage in the Rocky Mountain retail gas price could trig-
ger third-level dependences. Next, the most glaring aspect of the graph is how
influential specific products are: the tree is not just an assorted web of relations
but groups of products aggregating around very influential/influenced products.
There are some expected trendsetters, like the OPEC Basket, which is used as a
benchmark for the oil price, and the Gulf Coast refineries, and then some unexpected
ones, like the Minnesota retail gas price. For other tested data and parameters, similar
graphs were observed. Indeed, color coding also showed very similar results with
strong groupings of colors and some few select products influencing groups of
others.
4 Conclusions
The presented case study, using real world data and deep analysis, aims to
provide an illustrative and useful example of Ramex-Forum: the signal-to-noise
ratio on the Petroleum production chain analysis already shows that sequential
References
1. Borenstein, S., Shepard, A.: Sticky Prices, Inventories, and Market Power in Whole-
sale Gasoline Markets. NBER working paper series, vol. 5468. National Bureau of
Economic Research (1996)
2. Suviolahti, H.: The influence of volatile raw material prices on inventory valuation
and product costing. Master Thesis, Department of Business Technology, Helsinki
School of Economics (2009)
3. Tiple, P.: Tool for discovering sequential patterns in financial markets. Master Thesis
in Engenharia Informática, Faculdade de Ciências e Tecnologia da Universidade
Nova de Lisboa (2014)
4. Marques, N.C., Cavique, L.: Sequential pattern mining of price interactions. In:
Advances in Artificial Intelligence – Proceedings of the Workshop Knowledge Dis-
covery and Business Intelligence, EPIA-KDBI, Portuguese Conference on Artificial
Intelligence, pp. 314–325 (2013)
5. Gary, J., Handwerk, G.: Petroleum Refining. Institut français du pétrole publica-
tions. Taylor & Francis (2001)
6. Hammoudeh, S., Ewing, B.T., Thompson, M.A.: Threshold cointegration analysis
of crude oil benchmarks. The Energy Journal 29(4), 79–96 (2008)
7. Matos, D., Marques, N., Cardoso, M.: Stock market series analysis using self-
organizing maps. Revista de Ciências da Computação 9(9), 79–90 (2014)
Geocoding Textual Documents
Through a Hierarchy of Linear Classifiers
1 Introduction
Geographical Information Retrieval (GIR) has recently captured the attention
of many different researchers who work in fields related to language processing
and to the retrieval and mining of relevant information from large document
collections. For instance, the task of resolving individual place references in tex-
tual documents has been addressed in several previous works, with the aim of
supporting subsequent GIR processing tasks, such as document retrieval or the
production of cartographic visualizations from textual documents [5,6]. How-
ever, place reference resolution presents several non-trivial challenges [8,9], due
to the inherent ambiguity of natural language discourse. Moreover, there are
many vocabulary terms, besides place names, that can frequently appear in the
context of documents related to specific geographic areas [1]. Instead of resolving
individual references to places, it may be interesting to study methods
for assigning entire documents to geospatial locations [1,11].
In this paper, we describe a technique for assigning geospatial coordinates of
latitude and longitude to previously unseen textual documents, using only the
raw text of the documents as evidence, and relying on a hierarchy of linear models
built on the basis of a discrete hierarchical representation of the Earth's surface,
known in the literature as the HEALPix approach [4]. The regions at each level of
this hierarchical representation, corresponding to equally-distributed curvilinear
and quadrilateral areas of the Earth's surface, are initially associated with textual
contents (i.e., we use all the documents from a training set that are known to refer
to particular geospatial coordinates, associating each text with the corresponding
region). For each level in the hierarchy, we build classification models using
the textual data, relying on a vector space model representation and using the
quadrilateral areas as the target classes. New documents are assigned to the
most likely quadrilateral area through the use of the classifiers inferred from
training data. We finally assign documents to their respective coordinates of
latitude and longitude, taking the centroid coordinates from the quadrilateral
areas.
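The mapping between geospatial coordinates and HEALPix cells that defines the target classes can be illustrated with the healpy package; the snippet below only sketches that mapping (coordinates to a cell identifier at a given resolution, and a predicted cell back to its centre point), not the authors' full pipeline.

```python
import numpy as np
import healpy as hp

def coords_to_cell(lat, lon, nside):
    """HEALPix cell id for a latitude/longitude pair at resolution nside."""
    theta = np.radians(90.0 - lat)        # colatitude in radians
    phi = np.radians(lon % 360.0)         # longitude in [0, 2*pi)
    return hp.ang2pix(nside, theta, phi)

def cell_to_coords(cell, nside):
    """Centre coordinates (lat, lon) of a HEALPix cell."""
    theta, phi = hp.pix2ang(nside, cell)
    return 90.0 - np.degrees(theta), np.degrees(phi)

# A document about Coimbra (lat 40.21, lon -8.43) at Nside = 256.
cell = coords_to_cell(40.21, -8.43, nside=256)
print(cell, cell_to_coords(cell, nside=256))
```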
The proposed document geocoding technique was evaluated with samples of
geo-referenced Wikipedia documents in four different languages. We achieved an
average prediction error of 83 Kilometers, and a median error of just 9 Kilome-
ters, in the case of documents from the English Wikipedia. These results are
slightly better than those reported in previous state-of-the-art studies [11,12].
Fig. 1. Orthographic views associated to the first four levels of the HEALPix sphere
tessellation.
usage of a k-d tree data structure. Moreover, Roller et al. proposed to assign the
centroid coordinates of the training documents contained in the most probable
cell, instead of just using the center point for the cell. These authors report
on a mean error of 181 Kilometers and a median error of 11 Kilometers, when
geocoding documents from the English Wikipedia.
More recently, Wing and Baldridge also reported on tests with discrimina-
tive classifiers [12]. To overcome the computational limitations of discriminative
classifiers, in terms of the maximum number of classes they can handle, the
authors proposed to leverage a hierarchical classification procedure that used
feature hashing and an efficient implementation of logistic regression. In brief,
the authors used a hierarchical approach in which the Earth's surface is divided
according to a rectangular grid (i.e., using either a regular grid or a k-d tree),
and where an independent classifier is learned for every non-leaf node of the
hierarchy. The probability of any node in the hierarchy is the product of the
probabilities of that node and all of its ancestors, up to the root. The most
probable leaf node is used to infer the final geospatial coordinates. Rather than
greedily using the most probable node from each level, or rather than comput-
ing the probability of every leaf node, the authors used a stratified beam search.
This procedure starts at the root, keeping the b highest-probability nodes at
each level, until reaching the leaves. Wing and Baldridge report on results over
English Wikipedia data corresponding to a mean error of 168.7 Kilometers and
a median error of 15.3 Kilometers.
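A minimal sketch of such a stratified beam search is given below; it assumes a generic tree of nodes, each exposing a children list and a hypothetical predict_proba(doc) method that stands in for the per-node classifiers, and simply keeps the b most probable partial paths at each level.

```python
def beam_search_geolocate(root, doc, beam_width=5):
    """Return the leaf cell with the highest product of per-level probabilities.

    Nodes are assumed to expose `children` (an empty list for leaves) and a
    `predict_proba(doc)` method returning one probability per child -- a
    stand-in for the per-node classifiers described in the text.
    """
    beam = [(1.0, root)]                       # (probability so far, node)
    while any(node.children for _, node in beam):
        candidates = []
        for prob, node in beam:
            if not node.children:              # already a leaf: keep it
                candidates.append((prob, node))
                continue
            for child, p in zip(node.children, node.predict_proba(doc)):
                candidates.append((prob * p, child))
        # Keep only the beam_width most probable partial paths per level.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return max(beam, key=lambda c: c[0])[1]
```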
Table 1. Number of regions and approximate area for HEALPix grids of different
resolutions.
1 http://healpix.jpl.nasa.gov
4 Experimental Validation
In our experiments, we used samples with geocoded articles from the English
(i.e., 847,783 articles), German (i.e., 307,859 articles), Spanish (i.e., 180,720 arti-
cles) and Portuguese (i.e., 131,085 articles) Wikipedias, taken from database
dumps produced in 2014. Separate experiments evaluated the quality of the doc-
ument geocoders built for each of the four languages, in terms of the distances
from the predictions towards the correct geospatial coordinates. We processed the
Wikipedia dumps to extract the raw text from the articles and to extract the
geospatial coordinates of latitude and longitude from the corresponding infoboxes.
We used 90% of the geocoded articles of each Wikipedia for model training, and
the other 10% for model validation.
Regarding the geospatial distribution of documents, some
regions (e.g., North America or Europe) are considerably more dense in terms of
document associations than others (e.g., Africa), and oceans and other large
masses of water are scarce in associations with Wikipedia documents. This implies
that the number of classes that has to be considered by our model is much smaller
than the theoretical number of classes given by the HEALPix procedure. In our
English dataset, there are a total of 286,966 regions containing associations to
documents at a resolution level of Nside = 1024, and a total of 82,574, 15,065,
and 190 regions, respectively at resolutions 256, 64, and 4. These numbers are
even smaller in the collections for the other languages.
Table 2 presents the obtained results for the different Wikipedia collections.
The prediction errors shown in Table 2 correspond to the distance in Kilome-
ters, computed through Vincenty’s geodetic formulae [10], from the predicted
locations to the true locations given in Wikipedia. The accuracy values corre-
spond to the relative number of times that we could assign documents to the
correct region (i.e., the HEALPix region where the document’s true geospatial
coordinates of latitude and longitude are contained), for each level of hierarchical
classification. Table 2 also presents upper and lower bounds for the average and
median errors, according to a 95% confidence interval and as measured through
a sampling procedure.
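The per-document prediction error could be computed along the following lines; the sketch uses geopy's geodesic distance as a stand-in for Vincenty's formulae [10] referenced above, so the numbers would differ marginally from the paper's.

```python
import numpy as np
from geopy.distance import geodesic

def prediction_errors(predicted, truth):
    """Distances in kilometres between predicted and true (lat, lon) pairs."""
    return [geodesic(p, t).kilometers for p, t in zip(predicted, truth)]

errors = prediction_errors([(40.20, -8.42)], [(40.21, -8.43)])
print(np.mean(errors), np.median(errors))   # mean and median error in km
```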
The results attest to the effectiveness of the proposed method, as we mea-
sured slightly lower errors than those reported in previous studies [2,7,11,12],
which, besides different classifiers, also used simpler procedures for representing
2 http://scikit-learn.org/
3 http://www.csie.ntu.edu.tw/~cjlin/liblinear/
textual contents and for representing the geographical space. It should nonethe-
less be noted that the datasets used in our tests may be slightly different from
those used in previous studies (e.g., they were taken from different Wikipedia
dumps), despite their similar origin.
References
1. Adams, B., Janowicz, K.: On the geo-indicativeness of non-georeferenced text. In:
Proceedings of the International AAAI Conference on Weblogs and Social Media
(2012)
2. Dias, D., Anastácio, I., Martins, B.: A language modeling approach for georefer-
encing textual documents. Actas del Congreso Español de Recuperación de Infor-
mación (2012)
3. Dutton, G.: Encoding and handling geospatial data with hierarchical triangular
meshes. In: Kraak, M.J., Molenaar, M., (eds.) Advances in GIS Research II. CRC
Press (1996)
4. Górski, K.M., Hivon, E., Banday, A.J., Wandelt, B.D., Hansen, F.K., Reinecke, M.,
Bartelmann, M.: HEALPIX - a framework for high resolution discretization, and
fast analysis of data distributed on the sphere. The Astrophysical Journal 622(2)
(2005)
5. Lieberman, M.D., Samet, H.: Multifaceted toponym recognition for streaming
news. In: Proceedings of the International ACM SIGIR Conference on Research
and Development in Information Retrieval (2011)
6. Mehler, A., Bao, Y., Li, X., Wang, Y., Skiena, S.: Spatial analysis of news sources.
IEEE Transactions on Visualization and Computer Graphics 12(5) (2006)
7. Roller, S., Speriosu, M., Rallapalli, S., Wing, B., Baldridge, J.: Supervised text-
based geolocation using language models on an adaptive grid. In: Proceedings of
the Conference on Empirical Methods on Natural Language Processing (2012)
8. Santos, J., Anastácio, I., Martins, B.: Using machine learning methods for disam-
biguating place references in textual documents. GeoJournal 80(3) (2015)
9. Speriosu, M., Baldridge, J.: Text-driven toponym resolution using indirect supervi-
sion. In: Proceedings of the Annual Meeting of the Association for Computational
Linguistics (2013)
10. Vincenty, T.: Direct and inverse solutions of geodesics on the ellipsoid with appli-
cation of nested equations. Survey Review XXIII(176) (1975)
11. Wing, B., Baldridge, J.: Simple supervised document geolocation with geodesic
grids. In: Proceedings of the Annual Meeting of the Association for Computational
Linguistics (2011)
12. Wing, B., Baldridge, J.: Hierarchical discriminative classification for text-based
geolocation. In: Proceedings of the Conference on Empirical Methods on Natural
Language Processing (2014)
A Domain-Specific Language for ETL Patterns
Specification in Data Warehousing Systems
Abstract. During the last few years, many research efforts have been made to
improve the design of ETL (Extract-Transform-Load) systems. ETL systems
are considered very time-consuming, error-prone and complex, involving several
participants from different knowledge domains. ETL processes are one of the
most important components of a data warehousing system, and they are strongly
influenced by the complexity of business requirements and by their change and
evolution. These aspects influence not only the structure of a data warehouse but
also the structures of the data sources involved. To minimize the negative im-
pact of such variables, we propose the use of ETL patterns to build specific
ETL packages. In this paper, we formalize this approach using BPMN (Busi-
ness Process Modelling Notation) for modelling more conceptual ETL
workflows, mapping them to real execution primitives through the use of a do-
main-specific language that allows for the generation of specific instances that
can be executed in an ETL commercial tool.
1 Introduction
Commercial tools that support ETL (Extract, Transform, and Load) process develop-
ment and implementation have a crucial impact on the implementation of the populating
process of any data warehousing system. They provide the generation of very detailed models
under a specific methodology and notation. Usually, such documentation follows
a proprietary format, which is intrinsically related to architectural issues of the develop-
ment tool. For that reason, ETL teams must have the skills and experience that
allow them to use and explore such tools appropriately. In the case of a migration
to another ETL tool environment, the ETL development team will
need to understand all the specificities of the new tool and start a new project, often
almost from scratch. We believe that ETL systems development requires a simple and
reliable approach. A more abstract view of the processes and data structures is very con-
venient, as well as a more effective mapping to some kind of execution primitives allow-
ing for their execution inside the environments of commercial tools. Using a palette of spe-
cific ETL patterns representing some of the most used ETL tasks in real world applica-
tion scenarios – e.g. Surrogate Key Pipelining (SKP), Slowly Changing Dimensions
(SCD) or Change Data Capture (CDC) –, we designed a new ETL development layer on
top of a traditional method, making it possible to use ETL tools from the very beginning of
the project, in order to plan and implement more appropriate ETL processes. To do that
we used the Business Process Modelling Notation (BPMN) [1] for ETL process repre-
sentation, extending its original meta-model to include the ETL pattern specification
we designed. The inclusion of these patterns clearly distinguishes two very relevant as-
pects of ETL design and implementation: process flow coordination and data processing
tasks. BPMN is very suitable for this kind of process, simply because it provides some
very convenient features, like expressiveness and flexibility in the specification of proc-
esses. Thus, after a brief review of some related work (Section 2), we briefly present and
discuss a demonstration scenario using one of the most useful (and crucial) ETL
processes: Data Quality Enhancement (DQE) (Section 3). Next, in Section 4, we present a
DQE specification skeleton, its internal behaviour, and how we can configure it using a
Domain-Specific Language (DSL) to enable its execution. Finally, we discuss the ex-
periments done so far, analysing results and presenting some conclusions and future work
(Section 5).
2 Related Work
With the exception of some low-level methods for ETL development [2], most ap-
proaches presented so far use conceptual or logical models as the basis for ETL mod-
elling. Such models reduce complexity, produce detailed documentation and provide
the ability to easily communicate with business users. Some of the proposals pre-
sented by Vassiliadis and Simitsis cover several aspects of ETL conceptual specifica-
tion [3], its representation using logical views [4, 5], and its implementation using a
specific ETL tool [6]. Later, Trujillo [7] and Muñoz [8] provided a UML extension
for ETL conceptual modelling, reducing some of the communication issues that the
proposal of Vassiliadis et al. revealed previously. However, the translation to execu-
tion primitives they made was not very natural, since UML is essentially used to de-
scribe system requirements and not to support their execution. The integration of exist-
ing organizational tasks with ETL processes was addressed by Wilkinson et al. [9],
which exposed a practical approach for the specification of ETL conceptual models
using BPMN. BPMN was first introduced for ETL system specification by Akkaoui and
Zimanyi [10]. Subsequently, Akkaoui et al. [11] provided a BPMN-based
meta-model for an independent ETL modelling approach. They explored and dis-
cussed the bridges to a model-to-text translation, providing its execution in some ETL
commercial tools. Still using BPMN notation, Akkaoui et al. [12] provided a BPMN
meta-model covering two important architectural layers related to the specification of
ETL processes. More recently and following the same guidelines from previous
works, Akkaoui et al. [13] presented a framework that allows for the translation of
abstract BPMN models to their concrete execution in a target ETL tool using model-to-
text transformations.
3 Pattern-Based ETL Modelling
Fig. 1. A BPMN pattern specification for an ETL process
assigns the surrogate keys to the data extracted, maintaining the correspondence meta-
data stored in specific mapping tables located in the DSA. After the SKP process,
an IDL pattern appears that loads data into the data warehouse, establishing all the
necessary correspondences to the data warehouse schema.
Fig. 2. A decomposition procedure (left) and its exception handling description (right)
Fig. 2 shows an example of the proposed DSL for a typical decomposition proce-
dure, used as a sub-part of a DQE pattern, that splits the customer's full name into two new
attributes: 'FirstName' and 'LastName'. The decomposition rule is performed based
on a regular expression, '\\s', which means that the original string must be split using a
space as delimiter. The pattern configuration starts with the description of the pattern,
using the ‘Name’ and the ‘Description’ keywords inside of a ‘Header’ block. Blocks are
identified using square brackets delimiters. Next, a decomposition block is used in order
to describe the internal components of a decomposition pattern. This block must have an
internal ‘ID’, which can be manually or automatically assigned (‘AUTO’ keyword).
Next, three blocks are used: ‘Input’ block for input metadata, ‘Output’ block for output
metadata, and ‘Rule’ for a decomposition rule specification. Both input and output
blocks use a target data schema name storing the original/resulting data, and a collection
of attributes (and data types) used for each block. We distinguish single assignments
(a singular value) and composite assignments (composite data structures), using an equals
operator for atomic attributions and square brackets for composite attributions. The
'Rule' block describes the regular expression that should be applied to the original
string, using the 'Regex' keyword. The 'Limit' keyword is used to control the number of
times that a pattern is applied, affecting the length of the returned result set. Two special
identifiers (‘FIRST’ and ‘LAST’) are used to extract the first and last occurrence match-
ing the regular expression. The results of the output rule (‘FirstName’ and ‘LastName’)
are mapped to the 'Output' attributes. The DSL also includes some compensation and
error exception statements associated with each pattern. The compensation events provide
an alternative approach to handle a specific exception event, e.g. storing the non-
conforming records in quarantine tables for later evaluation or applying automatic error
correction. The error events block or end the process execution. Fig. 2 (right) presents an exception
block with associated compensation and error policies. The 'Exception' block is formed
by three mandatory constructs: 1) Event, which specifies an exception that may occur, e.g.
'EmptyAttribute' or 'InvalidDecompositionRule'; 2) Action, which identifies the action
that should be performed, e.g. the record that triggered the exception can be stored in a
specific quarantine table, or the workflow can be aborted; and 3) Log activity, which stores
the exception occurrences in a specific log file structure. With these domain-level instructions, it is
possible to dynamically generate the instances following the language rules. For that, we
can implement code generators to translate the DSL into the specific formats supported by
commercial ETL tools.
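To illustrate what a code generator would have to emit for the decomposition pattern of Fig. 2, the sketch below renders one possible reading of its semantics in Python: the attribute names, the '\\s' expression and the FIRST/LAST limits follow the example in the text, while the function itself and its interpretation of the 'Limit' keyword are assumptions, not the generator proposed in the paper.

```python
import re

def decompose(record, source="Name", regex=r"\s", limit=None):
    """Apply the DSL decomposition rule to one record (a dict of attributes).

    limit=None splits only on the surrounding first/last parts; the special
    values 'FIRST' and 'LAST' keep the part before the first / after the last
    match, mirroring the FIRST and LAST identifiers of the 'Limit' keyword.
    """
    parts = re.split(regex, record[source])
    if limit == "FIRST":
        first, last = parts[0], " ".join(parts[1:])
    elif limit == "LAST":
        first, last = " ".join(parts[:-1]), parts[-1]
    else:
        first, last = parts[0], parts[-1]
    out = dict(record)
    out["FirstName"], out["LastName"] = first, last
    return out

print(decompose({"Name": "Maria João Silva"}, limit="LAST"))
# {'Name': 'Maria João Silva', 'FirstName': 'Maria João', 'LastName': 'Silva'}
```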
In this paper we showed how a typical ETL process can be represented exclusively
using ETL patterns on BPMN models, and how these patterns can be integrated in a
single ETL system package. To demonstrate their practical application, we selected and
discussed a DQE pattern, describing its internal composition and providing a specific
DSL for its configuration. From a conceptual modelling point of view, we consider that
ETL models should not include any kind of implementation infrastructure specification
or any criteria associated with its execution. All infrastructures that support the imple-
mentation of conceptual models are related to specific classes of users and therefore
involve the application of specific constructors. BPMN provides this kind of abstrac-
tion, focusing essentially on the coordination of ETL patterns, promoting the reusability
of patterns across several systems, and making the system more robust to process
changes. Additionally, the proposed DSL provides an effective way to formalize each
pattern configuration, allowing for its later mapping to a programming language such
as Java. Using a domain-level DSL, it is possible to describe each part of
an ETL process more naturally, without the need to program each component. In the short term,
we intend to have an extended family of ETL patterns that will allow for building a
complete ETL system from scratch, covering all coordination and communication as-
pects as well as the description of all the tasks required to materialize it. Additionally, a
generic transformation plug-in for generating ETL physical schemas for data integration
tools is also planned.
References
1. OMG, Documents Associated With Business Process Model And Notation (BPMN) Ver-
sion 2.0 (2011)
2. Thomsen, C., Pedersen, T.B.: Pygrametl: a powerful programming framework for extract-
transform-load programmers. In: Proceeding of the ACM Twelfth International Workshop
on Data Warehousing and OLAP, DOLAP 2009, pp. 49–56 (2009)
3. Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: Conceptual modeling for ETL processes. In:
Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP,
DOLAP 2002, pp. 14–21 (2002)
4. Vassiliadis, P., Simitsis, A., Skiadopoulos, S.: On the logical modeling of ETL processes.
In: Pidduck, A., Mylopoulos, J., Woo, C.C., Ozsu, M. (eds.) CAiSE 2002. LNCS,
vol. 2348, pp. 782–786. Springer, Heidelberg (2002)
5. Simitsis, A., Vassiliadis, P.: A method for the mapping of conceptual designs to logical
blueprints for ETL processes. Decis. Support Syst. 45, 22–40 (2008)
6. Vassiliadis, P., Vagena, Z., Skiadopoulos, S., Karayannidis, N., Sellis, T.: Arktos: A Tool
for Data Cleaning and Transformation in Data Warehouse Environments. IEEE Data Eng.
Bull. 23(4), 42–47 (2000)
7. Luján-Mora, S., Trujillo, J., Song, I.-Y.: A UML profile for multidimensional modeling in
data warehouses. Data Knowl. Eng. 59, 725–769 (2006)
8. Trujillo, J., Luján-Mora, S.: A UML based approach for modeling ETL processes in data
warehouses. Concept. Model. 2813, 307–320 (2003)
9. Wilkinson, K., Simitsis, A., Castellanos, M., Dayal, U.: Leveraging business process mod-
els for ETL design. In: Parsons, J., Saeki, M., Shoval, P., Woo, C., Wand, Y. (eds.) ER
2010. LNCS, vol. 6412, pp. 15–30. Springer, Heidelberg (2010)
10. El Akkaoui, Z., Zimanyi, E.: Defining ETL worfklows using BPMN and BPEL. In: Pro-
ceedings of the ACM Twelfth International Workshop on Data Warehousing and OLAP,
DOLAP 2009, pp. 41–48 (2009)
11. El Akkaoui, Z., Zimànyi, E., Mazón, J.-N., Trujillo, J.: A model-driven framework for
ETL process development. In: Proceedings of the ACM 14th International Workshop on
Data Warehousing and OLAP, DOLAP 2011, pp. 45–52 (2011)
12. El Akkaoui, Z., Mazón, J.-N., Vaisman, A., Zimányi, E.: BPMN-based conceptual model-
ing of ETL processes. In: Cuzzocrea, A., Dayal, U. (eds.) DaWaK 2012. LNCS, vol. 7448,
pp. 1–14. Springer, Heidelberg (2012)
13. El Akkaoui, Z., Zimanyi, E., Mazon, J.-N., Trujillo, J.: A BPMN-based design and main-
tenance framework for ETL processes. Int. J. Data Warehous. Min. 9, 46 (2013)
14. Rahm, E., Do, H.: Data cleaning: Problems and current approaches. IEEE Data Eng. Bull.
23, 3–13 (2000)
15. Köppen, V., Brüggemann, B., Berendt, B.: Designing Data Integration: The ETL Pattern
Approach. Eur. J. Informatics Prof. XII (2011)
Optimized Multi-resolution Indexing and Retrieval
Scheme of Time Series
1 Introduction
[5] we introduced Tight-MIR which has the advantages of the two previously men-
tioned methods. Tight-MIR, however, stores distances corresponding to all resolution
levels, even though some of them might have a low pruning power. In this paper we
present an optimized version of Tight-MIR which stores and processes only the reso-
lution levels with the maximum pruning power.
The rest of the paper is organized as follows: Section 2 is a background section. The
optimized version is presented in Section 3 and tested in Section 4. We conclude this
paper with Section 5.
2 Background
| , , | (1)
This relation represents a pruning condition which is the first filter of Weak-MIR.
By applying the triangle inequality again we get:
, , , (2)
The disadvantage of the indexing scheme presented in the previous section is that it is
“deterministic”, meaning that at indexing time the time series are indexed using a top-
down approach, and the algorithm behaves in a like manner at query time. If some
resolution levels have low utility in terms of pruning power, the algorithm will still
use the pre-computed distances related to these levels, and at query time these levels
will also be examined. Whereas the use of the first filter does not require any query-
time distance evaluation, applying the second does involve calculating distances, and
thus we might be storing and calculating distances for little pruning benefit.
We propose in this paper an optimized multi-resolution indexing and retrieval
scheme. Taking into account that the time series to which we apply equations (1) and
(2) are those which have not been filtered out at lower resolution levels, this opti-
mized scheme should determine the optimal combination of resolution levels the algo-
rithm should keep at indexing time and consequently use at query time.
The optimization algorithm we use to solve this problem is the Genetic Algorithm.
The Genetic Algorithm (GA) is a well-known evolutionary algorithm that has been ap-
plied to solve a variety of optimization problems. GA is a population-based global
optimization algorithm which mimics the rules of Darwinian selection, in that weaker
individuals have less chance of surviving the evolution process than stronger ones.
GA captures this concept by adopting a mechanism that preserves the “good” features
during the optimization process.
In GA, a population of candidate solutions (chromosomes) explores the search space
and exploits it by sharing information. These chromosomes evolve using genetic
operations (selection, crossover, mutation, and replacement).
GA starts by randomly initializing a population of chromosomes inside the search
space. The fitness function of these chromosomes is evaluated. According to the val-
ues of the fitness function new offspring chromosomes are generated through the
aforementioned genetic operations. The above steps repeat for a number of genera-
tions or until a predefined stopping condition terminates the GA.
The new algorithm, which we call Optimized Multi-Resolution Indexing and Re-
trieval – O-MIR, works as follows: we proceed in the same manner described for
Tight-MIR to produce candidate resolution levels. The next step is handled by the
optimizer, which selects, out of all the candidate resolution levels, the subset of
resolution levels that provides the maximum pruning power. For the current version of
our algorithm, the number of resolution levels to be kept is chosen by the user according
to the storage and processing capacity of the system. In other words, our algorithm will
decide which are the optimal resolution levels to be kept out of the resolution
levels produced by the indexing step.
Notice that when only one resolution level is kept, we have the case of tradi-
tional dimensionality reduction techniques.
The optimization stage of O-MIR starts by randomly initializing a population of
chromosomes, where each chromosome encodes a selection of resolution levels and thus
represents a possible configuration of the resolution
levels to be kept. The fitness function of our optimization problem is the pruning
power of this configuration. As in [5], the performance criterion is based on the laten-
cy time concept presented in [6]. The latency time is calculated from the number of
cycles the processor takes to perform the different arithmetic operations (>, +, -, *, abs,
sqrt) which are required to execute the similarity search query. This number for each
operation is multiplied by the latency time of that operation to get the total latency
time of the similarity search query. The latency time is 5 cycles for (>, +, -), 1 cycle
for (abs), 24 cycles for (*), and 209 cycles for (sqrt) [6]. The latency time for each
chromosome is the average of the latency times of a number of random queries.
As with other GAs, our algorithm selects a percentage of chromosomes for
mating and mutates a percentage of genes. The above steps repeat for a given number of
generations.
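A minimal sketch of this optimization stage is given below; it assumes a user-supplied latency(levels, query) function measuring the latency time of a similarity search restricted to the chosen resolution levels, and it only illustrates the chromosome encoding and the selection/crossover/mutation loop, not the actual O-MIR implementation.

```python
import random

def omir_ga(n_levels, k, latency, queries, pop_size=16, generations=100,
            mutation_rate=0.2, selection_rate=0.5):
    """Search for the k resolution levels (out of n_levels candidates) whose
    use yields the lowest average latency time over the given random queries."""
    def fitness(levels):
        return sum(latency(levels, q) for q in queries) / len(queries)

    population = [tuple(sorted(random.sample(range(n_levels), k)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)                  # lower latency is better
        parents = population[:max(2, int(selection_rate * pop_size))]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(random.sample(pool, k))       # crossover: pick k from the union
            if random.random() < mutation_rate:       # mutation: swap one level
                child.pop()
                child.add(random.choice([l for l in range(n_levels) if l not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
    return min(population, key=fitness)
```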
4 Experiments
We compared O-MIR with Tight-MIR on similarity search experiments on different
time series datasets from different time series repositories [7] and [8], using different
threshold values and different numbers of resolution levels kept. Since this number is
related to the total number of resolution levels, which in turn depends on the length of
the time series tested, we denote by pk the percentage of resolution levels kept relative
to the total number of resolution levels.
As for the parameters of the algorithm used in the experiments: the popula-
tion size was 16, the number of generations was 100, the mutation
rate was 0.2, the selection rate was 0.5, and the number of queries
was set to 10.
We show in Fig. 1 the results of our experiments. For (CBF), (Wafer), and (Gun-
Point) we have 5 resolution levels. For these datasets we chose pk =
40%, 60%, 80%. (motoCurrent) has 8 resolution levels, and we chose pk =
25%, 50%, 75%. As we can see, the results are promising in terms of latency time and
storage space. For the first three datasets O-MIR is faster than Tight-MIR and, in addi-
tion, it required less storage space. This is also the case with (motoCurrent), except for
the 20% setting, where the latency time for O-MIR was longer than that of Tight-MIR.
However, for this value the gain in storage space is substantial without much
increase in latency time.
Fig. 1. Comparison of the latency time between Tight-MIR and O-MIR on datasets (CBF),
(Wafer), (GunPoint), and (motoCurrent) for different values of pk
5 Conclusion
References
1. Morinaka, Y., Yoshikawa, M., Amagasa, T., Uemura, S.: The L-index: an indexing struc-
ture for efficient subsequence matching in time sequence databases. In: Proc. 5th Pacific
Asia Conf. on Knowledge Discovery and Data Mining (2001)
2. Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Locally Adaptive Dimensionality
Reduction for Similarity Search in Large Time Series Databases. SIGMOD (2001)
3. Muhammad Fuad, M.M., Marteau P.F.: Multi-resolution approach to time series retrieval.
In: Fourteenth International Database Engineering & Applications Symposium– IDEAS
2010, Montreal, QC, Canada (2010)
4. Muhammad Fuad, M.M., Marteau P.F.: Speeding-up the similarity search in time series da-
tabases by coupling dimensionality reduction techniques with a fast-and-dirty filter. In:
Fourth IEEE International Conference on Semantic Computing– ICSC 2010, Carnegie Mel-
lon University, Pittsburgh, PA, USA (2010)
5. Muhammad Fuad, M.M., Marteau, P.F.: Fast retrieval of time series by combining a multi-
resolution filter with a representation technique. In: The International Conference on Ad-
vanced Data Mining and Applications–ADMA 2010, ChongQing, China, November 21,
2010
6. Schulte, M.J., Lindberg, M. Laxminarain, A.: Performance evaluation of decimal floating-
point arithmetic. In: IBM Austin Center for Advanced Studies Conference, February 2005
7. http://povinelli.eece.mu.edu/
8. Keogh, E., Zhu, Q., Hu, B., Hao, Y., Xi, X., Wei, L., Ratanamahatana, C.A.: The UCR
Time Series Classification/Clustering (2011). www.cs.ucr.edu/~eamonn/time_series_data/
Multi-agent Systems:
Theory and Applications
Minimal Change in Evolving Multi-Context
Systems
1 Introduction
Open and dynamic environments create new challenges for knowledge repre-
sentation languages for agent systems. Instead of having to deal with a single
static knowledge base, each agent has to deal with multiple sources of distributed
knowledge possibly written in different languages. These sources of knowledge
include the large number of available ontologies and rule sets, as well as the
norms and policies published by institutions and the information communicated by
other agents, to name only a few.
The need to incorporate in agent-oriented programming languages the ability
to represent and reason with heterogeneous distributed knowledge sources, and
with the flow of information between them, has been pointed out in [1–4], although
an adequate general practical solution is still not available.
Recent literature in knowledge representation and reasoning contains several
proposals to combine heterogeneous knowledge bases, one of which – Multi-
Context Systems (MCSs) [5–7] – has attracted particular attention because it
1 For simplicity of presentation, we consider discrete steps in time here.
3 Minimal Change
In this section, we discuss some alternatives for the notion of minimal change in
eMCSs. What makes this problem interesting is that there are different param-
eters that we may want to minimize in a transition from one time instant to the
next one. In the following discussion we focus on two we deem most relevant: the
operations that can be applied to the knowledge bases, and the distance between
consecutive belief states.
We start by studying minimal change at the level of the operations. In the
following discussion we consider a fixed eMCS Me = ⟨C1, . . . , Cn, O1, . . . , Oℓ⟩.
Recall from the definition of evolving equilibrium that, in the transition
between consecutive time instants, the knowledge base of each context Ci of
Me changes according to the operations in app^next_i(S, O), and these depend on
the belief state S and the instant observation O. A first idea for comparing elements
of this set of operations is, for a fixed instant observation O, to distinguish
those equilibria of Me which generate a minimal set of operations to be applied
to the current knowledge bases to obtain the knowledge bases of the next time
instant. Formally, given a knowledge base configuration K ∈ KB_Me and an
instant observation O for Me, we can define the set:
MinEq(K, O) = {S : S is an equilibrium of Me[K] given O and there is no
equilibrium S′ of Me[K] given O such that, for all 1 ≤ i ≤ n,
app^next_i(S′, O) ⊂ app^next_i(S, O)}
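A minimal Python sketch of this inclusion-based pruning is given below. It is purely illustrative: equilibria are plain identifiers, and a precomputed mapping (here called app_next) stands in for app^next_i(S, O); all names and data are hypothetical.

def min_eq(equilibria, app_next):
    """Keep only the equilibria whose generated operation sets are
    inclusion-minimal: S is discarded when some other equilibrium S2
    yields a strictly smaller operation set for every context i."""
    def dominates(s2, s):
        ops2, ops = app_next[s2], app_next[s]
        return all(o2 < o for o2, o in zip(ops2, ops))  # '<' is proper subset

    return [s for s in equilibria
            if not any(dominates(s2, s) for s2 in equilibria if s2 != s)]

# Toy example with two contexts: S_b is dominated by S_a and is pruned.
app_next = {
    "S_a": ({"add(p)"}, {"add(q)"}),
    "S_b": ({"add(p)", "add(r)"}, {"add(q)", "add(t)"}),
}
print(min_eq(["S_a", "S_b"], app_next))  # ['S_a']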
This first idea of comparing equilibria based on inclusion of the sets of oper-
ations can, however, be too strict in most cases. Moreover, different operations
usually have different costs,2 and it may well be that, instead of minimizing
based on set inclusion, we want to minimize the total cost of the operations to
be applied. For that, we need to assume that each context has a cost function
over the set of operations, i.e., cost_i : OP_i → N, where cost_i(op) represents the
cost of performing operation op.
Let S be a belief state for Me and O an instant observation for Me . Then,
for each 1 ≤ i ≤ n, we define the cost of the operations to be applied to obtain
the knowledge base of the next time instant as:
Cost_i(S, O) = Σ_{op(s) ∈ app^next_i(S, O)} cost_i(op)
Summing for all evolving contexts, we obtain the global cost of S given O:
Cost(S, O) = Σ_{i=1}^{n} Cost_i(S, O)
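The cost-based variant of the minimization can be sketched in the same illustrative style; the function and variable names below (cost_fns, app_next) are hypothetical, and the selection of cost-minimal equilibria mirrors the MinCost set used later in the paper.

def global_cost(ops_per_context, cost_fns):
    """Cost(S, O): sum over all contexts i of the costs of the operations
    in app^next_i(S, O)."""
    return sum(cost_fns[i](op)
               for i, ops in enumerate(ops_per_context)
               for op in ops)

def min_cost(equilibria, app_next, cost_fns):
    """Keep the equilibria with minimal global cost."""
    costs = {s: global_cost(app_next[s], cost_fns) for s in equilibria}
    best = min(costs.values())
    return [s for s in equilibria if costs[s] == best]

# Toy example: every 'add' operation costs 1, every 'rm' operation costs 3.
cost_fns = [lambda op: 3 if op.startswith("rm") else 1] * 2
app_next = {"S_a": ({"add(p)"}, {"rm(q)"}), "S_b": ({"add(p)"}, {"add(q)"})}
print(min_cost(["S_a", "S_b"], app_next, cost_fns))  # ['S_b']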
Now that we have defined a cost function over belief states, we can define a
minimization function over the possible equilibria of the eMCS Me[K] for a fixed
knowledge base configuration K ∈ KB_Me. Formally, given an instant observation
O for Me, we define the set of equilibria of Me[K] given O which minimize the global
2 We use the notion of cost in an abstract sense, i.e., depending on the context, it may
refer to, e.g., the computational cost of the operation, or its economic cost.
Proposition 1. The functions d_max and d_avg defined above are both distance
functions, i.e., they satisfy the axioms (1)–(4).
We now study how we can use one of these distance functions between belief
states to compare the possible alternatives in the sets mng_i(app^next_i(S, O), kb_i),
for each 1 ≤ i ≤ n. Recall that the intuitive idea is to minimize the distance
between the current belief state S and the possible equilibria that each element
of mng_i(app^next_i(S, O), kb_i) can give rise to. We explore here two options, which
differ on whether the minimization is global or local. The idea of global minimization
is to choose only those knowledge base configurations ⟨k1, . . . , kn⟩ ∈ KB_Me,
with ki ∈ mng_i(app^next_i(S, O), kb_i), which guarantee minimal distance between
the original belief state S and the possible equilibria of the obtained eMCS.
The idea of local minimization is to consider all possible tuples ⟨k1, . . . , kn⟩
with ki ∈ mng_i(app^next_i(S, O), kb_i), and only apply minimization for each such
choice, i.e., for each such knowledge base configuration we only allow equilibria
with minimal distance from the original belief state.
We first consider the case of pruning those tuples ⟨k1, . . . , kn⟩ such that
ki ∈ mng_i(app^next_i(S, O), kb_i), which do not guarantee minimal change with
respect to the original belief state. We start by defining an auxiliary function.
Let S be a belief state for Me, K = ⟨k1, . . . , kn⟩ ∈ KB_Me a knowledge base
configuration for Me, and O = ⟨o1, . . . , oℓ⟩ an instant observation for Me. Then
we define the set of knowledge base configurations that are obtained from K
given the belief state S and the instant observation O as:
NextKB(S, O, ⟨k1, . . . , kn⟩) = {⟨k′1, . . . , k′n⟩ ∈ KB_Me : for each 1 ≤ i ≤ n
we have that k′i ∈ mng_i(app^next_i(S, O), ki)}
For each choice d of a distance function between belief states, we define the set of
knowledge base configurations that minimize the distance to the original belief
state. Let S be a belief state for Me, K = ⟨k1, . . . , kn⟩ ∈ KB_Me a knowledge
base configuration for Me, and O^j and O^{j+1} instant observations for Me.
MinNext(S, O^j, O^{j+1}, K) = {(S′, K′) : K′ ∈ NextKB(S, O^j, K) and
S′ ∈ MinCost(Me[K′], O^{j+1}) s.t. there is no
K′′ ∈ NextKB(S, O^j, K) and no
S′′ ∈ MinCost(Me[K′′], O^{j+1}) with
d(S, S′′) < d(S, S′)}.
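The following Python sketch mirrors the structure of MinNext under simplifying assumptions: NextKB, MinCost and the distance d are passed in as callables, and belief states and configurations are plain identifiers; all names are hypothetical.

def min_next(S, O_j, O_j1, K, next_kb, min_cost, dist):
    """Among all K2 in NextKB(S, O_j, K) and all cost-minimal equilibria S2
    of Me[K2] given O_{j+1}, keep the pairs (S2, K2) whose distance to the
    original belief state S is minimal."""
    candidates = [(S2, K2)
                  for K2 in next_kb(S, O_j, K)
                  for S2 in min_cost(K2, O_j1)]
    if not candidates:
        return []
    best = min(dist(S, S2) for S2, _ in candidates)
    return [(S2, K2) for S2, K2 in candidates if dist(S, S2) == best]

# Toy usage with stub callables: only the closer equilibrium survives.
nk = lambda S, O, K: ["K1", "K2"]
mc = lambda K, O: {"K1": ["S1"], "K2": ["S2"]}[K]
d = lambda a, b: {("S0", "S1"): 1, ("S0", "S2"): 2}[(a, b)]
print(min_next("S0", "o1", "o2", "K0", nk, mc, d))  # [('S1', 'K1')]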
Note that MinNext applies minimization over all possible equilibria resulting
from every element of NextKB(S, O^j, K). Using MinNext, we can now define
a minimal change criterion to be applied to evolving equilibria of Me.
We call this minimal change criterion the strong minimal change criterion
because it applies minimization over all possible equilibria resulting from every
possible knowledge base configuration in NextKB(S, O^j, K).
The following proposition states the desirable property that the existence of
an equilibrium guarantees the existence of an equilibrium satisfying the strong
minimal change criterion. We should note that this is not a trivial statement
since we are combining minimization of two different elements: the cost of the
operations and the distance between belief states. This proposition in fact follows
from their careful combination in the definition of MinNext.
Proposition 2. Let Obs = ⟨O^1, . . . , O^m⟩ be an observation sequence for Me. If
Me has an evolving equilibrium of size s given Obs, then at least one evolving
equilibrium of size s given Obs satisfies the strong minimal change criterion.
Note that in the definition of the strong minimal change criterion, the knowledge
base configurations K ∈ NextKB(S^j, O^j, K^j) for which the corresponding
possible equilibria are not at a minimal distance from S^j are not considered. However,
there could be situations in which this minimization criterion is too strong.
For example, it may well be that all possible knowledge base configurations in
NextKB(S^j, O^j, K^j) are important, and we do not want to disregard any of them.
In that case, we can relax the minimization condition by applying minimization
individually for each knowledge base configuration in NextKB(S^j, O^j, K^j). The
idea is that, for each fixed K ∈ NextKB(S^j, O^j, K^j), we choose only those
equilibria of Me[K] which minimize the distance to S^j.
Formally, let S be a belief state for Me, K ∈ KB_Me a knowledge base
configuration for Me, and O an instant observation for Me. For each distance
function d between belief states, we can define the following set:
MinDist(S, O, K) = {S′ : S′ ∈ MinCost(Me[K], O) and
there is no S′′ ∈ MinCost(Me[K], O)
such that d(S, S′′) < d(S, S′)}
Using this more relaxed notion of minimization we can define an alternative
weaker minimal change criterion to be applied to evolving equilibria of an eMCS.
Definition 8. Let Me = ⟨C1, . . . , Cn, O1, . . . , Oℓ⟩ be an eMCS, Obs =
⟨O^1, . . . , O^m⟩ an observation sequence for Me, and Se = ⟨S^1, . . . , S^s⟩ an evolving
equilibrium of Me given Obs. We assume that ⟨K^1, . . . , K^s⟩, with K^j =
⟨k^j_1, . . . , k^j_n⟩, is the sequence of knowledge base configurations associated with Se
as in Definition 6. Then, Se satisfies the weak minimal change criterion of Me
given Obs, if for each 1 ≤ j ≤ s the following conditions are satisfied:
– S^j ∈ MinCost(Me[K^j], O^j)
– S^{j+1} ∈ MinDist(S^j, K^{j+1}, O^{j+1})
We can now prove that the existence of an evolving equilibrium implies the
existence of an equilibrium satisfying the weak minimal change criterion. Again
note that the careful combination of the two minimizations – cost and distance
– in the definition of MinDist is fundamental to obtain the following result.
We can now prove that the strong minimal change criterion is, in fact,
stronger than the weak minimal change criterion.
is to apply the ideas in this paper to study the dynamics of frameworks closely
related to MCSs, such as those in [27–30].
Finally, and in line with the very motivation set out in the introduction, we
believe that the research in MCSs – including eMCSs with the different notions
of minimal change – provides a blue-print on how to represent and reason with
heterogeneous dynamic knowledge bases which could (should) be used by devel-
opers of practical agent-oriented programming languages, such as JASON [31],
2APL [32], or GOAL [33], in their quest for providing users and programmers
with greater expressiveness and flexibility in terms of the knowledge representa-
tion and reasoning facilities provided by such languages. To this end, an appli-
cation scenario that could provide interesting and rich examples would be that
of norm-aware multi-agent systems [34–39].
Acknowledgments. We would like to thank the referees for their comments, which
helped improve this paper. R. Gonçalves, M. Knorr and J. Leite were partially sup-
ported by FCT under project ERRO (PTDC/EIA-CCO/121823/2010) and under
strategic project NOVA LINCS (PEst/UID/CEC/04516/2013). R. Gonçalves was par-
tially supported by FCT grant SFRH/BPD/100906/2014 and M. Knorr was partially
supported by FCT grant SFRH/BPD/86970/2012.
References
1. Dastani, M., Hindriks, K.V., Novák, P., Tinnemeier, N.A.M.: Combining multi-
ple knowledge representation technologies into agent programming languages. In:
Baldoni, M., Son, T.C., van Riemsdijk, M.B., Winikoff, M. (eds.) DALT 2008.
LNCS (LNAI), vol. 5397, pp. 60–74. Springer, Heidelberg (2009)
2. Klapiscak, T., Bordini, R.H.: JASDL: A practical programming approach com-
bining agent and semantic web technologies. In: Baldoni, M., Son, T.C.,
van Riemsdijk, M.B., Winikoff, M. (eds.) DALT 2008. LNCS (LNAI), vol. 5397,
pp. 91–110. Springer, Heidelberg (2009)
3. Moreira, Á.F., Vieira, R., Bordini, R.H., Hübner, J.F.: Agent-oriented programming
with underlying ontological reasoning. In: Baldoni, M., Endriss, U., Omicini, A.,
Torroni, P. (eds.) DALT 2005. LNCS (LNAI), vol. 3904, pp. 155–170. Springer,
Heidelberg (2006)
4. Alberti, M., Knorr, M., Gomes, A.S., Leite, J., Gonçalves, R., Slota, M.: Norma-
tive systems require hybrid knowledge bases. In: van der Hoek, W., Padgham, L.,
Conitzer, V., Winikoff, M. (eds.) Procs. of AAMAS, pp. 1425–1426. IFAAMAS (2012)
5. Brewka, G., Eiter, T.: Equilibria in heterogeneous nonmonotonic multi-context
systems. In: Procs. of AAAI, pp. 385–390. AAAI Press (2007)
6. Giunchiglia, F., Serafini, L.: Multilanguage hierarchical logics or: How we can do
without modal logics. Artif. Intell. 65(1), 29–70 (1994)
7. Roelofsen, F., Serafini, L.: Minimal and absent information in contexts. In:
Kaelbling, L., Saffiotti, A. (eds.) Procs. of IJCAI, pp. 558–563. Professional Book
Center (2005)
8. Brewka, G., Eiter, T., Fink, M., Weinzierl, A.: Managed multi-context systems. In:
Walsh, T. (ed.) Procs. of IJCAI, pp. 786–791. IJCAI/AAAI (2011)
9. Benerecetti, M., Giunchiglia, F., Serafini, L.: Model checking multiagent systems.
J. Log. Comput. 8(3), 401–423 (1998)
10. Dragoni, A., Giorgini, P., Serafini, L.: Mental states recognition from communica-
tion. J. Log. Comput. 12(1), 119–136 (2002)
11. Sabater, J., Sierra, C., Parsons, S., Jennings, N.R.: Engineering executable agents
using multi-context systems. J. Log. Comput. 12(3), 413–442 (2002)
12. Gonçalves, R., Knorr, M., Leite, J.: Evolving multi-context systems. In: Schaub, T.,
Friedrich, G., O’Sullivan, B. (eds.) Procs. of ECAI. Frontiers in Artificial Intelligence
and Applications, vol. 263, pp. 375–380. IOS Press (2014)
13. Brewka, G.: Towards reactive multi-context systems. In: Cabalar, P., Son, T.C.
(eds.) LPNMR 2013. LNCS, vol. 8148, pp. 1–10. Springer, Heidelberg (2013)
14. Ellmauthaler, S.: Generalizing multi-context systems for reactive stream reasoning
applications. In: Procs. of ICCSW. OASICS, vol. 35, pp. 19–26. Schloss Dagstuhl
- Leibniz-Zentrum fuer Informatik, Germany (2013)
15. Brewka, G., Ellmauthaler, S., Pührer, J.: Multi-context systems for reactive rea-
soning in dynamic environments. In: Schaub, T., Friedrich, G., O’Sullivan, B.,
(eds.) Procs. of ECAI. Frontiers in Artificial Intelligence and Applications, vol. 263,
pp. 159–164. IOS Press (2014)
16. Alferes, J.J., Brogi, A., Leite, J., Moniz Pereira, L.: Evolving logic programs. In:
Flesca, S., Greco, S., Leone, N., Ianni, G. (eds.) JELIA 2002. LNCS (LNAI), vol.
2424, p. 50. Springer, Heidelberg (2002)
17. Gebser, M., Grote, T., Kaminski, R., Schaub, T.: Reactive answer set program-
ming. In: Delgrande, J.P., Faber, W. (eds.) LPNMR 2011. LNCS, vol. 6645,
pp. 54–66. Springer, Heidelberg (2011)
18. Wang, Y., Zhuang, Z., Wang, K.: Belief change in nonmonotonic multi-context
systems. In: Cabalar, P., Son, T.C. (eds.) LPNMR 2013. LNCS, vol. 8148,
pp. 543–555. Springer, Heidelberg (2013)
19. Kaminski, T., Knorr, M., Leite, J.: Efficient paraconsistent reasoning with ontolo-
gies and rules. In: Procs. of IJCAI. IJCAI/AAAI (2015)
20. Gonçalves, R., Knorr, M., Leite, J.: Evolving bridge rules in evolving multi-context
systems. In: Bulling, N., van der Torre, L., Villata, S., Jamroga, W., Vasconcelos, W.
(eds.) CLIMA 2014. LNCS, vol. 8624, pp. 52–69. Springer, Heidelberg (2014)
21. Slota, M., Leite, J.: On semantic update operators for answer-set programs. In:
Coelho, H., Studer, R., Wooldridge, M. (eds.) Procs. of ECAI. Frontiers in Arti-
ficial Intelligence and Applications, vol. 215, pp. 957–962. IOS Press (2010)
22. Slota, M., Leite, J.: Robust equivalence models for semantic updates of answer-set
programs. In: Brewka, G., Eiter, T., McIlraith, S.A. (eds.) Procs. of KR. AAAI
Press (2012)
23. Slota, M., Leite, J.: The rise and fall of semantic rule updates based on se-models.
TPLP 14(6), 869–907 (2014)
24. Slota, M., Leite, J.: A unifying perspective on knowledge updates. In: del Cerro, L.F.,
Herzig, A., Mengin, J. (eds.) JELIA 2012. LNCS, vol. 7519, pp. 372–384. Springer,
Heidelberg (2012)
25. Gonçalves, R., Knorr, M., Leite, J.: Towards efficient evolving multi-context sys-
tems (preliminary report). In: Ellmauthaler, S., Pührer, J. (eds.) Procs. of React-
Know (2014)
26. Knorr, M., Gonçalves, R., Leite, J.: On efficient evolving multi-context systems.
In: Pham, D.-N., Park, S.-B. (eds.) PRICAI 2014. LNCS, vol. 8862, pp. 284–296.
Springer, Heidelberg (2014)
27. Knorr, M., Slota, M., Leite, J., Homola, M.: What if no hybrid reasoner is available?
hybrid MKNF in multi-context systems. J. Log. Comput. 24(6), 1279–1311 (2014)
28. Gonçalves, R., Alferes, J.J.: Parametrized logic programming. In: Janhunen, T.,
Niemelä, I. (eds.) JELIA 2010. LNCS, vol. 6341, pp. 182–194. Springer, Heidelberg
(2010)
29. Knorr, M., Alferes, J., Hitzler, P.: Local closed world reasoning with description
logics under the well-founded semantics. Artif. Intell. 175(9–10), 1528–1554 (2011)
30. Ivanov, V., Knorr, M., Leite, J.: A query tool for EL with non-monotonic rules.
In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X.,
Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013, Part I. LNCS,
vol. 8218, pp. 216–231. Springer, Heidelberg (2013)
31. Bordini, R.H., Hübner, J.F., Wooldridge, M.: Programming Multi-Agent Systems
in AgentSpeak Using Jason (Wiley Series in Agent Technology). John Wiley &
Sons (2007)
32. Dastani, M.: 2APL: a practical agent programming language. Journal of
Autonomous Agents and Multi-Agent Systems 16(3), 214–248 (2008)
33. Hindriks, K.V.: Programming rational agents in GOAL. In: El Fallah
Seghrouchni, A., Dix, J., Dastani, M., Bordini, R.H. (eds.) Multi-Agent Pro-
gramming, pp. 119–157. Springer, US (2009)
34. Criado, N., Argente, E., Botti, V.J.: THOMAS: an agent platform for supporting
normative multi-agent systems. J. Log. Comput. 23(2), 309–333 (2013)
35. Meneguzzi, F., Rodrigues, O., Oren, N., Vasconcelos, W.W., Luck, M.: BDI rea-
soning with normative considerations. Eng. Appl. of AI 43, 127–146 (2015)
36. Cardoso, H.L., Oliveira, E.: A context-based institutional normative environment.
In: Hübner, J.F., Matson, E., Boissier, O., Dignum, V. (eds.) COIN 2008. LNCS,
vol. 5428, pp. 140–155. Springer, Heidelberg (2009)
37. Gerard, S.N., Singh, M.P.: Evolving protocols and agents in multiagent systems.
In: Gini, M.L., Shehory, O., Ito, T., Jonker, C.M. (eds.) Procs. of AAMAS,
pp. 997–1004. IFAAMAS (2013)
38. Vasconcelos, W.W., Kollingbaum, M.J., Norman, T.J.: Normative conflict resolu-
tion in multi-agent systems. Autonomous Agents and Multi-Agent Systems 19(2),
124–152 (2009)
39. Panagiotidi, S., Alvarez-Napagao, S., Vázquez-Salceda, J.: Towards the norm-
aware agent: bridging the gap between deontic specifications and practical mech-
anisms for norm monitoring and norm-aware planning. In: Balke, T., Dignum, F.,
van Riemsdijk, M.B., Chopra, A.K. (eds.) COIN 2013. LNCS, vol. 8386, pp. 346–363.
Springer, Heidelberg (2014)
Bringing Constitutive Dynamics
to Situated Artificial Institutions
1 Introduction
Among the different works related to artificial institutions, [1,2] are concerned
with the grounding of norms in the environment where the agents act, keeping
a clear separation among regulative, constitutive, and environmental elements
involved in the regulation of Multi-Agent Systems (MAS). In this paper we con-
sider and extend the Situated Artificial Institution (SAI) model [2]. The choice
of SAI is motivated by its available specification language, which makes it possible
to specify norms that are decoupled from, but still grounded in, the environment, as shown in [3].
For example, the norm stating that “the winner of the auction is obliged to
pay its offer” is specified on top of a constitutive level that defines who, in the
environment, is the winner that must pay its offer and what must be done, in
the environment, to comply with that expectation. Norms abstracting from the
environment are more stable and flexible but must be connected to the environ-
ment [1], as the regulation of the system (realised in what we call institutions)
is, in fact, the regulation of what happens in the environment.
The notion of constitution proposed by John Searle [4] has inspired different
works addressing the relation between the environment and the regulative ele-
ments in MAS. Among them, SAI goes in a particular direction, considering that
constitutive rules specify how agents acting, events occurring, and states holding
in the environment compose (or constitute) the constitutive level of the institu-
tion. In the previous example, a constitutive rule could state that the agent that
acts in the environment placing the best bid counts, in the constitutive level, as
the winner of the auction (Figure 1).
While the notion of constitution in SAI is well defined, a precise and formal
definition of the dynamics of the constitutive level, resulting from the interpretation
of constitutive rules, is still lacking. Interpreting the constitutive rules and managing
the SAI constitutive state requires considering (i) how to handle the
different natures of the environmental elements that may constitute the elements
relevant to the institutional regulation (i.e. agents, events, states) and (ii) how
to base the dynamics of the constitution both on the occurrences of these elements
in the environment and on the production of new constitutive elements in
the institution itself. Taking for granted that the institutional regulation depends
on the constitutive state, this paper starts from the SAI conceptual model to
propose clearly defined semantics addressing these two challenges.
The paper begins with a global overview of the SAI model (Section 2), on
which we base our contributions, which are presented in Sections 3 and 4.
While Section 3 introduces the necessary representations to support the
interpretation of the constitutive rules, Section 4 focuses on the dynamic
aspects of this interpretation. Before concluding and pointing out some perspectives
for future work, Section 5 discusses the contributions of this paper with respect
to related work.
2 Background
Before presenting our contributions in the next sections, this section briefly
describes the SAI model proposed in [2]. In SAI, norms define the expected
behaviour of the agents at an abstract level that is not directly related to
the environment. For example, the norm “the winner of an auction is obliged
to pay its offer” specifies neither who the winner obliged to fulfil the norm is
nor what the winner must concretely do to fulfil it. The effectiveness
of a norm depends on its connection to the environment, as its dynamics
(activation, fulfilment, etc.) result from facts occurring there. Such a connection
is established when the components of the norms – the status functions – are
constituted, according to constitutive rules, from the environmental elements
(Figure 1). These elements are described below:
– The environmental elements, represented by X = AX ∪ EX ∪ SX , are organized
in the set AX of agents possibly acting in the system, the set EX of events that
may happen in the environment, and the set SX of properties used to describe
the possible states of the environment.
– The status functions of a SAI, represented by F = AF ∪ EF ∪ SF , are the
set AF of agent-status functions (i.e. status functions assignable to agents), the
set EF of event-status functions (i.e. status functions assignable to events), and
the set SF of state-status functions (i.e. status functions assignable to states).
Status functions are functions that the environmental elements (agents, events,
and states) perform from the institutional perspective [4]. For example, in an
auction, an agent may have the function of winner, the utterance “I offer $100”
may have the function of bid, and the state of “more than 20 people placed in a
room at Friday 10am” may mean the minimum quorum for its realization.
– The constitutive rules defined in C specify the constitution of the status functions
of F from the environmental elements of X. A constitutive rule c ∈ C is a
tuple ⟨x, y, t, m⟩ where x ∈ F ∪ X ∪ {ε}, y ∈ F, t ∈ EF ∪ EX ∪ {ε}, m ∈ W, and
W = WF ∪ WX . WF is the set of status-functions-formulae (sf-formulae) and
WX is the set of environment-formulae (e-formulae), defined later. A constitutive
rule ⟨x, y, t, m⟩ specifies that x counts as y when t has happened while m holds.
If x = ε, then there is a freestanding assignment of the status function y, i.e. an
assignment where there is not a concrete environmental element carrying y [2,4].
When x actually counts as y (i.e. when the conditions t and m declared in the
constitutive rule are true), we say that there is a status function assignment
(SFA) of the status function y to the element x. The establishment of a SFA of
y to some x is the constitution of y. The set of all SFAs of a SAI composes its
constitutive state (see Def. 4).
The sf-formulae wF ∈ WF are logical formulae, based on status functions (see
the Expression 1 below). The e-formulae wX ∈ WX are logical formulae, based
on environmental elements (see the Expression 2 below). Section 3 defines the
proper semantics of these formulae, based on SFA and on the actual environment.
2 Similarly to the SAI specification, the SAI dynamics can be divided into two parts:
(i) constitutive dynamics, consisting of status function assignments and revocations,
and (ii) normative dynamics, consisting of norm activations, fulfilments,
violations, etc. The normative dynamics are beyond the scope of this paper.
syntax)³:
SX |= wX iff ∃θ : wX ∈ SX ∧ wX θ ∈ SX   (3)
EX |= wX iff ∃θ : wX ∈ EX ∧ wX θ ∈ EX   (4)
Definition 4 (Constitutive state). The constitutive state of a SAI is represented
by 𝓕 = ⟨𝓐F, 𝓔F, 𝓢F⟩ where (i) 𝓐F ⊆ AX × AF is the set of agent-status
function assignments, (ii) 𝓔F ⊆ EX × EF × AX is the set of event-status function
assignments, and (iii) 𝓢F ⊆ SX × SF is the set of state-status function
assignments.
As introduced in the previous section, SFAs are relations between environmental
elements and status functions. Elements of 𝓐F are pairs ⟨aX, aF⟩ meaning that
the agent aX ∈ AX has the status function aF ∈ AF. Elements of 𝓔F are triples
⟨eX, eF, aX⟩ meaning that the event-status function eF ∈ EF is assigned to the
event eX ∈ EX produced by the agent aX ∈ AX. As events are supposed to be
considered at the individual agent level in normative systems [6], it is important
to record the agent that causes an event-status function assignment. Elements
of 𝓢F are pairs ⟨sX, sF⟩ meaning that the state sX ∈ SX carries the status
function sF ∈ SF. In the following, we will write 𝓕 = ⟨𝓐F, 𝓔F, 𝓢F⟩ to denote
the current constitutive state, and 𝓕^i = ⟨𝓐F^i, 𝓔F^i, 𝓢F^i⟩ will be used to refer to the
constitutive state 𝓕 at step i of the SAI history.
The constitutive state 𝓕 is used to evaluate the sf-formulae (see Expression 1
for syntax). If an agent x participates in the system (i.e. x ∈ AX) and carries the
status function y (i.e. if ⟨x, y⟩ ∈ 𝓐F), then the formula x is y is true in the current
state 𝓕:
𝓐F |= x is y iff x ∈ AX ∧ y ∈ AF ∧ ⟨x, y⟩ ∈ 𝓐F   (5)
In the same way, the event-status function semantics is defined in Expression 6.
In addition, if an event-status function is assigned to some environmental event,
then this event-status function follows from the current constitutive state 𝓕
(Expression 7):
𝓔F |= x is y iff x ∈ EX ∧ x = ⟨e, a⟩ ∧ y ∈ EF ∧ ⟨e, y, a⟩ ∈ 𝓔F   (6)
𝓔F |= wF iff wF ∈ EF ∧ ∃eX : eX is wF   (7)
The state-status function semantics is similarly defined in Expression 8. In addition,
if there is some assignment involving a state-status function, then this state-status
function follows from the current constitutive state 𝓕 (Expression 9):
𝓢F |= x is y iff x ∈ SX ∧ y ∈ SF ∧ ⟨x, y⟩ ∈ 𝓢F   (8)
𝓢F |= wF iff wF ∈ SF ∧ ∃sX : sX is wF   (9)
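As an illustration of Expressions (5), (6) and (8), the sketch below represents the constitutive state as the three sets of tuples from Definition 4 and checks whether an environmental element carries a given status function. The auction-flavoured names are hypothetical, and the domain checks (x ∈ AX, y ∈ AF, etc.) are omitted for brevity.

# Constitutive state as in Definition 4: three sets of SFA tuples.
A_F = {("bob", "auctioneer")}                    # agent SFAs  (a_X, a_F)
E_F = {(("offer", 100), "to_bid", "alice")}      # event SFAs  (e_X, e_F, a_X)
S_F = {("auction_running", "auction_running")}   # state SFAs  (s_X, s_F)

def agent_is(x, y):
    """Expression (5): agent x carries the agent-status function y."""
    return (x, y) in A_F

def event_is(x, y):
    """Expression (6): event x = (e, a) carries the event-status function y,
    recorded together with the agent a that produced the event."""
    e, a = x
    return (e, y, a) in E_F

def state_is(x, y):
    """Expression (8): environmental state x carries the state-status function y."""
    return (x, y) in S_F

print(agent_is("bob", "auctioneer"))                   # True
print(event_is((("offer", 100), "alice"), "to_bid"))   # True
print(state_is("auction_running", "auction_running"))  # True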
The constitutive state defines how the institution is situated. The next section
defines how this constitutive state is deduced from the environmental state and
from the constitutive state itself.
³ In this paper, a substitution is always represented by θ. A substitution is a finite
set of pairs {α1/β1, · · · , αn/βn} where αi is a variable and βi is a term. If θ is a
substitution and ρ is a literal, then ρθ is the literal resulting from the replacement
of each αi in ρ by the corresponding βi [5].
status functions:
agents: auctioneer, bidder, current winner, winner.
events: to bid(Value), to pay(Value), to fine winner, commercial transaction.
states: auction running, auction finished, current value(Value).
norms:
1:auction finished: winner obliged to pay(current value).
constitutive rules:
/* The agent that proposes an auction is the auctioneer */
1: Agent count-as auctioneer when (propose(auction),Agent) while not auction finished.
/* While the auction is running, any agent other than the auctioneer is a bidder */
2: Agent count-as bidder while not(Agent is auctioneer)& auction running.
/* Auctioneer and bidders are auction participants */
3: auctioneer count-as auction participant
4: bidder count-as auction participant
/* The agent that performs the best bid is the current winner */
5: Agent count-as current winner when (to bid(Value),Agent)
while (not(current value(Current)) & Current>Value)& (auction running|auction finished).
/* The current winner is the (final) winner if the auction is finished */
6: current winner count-as winner while auction finished.
/* An auction is running while there is an agent being the auctioneer */
7: count-as auction running while is auctioneer.
/* Auctioneer hitting the hammer means that the auction is finished */
8: count-as auction finished when (hit hammer, Agent) while Agent is auctioneer.
/* An offer done by a bidder while the auction is running is a bid */
9: (offer(Value),Agent) count-as to bid(Value) while auction running & Agent is bidder.
/* An offered value is the current value if it is greater than the last one */
10: count-as current value(Value) when (to bid(Value),Agent)
while Agent is bidder & (not(Current is current value) & Current>Value)& (auction running|auction finished).
/* A bid is a commercial transaction */
11: to bid count-as commercial transaction.
/* A bank deposit from the winner to the auctioneer is a payment */
12: (bank deposit(Creditor,Value),Agent) count-as to pay(Value)
while Creditor is auctioneer & Agent is winner & auction finished & current value(Value).
4 Constitutive Dynamics
The interpretation of the constitutive rules produces the SFA composing the
SAI constitutive state. Constitutive rules can specify two kinds of constitution
of status functions: first-order constitution (Section 4.1) and second-order con-
stitution (Section 4.2). From these two definitions, Section 4.3 defines the con-
stitutive dynamics of SAI. This is all illustrated by considering an auction scenario
whose regulation is specified in Figure 2, according to the SAI specification
language proposed in [2].
𝓐F^i = {f-const_a(F, C, X^[i−1], 𝓕^[i−1]) ∪ s-const_a(F, C, X^[i], 𝓕^[i])}
𝓔F^i = {f-const_e(F, C, X^[i−1], 𝓕^[i−1]) ∪ s-const_e(F, C, X^[i], 𝓕^[i])}
𝓢F^i = {f-const_s(F, C, X^[i−1], 𝓕^[i−1]) ∪ s-const_s(F, C, X^[i], 𝓕^[i])}
Note that the proposed semantics does not only define the establishment of
SFAs but also their revocation. For example, constitutive rule 1
in Figure 2 defines that the agent that proposes an auction is the auctioneer
while the auction is not finished. In the example, this condition ceases to hold in
step 7, leading to a new state (8) where the assignment of the status function
auctioneer to the agent bob is revoked.
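The revocation behaviour can be pictured with a deliberately simplified sketch: it only handles rules of the form "x count-as y while m", treats the condition m as a callable over the environment, and ignores "when" triggers and second-order constitution, so it is not the paper's formal f-const/s-const semantics.

def constitutive_step(rules, env_state, current_sfas):
    """One simplified constitutive-dynamics step: an SFA (x, y) holds exactly
    when some rule (x, y, condition) has its condition true in the current
    environment; SFAs whose condition ceased to hold are revoked."""
    new_sfas = {(x, y) for x, y, condition in rules if condition(env_state)}
    revoked = current_sfas - new_sfas
    return new_sfas, revoked

# Toy run (cf. rule 1 of the auction example): once auction_finished holds,
# the assignment of 'auctioneer' to bob is revoked.
rules = [("bob", "auctioneer", lambda env: "auction_finished" not in env)]
sfas, revoked = constitutive_step(rules, {"auction_finished"}, {("bob", "auctioneer")})
print(sfas, revoked)  # set() {('bob', 'auctioneer')}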
5 Related Work
Different approaches in the literature investigate how environmental facts affect
artificial institutions. Some, unlike ours, do not consider the environment as
producing any kind of dynamics inside the institution: in [1,8], the environmental
elements are related to the concepts appearing in the norm specification, but they
do not produce facts related to the dynamics of norms (violations, fulfilments,
etc.); in [9], environmental facts determine properties that should hold in the
institution, but the institution is in charge of taking such information and producing
the corresponding dynamics where appropriate.
Some approaches, as we do, consider that environmental facts produce some
kind of dynamics in the institution: in [10] they affect the dynamics of organisations,
producing role assignments, goal achievements, etc.; they produce institutional
events in [11]; and they affect the normative dynamics in [12,13], producing
norm fulfilments, violations, etc. Compared to these related works, this paper
deals with the definition of how the environment determines a further kind of fact in the
institution, namely the constitution of status functions, defining (and not
just affecting) the constitutive dynamics that is the basis of the regulation in
SAI.
When the constitution of each kind of status functions is considered in iso-
lation, some relations can be made, for example, between the constitution of
important in the sense that it makes it possible to situate the institution in the
environment while also making it possible to consider the definition and dynamics of
constitutive abstractions, generalisations, etc.
Future work includes investigating how the normative state affects
the SAI constitutive state, normative dynamics on top of the constitutive
dynamics, and manipulations inside the constitutive level through second-order
constitution.
References
1. Aldewereld, H., Álvarez Napagao, S., Dignum, F., Vázquez-Salceda, J.: Making
norms concrete. In: van der Hoek, W., Kaminka, G.A., Lespérance, Y., Luck, M.,
Sen, S. (eds) AAMAS 2010, pp. 807–814 (2010)
2. de Brito, M., Hübner, J.F., Boissier, O.: A conceptual model for situated artificial
institutions. In: Bulling, N., van der Torre, L., Villata, S., Jamroga, W., Vasconcelos,
W. (eds.) CLIMA XV 2014. LNCS (LNAI), vol. 8624, pp. 35–51. Springer, Heidelberg
(2014)
3. De Brito, M., Thevin, L., Garbay, C., Boissier, O., Hübner, J.F.: Situated artificial
institution to support advanced regulation in the field of crisis management. In:
Demazeau, Y., Decker, K.S., Bajo Pérez, J., De la Prieta, F. (eds.) PAAMS 2015.
LNCS (LNAI), vol. 9086, pp. 66–79. Springer, Heidelberg (2015)
4. Searle, J.: Making the Social World. The Structure of Human Civilization. Oxford
University Press (2009)
5. Brachman, R., Levesque, H.: Knowledge Representation and Reasoning. Morgan
Kaufmann Publishers Inc., San Francisco (2004)
6. Vos, M.D., Balke, T., Satoh, K.: Combining event-and state-based norms. In:
AAMAS 2013, pp. 1157–1158 (2013)
7. Cassandras, C.G., Lafortune, S.: Introduction to Discrete Event Systems. Springer-
Verlag New York Inc., Secaucus (2006)
8. Grossi, D., Meyer, J.-J.C., Dignum, F.P.M.: Counts-as: classification or constitu-
tion? an answer using modal logic. In: Goble, L., Meyer, J.-J.C. (eds.) DEON 2006.
LNCS (LNAI), vol. 4048, pp. 115–130. Springer, Heidelberg (2006)
9. de Brito, M., Hübner, J.F., Bordini, R.H.: Programming institutional facts in multi-
agent systems. In: Aldewereld, H., Sichman, J.S. (eds.) COIN 2012. LNCS (LNAI),
vol. 7756, pp. 158–173. Springer, Heidelberg (2013)
10. Piunti, M., Boissier, O., Hübner, J.F., Ricci, A.: Embodied organizations: a uni-
fying perspective in programming agents, organizations and environments. In:
MALLOW 2010. CEUR, vol. 627 (2010)
11. Cliffe, O., De Vos, M., Padget, J.: Answer set programming for representing and
reasoning about virtual institutions. In: Inoue, K., Satoh, K., Toni, F. (eds.)
CLIMA 2006. LNCS (LNAI), vol. 4371, pp. 60–79. Springer, Heidelberg (2007)
12. Dastani, M., Grossi, D., Meyer, J.-J.C., Tinnemeier, N.: Normative multi-agent
programs and their logics. In: Meyer, J.-J.C., Broersen, J. (eds.) KRAMAS 2008.
LNCS (LNAI), vol. 5605, pp. 16–31. Springer, Heidelberg (2009)
13. Campos, J., López-Sánchez, M., Rodrı́guez-Aguilar, J.A., Esteva, M.: Formalising
situatedness and adaptation in electronic institutions. In: Hübner, J.F., Matson, E.,
Boissier, O., Dignum, V. (eds.) COIN 2008. LNCS (LNAI), vol. 5428, pp. 126–139.
Springer, Heidelberg (2009)
14. Jones, A., Sergot, M.: A formal characterisation of institutionalised power. Logic
Journal of IGPL 4(3), 427–443 (1996)
Checking WECTLK Properties of Timed
Real-Weighted Interpreted Systems
via SMT-Based Bounded Model Checking
1 Introduction
The formalism of interpreted systems (ISs) was introduced in [2] to model multi-agent
systems (MASs) [7] and is intended for reasoning about the agents'
epistemic and temporal properties. Timed interpreted systems (TISs) were proposed
in [9] to extend interpreted systems in order to make reasoning
about real-time aspects of MASs possible. The formalism of weighted interpreted systems
(WISs) [10] extends ISs to make reasoning possible not only about temporal
and epistemic properties, but also about agents' quantitative properties.
Multi-agent systems (MASs) are composed of many intelligent agents that
interact with each other. The agents can share a common goal or they can
pursue their own interests. Also, the agents may have deadlines or other timing
constraints for achieving their intended targets. As was shown in [2], knowledge
is a useful concept for analysing the information state and the behaviour of
agents in multi-agent systems. Various extensions of temporal logics [1]
with doxastic [4] and deontic [5] modalities have also been proposed. In this paper,
we consider the existential fragment of a weighted epistemic computation tree
logic (WECTLK) interpreted over Timed Real-Weighted Interpreted Systems
(TRWISs).
SMT-based bounded model checking (BMC) consists in translating the exis-
tential model checking problem for a modal logic and for a model to the satis-
fiability modulo theory problem (SMT-problem) of a quantifier-free first-order
formula.
The original contributions of the paper are as follows. First, we define TRWISs
as a model of MASs whose agents have real-time deadlines to achieve their
intended goals and where each transition carries a weight, which can be any non-negative
real value. Second, we introduce the language WECTLK. Third, we propose an
SMT-based BMC technique for TRWISs and for WECTLK.
To the best of our knowledge, there is no work that considers SMT-based
BMC methods to check multi-agent systems modelled by means of timed real-
weighted interpreted systems. Thus, in this paper we offer such a method. In
particular, we make the following contributions. Firstly, we define and imple-
ment an SMT-based BMC method for WECTLK and for TRWISs. Secondly, we
report on the initial experimental evaluation of our SMT-based BMC method.
To this aim we use a scalable benchmark: the timed weighted generic pipeline
paradigm [8,10].
The structure of the paper is as follows. In Section 2 we briefly introduce
the theory of timed real-weighted interpreted systems and the WECTLK lan-
guage. In Section 3 we present our SMT-based BMC method. In Section 4 we
experimentally evaluate the performance of our SMT-based BMC encoding. We
conclude the paper in Section 5.
2 Preliminaries
In this section we first explain some notation used throughout the paper, then
define timed real-weighted interpreted systems, and finally introduce the syntax
and semantics of WECTLK.
Let IN be the set of natural numbers, IN+ = IN \ {0}, and IR the set of non-negative
real numbers, and let 𝒳 be a finite set of variables, called clocks, ranging over
the non-negative natural numbers. A clock valuation is a function v : 𝒳 → IN that
assigns to each clock x ∈ 𝒳 a non-negative natural value v(x). The set of all clock
valuations is denoted by IN^|𝒳|. The valuation v′ = v[X := 0], for X ⊆ 𝒳, is defined
as: v′(x) = 0 for all x ∈ X and v′(x) = v(x) for all x ∈ 𝒳 \ X. For δ ∈ IN, v + δ
denotes the valuation that assigns the value v(x) + δ to each clock x.
The grammar
ϕ := true | x < c | x ≤ c | x = c | x ≥ c | x > c | ϕ ∧ ϕ
generates the set C(𝒳) of clock constraints over 𝒳, where x ∈ 𝒳 and c ∈ IN. A
clock valuation v satisfies a clock constraint ϕ, written v |= ϕ, iff ϕ evaluates to
true using the clock values given by v.
Let cmax be a constant and v, v′ ∈ IN^|𝒳| two clock valuations. We say that
v and v′ are equivalent iff the following condition holds for each x ∈ 𝒳:
either v(x) > cmax and v′(x) > cmax, or v(x) ≤ cmax and v′(x) ≤ cmax and
v(x) = v′(x).
The clock valuation v′ such that, for each clock x ∈ 𝒳, v′(x) = v(x) + 1 if
v(x) ≤ cmax, and v′(x) = cmax + 1 otherwise, is called the time successor of v
(written succ(v)).
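The clock-related definitions above can be mirrored directly in code. The sketch below represents a clock valuation as a dictionary and implements reset, delay, the time successor, and satisfaction of simple conjunctive clock constraints; the representation is an assumption made for illustration only.

import operator

def reset(v, X):
    """v[X := 0]: reset the clocks in X, keep the others unchanged."""
    return {x: (0 if x in X else val) for x, val in v.items()}

def delay(v, delta):
    """v + delta: advance every clock by delta time units."""
    return {x: val + delta for x, val in v.items()}

def succ(v, cmax):
    """Time successor: add 1 while below cmax, then saturate at cmax + 1."""
    return {x: (val + 1 if val <= cmax else cmax + 1) for x, val in v.items()}

def satisfies(v, constraint):
    """Check a conjunction of simple clock constraints, e.g. [('x', '<=', 3)]."""
    ops = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
           '>=': operator.ge, '>': operator.gt}
    return all(ops[op](v[x], c) for x, op, c in constraint)

v = {'x': 2, 'y': 0}
print(delay(v, 3))                     # {'x': 5, 'y': 3}
print(reset(v, {'y'}))                 # {'x': 2, 'y': 0}
print(satisfies(v, [('x', '<=', 3)]))  # True
print(succ({'x': 5}, cmax=4))          # {'x': 5}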
– Act = Act1 × . . . × Actn × ActE is the set of all the joint actions,
– S = (L1 × IN^|𝒳1|) × . . . × (Ln × IN^|𝒳n|) × (LE × IN^|𝒳E|) is the set of all the global states,
– ι = (ι1 × {0}^|𝒳1|) × . . . × (ιn × {0}^|𝒳n|) × (ιE × {0}^|𝒳E|) is the set of all the initial global states,
– V : S → 2^PV is the valuation function defined as V(s) = ⋃_{c∈Ag∪{E}} Vc(lc(s)),
– T ⊆ S × (Act ∪ IN) × S is a transition relation defined by action and time
transitions. For a ∈ Act and δ ∈ IN:
1. action transition: (s, a, s′) ∈ T (or s −a→ s′) iff for all c ∈ Ag ∪ {E} there exists
a local transition tc(lc(s), ϕc, X, a) = lc(s′) such that vc(s) |= ϕc ∧ I(lc(s)),
vc(s′) = vc(s)[X := 0], and vc(s′) |= I(lc(s′));
2. time transition: (s, δ, s′) ∈ T iff for all c ∈ Ag ∪ {E}, lc(s) = lc(s′), vc(s′) =
vc(s) + δ, and vc(s′) |= I(lc(s′)).
– d : Act → IR is the "joint" weight function defined as follows: d((a1, . . . ,
an, aE)) = d1(a1) + . . . + dn(an) + dE(aE).
is the set of all abstract global states. V : S → 2^PV is the valuation function such
that: p ∈ V(s) iff p ∈ ⋃_{c∈Ag∪{E}} Vc(lc(s)) for all p ∈ PV; and T ⊆ S × (Act ∪ {τ}) × S.
Let a ∈ Act. Then,
1. Action transition: (s, a, s′) ∈ T iff ∀c∈Ag ∃ϕc∈C(𝒳c) ∃Xc⊆𝒳c (tc(lc(s),
ϕc, Xc, a) = lc(s′) and vc(s) |= ϕc ∧ I(lc(s)) and vc(s′) = vc(s)[Xc := 0] and
vc(s′) |= I(lc(s′)));
2. Time transition: (s, τ, s′) ∈ T iff ∀c∈Ag∪{E} (lc(s) = lc(s′) and vc(s) |= I(lc(s))
and succ(vc(s)) |= I(lc(s))) and ∀c∈Ag (vc(s′) = succ(vc(s))) and vE(s′) =
succ(vE(s)).
A path π in an abstract model is a sequence s0 −b1→ s1 −b2→ s2 −b3→ . . . of
transitions such that for each i ≥ 1, bi ∈ Act ∪ {τ} and b1 = τ, and for each two
consecutive transitions at least one of them is a time transition.
Given an abstract model one can define the indistinguishability relation ∼c ⊆
S × S for agent c as follows: s ∼c s′ iff lc(s′) = lc(s) and vc(s′) = vc(s).
Proof. The theorem can be proved by induction on the length of the formula ϕ
(for details one can see [8]).
Thus, given the above, we can define the formula [M^{ϕ,ι}]_k as follows:
[M^{ϕ,ι}]_k := ⋁_{s∈ι} Is(w0,0) ∧ ⋀_{j=1..fk(ϕ)} (w0,0 = w0,j) ∧
⋀_{j=1..fk(ϕ)} ⋀_{i=0..k−1} T(wi,j, ((ai,j, di,j), δi,j), wi+1,j)
where wi,j , ai,j , and di,j are, respectively, symbolic states, symbolic actions, and
symbolic weights for 0 ≤ i ≤ k and 1 ≤ j ≤ fk (ϕ). Hereafter, by πj we denote
the j-th symbolic k-path of the above unfolding, i.e., the sequence of transitions:
w0,j −(a1,j,d1,j),δ1,j→ w1,j −(a2,j,d2,j),δ2,j→ . . . −(ak,j,dk,j),δk,j→ wk,j.
The formula [ϕ]M,k encodes the bounded semantics of a WECTLK for-
mula ϕ, and it is defined on the same sets of individual variables as the for-
mula [Mϕ,ι ]k . Moreover, it uses the auxiliary quantifier-free first-order formulae
defined in [8].
Furthermore, following [8], our formula [ϕ]M,k uses the auxiliary
functions gl, gr, gμ, hU, hG that were introduced in [11], and which allow us to
divide the set A ⊆ Fk(ϕ) = {j ∈ IN | 1 ≤ j ≤ fk(ϕ)} into subsets needed for
translating the subformulae of ϕ. Let 0 ≤ n ≤ fk(ϕ), m ≤ k, and n′ = min(A).
The rest of the translation is defined in the same way as in [8].
– [true]_k^[m,n,A] := true,  [false]_k^[m,n,A] := false,
– [p]_k^[m,n,A] := p(wm,n),
– [¬p]_k^[m,n,A] := ¬p(wm,n),
– [α ∧ β]_k^[m,n,A] := [α]_k^[m,n,gl(A,fk(α))] ∧ [β]_k^[m,n,gr(A,fk(β))],
– [α ∨ β]_k^[m,n,A] := [α]_k^[m,n,gl(A,fk(α))] ∨ [β]_k^[m,n,gl(A,fk(β))],
– [EXI α]_k^[m,n,A] := wm,n = w0,n′ ∧ (d1,n′ ∈ I) ∧ [α]_k^[1,n′,gμ(A)], if k > 0; false,
otherwise,
– [E(αUI β)]_k^[m,n,A] := wm,n = w0,n′ ∧ ⋁_{i=0..k} ([β]_k^[i,n′,hU(A,k,fk(β))(j)] ∧
(Σ_{j=1..i} dj,n′ ∈ I) ∧ ⋀_{j=0..i−1} [α]_k^[j,n′,hU(A,k,fk(β))]),
– [E(GI α)]_k^[m,n,A] := wm,n = w0,n′ ∧ ((Σ_{j=1..k} dj,n′ ≥ right(I) ∧
⋀_{i=0..k} (Σ_{j=1..i} dj,n′ ∉ I ∨ [α]_k^[i,n′,hG(A,k)(j)])) ∨ (Σ_{j=1..k} dj,n′ < right(I) ∧
⋀_{i=0..k} (Σ_{j=1..i} dj,n′ ∉ I ∨ [α]_k^[i,n′,hG(A,k)(j)]) ∧ ⋁_{l=0..k−1} (wk,n′ = wl,n′ ∧
⋀_{i=l..k−1} (¬D^I_{0,k;l,i+1}(πn′) ∨ [α]_k^[i,n′,hG(A,k)(j)])))),
– [Kc α]_k^[m,n,A] := (⋁_{s∈ι} Is(w0,n′)) ∧ ⋁_{j=0..k} ([α]_k^[j,n′,gμ(A)] ∧ Hc(wm,n, wj,n′)),
– [DΓ α]_k^[m,n,A] := (⋁_{s∈ι} Is(w0,n′)) ∧ ⋁_{j=0..k} ([α]_k^[j,n′,gμ(A)] ∧ ⋀_{c∈Γ} Hc(wm,n, wj,n′)),
– [EΓ α]_k^[m,n,A] := (⋁_{s∈ι} Is(w0,n′)) ∧ ⋁_{j=0..k} ([α]_k^[j,n′,gμ(A)] ∧ ⋁_{c∈Γ} Hc(wm,n, wj,n′)),
– [CΓ α]_k^[m,n,A] := [⋁_{j=1..k} (EΓ)^j α]_k^[m,n,A].
The theorem below states the correctness and the completeness of the presented
translation. It can be proved in a standard way by induction on the complexity
of the given WECTLK formula.
4 Experimental Results
In this section we experimentally evaluate the performance of our SMT-based
BMC encoding for WECTLK over the TRWIS semantics.
The benchmark we consider is the timed weighted generic pipeline paradigm
(TWGPP) TRWIS model [10]. The model of TWGPP involves n + 2 agents:
– Producer producing data within certain time interval ([a, b]) or being inactive,
– Consumer receiving data within certain time interval ([c, d]) or being inac-
tive within certain time interval ([g, h]),
– a chain of n intermediate Nodes which can be ready for receiving data
within certain time interval ([c, d]), processing data within certain time interval
([e, f ]) or sending data.
The weights are used to adjust the cost properties of Producer, Consumer, and
of the intermediate Nodes.
Each agent of the scenario can be modelled by considering its local states,
local actions, local protocol, local evolution function, local weight function, the
local clocks, the clock constraints, invariants, and local valuation function. Fig. 1
shows the local states, the possible actions, and the protocol, the clock con-
straints, invariants and weights for each agent. Null actions are omitted in the
figure.
Given Fig. 1, the local evolution functions of TWGPP are straightforward
to infer. Moreover, we assume the following set of propositional variables: PV =
{ProdReady, ProdSend, ConsReady, ConsFree} with the following definitions
of the local valuation functions:
Finally, we assume the following two local weight functions for each agent:
The set of all the global states S for the scenario is defined as the product
(LP × IN) × ∏_{i=1..n} (Li × IN) × (LC × IN). The set of the initial states is defined as
ι = {s0}, where
s0 = ((ProdReady-0, 0), (Node1Ready-0, 0), . . . , (NodenReady-0, 0), (ConsReady-0, 0)).
The system is scaled according to the number of its Nodes (agents), i.e., the
problem parameter n is the number of Nodes. For any natural number n ≥ 0, let
D(n) = {1, 3, . . . , n − 1, n + 1} for an even n, and D(n) = {2, 4, . . . , n − 1, n + 1}
for an odd n. Moreover, let
r(j) = dP(Produce) + 2 · Σ_{i=1..j} dNi(Sendi) + Σ_{i=1..j−1} dNi(proci)
Then we define Right as follows:
Right = Σ_{j∈D(n)} r(j).
We consider the following formulae as specifications:
[Plots of the running time (in seconds) and the memory consumption (in MB) of the SMT-based BMC, for each tested formula, against the number of Nodes.]
Table 1. Formula ϕ2 for basic weights
Table 2. Formula ϕ2 for basic weights multiplied by 1,000
5 Conclusions
We have proposed an SMT-based BMC verification method for model checking
WECTLK properties interpreted over timed real-weighted interpreted systems.
We have provided preliminary experimental results showing that our
method is of interest. In the future we are going to provide a comparison
of our new method with the SAT- and BDD-based BMC methods. The module
will be added to the model checker VerICS [3].
References
1. Emerson, E.A.: Temporal and modal logic. In: van Leeuwen, J. (eds.) Handbook of
Theoretical Computer Science, vol. B, chapter 16, pp. 996–1071. Elsevier Science
Publishers (1999)
2. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge.
MIT Press, Cambridge (1995)
3. Kacprzak, M., Nabialek, W., Niewiadomski, A., Penczek, W., Pólrola, A., Szreter,
M., Woźna, B., Zbrzezny, A.: VerICS 2007 - a model checker for knowledge and
real-time. Fundamenta Informaticae 85(1–4), 313–328 (2008)
4. Levesque, H.: A logic of implicit and explicit belief. In: Proceedings of the 6th
National Conference of the AAAI, pp. 198–202. Morgan Kaufman, Palo Alto (1984)
5. Lomuscio, A., Sergot, M.: Deontic interpreted systems. Studia Logica 75(1), 63–92
(2003)
6. de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008)
7. Wooldridge, M.: An introduction to multi-agent systems, 2nd edn. John Wiley &
Sons (2009)
8. Woźna-Szcześniak, B.: SAT-Based bounded model checking for weighted deontic
interpreted systems. In: Reis, L.P., Correia, L., Cascalho, J. (eds.) EPIA 2013.
LNCS, vol. 8154, pp. 444–455. Springer, Heidelberg (2013)
9. Woźna-Szcześniak, B.: Checking EMTLK properties of timed interpreted systems
via bounded model checking. In: Bazzan, A.L.C., Huhns, M.N., Lomuscio, A.,
Scerri, P. (eds.) International Conference on Autonomous Agents and Multi-Agent
Systems, AAMAS 2014, Paris, France, May 5–9, pp. 1477–1478. IFAAMAS/ACM
(2014)
10. Woźna-Szcześniak, B., Zbrzezny, A.M., Zbrzezny, A.: SAT-Based bounded model
checking for weighted interpreted systems and weighted linear temporal logic. In:
Boella, G., Elkind, E., Savarimuthu, B.T.R., Dignum, F., Purvis, M.K. (eds.)
PRIMA 2013. LNCS, vol. 8291, pp. 355–371. Springer, Heidelberg (2013)
11. Zbrzezny, A.: Improving the translation from ECTL to SAT. Fundamenta Infor-
maticae 85(1–4), 513–531 (2008)
12. Zbrzezny, A.: A new translation from ECTL∗ to SAT. Fundamenta Informaticae
120(3–4), 377–397 (2012)
SMT-Based Bounded Model Checking
for Weighted Epistemic ECTL
1 Introduction
The past ten years in the area of multi-agent systems (MASs) have seen
significant research on verification procedures, which automatically evaluate
whether a MAS meets its intended specifications. One of the main techniques
here is symbolic model checking [2]. Unfortunately, because of the agents' intricate
nature, the practical applicability of model checking is severely limited by
the "state-space explosion problem" (i.e., an exponential growth of the system
state space with the number of agents). To mitigate this issue, various techniques,
including SAT- and BDD-based bounded model checking (BMC) [3,4], have
been proposed. These have been effective in allowing users to handle bigger
MASs; however, it is still hard to check MASs with many agents and cost
requirements on the agents' actions. The aim of this paper is to help overcome this
limitation by employing SMT solvers (i.e., satisfiability modulo theories tools for
deciding the satisfiability of formulae in a number of theories) [1,5].
The basic idea behind bounded model checking (BMC) is, given
a system, a property, and an integer bound k ≥ 0, to define a formula (in the
case of SAT-based BMC, a propositional formula; in the case of
SMT-based BMC, a quantifier-free first-order formula) such that the
formula is satisfiable if and only if the system has a counterexample of length
at most k violating the property. The bound is incremented until a satisfiable
formula is discovered (i.e., the specification does not hold for the system) or a
completeness threshold is reached without discovering any satisfiable formulae.
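The incremental loop just described can be sketched with the Z3 SMT solver's Python API. The function encode_counterexample(k), which should build the quantifier-free formula for bound k, and the completeness threshold are assumptions of this sketch rather than parts of the papers' implementations.

from z3 import Solver, BoolVal, sat

def bmc(encode_counterexample, completeness_threshold):
    """Increment the bound k until the counterexample formula becomes
    satisfiable or the completeness threshold is reached."""
    for k in range(completeness_threshold + 1):
        solver = Solver()
        solver.add(encode_counterexample(k))
        if solver.check() == sat:
            return k, solver.model()   # counterexample of length <= k found
    return None                        # property holds up to the threshold

# Toy usage: pretend a counterexample exists exactly at bound 2.
result = bmc(lambda k: BoolVal(k == 2), 5)
print(result[0])  # 2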
To model check the requirements of MASs, different extensions of temporal
logics have been proposed. In this paper, we consider the existential fragment of
Partly supported by the National Science Centre under grant No. 2014/15/N/ST6/05079.
2 Preliminaries
WIS. Let Ag = {1, . . . , n} be the non-empty and finite set of agents, let E
be a special agent that is used to model the environment in which the agents
operate, and let PV = ⋃_{c∈Ag∪{E}} PVc be a set of propositional variables such
that PVc1 ∩ PVc2 = ∅ for all c1, c2 ∈ Ag ∪ {E}. The weighted interpreted system
(WIS) [6,7] is a tuple ({Lc, Actc, Pc, Vc, dc}c∈Ag∪{E}, {tc}c∈Ag, tE, ι), where Lc
is a non-empty and finite set of local states (S = L1 × . . . × Ln × LE denotes the
non-empty set of all global states), Actc is a non-empty and finite set of possible
actions (Act = Act1 × . . . × Actn × ActE denotes the non-empty set of joint
actions), Pc : Lc → 2^Actc is a protocol function, Vc : Lc → 2^PVc is a valuation
function, dc : Actc → IN is a weight function, tc : Lc × LE × Act → Lc is
a (partial) evolution function for the agents, tE : LE × Act → LE is a (partial)
evolution function for the environment, and ι ⊆ S is a set of initial global states.
Assume that lc(s) denotes the local component of agent c ∈ Ag ∪ {E} in the
global state s ∈ S. For a given WIS we define a model as a tuple M = (Act, S, ι,
T, V, d), where the sets Act and S are defined as above, V : S → 2^PV is the
valuation function defined as V(s) = ⋃_{c∈Ag∪{E}} Vc(lc(s)), d : Act → IN is a
"joint" weight function defined as d((a1, . . . , an, aE)) = d1(a1) + . . . + dn(an) +
dE(aE), and T ⊆ S × Act × S is a transition relation defined as: (s, a, s′) ∈ T (or
s −a→ s′) iff tc(lc(s), lE(s), a) = lc(s′) for all c ∈ Ag and tE(lE(s), a) = lE(s′); we
assume that the relation T is total, i.e. for any s ∈ S there exist s′ ∈ S and a
non-empty joint action a ∈ Act such that s −a→ s′.
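To make the definition concrete, the sketch below encodes the joint weight function and the global transition relation of a WIS, with Python dictionaries standing in for the partial local evolution functions; the example states and actions are hypothetical.

def joint_weight(d_locals, d_env, joint_action):
    """d((a_1, ..., a_n, a_E)) = d_1(a_1) + ... + d_n(a_n) + d_E(a_E)."""
    *local_actions, a_E = joint_action
    return sum(d_c[a] for d_c, a in zip(d_locals, local_actions)) + d_env[a_E]

def global_transition(t_locals, t_env, s, joint_action):
    """(s, a, s') is in T iff every agent's and the environment's (partial)
    evolution function is defined for a and yields the successor local states."""
    *l_locals, l_E = s
    succ_locals = []
    for t_c, l_c in zip(t_locals, l_locals):
        key = (l_c, l_E, joint_action)
        if key not in t_c:
            return None                 # transition undefined for this agent
        succ_locals.append(t_c[key])
    if (l_E, joint_action) not in t_env:
        return None                     # transition undefined for the environment
    return tuple(succ_locals) + (t_env[(l_E, joint_action)],)

# Toy example with one agent plus the environment.
t1 = {("idle", "free", ("go", "tick")): "busy"}
tE = {("free", ("go", "tick")): "busy_env"}
print(global_transition([t1], tE, ("idle", "free"), ("go", "tick")))  # ('busy', 'busy_env')
print(joint_weight([{"go": 2}], {"tick": 1}, ("go", "tick")))         # 3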
Syntax of WECTLK. The WECTLK logic has been defined in [6] as the
existential fragment of the weighted CTLK with cost constraints on all temporal
modalities.
For convenience, the symbol I denotes an interval in IN = {0, 1, 2, . . .} of the
form [a, ∞) or [a, b), for a, b ∈ IN and a ≠ b. Moreover, the symbol right(I)
denotes the right end of the interval I. Given an atomic proposition p ∈ PV, an
agent c ∈ Ag, a set of agents Γ ⊆ Ag and an interval I, the WECTLK formulae
are defined by the following grammar: ϕ ::= true | false | p | ¬p | ϕ ∨ ϕ | ϕ ∧ ϕ |
EXI ϕ | E(ϕUI ϕ) | EGI ϕ | K̄c ϕ | D̄Γ ϕ | ĒΓ ϕ | C̄Γ ϕ.
E (for some path) is the path quantifier. XI (weighted next time), UI
(weighted until) and GI (weighted always) are the weighted temporal modalities.
Note that the "weighted eventually" modality is defined as usual:
EFI ϕ =def E(true UI ϕ) (meaning that it is possible to reach a state satisfying
ϕ via a finite path whose cumulative weight is in I). K̄c is the modality dual to
Kc. D̄Γ, ĒΓ, and C̄Γ are the duals of the standard group epistemic modalities
representing, respectively, distributed knowledge in the group Γ, everyone in Γ
knows, and common knowledge among agents in Γ.
We omit here the definitions of the bounded (i.e., the relation |=k) and
unbounded (i.e., the relation |=) semantics of the logic, since they can be found
in [6]. We only recall the notions of k-paths and loops, since we need them to
explain the SMT-based BMC. Namely, given a model M and a bound k ∈ IN,
a k-path πk is a finite sequence s0 −a1→ s1 −a2→ . . . −ak→ sk of transitions. A k-path
πk is a loop if πk(k) = πk(l) for some l < k. Furthermore, let M be a model and ϕ
a WECTLK formula. The bounded model checking problem asks whether there
exists k ∈ IN such that M |=k ϕ, i.e., whether there exists k ∈ IN such that the
formula ϕ is k-true in the model M.
3 SMT-Based BMC
In order to encode the BMC problem for WECTLK by means of SMT, we
consider a quantifier-free logic with individual variables ranging over the natural
numbers. Formally, let M be the model, ϕ a WECTLK formula and k ≥ 0 a
bound. We define the quantifier-free first-order formula [M, ϕ]k := [M^{ϕ,ι}]k ∧
[ϕ]M,k, which is satisfiable if and only if M |=k ϕ holds.
The definition of the formula [M, ϕ]k is based on the SAT encoding of [6], and
it assumes that each state, each joint action, and each sequence of weights
associated with a joint action are represented by a valuation of, respectively,
a symbolic state w = (w1, . . . , wn, wE) consisting of symbolic local states wc,
a symbolic action a = (a1, . . . , an, aE) consisting of symbolic local actions ac,
and a sequence of symbolic weights d = (d1, . . . , dn+1) consisting of symbolic local weights
dc , where each wc , ac , and dc are individual variables ranging over the natu-
ral numbers, for c ∈ Ag ∪ {E}. Next, the definition of [M, ϕ]k uses the auxil-
iary function fk :WECTLK → IN of [6] which returns the number of k-paths
that are required for proving the k-truth of ϕ in M . Finally, the definition of
[M, ϕ]_k uses the following auxiliary quantifier-free first-order formulae: I_s(w)
encodes the state s of the model M; p(w) encodes the set of states of M in which
p ∈ PV holds; H_c(w, w') := w_c = w'_c for c ∈ Ag; T_c(w_c, (a, d), w'_c) encodes the
local evolution function of agent c ∈ Ag ∪ {E}; A(a) encodes that each symbolic
local action a_c of a has to be executed by each agent in which it appears;
T(w, (a, d), w') := A(a) ∧ ⋀_{c∈Ag∪{E}} T_c(w_c, (a, d), w'_c). Let π_j denote the
j-th symbolic k-path, i.e., the sequence of symbolic transitions
w_{0,j} →^{a_{1,j},d_{1,j}} w_{1,j} →^{a_{2,j},d_{2,j}} . . . →^{a_{k,j},d_{k,j}} w_{k,j},
and let d_{i,j,m} denote the m-th component of the symbolic joint weight d_{i,j}. Then,
– B^I_k(π_j) := ∑_{i=1}^{k} ∑_{m=1}^{n+1} d_{i,j,m} < right(I) – it encodes that the weight repre-
sented by the sequence d_{1,j}, . . . , d_{k,j} is less than right(I) (a small encoding sketch
of this constraint follows the list);
– D^I_{a,b}(π_j) for a ≤ b – if a < b, then it encodes that the weight represented by
the sequence d_{a+1,j}, . . . , d_{b,j} belongs to the interval I; otherwise, i.e., if a = b,
then D^I_{a,b}(π_j) is true iff 0 ∈ I;
– D^I_{a,b;c,d}(π_j) for a ≤ b and c ≤ d – it encodes that the weight represented by
the sequences d_{a+1,j}, . . . , d_{b,j} and d_{c+1,j}, . . . , d_{d,j} belongs to the interval I.
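As a small illustration of how the weight constraints can be expressed for an SMT solver, the sketch below creates the symbolic weights d_{i,j,m} as Z3 integer variables and builds B^I_k(π_j); the function and variable names are illustrative choices, not taken from the authors' implementation.

from z3 import Int, Sum

def symbolic_weights(k, j, n_agents):
    """Integer variables d_{i,j,m} for i = 1..k and m = 1..n_agents+1
    (one weight per agent plus one for the environment)."""
    return [[Int(f"d_{i}_{j}_{m}") for m in range(1, n_agents + 2)]
            for i in range(1, k + 1)]

def B(d, right_I):
    """B^I_k(pi_j): the accumulated weight of the k-path stays below right(I)."""
    return Sum([w for step in d for w in step]) < right_I

# Example: constraint for a 3-step path of 2 agents plus the environment,
# with right(I) = 10.
d = symbolic_weights(k=3, j=0, n_agents=2)
constraint = B(d, 10)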
Given symbolic states w_{i,j}, symbolic actions a_{i,j} and symbolic weights d_{i,j}
for 0 ≤ i ≤ k and 0 ≤ j < f_k(ϕ), the formula [M^{ϕ,ι}]_k, which encodes a rooted
tree of k-paths of the model M, is defined as follows:

[M^{ϕ,ι}]_k := ⋁_{s∈ι} I_s(w_{0,0}) ∧ ⋀_{j=0}^{f_k(ϕ)−1} ⋀_{i=0}^{k−1} T(w_{i,j}, (a_{i+1,j}, d_{i+1,j}), w_{i+1,j})
The formula [ϕ]_{M,k} encodes the bounded semantics of the WECTLK formula
ϕ; it is defined on the same sets of individual variables as the formula [M^{ϕ,ι}]_k,
and it uses the auxiliary functions g_μ, h^U_k, and h^G_k of [9] that allow us to divide the
set A ⊆ F_k(ϕ) = {j ∈ IN | 0 ≤ j < f_k(ϕ)} into subsets necessary for translating
the subformulae of ϕ.
Let [ϕ]^{[m,n,A]}_k denote the translation of ϕ at the symbolic state w_{m,n} by using the
set A ⊆ F_k(ϕ). The formula [ϕ]_{M,k} := [ϕ]^{[0,0,F_k(ϕ)]}_k is defined inductively with the
classical rules for the propositional fragment of WECTLK and with the following
rules for the weighted temporal and epistemic modalities. Let 0 ≤ n ≤ f_k(ϕ), m ≤ k,
n' = min(A), h^U_k = h^U_k(A, f_k(β)), and h^G_k = h^G_k(A). Then,

– [EX_I α]^{[m,n,A]}_k := w_{m,n} = w_{0,n'} ∧ D^I_{0,1}(π_{n'}) ∧ [α]^{[1,n',g_μ(A)]}_k, if k > 0; false, otherwise,
– [E(α U_I β)]^{[m,n,A]}_k := w_{m,n} = w_{0,n'} ∧ ⋁_{i=0}^{k} ([β]^{[i,n',h^U_k(k)]}_k ∧ D^I_{0,i}(π_{n'}) ∧ ⋀_{j=0}^{i−1} [α]^{[j,n',h^U_k(j)]}_k),
– [E(G_I α)]^{[m,n,A]}_k := w_{m,n} = w_{0,n'} ∧ ((¬B^I_k(π_{n'}) ∧ ⋀_{i=0}^{k} (¬D^I_{0,i}(π_{n'}) ∨ [α]^{[i,n',h^G_k(k)]}_k))
∨ (B^I_k(π_{n'}) ∧ ⋀_{i=0}^{k} (¬D^I_{0,i}(π_{n'}) ∨ [α]^{[i,n',h^G_k(k)]}_k)
∧ ⋁_{l=0}^{k−1} (w_{k,n'} = w_{l,n'} ∧ ⋀_{i=l}^{k−1} (¬D^I_{0,k;l,i+1}(π_{n'}) ∨ [α]^{[i,n',h^G_k(k)]}_k)))),
4 Experimental Results
Here we experimentally evaluate the performance of our SMT-based BMC
method for WECTLK over the WIS semantics. We compare our method with
the SAT-based BMC [6,8], the only existing method that is suitable with respect
to the input formalism (i.e., weighted interpreted systems) and checked proper-
ties (i.e., WECTLK). We have computed our experimental results on a computer
equipped with an Intel i7-3770 processor and 32 GB of RAM, running Arch
Linux with kernel 3.15.3. We set the CPU time limit to 3600 seconds. For
the SAT-based BMC we used the PicoSAT solver and for the SMT-based BMC
we used the Z3 solver.
The first benchmark we consider is the weighted generic pipeline
paradigm (WGPP) WIS model [6]. The problem parameter n is the number of
Nodes. Let Min be the minimum cost incurred by Consumer to receive the data
produced by Producer, and p denote the cost of producing data by Producer.
The specifications we consider are as follows:
ϕ1 = K̄P EF[Min,Min+1) ConsReady – it expresses that it is not true that Pro-
ducer knows that the cost incurred by Consumer to receive data is always
different from Min.
ϕ2 = K̄P EF(ProdSend ∧ K̄C K̄P EG[0,Min−p) ConsReady) – it states that it is
not true that Producer knows that always if it produces data, then Consumer
knows that Producer knows that Consumer has received data and the cost is
less than Min − p.
The size of the reachable state space of the WGPP system is 4 · 3^n, for
n ≥ 1. The number of the considered k-paths is equal to 2 for ϕ1 and 5 for
ϕ2 , respectively. The lengths of the discovered witnesses for formulae ϕ1 and ϕ2
vary, respectively, from 3 for 1 node to 23 for 130 nodes, and from 3 for 1 node
to 10 for 27 nodes.
The second benchmark of our interest is the weighted bits transmission prob-
lem (WBTP) WIS model [7]. We have adapted the local weight functions of [7].
This system is scaled according to the number of bits the Sender (S) wants to
communicate to the Receiver (R). Let a ∈ IN and b ∈ IN be the costs of sending,
respectively, the bits by Sender and an acknowledgement by Receiver. The
specifications we consider are as follows:
φ1 = EF[a+b,a+b+1)(recack ∧ K̄S(K̄R(⋁_{i=0}^{2^n−2}(¬i)))) – it expresses that it is not
true that if an ack is received by S, then S knows that R knows at least one
value of the n-bit numbers except the maximal value, and the cost is a + b.
φ2 = EF[a+b,a+b+1)(K̄S(⋀_{i=0}^{2^n−1} K̄R(¬i))) – it expresses that it is not true that
S knows that R knows the value of the n-bit number and the cost is a + b.
The size of the reachable state space of the WBTP system is 3 · 2^n, for n ≥
1. The number of the considered k-paths is equal to 3 for φ1 and 2^n + 2 for φ2,
respectively. The length of the witnesses for both formulae is equal to 2 for any
n > 0.
Performance Evaluation. The experimental results show that the two BMC
methods, SAT- and SMT-based, are complementary. We have noticed that for
the WGPP system and both considered formulae the SMT-based BMC is faster
than the SAT-based BMC; however, the SAT-based BMC consumes less memory.
Moreover, the SMT-based method is able to verify more nodes for both tested
formulae. In particular, in the time limit set for the benchmarks, the SMT-based
BMC is able to verify the formula ϕ1 for 120 nodes while the SAT-based BMC
can handle 115 nodes. For ϕ2 the SMT-based BMC is still more efficient - it is
able to verify 27 nodes, whereas the SAT-based BMC verifies only 25 nodes.
In the case of the WBTP system the SAT-based BMC performs much better
in terms of the total time and the memory consumption for both the tested
formulae. In the case of the formula φ2 both methods are able to verify the same
number of bits. For the WBTP, the reason for the higher efficiency of the SAT-
based BMC is probably that the lengths of the witnesses for both formulae
are constant and very short, and that there are no nested temporal modalities in
the scope of epistemic operators. For formulae like φ1 and φ2 the number of
arithmetic operations is small, so the SMT solver cannot show its strength.
Furthermore, we have noticed that the total time and the memory consump-
tion for both benchmarks and all the tested formulae are independent of the
values of the considered weights.
5 Conclusions
We have proposed, implemented, and experimentally evaluated an SMT-based
bounded model checking approach for WECTLK interpreted over weighted
interpreted systems. We have compared our method with the corresponding
SAT-based technique. The experimental results show that the approaches are
complementary, and that the SMT-based BMC approach appears to be superior
for the WGPP system, while the SAT-based approach appears to be superior
for the WBTP system. This is a novel and interesting result, which shows that
the choice of the BMC method should depend on the considered system.
References
1. Barrett, C., Sebastiani, R., Seshia, S.A., Tinelli, C.: Satisfiability modulo theories. In:
Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications, vol.
185, chapter 26, pp. 825–885. IOS Press (2009)
2. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. The MIT Press (1999)
3. Jones, A.V., Lomuscio, A.: Distributed BDD-based BMC for the verification of
multi-agent systems. In: Proc. AAMAS 2010, pp. 675–682. IFAAMAS (2010)
4. Mȩski, A., Penczek, W., Szreter, M., Woźna-Szcześniak, B., Zbrzezny, A.: BDD- versus
SAT-based bounded model checking for the existential fragment of linear tem-
poral logic with knowledge: algorithms and their performance. Autonomous Agents
and Multi-Agent Systems 28(4), 558–604 (2014)
5. de Moura, L., Bjørner, N.S.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,
Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg
(2008)
6. Woźna-Szcześniak, B.: SAT-based bounded model checking for weighted deontic
interpreted systems. In: Reis, L.P., Correia, L., Cascalho, J. (eds.) EPIA 2013.
LNCS, vol. 8154, pp. 444–455. Springer, Heidelberg (2013)
7. Woźna-Szcześniak, B., Zbrzezny, A.M., Zbrzezny, A.: SAT-based bounded model
checking for weighted interpreted systems and weighted linear temporal logic. In:
Boella, G., Elkind, E., Savarimuthu, B.T.R., Dignum, F., Purvis, M.K. (eds.)
PRIMA 2013. LNCS, vol. 8291, pp. 355–371. Springer, Heidelberg (2013)
8. Woźna-Szcześniak, B., Szcześniak, I., Zbrzezny, A.M., Zbrzezny, A.: Bounded model
checking for weighted interpreted systems and for flat weighted epistemic compu-
tation tree logic. In: Dam, H.K., Pitt, J., Xu, Y., Governatori, G., Ito, T. (eds.)
PRIMA 2014. LNCS, vol. 8861, pp. 107–115. Springer, Heidelberg (2014)
9. Zbrzezny, A.: Improving the translation from ECTL to SAT. Fundamenta Informat-
icae 85(1–4), 513–531 (2008)
Dynamic Selection of Learning Objects
Based on SCORM Communication
Adaptability and reuse are important aspects that contribute to improving the learning
process in virtual learning environments [1]. The former relates to different students’
profiles and needs: an adaptable system increases the student’s understanding by taking
into account his or her knowledge level and preferences [2,3,4]. The latter means that it is
unnecessary to develop new resources when others already exist for the same learning
purpose [4,5]. Several kinds of computational tools improve the teaching-learning process, e.g.:
(1) Intelligent Tutoring Systems (ITS) - applications created for a specific domain,
generally with little adaptability and interoperability [6]; (2) Learning Management
Systems (LMS) - environments used to build online courses (or publish material),
allowing teachers to manage educational data [1], [7,8]; and (3) Learning Objects (LO)
- digital artifacts that promote reuse and adaptability of resources [9]. LO and LMS
provide reusability, but they usually are not dynamically adaptable [8,9]. This article
presents our research, which seeks the convergence of these different paradigms for the
development of intelligent learning environments, and describes the mechanisms of a
dynamic presentation model for Intelligent Learning Objects, based on communication
with SCORM (Sharable Content Object Reference Model) resources [19, 20].
There are analogous studies that provide adaptability to learning systems. Some
examples extend the LMS with distinct adaptive strategies, such as conditional jumps
[8], Bayesian networks [3] or data mining [7]. Other studies are not integrated with
an LMS and use diverse ways to adapt the learning to the student’s style, e.g.: ITS
[6], recommender systems [2], genetic algorithms [10] and swarm intelligence [11].
Moreover, there are similar works based on the Multi-Agent System (MAS)
approach, resulting in smarter applications [12,13]. Some of them combine LMS and
MAS to make the former more adaptive [14], and another is a dynamically adaptive
environment based on agents that are able to identify the student’s cognitive profile
[14]. These related works identify the student’s profile by applying questionnaires at the
beginning of the course or by clustering the students according to their assessment
performance. Additionally, we observe in these papers that the attachment of new LO
to the system is not possible without teacher intervention. The educator needs to con-
figure in advance all the possible course paths for each student style, which can be
difficult and time-consuming [3]. Further, attaching a new LO to the course in-
volves modifying its structure, resulting in limited adaptability and reuse.
In order to produce more intelligent LO, we have proposed in previous research
the convergence between the LO and MAS technologies, called Intelligent Learning
Objects (ILO) [15]. This approach makes it possible to offer more adaptive, reusable
and complete learning experiences, following the learner’s cognitive characteristics and
performance. According to this approach, an ILO is an agent capable of playing the role
of an LO, which can acquire new knowledge through interaction with students and other
ILO (agent information exchange), raising the potential of the student’s understanding.
The LO metadata permits the identification of the educational topic related to the
LO [9]. Hence, the ILO (agents) are able to find out which subject is associated
with the learning experience shown to the student, and then to show complementary
information (another ILO) to address the student’s lack of knowledge in that subject.
2 ILOMAS
The proposed model integrates MAS and LMS into a system with intelligent behavior,
improving on the related works by enabling dynamic LO inclusion.
The objective of the new model, called Intelligent Learning Object Multi-Agent Sys-
tem (ILOMAS), is to enhance the framework developed to create ILO based on MAS
with the BDI architecture [16], extending this model to allow the production of adaptive
and reusable learning experiences that take advantage of the SCORM data model ele-
ments. The idea is to dynamically select ILO in the LMS according to the student’s
performance, without previous specific configuration of the course structure. The
proposed model achieves reuse by combining pre-existing and validated LO
whose concept matches the one the student needs to learn about, avoiding the build-
ing of new materials. Moreover, the course structure becomes more flexible, since it
is unnecessary to configure all the possible learning paths for each student profile.
The solution’s adaptability is based on the ability to attach a new LO to the LMS
(one not explicitly added before) as soon as the system finds out that the student
needs to reinforce his or her understanding of a specific concept. This is automatically iden-
tified through the verification of the student’s assessment performance (i.e., grade) on
each instructional unit, or by the student’s choice while interacting with the LO.
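A minimal sketch of this selection logic is shown below; the repository structure, metadata field names and pass mark are illustrative assumptions (they do not reproduce the ILOMAS code), but the flow follows the idea of attaching a different LO on the same topic when the assessment result is poor.

# Illustrative only: repository entries carry an IEEE-LOM-like "topic" field.
LO_REPOSITORY = [
    {"id": "lo-101", "topic": "social-security-law", "format": "scorm_1.2"},
    {"id": "lo-102", "topic": "social-security-law", "format": "scorm_1.2"},
    {"id": "lo-201", "topic": "tax-law",             "format": "scorm_1.2"},
]

def needs_reinforcement(score, pass_mark=0.7):
    """The student's performance (e.g., a normalized score) is below the pass mark."""
    return score < pass_mark

def select_alternative_lo(topic, already_shown, repository=LO_REPOSITORY):
    """Pick another LO on the same topic that the student has not seen yet."""
    for lo in repository:
        if lo["topic"] == topic and lo["id"] not in already_shown:
            return lo
    return None  # no further material available for this topic

# Example: the student failed the assessment of lo-101, so lo-102 is attached.
if needs_reinforcement(score=0.4):
    new_lo = select_alternative_lo("social-security-law", {"lo-101"})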
It is important to clarify that the approach does not use the student’s learning profile (e.g.,
textual, interactive [2]) as information to select LOs. The scope of this research is to
consider only the learner’s performance results (grades, time of interaction, sequencing
and navigation). ILOMAS is composed of agents with specific goals, capa-
ble of communicating and offering learning experiences to students in an LMS course,
according to the interaction with these students, taking advantage of the SCORM
standard’s features [19]. The ILOMAS architecture needs two kinds of agents:
• LMSAgent – Finds out the subject that the student must learn about, and passes the
control of the interaction with the student to a new ILOAgent. Its beliefs are data
provided by the LMS database, e.g., the topic that the student must learn about.
• ILOAgent – Searches for an LO in the repository (related to the topic obtained from
the LMSAgent) and exhibits it to the student. Besides, it monitors the interaction be-
tween the student and the LO, which means analyzing the data received from
the SCORM communication. Depending on the analyzed data (beliefs), the agent
will deliberate on exhibiting another LO (a course with dynamic content).
The JADEX BDI V3 (V3) platform was chosen to implement the agents based on
the BDI architecture [12,13]. The design of ILOMAS takes into account the characteristics of e-
learning courses deployed on an LMS (such as MOODLE [7]), which means an environment
accessed mostly from Web browsers. The Java Servlets and JSP technologies are the
basis of the interface between the client side (student) and the server side (agents’
environment), benefiting from the V3 services communication structure [13]. A
non-agent class based on the Facade design pattern [17] keeps the coupling low be-
tween the MAS layer and the external items (front-end and servlets).
A first prototype was developed and tested with emphasis on the MAS develop-
ment rather than visualization issues (such as LO formats or graphical user interfaces)
[18]. The simulation of a learning situation resulted in a different LO being retrieved from
the repository. This new LO had the same subject as the LO previously shown. It was
not explicitly defined in the database that the student should have watched this new
LO (only the topic was defined, no specific LO), so the MAS obtained the related LO
dynamically, taking into account the metadata elements declared in IEEE-LOM [9].
The main desire defined for the ILOAgent is to solve the student’s lack of understand-
ing about the subject. Thus, when the data received from SCORM points to a learner
difficulty (error), the ILOMAS deliberation process (based on the JADEX engine
[13]) dispatches the goal related to this objective. The ILOAgent’s belief base stores
the data received, and the deliberation process defines that the student needs to view a
new (different) LO when the student selects an incorrect answer. This is the moment
when the system achieves a dynamic learning situation, because a new LO not de-
fined previously becomes part of the course structure. From the student’s point of
view (and even the teacher’s point of view) the accessed object was just one, but with
several contents (a larger LO composed dynamically of other smaller ones).
To validate this new version of the platform (SCORM integrated), we used
some SCORM objects (version SCORM 1.2) about Social Security Laws
(Public Law course). The learning interaction takes place in a custom LMS developed
with limited features, for testing purposes only. The implemented SCORM integration
with ILOMAS was tested to reproduce distinct learning situations (Table 1): a student
that selects all the correct answers (Student 1), another that misses all questions
(Student 2), and one who increases understanding of the subject during the interaction
(Student 3). Each time a student makes a mistake, the ILOMAS identifies
the understanding problem and suggests another related LO to fill the learning
gap (Fig. 2).
Table 1. ILOMAS SCORM preliminary evaluation tests
Fig. 2. The ILOMAS SCORM Web application execution: (1) the moment of the identification
that the student needs another LO (wrong answer); (2) new LO exhibition
3 Conclusions and Future Work
References
1. Allison, C., Miller, A., Oliver, I., Michaelson, R., Tiropanis, T.: The Web in education.
Computer Networks 56, 3811–3824 (2012)
2. Vesin, B., Klasnja-Milicevic, A., Ivanovic, M., Budimac, Z.: Applying recommender systems
and adaptive hypermedia for e-learning personalization. Computing and Informatics 32,
629–659 (2013). Institute of Informatics
3. Bachari, E., Abelwahed, E., Adnani, M.: E-Learning personalization based on dynamic
learners’ preference. International Journal of Computer Science & Information Technology
(IJCSIT) 3(3) (2011)
4. Mahkameh, Y., Bahreininejad, A.: A context-aware adaptive learning system using agents.
Expert Systems with Applications 38, 3280–3286 (2011)
5. Caeiro, M., Llamas, M., Anido, L.: PoEML: Modeling learning units through perspectives.
Computer Standards & Interfaces 36, 380–396 (2014)
6. Santos, G., Jorge, J.: Interoperable Intelligent Tutoring Systems as Open Educational
Resources. IEEE Transactions on Learning Technologies 6(3), 271–282 (2013). IEEE
CS & ES
7. Despotovic-Zrakic, M., Markovic, A., Bogdanovic, Z., Barac, D., Krco, S.: Providing
Adaptivity in Moodle LMS Courses. Educational Technology & Society 15(1), 326–338
(2012). International Forum of Educational Technology & Society
8. Komlenov, Z., Budimac, Z., Ivanovic, M.: Introducing Adaptivity Features to a Regular
Learning Management System to Support Creation of Advanced eLessons. Informatics in
Education 9(1), 63–80 (2010). Institute of Mathematics and Informatics
9. Barak, M., Ziv, S.: Wandering: A Web-based platform for the creation of location-based
interactive learning objects. Computers & Education 62, 159–170 (2013)
10. Chen, C.: Intelligent web-based learning system with personalized learning path guidance.
Computers & Education 51, 787–814 (2008)
11. Kurilovas, E., Zilinskiene, I., Dagiene, V.: Recommending suitable scenarios according to
learners’ preferences: An improved swarm based approach. Computers in Human Beha-
vior 30, 550–557 (2014)
12. Wooldridge, M.: An Introduction to MultiAgent Systems, 2nd edn. John Wiley & Sons
(2009)
13. Pokahr, A., Braubach, L., Haubeck, C., Ladiges, J.: Programming BDI Agents with Pure
Java. University of Hamburg (2014)
14. Giuffra, P., Silveira, R.: A multi-agent system model to integrate Virtual Learning Envi-
ronments and Intelligent Tutoring Systems. International Journal of Interactive Multimedia
and Artificial Intelligence 2(1), 51–58 (2013)
15. Silveira, R., Gomes, E., Vicari, R.: Intelligent Learning Objects: An Agent-Based
Approach of Learning Objects. IFIP – International Federation For Information
Processing, vol. 182, pp. 103–110. Springer-Verlag (2006)
16. Bavaresco, N., Silveira, R.: Proposal of an architecture to build intelligent learning objects
based on BDI agents. In: XX Informatics in Education Brazilian Symposium (2009)
17. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable
Object-Oriented Software. Addison-Wesley (1995)
18. de Amorim Jr., J., Gelaim, T.Â., Silveira, R.A.: Dynamic e-Learning content selection
with BDI agents. In: Bajo, J., Hallenborg, K., Pawlewski, P., Botti, V., Sánchez-Pi, N.,
Duque Méndez, N.D., Lopes, F., Vicente, J. (eds.) PAAMS 2015 Workshops. CCIS, vol.
524, pp. 299–308. Springer, Heidelberg (2015)
19. SCORM 2004. Advanced Distributed Learning. http://www.adlnet.org/scorm
20. Gonzalez-Barbone, V., Anido-Rifon, L.: Creating the first SCORM object. Computers &
Education 51, 1634–1647 (2008)
21. Campos, R.L.R., Comarella, R.L., Silveira, R.A.: Multiagent based recommendation
system model for indexing and retrieving learning objects. In: Corchado, J.M., Bajo, J.,
Kozlak, J., Pawlewski, P., Molina, J.M., Julian, V., Silveira, R.A., Unland, R., Giroux, S.
(eds.) PAAMS 2013. CCIS, vol. 365, pp. 328–339. Springer, Heidelberg (2013)
Sound Visualization Through a Swarm
of Fireflies
1 Introduction
Although sound visualization has been an object of study for a long time, the
emergence of the computer, with graphic capabilities, allowed the creation of new
paradigms and creative processes in the area of sound visualization. Most of the
initial experiments were done through analogical processes. Since the advent
of computer science, art has taken significant interest in the use of computers
for the generation of automated images. In section 2, we present some of the
main inspirations to our work including sound visualization, generative artworks,
computer art and multi-agent systems.
Our research question concerns the possibility of developing a multi-agent
model for sound visualization. We explore the intersection between computer
art and nature-inspired multi-agent systems. In the context of this work, swarm
simulations are particularly interesting because they allow the expression of a
large variety of different types of behaviors and tend to be intuitive and natural
forms of interaction.
In section 3 we present the developed project, which is based on a multi-agent
system of swarms and inspired by the visual nature of fireflies. In the scope of our
2 Related Work
Ernst Chladni studied thoroughly the relation between sound and image. One
of his best-known achievements was the invention of Cymatics. It geometrically
showed the various types of vibration on a rigid surface [5]. In the 1940s Oskar
Fischinger made cinematographic works exploring the images of sound by means
of traditional animation [4]. His series of 16 studies was his major success [4].
Another geometric approach was made by Larry Cuba in 1978, but this time
with digital tools. “3/78” consisted of 16 objects performing a series of precisely
choreographed rhythmic transformations [2].
Complex and self-organized systems have a great appeal for the artistic prac-
tice since they can continuously change, adapt and evolve. Over the years, com-
puter artifacts promoting emergent system behaviors have been explored [1,7].
Artists became fascinated with the possibility of an unpredictable but satisfying
outcome. Examples of this include the work of Ben F. Laposky, Frieder Nake,
Manfred Mohr, among many others [3].
3 The Environment
In this section we present a swarm-based system of fireflies and all of its interac-
tions. In this environment, fireflies are fed by the energy of sound beats (rhythmic
onsets). While responding to the surrounding elements of the environment, they
search for these energies (see Fig. 1). The colors were chosen according to the
real nature of fireflies. Since they are visible at night, we opted for a dark blue in
the background and a brighter one for the sound beats. As for bioluminescence,
we used yellow.
The environment rules and behaviors, as well as the visualization, were imple-
mented in Processing. The mechanism for extracting typical audio informa-
tion was built with the aid of the Minim library, mainly because it contains a
function for sound beat detection.
Sound’s Graphic Representation. After the sound analysis, all the proper-
ties of sound are mapped into graphical representations. Sound beats are mapped
into instants (t1, t2, t3, . . . ) which define the objects’ horizontal position, as
shown in Fig. 2a. Each sound object has a pre-defined duration, meaning that
it is removed from the environment at the end of its duration. Amplitude is
translated into the object’s size, i.e., the size is directly proportional to the ampli-
tude (Fig. 2b). Lastly, frequency is mapped into the object’s vertical position in
the environment (Fig. 2c). High frequencies (HIF) are positioned at the top
of the screen and low frequencies (LOF) emerge in lower positions of the vertical
axis. A fourth characteristic presented in the graphical representation of sound
objects is collision (Fig. 2d). This last one is not directly related to sound, only
to the sound objects’ physics. When an object collides with another one, an opposing
force is applied between the two, separating them from each other.
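The mapping just described can be summarized in a small function; the screen dimensions and scaling constants below are illustrative assumptions (the actual sketch was written in Processing), while the relations follow the text: beat instant → horizontal position, amplitude → size, frequency → vertical position.

def sound_object(t, amplitude, frequency, track_length,
                 width=800, height=600, max_size=60.0, max_freq=20000.0):
    """Map a detected beat (time, amplitude, frequency) to a drawable object."""
    x = (t / track_length) * width                 # beat instant -> horizontal position
    y = height - (frequency / max_freq) * height   # high frequencies near the top
    size = amplitude * max_size                    # size proportional to amplitude
    return {"x": x, "y": y, "size": size}

# Example: a loud, high-pitched beat 30 s into a 120 s track.
obj = sound_object(t=30.0, amplitude=0.9, frequency=8000.0, track_length=120.0)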
The closer they are to a source of light, the more attracted they get to it,
meaning that there is a force of attraction towards it. Along with that, agents
have a swarming behavior, meaning that neighboring agents can see each other
and follow one another through flocking behavior rules [6].
These rules were presented by Reynolds with a computational model of
swarms exhibiting natural flocking behavior. He demonstrated how a particular
computer simulation of boids could produce complex phenomena from simple
mechanisms. These behaviors define how each creature behaves in relation to its
neighbors: separation, alignment or cohesion [6].
The swarming behaviors present in this system are: separation and cohesion
(Fig. 4). Separation gives the agents the ability to maintain a certain distance
from others nearby in order to prevent agents from crowding together. Cohe-
sion gives agents the ability to approach and form a group with other nearby
agents [6]. No alignment force was applied. Alignment is usually associated with
flocking behavior, as birds and fish do. Swarm behavior – like the one found
in bees, flies and our fireflies – does not imply alignment.
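A compact sketch of these two forces, in the spirit of Reynolds’ steering rules [6], is given below; the radii and weights are illustrative parameters of our own choosing.

import math

def separation_cohesion(agent, neighbors, sep_radius=25.0, coh_radius=80.0,
                        sep_weight=1.5, coh_weight=1.0):
    """Return the (x, y) steering force combining separation and cohesion.
    No alignment force is used, matching the swarm (non-flocking) behavior."""
    sep = [0.0, 0.0]
    centroid = [0.0, 0.0]
    count = 0
    for other in neighbors:
        dx, dy = agent[0] - other[0], agent[1] - other[1]
        dist = math.hypot(dx, dy)
        if 0 < dist < sep_radius:        # push away from very close neighbors
            sep[0] += dx / dist
            sep[1] += dy / dist
        if dist < coh_radius:            # accumulate positions of visible neighbors
            centroid[0] += other[0]
            centroid[1] += other[1]
            count += 1
    coh = [0.0, 0.0]
    if count:                            # steer towards the local centroid
        coh = [centroid[0] / count - agent[0], centroid[1] / count - agent[1]]
    return (sep_weight * sep[0] + coh_weight * coh[0],
            sep_weight * sep[1] + coh_weight * coh[1])

# Example: one firefly with two neighbors.
force = separation_cohesion((10.0, 10.0), [(12.0, 11.0), (60.0, 40.0)])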
Additionally, the life and death of each agent is also determined by the way it
interacts with the environment. The agent begins with an initial lifespan, losing
part of its energy at each cycle. If the agent gets close to an energy source, it
gains more energy and a longer lifespan; otherwise, it keeps losing its energy
until it dies. There are no mechanisms for the rebirth of agents, as we intend to
keep a clear visualization and understanding of interactions among agents.
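The life cycle just described amounts to a simple per-cycle update; the decay and gain values below are illustrative.

def update_energy(energy, near_energy_source, decay=1.0, gain=5.0):
    """Each cycle the firefly loses energy; being near a sound-beat source restores it.
    Returns the new energy, or None when the agent dies (there is no rebirth)."""
    energy -= decay
    if near_energy_source:
        energy += gain
    return energy if energy > 0 else None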
Fig. 5. Left image: agent approximation to an object (AG→OB). Right image: agent
growth (E).
Fig. 6. The music that generated this response is characterized by a variety of intensities
and a high density of beats.
1 A demonstration video can be found at http://tinyurl.com/ky7yaql.
complete visualization of the track so we can perceive the differences inside each
one. Secondly, we present the trajectory made by the agents of the corresponding
music to better analyze their behavior in the different tracks. We present only
one example of those figures due to space constraints.
Track 1 corresponds to a piece with high density of beats and low contrast of
intensities. This promotes a higher chance of having a longer lifespan. However,
the low contrast of the intensities implies that they do not gather so much energy
at once. Track 2 (Fig. 6) is also characterized by a high density of beats, but in
this case the contrast in intensities makes the swarm gain more energy. Track 3 has
a low contrast of frequencies and a balanced density of beats. Lastly, Track
4, as opposed to almost all of the other examples described so far, has a strong
contrast between high and low frequencies. In addition, the low density of
beats results in a reduced lifespan for the swarm, as the agents have a short field of view.
From the observation of these patterns created by our system, we can con-
clude: (i) fireflies have a tendency to follow the pattern created by the sound
beats as we could see in the example depicted in Fig. 6; (ii) there is a bigger
concentration of fireflies in the sources that contain more energy; (iii) tracks
with a lower contrast between frequencies promote a more balanced spread of
the fireflies in the environment; (iv) tracks with a high density of beats give
fireflies a longer lifespan because the agents have a narrow vision field and thus
they can collect more energy even if it is in small pieces of it.
References
1. Barszczewski, P., Cybulski, K., Goliski, K., Koniewski, J.: Constellaction (2013).
http://pangenerator.com
2. Compart: Larry Cuba, 3/78 (nd). http://tinyurl.com/k2y3vef
3. Dietrich, F.: Visual intelligence: The first decade of computer art (1965–1975).
Leonardo 19(2), 159–169 (1986)
4. Evans, B.: Foundations of a visual music. Computer Music Journal 29(4), 11–24
(2005)
5. Monoskop: Ernst Chladni (nd). http://monoskop.org/Ernst Chladni
6. Reynolds, C.W.: Steering behaviors for autonomous characters. In: Game Develop-
ers Conference, vol. 1999, pp. 763–782 (1999)
7. Uozumi, Y., Yonago, T., Nakagaito, I., Otani, S., Asada, W., Kanda, R.: Sjq++ ars
electronica (2013). https://vimeo.com/66297512
Social Simulation and Modelling
Analysing the Influence of the Cultural
Aspect in the Self-Regulation of Social
Exchanges in MAS Societies: An Evolutionary
Game-Based Approach
PPGCOMP, C3, Universidade Federal Do Rio Grande (FURG), Rio Grande, Brazil
{andressavonlaer,gracaliz,dianaada}@gmail.com
1 Introduction
As it is well known in social sciences, the acts, actions and practices that involve
more than two agents and affect or take account of other agents’ activities, expe-
riences or knowledge states are called social interactions. Social interactions and,
mainly, the quality of these interactions, are crucial for the proper functioning
of the system, since, e.g., communication failures, lack of trust, selfish attitudes,
or unfair behaviors can leave the system far from a solution. The application of
the social interaction concept to enhancing MAS functionality is a nat-
ural step towards designing and implementing more intelligent and human-like
populations of artificial autonomous systems [13].
Social relationships are often described as social exchanges [1], understood
as service exchanges between pairs of individuals with the evaluation of those
exchanges by the individuals themselves [16]. Social exchanges have been fre-
quently used for defining social interactions in MAS [10,15,21]. A fundamental
problem discussed in the literature is the regulation of such exchanges, in order
to allow the emergence of equilibrated exchange processes over time, promot-
ing the continuity of the interactions [12,21], social equilibrium [15,16] and/or
fairness behaviour.1 In particular, this is a difficult problem when the agents,
adopting different social exchange strategies, have incomplete information on the
other agents’ exchange strategies, as in open societies [9].
In the literature (e.g., [9,15,21]), different models were developed (e.g., cen-
tralized/decentralized control, closed/open societies) for the social exchange
regulation problem. Recently, this problem was tackled by Macedo et al. [12],
by introducing the spatial and evolutionary Game of Self-Regulation of Social
Exchange Processes (GSREP), where the agents, adopting different social
exchange strategies (e.g., selfishness, altruism), considering both the short and
long-term aspects of the interactions, evolve their exchange strategies over time
by themselves, in order to promote more equilibrated and fair interactions.
This approach was implemented in NetLogo.
However, certain characteristics involved in social exchanges may be more
appropriately modeled with cognitive agents2 , such as BDI Agents (Belief,
Desire, Intention) [4]. Also, taking into account the observations made by a
society on the behavior of an agent ag in its past interactions, it is possible to
qualify ag’s reputation [5,6], which can be made available to the other agents
who themselves have not interacted with that agent. These indirect observations
can be aggregated to define any agent’s past behaviour based on the experiences of
participants in the system [11]. Reputation can assist agents in choosing partners
when there are other agents that may act so as to promote the disequilibrium of
the social exchange processes in the society. Given the importance of this kind
of analysis in many real-world applications, a large number of computational
models of reputation have been developed (e.g., [23,24]).
This paper introduces an evolutionary and cultural approach to the GSREP
game for the JaCaMo [2] framework, also considering the influence of the agent
society’s culture, thus defining the Cultural-GSREP game. Observe that there are at
least five basic categories of cultural knowledge that are important in the belief
space of any cultural evolution model: situational, normative, topographic, his-
torical or temporal, and domain knowledge [18]. In this paper, we explore just
the normative category, and leave the combination of other cultural aspects for
future work. We consider a specific society’s culture where the agents’ repu-
tations are aggregated as group beliefs [23], using the concept of artifacts [20].
Based on the idea that “the culture of a society evolves too, and its evolution
may be faster than genetics, enabling a better adaptation of the agent to the
environment”[19], we analyse the influence of the culture in the evolution of the
1 We adopted the concept of fairness behaviour/equilibrium as in [17, 25].
2 For discussions on the role of BDI agents in social agent-based simulation, see [14].
Fig. 2. Two stages of a single social exchange game (for selfishness/altruism exchange
strategies)
by the agents themselves, and a fitness function helps to evolve the agents’
exchange strategies; the second part is the creation of group beliefs (GBs) using
artifacts, which forms the cultural level based on agents’ reputations constructed
over the exchanges experienced by the agents in the society. The model was
implemented in Jason [3], using the concept of Agents & Artifacts [20] for the
implementation of group beliefs in CArtAgO framework [2].4
However, if b believes that the service offered by a provides less satisfaction than
the minimal satisfaction (Smin) it is willing to accept, then b refuses a’s offer
and this exchange stage does not occur. Supposing that b accepts the service
provided by a, then, at the end of this stage, the agent a has a credit value (V ),
that is, a credit related to the service it has previously performed to agent b. R
and S are called material values, related to the performed exchanges. T and V
are virtual values, related to future interactions, since they help the continuation
of the exchanges.
The second stage, denoted by II, is similar to the first, but referring to a
possible debt collection by agent a, when a charges b for a service in payment for
its virtual value (V ) (the credit a has obtained from b in the stage I). The agent
b has on its belief base a debt value (T ) and it then performs a service offer with
an investment value (R) to a (with R ≤ Rmax ), which in turn generates a value of
satisfaction (S) for b’s offer, in case that it accepts such satisfaction value (i.e.,
S ≥ Smin ), otherwise this exchange stage does not occur. After each 2-stage
exchange between a and b, they calculate the material reward they received,
using the payoff function p_ab : [0, 1]^4 → [0, 1]:

p_ab(R_Iab, R_IIba, S_Iba, S_IIab) =
  (1 − R_Iab + S_IIab)/2  if (R_Iab ≤ R_a^max ∧ S_Iba ≥ S_b^min) ∧ (R_IIba ≤ R_b^max ∧ S_IIab ≥ S_a^min)
  (1 − R_Iab)/2           if (R_Iab ≤ R_a^max ∧ S_Iba ≥ S_b^min) ∧ (R_IIba > R_b^max ∨ S_IIab < S_a^min)
  0                       if (R_Iab > R_a^max ∨ S_Iba < S_b^min) ∧ (R_IIba > R_b^max ∨ S_IIab < S_a^min)     (1)
Observe that, according to Eq. (1), if both exchange stages I and II are suc-
cessfully performed, then the agents’ rewards are greater. On the contrary, if no
stage occurs, i.e., the agent b refuses the service of agent a in the first stage I,
the payoff is null.
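A direct transcription of Eq. (1) into Python is given below, as a sketch for checking the definition rather than a reproduction of the simulation code.

def payoff(r_I_ab, r_II_ba, s_I_ba, s_II_ab, r_max_a, r_max_b, s_min_a, s_min_b):
    """Material reward of Eq. (1) for a two-stage exchange between agents a and b."""
    stage_I_ok = r_I_ab <= r_max_a and s_I_ba >= s_min_b
    stage_II_ok = r_II_ba <= r_max_b and s_II_ab >= s_min_a
    if stage_I_ok and stage_II_ok:
        return (1 - r_I_ab + s_II_ab) / 2   # both stages succeeded
    if stage_I_ok and not stage_II_ok:
        return (1 - r_I_ab) / 2             # only the first stage succeeded
    return 0.0                              # the offer was refused in stage I

# Example: a generous first-stage offer followed by a refused second stage.
p = payoff(r_I_ab=0.3, r_II_ba=0.9, s_I_ba=0.8, s_II_ab=0.2,
           r_max_a=0.8, r_max_b=0.5, s_min_a=0.4, s_min_b=0.6)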
where r(0) and s(0) are, respectively, the initial investment and satisfaction
values, and the current parameters that represent its exchange strategy are: R,
Rmax , S min , k t and k v , where R ∈ [0; 1] is the value of investment, Rmax ∈ [0; 1]
is the maximum value that the agent will invest, S min ∈ [0; 1] is the minimum
value of satisfaction that the agent accepts, k ot ∈ [0; 1] and k dt ∈ [0; 1] are,
respectively, the debt overestimation and credit depreciation factors, as shown
in Eqs. (2) and (3), e ∈ [0, 1] is the weight that represents the agent’s tolerance
degree when its payoff is less than of its neighboring agents (envy degree), and
g ∈ [0, 1] represents the agent’s tolerance degree when its payoff is higher than
its neighboring agents’ payoffs (guilt degree) [12,25].
Analogously, the initial chromosome belief of a selfish agent is defined as:
chromosome([r(0), s(0), rmax(0.2), smin(0.8), e(0.9), g(0.1), kdt(0.2), kov(0.2)]).
To implement/evaluate the model, we consider five agents that perform the
exchanges, each agent with a different initial exchange strategy, namely: altru-
ism, weak altruism, selfishness, weak selfishness and rationality. The rational
agent plays just for the Nash Equilibrium6 , and then smin = e = g = k t = k v = 0.
The agent i calculates its adaptation degree through its fitness function Fi :
[0, 1]m → [0, 1], whose definition, encompassing all types of exchange strategies, is:
F_i(X) = x_i − (e_i/(m − 1)) ∑_{j≠i} max(x_j − x_i, 0) − (g_i/(m − 1)) ∑_{j≠i} max(x_i − x_j, 0),     (5)
where X is the total payoff allocation of agent i (Eq. (4)), ei and gi are i’s
envy and guilt degrees, respectively. To evaluate its fitness, the agent compares
its current fitness with the previous one: if it exceeds the value of the previous
fitness, then the current strategy is better than the previous one, and the agent
makes an adjustment in the vector of probabilities, increasing the probability of
the current strategy being chosen again and increasing/decreasing the parameters of
the chromosome belief defining its strategy.7
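Eq. (5) can likewise be transcribed directly; the sketch below is only meant to make the definition executable.

def fitness(i, payoffs, envy, guilt):
    """Fitness of agent i (Eq. (5)): own payoff penalized by an envy term towards
    richer neighbors and a guilt term towards poorer ones."""
    m = len(payoffs)
    x_i = payoffs[i]
    envy_term = sum(max(x_j - x_i, 0) for j, x_j in enumerate(payoffs) if j != i)
    guilt_term = sum(max(x_i - x_j, 0) for j, x_j in enumerate(payoffs) if j != i)
    return x_i - envy / (m - 1) * envy_term - guilt / (m - 1) * guilt_term

# Example: five agents; agent 0 with illustrative envy and guilt degrees.
f0 = fitness(0, [0.4, 0.6, 0.5, 0.3, 0.7], envy=0.1, guilt=0.9)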
The probability vector of adjustments is in Table 1. There are 27 possible
adjustments, e.g., p0i is the probability of increasing Ri , Rimax and Simin (by
a certain exogenously specified adjustment step), and p5i is the probability of
increasing the value of Ri , keeping the value of Rimax and decreasing Simin . The
probability and strategy adjustment steps fp and fs determine, respectively, to
what extent the probabilities of the probability vector and the values ri, ri^max
and si^min are increased or decreased.
6 See [12] for a discussion on the Nash Equilibrium of the Game of Social Exchange Processes.
7 The fitness function was based on [12, 25].
The culture of the agent society consists of the group belief (GB) and the
reputation artifacts. For the implementation in CArtAgO, these artifacts are
firstly created by the mediator agent, which is also responsible for initiating the
exchanges by sending a message to all agents to start the sequence of exchanges.
The GB artifact stores the beliefs sent by agents after obtaining experience in
exchanges and the reputation artifact creates the reputation of agents.
The beliefs that compose the artifacts are observable properties. The
announcements are treated as interface operations, where some parameters are
provided: the announced predicate, the degree of certainty of the belief and the
strength of this certainty. The composition of a GB works as follows.
The formation rules of individual beliefs lie within the agent minds. The rules
that form the group beliefs (synthesis rules) are in an entity external to the agents,
and the communication for the formation of a GB is made through announcements
sent to a component that aggregates them, forming a GB (see Fig. 3). The set A of
all announces is defined by
A ≝ {⟨p, c, s⟩ | p ∈ P, c ∈ [0..1], s ∈ N},     (6)
where P is the set of all the predicates, and p, c and s are, respectively, the
predicate, the certainty degree and the strength degree of an announce. For
example, in the announce personality("selfish",bob), with certainty degree 0.8
and strength 6, the advertiser is quite sure that agent bob adopts a selfishness
exchange strategy, based on 6 experiences it had in past exchanges with bob. See
the method announce in Fig. 4.
Figure 5 shows the architecture of the artifact ArtCG of group beliefs, including
the classes AgentAnnounce, Belief and the announce method, which corresponds
to the announce operation of beliefs (Eq. (6)). When receiving an announce, the
artifact adds it to a list of announces, and whenever there exists at least one
equal announce from each agent present in the system, this announce becomes
a reputation (see Fig. 4).
The function of the Belief class is to represent the group belief composed of
the tuple: predicate, certainty degree and strength; it implements a
ToProlog interface, which allows its description in the form of a predicate.
The AgentAnnounce class represents the announces made by the agents and inher-
its the Belief class, also adding the advertiser attribute that represents the agent
that made the announce.
To create a reputation, the certainty and strength values are calculated by
the synthesis process, and the artifact Reputation is notified of the new group
belief by the method update. If there is already a group belief with the same
predicate in the artifact, then it updates such values, otherwise a new group
belief is added.
In this paper, we consider a mixed society (composed of agents with five
different exchange strategies), and, due to this fact, the adopted aggregation
method is the weighted synthesis [23], where announcements are synthesized in
order to seek a middle term between them, thus not benefiting only an optimistic
or only a pessimistic society. The weighted synthesis function sinpon_p, which gives the
certainty degree c and the strength s, where C_p is the subset containing all the
announcements of a predicate p, is given by:

sinpon_p = ⟨p, c, s⟩,   c = (∑_{a∈C_p} c_a · s_a) / (∑_{a∈C_p} s_a),   s = (∑_{a∈C_p} s_a) / |C_p|     (7)
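The weighted synthesis of Eq. (7) is easy to reproduce; the announce tuples below are illustrative.

def weighted_synthesis(predicate, announces):
    """Aggregate announces <p, c, s> for one predicate into a group belief (Eq. (7))."""
    certainties = [c for (_, c, _) in announces]
    strengths = [s for (_, _, s) in announces]
    c = sum(ci * si for ci, si in zip(certainties, strengths)) / sum(strengths)
    s = sum(strengths) / len(announces)
    return (predicate, c, s)

# Example: three agents announce that bob adopts a selfishness strategy.
gb = weighted_synthesis('personality("selfish", bob)',
                        [('personality("selfish", bob)', 0.8, 6),
                         ('personality("selfish", bob)', 0.6, 2),
                         ('personality("selfish", bob)', 0.9, 4)])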
Then, in Fig. 2, to begin the second exchange stage between two agents a
and b, the agent a charges the agent b for the service made in the first stage,
and then it sends b the credit value V that it considers itself worthy of. Through a
comparison between a’s credit value and the value R that a has invested in the
first stage, b is able to draw a conclusion about the exchange strategy adopted
by a (a small sketch of this test follows the list):
– Ra > Va: if the value of the investment used by a in the first stage is higher than
the credit value it attributed to itself, b concludes that a is altruistic;
– Ra < Va: if the value of the investment used by a in the first stage is lower than
the credit value it attributed to itself, b concludes that a is selfish;
– Ra = Va: if the value of a’s investment is equal to a’s credit value, b concludes
that a is rational.
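This comparison amounts to a simple three-way test, sketched below for reference.

def infer_strategy(investment_r, credit_v):
    """Conclusion drawn by agent b about a's exchange strategy (see the list above)."""
    if investment_r > credit_v:
        return "altruist"
    if investment_r < credit_v:
        return "selfish"
    return "rational"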
The agent b sends its conclusion about the strategy adopted by agent a to
the group belief artifact ArtCG, using the announce method (Fig. 4), to form a
reputation of the agent a. Once the reputation is formed in the Reputation
artifact, it is added to the agents’ beliefs, thus becoming a common group belief
to all participants of the game.
Whenever there is a reputation that an agent i is selfish, the agents send a
message informing the mediator agent, which sends a message to agent i saying
that i cannot participate in the next play. Thus, i fails to improve its fitness
value, unless it modifies its strategy to enter into the game again, increasing its
investment value R and the maximum investment value Rmax , and decreasing
its minimum satisfaction value Smin .
5 Simulation Analysis
A social exchange strategy is determined by how an agent behaves towards
the exchanges proposed by other agents, by the way this agent determines the
8 Each mark in the X-axis of Fig. 6 represents 10 cycles.
we show the simulation of the evolution of the agents’ fitness values in a period
of 300 cycles.
Table 3 shows the number of two-stage exchanges, which was increased by
385.71 %. Table 5 shows the average value and standard deviation of the fitness
values in the initial and final cycles of the simulations, considering the different
exchange strategies. The increase in the fitness value of the altruist agent was
252.49 %, while for the weak altruist agent it was 258.20 %, for the rational
agent it was 188.94 %, for the selfish agent it was 385.77 % and, finally, for the
weak selfish agent it was 258.58 %. The strategy that showed lower growth was
the rationality strategy, while the selfishness strategy had a higher evolution. In
Table 4, we present the values of the overall average and the standard deviation
of the global fitness value, showing an increasing of 584.26%.
In the case with culture, the increase in the number of two-stage exchanges
was higher (385.71 %) than in the scenario without culture (171.73 %). Regarding
the fitness values, in the scenario with culture only the Weak selfish strategy
has not shown a larger increase in its average fitness value (343.95 % without culture
vs. 258.58 % with culture). The other strategies showed a larger increase in
their fitness values, as shown in Table 6. Observe that, in both scenarios, the
Table 3. Number of exchanges

Average                      Begin      End
One-stage exchange           3.65       0.1
Two-stage exchanges          2.8        13.6
None exchange                8.65       0.1

Standard deviation           Begin      End
One-stage exchange           2.36809    0.44721
Two-stage exchanges          3.17224    4.96726
None exchange                3.32890    0.44721

Table 4. Global fitness value

                             Initial    Final
Global standard deviation    0.28005    0.01673
Global average               0.16701    0.97577

Table 5. Fitness values by exchange strategy

Average
Strategy            Initial fitness   Final fitness
Altruist            0.27849           0.98170
Weak altruist       0.27411           0.98188
Rational            0.33743           0.975
Selfish             -0.33172          0.94797
Weak selfish        0.27673           0.99231

Standard deviation
Strategy            Initial fitness   Final fitness
Altruism            0.02539           0.08101
Weak altruism       0.01128           0.07812
Rationality         0.00747           0.11180
Selfishness         0.02938           0.20056
Weak selfishness    0.12844           0.03437
rationality strategy was the one that showed the lowest growth in relation to the
others, while the selfishness strategies showed the highest evolution.
6 Conclusion
In this paper, the GSREP game was adapted to a BDI-MAS society, using the
Jason language, with the addition of group beliefs as the society “culture” com-
mon to all agents involved in the system, implemented as a CArtAgO artifact.
We consider that the society culture is composed of the agents’ reputations.
This BDI version of the game was called the Cultural-GSREP game. Then, we
analysed and compared the simulation results considering two scenarios, with
and without taking the culture into account.
The equilibrium of Piaget’s Social Exchange Theory is reached when reciprocity
occurs in the exchanges during the interactions. Our approach showed that, with
the evolution of the strategies, the agents were able to maximize their adaptation
values, becoming self-regulators of the exchange processes and thereby contributing
to an increase in the number of successful interactions. All agents evolved and
contributed to the evolution of the society. The fairer (more balanced) the offered
services are, the greater the number of successful interactions. Com-
paring the two scenarios, we conclude that the addition of the culture – the
reputation as a focal point – in social exchanges had the expected influence on
the evolution of the agents’ strategies and exchange processes, increasing the
exchanges successfully performed and the fitness value in a shorter time.
Future work will consider the analysis of the final parameters of the strate-
gies that emerged in the evolution process, and other categories of the cultural
knowledge in the belief space, using belief artifacts in different scopes beyond
the reputation, and creating different ways for the agents to reason about the
group beliefs.
References
1. Blau, P.: Exchange & Power in Social Life. Trans. Publish., New Brunswick (2005)
2. Boissier, O., Bordini, R.H., Hübner, J.F., Ricci, A., Santi, A.: Multi-agent oriented
programming with JaCaMo. Science of Computer Programming 78(6), 747–761
(2013)
3. Bordini, R.H., Hübner, J.F., Wooldrige, M.: Programming Multi-agent Systems in
AgentSpeak Using Jason. Wiley Series in Agent Technology. John Wiley & Sons,
Chichester (2007)
4. Bratman, M.E.: Intention, plans, and practical reason. Cambridge University Press
(1999)
5. Castelfranchi, C., Falcone, R.: Principles of trust for MAS: cognitive anatomy,
social importance and quantification. In: Intl. Conf. of Multi-agent Systems
(ICMAS), pp. 72–79 (1998)
6. Castelfranchi, C., Falcone, R., Firozabadi, B., Tan, Y.: Special issue on trust,
deception and fraud in agent societies. Applied Artificial Intelligence Journal 1,
763–768 (2000)
7. Criado, N., Argente, E., Botti, V.: Open issues for normative multi-agent systems.
AI Communications 24(3), 233–264 (2011)
8. Dignum, V., Dignum, F. (eds.): Perspectives on Culture and Agent-based Simula-
tions. Springer, Berlin (2014)
9. Dimuro, G.P., Costa, A.R.C., Gonçalves, L.V., Pereira, D.: Recognizing and learn-
ing models of social exchange strategies for the regulation of social interactions
in open agent societies. Journal of the Brazilian Computer Society 17, 143–161
(2011)
10. Grimaldo, F., Lozano, M.A., Barber, F.: Coordination and sociability for intelligent
virtual agents. In: Sichman, J.S., Padget, J., Ossowski, S., Noriega, P. (eds.) COIN
2007. LNCS (LNAI), vol. 4870, pp. 58–70. Springer, Heidelberg (2008)
11. Huynh, T.D., Jennings, N.R., Shadbolt, N.R.: An integrated trust and reputation
model for open multi-agent systems. JAAMAS 13(2), 119–154 (2006)
12. Macedo, L.F.K., Dimuro, G.P., Aguiar, M.S., Coelho, H.: An evolutionary spatial
game-based approach for the self-regulation of social exchanges in mas. In: Schaub,
et al. (eds.) Proc. of ECAI 2014–21st European Conf. on Artificial Intelligence.
Frontier in Artificial Intelligence and Applications, no. 263, pp. 573–578. IOS Press,
Netherlands (2014)
13. Nguyen, N.T., Katarzyniak, R.P.: Actions and social interactions in multi-agent
systems. Knowledge and Information Systems 18(2), 133–136 (2009)
14. Padgham, L., Scerri, D., Jayatilleke, G., Hickmott, S.: Integrating BDI reasoning
into agent based modeling and simulation. In: Proc. WSC 2011, pp. 345–356. IEEE
(2011)
15. Pereira, D.R., Gonçalves, L.V., Dimuro, G.P., Costa, A.C.R.: Towards the self-
regulation of personality-based social exchange processes in multiagent systems.
In: Zaverucha, G., da Costa, A.L. (eds.) SBIA 2008. LNCS (LNAI), vol. 5249, pp.
113–123. Springer, Heidelberg (2008)
16. Piaget, J.: Sociological Studies. Routlege, London (1995)
17. Rabin, M.: Incorporating fairness into game theory and economics. The American
Economic Review 86(5), 1281–1302 (1993)
18. Reynolds, R., Kobti, Z.: The effect of environmental variability on the resilience of
social networks: an example using the mesa verde pueblo culture. In: Proc. 68th
Annual Meeting of Society for American Archeology, vol. 97, pp. 224–244 (2003)
19. Reynolds, R., Zanoni, E.: Why cultural evolution can proceed faster than biological
evolution. In: Proc. Intl. Symp. on Simulating Societies, pp. 81–93 (1992)
20. Ricci, A., Viroli, M., Omicini, A.: The A&A programming model and technology
for developing agent environments in MAS. In: Dastani, M., Seghrouchni, A.E.F.,
Ricci, A., Winikoff, M. (eds.) ProMAS 2007, vol. 4908, pp. 89–106. Springer-Verlag,
Heidelberg (2008)
21. Rodrigues, M.R., Luck, M.: Effective multiagent interactions for open cooperative
systems rich in services. In: Proc. AAMAS 2009, Budapest, pp. 1273–1274 (2009)
22. Schelling, T.C.: The strategy of conflict. Harvard University Press, Cambridge
(1960)
23. Schmitz, T.L., Hübner, J.F., Webber, C.G.: Group beliefs as a tool for the forma-
tion of the reputation: an approach of agents and artifacts. In: Proc. ENIA 2012,
Curitiba (2012)
24. Serrano, E., Rovatsos, M., Botı́a, J.A.: A qualitative reputation system for multia-
gent systems with protocol-based communication. In: AAMAS 2012, Valencia, pp.
307–314 (2012)
25. Xianyu, B.: Social preference, incomplete information, and the evolution of ulti-
matum game in the small world networks: An agent-based approach. JASSS 13, 2
(2010)
Modelling Agents’ Perception: Issues and Challenges
in Multi-agents Based Systems
Abstract. In virtual agent modelling, perception has been one of the main focuses
of cognitive modelling and multi-agent-based simulation. Research has been
guided by the representation of the operations of the human senses. In this sense, the
focus of perception remains on the absorption of changes that occur in the environment.
Unfortunately, the scientific literature has not covered the representation of most of
the perception mechanisms that are supposed to exist in an agent’s brain, such as
ambiguity. In multi-agent based systems, perception is reduced to a parameter,
forgetting the complex mechanisms behind it. The goal of
this article is to point out that the challenge of modelling perception ought to be
centred on the internal mechanisms of perception that occur in our brains,
which increases the heterogeneity among agents.
1 Introduction
During the last decade, studies simulating virtual agents (VA) in multi-agent-based
simulation (MABS) systems have tried to bring more realism into modelling the
perception of the environment. Researchers have focused their efforts on improving
perception models and the corresponding techniques.
Notably, recent work [1, 25] has produced sophisticated theoretical models for
reproducing human senses such as sight and hearing. These models were then integrated
into a sustainable multi-sense perception system, in order to put together a perceptual
system capable of approximately replicating the human sensory system. In fact, this is
a keystone for using VAs to simulate how people's senses work in order to capture a
dynamic and nondeterministic environment [1]. The major problem with these proposals
is that they reduce perception to the operations of the human senses.
Many of the psychological activities involved in perception, as well as the inherent
mechanisms of the brain subsystems associated with it, have been overlooked. Is a VA
sure about what it is capturing under these multiple-sense frameworks? Is it reality?
Clearly, the answer to both questions is no. The VA perception described in the
literature does not represent the principal target of human perception: recognition.
This article proposes to discuss the challenges involved in perception, including the
reproduction of all the mechanisms behind this cognitive process. The multi-sense
framework represents only a small component of the huge and complex process that is
perception. Perception goes beyond a faithful representation of input sensors. As an
example, the article will focus on risk perception to demonstrate the challenge and
what is involved in it.
Section 2 reviews the literature on modelling perception for virtual agents in a
multi-agent-based system. Section 3 revisits the concepts behind perception. Section 4
discusses the main issues in the literature and presents the crucial challenges that
modelling perception will face in the near future. Section 5 describes our vision for
implementing perception and Section 6 puts some conclusions forward.
2 Related Literature
The literature presents different standpoints on endowing VAs with perception for
MABS. The most frequently used approach is to ensure that VAs have generalised
knowledge about the environment [2]. This approach does not allow perception to be used
correctly to simulate realistic scenarios, because the VA is not certain about the
veracity of what it is capturing. The opposite case is the one in which agents make
their decisions based solely on the data collected through their multiple senses,
having no knowledge about the environment, not even generic knowledge.
In between these extreme cases, we have agents that can perceive some information,
have a conception of the environment around them and “act on their perception” [3].
In the case described by [4], agents have graphical access to their environment.
According to the described concept of perception, agents choose which path on the
graph is more feasible to achieve their target. There are several problems associated
with this perspective. The main one is the assumption that environments are static,
which raises difficulties in simulating complex scenarios. In this perspective,
perception is incomplete and conceived in a very restricted form. Clearly, it is not
adequate for modelling realistic situations. Rymill and Dodgson developed a method to
simulate the vision and attention of individuals in a crowd [5]. The simulation was
done for open and closed spaces. Independently of the problems identified in the
techniques used to filter information from a highly dynamic environment, the issue
remained that perception was incomplete and conceived in a very restricted form.
Vision was modelled only as an input sensor, with attention preceding it in the
cognitive process.
Pelechano et al. [6] discussed a simulator system for an evacuation scenario, but this
system was inaccurate in representing real vision and, consequently, perception.
Brooks [7] developed what he called creatures: a series of mobile robots operating
without supervision in standard office environments. The intelligent system behind them
was decomposed into independent and parallel activity producers, all of which interfaced
directly to the world through perception and action, rather than interfacing to each
other.
Other proposals, like [8], had built-in simulators to describe hearing. However,
olfactory perception is limited to a few published articles with no consistent
simulator, and no study is known that simulates the tactile senses. Steel et al. [9]
proposed and developed a cohesive framework to integrate, under a modular and
extensible architecture, many virtual agent perception algorithms, with multiple senses
available.
Their architecture allows the assimilation, in the sense of integration, of dynamic and
distributed environments. They conceive perception according to an environment module,
where information is extracted and transposed to the agent's memory. Clearly, they
identify the brain as being outside the scope of perception and more related to memory.
Kuiper et al. [10] associated the vision process with perception and presented more
efficient algorithms to process visual input, which were entirely implemented under the
DIVAs (Dynamic Information Visualization of Agents systems) framework. Recently,
Magessi et al. [11] presented an architecture for risk perception. This architecture
puts the main focus on the representativeness of perception as it is performed in
reality by individuals. Vision and the other senses are designated as input sensors and
considered as one component of this cognitive process.
3 Perception
3.1 Definition
Perception is one of the cognitive processes in the brain that precede decision making.
Perception is the extraction, selection, organisation and interpretation of sensory
information in order to recognise and understand the environment [12]. Perception is
not restricted to the passive reception of input signals. Perception can suffer the
influence of psychological, social and cultural dimensions [13]. Psychology influences
perception through capabilities and cognitive factors. For example, an individual who
suffers from a psychological disorder may have the notion that his/her perception may
be affected. Concerning the social dimension, the influence comes from the interaction
among individuals in society, through imitation or persuasion, for example. Learning,
memory, and expectations can shape the way in which we perceive things [13, 14].
Perception involves these “top-down” effects as well as “bottom-up” methods for
processing sensory input [14]. “Bottom-up” processing is basically low-level
information that is used to build up higher-level information (e.g., shapes for object
recognition). “Top-down” processing refers to recognition in terms of what was expected
in a specific situation. It is a vital factor in determining where entities look and
how knowledge influences perception [23]. Perception depends on complex functions of
the nervous system, but subjectively it seems mostly effortless, because the processing
happens outside conscious awareness [10].
However, it is important to realise that if we want a more complete attention mechanism
related to vision, the work must be conducted through the interaction of bottom-up
factors based on image features and top-down guidance based on scene knowledge and
goals. The top-down component can be understood as the epicentre of attention allocation
when a task is at hand. Meanwhile, the bottom-up component acts as a reactive alert
mechanism. It allows the system to discover potential opportunities or risks in order to
stop threatening events. While the top-down process establishes coherence between the
environment observed by the agent and its goals or tasks, the bottom-up component has
the intent of reproducing the alert mechanisms, warning about objects or places relevant
to the agent.
3.3 Affordances
In [16], Gibson developed an interactionist approach to perception and action, grounded
in the information available in the environment. He rejected the framing assumption of
factoring perception into external-physical and internal-mental processes. The
interaction alternative is centred on processes of agent-situation interaction that come
from ecological psychology and philosophy, namely situation theory [26, 27]. The concept
of affordance for an agent can be defined as the conditions or constraints in the
environment to which the agent is attuned. This broad view of affordances includes
affordances that are recognised as well as affordances that are perceived directly.
Norman used the term affordances to refer only to the possibilities of action that are
perceivable by an individual [17]. He made the concept dependent on the physical
capabilities of an agent and his/her individual goals, plans, values, beliefs, and past
experiences. This means that he characterised the concept of affordance as relational,
rather than subjective or intrinsic. In 2002, Anderson et al. [18] argued that directed
visual attention, and not affordance, is the key factor responsible for the fast
generation of many motor signals associated with the spatial characteristics of
perceived objects. They discovered this by examining how the properties of an object
affect an observer's reaction time for judging its orientation.
perceive something if they have a representation of that thing, or of the parts that
compose it, even if inchoate.
Other researchers assume that every perception event culminates in storage. However,
memory should not be seen as passive, a simple storage of the data collected by sensors.
It must be seen under a dualistic perspective. VA memory should also have an influence
on perception, because to perceive something we need to have the semantic knowledge for
that object or event in our semantic memory. Otherwise, VAs have to learn it first,
besides accelerating recognition.
Clearly, the first challenge is to systematise the whole perception process, including
the missing activities or dimensions that are responsible for shaping an agent's
perception. The second challenge is to bring to multi-agent-based systems the capacity
to represent the interconnections among the psychological, social and cultural
dimensions involved in perception [11]. These dimensions and the subsequent factors are
the keystones for the dynamics of perception.
The third challenge, which is both ambitious and complex, is to establish the
macro-micro link between a specific judgement and the neuro-physiological dimension of
perception. Modelling the perception of VAs cannot remain trapped in the upstream stage;
it must move on to the downstream stage of the process.
Taking into account the issues and challenges described above, it is important to figure
out what the consequences would be if we improved perception modelling. The major
consequence is to separate perception from decision in VAs, similarly to what happens in
reality. This is determinant for understanding many issues related to decision science,
where the relation between what we assimilate and the decision is in fact not linear. If
we want to understand why a decision-maker made an incorrect decision, we need to have
clearly modelled his decision and perception processes. If the problem came from
perception, it is relevant to pin down in which part of the process it occurred. This
brings more heterogeneity to agents in multi-agent-based systems.
Another important consequence is to understand whether an agent perceived the reality
surrounding him or, instead, perceived something different from reality when he made
the decision.
In terms of improving perception modelling in robots, the strategy relies on the use of
very simple cases, such as the perception of a common figure with associated ambiguity.
An example is the Rubin Vase, which has two interpretations: either a vase or two faces.
This can be done with pixel techniques, where the captured images allow robots to
perceive some figure formats when a connection is established with their own semantic
memory.
One of the common mistakes is to insist on capturing some kind of standard perception,
common to all individuals. Of course, people have mechanisms in common for perception.
However, perception is highly subjective, since it depends on the past experiences of
each individual. These experiences and the associated acquisitions define his/her
representation of an object, figure, event or environment. So, instead of
searching for (or defining) standard mechanisms of perception, we could replicate the
perception of one individual. More specifically, we could try to clone a specific
person's perception by using that person's own description of what he or she is
perceiving. In the Rubin Vase example, this means that one robot could recognise a vase
and another could recognise faces. Everything depends on the forms (vases or faces) that
were collected in the past by each robot and stored in its own semantic memory. In the
case of a multi-agent-based system, the ambiguity of perception is present in the way
agents interpret the variations that occur in some parameters.
Another key point about modelling perception is the need to build multidisciplinary
teams to work on it. In this sense, the operational strategy must continue, refresh and
consolidate the idea of [7], where Brooks developed multiple algorithms working in
parallel to pursue perception. This strategy is needed because a stimulus may not be
transformed into a single percept. Our claim is that an ambiguous stimulus may be
transformed into multiple perceptions, experienced randomly, one at a time, in what is
called “multi-stable perception” [22]. However, the same stimuli, or the absence of
them, may induce different perceptions depending on the person's culture and previous
experiences. Once we integrate fundamental psychological insights into perception
modelling, and as advances in neuroscience continuously bring us new inputs, modellers
will be able to substitute the developed algorithms with new ones that replicate what
happens in real physiology. So, this vision clearly defends that it is possible to build
robots with perception similar to that of human beings if modelling focuses on a
specific target and/or individual.
6 Conclusion
References
1. Ray, A.: Autonomous perception and decision-making in cyberspace. In: The 8th Interna-
tional Conference on Computer Science & Education (ICCSE), Colombo, Sri Lanka, April
26–28, 2013
2. Uno, K., Kashiyama, K.: Development of simulation system for the disaster evacuation
based on multi-agent model using GIS. Tsinghua Science and Technology 13(1), 348–353
(2008)
3. Shi, J., Ren, A., Chen, C.: Agent-based evacuation model of large public buildings under
fire conditions. Automation in Construction 18(3), 338–347 (2009)
4. Sharma, S.: Simulation and modelling of group behaviour during emergency evacuation.
In: Proceedings of the IEEE Symposium on Intelligent Agents, Nashville, Tennessee,
pp. 122–127, March 30–April 2, 2009
5. Rymill, S.J., Dodgson, N.A.: Psychologically-based vision and attention for the simulation
of human behaviour. In: Proceedings of Computer Graphics and Interactive Techniques,
Dunedin, New Zealand, pp. 229–236, November 29–December 2, 2005
6. Pelechano, N., Allbeck, J., Badler, N.: Controlling individual agents in high-density crowd
simulation. In: 2007 ACM SIGGRAPH/Eurographics Symposium on Computer Anima-
tion, San Diego, California, pp. 99–108, August 2–4, 2007
7. Brooks, R.A.: Intelligence without representation. Artificial Intelligence 47, 139–159
(1991)
8. Piza, H., Ramos, F., Zuniga, F.: Virtual sensors for dynamic virtual environments. In: Pro-
ceedings of 1st IEEE International Workshop on Computational Advances in Multi-Sensor
Adaptive Processing (2005)
9. Steel, T., Kuiper, D., Wenkstern, R.: Virtual agent perception in multi-agent based simula-
tion systems. In: IEEE/WIC/ACM International Conference on Web Intelligence and Intel-
ligent Agent Technology (2010)
10. Kuiper, D., Wenkstern, R.Z.: Virtual agent perception in large scale multi-agent based si-
mulation systems (Extended Abstract). In: Tumer, K., Yolum, P., Sonenberg, L., Stone, P.
(eds.) Proc. of 10th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS
2011), Taipei, Taiwan, pp. 1235–1236, May 2–6, 2011
11. Magessi, N., Antunes, L.: An Architecture for Agent’s Risk Perception. Advances in
Distributed Computing and Artificial Intelligence Journal 1(5), 75–85 (2013)
12. Schacter, D.L., Gilbert, D.T., Wegner, D.M.: Psychology, 2nd edn. Worth, New York
(2011)
13. Magessi, N., Antunes, L.: Modelling agents’ risk perception. In: Omatu, S., Neves, J.,
Corchado Rodriguez, J.M., Paz Santana, J.F., Gonzalez, S.R. (eds.) Distributed Computing
and Artificial Intelligence. AISC, vol. 217, pp. 275–282. Springer, Heidelberg (2013)
14. Bernstein, D.A.: Essentials of Psychology. Cengage Learning. pp. 123–124.
ISBN 978-0-495-90693-3 (Retrieved March 25, 2011)
15. Gregory, R.: Perception. In: Gregory, R.L., Zangwill, O.L., pp. 598–601 (1987)
16. Gibson, J.: The theory of affordances. In: Shaw, R., Bransford, J. (eds.) Perceiving,
Acting, and Knowing (1977). ISBN 0-470-99014-7
17. Norman, D.: Affordance, Conventions and Design. Interactions 6(3), 38–43 (1999)
18. Anderson, S.J., Yamagishi, N., Karavia, V.: Attentional processes link perception and ac-
tion. Proceedings of the Royal Society B: Biological Sciences 269(1497), 1225 (2002).
doi:10.1098/rspb.2002.1998
19. Ropeik, D.: How Risky Is It, Really? Why Our Fears Don’t Always Match the Facts.
McGraw Hill, March 2010
20. Steel, T., Kuiper, D., Wenkstern, R.Z.: Context-aware virtual agents in open environments.
In: Proceedings of the Sixth International Conference on Autonomic and Autonomous Sys-
tems (ICAS 2010), Cancun, Mexico, March 7–13, 2010
21. Fine, K., Rescher, N.: The Logic of Decision and Action. Philosophical Quarterly
20(80), 287 (1970)
22. Eagleman, D.: Visual Illusions and Neurobiology. Nature Reviews Neuroscience 2(12),
920–926 (2001). doi:10.1038/35104092. PMID: 11733799
23. Yarbus, A.L.: Eye Movements and Vision, chapter: Eye movements during perception of
complex objects. Plenum Press, New York (1967)
24. Danks, D.: The Psychology of Causal Perception and Reasoning. In: Beebee, H.,
Hitchcock, C., Menzies, P. (eds.) Oxford Handbook of Causation. Oxford University
Press, Oxford (2009)
25. Kurzweil, R.: How to Create a Mind: The Secret of Human Thought Revealed. Viking
Books, New York (2012). ISBN 978-0-670-02529-9
26. Barwise, J., Perry, J.: Situations and attitudes. MIT Press/Bradford, Cambridge (1983)
27. Devlin, K.: Logic and information, pp. 49–51. Cambridge University Press (1991)
Agent-Based Modelling for a Resource
Management Problem in a Role-Playing Game
1 Introduction
In Gaza, a province of Mozambique, a scenario of conflict exists between farmers and
cattle producers. Both stakeholders need water and, although the resource exists in
abundance, the lack of planning to circumscribe an area for the cattle and a specific
location for agriculture has been responsible for the increase in the number of
conflicts between these two activities. The cattle are usually left unattended in the
fields near the river and, alone, these animals follow an erratic trajectory to the
water, destroying cultivated fields near the river. Local authorities have difficulty
dealing with the problem because cattle producers argue that they have the right to keep
the cattle on lands that belong to the community. Although cattle producers pay fines
for the farmers' harvest losses, their behaviour seems not to change. Ancient practices
are difficult to modify.
To address this problem, it was decided to follow the steps identified in the companion
modelling approach [4]. With this approach we expect to promote an open debate inside
the community and to help find a participatory solution that will overcome the problem.
Although some solutions have been discussed between local authorities and the population
in general, the lack of investment and the difficulty of bringing stakeholders together
to discuss the problem have delayed the implementation of a definitive solution.
Role-Playing Games (RPG) have been used for different purposes, and one of them is
social learning. In fact, with RPG, it is possible to “reveal some
It was assumed that the scenario could leverage our understanding of how the different
stakeholders see the conflict. As already pointed out, a BDI architecture is used to
support agent modelling in the ABM implementation. It is expected that, as the RPG is
implemented, the agents' model will be improved, adding beliefs, desires, rules and
filters which underlie the decisions observed in the real negotiation context.
Fig. 1. BDI architecture supporting the negotiation between farmers and producers.
Figure 1 shows a model for the beliefs, desires and intentions of the agents, inspired
by [7]. The personality traits are related to decisions about the intention to be
executed. This mechanism is implemented in the architecture using a filter (F1) that
defines the cases for which an agent selects one option. Notice that this decision
process depends upon the value of uncertainty of the beliefs. The rules are used to
define which desires and beliefs activate which intentions. These rules are also part
of the agent's traits. We adopt the model proposed by [9] for the dialogue protocol.
Two protocols are used: one for information-seeking (info-seeking) and the other for
negotiation (negotiation). They are defined as simple request-response message sequences
between two agents. While the former is used to ask for some information, the latter is
used to exchange resources. In the case modelled, farmers ask producers about their
commitment to pay. Then they negotiate the value to pay in different contexts, as a
result of successive agreements. The producers may have two different behaviours: they
can agree with the farmer's point of view or, otherwise, will have to negotiate.
Although this is a very simple protocol, our goal was to test the
Fig. 2. The interaction environment in NetLogo, representing the conflict between a
farmer and a cattle producer by a link (blue line connecting the two agents).
The Interface for the Participatory Approach. The interface has two distinct goals. The
first is to provide a stylized environment in which stakeholders can identify the
narrative space of the events described, where the farmers are the red human-shaped
agents and the producers the blue human-shaped agents. The second is to provide the
autonomy to generate a simulation of the events that create the conflictual situations
which are intended to be studied. In the case study, the river is identified as a blue
area, and the village where farmers and producers live corresponds to the yellow area.
The green area along the right margin of the river is where the farmers have their
cultivated areas. The interface is prepared to interact with users. As the game is
played, some of the actions are autonomous (e.g. the motion of the cattle to drink
water). The red leaf-shaped agents are the cultivated areas damaged by the cattle. The
link between a farmer and a producer shows a conflict between them. The producer owns
the cattle identified with the number 4, which damaged a large cultivated area belonging
to the farmer.
Fig. 3. The information-seeking messages sent by the farmer and the response of the
producer.
4 Conclusions
In this paper we present a prototype for agent-based modelling in the context of a
resource management conflict in Gaza, Mozambique. To address this problem we chose the
RPG approach. The following steps were taken:
1. To create a stylized scenario where stakeholders could interact and to identify the
situation of conflict;
2. To define a protocol of communication between the agents in conflict. The purpose of
this protocol is to support the interaction of the simulation with the different
stakeholders;
References
1. Adamatti, D.F., et al.: A prototype using multi-agent-based simulation and role-
playing games in water management. In: CABM-HEMA-SMAGET, 2005, Bourg-
Saint-Maurice, Les Arcs. CABM-HEMA-SMAGET 2005, pp. 1–20. CDROM
(2005)
2. Bandini, S., Manzoni, S., Vizzari, G.: Agent Based Modeling and Simulation: An
Informatics Perspective. Journal of Artificial Societies and Social Simulation 12(4),
4 (2009)
3. Barreteau, O., Bousquet, F., Attonaty, J.M.: Role-playing games for opening the
black box of multi-agent systems: method and lessons of its application to Senegal
River Valley irrigated systems. Journal of Artificial Societies and Social Simulation
4(3) (2001)
4. Barreteau, O., et al.: Participatory approaches. In: Edmonds, B., Meyer, R. (eds.)
Simulating Social Complexity, pp. 197–234. Springer, Heidelberg (2013)
5. Bousquet, F., et al.: Multi-agent systems and role games: collective learning pro-
cesses for ecosystem management. In: Janssen, M.A. (ed.) Complexity and Ecosys-
tem Management: The Theory and Practice of Multi-agent Systems, pp. 249–285.
E. Elgar, Cheltenham (2002)
6. Briot, J.-P., et al.: A computer-based role-playing game for participatory manage-
ment of protected areas: the SimParc project. In: Anais do XXVIII Congresso da
SBC, Belém do Pará (2008)
7. Cascalho, J., Antunes, L., Corrêa, M., Coelho, H.: Characterising agents’ behaviours:
selecting goal strategies based on attributes. In: Klusch, M., Rovatsos, M., Payne,
T.R. (eds.) CIA 2006. LNCS (LNAI), vol. 4149, pp. 402–415. Springer, Heidelberg
(2006)
8. Cleland, D., et al.: REEFGAME as a Learning and Data-Gathering Computer-
Assisted Role-Play Game. Simulation Gaming 43, 102 (2012)
9. Hussain, A., Toni, F.: Bilateral agent negotiation with information-seeking. In:
Proc. of the 5th European Workshop on Multi-Agent Systems (2007)
10. Luong, B.V., et al.: A BDI game master agent for computer role-playing games.
In: Proceedings of the 2013 International Conference on Autonomous Agents and
Multi-Agent Systems (AAMAS 2013), pp. 1187–1188 (2013)
11. Wilensky, U., Stroup, W.: Learning through participatory simulations: network-
based design for systems learning in classrooms. In: Proceedings of Computer Sup-
ported Collaborative Learning (CSCL 1999). Stanford, CA, December 12–15, 1999
An Agent-Based MicMac Model for Forecasting
of the Portuguese Population
1 Introduction
Agent-Based computational demography models (Billari et al. [4], Ferber [5]) can deal
with complex interactions between individuals, constituting an alternative to mainstream
modelling techniques. Conventional population projection methods forecast the number of
people at a given age and a given point in time, assuming that the members of a cohort
are identical with respect to demographic behaviour. Different approaches include macro
simulation, based on policy interventions and other external events and conditions, and
micro simulation, based on the life courses of individual cohort members. Micro and
Macro (MicMac) approaches (Gaag et al. [6]) offer a bridge between aggregate projections
of cohorts (Mac) and projections of the life courses of individual cohort members (Mic).
These are important contributions to the sustainability of the health care and pension
systems, for example, as these are issues of current concern.
We construct an Agent-Based model, based on the MicMac approach, to
simulate the behaviour of the Portuguese population, open to migrations. The
notation and the main model components (fertility, mortality and migration) are
firstly introduced. Then the iterative simulation process is created and a forecast
for the Portuguese Population from 2011 to 2041 is presented.
2 Population Model
2.1 Fertility and Mortality
We start by establishing the notation and the main components of the model.
The variables with A (resp. G) refer to agent variables (resp. global variables).
The indices a, s, k and y are used to denote age, sex, agent identification and
year, respectively. Any variable indexed by a, s, k, y represents the realization of
the variable for agent k, aged a years old and of sex s, in year y. A similar
interpretation applies to any subset of these indices. The following variables are
then defined:
$A^{Alive}_{a,s,k,y}$: vital status, taking the value 1 if the agent is alive and 0 otherwise
$G^{Alive}_{a,s,y}$: number of living agents; clearly $G^{Alive}_{a,s,y} = \sum_k A^{Alive}_{a,s,k,y}$
$G^{MaleFreq}_{y}$: relative frequency of male agents, equal to $\sum_a G^{Alive}_{a,M,y} \big/ \sum_{a,s} G^{Alive}_{a,s,y}$
$G^{Births}_{a,s,y}$: number of births of sex $s$ given by female agents aged $a$ years old
$G^{FertR}_{a,y+1}$: global fertility rate
$G^{Deaths}_{a,s,y}$: number of deaths
$G^{MortR}_{a,s,y+1}$: global mortality rate.
Real data from the 2011 Portuguese Census is used as the base population, in
a 2% size scale. The updating of the mortality and fertility rates is ensured by
MicMac models, as in Gaag et al. [6]. The Mac part is ruled by the predictions
obtained from Statistics Portugal [7], controlling the overall evolution of the
variables. The Mic part is based on the results obtained from the previous year.
The controlling factor for the fertility (resp. mortality) rate is the expected mean
fertility (resp. mortality) growth rate for the year $y$, denoted by $G^{FertEvo}_{y}$
(resp. $G^{MortEvo}_{y}$). Then

$$G^{FertR}_{a,y+1} = \frac{\sum_s G^{Births}_{a,s,y}}{G^{Deaths}_{a,F,y} + G^{Alive}_{a,F,y}} \times G^{FertEvo}_{y}, \qquad
G^{MortR}_{a,s,y+1} = \frac{G^{Deaths}_{a,s,y}}{G^{Deaths}_{a,s,y} + G^{Alive}_{a,s,y}} \times G^{MortEvo}_{y}.$$

Whenever the population size is very small, the mortality formula is replaced by
$G^{MortR}_{a,s,y+1} = G^{MortR}_{a,s,y} \times G^{MortEvo}_{y}$.
The following random variables are created in order to achieve heterogeneity among the
population of agents (the division by 3 in the fractions ensures that their values lie
between 0 and 1):

$$X^{FertR}_{a,y+1} \sim N\big(G^{FertR}_{a,y+1}, \sigma^{FertR}_{a,y+1}\big), \quad
\sigma^{FertR}_{a,y+1} = \min\left\{0.02,\; \frac{G^{FertR}_{a,y+1}}{3},\; \frac{1 - G^{FertR}_{a,y+1}}{3}\right\}$$

$$X^{MortR}_{a,s,y+1} \sim N\big(G^{MortR}_{a,s,y+1}, \sigma^{MortR}_{a,s,y+1}\big), \quad
\sigma^{MortR}_{a,s,y+1} = \min\left\{0.02,\; \frac{G^{MortR}_{a,s,y+1}}{3},\; \frac{1 - G^{MortR}_{a,s,y+1}}{3}\right\}.$$
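To make the mechanics above concrete, the following is a minimal sketch in Python, with
illustrative variable names and array shapes that are not given in the paper, of the
yearly fertility-rate update and of the per-agent heterogeneity sampling; the truncation
of the standard deviation mirrors the min{0.02, G/3, (1−G)/3} rule, and clipping the
draw to [0, 1] is an extra guard assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_fert_rate(births_by_sex, deaths_f, alive_f, fert_evo):
    """Yearly fertility-rate update per age group:
    G^FertR_{a,y+1} = sum_s G^Births_{a,s,y} / (G^Deaths_{a,F,y} + G^Alive_{a,F,y}) * G^FertEvo_y.
    births_by_sex has shape (ages, 2); the other arrays are indexed by age."""
    births = births_by_sex.sum(axis=1)          # sum over sex s
    denom = deaths_f + alive_f                  # female agents exposed during year y
    with np.errstate(divide="ignore", invalid="ignore"):
        rate = np.where(denom > 0, births / denom, 0.0)
    return rate * fert_evo

def sample_agent_rate(global_rate):
    """Per-agent heterogeneity: X ~ N(G, sigma), sigma = min{0.02, G/3, (1-G)/3}.
    Clipping the draw to [0, 1] is an assumption, not stated in the paper."""
    sigma = min(0.02, global_rate / 3.0, (1.0 - global_rate) / 3.0)
    return float(np.clip(rng.normal(global_rate, max(sigma, 0.0)), 0.0, 1.0))

# toy example: 3 age groups, columns = (male, female) births
births = np.array([[0, 0], [40, 38], [5, 6]])
next_rate = update_fert_rate(births,
                             deaths_f=np.array([1, 2, 1]),
                             alive_f=np.array([500, 400, 300]),
                             fert_evo=0.98)
print(next_rate, sample_agent_rate(next_rate[1]))
```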
2.2 Migration
The Portuguese population is also affected by migration, with a high number of entries
and exits summing to a negative total net migration. Our model also includes this
process. Throughout, $c$ is an index denoting a given country, while $c_0$ denotes
Portugal.
$G^{Health}_{c}$: health indicator, with $A^{HealthW}_{k}$ as its corresponding weight
$G^{Safety}_{c}$: safety indicator, with $A^{SafetyW}_{k}$ as its corresponding weight
$G^{Wage}_{c,y}$: wage indicator, with $A^{WageW}_{k}$ as its corresponding weight
$G^{Pop}_{c}$: indicator for the Portuguese population size, with $A^{PopW}_{k}$ as its corresponding weight
$G^{Lang}_{c}$: indicator for the Portuguese language, with $A^{LangW}_{k}$ as its corresponding weight
$G^{Limit}_{c}$: emigration limits for country $c$, defined by the destination country
$G^{ECounter}_{c}$: emigration counter
The first four indicators range between 0 and 1. The indicator used must be the same for
all countries, and it is preferable that the data source is also the same, because the
same indicator may vary across sources. $G^{Lang}_{c}$ equals 1 if Portuguese is the
native language and 0 otherwise. The wage indicator changes every year according to the
country's expected mean wage growth. Data were obtained from the UN and OECD databases
[9], [8].
The above weights are assigned to each agent by a randomly sampled value from
$N(\mu, 0.75\mu)$; the values of $\mu$ for the first three and the last weights are
obtained from Baláž [3]. The value of $A^{PopW}_{k}$ was based on findings from Anjos
and Campos [2].
The gains of migrating also depend on the will to migrate, which is highly dependent on
the age of the agent and its employment status. We define
$G^{EmpProp}_{a,s}$: proportion of employed individuals, obtained from the 2011 Portuguese rates of the INE database [7]
$G^{EcoGrow}$: expected economic growth for Portugal
$A^{Emp}_{a,s,k,y}$: agent's employment status, coded $-1$ if employed and $1$ otherwise,

and the will to migrate is updated by

$$A^{Will}_{a+1,k,y+1} = A^{y\text{-axis}}_{k} \times w\big(A^{x\text{-axis}}_{k} \times a;\; A^{Shape}_{k}, A^{Scale}_{k}\big) + A^{SuccessW}_{k} \times A^{Success}_{k,y}.$$

The destination is chosen as

$$A^{Dest}_{k} = \arg\max_{c}\{A^{Gain}_{c,a,k,y}\},$$

restricted to $G^{ECounter}_{A^{Dest}_{k}} < G^{Limits}_{A^{Dest}_{k}}$, adding 1 to the counter $G^{ECounter}_{A^{Dest}_{k}}$.

Step 4. Remove all agents with $A^{Dest}_{k}$ different from Portugal.
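A rough sketch of this destination-choice step, assuming the per-agent, per-country
gains $A^{Gain}$ have already been computed; the fallback to the next-best admissible
country when an emigration limit has been reached is an assumption of this sketch, not
spelled out in the excerpt.

```python
from collections import defaultdict

def choose_destinations(gains, limits, home="Portugal"):
    """For each agent, pick the country with the highest migration gain,
    subject to the destination country's emigration limit; agents whose
    chosen destination is not the home country emigrate and are removed.
    `gains` maps agent_id -> {country: gain}."""
    counter = defaultdict(int)          # G^ECounter_c, one slot per country
    emigrants, stayers = [], []
    for agent_id, agent_gains in gains.items():
        # countries ordered by decreasing gain; home is always admissible
        ranked = sorted(agent_gains, key=agent_gains.get, reverse=True)
        dest = next((c for c in ranked
                     if c == home or counter[c] < limits.get(c, 0)), home)
        if dest != home:
            counter[dest] += 1
            emigrants.append((agent_id, dest))
        else:
            stayers.append(agent_id)
    return emigrants, stayers, dict(counter)

# toy run: two agents, France accepts at most one emigrant
gains = {1: {"Portugal": 0.2, "France": 0.9},
         2: {"Portugal": 0.4, "France": 0.8}}
print(choose_destinations(gains, limits={"France": 1}))
```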
$G^{Immi}_{c,y}$: estimated number of immigrants;
$X^{ImmiAge}$: Weibull distribution for the immigrants' age;
$G^{ImmiProp}_{c}$: male proportion of immigrants;
$G^{FertF}_{c}$: immigrants' fertility multiplying factor, as in Adsera and Ferrer [1];
$A^{ImmiC}_{k}$: origin country of the immigrant agent $k$;
the immigration process is carried out according to:
3 Results
The results of the previously presented model for three different expected economic
growth rates for Portugal, $G^{EcoGrow} \in \{0.9, 1.0, 1.1\}$, are now presented. For
each scenario, 300 simulations were considered for the period 2011-2041. The outputs of
the model are: total population size, number of births, number of deaths and the total
number of emigrants, by age and for each of the considered years. Totals across all ages
(and subsequently their means) are obtained.
Whatever the economic scenario, the population size is expected to be a decreasing
function of time. Moreover, the decrease is deepest when the economic growth rate
attains its lowest value. This derives from the fact that economic growth plays a major
role in the emigration decision, and a decrease in this parameter increases emigration.
This expectation is confirmed by Fig. 1. In addition, although economic growth does not
directly affect the fertility rate, a decrease in this parameter leads to a faster
decrease in the number of births over the years. This is most likely due to the fact
that the primary age interval of the Portuguese emigrant population falls within women's
fertile ages. So the increase in the number of emigrants leads to a reduction in the
number of Portuguese women
of fertile age. This explains the decrease in births and further decreases the total
Portuguese population.
Acknowledgments. The first and second authors were partially financed by the FCT
Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Tech-
nology) within project UID/EEA/50014/2013. The last author was partially supported
by CMUP (UID/MAT/00144/2013), which is funded by FCT (Portugal) with national
(MEC) and European structural funds through the programs FEDER, under the part-
nership agreement PT2020.
References
1. Adsera, A., Ferrer, A.: Factors influencing the fertility choices of child immigrants
in Canada. Population Studies: A Journal of Demography 68(1), 65–79 (2014)
2. Anjos, C., Campos, P.: The role of social networks in the projection of international
migration flows: an Agent-Based approach. Joint Eurostat-UNECE Work Session
on Demographic Projections. Lisbon, April 28–30, 2010
3. Baláž, V., Williams, A., Fifeková, E.: Migration Decision Making as Complex
Choice: Eliciting Decision Weights Under Conditions of Imperfect and Complex
Information Through Experimental Methods. Popul. Space Place (2014)
4. Billari, F., Ongaro, F., Prskawetz, A.: Introduction: Agent-Based Computational
Demography. Springer, New York (2003)
5. Ferber, J.: Multi-Agent Systems - An Introduction to Distributed Artificial
Intelligence. Addison-Wesley Longman, Harlow (1999)
6. Gaag, N., Beer, J., Willekens, F.: MicMac Combining micro and macro approaches
in demographic forecasting. Joint Eurostat-ECE Work Session on Demographic
Projections. Vienna, September 21–23, 2005
7. INE: Statistics Portugal (2012). www.ine.pt/en/
8. OECD: Organization for Economic Co-operation and Development (2015). www.
stats.oecd.org/
9. UN: United Nations (2015). www.data.un.org/
Text Mining and Applications
Multilingual Open Information Extraction
1 Introduction
Recent advanced techniques in Information Extraction aim to capture shallow
semantic representations of large amounts of natural language text. Shallow
semantic representations can be applied to more complex semantic tasks involved
in text understanding, such as textual entailment, filling knowledge gaps in text,
or integration of text information into background knowledge bases. One of the
most recent approaches aimed at capturing shallow semantic representations is
known as Open Information Extraction (OIE), whose main goal is to extract a
large set of verb-based triples (or propositions) from unrestricted text. An Open
Information Extraction (OIE) system reads in sentences and rapidly extracts
one or more textual assertions, consisting of a verb relation and two arguments,
which try to capture the main relationships in each sentence [1]. Wu and Weld
[2] define an OIE system as a function from a document d, to a set of triples,
(arg1, rel, arg2), where arg1 and arg2 are verb arguments and rel is a textual
fragment (containing a verb) denoting a semantic relation between the two verb
arguments. Unlike other relation extraction methods focused on a predefined set
of target relations, the Open Information Extraction paradigm is not limited
to a small set of target relations known in advance, but extracts all types of
(verbal) binary relations found in the text. The main general properties of OIE
systems are the following: (i) they are domain independent, (ii) they rely on
unsupervised extraction methods, and (iii) they are scalable to large amounts of
text [3].
First, our OIE system detects the argument structure of the verb boycotted in this
sentence: there is a subject, a direct object, and two prepositional phrases functioning
as verb adjuncts. Then, a set of basic rules transforms the argument structure into a
set of triples:
2 Related Work
The goal of an OIE system is to extract triples (arg1, rel, arg2) describing basic
propositions from large amounts of text. A great variety of OIE systems has been
developed in recent years. They can be organized into two broad categories: those
systems requiring automatically generated training data to learn a classifier and
those based on hand-crafted rules or heuristics. In addition, each system category
can also be divided into two subtypes: those systems making use of shallow syntactic
analysis (PoS tagging and/or chunking), and those based on dependency parsing. In sum,
we identify four categories of OIE systems:
(1) Training data and shallow syntax: The first OIE system, TextRunner
[8], belongs to this category. A more recent version of TextRunner, also using
training data (even if hand-labeled) and shallow syntactic analysis
is R2A2 [9]. Another system of this category is WOEpos [2] whose classifier
was trained with corpus obtained automatically from Wikipedia.
(2) Training data and dependency parsing: These systems make use of
training data represented by means of dependency trees: WOEdep [2] and
OLLIE [10].
(3) Rule-based and shallow syntax: They rely on lexico-syntactic patterns
hand-crafted from PoS tagged text: ReVerb [11], ExtrHech [12], and LSOE
[13].
(4) Rule-based and dependency parsing: They make use of hand-crafted
heuristics operating on dependency parses: ClausIE [3], CSD-IE [14],
KrakeN [15], and DepOE [16].
Our system belongs to the fourth category and, thus, is similar to ClausIE and CSD-IE,
which are the best OIE extractors to date according to the results reported in both [3]
and [14]. However, these two systems are dependent on the output format of a particular
syntactic parser, namely the Stanford dependency parser [17]. In the same way, DepOE,
reported in [16], relies on a specific dependency parser, DepPattern [7], since it only
operates on the default output given by this parser. ArgOE, by contrast, uses as input
the standard CoNLL-X format and thus does not depend on a specific dependency parser.
Another significant difference between ArgOE and the other rule-based systems is that
ArgOE does not distinguish between arguments and adjuncts. As this distinction is not
always clear and well identified by syntactic parsers, we simplify the number of
different verb constituents within the argument structure: all prepositional phrases
headed by a verb are taken as verb complements, regardless of their degree of dependency
with the verb (internal arguments or external adjuncts). So, the set of rules used to
generate triples from this simplified argument structure is smaller than in other
rule-based approaches.
In addition, we make extraction multilingual. More precisely, our system has
the following properties:
3 The Method
Our OIE method consists of two steps: detection of argument structures and
generation of triples.
For each parsed sentence in the CoNLL-X format, all verbs are identified and,
for each verb (V), the system selects all dependents whose syntactic function
can be part of its argument structure. Each argument structure is the abstract
representation of a clause. The functions considered in such representations are
subject (S), direct object (O), attribute (A), and all complements headed by a
preposition (C). Five types of argument structures were defined and used in
the first experiments: SVO, SVC+, SVOC+, SVA, SVAC+, where “C+” means
one or more complements. All these argument structures are correct syntactic
options in our working languages: English, Portuguese, and Spanish. Table 1
shows English examples for each type of argument structure.
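As an illustration of this detection step, the sketch below groups the dependents of
each verb in a CoNLL-X-style parse by grammatical function and names the resulting
pattern; the dependency labels (SUBJ, DOBJ, ATTR, PREP) and the toy parse are
illustrative, not the actual DepPattern tagset.

```python
def argument_structures(conll_sentence):
    """conll_sentence: list of CoNLL-X rows (ID, FORM, LEMMA, CPOS, POS,
    FEATS, HEAD, DEPREL, ...).  For every verb, collect the dependents whose
    relation can be part of its argument structure and name the resulting
    pattern (SVO, SVC+, SVOC+, SVA, SVAC+)."""
    structures = []
    for tok in conll_sentence:
        if not tok[3].startswith("V"):          # keep only verbs
            continue
        verb_id = tok[0]
        frame = {"V": tok[1], "S": None, "O": None, "A": None, "C": []}
        for dep in conll_sentence:
            if dep[6] != verb_id:
                continue
            rel = dep[7]
            if rel == "SUBJ":   frame["S"] = dep[1]
            elif rel == "DOBJ": frame["O"] = dep[1]
            elif rel == "ATTR": frame["A"] = dep[1]
            elif rel == "PREP": frame["C"].append(dep[1])   # any PP complement
        name = "SV" + ("O" if frame["O"] else "") + ("A" if frame["A"] else "") \
               + ("C+" if frame["C"] else "")
        if frame["S"]:
            structures.append((name, frame))
    return structures

# toy parse of "She gave flowers to Mary"; the PP is collapsed into one row for brevity
sent = [("1", "She", "she", "PRO", "PRO", "_", "2", "SUBJ"),
        ("2", "gave", "give", "V", "V", "_", "0", "ROOT"),
        ("3", "flowers", "flower", "N", "N", "_", "2", "DOBJ"),
        ("4", "to Mary", "to Mary", "P", "P", "_", "2", "PREP")]
print(argument_structures(sent))        # -> [('SVOC+', {...})]
```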
Table 2. Argument structures extracted from the sentence A Spanish official offered
what he believed to be a perfectly reasonable explanation for why the portable facilities
weren’t in service.
Type Constituents
SVO S=”A Spanish official”, V=”offered”, O=”what he believed to be a perfectly rea-
sonable explanation for why the portable facilities weren’t in service”
SVO S=”he”, V=”believed to”, O=”be a perfectly reasonable explanation for why the
portable facilities weren’t in service”
SVA S=”what”, V=”be”, A=”a perfectly reasonable explanation for why the portable
facilities weren’t in service”
SVA S=”the portable facilities”, V=”weren’t”, A=”in service”
One of the most discussed problems of OIE systems is that about 90% of the
extracted triples are not concrete facts [1] expressing valid information about
one or two named entities, e.g. “Obama was born in Honolulu”. However, the
vast amount of highly confident relational triples (propositions) extracted by OIE
systems is a very useful starting point for further NLP tasks and applications,
such as common sense knowledge acquisition [18], and extraction of domain-
specific relations [19]. It follows that OIE systems are not suited to extract facts,
but to transform unstructured texts into structured and coherent information
(propositions), closer to ontology formats. Having this in mind, our objective
is to generate propositions from argument structures, where propositions are
defined as coherent and non over-specified pieces of basic information.
From each argument structure detected in the previous step, our OIE sys-
tem generates a set of triples representing the basic propositions underlying the
linguistic structure. We assume that every argument structure can convey dif-
ferent pieces of basic information which are, in fact, minimal units of coherent,
meaningful, and non over-specified information. For example, consider again the
sentence:
In May 2010, the principal opposition parties boycotted the polls after accusa-
tions of vote-rigging.
if O is a that-clause, then:
    arg1 = S, rel = V, arg2 = O
for i = 1 to n, where n is the number of complements C:
    C_i is decomposed into prep_i and Term_i
    arg1 = S, rel = V + prep_i, arg2 = Term_i

SVA:   arg1 = S, rel = V, arg2 = A

SVAC+: arg1 = S, rel = V, arg2 = A
       for i = 1 to n, where n is the number of complements C:
           C_i is decomposed into prep_i and Term_i
           arg1 = S, rel = V + A + prep_i, arg2 = Term_i
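The rules above can be pictured with a small sketch that turns one argument structure
into triples; the frame layout (complements as (prep, term) pairs) is an assumption of
this illustration, and only the rules visible in the excerpt are covered.

```python
def generate_triples(frame):
    """Apply the SVO/SVC+/SVA/SVAC+ rules above to one argument structure.
    Each complement C_i is assumed to be a (prep_i, term_i) pair."""
    S, V = frame["S"], frame["V"]
    O, A, C = frame.get("O"), frame.get("A"), frame.get("C", [])
    triples = []
    if O is not None:                       # arg1=S, rel=V, arg2=O
        triples.append((S, V, O))
    if A is not None:                       # arg1=S, rel=V, arg2=A
        triples.append((S, V, A))
    for prep, term in C:                    # one triple per complement C_i
        rel = f"{V} {A} {prep}" if A else f"{V} {prep}"
        triples.append((S, rel, term))
    return triples

frame = {"S": "the principal opposition parties", "V": "boycotted",
         "O": "the polls",
         "C": [("in", "May 2010"), ("after", "accusations of vote-rigging")]}
print(generate_triples(frame))
```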
The output of ArgOE does not offer confidence values for each extraction. As the system
is rule-based, there is no probabilistic information to be considered. Finally, with
regard to the output format, it is worth mentioning that most OIE systems produce
triples only in textual, surface form. This can be a problem if the triples are used for
NLP tasks requiring more linguistic information. For this reason, in addition to
surface-form triples, ArgOE also provides syntax-based information, with PoS tags,
lemmas, and heads. If more syntactic information were required, it could easily be
obtained from the dependency analysis.
4 Experiments
We conducted three experimental studies: with English, Spanish, and Portuguese
texts. Preliminary studies were performed to select an appropriate syntactic
parser as input of ArgOE. Two multilingual dependency parsers were tested:
MaltParser 1.7.1¹ and DepPattern 3.0², which is provided with a format converter that
changes the standard output of the parser into the CoNLL-X format. We opted for
DepPattern as input of ArgOE because the tagset and dependency names of DepPattern are
the same for all the languages it is able to analyze, and thus there is no need to
configure and adapt ArgOE for each new language. The use of MaltParser with different
languages would require implementing converters from the tagsets and dependency names
defined for a particular language to a common set of PoS tags and dependency names.
Besides DepPattern, we also use two different PoS taggers as input to the syntactic
analyzer: TreeTagger [20] for English texts and FreeLing [21] for Spanish and
Portuguese. All datasets, extractions and labels of the two experiments, as well as a
version of ArgOE configured for English, Spanish, Portuguese, French, and Galician, are
freely available³.
We compare ArgOE against several existing OIE systems for English, namely TextRunner,
ReVerb, OLLIE, WOEparse, and ClausIE. In this experiment, we report the results obtained
by the best version of ClausIE, i.e., without considering redundancy and without
processing conjunctions in the arguments. Note that we are comparing four systems based
on training data (TextRunner, ReVerb, OLLIE, and WOEparse) against two rule-based
methods: ClausIE and ArgOE.
The dataset used in the experiment is the ReVerb dataset⁴, manually labeled for the
evaluation reported in [3]⁵. The dataset consists of 500 sentences with manually-labeled
extractions for the five systems enumerated above. In addition, we manually labeled the
extractions obtained from ArgOE for the same 500 sentences. To maintain consistency
between the labels associated with the five systems and those associated with ArgOE, we
automatically identified those triples extracted by ArgOE that also appear in at least
one of the other labeled extractions. As a result, we obtained 355 triples extracted by
ArgOE that had been labeled by the annotators of previous work. Then, the extractions of
ArgOE were given to two annotators who were instructed to consider the 355 already
labeled extractions as a starting point. Thus, our annotators were required to study and
analyze the evaluation criteria used by the other annotators before starting to annotate
the rest of the extracted triples. We also instructed the annotators to treat as
incorrect those triples denoting incoherent and uninformative propositions, as well as
those triples constituted by over-specified relations, i.e., relations containing
numbers, named entities, or excessively long phrases (e.g., boycotted the polls after
accusations of vote-rigging in). An extraction was considered as correct
¹ http://www.maltparser.org/
² http://gramatica.usc.es/pln/tools/deppattern.html/
³ http://172.24.193.8/ArgOE-epia2015.tgz (anonymous version)
⁴ http://reverb.cs.washington.edu/
⁵ http://www-mpi-inf.mpg.de/departments/d5/software/clausie
Table 4. Number of correct extractions and total number of extractions in the Reverb
dataset, according to the evaluation reported in [3] and our own contribution with
ArgOE.
The results show that the two rule-based systems, ClausIE and ArgOE,
perform better than the classifiers based on automatically generated training
data. This is in accordance with previous work reported in [3,14]. Moreover,
the four systems based on dependency analysis (ClausIE, ArgOE, OLLIE, and
Most errors made by our OIE system come from three different sources: the
syntactic parser, the PoS tagger, and the Named Entity Recognition module used
by the PoS tagger. So, the improvement of our system relies on the performance
of other NLP tasks.
applied to the sentences and 190 triples were extracted. One annotator labeled
the extracted triples and Table 6 shows the number of correct triples and pre-
cision achieved by the system. To the best of our knowledge, this is the first
experiment that reports an OIE system working on Portuguese. Precision is
again similar (53%) to that obtained in the previous experiments. Again, most
errors are due to problems from the syntactic parser and PoS tagger.
5 Conclusion
We have described a rule-based OIE system that extracts verb-based triples and takes as
input dependency parses in the CoNLL-X format. It may thus take advantage of efficient,
robust, and multilingual syntactic parsers. Even if our system is outperformed by other
similar rule-based methods, it reaches better results than the strategies based on
training data. As far as we know, ArgOE is the first OIE system working on more than one
language. In future work, we will include NLP modules to find linguistic generalizations
over the extracted triples: e.g., co-reference resolution to link the arguments of
different triples, and synonymy detection of verbs to reduce the open set of extracted
relations and thus enable semantic inference.
Acknowledgments. This work has been supported by projects Plastic and Celtic,
Innterconecta (CDTI).
References
1. Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open infor-
mation extraction from the web. In: International Joint Conference on Artificial
Intelligence (2007)
2. Wu, F., Weld, D.S.: Open information extraction using wikipedia. In: Annual Meet-
ing of the Association for Computational Linguistics (2010)
3. Corro, L.D., Gemulla, R.: Clausie: clause-based open information extraction. In:
Proceedings of the World Wide Web Conference (WWW-2013), Rio de Janeiro,
Brazil, pp. 355–366 (2013)
4. Hall, J., Nilsson, J.: CoNLL-X shared task on multilingual dependency parsing. In:
The Tenth CoNLL (2006)
5. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilson, J., Riedel, S., Yuret, D.: The
CoNLL-2007 shared task on dependency parsing. In: Proceedings of the Shared
Task Session of EMNLP-CoNLL 2007, Prague, Czech Republic, pp. 915–932 (2007)
6. Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., Marinov, S.,
Marsi, E.: Maltparser: A language-independent system for data-driven dependency
parsing. Natural Language Engineering 13(2), 115–135 (2007)
K.M. Kavitha¹,³, Luís Gomes¹,², José Aires¹,², and José Gabriel P. Lopes¹,²
¹ NOVA Laboratory for Computer Science and Informatics (NOVA LINCS),
Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa,
2829-516 Caparica, Portugal
[email protected], [email protected], [email protected], [email protected]
² ISTRION BOX-Translation & Revision, Lda., Parkurbis, 6200-865 Covilhã, Portugal
³ Department of Computer Applications, St. Joseph Engineering College,
Vamanjoor, Mangaluru 575 028, India
[email protected]
1 Introduction
Annotated bilingual lexica, with their entries tagged for (in)correctness, can be mined
to discover the nature of newly extracted term translations and/or alignment errors. An
automated classification system can then be trained when a sufficient amount of positive
and negative evidence is available. Such a classifier can facilitate and speed up the
manual validation of automatically extracted term translations, and contribute to making
the human validation effort easier while augmenting the number of validated (rejected
and accepted) entries in a bilingual term translation lexicon. Bionic interaction
between linguists and highly precise machine classifiers, in a continuous common effort
and without losing knowledge, contributes to improving alignment precision and, at
another level, translation quality. It is therefore important to have term translation
extractions automatically classified prior to having them validated by human
specialists.
In this paper, we assume sentence-aligned parallel corpora for extracting term
translations, constructing translation tables or obtaining those parallel corpora
aligned at a sub-sentence grain [7,13]. In this setting, translation correspondences are
identified between term pairs by computing their occurrence frequencies or similarities
within the aligned sentences rather than in the entire corpus.
In the completely unsupervised models based on parallel corpora, all the phrase pairs
that are considerably consistent with the word alignment are extracted and gathered into
a phrase table along with their associated probabilities [4,19]. Naturally, the
resulting table, extracted from the alignment with no human supervision, contains
alignment errors. Moreover, many of the translations in the phrase table produced are
spurious or will never be used in any translation [10]. A recent study shows that nearly
85% of the phrases gathered in the phrase table can be removed without any significant
loss in translation quality [21].
A different approach [1], which deviates from this tradition, acknowledges the need for
blending knowledge of language, linguistics and translation as relevant for research in
Machine Translation [27]. Being semi-supervised and iterative, the approach has the
advantage of informing the machine so that it does not make the same kind of errors in
subsequent iterations of alignment and extraction. In this partially supervised,
iterative strategy, first a bilingual lexicon is used to align parallel texts [7]. New¹
term pairs are then extracted from those aligned texts [1]. The newly extracted
candidates are manually verified and then added to the existing bilingual lexicon, with
the entries manually tagged as accepted (Acc) or rejected (Rej). Iterating over these
three steps (parallel text alignment using an updated and validated lexicon, extraction
of new translation pairs, and their validation) results in improved alignment precision,
improved lexicon quality, and more accurate extraction of new term pairs [7]. Human
feedback is particularly significant in this scenario, as incorporating it prevents
alignment and extraction errors from being fed back into subsequent alignment and
extraction iterations. The work described in this paper may easily be integrated into
such a procedure.
Several approaches for extracting phrase translations exist [1,4,8,15]. However, it is important to have the extractions automatically classified prior to having them validated by human specialists. We view classification as a pre-validation phase that allows a first-order separation of correct entries from incorrect ones, so that the human validation task becomes lighter [11]. We extend our previous work by using a larger set of extracted translation candidates for the language pair EN-PT and by additionally adopting other extraction techniques [4,15], as well as others not yet published. Experimental evaluations of the classifier for the additional language pairs EN-FR and FR-PT are also presented. Further, the performance of the classifier with additional features is discussed.
In Section 2, we provide a quick review of related work. In Section 3, we present the classification approach for selecting translation candidates and the features used. The data sets used, the classification results, and
1 Not seen in the bilingual lexicon that was used for alignment.
the other hand, view the selection of translation candidates as a supervised clas-
sification problem with labeled training examples for both the classes (positive
and negative instances).
3 Classification Model
In the current section, we discuss the use of an SVM-based classifier for segregating the extracted translation candidates as accepted (‘Acc’) or rejected (‘Rej’). The classification task involves training and testing data representing bilingual data instances. Each bilingual pair is a data instance represented as a feature vector and a target value known as the class label5. We train the learning function with the scaled training data set, where each sample is represented as a feature vector with the label +1 (‘Acc’) or -1 (‘Rej’). The estimated model is then used to predict the class for each of the unknown data instances kept aside for testing, represented in the same way as the samples in the training set, but with the class label 0. We use the Radial Basis Function (RBF) kernel, K(xi, xj) = exp(−γ ||xi − xj||^2), parameterised by (C, γ), where C > 0 is the penalty parameter of the error term and γ > 0 is the kernel parameter.
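The paper uses LIBSVM; the minimal sketch below illustrates the same setup with scikit-learn (which wraps LIBSVM), assuming an already-built numeric feature matrix. The feature values and the (C, γ) setting are placeholders rather than the values tuned in the paper.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One row of feature values per bilingual pair; labels are +1 ('Acc') or -1 ('Rej').
X_train = np.array([[0.82, 0.40, 1.0],
                    [0.10, 0.05, 0.0],
                    [0.75, 0.33, 1.0],
                    [0.05, 0.60, 0.0]])
y_train = np.array([+1, -1, +1, -1])
X_test = np.array([[0.60, 0.20, 1.0]])

scaler = StandardScaler().fit(X_train)          # scale the training data, as in the paper
clf = SVC(kernel='rbf', C=8.0, gamma=0.5)       # RBF kernel parameterised by (C, gamma)
clf.fit(scaler.transform(X_train), y_train)
print(clf.predict(scaler.transform(X_test)))    # [1] stands for 'Acc'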
3.1 Features
Adequate feature identification for representing the data in hand is fundamental
to enable good learning. An overview of the features used in our classification
model is discussed in this section. We use the features derived from the orthographic similarity measures (strsim) and the frequency measures (freq), discussed in the section below, as the baseline (BLstrsim+freq) for our experiments.
We use the ‘accepted’ entries in the training dataset with EditSim ≥ 0.65 as examples to train SpSim, and a dictionary containing the substitution patterns is learnt. For instance, the substitution pattern extracted from the EN-PT cognate word pair ‘phase’ and ‘fase’ is (‘ˆph’, ‘ˆf’), obtained after eliminating all matched (aligned) characters, ‘a’ ⇔ ‘a’, ‘s’ ⇔ ‘s’ and ‘e’ ⇔ ‘e’. The caret (ˆ) at the beginning of the aligned strings indicates that the pattern occurs as a prefix.
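As an illustration of how such substitution patterns can be obtained, the sketch below aligns a cognate pair with Python's difflib, used here only as a stand-in for the alignment actually employed by SpSim [8], and keeps the mismatching segments, marking prefix and suffix positions.

from difflib import SequenceMatcher

def substitution_patterns(src, tgt):
    # Keep the segments left over after discarding aligned (matched) characters;
    # '^' and '$' mark that a pattern occurs as a prefix or a suffix.
    patterns = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, src, tgt).get_opcodes():
        if op == 'equal':
            continue
        pa, pb = src[i1:i2], tgt[j1:j2]
        if i1 == 0 and j1 == 0:
            pa, pb = '^' + pa, '^' + pb
        if i2 == len(src) and j2 == len(tgt):
            pa, pb = pa + '$', pb + '$'
        patterns.append((pa, pb))
    return patterns

print(substitution_patterns('phase', 'fase'))   # [('^ph', '^f')]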
MinMaxRatio(X, Y) = Min(F(X), F(Y)) / Max(F(X), F(Y))    (4)
We use two different approaches to identify bad ends: one set of two features based on endings that are stop words (BESW) and another set of two features based on endings seen in the rejected, but not in the accepted, training dataset (BEPatR−A). We consider only those endings that occur more than 5 times in the rejected but not in the accepted training dataset. To prevent content words from being considered as bad ends, the term length is restricted to less than 5 characters.
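A minimal sketch of these two bad-end feature sets, assuming simple whitespace-tokenised terms and an illustrative stop-word list (the real lists and thresholds are those described above):

from collections import Counter

PT_STOPWORDS = {'de', 'a', 'o', 'que', 'em', 'para'}      # illustrative subset

def ending(term, max_len=5):
    # Last token of a term; only short endings (fewer than max_len characters) are kept.
    last = term.split()[-1].lower()
    return last if len(last) < max_len else None

def rejected_only_endings(rejected_terms, accepted_terms, min_count=5):
    # Endings occurring more than min_count times among rejected terms and never
    # among accepted ones (the BEPatR-A patterns).
    accepted_endings = {ending(t) for t in accepted_terms}
    counts = Counter(e for t in rejected_terms if (e := ending(t)) is not None)
    return {e for e, c in counts.items() if c > min_count and e not in accepted_endings}

def bad_end_features(term, bad_endings):
    # Two boolean features for one side of a candidate pair.
    e = term.split()[-1].lower()
    return {'BE_SW': e in PT_STOPWORDS, 'BE_PatR-A': e in bad_endings}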
6 A neutral value reflecting our lack of support in deciding whether to accept or to reject that pair.
7 Constructed separately using the first and second language terms in the accepted bilingual training data.
4.2 Results
In the current section, we discuss the classification results and the performance of the classifier with respect to various features, using the complete data set (95%) introduced in Section 4.1, for each of the language pairs EN-PT, EN-FR and FR-PT.
8 A library for support vector machines - software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
9 DGT-TM - https://open-data.europa.eu/en/data/dataset/dgt-translation-memory; Europarl - http://www.statmt.org/europarl/; OPUS (EUconst, EMEA) - http://opus.lingfil.uu.se/
Table 3 shows the precision (PAcc, PRej), recall (RAcc, RRej) and the accuracy of the estimated classifier in predicting each of the classes (Acc and Rej) while using different features. Micro-average Recall (μR), Micro-average Precision (μP), and Micro-average F-measure (μF)10 are used to assess the global performance over both classes.
As can be seen from Table 3, for EN-PT, a substantial improvement is achieved by using the feature that looks for translation coverage on both sides of the bilingual pair. We observe an increase in μF of 22.85% over the baseline and 19.32% over a combination of the features representing the baseline and bad ends. The best μF is obtained when the stemmed11 lexicon is used to look for stem coverage rather than the original lexicon. However, for EN-FR, training with the stemmed lexicon did not show a meaningful improvement.
Table 3. Classifier Results using different features for EN-PT, EN-FR and FR-PT
The FR-PT results are worse than those obtained for the other language pairs: the best μF and accuracy, of 65.87% and 84.13% respectively, are obtained when we use a combination of the features BL+BEPatR−A+CovStm+SpSim. However, the improvement over the baseline (BLstrsim+freq) is negligible (approximately 0.01%-0.14%) in all metrics (precision, recall and micro F-measure) over both classes. This may be explained by the fact that the number of ‘single word - single word’ pairs is comparatively larger than for the other language pairs, while the number of ‘multi-word - multi-word’ pairs is small (50,552 for the accepted). Approximately 250K French multi-words are paired with single Portuguese words and approximately 9K Portuguese multi-words are paired with single French words. Moreover, approximately 130K are single word pairs for this pair of languages, which is quite different from the EN-PT scenario.
10 Computed as discussed in [11].
11 Stemmed using the Snowball stemmer.
Also, patterns indicating bad ends that are stop words (BESW) are substantially fewer for the FR-PT12 and EN-FR13 lexicon corpus than for EN-PT14. This is because the extractions for these language pairs use all of the techniques mentioned in Section 4.1 except for the suffix array based extraction technique [1]. Hence the EN-FR and FR-PT data were much cleaner.
Table 4. Classifier Results for EN-PT, EN-FR and FR-PT by training set sizes
Looking at the classification results for EN-PT using SVM and the training set, we observe that the larger the training set, the larger the recall (RAcc of 92.6% against 92.22%) for the ‘Accepted’ class. Meanwhile, when we augment the training set we lose precision, from 99.45% to 98.38%. However, by augmenting the training set we increase the precision (from 89.59% to 89.91%) for the ‘Rejected’ class, whereas the recall drops (RRej from 99.24% to 97.74%). As the training set is much larger than for the other language pairs (95% of the corpus) we
12 5 in FR and 8 in PT; the most frequent are ‘de’ in FR with 27 occurrences and ‘de’ in PT with 43 occurrences.
13 43 in EN and 15 in FR; the most frequent are ‘to’ in EN with 210 occurrences and ‘pas’ in FR with 237 occurrences.
14 112 in EN and 86 in PT; the most frequent are ‘the’ in EN with 27,455 occurrences and ‘a’ in PT with 22,242 occurrences.
do not necessarily gain much. Thus, precision and recall for EN-PT evolve in such a way that, while one increases, the other tends to decrease, partially deviating from the trend observed in our earlier experiments [11]. It is possible that some sort of overfitting occurs.
Unlike EN-PT, for the language pairs EN-FR and FR-PT the performance of the trained classifier improved with larger training sets. For the features listed in Table 3, the best results were obtained with 95% and 90% of the training set.
Table 5. Performance of Classifier trained on one language pair when tested on others.
5 Conclusion
We have discussed a classification approach as a means of selecting appropriate and adequate candidates for parallel corpora alignment. Experimental results demonstrate the use of the classifiers on the EN-PT, EN-FR and FR-PT language pairs under small, medium and large data conditions. Several insights are useful for distinguishing adequate candidates from inadequate ones, such as the lack (or presence) of parallelism, spurious terms at translation ends, and the base properties (similarity and occurrence frequency) of the translation pairs.
This work is motivated by the need for a system that evaluates automatically extracted translation candidates prior to their submission for human validation.
References
1. Aires, J., Lopes, G.P., Gomes, L.: Phrase translation extraction from aligned par-
allel corpora using suffix arrays and related structures. In: Lopes, L.S., Lau, N.,
Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS, vol. 5816, pp. 587–597.
Springer, Heidelberg (2009)
2. Aker, A., Paramita, M.L., Gaizauskas, R.J.: Extracting bilingual terminologies from comparable corpora. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, vol. 2, pp. 402–411 (2013)
3. Bergsma, S., Kondrak, G.: Alignment-based discriminative string similarity. In:
Annual meeting-ACL, vol. 45, p. 656 (2007)
4. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of
statistical machine translation: Parameter estimation. Computational linguistics
19(2), 263–311 (1993)
5. Chen, B., Cattoni, R., Bertoldi, N., Cettolo, M., Federico, M.: The ITC-irst SMT
system for IWSLT-2005, pp. 98–104 (2005)
6. Fraser, A., Marcu, D.: Measuring word alignment quality for statistical machine
translation. Computational Linguistics 33(3), 293–303 (2007)
7. Gomes, L.: Parallel texts alignment. In: New Trends in Artificial Intelligence, 14th
Portuguese Conference in Artificial Intelligence, EPIA 2009, Aveiro, October 2009
8. Gomes, L., Pereira Lopes, J.G.: Measuring spelling similarity for cognate identifica-
tion. In: Antunes, L., Pinto, H.S. (eds.) EPIA 2011. LNCS, vol. 7026, pp. 624–633.
Springer, Heidelberg (2011)
9. Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and
computational biology. Cambridge Univ Pr., pp. 52–61 (1997)
10. Johnson, J.H., Martin, J., Foster, G., Kuhn, R.: Improving translation quality by
discarding most of the phrasetable. In: Proceedings of EMNLP (2007)
11. Kavitha, K.M., Gomes, L., Lopes, G.P.: Using SVMs for filtering translation tables for parallel corpora alignment. In: 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, pp. 690–702, October 2011
12. Kavitha, K.M., Gomes, L., Lopes, J.G.P.: Identification of bilingual suffix classes
for classification and translation generation. In: Bazzan, A.L.C., Pichara, K. (eds.)
IBERAMIA 2014. LNCS, vol. 8864, pp. 154–166. Springer, Heidelberg (2014)
13. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N.,
Cowan, B., Shen, W., Moran, C., Zens, R., et al.: Moses: open source toolkit for
statistical machine translation. In: Proceedings of the 45th Annual Meeting of the
ACL on Interactive Poster and Demonstration Sessions, pp. 177–180. ACL (2007)
14. Kutsumi, T., Yoshimi, T., Kotani, K., Sata, I., Isahara, H.: Selection of entries
for a bilingual dictionary from aligned translation equivalents using support vector
machines. In: Proceedings of PACLING (2005)
15. Lardilleux, A., Lepage, Y.: Sampling-based multilingual alignment. In: Proceedings
of RANLP, pp. 214–218 (2009)
16. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and
reversals. Soviet Physics Doklady 10, 707–710 (1966)
17. Melamed, I.D.: Automatic evaluation and uniform filter cascades for inducing
n-best translation lexicons. In: Proceedings of the Third Workshop on Very Large
Corpora, pp. 184–198. Boston, MA (1995)
18. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment
models. Computational linguistics 29(1), 19–51 (2003)
19. Och, F.J., Ney, H.: The alignment template approach to statistical machine trans-
lation. Computational Linguistics 30(4), 417–449 (2004)
20. Sato, K., Saito, H.: Extracting word sequence correspondences based on support
vector machines. Journal of Natural Language Processing 10(4), 109–124 (2003)
21. Tian, L., Wong, D.F., Chao, L.S., Oliveira, F.: A relationship: Word alignment,
phrase table, and translation quality. The Scientific World Journal (2014)
22. Tiedemann, J.: Extraction of translation equivalents from parallel corpora. In:
Proceedings of the 11th NoDaLiDa, pp. 120–128 (1998)
23. Tomeh, N., Cancedda, N., Dymetman, M.: Complexity-based phrase-table filtering
for statistical machine translation (2009)
24. Tomeh, N., Turchi, M., Allauzen, A., Yvon, F.: How good are your phrases? Assess-
ing phrase quality with single class classification. In: IWSLT, pp. 261–268 (2011)
25. Vapnik, V.: The Nature of Statistical Learning Theory. Data Mining and Knowl-
edge Discovery 1–47 (2000)
26. Vilar, D., Popovic, M., Ney, H.: AER: Do we need to “improve” our alignments?
In: IWSLT, pp. 205–212 (2006)
27. Way, A., Hearne, M.: On the role of translations in state-of-the-art statistical
machine translation. Language and Linguistics Compass 5(5), 227–248 (2011)
28. Zens, R., Stanton, D., Xu, P.: A systematic comparison of phrase table pruning
techniques. In: Proceedings of the 2012 Joint Conference on EMNLP and CoNLL,
EMNLP-CoNLL 2012, pp. 972–983. ACL (2012)
29. Zhao, B., Vogel, S., Waibel, A.: Phrase pair rescoring with term weightings for
statistical machine translation (2004)
A SMS Information Extraction Architecture
to Face Emergency Situations
1 Introduction
Currently, it is hard to imagine any line of business that does not use textual information. Aside from the content available on the Internet, a large amount of information is generated and transmitted by computers and smartphones all over the world. Gary Miner et al. estimate that 80% of the information available in the world is in free text format and therefore not structured [7]. With such a large amount of potentially relevant data, an information extraction system can structure and refine raw data in order to find and link relevant information amid extraneous information [3,5]. This process is made possible by understanding the information contained in texts and their context, but this complex task faces difficulties when processing informal language, such as SMS messages or tweets [1,6,10].
Messages using the Short Message Service (SMS), as well as tweets, are widely used for numerous purposes, which makes them rich and useful data for information extraction. The content of these messages can be of high value and strategic interest, especially during emergencies1. Under these circumstances, the amount
1 Also referred to as crisis events, disasters, mass emergencies and natural hazards by other researchers in the area.
2 Related Work
Corvey et al. introduce a system that incorporates linguistic and behavioral
annotation on tweets during crisis events to capture situation awareness infor-
mation [2]. The system filters relevant and tactical information intending to
help the affected population. Corvey et al. collected data during five disaster
events and created datasets for manual annotation. The authors linguistically
annotated the corpus, looking for named entities of four types: person, name,
organization and facilities. A second level of behavioral annotation assesses how
community members tweet during crisis events. Tweets receive different and
non-exclusive qualitative tags according to the type of information provided.
Tweets containing situational awareness information are collected and tagged
with macro-level (environmental, social, physical or structural) and micro-level
(regarding damage, status, weather, etc.) information. The results indicated that,
under emergencies, “users communicate via Twitter in a very specific way to con-
vey information” [2]. Becoming aware of such behavior helped the framework’s
machine learning classifier to achieve accuracy of over 83% using POS tags and
bag of words. To classify location, they used Conditional Random Fields (CRFs)
with lexical and syntactic information and POS as features. The annotated cor-
pus was divided into 60% for training and 40% for testing. They obtained an
accuracy of 69% for the complete match and 86% for the partial match and
recall of 63% for the complete match and 79% for the partial match.
Sridhar et al. present an application of statistical machine translation to SMS messages [11]. This work details the data collection process and the steps and resources used in an SMS message translation framework, which uses finite state transducers to learn the mapping between short texts and their canonical form. The authors used a corpus of tweets as surrogate data and a bitext corpus of 40,000 English and 10,000 Spanish SMS messages, collected from transcriptions of speech-based messages sent through a smartphone application. Another 1,000 messages were collected from the Amazon Mechanical Turk2. 10,000 tweets were collected and normalized by removing stopwords, advertisements and web addresses. The framework processes messages segmented into chunks using an automatic scoring classifier. Abbreviations are expanded using expansion dictionaries and translated using a sentence-based translation model. The authors built a static table to expand abbreviations found in SMS messages, in which a series of noisy texts have their corresponding canonical form mapped. For example, “4ever” is linked to the canonical form “forever”. Next, the framework segments phrases using an automatic punctuation classifier trained over punctuated SMS messages. Finally, the Machine Translation component uses a hybrid translation approach, with phrase-based translation and sentences from the input corpus represented as a finite-state transducer. The framework was evaluated over a set of 456 messages collected in a real SMS interaction, obtaining a BLEU score of 31.25 for English-Spanish translations and 37.19 for Spanish-English.
Ritter et al. present TwiCAL, an open-domain event extraction and categorization system for Twitter [9]. This work proposes a process for recognizing temporal information, detecting events in a corpus of tweets and outputting the extracted information in a calendar containing all significant events. The authors focused on identifying events referring to unique dates. TwiCAL extracts a 4-tuple representation of events, including a named entity, an event phrase, a calendar date, and an event type. The authors trained a POS tagger and a NE tagger on in-domain Twitter data. To build an event tagger, they trained sequence models on a corpus of annotated tweets, and used a rule-based system and POS tags to mark temporal expressions in a text. The open-domain event categorization uses variable models to discover types that match the data and discards incoherent types. The result is applied to the categorization of extracted events. The classification model is evaluated according to the event types created from a manual inspection of the corpus. The authors compared the results with a supervised Maximum Entropy baseline, over a set of 500 annotated events, using 10-fold cross validation. The results achieved a 14% increase in maximum F1 score over the supervised baseline. A demonstration of the system is available at the Status Calendar webpage3.
2 https://www.mturk.com/
3 http://statuscalendar.com
removal; and steps specifically designed for linguistic processing: POS tagging
and spell-checking. The normalization step is responsible for adjusting the text
while facing spelling variations, abbreviations, treating special characters and
other features of the short message language. Next, the sentence splitting step
divides each message into a list of sentences in order to process them individ-
ually. Every token is compared to a list of stopwords, which enables discarding
unnecessary items and speeding up the process of information extraction.
Accordingly, the tokens are tagged with a part-of-speech tagger, which is
trained with an annotated corpus of messages. The following step in the linguistic
processing component comprises a Spell Checker, which makes use of an external
dictionary to label untagged tokens and submits them to the POS tagger for
revision. This component outputs a set of preprocessed sentences that serve as
input for the temporal processing component.
The temporal reference classifier analyzes the expression according to its lex-
ical triggers and defines the type and value of the temporal expression. Finally,
the component tags the temporal expression according to the TIMEX2 tag sys-
tem provided by TimeML4 .
4 Case Study
In order to validate our proposal, we present a case study conducted over the IE Architecture. In this section, we detail the choices and decisions made.
4 http://www.timeml.org/site/publications/timeMLdocs/timeml_1.2.1.html
The input data for the process was organized from a set of 3,021 short messages received by an electric utility company. Clients notify the company when there is a power outage by sending short messages with the word “LUZ” (light) and the installation number (provided by the company). As observed in the messages received, the company's clients use this communication channel to provide situation awareness information, which is currently not yet processed but could be of great help in service provision. It is important to extract information from these messages to deliver relevant and strategic information about emergencies, so as to restore power to customers as quickly and safely as possible. The corpus was built in an XML format, comprising the messages and their delivery dates.
We split the corpus into a ‘learning corpus’, containing 2,014 messages; a ‘gold standard corpus’, containing 100 messages, used to evaluate the prototype's taggers; and a ‘test corpus’, used to improve the prototype based on the evaluation results.
We prototyped the architecture using Python5 (version 2.7), mainly due to its ease of use, productivity and features for handling strings, lists, tuples and dictionaries, along with the Natural Language Toolkit (NLTK6) (version 3.0). NLTK provides some interesting features for Portuguese, like tokenizers, stemmers, Part-of-Speech taggers, and annotated corpora for training purposes. We highlight the main aspects of the components' implementation as follows.
This component standardizes the text input. SMS messages contain many misspellings, as texters tend not to follow spelling and grammar rules, which led us to address this matter beforehand, covering the most common cases found in the learning corpus. Some variations are caused by different levels of literacy, besides idiosyncratic SMS language characteristics. In this step, we lowercased messages, removed commas, hyphens and special characters, such as ‘#’ and ‘@’, and unnecessary full stops, such as in zip codes or abbreviations. Each sentence then undergoes a tokenization step, using whitespace to mark word boundaries. The prototype uses wordpunct_tokenize7 to split strings into lists of tokens. We built an external list containing 45 stopwords found in the learning corpus, as well as common word shortenings and phonetic abbreviations, such as “vc” and “q”.
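A minimal sketch of this normalization, tokenization and tagging pipeline, using NLTK's wordpunct_tokenize and a unigram tagger trained on the MacMorpho corpus described in the next paragraph (the stop-word list and the character set to strip are illustrative; the prototype itself was written in Python 2.7):

import nltk
from nltk.corpus import mac_morpho            # requires nltk.download('mac_morpho')
from nltk.tokenize import wordpunct_tokenize

# Unigram tagger trained on MacMorpho, backing off to the noun tag 'N'.
tagger = nltk.UnigramTagger(mac_morpho.tagged_sents(),
                            backoff=nltk.DefaultTagger('N'))

SMS_STOPWORDS = {'vc', 'q', 'pq', 'eh'}       # illustrative subset of the 45-word list

def preprocess(message):
    # Lowercase, strip special characters, tokenize, drop stop-words and POS-tag.
    text = message.lower()
    text = ''.join(ch for ch in text if ch not in '#@,-')
    tokens = [t for t in wordpunct_tokenize(text) if t not in SMS_STOPWORDS]
    return tagger.tag(tokens)

print(preprocess('caiu uma arvore na rede'))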
The component uses MacMorpho8 , a tagged training corpus with news in
standard Brazilian Portuguese. However, the lack of a tagged corpus of texts in
SMS language hampers the POS tagging step. Even though Normalization han-
dles some misspellings, many words not written in standard Portuguese remain
untagged. To address this matter, we used PyEnchant9 , a spell checking library
for Python, as a step to mitigate the spelling variation problem. We added the
5 https://www.python.org/
6 http://www.nltk.org
7 http://www.nltk.org/api/nltk.tokenize.html
8 http://www.nilc.icmc.usp.br/macmorpho/
9 https://pythonhosted.org/pyenchant
the messages. From understanding the verb, its meaning and its accompanying elements, one can determine the structure of the sentence of which it is part. The prototype considers sentences with the following structure: Noun Phrase + Verb + Object. Both the Noun Phrase and the Object can play the semantic roles of agent and patient.
The prototype iterates through sentences looking for the POS tags assigned during linguistic processing in order to find verbs. Afterwards, the Event Detection step marks the boundaries of the event mention by greedily searching for nouns, prepositions, noun compounds, adjectives or pronouns in the surroundings of the verb. Verbs interspersed with other POS tags mark different event mentions, while adjectives and nouns (or noun compounds) mark the boundaries of event mentions.
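A sketch of this greedy boundary-marking step, assuming MacMorpho-style POS tags; the exact tag sets accepted around a verb are an assumption made for illustration:

VERB_TAGS = {'V', 'VAUX'}                               # verb tags (assumed)
EVENT_TAGS = {'N', 'NPROP', 'PREP', 'ADJ', 'PROPESS'}   # nouns, prepositions, adjectives, pronouns

def detect_events(tagged_sentence):
    # Expand around each verb while the neighbouring tokens carry an allowed POS tag.
    events, i = [], 0
    while i < len(tagged_sentence):
        _, tag = tagged_sentence[i]
        if tag in VERB_TAGS:
            start = end = i
            while start > 0 and tagged_sentence[start - 1][1] in EVENT_TAGS:
                start -= 1
            while end + 1 < len(tagged_sentence) and tagged_sentence[end + 1][1] in EVENT_TAGS:
                end += 1
            events.append(tagged_sentence[start:end + 1])
            i = end + 1
        else:
            i += 1
    return events

print(detect_events([('arvore', 'N'), ('caiu', 'V'), ('em', 'PREP'), ('rede', 'N')]))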
Next, the prototype classifies the detected events. Through an extensive study of the learning corpus, we observed how clients communicate during emergencies as well as what they notify. We then listed the most relevant events and related words found while defining the annotation standard. We defined three non-mutually exclusive categories of events, according to the observed events, and thirteen notification types to provide situation awareness information. “Instalação” refers to messages containing information regarding the consumer unit (electrical installation), such as power outages, voltage drops and instabilities. “Rede” groups information about the electrical grid status and its components, such as short circuits or fallen utility poles. “Ambiente” comprises information regarding the environment that might affect the electrical grid, like fallen trees, storms and lightning.
To properly classify an event, we split sentences and analyze noun phrases, verbs and objects separately, searching for domain-related words. The component depends on a list of 83 words related to the notification types, collected from sources such as Dicionário Criativo11 and WordNet12. For instance, the sentence “caiu uma arvore na rede” (a tree fell over the power grid) is divided into two phrases: “caiu” (verb) and “uma arvore em rede” (object). Once verified, the Event Classifier considers that, while the verb does not determine a notification type, the object indicates the notification type “Queda de Árvore”, due to the presence of the words “arvore” and “rede”. Finally, the Tagger groups all the information into a set of tags, according to its categories and notification types.
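A sketch of this list-based classification, with an illustrative (and much smaller) domain-word list than the 83-word list actually used:

DOMAIN_WORDS = {                               # illustrative subset of the real list
    'Queda de Árvore': {'arvore', 'árvore', 'galho'},
    'Queda de Poste': {'poste'},
    'Falta de Energia': {'luz', 'energia', 'apagão'},
}

def classify_event(tokens):
    # Notification types whose domain-related words appear in the event mention.
    words = {w.lower() for w in tokens}
    return [label for label, vocab in DOMAIN_WORDS.items() if words & vocab]

print(classify_event(['caiu', 'uma', 'arvore', 'em', 'rede']))   # ['Queda de Árvore']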
Once the information is tagged, one can use different approaches to visualize and understand such data. Being aware of other possibilities that could be explored in a more extended study, we generated charts from the tagged messages and their corresponding notification types, allowing the visualization of the application of the proposed model.
11 http://dicionariocriativo.com.br/
12 http://wordnetweb.princeton.edu/
differentiating “está” (is) and its popular contraction “tá” (absent in the training
corpus) from “esta” (this) which compromised the detection of some events.
The temporal processing component behaved well over the gold standard. In
fact, in one of the evaluated messages, the component tagged “15 minutos” (15
minutes) as a temporal expression, while the judges did not recognize it, showing
that this understanding is not clear even for humans.
6 Considerations
During emergencies, any detail can help service provision. For that matter, SMS messages can be an important source of valuable information, as they are one of the most widely used means of communication. However, SMS messages are usually written in their own particular language, containing abbreviations, slang and misspellings, which hampers their processing in their context of operation.
As observed in Section 2, even IE systems built for different tasks may present
similarities. From this learning, we could propose an architecture for information
extraction from SMS messages according to core components shared by most IE
systems reviewed, while adding other components to treat domain-specific char-
acteristics. The architecture comprises a linguistic processing component, which
prepares messages for information extraction; a temporal processing component,
which identifies and tags existing temporal information within messages; an event
processing component, which detects and classifies events according to a list of
domain-related categories; and an information fusion component that interprets
information and displays them in a human-understandable manner.
To validate the architecture, we conducted a case study over a corpus of SMS messages sent to an electric utility company. We studied how users communicate during emergencies and defined categories of information that could aid service provision. We validated the architecture against a gold standard corpus built with the assistance of judges with domain knowledge. Among the tagging stages, we established a degree of severity (varying from 1 to 5) to distinguish the categories of events. We assessed the range of scores given by the judges, resulting in a kappa coefficient of 0.0013, i.e., a poor level of agreement, which led us to use fewer severity degrees.
As, to the best of our knowledge, there is no architecture addressing this matter, especially for the Portuguese language, we expect this proposal to bring focus to this area and encourage other researchers to contribute to its improvement. IE systems built on this architecture may serve other electric utility companies, or address other types of disasters or emergencies and other short messages, such as tweets. Among the improvement opportunities unveiled, we could mention resorting to a more appropriate tagger trained over an SMS-language corpus. Such a resource would decrease the number of untagged tokens, which in turn would increase the accuracy of the event detection step. However, to the present, we do not know of the existence of such a resource for the Portuguese language.
For future work, we intend to continue our research, revising the case study results, and refine the prototype according to other approaches, such as machine learning.
References
1. Bernicot, J., Volckaert-Legrier, O., Goumi, A., Bert-Erboul, A.: Forms and func-
tions of SMS messages: A study of variations in a corpus written by adolescents.
Journal of Pragmatics 44(12), 1701–1715 (2012)
2. Corvey, W.J., Verma, S., Vieweg, S., Palmer, M., Martin, J.H.: Foundations of a
multilayer annotation framework for twitter communications during crisis events.
In: 8th International Conference on Language Resources and Evaluation Confer-
ence (LREC), p. 5 (2012)
3. Cowie, J., Lehnert, W.: Information extraction. Communications of the ACM 39(1), 80–91 (1996)
4. Dai, Y., Kakkonen, T., Sutinen, E.: SoMEST: a model for detecting competi-
tive intelligence from social media. In: Proceedings of the 15th International Aca-
demic MindTrek Conference: Envisioning Future Media Environments, pp. 241–248
(2011)
5. Jurafsky, D., Martin, J. H.: Speech and language processing, 2nd edn. Prentice
Hall (2008)
6. Melero, M., Costa-Jussà, M.R., Domingo, J., Marquina, M., Quixal, M.: Holaaa!!
writin like u talk is kewl but kinda hard 4 NLP. In: 8th International Conference
on Language Resources and Evaluation Conference (LREC), pp. 3794–3800 (2012)
7. Miner, G., Elder, J.I., Hill, T., Nisbet, R., Delen, D.: Practical Text Mining and
Statistical Analysis for Non-structured Text Data Applications. Elsevier, Burling-
ton (2012)
8. Pustejovsky, J., Stubbs, A.: Natural Language Annotation for Machine Learning.
O'Reilly Media, Inc. (2012)
9. Ritter, A., Etzioni, O., Clark, S., et al.: Open domain event extraction from twitter.
In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 1104–1112 (2012)
10. Seon, C.-N., Yoo, J., Kim, H., Kim, J.-H., Seo, J.: Lightweight named entity extrac-
tion for korean short message service text. KSII Transactions on Internet and
Information Systems (TIIS) 5(3), 560–574 (2011)
11. Sridhar, V.K.R., Chen, J., Bangalore, S., Shacham, R.: A Framework for trans-
lating SMS messages. In: Proceedings of COLING 2014, the 25th International
Conference on Computational Linguistics: Technical Papers, pp. 974–983 (2014)
Cross-Lingual Word Sense Clustering for Sense
Disambiguation
1 Introduction
Word sense ambiguity is present in many words, no matter the language, and translation is one of the areas where this problem is important to solve. So, in order to select the correct translation, it is necessary to find the right meaning, that is, the right sense, of each ambiguous word. Although multi-word terms tend to be semantically more precise than single words, multi-word terms may also have some ambiguity, depending on the context.
Thus, a system for automatic translation, for example from English to Portuguese, should know how to translate the word bank as banco (an institution for receiving, lending, exchanging and safeguarding money) or as margem (the land alongside or sloping down to a river or lake). As the efficiency and effectiveness of a translation system depend on the meaning of the text being processed, disambiguation will always be beneficial and necessary.
Approaches to tackle the issue of WSD may be divided into two main types: supervised and unsupervised learning. The former requires semantically
tagged training data. Although supervised approaches can provide very good
results, the need for tagging may become a limitation: semantic tagging depends
on more or less complex approaches and it may occur that tagging is not pos-
sible for some languages; and POS-tagging, if used, needs good quality tag-
gers that may not exist for some languages. On the other hand, by working
with untagged information, unsupervised approaches are more easily language-
independent. However, the lack of tags may be a limitation to reach the same
level of results as those achieved by supervised approaches.
One way to work around the limitations of both supervised and unsupervised
approaches, keeping their advantages, is the use of a hybrid solution. We propose
the use of a reliable and valid knowledge source, automatically extracted from
sentence-aligned untagged bilingual parallel corpora.
In this paper we present a cross-lingual approach for Word Sense Clustering
to assist automatic and human translators on translation processes when faced
with expressions which are more complex, more ambiguous and less frequent
than general. The underlying idea is that the clustering of word senses provides
a useful way to discover semantically related senses, provided that each clus-
ter contains strongly correlated word senses. To achieve our target, we propose a semi-supervised strategy to classify words according to their most probable senses. This classification uses an SVM classifier, which is trained with the information obtained in the sense clustering process. Clusters of senses are built according to the correlation between word senses, taking into account the combinations of their neighbor words and the relative positions of those neighbor terms; those combinations are taken as features, which are automatically extracted [1] from sentence-aligned parallel corpora.
2 Related Work
Several studies combining clustering processes with word senses and parallel corpora have been presented by several authors in the past years. In [3], the authors present a clustering algorithm for cross-lingual sense induction that generates bilingual semantic resources from parallel corpora. These resources are composed of the senses of words of one language, described by clusters of their semantically similar translations in another language. The authors proved that the integration of sense-cluster resources leads to important improvements in the translation process. In [4], the authors proposed an unsupervised method for clustering translations of words through point-wise mutual information, based on a monolingual and a parallel corpus. Comparing the induced clusters to reference clusters generated from WordNet, they demonstrated that their method identifies sense-based translation clusters from both monolingual and parallel corpora.
Brown et al. described in [5] a statistical technique for assigning senses to words based on the context in which they appear. By incorporating this method in a machine translation system, a significant reduction of the translation error rate was achieved. In [7], Diab addresses the problem of WSD from a multilingual
3 System Description
3.1 Dataset
The experiments performed to support the research presented in this article rely on the datasets presented in Table 1.
the project ISTRION1. This lexicon contains 810,000 validated entries for the English-Portuguese language pair, 380,000 for English-French and 290,000 for English-Spanish. This knowledge was automatically extracted and manually validated. For each ambiguous word in the source language (e.g. English) we get all the different senses existing in the target language (e.g. Portuguese, French, Spanish) by consulting the bilingual lexica database; see Tables 2 and 3, containing an example for Portuguese and French respectively. These tables show a set of different senses of the same English word “sentence”, each one expressed by a word in the target language. According to the content of each table, the reader may predict that the senses could be divided into two semantically different groups (clusters): those marked with a “*”, which are related to textual units, and those marked with a “+”, related to Court resolutions. Thus, one of the purposes of this approach is to build clusters of senses according to the semantic closeness among word senses.
Table 2. Example of the different senses of the ambiguous word “sentence” concerning the translation to Portuguese. Senses marked with a “*” are textual units of one or more words. Those marked with a “+” are related to Court resolutions
Table 3. Example of the different senses of the ambiguous word “sentence” concerning the translation to French. Senses marked with a “*” are textual units of one or more words. Those marked with a “+” are related to Court resolutions
one language into the feature vector will be more informative than only using monolingual features. By using sentence-aligned parallel corpora, the proposal we present in this paper confirms this principle. Thus, we use a sentence-aligned parallel corpus (composed of Europarl2 and DGT3), from which we extract features from the neighboring context of the target pair (Ambiguous Word) \t (Sense N) that fall within a window of three words to the left and three words to the right of each word of the pair, discarding stop-words. Each target pair has a set of features, where each one is a combination of one of the words in the window and its relative position. For a better understanding, let
us take the example of the target pair “sentence” – “frase” and one of the
sentence-pairs containing it, retrieved from the bilingual parallel corpora (EN \t
PT): Besides being syntactically well-formed, the sentence is correctly translated
\t Para além de estar sintaticamente bem formada, a frase está corretamente
traduzida. Thus, the context words of the target pair “sentence” – “frase” in this
sentence-pair are “Besides”, “syntactically”, “well-formed”, “correctly”, “trans-
lated”, “sintaticamente”, “bem”, “formada”, “corretamente” and “traduzida”,
taking into account the limits of the window (three words to the left and three
words to the right of each word of the pair). Following this, the correspond-
ing features include a tag indicating the language and the relative position of
the context word to the corresponding word of the target pair: “enL3 Besides”,
“enL2 syntactically”, “enL1 well-formed”, “enR1 correctly”, “enR2 translated”,
“ptL3 sintaticamente”, “ptL2 bem”, “ptL1 formada”, “ptR1 corretamente” and
“ptR2 traduzida” —Recall that stop-words are discarded. “L” and “R” stands
for Left and Right respectively.
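A sketch of this feature extraction for one side of a sentence-pair, reproducing the example above; the stop-word lists are illustrative, and stop-words and punctuation are discarded before positions are counted, as the worked example suggests.

from collections import Counter

STOP_EN = {'the', 'is', 'being', 'of', 'to', 'a'}          # illustrative stop-word lists
STOP_PT = {'de', 'a', 'o', 'para', 'além', 'estar', 'está'}

def content_words(tokens, stopwords):
    # Drop stop-words and punctuation-only tokens.
    return [t for t in tokens
            if any(ch.isalpha() for ch in t) and t.lower() not in stopwords]

def window_features(tokens, target, stopwords, lang, width=3):
    # Combinations of a neighbouring word and its relative position to `target`,
    # e.g. 'enL2_syntactically'.
    feats = Counter()
    for idx, tok in enumerate(tokens):
        if tok != target:
            continue
        left = content_words(tokens[:idx], stopwords)[-width:]
        right = content_words(tokens[idx + 1:], stopwords)[:width]
        for pos, w in enumerate(reversed(left), start=1):
            feats['%sL%d_%s' % (lang, pos, w)] += 1
        for pos, w in enumerate(right, start=1):
            feats['%sR%d_%s' % (lang, pos, w)] += 1
    return feats

en = 'Besides being syntactically well-formed , the sentence is correctly translated'.split()
print(window_features(en, 'sentence', STOP_EN, 'en'))
# Counter({'enL1_well-formed': 1, 'enL2_syntactically': 1, 'enL3_Besides': 1,
#          'enR1_correctly': 1, 'enR2_translated': 1})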
However, for each target pair there are usually several sentence-pairs, retrieved from the bilingual parallel corpora (EN \t PT), containing that target pair. This means that several contexts will probably neighbor the same target pair, generating several features. In our approach, every time a feature occurs in a sentence-pair, its frequency is incremented for the corresponding target pair. In other words, taking the feature “enL2 syntactically”, it may have, for example: 3 occurrences for the target pair “sentence” – “frase” (meaning that the word “syntactically” occurs 2 positions to the left of the target word “sentence” in 3 of the sentence-pairs containing this target pair); 2 occurrences for “sentence” – “oração”; 0 occurrences for “sentence” – “pena”, etc.
We consider that there is a tendency such that the closer the relative position of the context word to the target word, the stronger the semantic relation between both words. So, in this approach a different importance is assigned to each feature, according to its relative position. Thus, we use the criterion we call p√f, that is: for features whose relative position is p, the root of degree p is applied to the frequency f, which is the number of times the feature occurs in the set of sentence-pairs containing the target pair. This criterion was chosen empirically as it showed good results after some experiments. Table 4 shows part of the feature extraction concerning the ambiguous word “sentence”.
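A sketch of this weighting, assuming the feature-tag naming used in the sketch above, from which the relative position p can be read:

def final_assigned_value(feature, frequency):
    # The relative position p is read from the feature tag
    # (e.g. 'enL2_syntactically' -> p = 2); the value is the p-th root of the frequency.
    p = int(feature.split('_')[0][-1])
    return frequency ** (1.0 / p)

print(final_assigned_value('enL2_syntactically', 3))   # 3 ** (1/2) ≈ 1.73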
2 http://www.statmt.org/europarl/
3 http://ipsc.jrc.ec.europa.eu/?id=197
Table 4. Feature extraction for the target pairs concerning the ambiguous word “sen-
tence” (only a small part is shown)
For reasons of space, only the values of some features of one of the target pairs (“sentence” – “frase”) are shown. The values in the column Final Assigned Value contain the result of applying the p√f criterion to the values in the column Frequency.
The information contained in all columns of Table 4, except Frequency, forms a matrix which is the basis for obtaining the Word Sense Clustering concerning the ambiguous word “sentence”. At the end of the feature extraction task we obtained 15 matrices per language pair, corresponding to each of the 15 ambiguous words used, as referred to in Table 1.
between that sense and each of the N senses. Each correlation is given by (1), which is based on Pearson's correlation coefficient.

Corr(Si, Sj) = Cov(Si, Sj) / sqrt( Cov(Si, Si) × Cov(Sj, Sj) )    (1)

Cov(Si, Sj) = (1 / (|F| − 1)) Σ_{F ∈ F} ( f(Si, F) − f(Si, .) ) × ( f(Sj, F) − f(Sj, .) )    (2)

where F is an element of the feature set F and f(Si, F) stands for the Final Assigned Value (a column of Table 4) of feature F for sense Si; f(Si, .) gives the average Final Assigned Value of the features for sense Si, which is given by (3).

f(Si, .) = (1 / |F|) Σ_{F ∈ F} f(Si, F)    (3)
Correlation, as given by (1), measures how semantically close senses Si and Sj are. However, a qualitative explanation for this can be given through (2), rather than (1). Thus, (2) shows that, for each feature F, two deviations are taken: one is the deviation of the Final Assigned Value of feature F for sense Si from the average Final Assigned Value for Si, that is, f(Si, F) − f(Si, .); the other is the deviation of the Final Assigned Value of the same feature F for sense Sj from the average Final Assigned Value for Sj, that is, f(Sj, F) − f(Sj, .). If both deviations have the same algebraic sign (+/−), the product will be positive, which means that both senses present similar deviations concerning feature F. And, if positive products happen for most of the features, resulting in high values, then there will be a strong positive covariance value (Cov(Si, Sj)) and, therefore, a high correlation (Corr(Si, Sj)); notice that (1) has the effect of just standardizing the Cov(Si, Sj) values, ranging from -1 to +1.
Still analyzing (2), if the partial sum of the positive products has a value similar to the partial sum of the negative ones (when the deviations are contrary), then the correlation is close to 0, which means that the semantic closeness between both senses is very weak (or even null). In other words, Corr(Si, Sj) gives values close to +1, meaning a high correlation, when both senses tend to occur in the same contexts. If one of the senses occurs in contexts where the other sense never occurs, and vice versa, then there is a negative correlation between them.
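In practice, this correlation matrix is what numpy's corrcoef computes over the rows of the Final Assigned Value matrix; a small sketch with made-up values:

import numpy as np

# One row of Final Assigned Values per sense of the ambiguous word (values made up):
# rows: 'frase', 'oração', 'pena'; columns: features.
V = np.array([[1.73, 1.41, 0.0, 0.0],
              [1.00, 1.26, 0.0, 0.0],
              [0.00, 0.00, 2.0, 1.0]])

corr = np.corrcoef(V)        # Pearson correlation between every pair of senses
print(np.round(corr, 2))     # 'frase' and 'oração' correlate strongly; 'pena' does not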
Our goal is to join similar senses of the same ambiguous word in the same cluster, based on the correlation matrix obtained as explained in Subsec. 3.4. To create the clusters we used the WEKA tool [8] with the X-means [12] algorithm. With X-means the user does not need to supply the number of clusters, contrary to other clustering algorithms such as k-means or k-medoids. The algorithm returns the best solution for the correlation matrix presented as input. As a matter of fact, for the example of the ambiguous word “sentence”, regarding Portuguese as the target language, it assigned the words “oração”, “expressão” and “frase” to one cluster, while “pena” and “condenação” were assigned to another one. In other words, it returned the expected results presented in Table 2. With respect to the possible translations of the same ambiguous word “sentence” to French, the clusters were correctly formed too, according to the expected distribution shown in Table 3.
The results of the clustering phase for all ambiguous words gave rise to the evaluation presented in Tables 5, 6 and 7.
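The experiments use WEKA's X-means; the sketch below only approximates that behaviour with scikit-learn's k-means, selecting the number of clusters by silhouette score, which is a stand-in for X-means' own model selection. The correlation matrix shown is illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_senses(corr, senses):
    # Try k = 2 .. len(senses)-1 and keep the partition with the best silhouette score.
    best_labels, best_score = None, -1.0
    for k in range(2, len(senses)):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(corr)
        score = silhouette_score(corr, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return {sense: int(label) for sense, label in zip(senses, best_labels)}

senses = ['frase', 'oração', 'expressão', 'pena', 'condenação']
corr = np.array([[ 1.0,  0.9,  0.8, -0.5, -0.4],
                 [ 0.9,  1.0,  0.7, -0.6, -0.5],
                 [ 0.8,  0.7,  1.0, -0.4, -0.3],
                 [-0.5, -0.6, -0.4,  1.0,  0.9],
                 [-0.4, -0.5, -0.3,  0.9,  1.0]])
print(cluster_senses(corr, senses))    # two clusters: textual units vs. Court resolutions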
However, for some target pairs existing in the bilingual lexica database, there were very few occurrences in the sentence-aligned parallel corpora, which prevents an accurate calculation of the correlation between senses. This is the reason why the clustering results are relatively poor for some ambiguous words: for example “motion” for English-French and English-Spanish, among others, as shown in Tables 5, 6 and 7.
sentences (containing ambiguous words) were classified and so F-measures were calculated for each target language, as shown in Table 8. The fact that the F-measure for French and Spanish did not reach the same value as for Portuguese (0.79 vs 0.85) is probably due to the fact that the EN-PT lexicon used in the clustering process was considerably larger (810,000 entries) than the ones used for the other two language pairs (380,000 and 290,000), implying therefore a better quality training phase for that pair.
In order to have a baseline for comparison, the same tests described above were performed using the output of GIZA++ alignments on DGT [11], where the most probable sense is used to disambiguate each sentence: the results obtained were 0.43, 0.38 and 0.38 respectively for the EN-PT, EN-FR and EN-SP pairs.
References
1. Aires, J., Lopes, G.P., Gomes, L.: Phrase translation extraction from aligned par-
allel corpora using suffix arrays and related structures. In: Lopes, L.S., Lau, N.,
Mariano, P., Rocha, L.M. (eds.) EPIA 2009. LNCS, vol. 5816, pp. 587–597.
Springer, Heidelberg (2009)
2. Apidianaki, M.: Cross-lingual word sense disambiguation using translation sense
clustering. In: Proceedings of the 7th International Workshop on Semantic Evalu-
ation (SemEval 2013), pp. 178–182. *SEM and NAACL (2013)
3. Apidianaki, M., He, Y., et al.: An algorithm for cross-lingual sense-clustering tested
in a MT evaluation setting. In: Proceedings of the International Workshop on
Spoken Language Translation, pp. 219–226 (2010)
4. Bansal, M., DeNero, J., Lin, D.: Unsupervised translation sense clustering. In:
Proceedings of the 2012 Conference of the North American Chapter of the Associ-
ation for Computational Linguistics: Human Language Technologies, pp. 773–782.
Association for Computational Linguistics (2012)
5. Brown, P.F., Pietra, S.A.D., Pietra, V.J.D., Mercer, R.L.: Word-sense disambigua-
tion using statistical methods. In: Proceedings of the 29th annual meeting on Asso-
ciation for Computational Linguistics, pp. 264–270. Association for Computational
Linguistics (1991)
6. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM
Transactions on Intelligent Systems and Technology (TIST) 2(3), 27 (2011)
7. Diab, M.T.: Word sense disambiguation within a multilingual framework. Ph.D.
thesis, University of Maryland at College Park (2003)
8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The
weka data mining software: an update. ACM SIGKDD Explorations Newsletter
11(1), 10–18 (2009)
9. Lefever, E., Hoste, V.: Semeval-2010 task 3: Cross-lingual word sense disambigua-
tion. In: Proceedings of the 5th International Workshop on Semantic Evaluation,
pp. 15–20. Association for Computational Linguistics (2010)
10. Lefever, E., Hoste, V., De Cock, M.: Five languages are better than one: an attempt
to bypass the data acquisition bottleneck for WSD. In: Gelbukh, A. (ed.) CICLing
2013, Part I. LNCS, vol. 7816, pp. 343–354. Springer, Heidelberg (2013)
11. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment mod-
els. Computational Linguistics 29(1), 19–51 (2003)
12. Pelleg, D., Moore, A.W., et al.: X-means: Extending k-means with efficient esti-
mation of the number of clusters. In: ICML, pp. 727–734 (2000)
13. van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Information Retrieval Group, University of Glasgow (1979)
14. Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external
cluster evaluation measure. EMNLP-CoNLL 7, 410–420 (2007)
15. Tufiş, D., Ion, R., Ide, N.: Fine-grained word sense disambiguation based on paral-
lel corpora, word alignment, word clustering and aligned wordnets. In: Proceedings
of the 20th international conference on Computational Linguistics, p. 1312. Asso-
ciation for Computational Linguistics (2004)
Towards the Improvement of a Topic Model
with Semantic Knowledge
1 Introduction
Whether during pre-processing [8], the generative process [18], or post-processing [2], incorporating semantics into topic modeling has emerged as an approach to deal with concepts rather than surface words. Since a word may have different meanings (e.g. bank) and since the same concept may be denoted by different words (e.g. car and automobile), these attempts exploit external semantic resources, such as WordNet [11], or, alternatively, follow a fully unsupervised approach, for instance, using word sense induction techniques [3]. In those approaches, topic distributions with synonymous and semantically similar words are unified in concept representations, such as synsets.
In order to improve current topic models, we propose a new model, SemLDA, which incorporates semantics into the well-known LDA model, using knowledge from WordNet. Similarly to other semantic topic models, the topics produced by SemLDA are sets of synsets instead of words. The main difference is that SemLDA considers all possible senses of the words in a document, together with their probabilities. Moreover, it only requires a minimal, intuitive change to the classic LDA algorithm.
The remainder of the paper is organized as follows: in Section 2, there is a brief enumeration of existing approaches to topic modelling; Section 3 introduces the proposed model in detail, with special focus on the differences towards classic LDA; Section 4 reports on the experiments performed, with illustrative examples of the obtained topics and their automatic evaluation against classic LDA. Finally, Section 5 draws some conclusions and future plans for this work.
2 Related Work
The first notable approach to reduce the dimensionality of documents was Latent Semantic Indexing (LSI) [5], which aimed at retaining most of the variance present in the documents, thus leading to a significant compression of large datasets. Probabilistic Latent Semantic Indexing (pLSI) [9] later emerged as a variant of LSI, where the different words in documents are modelled as samples from a simple mixture model whose mixture components are multinomial random variables that can be viewed as representations of “topics”. Nevertheless, pLSI was still not a proper generative model of documents, given that it provides no probabilistic model at the level of documents. Having this limitation in mind, Blei et al. [1] developed Latent Dirichlet Allocation (LDA), a generalization of pLSI that is currently the most widely applied topic model. It allows documents to have a mixture of topics, as it captures significant intra-document statistical structure via the mixing distribution.
The single purpose of the previous models is to discover and assign different
topics – represented by sets of surface words, each with a different probability –
to the collection of documents provided. Those approaches have no concern with
additional semantic knowledge about words, which can lead to some limitations
in the generated topics. For instance, they might include synonyms, and thus
be redundant and less informative. Alternative attempts address this problem
using, for instance, WordNet [11], a lexical-semantic knowledge base of English.
3 Proposed Model
according to the documents where they appear, regardless of their known semantics. In SemLDA, instead of just words, all the possible senses of each word are considered, although with different probabilities. We should notice that the more informative output (synsets) does not necessarily imply an increased complexity of the topic representation. If needed, in order to have an output comparable to other topic models, a single word can be selected from each synset. When using WordNet, it makes sense to select the first word of a synset, which we recall to be the one most frequently used to denote the concept.
The graphical model of SemLDA is displayed in Figure 1, where D is the number of documents in the corpus, K is the number of topics, N is the number of words in a document and S is the number of synsets of a given word. In this model, each word of a document, wn, is drawn from a concept, cn. This is represented by using the synset's distribution over words, parameterized by η, which we shall assume to be fixed. The concept cn is determined by a discrete topic assignment zn, picked from the document's distribution over topics θ, and a topic distribution β. It follows the same reasoning as the LDA model, but includes a new layer corresponding to the concepts cn that the words wn express. The generative process of a document d under SemLDA is the following:
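1. Draw a topic distribution θ ∼ Dir(α).
2. For each of the N words wn of d:
(a) draw a topic assignment zn ∼ Multinomial(θ);
(b) draw a concept cn ∼ Multinomial(βzn);
(c) draw the word wn ∼ Multinomial(ηcn).
(This listing is reconstructed from the description of the model above; the notation follows Figure 1.)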
Our goal is thus to calculate, for every document, the posterior distribution
over the latent variables, θ, z1:N , c1:N . However, as in LDA, performing exact
inference is intractable, so we need to use an approximate inference method. In
this paper, we use variational inference to perform approximate Bayesian infer-
ence. The purpose of variational inference is to minimize KL divergence between
the variational distribution q(θ, z1:N , c1:N ) and the true posterior distribution
p(θ, z1:N , c1:N |w1:N ). A fully factorized (mean field) variational distribution q,
$$\cdots \,-\, \sum_{n=1}^{N}\sum_{i=1}^{K}\phi_{n,i}\log\phi_{n,i} \,+\, \mu\Bigl(\sum_{k=1}^{K}\phi_{n,k}-1\Bigr) \qquad (4)$$
Setting the derivatives of L[φ] w.r.t φ to zero gives the update in equation 5.
$$\phi_{n,i} \propto \exp\Bigl(\Psi(\gamma_i) - \Psi\Bigl(\sum_{j=1}^{K}\gamma_j\Bigr) + \sum_{j=1}^{S}\lambda_{n,j}\log\beta_{i,j}\Bigr) \qquad (5)$$
$$\cdots \,-\, \sum_{n=1}^{N}\sum_{j=1}^{S}\lambda_{n,j}\log\lambda_{n,j} \,+\, \mu\Bigl(\sum_{k=1}^{S}\lambda_{n,k}-1\Bigr) \qquad (6)$$
Setting the derivatives of L[λ] w.r.t λ to zero gives the update in equation 7.
$$\lambda_{n,j} \propto \exp\Bigl(\sum_{i=1}^{K}\phi_{n,i}\log\beta_{i,j} + \sum_{i=1}^{V_j} w_{n,i}\log\eta_{j,i}\Bigr) \qquad (7)$$
Setting the derivatives of L[β] w.r.t. β to zero gives the update in equation 9,
which is analogous to the update in standard LDA [1], but with the words $w^d_{n,j}$
replaced by their probability in the $j$-th concept, $\lambda^d_{n,j}$.

$$\beta_{i,j} \propto \sum_{d=1}^{D}\sum_{n=1}^{N_d} \lambda^d_{n,j}\,\phi^d_{n,i} \qquad (9)$$
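Read together, the updates in equations (5), (7) and (9) suggest a coordinate-ascent procedure over φ, λ, γ and β. The sketch below is a schematic rendering of one such sweep, not the authors' implementation: the γ update is assumed here to be the standard LDA one, and the one-hot document layout is an assumption made only for illustration.

```python
import numpy as np
from scipy.special import digamma

def variational_sweep(docs_onehot, alpha, beta, eta, n_iter=20):
    """Schematic coordinate-ascent pass for SemLDA (equations 5, 7, 9).

    docs_onehot : list of (N_d, V) one-hot word matrices (numpy arrays)
    alpha       : (K,) Dirichlet prior; beta: (K, C); eta: (C, V)
    """
    K, C = beta.shape
    log_beta, log_eta = np.log(beta), np.log(eta)
    new_beta = np.zeros_like(beta)
    for w in docs_onehot:                           # w: (N_d, V)
        N = w.shape[0]
        gamma = alpha + N / K                       # usual initialization
        phi = np.full((N, K), 1.0 / K)
        lam = np.full((N, C), 1.0 / C)
        for _ in range(n_iter):
            # eq. (5): phi_{n,i} ∝ exp(Ψ(γ_i) − Ψ(Σγ) + Σ_j λ_{n,j} log β_{i,j})
            log_phi = digamma(gamma) - digamma(gamma.sum()) + lam @ log_beta.T
            phi = np.exp(log_phi - log_phi.max(1, keepdims=True))
            phi /= phi.sum(1, keepdims=True)
            # eq. (7): λ_{n,j} ∝ exp(Σ_i φ_{n,i} log β_{i,j} + Σ_v w_{n,v} log η_{j,v})
            log_lam = phi @ log_beta + w @ log_eta.T
            lam = np.exp(log_lam - log_lam.max(1, keepdims=True))
            lam /= lam.sum(1, keepdims=True)
            # gamma update assumed to follow standard LDA
            gamma = alpha + phi.sum(0)
        # eq. (9): β_{i,j} ∝ Σ_d Σ_n λ^d_{n,j} φ^d_{n,i}
        new_beta += phi.T @ lam
    return new_beta / new_beta.sum(1, keepdims=True)
```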
4.1 Datasets
Two freely available textual corpora were used in our experiments, namely: the
Associated Press (AP) and the 20 Newsgroups dataset, both in English. AP is a
large news corpus, from which we used only a part. More precisely, the sample
data for the C implementation of LDA, available in David Blei’s website1 , which
1 http://www.cs.princeton.edu/~blei/lda-c/
4.2 Experiments
The experiments performed aimed at comparing the classic LDA algorithm [1]
with SemLDA. An implementation of the classic algorithm, written in C, is available
from Blei's website5. No changes were made to this code; we just had
to pre-process the documents, generate a suitable input, and execute it.
For running SemLDA, extra work was needed. First, we retrieved all synsets
from the SemCor 3.0 annotations6 , and calculated their probability in this cor-
pus. This is a straightforward task for those WordNet synsets that are in Sem-
Cor. But SemCor is a limited corpus and does not cover all words and senses in
WordNet. To handle this issue, an extra pre-processing step was added, where
all documents were reviewed and, when a word did not occur in SemCor, a new
‘dummy’ synset was created with a special negative id, and probability equal to
the average probability of all the other synsets. This value was chosen to balance
the unknown probabilities of dummy synsets according to the probabilities of
the remaining synsets, and thus not favor any of them.
For each dataset, the SemLDA input file had the synsets retrieved from
SemCor and the words that were in the documents, but not in SemCor. The
only difference in the text pre-processing is the use of part-of-speech (POS)
tagging, to consider only open-class words, namely nouns, verbs, adjectives and
adverbs.
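A rough sketch of the dummy-synset step is given below. The data structures (a precomputed word-to-synset probability map derived from SemCor) and the id-assignment scheme are assumptions made for illustration; the paper only states that dummy synsets receive a special negative id and the average probability of all other synsets.

```python
from statistics import mean

def average_probability(semcor_probs):
    """Average probability over all SemCor-derived synsets (used for dummies)."""
    return mean(p for senses in semcor_probs.values() for p in senses.values())

def synsets_for_word(word, semcor_probs, dummy_ids, avg_prob):
    """Return (synset_id, probability) pairs for one word, as described above.

    semcor_probs : dict word -> {synset_id: probability}, assumed precomputed
                   from the SemCor 3.0 annotations.
    dummy_ids    : dict handing out negative ids for words absent from SemCor.
    """
    if word in semcor_probs:
        return sorted(semcor_probs[word].items())
    # word not covered by SemCor: create a 'dummy' synset with a negative id
    # and probability equal to the average of all the other synsets
    dummy_id = dummy_ids.setdefault(word, -(len(dummy_ids) + 1))
    return [(dummy_id, avg_prob)]
```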
Instead of trial and error with different numbers of topics, we used a Hierar-
chical Dirichlet Process (HDP) [19] to discover the appropriate number of top-
ics for each dataset. The results obtained suggested that the 20 Newsgroups
dataset contains 15 topics and the AP corpus 24. After the pre-processing
phase, each model was run for both datasets, with the α parameter fixed at
0.5. Tables 1 and 2 illustrate the results obtained with the classic LDA and
SemLDA, respectively for the 20 Newsgroups and for the AP corpus. For each
SemLDA topic presented, we tried to find an analogous topic discovered by the
classic LDA, in the sense that they cover similar domains. For the sake of
simplicity, we only show the top 10 synsets for each SemLDA topic, with their
Synset ID, POS-tag, words and gloss. Underlined words are those present in
SemCor and WordNet, whereas the others only appear in WordNet. For each
LDA topic only the top 10 words are displayed.
2 http://qwone.com/~jason/20Newsgroups/
3 http://snowball.tartarus.org/algorithms/english/stop.txt
4 http://www.nltk.org/
5 http://www.cs.princeton.edu/~blei/lda-c/
6 http://web.eecs.umich.edu/~mihalcea/downloads.html#semcor
Table 1. Illustrative (analogous) topics from 20 Newsgroups, obtained with the classic
LDA (top) and with SemLDA (bottom).
LDA
medical, health, use, patient, disease, doctor, cancer, study, infection, treatment

SemLDA
Synset ID | POS | Words | Gloss
14447908 | N | health, wellness | A healthy state of well being free from disease.
3247620 | N | drug | A substance that is used as a medicine or narcotic.
10405694 | N | patient | A person who requires medical care.
10020890 | N | doctor, doc, physician, MD, Dr., medico | A licensed medical practitioner.
14070360 | N | disease | An impairment of health or a condition of abnormal functioning.
14239918 | N | cancer, malignant neoplastic disease | Any malignant growth or tumor caused by abnormal and uncontrolled cell division.
47534 | ADV | besides, too, also, likewise, as well | In addition.
14174549 | N | infection | The pathological state resulting from the invasion of the body by pathogenic microorganisms.
1165043 | V | use, habituate | Take or consume (regularly or habitually).
7846 | N | person, individual, someone, somebody, mortal, soul | A human being.
The results show success in incorporating semantics into LDA. Topics are
based on synsets, and WordNet can be used to retrieve additional information
on the concept they denote, including their definition (gloss), POS and other
words with the same meaning. With both models, the top words of each topic are
consistently nouns, which should transmit more content. The presented examples
clearly describe very close semantic domains. They share many words, and the
others are closely related to each other (e.g., drug and treatment, or exchange and
trading). We call attention to topics where the same word is in different synsets
(Table 2). While this might sometimes be undesirable, and a possible sign of
incoherence, it also shows that the algorithm is correctly handling different senses
of the same word. These situations should be minimized in the future, as we
intend to acquire sense probabilities from word sense disambiguation (WSD) [14],
instead of relying blindly on SemCor for this purpose. This will also minimize
the number of dummy synsets.
We can say that the overall results are satisfying. Despite the occasional less
clear word association, we may say that we are moving in the right direction.
Still, to measure progress against the classic LDA, we performed an automatic
evaluation of the coherence of the discovered topics.
4.3 Evaluation
Although, at first glance, the results might seem promising, they were also validated
automatically, using metrics previously applied in the context of topic modelling,
namely pointwise mutual information (PMI) and topic coherence.
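As an illustration of these two metrics, the sketch below computes an average pairwise PMI score and a UMass-style coherence score in the spirit of [16] and [13]. The co-occurrence source (document-level sets) and smoothing constants are assumptions, since the paper does not give those details.

```python
import math
from itertools import combinations

def topic_pmi(topic_words, doc_sets, eps=1e-12):
    """Average pairwise PMI of a topic's top words over a reference corpus.

    doc_sets : list of sets of words, one per document (assumed co-occurrence unit).
    """
    n_docs = len(doc_sets)
    def p(*ws):
        return sum(all(w in d for w in ws) for d in doc_sets) / n_docs
    scores = [math.log((p(a, b) + eps) / ((p(a) + eps) * (p(b) + eps)))
              for a, b in combinations(topic_words, 2)]
    return sum(scores) / len(scores) if scores else 0.0

def umass_coherence(topic_words, doc_sets):
    """UMass-style coherence: sum of log((D(w_i, w_j) + 1) / D(w_j)) over ordered pairs."""
    def D(*ws):
        return sum(all(w in d for w in ws) for d in doc_sets)
    total = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            total += math.log((D(topic_words[i], topic_words[j]) + 1.0)
                              / max(D(topic_words[j]), 1))
    return total
```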
Table 2. Illustrative (analogous) topics from AP, obtained with the classic LDA (top)
and with SemLDA (bottom).
LDA
stock, market, percent, rate, price, oil, rise, say, point, exchange

SemLDA
Synset ID | POS | Words | Gloss
8424951 | N | market | The customers for a particular product or service.
13851067 | N | index | A numerical scale used to compare variables with one another or with some reference number.
8072837 | N | market, securities industry | The securities markets in the aggregate.
13342135 | N | share | Any of the equal portions into which the capital stock of a corporation is divided and ownership of which is evidenced by a stock certificate.
79398 | N | trading | Buying or selling securities or commodities.
3843092 | N | oil, oil color, oil colour | Oil paint containing pigment that is used by an artist.
7167041 | N | price | A monetary reward for helping to catch a criminal.
14966667 | N | oil | A slippery or viscous liquid or liquefiable substance not miscible with water.
5814650 | N | issue | An important question that is in dispute and must be settled.
13333833 | N | stock | The capital raised by a corporation through the issue of shares entitling holders to an ownership interest (equity).
We recall that topics discovered by SemLDA are sets of synsets and not
of surface words. Therefore, to enable a fair comparison with the classic LDA,
before computing the PMI scores, we converted our topics to a plain word rep-
resentation. For this purpose, instead of full synsets, we used only their first
word. According to WordNet, this is the word most frequently used to denote
the synset concept, in the SemCor corpus. For instance, the SemLDA topic
in Table 2 becomes: market, index, market, share, trading, oil, price, oil, issue,
stock. On the one hand, this representation limits the extent of our results, which
are, in fact, synsets. On the other hand, by doing so, it might lead to duplicate
words in the same topic, though corresponding to different senses.
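A possible rendering of this conversion with NLTK's WordNet interface is sketched below; the toolkit and lookup function are assumptions, as the paper only states that the first word of each synset is used.

```python
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

def topic_to_words(synset_offsets, pos_tags):
    """Convert a SemLDA topic (synset offsets + POS tags) to its first-word form.

    The lookup via synset_from_pos_and_offset is an assumption about the toolkit;
    dummy synsets (negative ids) have no WordNet entry and are skipped here.
    """
    words = []
    for offset, pos in zip(synset_offsets, pos_tags):
        if offset < 0:                                 # dummy synset
            continue
        synset = wn.synset_from_pos_and_offset(pos, offset)
        words.append(synset.lemmas()[0].name())        # first (most frequent) word
    return words
```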
The results obtained with the two topic models, for both datasets, are pre-
sented in Table 3. Even if it was a close call, SemLDA outperformed the classic
LDA. On both metrics, SemLDA had better scores in the AP corpus, which was
the dataset originally used by Blei. For the 20 Newsgroups dataset, the topic
coherence measure was very close with both models, whereas the PMI score was
better for SemLDA. This confirms that we are heading towards a promising
approach that, by exploiting an external lexical-semantic knowledge base, may
improve the outcome of the classic LDA model. We should still stress that these
are just preliminary results. The following steps are explained with further detail
in the next section.
Table 3. PMI and topic coherence scores for the classic LDA and SemLDA on both datasets.

          20 Newsgroups                      AP
          PMI           Coherence            PMI           Coherence
LDA       1.16 ± 0.39   -32.89 ± 19.77       1.12 ± 0.31   -13.62 ± 9.51
SemLDA    1.22 ± 0.46   -35.4 ± 17.65        1.43 ± 0.26   -9.18 ± 7.51
5 Concluding Remarks
We have presented SemLDA, a topic model based on the classic LDA that incor-
porates external semantic knowledge to discover less redundant and more infor-
mative topics, represented as concepts, instead of surface words. We may say that
we have been successful so far. The classic algorithm was effectively changed to
produce topics based on WordNet synsets, which, after an automatic validation,
were shown to have coherence comparable to that of the original topics. Despite the promising
results, there is still much room for improvement.
In fact, to simplify our task, we relied on some assumptions that should be
dealt with in the near future and will, hopefully, lead to improvements. For instance,
the α parameter of LDA was simply set to a fixed value of 0.5. Its selection
should be made after testing different values and assessing their outcomes. We are
also considering adding a Dirichlet prior over the variable concerning topics, β,
so that it produces a smooth posterior and controls sparsity. Additional planned
tests include the generation of topics considering just a subset of the open class
words, for instance, just nouns, which might be more informative. Last but
not least, we recall that we obtained the word sense probabilities directly from
SemCor. While this corpus is frequently used in WSD tasks and should thus have
some representativeness, this approach does not consider the context where the
words occur. Instead of relying on SemCor, it is our goal to perform all-words
WSD on the input corpora, and in this way extract the probabilities of selecting
different synsets, given the word context. This should also account for words
that are not present in SemCor, and minimize the number of dummy synsets,
with the averaged probabilities assigned.
Moreover, we should perform an additional evaluation of the results, not
just through automatic measures, but possibly using people to assess the topic
coherence. For instance, we may adopt the intruder test, where judges have
to manually select the word not belonging to a topic (see [15]). It is also our
intention to evaluate SemLDA indirectly, by applying it in tasks that require
topic modelling, such as automatic summarization and classification.
We conclude by pointing out that, although the proposed model is language
independent, it relies on language-specific resources, especially the existence of a
wordnet, besides models for POS-tagging and lemmatization. One of our mid-
term goals is precisely to apply SemLDA to Portuguese documents. For such, we
will use available POS-taggers and lemmatizers for this language, as well as one
or more of the available wordnets (see [7] for a survey on Portuguese wordnets).
Since sense probabilities will soon be obtained from a WSD method, the
unavailability of a SemCor-like corpus for Portuguese is not an issue.
References
1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of Machine
Learning Research 3, 993–1022 (2003)
2. Boyd-Graber, J., Blei, D., Zhu, X.: A topic model for word sense disambigua-
tion. In: Proceedings of 2007 Joint Conference on Empirical Methods in Natural
Language Processing and Computational Natural Language Learning (EMNLP-
CoNLL), pp. 1024–1033. ACL Press, Prague, Czech Republic, June 2007
3. Brody, S., Lapata, M.: Bayesian word sense induction. In: Proceedings of 12th Con-
ference of the European Chapter of the Association for Computational Linguistics.
EACL 2009, pp. 103–111. ACL Press (2009)
4. Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by
combining semantic concepts with unsupervised statistical learning. In: Sheth, A.P.,
Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.)
ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Heidelberg (2008)
5. Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.:
Indexing by latent semantic analysis. JASIS 41(6), 391–407 (1990)
6. Flaherty, P., Giaever, G., Kumm, J., Jordan, M.I., Arkin, A.P.: A latent variable
model for chemogenomic profiling. Bioinformatics 21(15), 3286–3293 (2005)
7. Gonçalo Oliveira, H., de Paiva, V., Freitas, C., Rademaker, A., Real, L., Simões, A.:
As wordnets do português. In: Simões, A., Barreiro, A., Santos, D., Sousa-Silva, R.,
Tagnin, S.E.O. (eds.) Linguística, Informática e Tradução: Mundos que se Cruzam,
OSLa, vol. 7, no. 1, pp. 397–424. University of Oslo (2015)
8. Guo, W., Diab, M.: Semantic topic models: combining word distributional statistics
and dictionary definitions. In: EMNLP, pp. 552–561. ACL Press (2011)
9. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development in infor-
mation retrieval, pp. 50–57. ACM (1999)
10. Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to vari-
ational methods for graphical models. Machine learning 37(2), 183–233 (1999)
11. Miller, G.A.: Wordnet: a lexical database for english. Communications of the ACM
38(11), 39–41 (1995)
12. Miller, G.A., Chodorow, M., Landes, S., Leacock, C., Thomas, R.G.: Using a
semantic concordance for sense identification. In: Proceedings of ARPA Human
Language Technology Workshop. Plainsboro, NJ, USA (1994)
13. Mimno, D., Wallach, H.M., Talley, E., Leenders, M., McCallum, A.: Optimizing
semantic coherence in topic models. In: Proceedings of the Conference on Empirical
Methods in Natural Language Processing. EMNLP 2011, pp. 262–272. ACL Press
(2011)
14. Navigli, R.: Word sense disambiguation: A survey. ACM Computing Surveys 41(2),
1–69 (2009)
15. Newman, D., Bonilla, E.V., Buntine, W.: Improving topic coherence with reg-
ularized topic models. In: Advances in Neural Information Processing Systems,
pp. 496–504 (2011)
16. Newman, D., Lau, J.H., Grieser, K., Baldwin, T.: Automatic evaluation of topic
coherence. In: Human Language Technologies: The 2010 Annual Conference of the
North American Chapter of the Association for Computational Linguistics. HLT
2010, pp. 100–108. ACL Press (2010)
17. Rajagopal, D., Olsher, D., Cambria, E., Kwok, K.: Commonsense-based topic mod-
eling. In: Proceedings of the 2nd International Workshop on Issues of Sentiment
Discovery and Opinion Mining, p. 6. ACM (2013)
18. Tang, G., Xia, Y., Sun, J., Zhang, M., Zheng, T.F.: Topic models incorporat-
ing statistical word senses. In: Gelbukh, A. (ed.) CICLing 2014, Part I. LNCS,
vol. 8403, pp. 151–162. Springer, Heidelberg (2014)
19. Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical dirichlet processes.
Journal of the american statistical association 101(476) (2006)
20. Wang, C., Blei, D., Li, F.F.: Simultaneous image classification and annotation.
In: IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2009,
pp. 1903–1910. IEEE (2009)
RAPPORT — A Portuguese
Question-Answering System
1 Introduction
It is also possible that the information sources are not, or cannot be, structured
in such a way that they can be easily accessed by more conventional information
retrieval (IR) techniques [2].
Most of these issues are addressed by question answering (QA) systems [22],
which allow the user to interact with the system using natural language, and
which process documents whose contents are also expressed in natural language.
2 Question Answering
Question answering, as in other subfields of IR, may include techniques such as:
named entity recognition (NER) or semantic classification of entities, relations
between entities, and selection of semantically relevant sentences, phrases or
chunks [14], beyond the customary tokenization, lemmatization, and part-of-
speech (POS) tagging. QA can also address a restricted set of topics, in a closed
world domain, or forgo that restriction, operating in an open world domain.
Most approaches follow the framework shown in Fig. 1, where most of the
processing stages are performed at run-time (except for document indexing).
2.1 Esfinge
Esfinge [4] is a general-domain question answering system that tries to take
advantage of the great amount of information existing on the Web. Esfinge relies
on pattern identification and matching. For each question, a tentative answer is
created. For instance, a probable answer to a “What is X?” question will start
with “X is...”. This probable beginning of an answer is then used to search the corpus,
through a search engine, in order to find possible answers that match the same
pattern. In the following stages of the process, n-grams are scored and NER is
performed in order to improve the performance of the system.
2.2 Senso
The Senso Question Answering System [19] (previously PTUE [17]) uses a local
knowledge base, providing semantic information for text search terms expansion.
It is composed of two modules: the solver module, which uses two components to
collect plausible answers (the logic and the ad-hoc solvers), and the logic solver,
which starts by producing a first-order logic expression representing the question
and a list of logic facts representing the text information, and then looks for
answers within the facts list that unify and validate the question logic form.
There is also an ad-hoc solver for cases where the answer can be directly detected
in the text. After all modules are used, the results are merged for answer list
validation, to filter and adjust answer weights.
2.3 Priberam
Some of the most well known works on NLP and QA have been done at Priberam.
Priberam’s QA System for Portuguese [1] uses a conservative approach, where
the system starts by building contextual rules for performing morphological
disambiguation, named entity recognition, etc. Then it analyses the questions and
divides them into categories. These same categories are applied to sentences
in the source text. This categorization is done according to question patterns,
answer patterns and question answering patterns (where pattern match between
answer and question is performed).
2.4 NILC
2.5 RAPOSA
2.6 QA@L2 F
QA@L2 F [13], the question-answering system from L2 F, INESC-ID, is a system
that relies on three main tasks: information extraction, question interpretation
and answer finding.
This system uses a database to store information obtained by information
extraction, where each entry is expected to represent the relation between the
recognized entities. On a second step, the system processes the questions, cre-
ating SQL queries that represent the question and are run in the database. The
retrieved records from the database are then used to find the wanted questions,
through entity matching.
2.7 IdSay
Our system, RAPPORT, follows most of the typical framework for a QA system,
while being an open-domain system. It does, however, improve on some
techniques that differ from other approaches for Portuguese.
One of the most differentiating elements is the use of triples as the basic
unit of information regarding any topic, represented by a subject, a predicate
and an object, and then using those triples as the base for answering questions.
This approach shares also some similarities with open information extraction,
regarding the storage of information in triples [7].
RAPPORT depends on a combination of four modules, addressing infor-
mation extraction, storage, querying and retrieving. The basic structure of the
system comprehends the following modules:
the second chunk. If the second chunk starts with a determiner or a noun, the
predicate of the future triple is set to ser; if it starts with the preposition em
(in), the verb ficar is used; if it starts with the preposition de (of), the verb
pertencer is used; and so on.
As an example, the sentence “Mel Blanc, o homem que deu a sua voz a o
coelho mais famoso de o mundo, Bugs Bunny, era alérgico a cenouras.”5 yields
three distinct triples: “{Bugs Bunny} {ser} {o coelho mais famoso do mundo}”
and “{Mel Blanc} {ser} {o homem que deu a sua voz ao coelho mais famoso do
mundo}”, both using the proximity approach, and “{Mel Blanc} {ser} {alérgico
a cenouras}”, using the dependency approach.
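The predicate-selection rules above can be read as a simple lookup, sketched below in Python. The POS-tag labels, the fallback behaviour and the handling of further prepositions (“and so on”) are assumptions, not the system's full rule set.

```python
def choose_predicate(second_chunk_tokens):
    """Heuristic predicate selection for proximity-based triples (sketch).

    second_chunk_tokens : list of (surface_form, pos_tag) pairs; the tag set
    used here (DET/NOUN/PROPN) is an assumption about the tagger.
    """
    token, pos = second_chunk_tokens[0]
    if pos in {"DET", "NOUN", "PROPN"}:
        return "ser"                       # determiner or noun -> 'ser'
    if token.lower() == "em":
        return "ficar"                     # preposition 'em' (in) -> 'ficar'
    if token.lower() == "de":
        return "pertencer"                 # preposition 'de' (of) -> 'pertencer'
    return "ser"                           # assumed fallback; the full rule list is longer
```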
After triple extraction is performed, Lucene6 is used for storing the triples, the
sentences where the triples are found, and the documents that, in turn, contain
those sentences. For that purpose, three indices were created:
5 In English, “Mel Blanc, the man who lent his voice to the world’s most famous rabbit, Bugs Bunny, was allergic to carrots.”.
6 For Lucene, please refer to http://lucene.apache.org.
– the triple index stores the triples (subject, predicate and object), their id,
and the ids of the sentences and documents that contain them;
– the sentence index stores the sentences’ id (a sequential number representing
its order within the document), the tokenized text, the lemmatized text and
the documents’ id they belong to;
– the document index stores the data describing the document, as found in
CHAVE (number, id, date, category and author);
Although each index is virtually independent from the others, they can refer
one another by using the ids of the sentences and of the documents. That way,
it is easy to determine the relations between documents, sentences, and triples.
These indices (mainly the sentence and the triple indices) are then used in the
next steps of the present approach.
When a sentence contains more than one triple, the triple whose predicate
matches the verb in the initial query is selected. If that fails, the triple that,
as a whole, best matches the query is selected, according to the Lucene ranking
algorithm for text matches. After a triple is selected, if the best match against the
query is found in the subject, the object is returned as being the answer. If, on
the other hand, the best match is found against the object, it is the subject that
is returned. An algorithm describing both the data querying and this process is
found in Alg. 2.
Data: Question
Result: Answers
Create query using named entities, proper nouns, or nouns as mandatory, and the
remaining lemmas from the question as optional;
Run query against sentence index;
foreach sentence hit do
    Retrieve triples related to the sentence;
    foreach triple do
        if subject contains named entities from question then
            Add object to answers and retrieve sentence and document associated
            with the triple;
        else if object contains named entities from question then
            Add subject to answers and retrieve sentence and document associated
            with the triple;
        end
    end
end

Algorithm 2: Answer Retrieval Algorithm
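A compact Python rendering of the logic in Algorithm 2, over in-memory stand-ins for the Lucene indices, might look as follows; it ignores the ranking and tie-breaking details described above, and the data structures are assumptions made only for illustration.

```python
def retrieve_answers(question_entities, sentence_hits, triples_by_sentence):
    """Sketch of the answer retrieval step over in-memory structures.

    sentence_hits       : sentence ids returned by the sentence-index query
    triples_by_sentence : dict sentence_id -> list of (subject, predicate, object)
    """
    answers = []
    for sid in sentence_hits:
        for subj, pred, obj in triples_by_sentence.get(sid, []):
            if any(ent in subj for ent in question_entities):
                answers.append((obj, sid))   # entity in subject -> answer is the object
            elif any(ent in obj for ent in question_entities):
                answers.append((subj, sid))  # entity in object -> answer is the subject
    return answers
```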
Continuing with the example used, given the correct sentence, of the three
triples, the one that best matches the query is “{Mel Blanc} {ser} {alérgico a
cenouras}”. Removing from the triple the known terms from the question, what
remains must yield the answer: “[a] cenouras”. Besides that, as the named entity,
Mel Blanc, is found in the subject of the triple, the answer is most likely to be
found in the object.
4 Experimentation Results
For the experimental work, we have used the CHAVE corpus [20], a collection
of the 1994 and 1995 editions — a total of 1456 — of the newspapers “Público” and
“Folha de São Paulo”, with each edition usually comprising over one hundred
articles, identified by id, number, date, category, author, and the text of the
news article itself.
CHAVE was used in the Cross Language Evaluation Forum (CLEF)8 QA
campaigns as a benchmark — although in the last editions of the Multilingual
QA Track at CLEF a dump of the Portuguese Wikipedia was also used.
8 http://www.clef-initiative.eu/
Almost all of the questions used in each of the CLEF editions are known, as are
the results of each of the contestant systems. The questions used in
CLEF adhere to the following criteria [12]: they can be list questions, embedded
questions, yes/no questions (although none was found in the questions used for
Portuguese), who, what, where, when, why, and how questions, and definitions.
For reference, in Table 1 there is a summary of the best results (all answers
considered) for the Portuguese QA task on CLEF from 2004 to 2008 (abridged
from [6,8,11,12,23]), alongside the arithmetic mean for each system over the
editions where they were contenders. At the end of the table, the results of the
proposed system are also shown.
Although we are using the questions for Portuguese used in CLEF in those
years, a major restriction applies: only the questions posed over CHAVE with
known answers, as made available by Linguateca, were selected. As such, we
are using a grand total of 641 questions for testing our system.
Notice that the years of 2004, 2005 and 2006 have only 180 (out of 200) known
questions each, with their answers found in CHAVE, and the other two years
have the remaining, with 56 in 2007, and 45 in 2008. In those two last years, the
majority of the questions had the answers found on the Portuguese Wikipedia
instead of just CHAVE. As such, the results for 2007 and 2008 represent the
overall accuracy (grouping CHAVE and Wikipedia) of the different systems in
those years, and not just for questions over CHAVE — unfortunately, the values
for CHAVE and Wikipedia were not available separately. That is the reason for
omitting the average result of our system in Table 1, and signalling the results
for the years 2007 and 2008.
For verifying if the retrieved triples contain the expected answers, the triples
must contain (in the subject or in the object) the named entities found in the
questions (or, in alternative, proper nouns, and, if that fails, just the remaining
nouns), and also match in the subject or in the object the known answers from
CHAVE (alongside the same document id).
Using the set of questions that were known to have their answers found on
CHAVE, on a base line scenario, we were able to find the triples that answer
42.09% of the questions (274 of 641), grouping all the questions from the already
identified editions of CLEF, without a limit on the number of triples per question.
(If the maximum number of retrieved triples per question is reduced to 10, the
percentage of answered questions drops to 20.75%.)
Concerning the answers that were not found, we have determined that in a
few cases the failure is due to questions depending on information contained in
other questions or their answers. In other situations, the problem lies in the
use of synonyms, hyponyms, hypernyms and related issues: for instance, the
question focusing on a verb and the answer having a related noun, as in “Who
wrote Y?” for a question and “X is the author of Y.”. There are certainly also
many shortcomings in the creation of the triples, mainly in the chunks that are
close together (as opposed to the dependency chunks), which must be addressed
in order to improve and create more triples. Furthermore, there are questions
that refer to entities that fail to be identified as such by our system, and so no
triples were created for them when processing the sentences.
Finally, the next major goal is to use the Portuguese Wikipedia as a repos-
itory of information, either alongside CHAVE, to address the later editions of
CLEF, or by itself, as happened in Págico [15].
References
1. Amaral, C., Figueira, H., Martins, A., Mendes, A., Mendes, P., Pinto, C.: Prib-
eram’s question answering system for Portuguese. In: Peters, C., et al. (eds.) CLEF
2005. LNCS, vol. 4022, pp. 410–419. Springer, Heidelberg (2006)
2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press,
New York (1999)
3. Carvalho, G., de Matos, D.M., Rocio, V.: IdSay: question answering for Portuguese.
In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 345–352. Springer,
Heidelberg (2009)
4. Costa, L.F.: Esfinge – a question answering system in the web using the Web. In:
Proceedings of the Demonstration Session of the 11th Conference of the European
Chapter of the Association for Computational Linguistics, pp. 410–419. Association
for Computational Linguistics, Trento, Italy, April 2006
5. Filho, P.P.B., de Uzêda, V.R., Pardo, T.A.S., das Graças Volpe Nunes, M.: Using
a text summarization system for monolingual question answering. In: CLEF 2006
Working Notes (2006)
6. Forner, P., et al.: Overview of the CLEF 2008 multilingual question answering
track. In: Peters, C., et al. (eds.) CLEF 2008. LNCS, vol. 5706, pp. 262–295.
Springer, Heidelberg (2009)
7. Gamallo, P.: An overview of open information extraction. In: Pereira, M.J.V.,
Leal, J.P., Simões, A. (eds.) Proceedings of the 3rd Symposium on Languages,
Applications and Technologies (SLATE 2014). OpenAccess Series in Informatics,
pp. 13–16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publish-
ing, Germany (2014)
8. Giampiccolo, D., et al.: Overview of the CLEF 2007 multilingual question answer-
ing track. In: Peters, C., Jijkoun, V., Mandl, T., Müller, H., Oard, D.W., Peñas, A.,
Petras, V., Santos, D. (eds.) CLEF 2007. LNCS, vol. 5152, pp. 200–236. Springer,
Heidelberg (2008)
9. Gonçalo Oliveira, H.: Onto.PT: Towards the Automatic Construction of a Lexical
Ontology for Portuguese. Ph.D. thesis, Faculty of Sciences and Technology of the
University of Coimbra (2012)
10. Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Pearson
Education International Inc, Upper Saddle River (2008)
11. Magnini, B., et al.: Overview of the CLEF 2006 multilingual question answering
track. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W.,
de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730, pp. 223–256.
Springer, Heidelberg (2007)
12. Magnini, B., et al.: Overview of the CLEF 2004 multilingual question answering
track. In: Peters, C., Clough, P., Gonzalo, J., Jones, G.J.F., Kluck, M., Magnini,
B. (eds.) CLEF 2004. LNCS, vol. 3491, pp. 371–391. Springer, Heidelberg (2005)
13. Mendes, A., Coheur, L., Mamede, N.J., Ribeiro, R., Batista, F., de Matos,
D.M.: QA@L2 F, first steps at QA@CLEF. In: Peters, C., Jijkoun, V., Mandl, T.,
Müller, H., Oard, D.W., Peñas, A., Petras, V., Santos, D. (eds.) CLEF 2007. LNCS,
vol. 5152, pp. 356–363. Springer, Heidelberg (2008)
Automatic Distinction of Fernando Pessoas' Heteronyms
1 Introduction
With the dawn of Text Mining (TM), a massive amount of information became
automatically retrievable. TM is intended to find and quantify even subtle
correlations over large amounts of data. This way, a wide variety of themes
(economics, sports, etc.) with different levels of structure can be analyzed
with little effort. Many TM solutions have been employed in security and web
text analysis (blogs, news, etc.). TM has also been used in sentiment analysis,
for instance to evaluate movie reviews and estimate acceptability [1]. Forensic
Linguistics enhances TM by considering higher-level features of text. Linguistic
techniques are usually applied to legal and criminal contexts for problems such
as document authorship, and the analysis and measurement of content and intent.
The purpose of this study is to generate a small representation of a large
corpus of poems, able to discern between authors or aliases. For this initial
study we chose to classify the collection of poems by two of Fernando Pessoa's
heteronyms. Ricardo Reis and Álvaro de Campos were chosen due to their
contrasting themes and to initial concerns about the model's accuracy for this
kind of dataset.
To the best of our knowledge, there are no pattern recognition studies on
alias distinction in poetic texts; therefore, no direct comparison of this work can
be made. On the other hand, there is research on generic alias identification [2];
however, the objective there is to find which aliases correspond to the same
author, not to distinguish between personas.
The author whose works we analyze is Fernando Pessoa [3], who wrote under
several heteronyms or aliases. Each one had his own life story and personal
taste in writing style and theme.
Ricardo Reis is an identity with classical roots, considering his poems'
structure, theatricality and the entities mentioned (ancient Greek and Roman
references). He is fixated on death and avoids sorrow by trying to dissociate
himself from anything in life. He seeks resignation and intellectual happiness.
Álvaro de Campos has a different personality, even presenting an internal
evolution. Initially, he is shown to be a thrill seeker and an enthusiast of machines
who wishes to live the future. In the end, he feels defeated by time and devoid
of the will to experience life. Consequently, he uses a considerable amount of
interjections, in a weakly formatted writing style, with expressive punctuation.
The remainder of this document is structured as follows: in Section 2 the
dataset is presented and described; Section 3 shows the methodology employed;
Section 4 details the experimental approach, along with the discussion of results;
and in Section 5 the overall findings are presented, along with possible future work.
2 Dataset
The dataset used in this work consists of the complete known poetic works1 of
Ricardo Reis (class RR) and Álvaro de Campos (class AC). Table 1 presents
some statistics concerning the dataset.
                            # of Words                        # of Verses
Class  # of entries  %      Avg.    Std.    Min   Max         Avg.  Std.  Min  Max
RR     129           54%    77.9    65.1    19    570         14    12    4    106
AC     108           46%    360.9   904.2   29    7857        46    103   5    909
3 Methodology
In this section, the steps taken and the experimental approach followed are shown.
First, we tested the classifier with the tokenized documents and then progressively
introduced other pre-processing models, comparing their performance.
The SVM model was validated with 70% of the dataset using 5-fold cross-validation,
while the remaining 30% enabled us to evaluate the generalization performance of
the generated model, i.e., the voting result of the 5-fold models.
1 Available at: http://www.dominiopublico.gov.br
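The paper does not name a toolkit; a minimal sketch of the described protocol with scikit-learn is given below. The tokenizer, the choice of LinearSVC and the refit-on-all-development-data step (instead of voting over the five fold models) are simplifying assumptions, not the authors' setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def evaluate(poems, labels, seed=0):
    """70/30 split with 5-fold cross-validation on the 70%, as described above."""
    X_dev, X_test, y_dev, y_test = train_test_split(
        poems, labels, test_size=0.3, stratify=labels, random_state=seed)
    model = make_pipeline(TfidfVectorizer(), LinearSVC())   # tf-idf + SVM (assumed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_accuracy = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="accuracy")
    model.fit(X_dev, y_dev)                                 # simplification: no fold voting
    return cv_accuracy.mean(), model.score(X_test, y_test)
```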
4 Experimental Results
4.1 Estimation Using Cross-Validation
       Binary TO                TO                       TF                       TF-IDF
       Acc    F1RR   F1AC      Acc    F1RR   F1AC       Acc    F1RR   F1AC       Acc    F1RR   F1AC
       80.74  84.91  73.33     68.71  77.39  49.01      91.03  91.89  89.80      90.44  91.58  88.73
In this section we considered, from the previous experiments, the pipelines with
the two best performances and the BTO (baseline) pipeline, now using the remaining
30% of the dataset. The results are shown in Table 3.
As expected, BTO maintained the low accuracy and obtained a lower F1AC.
The two best models kept their high accuracy; however, tf-idf managed to overcome
TF's advantage from the validation phase, even if only by 3 instances. This suggests
that, even though tf-idf presented lower validation results, it was somewhat
underfitting. Either way, these comparative results were expected, since idf takes
term rarity into account.
Due to the large difference in length statistics between the two classes, we conducted
further analysis. The documents were segmented into multiple instances such
Table 3. Test-set results (remaining 30% of the dataset).

       Binary TO                TF                       TF-IDF
       Acc    F1RR   F1AC      Acc    F1RR   F1AC       Acc    F1RR   F1AC
       66.20  76.47  40.00     92.96  93.83  91.80      97.18  97.44  96.88
Table 4. Updated class distribution after document segmentation.

                            # of Words                    # of Verses
Class  # of entries  %      Avg.    Std.    Min   Max     Avg.  Std.  Min  Max
RR     169           51%    63.5    27.0    3     161     11    4     1    17
AC     165           49%    240.8   150.0   29    521     30    145   5    54
that each portion had at most a number of verses equal to the previous mean
for that class (plus a tolerance). Table 4 presents the updated class distribution.
For this experiment, the complete word-processing pipeline and the tf-idf scoring
criterion were used (the best results of the testing phase). For the cross-validation
and testing steps, respectively, the accuracy was 96.58% and 96.04%, the F1RR was
96.64% and 96.15%, and the F1AC was 96.49% and 95.92%.
The results show that imposing the upper bound on the number of verses
per poem increased the accuracy of the model in the validation phase by around
5%. Beyond accuracy, the model appears to have genuinely improved, since the
misclassification rates became more balanced.
It is safe to assume that longer poems might be more difficult to sort correctly
into classes, since they encompass more terms that can be highly influential on
the tf-idf metric (through emphatic repetition, etc.) and may not contribute
positively to the accurate learning of attribute weights, thus hurting classification.
On the other hand, the test results, in a way, contradict the analysis from
cross-validation: the segmented-data model performs slightly worse than the earlier
tf-idf test experiment. As such, we cannot provide an acceptable hypothesis as to
why this occurs, rendering this analysis inconclusive.
References
1. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using
machine learning techniques. In: Proceedings of the ACL 2002 Conference on
Empirical Methods in Natural Language Processing, EMNLP 2002, vol. 10,
pp. 79–86. Association for Computational Linguistics, Stroudsburg (2002)
2. Nirkhi, S., Dharaskar, R.V.: Comparative study of authorship identification tech-
niques for cyber forensics analysis (2014). CoRR abs/1401.6118
3. de Castro, M.G. (ed.): Fernando Pessoa’s Modernity Without Frontiers: Influences,
Dialogues, Responses. Tamesis Books, Woodbridge (2013)
4. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Com-
put. Surv. 34(1), 1–47 (2002)
5. Quaresma, P., Pinho, A.: Análise de frequências da língua portuguesa. In: Livro de
Actas da Conferência Ibero-Americana InterTIC, pp. 267–272. IASK, Porto (2007)
6. Porter, M.F.: Snowball: A language for stemming algorithms, October 2001.
http://snowball.tartarus.org/texts/introduction.html
7. Porter, M.F.: Snowball: Portuguese stemming algorithm. http://snowball.tartarus.
org/algorithms/portuguese/stemmer.html
8. Salton, G., Buckley, C.: Term-weighting Approaches in Automatic Text Retrieval.
Inf. Process. Manage. 24(5), 513–523 (1988)
9. Robertson, S.: Understanding inverse document frequency: on theoretical argu-
ments for IDF. Journal of Documentation 60(5), 503–520 (2004)
10. Vapnik, V.N.: An overview of statistical learning theory. Trans. Neur. Netw. 10(5),
988–999 (1999)
11. Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization.
In: Proceedings of the 19th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, SIGIR 1996, pp. 21–29. ACM,
New York (1996)
Social Impact - Identifying Quotes
of Literary Works in Social Networks
1 Introduction
Social networks emerged in the last decade and changed the way we communicate,
becoming essential tools in human interaction. This happened possibly because,
at the distance of a click, lies the possibility to send and share content. As Kwak
et al. [4] refer, this wide use of social networks creates great interest for
investigation in many areas, such as information extraction and analysis.
Most of the information shared in social networks such as Twitter and Facebook
is in text format, and an interesting amount of such information (messages)
contains quotes from literary works (e.g., “Tudo vale a pena, quando a alma não é
pequena - Fernando Pessoa”). Nevertheless, in a non-negligible number of cases
there is no reference to the text, book or literary work the quote refers to.
Since quotes may contain inconsistencies (e.g., the quote differs from the original
text), the identification of the original text or author can be very challenging.
These inconsistencies have a higher presence on social networks
2 Related Work
The Social Impact platform is generally supported by two different technological
blocks: SocialBus, a social-network crawling and analysis platform, and Lucene,
a highly scalable infrastructure for indexing and querying documents.
SocialBus: Social networks such as Twitter and Facebook provide APIs that
allow access to public messages, within certain limits, giving the possibility of
analysing such content for a variety of purposes, including quote detection. We
propose to use the SocialBus platform [2,7], a framework that collects and analyses
data from Twitter and Facebook for a pre-defined set of users representative of
the Portuguese community.
Lucene: is open-source software2 for text searching and indexing through document
indexation, coded in the Java programming language and developed by the Apache
Software Foundation. According to Gospodnetic et al. [3], this framework works
through the indexation of documents, with information parsers and queries to
consult and retrieve the indexed information. The result is a ranked list of
documents ordered by relevance [1,5,6].
1 http://reaction.fe.up.pt/socialbus/
2 http://lucene.apache.org/core/
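As a rough stand-in for the Lucene-based matching used by the platform, the sketch below indexes a set of verses with tf-idf vectors and ranks them against a social-network message by cosine similarity. It only illustrates the ranked-retrieval idea; the platform's actual Lucene index schema and scoring are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidate_sources(message, verses, top_k=5):
    """Rank indexed literary verses against a social-network message (sketch).

    tf-idf vectors plus cosine similarity play the role of Lucene's ranked
    retrieval; 'verses' is a list of candidate source strings.
    """
    vectorizer = TfidfVectorizer(lowercase=True)
    verse_matrix = vectorizer.fit_transform(verses)            # "index" the verses
    scores = cosine_similarity(vectorizer.transform([message]), verse_matrix)[0]
    ranked = sorted(zip(scores, verses), reverse=True)
    return ranked[:top_k]                                      # (score, verse) pairs
```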
3.1 Architecture
3 Data obtained from “Arquivo Pessoa”, available at http://arquivopessoa.net
4 https://dev.twitter.com/rest/public
5 https://developers.facebook.com
4 Case Studies
O Mundo em Pessoa6 is a web-based project that aims to depict the presence
of Fernando Pessoa's poems on social networks, based on quotes of his literary
work. The project is built on the Social Impact platform and covers Fernando
Pessoa's work and that of all his heteronyms. The list of terms used to narrow
the message crawling (refer to Section 3.1) contains the names of all Fernando
Pessoa heteronyms. The project is supported by a Web Application that displays
the identified quotes from Fernando Pessoa organized by timeframes, going from one
6 http://fernandopessoa.labs.sapo.pt/
day to one month. For each quote, the user can explore the number of social
network users that published that particular quote and access the original
message, among other features.
Lusica7's main purpose was to study lusophone music and its presence on social
networks, supported by the Social Impact platform. There are two important
aspects that differentiate “Lusica” from “Mundo em Pessoa”: (i) the domain is
music instead of literature, and (ii) a large effort was put into the visualization
of the information obtained from quote detection, through an interactive graph
available online. “Lusica”'s external knowledge (refer to Figure 1) is based on the
titles of songs and albums by lusophone artists. Such information was obtained
from the LastFM API8 (the list of lusophone artists) and from the MusicBrainz
service9 (the album and song titles for each lusophone artist).
1.0 and 2.0). Concerning Facebook, the precision for low scores was 100% and the
recall 21%, while for high scores the precision was 96% and the recall was 100%.
The average precision for “Mundo em Pessoa” was 98%, while the average recall
was 59%. With respect to “Lusica”, the same principle was followed, by selecting
a sample of 200 messages and dividing them into two groups (Twitter messages
with low and high Lucene scores). The results showed an average precision of
100%, while the recall was 53%.
Execution Time: the performance of the Social Impact platform was also evaluated,
measuring the execution time of processing a single message on a desktop with an
Intel(R) Xeon(R) CPU E5405 @ 2.00GHz processor and 3 GB of RAM. For O Mundo
em Pessoa, the results showed an average execution time of 0.01 seconds (±0.002).
Regarding Lusica, the result obtained was an average execution time of 0.02
seconds (±0.004).
6 Conclusions
Acknowledgments. This work was partially supported by SAPO Labs and FCT
through the project PEst-OE/EEI/UI0408/2013 (LaSIGE), and by the European Com-
mission through the BiobankCloud project under the Seventh Framework Programme
(grant #317871). The authors would like to thank Bruno Tavares, Sara Ribas and
Ana Gomes from SAPO Labs, João Martins, Tiago Aparcio, Farah Mussa, Gabriel
Marques and Rafael Oliveira from the University of Lisbon, and Arian Pasquali from
the University of Porto for all their support, insights and feedback.
References
1. Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern information retrieval, vol. 463.
ACM press New York (1999)
2. Boanjak, M., Oliveira, E., Martins, J., Mendes Rodrigues, E., Sarmento, L.:
Twitterecho: a distributed focused crawler to support open research with twitter
data. In: Proceedings of the 21st WWW, pp. 1233–1240. ACM (2012)
Fractal Beauty in Text
1 Introduction
Since the advent of the World Wide Web, an increasing number of users insert
text from a multitude of sources, namely from newly created web pages, news
in electronic newspapers, blogs, product reviews, and social networks. This has
opened new opportunities for linguistic studies and created the need for new
applications to intelligently deal with all this text and make sense of it.
The question of automatically and effectively assessing the quality of a text
remains unanswered. In general, an experienced human reader can judge the
complexity and quality of a given text, a task not so easily attained with com-
putational means. The human reader can not only determine whether phrases are
grammatically correct, but also assess the richness of the lexicon and the structural
and rhetorical combination of words, sentences, and ideas. There are aesthetic
principles in the way of writing, yielding different types of texts. The spectrum
ranges from almost telegraphic accretions posted on Twitter up to Nobel prize
winning novels. The goal of this study is to address the text quality referred to
herein by looking for the mathematical principles underlying it.
This paper discusses a series of experiments conducted to find out if the text
produced by humans exhibits a statistical property known as self-similarity. Self-
similarity is a property of fractals, and it refers to the possibility of parts of a
2 Related Work
To the best of our knowledge, no other work has addressed the determination of
self-similarity in text and how it can be used for the assessment of general
aesthetic principles. However, there are a number of related works with similar
goals. For example, McCarthy and Jarvis [9] compared different methods to
determine lexical diversity (LD) in text.
The LD index measures the vocabulary diversity of a given text segment
and is usually calculated by dividing the number of tokens by the number of
types (the number of unique tokens in a segment of text), also known as the
token-type proportion. As this proportion depends on the length of the text
segment considered, it is not used directly to compute the LD index. Instead,
a number of strategies have been proposed [6,9], some of them based on the
division of a text into fixed segments of n tokens. The LD index has been used
to assess the writing skills of a subject in a variety of studies, namely in the
measurement of children's language skills, English second-language acquisition,
Alzheimer's onset, and even speaker socioeconomic status [7,8].
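For concreteness, a per-segment LD computation following the token-by-type phrasing above might look as follows; note that other LD variants invert the ratio or average it over segments, so the exact formula here is an assumption rather than the one used in the cited studies.

```python
def lexical_diversity(segment_tokens):
    """Token/type proportion over one fixed-length segment of n tokens.

    Computed per segment precisely because the raw proportion depends on
    segment length, as discussed above.
    """
    types = set(t.lower() for t in segment_tokens)   # unique tokens in the segment
    return len(segment_tokens) / len(types)
```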
Forensic linguistic analysis [10] is a recent field with diverse applications,
such as plagiarism detection [1], authorship identification [12], cybercrime and
terrorism tracking [10], among others. The work in this field is based on the use
of a number of text characteristics at different levels of analysis, such as
morphological, lexical, syntactical, and rhetorical [1,12]. These textual
characteristics necessarily exhibit different self-similar properties, which warrant
investigation. The
findings presented in this work are a first step toward that objective, specifically
at the lexical level.
A self-similar process $\{X(t)\}_{t\in\mathbb{N}}$ can be defined as one that fulfills the
condition $X(t) \overset{d}{=} a^{-H}X(at)$, where $\overset{d}{=}$ denotes equality of all
finite-dimensional distributions, $a \in \mathbb{N}$, and $0 < H < 1$ is the Hurst
parameter, also referred to as the self-similarity degree or the Hurst exponent.
is the fractional Brownian motion (fBm), which has a Gaussian distribution.
Its first order differences process, denoted as fractional Gaussian noise (fGn) is
often useful too, since many natural or artificial processes occur in this form.
Thus, when performing self-similarity analysis, it is typical to assess whether the
empirical values are consistent with sampling a Gaussian variable. In this work,
the Kolmogorov-Smirnov goodness-of-fit test [3] was used for that purpose.
There are several methods for estimating the Hurst parameter from empirical
data, most of them based on repeatedly calculating a given statistic (e.g., variance
or maximum value) for the original process and for a finite number of aggregated
processes. The Hurst parameter estimators used in this study were the well-known
Variance Time (VT) and Rescaled Range Statistics (RS) estimators. The statisti-
cal tools mentioned herein are all implemented in the open-source TestH tool [4].
It accepts files containing raw values separated by space or newline, normalizes
them, and outputs the estimated values of the Hurst parameter and the p-value
concerning the Kolmogorov-Smirnov goodness-of-fit test.
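For reference, a textbook-style Rescaled Range (R/S) estimator is sketched below; this is a generic illustration of the method, not TestH's implementation, and block sizes and the fitting strategy are assumptions.

```python
import numpy as np

def hurst_rs(series, min_block=8):
    """Rescaled Range (R/S) estimate of the Hurst parameter (generic sketch).

    Compute R/S over non-overlapping blocks of increasing size and fit the
    slope of log(R/S) against log(block size).
    """
    x = np.asarray(series, dtype=float)
    sizes, rs_values = [], []
    size = min_block
    while size <= len(x) // 2:
        rs_per_block = []
        for start in range(0, len(x) - size + 1, size):
            block = x[start:start + size]
            dev = np.cumsum(block - block.mean())   # cumulative deviations from the mean
            r = dev.max() - dev.min()               # range of the deviations
            s = block.std()                         # block standard deviation
            if s > 0:
                rs_per_block.append(r / s)
        if rs_per_block:
            sizes.append(size)
            rs_values.append(np.mean(rs_per_block))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_values), 1)
    return slope                                    # estimated Hurst parameter H
```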
When the Hurst parameter is 0.5, the process is memoryless and each occur-
rence is completely independent of any past or future occurrences. For values
of the Hurst parameter ranging from 0 to 0.5, the process is anti-persistent or
short-range dependent, while for values between 0.5 and 1, the process is said to
be persistent or long-range dependent. There are many examples of long-range
dependent processes in natural and artificial processes (e.g., the water level in
rivers [5]). Prior to starting this work, our expectation was that text would be
self-similar with a Hurst parameter larger than 0.5, and that the degree of
self-similarity would perhaps be related to the quality of the text.
Our base unit to construct processes of text attributes is a block of 100
words, meaning that all the sequences analyzed in the scope of this work refer to
attributes per 100 tokens. If a self-similar structure is embedded in the data, then
the statistical behavior of these attributes is the same (apart from scaling) for
each block of tokens, or for any number of them. Additionally, it is an indication
that human writing is done in bursts, which means that blocks with higher counts
in some attribute are probably followed by other blocks with higher counts also,
and vice-versa.
The Blogs Corpus: Also known as the Blog Authorship Corpus [11], this is a
massive collection of 681K blog posts, gathered from 19320 users of the
blogger.com website. These blogs cover a wide range of subjects like Advertising,
Biotechnology, Religion, Science, among others. It contains a total of 38 differ-
ent subjects, with 300 million words (844 MB). There are three user age clusters:
13-17 years (8240 users), 23-27 years (8066 users) and 33-48 years (2944 users);
The News Corpus: This corpus is formed by a huge set of news stories, auto-
matically collected from the web. An amount of 4.2 MB of text was randomly
selected from the set. The news stories were collected for several main subjects,
namely Politics, Economics, Finance, Science, among others;
The Literature Corpus: (i) the complete works of Shakespeare and (ii) the
set of books (66) of the Bible1 were selected for this corpus type;
The Random Corpus: A corpus of similar size was randomly generated in
order to validate the proposed self-similarity measure. Each word was randomly
taken from the English vocabulary, according to a uniform2 distribution.
The aim of this study is to know whether certain text characteristics exhibit
self-similarity properties, by resorting to the estimation of the Hurst parameter.
The test described herein was designed to determine self-similarity in time series,
performing a large number of measurements of attributes over time. Here, the
origin of the several time series is a corpus consisting of a considerable amount
of text. Thus, the experiment had to be designed to meet two principles: (i) the
set of textual attributes to be measured and (ii) the reading structure of the text,
in order to achieve a significant number of measurements.
Definition of Attributes: For the first principle we have considered six lexical
features that are measured for a given block (amount) of text: the number of
(A0 ) non-words; (A1 ) small words (|w| < 3); (A2 ) medium words (3 ≤ |w| < 7);
(A3 ) long words (|w| > 6); (A4 ) sentences; and (A5 ) the lexical diversity.
Reading Structure: To satisfy the second principle we decided to divide the text
into sequential blocks with an equal number of words. The block size was chosen
to be close to an average paragraph length, assuming five sentences per paragraph,
each with an average length of 20 words [2]. In previous studies of this kind,
researchers usually take text chunks of this length [6,9].
Below is one such block of 100 tokens:
The guy’s license plate was a little obvious" 68CONVT". I mean, you can see that
it’s a convertible please, because the top was down. Anyway, I stared straight
ahead. but could hear that low throaty rumble next to me. Suddenly, I felt tears
prickling in my eyes. It dawned on me that I was suffering from the maliblues.
What will happen at Hot August Nights? Those muscle cars cruise nightly and rev
and rev and rev. I’m thinking I should get a medical bracelet with maliblues
----------------------------------------------------------------------------------
(~W, |W|<3, 3=<|W|<6, |W|>=6, #Sentences, Lex. Div.) ---> (18, 16, 51, 15, 8, 67)
1 We have chosen the English translation version from King James.
2 In the future a Zipfian law will be considered.
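A possible implementation of the per-block attribute extraction is sketched below. The tokenizer, the 'non-word' test and the exact lexical-diversity formula are assumptions; the count of distinct word forms is one plausible reading of the example tuple above.

```python
import re

def block_attributes(tokens):
    """Compute the six lexical attributes (A0..A5) for one 100-token block (sketch)."""
    words = [t for t in tokens if re.search(r"[A-Za-z]", t)]
    non_words = len(tokens) - len(words)                       # A0: non-words (assumed test)
    small = sum(1 for w in words if len(w) < 3)                # A1: |w| < 3
    medium = sum(1 for w in words if 3 <= len(w) < 7)          # A2: 3 <= |w| < 7
    long_ = sum(1 for w in words if len(w) >= 7)               # A3: |w| > 6
    sentences = sum(1 for t in tokens if t in {".", "!", "?"}) # A4: rough sentence count
    lex_div = len(set(w.lower() for w in words))               # A5: assumed LD reading
    return non_words, small, medium, long_, sentences, lex_div
```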
5 Results
Each corpus (Section 4) was processed to produce the necessary sequences of
numbers representing the time series to be analyzed. Given the attributes and
block size, these sequences consisted of integer numbers larger than 0 and smaller
than 100. After being input to the TestH tool, they were normalized and the
VT and RS estimators were applied to the resulting process. The p-value for
the $\sqrt{n}D$ statistic was also calculated, via the application of the Kolmogorov-
Smirnov goodness-of-fit statistical test available in the tool. Note that the values
Table 1. Estimated Hurst parameter (VT and RS estimators) and Kolmogorov-Smirnov
p-values (KS) for each attribute and corpus. ∗VOR: estimated value out of range.

                A0: num. non-words           A1: num. words < 3 chars     A2: num. words 3-6 chars
Corpora         VT      RS      KS           VT      RS      KS           VT      RS      KS
Blogs 13-17     0.73778 0.86064 0.20301      0.70702 0.83312 0.27614      0.72485 0.83613 0.27614
Blogs 23-27     0.76648 0.86895 0.10449      0.81549 0.83606 0.34726      0.75532 0.81676 0.34726
Blogs 33-48     0.84432 0.85356 0.09445      0.83969 0.86252 0.11157      0.86863 0.84596 0.07483
News            0.63524 0.78496 0.01202      0.46540 0.81174 0.00491      0.73615 0.85698 0.00351
Bible           0.53201 0.83720 0.19772      0.65733 0.79622 0.19772      0.88749 0.81389 0.00688
Shakespeare     0.64795 0.84271 0.01136      0.63140 0.75893 0.03116      0.57925 0.78148 0.02035
Random          0.28842 0.46262 0.00432      0.68364 0.51305 0.97693      0.57995 0.46603 0.03807

                A3: num. words > 6 chars     A4: num. sentences           A5: Lexical Diversity (LD)
Corpora         VT      RS      KS           VT      RS      KS           VT      RS      KS
Blogs 13-17     0.71949 0.86469 0.69745      0.76482 ∗VOR    0.43729      0.76005 0.82502 0.32957
Blogs 23-27     0.81764 0.88098 0.65178      0.82454 ∗VOR    0.40002      0.76380 0.81367 0.13147
Blogs 33-48     0.88160 0.92609 0.28191      0.81082 0.88063 0.66304      0.81663 0.82250 0.18178
News            0.75006 0.87849 0.03398      0.66815 ∗VOR    0.99286      0.64356 0.77951 0.25809
Bible           0.80095 0.84375 0.07482      0.81976 0.89219 0.59353      0.87111 0.86753 0.00039
Shakespeare     0.69241 0.75295 0.14385      0.66554 0.90009 0.29770      0.69826 0.81306 0.00000
Random          0.60186 0.40670 0.03351      0.26403 0.47593 0.67707      0.29526 0.46444 0.00495
Note that the values in the KS column of Table 1 suggest that the sequences come
from Gaussian processes: only seven cases reject the null hypothesis when the
significance level is set to 0.01 (KS p-values below 0.01). This is interesting,
and we will explore whether it may be a consequence of the central limit theorem,
by verifying whether the attributes result from the sum of several independent and
identically distributed variables.
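The estimates above were produced with the TestH tool [4]. As a rough,
self-contained stand-in rather than the tool itself, the sketch below estimates the
Hurst parameter with a plain rescaled-range (R/S) regression and computes a
Kolmogorov-Smirnov p-value for the normalized series; the doubling window sizes,
the minimum window of 8, and the standard-normal reference distribution are
assumptions made for this example.

# Rough stand-in for the VT/RS estimators of the TestH tool [4]: a plain
# rescaled-range (R/S) estimate of the Hurst parameter plus a
# Kolmogorov-Smirnov normality check. Window sizes and the use of SciPy
# are assumptions, not the paper's exact procedure.
import numpy as np
from scipy import stats

def rescaled_range_hurst(x, min_window=8):
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes, rs_means = [], []
    w = min_window
    while w <= n // 2:
        rs_vals = []
        for start in range(0, n - w + 1, w):
            block = x[start:start + w]
            dev = np.cumsum(block - block.mean())   # mean-adjusted cumulative deviations
            r = dev.max() - dev.min()               # range
            s = block.std(ddof=1)                   # standard deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            sizes.append(w)
            rs_means.append(np.mean(rs_vals))
        w *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return slope  # slope of log(R/S) vs log(window) estimates H

def ks_normality_pvalue(x):
    # KS goodness-of-fit of the normalized series against N(0, 1)
    z = (np.asarray(x, float) - np.mean(x)) / np.std(x, ddof=1)
    return stats.kstest(z, "norm").pvalue

# series = [a[1] for a in text_to_series(corpus_text)]   # e.g. attribute A1 from the earlier sketch
# print(rescaled_range_hurst(series), ks_normality_pvalue(series))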
For the three age sub-sets of the blogs corpus, we can see that the VT and RS
values improve consistently with age in almost all attributes, and the difference
is particularly marked in the A3 and A5 attributes for the VT estimator. This shows
that, even within the same genre, more mature and possibly more experienced authors
produce text with a higher degree of self-similarity. The estimated values of the
Hurst parameter are consistently larger for RS.
In the literary corpora, there is a significant difference between the Shakespeare
and Bible texts. In the latter, the self-similarity values are much higher for most
of the attributes, in particular for lexical diversity (A5). Moreover, the
Shakespeare text reveals low self-similarity in all attributes, with the exception
of the RS estimator for A4. The news genre did not reveal high self-similarity for
most attributes. There are, however, higher parameter values for attributes A1, A2,
and A3, which suggests strong self-similarity in the lexicon used. The low values
obtained for the randomly generated corpus (Random) put the values obtained for the
other corpora into perspective: they reveal a phenomenon of self-similarity in the
human writing process.
References
1. Alzahrani, S., Naomie, S., Ajith, A.: Understanding plagiarism linguistic patterns,
textual features, and detection methods. IEEE Transactions on Systems, Man, and
Cybernetics, Part C: Applications and Reviews 42(2), 133–149 (2012)
2. Cordeiro, J., Dias, G., Cleuziou, G.: Biology Based Alignments of Paraphrases for
Sentence Compression. Workshop on Textual Entailment (ACL-PASCAL) (2007)
3. Corder, G.W., Foreman, D.I.: Nonparametric Statistics for Non-Statisticians:
A Step-by-Step Approach. Wiley, New Jersey (2009)
4. Fernandes, D.A.B., Neto, M., Soares, L.F.B., Freire, M.M., Inácio, P.R.M.: A tool
for estimating the Hurst parameter and for generating self-similar sequences. In:
Proceedings of the 46th Summer Computer Simulation Conference 2014 (SCSC
2014), Monterey, CA, USA (2014)
5. Hurst, H.: Long-Term Storage Capacity of Reservoirs. Transactions of the American
Society of Civil Engineers 116, 770–799 (1951)
6. Koizumi, R.: Relationships Between Text Length and Lexical Diversity Measures:
Can We Use Short Texts of Less than 100 Tokens? Vocabulary Learning and
Instruction 1(1), 60–69 (2012)
7. Malvern, D., Richards, B., Chipere, N., Durán, P.: Lexical diversity and language
development: Quantification and Assessment. Palgrave Macmillan, Houndmills (2004)
8. McCarthy, P., Jarvis, S.: A theoretical and empirical evaluation of vocd. Language
Testing 24, 459–488 (2007)
9. McCarthy, P., Jarvis, S.: MTLD, vocd-D, and HD-D: A validation study of sophis-
ticated approaches to lexical diversity assessment. Behavior Research Methods
42(2), 381–392 (2010)
10. Olsson, J., Luchjenbroers, J.: Forensic Linguistics. A&C Black (2013)
11. Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of Age and Gender
on Blogging. AAAI Spring Symposium: Computational Approaches to Analyzing
Weblogs 6, 199–205 (2006)
12. Stamatatos, E.: A survey of modern authorship attribution methods. Journal of
the American Society for Information Science and Tech. 60(3), 538–556 (2009)
How Does Irony Affect Sentiment Analysis Tools?
Sentiment analysis and opinion mining have become rapidly growing topics of interest
over the last few years, due to the large number of texts produced through Web 2.0.
A common task in opinion mining is to classify an opinionated document as expressing
a positive or a negative opinion. A comprehensive review of both sentiment analysis
and opinion mining as a research field of Natural Language Processing (NLP) is
presented by Pang and Lee [1]. The demand for applications and tools to accomplish
sentiment classification tasks has attracted researchers' attention to this area.
Hence, sentiment analysis applications have spread to many domains: from consumer
products, healthcare, and financial services to political elections and social
events. Sentiment classification is commonly divided into two basic approaches:
machine learning and lexicon-based. The machine-learning approach uses a set of
features, usually some function of the vocabulary frequency, which are learned from
annotated corpora or labelled examples. The lexicon-based approach uses a lexicon to
provide the polarity, or semantic orientation, of each word or phrase in the text
(a toy illustration is sketched below). Despite the considerable amount of research,
the classification of polarity remains a challenging task.
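As a toy illustration of the lexicon-based approach just described (not a method
used in this paper), the snippet below scores a text by summing word polarities
from a small hand-made lexicon; the word list is invented for the example, whereas
real systems rely on resources such as SentiWordNet.

# Toy illustration of the lexicon-based approach: score a text by summing
# word polarities from a small hand-made lexicon. The lexicon entries are
# invented for this example; real systems use resources such as SentiWordNet.
POLARITY = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def lexicon_polarity(text):
    score = sum(POLARITY.get(w.strip(".,!?").lower(), 0) for w in text.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_polarity("I love this great phone"))   # positive
print(lexicon_polarity("What an awful, bad movie"))  # negative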
2 Related Work
3 Background
Web 2.0 is the ultimate manifestation of User-Generated Content (UGC) systems.
UGC can be about virtually anything, including politics, products, people, events,
etc. One of its highlights is Twitter. Twitter constitutes a very open social
network space with a low barrier to access: even non-registered users are able to
track breaking news on their chosen topics, from the “World Economic Crisis” to the
“European Football Championship”, for instance. Twitter users communicate with each
other by posting tweets, allowing for a public interactive dialogue. On Twitter,
users post or update short messages, referred to as tweets, describing their
current status within a limit of 140 characters [5]. Beyond merely displaying news
and reports, Twitter itself is also a large platform where different opinions are presented
and exchanged. The interest that users (companies, politicians, celebrities) show
in online opinions about products and services, and the potential influence such
opinions wield, is something that vendors of these items are paying more and more
attention to. Thus, the correct identification of user opinions expressed in
written text is important. In the general area of sentiment analysis, irony and
sarcasm act as an interfering factor that can flip the polarity of a message.
According to the Macmillan English Dictionary (2007), irony is “a form of humor in
which you use words to express the opposite of what the words really mean”. This
means that it is the activity of saying or writing the opposite of what you mean.
Unlike a simple negation, an ironic message typically conveys a negative opinion
using only positive words or even intensified positive words [2]. As humans, when
we communicate with one another, we have access to a wide range of spoken and
unspoken cues that help create the intended message and ensure that our audience
will understand what we are saying. Some of these cues include body language, hand
gestures, inflection, volume, and accent. Hence, the challenge for Natural Language
Processing (NLP) is how to recognize sarcasm and gauge the appropriate sentiment of
any given statement.
4 Methodology
#Sarcasm>. We also removed special characters ($, %, &, #, etc.), punctuation marks
(full stops, commas, etc.), all hashtag words, and emoticons (smileys). We applied
automatic filtering to remove duplicate tweets and tweets written in other
languages. Afterwards, we manually classified the ironic posts in order to ensure
that all messages are truly ironic or sarcastic. We gathered about 10,000 tweets;
after the preprocessing step, the sample size was about 7,628 tweets, of which
3,288 are positive, 3,600 are sarcastic, and 740 are neutral. A minimal sketch of
this preprocessing is given below.
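The sketch below illustrates the preprocessing steps just described. It is not the
authors' pipeline: the emoticon pattern, the character lists, and the duplicate
criterion are assumptions made for illustration.

# Minimal sketch of the preprocessing described above (assumed details:
# the emoticon pattern and the duplicate criterion are not spelled out
# in the paper).
import re

EMOTICON = re.compile(r"[:;=8][-']?[()\[\]dDpP/\\]")
HASHTAG = re.compile(r"#\w+")
SPECIAL = re.compile(r"[$%&#@*^~|<>]")
PUNCT = re.compile(r"[.,!?;:\"'()\[\]{}-]")

def clean_tweet(text):
    text = HASHTAG.sub(" ", text)      # drop hashtag words
    text = EMOTICON.sub(" ", text)     # drop emoticons
    text = SPECIAL.sub(" ", text)      # drop special characters
    text = PUNCT.sub(" ", text)        # drop punctuation marks
    return " ".join(text.split()).lower()

def deduplicate(tweets):
    seen, kept = set(), []
    for t in tweets:
        c = clean_tweet(t)
        if c and c not in seen:        # keep the first occurrence of each cleaned tweet
            seen.add(c)
            kept.append(c)
    return kept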
5 Classification Results
The predictive performances of the models can be seen in Table 1, while Table 2
shows the accuracy and the kappa coefficient. We did not obtain a fully
satisfactory model, in the sense that the number of misclassification errors is not
small (the Relative Absolute Error is about 34%). The simple mean of the ROC area
is about 87%, which indicates a good performance of the models in terms of AUC. The
accuracy (79%) and kappa (66%) values indicate a moderate statistical dependence
between the attributes and the classes. The best performance was achieved by the
NBM algorithm according to Precision (86%), F-measure (83%), and ROC curve area
(95%). However, the TP rate (84%) and Recall (84%) of the SVM were better than
those of the NBM. One can see that there is a considerable number of TPs but,
nonetheless, not a small number of FPs (16% incorrectly identified), especially if
we consider only the irony category. A stand-in code sketch of such a comparison is
given after Table 2.
Table 1. Predictive performance of the models per category.

Method  Category        TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
SVM     positive        80%      13%      82%        80%     81%        83%
        irony           84%      16%      80%        84%     82%        84%
        neutral         55%      6%       58%        55%     57%        74%
        weighted avg.   78%      13%      78%        78%     78%        82%
NBM     positive        83%      15%      81%        83%     82%        93%
        irony           80%      10%      86%        80%     83%        95%
        neutral         65%      7%       58%        65%     61%        90%
        weighted avg.   79%      12%      80%        79%     80%        94%
—       weighted avg.   82%      12%      82%        82%     82%        87%

Table 2. Overall accuracy and kappa coefficient.

            SVM    NBM
Kappa       64%    66%
Accuracy    78%    79%
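The paper's references point to WEKA [7] and LIBSVM [6] as the likely tooling; as a
hedged stand-in, the sketch below shows how a comparable SVM versus multinomial
Naive Bayes comparison could be set up with scikit-learn, reporting accuracy,
Cohen's kappa, and per-class precision, recall, and F-measure. The variables
`tweets` and `labels` (cleaned texts and their positive/irony/neutral classes) and
the 10-fold cross-validation protocol are assumptions.

# Stand-in sketch (the paper's references suggest WEKA/LIBSVM were used;
# scikit-learn is substituted here). `tweets` and `labels` are assumed to
# hold the cleaned tweets and their positive/irony/neutral classes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score, classification_report

def evaluate(model, tweets, labels):
    X = TfidfVectorizer().fit_transform(tweets)          # bag-of-words style features
    pred = cross_val_predict(model, X, labels, cv=10)    # 10-fold cross-validation (assumed)
    print(type(model).__name__,
          "accuracy:", round(accuracy_score(labels, pred), 3),
          "kappa:", round(cohen_kappa_score(labels, pred), 3))
    print(classification_report(labels, pred))           # per-class P, R, F-measure

# evaluate(LinearSVC(), tweets, labels)
# evaluate(MultinomialNB(), tweets, labels)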
6 Conclusions
Individuals post messages on the internet using e-mail, message boards, and
websites such as Facebook and Twitter. These forms of contact are highly integrated
into everyday life. With the proliferation of reviews, ratings, recommendations,
and other forms of online expression, online opinion has turned into a kind of
virtual currency for
businesses willing to market their products, identify new opportunities, and manage
their reputations. Despite the considerable amount of research, the classification
of polarity is still a challenging task, mostly because it involves a deep
understanding of the explicit and implicit information conveyed by language
structures. Consequently, irony and sarcasm have become an important topical issue
in NLP, mostly because irony (or sarcasm) flips the polarity of a message. Hence,
this paper investigated how irony affects sentiment analysis tools. The
classifications were conducted with a Support Vector Machine and a Naïve Bayes
classifier. The results and conclusions of the experiments raise remarks and new
questions. A first remark is that all experiments were performed with English
texts; consequently, the results cannot be directly generalized to other languages.
We believe that the results are likely to differ in other languages, for instance
in more structured languages such as Brazilian Portuguese. Another interesting
question is whether similar results would be obtained if the experiments were
carried out with a different method. From a statistical point of view, there are no
relevant differences between the methodologies: for example, the total accuracy
ranges between 78% and 79% and kappa between 64% and 66%, in spite of the
inherently ambiguous nature of irony (or sarcasm), which makes it hard to analyze,
not just automatically but often for humans as well. Our work indicates that NBM
and SVM were reasonably able to detect irony in Twitter messages, bearing in mind
that our research deals with only one type of irony that is common in tweets. The
study provides an initial understanding of how irony affects polarity detection. To
better understand the phenomenon, it is essential to apply different methods, such
as the polarity given by SentiWordNet, which is lexicon-based. In this sense, for
future work, we aim to explore the use of lexicon-based tools and compare the
obtained results.
References
1. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. Found. Trends Inf. Retr. 2,
1–135 (2008)
2. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization.
In: Advances in Kernel Methods, pp. 185−208. MIT Press (1999)
3. Weitzel, L., Aguiar, R.F., Rodriguez, W.F.G., Heringer, M.G.: How do medical authorities
express their sentiment in twitter messages? In: 2014 9th Iberian Conference on Information
Systems and Technologies (CISTI), pp. 1−6 (2014)
4. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38, 39–41 (1995)
5. Weitzel, L., Quaresma, P., Oliveira, J.P.M.D.: Measuring node importance on twitter
microblogging. In: Proceedings of the 2nd International Conference on Web Intelligence,
Mining and Semantics, pp. 1−7. ACM, Craiova (2012)
6. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Trans.
Intell. Syst. Technol. 2, 1–27 (2011)
7. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA
data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
Author Index